Using ESMFold
This tutorial shows how to use the ESMFold model to predict the 3D structure of a protein sequence of interest. We recommend ESMFold for single-chain sequences. For a multi-chain sequence, see the Using AlphaFold2 tutorial instead.
What you need before getting started
Specify a sequence of interest whose structure you want to predict. The example used here is Interleukin 2:
[1]:
import openprotein
# Login to your session
session = openprotein.connect()
sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"
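Before submitting a sequence, it can help to check it for stray characters such as whitespace or digits picked up when copying from a FASTA file. The sketch below is a minimal example, not part of the openprotein client: the helper name find_invalid_residues is hypothetical, and it assumes that only the 20 standard one-letter amino acid codes are wanted. Whether the service accepts other codes (for example X) is not stated in this tutorial.

```python
# The 20 standard amino acids in one-letter code.
# Assumption: anything outside this set should be flagged for review.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based position, character) for each non-standard character."""
    return [
        (i, ch)
        for i, ch in enumerate(sequence.upper())
        if ch not in STANDARD_AMINO_ACIDS
    ]


# Hypothetical inputs, for illustration only.
print(find_invalid_residues("MYRMQLLSCIAL"))  # []
print(find_invalid_residues("MYRMQ LLX"))     # [(5, ' '), (8, 'X')]
```

An empty list means every character is a standard residue. The same call can be applied to the sequence variable defined above.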
Getting the Model
Create the model object for ESMFold:
[2]:
esmfoldmodel = session.fold.esmfold
esmfoldmodel.fold?
Signature:
esmfoldmodel.fold(
sequences: Sequence[openprotein.molecules.complex.Complex | openprotein.molecules.protein.Protein | str | bytes],
num_recycles: int | None = None,
) -> openprotein.fold.future.FoldResultFuture
Docstring:
Fold sequences using this model.
Parameters
----------
sequences : Sequence[bytes | str]
sequences to fold
num_recycles : int | None
number of times to recycle models
Returns
-------
FoldResultFuture
File: ~/Projects/openprotein/openprotein-python-private/openprotein/fold/esmfold.py
Type: method
Predicting your sequence
Call ESMFold on your sequence with fold(). The num_recycles hyperparameter controls how many times the model refines the structure, with each cycle using the previous cycle’s output as its input. This parameter accepts integers between 1 and 48.
You can submit a Complex, a Protein, or a plain sequence, which is treated as a single Protein. To batch a request, pass a list containing several sequences.
[3]:
esm = esmfoldmodel.fold([sequence.encode()], num_recycles=1)
esm
[3]:
FoldJob(num_records=1, job_id='f3370817-5fdd-400d-bd26-a90805cd9f4a', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 16, 16, 18, 19, 489442, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
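The constraints described above (num_recycles between 1 and 48, and a list of sequences as input) can be checked on the client before a job is submitted. The wrapper below is a sketch under stated assumptions: fold_checked is a hypothetical helper, not part of the openprotein client, and it relies only on the fold(sequences, num_recycles=...) call shown in this tutorial.

```python
def fold_checked(model, sequences, num_recycles=None):
    """Validate the arguments, then call model.fold() as shown in this tutorial.

    `model` is any object with a fold(sequences, num_recycles=...) method,
    such as the esmfoldmodel object created earlier.
    """
    if num_recycles is not None:
        # The tutorial states that num_recycles accepts integers between 1 and 48.
        if not isinstance(num_recycles, int) or not 1 <= num_recycles <= 48:
            raise ValueError(
                f"num_recycles must be an integer between 1 and 48, got {num_recycles!r}"
            )
    # Accept a single sequence as a convenience, but always pass a list to fold().
    if isinstance(sequences, (str, bytes)):
        sequences = [sequences]
    return model.fold(list(sequences), num_recycles=num_recycles)
```

With a connected session, a call such as fold_checked(esmfoldmodel, [sequence], num_recycles=4) would submit the same kind of job as the cell above, and it raises a ValueError locally if num_recycles falls outside the documented range.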
Wait for the job to complete with wait_until_done():
[4]:
esm.wait_until_done(verbose=True, timeout=300)
Waiting: 100%|█████████████████████████████████████████████████| 100/100 [04:40<00:00, 2.80s/it, status=SUCCESS]
[4]:
True
Retrieving the Results
Getting the Predicted Structure
Fetch the results with get().
The call returns a list of Structure objects, one for each input in the request, each containing the predicted 3D structure.
Indexing a Structure gives a Complex, from which you can retrieve the folded Protein of interest.
[5]:
result = esm.get()
structure = result[0]
complex = structure[0]
protein = complex.get_protein("A") # chains are auto-named in alphabetical order
print("Predicted structure:", structure)
print("Predicted protein sequence:", protein.sequence)
Predicted structure: <openprotein.molecules.structure.Structure object at 0x7f563b329400>
Predicted protein sequence: b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'
Visualize the structure using molviewspec:
[6]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
        )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))
Getting the PAE
ESMFold also provides the PAE (Predicted Aligned Error), an N × N matrix, where N is the sequence length, that estimates the expected positional error between each pair of residues. It is useful for assessing how reliably the relative positions of regions (e.g., domains or chains) are predicted.
[7]:
# Retrieve the PAE matrix
pae_matrix = esm.get_pae()[0] # note that the pae result is also a list
print("\nPAE matrix shape:", pae_matrix.shape)
PAE matrix shape: (239, 239)
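One common way to read a PAE matrix is to average it over blocks: a low mean within a region suggests that region is internally well defined, while a high mean between two regions suggests their relative placement is uncertain. The sketch below illustrates this on a small synthetic matrix. The helper mean_pae_block, the toy values, and the domain boundaries are all hypothetical. It assumes the matrix supports pae[i][j] indexing, as nested lists and NumPy arrays do.

```python
def mean_pae_block(pae, rows, cols):
    """Mean of pae[i][j] over all i in rows and all j in cols."""
    rows, cols = list(rows), list(cols)
    total = sum(float(pae[i][j]) for i in rows for j in cols)
    return total / (len(rows) * len(cols))


# Synthetic 4 x 4 matrix standing in for pae_matrix: two hypothetical
# "domains" of two residues each, with low error within a domain and
# high error between domains.
toy_pae = [
    [0.5, 1.0, 9.0, 10.0],
    [1.0, 0.5, 10.0, 9.0],
    [9.0, 10.0, 0.5, 1.0],
    [10.0, 9.0, 1.0, 0.5],
]
domain_a, domain_b = range(0, 2), range(2, 4)

print(mean_pae_block(toy_pae, domain_a, domain_a))  # 0.75
print(mean_pae_block(toy_pae, domain_a, domain_b))  # 9.5
```

For a real prediction, pass pae_matrix in place of toy_pae and choose residue ranges that match the regions you care about.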
Next steps
Compare the predicted structure with a query structure, try another structure predictor such as AlphaFold2, or save the structure for future use:
[8]:
with open("esmfold_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))