Using ESMFold2
ESMFold2 is the latest generation of the ESM structure-prediction family (Biohub/esm). Unlike first-generation ESMFold, it predicts full biomolecular complexes — multiple protein chains, nucleic acids, and small-molecule ligands — with a diffusion-based decoder, and can optionally condition on a multiple sequence alignment (MSA) for improved accuracy.
Two variants are available:
ESMFold2Model (session.fold.esmfold2): the full model; accepts an optional MSA per protein chain.
ESMFold2FastModel (session.fold.esmfold2_fast): a single-sequence variant for fast predictions without an MSA.
First-generation ESMFold is still available via session.fold.esmfold for quick single-chain, single-sequence predictions. It does not support ligands, nucleic acids, multi-chain complexes, or MSA conditioning; reach for ESMFold2 when you need any of those. See the note on ESMFold at the end of this guide.
What you need before getting started
Connect to your session and define the sequence you want to fold. The example here is a two-chain homodimer built from two copies of an Interleukin-2 (IL-2) sequence:
[1]:
import openprotein
from openprotein.molecules import Complex, Ligand, Protein
# Login to your session
session = openprotein.connect()
sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"
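Before submitting a job it can help to catch typos in the input sequence locally. The sketch below is a hypothetical helper (`check_protein_sequence` is not part of openprotein); it assumes chains use the 20 standard one-letter amino-acid codes plus `X` for an unknown residue, which may be stricter or looser than what the service actually accepts.

```python
# Hypothetical pre-flight check; not part of the openprotein package.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
ALLOWED = STANDARD_RESIDUES | {"X"}  # assumption: X marks an unknown residue


def check_protein_sequence(seq: str) -> str:
    """Return the sequence upper-cased with all whitespace removed.

    Raises ValueError if the sequence is empty or contains codes outside ALLOWED.
    """
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    bad = sorted(set(cleaned) - ALLOWED)
    if bad:
        raise ValueError(f"unexpected residue codes: {bad}")
    return cleaned


# Usage in the notebook: sequence = check_protein_sequence(sequence)
```

On the IL-2 fragment above this returns the sequence unchanged (121 residues), so the two-chain complex has 242 residues in total.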
Getting the model
Create the model object for ESMFold2:
[2]:
esmfold2 = session.fold.esmfold2
help(esmfold2.fold)
Help on method fold in module openprotein.fold.esmfold2:
fold(
sequences: Sequence[Complex | Protein | str | bytes] | MSAFuture,
diffusion_samples: int = 1,
num_recycles: int = 3,
num_steps: int = 200,
step_scale: float | None = None,
seed: int | None = None,
**_
) -> FoldResultFuture method of openprotein.fold.esmfold2.ESMFold2Model instance
Request structure prediction with ESMFold2.
Parameters
----------
sequences : Sequence[Complex | Protein | str | bytes] | MSAFuture
List of complexes to fold. `Protein` objects must be tagged with
an `msa`, which can be `Protein.single_sequence_mode` for single
sequence mode. Alternatively, supply an `MSAFuture` to use all
query sequences as a multimer.
diffusion_samples : int
Number of diffusion samples to use.
num_recycles : int
Number of recycling steps to use.
num_steps : int
Number of sampling steps to use.
step_scale : float | None
Scaling factor for diffusion steps.
seed : int | None
Seed for the diffusion sampler.
Returns
-------
FoldResultFuture
Future for the folding result.
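If you plan to run several configurations, it can help to keep these defaults in one place. The sketch below is a hypothetical `FoldSettings` dataclass (not part of openprotein) that mirrors the keyword defaults shown by help() above; the lower bounds it checks are assumptions, not documented limits.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FoldSettings:
    """Hypothetical helper mirroring the keyword defaults of ESMFold2Model.fold()."""

    diffusion_samples: int = 1
    num_recycles: int = 3
    num_steps: int = 200
    step_scale: Optional[float] = None
    seed: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed lower bounds; the service may enforce its own limits.
        if self.diffusion_samples < 1:
            raise ValueError("diffusion_samples must be >= 1")
        if self.num_steps < 1:
            raise ValueError("num_steps must be >= 1")
        if self.num_recycles < 0:
            raise ValueError("num_recycles must be >= 0")

    def as_kwargs(self) -> dict:
        # Omit None values so fold() falls back to its own defaults for them.
        return {k: v for k, v in asdict(self).items() if v is not None}


# A quick-demo preset: fewer diffusion steps and a fixed seed.
quick = replace(FoldSettings(), num_steps=50, seed=0)
# With openprotein (hypothetical usage; `inputs` is your list of Complex objects):
#   esmfold2.fold(inputs, **quick.as_kwargs())
```

Making the dataclass frozen means presets are derived with `replace()` rather than mutated, so one preset cannot silently change another.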
Predicting a complex
Build a Complex of named chains. Each chain is a Protein that must declare how it is conditioned — here we use Protein.single_sequence_mode to fold without an MSA:
[3]:
chain_a = Protein(sequence)
chain_a.msa = Protein.single_sequence_mode
chain_b = Protein(sequence)
chain_b.msa = Protein.single_sequence_mode
il2_dimer = Complex(chains={"A": chain_a, "B": chain_b})  # avoids shadowing the built-in `complex`
future = esmfold2.fold(
    [il2_dimer],
    num_recycles=3,  # trunk recycling iterations
    num_steps=50,  # diffusion sampling steps (default 200; fewer is faster)
    diffusion_samples=1,  # number of structure samples to draw
    seed=0,  # seed for the diffusion sampler
)
future
[3]:
FoldJob(num_records=1, job_id='90535dff-d152-457f-9aee-5db886b2eb40', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 6, 4, 14, 27, 40, 226796, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None, failure_message=None)
The runtime hyperparameters trade speed for accuracy: num_recycles controls how many times the trunk refines its representation, and num_steps is the number of diffusion sampling steps (the default is 200; this demo uses 50 for speed). diffusion_samples draws multiple independent structure samples per input, so each extra sample adds compute. Setting seed fixes the diffusion sampler and makes a run reproducible. step_scale, a scaling factor for the diffusion steps, is left at its default here.
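To see this trade-off on your own inputs, one option is a small sweep that varies num_steps and num_recycles while holding seed fixed, so differences come from the settings rather than from sampler randomness. The sketch below only builds the keyword-argument dicts; the grid values and the `build_sweep` name are illustrative assumptions, and the commented submission line assumes the `esmfold2` object from the cells above.

```python
from itertools import product

# Illustrative grid values (assumptions, not recommendations).
STEP_OPTIONS = (50, 100, 200)  # 200 is the documented default
RECYCLE_OPTIONS = (1, 3)  # 3 is the documented default


def build_sweep(seed: int = 0, diffusion_samples: int = 1) -> list[dict]:
    """Return one fold() kwargs dict per (num_steps, num_recycles) pair."""
    return [
        {
            "num_steps": steps,
            "num_recycles": recycles,
            "diffusion_samples": diffusion_samples,
            "seed": seed,  # fixed so runs differ only by the swept settings
        }
        for steps, recycles in product(STEP_OPTIONS, RECYCLE_OPTIONS)
    ]


sweep = build_sweep()
# Hypothetical submission, one job per setting (`inputs` is your list of Complex objects):
#   futures = [esmfold2.fold(inputs, **kw) for kw in sweep]
```

Comparing the confidence scores and wall-clock times across these jobs shows where extra steps or recycles stop paying off for a given target.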
Wait for the job to complete with wait_until_done():
[4]:
future.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|██████████| 100/100 [04:03<00:00, 2.43s/it, status=SUCCESS]
[4]:
True
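wait_until_done() blocks until the job finishes or the timeout elapses. If you would rather poll yourself (for example, to interleave other work), the loop below shows the general pattern with capped exponential backoff. It is a generic sketch: `poll_until_done` and the `get_status` callable are hypothetical stand-ins, and only PENDING and SUCCESS appear in the outputs above; the other terminal status names are assumptions.

```python
import time
from typing import Callable

TERMINAL = {"SUCCESS", "FAILURE", "CANCELLED"}  # assumed terminal status names


def poll_until_done(
    get_status: Callable[[], str],
    timeout: float = 900.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

With a real job you would pass a function that returns the job's current status string; the injectable `sleep` argument exists so the loop can be tested without waiting.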
Folding with a ligand
A key capability of ESMFold2 — unavailable in first-generation ESMFold — is co-folding small-molecule ligands alongside protein chains. Add a Ligand chain by SMILES string or by Chemical Component Dictionary (CCD) code:
[5]:
ligand_complex = Complex(chains={"A": chain_a})
ligand_complex.set_chain("L", Ligand(smiles="CCO")) # ethanol, by SMILES
# ...or by CCD code: ligand_complex.set_chain("L", Ligand(ccd="HEM"))
ligand_future = esmfold2.fold([ligand_complex], num_steps=50, seed=0)
ligand_future.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|██████████| 100/100 [00:05<00:00, 18.86it/s, status=SUCCESS]
[5]:
True
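Each chain in a Complex needs its own ID; the example uses "A" for the protein and "L" for the ligand. When adding several ligands programmatically, a small helper can pick an unused ID. `next_free_chain_id` is a hypothetical convenience function that assumes single uppercase letters are valid chain IDs, as in the examples here; it does not call openprotein, so you track the IDs you have used yourself.

```python
import string
from typing import Iterable


def next_free_chain_id(existing: Iterable[str], preferred: str = "L") -> str:
    """Return `preferred` if unused, else the first unused uppercase letter."""
    taken = set(existing)
    if preferred not in taken:
        return preferred
    for letter in string.ascii_uppercase:
        if letter not in taken:
            return letter
    raise ValueError("all 26 single-letter chain IDs are taken")


# Hypothetical usage with openprotein, tracking IDs alongside the Complex:
#   used = {"A"}
#   for smiles in ["CCO", "CC(=O)O"]:  # ethanol, acetic acid
#       cid = next_free_chain_id(used)
#       ligand_complex.set_chain(cid, Ligand(smiles=smiles))
#       used.add(cid)
```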
Conditioning on an MSA
The full esmfold2 variant can condition on an MSA for improved accuracy. Attach an MSA to a protein chain by assigning its msa attribute an MSA created with session.align.create_msa:
[6]:
# msa = session.align.create_msa(sequence.encode())
# chain = Protein(sequence)
# chain.msa = msa # an MSAFuture or MSA id
# future = esmfold2.fold([chain])
esmfold2-fast is a single-sequence model and rejects chains that carry an MSA — use Protein.single_sequence_mode with it instead.
Fast single-sequence predictions with ESMFold2-Fast
When you don’t need an MSA, esmfold2-fast is a lighter-weight, faster variant. It accepts the same inputs and hyperparameters, but every protein chain must use Protein.single_sequence_mode:
[7]:
esmfold2_fast = session.fold.esmfold2_fast
fast_chain = Protein(sequence)
fast_chain.msa = Protein.single_sequence_mode
fast_future = esmfold2_fast.fold([fast_chain], num_steps=50, seed=0)
fast_future.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|██████████| 100/100 [04:47<00:00, 2.87s/it, status=SUCCESS]
[7]:
True
Retrieving the results
Fetch the results with get(), which returns a list of Structure objects — one per input. Each Structure holds one Complex per diffusion sample:
[8]:
results = future.get()
structure = results[0]
prediction = structure[0]  # first diffusion sample (a Complex)
protein = prediction.get_protein("A")  # chain "A", the first chain of the input Complex
print("Predicted structure:", structure)
print("Chain A sequence:", protein.sequence)
Predicted structure: <openprotein.molecules.structure.Structure object at 0x127a534d0>
Chain A sequence: b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'
Visualize the structure using molviewspec:
[9]:
%pip install molviewspec
from molviewspec import create_builder
def display_structure(structure_string):
    """Render an mmCIF string in the notebook with Mol* via molviewspec."""
    builder = create_builder()
    (
        builder.download(url="mystructure.cif")
        .parse(format="mmcif")
        .model_structure()
        .component()
        .representation()
        .color_from_source(
            schema="atom",
            category_name="atom_site",
            field_name="auth_asym_id",  # color by chain
            palette={
                "kind": "categorical",
                "colors": ["blue", "red", "green", "orange"],
                "mode": "ordinal",
            },
        )
    )
    return builder.molstar_notebook(
        data={"mystructure.cif": structure_string}, width=500, height=400
    )
display_structure(structure.to_string(format="cif"))
Confidence scores
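ESMFold2 returns per-sample confidence scores via get_confidence(). Each entry is an ESMFold2Confidence holding the complex pTM and ipTM, the mean complex pLDDT, and per-chain breakdowns keyed by chain index ("0", "1", ...). In the output below, the low ipTM (about 0.07) indicates that the model does not predict a confident interface between the two copies: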
[10]:
confidence = future.get_confidence()[0] # one list per input; one entry per diffusion sample
c = confidence[0]
print("pTM:", c.ptm)
print("ipTM:", c.iptm)
print("complex pLDDT:", c.complex_plddt)
print("per-chain pTM:", c.chains_ptm)
print("pairwise chain ipTM:", c.pair_chains_iptm)
pTM: 0.23306608200073242
ipTM: 0.07277470082044601
complex pLDDT: 0.39403602480888367
per-chain pTM: {'0': 0.2655980587005615, '1': 0.26771098375320435}
pairwise chain ipTM: {'0': {'0': 0.2655980587005615, '1': 0.06164408102631569}, '1': {'0': 0.06224282458424568, '1': 0.26771098375320435}}
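With diffusion_samples greater than 1, get_confidence() returns several entries per input, and you may want to pick one. A common heuristic, borrowed from AlphaFold-Multimer's model ranking rather than documented for ESMFold2, scores each sample as 0.8 × ipTM + 0.2 × pTM. The sketch below assumes each entry exposes `ptm` and `iptm` attributes as printed above; `rank_samples` is a hypothetical helper, and a stand-in class replaces ESMFold2Confidence so the example runs offline.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class HasTMScores(Protocol):
    """Anything with pTM and ipTM attributes, e.g. an ESMFold2Confidence entry."""

    ptm: float
    iptm: float


def ranking_score(c: HasTMScores, w_iptm: float = 0.8) -> float:
    # AlphaFold-Multimer-style weighting (assumed to transfer to ESMFold2).
    return w_iptm * c.iptm + (1.0 - w_iptm) * c.ptm


def rank_samples(confidences: Sequence[HasTMScores]) -> list[int]:
    """Sample indices ordered from highest to lowest ranking score."""
    return sorted(
        range(len(confidences)),
        key=lambda i: ranking_score(confidences[i]),
        reverse=True,
    )


@dataclass
class StandInConfidence:
    """Offline stand-in for ESMFold2Confidence, used only for this demo."""

    ptm: float
    iptm: float


samples = [
    StandInConfidence(ptm=0.23, iptm=0.07),
    StandInConfidence(ptm=0.61, iptm=0.55),
    StandInConfidence(ptm=0.40, iptm=0.30),
]
order = rank_samples(samples)  # best sample first
# In the notebook (hypothetical usage):
#   order = rank_samples(future.get_confidence()[0])
#   best_complex = structure[order[0]]
```

For single-chain inputs, where ipTM is not meaningful, ranking by pTM or complex pLDDT alone is a simpler choice.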
PAE and pLDDT arrays
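The PAE (Predicted Aligned Error) is an N × N matrix estimating the expected position error between residue pairs, and pLDDT is the per-residue confidence. Both are returned as NumPy arrays, one per input. Each array carries a leading dimension (size 1 here, presumably one entry per diffusion sample), and N = 242 covers the 121 residues of each of the two chains: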
[11]:
pae = future.get_pae()[0]
plddt = future.get_plddt()[0]
print("PAE matrix shape:", pae.shape)
print("pLDDT shape:", plddt.shape)
PAE matrix shape: (1, 242, 242)
pLDDT shape: (1, 242)
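Because the chains share one residue axis, the off-diagonal blocks of the PAE matrix describe inter-chain error. The sketch below computes per-chain mean pLDDT and the mean PAE of each inter-chain block for one diffusion sample. It assumes residues are ordered chain by chain in input order and works on plain lists (convert with `.tolist()`); `summarize_sample` and its helpers are hypothetical names, not openprotein API.

```python
from typing import Sequence


def chain_ranges(chain_lengths: Sequence[int]) -> list[range]:
    """Residue index range of each chain, assuming chains are concatenated in input order."""
    ranges, start = [], 0
    for n in chain_lengths:
        ranges.append(range(start, start + n))
        start += n
    return ranges


def mean_block(matrix: Sequence[Sequence[float]], rows: range, cols: range) -> float:
    """Mean of matrix[i][j] over the rows x cols block."""
    values = [matrix[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def summarize_sample(pae, plddt, chain_lengths):
    """Per-chain mean pLDDT and directed inter-chain mean PAE for ONE diffusion sample.

    pae: N x N nested list; plddt: length-N list; sum(chain_lengths) must equal N.
    """
    if sum(chain_lengths) != len(plddt) or len(pae) != len(plddt):
        raise ValueError("chain lengths do not match the array sizes")
    ranges = chain_ranges(chain_lengths)
    per_chain_plddt = [sum(plddt[i] for i in r) / len(r) for r in ranges]
    # PAE is not necessarily symmetric, so report both (a, b) and (b, a) blocks.
    inter_chain_pae = {
        (a, b): mean_block(pae, ranges[a], ranges[b])
        for a in range(len(ranges))
        for b in range(len(ranges))
        if a != b
    }
    return per_chain_plddt, inter_chain_pae


# Toy 2 + 2 residue example; in the notebook you might call
#   summarize_sample(pae[0].tolist(), plddt[0].tolist(), [121, 121])
toy_pae = [[0, 1, 5, 5], [1, 0, 5, 5], [7, 7, 0, 1], [7, 7, 1, 0]]
toy_plddt = [0.9, 0.7, 0.4, 0.2]
per_chain, inter = summarize_sample(toy_pae, toy_plddt, [2, 2])
```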
A note on ESMFold (first generation)
First-generation ESMFold remains available via session.fold.esmfold for quick single-chain, single-sequence structure predictions:
[12]:
# esm_future = session.fold.esmfold.fold([sequence.encode()], num_recycles=1)
# esm_structure = esm_future.get()[0]  # separate name so `structure` stays the ESMFold2 result
ESMFold (v1) does not support ligands, nucleic acids, multi-chain complexes, or MSA conditioning. Use ESMFold2 (or esmfold2-fast) whenever you need any of those capabilities; otherwise the first-generation model is a fast option for single-sequence monomers.
Next steps
Save your structure for future use, or compare it against another predictor such as AlphaFold2, Boltz, or Protenix-v2:
[13]:
with open("esmfold2_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))