
Using ESMFold2#

ESMFold2 is the latest generation of the ESM structure-prediction family (Biohub/esm). Unlike first-generation ESMFold, it predicts full biomolecular complexes — multiple protein chains, nucleic acids, and small-molecule ligands — with a diffusion-based decoder, and can optionally condition on a multiple sequence alignment (MSA) for improved accuracy.

Two variants are available:

  • ESMFold2Model (session.fold.esmfold2) — the full model; accepts an optional MSA per protein chain.

  • ESMFold2FastModel (session.fold.esmfold2_fast) — a single-sequence variant for fast predictions without an MSA.

First-generation ESMFold is still available via session.fold.esmfold for quick single-chain, single-sequence predictions. It does not support ligands, nucleic acids, multi-chain complexes, or MSA conditioning — reach for ESMFold2 when you need any of those. See the note on ESMFold at the end of this guide.

What you need before getting started#

Connect to your session and define the sequences you want to fold. The example used in this guide is a two-chain complex made of two copies of the same 121-residue Interleukin-2 sequence:

[1]:
import openprotein
from openprotein.molecules import Complex, Ligand, Protein

# Login to your session
session = openprotein.connect()

sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"
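
Before submitting a job, it can help to sanity-check the input locally. The sketch below is not part of the openprotein client: `check_sequence` and `CANONICAL_AA` are hypothetical names, and the restriction to the 20 canonical amino-acid letters is an assumption made for illustration rather than a documented requirement of the service.

```python
# Hypothetical pre-flight check; not part of the openprotein client.
# Assumption: inputs should contain only the 20 canonical amino-acid letters.
CANONICAL_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(seq: str) -> int:
    """Return the sequence length, or raise ValueError on unexpected letters."""
    unexpected = sorted(set(seq.upper()) - CANONICAL_AA)
    if unexpected:
        raise ValueError(f"unexpected residue letters: {unexpected}")
    return len(seq)


sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"
n_residues = check_sequence(sequence)
print(n_residues)
```

Knowing the chain length up front is useful later: a complex of two such chains has twice this many residues, which is the N that appears in the PAE and pLDDT array shapes.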

Getting the model#

Create the model object for ESMFold2:

[2]:
esmfold2 = session.fold.esmfold2
help(esmfold2.fold)
Help on method fold in module openprotein.fold.esmfold2:

fold(
    sequences: Sequence[Complex | Protein | str | bytes] | MSAFuture,
    diffusion_samples: int = 1,
    num_recycles: int = 3,
    num_steps: int = 200,
    step_scale: float | None = None,
    seed: int | None = None,
    **_
) -> FoldResultFuture method of openprotein.fold.esmfold2.ESMFold2Model instance
    Request structure prediction with ESMFold2.

    Parameters
    ----------
    sequences : Sequence[Complex | Protein | str | bytes] | MSAFuture
        List of complexes to fold. `Protein` objects must be tagged with
        an `msa`, which can be `Protein.single_sequence_mode` for single
        sequence mode. Alternatively, supply an `MSAFuture` to use all
        query sequences as a multimer.
    diffusion_samples : int
        Number of diffusion samples to use.
    num_recycles : int
        Number of recycling steps to use.
    num_steps : int
        Number of sampling steps to use.
    step_scale : float | None
        Scaling factor for diffusion steps.
    seed : int | None
        Seed for the diffusion sampler.

    Returns
    -------
    FoldResultFuture
        Future for the folding result.

Predicting a complex#

Build a Complex of named chains. Each chain is a Protein that must declare how it is conditioned — here we use Protein.single_sequence_mode to fold without an MSA:

[3]:
chain_a = Protein(sequence)
chain_a.msa = Protein.single_sequence_mode
chain_b = Protein(sequence)
chain_b.msa = Protein.single_sequence_mode

complex = Complex(chains={"A": chain_a, "B": chain_b})

future = esmfold2.fold(
    [complex],
    num_recycles=3,        # trunk recycling iterations
    num_steps=50,          # diffusion sampling steps (default 200; fewer is faster)
    diffusion_samples=1,   # number of structure samples to draw
    seed=0,
)
future
[3]:
FoldJob(num_records=1, job_id='90535dff-d152-457f-9aee-5db886b2eb40', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 6, 4, 14, 27, 40, 226796, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None, failure_message=None)

These runtime hyperparameters trade speed against accuracy. num_recycles sets how many times the trunk refines its representation, num_steps sets the number of diffusion sampling steps (the default is 200; this demo uses 50 to run faster), and diffusion_samples sets how many independent structure samples are drawn per input. Setting seed fixes the seed of the diffusion sampler so that a run can be reproduced.
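
One way to keep these settings consistent across runs is to build them as a plain dictionary and unpack it into fold(). The sketch below is a hypothetical helper, not part of the client: `FOLD_DEFAULTS` and `fold_settings` are illustrative names, and the default values are copied from the fold() signature shown earlier. Because that signature ends in `**_`, a misspelled keyword would presumably be accepted without complaint, so the helper rejects unknown names itself.

```python
# Defaults copied from the fold() signature printed by help(esmfold2.fold).
FOLD_DEFAULTS = {"diffusion_samples": 1, "num_recycles": 3, "num_steps": 200}


def fold_settings(seed=None, **overrides):
    """Merge overrides into the defaults, rejecting misspelled setting names."""
    unknown = set(overrides) - set(FOLD_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown fold setting(s): {sorted(unknown)}")
    settings = {**FOLD_DEFAULTS, **overrides}
    if seed is not None:
        settings["seed"] = seed
    return settings


# The quick-demo settings used in this guide: fewer diffusion steps, fixed seed.
quick = fold_settings(num_steps=50, seed=0)
print(quick)

# Intended use (requires a connected session, so it is left as a comment):
# future = esmfold2.fold([complex], **quick)
```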

Wait for the job to complete with wait_until_done():

[4]:
future.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|██████████| 100/100 [04:03<00:00,  2.43s/it, status=SUCCESS]
[4]:
True

Folding with a ligand#

A key capability of ESMFold2 — unavailable in first-generation ESMFold — is co-folding small-molecule ligands alongside protein chains. Add a Ligand chain by SMILES string or by Chemical Component Dictionary (CCD) code:

[5]:
ligand_complex = Complex(chains={"A": chain_a})
ligand_complex.set_chain("L", Ligand(smiles="CCO"))  # ethanol, by SMILES
# ...or by CCD code: ligand_complex.set_chain("L", Ligand(ccd="HEM"))

ligand_future = esmfold2.fold([ligand_complex], num_steps=50, seed=0)
ligand_future.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|██████████| 100/100 [00:05<00:00, 18.86it/s, status=SUCCESS]
[5]:
True

Conditioning on an MSA#

The full esmfold2 variant can condition on an MSA for improved accuracy. To attach one, create an MSA with session.align.create_msa and assign it to the msa attribute of the protein chain:

[6]:
# msa = session.align.create_msa(sequence.encode())
# chain = Protein(sequence)
# chain.msa = msa  # an MSAFuture or MSA id
# future = esmfold2.fold([chain])

esmfold2-fast is a single-sequence model and rejects chains that carry an MSA — use Protein.single_sequence_mode with it instead.

Fast single-sequence predictions with ESMFold2-Fast#

When you don’t need an MSA, esmfold2-fast is a lighter-weight, faster variant. It accepts the same inputs and hyperparameters, but every protein chain must use Protein.single_sequence_mode:

[7]:
esmfold2_fast = session.fold.esmfold2_fast

fast_chain = Protein(sequence)
fast_chain.msa = Protein.single_sequence_mode
fast_future = esmfold2_fast.fold([fast_chain], num_steps=50, seed=0)
fast_future.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|██████████| 100/100 [04:47<00:00,  2.87s/it, status=SUCCESS]
[7]:
True

Retrieving the results#

Fetch the results with get(), which returns a list of Structure objects — one per input. Each Structure holds one Complex per diffusion sample:

[8]:
results = future.get()
structure = results[0]
sample = structure[0]                # first diffusion sample (a Complex)
protein = sample.get_protein("A")    # chains are named alphabetically

print("Predicted structure:", structure)
print("Chain A sequence:", protein.sequence)
Predicted structure: <openprotein.molecules.structure.Structure object at 0x127a534d0>
Chain A sequence: b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'

Visualize the structure using molviewspec:

[9]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical",  # color by chain
                                     "colors": ["blue", "red", "green", "orange"],
                                     "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))

Confidence scores#

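ESMFold2 returns per-sample confidence scores via get_confidence(). Each entry is an ESMFold2Confidence with the complex pTM/ipTM, the mean complex pLDDT, and per-chain breakdowns. In the output below, the per-chain entries are keyed by chain index ('0', '1') rather than by the chain names given in the Complex: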

[10]:
confidence = future.get_confidence()[0]  # one list per input; one entry per diffusion sample
c = confidence[0]

print("pTM:", c.ptm)
print("ipTM:", c.iptm)
print("complex pLDDT:", c.complex_plddt)
print("per-chain pTM:", c.chains_ptm)
print("pairwise chain ipTM:", c.pair_chains_iptm)
pTM: 0.23306608200073242
ipTM: 0.07277470082044601
complex pLDDT: 0.39403602480888367
per-chain pTM: {'0': 0.2655980587005615, '1': 0.26771098375320435}
pairwise chain ipTM: {'0': {'0': 0.2655980587005615, '1': 0.06164408102631569}, '1': {'0': 0.06224282458424568, '1': 0.26771098375320435}}
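
When diffusion_samples is greater than 1, there is one confidence entry per sample, and you may want to pick the most confident one. The sketch below ranks samples by a weighted combination of ipTM and pTM. The 0.8/0.2 weighting is borrowed from the model-confidence convention used by AlphaFold-Multimer and is an assumption here, not a documented ESMFold2 ranking. `rank_samples` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that expose the same `ptm` and `iptm` attributes as the entries printed above.

```python
from types import SimpleNamespace


def rank_samples(confidences, iptm_weight=0.8):
    """Return sample indices sorted best-first by a weighted ipTM/pTM score."""

    def score(c):
        return iptm_weight * c.iptm + (1.0 - iptm_weight) * c.ptm

    return sorted(
        range(len(confidences)),
        key=lambda i: score(confidences[i]),
        reverse=True,
    )


# Stand-in confidence entries. The first uses the (rounded) values from this
# guide's output; the second is made up purely for illustration.
samples = [
    SimpleNamespace(ptm=0.233, iptm=0.073),
    SimpleNamespace(ptm=0.310, iptm=0.150),
]
order = rank_samples(samples)
print(order)

# With real results (requires a finished job), this would look like:
# best = rank_samples(future.get_confidence()[0])[0]
# best_complex = future.get()[0][best]
```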

PAE and pLDDT arrays#

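The PAE (Predicted Aligned Error) is an N × N matrix estimating the expected error between residue pairs; pLDDT is the per-residue confidence. Both are returned as NumPy arrays, one per input. In the output below, N = 242 is the combined length of the two 121-residue chains, and the leading axis of length 1 matches diffusion_samples=1: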

[11]:
pae = future.get_pae()[0]
plddt = future.get_plddt()[0]

print("PAE matrix shape:", pae.shape)
print("pLDDT shape:", plddt.shape)
PAE matrix shape: (1, 242, 242)
pLDDT shape: (1, 242)
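
For a multi-chain complex it is often more informative to look at these arrays chain by chain. The sketch below computes the mean pLDDT of each chain and the mean PAE of an inter-chain block. It rests on two assumptions that the shapes above suggest but do not prove: the leading axis indexes diffusion samples, and residues are concatenated along the remaining axes in chain order. The helper names are hypothetical, and the arrays are synthetic stand-ins with the same shapes as the real output.

```python
import numpy as np


def chain_slices(chain_lengths):
    """Slices into the concatenated residue axis, one per chain, in chain order."""
    slices, start = [], 0
    for length in chain_lengths:
        slices.append(slice(start, start + length))
        start += length
    return slices


def per_chain_plddt(plddt, chain_lengths):
    """Mean pLDDT per chain for one sample; plddt has shape (N,)."""
    return [float(plddt[s].mean()) for s in chain_slices(chain_lengths)]


def mean_block_pae(pae, chain_lengths, i, j):
    """Mean PAE over rows of chain i and columns of chain j; pae has shape (N, N)."""
    s = chain_slices(chain_lengths)
    return float(pae[s[i], s[j]].mean())


# Synthetic data shaped like the output above: (samples, N, N) and (samples, N).
lengths = [121, 121]
n = sum(lengths)
pae = np.full((1, n, n), 5.0)
pae[0, :121, 121:] = 20.0  # pretend the inter-chain blocks are poorly constrained
pae[0, 121:, :121] = 20.0
plddt = np.concatenate([np.full(121, 0.4), np.full(121, 0.6)])[None, :]

print(per_chain_plddt(plddt[0], lengths))
print(mean_block_pae(pae[0], lengths, 0, 1))
```

A high inter-chain PAE alongside a low intra-chain PAE would suggest that each chain is placed confidently on its own while the relative arrangement of the chains is uncertain.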

A note on ESMFold (first generation)#

First-generation ESMFold remains available via session.fold.esmfold for quick single-chain, single-sequence structure predictions:

[12]:
# esm = session.fold.esmfold.fold([sequence.encode()], num_recycles=1)
# structure = esm.get()[0]

ESMFold (v1) does not support ligands, nucleic acids, multi-chain complexes, or MSA conditioning. Use ESMFold2 (or esmfold2-fast) whenever you need any of those capabilities; otherwise the first-generation model is a fast option for single-sequence monomers.

Next steps#

Save your structure for future use, or compare it against another predictor such as AlphaFold2, Boltz, or Protenix-v2:

[13]:
with open("esmfold2_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))