
Using AlphaFold2

This tutorial shows you how to use the AlphaFold2 model to create a predicted 3D structure of your protein sequence or complex of interest. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit Using ESMFold. If you have ligands or DNA/RNA of interest, please try Using Boltz instead.

What you need before getting started

Specify the sequence or complex whose structure you want to predict. This example uses 1SPD, entered below as a two-chain complex (chains A and B).

We specify a Complex, rather than a plain string, so that an MSA can be attached to provide AlphaFold2 with evolutionary context.

[1]:
import openprotein
from openprotein.molecules import Complex, Protein

# Login to your session
session = openprotein.connect()

# Specify your complex
complex = Complex({
    "A": Protein("XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"),
    "B": Protein("XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ")
})

# Alternatively, pass a ':'-delimited string directly when running in single-sequence mode (i.e. no MSA).
# complex = "XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ:XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
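
The ':'-delimited form shown in the comment above can be built from a list of chain sequences. The following is a minimal sketch in plain Python; `join_chains` is a hypothetical helper written for this tutorial, not part of the openprotein package.

```python
def join_chains(chains):
    """Join per-chain sequences into a single ':'-delimited multimer string.

    Hypothetical helper (not part of openprotein): strips whitespace,
    upper-cases each chain, and rejects empty or already-delimited chains.
    """
    cleaned = [chain.strip().upper() for chain in chains]
    if not cleaned or any(not chain for chain in cleaned):
        raise ValueError("every chain must be a non-empty sequence")
    if any(":" in chain for chain in cleaned):
        raise ValueError("chains must not already contain ':'")
    return ":".join(cleaned)


# Toy example with two short chains
print(join_chains(["ATKAVCVLKG", "ATKAVCVLKG"]))
```

The resulting string can be passed in place of the Complex when no MSA is needed.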

Getting the Model

Start by getting the AlphaFold2 model object:

[2]:
afmodel = session.fold.alphafold2
afmodel.fold?
Signature:
afmodel.fold(
    sequences: Union[Sequence[openprotein.molecules.complex.Complex | openprotein.molecules.protein.Protein | str], openprotein.align.msa.MSAFuture, NoneType] = None,
    num_recycles: int | None = None,
    num_models: int = 1,
    num_relax: int = 0,
    **kwargs,
) -> openprotein.fold.future.FoldResultFuture
Docstring:
Post sequences to alphafold model.

Parameters
----------
sequences : List[Complex | Protein | str] | MSAFuture
    List of protein sequences to include in folded output. `Protein` objects must be tagged with an `msa`, which can be a `Protein.single_sequence_mode` for single sequence mode. Alternatively, supply an `MSAFuture` to use all query sequences as a multimer.
num_recycles : int
    number of times to recycle models
num_models : int
    number of models to train - best model will be used
num_relax : int
    maximum number of iterations for relax

Returns
-------
job : Job
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/fold/alphafold2.py
Type:      method

You can review some of the metadata about the AlphaFold2 model.

[3]:
afmodel.metadata
[3]:
ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='AlphaFold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'X', 'O', 'U', 'B', 'Z', '-'], output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
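
The metadata above lists the accepted input tokens and a maximum sequence length, which makes a quick local check possible before submitting a job. The sketch below copies those values by hand; `check_sequence` is a hypothetical helper, and applying the length limit to the combined residues of all chains is an assumption rather than documented behavior.

```python
# Values copied from the afmodel.metadata output above.
ALLOWED_TOKENS = set("ARNDCQEGHILKMFPSTWYVXOUBZ-")
MAX_SEQUENCE_LENGTH = 2400


def check_sequence(sequence, allowed=ALLOWED_TOKENS, max_length=MAX_SEQUENCE_LENGTH):
    """Return a list of problems found; an empty list means no problems.

    Hypothetical helper: ':' is treated as the chain separator, and the
    length limit is assumed to apply to all chains combined.
    """
    problems = []
    chains = sequence.split(":")
    residues = "".join(chains)

    if any(len(chain) == 0 for chain in chains):
        problems.append("empty chain found")

    unsupported = sorted(set(residues) - allowed)
    if unsupported:
        problems.append(f"unsupported tokens: {unsupported}")

    if len(residues) > max_length:
        problems.append(f"length {len(residues)} exceeds limit of {max_length}")

    return problems


print(check_sequence("ATKAVCVLKG:ATKAVCVLKG"))
```

In a live session the same values could be read from `afmodel.metadata` instead of being hard-coded.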

Predicting the Complex Structure

Call the fold method of the AlphaFold2 model with our complex. It returns a job that we can wait on. We also set num_models to 3.

[5]:
af2_fold = afmodel.fold(sequences=[complex], num_models=3)

af2_fold
[5]:
FoldJob(num_records=1, job_id='240ec08e-7c47-4ccd-a12f-7ea969b0cd9b', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 16, 17, 14, 41, 8933, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[6]:
af2_fold.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|█████████████████████████████████████████████████| 100/100 [05:39<00:00,  3.40s/it, status=SUCCESS]
[6]:
True

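wait_until_done() blocks until the job completes or the timeout is reached. Here it returned True, with the job status reported as SUCCESS. The results can then be fetched with get().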
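
Conceptually, a blocking wait with a timeout is a polling loop. The sketch below illustrates that general pattern only and is not the library's implementation: `wait_for` and `get_status` are hypothetical names, and the "FAILURE" status is an assumption (only PENDING and SUCCESS appear in the output above).

```python
import time


def wait_for(get_status, timeout=900.0, interval=5.0,
             done=("SUCCESS",), failed=("FAILURE",)):
    """Poll get_status() until it reports a terminal state or time runs out.

    Hypothetical helper: get_status is any zero-argument callable that
    returns the current status as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return True
        if status in failed:
            raise RuntimeError(f"job ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        time.sleep(interval)


# Toy usage with a fake job that succeeds on the third poll
fake_statuses = iter(["PENDING", "PENDING", "SUCCESS"])
print(wait_for(lambda: next(fake_statuses), timeout=1.0, interval=0.01))
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments.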

Retrieving the Results

Getting the Structure

The primary result is the Structure, which contains the parsed molecular structure from the AlphaFold2 inference. A Structure can hold multiple Complexes, each of which can hold multiple different chains, including Proteins, which in turn hold the predicted 3D coordinates of their atoms.

The number of Complexes in the resulting Structure depends on the num_models parameter in the request, and since we set it to 3, we can expect 3 predicted Complexes.

The result is a list because the API supports submitting multiple Complexes for prediction. Each entry corresponds to one submitted input, in the order submitted.

[7]:
result = af2_fold.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
Predicted structures: [<openprotein.molecules.structure.Structure object at 0x7fbcd39b3d40>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x7fbd3c1dfda0>
Predicted protein A:
 0     SEQUENCE ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA

60    SEQUENCE GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH

120   SEQUENCE EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Predicted protein B:
 0     SEQUENCE ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA

60    SEQUENCE GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH

120   SEQUENCE EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

Visualize the structure using molviewspec:

[8]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))

Note that because we set num_models to 3, we have 3 models to inspect in our visualization.

Getting the Prediction Metrics

AlphaFold2 supports retrieving the following metrics from the predictions:

  • pLDDT (predicted Local Distance Difference Test)
    A per-residue confidence score, commonly scaled from 0–100 (or 0.0–1.0), indicating how reliably each residue’s coordinate position is predicted. The scores returned in this tutorial are on the 0–100 scale.
  • PAE (Predicted Aligned Error)
    An N × N matrix estimating the expected error between pairs of residues, useful for assessing relative positions (e.g., domains or chains).
  • pTM (predicted TM-score)
    A global confidence metric scaled from 0–1 that estimates the overall quality of the predicted structure. Values above 0.5 generally indicate a confident prediction with the correct overall fold, while higher values (>0.8) suggest high confidence in the structure’s accuracy.
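
To make per-residue pLDDT values easier to read, they can be grouped into the confidence bands commonly used for AlphaFold predictions (above 90 very high, 70–90 confident, 50–70 low, below 50 very low). The sketch below assumes scores on the 0–100 scale. `plddt_band` and `summarize_plddt` are hypothetical helpers, and treating the boundary values 90, 70 and 50 as belonging to the higher band is a choice made here.

```python
def plddt_band(score):
    """Map one pLDDT score (0-100 scale) to a confidence band."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def summarize_plddt(scores):
    """Count how many residues fall into each confidence band."""
    counts = {"very high": 0, "confident": 0, "low": 0, "very low": 0}
    for score in scores:
        counts[plddt_band(float(score))] += 1
    return counts


# Toy scores for illustration only (not taken from a real prediction)
print(summarize_plddt([95.2, 88.0, 61.5, 42.3, 91.0]))
```

The function accepts any iterable of numbers, so a single row of a per-model pLDDT array can be passed directly.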

Note that the first dimension of each returned array is 3, one entry per model, because we set num_models to 3.

[9]:
# Retrieve the pLDDT scores
plddt_scores = af2_fold.get_plddt()[0] # note that we are indexing into the first one
print("pLDDT scores shape:", plddt_scores.shape)
print("First 10 scores:", plddt_scores[0, :10])

# Retrieve the PAE matrix
pae_matrix = af2_fold.get_pae()[0]
print("\nPAE matrix shape:", pae_matrix.shape)

# Retrieve the pTM scores (one scalar per model, not a matrix)
ptm_matrix = af2_fold.get_ptm()[0]
print("\npTM matrix shape:", ptm_matrix.shape)
pLDDT scores shape: (3, 308)
First 10 scores: [71.75 90.5  97.06 98.56 98.88 98.94 98.88 98.81 98.81 97.31]

PAE matrix shape: (3, 308, 308)

pTM matrix shape: (3,)
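
With one pTM score and one PAE matrix per model, a natural follow-up is to pick the model with the highest pTM and check how confidently the two chains are placed relative to each other. The sketch below is plain Python that also works on array-like inputs. `best_model_index` and `mean_interchain_pae` are hypothetical helpers, and the PAE helper assumes a two-chain complex whose first `len_a` rows and columns belong to the first chain.

```python
def best_model_index(ptm_scores):
    """Return the index of the model with the highest pTM score."""
    return max(range(len(ptm_scores)), key=lambda i: float(ptm_scores[i]))


def mean_interchain_pae(pae, len_a):
    """Average the PAE over residue pairs that lie in different chains.

    pae is an N x N matrix for a single model; residues [0, len_a) are
    assumed to belong to the first chain and the rest to the second.
    """
    n = len(pae)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if (i < len_a) != (j < len_a):  # exactly one residue in chain A
                total += float(pae[i][j])
                count += 1
    if count == 0:
        raise ValueError("len_a must split the matrix into two non-empty chains")
    return total / count


# Usage with the arrays retrieved above (requires a completed job):
# best = best_model_index(ptm_matrix)
# print(mean_interchain_pae(pae_matrix[best], len_a=154))
```

The value 154 follows from the 308 residues in the metric arrays split evenly between two identical chains, assuming the arrays follow the chain order A then B. Lower inter-chain PAE indicates higher confidence in the relative placement of the chains.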

Next steps

Save your predicted structure to a CIF file as follows, then try another structure predictor such as Boltz-1 or Boltz-2:

[10]:
with open("alphafold2_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))