Using AlphaFold2#
This tutorial shows you how to use the AlphaFold2 model to create a predicted 3D structure of your protein sequence or complex of interest. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit Using ESMFold. If you have ligands or DNA/RNA of interest, please try Using Boltz instead.
What you need before getting started#
Specify a sequence or complex of interest whose structure you want to predict. This example uses 1SPD.
We specify a Complex so that we can attach an MSA, which gives AlphaFold2 its evolutionary context.
[1]:
import openprotein
from openprotein.molecules import Complex, Protein
# Login to your session
session = openprotein.connect()
# Specify your complex
complex = Complex({
"A": Protein("XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"),
"B": Protein("XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ")
})
# A ':'-delimited string also works when running in single sequence mode, i.e. without an MSA.
# complex = "XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ:XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
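The ':'-delimited form mentioned in the comment above can be built from per-chain strings. The following is a minimal sketch using plain Python strings; `join_chains` and `split_chains` are hypothetical helpers introduced here for illustration and are not part of the openprotein client. The sequences are shortened placeholders.

```python
# Hypothetical helpers (not part of openprotein) for the ':'-delimited multimer format.

def join_chains(chains):
    """Join per-chain sequences into one ':'-delimited string, in dict order."""
    for chain_id, seq in chains.items():
        if not seq:
            raise ValueError(f"chain {chain_id!r} is empty")
        if ":" in seq:
            raise ValueError(f"chain {chain_id!r} already contains ':'")
    return ":".join(chains.values())


def split_chains(multimer):
    """Split a ':'-delimited string back into a list of chain sequences."""
    return multimer.split(":")


# Shortened placeholder sequences; substitute the full chains from the cell above.
chains = {"A": "XATKAVCVLKGDGPVQ", "B": "XATKAVCVLKGDGPVQ"}
multimer = join_chains(chains)
print(multimer)
print(split_chains(multimer))
```

Keeping the chains in a dict until the last moment makes it harder to lose track of which sequence belongs to which chain ID.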
Getting the Model#
Start by getting the AlphaFold2 model object:
[2]:
afmodel = session.fold.alphafold2
afmodel.fold?
Signature:
afmodel.fold(
sequences: Union[Sequence[openprotein.molecules.complex.Complex | openprotein.molecules.protein.Protein | str], openprotein.align.msa.MSAFuture, NoneType] = None,
num_recycles: int | None = None,
num_models: int = 1,
num_relax: int = 0,
**kwargs,
) -> openprotein.fold.future.FoldResultFuture
Docstring:
Post sequences to alphafold model.
Parameters
----------
sequences : List[Complex | Protein | str] | MSAFuture
List of protein sequences to include in folded output. `Protein` objects must be tagged with an `msa`, which can be a `Protein.single_sequence_mode` for single sequence mode. Alternatively, supply an `MSAFuture` to use all query sequences as a multimer.
num_recycles : int
number of times to recycle models
num_models : int
number of models to train - best model will be used
num_relax : int
maximum number of iterations for relax
Returns
-------
job : Job
File: ~/Projects/openprotein/openprotein-python-private/openprotein/fold/alphafold2.py
Type: method
You can review some of the metadata about the AlphaFold2 model.
[3]:
afmodel.metadata
[3]:
ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='AlphaFold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'X', 'O', 'U', 'B', 'Z', '-'], output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
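The metadata can be used to sanity-check a sequence before submitting it. The sketch below hard-codes the `input_tokens` and `max_sequence_length` values shown in the output above; `check_sequence` is a hypothetical helper, and it assumes that the length limit applies to the combined length of all chains, which the metadata does not state explicitly.

```python
# Values copied from the metadata output above; in a live session they could be
# read from afmodel.metadata instead of being hard-coded.
INPUT_TOKENS = set("ARNDCQEGHILKMFPSTWYVXOUBZ-")
MAX_SEQUENCE_LENGTH = 2400


def check_sequence(multimer):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper. ':' is treated as the chain separator and is therefore
    not checked against INPUT_TOKENS.
    """
    problems = []
    chains = multimer.split(":")
    for index, chain in enumerate(chains):
        if not chain:
            problems.append(f"chain {index}: empty")
        unknown = sorted(set(chain) - INPUT_TOKENS)
        if unknown:
            problems.append(f"chain {index}: unsupported tokens {unknown}")
    # Assumption: the limit applies to the total length across chains.
    total = sum(len(chain) for chain in chains)
    if total > MAX_SEQUENCE_LENGTH:
        problems.append(f"total length {total} exceeds {MAX_SEQUENCE_LENGTH}")
    return problems


print(check_sequence("XATKAVCVLKGD:XATKAVCVLKGD"))
print(check_sequence("XATKAVJ1"))
```

A local check like this only catches obvious input errors early; the service remains the authority on what it accepts.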
Creating an MSA using Homology Search#
When using AlphaFold2 with protein sequences, we need to supply an MSA to help inform the model, or else explicitly request single sequence mode. Either way, each protein's msa attribute must be set, either to an MSA or to Protein.single_sequence_mode. Here we create the MSA using the platform's homology search.
Use our complex as the seed sequence to create an MSA:
[4]:
msa_query = []
for p in complex.get_proteins().values():
    msa_query.append(p.sequence)
msa = session.align.create_msa(seed=b":".join(msa_query))
for p in complex.get_proteins().values():
    p.msa = msa
    # If desired, use single sequence mode to specify no msa
    # p.msa = Protein.single_sequence_mode
msa
[4]:
MSAJob(job_id='7b5e5586-245d-4019-a30f-c8eea90882b4', job_type=<JobType.align_align: '/align/align'>, status=<JobStatus.SUCCESS: 'SUCCESS'>, created_date=datetime.datetime(2026, 1, 16, 17, 13, 7, 523305, tzinfo=TzInfo(0)), start_date=None, end_date=datetime.datetime(2026, 1, 16, 17, 13, 7, 523396, tzinfo=TzInfo(0)), prerequisite_job_id=None, progress_message=None, progress_counter=None, sequence_length=None)
We can either wait for the MSA job to complete, or schedule the fold job right away, in which case it runs automatically once the MSA is done.
Predicting the Complex Structure#
Call the AlphaFold2 fold method with our complex. It returns a job that we can wait on. We also set num_models to 3.
[5]:
af2_fold = afmodel.fold(sequences=[complex], num_models=3)
af2_fold
[5]:
FoldJob(num_records=1, job_id='240ec08e-7c47-4ccd-a12f-7ea969b0cd9b', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 16, 17, 14, 41, 8933, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[6]:
af2_fold.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|█████████████████████████████████████████████████| 100/100 [05:39<00:00, 3.40s/it, status=SUCCESS]
[6]:
True
Alternatively, wait() waits for the job to complete and fetches the results in a single call.
Retrieving the Results#
Getting the Structure#
The primary result is the Structure, which contains the parsed molecular structure from the AlphaFold2 inference. A Structure can hold multiple Complexes, each of which can hold multiple different chains, including Proteins, which in turn hold the predicted 3D coordinates of their atoms.
The number of Complexes in the resulting Structure depends on the num_models parameter in the request, and since we set it to 3, we can expect 3 predicted Complexes.
The result is a list because the API supports submitting multiple Complexes for prediction. Results are returned in the same order as the inputs were submitted.
[7]:
result = af2_fold.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
Predicted structures: [<openprotein.molecules.structure.Structure object at 0x7fbcd39b3d40>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x7fbd3c1dfda0>
Predicted protein A:
0 SEQUENCE ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA
60 SEQUENCE GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH
120 SEQUENCE EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Predicted protein B:
0 SEQUENCE ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA
60 SEQUENCE GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH
120 SEQUENCE EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
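Because results come back in submission order, they can be paired with your own labels by position. The sketch below uses plain lists as stand-ins for the real objects: each inner list plays the role of one Structure holding num_models predicted complexes. `pair_inputs_with_results` and the label text are hypothetical.

```python
def pair_inputs_with_results(labels, results):
    """Map each submitted label to the result at the same position."""
    if len(labels) != len(results):
        raise ValueError("expected exactly one result per submitted input")
    return dict(zip(labels, results))


# Placeholder data: one submitted complex, three predicted models.
labels = ["1SPD dimer"]
results = [["model 0", "model 1", "model 2"]]

by_label = pair_inputs_with_results(labels, results)
first_model = by_label["1SPD dimer"][0]  # same as results[0][0]
print(first_model)
```

With the real objects, the equivalent lookup is the `result[0][0]` indexing used in the cell above: the first index selects the submitted input and the second selects the model.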
Visualize the structure using molviewspec:
[8]:
%pip install molviewspec
from molviewspec import create_builder
def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                           )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)
display_structure(structure.to_string(format="cif"))
Note that because we set num_models to 3, we have 3 models to inspect in our visualization.
Getting the Prediction Metrics#
AlphaFold-2 supports retrieving the following metrics from the predictions:
- pLDDT (predicted Local Distance Difference Test): A per-residue confidence score, commonly scaled from 0–100 (or 0.0–1.0), indicating how reliably each residue's coordinate position is predicted.
- PAE (Predicted Aligned Error): An N × N matrix estimating the expected error between pairs of residues, useful for assessing relative positions (e.g., domains or chains).
- pTM (predicted TM-score): A global confidence metric scaled from 0–1 that estimates the overall quality of the predicted structure. Values above 0.5 generally indicate a confident prediction with the correct overall fold, while higher values (>0.8) suggest high confidence in the structure's accuracy.
Note that the first dimension of each returned array is 3 because we set num_models to 3.
[9]:
# Retrieve the pLDDT scores
plddt_scores = af2_fold.get_plddt()[0] # note that we are indexing into the first one
print("pLDDT scores shape:", plddt_scores.shape)
print("First 10 scores:", plddt_scores[0, :10])
# Retrieve the PAE matrix
pae_matrix = af2_fold.get_pae()[0]
print("\nPAE matrix shape:", pae_matrix.shape)
# Retrieve the pTM scores (one scalar per predicted model)
ptm_matrix = af2_fold.get_ptm()[0]
print("\npTM matrix shape:", ptm_matrix.shape)
pLDDT scores shape: (3, 308)
First 10 scores: [71.75 90.5 97.06 98.56 98.88 98.94 98.88 98.81 98.81 97.31]
PAE matrix shape: (3, 308, 308)
pTM matrix shape: (3,)
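One way to put these metrics to use is sketched below, under stated assumptions. The helper names are hypothetical and not part of the openprotein client. The pLDDT thresholds follow the confidence bands commonly used when colouring AlphaFold models on the 0–100 scale. The pTM values and the 4 × 4 PAE matrix are made-up placeholders; only the pLDDT values are taken from the output above.

```python
from collections import Counter


def plddt_band(score):
    """Classify a 0-100 pLDDT score into a commonly used confidence band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def best_model_index(ptm_scores):
    """Index of the model with the highest pTM."""
    return max(range(len(ptm_scores)), key=lambda i: ptm_scores[i])


def mean_interchain_pae(pae, len_a):
    """Mean PAE over residue pairs spanning two chains (both off-diagonal blocks).

    Assumes chain A occupies the first len_a rows/columns of a square matrix.
    """
    n = len(pae)
    values = [pae[i][j] for i in range(n) for j in range(n)
              if (i < len_a) != (j < len_a)]
    return sum(values) / len(values)


# First 10 pLDDT scores of the first model, copied from the output above.
first_scores = [71.75, 90.5, 97.06, 98.56, 98.88, 98.94, 98.88, 98.81, 98.81, 97.31]
bands = Counter(plddt_band(s) for s in first_scores)
print(bands)

# Placeholder pTM values for three models (not real output).
best = best_model_index([0.81, 0.88, 0.79])
print("best model:", best)

# Toy PAE matrix for two chains of two residues each (not real output).
toy_pae = [[0, 1, 8, 9],
           [1, 0, 7, 8],
           [9, 8, 0, 1],
           [8, 7, 1, 0]]
interface_pae = mean_interchain_pae(toy_pae, len_a=2)
print("mean inter-chain PAE:", interface_pae)
```

For the dimer in this tutorial, `len_a` would be 154, i.e. half of the 308 residues, and the same functions should work on the per-model slices, e.g. `plddt_scores[0]` or `pae_matrix[0]`. A low inter-chain PAE suggests the relative placement of the two chains is predicted with confidence.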
Next steps#
Try another structure predictor like Boltz-1 or Boltz-2. You can save your predicted structure like so:
[10]:
with open("alphafold2_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))