Using Protenix-v2#
This tutorial demonstrates how to use the Protenix-v2 model on the OpenProtein platform to predict the structure of a biomolecular complex that includes proteins, ligands, DNA, and RNA. Protenix-v2 is an enhanced-capacity (464M-parameter) successor to Protenix, with improved antibody-antigen structure prediction and ligand plausibility. Like AlphaFold3, it performs best when each protein chain is paired with a multiple sequence alignment (MSA). We will walk through assembling a complex, building the MSA, submitting the fold, and retrieving the predicted structure together with Protenix-v2’s confidence metrics.
The full API for the model is documented at ProtenixV2Model.
What you need before getting started#
First, ensure you have an active OpenProtein session. Then, import the classes used to define the components of your complex.
[1]:
import openprotein
from openprotein.molecules import Complex, Protein, Ligand
# Login to your session
session = openprotein.connect()
Defining the Molecules#
Protenix-v2 can model proteins, ligands, DNA, and RNA. For this example we will predict the structure of a homodimer in complex with a small molecule ligand by assembling a Complex from Protein and Ligand chains keyed by chain id.
[2]:
# Define the biomolecular complex to predict.
# Start with the protein in a homodimer.
protein = Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ")
# You can also specify the protein to be cyclic by setting the property
# protein.cyclic = True
# Define the ligand in our complex.
ligand = Ligand(ccd="SAH")
# Assemble the complex.
complex = Complex({
    "A": protein,
    "B": protein,
    "C": ligand,
})
Create an MSA for the Protein using Homology Search#
Protenix-v2 is an AlphaFold3-style model and expects each protein chain to carry a multiple sequence alignment (MSA). You must either set protein.msa to an MSA built on the platform, or explicitly opt out by setting protein.msa = Protein.single_sequence_mode to run in single-sequence mode. Submitting a Protenix-v2 request without an MSA on one of the proteins will raise an error.
Here we build the MSA using the platform’s homology search via session.align.create_msa. Note the syntax: when seeding an MSA for a complex we follow ColabFold’s convention of joining the chain sequences with :, which lets the MSA service jointly search the multimer.
[3]:
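# Collect the sequence of every protein chain in the complex.
msa_query = []
for p in complex.get_proteins().values():
    msa_query.append(p.sequence)
# Join the chain sequences with ":" and run the homology search.
msa = session.align.create_msa(seed=b":".join(msa_query))
# Attach the resulting MSA to each protein chain.
for p in complex.get_proteins().values():
    p.msa = msa
    # If desired, use single sequence mode to specify no msa
    # p.msa = Protein.single_sequence_mode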
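To make the seeding convention concrete, here is a minimal standalone sketch of joining chain sequences with `:`. The helper name `build_msa_seed` is hypothetical and not part of the openprotein package. Accepting both `str` and `bytes` is an assumption made for illustration, since the cell above joins the sequences with a bytes separator.

```python
from typing import Iterable, Union

SeqLike = Union[str, bytes]


def build_msa_seed(sequences: Iterable[SeqLike]) -> bytes:
    """Join chain sequences with ':' (ColabFold-style multimer seed).

    Hypothetical helper for illustration; not part of the openprotein package.
    """
    encoded = []
    for seq in sequences:
        # Accept either str or bytes and normalise to bytes.
        raw = seq.encode("ascii") if isinstance(seq, str) else bytes(seq)
        if not raw:
            raise ValueError("empty sequence in MSA seed")
        if b":" in raw:
            raise ValueError("sequences must not contain ':'")
        encoded.append(raw)
    if not encoded:
        raise ValueError("at least one protein sequence is required")
    return b":".join(encoded)


# A homodimer contributes the same sequence twice, once per chain.
seed = build_msa_seed(["MVTPEG", b"MVTPEG"])
print(seed)
```

Validating the inputs before joining catches an empty chain or a stray separator locally, before anything is submitted to the homology search.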
Predicting the Complex Structure#
Now we can call the fold() method on the Protenix-v2 model.
The key steps are:
- Access the model via session.fold.protenix_v2.
- Pass the defined complex.
- Optionally tune the diffusion sampler with diffusion_samples, num_recycles, and num_steps.
[4]:
# Request the fold.
fold_job = session.fold.protenix_v2.fold(
    sequences=[complex],  # list for batch requests
    diffusion_samples=1,  # number of diffusion samples per input
    num_recycles=10,      # number of recycling steps
    num_steps=200,        # number of sampling steps
)
fold_job
[4]:
FoldJob(num_records=1, job_id='5fcddd1e-8287-45c6-9d18-9c2680834ab2', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 6, 4, 23, 1, 46, 73912, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None, failure_message=None)
The call returns a FoldResultFuture object immediately. This is a reference to your job running on the OpenProtein platform: you can monitor its status, or block until completion with wait_until_done().
[5]:
# Wait for the job to finish.
fold_job.wait_until_done(verbose=True)
Waiting: 100%|██████████| 100/100 [07:53<00:00, 4.74s/it, status=SUCCESS]
[5]:
True
Retrieving the Results#
Once the job is complete, you can retrieve the various outputs from the future object.
Getting the Structure#
The primary result is a Structure, returned by get(). A Structure can hold multiple Complexes (one per diffusion sample), each holding the predicted chains, including Protein chains with their per-atom 3D coordinates.
The number of Complexes in the resulting Structure matches the diffusion_samples argument from the request.
The result list itself has one entry per submitted complex, since the fold API supports batched submissions.
[6]:
result = fold_job.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
print("Predicted ligand C:\n", predicted_complex.get_ligand("C"))
Predicted structures: [<openprotein.molecules.structure.Structure object at 0x10bd06ea0>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x10bac73b0>
Predicted protein A:
0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA
60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL
120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT
180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL
240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD
300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI
360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted protein B:
0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA
60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL
120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT
180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL
240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD
300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI
360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted ligand C:
Ligand(ccd='SAH', smiles=None, _structure_block=<openprotein.utils.cif.StructureCIFBlock object at 0x10bd30410>)
Visualize the structure using molviewspec.
[7]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    # Build a Mol* view that loads the CIF text supplied via the `data` mapping below.
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical",  # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                           )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))
Getting Confidence Metrics#
Protenix-v2 returns a structured confidence object per diffusion sample rather than per-residue matrices. Each entry is a ProtenixConfidence that aggregates AlphaFold3-style scores at the complex, chain, and chain-pair level:
- ranking_score: composite ranking metric used to order diffusion samples (0.8 * iptm + 0.2 * ptm - 100 * has_clash).
- ptm / iptm: predicted TM-score for the full complex and the inter-chain interface pTM.
- plddt: mean per-atom pLDDT in the range [0, 100].
- gpde: global PDE weighted by contact probabilities.
- has_clash: binary clash flag (1.0 when atomic clashes were detected).
- num_recycles: number of recycling iterations used.
- chain_ptm, chain_iptm, chain_plddt, chain_gpde: per-chain variants of the above metrics, one entry per chain.
- chain_pair_iptm, chain_pair_iptm_global, chain_pair_gpde: chain-pair matrices for evaluating individual interfaces.
Note that in the example output in this tutorial, the chain_plddt values fall in [0, 1] while the top-level plddt is on the [0, 100] scale, so rescale one of them before comparing the two.
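The ranking_score formula listed above can be checked by hand. The sketch below recomputes it from the ptm, iptm, and has_clash values that appear in this tutorial's example confidence output. The function is a plain-Python illustration and not a call into the openprotein package. The tolerance is an assumption that allows for reduced-precision arithmetic in the reported value.

```python
import math


def ranking_score(iptm: float, ptm: float, has_clash: float) -> float:
    """Composite score: 0.8 * iptm + 0.2 * ptm - 100 * has_clash."""
    return 0.8 * iptm + 0.2 * ptm - 100.0 * has_clash


# Values copied from the example confidence output in this tutorial.
iptm = 0.9413124918937683
ptm = 0.9303953647613525
reported = 0.939129114151001

score = ranking_score(iptm=iptm, ptm=ptm, has_clash=0.0)
print("recomputed:", score)
print("reported:  ", reported)

# Assumed tolerance: the reported value may come from lower-precision floats.
print("match:", math.isclose(score, reported, abs_tol=1e-6))

# A detected clash dominates the score and pushes the sample far below
# any clash-free sample, whatever its ptm and iptm.
clashing = ranking_score(iptm=iptm, ptm=ptm, has_clash=1.0)
print("with clash:", clashing)
```

Because the clash penalty is two orders of magnitude larger than the other terms, sorting samples by ranking_score places every clash-free sample ahead of every clashing one.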
Use get_confidence() to fetch the confidences. The outer list is indexed by submitted complex; the inner list is indexed by diffusion sample (controlled with the diffusion_samples argument to fold).
[8]:
import json
confidence = fold_job.get_confidence()[0] # first submitted complex
sample = confidence[0] # first diffusion sample
print("ranking_score:", sample.ranking_score)
print("ptm:", sample.ptm, "iptm:", sample.iptm)
print("plddt:", sample.plddt, "gpde:", sample.gpde)
print("has_clash:", sample.has_clash)
print("\nPer-chain pLDDT:", sample.chain_plddt)
print("Per-chain pTM: ", sample.chain_ptm)
print("Per-chain ipTM: ", sample.chain_iptm)
print("\nFull confidence record:")
print(json.dumps(sample.model_dump(), indent=2))
ranking_score: 0.939129114151001
ptm: 0.9303953647613525 iptm: 0.9413124918937683
plddt: 91.09008026123047 gpde: 0.4112328588962555
has_clash: 0.0
Per-chain pLDDT: [0.9115279912948608, 0.9098753929138184, 0.9562626481056213]
Per-chain pTM: [0.9194399118423462, 0.9173002243041992, 0.8558129668235779]
Per-chain ipTM: [0.9644049406051636, 0.9504706859588623, 0.9766792058944702]
Full confidence record:
{
"ranking_score": 0.939129114151001,
"ptm": 0.9303953647613525,
"iptm": 0.9413124918937683,
"plddt": 91.09008026123047,
"gpde": 0.4112328588962555,
"has_clash": 0.0,
"num_recycles": 10,
"disorder": 0.0,
"chain_ptm": [
0.9194399118423462,
0.9173002243041992,
0.8558129668235779
],
"chain_iptm": [
0.9644049406051636,
0.9504706859588623,
0.9766792058944702
],
"chain_plddt": [
0.9115279912948608,
0.9098753929138184,
0.9562626481056213
],
"chain_gpde": [
0.3905996084213257,
0.39729487895965576,
0.31210482120513916
],
"chain_pair_iptm": [
[
0.0,
0.9381964802742004,
0.9906134009361267
],
[
0.9381964802742004,
0.0,
0.962744951248169
],
[
0.9906134009361267,
0.962744951248169,
0.0
]
],
"chain_pair_iptm_global": [
[
0.0,
0.9574378132820129,
0.9766792058944702
],
[
0.9574378132820129,
0.0,
0.9766792058944702
],
[
0.9766792058944702,
0.9766792058944702,
0.0
]
],
"chain_pair_gpde": [
[
0.0,
0.6037024259567261,
0.5214035511016846
],
[
0.6037024259567261,
0.0,
1.0281025171279907
],
[
0.5214035511016846,
1.0281025171279907,
0.0
]
],
"chain_pair_plddt": [
[
0.0,
0.9107017517089844,
0.9119172096252441
],
[
0.9107017517089844,
0.0,
0.9102789759635925
],
[
0.9119172096252441,
0.9102789759635925,
0.0
]
]
}
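The chain-pair matrices are easiest to read as a ranked list of interfaces. The following sketch takes the chain_pair_iptm matrix from the output above and sorts the chain pairs from weakest to strongest. It assumes the matrix rows and columns follow the order in which the chains were defined (A, B, C), which this tutorial does not state explicitly. The helper `rank_interfaces` is hypothetical.

```python
from itertools import combinations

# chain_pair_iptm copied from the example output above.
chain_pair_iptm = [
    [0.0, 0.9381964802742004, 0.9906134009361267],
    [0.9381964802742004, 0.0, 0.962744951248169],
    [0.9906134009361267, 0.962744951248169, 0.0],
]
# Assumption: rows and columns follow the order in which chains were defined.
chain_ids = ["A", "B", "C"]


def rank_interfaces(matrix, ids):
    """Return (chain_i, chain_j, score) tuples sorted from weakest to strongest."""
    if len(matrix) != len(ids) or any(len(row) != len(ids) for row in matrix):
        raise ValueError("matrix must be square and match the number of chain ids")
    # The matrix is symmetric, so only the upper triangle is needed.
    pairs = [
        (ids[i], ids[j], matrix[i][j])
        for i, j in combinations(range(len(ids)), 2)
    ]
    return sorted(pairs, key=lambda pair: pair[2])


interfaces = rank_interfaces(chain_pair_iptm, chain_ids)
for chain_a, chain_b, score in interfaces:
    print(f"{chain_a}-{chain_b}: {score:.3f}")
```

Under that ordering assumption, the protein-protein interface (A-B) has the lowest value in this example and the two protein-ligand pairs score higher.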
Using the original Protenix#
Protenix-v2 is the enhanced-capacity successor and is recommended for most use cases. The first-generation Protenix model (ProtenixModel, model id protenix) remains available and follows the same workflow shown above — just swap the accessor to session.fold.protenix:
fold_job = session.fold.protenix.fold(sequences=[complex])
Use the original Protenix when you need to reproduce earlier results.
Next Steps#
You can examine the predicted structure, or explore the other structure prediction models on our platform such as Boltz-2, AlphaFold2, or RoseTTAFold3. To save the predicted structure to disk:
[9]:
with open("protenix_v2_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))