Using Protenix
This tutorial demonstrates how to use the Protenix model on the OpenProtein platform to predict the structure of a biomolecular complex that includes proteins, ligands, DNA, and RNA. Protenix is an AlphaFold3-style model and, like AlphaFold3, performs best when each protein chain is paired with a multiple sequence alignment (MSA). We will walk through assembling a complex, building the MSA, submitting the fold, and retrieving the predicted structure together with Protenix’s confidence metrics.
The full API for the model is documented at ProtenixModel.
What you need before getting started
First, ensure you have an active OpenProtein session. Then, import the classes used to define the components of your complex.
[1]:
import openprotein
from openprotein.molecules import Complex, Protein, Ligand
# Login to your session
session = openprotein.connect()
Defining the Molecules
Protenix can model proteins, ligands, DNA, and RNA. For this example we will predict the structure of a homodimer in complex with a small molecule ligand by assembling a Complex from Protein and Ligand chains keyed by chain id.
[2]:
# Define the biomolecular complex to predict.
# Start with the protein in a homodimer.
protein = Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ")
# You can also specify the protein to be cyclic by setting the property
# protein.cyclic = True
# Define the ligand in our complex.
ligand = Ligand(ccd="SAH")
# Assemble the complex. Group chain ids that share the same entity into a
# tuple — this serializes the homodimer as a single protein entity with
# ids ["A", "B"] and only requires one MSA on the entity.
complex = Complex({
    ("A", "B"): protein,
    "C": ligand,
})
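The chain-id keying can be pictured with plain Python. The sketch below only illustrates the convention of grouping chain ids into a tuple; it is not the library's implementation. `expand_chain_keys` is a hypothetical helper, and strings stand in for the Protein and Ligand objects.

```python
def expand_chain_keys(entities):
    """Map every chain id to its entity, expanding tuple keys.

    Hypothetical helper for illustration only; not part of openprotein.
    """
    chains = {}
    for key, entity in entities.items():
        # A tuple key lists several chain ids that share one entity.
        chain_ids = key if isinstance(key, tuple) else (key,)
        for chain_id in chain_ids:
            if chain_id in chains:
                raise ValueError(f"duplicate chain id: {chain_id}")
            chains[chain_id] = entity
    return chains


# Stand-in strings take the place of the Protein and Ligand objects.
chains = expand_chain_keys({("A", "B"): "protein", "C": "ligand"})
print(chains)
```

Chains A and B end up pointing at the same entity, which is why a single MSA set on that entity covers both copies of the homodimer.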
Create an MSA for the Protein using Homology Search
Because Protenix expects each protein chain to carry a multiple sequence alignment (MSA), you must either set protein.msa to an MSA built on the platform or explicitly opt out by setting protein.msa = Protein.single_sequence_mode, which runs that chain in single-sequence mode. Submitting a Protenix request in which any protein has neither will raise an error.
Here we build the MSA using the platform’s homology search via session.align.create_msa. Note the syntax: when seeding an MSA for a complex we follow ColabFold’s convention of joining the chain sequences with :, which lets the MSA service jointly search the multimer.
[3]:
msa_query = []
for p in complex.get_proteins().values():
    msa_query.append(p.sequence)
msa = session.align.create_msa(seed=b":".join(msa_query))
for p in complex.get_proteins().values():
    p.msa = msa
    # If desired, use single sequence mode to specify no msa
    # p.msa = Protein.single_sequence_mode
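To make the colon-joined seed concrete, here is a minimal standalone sketch that uses short stand-in sequences instead of the real chains. It assumes the sequences are bytes, as the `b":"` join in the cell above implies. `build_seed` is a hypothetical helper that also accepts `str` for convenience.

```python
def build_seed(sequences):
    """Join chain sequences with ':' into one bytes seed.

    Hypothetical helper; str inputs are encoded so the join never mixes types.
    """
    parts = [s.encode() if isinstance(s, str) else s for s in sequences]
    return b":".join(parts)


# Stand-in sequences for a homodimer: both chains share one sequence.
chain_sequences = {"A": b"MVTPEG", "B": b"MVTPEG"}
seed = build_seed(chain_sequences.values())
print(seed)
```

For a homodimer the seed therefore contains the same sequence twice, separated by a single colon.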
Predicting the Complex Structure
Now we can call the fold() method on the Protenix model.
The key steps are:
- Access the model via session.fold.protenix.
- Pass the defined complex.
- Optionally tune the diffusion sampler with diffusion_samples, num_recycles, and num_steps.
[4]:
# Request the fold.
fold_job = session.fold.protenix.fold(
sequences=[complex], # list for batch requests
diffusion_samples=1, # number of diffusion samples per input
num_recycles=10, # number of recycling steps
num_steps=200, # number of sampling steps
)
fold_job
[4]:
FoldJob(num_records=1, job_id='6e51ea51-1b35-4a65-85e0-b60956900007', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 5, 7, 18, 4, 30, 769318, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None, failure_message=None)
The call returns a FoldResultFuture object immediately. This is a reference to your job running on the OpenProtein platform: you can monitor its status, or block until completion with wait_until_done().
[5]:
# Wait for the job to finish.
fold_job.wait_until_done(verbose=True)
Waiting: 100%|██████████| 100/100 [04:32<00:00, 2.72s/it, status=SUCCESS]
[5]:
True
Retrieving the Results
Once the job is complete, you can retrieve the various outputs from the future object.
Getting the Structure
The primary result is a Structure, returned by get(). A Structure can hold multiple Complexes (one per diffusion sample), each holding the predicted chains, including Protein chains with their per-atom 3D coordinates.
The number of Complexes in the resulting Structure matches the diffusion_samples argument from the request.
The result list itself has one entry per submitted complex, since the fold API supports batched submissions.
[6]:
result = fold_job.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
print("Predicted ligand C:\n", predicted_complex.get_ligand("C"))
Predicted structures: [<openprotein.molecules.structure.Structure object at 0x10d9cb110>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x10d9cb5f0>
Predicted protein A:
0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA
60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL
120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT
180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL
240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD
300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI
360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted protein B:
0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA
60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL
120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT
180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL
240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD
300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI
360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted ligand C:
Ligand(ccd='SAH', smiles=None, _structure_block=<openprotein.utils.cif.StructureCIFBlock object at 0x10d89a960>)
Visualize the structure using molviewspec.
[7]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical",  # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                           )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))
Getting Confidence Metrics
Protenix returns a structured confidence object per diffusion sample rather than per-residue matrices. Each entry is a ProtenixConfidence that aggregates AlphaFold3-style scores at the complex, chain, and chain-pair level:
- ranking_score: composite ranking metric used to order diffusion samples (0.8 * iptm + 0.2 * ptm - 100 * has_clash).
- ptm / iptm: predicted TM-score for the full complex and the inter-chain interface pTM.
- plddt: mean per-atom pLDDT in the range [0, 100].
- gpde: global PDE weighted by contact probabilities.
- has_clash: binary clash flag (1.0 when atomic clashes were detected).
- num_recycles: number of recycling iterations used.
- chain_ptm, chain_iptm, chain_plddt, chain_gpde: per-chain variants of the above metrics, one entry per chain. In the sample output below, the chain_plddt values fall in [0, 1], unlike the complex-level plddt.
- chain_pair_iptm, chain_pair_iptm_global, chain_pair_gpde: chain-pair matrices for evaluating individual interfaces.
The full record printed below also contains disorder and chain_pair_plddt fields.
Use get_confidence() to fetch the confidences. The outer list is indexed by submitted complex; the inner list is indexed by diffusion sample (controlled with the diffusion_samples argument to fold).
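When more than one diffusion sample is requested, the nested list can be reduced to one best sample per complex by comparing ranking_score. The sketch below shows this with stand-in objects rather than real ProtenixConfidence records. The scores are made-up illustrative values, and `best_sample_index` is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-in for fold_job.get_confidence(): the outer list is indexed by
# submitted complex, the inner list by diffusion sample. Values are invented.
confidences = [
    [
        SimpleNamespace(ranking_score=0.91, has_clash=0.0),
        SimpleNamespace(ranking_score=0.94, has_clash=0.0),
        SimpleNamespace(ranking_score=-99.1, has_clash=1.0),  # clash penalty
    ],
]


def best_sample_index(samples):
    """Return the index of the sample with the highest ranking_score."""
    return max(range(len(samples)), key=lambda i: samples[i].ranking_score)


best = [best_sample_index(samples) for samples in confidences]
print(best)
```

If the Complexes in a Structure follow the same order as the inner confidence list, which the indexing in this tutorial suggests but this sketch does not verify, the returned index could be used to pick the matching predicted Complex.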
[8]:
import json
confidence = fold_job.get_confidence()[0] # first submitted complex
sample = confidence[0] # first diffusion sample
print("ranking_score:", sample.ranking_score)
print("ptm:", sample.ptm, "iptm:", sample.iptm)
print("plddt:", sample.plddt, "gpde:", sample.gpde)
print("has_clash:", sample.has_clash)
print("\nPer-chain pLDDT:", sample.chain_plddt)
print("Per-chain pTM: ", sample.chain_ptm)
print("Per-chain ipTM: ", sample.chain_iptm)
print("\nFull confidence record:")
print(json.dumps(sample.model_dump(), indent=2))
ranking_score: 0.937947154045105
ptm: 0.9243689775466919 iptm: 0.9413416385650635
plddt: 88.94894409179688 gpde: 0.4361425042152405
has_clash: 0.0
Per-chain pLDDT: [0.8908922672271729, 0.8876915574073792, 0.9345044493675232]
Per-chain pTM: [0.9132489562034607, 0.9092245101928711, 0.7772278189659119]
Per-chain ipTM: [0.9639949798583984, 0.9491543769836426, 0.9750561714172363]
Full confidence record:
{
"ranking_score": 0.937947154045105,
"ptm": 0.9243689775466919,
"iptm": 0.9413416385650635,
"plddt": 88.94894409179688,
"gpde": 0.4361425042152405,
"has_clash": 0.0,
"num_recycles": 10,
"disorder": 0.0,
"chain_ptm": [
0.9132489562034607,
0.9092245101928711,
0.7772278189659119
],
"chain_iptm": [
0.9639949798583984,
0.9491543769836426,
0.9750561714172363
],
"chain_plddt": [
0.8908922672271729,
0.8876915574073792,
0.9345044493675232
],
"chain_gpde": [
0.42788127064704895,
0.4337179362773895,
0.35881927609443665
],
"chain_pair_iptm": [
[
0.0,
0.9380932450294495,
0.9898967146873474
],
[
0.9380932450294495,
0.0,
0.9602155685424805
],
[
0.9898967146873474,
0.9602155685424805,
0.0
]
],
"chain_pair_iptm_global": [
[
0.0,
0.9565746784210205,
0.9750561714172363
],
[
0.9565746784210205,
0.0,
0.9750561714172363
],
[
0.9750561714172363,
0.9750561714172363,
0.0
]
],
"chain_pair_gpde": [
[
0.0,
0.5503821969032288,
0.41448211669921875
],
[
0.5503821969032288,
0.0,
1.5603805780410767
],
[
0.41448211669921875,
1.5603805780410767,
0.0
]
],
"chain_pair_plddt": [
[
0.0,
0.8892918825149536,
0.8912716507911682
],
[
0.8892918825149536,
0.0,
0.8880988359451294
],
[
0.8912716507911682,
0.8880988359451294,
0.0
]
]
}
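Two quick checks on the record above can be written in plain Python. The first recomputes ranking_score from the stated formula using the printed ptm, iptm, and has_clash values. The second lists each interface from the symmetric chain_pair_iptm matrix and picks the lowest-scoring one. The chain order A, B, C is an assumption based on how the complex was assembled.

```python
from itertools import combinations

# Values copied from the confidence record printed above.
ptm = 0.9243689775466919
iptm = 0.9413416385650635
has_clash = 0.0
reported = 0.937947154045105

# Formula as stated in this tutorial.
recomputed = 0.8 * iptm + 0.2 * ptm - 100 * has_clash
print(f"recomputed={recomputed:.6f} reported={reported:.6f}")

# Assumed chain order: the two protein copies, then the ligand.
chain_ids = ["A", "B", "C"]
chain_pair_iptm = [
    [0.0, 0.9380932450294495, 0.9898967146873474],
    [0.9380932450294495, 0.0, 0.9602155685424805],
    [0.9898967146873474, 0.9602155685424805, 0.0],
]

# The matrix is symmetric, so the upper triangle covers every interface once.
interfaces = {
    (chain_ids[i], chain_ids[j]): chain_pair_iptm[i][j]
    for i, j in combinations(range(len(chain_ids)), 2)
}
weakest = min(interfaces, key=interfaces.get)
print("lowest-scoring interface:", weakest, interfaces[weakest])
```

The recomputed score agrees with the reported one to about six decimal places. The small residual is consistent with the reported values being single-precision floats.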
Next Steps
You can examine the predicted structure, or explore the other structure prediction models on our platform such as Boltz-2, AlphaFold2, or RoseTTAFold3. To save the predicted structure to disk:
[9]:
with open("protenix_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))