Using Protenix-v2#

This tutorial demonstrates how to use the Protenix-v2 model on the OpenProtein platform to predict the structure of a biomolecular complex. Protenix-v2 can model complexes that include proteins, ligands, DNA, and RNA; the example here is a protein homodimer bound to a small-molecule ligand. Protenix-v2 is an enhanced-capacity (464M-parameter) successor to Protenix, with improved antibody-antigen structure prediction and ligand plausibility. Like AlphaFold3, it performs best when each protein chain is paired with a multiple sequence alignment (MSA). We will walk through assembling a complex, building the MSA, submitting the fold, and retrieving the predicted structure together with Protenix-v2’s confidence metrics.

The full API for the model is documented at ProtenixV2Model.

What you need before getting started#

First, ensure you have an active OpenProtein session. Then, import the classes used to define the components of your complex.

[1]:

import openprotein
from openprotein.molecules import Complex, Protein, Ligand

# Log in to your session
session = openprotein.connect()

Defining the Molecules#

Protenix-v2 can model proteins, ligands, DNA, and RNA. For this example we will predict the structure of a homodimer in complex with a small-molecule ligand by assembling a Complex from Protein and Ligand chains keyed by chain ID.

[2]:

# Define the biomolecular complex to predict.
# Start with the protein in a homodimer.
protein = Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ")

# You can also specify the protein to be cyclic by setting the property
# protein.cyclic = True

# Define the ligand in our complex.
ligand = Ligand(ccd="SAH")

# Assemble the complex.
complex = Complex({
    "A": protein,
    "B": protein,
    "C": ligand,
})

Create an MSA for the Protein using Homology Search#

Protenix-v2 is an AlphaFold3-style model and expects each protein chain to carry a multiple sequence alignment (MSA). For every protein you must either set protein.msa to an MSA built on the platform, or explicitly opt out by setting protein.msa = Protein.single_sequence_mode, which runs that chain in single-sequence mode. Submitting a Protenix-v2 request in which any protein has neither setting will raise an error.

Here we build the MSA using the platform’s homology search via session.align.create_msa. Note the syntax: when seeding an MSA for a complex we follow ColabFold’s convention of joining the chain sequences with :, which lets the MSA service jointly search the multimer.

[3]:

msa_query = []
for p in complex.get_proteins().values():
    msa_query.append(p.sequence)
msa = session.align.create_msa(seed=b":".join(msa_query))

for p in complex.get_proteins().values():
    p.msa = msa
    # To run a chain without an MSA, opt into single-sequence mode instead:
    # p.msa = Protein.single_sequence_mode
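
The colon-joined seed can be illustrated without contacting the platform. This is a minimal sketch, assuming chain sequences are held as bytes (as the b":" separator in the cell above suggests). The two short sequences are made-up placeholders, not real proteins.

```python
# Hypothetical placeholder sequences; in the tutorial the real chains come
# from complex.get_proteins().values().
chain_sequences = [b"MVTPEGNV", b"MVTPEGNV"]

# ColabFold-style multimer query: chain sequences joined with ":".
seed = b":".join(chain_sequences)
print(seed)  # prints b'MVTPEGNV:MVTPEGNV'

# Splitting on ":" recovers the individual chain sequences.
recovered = seed.split(b":")
print(len(recovered))  # prints 2
```

For a homodimer the same sequence simply appears twice in the seed, once per chain.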

Predicting the Complex Structure#

Now we can call the fold() method on the Protenix-v2 model.

The key steps are:

Access the model via session.fold.protenix_v2.
Pass the defined complex.
Optionally tune inference with diffusion_samples (number of diffusion samples per input), num_recycles (number of recycling steps), and num_steps (number of sampling steps).

[4]:

# Request the fold.
fold_job = session.fold.protenix_v2.fold(
    sequences=[complex],   # list for batch requests
    diffusion_samples=1,   # number of diffusion samples per input
    num_recycles=10,       # number of recycling steps
    num_steps=200,         # number of sampling steps
)
fold_job

[4]:

FoldJob(num_records=1, job_id='5fcddd1e-8287-45c6-9d18-9c2680834ab2', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 6, 4, 23, 1, 46, 73912, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None, failure_message=None)

The call returns a FoldResultFuture object immediately. This is a reference to your job running on the OpenProtein platform: you can monitor its status, or block until completion with wait_until_done().

[5]:

# Wait for the job to finish.
fold_job.wait_until_done(verbose=True)

Waiting: 100%|██████████| 100/100 [07:53<00:00,  4.74s/it, status=SUCCESS]

[5]:

True

Retrieving the Results#

Once the job is complete, you can retrieve the various outputs from the future object.

Getting the Structure#

The primary result is a Structure, returned by get(). A Structure can hold multiple Complexes (one per diffusion sample), each holding the predicted chains, including Protein chains with their per-atom 3D coordinates.

The number of Complexes in the resulting Structure matches the diffusion_samples argument from the request.

The result list itself has one entry per submitted complex, since the fold API supports batched submissions.

[6]:

result = fold_job.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
print("Predicted ligand C:\n", predicted_complex.get_ligand("C"))

Predicted structures: [<openprotein.molecules.structure.Structure object at 0x10bd06ea0>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x10bac73b0>
Predicted protein A:
 0     SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA

60    SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL

120   SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT

180   SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL

240   SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD

300   SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI

360   SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted protein B:
 0     SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA

60    SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL

120   SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT

180   SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL

240   SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD

300   SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI

360   SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted ligand C:
 Ligand(ccd='SAH', smiles=None, _structure_block=<openprotein.utils.cif.StructureCIFBlock object at 0x10bd30410>)

Visualize the structure using molviewspec.

[7]:

%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))

Getting Confidence Metrics#

Protenix-v2 returns a structured confidence object per diffusion sample rather than per-residue matrices. Each entry is a ProtenixConfidence that aggregates AlphaFold3-style scores at the complex, chain, and chain-pair level:

ranking_score: composite ranking metric used to order diffusion samples (0.8 * iptm + 0.2 * ptm - 100 * has_clash).
ptm / iptm: predicted TM-score (pTM) for the full complex and the inter-chain interface pTM (ipTM).
plddt: mean per-atom pLDDT in the range [0, 100].
gpde: global PDE weighted by contact probabilities.
has_clash: binary clash flag (1.0 when atomic clashes were detected).
num_recycles: number of recycling iterations used.
chain_ptm, chain_iptm, chain_plddt, chain_gpde: per-chain variants of the above metrics, one entry per chain. Note that in the example output below, chain_plddt is reported on a [0, 1] scale, whereas the complex-level plddt is on [0, 100].
chain_pair_iptm, chain_pair_iptm_global, chain_pair_gpde, chain_pair_plddt: chain-pair matrices for evaluating individual interfaces.

Use get_confidence() to fetch the confidences. The outer list is indexed by submitted complex; the inner list is indexed by diffusion sample (controlled with the diffusion_samples argument to fold).
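
When more than one diffusion sample is requested, the nested layout described above makes it straightforward to pick the top-ranked sample for each complex. The sketch below is illustrative only: SampleConfidence is a hypothetical stand-in that exposes just two of the fields listed above, and the scores are invented.

```python
from dataclasses import dataclass


@dataclass
class SampleConfidence:
    """Hypothetical stand-in for a confidence record (not the real class)."""
    ranking_score: float
    has_clash: float


# Outer list: one entry per submitted complex.
# Inner list: one entry per diffusion sample. All values are made up.
confidences = [
    [SampleConfidence(0.91, 0.0), SampleConfidence(0.94, 0.0), SampleConfidence(-99.1, 1.0)],
    [SampleConfidence(0.72, 0.0), SampleConfidence(0.68, 0.0)],
]

# For each complex, find the index of the sample with the highest ranking_score.
best_indices = []
for samples in confidences:
    best = max(range(len(samples)), key=lambda i: samples[i].ranking_score)
    best_indices.append(best)

print(best_indices)  # prints [1, 0]
```

Because a clash subtracts 100 from the ranking score, clashing samples naturally sort to the bottom. Assuming the sample order of get_confidence() matches that of get(), the same indices could be used to select the corresponding predicted Complex.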

[8]:

import json

confidence = fold_job.get_confidence()[0]   # first submitted complex
sample = confidence[0]                       # first diffusion sample

print("ranking_score:", sample.ranking_score)
print("ptm:", sample.ptm, "iptm:", sample.iptm)
print("plddt:", sample.plddt, "gpde:", sample.gpde)
print("has_clash:", sample.has_clash)

print("\nPer-chain pLDDT:", sample.chain_plddt)
print("Per-chain pTM:  ", sample.chain_ptm)
print("Per-chain ipTM: ", sample.chain_iptm)

print("\nFull confidence record:")
print(json.dumps(sample.model_dump(), indent=2))

ranking_score: 0.939129114151001
ptm: 0.9303953647613525 iptm: 0.9413124918937683
plddt: 91.09008026123047 gpde: 0.4112328588962555
has_clash: 0.0

Per-chain pLDDT: [0.9115279912948608, 0.9098753929138184, 0.9562626481056213]
Per-chain pTM:   [0.9194399118423462, 0.9173002243041992, 0.8558129668235779]
Per-chain ipTM:  [0.9644049406051636, 0.9504706859588623, 0.9766792058944702]

Full confidence record:
{
  "ranking_score": 0.939129114151001,
  "ptm": 0.9303953647613525,
  "iptm": 0.9413124918937683,
  "plddt": 91.09008026123047,
  "gpde": 0.4112328588962555,
  "has_clash": 0.0,
  "num_recycles": 10,
  "disorder": 0.0,
  "chain_ptm": [
    0.9194399118423462,
    0.9173002243041992,
    0.8558129668235779
  ],
  "chain_iptm": [
    0.9644049406051636,
    0.9504706859588623,
    0.9766792058944702
  ],
  "chain_plddt": [
    0.9115279912948608,
    0.9098753929138184,
    0.9562626481056213
  ],
  "chain_gpde": [
    0.3905996084213257,
    0.39729487895965576,
    0.31210482120513916
  ],
  "chain_pair_iptm": [
    [
      0.0,
      0.9381964802742004,
      0.9906134009361267
    ],
    [
      0.9381964802742004,
      0.0,
      0.962744951248169
    ],
    [
      0.9906134009361267,
      0.962744951248169,
      0.0
    ]
  ],
  "chain_pair_iptm_global": [
    [
      0.0,
      0.9574378132820129,
      0.9766792058944702
    ],
    [
      0.9574378132820129,
      0.0,
      0.9766792058944702
    ],
    [
      0.9766792058944702,
      0.9766792058944702,
      0.0
    ]
  ],
  "chain_pair_gpde": [
    [
      0.0,
      0.6037024259567261,
      0.5214035511016846
    ],
    [
      0.6037024259567261,
      0.0,
      1.0281025171279907
    ],
    [
      0.5214035511016846,
      1.0281025171279907,
      0.0
    ]
  ],
  "chain_pair_plddt": [
    [
      0.0,
      0.9107017517089844,
      0.9119172096252441
    ],
    [
      0.9107017517089844,
      0.0,
      0.9102789759635925
    ],
    [
      0.9119172096252441,
      0.9102789759635925,
      0.0
    ]
  ]
}
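
Two quick sanity checks on the record above may help when interpreting it: recomputing ranking_score from the formula listed earlier, and labelling the chain_pair_iptm matrix by chain. The numbers are copied from the example output. The mapping of matrix rows to chains A, B, C is an assumption (that rows follow the order in which the chains were defined), not something this tutorial confirms.

```python
import math
from itertools import combinations

# Values copied from the example confidence record above.
iptm = 0.9413124918937683
ptm = 0.9303953647613525
has_clash = 0.0
reported_ranking_score = 0.939129114151001

# ranking_score = 0.8 * iptm + 0.2 * ptm - 100 * has_clash
recomputed = 0.8 * iptm + 0.2 * ptm - 100 * has_clash
matches = math.isclose(recomputed, reported_ranking_score, abs_tol=1e-6)
print("recomputed:", round(recomputed, 4), "matches reported:", matches)

chain_pair_iptm = [
    [0.0, 0.9381964802742004, 0.9906134009361267],
    [0.9381964802742004, 0.0, 0.962744951248169],
    [0.9906134009361267, 0.962744951248169, 0.0],
]

# Assumption: rows and columns follow the chain definition order A, B, C.
chain_ids = ["A", "B", "C"]

# The matrix is symmetric, so each unordered chain pair is read once.
interfaces = {
    (chain_ids[i], chain_ids[j]): chain_pair_iptm[i][j]
    for i, j in combinations(range(len(chain_ids)), 2)
}
for pair, score in interfaces.items():
    print(pair, round(score, 3))

weakest = min(interfaces, key=interfaces.get)
print("lowest ipTM interface:", weakest)
```

Under the stated chain-order assumption, the protein-protein interface (A, B) has the lowest ipTM of the three pairs in this example, although all three are above 0.9.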

Using the original Protenix#

Protenix-v2 is the enhanced-capacity successor and is recommended for most use cases. The first-generation Protenix model (ProtenixModel, model id protenix) remains available and follows the same workflow shown above — just swap the accessor to session.fold.protenix:

fold_job = session.fold.protenix.fold(sequences=[complex])

Use the original Protenix when you need to reproduce earlier results.

Next Steps#

You can examine the predicted structure, or explore the other structure prediction models on our platform such as Boltz-2, AlphaFold2, or RoseTTAFold3. To save the predicted structure to disk:

[9]:

with open("protenix_v2_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))