
Using Protenix#

This tutorial demonstrates how to use the Protenix model on the OpenProtein platform to predict the structure of a biomolecular complex, which can include proteins, ligands, DNA, and RNA. Protenix is an AlphaFold3-style model and, like AlphaFold3, performs best when each protein chain is paired with a multiple sequence alignment (MSA). We will walk through assembling a complex, building the MSA, submitting the fold, and retrieving the predicted structure together with Protenix’s confidence metrics.

The full API for the model is documented at ProtenixModel.

What you need before getting started#

First, ensure you have an active OpenProtein session. Then, import the classes used to define the components of your complex.

[1]:
import openprotein
from openprotein.molecules import Complex, Protein, Ligand

# Login to your session
session = openprotein.connect()

Defining the Molecules#

Protenix can model proteins, ligands, DNA, and RNA. For this example we will predict the structure of a homodimer in complex with a small molecule ligand by assembling a Complex from Protein and Ligand chains keyed by chain id.

[2]:
# Define the biomolecular complex to predict.
# Start with the protein in a homodimer.
protein = Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ")

# You can also specify the protein to be cyclic by setting the property
# protein.cyclic = True

# Define the ligand in our complex.
ligand = Ligand(ccd="SAH")

# Assemble the complex. Group chain ids that share the same entity into a
# tuple — this serializes the homodimer as a single protein entity with
# ids ["A", "B"] and only requires one MSA on the entity.
complex = Complex({
    ("A", "B"): protein,
    "C": ligand,
})
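
Before submitting, it can help to confirm which chain ids the mapping above defines and that none is repeated. The following is a minimal sketch that works on a plain dictionary with the same key layout, not on the Complex object itself; `flatten_chain_ids` is a hypothetical helper written for this illustration and is not part of the openprotein package.

```python
def flatten_chain_ids(chain_map):
    """Expand tuple keys into a flat {chain_id: entity} mapping.

    Raises ValueError if a chain id appears more than once.
    """
    flat = {}
    for key, entity in chain_map.items():
        # A bare string names a single chain; a tuple groups identical chains.
        ids = (key,) if isinstance(key, str) else tuple(key)
        for chain_id in ids:
            if chain_id in flat:
                raise ValueError(f"duplicate chain id: {chain_id!r}")
            flat[chain_id] = entity
    return flat


# Placeholder strings stand in for the Protein and Ligand objects.
layout = {("A", "B"): "protein", "C": "ligand"}
chains = flatten_chain_ids(layout)
print(sorted(chains))              # ['A', 'B', 'C']
print(chains["A"] is chains["B"])  # True: both ids point at one entity
```

The same check could be run on the real key layout by passing the dictionary you intend to hand to Complex.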

Predicting the Complex Structure#

Now we can call the fold() method on the Protenix model.

The key steps are:

  1. Access the model via session.fold.protenix.

  2. Pass the defined complex.

  3. Optionally tune inference with diffusion_samples (samples per input), num_recycles (recycling steps), and num_steps (sampling steps).

[4]:
# Request the fold.
fold_job = session.fold.protenix.fold(
    sequences=[complex],   # list for batch requests
    diffusion_samples=1,   # number of diffusion samples per input
    num_recycles=10,       # number of recycling steps
    num_steps=200,         # number of sampling steps
)
fold_job

[4]:
FoldJob(num_records=1, job_id='6e51ea51-1b35-4a65-85e0-b60956900007', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 5, 7, 18, 4, 30, 769318, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None, failure_message=None)

The call returns a FoldResultFuture object immediately. This is a reference to your job running on the OpenProtein platform: you can monitor its status, or block until completion with wait_until_done().

[5]:
# Wait for the job to finish.
fold_job.wait_until_done(verbose=True)

Waiting: 100%|██████████| 100/100 [04:32<00:00,  2.72s/it, status=SUCCESS]
[5]:
True

Retrieving the Results#

Once the job is complete, you can retrieve the various outputs from the future object.

Getting the Structure#

The primary result is a Structure, returned by get(). A Structure can hold multiple Complex objects (one per diffusion sample), each holding the predicted chains, including Protein chains with their per-atom 3D coordinates.

The number of Complex objects in the resulting Structure matches the diffusion_samples argument from the request.

The result list itself has one entry per submitted complex, since the fold API supports batched submissions.

[6]:
result = fold_job.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
print("Predicted ligand C:\n", predicted_complex.get_ligand("C"))

Predicted structures: [<openprotein.molecules.structure.Structure object at 0x10d9cb110>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x10d9cb5f0>
Predicted protein A:
 0     SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA

60    SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL

120   SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT

180   SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL

240   SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD

300   SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI

360   SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted protein B:
 0     SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA

60    SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL

120   SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT

180   SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL

240   SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD

300   SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI

360   SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted ligand C:
 Ligand(ccd='SAH', smiles=None, _structure_block=<openprotein.utils.cif.StructureCIFBlock object at 0x10d89a960>)

Visualize the structure using molviewspec.

[7]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))


Getting Confidence Metrics#

Protenix returns a structured confidence object per diffusion sample rather than per-residue matrices. Each entry is a ProtenixConfidence that aggregates AlphaFold3-style scores at the complex, chain, and chain-pair level:

  • ranking_score — composite ranking metric used to order diffusion samples (0.8 * iptm + 0.2 * ptm - 100 * has_clash).

  • ptm / iptm — predicted TM-score for the full complex and the inter-chain interface pTM.

  • plddt — mean per-atom pLDDT in the range [0, 100].

  • gpde — global PDE weighted by contact probabilities.

  • has_clash — binary clash flag (1.0 when atomic clashes were detected).

  • num_recycles — number of recycling iterations used.

  • chain_ptm, chain_iptm, chain_plddt, chain_gpde — per-chain variants of the above metrics, one entry per chain.

  • chain_pair_iptm, chain_pair_iptm_global, chain_pair_gpde — chain-pair matrices for evaluating individual interfaces.
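
The ranking_score formula in the list above can be checked by hand. The sketch below recomputes it from the ptm, iptm, and has_clash values reported in this tutorial's example run; `ranking_score` here is a local helper written for illustration, not a function from the openprotein package, and the small tolerance allows for floating-point rounding in the reported values.

```python
def ranking_score(iptm, ptm, has_clash):
    """Composite score as written in the metric list: 0.8*iptm + 0.2*ptm - 100*has_clash."""
    return 0.8 * iptm + 0.2 * ptm - 100.0 * has_clash


# Values copied from the example run shown in this tutorial.
reported = 0.937947154045105
recomputed = ranking_score(
    iptm=0.9413416385650635,
    ptm=0.9243689775466919,
    has_clash=0.0,
)
print(abs(recomputed - reported) < 1e-6)  # True

# With the clash flag set, the penalty term dominates and the score goes negative.
clashing = ranking_score(iptm=0.9413416385650635, ptm=0.9243689775466919, has_clash=1.0)
print(clashing < 0)  # True
```

Because the clash penalty is two orders of magnitude larger than the other terms, any sample flagged with a clash ranks below every clash-free sample.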

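Use get_confidence() to fetch the confidences. The outer list is indexed by submitted complex; the inner list is indexed by diffusion sample (controlled with the diffusion_samples argument to fold). In the example output below, plddt is reported on a 0 to 100 scale, whereas the chain_plddt and chain_pair_plddt values fall between 0 and 1. The full record also contains disorder and chain_pair_plddt fields that are not described in the list above.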

[8]:
import json

confidence = fold_job.get_confidence()[0]   # first submitted complex
sample = confidence[0]                       # first diffusion sample

print("ranking_score:", sample.ranking_score)
print("ptm:", sample.ptm, "iptm:", sample.iptm)
print("plddt:", sample.plddt, "gpde:", sample.gpde)
print("has_clash:", sample.has_clash)

print("\nPer-chain pLDDT:", sample.chain_plddt)
print("Per-chain pTM:  ", sample.chain_ptm)
print("Per-chain ipTM: ", sample.chain_iptm)

print("\nFull confidence record:")
print(json.dumps(sample.model_dump(), indent=2))

ranking_score: 0.937947154045105
ptm: 0.9243689775466919 iptm: 0.9413416385650635
plddt: 88.94894409179688 gpde: 0.4361425042152405
has_clash: 0.0

Per-chain pLDDT: [0.8908922672271729, 0.8876915574073792, 0.9345044493675232]
Per-chain pTM:   [0.9132489562034607, 0.9092245101928711, 0.7772278189659119]
Per-chain ipTM:  [0.9639949798583984, 0.9491543769836426, 0.9750561714172363]

Full confidence record:
{
  "ranking_score": 0.937947154045105,
  "ptm": 0.9243689775466919,
  "iptm": 0.9413416385650635,
  "plddt": 88.94894409179688,
  "gpde": 0.4361425042152405,
  "has_clash": 0.0,
  "num_recycles": 10,
  "disorder": 0.0,
  "chain_ptm": [
    0.9132489562034607,
    0.9092245101928711,
    0.7772278189659119
  ],
  "chain_iptm": [
    0.9639949798583984,
    0.9491543769836426,
    0.9750561714172363
  ],
  "chain_plddt": [
    0.8908922672271729,
    0.8876915574073792,
    0.9345044493675232
  ],
  "chain_gpde": [
    0.42788127064704895,
    0.4337179362773895,
    0.35881927609443665
  ],
  "chain_pair_iptm": [
    [
      0.0,
      0.9380932450294495,
      0.9898967146873474
    ],
    [
      0.9380932450294495,
      0.0,
      0.9602155685424805
    ],
    [
      0.9898967146873474,
      0.9602155685424805,
      0.0
    ]
  ],
  "chain_pair_iptm_global": [
    [
      0.0,
      0.9565746784210205,
      0.9750561714172363
    ],
    [
      0.9565746784210205,
      0.0,
      0.9750561714172363
    ],
    [
      0.9750561714172363,
      0.9750561714172363,
      0.0
    ]
  ],
  "chain_pair_gpde": [
    [
      0.0,
      0.5503821969032288,
      0.41448211669921875
    ],
    [
      0.5503821969032288,
      0.0,
      1.5603805780410767
    ],
    [
      0.41448211669921875,
      1.5603805780410767,
      0.0
    ]
  ],
  "chain_pair_plddt": [
    [
      0.0,
      0.8892918825149536,
      0.8912716507911682
    ],
    [
      0.8892918825149536,
      0.0,
      0.8880988359451294
    ],
    [
      0.8912716507911682,
      0.8880988359451294,
      0.0
    ]
  ]
}
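
The chain-pair matrices are easier to read once each entry is labeled with its chain ids. The sketch below does this for the chain_pair_iptm values printed above. It assumes the rows and columns follow the order in which the chains were defined (A, B, C), which the output itself does not confirm; `label_interfaces` is a hypothetical helper written for this illustration.

```python
from itertools import combinations

# Assumed ordering: the same order in which the chains were defined.
chain_ids = ["A", "B", "C"]

# chain_pair_iptm copied from the full confidence record above.
chain_pair_iptm = [
    [0.0, 0.9380932450294495, 0.9898967146873474],
    [0.9380932450294495, 0.0, 0.9602155685424805],
    [0.9898967146873474, 0.9602155685424805, 0.0],
]


def label_interfaces(ids, matrix):
    """Map each unordered chain pair to its entry in a symmetric matrix."""
    return {
        (ids[i], ids[j]): matrix[i][j]
        for i, j in combinations(range(len(ids)), 2)
    }


interfaces = label_interfaces(chain_ids, chain_pair_iptm)
for pair, score in interfaces.items():
    print(pair, round(score, 3))

# The pair with the lowest ipTM is the least confident interface.
weakest = min(interfaces, key=interfaces.get)
print("lowest ipTM interface:", weakest)
```

Under the assumed ordering, the protein-protein interface (A, B) has the lowest ipTM of the three pairs in this run, although all three values are above 0.93.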

Next Steps#

You can examine the predicted structure, or explore the other structure prediction models on our platform such as Boltz-2, AlphaFold2, or RoseTTAFold3. To save the predicted structure to disk:

[9]:
with open("protenix_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))
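
If you request more than one diffusion sample, you may want to inspect the top-ranked sample rather than the first one. The sketch below picks the index with the highest ranking_score. The SimpleNamespace entries and their scores are made-up stand-ins for the per-sample confidence entries, and `best_sample_index` is a hypothetical helper rather than part of the openprotein package.

```python
from types import SimpleNamespace


def best_sample_index(samples):
    """Return the index of the sample with the highest ranking_score."""
    return max(range(len(samples)), key=lambda i: samples[i].ranking_score)


# Made-up stand-ins for per-sample confidence entries (illustrative values only).
samples = [
    SimpleNamespace(ranking_score=0.91),
    SimpleNamespace(ranking_score=0.94),
    SimpleNamespace(ranking_score=-99.1),  # e.g. a sample flagged with a clash
]

best = best_sample_index(samples)
print(best)  # 1
```

With real results, the list returned by fold_job.get_confidence()[0] would take the place of samples, and the resulting index could be used to select the matching entry of the Structure, as in structure[0] earlier in this tutorial.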