Nanobody Scaffolds with BoltzGen#

This tutorial demonstrates how to design nanobody binders with BoltzGen using the OpenProtein.AI Python client.

The design process consists of four main steps:

Specify the design problem as a query that describes (1) the target protein and (2) the scaffold of the nanobody binder.
Generate plausible structures and sequences for the binder using BoltzGen.
Refine the generated sequences using an inverse folding model.
Select the best candidate sequences by predicting their structures and ranking using established in-silico validation metrics.

We illustrate how to take an example from the official BoltzGen repo and create a query to run on our platform.

Prerequisites#

For this tutorial, you will need your OpenProtein.AI Python session for accessing the models available on our platform and manipulating job results, so make sure you have your credentials set up!

Refer to our quickstart for more information.

[1]:

import openprotein
session = openprotein.connect()
session

[1]:

<openprotein.OpenProtein at 0x7f9b908b74a0>

Nanobody binder design specification#

Define the target#

We will use a penguinpox virus protein from the BoltzGen examples (structure 9BKQ) as our target. Let’s download the target structure file and load chain B as a Protein.

[2]:

from openprotein import Protein
import requests
from pathlib import Path
from molviewspec import create_builder

DATA_DIR = Path("data/penguinpox")
DATA_DIR.mkdir(parents=True, exist_ok=True)

# Target structure (mmCIF)
target_url = "https://raw.githubusercontent.com/HannesStark/boltzgen/main/example/nanobody_against_penguinpox/9bkq-assembly2.cif"
TARGET_NAME = "9bkq-assembly2.cif"
TARGET_PATH = DATA_DIR / TARGET_NAME
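if not TARGET_PATH.exists():
    response = requests.get(target_url, timeout=60)
    response.raise_for_status()  # fail early instead of caching an error page
    TARGET_PATH.write_bytes(response.content)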

target_protein = Protein.from_filepath(path=TARGET_PATH, chain_id="B")
print("target sequence:", target_protein.sequence)
print("target coordinates shape:", target_protein.coordinates.shape)
print("target plddt shape:", target_protein.plddt.shape)
print("target name:", target_protein.name)

target sequence: b'SATTIQKELENIVVKERQNKKDTILMGLKVEVPWNYCDWASISFYDVRLESGILDMESIAVKYMTGCDIPPHVTLGITNKDQEANFQRFKELTRNIDLTSLSFTCKEVICFPQSRASKELGANGRAVVMKLEASDDVKALRNVLFNVVPTPRDIFGPVLSDPVWCPHVTIGYVRADDEDNKNSFIELAEAFRGSKIKVIGWCE'
target coordinates shape: (203, 37, 3)
target plddt shape: (203,)
target name: 9bkq-assembly2

Now let’s do a quick visualization of our target.

[12]:

def visualize_pdb(pdb_string: str):
    builder = create_builder()
    structure = builder.download(url="mystructure.pdb")\
        .parse(format="pdb")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    builder.molstar_notebook(data={'mystructure.pdb': pdb_string}, width=500, height=400)

visualize_pdb(target_protein.make_pdb_string())

Designing the nanobody scaffold#

Next, we prepare the nanobody scaffold that will be designed against the target. As a reference, we use one of the nanobody (VHH) scaffolds from the BoltzGen examples: 7EOW (caplacizumab). First, let’s retrieve the specification that shows how the scaffold is built.

[4]:

scaffold_spec_url = "https://raw.githubusercontent.com/HannesStark/boltzgen/main/example/nanobody_scaffolds/7eow.yaml"
print(requests.get(scaffold_spec_url).text)

# caplacizumab
path: 7eow.cif
include:
  - chain:
      id: B

design:
  - chain:
      id: B
      res_index: 26..34,52..59,98..118

structure_groups:
  - group:
      id: B
      visibility: 2
  - group:
      id: B
      visibility: 0
      res_index: 26..34,52..59,98..118

exclude:
  - chain:
      id: B
      res_index: 26..28 # take out 3
  - chain:
      id: B
      res_index: 52..54 # take out 3
  - chain:
      id: B
      res_index: 98..104 # take out seven

design_insertions:
  - insertion:
      id: B
      res_index: 26 # The res_index'th residue will be a designed one (starting to count from 1)
      num_residues: 1..5
  - insertion:
      id: B
      res_index: 52 # The res_index'th residue will be a designed one (starting to count from 1)
      num_residues: 1..5
  - insertion:
      id: B
      res_index: 98 # The res_index'th residue will be a designed one (starting to count from 1)
      num_residues: 1..14

# reindex the residue index which is used in the positional encoding
reset_res_index:
  - chain:
      id: B

This specification from BoltzGen describes how the scaffold protein is prepared for use in a design workflow. There’s no need to actually learn the specification to create designs on our platform, but it is useful as a reference.

In particular, we can see that we want to:

Use chain B in the structure
Design residues 26 - 34, 52 - 59, and 98 - 118
Set the visibility groups for the above residues to 0 and 2 for everything else
Remove residues 26 - 28, 52 - 54, and 98 - 104
Insert residues at each of the removed positions.
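As a small aid for reading the specification, the sketch below expands a `res_index` range string into explicit positions. It assumes the ranges are 1-indexed and inclusive at both ends, which is consistent with the comments in the YAML above (for example `98..104 # take out seven`). The helper name `parse_res_index` is hypothetical and is not part of BoltzGen or the OpenProtein.AI client.

```python
def parse_res_index(spec: str) -> list[int]:
    """Expand a range string such as "26..34,52..59" into 1-indexed positions."""
    positions: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            start, end = (int(x) for x in part.split(".."))
            positions.extend(range(start, end + 1))  # both ends are inclusive
        else:
            positions.append(int(part))  # a single position such as "26"
    return positions


design_positions = parse_res_index("26..34,52..59,98..118")
print(len(design_positions))  # 9 + 8 + 21 = 38 positions
print(design_positions[:3], design_positions[-1])
```

Note that Python's `range(start, stop)` excludes `stop`, so an inclusive range `a..b` corresponds to `range(a, b + 1)`.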

Resetting the indices is not necessary, since this is implicitly done when we encode and upload our query.

Note that the order of operations is important since the residue indices are shifted during deletion and insertion.
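To see why the order matters, here is a toy sketch on a plain string. The helper `delete_positions` is hypothetical and only illustrates index shifting; it does not describe how the client's own `delete` method is implemented.

```python
def delete_positions(seq: str, positions: list[int]) -> str:
    """Remove 1-indexed positions, all interpreted against the input string."""
    drop = set(positions)
    return "".join(aa for i, aa in enumerate(seq, start=1) if i not in drop)


seq = "ABCDEFGHIJ"

# One call: both indices refer to the original string.
at_once = delete_positions(seq, [2, 6])

# Two calls with the same indices: after the first deletion everything to the
# right has shifted left by one, so index 6 now points at a different residue.
step = delete_positions(seq, [2])
sequential = delete_positions(step, [6])

print(at_once)     # ACDEGHIJ
print(sequential)  # ACDEFHIJ
```

The same effect applies to insertions: any index to the right of an earlier edit refers to a shifted position unless all indices are resolved against the same reference sequence.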

Residue groups#

It is worth taking some time to understand residue groups and visibility, since they directly affect the design results.

The core idea is that all residues within the same group have their positions fixed relative to each other. By default, every chain and residue is placed in the same group (group 1). This is undesirable for cross-chain design, because every chain would then be fixed in place relative to the others.

Instead, we generally want each chain to be in a different group, so that the model can re-position the chains as needed. Furthermore, group 0 is intended for residues to be designed, since it marks these residues as hidden from the others.

With this understanding, we can see why the full scaffold chain is moved to group 2 above: our target protein will be in group 1. We then set all the designed residues to group 0.
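The group assignment described here can be pictured as one integer label per residue. The following is a minimal sketch on plain Python lists, assuming 1-indexed positions and the 137-residue raw scaffold; `assign_groups` and `intervals_by_group` are hypothetical helpers, not functions of the OpenProtein.AI client.

```python
from itertools import groupby


def assign_groups(length, designed, default_group=2, design_group=0):
    """Return one group label per residue (list index 0 holds residue 1)."""
    designed = set(designed)
    return [
        design_group if pos in designed else default_group
        for pos in range(1, length + 1)
    ]


def intervals_by_group(groups):
    """Collapse per-residue labels into {group: [(start, end), ...]}, 1-indexed."""
    out = {}
    pos = 1
    for label, run in groupby(groups):
        n = len(list(run))
        out.setdefault(label, []).append((pos, pos + n - 1))
        pos += n
    return out


# Designed positions 26..34, 52..59 and 98..118 from the specification
designed = [*range(26, 35), *range(52, 60), *range(98, 119)]
groups = assign_groups(137, designed)
print(intervals_by_group(groups))
```

Here the designed loops end up in group 0 and the rest of the scaffold in group 2, leaving group 1 free for the target chain.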

Query editing#

Now we can download the referenced scaffold structure and load it, ready to be edited according to the above specification.

[5]:

raw_scaffold_url = "https://raw.githubusercontent.com/HannesStark/boltzgen/main/example/nanobody_scaffolds/7eow.cif"
raw_scaffold_filestring = requests.get(raw_scaffold_url).text
raw_scaffold = Protein.from_string(raw_scaffold_filestring, "cif", "B")
print("raw scaffold sequence:", raw_scaffold.sequence)
print("raw scaffold coordinates shape:", raw_scaffold.coordinates.shape)
print("raw scaffold plddt shape:", raw_scaffold.plddt.shape)
print("raw scaffold name:", raw_scaffold.name)

raw scaffold sequence: b'MEVQLVESGGGLVQPGGSLRLSCAASGRTFSYNPMGWFRQAPGKGRELVAAISRTGGSTYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCAAAGVRAEDGRVRTLPSEYTFWGQGTQVTVSSLEHHHHHH'
raw scaffold coordinates shape: (137, 37, 3)
raw scaffold plddt shape: (137,)
raw scaffold name: 7EOW

Now let’s make the edits sequentially.

[6]:

# mask structure at positions to design
query_scaffold = raw_scaffold.mask_structure_at(
    list(range(26, 35)) +
    list(range(52, 60)) +
    list(range(98, 119))
)
# set visibility groups
query_scaffold = query_scaffold.set_group_at(
    list(range(len(query_scaffold))), 2
)
query_scaffold = query_scaffold.set_group_at(
    list(range(26, 35)) +
    list(range(52, 60)) +
    list(range(98, 119)), 0
)
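# exclude
# NOTE range(98, 104) removes residues 98..103 (six residues), while the
# BoltzGen spec above removes 98..104 (seven); use range(98, 105) to match
# the spec exactly. The output shown below corresponds to the code as written.
query_scaffold = query_scaffold.delete(
    list(range(26, 29)) +
    list(range(52, 55)) +
    list(range(98, 104))
)
# insert
# NOTE the spec allows a range of insertion lengths (1..5, 1..5, 1..14);
# here we pick one fixed length within each range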
query_scaffold = query_scaffold.batch_insert(
    {
        26: "3",
        52: "3",
        98: "7",
    }
)
# ensure insertions are group 0
query_scaffold = query_scaffold.set_group_at(
    query_scaffold.get_structure_mask(), 0
)

print("scaffold sequence:", query_scaffold.sequence)
print("scaffold structure mask:", query_scaffold.get_structure_mask().tolist())
print("scaffold groups:", Protein.get_intervals(query_scaffold._group))
print("scaffold length:", len(query_scaffold.sequence))
print("scaffold coordinates shape:", query_scaffold.coordinates.shape)
print("scaffold plddt shape:", query_scaffold.plddt.shape)

scaffold sequence: b'MEVQLVESGGGLVQPGGSLRLSCAAXXXFSYNPMGWFRQAPGKGRELVAATGGSXXXYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCAEDGRVRTXXXXXXXPSEYTFWGQGTQVTVSSLEHHHHHH'
scaffold structure mask: [1, 26, 27, 28, 29, 30, 31, 32, 33, 51, 52, 53, 54, 55, 56, 57, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 128, 129, 130, 131, 132, 133, 134, 135]
scaffold groups: {0: [(1, 1), (26, 33), (51, 57), (96, 116), (128, 135)], 2: [(2, 25), (34, 50), (58, 95), (117, 127)]}
scaffold length: 135
scaffold coordinates shape: (135, 37, 3)
scaffold plddt shape: (135,)

Observe that our edits are reflected here. Note that the structure mask contains some extra residues (1 and 128 to 135). This is because the original scaffold file is missing structure information at those positions as well. We can either drop them or leave them in; in our experience, BoltzGen doesn’t have issues with this.

Now let’s combine the target and scaffold to form our query:

[38]:

# reset the target chain id so the two chains don't clash (the scaffold is chain B)
target_protein.chain_id = "A"
query_model = target_protein & query_scaffold
print("Chains in query:", list(query_model.proteins.keys()))
print("Chain A (target chain):", query_model.proteins["A"].sequence)
print("Chain B (scaffold chain):", query_model.proteins["B"].sequence)
print("Chain A (target chain) structure mask:", query_model.proteins["A"].get_structure_mask())
print("Chain B (scaffold chain) structure mask:", query_model.proteins["B"].get_structure_mask())
# Also keep a reference to the structure mask
scaffold_structure_mask = query_model.proteins["B"].get_structure_mask()

Chains in query: ['A', 'B']
Chain A (target chain): b'SATTIQKELENIVVKERQNKKDTILMGLKVEVPWNYCDWASISFYDVRLESGILDMESIAVKYMTGCDIPPHVTLGITNKDQEANFQRFKELTRNIDLTSLSFTCKEVICFPQSRASKELGANGRAVVMKLEASDDVKALRNVLFNVVPTPRDIFGPVLSDPVWCPHVTIGYVRADDEDNKNSFIELAEAFRGSKIKVIGWCE'
Chain B (scaffold chain): b'MEVQLVESGGGLVQPGGSLRLSCAAXXXFSYNPMGWFRQAPGKGRELVAATGGSXXXYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCAEDGRVRTXXXXXXXPSEYTFWGQGTQVTVSSLEHHHHHH'
Chain A (target chain) structure mask: [1 2]
Chain B (scaffold chain) structure mask: [  1  26  27  28  29  30  31  32  33  51  52  53  54  55  56  57  96  97
  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
 116 128 129 130 131 132 133 134 135]

Generate designs with BoltzGen#

Now let’s run these designs with BoltzGen:

[8]:

N = 25
boltzgen_job = session.models.boltzgen.generate(
    query=query_model,
    N=N,
)
boltzgen_job

[8]:

BoltzGenJob(job_id='666a3793-6c86-495d-8c27-b9938125572e', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 12, 23, 23, 12, 14, 886696, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for completion (around 30 minutes):

[9]:

boltzgen_job.wait_until_done(timeout=30*60)

[9]:

True

Now let’s inspect the designs returned from BoltzGen:

[27]:

boltzgen_designs = boltzgen_job.get()
print("chains in design:", list(boltzgen_designs[0].proteins.keys()))
print("target sequence:", boltzgen_designs[0].proteins["A"].sequence)
print("binder sequence:", boltzgen_designs[0].proteins["B"].sequence)
print("target mask:", boltzgen_designs[0].proteins["A"].get_structure_mask())
print("binder mask:", boltzgen_designs[0].proteins["B"].get_structure_mask())

chains in design: ['A', 'B']
target sequence: b'GGTTIQKELENIVVKERQNKKDTILMGLKVEVPWNYCDWASISFYDVRLESGILDMESIAVKYMTGCDIPPHVTLGITNKDQEANFQRFKELTRNIDLTSLSFTCKEVICFPQSRASKELGANGRAVVMKLEASDDVKALRNVLFNVVPTPRDIFGPVLSDPVWCPHVTIGYVRADDEDNKNSFIELAEAFRGSKIKVIGWCE'
binder sequence: b'GEVQLVESGGGLVQPGGSLRLSCAASGTFTSYAMGWFRQAPGKGRELVAAITSSGSTYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCAAKGANRTNLSALKKESDFIYWGQGTQVTVSSAAGPVVKS'
target mask: []
binder mask: []

As we can see, BoltzGen has generated structure at the residue indices where the structure was masked, so the structure masks are now empty. It has also predicted residues for the masked sequence positions. We will regenerate these sequences with inverse folding in the next step anyway.

[13]:

visualize_pdb(boltzgen_designs[0].make_pdb_string())

Inverse Folding with PoET-2 (nanobody chain)#

We now propose sequences for the designed nanobody backbone using PoET-2.

As a first step, we will craft a prompt that conditions PoET-2 on antibody sequences when it generates. In particular, we sample 500 sequences from a database of Bactrian camel heavy-chain (VH) antibody sequences, clustered at 70% identity.

[15]:

seq_db = requests.get("https://openprotein-public-assets.s3.us-east-1.amazonaws.com/vhh-prompts/data/oas-bactrian_camel_vh_cluster_70_rep_seq.fasta").text
len(seq_db.splitlines())

[15]:

[26]:

import numpy as np
np.random.seed(42)

seqs = seq_db.splitlines()[1::2]
prompt_sample = np.random.choice(seqs, 500, replace=False)
prompt = session.prompt.create_prompt(prompt_sample)
prompt

[26]:

PromptMetadata(id='7af19e70-5a43-4fc6-bbda-2056e73bdd3a', name='prompt-bzt0Yh8wpx', description=None, created_date=datetime.datetime(2025, 12, 23, 23, 42, 14, 839207, tzinfo=TzInfo(0)), num_replicates=1, job_id=None, status=<JobStatus.SUCCESS: 'SUCCESS'>)
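The slice `[1::2]` above relies on every FASTA record occupying exactly two lines (header, then sequence). If a file wraps sequences over several lines, a small parser is safer. The sketch below uses only the standard library; `read_fasta_sequences` and the toy FASTA text are hypothetical and purely illustrative.

```python
import random


def read_fasta_sequences(text: str) -> list[str]:
    """Return sequences from FASTA text, joining records that span several lines."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []  # a header starts a new record
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


# Toy input: record "a" is wrapped over two lines
toy_fasta = ">a\nQVQLVESGG\nGLVQAGG\n>b\nEVQLVESGG\n>c\nQVQLQESGG\n"
seqs = read_fasta_sequences(toy_fasta)

rng = random.Random(42)         # local generator; global random state is untouched
sample = rng.sample(seqs, k=2)  # sampling without replacement
print(seqs)
print(sample)
```

Using a dedicated `random.Random` instance (or `numpy.random.default_rng`) keeps the sampling reproducible without reseeding global state.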

Now let’s run the sequence designs. For each design, we will:

Extract the nanobody chain (binder) from the complex
Mask its sequence to invoke inverse folding
Sample 10 sequences from PoET-2, conditioned on the prompt above

[39]:

from openprotein.model import Model
from openprotein.protein import Protein

poet2_jobs = []
for i in range(N):
    boltzgen_model = boltzgen_designs[i]
    # Mask the binder sequence to indicate that it should be generated
    generated_chain = boltzgen_model.proteins["B"]
    query_chain = generated_chain.mask_sequence_at(scaffold_structure_mask)

    # Use PoET2 to design sequences for the binder backbone
    poet2_job = session.embeddings.poet2.generate(
        query=query_chain,
        prompt=prompt,
        num_samples=10,
        temperature=0.3,
        seed=42,
    )
    poet2_jobs.append(poet2_job)

# Wait for all jobs to complete
for poet2_job in poet2_jobs:
    poet2_job.wait_until_done(timeout=600)
    assert poet2_job.status == "SUCCESS"

Let’s retrieve the results for one of the jobs:

[40]:

poet2_jobs[0].get()

[40]:

[Score(name='generated-sequence-1', sequence='MEVQLVESGGGLVQPGGSLRLSCAASGFTLSDTMGWFRQAPGKGRELVAAGGASLSVYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCIVRARRRWYVGLHRVRWYRRAWGQGTQVTVSSRVTWYRRL', score=array([-106.74054])),
 Score(name='generated-sequence-2', sequence='MEVQLVESGGGLVQPGGSLRLSCAASGVRVYYYMGWFRQAPGKGRELVAAAVAASARYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCALADWYFERVDPALRFRVIRRWGQGTQVTVSSQYYFEWKR', score=array([-110.591])),
 Score(name='generated-sequence-3', sequence='MEVQLVESGGGLVQPGGSLRLSCAASGFTFSSAMGWFRQAPGKGRELVAAGGALLRLYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCTAASGELLLVLRQAYRLERLRWGQGTQVTVSSTIRVRRVK', score=array([-109.5933])),
 Score(name='generated-sequence-4', sequence='MEVQLVESGGGLVQPGGSLRLSCAASETSLLVLMGWFRQAPGKGRELVAAREYLVLRYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCEIEIYEFEAFEEEIYLRIQDVWGQGTQVTVSSRLFYRKRR', score=array([-116.00287])),
 Score(name='generated-sequence-5', sequence='MEVQLVESGGGLVQPGGSLRLSCAASGSWYYYYMGWFRQAPGKGRELVAARGFVIVQYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCWFRQAFDYEREYFYWQAAGREWGQGTQVTVSSRLRRLQRE', score=array([-107.0773])),
 Score(name='generated-sequence-6', sequence='MEVQLVESGGGLVQPGGSLRLSCAASEAGTFYYMGWFRQAPGKGRELVAAAGGSLLTYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCWFRRRRRRLEGLFVGAAAAAAWGQGTQVTVSSRFRLRRRR', score=array([-106.681274])),
 Score(name='generated-sequence-7', sequence='MEVQLVESGGGLVQPGGSLRLSCAASGRVYYYAMGWFRQAPGKGRELVAASRERFYYYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCAWQREYYFELDRFYYIIGVKVWGQGTQVTVSSAFKVFYRR', score=array([-114.47687])),
 Score(name='generated-sequence-8', sequence='MEVQLVESGGGLVQPGGSLRLSCAASGAAVYYCMGWFRQAPGKGRELVAAGAASAAFYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCAASGRARRWYRGRRLEAWYRAWGQGTQVTVSSRHYNVLER', score=array([-102.98752])),
 Score(name='generated-sequence-9', sequence='IEVQLVESGGGLVQPGGSLRLSCAASGFTFSATMGWFRQAPGKGRELVAAGRLRVTVYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCRQARWYYRQATPIYVYVYRRAWGQGTQVTVSSVRRRRRRV', score=array([-103.61996])),
 Score(name='generated-sequence-10', sequence='HEVQLVESGGGLVQPGGSLRLSCAASGSTVYYYMGWFRQAPGKGRELVAAGRFAAARYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCRLRLYELELEARLVFEAGRAEWGQGTQVTVSSRRFRVKVR', score=array([-110.864914]))]
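If you want to pre-filter sequences before folding, one option is to sort each job's results by score. The sketch below uses a hypothetical `ToyScore` stand-in with the same three fields as the results above, placeholder sequences, and three of the scores shown. It assumes that a higher (less negative) score indicates a more likely sequence; in the real results `score` is a one-element array, so a conversion such as `float(score[0])` would probably be needed first.

```python
from typing import NamedTuple


class ToyScore(NamedTuple):
    """Hypothetical stand-in mirroring the name/sequence/score fields above."""
    name: str
    sequence: str
    score: float


results = [
    # Sequences are shortened placeholders; scores are copied from the output above
    ToyScore("generated-sequence-1", "SEQ1", -106.74054),
    ToyScore("generated-sequence-4", "SEQ4", -116.00287),
    ToyScore("generated-sequence-8", "SEQ8", -102.98752),
]

# Assumption: a higher (less negative) score means a more likely sequence
ranked = sorted(results, key=lambda r: r.score, reverse=True)
best = ranked[0]
print(best.name, best.score)
```

This tutorial does not filter at this stage; all ten sequences per design are passed on to structure prediction.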

Structure Prediction with Boltz#

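We validate the designed binders by predicting each complex (target + designed nanobody sequence) with Boltz-2. The loop below builds one complex per PoET-2 sequence (25 designs × 10 sequences = 250 complexes) and submits them all in a single fold job.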

[41]:

poet2_models = []
for i in range(N):
    boltzgen_model = boltzgen_designs[i]

    target_chain = boltzgen_model.proteins["A"]
    generated_chain = boltzgen_model.proteins["B"]

    for _, seq, _ in poet2_jobs[i].get():
        poet2_chain = generated_chain.copy()
        poet2_chain.sequence = seq
        # set the msa as single sequence mode
        poet2_chain.msa = Protein.single_sequence_mode
        target_chain.msa = Protein.single_sequence_mode
        poet2_model = Model(proteins={
            "A": target_chain,
            "B": poet2_chain,
        })
        poet2_models.append(poet2_model)

# Use Boltz-2 to predict the structure for the model
boltz2_job = session.fold.boltz2.fold(
    sequences=poet2_models
)
boltz2_job

[41]:

FoldJob(num_records=250, job_id='1f71e504-d111-497e-b60c-1f6f30ce59e0', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 12, 24, 0, 16, 52, 874059, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
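Because the complexes are submitted as one flat list, each record has to be mapped back to its BoltzGen design and PoET-2 sample later on. A minimal sketch of that bookkeeping follows, assuming results come back in submission order and every PoET-2 job returned exactly 10 sequences; `locate` and `flat_index` are hypothetical helper names.

```python
SAMPLES_PER_DESIGN = 10  # matches num_samples in the PoET-2 call above


def locate(record_index: int, samples_per_design: int = SAMPLES_PER_DESIGN):
    """Map a flat record index to (design index, sample index)."""
    return divmod(record_index, samples_per_design)


def flat_index(design: int, sample: int,
               samples_per_design: int = SAMPLES_PER_DESIGN) -> int:
    """Inverse mapping: position of a (design, sample) pair in the flat list."""
    return design * samples_per_design + sample


print(locate(0))          # (0, 0)
print(locate(143))        # (14, 3)
print(flat_index(14, 3))  # 143
```

If a job could return fewer sequences than requested, storing the `(design, sample)` pair alongside each model when building the list would be more robust than recomputing it from the index.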

Wait for completion. It should take around 30 minutes.

[42]:

boltz2_job.wait_until_done(verbose=True, timeout=60*60)

Waiting: 100%|███████████████████████████████| 100/100 [04:30<00:00,  2.71s/it, status=SUCCESS]

[42]:

True

Let’s get the results from the Boltz fold job:

[47]:

boltz2_results = boltz2_job.get()
boltz2_model = boltz2_results[0][1]
print("Chain A (target chain):", boltz2_model.proteins["A"].sequence)
print("Chain B (scaffold chain):", boltz2_model.proteins["B"].sequence)
visualize_pdb(boltz2_model.make_pdb_string())

Chain A (target chain): b'GGTTIQKELENIVVKERQNKKDTILMGLKVEVPWNYCDWASISFYDVRLESGILDMESIAVKYMTGCDIPPHVTLGITNKDQEANFQRFKELTRNIDLTSLSFTCKEVICFPQSRASKELGANGRAVVMKLEASDDVKALRNVLFNVVPTPRDIFGPVLSDPVWCPHVTIGYVRADDEDNKNSFIELAEAFRGSKIKVIGWCE'
Chain B (scaffold chain): b'MEVQLVESGGGLVQPGGSLRLSCAASGFTLSDTMGWFRQAPGKGRELVAAGGASLSVYYPDSVEGRFTISRDNAKRMVYLQMNSLRAEDTAVYYCIVRARRRWYVGLHRVRWYRRAWGQGTQVTVSSRVTWYRRL'

We can also retrieve the predicted aligned error (pAE) matrix for each complex, which we will use as a metric for ranking candidates.

[49]:

# this will take a minute to pull all the results
boltz2_pae_results = boltz2_job.pae
print("pae shape:", boltz2_pae_results[0][1].shape)

pae shape: (1, 338, 338)

Ranking by metrics#

You can rank candidates using the metrics returned by Boltz. In this example, we use metrics similar to those in the RFdiffusion example:

Binder pLDDT (confidence that the sequence folds to the designed structure)
Complex pAE interaction (confidence that the binder forms the intended interface)
Binder backbone RMSD to the designed structure
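The pAE interaction score averages the two off-diagonal blocks of the pAE matrix, that is, the entries pairing a residue of one chain with a residue of the other. The sketch below shows the idea on a toy matrix using only the standard library. It assumes the matrix rows and columns follow the chain order of the complex, so the split index is the length of whichever chain comes first; `interaction_pae` is a hypothetical helper.

```python
from statistics import fmean


def interaction_pae(pae, n_first):
    """Mean pAE over both off-diagonal blocks of a two-chain complex.

    `pae` is a square nested list; the first `n_first` rows and columns belong
    to the first chain and the remainder to the second chain.
    """
    n_total = len(pae)
    first = range(n_first)
    second = range(n_first, n_total)
    block_12 = [pae[i][j] for i in first for j in second]  # rows chain 1, cols chain 2
    block_21 = [pae[i][j] for i in second for j in first]  # rows chain 2, cols chain 1
    return (fmean(block_12) + fmean(block_21)) / 2


# Toy 3-residue complex: chain 1 has 2 residues, chain 2 has 1 residue
toy_pae = [
    [0.5, 1.0, 8.0],
    [1.0, 0.5, 6.0],
    [9.0, 5.0, 0.5],
]
print(interaction_pae(toy_pae, n_first=2))  # (7.0 + 7.0) / 2 = 7.0
```

With a NumPy array the two blocks are simply `pae[:n_first, n_first:]` and `pae[n_first:, :n_first]`. For the complexes in this tutorial the matrix is 338 × 338, which matches 203 target residues plus 135 binder residues.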

[52]:

import pandas as pd

design_index = []
plddt_scores = []
pae_scores = []
rmsd_scores = []
for i in range(N*10):
    # Get Boltz-2 predictions
    _, boltz2_model = boltz2_results[i]

    design_index.append(f"design{i//10}_poet{i%10}")

    target = boltz2_model.proteins["A"]
    binder = boltz2_model.proteins["B"]

    # Get pLDDT of binder
    plddt_score = np.mean(binder.plddt)
    plddt_scores.append(plddt_score)

    # Get pAE
    _, boltz2_complex_pae = boltz2_pae_results[i]
    binder_target_pae = boltz2_complex_pae.squeeze() # squeeze the shape
    # Assuming the pAE matrix follows the chain order of the complex, target
    # (chain A) then binder (chain B), the chain boundary is at len(target),
    # not len(binder)
    n_target = len(target)
    pae_interaction_1 = np.mean(binder_target_pae[n_target:, :n_target])
    pae_interaction_2 = np.mean(binder_target_pae[:n_target, n_target:])
    pae_interaction_total = (pae_interaction_1 + pae_interaction_2) / 2
    pae_scores.append(pae_interaction_total)

    # RMSD between designed binder and folded binder
    designed_binder = boltzgen_designs[i//10].proteins["B"]
    folded_binder = binder

    binder_rmsd = designed_binder.rmsd(folded_binder, backbone_only=True)
    rmsd_scores.append(binder_rmsd)

df = pd.DataFrame({"designs": design_index, "plddt": plddt_scores, "pae": pae_scores, "rmsd": rmsd_scores})
print(df.head(10))

         designs      plddt        pae      rmsd
0  design0_poet0  79.046585  22.584146  6.667605
1  design0_poet1  79.454254  16.600618  4.073777
2  design0_poet2  82.345818  21.591618  5.587231
3  design0_poet3  79.180252  21.141327  7.206620
4  design0_poet4  78.572571  22.263670  3.883473
5  design0_poet5  78.572571  22.263670  3.883473
6  design0_poet6  82.345818  21.591618  5.587231
7  design0_poet7  79.454254  16.600618  4.073777
8  design0_poet8  79.046585  22.584146  6.667605
9  design0_poet9  79.180252  21.141327  7.206620

Now sort them by their metrics:

[53]:

import pandas as pd

df_sorted = df.sort_values(by=["plddt", "pae", "rmsd"], ascending=[False, True, True])

print(df_sorted.head(10))

# Optionally, save rankings
# df_sorted.to_csv("boltzgen_rankings.csv", index=False)

            designs      plddt        pae      rmsd
143  design14_poet3  82.345818  21.591618  3.796331
148  design14_poet8  82.345818  21.591618  3.796331
72    design7_poet2  82.345818  21.591618  3.831218
77    design7_poet7  82.345818  21.591618  3.831218
90    design9_poet0  82.345818  21.591618  3.970705
94    design9_poet4  82.345818  21.591618  3.970705
98    design9_poet8  82.345818  21.591618  3.970705
133  design13_poet3  82.345818  21.591618  3.998831
138  design13_poet8  82.345818  21.591618  3.998831
172  design17_poet2  82.345818  21.591618  5.500561
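Note that `sort_values` with several columns sorts lexicographically: pAE and RMSD only break exact ties in pLDDT. One alternative, shown here as a sketch and not as the method used in this tutorial, is to rank each metric separately and sort by the sum of ranks. The rows are copied from the first five lines of the unsorted table above; `rank_positions` is a hypothetical helper.

```python
rows = [
    # (design, plddt, pae, rmsd)
    ("design0_poet0", 79.046585, 22.584146, 6.667605),
    ("design0_poet1", 79.454254, 16.600618, 4.073777),
    ("design0_poet2", 82.345818, 21.591618, 5.587231),
    ("design0_poet3", 79.180252, 21.141327, 7.206620),
    ("design0_poet4", 78.572571, 22.263670, 3.883473),
]


def rank_positions(values, higher_is_better):
    """Return the 1-based rank of each value (1 = best); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


plddt_rank = rank_positions([r[1] for r in rows], higher_is_better=True)
pae_rank = rank_positions([r[2] for r in rows], higher_is_better=False)
rmsd_rank = rank_positions([r[3] for r in rows], higher_is_better=False)

# Lower rank sum = better overall; all three metrics contribute equally
rank_sum = [p + a + m for p, a, m in zip(plddt_rank, pae_rank, rmsd_rank)]
ranking = [rows[i][0] for i in sorted(range(len(rows)), key=lambda i: rank_sum[i])]
print(ranking)
```

On these five rows, `design0_poet1` comes out first because it has the best pAE and the second-best pLDDT and RMSD, even though `design0_poet2` has the highest pLDDT. Hard thresholds on each metric are another common choice.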