Using BoltzGen

This tutorial shows you how to use the BoltzGen model to design novel protein structures.

The examples here are mostly taken from the original BoltzGen documentation, adapted to show how the model can be run on the OpenProtein platform and then combined with our other workflows.

Full credit for the examples and the model goes to the authors of BoltzGen!

Unconditional monomer design

The most basic use of BoltzGen is the unconditional design of a protein structure of a given length. You need three things:

  1. An authenticated OpenProtein session

  2. Length of the protein

  3. Number of designs N desired

[1]:
import openprotein
session = openprotein.connect()
length = 150
N = 3
[2]:
boltzgen = session.models.boltzgen
boltzgen.generate?
Signature:
boltzgen.generate(
    design_spec: dict[str, typing.Any],
    structure_file: str | bytes | typing.BinaryIO | None = None,
    n: int = 1,
    **kwargs,
) -> openprotein.models.foundation.boltzgen.BoltzGenFuture
Docstring:
Run a protein structure generate job using BoltzGen.

Parameters
----------
design_spec : dict[str, Any]
    The BoltzGen design specification to run. This is the Python representation
    of the BoltzGen yaml request specification.
structure_file : BinaryIO, optional
    An input PDB file (as a file-like object) used for inpainting or other
    guided design tasks where parts of an existing structure are provided.
n : int, optional
    The number of unique design trajectories to run (default is 1).

Other Parameters
----------------
**kwargs : dict
    Additional keyword args that are passed directly to the boltzgen
    inference script. Overwrites any preceding options.

Returns
-------
BoltzGenFuture
    A future object that can be used to retrieve the results of the design
    job upon completion.
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/boltzgen.py
Type:      method

Craft the design specification, which follows the official YAML specification from BoltzGen. For an unconditional design, the sequence field holds the desired length as a string.

[3]:
design_spec = {
    "entities": [
        {
            "protein": {
                "id": "A",
                "sequence": str(length)
            }
        }
    ]
}
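When sweeping over several lengths, the dictionary above can be produced by a small helper. This is only a sketch: make_monomer_spec is a hypothetical helper, not part of the OpenProtein client, and it assumes nothing beyond the entities / protein / id / sequence layout shown in this tutorial.

```python
import json


def make_monomer_spec(length, chain_id="A"):
    """Build an unconditional single-chain design spec (hypothetical helper)."""
    if not isinstance(length, int) or length < 1:
        raise ValueError("length must be a positive integer")
    # The length is passed as a string in the sequence field, as in the cell above.
    return {"entities": [{"protein": {"id": chain_id, "sequence": str(length)}}]}


spec = make_monomer_spec(150)
print(json.dumps(spec, indent=4))
```

The returned dictionary can be passed as design_spec in the same way as the hand-written one.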

Run the design using BoltzGen:

[4]:
unconditional_design_job = boltzgen.generate(design_spec=design_spec, n=N)
unconditional_design_job
[4]:
BoltzGenJob(job_id='89a8e389-0818-4208-b077-cb80393adb9f', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 15, 55, 37, 405506, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for the job to finish running with wait_until_done.

[5]:
unconditional_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 382.88it/s, status=SUCCESS]
[5]:
True

Retrieve the PDB file of a design. The replicate parameter selects which design to retrieve by its 0-based index; with N = 3, valid values are 0 to 2.

[6]:
from molviewspec import create_builder

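def display_structure(structure_string):
    """Render a structure given as text in the notebook, colored by chain."""
    builder = create_builder()
    # "mystructure.cif" is a placeholder name; its contents are supplied via data= below.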
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

unconditional_design = unconditional_design_job.get(replicate=0)
display_structure(unconditional_design)
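To fetch every design rather than only the first, the same call can be looped over the replicate indices. The sketch below assumes only the get(replicate=i) call used above; collect_designs and FakeJob are hypothetical names, and FakeJob stands in for a finished job so that the snippet runs without a session.

```python
def collect_designs(job, n):
    """Return the designs for replicates 0..n-1 of a finished job."""
    return [job.get(replicate=i) for i in range(n)]


class FakeJob:
    """Offline stand-in that mimics job.get(replicate=...)."""

    def get(self, replicate=0):
        return f"structure for replicate {replicate}"


designs = collect_designs(FakeJob(), 3)
print(len(designs))
```

In the notebook, collect_designs(unconditional_design_job, N) would return the designs in order, each of which can be passed to display_structure.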

Vanilla Protein Binding

One of the basic examples in BoltzGen is designing a protein binder against a protein target. We can retrieve the design spec and the example target structure directly from the BoltzGen GitHub repository.

Note that this example requires a structure file. The spec (printed below) refers to the path 1g13.cif; our system automatically patches this path to point to the structure_file you pass in.

[7]:
import requests
import yaml
import json

design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13prot.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text

print(json.dumps(design_spec, indent=4))
display_structure(structure_file)
{
    "entities": [
        {
            "protein": {
                "id": "C",
                "sequence": "80..140"
            }
        },
        {
            "file": {
                "path": "1g13.cif",
                "include": [
                    {
                        "chain": {
                            "id": "A"
                        }
                    }
                ]
            }
        }
    ]
}
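Before submitting, it can help to check which chains a spec asks BoltzGen to design and which file paths it references. The sketch below is a hypothetical helper (summarize_spec is not part of the OpenProtein client). It reads a value such as "80..140" as a minimum..maximum length range, which is an assumption based on the example, and it skips any sequence value that is not purely numeric.

```python
def summarize_spec(design_spec):
    """Return (designed chain length ranges, referenced file paths)."""
    designed, files = {}, []
    for entity in design_spec.get("entities", []):
        if "protein" in entity:
            protein = entity["protein"]
            low, sep, high = str(protein["sequence"]).partition("..")
            # Only "150" or "80..140" style values are interpreted here.
            if low.isdigit() and (not sep or high.isdigit()):
                designed[protein["id"]] = (int(low), int(high) if sep else int(low))
        elif "file" in entity:
            files.append(entity["file"]["path"])
    return designed, files


example_spec = {
    "entities": [
        {"protein": {"id": "C", "sequence": "80..140"}},
        {"file": {"path": "1g13.cif", "include": [{"chain": {"id": "A"}}]}},
    ]
}
print(summarize_spec(example_spec))  # ({'C': (80, 140)}, ['1g13.cif'])
```

The single path in the second element is the one that gets patched to the uploaded structure_file.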

Now we can run the example:

[8]:
vanilla_protein_design_job = boltzgen.generate(
    structure_file=structure_file,
    design_spec=design_spec,
    n=1,
)
vanilla_protein_design_job
[8]:
BoltzGenJob(job_id='8d73ba0a-44a7-4f59-8223-842ba9992141', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 17, 38, 49, 879407, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
vanilla_protein_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:58<00:00,  1.19s/it, status=SUCCESS]
[9]:
True

Display the target together with the binder. Note that in the output, chain B is the target (chain A of the input file above) and chain A is the designed binder.

[10]:
# display_structure was defined earlier, so no further imports are needed here.
vanilla_protein_design = vanilla_protein_design_job.get()
display_structure(vanilla_protein_design)

Vanilla Peptide with Target Binding Site

Let’s run another example, which designs a peptide binder against a specified binding site on the target. As before, we retrieve the spec and the target structure from the official BoltzGen repo.

[11]:
design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/beetletert.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/5cqg.cif").text

print(json.dumps(design_spec, indent=4))
display_structure(structure_file)
{
    "entities": [
        {
            "protein": {
                "id": "G",
                "sequence": "12..20"
            }
        },
        {
            "file": {
                "path": "5cqg.cif",
                "include": [
                    {
                        "chain": {
                            "id": "A"
                        }
                    }
                ],
                "binding_types": [
                    {
                        "chain": {
                            "id": "A",
                            "binding": "343,344,251"
                        }
                    }
                ],
                "structure_groups": "all"
            }
        }
    ]
}
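The binding entry in the spec above lists the target residues that make up the binding site. A small hypothetical helper (binding_residues, not part of the OpenProtein client) can pull these out for a quick sanity check. It assumes the value is a comma-separated list of integers, as in this example; any other notation would raise a ValueError.

```python
def binding_residues(design_spec):
    """Map chain id -> residue indices marked as binding in file entities."""
    sites = {}
    for entity in design_spec.get("entities", []):
        for item in entity.get("file", {}).get("binding_types", []):
            chain = item["chain"]
            if "binding" in chain:
                # Assumes comma-separated integers such as "343,344,251".
                sites[chain["id"]] = [int(tok) for tok in str(chain["binding"]).split(",")]
    return sites


peptide_spec = {
    "entities": [
        {"protein": {"id": "G", "sequence": "12..20"}},
        {
            "file": {
                "path": "5cqg.cif",
                "include": [{"chain": {"id": "A"}}],
                "binding_types": [{"chain": {"id": "A", "binding": "343,344,251"}}],
                "structure_groups": "all",
            }
        },
    ]
}
print(binding_residues(peptide_spec))  # {'A': [343, 344, 251]}
```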
[12]:
vanilla_peptide_design_job = boltzgen.generate(
    structure_file=structure_file,
    design_spec=design_spec,
    n=1,
)
vanilla_peptide_design_job
/home/jmage/Projects/openprotein/openprotein-python-private/openprotein/base.py:136: UserWarning: The requested payload is >1MB. There might be some delays or issues in processing. If the request fails, please try again with smaller sizes.
  warnings.warn(
[12]:
BoltzGenJob(job_id='5a87d18e-e087-4e3f-9aa6-ab50e767f5a9', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 17, 58, 29, 302761, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for and retrieve the result. Display the output design.

[13]:
# Wait on the peptide design job submitted above, then fetch and display its result.
vanilla_peptide_design_job.wait_until_done(verbose=True, timeout=600)
vanilla_peptide_design = vanilla_peptide_design_job.get()
display_structure(vanilla_peptide_design)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 576.49it/s, status=SUCCESS]

Next Steps

You can run more of the examples from the BoltzGen repository. Note that any command-line argument accepted by boltzgen run can be passed as a keyword argument to the boltzgen.generate function.

You can also move on to the next step of the design pipeline by running inverse folding using PoET-2. Refer to the walkthrough of Inverse Folding with PoET-2 for an example.