
Using BoltzGen#

This tutorial shows you how to use the BoltzGen model to design novel protein structures.

The examples here are mostly adapted from the original BoltzGen documentation to show how they can be run on the OpenProtein platform and then combined with our other workflows.

Full credit for the examples and the model goes to the authors of BoltzGen!

Unconditional monomer design#

The simplest way to run BoltzGen is an unconditional design of a protein structure of a given length. You need three things:

  1. An authenticated OpenProtein session

  2. Length of the protein

  3. Number of designs N desired

[1]:
import openprotein
session = openprotein.connect()
length = 150
N = 3
[2]:
boltzgen = session.models.boltzgen
boltzgen.generate?
Signature:
boltzgen.generate(
    query: str | bytes | vault.protein.Protein | vault.model.Model | openprotein.prompt.models.Query | None = None,
    design_spec: openprotein.models.foundation.boltzgen_schema.BoltzGenDesignSpec | dict[str, typing.Any] | None = None,
    structure_file: str | bytes | typing.BinaryIO | None = None,
    n: int = 1,
    diffusion_batch_size: int | None = None,
    step_scale: float | None = None,
    noise_scale: float | None = None,
    scaffolds: dict[str, str | bytes | typing.BinaryIO] | None = None,
    scaffold_set: openprotein.scaffolds.Scaffolds | str | None = None,
    extra_structure_files: dict[str, str | bytes | typing.BinaryIO] | None = None,
    **kwargs,
) -> openprotein.models.foundation.boltzgen.BoltzGenFuture
Docstring:
Run a protein structure generate job using BoltzGen.

Parameters
----------
query : str or bytes or Protein or Model or Query, optional
    A query representing the design specification. Either `query` or `design_spec`
    must be provided.
    `query` provides a unified way to represent design specifications on the
    OpenProtein platform. In this case, the structure mask of the containing Model
    proteins are specified to be designed. Other parameters like binding, group,
    secondary structures, etc. are also passed through to BoltzGen.
design_spec : BoltzGenDesignSpec | dict[str, Any] | None, optional
    The BoltzGen design specification to run. Either `query` or `design_spec`
    must be provided.
    `design_spec` exposes a low-level interface to using BoltzGen by accepting the YAML
    specification used by official BoltzGen examples.
    Can be a typed BoltzGenDesignSpec object or a dict representing the
    BoltzGen yaml request specification.
    Note: If the design_spec includes file paths, provide
    these extra files either using `scaffolds` or `extra_structure_files`.
structure_file : str | bytes | BinaryIO | None, optional
    (Deprecated: use `extra_structure_files`)
    An input PDB/CIF file used for inpainting or other guided design tasks
    where parts of an existing structure are provided. This parameter provides
    the actual structure content that corresponds to any FileEntity `path`
    fields in the design_spec. Can be:
    - A file path (str) to read from
    - Raw file content (bytes)
    - A file-like object (BinaryIO)
n : int, optional
    The number of unique design trajectories to run (default is 1).
diffusion_batch_size : int, optional
    The batch size for diffusion sampling. Controls how many samples are
    processed in parallel during the diffusion process.
step_scale : float, optional
    Scaling factor for the number of diffusion steps. Higher values may
    improve quality at the cost of longer generation time.
noise_scale : float, optional
    Scaling factor for the noise schedule during diffusion. Controls the
    amount of noise added at each step of the reverse diffusion process.
scaffolds : dict[str, str | bytes | BinaryIO] | None, optional
    Dictionary mapping scaffold filenames to their content. Each value can be:
    - A file path (str) to read from
    - Raw file content (bytes)
    - A file-like object (BinaryIO)
    These files will be packaged into a gzipped tar archive and made available
    to the design process under the 'scaffolds/' directory.
scaffold_set : Scaffolds | str | None, optional
    A pre-defined scaffold set object. Alternative to providing individual
    scaffold files via the `scaffolds` parameter.
extra_structure_files : dict[str, str | bytes | BinaryIO] | None, optional
    Dictionary mapping additional structure filenames to their content, with
    the same format options as `scaffolds`. These files will be packaged into
    the same archive under the 'extra/' directory and can be referenced in
    the design specification.

Other Parameters
----------------
**kwargs : dict
    Additional keyword args that are passed directly to the boltzgen
    inference script. Overwrites any preceding options.

Returns
-------
BoltzGenFuture
    A future object that can be used to retrieve the results of the design
    job upon completion.
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/boltzgen.py
Type:      method

Designs can be specified through our Query interface or through the official BoltzGen design specification, which our Python interface also accepts.

The cell below uses a minimal design specification for a single chain of the chosen length. See the Appendix for more on this format.

[3]:
design_spec = {
    "entities": [
        {
            "protein": {
                "id": "A",
                "sequence": str(length)
            }
        }
    ]
}
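The `sequence` field above holds a fixed length, while the binder example later in this tutorial uses a range written as "80..140". As a minimal sketch, the helper below builds either form. `make_unconditional_spec` is a hypothetical name introduced for illustration and is not part of the OpenProtein client; it only assembles a plain dict in the same shape as the cell above.

```python
def make_unconditional_spec(min_length, max_length=None, chain_id="A"):
    """Build a single-chain design spec dict (hypothetical helper, not an OpenProtein API)."""
    if min_length < 1:
        raise ValueError("min_length must be a positive integer")
    if max_length is None or max_length == min_length:
        # Fixed length, written the same way as str(length) in the tutorial cell.
        sequence = str(min_length)
    else:
        if max_length < min_length:
            raise ValueError("max_length must be >= min_length")
        # Length range, mirroring the "80..140" form used in the binder example.
        sequence = f"{min_length}..{max_length}"
    return {"entities": [{"protein": {"id": chain_id, "sequence": sequence}}]}


spec_fixed = make_unconditional_spec(150)
spec_range = make_unconditional_spec(80, 140)
```

Either dict can then be passed as `design_spec` in the same way as the hand-written one.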

Run the design using BoltzGen:

[4]:
unconditional_design_job = boltzgen.generate(n=N, design_spec=design_spec)  # the signature names this parameter `n`
unconditional_design_job
[4]:
BoltzGenJob(job_id='89a8e389-0818-4208-b077-cb80393adb9f', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 15, 55, 37, 405506, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for the job to finish running with wait_until_done.

[5]:
unconditional_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 382.88it/s, status=SUCCESS]
[5]:
True

Retrieve the structure of a design. Use the replicate parameter to select a design by its 0-based index, in this case 0 to 2.

[6]:
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

unconditional_design = unconditional_design_job.get(replicate=0)
display_structure(unconditional_design)
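To keep every design rather than just the first, the replicate index can be looped over. The sketch below is one way to do that under two assumptions: that `job.get(replicate=i)` returns the structure as `str` or `bytes` (the tutorial passes it straight to the viewer as text), and that `.cif` is a suitable extension because the viewer above parses the result as mmCIF. `save_designs` is a hypothetical helper, not part of the OpenProtein client.

```python
from pathlib import Path


def save_designs(job, n, out_dir="designs", suffix=".cif"):
    """Write replicates 0..n-1 of a finished job to disk and return the file paths.

    Hypothetical helper: it relies only on job.get(replicate=i), as used in the tutorial.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(n):
        structure = job.get(replicate=i)
        # Assumption: the result is text or raw bytes; normalise to bytes before writing.
        data = structure.encode("utf-8") if isinstance(structure, str) else structure
        target = out_path / f"design_{i}{suffix}"
        target.write_bytes(data)
        written.append(target)
    return written
```

For the job above this would be called as `save_designs(unconditional_design_job, N)` once the job has finished.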

Vanilla Protein Binding#

One of the basic BoltzGen examples designs a protein binder against a protein target. We can retrieve the spec and the example target structure directly from the BoltzGen GitHub repository.

Note that this example needs a structure file. The spec (printed below) refers to the path 1g13.cif; our system automatically patches this path to refer to the structure_file that we pass in.

[7]:
import requests
import yaml
import json

design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13prot.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text

print(json.dumps(design_spec, indent=4))
display_structure(structure_file)
{
    "entities": [
        {
            "protein": {
                "id": "C",
                "sequence": "80..140"
            }
        },
        {
            "file": {
                "path": "1g13.cif",
                "include": [
                    {
                        "chain": {
                            "id": "A"
                        }
                    }
                ]
            }
        }
    ]
}
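Because a spec can point at files by path, it can help to list those paths before submitting, so you know which structure files have to be supplied. A minimal sketch, assuming the spec follows the `entities` / `file` / `path` layout printed above; `referenced_paths` is a hypothetical helper written for this tutorial.

```python
def referenced_paths(design_spec):
    """Return the `path` of every `file` entity in a design spec dict (hypothetical helper)."""
    paths = []
    for entity in design_spec.get("entities", []):
        file_entry = entity.get("file")
        if file_entry and "path" in file_entry:
            paths.append(file_entry["path"])
    return paths


# Same structure as the spec printed above.
example_spec = {
    "entities": [
        {"protein": {"id": "C", "sequence": "80..140"}},
        {"file": {"path": "1g13.cif", "include": [{"chain": {"id": "A"}}]}},
    ]
}
print(referenced_paths(example_spec))  # ['1g13.cif']
```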

Now we can run the example:

[8]:
vanilla_protein_design_job = boltzgen.generate(
    structure_file=structure_file,
    design_spec=design_spec,
    n=1,
)
vanilla_protein_design_job
[8]:
BoltzGenJob(job_id='8d73ba0a-44a7-4f59-8223-842ba9992141', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 17, 38, 49, 879407, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
vanilla_protein_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:58<00:00,  1.19s/it, status=SUCCESS]
[9]:
True

Display the target together with the binder. Note that in the output, chain B is the target (chain A of the input structure above) and chain A is the designed binder.

[10]:
# display_structure was defined earlier, so no further import is needed here
vanilla_protein_design = vanilla_protein_design_job.get()
display_structure(vanilla_protein_design)

Vanilla Peptide with Target Binding Site#

Let’s run another example, which designs a peptide binder against a specified binding site on the target. As before, we retrieve the spec and target structure from the official BoltzGen repo.

[11]:
design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/beetletert.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/5cqg.cif").text

print(json.dumps(design_spec, indent=4))
display_structure(structure_file)
{
    "entities": [
        {
            "protein": {
                "id": "G",
                "sequence": "12..20"
            }
        },
        {
            "file": {
                "path": "5cqg.cif",
                "include": [
                    {
                        "chain": {
                            "id": "A"
                        }
                    }
                ],
                "binding_types": [
                    {
                        "chain": {
                            "id": "A",
                            "binding": "343,344,251"
                        }
                    }
                ],
                "structure_groups": "all"
            }
        }
    ]
}
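The `binding_types` entry is what marks the target binding site, here residues 343, 344 and 251 on chain A. As a quick sanity check before submitting, those indices can be pulled out of the spec. The sketch below assumes the `binding` value is a comma-separated list of integers, as in this example; other notations the spec format may allow are not handled. `binding_residues` is a hypothetical helper.

```python
def binding_residues(design_spec):
    """Map chain id -> residue indices listed under `binding` (hypothetical helper)."""
    result = {}
    for entity in design_spec.get("entities", []):
        file_entry = entity.get("file", {})
        for item in file_entry.get("binding_types", []):
            chain = item.get("chain", {})
            binding = chain.get("binding")
            if binding is None:
                continue
            # Assumption: comma-separated integers only, e.g. "343,344,251".
            indices = [int(tok) for tok in str(binding).split(",") if tok.strip()]
            result.setdefault(chain.get("id"), []).extend(indices)
    return result


# Same structure as the spec printed above.
peptide_spec = {
    "entities": [
        {"protein": {"id": "G", "sequence": "12..20"}},
        {
            "file": {
                "path": "5cqg.cif",
                "include": [{"chain": {"id": "A"}}],
                "binding_types": [{"chain": {"id": "A", "binding": "343,344,251"}}],
                "structure_groups": "all",
            }
        },
    ]
}
print(binding_residues(peptide_spec))  # {'A': [343, 344, 251]}
```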
[12]:
vanilla_peptide_design_job = boltzgen.generate(
    structure_file=structure_file,
    design_spec=design_spec,
    n=1,
)
vanilla_peptide_design_job
/home/jmage/Projects/openprotein/openprotein-python-private/openprotein/base.py:136: UserWarning: The requested payload is >1MB. There might be some delays or issues in processing. If the request fails, please try again with smaller sizes.
  warnings.warn(
[12]:
BoltzGenJob(job_id='5a87d18e-e087-4e3f-9aa6-ab50e767f5a9', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 17, 58, 29, 302761, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for and retrieve the result. Display the output design.

[13]:
# wait on the peptide job submitted above, then fetch and display its design
vanilla_peptide_design_job.wait_until_done(verbose=True, timeout=600)
vanilla_peptide_design = vanilla_peptide_design_job.get()
display_structure(vanilla_peptide_design)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 576.49it/s, status=SUCCESS]

Next Steps#

You can run more of the examples from the BoltzGen repository. Note that any command-line argument accepted by boltzgen run can be passed as a keyword argument to the boltzgen.generate function.

You can also move on to the next step of the design pipeline by running inverse folding using PoET-2. Refer to the walkthrough of Inverse Folding with PoET-2 for an example.

Appendix#

Using the BoltzGen design specification#

To support any non-standard workflows that may not be fully covered

[ ]: