Using BoltzGen#
This tutorial shows you how to use the BoltzGen model to design novel protein structures.
The examples are adapted from the original BoltzGen documentation to show how they can be run on the OpenProtein platform and then combined with our other workflows.
Full credit for the examples and the model goes to the authors of BoltzGen.
Unconditional monomer design#
The most basic use of BoltzGen is the unconditional design of a protein structure of a given length. You need three things:
An authenticated OpenProtein session
The length of the protein
The number of designs, N, that you want
[1]:
import openprotein
session = openprotein.connect()
length = 150
N = 3
[2]:
boltzgen = session.models.boltzgen
boltzgen.generate?
Signature:
boltzgen.generate(
query: str | bytes | vault.protein.Protein | vault.model.Model | openprotein.prompt.models.Query | None = None,
design_spec: openprotein.models.foundation.boltzgen_schema.BoltzGenDesignSpec | dict[str, typing.Any] | None = None,
structure_file: str | bytes | typing.BinaryIO | None = None,
n: int = 1,
diffusion_batch_size: int | None = None,
step_scale: float | None = None,
noise_scale: float | None = None,
scaffolds: dict[str, str | bytes | typing.BinaryIO] | None = None,
scaffold_set: openprotein.scaffolds.Scaffolds | str | None = None,
extra_structure_files: dict[str, str | bytes | typing.BinaryIO] | None = None,
**kwargs,
) -> openprotein.models.foundation.boltzgen.BoltzGenFuture
Docstring:
Run a protein structure generate job using BoltzGen.
Parameters
----------
query : str or bytes or Protein or Model or Query, optional
A query representing the design specification. Either `query` or `design_spec`
must be provided.
`query` provides a unified way to represent design specifications on the
OpenProtein platform. In this case, the structure mask of the containing Model
proteins are specified to be designed. Other parameters like binding, group,
secondary structures, etc. are also passed through to BoltzGen.
design_spec : BoltzGenDesignSpec | dict[str, Any] | None, optional
The BoltzGen design specification to run. Either `query` or `design_spec`
must be provided.
`design_spec` exposes a low-level interface to using BoltzGen by accepting the YAML
specification used by official BoltzGen examples.
Can be a typed BoltzGenDesignSpec object or a dict representing the
BoltzGen yaml request specification.
Note: If the design_spec includes file paths, provide
these extra files either using `scaffolds` or `extra_structure_files`.
structure_file : str | bytes | BinaryIO | None, optional
(Deprecated: use `extra_structure_files`)
An input PDB/CIF file used for inpainting or other guided design tasks
where parts of an existing structure are provided. This parameter provides
the actual structure content that corresponds to any FileEntity `path`
fields in the design_spec. Can be:
- A file path (str) to read from
- Raw file content (bytes)
- A file-like object (BinaryIO)
n : int, optional
The number of unique design trajectories to run (default is 1).
diffusion_batch_size : int, optional
The batch size for diffusion sampling. Controls how many samples are
processed in parallel during the diffusion process.
step_scale : float, optional
Scaling factor for the number of diffusion steps. Higher values may
improve quality at the cost of longer generation time.
noise_scale : float, optional
Scaling factor for the noise schedule during diffusion. Controls the
amount of noise added at each step of the reverse diffusion process.
scaffolds : dict[str, str | bytes | BinaryIO] | None, optional
Dictionary mapping scaffold filenames to their content. Each value can be:
- A file path (str) to read from
- Raw file content (bytes)
- A file-like object (BinaryIO)
These files will be packaged into a gzipped tar archive and made available
to the design process under the 'scaffolds/' directory.
scaffold_set : Scaffolds | str | None, optional
A pre-defined scaffold set object. Alternative to providing individual
scaffold files via the `scaffolds` parameter.
extra_structure_files : dict[str, str | bytes | BinaryIO] | None, optional
Dictionary mapping additional structure filenames to their content, with
the same format options as `scaffolds`. These files will be packaged into
the same archive under the 'extra/' directory and can be referenced in
the design specification.
Other Parameters
----------------
**kwargs : dict
Additional keyword args that are passed directly to the boltzgen
inference script. Overwrites any preceding options.
Returns
-------
BoltzGenFuture
A future object that can be used to retrieve the results of the design
job upon completion.
File: ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/boltzgen.py
Type: method
Designs can be specified through our Query interface.
Alternatively, as in the cell below, the Python interface also supports the official BoltzGen design specification. See the Appendix for more on the design specification.
[3]:
design_spec = {
    "entities": [
        {
            "protein": {
                "id": "A",
                "sequence": str(length)
            }
        }
    ]
}
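The spec above can also be built by a small helper. The following is a minimal sketch, not part of the OpenProtein or BoltzGen API: the function name `unconditional_spec` and its arguments are hypothetical, and it only covers the two `sequence` forms that appear in this tutorial (a fixed length such as `"150"` and a range such as `"80..140"`).

```python
def unconditional_spec(length, chain_id="A"):
    """Build a single-chain design spec dict (hypothetical helper).

    `length` is either a fixed int (e.g. 150) or a (min, max) tuple, which is
    written in the "min..max" form seen in the BoltzGen example specs.
    """
    if isinstance(length, int):
        if length < 1:
            raise ValueError("length must be positive")
        sequence = str(length)
    else:
        low, high = length
        if not 1 <= low <= high:
            raise ValueError("expected 1 <= min <= max")
        sequence = f"{low}..{high}"
    return {"entities": [{"protein": {"id": chain_id, "sequence": sequence}}]}


example_spec = unconditional_spec(150)
print(example_spec)
# {'entities': [{'protein': {'id': 'A', 'sequence': '150'}}]}
```

The returned dict has the same shape as the `design_spec` built by hand above, so it could be passed to `boltzgen.generate` in the same way.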
Run the design using BoltzGen:
[4]:
unconditional_design_job = boltzgen.generate(n=N, design_spec=design_spec)
unconditional_design_job
[4]:
BoltzGenJob(job_id='89a8e389-0818-4208-b077-cb80393adb9f', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 15, 55, 37, 405506, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for the job to finish running with wait_until_done.
[5]:
unconditional_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 382.88it/s, status=SUCCESS]
[5]:
True
Retrieve the structure file of a design. The replicate parameter selects which design to retrieve by its 0-based index; with N = 3, valid values are 0 to 2.
[6]:
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical",  # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                           )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

unconditional_design = unconditional_design_job.get(replicate=0)
display_structure(unconditional_design)
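To keep every design rather than only replicate 0, the replicates can be written to disk in a loop. This is a minimal sketch under stated assumptions: `save_designs` is a hypothetical helper, the `.cif` extension assumes the returned structure is mmCIF (consistent with how `display_structure` parses it), and the only call made on the job is `get(replicate=i)` as used above.

```python
from pathlib import Path


def save_designs(job, n, out_dir="designs"):
    """Write replicates 0..n-1 of a finished job to out_dir (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n):
        structure = job.get(replicate=i)  # same call as in the cell above
        path = out / f"design_{i}.cif"    # assumption: mmCIF content
        if isinstance(structure, bytes):
            path.write_bytes(structure)
        else:
            path.write_text(str(structure))
        paths.append(path)
    return paths
```

For the job in this section, the call would be `save_designs(unconditional_design_job, N)`, once `wait_until_done` has returned.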
Vanilla Protein Binding#
One of the basic BoltzGen examples is the design of a protein binder against a protein target. We can retrieve the spec and the example target structure directly from the BoltzGen GitHub repository.
Note that this example requires a structure file. The spec (shown below) refers to the path 1g13.cif; our system automatically patches this path to point to the structure_file that is passed in.
[7]:
import requests
import yaml
import json
design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13prot.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text
print(json.dumps(design_spec, indent=4))
display_structure(structure_file)
{
    "entities": [
        {
            "protein": {
                "id": "C",
                "sequence": "80..140"
            }
        },
        {
            "file": {
                "path": "1g13.cif",
                "include": [
                    {
                        "chain": {
                            "id": "A"
                        }
                    }
                ]
            }
        }
    ]
}
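Before submitting a spec, it can help to check which structure files it refers to and which chains it includes. The sketch below is a hypothetical helper, not part of either library; it assumes the layout shown in the printed spec above (`file` entities with a `path` and a list-valued `include`) and ignores anything else.

```python
def referenced_files(design_spec):
    """Return {path: [included chain ids]} for every file entity in a spec."""
    files = {}
    for entity in design_spec.get("entities", []):
        file_entity = entity.get("file")
        if file_entity is None:
            continue  # e.g. a designed "protein" entity
        chains = [
            item["chain"]["id"]
            for item in file_entity.get("include", [])
            if "chain" in item
        ]
        files[file_entity["path"]] = chains
    return files


# The same spec as printed above, written out so the block is self-contained.
spec = {
    "entities": [
        {"protein": {"id": "C", "sequence": "80..140"}},
        {"file": {"path": "1g13.cif", "include": [{"chain": {"id": "A"}}]}},
    ]
}
print(referenced_files(spec))
# {'1g13.cif': ['A']}
```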
Now we can run the example:
[8]:
vanilla_protein_design_job = boltzgen.generate(
    structure_file=structure_file,
    design_spec=design_spec,
    n=1,
)
vanilla_protein_design_job
[8]:
BoltzGenJob(job_id='8d73ba0a-44a7-4f59-8223-842ba9992141', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 17, 38, 49, 879407, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
vanilla_protein_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:58<00:00, 1.19s/it, status=SUCCESS]
[9]:
True
Display the target together with the binder. Note that in the output, chain B is the target (chain A in the input above) and chain A is the designed binder.
[10]:
from molviewspec import create_builder
vanilla_protein_design = vanilla_protein_design_job.get()
display_structure(vanilla_protein_design)
Vanilla Peptide with Target Binding Site#
Let’s run another example, which designs a peptide binder against a specified binding site on the target. As before, we retrieve the spec from the official BoltzGen repo.
[11]:
design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/beetletert.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/5cqg.cif").text
print(json.dumps(design_spec, indent=4))
display_structure(structure_file)
{
    "entities": [
        {
            "protein": {
                "id": "G",
                "sequence": "12..20"
            }
        },
        {
            "file": {
                "path": "5cqg.cif",
                "include": [
                    {
                        "chain": {
                            "id": "A"
                        }
                    }
                ],
                "binding_types": [
                    {
                        "chain": {
                            "id": "A",
                            "binding": "343,344,251"
                        }
                    }
                ],
                "structure_groups": "all"
            }
        }
    ]
}
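In the spec above, the binding site is given as a comma-separated string of residue indices on a chain. A minimal sketch of building such a `binding_types` entry from a Python list follows; `binding_site` is a hypothetical helper and only reproduces the shape shown in this example, not the full BoltzGen specification.

```python
def binding_site(chain_id, residues):
    """Build one `binding_types` entry from a chain ID and residue indices."""
    residues = list(residues)
    if not residues:
        raise ValueError("at least one residue index is required")
    if any(not isinstance(r, int) for r in residues):
        raise ValueError("residue indices must be integers")
    return {
        "chain": {
            "id": chain_id,
            "binding": ",".join(str(r) for r in residues),
        }
    }


entry = binding_site("A", [343, 344, 251])
print(entry)
# {'chain': {'id': 'A', 'binding': '343,344,251'}}
```

The resulting dict matches the single entry under `binding_types` in the printed spec, so a list of such entries could be placed there for other residues.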
[12]:
vanilla_peptide_design_job = boltzgen.generate(
    structure_file=structure_file,
    design_spec=design_spec,
    n=1,
)
vanilla_peptide_design_job
/home/jmage/Projects/openprotein/openprotein-python-private/openprotein/base.py:136: UserWarning: The requested payload is >1MB. There might be some delays or issues in processing. If the request fails, please try again with smaller sizes.
warnings.warn(
[12]:
BoltzGenJob(job_id='5a87d18e-e087-4e3f-9aa6-ab50e767f5a9', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 10, 30, 17, 58, 29, 302761, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for and retrieve the result. Display the output design.
[13]:
from molviewspec import create_builder
vanilla_peptide_design_job.wait_until_done(verbose=True, timeout=600)
vanilla_peptide_design = vanilla_peptide_design_job.get()
display_structure(vanilla_peptide_design)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 576.49it/s, status=SUCCESS]
Next Steps#
You can run more of the examples from the BoltzGen repository. Note that any command-line argument to boltzgen run can be passed as a keyword argument (kwargs) to the boltzgen.generate function.
You can also move on to the next step of the design pipeline by running inverse folding using PoET-2. Refer to the walkthrough of Inverse Folding with PoET-2 for an example.
Appendix#
Using the BoltzGen design specification#
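To support any non-standard workflows that may not be fully covered by the Query interface, boltzgen.generate also accepts the official BoltzGen design specification through design_spec, either as a typed BoltzGenDesignSpec object or as a dict mirroring the BoltzGen YAML specification, as in the examples above.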