openprotein.models

Unified access to models on the OpenProtein AI platform. Use these models to work at a lower level and craft your own workflows.

Note that the Models API is still a work in progress; we are working to bring all models here for a consistent and simple developer experience.

Interface

class openprotein.models.ModelsAPI(session)

API-like accessor that groups all available protein models.

This class is attached to the main APISession and provides a single, consistent entry point for accessing various models.

Parameters:: session (APISession)

Models

RFdiffusion

class openprotein.models.foundation.rfdiffusion.RFdiffusionModel(session, model_id='rfdiffusion')

RFdiffusion model for generating de novo protein structures.

This model supports tasks such as unconditional design, scaffolding, and binder design.

Parameters:

session (APISession)
model_id (str)

get_metadata()

Get model metadata for this model.

Returns:: The metadata associated with this model.
Return type:: ModelMetadata

generate(n=1, structure_file=None, contigs=None, inpaint_seq=None, provide_seq=None, hotspot=None, T=None, partial_T=None, use_active_site_model=None, use_beta_model=None, symmetry=None, order=None, add_potential=None, scaffold_target_structure_file=None, scaffold_target_use_struct=False, **kwargs)

Run a protein structure generation job using RFdiffusion.

Parameters:

n (int, optional) – The number of unique design trajectories to run (default is 1).
structure_file (BinaryIO, optional) – An input PDB file (as a file-like object) used for inpainting or other guided design tasks where parts of an existing structure are provided.
contigs (int | str, optional) – Defines the lengths and connectivity of chain segments for the desired structure, specified in RFdiffusion’s contig string format. Required for most design tasks. Examples: 150 for a 150-residue design; ‘10-20/A100-110/10-20’ to scaffold residues A100-110 of the input structure with 10-20 new residues on each side (motif scaffolding).
inpaint_seq (str, optional) – A string specifying the regions in the input structure to mask for in-painting. Example: ‘A1-A10/A30-40’.
provide_seq (str, optional) – A string specifying which segments of the contig have a provided sequence. Example: ‘A1-A10/A30-40’.
hotspot (str, optional) – A string specifying hotspot residues to constrain during design, typically for functional sites. Example: ‘A10,A12,A14’.
T (int, optional) – The number of timesteps for the diffusion process.
partial_T (int, optional) – The number of timesteps for partial diffusion.
use_active_site_model (bool, optional) – If True, uses the active site model checkpoint, which has been fine-tuned to hold very small motifs in place more reliably during motif scaffolding (default is False).
use_beta_model (bool, optional) – If True, uses the complex beta model checkpoint, which generates a greater diversity of topologies but has not been extensively experimentally validated (default is False).
symmetry ({"cyclic", "dihedral", "tetrahedral"}, optional) – The type of symmetry to apply to the design.
order (int, optional) – The order of the symmetry (e.g., 3 for C3 or D3 symmetry). Must be provided if symmetry is set.
add_potential (bool, optional) – Whether to add an extra potential to guide the design. Defaults to True for symmetric designs.
scaffold_target_structure_file (str, bytes, BinaryIO, optional) – A PDB file, given as a text string, bytes, or a file-like object, containing a scaffold structure to use as a structural guide. It can also serve as the target in scaffold-guided binder design when scaffold_target_use_struct is True.
scaffold_target_use_struct (bool, optional) – Whether to use the provided scaffold structure as a target; otherwise, it is used only as a topology guide (default is False).
**kwargs (dict) – Additional keyword arguments passed directly to the RFdiffusion inference script. These override any of the options above.

Returns:

A future object that can be used to retrieve the results of the design job upon completion.

Return type:

RFdiffusionFuture
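To make the contig format concrete, the sketch below splits a single-chain contig into segments and computes the range of total lengths it allows. parse_contig, length_range, and Segment are hypothetical helpers, not part of openprotein. The sketch assumes RFdiffusion's convention that a bare range such as 10-20 means a variable number of new residues, while a chain-prefixed range such as A100-110 means residues kept from the input structure. Multi-chain contigs (segments separated by spaces) are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical helper, not part of openprotein.
# One contig token: optional chain letter, then "start" or "start-end".
_TOKEN = re.compile(r"^([A-Za-z])?(\d+)(?:-(\d+))?$")


@dataclass
class Segment:
    chain: Optional[str]  # None = new residues; a letter = residues kept from the input
    start: int
    end: int


def parse_contig(contig: Union[int, str]) -> List[Segment]:
    """Split a single-chain contig such as '10-20/A100-110/10-20' into segments."""
    segments = []
    for token in str(contig).split("/"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"unsupported contig token: {token!r}")
        chain, start, end = match.group(1), int(match.group(2)), match.group(3)
        segments.append(Segment(chain, start, int(end) if end else start))
    return segments


def length_range(segments: List[Segment]) -> Tuple[int, int]:
    """Return the (min, max) total length implied by the segments."""
    lo = hi = 0
    for seg in segments:
        if seg.chain is None:
            # New residues: the range is a variable length.
            lo += seg.start
            hi += seg.end
        else:
            # Motif residues: a fixed stretch copied from the input structure.
            n = seg.end - seg.start + 1
            lo += n
            hi += n
    return lo, hi


print(length_range(parse_contig("10-20/A100-110/10-20")))  # (31, 51)
print(length_range(parse_contig(150)))  # (150, 150)
```

Checking the length range locally can catch a malformed contig before a job is submitted.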

predict(n=1, structure_file=None, contigs=None, inpaint_seq=None, provide_seq=None, hotspot=None, T=None, partial_T=None, use_active_site_model=None, use_beta_model=None, symmetry=None, order=None, add_potential=None, scaffold_target_structure_file=None, scaffold_target_use_struct=False, **kwargs)

Run a protein structure generation job using RFdiffusion. Accepts the same parameters as generate(), with the same meanings, and likewise returns a future that can be used to retrieve the results of the design job upon completion.

Return type:

RFdiffusionFuture

class openprotein.models.foundation.rfdiffusion.RFdiffusionFuture(session, job)

Future for handling the results of an RFdiffusion job.

Parameters:

session (APISession)
job (RFdiffusionJob)

get_pdb(replicate=0)

Retrieve the PDB file for a specific design.

Parameters:

replicate (int) – The 0-based index of the design to retrieve (default is 0).

Returns:

The content of the PDB file as a string.

Return type:

str

get(replicate=0)

Default result accessor. With the default replicate=0, it returns the first PDB.

Parameters:: replicate (int)

cancelled()

Check if the job has been cancelled.

Returns:: True if the job is cancelled, False otherwise.
Return type:: bool

property created_date: datetime

The creation timestamp of the job.

done()

Check if the job has completed.

Returns:: True if the job is done, False otherwise.
Return type:: bool

property end_date: datetime | None

The end timestamp of the job.

property id: str

The unique identifier of the job.

property job_id: str

The unique identifier of the job.

property job_type: str

The type of the job.

property progress_counter: int

The progress counter of the job.

refresh()

Refresh the job status and internal job object.

property start_date: datetime | None

The start timestamp of the job.

property status: JobStatus

The current status of the job.

wait(interval=5, timeout=None, verbose=False)

Wait for the job to complete, then fetch results.

Parameters:

interval (int, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int | None, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

The results of the job.

Return type:

Any

wait_until_done(interval=5, timeout=None, verbose=False)

Wait for the job to complete.

Parameters:

interval (float, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

True if the job completed successfully.

Return type:

bool

Notes

This method does not fetch the job results, unlike wait().

BoltzGen

class openprotein.models.foundation.boltzgen.BoltzGenModel(session, model_id='boltzgen')

BoltzGen model for generating de novo protein structures.

This model supports tasks such as unconditional design, scaffolding, and binder design.

Parameters:

session (APISession)
model_id (str)

get_metadata()

Get model metadata for this model.

Returns:: The metadata associated with this model.
Return type:: ModelMetadata

generate(design_spec, structure_file=None, n=1, diffusion_batch_size=None, step_scale=None, noise_scale=None, **kwargs)

Run a protein structure generation job using BoltzGen.

Parameters:

design_spec (BoltzGenDesignSpec | dict[str, Any]) –
The BoltzGen design specification to run. Can be a typed BoltzGenDesignSpec object or a dict representing the BoltzGen YAML request specification.

Note: If the design_spec includes FileEntity objects with path fields, those paths are placeholders. The actual structure file content must be provided via the structure_file parameter below, as the platform backend currently only accepts structure files this way.
structure_file (str | bytes | BinaryIO | None, optional) – An input PDB/CIF file used for inpainting or other guided design tasks where parts of an existing structure are provided. It supplies the actual structure content for any FileEntity path fields in the design_spec. Can be:
- a file path (str) to read from
- raw file content (bytes)
- a file-like object (BinaryIO)
n (int, optional) – The number of unique design trajectories to run (default is 1).
diffusion_batch_size (int, optional) – The batch size for diffusion sampling. Controls how many samples are processed in parallel during the diffusion process.
step_scale (float, optional) – Scaling factor for the number of diffusion steps. Higher values may improve quality at the cost of longer generation time.
noise_scale (float, optional) – Scaling factor for the noise schedule during diffusion. Controls the amount of noise added at each step of the reverse diffusion process.
**kwargs (dict) – Additional keyword arguments passed directly to the BoltzGen inference script. These override any of the options above.

Returns:

A future object that can be used to retrieve the results of the design job upon completion.

Return type:

BoltzGenFuture
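The three accepted forms of structure_file can be normalized to raw bytes before upload. read_structure below is a hypothetical sketch, not the openprotein implementation; following the parameter description above, it treats any str as a file path.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union


def read_structure(source: Union[str, bytes, BinaryIO]) -> bytes:
    """Return structure file content as bytes from a path, raw bytes, or a binary file object.

    Hypothetical helper, not part of openprotein.
    """
    if isinstance(source, bytes):
        return source  # already raw content
    if isinstance(source, str):
        return Path(source).read_bytes()  # treat str as a file path
    if hasattr(source, "read"):
        data = source.read()  # file-like object, expected in binary mode
        if not isinstance(data, bytes):
            raise TypeError("file-like objects must be opened in binary mode")
        return data
    raise TypeError(f"unsupported structure_file type: {type(source).__name__}")


print(read_structure(io.BytesIO(b"HEADER example\n")))  # b'HEADER example\n'
```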

predict(design_spec, structure_file=None, n=1, diffusion_batch_size=None, step_scale=None, noise_scale=None, **kwargs)

Run a protein structure generation job using BoltzGen. Accepts the same parameters as generate(), with the same meanings, and likewise returns a future that can be used to retrieve the results of the design job upon completion.

Return type:

BoltzGenFuture

class openprotein.models.foundation.boltzgen.BoltzGenFuture(session, job)

Future for handling the results of a BoltzGen job.

Parameters:

session (APISession)
job (BoltzGenJob)

get_pdb(replicate=0)

Retrieve the PDB file for a specific design.

Parameters:

replicate (int) – The 0-based index of the design to retrieve (default is 0).

Returns:

The content of the PDB file as a string.

Return type:

str

get(replicate=0)

Default result accessor. With the default replicate=0, it returns the first PDB.

Parameters:: replicate (int)

cancelled()

Check if the job has been cancelled.

Returns:: True if the job is cancelled, False otherwise.
Return type:: bool

property created_date: datetime

The creation timestamp of the job.

done()

Check if the job has completed.

Returns:: True if the job is done, False otherwise.
Return type:: bool

property end_date: datetime | None

The end timestamp of the job.

property id: str

The unique identifier of the job.

property job_id: str

The unique identifier of the job.

property job_type: str

The type of the job.

property progress_counter: int

The progress counter of the job.

refresh()

Refresh the job status and internal job object.

property start_date: datetime | None

The start timestamp of the job.

property status: JobStatus

The current status of the job.

wait(interval=5, timeout=None, verbose=False)

Wait for the job to complete, then fetch results.

Parameters:

interval (int, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int | None, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

The results of the job.

Return type:

Any

wait_until_done(interval=5, timeout=None, verbose=False)

Wait for the job to complete.

Parameters:

interval (float, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

True if the job completed successfully.

Return type:

bool

Notes

This method does not fetch the job results, unlike wait().

class openprotein.models.foundation.boltzgen_schema.BoltzGenDesignSpec(*, entities, constraints=None)

Complete BoltzGen design specification.

This schema represents the full design specification for BoltzGen, including entities (proteins, ligands, files) and constraints.

Parameters:

entities (list[Entity])
constraints (list[Constraint] | None)

entities

List of entities in the design.

Type:: list[Entity]

constraints

List of constraints for the design.

Type:: list[Constraint] | None

Examples

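>>> from openprotein.models.foundation.boltzgen_schema import (
...     BoltzGenDesignSpec, Entity, ProteinEntity, LigandEntity, Constraint, BondConstraint,
... )
>>> spec = BoltzGenDesignSpec(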
...     entities=[
...         Entity(protein=ProteinEntity(id="A", sequence="ACDEFGHIKLMNPQRSTVWY")),
...         Entity(ligand=LigandEntity(id="B", ccd="ATP"))
...     ],
...     constraints=[
...         Constraint(bond=BondConstraint(atom1=["A", 10, "CA"], atom2=["B", 1, "O"]))
...     ]
... )

classmethod check_entities_not_empty(v)

Ensure at least one entity is provided.
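Because generate() also accepts design_spec as a plain dict representing the BoltzGen YAML request, the example above can be written without the typed classes. This sketch assumes the dict uses the same field names as the schema classes on this page. validate_spec is a hypothetical plain-Python re-implementation of the documented checks (at least one entity, exactly one entity type per entry, ccd or smiles for ligands, at least one constraint type, three-part atom specifications); it is not the library's validator.

```python
from typing import Any, Dict

ENTITY_TYPES = ("protein", "ligand", "file")
CONSTRAINT_TYPES = ("bond", "total_len")

# Dict form of the example above (field names assumed to match the schema classes).
spec: Dict[str, Any] = {
    "entities": [
        {"protein": {"id": "A", "sequence": "ACDEFGHIKLMNPQRSTVWY"}},
        {"ligand": {"id": "B", "ccd": "ATP"}},
    ],
    "constraints": [
        {"bond": {"atom1": ["A", 10, "CA"], "atom2": ["B", 1, "O"]}},
    ],
}


def validate_spec(spec: Dict[str, Any]) -> None:
    """Raise ValueError if the spec breaks one of the documented schema rules (hypothetical)."""
    entities = spec.get("entities") or []
    if not entities:
        raise ValueError("at least one entity is required")
    for i, entity in enumerate(entities):
        present = [k for k in ENTITY_TYPES if entity.get(k) is not None]
        if len(present) != 1:
            raise ValueError(f"entity {i}: expected exactly one of {ENTITY_TYPES}, got {present}")
        ligand = entity.get("ligand")
        if ligand is not None and not (ligand.get("ccd") or ligand.get("smiles")):
            raise ValueError(f"entity {i}: ligand needs ccd or smiles")
    for i, constraint in enumerate(spec.get("constraints") or []):
        if not any(constraint.get(k) is not None for k in CONSTRAINT_TYPES):
            raise ValueError(f"constraint {i}: expected one of {CONSTRAINT_TYPES}")
        bond = constraint.get("bond")
        if bond is not None:
            for atom in (bond["atom1"], bond["atom2"]):
                # Expected shape: [CHAIN_ID, RES_IDX, ATOM_NAME]
                if len(atom) != 3 or not isinstance(atom[1], int):
                    raise ValueError(f"constraint {i}: bad atom spec {atom}")


validate_spec(spec)  # passes silently
```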

class openprotein.models.foundation.boltzgen_schema.Entity(*, protein=None, ligand=None, file=None)

Entity wrapper for different entity types.

Parameters:

protein (ProteinEntity | None)
ligand (LigandEntity | None)
file (FileEntity | None)

protein

Protein entity specification.

Type:: ProteinEntity | None

ligand

Ligand entity specification.

Type:: LigandEntity | None

file

File-based entity specification.

Type:: FileEntity | None

check_exactly_one_entity()

Ensure exactly one entity type is specified.

class openprotein.models.foundation.boltzgen_schema.ProteinEntity(*, id, sequence, secondary_structure=None, binding_types=None, cyclic=False)

Protein entity specification.

Parameters:

id (str | list[str])
sequence (str)
secondary_structure (str | None)
binding_types (str | dict | None)
cyclic (bool)

id

Chain identifier(s) for the protein.

Type:: str or list[str]

sequence

Protein sequence. Can include:
- Amino acid letters (A-Z) for fixed residues
- Design residues given as numbers (e.g., “10” for 10 design residues)
- Ranges (e.g., “15..20” for a random number of design residues between 15 and 20)
- Mixed patterns (e.g., “3..5C6C3” for variable-length design segments interleaved with fixed residues)

Type:: str
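The sequence patterns above can be checked offline by computing the shortest and longest chain a pattern allows. pattern_tokens and length_bounds are hypothetical helpers. They assume uppercase letters are fixed residues, a bare number is that many design residues, and a..b is a design segment whose length is drawn from a to b inclusive; anything else is rejected.

```python
import re
from typing import List, Tuple

# Token types, tried left to right: "a..b" range, bare count, single fixed residue.
_TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Z])")


def pattern_tokens(sequence: str) -> List[Tuple[str, int, int]]:
    """Split a sequence pattern into (kind, min_len, max_len) tokens (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(sequence):
        m = _TOKEN.match(sequence, pos)
        if m is None:
            raise ValueError(f"unexpected character {sequence[pos]!r} at position {pos}")
        lo, hi, count, residue = m.groups()
        if lo is not None:
            tokens.append(("design", int(lo), int(hi)))  # variable-length design segment
        elif count is not None:
            tokens.append(("design", int(count), int(count)))  # fixed number of design residues
        else:
            tokens.append(("fixed:" + residue, 1, 1))  # one fixed amino acid
        pos = m.end()
    return tokens


def length_bounds(sequence: str) -> Tuple[int, int]:
    """Shortest and longest chain the pattern can produce."""
    tokens = pattern_tokens(sequence)
    return sum(t[1] for t in tokens), sum(t[2] for t in tokens)


print(length_bounds("3..5C6C3"))  # (14, 16)
print(length_bounds("15..20"))  # (15, 20)
```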

secondary_structure

Secondary structure specification. Defaults to None.

Type:: str | None

binding_types

Binding type specification. Can be:
- A string with the characters ‘u’ (unspecified), ‘B’ (binding), and ‘N’ (not binding)
- A dict with ‘binding’ and/or ‘not_binding’ keys

Type:: str | dict | None
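For the string form, the sketch below groups residue positions by binding code. binding_positions is hypothetical, and it assumes one character per residue in chain order, reported as 1-based positions; those conventions are not stated above.

```python
from typing import Dict, List

# Character codes as documented above.
CODES = {"u": "unspecified", "B": "binding", "N": "not_binding"}


def binding_positions(binding_types: str) -> Dict[str, List[int]]:
    """Group 1-based residue positions by binding code (assumes one character per residue)."""
    groups: Dict[str, List[int]] = {label: [] for label in CODES.values()}
    for pos, code in enumerate(binding_types, start=1):
        if code not in CODES:
            raise ValueError(f"invalid binding code {code!r} at position {pos}")
        groups[CODES[code]].append(pos)
    return groups


print(binding_positions("uuBBBNu"))
# {'unspecified': [1, 2, 7], 'binding': [3, 4, 5], 'not_binding': [6]}
```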

cyclic

Whether the protein is cyclic. Defaults to False.

Type:: bool

class openprotein.models.foundation.boltzgen_schema.LigandEntity(*, id, ccd=None, smiles=None, binding_types=None)

Ligand entity specification.

Parameters:

id (str | list[str])
ccd (str | None)
smiles (str | None)
binding_types (str | dict | None)

id

Chain identifier(s) for the ligand.

Type:: str or list[str]

ccd

Chemical Component Dictionary (CCD) identifier, e.g., “ATP”.

Type:: str | None

smiles

SMILES string representation of the ligand.

Type:: str | None

binding_types

Binding type specification.

Type:: str | dict | None

check_ccd_or_smiles()

Ensure either ccd or smiles is provided.

class openprotein.models.foundation.boltzgen_schema.FileEntity(*, path, fuse=None, include=None, exclude=None, include_proximity=None, binding_types=None, structure_groups=None, design=None, secondary_structure=None, design_insertions=None)

File-based entity specification (e.g., PDB/CIF files).

Note

When using the generate() method, the path field is overwritten by the structure_file argument. The OpenProtein platform backend currently only accepts structure files via the structure_file parameter, not as paths in the design spec. The path field is included here for compatibility with the BoltzGen YAML format, but will be replaced when submitting to the API.

Parameters:

path (str)
fuse (str | None)
include (str | list[dict] | None)
exclude (list[dict] | None)
include_proximity (list[dict] | None)
binding_types (list[dict] | None)
structure_groups (list[dict] | None)
design (list[dict] | None)
secondary_structure (list[dict] | None)
design_insertions (list[dict] | None)

path

Path to the structure file. This is a placeholder that will be overwritten by the structure_file argument when calling generate(). The actual structure content must be provided via the structure_file parameter.

Type:: str

fuse

Chain ID to fuse with.

Type:: str | None

include

Chains or regions to include. Can be “all” or a list of chain specifications.

Type:: str | list[dict]

exclude

Chains or regions to exclude.

Type:: list[dict] | None

include_proximity

Proximity-based inclusion specifications.

Type:: list[dict] | None

binding_types

Binding type specifications for chains.

Type:: list[dict] | None

structure_groups

Structure group specifications.

Type:: list[dict] | None

design

Design specifications for chains.

Type:: list[dict] | None

secondary_structure

Secondary structure specifications for chains.

Type:: list[dict] | None

design_insertions

Design insertion specifications.

Type:: list[dict] | None

class openprotein.models.foundation.boltzgen_schema.Constraint(*, bond=None, total_len=None)

Constraint wrapper for different constraint types.

Parameters:

bond (BondConstraint | None)
total_len (TotalLengthConstraint | None)

bond

Bond constraint specification.

Type:: BondConstraint | None

total_len

Total length constraint specification.

Type:: TotalLengthConstraint | None

check_at_least_one_constraint()

Ensure at least one constraint type is specified.

class openprotein.models.foundation.boltzgen_schema.BondConstraint(*, atom1, atom2)

Covalent bond constraint between two atoms.

Parameters:

atom1 (list[str | int])
atom2 (list[str | int])

atom1

First atom specification: [CHAIN_ID, RES_IDX, ATOM_NAME].

Type:: list[str | int]

atom2

Second atom specification: [CHAIN_ID, RES_IDX, ATOM_NAME].

Type:: list[str | int]

class openprotein.models.foundation.boltzgen_schema.TotalLengthConstraint(*, min=None, max=None)

Total length constraint for the design.

Parameters:

min (int | None)
max (int | None)

min

Minimum total length.

Type:: int | None

max

Maximum total length.

Type:: int | None
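A TotalLengthConstraint can only be satisfied if it overlaps the lengths the design can produce. feasible_length_range is a hypothetical helper that intersects a (lo, hi) length range, such as one derived from a ProteinEntity sequence pattern, with the optional min and max bounds. It assumes both bounds are inclusive and that None means unbounded.

```python
from typing import Optional, Tuple


def feasible_length_range(
    design_range: Tuple[int, int],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Intersect a (lo, hi) length range with optional inclusive bounds (hypothetical helper).

    Returns the allowed (lo, hi) range, or None if no length satisfies both.
    """
    lo, hi = design_range
    if min_len is not None:
        lo = max(lo, min_len)
    if max_len is not None:
        hi = min(hi, max_len)
    return (lo, hi) if lo <= hi else None


print(feasible_length_range((14, 16), min_len=15))  # (15, 16)
print(feasible_length_range((14, 16), min_len=20, max_len=30))  # None
```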