openprotein.fold

Create PDBs of your protein sequences via our folding models!

Note that for Boltz and AlphaFold2 models, you will also need to use our align workflow to create MSAs.

Interface

class openprotein.fold.FoldAPI(session)[source]#

Fold API provides a high level interface for making protein structure predictions.

boltz2: Boltz2Model#

Boltz-2 model

boltz_2: Boltz2Model#
boltz1x: Boltz1xModel#

Boltz-1x model

boltz_1x: Boltz1xModel#
boltz1: Boltz1Model#

Boltz-1 model

boltz_1: Boltz1Model#
af2: AlphaFold2Model#

AlphaFold-2 model

alphafold2: AlphaFold2Model#
rf3: RosettaFold3Model#

RosettaFold-3 model

rosettafold_3: RosettaFold3Model#
esmfold: ESMFoldModel#

ESMFold model

minifold: MiniFoldModel#

MiniFold model

protenix: ProtenixModel#

Protenix model
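
Several models above are listed under two attribute names (for example boltz2 and boltz_2). As a minimal sketch, the helper below maps either spelling to a single attribute name and looks it up on a FoldAPI-like object. FOLD_ALIASES, FOLD_ATTRIBUTES and get_fold_model are hypothetical names introduced for illustration, not part of the library, and the assumption that both spellings refer to the same model is inferred from the listing.

```python
# Hypothetical helper, not part of openprotein: map the alternate attribute
# spellings listed above onto one name per model.
FOLD_ALIASES = {
    "boltz_2": "boltz2",
    "boltz_1x": "boltz1x",
    "boltz_1": "boltz1",
    "alphafold2": "af2",
    "rosettafold_3": "rf3",
}

# Attribute names documented on FoldAPI (after alias resolution).
FOLD_ATTRIBUTES = {
    "boltz2", "boltz1x", "boltz1", "af2", "rf3",
    "esmfold", "minifold", "protenix",
}


def get_fold_model(fold_api, name):
    """Return the model attribute for `name`, accepting either spelling."""
    attr = FOLD_ALIASES.get(name, name)
    if attr not in FOLD_ATTRIBUTES:
        raise ValueError(f"unknown fold model attribute: {name!r}")
    return getattr(fold_api, attr)
```

Any object exposing the attributes above can be passed as fold_api, which keeps the helper easy to exercise without a live session.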

list_models()[source]#

List the models available for folding your sequences.

get_model(model_id)[source]#

Get model by model_id.

FoldModel supports the usual job operations, e.g. making POST and GET requests for this specific model.

Parameters:

model_id (str) – the model identifier

Returns:

The model

Return type:

FoldModel

Raises:

HTTPError – If the GET request does not succeed.

get_results(job)[source]#

Retrieves the results of a fold job.

Parameters:

job (Job) – The fold job whose results are to be retrieved.

Returns:

An instance of FoldResultFuture

Return type:

FoldResultFuture

Models

class openprotein.fold.ProtenixModel(session, model_id, metadata=None)[source]#

Class providing inference endpoints for Protenix structure prediction.

fold(sequences, diffusion_samples=1, num_recycles=10, num_steps=200, templates=None, **_)[source]#

Request structure prediction with Protenix.

Parameters:
  • sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – List of protein complexes to include in folded output. Protein objects must be tagged with an msa, which can be a Protein.single_sequence_mode for single sequence mode. Alternatively, supply an MSAFuture to use all query sequences as a multimer.

  • diffusion_samples (int) – Number of diffusion samples to use

  • num_recycles (int) – Number of recycling steps to use

  • num_steps (int) – Number of sampling steps to use

  • templates (list[Protein | Complex | Template] | None = None) – List of templates to use for structure prediction.

Returns:

Future for the folding results.

Return type:

FoldResultFuture

class openprotein.fold.Boltz2Model(session, model_id, metadata=None)[source]#

Class providing inference endpoints for Boltz-2 structure prediction model which jointly models complex structures and binding affinities.

fold(sequences, diffusion_samples=1, num_recycles=3, num_steps=200, step_scale=1.638, use_potentials=False, constraints=None, templates=None, properties=None, method=None, **_)[source]#

Request structure prediction with Boltz-2 model.

Parameters:
  • sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – List of protein sequences to include in folded output. Protein objects must be tagged with an msa, which can be a Protein.single_sequence_mode for single sequence mode. Alternatively, supply an MSAFuture to use all query sequences as a multimer.

  • diffusion_samples (int) – Number of diffusion samples to use

  • num_recycles (int) – Number of recycling steps to use

  • num_steps (int) – Number of sampling steps to use

  • step_scale (float) – Scaling factor for diffusion steps.

  • use_potentials (bool = False) – Whether or not to use potentials.

  • constraints (list[dict] | None = None) – List of constraints.

  • templates (list[Protein | Complex | Template] | None = None) – List of templates to use for structure prediction.

  • properties (list[dict] | None = None) – List of additional properties to predict. Each entry should match BoltzProperties.

  • method (str | None) – The experimental method or supervision source used for the prediction. Defaults to None. Supported values (case-insensitive) include: ‘MD’, ‘X-RAY DIFFRACTION’, ‘ELECTRON MICROSCOPY’, ‘SOLUTION NMR’, ‘SOLID-STATE NMR’, ‘NEUTRON DIFFRACTION’, ‘ELECTRON CRYSTALLOGRAPHY’, ‘FIBER DIFFRACTION’, ‘POWDER DIFFRACTION’, ‘INFRARED SPECTROSCOPY’, ‘FLUORESCENCE TRANSFER’, ‘EPR’, ‘THEORETICAL MODEL’, ‘SOLUTION SCATTERING’, ‘OTHER’, ‘AFDB’, ‘BOLTZ-1’. See the Boltz documentation for upstream details.

Returns:

Future for the folding result.

Return type:

FoldResultFuture
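
Because method is matched case-insensitively, a small client-side check can catch typos before a job is submitted. The sketch below is hypothetical (normalize_method and SUPPORTED_METHODS are not part of the library) and assumes the values listed in the method description are the complete set; the description says "include", so the service may accept others.

```python
# Values copied from the `method` parameter description above.
# Assumption: this list is complete; the service may accept more.
SUPPORTED_METHODS = frozenset({
    "MD", "X-RAY DIFFRACTION", "ELECTRON MICROSCOPY", "SOLUTION NMR",
    "SOLID-STATE NMR", "NEUTRON DIFFRACTION", "ELECTRON CRYSTALLOGRAPHY",
    "FIBER DIFFRACTION", "POWDER DIFFRACTION", "INFRARED SPECTROSCOPY",
    "FLUORESCENCE TRANSFER", "EPR", "THEORETICAL MODEL",
    "SOLUTION SCATTERING", "OTHER", "AFDB", "BOLTZ-1",
})


def normalize_method(method):
    """Return the upper-cased method name, or None if no method is given.

    Hypothetical client-side check; raises ValueError for unlisted values.
    """
    if method is None:
        return None
    candidate = method.strip().upper()
    if candidate not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return candidate
```

The result could then be passed along as fold(sequences, method=normalize_method("solution nmr")).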

class openprotein.fold.Boltz1xModel(session, model_id, metadata=None)[source]#

Class providing inference endpoints for Boltz-1x open-source structure prediction model, which adds the use of inference potentials to improve performance.

fold(sequences, diffusion_samples=1, num_recycles=3, num_steps=200, step_scale=1.638, use_potentials=True, constraints=None, **_)[source]#

Request structure prediction with Boltz-1x model. Uses potentials with Boltz-1 model.

Parameters:
  • sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – List of protein sequences to include in folded output. Protein objects must be tagged with an msa, which can be a Protein.single_sequence_mode for single sequence mode. Alternatively, supply an MSAFuture to use all query sequences as a multimer.

  • diffusion_samples (int) – Number of diffusion samples to use

  • num_recycles (int) – Number of recycling steps to use

  • num_steps (int) – Number of sampling steps to use

  • step_scale (float) – Scaling factor for diffusion steps.

  • constraints (Optional[List[dict]]) – List of constraints.

Returns:

Future for the folding complex result.

Return type:

FoldResultFuture

class openprotein.fold.Boltz1Model(session, model_id, metadata=None)[source]#

Class providing inference endpoints for Boltz-1 open-source structure prediction model.

fold(sequences, diffusion_samples=1, num_recycles=3, num_steps=200, step_scale=1.638, use_potentials=False, constraints=None, **_)[source]#

Request structure prediction with Boltz-1 model.

Parameters:
  • sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – List of protein sequences to include in folded output. Protein objects must be tagged with an msa, which can be a Protein.single_sequence_mode for single sequence mode. Alternatively, supply an MSAFuture to use all query sequences as a multimer.

  • diffusion_samples (int) – Number of diffusion samples to use

  • num_recycles (int) – Number of recycling steps to use

  • num_steps (int) – Number of sampling steps to use

  • step_scale (float) – Scaling factor for diffusion steps.

  • use_potentials (bool = False) – Whether or not to use potentials.

  • constraints (Optional[List[dict]]) – List of constraints.

Returns:

Future for the folding complex result.

Return type:

FoldResultFuture

class openprotein.fold.RosettaFold3Model(session, model_id, metadata=None)[source]#

Class providing inference endpoints for RosettaFold-3 structure prediction model.

fold(sequences, diffusion_samples=1, num_recycles=10, num_steps=50, **kwargs)[source]#

Request structure prediction with RosettaFold-3 model.

Parameters:
  • sequences (list[Complex | Protein | str | bytes] | MSAFuture) – List of protein sequences to include in folded output. Protein objects must be tagged with an msa, which can be a Protein.single_sequence_mode for single sequence mode. Alternatively, supply an MSAFuture to use all query sequences as a multimer.

  • diffusion_samples (int) – Number of diffusion samples to use

  • num_recycles (int) – Number of recycling steps to use

  • num_steps (int) – Number of sampling steps to use

Returns:

Future for the folding results.

Return type:

FoldResultFuture
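
The diffusion-based models above share the same sampling arguments but differ in their defaults. As a rough sketch, the table below copies those defaults from the fold() signatures and merges caller overrides on top. FOLD_DEFAULTS and fold_kwargs are hypothetical names, the dictionary keys are just labels chosen here, and the table deliberately leaves out non-sampling arguments such as constraints, templates, properties and method.

```python
# Default sampling arguments, copied from the fold() signatures above.
_BOLTZ_COMMON = {
    "diffusion_samples": 1,
    "num_recycles": 3,
    "num_steps": 200,
    "step_scale": 1.638,
}

FOLD_DEFAULTS = {
    "protenix": {"diffusion_samples": 1, "num_recycles": 10, "num_steps": 200},
    "boltz2": {**_BOLTZ_COMMON, "use_potentials": False},
    "boltz1x": {**_BOLTZ_COMMON, "use_potentials": True},
    "boltz1": {**_BOLTZ_COMMON, "use_potentials": False},
    "rf3": {"diffusion_samples": 1, "num_recycles": 10, "num_steps": 50},
}


def fold_kwargs(model_name, **overrides):
    """Return the sampling arguments for a model with overrides applied."""
    defaults = FOLD_DEFAULTS[model_name]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise TypeError(f"unexpected arguments for {model_name}: {sorted(unknown)}")
    # Build a new dict so the defaults table is never mutated.
    return {**defaults, **overrides}
```

The returned dictionary can be unpacked into a call such as fold(sequences, **fold_kwargs("rf3", diffusion_samples=4)).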

class openprotein.fold.AlphaFold2Model(session, model_id, metadata=None)[source]#

Class providing inference endpoints for AlphaFold2 structure prediction models, based on the implementation by ColabFold.

fold(sequences, num_recycles=None, num_models=1, num_relax=0, **kwargs)[source]#

Request structure prediction with the AlphaFold2 model.

Parameters:
  • sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – List of protein sequences to include in folded output. Protein objects must be tagged with an msa, which can be a Protein.single_sequence_mode for single sequence mode. Alternatively, supply an MSAFuture to use all query sequences as a multimer.

  • num_recycles (int) – Number of times to recycle models.

  • num_models (int) – Number of models to run; the best model will be used.

  • num_relax (int) – Maximum number of iterations for relax.

Returns:

job

Return type:

Job

class openprotein.fold.ESMFoldModel(session, model_id, metadata=None)[source]#

Class providing inference endpoints for Facebook’s ESMFold structure prediction models.

fold(sequences, num_recycles=None, **_)[source]#

Fold sequences using this model.

Parameters:
  • sequences (Sequence[Complex | Protein | str | bytes]) – Sequences to fold.

  • num_recycles (int | None) – Number of times to recycle models.

Return type:

FoldResultFuture
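
A typical round trip with a model that takes plain sequences is: submit with fold(), block with wait_until_done(), then read the scores. The sketch below assumes model is a fold model such as the ESMFold one, that the returned future supports get_plddt(), and that each pLDDT entry is a 1-D sequence of per-residue scores. fold_and_mean_plddt is a hypothetical helper name.

```python
def fold_and_mean_plddt(model, sequences, num_recycles=None,
                        interval=5, timeout=None):
    """Submit a fold, wait for it, and return one mean pLDDT per output."""
    future = model.fold(sequences, num_recycles=num_recycles)
    # wait_until_done() is documented to return True on success.
    if not future.wait_until_done(interval=interval, timeout=timeout):
        raise RuntimeError(f"fold job {future.job_id} did not succeed")
    means = []
    for scores in future.get_plddt():
        # Assumption: each entry is a 1-D array of per-residue scores.
        values = [float(s) for s in scores]
        means.append(sum(values) / len(values))
    return means
```

Averaging pLDDT is only one way to summarise confidence; the per-residue values are usually more informative when inspecting a single structure.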

class openprotein.fold.MiniFoldModel(session, model_id, metadata=None)[source]#

Class providing inference endpoints for MiniFold.

fold(sequences, num_recycles=None, **_)[source]#

Fold sequences using this model.

Parameters:
  • sequences (Sequence[bytes | str]) – Sequences to fold.

  • num_recycles (int | None) – Number of times to recycle models.

Return type:

FoldResultFuture

Results

class openprotein.fold.FoldResultFuture(session, job=None, metadata=None, sequences=None, complexes=None, max_workers=10)[source]#

Fold results represented as a future.

job#

The fold job associated with this future.

Type:

FoldJob

property sequences: list[bytes]#

Get the sequences submitted for the fold request.

Returns:

List of sequences.

Return type:

list[bytes]

property complexes: list[Complex]#

Get the molecular complexes submitted for the fold request.

Returns:

List of complexes.

Return type:

list[Complex]

property id#

Get the ID of the fold request.

Returns:

Fold job ID.

Return type:

str

property metadata: FoldMetadata#

The fold metadata.

property model_id: str#

The fold model used.

get_item(k: int, key: None = None) Structure[source]#
get_item(k: int, key: Literal['pae', 'pde', 'plddt', 'ptm', 'ipae'] | None = None) ndarray
get_item(k: int, key: Literal['affinity']) BoltzAffinity
get_item(k: int, key: Literal['confidence']) list[BoltzConfidence]
get_item(k: int, key: Literal['score', 'metrics'] | None = None) DataFrame

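Get fold results for a single output.

Parameters:
  • k (int) – Index of the fold output to fetch.

  • key (str | None) – Result type to fetch, as listed in the overloads above. None returns the folded structure.

Returns:

The requested result for output k; its type depends on key, as shown in the overloads above.

Return type:

Structure | ndarray | BoltzAffinity | list[BoltzConfidence] | DataFrame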
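
Reading the get_item overloads above, k appears to be an integer index and key selects the result type, with None yielding the structure. Under that reading, a minimal sketch for gathering several results for one output looks like this. fetch_output is a hypothetical helper, and whether a given key is available depends on the model.

```python
def fetch_output(future, k, keys=("plddt", "pae")):
    """Collect the structure and selected results for output index k."""
    # key=None is the overload that returns the Structure.
    result = {"structure": future.get_item(k)}
    for key in keys:
        result[key] = future.get_item(k, key)
    return result
```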

stream(key: None = None) Iterator[Structure][source]#
stream(key: Literal['pae', 'pde', 'plddt', 'ptm', 'ipae'] | None = None) Iterator[ndarray]
stream(key: Literal['affinity']) Iterator[BoltzAffinity]
stream(key: Literal['confidence']) Iterator[list[BoltzConfidence]]
stream(key: Literal['score', 'metrics'] | None = None) Iterator[DataFrame]

Retrieve results for this job as a stream.

Returns:

A generator that yields (key, value) tuples.

Return type:

Generator

get_pae()[source]#

Get the Predicted Aligned Error (PAE) matrix for all outputs.

Returns:

PAE matrices, one per fold output.

Return type:

list[np.ndarray]

Raises:

AttributeError – If PAE is not supported for the model.

property args: dict[str, Any]#

The registered job arguments.

cancelled()#

Check if the job has been cancelled.

Returns:

True if the job is cancelled, False otherwise.

Return type:

bool

property created_date: datetime#

The creation timestamp of the job.

done()#

Check if the job has completed.

Returns:

True if the job is done, False otherwise.

Return type:

bool

property end_date: datetime | None#

The end timestamp of the job.

get_pde()[source]#

Get the Predicted Distance Error (PDE) matrix.

Returns:

PDE matrices, one per fold output.

Return type:

list[np.ndarray]

Raises:

AttributeError – If PDE is not supported for the model.

property job_id: str#

The unique identifier of the job.

property job_type: str#

The type of the job.

property progress_counter: int#

The progress counter of the job.

refresh()#

Refresh the job status and internal job object.

property start_date: datetime | None#

The start timestamp of the job.

property status: JobStatus#

The current status of the job.

wait(interval=5, timeout=None, verbose=False)#

Wait for the job to complete, then fetch results.

Parameters:
  • interval (int, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int | None, optional) – Maximum time in seconds to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

The results of the job.

Return type:

Any

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for the job to complete.

Parameters:
  • interval (float, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time in seconds to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

True if the job completed successfully.

Return type:

bool

Notes

This method does not fetch the job results, unlike wait().
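
For callers who want their own polling policy (custom logging, back-off, cooperative cancellation), the same behaviour can be approximated with refresh() and done(). This is a sketch rather than the library's implementation: poll_until_done is a hypothetical name, and it assumes refresh() updates the state that done() reports. The sleep and clock parameters exist only to make the loop testable.

```python
import time


def poll_until_done(job, interval=5.0, timeout=None,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll job.done() until it is true or `timeout` seconds have elapsed.

    Returns True if the job finished, False if the timeout was reached.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        job.refresh()  # Assumption: refresh() updates what done() reports.
        if job.done():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
```

Note that done() only reports completion; check status afterwards to distinguish success from failure.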

get_plddt()[source]#

Get the Predicted Local Distance Difference Test (pLDDT) scores.

Returns:

pLDDT scores.

Return type:

list[np.ndarray]

Raises:

AttributeError – If pLDDT is not supported for the model.

get_ptm()[source]#

Get the Predicted TM (pTM) scores.

Returns:

pTM scores.

Return type:

list[np.ndarray]

Raises:

AttributeError – If pTM is not supported for the model.

get_ipae()[source]#

Get the interface PAE (iPAE) — a synthetic scalar per unit derived from pae and the per-unit protein-chain layout. Returns one shape-(1,) array per fold output.

Returns:

iPAE scalars (one (1,) array per unit).

Return type:

list[np.ndarray]

Raises:

AttributeError – If iPAE is not supported for the model.

get_score()[source]#

Get the predicted scores.

Returns:

Structure prediction scores.

Return type:

list[pd.DataFrame]

Raises:

AttributeError – If score is not supported for the model.

get_metrics()[source]#

Get the predicted metrics.

Returns:

Structure prediction metrics.

Return type:

list[pd.DataFrame]

Raises:

AttributeError – If metrics is not supported for the model.

get_confidence()[source]#

Retrieve the confidences of the structure prediction.

Note

This is currently supported for Boltz models and Protenix.

Returns:

List of lists of confidence objects (model-specific schema).

Return type:

list[list[BoltzConfidence]] | list[list[ProtenixConfidence]]

Raises:

AttributeError – If confidence is not supported for the model.

get_affinity()[source]#

Retrieve the predicted binding affinities.

Note

This is currently supported only for Boltz models.

Returns:

Nested list of BoltzAffinity objects containing the predicted affinities.

Return type:

list[list[BoltzAffinity]]

Raises:

AttributeError – If affinity is not supported for the model.
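
Each getter above raises AttributeError when the model does not provide that output, so code that works across models can probe for what is available. The following is a minimal sketch; collect_supported is a hypothetical helper, and it treats a missing getter the same way as an unsupported one.

```python
def collect_supported(future, names=("pae", "pde", "plddt", "ptm")):
    """Call future.get_<name>() for each name, skipping unsupported ones."""
    results, unsupported = {}, []
    for name in names:
        try:
            results[name] = getattr(future, f"get_{name}")()
        except AttributeError:
            # Documented signal that the model does not provide this output.
            unsupported.append(name)
    return results, unsupported
```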

get_pae_batch()[source]#

Get the Predicted Aligned Error (PAE) for every unit as a single stacked np.ndarray of shape [N, ...]. Per-unit arrays that differ in size are NaN-padded to the per-axis max shape.

get_pde_batch()[source]#

Like get_pae_batch but for PDE. Shape [N, ...], NaN-padded.

get_plddt_batch()[source]#

Like get_pae_batch but for pLDDT. Shape [N, ...], NaN-padded.

get_ptm_batch()[source]#

Like get_pae_batch but for pTM. Shape [N, ...], NaN-padded.

get_ipae_batch()[source]#

Get iPAE for every unit as a single np.ndarray of shape [N]. Units whose per-unit iPAE could not be computed appear as NaN.
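
To make the NaN-padding used by the stacked PAE, PDE, pLDDT and pTM outputs concrete, here is a dependency-free illustration for the 2-D case. It uses nested lists instead of np.ndarray and is not the library's code; nan_pad_stack is a hypothetical name, and at least one input matrix is assumed.

```python
import math


def nan_pad_stack(matrices):
    """Stack 2-D matrices (lists of rows) into one [N, R, C] nested list.

    R and C are the largest row and column counts over all inputs; cells
    that a smaller matrix does not cover are filled with NaN.
    """
    max_rows = max(len(m) for m in matrices)
    max_cols = max((len(row) for m in matrices for row in m), default=0)
    stacked = []
    for m in matrices:
        # Pad each existing row on the right, then append all-NaN rows.
        padded = [list(row) + [math.nan] * (max_cols - len(row)) for row in m]
        padded += [[math.nan] * max_cols for _ in range(max_rows - len(m))]
        stacked.append(padded)
    return stacked
```

When consuming such padded arrays, NaN-aware reductions (for example numpy.nanmean) avoid the padding skewing per-unit statistics.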

get_confidence_batch()[source]#

Retrieve per-unit confidence objects in a single HTTP call.

Returns a length-N list; each entry is that unit’s parsed confidence list, or None if the server could not fetch the result for that unit.

get_affinity_batch()[source]#

Retrieve per-unit affinity objects in a single HTTP call.

Returns a length-N list; each entry is a BoltzAffinity or None if the server could not fetch that unit’s result.
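
Since the batch getters for confidence and affinity return None for units the server could not fetch, it is usually worth separating those from the usable entries before further processing. A minimal sketch, with split_batch as a hypothetical helper name:

```python
def split_batch(batch):
    """Separate fetched entries from units the server could not return.

    Returns (available, missing): a dict of index -> entry, and a list of
    the indices whose entry was None.
    """
    available = {i: item for i, item in enumerate(batch) if item is not None}
    missing = [i for i, item in enumerate(batch) if item is None]
    return available, missing
```

Keeping the original indices makes it straightforward to retry only the missing units with the per-unit getters.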