openprotein.predictor#

Train property predictors from your datasets to enable predictions for new sequences!

Use crossvalidation on the trained predictors to estimate uncertainty.

Trained predictors can be used to design new sequences based on design goals for properties.

class openprotein.predictor.PredictorAPI[source]#

This class defines a high level interface for accessing the predictors API.

__init__(session, embeddings, svd)[source]#
Parameters:
  • session

  • embeddings

  • svd

get_predictor(predictor_id)[source]#

Get a predictor by its predictor_id.

The returned PredictorModel supports the usual prediction-job operations, such as making POST and GET requests for this specific predictor.

Parameters:

predictor_id (str) – the model identifier

Returns:

The predictor model to inspect and make predictions with.

Return type:

PredictorModel

Raises:

HTTPError – If the GET request does not succeed.

list_predictors()[source]#

List predictors available.

Returns:

List of predictor models to inspect and make predictions with.

Return type:

list[PredictorModel]

Raises:

HTTPError – If the GET request does not succeed.

fit_gp(assay, properties, model, feature_type=None, reduction=None, name=None, description=None, **kwargs)[source]#

Fit a GP on an assay with the specified feature model and hyperparameters.

Parameters:
  • assay (AssayMetadata | str) – Assay to fit GP on.

  • properties (list[str]) – Properties in the assay to fit the GP on.

  • feature_type (str) – Type of features to use for encoding sequences: “SVD” or “PLM”.

  • model (str) – Protembed or SVD model to use, depending on the feature type.

  • reduction (str | None) – Type of embedding reduction to use for computing features. Defaults to None.

  • prompt (PromptFuture | str | None) – Prompt if using PoET-based models.

  • name (str | None)

  • description (str | None)

Returns:

The GP model being fit.

Return type:

PredictorModel
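The sketch below shows one way to assemble and check the arguments for fit_gp before submitting a job. `build_fit_gp_kwargs` is a hypothetical helper, not part of the library. The assay ID, model name, and the reduction value "MEAN" are placeholders that this page does not confirm, and the commented-out call assumes the PredictorAPI instance is reachable as `session.predictor`.

```python
VALID_FEATURE_TYPES = {"SVD", "PLM"}  # the two values listed for feature_type


def build_fit_gp_kwargs(assay, properties, model, feature_type=None,
                        reduction=None, name=None, description=None):
    """Validate the arguments and return keyword arguments for fit_gp."""
    if not properties:
        raise ValueError("properties must name at least one assay property")
    if feature_type is not None and feature_type not in VALID_FEATURE_TYPES:
        raise ValueError(
            f"feature_type must be one of {sorted(VALID_FEATURE_TYPES)}"
        )
    kwargs = {"assay": assay, "properties": list(properties), "model": model}
    optional = {
        "feature_type": feature_type,
        "reduction": reduction,
        "name": name,
        "description": description,
    }
    # Leave out unset options so fit_gp falls back to its own defaults (None).
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


kwargs = build_fit_gp_kwargs(
    assay="my-assay-id",            # placeholder assay identifier
    properties=["fitness"],         # placeholder property name
    model="my-embedding-model",     # placeholder model name
    feature_type="PLM",
    reduction="MEAN",               # placeholder reduction value
)
print(kwargs)

# With a real session (assumed attribute name):
# predictor = session.predictor.fit_gp(**kwargs)
```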

delete_predictor(predictor_id)[source]#

Delete predictor model.

Parameters:

predictor_id (str) – The ID of the predictor.

Returns:

True if the deletion succeeded.

Return type:

bool

class openprotein.predictor.PredictorModel[source]#

Class providing predict endpoint for fitted predictor models.

Also implements a Future that waits for the train job to finish.

__init__(session, job=None, metadata=None)[source]#

Initialize from either a train job (job) or predictor metadata (metadata).

Parameters:
  • session (APISession)

  • job (TrainJob | None)

  • metadata (PredictorMetadata | None)

property id#

ID of predictor.

property reduction#

The reduction of the embeddings used to train the predictor, if any.

property sequence_length#

The sequence length constraint on the predictor, if any.

property training_assay: AssayDataset#

The assay the predictor was trained on.

property training_properties: list[str]#

The list of properties the predictor was trained on.

property metadata#

The predictor metadata.

get_model()[source]#

Retrieve the embeddings or SVD model used to create embeddings to train on.

Return type:

EmbeddingModel | SVDModel | None

property model: EmbeddingModel | SVDModel | None#

The embeddings or SVD model used to create embeddings to train on.

delete()[source]#

Delete this predictor model.

Return type:

bool

get(verbose=False)[source]#

Return the training loss curves.

Parameters:

verbose (bool)

get_assay()[source]#

Get assay used for train job.

Returns:

Assay dataset used for train job.

Return type:

AssayDataset

crossvalidate(n_splits=None)[source]#

Run a crossvalidation on the trained predictor.

Parameters:

n_splits (int | None)

Return type:

CVResultFuture
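For orientation, here is a sketch of what a split count conventionally means in k-fold cross-validation: the data are divided into `n_splits` folds, and each fold is held out once while the rest are used for training. This page does not state how the service partitions the assay data, so `kfold_indices` is a hypothetical, purely local illustration and not the library's implementation.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for contiguous k-fold splits."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(kfold_indices(n_samples=7, n_splits=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one held-out fold, which is what lets the held-out predictions be compared against all measured values.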

predict(sequences)[source]#

Make predictions about the trained properties for a list of sequences.

Parameters:

sequences (list[bytes] | list[str])

Return type:

PredictionResultFuture
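A small sketch of pre-checking inputs before calling predict(), using the `sequence_length` property documented above. It assumes the constraint means an exact length match, which this page does not spell out. `split_by_length` is a hypothetical helper, and the commented-out call needs a real PredictorModel.

```python
def split_by_length(sequences, sequence_length):
    """Partition sequences into (usable, rejected) for a length constraint.

    A sequence_length of None is treated as "no constraint".
    Works for both str and bytes sequences.
    """
    if sequence_length is None:
        return list(sequences), []
    usable, rejected = [], []
    for seq in sequences:
        if len(seq) == sequence_length:
            usable.append(seq)
        else:
            rejected.append(seq)
    return usable, rejected


candidates = ["MKVLA", "MKVL", b"MRVLA", "MKVLAAG"]
usable, rejected = split_by_length(candidates, sequence_length=5)
print("usable:", usable)
print("rejected:", rejected)

# With a real predictor:
# usable, rejected = split_by_length(candidates, predictor.sequence_length)
# future = predictor.predict(usable)
```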

single_site(sequence)[source]#

Compute the single-site mutated predictions of a base sequence.

Parameters:

sequence (bytes | str)

Return type:

PredictionResultFuture
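To make "single-site mutated" concrete, the sketch below enumerates every single-residue substitution of a base sequence locally. It assumes the 20 canonical amino acids as the alphabet. Which variants the service actually scores, and in what order, is not stated on this page, and `single_site_variants` is a hypothetical helper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids (assumed alphabet)


def single_site_variants(sequence):
    """Yield (position, wild_type, mutant, variant) for each substitution."""
    for i, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                variant = sequence[:i] + mutant + sequence[i + 1:]
                yield i, wild_type, mutant, variant


variants = list(single_site_variants("MKV"))
print(len(variants))  # 3 positions x 19 substitutions = 57
print(variants[0])    # (0, 'M', 'A', 'AKV')
```

For a base sequence of length L made of canonical residues, this yields 19 × L variants, which gives a rough sense of how many predictions a single-site scan involves.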