openprotein.predictor

Train property predictors from your datasets to enable predictions for new sequences!

Use crossvalidation on the trained predictors to estimate uncertainty.

Trained predictors can be used to design new sequences based on design goals for properties.

class openprotein.predictor.PredictorAPI(session)

Predictor API providing the interface to train predictors and make predictions with them.

Parameters: session (APISession)

get_predictor(predictor_id, include_stats=False, include_calibration_curves=False)

Get a predictor by its predictor_id.

The returned PredictorModel supports the usual prediction job operations, e.g. making POST and GET requests for this specific predictor.

Parameters:

predictor_id (str) – The model identifier.
include_stats (bool) – Whether to include stats of the predictor from the latest evaluation (pearson, spearman, ece). Default is False.
include_calibration_curves (bool) – Whether to include calibration curves of the predictor from the latest evaluation. Default is False.

Returns:

The predictor model to inspect and make predictions with.

Return type:

PredictorModel

Raises:

HTTPError – If the GET request does not succeed.

list_predictors(limit=100, offset=0, include_stats=False, include_calibration_curves=False)

List the available predictors.

Parameters:

limit (int) – Maximum number of predictors to return. Default is 100.
offset (int) – Offset into the list of predictors, for paged queries. Default is 0.
include_stats (bool) – Whether to include stats of each predictor from the latest evaluation (pearson, spearman, ece). Default is False.
include_calibration_curves (bool) – Whether to include calibration curves of each predictor from the latest evaluation. Default is False.

Returns:

List of predictor models to inspect and make predictions with.

Return type:

list[PredictorModel]

Raises:

HTTPError – If the GET request does not succeed.
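The limit and offset parameters support paged queries. A minimal sketch of paging through every predictor follows. It assumes `api` is a PredictorAPI instance and that a page shorter than the requested limit marks the end of the list (the documentation above does not state the end-of-list behaviour). The helper name `iter_predictors` is hypothetical and not part of the library.

```python
from typing import Any, Iterator


def iter_predictors(api: Any, page_size: int = 100) -> Iterator[Any]:
    """Yield every predictor by paging through list_predictors.

    `api` is assumed to be a PredictorAPI instance. This helper is a
    hypothetical convenience, not part of the library.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = api.list_predictors(limit=page_size, offset=offset)
        yield from page
        # Assumption: a short (or empty) page means nothing is left.
        if len(page) < page_size:
            break
        offset += page_size
```

Because the helper is a generator, a caller can stop early without requesting the remaining pages.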

fit_gp(assay, properties, model, feature_type=None, reduction=None, name=None, description=None, **kwargs)

Fit a GP on an assay with the specified feature model and hyperparameters.

Parameters:

assay (AssayMetadata or AssayDataset or str) – Assay to fit the GP on.
properties (list of str) – Properties in the assay to fit the GP on.
model (EmbeddingModel or SVDModel or str) – Instance of either EmbeddingModel or SVDModel to use, depending on the feature type. Can also be a str specifying the model ID, in which case feature_type must be specified.
feature_type (FeatureType or None) – Type of features to use for encoding sequences: “SVD” or “PLM”. If None, model must be an EmbeddingModel or SVDModel instance.
reduction (str or None, optional) – Type of embedding reduction to use for computing features, e.g. “MEAN” or “SUM”. Used only with an EmbeddingModel, in which case it must not be None. Defaults to None.
name (str | None)
description (str | None)
kwargs – Additional keyword arguments to be passed to foundational models, e.g. prompt_id for PoET models.

Returns:

The GP model being fit.

Return type:

PredictorModel
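The argument rules for fit_gp can be checked before submitting a job. The sketch below covers the case where model is given as a string ID, so feature_type is required. It assumes that feature_type accepts the plain strings "PLM" and "SVD", and that "PLM" features come from an EmbeddingModel and therefore need a reduction. Both are assumptions drawn from the parameter descriptions above. The helper name `fit_gp_by_model_id` is hypothetical.

```python
from typing import Any, Optional, Sequence


def fit_gp_by_model_id(
    api: Any,
    assay: Any,
    properties: Sequence[str],
    model_id: str,
    feature_type: str,
    reduction: Optional[str] = None,
    **kwargs: Any,
) -> Any:
    """Validate arguments, then call fit_gp with a string model ID.

    `api` is assumed to be a PredictorAPI instance. Hypothetical helper,
    not part of the library.
    """
    if feature_type not in ("PLM", "SVD"):
        raise ValueError("feature_type must be 'PLM' or 'SVD'")
    # Assumption: "PLM" features use an EmbeddingModel, which needs a reduction.
    if feature_type == "PLM" and reduction is None:
        raise ValueError("reduction is required for PLM features")
    if not properties:
        raise ValueError("at least one property is required")
    return api.fit_gp(
        assay=assay,
        properties=list(properties),
        model=model_id,
        feature_type=feature_type,
        reduction=reduction,
        **kwargs,  # e.g. name, description, or prompt_id for PoET models
    )
```

Failing early on the client side avoids submitting a training job that the documented constraints would reject.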

delete_predictor(predictor_id)

Delete a predictor model.

Parameters: predictor_id (str) – The ID of the predictor.
Returns: True if the deletion was successful.
Return type: bool

ensemble(predictors)

Ensemble predictor models together.

Parameters: predictors (list[PredictorModel]) – List of predictors to ensemble together.
Returns: Ensembled predictor model.
Return type: PredictorModel
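As a minimal sketch, get_predictor and ensemble can be combined to build an ensemble from predictor IDs. It assumes `api` is a PredictorAPI instance. The helper name `ensemble_by_ids` and the requirement of at least two IDs are choices made for this sketch, not documented behaviour.

```python
from typing import Any, Sequence


def ensemble_by_ids(api: Any, predictor_ids: Sequence[str]) -> Any:
    """Fetch each predictor by ID and ensemble them together.

    `api` is assumed to be a PredictorAPI instance. Hypothetical helper,
    not part of the library.
    """
    # Sketch-level check: an ensemble of fewer than two models is
    # treated as a caller error here.
    if len(predictor_ids) < 2:
        raise ValueError("need at least two predictor IDs to ensemble")
    predictors = [api.get_predictor(pid) for pid in predictor_ids]
    return api.ensemble(predictors)
```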

class openprotein.predictor.PredictorModel(session, job=None, metadata=None)

Class providing the predict endpoint for fitted predictor models.

Also acts as a Future that waits for the training job.

Parameters:

session (APISession)
job (PredictorTrainJob | None)
metadata (PredictorMetadata | None)

property id – ID of the predictor.

property reduction – The reduction of the embeddings used to train the predictor, if any.

property sequence_length – The sequence length constraint on the predictor, if any.

property training_assay: AssayDataset – The assay the predictor was trained on.

property training_properties: list[str] – The list of properties the predictor was trained on.

property metadata – The predictor metadata.

get_model()

Retrieve the embeddings or SVD model used to create embeddings to train on.

Return type: EmbeddingModel | SVDModel | None

property model: EmbeddingModel | SVDModel | None – The embeddings or SVD model used to create embeddings to train on.

delete()

Delete this predictor model.

Return type: bool

get(verbose=False)

Returns the train loss curves.

Parameters: verbose (bool)

get_assay()

Get the assay used for the train job.

Returns: Assay dataset used for the train job.
Return type: AssayDataset

crossvalidate(n_splits=None)

Run cross-validation on the trained predictor.

Parameters: n_splits (int | None)
Return type: CVResultFuture

predict(sequences)

Make predictions about the trained properties for a list of sequences.

Parameters: sequences (list[bytes] | list[str])
Return type: PredictionResultFuture
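The sequences argument is typed as either a list of bytes or a list of str. A minimal sketch that checks this before submitting a prediction follows. It assumes `predictor` is a PredictorModel, and it treats empty and mixed-type inputs as errors, which is a choice made for this sketch. The helper name `submit_prediction` is hypothetical, and the sketch does not wait on the returned future.

```python
from typing import Any, Iterable, Union


def submit_prediction(predictor: Any, sequences: Iterable[Union[str, bytes]]) -> Any:
    """Check the input types, then call predict on a trained predictor.

    `predictor` is assumed to be a PredictorModel. Hypothetical helper,
    not part of the library. Returns whatever predict returns.
    """
    seqs = list(sequences)
    if not seqs:
        raise ValueError("sequences must not be empty")
    all_str = all(isinstance(s, str) for s in seqs)
    all_bytes = all(isinstance(s, bytes) for s in seqs)
    # The signature allows list[bytes] or list[str], not a mix of both.
    if not (all_str or all_bytes):
        raise TypeError("sequences must be all str or all bytes")
    return predictor.predict(seqs)
```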

single_site(sequence)

Compute predictions for the single-site mutants of a base sequence.

Parameters: sequence (bytes | str)
Return type: PredictionResultFuture