openprotein.embeddings

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

sequences (List[bytes]) – Sequences to fit the SVD on.
n_components (int) – Number of components in the SVD. Determines the output shape. Defaults to 1024.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean). Defaults to None.
assay (AssayDataset | None) – Optional assay containing sequences to fit the SVD on.

Return type:

SVDModel
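Conceptually, fitting an SVD maps each sequence's embedding down to n_components values. The NumPy sketch below illustrates that idea on a stand-in feature matrix. It assumes the embeddings have already been reduced to one vector per sequence (for example with mean), and `fit_truncated_svd` is a hypothetical helper. It does not claim to mirror how the service centers, scales, or stores the fitted components.

```python
import numpy as np


def fit_truncated_svd(features: np.ndarray, n_components: int):
    """Fit a truncated SVD on a (n_sequences, embedding_dim) matrix.

    Returns the top right-singular vectors ("components") and the
    features projected onto them.
    """
    # min(n_sequences, embedding_dim) caps how many components exist.
    k = min(n_components, *features.shape)
    # Economy-size SVD: features = U @ diag(S) @ Vt
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    components = vt[:k]                # shape (k, embedding_dim)
    reduced = features @ components.T  # shape (n_sequences, k)
    return components, reduced


# Stand-in data: 10 sequences, each already reduced to a 32-dim embedding.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 32))
components, reduced = fit_truncated_svd(features, n_components=4)
print(components.shape, reduced.shape)  # (4, 32) (10, 4)
```

The second dimension of the output equals n_components, which is what "determines the output shape" refers to.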

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

sequences (list[bytes] | None) – Optional sequences to fit UMAP with. Either use sequences or assay. sequences is preferred.
assay (AssayDataset | None) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int) – Number of components in the UMAP fit. Determines the output shape. Defaults to 2.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

Return type:

UMAPModel
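The precedence rule for sequences and assay can be sketched as a small helper. `resolve_fit_inputs` is hypothetical, the assay is modeled as a plain list of its sequences rather than an AssayDataset, and raising ValueError when neither input is given is an assumption, not documented behavior.

```python
from typing import List, Optional, Sequence


def resolve_fit_inputs(sequences: Optional[Sequence[bytes]] = None,
                       assay_sequences: Optional[Sequence[bytes]] = None) -> List[bytes]:
    """Pick the sequences to fit on: explicit sequences win over an assay."""
    if sequences is not None:
        return list(sequences)  # the assay, if also given, is ignored
    if assay_sequences is not None:
        return list(assay_sequences)
    # Assumption: with neither input there is nothing to fit.
    raise ValueError("Provide either sequences or an assay.")


print(resolve_fit_inputs([b"MKT"], [b"MKV", b"MKL"]))  # [b'MKT']
```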

get_metadata()

Get model metadata for this model.

Return type: ModelMetadata

classmethod get_model()

logits(sequences, **kwargs)

Compute logits for sequences using this model.

Parameters: sequences (List[bytes]) – Sequences to compute logits for.
Return type: EmbeddingResultFuture
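Logits are unnormalized scores; a log-softmax over the vocabulary axis turns them into per-position log-probabilities. This NumPy sketch assumes the logits for one sequence arrive as an array of shape (sequence_length, vocab_size); the exact layout returned by EmbeddingResultFuture is not specified here, and the values below are stand-ins.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Log-softmax over the last (vocabulary) axis."""
    # Subtracting the row maximum avoids overflow in exp().
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


# Stand-in logits: 3 positions x 4 tokens.
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [5.0, 1.0, 1.0, 1.0]])
probs = np.exp(log_softmax(logits))
print(probs.sum(axis=-1))  # each row sums to 1
```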

property metadata

class openprotein.embeddings.ESMModel

Class providing inference endpoints for Facebook’s ESM protein language models.

Examples

View specific model details (including supported tokens) with the ? operator in IPython or Jupyter.

import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.esm2_t12_35M_UR50D?

__init__(session, model_id, metadata=None)

Parameters:

session (APISession)
model_id (str)
metadata (ModelMetadata | None)

attn(sequences, **kwargs)

Attention embeddings for sequences using this model.

Parameters: sequences (List[bytes]) – Sequences to compute attention for.
Return type: EmbeddingResultFuture

classmethod create(session, model_id, default=None)

Create and return an instance of the appropriate embedding model class for the given model_id.

Returns: An instance of the appropriate model class.

Parameters:

session (APISession)
model_id (str)
default (type[EmbeddingModel] | None)
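A factory like create typically dispatches on model_id to whichever subclass claims it. The sketch below shows that pattern with hypothetical classes (`SketchModel`, `SketchESM`, `SketchPoET`); it is not the library's implementation, and the fallback to default followed by a ValueError is an assumption about how the default parameter behaves.

```python
from typing import Optional, Tuple, Type


class SketchModel:
    """Hypothetical stand-in for EmbeddingModel; not the library's class."""

    model_ids: Tuple[str, ...] = ()

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    @classmethod
    def create(cls, model_id: str,
               default: Optional[Type["SketchModel"]] = None) -> "SketchModel":
        # Return an instance of the first direct subclass that claims model_id.
        for subclass in cls.__subclasses__():
            if model_id in subclass.model_ids:
                return subclass(model_id)
        # Otherwise fall back to the caller-supplied default class, if any.
        if default is not None:
            return default(model_id)
        raise ValueError(f"Unsupported model_id: {model_id}")


class SketchESM(SketchModel):
    model_ids = ("esm2_t12_35M_UR50D",)


class SketchPoET(SketchModel):
    model_ids = ("poet",)


print(type(SketchModel.create("poet")).__name__)  # SketchPoET
```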

embed(sequences, reduction=ReductionType.MEAN, **kwargs)

Embed sequences using this model.

Parameters:

sequences (List[bytes]) – Sequences to embed.
reduction (ReductionType | None, Optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

Return type:

EmbeddingResultFuture
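A mean reduction collapses per-residue embeddings into one fixed-size vector per sequence, so sequences of different lengths become directly comparable. The NumPy sketch below is illustrative only: `reduce_embeddings` is hypothetical, the embeddings are stand-in arrays, and treating reduction=None as "keep one vector per residue" is an assumption.

```python
from typing import Optional

import numpy as np


def reduce_embeddings(per_residue: np.ndarray, reduction: Optional[str] = "MEAN") -> np.ndarray:
    """Reduce a (sequence_length, embedding_dim) array of per-residue embeddings."""
    if reduction is None:
        return per_residue  # assumption: no reduction keeps one vector per residue
    if reduction == "MEAN":
        return per_residue.mean(axis=0)  # one fixed-size vector per sequence
    raise ValueError(f"reduction not covered by this sketch: {reduction}")


short_seq = np.ones((5, 8))    # 5 residues, 8-dim stand-in embeddings
long_seq = np.ones((120, 8))   # 120 residues, same embedding size
print(reduce_embeddings(short_seq).shape, reduce_embeddings(long_seq).shape)  # (8,) (8,)
```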

fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)

Fit a GP on an assay using this embedding model and hyperparameters.

Parameters:

assay (AssayMetadata | str) – Assay to fit the GP on.
properties (list[str]) – Properties in the assay to fit the GP on.
reduction (str) – Type of embedding reduction to use for computing features. Protein language models (PLMs) must use a reduction.
name (str | None)
description (str | None)

Return type:
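To show what fitting a GP on embedding features means, here is a minimal Gaussian process regression sketch in NumPy. The RBF kernel, fixed length scale and noise level are assumptions chosen for illustration; this reference does not state which kernel or hyperparameter fitting the service uses, and the features and property values are stand-ins.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """RBF (squared-exponential) kernel between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k_train = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale)  # (n_test, n_train)
    # Cholesky solves are more stable than inverting k_train directly.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    var = 1.0 - (v**2).sum(axis=0)  # the RBF prior variance k(x, x) is 1
    return mean, var


# Stand-in features (e.g. reduced embeddings) and measured property values.
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 0.0, -1.0])
mean, var = gp_posterior(x, y, np.array([[1.0], [10.0]]))
print(mean, var)  # near the data: mean close to 1, small variance; far away: mean ~0, variance ~1
```

The variance output is what makes a GP useful for design: predictions far from any measured sequence come back with high uncertainty.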

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

sequences (List[bytes]) – Sequences to fit the SVD on.
n_components (int) – Number of components in the SVD. Determines the output shape. Defaults to 1024.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean). Defaults to None.
assay (AssayDataset | None) – Optional assay containing sequences to fit the SVD on.

Return type:

SVDModel

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

sequences (list[bytes] | None) – Optional sequences to fit UMAP with. Either use sequences or assay. sequences is preferred.
assay (AssayDataset | None) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int) – Number of components in the UMAP fit. Determines the output shape. Defaults to 2.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

Return type:

UMAPModel

get_metadata()

Get model metadata for this model.

Return type: ModelMetadata

logits(sequences, **kwargs)

Compute logits for sequences using this model.

Parameters: sequences (List[bytes]) – Sequences to compute logits for.
Return type: EmbeddingResultFuture

class openprotein.embeddings.PoETModel

Class for OpenProtein’s foundation model PoET. Note: PoET functions depend on a prompt supplied via the align endpoints.

Examples

View specific model details (including supported tokens) with the ? operator in IPython or Jupyter.

import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.poet.<embeddings_method>

__init__(session, model_id, metadata=None)

Parameters:

session (APISession)
model_id (str)
metadata (ModelMetadata | None)

embed(sequences, prompt=None, reduction=ReductionType.MEAN, **kwargs)

Embed sequences using this model.

Parameters:

sequences (list[bytes]) – Sequences to embed.
prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
reduction (str) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

Returns:

A future object that returns the embeddings of the submitted sequences.

Return type:

EmbeddingResultFuture

logits(sequences, prompt=None, **kwargs)

Compute logits for sequences using this model.

Parameters:

sequences (list[bytes]) – Sequences to analyse.
prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.

Returns:

A future object that returns the logits of the submitted sequences.

Return type:

EmbeddingResultFuture

attn(): Not available for PoET.

score(sequences, prompt=None, **kwargs)

Score query sequences using the specified prompt.

Parameters:

sequences (list[bytes]) – Sequences to score.
prompt (str | Prompt) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.

Returns:

A future object that returns the scores of the submitted sequences.

Return type:
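Autoregressive protein language models usually score a sequence as its log-likelihood: the sum, over positions, of the log-probability of each residue given the preceding residues (and here, the prompt). That this is exactly what score returns is an assumption, not something this reference states. `sequence_log_likelihood` and the toy distributions below are hypothetical.

```python
import math
from typing import Dict, List


def sequence_log_likelihood(sequence: str, position_log_probs: List[Dict[str, float]]) -> float:
    """Sum log p(residue_i | preceding residues, prompt) over all positions."""
    if len(sequence) != len(position_log_probs):
        raise ValueError("expected one distribution per residue")
    return sum(dist[residue] for residue, dist in zip(sequence, position_log_probs))


# Hypothetical per-position distributions over a toy two-letter alphabet.
log_probs = [
    {"A": math.log(0.9), "G": math.log(0.1)},
    {"A": math.log(0.5), "G": math.log(0.5)},
]
print(sequence_log_likelihood("AG", log_probs))  # log(0.9) + log(0.5)
```

Higher (less negative) values mean the model considers the sequence more plausible under the prompt.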

indel(sequence, prompt=None, insert=None, delete=None, **kwargs)

Score all indels of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyse.
prompt (str | Prompt | None) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
insert (str | None) – Fragment to insert at each site.
delete (int | None) – Range of fragment sizes to delete at each site.

Returns:

A future object that returns the scores of the indel variants of the sequence.

Return type:
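To make the insert and delete parameters concrete, the sketch below enumerates candidate indel variants of a sequence. The exact semantics are assumptions: insert is placed before every position (and after the last residue), and delete=n removes every contiguous fragment of length 1 to n. `enumerate_indels` is hypothetical and uses str rather than bytes for readability.

```python
from typing import List, Optional


def enumerate_indels(sequence: str, insert: Optional[str] = None,
                     delete: Optional[int] = None) -> List[str]:
    """List insertion and deletion variants of sequence (hypothetical semantics)."""
    variants = []
    if insert:
        # Insert the fragment before every position, including after the last residue.
        for i in range(len(sequence) + 1):
            variants.append(sequence[:i] + insert + sequence[i:])
    if delete:
        # Delete every fragment of length 1..delete that fits inside the sequence.
        for size in range(1, delete + 1):
            for i in range(len(sequence) - size + 1):
                variants.append(sequence[:i] + sequence[i + size:])
    return variants


print(enumerate_indels("ACD", insert="G"))  # ['GACD', 'AGCD', 'ACGD', 'ACDG']
print(enumerate_indels("ACD", delete=2))    # ['CD', 'AD', 'AC', 'D', 'A']
```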

single_site(sequence, prompt=None, **kwargs)

Score all single substitutions of the query sequence using the specified prompt.

Parameters:

prompt (str | Prompt) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
sequence (bytes) – Sequence to analyse.

Returns:

A future object that returns the scores of all single-substitution variants of the sequence.

Return type:
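A single-site scan covers every position with each of the 19 alternative canonical amino acids, so a sequence of length L yields 19 × L variants. The sketch below enumerates them using the common wild-type/position/mutant notation (e.g. K2R, 1-based); whether the service keys its results this way is not stated here, and `single_site_variants` is hypothetical.

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def single_site_variants(sequence: str) -> Dict[str, str]:
    """Map 1-based mutation codes such as 'K2R' to the mutated sequence."""
    variants = {}
    for pos, wild_type in enumerate(sequence, start=1):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the identity "mutation"
            variants[f"{wild_type}{pos}{aa}"] = sequence[:pos - 1] + aa + sequence[pos:]
    return variants


variants = single_site_variants("MK")
print(len(variants))    # 38 (2 positions x 19 substitutions)
print(variants["K2R"])  # MR
```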

generate(prompt, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, **kwargs)

Generate protein sequences conditioned on a prompt.

Parameters:

prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
num_samples (int, optional) – The number of samples to generate, by default 100.
temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs, by default 1.0.
topk (int, optional) – The number of top-k residues to consider during sampling, by default None.
topp (float, optional) – The cumulative probability threshold for top-p sampling, by default None.
max_length (int, optional) – The maximum length of generated proteins, by default 1000.
seed (int, optional) – Seed for random number generation, by default a random number.

Returns:

A future object representing the status and information about the generation job.

Return type:

EmbeddingsGenerateFuture
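The temperature, topk, and topp parameters are standard sampling controls. The sketch below applies them to a single position's logits: temperature rescales the logits, top-k keeps the k most probable tokens, and top-p keeps the smallest set whose renormalized probability mass reaches topp. `sample_index` is hypothetical, and the order in which PoET applies these controls server-side is not documented here.

```python
import math
import random
from typing import List, Optional


def sample_index(logits: List[float], temperature: float = 1.0,
                 topk: Optional[int] = None, topp: Optional[float] = None,
                 rng: Optional[random.Random] = None) -> int:
    """Sample one token index from logits with temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Candidate indices, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if topk is not None:
        order = order[:topk]  # keep only the k most probable tokens
    if topp is not None:
        # Keep the smallest prefix whose renormalized mass reaches topp.
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= topp:
                break
        order = kept
    # random.choices accepts unnormalized weights, so no final renormalization is needed.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # stand-in scores for a 4-token vocabulary
print(sample_index(logits, topk=1))  # 0 (top-k of 1 is greedy decoding)
print(sample_index(logits, temperature=0.8, topk=3, topp=0.9, rng=random.Random(42)))
```

Passing a seeded random.Random mirrors the role of the seed parameter: the same seed reproduces the same draws.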

fit_svd(prompt=None, sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)

Fit an SVD on the embedding results of PoET.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
sequences (List[bytes]) – Sequences to fit the SVD on.
n_components (int) – Number of components in the SVD. Determines the output shape. Defaults to 1024.
reduction (str) – Embeddings reduction to use (e.g. mean).
assay (AssayDataset | None) – Optional assay containing sequences to fit the SVD on.

Returns:

A future that represents the fitted SVD model.

Return type:

SVDModel

fit_umap(prompt=None, sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)

Fit a UMAP on the PoET embeddings of the given sequences or assay.

This function will create a UMAPModel based on the embeddings from this PoET model as well as the hyperparameters specified in the args.

Parameters:

prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
sequences (list[bytes] | None) – Optional sequences to fit UMAP with. Either use sequences or assay. sequences is preferred.
assay (AssayDataset | None) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int) – Number of components in the UMAP fit. Determines the output shape. Defaults to 2.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

Returns:

A future that represents the fitted UMAP model.

Return type:

UMAPModel

fit_gp(assay, properties, prompt=None, **kwargs)

Fit a GP on an assay using this embedding model and hyperparameters.

Parameters:

assay (AssayMetadata | str) – Assay to fit the GP on.
properties (list[str]) – Properties in the assay to fit the GP on.
reduction (str) – Type of embedding reduction to use for computing features. Protein language models (PLMs) must use a reduction.
prompt (str | Prompt | None) – Prompt from an align workflow to condition the PoET model.

Returns:

A future that represents the trained predictor model.

Return type:

classmethod create(session, model_id, default=None)

Create and return an instance of the appropriate embedding model class for the given model_id.

Returns: An instance of the appropriate model class.

Parameters:

session (APISession)
model_id (str)
default (type[EmbeddingModel] | None)

get_metadata()

Get model metadata for this model.

Return type: ModelMetadata

class openprotein.embeddings.PoET2Model

Class for OpenProtein’s foundation model PoET 2. Note: PoET functions depend on a prompt supplied via the align endpoints.

Examples

View specific model details (including supported tokens) with the ? operator in IPython or Jupyter.

import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.poet2.<embeddings_method>

__init__(session, model_id, metadata=None)

Parameters:

session (OpenProtein)
model_id (str)
metadata (ModelMetadata | None)

embed(sequences, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True)

Embed sequences using this model.

Parameters:

sequences (list[bytes]) – Sequences to embed.
reduction (str) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.
prompt (str | Prompt) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future object that returns the embeddings of the submitted sequences.

Return type:

EmbeddingResultFuture

logits(sequences, prompt=None, query=None, use_query_structure_in_decoder=True)

Compute logits for sequences using this model.

Parameters:

sequences (list[bytes]) – Sequences to analyse.
prompt (str | Prompt) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future object that returns the logits of the submitted sequences.

Return type:

EmbeddingResultFuture

score(sequences, prompt=None, query=None, use_query_structure_in_decoder=True)

Score query sequences using the specified prompt.

Parameters:

sequences (list[bytes]) – Sequences to score.
prompt (str | Prompt) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future object that returns the scores of the submitted sequences.

Return type:

single_site(sequence, prompt=None, query=None, use_query_structure_in_decoder=True)

Score all single substitutions of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyse.
prompt (str | Prompt) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future object that returns the scores of all single-substitution variants of the sequence.

Return type:

generate(prompt, query=None, use_query_structure_in_decoder=True, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None)

Generate protein sequences conditioned on a prompt.

Parameters:

prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
num_samples (int, optional) – The number of samples to generate, by default 100.
temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs, by default 1.0.
topk (int, optional) – The number of top-k residues to consider during sampling, by default None.
topp (float, optional) – The cumulative probability threshold for top-p sampling, by default None.
max_length (int, optional) – The maximum length of generated proteins, by default 1000.
seed (int, optional) – Seed for random number generation, by default a random number.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future object representing the status and information about the generation job.

Return type:

EmbeddingsGenerateFuture

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, prompt=None, query=None, use_query_structure_in_decoder=True)

Fit an SVD on the embedding results of PoET.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
sequences (List[bytes]) – Sequences to fit the SVD on.
n_components (int) – Number of components in the SVD. Determines the output shape. Defaults to 1024.
reduction (str) – Embeddings reduction to use (e.g. mean).
assay (AssayDataset | None) – Optional assay containing sequences to fit the SVD on.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future that represents the fitted SVD model.

Return type:

SVDModel

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True)

Fit a UMAP on the PoET embeddings of the given sequences or assay.

This function will create a UMAPModel based on the embeddings from this PoET model as well as the hyperparameters specified in the args.

Parameters:

prompt (str | Prompt) – Prompt from an align workflow to condition the PoET model.
sequences (list[bytes] | None) – Optional sequences to fit UMAP with. Either use sequences or assay. sequences is preferred.
assay (AssayDataset | None) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int) – Number of components in the UMAP fit. Determines the output shape. Defaults to 2.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future that represents the fitted UMAP model.

Return type:

UMAPModel

fit_gp(assay, properties, prompt=None, query=None, use_query_structure_in_decoder=True, **kwargs)

Fit a GP on an assay using this embedding model and hyperparameters.

Parameters:

assay (AssayMetadata | str) – Assay to fit the GP on.
properties (list[str]) – Properties in the assay to fit the GP on.
reduction (str) – Type of embedding reduction to use for computing features. Protein language models (PLMs) must use a reduction.
query (str | bytes | Protein | Query | None) – Optional query to use with the prompt.
prompt (str | Prompt | None) – Prompt from an align workflow to condition the PoET model.
use_query_structure_in_decoder (bool) – Defaults to True.

Returns:

A future that represents the trained predictor model.

Return type:

attn(): Not available for PoET.

classmethod create(session, model_id, default=None)

Create and return an instance of the appropriate embedding model class for the given model_id.

Returns: An instance of the appropriate model class.

Parameters:

session (APISession)
model_id (str)
default (type[EmbeddingModel] | None)

get_metadata()

Get model metadata for this model.

Return type: ModelMetadata

indel(sequence, prompt=None, insert=None, delete=None, **kwargs)

Score all indels of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyse.
prompt (str | Prompt | None) – Prompt, prompt_id, or prompt from an align workflow to condition the PoET model.
insert (str | None) – Fragment to insert at each site.
delete (int | None) – Range of fragment sizes to delete at each site.

Returns:

A future object that returns the scores of the indel variants of the sequence.

Return type:

class openprotein.embeddings.SVDModel

Class providing the embedding endpoint for SVD models. Use get to retrieve the embeddings of the sequences the SVD was fit on. SVDModel also implements a Future, so you can wait for the fit job to finish.

__init__(session, job=None, metadata=None)

Initialize from either a fit job or existing SVD metadata.

Parameters:

session (APISession)
job (FitJob | None)
metadata (SVDMetadata | None)

get_model()

Fetch the embedding model this SVD is based on.

Return type: EmbeddingModel

delete()

Delete this SVD model.

Return type: bool

get(verbose=False)

Return the results from this job.

Parameters: verbose (bool)

get_inputs()

Get the sequences used for the SVD fit job.

Returns: List of sequences.
Return type: List[bytes]

embed(sequences, **kwargs)

Use this SVD model to get reduced embeddings from input sequences.

Parameters: sequences (List[bytes]) – List of protein sequences.
Returns: A future for tracking the job and retrieving the reduced embeddings.
Return type: EmbeddingResultFuture
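A plausible picture of SVDModel.embed, stated as an assumption: each new sequence is embedded and reduced by the base model, then projected onto the fitted SVD components, so every sequence comes out as a vector of n_components values regardless of length. The sketch uses hand-made orthonormal components and stand-in embeddings; `svd_embed` is hypothetical, and the service's exact steps (for example any centering) are not documented here.

```python
from typing import List

import numpy as np


def svd_embed(per_residue: List[np.ndarray], components: np.ndarray) -> np.ndarray:
    """Mean-reduce each (length_i, d) embedding, then project onto (k, d) components."""
    pooled = np.stack([emb.mean(axis=0) for emb in per_residue])  # (n_sequences, d)
    return pooled @ components.T                                   # (n_sequences, k)


# Stand-in components (k=2, d=3) and two sequences of different lengths.
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
short_seq = np.array([[3.0, 4.0, 5.0]])                   # 1 residue
long_seq = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])  # 2 residues
print(svd_embed([short_seq, long_seq], components))
# [[3. 4.]
#  [2. 2.]]
```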

fit_umap(sequences=None, assay=None, n_components=2, **kwargs)

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:

sequences (List[bytes]) – Sequences to fit the UMAP on.
n_components (int) – Number of components in the UMAP. Determines the output shape. Defaults to 2.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean).
assay (AssayDataset | None) – Optional assay containing sequences to fit the UMAP on.

Return type:

UMAPModel

fit_gp(assay, properties, name=None, description=None, **kwargs)

Fit a GP on an assay using this embedding model and hyperparameters.

Parameters:

assay (AssayMetadata | str) – Assay to fit the GP on.
properties (list[str]) – Properties in the assay to fit the GP on.
name (str | None)
description (str | None)

Return type: