openprotein.api.embedding#

Create embeddings for your protein sequences using open-source and proprietary models!

Note that for PoET models, you will also need to use our align workflow.

Endpoints#

class openprotein.api.embedding.EmbeddingAPI[source]#

This class defines a high-level interface for accessing the embeddings API.

You can access all our models either via get_model() or directly through the session’s embedding attribute using the model’s ID and the desired method. For example, to use the attention method on the protein sequence model, you would use session.embedding.prot_seq.attn().

Examples

Accessing a model’s method:

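# To call the attention method on the protein sequence model:
import openprotein
session = openprotein.connect(username="user", password="password")
# attn() requires the sequences to process (example input, passed as bytes)
sequences = [b"MSKGEELFTGV"]
session.embedding.prot_seq.attn(sequences)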

Using the get_model method:

# Get a model instance by name:
import openprotein
session = openprotein.connect(username="user", password="password")
# list available models:
print(session.embedding.list_models())
# init model by name
model = session.embedding.get_model('prot-seq')
prot_seq: OpenProteinModel#
rotaprot_large_uniref50w: OpenProteinModel#
rotaprot_large_uniref90_ft: OpenProteinModel#
poet: PoETModel#
esm1b_t33_650M_UR50S: ESMModel#
esm1v_t33_650M_UR90S_1: ESMModel#
esm1v_t33_650M_UR90S_2: ESMModel#
esm1v_t33_650M_UR90S_3: ESMModel#
esm1v_t33_650M_UR90S_4: ESMModel#
esm1v_t33_650M_UR90S_5: ESMModel#
esm2_t12_35M_UR50D: ESMModel#
esm2_t30_150M_UR50D: ESMModel#
esm2_t33_650M_UR50D: ESMModel#
esm2_t36_3B_UR50D: ESMModel#
esm2_t6_8M_UR50D: ESMModel#
__init__(session)[source]#
Parameters:

session (APISession)

list_models()[source]#

List the models available for creating embeddings of your sequences.

Return type:

List[ProtembedModel]

get_model(name)[source]#

Get a model by its identifier.

ProtembedModel allows all the usual job manipulation, e.g. making POST and GET requests for this model specifically.

Parameters:

name (str) – the model identifier

Returns:

The model

Return type:

ProtembedModel

Raises:

HTTPError – If the GET request does not succeed.

get_svd(svd_id)[source]#

Get SVD job results, including the SVD dimension and sequence lengths.

Requires a successful SVD job from fit_svd.

Parameters:

svd_id (str) – The ID of the SVD job.

Returns:

The model with the SVD fit.

Return type:

SVDModel

list_svd()[source]#

List SVD models made by the user.

Takes no arguments.

Returns:

The user's SVD models.

Return type:

list[SVDModel]

Models#

class openprotein.api.embedding.OpenProteinModel[source]#

Class providing inference endpoints for proprietary protein embedding models served by OpenProtein.

Examples

View specific model details (including supported tokens) with the ? operator in IPython or Jupyter.

import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.prot_seq?
__init__(session, model_id, metadata=None)#
attn(sequences)#

Attention embeddings for sequences using this model.

Parameters:

sequences (List[bytes]) – sequences to compute attention embeddings for

Return type:

EmbeddingResultFuture

embed(sequences, reduction='MEAN')#

Embed sequences using this model.

Parameters:
  • sequences (List[bytes]) – sequences to embed

  • reduction (str) – embeddings reduction to use (e.g. mean)

Return type:

EmbeddingResultFuture
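
As a minimal sketch of the call pattern, the helper below submits sequences to a model's embed method and blocks until the job finishes. The name embed_and_wait is hypothetical and not part of the openprotein package. It assumes only what this page documents: embed returns an EmbeddingResultFuture, and its wait() returns the job results. The structure of those results is not specified here.

```python
def embed_and_wait(model, sequences, reduction="MEAN"):
    """Submit an embedding job on `model` and block until results arrive.

    `model` is any object exposing embed(sequences, reduction=...), for
    example session.embedding.prot_seq. Illustrative helper only.
    """
    # Sequences are documented as List[bytes], so encode any str inputs.
    encoded = [s.encode() if isinstance(s, str) else s for s in sequences]
    future = model.embed(encoded, reduction=reduction)
    # wait() polls the job and returns its results once it completes.
    return future.wait()
```

With a connected session this would be called as embed_and_wait(session.embedding.prot_seq, ["MSKGEELFTGV"]).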

fit_svd(sequences, n_components=1024, reduction=None)#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:
  • sequences (List[bytes]) – sequences to fit the SVD on

  • n_components (int) – number of components in the SVD; determines the output shape

  • reduction (str) – embeddings reduction to use (e.g. mean)

Return type:

SVDModel
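
The following sketch chains the documented steps: fit an SVD on a model's embeddings, wait for the fit to finish, then use the resulting SVDModel to embed sequences. The helper name fit_and_reduce is hypothetical. It relies on fit_svd returning an SVDModel that exposes wait_until_done() and embed(), as listed under SVDModel on this page; the format of the final results is an assumption left to the caller.

```python
def fit_and_reduce(model, sequences, n_components=1024):
    """Fit an SVD on `model` embeddings, then reduce the same sequences.

    `model` is any object exposing fit_svd(sequences, n_components=...).
    Illustrative helper only, not part of the openprotein package.
    """
    # Start the SVD fit job; the returned object represents the SVD model.
    svd = model.fit_svd(sequences, n_components=n_components)
    # Block until the fit has completed, without fetching results.
    svd.wait_until_done()
    # Embed the sequences with the fitted SVD and wait for the results.
    return svd.embed(sequences).wait()
```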

get_metadata()#

Get model metadata for this model.

Return type:

ModelMetadata

classmethod get_model()#
logits(sequences)#

Logit embeddings for sequences using this model.

Parameters:

sequences (List[bytes]) – sequences to compute logits for

Return type:

EmbeddingResultFuture

property metadata#
model_id = 'protembed'#
class openprotein.api.embedding.ESMModel[source]#

Class providing inference endpoints for Facebook’s ESM protein language models.

Examples

View specific model details (including supported tokens) with the ? operator in IPython or Jupyter.

import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.esm2_t12_35M_UR50D?
__init__(session, model_id, metadata=None)#
attn(sequences)#

Attention embeddings for sequences using this model.

Parameters:

sequences (List[bytes]) – sequences to compute attention embeddings for

Return type:

EmbeddingResultFuture

embed(sequences, reduction='MEAN')#

Embed sequences using this model.

Parameters:
  • sequences (List[bytes]) – sequences to embed

  • reduction (str) – embeddings reduction to use (e.g. mean)

Return type:

EmbeddingResultFuture

fit_svd(sequences, n_components=1024, reduction=None)#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:
  • sequences (List[bytes]) – sequences to fit the SVD on

  • n_components (int) – number of components in the SVD; determines the output shape

  • reduction (str) – embeddings reduction to use (e.g. mean)

Return type:

SVDModel

get_metadata()#

Get model metadata for this model.

Return type:

ModelMetadata

logits(sequences)#

Logit embeddings for sequences using this model.

Parameters:

sequences (List[bytes]) – sequences to compute logits for

Return type:

EmbeddingResultFuture

class openprotein.api.embedding.PoETModel[source]#

Class for OpenProtein’s foundation model, PoET. Note that PoET functions depend on a prompt supplied via the align endpoints.

Examples

View specific model details (including supported tokens) with the ? operator in IPython or Jupyter.

import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.poet?
__init__(session, model_id, metadata=None)[source]#
embed(prompt, sequences, reduction='MEAN')[source]#

Embed sequences using this model.

Parameters:
  • prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model

  • sequences (List[bytes]) – sequences to embed

  • reduction (str) – embeddings reduction to use (e.g. mean)

Return type:

EmbeddingResultFuture
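
Unlike the other models, PoET takes the prompt as the first argument. The sketch below shows that call shape; the helper name poet_embed is hypothetical, and prompt is assumed to be a prompt ID string or PromptFuture obtained from the align workflow, which is not shown here.

```python
def poet_embed(session, prompt, sequences, reduction="MEAN"):
    """Embed `sequences` with PoET conditioned on `prompt`.

    `session` is a connected API session and `prompt` comes from the
    align workflow. Illustrative helper only.
    """
    # The prompt precedes the sequences in PoETModel.embed().
    future = session.embedding.poet.embed(prompt, sequences, reduction=reduction)
    # Block until the embedding job completes and return its results.
    return future.wait()
```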

logits(prompt, sequences)[source]#

Logit embeddings for sequences using this model.

Parameters:
  • prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model

  • sequences (List[bytes]) – sequences to analyse

Return type:

EmbeddingResultFuture

attn()[source]#

Not available for PoET.

score(prompt, sequences)[source]#

Score query sequences using the specified prompt.

Parameters:
  • prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model

  • sequences (List[bytes]) – sequences to score

Returns:

The scores of the query sequences.

Return type:

results

single_site(prompt, sequence)[source]#

Score all single substitutions of the query sequence using the specified prompt.

Parameters:
  • prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model

  • sequence (bytes) – Sequence to analyse.

Returns:

The scores of the mutated sequences.

Return type:

results
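
To make "all single substitutions" concrete, the sketch below enumerates them locally for a short sequence. It is not how the service computes its results. The 20-letter alphabet and the mutation code format (wild-type residue, 1-based position, mutant residue) are assumptions for illustration; the tokens a model actually supports are listed in its metadata.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues

def single_substitutions(sequence):
    """Yield (code, variant) for every single-residue substitution.

    The code format, e.g. "A1C", is an assumption for illustration.
    """
    for i, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                variant = sequence[:i] + residue + sequence[i + 1:]
                yield f"{wild_type}{i + 1}{residue}", variant
```

A sequence of length L yields 19 × L variants under this alphabet, so a 3-residue sequence produces 57.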

generate(prompt, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None)[source]#

Generate protein sequences conditioned on a prompt.

Parameters:
  • prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model

  • num_samples (int, optional) – The number of samples to generate, by default 100.

  • temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs, by default 1.0.

  • topk (int, optional) – The number of top-k residues to consider during sampling, by default None.

  • topp (float, optional) – The cumulative probability threshold for top-p sampling, by default None.

  • max_length (int, optional) – The maximum length of generated proteins, by default 1000.

  • seed (int, optional) – Seed for random number generation, by default a random number.

Raises:

APIError – If there is an issue with the API request.

Returns:

An object representing the status and information about the generation job.

Return type:

Job
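
The topk and topp parameters restrict which residues can be sampled at each step. The sketch below illustrates the general top-k and top-p (nucleus) filtering technique on a toy distribution. It is a generic illustration, not PoET's implementation, and the function name and the order in which the two filters are applied are assumptions.

```python
def filter_top_k_top_p(probs, topk=None, topp=None):
    """Return renormalized probabilities after top-k, then top-p filtering.

    `probs` maps tokens to probabilities. Generic illustration only.
    """
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if topk is not None:
        # Keep only the k most probable tokens.
        ranked = ranked[:topk]
    if topp is not None:
        # Keep the smallest prefix whose cumulative probability reaches topp.
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= topp:
                break
        ranked = kept
    # Renormalize so the surviving probabilities sum to one.
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}
```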

fit_svd(prompt, sequences, n_components=1024, reduction=None)[source]#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.

Parameters:
  • prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model

  • sequences (List[bytes]) – sequences to fit the SVD on

  • n_components (int) – number of components in the SVD; determines the output shape

  • reduction (str) – embeddings reduction to use (e.g. mean)

Return type:

SVDModel

get_metadata()#

Get model metadata for this model.

Return type:

ModelMetadata

class openprotein.api.embedding.SVDModel[source]#

Class providing embedding endpoint for SVD models. Also allows retrieving embeddings of sequences used to fit the SVD with get.

__init__(session, metadata)[source]#
Parameters:
get_model()[source]#

Fetch the embeddings model.

Return type:

ProtembedModel

delete()[source]#

Delete this SVD model.

Return type:

bool

get_job()[source]#

Get the job associated with this SVD model.

Return type:

Job

get_inputs()[source]#

Get sequences used for embeddings job.

Returns:

list of sequences

Return type:

List[bytes]

get_embeddings()[source]#

Get SVD embedding results for this model.

Returns:

Class for further job manipulation.

Return type:

EmbeddingResultFuture

embed(sequences)[source]#

Use this SVD model to reduce embedding results.

Parameters:

sequences (List[bytes]) – List of protein sequences.

Returns:

Class for further job manipulation.

Return type:

EmbeddingResultFuture

classmethod get_job_type()#

Return the job type associated with this Future class.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for job to complete. Do not fetch results (unlike wait())

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

Results#

class openprotein.api.embedding.EmbeddingResultFuture[source]#

Future Job for manipulating results

__init__(session, job, sequences=None, max_workers=10)[source]#

Retrieve results from asynchronous, mapped endpoints. Use max_workers > 0 to enable concurrent retrieval of multiple pages.

Parameters:
get_item(sequence)[source]#

Get embedding results for specified sequence.

Parameters:

sequence (bytes) – sequence to fetch results for

Returns:

embeddings

Return type:

np.ndarray
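
As a sketch of per-sequence retrieval, the helper below waits for the job and then collects one result per sequence with get_item. The helper name embeddings_by_sequence is hypothetical; it assumes the sequences passed in are the same bytes objects that were submitted to the job.

```python
def embeddings_by_sequence(future, sequences):
    """Return a dict mapping each sequence to its embedding result.

    `future` is an EmbeddingResultFuture-like object exposing
    wait_until_done() and get_item(sequence). Illustrative helper only.
    """
    # Make sure the job has finished before requesting individual items.
    future.wait_until_done()
    return {seq: future.get_item(seq) for seq in sequences}
```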

classmethod get_job_type()#

Return the job type associated with this Future class.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results
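
The interval and timeout parameters describe a polling loop. The sketch below shows the general pattern with the standard library only. It is not the package's implementation; the function name poll_until and the use of TimeoutError are assumptions, and the exception the library raises on timeout is not documented here.

```python
import time

def poll_until(is_done, interval=5, timeout=None):
    """Call is_done() every `interval` seconds until it returns True.

    Raises TimeoutError if `timeout` seconds elapse first. A timeout of
    None waits indefinitely.
    """
    start = time.monotonic()
    while not is_done():
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"job not done after {timeout} seconds")
        # Sleep between checks to avoid hammering the server.
        time.sleep(interval)
    return True
```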

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for job to complete. Do not fetch results (unlike wait())

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

class openprotein.api.poet.PoetScoreFuture[source]#

Represents a result of a PoET scoring job.

session#

An instance of APISession for API interactions.

Type:

APISession

job#

The PoET scoring job.

Type:

Job

page_size#

The number of results to fetch in a single page.

Type:

int

get(verbose=False)[source]#

Get the final results of the PoET job.

Return type:

List[tuple]

__init__(session, job, page_size=50000, **kwargs)[source]#

Initialize a PoetScoreFuture instance.

Parameters:
  • session (APISession)

  • job (Job)

  • page_size (int, optional) – The number of results to fetch in a single page. Defaults to config.POET_PAGE_SIZE.

get(verbose=False)[source]#

Get the final results of the PoET scoring job.

Parameters:

verbose (bool, optional) – If True, print verbose output. Defaults to False.

Raises:

APIError – If there is an issue with the API request.

Returns:

A list of PoetScoreResult objects representing the scoring results.

Return type:

List[PoetScoreResult]
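
The page_size attribute implies that results are retrieved page by page. The sketch below shows one common way to collect paged results: request fixed-size pages until a short page comes back. It is a generic pattern, not the package's code; fetch_page(offset, limit) is a hypothetical callable standing in for the underlying request.

```python
def fetch_all(fetch_page, page_size=50000):
    """Collect all results by requesting pages until a short page arrives.

    fetch_page(offset, limit) is a hypothetical callable returning a list
    of at most `limit` results starting at `offset`.
    """
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        # A page shorter than page_size means there is nothing left.
        if len(page) < page_size:
            return results
        offset += page_size
```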

get_input(input_type)#

See child function docs.

Parameters:

input_type (PoetInputType)

classmethod get_job_type()#

Return the job type associated with this Future class.

get_msa()#

See child function docs.

get_prompt(prompt_index=None)#

See child function docs.

Parameters:

prompt_index (int | None)

get_seed()#

See child function docs.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for job to complete. Do not fetch results (unlike wait())

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

class openprotein.api.poet.PoetSingleSiteFuture[source]#

Represents a result of a PoET single-site analysis job.

session#

An instance of APISession for API interactions.

Type:

APISession

job#

The PoET single-site analysis job.

Type:

Job

page_size#

The number of results to fetch in a single page.

Type:

int

get(verbose=False)[source]#

Get the final results of the PoET job.

Return type:

Dict

__init__(session, job, page_size=50000, **kwargs)[source]#

Initialize a PoetSingleSiteFuture instance.

Parameters:
  • session (APISession)

  • job (Job)

  • page_size (int, optional) – The number of results to fetch in a single page. Defaults to config.POET_PAGE_SIZE.

get(verbose=False)[source]#

Get the results of a PoET single-site analysis job.

Parameters:

verbose (bool, optional) – If True, print verbose output. Defaults to False.

Returns:

A dictionary mapping mutation codes to scores.

Return type:

Dict[bytes, float]

Raises:

APIError – If there is an issue with the API request.
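
Since get() returns a dictionary of mutation codes and scores, a common next step is ranking. The sketch below sorts such a dictionary and keeps the best entries. The helper name top_mutations is hypothetical, and it assumes that a higher score is more favorable; the example keys in the usage note are made up.

```python
def top_mutations(scores, n=10):
    """Return the n highest-scoring (mutation, score) pairs, best first.

    `scores` is a Dict[bytes, float] as returned by get(). Assumes that
    higher scores are better.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

For example, top_mutations({b"A1C": -1.2, b"A1D": 0.3, b"C2A": -0.5}, n=2) keeps the A1D and C2A entries, in that order.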

get_input(input_type)#

See child function docs.

Parameters:

input_type (PoetInputType)

classmethod get_job_type()#

Return the job type associated with this Future class.

get_msa()#

See child function docs.

get_prompt(prompt_index=None)#

See child function docs.

Parameters:

prompt_index (int | None)

get_seed()#

See child function docs.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for job to complete. Do not fetch results (unlike wait())

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

class openprotein.api.poet.PoetGenerateFuture[source]#

Represents a result of a PoET generation job.

session#

An instance of APISession for API interactions.

Type:

APISession

job#

The PoET generation job.

Type:

Job

Methods#
stream() -> Iterator[PoetScoreResult]:

Stream the results of the PoET generation job.

stream()[source]#

Stream the results from the response.

Yields:

PoetScoreResult – A result object containing the sequence, score, and name.

Raises:

APIError – If the request fails.
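
Because stream() yields results one at a time, a caller can stop early without reading the whole response. The sketch below takes the first n results with itertools.islice. The helper name take_first is hypothetical, and it assumes only that stream() returns an iterator.

```python
from itertools import islice

def take_first(future, n):
    """Return the first n results from future.stream() as a list.

    islice stops pulling from the stream once n items have been read,
    so later results are never consumed.
    """
    return list(islice(future.stream(), n))
```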

__init__(session, job)#
Parameters:
get_input(input_type)#

See child function docs.

Parameters:

input_type (PoetInputType)

classmethod get_job_type()#

Return the job type associated with this Future class.

get_msa()#

See child function docs.

get_prompt(prompt_index=None)#

See child function docs.

Parameters:

prompt_index (int | None)

get_seed()#

See child function docs.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for job to complete. Do not fetch results (unlike wait())

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results