Scoring sequences

This tutorial shows you how to use PoET to score sequences relative to a prompt. Scoring is a starting point for predicting how a specific sequence will perform or for prioritizing variants for further analysis.

PoET returns a log-likelihood score for each sequence, which quantifies how likely the model considers that sequence given the prompt. The higher (that is, the less negative) the score, the fitter the sequence is predicted to be.
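To make the scale of these scores concrete, here is a minimal sketch of how a log-likelihood is formed. It assumes the score is a sum of per-residue log-probabilities, as is typical for autoregressive sequence models. The probabilities below are made-up illustrative values, not PoET output, and `log_likelihood` is a hypothetical helper rather than part of the PoET API.

```python
import math

# Hypothetical probabilities that a model assigns to each observed residue
# of a 4-residue sequence (illustrative values only, not PoET output).
residue_probs = [0.5, 0.25, 0.5, 0.125]


def log_likelihood(probs):
    """Return the sum of the natural-log probabilities of each residue."""
    return sum(math.log(p) for p in probs)


score = log_likelihood(residue_probs)
print(f"log-likelihood: {score:.3f}")
```

Because every probability is at most 1, each term is zero or negative, so the total is negative and tends to grow more negative as sequences get longer. This is why scores for proteins of a few hundred residues are typically large negative numbers.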

Scores are comparable only between sequences scored against the same prompt. We don’t recommend comparing scores across different prompts.

What you need before getting started

You need a previously generated multiple sequence alignment (MSA) and a prompt. See Creating an MSA and Creating a prompt for more information.

Scoring your sequences

To access PoET functions, retrieve the PoET model through the session's embedding interface and use it together with your prompt object. You also need to select the sequences to score from your dataset. This example uses FAST-PETase enzyme sequences:

[ ]:
poet = session.embedding.get_model("poet")
# Sequences to score against the prompt
seqs = [
    "MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTFDYPSSRSSQQMAALRQVASLNGDSSSPIYGKVDTARMGVMGHSMGGGASLRSAANNPSLKAAIPQAPWDSQTNFSSVTVPTLIFACENDSIAPVNSHALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTAVSDFRTANCS",
    "MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPESRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS",
    "MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPESRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSQNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS",
]

Next, send your sequences to the PoET scoring endpoint to initiate scoring:

[ ]:
scorejob = poet.score(prompt=prompt.prompt_id, sequences=seqs)

View your results:

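Note: each sequence receives one score per prompt contained in the prompt object, as defined when the prompt was first created. In this example, each sequence has three scores.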

[ ]:
score_results = scorejob.wait()
score_results
[('sequence-1',
  b'MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTFDYPSSRSSQQMAALRQVASLNGDSSSPIYGKVDTARMGVMGHSMGGGASLRSAANNPSLKAAIPQAPWDSQTNFSSVTVPTLIFACENDSIAPVNSHALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTAVSDFRTANCS',
  array([-732.73248291, -728.83435059, -728.38525391])),
 ('sequence-2',
  b'MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPESRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS',
  array([-739.41723633, -735.26245117, -735.42871094])),
 ('sequence-3',
  b'MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPESRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSQNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS',
  array([-741.50085449, -737.48962402, -737.53479004]))]
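Since each sequence comes back with several scores, you usually need to reduce them to a single number before ranking. The sketch below shows one possible approach: average the per-prompt scores and sort from highest to lowest. Averaging is an assumption made for illustration, not a method prescribed by PoET. The values are copied from the output above, with plain lists standing in for the arrays, and `example_scores` is a hypothetical name. With real results, you would unpack each `(name, sequence, scores)` tuple from `score_results` instead.

```python
from statistics import fmean

# Scores copied from the output above: one score per prompt for each sequence.
example_scores = {
    "sequence-1": [-732.73248291, -728.83435059, -728.38525391],
    "sequence-2": [-739.41723633, -735.26245117, -735.42871094],
    "sequence-3": [-741.50085449, -737.48962402, -737.53479004],
}

# Reduce each sequence's per-prompt scores to a single mean score.
mean_scores = {name: fmean(scores) for name, scores in example_scores.items()}

# Rank from highest (least negative) to lowest mean score.
ranked = sorted(mean_scores, key=mean_scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {mean_scores[name]:.2f}")
```

Here sequence-1 ranks first because it has the least negative scores under every prompt. All three sequences were scored against the same prompt object, so the comparison is valid.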

Next steps

Learn more about the score function on our PoET API page.

Now that you have a list of sequence variants of interest, you can use the PoET model to perform a single site analysis, which scores all single-substitution variants of your parent sequence conditioned on the prompt. See Using PoET single site analysis for more information.