Using The Rank Sequences Tool#
This tutorial teaches you how to assess protein fitness by using the Rank Sequences tool to score your input sequences relative to a prompt. Use this as a starting point for predicting the outcomes of a specific sequence or prioritizing variants for further analysis.
On this page, you will learn how to score sequences to predict fitness and rank variants, then interpret and fine-tune the results.
If you run into any challenges or have questions while getting started, please contact OpenProtein.AI support.
What You Need Before Starting#
This tool requires a multiple sequence alignment (MSA), from which it builds a prompt. You can upload your own MSA or have the OpenProtein model generate one for you. If you aren’t already familiar with prompts, we recommend learning more about OpenProtein.AI’s prompts and prompt sampling methods before diving in.
You also need an input sequence, or a list of sequences, that you want to score against the prompt.
Rank Your Sequences#
Navigate to the tool by opening the PoET dropdown menu, then selecting Rank Sequences. You can choose the model used to run the job. We recommend using PoET-2 for most use cases.
Step 1: Input Sequences#
You can upload a dataset containing multiple sequences in either .fasta or .csv format. Once uploaded, your dataset will appear.
If you choose to upload a CSV file, please note the following requirements:
The file must not include a header row.
It can contain a maximum of 2 columns.
If there are 2 columns, the first one must be the sequence names.
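The CSV requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the OpenProtein.AI tooling: the function name `check_sequence_csv` is hypothetical, the restriction to the 20 standard amino-acid letters is an assumption, and the header detection is only a heuristic (a header such as `seq` would slip through because its letters are all valid residues).

```python
import csv
import io

# Assumption: sequences use the 20 standard amino-acid letters only.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence_csv(text):
    """Return a list of problems found in CSV text (empty list = none found)."""
    # Skip completely blank lines, which csv.reader yields as empty rows.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ["file is empty"]
    problems = []
    for number, row in enumerate(rows, start=1):
        if len(row) > 2:
            problems.append(f"row {number}: more than 2 columns")
            continue
        # With 2 columns the name comes first, so the sequence is the last cell.
        sequence = row[-1].strip().upper()
        if not sequence or set(sequence) - AMINO_ACIDS:
            # A non-sequence value in the first row usually means a header row.
            hint = " (possible header row)" if number == 1 else ""
            problems.append(f"row {number}: not a valid sequence{hint}")
    return problems


if __name__ == "__main__":
    print(check_sequence_csv("name,sequence\nv1,MKTAYIAK\n"))
```

An empty list means the sketch found nothing to flag. It does not guarantee that the upload will be accepted.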
You can choose the default structure prediction model to generate the sequence structures after the job completes.
Step 2: Prompt Query#
Refer to Creating a Query to learn about Prompt Query.
Step 3: Prompt Context#
Refer to Creating a Context to learn about Prompt Context.
You’re ready to rank your sequences! Click Run. The job may take a few minutes depending on how busy the service is, how long your sequences are, and how many sequences you want to score.
A 400 (Bad request) error code may be due to the following:
| Issue description | Solution |
|---|---|
| Invalid PoET Job or Parent | Re-enter prompt and try again. |
| Invalid prompt in PoET service | Reupload prompt and try again. Refer to the article about prompts. Ensure minimum and maximum similarity parameters are not filtering out all sequences in prompt. |
| Invalid user input in align service | Ensure you don’t have … If necessary, refer to the article on sampling parameters. |
| Invalid MSA (not aligned, etc) | |
Please contact OpenProtein.AI support if the suggested solutions don’t resolve the issue.
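Two of the causes listed above, an unaligned MSA and similarity parameters that filter out every sequence, can be screened for locally. The sketch below rests on stated assumptions: the seed is taken to be the first row of the MSA, gaps are written as `-`, and similarity is computed as the fraction of identical residues over the seed's non-gap columns. The service may define similarity differently, and all function names here are hypothetical.

```python
def is_aligned(msa):
    """True if the MSA is non-empty and every row has the same length."""
    return len(msa) > 0 and len({len(row) for row in msa}) == 1


def identity_to_seed(seed, other):
    """Fraction of the seed's non-gap columns where `other` has the same residue."""
    columns = [(a, b) for a, b in zip(seed, other) if a != "-"]
    if not columns:
        return 0.0
    return sum(a == b for a, b in columns) / len(columns)


def surviving_sequences(msa, min_similarity, max_similarity):
    """Rows other than the seed (assumed to be row 0) inside the similarity window."""
    seed = msa[0]
    return [
        row
        for row in msa[1:]
        if min_similarity <= identity_to_seed(seed, row) <= max_similarity
    ]


if __name__ == "__main__":
    msa = ["MKTA", "MKTA", "MKSA", "M--A", "AAAA"]
    print(is_aligned(msa))
    print(surviving_sequences(msa, 0.5, 0.9))
```

If `surviving_sequences` returns an empty list for the thresholds you plan to use, widening the minimum and maximum similarity range is a reasonable first thing to try.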
Interpreting Your Results#
Refer to Interpreting PoET Results Table.
Fine-tuning Your Results#
Improve your results by adding more sequences with your desired properties to your MSA, or by adjusting the prompt sampling method. You can also adjust the Maximum similarity to seed sequence and Minimum similarity to seed sequence fields.
To improve scores, increase the value of the ensemble setting. This will result in higher-scoring sequences, but the job will take longer to complete.
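One way to picture an ensemble setting is as several prompts whose per-sequence scores are combined. The sketch below assumes, without confirmation from this page, that the scores are averaged across prompts. The function names and score values are made up for illustration and do not reflect the service's actual output format.

```python
from statistics import mean


def ensemble_scores(per_prompt_scores):
    """Average each sequence's score across prompts.

    per_prompt_scores: one dict per prompt, mapping sequence name -> score.
    Every dict is assumed to contain the same sequence names.
    """
    names = per_prompt_scores[0].keys()
    return {
        name: mean(scores[name] for scores in per_prompt_scores)
        for name in names
    }


def rank(scores):
    """Sequence names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Illustrative values only: two prompts that disagree on the ordering.
    runs = [
        {"v1": -10.0, "v2": -12.0},
        {"v1": -14.0, "v2": -11.0},
    ]
    print(rank(ensemble_scores(runs)))
```

Under this assumption, a larger ensemble means more prompts to score against, which is consistent with the longer run time noted above.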
Next Steps#
Now that you have a list of sequence variants of interest, you can use Structure Prediction to visualize the 3D structure of a protein sequence. You can also use Substitution Analysis to score all single-substitution variants of your parent sequence conditioned on the prompt, and view the results in a heatmap.