Creating a multiple sequence alignment
Multiple sequence alignment (MSA) is a technique for biological sequence analysis. It is used to infer sequence homology and to conduct phylogenetic analyses that assess the sequences’ shared evolutionary origins. You can create an MSA from a seed sequence or upload an existing MSA file. This tutorial covers both workflows.
What you need before getting started
You need either a seed sequence or an existing MSA formatted as a .fa, .fasta, or .csv file.
Creating an MSA from a seed sequence
Start the seed workflow by specifying your seed sequence. This example uses alpha-synuclein:
[ ]:
seed = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"
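Before submitting a seed, it can help to confirm that it contains only amino-acid letters. The sketch below is not part of the OpenProtein client: `find_invalid_residues` is a hypothetical helper, and it assumes that only the 20 standard one-letter codes are acceptable.

```python
# Hypothetical helper, not part of the OpenProtein client.
# Assumption: only the 20 standard amino-acid one-letter codes are valid.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (position, character) pairs for characters outside the standard alphabet."""
    return [
        (index, char)
        for index, char in enumerate(sequence)
        if char.upper() not in STANDARD_AMINO_ACIDS
    ]


example = "MDVFMKGLSKAKEGVVAAAEKTKQGV"  # first residues of the seed above
print(find_invalid_residues(example))     # no invalid characters expected
print(find_invalid_residues("MDVX-KGL"))  # flags the X and the gap character
```

An empty list means every character passed the check. Positions are zero-based.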
Use the Align module to create an MSA from your seed sequence:
[ ]:
msa = session.align.create_msa(seed.encode())
print(msa)
status=<JobStatus.SUCCESS: 'SUCCESS'> job_id='52d676fb-18bf-4803-9912-0380252b78e8' job_type=<JobType.align_align: '/align/align'> created_date=datetime.datetime(2024, 6, 13, 3, 12, 6, 555562) start_date=None end_date=datetime.datetime(2024, 6, 13, 3, 12, 6, 556046) prerequisite_job_id=None progress_message=None progress_counter=None num_records=None sequence_length=None msa_id='52d676fb-18bf-4803-9912-0380252b78e8'
Wait for the results with:
[ ]:
r = msa.wait()
To examine the seed you submitted:
[ ]:
list(msa.get_seed())
[['seed',
'MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA']]
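Retrieve the resulting MSA. The call returns a csv.reader over the aligned rows, so wrap it in list() to see the rows themselves: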
[ ]:
msa.get_msa() # or msa.wait()
<_csv.reader at 0x79cbbd0bd8c0>
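The reader yields one row per aligned sequence. Judging from the outputs shown in this tutorial, each row is a `[name, sequence]` pair. Under that assumption, the rows can be turned into FASTA text as sketched below. `rows_to_fasta` is a hypothetical helper, and the sample rows are made up so the snippet runs without a session.

```python
import csv
import io


def rows_to_fasta(rows) -> str:
    """Format an iterable of [name, sequence] rows as FASTA text."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in rows)


# Stand-in for msa.get_msa(): a csv.reader over made-up two-column rows.
sample_reader = csv.reader(io.StringIO("seed,MKV-LA\nhit_1,MKVALA\n"))
fasta_text = rows_to_fasta(sample_reader)
print(fasta_text)
```

With a real job, `rows_to_fasta(msa.get_msa())` would be the equivalent call, provided the two-column row layout holds.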
Uploading an MSA
If you have an existing MSA formatted as a .fa, .fasta, or .csv file, upload it with upload_msa(msa_file).
Upload and view your MSA:
[ ]:
f = ">101\nAAALLLPPP"
msa = session.align.upload_msa(f.encode())
list(msa.get_msa())
[['101', 'AAALLLPPP']]
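Rows of an alignment are expected to share one length, so a quick local check before uploading can catch a malformed file. The following is a minimal sketch with hypothetical `parse_fasta` and `is_aligned` helpers. It handles plain FASTA text only, and whether the service itself enforces equal lengths is not stated here.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs, joining wrapped sequence lines."""
    records: list[tuple[str, str]] = []
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records: list[tuple[str, str]]) -> bool:
    """True when there is at least one record and all sequences have equal length."""
    lengths = {len(sequence) for _, sequence in records}
    return len(lengths) == 1


fasta = ">101\nAAALLLPPP\n>102\nAAA-LLPPP\n"  # made-up two-row alignment
records = parse_fasta(fasta)
print(records, is_aligned(records))
```

If the check passes, the same text can be encoded with `fasta.encode()` and passed to `upload_msa` as in the cell above.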
Next steps
Learn more about MSAs on our MSA API page.
You can use your MSA to create a prompt and start generating, scoring, and analyzing sequences with our state-of-the-art PoET model. See Creating a prompt for instructions.
You can also use your MSA with our structure prediction tool to visualize the 3D structure of a sequence. See Structure prediction for more information.