Using AlphaFold2
This tutorial shows you how to use the AlphaFold2 model to predict the structure of your protein sequence of interest and return it in PDB format. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit Using ESMFold.
What you need before getting started
Specify a sequence of interest whose structure you want to predict. The example used here is interleukin 2:
[ ]:
sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"
Creating an MSA
AlphaFold2 requires evolutionary context from a multiple sequence alignment (MSA) to make structure predictions. This section demonstrates how to create an MSA based on the sequence you wish to fold.
Start by getting the AlphaFold2 model object:
[ ]:
afmodel = session.fold.get_model('alphafold2')
afmodel.fold?
[ ]:
afmodel.metadata
ModelMetadata(model_id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='alphafold2 model.'), max_sequence_length=2048, dimension=-1, output_types=['fold'], input_tokens=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', ':'], output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
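The metadata above lists the accepted input tokens and a maximum sequence length of 2048, so it can be useful to check a sequence locally before submitting it. The following is a minimal sketch, not part of the OpenProtein API: `check_sequence` is a hypothetical helper, the token set and limit are copied by hand from the metadata output, and it assumes every character (including the `:` chain token) counts toward the length limit, which the metadata does not state.

```python
# Allowed tokens and length limit copied from the afmodel.metadata output.
# ':' is the chain token used to separate chains in multi-chain input.
ALLOWED_TOKENS = set("ARNDCQEGHILKMFPSTWYV") | {":"}
MAX_SEQUENCE_LENGTH = 2048


def check_sequence(seq):
    """Return a list of problems found in seq (empty list if none).

    Hypothetical helper for local validation only.
    """
    problems = []
    # Assumption: every character, including ':', counts toward the limit.
    if len(seq) > MAX_SEQUENCE_LENGTH:
        problems.append(f"length {len(seq)} exceeds {MAX_SEQUENCE_LENGTH}")
    # Report any characters that are not in the model's input vocabulary.
    bad = sorted(set(seq) - ALLOWED_TOKENS)
    if bad:
        problems.append(f"unsupported characters: {''.join(bad)}")
    return problems
```

Calling `check_sequence(sequence)` before creating the MSA can surface unsupported characters or an over-long input early, without waiting on a remote job.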
Use your seed sequence to create an MSA:
[ ]:
msa = session.align.create_msa(sequence.encode())
print(msa)
status=<JobStatus.SUCCESS: 'SUCCESS'> job_id='479a7434-d92f-46da-b785-f6dd6c250b1c' job_type=<JobType.align_align: '/align/align'> created_date=datetime.datetime(2024, 6, 25, 3, 2, 39, 606761) start_date=None end_date=datetime.datetime(2024, 6, 25, 3, 2, 39, 607001) prerequisite_job_id=None progress_message=None progress_counter=None num_records=None sequence_length=None msa_id='479a7434-d92f-46da-b785-f6dd6c250b1c'
Examine the outputs once the MSA is complete:
[ ]:
msa.wait_until_done(verbose=True)
print(list(msa.get_msa())[0:3])
Waiting: 100%|██████████| 100/100 [00:00<00:00, 1486.36it/s, status=SUCCESS]
[['seed', 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'], ['UniRef100_G1RE34', 'MYRMQLLSCIALSLALVTNGAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVQELKGSETTFMCEWITFCQSIISTLT----------------------------------------------------------------------------------------------------'], ['UniRef100_A0A2K5MA48', 'MYRMQLLSCIALSLALVANSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRTKDLISNINVIVLELKGSETTLMCEWITFCQSIISTLT----------------------------------------------------------------------------------------------------']]
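Each MSA row shown above is a `[name, aligned_sequence]` pair, with `-` marking gaps. One quick sanity check is the fraction of seed positions that each hit reproduces exactly. The sketch below uses short toy rows rather than real output, and `identity_to_seed` is a hypothetical helper, not part of the OpenProtein client.

```python
def identity_to_seed(seed, aligned):
    """Fraction of seed positions where the aligned row has the same residue."""
    if len(seed) != len(aligned):
        raise ValueError("rows must have the same aligned length")
    # Gap characters never count as a match.
    matches = sum(1 for a, b in zip(seed, aligned) if a == b and a != "-")
    return matches / len(seed)


# Toy rows in the same [name, aligned_sequence] layout as the MSA output.
rows = [
    ["seed", "MYRMQLLS"],
    ["hit_1", "MYRMQLLS"],
    ["hit_2", "MYKMQL--"],
]

seed_seq = rows[0][1]
identities = {name: identity_to_seed(seed_seq, seq) for name, seq in rows[1:]}
```

With real data, `rows` would be `list(msa.get_msa())` as in the cell above. Rows that end in long runs of gaps, like the two UniRef hits shown, will score lower because they cover only part of the seed.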
Predicting your sequence
Review the documentation of the fold method before calling the AlphaFold2 model:
[ ]:
afmodel.fold?
Send the MSA to the fold endpoint and return a fold job to await:
[ ]:
fold = afmodel.fold(msa=msa, num_models=1)
fold
<openprotein.api.fold.FoldResultFuture at 0x7d01d6d8c1f0>
[ ]:
fold.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|██████████| 100/100 [02:30<00:00, 1.50s/it, status=SUCCESS]
True
Wait for the job to complete and fetch the results in a single call with wait():
[ ]:
result = fold.wait(verbose=True)
result[0][0]
Waiting: 100%|██████████| 100/100 [00:00<00:00, 980.44it/s, status=SUCCESS]
b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'
Print the first few lines of the predicted structure, which is returned in PDB format:
[ ]:
print("\n".join(result[0][1].decode().split("\n")[:5]))
MODEL 1
ATOM 1 N MET A 1 -24.000 8.852 20.203 1.00 45.47 N
ATOM 2 CA MET A 1 -23.406 9.719 19.188 1.00 45.47 C
ATOM 3 C MET A 1 -22.453 8.938 18.281 1.00 45.47 C
ATOM 4 CB MET A 1 -22.672 10.883 19.844 1.00 45.47 C
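AlphaFold2 conventionally writes its per-residue confidence score (pLDDT) into the B-factor column of each ATOM record, which would correspond to the 45.47 values printed above. Whether this service follows that convention is an assumption here. The sketch below reads the B-factor field from fixed PDB columns 61 to 66; `atom_line` and `bfactors` are hypothetical helpers, and `atom_line` exists only to build correctly spaced sample records.

```python
# Fixed-width layout of a PDB ATOM record (columns 1-78).
PDB_ATOM_FORMAT = (
    "ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
    "{x:>8.3f}{y:>8.3f}{z:>8.3f}{occ:>6.2f}{bfac:>6.2f}          {elem:>2}"
)


def atom_line(serial, name, res, chain, resseq, x, y, z, occ, bfac, elem):
    """Build one correctly spaced ATOM record (used for sample data only)."""
    return PDB_ATOM_FORMAT.format(
        serial=serial, name=name, res=res, chain=chain, resseq=resseq,
        x=x, y=y, z=z, occ=occ, bfac=bfac, elem=elem,
    )


def bfactors(pdb_text):
    """Return the B-factor (columns 61-66) of every ATOM record."""
    return [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) >= 66
    ]


# Sample records using the coordinates from the first two atoms above.
sample_pdb = "\n".join([
    "MODEL        1",
    atom_line(1, " N", "MET", "A", 1, -24.000, 8.852, 20.203, 1.00, 45.47, "N"),
    atom_line(2, " CA", "MET", "A", 1, -23.406, 9.719, 19.188, 1.00, 45.47, "C"),
])

values = bfactors(sample_pdb)
mean_bfactor = sum(values) / len(values)
```

On a real prediction, `bfactors(result[0][1].decode())` would give one value per atom. Under the assumption above, low values point to regions where the prediction is less confident.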
Next steps
After the PDB contents are returned, save them as a file for use with your molecular visualization system of choice.
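Saving the structure can be sketched with the standard library alone. The PDB contents arrive as bytes (the tutorial calls `.decode()` on `result[0][1]`), so they can be written to disk unchanged. `save_pdb` and the file name are hypothetical, and the example bytes are a stand-in for a real prediction.

```python
import tempfile
from pathlib import Path


def save_pdb(pdb_bytes, path):
    """Write raw PDB bytes to path and return the path as a Path object."""
    path = Path(path)
    path.write_bytes(pdb_bytes)
    return path


# Stand-in for result[0][1]; a real prediction contains full ATOM records.
example_pdb = b"MODEL        1\nENDMDL\n"

# Write to a temporary directory and read the file back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    out = save_pdb(example_pdb, Path(tmp) / "il2_alphafold2.pdb")
    round_trip = out.read_bytes()
```

In the notebook, `save_pdb(result[0][1], "il2_alphafold2.pdb")` would write the predicted structure to the working directory, ready to open in a molecular viewer.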