Using AlphaFold2

This tutorial shows you how to use the AlphaFold2 model to predict the structure of a protein sequence of interest and retrieve it as a PDB file. We recommend AlphaFold2 for multi-chain sequences. For a single-chain sequence, see Using ESMFold.

What you need before getting started

Specify a sequence of interest whose structure you want to predict. The example used here is interleukin 2:


sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"
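Because the model metadata lists ":" as its chain token ("Chain token, used for polymers"), one way to prepare a multi-chain input is to join the individual chain sequences with ":". The sketch below assumes that convention; the chain sequences and the join_chains helper are hypothetical placeholders, not part of the OpenProtein API.

```python
# Hypothetical chain sequences for illustration only; replace with your own.
chain_a = "MKTAYIAKQR"
chain_b = "GSHMLEDPVD"

# Assumption: chains are separated by ":", the chain token listed in the
# AlphaFold2 model metadata.
CHAIN_SEPARATOR = ":"


def join_chains(*chains):
    """Join per-chain sequences into one multi-chain input string."""
    cleaned = [c.strip().upper() for c in chains]
    if any(not c for c in cleaned):
        raise ValueError("empty chain sequence")
    if any(CHAIN_SEPARATOR in c for c in cleaned):
        raise ValueError("chain sequences must not contain the separator")
    return CHAIN_SEPARATOR.join(cleaned)


complex_sequence = join_chains(chain_a, chain_b)
print(complex_sequence)
```

The helper normalizes case and rejects empty chains so that a stray separator does not silently create a zero-length chain.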

Creating an MSA

AlphaFold2 requires evolutionary context from a multiple sequence alignment (MSA) to make structure predictions. This section demonstrates how to create an MSA based on the sequence you wish to fold.

Start by getting the AlphaFold2 model object:


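# `session` is assumed to be an authenticated OpenProtein session, for example
# one created with openprotein.connect(username=..., password=...).
afmodel = session.fold.get_model('alphafold2')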

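Inspect the model metadata, which lists the maximum sequence length and the accepted input tokens, including ':' as the chain token: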

afmodel.metadata

ModelMetadata(model_id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='alphafold2 model.'), max_sequence_length=2048, dimension=-1, output_types=['fold'], input_tokens=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', ':'], output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
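Before submitting a job, you might check a sequence against the limits reported in the metadata above. This is a minimal sketch: the check_sequence helper is hypothetical, the constants are copied from the printed metadata, and counting ":" toward the length limit is a conservative assumption rather than documented behavior.

```python
# Values copied from the afmodel.metadata output above.
MAX_SEQUENCE_LENGTH = 2048
ALLOWED_TOKENS = set("ARNDCQEGHILKMFPSTWYV:")


def check_sequence(seq, max_len=MAX_SEQUENCE_LENGTH, allowed=ALLOWED_TOKENS):
    """Return a list of problems found in seq (an empty list means none)."""
    problems = []
    # Any character outside the model's input tokens would be rejected.
    bad = sorted(set(seq) - set(allowed))
    if bad:
        problems.append(f"unsupported tokens: {''.join(bad)}")
    # Conservative assumption: count every character, including ":".
    if len(seq) > max_len:
        problems.append(f"length {len(seq)} exceeds {max_len}")
    # A leading, trailing, or doubled separator implies an empty chain.
    if seq.startswith(":") or seq.endswith(":") or "::" in seq:
        problems.append("empty chain")
    return problems


print(check_sequence("MKTAYIAKQR:GSHMLEDPVD"))
```

In a live session, the same values could likely be read from afmodel.metadata.max_sequence_length and afmodel.metadata.input_tokens, as the printed representation suggests, and passed in as max_len and allowed.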

Use your seed sequence to create an MSA:


msa = session.align.create_msa(sequence.encode())
print(msa)

status=<JobStatus.SUCCESS: 'SUCCESS'> job_id='479a7434-d92f-46da-b785-f6dd6c250b1c' job_type=<JobType.align_align: '/align/align'> created_date=datetime.datetime(2024, 6, 25, 3, 2, 39, 606761) start_date=None end_date=datetime.datetime(2024, 6, 25, 3, 2, 39, 607001) prerequisite_job_id=None progress_message=None progress_counter=None num_records=None sequence_length=None msa_id='479a7434-d92f-46da-b785-f6dd6c250b1c'

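Wait for the MSA job to finish, then inspect the first three entries. Each entry is a [name, aligned sequence] pair, starting with your seed sequence: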


msa.wait_until_done(verbose=True)

print(list(msa.get_msa())[0:3])

Waiting: 100%|██████████| 100/100 [00:00<00:00, 1486.36it/s, status=SUCCESS]

[['seed', 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'], ['UniRef100_G1RE34', 'MYRMQLLSCIALSLALVTNGAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVQELKGSETTFMCEWITFCQSIISTLT----------------------------------------------------------------------------------------------------'], ['UniRef100_A0A2K5MA48', 'MYRMQLLSCIALSLALVANSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRTKDLISNINVIVLELKGSETTLMCEWITFCQSIISTLT----------------------------------------------------------------------------------------------------']]
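Each MSA row is aligned to the seed, with "-" marking gaps. A quick way to gauge how informative the hits are is to compute, for each row, its coverage of the seed and its identity to the seed over covered positions. The row_stats helper below is a hypothetical sketch that assumes each row has the same aligned length as the seed (it raises an error otherwise); alignment formats with insertion columns would need extra handling.

```python
from typing import Tuple


def row_stats(seed: str, aligned: str) -> Tuple[float, float]:
    """Return (coverage, identity) of an aligned row relative to the seed.

    coverage: fraction of seed positions where the row is not a gap ("-").
    identity: fraction of covered positions where the row matches the seed.
    """
    if len(seed) != len(aligned):
        raise ValueError("rows must have the same aligned length as the seed")
    covered = [(s, a) for s, a in zip(seed, aligned) if a != "-"]
    coverage = len(covered) / len(seed)
    identity = sum(s == a for s, a in covered) / len(covered) if covered else 0.0
    return coverage, identity


# Hypothetical toy rows for illustration.
seed_row = "MKTAYIAKQR"
hit_row = "MKTSYIAK--"
coverage, identity = row_stats(seed_row, hit_row)
print(f"coverage={coverage:.2f} identity={identity:.2f}")
```

Applied to the tutorial's alignment, this could look like rows = list(msa.get_msa()), followed by row_stats(rows[0][1], row) for each name, row in rows[1:].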

Predicting the structure of your sequence

Review the arguments accepted by the AlphaFold2 fold method:


afmodel.fold?

Send the MSA to the fold endpoint. This returns a FoldResultFuture for the fold job, which you can wait on:


fold = afmodel.fold(msa=msa, num_models=1)

fold

<openprotein.api.fold.FoldResultFuture at 0x7d01d6d8c1f0>

Block until the fold job finishes. The timeout argument caps how long to wait:

fold.wait_until_done(verbose=True, timeout=600)

Waiting: 100%|██████████| 100/100 [02:30<00:00,  1.50s/it, status=SUCCESS]

True

Alternatively, wait() waits for the job to complete and fetches the results in a single call:


result = fold.wait(verbose=True)
# Each entry in result pairs the input sequence with the PDB contents, both as bytes
result[0][0]

Waiting: 100%|██████████| 100/100 [00:00<00:00, 980.44it/s, status=SUCCESS]

b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'

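Each result entry pairs the input sequence (shown above) with the predicted structure as PDB-formatted bytes. Decode the structure and print its first five lines: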


pdb_text = result[0][1].decode()  # PDB contents of the first prediction
print("\n".join(pdb_text.splitlines()[:5]))

MODEL     1
ATOM      1  N   MET A   1     -24.000   8.852  20.203  1.00 45.47           N
ATOM      2  CA  MET A   1     -23.406   9.719  19.188  1.00 45.47           C
ATOM      3  C   MET A   1     -22.453   8.938  18.281  1.00 45.47           C
ATOM      4  CB  MET A   1     -22.672  10.883  19.844  1.00 45.47           C
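AlphaFold2 conventionally stores its per-residue confidence score (pLDDT, 0 to 100) in the B-factor column of the PDB file, so under that convention the 45.47 values above would be the pLDDT of residue 1. The sketch below reads the fixed-width PDB columns for chain ID, residue number, and B-factor from ATOM records. The helper name is hypothetical, and treating B-factor as pLDDT is an assumption you should confirm for this service's output.

```python
from statistics import mean

# The ATOM records printed above, reproduced as sample input.
SAMPLE_PDB = """MODEL     1
ATOM      1  N   MET A   1     -24.000   8.852  20.203  1.00 45.47           N
ATOM      2  CA  MET A   1     -23.406   9.719  19.188  1.00 45.47           C
ATOM      3  C   MET A   1     -22.453   8.938  18.281  1.00 45.47           C
ATOM      4  CB  MET A   1     -22.672  10.883  19.844  1.00 45.47           C
"""


def plddt_per_residue(pdb_text):
    """Map (chain ID, residue number) -> B-factor, read from ATOM records.

    Assumes, as in standard AlphaFold2 output, that B-factor holds pLDDT.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]              # column 22: chain identifier
        resnum = int(line[22:26])     # columns 23-26: residue sequence number
        bfactor = float(line[60:66])  # columns 61-66: temperature factor
        # Every atom of a residue carries the same value; keep the first one.
        scores.setdefault((chain, resnum), bfactor)
    return scores


scores = plddt_per_residue(SAMPLE_PDB)
print(f"residues: {len(scores)}, mean pLDDT: {mean(scores.values()):.2f}")
```

With the real prediction, pass result[0][1].decode() instead of SAMPLE_PDB. Fixed columns are used rather than whitespace splitting because wide coordinate values can run together in PDB files.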

Next steps

After the PDB contents are returned, save them as a file for use with your molecular visualization system of choice.
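A minimal sketch for writing every returned structure to disk, assuming each entry in result holds the PDB bytes at index 1, as in result[0][1] above. The save_predictions helper and the directory and file names are hypothetical.

```python
from pathlib import Path


def save_predictions(result, out_dir="predictions", stem="alphafold2"):
    """Write each predicted structure in result to its own .pdb file.

    Assumes result[i][1] holds the PDB contents of prediction i as bytes.
    Returns the list of paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, entry in enumerate(result):
        pdb_bytes = entry[1]
        path = out / f"{stem}_{i + 1}.pdb"
        path.write_bytes(pdb_bytes)
        paths.append(path)
    return paths
```

For example, save_predictions(result) would write predictions/alphafold2_1.pdb, a plain PDB file that common molecular viewers can open.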