Examining structure prediction models#

This tutorial shows you how to view information about our structure prediction models, including ESMFold, AlphaFold2, and Boltz. We recommend ESMFold for single-chain sequences and Boltz-2 or AlphaFold2 for multi-chain sequences.
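The recommendation above can be sketched as a small helper. This is only an illustration: recommend_fold_model is a hypothetical name and not part of the openprotein client, and it assumes that chains are joined with ':', the chain token listed in the model metadata later in this tutorial.

```python
def recommend_fold_model(sequence: str) -> str:
    """Suggest a model ID for a sequence (hypothetical helper, not a client API)."""
    # Assumption: a ':' in the sequence separates chains.
    is_multi_chain = ":" in sequence
    # Boltz-2 is returned for multi-chain input; AlphaFold2 is the other
    # recommended option for that case.
    return "boltz-2" if is_multi_chain else "esmfold"


print(recommend_fold_model("MKTAYIAKQR"))         # single chain
print(recommend_fold_model("MKTAYIAKQR:GSHMLE"))  # two chains
```

The returned string can be passed to session.fold.get_model, as shown in the sections below.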

Viewing the models#

Access a list of the available folding models:

[1]:

import openprotein
session = openprotein.connect()
session.fold.list_models()

[1]:

[alphafold2, boltz-1, boltz-1x, boltz-2, esmfold, minifold, rosettafold-3]

ESMFold#

View the documentation of the fold function, then the model metadata:

[2]:

esmfoldmodel = session.fold.get_model('esmfold')
esmfoldmodel.fold?

[3]:

esmfoldmodel.metadata

[3]:

ModelMetadata(model_id='esmfold', description=ModelDescription(citation_title='Evolutionary-scale prediction of atomic level protein structure with a language model', doi='10.1126/science.ade2574', summary='esmfold_v1 model with 690M parameters, running on top of esm2_t36_3B_UR50D with 3B parameters.'), max_sequence_length=1024, dimension=-1, output_types=['fold'], input_tokens=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', ':'], output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
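One way to use these fields is to check a sequence before submitting it. The sketch below runs offline: MetadataStub is a hypothetical stand-in that copies max_sequence_length and input_tokens from the output above, and find_problems is an illustrative helper, not a client function. It assumes every character, including the ':' chain token, counts toward the length limit, which the metadata does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataStub:
    """Hypothetical stand-in for two fields of the metadata printed above."""
    max_sequence_length: int
    input_tokens: Optional[List[str]]


def find_problems(sequence: str, metadata: MetadataStub) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # Assumption: every character counts toward the limit.
    if len(sequence) > metadata.max_sequence_length:
        problems.append(
            f"length {len(sequence)} exceeds {metadata.max_sequence_length}"
        )
    # input_tokens may be None, in which case the token check is skipped.
    if metadata.input_tokens is not None:
        unsupported = sorted(set(sequence) - set(metadata.input_tokens))
        if unsupported:
            problems.append(f"unsupported tokens: {unsupported}")
    return problems


# Values copied from the ESMFold metadata output.
esmfold_stub = MetadataStub(
    max_sequence_length=1024,
    input_tokens=list("ARNDCQEGHILKMFPSTWYV") + [":"],
)

print(find_problems("MKTAYIAKQR", esmfold_stub))
print(find_problems("MKXB", esmfold_stub))
```

With a live session, the same function should work on esmfoldmodel.metadata directly, since it only reads the two attributes shown in the output.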

AlphaFold2#

View the details of AlphaFold2. Note that input_tokens is None for AlphaFold2 because the model takes an MSA as input instead. It supports the same tokens as ESMFold, as listed in token_descriptions.

[4]:

afmodel = session.fold.get_model('alphafold2')
afmodel.fold?

[5]:

afmodel.metadata

[5]:

ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='alphafold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=None, output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
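Because input_tokens is None here, the supported tokens can be recovered from token_descriptions instead. The sketch below is a minimal offline illustration: the TokenInfo dataclass is a stand-in that mirrors the fields printed above rather than the client's own class, supported_tokens is a hypothetical helper, and the example list is shortened to three of the 21 entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenInfo:
    """Stand-in mirroring the TokenInfo fields in the output above."""
    id: int
    token: str
    primary: bool
    description: str


def supported_tokens(token_descriptions: List[List[TokenInfo]]) -> List[str]:
    """Flatten the nested token_descriptions list into a list of token strings."""
    # The output shows one inner list per token ID, so iterate over both levels.
    return [info.token for group in token_descriptions for info in group]


# Three entries copied from the AlphaFold2 metadata output.
descriptions = [
    [TokenInfo(id=0, token="A", primary=True, description="Alanine")],
    [TokenInfo(id=1, token="R", primary=True, description="Arginine")],
    [TokenInfo(id=20, token=":", primary=False,
               description="Chain token, used for polymers")],
]

print(supported_tokens(descriptions))  # ['A', 'R', ':']
```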

Next steps#

You can examine the other available structure prediction models in the same way. When you are ready, predict and visualize the structure of your sequence of interest using one of these models. See Using ESMFold and Using AlphaFold2 for instructions.
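To repeat the metadata lookup for several models at once, a loop along the following lines may help. It is a sketch that relies only on the calls shown in this tutorial (session.fold.get_model and the metadata attribute); summarize_fold_models is a hypothetical helper name, and the session argument is assumed to be the object returned by openprotein.connect().

```python
def summarize_fold_models(session, model_ids):
    """Map each model ID to the max_sequence_length reported in its metadata."""
    summary = {}
    for model_id in model_ids:
        # Same lookup as in the ESMFold and AlphaFold2 sections above.
        metadata = session.fold.get_model(model_id).metadata
        summary[model_id] = metadata.max_sequence_length
    return summary


# Example call, assuming a connected session:
# summarize_fold_models(session, ["esmfold", "alphafold2", "boltz-2"])
```

Returning a plain dictionary keeps the result easy to print or compare. Other metadata fields, such as output_types, could be collected the same way.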