Using RosettaFold-3#
This tutorial demonstrates how to use the RosettaFold-3 model to predict the structure of a molecular complex, including proteins and ligands. We will also show how to request and retrieve predicted binding affinities and other quality metrics.
What you need before getting started#
First, ensure you have an active OpenProtein session. Then, import the necessary classes for defining the components of your complex.
[1]:
import openprotein
from openprotein.protein import Protein
from openprotein.chains import Ligand
# Login to your session
session = openprotein.connect()
Defining the Molecules#
RosettaFold-3 can model proteins and ligands. For this example, we’ll predict the structure of a protein dimer in complex with a ligand.
We will define a dimer and one ligand. With RosettaFold-3, as with Boltz, we can mark a Protein as an oligomer by giving multiple IDs in its chain_id. Here the protein is a dimer because its chain_id is ["A", "B"].
Note that for affinity prediction, the binding ligand must have a single, unique string as its chain_id.
[2]:
# Define the proteins
proteins = [
    Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ"),
]
proteins[0].chain_id = ["A", "B"]
# You can also specify the proteins to be cyclic by setting the property
# proteins[0].cyclic = True
# Define the ligand
# We use the three-letter CCD code for S-adenosyl-L-homocysteine (SAH)
# and assign it the chain_id "C".
ligands = [
    Ligand(ccd="SAH", chain_id="C")
]
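The chain ID rules described above can be checked before submitting a job. The following is a minimal sketch in plain Python, not part of the OpenProtein client: flatten_chain_ids and check_chain_ids are hypothetical helpers, and treating chain IDs as unique across the whole complex is an assumption made here for illustration.

```python
from typing import Iterable, List, Union

ChainId = Union[str, List[str]]


def flatten_chain_ids(chain_ids: Iterable[ChainId]) -> List[str]:
    """Expand each chain_id (a string or a list of strings) into one flat list."""
    flat: List[str] = []
    for cid in chain_ids:
        if isinstance(cid, str):
            flat.append(cid)
        else:
            flat.extend(cid)
    return flat


def check_chain_ids(protein_ids: Iterable[ChainId], ligand_ids: Iterable[ChainId]) -> List[str]:
    """Hypothetical check: ligands use a single string ID, and no ID is reused."""
    ligand_ids = list(ligand_ids)
    for cid in ligand_ids:
        if not isinstance(cid, str):
            raise ValueError(f"ligand chain_id must be a single string, got {cid!r}")
    flat = flatten_chain_ids(list(protein_ids) + ligand_ids)
    # Assumption: every chain ID should appear only once in the complex.
    duplicates = sorted({c for c in flat if flat.count(c) > 1})
    if duplicates:
        raise ValueError(f"duplicate chain IDs: {duplicates}")
    return flat


# The dimer ["A", "B"] plus the ligand "C" from this tutorial
all_ids = check_chain_ids(protein_ids=[["A", "B"]], ligand_ids=["C"])
print(all_ids)
```

With the real objects, the same check could be fed from the chain_id attribute of each Protein and Ligand.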
Create MSA for the Protein using Homology Search#
When using RosettaFold-3 with protein sequences, we need to supply an MSA to help inform the model. Alternatively, we can explicitly run in single-sequence mode. Either way, protein.msa must be set, either to an MSA or to Protein.single_sequence_mode.
Here, we will build an MSA using the platform's homology search. Take note of the syntax: creating an MSA for a complex uses ColabFold’s convention of joining the sequences with ":".
[3]:
# Repeat each sequence once per chain so the seed covers every copy in the complex
msa_query = []
for p in proteins:
    if p.chain_id is not None and isinstance(p.chain_id, list):
        for _ in p.chain_id:
            msa_query.append(p.sequence.decode())
    else:
        msa_query.append(p.sequence.decode())
msa = session.align.create_msa(seed=":".join(msa_query))
for p in proteins:
    p.msa = msa
    # If desired, use single sequence mode to specify no msa
    # p.msa = Protein.single_sequence_mode
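To make the seed format concrete, here is a small standalone sketch of the same joining logic using toy sequences. build_msa_seed is a hypothetical helper written for illustration; it does not call the platform and only shows what string ends up being passed as the seed.

```python
from typing import List, Optional, Sequence, Tuple, Union

ChainId = Optional[Union[str, List[str]]]


def build_msa_seed(components: Sequence[Tuple[str, ChainId]]) -> str:
    """Join sequences with ':' and repeat each one once per chain ID."""
    parts: List[str] = []
    for sequence, chain_id in components:
        # A list of chain IDs means an oligomer: one copy per chain.
        copies = len(chain_id) if isinstance(chain_id, list) else 1
        parts.extend([sequence] * copies)
    return ":".join(parts)


# Toy example: a dimer (chains A and B) plus a second, single-chain protein
seed = build_msa_seed([("MKT", ["A", "B"]), ("GSH", "D")])
print(seed)  # MKT:MKT:GSH
```

For the dimer in this tutorial, the seed is simply the protein sequence written twice with a ":" between the copies.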
Predicting the Complex Structure and Affinity#
Now, we can call the fold method on the RosettaFold-3 model.
The key steps are:
1. Access the model via session.fold.rosettafold_3.
2. Pass the defined proteins and ligands.
[4]:
# Request the fold for the protein dimer and the ligand.
fold_job = session.fold.rosettafold_3.fold(
    proteins=proteins,
    ligands=ligands,
)
fold_job
[4]:
FoldJob(num_records=1, job_id='d45432c7-3820-4499-ace0-b5b5ffd0a119', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 9, 11, 6, 59, 46, 990383, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
The call returns a FoldComplexResultFuture object immediately. This is a reference to your job running on the OpenProtein platform. You can monitor its status or wait for it to complete.
[5]:
# Wait for the job to finish
fold_job.wait_until_done(verbose=True)
Waiting: 100%|█| 100/100 [03:02<00:00, 1.82s/it, status=SU
[5]:
True
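If you prefer to bound the wait yourself rather than block until completion, a generic polling loop with a timeout can be sketched as follows. This is plain Python and does not call the OpenProtein API: wait_for is a hypothetical helper, and is_done stands in for whatever status check you use on the job.

```python
import time
from typing import Callable


def wait_for(is_done: Callable[[], bool], interval: float = 1.0, timeout: float = 600.0) -> bool:
    """Poll is_done() every `interval` seconds; return False once `timeout` has passed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a job that reports completion on the third status check
checks = iter([False, False, True])
finished = wait_for(lambda: next(checks), interval=0.01, timeout=1.0)
print(finished)
```

Returning False on timeout, instead of raising, leaves the caller free to keep the job reference and check again later.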
Retrieving the Results#
Once the job is complete, you can retrieve the various outputs from the future object.
Getting the Structure File#
The primary result is the predicted structure, which you can retrieve as an mmCIF file. Note that mmCIF is the only output format implemented for RosettaFold-3.
[6]:
# Get the result as an mmCIF bytestring
result = fold_job.get()
print('\n'.join(result.decode().splitlines()[500:510])) # Print a few lines
ATOM 446 C CD . GLU A 0 59 ? -12.421381 25.947437 4.6932273 1 0.854047 ? 59 A 1
ATOM 447 O OE1 . GLU A 0 59 ? -13.197961 25.827934 5.6268296 1 0.808266 ? 59 A 1
ATOM 448 O OE2 . GLU A 0 59 ? -12.76528 25.941364 3.5005865 1 0.790524 ? 59 A 1
ATOM 449 N N . ALA A 0 60 ? -8.835037 25.153896 8.90248 1 0.939511 ? 60 A 1
ATOM 450 C CA . ALA A 0 60 ? -8.65234 25.27169 10.35034 1 0.935187 ? 60 A 1
ATOM 451 C C . ALA A 0 60 ? -8.414101 23.868555 10.900428 1 0.941439 ? 60 A 1
ATOM 452 O O . ALA A 0 60 ? -8.983845 22.844774 10.375406 1 0.936552 ? 60 A 1
ATOM 453 C CB . ALA A 0 60 ? -9.833746 25.863743 11.062814 1 0.918455 ? 60 A 1
ATOM 454 N N . PRO A 0 61 ? -7.5745974 23.69525 11.947628 1 0.93808 ? 61 A 1
ATOM 455 C CA . PRO A 0 61 ? -7.472991 22.403498 12.583999 1 0.938778 ? 61 A 1
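The atom records above can be summarized without extra dependencies. The sketch below reads the _atom_site table from mmCIF text and averages one numeric column per chain. It is a simplified reader written for illustration: it assumes one whitespace-separated row per atom with no quoted values, and it assumes the file uses the standard column names label_asym_id and B_iso_or_equiv. Reading that column as a per-atom confidence score is also an assumption, based only on the 0 to 1 values visible in the output above. The embedded example data is made up.

```python
from collections import defaultdict
from typing import Dict, List


def read_atom_site(cif_text: str) -> List[Dict[str, str]]:
    """Return one dict per row of the _atom_site loop, keyed by column name."""
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    in_table = False
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_atom_site."):
            # Header line such as "_atom_site.label_asym_id"
            columns.append(stripped.split(".", 1)[1])
            in_table = True
        elif in_table:
            fields = stripped.split()
            if len(fields) != len(columns):
                break  # first line that is not a full row ends the table
            rows.append(dict(zip(columns, fields)))
    return rows


def mean_b_iso_by_chain(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Average the B_iso_or_equiv column for each label_asym_id (chain)."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        values[row["label_asym_id"]].append(float(row["B_iso_or_equiv"]))
    return {chain: sum(v) / len(v) for chain, v in values.items()}


# Made-up miniature table, for illustration only
example = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.B_iso_or_equiv
ATOM 1 N ALA A 0.90
ATOM 2 CA ALA A 0.80
ATOM 3 N GLY B 0.50
#
"""

rows = read_atom_site(example)
per_chain = mean_b_iso_by_chain(rows)
print(len(rows), sorted(per_chain))
```

On the prediction itself, the same functions could be applied to result.decode(), provided the file follows the layout assumed here. For anything beyond a quick check, a dedicated mmCIF parser is the safer choice.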
Visualize the structure using molviewspec
[7]:
from molviewspec import create_builder
builder = create_builder()
structure = builder.download(url="mystructure.cif")\
    .parse(format="mmcif")\
    .model_structure()\
    .component()\
    .representation()\
    .color(color="blue")
builder.molstar_notebook(data={'mystructure.cif': result}, width=500, height=400)
Next Steps#
You can examine the predicted structure, or move on to binder design with RFdiffusion on our platform. You can save your predicted structure like so:
[8]:
with open("mystructure.cif", "wb") as f:
    f.write(result)