Using RFdiffusion
This tutorial shows you how to use the RFdiffusion model to design novel protein structures.
The examples here are largely lifted from the original documentation but adapted to show how it can be run using the OpenProtein platform, which can then be combined with our other workflows!
Full credit for the examples and use cases goes to the authors of RFdiffusion!
Unconditional monomer design
The most basic use of RFdiffusion is unconditional design of a protein structure of a given length. You need two things:
Length of the protein
Number of designs desired, N
[1]:
length = 150
N = 3
[2]:
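# assumes `session` is an authenticated OpenProtein session created beforehand,
# e.g. session = openprotein.connect(username=..., password=...)
rfdiffusion = session.models.rfdiffusion
# show the signature and docstring (IPython/Jupyter help syntax)
rfdiffusion.generate?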
Signature:
rfdiffusion.generate(
n: int = 1,
structure_file: str | bytes | typing.BinaryIO | None = None,
contigs: int | str | None = None,
inpaint_seq: str | None = None,
provide_seq: str | None = None,
hotspot: str | None = None,
T: int | None = None,
partial_T: int | None = None,
use_active_site_model: bool | None = None,
use_beta_model: bool | None = None,
symmetry: Optional[Literal['cyclic', 'dihedral', 'tetrahedral']] = None,
order: int | None = None,
add_potential: bool | None = None,
scaffold_target_structure_file: str | bytes | typing.BinaryIO | None = None,
scaffold_target_use_struct: bool = False,
**kwargs,
) -> openprotein.models.foundation.rfdiffusion.RFdiffusionFuture
Docstring:
Run a protein structure generate job using RFdiffusion.
Parameters
----------
n : int, optional
The number of unique design trajectories to run (default is 1).
structure_file : BinaryIO, optional
An input PDB file (as a file-like object) used for inpainting or other
guided design tasks where parts of an existing structure are provided.
contigs : int, str, optional
Defines the lengths and connectivity of chain segments for the desired
structure, specified in RFdiffusion's contig string format.
Required for most design tasks. Example: 150, '10-20/A100-110/10-20' for a
binder design.
inpaint_seq : str, optional
A string specifying the regions in the input structure to mask for
in-painting. Example: 'A1-A10/A30-40'.
provide_seq : str, optional
A string specifying which segments of the contig have a provided
sequence. Example: 'A1-A10/A30-40'.
hotspot : str, optional
A string specifying hotspot residues to constrain during design,
typically for functional sites. Example: 'A10,A12,A14'.
T : int, optional
The number of timesteps for the diffusion process.
partial_T : int, optional
The number of timesteps for partial diffusion.
use_active_site_model : bool, optional
If True, uses the active site model checkpoint, which has been finetuned to
better keep very small motifs in place in the output for motif scaffolding
(default is False).
use_beta_model : bool, optional
If True, uses the complex beta model checkpoint, which generates a
greater diversity of topologies but has not been extensively
experimentally validated (default is False).
symmetry : {"cyclic", "dihedral", "tetrahedral"}, optional
The type of symmetry to apply to the design.
order : int, optional
The order of the symmetry (e.g., 3 for C3 or D3 symmetry).
Must be provided if `symmetry` is set.
add_potential : bool, optional
A flag to toggle an additional potential to guide the design.
This defaults to true in the case of symmetric design.
scaffold_target_structure_file : str, bytes, BinaryIO, optional
A PDB file (which can be the text string or bytes or the file-like
object) containing a scaffold structure to be used as a structural
guide. It could also be used as a target when doing scaffold guided
scaffold_target_use_struct : bool, optional
Whether or not to use the provided scaffold structure as a target.
Otherwise, it is used only as a topology guide.
Other Parameters
----------------
**kwargs : dict
Additional keyword args that are passed directly to the rfdiffusion
inference script. Overwrites any preceding options.
Returns
-------
RFdiffusionFuture
A future object that can be used to retrieve the results of the design
job upon completion.
Type: method
Run the design using RFdiffusion:
[3]:
design = rfdiffusion.generate(n=N, contigs=length)  # n: number of designs, per the signature above
design
[3]:
RFdiffusionJob(job_id='169c020b-b992-4cd0-8097-26be7f0bdcb1', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 13, 54, 617888, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for the job to finish running with wait_until_done.
[4]:
design.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [06:02<00:00, 3.63s/it, status=SUCCESS]
[4]:
True
Retrieve the PDB file of a design. The replicate parameter selects the 0-indexed design to retrieve; with N = 3 designs, valid values are 0 to 2.
[5]:
result = design.get(replicate=0)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
ATOM 1 N GLY A 1 2.829 7.051 28.939 1.00 0.00
ATOM 2 CA GLY A 1 2.481 7.591 27.630 1.00 0.00
ATOM 3 C GLY A 1 3.553 7.267 26.598 1.00 0.00
ATOM 4 O GLY A 1 3.246 6.934 25.453 1.00 0.00
ATOM 5 N GLY A 2 4.718 7.479 26.963 1.00 0.00
ATOM 6 CA GLY A 2 5.797 7.175 26.031 1.00 0.00
ATOM 7 C GLY A 2 5.856 5.684 25.725 1.00 0.00
ATOM 8 O GLY A 2 6.107 5.284 24.588 1.00 0.00
ATOM 9 N GLY A 3 5.607 4.890 26.722 1.00 0.00
ATOM 10 CA GLY A 3 5.590 3.454 26.474 1.00 0.00
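To keep all N designs rather than printing one, you could loop over the replicates and write each to a file. This is a minimal sketch: save_replicates is a hypothetical helper, not part of the OpenProtein client, and it only assumes a getter that returns the PDB text for a replicate index, as design.get(replicate=i) does above.

```python
from pathlib import Path
from typing import Callable, List, Union


def save_replicates(
    get_replicate: Callable[[int], Union[str, bytes]],
    n: int,
    out_dir: str,
    prefix: str = "design",
) -> List[Path]:
    """Write replicates 0..n-1 to <out_dir>/<prefix>_<i>.pdb (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n):
        pdb_text = get_replicate(i)
        if isinstance(pdb_text, bytes):  # tolerate bytes as well as str
            pdb_text = pdb_text.decode()
        path = out / f"{prefix}_{i}.pdb"
        path.write_text(pdb_text)
        paths.append(path)
    return paths


# Usage with the job above (requires the OpenProtein session):
# paths = save_replicates(lambda i: design.get(replicate=i), N, "unconditional_designs")
```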
Motif Scaffolding
RFdiffusion can be used to scaffold motifs. To do this, we need a few things:
an input protein structure (a .pdb file) containing the motif
how the motif segments are connected in the new protein, and by how many residues
a range of lengths to sample for the new protein, similar to the above
First, let’s get our PDB file from the RCSB Protein Data Bank. We will use 5TPN, the crystal structure of RSV F in complex with the human antibody hRSV90.
[6]:
import gzip
import io

import requests

def get_pdb(code: str) -> str:
    """Download biological assembly 1 of a PDB entry from RCSB and return it as text."""
    url = f"https://files.rcsb.org/download/{code}.pdb1.gz"
    r = requests.get(url, timeout=60)
    r.raise_for_status()
    # the file is served gzip-compressed, so decompress it here
    with gzip.open(io.BytesIO(r.content), "rt", encoding="utf-8") as f:
        return f.read()

pdb = get_pdb("5TPN")
print("\n".join(pdb.splitlines()[210:220]))
ATOM 9 N ILE A 28 30.006 -91.995 -29.741 1.00 73.73 N
ATOM 10 CA ILE A 28 28.960 -92.036 -28.738 1.00 66.97 C
ATOM 11 C ILE A 28 29.318 -93.052 -27.671 1.00 72.27 C
ATOM 12 O ILE A 28 29.765 -94.165 -27.983 1.00 73.82 O
ATOM 13 CB ILE A 28 27.604 -92.343 -29.396 1.00 70.42 C
ATOM 14 CG1 ILE A 28 27.236 -91.185 -30.311 1.00 72.90 C
ATOM 15 CG2 ILE A 28 26.519 -92.523 -28.364 1.00 61.95 C
ATOM 16 CD1 ILE A 28 27.324 -89.849 -29.613 1.00 71.56 C
ATOM 17 N THR A 29 29.133 -92.652 -26.403 1.00 74.86 N
ATOM 18 CA THR A 29 29.375 -93.492 -25.230 1.00 67.44 C
Next, we specify the motif we are interested in, together with the length of the output protein, using RFdiffusion’s contig syntax. Here we want to keep residues 163-181 (inclusive) of chain A in the input PDB, and randomly sample 10-40 residues at each of the N- and C-termini. In contig syntax this is:
[7]:
contigs = "10-40/A163-181/10-40"
Refer to the RFdiffusion documentation for more examples and explanation of the contigs syntax.
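To make the syntax concrete, the sketch below computes the range of total lengths a contig string can produce. It covers only the forms used in this tutorial: a free segment is a length or length range to sample (10-40), a segment prefixed with a chain letter is a fixed range of residues from the input (A163-181), and /0 marks a chain break. contig_length_range is a hypothetical helper, not part of RFdiffusion or the OpenProtein client.

```python
import re
from typing import Tuple, Union


def contig_length_range(contig: Union[int, str]) -> Tuple[int, int]:
    """Return the (min, max) total design length described by a simple contig string."""
    lo = hi = 0
    for token in re.split(r"[/\s]+", str(contig).strip()):
        if token in ("", "0"):
            continue  # "/0" is a chain break and adds no residues
        m = re.fullmatch(r"([A-Za-z]?)(\d+)(?:-(\d+))?", token)
        if m is None:
            raise ValueError(f"unsupported contig segment: {token!r}")
        chain, start = m.group(1), int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start
        if chain:
            fixed = end - start + 1  # residues copied from the input: fixed count
            lo += fixed
            hi += fixed
        else:
            lo += start  # free segment: length sampled between start and end
            hi += end
    return lo, hi


print(contig_length_range("10-40/A163-181/10-40"))  # (39, 99)
```

The motif contributes 19 residues, so designs from this contig are between 39 and 99 residues long.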
Now let’s run our design:
[8]:
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
n=1,
)
design
[8]:
RFdiffusionJob(job_id='ae149a8b-07b8-43ce-abb3-3a93b99fa2df', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 20, 4, 362109, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
design.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00, 2.58it/s, status=SUCCESS]
[9]:
True
[10]:
result = design.get()
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
ATOM 1 N GLY A 1 -20.735 6.261 4.968 1.00 0.00
ATOM 2 CA GLY A 1 -19.299 6.011 5.004 1.00 0.00
ATOM 3 C GLY A 1 -18.972 4.606 4.516 1.00 0.00
ATOM 4 O GLY A 1 -19.699 3.654 4.801 1.00 0.00
ATOM 5 N GLY A 2 -17.999 4.543 3.767 1.00 0.00
ATOM 6 CA GLY A 2 -17.574 3.224 3.314 1.00 0.00
ATOM 7 C GLY A 2 -16.419 2.698 4.157 1.00 0.00
ATOM 8 O GLY A 2 -15.314 3.239 4.121 1.00 0.00
ATOM 9 N GLY A 3 -16.696 1.761 4.909 1.00 0.00
ATOM 10 CA GLY A 3 -15.676 1.171 5.768 1.00 0.00
Small motifs using active site model
RFdiffusion tends not to keep very small motifs perfectly fixed in the output. For very small functional motifs, the RFdiffusion authors recommend the active site model, a checkpoint fine-tuned for this task. Enable it with use_active_site_model:
[11]:
contigs = "10-40/A170-173/10-40"
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
n=1,
use_active_site_model=True
)
design
[11]:
RFdiffusionJob(job_id='b18fe59b-b7cf-4ff8-9f96-1f2432ec1b4d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 20, 45, 39364, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[12]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00, 2.57it/s, status=SUCCESS]
ATOM 1 N GLY A 1 -24.784 9.183 17.739 1.00 0.00
ATOM 2 CA GLY A 1 -25.113 7.763 17.749 1.00 0.00
ATOM 3 C GLY A 1 -23.861 6.909 17.900 1.00 0.00
ATOM 4 O GLY A 1 -22.853 7.144 17.232 1.00 0.00
ATOM 5 N GLY A 2 -23.743 6.016 18.764 1.00 0.00
ATOM 6 CA GLY A 2 -22.548 5.192 18.901 1.00 0.00
ATOM 7 C GLY A 2 -22.379 4.264 17.705 1.00 0.00
ATOM 8 O GLY A 2 -21.262 4.024 17.247 1.00 0.00
ATOM 9 N GLY A 3 -23.442 3.869 17.061 1.00 0.00
ATOM 10 CA GLY A 3 -23.352 3.029 15.872 1.00 0.00
Inpainting
inpaint_seq lets you mask the amino acid identities of specific residues in your input structure. RFdiffusion then reasons implicitly over what these residues could be while it designs the structure. It does not output a sequence for them (designed positions appear as glycine, as in the outputs above), so the sequence still has to be designed afterwards.
For example, if you’re fusing two proteins, residues that were originally on the surface (often polar) might end up buried in the core. Instead of manually mutating them to hydrophobic residues, you can mask them with inpaint_seq and let RFdiffusion build a structure that accounts for their unknown identities.
[13]:
inpaint_seq = "A163-168/A170-171/A179"
This means we are masking the residue identities of A163 to A168 (inclusive), and residues A170, A171 and A179.
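The selection syntax can be spelled out explicitly with a small parser. This is a sketch that only handles the chain-letter forms used in this tutorial and in the docstring (A163-168, A1-A10, A179); expand_selection is a hypothetical helper for checking a selection before you submit it.

```python
import re
from typing import List, Tuple


def expand_selection(selection: str) -> List[Tuple[str, int]]:
    """Expand e.g. "A163-168/A170-171/A179" into (chain, residue number) pairs."""
    residues = []
    for part in selection.split("/"):
        m = re.fullmatch(r"([A-Za-z])(\d+)(?:-[A-Za-z]?(\d+))?", part.strip())
        if m is None:
            raise ValueError(f"unsupported selection: {part!r}")
        chain, start = m.group(1), int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start  # single residue if no range
        residues.extend((chain, i) for i in range(start, end + 1))
    return residues


masked = expand_selection("A163-168/A170-171/A179")
print(len(masked))  # 9
```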
[14]:
# reset the contigs from the very small motif for inpaint example
contigs = "10-40/A163-181/10-40"
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
inpaint_seq=inpaint_seq,
n=1,
)
design
[14]:
RFdiffusionJob(job_id='be51eb2f-e86d-4e68-939f-3351ecdc6444', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 21, 26, 735906, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[15]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00, 2.58it/s, status=SUCCESS]
ATOM 1 N GLY A 1 -20.970 12.196 12.412 1.00 0.00
ATOM 2 CA GLY A 1 -21.400 10.976 13.086 1.00 0.00
ATOM 3 C GLY A 1 -20.229 10.027 13.306 1.00 0.00
ATOM 4 O GLY A 1 -19.292 9.986 12.510 1.00 0.00
ATOM 5 N GLY A 2 -20.221 9.300 14.400 1.00 0.00
ATOM 6 CA GLY A 2 -19.185 8.320 14.703 1.00 0.00
ATOM 7 C GLY A 2 -19.053 7.294 13.585 1.00 0.00
ATOM 8 O GLY A 2 -17.949 6.860 13.255 1.00 0.00
ATOM 9 N GLY A 3 -20.200 6.833 13.049 1.00 0.00
ATOM 10 CA GLY A 3 -20.176 5.891 11.936 1.00 0.00
Partial diffusion
Partial diffusion generates diversity around an existing fold: the input structure is noised to the timestep given by partial_T and then denoised. A higher partial_T adds more noise and therefore gives more diverse designs. Try several values for your design problem; the RFdiffusion authors typically used 20.
Because partial diffusion starts from a known structure, the contig string must yield exactly the same length as the input protein.
Let’s use 2KL8:
[16]:
pdb = get_pdb("2KL8")
print("\n".join(pdb.splitlines()[100:110]))
ATOM 63 HB3 ASP A 4 -2.932 0.810 7.543 1.00 38.00 H
ATOM 64 N ILE A 5 -2.097 0.988 4.087 1.00 23.43 N
ATOM 65 CA ILE A 5 -2.325 -0.005 3.033 1.00 2.14 C
ATOM 66 C ILE A 5 -1.796 -1.382 3.465 1.00 34.22 C
ATOM 67 O ILE A 5 -0.589 -1.598 3.543 1.00 12.12 O
ATOM 68 CB ILE A 5 -1.641 0.412 1.707 1.00 35.23 C
ATOM 69 CG1 ILE A 5 -2.050 1.844 1.323 1.00 35.34 C
ATOM 70 CG2 ILE A 5 -1.998 -0.571 0.590 1.00 71.23 C
ATOM 71 CD1 ILE A 5 -1.388 2.353 0.059 1.00 22.53 C
ATOM 72 H ILE A 5 -1.215 1.402 4.168 1.00 38.00 H
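Since the contig length must match the input exactly, it helps to count the residues in the input before choosing length. A minimal sketch, assuming standard fixed-column PDB ATOM records; count_residues is a hypothetical helper, and it stops at the first ENDMDL so that multi-model files (common for NMR entries) count only one model.

```python
from typing import Dict, Set, Tuple


def count_residues(pdb_text: str) -> Dict[str, int]:
    """Count residues per chain in the first model of a PDB-format string."""
    seen: Dict[str, Set[Tuple[str, str]]] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only count the first model of a multi-model file
        if line.startswith("ATOM"):
            chain = line[21:22]
            # residue number (columns 23-26) plus insertion code (column 27)
            res_id = (line[22:26].strip(), line[26:27])
            seen.setdefault(chain, set()).add(res_id)
    return {chain: len(ids) for chain, ids in seen.items()}


# e.g. compare with the contig length before submitting:
# print(count_residues(pdb), length)
```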
And run partial diffusion for 10 timesteps:
[17]:
length = 85
design = rfdiffusion.generate(
structure_file=pdb,
contigs=length,
partial_T=10,
n=1,
)
design
[17]:
RFdiffusionJob(job_id='2bb73c76-f0df-4d38-bbd6-fbbbe612d666', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 9, 356090, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[18]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:16<00:00, 5.92it/s, status=SUCCESS]
ATOM 1 N GLY A 1 -1.921 11.519 5.041 1.00 0.00
ATOM 2 CA GLY A 1 -2.816 10.371 4.958 1.00 0.00
ATOM 3 C GLY A 1 -2.049 9.063 5.094 1.00 0.00
ATOM 4 O GLY A 1 -0.922 8.940 4.615 1.00 0.00
ATOM 5 N GLY A 2 -2.809 8.099 5.541 1.00 0.00
ATOM 6 CA GLY A 2 -2.225 6.768 5.656 1.00 0.00
ATOM 7 C GLY A 2 -3.078 5.728 4.941 1.00 0.00
ATOM 8 O GLY A 2 -4.305 5.745 5.039 1.00 0.00
ATOM 9 N GLY A 3 -2.394 4.880 4.237 1.00 0.00
ATOM 10 CA GLY A 3 -3.064 3.781 3.551 1.00 0.00
Keeping parts of the sequence fixed
You can also keep parts of the sequence of the partially diffused chains fixed using provide_seq. One use case is helical peptide binding: if you’ve threaded a helical peptide sequence onto an ideal helix and now want to diversify the complex, letting the helix deviate from ideal geometry while keeping its sequence, you might do something like:
[19]:
provide_seq = "172-205"
# Provide multiple sequences using a comma delimiter
# provide_seq = "172-177,200-205"
This keeps the sequence at positions 172 to 205 fixed, which here is the 34-residue peptide.
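The range 172-205 covers exactly the 34-residue peptide if positions are counted from 0 across the whole design, with the 172-residue chain placed first as in the contig used below. Under that assumption (0-indexed positions, chains laid out in contig order), a hypothetical helper can derive the range from the chain lengths instead of hard-coding it:

```python
from typing import List


def chain_position_range(chain_lengths: List[int], chain_index: int) -> str:
    """Return the 0-indexed "start-end" positions of one chain (hypothetical helper).

    Assumes chains are numbered consecutively in contig order, starting at 0.
    """
    start = sum(chain_lengths[:chain_index])
    end = start + chain_lengths[chain_index] - 1  # inclusive end
    return f"{start}-{end}"


print(chain_position_range([172, 34], 1))  # 172-205
```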
Let’s use the helical peptide example from the RFdiffusion repo:
[20]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/peptide_complex_ideal_helix.pdb").text
print("\n".join(pdb.splitlines()[10:20]))
ATOM 2 CA GLY A 1 16.504 -16.986 9.191 1.00 0.00 C
ATOM 3 C GLY A 1 17.026 -15.603 9.556 1.00 0.00 C
ATOM 4 O GLY A 1 17.709 -14.957 8.761 1.00 0.00 O
ATOM 5 1H GLY A 1 17.057 -18.924 9.375 1.00 0.00 H
ATOM 6 2H GLY A 1 18.312 -17.897 9.186 1.00 0.00 H
ATOM 7 3H GLY A 1 17.535 -17.982 10.620 1.00 0.00 H
ATOM 8 1HA GLY A 1 16.363 -17.053 8.112 1.00 0.00 H
ATOM 9 2HA GLY A 1 15.530 -17.144 9.652 1.00 0.00 H
ATOM 10 N MET A 2 16.700 -15.152 10.764 1.00 0.00 N
ATOM 11 CA MET A 2 16.989 -13.784 11.176 1.00 0.00 C
Let’s run this design. Note the /0 in contigs, which marks a chain break: the contig describes a 172-residue chain followed by a separate 34-residue peptide chain.
[21]:
contigs = "172-172/0 34-34"
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
provide_seq=provide_seq,
partial_T=10,
n=1,
)
design
[21]:
RFdiffusionJob(job_id='a52df40b-8438-402d-9ab9-f4d97d7064eb', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 28, 780009, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[22]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00, 4.48it/s, status=SUCCESS]
ATOM 1 N GLY A 1 2.783 -22.631 -8.750 1.00 0.00
ATOM 2 CA GLY A 1 2.306 -21.472 -9.494 1.00 0.00
ATOM 3 C GLY A 1 2.318 -20.218 -8.628 1.00 0.00
ATOM 4 O GLY A 1 2.653 -19.131 -9.098 1.00 0.00
ATOM 5 N GLY A 2 2.005 -20.393 -7.442 1.00 0.00
ATOM 6 CA GLY A 2 1.981 -19.249 -6.538 1.00 0.00
ATOM 7 C GLY A 2 3.386 -18.715 -6.288 1.00 0.00
ATOM 8 O GLY A 2 3.595 -17.504 -6.210 1.00 0.00
ATOM 9 N GLY A 3 4.315 -19.601 -6.177 1.00 0.00
ATOM 10 CA GLY A 3 5.687 -19.158 -5.960 1.00 0.00
Binder design
We can use RFdiffusion for binder design with the same contig syntax. The /0 chain break in contigs separates the target from the binder: below, residues 1-150 of target chain A, then a new binder of 70-100 residues. We also specify hotspot residues, which tell the model where the binder should contact the target.
[23]:
contigs = "A1-150/0 70-100"
hotspot = "A59,A83,A91"
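Hotspots should point at target residues that are actually kept in the design. The sketch below checks that each hotspot falls inside a fixed, chain-lettered contig segment; hotspots_outside_target is a hypothetical helper that handles only the simple forms used here.

```python
import re
from typing import List


def hotspots_outside_target(hotspot: str, contig: str) -> List[str]:
    """Return hotspot residues (e.g. "A59") not covered by a fixed contig segment."""
    fixed = []
    for token in re.split(r"[/\s]+", contig.strip()):
        m = re.fullmatch(r"([A-Za-z])(\d+)(?:-(\d+))?", token)
        if m:  # chain-lettered segment kept from the input structure
            start = int(m.group(2))
            end = int(m.group(3)) if m.group(3) else start
            fixed.append((m.group(1), start, end))
    outside = []
    for res in hotspot.split(","):
        res = res.strip()
        m = re.fullmatch(r"([A-Za-z])(\d+)", res)
        if m is None:
            raise ValueError(f"unsupported hotspot residue: {res!r}")
        chain, num = m.group(1), int(m.group(2))
        if not any(c == chain and lo <= num <= hi for c, lo, hi in fixed):
            outside.append(res)
    return outside


print(hotspots_outside_target("A59,A83,A91", "A1-150/0 70-100"))  # []
```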
We also pass configuration beyond the basic design parameters listed by rfdiffusion.generate?. The generate method forwards any additional keyword arguments directly to the RFdiffusion inference script, so the full set of RFdiffusion options is available. These are advanced settings; use them only if you are familiar with RFdiffusion. Here we reduce the noise added during inference to 0, which improves design quality as recommended in the RFdiffusion examples. Because RFdiffusion uses Hydra’s dotted configuration keys, which are not valid Python keyword names, we hold the additional arguments in a dictionary and unpack it with **:
[24]:
kwargs = {
"denoiser.noise_scale_ca": 0,
"denoiser.noise_scale_frame": 0
}
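RFdiffusion’s own documentation passes these options on the command line as Hydra overrides (for example denoiser.noise_scale_ca=0). For comparison, the sketch below renders a kwargs dictionary in that form. It is only an illustration: to_hydra_overrides is hypothetical, it does no shell quoting, and it is not a description of how the OpenProtein client forwards the arguments.

```python
from typing import Any, Dict, List


def to_hydra_overrides(options: Dict[str, Any]) -> List[str]:
    """Render {"group.key": value} pairs as "group.key=value" strings."""
    overrides = []
    for key, value in options.items():
        if isinstance(value, bool):  # check bool first, since bool is a subclass of int
            value = "true" if value else "false"
        overrides.append(f"{key}={value}")
    return overrides


print(to_hydra_overrides({"denoiser.noise_scale_ca": 0, "denoiser.noise_scale_frame": 0}))
# ['denoiser.noise_scale_ca=0', 'denoiser.noise_scale_frame=0']
```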
Let’s use the insulin target example from the official repo:
[25]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/insulin_target.pdb").text
print("\n".join(pdb.splitlines()[11:21]))
ATOM 1 N GLU A 1 47.177 109.949 22.284 1.00 0.00 N
ATOM 2 CA GLU A 1 46.223 108.850 22.182 1.00 0.00 C
ATOM 3 C GLU A 1 44.813 109.180 22.682 1.00 0.00 C
ATOM 4 O GLU A 1 44.597 110.149 23.432 1.00 0.00 O
ATOM 5 CB GLU A 1 46.752 107.602 22.893 1.00 0.00 C
ATOM 6 CG GLU A 1 47.989 107.042 22.245 1.00 0.00 C
ATOM 7 CD GLU A 1 47.721 106.544 20.773 1.00 0.00 C
ATOM 8 OE1 GLU A 1 46.771 105.825 20.555 1.00 0.00 O
ATOM 9 OE2 GLU A 1 48.479 106.896 19.900 1.00 0.00 O
ATOM 10 1H GLU A 1 48.067 109.652 21.939 1.00 0.00 H
Let’s run the binder design:
[26]:
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
hotspot=hotspot,
n=1,
**kwargs
)
design
[26]:
RFdiffusionJob(job_id='dfd45782-3ba3-4b1b-ba1d-9a1697a4b77f', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 53, 716417, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[27]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:17<00:00, 1.29it/s, status=SUCCESS]
ATOM 1 N GLY A 1 -26.746 -16.007 -3.126 1.00 0.00
ATOM 2 CA GLY A 1 -27.132 -15.454 -1.834 1.00 0.00
ATOM 3 C GLY A 1 -26.688 -16.360 -0.692 1.00 0.00
ATOM 4 O GLY A 1 -26.297 -15.885 0.374 1.00 0.00
ATOM 5 N GLY A 2 -26.679 -17.711 -0.968 1.00 0.00
ATOM 6 CA GLY A 2 -26.180 -18.645 0.033 1.00 0.00
ATOM 7 C GLY A 2 -24.691 -18.440 0.284 1.00 0.00
ATOM 8 O GLY A 2 -24.232 -18.489 1.425 1.00 0.00
ATOM 9 N GLY A 3 -23.925 -18.225 -0.793 1.00 0.00
ATOM 10 CA GLY A 3 -22.495 -17.973 -0.664 1.00 0.00
Higher diversity topologies using complex beta model
The default model often generates helical binders, which have higher computational and experimental success rates. For more diverse topologies, RFdiffusion also provides a complex beta model checkpoint. It generates a greater diversity of topologies but has not been extensively validated experimentally, so use it at your own risk:
[28]:
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
hotspot=hotspot,
n=1,
use_beta_model=True,
**kwargs
)
design
[28]:
RFdiffusionJob(job_id='5eba9326-21aa-4ebf-b098-f43b578e680d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 24, 13, 629710, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[29]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:23<00:00, 1.20it/s, status=SUCCESS]
ATOM 1 N GLY A 1 5.993 -12.514 -26.071 1.00 0.00
ATOM 2 CA GLY A 1 5.682 -11.358 -25.238 1.00 0.00
ATOM 3 C GLY A 1 4.622 -11.698 -24.198 1.00 0.00
ATOM 4 O GLY A 1 3.746 -12.529 -24.439 1.00 0.00
ATOM 5 N GLY A 2 4.485 -10.854 -23.249 1.00 0.00
ATOM 6 CA GLY A 2 3.403 -10.944 -22.276 1.00 0.00
ATOM 7 C GLY A 2 2.369 -9.848 -22.499 1.00 0.00
ATOM 8 O GLY A 2 2.713 -8.715 -22.838 1.00 0.00
ATOM 9 N GLY A 3 1.145 -10.131 -22.209 1.00 0.00
ATOM 10 CA GLY A 3 0.061 -9.183 -22.438 1.00 0.00
Refer to the official RFdiffusion documentation’s section on Practical Considerations for Binder Design for some tips on the topic.
Fold conditioning
We can also condition designs, including binder designs, on particular topologies by providing (partial) secondary structure and block adjacency information. For example, we can design a TIM barrel without requiring exact coordinates for its residues. We provide this information by passing the structure to condition on as scaffold_target_structure_file. Let’s get the TIM barrel:
[30]:
pdb = get_pdb("6WVS")
print("\n".join(pdb.splitlines()[100:110]))
ATOM 44 HG LEU A 4 -3.643 12.831 7.422 1.00 63.20 H
ATOM 45 HD11 LEU A 4 -5.578 13.378 8.608 1.00 72.64 H
ATOM 46 HD12 LEU A 4 -4.379 14.331 9.029 1.00 72.64 H
ATOM 47 HD13 LEU A 4 -5.503 14.859 8.038 1.00 72.64 H
ATOM 48 HD21 LEU A 4 -5.678 12.418 6.361 1.00 68.59 H
ATOM 49 HD22 LEU A 4 -5.596 13.845 5.668 1.00 68.59 H
ATOM 50 HD23 LEU A 4 -4.535 12.721 5.300 1.00 68.59 H
ATOM 51 N ILE A 5 -0.925 15.785 4.277 1.00 40.62 N
ATOM 52 CA ILE A 5 0.164 16.690 3.928 1.00 40.18 C
ATOM 53 C ILE A 5 -0.408 18.094 3.803 1.00 46.92 C
Passing scaffold_target_structure_file makes the platform derive the secondary structure and block adjacency information from that structure and then run scaffold-guided inference, which performs the fold conditioning.
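To make "secondary structure and block adjacency" concrete: the per-residue secondary structure is split into contiguous blocks (helix, strand, loop), and block adjacency records which blocks pack against each other. RFdiffusion computes these inputs with its own tooling; the sketch below is a simplified, hypothetical illustration that assumes a precomputed secondary structure string and CA coordinates, and an arbitrary 8 Å CA-CA contact cutoff.

```python
import math
from itertools import groupby
from typing import List, Sequence, Tuple

Block = Tuple[str, int, int]  # (type, start, end_exclusive)


def ss_blocks(ss: str) -> List[Block]:
    """Split a secondary structure string (H=helix, E=strand, L=loop) into blocks."""
    blocks, pos = [], 0
    for kind, run in groupby(ss):
        length = len(list(run))
        blocks.append((kind, pos, pos + length))
        pos += length
    return blocks


def block_adjacency(
    ss: str, ca: Sequence[Tuple[float, float, float]], cutoff: float = 8.0
) -> Tuple[List[Block], List[List[int]]]:
    """Mark two non-loop blocks adjacent if any CA pair is within `cutoff` angstroms.

    Simplified illustration only; the cutoff and the contact rule are assumptions.
    """
    blocks = ss_blocks(ss)
    adj = [[0] * len(blocks) for _ in blocks]
    for i, (ki, si, ei) in enumerate(blocks):
        for j in range(i + 1, len(blocks)):
            kj, sj, ej = blocks[j]
            if "L" in (ki, kj):
                continue  # loops are ignored in this sketch
            if any(math.dist(ca[a], ca[b]) <= cutoff
                   for a in range(si, ei) for b in range(sj, ej)):
                adj[i][j] = adj[j][i] = 1
    return blocks, adj
```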
[31]:
# Additional kwargs provided from example
# Reduce noise to 0.5 for better results
# Sample additional length to increase diversity of the outputs
# Specifically, we mask the loops and insert 0-5 residues (randomly sampled per-loop) into each loop
# Add 0-5 residues (randomly sampled) to the N and the C-terminus
kwargs = {
"denoiser.noise_scale_ca": 0.5,
"denoiser.noise_scale_frame": 0.5,
"scaffoldguided.mask_loops": True,
"scaffoldguided.sampled_insertion": "0-5",
"scaffoldguided.sampled_N": "0-5",
"scaffoldguided.sampled_C": "0-5",
}
design = rfdiffusion.generate(
scaffold_target_structure_file=pdb,
n=1,
**kwargs
)
design
[31]:
RFdiffusionJob(job_id='6845b62e-67d6-46b3-939c-9a9c665472e0', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 25, 41, 498536, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for and retrieve the design:
[32]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:17<00:00, 1.29it/s, status=SUCCESS]
ATOM 1 N GLY A 1 22.987 -8.520 6.056 1.00 0.00
ATOM 2 CA GLY A 1 21.963 -7.664 5.468 1.00 0.00
ATOM 3 C GLY A 1 21.814 -6.366 6.251 1.00 0.00
ATOM 4 O GLY A 1 21.695 -6.379 7.476 1.00 0.00
ATOM 5 N GLY A 2 22.074 -5.335 5.647 1.00 0.00
ATOM 6 CA GLY A 2 21.872 -4.012 6.227 1.00 0.00
ATOM 7 C GLY A 2 20.426 -3.560 6.076 1.00 0.00
ATOM 8 O GLY A 2 19.787 -3.818 5.056 1.00 0.00
ATOM 9 N GLY A 3 20.038 -2.546 6.827 1.00 0.00
ATOM 10 CA GLY A 3 18.659 -2.076 6.757 1.00 0.00
Binder design to flexible peptides
RFdiffusion can design binders to flexible peptides, where the 3D coordinates of the peptide are not specified but its secondary structure can be. This lets you design binders to a peptide in, for example, either a helical or a beta state. The principle is that we provide an input PDB structure of the peptide but mask its 3D structure (inpaint_str). Here, we make 70-100 amino acid binders to the tau peptide (PDB indices B165-178) and mask the structure of this peptide with contigmap.inpaint_str. We then specify that it should adopt a helical secondary structure:
[33]:
# Get tau peptide from RFdiffusion repo
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/tau_peptide.pdb").text
contigs = "70-100/0 B165-178"
kwargs = {
"scaffoldguided.scaffoldguided": True,
"contigmap.inpaint_str": "[B165-178]",
"contigmap.inpaint_str_helix": "[B165-178]",
}
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
n=1,
**kwargs,
)
design
[33]:
RFdiffusionJob(job_id='e3f0a2fa-f5fc-4160-93ad-5ac2dff71de6', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 29, 0, 797164, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[34]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00, 2.58it/s, status=SUCCESS]
ATOM 1 N GLY A 1 32.330 1.561 6.905 1.00 0.00
ATOM 2 CA GLY A 1 31.012 1.668 7.521 1.00 0.00
ATOM 3 C GLY A 1 30.015 2.326 6.576 1.00 0.00
ATOM 4 O GLY A 1 28.866 1.894 6.473 1.00 0.00
ATOM 5 N GLY A 2 30.433 3.363 5.969 1.00 0.00
ATOM 6 CA GLY A 2 29.550 4.053 5.036 1.00 0.00
ATOM 7 C GLY A 2 29.246 3.186 3.821 1.00 0.00
ATOM 8 O GLY A 2 28.109 3.135 3.352 1.00 0.00
ATOM 9 N GLY A 3 30.238 2.527 3.315 1.00 0.00
ATOM 10 CA GLY A 3 30.026 1.642 2.176 1.00 0.00
Alternatively, you can specify a beta strand secondary structure with contigmap.inpaint_str_strand.
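Because the helix and strand variants differ in only one key, a small hypothetical helper can build the kwargs for either case. It uses exactly the option names shown in this section and nothing else.

```python
from typing import Dict, Union


def flexible_peptide_kwargs(
    selection: str, secondary_structure: str = "helix"
) -> Dict[str, Union[bool, str]]:
    """Build kwargs that mask a peptide's structure and request a secondary structure."""
    if secondary_structure not in ("helix", "strand"):
        raise ValueError("secondary_structure must be 'helix' or 'strand'")
    bracketed = f"[{selection}]"
    return {
        "scaffoldguided.scaffoldguided": True,
        "contigmap.inpaint_str": bracketed,
        f"contigmap.inpaint_str_{secondary_structure}": bracketed,
    }


# e.g. a strand-state tau binder (requires the OpenProtein session):
# design = rfdiffusion.generate(structure_file=pdb, contigs="70-100/0 B165-178", n=1,
#                               **flexible_peptide_kwargs("B165-178", "strand"))
```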
Generating symmetric oligomers
We can use RFdiffusion to generate structures with different symmetries. Set symmetry to one of cyclic, dihedral or tetrahedral, and for cyclic or dihedral symmetry also set order (for example, 6 for C6 or 2 for D2). RFdiffusion also provides auxiliary potentials to guide inference, which appear to help with motif scaffolding and symmetric oligomer generation. The platform enables these potentials by default for symmetric oligomer generation, using the default parameters from the RFdiffusion documentation and examples:
[35]:
# these are the default potentials options already added whenever you do any symmetric oligomer generation.
kwargs = {
"potentials.guiding_potentials": "[\"type:olig_contacts,weight_intra:1,weight_inter:0.1\"]",
"potentials.olig_intra_all": True,
"potentials.olig_inter_all": True,
"potentials.guide_scale": 2.0,
"potentials.guide_decay": "quadratic"
}
Pass add_potential=False explicitly to turn them off, and specify your own potentials if desired.
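The potentials.guiding_potentials value is a Hydra list of strings, each a comma-separated series of key:value pairs. If you turn the defaults off and supply your own, a hypothetical helper like this can build that string; it reproduces the default olig_contacts setting shown above. Which other potential types and parameters RFdiffusion accepts is documented in the RFdiffusion repository, not here.

```python
import json
from typing import Any, Dict


def guiding_potentials(*specs: Dict[str, Any]) -> str:
    """Format potential specs as a Hydra list string, e.g. '["type:...,key:value"]'."""
    entries = [",".join(f"{key}:{value}" for key, value in spec.items()) for spec in specs]
    return json.dumps(entries, separators=(",", ":"))


default = guiding_potentials({"type": "olig_contacts", "weight_intra": 1, "weight_inter": 0.1})
print(default)  # ["type:olig_contacts,weight_intra:1,weight_inter:0.1"]

# Usage sketch (requires the OpenProtein session): disable the built-in defaults and
# pass your own potential settings through kwargs instead.
# custom = {"potentials.guiding_potentials": default, "potentials.guide_scale": 2.0}
# design = rfdiffusion.generate(symmetry="cyclic", order=6, contigs=480,
#                               add_potential=False, **custom)
```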
Cyclic
[36]:
design = rfdiffusion.generate(
symmetry="cyclic",
order=6,
contigs=480,
)
design
[36]:
RFdiffusionJob(job_id='b83a1fc7-dff6-4e98-a72f-b3d9fbb5fa7d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 29, 41, 715834, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[37]:
result = design.wait(verbose=True)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [04:13<00:00, 2.53s/it, status=SUCCESS]
ATOM 1 N GLY A 1 15.816 5.285 22.037 1.00 0.00
ATOM 2 CA GLY A 1 15.583 4.007 21.376 1.00 0.00
ATOM 3 C GLY A 1 15.737 4.131 19.866 1.00 0.00
ATOM 4 O GLY A 1 15.128 4.998 19.239 1.00 0.00
ATOM 5 N GLY A 2 16.670 3.571 19.434 1.00 0.00
ATOM 6 CA GLY A 2 16.844 3.550 17.987 1.00 0.00
ATOM 7 C GLY A 2 16.564 2.165 17.418 1.00 0.00
ATOM 8 O GLY A 2 17.075 1.163 17.919 1.00 0.00
ATOM 9 N GLY A 3 15.806 2.161 16.413 1.00 0.00
ATOM 10 CA GLY A 3 15.417 0.911 15.772 1.00 0.00
Dihedral
[38]:
design = rfdiffusion.generate(
symmetry="dihedral",
order=2,
contigs=320,
)
design
[38]:
RFdiffusionJob(job_id='7b08c403-9be3-478e-a55d-4051ea63defb', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 33, 57, 215048, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[39]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:07<00:00, 1.27s/it, status=SUCCESS]
ATOM 1 N GLY A 1 32.710 -4.716 -9.170 1.00 0.00
ATOM 2 CA GLY A 1 31.865 -4.144 -8.129 1.00 0.00
ATOM 3 C GLY A 1 31.160 -2.887 -8.621 1.00 0.00
ATOM 4 O GLY A 1 30.004 -2.637 -8.276 1.00 0.00
ATOM 5 N GLY A 2 31.858 -2.112 -9.270 1.00 0.00
ATOM 6 CA GLY A 2 31.238 -0.910 -9.815 1.00 0.00
ATOM 7 C GLY A 2 30.088 -1.259 -10.751 1.00 0.00
ATOM 8 O GLY A 2 29.053 -0.592 -10.754 1.00 0.00
ATOM 9 N GLY A 3 30.256 -2.281 -11.493 1.00 0.00
ATOM 10 CA GLY A 3 29.168 -2.701 -12.368 1.00 0.00
Tetrahedral
[40]:
# order is ignored for tetrahedral
design = rfdiffusion.generate(
symmetry="tetrahedral",
contigs=1200,
)
design
[40]:
RFdiffusionJob(job_id='2ef4b3ce-582a-49dc-b217-1fd0d9a3445e', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 36, 6, 673269, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
This one takes a little longer due to the longer length.
[41]:
result = design.wait(verbose=True, timeout=1500) # takes longer for the longer sequence
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 510.40it/s, status=SUCCESS]
ATOM 1 N GLY A 1 -25.230 -22.272 10.678 1.00 0.00
ATOM 2 CA GLY A 1 -25.466 -20.849 10.466 1.00 0.00
ATOM 3 C GLY A 1 -26.015 -20.188 11.724 1.00 0.00
ATOM 4 O GLY A 1 -25.740 -19.019 11.993 1.00 0.00
ATOM 5 N GLY A 2 -26.675 -20.934 12.454 1.00 0.00
ATOM 6 CA GLY A 2 -27.204 -20.346 13.679 1.00 0.00
ATOM 7 C GLY A 2 -26.081 -19.891 14.602 1.00 0.00
ATOM 8 O GLY A 2 -26.180 -18.848 15.248 1.00 0.00
ATOM 9 N GLY A 3 -25.012 -20.556 14.552 1.00 0.00
ATOM 10 CA GLY A 3 -23.901 -20.147 15.403 1.00 0.00
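For symmetric generation, our reading of RFdiffusion's symmetry examples is that contigs gives the total length of the whole assembly, which is split evenly across the subunits of the point group. A cyclic group of order n has n subunits, a dihedral group of order n has 2n, and the tetrahedral group has 12 (which is why order is ignored there). The hypothetical helpers below (subunit_count and subunit_length are our own names, not OpenProtein functions) make that arithmetic explicit. They also flag a total that cannot be divided evenly, on the assumption that such a total is invalid.

```python
from typing import Optional


def subunit_count(symmetry: str, order: Optional[int] = None) -> int:
    """Number of subunits for the symmetry types used in this tutorial."""
    if symmetry == "tetrahedral":
        return 12  # T has 12 rotations, so no order is needed
    if symmetry not in ("cyclic", "dihedral"):
        raise ValueError(f"unknown symmetry: {symmetry!r}")
    if order is None or order < 1:
        raise ValueError(f"{symmetry} symmetry needs a positive order")
    # C_n has n subunits, D_n has 2n
    return order if symmetry == "cyclic" else 2 * order


def subunit_length(total_length: int, symmetry: str, order: Optional[int] = None) -> int:
    """Residues per subunit when total_length is split evenly across the assembly."""
    n = subunit_count(symmetry, order)
    if total_length % n != 0:
        raise ValueError(f"{total_length} residues do not split evenly into {n} subunits")
    return total_length // n
```

For the tetrahedral run above, subunit_length(1200, "tetrahedral") gives 100 residues per subunit.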
Symmetric motif scaffolding
We can combine motif scaffolding with symmetric generation to scaffold motifs symmetrically. Here, we reproduce the C4-symmetric nickel-binding design shown in the RFdiffusion paper.
[42]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/nickel_symmetric_motif.pdb").text
design = rfdiffusion.generate(
    symmetry="cyclic",
    order=4,
    structure_file=pdb,
    contigs="50/A2-4/50/0 50/A7-9/50/0 50/A12-14/50/0 50/A17-19/50/0",
    n=1,
)
design
[42]:
RFdiffusionJob(job_id='4fac45e7-3232-4409-b596-9cb0c1f91ba3', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 2, 59, 498290, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[43]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:10<00:00, 1.90s/it, status=SUCCESS]
ATOM 1 N GLY A 1 -18.280 11.394 -28.960 1.00 0.00
ATOM 2 CA GLY A 1 -19.500 11.542 -28.176 1.00 0.00
ATOM 3 C GLY A 1 -19.500 10.605 -26.975 1.00 0.00
ATOM 4 O GLY A 1 -19.956 10.970 -25.891 1.00 0.00
ATOM 5 N GLY A 2 -19.051 9.465 -27.171 1.00 0.00
ATOM 6 CA GLY A 2 -18.974 8.562 -26.029 1.00 0.00
ATOM 7 C GLY A 2 -18.024 9.099 -24.966 1.00 0.00
ATOM 8 O GLY A 2 -18.276 8.964 -23.769 1.00 0.00
ATOM 9 N GLY A 3 -16.946 9.658 -25.393 1.00 0.00
ATOM 10 CA GLY A 3 -16.017 10.222 -24.421 1.00 0.00
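The contig string in the nickel example describes four chains. The sketch below parses a simplified subset of RFdiffusion's contig syntax, limited to the forms used in this tutorial. Spaces separate chains and / separates segments. A bare length or range (50, 12-18) is designed from scratch, a chain letter plus a residue range (A2-4) is copied from the input structure, and a trailing /0 closes a chain. parse_contigs and ChainSpec are hypothetical names, and the parser makes no attempt to cover the full contig grammar.

```python
import re
from typing import NamedTuple


class ChainSpec(NamedTuple):
    min_len: int       # shortest possible chain length
    max_len: int       # longest possible chain length
    motifs: list[str]  # segments copied from the input structure, e.g. "A2-4"


_MOTIF = re.compile(r"([A-Za-z])(\d+)(?:-(\d+))?")  # e.g. A2-4 or A5
_DESIGNED = re.compile(r"(\d+)(?:-(\d+))?")         # e.g. 50 or 12-18


def parse_contigs(contigs: str) -> list[ChainSpec]:
    """Split a contig string into chains (by spaces) and segments (by '/')."""
    chains = []
    for chain_str in contigs.split():
        lo = hi = 0
        motifs = []
        for seg in chain_str.split("/"):
            if seg == "0":
                continue  # "/0" marks the end of a chain
            m = _MOTIF.fullmatch(seg)
            if m:
                start = int(m.group(2))
                end = int(m.group(3) or start)
                lo += end - start + 1  # fixed-length motif segment
                hi += end - start + 1
                motifs.append(seg)
                continue
            d = _DESIGNED.fullmatch(seg)
            if not d:
                raise ValueError(f"unrecognised contig segment: {seg!r}")
            a = int(d.group(1))
            b = int(d.group(2) or a)
            lo += a  # designed segment: length sampled from a..b
            hi += b
        chains.append(ChainSpec(lo, hi, motifs))
    return chains
```

Applied to the nickel contig, it reports four chains of 103 residues each (50 designed + 3 motif + 50 designed), 412 in total. This matches four equal subunits under C4 symmetry.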
Macrocyclic peptide design with RFpeptides
The recently published RFpeptides protocol, for designing macrocyclic peptides that bind target proteins with atomic accuracy, is enabled with two flags: inference.cyclic=True and inference.cyc_chains. The former instructs the model to design at least one macrocycle. The latter is a string containing the letter of every chain you would like to design as a cyclic peptide. For example, inference.cyc_chains='a' cyclizes only chain A, while inference.cyc_chains='abcd' cyclizes chains A to D.
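These two flags are not named parameters of generate, so they are passed through its **kwargs, as in the examples that follow. A small hypothetical helper (cyclic_kwargs is our own name, not an OpenProtein function) can build that dictionary and catch typos in the chain string. Lower-casing the letters follows the style of the examples in this tutorial and is an assumption, not a documented requirement.

```python
import string


def cyclic_kwargs(chains: str) -> dict:
    """Build the RFpeptides flags to pass to rfdiffusion.generate(**kwargs).

    chains: letters of the chains to cyclize, e.g. "a" or "abcd".
    """
    if not chains or any(c not in string.ascii_letters for c in chains):
        raise ValueError(f"expected chain letters such as 'a' or 'abcd', got {chains!r}")
    letters = chains.lower()  # assumption: match the lower-case style of the examples
    if len(set(letters)) != len(letters):
        raise ValueError(f"chain listed more than once: {chains!r}")
    return {"inference.cyclic": True, "inference.cyc_chains": letters}


# Usage, with the same call shape as the macrocyclic examples in this tutorial:
# design = rfdiffusion.generate(structure_file=pdb, contigs="12-18", n=1, **cyclic_kwargs("a"))
```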
Macrocyclic binder design
We can add the two flags for macrocyclic peptide design to our binder design:
[44]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/7zkr_GABARAP.pdb").text
kwargs = {
    "inference.cyclic": True,
    "inference.cyc_chains": "a",
}
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs="12-18 A3-117/0",
    hotspot="A51,A52,A50,A48,A62,A65",
    n=1,
    **kwargs,
)
design
[44]:
RFdiffusionJob(job_id='44fc4c6b-1a4d-4c92-a252-ce5f501a0963', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 6, 11, 323296, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[45]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:46<00:00, 2.16it/s, status=SUCCESS]
ATOM 1 N GLY A 1 2.149 8.945 7.688 1.00 0.00
ATOM 2 CA GLY A 1 1.594 7.613 7.895 1.00 0.00
ATOM 3 C GLY A 1 1.251 7.380 9.361 1.00 0.00
ATOM 4 O GLY A 1 1.955 7.850 10.255 1.00 0.00
ATOM 5 N GLY A 2 0.155 6.778 9.609 1.00 0.00
ATOM 6 CA GLY A 2 -0.244 6.391 10.957 1.00 0.00
ATOM 7 C GLY A 2 -0.226 4.877 11.124 1.00 0.00
ATOM 8 O GLY A 2 -0.674 4.141 10.245 1.00 0.00
ATOM 9 N GLY A 3 0.565 4.412 11.986 1.00 0.00
ATOM 10 CA GLY A 3 0.658 2.990 12.294 1.00 0.00
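In the binder contig "12-18 A3-117/0", the first chain is a designed peptide of 12 to 18 residues. The second chain keeps residues 3 to 117 of chain A from the input structure as the target. The hotspot labels use the input numbering of chain A, so we would expect them to fall inside that kept range. The hypothetical helpers below (parse_hotspots and hotspots_outside are our own names) run this check before a job is submitted.

```python
import re


def parse_hotspots(hotspot: str) -> list[tuple[str, int]]:
    """Turn a string like "A51,A52" into [("A", 51), ("A", 52)]."""
    out = []
    for item in hotspot.split(","):
        m = re.fullmatch(r"([A-Za-z])(\d+)", item.strip())
        if not m:
            raise ValueError(f"bad hotspot entry: {item!r}")
        out.append((m.group(1), int(m.group(2))))
    return out


def hotspots_outside(hotspot: str, target: str) -> list[tuple[str, int]]:
    """Return hotspots not covered by a target segment such as "A3-117"."""
    m = re.fullmatch(r"([A-Za-z])(\d+)-(\d+)", target)
    if not m:
        raise ValueError(f"bad target segment: {target!r}")
    chain, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    # A hotspot is outside if it is on another chain or beyond the residue range
    return [(c, r) for c, r in parse_hotspots(hotspot)
            if c != chain or not start <= r <= end]
```

For the binder example, hotspots_outside("A51,A52,A50,A48,A62,A65", "A3-117") returns an empty list, so every hotspot lies on the kept target segment.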
Macrocyclic monomer design
The same two flags also work for monomer design:
[46]:
kwargs = {
    "inference.cyclic": True,
    "inference.cyc_chains": "a",
}
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs="12-18",
    n=1,
    **kwargs,
)
design
[46]:
RFdiffusionJob(job_id='7e777fdd-b684-4024-902d-429135993c6f', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 6, 59, 300539, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[47]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:36<00:00, 2.76it/s, status=SUCCESS]
ATOM 1 N GLY A 1 -0.391 3.319 5.675 1.00 0.00
ATOM 2 CA GLY A 1 -1.312 4.425 5.905 1.00 0.00
ATOM 3 C GLY A 1 -0.560 5.718 6.193 1.00 0.00
ATOM 4 O GLY A 1 0.356 5.743 7.015 1.00 0.00
ATOM 5 N GLY A 2 0.333 6.144 6.421 1.00 0.00
ATOM 6 CA GLY A 2 0.840 7.455 6.809 1.00 0.00
ATOM 7 C GLY A 2 1.456 8.181 5.620 1.00 0.00
ATOM 8 O GLY A 2 1.796 9.361 5.709 1.00 0.00
ATOM 9 N GLY A 3 1.361 7.974 4.648 1.00 0.00
ATOM 10 CA GLY A 3 2.026 8.526 3.474 1.00 0.00