Using RFdiffusion#

This tutorial shows you how to use the RFdiffusion model to design novel protein structures.

The examples here are largely adapted from the original RFdiffusion documentation to show how the model can be run on the OpenProtein platform, where the results can then be combined with our other workflows!

Full credit for the examples and use cases goes to the authors of RFdiffusion!

Unconditional monomer design#

The most basic use of RFdiffusion is the unconditional design of a protein structure of a given length. You need two things:

  1. Length of the protein

  2. Number of designs N desired

[1]:
length = 150
N = 3
[2]:
rfdiffusion = session.models.rfdiffusion
rfdiffusion.generate?
Signature:
rfdiffusion.generate(
    n: int = 1,
    structure_file: str | bytes | typing.BinaryIO | None = None,
    contigs: int | str | None = None,
    inpaint_seq: str | None = None,
    provide_seq: str | None = None,
    hotspot: str | None = None,
    T: int | None = None,
    partial_T: int | None = None,
    use_active_site_model: bool | None = None,
    use_beta_model: bool | None = None,
    symmetry: Optional[Literal['cyclic', 'dihedral', 'tetrahedral']] = None,
    order: int | None = None,
    add_potential: bool | None = None,
    scaffold_target_structure_file: str | bytes | typing.BinaryIO | None = None,
    scaffold_target_use_struct: bool = False,
    **kwargs,
) -> openprotein.models.foundation.rfdiffusion.RFdiffusionFuture
Docstring:
Run a protein structure generate job using RFdiffusion.

Parameters
----------
n : int, optional
    The number of unique design trajectories to run (default is 1).
structure_file : BinaryIO, optional
    An input PDB file (as a file-like object) used for inpainting or other
    guided design tasks where parts of an existing structure are provided.
contigs : int, str, optional
    Defines the lengths and connectivity of chain segments for the desired
    structure, specified in RFdiffusion's contig string format.
    Required for most design tasks. Example: 150, '10-20/A100-110/10-20' for a
    binder design.
inpaint_seq : str, optional
    A string specifying the regions in the input structure to mask for
    in-painting. Example: 'A1-A10/A30-40'.
provide_seq : str, optional
    A string specifying which segments of the contig have a provided
    sequence. Example: 'A1-A10/A30-40'.
hotspot : str, optional
    A string specifying hotspot residues to constrain during design,
    typically for functional sites. Example: 'A10,A12,A14'.
T : int, optional
    The number of timesteps for the diffusion process.
partial_T : int, optional
    The number of timesteps for partial diffusion.
use_active_site_model : bool, optional
    If True, uses the active site model checkpoint, which has been finetuned to
    better keep very small motifs in place in the output for motif scaffolding
    (default is False).
use_beta_model : bool, optional
    If True, uses the complex beta model checkpoint, which generates a
    greater diversity of topologies but has not been extensively
    experimentally validated (default is False).
symmetry : {"cyclic", "dihedral", "tetrahedral"}, optional
    The type of symmetry to apply to the design.
order : int, optional
    The order of the symmetry (e.g., 3 for C3 or D3 symmetry).
    Must be provided if `symmetry` is set.
add_potential : bool, optional
    A flag to toggle an additional potential to guide the design.
    This defaults to true in the case of symmetric design.
scaffold_target_structure_file : str, bytes, BinaryIO, optional
    A PDB file (which can be the text string or bytes or the file-like
    object) containing a scaffold structure to be used as a structural
    guide. It could also be used as a target when doing scaffold guided
scaffold_target_use_struct : bool, optional
    Whether or not to use the provided scaffold structure as a target.
    Otherwise, it is used only as a topology guide.

Other Parameters
----------------
**kwargs : dict
    Additional keyword args that are passed directly to the rfdiffusion
    inference script. Overwrites any preceding options.

Returns
-------
RFdiffusionFuture
    A future object that can be used to retrieve the results of the design
    job upon completion.
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/rfdiffusion.py
Type:      method

The cell above retrieves the model from the session and displays the signature of generate.

Now run the design using RFdiffusion:

[3]:
design = rfdiffusion.generate(n=N, contigs=length)
design
[3]:
RFdiffusionJob(job_id='169c020b-b992-4cd0-8097-26be7f0bdcb1', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 13, 54, 617888, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for the job to finish running with wait_until_done.

[4]:
design.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [06:02<00:00,  3.63s/it, status=SUCCESS]
[4]:
True

Retrieve the PDB file of a design. Use the replicate parameter to specify the 0-indexed design to retrieve, in this case 0 to 2.

[5]:
result = design.get(replicate=0)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
ATOM      1  N   GLY A   1       2.829   7.051  28.939  1.00  0.00
ATOM      2  CA  GLY A   1       2.481   7.591  27.630  1.00  0.00
ATOM      3  C   GLY A   1       3.553   7.267  26.598  1.00  0.00
ATOM      4  O   GLY A   1       3.246   6.934  25.453  1.00  0.00
ATOM      5  N   GLY A   2       4.718   7.479  26.963  1.00  0.00
ATOM      6  CA  GLY A   2       5.797   7.175  26.031  1.00  0.00
ATOM      7  C   GLY A   2       5.856   5.684  25.725  1.00  0.00
ATOM      8  O   GLY A   2       6.107   5.284  24.588  1.00  0.00
ATOM      9  N   GLY A   3       5.607   4.890  26.722  1.00  0.00
ATOM     10  CA  GLY A   3       5.590   3.454  26.474  1.00  0.00
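
Since replicate is 0-indexed, fetching every design amounts to a loop over range(N). A minimal sketch follows. Here collect_designs is a hypothetical helper, not part of the OpenProtein client, and it only assumes that the job object exposes get(replicate=...) as used above:

```python
def collect_designs(job, n: int) -> list[str]:
    """Fetch the PDB text of every replicate (0-indexed) from a finished job.

    `job` is assumed to be any object with a `get(replicate=i)` method,
    such as the design job returned by `rfdiffusion.generate` above.
    """
    return [job.get(replicate=i) for i in range(n)]
```

With the job from this section, this could be called as collect_designs(design, N) once wait_until_done has returned.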

Motif Scaffolding#

RFdiffusion can be used to scaffold motifs. To do this, we need a few things:

  1. an input protein structure containing the motif, from a .pdb file

  2. which parts of the input to keep, and how many new residues connect or flank them in the new protein

  3. a range of lengths to sample for the new protein, similar to the above

First, let’s get our PDB file from the RCSB Protein Data Bank. We will be using 5TPN, which represents the crystal structure of RSV F in complex with human antibody hRSV90.

[6]:
import gzip
import io

import requests


def get_pdb(code: str) -> str:
    """Download the gzipped .pdb1 file of a PDB entry from RCSB and return it as text."""
    url = f"https://files.rcsb.org/download/{code}.pdb1.gz"
    with requests.get(url, timeout=60) as r:
        r.raise_for_status()
        # The download is gzip-compressed; decompress it in memory.
        with gzip.open(io.BytesIO(r.content), "rb") as f:
            return f.read().decode()


pdb = get_pdb("5TPN")
print("\n".join(pdb.splitlines()[210:220]))
ATOM      9  N   ILE A  28      30.006 -91.995 -29.741  1.00 73.73           N
ATOM     10  CA  ILE A  28      28.960 -92.036 -28.738  1.00 66.97           C
ATOM     11  C   ILE A  28      29.318 -93.052 -27.671  1.00 72.27           C
ATOM     12  O   ILE A  28      29.765 -94.165 -27.983  1.00 73.82           O
ATOM     13  CB  ILE A  28      27.604 -92.343 -29.396  1.00 70.42           C
ATOM     14  CG1 ILE A  28      27.236 -91.185 -30.311  1.00 72.90           C
ATOM     15  CG2 ILE A  28      26.519 -92.523 -28.364  1.00 61.95           C
ATOM     16  CD1 ILE A  28      27.324 -89.849 -29.613  1.00 71.56           C
ATOM     17  N   THR A  29      29.133 -92.652 -26.403  1.00 74.86           N
ATOM     18  CA  THR A  29      29.375 -93.492 -25.230  1.00 67.44           C

Now we need to specify the motif that we are interested in. This is done together with the length of the output protein, using the contigs syntax from RFdiffusion. Here, we keep residues 163-181 (inclusive) of chain A in the input PDB and randomly sample 10-40 new residues at each of the N- and C-termini. This works out to the following syntax:

[7]:
contigs = "10-40/A163-181/10-40"

Refer to the RFdiffusion documentation for more examples and explanation of the contigs syntax.
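
To get a feel for what a contig string implies, it can help to compute the range of total lengths it allows. The sketch below is a simplified reading of the syntax as used in this tutorial and is not RFdiffusion’s own parser. It assumes chain-prefixed segments such as A163-181 are fixed ranges copied from the input, bare ranges such as 10-40 are sampled lengths, and a segment of 0 marks a chain break. The name contig_length_range is hypothetical:

```python
import re


def contig_length_range(contigs) -> tuple[int, int]:
    """Return the (min, max) total number of residues implied by a contig string."""
    lo_total = hi_total = 0
    # Chains are separated by whitespace, segments within a chain by "/".
    for chain in str(contigs).split():
        for seg in chain.split("/"):
            if seg in ("", "0"):  # "0" marks a chain break and adds no residues
                continue
            m = re.fullmatch(r"([A-Za-z]?)(\d+)(?:-(\d+))?", seg)
            if m is None:
                raise ValueError(f"unrecognized contig segment: {seg!r}")
            start = int(m.group(2))
            end = int(m.group(3)) if m.group(3) else start
            if m.group(1):
                # Chain-prefixed: a fixed range of residues from the input structure.
                fixed = end - start + 1
                lo_total += fixed
                hi_total += fixed
            else:
                # Bare numbers: a sampled length between start and end.
                lo_total += start
                hi_total += end
    return lo_total, hi_total


print(contig_length_range("10-40/A163-181/10-40"))  # (39, 99)
```

The motif contributes 19 residues (163 to 181 inclusive), so the designs from this contig are between 39 and 99 residues long.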

Now let’s run our design:

[8]:
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    n=1,
)
design
[8]:
RFdiffusionJob(job_id='ae149a8b-07b8-43ce-abb3-3a93b99fa2df', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 20, 4, 362109, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
design.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00,  2.58it/s, status=SUCCESS]
[9]:
True
[10]:
result = design.get()
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
ATOM      1  N   GLY A   1     -20.735   6.261   4.968  1.00  0.00
ATOM      2  CA  GLY A   1     -19.299   6.011   5.004  1.00  0.00
ATOM      3  C   GLY A   1     -18.972   4.606   4.516  1.00  0.00
ATOM      4  O   GLY A   1     -19.699   3.654   4.801  1.00  0.00
ATOM      5  N   GLY A   2     -17.999   4.543   3.767  1.00  0.00
ATOM      6  CA  GLY A   2     -17.574   3.224   3.314  1.00  0.00
ATOM      7  C   GLY A   2     -16.419   2.698   4.157  1.00  0.00
ATOM      8  O   GLY A   2     -15.314   3.239   4.121  1.00  0.00
ATOM      9  N   GLY A   3     -16.696   1.761   4.909  1.00  0.00
ATOM     10  CA  GLY A   3     -15.676   1.171   5.768  1.00  0.00

Small motifs using active site model#

With very small motifs, RFdiffusion tends not to keep them perfectly fixed in the output. For very small functional motifs, the RFdiffusion authors recommend the active site model, which is finetuned for such tasks. Enable it with the use_active_site_model parameter:

[11]:
contigs = "10-40/A170-173/10-40"
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    n=1,
    use_active_site_model=True
)
design
[11]:
RFdiffusionJob(job_id='b18fe59b-b7cf-4ff8-9f96-1f2432ec1b4d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 20, 45, 39364, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[12]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00,  2.57it/s, status=SUCCESS]
ATOM      1  N   GLY A   1     -24.784   9.183  17.739  1.00  0.00
ATOM      2  CA  GLY A   1     -25.113   7.763  17.749  1.00  0.00
ATOM      3  C   GLY A   1     -23.861   6.909  17.900  1.00  0.00
ATOM      4  O   GLY A   1     -22.853   7.144  17.232  1.00  0.00
ATOM      5  N   GLY A   2     -23.743   6.016  18.764  1.00  0.00
ATOM      6  CA  GLY A   2     -22.548   5.192  18.901  1.00  0.00
ATOM      7  C   GLY A   2     -22.379   4.264  17.705  1.00  0.00
ATOM      8  O   GLY A   2     -21.262   4.024  17.247  1.00  0.00
ATOM      9  N   GLY A   3     -23.442   3.869  17.061  1.00  0.00
ATOM     10  CA  GLY A   3     -23.352   3.029  15.872  1.00  0.00

Inpainting#

inpaint_seq lets you hide the amino acid identities of specific residues in your input structure. RFdiffusion will then fill in these residues during design, choosing sequences that fit the new structural context.

For example, if you’re fusing two proteins, residues that were originally on the surface (often polar) might end up buried in the core. Instead of manually mutating them to hydrophobic residues, you can mask them with inpaint_seq so RFdiffusion can decide on the best replacements automatically.

[13]:
inpaint_seq = "A163-168/A170-171/A179"

This means we are masking the residue identities of A163 to A168 (inclusive), and residues A170, A171 and A179.
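
To double-check which residues a mask covers, the string can be expanded into individual residue IDs. This is a minimal sketch under the assumption that each "/"-separated part is a chain letter followed by a residue number or an inclusive range. The helper expand_residue_ranges is hypothetical and not part of the OpenProtein client:

```python
import re


def expand_residue_ranges(spec: str) -> list[str]:
    """Expand a string like 'A163-168/A170-171/A179' into individual residue IDs."""
    residues = []
    for part in spec.split("/"):
        # Chain letter, start number, and an optional end (with or without a chain letter).
        m = re.fullmatch(r"([A-Za-z])(\d+)(?:-[A-Za-z]?(\d+))?", part)
        if m is None:
            raise ValueError(f"unrecognized residue range: {part!r}")
        chain, start = m.group(1), int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start
        residues.extend(f"{chain}{i}" for i in range(start, end + 1))
    return residues


masked = expand_residue_ranges("A163-168/A170-171/A179")
print(len(masked))  # 9
```

Here the mask covers nine residues: six from A163 to A168, plus A170, A171 and A179.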

[14]:
# reset the contigs from the very small motif for inpaint example
contigs = "10-40/A163-181/10-40"
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    inpaint_seq=inpaint_seq,
    n=1,
)
design
[14]:
RFdiffusionJob(job_id='be51eb2f-e86d-4e68-939f-3351ecdc6444', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 21, 26, 735906, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[15]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00,  2.58it/s, status=SUCCESS]
ATOM      1  N   GLY A   1     -20.970  12.196  12.412  1.00  0.00
ATOM      2  CA  GLY A   1     -21.400  10.976  13.086  1.00  0.00
ATOM      3  C   GLY A   1     -20.229  10.027  13.306  1.00  0.00
ATOM      4  O   GLY A   1     -19.292   9.986  12.510  1.00  0.00
ATOM      5  N   GLY A   2     -20.221   9.300  14.400  1.00  0.00
ATOM      6  CA  GLY A   2     -19.185   8.320  14.703  1.00  0.00
ATOM      7  C   GLY A   2     -19.053   7.294  13.585  1.00  0.00
ATOM      8  O   GLY A   2     -17.949   6.860  13.255  1.00  0.00
ATOM      9  N   GLY A   3     -20.200   6.833  13.049  1.00  0.00
ATOM     10  CA  GLY A   3     -20.176   5.891  11.936  1.00  0.00

Partial diffusion#

We can use partial diffusion to get some diversity around a general fold. This is done with the partial_T parameter, which sets the timestep to ‘noise’ the input structure to before it is denoised again. More noise means more diversity. You should try different values for your specific design problem, but the typical value used by the authors was 20.

With partial diffusion, there is a constraint on contigs since the diffusion is done from a known structure - the contig string has to yield the exact same length as the input protein.
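
One way to find the length to pass as contigs is to count the distinct residues in the input structure. The sketch below reads the fixed-column ATOM records of PDB text. The helper count_residues is hypothetical, it ignores HETATM records, and it counts residues across all chains together, so multi-chain inputs need a contig per chain instead:

```python
def count_residues(pdb_text: str) -> int:
    """Count distinct residues among the ATOM records of PDB-format text."""
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 27:
            # Column 22 is the chain ID; columns 23-27 hold the residue
            # number plus insertion code (0-based slices 21 and 22:27).
            seen.add((line[21], line[22:27]))
    return len(seen)
```

Because residues are collected in a set, repeated models of the same chain (as in NMR entries) are not counted twice.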

Let’s use 2KL8:

[16]:
pdb = get_pdb("2KL8")
print("\n".join(pdb.splitlines()[100:110]))
ATOM     63  HB3 ASP A   4      -2.932   0.810   7.543  1.00 38.00           H
ATOM     64  N   ILE A   5      -2.097   0.988   4.087  1.00 23.43           N
ATOM     65  CA  ILE A   5      -2.325  -0.005   3.033  1.00  2.14           C
ATOM     66  C   ILE A   5      -1.796  -1.382   3.465  1.00 34.22           C
ATOM     67  O   ILE A   5      -0.589  -1.598   3.543  1.00 12.12           O
ATOM     68  CB  ILE A   5      -1.641   0.412   1.707  1.00 35.23           C
ATOM     69  CG1 ILE A   5      -2.050   1.844   1.323  1.00 35.34           C
ATOM     70  CG2 ILE A   5      -1.998  -0.571   0.590  1.00 71.23           C
ATOM     71  CD1 ILE A   5      -1.388   2.353   0.059  1.00 22.53           C
ATOM     72  H   ILE A   5      -1.215   1.402   4.168  1.00 38.00           H

And run partial diffusion for 10 timesteps:

[17]:
length = 85
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=length,
    partial_T=10,
    n=1,
)
design
[17]:
RFdiffusionJob(job_id='2bb73c76-f0df-4d38-bbd6-fbbbe612d666', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 9, 356090, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[18]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:16<00:00,  5.92it/s, status=SUCCESS]
ATOM      1  N   GLY A   1      -1.921  11.519   5.041  1.00  0.00
ATOM      2  CA  GLY A   1      -2.816  10.371   4.958  1.00  0.00
ATOM      3  C   GLY A   1      -2.049   9.063   5.094  1.00  0.00
ATOM      4  O   GLY A   1      -0.922   8.940   4.615  1.00  0.00
ATOM      5  N   GLY A   2      -2.809   8.099   5.541  1.00  0.00
ATOM      6  CA  GLY A   2      -2.225   6.768   5.656  1.00  0.00
ATOM      7  C   GLY A   2      -3.078   5.728   4.941  1.00  0.00
ATOM      8  O   GLY A   2      -4.305   5.745   5.039  1.00  0.00
ATOM      9  N   GLY A   3      -2.394   4.880   4.237  1.00  0.00
ATOM     10  CA  GLY A   3      -3.064   3.781   3.551  1.00  0.00

Keeping some sequences#

You can also keep parts of the sequence of the partially diffused chain fixed, using provide_seq. One reason to do this is helical peptide binding: if you’ve threaded a helical peptide sequence onto an ideal helix and now want to diversify the complex, letting the peptide deviate from an ideal helix, you might do something like:

[19]:
provide_seq = "172-205"
# Provide multiple sequences using a comma delimiter
# provide_seq = "172-177,200-205"

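This means we want to keep the sequence of positions 172 to 205 fixed. The contig used below describes a 172-residue chain followed by a 34-residue chain, so this range lines up with the 34-residue peptide when positions are counted from 0 across the whole complex.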

Let’s use the helical peptide example from the RFdiffusion repo:

[20]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/peptide_complex_ideal_helix.pdb").text
print("\n".join(pdb.splitlines()[10:20]))
ATOM      2  CA  GLY A   1      16.504 -16.986   9.191  1.00  0.00           C
ATOM      3  C   GLY A   1      17.026 -15.603   9.556  1.00  0.00           C
ATOM      4  O   GLY A   1      17.709 -14.957   8.761  1.00  0.00           O
ATOM      5 1H   GLY A   1      17.057 -18.924   9.375  1.00  0.00           H
ATOM      6 2H   GLY A   1      18.312 -17.897   9.186  1.00  0.00           H
ATOM      7 3H   GLY A   1      17.535 -17.982  10.620  1.00  0.00           H
ATOM      8 1HA  GLY A   1      16.363 -17.053   8.112  1.00  0.00           H
ATOM      9 2HA  GLY A   1      15.530 -17.144   9.652  1.00  0.00           H
ATOM     10  N   MET A   2      16.700 -15.152  10.764  1.00  0.00           N
ATOM     11  CA  MET A   2      16.989 -13.784  11.176  1.00  0.00           C

Let’s run this design. Note the /0 in contigs, which marks a chain break.

[21]:
contigs = "172-172/0 34-34"
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    provide_seq=provide_seq,
    partial_T=10,
    n=1,
)
design
[21]:
RFdiffusionJob(job_id='a52df40b-8438-402d-9ab9-f4d97d7064eb', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 28, 780009, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[22]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00,  4.48it/s, status=SUCCESS]
ATOM      1  N   GLY A   1       2.783 -22.631  -8.750  1.00  0.00
ATOM      2  CA  GLY A   1       2.306 -21.472  -9.494  1.00  0.00
ATOM      3  C   GLY A   1       2.318 -20.218  -8.628  1.00  0.00
ATOM      4  O   GLY A   1       2.653 -19.131  -9.098  1.00  0.00
ATOM      5  N   GLY A   2       2.005 -20.393  -7.442  1.00  0.00
ATOM      6  CA  GLY A   2       1.981 -19.249  -6.538  1.00  0.00
ATOM      7  C   GLY A   2       3.386 -18.715  -6.288  1.00  0.00
ATOM      8  O   GLY A   2       3.595 -17.504  -6.210  1.00  0.00
ATOM      9  N   GLY A   3       4.315 -19.601  -6.177  1.00  0.00
ATOM     10  CA  GLY A   3       5.687 -19.158  -5.960  1.00  0.00

Binder design#

We can use RFdiffusion to do binder design using the same contigs syntax. To do so, we use the /0 chain break syntax in contigs to separate the target from the binder. Additionally, we specify hotspot residues, which tell the model where the binder should make contact with the target.

[23]:
contigs = "A1-150/0 70-100"
hotspot = "A59,A83,A91"
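
Before submitting, it can be worth checking that every hotspot actually lies on the part of the target kept in contigs. A minimal sketch, assuming hotspots are comma-separated chain-plus-number IDs and that the target appears in contigs as chain-prefixed ranges. The helper hotspots_in_target is hypothetical:

```python
import re


def hotspots_in_target(hotspot: str, contigs: str) -> bool:
    """Return True if every hotspot lies inside a chain-prefixed contig range."""
    # Collect (chain, start, end) for each range taken from the input structure.
    fixed = [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in re.finditer(r"([A-Za-z])(\d+)-(\d+)", contigs)
    ]
    for res in hotspot.split(","):
        res = res.strip()
        chain, num = res[0], int(res[1:])
        if not any(c == chain and lo <= num <= hi for c, lo, hi in fixed):
            return False
    return True


print(hotspots_in_target("A59,A83,A91", "A1-150/0 70-100"))  # True
```

A hotspot outside the kept range, such as A200 here, would make the check return False.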

We also add some configuration beyond the basic set of design parameters shown by rfdiffusion.generate?. Our RFdiffusion model interface also accepts the full set of configuration options provided by RFdiffusion as kwargs. Note, however, that these are advanced settings and you should be familiar with RFdiffusion before using them. Here, we reduce the noise added during inference to 0 to improve the quality of the designs, as recommended in their examples. Since RFdiffusion uses Hydra’s flattened dot syntax, whose keys are not valid Python keyword names, we use a dictionary to hold the additional args:

[24]:
kwargs = {
    "denoiser.noise_scale_ca": 0,
    "denoiser.noise_scale_frame": 0
}
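
The dictionary is needed purely because of Python syntax: a dotted name cannot be written as a keyword argument, but it can be passed through ** unpacking. The sketch below only illustrates that mechanism. The function as_overrides is hypothetical, and how the platform actually forwards these options to RFdiffusion is not shown here:

```python
def as_overrides(**kwargs) -> list[str]:
    """Render keyword arguments as `key=value` strings (illustration only)."""
    return [f"{key}={value}" for key, value in kwargs.items()]


extra = {
    "denoiser.noise_scale_ca": 0,
    "denoiser.noise_scale_frame": 0,
}

# Writing as_overrides(denoiser.noise_scale_ca=0) would be a SyntaxError,
# so the dotted keys are unpacked from the dictionary instead.
print(as_overrides(**extra))
# ['denoiser.noise_scale_ca=0', 'denoiser.noise_scale_frame=0']
```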

Let’s use the insulin target example from the official repo:

[25]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/insulin_target.pdb").text
print("\n".join(pdb.splitlines()[11:21]))
ATOM      1  N   GLU A   1      47.177 109.949  22.284  1.00  0.00           N
ATOM      2  CA  GLU A   1      46.223 108.850  22.182  1.00  0.00           C
ATOM      3  C   GLU A   1      44.813 109.180  22.682  1.00  0.00           C
ATOM      4  O   GLU A   1      44.597 110.149  23.432  1.00  0.00           O
ATOM      5  CB  GLU A   1      46.752 107.602  22.893  1.00  0.00           C
ATOM      6  CG  GLU A   1      47.989 107.042  22.245  1.00  0.00           C
ATOM      7  CD  GLU A   1      47.721 106.544  20.773  1.00  0.00           C
ATOM      8  OE1 GLU A   1      46.771 105.825  20.555  1.00  0.00           O
ATOM      9  OE2 GLU A   1      48.479 106.896  19.900  1.00  0.00           O
ATOM     10 1H   GLU A   1      48.067 109.652  21.939  1.00  0.00           H

Let’s run the binder design:

[26]:
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    hotspot=hotspot,
    n=1,
    **kwargs
)
design
[26]:
RFdiffusionJob(job_id='dfd45782-3ba3-4b1b-ba1d-9a1697a4b77f', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 53, 716417, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[27]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:17<00:00,  1.29it/s, status=SUCCESS]
ATOM      1  N   GLY A   1     -26.746 -16.007  -3.126  1.00  0.00
ATOM      2  CA  GLY A   1     -27.132 -15.454  -1.834  1.00  0.00
ATOM      3  C   GLY A   1     -26.688 -16.360  -0.692  1.00  0.00
ATOM      4  O   GLY A   1     -26.297 -15.885   0.374  1.00  0.00
ATOM      5  N   GLY A   2     -26.679 -17.711  -0.968  1.00  0.00
ATOM      6  CA  GLY A   2     -26.180 -18.645   0.033  1.00  0.00
ATOM      7  C   GLY A   2     -24.691 -18.440   0.284  1.00  0.00
ATOM      8  O   GLY A   2     -24.232 -18.489   1.425  1.00  0.00
ATOM      9  N   GLY A   3     -23.925 -18.225  -0.793  1.00  0.00
ATOM     10  CA  GLY A   3     -22.495 -17.973  -0.664  1.00  0.00

Higher diversity topologies using complex beta model#

RFdiffusion also provides a beta model checkpoint. The motivation is that the default model often generates helical binders. These have higher computational and experimental success rates, but you may want other topologies. The beta model generates a greater diversity of topologies, but has not been extensively experimentally validated. Use at your own risk:

[28]:
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    hotspot=hotspot,
    n=1,
    use_beta_model=True,
    **kwargs
)
design
[28]:
RFdiffusionJob(job_id='5eba9326-21aa-4ebf-b098-f43b578e680d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 24, 13, 629710, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[29]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:23<00:00,  1.20it/s, status=SUCCESS]
ATOM      1  N   GLY A   1       5.993 -12.514 -26.071  1.00  0.00
ATOM      2  CA  GLY A   1       5.682 -11.358 -25.238  1.00  0.00
ATOM      3  C   GLY A   1       4.622 -11.698 -24.198  1.00  0.00
ATOM      4  O   GLY A   1       3.746 -12.529 -24.439  1.00  0.00
ATOM      5  N   GLY A   2       4.485 -10.854 -23.249  1.00  0.00
ATOM      6  CA  GLY A   2       3.403 -10.944 -22.276  1.00  0.00
ATOM      7  C   GLY A   2       2.369  -9.848 -22.499  1.00  0.00
ATOM      8  O   GLY A   2       2.713  -8.715 -22.838  1.00  0.00
ATOM      9  N   GLY A   3       1.145 -10.131 -22.209  1.00  0.00
ATOM     10  CA  GLY A   3       0.061  -9.183 -22.438  1.00  0.00

Refer to the official RFdiffusion documentation’s section on Practical Considerations for Binder Design for some tips on the topic.

Fold conditioning#

We can also condition binder designs on particular topologies by providing (partial) secondary structure and block adjacency information. An example is designing a TIM barrel without requiring exact coordinates for the residues. We provide this information by specifying the structure to condition on with scaffold_target_structure_file. Let’s get the TIM barrel:

[30]:
pdb = get_pdb("6WVS")
print("\n".join(pdb.splitlines()[100:110]))
ATOM     44  HG  LEU A   4      -3.643  12.831   7.422  1.00 63.20           H
ATOM     45 HD11 LEU A   4      -5.578  13.378   8.608  1.00 72.64           H
ATOM     46 HD12 LEU A   4      -4.379  14.331   9.029  1.00 72.64           H
ATOM     47 HD13 LEU A   4      -5.503  14.859   8.038  1.00 72.64           H
ATOM     48 HD21 LEU A   4      -5.678  12.418   6.361  1.00 68.59           H
ATOM     49 HD22 LEU A   4      -5.596  13.845   5.668  1.00 68.59           H
ATOM     50 HD23 LEU A   4      -4.535  12.721   5.300  1.00 68.59           H
ATOM     51  N   ILE A   5      -0.925  15.785   4.277  1.00 40.62           N
ATOM     52  CA  ILE A   5       0.164  16.690   3.928  1.00 40.18           C
ATOM     53  C   ILE A   5      -0.408  18.094   3.803  1.00 46.92           C

Passing scaffold_target_structure_file, as in the next cell, instructs the system to derive the secondary structure and block adjacency information from the provided structure before running scaffold-guided inference, which performs the fold conditioning.

[31]:
# Additional kwargs provided from example
# Reduce noise to 0.5 for better results
# Sample additional length to increase diversity of the outputs
# Specifically, we mask the loops and insert 0-5 residues (randomly sampled per-loop) into each loop
# Add 0-5 residues (randomly sampled) to the N and the C-terminus
kwargs = {
    "denoiser.noise_scale_ca": 0.5,
    "denoiser.noise_scale_frame": 0.5,
    "scaffoldguided.mask_loops": True,
    "scaffoldguided.sampled_insertion": "0-5",
    "scaffoldguided.sampled_N": "0-5",
    "scaffoldguided.sampled_C": "0-5",
}
design = rfdiffusion.generate(
    scaffold_target_structure_file=pdb,
    n=1,
    **kwargs
)
design
[31]:
RFdiffusionJob(job_id='6845b62e-67d6-46b3-939c-9a9c665472e0', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 25, 41, 498536, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for and retrieve the design:

[32]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:17<00:00,  1.29it/s, status=SUCCESS]
ATOM      1  N   GLY A   1      22.987  -8.520   6.056  1.00  0.00
ATOM      2  CA  GLY A   1      21.963  -7.664   5.468  1.00  0.00
ATOM      3  C   GLY A   1      21.814  -6.366   6.251  1.00  0.00
ATOM      4  O   GLY A   1      21.695  -6.379   7.476  1.00  0.00
ATOM      5  N   GLY A   2      22.074  -5.335   5.647  1.00  0.00
ATOM      6  CA  GLY A   2      21.872  -4.012   6.227  1.00  0.00
ATOM      7  C   GLY A   2      20.426  -3.560   6.076  1.00  0.00
ATOM      8  O   GLY A   2      19.787  -3.818   5.056  1.00  0.00
ATOM      9  N   GLY A   3      20.038  -2.546   6.827  1.00  0.00
ATOM     10  CA  GLY A   3      18.659  -2.076   6.757  1.00  0.00

Binder design to flexible peptides#

RFdiffusion can be used to design binders to flexible peptides, where the 3D coordinates of the peptide are not specified but its secondary structure can be. This allows a user to design binders to a peptide in, e.g., either a helical or a beta state. The principle is that we provide an input PDB structure of the peptide but mask its 3D structure with contigmap.inpaint_str. Here, we’re making 70-100 amino acid binders to the tau peptide (PDB indices B165-178), masking the structure of this peptide. We can then specify that we want it to adopt a helical secondary structure:

[33]:
# Get tau peptide from RFdiffusion repo
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/tau_peptide.pdb").text
contigs = "70-100/0 B165-178"
kwargs = {
    "scaffoldguided.scaffoldguided": True,
    "contigmap.inpaint_str": "[B165-178]",
    "contigmap.inpaint_str_helix": "[B165-178]",
}
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs=contigs,
    n=1,
    **kwargs,
)
design
[33]:
RFdiffusionJob(job_id='e3f0a2fa-f5fc-4160-93ad-5ac2dff71de6', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 29, 0, 797164, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[34]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:38<00:00,  2.58it/s, status=SUCCESS]
ATOM      1  N   GLY A   1      32.330   1.561   6.905  1.00  0.00
ATOM      2  CA  GLY A   1      31.012   1.668   7.521  1.00  0.00
ATOM      3  C   GLY A   1      30.015   2.326   6.576  1.00  0.00
ATOM      4  O   GLY A   1      28.866   1.894   6.473  1.00  0.00
ATOM      5  N   GLY A   2      30.433   3.363   5.969  1.00  0.00
ATOM      6  CA  GLY A   2      29.550   4.053   5.036  1.00  0.00
ATOM      7  C   GLY A   2      29.246   3.186   3.821  1.00  0.00
ATOM      8  O   GLY A   2      28.109   3.135   3.352  1.00  0.00
ATOM      9  N   GLY A   3      30.238   2.527   3.315  1.00  0.00
ATOM     10  CA  GLY A   3      30.026   1.642   2.176  1.00  0.00

You could alternatively specify to adopt a beta (strand) secondary structure with contigmap.inpaint_str_strand.
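
Switching between the two states only changes one key, so the kwargs can be built by a small function. This is a sketch using the option names shown in this section. The helper flexible_peptide_kwargs is hypothetical and not part of the OpenProtein client:

```python
def flexible_peptide_kwargs(region: str, secondary_structure: str) -> dict:
    """Build kwargs that mask a peptide's structure and request a secondary structure.

    `region` is a residue range such as "B165-178".
    `secondary_structure` is either "helix" or "strand".
    """
    if secondary_structure not in ("helix", "strand"):
        raise ValueError("secondary_structure must be 'helix' or 'strand'")
    return {
        "scaffoldguided.scaffoldguided": True,
        # Mask the 3D structure of the peptide...
        "contigmap.inpaint_str": f"[{region}]",
        # ...and ask for the chosen secondary structure over the same region.
        f"contigmap.inpaint_str_{secondary_structure}": f"[{region}]",
    }
```

Calling flexible_peptide_kwargs("B165-178", "helix") reproduces the dictionary used above, and passing "strand" swaps in contigmap.inpaint_str_strand.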

Generating symmetric oligomers#

We can use RFdiffusion to generate structures with different symmetries. Use symmetry to specify one of cyclic, dihedral or tetrahedral. You can provide the order in the case of cyclic or dihedral (defaults to 1). RFdiffusion also provides auxiliary potentials to help guide inference, which seem to help with motif scaffolding and symmetric oligomer generation. Potentials are enabled by default for symmetric oligomer generation, using the default parameters specified in the RFdiffusion documentation and examples, which look like:

[35]:
# These default potential options are applied automatically for any symmetric oligomer generation.
kwargs = {
    "potentials.guiding_potentials": "[\"type:olig_contacts,weight_intra:1,weight_inter:0.1\"]",
    "potentials.olig_intra_all": True,
    "potentials.olig_inter_all": True,
    "potentials.guide_scale": 2.0,
    "potentials.guide_decay": "quadratic"
}

Pass add_potential=False explicitly to turn these defaults off, and specify your own potentials if desired.
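
If you want to vary the weights rather than retype the dictionary, one option is to build it programmatically. The sketch below uses a hypothetical helper, olig_potential_kwargs, which is not part of the OpenProtein client. It reproduces the default keys listed above and assumes the extra keys are forwarded unchanged through **kwargs.

```python
import json


def olig_potential_kwargs(weight_intra=1, weight_inter=0.1,
                          guide_scale=2.0, guide_decay="quadratic"):
    """Hypothetical helper: build potential overrides for a symmetric design."""
    potential = (
        f"type:olig_contacts,weight_intra:{weight_intra},"
        f"weight_inter:{weight_inter}"
    )
    return {
        # json.dumps yields the bracketed, double-quoted list form shown above
        "potentials.guiding_potentials": json.dumps([potential]),
        "potentials.olig_intra_all": True,
        "potentials.olig_inter_all": True,
        "potentials.guide_scale": guide_scale,
        "potentials.guide_decay": guide_decay,
    }


custom = olig_potential_kwargs(weight_inter=0.5, guide_scale=1.0)

# Assumed usage with the rfdiffusion object created earlier in this tutorial:
# design = rfdiffusion.generate(symmetry="cyclic", order=6, contigs=480,
#                               add_potential=False, **custom)
```

Calling the helper with no arguments returns the same dictionary as the defaults above, which makes it easy to change one value at a time.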

Cyclic#

[36]:
design = rfdiffusion.generate(
    symmetry="cyclic",
    order=6,
    contigs=480,
)
design
[36]:
RFdiffusionJob(job_id='b83a1fc7-dff6-4e98-a72f-b3d9fbb5fa7d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 29, 41, 715834, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[37]:
result = design.wait(verbose=True)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [04:13<00:00,  2.53s/it, status=SUCCESS]
ATOM      1  N   GLY A   1      15.816   5.285  22.037  1.00  0.00
ATOM      2  CA  GLY A   1      15.583   4.007  21.376  1.00  0.00
ATOM      3  C   GLY A   1      15.737   4.131  19.866  1.00  0.00
ATOM      4  O   GLY A   1      15.128   4.998  19.239  1.00  0.00
ATOM      5  N   GLY A   2      16.670   3.571  19.434  1.00  0.00
ATOM      6  CA  GLY A   2      16.844   3.550  17.987  1.00  0.00
ATOM      7  C   GLY A   2      16.564   2.165  17.418  1.00  0.00
ATOM      8  O   GLY A   2      17.075   1.163  17.919  1.00  0.00
ATOM      9  N   GLY A   3      15.806   2.161  16.413  1.00  0.00
ATOM     10  CA  GLY A   3      15.417   0.911  15.772  1.00  0.00
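
The printout above shows only the first few backbone atoms. To get an overview of the whole design, a minimal sketch like the following tallies residues per chain. It assumes result is a standard fixed-column PDB string, as the printed lines suggest; residues_per_chain is a hypothetical helper, not a platform function.

```python
def residues_per_chain(pdb_text):
    """Count distinct residues per chain ID in fixed-column PDB text."""
    seen = {}
    for line in pdb_text.splitlines():
        # Skip non-atom records and lines too short to hold a residue number
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]      # column 22: chain identifier
        residue = line[22:27]    # columns 23-27: residue number + insertion code
        seen.setdefault(chain_id, set()).add(residue)
    return {chain: len(residues) for chain, residues in seen.items()}


# Assumed usage with the PDB text returned by design.wait():
# print(residues_per_chain(result))
```

How the subunits are labelled depends on how the output file is written, so treat the chain IDs it reports as a description of the file rather than a guarantee about the symmetry.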

Dihedral#

[38]:
design = rfdiffusion.generate(
    symmetry="dihedral",
    order=2,
    contigs=320,
)
design
[38]:
RFdiffusionJob(job_id='7b08c403-9be3-478e-a55d-4051ea63defb', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 33, 57, 215048, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[39]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:07<00:00,  1.27s/it, status=SUCCESS]
ATOM      1  N   GLY A   1      32.710  -4.716  -9.170  1.00  0.00
ATOM      2  CA  GLY A   1      31.865  -4.144  -8.129  1.00  0.00
ATOM      3  C   GLY A   1      31.160  -2.887  -8.621  1.00  0.00
ATOM      4  O   GLY A   1      30.004  -2.637  -8.276  1.00  0.00
ATOM      5  N   GLY A   2      31.858  -2.112  -9.270  1.00  0.00
ATOM      6  CA  GLY A   2      31.238  -0.910  -9.815  1.00  0.00
ATOM      7  C   GLY A   2      30.088  -1.259 -10.751  1.00  0.00
ATOM      8  O   GLY A   2      29.053  -0.592 -10.754  1.00  0.00
ATOM      9  N   GLY A   3      30.256  -2.281 -11.493  1.00  0.00
ATOM     10  CA  GLY A   3      29.168  -2.701 -12.368  1.00  0.00

Tetrahedral#

[40]:
# order is ignored for tetrahedral
design = rfdiffusion.generate(
    symmetry="tetrahedral",
    contigs=1200,
)
design
[40]:
RFdiffusionJob(job_id='2ef4b3ce-582a-49dc-b217-1fd0d9a3445e', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 36, 6, 673269, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

This job takes longer to run because the design is much longer.

[41]:
result = design.wait(verbose=True, timeout=1500)  # allow extra time for the longer sequence
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 510.40it/s, status=SUCCESS]
ATOM      1  N   GLY A   1     -25.230 -22.272  10.678  1.00  0.00
ATOM      2  CA  GLY A   1     -25.466 -20.849  10.466  1.00  0.00
ATOM      3  C   GLY A   1     -26.015 -20.188  11.724  1.00  0.00
ATOM      4  O   GLY A   1     -25.740 -19.019  11.993  1.00  0.00
ATOM      5  N   GLY A   2     -26.675 -20.934  12.454  1.00  0.00
ATOM      6  CA  GLY A   2     -27.204 -20.346  13.679  1.00  0.00
ATOM      7  C   GLY A   2     -26.081 -19.891  14.602  1.00  0.00
ATOM      8  O   GLY A   2     -26.180 -18.848  15.248  1.00  0.00
ATOM      9  N   GLY A   3     -25.012 -20.556  14.552  1.00  0.00
ATOM     10  CA  GLY A   3     -23.901 -20.147  15.403  1.00  0.00
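
The contigs values used in the three symmetric examples (480, 320 and 1200) are easier to choose with a little arithmetic. In upstream RFdiffusion, the contig length of a symmetric design is the total length of the oligomer and must divide evenly by the number of chains. The sketch below assumes the platform passes contigs through with that same meaning; both function names are hypothetical.

```python
def subunit_count(symmetry, order=1):
    """Number of chains implied by a point-group symmetry."""
    if symmetry == "cyclic":
        return order          # C_n has n subunits
    if symmetry == "dihedral":
        return 2 * order      # D_n has 2n subunits
    if symmetry == "tetrahedral":
        return 12             # T has 12 subunits; order is ignored
    raise ValueError(f"unsupported symmetry: {symmetry!r}")


def residues_per_subunit(total_length, symmetry, order=1):
    """Split a total contig length evenly across the symmetric subunits."""
    n_chains = subunit_count(symmetry, order)
    if total_length % n_chains != 0:
        raise ValueError(
            f"total length {total_length} is not divisible by {n_chains} chains"
        )
    return total_length // n_chains


print(residues_per_subunit(480, "cyclic", order=6))    # 80
print(residues_per_subunit(320, "dihedral", order=2))  # 80
print(residues_per_subunit(1200, "tetrahedral"))       # 100
```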

Symmetric motif scaffolding#

We can combine motif scaffolding with symmetric generation to scaffold motifs symmetrically. Here we are doing a C4 symmetric nickel design shown in the RFdiffusion paper.

[42]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/nickel_symmetric_motif.pdb").text
design = rfdiffusion.generate(
    symmetry="cyclic",
    order=4,
    structure_file=pdb,
    contigs="50/A2-4/50/0 50/A7-9/50/0 50/A12-14/50/0 50/A17-19/50/0",
    n=1,
)
design
[42]:
RFdiffusionJob(job_id='4fac45e7-3232-4409-b596-9cb0c1f91ba3', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 2, 59, 498290, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[43]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:10<00:00,  1.90s/it, status=SUCCESS]
ATOM      1  N   GLY A   1     -18.280  11.394 -28.960  1.00  0.00
ATOM      2  CA  GLY A   1     -19.500  11.542 -28.176  1.00  0.00
ATOM      3  C   GLY A   1     -19.500  10.605 -26.975  1.00  0.00
ATOM      4  O   GLY A   1     -19.956  10.970 -25.891  1.00  0.00
ATOM      5  N   GLY A   2     -19.051   9.465 -27.171  1.00  0.00
ATOM      6  CA  GLY A   2     -18.974   8.562 -26.029  1.00  0.00
ATOM      7  C   GLY A   2     -18.024   9.099 -24.966  1.00  0.00
ATOM      8  O   GLY A   2     -18.276   8.964 -23.769  1.00  0.00
ATOM      9  N   GLY A   3     -16.946   9.658 -25.393  1.00  0.00
ATOM     10  CA  GLY A   3     -16.017  10.222 -24.421  1.00  0.00
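
The contig string in this example packs four chains into one argument. As a rough check before submitting a job, the sketch below works out the residue count of each chain. It assumes the upstream RFdiffusion contig conventions: chains are separated by whitespace, segments by /, a letter-prefixed range such as A2-4 copies those residues from the input structure, a bare number or range is a newly generated stretch, and a segment of 0 marks a chain break. The function name is hypothetical.

```python
import re

SEGMENT = re.compile(r"([A-Za-z]?)(\d+)(?:-(\d+))?")


def chain_lengths(contigs):
    """Return a (min, max) residue count for each chain in a contig string."""
    lengths = []
    for chain in contigs.split():
        low = high = 0
        for segment in chain.split("/"):
            if segment in ("", "0"):
                continue  # chain-break marker, contributes no residues
            match = SEGMENT.fullmatch(segment)
            if match is None:
                raise ValueError(f"unrecognised contig segment: {segment!r}")
            letter = match.group(1)
            start = int(match.group(2))
            end = int(match.group(3) or match.group(2))
            if letter:
                # Motif residues start..end (inclusive) taken from the input PDB
                low += end - start + 1
                high += end - start + 1
            else:
                # Generated stretch whose length is sampled between start and end
                low += start
                high += end
        lengths.append((low, high))
    return lengths


print(chain_lengths("50/A2-4/50/0 50/A7-9/50/0 50/A12-14/50/0 50/A17-19/50/0"))
# [(103, 103), (103, 103), (103, 103), (103, 103)]
```

Each chain here is 50 generated residues, a 3-residue motif, and another 50 generated residues, so all four chains come to 103 residues.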

Macrocyclic peptide design with RFpeptides#

The newly published RFpeptides protocol, which designs macrocyclic peptides that bind target proteins with atomic accuracy, can be accessed using the flags inference.cyclic=True and inference.cyc_chains. The former instructs the system to design at least one macrocycle; the latter is a string containing the letter of every chain you would like to design as a cyclic peptide. For example, inference.cyc_chains='a' cyclizes only chain A, while inference.cyc_chains='abcd' cyclizes chains A to D.
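
As a small illustration of the cyc_chains string format, the sketch below builds the two flags from a collection of chain letters. cyclic_kwargs is a hypothetical helper, and lowercasing the letters simply follows the 'a' and 'abcd' examples above rather than a documented requirement.

```python
def cyclic_kwargs(chains):
    """Hypothetical helper: build the RFpeptides flags for the given chains."""
    # Deduplicate, lowercase and sort so ["B", "a", "A"] becomes "ab"
    letters = "".join(sorted({chain.lower() for chain in chains}))
    if not letters or not letters.isalpha():
        raise ValueError(f"expected one or more chain letters, got {chains!r}")
    return {
        "inference.cyclic": True,
        "inference.cyc_chains": letters,
    }


print(cyclic_kwargs("A"))
# {'inference.cyclic': True, 'inference.cyc_chains': 'a'}
print(cyclic_kwargs(["A", "B", "C", "D"]))
# {'inference.cyclic': True, 'inference.cyc_chains': 'abcd'}
```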

Macrocyclic binder design#

We can add the two flags for macrocyclic peptide design to our binder design:

[44]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/7zkr_GABARAP.pdb").text
kwargs = {
    "inference.cyclic": True,
    "inference.cyc_chains": "a",
}
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs="12-18 A3-117/0",
    hotspot="A51,A52,A50,A48,A62,A65",
    n=1,
    **kwargs,
)
design
[44]:
RFdiffusionJob(job_id='44fc4c6b-1a4d-4c92-a252-ce5f501a0963', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 6, 11, 323296, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[45]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:46<00:00,  2.16it/s, status=SUCCESS]
ATOM      1  N   GLY A   1       2.149   8.945   7.688  1.00  0.00
ATOM      2  CA  GLY A   1       1.594   7.613   7.895  1.00  0.00
ATOM      3  C   GLY A   1       1.251   7.380   9.361  1.00  0.00
ATOM      4  O   GLY A   1       1.955   7.850  10.255  1.00  0.00
ATOM      5  N   GLY A   2       0.155   6.778   9.609  1.00  0.00
ATOM      6  CA  GLY A   2      -0.244   6.391  10.957  1.00  0.00
ATOM      7  C   GLY A   2      -0.226   4.877  11.124  1.00  0.00
ATOM      8  O   GLY A   2      -0.674   4.141  10.245  1.00  0.00
ATOM      9  N   GLY A   3       0.565   4.412  11.986  1.00  0.00
ATOM     10  CA  GLY A   3       0.658   2.990  12.294  1.00  0.00

Macrocyclic monomer design#

The same two flags apply to monomer design:

[46]:
kwargs = {
    "inference.cyclic": True,
    "inference.cyc_chains": "a",
}
design = rfdiffusion.generate(
    structure_file=pdb,
    contigs="12-18",
    n=1,
    **kwargs,
)
design
[46]:
RFdiffusionJob(job_id='7e777fdd-b684-4024-902d-429135993c6f', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 6, 59, 300539, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[47]:
result = design.wait(verbose=True, timeout=600)
# show only the first 10 lines
print("\n".join(result.splitlines()[:10]))
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:36<00:00,  2.76it/s, status=SUCCESS]
ATOM      1  N   GLY A   1      -0.391   3.319   5.675  1.00  0.00
ATOM      2  CA  GLY A   1      -1.312   4.425   5.905  1.00  0.00
ATOM      3  C   GLY A   1      -0.560   5.718   6.193  1.00  0.00
ATOM      4  O   GLY A   1       0.356   5.743   7.015  1.00  0.00
ATOM      5  N   GLY A   2       0.333   6.144   6.421  1.00  0.00
ATOM      6  CA  GLY A   2       0.840   7.455   6.809  1.00  0.00
ATOM      7  C   GLY A   2       1.456   8.181   5.620  1.00  0.00
ATOM      8  O   GLY A   2       1.796   9.361   5.709  1.00  0.00
ATOM      9  N   GLY A   3       1.361   7.974   4.648  1.00  0.00
ATOM     10  CA  GLY A   3       2.026   8.526   3.474  1.00  0.00