Using RFdiffusion#
This tutorial shows you how to use the RFdiffusion model to design novel protein structures.
The examples here are adapted from the original RFdiffusion documentation to show how they can be run on the OpenProtein platform and then combined with our other workflows!
Full credit for the examples and use cases goes to the authors of RFdiffusion!
Unconditional monomer design#
The most basic use of RFdiffusion is the unconditional design of a protein structure of a given length. You need two things:
Length of the protein
Number of designs N desired
[1]:
import openprotein
session = openprotein.connect()
session
[1]:
<openprotein.OpenProtein at 0x7f1d7958e300>
[2]:
length = 150
N = 3
Now let’s get the model handle:
[3]:
rfdiffusion = session.models.rfdiffusion
rfdiffusion.generate?
Signature:
rfdiffusion.generate(
query: str | bytes | vault.protein.Protein | vault.model.Model | openprotein.prompt.models.Query | None = None,
contigs: int | str | None = None,
structure_file: str | bytes | typing.BinaryIO | None = None,
N: int = 1,
inpaint_seq: str | None = None,
provide_seq: str | None = None,
hotspot: str | None = None,
T: int | None = None,
partial_T: int | None = None,
use_active_site_model: bool | None = None,
use_beta_model: bool | None = None,
symmetry: Optional[Literal['cyclic', 'dihedral', 'tetrahedral']] = None,
order: int | None = None,
add_potential: bool | None = None,
scaffold_target_structure_file: str | bytes | typing.BinaryIO | None = None,
scaffold_target_use_struct: bool = False,
**kwargs,
) -> openprotein.models.foundation.rfdiffusion.RFdiffusionFuture
Docstring:
Run a protein structure generate job using RFdiffusion.
Parameters
----------
query : str or bytes or Protein or Model or Query, optional
A query representing the design specification. Use either `query` or `contigs`
for default design. Or provide `scaffold_target_structure_file`
for scaffold guided design.
`query` provides a unified way to represent design specifications on the
OpenProtein platform. In this case, the structure mask of the containing Model
proteins are specified to be designed. Other parameters like binding are passed
as hotspots to RFdiffusion.
contigs : int, str, optional
Defines the lengths and connectivity of chain segments for the desired
structure, specified in RFdiffusion's contig string format.
Required for most design tasks. Example: 150, '10-20/A100-110/10-20' for a
binder design.
structure_file : BinaryIO, optional
An input PDB file (as a file-like object) used for inpainting or other
guided design tasks where parts of an existing structure are provided.
n : int, optional
The number of unique design trajectories to run (default is 1).
inpaint_seq : str, optional
A string specifying the regions in the input structure to mask for
in-painting. Example: 'A1-A10/A30-40'.
provide_seq : str, optional
A string specifying which segments of the contig have a provided
sequence. Example: 'A1-A10/A30-40'.
hotspot : str, optional
A string specifying hotspot residues to constrain during design,
typically for functional sites. Example: 'A10,A12,A14'.
T : int, optional
The number of timesteps for the diffusion process.
partial_T : int, optional
The number of timesteps for partial diffusion.
use_active_site_model : bool, optional
If True, uses the active site model checkpoint, which has been finetuned to
better keep very small motifs in place in the output for motif scaffolding
(default is False).
use_beta_model : bool, optional
If True, uses the complex beta model checkpoint, which generates a
greater diversity of topologies but has not been extensively
experimentally validated (default is False).
symmetry : {"cyclic", "dihedral", "tetrahedral"}, optional
The type of symmetry to apply to the design.
order : int, optional
The order of the symmetry (e.g., 3 for C3 or D3 symmetry).
Must be provided if `symmetry` is set.
add_potential : bool, optional
A flag to toggle an additional potential to guide the design.
This defaults to true in the case of symmetric design.
scaffold_target_structure_file : str, bytes, BinaryIO, optional
A PDB file (which can be the text string or bytes or the file-like
object) containing a scaffold structure to be used as a structural
guide. It could also be used as a target when doing scaffold guided
binder design with `scaffold_target_use_struct`.
scaffold_target_use_struct : bool, optional
Whether or not to use the provided scaffold structure as a target.
Otherwise, it is used only as a topology guide.
Other Parameters
----------------
**kwargs : dict
Additional keyword args that are passed directly to the rfdiffusion
inference script. Overwrites any preceding options.
Returns
-------
RFdiffusionFuture
A future object that can be used to retrieve the results of the design
job upon completion.
File: ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/rfdiffusion.py
Type: method
We can run designs in a unified manner using the Query object, which itself represents a Protein, which has been masked in some way. For RFdiffusion, which generates protein structures, we want to mask the structure of our Protein to tell the model to generate the structure for us.
In this case, for unconditional monomer design, we can just create a protein chain of the desired length with all residues unknown. Protein.from_expr provides syntactic sugar for quickly constructing such chains.
[4]:
from openprotein import Protein
unconditional_monomer = Protein.from_expr(str(length))
print("sequence:", unconditional_monomer.sequence)
print("structure mask:", unconditional_monomer.get_structure_mask())
sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
structure mask: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
145 146 147 148 149 150]
The above tells us that our query protein has both the sequence and structure fully masked, with our expected length. Since RFdiffusion only works on structures, and the query has a fully masked structure, the output will be unconditionally generated monomers.
Run the design using RFdiffusion:
[5]:
design = rfdiffusion.generate(N=N, query=unconditional_monomer)
design
[5]:
RFdiffusionJob(job_id='12951030-5ab3-49f1-af1e-0a8c877908ae', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 12, 21, 19, 10, 7, 976286, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for the job to finish running with wait_until_done.
[6]:
design.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████| 100/100 [06:16<00:00, 3.77s/it, status=SUCCESS]
[6]:
True
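For readers unfamiliar with blocking on a remote job, the following is a minimal, generic sketch of polling with a timeout. It is not the OpenProtein client's implementation: `poll_until_done` and `fake_job_is_done` are hypothetical names introduced only for illustration, and the real `wait_until_done` may behave differently (for example, in how it reports a timeout).

```python
import time
from typing import Callable


def poll_until_done(is_done: Callable[[], bool], timeout: float, interval: float = 0.01) -> bool:
    """Call is_done() repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for a job that reports success on the third status check
calls = {"n": 0}


def fake_job_is_done() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


finished = poll_until_done(fake_job_is_done, timeout=1.0)
print(finished, calls["n"])
```

Choosing a timeout comfortably above the expected run time (the unconditional example above took a little over six minutes) avoids giving up on a job that is still running.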
Retrieve the designs as a list of N Model objects. Model objects represent multimers, and can hold multiple protein (and other) chains. For now, our design will only return a single chain. Let’s look at the first one.
[7]:
unconditional_monomer_designs = design.get()
unconditional_monomer_design = unconditional_monomer_designs[0]
print("sequence:", unconditional_monomer_design.proteins["A"].sequence)
print("structure mask:", unconditional_monomer_design.proteins["A"].get_structure_mask())
sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
structure mask: []
From the above, we can see that the structure mask is now empty, which means RFdiffusion has produced a design for the structure. The sequence is still all X, as the output shows: RFdiffusion generates only the backbone, so the designed residues have no meaningful amino acid identity yet.
[8]:
%pip install molviewspec
[9]:
from molviewspec import create_builder

def visualize_pdb(pdb_string: str):
    # Build a Mol* view of the structure and render it inline in the notebook
    builder = create_builder()
    structure = builder.download(url="mystructure.pdb")\
        .parse(format="pdb")\
        .model_structure()\
        .component()\
        .representation()\
        .color(color="blue")
    builder.molstar_notebook(data={'mystructure.pdb': pdb_string}, width=500, height=400)

visualize_pdb(unconditional_monomer_design.make_pdb_string())
Motif Scaffolding#
RFdiffusion can be used to scaffold motifs. To do this, we need a few things:
some particular protein input, from a .pdb
how to connect these proteins and by how many residues in the new protein
some sample of lengths for the new protein, similar to the above
First, let’s get our pdb from the RCSB protein data bank. We will be using 5TPN, which represents the crystal structure of RSV F in complex with human antibody hRSV90.
[10]:
import requests
import gzip
import io
from openprotein import Model

def get_pdb(code: str) -> str:
    # Download the gzipped biological assembly from RCSB and return it as text
    with requests.get(f"https://files.rcsb.org/download/{code}.pdb1.gz", stream=True) as r:
        r.raise_for_status()
        buf = io.BytesIO(r.content)
    with gzip.open(buf, 'rb') as f:
        pdb = f.read().decode()
    return pdb

pdb = get_pdb("5TPN")
# First load the pdb as a Model
model_5TPN = Model.from_string(pdb, format="pdb")
print("chains in model:", list(model_5TPN.proteins.keys()))
print("chain A sequence:", model_5TPN.proteins["A"].sequence)
chains in model: ['A', 'H', 'L']
chain A sequence: b'NITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKKIKCNGTDAKIKLIKQELDKYKNAVTELQLLMQSTPATNNQARGSGSGRSLGFLLGVGSAIASGVAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSIPNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSNNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNVDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDQFDASISQVNEKINQSLAFIRKSDELLSAIGGYIPEAPRDGQAYVRKDGEWVLLSTFLGGLVPRGSHHHHHH'
Now we need to specify the motif that we are interested in, together with the length of the output protein, which RFdiffusion expresses through its contigs syntax. Here we take the 18 residues at positions 163-180 (1-indexed, inclusive) of chain A as loaded from the input pdb, which corresponds to the 0-indexed slice [162:180], and we generate 25 residues at each of the N- and C-termini. We can generate our query like this:
[11]:
# Slice out the motif we are interested in
query_motif = model_5TPN.proteins["A"][162:180] # indexing with square brackets is 0-indexed!
query_motif.chain_id = None # avoid clashing chain ids
# Add 25 residues to be designed to each terminus
motif_scaffold_query = "25" + query_motif + "25"
print("motif scaffold query sequence:", motif_scaffold_query.sequence)
print("motif scaffold query structure mask:", motif_scaffold_query.get_structure_mask())
motif scaffold query sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXSCSIPNIETVIEFQQKNNXXXXXXXXXXXXXXXXXXXXXXXXX'
motif scaffold query structure mask: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
67 68]
From the above we can see that the 25 residues on each terminus will be designed by RFdiffusion, while the motif we provided is kept fixed in the design.
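To make the printed mask concrete, here is a minimal sketch in plain Python (not the OpenProtein API) of how the 1-indexed masked positions follow from concatenating designed segments with a fixed motif. The helper `masked_positions` and its list-of-segments representation are hypothetical and exist only for this illustration.

```python
from typing import List, Union


def masked_positions(segments: List[Union[int, str]]) -> List[int]:
    """Return the 1-indexed positions left for the model to design.

    An int segment means "that many residues to design"; a str segment is a
    fixed motif sequence whose structure is kept.
    """
    positions: List[int] = []
    cursor = 0  # number of residues placed so far
    for segment in segments:
        if isinstance(segment, int):
            positions.extend(range(cursor + 1, cursor + segment + 1))
            cursor += segment
        else:
            cursor += len(segment)
    return positions


motif = "SCSIPNIETVIEFQQKNN"  # the 18-residue motif printed above
mask = masked_positions([25, motif, 25])
print(mask[:3], mask[24:27], mask[-1])  # [1, 2, 3] [25, 44, 45] 68
```

The gap between 25 and 44 is exactly the motif: positions 26 to 43 keep their structure, so they do not appear in the mask.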
Now let’s run the design.
[13]:
motif_scaffold_job = rfdiffusion.generate(
query=motif_scaffold_query,
N=1,
)
motif_scaffold_job
[13]:
RFdiffusionJob(job_id='d5532b07-3412-4eb3-8f2e-b8d5d7ed50b3', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 12, 21, 19, 17, 52, 90516, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[14]:
motif_scaffold_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|███████████████████████████████████████████████████| 100/100 [00:36<00:00, 2.74it/s, status=SUCCESS]
[14]:
True
[15]:
motif_scaffold_design = motif_scaffold_job.get()[0]
print("sequence:", motif_scaffold_design.proteins["A"].sequence)
print("structure mask:", motif_scaffold_design.proteins["A"].get_structure_mask())
visualize_pdb(motif_scaffold_design.make_pdb_string())
sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXSCSIPNIETVIEFQQKNNXXXXXXXXXXXXXXXXXXXXXXXXX'
structure mask: []
Small motifs using active site model#
With very small motifs, RFdiffusion tends not to keep them perfectly fixed in the output. For very small input functional motifs, the RFdiffusion authors recommend using the active site model, which is finetuned for such tasks. This is specified using the use_active_site_model parameter:
[16]:
# Slice out a very small motif as an example
query_small_motif = model_5TPN.proteins["A"][169:172]
query_small_motif.chain_id = None # avoid clashing chain ids
# Add 25 residues to be designed to each terminus
small_motif_scaffold_query = "25" + query_small_motif + "25"
print("motif scaffold query sequence:", small_motif_scaffold_query.sequence)
print("motif scaffold query structure mask:", small_motif_scaffold_query.get_structure_mask())
[ ]:
small_motif_job = rfdiffusion.generate(
query=small_motif_scaffold_query,
N=1,
use_active_site_model=True
)
small_motif_job
[ ]:
small_motif_design = small_motif_job.wait(verbose=True, timeout=600)[0]
print("sequence:", small_motif_design.proteins["A"].sequence)
print("structure mask:", small_motif_design.proteins["A"].get_structure_mask())
visualize_pdb(small_motif_design.make_pdb_string())
Binder design#
We can use RFdiffusion to do binder design by expressing our query as a multi-chain complex with the Model object. By having a target chain with a structure-masked unknown binder, we get RFdiffusion to design the binder!
Let’s use the insulin example from the official examples.
[ ]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/insulin_target.pdb").text
insulin = Model.from_string(pdb, "pdb")
print("chains in insulin:", list(insulin.proteins.keys()))
print("insulin sequence:", insulin.proteins["A"].sequence)
[ ]:
len(insulin.proteins["A"])
Let’s design a binder of length 80.
RFdiffusion has the concept of hotspots, which are used to indicate where the binding sites on the target should be. We will set these at sites 59, 83 and 91 as in the example.
Refer to the official RFdiffusion documentation’s section on Practical Considerations for Binder Design for some tips on the topic.
[ ]:
# Set our binding sites, or hotspots
from openprotein.protein import Binding
insulin_protein = insulin.proteins["A"]
target = insulin_protein.set_binding_at([59,83,91], Binding.BINDING)
# Combine our target with an unknown binder
binder_query = insulin & "80"
print("target sequence:", binder_query.proteins["A"].sequence)
print("binder sequence:", binder_query.proteins["B"].sequence)
print("target structure mask:", binder_query.proteins["A"].get_structure_mask())
print("binder structure mask:", binder_query.proteins["B"].get_structure_mask())
We also add some configuration beyond the basic set of design parameters shown by rfdiffusion.generate?. Our RFdiffusion model interface also accepts the full set of configuration options provided by RFdiffusion as kwargs. Note, however, that these are advanced settings and you should be familiar with RFdiffusion before using them. Here, we reduce the noise added during inference to 0 to improve the quality of the designs, as recommended in their examples. Since RFdiffusion uses the Hydra flattened dot syntax, we use a dictionary to hold the additional args:
[ ]:
binder_design_extra_args = {
"denoiser.noise_scale_ca": 0,
"denoiser.noise_scale_frame": 0
}
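As a rough illustration of what the flattened dot syntax means, the sketch below expands dotted keys into a nested dictionary and renders them as `key=value` strings, which is the general shape of Hydra command-line overrides. Both helpers (`nest` and `as_overrides`) are hypothetical; they are not part of the OpenProtein client or RFdiffusion, and how the platform forwards these values internally is not shown here.

```python
from typing import Any, Dict, List


def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand {"a.b": 1} into {"a": {"b": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def as_overrides(flat: Dict[str, Any]) -> List[str]:
    """Render each entry in key=value form."""
    return [f"{key}={value}" for key, value in flat.items()]


extra_args = {"denoiser.noise_scale_ca": 0, "denoiser.noise_scale_frame": 0}
print(nest(extra_args))
print(as_overrides(extra_args))
```

Because the keys contain dots, they are not valid Python keyword names, which is why they are collected in a dictionary and unpacked with `**` rather than written as ordinary keyword arguments.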
Now let’s run the binder design:
[ ]:
binder_design_job = rfdiffusion.generate(
query=binder_query,
N=1,
**binder_design_extra_args
)
binder_design_job
Let’s wait for and inspect the results.
[ ]:
binder_target_model = binder_design_job.wait(verbose=True, timeout=600)[0]
print("target sequence:", binder_target_model.proteins["A"].sequence)
print("binder sequence:", binder_target_model.proteins["B"].sequence)
print("target structure mask:", binder_target_model.proteins["A"].get_structure_mask())
print("binder structure mask:", binder_target_model.proteins["B"].get_structure_mask())
visualize_pdb(binder_target_model.make_pdb_string())
Higher diversity topologies using complex beta model#
RFdiffusion also provides a beta model checkpoint. This is because the default model often generates helical binders which have higher computational and experimental success rates. The beta model generates a greater diversity of topologies, but has not been extensively experimentally validated. This is exposed via the use_beta_model parameter. Use at your own risk!
Next Steps#
You can run the designed structures through inverse folding with PoET-2 or ProteinMPNN to predict sequences that could fold into them. Alternatively, look at BoltzGen, another structure generation model available on our platform that uses the same Query-based approach.
Lastly, Binder Design with RFdiffusion provides an in-depth look at the full workflow for designing binders, inverse folding, then re-folding with a structure prediction model like ESMFold.
Appendix: Low-level RFdiffusion#
If our query-based approach does not cover all your design needs, our Python client also supports all parameters used by RFdiffusion. There are additional parameters that can be used, along with full support for any keyword arguments used in RFdiffusion, just like the denoiser.noise_scale_ca and denoiser.noise_scale_frame used in the binder design example. The most important thing to note is the use of structure_file for specifying input pdb strings.
The subsequent sections show some examples of the syntax and other use cases covered in the RFdiffusion docs.
inpaint_seq lets you hide the amino acid identities of specific residues in your input structure. RFdiffusion will then fill in these residues during design, choosing sequences that fit the new structural context.
For example, if you’re fusing two proteins, residues that were originally on the surface (often polar) might end up buried in the core. Instead of manually mutating them to hydrophobic residues, you can mask them with inpaint_seq so RFdiffusion can decide on the best replacements automatically.
[13]:
inpaint_seq = "A163-168/A170-171/A179"
This means we are masking the residue identities of A163 to A168 (inclusive), and residues A170, A171 and A179.
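The expansion described above can be sketched in a few lines of plain Python. This is only an illustration of how the string reads, not RFdiffusion's own parser; `expand_residue_spec` is a hypothetical helper, and it handles only the single-letter-chain forms that appear in this tutorial.

```python
from typing import List, Tuple


def expand_residue_spec(spec: str) -> List[Tuple[str, int]]:
    """Expand e.g. 'A163-168/A170-171/A179' into (chain, residue number) pairs."""
    residues: List[Tuple[str, int]] = []
    for part in spec.split("/"):
        chain, numbers = part[0], part[1:]
        start_text, _, end_text = numbers.partition("-")
        start = int(start_text)
        # The range end may repeat the chain letter ("A1-A10") or omit it ("A30-40")
        end = int(end_text.lstrip(chain)) if end_text else start
        residues.extend((chain, number) for number in range(start, end + 1))
    return residues


masked = expand_residue_spec("A163-168/A170-171/A179")
print(len(masked), masked[0], masked[-1])  # 9 ('A', 163) ('A', 179)
```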
[14]:
# Reload 5TPN, since `pdb` was reassigned in the binder design example
pdb = get_pdb("5TPN")
# reset the contigs from the very small motif for inpaint example
contigs = "10-40/A163-181/10-40"
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
inpaint_seq=inpaint_seq,
N=1,
)
design
[14]:
RFdiffusionJob(job_id='be51eb2f-e86d-4e68-939f-3351ecdc6444', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 21, 26, 735906, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Partial diffusion#
We can use partial diffusion to get some diversity around a general fold. This is done by setting the partial_T parameter to the timestep up to which the input structure is noised. More noise means more diversity. You should sample different values for your specific design problem, but the typical value used by the authors was 20.
With partial diffusion, there is a constraint on contigs since the diffusion is done from a known structure - the contig string has to yield the exact same length as the input protein.
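The length constraint can be checked before submitting a job. The sketch below computes the minimum and maximum total length a contig specification can yield, covering only the forms used in this tutorial (fixed or sampled lengths, chain-prefixed residue ranges, and `0` as a chain break). `contig_length_range` and `fits_partial_diffusion` are hypothetical helpers, not RFdiffusion's parser or part of the OpenProtein client.

```python
import re
from typing import Tuple, Union


def contig_length_range(contigs: Union[int, str]) -> Tuple[int, int]:
    """Return (min, max) total residues a contig specification can produce."""
    low = high = 0
    for token in re.split(r"[/\s]+", str(contigs).strip()):
        if token == "0":
            continue  # chain break, contributes no residues
        if token[0].isalpha():
            # Residues copied from the input structure, e.g. "A163-181"
            start, _, end = token[1:].partition("-")
            count = int(end or start) - int(start) + 1
            low += count
            high += count
        else:
            # Designed segment: a fixed length ("50") or a sampled range ("10-40")
            start, _, end = token.partition("-")
            low += int(start)
            high += int(end or start)
    return low, high


def fits_partial_diffusion(contigs: Union[int, str], input_length: int) -> bool:
    """True only if the contigs always yield exactly the input length."""
    return contig_length_range(contigs) == (input_length, input_length)


print(contig_length_range("10-40/A163-181/10-40"))  # (39, 99)
print(fits_partial_diffusion("172-172/0 34-34", 206))  # True
```

A specification with a sampled range, such as `10-40/A163-181/10-40`, can never satisfy the constraint, because its length is not fixed.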
Let’s use 2KL8, and run partial diffusion for 10 timesteps:
[17]:
pdb = get_pdb("2KL8")
length = 85
design = rfdiffusion.generate(
structure_file=pdb,
contigs=length,
partial_T=10,
N=1,
)
design
[17]:
RFdiffusionJob(job_id='2bb73c76-f0df-4d38-bbd6-fbbbe612d666', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 9, 356090, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Keeping some sequences#
You can also keep parts of the sequence of the partially diffused chain fixed, using provide_seq. An example of why you might want to do this is in the context of helical peptide binding. If you’ve threaded a helical peptide sequence onto an ideal helix, and now want to diversify the complex while allowing the helix to deviate from an ideal helix, you might do something like:
[19]:
provide_seq = "172-205"
# Provide multiple sequences using a comma delimiter
# provide_seq = "172-177,200-205"
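This means we want to keep the sequence of residues 172 to 205 fixed. With the contigs used below (a 172-residue chain followed by a 34-residue peptide), these positions cover the peptide when counted from 0.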
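As a small check on how such a range relates to the chains, the sketch below expands a provide_seq string into positions and compares them with the second chain of a two-chain complex. It assumes, following the upstream RFdiffusion documentation, that provide_seq positions are 0-indexed over the whole output; `expand_ranges` is a hypothetical helper and the chain lengths are taken from the contigs used in this example.

```python
from typing import List


def expand_ranges(spec: str) -> List[int]:
    """Expand '172-177,200-205' into a flat list of positions (inclusive ranges)."""
    positions: List[int] = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        positions.extend(range(int(start), int(end or start) + 1))
    return positions


target_length, peptide_length = 172, 34  # from the contigs "172-172/0 34-34"
kept = expand_ranges("172-205")

# Assumption: positions count from 0, so the peptide occupies 172..205
peptide_positions = list(range(target_length, target_length + peptide_length))
print(len(kept), kept == peptide_positions)  # 34 True
```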
Let’s use the helical peptide example from the RFdiffusion repo, and run this design.
Note that we use the /0 syntax in contigs, which denotes a chain break.
[21]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/peptide_complex_ideal_helix.pdb").text
contigs = "172-172/0 34-34"
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
provide_seq=provide_seq,
partial_T=10,
N=1,
)
design
[21]:
RFdiffusionJob(job_id='a52df40b-8438-402d-9ab9-f4d97d7064eb', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 22, 28, 780009, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Fold conditioning#
We can also condition designs on particular topologies by providing (partial) secondary structure and block adjacency information. An example is designing a TIM barrel without requiring exact coordinates for the residues. We provide this additional information by specifying the structure to condition on using scaffold_target_structure_file. Let’s get the TIM barrel and instruct the system to derive the secondary structure and block adjacency information from the provided scaffold_target_structure_file before running scaffold-guided inference, which does the fold conditioning.
[31]:
pdb = get_pdb("6WVS")
# Additional kwargs provided from example
# Reduce noise to 0.5 for better results
# Sample additional length to increase diversity of the outputs
# Specifically, we mask the loops and insert 0-5 residues (randomly sampled per-loop) into each loop
# Add 0-5 residues (randomly sampled) to the N and the C-terminus
kwargs = {
"denoiser.noise_scale_ca": 0.5,
"denoiser.noise_scale_frame": 0.5,
"scaffoldguided.mask_loops": True,
"scaffoldguided.sampled_insertion": "0-5",
"scaffoldguided.sampled_N": "0-5",
"scaffoldguided.sampled_C": "0-5",
}
design = rfdiffusion.generate(
scaffold_target_structure_file=pdb,
N=1,
**kwargs
)
design
[31]:
RFdiffusionJob(job_id='6845b62e-67d6-46b3-939c-9a9c665472e0', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 25, 41, 498536, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Binder design to flexible peptides#
RFdiffusion can be used to design binders to flexible peptides, where the 3D coordinates of the peptide are not specified, but the secondary structure can be. This allows a user to design binders to a peptide in e.g. either a helical or beta state. The principle here is that we provide an input pdb structure of a peptide, but specify that we want to mask the 3D structure (inpaint_str). Here, we’re making 70-100 amino acid binders to the tau peptide (pdb indices B165-178), and we mask the structure with contigmap.inpaint_str on this peptide. However, we can then specify that we want it to adopt a helix secondary structure:
[33]:
# Get tau peptide from RFdiffusion repo
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/tau_peptide.pdb").text
contigs = "70-100/0 B165-178"
kwargs = {
"scaffoldguided.scaffoldguided": True,
"contigmap.inpaint_str": "[B165-178]",
"contigmap.inpaint_str_helix": "[B165-178]",
}
design = rfdiffusion.generate(
structure_file=pdb,
contigs=contigs,
N=1,
**kwargs,
)
design
[33]:
RFdiffusionJob(job_id='e3f0a2fa-f5fc-4160-93ad-5ac2dff71de6', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 29, 0, 797164, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
You could alternatively specify to adopt a beta (strand) secondary structure with contigmap.inpaint_str_strand.
Generating symmetric oligomers#
We can use RFdiffusion to generate structures of different symmetries. Use symmetry to specify one of cyclic, dihedral or tetrahedral, and order to give the symmetry order in the case of cyclic or dihedral (defaults to 1). RFdiffusion also provides auxiliary potentials to help guide inference, which seem to help with motif scaffolding and symmetric oligomer generation. For symmetric oligomer generation we enable potentials by default, using the default parameters specified in the RFdiffusion documentation and examples, which look like:
[35]:
# these are the default potentials options already added whenever you do any symmetric oligomer generation.
kwargs = {
"potentials.guiding_potentials": "[\"type:olig_contacts,weight_intra:1,weight_inter:0.1\"]",
"potentials.olig_intra_all": True,
"potentials.olig_inter_all": True,
"potentials.guide_scale": 2.0,
"potentials.guide_decay": "quadratic"
}
Use add_potential = False explicitly to turn it off, and specify your own potentials if desired.
Cyclic#
[36]:
design = rfdiffusion.generate(
symmetry="cyclic",
order=6,
contigs=480,
)
design
[36]:
RFdiffusionJob(job_id='b83a1fc7-dff6-4e98-a72f-b3d9fbb5fa7d', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 29, 41, 715834, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Dihedral#
[38]:
design = rfdiffusion.generate(
symmetry="dihedral",
order=2,
contigs=320,
)
design
[38]:
RFdiffusionJob(job_id='7b08c403-9be3-478e-a55d-4051ea63defb', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 33, 57, 215048, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Tetrahedral#
[40]:
# order is ignored for tetrahedral
design = rfdiffusion.generate(
symmetry="tetrahedral",
contigs=1200,
)
design
[40]:
RFdiffusionJob(job_id='2ef4b3ce-582a-49dc-b217-1fd0d9a3445e', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 9, 36, 6, 673269, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
This one takes a little longer due to the longer length.
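To relate the contigs values above to per-chain lengths, the sketch below counts the identical subunits implied by each symmetry. It assumes, as in the upstream RFdiffusion symmetric examples, that contigs gives the total length of the whole oligomer; `subunit_count` is a hypothetical helper and not part of the OpenProtein client.

```python
from typing import Optional


def subunit_count(symmetry: str, order: Optional[int] = None) -> int:
    """Number of identical chains implied by a point-group symmetry."""
    if symmetry == "tetrahedral":
        return 12  # the tetrahedral rotation group has 12 elements
    if order is None:
        raise ValueError("order is required for cyclic and dihedral symmetry")
    if symmetry == "cyclic":
        return order  # C_n has n copies
    if symmetry == "dihedral":
        return 2 * order  # D_n has 2n copies
    raise ValueError(f"unknown symmetry: {symmetry}")


examples = [("cyclic", 6, 480), ("dihedral", 2, 320), ("tetrahedral", None, 1200)]
for symmetry, order, total_length in examples:
    chains = subunit_count(symmetry, order)
    print(symmetry, chains, total_length // chains)
# cyclic 6 80
# dihedral 4 80
# tetrahedral 12 100
```

Under that assumption, a total length should be chosen so that it divides evenly by the number of subunits.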
Symmetric motif scaffolding#
We can combine motif scaffolding with symmetric generation to scaffold motifs symmetrically. Here we are doing a C4 symmetric nickel design shown in the RFdiffusion paper.
[42]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/nickel_symmetric_motif.pdb").text
design = rfdiffusion.generate(
symmetry="cyclic",
order=4,
structure_file=pdb,
contigs="50/A2-4/50/0 50/A7-9/50/0 50/A12-14/50/0 50/A17-19/50/0",
N=1,
)
design
[42]:
RFdiffusionJob(job_id='4fac45e7-3232-4409-b596-9cb0c1f91ba3', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 2, 59, 498290, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Macrocyclic peptide design with RFpeptides#
The newly published RFpeptides protocol, for designing macrocyclic peptides that bind target proteins with atomic accuracy, can be accessed using the flags inference.cyclic=True and inference.cyc_chains. The former instructs the system to design at least one macrocycle, and the latter is a string containing the letter of every chain you would like to design as a cyclic peptide. For example, inference.cyc_chains='a' means only chain A is cyclized, while inference.cyc_chains='abcd' cyclizes chains A to D.
Macrocyclic binder design#
We can add the two flags for macrocyclic peptide design to our binder design:
[44]:
pdb = requests.get("https://raw.githubusercontent.com/RosettaCommons/RFdiffusion/fa340147b9006156b251d1ad0391e3ea8e5f73eb/examples/input_pdbs/7zkr_GABARAP.pdb").text
kwargs = {
"inference.cyclic": True,
"inference.cyc_chains": "a",
}
design = rfdiffusion.generate(
structure_file=pdb,
contigs="12-18 A3-117/0",
hotspot="A51,A52,A50,A48,A62,A65",
N=1,
**kwargs,
)
design
[44]:
RFdiffusionJob(job_id='44fc4c6b-1a4d-4c92-a252-ce5f501a0963', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 6, 11, 323296, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Macrocyclic monomer design#
Same for monomer design:
[46]:
kwargs = {
"inference.cyclic": True,
"inference.cyc_chains": "a",
}
design = rfdiffusion.generate(
structure_file=pdb,
contigs="12-18",
N=1,
**kwargs,
)
design
[46]:
RFdiffusionJob(job_id='7e777fdd-b684-4024-902d-429135993c6f', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 6, 10, 6, 59, 300539, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)