Binder Design With RFdiffusion#
Designing a high-affinity binder starts with the right tools. This tutorial introduces how to use RFdiffusion on the OpenProtein AI platform, using our Python client, to generate and evaluate binder candidates against a specific protein target.
You’ll learn how to set up your environment, define a target structure and binding site constraints, configure RFdiffusion runs, submit and monitor jobs, and retrieve results programmatically.
We’ll also cover how to pass the designs through inverse folding with ProteinMPNN to propose suitable protein sequences, then run structure prediction with ESMFold to evaluate the designed binders. Whether you’re new to RFdiffusion or looking to streamline your workflow, this guide will help you go from target definition to prioritized binder designs quickly and reproducibly.
This tutorial follows the approach described in Watson et al. (2023), “De novo design of protein structure and function with RFdiffusion”, using the publicly available 3DI3 structure (IL-7Rα) as our target. We also follow some of the methodology in Bennett et al. (2023), “Improving de novo protein binder design with deep learning”.
Prerequisites#
For this tutorial, you will need an OpenProtein Python session to access the models available on our platform and to manipulate job results, so make sure your credentials are set up!
[1]:
import openprotein
session = openprotein.connect()
session
[1]:
<openprotein.OpenProtein at 0x7fb07bc6e900>
Target Selection#
For this tutorial, we’ll use the 3DI3 structure from the RCSB PDB, which contains the extracellular domain of human interleukin-7 receptor alpha (IL-7Rα). This receptor was one of the benchmark targets used in Watson et al. (2023) to evaluate binder design performance.
Download the structure from RCSB and load it as a Protein object:
[2]:
from pathlib import Path
from openprotein import Protein, Model
import numpy as np
import requests
DATA_DIR = Path("data/")
DATA_DIR.mkdir(exist_ok=True)
# Download 3DI3 from RCSB
pdb_id = "3DI3"
structure_filepath = DATA_DIR / f"{pdb_id}.pdb"
if not structure_filepath.exists():
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly instead of saving an error page
    structure_filepath.write_text(response.text)
# Load the receptor chain (IL-7Ra is chain B)
target_protein = Protein.from_filepath(path=structure_filepath, chain_id="B")
print("target sequence:", target_protein.sequence)
print("target coordinates shape:", target_protein.coordinates.shape)
print("target plddt shape:", target_protein.plddt.shape)
print("target name:", target_protein.name)
target sequence: b'GSHMESGYAQNGDLEDAELDDYSFSCYSQLEVNGSQHSLTCAFEDPDVNTTNLEFEICGALVEVKCLNFRKLQEIYFIETKKFLLIGKSNICVKVGEKSLTCKKIDLTTIVKPEAPFDLSVVYREGANDFVVTFNTSHLQKKYVKVLMHDVAYRQEKDENKWTHVNLSSTKLTLLQRKLQPAAMYEIKVRSIPDHYFKGFWSEWSPSYYFRTPEINNSSGEMD'
target coordinates shape: (223, 37, 3)
target plddt shape: (223,)
target name: 3DI3
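Before committing to chain B, it can help to confirm which chains the downloaded file actually contains. The following is a minimal sketch that reads the fixed-column ATOM records of a PDB file using only the standard library; the helper name `chains_in_pdb` is hypothetical and is not part of the OpenProtein client.

```python
def chains_in_pdb(pdb_text: str) -> dict[str, int]:
    """Count distinct residues per chain from the ATOM records of a PDB file."""
    residues: dict[str, set[str]] = {}
    for line in pdb_text.splitlines():
        # Only ATOM records are considered; skip lines too short to hold a residue number
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        chain_id = line[21]    # column 22: chain identifier
        res_key = line[22:27]  # columns 23-27: residue number plus insertion code
        residues.setdefault(chain_id, set()).add(res_key)
    return {chain: len(keys) for chain, keys in residues.items()}


# Usage with the file downloaded above (not executed here):
# print(chains_in_pdb(structure_filepath.read_text()))
```

The counts only cover residues with resolved atoms, so they can be smaller than the full sequence length reported above.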
Visualize#
We can visually inspect the target structure using molviewspec:
[3]:
%pip install molviewspec
[4]:
from molviewspec import create_builder
def visualize_pdb(pdb_string: str):
    builder = create_builder()
    structure = (
        builder.download(url="mystructure.pdb")
        .parse(format="pdb")
        .model_structure()
        .component()
        .representation()
        .color_from_source(
            schema="atom",
            category_name="atom_site",
            field_name="auth_asym_id",
            # color by chain
            palette={
                "kind": "categorical",
                "colors": ["blue", "red", "green", "orange"],
                "mode": "ordinal",
            },
        )
    )
    builder.molstar_notebook(data={"mystructure.pdb": pdb_string}, width=500, height=400)

visualize_pdb(target_protein.make_pdb_string())
Binding region selection#
According to the supplementary information of Watson et al. (2023), the following hotspots (binding sites) were chosen for 3DI3. We will use the same ones for our walkthrough:
B58, B80, B139
Note: RFdiffusion was trained with hotspot masking, so we only need to pick a few potential contact sites within our area of interest. Refer to the official RFdiffusion docs for tips on picking hotspots.
To encode these into our generate query, we use the set_binding_at method for the Protein.
[5]:
from openprotein.protein import Binding
binding_sites = [58,80,139]
target_protein = target_protein.set_binding_at(binding_sites, Binding.BINDING)
# Verify the binding is set
target_protein.get_binding_at(binding_sites)
[5]:
array(['B', 'B', 'B'], dtype='<U1')
Generate designs with RFdiffusion#
Query design#
To generate a binder with RFdiffusion, we need to specify that there is another, unknown chain. For this walkthrough, we’ll keep the full target chain and generate a separate binder chain of 80 residues. To encode this as a Query, we first create a Protein chain of length 80.
We can use Protein.from_expr as an easy constructor for specifying chains with unknown fragments.
The structure mask determines which part of the structure should be designed. The X characters below are the sequence mask, which is used in inverse folding, the step that follows structure generation. We can examine the structure mask using get_structure_mask.
[6]:
binder_chain = Protein.from_expr(80)
print("binder sequence:", binder_chain.sequence)
print("binder structure mask:", binder_chain.get_structure_mask())
binder sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
binder structure mask: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80]
As we can see, the whole structure of our binder chain is masked, which tells the model to design the chain in full. To indicate that the design should be done in the presence of another chain, we combine our binder and target Protein objects into a Model, which represents a multimer.
But before that, let’s quickly examine the structure mask of our target protein to avoid doing unnecessary design.
[7]:
print("target structure mask:", target_protein.get_structure_mask())
target structure mask: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 214 215 216 217 218 219 220 221 222 223]
This is important with RFdiffusion: we should drop any residues that we don’t actually want to design. Dropping them saves compute time, and leaving them in also seems to cause errors when running the model. We should only do this after setting our hotspots or binding sites, since the deletion shifts our residue indices.
[8]:
target_protein = target_protein.delete(target_protein.get_structure_mask())
print("target structure mask:", target_protein.get_structure_mask())
target structure mask: []
That’s better. Now let’s combine our two chains to specify the full query Model object.
[9]:
query_model = target_protein & binder_chain
print("Chains in query:", list(query_model.proteins.keys()))
print("Chain A (target chain):", query_model.proteins["A"].sequence)
print("Chain B (binder chain):", query_model.proteins["B"].sequence)
Chains in query: ['A', 'B']
Chain A (target chain): b'DYSFSCYSQLEVNGSQHSLTCAFEDPDVNTTNLEFEICGALVEVKCLNFRKLQEIYFIETKKFLLIGKSNICVKVGEKSLTCKKIDLTTIVKPEAPFDLSVVYREGANDFVVTFNTSHLQKKYVKVLMHDVAYRQEKDENKWTHVNLSSTKLTLLQRKLQPAAMYEIKVRSIPDHYFKGFWSEWSPSYYFRTP'
Chain B (binder chain): b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
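To see why the hotspots were set before deleting the unresolved residues, the sketch below shows how positions shift once residues are removed. It assumes positions are 1-indexed, as the structure mask output above suggests; the helper `shift_positions` is hypothetical and only illustrates the arithmetic, it is not part of the OpenProtein client.

```python
from bisect import bisect_left


def shift_positions(positions, deleted):
    """Map 1-indexed positions to their new 1-indexed values after deleting residues."""
    deleted_sorted = sorted(set(deleted))
    shifted = []
    for pos in positions:
        # Number of deleted positions strictly before this one
        k = bisect_left(deleted_sorted, pos)
        if k < len(deleted_sorted) and deleted_sorted[k] == pos:
            raise ValueError(f"position {pos} was itself deleted")
        shifted.append(pos - k)
    return shifted


# Masked residues reported for the target above: 1-20 and 214-223
deleted = list(range(1, 21)) + list(range(214, 224))
print(shift_positions([58, 80, 139], deleted))  # [38, 60, 119]
```

Had the hotspots been set after the deletion using the original numbers, they would have pointed at residues 20 positions further along the chain.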
Run the design job#
Following Bennett et al. (2023), we reduce the noise added during generation, which has been found to help with binder design, albeit at the cost of some diversity:
[10]:
rfdiffusion_design_params = {
"denoiser.noise_scale_ca": 0.5,
"denoiser.noise_scale_frame": 0.5
}
With these inputs, we can run RFdiffusion to generate binder designs against the selected hotspots:
[11]:
# Number of designs to generate
N = 100
rfdiffusion_job = session.models.rfdiffusion.generate(
query=query_model,
N=N,
**rfdiffusion_design_params,
)
rfdiffusion_job
[11]:
RFdiffusionJob(job_id='3c61719e-67ea-4190-acdd-6e8e8aae7147', job_type='/models/rfdiffusion', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 12, 18, 21, 1, 46, 352527, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for completion#
Wait for the designs to complete. Note that this can take some time depending on the queue:
[12]:
rfdiffusion_job.wait_until_done(verbose=True, timeout=60*60)
Waiting: 100%|██████████████████████████████████████████████████| 100/100 [00:00<00:00, 600.93it/s, status=SUCCESS]
[12]:
True
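`wait_until_done` blocks until the job finishes or the timeout expires. For readers who want to fold this kind of waiting into their own scripts, here is a generic polling sketch. It does not use the OpenProtein client: `get_status` is a hypothetical callable returning a status string, and the terminal status names other than SUCCESS are assumptions.

```python
import time

# Assumed terminal status names; only SUCCESS appears in the output above
TERMINAL = {"SUCCESS", "FAILURE", "CANCELED"}


def poll_until_done(get_status, timeout=3600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(interval)


# Simulated job that succeeds on the third check
fake_states = iter(["PENDING", "RUNNING", "SUCCESS"])
print(poll_until_done(lambda: next(fake_states), interval=0.0))  # SUCCESS
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.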
Analyze generated designs#
Let’s first retrieve our designs, and inspect the first design.
[13]:
rfdiffusion_models = rfdiffusion_job.get()
print("chains in design:", list(rfdiffusion_models[0].proteins.keys()))
print("first design chain A sequence:", rfdiffusion_models[0].proteins["A"].sequence)
print("first design chain B sequence:", rfdiffusion_models[0].proteins["B"].sequence)
print("first design chain A mask:", rfdiffusion_models[0].proteins["A"].get_structure_mask())
print("first design chain B mask:", rfdiffusion_models[0].proteins["B"].get_structure_mask())
chains in design: ['A', 'B']
first design chain A sequence: b'GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG'
first design chain B sequence: b'DYSFSCYSQLEVNGSQHSLTCAFEDPDVNTTNLEFEICGALVEVKCLNFRKLQEIYFIETKKFLLIGKSNICVKVGEKSLTCKKIDLTTIVKPEAPFDLSVVYREGANDFVVTFNTSHLQKKYVKVLMHDVAYRQEKDENKWTHVNLSSTKLTLLQRKLQPAAMYEIKVRSIPDHYFKGFWSEWSPSYYFRTP'
first design chain A mask: []
first design chain B mask: []
As we can see, two chains are returned with their structures fully designed. Note, however, that RFdiffusion has reordered our chains: the binder is now chain A and the target is chain B, so take care when indexing chains in later steps. RFdiffusion has also set the binder sequence to all G. This is not a problem, since we will mask these residues for inverse folding in the next step anyway.
We can also visually inspect the design:
[14]:
visualize_pdb(rfdiffusion_models[0].make_pdb_string())
Now let’s iterate through these designs and save them.
[15]:
from pathlib import Path

OUTPUT_DIR = Path("data/outputs/3DI3_binder_designs")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
for i in range(N):
    # Retrieve the completed design
    designed_model = rfdiffusion_models[i]
    # Save the full complex
    with open(OUTPUT_DIR / f"design{i+1}.pdb", "w") as f:
        f.write(designed_model.make_pdb_string())
Inverse Folding with ProteinMPNN#
Following Bennett et al. (2023), we’ll use ProteinMPNN for inverse folding to design sequences that adopt the designed binder structures.
For each of the 100 designs, we will generate 10 proposed sequences from inverse folding.
[16]:
proteinmpnn_jobs = []
for i in range(N):
    rfdiffusion_model = rfdiffusion_models[i]
    # Mask the binder sequence to indicate that it should be generated
    rfdiffusion_model.mask_sequence(chain_ids="A")
    # Use ProteinMPNN to design sequences for the binder backbone
    mpnn_job = session.models.proteinmpnn.generate(
        query=rfdiffusion_model,
        num_samples=10,
        temperature=0.1,  # Bennett et al. used low temperature
        seed=42,
    )
    proteinmpnn_jobs.append(mpnn_job)
# Wait for all jobs to complete
for mpnn_job in proteinmpnn_jobs:
    mpnn_job.wait_until_done(timeout=600)
    assert mpnn_job.status == "SUCCESS"
Let’s look at the output from one of the ProteinMPNN jobs.
[17]:
proteinmpnn_jobs[0].get()
[17]:
[Score(name='generated-sequence-1', sequence='MKKTYTDTVRVIKTSPDTYSLSITVNLDGEKVTISMEVPNTKELTKKKTVTTSSGKKYKITLKLTLEGDEWKVEITIEEL', score=array([1.1707])),
Score(name='generated-sequence-2', sequence='MTKKETTTAKAIEISPDTLDIVIYVNLNGETVTLAMTIPNTPKLKKKVTVTTSSGKKYEIDLEITLEGDEYKINITIKEL', score=array([1.1413])),
Score(name='generated-sequence-3', sequence='MKKEEKTTAKAIKISPDTYEISIDIELDGEKVTISKTIPNTEELEKEVTVTTSSGKKYKIKLKLKLKGDEWEIEITIEEL', score=array([1.1164])),
Score(name='generated-sequence-4', sequence='MTKTETTYVKAIEVSPDTLQAVLDITLDGEKVTLALTIPNTKEFTKEKTVTTSSGKKYKITLKGTLEGDKLKVTITIEEL', score=array([1.0946])),
Score(name='generated-sequence-5', sequence='MTKTYTTTVRVIEISPNTLDYVLYVNLNGETVVIAKTIPNTPEFTHHDIVTTSSGKKYEIDIKGKLEGDNLNLKITIKEL', score=array([1.2136])),
Score(name='generated-sequence-6', sequence='ATTTETTRARAIKISPDKYEISIDLTLNGETVTLNLVIPNTPTLTVTRTVTTSSGKKYKVTLKLTLEGDEWLIDITTEEL', score=array([1.15])),
Score(name='generated-sequence-7', sequence='ETSKEHTTARAIQIDPTTYDTVIDITLGGEKQTIAMRVPNTPTLSKERTITTSSGEKYRINLKITRNGDTWNIDITIEKL', score=array([1.1796])),
Score(name='generated-sequence-8', sequence='EKKEETTTVRAIEISPDTLDTVIDITLNGEKVTIAMRIPNSEELEKEKTVTTSSGKKYKIKMKFKREGDKLNVKITIEEL', score=array([1.1133])),
Score(name='generated-sequence-9', sequence='KKEEYTTTVRAIKISPDTYEISIDVTLNGEKKTINMTVPNTEKLEKEKTITTSSGKKYKIKLELTKEGDTWKVKITIEEL', score=array([1.1253])),
Score(name='generated-sequence-10', sequence='EKKEETQTVRAIKISPDKLETVLDINLNGEKKTISMIIPNSKELEKEKTITTSSGEKYKVKLKLKLEGDKLLVKITIEKL', score=array([1.1292]))]
Each of these 10 sequences corresponds to the first design from RFdiffusion. Let’s save the ProteinMPNN sequences together with the RFdiffusion designs so that we have 1,000 candidate designs in total.
[18]:
scores = []
for i in range(N):
    # The RFdiffusion design is reused as the container for each new binder sequence
    generated_model = rfdiffusion_models[i]
    mpnn_job = proteinmpnn_jobs[i]
    mpnn_results = mpnn_job.get()
    for j, (_, sequence, score) in enumerate(mpnn_results):
        # replace chain explicitly due to defensive copy
        binder = generated_model.proteins["A"]
        binder.sequence = sequence
        generated_model.proteins["A"] = binder
        scores.append(score.item())
        with open(f"{OUTPUT_DIR}/design{i+1}_mpnn{j+1}.pdb", "w") as f:
            f.write(generated_model.make_pdb_string())
with open(f"{OUTPUT_DIR}/mpnn_scores.txt", "w") as f:
    f.write("\n".join([str(score) for score in scores]))
Let’s just verify that our new designed model looks correct:
[19]:
from openprotein import Model
OUTPUT_DIR = Path("data/outputs/3DI3_binder_designs")
proteinmpnn_model = Model.from_filepath(f"{OUTPUT_DIR}/design1_mpnn1.pdb")
print("chains in proteinmpnn + rfdiffusion design:", list(proteinmpnn_model.proteins.keys()))
print("binder sequence:", proteinmpnn_model.proteins["A"].sequence)
print("target sequence:", proteinmpnn_model.proteins["B"].sequence)
print("binder mask:", proteinmpnn_model.proteins["A"].get_structure_mask())
print("target mask:", proteinmpnn_model.proteins["B"].get_structure_mask())
Notice that what we have is a combination of the two models: the binder structure is from RFdiffusion and the inverse folded binder sequence is from ProteinMPNN. The next step is to check if the predicted multimer folds into what we expect.
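The ProteinMPNN scores saved above can also be used to pick a single preferred sequence per backbone. The sketch below assumes the score is ProteinMPNN's average negative log-likelihood, so that lower is better; that convention is an assumption here and worth checking against the platform documentation. The `Score` tuple is a stand-in that mirrors the fields shown in the job output, with plain floats instead of arrays.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """Stand-in for the (name, sequence, score) records returned by the job."""
    name: str
    sequence: str
    score: float


def best_by_score(results):
    """Return the record with the lowest score (assumed: lower is better)."""
    return min(results, key=lambda r: float(r.score))


# Three of the records printed earlier for the first design
results = [
    Score("generated-sequence-1", "MKKTYTDTVRVIKTSPDTYSLSITVNLDGEKVTISMEVPNTKELTKKKTVTTSSGKKYKITLKLTLEGDEWKVEITIEEL", 1.1707),
    Score("generated-sequence-4", "MTKTETTYVKAIEVSPDTLQAVLDITLDGEKVTLALTIPNTKEFTKEKTVTTSSGKKYKITLKGTLEGDKLKVTITIEEL", 1.0946),
    Score("generated-sequence-5", "MTKTYTTTVRVIEISPNTLDYVLYVNLNGETVVIAKTIPNTPEFTHHDIVTTSSGKKYEIDIKGKLEGDNLNLKITIKEL", 1.2136),
]
print(best_by_score(results).name)  # generated-sequence-4
```

In this walkthrough all 10 sequences per backbone are kept and folded, so this kind of pre-selection is optional.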
Structure Prediction with ESMFold#
Whilst Bennett et al. (2023) and Watson et al. (2023) both used AlphaFold2 to re-fold their designs, we will use ESMFold instead to validate our designs.
The key insight from Bennett et al. is that AF2’s confidence metrics, particularly pAE_interaction (the average pAE over interchain residue pairs), can effectively discriminate successful binders from failures, with success rates rising sharply for designs with pAE_interaction < 10.
We can obtain the same kind of metrics with ESMFold, which also runs much faster than AF2. The papers also used AF2 initial guess with templates, features that are not yet available for AF2 on our platform. This walkthrough will be updated if we add support for these features and find that the AF2 metrics perform better. The key point is that our platform allows easy drop-in replacements for various steps in your protein design pipeline.
[20]:
proteinmpnn_models = []
for i in range(N):
    for j in range(10):
        proteinmpnn_model = Model.from_filepath(f"{OUTPUT_DIR}/design{i+1}_mpnn{j+1}.pdb")
        proteinmpnn_models.append(proteinmpnn_model)
esmfold_job = session.fold.esmfold.fold(
    proteinmpnn_models
)
esmfold_job
[20]:
FoldJob(num_records=1000, job_id='58a6a0ee-285a-40ff-bc18-32392d9f4d51', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 12, 19, 19, 1, 16, 12126, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for completion. This will likely take around an hour.
[21]:
esmfold_job.wait_until_done(verbose=True, timeout=60*60)
Waiting: 100%|██████████████████████████████████████████████████| 100/100 [00:00<00:00, 566.69it/s, status=SUCCESS]
[21]:
True
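The 1,000 models were submitted design by design, with 10 ProteinMPNN sequences each. Assuming the fold results come back in submission order (an assumption, not something confirmed here), a flat result index can be mapped back to the saved file with `divmod`. Both helper names below are hypothetical.

```python
def flat_to_design(k, samples_per_design=10):
    """Split a flat result index into zero-based (design index, sample index)."""
    return divmod(k, samples_per_design)


def saved_filename(k, samples_per_design=10):
    """Name of the file written earlier, which uses one-based numbering."""
    design_idx, sample_idx = flat_to_design(k, samples_per_design)
    return f"design{design_idx + 1}_mpnn{sample_idx + 1}.pdb"


print(flat_to_design(365))   # (36, 5)
print(saved_filename(0))     # design1_mpnn1.pdb
print(saved_filename(993))   # design100_mpnn4.pdb
```

Keeping the zero-based and one-based conventions in one place makes it harder to mix them up when cross-referencing results with files on disk.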
Let’s retrieve and inspect the ESMFold fold results:
[22]:
esmfold_results = esmfold_job.get()
esmfold_seq, esmfold_model = esmfold_results[0] # a fold returns (seq, model) tuples
print("chains in folded model:", list(esmfold_model.proteins.keys()))
print("first fold chain A sequence:", esmfold_model.proteins["A"].sequence)
print("first fold chain B sequence:", esmfold_model.proteins["B"].sequence)
print("first fold chain A mask:", esmfold_model.proteins["A"].get_structure_mask())
print("first fold chain B mask:", esmfold_model.proteins["B"].get_structure_mask())
chains in folded model: ['A', 'B']
first fold chain A sequence: b'MKKTYTDTVRVIKTSPDTYSLSITVNLDGEKVTISMEVPNTKELTKKKTVTTSSGKKYKITLKLTLEGDEWKVEITIEEL'
first fold chain B sequence: b'DYSFSCYSQLEVNGSQHSLTCAFEDPDVNTTNLEFEICGALVEVKCLNFRKLQEIYFIETKKFLLIGKSNICVKVGEKSLTCKKIDLTTIVKPEAPFDLSVVYREGANDFVVTFNTSHLQKKYVKVLMHDVAYRQEKDENKWTHVNLSSTKLTLLQRKLQPAAMYEIKVRSIPDHYFKGFWSEWSPSYYFRTP'
first fold chain A mask: []
first fold chain B mask: []
As expected, the sequences are unchanged and the structure masks are empty, meaning the whole structure of the complex has been predicted by ESMFold.
We can also retrieve the pAE matrices predicted by ESMFold, which are useful as metrics for measuring the success of our designs. Retrieving all the results could take a while.
[23]:
esmfold_pae_results = esmfold_job.pae
esmfold_seq, esmfold_complex_pae = esmfold_pae_results[0]
print("pae interaction shape:", esmfold_complex_pae.shape)
pae interaction shape: (273, 273)
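The (273, 273) matrix covers the 80 binder residues followed by the 193 target residues. The interchain pAE described earlier is the mean over the two off-diagonal blocks of this matrix. Here is a minimal NumPy sketch on a toy matrix, assuming the binder residues come first (chain A); the function name `pae_interaction` is hypothetical.

```python
import numpy as np


def pae_interaction(pae, binder_len):
    """Mean pAE over interchain residue pairs, averaged over both directions."""
    pae = np.asarray(pae, dtype=float).squeeze()
    if pae.ndim != 2 or pae.shape[0] != pae.shape[1]:
        raise ValueError("pae must be a square matrix")
    if not 0 < binder_len < pae.shape[0]:
        raise ValueError("binder_len must split the matrix into two chains")
    binder_to_target = pae[:binder_len, binder_len:].mean()
    target_to_binder = pae[binder_len:, :binder_len].mean()
    return float((binder_to_target + target_to_binder) / 2)


# Toy complex: 2 binder residues followed by 3 target residues
toy = np.zeros((5, 5))
toy[:2, 2:] = 10.0  # binder rows, target columns
toy[2:, :2] = 20.0  # target rows, binder columns
print(pae_interaction(toy, binder_len=2))  # 15.0
```

The intrachain blocks on the diagonal are ignored, so a confidently folded binder with a poorly placed interface still scores badly.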
Ranking designs by metrics#
Following Bennett et al. (2023), we’ll rank our designs based on:
Monomer pLDDT (confidence that sequence folds to designed structure)
Complex pAE interaction (confidence that binder forms intended interface)
Complex Cα RMSD to designed structure
[24]:
import pandas as pd
design_files = []
plddt_scores = []
pae_scores = []
rmsd_scores = []
for i in range(N*10):
    # Get ESMFold predictions
    _, esmfold_model = esmfold_results[i]
    # Note: these labels are zero-based, whereas the files saved earlier
    # are one-based (design{i+1}_mpnn{j+1}.pdb)
    design_files.append(f"design{i//10}_mpnn{i%10}.pdb")
    binder = esmfold_model.proteins["A"]
    target = esmfold_model.proteins["B"]
    # Get pLDDT of binder
    plddt_score = np.mean(binder.plddt)
    plddt_scores.append(plddt_score)
    # Get pAE
    _, esmfold_complex_pae = esmfold_pae_results[i]
    binder_target_pae = esmfold_complex_pae.squeeze()  # squeeze the shape
    # Average the two off-diagonal (interchain) blocks
    pae_interaction_1 = np.mean(binder_target_pae[len(binder):, :len(binder)])
    pae_interaction_2 = np.mean(binder_target_pae[:len(binder), len(binder):])
    pae_interaction_total = (pae_interaction_1 + pae_interaction_2) / 2
    pae_scores.append(pae_interaction_total)
    # RMSD between designed binder and folded binder
    designed_binder = rfdiffusion_models[i//10].proteins["A"]
    folded_binder = binder
    binder_rmsd = designed_binder.rmsd(folded_binder, backbone_only=True)
    rmsd_scores.append(binder_rmsd)
df = pd.DataFrame({"design_file": design_files, "plddt": plddt_scores, "pae": pae_scores, "rmsd": rmsd_scores})
print(df.head(10))
design_file plddt pae rmsd
0 design0_mpnn0.pdb 64.985374 26.108816 2.508876
1 design0_mpnn1.pdb 60.023247 26.559431 2.313612
2 design0_mpnn2.pdb 64.042992 26.312916 3.289286
3 design0_mpnn3.pdb 65.501999 25.959415 2.709022
4 design0_mpnn4.pdb 65.479004 24.624670 2.541036
5 design0_mpnn5.pdb 72.557373 25.386769 0.839039
6 design0_mpnn6.pdb 58.761375 25.435695 3.850175
7 design0_mpnn7.pdb 69.294373 24.323982 1.071133
8 design0_mpnn8.pdb 63.944874 26.111162 3.166059
9 design0_mpnn9.pdb 61.955128 26.144578 4.354014
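The RMSD column compares the designed binder backbone with the refolded one. Whether `Protein.rmsd` superimposes the two structures first is not stated here, so the following is only an illustration of the common approach: an RMSD after optimal rigid-body superposition using the Kabsch algorithm. It works on plain coordinate arrays and is not the platform's implementation.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


# A rotated and translated copy superimposes back onto the original
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(P, Q) < 1e-8)  # True
```

Superposition matters for binders in particular, because the refolded complex is generally not in the same coordinate frame as the RFdiffusion design.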
Analysis and Ranking#
Let’s rank the designs by their ESMFold metrics. The sort below orders by pLDDT first, so pAE and RMSD only act as tie-breakers:
[25]:
import pandas as pd
df_sorted = df.sort_values(by=["plddt", "pae", "rmsd"], ascending=[False, True, True])
print(df_sorted.head(10))
# Save rankings
df_sorted.to_csv(OUTPUT_DIR / "rankings.csv", index=False)
design_file plddt pae rmsd
993 design99_mpnn3.pdb 85.635620 14.546918 0.576264
362 design36_mpnn2.pdb 85.087120 24.013505 0.558198
365 design36_mpnn5.pdb 84.757126 11.708263 0.376451
363 design36_mpnn3.pdb 83.749001 14.447840 0.695756
361 design36_mpnn1.pdb 83.108627 15.472516 0.395722
469 design46_mpnn9.pdb 82.964622 11.207534 0.656172
387 design38_mpnn7.pdb 82.905998 23.893167 0.725844
280 design28_mpnn0.pdb 82.414253 24.409521 0.538304
895 design89_mpnn5.pdb 82.379875 13.552719 0.630388
385 design38_mpnn5.pdb 82.042374 25.747786 0.546411
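Since the sort above is dominated by pLDDT, several top rows still have a high interchain pAE. An alternative is to filter on thresholds first and then order by pAE. The sketch below uses plain dictionaries with values copied from the table above. The pAE cutoff of 10 comes from the discussion of Bennett et al. earlier, while the pLDDT and RMSD cutoffs are illustrative assumptions rather than values taken from this tutorial.

```python
def shortlist(rows, pae_max=10.0, plddt_min=80.0, rmsd_max=1.0):
    """Keep rows passing all thresholds, ordered by interchain pAE (lowest first)."""
    kept = [
        r for r in rows
        if r["pae"] < pae_max and r["plddt"] >= plddt_min and r["rmsd"] <= rmsd_max
    ]
    return sorted(kept, key=lambda r: r["pae"])


rows = [
    {"design_file": "design99_mpnn3.pdb", "plddt": 85.635620, "pae": 14.546918, "rmsd": 0.576264},
    {"design_file": "design36_mpnn2.pdb", "plddt": 85.087120, "pae": 24.013505, "rmsd": 0.558198},
    {"design_file": "design36_mpnn5.pdb", "plddt": 84.757126, "pae": 11.708263, "rmsd": 0.376451},
    {"design_file": "design46_mpnn9.pdb", "plddt": 82.964622, "pae": 11.207534, "rmsd": 0.656172},
]

print([r["design_file"] for r in shortlist(rows)])
# [] : none of these rows has pAE below 10
print([r["design_file"] for r in shortlist(rows, pae_max=15.0)])
# ['design46_mpnn9.pdb', 'design36_mpnn5.pdb', 'design99_mpnn3.pdb']
```

With the strict cutoff, none of the rows shown above would pass, which is a useful signal about how many designs are worth carrying forward.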
Summary#
In this tutorial, we’ve demonstrated the deep learning-augmented binder design workflow using RFdiffusion, ProteinMPNN and ESMFold:
Target Selection: Downloaded 3DI3 structure from RCSB PDB
Hotspot Identification: Used the hotspot residues (B58, B80, B139) from the supplementary information of Watson et al. (2023)
Structure Generation: Used RFdiffusion to generate binder backbones
Sequence Design: Applied ProteinMPNN for fast, efficient sequence design
Validation: Used ESMFold to rank designs based on:
Monomer folding confidence (pLDDT)
Complex formation confidence (pAE interaction)
Structural accuracy (RMSD)
Bennett et al. (2023) report roughly 10-fold higher success rates than purely physics-based methods when deep learning models are used to screen out Type I failures (sequences that don’t fold as intended) and Type II failures (structures that don’t bind as intended). Their filtering used AF2, whereas this walkthrough substitutes ESMFold.
Next Steps#
The top-ranked designs from this workflow can be:
Expressed and purified for experimental validation
Tested for binding affinity
Further optimized through additional rounds of design