Python API Documentation
The OpenProtein Python SDK provides a pythonic interface to the OpenProtein.AI platform for protein engineering. This client library enables you to leverage state-of-the-art foundation models, train custom predictors, design novel sequences, and predict protein structures.
Getting Started
Install the package via pip or conda (installation guide)
Create a session to authenticate with the platform (session setup)
Choose your workflow based on your protein engineering goals
Usage
```python
import openprotein

# Connect to the platform
session = openprotein.connect(username="your_username", password="your_password")

# Example: Generate embeddings
future = session.embedding.esm2.embed(sequences=["ACDEFGHIKLMNPQRSTVWY"])
embeddings = future.wait()
```
Core Concepts
Understanding these primitives will help you work effectively with the SDK:
- Session Management
  The session object (OpenProtein) is your gateway to all platform capabilities. It manages authentication and provides access to all API modules (session.embedding, session.fold, session.predictor, etc.).
- Asynchronous Jobs
  Most operations return Future objects that track asynchronous jobs. Use wait() to block until a job completes and retrieve its results. Learn more in the Job System section.
- Protein Primitives
  - Protein: Represents a single protein chain with a sequence and an optional MSA
  - Ligand, DNA, RNA: Represent other possible chain types
  - Model: A collection of proteins and chains forming a complex
  - AssayDataset: Your experimental data (sequences + measured properties)
- Embeddings & Reductions
  Foundation models produce embeddings that can be reduced (MEAN, SUM), kept per-residue (with reduction=None), or transformed with a custom-fitted SVD. These embeddings power downstream prediction and design tasks.
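The sketch below shows the arithmetic behind the MEAN and SUM reductions. It applies them to a per-residue embedding, which has one vector per residue, as you get with reduction=None. It is written in plain Python for illustration only. The function names and the toy 3 × 2 matrix are hypothetical, and the SDK performs these reductions server-side.

```python
from typing import List

Matrix = List[List[float]]  # shape: (sequence_length, embedding_dim)


def sum_pool(per_residue: Matrix) -> List[float]:
    """Sum each embedding dimension over all residues."""
    dim = len(per_residue[0])
    return [sum(row[j] for row in per_residue) for j in range(dim)]


def mean_pool(per_residue: Matrix) -> List[float]:
    """Average each embedding dimension over all residues."""
    n = len(per_residue)
    return [total / n for total in sum_pool(per_residue)]


# Toy per-residue embedding: 3 residues x 2 dimensions (made-up numbers).
toy = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

print(sum_pool(toy))   # [9.0, 12.0]
print(mean_pool(toy))  # [3.0, 4.0]
```

Both reductions turn a variable-length sequence into a fixed-size vector, which is what sequence-level predictors need. The mean is independent of sequence length, while the sum grows with it.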
Platform Capabilities
The SDK is organized around key protein engineering workflows:
Foundation Models
Foundation models provide high-quality protein embeddings and sequence-level representations for downstream analysis and design.
They support both general-purpose and protein-family–conditioned workflows.
Capabilities
Access to PoET, proprietary OpenProtein models, and community models such as ESM
Per-residue embeddings or reduced representations (mean / sum pooling)
Logits and attention maps for interpretability
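Logits are unnormalized per-position scores over the amino-acid vocabulary. This hedged sketch converts one position's logits into probabilities using a numerically stable softmax. It then reads off the log-probability of a chosen residue. The three-letter vocabulary and logit values are made up. The SDK's actual logits output may differ in shape and vocabulary order, so treat this as an illustration of the math rather than of the API.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def residue_log_prob(logits: List[float], vocab: List[str], residue: str) -> float:
    """Log-probability of `residue` at one position, given that position's logits."""
    probs = softmax(logits)
    return math.log(probs[vocab.index(residue)])


# Hypothetical toy vocabulary and logits for a single position.
vocab = ["A", "C", "D"]
logits = [2.0, 1.0, 0.1]

probs = dict(zip(vocab, softmax(logits)))
print({aa: round(p, 3) for aa, p in probs.items()})
print(residue_log_prob(logits, vocab, "A"))
```

Summing such per-position log-probabilities over a sequence is one common way to turn logits into a sequence-level score.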
PoET
PoET is a conditional protein language model designed for zero-shot prediction and generation conditioned on protein families.
Capabilities
Prompt construction from MSAs
Zero-shot sequence scoring without experimental data
Conditional sequence generation
Single-site variant effect analysis
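Single-site variant analysis starts from every possible point mutant of a parent sequence. Here is a minimal sketch of that enumeration. It uses the common wild-type/position/mutant notation (for example A1C) with 1-based positions. The helper name is hypothetical and not part of the SDK. The sequences it produces are the kind of input you would then pass to a scoring call such as session.embedding.poet.score().

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_site_variants(parent: str) -> Dict[str, str]:
    """Map mutation codes like 'A1C' to the full mutant sequence."""
    variants: Dict[str, str] = {}
    for i, wt in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the synonymous (wild-type) substitution
            code = f"{wt}{i + 1}{aa}"
            variants[code] = parent[:i] + aa + parent[i + 1:]
    return variants


variants = single_site_variants("MKV")
print(len(variants))    # 57 (3 positions x 19 substitutions)
print(variants["K2A"])  # MAV
```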
Data Management
Upload and manage your experimental datasets
Store assay data (sequences + measurements) on the platform
Use datasets for training predictors and design workflows
Prediction & Design
Property Regression Models
Train custom models on your data
Fit Gaussian Process models using foundation model embeddings
Cross-validation for uncertainty estimation
Predict properties for novel sequences
Single-site saturation mutagenesis analysis
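The sketch below is a textbook Gaussian Process regression in plain Python. It shows what fitting a GP and reading off its predictive uncertainty means. It is not the platform's implementation. The RBF kernel, fixed length scale and noise level, and the scalar toy inputs standing in for embeddings are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(x: Sequence[float], y: Sequence[float], length_scale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq_dist / length_scale**2)


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L: Matrix, b: List[float]) -> List[float]:
    """Solve L x = b for lower-triangular L."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def back_sub_transpose(L: Matrix, b: List[float]) -> List[float]:
    """Solve L.T x = b, reading L.T directly from L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(
    X_train: List[List[float]], y_train: List[float], X_test: List[List[float]], noise: float = 1e-2
) -> List[Tuple[float, float]]:
    """Zero-mean GP posterior (mean, latent variance) at each test point."""
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = back_sub_transpose(L, forward_sub(L, y_train))  # alpha = K^-1 y
    out = []
    for x in X_test:
        k_star = [rbf(x, xi) for xi in X_train]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(L, k_star)  # v = L^-1 k*, so v.v = k*^T K^-1 k*
        out.append((mean, rbf(x, x) - sum(vi * vi for vi in v)))
    return out


# Toy 1-D "embeddings" and measurements (made-up numbers).
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 0.0]
preds = gp_predict(X_train, y_train, [[1.0], [10.0]])
for (mean, var), label in zip(preds, ["near the data", "far from the data"]):
    print(f"{label}: mean={mean:.3f}, variance={var:.3f}")
```

Near a training point, the posterior mean tracks the measurement and the variance is small. Far from all training data, the mean reverts to the prior (zero here) and the variance approaches the kernel variance. That rising variance is the uncertainty signal a GP predictor adds on top of its point predictions.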
Sequence Design
Optimize sequences for your objectives
Genetic algorithm-based design using trained predictors
Multi-objective optimization support
Design novel variants optimized for your measured properties
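Genetic-algorithm design alternates variation (mutation and crossover) with selection under a fitness function. On the platform, that fitness function is a trained predictor. The sketch below is a generic, self-contained GA loop with elitist truncation selection, one-point crossover, and random point mutations. It is not the platform's algorithm. The fitness function is a hypothetical stand-in that counts matches to a made-up target sequence, and the population size, generation count, and mutation rate are arbitrary.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random, rate: float) -> str:
    """Resample each position to a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_algorithm(
    parent: str,
    fitness: Callable[[str], float],
    population_size: int = 32,
    generations: int = 50,
    rate: float = 0.05,
    seed: int = 0,
) -> str:
    """Return the fittest sequence found; assumes len(parent) >= 2."""
    rng = random.Random(seed)
    population = [parent] + [mutate(parent, rng, rate) for _ in range(population_size - 1)]
    for _ in range(generations):
        # Selection: keep the top half. Survivors carry over unchanged (elitism),
        # so the best fitness in the population never decreases.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Variation: refill with mutated crossovers of randomly chosen survivors.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng, rate)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical stand-in for a trained predictor: matches to a made-up target.
TARGET = "MKTAYIAKQR"


def toy_fitness(seq: str) -> float:
    return float(sum(a == b for a, b in zip(seq, TARGET)))


start = "MAAAAAAAAA"
best = genetic_algorithm(start, toy_fitness)
print(best, toy_fitness(start), "->", toy_fitness(best))
```

Multi-objective design would replace the single scalar fitness with a ranking over several predicted properties. This sketch does not cover that case.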
Structure
Structure Prediction
Predict 3D structures from sequences
ESMFold for fast single-chain folding
AlphaFold2 for high-accuracy multi-chain complexes
Boltz (1, 1x, 2) for advanced complex prediction with constraints
RosettaFold3 for alternative multi-chain folding
Structure Generation
Design binders or novel protein structures de novo
RFdiffusion for diffusion-based structure generation
BoltzGen for generative structure design
Useful for binder design and scaffold generation
Supporting Tools
Alignment
Multiple sequence alignment and antibody numbering
Create MSAs via homology search (MMseqs2)
MAFFT and ClustalOmega alignment
AbNumber for antibody numbering schemes
Dimensionality Reduction
Visualize and analyze embeddings
SVD for linear dimensionality reduction
UMAP for non-linear manifold learning
Fit on training data, transform new sequences
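The key idea is fit-then-transform: the projection is learned once from training embeddings, then applied unchanged to new sequences. Here is a minimal sketch of a truncated SVD fitted by power iteration, in plain Python, with toy 2-D vectors standing in for embeddings. The class and method names are hypothetical and are not the SDK's API. The sketch does not center the data, and whether the platform's SVD does is not stated here. It also assumes n_components does not exceed the rank of the training data.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class TruncatedSVDSketch:
    """Top-k right singular vectors of X via power iteration on X^T X (no centering)."""

    def __init__(self, n_components: int = 1, n_iter: int = 200, seed: int = 0) -> None:
        self.n_components = n_components
        self.n_iter = n_iter
        self.seed = seed
        self.components: List[Vector] = []

    def fit(self, X: List[Vector]) -> "TruncatedSVDSketch":
        rng = random.Random(self.seed)
        dim = len(X[0])
        self.components = []
        for _ in range(self.n_components):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(self.n_iter):
                # One multiplication by X^T X: w = X^T (X v).
                xv = [dot(row, v) for row in X]
                w = [sum(xv[i] * X[i][j] for i in range(len(X))) for j in range(dim)]
                # Remove directions already found so we converge to the next component.
                for c in self.components:
                    p = dot(w, c)
                    w = [wj - p * cj for wj, cj in zip(w, c)]
                norm = math.sqrt(dot(w, w))
                v = [wj / norm for wj in w]
            self.components.append(v)
        return self

    def transform(self, X: List[Vector]) -> List[Vector]:
        """Project rows (training or new) onto the fitted components."""
        return [[dot(row, c) for c in self.components] for row in X]


# Fit on toy "training embeddings", then transform a new point with the same projection.
train = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [3.0, 3.0]]
svd = TruncatedSVDSketch(n_components=1).fit(train)
print(svd.components)               # one unit vector along (1, 1), sign arbitrary
print(svd.transform([[4.0, 4.0]]))
```

Singular vectors are only defined up to sign, so downstream code should not rely on the sign of a component. In practice, a linear-algebra library would replace the hand-written power iteration.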
Common Workflows
Workflow 1: Zero-shot prediction with PoET
Create an MSA from your seed sequence → session.align.create_msa()
Create a prompt from the MSA → session.prompt.create()
Score your variants → session.embedding.poet.score()
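Scoring variants requires full sequences. If your variants are written as mutation codes, a small helper can expand them against the seed sequence before scoring. The helper below is hypothetical and not part of the SDK. It assumes 1-based positions and colon-separated multi-mutants such as A2G:K3R, which is a common convention but an assumption here. It also checks that each code's wild-type residue matches the seed.

```python
import re
from typing import List

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(wild_type: str, codes: str) -> str:
    """Apply colon-separated mutation codes (e.g. 'A2G:K3R') to a wild-type sequence."""
    seq: List[str] = list(wild_type)
    for code in codes.split(":"):
        match = MUTATION.match(code)
        if match is None:
            raise ValueError(f"Malformed mutation code: {code!r}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"Position {pos} out of range in {code!r}")
        if seq[pos - 1] != wt:
            raise ValueError(f"{code!r}: expected {wt} at {pos}, found {seq[pos - 1]}")
        seq[pos - 1] = mut
    return "".join(seq)


seed = "MAKV"
expanded = [apply_mutations(seed, c) for c in ["A2G", "A2G:K3R"]]
print(expanded)  # ['MGKV', 'MGRV']
```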
Workflow 2: Train a custom predictor
Upload your assay data → session.data.create()
Train a GP model → session.embedding.esm2.fit_gp()
Predict on new sequences → predictor.predict()
Design optimized variants → session.design.genetic_algorithm()
Workflow 3: Structure prediction
For single chains: session.fold.esmfold.fold()
For complexes: Create an MSA → Build Protein objects → session.fold.alphafold2.fold()
Next Steps
New users: Start with Installation and Quickstart
Learn the basics: Review the Job System to understand async operations
Explore tutorials: Browse capability-specific tutorials below
API reference: Detailed documentation for all classes and methods