Python API Documentation
The OpenProtein Python SDK provides a Pythonic interface to the OpenProtein.AI platform for protein engineering. With this client library you can generate embeddings from state-of-the-art foundation models, train custom predictors, design novel sequences, and predict protein structures.
Getting Started
Install the package via pip or conda (installation guide)
Create a session to authenticate with the platform (session setup)
Choose your workflow based on your protein engineering goals
Quick Start
import openprotein
# Connect to the platform
session = openprotein.connect(username="your_username", password="your_password")
# Example: Generate embeddings
future = session.embedding.esm2.embed(sequences=["ACDEFGHIKLMNPQRSTVWY"])
embeddings = future.wait()
Core Concepts
Understanding these primitives will help you work effectively with the SDK:
- Session Management
The session object (OpenProtein) is your gateway to all platform capabilities. It manages authentication and provides access to all API modules (session.embedding, session.fold, session.predictor, etc.).
- Asynchronous Jobs
Most operations return Future objects that track asynchronous jobs. Use wait() to block until completion, or refresh() and done() to poll status. Learn more in the Jobs System guide.
- Protein Primitives
Protein: Represents a single protein chain with its sequence and an optional MSA
Chain: Represents ligands, DNA, or RNA molecules
Model: A collection of proteins and chains forming a complex
AssayDataset: Your experimental data (sequences + measured properties)
- Embeddings & Reductions
Foundation models produce embeddings that can be reduced (MEAN, SUM), kept per-residue, or transformed with a custom-fitted SVD. These embeddings power downstream prediction and design tasks.
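For long-running jobs you may prefer polling to blocking. Below is a minimal sketch of a polling loop that relies only on the refresh() and done() methods described above. `poll_until_done`, its `interval` and `timeout` parameters, and the `FakeFuture` stand-in used to exercise it offline are hypothetical, not part of the SDK.

```python
import time


def poll_until_done(future, interval=5.0, timeout=None):
    """Poll a Future-like job object until it reports completion.

    Hypothetical helper, not an SDK function: it only assumes the object
    exposes refresh() and done(). Raises TimeoutError if `timeout`
    seconds pass first.
    """
    start = time.monotonic()
    while True:
        future.refresh()  # fetch the latest job status
        if future.done():
            return future
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError("job did not finish within the timeout")
        time.sleep(interval)  # wait before asking again


class FakeFuture:
    """Hypothetical offline stand-in that finishes after N refreshes."""

    def __init__(self, refreshes_needed):
        self.remaining = refreshes_needed

    def refresh(self):
        self.remaining -= 1

    def done(self):
        return self.remaining <= 0
```

With a real session you would pass the Future returned by a call such as session.embedding.esm2.embed(...), then retrieve the result as described in the Jobs System guide.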
Platform Capabilities
The SDK is organized around key protein engineering workflows:
Data & Embeddings
Foundation Models - Generate high-quality protein embeddings from state-of-the-art models
Access to PoET and proprietary OpenProtein models, along with open community models such as ESM
Per-residue or reduced embeddings (mean/sum pooling)
Logits and attention maps for interpretability
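Mean and sum pooling both collapse the residue axis of an L × D per-residue embedding into a single D-dimensional vector. The platform performs this reduction for you. The sketch below only shows the arithmetic, assuming the embedding is given as a list of equal-length rows (one per residue). `reduce_embeddings` is a hypothetical helper.

```python
def reduce_embeddings(per_residue, reduction="MEAN"):
    """Collapse an L x D per-residue embedding into one D-dimensional vector.

    Hypothetical helper illustrating MEAN and SUM pooling over residues.
    """
    if not per_residue:
        raise ValueError("need at least one residue")
    dim = len(per_residue[0])
    totals = [0.0] * dim
    for row in per_residue:
        if len(row) != dim:
            raise ValueError("all residues must have the same embedding size")
        for j, value in enumerate(row):
            totals[j] += value  # accumulate each feature across residues
    if reduction == "SUM":
        return totals
    if reduction == "MEAN":
        return [t / len(per_residue) for t in totals]
    raise ValueError(f"unknown reduction: {reduction}")
```

Mean pooling keeps the vector's scale independent of sequence length, while sum pooling grows with the number of residues. This distinction matters when you compare proteins of different lengths.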
PoET - Conditional protein language model for zero-shot prediction and generation
Create prompts from MSAs to condition on protein families
Score sequences without experimental data
Generate novel sequences with desired properties
Single-site analysis for variant effect prediction
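Single-site analysis considers every point substitution of a parent sequence. To illustrate the variant set involved, here is a sketch that enumerates all single-substitution variants. It assumes the 20 canonical amino acids and 1-based mutation codes such as M1A. `single_site_variants` is hypothetical and does not reflect how the platform runs the analysis.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(wild_type):
    """Enumerate every single-substitution variant of a protein sequence.

    Hypothetical helper. Returns (mutation_code, sequence) pairs, where a
    code like "M1A" means position 1 (1-based) changes from M to A.
    """
    variants = []
    for i, wt_aa in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue  # skip the no-op substitution back to wild type
            mutant = wild_type[:i] + aa + wild_type[i + 1:]
            variants.append((f"{wt_aa}{i + 1}{aa}", mutant))
    return variants
```

A sequence of length L yields 19 × L variants, so a 300-residue protein already gives 5,700 candidates.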
Data Management - Upload and manage your experimental datasets
Store assay data (sequences + measurements) on the platform
Use datasets for training predictors and design workflows
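Before uploading, it can help to validate your table locally. The sketch below assumes a hypothetical CSV layout: a `sequence` column plus one numeric column per measured property, with empty cells meaning "not measured". Check the Data Management guide for the format the platform actually expects. `load_assay` is a hypothetical helper.

```python
import csv
import io

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def load_assay(text):
    """Parse and validate a CSV of sequences plus measured properties.

    Hypothetical layout: a 'sequence' column followed by numeric property
    columns. Empty cells become None (unmeasured).
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("CSV must have a 'sequence' column")
    properties = [name for name in reader.fieldnames if name != "sequence"]
    rows = []
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        seq = record["sequence"].strip().upper()
        if not seq or set(seq) - VALID_RESIDUES:
            raise ValueError(f"line {line_no}: invalid sequence {seq!r}")
        values = {}
        for prop in properties:
            cell = (record[prop] or "").strip()  # short rows yield None
            values[prop] = float(cell) if cell else None
        rows.append((seq, values))
    return properties, rows
```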
Prediction & Design
Property Regression Models - Train custom models on your data
Fit Gaussian Process models using foundation model embeddings
Cross-validation to evaluate predictive accuracy and uncertainty estimates
Predict properties for novel sequences
Single-site saturation mutagenesis analysis
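The platform fits GP models on foundation-model embeddings for you. To show what such a model computes, here is a minimal exact Gaussian process regression sketch with an RBF kernel on toy 1-D features. The kernel, noise level, and all function names are assumptions for illustration, not the SDK's implementation. For each test point it returns a posterior mean and variance, and that variance is where a GP's uncertainty estimates come from.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_from_lower(L, b):
    """Back substitution: solve L^T x = b using the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit_predict(X_train, y_train, X_test, noise=1e-2, **kernel_args):
    """Exact GP regression: posterior mean and latent variance per test point.

    The returned variance excludes observation noise.
    """
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], **kernel_args) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, y_train))  # (K + noise I)^-1 y
    means, variances = [], []
    for x in X_test:
        k_star = [rbf(x, xi, **kernel_args) for xi in X_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_lower(L, k_star)  # so that v.v = k*^T (K + noise I)^-1 k*
        variances.append(rbf(x, x, **kernel_args) - sum(t * t for t in v))
    return means, variances
```

As test points move away from the training data, the variance grows back toward the prior kernel variance. This is why GP predictions carry meaningful uncertainty for novel sequences.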
Sequence Design - Optimize sequences for your objectives
Genetic algorithm-based design using trained predictors
Multi-objective optimization support
Design novel variants optimized for your measured properties
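The SDK runs its genetic algorithm on the platform using your trained predictors. To illustrate the general loop (random mutation, single-point crossover, and elitist truncation selection), here is a self-contained sketch in which a toy fitness function stands in for a predictor. Every name and parameter is hypothetical, and the platform's operators and settings may differ.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rng, rate=0.1):
    """Substitute each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_algorithm(seed_seq, fitness, generations=30, pop_size=50, n_elite=5, rng_seed=0):
    """Evolve sequences to maximize `fitness`; returns (best, per-generation best scores)."""
    rng = random.Random(rng_seed)
    population = [seed_seq] + [mutate(seed_seq, rng) for _ in range(pop_size - 1)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[:n_elite]           # top sequences survive unchanged
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - n_elite)
        ]
        population = elite + children
    best = max(population, key=fitness)
    return best, history


def toy_fitness(seq):
    """Hypothetical stand-in for a trained predictor: counts hydrophobic residues."""
    return sum(aa in "AILMFVW" for aa in seq)
```

Because the top `n_elite` sequences survive each generation unchanged, the best fitness never decreases. For multiple objectives, the simplest (and lossy) adaptation is to combine predictions into one weighted score; the SDK's multi-objective support may work differently.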
Structure
Structure Prediction - Predict 3D structures from sequences
ESMFold for fast single-chain folding
AlphaFold2 for high-accuracy multi-chain complexes
Boltz (Boltz-1, Boltz-1x, Boltz-2) for advanced complex prediction with constraints
RoseTTAFold3 for alternative multi-chain folding
Structure Generation - Design novel protein structures de novo
RFdiffusion for diffusion-based structure generation
BoltzGen for generative structure design
Useful for binder design and scaffold generation
Supporting Tools
Alignment - Multiple sequence alignment and antibody numbering
Create MSAs via homology search (MMseqs2)
MAFFT and Clustal Omega alignment
AbNumber for antibody numbering schemes
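After building an MSA, a quick sanity check is how similar each homolog is to the query. This sketch assumes plain aligned FASTA in which every row has the same length, with `-` as the gap character (a3m files with lowercase insertions would need extra handling). `parse_fasta` and `identity_to_query` are hypothetical helpers.

```python
def parse_fasta(text):
    """Return (name, sequence) records from FASTA or aligned-FASTA text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def identity_to_query(msa, gap="-"):
    """Fraction of identical residues between each row and the first (query) row.

    Columns where either sequence has a gap are skipped.
    """
    query = msa[0][1]
    result = {}
    for name, seq in msa:
        if len(seq) != len(query):
            raise ValueError(f"{name}: aligned rows must match the query length")
        pairs = [(a, b) for a, b in zip(query, seq) if a != gap and b != gap]
        same = sum(a == b for a, b in pairs)
        result[name] = same / len(pairs) if pairs else 0.0
    return result
```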
Dimensionality Reduction - Visualize and analyze embeddings
SVD for linear dimensionality reduction
UMAP for non-linear manifold learning
Fit on training data, transform new sequences
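"Fit on training data, transform new sequences" means the projection directions are learned once and then reused for any later embedding. The sketch below shows that pattern for a truncated SVD. It computes the top right singular vectors with power iteration and deflation, a swapped-in technique chosen to stay dependency-free; in practice you would use numpy.linalg.svd or scikit-learn's TruncatedSVD. Like TruncatedSVD, it does not mean-center the data. `fit_svd` and `transform` are hypothetical names, not the SDK's API.

```python
import math


def _matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def fit_svd(X, n_components=2, iters=200):
    """Top right singular vectors of X (rows = embeddings) via power iteration on X^T X."""
    d = len(X[0])
    # G = X^T X; its leading eigenvectors are the right singular vectors of X
    G = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / (i + 1) for i in range(d)]  # deterministic starting vector
        for _ in range(iters):
            w = _matvec(G, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining directions carry no variance
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, _matvec(G, v)))
        components.append(v)
        # Deflate so the next pass converges to the next-largest direction
        G = [[G[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def transform(components, X_new):
    """Project new embeddings onto the fitted components."""
    return [[sum(a * b for a, b in zip(row, c)) for c in components] for row in X_new]
```

Power iteration assumes that the start vector is not orthogonal to a target direction and that the leading singular values are reasonably well separated. Each returned component is determined only up to sign.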
Common Workflows
Workflow 1: Zero-shot prediction with PoET
Create an MSA from your seed sequence → session.align.create_msa()
Create a prompt from the MSA → session.prompt.create()
Score your variants → session.embedding.poet.score()
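A common way to read zero-shot scores is relative to the parent sequence. This sketch assumes that session.embedding.poet.score() gives one log-likelihood-style score per sequence, where higher means more plausible under the prompt; check the PoET guide for the exact output format. It ranks variants by their change from wild type. `rank_variants` and the numbers in the example call are hypothetical placeholders, not real PoET output.

```python
def rank_variants(wild_type_score, variant_scores):
    """Rank variants by score change relative to wild type (higher = predicted fitter).

    Hypothetical helper; assumes higher scores mean more plausible sequences.
    """
    deltas = {name: score - wild_type_score for name, score in variant_scores.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores for illustration only
example = rank_variants(-10.0, {"A1C": -9.5, "A1D": -12.0, "A1E": -10.0})
# example == [("A1C", 0.5), ("A1E", 0.0), ("A1D", -2.0)]
```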
Workflow 2: Train a custom predictor
Upload your assay data → session.data.create()
Train a GP model → session.embedding.esm2.fit_gp()
Predict on new sequences → predictor.predict()
Design optimized variants → session.design.genetic_algorithm()
Workflow 3: Structure prediction
For single chains: session.fold.esmfold.fold()
For complexes: create an MSA → build Protein objects → session.fold.alphafold2.fold()
Next Steps
New users: Start with Installation and Session Setup
Learn the basics: Review the Jobs System to understand async operations
Explore tutorials: Browse capability-specific tutorials below
API reference: Detailed documentation for all classes and methods