Python API Documentation#

The OpenProtein Python SDK provides a Pythonic interface to the OpenProtein.AI platform for protein engineering. With this client library you can use state-of-the-art foundation models, train custom predictors, design novel sequences, and predict protein structures.

Getting Started#

Install the package via pip or conda (installation guide)
Create a session to authenticate with the platform (session setup)
Choose your workflow based on your protein engineering goals

Usage#

import openprotein

# Connect to the platform
session = openprotein.connect(username="your_username", password="your_password")

# Example: Generate embeddings
future = session.embedding.esm2.embed(sequences=["ACDEFGHIKLMNPQRSTVWY"])
embeddings = future.wait()
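
To keep credentials out of source files, the username and password passed to connect() can be read from the environment first. The following is a minimal sketch: the helper load_credentials and the variable names OPENPROTEIN_USERNAME and OPENPROTEIN_PASSWORD are assumptions chosen for this example, not names confirmed by the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return keyword arguments for connect(), read from environment variables.

    The variable names are hypothetical and chosen for this sketch.
    """
    try:
        return {
            "username": env["OPENPROTEIN_USERNAME"],
            "password": env["OPENPROTEIN_PASSWORD"],
        }
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty credentials.
        raise RuntimeError(f"Set {missing.args[0]} before connecting") from None


# Usage, mirroring the connect() call shown above:
# session = openprotein.connect(**load_credentials())
```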

Core Concepts#

Understanding these primitives will help you work effectively with the SDK:

Session Management

The session object (OpenProtein) is your gateway to all platform capabilities. It manages authentication and provides access to all API modules (session.embedding, session.fold, session.predictor, etc.).

Asynchronous Jobs

Most operations return Future objects that track asynchronous jobs. Call wait() to block until the job completes and retrieve its results. Learn more in the Job System section.
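
The submit-then-wait pattern resembles Python's standard concurrent.futures module, which makes a convenient offline analogy. The sketch below uses only the standard library and is not the SDK's Future class: there, result() plays the role that wait() plays on the platform, and count_residues is a made-up stand-in for remote work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_residues(sequence):
    """Stand-in for a remote job; sleeps briefly to simulate compute time."""
    time.sleep(0.05)
    return len(sequence)


with ThreadPoolExecutor(max_workers=1) as pool:
    # Submitting returns a handle immediately; the work runs in the background.
    handle = pool.submit(count_residues, "ACDEFGHIKLMNPQRSTVWY")
    # Other work could happen here while the job runs.
    # Block until the job finishes (or the timeout expires) and fetch the result.
    residue_count = handle.result(timeout=5)
```

The useful habit carries over: keep the handle, do other work, and block only at the point where the result is actually needed.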

Protein Primitives

Protein: Represents a single protein chain with its sequence and an optional MSA
Ligand, DNA, RNA: Represent the other supported chain types
Model: A collection of proteins and chains forming a complex
AssayDataset: Your experimental data (sequences + measured properties)

Embeddings & Reductions

Foundation models produce embeddings that can be reduced (MEAN, SUM), kept per-residue (with reduction=None), or transformed with a custom-fitted SVD. These embeddings power downstream prediction and design tasks.
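
To make the reduction options concrete, the sketch below applies them to a toy per-residue embedding in plain Python. The numbers are invented, and the string values "MEAN" and "SUM" are placeholders for however the SDK spells its reduction options; only the arithmetic is the point.

```python
# Toy per-residue embedding: 3 residues x 4 dimensions (made-up numbers).
per_residue = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]


def reduce_embedding(matrix, reduction=None):
    """Collapse the residue axis, or return the matrix unchanged for None."""
    if reduction is None:
        return matrix  # keep one vector per residue
    column_sums = [sum(column) for column in zip(*matrix)]
    if reduction == "SUM":
        return column_sums
    if reduction == "MEAN":
        return [total / len(matrix) for total in column_sums]
    raise ValueError(f"unknown reduction: {reduction!r}")


summed = reduce_embedding(per_residue, "SUM")    # one vector of length 4
averaged = reduce_embedding(per_residue, "MEAN")  # one vector of length 4
unreduced = reduce_embedding(per_residue)         # still 3 x 4
```

A reduced embedding has a fixed size regardless of sequence length, which is what lets sequences of different lengths share one downstream model.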

Platform Capabilities#

The SDK is organized around key protein engineering workflows:

Foundation Models#

Foundation models provide high-quality protein embeddings and sequence-level representations for downstream analysis and design.

They support both general-purpose and protein-family–conditioned workflows.

Capabilities

Access to PoET, proprietary OpenProtein models, and community models such as ESM
Per-residue embeddings or reduced representations (mean / sum pooling)
Logits and attention maps for interpretability

PoET#

PoET is a conditional protein language model designed for zero-shot prediction and generation conditioned on protein families.

Capabilities

Prompt construction from MSAs
Zero-shot sequence scoring without experimental data
Conditional sequence generation
Single-site variant effect analysis
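
Single-site analysis starts from the list of all one-residue substitutions of a parent sequence. The sketch below enumerates them locally so they can be passed to a scoring call; the helper name single_site_variants and the label format (wild type, 1-based position, new residue) are choices made for this example, not SDK conventions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_site_variants(sequence):
    """Yield (label, variant) pairs for every single substitution.

    Labels use the common wild-type / 1-based position / new-residue form.
    """
    for position, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                label = f"{wild_type}{position + 1}{residue}"
                yield label, sequence[:position] + residue + sequence[position + 1:]


variants = dict(single_site_variants("ACD"))
```

A sequence of length L made of standard residues yields 19 × L variants, so the three-residue example produces 57 entries.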

Data Management#

Upload and manage your experimental datasets

Store assay data (sequences + measurements) on the platform
Use datasets for training predictors and design workflows
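
As an illustration of the "sequences + measurements" shape, the sketch below builds a small assay table as CSV text with the standard library. The column names, the values, and the use of an empty cell for a missing measurement are assumptions for this example; check the data upload documentation for the format the platform actually expects.

```python
import csv
import io

# One row per variant: the sequence plus any measured properties (invented values).
measurements = [
    {"sequence": "ACDEFGHIKL", "activity": 1.25, "stability": 48.0},
    {"sequence": "ACDEFGHIKV", "activity": 0.90, "stability": 51.5},
    {"sequence": "ACDEFGHIKM", "activity": None, "stability": 47.2},  # activity not measured
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["sequence", "activity", "stability"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(measurements)  # None is written as an empty cell
assay_csv = buffer.getvalue()
```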

Prediction & Design#

Property Regression Models#

Train custom models on your data

Fit Gaussian Process models using foundation model embeddings
Cross-validation for uncertainty estimation
Predict properties for novel sequences
Single-site saturation mutagenesis analysis
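
Cross-validation holds out part of the data, trains on the rest, and compares predictions with the held-out measurements. The exact scheme the platform uses is not described here; the sketch below shows k-fold splitting as one common choice, with k_fold_indices as a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, held_out) index lists so each sample is held out exactly once."""
    # Deal indices round-robin into k folds of near-equal size.
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        yield train, held_out


splits = list(k_fold_indices(10, 5))
```

Comparing held-out predictions with the true measurements across all folds gives an estimate of how far predictions for new sequences can be trusted.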

Sequence Design#

Optimize sequences for your objectives

Genetic algorithm-based design using trained predictors
Multi-objective optimization support
Design novel variants optimized for your measured properties
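
For intuition about genetic algorithm-based design, the sketch below runs a generic mutate / recombine / select loop on a toy objective. It is not the platform's algorithm: toy_score is a placeholder for a trained predictor, and every function name and parameter here is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(sequence):
    """Placeholder objective: number of hydrophobic residues."""
    return sum(residue in "AILMFVW" for residue in sequence)


def mutate(sequence, rng):
    """Replace one randomly chosen position with a random residue."""
    position = rng.randrange(len(sequence))
    return sequence[:position] + rng.choice(AMINO_ACIDS) + sequence[position + 1:]


def crossover(first, second, rng):
    """Join a prefix of one parent to the matching suffix of the other."""
    cut = rng.randrange(1, len(first))
    return first[:cut] + second[cut:]


def evolve(seed, score, generations=20, population_size=30, keep=5, rng=None):
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    population = [seed] + [mutate(seed, rng) for _ in range(population_size - 1)]
    for _ in range(generations):
        # Elitism: the best sequences survive unchanged, so the top score never drops.
        elite = sorted(population, key=score, reverse=True)[:keep]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(population_size - keep)
        ]
        population = elite + children
    return max(population, key=score)


best = evolve("ACDEFGHIKL", toy_score)
```

Multi-objective design follows the same loop with a score that combines several predicted properties.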

Structure#

Structure Prediction#

Predict 3D structures from sequences

ESMFold for fast single-chain folding
AlphaFold2 for high-accuracy multi-chain complexes
Boltz (1, 1x, 2) for advanced complex prediction with constraints
RoseTTAFold3 as an alternative for multi-chain folding

Structure Generation#

Design binders or novel protein structures de novo

RFdiffusion for diffusion-based structure generation
BoltzGen for generative structure design
Useful for binder design and scaffold generation

Supporting Tools#

Alignment#

Multiple sequence alignment and antibody numbering

Create MSAs via homology search (MMseqs2)
MAFFT and Clustal Omega alignment
AbNumber for antibody numbering schemes

Dimensionality Reduction#

Visualize and analyze embeddings

SVD for linear dimensionality reduction
UMAP for non-linear manifold learning
Fit on training data, transform new sequences
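
The fit-then-transform split means the projection is learned once from training data and then applied unchanged to new inputs. The sketch below illustrates this with a dependency-free rank-1 reduction: it centers the data and finds the leading direction by power iteration. It is an illustration under those assumptions, not the platform's SVD implementation. The function names are hypothetical, and real code would normally call numpy.linalg.svd instead.

```python
import math


def fit_first_component(rows, iterations=100):
    """Learn column means and the leading direction of the centered data."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    direction = [1.0] * len(means)  # assumes this start is not orthogonal to the answer
    for _ in range(iterations):
        # Apply X^T X to the direction without forming the matrix explicitly.
        projections = [sum(x * d for x, d in zip(row, direction)) for row in centered]
        direction = [
            sum(p * row[j] for p, row in zip(projections, centered))
            for j in range(len(means))
        ]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
    return means, direction


def transform(row, means, direction):
    """Project a new row using the parameters learned during fitting."""
    return sum((x - m) * d for x, m, d in zip(row, means, direction))


train_rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
means, direction = fit_first_component(train_rows)
new_coordinate = transform([4.0, 8.0], means, direction)  # a point not seen in training
```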

Common Workflows#

Workflow 1: Zero-shot prediction with PoET

Create an MSA from your seed sequence → session.align.create_msa()
Create a prompt from the MSA → session.prompt.create()
Score your variants → session.embedding.poet.score()

Workflow 2: Train a custom predictor

Upload your assay data → session.data.create()
Train a GP model → session.embedding.esm2.fit_gp()
Predict on new sequences → predictor.predict()
Design optimized variants → session.design.genetic_algorithm()

Workflow 3: Structure prediction

For single chains: session.fold.esmfold.fold()
For complexes: Create MSA → Build Protein objects → session.fold.alphafold2.fold()

Next Steps#

New users: Start with Installation and Quickstart
Learn the basics: Review the Job System to understand async operations
Explore tutorials: Browse capability-specific tutorials below
API reference: Detailed documentation for all classes and methods