Python API Documentation

The OpenProtein Python SDK provides a Pythonic interface to the OpenProtein.AI platform for protein engineering. With this client library you can generate embeddings with state-of-the-art foundation models, train custom predictors, design novel sequences, and predict protein structures.

Getting Started

  1. Install the package via pip or conda (installation guide)

  2. Create a session to authenticate with the platform (session setup)

  3. Choose your workflow based on your protein engineering goals

Usage

import openprotein

# Connect to the platform
session = openprotein.connect(username="your_username", password="your_password")

# Example: Generate embeddings
future = session.embedding.esm2.embed(sequences=["ACDEFGHIKLMNPQRSTVWY"])
embeddings = future.wait()

Core Concepts

Understanding these primitives will help you work effectively with the SDK:

Session Management

The session object (OpenProtein) is your gateway to all platform capabilities. It manages authentication and provides access to all API modules (session.embedding, session.fold, session.predictor, etc.).

Asynchronous Jobs

Most operations return Future objects that track asynchronous jobs running on the platform. Call wait() to block until a job completes and return its results. Learn more in the Job System section.
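
The SDK's own Future class is covered in the Job System section. Purely as a mental model of the submit-then-wait pattern, the sketch below uses a hypothetical ToyFuture that runs work on a background thread and polls for completion. Only the idea of calling wait() comes from this page; the ToyFuture class, its done() method, and the poll_interval and timeout parameters are illustrative assumptions, not the SDK's API.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ToyFuture(Generic[T]):
    """Hypothetical stand-in for a platform job handle; not the SDK's Future class."""

    def __init__(self, work: Callable[[], T]) -> None:
        self._result: Optional[T] = None
        self._error: Optional[Exception] = None
        self._finished = threading.Event()
        # Run the "job" on a background thread to mimic work happening on the server.
        threading.Thread(target=self._run, args=(work,), daemon=True).start()

    def _run(self, work: Callable[[], T]) -> None:
        try:
            self._result = work()
        except Exception as exc:  # record the failure so wait() can re-raise it
            self._error = exc
        finally:
            self._finished.set()

    def done(self) -> bool:
        """Non-blocking status check."""
        return self._finished.is_set()

    def wait(self, poll_interval: float = 0.01, timeout: Optional[float] = None) -> T:
        """Poll until the job finishes, then return its result (or re-raise its error)."""
        start = time.monotonic()
        while not self.done():
            if timeout is not None and time.monotonic() - start > timeout:
                raise TimeoutError("job did not finish before the timeout")
            time.sleep(poll_interval)
        if self._error is not None:
            raise self._error
        return self._result  # type: ignore[return-value]


def slow_length(seq: str) -> int:
    time.sleep(0.05)  # pretend this is a long-running remote computation
    return len(seq)


future = ToyFuture(lambda: slow_length("ACDEFGHIKLMNPQRSTVWY"))
print(future.wait())  # 20
```

The design point is that submitting work returns immediately, so several jobs can be started before any of them is waited on.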

Protein Primitives
  • Protein: Represents a single protein chain, with its sequence and an optional MSA

  • Ligand, DNA, RNA: Represent the other chain types

  • Model: A collection of protein and non-protein chains that together form a complex

  • AssayDataset: Your experimental data (sequences + measured properties)

Embeddings & Reductions

Foundation models produce embeddings that can be pooled across residues into a single vector (MEAN, SUM), kept per residue (with reduction=None), or transformed with a custom-fitted SVD. These embeddings power downstream prediction and design tasks.
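
To make the reduction options concrete, here is a stdlib-only sketch that pools a per-residue embedding matrix (one vector per residue, the shape you keep with reduction=None) into one sequence-level vector. The helper name reduce_embedding and its string labels are hypothetical; they mirror the MEAN and SUM options above but are not the SDK's reduction type.

```python
from typing import List, Optional, Union

Matrix = List[List[float]]  # shape: (sequence_length, embedding_dim)


def reduce_embedding(per_residue: Matrix, reduction: Optional[str]) -> Union[Matrix, List[float]]:
    """Pool per-residue vectors into one vector (hypothetical helper)."""
    if reduction is None:
        return per_residue  # keep one vector per residue
    dim = len(per_residue[0])
    # Column-wise sum across residues.
    totals = [sum(row[j] for row in per_residue) for j in range(dim)]
    if reduction == "SUM":
        return totals
    if reduction == "MEAN":
        return [t / len(per_residue) for t in totals]
    raise ValueError(f"unknown reduction: {reduction!r}")


# Toy 3-residue sequence with 2-dimensional embeddings.
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(reduce_embedding(emb, "SUM"))   # [9.0, 12.0]
print(reduce_embedding(emb, "MEAN"))  # [3.0, 4.0]
```

Mean pooling gives a vector whose scale does not grow with sequence length, while sum pooling does; per-residue output is what you keep for position-specific analyses.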

Platform Capabilities

The SDK is organized around key protein engineering workflows:

Foundation Models

Foundation models produce protein embeddings, per residue or per sequence, for downstream analysis and design.

They support both general-purpose workflows and workflows conditioned on a protein family.

Capabilities

  • Access to PoET, proprietary OpenProtein models, and community models such as ESM

  • Per-residue embeddings or reduced representations (mean / sum pooling)

  • Logits and attention maps for interpretability
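
Logits are unnormalized per-position scores over the amino-acid alphabet. One common way to read them, shown here as a stdlib sketch, is to apply a log-softmax at each position and sum the log-probabilities of the observed residues. The three-letter vocabulary, the logit values, and the function names are made up for illustration and do not reflect the shape or vocabulary ordering of the platform's logits output.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_likelihood(seq: str, logits: List[List[float]], vocab: Dict[str, int]) -> float:
    """Sum log p(observed residue) over positions, one logits row per position."""
    total = 0.0
    for i, aa in enumerate(seq):
        total += log_softmax(logits[i])[vocab[aa]]
    return total


vocab = {"A": 0, "C": 1, "D": 2}             # toy 3-letter alphabet
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]  # 2 positions x 3 letters
print(sequence_log_likelihood("AD", logits, vocab))  # favored residues: higher score
print(sequence_log_likelihood("CC", logits, vocab))  # disfavored residues: lower score
```

For masked models such as ESM, summing per-position log-probabilities from a single forward pass is a heuristic score rather than an exact sequence likelihood.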

PoET

PoET is a protein language model that conditions on a prompt of sequences from a protein family, enabling zero-shot prediction and family-conditioned generation.

Capabilities

  • Prompt construction from MSAs

  • Zero-shot sequence scoring without experimental data

  • Conditional sequence generation

  • Single-site variant effect analysis
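
Single-site variant effect analysis scores every substitution at every position relative to the wild type. The sketch below enumerates all 19 substitutions per position and ranks them by score(variant) minus score(wild type). The helper names and toy_score are hypothetical; toy_score is a placeholder for a model score such as PoET's zero-shot score, not a real predictor.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(wt: str) -> Dict[str, str]:
    """Map codes like 'K2A' (wild-type residue, 1-based position, new residue) to sequences."""
    variants = {}
    for i, wt_aa in enumerate(wt):
        for aa in AMINO_ACIDS:
            if aa != wt_aa:
                variants[f"{wt_aa}{i + 1}{aa}"] = wt[:i] + aa + wt[i + 1:]
    return variants


def rank_by_effect(wt: str, score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (mutation, score(variant) - score(wt)) pairs, most beneficial first."""
    wt_score = score(wt)
    effects = [(code, score(seq) - wt_score) for code, seq in single_site_variants(wt).items()]
    return sorted(effects, key=lambda item: item[1], reverse=True)


def toy_score(seq: str) -> float:
    # Hypothetical stand-in for a model score: counts hydrophobic residues.
    return float(sum(seq.count(aa) for aa in "AILMFVW"))


ranking = rank_by_effect("MKT", toy_score)
print(len(ranking))  # 57 (19 substitutions x 3 positions)
print(ranking[:3])
```

Scoring relative to the wild type makes effects comparable across positions, since only the difference from the starting sequence matters.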

Data Management

Upload and manage your experimental datasets

  • Store assay data (sequences + measurements) on the platform

  • Use datasets for training predictors and design workflows
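
An AssayDataset is essentially a table: one row per sequence, a sequence column, and one column per measured property, with blanks where a property was not measured. The sketch below parses such a table from CSV text using only the standard library. The column names (sequence, expression, stability) and the CSV layout are illustrative assumptions, not the platform's required upload schema, and load_assay is a hypothetical helper.

```python
import csv
import io
import math
from typing import Dict, List

CSV_TEXT = """sequence,expression,stability
ACDEFGHIKL,1.2,0.8
ACDEFGHIKV,0.9,
ACDEFGHIRL,,1.1
"""


def load_assay(text: str) -> Dict[str, List]:
    """Parse assay CSV text into column lists; blank measurements become NaN."""
    reader = csv.DictReader(io.StringIO(text))
    properties = [name for name in reader.fieldnames if name != "sequence"]
    table: Dict[str, List] = {"sequence": [], **{p: [] for p in properties}}
    for row in reader:
        table["sequence"].append(row["sequence"])
        for p in properties:
            value = row[p].strip()
            table[p].append(float(value) if value else math.nan)
    return table


assay = load_assay(CSV_TEXT)
print(assay["sequence"])
print(assay["stability"])  # [0.8, nan, 1.1]
```

Keeping missing measurements as NaN rather than dropping rows lets each property use every sequence that was actually measured for it.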

Prediction & Design

Property Regression Models

Train custom models on your data

  • Fit Gaussian Process models using foundation model embeddings

  • Cross-validation to assess predictive accuracy and uncertainty calibration

  • Predict properties for novel sequences

  • Single-site saturation mutagenesis analysis
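
A Gaussian Process predictor maps an embedding to a predictive mean and variance. The stdlib sketch below shows what exact, zero-mean GP regression computes, using an RBF kernel and a Cholesky solve. The 2-D toy vectors stand in for real embeddings, and the kernel, length scale, and noise level are assumptions rather than the platform's GP configuration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rbf(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L.T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: Vector) -> Vector:
    """Forward substitution: solve L x = b."""
    x: Vector = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_lower_transpose(L: List[List[float]], b: Vector) -> Vector:
    """Back substitution: solve L.T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X: List[Vector], y: Vector, x_new: Vector, noise: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and (latent) variance at x_new."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y))  # alpha = K^-1 y
    k_star = [rbf(x, x_new) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # v = L^-1 k_star
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, var


X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy "embeddings"
y_train = [0.0, 1.0, 1.0]                       # toy measured property
mean_near, var_near = gp_predict(X_train, y_train, [1.0, 0.0])
mean_far, var_far = gp_predict(X_train, y_train, [5.0, 5.0])
print(mean_near, var_near)
print(mean_far, var_far)
```

Far from the training data the prediction falls back to the prior (mean near 0, variance near 1 here), which is the uncertainty signal a GP adds on top of a point estimate.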

Sequence Design

Optimize sequences for your objectives

  • Genetic algorithm-based design using trained predictors

  • Multi-objective optimization support

  • Design novel variants optimized for your measured properties
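
The genetic-algorithm idea behind sequence design fits in a few lines: keep a population of sequences, propose point mutations, score them with a predictor, and keep the best. In this stdlib sketch toy_fitness is a hypothetical stand-in for a trained predictor, and the population size, single-mutation proposals, and truncation selection are illustrative choices, not the platform's algorithm settings. It optimizes a single objective for simplicity.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random) -> str:
    """Apply one random point substitution."""
    i = rng.randrange(len(seq))
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[i]])
    return seq[:i] + new_aa + seq[i + 1:]


def genetic_algorithm(
    seed_seq: str,
    fitness: Callable[[str], float],
    pop_size: int = 20,
    generations: int = 30,
    rng_seed: int = 0,
) -> str:
    rng = random.Random(rng_seed)  # seeded for reproducible runs
    population = [seed_seq] * pop_size
    for _ in range(generations):
        # Each parent produces one mutated child; parents stay in the pool (elitism).
        children = [mutate(parent, rng) for parent in population]
        pool = population + children
        # Truncation selection: keep the top pop_size sequences by fitness.
        population = sorted(pool, key=fitness, reverse=True)[:pop_size]
    return population[0]


def toy_fitness(seq: str) -> float:
    # Hypothetical objective: number of hydrophobic residues.
    return float(sum(seq.count(aa) for aa in "AILMFVW"))


best = genetic_algorithm("KKKKKKKKKK", toy_fitness)
print(best, toy_fitness(best))
```

Keeping parents in the selection pool means the best score never decreases between generations; a multi-objective variant would replace the single fitness sort with, for example, Pareto ranking.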

Structure

Structure Prediction

Predict 3D structures from sequences

  • ESMFold for fast single-chain folding

  • AlphaFold2 for high-accuracy multi-chain complexes

  • Boltz-1, Boltz-1x, and Boltz-2 for complex prediction with optional constraints

  • RoseTTAFold3 for alternative multi-chain folding

Structure Generation

Design binders or novel protein structures de novo

  • RFdiffusion for diffusion-based structure generation

  • BoltzGen for generative structure design

  • Useful for binder design and scaffold generation

Supporting Tools

Alignment

Multiple sequence alignment and antibody numbering

  • Create MSAs via homology search (MMseqs2)

  • Alignment with MAFFT or Clustal Omega

  • AbNumber for antibody numbering schemes
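
Once an MSA exists, a common first check is how similar each homolog is to the query. The stdlib sketch below parses aligned FASTA text and computes percent identity over columns where neither sequence has a gap. The "-" gap character, the toy alignment, and the helper names are assumptions for illustration; MMseqs2 results are often in A3M format, which additionally marks insertions with lowercase letters.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs from FASTA text."""
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


MSA = """>query
MKTAYIAK
>hom1
MKTAYLAK
>hom2
MR-AY-AK
"""

records = parse_fasta(MSA)
query = records[0][1]
for name, seq in records[1:]:
    print(name, round(percent_identity(query, seq), 1))
# hom1 87.5
# hom2 83.3
```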

Dimensionality Reduction

Visualize and analyze embeddings

  • SVD for linear dimensionality reduction

  • UMAP for non-linear manifold learning

  • Fit on training data, transform new sequences
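
The fit-then-transform pattern means the projection is learned once from training embeddings and then reused, unchanged, for new sequences. The stdlib sketch below learns the top right-singular vector of a training matrix by power iteration on X^T X (a one-component truncated SVD, without mean-centering) and projects new embeddings onto it. It illustrates the pattern only; the helper names, the single component, and the absence of centering are assumptions, not the platform's SVD implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def fit_top_component(X: List[Vector], iters: int = 200, seed: int = 0) -> Vector:
    """Power iteration on X^T X; returns the top right-singular vector of X."""
    dim = len(X[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(dim)]  # positive random start
    for _ in range(iters):
        Xv = [sum(r * c for r, c in zip(row, v)) for row in X]                # X v
        w = [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(dim)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def transform(X_new: List[Vector], component: Vector) -> List[float]:
    """Project new embeddings onto the fitted component (1-D coordinates)."""
    return [sum(a * b for a, b in zip(row, component)) for row in X_new]


# Toy training embeddings whose variation lies mostly along the first axis.
X_train = [[2.0, 0.1], [4.0, -0.1], [6.0, 0.2]]
component = fit_top_component(X_train)
print([round(c, 3) for c in component])
print(transform([[1.0, 0.0], [10.0, 0.0]], component))
```

Because the component is fixed after fitting, new sequences land in the same coordinate system as the training set.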

Common Workflows

Workflow 1: Zero-shot prediction with PoET

  1. Create an MSA from your seed sequence → session.align.create_msa()

  2. Create a prompt from the MSA → session.prompt.create()

  3. Score your variants → session.embedding.poet.score()

Workflow 2: Train a custom predictor

  1. Upload your assay data → session.data.create()

  2. Train a GP model → session.embedding.esm2.fit_gp()

  3. Predict on new sequences → predictor.predict()

  4. Design optimized variants → session.design.genetic_algorithm()

Workflow 3: Structure prediction

  1. For single chains: session.fold.esmfold.fold()

  2. For complexes: Create MSA → Build Protein objects → session.fold.alphafold2.fold()

Next Steps

  • New users: Start with Installation and Quickstart

  • Learn the basics: Review the Job System to understand async operations

  • Explore tutorials: Browse capability-specific tutorials below

  • API reference: Detailed documentation for all classes and methods