openprotein.protein.Protein
- class openprotein.protein.Protein(sequence=None, coordinates=None, plddt=None, name=None)
Represents a protein with optional sequence, atomic coordinates, per-residue confidence scores (pLDDT), and name.
This class supports partial or complete information: users may initialize a Protein with only a sequence, only a structure, or both. The class ensures that all provided fields have consistent residue-level lengths and provides convenient methods for indexing, masking, and structural comparisons.
- sequence
Amino acid sequence as bytes. Unknown or masked residues are represented as b"X".
- coordinates
An array holding the 3D coordinates of the protein's heavy atoms in atom37 format. It has shape (L, 37, 3), where L is the number of residues, 37 is the number of heavy-atom slots per residue, and 3 holds the x, y, and z coordinates.
- plddt
An array of shape (L,). For predicted structures, it holds the pLDDT of each residue, a measure of prediction confidence. For experimental structures, set it to 100 where the alpha-carbon coordinates are known and NaN otherwise.
- name
Optional identifier for the protein as a string.
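The data layout described by these attributes can be sketched in standalone Python. This is only an illustration: it uses nested lists instead of the float32 arrays the class expects, and `NUM_ATOM_SLOTS`, `CA_INDEX`, `empty_coordinates`, and `experimental_plddt` are hypothetical names, not part of openprotein. The alpha-carbon slot index of 1 assumes the usual atom37 ordering (N, CA, C, ...).

```python
import math

NUM_ATOM_SLOTS = 37  # atom37: one slot per heavy-atom type
CA_INDEX = 1         # assumed alpha-carbon slot (N, CA, C, ... ordering)


def empty_coordinates(length):
    """Return an (L, 37, 3) nested list filled with NaN (no structure known)."""
    return [[[math.nan] * 3 for _ in range(NUM_ATOM_SLOTS)] for _ in range(length)]


def experimental_plddt(coordinates):
    """Return 100.0 where the alpha carbon is known and NaN otherwise."""
    plddt = []
    for residue in coordinates:
        ca_known = not any(math.isnan(v) for v in residue[CA_INDEX])
        plddt.append(100.0 if ca_known else math.nan)
    return plddt


sequence = b"ACD"
coords = empty_coordinates(len(sequence))
coords[0][CA_INDEX] = [1.0, 2.0, 3.0]  # only the first residue has a known CA
plddt = experimental_plddt(coords)
```

Here only the first residue ends up with a pLDDT of 100; the other two stay NaN because their alpha-carbon coordinates are missing.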
- Conventions:
Missing or unknown residues in the sequence are denoted by b"X".
Missing structural data (coordinates or pLDDT) are represented by NaN.
Residue indices are 1-based for user-facing methods (e.g., mask_sequence_at), but internally stored as 0-based arrays.
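The indexing convention can be illustrated with a small standalone sketch of sequence masking. It is not the library's implementation: `mask_positions` and `mask_all_except` are hypothetical helpers that only mirror the documented behavior of 1-based positions mapping onto 0-based storage, and raising IndexError for out-of-range positions is an assumption.

```python
def mask_positions(sequence, positions):
    """Return a copy of `sequence` with the given 1-based positions set to b"X"."""
    masked = bytearray(sequence)
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise IndexError(f"position {pos} is outside 1..{len(sequence)}")
        masked[pos - 1] = ord("X")  # 1-based position -> 0-based index
    return bytes(masked)


def mask_all_except(sequence, positions):
    """Return a copy of `sequence` with every residue masked except `positions`."""
    keep = set(positions)
    return bytes(
        byte if i in keep else ord("X")
        for i, byte in enumerate(sequence, start=1)  # enumerate in 1-based terms
    )


masked = mask_positions(b"ACDEFGHIK", [1, 3])      # b"XCXEFGHIK"
kept_only = mask_all_except(b"ACDEFGHIK", [1, 3])  # b"AXDXXXXXX"
```

Position 1 refers to the first residue (A), so masking positions 1 and 3 replaces A and D and leaves C untouched.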
Examples
- Create a Protein from sequence only:
Protein(sequence="ACDEFGHIK")
- Create a Protein from sequence and name:
Protein(sequence="ACDEFGHIK", name="my_protein")
- Create a Protein with sequence and structure:
Protein(sequence="ACD", coordinates=coords_array, plddt=plddt_array)
- Raises:
ValueError – If sequence, coordinates, or pLDDT are specified with inconsistent lengths.
ValueError – If none of sequence, coordinates, or pLDDT are provided.
- Parameters:
sequence (bytes | str | None)
coordinates (ndarray[tuple[Any, ...], dtype[float32]] | None)
plddt (ndarray[tuple[Any, ...], dtype[float32]] | None)
name (bytes | str | None)
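The two ValueError conditions listed under Raises can be pictured as a residue-length check along the following lines. This is a minimal sketch under stated assumptions, not the library's actual validation code; `check_lengths` is a hypothetical helper and the error messages are invented for illustration.

```python
def check_lengths(sequence=None, coordinates=None, plddt=None):
    """Return the shared residue count L, or raise ValueError."""
    fields = (("sequence", sequence), ("coordinates", coordinates), ("plddt", plddt))
    # Only fields that were actually provided take part in the comparison.
    lengths = {name: len(value) for name, value in fields if value is not None}
    if not lengths:
        raise ValueError("provide at least one of sequence, coordinates, or plddt")
    if len(set(lengths.values())) > 1:
        raise ValueError(f"inconsistent residue lengths: {lengths}")
    return next(iter(lengths.values()))


n_residues = check_lengths(sequence=b"ACD", plddt=[90.0, 80.0, 70.0])  # 3
```

Calling `check_lengths()` with no arguments, or with a three-residue sequence and a one-element pLDDT list, raises ValueError in this sketch.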
- __init__(sequence=None, coordinates=None, plddt=None, name=None)
- Parameters:
sequence (bytes | str | None)
coordinates (ndarray[tuple[Any, ...], dtype[float32]] | None)
plddt (ndarray[tuple[Any, ...], dtype[float32]] | None)
name (bytes | str | None)
Methods
- __init__([sequence, coordinates, plddt, name])
- at(positions)
Return a new Protein object containing residues at given 1-indexed positions.
- from_filepath(path, chain_id[, ...])
Create a Protein from a structure file.
- from_string(filestring, format, chain_id[, ...])
- from_structure(structure, chain_id[, ...])
- make_cif_string()
- make_fasta_bytes()
- make_pdb_string()
- mask_sequence_at(positions)
Mask sequence at given 1-indexed positions.
- mask_sequence_except_at(positions)
Mask sequence at all positions except the given 1-indexed positions.
- mask_structure_at(positions)
Mask structure at given 1-indexed positions.
- mask_structure_except_at(positions)
Mask structure at all positions except the given 1-indexed positions.
- rmsd(tgt[, backbone_only])
Compute the root-mean-square deviation (RMSD) between this Protein and a target Protein.
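For intuition about the RMSD computation, the sketch below evaluates it over atom37-style nested lists. It makes several assumptions that this page does not confirm: no superposition is performed before measuring, atoms that are NaN in either structure are skipped, and backbone atoms are taken to be the N, CA, and C slots (indices 0 to 2). `rmsd_sketch` and `one_residue` are hypothetical names, not openprotein functions.

```python
import math

BACKBONE_SLOTS = (0, 1, 2)  # assumed N, CA, C slots in atom37 ordering


def rmsd_sketch(src, tgt, backbone_only=False):
    """RMSD over atoms present in both structures, without superposition."""
    if len(src) != len(tgt):
        raise ValueError("structures must have the same number of residues")
    total, count = 0.0, 0
    for res_a, res_b in zip(src, tgt):
        slots = BACKBONE_SLOTS if backbone_only else range(len(res_a))
        for slot in slots:
            a, b = res_a[slot], res_b[slot]
            if any(math.isnan(v) for v in (*a, *b)):
                continue  # atom missing in at least one structure
            total += sum((x - y) ** 2 for x, y in zip(a, b))
            count += 1
    return math.sqrt(total / count) if count else math.nan


def one_residue(backbone_xyz):
    """Build a one-residue structure whose backbone atoms share one position."""
    residue = [[math.nan] * 3 for _ in range(37)]
    for slot in BACKBONE_SLOTS:
        residue[slot] = list(backbone_xyz)
    return [residue]


src = one_residue((0.0, 0.0, 0.0))
tgt = one_residue((3.0, 4.0, 0.0))
distance = rmsd_sketch(src, tgt, backbone_only=True)  # every atom is displaced by 5 units
```

Because every compared atom is displaced by the same 3-4-5 vector, the result is 5.0 whether or not `backbone_only` is set; the two modes differ only when side-chain atoms are present.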
Attributes
- chain_id
- cyclic
- has_structure
Whether or not the structure is known at any position in the protein.
- msa