openprotein.data

Upload your dataset to OpenProtein.AI’s engineering platform for training, prediction, and design tasks.

Interface

class openprotein.data.DataAPI(session)

Interface for calling the assay data endpoints.

Parameters:

session (APISession)

list()

List all assay datasets.

Returns:

List of all assay datasets.

Return type:

List[AssayDataset]
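As a minimal sketch of working with the result of list(), the helper below filters the returned datasets by name. It assumes each AssayDataset keeps the metadata object passed to its constructor on a `metadata` attribute. `find_datasets_by_name` is a hypothetical helper, not part of the library.

```python
def find_datasets_by_name(data_api, name):
    """Return every dataset from data_api.list() whose assay name equals `name`.

    Assumption: each AssayDataset exposes its AssayMetadata as `.metadata`.
    """
    return [
        dataset
        for dataset in data_api.list()
        if dataset.metadata.assay_name == name
    ]
```

The helper returns a list (possibly empty) rather than a single dataset, because this sketch does not assume that assay names are unique.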

create(table, name, description=None)

Create a new assay dataset.

Parameters:
  • table (pd.DataFrame) – DataFrame containing the assay data.

  • name (str) – Name of the assay dataset.

  • description (str, optional) – Description of the assay dataset, by default None.

Returns:

Created assay dataset.

Return type:

AssayDataset
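The following sketch shows one way to prepare a table and pass it to create(). The column names (`sequence`, `activity`), the example values, and the `upload_table` helper are illustrative assumptions. This page does not specify the column layout the platform expects.

```python
import pandas as pd

# Hypothetical assay table: one sequence column and one measurement column.
table = pd.DataFrame(
    {
        "sequence": ["MKTAYIAK", "MKTAYIAR", "MKTGYIAK"],
        "activity": [0.82, 0.47, 1.10],
    }
)


def upload_table(data_api, table, name, description=None):
    """Create an assay dataset from `table`, refusing to upload an empty one."""
    if table.empty:
        raise ValueError("refusing to upload an empty table")
    # Mirrors the documented signature: create(table, name, description=None).
    return data_api.create(table, name, description=description)
```

Given a DataAPI instance `data_api`, calling `upload_table(data_api, table, "demo assay")` would return the created AssayDataset.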

get(assay_id, verbose=False)

Get an assay dataset by its ID.

Parameters:
  • assay_id (str) – ID of the assay dataset.

  • verbose (bool, optional) – Defaults to False.

Returns:

Assay dataset with the specified ID.

Return type:

AssayDataset

Raises:

KeyError – If no assay dataset with the given ID is found.
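Because get() raises KeyError for an unknown ID, callers that treat a missing dataset as a normal outcome may want to catch it. A minimal sketch, where `get_or_none` is a hypothetical helper name:

```python
def get_or_none(data_api, assay_id):
    """Return the dataset with the given ID, or None if it does not exist."""
    try:
        return data_api.get(assay_id)
    except KeyError:
        # get() is documented to raise KeyError when the ID is not found.
        return None
```

Returning None keeps the lookup and the "not found" branch in one place. Code that should fail loudly on a bad ID can call get() directly instead.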

Classes

class openprotein.data.AssayDataset(session, metadata)

Assay dataset containing your sequences and measurements, which can be used to train predictors.

Parameters:
  • session (APISession)

  • metadata (AssayMetadata)

list_models()

List models associated with the assay.

Returns:

List of models.

Return type:

List

update(assay_name=None, assay_description=None)

Update the assay metadata.

Parameters:
  • assay_name (str, optional) – New name of the assay, by default None.

  • assay_description (str, optional) – New description of the assay, by default None.

Return type:

None

get_first()

Get the head slice of the assay data.

Returns:

DataFrame containing the head slice of the assay data.

Return type:

pd.DataFrame

get_slice(start, end)

Get a slice of assay data.

Parameters:
  • start (int) – Start index of the slice.

  • end (int) – End index of the slice.

Returns:

DataFrame containing the slice of assay data.

Return type:

pd.DataFrame
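For large datasets it can be convenient to read the data in fixed-size pages with get_slice(). The sketch below assumes that `end` is exclusive, which this page does not state, and that the caller supplies the total row count (for example, from the `num_rows` metadata field). `iter_slices` is a hypothetical helper.

```python
def iter_slices(dataset, total_rows, page_size=1000):
    """Yield consecutive slices of `dataset` covering rows [0, total_rows).

    Assumption: get_slice(start, end) treats `end` as exclusive.
    """
    for start in range(0, total_rows, page_size):
        # Clamp the last page so it does not run past the final row.
        end = min(start + page_size, total_rows)
        yield dataset.get_slice(start, end)
```

If `end` turns out to be inclusive, adjacent pages would overlap by one row and the bounds would need adjusting.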

class openprotein.data.AssayMetadata(*, assay_name, assay_description, assay_id, original_filename, created_date, num_rows, num_entries, measurement_names, sequence_length=None)

Parameters:
  • assay_name (str)

  • assay_description (str)

  • assay_id (str)

  • original_filename (str)

  • created_date (datetime)

  • num_rows (int)

  • num_entries (int)

  • measurement_names (list[str])

  • sequence_length (int | None)
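The fields above can be combined into a short human-readable summary. This sketch assumes the metadata object exposes each field as an attribute with the same name as its constructor keyword. `describe_assay` is a hypothetical helper.

```python
def describe_assay(metadata):
    """Build a one-line summary from an AssayMetadata-like object.

    Assumption: fields are readable as attributes named after the
    constructor keywords (assay_name, assay_id, num_rows, measurement_names).
    """
    measurements = ", ".join(metadata.measurement_names)
    return (
        f"{metadata.assay_name} [{metadata.assay_id}]: "
        f"{metadata.num_rows} rows, measurements: {measurements}"
    )
```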