Uploading data

This tutorial shows you how to upload a dataset for training a model.

What you need before getting started

Format your dataset as a CSV file with at least two columns: one containing the full sequence of each variant, and one or more containing your measured properties.
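Before uploading, you may want to check the file's layout locally. The sketch below is a hypothetical helper, not part of the platform's API. It assumes the sequence column is named sequence, as in the example dataset, and treats every other numeric column as a measured property, so text columns such as a variant name are skipped. The platform's own validation rules may differ.

```python
import pandas as pd


def check_assay_format(df: pd.DataFrame, sequence_col: str = "sequence") -> list[str]:
    """Hypothetical local check of the upload layout; returns the measurement columns.

    Assumes the sequences live in `sequence_col` and that every other numeric
    column is a measured property. Raises ValueError if the layout looks wrong.
    """
    if sequence_col not in df.columns:
        raise ValueError(f"missing sequence column {sequence_col!r}")
    if df[sequence_col].isna().any():
        raise ValueError("every row needs a full variant sequence")

    # Numeric columns other than the sequence are treated as measurements;
    # text columns such as a variant name are skipped.
    measurements = [
        col for col in df.columns
        if col != sequence_col and pd.api.types.is_numeric_dtype(df[col])
    ]
    if not measurements:
        raise ValueError("expected at least one numeric measurement column")
    return measurements


# Toy example: made-up sequences, a name column, and one measurement
# column with a missing value.
example = pd.DataFrame({
    "sequence": ["MKTAYIAK", "MKTAYIAR"],
    "Name": ["v1", "v2"],
    "fitness": [0.12, float("nan")],
})
print(check_assay_format(example))  # ['fitness']
```

When writing your CSV from pandas, DataFrame.to_csv(path, index=False) keeps pandas from adding an unnamed index column to the file.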

Uploading your dataset

Load your dataset with pandas, replacing ./example_dataset.csv with the path to your own CSV file:

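```python
import pandas as pd
# Load the CSV into a DataFrame; replace the path with your own file.
dataset = pd.read_csv("./example_dataset.csv")
# Preview the first two rows.
dataset.head(2)
```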
```text
   sequence                                           Name   acetamide_normalized_fitness  isobutyramide_normalized_fitness  propionamide_normalized_fitness
0  WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK...  Seq-1                           NaN                           -0.5174                              NaN
1  WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK...  seq-2                       -2.1514                           -0.5154                          -1.1457
```
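The NaN values in the preview mark properties that were not measured for a given variant. The sketch below uses a toy CSV with made-up sequences, values, and column names (assay_a, assay_b) to show that pandas reads an empty cell as NaN, and to count the observed values in each measurement column. It assumes, based on the example above, that leaving a cell blank is an acceptable way to record a missing measurement in your upload.

```python
import io

import pandas as pd

# Toy CSV in the same layout as the example dataset (made-up sequences and
# values). The first variant has no value for "assay_a", so that cell is empty.
csv_text = """sequence,Name,assay_a,assay_b
MKTAYIAK,Seq-1,,0.25
MKTAYIAR,Seq-2,-1.5,0.40
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty cells are parsed as NaN, matching the preview above.
print(df["assay_a"].isna().tolist())  # [True, False]

# Number of observed (non-missing) values in each measurement column.
measurement_cols = ["assay_a", "assay_b"]
observed = {col: int(n) for col, n in df[measurement_cols].notna().sum().items()}
print(observed)  # {'assay_a': 1, 'assay_b': 2}
```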

Using your data

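To use the data with our suite of tools, upload it to the backend by creating a dataset with session.data.create, passing the DataFrame, a name, and a description. This step assumes session is an authenticated session with the platform: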

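```python
# Upload the DataFrame as a new dataset (assay); session must already be authenticated.
assay = session.data.create(dataset, "Dataset Name", "Dataset description")
# Keep the assay ID for use in later steps.
assay_id = assay.id
assay
```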
```text
AssayMetadata(assay_name='Dataset Name', assay_description='Dataset description',
              assay_id='b6ee60f0-05ad-4c55-ad25-8006b09220ba', original_filename='assay_data',
              created_date=datetime.datetime(2024, 5, 9, 5, 57, 9, 973856),
              num_rows=15, num_entries=41,
              measurement_names=['isobutyramide_normalized_fitness', 'acetamide_normalized_fitness', 'propionamide_normalized_fitness'],
              sequence_length=346)
```
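As a sanity check, you can compare the returned metadata with values computed from your local DataFrame. The helper below, expected_metadata, is hypothetical and rests on two assumptions the platform does not state: that num_entries counts non-missing measurement values (for the example, 15 rows times 3 measurement columns gives 45 cells, which would be consistent with 41 entries if 4 cells are empty), and that sequence_length is the length of the variant sequences, taken here as the longest one.

```python
import pandas as pd


def expected_metadata(
    df: pd.DataFrame, measurement_cols: list[str], sequence_col: str = "sequence"
) -> dict:
    """Hypothetical helper: local counterparts of some AssayMetadata fields.

    Assumptions (not confirmed by the platform): num_entries counts
    non-missing measurement values, and sequence_length is the length of
    the longest variant sequence.
    """
    return {
        "num_rows": len(df),
        # Count every non-NaN cell across the measurement columns.
        "num_entries": int(df[measurement_cols].notna().sum().sum()),
        # Sorted so the comparison does not depend on column order.
        "measurement_names": sorted(measurement_cols),
        "sequence_length": int(df[sequence_col].str.len().max()),
    }


# Toy example with made-up sequences; two of the six measurement cells are empty.
toy = pd.DataFrame({
    "sequence": ["MKTA", "MKTR", "MKTQ"],
    "f1": [1.0, None, 0.2],
    "f2": [0.1, 0.2, None],
})
print(expected_metadata(toy, ["f2", "f1"]))
# {'num_rows': 3, 'num_entries': 4, 'measurement_names': ['f1', 'f2'], 'sequence_length': 4}
```

For the example dataset, you could pass the columns whose names end in _normalized_fitness as measurement_cols and compare the result with the printed AssayMetadata. Compare measurement_names as sets: the metadata above lists them in a different order from the CSV columns.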

Next steps

Our Data API page contains more information about using your dataset with the API.

You can use your uploaded dataset to train models, perform a single site analysis, and design sequences. For more information, see Training models, Single site analysis, and Designing sequences.