Uploading data#
This tutorial shows you how to upload a dataset for training a model.
What you need before getting started#
Format your dataset as a CSV with one column containing the full sequence of each variant and one or more columns for your measured properties.
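As a minimal sketch of that layout, the snippet below builds a tiny CSV with pandas and reads it back. The sequences, the column names `activity` and `stability`, and the values are hypothetical placeholders, not part of the example dataset. It also assumes that a missing measurement can be left as an empty cell.

```python
import io

import pandas as pd

# Hypothetical toy data: one sequence column plus two measured properties.
rows = [
    {"sequence": "MKTAYIAK", "activity": 0.82, "stability": 1.10},
    {"sequence": "MKTAYIAR", "activity": None, "stability": 0.95},
]
toy = pd.DataFrame(rows)

# Serialize to CSV text; the missing measurement is written as an empty cell.
csv_text = toy.to_csv(index=False)

# Reading the CSV back turns the empty cell into NaN.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

To write a real file instead of an in-memory string, pass a path to `to_csv`.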
Uploading your dataset#
Run the code below, replacing the example path with the path to your dataset:
import pandas as pd

# Load the CSV into a DataFrame.
dataset = pd.read_csv("./example_dataset.csv")
# Preview the first two rows.
dataset.head(2)
|   | sequence | Name | acetamide_normalized_fitness | isobutyramide_normalized_fitness | propionamide_normalized_fitness |
|---|---|---|---|---|---|
| 0 | WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK... | Seq-1 | NaN | -0.5174 | NaN |
| 1 | WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK... | seq-2 | -2.1514 | -0.5154 | -1.1457 |
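Before uploading, it can help to check the loaded frame for unexpected sequence lengths or missing measurements. The sketch below is one way to do that with pandas alone. The helper name `summarize_dataset` and the toy frame are hypothetical, and it assumes that every numeric column is a measurement.

```python
import pandas as pd


def summarize_dataset(df, sequence_col="sequence"):
    """Return basic facts about a dataset frame before uploading it."""
    # Assumption: every numeric column holds a measured property.
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    return {
        "num_rows": len(df),
        # Distinct sequence lengths; more than one value may signal a problem.
        "sequence_lengths": sorted(df[sequence_col].str.len().unique().tolist()),
        "measurement_columns": numeric_cols,
        # Count of missing (NaN) values in each measurement column.
        "missing_per_column": df[numeric_cols].isna().sum().to_dict(),
    }


# Hypothetical toy frame with the same shape as the preview above.
toy = pd.DataFrame(
    {
        "sequence": ["MKTAYIAK", "MKTAYIAR"],
        "Name": ["seq-1", "seq-2"],
        "fitness_a": [float("nan"), -2.15],
        "fitness_b": [-0.52, -0.51],
    }
)
summary = summarize_dataset(toy)
print(summary)
```

Calling `summarize_dataset(dataset)` on your own frame gives the same summary for the real data.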
Using your data#
To use the data with our suite of tools, create the dataset in the backend:
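# Create the dataset in the backend from the DataFrame, with a name and a description.
assay = session.data.create(dataset, "Dataset Name", "Dataset description")
# Keep the ID so you can reference this dataset in later steps.
assay_id = assay.id
assay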
AssayMetadata(assay_name='Dataset Name', assay_description='Dataset description', assay_id='b6ee60f0-05ad-4c55-ad25-8006b09220ba', original_filename='assay_data', created_date=datetime.datetime(2024, 5, 9, 5, 57, 9, 973856), num_rows=15, num_entries=41, measurement_names=['isobutyramide_normalized_fitness', 'acetamide_normalized_fitness', 'propionamide_normalized_fitness'], sequence_length=346)
Next steps#
Our Data API page contains more information about using your dataset with the API.
You can use your uploaded dataset to train models, perform a single site analysis, and design sequences. For more information, see Training models, Single site analysis, and Designing sequences.