Tutorials
Tutorials for accessing data.
Quick-start
Click here for a complete repo of Kaggle tutorials for the BLASTNet datasets.
Browsing Data
Click here for a quick-start tutorial on browsing BLASTNet data. Make sure to click “Copy & Edit” to run the code on Kaggle’s cloud computing platform. This is the quickest way to interact with our datasets.
Training and Testing ML Models
Click here when you’re ready to train on multiple GPUs with TensorFlow. Make sure to click “Copy & Edit” to run the code on Kaggle’s cloud computing platform.
Using your own workstation
Click here for a Google Colab quick-start tutorial on how to use Kaggle and how to read and write the basic BLASTNet data formats. This is useful when you want to use your own workstation with BLASTNet.
Kaggle Command Line API
In BLASTNet, we share our data through Kaggle. Kaggle provides a command-line interface that lets you upload and download data in a way suited to most scientific clusters. See the README of the Kaggle API GitHub repository for detailed instructions.
Kaggle API Installation
Below are quick-start instructions for sharing data on Kaggle. Prerequisite: python3.
- Install the Kaggle API for python3:
pip install kaggle
- Create a Kaggle account here with a valid <username>.
- Go to your account page https://www.kaggle.com/<username>/account and click ‘Create API Token’ to download a kaggle.json file.
- Move the file to the default location and change its permissions:
mkdir -p ~/.kaggle
mv kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
- Now you’re ready to download and upload.
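If you prefer to handle the credentials from Python (for example inside a notebook), the mkdir/mv/chmod steps above can be mirrored as below. This is a minimal sketch: install_kaggle_token is a hypothetical helper, and it assumes kaggle.json has already been downloaded to token_path.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Move a downloaded kaggle.json into ~/.kaggle and restrict its permissions.

    Hypothetical helper mirroring: mkdir -p ~/.kaggle; mv kaggle.json ...; chmod 600 ...
    """
    token_path = Path(token_path).expanduser()
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    # Equivalent of `mkdir -p ~/.kaggle` (no error if it already exists)
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    # Equivalent of `mv kaggle.json ~/.kaggle/kaggle.json`
    shutil.move(str(token_path), str(target))
    # Equivalent of `chmod 600`: read/write for the owner only
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, install_kaggle_token("~/Downloads/kaggle.json") places the token where the kaggle command looks for it by default.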
Kaggle Download
To download a single file from Kaggle:
kaggle datasets download <username>/<datasetname> -f <filename>
To download all files in a dataset:
kaggle datasets download <username>/<datasetname>
Here <username> is the username of the dataset’s contributor.
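The same downloads can be driven from a Python script, for instance to fetch several files in a loop. This is a minimal sketch: build_download_cmd and download are hypothetical helpers that only assemble and run the CLI calls above. The -p (output folder) and --unzip options are standard options of kaggle datasets download; any owner or dataset names you pass are placeholders.

```python
import subprocess


def build_download_cmd(owner, dataset, filename=None, path=None, unzip=False):
    """Assemble a `kaggle datasets download` command as an argument list."""
    cmd = ["kaggle", "datasets", "download", f"{owner}/{dataset}"]
    if filename is not None:
        # -f downloads a single file instead of the whole dataset
        cmd += ["-f", filename]
    if path is not None:
        # -p sets the output folder (the default is the current working directory)
        cmd += ["-p", str(path)]
    if unzip:
        # --unzip asks the CLI to extract the downloaded archive
        cmd.append("--unzip")
    return cmd


def download(owner, dataset, **kwargs):
    """Run the command; needs the kaggle CLI on PATH and a valid ~/.kaggle/kaggle.json."""
    subprocess.run(build_download_cmd(owner, dataset, **kwargs), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before it touches the network.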
Kaggle Upload
To upload an entire dataset:
- Initialize the upload by running:
kaggle datasets init -p <path/to/dataset>
- This creates a dataset-metadata.json file in <path/to/dataset>. Edit it to fill in your dataset title under "title" and its URL (in the form <username>/<slug>) under "id", for example with:
vi <path/to/dataset>/dataset-metadata.json
Note that the "id" may contain only alphanumeric characters and hyphens (-).
- Put your files into 3 folders (data, grid, and chem_thermo_tran), since Kaggle’s API can only show 20 directories/files at most.
- Upload your dataset (with --dir-mode 'tar', the API packs each folder into a .tar file; -u makes the dataset public) via:
kaggle datasets create -u -p <path/to/dataset> --dir-mode 'tar'
or update a previously created dataset with:
kaggle datasets version -p <path/to/dataset> -m "<version notes>" --dir-mode 'tar'
Run kaggle datasets version -h to list all available options.
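The upload preparation above can also be scripted. This is a minimal sketch under stated assumptions: it expects the dataset-metadata.json template written by kaggle datasets init (with "title" and "id" keys), and it needs you to supply a hypothetical mapping from each of the three folders to the files that belong in it. The helper names fill_metadata, sort_into_folders, and build_create_cmd are illustrative, not part of the Kaggle API.

```python
import json
import re
import shutil
from pathlib import Path

# Slug rule from the note above: letters, digits and hyphens only
SLUG_RE = re.compile(r"[A-Za-z0-9-]+")
# The three top-level folders named in the upload instructions
FOLDERS = ("data", "grid", "chem_thermo_tran")


def fill_metadata(dataset_dir, username, slug, title):
    """Fill "title" and "id" in the dataset-metadata.json created by `kaggle datasets init`."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"slug {slug!r} may only contain letters, digits and hyphens")
    meta_path = Path(dataset_dir) / "dataset-metadata.json"
    meta = json.loads(meta_path.read_text())
    meta["title"] = title
    meta["id"] = f"{username}/{slug}"  # determines the dataset URL
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta


def sort_into_folders(dataset_dir, assignment):
    """Move files into the three top-level folders.

    `assignment` (hypothetical input) maps a folder name in FOLDERS to a list of
    file names relative to dataset_dir.
    """
    root = Path(dataset_dir)
    for folder, names in assignment.items():
        if folder not in FOLDERS:
            raise ValueError(f"unexpected folder {folder!r}; expected one of {FOLDERS}")
        (root / folder).mkdir(exist_ok=True)
        for name in names:
            shutil.move(str(root / name), str(root / folder / name))


def build_create_cmd(dataset_dir, public=True):
    """Assemble `kaggle datasets create ... --dir-mode tar` as an argument list."""
    cmd = ["kaggle", "datasets", "create"]
    if public:
        cmd.append("-u")  # -u publishes the dataset publicly (default is private)
    cmd += ["-p", str(dataset_dir), "--dir-mode", "tar"]
    return cmd
```

Once the folders and metadata are in place, the resulting list can be passed to subprocess.run(..., check=True) with the kaggle CLI installed.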