Quickstart: command line tools

Install and use the NCBI Datasets command line tools

The NCBI Datasets command line tools are datasets and dataformat.

Use datasets to download biological sequence data across all domains of life from NCBI.

Use dataformat to convert metadata from JSON Lines format to other formats.



Note: The NCBI Datasets command line tools are updated frequently to add new features, fix bugs, and enhance usability. Command syntax is subject to change. Please check back often for updates.

Install NCBI Datasets command line tools

The NCBI Datasets command line tools are available on multiple platforms.

System            Architecture
Linux             AMD64
macOS             Universal
Windows (64-bit)  AMD64
Linux             ARM64
Linux             ARM (32-bit)

Install using conda

The NCBI Datasets command line tools are available as a conda package. It includes both datasets and dataformat.

Install the datasets conda package:
conda install -c conda-forge ncbi-datasets-cli
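
After installing, it can help to confirm that both tools are reachable from the shell. The following is a minimal sketch, not part of the NCBI tooling: check_tools is a hypothetical helper name, and it relies only on the standard command -v shell builtin.

```shell
# Report whether each named command can be found on PATH.
# Returns 0 only if every command is found.
# check_tools is a hypothetical helper, not part of the NCBI Datasets tools.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the conda environment that holds the package active:
#   check_tools datasets dataformat
```

If either tool is reported as missing, the conda environment where the package was installed is probably not the active one.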

Install using curl

Linux

Download datasets:
curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets'
Download dataformat:
curl -o dataformat 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/dataformat'
Make them executable:
chmod +x datasets dataformat

macOS

Download datasets:
curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/mac/datasets'
Download dataformat:
curl -o dataformat 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/mac/dataformat'
Make them executable:
chmod +x datasets dataformat
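
The Linux and macOS URLs above differ only in the platform directory (linux-amd64 or mac), so a setup script could choose it from the output of uname. The sketch below assumes exactly that; datasets_platform_dir and datasets_url are hypothetical helper names. The directories for ARM Linux builds are not shown in this quickstart, so the sketch rejects those architectures rather than guessing a path.

```shell
# Map an OS name and CPU architecture (as printed by `uname -s` and
# `uname -m`) to the platform directory used in the download URLs above.
# Only the directories that appear in this quickstart are handled.
datasets_platform_dir() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  case "$os" in
    Darwin) echo "mac" ;;  # the macOS build is universal, so any architecture works
    Linux)
      case "$arch" in
        x86_64|amd64) echo "linux-amd64" ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
      esac ;;
    *) echo "unsupported system: $os" >&2; return 1 ;;
  esac
}

# Build the download URL for one tool ("datasets" or "dataformat").
# The optional second and third arguments override the detected OS and architecture.
datasets_url() {
  dir="$(datasets_platform_dir "$2" "$3")" || return 1
  echo "https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/$dir/$1"
}

# Usage:
#   for tool in datasets dataformat; do
#     curl -o "$tool" "$(datasets_url "$tool")" && chmod +x "$tool"
#   done
```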

Windows

Download datasets:
curl -o datasets.exe "https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/win64/datasets.exe"
Download dataformat:
curl -o dataformat.exe "https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/win64/dataformat.exe"

Use the datasets tool to download biological data

For example, the following command downloads an NCBI Datasets Gene Data Package, including sequences and metadata, for a set of NCBI GeneIDs.

Command

datasets download gene gene-id 1,2,3,9,10,11,12,13,14,15,16,17 --filename example_gene_data_package.zip
unzip -Z1 example_gene_data_package.zip

Output

Downloading: example_gene_data_package.zip    31.9kB done
README.md
ncbi_dataset/data/gene.fna
ncbi_dataset/data/rna.fna
ncbi_dataset/data/protein.faa
ncbi_dataset/data/data_report.jsonl
ncbi_dataset/data/data_table.tsv
ncbi_dataset/data/dataset_catalog.json
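
Once the package is extracted, a quick sanity check is to count the records in each sequence file. This is a sketch under the assumption that the files are ordinary FASTA, where every record begins with a ">" header line; summarize_sequences is a hypothetical helper name, and the file names are the ones in the listing above.

```shell
# Print "<file name><TAB><record count>" for each sequence file found in an
# extracted data package. A FASTA record is counted by its ">" header line.
# summarize_sequences is a hypothetical helper, not part of the NCBI tools.
summarize_sequences() {
  data_dir="$1/ncbi_dataset/data"
  for name in gene.fna rna.fna protein.faa; do
    if [ -f "$data_dir/$name" ]; then
      printf '%s\t%s\n' "$name" "$(grep -c '^>' "$data_dir/$name")"
    fi
  done
}

# Usage:
#   unzip -o example_gene_data_package.zip -d example_gene_data_package
#   summarize_sequences example_gene_data_package
```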

Use the dataformat tool to convert data reports to other formats

A data package downloaded through NCBI Datasets services generally contains a data report in JSON Lines format. The dataformat command line tool can convert it to other formats.

For example, the gene data package downloaded in the previous example contains a gene data report. The following dataformat command converts it to TSV, a tab-separated tabular format.

Command

dataformat tsv gene --fields gene-id,symbol,transcript-name --package example_gene_data_package.zip | head -n 10

Output

NCBI GeneID	Symbol	Transcript Transcript Name
2	A2M	transcript variant 2
2	A2M	transcript variant X1
2	A2M	transcript variant 4
2	A2M	transcript variant 1
2	A2M	transcript variant 3
...
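
Because the TSV output has one row per transcript, standard text tools can summarize it further. The sketch below counts rows per gene symbol; it assumes the three-column layout shown above, with the symbol in the second column and a single header row. count_transcripts_per_symbol is a hypothetical helper name.

```shell
# Read dataformat TSV output on stdin and print "<symbol><TAB><row count>",
# sorted by symbol. The header row is skipped; the symbol is column 2.
# count_transcripts_per_symbol is a hypothetical helper, not an NCBI tool.
count_transcripts_per_symbol() {
  awk -F '\t' '
    NR > 1 { count[$2]++ }
    END { for (symbol in count) printf "%s\t%d\n", symbol, count[symbol] }
  ' | sort
}

# Usage:
#   dataformat tsv gene --fields gene-id,symbol,transcript-name \
#     --package example_gene_data_package.zip | count_transcripts_per_symbol
```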

Generated November 19, 2021