Download a gene data package

Download an NCBI Datasets gene data package, including FASTA sequences and metadata

Gene metadata and FASTA sequence are available as a zip-compressed NCBI Datasets gene data package.

Using NCBI gene IDs

Download a gene data package by providing one or more space-delimited NCBI Gene IDs. If you use the --inputfile option instead, list each gene ID on a separate line.

datasets download gene gene-id 1 2 3 9 10 11 12 13 14 15 16 17
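The --inputfile form described above might look like the following minimal sketch. The file name gene_ids.txt is a hypothetical choice, and the download step is guarded so that it only runs when the datasets command is found on the PATH.

```shell
# Write one gene ID per line to an input file.
# gene_ids.txt is a hypothetical file name chosen for this sketch.
printf '%s\n' 1 2 3 9 10 > gene_ids.txt

# Run the download only if the datasets CLI is installed.
if command -v datasets >/dev/null 2>&1; then
    datasets download gene gene-id --inputfile gene_ids.txt
fi
```

Keeping the IDs in a file is convenient when the list is long or is produced by another tool.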

Using gene symbols

Download a gene data package by providing one or more gene symbols (space delimited), using --taxon to specify the species.

datasets download gene symbol ACRV1 A2M --taxon human

Using transcript or protein accessions

Download a gene data package by RefSeq nucleotide or protein accession.

datasets download gene accession NM_020107.5 NP_001334352.2

Using species name

Download a gene data package by species name or NCBI Taxonomy ID. The following command downloads a data package containing all human genes.

datasets download gene taxon human
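As a sketch of the Taxonomy ID form, the snippet below assembles the equivalent command using 9606, the NCBI Taxonomy ID for human. The variable names are illustrative only, and the command is printed rather than executed because a package covering all human genes is a large download.

```shell
# 9606 is the NCBI Taxonomy ID for Homo sapiens (human).
taxon_id=9606

# Assemble the command as a string so it can be reviewed before running.
cmd="datasets download gene taxon ${taxon_id}"
echo "$cmd"

# To run it, paste the printed command into a terminal.
```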

Choosing which data files to include in the data package

Eukaryotic gene data packages contain transcript and protein sequences and metadata by default, while prokaryotic data packages (WP_ accessions only) contain gene and protein sequences, plus metadata. To add data files, or to include only metadata, pass one or more comma-separated terms to --include.

Here are a few examples of using the --include flag to choose which data files to include in the data package.

Get gene and protein sequences for the human BRCA1 gene (Gene ID: 672):

datasets download gene gene-id 672 --include gene,protein

Get gene, transcript, CDS and protein sequences for the human BRCA1 gene (Gene ID: 672):

datasets download gene gene-id 672 --include gene,rna,cds,protein

Get a data package with only the gene data report (metadata):

datasets download gene gene-id 672 --include none
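When the list of data files is decided in a script, the comma-separated value for --include can be assembled from individual terms. The following is a minimal sketch: join_terms is a hypothetical helper rather than part of the datasets CLI, and the resulting command is printed instead of executed.

```shell
# join_terms is a hypothetical helper: it joins its arguments with commas.
join_terms() {
    old_ifs=$IFS
    IFS=,
    joined="$*"    # "$*" joins arguments using the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

# Build the --include value from the terms used in the examples above.
include=$(join_terms gene rna cds protein)

# Print the resulting command for review.
echo "datasets download gene gene-id 672 --include ${include}"
```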
Generated May 21, 2024