taxon

download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)

Name

datasets download genome taxon - download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)

Synopsis

datasets download genome taxon <taxon> [flags]

Description

Download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank). A genome dataset includes genome, transcript, and protein sequences, annotation, and a detailed data report. Datasets are downloaded as a zip file.

The default genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • rna.fna (transcript sequences)
  • protein.faa (protein sequences)
  • genomic.gff (genome annotation in GFF3 format)
  • data_report.jsonl (data report with genome assembly and annotation metadata)
  • dataset_catalog.json (a list of files and file types included in the dataset)

Refer to NCBI’s command-line quickstart documentation for information about getting started with the command-line tools.

Examples

  datasets download genome taxon human --chromosomes 21
  datasets download genome taxon "bos taurus"
  datasets download genome taxon 10116 --exclude-seq --exclude-gff3
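The flags listed under Options can be combined in a single call. The following is a minimal sketch, not part of the official examples. `run` and `to_mdy` are hypothetical helper functions, and `DRY_RUN` is an assumed variable that defaults to printing the command instead of executing it, so the script can be tried without the datasets tool or network access. `to_mdy` converts an ISO date to the MM/DD/YYYY form documented for --released-since and --released-before.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: convert YYYY-MM-DD to MM/DD/YYYY.
to_mdy() {
  iso="$1"
  year="${iso%%-*}"    # text before the first '-'
  rest="${iso#*-}"     # text after the first '-'
  month="${rest%%-*}"
  day="${rest#*-}"
  echo "$month/$day/$year"
}

since="$(to_mdy 2021-01-01)"

# Annotated RefSeq assemblies for a taxon, restricted to two assembly levels,
# released after the given date, saved under a custom file name.
run datasets download genome taxon "bos taurus" \
  --annotated \
  --assembly-source refseq \
  --assembly-level chromosome,complete_genome \
  --released-since "$since" \
  --filename bos_taurus.zip
```

Set DRY_RUN=0 to execute the command instead of printing it. The printed form drops the quotes around "bos taurus", so it is meant for inspection rather than copy and paste.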

Options

  -a, --annotated                only include genomes with annotation
      --api-key string           NCBI Datasets API Key
      --assembly-level string    restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
      --assembly-source string   restrict assemblies to refseq or genbank only
      --chromosomes strings      limit to a specified, comma-delimited list of chromosomes (default [all])
      --dehydrated               download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files)
      --exclude-genomic-cds      exclude cds_from_genomic.fna (genomic cds file)
      --exclude-gff3             exclude genomic.gff (gff3 annotation file)
      --exclude-protein          exclude protein.faa (protein sequence file)
      --exclude-rna              exclude rna.fna (transcript sequence file)
      --exclude-seq              exclude genomic.fna (genomic sequence file)
      --filename string          specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
  -h, --help                     help for taxon
      --include-gbff             include genomic.gbff (GenBank flat file sequence and annotation), if available
      --include-gtf              include genomic.gtf (gtf annotation file), if available
      --no-progressbar           hide progress bar
      --reference                limit to reference and representative (GCF_ and GCA_) assemblies
      --released-before string   only include genomes that have been released before a specified date (MM/DD/YYYY)
      --released-since string    only include genomes that have been released after a specified date (MM/DD/YYYY)
      --search strings           only include genomes that have the specified text in the
                                 searchable fields: species and infraspecies, assembly name, and submitter.
                                 To provide multiple strings, '--search' can be included multiple times
      --tax-exact-match          exclude sub-species when a species-level taxon is specified
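
The --dehydrated option splits a download into two steps: fetch a small archive first, then retrieve the data files with the rehydrate command. The sketch below outlines that workflow under stated assumptions. The `run` helper and the `DRY_RUN` variable are hypothetical and only print each command by default. The file and directory names are illustrative. The `--directory` flag of `datasets rehydrate` is not described on this page and is assumed here, so check `datasets rehydrate --help` for the installed version.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

archive="rat_dehydrated.zip"   # illustrative file name
outdir="rat_dataset"           # illustrative directory name

# 1. Download only the data report and the locations of the data files.
run datasets download genome taxon 10116 --reference --dehydrated --filename "$archive"

# 2. Unpack the dehydrated archive into a working directory.
run unzip "$archive" -d "$outdir"

# 3. Retrieve the data files (ASSUMPTION: rehydrate accepts --directory).
run datasets rehydrate --directory "$outdir"
```

Splitting the download this way keeps the first transfer small, which can help when a taxon matches many assemblies.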
Generated October 18, 2021