genome

download a genome dataset

Name

datasets download genome - download a genome dataset

Description

Download a genome dataset including genome, transcript, and protein sequences, annotation, and a detailed data report. Genome datasets can be specified by NCBI Assembly accession, BioProject accession, or taxon. Datasets are downloaded as a zip file.

The default genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • rna.fna (transcript sequences)
  • protein.faa (protein sequences)
  • genomic.gff (genome annotation in gff3 format)
  • data_report.jsonl (data report with genome assembly and annotation metadata)
  • dataset_catalog.json (a list of files and file types included in the dataset)
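
Because each file in the list above is included only "if available", it can help to check what actually arrived after unpacking the archive. The following is a minimal sketch, not part of the datasets tool: `list_dataset_files` is a hypothetical helper, and it searches recursively so that it makes no assumption about the directory layout inside the zip file.

```shell
#!/bin/sh
# Hypothetical helper: report which of the default dataset files exist
# anywhere under an unpacked dataset directory.
list_dataset_files() {
    dir=$1
    for name in genomic.fna rna.fna protein.faa genomic.gff \
                data_report.jsonl dataset_catalog.json; do
        # find prints a path for every match; one match is enough.
        if [ -n "$(find "$dir" -type f -name "$name" | head -n 1)" ]; then
            echo "present: $name"
        else
            echo "missing: $name"
        fi
    done
}

# Usage, after unpacking with: unzip ncbi_dataset.zip -d my_dataset
#   list_dataset_files my_dataset
```

The directory name `my_dataset` is only an illustration; any unpacked location works.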

Refer to NCBI’s command line quickstart documentation for information about getting started with the command-line tools.

Examples

  datasets download genome accession GCF_000001405.39 --chromosomes X,Y --exclude-gff3 --exclude-rna
  datasets download genome taxon "bos taurus" --dehydrated
  datasets download genome taxon human --assembly-level chromosome,complete_genome --dehydrated
  datasets download genome taxon mouse --search C57BL/6J --search "Broad Institute" --dehydrated
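
Several of the examples above use --dehydrated, which fetches only the data report and file locations; the data files are retrieved afterwards with the rehydrate command. A rough sketch of that two-step workflow follows. The `run` and `fetch_dehydrated` functions and the `DRY_RUN` variable are hypothetical names introduced here, and the sketch assumes that `unzip` is installed and that `datasets rehydrate` accepts a `--directory` option pointing at the unpacked archive.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
# Note: the printed form does not preserve quoting of arguments.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: download a dehydrated archive for a taxon,
# unpack it, then retrieve the data files.
fetch_dehydrated() {
    taxon=$1     # e.g. "bos taurus"
    workdir=$2   # directory to unpack into (illustrative name)
    run datasets download genome taxon "$taxon" --dehydrated --filename "$workdir.zip" &&
    run unzip -q "$workdir.zip" -d "$workdir" &&
    run datasets rehydrate --directory "$workdir"
}

# Usage (prints the three commands without running them):
#   DRY_RUN=1 fetch_dehydrated "bos taurus" cattle
```

Running with `DRY_RUN=1` first is a cheap way to review the commands before starting a large download.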

Options

  -a, --annotated                only include genomes with annotation
      --api-key string           NCBI Datasets API Key
      --assembly-level string    restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
      --assembly-source string   restrict assemblies to refseq or genbank only
      --chromosomes strings      limit to a specified, comma-delimited list of chromosomes (default [all])
      --dehydrated               download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
      --exclude-genomic-cds      exclude cds_from_genomic.fna (genomic cds file)
      --exclude-gff3             exclude genomic.gff (gff3 annotation file)
      --exclude-protein          exclude protein.faa (protein sequence file)
      --exclude-rna              exclude rna.fna (transcript sequence file)
      --exclude-seq              exclude genomic.fna (genomic sequence file)
      --filename string          specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
  -h, --help                     help for genome
      --include-gbff             include genomic.gbff (GenBank flat file sequence and annotation), if available
      --include-gtf              include genomic.gtf (gtf annotation file), if available
      --no-progressbar           hide progress bar
      --reference                limit to reference and representative (GCF_ and GCA_) assemblies
      --released-before string   only include genomes that have been released before a specified date (MM/DD/YYYY)
      --released-since string    only include genomes that have been released after a specified date (MM/DD/YYYY)
      --search strings           only include genomes that have the specified text in the
                                 searchable fields: species and infraspecies, assembly name, and submitter.
                                 To provide multiple strings, '--search' can be included multiple times
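
The --released-before and --released-since options expect dates as MM/DD/YYYY, while many scripts carry dates as YYYY-MM-DD. A small conversion sketch, assuming `awk` is available; `iso_to_mdy` is a hypothetical helper name, not part of the datasets tool:

```shell
#!/bin/sh
# Hypothetical helper: convert YYYY-MM-DD to MM/DD/YYYY.
# Prints nothing when the input does not have three dash-separated parts.
iso_to_mdy() {
    echo "$1" | awk -F- 'NF == 3 { printf "%s/%s/%s\n", $2, $3, $1 }'
}

# Usage with a date filter:
#   datasets download genome taxon mouse \
#       --released-since "$(iso_to_mdy 2020-01-01)" --dehydrated
```

The helper only rearranges the parts; it does not check that the date is a real calendar date.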

Fields

  Mnemonic                                       Name
  annotinfo-busco-complete                       Annotation Info BUSCO Complete
  annotinfo-busco-duplicated                     Annotation Info BUSCO Duplicated
  annotinfo-busco-fragmented                     Annotation Info BUSCO Fragmented
  annotinfo-busco-lineage                        Annotation Info BUSCO Lineage
  annotinfo-busco-missing                        Annotation Info BUSCO Missing
  annotinfo-busco-singlecopy                     Annotation Info BUSCO Single Copy
  annotinfo-busco-totalcount                     Annotation Info BUSCO Total Count
  annotinfo-busco-ver                            Annotation Info BUSCO Version
  annotinfo-featcount-gene-non-coding            Annotation Info Count Gene Non-coding
  annotinfo-featcount-gene-other                 Annotation Info Count Gene Other
  annotinfo-featcount-gene-protein-coding        Annotation Info Count Gene Protein-coding
  annotinfo-featcount-gene-pseudogene            Annotation Info Count Gene Pseudogene
  annotinfo-featcount-gene-total                 Annotation Info Count Gene Total
  annotinfo-name                                 Annotation Info Name
  annotinfo-release-date                         Annotation Info Release Date
  annotinfo-report-url                           Annotation Info Report URL
  annotinfo-source                               Annotation Info Source
  assminfo-accession                             Assembly Accession
  assminfo-bioproject-lineage-accession          Assembly BioProject Lineage Accession
  assminfo-bioproject-lineage-parent-accessions  Assembly BioProject Lineage Parent Accessions
  assminfo-bioproject-lineage-title              Assembly BioProject Lineage Title
  assminfo-biosample-accession                   Assembly BioSample Accession
  assminfo-blast-url                             Assembly Blast URL
  assminfo-description                           Assembly Description
  assminfo-genbank-assm-accession                Assembly GenBank Accession
  assminfo-level                                 Assembly Level
  assminfo-linked-assm                           Assembly Linked Assembly
  assminfo-name                                  Assembly Name
  assminfo-paired_accession                      Assembly Paired Accession
  assminfo-refseq-assm-accession                 Assembly RefSeq Accession
  assminfo-refseq-category                       Assembly Refseq Category
  assminfo-sequencing-tech                       Assembly Sequencing Tech
  assminfo-submission-date                       Assembly Submission Date
  assminfo-submitter                             Assembly Submitter
  assminfo-type                                  Assembly Type
  assminfo-ucsc-assm-name                        Assembly UCSC Assembly Name
  assmstats-contig-l50                           Assembly Stats Contig L50
  assmstats-contig-n50                           Assembly Stats Contig N50
  assmstats-gaps-between-scaffolds-count         Assembly Stats Gaps Between Scaffolds Count
  assmstats-gc-count                             Assembly Stats GC Count
  assmstats-number-of-component-sequences        Assembly Stats Number of Component Sequences
  assmstats-number-of-contigs                    Assembly Stats Number of Contigs
  assmstats-number-of-scaffolds                  Assembly Stats Number of Scaffolds
  assmstats-scaffold-l50                         Assembly Stats Scaffold L50
  assmstats-scaffold-n50                         Assembly Stats Scaffold N50
  assmstats-total-number-of-chromosomes          Assembly Stats Total Number of Chromosomes
  assmstats-total-sequence-len                   Assembly Stats Total Sequence Length
  assmstats-total-ungapped-len                   Assembly Stats Total Ungapped Length
  breed                                          Breed
  common-name                                    Common name
  cultivar                                       Cultivar
  ecotype                                        Ecotype
  isolate                                        Isolate
  organelle-assembly-name                        Organelle Assembly Name
  organelle-bioproject-accessions                Organelle BioProject Accessions
  organelle-description                          Organelle Description
  organelle-infraspecific-name                   Organelle Infraspecific Name
  organelle-submitter                            Organelle Submitter
  organelle-total-seq-length                     Organelle Total Seq Length
  organism-name                                  Organism name
  sex                                            Sex
  strain                                         Strain
  tax-id                                         Taxonomic ID
  wgs-contigs-url                                WGS contigs URL
  wgs-project-accession                          WGS project accession
  wgs-url                                        WGS URL
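
The field mnemonics listed above are short identifiers for columns of the data report. As an illustration only, the sketch below joins a chosen set of mnemonics into a comma-separated list. It assumes, without confirmation from this page, that the companion `dataformat` tool accepts such a list through a `--fields` option and reads the report through `--inputfile`; `join_fields` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: join its arguments with commas.
# The subshell keeps the IFS change from leaking into the caller.
join_fields() {
    ( IFS=,; echo "$*" )
}

# Usage (assumed dataformat invocation, see the note above):
#   fields=$(join_fields assminfo-accession organism-name tax-id)
#   dataformat tsv genome --inputfile data_report.jsonl --fields "$fields"
```

Keeping the mnemonics as separate arguments makes it easy to add or drop a column without editing a long comma-separated string by hand.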

Commands


accession

download a genome dataset by NCBI Assembly or BioProject accession

taxon

download a genome dataset by taxon (NCBI Taxonomy ID, or scientific or common name at any taxonomic rank)

Generated October 22, 2021