accession

Download a genome data package by Assembly or BioProject accession

Name

datasets download genome accession - Download a genome data package by Assembly or BioProject accession

Synopsis

datasets download genome accession <accession ...> [flags]

Description

Download a genome data package by Assembly or BioProject accession. Genome data packages may include assembled genome, transcript and protein sequences, annotation and one or more data reports. Data packages are downloaded as a zip archive.

The default genome data package includes the following files:

  • <accession>_<assembly_name>_genomic.fna (genomic sequences)
  • assembly_data_report.jsonl (data report with genome assembly and annotation metadata)
  • dataset_catalog.json (a list of files and file types included in the data package)
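
Because the data package is an ordinary zip archive, its contents can be checked before extraction. The sketch below is one possible way to do that, assuming the standard unzip utility is installed. The helper name list_package is hypothetical and is not part of the datasets CLI; the default file name is the one documented for --filename.

```shell
# Hypothetical helper (not part of the datasets CLI): list the files
# inside a downloaded genome data package without extracting it.
list_package() {
    # Fall back to the documented default package name.
    local package="${1:-ncbi_dataset.zip}"

    if [ ! -f "$package" ]; then
        echo "package not found: $package" >&2
        return 1
    fi

    # unzip -l prints the archive listing only; nothing is written to disk.
    unzip -l "$package"
}

# Usage (run after a download has completed):
#   list_package ncbi_dataset.zip
```

The listing should show the files named above, which makes it a quick check that the requested --include file types were delivered.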

Examples

  datasets download genome accession GCF_000001405.40 --chromosomes X,Y --include protein,cds
  datasets download genome accession GCA_003774525.2 GCA_000001635 --chromosomes X,Y,Un.9
  datasets download genome accession GCA_003774525.2 --preview
  datasets download genome accession PRJNA289059 --include none
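
For longer accession lists, the --inputfile flag can replace positional arguments. The following is a minimal sketch, not an official recipe: it assumes the input file holds one accession per line, and the names accessions.txt, genomes.zip and the RUN_DOWNLOAD switch are hypothetical. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input file; one accession per line is an assumption.
accession_file="accessions.txt"
printf '%s\n' GCF_000001405.40 GCA_003774525.2 > "$accession_file"

# Build the command as an array so every argument stays correctly quoted.
cmd=(datasets download genome accession
     --inputfile "$accession_file"
     --include genome,gff3
     --filename genomes.zip)

# Show the exact command line for review.
echo "+ ${cmd[*]}"

# RUN_DOWNLOAD is a hypothetical switch: set it to 1 to really download.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    "${cmd[@]}"
fi
```

Keeping the command in an array avoids quoting problems if a file name ever contains spaces.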

Options

      --annotated                 Limit to annotated genomes
      --api-key string            Specify an NCBI API key
      --assembly-level string     Limit to genomes at one or more assembly levels (comma-separated):
                                    * chromosome
                                    * complete
                                    * contig
                                    * scaffold
                                     (default "[]")
      --assembly-source string    Limit to 'RefSeq' (GCF_) or 'GenBank' (GCA_) genomes (default "all")
      --assembly-version string   Limit to 'latest' assembly accession version or include 'all' (latest + previous versions)
      --chromosomes strings       Limit to a specified, comma-delimited list of chromosomes, or 'all' for all chromosomes
      --debug                     Emit debugging info
      --dehydrated                Download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
      --exclude-atypical          Exclude atypical assemblies
      --exclude-multi-isolate     Exclude assemblies from multi-isolate projects
      --filename string           Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
      --from-type                 Only return records with type material
      --help                      Print detailed help about a datasets command
      --include string(,string)   Specify the data files to include (comma-separated).
                                    * genome:     genomic sequence
                                    * rna:        transcript
                                    * protein:    amino acid sequences
                                    * cds:        nucleotide coding sequences
                                    * gff3:       general feature file
                                    * gtf:        gene transfer format
                                    * gbff:       GenBank flat file
                                    * seq-report: sequence report file
                                    * none:       do not retrieve any sequence files
                                     (default [genome])
      --inputfile string          Read a list of NCBI Assembly or BioProject accessions from a file to use as input
      --mag string                Limit to metagenome assembled genomes (only) or remove them from the results (exclude) (default "all")
      --no-progressbar            Hide progress bar
      --preview                   Show information about the requested data package
      --reference                 Limit to reference genomes
      --released-after string     Limit to genomes released on or after a specified date (MM/DD/YYYY)
      --released-before string    Limit to genomes released on or before a specified date (MM/DD/YYYY)
      --search strings            Limit results to genomes with specified text in the searchable fields:
                                  species and infraspecies, assembly name and submitter.
                                  To search multiple strings, use the flag multiple times.
      --version                   Print version of datasets
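
The --dehydrated option splits a large download into two stages: fetch a small archive first, then retrieve the data files with the rehydrate command. The sketch below outlines that workflow under stated assumptions: the archive and directory names are hypothetical, RUN_DOWNLOAD is a hypothetical switch, and the rehydrate step is assumed to take the extracted directory through a --directory flag. Without RUN_DOWNLOAD=1 the script only prints each step.

```shell
#!/usr/bin/env bash
set -euo pipefail

accession="GCF_000001405.40"     # accession taken from the Examples section
archive="human_dehydrated.zip"   # hypothetical file name
workdir="human_dataset"          # hypothetical directory name

# Print each step; execute it only when RUN_DOWNLOAD=1 (hypothetical switch).
run() {
    echo "+ $*"
    if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
        "$@"
    fi
}

# 1. Download the dehydrated archive (data report plus file locations).
run datasets download genome accession "$accession" --dehydrated --filename "$archive"

# 2. Extract it into a working directory.
run unzip -o "$archive" -d "$workdir"

# 3. Retrieve the data files (the --directory flag is an assumption here).
run datasets rehydrate --directory "$workdir"
```
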
Generated May 21, 2024