accession
Download a genome data package by Assembly or BioProject accession
Name
datasets download genome accession - Download a genome data package by Assembly or BioProject accession
Synopsis
datasets download genome accession <accession ...> [flags]
Description
Download a genome data package by Assembly or BioProject accession. Genome data packages may include assembled genome, transcript and protein sequences, annotation and one or more data reports. Data packages are downloaded as a zip archive.
The default genome data package includes the following files:
- <accession>_<assembly_name>_genomic.fna (genomic sequences)
- assembly_data_report.jsonl (data report with genome assembly and annotation metadata)
- dataset_catalog.json (a list of files and file types included in the data package)
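Once the archive has been unpacked, a quick check that the default files are present can catch an incomplete download early. The sketch below is illustrative only: the function name `check_default_files` is hypothetical, and it assumes the zip was extracted into a directory of your choice (for example with `unzip ncbi_dataset.zip -d pkg`). It searches by file name rather than relying on a fixed directory layout.

```shell
#!/bin/sh
# check_default_files DIR
# Hypothetical helper: verifies that an extracted genome data package
# contains the three default file types described above.
# Prints one "missing: ..." line per absent file and returns non-zero
# if anything is missing.
check_default_files() {
  pkg_dir="$1"
  missing=0

  # The two report files have fixed names.
  for name in assembly_data_report.jsonl dataset_catalog.json; do
    if [ -z "$(find "$pkg_dir" -type f -name "$name")" ]; then
      echo "missing: $name"
      missing=1
    fi
  done

  # Genomic FASTA names vary per assembly, so match on the suffix.
  if [ -z "$(find "$pkg_dir" -type f -name '*_genomic.fna')" ]; then
    echo "missing: *_genomic.fna"
    missing=1
  fi

  return "$missing"
}
```

For example, `check_default_files pkg && echo "package looks complete"` prints the confirmation only when all three file types are found.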
Examples
datasets download genome accession GCF_000001405.40 --chromosomes X,Y --include protein,cds
datasets download genome accession GCA_003774525.2 GCA_000001635 --chromosomes X,Y,Un.9
datasets download genome accession GCA_003774525.2 --preview
datasets download genome accession PRJNA289059 --include none
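When downloading several assemblies, it can help to write one archive per accession and to review the commands before running them. The following is a minimal sketch under stated assumptions: the helper names `run_or_print` and `download_each` and the `DRY_RUN` variable are hypothetical and not part of `datasets`. Only the `--include` and `--filename` flags documented on this page are used.

```shell
#!/bin/sh
# run_or_print CMD [ARGS...]
# Hypothetical helper: prints the command when DRY_RUN is unset or "1",
# and executes it only when DRY_RUN=0.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# download_each ACCESSION [ACCESSION...]
# Requests genomic sequence plus GFF3 annotation for each accession and
# names each archive after its accession instead of ncbi_dataset.zip.
download_each() {
  for acc in "$@"; do
    run_or_print datasets download genome accession "$acc" \
      --include genome,gff3 --filename "${acc}.zip"
  done
}
```

Running `download_each GCF_000001405.40 GCA_003774525.2` prints the two commands for review; setting `DRY_RUN=0` in the environment executes them instead.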
Options
--annotated Limit to annotated genomes
--api-key string Specify an NCBI API key
--assembly-level string Limit to genomes at one or more assembly levels (comma-separated):
* chromosome
* complete
* contig
* scaffold
(default "[]")
--assembly-source string Limit to 'RefSeq' (GCF_) or 'GenBank' (GCA_) genomes (default "all")
--assembly-version string Limit to 'latest' assembly accession version or include 'all' (latest + previous versions)
--chromosomes strings Limit to a specified, comma-delimited list of chromosomes, or 'all' for all chromosomes
--debug Emit debugging info
--dehydrated Download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
--exclude-atypical Exclude atypical assemblies
--exclude-multi-isolate Exclude assemblies from multi-isolate projects
--filename string Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
--from-type Only return records with type material
--help Print detailed help about a datasets command
--include string(,string) Specify the data files to include (comma-separated).
* genome: genomic sequence
* rna: transcript
* protein: amino acid sequences
* cds: nucleotide coding sequences
* gff3: general feature file
* gtf: gene transfer format
* gbff: GenBank flat file
* seq-report: sequence report file
* none: do not retrieve any sequence files
(default [genome])
--inputfile string Read a list of NCBI Assembly or BioProject accessions from a file to use as input
--mag string Limit to metagenome assembled genomes (only) or remove them from the results (exclude) (default "all")
--no-progressbar Hide progress bar
--preview Show information about the requested data package
--reference Limit to reference genomes
--released-after string Limit to genomes released on or after a specified date (MM/DD/YYYY)
--released-before string Limit to genomes released on or before a specified date (MM/DD/YYYY)
--search strings Limit results to genomes with specified text in the searchable fields:
species and infraspecies, assembly name and submitter.
To search multiple strings, use the flag multiple times.
--version Print version of datasets
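The `--inputfile` and `--dehydrated` options combine naturally for large batches. The sketch below only writes the accession file and prints the commands; it does not run them. Several points are assumptions rather than statements from this page: the helper names are hypothetical, the input file is assumed to hold one accession per line, `unzip` is assumed to be available, and the `datasets rehydrate --directory` invocation comes from the separate rehydrate command and should be checked against its own help text.

```shell
#!/bin/sh
# write_accession_file FILE ACCESSION [ACCESSION...]
# Hypothetical helper: writes one accession per line (assumed to be the
# format --inputfile expects), overwriting FILE if it exists.
write_accession_file() {
  out_file="$1"
  shift
  : > "$out_file"
  for acc in "$@"; do
    printf '%s\n' "$acc" >> "$out_file"
  done
}

# print_dehydrated_workflow ACC_FILE ZIP_NAME OUT_DIR
# Prints the three steps for review: download a dehydrated archive,
# unpack it, then fetch the data files with the rehydrate command.
print_dehydrated_workflow() {
  acc_file="$1"
  zip_name="$2"
  out_dir="$3"
  printf 'datasets download genome accession --inputfile %s --dehydrated --filename %s\n' \
    "$acc_file" "$zip_name"
  printf 'unzip %s -d %s\n' "$zip_name" "$out_dir"
  printf 'datasets rehydrate --directory %s\n' "$out_dir"
}
```

A typical use would be `write_accession_file accessions.txt GCF_000001405.40 GCA_003774525.2` followed by `print_dehydrated_workflow accessions.txt genomes.zip genomes`, then running the printed commands once they look right.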
Generated May 21, 2024