genome

download a coronavirus genome dataset by taxon

Name

datasets download virus genome - download a coronavirus genome dataset by taxon

Description

Download a coronavirus genome dataset that includes genome, CDS, and protein sequences, annotation, and a detailed data report. Coronavirus genome datasets are limited to the Coronaviridae family, which includes SARS-CoV-2. Datasets are specified by taxon and downloaded as a zip file.

The default coronavirus genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • cds.fna (nucleotide coding sequences)
  • protein.faa (protein sequences)
  • protein.gpff (protein sequence and annotation in GenPept flat file format)
  • protein structures in PDB format
  • data_report.jsonl (data report with viral metadata)
  • virus_dataset.md (README containing details on sequence file data content and other information)
  • dataset_catalog.json (a list of files and file types included in the dataset)
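
Because the files above are included only "if available", it can help to check what a given download actually contains. The following is a minimal sketch, not part of the datasets tool: the function name check_dataset_files is hypothetical, and it assumes the zip file has already been extracted to a directory. It searches the whole directory tree, so it makes no assumption about the layout inside the archive. Protein structure files are not checked, since their file names are not listed above.

```shell
# Hypothetical helper (not part of the datasets CLI): report which of the
# default dataset files exist anywhere under an extracted dataset directory.
check_dataset_files() {
  dir="$1"
  for name in genomic.fna cds.fna protein.faa protein.gpff \
              data_report.jsonl virus_dataset.md dataset_catalog.json; do
    # Search the whole tree so that no particular archive layout is assumed.
    if [ -n "$(find "$dir" -type f -name "$name" -print 2>/dev/null | head -n 1)" ]; then
      echo "$name: present"
    else
      echo "$name: missing"
    fi
  done
}
```

For example, after extracting the archive with a tool such as unzip into a directory named extracted (an arbitrary name), running check_dataset_files extracted prints one "present" or "missing" line per file.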

Refer to NCBI’s command-line quickstart documentation for information about getting started with the command-line tools.

Examples

  datasets download virus genome taxon sars-cov-2 --host dog
  datasets download virus genome taxon coronaviridae --host "manis javanica"

Options

      --api-key string    NCBI Datasets API Key
      --filename string   specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
  -h, --help              help for genome
      --no-progressbar    hide progress bar
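
The options above can be combined in a small wrapper for scripted downloads. This is a sketch under stated assumptions rather than a recommended interface: the function name download_virus_taxon, the DATASETS_BIN override, and the use of an NCBI_API_KEY shell variable to hold the key are all choices made for this illustration. Only the --filename, --no-progressbar, and --api-key flags come from the option list above.

```shell
# Hypothetical wrapper: download a virus genome dataset for one taxon.
# DATASETS_BIN is an assumed override for the executable, so the command
# can be dry-run with `echo`; NCBI_API_KEY is an assumed variable for the key.
download_virus_taxon() {
  taxon="$1"
  outfile="${2:-ncbi_dataset.zip}"   # same default as --filename

  # Build the argument list; --no-progressbar keeps log output clean.
  set -- download virus genome taxon "$taxon" \
         --filename "$outfile" --no-progressbar

  # Pass --api-key only when a key has been provided.
  if [ -n "${NCBI_API_KEY:-}" ]; then
    set -- "$@" --api-key "$NCBI_API_KEY"
  fi

  "${DATASETS_BIN:-datasets}" "$@"
}
```

Setting DATASETS_BIN=echo before calling download_virus_taxon sars-cov-2 sars2.zip prints the arguments that would be passed instead of contacting NCBI, which is a convenient way to review the command first.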

Commands


taxon

Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus.
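
Since a taxon may be given either as an ID or as a name, one possible pattern is to loop over a mixed list and write each dataset to its own file. The sketch below is illustrative only: the function name download_taxa and the DATASETS_BIN override are hypothetical, and the file-naming scheme is an arbitrary choice. The example call after the block uses 2697049, the NCBI Taxonomy ID for SARS-CoV-2.

```shell
# Hypothetical helper: download one zip per taxon (ID or name).
# DATASETS_BIN is an assumed override so the loop can be dry-run with `echo`.
download_taxa() {
  for taxon in "$@"; do
    # Replace every character other than letters, digits, '_' and '-'
    # with '_' so that the taxon can be used as a file name.
    safe=$(printf '%s' "$taxon" | tr -c 'A-Za-z0-9_-' '_')
    "${DATASETS_BIN:-datasets}" download virus genome taxon "$taxon" \
      --filename "${safe}.zip" || return 1
  done
}
```

With this helper, download_taxa 2697049 betacoronavirus would request two datasets, written to 2697049.zip and betacoronavirus.zip. The loop stops at the first failed download.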

Generated November 19, 2021