Download a coronavirus genome dataset, including genome, CDS and protein sequences, annotation, and a detailed data report. Coronavirus genome datasets are limited to the Coronaviridae family, including SARS-CoV-2, and can be specified by taxon. Datasets are downloaded as a zip file.

The default coronavirus genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • cds.fna (nucleotide coding sequences)
  • protein.faa (protein sequences)
  • protein.gpff (protein sequence and annotation in GenPept flat file format)
  • protein structures in PDB format
  • data_report.jsonl (data report with viral metadata)
  • a README file (details on sequence file data content and other information)
  • dataset_catalog.json (a list of files and file types included in the dataset)
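Once the zip file has been downloaded and extracted (for example with `unzip`), a quick sanity check is to count the records in the FASTA files listed above. The sketch below is a hypothetical helper, not part of the datasets CLI. It only assumes that each FASTA record starts with a `>` header line. The location of the files inside the extracted archive is not described on this page, so the function takes the path as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of the datasets CLI):
# print the number of FASTA records, i.e. lines beginning with '>',
# in the file given as the first argument.
count_fasta_records() {
  local fasta="$1"
  if [ ! -r "$fasta" ]; then
    echo "cannot read: $fasta" >&2
    return 2
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; keep the count, ignore that status
  grep -c '^>' "$fasta" || true
}
```

For example, calling `count_fasta_records` on the extracted genomic.fna and protein.faa reports how many genome and protein sequences the dataset contains.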

Refer to NCBI’s command-line quickstart documentation for information about getting started with the command-line tools.

Examples:
  datasets download virus genome taxon sars-cov-2 --host dog
  datasets download virus genome taxon coronaviridae --host "manis javanica"
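The examples above can be combined with the other flags documented on this page for unattended runs. The sketch below is a hypothetical wrapper, not part of the datasets CLI. It always passes `--no-progressbar` (useful when output goes to a log), writes to a caller-chosen file via `--filename`, and adds `--api-key` only when the environment variable `NCBI_API_KEY` is set. That variable name and the `DRY_RUN` switch, which prints the command instead of running it, are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around "datasets download virus genome taxon".
# Usage: download_cov_genomes TAXON HOST OUTFILE
# Set DRY_RUN=1 to print the command instead of executing it.
download_cov_genomes() {
  local taxon="$1" host="$2" outfile="$3"
  # Build the argument list in the positional parameters so that values
  # containing spaces (e.g. "manis javanica") stay single arguments.
  set -- datasets download virus genome taxon "$taxon" \
    --host "$host" --filename "$outfile" --no-progressbar
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command; never echo the API key itself.
    if [ -n "${NCBI_API_KEY:-}" ]; then
      printf '%s --api-key <redacted>\n' "$*"
    else
      printf '%s\n' "$*"
    fi
    return 0
  fi
  # Only pass --api-key when a key is actually available.
  if [ -n "${NCBI_API_KEY:-}" ]; then
    set -- "$@" --api-key "$NCBI_API_KEY"
  fi
  "$@"
}
```

Keeping the arguments in `"$@"` rather than in a single string is the design choice that matters here: host names with spaces are passed to the CLI as one argument, just as the quoted "manis javanica" in the second example.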

Flags:
      --api-key string    NCBI Datasets API Key
      --filename string   specify a custom file name for the downloaded dataset (default "")
  -h, --help              help for genome
      --no-progressbar    hide progress bar



Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus.
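Because the taxon argument accepts either a numeric taxonomy ID or a name, a script that downloads several taxa in one go may want to tell the two apart, for instance to name its output files. The helper below is a hypothetical sketch: it treats an all-digit argument as a taxonomy ID and anything else as a name, lower-casing names and replacing spaces with underscores. The file-naming scheme is an assumption for illustration only. The datasets CLI itself resolves the argument.

```shell
#!/bin/sh
# Hypothetical helper: derive an output file name from a taxon argument.
# All-digit arguments are treated as NCBI taxonomy IDs, anything else as a name.
taxon_to_filename() {
  case "$1" in
    '')
      echo "taxon argument is empty" >&2
      return 1
      ;;
    *[!0-9]*)
      # A name: lower-case it and replace spaces with underscores.
      printf '%s.zip\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')"
      ;;
    *)
      # Only digits: treat it as a taxonomy ID.
      printf 'taxid_%s.zip\n' "$1"
      ;;
  esac
}
```

For example, to fetch SARS-CoV-2 by its taxonomy ID (2697049) and Betacoronavirus by name into separate files:

  for taxon in 2697049 betacoronavirus; do
    datasets download virus genome taxon "$taxon" --filename "$(taxon_to_filename "$taxon")"
  done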

Generated November 19, 2021