symbol

Download a gene data package by gene symbol

Name

datasets download gene symbol - Download a gene data package by gene symbol

Synopsis

datasets download gene symbol <gene_symbol ...> [flags]

Description

Download a gene data package by gene symbol and taxon (NCBI Taxonomy ID, scientific name, or common name of a species). If no taxon is specified, data is returned for human (--taxon human). Gene data packages can include gene, transcript, and protein sequences, along with one or more data reports. Data packages are downloaded as a zip archive.

The default gene data package includes the following files:

  • rna.fna (transcript sequences)
  • protein.faa (protein sequences)
  • data_report.jsonl (data report with gene metadata)
  • dataset_catalog.json (a list of files and file types included in the data package)
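
To check which of these files a given archive contains, the archive members can be listed without extracting them. The following is a minimal sketch, assuming that `datasets` and `unzip` are on the PATH; the file name `tp53_package.zip` is an arbitrary name chosen for illustration. The download step is skipped when `datasets` is not installed.

```shell
#!/bin/sh
# Hypothetical output name, passed through the documented --filename flag.
archive="tp53_package.zip"

if command -v datasets >/dev/null 2>&1; then
    # Download the default package (rna and protein sequences) for human TP53.
    datasets download gene symbol tp53 --filename "$archive"
fi

if [ -f "$archive" ]; then
    # List the archive members without extracting them.
    unzip -l "$archive"
else
    echo "no archive at $archive; skipping listing"
fi
```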

Examples

  datasets download gene symbol tp53
  datasets download gene symbol brca1 --taxon "mus musculus"
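
The two examples above can be generalized to fetch one symbol for several taxa, writing each package to its own archive with --filename. This is a sketch only: the helper names `run` and `fetch_symbol_for_taxa` and the `DRY_RUN` variable are hypothetical and not part of the datasets tool. By default the script prints the commands instead of running them.

```shell
#!/bin/sh
# Set DRY_RUN=0 to execute the commands; by default they are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
# Note that the printed form does not show the quoting of multi-word arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: download one gene symbol for each taxon given,
# using a separate archive name per taxon.
fetch_symbol_for_taxa() {
    symbol="$1"
    shift
    for taxon in "$@"; do
        # Replace spaces so the taxon name is safe to use in a file name.
        safe=$(printf '%s' "$taxon" | tr ' ' '_')
        run datasets download gene symbol "$symbol" \
            --taxon "$taxon" --filename "${symbol}_${safe}.zip"
    done
}

fetch_symbol_for_taxa brca1 human "mus musculus"
```

Naming each archive explicitly avoids successive downloads all targeting the default name, ncbi_dataset.zip.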

Options

      --api-key string             Specify an NCBI API key
      --debug                      Emit debugging info
      --fasta-filter strings       Limit protein and RNA sequence files to the specified RefSeq nucleotide and protein accessions
      --fasta-filter-file string   Limit protein and RNA sequence files to the specified RefSeq nucleotide and protein accessions included in the specified file
      --filename string            Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
      --help                       Print detailed help about a datasets command
      --include string(,string)    Specify the data files to include (comma-separated).
                                     * gene:           gene sequence
                                     * rna:            transcript
                                     * protein:        amino acid sequences
                                     * cds:            nucleotide coding sequences
                                     * 5p-utr:         5'-UTR
                                     * 3p-utr:         3'-UTR
                                     * product-report: gene transcript and protein locations and metadata
                                     * none:           do not retrieve any sequence files
                                      (default [rna,protein])
      --inputfile string           Read a list of NCBI Gene Symbols from a file to use as input
      --no-progressbar             Hide progress bar
      --ortholog strings           Retrieves data for an ortholog set. Provide one or more taxa (any rank, limited to vertebrates and insects) to filter results or 'all' for the complete set.
      --preview                    Show information about the requested data package
      --taxon string               Define species (NCBI taxid, common or scientific name) for gene symbol (default "human")
      --version                    Print version of datasets
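
Several of these options can be combined in one call. The sketch below writes a list of symbols to a file, then requests a preview of a package limited to gene, CDS, and protein sequences. It assumes that the input file holds one gene symbol per line (the option text above does not spell out the file format) and that --preview reports on the request rather than producing the archive; the file name `symbols.txt` is arbitrary.

```shell
#!/bin/sh
# Assumption: one gene symbol per line in the input file.
symbols_file="symbols.txt"
printf '%s\n' tp53 brca1 egfr > "$symbols_file"

if command -v datasets >/dev/null 2>&1; then
    # Preview the package for all listed symbols in human,
    # restricted to gene, CDS, and protein sequence files.
    datasets download gene symbol \
        --inputfile "$symbols_file" \
        --taxon human \
        --include gene,cds,protein \
        --preview
else
    echo "datasets not found; wrote $symbols_file only"
fi
```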

Generated May 13, 2024