gene

download a gene dataset

Name

datasets download gene - download a gene dataset

Synopsis

datasets download gene [flags]

Description

Download a gene dataset that includes gene, transcript, and protein sequences, a data table, and a data report. Genes can be specified by NCBI Gene ID, gene symbol, or RefSeq accession. The dataset is downloaded as a zip file.

The default gene dataset includes the following files:

  • gene.fna (gene sequences)
  • rna.fna (transcript sequences)
  • protein.faa (protein sequences)
  • data_report.jsonl (data report with gene metadata)
  • data_table.tsv (data table with gene metadata, one transcript per row)
  • dataset_catalog.json (a list of files and file types included in the dataset)
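
Once the zip file has been unpacked, a quick check can confirm which of the default files listed above are present. The following is a minimal sketch, not part of the datasets tool: check_gene_dataset is a hypothetical helper, and it searches by file name because the directory layout inside the archive is not described on this page.

```shell
#!/bin/sh
# check_gene_dataset DIR
# Hypothetical helper: report which default gene dataset files exist
# anywhere under DIR (an already-unpacked dataset directory).
# Returns 0 if all six files are found, 1 otherwise.
check_gene_dataset() {
    dir=$1
    status=0
    for name in gene.fna rna.fna protein.faa data_report.jsonl \
                data_table.tsv dataset_catalog.json; do
        # Match on file name only, so no internal layout is assumed.
        if [ -n "$(find "$dir" -type f -name "$name" 2>/dev/null)" ]; then
            printf 'found    %s\n' "$name"
        else
            printf 'missing  %s\n' "$name"
            status=1
        fi
    done
    return "$status"
}
```

For example, after unpacking ncbi_dataset.zip into a directory of your choice (gene_dataset is an arbitrary name), run:

  check_gene_dataset gene_dataset

Files left out with the --exclude-gene, --exclude-rna, or --exclude-protein options are reported as missing, which is expected in that case.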

Refer to NCBI’s command line quickstart documentation for information about getting started with the command-line tools.

Examples

  datasets download gene gene-id 672
  datasets download gene symbol brca1 --taxon mouse
  datasets download gene accession NP_000483.3
  datasets download gene gene-id 2778 --fasta-filter NC_000020.11,NM_001077490.3,NP_001070958.1

Options

      --api-key string             NCBI Datasets API Key
      --exclude-gene               exclude gene.fna (gene sequence file)
      --exclude-protein            exclude protein.faa (protein sequence file)
      --exclude-rna                exclude rna.fna (transcript sequence file)
      --fasta-filter strings       limit gene fasta download to a specific list of accessions
      --fasta-filter-file string   file of accessions to limit gene fasta download
      --filename string            specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
  -h, --help                       help for gene
      --include-3p-utr             include 3p_utr.fna (3'-UTR sequence file)
      --include-5p-utr             include 5p_utr.fna (5'-UTR sequence file)
      --include-cds                include cds.fna (CDS sequence file)
      --no-progressbar             hide progress bar
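
Several of these options can be combined in a script. The sketch below wraps a gene-id download that restricts the FASTA output through --fasta-filter-file and names the output with --filename. download_gene_subset is a hypothetical helper, and the one-accession-per-line layout of the filter file is an assumption; this page does not specify the file format.

```shell
#!/bin/sh
# download_gene_subset GENE_ID OUT_ZIP ACCESSION [ACCESSION...]
# Hypothetical helper: download a gene dataset for one NCBI Gene ID,
# limiting the FASTA files to the listed accessions.
download_gene_subset() {
    if [ "$#" -lt 3 ]; then
        echo "usage: download_gene_subset GENE_ID OUT_ZIP ACCESSION..." >&2
        return 2
    fi
    gene_id=$1
    out_zip=$2
    shift 2

    acc_file=$(mktemp) || return 1
    # Assumption: the filter file holds one accession per line.
    printf '%s\n' "$@" > "$acc_file"

    rc=0
    datasets download gene gene-id "$gene_id" \
        --fasta-filter-file "$acc_file" \
        --filename "$out_zip" \
        --no-progressbar || rc=$?

    # Remove the temporary accession list whether or not the download worked.
    rm -f "$acc_file"
    return "$rc"
}
```

A call equivalent to the last entry under Examples, with an arbitrary output name, might look like:

  download_gene_subset 2778 gene_2778.zip NC_000020.11 NM_001077490.3 NP_001070958.1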

Fields

  Mnemonic                                       Name
  annotation-assemblies-in-scope-accession       Annotation Assemblies in Scope Accession
  annotation-assemblies-in-scope-name            Annotation Assemblies in Scope Name
  annotation-release-date                        Annotation Release Date
  annotation-release-name                        Annotation Release Name
  chromosomes                                    Chromosomes
  common-name                                    Common Name
  description                                    Description
  ensembl-geneids                                Ensembl GeneIDs
  gene-id                                        NCBI GeneID
  gene-type                                      Gene Type
  genomic-range-accession                        Genomic Range Sequence Accession
  genomic-range-range-orientation                Genomic Range Orientation
  genomic-range-range-start                      Genomic Range Start
  genomic-range-range-stop                       Genomic Range Stop
  genomic-region-gene-range-accession            Genomic Region Gene Range Sequence Accession
  genomic-region-gene-range-range-orientation    Genomic Region Gene Range Orientation
  genomic-region-gene-range-range-start          Genomic Region Gene Range Start
  genomic-region-gene-range-range-stop           Genomic Region Gene Range Stop
  genomic-region-genomic-region-type             Genomic Region Genomic Region Type
  name-authority                                 Nomenclature Authority
  name-id                                        Nomenclature ID
  omim-ids                                       OMIM IDs
  orientation                                    Orientation
  protein-accession                              Protein Accession
  protein-ensembl-protein                        Protein Ensembl Protein
  protein-isoform                                Protein Isoform
  protein-length                                 Protein Length
  protein-mat-peptide-accession                  Protein Mature Peptide Accession
  protein-mat-peptide-length                     Protein Mature Peptide Length
  protein-mat-peptide-name                       Protein Mature Peptide Name
  protein-name                                   Protein Name
  ref-standard-gene-range-accession              Reference Standard Gene Range Sequence Accession
  ref-standard-gene-range-range-orientation      Reference Standard Gene Range Orientation
  ref-standard-gene-range-range-start            Reference Standard Gene Range Start
  ref-standard-gene-range-range-stop             Reference Standard Gene Range Stop
  ref-standard-genomic-region-type               Reference Standard Genomic Region Type
  replaced-gene-id                               Replaced NCBI GeneID
  rna-type                                       RNA Type
  swissprot-accessions                           SwissProt Accessions
  symbol                                         Symbol
  synonyms                                       Synonyms
  tax-id                                         Taxonomic ID
  tax-name                                       Taxonomic Name
  transcript-accession                           Transcript Accession
  transcript-cds-accession                       Transcript CDS Sequence Accession
  transcript-cds-range-orientation               Transcript CDS Orientation
  transcript-cds-range-start                     Transcript CDS Start
  transcript-cds-range-stop                      Transcript CDS Stop
  transcript-ensembl-transcript                  Transcript Ensembl Transcript
  transcript-genomic-location-accession          Transcript Genomic Accession
  transcript-genomic-location-exon-orientation   Transcript Genomic Exons Orientation
  transcript-genomic-location-exon-start         Transcript Genomic Exons Start
  transcript-genomic-location-exon-stop          Transcript Genomic Exons Stop
  transcript-genomic-location-range-orientation  Transcript Genomic Orientation
  transcript-genomic-location-range-start        Transcript Genomic Start
  transcript-genomic-location-range-stop         Transcript Genomic Stop
  transcript-genomic-location-seq-name           Transcript Genomic Seq Name
  transcript-length                              Transcript Transcript Length
  transcript-name                                Transcript Transcript Name
  transcript-protein-accession                   Transcript Protein Accession
  transcript-protein-ensembl-protein             Transcript Protein Ensembl Protein
  transcript-protein-isoform                     Transcript Protein Isoform
  transcript-protein-length                      Transcript Protein Length
  transcript-protein-mat-peptide-accession       Transcript Protein Mature Peptide Accession
  transcript-protein-mat-peptide-length          Transcript Protein Mature Peptide Length
  transcript-protein-mat-peptide-name            Transcript Protein Mature Peptide Name
  transcript-protein-name                        Transcript Protein Name
  transcript-transcript-type                     Transcript Type
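
This page does not state where these field names are used. As one illustration, assuming the names (or their mnemonics) appear as column headers in a tab-separated file such as data_table.tsv, the columns of interest could be selected by header name. select_columns is a hypothetical helper built on awk, not part of the datasets tool, and it works with whatever header names the file actually contains.

```shell
#!/bin/sh
# select_columns FILE NAME[,NAME...]
# Hypothetical helper: print the named columns of a tab-separated file,
# in the order requested. The first line of FILE is treated as the header.
# Exits with status 2 if a requested column name is not in the header.
select_columns() {
    if [ "$#" -ne 2 ] || [ -z "$2" ]; then
        echo "usage: select_columns FILE NAME[,NAME...]" >&2
        return 2
    fi
    awk -F '\t' -v OFS='\t' -v want="$2" '
        NR == 1 {
            # Map each header name to its column position.
            n = split(want, names, ",")
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    printf "unknown column: %s\n", names[j] > "/dev/stderr"
                    exit 2
                }
                col[j] = pos[names[j]]
            }
        }
        {
            # Emit the selected columns for the header and every data row.
            line = $(col[1])
            for (j = 2; j <= n; j++) line = line OFS $(col[j])
            print line
        }
    ' "$1"
}
```

Column names that contain spaces need quoting on the command line, since the comma is the only separator the helper recognizes.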

Commands


  gene-id     download a gene dataset by NCBI Gene ID
  symbol      download a gene dataset by gene symbol
  accession   download a gene dataset by RefSeq nucleotide or protein accession
  taxon       download a gene dataset by taxon

Generated October 22, 2021