Download a gene ortholog dataset for a gene using the datasets command-line tool.
Gene ortholog metadata and FASTA sequence are available as an NCBI Datasets Gene Data Package.
At NCBI, gene orthologs are calculated for vertebrate and insect genes. Gene orthologs for most vertebrates are calculated in comparison to human. For fish, we have separately calculated orthologs based on comparison to zebrafish. For insects, orthologs are calculated in comparison to <em>Drosophila melanogaster</em>.
Orthologs are most easily retrieved through the NCBI Datasets command-line tool.
Ortholog summaries describe these ortholog datasets in JSON format and can be retrieved using the summary ortholog command.
Ortholog datasets are downloadable zip files that include sequence data, a data table and a data report for all calculated orthologs of the query gene. Sequence data includes gene, transcript and protein sequences. Ortholog datasets are retrieved using the download ortholog command.
The datasets summary ortholog command prints a summary of an ortholog dataset, including metadata for all calculated gene orthologs of a query gene. The ortholog summary can be requested by NCBI Gene ID, gene symbol or RefSeq nucleotide or protein accession. The summary is returned in JSON format.
When requesting an ortholog summary by gene symbol, you can also specify a species name or species-level Taxonomy ID using the --taxon flag. If no species is provided, ortholog summaries for human genes will be returned.
For example, the following datasets commands request ortholog summaries:
datasets summary ortholog gene-id 59272
datasets summary ortholog symbol gapdh --taxon mouse
datasets summary ortholog symbol gapdh --taxon mouse --taxon-filter mammals
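When these requests are scripted, it can help to assemble the argument list programmatically. The following is a minimal sketch in Python, not part of the datasets tool: the helper name `build_ortholog_summary_command` is hypothetical, it uses only the subcommands and flags shown in the examples above, and it assumes that `accession` is accepted by summary ortholog in the same way it is by download ortholog. Rejecting `--taxon` for non-symbol queries is a choice made in this sketch to mirror the text, not documented behavior of the CLI.

```python
import shlex
from typing import List, Optional


def build_ortholog_summary_command(
    kind: str,
    query: str,
    taxon: Optional[str] = None,
    taxon_filter: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `datasets summary ortholog` (hypothetical helper).

    kind is "gene-id", "symbol" or "accession"; "accession" is an assumption
    for summary requests.
    """
    if kind not in {"gene-id", "symbol", "accession"}:
        raise ValueError(f"unsupported query type: {kind}")
    if taxon is not None and kind != "symbol":
        # The text describes --taxon only for gene symbol requests.
        raise ValueError("--taxon is only used with symbol queries in this sketch")

    args = ["datasets", "summary", "ortholog", kind, query]
    if taxon is not None:
        args += ["--taxon", taxon]
    if taxon_filter is not None:
        args += ["--taxon-filter", taxon_filter]
    return args


# Build (but do not run) the third example command from the text.
command = build_ortholog_summary_command(
    "symbol", "gapdh", taxon="mouse", taxon_filter="mammals"
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)` on a machine where the datasets tool is installed; keeping the arguments as a list avoids shell quoting issues with multi-word species names.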
See the additional documentation for converting JSON (or JSON lines, using the --as-json-lines flag) to a tabular format.
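As an illustration of what such a conversion involves, here is a minimal Python sketch that turns JSON Lines text into a tab-separated table. It is not the conversion described in the additional documentation. The function names are hypothetical, and the field names `gene_id`, `symbol` and `taxname` and the sample records are placeholders chosen for illustration, not a statement of the actual summary schema.

```python
import csv
import io
import json
from typing import Any, Iterable, List


def lookup(record: Any, path: str) -> Any:
    """Follow a dot-separated path into nested dicts; return "" if missing."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def json_lines_to_tsv(lines: Iterable[str], columns: List[str]) -> str:
    """Convert JSON Lines records to a tab-separated table with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow([lookup(record, column) for column in columns])
    return out.getvalue()


# Placeholder records; real summaries will use the tool's own field names.
sample = [
    '{"gene_id": "1", "symbol": "GENE_A", "taxname": "Species one"}',
    '{"gene_id": "2", "symbol": "gene_b"}',
]
table = json_lines_to_tsv(sample, ["gene_id", "symbol", "taxname"])
print(table, end="")
```

Each column is addressed by a dot-separated path, so a nested field could be selected with a name such as `gene.symbol` if the records are nested; fields missing from a record become empty cells.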
The datasets download ortholog command downloads an ortholog dataset including sequence data, a data table and a data report for all calculated orthologs of the query gene. Sequence data includes gene, transcript and protein sequences. Datasets are downloaded as a zip file.
Ortholog datasets can be requested by NCBI Gene ID, gene symbol or RefSeq transcript or protein accession.
As with summary requests, when requesting an ortholog dataset by gene symbol you can specify a species name or species-level Taxonomy ID using the --taxon flag. If no species is provided, data for human genes will be returned as a gene data package.
For example, the following datasets commands download a gene ortholog data package:
datasets download ortholog gene-id 59272
datasets download ortholog symbol gapdh --taxon mouse
datasets download ortholog accession NM_000492.4 --filename cftr-ortho.zip
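Because the download is an ordinary zip file, its contents can be inspected with standard tools before extracting anything. The following Python sketch groups archive members by file extension. The function name `list_package_contents` is hypothetical, and the member names in the in-memory demo archive are placeholders, not the guaranteed layout of a real data package.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def list_package_contents(zip_source) -> Dict[str, List[str]]:
    """Group the file names inside a zip archive by extension.

    zip_source may be a path such as "cftr-ortho.zip" or a file-like object.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            groups[PurePosixPath(name).suffix].append(name)
    return dict(groups)


# Demo on a small in-memory archive with placeholder member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("README.md", "placeholder")
    demo.writestr("ncbi_dataset/data/gene.fna", ">placeholder\nACGT\n")
    demo.writestr("ncbi_dataset/data/protein.faa", ">placeholder\nMK\n")
    demo.writestr("ncbi_dataset/data/data_report.jsonl", "{}\n")
buffer.seek(0)

contents = list_package_contents(buffer)
for suffix, names in sorted(contents.items()):
    print(suffix, names)
```

For a real download, pass the file name given with --filename (for example `list_package_contents("cftr-ortho.zip")`) in place of the in-memory buffer.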
To convert the contained data report to tabular format, read about using the dataformat tool.