Representations of a downloaded NCBI Datasets Package.
NCBI Datasets provides data as zip archives for Genome, Gene, Pathogen and Virus resources. Each archive contains a dataset catalog, which the classes in this module use to determine the package's file contents programmatically.
To get started, download a package and then create a generic Dataset wrapper:
>>> from ncbi.datasets.package.dataset import get_dataset_from_file
>>> package = get_dataset_from_file(path_to_file)
>>> for report in package.get_data_reports():
...     print(report)  # do something with the protobuf report object
Create a Dataset-derived object of type ‘dataset_type’ and return it.
Returns: an instance of the ‘Dataset’ subclass specified by the caller.
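A minimal sketch of this factory behaviour, assuming nothing about the real package beyond what is stated above. `make_dataset`, `Dataset` and `GeneDataset` here are hypothetical stand-ins, not the package's own code: the caller passes a class, the factory checks that it derives from `Dataset`, and returns an instance of exactly that class.

```python
from typing import Type, TypeVar


class Dataset:
    """Hypothetical stand-in for the package's Dataset base class."""

    def __init__(self, path: str) -> None:
        self.path = path


class GeneDataset(Dataset):
    """Hypothetical stand-in for a gene-specific subclass."""


D = TypeVar("D", bound=Dataset)


def make_dataset(path: str, dataset_type: Type[D]) -> D:
    """Instantiate the caller-specified Dataset subclass for a package file."""
    # Reject anything that is not a class derived from Dataset.
    if not (isinstance(dataset_type, type) and issubclass(dataset_type, Dataset)):
        raise TypeError(f"{dataset_type!r} is not a subclass of Dataset")
    return dataset_type(path)


ds = make_dataset("gene_package.zip", GeneDataset)
print(type(ds).__name__, ds.path)  # GeneDataset gene_package.zip
```

Passing the class itself, rather than a string name, lets a type checker infer that `ds` is a `GeneDataset`.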
Base class for extracting files from a datasets package
Provides methods to extract files from a datasets package based on the file names and types listed in the package's catalog file
Return True if the dataset is stored in a zip file
Return the data directory within the dataset (e.g. ncbi_dataset/data)
Return the dataset's file catalog as a dictionary
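These first three operations can be sketched with the standard `zipfile` and `json` modules. The sketch assumes the data directory is `ncbi_dataset/data` (the example given above) and that the catalog is stored there as `dataset_catalog.json`. That file name and the helper names `is_zipped` and `load_catalog` are assumptions for illustration, not the package's API.

```python
import json
import os
import zipfile

DATA_DIR = "ncbi_dataset/data"          # assumed data directory inside a package
CATALOG_NAME = "dataset_catalog.json"   # assumed catalog file name


def is_zipped(path: str) -> bool:
    """True if the package is a zip archive rather than an unpacked directory."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


def load_catalog(path: str) -> dict:
    """Read the file catalog from either a zipped or an unpacked package."""
    catalog_path = f"{DATA_DIR}/{CATALOG_NAME}"
    if is_zipped(path):
        with zipfile.ZipFile(path) as zf:
            # Zip member names always use forward slashes.
            return json.loads(zf.read(catalog_path))
    with open(os.path.join(path, catalog_path), encoding="utf-8") as fh:
        return json.load(fh)
```

Supporting both layouts means the same code works on the downloaded zip and on a directory produced by unzipping it.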
Return names of all files of type ‘file_type’, e.g. ‘PROTEIN_FASTA’
Return contents of all files of type ‘file_type’ along with their names
Return file handles for all files of type ‘file_type’ along with their names
Return all file types available in the current dataset
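Once the catalog is a dictionary, filtering by file type is a small traversal. The catalog shape used here (a top-level `assemblies` list whose entries carry `files` with `filePath` and `fileType` keys) is an assumption modelled on NCBI's `dataset_catalog.json`, and the sample catalog is illustrative only; check both against a real package. `file_names_by_type` and `file_types` are hypothetical names.

```python
from typing import Iterator, List, Set


def iter_files(catalog: dict) -> Iterator[dict]:
    """Yield every file entry in the (assumed) catalog structure."""
    for entry in catalog.get("assemblies", []):
        yield from entry.get("files", [])


def file_names_by_type(catalog: dict, file_type: str) -> List[str]:
    """Names of all files whose catalog type matches file_type."""
    return [f["filePath"] for f in iter_files(catalog) if f.get("fileType") == file_type]


def file_types(catalog: dict) -> Set[str]:
    """All distinct file types present in the catalog."""
    return {f["fileType"] for f in iter_files(catalog) if "fileType" in f}


# Illustrative catalog, not taken from a real download.
catalog = {
    "assemblies": [
        {"files": [{"filePath": "assembly_data_report.jsonl", "fileType": "DATA_REPORT"}]},
        {"accession": "GCF_000001405.39",
         "files": [
             {"filePath": "GCF_000001405.39/protein.faa", "fileType": "PROTEIN_FASTA"},
             {"filePath": "GCF_000001405.39/sequence_report.jsonl", "fileType": "SEQUENCE_REPORT"},
         ]},
    ],
}
print(file_names_by_type(catalog, "PROTEIN_FASTA"))  # ['GCF_000001405.39/protein.faa']
print(sorted(file_types(catalog)))  # ['DATA_REPORT', 'PROTEIN_FASTA', 'SEQUENCE_REPORT']
```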
Return the full text of file ‘file_name’
Get a handle to a file by its name within the dataset's data directory
file_name – Name of the file within the data directory; e.g., if the full path in the package is ncbi_dataset/data/GCF_000001405.39/chrX.fna, file_name should be GCF_000001405.39/chrX.fna
Returns a handle to the specified file
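For a zipped package, opening one member by its data-directory-relative name might look like the following. `open_data_file` is a hypothetical helper and `ncbi_dataset/data` is assumed as the prefix. Because it is a context manager, the archive stays open for exactly as long as the text handle is in use.

```python
import io
import zipfile
from contextlib import contextmanager
from typing import Iterator, TextIO

DATA_DIR = "ncbi_dataset/data"  # assumed data directory inside the package


@contextmanager
def open_data_file(zip_path: str, file_name: str) -> Iterator[TextIO]:
    """Yield a text handle for file_name, given relative to DATA_DIR."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(f"{DATA_DIR}/{file_name}") as raw:
            # zf.open returns a binary stream; wrap it to read decoded text.
            yield io.TextIOWrapper(raw, encoding="utf-8")


# Usage (path is illustrative):
# with open_data_file("genome_package.zip", "GCF_000001405.39/chrX.fna") as fh:
#     header = fh.readline()
```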
Retrieve report records defined via protobuf schema from jsonl files.
file_type – The type of file from the dataset catalog, e.g. ‘DATA_REPORT’ or ‘SEQUENCE_REPORT’.
protobuf_report_type – The protobuf message type (schema, defined for gRPC) that matches the current dataset and file type.
Yields protobuf objects of that type, one per record in the JSON Lines files.
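Reading JSON Lines reduces to parsing each non-blank line with a caller-supplied function. In this sketch `iter_report_records` is a hypothetical name and `json.loads` stands in for the protobuf parser. With the `protobuf` package installed, one would instead pass something like `lambda s: json_format.Parse(s, ReportType())` using `google.protobuf.json_format`, where `ReportType` is whichever message class matches the file type.

```python
import json
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_report_records(lines: Iterable[str], parse: Callable[[str], T]) -> Iterator[T]:
    """Yield one parsed record per non-blank JSON Lines entry."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield parse(line)


lines = ['{"id": 1}\n', "\n", '{"id": 2}\n']
print(list(iter_report_records(lines, json.loads)))  # [{'id': 1}, {'id': 2}]
```

Yielding lazily keeps memory flat even for very large report files, since only one record is materialised at a time.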
Retrieve Assembly reports
Methods to read Assembly and Assembly Sequence reports
Retrieve assembly reports
Yields a set of AssemblyDataReport protobuf objects
Retrieve assembly sequence reports
Yields a set of Assembly SequenceInfo protobuf objects
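The assembly methods can be pictured as thin wrappers that fix the catalog file type and delegate to a generic reader. Everything below is hypothetical: `CatalogDataset` replaces the real catalog lookup with an in-memory mapping from file type to JSONL text, and `get_sequence_reports` is an invented name (only `get_data_reports` appears in the quickstart). Plain `json.loads` again stands in for protobuf parsing.

```python
import json
from typing import Callable, Dict, Iterator, List


class CatalogDataset:
    """Hypothetical minimal base: maps each file type to a list of JSONL texts."""

    def __init__(self, files_by_type: Dict[str, List[str]]) -> None:
        self._files_by_type = files_by_type

    def get_report_records(self, file_type: str, parse: Callable[[str], object]) -> Iterator[object]:
        # Parse every non-blank line of every file registered under file_type.
        for text in self._files_by_type.get(file_type, []):
            for line in text.splitlines():
                if line.strip():
                    yield parse(line)


class AssemblyReports(CatalogDataset):
    """Hypothetical assembly wrapper: each method only fixes the file type."""

    def get_data_reports(self, parse: Callable[[str], object] = json.loads) -> Iterator[object]:
        yield from self.get_report_records("DATA_REPORT", parse)

    def get_sequence_reports(self, parse: Callable[[str], object] = json.loads) -> Iterator[object]:
        yield from self.get_report_records("SEQUENCE_REPORT", parse)


ds = AssemblyReports({"SEQUENCE_REPORT": ['{"chrName": "X"}\n{"chrName": "Y"}\n']})
print([r["chrName"] for r in ds.get_sequence_reports()])  # ['X', 'Y']
```

Under this sketch, a gene, virus or MicroBIGG-E wrapper would differ only in the file type and parser it passes to `get_report_records`.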
Retrieve Gene reports
Methods to read Gene reports
Retrieve gene report objects
Yields a set of GeneDescriptor protobuf objects
Retrieve Virus reports
Methods to read Virus reports
Retrieve virus assembly objects
Yields a set of virus assembly report protobuf objects
Retrieve MicroBIGG-E pathogen reports
Methods to read MicroBIGG-E reports
Retrieve MicroBIGG-E data report objects
Yields a set of MicroBIGG-E report protobuf objects