Genome assembly report

Genome record accession, organism, assembly statistics, and annotation info

The downloaded genome package contains a genome assembly data report in JSON Lines format in the file:

ncbi_dataset/data/assembly_data_report.jsonl

Each line of the genome assembly data report file is a hierarchical JSON object that represents a single genome assembly record. The schema of the genome assembly record is defined in the tables below where each row describes a single field in the report or a sub-structure, which is a collection of fields. The outermost structure of the report is AssemblyDataReport.
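As a minimal sketch of reading this file, assuming the package has been unzipped into the current directory, the following Python parses the report one line at a time, treating each line as its own JSON object. The helper names `iter_assembly_reports` and `summarize` are illustrative and not part of any NCBI tool.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_assembly_reports(path: str) -> Iterator[dict]:
    """Yield one parsed AssemblyDataReport dict per non-empty line."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def summarize(report: dict) -> tuple:
    """Pick two fields; .get() is used because fields may be absent."""
    organism = report.get("organism", {})
    return (report.get("accession"), organism.get("organismName"))
```

Calling `summarize` on each item of `iter_assembly_reports("ncbi_dataset/data/assembly_data_report.jsonl")` would give one `(accession, organismName)` pair per assembly. Reading line by line keeps memory use flat for large packages.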

Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option. Refer to the dataformat CLI tool reference to see how to use this tool to transform assembly data reports from JSON Lines to tabular formats.
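Mnemonics that end in a hyphen (such as `assminfo-` or `organism-`) belong to sub-structures. The sketch below assumes that a full `--fields` name is formed by joining the sub-structure prefix to the mnemonic of the nested field. That is how the tables on this page read, but the rule is not spelled out here, so confirm it against the dataformat CLI reference. The `PREFIXES` and `LEAVES` mappings and the `field_mnemonic` helper are hypothetical and cover only a few rows.

```python
# Prefixes copied from the AssemblyDataReport table (subset).
PREFIXES = {
    "organism": "organism-",
    "assemblyInfo": "assminfo-",
    "assemblyStats": "assmstats-",
    "annotationInfo": "annotinfo-",
}

# Leaf mnemonics copied from the AssemblyInfo and AssemblyStats tables (subset).
LEAVES = {
    ("assemblyInfo", "assemblyLevel"): "level",
    ("assemblyInfo", "assemblyName"): "name",
    ("assemblyStats", "contigN50"): "contig-n50",
    ("assemblyStats", "totalSequenceLength"): "total-sequence-len",
}


def field_mnemonic(structure: str, field: str) -> str:
    """Join a sub-structure prefix and a leaf mnemonic (assumed rule)."""
    return PREFIXES[structure] + LEAVES[(structure, field)]


print(field_mnemonic("assemblyInfo", "assemblyName"))
print(field_mnemonic("assemblyStats", "contigN50"))
```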

Sample report

{
  "accession": "GCF_000001405.40",
  "annotationInfo": {
    "busco": {
      "buscoLineage": "primates_odb10",
      "buscoVer": "4.1.4",
      "complete": 0.99187225,
      "duplicated": 0.007256894,
      "fragmented": 0.0015239477,
      "missing": 0.0066037737,
      "singleCopy": 0.9846154,
      "totalCount": "13780"
    },
    "method": "Best-placed RefSeq; Gnomon; RefSeqFE; cmsearch; tRNAscan-SE",
    "name": "GCF_000001405.40-RS_2023_10",
    "pipeline": "NCBI eukaryotic genome annotation pipeline",
    "provider": "NCBI RefSeq",
    "releaseDate": "2023-10-02",
    "reportUrl": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Homo_sapiens/GCF_000001405.40-RS_2023_10.html",
    "softwareVersion": "10.2",
    "stats": {
      "geneCounts": {
        "nonCoding": 22158,
        "other": 413,
        "proteinCoding": 20080,
        "pseudogene": 17001,
        "total": 59652
      }
    },
    "status": "Updated annotation"
  },
  "assemblyInfo": {
    "assemblyLevel": "Chromosome",
    "assemblyName": "GRCh38.p14",
    "assemblyStatus": "current",
    "assemblyType": "haploid-with-alt-loci",
    "bioprojectAccession": "PRJNA31257",
    "bioprojectLineage": [
      {
        "bioprojects": [
          {
            "accession": "PRJNA31257",
            "title": "The Human Genome Project, currently maintained by the Genome Reference Consortium (GRC)"
          }
        ]
      }
    ],
    "blastUrl": "https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_SPEC=GDH_GCF_000001405.40",
    "description": "Genome Reference Consortium Human Build 38 patch release 14 (GRCh38.p14)",
    "pairedAssembly": {
      "accession": "GCA_000001405.29",
      "onlyGenbank": "4 unlocalized and unplaced scaffolds.",
      "status": "current"
    },
    "refseqCategory": "reference genome",
    "releaseDate": "2022-02-03",
    "submitter": "Genome Reference Consortium",
    "synonym": "hg38"
  },
  "assemblyStats": {
    "contigL50": 18,
    "contigN50": 57879411,
    "gapsBetweenScaffoldsCount": 349,
    "gcCount": "1374283647",
    "gcPercent": 41.0,
    "numberOfComponentSequences": 35611,
    "numberOfContigs": 996,
    "numberOfOrganelles": 1,
    "numberOfScaffolds": 470,
    "scaffoldL50": 16,
    "scaffoldN50": 67794873,
    "totalNumberOfChromosomes": 24,
    "totalSequenceLength": "3099441038",
    "totalUngappedLength": "2948318359"
  },
  "currentAccession": "GCF_000001405.40",
  "organelleInfo": [
    {
      "description": "Mitochondrion",
      "submitter": "Genome Reference Consortium",
      "totalSeqLength": "16569"
    }
  ],
  "organism": {
    "commonName": "human",
    "organismName": "Homo sapiens",
    "taxId": 9606
  },
  "pairedAccession": "GCA_000001405.29",
  "sourceDatabase": "SOURCE_DATABASE_REFSEQ"
}
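In the sample above, some counts appear as quoted strings (`"totalSequenceLength": "3099441038"`) while others are bare numbers (`"contigN50": 57879411`). The quoted ones correspond to fields typed `uint64` in the tables below; writing 64-bit integers as JSON strings is the standard protocol buffers JSON mapping. A minimal sketch of converting them before doing arithmetic follows. `UINT64_FIELDS` is an illustrative, hand-picked subset covering only two `assemblyStats` fields.

```python
import json

# Fields typed uint64 in the AssemblyStats table (illustrative subset).
UINT64_FIELDS = ("totalSequenceLength", "totalUngappedLength")


def coerce_uint64(stats: dict) -> dict:
    """Return a copy of assemblyStats with string-encoded uint64 values as int."""
    converted = dict(stats)
    for name in UINT64_FIELDS:
        if name in converted:
            converted[name] = int(converted[name])
    return converted


# Values taken from the sample report.
sample = json.loads(
    '{"contigN50": 57879411,'
    ' "totalSequenceLength": "3099441038",'
    ' "totalUngappedLength": "2948318359"}'
)
stats = coerce_uint64(sample)
gapped_bases = stats["totalSequenceLength"] - stats["totalUngappedLength"]
print(gapped_bases)  # 151122679
```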

AssemblyDataReport Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`accession`	`accession`	Assembly Accession	`string`	The GenColl assembly accession	`GCF_000001405.40`
`currentAccession`	`current-accession`	Current Accession	`string`	The latest GenColl assembly accession for this revision chain	`GCF_000001405.40`
`sourceDatabase`	`source_database`	Source Database	`SourceDatabase`	Source of the accession. The paired accession, if it exists, is from the other database.	`REFSEQ` `GENBANK`
`organism`	`organism-`	Organism	`Organism`
`assemblyInfo`	`assminfo-`	Assembly	`AssemblyInfo`	Metadata for the genome assembly submission
`assemblyStats`	`assmstats-`	Assembly Stats	`AssemblyStats`	Global statistics for the genome assembly
`organelleInfo repeated`	`organelle-`	Organelle	`OrganelleInfo`	Metadata for all associated organelle genomes
`annotationInfo`	`annotinfo-`	Annotation	`AnnotationInfo`	Metadata and statistics for the genome assembly annotation, when available
`wgsInfo`	`wgs-`	WGS	`WGSInfo`	Metadata pertaining to the Whole Genome Shotgun (WGS) record for the genome assemblies that are complete genomes. Those that are clone-based do not have WGS-master records.
`typeMaterial`	`type_material-`	Type Material	`TypeMaterial`
`checkmInfo`	`checkm-`	CheckM	`CheckM`	Metadata on the completeness and contamination of this assembly
`averageNucleotideIdentity`	`ani-`	ANI	`AverageNucleotideIdentity`
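The accession prefix mirrors `sourceDatabase`: in the sample, the RefSeq record is `GCF_000001405.40` and its paired GenBank record is `GCA_000001405.29`. The following sketch splits an accession into its parts, assuming the `GCA_` or `GCF_` prefix, digits, and `.version` layout seen in the examples on this page. The regular expression and the `parse_accession` helper are illustrative.

```python
import re

# GCA_ marks GenBank assemblies and GCF_ marks RefSeq assemblies.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d+)\.(\d+)$")
PREFIX_TO_DATABASE = {"GCA": "GENBANK", "GCF": "REFSEQ"}


def parse_accession(accession: str) -> dict:
    """Split an assembly accession into database, number and version."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"unrecognized assembly accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "database": PREFIX_TO_DATABASE[prefix],
        "number": number,
        "version": int(version),
    }


print(parse_accession("GCF_000001405.40"))
```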

ANIMatch Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`assembly`	`assembly`	Assembly	`string`		`GCA_010191885.1`
`organismName`	`organism`	Organism	`string`		`Salmonella enterica subsp. enterica serovar Typhimurium`
`category`	`category`	Type Category	`ANITypeCategory`		`Type material`
`ani`	`ani`	ANI	`float`		`98.5`
`assemblyCoverage`	`assembly_coverage`	Assembly Coverage	`float`	AKA qcoverage	`90.75`
`typeAssemblyCoverage`	`type_assembly_coverage`	Type Assembly Coverage	`float`	AKA scoverage	`89.60`

AnnotationInfo Structure

Field	Table Field Mnemonic	Table Column Name	Type
`name`	`name`	Name	`string`
`provider`	`provider`	Provider	`string`
`releaseDate`	`release-date`	Release Date	`string`
`reportUrl`	`report-url`	Report URL	`string`
`stats`	`featcount-`	Count	`FeatureCounts`
`busco`	`busco-`	BUSCO	`BuscoStat`
`method`	`method`	Method	`string`
`pipeline`	`pipeline`	Pipeline	`string`
`softwareVersion`	`software-version`	Software Version	`string`
`status`	`status`	Status	`string`
`releaseVersion`	`release-version`	Release Version	`string`

AssemblyInfo Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`assemblyLevel`	`level`	Level	`string`	The level at which a genome has been assembled	`chromosome` `scaffold` `contig`
`assemblyStatus`	`status`	Status	`AssemblyStatus`	The GenColl assembly status	`current`
`pairedAssembly`	`paired-assm-`	Paired Assembly	`PairedAssembly`	Metadata from the GenBank or RefSeq assembly paired with this one
`assemblyName`	`name`	Name	`string`	The assembly submitter’s name for the genome assembly, when provided. Otherwise, a default name in the form ASM#####v# is assigned	`GRCh38.p14` `ASM985889v3`
`assemblyType`	`type`	Type	`string`	Chromosome content of the submitted genome assembly	`haploid-with-alt-loci` `haploid`
`bioprojectLineage repeated`	`bioproject-`	BioProject	`BioProjectLineage`	The lineage of BioProject accessions. The specific BioProject which produced the sequences in the genome assembly is listed first, followed in order by its antecedents.
`bioprojectAccession`	`bioproject`	BioProject Accession	`string`
`releaseDate`	`release-date`	Release Date	`string`	Date the assembly was made available by NCBI. This field is not returned by versions of the datasets command-line interface (CLI) program earlier than 15.
`description`	`description`	Description	`string`	Long description for this genome
`submitter`	`submitter`	Submitter	`string`	The submitting consortium or organization. Full submitter information is available in the BioProject
`refseqCategory`	`refseq-category`	Refseq Category	`string`	The RefSeq Category is either reference or representative genome and indicates the RefSeq project classification	`reference genome` `representative genome`
`synonym`	`synonym`	Synonym	`string`	Genome name ascribed to this assembly by the UC Santa Cruz genome browser	`hg38`
`linkedAssemblies repeated`	`linked-assm-`	Linked Assembly	`LinkedAssembly`	Genome assemblies derived from the same diploid individual
`atypical`	`atypical`	Atypical	`AtypicalInfo`	Information on atypical genomes - genomes that have assembly issues or are otherwise atypical
`genomeNotes repeated`	`notes`	Notes	`string`	All the RefSeq messages associated with this assembly
`sequencingTech`	`sequencing-tech`	Sequencing Tech	`string`	Sequencing technology used to sequence this genome
`assemblyMethod`	`assembly-method`	Assembly Method	`string`	Genome assembly method
`biosample`	`biosample-`	BioSample	`BioSampleDescriptor`	NCBI BioSample from which the sequences in the genome assembly were obtained.
`blastUrl`	`blast-url`	Blast URL	`string`	URL to blast page for this assembly
`comments`	coming soon	coming soon	`string`	Freeform comments
`suppressionReason`	`suppression-reason`	Suppression Reason	`string`	The reason the assembly was suppressed (suppressed assemblies only)
`diploidRole`			`LinkedAssemblyType`

AssemblyStats Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description
`totalNumberOfChromosomes`	`total-number-of-chromosomes`	Total Number of Chromosomes	`uint32`	Count of nuclear chromosomes, organelles and plasmids in a submitted genome assembly
`totalSequenceLength`	`total-sequence-len`	Total Sequence Length	`uint64`	Total sequence length of the nuclear genome including unplaced and unlocalized sequences
`totalUngappedLength`	`total-ungapped-len`	Total Ungapped Length	`uint64`	Total length of all top-level sequences ignoring gaps. Any stretch of 10 or more Ns in a sequence is treated like a gap
`numberOfContigs`	`number-of-contigs`	Number of Contigs	`uint32`	Total number of sequence contigs in the assembly. Any stretch of 10 or more Ns in a sequence is treated as a gap between two contigs in a scaffold when counting contigs and calculating contig N50 & L50 values
`contigN50`	`contig-n50`	Contig N50	`uint32`	Length such that sequence contigs of this length or longer include half the bases of the assembly
`contigL50`	`contig-l50`	Contig L50	`uint32`	Number of sequence contigs that are longer than, or equal to, the N50 length and therefore include half the bases of the assembly
`numberOfScaffolds`	`number-of-scaffolds`	Number of Scaffolds	`uint32`	Number of scaffolds including placed, unlocalized, unplaced, alternate loci and patch scaffolds
`scaffoldN50`	`scaffold-n50`	Scaffold N50	`uint32`	Length such that scaffolds of this length or longer include half the bases of the assembly
`scaffoldL50`	`scaffold-l50`	Scaffold L50	`uint32`	Number of scaffolds that are longer than, or equal to, the N50 length and therefore include half the bases of the assembly
`gapsBetweenScaffoldsCount`	`gaps-between-scaffolds-count`	Gaps Between Scaffolds Count	`uint32`	Number of unspanned gaps between scaffolds
`numberOfComponentSequences`	`number-of-component-sequences`	Number of Component Sequences	`uint32`	Total number of component WGS or clone sequences in the assembly
`gcPercent`	`gc-percent`	GC Percent	`float`	The percentage of GC base-pairs in the assembly
`genomeCoverage`	`genome-coverage`	Genome Coverage	`string`	Genome assembly coverage
`numberOfOrganelles`	`number-of-organelles`	Number of Organelles	`uint32`	Number of organelle genomes associated with the assembly
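The N50 and L50 definitions in this table can be made concrete with a short worked example. This is a sketch of the textbook calculation on a made-up list of contig lengths, not NCBI's implementation; the function name `n50_l50` is illustrative.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    Sort longest first and accumulate until at least half of the
    total bases are covered. N50 is the length of the sequence that
    crosses the halfway point; L50 is how many sequences it took.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Hypothetical contig lengths: the total is 300, so the halfway point is 150.
print(n50_l50([100, 80, 50, 40, 30]))  # (80, 2)
```

The two longest contigs (100 + 80 = 180) already cover half of the 300 bases, so N50 is 80 and L50 is 2.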

AtypicalInfo Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`isAtypical`	`is-atypical`	Is Atypical	`bool`	If true there are assembly issues or the assembly is in some way non-standard
`warnings repeated`	`warnings`	Warnings	`string`	The reasons that the assembly is considered atypical

AverageNucleotideIdentity Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`taxonomyCheckStatus`	`check-status`	Check status	`AverageNucleotideIdentity.TaxonomyCheckStatus`		`ok` `failed` `inconclusive`
`matchStatus`	`best-match-status`	Best match status	`AverageNucleotideIdentity.MatchStatus`		`derived-species-match`
`submittedOrganism`	`submitted-organism`	Submitted organism	`string`	Column 5 of ANI Report	`Salmonella enterica subsp. enterica serovar Tennessee str. CDC07-0191`
`submittedSpecies`	`submitted-species`	Submitted species	`string`	Column 6 of ANI Report	`Salmonella enterica`
`category`	`category`	Category	`ANITypeCategory`		`syntype`
`submittedAniMatch`	`submitted-ani-match-`	Declared ANI match	`ANIMatch`
`bestAniMatch`	`best-ani-match-`	Best ANI match	`ANIMatch`
`comment`	`comment`	Comment	`string`

BioProject Structure

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The record can be retrieved from NCBI BioProject.

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`accession`	`accession`	Accession	`string`	BioProject accession	`PRJEB35387`
`title`	`title`	Title	`string`	Title of the BioProject provided by the submitter	`Sciurus carolinensis (grey squirrel) genome assembly, mSciCar1`
`parentAccessions repeated`	`parent-accessions`	Parent Accessions	`string`	BioProject accession containing multiple children BioProjects	`["PRJNA489243","PRJEB33226","PRJEB40665"]`

BioProjectLineage Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`bioprojects repeated`	`lineage-`	Lineage	`BioProject`	A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium

BioSampleAttribute Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`name`	`name`	Name	`string`
`value`	`value`	Value	`string`

BioSampleContact Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`lab`	`lab`	Lab	`string`	Submitter lab name.

BioSampleDescription Structure

Description of the BioSample object

Field	Table Field Mnemonic	Table Column Name	Type
`title`	`title`	Title	`string`
`organism`	`organism-`	Organism	`Organism`
`comment`	`comment`	Comment	`string`

BioSampleDescriptor Structure

TODO: We may be able to delete but not sure if other things are relying on it…

Field	Table Field Mnemonic	Table Column Name	Type	Examples
`accession`	`accession`	Accession	`string`	`SAMN20055006`
`lastUpdated`	`last-updated`	Last updated	`string`
`publicationDate`	`publication-date`	Publication date	`string`
`submissionDate`	`submission-date`	Submission date	`string`
`sampleIds repeated`	`ids-`	Sample Identifiers	`BioSampleId`
`description`	`description-`	Description	`BioSampleDescription`
`owner`	`owner-`	Owner	`BioSampleOwner`
`models repeated`	`models`	Models	`string`
`bioprojects repeated`	`bioproject-`	BioProject	`BioProject`
`package`	`package`	Package	`string`	`MIGS.ba.air.4.0`
`attributes repeated`	`attribute-`	Attribute	`BioSampleAttribute`
`status`	`status-`	Status	`BioSampleStatus`

BioSampleId Structure

Field	Table Field Mnemonic	Table Column Name	Type	Examples
`db`	`db`	Database	`string`	`Wellcome Sanger Institute`
`label`	`label`	Label	`string`	`Sample name`
`value`	`value`	Value	`string`	`COG-UK/ALDP-17A6A8C`

BioSampleOwner Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`name`	`name`	Name	`string`
`contacts repeated`	`contact-`	Contact	`BioSampleContact`

BioSampleStatus Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`status`	`status`	Status	`string`		`live`
`when`	`when`	When	`string`

BuscoStat Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description
`buscoLineage`	`lineage`	Lineage	`string`	BUSCO Lineage
`buscoVer`	`ver`	Version	`string`	BUSCO Version
`complete`	`complete`	Complete	`float`	BUSCO score: Complete
`singleCopy`	`singlecopy`	Single Copy	`float`	BUSCO score: Single Copy
`duplicated`	`duplicated`	Duplicated	`float`	BUSCO score: Duplicated
`fragmented`	`fragmented`	Fragmented	`float`	BUSCO score: Fragmented
`missing`	`missing`	Missing	`float`	BUSCO score: Missing
`totalCount`	`totalcount`	Total Count	`uint64`	BUSCO score: Total Count
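In the sample report the BUSCO values are fractions between 0 and 1 rather than percentages, and they follow the usual BUSCO relationships: complete is the sum of single-copy and duplicated, and complete, fragmented and missing together account for all BUSCOs. Below is a small consistency check using the sample values. The helper name and the tolerance are assumptions chosen for illustration.

```python
import math


def busco_is_consistent(busco: dict, tol: float = 1e-4) -> bool:
    """Check the two BUSCO identities within an absolute tolerance."""
    parts_ok = math.isclose(
        busco["singleCopy"] + busco["duplicated"], busco["complete"], abs_tol=tol
    )
    total_ok = math.isclose(
        busco["complete"] + busco["fragmented"] + busco["missing"], 1.0, abs_tol=tol
    )
    return parts_ok and total_ok


# Values from annotationInfo.busco in the sample report.
sample_busco = {
    "complete": 0.99187225,
    "duplicated": 0.007256894,
    "fragmented": 0.0015239477,
    "missing": 0.0066037737,
    "singleCopy": 0.9846154,
    "totalCount": "13780",
}
print(busco_is_consistent(sample_busco))  # True
print(f"{sample_busco['complete']:.2%}")  # 99.19%
```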

CheckM Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`checkmMarkerSet`	`marker-set`	marker set	`string`	The taxonomic group used as the basis for comparison with this assembly when computing CheckM values	`Mycobacterium avium`
`checkmSpeciesTaxId`	`species-tax-id`	species tax id	`uint32`	The species-level taxid for this assembly's CheckM dataset	`1764`
`checkmMarkerSetRank`	`marker-set-rank`	marker set rank	`string`	CheckM taxonomic rank of checkm_marker_set	`species` `genus`
`checkmVersion`	`version`	version	`string`	CheckM software version	`v1.2.0`
`completeness`	`completeness`	completeness	`float`	Percent completeness of this assembly	`86.83`
`contamination`	`contamination`	contamination	`float`	Percent contamination of this assembly	`5.18`
`completenessPercentile`	`completeness-percentile`	completeness percentile	`float`	The percent of assemblies under the taxonomic grouping ‘checkm_marker_set’ that this assembly is as-or-more complete than.	`79`

FeatureCounts Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`geneCounts`	`gene-`	Gene	`GeneCounts`	Counts of gene types

GeneCounts Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description
`total`	`total`	Total	`uint32`	Total number of annotated genes
`proteinCoding`	`protein-coding`	Protein-coding	`uint32`	Count of annotated genes that encode a protein
`nonCoding`	`non-coding`	Non-coding	`uint32`	Count of transcribed non-coding genes (e.g. lncRNAs, miRNAs, rRNAs); excludes transcribed pseudogenes
`pseudogene`	`pseudogene`	Pseudogene	`uint32`	Count of transcribed and non-transcribed pseudogenes
`other`	`other`	Other	`uint32`	Count of genic region GeneIDs and non-genic regulatory GeneIDs
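In the sample report, `total` equals the sum of the other four counts (20080 + 22158 + 17001 + 413 = 59652). The sketch below checks that relationship. It is an observation about the sample rather than a documented guarantee, and the helper name `category_sum` is illustrative.

```python
CATEGORIES = ("proteinCoding", "nonCoding", "pseudogene", "other")


def category_sum(gene_counts: dict) -> int:
    """Sum the per-category counts, treating absent categories as 0."""
    return sum(gene_counts.get(name, 0) for name in CATEGORIES)


# Values from annotationInfo.stats.geneCounts in the sample report.
sample_counts = {
    "nonCoding": 22158,
    "other": 413,
    "proteinCoding": 20080,
    "pseudogene": 17001,
    "total": 59652,
}
print(category_sum(sample_counts) == sample_counts["total"])  # True
```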

InfraspecificNames Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`breed`	`breed`	Breed	`string`	A homogeneous group of animals within a domesticated species	`Hereford` `boxer`
`cultivar`	`cultivar`	Cultivar	`string`	A variety of plant within a species produced and maintained by cultivation	`B73`
`ecotype`	`ecotype`	Ecotype	`string`	A population or subspecies occupying a distinct habitat	`Alpine`
`isolate`	`isolate`	Isolate	`string`	The individual isolate from which the sequences in the genome assembly were derived	`L1 Dominette 01449 registration number 42190680` `Pmale09`
`sex`	`sex`	Sex	`string`	Male or female	`female`
`strain`	`strain`	Strain	`string`	A genetic variant, subtype or culture within a species	`SE11`

LinkedAssembly Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`linkedAssembly`	`accession`	Accession	`string`	The linked assembly accession	`GCA_000212995.1`
`assemblyType`	`type`	Type	`LinkedAssemblyType`	The linked assembly type

OrganelleInfo Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description
`infraspecificName`	`infraspecific-name`	Infraspecific Name	`string`	The strain, breed, cultivar or ecotype of the organism from which the sequences in the assembly were derived
`bioproject repeated`	`bioproject-accessions`	BioProject Accessions	`string`	The associated BioProject accession, when available
`description`	`description`	Description	`string`	Long description of the organelle genome
`totalSeqLength`	`total-seq-length`	Total Seq Length	`uint64`	Sequence length of the organelle genome
`submitter`	`submitter`	Submitter	`string`	Name of submitter

PairedAssembly Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`accession`	`accession`	Accession	`string`	The GenColl assembly accession of the GenBank or RefSeq assembly paired with this one	`GCF_000001405.40`
`status`	`status`	Status	`AssemblyStatus`	GenColl Assembly status from paired record	`current`
`annotationName`	`name`	Name	`string`	Annotation name from paired record
`onlyGenbank`	`only-genbank`	Only Genbank	`string`	Sequences that are only included in the GenBank assembly
`onlyRefseq`	`only-refseq`	Only RefSeq	`string`	Sequences that are only included in the RefSeq assembly
`changed`	`changed`	Changed	`string`	Sequences present on both the GenBank and the RefSeq assemblies that have been changed, e.g., contaminated sequence in the GenBank assembly has been replaced with a gap
`manualDiff`	`manual-diff`	Manual Diff	`string`	Additional details about sequence differences between the GenBank and RefSeq assemblies

TypeMaterial Structure

Field	Table Field Mnemonic	Table Column Name	Type	Description	Examples
`typeLabel`	`label`	Label	`string`
`typeDisplayText`	`display_text`	Display Text	`string`

WGSInfo Structure

Whole Genome Shotgun (WGS) projects are genome assemblies of incomplete genomes or incomplete chromosomes of prokaryotes or eukaryotes that are generally being sequenced by a whole genome shotgun strategy.

Field	Table Field Mnemonic	Table Column Name	Type	Examples
`wgsProjectAccession`	`project-accession`	project accession	`string`	`AAEX03` `CABHLF01`
`masterWgsUrl`	`url`	URL	`string`	`https://www.ncbi.nlm.nih.gov/nuccore/AAEX00000000.3`
`wgsContigsUrl`	`contigs-url`	contigs URL	`string`	`https://www.ncbi.nlm.nih.gov/Traces/wgs/AAEX03`

ANITypeCategory Enumeration

Name	Number	Description
`ANI_CATEGORY_UNKNOWN`	`0`
`claderef`	`1`
`category_na`	`2`
`neotype`	`3`
`no_type`	`4`
`pathovar`	`5`
`reftype`	`6`
`suspected_type`	`7`
`syntype`	`8`
`type`	`9`

AssemblyStatus Enumeration

Name	Number	Description
`ASSEMBLY_STATUS_UNKNOWN`	`0`
`current`	`1`
`previous`	`2`
`suppressed`	`3`
`retired`	`4`	This is deprecated - should no longer be seen in the data

AverageNucleotideIdentity.MatchStatus Enumeration

Name	Number	Description
`BEST_MATCH_STATUS_UNKNOWN`	`0`
`approved_mismatch`	`1`
`below_threshold_match`	`2`
`below_threshold_mismatch`	`3`
`best_match_status`	`4`
`derived_species_match`	`5`
`genus_match`	`6`
`low_coverage`	`7`
`mismatch`	`8`
`status_na`	`9`
`species_match`	`10`
`subspecies_match`	`11`
`synonym_match`	`12`
`lineage_match`	`13`
`below_threshold_lineage_match`	`14`

AverageNucleotideIdentity.TaxonomyCheckStatus Enumeration

Name	Number	Description
`TAXONOMY_CHECK_STATUS_UNKNOWN`	`0`
`OK`	`1`
`Failed`	`2`
`Inconclusive`	`3`

LinkedAssemblyType Enumeration

Name	Number	Description
`LINKED_ASSEMBLY_TYPE_UNKNOWN`	`0`
`alternate_pseudohaplotype_of_diploid`	`1`	SEQUI-5245
`principal_pseudohaplotype_of_diploid`	`2`
`maternal_haplotype_of_diploid`	`3`
`paternal_haplotype_of_diploid`	`4`
`haplotype_1`	`6`
`haplotype_2`	`7`
`haplotype_3`	`8`
`haplotype_4`	`9`
`haploid`	`10`	Catch all for any value that is not explicitly listed above

SourceDatabase Enumeration

Name	Number	Description
`SOURCE_DATABASE_UNSPECIFIED`	`0`
`SOURCE_DATABASE_GENBANK`	`1`
`SOURCE_DATABASE_REFSEQ`	`2`

Scalar Value Types

Protocol buffers type	Notes	C++	Python	Java	Go
`double`		`double`	`float`	`double`	`float64`
`float`		`float`	`float`	`float`	`float32`
`int32`	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	`int32`	`int`	`int`	`int32`
`int64`	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	`int64`	`int/long`	`long`	`int64`
`uint32`	Uses variable-length encoding.	`uint32`	`int/long`	`int`	`uint32`
`uint64`	Uses variable-length encoding.	`uint64`	`int/long`	`long`	`uint64`
`sint32`	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	`int32`	`int`	`int`	`int32`
`sint64`	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	`int64`	`int/long`	`long`	`int64`
`fixed32`	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	`uint32`	`int`	`int`	`uint32`
`fixed64`	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	`uint64`	`int/long`	`long`	`uint64`
`sfixed32`	Always four bytes.	`int32`	`int`	`int`	`int32`
`sfixed64`	Always eight bytes.	`int64`	`int/long`	`long`	`int64`
`bool`		`bool`	`bool`	`boolean`	`bool`
`string`	A string must always contain UTF-8 encoded or 7-bit ASCII text.	`string`	`str/unicode`	`String`	`string`
`bytes`	May contain any arbitrary sequence of bytes.	`string`	`str`	`ByteString`	`[]byte`

Generated May 21, 2024