Assembly Versioning and Status


Assembly Accessions and Versions

Assembly accession numbers are unique identifiers for the collection of sequence records that make up an individual genome assembly. The format for assembly accessions is as follows:

  • GenBank (primary) assembly accessions: [GCA][ _ ][nine digits][.][version number]

  • RefSeq (NCBI-derived) assembly accessions: [GCF][ _ ][nine digits][.][version number]
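To make the format concrete, here is a minimal Python sketch that validates an accession and splits it into its prefix, nine-digit identifier, and version. It assumes the version is one or more decimal digits. The names `parse_accession` and `AssemblyAccession` are hypothetical and not part of any NCBI library, and the accessions used below are made up for illustration.

```python
import re
from typing import NamedTuple

# The documented format: [GCA|GCF][_][nine digits][.][version number].
# Assumption: the version number is one or more decimal digits.
ACCESSION_RE = re.compile(r"(GCA|GCF)_([0-9]{9})\.([0-9]+)")

# Prefix-to-source mapping from the format description above.
SOURCE_BY_PREFIX = {"GCA": "GenBank", "GCF": "RefSeq"}


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCA" (GenBank) or "GCF" (RefSeq)
    digits: str   # nine-digit identifier, kept as a string to preserve leading zeros
    version: int  # the number after the "."

    @property
    def source(self) -> str:
        return SOURCE_BY_PREFIX[self.prefix]

    def __str__(self) -> str:
        return f"{self.prefix}_{self.digits}.{self.version}"


def parse_accession(text: str) -> AssemblyAccession:
    """Split an assembly accession into its parts; raise ValueError if it does not match."""
    match = ACCESSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {text!r}")
    prefix, digits, version = match.groups()
    return AssemblyAccession(prefix, digits, int(version))
```

For example, `parse_accession("GCA_123456789.2").source` returns `"GenBank"`, while an identifier with fewer than nine digits raises `ValueError`. Keeping the digits as a string avoids losing leading zeros such as those in `000000001`.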

Assemblies receive a new “.version” when a GenBank submitter or RefSeq staff update the underlying sequences. Previously, chromosome or other sequence names could change without an assembly version change; since 2024, changes to these names always result in a new assembly version.
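Because an update produces a new version of the same accession, two versions can be compared by their numeric suffix. The following is a minimal Python sketch assuming both inputs follow the format above; `split_version` and `newer_version` are hypothetical helper names. It treats a GCA and a GCF accession as different assemblies, even when their nine digits match.

```python
import re

# Hypothetical helper; assumes the documented [GCA|GCF]_[nine digits].[version] format.
_ACCESSION_RE = re.compile(r"(GCA|GCF)_([0-9]{9})\.([0-9]+)")


def split_version(accession: str) -> tuple[str, int]:
    """Return (accession without its version suffix, version number)."""
    match = _ACCESSION_RE.fullmatch(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, digits, version = match.groups()
    return f"{prefix}_{digits}", int(version)


def newer_version(a: str, b: str) -> str:
    """Return whichever of two versions of the same assembly has the higher version number."""
    base_a, ver_a = split_version(a)
    base_b, ver_b = split_version(b)
    if base_a != base_b:
        raise ValueError(f"{a} and {b} are not versions of the same assembly")
    # Compare numerically: version 10 must rank above version 9,
    # which a plain string comparison of "10" and "9" gets wrong.
    return a if ver_a >= ver_b else b
```

For instance, `newer_version("GCA_123456789.9", "GCA_123456789.10")` returns `"GCA_123456789.10"`.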

Assembly Status

All assembly accessions are assigned one of the following three statuses:

Latest—The current version of the assembly.

Replaced—The assembly was replaced by a newer version. The sequence, annotation and metadata files are still accessible but are “frozen” at the time the status changed from “latest” to “replaced.” Data for replaced assemblies can be retrieved by specific request.

Suppressed—Suppressed data are data that were previously public, have been removed from the NCBI text-based search and comparative analysis results, and may be accessed only by accession number. Assemblies are suppressed for various reasons:

  • GenBank (GCA) assemblies are suppressed because the underlying sequences that make up the assembly are suppressed. If a GenBank assembly is suppressed, the paired RefSeq assembly is also suppressed. If a RefSeq assembly is suppressed, the paired GenBank assembly remains live unless the submitter approves its removal. For more information, please refer to the Data Status Definitions on the “NLM GenBank and SRA Data Processing” page.

  • RefSeq (GCF) eukaryotic assemblies are suppressed when a new assembly is annotated for the organism, or because of quality issues. For more information, please refer to the NCBI Eukaryotic Genome Annotation Pipeline page.

  • RefSeq (GCF) prokaryotic assemblies may be suppressed due to sequence, annotation quality, or metadata issues. For more information, please refer to the NCBI Prokaryotic Genome Annotation Pipeline page.

  • RefSeq (GCF) virus assemblies are suppressed when a higher-quality assembly is identified for the species.
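The pairing rules in the GenBank (GCA) bullet can be expressed as a small state model. This is a minimal Python sketch, not NCBI code: `AssemblyPair`, `suppress_genbank`, and `suppress_refseq` are hypothetical names. Treating a GenBank assembly whose removal the submitter approves as Suppressed is an assumption; the rule above says only that it would no longer remain live.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LATEST = "latest"
    REPLACED = "replaced"
    SUPPRESSED = "suppressed"


@dataclass
class AssemblyPair:
    """Hypothetical model of a GenBank (GCA) assembly and its paired RefSeq (GCF) assembly."""
    genbank: Status
    refseq: Status


def suppress_genbank(pair: AssemblyPair) -> None:
    # Suppressing the GenBank assembly also suppresses its paired RefSeq assembly.
    pair.genbank = Status.SUPPRESSED
    pair.refseq = Status.SUPPRESSED


def suppress_refseq(pair: AssemblyPair, submitter_approves_removal: bool = False) -> None:
    # Suppressing the RefSeq assembly leaves the GenBank assembly live
    # unless the submitter approves its removal.
    pair.refseq = Status.SUPPRESSED
    if submitter_approves_removal:
        # Assumption: removal from GenBank is modeled as Suppressed.
        pair.genbank = Status.SUPPRESSED
```

The asymmetry is the point of the sketch: suppression propagates from GenBank to RefSeq automatically, but in the other direction only with the submitter's approval.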

Note: You can still find suppressed assemblies by searching using the accession.
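The change from Latest to Replaced described under Assembly Status can be sketched as an update to a per-accession version table. In this minimal Python sketch, the `versions` table and `release_new_version` are hypothetical. It assumes a version that is already Suppressed keeps that status when a newer version is released, which the definitions above do not state.

```python
from enum import Enum


class Status(Enum):
    LATEST = "latest"
    REPLACED = "replaced"
    SUPPRESSED = "suppressed"


def release_new_version(versions: dict[int, Status], new_version: int) -> None:
    """Record a new version in a hypothetical table of version number -> status.

    The table covers the versions (.1, .2, ...) of a single assembly accession.
    """
    if versions and new_version <= max(versions):
        raise ValueError("a new version must have a higher version number")
    for number, status in list(versions.items()):
        # The previous Latest version becomes Replaced. Per the definition above,
        # its files are frozen at this point (freezing is not modeled here).
        # Assumption: versions that are already Suppressed keep that status.
        if status is Status.LATEST:
            versions[number] = Status.REPLACED
    versions[new_version] = Status.LATEST
```

Starting from `{1: Status.LATEST}`, releasing version 2 leaves `{1: Status.REPLACED, 2: Status.LATEST}`.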

Generated May 21, 2024