Displaying new non-NCBI molecules with annotations

Introduction

Sometimes when trying to display BAM or GFF3 files on non-NCBI molecules, users receive the following error message: "graphical view failed to retrieve sequence for id lcl".

De-novo data (sequences and annotations) are genomic molecules and assemblies without NCBI public accessions.

These may be data not yet submitted to NCBI (pre-submission) or private data (no submission planned). Such data can have identifiers that are not NCBI accessions. Very often, such identifiers start with the prefix 'lcl', and the sequences are referred to as 'local data'.

Genome Workbench organizes its data into projects. A project can be inspected using the Project Tree View. Almost no validation is performed when data is loaded into a project, so you have maximum flexibility in how you load it. For example, it is possible to first load GFF3 with annotations, then load BAM files, and then load FASTA with the reference molecules (for example, “lcl|chr1234”).

It is important that every project with loaded annotations or connected BAM files has access to all sequences that the sequence IDs in the annotations refer to. Genome Workbench automatically loads sequences from NCBI that are referenced by GenBank or RefSeq accessions, but it cannot do this for unsubmitted de-novo data. In that case, either load the sequences into the project from FASTA files or make them available via a BLAST database.

Let us discuss a few typical scenarios in which things may go wrong.

Typical Scenarios

The user imports a de-novo GFF3 file into a project and immediately tries to open a Graphical Sequence View to look at the molecule and its annotation. The Graphical Sequence View will not have access to the sequence content, only the annotation, and will fail with the error message.

How to fix:

  • Import a FASTA file containing the sequence into the same project as the GFF3 file
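Before importing, it can help to confirm that the FASTA file really contains every sequence the GFF3 file refers to. The following is a minimal sketch of such a check, not part of Genome Workbench. The function names are hypothetical, and treating "lcl|chr1" and "chr1" as the same ID is an assumption made for illustration; it is not a statement of how Genome Workbench matches IDs.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence IDs found on FASTA header lines."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])  # the ID is the first token after '>'
    return ids


def gff3_seqids(gff3_text):
    """Return the set of seqid values (column 1) of GFF3 feature lines."""
    ids = set()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def strip_local_prefix(seq_id):
    # Assumption for this sketch: "lcl|chr1" and "chr1" name the same molecule.
    return seq_id[4:] if seq_id.startswith("lcl|") else seq_id


def missing_sequences(gff3_text, fasta_text):
    """List GFF3 seqids that have no matching sequence in the FASTA text."""
    available = {strip_local_prefix(i) for i in fasta_ids(fasta_text)}
    return sorted(
        s for s in gff3_seqids(gff3_text)
        if strip_local_prefix(s) not in available
    )


# Made-up example data, for illustration only.
example_gff3 = (
    "##gff-version 3\n"
    "lcl|chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "chr2\tsrc\tgene\t5\t50\t.\t-\t.\tID=g2\n"
)
example_fasta = ">lcl|chr1 example contig\nACGT\n"

print(missing_sequences(example_gff3, example_fasta))
```

Any ID reported by `missing_sequences` would need a matching FASTA record in the same project before the Graphical Sequence View could display its annotation.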

The user imports a FASTA file, then imports a GFF3 file into the same project. The Graphical Sequence View will display the sequence and annotation. Then the user imports another GFF3 file, but accidentally chooses to add it to a different project. The Graphical Sequence View opened for the sequence in the first project will not show the newly imported track. If the user attempts to open a new Graphical Sequence View for the GFF3 in the second project, it will fail with the error message.

How to fix:

  • Move the second GFF3 file into the same project as the FASTA file, or
  • Import the FASTA file again into the second project (possible, but less optimal)

The user imports a de-novo BAM file into a project and attempts to open a Graphical Sequence View to look at it. The error “failed to retrieve sequence” is displayed.

How to fix:

  • Import a FASTA file with the sequences referred to in the BAM file.
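To find out which sequences a BAM file refers to, one option is to look at the reference names in its header. The sketch below assumes the header has already been exported as text (for example with `samtools view -H`, which prints the header of a BAM file) and compares the `@SQ` reference names against the IDs in a FASTA file. The function names are hypothetical, and the comparison is an exact string match, which is a simplification.

```python
def bam_reference_names(sam_header_text):
    """Return reference sequence names (SN tags) from @SQ header lines."""
    names = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fasta_record_ids(fasta_text):
    """Return the set of IDs (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def references_without_sequence(sam_header_text, fasta_text):
    """List BAM reference names that have no FASTA record (exact match)."""
    available = fasta_record_ids(fasta_text)
    return [n for n in bam_reference_names(sam_header_text) if n not in available]


# Made-up example data, for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr7\tLN:500\n"
)
example_reference_fasta = ">chr1 example contig\nACGT\n"

print(references_without_sequence(example_header, example_reference_fasta))
```

Each name listed by `references_without_sequence` points to a sequence that would still need to be imported from FASTA into the project.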

Conclusion

Genome Workbench requires that annotations and the sequences to which the annotations refer are imported into the same project in order to display the data in the Graphical Sequence View.

Why can’t Genome Workbench automatically find sequences from separate projects?

Sequence IDs are only known within a specific project, not across different projects.

In Genome Workbench, it is possible to have two different molecules that both use the same sequence ID, but only if these two variants are loaded into different projects. The user can then graphically compare the two different molecules, which is useful if these two molecules are variants of one another. The annotation for these two molecules will also use the same sequence IDs. Genome Workbench uses the project to determine which of these molecules the annotation refers to. Allowing annotation and sequence data to exist in separate projects would lead to inconsistency and potential conflicts in this situation. The current mechanism of using the project to set the scope for data resolution allows the user to do comparative visualization of de-novo molecules with the same sequence IDs without conflict.
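The idea of project-scoped ID resolution can be illustrated with a toy model. This is a conceptual sketch only: the `Project` class and its methods are hypothetical and do not reflect Genome Workbench's actual implementation or API. It shows two projects holding different molecules under the same ID, with each annotation resolved against the sequences of its own project.

```python
class Project:
    """Toy model: a project owns its sequences and its annotations."""

    def __init__(self, name):
        self.name = name
        self.sequences = {}    # sequence ID -> sequence string
        self.annotations = []  # list of (sequence ID, feature name)

    def add_sequence(self, seq_id, sequence):
        self.sequences[seq_id] = sequence

    def add_annotation(self, seq_id, feature):
        # No validation at load time: the sequence may be added later.
        self.annotations.append((seq_id, feature))

    def resolve(self, seq_id):
        """Look up a sequence by ID within this project only."""
        if seq_id not in self.sequences:
            raise LookupError(f"no sequence for id {seq_id} in project {self.name}")
        return self.sequences[seq_id]


# Two variants of a molecule share one ID but live in separate projects.
project_a = Project("variant A")
project_a.add_sequence("lcl|chr1", "ACGTACGT")
project_a.add_annotation("lcl|chr1", "gene1")

project_b = Project("variant B")
project_b.add_sequence("lcl|chr1", "ACGTTCGT")
project_b.add_annotation("lcl|chr1", "gene1")

# Each annotation resolves unambiguously inside its own project.
for project in (project_a, project_b):
    for seq_id, feature in project.annotations:
        print(project.name, feature, project.resolve(seq_id))
```

If lookups searched across all projects instead, the ID "lcl|chr1" in this example would match two different molecules, and there would be no way to tell which one an annotation refers to.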

For more information, please refer to the Working with Non-Public Data tutorial.

Current Version is 3.7.1 (released October 13, 2021)

Last updated: 2019-12-17T16:00:57Z