Proteogenomics Integrating Novel Junction Peptide Identification Strategy Discovers Three Novel Protein Isoforms of Human NHSL1 and EEF1B2

J Proteome Res. 2021 Dec 3;20(12):5294-5303. doi: 10.1021/acs.jproteome.1c00373. Epub 2021 Aug 21.

Abstract

In eukaryotes, alternative pre-mRNA splicing allows a single gene to encode different protein isoforms, which function in many biological processes and can serve as biomarkers or therapeutic targets for disease. Although protein isoforms in the human genome are well annotated, we speculate that some low-abundance protein isoforms may still be under-annotated, because most genes have a primary coding product and alternative protein isoforms tend to be expressed at lower levels. A peptide co-encoded by a novel exon and an annotated exon that are separated by an intron is known as a novel junction peptide. In the absence of known transcripts and homologous proteins, traditional proteogenomics based on whole-genome six-frame translation cannot identify novel junction peptides, nor can it capture novel alternative splice sites. In this article, we first propose a strategy and tool for identifying novel junction peptides, called CJunction. We then integrate it into a proteogenomics workflow specifically designed for novel protein isoform discovery and apply the workflow to a deep-coverage HeLa mass spectrometry data set (ProteomeXchange identifier PXD004452). We identified and validated three novel protein isoforms of two functionally important genes, NHSL1 (causative gene of Nance-Horan syndrome) and EEF1B2 (translation elongation factor), which supports our hypothesis. These novel protein isoforms differ significantly in sequence from the annotated gene-coding products because of their novel N-termini, suggesting that they may have substantially different functions.
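The limitation of six-frame translation described above can be illustrated with a minimal Python sketch. It is a toy example only and does not reproduce CJunction: the sequences NOVEL_EXON, INTRON and ANNOTATED_EXON are invented for illustration, and the helper names are hypothetical. The sketch translates the unspliced genomic sequence in all six frames and checks that a peptide spanning the exon-exon junction is absent, while peptides lying entirely within one exon are present.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invented toy sequences (assumptions, not taken from the paper).
NOVEL_EXON = "ATGGCTGAACTGAAA"      # translates to M A E L K
INTRON = "GTAAGTCTCTCTTTCAG"        # starts with GT, ends with AG
ANNOTATED_EXON = "CGTTGGGATCCGTAA"  # translates to R W D P *

GENOME = NOVEL_EXON + INTRON + ANNOTATED_EXON  # unspliced genomic DNA
SPLICED = NOVEL_EXON + ANNOTATED_EXON          # transcript after splicing


def translate(dna):
    """Translate DNA codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(dna):
    """Translate three forward and three reverse-complement frames."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def found_in(peptide, frames):
    """True if the peptide occurs in any translated frame."""
    return any(peptide in frame for frame in frames)


FRAMES = six_frame(GENOME)
JUNCTION_PEPTIDE = "ELKRWD"  # spans the novel exon / annotated exon boundary

if __name__ == "__main__":
    print("spliced protein:", translate(SPLICED))
    for name in ("MAELK", "RWDP", JUNCTION_PEPTIDE):
        print(name, "in six-frame translation:", found_in(name, FRAMES))
```

In this toy case the two within-exon peptides are recovered from the genomic six-frame translation, but the junction peptide is not, because the codons that encode it are never contiguous in unspliced DNA. Recovering it requires a search space that joins candidate exon boundaries, which is the gap the abstract says CJunction is designed to fill.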

Keywords: EEF1B2; NHSL1; alternative splicing; junction peptide; protein isoform; proteogenomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing*
  • Genome, Human
  • Guanine Nucleotide Exchange Factors / genetics*
  • Guanine Nucleotide Exchange Factors / metabolism
  • Humans
  • Mass Spectrometry
  • Peptide Elongation Factor 1 / genetics*
  • Peptide Elongation Factor 1 / metabolism
  • Peptides / chemistry
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • Proteins* / genetics
  • Proteins* / metabolism
  • Proteogenomics* / methods

Substances

  • Guanine Nucleotide Exchange Factors
  • NHSL1 protein, human
  • Peptide Elongation Factor 1
  • Peptides
  • Protein Isoforms
  • Proteins
  • eEF1B-beta protein, human