A 3-Gene Random Forest Model to Diagnose Non-obstructive Azoospermia Based on Transcription Factor-Related Genes

Reprod Sci. 2023 Jan;30(1):233-246. doi: 10.1007/s43032-022-01008-8. Epub 2022 Jun 17.

Abstract

Non-obstructive azoospermia (NOA) is one of the most severe forms of male infertility, but diagnostic biomarkers with high sensitivity and specificity remain largely unknown. Transcription factors (TFs) play essential roles in the pathological processes of many diseases. Herein, we aimed to identify TFs with high diagnostic ability for NOA using machine learning algorithms. Transcriptome data of testicular tissue from 11 control and 47 NOA subjects served as the training dataset, and 1665 TFs were retrieved from the HumanTFDB. Through feature extraction methods, including genomic difference analysis, Lasso, Boruta, SVM-RFE, and logistic regression, ETV2, TBX2, and ZNF689 were ultimately selected and included in a random forest (RF) diagnostic model. The RF model displayed high predictive power in the training cohort (F-measure = 1) and in two external validation cohorts (n = 31, F-measure = 0.902; n = 20, F-measure = 0.941). Seminal plasma and testicular biopsy samples from 20 control subjects and 20 NOA patients were collected at the local hospital, and the expression levels of ETV2, TBX2, and ZNF689 were measured via RT-qPCR and immunohistochemistry. The RF model could also distinguish the NOA samples in the local cohort (F-measure = 0.741). Single-cell RNA sequencing analysis, based on 432 testicular cell samples from an NOA patient, showed that ETV2, TBX2, and ZNF689 were all significantly associated with spermatogenesis. In all, a 3-TF random forest diagnostic model was successfully established, providing novel insights into the latent mechanisms of NOA.
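The F-measure values reported above combine precision and recall into a single score. The following is a minimal sketch of that calculation, assuming the F-measure here is the standard F1 score (the abstract does not state the weighting). The function name `f_measure` and the example labels are hypothetical and are not data from the study.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)
    for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # With no true positives, false positives, or false negatives,
    # the score is undefined; return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels for illustration only (1 = NOA, 0 = control).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
score = f_measure(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

Under this definition, a classifier that labels every sample correctly scores 1.0, which is how the training-cohort value of F-measure = 1 would be read.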

Keywords: Diagnosis; Machine learning; Male infertility; Non-obstructive azoospermia; Random forest; Transcription factor.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Apoptosis Regulatory Proteins* / genetics
  • Azoospermia* / diagnosis
  • Azoospermia* / genetics
  • Azoospermia* / pathology
  • Humans
  • Male
  • Random Forest
  • T-Box Domain Proteins* / genetics
  • Testis / metabolism
  • Transcription Factors* / genetics

Substances

  • Apoptosis Regulatory Proteins
  • ETV2 protein, human
  • Transcription Factors
  • ZNF689 protein, human
  • T-Box Domain Protein 2
  • T-Box Domain Proteins

Supplementary concepts

  • Azoospermia, Nonobstructive