Abstract
Advancements in single-cell transcriptomics have enabled unprecedented resolution in profiling cellular heterogeneity across tissues and disease states. In this study, we present a comprehensive integration of single-cell RNA sequencing (scRNA-seq) and machine learning (ML) methodologies to construct multi-scale, high-resolution cell atlases and develop robust models for gene prioritization and cell classification. Specifically, we report two major contributions: the generation of a large-scale mouse organ and tissue atlas and the construction of the first complete single-cell transcriptomic atlas of the human retina derived from fresh tissue samples of Chinese individuals. These atlases provide a foundation for understanding organ-specific and retinal cellular complexity, as well as inter-individual variability, with implications for both basic biology and translational medicine.
The mouse tissue atlas encompasses scRNA-seq data from 15 organ systems, allowing for hierarchical annotation of cellular composition across diverse tissue environments. By establishing a consistent taxonomy of organ-, tissue-, and cell-type-specific profiles, we enable comparative analysis of lineage relationships and functional specializations across organs. This resource provides critical insight into tissue-resident immune populations, stromal architecture, and inter-organ transcriptional conservation, offering a reference framework for studies in developmental biology, aging, and disease modeling.
In parallel, we constructed a single-cell transcriptomic atlas of the human retina based on fresh enucleated or postmortem samples collected within a short ischemic window, specifically from ethnically Chinese donors. This is the first study to produce a high-resolution reference of retinal cell types and subtypes specific to the Chinese population, filling a critical gap in retinal transcriptomic resources, which have historically been dominated by non-Asian populations. This atlas comprises detailed annotations of major retinal cell classes—including photoreceptors, bipolar cells, horizontal cells, Müller glia, amacrine cells, microglia, and retinal ganglion cells—as well as finer subpopulations. The use of fresh tissue preserved the transcriptomic integrity of labile cell types and enabled the identification of novel markers that would otherwise be degraded in frozen or delayed-processing samples.
To interpret these rich datasets and prioritize functionally relevant features, we integrated supervised machine learning algorithms—including logistic regression, random forests, and artificial neural networks—for feature selection and classification tasks. Using recursive feature elimination (RFE) and cross-validation strategies, we identified stable gene markers capable of discriminating between cell types with high specificity. Notably, we focused on the classification of retinal cell types under both healthy and diabetic conditions, with an emphasis on diabetic retinopathy (DR). Differential expression analysis combined with ML-based gene prioritization revealed robust signatures distinguishing healthy and diabetic retinal states. These included upregulation of genes associated with oxidative stress, vascular dysfunction, and immune activation, as well as the downregulation of genes implicated in neuroprotection and synaptic signaling. These findings shed light on critical pathophysiological processes driving diabetic retinal degeneration.
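As an illustration only, the recursive feature elimination step named above can be sketched in plain Python. This is not the thesis implementation: the function names (`make_toy_expression`, `fit_logistic`, `rfe`), the synthetic "expression" matrix, and all hyperparameters are assumptions made for the example, and the cross-validation step is omitted for brevity. The sketch fits a logistic regression, drops the gene with the smallest absolute weight, and repeats until the requested number of genes remains.

```python
import math
import random


def make_toy_expression(n_cells=200, n_genes=6, seed=0):
    """Build a synthetic cells-by-genes matrix (hypothetical data).

    Genes 0 and 1 are shifted by class label; the rest are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        shift = 2.0 if label else -2.0
        row[0] += shift
        row[1] += shift
        X.append(row)
        y.append(label)
    return X, y


def fit_logistic(X, y, epochs=100, lr=0.5):
    """Fit binary logistic regression by batch gradient descent.

    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def rfe(X, y, n_keep):
    """Refit on the surviving genes and drop the one with the smallest
    absolute weight, until only n_keep gene indices remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = fit_logistic(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


if __name__ == "__main__":
    X, y = make_toy_expression()
    print("retained gene indices:", rfe(X, y, n_keep=2))
```

On this toy matrix the two class-shifted genes should be the ones retained. In practice the elimination would typically be wrapped in cross-validation, so that only genes selected consistently across folds are reported as stable markers.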
Additionally, we trained and evaluated multi-class classification models capable of assigning cell type identity with high accuracy across both the mouse and human datasets. The models demonstrated strong generalizability and scalability, offering a computationally efficient and biologically informative approach to annotate large-scale single-cell datasets. This classification framework has potential applications in automated cell annotation, disease subtype classification, and patient stratification in future clinical and research settings.
Despite these contributions, several limitations warrant consideration. The human retina atlas, while high in quality and novel in its ethnic focus, draws on a small number of donors with limited demographic diversity, so future work should expand it across broader populations and disease stages. The technical limitations of scRNA-seq, including gene dropout and incomplete transcript detection, make it difficult to capture the full molecular complexity of rare cell populations and lowly expressed genes. Furthermore, while the ML models achieved high accuracy, external validation across independent datasets and spatial contexts is required to fully assess their robustness. Critically, no immunohistochemistry (IHC) validation was performed in this study, which limits the current conclusions to transcriptomic inference; future studies will incorporate IHC and spatial transcriptomics to validate key markers at the protein level and within tissue architecture.
In conclusion, this work represents a substantial step forward in the construction of single-cell reference atlases and the application of machine learning to unravel tissue- and disease-specific transcriptional programs. The generation of both a mouse tissue atlas and a human retina atlas based on Chinese fresh samples provides critical biological insights and fills a major gap in ethnic representation in single-cell studies. The integration of ML techniques enhances the interpretability and predictive power of scRNA-seq data, facilitating the discovery of diagnostic markers and therapeutic targets. Taken together, our framework serves as a scalable and generalizable platform for studying cellular diversity, disease mechanisms, and precision medicine applications in both tissue and organoid systems.
| Date of Award | 15 Mar 2026 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Weihua Meng (Supervisor) & Richard Rankin (Supervisor) |
Free Keywords
- Single-cell RNA sequencing
- Mouse tissue and organ system
- Chinese human retina
- Machine learning classification
- Cell-type-specific biomarkers
- Diabetic retinopathy
- Cell atlas