Epithelial-mesenchymal transition classification based on machine learning for predicting prognosis and treatment response in clear cell renal cell carcinoma
Original Article

Guangqiang Zhu1#, Ruipeng Tang2#, Tielong Tang1,3, Xupan Wei2, Chunlin Tan1,3

1Department of Clinical Medicine, North Sichuan Medical College, Nanchong, China; 2Department of Urology, Affiliated Hospital of Panzhihua University, Panzhihua, China; 3Department of Urology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China

Contributions: (I) Conception and design: G Zhu, R Tang; (II) Administrative support: T Tang, C Tan; (III) Provision of study materials or patients: X Wei; (IV) Collection and assembly of data: G Zhu; (V) Data analysis and interpretation: G Zhu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Chunlin Tan, MD, PhD. Department of Clinical Medicine, North Sichuan Medical College, No. 234, Fujiang Road, Shunqing District, Nanchong 637000, China; Department of Urology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China. Email: cbyxy2019@outlook.com.

Background: Renal cell carcinoma (RCC) is the third most common cancer in the genitourinary system. However, factors such as postoperative metastasis, recurrence, and advanced inoperable conditions contribute to its high mortality rate. Epithelial-mesenchymal transition (EMT) is the initial process that enables cells to metastasize. It is crucial for initiating and promoting both tumor cell invasion and metastasis. This study aims to construct a prognostic prediction model for clear cell renal cell carcinoma (ccRCC) patients using EMT-association genes (EAGs) based on public database data to improve the management of ccRCC.

Methods: EAGs were identified through clustering and differential expression analysis, and machine learning methods were used to construct a prognostic model. External dataset E-MTAB-1980 was used for model construction and validation. Tumor microenvironment scores, enrichment analysis, and drug sensitivity analysis were used to predict treatment efficacy in different risk score groups. Finally, Scissor analysis linked high- and low-risk patients to individual cells, and the regulatory relationships between high- and low-risk cells were further explored through the cell communication network.

Results: A final set of 12 EAGs was used for model construction. Risk scores showed statistically significant differences in different stages. Risk scores were independent prognostic factors for ccRCC patients. Significant differences were observed in the infiltration levels of various immune cells, expression levels of immune checkpoint genes, tumor mutation burden, and drug sensitivity between high- and low-risk groups. Validation tests demonstrated that our EAGs model showed good predictive performance in ccRCC. Cell communication analysis indicated that high-risk and low-risk cell subpopulations had different regulatory networks.

Conclusions: A new EAGs prognostic signature was constructed, which can be used to assess the prognosis and treatment response of ccRCC.

Keywords: Clear cell renal cell carcinoma (ccRCC); epithelial-mesenchymal transition (EMT); prognostic model; machine learning; single-cell


Submitted Feb 13, 2025. Accepted for publication May 17, 2025. Published online Jun 26, 2025.

doi: 10.21037/tau-2025-109


Highlight box

Key findings

• A prognostic signature of 12 epithelial-mesenchymal transition (EMT)-association genes (EAGs) can effectively stratify clear cell renal cell carcinoma (ccRCC) patients into high- and low-risk groups.

• Risk scores correlate with tumor stage, immune infiltration, immune checkpoint expression, and drug sensitivity.

• Scissor analysis revealed distinct cell communication networks between high- and low-risk cells.

What is known and what is new?

• EMT drives metastasis in ccRCC, but existing prognostic tools do not reveal intrinsic specificity.

• This study first conducted a cellular-level analysis of dynamic models centered on EMT; then, single-cell Scissor analysis revealed the intercellular interactions that EMT drives.

What is the implication, and what should change now?

• The EAGs signature enables precision risk stratification and predicts immunotherapy/chemotherapy response, supporting tailored treatment.

• Therapies targeting CCL-CCR signaling should be prioritized in high-risk patients. EMT-centric models should be incorporated into clinical trial designs.

• The model should be validated in a prospective study and risk scores should be included in the ccRCC clinical guidelines to guide adjuvant therapy decisions.


Introduction

Renal cell carcinoma (RCC) predominantly manifests as the clear cell variant. Due to its variable molecular characteristics, the mortality rate for RCC has remained relatively high and stable (1,2). While the pathological staging system serves as a crucial foundation for forecasting the prognosis of RCC in the medical field, patients at the same stage often have different outcomes (3), indicating that the existing staging system inadequately captures the heterogeneity of RCC. Consequently, the traditional tumor node metastasis (TNM) staging and pathological grading possess certain constraints in evaluating the prognosis of RCC. Therefore, investigating potential predictive and treatment-related biomarkers for RCC continues to be a significant challenge today (4-6).

Epithelial-mesenchymal transition (EMT) is a cellular biological process in which epithelial cells lose their polarity and intercellular connections, gaining the ability to migrate and invade like mesenchymal cells (7,8). Under pathological conditions, EMT assumes a crucial role in the invasion and metastasis of tumors (9,10). Through EMT, tumor cells can disassociate from the primary tumor site, infiltrate adjacent tissues, and disseminate to distant organs via the circulatory or lymphatic systems, establishing secondary tumors. This phenomenon increases the tumor’s malignancy and significantly complicates treatment, raising the risk of mortality for patients. Consequently, a more profound understanding of the molecular mechanisms underlying EMT can elucidate the biological foundations of tumor invasion and metastasis, providing a theoretical framework for the formulation of novel anti-tumor strategies. Furthermore, the identification and functional exploration of EMT-related molecules may yield new biomarkers for the early diagnosis and prognostic evaluation of tumors. Nevertheless, although numerous studies have uncovered the significant role of EMT in tumors (11-13), its precise regulatory mechanisms remain inadequately understood, and variations in its expression across diverse tumor types and individuals present challenges for research. Research has found that abnormal EMT signals are also associated with resistance to chemotherapy and immunotherapy in cancer (14-16), providing new insights into the connection between EMT and immune activation. In addition, EMT is also linked to immune cell infiltration, which influences how cancer patients respond to immune checkpoint inhibitors (17,18).

While EMT is known to play significant biological roles in various tumors, its specific function in ccRCC remains unclear. Recently, advancements in high-throughput data and machine learning have prompted researchers to use EMT gene signatures to develop new prognostic models. We hypothesize that EMT gene signatures can serve as important prognostic biomarkers for ccRCC patients, guiding individualized treatment. This study aims to create a robust prognostic risk model using ccRCC patient data from The Cancer Genome Atlas (TCGA) database, incorporating a comprehensive range of EMT genes to support precise treatment strategies for ccRCC. We present this article in accordance with the TRIPOD reporting checklist (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-109/rc).


Methods

Data sources

Messenger ribonucleic acid (mRNA) expression data and clinical information for ccRCC (72 normal tissues and 529 tumor tissues) were obtained from TCGA database (https://tcga-data.nci.nih.gov/tcga/). A total of 1,001 EMT-associated mRNA coding genes were obtained from the dbEMT2.0 database (http://dbemt.bioinfo-minzhao.org/). The external validation dataset was sourced from E-MTAB-1980 (https://ebi.ac.uk/arrayexpress/), which includes transcriptomic data and clinical information for 101 ccRCC samples. Somatic mutation data for TCGA patients were downloaded from UCSC XENA (https://xenabrowser.net/datapages/). Finally, the single-cell RNA sequencing (scRNA-seq) data of ccRCC was downloaded from GSE242299. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Construction of ccRCC molecular subtype for EMT genes

We used the “ConsensusClusterPlus” package in R to perform consensus clustering analysis, aiming to define molecular subtypes of ccRCC from the 1,001 EMT genes. After comparing the stages and overall survival (OS) of the subtypes, we conducted differential expression analysis to identify differentially expressed genes (DEGs) between them. DEGs between the two clusters were identified with the “DESeq2” package using the criteria |log fold change (FC)| >1 and false discovery rate (FDR) <0.05.
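
The DEG criteria above can be illustrated with a minimal sketch. It assumes each gene already has a log fold change and a raw p-value, and it applies the Benjamini-Hochberg procedure to obtain the FDR. The function names and the toy gene table are hypothetical, and the code does not reproduce the DESeq2 workflow used in the study.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.0,
    fdr_cutoff: float = 0.05,
) -> List[str]:
    """Keep genes with |log FC| > lfc_cutoff and FDR < fdr_cutoff.

    stats maps gene name -> (log fold change, raw p-value).
    """
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    return [
        g for g, q in zip(genes, fdr)
        if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff
    ]


# Hypothetical toy input, not data from the study.
toy_stats = {
    "GENE_A": (2.3, 0.0001),
    "GENE_B": (-1.8, 0.004),
    "GENE_C": (0.4, 0.0002),   # significant but small effect
    "GENE_D": (1.5, 0.30),     # large effect but not significant
}
toy_degs = select_degs(toy_stats)
```

Both thresholds must hold at once, so GENE_C and GENE_D are dropped in the toy table even though each passes one of the two criteria.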

Development of an EMT-association gene (EAG) signature using 10 machine learning algorithms

EAGs were obtained by intersecting the DEGs between the two subtypes with the 1,001 EMT genes. They were then passed to ten machine learning algorithms (19) to build and screen prognostic models and calculate risk scores: random survival forest, elastic net, Lasso, Ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox, supervised principal components (SuperPC), generalized boosted regression modeling, and survival support vector machine. The risk score for each patient was calculated as $\text{Risk score} = \sum_{i=1}^{n} \text{mRNA}_i \times \text{Coef}_{\text{mRNA}_i}$, where $\text{mRNA}_i$ is the expression level of the $i$-th model gene and $\text{Coef}_{\text{mRNA}_i}$ is its model coefficient. Prognostic grouping was based on the risk score, and principal component analysis (PCA) was performed using the R package “limma”.

Validation of the risk model across diverse clinical parameters

The patients were divided into high- and low-risk groups based on the median risk score, and the risk score distribution plot, survival status scatter plot, and model gene heatmap were plotted. Using the “survival” and “survminer” packages, Kaplan-Meier (K-M) survival analysis was performed on the high- and low-risk groups using the survival time and survival status of each sample. To further assess the accuracy of the risk score model, the “survival” and “timeROC” packages were used to plot receiver operating characteristic (ROC) curves and calculate the area under the curve (AUC). Finally, risk scores were compared across clinical stages.
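
As a rough illustration of the grouping and survival steps described above, the sketch below splits patients at the median risk score and computes a Kaplan-Meier estimate for one group. It is a plain-Python approximation, not the “survival”/“survminer” code used in the study. The function names and toy values are hypothetical, and assigning a score equal to the median to the low-risk group is an assumption.

```python
from statistics import median
from typing import List, Tuple


def split_by_median(scores: List[float]) -> List[str]:
    """Label each patient 'high' or 'low' relative to the median score.

    Assumption: a score exactly equal to the median is labeled 'low'.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data, not taken from the study.
toy_groups = split_by_median([0.2, 1.5, 0.9, 2.1])
toy_curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

In practice the estimator would be run once per risk group and the two curves compared with a significance test.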

Nomogram construction

A nomogram was constructed using the “rms” package in R by integrating the patient’s clinical data, including age, gender, stage, and risk score. The reliability of the nomogram was validated through a calibration curve.

Tumor microenvironment (TME) in subgroups

The proportion of tumor-infiltrating immune cells and the infiltration levels of individual immune cell types in the different risk groups were calculated using the ESTIMATE and single-sample gene set enrichment analysis (ssGSEA) methods. The expression differences of immune checkpoint genes between the two groups were also analyzed.

Tumor mutational burden (TMB) analysis

Somatic mutation data for ccRCC were downloaded from the Genomic Data Commons Data Portal, mutation burdens were compared between the risk groups, and a waterfall plot of the most frequently mutated genes was created.
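
The text does not state how mutation burden was computed. One common definition, assumed here, is the number of nonsynonymous somatic mutations per megabase of captured sequence. The sketch below uses that definition; the set of variant classes, the function name, and the capture size are illustrative assumptions rather than the study's settings.

```python
from typing import Dict, Iterable, Tuple

# Assumption: MAF-style variant classes counted as nonsynonymous.
NONSYNONYMOUS = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Nonstop_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "In_Frame_Del",
    "In_Frame_Ins",
    "Splice_Site",
    "Translation_Start_Site",
}


def tumor_mutational_burden(
    calls: Iterable[Tuple[str, str]],
    capture_size_mb: float,
) -> Dict[str, float]:
    """Return mutations per megabase for each sample.

    calls yields (sample_id, variant_classification) pairs.
    capture_size_mb is the sequenced region in megabases (an input the
    caller must supply; the study's value is not stated in the text).
    """
    counts: Dict[str, int] = {}
    for sample, variant_class in calls:
        # Register every sample, even one with only silent calls.
        counts.setdefault(sample, 0)
        if variant_class in NONSYNONYMOUS:
            counts[sample] += 1
    return {sample: n / capture_size_mb for sample, n in counts.items()}


# Hypothetical toy calls; a real exome capture spans tens of megabases.
toy_calls = [
    ("P1", "Missense_Mutation"),
    ("P1", "Silent"),
    ("P1", "Nonsense_Mutation"),
    ("P2", "Silent"),
    ("P2", "Frame_Shift_Del"),
]
toy_tmb = tumor_mutational_burden(toy_calls, capture_size_mb=2.0)
```

The per-sample values can then be compared between the high- and low-risk groups.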

Enrichment analysis

Functional information was analyzed using gene set variation analysis (GSVA) with MSigDB signature gene sets “Hallmark” (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp). Additionally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted to analyze the functional enrichment of DEGs between the two groups, aiming to understand the functions of the obtained genes and their roles in disease development. GO enrichment analysis primarily describes the biological processes (BP), cellular components (CC), and molecular functions (MF) associated with the annotated genes. KEGG pathway enrichment analysis focuses on revealing the biological pathways related to the annotated genes.

Drug sensitivity analysis

This study used the “oncoPredict” package (20) to estimate half-maximal inhibitory concentration (IC50) values for each drug and assess drug sensitivity across the risk subgroups.

Single-cell analysis

We downloaded the ccRCC scRNA-seq dataset GSE242299, from which 18 ccRCC tumor samples were included. First, we used the Seurat package for cell quality control and standardization. We then used the RunPCA function for dimensionality reduction and the FindNeighbors and FindClusters functions for cell clustering, followed by manual annotation of each cell cluster. The Scissor package was used to identify cell subpopulations linked to the model risk score groups by measuring the similarity between bulk and single-cell transcriptomic data. After optimizing a regression model to analyze the relationship between matrix phenotypes, we identified Scissor+ cells as associated with the high-risk subtype and Scissor- cells as associated with the low-risk subtype. Finally, we used the CellChat package to analyze intercellular communication. The “generate cell chat” function was used to create an intercellular communication network that details ligand-receptor interactions among the cell types, and the “plot cell chat” function was used to visualize this network.

Statistical analysis

RNA-seq data were stored in transcripts per million (TPM) format and converted to log2 values. All final analyses were performed using TPM data expressed as mean ± standard deviation (SD). Prognostic analysis included creating survival curves using the Kaplan-Meier method and assessing the significance of differences using the Mann-Whitney U test. We evaluated the predictive value of the risk score for prognosis by using receiver operating characteristic (ROC) curves (implemented in the R package “timeROC”) and determining the AUC values. To determine whether the model serves as an independent prognostic factor for ccRCC, we conducted multivariate Cox regression analysis. All statistical analyses and visualizations were performed using R (version 4.3.3), and P<0.05 was considered statistically significant.


Results

Identification of differentially expressed EAGs

Based on survival curves and pathological staging, TCGA-kidney renal clear cell carcinoma (KIRC) patients were clustered by the 1,001 EMT genes into two groups (Figure 1A) with different prognoses and pathological stages, with Group B having a worse prognosis and later pathological stage (Figure 1B-1H). We identified 2,044 DEGs between the two clusters and displayed them in a volcano plot (Figure 1I). Subsequently, intersecting the obtained DEGs with the EMT gene set yielded 86 EAGs associated with ccRCC prognosis (Figure 1J).

Figure 1 Different EMT subtypes distinguished by consensus clustering. (A) The consensus matrix when k=2. (B) Survival analysis of patients in distinct EMT clusters. (C) Proportion of T stage, (D) N stage, (E) M stage, (F) overall stage, (G) age and (H) gender in the two clusters. (I) Volcano plot of differentially expressed genes between two clusters. (J) Venn diagram of 2,044 differentially expressed genes and 1,001 EMT genes. DEG, differentially expressed gene; EMT, epithelial-mesenchymal transition; M, metastasis; N, node; T, tumor.

The optimal model from ten machine learning algorithms

We compared 101 different models generated by 10 machine learning algorithms using both the training and validation sets. The final model, based on the CoxBoost + Enet [alpha =0.3] algorithm, was a 12-gene prognostic signature (CDH13, PDGFD, TIMP3, ABCG2, ONECUT2, KCNN4, IL4, DLX4, SPDEF, L1CAM, KLF17, MAGEC2) (Figure 2A). CoxBoost and Elastic Net (Enet) are methods used for survival analysis and regression tasks. CoxBoost is based on the Cox proportional hazards model and is designed to model survival data. It combines traditional Cox regression with modern machine learning methods, effectively handling the high-dimensional feature data often encountered in genomics and clinical data analysis. Elastic Net is a regularization method suited to high-dimensional data, especially when many features are correlated. The combination of CoxBoost and Enet (with alpha set to 0.3) manages high-dimensional data while aiding in predicting and selecting key features. The coefficients for these genes were determined as follows: −0.04185341, −0.03924688, −0.06858814, −0.04465526, 0.14061576, 0.11482192, 0.92154913, 0.28208768, 0.13813681, 0.03066150, 0.31394598, and 0.22319052. Based on the risk scores, we divided the training and validation sets into high- and low-risk groups using the median value. Analysis using the “limma” and “ggplot2” packages revealed significant differences between the two groups in the PCA plot (Figure 2B). Survival analysis confirmed that the prognostic differences in both the training and validation sets were statistically significant (Figure 2C,2D).
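
To make the risk score concrete, the sketch below applies the linear formula from the Methods to the 12 reported genes. It assumes the coefficients are listed in the same order as the genes, which the text implies but does not state explicitly. The function name and the example expression profile are hypothetical.

```python
from typing import Dict

# Genes and coefficients as reported in the text.
# Assumption: the i-th coefficient belongs to the i-th gene in the list.
GENES = [
    "CDH13", "PDGFD", "TIMP3", "ABCG2", "ONECUT2", "KCNN4",
    "IL4", "DLX4", "SPDEF", "L1CAM", "KLF17", "MAGEC2",
]
COEFS = [
    -0.04185341, -0.03924688, -0.06858814, -0.04465526,
    0.14061576, 0.11482192, 0.92154913, 0.28208768,
    0.13813681, 0.03066150, 0.31394598, 0.22319052,
]
COEF_BY_GENE: Dict[str, float] = dict(zip(GENES, COEFS))


def risk_score(expression: Dict[str, float]) -> float:
    """Sum of expression x coefficient over the 12 model genes.

    expression maps gene symbol -> expression value on the scale used
    to fit the model; a missing model gene raises KeyError.
    """
    return sum(COEF_BY_GENE[g] * expression[g] for g in GENES)


# Hypothetical profile: only IL4 is expressed, at a value of 2.0.
toy_profile = {g: 0.0 for g in GENES}
toy_profile["IL4"] = 2.0
toy_score = risk_score(toy_profile)
```

Under this pairing, the first four genes carry negative coefficients and lower the score, while the remaining eight raise it, with IL4 weighted most heavily.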

Figure 2 EMT prognostic model built with machine learning. (A) Applied 10 different machine learning algorithms to construct 101 models in total and selected the CoxBoost + Enet (alpha =0.3) combination model. Risk models for 12 EAGs were obtained. (B) PCA plots showed significant differences between groups. Survival analysis verified that the prognostic differences in the (C) training group (TCGA) and (D) testing group (E-MTAB-1980). We plotted ROC curves at 1, 3 and 5 years (AUC =0.772, 0.753, 0.765) in the (E) training group and (F) testing group (AUC =0.795, 0.750, 0.784). Relevant line chart of risk score divides the sample into high and low risk groups based on the median risk score and arranges them from low to high in the (G) training group and (H) testing group. Scatter plot, scatter colors represent the survival status of the sample in the (I) training group and (J) testing group. Model gene heatmap, expression status of model genes in high- and low-risk groups in the (K) training group and (L) testing group. AUC, area under the curve; EAG, EMT-association gene; EMT, epithelial-mesenchymal transition; PCA, principal component analysis; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas.

The risk model has good predictive ability

To evaluate the model’s predictive capability, we used the “survminer” and “timeROC” R packages to plot ROC curves comparing the model’s performance in predicting 1-, 3-, and 5-year survival (Figure 2E,2F). The risk score distribution, survival status, and expression heatmap of the model genes are presented in Figure 2G-2L. We also conducted multivariate Cox regression analysis to assess whether the risk score was an independent prognostic factor (Figure 3A,3B). Together, the Cox regression analysis and ROC curves supported the risk score as an independent prognostic factor that predicts survival.
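
The idea behind an AUC at a fixed time point can be sketched as follows. Patients who died by the horizon are treated as cases and patients followed beyond it as controls. The AUC is then the fraction of case-control pairs in which the case has the higher risk score. This is a deliberately simplified version: patients censored before the horizon are dropped, whereas “timeROC” accounts for them through inverse probability of censoring weighting. The function name and toy values are hypothetical.

```python
from typing import List


def auc_at_horizon(
    scores: List[float],
    times: List[float],
    events: List[int],
    horizon: float,
) -> float:
    """Simplified AUC for predicting death by `horizon`.

    Cases: event observed at or before the horizon.
    Controls: follow-up extends beyond the horizon.
    Patients censored at or before the horizon are ignored
    (a simplification, see the lead-in).
    """
    cases = [s for s, t, e in zip(scores, times, events) if e == 1 and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count pairs where the case outranks the control; ties count half.
    concordant = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return concordant / (len(cases) * len(controls))


# Hypothetical toy cohort; times are in years.
toy_auc = auc_at_horizon(
    scores=[2.1, 1.7, 0.4, 1.9, 1.2, 0.3],
    times=[0.8, 2.5, 6.0, 4.0, 1.0, 7.2],
    events=[1, 1, 0, 1, 0, 0],
    horizon=3.0,
)
```

In the toy cohort, five of the six case-control pairs are ranked correctly, and the patient censored at 1.0 year contributes to neither group.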

Figure 3 A nomogram to expand the clinical application and usability of the constructed risk model. Multivariate Cox analyses were performed to compare the risk score with other clinical factors in the (A) training group and (B) testing group. (C) Nomogram based on gender, stage, age and risk score to predict the prognostic survival probability at 1, 3, and 5 years for ccRCC patients. (D) Calibration curves at 1, 3, and 5 years. AUC, area under the curve; ccRCC, clear cell renal cell carcinoma; CI, confidence interval; HR, hazard ratio; M, metastasis; N, node; T, tumor.

Nomogram enhanced clinical utility of the risk model

We constructed a nomogram with the “rms” R package that combines the risk score with clinical attributes. The nomogram is intended for personalized prognosis analysis, predicting the 1-, 3-, and 5-year survival probabilities of patients in the TCGA-KIRC cohort (Figure 3C,3D).

EAGs are involved in TME

We used ssGSEA to estimate immune cell infiltration levels, which differed significantly between the high- and low-risk groups (Figure 4A). The analysis covered various subtypes of T cells, monocytes, and macrophages. Notably, the low-risk group had higher infiltration levels of eosinophils, mast cells, and neutrophils, whereas the high-risk group showed elevated levels of multiple T cell, B cell, and macrophage subtypes. The ESTIMATE algorithm likewise indicated that the high-risk group had a higher immune score and lower tumor purity (Figure 4B-4E). These findings indicate that the high-risk population exhibits higher levels of immune infiltration than the low-risk population.

Figure 4 Correlation of EAGs signature with immune features. (A) ssGSEA analysis of ccRCC in high- and low-risk score groups. ns, no significance; *, P<0.05; **, P<0.01; ***, P<0.001. Significant differences by Wilcoxon rank sum test in (B) ImmuneScore, (C) StromalScore, (D) ESTIMATEScore and (E) TumorPurity between two groups. ccRCC, clear cell renal cell carcinoma; EAG, EMT-association gene; EMT, epithelial-mesenchymal transition; ssGSEA, single-sample gene set enrichment analysis.

Clinical staging and TMB are intricately linked to the risk score

We found a strong correlation between the risk score and both TNM staging and overall clinical staging (Figure 5A-5D). Specifically, a higher risk score usually indicates a higher stage of the disease. This not only reflects the severity of the condition but also offers valuable insights for clinical decision-making. We then analyzed mutations in the two TCGA risk groups. Using the “maftools” package, we generated waterfall plots and evaluated TMB (Figure 5E,5F). The high-risk group exhibited a higher TMB, indicating a more complex mutational landscape in these patients.

Figure 5 Correlation of EAGs signature with clinical features and TMB. Differences in risk scores between (A) T stages, (B) N stages, (C) M stages and (D) overall stages. (E) Waterfall plot of gene mutations in different risk score groups. (F) Total mutation burden among different risk score groups. EAG, EMT-association gene; EMT, epithelial-mesenchymal transition; M, metastasis; N, node; T, tumor; TMB, tumor mutational burden.

Enrichment analysis of model-associated genes

To further investigate the differences between the high-risk and low-risk subgroups, we used GSVA to examine changes in pathway activity. This identified 27 significantly enriched pathways among the 50 Hallmark pathways: IL6 JAK STAT3 signaling, inflammatory response, EMT, and allograft rejection were more enriched in the high-risk group, whereas pathways such as protein secretion, TGF-β signaling, Notch signaling, Hedgehog signaling, and Wnt/β-catenin signaling were predominantly enriched in the low-risk group (Figure 6A). Additionally, we performed GO and KEGG enrichment analyses of the DEGs between the high-risk and low-risk subgroups, with significance defined as FDR <0.05. The GO enrichment analysis highlighted terms including epidermis development, humoral immune response, cell fate commitment, visual perception, sensory perception of light stimulus, and antimicrobial humoral response (Figure 6B). The KEGG analysis highlighted pathways such as neuroactive ligand-receptor interaction, olfactory transduction, cytokine-cytokine receptor interaction, calcium signaling pathway, complement and coagulation cascades, and IL-17 signaling pathway (Figure 6C). Our findings further indicate the correlation between EMT and immune response.

Figure 6 Biological differences between two different risk score groups. (A) GSVA enrichment analysis in high- and low-risk groups. (B) GO analysis of DEGs between high- and low-risk groups. (C) KEGG analysis of DEGs between high- and low-risk groups. DEG, differentially expressed gene; GO, Gene Ontology; GSVA, gene set variation analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Drug sensitivity assessment

The Wilcoxon rank sum test showed that the expression of PD-1, CTLA4, LAG3, and TIGIT was higher in the high-risk group than in the low-risk group in ccRCC patients (P<0.001), providing further evidence that EAGs influence the prognosis of ccRCC by shaping the immune environment and immune checkpoints (Figure 7A-7D). We then used the GDSC database to explore potential therapeutic drugs for the high- and low-risk populations, evaluating the predicted response to 198 small-molecule inhibitors in the two groups; the notable drugs are listed as references for subsequent treatment responses. These small-molecule drugs provide new insights for the management of the different subgroups (Figure 7E-7M).

Figure 7 Prediction of treatment response between two different risk score groups. (A) PD-1 expression in high- and low-risk groups. (B) CTLA4 expression in high- and low-risk groups. (C) TIGIT expression in high- and low-risk groups. (D) LAG3 expression in high- and low-risk groups. (E-M) Drug sensitivity analysis in high- and low-risk groups. (E) Selumetinib, (F) afatinib, (G) camptothecin, (H) cisplatin, (I) ibrutinib, (J) erlotinib, (K) gefitinib, (L) dabrafenib, (M) nilotinib. **, P<0.01; ***, P<0.001.

Cell-cell interaction reveals regulatory relationships between different subpopulations of cells

Following quality control of ccRCC samples, we generated a Seurat object containing 19,365 genes and 35,746 cells and identified 11 cell populations after clustering and annotation (Figure 8A,8B). Then Scissor analysis identified 3,564 high-risk cells (Scissor+) and 3,277 low-risk cells (Scissor−). Compared to low-risk cells, high-risk cells contain more fibroblasts, endothelium2 (more like tumor-associated), and monocytes, while endothelium1 is more abundant in low-risk cells (Figure 8C,8D). Then, we used CellChat analysis to explore the variations in the intercellular communication patterns between high- and low-risk cells (Figure 9A-9C). This analysis demonstrated that the intercellular interactions between the low- and high-risk groups primarily took place between endothelium and immune cells. Further ligand-receptor profiling analysis revealed that the interactions of HLA-C-KIR2DL3, HLA-A-CD8A, and CD99-CD99 were significantly strengthened and exclusively found in the low-risk group. In contrast, the interactions specific to the high-risk group were CCL5-CCR1, CCL4-CCR5, and CCL3-CCR1 (Figure 9D,9E).
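
The comparison of cell-type composition between Scissor+ and Scissor− cells amounts to computing, within each Scissor label, the fraction of cells belonging to each annotated type. The sketch below shows that calculation on a hypothetical list of labeled cells. It does not perform the Scissor selection itself, and the function name and toy cells are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def composition(cells: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return the cell-type fractions within each Scissor label.

    cells yields (scissor_label, cell_type) pairs, one per selected cell.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for label, cell_type in cells:
        counts[label][cell_type] += 1
    return {
        label: {ct: n / sum(type_counts.values()) for ct, n in type_counts.items()}
        for label, type_counts in counts.items()
    }


# Hypothetical toy cells, not counts from the study.
toy_cells = [
    ("Scissor+", "fibroblast"),
    ("Scissor+", "fibroblast"),
    ("Scissor+", "monocyte"),
    ("Scissor+", "endothelium2"),
    ("Scissor-", "endothelium1"),
    ("Scissor-", "endothelium1"),
    ("Scissor-", "fibroblast"),
    ("Scissor-", "monocyte"),
]
toy_composition = composition(toy_cells)
```

Fractions are normalized within each label, so the two groups can be compared even when they contain different numbers of cells, as with the 3,564 Scissor+ and 3,277 Scissor− cells reported above.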

Figure 8 Molecular characteristics in Scissor-high and Scissor-low clusters. (A) UMAP visualization of 11 cell populations in ccRCC. (B) Dot plot of marker annotation for the cell populations. (C) UMAP visualization of the Scissor-selected cells, with the high- and low-risk dots representing Scissor+ and Scissor− cells, respectively. (D) Bar graph showing the proportion of cell types among Scissor+ and Scissor− cells. ccRCC, clear cell renal cell carcinoma; UMAP, uniform manifold approximation and projection.

Figure 9 Cell-cell communications in Scissor-high and Scissor-low clusters. (A) Circle plot of the significant connections between Scissor-high cells and other cell types. (B) Circle plot of the significant connections between Scissor-low cells and other cell types. (C) CellChat showing the overall signaling patterns in each cell. (D,E) Comparison of signal reception and transmission by Scissor+ and Scissor− cells within the tumor microenvironment.

Discussion

ccRCC represents a significant challenge in oncology, characterized by its heterogeneous nature and variable clinical outcomes. The disease is often diagnosed at advanced stages, leading to poor prognosis and limited treatment options (5). Molecular biology and bioinformatics have highlighted the role of EMT in tumor progression and metastasis (21), suggesting that EMT-related genes may serve as critical biomarkers for prognosis and therapeutic targets. In this study, we aimed to elucidate the prognostic significance of EAGs in ccRCC by employing a comprehensive bioinformatics approach. Utilizing mRNA expression data and clinical information from TCGA and external validation datasets, we performed consensus clustering to identify molecular subtypes of ccRCC based on 1,001 EMT genes. Through the staging and OS comparison of two subtypes, it was affirmed that EMT genes are intricately linked to tumor invasion and metastasis in ccRCC (12), suggesting that EMT genes may function as promising biomarkers for ccRCC. Thereafter, to refine the essential functional genes, we performed a screening of EMT core genes from the DEGs of the two subtypes, which were subsequently utilized to develop a prognostic model. In contrast to previous studies (22,23), we incorporated a more extensive array of EMT genes from the EMT database for the identification of EAGs and amalgamated it with the external dataset E-MTAB-1980 alongside 10 machine learning techniques to generate over 100 prognostic models. Our methodology included rigorous validation of the risk model through survival analysis, ROC curve assessments, and the construction of a nomogram to enhance clinical utility.

Alongside the clinical validation of the risk model, we examined the molecular distinctions between the two risk subgroups. ESTIMATE revealed notable disparities in immune scores and tumor purity between the high- and low-risk groups, which aligns with the tumor attributes of ccRCC exhibiting increased immune infiltration and diminished tumor purity (24). The ssGSEA immune cell infiltration analysis also revealed significant differences between the high- and low-risk groups, highlighting the critical role of the TME in ccRCC prognosis. Among the immune cell types examined, T cells and macrophages emerged as key players in the TME, with the high-risk group exhibiting higher infiltration levels than the low-risk group. Tumor-associated macrophages (TAMs) fulfill a vital function in the TME (25): by secreting various cytokines and chemokines, TAMs can prompt tumor cells to undergo EMT, thereby augmenting the migratory and invasive potential of tumor cells (26). Specific T cell subsets, such as CD8+ T cells and regulatory T cells (Tregs), can affect the EMT status of tumor cells through diverse mechanisms. The infiltration of CD8+ T cells is generally correlated with prognosis for tumors, whereas elevated levels of Tregs in certain tumors are associated with immune evasion and EMT (27). The presence of CXCL13+CD8+ T cells within tumors further suggests a poor prognosis for ccRCC patients (28). Moreover, we noted greater expression of immune checkpoints such as PD1, CTLA4, LAG3, and TIGIT in the high-risk group, which can facilitate immune evasion and resistance to immunosuppressive therapies in ccRCC (29-31). This further substantiates the critical role of EMT in immune evasion (32) and supports the ability of our EAGs prediction model to differentiate prognosis and immunotherapy response in ccRCC.

In our drug sensitivity evaluation, the high-risk group showed elevated IC50 values for vascular endothelial growth factor receptor tyrosine kinase inhibitors (VEGFR TKIs), further supporting the ability of our EAGs model to forecast treatment responses in ccRCC. EMT not only affects the infiltration patterns of immune cells but also fosters angiogenesis within the TME. Through EMT, tumor cells acquire mesenchymal characteristics that allow them to secrete a variety of pro-angiogenic factors, such as VEGF and matrix metalloproteinases (MMPs), which facilitate the formation of new blood vessels (33). Across diverse tumor types, the EMT process is closely linked to angiogenesis. EMT-related transcription factors (such as Snail and ZEB) can enhance the expression of VEGF. Furthermore, EMT affects the migration and proliferation of endothelial cells by modifying the matrix components of the TME, thereby promoting tumor angiogenesis (34,35). These observations are consistent with the known role of EMT in modulating the TME, as EMT-induced changes in tumor cell behavior can influence immune cell recruitment and function. The differential expression of immune checkpoint genes further supports the hypothesis that EMT-related mechanisms shape the immunological landscape of ccRCC and may thereby affect responses to immune-based therapies.
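A group-level difference in predicted IC50, such as the one described above, is commonly assessed with a rank-based comparison. The sketch below computes the Mann-Whitney U statistic and the corresponding probability that a randomly chosen high-risk value exceeds a randomly chosen low-risk value. The choice of test is an assumption, since this section does not state which test was used, and the IC50 values are made up for illustration.

```python
def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j; ties count as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def probability_of_superiority(x, y):
    """Share of cross-group pairs in which the x value is larger: U / (n_x * n_y)."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Made-up predicted IC50 values for a hypothetical VEGFR TKI (arbitrary units).
high_risk_ic50 = [5.1, 4.8, 6.0, 5.5]
low_risk_ic50 = [3.9, 4.2, 4.9, 3.5]

u = mann_whitney_u(high_risk_ic50, low_risk_ic50)
effect = probability_of_superiority(high_risk_ic50, low_risk_ic50)
print(u, effect)
```

A value of `effect` close to 1 means the high-risk group almost always has the larger IC50, that is, lower predicted drug sensitivity. A p-value would additionally require the null distribution of U, which this sketch omits.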

To investigate the regulatory differences between the risk groups, we used the Scissor algorithm to analyze ccRCC scRNA-seq data at the cellular level. The results show that the low-risk group has a more active immune response in intercellular communication, especially in the interactions between endothelial cells and immune cells. These stronger interactions, particularly the binding of HLA-C to KIR2DL3, suggest that the low-risk group has better immune recognition capabilities, which facilitate the identification and elimination of tumor cells. In contrast, intercellular communication in the high-risk group is mainly driven by chemokines, such as CCL5, CCL4, and CCL3, binding to their receptors. This pattern may indicate a strong inflammatory response in the TME of the high-risk group. However, this response does not appear to translate into a specific immune attack and may instead promote tumor infiltration and metastasis. Although the upregulation of these chemokines may recruit immune cells, it does not seem to activate an effective anti-tumor response, and may therefore worsen the malignant characteristics of the tumor.

In summary, we have developed an EAGs prediction model for ccRCC. In the future, this model could be integrated into clinical practice to help clinicians assess a patient's condition, predict tumor characteristics, and design personalized treatment plans. For example, owing to differences in gene expression patterns, some patients may respond more favorably to immunotherapy or targeted therapies, while others may benefit more from standard chemotherapy. The model could also help researchers screen and group patients for clinical trials, which may improve the precision of trial results, reduce side effects, and improve treatment success rates and quality of life.

However, this study has several limitations. First, although publicly available datasets enable large-scale analysis, our findings lack in vivo and in vitro experimental validation, which may compromise the reliability of the results. Second, although our model focuses on EMT-related genes, these genes may also affect the prognosis of ccRCC through other mechanisms, which complicates the interpretation of their roles. While our model shows promise in predicting prognosis and drug sensitivity in ccRCC, it should be interpreted with caution given the limitations of the data. Future research should aim to establish causal relationships by investigating the mechanisms linking EMT with immune infiltration and drug sensitivity. Finally, our nomogram primarily considered clinical features such as age, gender, and stage, and the absence of comprehensive treatment records and histological subtype data in the TCGA-KIRC dataset limited the robustness of model construction. Future research should integrate additional clinical features, including histological data and treatment history, to enhance the accuracy and clinical utility of prognostic models.


Conclusions

The EAGs prediction model we developed can effectively predict the prognosis and treatment responsiveness of ccRCC patients. The model's risk score is an independent predictor of prognosis in patients with ccRCC. The constructed nomogram can accurately predict the survival of patients with ccRCC, providing a prognostic reference. Additionally, we found significant differences in the infiltration levels of various immune cells between the high-risk and low-risk groups. The high-risk group has poorer immunotherapy efficacy and a higher likelihood of immune evasion.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-109/rc

Peer Review File: Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-109/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-109/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Moch H, Amin MB, Berney DM, et al. The 2022 World Health Organization Classification of Tumours of the Urinary System and Male Genital Organs-Part A: Renal, Penile, and Testicular Tumours. Eur Urol 2022;82:458-68.
  2. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin 2024;74:12-49.
  3. Motzer RJ, Jonasch E, Agarwal N, et al. NCCN Guidelines® Insights: Kidney Cancer, Version 2.2024. J Natl Compr Canc Netw 2024;22:4-16.
  4. Cheng L, Zhang S, MacLennan GT, et al. Molecular and cytogenetic insights into the pathogenesis, classification, differential diagnosis, and prognosis of renal epithelial neoplasms. Hum Pathol 2009;40:10-29.
  5. Dunnick NR. Renal cell carcinoma: staging and surveillance. Abdom Radiol (NY) 2016;41:1079-85.
  6. Warren AY, Harrison D. WHO/ISUP classification, grading and pathological staging of renal cell carcinoma: standards and controversies. World J Urol 2018;36:1913-26.
  7. Diepenbruck M, Christofori G. Epithelial-mesenchymal transition (EMT) and metastasis: yes, no, maybe? Curr Opin Cell Biol 2016;43:7-13.
  8. Jing Y, Han Z, Zhang S, et al. Epithelial-Mesenchymal Transition in tumor microenvironment. Cell Biosci 2011;1:29.
  9. Yuan J, Yang L, Zhang H, et al. Decoding tumor microenvironment: EMT modulation in breast cancer metastasis and therapeutic resistance, and implications of novel immune checkpoint blockers. Biomed Pharmacother 2024;181:117714.
  10. Zhao X, Ren T, Li S, et al. A new perspective on the therapeutic potential of tumor metastasis: targeting the metabolic interactions between TAMs and tumor cells. Int J Biol Sci 2024;20:5109-26.
  11. De Craene B, Berx G. Regulatory networks defining EMT during cancer initiation and progression. Nat Rev Cancer 2013;13:97-110.
  12. Mittal V. Epithelial Mesenchymal Transition in Tumor Metastasis. Annu Rev Pathol 2018;13:395-412.
  13. Yu C, Liu Q, Chen C, et al. Landscape perspectives of tumor, EMT, and development. Phys Biol 2019;16:051003.
  14. Li H, Batth IS, Qu X, et al. IGF-IR signaling in epithelial to mesenchymal transition and targeting IGF-IR therapy: overview and new insights. Mol Cancer 2017;16:6.
  15. Liaghat M, Ferdousmakan S, Mortazavi SH, et al. The impact of epithelial-mesenchymal transition (EMT) induced by metabolic processes and intracellular signaling pathways on chemo-resistance, metastasis, and recurrence in solid tumors. Cell Commun Signal 2024;22:575.
  16. Mak MP, Tong P, Diao L, et al. A Patient-Derived, Pan-Cancer EMT Signature Identifies Global Molecular Alterations and Immune Target Enrichment Following Epithelial-to-Mesenchymal Transition. Clin Cancer Res 2016;22:609-20.
  17. Friedman-DeLuca M, Karagiannis GS, Condeelis JS, et al. Macrophages in tumor cell migration and metastasis. Front Immunol 2024;15:1494462.
  18. Fuxe J, Karlsson MC. TGF-β-induced epithelial-mesenchymal transition: a link between cancer and inflammation. Semin Cancer Biol 2012;22:455-61.
  19. Liu H, Zhang W, Zhang Y, et al. Mime: A flexible machine-learning framework to construct and visualize models for clinical characteristics prediction and feature selection. Comput Struct Biotechnol J 2024;23:2798-810.
  20. Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform 2021;22:bbab260.
  21. Lu W, Kang Y. Epithelial-Mesenchymal Plasticity in Cancer Progression and Metastasis. Dev Cell 2019;49:361-74.
  22. Hu F, Zeng W, Liu X. A Gene Signature of Survival Prediction for Kidney Renal Cell Carcinoma by Multi-Omic Data Analysis. Int J Mol Sci 2019;20:5720.
  23. Wang C, He Z. Integrating bulk and single-cell RNA sequencing data reveals epithelial-mesenchymal transition molecular subtype and signature to predict prognosis, immunotherapy efficacy, and drug candidates in low-grade gliomas. Front Pharmacol 2023;14:1276466.
  24. Borcherding N, Vishwakarma A, Voigt AP, et al. Mapping the immune environment in clear cell renal carcinoma by single-cell genomics. Commun Biol 2021;4:122.
  25. Núñez SY, Trotta A, Regge MV, et al. Tumor-associated macrophages impair NK cell IFN-γ production and contribute to tumor progression in clear cell renal cell carcinoma. Eur J Immunol 2024;54:e2350878.
  26. Chu X, Tian Y, Lv C. Decoding the spatiotemporal heterogeneity of tumor-associated macrophages. Mol Cancer 2024;23:150.
  27. Taki M, Abiko K, Ukita M, et al. Tumor Immune Microenvironment during Epithelial-Mesenchymal Transition. Clin Cancer Res 2021;27:4669-79.
  28. Dai S, Zeng H, Liu Z, et al. Intratumoral CXCL13(+)CD8(+)T cell infiltration determines poor clinical outcomes and immunoevasive contexture in patients with clear cell renal cell carcinoma. J Immunother Cancer 2021;9:e001823.
  29. Andrews LP, Yano H, Vignali DAA. Inhibitory receptors and ligands beyond PD-1, PD-L1 and CTLA-4: breakthroughs or backups. Nat Immunol 2019;20:1425-34.
  30. Chauvin JM, Pagliano O, Fourcade J, et al. TIGIT and PD-1 impair tumor antigen-specific CD8+ T cells in melanoma patients. J Clin Invest 2015;125:2046-58.
  31. Fourcade J, Sun Z, Benallaoua M, et al. Upregulation of Tim-3 and PD-1 expression is associated with tumor antigen-specific CD8+ T cell dysfunction in melanoma patients. J Exp Med 2010;207:2175-86.
  32. Imodoye SO, Adedokun KA. EMT-induced immune evasion: connecting the dots from mechanisms to therapy. Clin Exp Med 2023;23:4265-87.
  33. Yan J, Gao Y, Lin S, et al. EGR1-CCL2 Feedback Loop Maintains Epithelial-Mesenchymal Transition of Cisplatin-Resistant Gastric Cancer Cells and Promotes Tumor Angiogenesis. Dig Dis Sci 2022;67:3702-13.
  34. Alard A, Katsara O, Rios-Fuller T, et al. Breast cancer cell mesenchymal transition and metastasis directed by DAP5/eIF3d-mediated selective mRNA translation. Cell Rep 2023;42:112646.
  35. Qian J, Tao D, Shan X, et al. Role of angiogenesis in beta-cell epithelial-mesenchymal transition in chronic pancreatitis-induced diabetes. Lab Invest 2022;102:290-7.
Cite this article as: Zhu G, Tang R, Tang T, Wei X, Tan C. Epithelial-mesenchymal transition classification based on machine learning for predicting prognosis and treatment response in clear cell renal cell carcinoma. Transl Androl Urol 2025;14(6):1742-1758. doi: 10.21037/tau-2025-109
