Identification of M2 macrophage-related biomarkers for a predictive model of interstitial fibrosis and tubular atrophy after kidney transplantation by machine learning algorithms
Original Article

Kaifeng Mao1,2#, Xiang Xu2#, Fenwang Lin1, Yige Pan3, Zhenquan Lu2, Bingfeng Luo2, Yifei Zhu2, Zhenda Li4, Junsheng Ye1

1Department of Kidney Transplantation, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua Medicine, Tsinghua University, Beijing, China; 2Division of Urology, Department of Surgery, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China; 3Department of Nursing, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China; 4Department of Thoracic Surgery, the University of Hong Kong-Shenzhen Hospital, Shenzhen, China

Contributions: (I) Conception and design: K Mao, J Ye; (II) Administrative support: J Ye; (III) Provision of study materials or patients: K Mao, X Xu, F Lin, Y Pan; (IV) Collection and assembly of data: K Mao, Y Pan, Z Lu, B Luo, Y Zhu; (V) Data analysis and interpretation: F Lin, Z Li; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Junsheng Ye, MD, PhD. Department of Kidney Transplantation, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua Medicine, Tsinghua University, No. 168 Litang Road, Changping District, Beijing 102218, China. Email: yejunsh@126.com.

Background: Interstitial fibrosis and tubular atrophy (IFTA) represent significant histopathological manifestations contributing to long-term kidney allograft failure after transplantation. Identifying M2 macrophage (Mφ2)-related biomarkers could enhance early diagnosis and prognosis prediction, improving patient outcomes. This study aimed to explore Mφ2-related biomarkers for IFTA via bioinformatics and machine learning approaches.

Methods: RNA sequencing (RNA-seq) data from the GSE98320 dataset were analyzed to identify differentially expressed genes (DEGs). Immune cell profiling with the CIBERSORT algorithm and weighted gene co-expression network analysis (WGCNA) were performed to identify Mφ2-related gene modules. Three machine learning algorithms were applied to identify hub genes. A nomogram model was developed and validated using multiple external datasets. Consensus clustering was employed to stratify patients into high-risk and low-risk groups based on hub gene expression.

Results: We obtained three hub genes (ALOX5, ARL4C, and MS4A6A) significantly associated with IFTA. The nomogram model demonstrated robust discriminatory power with an area under the curve (AUC) of 0.738 in the training cohort and 0.78–0.88 in external validation cohorts. Consensus clustering stratified patients into high-risk (cluster 1) and low-risk (cluster 2) groups, with elevated hub gene expression correlating with accelerated graft loss (P<0.001). Functional enrichment analysis revealed immune dysregulation and activation of fibrosis-related pathways in the high-risk group.

Conclusions: Our findings uncovered novel Mφ2-related biomarkers for IFTA, offering diagnostic, prognostic, and therapeutic targets to improve kidney allograft outcomes. This study highlighted the potential of integrating bioinformatics and machine learning approaches to advance personalized medicine in kidney transplantation.

Keywords: Interstitial fibrosis and tubular atrophy (IFTA); M2 macrophage (Mφ2); machine learning; diagnosis; kidney transplantation


Submitted Mar 12, 2025. Accepted for publication Jun 18, 2025. Published online Jul 28, 2025.

doi: 10.21037/tau-2025-198


Highlight box

Key findings

• This study identified ALOX5, ARL4C, and MS4A6A as M2 macrophage (Mφ2)-associated hub genes driving interstitial fibrosis and tubular atrophy (IFTA) in kidney transplants using bioinformatics and machine learning.

• We developed a diagnostic nomogram model with strong predictive accuracy for IFTA (area under the curve: 0.738 in training cohort; 0.78–0.88 in external validation cohorts).

• Consensus clustering stratified patients into high-risk (cluster 1) and low-risk (cluster 2) groups, with higher hub gene expression linked to faster graft loss (P<0.001).

• High-risk patients showed dysregulation of TNFα/NF-κB and TGF-β signaling, while low-risk patients displayed metabolic pathway activation.

What is known and what is new?

• Chronic Mφ2 polarization promotes IFTA fibrosis, but reliable biomarkers for early diagnosis and risk assessment are lacking. Current IFTA diagnosis relies on biopsy, which is limited by inter-observer variability and insensitivity to early fibrotic changes.

• This study uses transcriptomics and machine learning to uncover Mφ2 heterogeneity in human kidney allografts. Our multi-cohort validation bridges computational discovery with clinical prognosis.

What is the implication, and what should change now?

• The identified hub genes (ALOX5, ARL4C, and MS4A6A) offer potential as biomarkers for IFTA diagnosis, prognosis, and therapy, improving kidney transplant outcomes.


Introduction

Globally, solid organ transplantation remains a critical therapeutic intervention for end-stage organ failure, with kidney transplants accounting for around two-thirds of all procedures (1,2). Despite advancements in immunosuppressive regimens and post-transplant management, long-term graft outcomes remain suboptimal, with approximately 40% of kidney allografts failing within a decade (3). Chronic allograft injury, primarily mediated by alloimmune responses, typically manifests histologically as interstitial fibrosis and tubular atrophy (IFTA) and glomerulosclerosis (4-6). These pathological changes are the primary drivers of progressive graft dysfunction and remain the leading cause of long-term graft loss (2,7). While the immune-mediated mechanisms underlying chronic graft injury have been extensively studied, there is an urgent need to elucidate the molecular pathways driving irreversible fibrosis to develop more effective prognostic and therapeutic strategies.

The plasticity of macrophages (Mφ) within the tissue microenvironment plays a critical role in the pathogenesis of fibrosis (8,9). These innate immune cells adopt context-dependent polarization states, either classically activated [M1 macrophage (Mφ1); pro-inflammatory] or alternatively activated [M2 macrophage (Mφ2); resolution/repair phenotypes], through differential cytokine stimulation (8,10). While Mφ1 polarization is induced by interferon-γ (IFN-γ) and lipopolysaccharide, Mφ2 differentiation is primarily driven by interleukin-4 (IL-4) and IL-13 (11). Although transient Mφ2 activation is beneficial for tissue repair, persistent Mφ2 polarization in kidney allografts contributes to maladaptive fibrotic processes (12). Dysregulated Mφ2 secretes elevated levels of TGF-β, which promotes extracellular matrix (ECM) deposition through SMAD3-dependent pathways and facilitates macrophage-to-myofibroblast transition (MMT), a key event in fibrogenesis (13). Emerging evidence further implicates the ATF6/TGF-β/SMAD3 and JAK/STAT6 pathways in the progression of MMT (14). Despite these insights, a comprehensive characterization of fibrosis-associated Mφ2 subpopulations and their transcriptional signatures in human kidney allografts remains limited, a critical gap this study aims to address.

Current clinical diagnosis of IFTA relies heavily on invasive pathological biopsy, but inconsistent physician evaluations, ambiguous tissue patterns, and unreliable guidelines limit its accuracy (7,15,16). Objective molecular measurements could supplement traditional methods, while advancements in gene analysis tools and artificial intelligence (AI) are transforming disease marker discovery (17). Advances in high-throughput sequencing and computational approaches, such as machine learning, now enable the systematic identification of diagnostic gene signatures across various pathologies (18). Among these methods, weighted gene co-expression network analysis (WGCNA) has emerged as a powerful tool for constructing scale-free gene networks that preserve continuous expression relationships (19). By correlating gene modules with clinical phenotypes, WGCNA identifies hub genes that are mechanistically linked to disease progression (19). When combined with advanced machine learning techniques such as least absolute shrinkage and selection operator (LASSO) regression and random forest (RF) classifiers, WGCNA enhances the precision and generalizability of biomarker discovery in high-dimensional datasets (20).

In this study, we utilized the GSE98320 dataset, obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), to identify potential Mφ2-related differentially expressed genes (DEGs) through differential expression analysis and WGCNA. Using three machine learning algorithms [LASSO, eXtreme Gradient Boosting (XGBoost), and RF], we identified three key Mφ2-associated hub biomarkers—ALOX5, ARL4C, and MS4A6A—as central drivers of IFTA. A diagnostic model based on these genes demonstrated robust predictive performance [area under the curve (AUC): 0.738 in the derivation cohort; 0.78–0.88 in external validation cohorts], establishing them as reliable biomarkers for kidney deterioration. Prognostically, consensus clustering stratified kidney transplant recipients into high-risk (cluster 1) and low-risk (cluster 2) groups, with elevated hub gene expression levels strongly correlating with accelerated graft loss (P<0.001). Functional enrichment analysis revealed significant immune dysregulation, particularly involving TNFα/NF-κB and TGF-β signaling pathways, in high-risk patients. Our findings uncover novel Mφ2-related biomarkers in post-transplant fibrosis, providing valuable diagnostic, prognostic, and therapeutic targets to mitigate IFTA progression and enhance graft survival. We present this article in accordance with the TRIPOD reporting checklist (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-198/rc).


Methods

Data collection and identification of DEGs

Figure 1 presents the workflow of this study. RNA sequencing (RNA-seq) datasets were retrieved from the NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/). Five distinct datasets were utilized: GSE98320, GSE76882, GSE22459, GSE65326, and GSE21374. Detailed information regarding these datasets is provided in Table 1. Differential expression analysis was performed on the GSE98320 dataset to compare the non-IFTA and IFTA groups using the “limma” package in R (21). DEGs were defined by |log fold change (FC)| >0.5 and an adjusted P value <0.05. DEGs were visualized with volcano plots and heatmaps, constructed using the “ggplot2” and “pheatmap” packages in R, respectively. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
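The DEG thresholds described above can be illustrated with a minimal sketch. It assumes that the adjusted P value is a Benjamini-Hochberg false discovery rate (the default adjustment in limma, although the method is not stated here), and it takes log fold changes and raw P values as given rather than reproducing limma's moderated statistics. The function names and the toy input are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, fc_cutoff=0.5, alpha=0.05):
    """Split genes into up- and downregulated DEGs using the stated cutoffs."""
    adjusted = benjamini_hochberg(p_values)
    up, down = [], []
    for gene, lfc, q in zip(genes, log_fc, adjusted):
        if q < alpha and abs(lfc) > fc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical toy input, not values from GSE98320.
genes = ["g1", "g2", "g3", "g4"]
log_fc = [1.2, -0.8, 0.3, 0.9]
p_values = [0.001, 0.004, 0.002, 0.40]
up, down = select_degs(genes, log_fc, p_values)
print("up:", up, "down:", down)
```

In this toy input, g3 passes the P value filter but fails the fold-change cutoff, and g4 fails the P value filter, which shows that both criteria must hold at once.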

Figure 1 The research flowchart of this study. DEGs, differentially expressed genes; GO, gene ontology; GSVA, gene set variation analysis; GSEA, gene set enrichment analysis; IFTA, interstitial fibrosis and tubular atrophy; KEGG, Kyoto Encyclopedia of Genes and Genomes; KM, Kaplan-Meier; LASSO, least absolute shrinkage and selection operator; Mφ2, M2 macrophage; RF, random forest; ROC, receiver operating characteristic; WGCNA, weighted gene co-expression network analysis; XGBoost, eXtreme Gradient Boosting.

Table 1

Details of the GEO datasets in the study

GSE series | Platform | Organism | Source type | Sample size | Used for
GSE98320 | GPL15207 | Homo sapiens | Kidney transplant biopsies | 274 non-IFTA vs. 145 IFTA | (I) Getting the DEGs; (II) getting Mφ2-related genes; (III) identifying the hub genes by machine learning algorithms; (IV) construction of the IFTA diagnostic model
GSE76882 | GPL13158 | Homo sapiens | Kidney transplant biopsies | 99 non-IFTA vs. 135 IFTA | Validating IFTA diagnostic model
GSE22459 | GPL570 | Homo sapiens | Kidney transplant biopsies | 25 non-IFTA vs. 40 IFTA | Validating IFTA diagnostic model
GSE65326 | GPL10558 | Homo sapiens | Kidney transplant biopsies | 6 non-IFTA vs. 16 IFTA | Validating IFTA diagnostic model
GSE21374 | GPL570 | Homo sapiens | Kidney transplant biopsies | 282 samples | Investigating the prognostic implications of three hub genes in kidney transplantation

DEGs, differentially expressed genes; GEO, Gene Expression Omnibus; IFTA, interstitial fibrosis and tubular atrophy; Mφ2, M2 macrophage.

Immune cell profiling using CIBERSORT algorithm

The CIBERSORT algorithm was employed to quantify immune cell composition within the microenvironment. Using the LM22 leukocyte gene signature matrix, which distinguishes 22 human hematopoietic cell subtypes (including T cells, B cells, plasma cells, NK cells, and myeloid subsets), we estimated the relative abundance of these immune cell types. Specifically, Mφ composition was estimated from the gene expression matrix of the GSE98320 dataset.

WGCNA and integration with DEGs

The WGCNA R package was utilized to construct weighted gene co-expression networks (19). Initially, we applied hierarchical clustering analysis to exclude outlier samples. The optimal soft threshold β was determined using the “pickSoftThreshold” function in the WGCNA package, facilitating the construction of an adjacency matrix. This matrix was subsequently converted into a topological overlap matrix (TOM). Average linkage hierarchical clustering was performed based on TOM measures to group genes with comparable expression profiles into distinct modules. Lastly, we evaluated the relationships among these modules, macrophage subtypes, and clinical characteristics. To identify genes associated with Mφ2, the DEGs were intersected with the critical module genes identified via WGCNA, yielding a set of common genes potentially linked to Mφ2.
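The two matrix steps named above (soft-thresholded adjacency, then topological overlap) can be sketched as follows. This is an illustration in plain Python rather than the WGCNA package itself, and it assumes an unsigned network, a choice the text does not specify. The helper names and toy profiles are hypothetical, and β=10 is taken from the value reported in the Results text.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta, with a zero diagonal."""
    n = len(profiles)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return a


def topological_overlap(a):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(a)
    k = [sum(row) for row in a]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a[i][u] * a[u][j] for u in range(n) if u != i and u != j)
            tom[i][j] = tom[j][i] = (shared + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return tom


# Hypothetical expression profiles (one list per gene, one value per sample).
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
tom = topological_overlap(adjacency(profiles, beta=10))
dissimilarity = [[1 - v for v in row] for row in tom]  # input to hierarchical clustering
```

Raising correlations to the power β suppresses weak correlations while preserving strong ones, which is why the first two toy genes (perfectly correlated) retain a topological overlap close to 1 while the third is nearly disconnected.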

Identification of hub genes by machine learning

To identify hub genes, we combined three machine learning algorithms. First, LASSO logistic regression was implemented using the “glmnet” package in R to select potential hub genes from the common gene pool (22). The regularization parameter (λ) was optimized through 10-fold cross-validation, with “lambda.min” chosen as the optimal value. Regression coefficients were visualized using path diagrams and cross-validation curves. Next, the RF algorithm was applied via the “randomForest” package. Gene importance was quantified using the Mean Decrease Gini, which measures the total reduction in node impurity attributable to each feature. Additionally, the XGBoost algorithm, a gradient boosting method widely used for classification tasks, was employed to rank features by importance using the “xgboost” package in R (18). By intersecting the top 5 genes identified by XGBoost, the top 5 genes from the RF algorithm, and the genes retained by LASSO regression, we ultimately identified three hub genes. Requiring agreement across the three methods is intended to reduce method-specific selection bias and improve the reliability of hub gene identification.
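The final intersection step can be sketched as below, assuming each tree-based model yields a per-gene importance score and LASSO yields a list of retained genes. The function names, gene labels, and scores are hypothetical placeholders, not output from the models fitted in this study.

```python
def top_n(importance, n=5):
    """Return the n gene names with the largest importance scores."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def intersect_hub_genes(lasso_selected, rf_importance, xgb_importance, n=5):
    """Genes retained by LASSO that are also in the top n of both RF and XGBoost."""
    rf_top = set(top_n(rf_importance, n))
    xgb_top = set(top_n(xgb_importance, n))
    return sorted(set(lasso_selected) & rf_top & xgb_top)


# Hypothetical placeholder genes and scores for illustration only.
lasso_selected = ["G1", "G2", "G3", "G4", "G5"]
rf_importance = {"G1": 9.0, "G2": 8.0, "G3": 7.5, "G6": 6.0, "G7": 5.0, "G4": 1.0}
xgb_importance = {"G1": 0.30, "G2": 0.25, "G3": 0.20, "G5": 0.10, "G8": 0.08, "G6": 0.01}
hubs = intersect_hub_genes(lasso_selected, rf_importance, xgb_importance)
print(hubs)
```

Only genes supported by all three methods survive, so a gene ranked highly by a single algorithm (such as G6 or G8 in the toy input) is excluded.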

Functional and pathway enrichment analysis of DEGs

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted to explore the functional roles and pathways of DEGs in the GSE98320 dataset, using the “clusterProfiler” package (23). A significance threshold of P<0.05 was applied. Gene set variation analysis (GSVA) was performed using the R package “GSVA” and the H: hallmark gene sets (h.all.v2023.1.Hs.symbols) from the Molecular Signatures Database (MSigDB; http://www.gsea-msigdb.org/gsea/msigdb/index.jsp) (24), to identify pathway differences between two cluster subtypes in GSE21374. Results were considered significant at false discovery rate (FDR) <0.05. Single-gene gene set enrichment analysis (GSEA) of three hub genes was conducted using the “GSEABase” package to elucidate their biological functions in IFTA.

Identification of molecular subtype

To elucidate the molecular mechanisms of IFTA, we applied consensus clustering, a robust unsupervised method for identifying subgroups, to the GSE21374 dataset using the R package “ConsensusClusterPlus”. The optimal number of clusters was determined through consensus matrix plots, cumulative distribution function (CDF) plots, and relative changes in the area under the CDF curve. Principal component analysis (PCA) was subsequently conducted to assess inter-cluster variability. Furthermore, expression patterns of the three hub genes were visualized using violin plots to compare the clusters.

Analysis of immune cell infiltration

CIBERSORT, a deconvolution algorithm, was used to quantify 22 immune cell types in the GSE21374 dataset, and the infiltration profiles of the two subtypes defined by unsupervised clustering were compared and visualized using the R package “ggpubr”. To validate these findings, single-sample GSEA (ssGSEA) was performed on four subgroups to assess immune function enrichment.

Development and validation of an IFTA predictive nomogram model

We constructed a nomogram model using three hub genes from the GSE98320 dataset, employing the “rms” package in R to predict IFTA incidence. Calibration curves were generated to assess the model’s accuracy, while receiver operating characteristic (ROC) curves evaluated its efficacy. External validation was performed using the GSE76882, GSE22459, and GSE65326 datasets. Kaplan-Meier (KM) survival curves were used to analyze survival differences among kidney transplantation patient subtypes in the GSE21374 dataset.
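As an illustration of how a three-gene logistic model yields a risk score and how ROC discrimination is summarized, the sketch below computes a predicted probability and the AUC. It assumes the nomogram is backed by a standard logistic regression; the intercept, coefficients, and scores are hypothetical placeholders rather than the fitted model, and the function names are invented for this example.

```python
import math


def predict_risk(expression, intercept, coefficients):
    """Logistic model: probability = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, expression))
    return 1.0 / (1.0 + math.exp(-linear))


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5, which matches the area under the empirical ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and IFTA labels (1 = IFTA, 0 = non-IFTA).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the values reported for the training and validation cohorts.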

Statistical analysis

Statistical analyses were conducted using R software (version 4.2.2). Group differences were evaluated using Student’s t-test for normally distributed variables and the Mann-Whitney U test for non-normally distributed data. Survival outcomes were analyzed using KM survival curves, with comparisons performed via the log-rank test using the “survminer” R package. Correlation analysis was conducted using the Pearson correlation test. All statistical tests were two-sided, and a P value <0.05 was considered statistically significant.
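For readers unfamiliar with the KM estimator behind the survival curves, a minimal sketch is given below. It implements the standard product-limit formula S(t) = Π (1 − d_i/n_i) over event times and is not the R code used in this study. The function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    events: 1 = graft loss observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        losses = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            losses += data[i][1]
            leaving += 1
            i += 1
        if losses:
            survival *= 1 - losses / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


# Hypothetical follow-up times and graft-loss indicators.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored grafts contribute to the risk set until their last follow-up but never lower the curve themselves, which is what allows groups with unequal follow-up to be compared.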


Results

Identification of DEGs

The research flowchart for this study is illustrated in Figure 1. We obtained the GSE98320 dataset from the GEO database, which included a total of 419 samples: 145 from patients with IFTA and 274 from the non-IFTA group. Differential gene expression analysis identified 197 DEGs, including 139 upregulated genes and 58 downregulated genes in the IFTA group (Figure 2A,2B; available online: https://cdn.amegroups.cn/static/public/tau-2025-198-1.xlsx).

Figure 2 Identification of DEGs and key Mφ2-related gene modules by WGCNA analysis. (A) Volcano plot illustrating DEGs between non-IFTA and IFTA groups in the GSE98320 dataset. (B) Heatmap displaying the expression patterns of DEGs. (C) Scale independence and mean connectivity analyses for selecting the optimal soft threshold power (β=9). (D) Hierarchical clustering dendrogram of co-expressed genes, with modules represented by distinct colors. (E) Correlations between module eigengenes, Mφ phenotypes, and clinical outcomes. Rows denote modules, while columns represent macrophage subtypes and diagnoses. Correlation coefficients (red for positive, green for negative) and P values (in parentheses) are displayed. (F) Scatter plots of module membership versus gene significance for Mφ2 in blue and turquoise modules. Genes within the green box (module membership >0.8, gene significance >0.3) are prioritized as hub candidates. DEGs, differentially expressed genes; FC, fold change; IFTA, interstitial fibrosis and tubular atrophy; Mφ2, M2 macrophage; WGCNA, weighted gene co-expression network analysis.

Estimation of the Mφ infiltration level in kidney fibrosis

The infiltration levels of Mφ across all samples in the GSE98320 dataset were quantified by applying the CIBERSORT algorithm to the gene expression matrix. The clinical features and the proportions of the three Mφ phenotypes were then integrated for WGCNA (available online: https://cdn.amegroups.cn/static/public/tau-2025-198-2.xlsx).

Identification of key Mφ2-related biomarker modules using WGCNA

Using WGCNA, we identified critical gene modules correlated with Mφ. After preprocessing the GSE98320 dataset to eliminate duplicates and missing values, the top 25% of genes were selected for WGCNA. The optimal soft threshold β of 10 was determined based on achieving a scale-free R² of 0.8 (Figure 2C). With β=10 and a minimum module size of 100, six gene co-expression modules were established using average-linkage hierarchical clustering and dynamic tree cutting (Figure 2D). The turquoise module showed the strongest negative correlation with Mφ2 infiltration levels (R=−0.42, P=2e−19), while the blue module demonstrated the strongest positive correlation (R=0.28, P=9e−9) (Figure 2E). Both modules were also significantly associated with IFTA, suggesting that they may be involved in the Mφ2-related transition from non-IFTA to IFTA. They were thus designated as Mφ2-related modules for further investigation. To identify hub genes in these modules, we calculated module membership (MM) and gene significance (GS) for each gene and focused on genes strongly correlated with Mφ2 (MM >0.8, GS >0.3). Within both modules, MM correlated with GS for Mφ2 (blue module: correlation =0.51, P=2.4e−52; turquoise module: correlation =0.33, P=8.3e−42) (Figure 2F). A total of 24 key Mφ2-related biomarkers were identified through “network screening” based on GS and MM, marking them as potential biomarkers in the context of Mφ2-related gene modules (Table S1).
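The MM and GS screening can be sketched as follows, assuming the conventional WGCNA definitions: MM is the correlation between a gene's expression and the module eigengene, and GS is the correlation between a gene's expression and the trait (here, the Mφ2 fraction). Taking absolute values is an assumption, since the text does not state whether signed values were used. The eigengene is supplied as an input rather than computed, and all names and numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_hub_candidates(expression, eigengene, trait, mm_cutoff=0.8, gs_cutoff=0.3):
    """Keep genes whose |MM| and |GS| both exceed the stated cutoffs."""
    hits = []
    for gene, profile in expression.items():
        mm = abs(pearson(profile, eigengene))  # module membership
        gs = abs(pearson(profile, trait))      # gene significance for the trait
        if mm > mm_cutoff and gs > gs_cutoff:
            hits.append(gene)
    return hits


# Hypothetical module eigengene, Mφ2 fractions, and gene profiles (five samples).
eigengene = [1, 2, 3, 4, 5]
m2_fraction = [0.1, 0.3, 0.2, 0.5, 0.4]
expression = {"gA": [2, 4, 6, 8, 10], "gB": [5, 1, 4, 2, 3]}
print(screen_hub_candidates(expression, eigengene, m2_fraction))
```

In the toy input, gA tracks the eigengene closely and correlates with the Mφ2 fraction, so it passes both cutoffs, whereas gB is only weakly related to the eigengene and is dropped.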

Gene co-expression and functional analysis in IFTA

The overlap of DEGs and Mφ2-related biomarkers was visualized in a Venn diagram, yielding 12 common genes (Figure 3A). To elucidate their roles in IFTA, we conducted GO and KEGG analyses. Biological process (BP) results revealed significant enrichment in TNF signaling, B cell activation, and humoral immune response (Figure 3B). Cellular component (CC) results showed significant enrichment in the IPAF inflammasome complex, NLRP3 inflammasome complex, and trans-Golgi network (Figure 3C). Molecular function (MF) results highlighted modulation of cysteine endopeptidase and arachidonate metabolism (Figure 3D). KEGG analysis indicated that the common genes were most enriched in the efferocytosis, necroptosis, and NOD-like receptor signaling pathways (Figure 3E).

Figure 3 Functional enrichment analysis based on common genes. (A) Venn diagram intersecting DEGs and key Mφ2-related modules, identifying 12 common genes. (B-D) GO analysis based on common genes: (B) BP, (C) CC, and (D) MF. (E) KEGG analysis based on the common genes. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; Mφ2, M2 macrophage; MF, molecular function; WGCNA, weighted gene co-expression network analysis.

Identification of the diagnostic markers for kidney fibrosis via machine learning

Using three machine learning algorithms, we identified potential biomarkers for kidney fibrosis. LASSO regression identified 5 genes (Figure 4A,4B, Table S2). XGBoost ranked the common genes by their importance (Figure 4C, Table S3). The RF algorithm quantified gene importance via Mean Decrease Gini scores (Figure 4D,4E, Table S4). Intersecting the top 5 genes from each machine learning method yielded 3 hub genes: ALOX5, ARL4C, and MS4A6A (Figure 4F).

Figure 4 Identification of kidney fibrosis diagnostic markers via machine learning. (A) LASSO coefficient trajectory. (B) Lambda selection via minimum (left dashed line) and 1 − SE (right dashed line) criteria. (C) XGBoost-derived gene importance ranking. (D) RF structure. (E) Mean Decrease Gini scores from RF. (F) Venn diagram overlapping top 5 genes from each machine learning method. LASSO, least absolute shrinkage and selection operator; RF, random forest; SE, standard error; XGBoost, eXtreme Gradient Boosting.

Expression levels and correlation analysis of the hub genes

The differential expression of ALOX5, ARL4C, and MS4A6A between the non-IFTA and IFTA groups in the GSE98320 dataset is shown in Figure 5A. The expression of these 3 hub genes was elevated in IFTA (P<0.05). Consistent expression trends in the GSE76882 dataset reinforced these findings (Figure 5B). Furthermore, correlation analysis among the hub genes revealed strong positive correlations between ARL4C and ALOX5 (r=0.818, P<0.001), ALOX5 and MS4A6A (r=0.816, P<0.001), and ARL4C and MS4A6A (r=0.755, P<0.001), underscoring their shared characteristics in IFTA (Figure 5C-5E).

Figure 5 Expression levels and correlation analysis of the hub genes. Violin plots showing differential expression of the three hub genes in the GSE98320 (A) and GSE76882 (B) datasets. (C-E) Pairwise correlations visualized through scatter plots: ARL4C vs. ALOX5 (C), ALOX5 vs. MS4A6A (D), ARL4C vs. MS4A6A (E). Statistical thresholds: ***, P<0.001. RIF, renal interstitial fibrosis.

Establishment and validation of the IFTA diagnostic model

In the GSE98320 dataset, the three hub genes were used to construct an IFTA diagnostic model via logistic regression. This model was then represented graphically as a nomogram integrating the three predictors of IFTA occurrence (Figure 6A). Calibration curves showed good agreement between observed and predicted IFTA incidence (Figure 6B). With an AUC of 0.738, the model demonstrated adequate discriminative power (Figure 6C). Individual analyses revealed AUC values of 0.720 for ALOX5, 0.730 for ARL4C, and 0.700 for MS4A6A (Figure 6D-6F). External validation on GSE22459, GSE65326, and GSE76882 supported the model’s accuracy, with AUCs of 0.782, 0.833, and 0.884, respectively (Figure 6G-6I).

Figure 6 Development and validation of the IFTA diagnostic model. (A) Nomogram for IFTA occurrence risk assessment. (B) Calibration curve. (C) ROC curve for training set (GSE98320). (D-F) ROC curves for hub genes: (D) ALOX5, (E) ARL4C, (F) MS4A6A. (G-I) ROC curves for external sets: (G) GSE22459, (H) GSE65326, (I) GSE76882. For ROC curves (C-I), the indicated thresholds represent optimal cutoffs, with corresponding sensitivity and specificity values reflecting model performance at these practical decision points. AUC, area under the curve; IFTA, interstitial fibrosis and tubular atrophy; ROC, receiver operating characteristic.

The role of three hub genes in kidney transplant prognosis

To investigate the prognostic implications of the three hub genes in kidney transplantation, we applied consensus clustering to the GSE21374 dataset. Clustering was most stable with two clusters, as supported by the CDF and CDF delta curves (Figure 7A,7B). The 282 kidney transplant samples were distinctly partitioned into cluster 1 (n=118) and cluster 2 (n=164), as shown in the consensus matrix plot (Figure 7C) (available online: https://cdn.amegroups.cn/static/public/tau-2025-198-3.xlsx). The PCA plot delineated clear distinctions between the clusters (Figure 7D). The expression patterns of hub genes were visualized using violin plots (Figure 7E). Furthermore, we compared their expression levels between non-rejection and rejection groups, as well as between survival and loss groups (Figure S1). Comparative KM survival curves revealed that cluster 1 had a higher rate of kidney graft loss over time post-transplantation (Figure 7F). Notably, cluster 1 had a significantly greater proportion of rejection episodes (P<0.001) (Figure 7G). Moreover, cluster 1 showed a notably elevated incidence of graft loss compared to cluster 2 (P<0.001) (Figure 7H). These results suggest that consensus clustering stratified patients into high-risk (cluster 1) and low-risk (cluster 2) groups.

Figure 7 Cluster identification based on expression of the three hub genes. (A) Consensus CDF plot. (B) CDF area change ratio. (C) Consensus matrix at k=2. (D) PCA plot of cluster distribution. (E) Violin plot for gene expression across clusters. (F) KM survival curves for graft loss between the two clusters. (G,H) Bar charts show rejection (G) and loss (H) differences between the two clusters. **, P<0.01; ***, P<0.001. CDF, cumulative distribution function; KM, Kaplan-Meier; PCA, principal component analysis.

Functional enrichment and immune cell infiltrations between the two clusters

GSVA with the hallmark gene sets from the MSigDB database showed that cluster 1 was enriched in inflammatory response and allograft rejection pathways, including IL-6-JAK-STAT3 signaling, TNFα/NF-κB signaling, and TGF-β signaling. Conversely, cluster 2 showed metabolic pathway activation, such as xenobiotic metabolism and oxidative phosphorylation (Figure 8A). GSEA linked the hub genes (ALOX5, ARL4C, and MS4A6A) to allograft rejection, graft-versus-host disease, and primary immunodeficiency (Figure 8B-8D). Using CIBERSORT, we found significant differences in immune cell infiltration, predominantly within T cells and macrophages (Figure 8E). Subsequent ssGSEA confirmed these differences in immune cell functions, with nearly all cell types differing significantly between clusters (Figure 8F).

Figure 8 Functional enrichment and immune cell infiltration between the two clusters. (A) GSVA showing the hallmark pathways that differ between the two clusters. Dark blue: pathways significantly up-regulated in cluster 1 vs. cluster 2 (t-value ≥5); light green: pathways significantly down-regulated in cluster 1 vs. cluster 2 (t-value ≤−5); light gray: pathways with no significant change (−5< t-value <5). (B-D) GSEA identifies signaling pathways involving the three hub genes: (B) ALOX5, (C) ARL4C, (D) MS4A6A. Differences in immune cell infiltration by CIBERSORT (E) and ssGSEA (F). *, P<0.05; **, P<0.01; ***, P<0.001; ns, P≥0.05. GSVA, gene set variation analysis; MDSC, myeloid-derived suppressor cell; NK, natural killer; ssGSEA, single-sample gene set enrichment analysis.

Discussion

IFTA represents not only a prevalent histopathological manifestation of chronic kidney disease (CKD) but also a significant contributor to long-term kidney failure in transplanted kidneys (2,24). Emerging in the early post-transplant period as a consequence of chronic fibrosis, IFTA progressively leads to kidney dysfunction (25,26). While early diagnostic models for IFTA have demonstrated utility in prognosis assessment, effective detection methods remain scarce. The predominant understanding of fibrotic progression posits that the graft undergoes irreversible damage, maintaining structural integrity through a non-specific healing response, ultimately manifesting as interstitial fibrosis (27). Within this complex pathological process, Mφ emerge as pivotal immune regulators in kidney homeostasis, orchestrating inflammation, tissue regeneration, and fibrosis (14). Shinoda et al. elucidated that tissue transglutaminase (TG2) activity exacerbates kidney fibrosis through ALOX15-mediated polarization of monocytes into Mφ2 (28). Furthermore, the process of MMT has been identified as a significant contributor to ECM accumulation, with transitional cells co-expressing myofibroblast marker α-SMA and macrophage marker CD68. Notably, most α-SMA+CD68+ cells in fibrotic regions also express CD206, a characteristic Mφ2 marker (14). The advent of AI has revolutionized medical diagnostics, with machine learning emerging as a powerful computational tool for outcome prediction through advanced data mining and algorithmic analysis (29). This technology has shown particular promise in anticipating post-transplantation complications (30).

In our current investigation, we employed a comprehensive approach to identify key molecular players in IFTA pathogenesis. Through differential gene expression analysis and WGCNA of the GSE98320 dataset, we identified 197 DEGs and 24 key modular genes. Their intersection yielded 12 common genes, which were subsequently subjected to functional enrichment analysis. GO and KEGG analyses revealed their involvement in critical immune processes, including efferocytosis, necroptosis, and the NOD-like receptor signaling pathway.
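The intersection step described above can be illustrated with a minimal Python sketch. The gene lists below are hypothetical toy data, not the 197 DEGs or 24 module genes from the study, and the helper name `intersect_gene_sets` is an assumption introduced only for illustration.

```python
def intersect_gene_sets(degs, module_genes):
    """Return the genes present in both lists, sorted for stable output."""
    return sorted(set(degs) & set(module_genes))


# Hypothetical toy lists (placeholders such as GENE_A are not real study genes).
toy_degs = ["ALOX5", "ARL4C", "MS4A6A", "GENE_A", "GENE_B"]
toy_module_genes = ["MS4A6A", "ALOX5", "GENE_C", "ARL4C"]

common_genes = intersect_gene_sets(toy_degs, toy_module_genes)
print(common_genes)  # ['ALOX5', 'ARL4C', 'MS4A6A']
```

Using sets makes the overlap independent of gene order and of duplicate entries in either list, which is the same logic a Venn diagram of the two gene sets conveys.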

Employing advanced machine learning algorithms (XGBoost, LASSO, and RF algorithms), we pinpointed three hub genes: ALOX5, ARL4C, and MS4A6A. A nomogram model incorporating these genes demonstrated promising efficacy in IFTA onset prediction, with AUC values consistently exceeding 0.7 across multiple validation datasets (GSE76882, GSE22459, and GSE65326). Consensus clustering analysis of GSE21374 delineated two distinct clusters, with subsequent prognosis analysis, functional enrichment, and immune cell infiltration studies revealing significant differences between these subgroups.
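As a rough illustration of how an AUC for a three-gene score can be computed, the sketch below scores samples with a logistic model and evaluates the AUC as the probability that an IFTA sample outscores a non-IFTA sample. The coefficients, intercept, and expression values are all hypothetical assumptions chosen for the example; they are not the fitted nomogram weights or data from this study.

```python
import math


def logistic_score(expression, coefficients, intercept):
    """Map hub-gene expression values to a probability with a logistic model."""
    linear = intercept + sum(coefficients[g] * expression[g] for g in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical weights and samples; label 1 = IFTA, label 0 = no IFTA.
HYPOTHETICAL_COEFFS = {"ALOX5": 0.8, "ARL4C": 0.5, "MS4A6A": 0.6}
HYPOTHETICAL_INTERCEPT = -12.0
toy_samples = [
    ({"ALOX5": 7.9, "ARL4C": 6.8, "MS4A6A": 8.1}, 1),
    ({"ALOX5": 7.2, "ARL4C": 6.1, "MS4A6A": 7.5}, 1),
    ({"ALOX5": 6.0, "ARL4C": 5.9, "MS4A6A": 6.4}, 0),
    ({"ALOX5": 6.5, "ARL4C": 5.2, "MS4A6A": 6.9}, 0),
]

toy_labels = [label for _, label in toy_samples]
toy_scores = [
    logistic_score(expr, HYPOTHETICAL_COEFFS, HYPOTHETICAL_INTERCEPT)
    for expr, _ in toy_samples
]
toy_auc = auc(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.3f}")
```

The pairwise definition is equivalent to the area under the ROC curve and is easy to verify by hand on small validation sets; for large cohorts a rank-based implementation from an established statistics library would be the more practical choice.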

The identified hub genes warrant detailed examination. ALOX5 is a critical mediator in lipid metabolism and inflammation (31), significantly influencing Mφ polarization and fibrotic processes. In cancer, ALOX5 regulates the tumor microenvironment (TME) by promoting the infiltration and polarization of Mφ2, a subset of macrophages associated with immune suppression and tumor progression. In intrahepatic cholangiocarcinoma (ICC), the ALOX5 metabolite LTB4 recruits Mφ2 via BLT1/BLT2 receptors and activates the PI3K pathway, driving tumor growth (31). Similarly, in pancreatic cancer, ALOX5 enhances Mφ2 polarization through the JAK/STAT pathway, while Zileuton, an ALOX5 inhibitor, effectively counteracts this effect (32). In gliomas, ALOX5 mediates immunosuppressive Mφ2 polarization and upregulates programmed death-ligand 1 (PD-L1) expression via 5-hydroxyeicosatetraenoic acid (5-HETE), contributing to tumor immune evasion (33). Beyond oncology, ALOX5 plays a role in fibrotic diseases such as encapsulating peritoneal sclerosis, where its overexpression suggests that it is a potential therapeutic target (34). In diabetic nephropathy (DN), ALOX5 inhibition reduces NF-κB signaling and mitigates kidney cell injury, offering new avenues for treatment (35). Taken together, these findings highlight the central role of ALOX5 in linking Mφ2 polarization and fibrotic mechanisms across pathological conditions, emphasizing its potential as a therapeutic target in both cancer and fibrosis.

ARL4C, a membrane-localized GTP-binding protein, plays a significant role in Mφ polarization and fibrosis, particularly in inflammatory and fibrotic diseases (36). In rheumatoid arthritis (RA), ARL4C activation in fibroblast-like synoviocytes (FLSs) promotes synovial inflammation, cartilage degradation, and bone erosion through PI3K/AKT and MAPK signaling pathways. Importantly, silencing ARL4C disrupts the polarization of monocytes to the pro-inflammatory Mφ1 phenotype and inhibits the repolarization of Mφ2 to Mφ1, highlighting its regulatory role in macrophage dynamics (37). Beyond autoimmune diseases, ARL4C contributes to cancer-related fibrosis by driving epithelial-to-mesenchymal transition (EMT) (38) and facilitating invasion in cancers such as pancreatic and colorectal cancer through specific signaling pathways like ARL4C-IQGAP1-MMP14 (38-40). Additionally, therapeutic agents like ursolic acid show promise by modulating AKT signaling to promote ARL4C degradation, thus inhibiting fibrosis and cancer metastasis (41). These findings collectively underscore ARL4C as a key regulator connecting Mφ polarization, fibrotic processes, and disease progression, offering potential therapeutic targets.

MS4A6A plays a critical role in Mφ function (42) and fibrosis progression (43). In gliomas, it serves as a prognostic biomarker produced by Mφ, linked to poor outcomes and tumor aggressiveness (42). Additionally, MS4A6A-high Mφ with an Mφ2 phenotype drive inflammatory responses in fibrotic hypersensitivity pneumonitis, highlighting the involvement of MS4A6A in immune dysregulation and fibrotic processes (43). Beyond fibrosis, MS4A6A contributes to autoimmune pathology, such as in lupus nephritis, further underscoring its significance in immune modulation and disease progression (44). These findings suggest MS4A6A as a key regulator of Mφ activity and fibrotic mechanisms.

To further elucidate the role of the three hub genes in the long-term outcomes of kidney transplantation, we performed unsupervised clustering on the prognosis set based on their expression, identifying two distinct subgroups. Notably, cluster 1 exhibited a significantly poorer prognosis than cluster 2. GSVA enrichment analysis revealed that cluster 1 was characterized by the upregulation of pathways associated with allograft rejection, including IL-6/JAK/STAT3 signaling and TGF-β signaling, both of which are closely linked to fibrosis (45,46). In contrast, cluster 2 showed activation of metabolic processes such as xenobiotic metabolism, adipogenesis, fatty acid metabolism, and oxidative phosphorylation. These findings suggest that cluster 1, identified through unsupervised clustering on the genes of the IFTA diagnostic model, represents a high-risk population with advanced fibrosis and poor clinical outcomes. This classification provides a valuable tool for risk stratification in the early post-transplantation period, enabling targeted high-frequency screening and early interventions for high-risk patients. Such an approach may mitigate adverse events and reduce the risk of graft loss, ultimately improving transplant survival rates.

While our study provides valuable insights into Mφ2-related biomarkers for IFTA, several limitations should be acknowledged. First, reliance on retrospective transcriptomic data introduces biases from batch effects and platform variability. Although normalized, RNA-seq data may still be influenced by technical factors like sequencing depth and RNA quality. Second, the machine learning workflow prioritized computational robustness, but the limited sample size (n=419 in GSE98320) restricts generalizability. Prospective, multi-center cohorts with standardized histology are needed for validation. Third, the findings remain correlative. While TNFα/NF-κB and TGF-β pathways were implicated, experimental validation (e.g., Mφ2 polarization assays or knockout models) is required to confirm a causal role of ALOX5, ARL4C, and MS4A6A in fibrosis. Finally, the nomogram model focuses on diagnostic accuracy but lacks integration of clinical parameters like donor-specific antibodies, estimated glomerular filtration rate (eGFR), or proteinuria, which could enhance predictive value. Future studies should address these gaps through multi-omics approaches and longitudinal monitoring to improve personalized management of kidney fibrosis.


Conclusions

This study identifies ALOX5, ARL4C, and MS4A6A as hub genes associated with M2 macrophage-driven kidney fibrosis in IFTA. Using bioinformatics and machine learning, we developed a diagnostic nomogram model with robust predictive accuracy (AUC 0.738–0.88). Consensus clustering stratified patients into high- and low-risk groups, where elevated hub gene expression correlated with accelerated graft loss (P<0.001) and dysregulation of fibrosis-related pathways (TGF-β, TNFα/NF-κB). These findings provide critical diagnostic/prognostic biomarkers and therapeutic targets, advancing personalized strategies for mitigating IFTA progression in kidney transplantation.


Acknowledgments

The authors sincerely acknowledge the Gene Expression Omnibus (GEO) database for providing the invaluable datasets used in this study. Additionally, the authors would like to express their gratitude to jvenn (https://www.bioinformatics.com.cn/static/others/jvenn/example.html) for the Venn diagrams provided.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-198/rc

Peer Review File: Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-198/prf

Funding: This work was supported by the National Natural Science Foundation of China (No. 81670683).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-198/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Zheng X, Zhang W, Zhou H, et al. A randomized controlled trial to evaluate efficacy and safety of early conversion to a low-dose calcineurin inhibitor combined with sirolimus in renal transplant patients. Chin Med J (Engl) 2022;135:1597-603.
  2. Yin Y, Chen C, Zhang D, et al. Construction of predictive model of interstitial fibrosis and tubular atrophy after kidney transplantation with machine learning algorithms. Front Genet 2023;14:1276963.
  3. Lai X, Zheng X, Mathew JM, et al. Tackling Chronic Kidney Transplant Rejection: Challenges and Promises. Front Immunol 2021;12:661643.
  4. Zheng X, Li M, Wang P, et al. Assessment of chronic allograft injury in renal transplantation using diffusional kurtosis imaging. BMC Med Imaging 2021;21:63.
  5. Ahuja HK, Azim S, Maluf D, et al. Immune landscape of the kidney allograft in response to rejection. Clin Sci (Lond) 2023;137:1823-38.
  6. Wang M, Zeng F, Ning F, et al. Ceria nanoparticles ameliorate renal fibrosis by modulating the balance between oxidative phosphorylation and aerobic glycolysis. J Nanobiotechnology 2022;20:3.
  7. Guo Y, Cen K, Hong K, et al. Construction of a neural network diagnostic model for renal fibrosis and investigation of immune infiltration characteristics. Front Immunol 2023;14:1183088.
  8. Zhang Y, Liu Y, Luo S, et al. An adoptive cell therapy with TREM2-overexpressing macrophages mitigates the transition from acute kidney injury to chronic kidney disease. Clin Transl Med 2025;15:e70252.
  9. Froom ZSCS, Callaghan NI, Davenport Huyer L. Cellular crosstalk in fibrosis: Insights into macrophage and fibroblast dynamics. J Biol Chem 2025;301:110203.
  10. Fonseca AC, Colavite PM, Azevedo MCS, et al. Inhibition of MEK1/2 Signaling Pathway Limits M2 Macrophage Polarization and Interferes in the Dental Socket Repair Process in Mice. Biology (Basel) 2025;14:107.
  11. Wang H, Ye X, Spanos M, et al. Exosomal Non-Coding RNA Mediates Macrophage Polarization: Roles in Cardiovascular Diseases. Biology (Basel) 2023;12:745.
  12. Setten E, Castagna A, Nava-Sedeño JM, et al. Understanding fibrosis pathogenesis via modeling macrophage-fibroblast interplay in immune-metabolic context. Nat Commun 2022;13:6499.
  13. Tang PM, Zhang YY, Xiao J, et al. Neural transcription factor Pou4f1 promotes renal fibrosis via macrophage-myofibroblast transition. Proc Natl Acad Sci U S A 2020;117:20741-52.
  14. Li G, Yang H, Zhang D, et al. The role of macrophages in fibrosis of chronic kidney disease. Biomed Pharmacother 2024;177:117079.
  15. St Jeor JD, Reisenauer CJ, Andrews JC, et al. Transjugular Renal Biopsy Bleeding Risk and Diagnostic Yield: A Systematic Review. J Vasc Interv Radiol 2020;31:2106-12.
  16. Pang Q, Chen H, Wu H, et al. N6-methyladenosine regulators-related immune genes enable predict graft loss and discriminate T-cell mediate rejection in kidney transplantation biopsies for cause. Front Immunol 2022;13:1039013.
  17. Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40-55.
  18. Mao K, Lin F, Pan Y, et al. Identification of glycosyltransferase genes for diagnosis of T-cell mediated rejection and prediction of graft loss in kidney transplantation. Transpl Immunol 2024;87:102114.
  19. Peng S, Yan W, Yan Y, et al. AP2M1 as the potential biomarker for prediction of the response of atopic dermatitis to Dupilumab therapy: Multi-omics analysis and evidence. Int J Biol Macromol 2025;297:139757.
  20. Jiang H, Zhang X, Wu Y, et al. Bioinformatics identification and validation of biomarkers and infiltrating immune cells in endometriosis. Front Immunol 2022;13:944683.
  21. Huang L, Zhang J, Songyang Z, et al. Identification and Validation of eRNA as a Prognostic Indicator for Cervical Cancer. Biology (Basel) 2024;13:227.
  22. Yao X, Qi X, Wang Y, et al. Identification and Validation of an Annexin-Related Prognostic Signature and Therapeutic Targets for Bladder Cancer: Integrative Analysis. Biology (Basel) 2022;11:259.
  23. Mao K, Lin F, Pan Y, et al. Identification of mitophagy-related gene signatures for predicting delayed graft function and renal allograft loss post-kidney transplantation. Transpl Immunol 2024;87:102148.
  24. Djudjaj S, Boor P. Cellular and molecular mechanisms of kidney fibrosis. Mol Aspects Med 2019;65:16-36.
  25. Feng YL, Wang WB, Ning Y, et al. Small molecules against the origin and activation of myofibroblast for renal interstitial fibrosis therapy. Biomed Pharmacother 2021;139:111386.
  26. Zhang H, Yang Y, Liu Z, et al. Significance of methylation-related genes in diagnosis and subtype classification of renal interstitial fibrosis. Hereditas 2023;160:32.
  27. Romagnani P, Remuzzi G, Glassock R, et al. Chronic kidney disease. Nat Rev Dis Primers 2017;3:17088.
  28. Shinoda Y, Tatsukawa H, Yonaga A, et al. Tissue transglutaminase exacerbates renal fibrosis via alternative activation of monocyte-derived macrophages. Cell Death Dis 2023;14:136.
  29. Lin A, Qi C, Li M, et al. Deep Learning Analysis of the Adipose Tissue and the Prediction of Prognosis in Colorectal Cancer. Front Nutr 2022;9:869263.
  30. Senanayake S, White N, Graves N, et al. Machine learning in predicting graft failure following kidney transplantation: A systematic review of published predictive models. Int J Med Inform 2019;130:103957.
  31. Chen J, Tang Y, Qin D, et al. ALOX5 acts as a key role in regulating the immune microenvironment in intrahepatic cholangiocarcinoma, recruiting tumor-associated macrophages through PI3K pathway. J Transl Med 2023;21:923.
  32. Hu WM, Liu SQ, Zhu KF, et al. The ALOX5 inhibitor Zileuton regulates tumor-associated macrophage M2 polarization by JAK/STAT and inhibits pancreatic cancer invasion and metastasis. Int Immunopharmacol 2023;121:110505.
  33. Chen T, Liu J, Wang C, et al. ALOX5 contributes to glioma progression by promoting 5-HETE-mediated immunosuppressive M2 polarization and PD-L1 expression of glioma-associated microglia/macrophages. J Immunother Cancer 2024;12:e009492.
  34. Lu X, Wu K, Jiang S, et al. Therapeutic mechanism of baicalein in peritoneal dialysis-associated peritoneal fibrosis based on network pharmacology and experimental validation. Front Pharmacol 2023;14:1153503.
  35. Chen X, Xie H, Liu Y, et al. Interference of ALOX5 alleviates inflammation and fibrosis in high glucose induced renal mesangial cells. Exp Ther Med 2023;25:34.
  36. Sztul E, Chen PW, Casanova JE, et al. ARF GTPases and their GEFs and GAPs: concepts and challenges. Mol Biol Cell 2019;30:1249-71.
  37. Tang N, Luo X, Ding Z, et al. Single-Cell Multi-Dimensional data analysis reveals the role of ARL4C in driving rheumatoid arthritis progression and Macrophage polarization dynamics. Int Immunopharmacol 2024;141:112987.
  38. Kanai R, Uehara T, Yoshizawa T, et al. ARL4C is associated with epithelial-to-mesenchymal transition in colorectal cancer. BMC Cancer 2023;23:478.
  39. Harada A, Matsumoto S, Yasumizu Y, et al. Localization of KRAS downstream target ARL4C to invasive pseudopods accelerates pancreatic cancer cell invasion. Elife 2021;10:e66721.
  40. Hu Q, Masuda T, Sato K, et al. Identification of ARL4C as a Peritoneal Dissemination-Associated Gene and Its Clinical Significance in Gastric Cancer. Ann Surg Oncol 2018;25:745-53.
  41. Zhang M, Xiang F, Sun Y, et al. Ursolic acid inhibits the metastasis of colon cancer by downregulating ARL4C expression. Oncol Rep 2024;51:27.
  42. Zhang C, Liu H, Tan Y, et al. MS4A6A is a new prognostic biomarker produced by macrophages in glioma patients. Front Immunol 2022;13:865020.
  43. Wang J, Zhang L, Luo L, et al. Characterizing cellular heterogeneity in fibrotic hypersensitivity pneumonitis by single-cell transcriptional analysis. Cell Death Discov 2022;8:38.
  44. Wang Z, Hu D, Pei G, et al. Identification of driver genes in lupus nephritis based on comprehensive bioinformatics and machine learning. Front Immunol 2023;14:1288699.
  45. Chen QY, Jiang YN, Guan X, et al. Aerobic Exercise Attenuates Pressure Overload-Induced Myocardial Remodeling and Myocardial Inflammation via Upregulating miR-574-3p in Mice. Circ Heart Fail 2024;17:e010569.
  46. Lee YI, Shim JE, Kim J, et al. WNT5A drives interleukin-6-dependent epithelial-mesenchymal transition via the JAK/STAT pathway in keloid pathogenesis. Burns Trauma 2022;10:tkac023.
Cite this article as: Mao K, Xu X, Lin F, Pan Y, Lu Z, Luo B, Zhu Y, Li Z, Ye J. Identification of M2 macrophage-related biomarkers for a predictive model of interstitial fibrosis and tubular atrophy after kidney transplantation by machine learning algorithms. Transl Androl Urol 2025;14(7):1990-2006. doi: 10.21037/tau-2025-198