Personalized prediction for recurrence of cystitis glandularis: insights from SHAP and machine learning models

Yuyang Yuan; Fuchun Zheng; Jiming Yao; Kun Zhou; Jiaqing Yang; Xiaoqiang Liu; Hao Wan; Luyao Chen; Jieping Hu; Lizhi Zhou; Bin Fu

doi:10.21037/tau-2024-665

Original Article

Yuyang Yuan^1,2#, Fuchun Zheng^1,2#, Jiming Yao^1,2, Kun Zhou^1,2, Jiaqing Yang^1,2, Xiaoqiang Liu^1,2, Hao Wan^1,2, Luyao Chen^1,2, Jieping Hu^1,2, Lizhi Zhou^1,2, Bin Fu^1,2

¹Department of Urology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, China; ²Jiangxi Institute of Urology, Nanchang, China

Contributions: (I) Conception and design: Y Yuan, F Zheng, B Fu; (II) Administrative support: B Fu; (III) Provision of study materials or patients: J Yang, X Liu, B Fu; (IV) Collection and assembly of data: Y Yuan, F Zheng, K Zhou, J Yang, H Wan, B Fu; (V) Data analysis and interpretation: Y Yuan, F Zheng, J Yang, B Fu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work as co-first authors.

Correspondence to: Bin Fu, PhD. Department of Urology, the First Affiliated Hospital, Jiangxi Medical College, Nanchang University, 17 Yongwai Zheng Street, Nanchang 330000, China, Jiangxi Institute of Urology, Nanchang, China. Email: urofubin@126.com.

Background: Cystitis glandularis (CG) is a rare urological condition characterized by glandular metaplasia of the bladder mucosa. Recurrence following transurethral resection (TUR) is a significant clinical challenge. Traditional predictive models often fail to capture the complexity of the data, resulting in insufficient accuracy. In contrast, machine learning (ML) has demonstrated substantial potential in medical prediction by identifying and analyzing complex patterns that are undetectable by conventional methods. This study aims to develop and evaluate an interpretable ML model to predict recurrence after TUR for CG, thereby improving clinical decision-making and patient outcomes.

Methods: We analyzed predictors of recurrence using the least absolute shrinkage and selection operator (LASSO) and multivariate logistic regression. We developed and tested seven ML-based models: Cox proportional hazards model (CoxPH), LASSO regression, decision tree (rpart), random survival forest (RSF), gradient boosting machine (GBM), support vector machine (SVM), and extreme gradient boosting (XGBoost). Participants were diagnosed with CG by pathology following TUR and treated from 2012 to 2018. Model discrimination was assessed using the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), while overall prediction accuracy was evaluated using the Brier score (BS). Decision curve analysis (DCA) was used to compare clinical utility across models. The SHapley Additive exPlanations (SHAP) method was employed for interpretation, providing insights into recurrence prediction and prevention strategies. Finally, a user-friendly online platform was developed, allowing users to predict CG recurrence by entering feature values into designated text boxes on the webpage.

Results: The RSF model demonstrated the best performance in predicting recurrence, as indicated by superior ROC, DCA, and BS metrics. In SHAP, postoperative regular instillation (PRI) contributed the most to model construction.

Conclusions: The RSF model effectively predicts CG recurrence, offering a framework for individualized treatment strategies. PRI was identified as the most influential predictor of recurrence.

Keywords: Cystitis glandularis (CG); machine learning (ML); prediction model; SHapley Additive exPlanations (SHAP); online platform

Submitted Nov 21, 2024. Accepted for publication Feb 27, 2025. Published online Mar 26, 2025.

Highlight box

Key findings

• Our study developed an interpretable predictive model using advanced machine learning (ML) techniques to forecast the 5-year recurrence rate of cystitis glandularis (CG). This model is accessible via an intuitive online platform, making it practical for clinical use.

What is known and what is new?

• Previous studies have used clinical variables in nomogram-based models to predict CG recurrence. However, these models often lack comprehensive interpretability and integration of diverse features.

• We enhanced traditional models by incorporating key pathological features and employing multiple ML algorithms. The best-performing model was identified using receiver operating characteristic curves, Brier scores, and decision curve analysis. For added transparency, we utilized SHapley Additive exPlanations to interpret model predictions, ensuring clarity in decision-making processes. The model is also visualized on an online platform for easy access and use by clinicians.

What is the implication, and what should change now?

• This research provides a robust tool for both patients and healthcare providers to assess recurrence risk of CG based on clinical and pathological data. It facilitates early detection and targeted prevention strategies for individuals at high risk, ultimately improving long-term patient outcomes and guiding more personalized treatment plans.

Introduction

Cystitis glandularis (CG) is an uncommon inflammatory disorder marked by glandular metaplasia in the bladder’s urothelium, typically resulting from chronic irritation or inflammation (1). It is often asymptomatic and identified incidentally during cystoscopic examination (2). However, when symptoms are present, they commonly include urinary frequency, urgency, hematuria, and pelvic discomfort, which can significantly impact the patient’s quality of life (3). As the condition progresses, it can lead to severe complications, including bladder wall thickening, recurrent urinary tract infections (UTIs), and, in rare cases, bladder cancer (4). CG is most commonly diagnosed after transurethral resection (TUR), where histopathological analysis reveals glandular structures within the bladder wall (5).

Although the exact etiology of CG remains unclear, several factors have been identified as potential contributors, including chronic infections, irregular post-operative instillation of chemotherapy agents, pelvic lipomatosis (PL), and prolonged catheter use (6-8). Additionally, pathological subtypes appear to be an important influencing factor. Studies have shown that the recurrence rate of intestinal CG is significantly higher than that of typical CG (9). Therefore, identifying independent risk factors is crucial for predicting and preventing recurrence. In terms of treatment, approaches for CG range from medical management, such as antibiotics for recurrent infections, to surgical interventions like TUR for more severe cases (10). However, despite surgery being a common treatment option, recurrence after surgery remains frequent, further complicating patient management.

Traditional nomograms, as user-friendly visual prediction tools, are widely used in the medical field (11,12). However, they have some limitations. They are typically based on linear assumptions, making it difficult to capture complex nonlinear relationships in the data (13). Additionally, their reliance on statistical methods for variable selection hampers their ability to handle large numbers of variables and collinearity, thus limiting their predictive power (14,15). Recent studies have begun to explore the factors driving CG recurrence, with increasing attention on predictive modeling through machine learning (ML) techniques. As an emerging field in medicine, ML has demonstrated tremendous potential in analyzing complex datasets, identifying risk factors, and providing personalized predictions for a variety of conditions (16,17). By leveraging clinical data from CG patients, ML models can offer insights into the factors that influence recurrence, guiding early intervention and individualized treatment strategies.

This study aimed to apply multiple ML classification models to develop a predictive framework for CG recurrence, incorporating patient-specific clinical characteristics. Furthermore, the SHapley Additive exPlanations (SHAP) method was employed to enhance interpretability, enabling clinicians to better understand the risk factors involved and tailor treatment plans accordingly (18). The development of such a model, combined with the use of an online platform, holds great promise in improving the early detection and prevention of CG recurrence, advancing patient care, and contributing to the broader field of personalized medicine (19,20). We present this article in accordance with the TRIPOD reporting checklist (available at https://tau.amegroups.com/article/view/10.21037/tau-2024-665/rc).

Methods

Data source and study population

We included patients who underwent transurethral bladder lesion resection between January 1, 2012, and December 31, 2018, at the First Affiliated Hospital of Nanchang University, and whose postoperative pathology confirmed CG. These patients were followed for 5 years. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Ethical approval for this study was granted by the Ethics Committee of the First Affiliated Hospital of Nanchang University [approval ID: (2022) CDYFYYLK (11-031)]. Informed consent was obtained from all the patients. We excluded individuals with concurrent malignancies, those whose recurrence pathology indicated malignancy, patients with a history of prior TUR, those with a history of CG recurrence, and those lost to follow-up (Figure S1). Our follow-up protocol was based on standard clinical practices, which included timely monitoring and long-term care. Routine follow-up consisted of ultrasound or CT scans 1 month postoperatively, followed by cystoscopy at 3 and 6 months (excluding patients with significant clinical symptoms). Subsequent follow-up intervals were adjusted based on clinical symptoms or signs of recurrence. CG recurrence was defined as the identification of a new lesion through cystoscopy after the complete resection of the primary lesion.

Predictive variables

Clinical characteristics included age, sex, smoking history, drinking history, and main symptoms (with asymptomatic cases identified through routine health screenings). Additionally, detailed information on urological history and postoperative follow-up was collected, including the history of long-term catheter use and regular postoperative instillation. The pathological features primarily included histological subtypes, specifically intestinal CG and typical CG.

Construction and evaluation of predictive models

Step 1. Screening characteristic factors: least absolute shrinkage and selection operator (LASSO) regression analysis was performed to select variables and improve generalization ability. The variables retained by LASSO were then entered into a multivariate logistic regression analysis, and those with P<0.05 were taken as characteristic factors.

Step 2. Data division: patients were randomly divided into a training set and a testing set in a 7:3 ratio.

Step 3. Comprehensive analysis of multiple classification models: seven ML models were built, including the Cox proportional hazards model (CoxPH), LASSO, decision tree (rpart), random survival forest (RSF), gradient boosting machine (GBM), support vector machine (SVM), and extreme gradient boosting (XGBoost). The models were trained and tested using 10-fold cross-validation, analyzing the importance of indicators in both sets.

Step 4. Model evaluation: the performance of each model was assessed using three metrics: receiver operating characteristic (ROC) curves, Brier score (BS), and decision curve analysis (DCA). ROC curves were used to evaluate discrimination. The area under the ROC curve (AUC) quantified the model’s ability to distinguish between patients who experienced recurrence and those who did not at various time points; a higher AUC indicated better discrimination. The BS was used to assess overall predictive accuracy. It is the mean squared difference between the predicted probabilities and the observed outcomes and therefore also reflects calibration; a lower BS indicated more accurate and better-calibrated predictions. DCA was used to evaluate clinical applicability. It estimates the net benefit of using the model across a range of threshold probabilities, weighing true positives against false positives, and compares that benefit with the strategies of treating all patients or treating none. Together, these metrics covered the models’ discrimination, predictive accuracy, and clinical value (21-23).

Step 5. SHAP interpretation: SHAP was used to illustrate the importance and contribution of each feature to the model’s predictions, applied to both the overall model and individual samples to interpret results by calculating the contribution of each feature to predicted outcomes.
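The two less familiar metrics in Step 4 can be sketched in a few lines. The following is a minimal illustration, assuming a binary recurrence outcome at a fixed time horizon and ignoring censoring; the study itself used time-dependent versions within R, so the function names and toy data here are hypothetical and are not the authors' implementation.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0/1)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are down-weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in DCA: intervene on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = recurrence, 0 = no recurrence.
toy_outcomes = [1, 0, 1, 0]
toy_risks = [0.9, 0.2, 0.6, 0.4]

toy_brier = brier_score(toy_outcomes, toy_risks)
toy_nb_model = net_benefit(toy_outcomes, toy_risks, threshold=0.5)
toy_nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.5)
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is clinically useful at a given threshold only when its net benefit exceeds both zero and the "treat all" value.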

Statistical analysis

Multivariate logistic regression analysis was performed using IBM SPSS Statistics 27 (IBM Corp., Armonk, NY, USA), while LASSO regression was completed using R 4.4.0 (R Foundation for Statistical Computing, Vienna, Austria) with the glmnet package. The construction, comparison, and performance evaluation of ML models, along with SHAP interpretation, were conducted using the tidyverse and mlr3verse packages. The online platform was developed using the Shiny package.

Results

General characteristics

A total of 239 eligible patients with CG (median age, 50.1±13.67 years) were included in the study and were randomly divided into a training set (n=167) and a validation set (n=72). The training and validation cohorts were well-balanced in terms of clinical and pathological variables, follow-up, and recurrence incidence (Table 1). The demographic characteristics of the patients are presented in Table 2. One hundred and eighteen patients (49.3%) experienced recurrence of CG.
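The 7:3 random division can be sketched as follows. This is a minimal illustration, assuming the training-set size is obtained by truncating 0.7 × n; the seed and the function name are hypothetical, and the actual split in the study was performed in R.

```python
import random
from typing import Iterable, List, Tuple


def split_train_test(ids: Iterable[int], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient identifiers and cut them into training and testing sets."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # truncation is an assumption
    return shuffled[:n_train], shuffled[n_train:]


# 239 patients, indexed 0..238.
train_ids, test_ids = split_train_test(range(239))
```

Under these assumptions, 239 patients yield 167 training and 72 testing cases, which matches the cohort sizes reported above.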

Table 1

Characteristics in training cohort and testing cohort

Characteristics	All	Training set	Validation set	P value
Gender				0.41
Female	69	48	21
Male	170	119	51
Alcohol consumption				0.37
Yes	54	37	17
No	185	130	55
Smoking				0.17
Yes	70	49	21
No	169	118	51
BOO				0.12
Yes	55	39	16
No	184	128	56
UI				0.23
Yes	129	90	39
No	110	77	33
UUTO				0.19
Yes	73	51	22
No	166	116	50
LTIC				0.43
Yes	64	45	19
No	175	122	53
UC				0.10
Yes	72	50	22
No	167	117	50
Multiple lesions				0.25
Yes	104	73	31
No	135	94	41
Medical examination				0.38
Yes	161	112	49
No	78	55	23
Postoperative instillation				0.24
Yes	172	120	52
No	67	47	20
Histological subtype				0.16
Intestinal CG	37	26	11
Typical CG	202	141	61
Age (years)	50.1 (40.0, 60.0)	49.82 (39.0, 60.0)	49.35 (41.0, 58.75)	0.18
LS (cm)	1.56 (0.7, 2.0)	1.57 (0.70, 2.00)	1.75 (0.8, 2.3)	0.32
N/L	2.78 (1.60, 3.07)	2.75 (1.62, 3.05)	2.81 (1.59, 3.28)	0.15

Data are presented as number or median (quartile). BOO, bladder outlet obstruction; CG, cystitis glandularis; LS, lesion size; LTIC, long-term indwelling catheter; N/L, neutrophil/lymphocyte; UC, urinary calculus; UI, urinary infection; UUTO, upper urinary tract obstruction.

Table 2

Characteristics of patients in recurrence and non-recurrence

Characteristics	Recurrence	Non-recurrence	P value
Gender			0.42
Male	98	72
Female	20	49
Alcohol consumption			0.55
Yes	30	24
No	88	97
Smoking			0.09
Yes	36	34
No	82	87
BOO			0.08
Yes	35	20
No	83	101
UI			0.041
Yes	41	51
No	77	70
UUTO			0.21
Yes	42	31
No	76	90
LTIC			0.03
Yes	36	28
No	82	93
UC			0.14
Yes	39	33
No	79	88
Multiple lesions			0.72
Yes	60	44
No	58	77
Medical examination			0.13
Yes	37	41
No	81	80
PRI			<0.001
Yes	73	99
No	45	22
Histological subtype			0.053
Intestinal CG	19	18
Typical CG	99	103
Age (years)	50.95 (40.3, 59.8)	49.09 (39.0, 60.0)	0.14
LS (cm)	1.71 (0.7, 2.5)	1.42 (0.6, 2.0)	0.03
N/L	2.84 (1.6, 3.3)	2.72 (1.6, 3.0)	0.33

Data are presented as number or median (quartile). BOO, bladder outlet obstruction; CG, cystitis glandularis; LS, lesion size; LTIC, long-term indwelling catheter; N/L, neutrophil/lymphocyte; PRI, postoperative regular instillation; UC, urinary calculus; UI, urinary infection; UUTO, upper urinary tract obstruction.

Screening of characteristic factors for recurrence risk in CG patients

LASSO regression analysis was conducted with recurrence as the dependent variable. LASSO is effective in compressing variable coefficients to prevent overfitting and address multicollinearity issues (24). The results (lambda with minimum mean square error =0.029) reduced 15 independent variables to nine: gender, urinary infection (UI), upper urinary tract obstruction (UUTO), long-term indwelling catheter (LTIC), urinary calculus (UC), multiple lesions, postoperative regular instillation (PRI), age (years), and lesion size (LS; cm) (Figure 1A,1B). To further control for potential confounding factors, multivariate logistic regression was applied to these nine variables. Ultimately, UI, UUTO, PRI, age (years), LS (cm), and LTIC were identified as significant factors (P<0.05, Table 3).
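How an L1 penalty sets some coefficients exactly to zero can be illustrated with the soft-thresholding operator, which gives the LASSO solution in the special case of an orthonormal design. This is a sketch only: glmnet fits the penalized model by coordinate descent, and the feature names and unpenalized coefficients below are hypothetical values chosen for illustration, not results from this study. Only the penalty value 0.029 is taken from the text.

```python
from typing import Dict, List


def soft_threshold(coefficient: float, penalty: float) -> float:
    """Shrink a coefficient toward zero by `penalty`; values within the penalty become 0."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0


def select_features(coefficients: Dict[str, float], penalty: float) -> List[str]:
    """Return the names of features whose shrunken coefficient is non-zero."""
    return [name for name, value in coefficients.items()
            if soft_threshold(value, penalty) != 0.0]


# Hypothetical unpenalized coefficients (illustration only).
unpenalized = {
    "feature_a": 0.40,
    "feature_b": -0.12,
    "feature_c": 0.02,
    "feature_d": -0.01,
}
lam = 0.029  # the lambda reported in the text

shrunken = {name: soft_threshold(value, lam) for name, value in unpenalized.items()}
selected = select_features(unpenalized, lam)
```

Features with a weak association (here `feature_c` and `feature_d`) are dropped, while stronger ones are kept with coefficients reduced in magnitude, which is the screening behavior described above.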

Figure 1 LASSO regression analysis for the selection of clinical features. (A) LASSO coefficient distribution diagram of clinical features. (B) LASSO regression analysis used the minimum criterion and 10-fold cross-validation. By introducing a penalty adjustment parameter (λ) to compress the coefficients of clinical features, the coefficients of irrelevant features tend to zero, thereby achieving automatic screening of features. LASSO, least absolute shrinkage and selection operator.

Table 3

Multivariate logistic regression analysis screened by LASSO regression analysis

Characteristics	OR (95% CI)	P value
Gender	1.58 (0.94, 1.71)	0.054
UI	1.77 (1.13, 2.8)	0.01
UUTO	1.97 (1.18, 3.31)	0.009
LTIC	1.42 (1.36, 2.40)	0.04
UC	1.13 (0.82, 1.30)	0.17
Multiple lesions	1.05 (0.77, 1.44)	0.24
PRI	3.63 (2.30, 5.7)	<0.001
Age	1.27 (1.16, 2.37)	0.044
LS	1.24 (1.05, 1.5)	0.01

CI, confidence interval; LASSO, least absolute shrinkage and selection operator; LS, lesion size; LTIC, long-term indwelling catheter; OR, odds ratio; PRI, postoperative regular instillation; UC, urinary calculus; UI, urinary infection; UUTO, upper urinary tract obstruction.
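The odds ratios in Table 3 relate to logistic regression coefficients through OR = exp(β). The sketch below assumes a Wald-type 95% interval, exp(β ± 1.96 × SE), which is a common default but is not confirmed by the text; the function names are hypothetical. It uses the reported PRI row, 3.63 (2.30, 5.7), to recover the implied standard error and rebuild the interval.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error to OR and Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover the standard error of the log-odds from a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# PRI row of Table 3: OR 3.63 (95% CI 2.30 to 5.7).
pri_beta = math.log(3.63)
pri_se = se_from_ci(2.30, 5.7)
pri_or, pri_lower, pri_upper = odds_ratio_with_ci(pri_beta, pri_se)
```

Because a Wald interval is symmetric on the log scale, the reported point estimate should sit close to the geometric mean of its bounds; rows where it does not may reflect rounding or a different interval method.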

Comprehensive analysis of multi-model

The seven models were trained and evaluated using 10-fold cross-validation. Model performance was assessed using AUC values, which indicated that the RSF model outperformed the others in both the training and testing sets, providing the most accurate predictions for recurrence probabilities at 1, 3, and 5 years (Figure 2A,2B). The ROC curves for all models in both the training and testing sets are provided in Figure S2. To further evaluate the clinical applicability and predictive accuracy of the models, we conducted DCA and calculated BS. DCA revealed that the RSF, CoxPH, and XGBoost models were the most clinically applicable (Figure 2C). BS demonstrated that the RSF model achieved the highest prediction accuracy (Figure 2D). Overall, the RSF model consistently exhibited the best performance in both the training and testing sets, confirming its status as the optimal model. Additionally, survival analysis, which categorized patients into high- and low-risk groups based on the RSF model, showed good discrimination in both sets, effectively identifying high-risk individuals (Figure 2E,2F).
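The AUC used to compare the models has a direct probabilistic reading: it is the chance that a randomly chosen patient with recurrence receives a higher predicted risk than a randomly chosen patient without recurrence. The sketch below computes it by pairwise comparison for a binary outcome at a single time horizon; it ignores censoring, so it is a simplification of the time-dependent AUC at 1, 3, and 5 years reported here, and the toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (recurrence, non-recurrence) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                concordant += 1.0
            elif pos_score == neg_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = recurrence, 0 = no recurrence.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In this toy example four of the six recurrence/non-recurrence pairs are ranked correctly, so the AUC is 4/6. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.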

Figure 2 Comprehensive analysis of the ML models. (A) Training-set AUC and (B) testing-set AUC; patients were sampled 10 times at a ratio of 7:3. (C) DCA, where the black solid line represents the assumption that all patients will experience recurrence, and the blue solid line represents the assumption that no patients will experience recurrence. The remaining solid lines represent different models. (D) BS in all models. (E) Training-set Kaplan-Meier curves and (F) testing-set Kaplan-Meier curves. AUC, area under the ROC curve; BS, Brier score; CoxPH, Cox proportional hazards model; DCA, decision curve analysis; GBM, gradient boosting machine; LASSO, least absolute shrinkage and selection operator; ML, machine learning; ROC, receiver operating characteristic; RSF, random survival forest; rpart, decision tree; SVM, support vector machine; XGBoost, extreme gradient boosting.

SHAP for model interpretation

To provide a more detailed explanation of the impact of the selected variables, we used the SHAP method to illustrate how these variables contribute to recurrence prediction in the model. SHAP values, based on game theory, offer a measure of each feature’s contribution to the predicted outcome. Each feature’s SHAP value represents how that feature either increases or decreases the prediction under the given model (25,26). We selected the SHAP method because it provides a transparent and consistent explanation of feature importance. Compared with other methods, SHAP can also mitigate the impact of multicollinearity, supporting more robust interpretability. Figure 3A shows the SHAP values for continuous variables. Each point represents an individual patient, and point color encodes the feature value (red for higher values and blue for lower values, as described in the Figure 3 legend). This plot shows a positive association of both age and LS with the recurrence of CG. Figure 3B presents the impact of categorical variables on the prediction outcome. Higher values of these variables are generally associated with a greater risk of recurrence. In particular, a history or current presence of UI, UUTO, or LTIC, as well as irregular postoperative instillation of chemotherapy, is shown to increase the risk of CG recurrence. Figure 3C ranks the six risk factors by their mean absolute SHAP value, with the X-axis indicating the importance of each variable in the model. PRI contributes the most to the model, making it the most important feature. Figure 3D illustrates a typical example of SHAP decomposition, showing the specific contribution of each feature to the predicted risk of recurrence for an individual sample. Positive SHAP values indicate that the feature increases the risk of recurrence, while negative values indicate a reduction in risk. In this example, the patient’s age acted as a protective factor, reducing the recurrence risk, while the other variables increased the risk. The SHAP method thus clarifies the role of each feature in the model and enhances its interpretability.
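The game-theoretic quantity behind SHAP can be made concrete with an exact Shapley computation on a toy model. The sketch below enumerates every coalition of features, which is feasible only for a handful of features; SHAP implementations rely on approximations or model-specific algorithms instead. The value function, its coefficients, and the feature names `a`, `b`, and `c` are hypothetical and unrelated to the fitted RSF model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(value_fn: Callable[[FrozenSet[str]], float],
                   features: Sequence[str]) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution of each feature."""
    n = len(features)
    contributions: Dict[str, float] = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        contributions[feature] = total
    return contributions


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical risk score with a baseline, two main effects, and one interaction."""
    risk = 0.10
    if "a" in present:
        risk += 0.30
    if "b" in present:
        risk += 0.10
    if "a" in present and "c" in present:
        risk += 0.06  # interaction is shared equally between a and c
    return risk


toy_features = ["a", "b", "c"]
phi = shapley_values(toy_risk, toy_features)
explained = toy_risk(frozenset(toy_features)) - toy_risk(frozenset())
```

The contributions sum to the difference between the full prediction and the baseline (`explained`), which is the additivity property that allows a single patient's risk to be decomposed feature by feature as in Figure 3D.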

Figure 3 SHAP interpretation of the model. (A) Attributes of continuous variables in SHAP. Each row represents a feature, and the abscissa is the SHAP value. Red dots represent higher feature values and blue dots represent lower feature values. (B) Attributes of categorical variables in SHAP, with higher values indicating greater feature values. (C) Feature importance ranking as indicated by SHAP. The diagram describes the importance of each covariate in the development of the final prediction model. (D) Single-sample prediction decomposition. SHAP values reflect the impact of each feature on the risk score (the risk of recurrence). Each variable’s SHAP value represents its contribution to the predicted outcome. Positive SHAP values indicate that the feature increases the risk of recurrence, while negative values suggest a reduction in risk. The plot illustrates the contribution of all input variables to the risk prediction for this individual sample. The order of the variables reflects their importance in the model’s prediction, with the most significant variables at the top. LS, lesion size; LTIC, long-term indwelling catheter; PRI, postoperative regular instillation; SHAP, SHapley Additive exPlanations; UI, urinary infection; UUTO, upper urinary tract obstruction.

Online platform

The online platform was developed to provide a convenient and efficient tool for clinical and basic research, incorporating the RSF model to predict recurrence from six input features (27,28). Currently, the platform allows users to enter values for the six specified features into designated text boxes and generate a prediction of recurrence (Figure 4). The platform is accessible at http://127.0.0.1:5940.

Figure 4 The online platform prediction tool based on ML RSF model. ML, machine learning; RSF, random survival forest.

Discussion

The development of targeted therapies for CG has proven challenging due to its unclear etiology and pathogenesis, which are associated with the activation of T cells (29). ML has increasingly been applied in the field of urology (30,31). Compared to ML models, traditional nomograms are unable to effectively handle multicollinearity and are more prone to overfitting, which in turn reduces their predictive accuracy. More importantly, previous prediction models often fail to incorporate key indicators, such as whether regular postoperative instillation was performed, which significantly impact recurrence time and status (14). Unlike these traditional models, our study focuses not only on whether recurrence occurs but also on the specific time and status of recurrence, enabling us to more accurately predict the probability of recurrence at different time points. Based on clinical variables and pathological subtypes, we developed seven ML models, ultimately selecting the RSF as the most accurate for predicting CG recurrence due to its superior performance across all evaluation metrics (10,32). In the SHAP analysis, PRI emerged as the most significant feature influencing recurrence. PRI of chemotherapeutic drugs plays a crucial role in preventing recurrence. Studies indicate that patients who receive continuous instillation therapy experience significantly lower recurrence rates compared to those who do not maintain a regular instillation schedule (6). This finding highlights the importance of adhering to a consistent instillation regimen in managing CG. Regular instillation not only helps reduce the risk of recurrence but also enhances overall bladder health by effectively managing residual glandular tissue.

In our study, pathological recurrence was defined as the reappearance of new CG lesions after the complete resection of primary lesions. Recently, researchers have introduced the concept of symptomatic recurrence, defined as the reappearance of obvious symptoms during the follow-up period after initial symptoms had completely resolved (33). Symptomatic recurrence may be a primary cause of retreatment and reduced quality of life. We redefined CG recurrence by integrating both symptomatic and pathological recurrence, systematically analyzing risk factors, and assessing pathological recurrence in patients receiving conservative treatment. In our study, 10 patients classified under symptomatic recurrence were included in the non-recurrence group, likely contributing to an actual recurrence rate (53.4%) that exceeds the observed rate (49.3%).

PL is a rare, proliferative disease characterized by an overgrowth of normal fat in the pelvic retroperitoneal space (34). Some researchers have noted that PL patients may present a unique “latter-half-section obstruction” (LHSO) on urodynamics studies (UDS). The presence of LHSO or bladder outlet obstruction (BOO) is associated with morphological and dynamic changes in the urinary system. Patients with PL and LHSO or BOO on UDS are at increased risk for disease progression (35). Moreover, CG, cystitis cystica, or cystitis follicularis has been observed in 75% of patients with PL (36,37). While PL and CG share similar symptoms, their relationship remains unclear and warrants further investigation. Based on previous clinical studies, PL can be considered a risk factor for CG recurrence. In our study, only 4 patients (1.67%) were diagnosed with PL. Because including a variable with so few cases could have introduced significant bias into our prediction model, we excluded PL from the current analysis and did not assess its impact on CG recurrence. In future studies with larger sample sizes, we plan to include PL to further assess its potential role in CG recurrence.

There are several limitations to this study. First, it is retrospective in nature, and despite strict inclusion and exclusion criteria, selection bias may still exist. Second, the sample size is relatively small. Lastly, the model was developed using data from a single center. Future multi-center prospective studies are essential to improve the algorithm and validate the findings, particularly through the use of deep learning and artificial neural networks. Additionally, the combined analysis of multi-omics features (such as radiomics and pathomics) will have a more profound impact on the development of predictive models. Moreover, ML, as an emerging research method, holds great potential for application in a broader range of diseases, such as stones and tumors.

Conclusions

The RSF model effectively predicts CG recurrence, offering a framework for individualized treatment strategies. In SHAP, PRI contributed the most to model construction. While the predictive model developed in this study holds substantial value, future research should incorporate multicenter, multi-omics, and multivariate approaches with larger sample sizes to further validate these findings.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tau.amegroups.com/article/view/10.21037/tau-2024-665/rc

Data Sharing Statement: Available at https://tau.amegroups.com/article/view/10.21037/tau-2024-665/dss

Peer Review File: Available at https://tau.amegroups.com/article/view/10.21037/tau-2024-665/prf

Funding: This study was supported by the Jiangxi Provincial “Double Thousand Plan” Fund Project (No. jxsq2019201027).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tau.amegroups.com/article/view/10.21037/tau-2024-665/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). Ethical approval for this study was granted by the Ethics Committee of the First Affiliated Hospital of Nanchang University [approval ID: (2022) CDYFYYLK (11-031)]. Informed consent was obtained from all the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Guo A, Liu A, Teng X. The pathology of urinary bladder lesions with an inverted growth pattern. Chin J Cancer Res 2016;28:107-21.
Singh J, Farooq S, Joshi S, et al. Histopathologic findings in patients who have undergone blue light cystoscopy and bladder biopsy or transurethral resection: A contemporary clinicopathologic analysis of 100 cases. Pathol Res Pract 2022;234:153916.
Xin Z, Zhao C, Huang T, et al. Intestinal metaplasia of the bladder in 89 patients: a study with emphasis on long-term outcome. BMC Urol 2016;16:24.
Agrawal A, Kumar D, Jha AA, et al. Incidence of adenocarcinoma bladder in patients with cystitis cystica et glandularis: A retrospective study. Indian J Urol 2020;36:297-302.
Hu H, Tang Y, Zhou B, et al. Anti-cystitis glandularis action exerted by glycyrrhetinic acid: bioinformatics analysis and molecular validation. Mol Divers 2025; Epub ahead of print.
Abdel Magied MH, Badreldin AM, Leslie SW. Cystitis Cystica and Cystitis Glandularis. 2025.
Ma YH, Xu HH, Xu W, et al. Cystitis glandularis: MR imaging characteristics in 27 patients. Jpn J Radiol 2025;43:483-91.
Susmano DE, Dolin EH. Computed tomography in diagnosis of pelvic lipomatosis. Urology 1979;13:215-20.
Qu Y, Chen X, Cui Y, et al. Changes of bladder mucosal inflammatory factors and prognosis in cystitis glandularis. Int J Clin Exp Pathol 2018;11:3591-7.
Jeon J, Ha JS, Shin SJ, et al. Differences in clinical features between focal and extensive types of cystitis glandularis in patients without a previous history of urinary tract malignancy. Investig Clin Urol 2023;64:597-605.
Li S, Liu X, Weipeng L, et al. Nomogram to predict overall survival in patients with primary bladder neuroendocrine carcinoma: a population-based study. Future Oncol 2022;18:4171-81.
Zheng F, Li S, Wan X, et al. Development and external validation of a nomogram to predict the prognosis of patients with metastatic prostate cancer who underwent radiotherapy. Gland Surg 2024;13:2137-47.
Li Y, Ma J, Cheng W. Harnessing Machine Learning and Nomogram Models to Aid in Predicting Progression-Free Survival for Gastric Cancer Patients Post-Gastrectomy with Deficient Mismatch Repair(dMMR). BMC Cancer 2025;25:141.
Hu J, Li C, Guo X, et al. Development and validation of a predictive nomogram for the risk of recurrence in patients with cystitis glandularis. Ann Transl Med 2020;8:352.
Yan M, He D, Sun Y, et al. Comparative Analysis of Nomogram and Machine Learning Models for Predicting Axillary Lymph Node Metastasis in Early-Stage Breast Cancer: A Study on Clinically and Ultrasound-Negative Axillary Cases Across Two Centers. Ultrasound Med Biol 2025;51:463-74.
Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40-55.
He YJ, Liu PL, Wei T, et al. Artificial intelligence in kidney transplantation: a 30-year bibliometric analysis of research trends, innovations, and future directions. Ren Fail 2025;47:2458754.
Lundberg SM, Erion G, Chen H, et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell 2020;2:56-67.
Shortliffe EH, Sepúlveda MJ. Clinical Decision Support in the Era of Artificial Intelligence. JAMA 2018;320:2199-200.
Churpek MM, Yuen TC, Winslow C, et al. Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. Crit Care Med 2016;44:368-74.
Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565-74.
Obuchowski NA, Bullen JA. Receiver operating characteristic (ROC) curves: review of methods with applications in diagnostic medicine. Phys Med Biol 2018;63:07TR01.
Fenlon C, O'Grady L, Doherty ML, et al. A discussion of calibration techniques for evaluating binary and categorical predictive models. Prev Vet Med 2018;149:107-14.
Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med 2007;26:5512-28.
Gao T, Nong Z, Luo Y, et al. Machine learning-based prediction of in-hospital mortality for critically ill patients with sepsis-associated acute kidney injury. Ren Fail 2024;46:2316267.
Li X, Wang Z, Zhao W, et al. Machine learning algorithm for predict the in-hospital mortality in critically ill patients with congestive heart failure combined with chronic kidney disease. Ren Fail 2024;46:2315298.
Lin W, Yang H, Lin J, et al. OralExplorer: a web server for exploring the mechanisms of oral inflammatory diseases. J Transl Med 2024;22:282.
Noori A, Jayakumar R, Moturi V, et al. Alzheimer DataLENS: An Open Data Analytics Portal for Alzheimer's Disease Research. J Alzheimers Dis 2024;99:S397-407.
Zhou TL, Chen HX, Wang YZ, et al. Single-cell RNA sequencing reveals the immune microenvironment and signaling networks in cystitis glandularis. Front Immunol 2023;14:1083598.
Khene ZE, Bigot P, Doumerc N, et al. Application of Machine Learning Models to Predict Recurrence After Surgical Resection of Nonmetastatic Renal Cell Carcinoma. Eur Urol Oncol 2023;6:323-30.
Zou XC, Luo CW, Yuan RM, et al. Develop a radiomics-based machine learning model to predict the stone-free rate post-percutaneous nephrolithotomy. Urolithiasis 2024;52:64.
Zhang W, Yao YS, Lin ME, et al. Unexplained association between cystitis glandularis and interstitial cystitis in females: a retrospective study. Int Urogynecol J 2015;26:1835-41.
Teng X, Han K, Jin W, et al. Development and validation of an early diagnosis model for bone metastasis in non-small cell lung cancer based on serological characteristics of the bone metastasis mechanism. EClinicalMedicine 2024;72:102617.
Ata YM, Al-Jassim FA, Alabassi K, et al. Pelvic lipomatosis-a rare diagnosis and a challenging management: a case report and literature review. J Surg Case Rep 2024;2024:rjae777.
Chen Y, Yang Y, Yu W, et al. Urodynamic characteristics of pelvic lipomatosis with glandular cystitis patients correlate with morphologic alterations of the urinary system and disease severity. Neurourol Urodyn 2018;37:758-67.
Ono T, Tanaka H, Moriwake T, et al. Achondroplasia associated with pelvic lipomatosis. Lancet 1999;353:1017.
Sözen S, Gürocak S, Uzüm N, et al. The importance of re-evaluation in patients with cystitis glandularis associated with pelvic lipomatosis: a case report. Urol Oncol 2004;22:428-30.

Cite this article as: Yuan Y, Zheng F, Yao J, Zhou K, Yang J, Liu X, Wan H, Chen L, Hu J, Zhou L, Fu B. Personalized prediction for recurrence of cystitis glandularis: insights from SHAP and machine learning models. Transl Androl Urol 2025;14(3):808-819. doi: 10.21037/tau-2024-665