Development and validation of an interpretable machine learning model for predicting Gleason score upgrade in prostate cancer

Shu-Feng Li; Jin-Ge Zhao; Chen-Yi Jiang; Shi-Yuan Wang; Si-Yu Liu; Yi-Jun Zhang; Hao Zeng; Fu-Jun Zhao

doi:10.21037/tau-2025-178

Original Article

Shu-Feng Li^1,2#, Jin-Ge Zhao^3#, Chen-Yi Jiang^1, Shi-Yuan Wang^1,2, Si-Yu Liu^1,2, Yi-Jun Zhang^1,2, Hao Zeng^3, Fu-Jun Zhao^1

^1Department of Urology, Shanghai General Hospital, Shanghai, China; ^2Shanghai Jiao Tong University School of Medicine, Shanghai, China; ^3Department of Urology, West China Hospital of Medicine, Chengdu, China

Contributions: (I) Conception and design: SF Li, FJ Zhao, CY Jiang, H Zeng; (II) Administrative support: FJ Zhao; (III) Provision of study materials or patients: SF Li, JG Zhao, SY Liu; (IV) Collection and assembly of data: SF Li, SY Wang, SY Liu; (V) Data analysis and interpretation: SF Li, SY Wang, YJ Zhang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work.

Correspondence to: Fu-Jun Zhao, MD. Department of Urology, Shanghai General Hospital, No. 100 Haining Road, Hongkou District, Shanghai 200080, China. Email: zhaofujun72@163.com; Hao Zeng, MD. Department of Urology, West China Hospital of Medicine, No. 37 Guoxue Alley, Wuhou District, Chengdu 610041, China. Email: kucaizeng@163.com.

Background: The high incidence of Gleason score upgrade (GSU) can lead urologists to underestimate tumor aggressiveness, resulting in suboptimal treatment decisions. This study aimed to develop an interpretable machine learning model to predict the risk of GSU in individuals with prostate cancer (PCa) based on readily available clinical parameters.

Methods: A retrospective analysis was conducted on patients who underwent radical prostatectomy (RP) at Shanghai General Hospital and West China Hospital. Data from Shanghai General Hospital were split into a training set (80%) and a test set (20%), while data from West China Hospital were used for external validation. Preoperative clinical and pathological data were collected. Nine machine learning models, including random forest (RF) and light gradient boosting machine (LightGBM), were developed, and the model demonstrating the best predictive performance was selected as the final model. Model performance was evaluated using receiver operating characteristic (ROC) curves, calibration curves, and decision curves, and the final model was interpreted using SHapley Additive exPlanations (SHAP).

Results: The LightGBM model demonstrated strong predictive performance, achieving an area under the ROC curve of 84.53% in the test set and 76.61% in external validation. Significant factors associated with GSU included the International Society of Urological Pathology (ISUP) grade, age, clinical tumor stage (T stage), body mass index, prostate-specific antigen (PSA), free-to-total PSA ratio (f/t PSA), platelet-to-lymphocyte ratio (PLR), and bilateral tumor involvement. An online prediction tool was developed based on this model.

Conclusions: A machine learning model and an online prediction tool were developed to accurately predict GSU and identify factors associated with it. This approach may assist clinicians in identifying individuals at high risk for GSU and in making evidence-based treatment decisions.

Keywords: Gleason score (GS); light gradient boosting machine (LightGBM); machine learning; prostate cancer (PCa); SHapley Additive exPlanations (SHAP)

Submitted Mar 05, 2025. Accepted for publication Apr 25, 2025. Published online Jun 26, 2025.

Highlight box

Key findings

• The light gradient boosting machine model showed strong predictive performance for Gleason score upgrade (GSU) in prostate cancer (PCa), with an area under the receiver operating characteristic curve of 84.53% in the test set and 76.61% in external validation.

• Significant factors for GSU included International Society of Urological Pathology grade, age, clinical tumor stage, body mass index, prostate-specific antigen (PSA), free-to-total PSA ratio, platelet-to-lymphocyte ratio, and bilateral tumor involvement.

• An online prediction tool was developed based on the model.

What is known and what is new?

• It is known that GSU following radical prostatectomy is significantly associated with an elevated risk of biochemical recurrence, distant metastasis, and PCa-specific mortality. The new aspect of this study is the comparison of nine machine learning algorithms to develop a GSU prediction model and an online tool, together with the identification of key clinical factors associated with GSU.

What is the implication, and what should change now?

• The implication is that this model may assist clinicians in identifying high-risk individuals for GSU and facilitating evidence-based treatment decisions. Future studies should incorporate multimodal data and prospective validation in diverse clinical settings to further improve model robustness and generalizability.

Introduction

Prostate cancer (PCa) is among the most prevalent malignancies worldwide and remains the third leading cause of cancer-related mortality (1,2). The Gleason score (GS) serves as a critical prognostic indicator and is fundamental to the risk stratification of PCa (3). It plays a pivotal role in diagnosis, tumor grading, treatment selection, and prognosis assessment (4).

Radical prostatectomy (RP) is associated with various surgical complications. For individuals with low-risk PCa, alternative treatment modalities—including high-intensity focused ultrasound, irreversible electroporation, cryotherapy, or active surveillance—may provide comparable survival outcomes while preserving quality of life (5,6). As a result, an increasing number of individuals with PCa are opting for treatment strategies other than RP (7,8). Therefore, obtaining an accurate GS is essential, as it directly informs prognostic evaluation and personalized treatment planning.

A key limitation of determining GS through biopsy is that the sampled tissue represents only a small portion of the tumor (9). Studies indicate that 30–40% of individuals experience an upgrade in GS following RP compared to the score at initial biopsy—a phenomenon known as GS upgrade (GSU) (10,11). The omission of high-grade tumor components in biopsy specimens may lead to an underestimation of tumor aggressiveness, resulting in potential misclassification and inappropriate clinical management. Additionally, multiple studies have demonstrated that GSU following RP is significantly associated with an elevated risk of biochemical recurrence, distant metastasis, and PCa-specific mortality (12,13). Therefore, the development of an accurate predictive model for GSU is essential for tumor risk assessment and informed treatment decision-making.

In recent years, artificial intelligence and machine learning algorithms have been increasingly utilized in the medical field (14). These models integrate multiple clinical risk factors to enhance predictive accuracy, enabling individualized patient risk estimation and supporting clinical decision-making (15). However, the “black box” nature of machine learning models poses interpretability challenges. To address this, SHapley Additive exPlanations (SHAP) values have been introduced to interpret these models by quantifying the contribution of each feature to model predictions (16,17).

Therefore, this study aimed to develop clinical prediction models using nine different machine learning algorithms to facilitate clinical decision-making and evaluate the predictive value of various clinical factors associated with GSU. We present this article in accordance with the TRIPOD reporting checklist (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-178/rc).

Methods

Study population

Data were collected from individuals who underwent RP at Shanghai General Hospital between January 2020 and December 2023 and at West China Hospital between December 2017 and April 2020. The inclusion criterion required that both prostate biopsy and RP had been performed at the same institution. Exclusion criteria were as follows: (I) a history of radiotherapy or androgen deprivation therapy prior to RP; and (II) incomplete clinical data. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committees of Shanghai General Hospital (No. 2024KS504) and West China Hospital [No. 2021(1703)]. Due to the retrospective nature of the analysis, informed consent was not obtained from the patients; however, all personal information was anonymized to protect patient privacy.

Data collection

Preoperative clinical characteristics and biopsy records were obtained from electronic medical records. The collected preoperative clinical parameters included age, body mass index (BMI), prostate-specific antigen (PSA), prostate volume, PSA density (PSAD; serum PSA/prostate volume), aspartate aminotransferase/alanine aminotransferase ratio, complete blood count, clinical tumor stage (T stage), and the interval between biopsy and RP. Biopsy data included the number of biopsy cores, the proportion of positive biopsy cores to the total number of cores, the International Society of Urological Pathology (ISUP) grade at biopsy, and whether a targeted biopsy was performed.

Systematic biopsies were routinely performed in all patients. When multiparametric magnetic resonance imaging (mpMRI) revealed suspicious lesions, an additional 3 to 6 cores of targeted biopsy were conducted. All biopsy and surgical pathology specimens were graded according to the ISUP grading system. Pathology reports were jointly reviewed and finalized by senior pathologists at our institution. GSU was defined as an increase in Gleason Grade Group from biopsy to RP pathology, based on the ISUP grading system.

Inflammatory indices were calculated based on perioperative hematological parameters using the following formulas:

$\text{Neutrophil-to-lymphocyte ratio (NLR)} = \frac{\text{Neutrophil count}}{\text{Lymphocyte count}}$ [1]

$\text{Platelet-to-lymphocyte ratio (PLR)} = \frac{\text{Platelet count}}{\text{Lymphocyte count} \times 1000}$ [2]

$\text{Systemic immune-inflammation index (SII)} = \frac{\text{Neutrophil count} \times \text{Platelet count}}{\text{Lymphocyte count} \times 1000}$ [3]
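As an illustration of formulas [1] to [3], the indices can be computed as in the minimal Python sketch below. The units are an assumption that is not stated in the text: neutrophil and lymphocyte counts are taken in 10^3 cells/µL and the platelet count in cells/µL, which would explain the factor of 1000 in the denominators. The patient values are hypothetical and are not taken from the study data.

```python
def nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-to-lymphocyte ratio, formula [1]."""
    return neutrophils / lymphocytes


def plr(platelets: float, lymphocytes: float) -> float:
    """Platelet-to-lymphocyte ratio, formula [2]."""
    return platelets / (lymphocytes * 1000)


def sii(neutrophils: float, platelets: float, lymphocytes: float) -> float:
    """Systemic immune-inflammation index, formula [3]."""
    return neutrophils * platelets / (lymphocytes * 1000)


# Hypothetical patient (assumed units: neutrophils and lymphocytes in
# 10^3 cells/uL, platelets in cells/uL).
patient = {"neutrophils": 3.5, "lymphocytes": 1.75, "platelets": 210_000}

indices = {
    "NLR": nlr(patient["neutrophils"], patient["lymphocytes"]),
    "PLR": plr(patient["platelets"], patient["lymphocytes"]),
    "SII": sii(patient["neutrophils"], patient["platelets"], patient["lymphocytes"]),
}
print(indices)
```

Under these assumed units the resulting values fall in the same range as the medians reported in Table 1.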

Statistical analysis

For continuous variables following a normal distribution, data were presented as mean ± standard deviation (SD) and compared using independent t-tests. For continuous variables with a skewed distribution, data were reported as the median and interquartile range (IQR) and compared using the Mann-Whitney U test or Kruskal-Wallis H test, as appropriate. Categorical variables were presented as counts (n) and percentages (%) and analyzed using the Chi-squared (χ²) test or Fisher’s exact test, depending on sample size constraints. Additionally, a heatmap of the feature correlation matrix was generated to assess variable relationships.
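To make the categorical comparison concrete, the sketch below applies a chi-squared test to the "Targeted" row of Table 1 (69 of 174 patients without upgrading and 41 of 114 patients with upgrading had a targeted biopsy). It is written in Python with the standard library only, rather than in R, and it assumes a 2×2 test with Yates continuity correction; the text does not say whether a correction was applied. The function name is illustrative.

```python
import math


def chi2_2x2_yates(a: int, b: int, c: int, d: int):
    """Chi-squared test with continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates correction: shrink each absolute deviation by 0.5 (not below 0).
            deviation = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: no upgrading, upgrading. Columns: targeted biopsy yes, no.
statistic, p_value = chi2_2x2_yates(69, 174 - 69, 41, 114 - 41)
print(f"chi2 = {statistic:.3f}, P = {p_value:.2f}")
```

Under this assumption the P value rounds to the 0.61 reported for that row.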

Model training and validation

Variable selection was performed using the least absolute shrinkage and selection operator (LASSO) method (18). Predictive models were subsequently developed using multiple machine learning algorithms, including decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), elastic net (Enet), radial support vector machine (RSVM), multilayer perceptron (MLP), logistic regression, light gradient boosting machine (LightGBM), and K-nearest neighbors (KNN).
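For reference, LASSO selection for a binary outcome is commonly formulated as an L1-penalized logistic regression. The expression below is a sketch under that assumption; the text does not state the loss function, and the symbols ($n$ patients, $p$ candidate features, feature vector $x_i$, outcome $y_i \in \{0,1\}$, penalty weight $\lambda$) are introduced here for illustration only.

$(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i (\beta_0 + x_i^{\top}\beta) - \log\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right] + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$

Features whose coefficients $\hat{\beta}_j$ remain non-zero at the cross-validated value of $\lambda$ are retained.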

Data from Shanghai General Hospital were randomly partitioned into a training set (80%) for model development and a test set (20%) for internal validation. To avoid overfitting and underfitting, five-fold cross-validation with stratified sampling was applied during hyperparameter tuning. The configuration that achieved the highest mean area under the receiver operating characteristic (ROC) curve (AUROC) across folds was selected for each model. The models were then retrained using these optimized parameters to enhance stability and reliability. The retrained models were subsequently evaluated on the test dataset to assess their real-world performance. Multiple evaluation metrics, as outlined in the “Evaluation metrics” section, were employed to ensure robustness and reliability. The model with the highest predictive accuracy was selected as the final model. To further assess generalizability, the final model was externally validated using data from West China Hospital.
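The partitioning scheme described above can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' R code: the function names, the random seed, and the choice to stratify the 80/20 split by outcome are assumptions. Only the class counts (114 upgrading, 174 no upgrading) come from Table 1.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(indices, labels, k=5, seed=0):
    """Assign the given indices to k folds, dealing each class out in turn."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


# Outcome labels with the class counts from Table 1 (1 = upgrading).
labels = [1] * 114 + [0] * 174
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(train_idx, labels, k=5)

print(len(train_idx), len(test_idx), [len(fold) for fold in folds])
```

During tuning, each fold would serve once as the held-out part while the remaining four are used for fitting, and the mean AUROC across the five folds would be compared between hyperparameter configurations.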

Evaluation metrics

The diagnostic performance of the machine learning models was compared using multiple evaluation metrics. The optimal cutoff value was determined based on Youden’s index (sensitivity + specificity − 1). Model performance in the validation set was assessed using the following metrics: accuracy, sensitivity, specificity, Youden’s index, F1 score, and the AUROC. Additionally, calibration curves and decision curve analysis (DCA) were used to further evaluate the clinical utility of the models.
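The cutoff selection by Youden's index can be illustrated with the short Python sketch below. The labels and predicted probabilities are made-up toy values, not study data, and the function names are illustrative. Each observed probability is tried as a candidate threshold, and the one maximizing sensitivity + specificity − 1 is kept.

```python
def confusion_counts(y_true, probs, threshold):
    """Return (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_cutoff(y_true, probs):
    """Return the threshold with the highest Youden's index, and that index."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(probs)):
        tp, fp, tn, fn = confusion_counts(y_true, probs, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy example (hypothetical values).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]

threshold, j = youden_cutoff(y_true, probs)
print(threshold, round(j, 4))
```

In this toy example the selected threshold captures all four positive cases at the cost of two false positives.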

Model interpretation

To enhance the interpretability of the final model, SHAP values were utilized to quantify the contribution of each variable to the model’s predictions. This approach provided two levels of interpretations: global feature-level interpretation and local individual-level interpretation.
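The idea behind SHAP values can be illustrated with an exact Shapley computation on a toy model. The Python sketch below is not the SHAP package used in the study and not the tree-specific algorithm typically applied to LightGBM: it enumerates all feature subsets and replaces "absent" features with a single reference patient, which is a simplifying assumption. The model, its coefficients, and both patients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in the subset take the patient's values; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


def toy_model(features):
    """Hypothetical risk score from ISUP grade, PSA, and bilateral involvement."""
    isup, psa, bilateral = features
    return 0.6 - 0.1 * isup + 0.01 * psa + 0.05 * bilateral * (isup == 1)


patient = [1, 12.0, 1]    # ISUP 1, PSA 12 ng/mL, bilateral tumor
reference = [3, 9.0, 0]   # hypothetical reference patient

phi = shapley_values(toy_model, patient, reference)
print([round(v, 4) for v in phi])
```

The contributions add up to the difference between the patient's prediction and the reference prediction. This additivity is what allows single-sample plots to decompose one prediction into per-feature effects.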

All statistical analyses were conducted using R version 4.4.0 (R Development Core Team, Vienna, Austria). Computational analyses primarily utilized the CBCgrps and tidymodels packages in R.

Results

Baseline characteristics

A total of 288 individuals from Shanghai General Hospital and 260 individuals from West China Hospital were included in the study. Table 1 presents detailed demographic and clinical characteristics of the study population. Comparative analysis showed that, compared to the non-upgrade group, individuals in the upgrade group were more likely to have a lower ISUP grade at biopsy (predominantly grade 1), significantly fewer positive biopsy cores, and a lower ratio of positive cores to total biopsy cores. Other variables, including age, BMI, prostate volume, PSA levels, and various laboratory parameters, did not demonstrate statistically significant differences between the two groups (P>0.05). The study workflow is illustrated in Figure 1, while Table 2 summarizes the differences in ISUP grades between biopsy and surgical resection.

Table 1

Baseline demographic and clinical characteristics

Variables	Total (n=288)	No upgrading (n=174)	Upgrading (n=114)	P
Age (years)	70.94±6.81	70.67±6.5	71.36±7.25	0.41
BMI (kg/m²)	23.9 [22.46, 25.71]	24.01 [22.47, 25.9]	23.88 [22.46, 25.54]	>0.99
Targeted	110 [38]	69 [40]	41 [36]	0.61
Double side	246 [85]	144 [83]	102 [89]	0.15
T				0.15
2	223 [77]	141 [81]	82 [72]
3a	42 [15]	20 [11]	22 [19]
3b	23 [8]	13 [7]	10 [9]
ISUP				<0.001
1	97 [34]	29 [17]	68 [60]
2	69 [24]	40 [23]	29 [25]
3	53 [18]	47 [27]	6 [5]
4	54 [19]	43 [25]	11 [10]
5	15 [5]	15 [9]	0 [0]
Positive cores	4 [2, 6]	4 [2, 6]	3 [2, 5]	0.03
Biopsy cores				0.65
8	1 [0]	1 [1]	0 [0]
10	1 [0]	0 [0]	1 [1]
12	168 [58]	98 [56]	70 [61]
13	1 [0]	1 [1]	0 [0]
14	34 [12]	20 [11]	14 [12]
15	73 [25]	48 [28]	25 [22]
16	9 [3]	6 [3]	3 [3]
18	1 [0]	0 [0]	1 [1]
Positive/biopsy cores	0.27 [0.16, 0.43]	0.33 [0.17, 0.43]	0.25 [0.13, 0.42]	0.03
Prostate volume (cm³)	34.66 [26.66, 47.96]	34.43 [27.02, 46.88]	35.79 [25.88, 48.63]	0.81
PSAD (ng/mL/cm³)	0.26 [0.17, 0.41]	0.26 [0.16, 0.41]	0.28 [0.18, 0.42]	0.38
PSA (ng/mL)	9.27 [6.42, 13.64]	9.21 [6.34, 12.97]	9.34 [6.8, 14.65]	0.55
f/t PSA	12.2 [8.99, 15.85]	12.2 [8.72, 16.42]	12.18 [9.21, 15.45]	0.73
AST/ALT	1.08 [0.87, 1.24]	1.1 [0.9, 1.23]	1 [0.8, 1.25]	0.30
NR	58.59±5.81	58.39±5.65	58.88±6.06	0.48
LR	30.7 [26.78, 34.88]	31 [27.13, 35.4]	29.45 [26.52, 34.18]	0.28
MPV (fL)	9.9 [9.2, 10.6]	9.9 [9.3, 10.67]	9.8 [9.12, 10.5]	0.34
RDW (%)	12.7 [12.47, 13.2]	12.7 [12.5, 13.17]	12.8 [12.4, 13.2]	0.91
NLR	1.93 [1.54, 2.35]	1.9 [1.54, 2.31]	2.02 [1.57, 2.41]	0.31
PLR	114.14 [92.1, 131.31]	112.7 [91.34, 132.27]	115.46 [95.2, 130.2]	0.42
SII	375.01 [300.8, 483.3]	370.81 [284.06, 473.11]	375.78 [314.17, 496.72]	0.19

Data are presented as mean ± SD, median [IQR], or n [%]. AST/ALT, aspartate aminotransferase to alanine aminotransferase ratio; BMI, body mass index; f/t PSA, free-to-total PSA ratio; IQR, interquartile range; ISUP, International Society of Urological Pathology; LR, lymphocyte ratio; MPV, mean platelet volume; NLR, neutrophil-to-lymphocyte ratio; NR, neutrophil ratio; PLR, platelet-to-lymphocyte ratio; PSA, prostate-specific antigen; PSAD, prostate-specific antigen density; RDW, red cell distribution width; SD, standard deviation; SII, systemic immune-inflammation index; T, tumor stage.

Figure 1 The study flow chart. ADT, androgen deprivation therapy.

Table 2

The differences in ISUP grading between biopsy and post-surgical resection

ISUP of biopsy	ISUP of RP					Total
ISUP of biopsy	1	2	3	4	5	Total
1	29	60	7	0	1	97
2	1	39	22	4	3	69
3	0	22	25	3	3	53
4	0	8	29	6	11	54
5	0	2	6	0	7	15
Total	30	131	89	13	25	288

ISUP, International Society of Urological Pathology; RP, radical prostatectomy.

Feature selection

In the LASSO model, a vertical line was drawn at the value selected through 20-fold cross-validation, resulting in the identification of eight features with non-zero coefficients (Figure 2A,2B). Since the correlation coefficients among these features were all below 0.8 (Figure 3), the following variables were included in model training: ISUP grade, age, clinical T stage, BMI, PSA, free-to-total PSA ratio (f/t PSA), PLR, and bilateral tumor involvement.

Figure 2 Feature selection using LASSO regression. (A) Coefficient path plot of feature variables. (B) Deviation plot for lambda selection. LASSO, least absolute shrinkage and selection operator.

Figure 3 The correlation matrix. AST/ALT, aspartate aminotransferase to alanine aminotransferase ratio; BMI, body mass index; f/t PSA, free-to-total PSA ratio; ISUP, International Society of Urological Pathology; LR, lymphocyte ratio; MPV, mean platelet volume; NLR, neutrophil-to-lymphocyte ratio; NR, neutrophil ratio; PLR, platelet-to-lymphocyte ratio; PSA, prostate-specific antigen; PSAD, prostate-specific antigen density; RDW, red cell distribution width; SII, systemic immune-inflammation index; T, tumor stage.
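The collinearity check behind the correlation matrix can be sketched as follows. The Python example flags feature pairs whose absolute correlation reaches 0.8, the limit mentioned in the text. The use of Pearson correlation is an assumption, since the text does not name the coefficient, and the five-patient data set is made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, limit=0.8):
    """List feature pairs whose absolute correlation is at least the limit."""
    names = list(columns)
    flagged = []
    for position, first in enumerate(names):
        for second in names[position + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= limit:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical values for five patients.
columns = {
    "PSA": [6.0, 9.0, 12.0, 15.0, 20.0],
    "PSAD": [0.15, 0.25, 0.30, 0.42, 0.55],
    "age": [72, 65, 70, 68, 74],
}

flagged = highly_correlated_pairs(columns)
print(flagged)
```

In a workflow like the one described, a flagged pair would prompt dropping or combining one of the two features before model training.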

Model development and performance comparison

Following the identification of eight key variables, multiple machine learning models were implemented to predict GSU. The optimal hyper-parameter values for each model are summarized in Table S1, while the logistic regression coefficients, along with their statistical significance, are reported in Table S2. Among these models, LightGBM exhibited the best overall performance, with the highest accuracy, Youden’s index, and AUROC (Table 3). Using Youden’s index, the optimal cutoff probability for the LightGBM model was determined to be 39.42%. Compared to other models, LightGBM demonstrated superior predictive capability, as evidenced by its ROC curves (Figure 4A) and its higher AUROC with a lower SD (Figure 4B), indicating robust and consistent predictive accuracy. Additionally, the model showed excellent performance in calibration curves (Figure 4C) and DCA (Figure 4D).

Table 3

The performance of each model on the test set at the optimal threshold

Model	Accuracy	Sensitivity	Specificity	Youden’s index	F1 score	AUROC
Logistic	0.75862069	0.86956522	0.68571429	0.5552795	0.74074074	0.82111801
Enet	0.75862069	0.7826087	0.74285714	0.52546584	0.72	0.80745342
DT	0.68965517	0.7826087	0.62857143	0.41118012	0.66666667	0.70559006
RF	0.75862069	0.7826087	0.74285714	0.52546584	0.72	0.78571429
XGBoost	0.72413793	0.7826087	0.68571429	0.46832298	0.69230769	0.80869565
RSVM	0.72413793	0.68571429	0.7826087	0.46832298	0.75	0.75900621
MLP	0.75862069	0.7826087	0.74285714	0.52546584	0.72	0.80248447
LightGBM	0.77586207	0.82608696	0.74285714	0.5689441	0.74509804	0.84534162
KNN	0.62068966	0.45714286	0.86956522	0.32670808	0.59259259	0.69937888

AUROC, area under the receiver operating characteristic curve; DT, decision tree; Enet, elastic net; KNN, K-nearest neighbors; LightGBM, light gradient boosting machine; Logistic, logistic regression; MLP, multilayer perceptron; RF, random forest; RSVM, radial support vector machine; XGBoost, extreme gradient boosting.
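As a consistency check on Table 3, the LightGBM row can be reproduced from a single confusion matrix. The counts used below (19 true positives, 4 false negatives, 26 true negatives, 9 false positives, for 58 test patients) are not reported in the article; they are inferred here from the published sensitivity and specificity and should be read as an assumption. The Python sketch recomputes the threshold-dependent metrics from those counts.

```python
def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Threshold-dependent metrics computed from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Inferred counts for the LightGBM row (an assumption, see the text above).
lightgbm = metrics_from_counts(tp=19, fn=4, tn=26, fp=9)
for name, value in lightgbm.items():
    print(f"{name}: {value:.8f}")
```

With these counts, all five values agree with the LightGBM row of Table 3 to the reported precision.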

Figure 4 Performance evaluation of multiple models. (A) ROC curve evaluation for each model. (B) AUROC values and errors for different models. (C) Comparison of calibration curves across models. (D) Net benefit decision curves for different models. AUROC, area under the receiver operating characteristic curve; DT, decision tree; Enet, elastic net; KNN, K-nearest neighbors; LightGBM, light gradient boosting machine; Logistic, logistic regression; MLP, multilayer perceptron; RF, random forest; ROC, receiver operating characteristic; RSVM, radial support vector machine; XGBoost, extreme gradient boosting.

Model interpretation

The SHAP package was utilized to interpret the LightGBM model, quantifying the contribution of each feature to the model’s predictions across the dataset. This analysis illustrated both the positive and negative impacts of individual features on prediction outcomes (Figure 5). For each sample, SHAP analysis provided detailed insights into how specific variables influenced prediction results (Figure 6). The optimal cutoff for the LightGBM model was 39.42%; a predicted probability exceeding this value indicated an increased risk of GSU. Individuals exceeding this threshold may benefit from closer monitoring. Additionally, a user-friendly online tool was developed to facilitate the clinical application of this predictive model. This tool is accessible at https://lishufeng1124.shinyapps.io/lgbm_app/.

Figure 5 SHAP interprets the model. Summary plot of SHAP values for the LightGBM model (A) and univariate SHAP value plot (B). The summary plot shows the impact of each feature in predicting GSU. Blue dots indicate lower feature values, while red dots indicate higher feature values. Univariate SHAP value plot shows the relationship between specific feature values and their SHAP values. BMI, body mass index; double side, bilateral tumor involvement; f/t PSA, free-to-total PSA ratio; GSU, Gleason score upgrade; ISUP, International Society of Urological Pathology; LightGBM, light gradient boosting machine; PLR, platelet-to-lymphocyte ratio; PSA, prostate-specific antigen; SHAP, SHapley Additive exPlanations; T, tumor stage.

Figure 6 Single-sample SHAP value plots showing the SHAP values of each variable for the 3rd (A), 53rd (B), 103rd (C), and 153rd (D) samples in the test set. BMI, body mass index; double side, bilateral tumor involvement; f/t PSA, free-to-total PSA ratio; ISUP, International Society of Urological Pathology; PLR, platelet-to-lymphocyte ratio; PSA, prostate-specific antigen; SHAP, SHapley Additive exPlanations; T, tumor stage.

External validation of the model

To assess the generalizability of the predictive model, external validation was conducted using data from West China Hospital. The model, initially trained and internally validated using data from Shanghai General Hospital, was applied to this independent dataset. The LightGBM model demonstrated strong clinical discrimination, achieving an AUROC of 76.61% in the external validation set. Furthermore, the model exhibited good calibration, clinical applicability, and net benefit (Figure 7).

Figure 7 The diagnostic performance of LightGBM model on the external validation set. The figure shows ROC curves (A), calibration curves (B), and DCA curves (C). DCA, decision curve analysis; LightGBM, light gradient boosting machine; ROC, receiver operating characteristic.
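The net benefit plotted in decision curves follows the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The Python sketch below computes it for a model and for the "treat all" reference strategy. The labels and probabilities are toy values, not study data, and the function names are illustrative.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example (hypothetical values).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]

model_nb = net_benefit(y_true, probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(y_true, threshold=0.5)
print(round(model_nb, 4), round(treat_all_nb, 4))
```

A decision curve repeats this calculation over a range of threshold probabilities; a model is considered useful where its curve lies above both the "treat all" and the "treat none" (net benefit of zero) strategies.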

Discussion

In this study, we developed and validated nine machine learning-based models to predict the likelihood of GSU, and compared their predictive performance. Results from the test dataset indicated that the LightGBM model exhibited the highest accuracy in identifying individuals at risk for GSU. Compared to other models, LightGBM demonstrated superior performance across multiple evaluation metrics, including ROC curves, calibration curves, and DCA. In external validation, although the model’s performance was slightly lower than in the training and test sets, it maintained strong discriminatory ability, achieving an AUROC of 76.61%. Furthermore, the calibration curve and DCA curve remained within acceptable levels. Additionally, to enhance clinical applicability, an online prediction tool was developed, enabling users to efficiently generate individualized risk predictions and support evidence-based clinical decision-making. By accurately predicting GSU risk, our model may facilitate early intervention and personalized treatment planning, potentially improving long-term prognosis and reducing the risk of biochemical recurrence and distant metastasis. This predictive model addresses the limitations of traditional assessment methods, offering a more reliable foundation for clinical treatment decisions.

A previous study used MRI-guided targeted biopsy data to predict GSU in an active surveillance cohort, achieving high AUROCs of 0.952 and 0.947 using AdaBoost and RF, respectively (19). While that work emphasizes the value of MRI-derived features, our study adopted a broader modeling strategy by evaluating nine machine learning algorithms and additionally incorporating systemic inflammatory markers such as PLR. Despite the absence of MRI variables due to missing data, our LightGBM model achieved robust performance (AUROC = 0.8453 internally; 0.7661 externally). It was also externally validated and paired with SHAP-based interpretability and a web-based prediction tool, offering a practical and generalizable solution for clinical application.

Feature selection using LASSO regression identified eight key predictive variables: ISUP grade, age, clinical T stage, BMI, PSA, f/t PSA, PLR, and bilateral tumor involvement. These variables were significantly associated with GSU, and SHAP values were employed to quantify their contributions to individual predictions.

Consistent with previous studies, the findings of this research indicate that the ISUP grade is the primary factor influencing GSU. The probability of GSU is highest when the ISUP grade is 1 (11,20) and gradually decreases as the ISUP grade increases. In biopsy-based assessments, individuals classified as ISUP 1 are typically diagnosed with low-grade PCa. However, due to the limitations of biopsy sampling and tumor heterogeneity, the presence of high-grade cancer cells may be overlooked. In contrast, comprehensive pathological assessment following RP is more likely to detect these previously undetected high-grade tumor regions, leading to GSU. As ISUP grade increases, higher-grade tumors are more likely to have already been identified during initial biopsy, resulting in greater consistency between biopsy-based grading and postoperative pathology, thereby reducing the likelihood of GSU.

In recent years, growing evidence has indicated that tumor-associated inflammatory mechanisms play a key role in the initiation and progression of malignant tumors, influencing tumor behavior through both stimulatory and inhibitory pathways (21). Several studies have reported associations between NLR, PLR, and SII with GSU (22-25). In the present study, a high PLR was found to be significantly associated with GSU occurrence, whereas NLR and SII were not. The significant correlation between high PLR and GSU may be attributable to the role of platelets in tumor progression and metastasis. Notably, high PLR has been associated with poorer overall survival across multiple solid tumor types (26). Platelets have been shown to protect tumor cells from immune system attacks and facilitate tumor dissemination throughout the body (27). To further elucidate the role of inflammatory markers in predicting GSU, future larger-scale multicenter studies will be required, along with further investigations into the underlying inflammatory mechanisms involved in GSU pathogenesis.

This study also identified bilateral tumor involvement as a significant predictor of GSU, a finding that has not been widely reported in previous research. This association may be attributed to the fact that bilateral tumor involvement generally reflects a higher tumor burden and greater tumor heterogeneity. During biopsy sampling, the limited coverage may fail to fully capture the tumor’s degree of differentiation, leading to an underestimation of tumor aggressiveness. Other key predictors of GSU identified in this study included age, clinical T stage, BMI, PSA, and f/t PSA. The findings demonstrated that advanced age, higher BMI, higher clinical T stage, higher PSA, and lower f/t PSA were associated with a higher likelihood of GSU, which is consistent with previous studies. Aging has been linked to an increased likelihood of poorly differentiated and aggressive tumors in individuals with PCa. Higher BMI is often associated with chronic inflammation and metabolic disorders, both of which may contribute to tumor progression (28). Additionally, a higher clinical T stage indicates greater tumor extension, while elevated PSA levels suggest a greater tumor burden, both of which are associated with higher-grade tumors (29). A lower f/t PSA ratio is typically indicative of more aggressive cancers.

Ultrasound-guided biopsy remains the primary method for obtaining prostate tissue specimens, with systematic biopsy performed alone or in combination with targeted biopsy (30). Targeted biopsy has been shown to improve the diagnostic accuracy of PCa. Previous studies have reported that combining MRI-guided targeted biopsy with systematic biopsy can reduce the incidence of GSU compared to systematic biopsy alone (31,32). However, in this study, the inclusion of targeted biopsy did not significantly enhance the predictive performance of the machine learning model for GSU. Univariate correlation analysis indicated that a lower number of positive biopsy cores and a lower proportion of positive cores were associated with GSU. However, these variables were not retained in the LASSO regression, likely due to LASSO’s regularization mechanism, which prioritizes variables with a greater impact on overall model performance.

This study has several notable strengths: First, a total of nine machine learning algorithms were employed to construct predictive models, allowing for a comparative analysis to identify the most effective clinical risk prediction model for GSU. Second, the selected model was validated using external data, enabling an evaluation of its generalization ability across different clinical settings to ensure consistent predictive performance. Third, the best-performing model was developed into a web-based application, aiding in disease severity evaluation, further diagnostic examinations, and informed decision-making.

However, this study also has several limitations. First, the retrospective design and data collection from electronic medical records may introduce selection and information biases. Second, due to incomplete data availability, several potentially important clinical variables—such as digital rectal examination (DRE) results, MRI Prostate Imaging Reporting and Data System (PI-RADS) scores, and genomic biomarkers (33)—were not included in the analysis. Third, although stratified cross-validation and external validation were employed, the sample size from each center remained relatively modest. Larger, multicenter, and prospective datasets would help to further improve model robustness and generalizability. Fourth, the model did not incorporate image-based or radiomic features, which may provide additional predictive value. Finally, while SHAP values improve interpretability, further biological validation of model-identified predictors is necessary to elucidate their roles in GSU pathogenesis. Future studies incorporating multimodal data and prospective validation in diverse clinical settings are warranted.

Conclusions

This study successfully developed a machine learning-based predictive model and an online prediction tool for assessing the risk of GSU. In addition to model development, an in-depth analysis was conducted to identify key clinical factors influencing GSU. Through this comprehensive approach, this study provides a valuable tool for clinicians to identify high-risk individuals, facilitating personalized treatment strategies and evidence-based decision-making.

Acknowledgments

We are grateful to everyone who helped with this article.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-178/rc

Data Sharing Statement: Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-178/dss

Peer Review File: Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-178/prf

Funding: This study was supported by the National Natural Science Foundation of China (No. 82371634) and the Natural Science Foundation of Shanghai (No. 22ZR1450800).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-178/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committees of Shanghai General Hospital (No. 2024KS504) and West China Hospital [No. 2021(1703)]. Due to the retrospective nature of the analysis, informed consent was not obtained from the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Siegel RL, Miller KD, Wagle NS, et al. Cancer statistics, 2023. CA Cancer J Clin 2023;73:17-48.
2. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
3. Epstein JI, Egevad L, Amin MB, et al. The 2014 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma: Definition of Grading Patterns and Proposal for a New Grading System. Am J Surg Pathol 2016;40:244-52.
4. Mottet N, van den Bergh RCN, Briers E, et al. EAU-EANM-ESTRO-ESUR-SIOG Guidelines on Prostate Cancer-2020 Update. Part 1: Screening, Diagnosis, and Local Treatment with Curative Intent. Eur Urol 2021;79:243-62.
5. Donovan JL, Hamdy FC, Lane JA, et al. Patient-Reported Outcomes after Monitoring, Surgery, or Radiotherapy for Prostate Cancer. N Engl J Med 2016;375:1425-37.
6. Thomsen FB, Brasso K, Klotz LH, et al. Active surveillance for clinically localized prostate cancer--a systematic review. J Surg Oncol 2014;109:830-5.
7. Chen RC, Basak R, Meyer AM, et al. Association Between Choice of Radical Prostatectomy, External Beam Radiotherapy, Brachytherapy, or Active Surveillance and Patient-Reported Quality of Life Among Men With Localized Prostate Cancer. JAMA 2017;317:1141-50.
8. Litwin MS, Tan HJ. The Diagnosis and Treatment of Prostate Cancer: A Review. JAMA 2017;317:2532-42.
9. Ahdoot M, Wilbur AR, Reese SE, et al. MRI-Targeted, Systematic, and Combined Biopsy for Prostate Cancer Diagnosis. N Engl J Med 2020;382:917-28.
10. Zhou L, Xu LL, Zheng LL, et al. Predictors of Gleason Grading Group Upgrading in Low-Risk Prostate Cancer Patients From Transperineal Biopsy After Radical Prostatectomy. Acad Radiol 2024;31:2838-47.
11. Wang G, Wang X, Du H, et al. Prediction model of gleason score upgrading after radical prostatectomy based on a bayesian network. BMC Urol 2023;23:159.
12. Kovac E, Vertosick EA, Sjoberg DD, et al. Effects of pathological upstaging or upgrading on metastasis and cancer-specific mortality in men with clinical low-risk prostate cancer. BJU Int 2018;122:1003-9.
13. Bakavičius A, Drevinskaitė M, Daniūnaitė K, et al. The Impact of Prostate Cancer Upgrading and Upstaging on Biochemical Recurrence and Cancer-Specific Survival. Medicina (Kaunas) 2020;56:61.
14. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med 2019;25:24-9.
15. Chen JH, Asch SM. Machine Learning and Prediction in Medicine - Beyond the Peak of Inflated Expectations. N Engl J Med 2017;376:2507-9.
16. Zhang Y, Zhang X, Razbek J, et al. Opening the black box: interpretable machine learning for predictor finding of metabolic syndrome. BMC Endocr Disord 2022;22:214.
17. Ali S, Akhlaq F, Imran AS, et al. The enlightening role of explainable artificial intelligence in medical & healthcare domains: A systematic literature review. Comput Biol Med 2023;166:107555.
18. Mullah MAS, Hanley JA, Benedetti A. LASSO type penalized spline regression for binary data. BMC Med Res Methodol 2021;21:83.
19. ElKarami B, Deebajah M, Polk S, et al. Machine learning-based prediction of upgrading on magnetic resonance imaging targeted biopsy in patients eligible for active surveillance. Urol Oncol 2022;40:191.e15-20.
20. Yan H, Wu Y, Cui X, et al. From Cognitive MR-Targeted Fusion Prostate Biopsy to Radical Prostatectomy: Incidence and Predictors of Gleason Grade Group Upgrading in a Chinese Cohort. Biomed Res Int 2022;2022:7944342.
21. Sun Z, Ju Y, Han F, et al. Clinical implications of pretreatment inflammatory biomarkers as independent prognostic indicators in prostate cancer. J Clin Lab Anal 2018;32:e22277.
22. Baylan B, Ulusoy K, Ekenci B, et al. Can systemic immune-inflammation index and hematologic parameters aid in decision-making for active surveillance or curative treatment in low-risk prostate cancer? Asian J Surg 2024;47:1360-5.
23. Tomioka M, Saigo C, Kawashima K, et al. Clinical Predictors of Grade Group Upgrading for Radical Prostatectomy Specimens Compared to Those of Preoperative Needle Biopsy Specimens. Diagnostics (Basel) 2022;12:2760.
24. Wang S, Ji Y, Ma J, et al. Role of inflammatory factors in prediction of Gleason score and its upgrading in localized prostate cancer patients after radical prostatectomy. Front Oncol 2022;12:1079622.
25. Wang Y, Chen X, Liu K, et al. Predictive Factors for Gleason Score Upgrading in Patients with Prostate Cancer after Radical Prostatectomy: A Systematic Review and Meta-Analysis. Urol Int 2023;107:460-79.
26. Templeton AJ, Ace O, McNamara MG, et al. Prognostic role of platelet to lymphocyte ratio in solid tumors: a systematic review and meta-analysis. Cancer Epidemiol Biomarkers Prev 2014;23:1204-12.
27. Lucotti S, Muschel RJ. Platelets and Metastasis: New Implications of an Old Interplay. Front Oncol 2020;10:1350.
28. Vidal AC, Howard LE, Sun SX, et al. Obesity and prostate cancer-specific mortality after radical prostatectomy: results from the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Prostate Cancer Prostatic Dis 2017;20:72-8.
29. D’Amico AV, Chen MH, Roehl KA, et al. Preoperative PSA velocity and the risk of death from prostate cancer after radical prostatectomy. N Engl J Med 2004;351:125-35.
30. Eklund M, Jäderling F, Discacciati A, et al. MRI-Targeted or Standard Biopsy in Prostate Cancer Screening. N Engl J Med 2021;385:908-20.
31. Weinstein IC, Wu X, Hill A, et al. Impact of Magnetic Resonance Imaging Targeting on Pathologic Upgrading and Downgrading at Prostatectomy: A Systematic Review and Meta-analysis. Eur Urol Oncol 2023;6:355-65.
32. Zheng T, Bi K, Tang Y, et al. Cognitive fusion-targeted biopsy versus transrectal ultrasonography-guided systematic biopsy: comparison and analysis of the risk of Gleason score upgrading. Int Urol Nephrol 2024;56:981-8.
33. Hamzeh O, Alkhateeb A, Zheng JZ, et al. A Hierarchical Machine Learning Model to Discover Gleason Grade-Specific Biomarkers in Prostate Cancer. Diagnostics (Basel) 2019;9:219.

Cite this article as: Li SF, Zhao JG, Jiang CY, Wang SY, Liu SY, Zhang YJ, Zeng H, Zhao FJ. Development and validation of an interpretable machine learning model for predicting Gleason score upgrade in prostate cancer. Transl Androl Urol 2025;14(6):1631-1644. doi: 10.21037/tau-2025-178
