Machine learning studies for predicting 5-year renal cell cancer survival: a multicenter study
Highlight box
Key findings
• The gradient boosting machine model achieved an area under the receiver operating characteristic curve of 0.841 (95% confidence interval: 0.833–0.850), the highest among the seven machine learning algorithms compared for predicting 5-year survival in renal cell carcinoma (RCC) patients.
• Shapley Additive Explanations (SHAP) analysis provided independent explanations, reaffirming the critical clinical factors associated with 5-year survival in RCC patients.
• A useful prognostic model was constructed to predict the 5-year survival risk of RCC patients.
What is known and what is new?
• RCC is characterized by high heterogeneity. Which patients need close follow-up and can benefit from adjuvant therapy after nephrectomy is still controversial.
• Tools for quantifying the individual prognosis of RCC patients are lacking. Our study aims to construct models that predict the 5-year survival risk of RCC patients, providing valuable insights for guiding clinical management and individual prognosis.
What is the implication, and what should change now?
• This research aims to analyze the factors influencing 5-year survival of RCC patients based on demographic information and tumor characteristics. Future research should integrate some useful indicators, such as genomic data and tumor markers, to enhance the predictive ability of our models.
Introduction
Renal cell carcinoma (RCC) is the most common type of kidney cancer, with an estimated 434,840 new kidney cancer cases worldwide in 2022 (1,2). Between 37% and 61% of RCC patients are diagnosed incidentally via an abdominal examination such as ultrasound or computed tomography, and approximately 70% of patients present with stage I RCC at diagnosis (3,4). More than 20 subtypes of malignant renal cell tumors are now recognized by the World Health Organization classification system. The three dominant histological subtypes are clear cell (75–80%), papillary (10–15%), and chromophobe (5%) (5,6).
Treatment options for RCC confined to the kidney include surgical resection with partial or radical nephrectomy (7). For advanced or metastatic RCC, combinations of immune checkpoint inhibitors, or of immune checkpoint inhibitors with tyrosine kinase inhibitors, are associated with tumor response rates of 42% to 71% (8,9). Systemic therapy is also used as adjuvant therapy after nephrectomy for localized RCC, but whether patients benefit from it is still controversial. Identifying high-risk patients is therefore important, because these patients may benefit from adjuvant therapy after nephrectomy (10,11).
Machine learning techniques play an important role in identifying prognostic factors in cancers, thereby providing physicians with valuable information for decision-making (12). Machine learning has been widely adopted for survival risk prediction in patients with cancers, with its analytical and visual capabilities serving as potential tools for clinical forecasting (13,14). Compared with traditional statistical methods, machine learning can capture more complex interactions among variables. Several machine learning-based models have been developed to predict recurrence or survival in RCC, incorporating multimodal data including clinical parameters, imaging, and even histopathological features (15-17). For example, explainable machine learning models have been proposed to assess distant metastasis risk (15), and subtype-specific tools such as chromophobe RCC nomograms have recently been introduced (18). However, to our knowledge, few prior studies have specifically investigated machine learning-based models for predicting 5-year survival in RCC patients overall.
This research aims to analyze the factors influencing 5-year survival of RCC patients based on demographic and tumor-related characteristics. Additionally, this study aims to develop prediction models for 5-year survival risk in RCC patients, offering valuable insights to optimize clinical decision-making and personalized prognosis for RCC patients. We present this article in accordance with the TRIPOD reporting checklist (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-438/rc).
Methods
Data source and cohort construction
The RCC patient data used in this study were extracted from the Surveillance, Epidemiology, and End Results (SEER) Program database (http://seer.cancer.gov/), as updated by the National Cancer Institute in November 2023. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The SEER database is one of the most authoritative large-scale cancer registries in the United States. We extracted clinical case data for RCC patients diagnosed between 2010 and 2021 from the SEER database using SEER*Stat software (v 8.4.3).
The inclusion criteria were: (I) patients diagnosed with primary RCC (SEER primary site code C649, ICD-O-3 histology codes in 8000–8110, 8140–8576, 8980, 8981, or 8940–8950); (II) diagnosed between 2010 and 2021; (III) only one malignant tumor. The exclusion criteria were: (I) unknown surgery status; (II) tumor diagnosis reported from nursing homes, hospices, autopsies, or death certificates; (III) survival time less than one month; (IV) carcinoma in situ; (V) missing information on tumor-node-metastasis (TNM) stage, marital status, race, grade, tumor size, age at diagnosis, total number of lymph nodes examined, or number of positive lymph nodes. Ultimately, the selected patients were divided into training and validation sets at a ratio of 7:3 using stratified random sampling based on 5-year survival status. The detailed data screening process is shown in Figure 1.
The candidate predictors included in the study were age at diagnosis, tumor size, race, marital status, sex, histology, grade, tumor invasion depth (T), lymph node involvement (N), distant metastases (M), surgery, radiotherapy, and chemotherapy.
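The stratified 7:3 split described above can be illustrated with a minimal Python sketch. This is not the authors' code (the study reports using R); the function name `stratified_split`, the seed, and the toy labels are assumptions made purely for illustration. The point is that sampling 70% within each outcome class keeps the 5-year death rate nearly identical in both sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, valid_idx), sampling train_frac within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # random assignment within the stratum
        cut = round(len(indices) * train_frac)
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical toy labels: 1 = died within 5 years, 0 = alive at 5 years.
labels = [1] * 260 + [0] * 740
train_idx, valid_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), round(train_rate, 3), round(valid_rate, 3))
```

Because the split is done per class, the two printed event rates agree, which mirrors the matching 5-year survival proportions one would expect between the training and validation sets.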
Feature selection
Least absolute shrinkage and selection operator (LASSO) regression was used to identify clinicopathological factors linked to 5-year mortality in the training set. LASSO is a linear regression technique incorporating L1 regularisation, which enables simultaneous feature selection and parameter estimation. The optimal regularisation parameter (λ) was determined via cross-validation in order to balance model complexity and performance. By minimising the loss function augmented with an L1 penalty term, LASSO progressively shrinks the coefficients of less informative features to zero, thereby identifying the most predictive variables. Following model training, variables with non-zero coefficients were retained as key predictors significantly associated with the outcome of interest.
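To make the "shrinks coefficients to zero" behavior concrete, the following is a minimal Python sketch of soft-thresholding, the operation associated with the L1 penalty. In the idealized case of a linear model with uncorrelated, standardized predictors, the LASSO estimate is the soft-thresholded least-squares coefficient. The coefficient values and feature names below are hypothetical, and this is an illustration of the principle rather than the procedure used in the study.

```python
def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty: move z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # coefficients with |z| <= lam are set exactly to zero


# Hypothetical unpenalized coefficients for three standardized predictors.
raw = {"feature_a": 0.80, "feature_b": -0.30, "feature_c": 0.05}

for lam in (0.0, 0.1, 0.5):
    shrunk = {name: round(soft_threshold(z, lam), 2) for name, z in raw.items()}
    kept = [name for name, value in shrunk.items() if value != 0.0]
    print(lam, shrunk, kept)
```

As λ grows, weaker coefficients reach zero first and drop out of the `kept` list, which is why variables with non-zero coefficients at the cross-validated λ are treated as selected.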
In parallel, feature selection on the training set was performed using the Boruta algorithm, which identifies features most relevant to the risk of 5-year mortality in RCC by comparing their importance to that of randomized “shadow” features. Boruta is a wrapper method based on random forest (RF) importance scores, designed to capture all variables associated with the target outcome while minimizing the omission of useful features. A feature is considered truly important only if its importance significantly exceeds that of the best-performing shadow feature.
Subsequently, the intersection of features selected by the two methods was used as inputs for machine learning algorithms to develop a model predicting 5-year mortality risk.
Prediction model development and evaluation
In this study, seven machine learning classification algorithms were applied to construct predictive models for 5-year survival in patients with RCC: LightGBM, logistic regression (LR), RF, XGBoost, decision tree (DT), neural network (NN), and gradient boosting machine (GBM). Hyperparameter optimization was conducted using 10-fold cross-validation to determine the most suitable model configuration. The training set was used for model development and selection, while the validation set was reserved for internal validation.
The performance of the model was evaluated using a range of evaluation metrics, including accuracy, positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, F1-score, and the area under the receiver operating characteristic (ROC) curve (AUC). Discriminative ability was evaluated through the implementation of ROC curve analysis, and model calibration was evaluated using calibration plots to assess the concordance between predicted probabilities and observed outcomes. A calibration curve that closely aligns with the ideal 45° reference line is indicative of superior predictive reliability. In order to assess the clinical utility of the models, decision curve analysis (DCA) was performed, which estimates the net benefit of model-guided decision-making across a range of threshold probabilities.
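The metrics named above can be sketched in a few lines of Python. This is a minimal illustration on hypothetical toy data, assuming death within 5 years is coded as the positive class (1); it is not the evaluation code used in the study. AUC is computed as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, and net benefit follows the usual decision-curve definition, TP/n − FP/n × pt/(1 − pt).

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN; a case is called positive if probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        positive = p >= threshold
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics; assumes every denominator is non-zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def auc(y_true, y_prob):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at threshold probability pt."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: 1 = died within 5 years, 0 = alive at 5 years.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7, 0.5, 0.05]

metrics = classification_metrics(y_true, y_prob)
print(metrics)
print(auc(y_true, y_prob), net_benefit(y_true, y_prob, 0.5))
```

Note that AUC does not depend on a classification threshold, whereas accuracy, PPV, NPV, sensitivity, specificity, and F1 all do, so the threshold-based metrics should be read together with the cut-off used.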
Following the identification of the optimal model, Shapley Additive Explanations (SHAP) analysis was employed to interpret feature importance. The SHAP framework provides both global and local interpretability by assigning consistent and theoretically grounded attribution values to each feature. This elucidates the contribution of individual variables to the predicted 5-year survival outcomes in RCC. Finally, a web-based application was developed that uses the final model to predict 5-year survival for RCC patients.
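The attribution idea behind SHAP can be sketched by computing exact Shapley values for a tiny model. The Python example below enumerates every feature coalition, which is only feasible for a handful of features; practical tools approximate this. The scoring function `toy_risk_score`, its coefficients, and the patient values are hypothetical and are not the fitted GBM; the baseline age and tumor size are loosely set to the cohort medians, and a single baseline profile is used here instead of a background data set.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in the coalition take the patient's values; others stay at baseline.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


def toy_risk_score(x):
    # Hypothetical log-odds-style score with one interaction term (illustration only).
    return (0.03 * x["age"] + 0.01 * x["tumor_size_mm"] + 1.5 * x["m1"]
            + 0.005 * x["tumor_size_mm"] * x["m1"])


patient = {"age": 70, "tumor_size_mm": 80, "m1": 1}
baseline = {"age": 61, "tumor_size_mm": 45, "m1": 0}

phi = shapley_values(toy_risk_score, patient, baseline)
print(phi)
print(sum(phi.values()), toy_risk_score(patient) - toy_risk_score(baseline))
```

The two numbers on the last line agree: the attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP values interpretable as per-feature contributions.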
Statistical analysis
Baseline characteristics were summarized using mean ± standard deviation for normally distributed continuous variables and median (interquartile range) for non-normally distributed variables. Normality was assessed using the Kolmogorov-Smirnov test. For between-group comparisons, the independent samples t-test was used for normally distributed variables and the Mann-Whitney U test for variables that did not follow a normal distribution. Categorical variables were summarized as frequencies and percentages, and group differences were evaluated using the Chi-squared test. A two-sided P value of less than 0.05 was considered statistically significant. All statistical analyses were conducted using R software, version 4.2.3. The specific packages and their functions are as follows: the caret and lightgbm packages were used for model development, the riskRegression package was used for plotting ROC and calibration curves, the dcurves package was used for drawing DCA curves, and the kernelshap and shapviz packages were used for SHAP analysis.
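As an illustration of the Chi-squared comparison of categorical variables, the following Python sketch computes a Pearson Chi-squared statistic for a 2×2 table using the sex-by-data-set counts reported in Table 1. It assumes no continuity correction, which may differ from the settings used in the original R analysis, and the function name `chi_squared_2x2` is introduced here for illustration only.

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson Chi-squared statistic and P value (1 df, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the Chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: female, male. Columns: training set, validation set (counts from Table 1).
stat, p_value = chi_squared_2x2(9944, 4229, 17211, 7408)
print(round(stat, 3), round(p_value, 2))
```

The resulting P value can be compared with the value of 0.60 reported for sex in Table 1; a large P value here indicates that the training and validation sets are balanced on that variable.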
Results
Baseline characteristics of RCC patients
We retrieved 38,792 eligible individuals diagnosed with RCC between 2010 and 2021, with 27,155 assigned to the training set and 11,637 to the validation set. Table 1 shows the clinicopathologic characteristics of the study cohort, including patients’ demographic information, tumor characteristics, and clinical treatment in the training and validation sets. The median age of RCC patients was 61 years in both the training and validation sets. Within the study cohort, most patients underwent surgery and had no lymph node involvement or distant metastasis.
Table 1
| Variables | Training set (N=27,155, 70%) | Validation set (N=11,637, 30%) | P value |
|---|---|---|---|
| Age (years) | 61.00 [52.00, 69.00] | 61.00 [52.00, 69.00] | 0.74 |
| Tumor size (mm) | 45.00 [30.00, 74.00] | 45.00 [30.00, 74.00] | 0.75 |
| Race | | | 0.17 |
| White | 22,399 (82.49) | 9,525 (81.85) | |
| Black | 2,804 (10.33) | 1,215 (10.44) | |
| Others | 1,952 (7.19) | 897 (7.71) | |
| Marital status | | | 0.32 |
| Married | 17,293 (63.68) | 7,473 (64.22) | |
| Others | 9,862 (36.32) | 4,164 (35.78) | |
| Sex | | | 0.60 |
| Female | 9,944 (36.62) | 4,229 (36.34) | |
| Male | 17,211 (63.38) | 7,408 (63.66) | |
| Histology | | | 0.32 |
| CCRCC | 18,348 (67.57) | 7,918 (68.04) | |
| PRCC | 2,931 (10.79) | 1,282 (11.02) | |
| RCC-NOS | 3,234 (11.91) | 1,311 (11.27) | |
| Others | 2,642 (9.73) | 1,126 (9.68) | |
| Grade | | | 0.04 |
| Grade I | 2,968 (10.93) | 1,214 (10.43) | |
| Grade II | 13,417 (49.41) | 5,808 (49.91) | |
| Grade III | 7,980 (29.39) | 3,506 (30.13) | |
| Grade IV | 2,790 (10.27) | 1,109 (9.53) | |
| T | | | 0.20 |
| T1 | 17,293 (63.68) | 7,434 (63.88) | |
| T2 | 2,948 (10.86) | 1,329 (11.42) | |
| T3 | 6,290 (23.16) | 2,625 (22.56) | |
| T4 | 624 (2.30) | 249 (2.14) | |
| N | | | 0.61 |
| N0 | 25,706 (94.66) | 11,031 (94.79) | |
| N1 | 1,449 (5.34) | 606 (5.21) | |
| M | | | 0.86 |
| M0 | 24,396 (89.84) | 10,448 (89.78) | |
| M1 | 2,759 (10.16) | 1,189 (10.22) | |
| TNM | | | 0.33 |
| I | 16,866 (62.11) | 7,240 (62.22) | |
| II | 2,434 (8.96) | 1,095 (9.41) | |
| III | 4,877 (17.96) | 2,021 (17.37) | |
| IV | 2,978 (10.97) | 1,281 (11.01) | |
| Surgery | | | 0.23 |
| Yes | 26,171 (96.38) | 11,186 (96.12) | |
| No | 984 (3.62) | 451 (3.88) | |
| Radiation | | | 0.83 |
| Yes | 882 (3.25) | 383 (3.29) | |
| No | 26,273 (96.75) | 11,254 (96.71) | |
| Chemotherapy | | | 0.58 |
| Yes | 2,021 (7.44) | 885 (7.61) | |
| No/unknown | 25,134 (92.56) | 10,752 (92.39) | |
| 5-year survival | | | 0.99 |
| Alive | 20,057 (73.86) | 8,595 (73.86) | |
| Dead | 7,098 (26.14) | 3,042 (26.14) | |
Data are presented as median [interquartile range] or n (%). CCRCC, clear cell renal cell carcinoma; M, distant metastases; N, lymph node involvement; PRCC, papillary renal cell carcinoma; RCC-NOS, renal cell carcinoma-not otherwise specified; T, tumor invasion depth; TNM, tumor-node-metastasis.
Table 2 shows the demographic and clinicopathologic variables by 5-year survival status. There were 28,652 patients alive and 10,140 patients dead at 5 years. Patients who were alive at 5 years had presented with smaller tumors, lower tumor grades, and lower TNM stages at the time of diagnosis.
Table 2
| Variables | Alive (N=28,652, 74%) | Dead (N=10,140, 26%) | P value |
|---|---|---|---|
| Age (years) | 59.00 [50.00, 67.00] | 65.00 [57.00, 73.00] | <0.001 |
| Tumor size (mm) | 40.00 [26.00, 60.00] | 70.00 [45.00, 100.00] | <0.001 |
| Race | | | 0.20 |
| White | 23,632 (82.48) | 8,292 (81.78) | |
| Black | 2,923 (10.20) | 1,096 (10.81) | |
| Others | 2,097 (7.32) | 752 (7.42) | |
| Marital status | | | <0.001 |
| Married | 18,854 (65.80) | 5,912 (58.30) | |
| Others | 9,798 (34.20) | 4,228 (41.70) | |
| Sex | | | <0.001 |
| Female | 10,920 (38.11) | 3,253 (32.08) | |
| Male | 17,732 (61.89) | 6,887 (67.92) | |
| Histology | | | <0.001 |
| CCRCC | 19,752 (68.94) | 6,514 (64.24) | |
| PRCC | 3,227 (11.26) | 986 (9.72) | |
| RCC-NOS | 3,143 (10.97) | 1,402 (13.83) | |
| Others | 2,530 (8.83) | 1,238 (12.21) | |
| Grade | | | <0.001 |
| Grade I | 3,489 (12.18) | 693 (6.83) | |
| Grade II | 16,074 (56.10) | 3,151 (31.07) | |
| Grade III | 7,809 (27.25) | 3,677 (36.26) | |
| Grade IV | 1,280 (4.47) | 2,619 (25.83) | |
| T | | | <0.001 |
| T1 | 21,006 (73.31) | 3,721 (36.70) | |
| T2 | 2,944 (10.28) | 1,333 (13.15) | |
| T3 | 4,575 (15.97) | 4,340 (42.80) | |
| T4 | 127 (0.44) | 746 (7.36) | |
| N | | | <0.001 |
| N0 | 28,354 (98.96) | 8,383 (82.67) | |
| N1 | 298 (1.04) | 1,757 (17.33) | |
| M | | | <0.001 |
| M0 | 28,018 (97.79) | 6,826 (67.32) | |
| M1 | 634 (2.21) | 3,314 (32.68) | |
| TNM | | | <0.001 |
| I | 20,850 (72.77) | 3,256 (32.11) | |
| II | 2,791 (9.74) | 738 (7.28) | |
| III | 4,302 (15.01) | 2,596 (25.60) | |
| IV | 709 (2.47) | 3,550 (35.01) | |
| Surgery | | | <0.001 |
| Yes | 28,391 (99.09) | 8,966 (88.42) | |
| No | 261 (0.91) | 1,174 (11.58) | |
| Radiation | | | <0.001 |
| Yes | 151 (0.53) | 1,114 (10.99) | |
| No | 28,501 (99.47) | 9,026 (89.01) | |
| Chemotherapy | | | <0.001 |
| Yes | 577 (2.01) | 2,329 (22.97) | |
| No/unknown | 28,075 (97.99) | 7,811 (77.03) | |
Data are presented as median [interquartile range] or n (%). CCRCC, clear cell renal cell carcinoma; M, distant metastases; N, lymph node involvement; PRCC, papillary renal cell carcinoma; RCC-NOS, renal cell carcinoma-not otherwise specified; T, tumor invasion depth; TNM, tumor-node-metastasis.
Feature selection
The results of feature selection using LASSO regression are shown in Figure 2A,2B. Eleven variables were identified as being associated with 5-year survival in RCC. The results of the Boruta algorithm are shown in Figure 2C; it identified 12 features associated with the 5-year survival risk of RCC. Finally, the following 11 variables, selected by both methods, were included in the study: age at diagnosis, tumor size, marital status, histology, tumor grade, tumor invasion depth (T stage), lymph node involvement (N stage), distant metastasis (M stage), surgery, radiotherapy, and chemotherapy.
Model performance
The 5-year survival risk prediction models for RCC were built on the 11 variables retained by both LASSO regression and the Boruta algorithm. Seven machine learning methods were applied to construct the prediction models. To reduce the risk of overfitting and to identify the optimal model, 10-fold cross-validation was conducted on the training set. Model performance is presented in Table 3. The ROC curves, calibration plots, and DCA curves are presented in Figures 3-5.
Table 3
| Datasets | Models | AUC | Accuracy | PPV | NPV | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|---|---|---|
| Training set | LGB | 0.899 | 0.828 | 0.641 | 0.913 | 0.773 | 0.847 | 0.701 |
| | LR | 0.841 | 0.781 | 0.564 | 0.889 | 0.716 | 0.804 | 0.631 |
| | RF | 0.847 | 0.772 | 0.546 | 0.897 | 0.746 | 0.781 | 0.631 |
| | XGBoost | 0.829 | 0.785 | 0.574 | 0.883 | 0.695 | 0.817 | 0.628 |
| | DT | 0.738 | 0.819 | 0.782 | 0.825 | 0.425 | 0.958 | 0.551 |
| | NN | 0.845 | 0.795 | 0.592 | 0.884 | 0.693 | 0.831 | 0.639 |
| | GBM | 0.844 | 0.774 | 0.551 | 0.896 | 0.742 | 0.786 | 0.632 |
| Validation set | LGB | 0.825 | 0.784 | 0.572 | 0.880 | 0.684 | 0.819 | 0.623 |
| | LR | 0.839 | 0.783 | 0.568 | 0.888 | 0.712 | 0.808 | 0.632 |
| | RF | 0.833 | 0.762 | 0.533 | 0.887 | 0.722 | 0.776 | 0.613 |
| | XGBoost | 0.823 | 0.786 | 0.576 | 0.881 | 0.686 | 0.822 | 0.626 |
| | DT | 0.738 | 0.821 | 0.799 | 0.824 | 0.419 | 0.963 | 0.550 |
| | NN | 0.839 | 0.797 | 0.597 | 0.883 | 0.686 | 0.836 | 0.638 |
| | GBM | 0.841 | 0.775 | 0.552 | 0.894 | 0.736 | 0.788 | 0.631 |
DT, decision tree; GBM, gradient boosting machine; LGB, light gradient boosting machine; LR, logistic regression; NN, neural network; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; XGBoost, extreme gradient boosting.
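The threshold-based metrics in Table 3 are linked: given sensitivity, specificity, and the event rate, the PPV, NPV, accuracy, and F1 score follow arithmetically. The Python sketch below applies these identities to the GBM validation-set row, assuming that death within 5 years is the positive class and using the validation-set death rate from Table 1 (3,042 of 11,637). The function name `derived_metrics` is introduced here for illustration.

```python
def derived_metrics(sensitivity, specificity, prevalence):
    """Derive PPV, NPV, accuracy and F1 from sensitivity, specificity and prevalence."""
    # Cell proportions of the confusion matrix (they sum to 1).
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# GBM on the validation set: sensitivity 0.736, specificity 0.788 (Table 3).
check = derived_metrics(0.736, 0.788, 3042 / 11637)
print({name: round(value, 3) for name, value in check.items()})
```

Up to rounding of the inputs, the derived values agree with the reported PPV (0.552), NPV (0.894), accuracy (0.775), and F1 score (0.631), which supports the internal consistency of that row.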
The GBM model showed the best discrimination in predicting 5-year survival of RCC, with an AUC of 0.841 (95% CI: 0.833–0.850) on the validation set, the highest of the seven models, although LR and NN were close behind (both 0.839; Table 3). Its calibration was supported by the joint-lowest Brier score (0.127, 95% CI: 0.126–0.128), shared with LR and NN, and by the lowest ECI (0.005, 95% CI: 0.003–0.007) on the validation set (Table 4). In addition, the calibration curve of the GBM model showed good agreement between predicted and observed 5-year survival of RCC (Figure 4). DCA curves also indicated that the GBM model provides a good net clinical benefit in predicting 5-year survival of RCC (Figure 5), further supporting its performance in the validation set.
Table 4
| Datasets | Models | Brier | Intercept | Slope | ECI |
|---|---|---|---|---|---|
| Training set | LGB | 0.102 | −0.006 | 1.318 | 0.075 |
| | LR | 0.126 | 0.002 | 1.025 | 0.004 |
| | RF | 0.126 | 0.010 | 1.448 | 0.259 |
| | XGBoost | 0.138 | −0.158 | 1.898 | 0.879 |
| | DT | 0.145 | 0.031 | 1.013 | 0.037 |
| | NN | 0.124 | 0.015 | 1.002 | 0.008 |
| | GBM | 0.124 | 0.000 | 1.046 | 0.017 |
| Validation set | LGB | 0.133 | 0.027 | 0.781 | 0.072 |
| | LR | 0.127 | 0.018 | 0.974 | 0.011 |
| | RF | 0.131 | 0.019 | 1.323 | 0.174 |
| | XGBoost | 0.139 | −0.163 | 1.814 | 0.890 |
| | DT | 0.143 | 0.025 | 1.028 | 0.009 |
| | NN | 0.127 | 0.041 | 0.959 | 0.013 |
| | GBM | 0.127 | 0.025 | 0.999 | 0.005 |
DT, decision tree; ECI, estimated calibration index; GBM, gradient boosting machine; LGB, light gradient boosting machine; LR, logistic regression; NN, neural network; RF, random forest; XGBoost, extreme gradient boosting.
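To put the Brier scores in Table 4 in context, it helps to compare them with a non-informative reference. The Brier score is the mean squared difference between the predicted probability and the observed outcome (0 or 1), so lower is better. The Python sketch below computes the score of a reference "model" that assigns every validation-set patient the overall death rate from Table 1; this reference is an illustration added here, not an analysis reported in the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Validation set counts from Table 1: 3,042 deaths among 11,637 patients.
deaths, total = 3042, 11637
prevalence = deaths / total

# Outcome vector: 1 = died within 5 years, 0 = alive at 5 years.
y_true = [1] * deaths + [0] * (total - deaths)

# Non-informative reference: predict the overall death rate for everyone.
reference = brier_score(y_true, [prevalence] * total)
print(round(reference, 3))
```

For a constant prediction equal to the event rate, the Brier score reduces to prevalence × (1 − prevalence). Scores below this reference, such as the 0.127 reported for the GBM model on the validation set, indicate that the predicted probabilities carry information beyond the baseline event rate.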
Figure 6A shows the importance of features in the GBM-based prediction model. The summary plot offers a comprehensive visual representation for interpreting patient risk, revealing the importance and impact of each variable in the model’s predictions. Age at diagnosis, tumor size, M, T, and grade were the most important features for interpreting the model’s prediction of 5-year survival in RCC. In Figure 6B, the position of each point on the X-axis represents the SHAP value, reflecting the impact of a specific feature on the model output for that patient. Mathematically, this corresponds to the relative logarithm of survival risk between patients, meaning that a patient with a higher SHAP value has a greater predicted risk of death than a patient with a lower SHAP value. Features are arranged along the Y-axis according to their importance, which is determined by the mean of their absolute Shapley values. The higher the position of a feature in the graph, the greater its influence on the model. As age at diagnosis and tumor size increased, so did SHAP values, and SHAP values greater than zero corresponded to positive predictions in the model, indicating an increased risk of death within 5 years. Distant metastasis and higher tumor grade also corresponded to a higher risk of death within 5 years in RCC.
We developed a web-based application to facilitate the use of the GBM-based model constructed in this study for prognosis prediction. This predictive tool can be accessed via a personal computer or smartphone, assisting clinicians in patient counseling. After patient characteristics are entered, the application provides the predicted 5-year survival probability as well as the contribution of each feature to the prediction (Figure 7).
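The ranking rule described above, ordering features by the mean of their absolute SHAP values, can be sketched in Python as follows. The per-patient SHAP values below are hypothetical numbers chosen for illustration and do not come from the study; the function name `rank_by_mean_abs_shap` is likewise an assumption.

```python
def rank_by_mean_abs_shap(shap_rows):
    """Rank features by mean absolute SHAP value, most important first."""
    names = list(shap_rows[0])
    importance = {
        name: sum(abs(row[name]) for row in shap_rows) / len(shap_rows)
        for name in names
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three patients (one dict per patient).
shap_rows = [
    {"age": 0.40, "tumor_size": -0.30, "M": -0.10},
    {"age": -0.20, "tumor_size": 0.50, "M": 1.20},
    {"age": 0.60, "tumor_size": 0.10, "M": -0.10},
]

ranking = rank_by_mean_abs_shap(shap_rows)
for name, score in ranking:
    print(name, round(score, 3))
```

Taking absolute values before averaging matters: a feature that pushes risk strongly up for some patients and down for others would otherwise average out toward zero and appear unimportant.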
Discussion
Renal carcinoma is a complex neoplasm that presents with biological heterogeneity and histological diversity (19). After nephrectomy, patients may have up to a 50% risk of recurrence (20). It is therefore important to identify high-risk patients with poor prognosis after nephrectomy. In this retrospective study, we developed a machine learning-based predictive model using population data from the SEER database to achieve individualized 5-year survival prediction for RCC patients, with a particular focus on high-risk subgroups with poor prognosis, and analyzed the impact of each selected feature on survival outcomes. Furthermore, this study systematically evaluated the prognostic significance of clinicopathological and demographic characteristics in RCC survival prediction. To identify key prognostic factors for RCC, LASSO regression and the Boruta algorithm were employed, which ultimately selected 11 predictive variables. These findings suggest that clinicians should prioritize and standardize documentation of these indicators to facilitate risk assessment and guide clinical decision-making in RCC management. Moreover, we compared the predictive performance of seven models: LightGBM, LR, RF, XGBoost, DT, NN, and GBM (21). The GBM model was the best on balance across multiple metrics including AUC, Brier score, and ECI, while the NN and LR models performed comparably. Good calibration indicates that the predicted probabilities from our model accurately reflect the actual risk of 5-year mortality in RCC patients, which is essential for clinical decision-making. The model’s predictive capability was assessed through multiple evaluation metrics and visualization methods, including ROC curves, calibration curves, and DCA, thereby enhancing the reliability and clinical utility of our findings.
The evaluation metrics (including AUC, Brier score, and ECI) supported the model’s ability to predict patient prognosis. The GBM model can therefore be considered an accurate and practical tool for predicting 5-year RCC survival. It shows potential for integration into clinical decision support systems (22,23), in which physicians enter patient characteristics to obtain automated survival risk assessments. This study contributes to RCC research by employing machine learning approaches to identify the factors affecting prognosis in RCC patients. The predictive models developed have the potential to assist clinicians in making informed decisions regarding the clinical management of RCC patients, which may contribute to improved patient survival outcomes.
Despite these strengths, some limitations should be noted. First, the SEER database lacks detailed information on systemic therapy, comorbidities, recurrence patterns, and molecular data, restricting our model to a preliminary framework. Future studies integrating these variables alongside demographic and clinicopathological factors will be essential to develop clinically applicable, personalized risk stratification tools and to better predict the prognosis of RCC (19). Second, survival outcomes were restricted to overall survival rather than cancer-specific or progression-free survival, which may be more clinically informative for treatment decision-making. Third, the exclusion of individuals with incomplete staging or missing data may introduce survivorship bias and limit the applicability and generalizability of the model. The presence of ‘Unknown’ data for chemotherapy may have introduced bias in evaluating treatment effects; therefore, the results should be interpreted with caution. Fourth, although internal validation of the models has been performed, external validation has not yet been conducted due to the absence of independent datasets. Consequently, subsequent studies should incorporate external, prospective validation in a minimum of two independent cohorts to substantiate the robustness and generalizability of the model.
Conclusions
This study developed an accurate predictive model for 5-year survival in RCC patients using machine learning, which may offer valuable support to clinicians in making informed clinical decisions.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-438/rc
Peer Review File: Available at https://tau.amegroups.com/article/view/10.21037/tau-2025-438/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tau.amegroups.com/article/view/10.21037/tau-2025-438/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin 2024;74:12-49.
2. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
3. Capitanio U, Bensalah K, Bex A, et al. Epidemiology of Renal Cell Carcinoma. Eur Urol 2019;75:74-84.
4. Capitanio U, Montorsi F. Renal cancer. Lancet 2016;387:894-906.
5. Alaghehbandan R, Siadat F, Trpkov K. What's new in the WHO 2022 classification of kidney tumours? Pathologica 2022;115:8-22.
6. Trpkov K, Williamson SR, Gill AJ, et al. Novel, emerging and provisional renal entities: The Genitourinary Pathology Society (GUPS) update on renal neoplasia. Mod Pathol 2021;34:1167-84.
7. Di Franco G, Palmeri M, Sbrana A, et al. Renal cell carcinoma: The role of radical surgery on different patterns of local or distant recurrence. Surg Oncol 2020;35:106-13.
8. Saliby RM, Labaki C, Jammihal TR, et al. Impact of renal cell carcinoma molecular subtypes on immunotherapy and targeted therapy outcomes. Cancer Cell 2024;42:732-5.
9. Galluzzi L, Aryankalayil MJ, Coleman CN, et al. Emerging evidence for adapting radiotherapy to immunotherapy. Nat Rev Clin Oncol 2023;20:543-57.
10. Bueno AN, Stein MN, Runcie K. Adjuvant therapy in renal cell carcinoma (RCC): progress, at last. Transl Cancer Res 2024;13:6448-62.
11. Ciccarese C, Strusi A, Arduini D, et al. Post nephrectomy management of localized renal cell carcinoma. From risk stratification to therapeutic evidence in an evolving clinical scenario. Cancer Treat Rev 2023;115:102528.
12. Byun SS, Heo TS, Choi JM, et al. Deep learning based prediction of prognosis in nonmetastatic clear cell renal cell carcinoma. Sci Rep 2021;11:1242.
13. Zhao H, Cao Y, Wang Y, et al. Dynamic prognostic model for kidney renal clear cell carcinoma (KIRC) patients by combining clinical and genetic information. Sci Rep 2018;8:17613.
14. Guo Y, Braga L, Kapoor A. PD07-08 machine learning to predict recurrence of localized renal cell carcinoma. J Urol 2019;201:e145.
15. Hou Z, Wang P, Lv D, et al. Explainable machine learning for predicting distant metastases in renal cell carcinoma patients: a population-based retrospective study. Front Med (Lausanne) 2025;12:1624198.
16. Chen S, Wang X, Zhang J, et al. Deep learning-based diagnosis and survival prediction of patients with renal cell carcinoma from primary whole slide images. Pathology 2024;56:951-60.
17. Distante A, Marandino L, Bertolo R, et al. Artificial Intelligence in Renal Cell Carcinoma Histopathology: Current Applications and Future Perspectives. Diagnostics (Basel) 2023;13:2294.
18. Alshwayyat S, Almasri N, Alshwaiyat Y, et al. Personalized survival predictions in chromophobe renal cell carcinoma: development of a machine learning-based web tool. Int Urol Nephrol 2025; Epub ahead of print.
19. Motzer RJ, Banchereau R, Hamidi H, et al. Molecular Subsets in Renal Cancer Determine Outcome to Checkpoint and Angiogenesis Blockade. Cancer Cell 2020;38:803-817.e4.
20. Kim SP, Thompson RH, Boorjian SA, et al. Comparative effectiveness for survival and renal function of partial and radical nephrectomy for localized renal tumors: a systematic review and meta-analysis. J Urol 2012;188:51-7.
21. Tran KA, Kondrashova O, Bradley A, et al. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021;13:152.
22. Schwalbe N, Wahl B. Artificial intelligence and the future of global health. Lancet 2020;395:1579-86.
23. Marquardt A, Solimando AG, Kerscher A, et al. Subgroup-Independent Mapping of Renal Cell Carcinoma-Machine Learning Reveals Prognostic Mitochondrial Gene Signature Beyond Histopathologic Boundaries. Front Oncol 2021;11:621278.
24. Pan Y, Huang C, Zhang X, et al. Machine learning-guided prevention and management of low anterior resection syndrome: Development of an XGBoost prediction model and validation via SHAP. Laparosc Endosc Robot Surg 2025.

