Predicting survival in small cell lung cancer patients undergoing various treatments: a machine learning approach
Original Article

Ziran Zhao1, Xi Cheng2, Yibo Gao1, Fengwei Tan1, Qi Xue1, Shugeng Gao1, Jie He1

1Thoracic Surgery Department, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; 2Interdisciplinary Program of Science in Analytics, Georgia Institute of Technology, Atlanta, GA, USA

Contributions: (I) Conception and design: Z Zhao, S Gao, J He; (II) Administrative support: S Gao, J He; (III) Provision of study materials or patients: S Gao; (IV) Collection and assembly of data: Y Gao, F Tan, Q Xue; (V) Data analysis and interpretation: Z Zhao, X Cheng, J He; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Jie He, MD. Thoracic Surgery Department, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Panjiayuan Nanli No. 17, Beijing 100021, China. Email: prof_hejie@126.com.

Background: Small cell lung cancer (SCLC) is a highly metastatic subtype of lung cancer, a disease that accounted for 1.796 million deaths worldwide in 2020, and it has no established standard care. This study aimed to assess treatment effects on SCLC patient survival across stages and to develop a machine learning-based survival prediction tool for accurate overall survival (OS) estimation.

Methods: We developed four prediction models: Cox proportional hazard (Cox PH) regression, survival tree (ST), random survival forest (RSF), and gradient boosting survival analysis (GBSA). Patients were randomly split 7:3 into training and test datasets, with 10-fold cross-validation and 50 iterations on the training dataset. Cox PH used hazard ratios, while the other models employed importance values to assess feature predictiveness. Harrell’s C-index (C-index) and Brier score (BS) measured model performance, with internal validations using R version 4.2.0.

Results: Cox PH outperformed others based on mean C-index and BS. Multivariate analysis across models highlighted distant metastases (M category), tumor stage, and treatment modalities (radiotherapy, chemotherapy, surgery) as key survival predictors. Stratified Cox PH analysis revealed surgery’s efficacy in early-stage SCLC (stage II) and radiotherapy’s advantage in stage III. Homogeneity was observed in chemotherapy benefits across cancer stages.

Conclusions: Surgery, chemotherapy, and radiotherapy are integral in SCLC treatment, contingent on cancer stage and characteristics. Surgery offers promise for early-stage cases, while advanced-stage strategies require further exploration.

Keywords: Small cell lung cancer (SCLC); survival; machine learning


Submitted Apr 14, 2024. Accepted for publication Jan 27, 2025. Published online Mar 14, 2025.

doi: 10.21037/tlcr-24-331


Highlight box

Key findings

• Surgery, chemotherapy, and radiotherapy play crucial roles in treating small cell lung cancer (SCLC), with their effectiveness varying across cancer stages. Surgery demonstrates promise particularly in early-stage SCLC (stage II), while radiotherapy exhibits advantages in stage III cases.

• Cox proportional hazards emerged as the most effective model, outperforming other models in terms of mean Harrell’s C-index and right-censored Brier score. Distant metastases, tumor stage, and treatment modalities (radiotherapy, chemotherapy, surgery) were identified as crucial predictors of survival.

What is known and what is new?

• SCLC remains highly metastatic, accounting for a significant portion of global cancer-related deaths in 2020, with no established standard of care.

• This study introduces a comprehensive assessment of treatment effects on SCLC patient survival across different stages, along with the development of a machine learning-based survival prediction tool for accurate overall survival estimation.

What is the implication, and what should change now?

• Tailoring treatment strategies for SCLC patients based on their cancer stage and characteristics is crucial, emphasizing the significance of surgery, chemotherapy, and radiotherapy in improving survival outcomes.

• Further research is warranted to optimize treatment approaches for advanced-stage SCLC and to refine predictive models for enhanced clinical utility. Additionally, these findings advocate for a more personalized and nuanced approach to SCLC management in clinical practice.


Introduction

Lung cancer is one of the most commonly diagnosed cancers worldwide and the leading cause of cancer-related death globally. In 2020, it accounted for 1.796 million deaths (1). Small cell lung cancer (SCLC), a high-grade neuroendocrine carcinoma, represents approximately 13–20% of all lung cancers (2,3). Unlike non-small cell lung cancer (NSCLC), SCLC is characterized by a high growth fraction, rapid doubling time, and high metastatic potential (4-7). The majority of SCLC patients present with hematogenous metastases at the time of diagnosis, and over 70% of patients have locally advanced or distant metastatic disease [tumor, node, metastasis (TNM) stage III/IV] (8,9). Consequently, the prognosis for SCLC patients is often poor, with high mortality rates.

Previous studies have not identified a clear superior treatment option for SCLC (10,11). Most SCLC patients initially respond to treatment but almost always develop acquired drug resistance and ultimately recur (12,13). Without active treatment, the median overall survival (OS) of SCLC patients is only 2–4 months, and less than 5% of patients survive for 5 years, even with treatment (14). Unfortunately, the poor prognosis of SCLC has not improved significantly over the past three decades (12).

While surgical resection is considered a mainstay of SCLC treatment, only a limited number of patients are eligible for surgery, and the survival outcomes remain poor. Concurrent chemotherapy with cisplatin and etoposide and thoracic radiation therapy is now the standard of care for SCLC, with surgical resection playing a limited role. Despite previous studies indicating the positive role of surgery in early-stage SCLC, the results remain inconsistent and controversial (11,15-18). Therefore, our study aimed to explore the effects of different treatments on the survival benefit of patients with SCLC at different stages of the disease. Additionally, we aimed to develop and validate a more efficient machine learning-based survival prediction tool for clinicians to estimate the OS of SCLC patients more accurately. To ensure the transparent reporting of our study, we present this article in accordance with the TRIPOD reporting checklist (19) (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-24-331/rc).


Methods

Study participants

The study cohort was drawn from the comprehensive Surveillance, Epidemiology, and End Results (SEER) program archives meticulously curated by the National Cancer Institute (NCI) (No. 20087-Nov2020). Ethical clearance was secured from the Cancer Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, granting us human study exemption. Adhering to SEER stipulations, we formalized a data agreement and directly accessed the database from the NCI SEER website.

Our investigation homed in on individuals diagnosed with SCLC as their primary malignancy, spanning the years 2009 to 2019 within the SEER repository. Inclusion criteria comprised patients with histologic codes (8002, 8041–8045) and site codes (C34.0–C34.3, C34.8–C34.9). Conversely, exclusions were applied to cases: (I) diagnosed post-mortem or solely via death certificate; (II) exhibiting tumor size codes of 990 or 996–999; (III) characterized by tumor extent codes of 950, 980, or 999; (IV) displaying lymph node involvement; or (V) reporting survival periods of less than one month. We excluded individuals with lymph node involvement to reduce heterogeneity in the dataset and focus specifically on survival outcomes in early-stage SCLC patients without nodal disease, ensuring more precise modeling and interpretation of survival predictors. Our analysis focused solely on patients who underwent cancer-directed surgery, radiotherapy, and/or chemotherapy. The selection process is visually depicted in Figure 1.

Figure 1 Flowchart of study design and patient selection. SEER, Surveillance, Epidemiology, and End Results.

Statistical methods

Statistical analysis

Traditionally, survival prediction for SCLC has relied on Kaplan-Meier survival analysis and the Cox proportional hazards (Cox PH) model (20). However, these models have limitations, such as oversimplifying complex relationships and assuming proportional hazards and linear effects of covariates. To overcome these issues, we employed machine learning algorithms, which can handle non-linear associations, non-linear interactions, and effect modification between prognostic factors. All models were fitted using complete cases. Among these algorithms, tree-based methods (e.g., decision tree, random forest) are well known for their ease of use, interpretability, and ability to limit overfitting. In particular, random survival forest (RSF) is an ensemble learning method that combines survival decision trees to analyze right-censored survival data. The method reduces overfitting by randomly selecting subjects and predictor variables during the construction of trees and through regularization (21,22). Gradient boosting survival analysis (GBSA) is also a nonparametric model that uses an ensemble of survival trees (ST) to determine how the hazard function varies according to the associated covariates. The ensemble model is trained using a gradient boosting method to optimize accuracy (23). A summary of each method is provided in Table 1. All statistical analyses were performed using R software (version 4.2.0; http://www.r-project.org).
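For reference, the proportional hazards and linearity assumptions mentioned above can be read directly from the standard textbook form of the Cox PH model. The symbols $h_0$, $\beta_k$, and $x_k$ are notation introduced here for illustration and are not taken from the article:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p),
\qquad
\mathrm{HR}_k = \exp(\beta_k)
```

The baseline hazard $h_0(t)$ is shared by all patients, so the ratio of hazards between two patients does not change over time, and each covariate enters the log-hazard linearly. Tree-based survival models do not impose either restriction.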

Table 1

The specifics of the packages for each machine learning method

Methods Packages Links Parameter tuning range
Cox proportional hazards (Cox PH) coxph https://github.com/therneau/survival Regularization penalty = [0.1, 0.01, 0.001]
Random survival forest (RSF) rfsrc https://github.com/kogalur/randomForestSRC Terminal node size of forest = [3, 10]; number of trees = [10, 200]; number of random splits = [1, 10]; depth of tree = [1, 20]
Survival tree (ST) rpart https://github.com/cran/rpart Number of random splits = [1, 20]; depth of tree = [1, 30]
Gradient boosting survival analysis (GBSA) gbm https://github.com/gbm-developers/gbm Number of trees = [10, 200]

Predictors and outcome

In this study, we included predictor features such as age at diagnosis, sex, race, marital status at diagnosis, the 8th edition of the American Joint Committee on Cancer (AJCC) TNM category, overall tumor stage, and whether surgical therapy, radiation, and chemotherapy were undertaken. All variables were categorical except age, which was continuous. In the SEER program, surgical therapy was recorded as “Surgery performed”, “Not recommended”, “Recommended but not performed, patient refused”, “Recommended but not performed, unknown reason”, “Recommended, but unknown performed”, and “Unknown”. In this study, we categorized this predictor as “Surgery performed”, “Surgery not performed”, and “Unknown”. The primary outcome of interest was cancer-specific survival, calculated from the date of diagnosis to the date of death due to SCLC cancer. We extracted “Survival months” and “Death status” as outcome variables.
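The recoding of the surgery variable described above could be sketched as a simple lookup. This is a minimal illustration in Python rather than the authors' R code. The text does not say how "Recommended, but unknown performed" was handled, so assigning it to "Unknown" is an assumption, and the names `SURGERY_RECODE` and `recode_surgery` are hypothetical.

```python
# Hypothetical lookup from SEER surgery labels to the three study categories.
SURGERY_RECODE = {
    "Surgery performed": "Surgery performed",
    "Not recommended": "Surgery not performed",
    "Recommended but not performed, patient refused": "Surgery not performed",
    "Recommended but not performed, unknown reason": "Surgery not performed",
    # Assumption: the text does not state where this label was placed.
    "Recommended, but unknown performed": "Unknown",
    "Unknown": "Unknown",
}


def recode_surgery(seer_value):
    """Map a SEER surgery label to one of the three study categories.

    Labels that are not in the lookup fall back to "Unknown".
    """
    return SURGERY_RECODE.get(seer_value.strip(), "Unknown")
```

Keeping the mapping in one explicit table makes the handling of each SEER label easy to audit and change.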

Model development

In this study, we randomly assigned patients to either a training (D0) or test (D1) dataset using a 7:3 split. We used Cox PH, ST, RSF, and GBSA to develop prediction models, with all models built on the training dataset using 10-fold cross-validation with 50 iterations. For each iteration, we randomly drew samples from the observed data using a different seed for parameter tuning, and we applied the optimal parameters for the final model. To address potential collinearity and overfitting, we applied regularization techniques during the model-tuning process. The specific tuning parameters, their ranges, and the software packages used are detailed in Table 1. The optimal hyperparameters obtained from cross-validation were applied to finalize the models. Predictive performance was then evaluated on the independent test dataset, and comparisons between models were conducted based on standard performance metrics.

To further reduce variance due to different partitions of the data, we split the training data (D0) 9:1 randomly into training (V0) and validation (V1) datasets. We used 10-fold cross-validation to average over 10 different partitions, resulting in less sensitivity of the performance estimates to the random partitioning of the data. For the machine learning models, we used all 10 predictor variables as inputs, and the outputs were estimated OS probabilities for patients with SCLC since diagnosis, which were not different from the Cox regression model.
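The partitioning scheme described above can be sketched as follows. This is one possible reading of the text (a 7:3 split, then 10-fold cross-validation repeated 50 times with a different seed each time), written in Python for illustration rather than in the R used for the study. The patient identifiers are synthetic and the function names are hypothetical.

```python
import random


def train_test_split_ids(ids, test_fraction=0.3, seed=0):
    """Randomly split patient ids into training (D0) and test (D1) sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_partitions(ids, k=10, seed=0):
    """Yield (V0, V1) pairs; each fold serves once as the validation set V1."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield training, validation


patient_ids = list(range(1000))  # synthetic ids, not SEER data
d0, d1 = train_test_split_ids(patient_ids, test_fraction=0.3, seed=42)

n_evaluations = 0
fold_sizes = set()
for iteration in range(50):  # 50 iterations, each with its own seed
    for v0, v1 in k_fold_partitions(d0, k=10, seed=iteration):
        # A candidate model would be fitted on v0 and scored on v1 here.
        fold_sizes.add((len(v0), len(v1)))
        n_evaluations += 1
```

With 10 folds, each validation set holds one tenth of D0, which corresponds to the 9:1 split into V0 and V1 described in the text.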

To evaluate the predictive power of each feature, we used the hazard ratio in Cox PH and the value of importance in the other three models (ST, RSF, and GBSA). For ST, RSF, and GBSA, we determined feature importance by calculating the relative influence of each feature. Specifically, we determined whether that feature was selected to split on during the tree-building process and how much the squared error (over all trees) improved or decreased as a result (24,25). RSF provided a fully nonparametric measure of variable importance (VIMP), which we calculated using permutation importance (26). Permutation importance is a prediction-based approach that measures the prediction error attributable to the variable. A large positive value of importance indicated variables with high predictive ability, while zero or negative values identified noise variables.
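The idea behind permutation importance can be illustrated with a small sketch. It is not the randomForestSRC implementation: it assumes a fixed, hypothetical risk function (`toy_risk`), measures performance with a simple C-index, and uses synthetic data. A feature's importance is taken as the average drop in C-index after its values are shuffled across patients.

```python
import random


def c_index(times, events, risks):
    """Harrell's C-index over comparable pairs (no credit for tied risks)."""
    concordant = comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                concordant += risks[i] > risks[j]
    return concordant / comparable


def permutation_importance(rows, times, events, predict_risk, feature,
                           n_repeats=20, seed=0):
    """Mean drop in C-index when one feature is shuffled across patients."""
    rng = random.Random(seed)
    baseline = c_index(times, events, [predict_risk(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, feature: v} for r, v in zip(rows, values)]
        score = c_index(times, events, [predict_risk(r) for r in permuted])
        drops.append(baseline - score)
    return sum(drops) / n_repeats


# Synthetic cohort: patients with metastases (m1 = 1) die first.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
rows = ([{"m1": 1, "noise": k % 2} for k in range(4)]
        + [{"m1": 0, "noise": k % 2} for k in range(4)])


def toy_risk(row):
    """Hypothetical model: risk depends on metastasis status only."""
    return row["m1"]


vimp_m1 = permutation_importance(rows, times, events, toy_risk, "m1")
vimp_noise = permutation_importance(rows, times, events, toy_risk, "noise")
```

In this toy setting the informative feature receives a positive importance, while the feature that the model ignores receives exactly zero, which mirrors the interpretation of positive and zero values given above.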

Model performance evaluation

Evaluating the performance of survival models requires a combination of metrics that assess different aspects of prediction quality. We evaluated the predictive performance of the models using calibration plots, Harrell’s C-index (C-index), and the right-censored Brier score (BSc) in the validations (27,28). The C-index is a widely used goodness-of-fit measure for models that produce risk scores, particularly in survival analysis with censored data. The intuition behind Harrell’s C-index is as follows. For the $i$th patient, let $T_i$ denote the event time, $D_i$ the censoring time, and $\eta_i$ the risk score predicted by a model. Let $\tilde{T}_i = \min(T_i, D_i)$ denote the censored time, that is, the latest observed time, and let $\xi_i = I(T_i < D_i)$ denote the event indicator under right censoring. The C-index then estimates the probability that, in a randomly selected pair of patients $(i, j)$, the order of events is correctly predicted:

$$\text{Concordance probability} = \Pr(\eta_i > \eta_j \mid T_i < T_j)$$

And the C-index is defined as the formula below:

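$$\text{C-index} = \frac{\sum_{i \neq j} I(\eta_i > \eta_j)\, I(\tilde{T}_i < \tilde{T}_j)\, \xi_i}{\sum_{i \neq j} I(\tilde{T}_i < \tilde{T}_j)\, \xi_i}$$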

The C-index is a measure of concordance between predicted and observed survival times. Values near 0.5 indicate that the risk score predictions are no better than random chance in determining which patient will live longer, whereas values near 1 indicate that the risk scores rank patients almost perfectly by survival time. Therefore, a higher C-index corresponds to a model with higher prediction accuracy (27).
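A direct, unoptimized translation of the C-index definition above may help make the pair counting concrete. This Python sketch follows the formula as written, so pairs with tied risk scores earn no credit. The function name and the toy data are illustrative assumptions, not part of the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times       : observed times (event or censoring, whichever came first)
    events      : 1 if the event was observed, 0 if the patient was censored
    risk_scores : predicted risk; a higher score should mean earlier death
    """
    concordant = 0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # event indicator is 0: patient i cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third one censored at month 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

In the toy data, five pairs are comparable and four of them are ranked correctly, so the sketch returns 0.8. The single discordant pair is the patient who died at month 4 with a lower risk score than the patient still alive at month 6.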

In addition to using the C-index to evaluate the models’ discriminative performance, we also employed the BSc to assess their prediction performance (29). The Brier score measures the average difference between true values and estimated values and is used to compute mean squared prediction errors for binary models. We adopted the method of Graf et al. (1999) for constructing the BSc for censored time-to-event data (29). The BSc over time may be understood as a mean squared error of prediction in which the estimated probabilities $\hat{\pi}(t \mid \tilde{X}_i)$, which take values in the interval [0, 1], are treated as predictions of the event-free status at time $t$. The BSc is defined as below:

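$$\mathrm{BSc}(t) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \left(0 - \hat{\pi}(t \mid \tilde{X}_i)\right)^2 I(\tilde{T}_i \le t, \delta_i = 1)\, \frac{1}{\hat{G}(\tilde{T}_i)} + \left(1 - \hat{\pi}(t \mid \tilde{X}_i)\right)^2 I(\tilde{T}_i > t)\, \frac{1}{\hat{G}(t)} \right\}$$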

Here, $\tilde{T}_i$ represents the observed time for patient $i$ (the time to the event of interest or to censoring, whichever comes first), $\delta_i$ is the event indicator, $t$ is the fixed time point, $\hat{\pi}(t \mid \tilde{X}_i)$ is the estimated event-free probability for patient $i$, and $\hat{G}$ denotes the Kaplan-Meier estimate of the censoring distribution $G$, which serves as a weight under the following two conditions: $I(\tilde{T}_i \le t, \delta_i = 1)$ is the condition that patient $i$ experienced the event at or before $t$; $I(\tilde{T}_i > t)$ is the condition that patient $i$ either experienced the event or was censored after $t$. When evaluating the BSc, a lower score indicates better prediction ability; a score of 0.25 corresponds to assigning every patient an event-free probability of 0.5, so values at or above this level indicate predictive performance no better than random chance (29).
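The weighting scheme in the BSc can be sketched in a few lines of Python. This is a simplified illustration, not the code used in the study. It evaluates the Kaplan-Meier estimate of the censoring distribution at the observed time itself, whereas some implementations use the left limit just before it. The function names and toy data are assumptions made for the example.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring distribution G.

    Censoring (event flag 0) is treated as the outcome of interest.
    Returns a step function g_hat(t).
    """
    steps = []
    surv = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and not e)
        surv *= 1.0 - censored / at_risk
        steps.append((u, surv))

    def g_hat(t):
        value = 1.0
        for u, s in steps:
            if u > t:
                break
            value = s
        return value

    return g_hat


def brier_score(times, events, surv_probs, t):
    """Right-censored Brier score at time t with inverse censoring weights.

    surv_probs[i] is the predicted probability that patient i is
    event-free at time t.
    """
    g_hat = censoring_survival(times, events)
    total = 0.0
    for time_i, event_i, pi_i in zip(times, events, surv_probs):
        if time_i <= t and event_i:
            total += (0.0 - pi_i) ** 2 / g_hat(time_i)
        elif time_i > t:
            total += (1.0 - pi_i) ** 2 / g_hat(t)
        # Patients censored at or before t contribute nothing directly;
        # the weights of the remaining patients compensate for them.
    return total / len(times)


# Toy data: four patients, the second one censored at month 2.
toy_times = [1, 2, 3, 4]
toy_events = [1, 0, 1, 1]
uninformative = brier_score(toy_times, toy_events, [0.5, 0.5, 0.5, 0.5], t=2.5)
perfect = brier_score(toy_times, toy_events, [0.0, 0.0, 1.0, 1.0], t=2.5)
```

In this toy example, assigning every patient a probability of 0.5 yields a score of 0.25, and predictions that match each patient's status at month 2.5 yield a score of 0.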

Calibration plots visually assessed how well the predicted survival probabilities agree with the observed survival probabilities. The 45-degree diagonal line in the plot represents perfect calibration. A model with curves closely following this line demonstrated good agreement between predictions and reality. Calibration is essential because even a model with high discrimination (C-index) may poorly estimate absolute risk probabilities.

Ethical considerations

Our study used secondary analysis of research data, and all patient data were obtained from the SEER database, which is publicly available and de-identified. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The use of SEER data for research purposes is exempt from institutional review board (IRB) approval as stated in the SEER Program Code of Federal Regulations (CFR) Title 45, Volume 1, Parts 160–199. Therefore, this study did not require IRB approval. As the data used in this study were already collected for clinical and administrative purposes, informed consent was not required for this study. To protect the privacy and confidentiality of the study participants, all patient data were de-identified and anonymized by SEER prior to our access to the data. No individual patient data were reported in this study. No compensation was provided to the study participants as this was a secondary analysis of publicly available data.


Results

Characteristics of the study participants

Table 2 provides a description of the demographic and tumor-related characteristics of the 51,600 patients included in the final models. The mean survival time for these patients was 17.76 months, with a standard deviation of 24.03 months. Of the patients, 86% died during follow-up. The mean age at diagnosis was 66.60 years, with a standard deviation of 10.07 years. Of the patients, 25,341 (49%) were male, 44,982 (87%) were white, and 26,524 (51%) were married at the time of diagnosis. In terms of tumor characteristics, 35,090 (68%) of the patients were at the distant stage, and T2–T4 tumors accounted for 71.5% of the cases. Of the patients, 3.1% received surgery, and 54% and 82% received radiotherapy and chemotherapy, respectively.

Table 2

Demographic and tumor-related characteristics of patients with SCLC in SEER cohorts

Characteristic Values (N=51,600) 95% CI
Survival (months) 17.76 (24.03) 18–18
Death
   Yes 44,512 (86.3) 86–87%
   No 7,088 (13.7) 13–14%
Age (years) 66.60 (10.07) 67–67
Gender
   Female 26,259 (50.9) 50–51%
   Male 25,341 (49.1) 49–50%
Race
   White 44,982 (87.2) 87–87%
   Black 4,576 (8.9) 8.6–9.1%
   Asian or Pacific Islander 1,685 (3.3) 3.1–3.4%
   American Indian/Alaska Native 326 (0.6) 0.57–0.70%
   Unknown 31 (<0.1) 0.04–0.09%
Marital status
   Married (including common law) 26,524 (51.4) 51–52%
   Divorced 7,489 (14.5) 14–15%
   Separated 0 0.00–0.01%
   Single (never married) 6,557 (12.7) 12–13%
   Widowed 9,009 (17.5) 17–18%
   Unmarried or domestic partner 73 (0.1) 0.11–0.18%
   Unknown 1,948 (3.8) 3.6–3.9%
T category
   T0 537 (1.0) 0.96–1.1%
   T1 6,160 (11.9) 12–12%
   T2 11,814 (22.9) 23–23%
   T3 1,984 (3.8) 3.7–4.0%
   T4 23,083 (44.7) 44–45%
   TX 8,022 (15.5) 15–16%
N category
   N0 8,092 (15.7) 15–16%
   N1 3,852 (7.5) 7.2–7.7%
   N2 26,789 (51.9) 51–52%
   N3 9,004 (17.4) 17–18%
   NX 3,863 (7.5) 7.3–7.7%
M category
   M0 20,449 (39.6) 39–40%
   M1 29,041 (56.3) 56–57%
   MX 2,110 (4.1) 3.9–4.3%
Stage
   Distant 35,090 (68.0) 68–68%
   Localized 2,881 (5.6) 5.4–5.8%
   Regional 12,371 (24.0) 24–24%
   Unknown/unstaged 1,258 (2.4) 2.3–2.6%
Surgery
   No 50,022 (97.0) 97–97%
   Yes 1,578 (3.1) 2.9–3.2%
Radiotherapy
   No 23,569 (45.7) 45–46%
   Yes 28,031 (54.3) 54–55%
Chemotherapy
   No/unknown 9,425 (18.3) 18–19%
   Yes 42,175 (81.7) 81–82%

Data are presented as mean (standard deviation) or n (%). CI, confidence interval; SCLC, small cell lung cancer; SEER, Surveillance, Epidemiology, and End Results.

Model performance

Table 3 presents the means and 95% confidence intervals of the C-index and BSc values for each model across 50 iterations. The mean C-indexes in the test datasets were 0.6710, 0.6343, 0.6210, and 0.6590 for Cox PH, ST, RSF, and GBSA, respectively. The mean BSc values in the test datasets were 0.0790, 0.0809, 0.0801, and 0.0790 for Cox PH, ST, RSF, and GBSA, respectively. Cox PH had the highest mean C-index and, together with GBSA, the lowest mean BSc.

Table 3

Model performance comparisons

Modelling approaches Mean performance score 95% CI
C-index
   Cox PH 0.6710 0.6710–0.6710
   ST 0.6343 0.6343–0.6343
   RSF 0.6210 0.6209–0.6211
   GBSA 0.6590 0.6589–0.6591
Brier score
   Cox PH 0.0790 0.0786–0.0794
   ST 0.0809 0.0805–0.0813
   RSF 0.0801 0.0797–0.0805
   GBSA 0.0790 0.0786–0.0794

CI, confidence interval; Cox PH, Cox proportional hazard; GBSA, gradient boosting survival analysis; RSF, random survival forest; ST, survival tree.

In terms of calibration, the calibration curves for all four algorithms in the test dataset appeared to be close to each other, with RSF exhibiting weaker calibration in both the training and test datasets (Figure S1). Based on these results, we conclude that Cox PH is the best-performing model.

Ensemble hazard rate estimates and feature importance

Tumor stage was identified as an important predictor in all four models, with a higher stage associated with a significantly increased hazard of death. The M category was also identified as an important predictor in all models, with the presence of distant metastases associated with a significantly increased hazard of death. Age at diagnosis was identified as an important predictor in the Cox PH model, with older age associated with a significantly increased hazard of death. Marital status was identified as an important predictor in the RSF and GBSA models, with unmarried status associated with a significantly increased hazard of death (Table 4 and Figure 2).

Table 4

Multivariate analysis using the Cox PH, ST, RSF, and GBSA models in patients with SCLC

Variable Hazard ratio (Cox PH) 95% CI P value ST importance RSF importance GBSA importance
Age 1.01 1.01–1.01 <0.001 79 0.0360 7.30
Gender
   Female Ref 0 0.0043 2.38
   Male 1.22 1.18–1.26 <0.001
Race
   American Indian/Alaska Native Ref 1 0.0421 0.00
   Asian or Pacific Islander 0.99 0.93–1.05 0.71
   Black 0.91 0.82–1.00 0.048
   Unknown 0.94 0.77–1.15 0.57
   White 0.62 0.26–1.50 0.29
Marital status
   Divorced Ref 13 0.0160 0.16
   Married (including common law) 1.14 1.08–1.20 <0.001
   Single (never married) 1.06 1.01–1.12 0.02
   Unknown 1.11 1.06–1.17 <0.001
   Unmarried or domestic partner 2.12 1.32–3.41 0.002
   Widowed 0.93 0.85–1.02 0.11
T category
   T0 Ref 0 0.0105 4.90
   T1 1.44 1.18–1.75 <0.001
   T2 1.76 1.46–2.13 <0.001
   T3 1.81 1.48–2.23 <0.001
   T4 1.85 1.53–2.24 <0.001
   TX 1.52 1.25–1.85 <0.001
N category
   N0 Ref 0 0.0087 4.26
   N1 1.12 1.03–1.21 0.009
   N2 1.3 1.23–1.38 <0.001
   N3 1.27 1.19–1.36 <0.001
   NX 1.23 1.13–1.35 <0.001
M category
   M0 Ref 0 0.0484 28.88
   M1 1.56 1.47–1.65 <0.001
   MX 1.13 1.01–1.28 0.03
Stage
   Distant Ref 1 0.0378 25.31
   Localized 0.62 0.55–0.70 <0.001
   Regional 0.73 0.69–0.79 <0.001
   Unknown/unstaged 0.69 0.58–0.82 <0.001
Surgery
   No Ref 1 0.0190 6.51
   Yes 0.53 0.47–0.60 <0.001
Radiotherapy
   No Ref 1 0.0250 11.57
   Yes 0.74 0.71–0.77 <0.001
Chemotherapy
   No Ref 3 0.0302 8.72
   Yes 0.62 0.59–0.65 <0.001

CI, confidence interval; Cox PH, Cox proportional hazard; GBSA, gradient boosting survival analysis; RSF, random survival forest; SCLC, small cell lung cancer; ST, survival tree.

Figure 2 The feature importance obtained from ST, RSF and GBSA models. GBSA, gradient boosting survival analysis; RSF, random survival forest; ST, survival tree.

Specifically, in the Cox PH model, the P values for radiotherapy, chemotherapy, and surgery were all less than 0.001, indicating a strong relationship between each treatment and decreased risk of death. For radiotherapy, the hazard ratio (HR) was 0.74, indicating a 26% reduction in the hazard of death when holding other covariates constant. Similarly, for chemotherapy, the HR was 0.62, corresponding to a 38% reduction in the hazard of death, and for surgery, the HR was 0.53, corresponding to a 47% reduction in the hazard of death. These treatments were therefore associated with moderate beneficial effects on the prognostic outcomes of patients with SCLC.
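The percentage reductions quoted above follow from the hazard ratios by simple arithmetic. As a worked example using the surgery estimate reported in Table 4:

```latex
\text{reduction in hazard} = (1 - \mathrm{HR}) \times 100\%
                           = (1 - 0.53) \times 100\% = 47\%
```

The same calculation gives 38% for chemotherapy (HR 0.62 in Table 4) and 26% for radiotherapy (HR 0.74 in Table 4).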

Stratified survival probability

After considering the performance of different models, we selected Cox PH to further explore the survival outcomes of different treatments in subgroups of patients with SCLC. After controlling for demographic and clinical characteristics, the Cox PH model revealed a significant difference in OS between treated and non-treated patients (Figure 3). The median OS of surgical patients was 23 months [95% confidence interval (CI): 18–31] compared to 12 months (95% CI: 11–14) for non-surgical patients. Patients who received radiotherapy had a median OS of 15 months (95% CI: 13–18), slightly longer than the median OS of 12 months (95% CI: 11–14) for patients who did not receive radiotherapy. Patients who underwent chemotherapy had a median OS of 17 months (95% CI: 15–21), compared to 13 months (95% CI: 10–16) for those who did not receive chemotherapy. The median value was presented as a dotted line in the figures.
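The median OS marked in such plots is the time at which the survival curve first reaches 50%. A minimal Python sketch of that lookup follows. The curve is synthetic and is not taken from the study data, and the function name is hypothetical.

```python
def median_survival(curve):
    """Return the first time at which the survival probability is <= 0.5.

    `curve` is a list of (time_in_months, survival_probability) pairs
    sorted by time. Returns None if the curve never drops to 0.5.
    """
    for time, prob in curve:
        if prob <= 0.5:
            return time
    return None


# Synthetic step curve for illustration only.
toy_curve = [(0, 1.00), (6, 0.81), (12, 0.62), (18, 0.50), (24, 0.37)]
median_months = median_survival(toy_curve)
```

For the synthetic curve above, the median is 18 months. When a curve stays above 0.5 for the whole follow-up period, the median is not reached and the function returns None.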

Figure 3 Predicted survival of patients with SCLC stratified by type of treatment. SCLC, small cell lung cancer.

Given that M category and stage were prominent risk factors for predicting survival probability in SCLC, we closely examined the effects of the three types of treatment stratified by M category and cancer stage (Figure 4 and Figure S2). In each of the three strata of cancer stage (stage II localized, stage III regional spread, stage IV distant spread), treated patients had better outcomes than non-treated patients for all three treatments. However, the benefit of surgery was most evident for SCLC patients at an early stage of cancer (stage II localized), while the benefit of radiotherapy was most apparent for SCLC patients at stage III regional spread. There was no detectable heterogeneity across strata of cancer stage for patients who underwent chemotherapy (Figure 4). Stratified multivariate survival analysis using M category confirmed that both surgery and radiotherapy offered greater benefits for patients without distant metastases, while the effect of chemotherapy did not differ significantly between the two M categories (Figure S2).

Figure 4 Predicted survival of patients with SCLC stratified by type of treatment and cancer stage. SCLC, small cell lung cancer.

Discussion

In this retrospective cohort study utilizing the SEER database, we aimed to investigate the survival outcomes of different treatments (surgery, chemotherapy, and radiotherapy) for patients with SCLC. Our findings showed that surgery and radiotherapy were associated with a statistically significant improvement in median OS for patients with limited-stage SCLC. To handle the large dataset, we employed machine learning models, including RSF, ST, and GBSA. The Cox PH method performed well as a conventional method for SCLC survival prediction, and GBSA achieved a comparable BSc. The predictive capabilities represented by the C-index in the test datasets were similar across RSF, ST, Cox PH, and GBSA, suggesting that machine learning is not always superior and may offer an advantage only in situations where conventional methods reach their limits.

While the predictive role of tumor staging in SCLC is well-established, our study adds to the evidence supporting the necessity of stage-specific treatment approaches. For instance, the OS benefits of surgery are most apparent in early-stage disease, while concurrent chemotherapy and radiotherapy are pivotal for managing limited-stage disease. The stratification offered by staging allows clinicians to optimize treatment decisions, improving patient outcomes.

Our study suggests that surgery can improve median OS in early-stage SCLC, aligning with retrospective studies demonstrating OS benefits of surgical intervention compared to non-surgical treatments. A 2023 retrospective cohort study further supports the importance of surgery for limited-stage SCLC, showing real-world evidence of improved outcomes in carefully selected cases (30). Additionally, multimodal approaches, including surgery followed by chemotherapy and/or radiotherapy, have shown improved outcomes, particularly for T1–2N0–1M0 SCLC patients (31). However, studies caution against surgical intervention in advanced stages because of limited efficacy and complications from paraneoplastic syndromes.

The findings align with a 2016 systematic review that highlighted the association between surgical intervention and longer median OS across various stages of SCLC. This review, which analyzed 21 studies, demonstrated that surgery resulted in significantly improved OS compared to non-surgical treatments (chemotherapy and/or radiotherapy). Specifically, patients in stage I–II achieved a median OS of 31–34 months with surgery compared to 23–24 months with non-surgical approaches (P<0.001). Similarly, for stage I–III SCLC, the median OS was 26 months with surgery versus 6 months without (P<0.001), while patients in stage IIB–IIIC had a median OS of 20 months with surgery compared to 14 months for non-surgical treatments (P<0.001) (11). These results align with guidelines from the National Comprehensive Cancer Network (NCCN) and the American College of Chest Physicians (ACCP), which recommend surgery primarily for patients with early-stage SCLC without nodal involvement (32,33). Further randomized trials are needed to define the role of surgery, particularly in advanced stages.

Chemotherapy and radiotherapy continue to be mainstays of SCLC treatment. Our findings build on prior evidence showing their OS-favoring effects, particularly in patients with limited-stage disease. For example, a 2020 review highlighted the importance of chemotherapy with cisplatin and etoposide combined with thoracic radiotherapy as the standard of care for early-stage SCLC (34). Similarly, data from a systematic literature review emphasize the critical role of immunotherapy as an adjunct to chemotherapy and radiotherapy in improving OS in advanced disease stages (35). The role of radiotherapy has also been reinforced by recent studies (36,37). In extensive-stage SCLC, radiotherapy has shown benefits in local control and palliation when combined with systemic chemotherapy. In limited-stage SCLC, the combination of chemotherapy and radiotherapy remains essential for improving survival rates (34,38).

Our study has limitations due to the inherent nature of the SEER database, including the lack of critical clinical confounders such as baseline lung function, specific treatment regimens, treatment sequence, and tobacco usage. Although we analyzed seven variables closely associated with OS, unmeasured confounders could bias our estimation of treatment effects. Additionally, our findings are limited to internal cross-validation due to the lack of an appropriate independent cohort-based database for external validation. Future research with larger, multi-institutional datasets is necessary to address these limitations.

Our results emphasize the importance of tailoring treatment based on disease stage and patient-specific characteristics. While surgery offers significant benefits for early-stage SCLC, its role in advanced stages requires careful evaluation. Multimodal approaches, including chemotherapy and radiotherapy, remain critical for improving outcomes across all disease stages. The integration of machine learning approaches highlights their potential to address large, complex datasets in oncology. However, our findings underscore the need for external validation and refinement of these methods before widespread clinical adoption. Future analyses should address treatment efficacy in underrepresented patient subgroups (e.g., by age, sex, or ethnicity) to ensure equitable care and improve model generalizability.


Conclusions

In this population-based study, surgery and radiotherapy were associated with improved OS in patients with limited-stage SCLC, while chemotherapy remains a critical component of multimodal treatment strategies. Our findings reinforce current clinical guidelines and provide valuable insights for optimizing treatment approaches. Although machine learning models did not demonstrate clear advantages over conventional methods, they remain promising tools for addressing complex oncologic datasets. Further research is needed to validate these findings and explore strategies for improving outcomes across all stages of SCLC.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-24-331/rc

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-24-331/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-24-331/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
  2. Bunn PA Jr, Minna JD, Augustyn A, et al. Small Cell Lung Cancer: Can Recent Advances in Biology and Molecular Biology Be Translated into Improved Outcomes? J Thorac Oncol 2016;11:453-74.
  3. Byers LA, Rudin CM. Small cell lung cancer: where do we go from here? Cancer 2015;121:664-72.
  4. Zamay TN, Zamay GS, Kolovskaya OS, et al. Current and Prospective Protein Biomarkers of Lung Cancer. Cancers (Basel) 2017;9:155.
  5. Pietanza MC, Byers LA, Minna JD, et al. Small cell lung cancer: will recent progress lead to improved outcomes? Clin Cancer Res 2015;21:2244-55.
  6. Travis WD. Update on small cell carcinoma and its differentiation from squamous cell carcinoma and other non-small cell carcinomas. Mod Pathol 2012;25:S18-30.
  7. Glisson BS, Byers LA, Nicholson A. Pathobiology and staging of small cell carcinoma of the lung. UpToDate com May. 2013.
  8. Walters S, Maringe C, Coleman MP, et al. Lung cancer survival and stage at diagnosis in Australia, Canada, Denmark, Norway, Sweden and the UK: a population-based study, 2004-2007. Thorax 2013;68:551-64.
  9. Fiorentino FP, Macaluso M, Miranda F, et al. CTCF and BORIS regulate Rb2/p130 gene transcription: a novel mechanism and a new paradigm for understanding the biology of lung cancer. Mol Cancer Res 2011;9:225-33.
  10. Johal S, Hettle R, Carroll J, et al. Real-world treatment patterns and outcomes in small-cell lung cancer: a systematic literature review. J Thorac Dis 2021;13:3692-707.
  11. Stokes M, Berfeld N, Gayle A, et al. A systematic literature review of real-world treatment outcomes of small cell lung cancer. Medicine (Baltimore) 2022;101:e29783.
  12. Wang S, Zimmermann S, Parikh K, et al. Current Diagnosis and Management of Small-Cell Lung Cancer. Mayo Clin Proc 2019;94:1599-622.
  13. Kalemkerian GP, Akerley W, Bogner P, et al. Small cell lung cancer. J Natl Compr Canc Netw 2013;11:78-98.
  14. Yang S, Zhang Z, Wang Q. Emerging therapies for small cell lung cancer. J Hematol Oncol 2019;12:47.
  15. Yang CJ, Chan DY, Shah SA, et al. Long-term Survival After Surgery Compared With Concurrent Chemoradiation for Node-negative Small Cell Lung Cancer. Ann Surg 2018;268:1105-12.
  16. Brock MV, Hooker CM, Syphard JE, et al. Surgical resection of limited disease small cell lung cancer in the new era of platinum chemotherapy: Its time has come. J Thorac Cardiovasc Surg 2005;129:64-72.
  17. Inoue M, Miyoshi S, Yasumitsu T, et al. Surgical results for small cell lung cancer based on the new TNM staging system. Thoracic Surgery Study Group of Osaka University, Osaka, Japan. Ann Thorac Surg 2000;70:1615-9.
  18. Chandra V, Allen MS, Nichols FC 3rd, et al. The role of pulmonary resection in small cell lung cancer. Mayo Clin Proc 2006;81:619-24.
  19. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): the TRIPOD Statement. Br J Surg 2015;102:148-58.
  20. Wang Y, Zheng Q, Jia B, et al. Effects of Surgery on Survival of Early-Stage Patients With SCLC: Propensity Score Analysis and Nomogram Construction in SEER Database. Front Oncol 2020;10:626.
  21. Wongvibulsin S, Wu KC, Zeger SL. Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis. BMC Med Res Methodol 2019;20:1.
  22. Jaeger BC, Long DL, Long DM, et al. Oblique random survival forests. Ann Appl Stat 2019;13:1847-83.
  23. Chen Y, Jia Z, Mercola D, et al. A gradient boosting algorithm for survival analysis via direct optimization of concordance index. Comput Math Methods Med 2013;2013:873595.
  24. Atkinson EJ, Therneau TM. An introduction to recursive partitioning using the RPART routines. Rochester: Mayo Foundation; 2000.
  25. Adler AI, Painsky A. Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection. Entropy (Basel) 2022;24:687.
  26. Breiman L. Manual on setting up, using, and understanding random forests v3.1. Statistics Department University of California Berkeley, CA, USA 2002;1:3-42.
  27. Harrell FE Jr, Califf RM, Pryor DB, et al. Evaluating the yield of medical tests. JAMA 1982;247:2543-6.
  28. Vasilev I, Petrovskiy M, Mashechkin IV, editors. Survival Analysis Algorithms based on Decision Trees with Weighted Log-rank Criteria. Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2022), pages 132-140.
  29. Graf E, Schmoor C, Sauerbrei W, et al. Assessment and comparison of prognostic classification schemes for survival data. Stat Med 1999;18:2529-45.
  30. Blackhall F, Girard N, Livartowski A, et al. Treatment patterns and outcomes among patients with small-cell lung cancer (SCLC) in Europe: a retrospective cohort study. BMJ Open 2023;13:e052556.
  31. He J, Xu S, Pan H, et al. Treatments for combined small cell lung cancer patients. Transl Lung Cancer Res 2020;9:1785-94.
  32. National Comprehensive Cancer Network. National Comprehensive Cancer Network (NCCN) guidelines. 2015.
  33. Simon GR, Turrisi A; American College of Chest Physicians. Management of small cell lung cancer: ACCP evidence-based clinical practice guidelines (2nd edition). Chest 2007;132:324S-39S.
  34. Saltos A, Shafique M, Chiappori A. Update on the Biology, Management, and Treatment of Small Cell Lung Cancer (SCLC). Front Oncol 2020;10:1074.
  35. Jones GS, Elimian K, Baldwin DR, et al. A systematic review of survival following anti-cancer treatment for small cell lung cancer. Lung Cancer 2020;141:44-55.
  36. Tjong MC, Mak DY, Shahi J, et al. Current Management and Progress in Radiotherapy for Small Cell Lung Cancer. Front Oncol 2020;10:1146.
  37. Zhu H, Zhou Z, Wang Y, et al. Thoracic radiation therapy improves the overall survival of patients with extensive-stage small cell lung cancer with distant metastasis. Cancer 2011;117:5423-31.
  38. Zhang Y, Zeng Y, Yin Y, et al. The role of radiotherapy in extensive-stage small cell lung cancer: insights from treatment failure patterns in the era of immunotherapy. BMC Cancer 2024;24:1534.
Cite this article as: Zhao Z, Cheng X, Gao Y, Tan F, Xue Q, Gao S, He J. Predicting survival in small cell lung cancer patients undergoing various treatments: a machine learning approach. Transl Lung Cancer Res 2025;14(3):736-748. doi: 10.21037/tlcr-24-331
