Development and validation of a dynamic survival nomogram for metastatic non-small cell lung cancer based on the SEER database and an external validation cohort
Original Article

Qing Wang1#, Yansu Wang2#, Xinyu Wang1#, Yusuke Nakamura3, Per Hydbring4, Yoshikane Yamauchi5, Xiaojing Zhao1, Min Cao1

1Department of Thoracic Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; 2Department of Radiotherapy, Shanghai Tenth People’s Hospital, Tongji University School of Medicine, Shanghai, China; 3National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan; 4Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden; 5Department of Surgery, Teikyo University School of Medicine, Tokyo, Japan

Contributions: (I) Conception and design: Q Wang; (II) Administrative support: M Cao, X Zhao; (III) Provision of study materials or patients: Q Wang, M Cao; (IV) Collection and assembly of data: Y Wang, Q Wang; (V) Data analysis and interpretation: X Wang, Q Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Min Cao. Department of Thoracic Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, 160 Pujian Road, Shanghai 200127, China. Email: drcaomin@126.com; Xiaojing Zhao. Department of Thoracic Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, 160 Pujian Road, Shanghai 200127, China. Email: zhaoxiaojing@renji.com.

Background: Limited treatment efficacy and poor prognosis are common in patients with metastatic non-small cell lung cancer (NSCLC). An accurate and practical nomogram can help clinicians predict patient prognosis. However, no nomogram designed specifically to predict the overall survival (OS) of patients with metastatic NSCLC has been reported.

Methods: A total of 18,343 patients diagnosed with metastatic NSCLC in the Surveillance, Epidemiology, and End Results (SEER) database were included and divided into a training cohort (n=12,840) and an internal validation cohort (n=5,503); 242 patients from Renji Hospital were additionally enrolled as the external validation cohort. Demographic, clinical, and OS data were collected. A Cox proportional hazards regression model was used to develop a nomogram based on the training cohort. The nomogram was validated using C-indexes, calibration curves, receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and Kaplan-Meier survival curves.

Results: The multivariate Cox regression model identified 16 independent risk factors for OS (all P<0.05), which were integrated into the nomogram with a C-index of 0.702 [95% confidence interval (CI): 0.684–0.720]. The nomogram also exhibited good prognostic value in the internal validation cohort (C-index =0.699, 95% CI: 0.673–0.725) and the external validation cohort (C-index =0.695, 95% CI: 0.653–0.737). The ROC and Kaplan-Meier survival curve analyses demonstrated a high discriminative ability. High-risk patients had significantly less favorable OS than low-risk patients in both the SEER population and the external validation cohort (both P<0.001). DCA showed that the nomogram provided better prognosis prediction than the tumor-node-metastasis (TNM) staging system.

Conclusions: We constructed and validated a dynamic nomogram with 16 variables, based on a large population from the SEER database, to predict the prognosis of patients with metastatic NSCLC. The nomogram is expected to provide higher predictive ability and accuracy than the TNM staging system.

Keywords: Non-small cell lung cancer (NSCLC); metastasis; prognosis; nomogram; overall survival (OS)


Submitted May 24, 2022. Accepted for publication Aug 10, 2022.

doi: 10.21037/tlcr-22-544


Introduction

In recent years, the incidence and mortality of lung cancer have declined sharply, and the survival of lung cancer patients, especially those with non-small cell lung cancer (NSCLC), has improved significantly. The 2-year relative survival of NSCLC is estimated to have improved from 34% (patients diagnosed from 2009 through 2010) to 42% (patients diagnosed from 2015 through 2016), including an absolute improvement of 5–6% for patients at every stage (1). However, lung cancer remains the second most common cancer and the leading cause of cancer death, with an estimated 2.2 million newly diagnosed cases and 1.8 million deaths worldwide in 2020 (2). According to the American Joint Committee on Cancer (AJCC) 8th edition tumor-node-metastasis (TNM) staging system, metastatic lung cancer is defined as M1 disease (stage IV), comprising M1a (separate tumor nodule/s in a contralateral lobe; pleural nodules, or malignant pleural or pericardial effusion), M1b [single extrathoracic metastasis or involvement of a single distant (non-regional) node], and M1c (multiple extrathoracic metastases in 1 or multiple organs) (3). Stage IV patients account for about 35% of all patients, and their 2- and 5-year survival rates have been reported at only 23% and 10% for stage IVA, and 10% and 0% for stage IVB, respectively (4). Although novel molecular-targeted therapies and immunotherapies have been developed, stage IV patients still have a very poor prognosis (5).

A predictive model helps clinicians estimate disease progression and predict patient survival from baseline characteristics and clinical data. A nomogram based on a Cox proportional hazards regression model is a widely applied tool for predicting the survival of patients with malignant tumors (6). Many studies have reported nomograms for lung cancer. Liang et al. developed a nomogram based on a Chinese multi-institutional registry of 6,111 patients with resected NSCLC and validated it in a separate cohort of 2,148 patients from the International Association for the Study of Lung Cancer (IASLC) database. The nomogram included 6 independent prognostic factors and reached a C-index of 0.71, higher than that of the TNM staging system for predicting overall survival (OS) (7).

The TNM staging system is the most widely used tool for guiding clinical treatment and predicting prognosis (8). Wankhede et al. evaluated the 8th AJCC TNM staging system for NSCLC by meta-analysis and reported C-indexes of 0.690 and 0.688 for the 8th and 7th editions, respectively (9). For convenience and ease of use, the TNM staging system includes only three key factors and lacks other information essential for survival analysis, such as age, gender, histology, and treatments. Therefore, numerous nomograms have been developed to predict the prognosis of lung cancer. One study published a nomogram for stage IB NSCLC that incorporated age, gender, histology, differentiation grade, the extent of surgery, and lymph nodes resected. The authors found that the nomogram demonstrated good prognostic applicability and clinical accuracy, with C-index values of 0.637 (95% CI: 0.634–0.641) for the training cohort and 0.667 (95% CI: 0.656–0.678) for the external validation cohort (10). Such nomograms achieve better prognostic performance at the cost of requiring more input factors. However, there has been no previous report of a nomogram for patients with metastatic NSCLC. In this study, we developed a nomogram for patients with metastatic NSCLC based on the Surveillance, Epidemiology, and End Results (SEER) database. We validated the nomogram with an internal validation cohort from the SEER database and an external validation cohort from a single center. We present the following article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-544/rc).


Methods

Patient selection

We selected patients from 18 population-based cancer registries (with additional treatment fields) of the SEER database (http://seer.cancer.gov/). The SEER*Stat program (v 8.3.9; seer.cancer.gov/seerstat) was used to extract the information of patients with lung cancer. The extraction conditions were as follows: “the location of the disease: Lung and Bronchus” and “diagnosis year: 2004–2016”.

The following variables were extracted: “Age recode with <1 year old”, “Race recode (White, Black, Other)”, “Sex”, “Marital status at diagnosis”, “Primary Site – labeled”, “Histologic Type ICD-O-3”, “Grade”, “Laterality”, “Derived AJCC Stage Group, 7th ed (2010–2015)”, “Derived AJCC T, 7th ed (2010–2015)”, “Derived AJCC N, 7th ed (2010–2015)”, “Derived AJCC M, 7th ed (2010–2015)”, “Derived AJCC Stage Group, 6th ed (2004–2015)”, “Derived AJCC T, 6th ed (2004–2015)”, “Derived AJCC N, 6th ed (2004–2015)”, “Derived AJCC M, 6th ed (2004–2015)”, “RX Summ--Surg Prim Site (1998+)”, “RX Summ--Scope Reg LN Sur (2003+)”, “RX Summ--Surg Oth Reg/Dis (2003+)”, “Chemotherapy recode”, “Radiation recode”, “SEER Combined Mets at DX-bone (2010+)”, “SEER Combined Mets at DX-brain (2010+)”, “SEER Combined Mets at DX-liver (2010+)”, “SEER Combined Mets at DX-lung (2010+)”, “Survival months”, “Vital status recode”, “First malignant primary indicator”, “Total number of in situ/malignant tumors for patient”. We screened the selected patients according to the following exclusion criteria: (I) patients diagnosed with small cell lung cancer (SCLC); (II) patients with M0 stage, MX, or unknown M stage; (III) age <18 years; (IV) patients in whom lung cancer was not the first primary tumor; (V) patients with more than 1 malignant tumor; (VI) patients without information about the survival months; (VII) patients with unknown race, marital status, tumor site, grade, T stage, N stage, or metastatic sites. The patients’ T and N stages were converted to the AJCC 8th edition, while the M stage was left unchanged. In the 7th edition, M1b denotes distant metastasis, which the 8th edition divides into M1b (single extrathoracic metastasis) and M1c (multiple extrathoracic metastases).

The selected patients from the SEER database were randomly assigned to the training and internal validation cohorts at a ratio of 7:3 using a bootstrapping technique. For the external validation cohort, we selected patients with metastatic NSCLC diagnosed from 2015 to 2020 at Renji Hospital. Ultimately, a total of 242 patients with complete baseline characteristics and follow-up data were enrolled. Clinical and pathological data were retrieved retrospectively from the hospital database, and follow-up information was collected by telephone interview. Patients lacking follow-up data or other essential clinical data were excluded.
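A 7:3 random assignment of this kind can be sketched in a few lines. The Python snippet below is a minimal illustration only: the authors worked in R and describe a bootstrapping technique whose details are not given, so a plain random permutation is swapped in here. The function name `split_cohort`, the seed, and the use of integer patient IDs are assumptions made for the example.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2022):
    """Randomly split patient IDs into training and validation lists.

    Hypothetical helper: a simple shuffled split, not the authors' procedure.
    """
    rng = random.Random(seed)           # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # truncate to a whole patient
    return shuffled[:n_train], shuffled[n_train:]


# Integer IDs stand in for the 18,343 SEER patients.
train_ids, validation_ids = split_cohort(range(18343))
print(len(train_ids), len(validation_ids))
```

With 18,343 patients, truncating 70% to a whole number gives cohort sizes that match the 12,840 and 5,503 patients reported for the training and internal validation cohorts.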

Ethical statement

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by The Ethics Committee of Renji Hospital, Shanghai Jiao Tong University School of Medicine (Shanghai, China) (No. RA-2020-572), and informed consent was obtained from all the patients.

Nomogram development

We calculated the hazard ratios (HRs) and 95% confidence intervals (CIs) of the risk factors for OS in the training cohort using univariate Cox proportional hazards regression; risk factors with a P value less than 0.05 were then included in the multivariate regression model. Factors that remained independent risk factors (P<0.05 in the multivariate Cox proportional hazards regression analysis) were integrated into the nomogram. The nomogram estimates the probability of OS being less than 3, 6, and 12 months.
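For readers less familiar with the model, the link between the Cox regression coefficients and the probabilities read off the nomogram can be written as follows. This is the standard textbook formulation rather than anything specific to this study. The symbols $h_0$, $S_0$, $\beta$, and $x$ are notation introduced here, and the baseline survival $S_0(t)$ of the fitted model is not reported in the text.

```latex
\begin{align}
h(t \mid x) &= h_0(t)\,\exp\!\left(\beta^{\top} x\right),
  \qquad \mathrm{HR}_j = \exp(\beta_j), \\
S(t \mid x) &= S_0(t)^{\exp\left(\beta^{\top} x\right)}, \\
P(T < t \mid x) &= 1 - S(t \mid x),
  \qquad t \in \{3, 6, 12\}\ \text{months}.
\end{align}
```

Here $x$ is the vector of covariate indicators for a patient, and the nomogram's point scale is a rescaling of the linear predictor $\beta^{\top} x$.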

Nomogram validation

The training, internal validation, and external validation cohorts were used to assess the discriminative ability and calibration of the nomogram. Harrell’s C-statistic (C-index) was adopted as the primary indicator of discriminative power. The C-index ranges from 0.5 to 1, corresponding to discrimination ranging from none to perfect. A prediction model with a C-index higher than 0.7 is usually considered useful and predictive. Time-dependent receiver operating characteristic (ROC) curves with the area under the curve (AUC) at 3, 6, and 12 months were also used to demonstrate discriminative power. We used calibration plots to assess the agreement between observed outcomes and predicted probabilities. The ideal calibration curve is a straight line through the origin with a slope of 1; the closer the prediction line lies to this 45-degree diagonal, the more accurate the model. Finally, we applied decision curve analysis (DCA) to compare the nomogram with the TNM staging system.
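As an illustration of the primary metric, Harrell's C-index can be computed by counting concordant pairs among all comparable patient pairs. The following Python sketch is not the authors' R code: the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied survival times are simply skipped for brevity.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if death was observed, 0 if censored
    risk_scores: model output, higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Pair is comparable when patient i died before patient j's time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable


# Toy data (hypothetical): survival in months, one censored patient.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect_scores = [4, 3, 2, 1]    # risk ordered exactly opposite to survival
reversed_scores = [1, 2, 3, 4]

print(harrell_c_index(toy_times, toy_events, perfect_scores))
print(harrell_c_index(toy_times, toy_events, reversed_scores))
```

A model that ranks every comparable pair correctly scores 1, one that ranks every pair wrongly scores 0, and random ranking scores about 0.5, which is why 0.5 is treated as the floor for a useful model.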

Statistical analysis

We used R software (version 4.0.2; The R Foundation for Statistical Computing, Vienna, Austria) to construct the nomogram. All tests were two-sided, and a P value less than 0.05 was considered statistically significant. We presented categorical variables as proportions and compared them using Chi-square tests or Fisher’s exact test. Following a previous report (11), we also calculated the sum score of each patient based on the Cox proportional hazards regression model. We divided the patients into low-risk and high-risk groups using a cut-off point for risk stratification calculated with the “surv_cutpoint” function of the R package “survminer”. Survival in the low-risk and high-risk groups was compared using Kaplan–Meier survival curves and the log-rank test.
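To show what such a comparison involves, the sketch below runs a Chi-square test on a 2×2 table using the gender counts reported in Table 1. It is an illustrative Python reimplementation, not the authors' R code. The function name `chi_square_2x2` is hypothetical, and the Yates continuity correction is an assumption, since the text does not say whether one was applied.

```python
import math


def chi_square_2x2(a, b, c, d, continuity_correction=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (
        row_totals[0] * col_totals[0] / n,
        row_totals[0] * col_totals[1] / n,
        row_totals[1] * col_totals[0] / n,
        row_totals[1] * col_totals[1] / n,
    )
    statistic = 0.0
    for obs, exp in zip(observed, expected):
        diff = abs(obs - exp)
        if continuity_correction:
            diff = max(0.0, diff - 0.5)  # Yates correction (assumed here)
        statistic += diff * diff / exp
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Gender counts from Table 1: rows = female, male; columns = training, internal validation.
gender_statistic, gender_p = chi_square_2x2(5713, 2484, 7127, 3019)
print(round(gender_statistic, 3), round(gender_p, 3))
```

The resulting P value lands near the 0.427 reported for gender in Table 1 and is well above 0.05, consistent with the two cohorts being balanced on this variable.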


Results

Demographic and clinicopathological characteristics

We listed all analyzed variables of the included patients in the SEER database (Table 1). A total of 18,343 patients in the SEER database were randomly divided into the training cohort (n=12,840) and the internal validation cohort (n=5,503). There was no statistically significant difference between the training cohort and the internal validation cohort in any analyzed variable, including age, race, gender, marital status, primary site, histology, grade, laterality, T stage, N stage, M stage, surgery, chemotherapy, radiation therapy, bone metastasis, liver metastasis, brain metastasis, and lung metastasis.

Table 1

Demographics and clinicopathological characteristics of the training and internal validation cohorts

Variables Training cohort (N=12,840) Internal validation cohort (N=5,503) P value
Age 0.065
   20–54 years 1,745 (13.6%) 818 (14.9%)
   55–64 years 3,565 (27.8%) 1,543 (28.0%)
   65–74 years 4,308 (33.6%) 1,745 (31.7%)
   75–84 years 2,654 (20.7%) 1,158 (21.0%)
   85+ years 568 (4.4%) 239 (4.3%)
Race 0.431
   White 9,899 (77.1%) 4,267 (77.5%)
   Black 1,794 (14.0%) 731 (13.3%)
   Other 1,147 (8.9%) 505 (9.2%)
Gender 0.427
   Female 5,713 (44.5%) 2,484 (45.1%)
   Male 7,127 (55.5%) 3,019 (54.9%)
Marital status 0.34
   Married 6,954 (54.2%) 3,023 (54.9%)
   Others 5,886 (45.8%) 2,480 (45.1%)
Primary site 0.799
   Main bronchus 627 (4.9%) 256 (4.7%)
   Upper lobe 7,695 (59.9%) 3,341 (60.7%)
   Middle lobe 579 (4.5%) 246 (4.5%)
   Lower lobe 3,784 (29.5%) 1,588 (28.9%)
   Overlapping lesion of lung 155 (1.2%) 72 (1.3%)
Histology 0.466
   Adenocarcinoma 6,758 (52.6%) 2,950 (53.6%)
   Squamous 3,152 (24.5%) 1,315 (23.9%)
   Others 2,930 (22.8%) 1,238 (22.5%)
Grade 0.541
   Grade I 701 (5.5%) 289 (5.3%)
   Grade II 3,595 (28.0%) 1,504 (27.3%)
   Grade III 8,127 (63.3%) 3,543 (64.4%)
   Grade IV 417 (3.2%) 167 (3.0%)
Laterality 0.905
   Bilateral 46 (0.4%) 21 (0.4%)
   Left 5,331 (41.5%) 2,269 (41.2%)
   Right 7,463 (58.1%) 3,213 (58.4%)
T stage 0.989
   T0 4 (0.0%) 2 (0.0%)
   T1 1,206 (9.4%) 519 (9.4%)
   T2 2,101 (16.4%) 903 (16.4%)
   T3 1,137 (8.9%) 498 (9.0%)
   T4 8,392 (65.4%) 3,581 (65.1%)
N stage 0.394
   N0 3,093 (24.1%) 1,289 (23.4%)
   N1 1,158 (9.0%) 483 (8.8%)
   N2 5,958 (46.4%) 2,546 (46.3%)
   N3 2,631 (20.5%) 1,185 (21.5%)
M stage 0.268
   M1a 2,551 (19.9%) 1,109 (20.2%)
   M1b 10,091 (78.6%) 4,326 (78.6%)
   M1NOS 198 (1.5%) 68 (1.2%)
Surgery 0.898
   No 11,967 (93.2%) 5,132 (93.3%)
   Yes 873 (6.8%) 371 (6.7%)
Chemotherapy 0.081
   No/unknown 4,873 (38.0%) 2,013 (36.6%)
   Yes 7,967 (62.0%) 3,490 (63.4%)
Radiation 0.341
   No 11,495 (89.5%) 4,953 (90.0%)
   Yes 1,345 (10.5%) 550 (10.0%)
Bone metastasis 0.227
   No 7,946 (61.9%) 3,353 (60.9%)
   Yes 4,894 (38.1%) 2,150 (39.1%)
Liver metastasis 0.365
   No 10,790 (84.0%) 4,654 (84.6%)
   Yes 2,050 (16.0%) 849 (15.4%)
Lung metastasis 0.932
   No 8,535 (66.5%) 3,654 (66.4%)
   Yes 4,305 (33.5%) 1,849 (33.6%)
Brain metastasis 0.297
   No 8,858 (69.0%) 3,753 (68.2%)
   Yes 3,982 (31.0%) 1,750 (31.8%)

In addition, we enrolled a total of 242 patients with metastatic NSCLC at Renji Hospital as the external validation cohort. We compared the demographic and clinicopathological characteristics of the SEER cohort and the external validation cohort (Table 2). All patients in the external validation cohort were Chinese, corresponding to ‘Other’ in the SEER database, and the external validation cohort was older than the SEER cohort (P<0.001). The external validation cohort had significantly higher proportions of T4 stage, M1a stage, surgery, and liver metastasis, and a significantly lower proportion of chemotherapy, than the SEER cohort (all P<0.05). However, there was no statistically significant difference between the 2 cohorts regarding gender, marital status, primary site, histology, grade, laterality, N stage, radiation therapy, bone metastasis, brain metastasis, and lung metastasis (all P>0.05). These significant differences between the 2 cohorts make the external validation a more stringent test of the nomogram. Figure 1 shows the flowchart of the study design.

Table 2

Demographics and clinicopathological characteristics of the SEER and external validation cohorts

Variables External cohort (N=242) SEER cohort (N=18,343) P value
Age <0.001
   20–54 years 26 (10.7%) 2,563 (14.0%)
   55–64 years 60 (24.8%) 5,108 (27.8%)
   65–74 years 66 (27.3%) 6,053 (33.0%)
   75–84 years 53 (21.9%) 3,812 (20.8%)
   85+ years 37 (15.3%) 807 (4.4%)
Race <0.001
   White 0 (0%) 14,166 (77.2%)
   Black 0 (0%) 2,525 (13.8%)
   Other 242 (100%) 1,652 (9.0%)
Gender 0.912
   Female 109 (45.0%) 8,197 (44.7%)
   Male 133 (55.0%) 10,146 (55.3%)
Marital status 0.061
   Married 117 (48.3%) 9,977 (54.4%)
   Others 125 (51.7%) 8,366 (45.6%)
Primary site 0.359
   Main bronchus 8 (3.3%) 883 (4.8%)
   Upper lobe 138 (57.0%) 11,036 (60.2%)
   Middle lobe 15 (6.2%) 825 (4.5%)
   Lower lobe 76 (31.4%) 5,372 (29.3%)
   Overlapping lesion of lung 5 (2.1%) 227 (1.2%)
Histology 0.160
   Adenocarcinoma 120 (49.6%) 9,708 (52.9%)
   Squamous 72 (29.8%) 4,467 (24.4%)
   Others 50 (20.7%) 4,168 (22.7%)
Grade 0.329
   Grade I 8 (3.3%) 990 (5.4%)
   Grade II 66 (27.3%) 5,099 (27.8%)
   Grade III 163 (67.4%) 11,670 (63.6%)
   Grade IV 5 (2.1%) 584 (3.2%)
Laterality 0.504
   Bilateral 0 (0%) 67 (0.4%)
   Left 106 (43.8%) 7,600 (41.4%)
   Right 136 (56.2%) 10,676 (58.2%)
T stage 0.012
   T0 + T1 19 (7.9%) 1,731 (9.4%)
   T2 22 (9.1%) 3,004 (16.4%)
   T3 25 (10.3%) 1,635 (8.9%)
   T4 176 (72.7%) 11,973 (65.3%)
N stage 0.177
   N0 51 (21.1%) 4,382 (23.9%)
   N1 17 (7.0%) 1,641 (8.9%)
   N2 111 (45.9%) 8,504 (46.4%)
   N3 63 (26.0%) 3,816 (20.8%)
M stage 0.002
   M1a 68 (28.1%) 3,660 (20.0%)
   M1b 168 (69.4%) 14,417 (78.6%)
   M1NOS 6 (2.5%) 266 (1.5%)
Surgery 0.007
   No 215 (88.8%) 17,099 (93.2%)
   Yes 27 (11.2%) 1,244 (6.8%)
Chemotherapy 0.024
   No/unknown 108 (44.6%) 6,886 (37.5%)
   Yes 134 (55.4%) 11,457 (62.5%)
Radiation 0.093
   No 225 (93.0%) 16,448 (89.7%)
   Yes 17 (7.0%) 1,895 (10.3%)
Bone metastasis 0.290
   No 141 (58.3%) 11,299 (61.6%)
   Yes 101 (41.7%) 7,044 (38.4%)
Liver metastasis 0.040
   No 192 (79.3%) 15,444 (84.2%)
   Yes 50 (20.7%) 2,899 (15.8%)
Lung metastasis 0.110
   No 149 (61.6%) 12,189 (66.5%)
   Yes 93 (38.4%) 6,154 (33.5%)
Brain metastasis 0.459
   No 161 (66.5%) 12,611 (68.8%)
   Yes 81 (33.5%) 5,732 (31.2%)

SEER, Surveillance, Epidemiology, and End Results.

Figure 1 Flowchart of patient screening and study design. SEER, Surveillance, Epidemiology, and End Results; SCLC, small cell lung cancer; NSCLC, non-small cell lung cancer; ROC, receiver operating characteristic; DCA, decision curve analysis.

Univariate and multivariate analysis in the training cohort

We conducted univariate and multivariate analyses using the Cox proportional hazards regression model in the training cohort (n=12,840), in which a total of 11,446 events were recorded (Table 3). The univariate analysis showed that the vast majority of the variables, including age, race, gender, marital status, primary site, histology, grade, T stage, N stage, M stage, surgery, chemotherapy, radiation therapy, bone metastasis, liver metastasis, and brain metastasis, were significantly associated with OS (all P<0.05), with the exceptions of laterality and lung metastasis (P>0.05). When incorporated into the multivariate model, all included variables remained statistically significant after stepwise regression (all P<0.05).

Table 3

Univariate and multivariate Cox regression analysis of each factor’s ability for predicting OS in the training cohort

Variables Univariate analysis Multivariate analysis
HR (95% CI) P value HR (95% CI) P value
Age
   20–54 years Reference Reference
   55–64 years 1.182 (1.111–1.258) <0.001 1.150 (1.080–1.224) <0.001
   65–74 years 1.356 (1.277–1.441) <0.001 1.301 (1.224–1.383) <0.001
   75–84 years 1.540 (1.443–1.643) <0.001 1.401 (1.309–1.498) <0.001
   85+ years 1.736 (1.573–1.917) <0.001 1.351 (1.219–1.497) <0.001
Race
   Black Reference Reference
   White 0.954 (0.905–1.005) 0.081 1.035 (0.981–1.092) 0.205
   Other 0.684 (0.631–0.741) <0.001 0.757 (0.698–0.822) <0.001
Gender
   Female Reference Reference
   Male 1.287 (1.240–1.335) <0.001 1.237 (1.191–1.285) <0.001
Marital status
   Married Reference Reference
   Others 1.211 (1.167–1.256) <0.001 1.151 (1.108–1.196) <0.001
Primary site
   Main bronchus Reference Reference
   Upper lobe 0.761 (0.700–0.828) <0.001 0.843 (0.774–0.918) <0.001
   Middle lobe 0.699 (0.620–0.788) <0.001 0.794 (0.704–0.896) <0.001
   Lower lobe 0.789 (0.723–0.861) <0.001 0.877 (0.803–0.958) 0.003
   Overlapping lesion of lung 0.714 (0.591–0.863) <0.001 0.815 (0.674–0.984) 0.034
Histology
   Adenocarcinoma Reference Reference
   Squamous 1.341 (1.282–1.402) <0.001 1.198 (1.143–1.255) <0.001
   Others 1.229 (1.173–1.286) <0.001 1.167 (1.113–1.224) <0.001
Grade
   Grade I Reference Reference
   Grade II 1.288 (1.178–1.409) <0.001 1.185 (1.082–1.298) <0.001
   Grade III 1.668 (1.531–1.817) <0.001 1.448 (1.327–1.580) <0.001
   Grade IV 1.961 (1.724–2.232) <0.001 1.643 (1.440–1.875) <0.001
Laterality
   Bilateral Reference NA
   Left 0.914 (0.672–1.244) 0.570 NA
   Right 0.926 (0.681–1.260) 0.627 NA
T stage
   T0 + T1 Reference Reference
   T2 1.200 (1.112–1.296) <0.001 1.218 (1.128–1.315) <0.001
   T3 1.408 (1.292–1.535) <0.001 1.300 (1.191–1.419) <0.001
   T4 1.379 (1.291–1.472) <0.001 1.427 (1.335–1.526) <0.001
N stage
   N0 Reference Reference
   N1 1.171 (1.089–1.259) <0.001 1.173 (1.090–1.261) <0.001
   N2 1.318 (1.258–1.381) <0.001 1.299 (1.238–1.364) <0.001
   N3 1.328 (1.256–1.404) <0.001 1.409 (1.329–1.493) <0.001
M stage
   M1a Reference Reference
   M1b 1.403 (1.338–1.471) <0.001 1.228 (1.161–1.300) <0.001
   M1NOS 1.296 (1.112–1.510) <0.001 1.124 (0.961–1.313) 0.142
Surgery
   No Reference Reference
   Yes 0.444 (0.409–0.481) <0.001 0.535 (0.490–0.584) <0.001
Chemotherapy
   No/unknown Reference Reference
   Yes 0.462 (0.445–0.480) <0.001 0.426 (0.410–0.444) <0.001
Radiation
   No Reference Reference
   Yes 0.7095 (0.667–0.7547) <0.001 0.892 (0.835–0.954) <0.001
Bone metastasis
   No Reference Reference
   Yes 1.252 (1.206–1.300) <0.001 1.248 (1.197–1.301) <0.001
Liver metastasis
   No Reference Reference
   Yes 1.434 (1.365–1.506) <0.001 1.299 (1.234–1.366) <0.001
Lung metastasis
   No Reference NA
   Yes 0.9835 (0.946–1.022) 0.4014 NA
Brain metastasis
   No Reference Reference
   Yes 1.171 (1.126–1.218) <0.001 1.332 (1.274–1.393) <0.001

OS, overall survival; CI, confidence interval; HR, hazard ratio; NA, not available.
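To make the multivariate column of Table 3 concrete: under the Cox model, hazard ratios for different covariates multiply. The Python sketch below combines a few multivariate HRs from Table 3 for a hypothetical patient profile, not a patient from the study. The dictionary keys and the function name `relative_hazard` are labels invented for this example.

```python
import math

# Multivariate hazard ratios copied from Table 3 (reference level = 1.0).
MULTIVARIATE_HR = {
    "male": 1.237,             # vs female
    "grade_III": 1.448,        # vs grade I
    "M1b": 1.228,              # vs M1a
    "bone_metastasis": 1.248,  # vs no bone metastasis
    "chemotherapy": 0.426,     # vs no/unknown chemotherapy
}


def relative_hazard(levels):
    """Hazard relative to a patient at the reference level of every listed factor.

    Under proportional hazards the HRs multiply; covariates not listed are
    assumed equal between the two patients being compared.
    """
    return math.prod(MULTIVARIATE_HR[level] for level in levels)


# Hypothetical profile: male, grade III, M1b with bone metastasis, given chemotherapy.
profile = ["male", "grade_III", "M1b", "bone_metastasis", "chemotherapy"]
with_chemo = relative_hazard(profile)
without_chemo = relative_hazard(profile[:-1])
print(round(with_chemo, 3), round(without_chemo, 3))
```

In this illustration, chemotherapy (HR 0.426) offsets most of the excess hazard accumulated from the four adverse factors, which is the kind of trade-off the nomogram's point scale encodes.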

Development of the nomogram

We established the nomogram based on the multivariate model (Figure 2). A total of 16 risk factors were included: age, race, gender, marital status, primary site, histology, grade, T stage, N stage, M stage, surgery, chemotherapy, radiation therapy, bone metastasis, liver metastasis, and brain metastasis. Since most patients survived less than 1 year, we built the nomogram to predict the survival probability at 3, 6, and 12 months. The C-index of the nomogram was 0.702 (95% CI: 0.684–0.720). For example, consider a 40-year-old divorced white male patient diagnosed with grade III adenocarcinoma of the left upper lobe, staged T1N1M1b (stage IV) with bone metastasis, who received no surgery, chemotherapy, or radiation. According to the nomogram, this patient would score 1,100 points, corresponding to probabilities of 0.386, 0.600, and 0.813 of surviving less than 3, 6, and 12 months, respectively. The nomogram was published online at https://pillawang.shinyapps.io/dynnomapp/.

Figure 2 Nomogram for predicting the probability of 3-, 6-, and 12-month OS in patients with metastatic NSCLC. NSCLC, non-small cell lung cancer; OS, overall survival. ***, P<0.001.

Validation of the nomogram

We used the internal validation cohort from the SEER database (n=5,503) and the external validation cohort (n=242) to validate the nomogram. The nomogram also exhibited good prognostic value in the internal validation cohort (C-index =0.699, 95% CI: 0.673–0.725) and the external validation cohort (C-index =0.695, 95% CI: 0.653–0.737). We also drew calibration plots for the nomogram in the training, internal validation, and external validation cohorts (Figure 3) using 1,000 bootstrap resamples. The calibration plots showed good concordance between the predicted and observed 3-, 6-, and 12-month OS probabilities in both internal and external validation. However, we noticed that the 12-month OS rate of the external validation cohort was higher than those of the training and internal validation cohorts.

Figure 3 Calibration curves predicting the survival probability less than 3 (A), 6 (B), and 12 (C) months in the training, internal, and external cohorts.

The ROC analysis showed that the nomogram had a high discriminative ability in all cohorts (Figure 4). The training cohort’s 3-, 6-, and 12-month AUCs were 0.781, 0.762, and 0.754, respectively. The internal validation cohort’s 3-, 6-, and 12-month AUCs were 0.777, 0.754, and 0.747, respectively. The external validation cohort’s 3-, 6-, and 12-month AUCs were 0.793, 0.753, and 0.759, respectively.

Figure 4 ROC curves and AUCs at 3, 6, and 12 months in the training cohort (A), internal validation cohort (B), and external validation cohort (C). ROC, receiver operating characteristic; AUC, area under the curve.

Survival and DCA analysis

The cut-off point for the risk score from the Cox proportional hazards regression model was set at 1.05, dividing the patients into high- and low-risk groups. We compared survival between the high- and low-risk groups using Kaplan–Meier survival curves (Figure 5), which showed a significant difference between the two groups in the training, internal validation, and external validation cohorts (all P<0.001). We also performed DCA to compare the prediction performance of the nomogram and the TNM staging system (Figure 6). The results demonstrated that the nomogram was better than the TNM staging system in predicting 3-, 6-, and 12-month OS. The C-index of the TNM staging system was 0.563 (95% CI: 0.560–0.565).

Figure 5 Kaplan-Meier curves of OS for risk stratification in the training cohort (A), internal validation cohort (B), and the external validation cohort (C). OS, overall survival.
Figure 6 DCA of AJCC 8th TNM stage and nomogram for 3-, 6-, and 12-month OS of the training (A), internal (B), and external cohorts (C). DCA, decision curve analysis; AJCC, American Joint Committee on Cancer; TNM, tumor-node-metastasis; OS, overall survival.
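The quantity plotted in a decision curve is the net benefit at each threshold probability. The Python sketch below shows the basic calculation for a binary outcome, such as death within a fixed horizon. It is a simplified illustration with made-up data: the function names are hypothetical, and a survival-data DCA like the one reported above additionally has to account for censoring, which this sketch does not.

```python
def net_benefit(outcomes, predicted_probs, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    outcomes:        1 if the event occurred, 0 otherwise
    predicted_probs: model-predicted event probabilities
    threshold:       threshold probability, strictly between 0 and 1
    """
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_probs)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_probs)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up example: 8 patients, 4 events.
toy_outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.55]

print(net_benefit(toy_outcomes, toy_probs, 0.5))
print(net_benefit_treat_all(toy_outcomes, 0.5))
```

A model is preferred over a comparator at a given threshold when its net benefit is higher, which is how a curve for the nomogram lying above the curve for TNM staging is read.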

Discussion

According to the latest reports in the US, the overall incidence of NSCLC per 100,000 dropped from 46.4 in 2010 to 40.9 in 2017, and that of stage IV disease at diagnosis decreased slightly from 21.7 to 19.6 (12). Nevertheless, the 5-year survival probability decreases sharply with advancing stage, from 50–65% for stage I to 2–3% for stage IV (13). In this study, we built a nomogram for stage IV patients based on the SEER database and validated it with internal and external validation cohorts.

A total of 16 independent risk factors were identified in this study, considerably more than in previous reports. These risk factors fall into 4 categories. Firstly, the demographic characteristics including age, gender, marital status, and race were chosen for the nomogram. The earlier nomogram for stage IB NSCLC contained only age and gender: the authors did not enter race into the univariate analysis and, because the sample size was far smaller than in our study, marital status was not statistically significant (10). Secondly, the tumor information including primary site, histology, grade, T stage, N stage, and M stage was entered into the nomogram. Zheng et al. investigated the incidence, survival, and prognostic factors of lung cancer with bone metastasis and developed a nomogram (14). Their model included age, gender, the total number of sites, histological type, grade, tumor size, and treatment, which was quite different from our study. The total number of sites was limited to 1 in our study, and tumor size was represented by the T stage in our model. Wang et al. compared the prognostic roles of different N descriptors for lung adenocarcinoma, namely the number of positive lymph nodes (NPLN), log odds of positive lymph nodes (LODDS), and lymph node ratio (LNR). They found that LODDS + LNR demonstrated the highest prediction accuracy and developed a nomogram based on the findings (11). None of the nomograms above included the M stage, since the studies were limited to the M0 stage or to bone metastasis. Thirdly, the treatment modalities, including surgery, chemotherapy, and radiation, were vital in the model. All treatments were important protective factors for OS, with significantly lowered hazard ratios (HRs) of 0.535 (95% CI: 0.490–0.584) for surgery, 0.426 (95% CI: 0.410–0.444) for chemotherapy, and 0.892 (95% CI: 0.835–0.954) for radiation. Surprisingly, surgery was significantly associated with improved OS. Chao et al. compared the OS of patients with stage IV extrathoracic metastatic NSCLC who did or did not receive surgery. They demonstrated that surgery could improve the survival of patients with single organ metastasis, while it showed no significant survival benefit in patients with multiple organ metastases (15). Lastly, the metastasis sites were included in the multivariate model. We transformed the 7th AJCC TNM staging into the 8th edition, although the M stage was not changed because the number of metastatic sites was unknown in the SEER database.

A nomogram with a C-index higher than 0.70 is usually considered accurate and useful. Liang’s nomogram had a higher C-index than the 7th AJCC TNM staging system in both the primary cohort (0.71 vs. 0.68, respectively; P<0.01) and the IASLC cohort (0.67 vs. 0.64, respectively; P=0.06). We also calculated the C-index of the TNM staging system in metastatic NSCLC patients, which was lower than that of the nomogram (0.563 vs. 0.702, P<0.001). DCA also demonstrated that the nomogram performed better than the TNM staging system. The calibration plots and Kaplan–Meier survival curves constructed to validate the nomogram indicated that it was as accurate and discriminative in the external validation cohort as in the internal validation cohort. We noticed that the OS of the external cohort was better than that of the SEER cohort. We speculate that this is because the external validation cohort was diagnosed in 2015–2020, when novel therapies had improved the OS of stage IV patients.

To the best of our knowledge, this is the first nomogram for predicting the survival of patients with metastatic NSCLC based on an extensive database with long-term follow-up and validated by a single-center retrospective cohort. We have also provided an online version of the nomogram for prognosis prediction. However, several limitations of this study must be noted. Firstly, our nomogram is more complex than the TNM classification, as 16 items must be considered and analyzed. It is also hard to grade the pathological results accurately: since metastatic NSCLC cases are the main subjects, the pathological specimens were likely to be biopsy specimens, and the entire tumors were not evaluated. Secondly, although we converted the T stage and N stage from the 7th to the 8th AJCC TNM edition, the M stage could not be converted because the SEER database lacks information on the number of metastatic sites. In our nomogram, the M1b and M1c stages of the 8th AJCC TNM edition must both be allocated to the M1b stage. Thirdly, molecular and genetic information, which is becoming an important factor in prognosis, was absent from the nomogram and should be considered in future models. Lastly, only traditional treatments were included in the model, without novel therapies such as targeted therapy and immunotherapy.


Conclusions

We have developed a novel dynamic nomogram for predicting the survival of metastatic NSCLC patients. The internal and external cohort validations demonstrated that the nomogram had good accuracy and discriminative ability. It provides a practical tool for clinicians to evaluate the stage and predict the prognosis of patients with stage IV NSCLC.


Acknowledgments

The authors would like to thank all patients and staff who have participated in the SEER program. The authors also appreciate the academic support from the AME Thoracic Surgery Collaborative Group.

Funding: This study was funded by the 2021 “Clinical+” Excellence Program (Grant No. 2021ZYA001), and Three-year Action Plan Project to Promote Clinical Skills and Clinical Innovation Capability of Municipal Hospitals (No. SHDC2020CR5001), Shanghai Shenkang Hospital Development Center.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-544/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-544/dss

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-22-544/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by The Ethics Committee of Renji Hospital, Shanghai Jiao Tong University School of Medicine (Shanghai, China) (No. RA-2020-572), and informed consent was taken from all the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Siegel RL, Miller KD, Fuchs HE, et al. Cancer Statistics, 2021. CA Cancer J Clin 2021;71:7-33.
  2. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
  3. Lim W, Ridge CA, Nicholson AG, et al. The 8th lung cancer TNM classification and clinical staging system: review of the changes and clinical implications. Quant Imaging Med Surg 2018;8:709-18.
  4. Goldstraw P, Chansky K, Crowley J, et al. The IASLC Lung Cancer Staging Project: Proposals for Revision of the TNM Stage Groupings in the Forthcoming (Eighth) Edition of the TNM Classification for Lung Cancer. J Thorac Oncol 2016;11:39-51.
  5. Planchard D, Popat S, Kerr K, et al. Metastatic non-small cell lung cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 2018;29:iv192-237.
  6. Iasonos A, Schrag D, Raj GV, et al. How to build and interpret a nomogram for cancer prognosis. J Clin Oncol 2008;26:1364-70.
  7. Liang W, Zhang L, Jiang G, et al. Development and validation of a nomogram for predicting survival in patients with resected non-small-cell lung cancer. J Clin Oncol 2015;33:861-9.
  8. Nicholson AG, Chansky K, Crowley J, et al. The International Association for the Study of Lung Cancer Lung Cancer Staging Project: Proposals for the Revision of the Clinical and Pathologic Staging of Small Cell Lung Cancer in the Forthcoming Eighth Edition of the TNM Classification for Lung Cancer. J Thorac Oncol 2016;11:300-11.
  9. Wankhede D. Evaluation of Eighth AJCC TNM Sage for Lung Cancer NSCLC: A Meta-analysis. Ann Surg Oncol 2021;28:142-7.
  10. Zuo Z, Zhang G, Song P, et al. Survival Nomogram for Stage IB Non-Small-Cell Lung Cancer Patients, Based on the SEER Database and an External Validation Cohort. Ann Surg Oncol 2021;28:3941-50.
  11. Wang S, Yu Y, Xu W, et al. Dynamic nomograms combining N classification with ratio-based nodal classifications to predict long-term survival for patients with lung adenocarcinoma after surgery: a SEER population-based study. BMC Cancer 2021;21:653.
  12. Ganti AK, Klein AB, Cotarla I, et al. Update of Incidence, Prevalence, Survival, and Initial Treatment in Patients With Non-Small Cell Lung Cancer in the US. JAMA Oncol 2021;7:1824-32.
  13. Mar J, Arrospide A, Iruretagoiena ML, et al. Changes in lung cancer survival by TNM stage in the Basque country from 2003 to 2014 according to period of diagnosis. Cancer Epidemiol 2020;65:101668.
  14. Zheng XQ, Huang JF, Lin JL, et al. Incidence, prognostic factors, and a nomogram of lung cancer with bone metastasis at initial diagnosis: a population-based study. Transl Lung Cancer Res 2019;8:367-79.
  15. Chao C, Qian Y, Li X, et al. Surgical Survival Benefits With Different Metastatic Patterns for Stage IV Extrathoracic Metastatic Non-Small Cell Lung Cancer: A SEER-Based Study. Technol Cancer Res Treat 2021;20:15330338211033064.
Cite this article as: Wang Q, Wang Y, Wang X, Nakamura Y, Hydbring P, Yamauchi Y, Zhao X, Cao M. Development and validation of a dynamic survival nomogram for metastatic non-small cell lung cancer based on the SEER database and an external validation cohort. Transl Lung Cancer Res 2022;11(8):1678-1691. doi: 10.21037/tlcr-22-544
