Development and validation of an interpretable machine learning model for prediction of occult lymph node metastasis in clinical stage T1 lung adenocarcinoma
Highlight box
Key findings
• We developed an interpretable machine learning (ML) model that accurately predicts occult lymph node metastasis (OLNM) in clinical stage T1 lung adenocarcinoma (LUAD).
What is known and what is new?
• Prediction of OLNM remains challenging, and existing models often lack interpretability or multimodal integration.
• This study introduces a novel ML model that integrates clinical, radiological, and molecular data, providing a robust, interpretable tool for individualized risk assessment.
What is the implication, and what should change now?
• Clinical implementation could optimize the balance between oncological radicality and surgical morbidity, supporting tailored management in early-stage LUAD.
Introduction
Lung cancer remains the leading cause of cancer-related mortality worldwide, with adenocarcinoma being the most common histological subtype (1-4). The widespread implementation of low-dose computed tomography (LDCT) screening has led to a significant increase in the detection of early-stage non-small cell lung cancers (NSCLCs), particularly peripheral small pulmonary nodules (5-7), most of which are lung adenocarcinoma (LUAD). For these patients, surgical resection, often via anatomic lobectomy with systematic lymph node dissection (LND), represents the cornerstone of curative-intent treatment (8,9).
Lymph node status is a critical prognostic factor, and its accurate assessment is the cornerstone of postoperative adjuvant therapy decision-making. The current clinical standard relies on imaging characteristics, such as the size and morphology of lymph nodes on computed tomography (CT) or metabolic activity on positron emission tomography (PET)-CT, to preoperatively predict nodal involvement (10,11). However, these methods have limited sensitivity and specificity for detecting occult lymph node metastasis (OLNM), defined as metastatic disease not identified by preoperative imaging but confirmed by postoperative pathological examination (12). A significant proportion (10–20%) of patients with clinically node-negative (cN0) early-stage LUAD are ultimately upstaged to pathologic node-positive (pN1 or pN2) disease following surgery (13-16). This discrepancy poses a major clinical challenge. For patients with undetected OLNM, limited resection without LND may lead to inadequate staging and a higher risk of disease recurrence due to missed micrometastases (17). Conversely, for the majority without OLNM, systematic LND may represent overtreatment, associated with increased surgical trauma, prolonged operative time, and potential complications such as chylothorax or nerve injury, without conferring a survival benefit (18-20).
Therefore, there is an urgent, unmet clinical need for a highly accurate tool to individually predict the risk of OLNM in patients with LUAD. This would facilitate a more personalized surgical strategy, potentially sparing low-risk patients the morbidity of extensive lymphadenectomy while ensuring that high-risk patients receive the appropriate oncologic resection and adjuvant therapy.
In recent years, machine learning (ML) algorithms have emerged as powerful tools in medical research, capable of identifying complex, non-linear patterns from high-dimensional multimodal data that may elude conventional statistical methods or human interpretation (21). Several studies have begun to explore radiomics features extracted from CT scans, alongside clinical and molecular biomarkers, to predict lymph node metastasis (22-24). However, many existing models are limited by a reliance on radiomics alone or lack rigorous validation in a dedicated cohort of small-sized tumors where the prediction of OLNM is most consequential for surgical planning (25,26).
To address this gap, we retrospectively analyzed cases from the Cancer Hospital of the Chinese Academy of Medical Sciences and developed and validated a novel ML-based predictive model integrating preoperative clinical characteristics, standard CT imaging features, and deep learning-based radiomic signatures. The primary objective of this study was to construct a robust, clinically applicable tool for the individualized, comprehensive prediction of OLNM in patients with cT1N0 LUAD and to compare it with current predictive models, thereby providing a data-driven foundation to optimize the extent of LND and personalize surgical management. A further objective was to identify which clinical and pathological factors are high-risk factors for lymph node metastasis. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1112/rc).
Methods
Study population
A retrospective cohort study was conducted on patients who underwent surgical resection for LUAD at the Department of Thoracic Surgery, Cancer Hospital of the Chinese Academy of Medical Sciences between 1st July 2019 and 31st July 2025.
The initial patient identification was performed through a query of our prospectively maintained surgical database. The inclusion criteria were as follows: (I) pathologically confirmed primary LUAD; (II) availability of complete preoperative thin-section chest CT images obtained within 1 month prior to surgery, with a maximum tumor diameter not exceeding 3 cm on imaging; (III) availability of a detailed postoperative pathological report; and (IV) systematic LND or sampling performed. The key exclusion criteria included: (I) presence of multiple primary lung cancers or distant metastasis; (II) receipt of any neoadjuvant therapy (chemotherapy or radiotherapy) prior to surgery, which could alter tumor characteristics and lymph node status; (III) incomplete clinical or pathological data; (IV) tumors with a maximum diameter exceeding 3 cm; and (V) hilar or mediastinal lymph node diameter >1 cm on high-resolution CT (HRCT). In total, 12,679 patients met the criteria, of whom 12,372 were pathologically LN-negative and 307 were pathologically LN-positive. The inclusion and exclusion criteria for study participants are depicted in Figure 1. OLNM in our study was defined as pathologically proven metastasis (pN1/N2) in patients who were cN0 based on HRCT criteria. In addition, because previous large-scale studies have demonstrated that LUAD in situ does not metastasize to lymph nodes, patients with LUAD in situ were not included in the data (9,27-29).
Given the significant class imbalance between the positive (n=307) and negative (n=12,372) cohorts, which could severely bias the ML model towards predicting the majority class, we employed propensity score matching (PSM) to create a balanced dataset for robust model development. Propensity scores were estimated using a logistic regression model with the presence of OLNM as the outcome and the baseline covariates of gender, age, and tumor location as predictors. PSM was then performed using the nearest-neighbor method without replacement, with a caliper width set to 0.1 of the standard deviation of the logit of the propensity score. Patients were matched at a 1:1 ratio. This process successfully matched all 307 OLNM-positive cases, resulting in a final balanced study population of 614 patients. The balance of covariates before and after matching was assessed using standardized mean differences (SMDs). After matching, the SMD for each matched variable was reduced to below 0.1, indicating a well-balanced cohort (Table S1).
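The matching step described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes the propensity scores have already been estimated, uses a greedy nearest-neighbor search, and takes the caliper as 0.1 times the standard deviation of the logit across all patients (one common convention; the source does not state which standard deviation was used). The function names `match_nearest` and `smd`, and the example scores, are hypothetical.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def match_nearest(cases, controls, caliper_sd=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    cases / controls: dict mapping patient id -> propensity score in (0, 1).
    A pair is accepted only if the logit distance is within the caliper.
    """
    case_logit = {k: logit(v) for k, v in cases.items()}
    ctrl_logit = {k: logit(v) for k, v in controls.items()}
    # Assumption: caliper uses the SD of the logit over all patients.
    all_logits = list(case_logit.values()) + list(ctrl_logit.values())
    caliper = caliper_sd * statistics.stdev(all_logits)

    available = dict(ctrl_logit)  # controls not yet used
    pairs = []
    for case_id, value in case_logit.items():
        if not available:
            break
        best = min(available, key=lambda c: abs(available[c] - value))
        if abs(available[best] - value) <= caliper:
            pairs.append((case_id, best))
            del available[best]  # without replacement
    return pairs


def smd(group_a, group_b):
    """Standardized mean difference for a continuous covariate."""
    pooled_var = (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    if pooled_var == 0:
        return 0.0
    return abs(statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical propensity scores, for illustration only.
pairs = match_nearest(
    {"p1": 0.30, "p2": 0.50},
    {"c1": 0.31, "c2": 0.49, "c3": 0.10, "c4": 0.90},
)
```

In this toy input each case finds a control within the caliper, whereas a case whose closest control lies outside the caliper is left unmatched. After matching, `smd` can be applied to each covariate to check the <0.1 balance criterion.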
This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board of the National Cancer Center (approval No. NCC2023C-913). The board granted a waiver of informed consent based on the retrospective design of the study. All patient data were anonymized to protect confidentiality.
Clinicopathological characteristics
A comprehensive set of clinical and pathological variables was collected from electronic medical records for each eligible patient. These data encompassed: sex, age at surgery, smoking history (smoker and non-smoker), and family history of any cancer.
Tumor characteristics: tumor location (categorized by lobe), maximum tumor diameter on preoperative CT, tumor stage (T stage) [according to the 8th edition of the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system], and the nodule consistency [solid nodule (SN), part-solid nodule (PSN), and pure ground-glass nodule (pGGN)]. All patients underwent standardized preoperative contrast-enhanced HRCT scans within 1 month prior to surgery, using 64-detector-row CT scanners (Lightspeed Ultra, GE Healthcare, Chicago, IL, USA; Toshiba, Tokyo, Japan). Tumor diameter was measured as the longest diameter of each tumor, using a lung window setting with a level of −500 Hounsfield units (HU) and a width of 1,400 HU. The imaging features of nodule consistency were evaluated by two experienced thoracic surgeons, one with 4 years of clinical experience and the other with more than 20 years of clinical experience. If the original diagnostic report and the interpretation of the two reviewers disagreed, the discrepancy was resolved by consensus.
Pathological outcomes: histopathological subtype [based on the 2015 World Health Organization (WHO) classification, including minimally invasive adenocarcinoma (MIA), lepidic, acinar, papillary, micropapillary, and solid patterns], the presence of visceral pleural invasion (VPI), the presence of spread through air spaces (STAS), and the definitive pathological status of dissected lymph nodes. The presence of OLNM was defined as pathologically confirmed metastasis (pN1 or pN2) in patients who were clinically staged as cN0. Epidermal growth factor receptor (EGFR) mutation status was determined in tumor samples from all patients using polymerase chain reaction (PCR)-based methods.
A small proportion of the clinical and pathological variables contained missing values. The extent of missingness for any single variable was minimal (<5% of the total cohort). Missing data were handled using random forest imputation.
Model building and validation
To develop and validate the ML model, the entire dataset was randomly partitioned into a training set (80% of patients) and an internal validation set (the remaining 20%) using a computer-generated random number sequence, ensuring a proportional distribution of the primary outcome (OLNM) between the two sets.
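A stratified 80/20 partition of this kind can be sketched as follows. This is an illustrative sketch only, assuming a binary outcome label per patient; the function name `stratified_split` and the seed are hypothetical, and the authors' actual random number sequence is not reported.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split indices into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))  # per-class training size
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


# 307 OLNM-positive and 307 matched OLNM-negative patients.
labels = [1] * 307 + [0] * 307
train_idx, test_idx = stratified_split(labels)
```

With 307 patients per class, rounding 80% of each class gives 246 per class for training, which reproduces the 492/122 split reported in the Results.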
In this study, potential predictors associated with OLNM were initially selected through univariate analysis, with statistically significant variables subsequently incorporated into a multivariate logistic regression analysis for further refinement. Following this variable selection process, three distinct predictive models—a clinical model, a radiomics model, and a composite model—were constructed using R software (version 4.4.2). The models were developed by fitting the data using the glm function, with model performance and variable significance being assessed via the summary function. The predictive probability for each patient was calculated using the predict function with the type set to “response”. The performance of these models was rigorously evaluated on an internal testing cohort and an internal validation cohort by plotting receiver operating characteristic (ROC) curves and calculating the corresponding areas under the ROC curve (AUCs). To statistically compare the predictive efficacy between models within each cohort, DeLong’s test was employed. Furthermore, decision curve analysis (DCA) was applied across the training, testing, and validation sets to quantify and compare the net clinical benefit of the clinical, radiomics, and composite models over a range of threshold probabilities. Based on a comprehensive appraisal of the AUC values, DeLong’s test results, and DCA findings, the model demonstrating superior and stable predictive performance with the highest clinical utility was ultimately selected.
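The two evaluation quantities named above, the AUC and the net benefit used in DCA, have simple definitions that can be sketched directly. The code below is an illustration in plain Python rather than the R workflow used in the study; the function names and the toy data are hypothetical. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and the net benefit at threshold probability p_t is TP/n − FP/n × p_t/(1 − p_t).

```python
def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy example with four patients (hypothetical predicted risks).
y = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.6, 0.1]
toy_auc = auc(y, risk)
toy_nb = net_benefit(y, risk, threshold=0.5)
```

Dividing the net benefit by the outcome prevalence gives the standardized net benefit, the scale on which decision curves are commonly plotted.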
Statistical analyses
All statistical analyses and ML procedures were carried out in R (version 4.4.2; R Foundation for Statistical Computing). Statistical significance was defined as a P value <0.05. Descriptive statistics were computed for all study variables. Categorical data are summarized as counts and percentages, while continuous variables are expressed as mean ± standard deviation.
Variable selection for the multivariable logistic regression model was conducted systematically. All variables that exhibited a statistically significant association with OLNM in the univariable analysis (P<0.05) were eligible for inclusion in the subsequent multivariable model. We acknowledge that the statistical power for certain subgroup analyses (e.g., VPI) might be limited due to a low number of cases. The results from these specific subgroups should therefore be interpreted with appropriate caution. It is noted that variables with limited cases which did not retain statistical significance in the final multivariable model, such as VPI, were not carried forward for the development of the ML predictors.
Results
Clinical characteristics
The 614 matched patients in the study cohort were randomly divided into training and test groups at an 8:2 ratio. The characteristics of all 614 patients are shown in Table 1. The training set included 492 patients and the test set included 122 patients. The characteristics of each group are presented in Table S2.
Table 1
| Characteristics | Data (n=614) |
|---|---|
| Gender | |
| Female | 324 (52.8) |
| Male | 290 (47.2) |
| Age (years) | 59.35±9.63 |
| Size (cm) | 1.85±0.63 |
| Smoking history | |
| Yes | 165 (26.9) |
| No | 449 (73.1) |
| Family history | |
| Yes | 191 (31.1) |
| No | 423 (68.9) |
| Primary site | |
| LUL | 189 (30.8) |
| LLL | 118 (19.2) |
| RUL | 156 (25.4) |
| RML | 47 (7.7) |
| RLL | 104 (16.9) |
| CTR | |
| pGGN | 124 (20.2) |
| PSN | 341 (55.5) |
| CN | 149 (24.3) |
| T stage | |
| mi | 112 (18.2) |
| 1a | 79 (12.9) |
| 1b | 221 (36.0) |
| 1c | 202 (32.9) |
| Grade | |
| Low | 268 (43.6) |
| Mid | 178 (29.0) |
| High | 168 (27.4) |
| Histologic type | |
| MIA | 109 (17.8) |
| Acinar | 244 (39.7) |
| Lepidic | 73 (11.9) |
| Papillary | 49 (8.0) |
| Micropapillary | 60 (9.8) |
| Solid | 79 (12.9) |
| STAS | |
| Yes | 204 (33.2) |
| No | 410 (66.8) |
| VPI | |
| Yes | 16 (2.6) |
| No | 598 (97.4) |
| EGFR mutation | |
| Yes | 442 (72.0) |
| No | 172 (28.0) |
Data are presented as n (%) or mean ± SD. CN, solid nodule; CTR, consolidation-to-tumor ratio; EGFR, epidermal growth factor receptor; LLL, left lower lobe; LUL, left upper lobe; MIA, minimally invasive adenocarcinoma; pGGN, pure ground-glass nodule; PSN, part-solid nodule; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe; SD, standard deviation; STAS, spread through air spaces; T stage, tumor stage; VPI, visceral pleural invasion.
Variable selection by logistic regression
We conducted univariable and multivariable logistic regression to evaluate potential predictors and determine those significantly associated with OLNM in early-stage LUAD patients. The results are summarized in Table 2.
Table 2
| Variables | Univariable analysis: OR (95% CI) | Univariable analysis: P value | Multivariable analysis: OR (95% CI) | Multivariable analysis: P value | |
|---|---|---|---|---|---|
| Gender | |||||
| Male | 1.00 (ref.) | ||||
| Female | 1.00 (0.73–1.37) | >0.99 | |||
| Age | 1.00 (0.98–1.02) | 0.94 | |||
| Smoking history | |||||
| No | 1.00 (ref.) | ||||
| Yes | 1.16 (0.81–1.66) | 0.41 | |||
| Family history | |||||
| No | 1.00 (ref.) | ||||
| Yes | 0.75 (0.53–1.05) | 0.10 | |||
| Tumor size | 3.45 (2.58–4.59) | <0.001 | 1.59 (0.85–2.99) | 0.15 | |
| Primary site | |||||
| LUL | 1.00 (ref.) | ||||
| LLL | 1.01 (0.64–1.60) | 0.96 | |||
| RUL | 1.01 (0.66–1.54) | 0.96 | |||
| RML | 1.05 (0.56–2.00) | 0.87 | |||
| RLL | 1.01 (0.63–1.63) | 0.97 | |||
| CTR | |||||
| PSN | 1.00 (ref.) | 1.00 (ref.) | |||
| CN | 4.66 (2.87–7.57) | <0.001 | 2.04 (1.07–3.92) | 0.03 | |
| pGGN | 0.01 (0.00–0.06) | <0.001 | 0.05 (0.01–0.32) | 0.002 | |
| T stage | |||||
| 1a | 1.00 (ref.) | 1.00 (ref.) | |||
| mi | 0.00 (0.00–Inf) | 0.98 | 0.00 (0.00–Inf) | >0.99 | |
| 1b | 11.76 (5.58–24.75) | <0.001 | 2.91 (0.93–9.16) | 0.07 | |
| 1c | 34.68 (15.90–75.68) | <0.001 | 4.54 (1.19–17.33) | 0.03 | |
| Grade | |||||
| Mid | 1.00 (ref.) | 1.00 (ref.) | |||
| High | 0.02 (0.00–0.08) | <0.001 | 0.11 (0.01–0.88) | 0.04 | |
| Low | 13.14 (8.08–21.37) | <0.001 | 6.72 (3.54–12.74) | <0.001 | |
| Histologic type | |||||
| Acinar | 1.00 (ref.) | 1.00 (ref.) | |||
| MIA | 0.00 (0.00–Inf) | 0.98 | 0.47 (0.00–Inf) | >0.99 | |
| Lepidic | 0.10 (0.05–0.21) | <0.001 | 1.25 (0.31–5.09) | 0.76 | |
| Papillary | 0.69 (0.37–1.27) | 0.23 | 0.45 (0.20–0.98) | 0.045 | |
| Micropapillary | 4.29 (1.95–9.42) | <0.001 | 1.14 (0.44–2.94) | 0.79 | |
| Solid | 8.03 (3.36–19.18) | <0.001 | 2.25 (0.81–6.25) | 0.12 | |
| STAS | |||||
| No | 1.00 (ref.) | 1.00 (ref.) | |||
| Yes | 12.08 (7.79–18.74) | <0.001 | 2.59 (1.38–4.86) | 0.003 | |
| VPI | |||||
| No | 1.00 (ref.) | 1.00 (ref.) | |||
| Yes | 4.48 (1.26–15.88) | 0.02 | 0.29 (0.07–1.17) | 0.08 | |
| EGFR mutation | |||||
| No | 1.00 (ref.) | 1.00 (ref.) | |||
| Yes | 4.11 (2.78–6.06) | <0.001 | 4.43 (2.18–9.01) | <0.001 | |
CI, confidence interval; CN, solid nodule; CTR, consolidation-to-tumor ratio; EGFR, epidermal growth factor receptor; LLL, left lower lobe; LUL, left upper lobe; MIA, minimally invasive adenocarcinoma; OR, odds ratio; pGGN, pure ground-glass nodule; PSN, part-solid nodule; ref., reference; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe; STAS, spread through air spaces; T stage, tumor stage; VPI, visceral pleural invasion.
In the univariable analysis, multiple factors were significantly associated with OLNM, including tumor size [odds ratio (OR) =3.45, 95% confidence interval (CI): 2.58–4.59, P<0.001], consolidation-to-tumor ratio (CTR; CN vs. PSN: OR =4.66, 95% CI: 2.87–7.57, P<0.001; pGGN vs. PSN: OR =0.01, 95% CI: 0.00–0.06, P<0.001), T stage (1b vs. 1a: OR =11.76, 95% CI: 5.58–24.75, P<0.001; 1c vs. 1a: OR =34.68, 95% CI: 15.90–75.68, P<0.001), grade (low vs. mid: OR =13.14, 95% CI: 8.08–21.37, P<0.001; high vs. mid: OR =0.02, 95% CI: 0.00–0.08, P<0.001), histologic type (e.g., solid vs. acinar: OR =8.03, 95% CI: 3.36–19.18, P<0.001), STAS (OR =12.08, 95% CI: 7.79–18.74, P<0.001), and EGFR mutation status (OR =4.11, 95% CI: 2.78–6.06, P<0.001).
These significant variables were further incorporated into the multivariable logistic regression model. The results indicated that CTR (CN vs. PSN: OR =2.04, 95% CI: 1.07–3.92, P=0.03; pGGN vs. PSN: OR =0.05, 95% CI: 0.01–0.32, P=0.002), T stage (1c vs. 1a: OR =4.54, 95% CI: 1.19–17.33, P=0.03), grade (low vs. mid: OR =6.72, 95% CI: 3.54–12.74, P<0.001), STAS (OR =2.59, 95% CI: 1.38–4.86, P=0.003), and EGFR mutation status (OR =4.43, 95% CI: 2.18–9.01, P<0.001) remained independent predictors of OLNM.
In addition, we calculated the variance inflation factor (VIF) for each predictor. The VIF was 6.45 for “tumor size” and 7.12 for “T stage”. Although these values indicate appreciable correlation between the two variables, they are below the commonly used threshold of 10, suggesting that collinearity is not severe enough to destabilize the model (Table S3).
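To make these VIF values more concrete, recall that the VIF of a predictor equals 1/(1 − R²), where R² is obtained by regressing that predictor on all the others. The short sketch below inverts this relation; the function names are hypothetical and the calculation is purely illustrative, using only the two VIF values reported above.

```python
def vif_from_r2(r2):
    """Variance inflation factor given R^2 of a predictor regressed on the other predictors."""
    return 1.0 / (1.0 - r2)


def implied_r2(vif):
    """Invert the VIF definition: the share of a predictor's variance explained by the others."""
    return 1.0 - 1.0 / vif


r2_tumor_size = implied_r2(6.45)  # about 0.845
r2_t_stage = implied_r2(7.12)     # about 0.860
```

In other words, roughly 85% of the variance in each of these two predictors is explained by the remaining covariates, which is consistent with T stage being defined largely by tumor size.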
Predictive model development
In addition to the independent variables identified by multivariable regression, tumor size was also included in the model. Based on these identified predictors, we developed and compared multiple ML models for predicting OLNM, including logistic regression, k-nearest neighbors, decision tree, random forest, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), support vector machine, and neural network. The performance metrics of each model on both training and test datasets are detailed in Table 3, supported by ROC curves (Figure 2).
Table 3
| Model | AUC (95% CI) | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|---|
| Logistic | ||||||
| Train | 0.935 (0.914–0.956) | 0.870 | 0.837 | 0.919 | 0.821 | 0.876 |
| Test | 0.951 (0.910–0.991) | 0.918 | 0.881 | 0.967 | 0.869 | 0.922 |
| k-nearest neighbors | ||||||
| Train | 0.975 (0.964–0.985) | 0.894 | 0.867 | 0.931 | 0.858 | 0.898 |
| Test | 0.933 (0.889–0.982) | 0.909 | 0.878 | 0.953 | 0.867 | 0.911 |
| Decision tree | ||||||
| Train | 0.928 (0.905–0.951) | 0.882 | 0.876 | 0.890 | 0.874 | 0.883 |
| Test | 0.915 (0.862–0.969) | 0.893 | 0.864 | 0.934 | 0.852 | 0.898 |
| Random forest | ||||||
| Train | 0.981 (0.973–0.989) | 0.917 | 0.902 | 0.935 | 0.898 | 0.918 |
| Test | 0.934 (0.885–0.984) | 0.910 | 0.879 | 0.951 | 0.869 | 0.913 |
| XGBoost | ||||||
| Train | 0.979 (0.952–0.979) | 0.917 | 0.896 | 0.943 | 0.890 | 0.919 |
| Test | 0.935 (0.894–0.985) | 0.869 | 0.846 | 0.902 | 0.836 | 0.873 |
| LightGBM | ||||||
| Train | 0.963 (0.948–0.977) | 0.894 | 0.873 | 0.943 | 0.866 | 0.897 |
| Test | 0.929 (0.878–0.980) | 0.902 | 0.866 | 0.902 | 0.852 | 0.906 |
| SVM | ||||||
| Train | 0.934 (0.913–0.955) | 0.858 | 0.814 | 0.927 | 0.789 | 0.867 |
| Test | 0.947 (0.906–0.990) | 0.902 | 0.855 | 0.967 | 0.836 | 0.908 |
| Neural network | ||||||
| Train | 0.934 (0.921–0.963) | 0.894 | 0.876 | 0.919 | 0.870 | 0.897 |
| Test | 0.949 (0.914–0.990) | 0.902 | 0.877 | 0.934 | 0.869 | 0.905 |
AUC, area under the receiver operating characteristic curve; CI, confidence interval; LightGBM, light gradient boosting machine; SVM, support vector machine; XGBoost, extreme gradient boosting.
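The threshold-based metrics in Table 3 all derive from a single confusion matrix. As a worked illustration, the sketch below assumes that the 122-patient test set contains 61 OLNM-positive and 61 OLNM-negative patients (an assumption implied by the 1:1 matching and stratified split, not stated explicitly in the source). Under that assumption, counts of 58 true positives, 3 false negatives, 53 true negatives, and 8 false positives are consistent with the random forest test-set row; these counts are inferred for illustration and were not reported by the authors.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics derived from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Inferred (not reported) counts consistent with the random forest test row.
rf_test = classification_metrics(tp=58, fp=8, tn=53, fn=3)
rounded = {name: round(value, 3) for name, value in rf_test.items()}
```

Rounded to three decimals, these counts give accuracy 0.910, precision 0.879, sensitivity 0.951, specificity 0.869, and F1 score 0.913, matching the values listed for the random forest model on the test set.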
Model calibration and clinical utility
To comprehensively evaluate the reliability and clinical value of each ML model, we generated and compared calibration curves and decision curves for all constructed predictors in both training and test cohorts (Figures 3,4). The results of this multi-faceted comparison further solidified the selection of the random forest model as the optimal predictive tool.
The calibration curve of the random forest model demonstrated superior performance, with its predicted probabilities showing excellent agreement with the actual observed outcomes. The curve closely approximated the ideal 45-degree reference line across most of the prediction spectrum, indicating that its probability estimates are highly reliable for clinical risk stratification.
The DCA for the clinical utility assessment is presented in Figure 4. In Figure 4A, the random forest and XGBoost models consistently exhibited superior standardized net benefit compared with most competing models across the entire range of high-risk thresholds (0.0–0.8) and their corresponding cost-benefit ratios (1:100 to 4:1). In Figure 4B, all models showed high standardized net benefit.
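For readers unfamiliar with calibration curves, the underlying computation is a simple binning exercise: predicted probabilities are grouped into intervals, and the mean predicted risk in each interval is compared with the observed event rate. The sketch below shows one equal-width binning scheme; the function name, the number of bins, and the toy data are hypothetical, and the authors' plotting procedure may differ.

```python
def calibration_bins(y_true, probs, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[k].append((p, y))
    result = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            result.append((mean_pred, observed, len(members)))
    return result


# Toy example: two low-risk and two high-risk patients (hypothetical values).
curve = calibration_bins([0, 0, 1, 1], [0.10, 0.15, 0.90, 0.95], n_bins=2)
```

A well-calibrated model yields points close to the 45-degree line, meaning the mean predicted risk and the observed rate are nearly equal in every bin.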
Predictive model comparison
Among all models, the random forest classifier demonstrated the strongest performance on the training set, achieving the highest AUC (0.981) and precision (0.902), with accuracy (0.917) and F1 score (0.918) on par with XGBoost (0.917 and 0.919, respectively). More importantly, it maintained robust performance on the held-out test set (AUC =0.934, accuracy =0.910, sensitivity =0.951, specificity =0.869). The modest performance drop and its competitive test performance support its reliability without evident overfitting. The calibration curve further indicated that the random forest model provided the best agreement between predicted probabilities and actual outcomes.
Although the random forest model exhibited near-perfect performance on the training set (AUC =0.981), a potential indicator of overfitting, its consistent performance on the held-out test set, coupled with its excellent calibration and superior net benefit in DCA, supports its robustness and generalizability. Therefore, we selected the random forest model as the optimal predictive model for subsequent analysis and consider it reliable for clinical application.
Interpretation of the optimal model via SHapley Additive exPlanations (SHAP)
To elucidate the decision-making process of the optimal random forest model and enhance its clinical interpretability, we performed a SHAP analysis. This analysis quantifies the contribution and directional impact of each feature on the model’s individual predictions.
The global feature importance, as summarized by the SHAP bar plot (Figure 5A), confirmed that grade, T stage, CTR, and histologic type were the most influential drivers of the random forest model’s output. The bee swarm plot (Figure 5B) further illustrates the distribution of SHAP values for each feature, revealing not only the magnitude but also the directionality (positive or negative association with the risk of OLNM) of their effects.
To enhance clinical applicability, we utilized SHAP local explanation plots to interpret individual predictions. For instance, Figure 6A demonstrates the explanation for a high-risk prediction, where features such as a low grade, high CTR (CN), and T stage were the primary drivers pushing the base value towards a high probability of OLNM. Conversely, Figure 6B illustrates a low-risk case, where factors like a pGGN and low T stage strongly contributed to a negative prediction. This interpretability facilitates transparent and trustworthy decision-support at the individual level, bridging the gap between model predictions and clinical reasoning.
Furthermore, dependence plots (Figure S1) were employed to delve into the nuanced relationships between key features and the model’s predictions, with SHAP values greater than zero indicating a higher likelihood of OLNM.
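The additive logic behind these local explanations (a base value plus per-feature contributions equals the individual prediction) can be demonstrated with a brute-force Shapley computation on a toy model. The sketch below is illustrative only: the risk function, its coefficients, the three binary features, and the background rows are all hypothetical, and SHAP software for tree ensembles uses a much faster tree-specific algorithm rather than this exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance.

    Features outside the coalition are replaced by values from each
    background row and the predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            z = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return value(set()), phi


def toy_risk(z):
    """Hypothetical risk score with an interaction between low grade and EGFR mutation."""
    low_grade, solid, egfr = z
    return 0.05 + 0.4 * low_grade + 0.2 * solid + 0.1 * egfr + 0.2 * low_grade * egfr


background = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (0, 0, 1)]
patient = (1, 1, 1)  # low grade, solid nodule, EGFR-mutant
base_value, contributions = shapley_values(toy_risk, patient, background)
```

Here `base_value` is the average prediction over the background rows, and the three contributions sum exactly to the gap between that base value and the patient's predicted risk, which is the property visualized in force and waterfall plots.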
Discussion
The evolution of video-assisted thoracic surgery (VATS) has revolutionized the management of early-stage lung cancer, facilitating a shift towards minimally invasive and parenchyma-sparing procedures. Within this context, the extent of systematic lymphadenectomy remains a subject of intense debate, particularly for early-stage diseases. While systematic lymphadenectomy ensures accurate staging and may mitigate local recurrence, it is concomitantly associated with prolonged operative time, increased perioperative morbidity (e.g., chylothorax, recurrent laryngeal nerve injury), and emerging evidence suggests it might potentially impair the host immune response, thereby influencing the efficacy of subsequent immunotherapy (30-35). Conversely, inadequate lymphadenectomy risks staging inaccuracy and undertreatment, potentially compromising long-term survival (36-38). Consequently, the pre-operative identification of patients with OLNM is paramount for tailoring surgical strategy and optimizing the risk-benefit ratio.
To address this challenge, we developed and validated multiple ML models using retrospective data from the National Cancer Center of China. Compared with previous studies (23,25,39), our study presents a multimodal ML-based predictive model specifically designed for assessing the risk of lymph node metastasis in early-stage LUAD. The superior predictive performance of our model, as evidenced by its high AUC and accuracy, likely stems from its ability to integrate diverse data types (clinical, radiological, and key molecular characteristics such as EGFR status) into a unified analytical framework. Furthermore, a pivotal strength that enhances the model’s translational potential is that most of the variables it relies on, such as grade, CTR, and T stage, are readily available in the preoperative or intraoperative phase (the latter via frozen-section pathology). Consequently, our tool demonstrates strong clinical feasibility and is positioned not merely as a theoretical exercise but as a practical decision-support system capable of integrating into the existing surgical workflow to promote personalized patient management.
Although the present study did not collect long-term survival data, the presence of OLNM is a well-established adverse prognostic factor associated with significantly worse progression-free and overall survival (40-42). The clinical value of our predictive model lies in its ability to identify these high-risk individuals, thereby ensuring they receive optimal surgical management and adjuvant therapy, which are known to improve outcomes in node-positive disease, while avoiding overtreatment of low-risk individuals.
Our ML model, particularly the random forest algorithm, demonstrated excellent and robust performance in predicting OLNM. It achieved high predictive accuracy and exceptional sensitivity, ensuring that the vast majority of patients with nodal involvement would be correctly identified, which is paramount for clinical decision-making. Furthermore, the application of SHAP analysis provided crucial model interpretability, successfully identifying and quantifying key predictive features and their complex interactions, thereby bridging the gap between a ‘black-box’ algorithm and clinically actionable insights. The model’s strong generalizability, evidenced by consistent performance across validation sets, highlights its potential utility as a reliable tool for pre-operative risk stratification, paving the way for more personalized surgical management in early-stage lung cancer.
Using the random forest model’s performance on the test set (sensitivity =95.1%, specificity =86.9%), we considered its potential application. The model’s high sensitivity ensures that occult metastases are rarely missed, while its specificity allows a substantial proportion of low-risk patients to be correctly identified. Among patients at our center, those with negative lymph nodes far outnumber those with positive lymph nodes, and systematic LND is still routinely recommended; the model’s ability to identify node-negative patients is therefore of particular clinical significance. However, owing to the large volume of data, a precise evaluation of this application could not be carried out.
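As a rough, purely illustrative calculation of what these figures could mean in an unmatched population, Bayes’ rule can be applied to the reported sensitivity and specificity together with the OLNM prevalence in the full eligible cohort (307 of 12,679). This rests on a strong assumption that is not tested in the study, namely that performance measured on the matched 1:1 test set carries over unchanged to the unmatched cohort; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Prevalence of OLNM among all eligible patients before matching.
prevalence = 307 / 12679
ppv, npv = predictive_values(0.951, 0.869, prevalence)
```

Under this assumption the negative predictive value would exceed 99%, while the positive predictive value would be only about 15%, reflecting the low prevalence of OLNM; a prospective evaluation in an unmatched cohort would be needed to confirm either figure.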
Regarding model performance, it is noteworthy that the AUC of some internal validation sets was marginally higher than that of the training set. This observation, though seemingly counterintuitive, is not uncommon and can be attributed to the effective regularization applied during model training, which slightly constrained performance on the training data to prevent overfitting, and the inherent randomness in data splitting, which may have resulted in a validation set with a slightly more favorable case distribution. This phenomenon ultimately underscores the model’s robust generalizability and its ability to perform reliably on unseen data.
Our analysis also yielded several other key findings. First, our data reinforce the prevailing international consensus that MIA and pGGNs carry an exceedingly low risk of OLNM. In our cohort, none of the patients with pathologically confirmed MIA had nodal involvement. Notably, among the 124 patients with radiologically defined pGGNs, only 2 (1.6%) had lymph node metastasis, and both presented with tumors approaching the 3 cm size criterion. This finding provides compelling evidence to inform surgical decision-making. It suggests that for patients with intraoperative frozen section confirmation of MIA or radiologically defined pGGNs (especially those <2 cm), omitting systematic LND in favor of selective sampling or no dissection may be a safe and viable approach, potentially maximizing the benefits of minimally invasive surgery without compromising oncological outcomes. Our conclusions align with the objectives of ongoing prospective trials (e.g., ECTOP-1009) and could contribute to future refinements in clinical guidelines (43).
Second, our ML model successfully identified and quantified key clinicopathological predictors of OLNM. Histological subtype, differentiation grade, and EGFR mutation status were all identified as significant contributors. Specifically, poor differentiation, a solid or micropapillary predominant pattern, and the presence of an EGFR mutation were salient risk factors for nodal metastasis. More importantly, SHAP analysis revealed clinically relevant synergistic interactions between these variables. For instance, the co-occurrence of an EGFR mutation and poor differentiation conferred a risk of OLNM far greater than the sum of their individual effects. The discovery of these complex patterns underscores the unique advantage of ML over traditional statistical methods in modeling intricate clinical relationships.
The high predictive accuracy of our model stems from its design as a comprehensive tool that integrates multimodal data rather than relying solely on preoperative imaging. While clinical and radiological features (such as CTR and tumor size) describe the anatomical landscape, the incorporation of definitive histopathological characteristics, including tumor grade, histologic type, and STAS, provides a deeper, biologically grounded understanding of tumor aggressiveness and metastatic potential. This integration reflects a critical clinical reality: accurate risk stratification for OLNM requires moving beyond what imaging alone can reveal. Our findings demonstrate that combining preoperative radiological assessment with pathological profiling creates a more robust and reliable predictive system. The significant contribution of these pathological features to the model’s output, as evidenced by the SHAP analysis, underscores their indispensable role.
The clinical translation of our predictive model requires careful consideration of variable availability. Our model identifies the definitive pathological profile associated with OLNM. Although some of the model’s variables are available only postoperatively, other critical components can be obtained earlier: tumor grade can be determined during surgery by frozen section analysis, and CTR by preoperative imaging. A simplified intraoperative risk assessment can therefore be performed using these factors. We envision a two-stage clinical utility: first, an intraoperative model using frozen section-based variables can provide immediate guidance; second, if the patient has a clear preoperative pathological diagnosis, a highly accurate prediction of lymph node metastasis can be made. Our current study provides the essential pathological foundation for this future development.
Despite these promising results, this study has several limitations. First, its single-center, retrospective design carries an inherent potential for selection bias; consequently, the model requires further validation in large-scale, prospective, multi-center cohorts to confirm its robustness and clinical utility. Second, all data used to develop and validate our model were derived from an East Asian population, which characteristically exhibits a high prevalence of EGFR mutations, a key predictive feature incorporated into our algorithm. Therefore, the generalizability of our model to other ethnic populations with differing EGFR mutation rates and genetic backgrounds may be limited. Third, the high predictive accuracy of our model is contingent upon the inclusion of detailed histopathological features (such as grade, histologic type, and STAS). A truly preoperative prediction with comparable accuracy would therefore require this pathological information before surgery. This could be achieved through preoperative biopsy; however, a small biopsy sample is subject to sampling error and may not capture the heterogeneity of the entire tumor. Future research should focus on developing non-invasive or minimally invasive methods that can serve as reliable preoperative proxies for these critical pathological determinants, thereby enabling a shift to genuine preoperative risk stratification.
Conclusions
In conclusion, we have developed and internally validated an interpretable ML-based predictive model that integrates routinely available clinical, radiological, pathological, and gene mutation features for the individualized prediction of OLNM in patients with clinical stage T1N0 LUAD. The random forest algorithm emerged as the optimal model, demonstrating high predictive accuracy, robust calibration, and, as evidenced by DCA, substantial clinical utility. Importantly, the application of SHAP analysis enhances the model’s transparency, providing clinicians with intuitive insights into the model’s decision-making process for each patient. Our findings reinforce the exceedingly low risk of OLNM in adenocarcinoma in situ and MIA, supporting a more conservative approach to LND in these subgroups. While promising, the model requires further validation in prospective, multi-center settings to confirm its generalizability across diverse populations and clinical practices before widespread clinical implementation. This tool holds significant potential to inform personalized surgical planning, optimizing the balance between oncological radicality and surgical morbidity.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1112/rc
Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1112/dss
Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1112/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-1112/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board of the National Cancer Center (approval No. NCC2023C-913). The board granted a waiver of informed consent based on the retrospective design of the study. All patient data were anonymized to protect confidentiality.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
2. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
3. Leiter A, Veluswamy RR, Wisnivesky JP. The global burden of lung cancer: current status and future trends. Nat Rev Clin Oncol 2023;20:624-39.
4. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol 2017;18:220.
5. Aberle DR, Adams AM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409.
6. Sateia HF, Choi Y, Stewart RW, et al. Screening for lung cancer. Semin Oncol 2017;44:74-82.
7. Barta JA, Powell CA, Wisnivesky JP. Global Epidemiology of Lung Cancer. Ann Glob Health 2019;85:8.
8. Zer A, Ahn MJ, Barlesi F, et al. Early and locally advanced non-small-cell lung cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol 2025;36:1245-62.
9. Maniwa T, Okami J, Miyoshi T, et al. Lymph node dissection in small peripheral lung cancer: Supplemental analysis of JCOG0802/WJOG4607L. J Thorac Cardiovasc Surg 2024;168:674-683.e1.
10. Martinez-Zayas G, Almeida FA, Yarmus L, et al. Predicting Lymph Node Metastasis in Non-small Cell Lung Cancer: Prospective External and Temporal Validation of the HAL and HOMER Models. Chest 2021;160:1108-20.
11. Li Z, Pan C, Xu W, et al. Distinct impacts of radiological appearance on lymph node metastasis and prognosis based on solid size in clinical T1 non-small cell lung cancer. Respir Res 2024;25:96.
12. Deng J, Zhong Y, Wang T, et al. Lung cancer with PET/CT-defined occult nodal metastasis yields favourable prognosis and benefits from adjuvant therapy: a multicentre study. Eur J Nucl Med Mol Imaging 2022;49:2414-24.
13. Li F, Zhai S, Fu L, et al. Nomograms for intraoperative prediction of lymph node metastasis in clinical stage IA lung adenocarcinoma. Cancer Med 2023;12:14360-74.
14. Ghaly G, Rahouma M, Kamel MK, et al. Clinical Predictors of Nodal Metastases in Peripherally Clinical T1a N0 Non-Small Cell Lung Cancer. Ann Thorac Surg 2017;104:1153-8.
15. Ismail M, Nachira D, Swierzy M, et al. Lymph node upstaging for non-small cell lung cancer after uniportal video-assisted thoracoscopy. J Thorac Dis 2018;10:S3648-54.
16. Licht PB, Jørgensen OD, Ladegaard L, et al. A national study of nodal upstaging after thoracoscopic versus open lobectomy for clinical stage I lung cancer. Ann Thorac Surg 2013;96:943-9; discussion 949-50.
17. Huang X, Wang J, Chen Q, et al. Mediastinal lymph node dissection versus mediastinal lymph node sampling for early stage non-small cell lung cancer: a systematic review and meta-analysis. PLoS One 2014;9:e109979.
18. Shayani J, Flores RM, Hakami A. Mediastinal lymph node dissection: the debate is not resolved. J Thorac Dis 2017;9:1848-50.
19. Darling GE, Allen MS, Decker PA, et al. Randomized trial of mediastinal lymph node sampling versus complete lymphadenectomy during pulmonary resection in the patient with N0 or N1 (less than hilar) non-small cell carcinoma: results of the American College of Surgery Oncology Group Z0030 Trial. J Thorac Cardiovasc Surg 2011;141:662-70.
20. Luo J, Yang S, Dong S. Selective Mediastinal Lymphadenectomy or Complete Mediastinal Lymphadenectomy for Clinical Stage I Non-Small Cell Lung Cancer: A Meta-Analysis. Adv Ther 2021;38:5671-83.
21. Deo RC. Machine Learning in Medicine. Circulation 2015;132:1920-30.
22. Qu L, Zhu J, Mei X, et al. Evaluating axillary lymph node metastasis risks in breast cancer patients via Semi-ALNP: a multicenter study. EClinicalMedicine 2025;85:103311.
23. Wu B, Zhu Y, Hu Z, et al. Machine learning predictive models and risk factors for lymph node metastasis in non-small cell lung cancer. BMC Pulm Med 2024;24:526.
24. Gu W, Chen Y, Zhu H, et al. Development and validation of CT-based radiomics deep learning signatures to predict lymph node metastasis in non-functional pancreatic neuroendocrine tumors: a multicohort study. EClinicalMedicine 2023;65:102269.
25. Liu MW, Zhang X, Wang YM, et al. A comparison of machine learning methods for radiomics modeling in prediction of occult lymph node metastasis in clinical stage IA lung adenocarcinoma patients. J Thorac Dis 2024;16:1765-76.
26. Zhang H, Li Y, Wu S, et al. Machine learning-based radiomics for guiding lymph node dissection in clinical stage I lung adenocarcinoma: a multicenter retrospective study. Transl Lung Cancer Res 2024;13:3579-89.
27. Tsutani Y, Miyata Y, Nakayama H, et al. Appropriate sublobar resection choice for ground glass opacity-dominant clinical stage IA lung adenocarcinoma: wedge resection or segmentectomy. Chest 2014;145:66-71.
28. Nakamura K, Saji H, Nakajima R, et al. A phase III randomized trial of lobectomy versus limited resection for small-sized peripheral non-small cell lung cancer (JCOG0802/WJOG4607L). Jpn J Clin Oncol 2010;40:271-4.
29. Lee SY, Jeon JH, Jung W, et al. Predictive Factors for Lymph Node Metastasis in Clinical Stage I Part-Solid Lung Adenocarcinoma. Ann Thorac Surg 2021;111:456-62.
30. Riely GJ, Wood DE, Aisner DL, et al. NCCN Guidelines® Insights: Non-Small Cell Lung Cancer, Version 7.2025. J Natl Compr Canc Netw 2025;23:354-62.
31. Allen MS, Darling GE, Pechet TT, et al. Morbidity and mortality of major pulmonary resections in patients with early-stage lung cancer: initial results of the randomized, prospective ACOSOG Z0030 trial. Ann Thorac Surg 2006;81:1013-9; discussion 1019-20.
32. Cho HJ, Kim DK, Lee GD, et al. Chylothorax complicating pulmonary resection for lung cancer: effective management and pleurodesis. Ann Thorac Surg 2014;97:408-13.
33. Deng H, Zhou J, Chen H, et al. Impact of lymphadenectomy extent on immunotherapy efficacy in postresectional recurred non-small cell lung cancer: a multi-institutional retrospective cohort study. Int J Surg 2024;110:238-52.
34. Fear VS, Forbes CA, Neeve SA, et al. Tumour draining lymph node-generated CD8 T cells play a role in controlling lung metastases after a primary tumour is removed but not when adjuvant immunotherapy is used. Cancer Immunol Immunother 2021;70:3249-58.
35. Fransen MF, Schoonderwoerd M, Knopf P, et al. Tumor-draining lymph nodes are pivotal in PD-1/PD-L1 checkpoint therapy. JCI Insight 2018;3:e124507.
36. Lee SW, Kim SJ. Is Delayed Image of 18F-FDG PET/CT Necessary for Mediastinal Lymph Node Staging in Non-Small Cell Lung Cancer Patients? Clin Nucl Med 2022;47:414-21.
37. Keller SM, Adak S, Wagner H, et al. Mediastinal lymph node dissection improves survival in patients with stages II and IIIa non-small cell lung cancer. Eastern Cooperative Oncology Group. Ann Thorac Surg 2000;70:358-65; discussion 365-6.
38. Liang W, He J, Shen Y, et al. Impact of Examined Lymph Node Count on Precise Staging and Long-Term Survival of Resected Non-Small-Cell Lung Cancer: A Population Study of the US SEER Database and a Chinese Multi-Institutional Registry. J Clin Oncol 2017;35:1162-70.
39. Kang Y, Li M, Xing X, et al. Computed tomography-based radiomics model for predicting station 4 lymph node metastasis in non-small cell lung cancer. BMC Med Imaging 2025;25:202.
40. Xu Y, Ma D, Qin Y, et al. Prognostic significance of pathological response and lymph node status in neoadjuvant immunotherapy for potentially resectable non-small cell lung cancer. Ann Med 2025;57:2453825.
41. Sun W, Qu L, Wu J, et al. "Percentage" and "size" of residual viable tumor in lymph node, the performance in estimating pathologic response of lymph node in non-small cell lung cancer treated with neoadjuvant chemoimmunotherapy. Hum Pathol 2024;149:1-9.
42. Lao S, Chen Z, Wang W, et al. Prognostic patterns in invasion lymph nodes of lung adenocarcinoma reveal distinct tumor microenvironments. NPJ Precis Oncol 2024;8:164.
43. Zhang Y, Qian B, Song Q, et al. Phase III Study of Mediastinal Lymph Node Dissection for Ground Glass Opacity-Dominant Lung Adenocarcinoma. J Clin Oncol 2025;43:3081-9.

