An online explainable ensemble machine learning model for predicting epidermal growth factor receptor mutation status in lung adenocarcinoma
Highlight box
Key findings
• An online ensemble machine learning (EML) model combining multiple single models achieved good performance in predicting epidermal growth factor receptor (EGFR) mutation status in lung adenocarcinoma.
What is known and what is new?
• Sex, smoking history, pleural retraction, texture, long-axis diameter, vascular convergence, air bronchogram, bubblelike lucency, and deep learning score were significantly correlated with EGFR mutation status.
• The SHapley Additive exPlanation (SHAP) method provided favorable explanations for the EML model.
• An online web tool can enhance clinical utility and accessibility for both clinicians and patients.
What is the implication, and what should change now?
• We developed and validated a novel online EML model that combines multiple machine learning algorithms based on clinical and radiological characteristics, along with a computed tomography-based deep learning model, for predicting EGFR mutation status in lung adenocarcinoma. The online model might assist clinicians in making individualized predictions for patients and guiding EGFR-tyrosine kinase inhibitor targeted therapies.
Introduction
Globally, lung cancer remains the most frequently diagnosed cancer, with an estimated 2.5 million new cases in 2022, and it is the leading cause of cancer-related death, accounting for nearly 1.8 million deaths (1,2). Lung adenocarcinoma is the most prevalent subtype, accounting for approximately 40% of all cases (3). Targeted therapies with epidermal growth factor receptor-tyrosine kinase inhibitors (EGFR-TKIs), such as afatinib and osimertinib, have significantly improved the prognosis of patients with lung adenocarcinoma (4-6). However, the benefit is largely confined to patients with EGFR mutations: epidermal growth factor receptor (EGFR) mutation status is the strongest correlate of response to EGFR-TKIs, with objective response rates reaching 80% in patients with EGFR mutations but only 7% in patients with wild-type EGFR (7,8). Therefore, accurate determination of EGFR mutation status before treatment is essential to identify suitable candidates for EGFR-TKI therapy and to support individualized treatment decisions.
Molecular genetic testing of tumor tissue obtained by biopsy is the gold standard for determining EGFR mutation status (9). However, pre-treatment biopsy is not always feasible, and its results can be compromised by small sample sizes and sampling errors (10). Tissue biopsy also carries the risk of local tumor metastasis (11) and imposes a financial burden when the procedure must be repeated. Therefore, a non-invasive and reproducible method for accurately predicting EGFR mutation status is urgently needed. Computed tomography (CT) is widely used as a non-invasive and reproducible modality for diagnosing, staging, and monitoring lung cancer (12-14). Previous studies have shown that machine learning (ML) models based on clinical and radiological characteristics can predict EGFR mutation status in lung adenocarcinoma (11,15,16), and CT-based deep learning (DL) models have also been reported to achieve this prediction. However, many of these models rely on a single ML algorithm, which may not adequately capture the complexity of the data and could result in bias and suboptimal predictive performance.
Ensemble machine learning (EML) is a branch of ML that combines multiple ML algorithms to improve predictive performance and robustness (17,18). This approach leverages the strengths of various models to mitigate their individual weaknesses, resulting in a model that is often more accurate and reliable than any single model alone (19,20). EML techniques have been successfully applied across a wide range of domains, demonstrating their versatility and effectiveness. One notable application of EML is in the field of healthcare, where it has been used to predict persistent high healthcare utilizers (21). By integrating multiple models, researchers have improved the accuracy of predictions regarding care management needs, thus facilitating more efficient allocation of healthcare resources to patients who require them most. Similarly, in the field of cancer diagnosis, ensemble learning models have been employed to classify cancer types based on genomic data (22), showing significant improvements in diagnostic accuracy.
The main purpose of this study was to develop and validate an EML model that combines multiple ML models to predict EGFR mutation status in lung adenocarcinoma. To address the “black box” nature of ML algorithms, we employed the SHapley Additive exPlanation (SHAP) method to explain the EML model’s prediction process (23-25). Additionally, we created an online web tool for the EML model, enhancing its clinical utility and making it more accessible for both clinicians and patients. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-237/rc).
Methods
This study was approved by the institutional ethics review board of the First Affiliated Hospital of Anhui Medical University (approval number: PJ2024-01-57) and adhered to the Declaration of Helsinki and its subsequent amendments. All participating hospitals/institutions were informed of and agreed to the study. Informed consent was waived due to the retrospective nature of the study. This study was also registered as ChiCTR2400083082 in the WHO International Clinical Trials Registry. Clinical data and CT images were collected from the electronic medical records and the picture archiving and communication systems (PACS) of Anhui Chest Hospital, the First Affiliated Hospital of Anhui Medical University, and Fuyang People’s Hospital. Patients with lung adenocarcinoma who underwent molecular genetic testing at these three hospitals between March 2019 and January 2023 were retrospectively identified. The inclusion criteria were: (I) histologically confirmed lung adenocarcinoma; and (II) EGFR genotype determined using tumor specimens. The exclusion criteria were: (I) an interval of more than one month between the CT scan and EGFR testing; (II) previous treatment; and (III) missing CT images or poor image quality. Patients from the primary center formed the training cohort (n=556), and those from the other two centers formed the validation cohort (n=267). The patient enrollment process is illustrated in Figure 1. For survival analysis, we included 74 patients from the validation cohort who had EGFR mutations (exon 19 deletion or exon 21 L858R point mutation) and received EGFR-TKI treatment. The selection of the survival cohort is illustrated in Figure S1, with the following exclusion criteria: (I) uncommon EGFR mutations; (II) prior surgery; (III) non-EGFR-TKI treatment; (IV) change of treatment regimen after EGFR-TKI treatment; and (V) missing follow-up data.
Acquisition of EGFR mutation status
An example of CT-guided percutaneous biopsy of the target tumor to obtain tissue for molecular genetic testing is shown in Figure S2. For histopathological confirmation, the area of the tumor specimen with a higher concentration of malignant cells and fewer differentiated cells was selected. The EGFR genotype was determined using amplification refractory mutation system-polymerase chain reaction or gene sequencing. Tumors were classified as EGFR-mutant if any mutation was detected in the tested exons; otherwise, they were classified as EGFR wild-type.
CT scan protocol
Before scanning, participants were trained in deep inspiration and breath-holding. Each participant was scanned in the head-first supine position with arms raised. Scans were acquired at end-inspiration and covered the area from the lung apices to the diaphragm. CT images were obtained using CT systems including the SOMATOM Definition AS+ (Siemens Healthcare) and the GE Discovery 750 (GE Medical Systems). Images were displayed using a lung window (level −500 Hounsfield units [HU], width 1,500 HU) and a mediastinal window (level 50 HU, width 350 HU). Scanning parameters were as follows: tube voltage, 120 kV; tube current, automatic tube current modulation; contrast agent concentration, 300 mgI/mL; contrast-enhanced CT acquisition, 25 seconds after contrast injection; contrast infusion rate, 3 mL/s; slice thickness, 1 mm; field of view, 350×350 mm; matrix, 512×512.
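Each window setting defines the displayed intensity range [level − width/2, level + width/2], so the lung window spans −1,250 to 250 HU and the mediastinal window spans −125 to 225 HU. The following minimal Python sketch, with a hypothetical helper `apply_window`, maps HU values linearly to [0, 1] under these settings. It assumes a linear grey-scale mapping; the protocol does not state whether the same windowing was applied to the images used as deep learning input.

```python
import numpy as np

# Window settings from the scan protocol: (level, width) in HU.
LUNG_WINDOW = (-500.0, 1500.0)
MEDIASTINAL_WINDOW = (50.0, 350.0)


def apply_window(hu, level, width):
    """Hypothetical helper: clip HU to [level - width/2, level + width/2], rescale to [0, 1]."""
    hu = np.asarray(hu, dtype=float)
    lower = level - width / 2.0
    upper = level + width / 2.0
    return (np.clip(hu, lower, upper) - lower) / (upper - lower)


if __name__ == "__main__":
    sample = np.array([-1250.0, -500.0, 0.0, 250.0, 1000.0])
    print(apply_window(sample, *LUNG_WINDOW))         # lung window spans -1,250 to 250 HU
    print(apply_window(sample, *MEDIASTINAL_WINDOW))  # mediastinal window spans -125 to 225 HU
```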
Assessment of radiological characteristics
For patients with multiple lung lesions, the largest lesion was selected as the primary tumor for analysis. Sixteen radiological characteristics described in a previous study (15) were assessed by a junior radiologist and confirmed by a senior thoracic radiologist: long-axis diameter, short-axis diameter, size category, texture, border definition, enhancement, bubblelike lucency, spiculation, air bronchogram, pleural attachment, vascular convergence, pleural retraction, thickened adjacent bronchovascular bundles (TABB), peripheral emphysema, peripheral fibrosis, and lymphadenopathy. When a characteristic was difficult to define, it was further analyzed using CT post-processing techniques, including multiplanar reconstruction (MPR), maximum intensity projection (MIP), and volume rendering technique (VRT). Both thoracic radiologists were blinded to clinical and molecular data, and any discrepancies were resolved by consensus. To evaluate intra- and inter-observer agreement, 30 randomly selected tumors were independently assessed by the two thoracic radiologists at baseline and again two weeks later.
Characteristic selection
Twenty-two candidate characteristics were entered into least absolute shrinkage and selection operator (LASSO) regression: the 16 radiological characteristics and six clinical characteristics, namely sex, age, smoking history, clinical stage, tumor location, and carcinoembryonic antigen (CEA) level. LASSO adds an L1 penalty on the coefficients to the regression objective, which shrinks the coefficients of less important characteristics to exactly zero (26). This allows redundant characteristics that contribute little to predictive performance to be identified and eliminated. To enhance generalizability and robustness, a bootstrap procedure with 1,000 resamples was performed, and ten-fold cross-validation was conducted on each bootstrap sample to determine the optimal regularization parameter λ that minimized the cross-validation error.
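The selection procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code: because EGFR status is binary, it assumes an L1-penalized logistic regression fitted by proximal gradient descent, uses held-out log-loss as the cross-validation error, and summarizes the bootstrap fits as the fraction of resamples in which each characteristic keeps a nonzero coefficient. How the authors aggregated the bootstrap results is not reported, and all function names are hypothetical.

```python
import numpy as np


def fit_l1_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||w||_1; the intercept b is not penalized.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # 1 / (Lipschitz bound of the log-loss gradient, including the intercept column).
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y
        w = w - step * (X.T @ resid) / n
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-thresholding
        b -= step * resid.mean()
    return w, b


def log_loss(X, y, w, b):
    prob = np.clip(1.0 / (1.0 + np.exp(-(X @ w + b))), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))


def cv_best_lambda(X, y, lambdas, rng, n_folds=10):
    """Return the lambda with the lowest mean held-out log-loss (lambda.min)."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_error = np.zeros(len(lambdas))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        for i, lam in enumerate(lambdas):
            w, b = fit_l1_logistic(X[train_idx], y[train_idx], lam)
            cv_error[i] += log_loss(X[test_idx], y[test_idx], w, b) / n_folds
    return lambdas[int(np.argmin(cv_error))]


def bootstrap_selection_frequency(X, y, lambdas, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each feature keeps a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    selected = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        lam = cv_best_lambda(X[idx], y[idx], lambdas, rng)
        w, _ = fit_l1_logistic(X[idx], y[idx], lam)
        selected += w != 0
    return selected / n_boot
```

Characteristics should be standardized before fitting so that a single λ penalizes them comparably. With 1,000 resamples, ten folds, and a grid of λ values the loop fits many models, so a small `n_boot` is convenient for a quick check.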
Model construction, evaluation, and explanation
The overall workflow of this study is shown in Figure 2. Five commonly used single ML models incorporating clinical and radiological characteristics, namely random forest (RF), logistic regression (LR), support vector machine (SVM), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost), were constructed together with a CT-based DL model to predict EGFR mutation status. The DL model was based on a residual network with 50 parameter layers (ResNet50) (27), a well-established convolutional neural network (CNN) architecture. Tumors were annotated using ITK-SNAP 3.8.0 software: a junior radiologist manually delineated a bounding-box region of interest (ROI) encompassing the entire tumor in the axial plane, which was then verified by a senior radiologist, and any discrepancies were resolved by consensus. The original CT images and the corresponding ROI masks were resampled to a standardized voxel size of 1×1×1 mm³ using trilinear interpolation. The tumor regions were then resized to 224×224 pixels and normalized to the range 0 to 1 before being input into the model. The ResNet50 architecture comprises 16 residual blocks followed by a global average pooling layer and a fully connected layer with softmax activation, whose output provided the prediction score for EGFR mutation. The EML model was then developed by combining these six models using a weighted averaging approach (18), and a grid search was applied for hyperparameter optimization. In weighted averaging, the ensemble prediction is a weighted linear combination of the predictions of the constituent models, which allows the ensemble to leverage the strengths of each model while mitigating its weaknesses (28,29).
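Written out, the weighted average for a patient is p_EML = w_1 p_1 + w_2 p_2 + ... + w_6 p_6, where p_k is the predicted mutation probability from model k (the five feature-based models and the DL model) and the weights satisfy w_k ≥ 0 and w_1 + ... + w_6 = 1. The sketch below searches such weights on a grid with step 1/ticks and keeps the combination with the highest AUC. The article states only that a grid search was used for hyperparameter optimization, so treating the ensemble weights as the searched hyperparameters, the 0.1 step, and AUC as the objective are all assumptions; the function names are hypothetical.

```python
import itertools

import numpy as np


def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count as one half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def simplex_grid(n_models, ticks):
    """Yield every weight vector with entries in {0, 1/ticks, ..., 1} that sums to 1."""
    slots = ticks + n_models - 1
    for bars in itertools.combinations(range(slots), n_models - 1):
        # Stars and bars: the gaps between bar positions are the integer weights.
        counts = np.diff(np.array((-1,) + bars + (slots,))) - 1
        yield counts / ticks


def ensemble_score(probs, weights):
    """Weighted average of base-model probabilities; probs is (n_patients, n_models)."""
    return np.asarray(probs, dtype=float) @ np.asarray(weights, dtype=float)


def grid_search_weights(probs, y_true, ticks=10):
    """Return the simplex-grid weights that maximize the AUC of the ensemble score."""
    best_w, best_auc = None, -np.inf
    for w in simplex_grid(probs.shape[1], ticks):
        score = auc(y_true, ensemble_score(probs, w))
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc
```

With six models and `ticks=10` there are 3,003 candidate weight vectors. To avoid optimistic weights, such a search is best run on out-of-fold or held-out predictions rather than on predictions from the data used to fit the base models.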
The performance of the models in predicting EGFR mutation status was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), together with accuracy, sensitivity, and specificity. The DeLong test was used to compare AUCs between models. Decision curve analysis (DCA) was used to assess clinical net benefit, and calibration curves were used to evaluate the agreement between predicted probabilities and observed outcomes.
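The DeLong test compares two AUCs measured on the same patients by estimating the covariance of the two AUC estimates from per-patient structural components (placement values). The following sketch implements the straightforward O(mn) form for m mutant and n wild-type patients. It is illustrative and not necessarily the software the authors used (in R, the pROC package provides this test through `roc.test`); the function names are hypothetical.

```python
import math

import numpy as np


def _structural_components(pos, neg):
    """Per-positive (V10) and per-negative (V01) placement values for one score."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, p)."""
    y_true = np.asarray(y_true)
    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        s = np.asarray(s, dtype=float)
        comp_pos, comp_neg = _structural_components(s[y_true == 1], s[y_true == 0])
        aucs.append(comp_pos.mean())
        v10.append(comp_pos)
        v01.append(comp_neg)
    m, n = len(v10[0]), len(v01[0])
    # Covariance matrix of the two AUC estimates (DeLong et al., 1988).
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return float(aucs[0]), float(aucs[1]), float(z), float(p)
```

As a consistency check, the two-sided P value for Z=2.597 is `math.erfc(2.597 / math.sqrt(2))`, about 0.009, which matches the EML vs. DL comparison in the training cohort (Table 4). The function assumes both classes are present and that the variance of the AUC difference is nonzero.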
The Shapley value, based on coalitional game theory, can be used to assess the contribution of each input feature to the model’s output, providing both global and local explanations for the model. In this study, we employed the SHAP method to explain the prediction process of the EML model, addressing the “black box” challenge. A SHAP summary plot and a bee swarm plot were generated to provide a global explanation. Additionally, a SHAP force plot was created for a selection of representative cases to offer a local explanation.
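For a model with p inputs, the Shapley value of feature j averages its marginal contribution v(S ∪ {j}) − v(S) over all subsets S of the other features, weighted by |S|!(p − |S| − 1)!/p!. With the nine inputs of the EML model (eight characteristics plus the DL score), exact enumeration needs only 2^8 = 256 subsets per feature, so the brute-force sketch below is feasible. It assumes an interventional value function in which features outside S take their values from a background sample; the shap library instead provides model-specific and sampling-based explainers, and the article does not state which explainer was used. The helper names are hypothetical; `global_importance` returns the mean absolute SHAP value per feature, the quantity by which summary plots are ranked.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one patient x.

    Features outside a coalition take their values from each background row, and the
    model output is averaged over the background (an interventional value function).
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    p = x.size

    def value(coalition):
        data = background.copy()
        if coalition:
            idx = list(coalition)
            data[:, idx] = x[idx]
        return float(np.mean(predict(data)))

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


def global_importance(shap_matrix):
    """Mean absolute SHAP value per feature over patients (rows)."""
    return np.mean(np.abs(np.asarray(shap_matrix, dtype=float)), axis=0)
```

For a linear model, this value function gives phi_j = beta_j (x_j − mean of background feature j), and the values always sum to the prediction minus the mean background prediction.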
Survival analysis
In the survival cohort, patients were classified into low-score and high-score groups based on the median value of the EML score (median, 0.713). The Kaplan-Meier method was used to estimate progression-free survival (PFS), and differences in survival curves between the low-score and high-score groups were assessed using the log-rank test.
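The survival comparison can be sketched in Python as a median split of the EML score, the Kaplan-Meier product-limit estimator, and the two-group log-rank test with one degree of freedom. This is a minimal illustration with hypothetical helper names, not the authors’ code; how ties at the median were assigned is not reported, and this sketch places them in the low-score group.

```python
import math

import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate: distinct event times and survival just after each one."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)
        events_at_t = np.sum((time == t) & event)
        s *= 1.0 - events_at_t / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def median_survival(event_times, surv):
    """Smallest event time at which estimated survival drops to 0.5 or below."""
    reached = np.nonzero(surv <= 0.5)[0]
    return float(event_times[reached[0]]) if reached.size else math.inf


def logrank_test(time, event, group):
    """Two-group log-rank test; returns the chi-square statistic (1 df) and its p-value."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # For 1 degree of freedom, P(chi2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def median_split(scores):
    """High-score group = score above the cohort median (ties go to the low group)."""
    scores = np.asarray(scores, dtype=float)
    return scores > np.median(scores)
```

The hazard ratio reported in the Results would additionally require a Cox proportional hazards model, which is not shown here.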
Statistical analysis
Normally distributed quantitative variables were presented as mean ± standard deviation (SD), and non-normally distributed variables as median and interquartile range (IQR). Categorical variables were presented as frequencies and percentages. Quantitative variables were compared using Student’s t-test or the Mann-Whitney U test, and categorical variables using the Pearson χ² test. Inter- and intra-observer agreement for CT imaging features was assessed using intraclass correlation coefficients (ICCs) for quantitative features and kappa coefficients for categorical features. All tests were two-sided, with a significance level of 0.05. ML algorithms and SHAP analysis were implemented in Python 3.6. All statistical analyses were conducted using SPSS statistical software and R (version 4.2.1, https://www.r-project.org).
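The agreement statistics can be sketched as follows. The article does not specify the ICC form or kappa weighting, so this minimal Python sketch assumes unweighted Cohen’s kappa for categorical features and ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss notation) for quantitative features; the function names are hypothetical.

```python
import numpy as np


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_observed = np.mean(a == b)
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return float((p_observed - p_chance) / (1.0 - p_chance))


def icc_2_1(ratings):
    """ICC(2,1) from an (n_subjects, k_raters) array via two-way ANOVA mean squares."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between-subject
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between-rater
    ss_error = np.sum((y - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))
```

Because ICC(2,1) measures absolute agreement, a constant offset between two readers lowers it even when their rankings agree perfectly; a consistency-type ICC would not penalize such an offset.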
Results
Baseline clinical characteristics
The baseline clinical and radiological characteristics of the study patients are detailed in Tables 1 and 2, respectively. A total of 823 patients (432 females and 391 males; mean age, 62.62±11.16 years) were included, with 556 in the training cohort and 267 in the validation cohort. No significant differences were observed between the two cohorts in age (P=0.70), sex (P=0.47), smoking history (P=0.36), clinical stage (P=0.23), CEA level (P=0.38), EGFR mutation status (P=0.82), or tumor location (P=0.94).
Table 1
| Variables | Total (n=823) | Training cohort (n=556) | Validation cohort (n=267) | P |
|---|---|---|---|---|
| Age (years) | 62.62±11.16 | 62.73±11.11 | 62.40±11.29 | 0.70 |
| Sex | | | | 0.47 |
| Male | 391 (47.51) | 269 (48.38) | 122 (45.69) | |
| Female | 432 (52.49) | 287 (51.62) | 145 (54.31) | |
| Smoking | | | | 0.36 |
| Never | 584 (70.96) | 389 (69.96) | 195 (73.03) | |
| Current or former | 239 (29.04) | 167 (30.04) | 72 (26.97) | |
| Clinical stage | | | | 0.23 |
| I–III | 496 (60.27) | 343 (61.69) | 153 (57.30) | |
| IV | 327 (39.73) | 213 (38.31) | 114 (42.70) | |
| CEA (ng/mL) | 9.79 (5.13, 17.22) | 9.48 (4.93, 17.12) | 10.44 (5.45, 18.19) | 0.38 |
| EGFR | | | | 0.82 |
| Wild-type | 353 (42.89) | 237 (42.63) | 116 (43.45) | |
| Mutant | 470 (57.11) | 319 (57.37) | 151 (56.55) | |
| Location | | | | 0.94 |
| Left lower lobe | 204 (24.79) | 141 (25.36) | 63 (23.60) | |
| Left upper lobe | 132 (16.04) | 87 (15.65) | 45 (16.85) | |
| Right lower lobe | 263 (31.96) | 181 (32.55) | 82 (30.71) | |
| Right middle lobe | 83 (10.09) | 55 (9.89) | 28 (10.49) | |
| Right upper lobe | 141 (17.13) | 92 (16.55) | 49 (18.35) | |
Data are presented as n (%), median (IQR), or mean ± SD. CEA, carcinoembryonic antigen; EGFR, epidermal growth factor receptor; IQR, interquartile range; SD, standard deviation.
Table 2
| Characteristics | Total (n=823) | Training cohort (n=556) | Validation cohort (n=267) | P |
|---|---|---|---|---|
| Long-axis diameter (cm) | 3.26±1.32 | 3.21±1.31 | 3.36±1.35 | 0.14 |
| Short-axis diameter (cm) | 2.21±0.84 | 2.19±0.82 | 2.25±0.88 | 0.34 |
| Size category | | | | 0.17 |
| Diameter ≤3 cm | 395 (48.00) | 276 (49.64) | 119 (44.57) | |
| Diameter >3 cm | 428 (52.00) | 280 (50.36) | 148 (55.43) | |
| Texture | | | | 0.47 |
| Subsolid | 248 (30.13) | 172 (30.94) | 76 (28.46) | |
| Solid | 575 (69.87) | 384 (69.06) | 191 (71.54) | |
| Border definition | | | | 0.65 |
| Well-defined | 370 (44.96) | 253 (45.50) | 117 (43.82) | |
| Poorly defined | 453 (55.04) | 303 (54.50) | 150 (56.18) | |
| Enhancement | | | | 0.37 |
| Homogeneous | 342 (41.56) | 237 (42.63) | 105 (39.33) | |
| Non-homogeneous | 481 (58.44) | 319 (57.37) | 162 (60.67) | |
| Bubblelike lucency | | | | 0.63 |
| Present | 271 (32.93) | 180 (32.37) | 91 (34.08) | |
| Spiculation | | | | 0.78 |
| Present | 465 (56.50) | 316 (56.83) | 149 (55.81) | |
| Air bronchogram | | | | 0.77 |
| Present | 444 (53.95) | 298 (53.60) | 146 (54.68) | |
| Pleural attachment | | | | 0.67 |
| Present | 266 (32.32) | 177 (31.83) | 89 (33.33) | |
| Vascular convergence | | | | 0.59 |
| Present | 507 (61.60) | 339 (60.97) | 168 (62.92) | |
| Pleural retraction | | | | 0.12 |
| Present | 371 (45.08) | 261 (46.94) | 110 (41.20) | |
| TABB | | | | 0.63 |
| Present | 311 (37.79) | 207 (37.23) | 104 (38.95) | |
| Peripheral emphysema | | | | 0.54 |
| Present | 432 (52.49) | 296 (53.23) | 136 (50.94) | |
| Peripheral fibrosis | | | | 0.22 |
| Present | 469 (56.99) | 325 (58.45) | 144 (53.93) | |
| Lymphadenopathy | | | | 0.45 |
| Present | 339 (41.19) | 224 (40.29) | 115 (43.07) | |
Data are presented as n (%) or mean ± SD. SD, standard deviation; TABB, thickened adjacent bronchovascular bundles.
Intra- and inter-observer agreement
The two thoracic radiologists demonstrated excellent intra- and inter-observer agreement for radiological characteristics, with all kappa coefficients for categorical features and all ICCs for quantitative features greater than 0.80 (Table S1).
Characteristic selection
Figure 3 shows the cross-validation curve and the coefficient paths of the LASSO regression. The minimum cross-validation error was obtained at log(λ) = −4.4 (lambda.min). At this value, eight characteristics were retained as predictors for the ML models: sex, smoking history, pleural retraction, texture, long-axis diameter, vascular convergence, air bronchogram, and bubblelike lucency.
Model performance and comparisons
The selected clinical and radiological characteristics were used to build five ML models for predicting EGFR mutation status in lung adenocarcinoma, and an EML model was then created by combining these models with the DL model using a weighted averaging approach. The performance of the models in the training and validation cohorts is detailed in Table 3. In the training cohort, the AUC values for RF, LR, SVM, LightGBM, XGBoost, DL, and EML were 0.851 [95% confidence interval (CI): 0.819–0.883], 0.790 (95% CI: 0.752–0.828), 0.810 (95% CI: 0.774–0.847), 0.835 (95% CI: 0.802–0.868), 0.853 (95% CI: 0.821–0.884), 0.884 (95% CI: 0.855–0.912), and 0.928 (95% CI: 0.908–0.949), respectively (Figure 4A). The DeLong test showed that the AUC of the EML model was significantly higher than those of RF (P<0.001), LR (P<0.001), SVM (P<0.001), LightGBM (P<0.001), XGBoost (P<0.001), and DL (P=0.009) (Table 4). In the validation cohort, the EML model also performed well, with an AUC of 0.813 (95% CI: 0.763–0.864), exceeding those of RF (AUC 0.753; 95% CI: 0.695–0.812), LR (AUC 0.744; 95% CI: 0.686–0.803), SVM (AUC 0.732; 95% CI: 0.671–0.792), LightGBM (AUC 0.749; 95% CI: 0.691–0.808), XGBoost (AUC 0.751; 95% CI: 0.693–0.809), and DL (AUC 0.754; 95% CI: 0.696–0.812) (Figure 4B). The DeLong test showed that the AUC of the EML model differed significantly from those of RF (P=0.003), LR (P<0.001), SVM (P<0.001), LightGBM (P=0.01), XGBoost (P=0.002), and DL (P=0.004) (Table 4). Moreover, DCA showed that the EML model provided a higher clinical net benefit than the single models (Figure 4C,4D), and calibration curves showed better agreement between the EML model’s predictions and the observed EGFR mutation status (Figure 4E,4F). A heatmap of pairwise DeLong test results among the models is shown in Figure 5.
Table 3
| Models | AUC | ACC | SNE | SPE |
|---|---|---|---|---|
| Training cohort | | | | |
| RF | 0.851 (0.819–0.883) | 0.777 | 0.777 | 0.776 |
| LR | 0.790 (0.752–0.828) | 0.723 | 0.752 | 0.684 |
| SVM | 0.810 (0.774–0.847) | 0.745 | 0.727 | 0.768 |
| LightGBM | 0.835 (0.802–0.868) | 0.746 | 0.683 | 0.831 |
| XGBoost | 0.853 (0.821–0.884) | 0.781 | 0.824 | 0.722 |
| DL | 0.884 (0.855–0.912) | 0.815 | 0.831 | 0.793 |
| EML | 0.928 (0.908–0.949) | 0.858 | 0.881 | 0.827 |
| Validation cohort | | | | |
| RF | 0.753 (0.695–0.812) | 0.715 | 0.735 | 0.690 |
| LR | 0.744 (0.686–0.803) | 0.689 | 0.702 | 0.672 |
| SVM | 0.732 (0.671–0.792) | 0.678 | 0.576 | 0.810 |
| LightGBM | 0.749 (0.691–0.808) | 0.708 | 0.509 | 0.861 |
| XGBoost | 0.751 (0.693–0.809) | 0.693 | 0.695 | 0.690 |
| DL | 0.754 (0.696–0.812) | 0.700 | 0.616 | 0.810 |
| EML | 0.813 (0.763–0.864) | 0.723 | 0.662 | 0.802 |
AUC, area under the receiver operating characteristic curve; ACC, accuracy; DL, deep learning; EML, ensemble machine learning; LightGBM, light gradient boosting machine; LR, logistic regression; RF, random forest; SNE, sensitivity; SPE, specificity; SVM, support vector machine; XGBoost, extreme gradient boosting.
Table 4
| Model | Z | P |
|---|---|---|
| Training cohort | | |
| EML vs. RF | 3.939 | <0.001 |
| EML vs. LR | 6.335 | <0.001 |
| EML vs. SVM | 5.465 | <0.001 |
| EML vs. LightGBM | 4.710 | <0.001 |
| EML vs. XGBoost | 3.920 | <0.001 |
| EML vs. DL | 2.597 | 0.009 |
| Validation cohort | | |
| EML vs. RF | 2.987 | 0.003 |
| EML vs. LR | 3.706 | <0.001 |
| EML vs. SVM | 4.040 | <0.001 |
| EML vs. LightGBM | 2.565 | 0.01 |
| EML vs. XGBoost | 3.065 | 0.002 |
| EML vs. DL | 2.871 | 0.004 |
DL, deep learning; EML, ensemble machine learning; LightGBM, light gradient boosting machine; LR, logistic regression; ML, machine learning; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
EML model explanation
We used SHAP values to explain the prediction process of the EML model. For the global explanation, the SHAP summary plot ranks the variables in descending order of their contribution to the model’s predictions (Figure 6A). From largest to smallest contribution, the variables were: DL score (mean absolute SHAP value, 0.997), long-axis diameter (0.591), smoking history (0.439), pleural retraction (0.392), texture (0.346), vascular convergence (0.322), sex (0.174), air bronchogram (0.092), and bubblelike lucency (0.068). The bee swarm plot further illustrates the direction of each variable’s effect on the prediction (Figure 6B): patients who were female, had no smoking history, or had tumors with a smaller long-axis diameter, subsolid texture, vascular convergence, pleural retraction, air bronchogram, bubblelike lucency, or a higher DL score were more likely to be EGFR-mutant. For the local explanation, a SHAP force plot illustrates a representative patient whose EGFR mutation status was correctly predicted (Figure 7).
Survival outcomes associated with EML
The median PFS for patients in the survival cohort was 15.19 months (95% CI: 12.36–19.27). Notably, patients in the high-score group exhibited significantly longer PFS compared to those in the low-score group, with a hazard ratio (HR) of 0.30 (95% CI: 0.15–0.61; P<0.001) (Figure 8A). To further investigate the clinical relevance of the EML score in specific subgroups, we analyzed its impact on patients with exon 19 deletion mutations and those with L858R point mutations. Statistically significant differences in PFS were observed between the low-score and high-score groups for patients with exon 19 deletion mutations (P=0.02) (Figure 8B) and for those with L858R point mutations (P=0.009) (Figure 8C).
Web tool
To facilitate implementation of the model and support individualized clinical decision-making, we constructed an online web tool (available from: https://ensemble-machine-learning.github.io/EGFR-prediction/). The tool provides clinicians and patients with a user-friendly interface for entering individual patient data to obtain a tailored risk assessment. The web tool is based on the EML model and predicts the probability of EGFR mutation (Figure 9). On the left side of the tool, users enter the DL score and the long-axis diameter of the tumor and select the following characteristics: texture (subsolid or solid), bubblelike lucency (absent or present), air bronchogram (absent or present), vascular convergence (absent or present), pleural retraction (absent or present), sex (male or female), and smoking history (never, or current or former). After entering this information, users click the “Predict” button at the bottom, and the predicted probability of EGFR mutation is displayed in the lower right corner. Based on clinical practice, the long-axis diameter input is restricted to the range of 1 to 10 cm.
Discussion
The National Comprehensive Cancer Network (NCCN) guidelines emphasize gene mutation testing in patients diagnosed with lung adenocarcinoma (30), because targeted therapies against driver gene mutations have significantly improved patient survival. EGFR mutation is the most prevalent driver mutation in lung adenocarcinoma, occurring in about 50% of Asian patients and 15% to 25% of patients in North America and Europe (30,31). Accurately determining EGFR mutation status non-invasively is therefore crucial for identifying lung adenocarcinoma patients who are suitable for EGFR-TKI treatment (32).
Recent literature has demonstrated that ML models based on clinical and radiological characteristics can non-invasively predict EGFR mutation status in patients with lung adenocarcinoma. For instance, Liu et al. developed an LR model based on clinical and radiological characteristics, achieving an AUC of 0.778 (15). However, this study was confined to a single center and lacked independent validation. Similarly, Zhang et al. reported an AUC of 0.740 for a nomogram model that combined clinical and radiological characteristics (16), but this model was also limited to a single-center design and exhibited suboptimal predictive performance. While radiomics provides more quantitative features, it often requires extensive and precise manual delineation, which can be a significant limitation. Furthermore, previous literature indicated that the performance of ML models based on radiomic features for predicting EGFR mutation status was unsatisfactory. For example, Huang et al. reported an AUC of only 0.68 using a CT-based radiomics model (33), while Wang et al. achieved an AUC of just 0.64 with a similar method (34).
Ensemble learning is a robust technique that combines multiple learning algorithms to improve model performance, often achieving higher accuracy and robustness than single models. By capitalizing on the strengths of different algorithms, it is particularly advantageous when single models are limited by data constraints or model complexity. Ensemble learning has recently attracted increasing interest in the medical field. For instance, Karashima et al. reported that an ensemble learning model achieved an AUC of 0.90 in predicting primary aldosteronism (35), and Usman et al. reported that their ensemble learning model attained 94.2% sensitivity and 95.8% specificity in predicting epileptic seizures (36). In this study, we developed five ML models based on clinical and radiological characteristics, in addition to a CT-based DL model, for the non-invasive prediction of EGFR mutation status, and combined these models into an EML model. To our knowledge, this is the first multi-center study to use ensemble learning to predict EGFR mutation status in patients with lung adenocarcinoma. The EML model outperformed the single models, achieving AUCs of 0.928 in the training cohort and 0.813 in the validation cohort.
The “black box” nature and limited interpretability of ML models may hinder their broad adoption in clinical settings, as clinicians are often reluctant to trust predictions they cannot interpret (37). To uncover the predictive mechanisms and address this challenge, we applied the SHAP method to explain the EML model. The global SHAP explanation indicates that EGFR mutations are more common in females and non-smokers, and that EGFR-mutant tumors are more likely to present with a smaller long-axis diameter, subsolid texture, pleural retraction, vascular convergence, air bronchogram, bubblelike lucency, and a higher DL score. These results are consistent with the clinical and demographic characteristics of patients with EGFR mutations and the radiological characteristics of EGFR-mutant tumors reported in previous literature (15,38,39). The underlying mechanisms may involve the association between EGFR mutations and estrogen levels, the higher mutation frequency in non-smoking patients, and the EGFR mutation-driven progression of lung cancer from a ground-glass to a solid state (40-42). Notably, the SHAP analysis identified the long-axis diameter as the most important predictor among the clinical and radiological characteristics in this study. In a previous study by Liu et al. (15), the odds ratio (OR) for long-axis diameter was 0.78 in univariate analysis, whereas smoking history showed a stronger association (OR 0.34). This difference may arise because long-axis diameter is a continuous variable measured in centimeters, ranging from less than 1 cm to more than 10 cm, whereas smoking history is binary. An OR for a continuous variable describes the effect of a one-unit change, so its cumulative effect across a wide range of values can exceed that of a binary variable, and SHAP values capture each variable’s contribution across its full observed range.
Online web tools have shown considerable potential for improving individualized clinical decision-making (43,44). As decision aids, they provide clinicians and patients with patient-specific information that can guide individualized treatment. For instance, an online web tool for coronary heart disease has shown considerable potential for enhancing patient self-management (45). In lung adenocarcinoma, an online web tool could give patients an accessible platform for managing their condition, potentially reducing health disparities. By taking clinical and radiological characteristics together with the DL score as input, our web tool generates the probability of EGFR mutation, thereby supporting patients and clinicians in making individualized clinical decisions.
Our study has several limitations. First, this was a retrospective analysis and is therefore subject to inherent selection bias; the findings should be validated in additional prospective cohorts. Second, this study predicted only overall EGFR mutation status, without distinguishing specific EGFR mutation subtypes (e.g., exon 19 deletions and the exon 21 L858R point mutation). Future research should aim to develop more precise predictive models that can differentiate among EGFR mutation subtypes. Finally, although we analyzed the PFS of patients with EGFR mutations treated with EGFR-TKIs, we did not evaluate other prognostic indicators, such as overall survival (OS), because additional treatments administered after disease progression could confound them.
Conclusions
Our study developed an online explainable EML model that combines multiple ML models for the non-invasive prediction of EGFR mutation status and achieves good performance. The model might assist clinicians in making individualized predictions for patients and in guiding EGFR-TKI targeted therapy.
Acknowledgments
We are grateful to all our colleagues for their help during the study and all the selfless volunteers who participated in this study.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-237/rc
Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-237/dss
Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-237/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-237/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board (IRB) of the First Affiliated Hospital of Anhui Medical University in China (approval No. PJ2024-01-57; approval date: January 2024), and all participating hospitals/institutions were informed of and agreed to the study. The requirement for informed consent was waived owing to the retrospective nature of the study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
2. Siegel RL, Kratzer TB, Giaquinto AN, et al. Cancer statistics, 2025. CA Cancer J Clin 2025;75:10-45.
3. Kim MJ, Cervantes C, Jung YS, et al. PAF remodels the DREAM complex to bypass cell quiescence and promote lung tumorigenesis. Mol Cell 2021;81:1698-1714.e6.
4. Yang JH, Kim Y, Lee S, et al. 4O: Amivantamab plus lazertinib vs osimertinib in first-line (1L) EGFR-mutant (EGFRm) advanced NSCLC: Final overall survival (OS) from the phase III MARIPOSA study. J Thorac Oncol 2025;20:S6-S8.
5. Hua X, Wu X, Lv L, et al. Anlotinib enhances the anti-tumor activity of osimertinib in patients with non-small cell lung cancer by reversing drug resistance. Transl Lung Cancer Res 2025;14:40-57.
6. Zhou F, Guo H, Xia Y, et al. The changing treatment landscape of EGFR-mutant non-small-cell lung cancer. Nat Rev Clin Oncol 2025;22:95-116.
7. Soria JC, Ohe Y, Vansteenkiste J, et al. Osimertinib in Untreated EGFR-Mutated Advanced Non-Small-Cell Lung Cancer. N Engl J Med 2018;378:113-25.
8. Lee JK, Hahn S, Kim DW, et al. Epidermal growth factor receptor tyrosine kinase inhibitors vs conventional chemotherapy in non-small cell lung cancer harboring wild-type epidermal growth factor receptor: a meta-analysis. JAMA 2014;311:1430-7.
9. Wang L, Song B, Zhang Z, et al. Evaluating efficacy and safety of a novel registration-free CT-guided needle biopsy navigation system (RC 120): A multicenter, prospective clinical trial. Lung Cancer 2024;198:108025.
10. Liu HE, Vuppalapaty M, Wilkerson C, et al. Detection of EGFR Mutations in cfDNA and CTCs, and Comparison to Tumor Tissue in Non-Small-Cell-Lung-Cancer (NSCLC) Patients. Front Oncol 2020;10:572895.
11. Luo Y, Li S, Ma H, et al. CT-based decision tree model for predicting EGFR mutation status in synchronous multiple primary lung cancers. J Thorac Dis 2023;15:1196-209.
12. Kirby M, Smith BM. Quantitative CT Scan Imaging of the Airways for Diagnosis and Management of Lung Disease. Chest 2023;164:1150-8.
13. Pan Z, Hu G, Zhu Z, et al. Predicting Invasiveness of Lung Adenocarcinoma at Chest CT with Deep Learning Ternary Classification Models. Radiology 2024;311:e232057.
14. Yun JK, Kim JY, Ahn Y, et al. Predicting Recurrence after Sublobar Resection in Patients with Lung Adenocarcinoma Using Preoperative Chest CT Scans. Radiology 2024;313:e233244.
15. Liu Y, Kim J, Qu F, et al. CT Features Associated with Epidermal Growth Factor Receptor Mutation Status in Patients with Lung Adenocarcinoma. Radiology 2016;280:271-80.
16. Zhang G, Zhang J, Cao Y, et al. Nomogram based on preoperative CT imaging predicts the EGFR mutation status in lung adenocarcinoma. Transl Oncol 2021;14:100954.
17. Kovačić Đ, Radočaj D, Jurišić M. Ensemble machine learning prediction of anaerobic co-digestion of manure and thermally pretreated harvest residues. Bioresour Technol 2024;402:130793.
18. Sahoo G, Nayak AK, Tripathy PK, et al. Predicting Breast Cancer Relapse from Histopathological Images with Ensemble Machine Learning Models. Curr Oncol 2024;31:6577-97.
19. Hasrod T, Nuapia YB, Tutu H. Comparison of individual and ensemble machine learning models for prediction of sulphate levels in untreated and treated Acid Mine Drainage. Environ Monit Assess 2024;196:332.
20. Cai Z, Sun Q, Li C, et al. Machine-learning-based prediction by stacking ensemble strategy for surgical outcomes in patients with degenerative cervical myelopathy. J Orthop Surg Res 2024;19:539.
21. Howson SN, McShea MJ, Ramachandran R, et al. Improving the Prediction of Persistent High Health Care Utilizers: Retrospective Analysis Using Ensemble Methodology. JMIR Med Inform 2022;10:e33212.
22. Pasha Syed AR, Anbalagan R, Setlur AS, et al. Implementation of ensemble machine learning algorithms on exome datasets for predicting early diagnosis of cancers. BMC Bioinformatics 2022;23:496.
23. Huang JC, Lyu SC, Pan B, et al. A logistic regression model to predict long-term survival for borderline resectable pancreatic cancer patients with upfront surgery. Cancer Imaging 2025;25:10.
24. Crombé A, Kataoka M. Breast cancer molecular subtype prediction: Improving interpretability of complex machine-learning models based on multiparametric-MRI features using SHapley Additive exPlanations (SHAP) methodology. Diagn Interv Imaging 2024;105:161-2.
25. Castagno S, Birch M, van der Schaar M, et al. Predicting rapid progression in knee osteoarthritis: a novel and interpretable automated machine learning approach, with specific focus on young patients and early disease. Ann Rheum Dis 2025;84:124-35.
26. Xie R, Herder C, Sha S, et al. Novel type 2 diabetes prediction score based on traditional risk factors and circulating metabolites: model derivation and validation in two large cohort studies. EClinicalMedicine 2025;79:102971.
27. Zhang JN, Li ZF, Zheng SY, et al. Deep learning model for predicting spread through air spaces of lung adenocarcinoma based on transfer learning mechanism. Transl Lung Cancer Res 2025;14:1061-75.
28. Ichikawa S, Motosugi U, Tamada D, et al. Improving the Quality of Diffusion-weighted Imaging of the Left Hepatic Lobe Using Weighted Averaging of Signals from Multiple Excitations. Magn Reson Med Sci 2019;18:225-32.
29. Nourani V, Elkiran G, Abba SI. Wastewater treatment plant performance analysis using artificial intelligence - an ensemble approach. Water Sci Technol 2018;78:2064-76.
30. Riely GJ, Wood DE, Ettinger DS, et al. Non-Small Cell Lung Cancer, Version 4.2024, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2024;22:249-74.
31. Leighl NB. Osimertinib in Stage III EGFR-Mutated NSCLC - Game, Set, Match. N Engl J Med 2024;391:652-4.
32. Mu W, Jiang L, Zhang J, et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat Commun 2020;11:5228.
33. Huang X, Sun Y, Tan M, et al. Three-Dimensional Convolutional Neural Network-Based Prediction of Epidermal Growth Factor Receptor Expression Status in Patients With Non-Small Cell Lung Cancer. Front Oncol 2022;12:772770.
34. Wang S, Shi J, Ye Z, et al. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning. Eur Respir J 2019;53:1800986.
35. Karashima S, Kawakami M, Nambo H, et al. A hyperaldosteronism subtypes predictive model using ensemble learning. Sci Rep 2023;13:3043.
36. Muhammad Usman S, Khalid S, Bashir S. A deep learning based ensemble learning method for epileptic seizure prediction. Comput Biol Med 2021;136:104710.
37. Wei Z, Bai X, Xv Y, et al. A radiomics-based interpretable machine learning model to predict the HER2 status in bladder cancer: a multicenter study. Insights Imaging 2024;15:262.
38. Ren X, Wen X, Ren YJ, et al. Significance of thyroid transcription factor 1 and Napsin A for prompting the status of EGFR mutations in lung adenocarcinoma patients. J Thorac Dis 2022;14:4395-404.
39. Seow WJ, Matsuo K, Hsiung CA, et al. Association between GWAS-identified lung adenocarcinoma susceptibility loci and EGFR mutations in never-smoking Asian women, and comparison with findings from Western populations. Hum Mol Genet 2017;26:454-65.
40. Tani Y, Kaneda H, Koh Y, et al. The Impact of Estrogen Receptor Expression on Mutational Status in the Evolution of Non-Small Cell Lung Cancer. Clin Lung Cancer 2023;24:165-74.
41. Huang Z, Sun S, Lee M, et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nat Genet 2022;54:492-8.
42. Ito M, Miyata Y, Tsutani Y, et al. Positive EGFR mutation status is a risk of recurrence in pN0-1 lung adenocarcinoma when combined with pathological stage and histological subtype: A retrospective multi-center analysis. Lung Cancer 2020;141:107-13.
43. McEvoy AM, Hippe DS, Lachance K, et al. Merkel cell carcinoma recurrence risk estimation is improved by integrating factors beyond cancer stage: A multivariable model and web-based calculator. J Am Acad Dermatol 2024;90:569-76.
44. Saglietto A, Gaita F, Blomstrom-Lundqvist C, et al. AFA-Recur: an ESC EORP AFA-LT registry machine-learning web calculator predicting atrial fibrillation recurrence after ablation. Europace 2023;25:92-100.
45. Ren W, Zhang Z, Wang Y, et al. Coronary health index based on immunoglobulin light chains to assess coronary heart disease risk with machine learning: a diagnostic trial. J Transl Med 2025;23:22.

