Development of a multimodal fully automated ensemble model to predict EGFR mutation and efficacy of EGFR-TKI in non-small cell lung cancer
Highlight box
Key findings
• The multimodal ensemble model combining clinical, imaging, and machine learning features predicted epidermal growth factor receptor (EGFR) mutation status in non-small cell lung cancer (NSCLC) with high accuracy.
• The multimodal ensemble model showed higher predictive performance for EGFR exon 19 deletions compared with L858R mutations in the therapeutic response of EGFR-tyrosine kinase inhibitor (TKI).
What is known and what is new?
• EGFR mutations are key driver oncogenes in NSCLC and are associated with the efficacy of EGFR-TKI therapy. Molecular testing is the current standard but requires tumor tissue or plasma samples.
• This study demonstrates that the multimodal ensemble model integrating clinical data and computed tomography imaging can non-invasively predict EGFR mutation status and therapeutic response based on EGFR mutation subtypes.
What is the implication, and what should change now?
• The multimodal ensemble model may serve as a valuable adjunct to conventional molecular testing, especially when tissue samples are insufficient or unavailable.
• Future large-scale validation studies are warranted to confirm these findings and to facilitate the integration of artificial intelligence-assisted predictions of EGFR mutation and therapeutic efficacy into routine clinical decision-making for NSCLC.
Introduction
Epidermal growth factor receptor (EGFR) mutations are prevalent in non-small cell lung cancer (NSCLC), and EGFR-targeted therapy has revolutionized the treatment of NSCLC harboring mutant EGFR (1-3). However, testing for EGFR mutations requires tumor biopsies, which are invasive and time-consuming for patients (4). Moreover, the clinical benefit of EGFR-tyrosine kinase inhibitor (EGFR-TKI) therapy is highly variable, which motivates the development of an artificial intelligence-enabled approach for predicting the efficacy of EGFR-TKI in patients with mutant EGFR.
The modalities of artificial intelligence in medical care are rapidly evolving. Unlike conventional machine learning, which constitutes the mainstream approach for radiomics analyses, deep learning algorithms such as the convolutional neural network have recently emerged as a useful tool for image analysis (5). Deep learning algorithms exhibited superior performance compared to conventional machine learning and can thus be used to enhance predictive accuracy (6).
Molecular profiling is crucial for treatment decision-making in lung cancer, but it can be affected by tumor heterogeneity, the need for invasive biopsy for gene sequencing, and dynamic genomic changes during therapy (7). Previous studies showed that non-invasive radiomic models based on diagnostic computed tomography (CT) were associated with EGFR mutation status in lung cancer (8,9). In this study, we developed a comprehensive approach that integrates multiple machine learning models (logistic regression, Bernoulli naive Bayes, and Gaussian naive Bayes) with a deep learning model (convolutional neural network) to evaluate the predictive accuracy for mutant EGFR genotypes and therapeutic responses to EGFR-TKI in patients with NSCLC. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-672/rc).
Methods
Dataset
We retrospectively enrolled patients with advanced lung cancer who underwent EGFR-TKI therapy at the Department of Respiratory Medicine of Juntendo Hospital Affiliated with Juntendo University from August 2015 to September 2021 (Figure S1). The demographic data included sex, age, smoking history, and Eastern Cooperative Oncology Group (ECOG) performance status. Tumor characteristics, including histology, tumor molecular profiling for EGFR mutation, programmed death-ligand 1 (PD-L1) status, tumor-node-metastasis (TNM) classification as proposed by the 8th edition of the Union for International Cancer Control (UICC), and the number of metastatic organs, were noted. Treatment-related data was also recorded, including the ordinal line of treatment, type of EGFR-TKI, therapeutic response, and patient survival. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Medical Research Ethics Committee of Juntendo University Faculty of Medicine (No. E21-0096). Informed consent was waived in this retrospective study.
Ensemble model to predict EGFR mutation and treatment efficacy
The ensemble model integrated multiple machine learning models (logistic regression, Bernoulli naive Bayes, and Gaussian naive Bayes), which were built to analyze clinical information, and a deep learning model (deep convolutional neural network), which was used to analyze CT images. Logistic regression was selected as a transparent baseline model suitable for structured clinical data. Bernoulli and Gaussian naive Bayes classifiers were employed for their efficiency in handling categorical and binary features under limited sample sizes. A deep convolutional neural network was used to analyze CT images, as it can capture complex spatial representations beyond the capability of conventional classifiers. These models were integrated to exploit complementary strengths: interpretability from classical methods and high-dimensional feature learning from deep learning. The ensemble model was designed to predict EGFR mutation and the overall response to 1st-line EGFR-TKI.
The clinical information input comprised sex, age, smoking history, ECOG performance status, and tumor characteristics, including histology, PD-L1 status, and the N category of the TNM classification. The lung CT images were contrast-enhanced and acquired as the baseline CT for the 1st-line treatment. As preprocessing, the CT images were resampled to 3 mm isotropic voxels, and the lung field mask images were obtained using the pre-trained lung mask model available at https://github.com/JoHof/lungmask (10).
The overview of our ensemble model is shown in Figure 1. In the deep learning model, voxel-wise features were calculated by a 3D U-net (11). The mean and maximum values of each channel were calculated over the lung mask, concatenated, and input into the multilayer perceptron. Each sub-model output its estimate as a probability, and the ensemble model integrated these sub-model outputs to produce the final estimate. Each sub-model was independent and could be adjusted, attached, or detached individually.
The deep model was a 3D U-net (11) with four encoder–decoder levels (3×3×3 convolutions with ReLU, batch normalization, 2×2×2 pooling/upsampling). CT volumes were resampled to 3 mm isotropic voxels and lung masks were obtained with the lungmask tool. Feature maps were aggregated by global mean/max pooling within the lung field. The resulting image feature vector was concatenated with the clinical vector (age, sex, smoking history, ECOG, histology, PD-L1, N-status) and passed to a multilayer perceptron for final prediction.
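The aggregation step described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names `masked_mean_max` and `build_mlp_input`, the flat-list data layout, and the toy values are all assumptions made for illustration. It only shows how per-channel mean and maximum values restricted to the lung mask can be concatenated with a clinical vector.

```python
from statistics import fmean


def masked_mean_max(feature_maps, lung_mask):
    """Pool each feature channel over the voxels inside the lung mask.

    feature_maps: list of channels, each a flat list of voxel values
                  (hypothetical layout chosen for this sketch).
    lung_mask:    flat list of 0/1 flags, same length as each channel.
    Returns [mean_c0, ..., mean_cK, max_c0, ..., max_cK].
    """
    means, maxes = [], []
    for channel in feature_maps:
        inside = [v for v, m in zip(channel, lung_mask) if m]
        if not inside:
            raise ValueError("lung mask selects no voxels")
        means.append(fmean(inside))
        maxes.append(max(inside))
    return means + maxes


def build_mlp_input(feature_maps, lung_mask, clinical_vector):
    """Concatenate the pooled image features with the clinical vector."""
    return masked_mean_max(feature_maps, lung_mask) + list(clinical_vector)


# Toy example: 2 channels, 4 voxels, 2 of which lie inside the lung mask.
features = [[0.2, 0.8, 0.5, 0.1], [1.0, 3.0, 2.0, 9.0]]
mask = [0, 1, 1, 0]
clinical = [67.0, 1.0, 0.0]  # made-up encoded clinical values
mlp_input = build_mlp_input(features, mask, clinical)
```

Voxels outside the mask (such as the value 9.0 in the second channel) do not influence the pooled features, which is the purpose of restricting the pooling to the lung field.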
The ensemble used a stacked generalization approach. Each base learner was trained independently on the training folds, and their predicted probabilities were combined by a logistic regression meta-learner, fitted with inner cross-validation to prevent information leakage. This procedure provided the final ensemble prediction.
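As a rough illustration of stacked generalization, the sketch below uses scikit-learn's `StackingClassifier` with the three classical base learners and a logistic regression meta-learner. It is a sketch under stated assumptions rather than the study's code: the data are synthetic (`make_classification`), the deep learning sub-model is omitted, and the fold settings and random seeds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB

# Synthetic stand-in for the clinical feature table (assumption: 150 rows, 7 features).
X, y = make_classification(n_samples=150, n_features=7, random_state=0)

# Base learners with default settings; the CT-based deep model is not included here.
base_learners = [
    ("logreg", LogisticRegression()),
    ("bernoulli_nb", BernoulliNB()),
    ("gaussian_nb", GaussianNB()),
]

# The meta-learner is fitted on out-of-fold base-learner probabilities (inner CV),
# so it never sees predictions made on data the base learners were trained on.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)

# Outer 5-fold cross-validation gives one AUC per held-out fold.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
fold_aucs = cross_val_score(stack, X, y, cv=outer_cv, scoring="roc_auc")
mean_auc = fold_aucs.mean()
```

Nesting the inner cross-validation inside each outer training fold is what keeps the held-out fold untouched by both the base learners and the meta-learner.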
In the machine learning model training, logistic regression, Bernoulli naive Bayes, and Gaussian naive Bayes were trained using Scikit-learn with default parameter settings (12). In the deep learning model training, the loss function was simple binary cross-entropy loss with no weight adjustment. During training, 3D lung volumes were augmented with random intensity shift (prob =0.5, offsets =2) and random zoom (prob =0.5, min_zoom =0.95, max_zoom =1.05) using Project MONAI (13). A stochastic gradient descent optimizer was used to train our hybrid model in an end-to-end fashion (14). We used One Cycle LR with a maximum learning rate of 5×102 and total steps of 50, and repeated this cycle six times (15). The program was coded using Python 3.11 with PyTorch 2.0.1 (Python Software Foundation).
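The shape of a one-cycle learning-rate schedule can be sketched in plain Python. This is an illustrative approximation, not the training code: the function `one_cycle_lr` is hypothetical, its default parameters (`pct_start`, `div_factor`, `final_div_factor`, cosine annealing) are assumptions chosen to resemble common one-cycle defaults, the learning rate is expressed as a fraction of the maximum (`max_lr=1.0`), and restarting the schedule from its initial value for each of the six cycles is one possible reading of the text.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a 0-based step of a cosine one-cycle policy (sketch)."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # step at which the peak is reached
    last_step = total_steps - 1

    def cosine(start, end, pct):
        # Cosine interpolation from start (pct=0) to end (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cosine(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cosine(max_lr, min_lr, pct)


# Six consecutive cycles of 50 steps each, as a fraction of the peak rate.
STEPS_PER_CYCLE = 50
N_CYCLES = 6
schedule = [
    one_cycle_lr(s % STEPS_PER_CYCLE, STEPS_PER_CYCLE, max_lr=1.0)
    for s in range(STEPS_PER_CYCLE * N_CYCLES)
]
```

Under these assumptions each cycle warms up to the peak rate over roughly the first 30% of its steps and then anneals to a value far below the starting rate before the next cycle begins.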
Statistical analysis
The study employed 5-fold cross-validation to assess the performance of the proposed model. The final performance metric was the average of the area under the curve (AUC) computed across all folds. Pairwise comparisons of AUCs between models were performed using the Hanley-McNeil method, which is appropriate for independent receiver operating characteristic (ROC) curves. McNemar’s test was used for paired binary outcomes such as accuracy. All statistical tests were two-sided, and significance was defined at P<0.05.
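For readers unfamiliar with the Hanley-McNeil approach, the following sketch shows the standard error formula and the resulting z-test for two independent AUCs. It is a minimal illustration and not the analysis code used in the study: the function names and all numbers in the usage lines are hypothetical.

```python
import math


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive case outranks a negative one."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC following Hanley and McNeil (1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_independent_aucs(auc_a, n_pos_a, n_neg_a, auc_b, n_pos_b, n_neg_b):
    """Two-sided z-test for the difference between two independent AUCs."""
    se_a = hanley_mcneil_se(auc_a, n_pos_a, n_neg_a)
    se_b = hanley_mcneil_se(auc_b, n_pos_b, n_neg_b)
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Hypothetical predicted probabilities for a toy group of 3 positives and 4 negatives.
toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])

# Hypothetical comparison of two subgroups (made-up AUCs and group sizes).
z_stat, p_val = compare_independent_aucs(0.85, 30, 40, 0.60, 25, 35)
```

Because the test treats the two ROC curves as independent, it suits comparisons between non-overlapping patient subgroups; correlated curves derived from the same patients would call for a paired method.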
Results
Patient characteristics
This study included 150 consecutive evaluable patients (Table 1). Among them, 89 patients (59.3%) had wild-type EGFR, 59 patients (39.3%) had EGFR common mutations (EGFR com), comprising 33 patients with exon 19 deletions (Del19) and 26 patients with the L858R mutation, and 2 patients had EGFR uncommon mutations (EGFR uncom).
Table 1
| Features | N | Percentage (%) |
|---|---|---|
| Sex | ||
| Female | 60 | 40.0 |
| Male | 90 | 60.0 |
| Age at diagnosis (years) | ||
| <60 | 44 | 29.3 |
| 60–75 | 69 | 46.0 |
| >75 | 37 | 24.7 |
| Histology | ||
| Adenocarcinoma | 126 | 84.0 |
| Squamous carcinoma | 5 | 3.3 |
| Others | 19 | 12.7 |
| Actionable mutation | ||
| EGFR(+) | 61 | 40.7 |
| ALK/BRAF/KRAS(+) | 15 | 10.0 |
| Negative | 74 | 49.3 |
| Clinical stage | ||
| I/II | 14 | 9.3 |
| III | 18 | 12.0 |
| IV | 118 | 78.7 |
| ECOG PS | ||
| 0–1 | 124 | 82.7 |
| 2–4 | 17 | 11.3 |
| Unknown | 9 | 6.0 |
| Smoking status | ||
| Never | 41 | 27.3 |
| Former/current | 109 | 72.7 |
| PD-L1 expression | ||
| <50% | 90 | 60.0 |
| ≥50% | 37 | 24.7 |
| NA | 23 | 15.3 |
| First line of therapy | ||
| EGFR-TKI | 57 | 38.0 |
| ALK-TKI | 1 | 0.7 |
| MET-TKI | 1 | 0.7 |
| Chemotherapy | 16 | 10.7 |
| PD-1/PD-L1 | 15 | 10.0 |
| Chemotherapy + EGFR-TKI | 4 | 2.7 |
| Chemotherapy + PD-1/PD-L1 | 56 | 37.3 |
| Best response | ||
| CR/PR | 71 | 47.3 |
| SD | 50 | 33.3 |
| PD | 17 | 11.3 |
| NE | 12 | 8.0 |
ALK, anaplastic lymphoma kinase; BRAF, v-raf murine sarcoma viral oncogene homolog B1; CR, complete response; ECOG, Eastern Cooperative Oncology Group; EGFR, epidermal growth factor receptor; KRAS, Kirsten rat sarcoma viral oncogene homolog; NA, not available; NE, not evaluable; PD, progressive disease; PD-1, programmed cell death protein 1; PD-L1, programmed death-ligand 1; PR, partial response; PS, Performance Status; SD, stable disease; TKI, tyrosine kinase inhibitor.
The ROC analysis results of the 5-fold cross-validation of the ensemble model are shown in Figure 2. The ensemble model predicted EGFR mutation with an AUC of 0.88 [95% confidence interval (CI): 0.79–0.97], while each sub-model predicted EGFR mutation with AUCs of 0.83 (95% CI: 0.71–0.95), 0.88 (95% CI: 0.79–0.97), 0.87 (95% CI: 0.79–0.94), and 0.84 (95% CI: 0.72–0.96) for logistic regression, Bernoulli naive Bayes, Gaussian naive Bayes, and the deep neural network, respectively. The standard deviation of the AUC for the ensemble model was 0.07, which was equal to or lower than those of the sub-models (0.10, 0.07, 0.07, and 0.10 for logistic regression, Bernoulli naive Bayes, Gaussian naive Bayes, and the deep convolutional neural network, respectively).
ROC analysis for predicting the overall response to 1st-line EGFR-TKI is shown in Figure 3. The ensemble model achieved an AUC of 0.56 (95% CI: 0.29–0.83), compared with AUCs of 0.59 (95% CI: 0.33–0.86), 0.43 (95% CI: 0.17–0.70), 0.49 (95% CI: 0.33–0.65), and 0.44 (95% CI: 0.15–0.72) for logistic regression, Bernoulli naive Bayes, Gaussian naive Bayes, and the deep neural network, respectively. By mutation subtype, the predictive accuracy of the ensemble model was higher for the Del19 EGFR mutation than for the L858R EGFR mutation (overall response rate AUC: 0.67 vs. 0.52; disease control rate AUC: 0.88 vs. 0.53). Formal comparison using the Hanley-McNeil method showed that the difference in AUCs between Del19 and L858R was statistically significant for disease control rate (P=0.04) but not for overall response rate (P=0.91).
Discussion
This study investigated the performance of a multimodal ensemble model, which integrates multiple machine learning models and a neural network model, in predicting EGFR status and EGFR-TKI response in patients with NSCLC. To our knowledge, this is the first clinical study to investigate the diagnostic accuracy of an ensemble model, incorporating each sub-model, by integrating clinical information and CT images in predicting EGFR mutation and EGFR-TKI efficacy in lung cancer. The ensemble model demonstrated strong performance in predicting EGFR mutations and showed a higher predictive value for EGFR-TKI treatment response in EGFR exon 19 deletions than in L858R mutations.
In our implementation, the ensemble used a stacked generalization strategy, where base learners (logistic regression, Bernoulli and Gaussian Naïve Bayes, and the deep learning model) were trained independently and their probability outputs combined by a logistic regression meta-learner. We chose stacking over simpler approaches such as majority voting or bagging, because stacking can explicitly learn the optimal weighting of heterogeneous models and is well suited to multimodal data, improving robustness and predictive performance.
Artificial intelligence represents a promising tool for the non-invasive prediction of EGFR mutation status in NSCLC, particularly when tissue samples are limited or unavailable. Previous studies have shown that machine learning systems can predict EGFR genotypes with AUCs ranging from 0.75 to 0.81 (16). More recently, meta-analyses concluded that radiomics-based machine learning models achieve pooled AUCs of approximately 0.80–0.85 for EGFR mutation prediction in NSCLC (17). Notably, the predictive performance of our ensemble model (AUC =0.88) is comparable to, or even exceeds, the accuracy reported in these prior studies.
Currently, the ability to predict EGFR-TKI response remains poor, although CT-based ensemble deep learning methods for cancer diagnosis and prognosis represent a cutting-edge approach in medical imaging and artificial intelligence. A recent proof-of-concept study reported that ensemble deep learning, by integrating CT with conventional risk factors, improved the predictive performance for overall survival in NSCLC treated with immune checkpoint inhibitors. Another ensemble deep-learning study showed that the fusion of radiomics and clinical features can enhance the predictive performance in determining brain metastasis risk in advanced NSCLC (18). Similarly, our CT-based ensemble deep learning study provides an initial demonstration of high and stable predictive performance for EGFR mutation.
Patients with EGFR mutations exhibited heterogeneous therapeutic responses after treatment with EGFR-TKI, indicating the need to identify patients with mutated EGFR who may benefit from EGFR-TKI treatment in clinical settings. A previous deep-learning study evaluated the predictive performance for the best overall response to third-generation EGFR-TKI in T790M-mutated NSCLC, with AUCs of 0.73–0.83 (19). In the present study, the accuracy of predicting EGFR-TKI efficacy from pretreatment clinical information and CT images was limited in NSCLC patients with mutant EGFR. Interestingly, the predictive performance for therapeutic response increased markedly when the analysis was stratified by EGFR mutant genotype. The limited predictive accuracy for EGFR-TKI therapeutic response may be attributed to the inherent complexity of EGFR mutant genotypes, the complexity of the mechanisms of primary resistance to EGFR-TKIs, differences in TKI treatment regimens, and the small cohort of this study (20).
Our choice of algorithms reflects a balance between interpretability and predictive performance. While logistic regression and naive Bayes provide transparency and robustness for small structured datasets, convolutional neural networks are indispensable for extracting latent features from imaging data. By combining these methods in an ensemble, we aimed to ensure robustness across heterogeneous data modalities while acknowledging the inherent trade-offs of each approach.
This study has several limitations. The sample size was small, which limited the statistical power of subgroup analyses such as Del19 versus L858R. Although the ensemble model demonstrated high accuracy in predicting EGFR mutation status, its performance in predicting overall EGFR-TKI treatment response was modest, necessitating further investigation based on EGFR-mutated genotypes to enhance clinical applicability. Additionally, the study was retrospective and conducted at a single center, which may limit its generalizability. Future large-scale validation studies are warranted to confirm these findings and to facilitate the integration of artificial intelligence-assisted EGFR mutation prediction and therapeutic response into routine clinical decision-making for NSCLC.
Conclusions
The ensemble model demonstrated strong performance in predicting EGFR mutations and showed higher predictive efficacy for EGFR exon 19 deletions than for L858R mutations in terms of therapeutic response to EGFR-TKI. These findings suggest that ensemble learning may enhance the non-invasive prediction of EGFR mutation status and EGFR genotype-specific therapeutic outcomes in NSCLC. The ensemble model may serve as a valuable adjunct to conventional molecular testing, especially when tissue samples are insufficient or unavailable.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-672/rc
Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-672/dss
Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-672/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-672/coif). Taichi Miyawaki reports receiving consulting fees from Amgen K.K., personal fees from Chugai Pharmaceutical Co., Ltd., Kyowa Hakko Kirin Co., Ltd., Bristol-Myers Squibb, MSD, Daiichi-Sankyo Co., Ltd., AstraZeneca K.K., Taiho Pharma, Ono Pharmaceutical Co., Ltd., outside the submitted work. T.S. reports receiving grants from AstraZeneca, Chugai Pharmaceutical, Boehringer Ingelheim, Novartis, MSD, and Novocure; receiving honoraria from AstraZeneca, Chugai Pharmaceutical, Boehringer Ingelheim, Novartis, MSD, Taiho Pharma, Daiichi-Sankyo, Ono Pharmaceutical, Bristol-Myers Squibb, Nippon Kayaku, Pfizer, Takeda, Eli Lilly and Company, Eisai, Merck biopharma, and Amgen. R.K. reports receiving grants from Daiichi Sankyo Co., Ltd., AstraZeneca K. K., and Chugai Pharmaceutical Co., Ltd. Tomoyasu Mimori reports receiving personal fees from Astra Zeneca, Bristol Myers Squibb, Chugai Pharmaceutical, Daiichi-Sankyo Co., Ltd., Eisai, Nippon Kayaku Co., Ltd., Ono Pharmaceutical, and Taiho Pharmaceutical outside the submitted work. Y.O. is a current employee of Plusman LLC. and Milliman, Inc. K.T. reports receiving grants or contracts from Chugai Pharmaceutical Co., Ltd., Ono Pharmaceutical Co., Ltd., Eli Lilly Japan K. K., Taiho Pharmaceutical Co., Ltd., Nippon Kayaku Co., Ltd., Nippon Boehringer Ingelheim Co., Ltd.; receiving honoraria for speaking from Chugai Pharmaceutical Co., Ltd., AstraZeneca K. K., Taiho Pharmaceutical Co., Ltd., Ono Pharmaceutical Co., Ltd., Eli Lilly Japan K. K., Bristol-Myers K. K., Daiichi Sankyo Co., Ltd., Nippon Kayaku Co., Ltd., Nippon Boehringer Ingelheim Co., Ltd.; and is a member of the Board of Directors of the Japan Lung Cancer Society and the Japanese Respiratory Society. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Medical Research Ethics Committee of Juntendo University Faculty of Medicine (No. E21-0096). Informed consent was waived in this retrospective study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Mitsudomi T, Morita S, Yatabe Y, et al. Gefitinib versus cisplatin plus docetaxel in patients with non-small-cell lung cancer harbouring mutations of the epidermal growth factor receptor (WJTOG3405): an open label, randomised phase 3 trial. Lancet Oncol 2010;11:121-8. [Crossref] [PubMed]
- Soria JC, Ohe Y, Vansteenkiste J, et al. Osimertinib in Untreated EGFR-Mutated Advanced Non-Small-Cell Lung Cancer. N Engl J Med 2018;378:113-25. [Crossref] [PubMed]
- Sequist LV, Yang JC, Yamamoto N, et al. Phase III Study of Afatinib or Cisplatin Plus Pemetrexed in Patients With Metastatic Lung Adenocarcinoma With EGFR Mutations. J Clin Oncol 2023;41:2869-76. [Crossref] [PubMed]
- Castellanos E, Feld E, Horn L. Driven by Mutations: The Predictive Value of Mutation Subtype in EGFR-Mutated Non-Small Cell Lung Cancer. J Thorac Oncol 2017;12:612-23. [Crossref] [PubMed]
- Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40-55. [Crossref] [PubMed]
- Nguyen HS, Ho DKN, Nguyen NN, et al. Predicting EGFR Mutation Status in Non-Small Cell Lung Cancer Using Artificial Intelligence: A Systematic Review and Meta-Analysis. Acad Radiol 2024;31:660-83. [Crossref] [PubMed]
- Kim L, Tsao MS. Tumour tissue sampling for lung cancer management in the era of personalised therapy: what is good enough for molecular testing? Eur Respir J 2014;44:1011-22. [Crossref] [PubMed]
- Wang S, Shi J, Ye Z, et al. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning. Eur Respir J 2019;53:1800986. [Crossref] [PubMed]
- Mu W, Jiang L, Zhang J, et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat Commun 2020;11:5228. [Crossref] [PubMed]
- Hofmanninger J, Prayer F, Pan J, et al. Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem. Eur Radiol Exp 2020;4:50. [Crossref] [PubMed]
- Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2015:234-41. Available online: https://doi.org/10.1007/978-3-319-24574-4_28
- Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011;12:2825-30.
- Chlap P, Finnegan RN. PlatiPy: Processing Library and Analysis Toolkit for Medical Imaging in Python. J Open Source Softw 2023;18:5374.
- Ruder S. An overview of gradient descent optimization algorithms. arXiv. 2016:1609.04747.
- Smith LN, Topin N. Super-convergence: Very fast training of neural networks using large learning rates. In: Pham T. Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications. SPIE 2019;11006:369-86.
- Wang S, Yu H, Gan Y, et al. Mining whole-lung information by artificial intelligence for predicting EGFR genotype and targeted therapy response in lung cancer: a multicohort study. Lancet Digit Health 2022;4:e309-19. [Crossref] [PubMed]
- Chen J, Chen A, Yang S, et al. Accuracy of machine learning in preoperative identification of genetic mutation status in lung cancer: A systematic review and meta-analysis. Radiother Oncol 2024;196:110325. [Crossref] [PubMed]
- Gong J, Wang T, Wang Z, et al. Enhancing brain metastasis prediction in non-small cell lung cancer: a deep learning-based segmentation and CT radiomics-based ensemble learning model. Cancer Imaging 2024;24:1. [Crossref] [PubMed]
- Lou N, Cui X, Lin X, et al. Development and validation of a deep learning-based model to predict response and survival of T790M mutant non-small cell lung cancer patients in early clinical phase trials using electronic medical record and pharmacokinetic data. Transl Lung Cancer Res 2024;13:706-20. [Crossref] [PubMed]
- Zheng Q, Huang Y, Zhao H, et al. EGFR mutation genotypes affect efficacy and resistance mechanisms of osimertinib in T790M-positive NSCLC patients. Transl Lung Cancer Res 2020;9:471-83. [Crossref] [PubMed]

