Development and validation of a PET/CT radiomics and dual-task learning model for the prediction of pathological subtypes and EGFR mutation in non-small cell lung cancer
Highlight box
Key findings
• A novel Dual-Modal Dual-Task Prediction (DMDP) framework was developed, integrating 3D positron emission tomography/computed tomography (PET/CT) imaging with radiomic features through a multi-scale cross-attention mechanism to enable the simultaneous prediction of non-small cell lung cancer (NSCLC) pathological subtypes and epidermal growth factor receptor (EGFR) mutation status.
• The DMDP model demonstrated superior predictive performance, yielding an area under the receiver operating characteristic curve (AUC) of 0.93 (0.88 accuracy) for EGFR mutation status and 0.88 for pathological subtype classification.
• The integration of multi-modal heterogeneity features significantly enhanced specificity to 88%, effectively addressing the sensitivity-specificity imbalance commonly observed in single-modality models.
What is known and what is new?
• While traditional tissue biopsy is invasive, current single-modality imaging predictive models often suffer from limited diagnostic accuracy due to insufficient feature representation.
• This study establishes a transparent, interpretable multi-modal fusion framework. Grad-CAM visualization validated that the model precisely localizes tumor margins and hypermetabolic regions, revealing that internal tumor characteristics are more predictive of gene mutations, whereas peritumoral interstitial information is critical for pathological subtype classification.
What is the implication, and what should change now?
• This integrated framework provides a non-invasive, rapid assessment tool that can be embedded into routine clinical workflows to guide personalized therapeutic decision-making without the immediate necessity for invasive biopsies.
• Future clinical protocols for lung cancer diagnosis should incorporate multi-modal artificial intelligence tools to leverage the complementary value of PET/CT imaging in molecular subtyping.
Introduction
Accounting for roughly 85% of lung cancer cases, non-small cell lung cancer (NSCLC) remains a leading cause of global cancer deaths (1). The World Health Organization (WHO) classifies NSCLC into distinct subtypes, mainly adenocarcinoma and squamous cell carcinoma (2). The epidermal growth factor receptor (EGFR) mutation rate in adenocarcinoma can be as high as 50–60% in Asian populations, while it is less than 5% in squamous cell carcinoma (3). This highlights a deep correlation between molecular mechanisms and pathological subtypes (4,5). The latest National Comprehensive Cancer Network guidelines (Version 4.2026) recommend broad molecular profiling, including EGFR testing, to guide targeted therapies in advanced NSCLC (6). The limitations of traditional invasive biopsy in dynamic monitoring have driven the development of imaging-based predictive research (7,8).
Early studies utilized radiomic features combined with machine learning (ML) models to achieve EGFR mutation prediction (9). Le et al. (10) achieved an accuracy of approximately 80% using the XGBoost classifier. To improve performance, research has shifted towards integrating positron emission tomography (PET) metabolic parameters [e.g., metabolic tumor volume (MTV), total lesion glycolysis (TLG)] and clinical information to reveal the biological characteristics of lesions (11,12). Although the predictive capability of metabolic parameters remains controversial, dual-modal fusion has consistently been shown to outperform single-modality features (13). Recently, deep learning (DL) methods such as convolutional neural networks (CNN) and densely connected convolutional networks (DenseNet) have demonstrated superior performance through end-to-end learning (14,15). However, DL models’ “black-box” nature conflicts with clinical interpretability. Despite attempts using methods like gradient-weighted class activation mapping (Grad-CAM) (16), this field still requires dual-modal fusion solutions that strike a balance between high performance and interpretability. Research has found significant differences between adenocarcinoma and squamous cell carcinoma in clinical, epidemiological, and imaging characteristics (17), providing a basis for non-invasive classification. ML models like support vector machine (SVM) and Random Forest (RF), using multi-modal features, excel at subtype differentiation, surpassing clinical-only models (18).
Simultaneously, DL has achieved breakthroughs in pathological image analysis; for instance, VGG16 achieved an accuracy of 0.84, while advanced Transformers like PKMT-Net boosted the prediction area under the curve (AUC) to 0.92 (19,20). Furthermore, CNN-vision transformer (ViT) fusion models have achieved up to 98% accuracy in pathological classification (21). Despite these advances, the joint application of positron emission tomography/computed tomography (PET/CT) and clinical information remains underutilized for comprehensive diagnosis (22). Previous studies on NSCLC characterization have largely relied on single-modality or single task models, which often struggle to detect subtle genetic phenotypes and fully capture tumor heterogeneity. Integrating computed tomography (CT)-derived anatomical information with PET-derived functional and metabolic features in a dual-modal, dual-task framework offers a more comprehensive representation of tumor biology.
Although CT remains the standard first-line imaging modality, in real-world clinical practice, patients with high-risk pulmonary nodules or suspected lung cancer often undergo FDG PET/CT for staging and treatment decision-making (23,24). Because PET/CT inherently includes CT acquisition, and many patients have previously undergone diagnostic chest CT, PET and CT data are naturally co-available in this selected population (25). However, DL models that jointly fuse PET/CT and clinical data for simultaneous EGFR mutation prediction and subtype classification remain scarce, and existing approaches lack sufficient interpretability.
Based on these observations, this study proposes a dual-modal predictive framework that fuses PET/CT imaging features with clinical information. This framework combines ensemble learning with DL, aiming to construct a radiomics tool that balances both performance and interpretability. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/rc).
Methods
Patient cohorts
Patients were consecutively enrolled from three centers between 2007 and 2021. Multi-center data comprised: (I) The Cancer Imaging Archive (TCIA) NSCLC Radiogenomics (26) (n=211; CT, PET, and EGFR data); (II) TCIA Lung-PET-CT-Dx (27) (n=355; 146 paired PET/CT cases); and (III) a clinical cohort from Guangdong Provincial People’s Hospital (n=95; PET/CT and clinical records). For the EGFR mutation prediction task, only patients with available EGFR mutation status were included. EGFR mutation status in the TCIA NSCLC Radiogenomics dataset was obtained from publicly released clinical/genomic annotations, with EGFR exons 18–21 tested by multiplex PCR followed by the SNaPshot single-base extension assay. In the Guangdong Provincial People’s Hospital cohort, EGFR mutation status was extracted from institutional clinical records and molecular pathology reports, with molecular testing performed using targeted probe-capture-based next-generation sequencing on the Illumina platform. These molecular testing results were used as the reference standard for EGFR mutation status. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study, including the cohort from Guangdong Provincial People’s Hospital, was approved by the Medical Ethics Committee of Shenzhen University Medical School (No. PN-202600012), and the requirement for individual informed consent was waived for this retrospective analysis.
This was a retrospective prediction model study, and the study size was determined by all eligible patients with available PET/CT images, clinical information, and reference-standard labels during the study period. The final study population and the detailed screening process, based on the inclusion and exclusion criteria, are illustrated in the flow chart (Figure 1).
Inclusion and exclusion criteria
Consecutive patients were included if they met the following criteria: (I) underwent whole-body or chest 18F-FDG PET/CT scanning prior to diagnosis; (II) had postoperative pathological confirmation of NSCLC with detailed pathological information available; (III) had received no antineoplastic therapy or surgical resection before the PET/CT examination.
Exclusion criteria comprised: (I) incomplete clinical data or loss to follow-up; (II) a history of other malignant tumors; (III) multicentric lung cancer or poorly defined tumor margins; (IV) misalignment between PET and CT images; (V) insufficient number of CT or PET slices or poor image quality; and (VI) abnormally low FDG uptake in the tumor tissue.
Imaging preprocessing
ML dataset preprocessing included converting raw DICOM data into NIfTI format and performing standardized uptake value (SUV) (28) correction on PET images to minimize device and physiological variations. Tumor region of interest (ROI) extraction followed a standardized protocol: existing masks were applied to annotated data, and CT images underwent automatic segmentation via nnU-Net (29). Remaining PET images were manually delineated using 3D Slicer (30) by one of the authors.
The DL dataset was randomly partitioned into training and validation sets (6:4 ratio) using stratified sampling based on EGFR mutation and pathological subtype labels. ROI preprocessing involved 3D cropping around the tumor center to generate uniform volumes (48 mm × 96 mm × 96 mm). Missing values in the high-dimensional radiomic features were filled by median imputation, and the features were then Z-score normalized to ensure training stability.
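The cropping and feature-cleaning steps described above can be sketched as follows. This is a minimal illustration, not the study code: the function names are hypothetical, the crop size is given in voxels (assuming 1 voxel = 1 mm after resampling, which the text does not state), the tumor center is assumed to lie inside the volume, and fitting the imputation and scaling statistics on the training split only is an assumption.

```python
import numpy as np

def crop_around_center(volume, center, size=(48, 96, 96)):
    """Crop a fixed-size block centred on `center`, zero-padding at the borders.

    Assumes `center` lies inside `volume` (hypothetical helper, sizes in voxels).
    """
    out = np.zeros(size, dtype=volume.dtype)
    src, dst = [], []
    for c, s, dim in zip(center, size, volume.shape):
        start = c - s // 2                       # may be negative near a border
        lo, hi = max(start, 0), min(start + s, dim)
        src.append(slice(lo, hi))                # region read from the volume
        dst.append(slice(lo - start, hi - start))  # where it lands in the patch
    out[tuple(dst)] = volume[tuple(src)]
    return out

def impute_and_standardize(train, other):
    """Median-impute NaNs, then z-score both arrays with training statistics."""
    med = np.nanmedian(train, axis=0)
    train = np.where(np.isnan(train), med, train)
    other = np.where(np.isnan(other), med, other)
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd[sd == 0] = 1.0                            # guard against constant features
    return (train - mu) / sd, (other - mu) / sd
```

Zero-padding keeps every patch the same shape even when the lesion sits close to the edge of the scan, which is what a fixed-size network input requires.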
Feature selection
Feature selection was performed separately for the EGFR mutation prediction and pathological subtype classification tasks. For PET features, Recursive Feature Elimination with Cross-Validation (RFECV) (26) was applied to identify the optimal feature subset. The initial 20 PET features were reduced to 6 features for EGFR mutation prediction and 5 features for pathological subtype classification. For CT radiomic features, a two-step selection strategy was adopted. First, 130 candidate CT features were screened using the Minimum Redundancy Maximum Relevance (mRMR) algorithm, and the top 20 features were retained for each region-specific feature set. Subsequently, least absolute shrinkage and selection operator (LASSO) regression was used for further dimensionality reduction, and features with non-zero coefficients were retained for model construction.
For EGFR mutation prediction, the final CT_reg1, CT_reg2, CT_reg5, CT_reg6, and CT_reg_all feature sets contained 8, 2, 5, 6, and 7 features, respectively. The PET-clinical and PET-CT-clinical feature sets contained 9 and 19 features, respectively. For pathological subtype classification, the corresponding CT_reg1, CT_reg2, CT_reg5, CT_reg6, and CT_reg_all feature sets contained 5, 7, 5, 6, and 9 features, respectively. The PET-clinical and PET-CT-clinical feature sets contained 8 and 13 features, respectively (Tables S1,S2). The de-identified extracted features after feature selection have been deposited in a publicly accessible GitHub repository (https://github.com/jiang-0311/DMDP).
All selected features were standardized before model training to reduce inter-feature variability. Potential dataset-related batch effects were explored using UMAP visualization, and no additional harmonization procedure, such as ComBat, was applied because no obvious dataset-specific clustering was observed in either cohort (Figures S1,S2).
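As a rough illustration of the selection steps in this section, the sketch below applies RFECV and LASSO with scikit-learn. Everything here is an assumption for demonstration: the data are synthetic, the logistic-regression estimator inside RFECV is a hypothetical choice (the text does not name one), and the mRMR pre-screening step is omitted because scikit-learn does not provide an mRMR selector.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 20-feature table (not study data).
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)
X = StandardScaler().fit_transform(X)

# PET-style selection: recursive feature elimination with cross-validation.
rfecv = RFECV(estimator=LogisticRegression(max_iter=1000),
              step=1, cv=StratifiedKFold(5), scoring="roc_auc")
rfecv.fit(X, y)
pet_selected = np.flatnonzero(rfecv.support_)

# CT-style selection: LASSO, keeping features with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
ct_selected = np.flatnonzero(lasso.coef_ != 0)

print("RFECV kept:", pet_selected)
print("LASSO kept:", ct_selected)
```

In a real pipeline the scaler and both selectors would be fitted on the training cohort only, so that no information from the test cohort leaks into feature choice.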
ML model development
The predictive models for EGFR mutation status and pathological subtype classification were developed following a standardized radiomics-based ML pipeline (Figure 2). The final selected PET, CT, PET-clinical, and PET-CT-clinical feature sets were used for model construction, with detailed feature compositions provided in Tables S1,S2. The dataset was divided into training and independent test cohorts at a ratio of 8:2 using stratified random sampling. A two-layer stacked ensemble framework was then applied, with Logistic Regression (LR), RF, and Extremely Randomized Trees (ERT) used as first-layer base learners and LR used as the second-layer meta-learner. Hyperparameters were optimized using grid search within the training cohort, and 5-fold cross-validation was performed to evaluate model stability. The final models were assessed on the independent test cohort to evaluate generalizability.
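The two-layer stacking design described above maps naturally onto scikit-learn's `StackingClassifier`. The following is a minimal sketch under stated assumptions: the data are synthetic, and the tree counts and hyperparameter grid are placeholders, since the text does not report the searched values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a selected feature set (not study data).
X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           random_state=0)
# 8:2 stratified split into training and independent test cohorts.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ert", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # second-layer meta-learner
    cv=5,  # out-of-fold base predictions feed the meta-learner
)

# Placeholder grid; searched only within the training cohort via 5-fold CV.
grid = {"rf__max_depth": [3, None], "final_estimator__C": [0.1, 1.0]}
search = GridSearchCV(stack, grid, cv=5, scoring="roc_auc")
search.fit(X_tr, y_tr)

test_auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Because the meta-learner is trained on out-of-fold predictions, the base learners never score the samples they were fitted on, which limits optimistic bias in the second layer.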
DL model construction
To optimize multimodal information fusion and multi-level feature learning, we developed the Dual-Modal Dual-Task Prediction (DMDP) model (Figure 3), which employs an end-to-end learning framework to extract high-dimensional latent representations while jointly optimizing the fusion of PET/CT data and collaborative prediction across the two diagnostic tasks. The model is structured into five core modules:
Dual-modal encoder: This module extracted tumor-related features from 3D CT and PET images using 3 × 3 convolutional kernels, batch normalization (31), and rectified linear unit (ReLU) (32) activation.
Multi-scale cross attention: Pixel-level information exchange between modalities was facilitated via a Transformer-based lightweight cross-modal attention module (33).
Text encoder: Manually extracted radiomic and clinical features were encoded using fully connected layers, instance normalization, and ReLU activation.
Contrastive learning loss: Features were aligned by mapping imaging and radiomic data into a unified low-dimensional space to minimize inter-modal distances.
Dual-task prediction heads: Parallel prediction for EGFR mutation and pathological subtype was performed using fully connected layers and a joint loss function incorporating cross-entropy and contrastive loss.
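How the cross-attention and joint-loss modules listed above fit together can be sketched numerically. This is not the DMDP implementation: it uses NumPy rather than PyTorch to stay framework-independent, a single attention head without learned projections, a cosine-distance term as a stand-in for the contrastive loss, and a hypothetical weight `lam`, none of which are specified in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens):
    """Scaled dot-product attention where one modality queries the other."""
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)
    return softmax(scores) @ context_tokens

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

def alignment_loss(img_emb, rad_emb):
    """Mean (1 - cosine similarity) between paired image/radiomic embeddings."""
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = rad_emb / np.linalg.norm(rad_emb, axis=1, keepdims=True)
    return np.mean(1.0 - np.sum(a * b, axis=1))

def joint_loss(egfr_logits, egfr_y, sub_logits, sub_y, img_emb, rad_emb, lam=0.5):
    """Cross-entropy for both tasks plus a weighted alignment term (lam assumed)."""
    return (cross_entropy(egfr_logits, egfr_y)
            + cross_entropy(sub_logits, sub_y)
            + lam * alignment_loss(img_emb, rad_emb))

rng = np.random.default_rng(0)
ct_tokens = rng.normal(size=(4, 8))    # 4 CT tokens, 8 channels (toy sizes)
pet_tokens = rng.normal(size=(6, 8))   # 6 PET tokens
ct_attended = cross_attention(ct_tokens, pet_tokens)  # CT enriched with PET context
print(ct_attended.shape)
```

The alignment term is smallest when the image and radiomic embeddings of the same patient point in the same direction, which is the intuition behind pulling the two representations into a shared space.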
Model evaluation and interpretability
Model performance was evaluated using AUC, accuracy, F1-score, sensitivity, and specificity. Ablation experiments were conducted to quantify the contribution of the cross-modal alignment loss and feature screening modules. Grad-CAM was employed for the visualization of regions of interest, and training stability was assessed via loss curve analysis. All models were implemented using the PyTorch (version 1.10.2) framework.
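For reference, the five reported metrics can be computed from binary labels and predicted probabilities as in the sketch below. The function name and the 0.5 threshold are illustrative assumptions, and the helper expects both classes to be present.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """AUC, accuracy, F1, sensitivity and specificity for binary labels (0/1)."""
    y_pred = [int(p >= threshold) for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    # AUC as the probability that a random positive case scores higher than
    # a random negative case, with ties counted as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
    return {
        "auc": auc,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                   [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7])
print(m)
```

Unlike the other four metrics, AUC does not depend on the decision threshold, which is why a model can show a high AUC alongside unbalanced sensitivity and specificity.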
Statistical analysis
Statistical analyses were performed using R (version 4.2.1) and Python (version 3.9). A two-sided P<0.05 was considered statistically significant. Continuous variables were evaluated for normality via the Shapiro-Wilk test. Normally distributed data were expressed as mean ± standard deviation and compared using the independent samples t-test; otherwise, data were expressed as median [interquartile range] and compared using the Mann-Whitney U test. Categorical variables were expressed as frequencies (%) and compared using the Chi-square or Fisher’s exact test. These methods were applied to evaluate the discriminative value of clinical factors, metabolic indices (SUVmax, SUVmean, TLG), and heterogeneity parameters (SUVskew, SUVkurtosis, Het) across cohorts.
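The test-selection logic described above can be sketched with SciPy. The helper names are hypothetical, and the rule for switching to Fisher's exact test (a 2 × 2 table with any expected count below 5) is a common convention assumed here, not one stated in the text. The categorical example uses the gender counts from Table 1.

```python
import numpy as np
from scipy import stats

def compare_continuous(a, b, alpha=0.05):
    """Shapiro-Wilk on each group, then t-test or Mann-Whitney U."""
    normal = stats.shapiro(a)[1] > alpha and stats.shapiro(b)[1] > alpha
    if normal:
        return "t-test", stats.ttest_ind(a, b)[1]
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided")[1]

def compare_categorical(table):
    """Chi-square test, or Fisher's exact test for sparse 2x2 tables (assumed rule)."""
    table = np.asarray(table)
    expected = stats.contingency.expected_freq(table)
    if table.shape == (2, 2) and (expected < 5).any():
        return "Fisher", stats.fisher_exact(table)[1]
    return "Chi-square", stats.chi2_contingency(table)[1]

# Gender by cohort, rows = male/female, columns = GD, NR, LPCD (Table 1).
gender = [[28, 62, 51], [47, 124, 72]]
name, p = compare_categorical(gender)
print(name, round(p, 2))
```

Run on the Table 1 gender counts, the Chi-square branch is selected and the resulting P value is close to the 0.35 reported in the table.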
Results
Patient cohorts and clinical characteristics
After applying the inclusion and exclusion criteria, 228 patients were included in the EGFR mutation prediction cohort and 365 in the pathological subtype classification cohort (Figure 1). Demographic characteristics of the study population are summarized in Table 1.
Table 1
| Characteristics | GD-Data (n=75) | NR-Data (n=186) | LPCD-Data (n=123) | P value |
|---|---|---|---|---|
| Age (years) | 63 [54, 72] | 69 [63, 75] | 62 [56, 69] | <0.001 |
| Weight (kg) | – | 78.2 [62.8, 90.1] | 66.0 [58.0, 76.2] | <0.001 |
| Gender | | | | 0.35 |
| Male | 28 (37.3) | 62 (33.3) | 51 (41.5) | |
| Female | 47 (62.7) | 124 (66.7) | 72 (58.5) | |
| Smoking status | | | | <0.001 |
| Never | 22 (29.3) | 41 (22.0) | 53 (43.1) | |
| Smoked | 53 (70.7) | 145 (78.0) | 70 (56.9) | |
| EGFR mutation status | | | | <0.001 |
| EGFR− | 45 (60.0) | 117 (62.9) | – | |
| EGFR+ | 30 (40.0) | 36 (19.4) | – | |
| Pathological type | | | | <0.001 |
| Squamous cell carcinoma | 12 (16.0) | 29 (15.6) | 32 (26.0) | |
| Adenocarcinoma | 48 (64.0) | 153 (82.3) | 91 (74.0) | |
Data are presented as median [IQR] or n (%). EGFR, epidermal growth factor receptor; GD-Data, Guangdong Provincial People’s Hospital Clinical Dataset; IQR, interquartile range; LPCD-Data, The TCIA Lung-PET-CT-Dx dataset; NR-Data, The TCIA NSCLC Radiogenomics dataset; TCIA, The Cancer Imaging Archive.
EGFR mutation prediction dataset
The EGFR mutation prediction cohort (n=228) comprised 66 (28.9%) EGFR-mutant and 162 (71.1%) wild-type cases. Significant differences existed in sex (higher female proportion in the mutant group; P<0.001), weight, and age (all P<0.05). Regarding PET, heterogeneity metrics (SUVskew, SUVkurtosis, Het) and tri-axial parameters (type_xy, type_xz, type_yz) varied significantly (all P<0.05), associating EGFR mutation with metabolic and spatial heterogeneity. Conversely, American Joint Committee on Cancer (AJCC) stage, SUVmax, and SUVmean showed no significant differences (P>0.3). Thus, EGFR mutation status was associated primarily with clinical features and PET heterogeneity rather than with tumor stage or baseline metabolism (Table S3).
Pathological subtype classification dataset
The cohort (n=365) comprised 292 (80.0%) adenocarcinomas and 73 (20.0%) squamous cell carcinomas. Only sex distribution varied significantly between subtypes (higher male proportion in squamous cell carcinoma; P<0.001); age, weight, and AJCC stage showed no differences. Squamous cell carcinoma exhibited higher metabolic activity, with significant differences in SUVmax, SUVmean, and TLG_3. Heterogeneity metrics, including Het (P=0.02), tri-axial SUV_mean_sd (all P<0.001), and type_yz (P=0.01), also varied significantly. SUVskew, SUVkurtosis, and MTV_3 showed no significant variations (Table S4).
Results of ML analysis
Model performance and discriminative ability
For EGFR mutation prediction, performance across the feature sets is summarized in Table 2. The multi-modal integration of PET, CT, and clinical data (PET_CT_clinical) demonstrated the highest diagnostic efficacy among all feature sets. Within the CT-only category, the set incorporating all sub-regional features (CT_reg_all) achieved the best performance. Specifically, the Stacking ensemble model yielded a peak AUC of 0.90 (95% CI: 0.81–0.96) using the PET_CT_clinical set, exceeding the AUCs of all other feature sets (0.62–0.82) and indicating strong discriminatory capability. The corresponding receiver operating characteristic (ROC) curves for the top-performing feature sets (CT_reg5, CT_reg_all, PET_clinical, and PET_CT_clinical) are illustrated in Figure 4A-4D.
Table 2
| Feature set | Best model | AUC (95% CI) | Acc (95% CI) | F1 (95% CI) | Sen (95% CI) | Spe (95% CI) |
|---|---|---|---|---|---|---|
| CT_reg1 | Logistic | 0.72 (0.57–0.86) | 0.59 (0.49–0.74) | 0.43 (0.33–0.61) | 0.31 (0.25–0.50) | 0.85 (0.78–0.93) |
| CT_reg2 | Stacking | 0.62 (0.52–0.79) | 0.62 (0.56–0.75) | 0.61 (0.46–0.73) | 0.62 (0.48–0.74) | 0.61 (0.48–0.82) |
| CT_reg5 | Stacking | 0.77 (0.53–0.88) | 0.67 (0.46–0.79) | 0.62 (0.46–0.70) | 0.54 (0.45–0.73) | 0.79 (0.67–0.90) |
| CT_reg6 | Stacking | 0.74 (0.61–0.88) | 0.76 (0.69–0.90) | 0.71 (0.62–0.86) | 0.69 (0.51–0.84) | 0.70 (0.65–0.89) |
| CT_reg_all | Stacking | 0.82 (0.63–0.87) | 0.76 (0.67–0.85) | 0.70 (0.61–0.80) | 0.92 (0.67–1.00) | 0.58 (0.43–0.68) |
| PET | Stacking | 0.72 (0.55–0.90) | 0.59 (0.48–0.67) | 0.44 (0.35–0.52) | 0.46 (0.32–0.58) | 0.73 (0.56–0.86) |
| PET_clinical | Logistic | 0.78 (0.60–0.93) | 0.68 (0.57–0.80) | 0.64 (0.58–0.77) | 0.62 (0.56–0.75) | 0.79 (0.68–0.87) |
| PET_CT_clinical | Stacking | 0.90 (0.81–0.96) | 0.83 (0.72–0.92) | 0.78 (0.71–0.95) | 0.85 (0.75–0.91) | 0.79 (0.72–0.86) |
Acc, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; CT, computed tomography; PET, positron emission tomography; Sen, sensitivity; Spe, specificity.
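The tables report 95% CIs for each metric, but the text does not state how they were derived. One common option is the percentile bootstrap, sketched below purely as an assumption; the function names, the number of resamples, and the toy labels are illustrative.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute metric."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # indices drawn with replacement
        values.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), lo, hi

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Toy example: 60 cases, 6 of them misclassified.
y_true = np.array([1] * 30 + [0] * 30)
y_pred = y_true.copy()
y_pred[:3] = 0
y_pred[30:33] = 1
point, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
print(point, lo, hi)
```

With small test cohorts the resulting intervals are wide, which is consistent with the broad CIs seen for several feature sets in Table 2.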
For pathological subtype classification (adenocarcinoma versus squamous cell carcinoma), internal validation (Table S5) identified the PET_clinical feature set as the optimal feature set, while the CT_reg6 model outperformed other CT-based configurations (Figure 4E-4H). The RF model achieved the best performance using the PET_clinical set, with an AUC of 0.87 (95% CI: 0.78–0.95). Notably, models utilizing PET-derived features alone exhibited performance comparable to combined feature sets, underscoring the dominant role of metabolic imaging in subtype differentiation. When evaluated on the independent external validation set (GD-Data), the lead model achieved an AUC of 0.74 (95% CI: 0.63–0.82) (Table 3), lower than in internal validation but indicating that discriminative ability was partly retained across clinical centers.
Table 3
| Model | Internal validation: AUC (95% CI) | Internal validation: ACC (95% CI) | Internal validation: F1 (95% CI) | External validation: AUC (95% CI) | External validation: ACC (95% CI) | External validation: F1 (95% CI) |
|---|---|---|---|---|---|---|
| RF | 0.87 (0.76–0.96) | 0.85 (0.75–0.93) | 0.91 (0.85–0.96) | 0.74 (0.63–0.82) | 0.75 (0.68–0.82) | 0.77 (0.70–0.77) |
| EXT | 0.80 (0.64–0.92) | 0.75 (0.64–0.85) | 0.84 (0.75–0.91) | 0.71 (0.54–0.84) | 0.74 (0.67–0.81) | 0.60 (0.52–0.67) |
| LR | 0.78 (0.60–0.92) | 0.69 (0.56–0.80) | 0.78 (0.67–0.87) | 0.69 (0.52–0.81) | 0.63 (0.52–0.74) | 0.63 (0.58–0.71) |
| Stacking | 0.84 (0.69–0.95) | 0.77 (0.67–0.87) | 0.84 (0.75–0.92) | 0.72 (0.57–0.86) | 0.72 (0.64–0.79) | 0.62 (0.54–0.70) |
ACC, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; EXT, extremely randomized trees; LR, logistic regression; RF, random forest.
Model calibration and probability distribution
Model reliability and discriminative robustness were further evaluated using calibration curves and probability distribution plots (Figure 5). For EGFR mutation prediction based on the PET-CT-clinical feature set (Figure 5A-5D), the Stacking model (Figure 5A) demonstrated superior performance, with its calibration curve most closely approximating the ideal diagonal line (Brier score = 0.12). This indicates a high degree of consistency between the predicted probabilities and actual mutation risks. Correspondingly, the violin plots revealed a distinct bimodal separation in predicted probabilities between the mutation and wild-type groups, further validating the robustness of the Stacking approach.
Regarding pathological subtype classification (Figure 5E-5H), models constructed using the PET-clinical feature set were compared. The RF model (Figure 5E) exhibited the optimal calibration, showing the closest alignment with the ideal diagonal and the most pronounced separation in probability distributions. These findings suggest that the RF model possesses the highest discriminative performance and generalization capability for subtype differentiation, followed by the Stacking model (Figure 5F).
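The two calibration quantities used in this subsection, the Brier score and the binned calibration curve, can be computed as in the following sketch. The function names, the equal-width binning, and the toy probabilities are illustrative assumptions rather than the study's settings.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    return float(np.mean((y_prob - y_true) ** 2))

def calibration_curve(y_true, y_prob, n_bins=5):
    """Mean predicted probability and observed event rate per non-empty bin."""
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    pred, obs = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            pred.append(y_prob[mask].mean())
            obs.append(y_true[mask].mean())
    return np.array(pred), np.array(obs)

score = brier_score([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])
pred, obs = calibration_curve([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], n_bins=2)
print(score, pred, obs)
```

A perfectly calibrated model would place every (predicted, observed) pair on the diagonal, and a lower Brier score indicates predicted probabilities closer to the observed outcomes.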
Results of DL approach
Loss curve analysis and convergence assessment
To ensure robust model training, we evaluated the impact of different loss functions and feature sets on convergence (Figure S3). A comparative analysis revealed that incorporating the cross-modal contrastive learning loss substantially stabilized the total loss (Figure S3B,S3C). Additionally, the change from Figure S3C to Figure S3D demonstrates that radiomic feature screening further facilitated model convergence, leading to a more efficient training process.
Model evaluation
Ablation results are summarized in Table 4. After integrating radiomic features (Group 1 vs. Group 4), the specificity for mutation prediction and subtype classification increased by 9 and 13 percentage points, respectively. Without the dual-modal alignment loss (Group 2), sensitivity and specificity for subtype classification were 0.92 and 0.25; with this loss, these metrics were 0.85 and 0.88. Feature screening (Group 3 vs. Group 4) increased mutation prediction sensitivity by 16 percentage points and subtype classification specificity by 13 percentage points.
Table 4
| Group | Module combination | Task | Acc | F1 | Sen | Spe |
|---|---|---|---|---|---|---|
| 1 | DMDP (without heterogeneous radiomic features) | Mutant prediction | 0.79 | 0.75 | 0.71 | 0.79 |
| | | Subtype classification | 0.81 | 0.78 | 0.82 | 0.75 |
| 2 | DMDP (without Lcl loss) | Mutant prediction | 0.62 | 0.64 | 0.71 | 0.58 |
| | | Subtype classification | 0.81 | 0.39 | 0.92 | 0.25 |
| 3 | DMDP (heterogeneous radiomic features without selection) | Mutant prediction | 0.79 | 0.75 | 0.70 | 0.83 |
| | | Subtype classification | 0.79 | 0.77 | 0.80 | 0.75 |
| 4 | DMDP | Mutant prediction | 0.88 | 0.87 | 0.86 | 0.88 |
| | | Subtype classification | 0.85 | 0.86 | 0.85 | 0.88 |
Acc, accuracy; DMDP, Dual-Modal Dual-task Prediction; Lcl loss, contrastive loss; Sen, sensitivity; Spe, specificity.
Performance comparison of the DMDP model and alternative models
To validate the performance of the DMDP model in mutation prediction and subtype classification, this study compared it with the optimal ML models (Stacking and RF) (see Table 5 for details). Results demonstrate that DMDP achieved the best overall and more balanced performance across both tasks. In mutation prediction, it increased overall accuracy by 8 percentage points (0.80 to 0.88) and specificity by 9 percentage points (0.79 to 0.88). For subtype classification, specificity improved by 55 percentage points (0.33 to 0.88), with lower sensitivity (0.98 to 0.85), while AUC rose from 0.87 to 0.88. This indicates that multimodal data inputs and the DMDP architecture are better suited for predicting genetic mutations and pathological subtypes.
Table 5
| Feature set | Best model | Task | AUC (95% CI) | Acc (95% CI) | F1 (95% CI) | Sen (95% CI) | Spe (95% CI) |
|---|---|---|---|---|---|---|---|
| PET_CT_clinical | Stacking | Mutant prediction | 0.90 (0.81–0.96) | 0.80 (0.74–0.86) | 0.71 (0.64–0.78) | 0.85 (0.79–0.91) | 0.79 (0.72–0.86) |
| PET_clinical | RF | Subtype classification | 0.87 (0.78–0.95) | 0.85 (0.79–0.91) | 0.91 (0.86–0.96) | 0.98 (0.95–1.00) | 0.33 (0.24–0.42) |
| PET_CT_clinical | DMDP | Mutant prediction | 0.93 (0.81–1.00) | 0.88 (0.83–0.98) | 0.87 (0.69–0.97) | 0.86 (0.74–0.95) | 0.88 (0.75–0.94) |
| PET_CT_clinical | DMDP | Subtype classification | 0.88 (0.73–0.98) | 0.85 (0.75–0.94) | 0.86 (0.83–0.97) | 0.85 (0.73–0.95) | 0.88 (0.77–0.96) |
Acc, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; CT, computed tomography; DMDP, Dual-Modal Dual-task Prediction; PET, positron emission tomography; RF, random forest; Sen, sensitivity; Spe, specificity.
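The differences between DMDP and the best ML model for each task can be recomputed directly from the point estimates in Table 5. The short sketch below does this arithmetic; the dictionary layout and function name are illustrative, and only values printed in the table are used.

```python
# Point estimates copied from Table 5 (best ML model vs. DMDP per task).
table5 = {
    "mutation": {
        "baseline": {"auc": 0.90, "acc": 0.80, "sen": 0.85, "spe": 0.79},  # Stacking
        "dmdp":     {"auc": 0.93, "acc": 0.88, "sen": 0.86, "spe": 0.88},
    },
    "subtype": {
        "baseline": {"auc": 0.87, "acc": 0.85, "sen": 0.98, "spe": 0.33},  # RF
        "dmdp":     {"auc": 0.88, "acc": 0.85, "sen": 0.85, "spe": 0.88},
    },
}

def deltas(task):
    """Absolute change (DMDP minus baseline) for each metric, rounded to 2 dp."""
    base, new = table5[task]["baseline"], table5[task]["dmdp"]
    return {k: round(new[k] - base[k], 2) for k in base}

for task in table5:
    print(task, deltas(task))
```

These are absolute differences on a 0 to 1 scale, so a value of 0.09 corresponds to 9 percentage points rather than a 9% relative change.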
Model interpretability analysis
Grad-CAM visualization elucidated the model’s attentional focus. The original input CT and PET images are shown in Figure S4A,S4B. In the Dual-Modal Encoder (Figure S4C,S4D), the network localized tumor regions within individual modalities. After Multi-Scale Cross Attention integration (Figure S4E-S4H), the small-scale branch targeted lesion margins, while the large-scale branch captured high-density CT and hypermetabolic PET regions, supporting the multi-modal fusion strategy.
Analysis across task-label patterns (Figure 6) revealed systematic attentional shifts. In concordant positive cases, subtype prediction prioritized the CT microenvironment, whereas mutation prediction targeted the tumor core. Label divergence shifted attention toward peri-tumoral tissues. These findings demonstrate that task-label variations modulate dual-task feature learning, reflecting latent inter-task correlations and the synergy of dual-prediction heads.
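For readers unfamiliar with how Grad-CAM heatmaps such as those above are formed, the core computation is sketched below in NumPy. It assumes the activations and gradients of one convolutional layer have already been extracted for a single 3D input; the function name and toy arrays are illustrative and do not reproduce the study's implementation.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from arrays of shape (channels, D, H, W) for one conv layer."""
    # Channel weights: gradients of the class score averaged over all voxels.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the activation maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)          # ReLU keeps only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()           # scale to [0, 1] for display
    return cam

# Toy example: channel 0 supports the class, channel 1 opposes it.
acts = np.zeros((2, 2, 2, 2))
acts[0, 0, 0, 0] = 2.0
acts[1, 1, 1, 1] = 3.0
grads = np.stack([np.ones((2, 2, 2)), -np.ones((2, 2, 2))])
cam = grad_cam(acts, grads)
print(cam[0, 0, 0], cam[1, 1, 1])
```

Because each prediction head has its own class score, computing the gradients separately per head yields task-specific maps, which is what allows the mutation and subtype heatmaps to be compared.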
Discussion
This study demonstrated that the DMDP model, which integrated CT and PET images with a collaborative learning architecture, achieved superior performance in both EGFR mutation prediction and pathological subtype classification compared with baseline models. Specifically, the model achieved a 9-percentage-point increase in specificity for mutation prediction and a 55-percentage-point increase for subtype classification. Ablation experiments confirmed that the cross-modal alignment loss played a central role in maintaining model robustness by correcting the sensitivity-specificity imbalance often observed with single-modality inputs. Grad-CAM visualization analysis revealed that the dual-task prediction heads automatically focused on distinct anatomical regions: gene mutation prediction prioritized internal tumor features, whereas subtype classification relied more on interstitial information in the peritumoral area. This complementary feature extraction pattern may have contributed to the observed performance gains.
Compared to previous studies relying on manual radiomic features, such as the best-performing model reported by Huang et al. (34) (AUC of 0.803), our end-to-end learning framework achieved improved predictive performance in the present dataset. While DL models by Coudray et al. (13) and Wang et al. (35) performed well in single tasks, their reliance on a single modality limited their ability to capture the full spectrum of tumor heterogeneity. The DMDP model addressed the “black box” nature of conventional DL by incorporating a multi-scale cross-attention mechanism, providing an interpretable path for feature fusion. Furthermore, we found that incorporating feature screening strategies from ML into the DL pre-training phase effectively accelerated convergence and enhanced information efficiency. This hybrid strategy addressed the training instability often encountered with pure DL models in small medical imaging datasets.
Despite its strengths, this study has several limitations. First, its retrospective design and limited sample size may affect the stability and generalizability of the model; therefore, further independent, multi-center prospective validation is required before clinical implementation. Second, reliance on paired PET/CT may restrict clinical utility in resource-limited settings. Finally, while Grad-CAM improves transparency, the biological basis of DL features remains partially opaque. Future research should integrate multi-omics data and explore missing-modality learning to enhance the framework’s biological interpretability and clinical flexibility.
Looking beyond the refinement of artificial intelligence algorithms, the broader landscape of NSCLC management will inevitably be shaped by other synergistic technologies. In the last few years, technological developments in the medical field have been rapid and are continuously evolving. One of the most revolutionary breakthroughs was the introduction of the Internet of Things (IoT) concept within medical practice (36). In the context of NSCLC, the IoT can facilitate real-time patient monitoring and seamless data integration, which could eventually complement the predictive power of advanced radiomics. Additionally, the role of 3D printing in oncology is becoming increasingly prominent, offering unprecedented opportunities for personalized surgical planning and targeted interventions in NSCLC (37). Furthermore, therapeutic phases rely on advanced imaging; for instance, 18F-FDG PET-CT-based radiotherapy planning in stage III NSCLC allows for precise target volume delineation and improved outcomes (38). Integrating our framework with these emerging technologies may support a more personalized and data-driven approach to lung cancer management.
Conclusions
This study developed a DL-based framework integrating dual-modal PET/CT radiomics for the simultaneous prediction of EGFR mutations and pathological subtypes in NSCLC. By characterizing complementary anatomical and metabolic imaging information, the model showed potential value for non-invasive molecular and histological prediction and may provide a technical basis for imaging-based risk stratification in selected clinical settings.
Acknowledgments
We sincerely thank all patients whose medical data contributed to this study.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/rc
Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/dss
Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/prf
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study, including the cohort from Guangdong Provincial People’s Hospital, was approved by the Medical Ethics Committee of Shenzhen University Medical School (No. PN-202600012), and the requirement for individual informed consent was waived for this retrospective analysis.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
- Ettinger DS, Wood DE, Aisner DL, et al. Non-Small Cell Lung Cancer, Version 3.2022, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2022;20:497-530.
- Vavala T, Malapelle U, Veggiani C, et al. Molecular profiling of advanced non-small cell lung cancer in the era of immunotherapy approach: a multicenter Italian observational prospective study of biomarker screening in daily clinical practice. J Clin Pathol 2022;75:234-40.
- Yamamoto H, Toyooka S, Mitsudomi T. Impact of EGFR mutation analysis in non-small cell lung cancer. Lung Cancer 2009;63:315-21.
- da Cunha Santos G, Shepherd FA, Tsao MS. EGFR mutations and lung cancer. Annu Rev Pathol 2011;6:49-69.
- Riely GJ, Wood DE, Aisner DL, et al. Non-Small Cell Lung Cancer, Version 4.2026, NCCN Clinical Practice Guidelines In Oncology. J Natl Compr Canc Netw 2026;24:e260017.
- Eze C, Schmidt-Hegemann NS, Sawicki LM, et al. PET/CT imaging for evaluation of multimodal treatment efficacy and toxicity in advanced NSCLC-current state and future directions. Eur J Nucl Med Mol Imaging 2021;48:3975-89.
- Mu W, Jiang L, Zhang J, et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat Commun 2020;11:5228.
- Zhang J, Zhao X, Zhao Y, et al. Value of pre-therapy (18)F-FDG PET/CT radiomics in predicting EGFR mutation status in patients with non-small cell lung cancer. Eur J Nucl Med Mol Imaging 2020;47:1137-46.
- Le NQK, Kha QH, Nguyen VH, et al. Machine Learning-Based Radiomics Signatures for EGFR and KRAS Mutations Prediction in Non-Small-Cell Lung Cancer. Int J Mol Sci 2021;22:9254.
- Moon SH, Kim J, Joung JG, et al. Correlations between metabolic texture features, genetic heterogeneity, and mutation burden in patients with lung cancer. Eur J Nucl Med Mol Imaging 2019;46:446-54.
- Lv Z, Fan J, Xu J, et al. Value of (18)F-FDG PET/CT for predicting EGFR mutations and positive ALK expression in patients with non-small cell lung cancer: a retrospective analysis of 849 Chinese patients. Eur J Nucl Med Mol Imaging 2018;45:735-50.
- Coudray N, Ocampo PS, Sakellaropoulos T, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 2018;24:1559-67.
- Song Z, Liu T, Shi L, et al. The deep learning model combining CT image and clinicopathological information for predicting ALK fusion status and response to ALK-TKI therapy in non-small cell lung cancer patients. Eur J Nucl Med Mol Imaging 2021;48:361-71.
- Hong D, Xu K, Zhang L, et al. Radiomics Signature as a Predictive Factor for EGFR Mutations in Advanced Lung Adenocarcinoma. Front Oncol 2020;10:28.
- Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017. DOI: 10.1109/ICCV.2017.74.
- Wang W, Liu H, Li G. What's the difference between lung adenocarcinoma and lung squamous cell carcinoma? Evidence from a retrospective analysis in a cohort of Chinese patients. Front Endocrinol (Lausanne) 2022;13:947443.
- Zhao H, Su Y, Wang M, et al. The Machine Learning Model for Distinguishing Pathological Subtypes of Non-Small Cell Lung Cancer. Front Oncol 2022;12:875761.
- Han Y, Ma Y, Wu Z, et al. Histologic subtype classification of non-small cell lung cancer using PET/CT images. Eur J Nucl Med Mol Imaging 2021;48:350-60.
- Zhao Z, Guo S, Han L, et al. PKMT-Net: A pathological knowledge-inspired multi-scale transformer network for subtype prediction of lung cancer using histopathological images. Biomed Signal Process Control 2025;106:107742.
- Imran M, Haq B, Elbasi E, et al. Transformer-Based Hierarchical Model for Non-Small Cell Lung Cancer Detection and Classification. IEEE Access 2024;12:145920-33.
- Wang X, Lu Z. Radiomics Analysis of PET and CT Components of 18F-FDG PET/CT Imaging for Prediction of Progression-Free Survival in Advanced High-Grade Serous Ovarian Cancer. Front Oncol 2021;11:638124.
- Callister ME, Baldwin DR, Akram AR, et al. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax 2015;70:ii1-ii54.
- Silvestri GA, Gonzalez AV, Jantz MA, et al. Methods for staging non-small cell lung cancer: Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e211S-50S.
- Boellaard R, Delgado-Bolton R, Oyen WJ, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging 2015;42:328-54.
- Bakr S, Gevaert O, Echegaray S, et al. Data for NSCLC Radiogenomics (Version 4). [Data set]. The Cancer Imaging Archive [cited 2026 May 24]; Available online: https://www.cancerimagingarchive.net/collection/nsclc-radiogenomics/
- Li P, Wang S, Li T, et al. A large-scale CT and PET/CT dataset for lung cancer diagnosis [Homepage on the Internet]. 2020 [cited 2026 May 24]; Available from: https://www.cancerimagingarchive.net/collection/lung-pet-ct-dx/
- Basu S, Zaidi H, Holm S, et al. Quantitative Techniques in PET-CT Imaging. Curr Med Imaging Rev 2011;7:216-33.
- Isensee F, Jaeger PF, Kohl SAA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203-11.
- Fedorov A, Beichel R, Kalpathy-Cramer J, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging 2012;30:1323-41.
- Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning - Volume 37. Lille, France: JMLR.org, 2015:448-56.
- Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning. Madison, WI, USA: Omnipress, 2010:807-14.
- Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017:6000-10.
- Huang L, Xu L, Wang X, et al. Prediction of EGFR Mutations in Lung Adenocarcinoma via CT Images: A Comparative Study of Intratumoral and Peritumoral Radiomics, Deep Learning, and Fusion Models. Acad Radiol 2025;32:4880-92.
- Wang S, Shi J, Ye Z, et al. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning. Eur Respir J 2019;53:1800986.
- Mulita F, Verras GI, Anagnostopoulos CN, et al. A Smarter Health through the Internet of Surgical Things. Sensors (Basel) 2022;22:4577.
- Anagnostopoulos S, Baltayiannis N, Koletsis NE, et al. 3D printing in medicine: bridging imaging, education, and practice. Arch Med Sci Atheroscler Dis 2025;10:e172-88.
- Mulita A, Valsamaki P, Bekou E, et al. Benefits from 18F-FDG PET-CT-Based Radiotherapy Planning in Stage III Non-Small-Cell Lung Cancer: A Prospective Single-Center Study. Cancers (Basel) 2025;17:1969.

