Development and validation of a PET/CT radiomics and dual-task learning model for the prediction of pathological subtypes and EGFR mutation in non-small cell lung cancer

Fan Jiang; Nan-Feng Zhang; Yi Gao; Xin Chen; En-Tao Liu; Tian Mou

doi:10.21037/tlcr-2026-0279

Original Article

Fan Jiang¹#, Nan-Feng Zhang¹#, Yi Gao¹,²,³#, Xin Chen¹, En-Tao Liu⁴, Tian Mou¹

¹School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; ²Guangdong Provincial Key Laboratory of Mathematical and Neural Dynamical Systems, Dongguan, China; ³Marshall Laboratory of Biomedical Engineering, Shenzhen, China; ⁴PET Center, Department of Nuclear Medicine, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China

Contributions: (I) Conception and design: T Mou; (II) Administrative support: X Chen; (III) Provision of study materials or patients: T Mou, Y Gao, ET Liu; (IV) Collection and assembly of data: F Jiang, NF Zhang, X Chen, ET Liu; (V) Data analysis and interpretation: F Jiang, NF Zhang; (VI) Manuscript writing: F Jiang, NF Zhang, T Mou, Y Gao; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Tian Mou, PhD. School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, 1066 Xueyuan Avenue, Nanshan District, Shenzhen 518060, China. Email: tian.mou@szu.edu.cn.

Background: Accurate pathological subtyping and epidermal growth factor receptor (EGFR) mutation profiling are critical for personalized non-small cell lung cancer (NSCLC) management. However, traditional invasive biopsies possess inherent limitations in dynamic monitoring and capturing tumor heterogeneity. While dual-modal positron emission tomography/computed tomography (PET/CT) imaging provides valuable non-invasive phenotypic insights, deep learning models that jointly fuse these modalities for simultaneous prediction while maintaining clinical interpretability remain scarce. Therefore, this study proposes an integrated dual-modal PET/CT radiomics framework for the simultaneous prediction of pathological subtypes and EGFR mutation status in NSCLC.

Methods: This retrospective study included a total of 384 NSCLC patients with PET/CT images across three independent cohorts. From CT images, sub-regional radiomic features were systematically extracted, while PET images provided spatial metabolic heterogeneity descriptors. Building on these, a Dual-Modal Dual-task Prediction (DMDP) model was developed. This model employs a multi-scale cross-attention mechanism to fuse PET/CT information and utilizes a dual-task learning strategy to synergistically predict both EGFR mutation and pathological subtype. The model’s efficacy was fully validated through ablation studies, and its decision interpretability was assessed using gradient-weighted class activation mapping (Grad-CAM) heatmaps.

Results: Significant differences were identified in PET metabolic parameters and imaging heterogeneity across pathological subtypes and EGFR mutation states (P<0.05). The DMDP model outperformed single-task and traditional machine learning approaches. For EGFR mutation prediction, the model achieved an area under the curve (AUC) of 0.93 (95% CI: 0.81–1.00), with an accuracy of 0.88 (95% CI: 0.83–0.98), sensitivity of 0.86 (95% CI: 0.74–0.95), and specificity of 0.88 (95% CI: 0.75–0.94). For pathological subtyping, the model achieved an AUC of 0.88 (95% CI: 0.73–0.98), sensitivity of 0.85 (95% CI: 0.73–0.95), and specificity of 0.88 (95% CI: 0.77–0.96), demonstrating balanced diagnostic performance compared with traditional models. Integrating multimodal heterogeneity features enhanced predictive performance (P<0.001). Grad-CAM analysis suggested that the model focused on tumor margins and hypermetabolic regions.

Conclusions: The DMDP framework integrated structural and metabolic information and showed potential for non-invasive prediction of pathological subtypes and EGFR mutation status in NSCLC, providing a possible basis for imaging-based risk stratification in selected clinical settings.

Keywords: Non-small cell lung cancer (NSCLC); positron emission tomography/computed tomography radiomics (PET/CT radiomics); EGFR mutation; pathological subtype; dual-task prediction

Submitted Mar 05, 2026. Accepted for publication May 29, 2026. Published online Jun 24, 2026.

Highlight box

Key findings

• A novel Dual-Modal Dual-Task Prediction (DMDP) framework was developed, integrating 3D positron emission tomography/computed tomography (PET/CT) imaging with radiomic features through a multi-scale cross-attention mechanism to enable the simultaneous prediction of non-small cell lung cancer (NSCLC) pathological subtypes and epidermal growth factor receptor (EGFR) mutation status.

• The DMDP model demonstrated superior predictive performance, yielding an area under the receiver operating characteristic curve (AUC) of 0.93 (0.88 accuracy) for EGFR mutation status and 0.88 for pathological subtype classification.

• The integration of multi-modal heterogeneity features significantly enhanced specificity to 88%, effectively addressing the sensitivity-specificity imbalance commonly observed in single-modality models.

What is known and what is new?

• While traditional tissue biopsy is invasive, current single-modality imaging predictive models often suffer from limited diagnostic accuracy due to insufficient feature representation.

• This study establishes a transparent, interpretable multi-modal fusion framework. Grad-CAM visualization validated that the model precisely localizes tumor margins and hypermetabolic regions, revealing that internal tumor characteristics are more predictive of gene mutations, whereas peritumoral interstitial information is critical for pathological subtype classification.

What is the implication, and what should change now?

• This integrated framework provides a non-invasive, rapid assessment tool that can be embedded into routine clinical workflows to guide personalized therapeutic decision-making without the immediate necessity for invasive biopsies.

• Future clinical protocols for lung cancer diagnosis should incorporate multi-modal artificial intelligence tools to leverage the complementary value of PET/CT imaging in molecular subtyping.

Introduction

Accounting for roughly 85% of lung cancer cases, non-small cell lung cancer (NSCLC) remains a leading cause of global cancer deaths (1). The World Health Organization (WHO) classifies NSCLC into distinct subtypes, mainly adenocarcinoma and squamous cell carcinoma (2). The epidermal growth factor receptor (EGFR) mutation rate in adenocarcinoma can be as high as 50–60% in Asian populations, while it is less than 5% in squamous cell carcinoma (3). This highlights a deep correlation between molecular mechanisms and pathological subtypes (4,5). The latest National Comprehensive Cancer Network guidelines (Version 4.2026) recommend broad molecular profiling, including EGFR testing, to guide targeted therapies in advanced NSCLC (6). The limitations of traditional invasive biopsy in dynamic monitoring have driven the development of imaging-based predictive research (7,8).

Early studies utilized radiomic features combined with machine learning (ML) models to achieve EGFR mutation prediction (9). Le et al. (10) achieved an accuracy of approximately 80% using the XGBoost classifier. To improve performance, research has shifted towards integrating positron emission tomography (PET) metabolic parameters [e.g., metabolic tumor volume (MTV), total lesion glycolysis (TLG)] and clinical information to reveal the biological characteristics of lesions (11,12). Although the predictive capability of metabolic parameters remains controversial, dual-modal fusion has consistently been shown to outperform single-modality features (13). Recently, deep learning (DL) methods such as convolutional neural networks (CNN) and densely connected convolutional networks (DenseNet) have demonstrated superior performance through end-to-end learning (14,15). However, DL models’ “black-box” nature conflicts with clinical interpretability. Despite attempts using methods like gradient-weighted class activation mapping (Grad-CAM) (16), this field still requires dual-modal fusion solutions that strike a balance between high performance and interpretability. Research has found significant differences between adenocarcinoma and squamous cell carcinoma in clinical, epidemiological, and imaging characteristics (17), providing a basis for non-invasive classification. ML models like support vector machine (SVM) and Random Forest (RF), using multi-modal features, excel at subtype differentiation, surpassing clinical-only models (18).

Simultaneously, DL has achieved breakthroughs in pathological image analysis; for instance, VGG16 achieved an accuracy of 0.84, while advanced Transformers like PKMT-Net boosted the prediction area under the curve (AUC) to 0.92 (19,20). Furthermore, CNN-vision transformer (ViT) fusion models have achieved up to 98% accuracy in pathological classification (21). Despite these advances, the joint application of positron emission tomography/computed tomography (PET/CT) and clinical information remains underutilized for comprehensive diagnosis (22). Previous studies on NSCLC characterization have largely relied on single-modality or single-task models, which often struggle to detect subtle genetic phenotypes and to fully capture tumor heterogeneity. Integrating computed tomography (CT)-derived anatomical information with PET-derived functional and metabolic features in a dual-modal, dual-task framework offers a more comprehensive representation of tumor biology.

Although CT remains the standard first-line imaging modality, in real-world clinical practice, patients with high-risk pulmonary nodules or suspected lung cancer often undergo FDG PET/CT for staging and treatment decision-making (23,24). Because PET/CT inherently includes CT acquisition, and many patients have previously undergone diagnostic chest CT, PET and CT data are naturally co-available in this selected population (25). However, DL models that jointly fuse PET/CT and clinical data for simultaneous EGFR mutation prediction and subtype classification remain scarce, and existing approaches lack sufficient interpretability.

Based on these observations, this study proposes a dual-modal predictive framework that fuses PET/CT imaging features with clinical information. This framework combines ensemble learning with DL, aiming to construct a radiomics tool that balances both performance and interpretability. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/rc).

Methods

Patient cohorts

Patients were consecutively enrolled from three centers between 2007 and 2021. Multi-center data comprised: (I) The Cancer Imaging Archive (TCIA) NSCLC Radiogenomics (26) (n=211; CT, PET, and EGFR data); (II) TCIA Lung-PET-CT-Dx (27) (n=355; 146 paired PET/CT cases); and (III) a clinical cohort from Guangdong Provincial People’s Hospital (n=95; PET/CT and clinical records). For the EGFR mutation prediction task, only patients with available EGFR mutation status were included. EGFR mutation status in the TCIA NSCLC Radiogenomics dataset was obtained from publicly released clinical/genomic annotations, with EGFR exons 18–21 tested by multiplex PCR followed by the SNaPshot single-base extension assay. In the Guangdong Provincial People’s Hospital cohort, EGFR mutation status was extracted from institutional clinical records and molecular pathology reports, with molecular testing performed using targeted probe-capture-based next-generation sequencing on the Illumina platform. These molecular testing results were used as the reference standard for EGFR mutation status. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study, including the cohort from Guangdong Provincial People’s Hospital, was approved by the Medical Ethics Committee of Shenzhen University Medical School (No. PN-202600012), and the requirement for individual informed consent was waived for this retrospective analysis.

This was a retrospective prediction model study, and the study size was determined by all eligible patients with available PET/CT images, clinical information, and reference-standard labels during the study period. The final study population and the detailed screening process, based on the inclusion and exclusion criteria, are illustrated in the flow chart (Figure 1).

Figure 1 Flowchart of patient selection. CT, computed tomography; EGFR, epidermal growth factor receptor; FDG, fluorodeoxyglucose; NSCLC, non-small cell lung cancer; PET, positron emission tomography; TCIA, The Cancer Imaging Archive.

Inclusion and exclusion criteria

Consecutive patients were included if they met the following criteria: (I) underwent whole-body or chest ¹⁸F-FDG PET/CT scanning prior to diagnosis; (II) had postoperative pathological confirmation of NSCLC with detailed pathological information available; and (III) had received no antineoplastic therapy or surgical resection before the PET/CT examination.

Exclusion criteria comprised: (I) incomplete clinical data or loss to follow-up; (II) a history of other malignant tumors; (III) multicentric lung cancer or poorly defined tumor margins; (IV) misalignment between PET and CT images; (V) insufficient number of CT or PET slices or poor image quality; and (VI) abnormally low FDG uptake in the tumor tissue.

Imaging preprocessing

ML dataset preprocessing included converting raw DICOM data into NIfTI format and performing standardized uptake value (SUV) (28) correction on PET images to minimize device and physiological variations. Tumor region of interest (ROI) extraction followed a standardized protocol: existing masks were applied to annotated data, and CT images underwent automatic segmentation via nnU-Net (29). Remaining PET images were manually delineated using 3D Slicer (30) by one of the authors.
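The SUV correction step can be illustrated with a minimal sketch. The paper does not state which SUV variant was used, so the body-weight definition below is an assumption, as are the function names and the choice to decay-correct the injected dose to scan time using the physical half-life of fluorine-18.

```python
import math

# Physical half-life of fluorine-18 in seconds (about 109.77 min).
F18_HALF_LIFE_S = 6586.2


def decay_corrected_dose(injected_dose_bq: float, elapsed_s: float,
                         half_life_s: float = F18_HALF_LIFE_S) -> float:
    """Activity remaining at scan time after decay since injection."""
    return injected_dose_bq * math.exp(-math.log(2) * elapsed_s / half_life_s)


def suv_body_weight(activity_bq_per_ml: float, weight_kg: float,
                    injected_dose_bq: float, elapsed_s: float) -> float:
    """Body-weight SUV (assumed variant): tissue activity concentration
    divided by the decay-corrected dose per gram of body weight."""
    dose_at_scan = decay_corrected_dose(injected_dose_bq, elapsed_s)
    return activity_bq_per_ml * (weight_kg * 1000.0) / dose_at_scan


# Hypothetical voxel: 5 kBq/mL in a 70 kg patient injected with 350 MBq.
suv_no_delay = suv_body_weight(5000.0, 70.0, 350e6, 0.0)
suv_one_half_life = suv_body_weight(5000.0, 70.0, 350e6, F18_HALF_LIFE_S)
```

Normalizing by dose and body weight is what makes uptake values comparable across scanners and patients. Whether the stored voxel values are already decay-corrected depends on the scanner export and should be checked against the DICOM headers.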

The DL dataset was randomly partitioned into training and validation sets (6:4 ratio) using stratified sampling based on EGFR mutation and pathological subtype labels. ROI preprocessing involved 3D cropping around the tumor center to generate uniform volumes (48 mm × 96 mm × 96 mm). Missing values in the high-dimensional radiomic features were filled by median imputation, and the features were then Z-score normalized to ensure training stability.
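The imputation and normalization step can be sketched for a single feature column as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and fitting the statistics on the training split only (then reusing them for validation) is an assumption made here to avoid information leakage.

```python
import math
import statistics
from typing import List, Optional, Tuple


def _is_missing(value: Optional[float]) -> bool:
    return value is None or math.isnan(value)


def fit_column(train: List[Optional[float]]) -> Tuple[float, float, float]:
    """Learn the median (for imputation) and mean/std (for Z-scoring)
    from the training split of one feature column."""
    median = statistics.median(v for v in train if not _is_missing(v))
    filled = [median if _is_missing(v) else v for v in train]
    return median, statistics.fmean(filled), statistics.pstdev(filled)


def transform_column(column: List[Optional[float]],
                     params: Tuple[float, float, float]) -> List[float]:
    """Impute missing entries with the training median, then Z-score."""
    median, mean, std = params
    filled = [median if _is_missing(v) else v for v in column]
    if std == 0:  # constant feature: no spread to normalize
        return [0.0 for _ in filled]
    return [(v - mean) / std for v in filled]


train_feature = [1.0, None, 3.0, 5.0]
params = fit_column(train_feature)
train_scaled = transform_column(train_feature, params)
valid_scaled = transform_column([None, 3.0], params)
```

Median imputation is robust to the skewed distributions common in radiomic features, and Z-scoring puts features of different units on a common scale before they enter the network.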

Feature selection

Feature selection was performed separately for the EGFR mutation prediction and pathological subtype classification tasks. For PET features, Recursive Feature Elimination with Cross-Validation (RFECV) (26) was applied to identify the optimal feature subset. The initial 20 PET features were reduced to 6 features for EGFR mutation prediction and 5 features for pathological subtype classification. For CT radiomic features, a two-step selection strategy was adopted. First, 130 candidate CT features were screened using the Minimum Redundancy Maximum Relevance (mRMR) algorithm, and the top 20 features were retained for each region-specific feature set. Subsequently, least absolute shrinkage and selection operator (LASSO) regression was used for further dimensionality reduction, and features with non-zero coefficients were retained for model construction.
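The two selection routes described above can be sketched with scikit-learn. Several parts are assumptions made for illustration: the data are synthetic stand-ins with the stated feature counts (20 PET, 130 CT), logistic regression is used as the RFECV estimator, and a plain mutual-information ranking replaces mRMR (it scores relevance only and applies no redundancy penalty, so it is a simplified stand-in rather than the algorithm used in the study).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, mutual_info_classif
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-ins for the PET (20 features) and CT (130 features) tables.
X_pet, y_pet = make_classification(n_samples=200, n_features=20,
                                   n_informative=5, random_state=0)
X_ct, y_ct = make_classification(n_samples=200, n_features=130,
                                 n_informative=10, random_state=0)

# PET route: recursive feature elimination with cross-validated AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rfecv = RFECV(LogisticRegression(max_iter=1000), step=1, cv=cv,
              scoring="roc_auc")
rfecv.fit(X_pet, y_pet)
pet_mask = rfecv.support_  # boolean mask over the 20 PET features

# CT route, step 1: keep the 20 most relevant features
# (mutual information used here in place of mRMR).
relevance = mutual_info_classif(X_ct, y_ct, random_state=0)
top20 = np.argsort(relevance)[::-1][:20]

# CT route, step 2: LASSO keeps only features with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_ct[:, top20], y_ct)
ct_selected = top20[lasso.coef_ != 0]
```

Running each route once per task (EGFR mutation, pathological subtype) would give task-specific subsets, in line with the separate selection described in the text.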

For EGFR mutation prediction, the final CT_reg1, CT_reg2, CT_reg5, CT_reg6, and CT_reg_all feature sets contained 8, 2, 5, 6, and 7 features, respectively. The PET-clinical and PET-CT-clinical feature sets contained 9 and 19 features, respectively. For pathological subtype classification, the corresponding CT_reg1, CT_reg2, CT_reg5, CT_reg6, and CT_reg_all feature sets contained 5, 7, 5, 6, and 9 features, respectively. The PET-clinical and PET-CT-clinical feature sets contained 8 and 13 features, respectively (Tables S1,S2). The de-identified extracted features after feature selection have been deposited in a publicly accessible GitHub repository (https://github.com/jiang-0311/DMDP).

All selected features were standardized before model training to reduce inter-feature variability. Potential dataset-related batch effects were explored using UMAP visualization, and no additional harmonization procedure, such as ComBat, was applied because no obvious dataset-specific clustering was observed in either cohort (Figures S1,S2).

ML model development

The predictive models for EGFR mutation status and pathological subtype classification were developed following a standardized radiomics-based ML pipeline (Figure 2). The final selected PET, CT, PET-clinical, and PET-CT-clinical feature sets were used for model construction, with detailed feature compositions provided in Tables S1,S2. The dataset was divided into training and independent test cohorts at a ratio of 8:2 using stratified random sampling. A two-layer stacked ensemble framework was then applied, with Logistic Regression (LR), RF, and Extremely Randomized Trees (EXT) used as first-layer base learners and LR used as the second-layer meta-learner. Hyperparameters were optimized using grid search within the training cohort, and 5-fold cross-validation was performed to evaluate model stability. The final models were assessed on the independent test cohort to evaluate generalizability.
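A two-layer stack of this kind can be sketched with scikit-learn's `StackingClassifier`. The data below are synthetic (228 samples and 19 features, mirroring the EGFR cohort size and the PET-CT-clinical feature count), the hyperparameters are arbitrary placeholders, and the grid search mentioned above is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a selected feature table.
X, y = make_classification(n_samples=228, n_features=19, n_informative=6,
                           random_state=0)

# 8:2 stratified split into training and independent test cohorts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# First layer: LR, RF, and extremely randomized trees; second layer: LR.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ext", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                          # out-of-fold predictions feed the meta-learner
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
```

Training the meta-learner on out-of-fold base predictions (`cv=5`) keeps it from simply memorizing base learners that have already seen the same samples.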

Figure 2 Schematic illustration of the ensemble learning model construction. (A) The individual base models (EXT, LG, and RF) trained on the integrated PET_CT_clinical feature set. (B) The stacking framework where a meta-learner combines the outputs of base models to generate final predictions for EGFR mutation and pathological subtypes. Arrows indicate the sequential data flow from feature input to the ensemble prediction. CT, computed tomography; EGFR, epidermal growth factor receptor; EXT, extremely randomized trees; LG, logistic regression; PET, positron emission tomography; PET_CT_clinical, combined features from PET, CT, and clinical data; RF, random forest.

DL model construction

To optimize multimodal information fusion and multi-level feature learning, we developed a DMDP model (Figure 3), which employs an end-to-end learning framework to extract high-dimensional latent representations while simultaneously optimizing synergetic fusion of PET/CT data and collaborative prediction across two diagnostic tasks. The model is structured into five core modules:

Figure 3 Schematic overview of the DMDP architecture. The framework integrates multi-modal imaging data with clinical text-based features for simultaneous prediction. Arrows indicate the direction of data flow between network modules. BN, batch normalization; CT, computed tomography; DMDP, Dual-Modal Dual-task Prediction; EGFR, epidermal growth factor receptor; FC, fully connected layer; IN, instance normalization; PET, positron emission tomography; ReLU, rectified linear unit.

Dual-modal encoder: This module extracted tumor-related features from 3D CT and PET images using 3 × 3 convolutional kernels, batch normalization (31), and rectified linear unit (ReLU) (32) activation.

Multi-scale cross attention: Pixel-level information exchange between modalities was facilitated via a Transformer-based lightweight cross-modal attention module (33).

Text encoder: Manually extracted radiomic and clinical features were encoded using fully connected layers, instance normalization, and ReLU activation.

Contrastive learning loss: Features were aligned by mapping imaging and radiomic data into a unified low-dimensional space to minimize inter-modal distances.

Dual-task prediction heads: Parallel prediction for EGFR mutation and pathological subtype was performed using fully connected layers and a joint loss function incorporating cross-entropy and contrastive loss.
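The five modules above can be sketched in PyTorch as a much-reduced toy network. This is not the DMDP implementation: the class and function names are hypothetical, the attention is single-scale rather than multi-scale, the layer sizes are arbitrary, and the alignment term uses one minus cosine similarity as an assumed stand-in for the contrastive loss, whose exact form is not given in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_encoder() -> nn.Sequential:
    """Small 3D encoder: (conv -> batch norm -> ReLU -> pool) twice."""
    return nn.Sequential(
        nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.BatchNorm3d(8), nn.ReLU(),
        nn.MaxPool3d(4),
        nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.BatchNorm3d(16), nn.ReLU(),
        nn.MaxPool3d(4),
    )


class DualTaskSketch(nn.Module):
    def __init__(self, n_tabular: int, dim: int = 16):
        super().__init__()
        self.ct_encoder, self.pet_encoder = conv_encoder(), conv_encoder()
        self.ct_to_pet = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.pet_to_ct = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.img_proj = nn.Linear(2 * dim, dim)
        self.tab_fc = nn.Linear(n_tabular, dim)
        self.tab_norm = nn.InstanceNorm1d(1)
        self.egfr_head = nn.Linear(2 * dim, 2)
        self.subtype_head = nn.Linear(2 * dim, 2)

    def forward(self, ct, pet, tabular):
        # (N, C, D, H, W) -> (N, tokens, C) so attention runs over positions.
        ct_tok = self.ct_encoder(ct).flatten(2).transpose(1, 2)
        pet_tok = self.pet_encoder(pet).flatten(2).transpose(1, 2)
        ct_att, _ = self.ct_to_pet(ct_tok, pet_tok, pet_tok)   # CT queries PET
        pet_att, _ = self.pet_to_ct(pet_tok, ct_tok, ct_tok)   # PET queries CT
        img = self.img_proj(torch.cat([ct_att.mean(1), pet_att.mean(1)], dim=1))
        tab = F.relu(self.tab_norm(self.tab_fc(tabular).unsqueeze(1)).squeeze(1))
        fused = torch.cat([img, tab], dim=1)
        return self.egfr_head(fused), self.subtype_head(fused), img, tab


def joint_loss(egfr_logits, subtype_logits, img, tab, egfr_y, subtype_y,
               weight: float = 0.5):
    """Cross-entropy for both tasks plus an image/tabular alignment term."""
    ce = F.cross_entropy(egfr_logits, egfr_y) + F.cross_entropy(subtype_logits, subtype_y)
    align = (1.0 - F.cosine_similarity(img, tab, dim=1)).mean()
    return ce + weight * align


torch.manual_seed(0)
model = DualTaskSketch(n_tabular=19)
ct, pet = torch.randn(2, 1, 16, 32, 32), torch.randn(2, 1, 16, 32, 32)
egfr_logits, subtype_logits, img, tab = model(ct, pet, torch.randn(2, 19))
loss = joint_loss(egfr_logits, subtype_logits, img, tab,
                  torch.tensor([0, 1]), torch.tensor([1, 0]))
loss.backward()
```

Sharing one fused representation between the two heads is what lets gradients from each task regularize the other, which is the usual motivation for dual-task learning.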

Model evaluation and interpretability

Model performance was evaluated using AUC, accuracy, F1-score, sensitivity, and specificity. Ablation experiments were conducted to quantify the contribution of the cross-modal alignment loss and feature screening modules. Grad-CAM was employed for the visualization of regions of interest, and training stability was assessed via loss curve analysis. All models were implemented using the PyTorch (version 1.10.2) framework.
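For reference, the threshold-based metrics listed above follow directly from the confusion matrix. The sketch below uses a hypothetical helper name and toy labels, with 1 denoting the positive class (for example, EGFR-mutant).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # guard against empty classes

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

Reporting sensitivity and specificity together matters for imbalanced cohorts, where accuracy alone can look high even when one class is almost never detected.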

Statistical analysis

Statistical analyses were performed using R (version 4.2.1) and Python (version 3.9). A two-sided P<0.05 was considered statistically significant. Continuous variables were evaluated for normality via the Shapiro-Wilk test. Normally distributed data were expressed as mean ± standard deviation and compared using the independent samples t-test; otherwise, data were expressed as median [interquartile range] and compared using the Mann-Whitney U test. Categorical variables were expressed as frequencies (%) and compared using the Chi-square or Fisher’s exact test. These methods were applied to evaluate the discriminative value of clinical factors, metabolic indices (SUVmax, SUVmean, TLG), and heterogeneity parameters (SUVskew, SUVkurtosis, Het) across cohorts.
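The test-selection rule for continuous variables can be sketched with SciPy. The helper name is hypothetical, and requiring both groups to pass the Shapiro-Wilk test before using the t-test is an assumption about how the rule was applied.

```python
from statistics import NormalDist
from typing import Sequence, Tuple

from scipy import stats


def compare_continuous(group_a: Sequence[float], group_b: Sequence[float],
                       alpha: float = 0.05) -> Tuple[str, float]:
    """Pick the t-test when both groups look normal, else Mann-Whitney U."""
    normal = True
    for group in (group_a, group_b):
        _, p_normality = stats.shapiro(group)
        normal = normal and p_normality > alpha
    if normal:
        _, p_value = stats.ttest_ind(group_a, group_b)
        return "t-test", p_value
    _, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return "Mann-Whitney U", p_value


# Toy data: evenly spaced normal quantiles, and the same values shifted by 2.
normal_a = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
normal_b = [v + 2.0 for v in normal_a]
# Strongly right-skewed toy data.
skewed = [2.0 ** i for i in range(12)]

name_normal, p_normal = compare_continuous(normal_a, normal_b)
name_skewed, p_skewed = compare_continuous(skewed, normal_a)
```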

Results

Patient cohorts and clinical characteristics

After applying the inclusion and exclusion criteria, 228 patients were included in the EGFR mutation prediction cohort and 365 in the pathological subtype classification cohort (Figure 1). Demographic characteristics of the study population are summarized in Table 1.

Table 1

Characteristics of the study population

Characteristics	GD-Data (n=75)	NR-Data (n=186)	LPCD-Data (n=123)	P value
Age (years)	63 [54, 72]	69 [63, 75]	62 [56, 69]	<0.001
Weight (kg)	–	78.2 [62.8, 90.1]	66.0 [58.0, 76.2]	<0.001
Gender				0.35
Male	28 (37.3)	62 (33.3)	51 (41.5)
Female	47 (62.7)	124 (66.7)	72 (58.5)
Smoking status				<0.001
Never	22 (29.3)	41 (22.0)	53 (43.1)
Smoked	53 (70.7)	145 (78.0)	70 (56.9)
EGFR mutation status				<0.001
EGFR−	45 (60.0)	117 (62.9)	–
EGFR+	30 (40.0)	36 (19.4)	–
Pathological type				<0.001
Squamous cell carcinoma	12 (16.0)	29 (15.6)	32 (26.0)
Adenocarcinoma	48 (64.0)	153 (82.3)	91 (74.0)

Data are presented as median [IQR] or n (%). EGFR, epidermal growth factor receptor; GD-Data, Guangdong Provincial People’s Hospital Clinical Dataset; IQR, interquartile range; LPCD-Data, The TCIA Lung-PET-CT-Dx dataset; NR-Data, The TCIA NSCLC Radiogenomics dataset; TCIA, The Cancer Imaging Archive.

EGFR mutation prediction dataset

The training cohort (n=228) comprised 66 (28.9%) EGFR-mutant and 162 (71.1%) wild-type cases. Significant differences existed in sex (higher female proportion in the mutant group; P<0.001), weight, and age (all P<0.05). Regarding PET, heterogeneity metrics (SUVskew, SUVkurtosis, Het) and tri-axial parameters (type_xy, type_xz, type_yz) varied significantly (all P<0.05), associating EGFR mutation with metabolic/spatial heterogeneity. Conversely, American Joint Committee on Cancer (AJCC) stage, SUVmax, and SUVmean showed no significant differences (P>0.3). Thus, EGFR mutation primarily influences clinical features and PET heterogeneity rather than tumor stage or baseline metabolism (Table S3).

Pathological subtype classification dataset

The cohort (n=365) comprised 292 (80.0%) adenocarcinomas and 73 (20.0%) squamous cell carcinomas. Only sex distribution varied significantly between subtypes (higher male proportion in squamous cell carcinoma; P<0.001); age, weight, and AJCC stage showed no differences. Squamous cell carcinoma exhibited higher metabolic activity, with significant differences in SUVmax, SUVmean, and TLG_3. Heterogeneity metrics, including Het (P=0.02), tri-axial SUV_mean_sd (all P<0.001), and type_yz (P=0.01), also varied significantly. SUVskew, SUVkurtosis, and MTV_3 showed no significant variations (Table S4).

Results of ML analysis

Model performance and discriminative ability

For EGFR mutation prediction, performance across the feature sets is summarized in Table 2. The multi-modal integration of PET, CT, and clinical data (PET_CT_clinical) demonstrated the highest diagnostic efficacy among all feature sets. Within the CT-only category, the set incorporating all sub-regional features (CT_reg_all) achieved the best performance. Specifically, the Stacking ensemble model yielded a peak AUC of 0.90 (95% CI: 0.81–0.96) using the PET_CT_clinical set, higher than the AUC of every other feature set in Table 2. The corresponding receiver operating characteristic (ROC) curves for the top-performing feature sets (CT_reg5, CT_reg_all, PET_clinical, and PET_CT_clinical) are illustrated in Figure 4A-4D.

Table 2

Optimal models for different feature sets and their classification performance metrics

Feature set	Best model	AUC (95% CI)	Acc (95% CI)	F1 (95% CI)	Sen (95% CI)	Spe (95% CI)
CT_reg1	Logistic	0.72 (0.57–0.86)	0.59 (0.49–0.74)	0.43 (0.33–0.61)	0.31 (0.25–0.50)	0.85 (0.78–0.93)
CT_reg2	Stacking	0.62 (0.52–0.79)	0.62 (0.56–0.75)	0.61 (0.46–0.73)	0.62 (0.48–0.74)	0.61 (0.48–0.82)
CT_reg5	Stacking	0.77 (0.53–0.88)	0.67 (0.46–0.79)	0.62 (0.46–0.70)	0.54 (0.45–0.73)	0.79 (0.67–0.90)
CT_reg6	Stacking	0.74 (0.61–0.88)	0.76 (0.69–0.90)	0.71 (0.62–0.86)	0.69 (0.51–0.84)	0.70 (0.65–0.89)
CT_reg_all	Stacking	0.82 (0.63–0.87)	0.76 (0.67–0.85)	0.70 (0.61–0.80)	0.92 (0.67–1.00)	0.58 (0.43–0.68)
PET	Stacking	0.72 (0.55–0.90)	0.59 (0.48–0.67)	0.44 (0.35–0.52)	0.46 (0.32–0.58)	0.73 (0.56–0.86)
PET_clinical	Logistic	0.78 (0.60–0.93)	0.68 (0.57–0.80)	0.64 (0.58–0.77)	0.62 (0.56–0.75)	0.79 (0.68–0.87)
PET_CT_clinical	Stacking	0.90 (0.81–0.96)	0.83 (0.72–0.92)	0.78 (0.71–0.95)	0.85 (0.75–0.91)	0.79 (0.72–0.86)

Acc, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; CT, computed tomography; PET, positron emission tomography; Sen, sensitivity; Spe, specificity.

Figure 4 Comparison of model performance for EGFR mutation prediction and pathological subtype classification. The diagonal grey dashed line in each panel represents chance performance (AUC =0.50). (A-D) ROC curves for EGFR mutation prediction using: (A) the top 5 CT radiomic features (CT_reg5), (B) all CT radiomic features (CT_reg_all), (C) combined PET and clinical features (PET_clinical), and (D) the multi-modal integration of PET, CT, and clinical features (PET_CT_clinical). (E-H) ROC curves for pathological subtype classification (adenocarcinoma vs. squamous cell carcinoma) using: (E) the top 6 radiomic features (reg_6), (F) all radiomic features (reg_all), (G) PET radiomic features (PET), and (H) combined PET and clinical features (PET_clinical). AUC, area under the curve; CI, confidence interval; CT, computed tomography; EGFR, epidermal growth factor receptor; PET, positron emission tomography; ROC, receiver operating characteristic.

For pathological subtype classification (adenocarcinoma versus squamous cell carcinoma), internal validation (Table S5) identified the PET_clinical feature set as the optimal set, while the CT_reg6 model outperformed other CT-based configurations (Figure 4E-4H). The RF model achieved the best performance using the PET_clinical set, with an AUC of 0.87 (95% CI: 0.78–0.95). Notably, models utilizing PET-derived features alone exhibited performance comparable to combined feature sets, underscoring the dominant role of metabolic imaging in subtype differentiation. When evaluated on the independent external validation set (GD-Data), the lead model reached an AUC of 0.74 (95% CI: 0.63–0.82) (Table 3), lower than in internal validation but indicating that the framework retained discriminative ability across clinical centers.

Table 3

Model performance comparison for subtype classification: internal versus external validation

Model	Internal AUC (95% CI)	Internal ACC (95% CI)	Internal F1 (95% CI)	External AUC (95% CI)	External ACC (95% CI)	External F1 (95% CI)
RF	0.87 (0.76–0.96)	0.85 (0.75–0.93)	0.91 (0.85–0.96)	0.74 (0.63–0.82)	0.75 (0.68–0.82)	0.77 (0.70–0.77)
EXT	0.80 (0.64–0.92)	0.75 (0.64–0.85)	0.84 (0.75–0.91)	0.71 (0.54–0.84)	0.74 (0.67–0.81)	0.60 (0.52–0.67)
LR	0.78 (0.60–0.92)	0.69 (0.56–0.80)	0.78 (0.67–0.87)	0.69 (0.52–0.81)	0.63 (0.52–0.74)	0.63 (0.58–0.71)
Stacking	0.84 (0.69–0.95)	0.77 (0.67–0.87)	0.84 (0.75–0.92)	0.72 (0.57–0.86)	0.72 (0.64–0.79)	0.62 (0.54–0.70)

ACC, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; EXT, extremely randomized trees; LR, logistic regression; RF, random forest.

Model calibration and probability distribution

Model reliability and discriminative robustness were further evaluated using calibration curves and probability distribution plots (Figure 5). For EGFR mutation prediction based on the PET-CT-clinical feature set (Figure 5A-5D), the Stacking model (Figure 5A) demonstrated superior performance, with its calibration curve most closely approximating the ideal diagonal line (Brier score = 0.12). This indicates a high degree of consistency between the predicted probabilities and actual mutation risks. Correspondingly, the violin plots revealed a distinct bimodal separation in predicted probabilities between the mutation and wild-type groups, further validating the robustness of the Stacking approach.
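The Brier score quoted above is the mean squared difference between predicted probabilities and observed outcomes, and a calibration curve compares mean predicted probability with observed event frequency within probability bins. A minimal sketch with toy values follows; the function names and the equal-width binning are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probability and outcome (0 or 1)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def calibration_points(y_true: Sequence[int], y_prob: Sequence[float],
                       n_bins: int = 5) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed frequency) per non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((t, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(t for t, _ in members) / len(members)
            points.append((mean_pred, observed))
    return points


score = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
curve = calibration_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], n_bins=2)
```

A perfectly calibrated model places every point on the diagonal, and lower Brier scores indicate better combined calibration and discrimination.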

Figure 5 Calibration and discriminative performance of machine learning models for EGFR mutation and pathological subtype prediction. The solid black diagonal line in the calibration plots represents perfect agreement between predicted and actual probabilities. (A-D) Performance of four models based on the PET_CT_clinical feature set for EGFR mutation prediction: (A) Stacking, (B) EXT, (C) RF, and (D) LG. For each model, the left sub-panel shows the calibration curve with the Brier score, and the right sub-panel shows a violin plot of predicted probability distributions across groups. (E-H) Performance of four models based on the PET_clinical feature set for pathological subtype classification (adenocarcinoma vs. squamous cell carcinoma): (E) RF, (F) Stacking, (G) EXT, and (H) LG. CT, computed tomography; EGFR, epidermal growth factor receptor; EXT, extremely randomized trees; LG, logistic regression; NSCLC, non-small cell lung cancer; PET, positron emission tomography; RF, random forest.

Regarding pathological subtype classification (Figure 5E-5H), models constructed using the PET-clinical feature set were compared. The RF model (Figure 5E) exhibited the optimal calibration, showing the closest alignment with the ideal diagonal and the most pronounced separation in probability distributions. These findings suggest that the RF model possesses the highest discriminative performance and generalization capability for subtype differentiation, followed by the Stacking model (Figure 5F).

Results of DL approach

Loss curve analysis and convergence assessment

To ensure robust model training, we evaluated the impact of different loss functions and feature sets on convergence (Figure S3). A comparative analysis revealed that incorporating cross-modal contrastive learning loss significantly stabilized the total loss (Figure S3B,S3C). Additionally, the comparison between Figure S3C and S3D shows that radiomic feature screening further facilitated model convergence, leading to a more efficient training process.

Model evaluation

Ablation results are summarized in Table 4. Integrating radiomic features increased specificity by 9 percentage points for mutation prediction (0.79 to 0.88) and by 13 percentage points for subtype classification (0.75 to 0.88). Without the dual-modal alignment loss, sensitivity and specificity for subtype classification were 0.92 and 0.25; with this loss, they were 0.85 and 0.88. Feature screening increased mutation prediction sensitivity by 16 percentage points (0.70 to 0.86) and subtype classification specificity by 13 percentage points (0.75 to 0.88).
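The ablation differences quoted here are absolute differences between rows of Table 4, which the short sketch below recomputes. The sensitivity and specificity values are copied from Table 4; the dictionary keys and the helper name are illustrative.

```python
# Sensitivity and specificity from Table 4, keyed by (configuration, task).
TABLE_4 = {
    ("without_radiomics", "mutation"): {"sen": 0.71, "spe": 0.79},
    ("without_radiomics", "subtype"): {"sen": 0.82, "spe": 0.75},
    ("without_selection", "mutation"): {"sen": 0.70, "spe": 0.83},
    ("without_selection", "subtype"): {"sen": 0.80, "spe": 0.75},
    ("full", "mutation"): {"sen": 0.86, "spe": 0.88},
    ("full", "subtype"): {"sen": 0.85, "spe": 0.88},
}


def gain_in_points(ablated: str, task: str, metric: str) -> int:
    """Absolute gain of the full model over an ablated one, in percentage points."""
    full = TABLE_4[("full", task)][metric]
    reduced = TABLE_4[(ablated, task)][metric]
    return round((full - reduced) * 100)


radiomics_spe_mutation = gain_in_points("without_radiomics", "mutation", "spe")
radiomics_spe_subtype = gain_in_points("without_radiomics", "subtype", "spe")
selection_sen_mutation = gain_in_points("without_selection", "mutation", "sen")
selection_spe_subtype = gain_in_points("without_selection", "subtype", "spe")
```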

Table 4

Performance of the ablation experiment across two tasks

Group	Module combination	Task	Acc	F1	Sen	Spe
1	DMDP (without heterogeneous radiomic features)	Mutation prediction	0.79	0.75	0.71	0.79
1	DMDP (without heterogeneous radiomic features)	Subtype classification	0.81	0.78	0.82	0.75
2	DMDP (without L_cl loss)	Mutation prediction	0.62	0.64	0.71	0.58
2	DMDP (without L_cl loss)	Subtype classification	0.81	0.39	0.92	0.25
3	DMDP (heterogeneous radiomic features without feature selection)	Mutation prediction	0.79	0.75	0.70	0.83
3	DMDP (heterogeneous radiomic features without feature selection)	Subtype classification	0.79	0.77	0.80	0.75
4	DMDP	Mutation prediction	0.88	0.87	0.86	0.88
4	DMDP	Subtype classification	0.85	0.86	0.85	0.88

Acc, accuracy; DMDP, Dual-Modal Dual-task Prediction; L_cl loss, contrastive loss; Sen, sensitivity; Spe, specificity.
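The differences quoted for the ablation experiment are absolute differences in percentage points, not relative changes. The short sketch below recomputes them from the sensitivity and specificity values in Table 4; the dictionary keys and the helper `pp_delta` are names chosen for this example.

```python
# Sensitivity and specificity transcribed from Table 4 (group, task)
TABLE4 = {
    ("no_radiomics", "mutation"): {"sen": 0.71, "spe": 0.79},
    ("no_radiomics", "subtype"): {"sen": 0.82, "spe": 0.75},
    ("no_contrastive", "mutation"): {"sen": 0.71, "spe": 0.58},
    ("no_contrastive", "subtype"): {"sen": 0.92, "spe": 0.25},
    ("no_selection", "mutation"): {"sen": 0.70, "spe": 0.83},
    ("no_selection", "subtype"): {"sen": 0.80, "spe": 0.75},
    ("full", "mutation"): {"sen": 0.86, "spe": 0.88},
    ("full", "subtype"): {"sen": 0.85, "spe": 0.88},
}


def pp_delta(ablation, task, metric):
    """Full DMDP minus the ablated variant, in whole percentage points."""
    full = TABLE4[("full", task)][metric]
    ablated = TABLE4[(ablation, task)][metric]
    return round((full - ablated) * 100)


for ablation in ("no_radiomics", "no_contrastive", "no_selection"):
    for task in ("mutation", "subtype"):
        d_sen = pp_delta(ablation, task, "sen")
        d_spe = pp_delta(ablation, task, "spe")
        print(f"{ablation:15s} {task:9s} sen {d_sen:+d} pp, spe {d_spe:+d} pp")
```

For the variant without the contrastive loss, the same calculation shows that subtype sensitivity falls by 7 percentage points while specificity rises by 63, which is the sensitivity-specificity trade-off described in the text.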

Performance comparison of the DMDP model and alternative models

To validate the performance of the DMDP model in mutation prediction and subtype classification, we compared it with the best-performing ML models (Stacking for mutation prediction and RF for subtype classification; Table 5). DMDP achieved the best overall and most balanced performance across both tasks. In mutation prediction, accuracy increased by 8 percentage points (0.80 to 0.88) and specificity by 9 percentage points (0.79 to 0.88). For subtype classification, specificity improved by 55 percentage points (0.33 to 0.88) at the cost of lower sensitivity (0.98 to 0.85), whereas AUC was essentially unchanged (0.87 to 0.88). These results suggest that multimodal data inputs and the DMDP architecture are better suited for predicting genetic mutations and pathological subtypes.

Table 5

Performance of different methods across two tasks

Feature set	Best model	Task	AUC (95% CI)	Acc (95% CI)	F1 (95% CI)	Sen (95% CI)	Spe (95% CI)
PET_CT_clinical	Stacking	Mutation prediction	0.90 (0.81–0.96)	0.80 (0.74–0.86)	0.71 (0.64–0.78)	0.85 (0.79–0.91)	0.79 (0.72–0.86)
PET_clinical	RF	Subtype classification	0.87 (0.78–0.95)	0.85 (0.79–0.91)	0.91 (0.86–0.96)	0.98 (0.95–1.00)	0.33 (0.24–0.42)
PET_CT_clinical	DMDP	Mutation prediction	0.93 (0.81–1.00)	0.88 (0.83–0.98)	0.87 (0.69–0.97)	0.86 (0.74–0.95)	0.88 (0.75–0.94)
PET_CT_clinical	DMDP	Subtype classification	0.88 (0.73–0.98)	0.85 (0.75–0.94)	0.86 (0.83–0.97)	0.85 (0.73–0.95)	0.88 (0.77–0.96)

Acc, accuracy; AUC, area under the receiver operating characteristic curve; CI, confidence interval; CT, computed tomography; DMDP, Dual-Modal Dual-task Prediction; PET, positron emission tomography; RF, random forest; Sen, sensitivity; Spe, specificity.
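To see why a model can combine high accuracy and F1 with low specificity, as in the RF row of Table 5, it helps to compute the metrics from a confusion matrix under class imbalance. The counts below are hypothetical and chosen only to produce a similar pattern; they are not the study's confusion matrix, and `classification_metrics` is a name introduced for this example.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, F1, sensitivity, specificity, and balanced accuracy from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "acc": accuracy,
        "f1": f1,
        "sen": sensitivity,
        "spe": specificity,
        "bal_acc": balanced_accuracy,
    }


# Hypothetical imbalanced test set: 50 positive cases and 10 negative cases
metrics = classification_metrics(tp=49, fp=7, tn=3, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, accuracy and F1 remain high because the majority class dominates, while specificity and balanced accuracy reveal that most minority-class cases are misclassified.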

Model interpretability analysis

Grad-CAM visualization elucidated the model’s attentional focus. The original input CT and PET images are shown in Figure S4A,S4B. In the Dual-Modal Encoder (Figure S4C,S4D), the network localized tumor regions within individual modalities. After Multi-Scale Cross Attention integration (Figure S4E-S4H), the small-scale branch targeted lesion margins, while the large-scale branch captured high-density CT and hypermetabolic PET regions, supporting the multi-modal fusion strategy.
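For readers unfamiliar with Grad-CAM (16), the core computation can be sketched as follows: each feature map is weighted by the spatial average of the gradient of the class score with respect to that map, the weighted maps are summed, and negative values are removed with a ReLU. The sketch below uses nested Python lists and toy values; in practice the activations and gradients would be taken from a chosen layer of the trained network, and the function name and inputs here are assumptions for illustration.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and their gradients.

    activations, gradients: lists of shape [channels][height][width].
    Returns a [height][width] map scaled to the range 0 to 1.
    """
    n_ch = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients over spatial positions
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_ch)
    ]

    # Weighted sum of feature maps followed by ReLU
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_ch)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display as a heatmap
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: channel 0 supports the class, channel 1 opposes it
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
print(heatmap)
```

In the toy input, only the location activated by the positively weighted channel remains in the heatmap, which mirrors how warm regions in the figures mark areas that increase the predicted class score.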

Analysis across task-label patterns (Figure 6) revealed systematic attentional shifts. In concordant positive cases, subtype prediction prioritized the CT microenvironment, whereas mutation prediction targeted the tumor core. Label divergence shifted attention toward peri-tumoral tissues. These findings demonstrate that task-label variations modulate dual-task feature learning, reflecting latent inter-task correlations and the synergy of dual-prediction heads.

Figure 6 Feature gradient heatmaps of the DMDP model across six clinical label patterns. (A) Original CT and PET images; (B) attention heatmaps for the pathological subtype prediction task; and (C) attention heatmaps for the EGFR mutation prediction task. Columns I–VI represent six distinct label combination patterns. The heatmaps were generated using gradient-weighted class activation mapping (Grad-CAM). The colour bar indicates the relative importance of image regions to the model’s prediction, where warmer colours (red) represent regions with high influence and cooler colours (blue) represent regions with low influence. CT, computed tomography; DMDP, Dual-Modal Dual-task Prediction; EGFR, epidermal growth factor receptor; Grad-CAM, gradient-weighted class activation mapping; PET, positron emission tomography.

Discussion

This study demonstrated that the DMDP model, which integrated CT and PET images with a collaborative learning architecture, achieved superior performance in both EGFR mutation prediction and pathological subtype classification compared with baseline models. Specifically, specificity increased by 9 percentage points for mutation prediction and by 55 percentage points for subtype classification. Ablation experiments indicated that the cross-modal alignment loss played a central role in maintaining model robustness by correcting the sensitivity-specificity imbalance observed when this loss was removed. Grad-CAM visualization analysis revealed that the dual-task prediction heads automatically focused on distinct anatomical regions: gene mutation prediction prioritized internal tumor features, whereas subtype classification relied more on interstitial information in the peritumoral area. This complementary feature extraction pattern may have contributed to the observed performance gains.

Compared to previous studies relying on manual radiomic features, such as the best-performing model reported by Huang et al. (34) (AUC of 0.803), our end-to-end learning framework achieved improved predictive performance in the present dataset. While DL models by Coudray et al. (13) and Wang et al. (35) performed well in single tasks, their reliance on a single modality limited their ability to capture the full spectrum of tumor heterogeneity. The DMDP model addressed the “black box” nature of conventional DL by incorporating a multi-scale cross-attention mechanism, providing an interpretable path for feature fusion. Furthermore, we found that incorporating feature screening strategies from ML into the DL pre-training phase effectively accelerated convergence and enhanced information efficiency. This hybrid strategy addressed the training instability often encountered with pure DL models in small medical imaging datasets.

Despite its strengths, this study has several limitations. First, its retrospective design and limited sample size may affect the stability and generalizability of the model; therefore, further independent, multi-center prospective validation is required before clinical implementation. Second, reliance on paired PET/CT may restrict clinical utility in resource-limited settings. Finally, while Grad-CAM improves transparency, the biological basis of DL features remains partially opaque. Future research should integrate multi-omics data and explore missing-modality learning to enhance the framework’s biological interpretability and clinical flexibility.

Looking beyond the refinement of artificial intelligence algorithms, the broader landscape of NSCLC management will inevitably be shaped by other synergistic technologies. In the last few years, technological developments in the medical field have been rapid and are continuously evolving. One of the most revolutionary breakthroughs was the introduction of the Internet of Things (IoT) concept within medical practice (36). In the context of NSCLC, the IoT can facilitate real-time patient monitoring and seamless data integration, which could eventually complement the predictive power of advanced radiomics. Additionally, the role of 3D printing in oncology is becoming increasingly prominent, offering unprecedented opportunities for personalized surgical planning and targeted interventions in NSCLC (37). Furthermore, therapeutic phases rely on advanced imaging; for instance, 18F-FDG PET-CT-based radiotherapy planning in stage III NSCLC allows for precise target volume delineation and improved outcomes (38). Integrating our framework with these emerging technologies may support a more personalized and data-driven approach to lung cancer management.

Conclusions

This study developed a DL-based framework integrating dual-modal PET/CT radiomics for the simultaneous prediction of EGFR mutations and pathological subtypes in NSCLC. By characterizing complementary anatomical and metabolic imaging information, the model showed potential value for non-invasive molecular and histological prediction and may provide a technical basis for imaging-based risk stratification in selected clinical settings.

Acknowledgments

We sincerely thank all patients whose medical data contributed to this study.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/dss

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/prf

Funding: This study was supported by the Shenzhen Medical Research Fund (No. C2501017) and the National Natural Science Foundation of China (Nos. 82202246 and 82171957).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0279/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study, including the cohort from Guangdong Provincial People’s Hospital, was approved by the Medical Ethics Committee of Shenzhen University Medical School (No. PN-202600012), and the requirement for individual informed consent was waived for this retrospective analysis.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
2. Ettinger DS, Wood DE, Aisner DL, et al. Non-Small Cell Lung Cancer, Version 3.2022, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2022;20:497-530.
3. Vavala T, Malapelle U, Veggiani C, et al. Molecular profiling of advanced non-small cell lung cancer in the era of immunotherapy approach: a multicenter Italian observational prospective study of biomarker screening in daily clinical practice. J Clin Pathol 2022;75:234-40.
4. Yamamoto H, Toyooka S, Mitsudomi T. Impact of EGFR mutation analysis in non-small cell lung cancer. Lung Cancer 2009;63:315-21.
5. da Cunha Santos G, Shepherd FA, Tsao MS. EGFR mutations and lung cancer. Annu Rev Pathol 2011;6:49-69.
6. Riely GJ, Wood DE, Aisner DL, et al. Non-Small Cell Lung Cancer, Version 4.2026, NCCN Clinical Practice Guidelines In Oncology. J Natl Compr Canc Netw 2026;24:e260017.
7. Eze C, Schmidt-Hegemann NS, Sawicki LM, et al. PET/CT imaging for evaluation of multimodal treatment efficacy and toxicity in advanced NSCLC-current state and future directions. Eur J Nucl Med Mol Imaging 2021;48:3975-89.
8. Mu W, Jiang L, Zhang J, et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat Commun 2020;11:5228.
9. Zhang J, Zhao X, Zhao Y, et al. Value of pre-therapy (18)F-FDG PET/CT radiomics in predicting EGFR mutation status in patients with non-small cell lung cancer. Eur J Nucl Med Mol Imaging 2020;47:1137-46.
10. Le NQK, Kha QH, Nguyen VH, et al. Machine Learning-Based Radiomics Signatures for EGFR and KRAS Mutations Prediction in Non-Small-Cell Lung Cancer. Int J Mol Sci 2021;22:9254.
11. Moon SH, Kim J, Joung JG, et al. Correlations between metabolic texture features, genetic heterogeneity, and mutation burden in patients with lung cancer. Eur J Nucl Med Mol Imaging 2019;46:446-54.
12. Lv Z, Fan J, Xu J, et al. Value of (18)F-FDG PET/CT for predicting EGFR mutations and positive ALK expression in patients with non-small cell lung cancer: a retrospective analysis of 849 Chinese patients. Eur J Nucl Med Mol Imaging 2018;45:735-50.
13. Coudray N, Ocampo PS, Sakellaropoulos T, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 2018;24:1559-67.
14. Song Z, Liu T, Shi L, et al. The deep learning model combining CT image and clinicopathological information for predicting ALK fusion status and response to ALK-TKI therapy in non-small cell lung cancer patients. Eur J Nucl Med Mol Imaging 2021;48:361-71.
15. Hong D, Xu K, Zhang L, et al. Radiomics Signature as a Predictive Factor for EGFR Mutations in Advanced Lung Adenocarcinoma. Front Oncol 2020;10:28.
16. Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017. DOI: 10.1109/ICCV.2017.74.
17. Wang W, Liu H, Li G. What's the difference between lung adenocarcinoma and lung squamous cell carcinoma? Evidence from a retrospective analysis in a cohort of Chinese patients. Front Endocrinol (Lausanne) 2022;13:947443.
18. Zhao H, Su Y, Wang M, et al. The Machine Learning Model for Distinguishing Pathological Subtypes of Non-Small Cell Lung Cancer. Front Oncol 2022;12:875761.
19. Han Y, Ma Y, Wu Z, et al. Histologic subtype classification of non-small cell lung cancer using PET/CT images. Eur J Nucl Med Mol Imaging 2021;48:350-60.
20. Zhao Z, Guo S, Han L, et al. PKMT-Net: A pathological knowledge-inspired multi-scale transformer network for subtype prediction of lung cancer using histopathological images. Biomed Signal Process Control 2025;106:107742.
21. Imran M, Haq B, Elbasi E, et al. Transformer-Based Hierarchical Model for Non-Small Cell Lung Cancer Detection and Classification. IEEE Access 2024;12:145920-33.
22. Wang X, Lu Z. Radiomics Analysis of PET and CT Components of 18F-FDG PET/CT Imaging for Prediction of Progression-Free Survival in Advanced High-Grade Serous Ovarian Cancer. Front Oncol 2021;11:638124.
23. Callister ME, Baldwin DR, Akram AR, et al. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax 2015;70:ii1-ii54.
24. Silvestri GA, Gonzalez AV, Jantz MA, et al. Methods for staging non-small cell lung cancer: Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e211S-50S.
25. Boellaard R, Delgado-Bolton R, Oyen WJ, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging 2015;42:328-54.
26. Bakr S, Gevaert O, Echegaray S, et al. Data for NSCLC Radiogenomics (Version 4). [Data set]. The Cancer Imaging Archive [cited 2026 May 24]; Available online: https://www.cancerimagingarchive.net/collection/nsclc-radiogenomics/
27. Li P, Wang S, Li T, et al. A large-scale CT and PET/CT dataset for lung cancer diagnosis [Homepage on the Internet]. 2020 [cited 2026 May 24]; Available from: https://www.cancerimagingarchive.net/collection/lung-pet-ct-dx/
28. Basu S, Zaidi H, Holm S, et al. Quantitative Techniques in PET-CT Imaging. CMIR 2011;7:216-33.
29. Isensee F, Jaeger PF, Kohl SAA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203-11.
30. Fedorov A, Beichel R, Kalpathy-Cramer J, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging 2012;30:1323-41.
31. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. Lille, France: JMLR.org, 2015:448-56.
32. Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. Madison, WI, USA: Omnipress, 2010:807-14.
33. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017:6000-10.
34. Huang L, Xu L, Wang X, et al. Prediction of EGFR Mutations in Lung Adenocarcinoma via CT Images: A Comparative Study of Intratumoral and Peritumoral Radiomics, Deep Learning, and Fusion Models. Acad Radiol 2025;32:4880-92.
35. Wang S, Shi J, Ye Z, et al. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning. Eur Respir J 2019;53:1800986.
36. Mulita F, Verras GI, Anagnostopoulos CN, et al. A Smarter Health through the Internet of Surgical Things. Sensors (Basel) 2022;22:4577.
37. Anagnostopoulos S, Baltayiannis N, Koletsis NE, et al. 3D printing in medicine: bridging imaging, education, and practice. Arch Med Sci Atheroscler Dis 2025;10:e172-88.
38. Mulita A, Valsamaki P, Bekou E, et al. Benefits from 18F-FDG PET-CT-Based Radiotherapy Planning in Stage III Non-Small-Cell Lung Cancer: A Prospective Single-Center Study. Cancers (Basel) 2025;17:1969.

Cite this article as: Jiang F, Zhang NF, Gao Y, Chen X, Liu ET, Mou T. Development and validation of a PET/CT radiomics and dual-task learning model for the prediction of pathological subtypes and EGFR mutation in non-small cell lung cancer. Transl Lung Cancer Res 2026;15(6):178. doi: 10.21037/tlcr-2026-0279
