Development and validation of a dual-attention deep learning prediction model for noninvasive differentiation of pulmonary invasive mucinous adenocarcinoma from inflammatory pulmonary nodules
Original Article


Jiading Xie1,2#, Shiqing Wang3#, Junjie Cheng1,2, Qizheng Wei1,2, Haitang Yang1, Feng Yao1,2

1Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; 2University of Shanghai for Science and Technology School of Health Science and Engineering, Shanghai, China; 3Department of Radiology, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China

Contributions: (I) Conception and design: J Xie, S Wang, H Yang, F Yao; (II) Administrative support: H Yang, F Yao; (III) Provision of study materials or patients: J Cheng, Q Wei, H Yang; (IV) Collection and assembly of data: J Xie, S Wang, J Cheng, Q Wei; (V) Data analysis and interpretation: J Xie, S Wang, H Yang, F Yao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Haitang Yang, MD, PhD. Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, 241 West Huaihai Road, Shanghai 200030, China. Email: haitang.yang@shsmu.edu.cn; Feng Yao, MD, PhD. Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, 241 West Huaihai Road, Shanghai 200030, China; University of Shanghai for Science and Technology School of Health Science and Engineering, Shanghai, China. Email: yaofeng@shsmu.edu.cn.

Background: Pulmonary invasive mucinous adenocarcinoma (PIMA) is a rare subtype of lung adenocarcinoma that often mimics benign inflammatory pulmonary nodules (IPNs) on chest computed tomography (CT), leading to diagnostic challenges. We aimed to develop and internally validate a deep learning (DL)-based prediction model for noninvasive differentiation of PIMA from IPN.

Methods: This retrospective single-center study included 443 patients with pathologically confirmed PIMA or IPN between January 2021 and December 2023. The reference standard was postoperative histopathology. A total of 1,409 CT slices were used for model development. The dataset was randomly divided at the patient level into training (80%) and validation (20%) cohorts for internal validation. A dual-attention deep learning model (SE-DAS ResNet) integrating spatial and channel-wise attention mechanisms was developed. Model performance was evaluated using the area under the curve (AUC), sensitivity, and specificity with their 95% confidence intervals (CIs), as well as accuracy and F1 score.

Results: In the internal validation cohort, the SE-DAS ResNet achieved an AUC of 0.990 (95% CI: 0.978–1.000), an accuracy of 96.6%, a sensitivity of 100% (95% CI: 90.7–100%), and a specificity of 94.1% (95% CI: 84.1–98.4%). The model significantly outperformed a standard ResNet50 baseline (AUC 0.914, P<0.05). An online research platform was developed to facilitate real-time inference.

Conclusions: This dual-attention DL prediction model demonstrated excellent performance for differentiating PIMA from IPN on CT images. However, the reliance on an 80/20 internal random split rather than independent external validation may lead to optimistic performance estimates. External validation in independent multicenter cohorts is required to confirm generalizability.

Keywords: Pulmonary invasive mucinous adenocarcinoma (PIMA); inflammatory pulmonary nodule (IPN); deep learning (DL); dual-attention network; computed tomography (CT)


Submitted Feb 15, 2026. Accepted for publication May 12, 2026. Published online Jun 23, 2026.

doi: 10.21037/tlcr-2026-1-0205


Highlight box

Key findings

• A dual-attention deep learning model (SE-DAS ResNet) was developed to noninvasively differentiate pulmonary invasive mucinous adenocarcinoma (PIMA) from inflammatory pulmonary nodules (IPNs) on chest computed tomography (CT). The model achieved excellent diagnostic performance on an internal validation cohort [area under the curve (AUC) 0.990], with 100% sensitivity and high specificity, and provided clinically interpretable Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations.

What is known and what is new?

• PIMA often mimics benign IPNs on CT, leading to frequent misdiagnosis. Conventional imaging features lack sufficient specificity, and no reliable noninvasive method currently exists to distinguish PIMA from IPN with high accuracy.

• This study introduces a dedicated dual-attention deep learning (DL) framework that integrates channel-wise and spatial attention to capture subtle imaging differences between PIMA and IPN. The model demonstrates near-perfect discrimination and aligns its attention with radiologically meaningful lesion regions.

What is the implication, and what should change now?

• This work suggests that interpretable DL models can serve as effective decision-support tools in the evaluation of indeterminate pulmonary nodules.

• Incorporating such AI systems into clinical workflows may help reduce missed malignancies, avoid unnecessary invasive procedures for benign nodules, and support more precise, noninvasive management of patients with suspected PIMA.


Introduction

Pulmonary invasive mucinous adenocarcinoma (PIMA) is a rare subtype of lung adenocarcinoma, accounting for approximately 2–5% of lung adenocarcinomas, and is characterized by invasive, mucus-producing tumor cells (1). PIMA typically affects middle-aged to elderly adults (often women) and has a clinical and pathological profile distinct from other lung cancers (2).

Accurate early diagnosis of PIMA is crucial, but it is challenging in practice (1). The current diagnostic gold standard is histopathology from a biopsy or surgical specimen, which is invasive and time-consuming (3). Chest computed tomography (CT) is a noninvasive mainstay for lung nodule evaluation and lung cancer screening (4). However, the CT appearance of PIMA frequently overlaps with benign inflammatory pulmonary nodules (IPNs), leading to misdiagnosis (5). PIMA lesions often present in the lower lobes and along bronchovascular bundles, appearing as ill-defined consolidations or ground-glass opacities (6). They commonly show irregular margins, bronchial distortion or plugging, and sometimes contain pseudocavities (7). Importantly, about 75% of PIMA cases appear as localized nodules or mass-like opacities on CT (8). These nodular PIMAs can look radiologically very similar to inflammatory nodules. For example, both may show spiculated or irregular edges and heterogeneous attenuation (9). Because no highly specific imaging hallmark exists for PIMA, even experienced radiologists can mistake an early-stage PIMA for an infection or organizing pneumonia (10). This substantial imaging overlap poses a critical diagnostic dilemma: a malignant PIMA may be initially treated as an infection (delaying surgery), or a benign nodule might be subjected to unnecessary invasive biopsy or resection under suspicion of cancer.

Recent advances in artificial intelligence (AI) offer a potential solution to this diagnostic challenge (11). Deep learning (DL) and specifically convolutional neural networks have demonstrated remarkable success in medical image analysis by automatically learning complex feature representations (12,13). In thoracic imaging, DL models have achieved expert-level performance in tasks such as lung nodule detection, malignancy risk estimation, and classification of common tumor subtypes (14). However, differentiating a rare entity like PIMA from benign mimics using CT is largely unexplored.

Two major hurdles have limited progress: (I) the subtlety of imaging differences between PIMA and inflammatory lesions, which makes it hard for both humans and algorithms to distinguish them based on CT alone; (II) the scarcity of well-annotated PIMA cases, since this subtype is uncommon and publicly available datasets are virtually nonexistent (15). These challenges have hindered the development and validation of reliable AI models for PIMA, thereby delaying their adoption in clinical practice.

To address this unmet clinical need, we aimed to develop and internally validate a diagnostic DL-based prediction model for noninvasively differentiating PIMA from IPN using chest CT images. We constructed a dedicated CT dataset (PIMA-CT-Set) and developed a dual-attention deep learning network, termed SE-DAS ResNet. The model integrates a squeeze-and-excitation (SE) module (16) to recalibrate channel-wise feature importance and a dynamic attention submodule (DAS) to emphasize spatially relevant regions. These mechanisms were incorporated into a ResNet50 (17) backbone to enhance lesion discrimination. The model was subsequently evaluated using internal validation to assess its discrimination performance and potential clinical applicability. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-1-0205/rc).


Methods

Study design and population

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Institutional Review Board of Shanghai Chest Hospital [#KS(Y)21316], and informed consent was obtained from all patients. We screened consecutive patients from Shanghai Chest Hospital from January 2021 to December 2023 who underwent chest CT for evaluation of pulmonary nodules and met our inclusion criteria. The selection process is summarized in Figure 1. Patients were divided into two groups.

Figure 1 Flowchart of patient selection and dataset partitioning. After applying inclusion/exclusion criteria, 443 patients (190 with PIMA and 253 with IPN) were eligible. Key exclusion steps included ruling out other diagnoses and ensuring benign nodules had documented resolution. The remaining cases were randomly split into an 80% training set (for model development) and a 20% internal validation set, with the split done at the patient level (no patient appears in both sets). CT, computed tomography; IPN, inflammatory pulmonary nodule; PIMA, pulmonary invasive mucinous adenocarcinoma.

Inclusion criteria

  • PIMA group: (i) histopathologically confirmed invasive mucinous adenocarcinoma (from surgical resection or biopsy); (ii) a pre-treatment chest CT scan within 1 month of diagnosis; and (iii) a clearly visible solid or predominantly solid pulmonary nodule/mass on CT.
  • IPN group: (i) no evidence of malignancy on any biopsy (if performed) and (ii) the pulmonary nodule had a benign course: it either significantly decreased in size or completely resolved after appropriate antibiotics or anti-inflammatory treatment, with no recurrence or progression over at least 6 months of follow-up.

Exclusion criteria

We excluded any patient whose CT image quality was poor (e.g., severe motion artifacts), whose lesion was not adequately visualized, or whose clinical follow-up was insufficient to establish benignity. As a reference standard, PIMA diagnoses were confirmed by pathology, and IPN cases were defined by strict benign follow-up (resolution or stability). This pragmatic approach avoids unnecessary biopsies for benign disease, but we used stringent criteria to minimize the chance of occult cancer in the IPN group.

After applying all criteria, 443 patients were eligible, including 190 with PIMA and 253 with IPN, contributing a total of 1,409 annotated CT slices (Figure 1). We randomly split the dataset at the patient level into a training set (80%, n=354) and an internal validation set (20%, n=89). To prevent information leakage, this split was performed using unique patient identifiers before slice extraction, and no slice-level randomization was conducted. Accordingly, all CT slices from the same patient were assigned exclusively to either the training or validation set, ensuring that no patient-specific anatomical information was shared between cohorts.
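The patient-level split described above can be illustrated with a minimal sketch. The function names, the toy patient identifiers, and the random seed are hypothetical; the sketch does not stratify by diagnosis, and it is not the authors' implementation.

```python
import random

def split_patients(patient_ids, val_fraction=0.2, seed=0):
    """Shuffle unique patient IDs and split them into train/validation sets."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    rng.shuffle(unique_ids)
    n_val = round(len(unique_ids) * val_fraction)
    val_ids = set(unique_ids[:n_val])
    train_ids = set(unique_ids[n_val:])
    return train_ids, val_ids

def assign_slices(slices, train_ids, val_ids):
    """Route each (patient_id, slice_name) pair to the cohort of its patient."""
    train = [s for s in slices if s[0] in train_ids]
    val = [s for s in slices if s[0] in val_ids]
    return train, val

# Hypothetical toy data: 443 patients with 1 to 5 annotated slices each.
slices = [(f"P{p:03d}", f"slice_{k}") for p in range(443) for k in range(1 + p % 5)]
train_ids, val_ids = split_patients([pid for pid, _ in slices])
train_slices, val_slices = assign_slices(slices, train_ids, val_ids)
```

Because the identifiers are split before any slice is assigned, every slice inherits the cohort of its patient, so no patient can contribute slices to both sets.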

An experienced thoracic radiologist, blinded to outcomes, reviewed all CT scans and marked the slices containing each target nodule. These annotated slices were used to construct the model input datasets without revealing the clinical diagnosis to the model during training. No formal sample size calculation was performed because of the retrospective design. All consecutive eligible patients during the study period were included to minimize selection bias and maximize statistical power.

CT acquisition and image preprocessing

All chest CT scans were performed without intravenous contrast according to our institution’s routine thoracic imaging protocol, using a standard 5-mm slice thickness and a sharp lung reconstruction kernel for lung-window evaluation. Scanner models and specific acquisition parameters, such as exact tube voltage and tube current, varied during the retrospective study period and could not be fully retrieved for all cases. Nevertheless, all included images met routine diagnostic quality standards. The radiologic workflow is summarized in Figure 2.

Figure 2 Overview of the dual-attention deep learning model development. (A) Example of lung parenchyma segmentation: the original chest CT (including all structures) is processed to remove extrapulmonary tissues, yielding an image volume of only the lungs. (B) From the segmented lung CT, slices containing the nodule are extracted for the classification dataset (with all cases anonymized). (C) The ResNet50-based classifier takes a lung CT slice as input and outputs the probability of the nodule being PIMA. (D) Schematic of a standard ResNet residual block. (E) Our modified residual block with integrated SE and DAS attention modules (dual-attention block). (F) Diagram of the DAS architecture, which learns to emphasize important spatial features. (G) Diagram of the SE module, which learns to recalibrate feature channel importance. BN, batch normalization; CT, computed tomography; DAS, dynamic attention submodule; FC, fully connected; GAP, global average pooling; nnU-Net, no-new-Net; PIMA, pulmonary invasive mucinous adenocarcinoma; ReLU, rectified linear unit; SE, squeeze-and-excitation.

We automatically segmented the lung parenchyma on each CT slice using a pretrained nnU-Net model (18). This removed non-pulmonary structures (chest wall, mediastinum, etc.) so that the model’s attention would focus on the lungs. To preserve anatomical context (e.g., lobe location, surrounding lung texture), we retained the full segmented lung fields rather than cropping tightly around the nodule. Using the radiologist’s annotation, we extracted all segmented slices containing the nodule for each patient, resulting in a stack of 2D images per case. Non-lung areas in these slices were zeroed out. To mitigate potential scanner-related variability and improve consistency across 5-mm CT slices, the DL pipeline applied standardized image preprocessing, including isotropic resampling and Hounsfield unit (HU) window normalization. The resulting images were anonymized and converted to a standardized 512×512 pixel format for input to the network.
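As a rough illustration of HU window normalization and masking of non-lung areas, the sketch below clips values to a window and rescales them to [0, 1]. The window center and width (-600 and 1500 HU, a commonly used lung window) and the order of the two operations are assumptions, since the text does not report them; the function names are hypothetical.

```python
def window_normalize(hu_values, center=-600.0, width=1500.0):
    """Clip HU values to a display window and rescale them to [0, 1].

    The default center and width are assumed values, not taken from the study.
    """
    low = center - width / 2.0
    high = center + width / 2.0
    out = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)
        out.append((clipped - low) / (high - low))
    return out

def mask_non_lung(pixels, lung_mask):
    """Zero out pixels outside the lung mask (mask entries are 0 or 1)."""
    return [p * m for p, m in zip(pixels, lung_mask)]

# One hypothetical row of a CT slice and its lung mask.
row_hu = [-1000.0, -600.0, 40.0, 400.0]
row_mask = [1, 1, 1, 0]
normalized = mask_non_lung(window_normalize(row_hu), row_mask)
```

In this toy row, the value at the window center maps to 0.5 and the last pixel is set to zero because it lies outside the lung mask.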

We also applied basic data augmentation during model training to improve generalizability: each training image could be randomly flipped horizontally and rotated by a small angle (19). This simulates different patient positions and scanner angles, helping the model not overfit to specific orientations of nodules.

Model architecture (SE-DAS ResNet) and training

The predictors used for model development consisted exclusively of CT image data. No clinical variables were incorporated into the model. The SE-DAS ResNet architecture was selected to address the clinical need to capture both global contextual information and fine-grained imaging features relevant to PIMA. ResNet50 was used as the backbone because of its established robustness in hierarchical feature extraction and its broad application in thoracic medical imaging tasks (17). We initialized the network with weights pretrained on ImageNet to leverage transfer learning (20). We then inserted two attention modules into each residual block (Figure 2D-2G). The SE module was used for channel-wise feature recalibration, enabling the network to emphasize informative feature channels and suppress less relevant responses. To further capture fine-grained spatial cues, we incorporated the DAS. Inspired by established attention mechanisms such as the convolutional block attention module (CBAM) (21), DAS was designed as a task-specific parallel adaptation for this diagnostic setting. Unlike CBAM, which applies channel and spatial attention sequentially and generates spatial attention using channel-wise pooling followed by a large-kernel 7×7 convolution, DAS adopts a parallel dual-branch structure. Its spatial branch uses a point-wise 1×1 convolution without spatial pooling, thereby preserving feature-map resolution and reducing the risk of smoothing localized medical imaging features. This design allows the network to better localize subtle structural patterns, such as low-contrast internal texture and irregular nodule margins, that are important for differentiating PIMA from IPN.
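For readers less familiar with these mechanisms, the channel recalibration performed by an SE module can be written in the standard squeeze-and-excitation notation, where u_c is channel c of a feature map with C channels of size H×W, δ is the ReLU, σ is the sigmoid, and r is a reduction ratio. The last line is only an assumed form of a point-wise (1×1) spatial branch consistent with the description above; the sigmoid gate, the symbols w_c and b, and the way the two branches are fused are not specified in the text and are illustrative.

```latex
% Squeeze: global average pooling over each channel c
z_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)

% Excitation: two fully connected layers with reduction ratio r
s = \sigma\!\left( W_2 \, \delta( W_1 z ) \right), \qquad
W_1 \in \mathbb{R}^{\frac{C}{r} \times C}, \quad W_2 \in \mathbb{R}^{C \times \frac{C}{r}}

% Channel recalibration
\tilde{u}_c(i, j) = s_c \, u_c(i, j)

% Assumed point-wise spatial branch (1x1 convolution, no spatial pooling)
m(i, j) = \sigma\!\left( \sum_{c=1}^{C} w_c \, u_c(i, j) + b \right)
```

Because the 1×1 convolution acts on each pixel independently, the map m keeps the full H×W resolution of the input, in contrast to a 7×7 convolution applied to channel-pooled features.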

During training, each 2D lung slice was labeled as “PIMA” or “IPN” according to the patient’s diagnosis. We trained the network using a weighted binary cross-entropy loss to account for class imbalance. After training, we aggregated slice-level predictions to make a patient-level decision: the model outputs a probability of “PIMA” for each slice, and we averaged these probabilities across all slices of a patient, yielding a single PIMA probability per patient. This patient-level aggregation strategy was specifically implemented to mitigate intra-patient correlation bias, ensuring that patients with larger nodules (and thus more slices) did not disproportionately influence the final clinical prediction. We used a decision threshold of 0.5 on this patient-level score (≥0.5 = predict PIMA; <0.5 = predict IPN).
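The aggregation rule can be sketched in a few lines. The function names and the toy slice-level probabilities are hypothetical; only the averaging step and the 0.5 threshold come from the text.

```python
from collections import defaultdict

def aggregate_patient_probabilities(slice_predictions):
    """Average slice-level PIMA probabilities into one score per patient."""
    per_patient = defaultdict(list)
    for patient_id, prob in slice_predictions:
        per_patient[patient_id].append(prob)
    return {pid: sum(p) / len(p) for pid, p in per_patient.items()}

def classify(patient_scores, threshold=0.5):
    """Label a patient 'PIMA' when the mean probability is >= threshold."""
    return {pid: ("PIMA" if score >= threshold else "IPN")
            for pid, score in patient_scores.items()}

# Hypothetical slice-level outputs: (patient_id, probability of PIMA).
slice_predictions = [
    ("A", 0.9), ("A", 0.7), ("A", 0.8),
    ("B", 0.2), ("B", 0.6),
    ("C", 0.5),
]
scores = aggregate_patient_probabilities(slice_predictions)
labels = classify(scores)
```

Each patient contributes exactly one score regardless of how many slices the nodule spans, which is the property the aggregation is meant to provide.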

Hyperparameters (learning rate, epochs, batch size, optimizer, loss function, etc.) and training details are provided in Appendix 1.

Model interpretability

To make the model’s predictions more transparent, we used Gradient-weighted Class Activation Mapping (Grad-CAM) to produce visual explanations (22). The details are provided in Appendix 1. For each test slice, Grad-CAM generates a heatmap highlighting which regions of the image most strongly influenced the model’s decision. We overlaid these heatmaps on the original CT slices. Clinically, seeing that the model “looks” at sensible locations (e.g., nodule margins) can increase radiologist trust in the AI output.
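For reference, the standard Grad-CAM formulation is summarized below, where y^c is the score for class c (here, PIMA), A^k is the k-th feature map of the chosen convolutional layer, and Z is the number of spatial positions in that map. Which layer was used in this study is described in Appendix 1 and is not assumed here.

```latex
% Channel importance: gradients of the class score, averaged over spatial positions
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}}

% Heatmap: ReLU of the weighted sum of feature maps
L^{c}_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^{c} A^{k} \right)
```

The ReLU keeps only regions that push the score toward class c, and the resulting coarse map is upsampled to the input size before being overlaid on the CT slice.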

Implementation and online platform

We have made the model accessible via an online research platform (jiadingxie.top): users can upload chest CT images and receive a predicted PIMA risk score along with Grad-CAM visualizations. This enables external investigators to test the model on their own data. We emphasize that this platform is for research use and not a clinical diagnostic tool; its purpose is to facilitate validation and further development.

Figure S1 illustrates the end-to-end prediction workflow used in the online platform, from image input and lung segmentation to output of the prediction and Grad-CAM visualization. The source code and trained model parameters are publicly available at https://github.com/JiadingXie/SEDAS-ResNet to promote transparency, reproducibility, and external validation by independent investigators.

Outcome measures and statistical analysis

There were no missing data for outcome status. For clinical variables, cases with incomplete data were excluded from comparative analyses; no imputation was performed. The primary performance metric was the area under the curve (AUC) for distinguishing PIMA from IPN on the validation set. Secondary metrics were overall accuracy, sensitivity, specificity, and F1 score, all computed at the 0.5 threshold on the patient-level probability. To mitigate intra-patient correlation bias in statistical evaluation, AUC and all secondary metrics were calculated strictly at the patient level using the aggregated patient-level probabilities. We computed 95% confidence intervals (CIs) for AUC by bootstrap resampling of patients (10,000 iterations) and report sensitivity and specificity with 95% Wilson CIs. We compared AUC values between the proposed SE-DAS ResNet and all baseline/ablation models using DeLong’s test to assess statistical significance. For demographic comparisons, continuous variables (age, tumor size) were compared by the Mann-Whitney U test, and categorical variables (sex, location) by chi-square or Fisher’s exact test as appropriate. A two-sided P value <0.05 was considered statistically significant.
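A minimal sketch of the two interval methods is given below, assuming a percentile bootstrap (the text does not state which bootstrap variant was used). The function names, the synthetic scores, and the reduced number of resamples (2,000 rather than 10,000, to keep the example fast) are illustrative choices, not the authors' code.

```python
import math
import random

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Synthetic patient-level scores for 38 positives and 51 negatives (not study data).
rng = random.Random(1)
labels = [1] * 38 + [0] * 51
scores = ([rng.uniform(0.4, 1.0) for _ in range(38)]
          + [rng.uniform(0.0, 0.6) for _ in range(51)])
point_auc = auc(labels, scores)
auc_ci = bootstrap_auc_ci(labels, scores)
sens_ci = wilson_interval(38, 38)  # e.g., 38 of 38 positives detected
```

Resampling whole patients, rather than slices, keeps the interval consistent with the patient-level unit of analysis described above.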

In addition to discrimination, we assessed the calibration of the SE-DAS ResNet in the patient-level validation cohort. Calibration reflects the agreement between predicted probabilities and the observed frequency of PIMA. It was quantified using the Brier score, which ranges from 0 to 1, with lower values indicating better calibration. A calibration curve was also generated using five quantile-based bins, with 95% Wilson CIs added to show the uncertainty around the observed event rates in each bin.
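The Brier score and the quantile-based binning behind a calibration curve can be sketched as follows. The function names and the ten toy predictions are hypothetical and are not study data.

```python
def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

def quantile_calibration_bins(labels, probs, n_bins=5):
    """Sort by predicted probability, cut into equal-count bins, and return
    (mean predicted probability, observed event rate, bin size) per bin."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            bins.append((mean_pred, observed, len(chunk)))
    return bins

# Hypothetical patient-level outcomes (1 = PIMA) and predicted probabilities.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.90, 0.95]
score = brier_score(labels, probs)
bins = quantile_calibration_bins(labels, probs)
```

Plotting the observed rate against the mean predicted probability for each bin gives the calibration curve; points near the diagonal indicate good agreement.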


Results

Patient and nodule characteristics

In total, 443 patients were included (190 PIMA, 253 IPN). The training set had 152 PIMA and 202 IPN, and the validation set had 38 PIMA and 51 IPN. Table 1 summarizes the cohort characteristics. PIMA patients were older (median age 63.0 vs. 51.0 years; P<0.001) and had significantly larger nodules (median diameter 5.8 vs. 2.1 cm; P<0.001) compared to IPN patients. Sex distribution and nodule lobe location did not differ significantly between groups. The training and validation subsets had similar demographics and nodule features (see Table S1). Missing data were rare (<5%) and occurred only in baseline clinical variables. No missing data were present for the outcome (pathological diagnosis or benign follow-up). Missing values were handled using complete-case analysis.

Table 1

Comparison of clinical characteristics between pulmonary invasive mucinous adenocarcinoma and inflammatory pulmonary nodules

| Characteristics | Total (n=443) | PIMA (n=190) | IPN (n=253) | P value |
| --- | --- | --- | --- | --- |
| Sex | | | | 0.71 |
| Female | 198 (44.7) | 83 (43.7) | 115 (45.4) | |
| Male | 245 (55.3) | 107 (56.3) | 138 (54.6) | |
| Age (years) | | 63.0 (55.0–72.0) | 51.0 (45.0–61.0) | <0.001 |
| Tumor size (cm) | | 5.8 (3.5–8.2) | 2.1 (1.0–3.5) | <0.001 |
| Lesion location | | | | 0.27 |
| LUL | 95 (21.4) | 42 (22.1) | 53 (21.0) | |
| LLL | 102 (23.0) | 48 (25.3) | 54 (21.3) | |
| RUL | 97 (21.9) | 34 (17.9) | 63 (24.9) | |
| RML | 38 (8.6) | 13 (6.8) | 25 (9.9) | |
| RLL | 111 (25.1) | 53 (27.9) | 58 (22.9) | |

Data are presented as median (IQR) or n (%). Age and tumor size were compared using the Mann-Whitney U test. Sex and lesion location were compared using the chi-square test or Fisher’s exact test, as appropriate. Statistical significance was defined as P<0.05. LLL, left lower lobe; IPN, inflammatory pulmonary nodule; IQR, interquartile range; LUL, left upper lobe; PIMA, pulmonary invasive mucinous adenocarcinoma; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe.

Diagnostic performance and ablation analysis

On the internal validation cohort (n=89 patients), the ResNet50 model with lung segmentation and augmentation but without attention modules achieved an AUC of 0.973 (Figure 3A). The single-attention models showed further improvement, with AUC values of 0.976 (SE only) and 0.983 (DAS only) (Figure 3B,3C). We also compared the proposed architecture with a ResNet variant incorporating the standard CBAM. Although the CBAM-based model performed well (AUC 0.980; Figure S2), it showed lower sensitivity (97.4% vs. 100%) and specificity (90.2% vs. 94.1%) than the SE-DAS ResNet. The full SE-DAS ResNet achieved excellent patient-level discrimination between PIMA and IPN (Figure 3D), with an AUC of 0.990 (95% CI: 0.978–1.000). At the prespecified probability threshold of 0.5, the model correctly identified all 38 PIMA cases (sensitivity 100.0%, 95% CI: 90.7–100.0%) and correctly classified 48 of 51 IPN cases (specificity 94.1%). The overall patient-level accuracy was 96.6%, and the F1 score was 0.962.
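The reported accuracy, specificity, and F1 score follow directly from the patient-level counts given above (38 true positives, 0 false negatives, 48 true negatives, 3 false positives). The short check below, with a hypothetical helper name, reproduces the arithmetic.

```python
def metrics_from_counts(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Counts taken from the text: all 38 PIMA detected, 48 of 51 IPN correct.
m = metrics_from_counts(tp=38, fn=0, tn=48, fp=3)
rounded = {name: round(value, 3) for name, value in m.items()}
# rounded == {"sensitivity": 1.0, "specificity": 0.941, "accuracy": 0.966, "f1": 0.962}
```

That is, accuracy is 86/89, specificity is 48/51, and F1 is 76/79, which match the values reported for the full model.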

Figure 3 Diagnostic performance and ablation analysis of different model configurations on the validation set. (A-D) The ROC curves compare the overall discrimination ability of different models for distinguishing PIMA from IPN. The proposed SE-DAS ResNet nearly reaches the top-left corner, achieving an AUC of 0.990. In contrast, the baseline ResNet50 (A) and the single-attention models show lower AUCs (B,C), indicating inferior performance. The dual-attention SE-DAS ResNet (D) achieves the highest AUC, demonstrating superior overall diagnostic accuracy. (E-K) The corresponding patient-level confusion matrices illustrate the classification results of the ablation study for each model configuration on the validation set, with the ground truth labels (PIMA or IPN) shown on the vertical axis and model predictions on the horizontal axis. (E) Baseline ResNet50 without lung segmentation or attention modules. (F) ResNet50 with lung segmentation preprocessing only. (G) ResNet50 with horizontal flip augmentation only. (H) ResNet50 with both lung segmentation and augmentation, but without attention modules. (I) ResNet50 with the SE attention module only. (J) ResNet50 with the DAS attention module only. (K) Full SE-DAS ResNet incorporating both attention modules and preprocessing. AUC, area under the curve; DAS, dynamic attention submodule; IPN, inflammatory pulmonary nodule; PIMA, pulmonary invasive mucinous adenocarcinoma; ROC, receiver operating characteristic; SE, squeeze-and-excitation.

Using paired AUC comparison, the SE-DAS ResNet significantly outperformed the baseline model (P<0.001). The ablation analysis further showed that each added component, including lung segmentation, data augmentation, and attention modules, contributed to improved diagnostic performance (P<0.05 for each incremental comparison; Table 2). In the validation cohort, the model achieved 100.0% sensitivity, with no false-negative PIMA cases. However, the model was not error-free, as 3 of 51 IPN cases were misclassified as suspicious, corresponding to a false-positive rate of 5.88%. This pattern suggests that the model adopted a conservative decision boundary that prioritized minimizing missed PIMA cases while maintaining high specificity. The patient-level confusion matrices (Figure 3E-3K) show a progressive reduction in misclassifications with the integration of preprocessing and dual-attention modules.

Table 2

Performance comparison of preprocessing and attention modules

| Model configuration | AUC (95% CI) | Accuracy | Sensitivity (95% CI) | Specificity (95% CI) | F1 score | P value |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline (ResNet50) | 0.914 (0.882–0.946) | 0.876 | 0.921 (0.792–0.973) | 0.843 (0.718–0.915) | 0.864 | <0.001 |
| + Lung segmentation | 0.958 (0.934–0.982) | 0.921 | 0.974 (0.862–0.999) | 0.882 (0.766–0.943) | 0.914 | 0.004 |
| + Data augmentation | 0.942 (0.913–0.971) | 0.899 | 0.947 (0.827–0.985) | 0.863 (0.743–0.932) | 0.889 | 0.01 |
| + Seg + Aug | 0.973 (0.952–0.994) | 0.921 | 1.000 (0.908–1.000) | 0.863 (0.743–0.932) | 0.916 | 0.03 |
| + Seg + Aug + SE module | 0.976 (0.956–0.996) | 0.933 | 1.000 (0.908–1.000) | 0.882 (0.766–0.943) | 0.927 | 0.02 |
| + Seg + Aug + DAS module | 0.983 (0.965–1.000) | 0.944 | 1.000 (0.908–1.000) | 0.902 (0.790–0.957) | 0.938 | 0.04 |
| + Seg + Aug + CBAM | 0.980 (0.961–0.996) | 0.933 | 0.974 (0.862–0.999) | 0.902 (0.790–0.957) | 0.925 | 0.02 |
| SE-DAS ResNet (full) | 0.990 (0.978–1.000) | 0.966 | 1.000 (0.908–1.000) | 0.941 (0.841–0.984) | 0.962 | Reference |

AUC, area under the curve; Aug, augmentation; CBAM, convolutional block attention module; CI, confidence interval; DAS, dynamic attention submodule; SE, squeeze-and-excitation; Seg, segmentation.

The training and validation score curves (Figure S3) did not show deterioration in diagnostic metrics. However, the loss curves (Figure S4) showed that the training cross-entropy loss approached zero, whereas the validation loss plateaued and fluctuated during later epochs. This pattern suggests loss-level overconfidence or overfitting on difficult validation cases, rather than a complete absence of overfitting (23,24). Nevertheless, the validation AUC and accuracy remained stable, indicating that the model’s discriminative performance was preserved despite the divergence in cross-entropy loss. To reduce the risk of selecting an overfitted final model, checkpoint selection was based on validation AUC with early stopping, and the best-performing checkpoint was retained for final evaluation.

We performed ablation studies to assess the contribution of each component (preprocessing and attention modules) to performance (Table 2). The baseline model (ResNet50 with no lung segmentation or augmentation) had an AUC of 0.914. Adding only lung segmentation (no attention) increased the AUC to 0.958, and adding only horizontal flip augmentation increased it to 0.942. Incorporating a single attention module on top of segmentation and augmentation yielded further gains: SE-only gave an AUC of 0.976, and DAS-only gave an AUC of 0.983. Finally, the full SE-DAS ResNet, with segmentation, augmentation, and both attention modules, achieved an AUC of 0.990 and 96.6% accuracy, significantly outperforming every simpler variant (P<0.05 for each comparison). These results indicate that segmentation, augmentation, and both attention mechanisms each contributed complementary benefits to model performance.

Figure 3E-3K shows the confusion matrices for key models. The baseline model (Figure 3E) had multiple false negatives (missed PIMA cases) and some false positives. As we added preprocessing and attention (Figure 3F-3J), the number of errors progressively declined. The full SE-DAS ResNet (Figure 3K) had zero false negatives and only a few false positives, consistent with the perfect sensitivity and very high specificity reported above. In summary, attention modules and focused preprocessing reduced both false negatives and false positives. The corresponding slice-level confusion matrices, which provide additional information on slice-level feature extraction performance, are presented in Figure S5.

In addition to discrimination, the SE-DAS ResNet showed reasonable calibration in the validation cohort, with a patient-level Brier score of 0.081. As shown in the calibration plot (Figure S6), the predicted probabilities were generally consistent with the observed frequencies of PIMA. However, because calibration was assessed in a relatively small internal validation cohort, these results should be interpreted with appropriate caution, and the probability outputs should not be considered definitive clinical risk estimates without further external validation.

Robustness and sensitivity analyses

All robustness analyses were exploratory and performed on the internal validation cohort. We tested the model’s robustness in several ways. We first stratified the patient-level validation set by nodule size (<3 vs. ≥3 cm), which showed consistently high performance in both subgroups (Table S2). In the clinically challenging <3 cm subgroup, which included 41 patients (6 PIMA and 35 IPN), the SE-DAS ResNet maintained strong patient-level discrimination, with an AUC of 0.975. The model correctly identified all 6 small PIMA cases, corresponding to a sensitivity of 100.0% (95% CI: 61.0–100.0%), and correctly classified 33 of 35 IPN cases, corresponding to a specificity of 94.3% (95% CI: 81.4–98.4%).

We next assessed model stability across two acquisition periods, comparing earlier and later years of the study period. Patient-level AUC and accuracy remained stable, suggesting that performance was not substantially affected by minor temporal or protocol-related variations within our single-center dataset. Finally, in a predefined “hard-case” subset (n=60) after excluding lesions with highly obvious imaging features, the model continued to show strong discrimination, with an AUC of 0.968.

Together, these exploratory analyses suggest that the model’s performance was not driven solely by gross lesion size or obvious imaging differences. However, because these analyses were conducted within the internal validation cohort, they should be interpreted as supportive rather than definitive evidence of size-independent or externally generalizable performance.

Interpretability analysis

A key goal was to ensure the model’s decisions are interpretable for clinicians. Figure 4 displays representative Grad-CAM heatmaps for examples of correct and incorrect predictions. In true PIMA cases correctly identified, the model’s attention (red regions) was largely focused on the nodule itself, especially along irregular or spiculated borders and heterogeneous internal textures (Figure 4A). These are exactly the features that radiologists use to suspect mucinous malignancy (e.g., spiculation suggesting invasive growth, and patchy density consistent with mucin). These findings suggest that the model focused on radiologically relevant regions.

Figure 4 Representative Grad-CAM interpretability maps for model predictions. Each panel overlays a heatmap (red = high attention) on the original CT image. (A) Example of a PIMA case correctly identified by the model. The heatmap is concentrated on the nodule’s irregular perimeter and internal texture, corresponding to known radiologic signs of malignancy. (B) Example of an IPN case correctly recognized as benign. The model’s attention is diffuse or focused on surrounding lung changes rather than the nodule, consistent with an inflammatory pattern. CT, computed tomography; Grad-CAM, Gradient-weighted Class Activation Mapping; IPN, inflammatory pulmonary nodule; PIMA, pulmonary invasive mucinous adenocarcinoma.

In true IPN cases correctly identified, the heatmaps were generally diffuse or highlighted adjacent inflammatory changes (Figure 4B), such as peripheral consolidation or atelectasis near the nodule, rather than the nodule margins. This pattern suggests the model learned to recognize secondary signs of inflammation. In contrast, for some misclassified cases, the attention maps were less focused. For instance, in certain false negatives (PIMA predicted benign), the Grad-CAM heatmap was scattered and did not highlight the lesion, often occurring when the tumor had very subtle or low-contrast features. In some false positives (IPN predicted malignant), the heatmap did highlight the nodule, but the nodule had atypical features (e.g., slightly irregular edge or persistent opacity) that even experienced radiologists might find worrisome. These failure cases indicate that the model errs in truly ambiguous situations, underscoring that AI should complement rather than replace clinical judgment. Overall, the visual explanations suggest the model is reasoning in a clinically sensible way.
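As background on how such heatmaps are obtained, Grad-CAM weights each feature-map channel by the spatial mean of the class-score gradient, sums the weighted channels, and applies a ReLU. The sketch below is a dependency-free illustration of that formula on nested lists; the function name `grad_cam`, the toy inputs, and the final scaling to [0, 1] are assumptions for illustration and do not reproduce the study's implementation.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map for one convolutional layer.

    activations, gradients: nested lists shaped [channels][height][width];
    gradients holds d(class score) / d(activation).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of that channel's gradient.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy example: channel 0 supports the class, channel 1 opposes it.
toy_activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
toy_cam = grad_cam(toy_activations, toy_gradients)
```

In the toy case only the top-left pixel remains highlighted, because the ReLU discards locations whose net contribution to the class score is negative.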


Discussion

We developed and internally validated an interpretable DL model to address a notoriously difficult problem in thoracic imaging: distinguishing invasive mucinous adenocarcinoma from benign inflammatory nodules on CT. Our custom SE-DAS ResNet achieved near-perfect discrimination (AUC 0.990) on an internal validation set, substantially outperforming a standard ResNet50 baseline. Notably, sensitivity reached 100%, with no PIMA cases missed, an especially important feature in a clinical triage context. By operating at the patient level (averaging slice predictions) and incorporating lung segmentation along with dual attention mechanisms, the model appears capable of capturing subtle imaging patterns that are difficult to discern visually. Specifically, unlike human vision, which may struggle with low-contrast and diffuse mucinous textures, the point-wise spatial attention (DAS) is designed to capture these sub-visual, pixel-level morphological variations without spatial blurring, which may partly explain the model’s discriminative capability relative to traditional qualitative assessments. These findings underscore the potential clinical value of AI-assisted nodule analysis, particularly for rare subtypes such as PIMA, where radiologists may have limited exposure to the full imaging spectrum.
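The patient-level step mentioned above (averaging slice predictions) can be sketched as follows. The function names, the patient identifiers, and the 0.5 decision threshold are assumptions made for illustration; the operating threshold used in the study is not stated in this section.

```python
from collections import defaultdict


def patient_level_probs(slice_preds):
    """Average slice-level probabilities per patient.

    slice_preds: iterable of (patient_id, probability_of_PIMA) pairs.
    """
    by_patient = defaultdict(list)
    for patient_id, prob in slice_preds:
        by_patient[patient_id].append(prob)
    return {pid: sum(vals) / len(vals) for pid, vals in by_patient.items()}


def classify_patients(patient_probs, threshold=0.5):
    """Label a patient 1 (PIMA) when the mean probability meets the threshold."""
    return {pid: int(p >= threshold) for pid, p in patient_probs.items()}


# Hypothetical slice outputs for two patients.
demo_slices = [("P1", 0.9), ("P1", 0.7), ("P2", 0.2), ("P2", 0.4), ("P2", 0.3)]
demo_probs = patient_level_probs(demo_slices)
demo_labels = classify_patients(demo_probs)
```

Averaging damps the influence of a single atypical slice, which is one plausible reason patient-level results can be more stable than slice-level results.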

Accurate, noninvasive differentiation between PIMA and IPN has important implications for patient care. Misclassification of malignant nodules as benign may delay treatment until disease progression, whereas false-positive cancer diagnoses may subject patients to unnecessary biopsy or surgery, along with psychological distress. In our study, the AI model reduced both error types. All PIMA cases were correctly identified, suggesting potential to prevent missed cancers. Meanwhile, high specificity (only 5.88% of benign nodules misclassified) indicates that most patients with inflammatory nodules could avoid invasive follow-up. In multidisciplinary lung nodule clinics, such a tool may assist in determining which patients require immediate biopsy versus close observation.

Importantly, we envision this model as a decision-support system rather than a standalone diagnostic tool. Following radiologist interpretation of CT images, the AI output could serve as a second opinion. A high malignancy probability combined with a heatmap highlighting suspicious features may reinforce clinical suspicion, whereas a low-risk score emphasizing benign patterns could provide reassurance. In tumor board discussions, objective probability estimates and visual rationales may influence management decisions, such as prioritizing biopsy for high-risk nodules or recommending conservative treatment for low-risk cases. This approach aligns with current clinical use of computer-aided detection tools, which are designed to augment—not replace—physician expertise.

A key strength of our model lies in its interpretability. By integrating dual attention modules, we gained insight into the model’s reasoning process. Grad-CAM visualizations frequently highlighted features commonly considered by radiologists, including spiculated margins, heterogeneous internal texture, satellite nodules, and bronchioles extending into lesions—findings suggestive of aerogenous spread in PIMA. This alignment with established radiologic knowledge is critical for fostering trust in AI systems. Although Grad-CAM provides relatively coarse explanations, it offers rapid and intuitive visual feedback during image review. Future research could explore more advanced explainability approaches, such as example-based reasoning or quantitative feature attribution methods, to further enhance transparency.

Previous studies have emphasized the difficulty of differentiating PIMA from benign inflammatory conditions due to overlapping imaging characteristics. Conventional radiologic assessment relies largely on qualitative pattern recognition, which often lacks specificity in nodular presentations. Radiomics approaches have attempted to extract quantitative texture features, but these depend on handcrafted feature engineering and precise segmentation and may be sensitive to scanner variability. In contrast, our DL framework automatically learns hierarchical features directly from images. The addition of dual attention mechanisms enables selective emphasis on the most discriminative patterns, effectively modeling complex multi-scale features in a data-driven and potentially more robust manner.

Our study also adds to the growing evidence that task-adapted attention mechanisms may improve medical imaging AI. Compared with general-purpose attention modules such as CBAM, which use sequential channel-spatial attention and spatial pooling followed by relatively large convolutional kernels, the SE-DAS architecture was designed to better preserve subtle CT features relevant to PIMA-IPN differentiation. By adopting a parallel topology and a point-wise 1×1 spatial convolution, the DAS module reduces the risk of smoothing localized lesion features and helps retain boundary and texture information that may be important for recognizing mucinous adenocarcinoma. The complementary combination of channel-wise recalibration through the SE module and localized spatial attention through the DAS module improved the model’s ability to represent the low-contrast and heterogeneous imaging characteristics of PIMA. Consistent with prior work in lung adenocarcinoma subtype classification (25) and recent advances in hybrid DL architectures for complex pulmonary diseases (26), integrating multi-scale and multi-feature representations may enhance diagnostic performance in challenging thoracic imaging tasks.
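To make the contrast with pooled, large-kernel spatial attention concrete, the sketch below implements a channel gate in the SE style and a point-wise (1×1) spatial gate computed in parallel from the same input. It is a dependency-free illustration only: the function names, the toy weights, and in particular the fusion rule (multiplying both gates into the input) are assumptions, since the exact SE-DAS fusion is not described in this section.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(fmap, w_reduce, w_expand):
    """SE-style gate: global average pool -> FC -> ReLU -> FC -> sigmoid."""
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, pooled))) for row in w_reduce]
    return [_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]


def pointwise_spatial_gate(fmap, w_point):
    """1x1 convolution across channels, then sigmoid: one gate per pixel.

    No neighbouring pixels are mixed, so local detail is not smoothed.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[_sigmoid(sum(w * ch[i][j] for w, ch in zip(w_point, fmap)))
             for j in range(width)] for i in range(height)]


def parallel_dual_attention(fmap, w_reduce, w_expand, w_point):
    """Compute both gates from the same input (parallel branches)."""
    ch_gate = channel_gate(fmap, w_reduce, w_expand)
    sp_gate = pointwise_spatial_gate(fmap, w_point)
    # Assumed fusion: scale each activation by its channel and pixel gates.
    return [[[x * ch_gate[c] * sp_gate[i][j] for j, x in enumerate(row)]
             for i, row in enumerate(ch)] for c, ch in enumerate(fmap)]


# Toy feature map with 2 channels of size 2x2 and hypothetical weights.
demo_fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
demo_out = parallel_dual_attention(
    demo_fmap, w_reduce=[[0.5, 0.5]], w_expand=[[1.0], [-1.0]], w_point=[1.0, -1.0]
)
```

Because the spatial gate at a pixel depends only on the channel values at that same pixel, changing one location leaves every other gate value untouched, which is the property the text refers to as avoiding smoothing of localized lesion features.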

Despite these promising findings, several limitations should be acknowledged. First, this was a retrospective single-center study. We attempted to improve potential robustness by applying lung segmentation to reduce non-pulmonary background information, using data augmentation, and including cases acquired over a three-year period. Nevertheless, single-center DL models remain vulnerable to site-specific bias related to scanner models, reconstruction protocols, local imaging workflows, and patient-selection patterns. Therefore, the current findings should be interpreted as evidence of strong internal validation performance rather than definitive generalizability. Independent external validation in multi-center cohorts is required to confirm the robustness of the model and to assess its performance across heterogeneous clinical settings.

Second, most IPN cases were defined based on clinical follow-up rather than histopathology. Although stringent criteria were applied (treatment response and at least six months of stability or resolution), a small risk of occult malignancy cannot be completely excluded. This is particularly relevant because some lepidic-predominant lung adenocarcinomas may grow very slowly, with reported median volume-doubling times exceeding 1,100 days (27). Therefore, a 6-month stability window may still miss a small proportion of indolent malignant lesions (28). This pragmatic reference standard may have introduced some degree of misclassification and could potentially affect the apparent sensitivity and specificity of the model. In addition, nodule annotation was performed by a single senior radiologist. The lack of inter-observer reliability assessment, such as Cohen’s kappa or intraclass correlation coefficient, represents a reproducibility limitation. Future studies should incorporate multi-reader annotation with consensus review to reduce subjective annotation bias.
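A short worked calculation illustrates why a 6-month window can miss indolent lesions. Under simple exponential growth, volume scales by 2^(t/VDT) over an interval t; assuming a spherical lesion, diameter scales by the cube root of that factor. The sketch below uses a volume-doubling time of 1,100 days and approximates six months as 180 days; the function name, the spherical assumption, and the 10 mm starting diameter are illustrative and not taken from the study.

```python
def growth_over_interval(doubling_time_days, interval_days):
    """Volume and diameter growth factors under exponential growth.

    Assumes a spherical lesion, so diameter scales with the cube root of volume.
    """
    volume_factor = 2 ** (interval_days / doubling_time_days)
    diameter_factor = volume_factor ** (1 / 3)
    return volume_factor, diameter_factor


vol_factor, diam_factor = growth_over_interval(1100, 180)
# Hypothetical 10 mm nodule after roughly six months.
diameter_after_mm = 10.0 * diam_factor
```

Under these assumptions the volume grows by about 12% and the diameter by less than 0.5 mm, a change that would likely be difficult to distinguish from measurement variability on routine CT.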

Third, model performance was evaluated using internal validation based on a single random patient-level split rather than patient-wise cross-validation or external validation. Although we used 10,000 patient-level bootstrap resampling iterations and performed multiple exploratory subgroup analyses to estimate uncertainty and assess robustness, a single split may still produce optimistic performance estimates. The relatively limited number of PIMA cases may also affect the stability of model evaluation despite the use of class-balanced training strategies. In particular, the 100.0% validation sensitivity should be interpreted cautiously because it was based on only 38 PIMA cases in the validation cohort. The corresponding 95% CI was 90.7–100.0%, indicating substantial statistical uncertainty around this point estimate. We anticipate that model performance may decrease when tested in larger, noisier, and more heterogeneous external datasets. In addition, because this was a retrospective proof-of-concept study, we did not conduct a multi-reader comparative study. Therefore, the incremental clinical value of SE-DAS ResNet over experienced radiologists remains to be established in future prospective reader studies.
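Two of the uncertainty calculations mentioned above can be sketched in a few lines. The first is the exact (Clopper-Pearson) lower limit when every case is detected, which for 38 of 38 gives about 90.7%; that this is the method behind the reported interval is an inference from the numbers. The second is a percentile bootstrap for AUC with patients resampled with replacement. The function names, the seed, the percentile method, and the rule of skipping resamples that contain a single class are all assumptions for illustration.

```python
import random


def exact_lower_bound_all_successes(n, alpha=0.05):
    """Clopper-Pearson lower limit when all n trials succeed: (alpha/2)**(1/n)."""
    return (alpha / 2) ** (1 / n)


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=10_000, seed=0):
    """Percentile 95% CI from resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


lower_38 = exact_lower_bound_all_successes(38)
```

The exact bound shows that even a flawless result on 38 positive cases remains statistically compatible with a true sensitivity of roughly 91%.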

Finally, interpretability relied on Grad-CAM visualizations, which provide qualitative rather than mechanistic explanations. Future multi-center prospective studies with larger cohorts and independent validation are warranted to confirm clinical utility and calibration stability.

To facilitate broader testing and external validation, we have made the model available through an online research platform. Wider evaluation across diverse patient populations and imaging systems will be essential for real-world translation. Integration into clinical workflows could involve automated segmentation and scoring following routine CT interpretation, with AI results presented to radiologists as structured decision-support outputs.


Conclusions

In conclusion, we developed and internally validated a dual-attention DL diagnostic prediction model for noninvasive differentiation of PIMA from benign inflammatory nodules on CT. The model demonstrated high discriminative performance (AUC 0.990) with 100% sensitivity in the internal validation cohort and provided interpretable visual explanations aligned with known radiologic features.

While these findings suggest potential utility as a clinical decision-support tool, external validation in independent multi-center cohorts is essential before clinical implementation. Future research should focus on prospective evaluation, calibration assessment in broader populations, and integration into routine clinical workflows to determine its real-world impact on patient management.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-1-0205/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-1-0205/dss

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-1-0205/prf

Funding: This work was supported by the National Natural Science Foundation of China (No. 82371128 to F.Y., and No. 82202925 to H.Y.), the Clinical Research Special Fund of Shanghai Chest Hospital (No. 2024IIT-Q007 to H.Y.), the Fundamental Research Funds for the Central Universities (No. YG2025ZD14 to F.Y.), and the National Program for High-Level Medical Talents (No. RC-202509-055 to F.Y.). The funding sources had no involvement in the study design; collection, analysis, or interpretation of data; or the decision to submit the manuscript for publication.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-1-0205/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This retrospective study was approved by the Institutional Review Board [#KS(Y)21316] of Shanghai Chest Hospital and was conducted in accordance with all relevant ethical guidelines, including the Declaration of Helsinki and its subsequent amendments. All patients had signed informed consent for inclusion of their clinical data and specimens in research projects.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Succony L, Rassl DM, Barker AP, et al. Adenocarcinoma spectrum lesions of the lung: Detection, pathology and treatment strategies. Cancer Treat Rev 2021;99:102237.
  2. Lee HY, Cha MJ, Lee KS, et al. Prognosis in Resected Invasive Mucinous Adenocarcinomas of the Lung: Related Factors and Comparison with Resected Nonmucinous Adenocarcinomas. J Thorac Oncol 2016;11:1064-73.
  3. Sasaki T, Kuno H, Hiyama T, et al. 2021 WHO Classification of Lung Cancer: Molecular Biology Research and Radiologic-Pathologic Correlation. Radiographics 2024;44:e230136.
  4. Eddy RL, Sin DD. Make it count with photon-counting computed tomography: a revolution in technology for investigating the airways. Eur Respir J 2025;65:2500297.
  5. Pan X, Fang R, Zhang B, et al. Pathological and imaging features of pulmonary invasive mucinous adenocarcinoma-a retrospective cohort study. Transl Lung Cancer Res 2024;13:1376-82.
  6. Zhang YN, Cao K, Yang Y, et al. Case Report: Do not diagnose lung cancer as pneumonia: continue to monitor a case of invasive mucinous adenocarcinoma as it progresses from small to large. Front Med (Lausanne) 2025;12:1578874.
  7. Qi L, Jia J, Zhang G, et al. Radiological Features of Primary Pulmonary Invasive Mucinous Adenocarcinoma Based on 312 Consecutive Patients. Clin Respir J 2024;18:e13820.
  8. Nie K, Nie W, Zhang YX, et al. Comparing clinicopathological features and prognosis of primary pulmonary invasive mucinous adenocarcinoma based on computed tomography findings. Cancer Imaging 2019;19:47.
  9. Li Q, Fan X, Huo JW, et al. Differential diagnosis of localized pneumonic-type lung adenocarcinoma and pulmonary inflammatory lesion. Insights Imaging 2022;13:49.
  10. Ge L, Wang L, Pei D. Pulmonary mucinous adenocarcinoma: An overview of pathophysiology and advancements in treatment. Heliyon 2024;10:e28881.
  11. Lotter W, Hassett MJ, Schultz N, et al. Artificial Intelligence in Oncology: Current Landscape, Challenges, and Future Directions. Cancer Discov 2024;14:711-26.
  12. Fukushima K. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 1980;36:193-202.
  13. Jiao L, Wang M, Liu X, et al. Multiscale Deep Learning for Detection and Recognition: A Comprehensive Survey. IEEE Trans Neural Netw Learn Syst 2025;36:5900-20.
  14. Venkadesh KV, Aleef TA, Scholten ET, et al. Prior CT Improves Deep Learning for Malignancy Risk Estimation of Screening-detected Pulmonary Nodules. Radiology 2023;308:e223308.
  15. Wehbe A, Dellepiane S, Minetti I. Enhanced Lung Cancer Detection and TNM Staging Using YOLOv8 and TNMClassifier: An Integrated Deep Learning Approach for CT Imaging. IEEE Access 2024;12:141414-24.
  16. Wang YR, Bian YX, Jiang SL. PSE: Enhancing structural contextual awareness of networks in medical imaging with Permute Squeeze-and-Excitation module. Biomedical Signal Processing and Control 2025;100:14.
  17. He KM, Zhang XY, Ren SQ, et al. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27-30; Seattle, WA. New York: IEEE; 2016.
  18. Isensee F, Jaeger PF, Kohl SAA, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203-11.
  19. Garcea F, Serra A, Lamberti F, et al. Data augmentation for medical imaging: A systematic literature review. Comput Biol Med 2023;152:106391.
  20. Geroski T, Rankovic V, Pavic O, et al. Enhancing COVID-19 disease severity classification through advanced transfer learning techniques and optimal weight initialization schemes. Biomedical Signal Processing and Control 2025;100:12.
  21. Woo SH, Park J, Lee JY, et al. CBAM: Convolutional Block Attention Module. 15th European Conference on Computer Vision (ECCV); 2018 Sep 08-14; Munich, Germany. Cham: Springer International Publishing Ag; 2018.
  22. Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int J Comput Vis 2020;128:336-59.
  23. Berta E, Holzmüller D, Jordan MI, Bach F. Rethinking early stopping: Refine, then calibrate. arXiv preprint arXiv:2501.19195. 2025.
  24. Guo C, Pleiss G, Sun Y, et al. On calibration of modern neural networks. Proceedings of the 34th International Conference on Machine Learning (ICML 2017). PMLR; 2017;70:1321-30.
  25. Zhang X, Zhang G, Qiu X, et al. Exploring non-invasive precision treatment in non-small cell lung cancer patients through deep learning radiomics across imaging features and molecular phenotypes. Biomark Res 2024;12:12.
  26. Abdelhamid A, El-Ghamry A, Abdelhay EH, et al. Improved pulmonary embolism detection in CT pulmonary angiogram scans with hybrid vision transformers and deep learning techniques. Sci Rep 2025;15:31443.
  27. Hong JH, Park S, Kim H, et al. Volume and Mass Doubling Time of Lung Adenocarcinoma according to WHO Histologic Classification. Korean J Radiol 2021;22:464-75.
  28. MacMahon H, Naidich DP, Goo JM, et al. Guidelines for Management of Incidental Pulmonary Nodules Detected on CT Images: From the Fleischner Society 2017. Radiology 2017;284:228-43.
Cite this article as: Xie J, Wang S, Cheng J, Wei Q, Yang H, Yao F. Development and validation of a dual-attention deep learning prediction model for noninvasive differentiation of pulmonary invasive mucinous adenocarcinoma from inflammatory pulmonary nodules. Transl Lung Cancer Res 2026;15(6):169. doi: 10.21037/tlcr-2026-1-0205
