Attention mechanism-based habitat analysis for predicting pleural invasion and prognosis of pulmonary nodules
Highlight box
Key findings
• The study developed predictive models for visceral pleural invasion (VPI) and overall survival (OS) in cT1 stage lung adenocarcinoma using enhanced computed tomography (CT) imaging. A key innovation of this research is the utilization of attention-based transformer fusion to integrate intra-tumoral and peritumoral radiomic features, significantly enhancing the predictive accuracy. Both the Rad-adjacent model and the combined model demonstrated robust performance, achieving area under the curve (AUC) values of 0.822 and 0.819 for VPI prediction, respectively. These models provide a powerful, non-invasive tool to support surgical decision-making.
What is known and what is new?
• VPI is a critical prognostic factor in lung cancer, and current predictions rely primarily on postoperative pathological assessments. There are no effective preoperative or intraoperative methods currently available for accurately predicting VPI status.
• This study introduces an innovative approach by combining Habitat analysis with transformer fusion to predict VPI and OS. By utilizing generative adversarial network-enhanced CT images and fusing radiomic features from both intra-tumoral and peritumoral areas, the study achieves a breakthrough in non-invasive predictive accuracy.
What is the implication, and what should change now?
• This research innovatively combines Habitat analysis with transformer fusion for VPI prediction, achieving excellent results. By integrating these predictive models into clinical practice, surgeons can improve preoperative decision-making, potentially enhancing patient outcomes by identifying high-risk individuals. Future research should validate these findings across larger and more diverse cohorts while addressing computational challenges for seamless clinical integration.
Introduction
With the publication of a series of high-level research findings, sublobar resection has played an increasingly important role in the surgical treatment of early-stage non-small cell lung cancer (1-5). Sublobar resection is less invasive and preserves more pulmonary function. Although long-term follow-up studies have shown overall survival (OS) rates comparable to lobectomy, patients who undergo sublobar resection face a higher rate of local recurrence (11% vs. 5%) (1,4-6). Additionally, approximately 0.4% to 14.6% of patients who undergo sublobar resection experience postoperative T-stage upstaging (1,4-6). In the 8th edition of the tumor-node-metastasis (TNM) Classification of Lung Cancer by the International Association for the Study of Lung Cancer (IASLC), visceral pleural invasion (VPI) is a key factor affecting the T-stage of pulmonary nodules under 3 cm and is recognized as a risk factor for poor prognosis (7). In clinical practice, the efficacy of sublobar resection for patients with positive VPI remains unclear. VPI status is typically determined postoperatively by pathological elastic fiber staining; no reliable preoperative or intraoperative method exists (8). This gap hampers the consideration of tumor heterogeneity, including VPI, when selecting candidates for sublobar resection.
Recently, there have been continuous advancements in computed tomography (CT) studies of solid lung nodules measuring ≤3 cm (9,10). However, CT images contain a wealth of valuable information that is difficult for humans to discern manually, including variations in shape, intensity, gradient, and texture (11,12). These features constitute the heterogeneity within tumors. Numerous studies have used radiomics methods to investigate this heterogeneity (13,14). However, these methods analyze the tumor region at a macroscopic level, losing voxel-level information. In habitat analysis, the region of interest (ROI) is analyzed at the voxel level, and unsupervised clustering methods then segment the voxels into different subregions, helping to identify areas of heterogeneity within the tumor. Various machine learning models are then used to quantitatively analyze the characteristics of these subregions. This approach has been applied in fields such as resistance to immunotherapy, Ki-67 status in ovarian cancer, breast cancer, and chemotherapy for esophageal cancer (15-18). The heterogeneity of the tumor microenvironment results in varied biological behaviors and prognoses (19). In this study, this heterogeneity is analyzed from an imaging perspective, and its utility in predicting VPI status and the prognosis of lung adenocarcinoma is explored. This approach is expected to help surgeons make more precise decisions when formulating surgical plans. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/rc).
Methods
Study design and population
This retrospective study included consecutive patients with pathologically invasive pulmonary adenocarcinoma who underwent lobectomy, with preoperative CT showing solid lung nodules measuring <3 cm, from January 2017 to December 2022 in the Fifth Affiliated Hospital of Sun Yat-sen University (Center 1) and from January 2017 to December 2020 in the Sun Yat-sen Memorial Hospital (Center 2) (Figure 1). After screening, 487 patients from Center 1 and 181 patients from Center 2 were eventually included in the study (Figure 1). All these patients were of East Asian ethnicity. In the public radiogenomics dataset (named NSCLC Radiogenomics from The Cancer Imaging Archive), a total of 211 multi-ethnic patients from North America were identified, with 74 patients ultimately included in the study (20) (Figure 1). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of the Fifth Affiliated Hospital of Sun Yat-sen University [No. (2024) K83-1] and the Sun Yat-sen Memorial Hospital (No. SYSKY-2025-217-01), and individual consent for this retrospective analysis was waived. The pleural invasion status of all pulmonary nodules was verified through elastic fiber staining and documented in the pathology reports. The follow-up strategies are detailed in Appendix 1.

Image segmentation and super-resolution image reconstruction
The CT protocol and evaluation are detailed in Appendix 2. All CT scans were resampled to uniform voxel dimensions of 1 mm × 1 mm × 3 mm, with imaging parameters standardized to a window width of 1,500 and a window level of −600. The tumor ROI was delineated, and each ROI was then expanded by 5 mm (Appendix 3) to represent the peritumoral region (Figure 2). Segmentation was performed using ITK-SNAP, a freely available software package (http://www.itksnap.org/pmwiki/pmwiki.php). The three-dimensional (3D) super-resolution image reconstruction (SRIR) in this study used a generative adversarial network (GAN) based on the OneKey AI platform (21) (Figure 2). A GAN consists of a generator, which creates high-resolution images from low-resolution ones, and a discriminator, which distinguishes real from generated images. Through adversarial training, the generator learns to improve image resolution, enhancing the spatial resolution from a voxel size of 1 mm × 1 mm × 3 mm to 0.25 mm × 0.25 mm × 3 mm (21).

Habitat sub-region clustering and image cropping
To characterize intra-tumoral heterogeneity, this study subdivided the intra-tumoral region. Based on the SRIR-processed CT images, radiomics features were extracted from a 5×5×5 voxel matrix centered on each voxel (Figure 2). Nineteen features were included in the study (Appendix 3). The K-means unsupervised clustering method was used to aggregate the voxels of the normalized images into relevant sub-regions. To determine the optimal number of clusters, this study tested 2 to 9 clusters and evaluated clustering performance using the Calinski-Harabasz (CH) index. Subsequently, the sub-region images were cropped at the ROI's largest cross-section to prepare for the extraction of 2D deep learning features.
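The cluster-count selection described above (testing 2 to 9 clusters and scoring each with the CH index) can be sketched as follows. This is a toy illustration, not the study's pipeline: the synthetic feature matrix stands in for the per-voxel features (the study used 19 features per voxel), and the helper function name is our own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Synthetic stand-in for a per-voxel feature matrix: three well-separated
# blobs play the role of three habitat sub-regions.
rng = np.random.default_rng(0)
voxel_features = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(200, 19))
    for center in (0.0, 5.0, 10.0)
])

def best_cluster_count(X, k_range=range(2, 10)):
    """Fit K-means for each candidate k and return the k that maximizes
    the Calinski-Harabasz index (higher = better-separated clusters)."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = calinski_harabasz_score(X, labels)
    return max(scores, key=scores.get), scores

best_k, ch_scores = best_cluster_count(voxel_features)
```

On data this cleanly separated, the CH index peaks at the true number of blobs; on real voxel features the curve is flatter and the peak is read off the training cohort, as the study did.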
Feature extraction and model ensembling
This study employed a transformer-based fusion technique to integrate radiomic features from various ROIs, as shown in Figure 2. It began with the extraction of 1,834 radiomic features and 1,024 deep learning [vision transformer (ViT)] features from each ROI, with model training details in Appendix 4. To effectively integrate these diverse feature sets, we employed the transformer architecture, leveraging its capability to capture long-range dependencies and contextual relationships via attention mechanisms (22). The self-attention mechanism allows the model to concurrently focus on different parts of input feature vectors, directing attention from each ROI feature vector to all others within the same ROI using scaled dot-product attention to calculate attention weights and assess feature significance (22-24). Cross-attention enables interaction between feature vectors from different ROIs by using two sets of queries, keys, and values from distinct ROI feature sets, facilitating information exchange and feature reinforcement (24). After attention mechanisms, the output undergoes layer normalization and is processed through position-wise feed-forward networks to stabilize learning and ensure consistent gradient flow (22). Residual connections and dropout are included to improve performance and prevent overfitting. The enriched encoded features from both self-attention and cross-attention are concatenated into a unified representation, which is then passed through fully connected layers for classification (23,24).
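The core of the fusion step above is scaled dot-product attention applied within one ROI (self-attention) and across two ROIs (cross-attention), with the enriched outputs concatenated. The following minimal NumPy sketch shows only that mechanism; it omits the learned query/key/value projections, multi-head splitting, layer normalization, feed-forward layers, and dropout of a full transformer block, and the token matrices are random placeholders for the ROI feature vectors.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 8                               # toy feature dimension
intra = rng.normal(size=(4, d))     # placeholder intra-tumoral tokens
peri = rng.normal(size=(4, d))      # placeholder peritumoral tokens

# Self-attention: queries, keys, and values all come from the same ROI.
self_out, self_w = scaled_dot_product_attention(intra, intra, intra)

# Cross-attention: queries from one ROI, keys/values from the other ROI,
# letting the two feature sets exchange and reinforce information.
cross_out, cross_w = scaled_dot_product_attention(intra, peri, peri)

# Concatenate the enriched encodings into a unified representation,
# which would then feed fully connected classification layers.
fused = np.concatenate([self_out, cross_out], axis=-1)
```

Each row of the attention weight matrices sums to 1, so every output token is a convex combination of the value vectors it attends to.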
Statistical analysis and model evaluation
Analyses were conducted using SPSS v26.0 and Python v3.7. Normally distributed variables were assessed with the z-test, while non-normal variables were evaluated with the Wilcoxon test. Categorical data were analyzed with the Chi-square or Fisher's exact test. Multivariable logistic regression was implemented with the forward stepwise method. The area under the curve (AUC) was used to evaluate the models' predictions, with the receiver operating characteristic (ROC) curve illustrating the trade-off between true positive and false positive rates. Optimal cut-off values (Youden's index) for patient categorization were determined through ROC curve analysis (Appendix 5). Kaplan-Meier survival curves were compared using the log-rank test, with P<0.05 indicating statistical significance. Models were compared using the DeLong test, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Model performance was assessed with decision curve analysis (DCA) and calibration curves.
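The Youden's index cut-off mentioned above is the ROC threshold maximizing J = sensitivity + specificity − 1. A small self-contained sketch of that selection, on toy data (the function and variable names are illustrative, not from the study's code):

```python
import numpy as np

def youden_cutoff(y_true, y_score):
    """Return the score threshold maximizing Youden's J statistic,
    J = sensitivity + specificity - 1, over all observed thresholds."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    best_t, best_j = None, -np.inf
    for t in np.unique(y_score):
        pred = y_score >= t
        sens = np.mean(pred[y_true == 1])    # true positive rate
        spec = np.mean(~pred[y_true == 0])   # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy, perfectly separable scores: all cases score above all controls,
# so the best cutoff sits at the lowest case score with J = 1.
labels = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.30, 0.70, 0.80, 0.90])
cutoff, j = youden_cutoff(labels, scores)  # cutoff 0.70, J = 1.0
```

In practice the cut-off is fixed on the training cohort and applied unchanged to the validation and external test cohorts.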
Results
Patient characteristics
In accordance with the inclusion and exclusion criteria (Figure 1), a total of 742 patients were enrolled in the study. Center 1 contributed 487 patients, who were randomly divided into training and validation cohorts in a 7:3 ratio. Center 2 contributed 181 patients, and an additional 74 cases from the external database were incorporated, together forming the external test cohort. The cohort characteristics are provided in Table 1. A statistically significant difference in sex distribution was observed across the cohorts: while the external test cohort had a balanced sex ratio, females were slightly more prevalent in the training and validation cohorts. However, univariate and multivariate analyses indicated that sex had no significant impact on VPI status prediction (Table S1). Kaplan-Meier analysis of the OS curves for the three cohorts is shown in Figure S1. In these cohorts, 33, 9, and 42 patients, respectively, experienced outcome events. The median follow-up time was 49 months [95% confidence interval (CI): 46.678–51.313] for the training cohort, 49 months (95% CI: 45.453–52.547) for the validation cohort, and 57 months (95% CI: 55.265–58.735) for the test cohort. The mean survival time was 80.84 months (95% CI: 77.270–84.418) for the training cohort, 77.75 months (95% CI: 73.853–81.645) for the validation cohort, and 88.18 months (95% CI: 84.357–92.036) for the external test cohort. Clinical characteristics performed poorly in predicting VPI status and OS (Appendix 6, Figure S2); therefore, clinical models were excluded from the subsequent modeling process.
Table 1
| Characteristic | All cohorts | Training cohort | Validation cohort | External test cohort | P value |
|---|---|---|---|---|---|
| Age (years) | 61.0±10.20 | 60.7±9.98 | 60.7±9.09 | 61.5±11.08 | 0.65 |
| Sex | | | | | 0.048 |
| &nbsp;&nbsp;Male | 338 (45.6) | 145 (19.5) | 61 (8.2) | 132 (17.8) | |
| &nbsp;&nbsp;Female | 404 (54.4) | 196 (26.4) | 85 (11.5) | 123 (16.6) | |
| Smoke | | | | | 0.06 |
| &nbsp;&nbsp;Negative | 535 (72.1) | 257 (34.6) | 108 (14.6) | 170 (22.9) | |
| &nbsp;&nbsp;Positive | 207 (27.9) | 84 (11.3) | 38 (5.1) | 85 (11.5) | |
| Location | | | | | 0.16 |
| &nbsp;&nbsp;RUL | 224 (30.2) | 103 (13.9) | 42 (5.7) | 79 (10.6) | |
| &nbsp;&nbsp;RML | 70 (9.4) | 32 (4.3) | 10 (1.3) | 28 (3.8) | |
| &nbsp;&nbsp;RLL | 134 (18.1) | 58 (7.8) | 32 (4.3) | 44 (5.9) | |
| &nbsp;&nbsp;LUL | 185 (24.9) | 78 (10.5) | 36 (4.9) | 71 (9.6) | |
| &nbsp;&nbsp;LLL | 129 (17.4) | 70 (9.4) | 26 (3.5) | 33 (4.4) | |
| VPI status | | | | | 0.09 |
| &nbsp;&nbsp;Negative | 456 (61.5) | 220 (29.6) | 93 (12.5) | 143 (19.3) | |
| &nbsp;&nbsp;Positive | 286 (38.5) | 121 (16.3) | 53 (7.1) | 112 (15.1) | |
| Lymphovascular and perineural invasion | | | | | 0.13 |
| &nbsp;&nbsp;Negative | 512 (69.0) | 223 (30.0) | 103 (13.9) | 186 (25.1) | |
| &nbsp;&nbsp;Positive | 230 (31.0) | 118 (15.9) | 43 (5.8) | 69 (9.3) | |
| STAS | | | | | 0.13 |
| &nbsp;&nbsp;Negative | 529 (71.3) | 238 (32.1) | 98 (13.2) | 193 (26.0) | |
| &nbsp;&nbsp;Positive | 213 (28.7) | 103 (13.9) | 48 (6.5) | 62 (8.4) | |
| Differentiation | | | | | 0.55 |
| &nbsp;&nbsp;Well | 107 (14.4) | 45 (6.1) | 26 (3.5) | 36 (4.9) | |
| &nbsp;&nbsp;Moderately | 405 (54.6) | 188 (25.3) | 72 (9.7) | 145 (19.5) | |
| &nbsp;&nbsp;Poorly | 230 (31.0) | 108 (14.6) | 48 (6.5) | 74 (10.0) | |
| Clear | | | | | 0.09 |
| &nbsp;&nbsp;Negative | 446 (60.1) | 205 (27.6) | 98 (13.2) | 143 (19.3) | |
| &nbsp;&nbsp;Positive | 296 (39.9) | 136 (18.3) | 48 (6.5) | 112 (15.1) | |
| Lobulated sign | | | | | 0.17 |
| &nbsp;&nbsp;Negative | 134 (18.1) | 53 (7.1) | 26 (3.5) | 55 (7.4) | |
| &nbsp;&nbsp;Positive | 608 (81.9) | 288 (38.8) | 120 (16.2) | 200 (27.0) | |
| Spiculated sign | | | | | 0.11 |
| &nbsp;&nbsp;Negative | 269 (36.3) | 124 (16.7) | 43 (5.8) | 102 (13.7) | |
| &nbsp;&nbsp;Positive | 473 (63.7) | 217 (29.2) | 103 (13.9) | 153 (20.6) | |
| Pleural indentation | | | | | 0.08 |
| &nbsp;&nbsp;Negative | 338 (45.6) | 168 (22.6) | 68 (9.2) | 102 (13.7) | |
| &nbsp;&nbsp;Positive | 404 (54.4) | 173 (23.3) | 78 (10.5) | 153 (20.6) | |
| Air bronchogram | | | | | 0.11 |
| &nbsp;&nbsp;Negative | 591 (79.6) | 265 (35.7) | 112 (15.1) | 214 (28.8) | |
| &nbsp;&nbsp;Positive | 151 (20.4) | 76 (10.2) | 34 (4.6) | 41 (5.5) | |
| Vessel convergence | | | | | 0.17 |
| &nbsp;&nbsp;Negative | 491 (66.2) | 230 (31.0) | 87 (11.7) | 174 (23.5) | |
| &nbsp;&nbsp;Positive | 251 (33.8) | 111 (15.0) | 59 (8.0) | 81 (10.9) | |
| Vacuole sign | | | | | 0.18 |
| &nbsp;&nbsp;Negative | 620 (83.6) | 279 (37.6) | 119 (16.0) | 222 (29.9) | |
| &nbsp;&nbsp;Positive | 122 (16.4) | 62 (8.4) | 27 (3.6) | 33 (4.4) | |
| cT stage | | | | | 0.08 |
| &nbsp;&nbsp;T1a | 214 (28.8) | 90 (12.1) | 42 (5.7) | 82 (11.1) | |
| &nbsp;&nbsp;T1b | 380 (51.2) | 193 (26.0) | 72 (9.7) | 115 (15.5) | |
| &nbsp;&nbsp;T1c | 148 (19.9) | 58 (7.8) | 32 (4.3) | 58 (7.8) | |
| LN metastasis | | | | | 0.054 |
| &nbsp;&nbsp;N0 | 667 (89.9) | 311 (41.9) | 132 (17.8) | 224 (30.2) | |
| &nbsp;&nbsp;N1 | 49 (6.6) | 24 (3.2) | 10 (1.3) | 15 (2.0) | |
| &nbsp;&nbsp;N2 | 26 (3.5) | 6 (0.8) | 4 (0.5) | 16 (2.2) | |
Values are mean ± SD or n (%) unless otherwise defined. LLL, left lower lobe; LN, lymph node; LUL, left upper lobe; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe; SD, standard deviation; STAS, spread through air spaces; VPI, visceral pleura invasion.
Sub-region cluster and feature extraction
The resolution of the original CT images was enhanced using GAN-based SRIR. From these enhanced images, intra-tumoral habitat features were extracted. K-means clustering was then performed, and clustering effectiveness was evaluated using the CH index. The optimal number of clusters in the training cohort was three (Figure 2), so the intra-tumoral region was divided into three subregions. From each subregion of each patient, 1,834 radiomics features and 1,024 3D ViT features were extracted. Axial images were cropped from the largest axial section of the intra-tumoral ROI, and 1,024 2D ViT features were extracted from the three subregions within these images (Figure 2). Missing subregion features for certain samples (specifically Rad/ViT-Intra-H1, H2, and H3) were imputed using the K-nearest neighbor method. The peritumoral region was defined by extending 5 mm outward from the boundary of the intra-tumoral region, and radiomics and 2D/3D ViT features were extracted using the same methods. In this study, the combined intra-tumoral and peritumoral region is referred to as the adjacent region.
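K-nearest neighbor imputation, as used above for missing subregion features, fills each gap with values from the most similar complete samples. A minimal sketch with scikit-learn's `KNNImputer` on a hypothetical feature table (the data and neighbor count are illustrative, not the study's settings):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical feature table: rows are patients, columns are subregion
# features (e.g., Rad-Intra-H1/H2/H3); np.nan marks a subregion that was
# absent for a given tumor after clustering.
X = np.array([
    [1.0, 2.0,    np.nan],
    [1.1, 2.1,    3.1],
    [0.9, 1.9,    2.9],
    [5.0, np.nan, 7.0],
    [5.2, 6.1,    7.2],
])

# Each missing value is replaced by the mean of that feature over the
# k nearest patients, measured by distance on the observed features.
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
```

The imputed matrix keeps the original shape with every NaN replaced, so downstream models receive complete feature vectors.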
VPI status prediction model
After features were extracted from the three intra-tumoral subregions, VPI status was predicted using a fully connected layer on top of the transformer fusion. In the external test cohort (Figure 3 and Table 2), the Rad-intra model achieved the highest AUC among the intra-tumoral models, at 0.775 (95% CI: 0.7171–0.8323) (Table 2). The ViT-3D-intra model performed the worst, with an AUC of only 0.687 (95% CI: 0.6209–0.7525). New models were then established by incorporating features from the peritumoral region, which improved predictive performance (Table 2). In the external test cohort, the Rad-adjacent model achieved an AUC of 0.822 (95% CI: 0.7726–0.8724). Compared with the Rad-intra model, the DeLong test did not indicate a statistically significant difference between the ROC curves of the two models (P=0.16). The NRI for the Rad-adjacent model reached 0.022; however, the IDI analysis did not reveal a statistically significant difference between the models (Figure 3). For the ViT-3D model, the AUC increased from 0.687 to 0.779 after incorporating peritumoral features (Table 2). The DeLong test yielded a P value of 0.023, indicating a statistically significant difference. The NRI reached 0.112, but the IDI analysis did not achieve statistical significance (Figure 3).

Table 2
| Model | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 | Cohort |
|---|---|---|---|---|---|---|---|---|
| Rad-intra | 0.718 | 0.905 (0.8681–0.9422) | 0.952 | 0.667 | 0.388 | 0.984 | 0.551 | Train |
| | 0.849 | 0.869 (0.7956–0.9423) | 0.679 | 0.890 | 0.594 | 0.921 | 0.633 | Validation |
| | 0.741 | 0.775 (0.7171–0.8323) | 0.639 | 0.831 | 0.768 | 0.724 | 0.697 | Test |
| Rad-adjacent | 0.883 | 0.947 (0.9218–0.9725) | 0.903 | 0.878 | 0.622 | 0.976 | 0.737 | Train |
| | 0.753 | 0.843 (0.7652–0.9201) | 0.786 | 0.746 | 0.423 | 0.936 | 0.550 | Validation |
| | 0.753 | 0.822 (0.7726–0.8724) | 0.639 | 0.853 | 0.792 | 0.730 | 0.707 | Test |
| ViT-2D-intra | 0.795 | 0.936 (0.9098–0.9615) | 0.968 | 0.756 | 0.469 | 0.991 | 0.632 | Train |
| | 0.644 | 0.784 (0.6914–0.8770) | 0.857 | 0.593 | 0.333 | 0.946 | 0.480 | Validation |
| | 0.675 | 0.707 (0.6440–0.7710) | 0.790 | 0.574 | 0.618 | 0.757 | 0.694 | Test |
| ViT-2D-adjacent | 0.815 | 0.892 (0.8540–0.9301) | 0.790 | 0.821 | 0.495 | 0.946 | 0.609 | Train |
| | 0.815 | 0.825 (0.7345–0.9156) | 0.643 | 0.856 | 0.514 | 0.910 | 0.571 | Validation |
| | 0.714 | 0.754 (0.6955–0.8125) | 0.655 | 0.765 | 0.709 | 0.717 | 0.681 | Test |
| ViT-3D-intra | 0.850 | 0.924 (0.8953–0.9535) | 0.855 | 0.849 | 0.558 | 0.963 | 0.675 | Train |
| | 0.760 | 0.817 (0.7245–0.9099) | 0.679 | 0.780 | 0.422 | 0.911 | 0.521 | Validation |
| | 0.667 | 0.687 (0.6209–0.7525) | 0.571 | 0.750 | 0.667 | 0.667 | 0.615 | Test |
| ViT-3D-adjacent | 0.845 | 0.865 (0.8095–0.9197) | 0.758 | 0.864 | 0.553 | 0.941 | 0.639 | Train |
| | 0.911 | 0.863 (0.7577–0.9675) | 0.714 | 0.958 | 0.800 | 0.934 | 0.755 | Validation |
| | 0.722 | 0.779 (0.7228–0.8357) | 0.647 | 0.787 | 0.726 | 0.718 | 0.684 | Test |
| Combine | 0.883 | 0.933 (0.9003–0.9648) | 0.790 | 0.903 | 0.645 | 0.951 | 0.710 | Train |
| | 0.808 | 0.867 (0.8051–0.9297) | 0.857 | 0.797 | 0.500 | 0.959 | 0.632 | Validation |
| | 0.737 | 0.819 (0.7693–0.8686) | 0.899 | 0.596 | 0.660 | 0.871 | 0.762 | Test |
2D, two dimensional; AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; ViT, vision transformer; VPI, visceral pleural invasion.
Radiomics features were reduced to 1,024 dimensions using principal component analysis (PCA). These reduced features were then integrated with the deep learning features to develop a combined model. The combined model achieved an AUC of 0.819 (95% CI: 0.7693–0.8686). NRI analysis indicated a positive improvement in predictive performance compared with all other models. Compared with the ViT-3D-intra and ViT-2D-intra models, the NRI values were 0.173 and 0.131, respectively. The DeLong test indicated that these differences were statistically significant, with P values of 0.007 and <0.001; however, the IDI analysis did not demonstrate statistical significance. Compared with the Rad-adjacent model, the combined model showed a slight positive improvement, with an NRI of 0.003; the IDI analysis did not indicate statistical significance, and the DeLong test did not show a statistically significant difference between the ROC curves of the two models (P=0.89) (Figure 3, Figure S3, and Table 2).
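The PCA reduction step can be sketched as follows. Note this is only a mechanics demo with toy dimensions: PCA can return at most min(n_samples, n_features) components, so with 300 hypothetical samples the target here is 64 components rather than the study's 1,024 (which presumes a correspondingly larger fitting matrix).

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy radiomics matrix: 300 hypothetical samples x 1,834 features,
# projected to a smaller common dimension before fusion with the
# deep learning feature vectors.
rng = np.random.default_rng(0)
radiomics = rng.normal(size=(300, 1834))

pca = PCA(n_components=64)
reduced = pca.fit_transform(radiomics)
```

Matching the radiomics dimensionality to the ViT feature dimensionality lets both feature sets enter the transformer fusion as comparably sized vectors.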
OS stratification and prediction model
In the external test cohort, both the Rad-adjacent model and the combined model showed the strongest stratification capability, effectively distinguishing patients into low-risk and high-risk groups. The differences in stratification between the two groups were statistically significant, with P values of 0.032 and 0.013 (Figure 4, Figure S4, and Figure S5).

Using survival status as the predictive indicator, the combined model achieved an AUC of 0.821 (95% CI: 0.7692–0.8724) for predicting 5-year OS in the test cohort (Table 3). NRI analysis indicated that the combined model exhibited a positive improvement. However, the IDI test revealed no statistical differences between the combined model and the other models. This indicates that, although classification for predicting 5-year OS improved somewhat, discriminatory ability was not significantly enhanced (Figure 5 and Figure S6). Interestingly, in the 5-year OS analysis, although the Rad-adjacent model had neither the highest AUC (0.775, 95% CI: 0.7180–0.8312) nor the highest NRI, its improvements over the 3D-intra models were statistically significant according to the IDI test (Figure 5).
Table 3
| 5-year OS model | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 | Cohort |
|---|---|---|---|---|---|---|---|---|
| Rad-intra | 0.844 | 0.855 (0.7962–0.9137) | 0.673 | 0.888 | 0.607 | 0.913 | 0.638 | Train |
| | 0.726 | 0.839 (0.7764–0.9013) | 0.964 | 0.669 | 0.409 | 0.987 | 0.574 | Validation |
| | 0.647 | 0.706 (0.6434–0.7694) | 0.849 | 0.471 | 0.584 | 0.780 | 0.692 | Test |
| Rad-adjacent | 0.822 | 0.908 (0.8680–0.9475) | 0.818 | 0.822 | 0.542 | 0.946 | 0.652 | Train |
| | 0.801 | 0.867 (0.8105–0.9232) | 0.929 | 0.771 | 0.491 | 0.978 | 0.642 | Validation |
| | 0.706 | 0.775 (0.7180–0.8312) | 0.790 | 0.632 | 0.653 | 0.775 | 0.715 | Test |
| ViT-2D-intra | 0.848 | 0.899 (0.8462–0.9521) | 0.891 | 0.836 | 0.583 | 0.968 | 0.705 | Train |
| | 0.815 | 0.795 (0.6922–0.8974) | 0.571 | 0.873 | 0.516 | 0.896 | 0.542 | Validation |
| | 0.612 | 0.604 (0.5344–0.6744) | 0.370 | 0.824 | 0.647 | 0.599 | 0.471 | Test |
| ViT-2D-adjacent | 0.751 | 0.843 (0.7888–0.8976) | 0.782 | 0.743 | 0.439 | 0.930 | 0.562 | Train |
| | 0.822 | 0.755 (0.6458–0.8639) | 0.536 | 0.890 | 0.536 | 0.890 | 0.536 | Validation |
| | 0.639 | 0.686 (0.6213–0.7512) | 0.832 | 0.471 | 0.579 | 0.762 | 0.683 | Test |
| ViT-3D-intra | 0.751 | 0.812 (0.7508–0.8734) | 0.709 | 0.762 | 0.433 | 0.911 | 0.538 | Train |
| | 0.664 | 0.755 (0.6673–0.8430) | 0.786 | 0.636 | 0.338 | 0.926 | 0.473 | Validation |
| | 0.627 | 0.658 (0.5915–0.7255) | 0.655 | 0.603 | 0.591 | 0.667 | 0.622 | Test |
| ViT-3D-adjacent | 0.866 | 0.850 (0.7808–0.9190) | 0.745 | 0.897 | 0.651 | 0.932 | 0.695 | Train |
| | 0.767 | 0.787 (0.6839–0.8906) | 0.679 | 0.788 | 0.432 | 0.912 | 0.528 | Validation |
| | 0.690 | 0.733 (0.6712–0.7940) | 0.664 | 0.713 | 0.669 | 0.708 | 0.667 | Test |
| Combine | 0.799 | 0.920 (0.8792–0.9612) | 0.945 | 0.762 | 0.505 | 0.982 | 0.658 | Train |
| | 0.877 | 0.900 (0.8170–0.9838) | 0.786 | 0.898 | 0.647 | 0.946 | 0.710 | Validation |
| | 0.745 | 0.821 (0.7692–0.8724) | 0.782 | 0.713 | 0.705 | 0.789 | 0.741 | Test |
2D, two dimensional; AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; OS, overall survival; PPV, positive predictive value; ViT, vision transformer; VPI, visceral pleural invasion.

Discussion
This study used preoperative CT imaging to analyze intra-tumoral subregions and peritumoral features, establishing VPI status models for cT1 stage solid lung adenocarcinoma. The AUCs of the Rad-adjacent and combined models reached 0.822 and 0.819, respectively. Additionally, OS prediction models were developed, demonstrating utility and generalization across multinational, multiethnic independent datasets. These models provide clinicians with robust stratification and prognostic information before surgery, aiding surgical decision-making.
It has been demonstrated that super-resolution techniques based on GANs could significantly enhance image resolution and improve the predictive efficacy of models utilized for tumor staging from imaging data (25,26). Additionally, a study has indicated that GANs can bolster the robustness of radiomics features in CT images (27). Evaluations across diverse medical imaging modalities, including CT, magnetic resonance imaging (MRI), and ultrasound, have indicated notable advancements in both image quality and spatial resolution (25-28). These studies highlight the promising potential of applying SRIR to medical image analysis, although further practical applications are needed to confirm clinical utility. Currently, few studies have used CT images to determine VPI status in lung adenocarcinoma with external validation, and most methodologies employ images with original resolution. These studies, which utilize deep learning or radiomics approaches to predict VPI status, have typically achieved AUC values below 0.70 in external validation cohorts (29,30). Choi et al. developed a deep learning model to predict VPI, achieving an AUC of 0.75, comparable to the performance of radiologists (31). However, this study included some patients with higher stages beyond cT1. To extract more valuable information from existing CT imaging data, this study employed GANs to enhance the resolution of the original images. Based on these enhanced images, intra-tumoral subregions were analyzed to obtain more detailed intra-tumoral feature information. Additionally, features from the peritumoral regions were extracted in this study.
For integrating feature information from these different regions, the fusion method used in modeling is crucial. Common fusion methods in image analysis, such as early, late, and intermediate fusion, may introduce redundant information, propagate noise, and fail to fully exploit inter-modal relationships. These methods often overlook the synergistic effects between multimodal data, leading to suboptimal fusion outcomes (32-34). Lin et al. developed a convolutional neural network model based on 3D-ResNet-9 to predict VPI status; in their external validation set, the clinical model achieved an AUC of 0.66, the deep learning model 0.62, and the combined model only 0.69 (29). Moreover, that study only examined the predictive performance of a 3D model. The present study employed a transformer-based fusion method that captures the relationships and dependencies between different subregions through an attention mechanism. In the external test cohort, the combined model (AUC: 0.819) and the Rad-adjacent model (AUC: 0.822) both demonstrated good predictive performance and generalization when predicting VPI status.
It has been reported that VPI is a prognostic factor in lung cancer patients (35,36). Previous studies on the CT semantic features of VPI suggested that these features could not independently predict the survival of patients with cT1 stage lung adenocarcinoma (37). This result is generally consistent with the findings of the present study. Based on VPI status, Lin et al. developed a deep learning model that achieved good prognostic stratification efficacy, with P=0.02 (29). This conclusion was also validated in this study using a multinational, multicenter external test cohort. Furthermore, this study extended the analysis to evaluate the prognostic prediction capability of this method for OS. Although the prediction results over long time scales remain unstable, the prediction efficacy for 5-year OS was maximized, with the combined model achieving an AUC of 0.821.
There are several limitations in this study. Firstly, the sample size included in this study is relatively small, and the retrospective nature of the study inevitably leads to selection bias in the population. Secondly, to fully consider the generalizability and robustness of the study conclusions, researchers should place greater emphasis on the impact of different CT scanners, institutions, and parameters on the results. Thirdly, this study still employed manual segmentation of the ROI, which brings about several long-standing issues that need to be addressed: time-consuming processes, subjectivity, poor reproducibility, and difficulties in scaling. Fourthly, the follow-up time for the training cohort included in this study was relatively short, which may limit the models’ predictive efficacy for OS. Fifthly, the methods used in this study require considerable computational resources and are time-consuming, which also limits their applicability.
Conclusions
In summary, the application of GANs for SRIR enhanced the detail of CT images. This facilitated a comprehensive analysis of tumor heterogeneity, incorporating peritumoral information with intra-tumoral sub-regional features. Consequently, the developed models demonstrated robust predictive capabilities for VPI status and OS in patients with cT1 stage lung adenocarcinoma. This approach offers valuable non-invasive preoperative decision-making support for surgeons.
Acknowledgments
The authors would like to extend their gratitude to the OneKey platform (https://github.com/OnekeyAI-Platform) for its invaluable contribution to this study.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/rc
Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/dss
Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of our institutions [No. (2024) K83-1 (the Fifth Affiliated Hospital of Sun Yat-sen University) and No. SYSKY-2025-217-01 (the Sun Yat-sen Memorial Hospital)]. Individual consent for this retrospective analysis was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Aokage K, Suzuki K, Saji H, et al. Segmentectomy for ground-glass-dominant lung cancer with a tumour diameter of 3 cm or less including ground-glass opacity (JCOG1211): a multicentre, single-arm, confirmatory, phase 3 trial. Lancet Respir Med 2023;11:540-9. [Crossref] [PubMed]
- Altorki N, Wang X, Kozono D, et al. Lobar or Sublobar Resection for Peripheral Stage IA Non-Small-Cell Lung Cancer. N Engl J Med 2023;388:489-98. [Crossref] [PubMed]
- Suzuki K, Watanabe SI, Wakabayashi M, et al. A single-arm study of sublobar resection for ground-glass opacity dominant peripheral lung cancer. J Thorac Cardiovasc Surg 2022;163:289-301.e2. [Crossref] [PubMed]
- Saji H, Okada M, Tsuboi M, et al. Segmentectomy versus lobectomy in small-sized peripheral non-small-cell lung cancer (JCOG0802/WJOG4607L): a multicentre, open-label, phase 3, randomised, controlled, non-inferiority trial. Lancet 2022;399:1607-17. [Crossref] [PubMed]
- Ito H, Suzuki K, Mizutani T, et al. Long-term survival outcome after lobectomy in patients with clinical T1 N0 lung cancer. J Thorac Cardiovasc Surg 2020; Epub ahead of print. [Crossref]
- Speicher PJ, Gu L, Gulack BC, et al. Sublobar Resection for Clinical Stage IA Non-small-cell Lung Cancer in the United States. Clin Lung Cancer 2016;17:47-55. [Crossref] [PubMed]
- Goldstraw P, Chansky K, Crowley J, et al. The IASLC Lung Cancer Staging Project: Proposals for Revision of the TNM Stage Groupings in the Forthcoming (Eighth) Edition of the TNM Classification for Lung Cancer. J Thorac Oncol 2016;11:39-51. [Crossref] [PubMed]
- Travis WD, Brambilla E, Noguchi M, et al. International association for the study of lung cancer/american thoracic society/european respiratory society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 2011;6:244-85. [Crossref] [PubMed]
- Sun Q, Li P, Zhang J, et al. CT Predictors of Visceral Pleural Invasion in Patients with Non-Small Cell Lung Cancers 30 mm or Smaller. Radiology 2024;310:e231611. [Crossref] [PubMed]
- Onoda H, Higashi M, Murakami T, et al. Correlation between pleural tags on CT and visceral pleural invasion of peripheral lung cancer that does not appear touching the pleural surface. Eur Radiol 2021;31:9022-9. [Crossref] [PubMed]
- Bera K, Velcheti V, Madabhushi A. Novel Quantitative Imaging for Predicting Response to Therapy: Techniques and Clinical Applications. Am Soc Clin Oncol Educ Book 2018;38:1008-18. [Crossref] [PubMed]
- Han F, Wang H, Zhang G, et al. Texture feature analysis for computer-aided diagnosis on pulmonary nodules. J Digit Imaging 2015;28:99-115. [Crossref] [PubMed]
- Just N. Improving tumour heterogeneity MRI assessment with histograms. Br J Cancer 2014;111:2205-13. [Crossref] [PubMed]
- O'Connor JP, Rose CJ, Waterton JC, et al. Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome. Clin Cancer Res 2015;21:249-57. [Crossref] [PubMed]
- Syed AK, Whisenant JG, Barnes SL, et al. Multiparametric Analysis of Longitudinal Quantitative MRI data to Identify Distinct Tumor Habitats in Preclinical Models of Breast Cancer. Cancers (Basel) 2020;12:1682. [Crossref] [PubMed]
- Vaidya P, Bera K, Patil PD, et al. Novel, non-invasive imaging approach to identify patients with advanced non-small cell lung cancer at risk of hyperprogressive disease with immune checkpoint blockade. J Immunother Cancer 2020;8:e001343. [Crossref] [PubMed]
- Wang X, Xu C, Grzegorzek M, et al. Habitat radiomics analysis of pet/ct imaging in high-grade serous ovarian cancer: Application to Ki-67 status and progression-free survival. Front Physiol 2022;13:948767. [Crossref] [PubMed]
- Xie C, Yang P, Zhang X, et al. Sub-region based radiomics analysis for survival prediction in oesophageal tumours treated by definitive concurrent chemoradiotherapy. EBioMedicine 2019;44:289-97. [Crossref] [PubMed]
- Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta 2010;1805:105-17. [Crossref] [PubMed]
- Bakr S, Gevaert O, Echegaray S, et al. A radiogenomic dataset of non-small cell lung cancer. Sci Data 2018;5:180202. [Crossref] [PubMed]
- wangqingbaidu. OnekeyAI-Platform. 2024.7.31. Available online: https://github.com/OnekeyAI-Platform
- Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 2020;21:1-67.
- Liang Z, Zhao K, Liang G, et al. MAXFormer: Enhanced transformer for medical image segmentation with multi-attention and multi-scale features fusion. Knowledge-Based Systems 2023;280:110987.
- Carion N, Massa F, Synnaeve G, et al. End-to-End Object Detection with Transformers. In: Vedaldi A, Bischof H, Brox T, Frahm JM. (eds) Computer Vision–ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham.
- Ma C, Yang CY, Yang X, et al. Learning a no-reference quality metric for single-image super-resolution. Computer Vision and Image Understanding 2017;158:1-16.
- Fan M, Liu Z, Xu M, et al. Generative adversarial network-based super-resolution of diffusion-weighted imaging: Application to tumour radiomics in breast cancer. NMR Biomed 2020;33:e4345. [Crossref] [PubMed]
- de Farias EC, di Noia C, Han C, et al. Impact of GAN-based lesion-focused medical image super-resolution on the robustness of radiomic features. Sci Rep 2021;11:21361. [Crossref] [PubMed]
- Hou M, Zhou L, Sun J. Deep-learning-based 3D super-resolution MRI radiomics model: superior predictive performance in preoperative T-staging of rectal cancer. Eur Radiol 2023;33:1-10. [Crossref] [PubMed]
- Lin X, Liu K, Li K, et al. A CT-based deep learning model: visceral pleural invasion and survival prediction in clinical stage IA lung adenocarcinoma. iScience 2024;27:108712. [Crossref] [PubMed]
- Kong L, Xue W, Zhao H, et al. Predicting pleural invasion of invasive lung adenocarcinoma in the adjacent pleura by imaging histology. Oncol Lett 2023;26:438. [Crossref] [PubMed]
- Choi H, Kim H, Hong W, et al. Prediction of visceral pleural invasion in lung cancer on CT: deep learning model achieves a radiologist-level performance with adaptive sensitivity and specificity to clinical needs. Eur Radiol 2021;31:2866-76. [Crossref] [PubMed]
- Cichy RM, Pantazis D, Oliva A. Similarity-Based Fusion of MEG and fMRI Reveals Spatio-Temporal Dynamics in Human Cortex During Visual Object Recognition. Cereb Cortex 2016;26:3563-79. [Crossref] [PubMed]
- Owens A, Wu J, McDermott JH, et al. Ambient Sound Provides Supervision for Visual Learning. In: Computer Vision–ECCV 2016. Lecture Notes in Computer Science. Springer, Cham; 2016.
- Cichy RM, Pantazis D, Oliva A. Resolving human object recognition in space and time. Nat Neurosci 2014;17:455-62. [Crossref] [PubMed]
- Yoshida J, Nagai K, Asamura H, et al. Visceral pleura invasion impact on non-small cell lung cancer patient survival: its implications for the forthcoming TNM staging based on a large-scale nation-wide database. J Thorac Oncol 2009;4:959-63. [Crossref] [PubMed]
- Jiang L, Liang W, Shen J, et al. The impact of visceral pleural invasion in node-negative non-small cell lung cancer: a systematic review and meta-analysis. Chest 2015;148:903-11. [Crossref] [PubMed]
- Kim H, Goo JM, Kim YT, et al. CT-defined Visceral Pleural Invasion in T1 Lung Adenocarcinoma: Lack of Relationship to Disease-Free Survival. Radiology 2019;292:741-9. [Crossref] [PubMed]