Attention mechanism-based habitat analysis for predicting pleural invasion and prognosis of pulmonary nodules
Highlight box
Key findings
• The study developed predictive models for visceral pleural invasion (VPI) and overall survival (OS) in cT1 stage lung adenocarcinoma using enhanced computed tomography (CT) imaging. A key innovation of this research is the utilization of attention-based transformer fusion to integrate intra-tumoral and peritumoral radiomic features, significantly enhancing the predictive accuracy. Both the Rad-adjacent model and the combined model demonstrated robust performance, achieving area under the curve (AUC) values of 0.822 and 0.819 for VPI prediction, respectively. These models provide a powerful, non-invasive tool to support surgical decision-making.
What is known and what is new?
• VPI is a critical prognostic factor in lung cancer, and current predictions rely primarily on postoperative pathological assessments. There are no effective preoperative or intraoperative methods currently available for accurately predicting VPI status.
• This study introduces an innovative approach by combining Habitat analysis with transformer fusion to predict VPI and OS. By utilizing generative adversarial network-enhanced CT images and fusing radiomic features from both intra-tumoral and peritumoral areas, the study achieves a breakthrough in non-invasive predictive accuracy.
What is the implication, and what should change now?
• This research innovatively combines Habitat analysis with transformer fusion for VPI prediction, achieving excellent results. By integrating these predictive models into clinical practice, surgeons can improve preoperative decision-making, potentially enhancing patient outcomes by identifying high-risk individuals. Future research should validate these findings across larger and more diverse cohorts while addressing computational challenges for seamless clinical integration.
Introduction
With the publication of a series of high-level research findings, sublobar resection has played an increasingly important role in the surgical treatment of early-stage non-small cell lung cancer (1-5). Sublobar resection is less invasive and preserves more pulmonary function. Although long-term follow-up studies have shown overall survival (OS) rates comparable to lobectomy, patients who undergo sublobar resection face a higher rate of local recurrence (11% vs. 5%) (1,4-6). Additionally, approximately 0.4% to 14.6% of patients who undergo sublobar resection experience postoperative T-stage upstaging (1,4-6). In the 8th edition of the tumor-node-metastasis (TNM) Classification of Lung Cancer by the International Association for the Study of Lung Cancer (IASLC), visceral pleural invasion (VPI) is a key factor affecting the T-stage of pulmonary nodules under 3 cm and is recognized as a risk factor for poor prognosis (7). In clinical practice, the efficacy of sublobar resection for patients with positive VPI remains unclear. VPI status is typically determined postoperatively by pathological elastic fiber staining; no reliable preoperative or intraoperative method exists (8). This gap hampers the consideration of tumor heterogeneity, including VPI, when selecting candidates for sublobar resection.
Recently, there have been continuous advancements in computed tomography (CT) studies of solid lung nodules measuring ≤3 cm (9,10). However, CT images contain a wealth of valuable information that is difficult for humans to discern manually, including variations in shape, intensity, gradient, and texture (11,12). These features constitute the heterogeneity within tumors. Numerous studies have used radiomics methods to investigate this heterogeneity (13,14). However, these methods analyze the tumor region at a macroscopic level, losing voxel-level information. In habitat analysis, the region of interest (ROI) is analyzed at the voxel level, and unsupervised clustering methods then segment the voxels into different subregions, helping to identify areas of heterogeneity within the tumor. Various machine learning models are then used to quantitatively analyze the characteristics of these subregions. This approach has been applied in fields such as resistance to immunotherapy, Ki-67 status in ovarian cancer, breast cancer, and chemotherapy for esophageal cancer (15-18). The heterogeneity of the tumor microenvironment results in varied biological behaviors and prognoses (19). In this study, this heterogeneity is analyzed from an imaging perspective, and its utility in predicting VPI status and the prognosis of lung adenocarcinoma is explored. This approach is expected to help surgeons make more precise decisions when formulating surgical plans. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/rc).
Methods
Study design and population
This retrospective study included consecutive patients with pathologically invasive pulmonary adenocarcinoma who underwent lobectomy, with preoperative CT showing solid lung nodules measuring <3 cm, from January 2017 to December 2022 in the Fifth Affiliated Hospital of Sun Yat-sen University (Center 1) and from January 2017 to December 2020 in the Sun Yat-sen Memorial Hospital (Center 2) (Figure 1). After screening, 487 patients from Center 1 and 181 patients from Center 2 were eventually included in the study (Figure 1). All these patients were of East Asian ethnicity. In the public radiogenomics dataset (named NSCLC Radiogenomics from The Cancer Imaging Archive), a total of 211 multi-ethnic patients from North America were identified, with 74 patients ultimately included in the study (20) (Figure 1). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committees of the Fifth Affiliated Hospital of Sun Yat-sen University [No. (2024) K83-1] and the Sun Yat-sen Memorial Hospital (No. SYSKY-2025-217-01), and individual consent for this retrospective analysis was waived. The pleural invasion status of all pulmonary nodules was verified through elastic fiber staining and documented in the pathology reports. The follow-up strategies are detailed in Appendix 1.

Image segmentation and super-resolution image reconstruction
The CT protocol and evaluation are detailed in Appendix 2. All CT scans were resampled to uniform voxel dimensions of 1 mm × 1 mm × 3 mm, with imaging parameters standardized to a window width of 1,500 and a window level of −600. The tumor ROI was delineated, and each ROI was then expanded by 5 mm (Appendix 3) to represent the peritumoral region (Figure 2). Segmentation was performed using ITK-SNAP, a freely available software package (http://www.itksnap.org/pmwiki/pmwiki.php). The three-dimensional (3D) super-resolution image reconstruction (SRIR) in this study used a generative adversarial network (GAN) based on the OneKey AI platform (21) (Figure 2). A GAN consists of a generator, which creates high-resolution images from low-resolution ones, and a discriminator, which distinguishes real from generated images. Through adversarial training, the generator learns to improve image resolution, enhancing the spatial resolution from a voxel size of 1 mm × 1 mm × 3 mm to 0.25 mm × 0.25 mm × 3 mm (21).

Habitat sub-region clustering and image cropping
To characterize intra-tumoral heterogeneity, this study subdivided the intra-tumoral region. Based on the SRIR-processed CT images, radiomics features were extracted from a 5×5×5 voxel matrix centered on each voxel (Figure 2). Nineteen features were included in the study (Appendix 3). The K-means unsupervised clustering method was used to aggregate the voxels of the normalized images into relevant sub-regions. To determine the optimal number of clusters, this study tested 2 to 9 clusters and evaluated clustering performance using the Calinski-Harabasz (CH) index. Subsequently, the sub-region images were cropped at the ROI's largest cross-section to prepare for the extraction of 2D deep learning features.
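The cluster-count selection described above (testing 2 to 9 clusters and scoring each with the CH index) can be sketched as follows. This is a toy illustration, not the study's pipeline: the synthetic feature matrix stands in for the per-voxel features (the study used 19 features per voxel), and the helper function name is our own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Synthetic stand-in for a per-voxel feature matrix: three well-separated
# blobs play the role of three habitat sub-regions.
rng = np.random.default_rng(0)
voxel_features = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(200, 19))
    for center in (0.0, 5.0, 10.0)
])

def best_cluster_count(X, k_range=range(2, 10)):
    """Fit K-means for each candidate k and return the k that maximizes
    the Calinski-Harabasz index (higher = better-separated clusters)."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = calinski_harabasz_score(X, labels)
    return max(scores, key=scores.get), scores

best_k, ch_scores = best_cluster_count(voxel_features)
```

On data this cleanly separated, the CH index peaks at the true number of blobs; on real voxel features the curve is flatter and the peak is read off the training cohort, as the study did.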
Feature extraction and model ensembling
This study employed a transformer-based fusion technique to integrate radiomic features from various ROIs, as shown in Figure 2. It began with the extraction of 1,834 radiomic features and 1,024 deep learning [vision transformer (ViT)] features from each ROI, with model training details in Appendix 4. To effectively integrate these diverse feature sets, we employed the transformer architecture, leveraging its capability to capture long-range dependencies and contextual relationships via attention mechanisms (22). The self-attention mechanism allows the model to concurrently focus on different parts of input feature vectors, directing attention from each ROI feature vector to all others within the same ROI using scaled dot-product attention to calculate attention weights and assess feature significance (22-24). Cross-attention enables interaction between feature vectors from different ROIs by using two sets of queries, keys, and values from distinct ROI feature sets, facilitating information exchange and feature reinforcement (24). After attention mechanisms, the output undergoes layer normalization and is processed through position-wise feed-forward networks to stabilize learning and ensure consistent gradient flow (22). Residual connections and dropout are included to improve performance and prevent overfitting. The enriched encoded features from both self-attention and cross-attention are concatenated into a unified representation, which is then passed through fully connected layers for classification (23,24).
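The core of the fusion step above is scaled dot-product attention applied within one ROI (self-attention) and across two ROIs (cross-attention), with the enriched outputs concatenated. The following minimal NumPy sketch shows only that mechanism; it omits the learned query/key/value projections, multi-head splitting, layer normalization, feed-forward layers, and dropout of a full transformer block, and the token matrices are random placeholders for the ROI feature vectors.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 8                               # toy feature dimension
intra = rng.normal(size=(4, d))     # placeholder intra-tumoral tokens
peri = rng.normal(size=(4, d))      # placeholder peritumoral tokens

# Self-attention: queries, keys, and values all come from the same ROI.
self_out, self_w = scaled_dot_product_attention(intra, intra, intra)

# Cross-attention: queries from one ROI, keys/values from the other ROI,
# letting the two feature sets exchange and reinforce information.
cross_out, cross_w = scaled_dot_product_attention(intra, peri, peri)

# Concatenate the enriched encodings into a unified representation,
# which would then feed fully connected classification layers.
fused = np.concatenate([self_out, cross_out], axis=-1)
```

Each row of the attention weight matrices sums to 1, so every output token is a convex combination of the value vectors it attends to.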
Statistical analysis and model evaluation
Analyses were conducted using SPSS v26.0 and Python v3.7. Normally distributed variables were assessed with the z-test, while non-normal variables were evaluated with the Wilcoxon test. Categorical data were analyzed with the Chi-square or Fisher's exact test. Multivariable logistic regression was implemented with the forward stepwise method. The area under the curve (AUC) was used to evaluate the models' predictions, with the receiver operating characteristic (ROC) curve illustrating the trade-off between true positive and false positive rates. Optimal cut-off values (Youden's index) for patient categorization were determined through ROC curve analysis (Appendix 5). Kaplan-Meier survival curves were compared using the log-rank test, with P<0.05 indicating statistical significance. Models were compared using the DeLong test, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Model performance was assessed with decision curve analysis (DCA) and calibration curves.
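The Youden's index cut-off mentioned above is the ROC threshold maximizing J = sensitivity + specificity − 1. A small self-contained sketch of that selection, on toy data (the function and variable names are illustrative, not from the study's code):

```python
import numpy as np

def youden_cutoff(y_true, y_score):
    """Return the score threshold maximizing Youden's J statistic,
    J = sensitivity + specificity - 1, over all observed thresholds."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    best_t, best_j = None, -np.inf
    for t in np.unique(y_score):
        pred = y_score >= t
        sens = np.mean(pred[y_true == 1])    # true positive rate
        spec = np.mean(~pred[y_true == 0])   # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy, perfectly separable scores: all cases score above all controls,
# so the best cutoff sits at the lowest case score with J = 1.
labels = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.30, 0.70, 0.80, 0.90])
cutoff, j = youden_cutoff(labels, scores)  # cutoff 0.70, J = 1.0
```

In practice the cut-off is fixed on the training cohort and applied unchanged to the validation and external test cohorts.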
Results
Patient characteristics
In accordance with the inclusion and exclusion criteria (Figure 1), a total of 742 patients were enrolled in the study. Center 1 contributed 487 patients, who were randomly divided into training and validation cohorts in a 7:3 ratio. Center 2 contributed 181 patients, and an additional 74 cases from the external database were incorporated, together forming the external test cohort. The cohort characteristics are provided in Table 1. A statistically significant difference in sex distribution was observed across the cohorts: while the external test cohort had a balanced sex ratio, females were slightly more prevalent in the training and validation cohorts. However, univariate and multivariate analyses indicated that sex had no significant impact on VPI status prediction (Table S1). Kaplan-Meier analysis of the OS curves for the three cohorts is shown in Figure S1. In these cohorts, 33, 9, and 42 patients, respectively, experienced outcome events. The median follow-up time was 49 months [95% confidence interval (CI): 46.678–51.313] for the training cohort, 49 months (95% CI: 45.453–52.547) for the validation cohort, and 57 months (95% CI: 55.265–58.735) for the test cohort. The mean survival time was 80.84 months (95% CI: 77.270–84.418) for the training cohort, 77.75 months (95% CI: 73.853–81.645) for the validation cohort, and 88.18 months (95% CI: 84.357–92.036) for the external test cohort. Clinical characteristics performed poorly in predicting VPI status and OS (Appendix 6, Figure S2); therefore, clinical models were excluded from the subsequent modeling process.
Table 1
| Characteristic | All cohorts | Training cohort | Validation cohort | External test cohort | P value |
|---|---|---|---|---|---|
| Age (years) | 61.0±10.20 | 60.7±9.98 | 60.7±9.09 | 61.5±11.08 | 0.65 |
| Sex | | | | | 0.048 |
| &nbsp;&nbsp;Male | 338 (45.6) | 145 (19.5) | 61 (8.2) | 132 (17.8) | |
| &nbsp;&nbsp;Female | 404 (54.4) | 196 (26.4) | 85 (11.5) | 123 (16.6) | |
| Smoke | | | | | 0.06 |
| &nbsp;&nbsp;Negative | 535 (72.1) | 257 (34.6) | 108 (14.6) | 170 (22.9) | |
| &nbsp;&nbsp;Positive | 207 (27.9) | 84 (11.3) | 38 (5.1) | 85 (11.5) | |
| Location | | | | | 0.16 |
| &nbsp;&nbsp;RUL | 224 (30.2) | 103 (13.9) | 42 (5.7) | 79 (10.6) | |
| &nbsp;&nbsp;RML | 70 (9.4) | 32 (4.3) | 10 (1.3) | 28 (3.8) | |
| &nbsp;&nbsp;RLL | 134 (18.1) | 58 (7.8) | 32 (4.3) | 44 (5.9) | |
| &nbsp;&nbsp;LUL | 185 (24.9) | 78 (10.5) | 36 (4.9) | 71 (9.6) | |
| &nbsp;&nbsp;LLL | 129 (17.4) | 70 (9.4) | 26 (3.5) | 33 (4.4) | |
| VPI status | | | | | 0.09 |
| &nbsp;&nbsp;Negative | 456 (61.5) | 220 (29.6) | 93 (12.5) | 143 (19.3) | |
| &nbsp;&nbsp;Positive | 286 (38.5) | 121 (16.3) | 53 (7.1) | 112 (15.1) | |
| Lymphovascular and perineural invasion | | | | | 0.13 |
| &nbsp;&nbsp;Negative | 512 (69.0) | 223 (30.0) | 103 (13.9) | 186 (25.1) | |
| &nbsp;&nbsp;Positive | 230 (31.0) | 118 (15.9) | 43 (5.8) | 69 (9.3) | |
| STAS | | | | | 0.13 |
| &nbsp;&nbsp;Negative | 529 (71.3) | 238 (32.1) | 98 (13.2) | 193 (26.0) | |
| &nbsp;&nbsp;Positive | 213 (28.7) | 103 (13.9) | 48 (6.5) | 62 (8.4) | |
| Differentiation | | | | | 0.55 |
| &nbsp;&nbsp;Well | 107 (14.4) | 45 (6.1) | 26 (3.5) | 36 (4.9) | |
| &nbsp;&nbsp;Moderately | 405 (54.6) | 188 (25.3) | 72 (9.7) | 145 (19.5) | |
| &nbsp;&nbsp;Poorly | 230 (31.0) | 108 (14.6) | 48 (6.5) | 74 (10.0) | |
| Clear | | | | | 0.09 |
| &nbsp;&nbsp;Negative | 446 (60.1) | 205 (27.6) | 98 (13.2) | 143 (19.3) | |
| &nbsp;&nbsp;Positive | 296 (39.9) | 136 (18.3) | 48 (6.5) | 112 (15.1) | |
| Lobulated sign | | | | | 0.17 |
| &nbsp;&nbsp;Negative | 134 (18.1) | 53 (7.1) | 26 (3.5) | 55 (7.4) | |
| &nbsp;&nbsp;Positive | 608 (81.9) | 288 (38.8) | 120 (16.2) | 200 (27.0) | |
| Spiculated sign | | | | | 0.11 |
| &nbsp;&nbsp;Negative | 269 (36.3) | 124 (16.7) | 43 (5.8) | 102 (13.7) | |
| &nbsp;&nbsp;Positive | 473 (63.7) | 217 (29.2) | 103 (13.9) | 153 (20.6) | |
| Pleural indentation | | | | | 0.08 |
| &nbsp;&nbsp;Negative | 338 (45.6) | 168 (22.6) | 68 (9.2) | 102 (13.7) | |
| &nbsp;&nbsp;Positive | 404 (54.4) | 173 (23.3) | 78 (10.5) | 153 (20.6) | |
| Air bronchogram | | | | | 0.11 |
| &nbsp;&nbsp;Negative | 591 (79.6) | 265 (35.7) | 112 (15.1) | 214 (28.8) | |
| &nbsp;&nbsp;Positive | 151 (20.4) | 76 (10.2) | 34 (4.6) | 41 (5.5) | |
| Vessel convergence | | | | | 0.17 |
| &nbsp;&nbsp;Negative | 491 (66.2) | 230 (31.0) | 87 (11.7) | 174 (23.5) | |
| &nbsp;&nbsp;Positive | 251 (33.8) | 111 (15.0) | 59 (8.0) | 81 (10.9) | |
| Vacuole sign | | | | | 0.18 |
| &nbsp;&nbsp;Negative | 620 (83.6) | 279 (37.6) | 119 (16.0) | 222 (29.9) | |
| &nbsp;&nbsp;Positive | 122 (16.4) | 62 (8.4) | 27 (3.6) | 33 (4.4) | |
| cT stage | | | | | 0.08 |
| &nbsp;&nbsp;T1a | 214 (28.8) | 90 (12.1) | 42 (5.7) | 82 (11.1) | |
| &nbsp;&nbsp;T1b | 380 (51.2) | 193 (26.0) | 72 (9.7) | 115 (15.5) | |
| &nbsp;&nbsp;T1c | 148 (19.9) | 58 (7.8) | 32 (4.3) | 58 (7.8) | |
| LN metastasis | | | | | 0.054 |
| &nbsp;&nbsp;N0 | 667 (89.9) | 311 (41.9) | 132 (17.8) | 224 (30.2) | |
| &nbsp;&nbsp;N1 | 49 (6.6) | 24 (3.2) | 10 (1.3) | 15 (2.0) | |
| &nbsp;&nbsp;N2 | 26 (3.5) | 6 (0.8) | 4 (0.5) | 16 (2.2) | |
Values are mean ± SD or n (%) unless otherwise defined. LLL, left lower lobe; LN, lymph node; LUL, left upper lobe; RLL, right lower lobe; RML, right middle lobe; RUL, right upper lobe; SD, standard deviation; STAS, spread through air spaces; VPI, visceral pleura invasion.
Sub-region cluster and feature extraction
The resolution of the original CT images was enhanced using GAN-based SRIR. From these enhanced images, intra-tumoral habitat features were extracted. K-means clustering was then performed, and clustering effectiveness was evaluated using the CH index. The optimal number of clusters in the training cohort was three (Figure 2), so the intra-tumoral region was divided into three subregions. From each subregion of each patient, 1,834 radiomics features and 1,024 3D ViT features were extracted. Axial images were cropped from the largest axial section of the intra-tumoral ROI, and 1,024 2D ViT features were extracted from the three subregions within these images (Figure 2). Missing subregion features for certain samples (specifically Rad/ViT-Intra-H1, H2, and H3) were imputed using the K-nearest neighbor method. The peritumoral region was defined by extending 5 mm outward from the boundary of the intra-tumoral region, and radiomics and 2D/3D ViT features were extracted using the same methods. In this study, the combined intra-tumoral and peritumoral region is referred to as the adjacent region.
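K-nearest neighbor imputation, as used above for missing subregion features, fills each gap with values from the most similar complete samples. A minimal sketch with scikit-learn's `KNNImputer` on a hypothetical feature table (the data and neighbor count are illustrative, not the study's settings):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical feature table: rows are patients, columns are subregion
# features (e.g., Rad-Intra-H1/H2/H3); np.nan marks a subregion that was
# absent for a given tumor after clustering.
X = np.array([
    [1.0, 2.0,    np.nan],
    [1.1, 2.1,    3.1],
    [0.9, 1.9,    2.9],
    [5.0, np.nan, 7.0],
    [5.2, 6.1,    7.2],
])

# Each missing value is replaced by the mean of that feature over the
# k nearest patients, measured by distance on the observed features.
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
```

The imputed matrix keeps the original shape with every NaN replaced, so downstream models receive complete feature vectors.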
VPI status prediction model
After features were extracted from the three intra-tumoral subregions, VPI status was predicted using a fully connected layer on top of the transformer fusion. In the external test cohort (Figure 3 and Table 2), the Rad-intra model achieved the highest AUC among the intra-tumoral models, at 0.775 (95% CI: 0.7171–0.8323) (Table 2). The ViT-3D-intra model performed the worst, with an AUC of only 0.687 (95% CI: 0.6209–0.7525). New models were then established by incorporating features from the peritumoral region, which improved predictive performance (Table 2). In the external test cohort, the Rad-adjacent model achieved an AUC of 0.822 (95% CI: 0.7726–0.8724). Compared with the Rad-intra model, the DeLong test did not indicate a statistically significant difference between the ROC curves of the two models (P=0.16). The NRI for the Rad-adjacent model reached 0.022; however, the IDI analysis did not reveal a statistically significant difference between the models (Figure 3). For the ViT-3D model, the AUC increased from 0.687 to 0.779 after incorporating peritumoral features (Table 2). The DeLong test yielded a P value of 0.023, indicating a statistically significant difference. The NRI reached 0.112, but the IDI analysis did not achieve statistical significance (Figure 3).

Table 2
| Model | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 | Cohort |
|---|---|---|---|---|---|---|---|---|
| Rad-intra | 0.718 | 0.905 (0.8681–0.9422) | 0.952 | 0.667 | 0.388 | 0.984 | 0.551 | Train |
| | 0.849 | 0.869 (0.7956–0.9423) | 0.679 | 0.890 | 0.594 | 0.921 | 0.633 | Validation |
| | 0.741 | 0.775 (0.7171–0.8323) | 0.639 | 0.831 | 0.768 | 0.724 | 0.697 | Test |
| Rad-adjacent | 0.883 | 0.947 (0.9218–0.9725) | 0.903 | 0.878 | 0.622 | 0.976 | 0.737 | Train |
| | 0.753 | 0.843 (0.7652–0.9201) | 0.786 | 0.746 | 0.423 | 0.936 | 0.550 | Validation |
| | 0.753 | 0.822 (0.7726–0.8724) | 0.639 | 0.853 | 0.792 | 0.730 | 0.707 | Test |
| ViT-2D-intra | 0.795 | 0.936 (0.9098–0.9615) | 0.968 | 0.756 | 0.469 | 0.991 | 0.632 | Train |
| | 0.644 | 0.784 (0.6914–0.8770) | 0.857 | 0.593 | 0.333 | 0.946 | 0.480 | Validation |
| | 0.675 | 0.707 (0.6440–0.7710) | 0.790 | 0.574 | 0.618 | 0.757 | 0.694 | Test |
| ViT-2D-adjacent | 0.815 | 0.892 (0.8540–0.9301) | 0.790 | 0.821 | 0.495 | 0.946 | 0.609 | Train |
| | 0.815 | 0.825 (0.7345–0.9156) | 0.643 | 0.856 | 0.514 | 0.910 | 0.571 | Validation |
| | 0.714 | 0.754 (0.6955–0.8125) | 0.655 | 0.765 | 0.709 | 0.717 | 0.681 | Test |
| ViT-3D-intra | 0.850 | 0.924 (0.8953–0.9535) | 0.855 | 0.849 | 0.558 | 0.963 | 0.675 | Train |
| | 0.760 | 0.817 (0.7245–0.9099) | 0.679 | 0.780 | 0.422 | 0.911 | 0.521 | Validation |
| | 0.667 | 0.687 (0.6209–0.7525) | 0.571 | 0.750 | 0.667 | 0.667 | 0.615 | Test |
| ViT-3D-adjacent | 0.845 | 0.865 (0.8095–0.9197) | 0.758 | 0.864 | 0.553 | 0.941 | 0.639 | Train |
| | 0.911 | 0.863 (0.7577–0.9675) | 0.714 | 0.958 | 0.800 | 0.934 | 0.755 | Validation |
| | 0.722 | 0.779 (0.7228–0.8357) | 0.647 | 0.787 | 0.726 | 0.718 | 0.684 | Test |
| Combine | 0.883 | 0.933 (0.9003–0.9648) | 0.790 | 0.903 | 0.645 | 0.951 | 0.710 | Train |
| | 0.808 | 0.867 (0.8051–0.9297) | 0.857 | 0.797 | 0.500 | 0.959 | 0.632 | Validation |
| | 0.737 | 0.819 (0.7693–0.8686) | 0.899 | 0.596 | 0.660 | 0.871 | 0.762 | Test |
2D, two dimensional; AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; ViT, vision transformer; VPI, visceral pleural invasion.
Radiomics features were reduced to 1,024 dimensions using principal component analysis (PCA). These reduced features were then integrated with the deep learning features to develop a combined model. The combined model achieved an AUC of 0.819 (95% CI: 0.7693–0.8686). NRI analysis indicated a positive improvement in predictive performance compared with all other models. Compared with the ViT-3D-intra and ViT-2D-intra models, the NRI values were 0.173 and 0.131, respectively. The DeLong test indicated that these differences were statistically significant, with P values of 0.007 and <0.001; however, the IDI analysis did not demonstrate statistical significance. Compared with the Rad-adjacent model, the combined model showed a slight positive improvement, with an NRI of 0.003; the IDI analysis did not indicate statistical significance, and the DeLong test did not show a statistically significant difference between the ROC curves of the two models (P=0.89) (Figure 3, Figure S3, and Table 2).
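The PCA reduction step can be sketched as follows. Note this is only a mechanics demo with toy dimensions: PCA can return at most min(n_samples, n_features) components, so with 300 hypothetical samples the target here is 64 components rather than the study's 1,024 (which presumes a correspondingly larger fitting matrix).

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy radiomics matrix: 300 hypothetical samples x 1,834 features,
# projected to a smaller common dimension before fusion with the
# deep learning feature vectors.
rng = np.random.default_rng(0)
radiomics = rng.normal(size=(300, 1834))

pca = PCA(n_components=64)
reduced = pca.fit_transform(radiomics)
```

Matching the radiomics dimensionality to the ViT feature dimensionality lets both feature sets enter the transformer fusion as comparably sized vectors.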
OS stratification and prediction model
In the external test cohort, both the Rad-adjacent model and the combined model showed the strongest stratification capability, effectively distinguishing patients into low-risk and high-risk groups. The differences in stratification between the two groups were statistically significant, with P values of 0.032 and 0.013 (Figure 4, Figure S4, and Figure S5).

Using survival status as the predictive indicator, the combined model achieved an AUC of 0.821 (95% CI: 0.7692–0.8724) for predicting 5-year OS in the test cohort (Table 3). NRI analysis indicated that the combined model exhibited a positive improvement. However, the IDI test revealed no statistical differences between the combined model and the other models. This indicates that, although classification for predicting 5-year OS improved somewhat, discriminatory ability was not significantly enhanced (Figure 5 and Figure S6). Interestingly, in the 5-year OS analysis, although the Rad-adjacent model had neither the highest AUC (0.775, 95% CI: 0.7180–0.8312) nor the highest NRI, its improvements over the 3D-intra models were statistically significant according to the IDI test (Figure 5).
Table 3
| 5-year OS model | Accuracy | AUC (95% CI) | Sensitivity | Specificity | PPV | NPV | F1 | Cohort |
|---|---|---|---|---|---|---|---|---|
| Rad-intra | 0.844 | 0.855 (0.7962–0.9137) | 0.673 | 0.888 | 0.607 | 0.913 | 0.638 | Train |
| | 0.726 | 0.839 (0.7764–0.9013) | 0.964 | 0.669 | 0.409 | 0.987 | 0.574 | Validation |
| | 0.647 | 0.706 (0.6434–0.7694) | 0.849 | 0.471 | 0.584 | 0.780 | 0.692 | Test |
| Rad-adjacent | 0.822 | 0.908 (0.8680–0.9475) | 0.818 | 0.822 | 0.542 | 0.946 | 0.652 | Train |
| | 0.801 | 0.867 (0.8105–0.9232) | 0.929 | 0.771 | 0.491 | 0.978 | 0.642 | Validation |
| | 0.706 | 0.775 (0.7180–0.8312) | 0.790 | 0.632 | 0.653 | 0.775 | 0.715 | Test |
| ViT-2D-intra | 0.848 | 0.899 (0.8462–0.9521) | 0.891 | 0.836 | 0.583 | 0.968 | 0.705 | Train |
| | 0.815 | 0.795 (0.6922–0.8974) | 0.571 | 0.873 | 0.516 | 0.896 | 0.542 | Validation |
| | 0.612 | 0.604 (0.5344–0.6744) | 0.370 | 0.824 | 0.647 | 0.599 | 0.471 | Test |
| ViT-2D-adjacent | 0.751 | 0.843 (0.7888–0.8976) | 0.782 | 0.743 | 0.439 | 0.930 | 0.562 | Train |
| | 0.822 | 0.755 (0.6458–0.8639) | 0.536 | 0.890 | 0.536 | 0.890 | 0.536 | Validation |
| | 0.639 | 0.686 (0.6213–0.7512) | 0.832 | 0.471 | 0.579 | 0.762 | 0.683 | Test |
| ViT-3D-intra | 0.751 | 0.812 (0.7508–0.8734) | 0.709 | 0.762 | 0.433 | 0.911 | 0.538 | Train |
| | 0.664 | 0.755 (0.6673–0.8430) | 0.786 | 0.636 | 0.338 | 0.926 | 0.473 | Validation |
| | 0.627 | 0.658 (0.5915–0.7255) | 0.655 | 0.603 | 0.591 | 0.667 | 0.622 | Test |
| ViT-3D-adjacent | 0.866 | 0.850 (0.7808–0.9190) | 0.745 | 0.897 | 0.651 | 0.932 | 0.695 | Train |
| | 0.767 | 0.787 (0.6839–0.8906) | 0.679 | 0.788 | 0.432 | 0.912 | 0.528 | Validation |
| | 0.690 | 0.733 (0.6712–0.7940) | 0.664 | 0.713 | 0.669 | 0.708 | 0.667 | Test |
| Combine | 0.799 | 0.920 (0.8792–0.9612) | 0.945 | 0.762 | 0.505 | 0.982 | 0.658 | Train |
| | 0.877 | 0.900 (0.8170–0.9838) | 0.786 | 0.898 | 0.647 | 0.946 | 0.710 | Validation |
| | 0.745 | 0.821 (0.7692–0.8724) | 0.782 | 0.713 | 0.705 | 0.789 | 0.741 | Test |
2D, two dimensional; AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; OS, overall survival; PPV, positive predictive value; ViT, vision transformer; VPI, visceral pleural invasion.

Discussion
This study used preoperative CT imaging to analyze intra-tumoral subregions and peritumoral features, establishing VPI status models for cT1 stage solid lung adenocarcinoma. The AUCs of the Rad-adjacent and combined models reached 0.822 and 0.819, respectively. Additionally, OS prediction models were developed, demonstrating utility and generalization across multinational, multiethnic independent datasets. These models provide clinicians with robust stratification and prognostic information before surgery, aiding surgical decision-making.
It has been demonstrated that super-resolution techniques based on GANs could significantly enhance image resolution and improve the predictive efficacy of models utilized for tumor staging from imaging data (25,26). Additionally, a study has indicated that GANs can bolster the robustness of radiomics features in CT images (27). Evaluations across diverse medical imaging modalities, including CT, magnetic resonance imaging (MRI), and ultrasound, have indicated notable advancements in both image quality and spatial resolution (25-28). These studies highlight the promising potential of applying SRIR to medical image analysis, although further practical applications are needed to confirm clinical utility. Currently, few studies have used CT images to determine VPI status in lung adenocarcinoma with external validation, and most methodologies employ images with original resolution. These studies, which utilize deep learning or radiomics approaches to predict VPI status, have typically achieved AUC values below 0.70 in external validation cohorts (29,30). Choi et al. developed a deep learning model to predict VPI, achieving an AUC of 0.75, comparable to the performance of radiologists (31). However, this study included some patients with higher stages beyond cT1. To extract more valuable information from existing CT imaging data, this study employed GANs to enhance the resolution of the original images. Based on these enhanced images, intra-tumoral subregions were analyzed to obtain more detailed intra-tumoral feature information. Additionally, features from the peritumoral regions were extracted in this study.
For integrating feature information from these different regions, the fusion method used in modeling is crucial. Common fusion methods in image analysis, such as early, late, and intermediate fusion, may introduce redundant information, propagate noise, and fail to fully exploit inter-modal relationships. These methods often overlook the synergistic effects between multimodal data, leading to suboptimal fusion outcomes (32-34). Lin et al. developed a convolutional neural network model based on 3D-ResNet-9 to predict VPI status; in their external validation set, the clinical model achieved an AUC of 0.66, the deep learning model 0.62, and the combined model only 0.69 (29). Moreover, that study only examined the predictive performance of a 3D model. The present study employed a transformer-based fusion method that captures the relationships and dependencies between different subregions through an attention mechanism. In the external test cohort, the combined model (AUC: 0.819) and the Rad-adjacent model (AUC: 0.822) both demonstrated good predictive performance and generalization when predicting VPI status.
It has been reported that VPI is a prognostic factor in lung cancer patients (35,36). Previous studies on the CT semantic features of VPI suggested that these features could not independently predict the survival of patients with cT1 stage lung adenocarcinoma (37). This result is generally consistent with the findings of the present study. Based on VPI status, Lin et al. developed a deep learning model that achieved good prognostic stratification efficacy, with P=0.02 (29). This conclusion was also validated in this study using a multinational, multicenter external test cohort. Furthermore, this study extended the analysis to evaluate the prognostic prediction capability of this method for OS. Although the prediction results over long time scales remain unstable, the prediction efficacy for 5-year OS was maximized, with the combined model achieving an AUC of 0.821.
There are several limitations in this study. Firstly, the sample size included in this study is relatively small, and the retrospective nature of the study inevitably leads to selection bias in the population. Secondly, to fully consider the generalizability and robustness of the study conclusions, researchers should place greater emphasis on the impact of different CT scanners, institutions, and parameters on the results. Thirdly, this study still employed manual segmentation of the ROI, which brings about several long-standing issues that need to be addressed: time-consuming processes, subjectivity, poor reproducibility, and difficulties in scaling. Fourthly, the follow-up time for the training cohort included in this study was relatively short, which may limit the models’ predictive efficacy for OS. Fifthly, the methods used in this study require considerable computational resources and are time-consuming, which also limits their applicability.
Conclusions
In summary, the application of GANs for SRIR enhanced the detail of CT images. This facilitated a comprehensive analysis of tumor heterogeneity, incorporating peritumoral information with intra-tumoral sub-regional features. Consequently, the developed models demonstrated robust predictive capabilities for VPI status and OS in patients with cT1 stage lung adenocarcinoma. This approach offers valuable non-invasive preoperative decision-making support for surgeons.
Acknowledgments
The authors would like to extend their gratitude to the OneKey platform (https://github.com/OnekeyAI-Platform) for its invaluable contribution to this study.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/rc
Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/dss
Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2024-1122/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of our institutions [No. (2024) K83-1 (the Fifth Affiliated Hospital of Sun Yat-sen University) and No. SYSKY-2025-217-01 (the Sun Yat-sen Memorial Hospital)]. Individual consent for this retrospective analysis was waived.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Aokage K, Suzuki K, Saji H, et al. Segmentectomy for ground-glass-dominant lung cancer with a tumour diameter of 3 cm or less including ground-glass opacity (JCOG1211): a multicentre, single-arm, confirmatory, phase 3 trial. Lancet Respir Med 2023;11:540-9. [Crossref] [PubMed]
- Altorki N, Wang X, Kozono D, et al. Lobar or Sublobar Resection for Peripheral Stage IA Non-Small-Cell Lung Cancer. N Engl J Med 2023;388:489-98. [Crossref] [PubMed]
- Suzuki K, Watanabe SI, Wakabayashi M, et al. A single-arm study of sublobar resection for ground-glass opacity dominant peripheral lung cancer. J Thorac Cardiovasc Surg 2022;163:289-301.e2. [Crossref] [PubMed]
- Saji H, Okada M, Tsuboi M, et al. Segmentectomy versus lobectomy in small-sized peripheral non-small-cell lung cancer (JCOG0802/WJOG4607L): a multicentre, open-label, phase 3, randomised, controlled, non-inferiority trial. Lancet 2022;399:1607-17. [Crossref] [PubMed]
- Ito H, Suzuki K, Mizutani T, et al. Long-term survival outcome after lobectomy in patients with clinical T1 N0 lung cancer. J Thorac Cardiovasc Surg 2020; Epub ahead of print. [Crossref]
- Speicher PJ, Gu L, Gulack BC, et al. Sublobar Resection for Clinical Stage IA Non-small-cell Lung Cancer in the United States. Clin Lung Cancer 2016;17:47-55. [Crossref] [PubMed]
- Goldstraw P, Chansky K, Crowley J, et al. The IASLC Lung Cancer Staging Project: Proposals for Revision of the TNM Stage Groupings in the Forthcoming (Eighth) Edition of the TNM Classification for Lung Cancer. J Thorac Oncol 2016;11:39-51. [Crossref] [PubMed]
- Travis WD, Brambilla E, Noguchi M, et al. International association for the study of lung cancer/american thoracic society/european respiratory society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol 2011;6:244-85. [Crossref] [PubMed]
- Sun Q, Li P, Zhang J, et al. CT Predictors of Visceral Pleural Invasion in Patients with Non-Small Cell Lung Cancers 30 mm or Smaller. Radiology 2024;310:e231611. [Crossref] [PubMed]
- Onoda H, Higashi M, Murakami T, et al. Correlation between pleural tags on CT and visceral pleural invasion of peripheral lung cancer that does not appear touching the pleural surface. Eur Radiol 2021;31:9022-9. [Crossref] [PubMed]
- Bera K, Velcheti V, Madabhushi A. Novel Quantitative Imaging for Predicting Response to Therapy: Techniques and Clinical Applications. Am Soc Clin Oncol Educ Book 2018;38:1008-18. [Crossref] [PubMed]
- Han F, Wang H, Zhang G, et al. Texture feature analysis for computer-aided diagnosis on pulmonary nodules. J Digit Imaging 2015;28:99-115. [Crossref] [PubMed]
- Just N. Improving tumour heterogeneity MRI assessment with histograms. Br J Cancer 2014;111:2205-13. [Crossref] [PubMed]
- O'Connor JP, Rose CJ, Waterton JC, et al. Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome. Clin Cancer Res 2015;21:249-57. [Crossref] [PubMed]
- Syed AK, Whisenant JG, Barnes SL, et al. Multiparametric Analysis of Longitudinal Quantitative MRI data to Identify Distinct Tumor Habitats in Preclinical Models of Breast Cancer. Cancers (Basel) 2020;12:1682. [Crossref] [PubMed]
- Vaidya P, Bera K, Patil PD, et al. Novel, non-invasive imaging approach to identify patients with advanced non-small cell lung cancer at risk of hyperprogressive disease with immune checkpoint blockade. J Immunother Cancer 2020;8:e001343. [Crossref] [PubMed]
- Wang X, Xu C, Grzegorzek M, et al. Habitat radiomics analysis of pet/ct imaging in high-grade serous ovarian cancer: Application to Ki-67 status and progression-free survival. Front Physiol 2022;13:948767. [Crossref] [PubMed]
- Xie C, Yang P, Zhang X, et al. Sub-region based radiomics analysis for survival prediction in oesophageal tumours treated by definitive concurrent chemoradiotherapy. EBioMedicine 2019;44:289-97. [Crossref] [PubMed]
- Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta 2010;1805:105-17. [Crossref] [PubMed]
- Bakr S, Gevaert O, Echegaray S, et al. A radiogenomic dataset of non-small cell lung cancer. Sci Data 2018;5:180202. [Crossref] [PubMed]
- wangqingbaidu. OnekeyAI-Platform. 2024.7.31. Available online: https://github.com/OnekeyAI-Platform
- Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 2020;21:1-67.
- Liang Z, Zhao K, Liang G, et al. MAXFormer: Enhanced transformer for medical image segmentation with multi-attention and multi-scale features fusion. Knowledge-Based Systems 2023;280:110987.
- Carion N, Massa F, Synnaeve G, et al. End-to-End Object Detection with Transformers. In: Vedaldi A, Bischof H, Brox T, Frahm JM. (eds) Computer Vision–ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham.
- Ma C, Yang CY, Yang X, et al. Learning a no-reference quality metric for single-image super-resolution. Computer Vision and Image Understanding 2017;158:1-16.
- Fan M, Liu Z, Xu M, et al. Generative adversarial network-based super-resolution of diffusion-weighted imaging: Application to tumour radiomics in breast cancer. NMR Biomed 2020;33:e4345. [Crossref] [PubMed]
- de Farias EC, di Noia C, Han C, et al. Impact of GAN-based lesion-focused medical image super-resolution on the robustness of radiomic features. Sci Rep 2021;11:21361. [Crossref] [PubMed]
- Hou M, Zhou L, Sun J. Deep-learning-based 3D super-resolution MRI radiomics model: superior predictive performance in preoperative T-staging of rectal cancer. Eur Radiol 2023;33:1-10. [Crossref] [PubMed]
- Lin X, Liu K, Li K, et al. A CT-based deep learning model: visceral pleural invasion and survival prediction in clinical stage IA lung adenocarcinoma. iScience 2024;27:108712. [Crossref] [PubMed]
- Kong L, Xue W, Zhao H, et al. Predicting pleural invasion of invasive lung adenocarcinoma in the adjacent pleura by imaging histology. Oncol Lett 2023;26:438. [Crossref] [PubMed]
- Choi H, Kim H, Hong W, et al. Prediction of visceral pleural invasion in lung cancer on CT: deep learning model achieves a radiologist-level performance with adaptive sensitivity and specificity to clinical needs. Eur Radiol 2021;31:2866-76. [Crossref] [PubMed]
- Cichy RM, Pantazis D, Oliva A. Similarity-Based Fusion of MEG and fMRI Reveals Spatio-Temporal Dynamics in Human Cortex During Visual Object Recognition. Cereb Cortex 2016;26:3563-79. [Crossref] [PubMed]
- Owens A, Wu J, McDermott JH, et al. Ambient Sound Provides Supervision for Visual Learning. In: Computer Vision–ECCV 2016. Lecture Notes in Computer Science. Springer, Cham; 2016.
- Cichy RM, Pantazis D, Oliva A. Resolving human object recognition in space and time. Nat Neurosci 2014;17:455-62. [Crossref] [PubMed]
- Yoshida J, Nagai K, Asamura H, et al. Visceral pleura invasion impact on non-small cell lung cancer patient survival: its implications for the forthcoming TNM staging based on a large-scale nation-wide database. J Thorac Oncol 2009;4:959-63. [Crossref] [PubMed]
- Jiang L, Liang W, Shen J, et al. The impact of visceral pleural invasion in node-negative non-small cell lung cancer: a systematic review and meta-analysis. Chest 2015;148:903-11. [Crossref] [PubMed]
- Kim H, Goo JM, Kim YT, et al. CT-defined Visceral Pleural Invasion in T1 Lung Adenocarcinoma: Lack of Relationship to Disease-Free Survival. Radiology 2019;292:741-9. [Crossref] [PubMed]