Deep learning radiomics and 18F-FDG PET/CT imaging: mediastinal lymph node characteristics as predictors of metastasis in non-small cell lung cancer
Original Article

Furui Duan1#, Hanyu Zu2#, Rui Zhang1, Yan Li1, Yu Zhao1, Xuewei Wang3, Minghui Zhang4, Ping Li1, Dalong Wang1

1PET/CT Department, The Second Affiliated Hospital of Harbin Medical University, Harbin, China; 2Radiology Department, Binzhou Medical University Affiliated Hospital, Binzhou, China; 3Image Center Department, Affiliated Cancer Hospital of Harbin Medical University, Harbin, China; 4Department of Respiratory Medicine, Affiliated Cancer Hospital of Harbin Medical University, Harbin, China

Contributions: (I) Conception and design: F Duan, H Zu; (II) Administrative support: D Wang, P Li; (III) Provision of study materials or patients: M Zhang, Y Zhao; (IV) Collection and assembly of data: R Zhang, X Wang, Y Li; (V) Data analysis and interpretation: M Zhang, Y Li; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Dalong Wang, MD; Ping Li, MD. PET/CT Department, The Second Affiliated Hospital of Harbin Medical University, No. 246 Xuefu Road, Nangang District, Harbin 150086, China. Email: wangdalongbao@163.com; lipinghmu@163.com; Minghui Zhang, MD. Department of Respiratory Medicine, Affiliated Cancer Hospital of Harbin Medical University, No. 150 Haping Road, Nangang District, Harbin 150081, China. Email: zhmhui1985@163.com.

Background: In non-small cell lung cancer (NSCLC), accurate lymph node staging is vital for prognosis and treatment planning. However, positron emission tomography/computed tomography (PET/CT) is limited by false positives, and morphology-based criteria lack reliability. This study aimed to develop and validate a PET/CT-based deep learning radiomics (DLR) approach to distinguish benign from malignant lymph nodes.

Methods: A total of 217 hypermetabolic lymph nodes from 185 NSCLC patients were retrospectively analyzed. Radiomics and DenseNet121-based deep network features were extracted from PET/CT images. Clinical and imaging variables were selected using logistic regression (LR), correlation analysis, and recursive feature elimination (RFE). Nine machine learning models were trained and externally validated; diagnostic performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, and F1 score.

Results: Our study demonstrated that the artificial neural network (ANN) and extra trees (ET) models exhibited superior diagnostic performance in identifying suspected malignant lymph nodes in NSCLC patients. Specifically, the ANN achieved an AUC of 0.865, sensitivity of 66.7%, and accuracy of 82.0% on the test set, while the ET model performed best in the external validation set with an AUC of 0.865, sensitivity of 76.9%, and accuracy of 80.4%. Tumor location, lymph node long-to-short axis (L/S) ratio, and bilateral hilar 18F-fluorodeoxyglucose (FDG) uptake were significant predictors of nodal status. Correlation analysis showed that deep learning and radiomics features are complementary, suggesting their integration can significantly improve lung cancer diagnostic accuracy.

Conclusions: The PET/CT-based DLR model accurately differentiates benign from malignant lymph nodes, outperforming conventional methods. Combining DenseNet121-derived features with radiomics improves staging accuracy and aids personalized treatment planning.

Keywords: Non-small cell lung cancer (NSCLC); positron emission tomography/computed tomography (PET/CT); lymph node metastasis (LNM); radiomics; deep learning


Submitted Jun 03, 2025. Accepted for publication Aug 25, 2025. Published online Oct 29, 2025.

doi: 10.21037/tlcr-2025-650


Highlight box

Key findings

• A deep learning radiomics (DLR) model based on 18F-fluorodeoxyglucose positron emission tomography/computed tomography (PET/CT) imaging accurately identified metastatic potential in lymph nodes of patients with non-small cell lung cancer (NSCLC).

What is known and what is new?

• PET/CT plays a critical role in assessing nodal involvement in NSCLC, but conventional morphological evaluation remains limited in sensitivity. While previous studies focused on primary tumors, few have comprehensively analyzed lymph node-specific imaging features using deep learning.

• This study introduces a multi-feature fusion model that incorporates radiomics and deep learning to improve preoperative prediction of nodal metastasis.

What is the implication, and what should change now?

• The proposed model may enhance diagnostic confidence and reduce unnecessary invasive biopsies in NSCLC patients.

• Clinical decision-making in staging and treatment planning could benefit from the integration of radiomics-based lymph node analysis.

• Future clinical workflows should consider incorporating DLR to support individualized patient management.


Introduction

Lung cancer is the most common cancer worldwide and remains the leading cause of cancer-related deaths, accounting for the largest proportion of all cancer fatalities (1). Non-small cell lung cancer (NSCLC) is the most prevalent histological type, representing 85% of all lung cancer cases, with adenocarcinoma and squamous-cell carcinoma being the most common subtypes (2,3). Lymph node staging including N1 and N2 status plays a crucial role in the management of NSCLC, directly impacting prognosis prediction and clinical treatment planning. Therefore, accurate lymph nodal staging is critical for clinicians to determine appropriate treatment options for patients with NSCLC.

18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) is widely regarded as the most reliable functional imaging technique for assessing the status of mediastinal and hilar lymph nodes (4). This technique, which utilizes the increased glucose metabolism of malignant cells for non-invasive assessment, can simultaneously provide metabolic and anatomical information, making it particularly suitable for preoperative non-invasive diagnoses. However, inflammation, granulomas, and infectious diseases can also lead to false-positive results, which may affect survival rates when using PET/CT for surgical staging or resection in patients with NSCLC (3,5). False-positive FDG uptake in benign lymph nodes is a well-known issue, and overestimating the metastatic potential of lymph nodes can inappropriately exclude surgical treatment options. Since FDG uptake is not exclusive to cancer cells, false positives in FDG studies are inevitable (6). Accurately diagnosing mediastinal lymph nodes is crucial for selecting candidates for surgery. Lymph nodes that have a short diameter greater than 1 cm and exhibit high metabolic activity on PET scans, along with characteristics such as asymmetric distribution, low CT values, round morphology, and blurred margins, are typically considered metastatic. However, relying solely on these imaging indicators can lead to a one-sided and potentially misleading diagnosis. Therefore, it is crucial to consider additional treatment strategies for patients with a likely poor prognosis. To avoid overtreatment and ensure that patients receive appropriate care, we urgently need more accurate diagnostic methods to differentiate between benign and malignant lymph nodes.

Radiomics is a high-throughput technique that extracts quantitative features from medical images such as PET, CT, and magnetic resonance imaging (MRI), reflecting the underlying pathophysiological processes depicted in these images, thereby enabling the prediction of tumor biological characteristics, behavior, type, treatment response, and prognosis (7,8). In recent years, numerous studies have demonstrated that radiomics offers significant advantages in distinguishing lymph node metastasis (LNM) (5,9,10). A typical radiomics feature analysis includes assessing size, shape, and texture features, but the predefined nature of feature extraction, which cannot be adjusted by category and image, might limit the model’s ability to generalize when dealing with unknown or heterogeneous image data (11-13). If these features fail to capture all the variability that affects the disease phenotype, it could lead to inaccurate analysis results or limited applicability. Deep learning offers more flexible and adaptive feature extraction methods by automatically learning complex patterns from data (14). This method allows models to directly learn and identify key disease-related features from various types of images and pathological conditions, enhancing their ability to process heterogeneous data and to generalize (15). Moreover, the integration of deep learning with radiomics is increasingly widespread in medical image analysis. Deep learning radiomics (DLR) utilizes deep neural networks to directly extract high-throughput image features from medical images, without the need for additional feature extraction steps (16). This technology allows for high-dimensional quantification of radiological images and extraction of detailed features beyond human visual recognition. The application of DLR not only enhances the automation and accuracy of image analysis but also improves the model’s understanding of complex image information.

Although prior studies have explored LNM prediction using radiomics and deep learning techniques, the majority have focused on determining whether patients have metastatic lymph nodes at the case level (9,17,18). However, in clinical practice, even in patients with confirmed nodal metastasis, not all lymph nodes are involved; some may remain benign despite synchronous metastasis. Thus, patient-level assessments overlook node-level heterogeneity. Furthermore, existing node-level studies often suffer from small sample sizes, ambiguous inclusion/exclusion criteria, and lack of external validation, limiting their generalizability and clinical utility (19,20). The aim of this study was to develop and validate a DLR approach for more accurately distinguishing between benign and malignant lymph nodes with high metabolic activity in patients with NSCLC. This method aims to minimize overtreatment and ensure that patients receive precise and appropriate surgical and therapeutic interventions, thereby enhancing the accuracy and reliability of lymph node assessments. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-650/rc).


Methods

Patients

The study included NSCLC patients who underwent 18F-FDG PET/CT scans between September 2017 and November 2024 at the PET/CT Department of The Second Affiliated Hospital of Harbin Medical University and Affiliated Cancer Hospital of Harbin Medical University, hereafter referred to as Hospital 1 and Hospital 2. External validation was conducted at Hospital 2. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The Medical Ethics Committee of The Second Affiliated Hospital of Harbin Medical University approved this retrospective study (approval No. YJSKY 2024-035). Affiliated Cancer Hospital of Harbin Medical University was also informed and agreed to the study. The requirement for written informed consent was waived due to the retrospective nature of the study.

Inclusion criteria were as follows: (I) patients diagnosed with NSCLC confirmed by surgical pathology, with complete pathological evaluation of mediastinal lymph nodes; (II) preoperative 18F-FDG PET/CT scans performed (with the interval between PET/CT scans and surgery or biopsy being less than 3 weeks); (III) complete clinical and pathological data; and (IV) mediastinal lymph nodes showing FDG uptake higher than the background mediastinal activity. Exclusion criteria included: (I) a preoperative history of other malignancies besides lung cancer; (II) poor image quality; (III) patients with benign conditions known to cause increased FDG uptake, such as granulomatous diseases, active infections, or autoimmune disorders; and (IV) patients who had received chemotherapy or radiation before undergoing PET/CT. The specific process of determining the pathology of lymph nodes was as follows: (I) lymph nodes were classified according to the anatomical division defined by the 9th edition of the tumor-node-metastasis (TNM) staging system (21); (II) hypermetabolic lymph nodes on PET/CT were considered metastatic if pathological examination confirmed metastasis in the corresponding region, and considered non-metastatic if all dissected nodes in that region were pathologically negative, while regions without pathological confirmation were excluded from the analysis; and (III) for each lymph node station, the 1 to 3 lymph nodes with the highest maximum standardized uptake value (SUVmax) were selected for analysis.

The clinical and pathological data of the patients included various factors such as age, sex, smoking history, tumor history, family history, carcinoembryonic antigen (CEA) levels, pathological type, CT features (tumor long and short diameters, and tumor location), SUVmax, mean standardized uptake value (SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG). TLG was calculated using the standard formula: TLG = SUVmean × MTV. The long-to-short axis (L/S) ratio of lymph nodes was used to assess their morphological characteristics. Tumor location was categorized based on the position of the lesion center: lesions situated within the inner one-third of the lung parenchyma were defined as central lung cancers, whereas those beyond this zone were considered peripheral lung cancers. Histopathological examination served as the reference standard for confirming LNM. The inclusion and exclusion workflow is presented in Figure 1. A total of 217 lymph nodes from 185 patients were included in the study. Patients recruited from Hospital 1 were randomly divided into training and test cohorts at a 7:3 ratio, whereas those from Hospital 2 comprised the external validation cohort.
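As a worked illustration of the two derived variables defined above, the following minimal Python sketch computes TLG from SUVmean and MTV and the L/S ratio from the long and short diameters. The function names and the input values are hypothetical and are not taken from the study data.

```python
def total_lesion_glycolysis(suv_mean: float, mtv_cm3: float) -> float:
    """TLG = SUVmean x MTV, the formula stated in the text."""
    if suv_mean < 0 or mtv_cm3 < 0:
        raise ValueError("SUVmean and MTV must be non-negative")
    return suv_mean * mtv_cm3


def long_to_short_ratio(long_mm: float, short_mm: float) -> float:
    """L/S ratio of a lymph node; values near 1 indicate a rounder node."""
    if short_mm <= 0:
        raise ValueError("short diameter must be positive")
    if long_mm < short_mm:
        raise ValueError("long diameter must not be smaller than short diameter")
    return long_mm / short_mm


# Hypothetical inputs, for illustration only.
tlg = total_lesion_glycolysis(suv_mean=4.0, mtv_cm3=8.0)
ls_ratio = long_to_short_ratio(long_mm=14.0, short_mm=8.0)
print(f"TLG = {tlg:.2f}, L/S ratio = {ls_ratio:.2f}")
```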

Figure 1 Patient selection flowchart. Hospital 1, PET/CT Department of The Second Affiliated Hospital of Harbin Medical University. +, positive; −, negative. FDG, fluorodeoxyglucose; LN, lymph node; NSCLC, non-small cell lung cancer; PET/CT, positron emission tomography/computed tomography; SUVmax, maximum standardized uptake value; TNM, tumor-node-metastasis.

PET/CT image acquisition

The imaging was performed using a Siemens Biograph 64 mCT scanner (Siemens, Munich, Germany). All patients fasted for at least 6 hours prior to the administration of 18F-FDG, and only those with a blood glucose level below 11 mmol/L proceeded to scanning. A dose of 18F-FDG (0.11 mCi/kg, pH 5–7, radiochemical purity >95%) was injected intravenously. PET/CT acquisition was initiated 45–60 minutes post-injection, covering the region from the skull vertex to the proximal lower limbs. CT scans were acquired with a tube voltage of 120 kV, and tube current was modulated using CARE Dose. Following CT, PET data were obtained in a stepwise bed motion mode at 1.6 mm/s. Attenuation correction was performed based on the corresponding CT images. PET reconstruction was conducted using the TrueD algorithm [Digital Imaging and Communications in Medicine (DICOM) 3.0 format], with a matrix of 200×200 and voxel dimensions of 4.07 mm × 4.07 mm × 3 mm. The CT images were reconstructed at 512×512 with voxel dimensions of 0.78 mm × 0.78 mm × 0.7 mm. Final PET images were generated using the TrueX + TOF reconstruction method.

Tumor segmentation and radiomics feature extraction

PET and CT images were exported in their native resolution using the DICOM format. CT data were resampled to a slice thickness of 1 mm and PET data to 3 mm, employing B-spline interpolation.

Segmentation was independently carried out by two experienced radiologists, who were aware of the tumor location but blinded to all other clinical and pathological data. In addition to segmentation, the two radiologists also independently evaluated the status of the lymph nodes, including lymph node size, shape, margin, and the presence of bilateral hilar FDG uptake. Discrepancies in the evaluation were resolved through discussion; if no consensus could be reached, a senior radiologist was consulted. The two radiologists also jointly reviewed and negotiated the segmentation of each lesion. CT segmentation involved manual delineation on lung-window reconstructions (ITK-SNAP, v4.2). PET segmentation used semi-automated thresholding in LIFEx (v7.6), including all voxels with intensity ≥40% of SUVmax (22). Image preprocessing and extraction of radiomics features from each volume of interest were performed using the FAE software (version 0.5.2; https://github.com/salan668/FAE), which is a PyRadiomics-based open-source software (23). The complete DLR workflow of this study is illustrated in Figure 2.

Figure 2 Workflow of the DLR analysis. The process includes data acquisition, image segmentation, feature extraction, feature selection, model construction, and performance evaluation. Hospital 1, PET/CT Department of The Second Affiliated Hospital of Harbin Medical University; Hospital 2, Affiliated Cancer Hospital of Harbin Medical University. 3D, three-dimensional; AdaBoost, adaptive boosting; ANN, artificial neural network; AUC, area under the ROC curve; CI, confidence interval; Conv, convolution; CT, computed tomography; DLR, deep learning radiomics; DT, decision tree; ET, extra trees; FC, fully connected; GBM, gradient boosting machine; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; PET, positron emission tomography; RF, random forest; RFE, recursive feature elimination; ROC, receiver operating characteristic; SVM, support vector machine; Trans., transformer; XGBoost, extreme gradient boosting.
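The PET thresholding rule used for segmentation (all voxels at or above 40% of SUVmax) can be sketched in a few lines of Python. This is a simplified two-dimensional illustration under stated assumptions, not the LIFEx implementation: the function name `threshold_mask` and the SUV patch are hypothetical, and a real segmentation would operate on a three-dimensional region around the node.

```python
def threshold_mask(suv_slice, fraction=0.40):
    """Mark voxels whose SUV is >= fraction * SUVmax of the patch.

    Returns the boolean mask and the SUVmax that defined the cutoff.
    """
    suv_max = max(max(row) for row in suv_slice)
    cutoff = fraction * suv_max
    mask = [[value >= cutoff for value in row] for row in suv_slice]
    return mask, suv_max


# Hypothetical 3 x 4 patch of SUV values around a hypermetabolic node.
patch = [
    [0.8, 1.2, 1.0, 0.9],
    [1.1, 4.0, 5.0, 1.5],
    [0.7, 2.5, 1.9, 0.6],
]

mask, suv_max = threshold_mask(patch)

# Voxels kept by the mask define the volume of interest; their mean is SUVmean.
selected = [v for row, keep_row in zip(patch, mask)
            for v, keep in zip(row, keep_row) if keep]
suv_mean = sum(selected) / len(selected)
print(f"SUVmax = {suv_max}, voxels kept = {len(selected)}, SUVmean = {suv_mean:.2f}")
```

With a cutoff of 40% of the patch maximum, only the three voxels at or above 2.0 are retained here; the voxel at 1.9 falls just below the threshold.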

DenseNet121 for deep learning classification

To enhance the adaptability of our model to image variability, we implemented various data augmentation techniques, including random horizontal and vertical flipping. Prior to augmentation, per-channel intensity normalization was applied as a preprocessing step to standardize input data. These augmentations were applied during the training phase to improve the robustness and generalizability of the model. A convolutional neural network based on DenseNet121 was trained and validated using these augmented image patches. DenseNet121 employs a densely connected architecture that significantly enhances feature reuse and effectively mitigates the vanishing gradient problem, enabling the training of deeper neural networks even with limited sample sizes. Compared to traditional convolutional neural networks, DenseNet121 achieves comparable or superior performance with fewer parameters, thereby reducing the risk of overfitting and improving training efficiency and generalizability. In previous medical imaging studies, DenseNet121 has demonstrated excellent performance in tasks such as lung and breast cancer diagnosis and classification (24-26).

The DenseNet121 model was initialized with pretrained weights from the ImageNet dataset and fine-tuned using transfer learning, in which all layers were updated during training to adapt to the PET/CT imaging domain. The Adam optimizer was employed with an initial learning rate of 1×10⁻⁴. A learning rate decay strategy was employed, in which the learning rate was reduced by half every 100 epochs to ensure stable convergence. The batch size was set to 16, and the model was trained for a total of 1,000 epochs. The cross-entropy loss function was used as the optimization objective. During each training iteration, the loss was computed through forward propagation, followed by backpropagation using loss.backward() and parameter updates via optimizer.step(). The loss value for each epoch was recorded to facilitate real-time monitoring of the training process. The model parameters achieving the best performance on the validation set were automatically saved as the optimal checkpoint. During inference, a forward_hook was registered on the fully connected (FC) layer of the DenseNet121 model using PyTorch to extract deep feature representations. This mechanism enabled the capture of intermediate features during forward propagation, which were subsequently used for further analysis and classification. Specifically, the output of the penultimate FC layer was extracted as a high-level abstract encoding of the input image. Compared to single-modality clinical variables, these image-encoded network features provided a more comprehensive and integrative representation of the lesion, enhancing the model’s learning capacity and interpretability. A thresholding operation was then applied to generate final binary classification outputs. Additionally, image identifiers, intermediate feature vectors, and predicted labels were saved in a .csv file to support downstream statistical analysis and validation.

Throughout the training and validation process, key hyperparameters such as learning rate, weight decay, and batch size were adjusted as needed based on model performance, aiming to optimize both classification accuracy and generalization ability. The network and source codes are fully available (https://github.com/284496/deep-learning-radiomics).

All model training and inference procedures were implemented using Python (version 3.10) and PyTorch (version 2.6.0).
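The step decay described above (initial learning rate of 1×10⁻⁴, halved every 100 epochs, 1,000 epochs in total) can be written as a closed-form function of the epoch index. The sketch below is dependency-free Python for illustration; the function name `learning_rate` and the zero-based epoch convention are assumptions. In PyTorch, an equivalent schedule could be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)`, although the source does not state which scheduler class the authors used.

```python
def learning_rate(epoch: int, base_lr: float = 1e-4,
                  step: int = 100, gamma: float = 0.5) -> float:
    """Step decay: multiply the learning rate by gamma once every `step` epochs.

    Epochs are counted from 0 (an assumption), so epochs 0-99 use base_lr,
    epochs 100-199 use base_lr * gamma, and so on.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * gamma ** (epoch // step)


# Learning rate at a few selected epochs of a 1,000-epoch run.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 99, 100, 250, 999)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:4d}: lr = {lr:.3e}")
```

Over 1,000 epochs the rate is halved nine times, so the final epochs train at roughly 1/512 of the initial rate.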

Feature selection

Univariate and multivariable logistic regression (LR) analyses were performed to identify clinically independent risk factors. Z-scores were used to standardize the radiomics and deep network features of all patients. In this study, a systematic feature selection strategy was employed to enhance model robustness and minimize redundancy. First, the Mann-Whitney U test identified features with significant differences (P<0.05) between classes. Next, Spearman correlation analysis removed features with correlation coefficients >0.8 to reduce multicollinearity. Then, recursive feature elimination (RFE) with random forest (RF) was applied over 100 iterations, each using a 7:3 train-validation split. In each iteration, RFE selected the top 8 most informative features, and those appearing in at least 30 iterations were retained to ensure stability. Finally, the least absolute shrinkage and selection operator (LASSO) was applied, with the penalty parameter determined via 10-fold cross-validation (lambda.1se). This multi-step process reduced redundancy and improved statistical reliability, with the aim of optimizing model performance and generalization.
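The stability criterion in the RFE step (keep a feature only if it is selected in at least 30 of 100 iterations) amounts to a frequency count over the per-iteration selections. The sketch below illustrates only that counting step; the RFE runs themselves are replaced by a hypothetical, hard-coded selection pattern, and the feature names and the helper `stable_features` are illustrative assumptions rather than the authors' code.

```python
from collections import Counter


def stable_features(selections, min_count=30):
    """Return features selected in at least `min_count` iterations.

    `selections` is a list with one entry per iteration, each entry being
    the collection of feature names chosen in that iteration.
    """
    counts = Counter(name for chosen in selections for name in set(chosen))
    return sorted(name for name, n in counts.items() if n >= min_count)


# Hypothetical outcome of 100 RFE iterations (stand-in for real RFE runs).
selections = []
for i in range(100):
    chosen = ["CT_1", "PET_1"]              # selected in every iteration
    if i % 2 == 0:
        chosen.append("shape_sphericity")   # selected in 50 iterations
    if i % 5 == 0:
        chosen.append("glcm_contrast")      # selected in 20 iterations
    selections.append(chosen)

kept = stable_features(selections, min_count=30)
print(kept)
```

In this toy pattern the feature selected in only 20 iterations is dropped, while the other three pass the 30-iteration threshold.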

Model construction and evaluation

A total of nine machine learning algorithms were employed for model construction, namely LR, RF, support vector machine (SVM), adaptive boosting (AdaBoost), artificial neural network (ANN), extra trees (ET), decision tree (DT), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost). All models were implemented using Python (version 3.10) with the Scikit-learn and XGBoost libraries.

Hyperparameter tuning for each model was performed using grid search with five-fold cross-validation to determine the optimal parameter configuration. The receiver operating characteristic (ROC) curve was plotted by varying classification thresholds, and the area under the ROC curve (AUC) was calculated accordingly. In addition, several supplementary metrics were computed to provide a comprehensive evaluation of model performance, including accuracy, sensitivity, specificity, and F1 score. All metrics were calculated based on the test sets within the cross-validation framework. To explore the relationship between the deep network features obtained from CT and PET and the radiomics features, we conducted a Pearson correlation analysis.
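The threshold-dependent metrics listed above follow directly from the four cells of a confusion matrix. The sketch below shows the standard definitions in plain Python. The example counts are an assumption: they are back-calculated here from a cohort of 18 positive and 32 negative nodes so that they reproduce a sensitivity of 0.667 and a specificity of 0.906, and are not counts reported by the authors.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each metric needs a non-zero denominator")
    sensitivity = tp / (tp + fn)            # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Assumed counts for 18 positive and 32 negative nodes (see lead-in).
metrics = classification_metrics(tp=12, fp=3, tn=29, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```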

Statistical analysis

Statistical processing was performed in R 4.4.0. Distributional assumptions were checked with the Shapiro-Wilk test, and homogeneity of variances with the Levene test. Continuous outcomes are presented as mean ± standard deviation (SD) when Gaussian and as median [interquartile range (IQR)] when non-Gaussian; comparisons employed Student’s t-test or the Mann-Whitney U test accordingly. Categorical variables were analyzed using χ² tests on 2×2 tables. All tests were two-sided, with P<0.05 indicating significance. The Youden index was used to identify the optimal threshold.
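The Youden index is J = sensitivity + specificity − 1, and the optimal threshold is the cut-off that maximizes J. The authors performed their analysis in R; the following Python sketch only illustrates the idea on hypothetical scores and labels. It assumes that a higher score indicates the positive class; for a variable where lower values indicate positivity, the comparison would be reversed.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when its score is >= threshold.
    Labels are 1 for positive and 0 for negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (threshold, j, sensitivity, specificity)
    return best


# Hypothetical predicted scores and true labels.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

threshold, j, sensitivity, specificity = youden_threshold(scores, labels)
print(f"threshold = {threshold}, J = {j:.2f}, "
      f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```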


Results

Patient characteristics

In this study, a total of 217 lymph nodes from 185 patients (mean age 63.69 years, SD 7.60 years; 104 females) were identified. These lymph nodes were allocated to the training set (n=114), test set (n=50), and external validation set (n=53). Among these lymph nodes, 75 (34.88%) were positive for malignancy/metastasis. The baseline characteristics of the three datasets were similarly distributed, as shown in Table 1.

Table 1

Clinical characteristics of the patients

| Characteristics | Training set: LN− (n=74) | Training set: LN+ (n=40) | Training set: P value | Test set: LN− (n=32) | Test set: LN+ (n=18) | Test set: P value | External validation set: LN− (n=36) | External validation set: LN+ (n=17) | External validation set: P value |
|---|---|---|---|---|---|---|---|---|---|
| Age (years) | 64.11±6.83 | 62.93±6.92 | 0.61 | 66.59±7.87 | 58.72±6.96 | 0.002 | 65.17±7.66 | 51.43±9.23 | 0.078 |
| Sex | | | 0.24 | | | 0.16 | | | 0.71 |
| Female | 40 (54.05) | 17 (42.50) | | 19 (59.38) | 7 (38.89) | | 13 (43.33) | 8 (38.10) | |
| Male | 34 (45.95) | 23 (57.50) | | 13 (40.63) | 11 (61.11) | | 17 (56.67) | 13 (61.90) | |
| Tumor SUVmax | 8.21 (4.76–12.43) | 12.05 (7.38–15.94) | 0.009 | 9.37 (4.35–14.76) | 11.94 (8.16–14.21) | 0.21 | 9.70 (14.00–28.00) | 13.60 (17.00–32.00) | 0.005 |
| Tumor SUVmean | 4.10 (2.67–5.29) | 5.65 (3.75–7.00) | <0.001 | 4.63 (2.69–6.59) | 5.05 (3.54–5.95) | 0.52 | 4.46 (3.28–9.52) | 9.66 (6.57–10.48) | 0.001 |
| Smoking | | | 0.49 | | | 0.19 | | | 0.59 |
| No | 65 (87.84) | 26 (65.00) | | 15 (46.88) | 5 (27.78) | | 12 (40.00) | 10 (47.62) | |
| Yes | 9 (12.16) | 14 (35.00) | | 17 (53.13) | 13 (72.22) | | 18 (60.00) | 11 (52.38) | |
| CEA (ng/mL) | | | 0.24 | | | 0.31 | | | 0.017 |
| <5 | 59 (79.73) | 25 (62.50) | | 26 (81.25) | 12 (66.67) | | 25 (83.33) | 11 (52.38) | |
| ≥5 | 15 (20.27) | 15 (37.50) | | 6 (18.75) | 6 (33.33) | | 5 (16.67) | 10 (47.62) | |
| Tumor MTV (cm³) | 7.19 (3.52–17.61) | 16.01 (7.28–23.35) | 0.007 | 4.41 (2.05–13.00) | 12.06 (4.58–28.13) | 0.08 | 8.07 (4.10–16.30) | 8.99 (5.26–15.08) | 0.90 |
| Tumor TLG (g) | 25.32 (10.31–83.12) | 85.35 (38.42–130.57) | 0.002 | 24.59 (8.65–66.30) | 64.12 (30.83–114.62) | 0.059 | 51.17 (13.45–83.12) | 71.38 (46.22–212.98) | 0.15 |
| Tumor long diameter (mm) | 26.50 (20.00–36.00) | 34.00 (27.50–40.50) | 0.016 | 23.00 (18.50–33.50) | 30.50 (23.00–38.00) | 0.16 | 30.50 (27.00–40.00) | 30.00 (25.00–40.00) | 0.86 |
| Tumor short diameter (mm) | 17.00 (12.00–28.00) | 24.00 (16.50–32.50) | 0.015 | 15.50 (13.50–27.00) | 23.00 (15.00–29.00) | 0.22 | 22.00 (14.00–28.00) | 22.00 (17.00–32.00) | 0.65 |
| Location | | | 0.004 | | | 0.017 | | | 0.002 |
| Peripheral | 65 (87.84) | 26 (65.00) | | 28 (87.50) | 10 (55.56) | | 24 (80.00) | 8 (38.10) | |
| Central | 9 (12.16) | 14 (35.00) | | 4 (12.50) | 8 (44.44) | | 6 (20.00) | 13 (61.90) | |
| Pathology source | | | 0.29 | | | 0.16 | | | 0.009 |
| ADC | 60 (81.08) | 29 (72.50) | | 24 (75.00) | 10 (55.56) | | 17 (56.67) | 19 (90.48) | |
| SCC | 14 (18.92) | 11 (27.50) | | 8 (25.00) | 8 (44.44) | | 13 (43.33) | 2 (9.52) | |
| Bilateral hilar FDG uptake | | | <0.001 | | | 0.016 | | | 0.011 |
| No | 18 (24.32) | 25 (62.50) | | 7 (21.88) | 10 (55.56) | | 12 (40.00) | 16 (76.19) | |
| Yes | 56 (75.68) | 15 (37.50) | | 25 (78.13) | 8 (44.44) | | 18 (60.00) | 5 (23.81) | |
| Margin | | | 0.46 | | | 0.29 | | | 0.15 |
| Clear | 63 (85.14) | 36 (90.00) | | 31 (96.88) | 16 (88.89) | | 29 (96.67) | 17 (80.95) | |
| Unclear | 11 (14.86) | 4 (10.00) | | 1 (3.13) | 2 (11.11) | | 1 (3.33) | 4 (19.05) | |
| LN SUVmax | 4.64 (3.19–6.23) | 4.50 (3.33–5.84) | 0.956 | 4.81 (3.40–7.03) | 4.00 (2.72–8.84) | 0.75 | 4.59 (3.14–6.70) | 6.01 (5.22–8.49) | 0.014 |
| LN SUVmean | 3.07 (2.46–3.65) | 3.09 (2.35–3.74) | 0.95 | 3.35 (2.65–4.28) | 2.36 (2.21–4.86) | 0.37 | 2.82 (2.19–3.67) | 3.40 (3.08–5.01) | 0.022 |
| LN CT values (HU) | 47.50 (28.00–66.00) | 40.00 (19.00–65.00) | 0.35 | 49.50 (6.00–73.00) | 43.00 (14.00–51.00) | 0.29 | 53.00 (41.00–75.00) | 36.00 (11.00–45.00) | 0.006 |
| LN long diameter (mm) | 14.00 (12.00–18.00) | 13.00 (10.00–16.50) | 0.13 | 12.50 (11.00–17.50) | 14.00 (8.00–18.00) | 0.86 | 12.00 (10.00–14.00) | 15.30 (10.00–19.00) | 0.19 |
| LN short diameter (mm) | 8.00 (6.00–10.00) | 9.00 (6.00–11.00) | 0.61 | 8.00 (6.00–10.50) | 8.00 (7.00–11.00) | 0.77 | 7.00 (5.00–9.00) | 8.00 (7.00–12.00) | 0.15 |
| L/S ratio | 1.82 (1.43–2.10) | 1.53 (1.26–1.86) | 0.009 | 1.78 (1.33–2.20) | 1.67 (1.17–1.86) | 0.28 | 1.76 (1.38–2.10) | 1.60 (1.20–1.90) | 0.32 |

Data are presented as mean ± SD, n (%), or median (IQR). P value of the last column shows differences of variables in different sets. +, positive; −, negative. ADC, adenocarcinoma; CEA, carcinoembryonic antigen; CT, computed tomography; FDG, fluorodeoxyglucose; HU, Hounsfield units; IQR, interquartile range; L/S, long-to-short axis; LN, lymph node; MTV, metabolic tumor volume; SCC, squamous cell carcinoma; SD, standard deviation; SUVmax, maximum standardized uptake value; SUVmean, mean standardized uptake value; TLG, total lesion glycolysis.

Comparison between different models

Multivariate LR analysis demonstrated that the L/S ratio, tumor location, and bilateral hilar FDG uptake were independently associated with lymph node status, as shown in Table 2. The optimal L/S ratio cut-off for distinguishing true-positive from false-positive lymph nodes was 1.615 (sensitivity, 62.2%; specificity, 60.3%).

Table 2

Results of LR analysis for clinical characteristics

| Characteristic (reference level) | Univariate OR | Univariate 95% CI | Univariate P value | Multivariable OR | Multivariable 95% CI | Multivariable P value |
|---|---|---|---|---|---|---|
| Age | 1.00 | 0.97–1.05 | 0.66 | | | |
| Sex (male) | 1.78 | 0.93–3.40 | 0.08 | | | |
| Smoking (yes) | 1.58 | 0.80–3.14 | 0.19 | | | |
| Location (central) | 4.37 | 1.99–9.60 | <0.001* | 3.05 | 1.30–7.19 | 0.011 |
| Tumor long diameter | 1.00 | 0.99–1.01 | 0.80 | | | |
| Tumor short diameter | 1.02 | 0.99–1.05 | 0.11 | | | |
| CEA (≥5) | 2.30 | 1.12–4.71 | 0.023* | | | |
| Tumor SUVmax | 1.00 | 0.98–1.01 | 0.88 | | | |
| Tumor SUVmean | 1.18 | 1.04–1.34 | 0.014* | | | |
| Tumor MTV | 1.00 | 0.99–1.01 | 0.92 | | | |
| Tumor TLG | 1.00 | 1.00–1.01 | 0.94 | | | |
| Pathology source (SCC) | 1.86 | 0.91–3.83 | 0.09 | | | |
| Bilateral hilar FDG uptake (yes) | 0.20 | 0.10–0.41 | <0.001* | 0.25 | 0.12–0.51 | <0.001* |
| LN margin (unclear) | 0.90 | 0.32–2.54 | 0.85 | | | |
| LN SUVmax | 1.06 | 0.97–1.15 | >0.99 | | | |
| LN SUVmean | 0.99 | 0.98–1.01 | >0.99 | | | |
| LN CT values | 0.99 | 0.99–1.00 | 0.23 | | | |
| L/S ratio | 0.37 | 0.18–0.79 | 0.009 | 0.38 | 0.17–0.85 | 0.019 |
| LN long diameter | 0.95 | 0.88–1.01 | 0.15 | | | |
| LN short diameter | 1.05 | 0.94–1.17 | 0.38 | | | |

*, P<0.05. CEA, carcinoembryonic antigen; CI, confidence interval; CT, computed tomography; FDG, fluorodeoxyglucose; L/S, long-to-short axis; LN, lymph node; LR, logistic regression; MTV, metabolic tumor volume; OR, odds ratio; SCC, squamous cell carcinoma; SUVmax, maximum standardized uptake value; SUVmean, mean standardized uptake value; TLG, total lesion glycolysis.

A total of nine machine learning models were employed to predict lymph node status. The predictive efficacy of these models is presented in Figure 3, as illustrated by ROC curves and the corresponding AUC values. Overall, most models achieved high performance on the training set, with AUCs exceeding 0.90. However, varying degrees of performance degradation were observed on the test and external validation sets, reflecting differences in generalization ability.

Figure 3 ROC curves and performance metrics of various models across different datasets. (A) ROC curve of the training set; (B) ROC curve of the test set; (C) ROC curve of the external validation set. The lower panels show comparisons of model accuracy, sensitivity, specificity, precision, and F1 score. AdaBoost, adaptive boosting; ANN, artificial neural network; AUC, area under the ROC curve; CI, confidence interval; DT, decision tree; ET, extra trees; GBM, gradient boosting machine; LR, logistic regression; RF, random forest; ROC, receiver operating characteristic; SVM, support vector machine; XGBoost, extreme gradient boosting.

Notably, the ANN and ET models demonstrated relatively stable and superior performance across all datasets. On the test set, the ANN model achieved an AUC of 0.865 [95% confidence interval (CI): 0.745–0.961], with a sensitivity of 0.667, specificity of 0.906, precision of 0.800, F1 score of 0.727, and accuracy of 0.820. The ET model showed comparable performance, with an AUC of 0.793 (95% CI: 0.643–0.901), sensitivity of 0.556, specificity of 0.906, precision of 0.769, F1 score of 0.645, and accuracy of 0.780. On the external validation set, the ANN model achieved an AUC of 0.817 (95% CI: 0.661–0.897), with a sensitivity of 0.769, specificity of 0.800, precision of 0.800, F1 score of 0.784, and accuracy of 0.784. Similarly, the ET model yielded the highest external AUC of 0.865 (95% CI: 0.760–0.941), along with a sensitivity of 0.769, specificity of 0.840, precision of 0.833, F1 score of 0.800, and accuracy of 0.804. These results suggest that both models maintained strong discriminative power and balanced classification ability across test and external sets, supporting their potential for robust clinical application. In summary, ANN and ET emerged as the most reliable and generalizable models among the algorithms tested. ROC curves and detailed model performance metrics are presented in Figure 3.

Pearson correlation analysis was used to examine the relationships between features. For example, CT deep network feature 1 (CT_1) and CT deep network feature 2 (CT_2) had a correlation coefficient of 1.00, indicating a very strong linear relationship. Similarly, the correlation coefficient between PET deep network feature 1 (PET_1) and PET deep network feature 2 (PET_2) was 0.86, indicating a strong correlation between these features. In contrast, the correlations between the deep network features from CT and PET and the radiomics features were relatively low, with most coefficients ranging from 0.00 to 0.30. This suggests that the deep network features and radiomics features each provide relatively independent information, possibly reflecting different biological properties and pathological characteristics. This independence underscores the potential value of integrating features from different sources to improve diagnostic accuracy and disease classification. The details of the correlation analysis are provided in the appendix (Figure S1). Figure S2 presents the SHapley Additive exPlanations (SHAP)-based feature importance analysis alongside individual sample explanations.
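The pairwise correlation analysis described above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the feature values are hypothetical, and the 0.9 threshold used to flag highly correlated pairs is an assumption chosen only for demonstration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sx * sy)


def highly_correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical feature values for five lymph nodes.
ct_1 = [0.2, 0.5, 0.9, 1.4, 2.0]
features = {
    "CT_1": ct_1,
    "CT_2": [2 * v + 1 for v in ct_1],    # exact linear function of CT_1
    "RAD_1": [3.0, 1.0, 4.0, 1.0, 5.0],   # stand-in for a radiomics feature
}

for a, b, r in highly_correlated_pairs(features):
    print(f"{a} vs {b}: r = {r:.2f}")
```

In this toy example only the CT_1 and CT_2 pair is flagged, mirroring the pattern in which deep network features from the same modality are strongly correlated with each other but only weakly correlated with radiomics features.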


Discussion

In this study, we developed and validated a DLR-based approach for differentiating benign from malignant lymph nodes in patients with NSCLC using PET/CT imaging. Among the nine machine learning models evaluated, the ANN and ET models demonstrated the most robust and generalizable performance across the training, test, and external validation sets.

Multivariate LR analysis revealed that central tumor location positively predicts lymph node status, whereas a higher L/S ratio of the lymph nodes and the presence of bilateral hilar FDG uptake are associated with a negative prediction of lymph node status. First, centrally located tumors were more likely than peripheral tumors to be associated with lymph node positivity. This may be attributed to the anatomical proximity of central tumors to major bronchovascular structures and lymphatic drainage pathways, which facilitates earlier nodal involvement. Centrally located tumors also tend to exhibit more aggressive biological behavior, increasing the likelihood of regional spread. Second, regarding nodal shape, a lower L/S ratio (indicating a rounder node) is often considered suggestive of malignancy in conventional radiological assessments, whereas a higher L/S ratio, reflecting a more elongated morphology, is usually associated with benign reactive nodes. Our finding for the L/S ratio is consistent with this conventional understanding and suggests that such nodes may be morphologically benign yet metabolically active due to non-malignant processes. Third, bilateral hilar FDG uptake was identified as an independent predictor of false-positive lymph node findings on PET/CT: patients with symmetrical FDG uptake in the hilar region often had lymph nodes that were metabolically active on imaging but were pathologically confirmed as benign. Endoh et al. (27) likewise found that bilateral hilar FDG uptake was significantly associated with false-positive mediastinal lymph node findings on PET/CT and identified it as an independent predictor of false positivity (P=0.029). They suggested that such uptake patterns may reflect benign reactive conditions, such as anthracosis or inflammation, rather than true metastatic disease.
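The direction of these three associations can be summarized with the general form of a multivariate logistic regression model. The expression below is a sketch only: the symbols are notation introduced here for illustration, and the fitted coefficient values are not restated in this section.

```latex
\operatorname{logit} P(\text{node positive})
  = \beta_0
  + \beta_1\, x_{\text{central}}
  + \beta_2\, x_{\text{L/S}}
  + \beta_3\, x_{\text{hilar}},
\qquad \beta_1 > 0,\quad \beta_2 < 0,\quad \beta_3 < 0,
\qquad \mathrm{OR}_j = e^{\beta_j}
```

Here a positive coefficient corresponds to an odds ratio above 1 (central location), and a negative coefficient to an odds ratio below 1 (higher L/S ratio, bilateral hilar FDG uptake).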

Beyond traditional clinical and imaging parameters, our DLR framework demonstrated substantial value in the classification of lymph node status. The radiomics features and deep network features exhibited low Pearson correlation coefficients, indicating a weak linear relationship and suggesting that the two feature sets are complementary rather than interchangeable. This supports the use of our DLR framework, which learns high-dimensional feature representations directly from the imaging data and is therefore more adaptive and data-driven than conventional handcrafted radiomics features, which are predefined and may fail to capture complex or subtle imaging variations. This capability allows the model to better capture tumor heterogeneity and imaging context, thereby enhancing classification performance.

In recent years, radiologists worldwide have been facing an increasing volume and complexity of medical imaging, placing a substantial burden on modern healthcare systems. Moreover, even highly experienced radiologists are subject to inherent human limitations, including fatigue, perceptual biases, and cognitive errors, all of which can contribute to diagnostic inaccuracies (28,29). These challenges have strongly motivated the adoption of deep learning techniques to assist in various medical imaging tasks, with the aim of enhancing accuracy, efficiency, and consistency in clinical decision-making. Recent studies have also underscored the potential of deep learning in nodal evaluation across various cancer types. Zhong et al. (30) developed and validated a cross-modal deep learning model (DLNMS) based on PET/CT imaging to noninvasively predict occult nodal metastasis in patients with clinical N0 NSCLC. Li et al. (31) developed an automated deep learning system named FAIS-DL, which integrates multiregional dynamic contrast-enhanced MRI features of both the primary tumor and axillary lymph nodes, along with clinicopathological information, to predict axillary pathological complete response (pCR) after neoadjuvant chemotherapy in breast cancer patients. The system achieved a clinical benefit rate of up to 86.5%, demonstrating strong potential for personalized treatment decision-making and optimized axillary management. These studies collectively demonstrate that deep learning holds great promise for improving diagnostic accuracy by automatically extracting high-level features from multimodal imaging and integrating them with clinical information.

There are several limitations in this study. First, the sample size was relatively small, which may limit the statistical power and generalizability of the findings. Deep learning models, in particular, typically require large-scale datasets to achieve optimal performance and avoid overfitting. Second, although external validation was performed, all data were derived from two centers, and further multi-center prospective studies are warranted to confirm the robustness and applicability of the proposed model. Third, while our model integrates imaging and clinical features, it does not yet incorporate other potential biomarkers, such as genomic or pathological data, which could further enhance predictive accuracy. Future research should focus on expanding the dataset, integrating multimodal information, and improving the interpretability of the model to facilitate clinical translation.


Conclusions

In conclusion, this study developed and validated a DLR model based on PET/CT imaging for the preoperative classification of benign and malignant lymph nodes in patients with NSCLC. The model demonstrated superior predictive performance compared to traditional clinical models and conventional machine learning approaches, particularly with the ANN and ET classifiers. Multivariate analysis further identified tumor location, L/S ratio, and bilateral hilar FDG uptake as significant clinical predictors of nodal status. By integrating high-dimensional image features and key clinical variables, the proposed DLR framework offers a robust and noninvasive tool to support accurate lymph node staging and optimize treatment decision-making in NSCLC. These findings highlight the potential of deep learning-based approaches to advance personalized cancer care.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-650/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-650/dss

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-650/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2025-650/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Medical Ethics Committee of The Second Affiliated Hospital of Harbin Medical University (approval No. YJSKY 2024-035). Affiliated Cancer Hospital of Harbin Medical University was also informed and agreed to the study. The requirement for written informed consent was waived due to the retrospective nature of the study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin 2024;74:12-49.
  2. Eslami-S Z, Cortés-Hernández LE, Sinoquet L, et al. Circulating tumour cells and PD-L1-positive small extracellular vesicles: the liquid biopsy combination for prognostic information in patients with metastatic non-small cell lung cancer. Br J Cancer 2024;130:63-72.
  3. Petrella F, Rizzo S, Attili I, et al. Stage III Non-Small-Cell Lung Cancer: An Overview of Treatment Options. Curr Oncol 2023;30:3160-75.
  4. Huang M, Zou Y, Wang W, et al. The role of baseline 18F-FDG PET/CT for survival prognosis in NSCLC patients undergoing immunotherapy: a systematic review and meta-analysis. Ther Adv Med Oncol 2024;16:17588359241293364.
  5. Xie Y, Zhao H, Guo Y, et al. A PET/CT nomogram incorporating SUVmax and CT radiomics for preoperative nodal staging in non-small cell lung cancer. Eur Radiol 2021;31:6030-8.
  6. Iskender I, Kadioglu SZ, Cosgun T, et al. False-positivity of mediastinal lymph nodes has negative effect on survival in potentially resectable non-small cell lung cancer. Eur J Cardiothorac Surg 2012;41:874-9.
  7. Müller J, Leger S, Zwanenburg A, et al. Radiomics-based tumor phenotype determination based on medical imaging and tumor microenvironment in a preclinical setting. Radiother Oncol 2022;169:96-104.
  8. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278:563-77.
  9. Duan F, Zhang M, Yang C, et al. Non-invasive Prediction of Lymph Node Metastasis in NSCLC Using Clinical, Radiomics, and Deep Learning Features From 18F-FDG PET/CT Based on Interpretable Machine Learning. Acad Radiol 2025;32:1645-55.
  10. Wang H, Zhang J, Li Y, et al. Deep-learning features based on F18 fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) to predict preoperative colorectal cancer lymph node metastasis. Clin Radiol 2024;79:e1152-8.
  11. Ubaldi L, Valenti V, Borgese RF, et al. Strategies to develop radiomics and machine learning models for lung cancer stage and histology prediction using small data samples. Phys Med 2021;90:13-22.
  12. Sun F, Chen Y, Chen X, et al. CT-based radiomics for predicting brain metastases as the first failure in patients with curatively resected locally advanced non-small cell lung cancer. Eur J Radiol 2021;134:109411.
  13. Miao S, Jia H, Cheng K, et al. Deep learning radiomics under multimodality explore association between muscle/fat and metastasis and survival in breast cancer patients. Brief Bioinform 2022;23:bbac432.
  14. Hosny A, Aerts HJ, Mak RH. Handcrafted versus deep learning radiomics for prediction of cancer therapy response. Lancet Digit Health 2019;1:e106-7.
  15. Claudio Quiros A, Coudray N, Yeaton A, et al. Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unannotated pathology slides. Nat Commun 2024;15:4596.
  16. Thomas SM, Lefevre JG, Baxter G, et al. Interpretable deep learning systems for multi-class segmentation and classification of non-melanoma skin cancer. Med Image Anal 2021;68:101915.
  17. Li Y, Deng J, Ma X, et al. Diagnostic accuracy of CT and PET/CT radiomics in predicting lymph node metastasis in non-small cell lung cancer. Eur Radiol 2025;35:1966-79.
  18. Ma X, Xia L, Chen J, et al. Development and validation of a deep learning signature for predicting lymph node metastasis in lung adenocarcinoma: comparison with radiomics signature and clinical-semantic model. Eur Radiol 2023;33:1949-62.
  19. Ouyang ML, Wang YR, Deng QS, et al. Development and Validation of a 18F-FDG PET-Based Radiomic Model for Evaluating Hypermetabolic Mediastinal-Hilar Lymph Nodes in Non-Small-Cell Lung Cancer. Front Oncol 2021;11:710909.
  20. Wang D, Zhuang Z, Wu S, et al. A Dual-Energy CT Radiomics of the Regional Largest Short-Axis Lymph Node Can Improve the Prediction of Lymph Node Metastasis in Patients With Rectal Cancer. Front Oncol 2022;12:846840.
  21. Klug M, Kirshenboim Z, Truong MT, et al. Proposed Ninth Edition TNM Staging System for Lung Cancer: Guide for Radiologists. Radiographics 2024;44:e240057.
  22. Nioche C, Orlhac F, Boughdad S, et al. LIFEx: A Freeware for Radiomic Feature Calculation in Multimodality Imaging to Accelerate Advances in the Characterization of Tumor Heterogeneity. Cancer Res 2018;78:4786-9.
  23. Song Y, Zhang J, Zhang YD, et al. FeAture Explorer (FAE): A tool for developing and comparing radiomics models. PLoS One 2020;15:e0237587.
  24. Lu M, Zheng Y, Liu S, et al. Deep learning model for automated diagnosis of moyamoya disease based on magnetic resonance angiography. EClinicalMedicine 2024;77:102888.
  25. Li Z, Xie H, Wang Z, et al. Deep learning for multi-type infectious keratitis diagnosis: A nationwide, cross-sectional, multicenter study. NPJ Digit Med 2024;7:181.
  26. Han D, Li H, Zheng X, et al. Whole slide image-based weakly supervised deep learning for predicting major pathological response in non-small cell lung cancer following neoadjuvant chemoimmunotherapy: a multicenter, retrospective, cohort study. Front Immunol 2024;15:1453232.
  27. Endoh H, Yamamoto R, Ichikawa A, et al. Clinicopathologic Significance of False-Positive Lymph Node Status on FDG-PET in Lung Cancer. Clin Lung Cancer 2021;22:218-24.
  28. Yoon SY, Lee KS, Bezuidenhout AF, et al. Spectrum of Cognitive Biases in Diagnostic Radiology. Radiographics 2024;44:e230059.
  29. Song Z, Zhang W, Jiang Q, et al. Artificial intelligence-aided detection for prostate cancer with multimodal routine health check-up data: an Asian multi-center study. Int J Surg 2023;109:3848-60.
  30. Zhong Y, Cai C, Chen T, et al. PET/CT based cross-modal deep learning signature to predict occult nodal metastasis in lung cancer. Nat Commun 2023;14:7513.
  31. Li Z, Gao J, Zhou H, et al. Multiregional dynamic contrast-enhanced MRI-based integrated system for predicting pathological complete response of axillary lymph node to neoadjuvant chemotherapy in breast cancer: multicentre study. EBioMedicine 2024;107:105311.

Cite this article as: Duan F, Zu H, Zhang R, Li Y, Zhao Y, Wang X, Zhang M, Li P, Wang D. Deep learning radiomics and 18F-FDG PET/CT imaging: mediastinal lymph node characteristics as predictors of metastasis in non-small cell lung cancer. Transl Lung Cancer Res 2025;14(10):4301-4314. doi: 10.21037/tlcr-2025-650
