Prediction of peak oxygen uptake using interpretable machine learning on routinely available preoperative assessments before lung resection
Original Article

Se-Hun Kim1, Sa-Eun Park1, Cho-Hui Hong2, Tae-Sung Park3, Myung-Jun Shin3, Ki-Hun Kim1,4, Sang-Hun Kim3

1Industrial Engineering, Pusan National University, Busan, Republic of Korea; 2Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea; 3Department of Rehabilitation Medicine, Biomedical Research Institute, Pusan National University Hospital, Pusan National University School of Medicine, Busan, Republic of Korea; 4Graduate School of Data Science, Pusan National University, Busan, Republic of Korea

Contributions: (I) Conception and design: All authors; (II) Administrative support: All authors; (III) Provision of study materials or patients: All authors; (IV) Collection and assembly of data: CH Hong; (V) Data analysis and interpretation: Se-Hun Kim, SE Park, KH Kim, Sang-Hun Kim; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Ki-Hun Kim, PhD. Industrial Engineering, Pusan National University, Busan, Republic of Korea; Graduate School of Data Science, Pusan National University, Busan, Republic of Korea. Email: kihun@pusan.ac.kr; Sang-Hun Kim, MD, PhD. Department of Rehabilitation Medicine, Biomedical Research Institute, Pusan National University Hospital, Pusan National University School of Medicine, Busan, Republic of Korea. Email: kel5504@gmail.com.

Background: Cardiopulmonary exercise testing is the reference standard for preoperative functional assessment before lung resection, but its use is limited by resource and practical constraints. This study developed and internally evaluated an interpretable machine learning model to estimate preoperative peak oxygen uptake from routine clinical assessments, quantify the contribution of individual predictors to model estimation, and evaluate classification performance at the clinically relevant threshold of 20 mL·kg−1·min−1.

Methods: This single-centre retrospective study included 320 consecutive patients in South Korea who underwent preoperative treadmill cardiopulmonary exercise testing between April 2018 and March 2024. Thirty-three routinely available predictors, including demographic characteristics, anthropometric measures, pulmonary function results, and bioimpedance-derived indices, were used to train regression models. Model development employed 10 repeats of 5-fold nested cross-validation. The best-performing model was interpreted using Shapley additive explanations. Potential prioritisation performance was assessed by classifying patients with peak oxygen uptake below the prespecified threshold.

Results: Random forest showed the best performance, with a root mean square error of 3.750±0.731, a mean absolute error of 2.901±0.599, and a coefficient of determination of 0.323±0.153, indicating moderate explanatory performance for continuous VO2peak estimation. The most influential predictors were forced expiratory volume in 1 second, forced vital capacity, age, and bioimpedance-derived phase angle. At the threshold of 20 mL·kg−1·min−1, the model achieved an area under the receiver operating characteristic curve of 0.842±0.072.

Conclusions: This model showed moderate performance for estimating peak oxygen uptake, identified clinically relevant contributors, and may support prioritisation for CPET and consideration of prehabilitation assessment when resources are limited.

Keywords: Thoracic surgery; preoperative risk assessment; interpretable machine learning (IML); peak oxygen uptake (VO2peak); bioelectrical impedance analysis


Submitted Mar 30, 2026. Accepted for publication May 14, 2026. Published online May 28, 2026.

doi: 10.21037/tlcr-2026-0351


Highlight box

Key findings

• An interpretable machine learning model was developed to estimate preoperative peak oxygen uptake (VO2peak) from routinely available demographic, spirometric, and bioimpedance-derived variables in patients undergoing lung resection, while quantifying the contribution of individual predictors.

• Random forest showed the best overall regression performance, with moderate explanatory performance for continuous VO2peak estimation, and achieved reasonable discrimination at the 20 mL·kg−1·min−1 threshold.

• Forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), age, and bioimpedance-derived phase angle were among the most influential predictors.

What is known and what is new?

• Cardiopulmonary exercise testing (CPET) is the reference standard for functional assessment before lung resection, but access may be limited by equipment, personnel, and workflow constraints.

• This study provides an interpretable, exploratory approach for estimating CPET-derived VO2peak using routine preoperative data and suggests that bioimpedance-derived variables may provide complementary information beyond conventional spirometry.

What is the implication, and what should change now?

• The model may support prioritisation for formal CPET and inform consideration of prehabilitation assessment when testing capacity is limited.

• It should complement, not replace, standard physiologic evaluation before lung resection, and requires external validation before clinical implementation.


Introduction

Lung cancer remains the leading cause of cancer-related mortality worldwide, and surgical resection is central to curative treatment in operable disease. However, lung resection is associated with postoperative morbidity and mortality, particularly cardiopulmonary complications such as pneumonia and respiratory failure, which are strongly related to limited cardiopulmonary reserve (1,2). Accurate preoperative functional assessment is therefore essential to support patient selection, perioperative planning, and risk-informed care.

Cardiopulmonary exercise testing (CPET) is regarded as the reference standard for preoperative functional evaluation before major thoracic surgery (3). Among CPET-derived variables, peak oxygen uptake (VO2peak) is strongly associated with postoperative outcomes and is incorporated into clinical guidelines for patients with impaired pulmonary function or elevated surgical risk (4,5). Despite its clinical value, however, CPET is not universally available because it requires specialized equipment, trained personnel, and time-intensive protocols. As a result, access varies across institutions and may delay preoperative pathways (6). In contemporary lung resection cohorts, the incremental predictive value of CPET-derived variables beyond routinely available clinical and pulmonary function data may be less pronounced than previously assumed (7). Rather than diminishing the importance of CPET, this perspective underscores the need for complementary tools that may support more efficient and targeted use of CPET based on accessible preoperative information.

Several preoperative measures are already available in routine care, including demographic characteristics; anthropometrics; bioimpedance-derived body composition indices, which may reflect muscle mass, fluid distribution, and tissue-related characteristics (8); pulmonary function tests (3); and physical performance metrics (9). Nevertheless, exercise capacity in patients with lung cancer is influenced by complex disease-specific factors often absent in healthy cohorts (10), including coexisting chronic obstructive pulmonary disease (COPD), cumulative smoking history, cancer-associated cachexia, and frailty. Consequently, prediction equations derived from general populations, which often fail to account for ethnic and temporal variations in exercise capacity (11), may not be directly applicable to patients being evaluated for lung resection. Moreover, evidence in lung resection cohorts remains limited regarding practical approaches that use routine preoperative data to generate clinically interpretable estimates of functional reserve.

In developing such tools, conventional prediction equations in preoperative care have largely relied on linear regression approaches. While highly interpretable, these linear models often fail to capture the complex, non-linear interrelationships among physiological variables in a heterogeneous patient population. Although standard machine learning models can address this non-linearity, their “black-box” nature limits clinical understandability and hinders clinician trust. Interpretable machine learning (IML) offers a solution by modeling non-linear relationships while preserving transparency in how predictors contribute to the estimate (12). In preoperative care, where decisions regarding further functional testing and perioperative optimisation must remain clinically justifiable, this transparency is crucial. Ultimately, rather than replacing formal CPET, an IML model based on routinely available data may help preliminarily prioritise patients who should undergo formal CPET and may inform consideration of prehabilitation, pending further validation.

This study aimed to develop and internally evaluate an IML model for estimating preoperative VO2peak using routinely available assessments in patients scheduled for lung resection, with the goal of supporting prioritisation for formal CPET rather than replacing CPET. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/rc).


Methods

This study aimed to develop a predictive model to estimate measured preoperative VO2peak using routinely available clinical assessments. The overall study process comprised four steps: (I) collection of preoperative clinical data, including demographic characteristics, anthropometrics, body composition indices, and pulmonary function test results; (II) development and internal evaluation of prediction models for estimating VO2peak; (III) application of SHapley Additive exPlanations (SHAP) (13) to interpret the best-performing model; and (IV) exploratory assessment of threshold-based classification performance at the 20 mL·kg−1·min−1 VO2peak cutoff to examine the model’s potential role in CPET prioritisation. This approach was intended to explore whether routinely available preoperative data could provide supportive information for prioritising formal CPET when access to CPET is constrained.

Study population and baseline characteristics

This single-centre, retrospective study analysed a consecutive cohort of 320 patients scheduled for lung resection at Pusan National University Hospital between April 2018 and March 2024. The inclusion criteria were: (I) completion of preoperative CPET; (II) availability of demographics, anthropometrics, body composition, and pulmonary function test (PFT) results obtained within 6 months of the CPET; and (III) availability of data recorded before surgery. When multiple examinations were available for a single patient, measurements temporally closest to the CPET were selected. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Pusan National University Hospital (IRB No. 2407-016-141), and individual consent for this retrospective analysis was waived. A complete-case analysis was performed. In the source cohort, 688 consecutive patients scheduled for lung resection were initially screened. Patients were included in the final analytic cohort only if they had completed preoperative CPET and had all required demographic, anthropometric, bioimpedance-derived body composition, pulmonary function, and CPET-derived VO2peak data available. Patients with missing required predictor or outcome variables were excluded, resulting in a final complete-case analytic cohort of 320 patients. No statistical imputation, including mean imputation, multiple imputation, or model-based imputation, was performed.

Although the eligibility criterion allowed routine preoperative assessments performed within 6 months of CPET, the actual intervals were substantially shorter in most patients. We separately calculated the intervals between CPET and the major routine preoperative assessments, including pulmonary function testing and bioimpedance analysis (BIA). The median interval between CPET and pulmonary function testing was 11.0 days [interquartile range (IQR), 5.0–27.0 days], and 96.0% of patients underwent pulmonary function testing within 3 months of CPET. The median interval between CPET and BIA was 0.0 days (IQR, 0.0–0.0 days), and 99.3% of patients underwent BIA within 3 months of CPET. The maximum intervals were 167.0 days for pulmonary function testing and 164.0 days for BIA.

CPET

All patients underwent symptom-limited CPET on a treadmill using a breath-by-breath gas analysis system (COSMED, Rome, Italy). Tests were conducted according to a modified Bruce protocol. Gas analyzers were calibrated before each assessment. Breath-by-breath data were time-averaged over 10-second intervals. The primary outcome, VO2peak, was defined as the highest 30-second averaged VO2 value obtained at test termination. This value was extracted directly from the standardized system report (PFT Ergo 10.0).
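As an illustration of the outcome definition only, the sketch below takes a series of 10-second averaged VO2 values and returns the highest mean over three consecutive bins (30 seconds). In the study the value was read directly from the system report, so the function name `peak_vo2`, the rolling-window logic, and the sample values are assumptions made for this example.

```python
def peak_vo2(vo2_10s, window=3):
    """Return the highest mean over `window` consecutive 10-s averaged values.

    With window=3 this corresponds to a 30-second rolling average.
    """
    if len(vo2_10s) < window:
        raise ValueError("need at least one full window of data")
    rolling_means = [
        sum(vo2_10s[i:i + window]) / window
        for i in range(len(vo2_10s) - window + 1)
    ]
    return max(rolling_means)


# Hypothetical 10-s averaged VO2 values (mL·kg−1·min−1) near the end of a test
samples = [18.0, 19.5, 21.0, 22.5, 24.0, 23.4, 22.8]
vo2peak = peak_vo2(samples)
print(f"VO2peak (30-s average): {vo2peak:.1f}")
```

Using a 30-second window rather than the single highest 10-second bin reduces the influence of one noisy interval on the outcome.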

Bioimpedance-derived indices

BIA was performed using a multifrequency device (InBody S10, InBody Co., Seoul, Republic of Korea) with the patient in the supine position using tetrapolar electrodes after at least 2 hours of fasting. Patients were instructed to avoid strenuous exercise for at least 12 hours before testing. The following BIA-derived indices were recorded:

  • Phase angle: calculated as the arctangent of the ratio of reactance to resistance at 5, 50, and 250 kHz for each body segment, including right arm (RA), left arm (LA), right leg (RL), left leg (LL), trunk (TR), and whole body (WB). Higher values are generally considered to reflect better cellular membrane integrity and nutritional or functional status.
  • Extracellular water-to-total body water ratio (ECW/TBW): defined as the ratio of extracellular water to total body water, expressed globally and segmentally. Elevated ratios may indicate fluid imbalance or edema.
  • Skeletal muscle index (SMI): defined as appendicular skeletal muscle mass divided by height squared (kg/m²), representing relative muscle mass.
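The three index definitions above can be written out as a short sketch. The function names and the input values are hypothetical and chosen only to show the arithmetic; in practice the device reports these indices directly.

```python
import math


def phase_angle_deg(reactance_ohm, resistance_ohm):
    """Phase angle in degrees: arctangent of reactance over resistance."""
    return math.degrees(math.atan(reactance_ohm / resistance_ohm))


def ecw_tbw_ratio(ecw_l, tbw_l):
    """Extracellular water divided by total body water (dimensionless)."""
    return ecw_l / tbw_l


def skeletal_muscle_index(asm_kg, height_m):
    """Appendicular skeletal muscle mass divided by height squared (kg/m²)."""
    return asm_kg / height_m ** 2


# Hypothetical inputs, not taken from the study cohort
pa = phase_angle_deg(reactance_ohm=50.0, resistance_ohm=500.0)
ratio = ecw_tbw_ratio(ecw_l=13.65, tbw_l=35.0)
smi = skeletal_muscle_index(asm_kg=20.0, height_m=1.60)
print(f"phase angle {pa:.2f}°, ECW/TBW {ratio:.3f}, SMI {smi:.2f} kg/m²")
```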

Model development and evaluation

Predictive performance was evaluated using 10 repeats of 5-fold nested cross-validation, with five outer folds used for performance estimation and five inner folds used for hyperparameter tuning. Each repeat was generated from an independent random partition to improve the robustness of performance estimates. As VO2peak was treated as a continuous outcome, all models were trained for regression. Continuous predictors were z-score standardised using statistics derived from each training split and then applied to the corresponding validation and test data. Sex was treated as a categorical variable and encoded as an indicator variable without standardisation.

Multiple linear regression (MLR) was used as a baseline model commonly employed in medical statistics. Additional models included k-nearest neighbour regression (kNN), support vector regression (SVR), random forest (RF), extreme gradient boosting (XGB), multi-layer perceptron (MLP), and tabular deep-learning architectures [TabTransformer (14), FT-Transformer (15), TabNet (16)]. Hyperparameters were optimised using Optuna within the inner cross-validation loop. Each model was then refitted with the selected hyperparameters on the full outer-training split and evaluated on the corresponding held-out outer test fold.
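The nested design described above can be sketched with scikit-learn as follows. This is a minimal illustration under several assumptions, not the study's pipeline: it uses synthetic data, a small grid search (`GridSearchCV`) in place of Optuna, two repeats instead of ten, a hypothetical search space, and it standardises every column, whereas the study left the sex indicator unscaled.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, RepeatedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study cohort)
X, y = make_regression(n_samples=120, n_features=6, noise=10.0, random_state=0)

# Scaling lives inside the pipeline, so its statistics are always
# estimated from the training portion of each split only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=30, random_state=0)),
])
param_grid = {"model__max_depth": [3, None]}  # hypothetical search space

# Outer loop: performance estimation (the study used 10 repeats of 5 folds)
outer_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
rmse_per_fold = []
for train_idx, test_idx in outer_cv.split(X):
    # Inner loop: hyperparameter tuning on the outer-training split only
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(
        pipeline, param_grid, cv=inner_cv,
        scoring="neg_root_mean_squared_error",
    )
    # refit=True (the default) refits the best setting on the full
    # outer-training split before predicting the held-out fold.
    search.fit(X[train_idx], y[train_idx])
    y_pred = search.predict(X[test_idx])
    rmse_per_fold.append(float(np.sqrt(mean_squared_error(y[test_idx], y_pred))))

print(f"outer-fold RMSE: {np.mean(rmse_per_fold):.2f} "
      f"± {np.std(rmse_per_fold, ddof=1):.2f}")
```

Keeping the tuning inside the outer-training split means the held-out fold never influences hyperparameter selection, which is the reason for nesting the two loops.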

The full predictor set (Set F), comprising demographic, anthropometric, spirometric, and bioimpedance-derived body composition variables, was used for the primary model-development analysis. To assess whether this full predictor set provided incremental value beyond simpler variable combinations, we additionally conducted feature-set comparison analyses using five reduced comparator sets:

  • Set A: demographic and anthropometric variables;
  • Set B: spirometric variables only, including forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), and FEV1/FVC;
  • Set C: bioimpedance-derived body composition variables only;
  • Set D: demographic and anthropometric variables combined with spirometric variables;
  • Set E: demographic and anthropometric variables combined with bioimpedance-derived body composition variables;
  • Set F: full predictor set.

Sets A–E were used as comparator analyses, whereas Set F represented the main analysis. For the feature-set comparison, we focused on RF, MLR, and SVR, which were the three leading regression models in the full predictor-set analysis. The hyperparameter search spaces for the evaluated models are summarized in Table S1.
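One way to express the six predictor sets is as unions of three variable groups, as in the sketch below. The column names are hypothetical and the bioimpedance group is abbreviated to three variables for brevity; the study's Set C contained the full list of bioimpedance-derived indices.

```python
# Hypothetical column names; the bioimpedance group is an abbreviated subset
DEMOGRAPHIC_ANTHROPOMETRIC = ["age", "sex", "height", "weight"]
SPIROMETRIC = ["fev1", "fvc", "fev1_fvc"]
BIOIMPEDANCE = ["phase_angle_50khz_rl", "ecw_tbw", "smi"]

FEATURE_SETS = {
    "A": DEMOGRAPHIC_ANTHROPOMETRIC,
    "B": SPIROMETRIC,
    "C": BIOIMPEDANCE,
    "D": DEMOGRAPHIC_ANTHROPOMETRIC + SPIROMETRIC,
    "E": DEMOGRAPHIC_ANTHROPOMETRIC + BIOIMPEDANCE,
    "F": DEMOGRAPHIC_ANTHROPOMETRIC + SPIROMETRIC + BIOIMPEDANCE,
}

for name, columns in FEATURE_SETS.items():
    print(f"Set {name}: {len(columns)} predictors")
```

Each comparator set would then be passed through the same cross-validation procedure as Set F, so that differences in performance reflect the predictors rather than the evaluation protocol.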

Interpretable post-hoc analysis of the best prediction model

For clinical interpretation, the RF, which achieved the best overall performance, was selected as the final predictive model. To maximise clinical interpretability, this study applied SHAP to quantify how each variable increased or decreased the model’s predicted VO2peak relative to the average prediction. For tree-based models, the TreeSHAP algorithm was used to compute feature-specific contribution values for each prediction. Global importance was summarised by the mean absolute SHAP value, and the sign of each SHAP value indicated the direction of its association with the predicted outcome.
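To make the quantities in this paragraph concrete, the sketch below computes exact Shapley values by brute-force enumeration for a toy three-feature model, then summarises global importance as the mean absolute value per feature. It is not TreeSHAP and not the study's model: `toy_model`, its coefficients, and the background rows are hypothetical, and "absent" features are filled in from a background sample, which is one common convention.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one observation `x`.

    Features outside a coalition are replaced by values from each
    background row, and the predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for b in background:
            mixed = [x[j] if j in coalition else b[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def toy_model(row):
    """Hypothetical non-linear stand-in for a fitted regressor."""
    fev1, age, phase_angle = row
    return 10.0 + 4.0 * fev1 - 0.1 * age + 0.5 * fev1 * phase_angle


# Hypothetical rows: FEV1 (L), age (years), phase angle (degrees)
background = [[1.5, 75, 4.5], [2.0, 70, 5.5], [2.5, 65, 6.0], [3.0, 60, 6.5]]
baseline = sum(toy_model(b) for b in background) / len(background)

all_phi = [shapley_values(toy_model, row, background) for row in background]
mean_abs_shap = [
    sum(abs(phi[j]) for phi in all_phi) / len(all_phi) for j in range(3)
]
print("mean(|SHAP|) per feature:", [round(v, 3) for v in mean_abs_shap])
```

For every row, the contributions sum to the difference between that row's prediction and the baseline (average) prediction, which is the additivity property that allows each SHAP value to be read as an upward or downward shift in predicted VO2peak.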

Statistical analysis


Results

Baseline characteristics

Baseline characteristics are shown in Table 1. A total of 320 patients were included (mean age 69.1±7.6 years; 65.9% male), with a mean VO2peak of 23.4±4.9 mL·kg−1·min−1.

Table 1

Baseline characteristics of the study cohort

Characteristics and CPET values Female (N=109) Male (N=211) All (N=320)
Demographics & anthropometrics
   Age (years) 67.174±7.625 70.128±7.469 69.122±7.640
   Height (cm) 154.673±5.253 165.488±10.915 161.804±10.683
   Weight (kg) 57.734±9.010 66.082±14.228 63.238±13.283
Body composition indices
   250kHz-la phase angle (°) 6.386±2.071 6.502±1.298 6.463±1.601
   250kHz-ll phase angle (°) 4.415±0.961 4.529±0.918 4.490±0.933
   250kHz-ra phase angle (°) 6.600±2.285 6.655±1.443 6.636±1.771
   250kHz-rl phase angle (°) 4.807±1.002 4.878±1.103 4.854±1.069
   250kHz-tr phase angle (°) 3.654±4.623 3.892±3.218 3.811±3.750
   50kHz-la phase angle (°) 5.170±2.443 5.534±1.540 5.410±1.900
   50kHz-ll phase angle (°) 5.458±1.172 5.846±1.188 5.714±1.195
   50kHz-ra phase angle (°) 5.350±2.354 5.738±1.785 5.606±2.002
   50kHz-rl phase angle (°) 5.705±1.193 6.063±1.288 5.941±1.266
   50kHz-tr phase angle (°) 5.362±5.051 5.542±3.146 5.481±3.893
   50kHz-whole body phase angle (°) 5.428±1.893 5.796±1.418 5.671±1.603
   5kHz-la phase angle (°) 2.157±1.531 2.329±1.095 2.271±1.261
   5kHz-ll phase angle (°) 2.328±0.704 2.525±0.763 2.458±0.748
   5kHz-ra phase angle (°) 2.212±1.217 2.422±1.148 2.350±1.175
   5kHz-rl phase angle (°) 2.420±0.790 2.667±0.863 2.583±0.846
   5kHz-tr phase angle (°) 3.211±4.132 3.020±2.331 3.085±3.060
   Protein (kg) 7.658±1.207 10.799±9.196 9.729±7.641
   SMI (kg/m²) 7.206±1.418 8.750±2.749 8.224±2.488
   TBW/FFM (%) 73.626±0.336 75.228±15.272 74.683±12.416
   PBF (%) 30.928±8.236 20.893±8.432 24.311±9.616
   ECW/TBW (ratio) 0.392±0.011 0.389±0.010 0.390±0.011
   ECW/TBW (la) (ratio) 0.379±0.015 0.379±0.009 0.379±0.012
   ECW/TBW (ll) (ratio) 0.398±0.013 0.397±0.012 0.397±0.012
   ECW/TBW (ra) (ratio) 0.377±0.015 0.376±0.011 0.376±0.013
   ECW/TBW (rl) (ratio) 0.393±0.013 0.390±0.014 0.391±0.013
   ECW/TBW (tr) (ratio) 0.391±0.009 0.387±0.010 0.389±0.010
Pulmonary function indices
   FEV1 (L) 1.912±0.409 2.311±0.632 2.175±0.596
   FEV1/FVC (%) 74.358±9.148 67.009±10.765 69.513±10.807
   FVC (L) 2.570±0.461 3.435±0.716 3.140±0.761
Cardiopulmonary exercise index
   VO2peak (mL·kg−1·min−1) 23.005±4.382 23.651±5.173 23.431±4.921

Data are presented as mean ± standard deviation. BIA, bioelectrical impedance analysis; CPET, cardiopulmonary exercise testing; ECW, extracellular water; ECW/TBW, extracellular water-to-total body water ratio; FEV1, forced expiratory volume in 1 s; FEV1/FVC, ratio of forced expiratory volume in 1 s to forced vital capacity; FFM, fat-free mass; FVC, forced vital capacity; LA, left arm; LL, left leg; PBF, percent body fat; RA, right arm; RL, right leg; SMI, skeletal muscle index; TBW, total body water; TR, trunk; VO2peak, peak oxygen uptake; WB, whole body. Phase angles were calculated at 5, 50, and 250 kHz.

Comparison of model performance

Table 2 summarises model performance. RF showed the best overall regression performance [root mean square error (RMSE) 3.750±0.731; mean absolute error (MAE) 2.901±0.599; R2 0.323±0.153], although the margin over MLR was small. SVR and XGBoost also showed competitive performance, whereas the deep-learning models generally did not outperform RF.
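For reference, the three regression metrics can be computed as in the sketch below, using their standard definitions. The measured and predicted values are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R²) using the standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical measured and predicted VO2peak values (mL·kg−1·min−1)
measured = [18.0, 22.0, 26.0, 30.0]
predicted = [20.0, 22.0, 25.0, 27.0]
rmse, mae, r2 = regression_metrics(measured, predicted)
print(f"RMSE {rmse:.3f}, MAE {mae:.3f}, R² {r2:.3f}")
```

Because R² compares the model with a predictor that always returns the mean of the observed values, it becomes negative whenever the model's squared error exceeds that of the mean, which is why negative values can appear for poorly performing models.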

Table 2

Comparison of prediction model performance with 95% CI for VO2peak

Model RMSE ↓ MAE ↓ R² ↑
MLR 3.791±0.914 (3.137–4.445) 2.946±0.716 (2.434–3.458) 0.322±0.169 (0.202–0.443)
KNN 4.016±0.744 (3.484–4.549) 3.256±0.628 (2.807–3.706) 0.232±0.134 (0.136–0.328)
SVR 3.817±0.697 (3.319–4.315) 3.047±0.592 (2.623–3.471) 0.305±0.118 (0.221–0.390)
RF 3.750±0.731 (3.227–4.272) 2.901±0.599 (2.472–3.330) 0.323±0.153 (0.214–0.433)
XGB 3.840±0.671 (3.360–4.320) 3.049±0.547 (2.657–3.440) 0.287±0.165 (0.168–0.405)
MLP 4.411±0.944 (3.736–5.086) 3.573±0.830 (2.979–4.166) 0.070±0.217 (−0.085–0.226)
FT-Transformer 4.089±0.771 (3.537–4.640) 3.284±0.661 (2.811–3.757) 0.172±0.326 (−0.061–0.405)
TabTransformer 4.498±0.836 (3.900–5.096) 3.661±0.739 (3.132–4.189) −0.036±0.489 (−0.386–0.313)
TabNet 4.442±0.861 (3.826–5.059) 3.601±0.821 (3.014–4.188) 0.058±0.201 (−0.085–0.202)

Data are presented as mean ± standard deviation (95% CI). Downward arrows (↓) indicate that lower values represent better performance, whereas upward arrows (↑) indicate that higher values represent better performance. CI, confidence interval; KNN, k-nearest neighbors; MAE, mean absolute error; MLP, multilayer perceptron; MLR, multiple linear regression; R², coefficient of determination; RF, random forest; RMSE, root mean square error; SVR, support vector regression; VO2peak, peak oxygen uptake; XGB, extreme gradient boosting.
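The reported intervals are numerically consistent with a Student's t interval computed from the mean and standard deviation of 10 values (for example, one per cross-validation repeat). That reading is an inference from the numbers, not a method stated in the text; the sketch below checks it for the RF RMSE, with the critical value for 9 degrees of freedom hard-coded.

```python
from math import sqrt

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
T_CRIT_DF9 = 2.262


def t_interval(mean, sd, n=10, t_crit=T_CRIT_DF9):
    """Mean ± t × SD / sqrt(n); n=10 is an assumption, not a stated method."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width


# RF RMSE from Table 2: 3.750±0.731, reported 95% CI 3.227–4.272
low, high = t_interval(3.750, 0.731)
print(f"{low:.3f} to {high:.3f}")
```

Under this reading, the interval width reflects variability across repeated partitions of the same cohort rather than sampling uncertainty in a new population.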

The additional feature-set comparison is summarized in Table 3. Across RF, MLR, and SVR, the full predictor set showed the strongest overall regression performance. Within the full predictor set, RF achieved the lowest RMSE and MAE and the highest R2 (RMSE 3.750±0.731; MAE 2.901±0.599; R2 0.323±0.153). Among the reduced predictor sets, bioimpedance-derived body composition variables generally performed better than spirometric variables alone. Specifically, the lowest RMSE for Set C was observed with MLR [RMSE 3.979; 95% confidence interval (CI), 3.479–4.480], whereas the lowest RMSE for Set B was observed with SVR (RMSE 4.245; 95% CI, 3.668–4.822). Similarly, adding bioimpedance-derived body composition variables to demographic and anthropometric variables showed better performance than adding spirometric variables, with Set E achieving its lowest RMSE with SVR (RMSE 3.931; 95% CI, 3.419–4.442) and Set D achieving its lowest RMSE with MLR (RMSE 4.061; 95% CI, 3.466–4.656). These comparisons suggest that bioimpedance-derived body composition variables may provide complementary predictive information beyond conventional spirometry.

Table 3

Feature-set comparison of model performance for VO2peak prediction with 95% CI

Predictor set Model RMSE ↓ MAE ↓ R² ↑
Set A RF 4.503±0.725 (3.984–5.021) 3.623±0.699 (3.124–4.123) 0.024±0.166 (−0.094–0.143)
Set A MLR 4.405±0.745 (3.872–4.938) 3.573±0.680 (3.087–4.060) 0.073±0.125 (−0.016–0.162)
Set A SVR 4.813±0.937 (4.143–5.484) 3.818±0.582 (3.402–4.234) −0.373±1.436 (−1.401–0.654)
Set B RF 4.366±0.808 (3.788–4.944) 3.485±0.680 (2.999–3.971) 0.098±0.120 (0.012–0.183)
Set B MLR 4.251±0.788 (3.688–4.815) 3.403±0.661 (2.930–3.875) 0.146±0.111 (0.066–0.225)
Set B SVR 4.245±0.807 (3.668–4.822) 3.469±0.719 (2.955–3.983) 0.148±0.119 (0.063–0.233)
Set C RF 4.021±0.796 (3.452–4.591) 3.229±0.647 (2.766–3.692) 0.229±0.155 (0.118–0.339)
Set C MLR 3.979±0.700 (3.479–4.480) 3.131±0.612 (2.693–3.569) 0.236±0.169 (0.115–0.357)
Set C SVR 3.995±0.732 (3.471–4.519) 3.147±0.639 (2.690–3.604) 0.241±0.110 (0.163–0.320)
Set D RF 4.098±0.699 (3.598–4.598) 3.311±0.513 (2.945–3.678) 0.190±0.154 (0.080–0.300)
Set D MLR 4.061±0.832 (3.466–4.656) 3.254±0.628 (2.804–3.703) 0.219±0.123 (0.131–0.307)
Set D SVR 4.222±0.470 (3.886–4.557) 3.474±0.423 (3.172–3.777) 0.095±0.375 (−0.173–0.364)
Set E RF 3.958±0.772 (3.406–4.511) 3.177±0.622 (2.732–3.621) 0.250±0.159 (0.137–0.364)
Set E MLR 4.009±0.806 (3.432–4.586) 3.133±0.661 (2.660–3.606) 0.233±0.164 (0.116–0.350)
Set E SVR 3.931±0.715 (3.419–4.442) 3.156±0.611 (2.719–3.593) 0.261±0.136 (0.164–0.358)
Set F RF 3.750±0.731 (3.227–4.272) 2.901±0.599 (2.472–3.330) 0.323±0.153 (0.214–0.433)
Set F MLR 3.791±0.914 (3.137–4.445) 2.946±0.716 (2.434–3.458) 0.322±0.169 (0.202–0.443)
Set F SVR 3.817±0.697 (3.319–4.315) 3.047±0.592 (2.623–3.471) 0.305±0.118 (0.221–0.390)

Data are presented as mean ± standard deviation (95% CI). Set A: demographic and anthropometric variables; Set B: spirometric variables only, including FEV1, FVC, and FEV1/FVC; Set C: bioimpedance-derived body composition variables only; Set D: demographic and anthropometric variables combined with spirometric variables; Set E: demographic and anthropometric variables combined with bioimpedance-derived body composition variables; Set F: full predictor set. Downward arrows (↓) indicate that lower values represent better performance, whereas upward arrows (↑) indicate that higher values represent better performance. CI, confidence interval; FEV1, forced expiratory volume in 1 s; FEV1/FVC, ratio of forced expiratory volume in 1 s to forced vital capacity; FVC, forced vital capacity; MAE, mean absolute error; MLR, multiple linear regression; R², coefficient of determination; RF, random forest; RMSE, root mean square error; SVR, support vector regression; VO2peak, peak oxygen uptake.

Variable importance results from SHAP analysis

Figure 1 shows the SHAP summary plot for the 15 predictors with the highest mean absolute SHAP values (mean(|SHAP|)) across observations, and Table 4 quantifies their mean(|SHAP|) rankings together with Pearson’s correlation coefficients with VO2peak. Features are ordered by decreasing mean(|SHAP|). In the SHAP plot, each point represents an individual observation; the x-axis denotes the SHAP value (feature contribution relative to the model baseline), and colour indicates the feature value (blue = low, red = high). Overall, the SHAP patterns suggest that predicted VO2peak reflects multiple domains of physiologic reserve, including pulmonary function, aging, skeletal muscle quality, body composition, and fluid distribution.

Figure 1 SHAP summary plot of the top 15 predictors in the VO2peak prediction model. Predictors are ranked in descending order of importance on the y-axis according to mean absolute SHAP values. The x-axis represents the SHAP value (impact on model output), and each dot represents an individual patient. Dot colour indicates the feature value from low (blue) to high (red). ECW, extracellular water; ECW/TBW, extracellular water-to-total body water ratio; FEV1, forced expiratory volume in 1 s (L); FFM, fat-free mass; FVC, forced vital capacity (L); FEV1/FVC, ratio of FEV1 to FVC (%); PBF, percent body fat (%); kHz, kilohertz; phase angle, bioimpedance-derived phase angle (°); LL/RA/RL/TR, left leg/right arm/right leg/trunk; SHAP, Shapley Additive exPlanations; TBW, total body water; VO2peak, peak oxygen uptake (mL·kg−1·min−1).

Table 4

Mean absolute SHAP values (A) and Pearson’s correlation coefficients with VO2peak (B) for the top 15 predictors in the SHAP analysis

Rank Predictors (A)/(B)
1 FEV1 (L) 0.648/0.423 (P<0.001*)
2 FVC (L) 0.449/0.380 (P<0.001*)
3 Age 0.325/−0.377 (P<0.001*)
4 50kHz-RL phase angle 0.303/0.261 (P<0.001*)
5 PBF 0.266/−0.302 (P<0.001*)
6 50kHz-LL phase angle 0.239/0.302 (P<0.001*)
7 Weight 0.201/0.019 (P=0.734)
8 50kHz-TR phase angle 0.151/0.004 (P=0.938)
9 50kHz-whole body phase angle 0.129/0.126 (P=0.024*)
10 ECW/TBW (tr) 0.125/−0.275 (P<0.001*)
11 250kHz-LL phase angle 0.099/0.266 (P<0.001*)
12 ECW/TBW (ll) 0.092/−0.267 (P<0.001*)
13 5kHz-RA phase angle 0.092/0.163 (P=0.003*)
14 ECW/TBW (rl) 0.090/−0.229 (P<0.001*)
15 TBW/FFM 0.085/0.020 (P=0.716)

Global importance is summarised by mean(|SHAP|), the mean absolute SHAP value. Pearson’s correlation coefficients are presented as r with corresponding P values. ECW, extracellular water; FEV1, forced expiratory volume in 1 second; FFM, fat-free mass; FVC, forced vital capacity; LL, left leg; PBF, percent body fat; RA, right arm; RL, right leg; SHAP, Shapley Additive exPlanations; TBW, total body water; TR, trunk; VO2peak, peak oxygen uptake.

Pulmonary reserve measures were the dominant contributors. FEV1 (L) ranked first [mean(|SHAP|) =0.648], and FVC (L) ranked second (0.449), and both showed moderate positive correlations with VO2peak (FEV1: r=0.423; FVC: r=0.380; both P<0.001). Consistent with the SHAP distributions, higher FEV1 and FVC values were predominantly associated with positive SHAP values, indicating upward shifts in predicted VO2peak. Age was the third most important predictor (0.325) and demonstrated an inverse pattern in both analyses, with higher age associated with negative SHAP values and a negative correlation with VO2peak (r=−0.377, P<0.001).

Markers of skeletal muscle quality also contributed substantially to the model. Bioimpedance-derived phase angle at 50 kHz, particularly in the lower extremities, featured prominently: right-leg (RL) phase angle ranked fourth (0.303; r=0.261, P<0.001) and left-leg (LL) phase angle ranked sixth (0.239; r=0.302, P<0.001). In the SHAP plot, higher lower-extremity phase angle values tended to correspond to positive SHAP values, consistent with the observed positive association between phase angle and measured VO2peak. In contrast, indices reflecting adverse body composition and fluid distribution showed negative directionality. Percent body fat (PBF) ranked fifth (0.266) and was inversely associated with VO2peak (r=−0.302, P<0.001). Weight (ranked seventh; 0.201) and trunk phase angle (ranked eighth; 0.151) exhibited comparatively small and more centrally distributed SHAP effects, and neither showed a significant linear correlation with VO2peak (weight: r=0.019, P=0.734; trunk phase angle: r=0.004, P=0.938).

Whole-body phase angle ranked ninth (0.129) and showed a small but statistically significant positive correlation with VO2peak (r=0.126, P=0.024), whereas ECW/TBW (trunk) ranked tenth (0.125) and correlated negatively with VO2peak (r=−0.275, P<0.001), with higher values tending to contribute negative SHAP values. Additional lower-ranked contributors included 250kHz-LL phase angle (ranked eleventh; 0.099; r=0.266, P<0.001), ECW/TBW (ll) (ranked twelfth; 0.092; r=−0.267, P<0.001), 5kHz-RA phase angle (ranked thirteenth; 0.092; r=0.163, P=0.003), ECW/TBW (rl) (ranked fourteenth; 0.090; r=−0.229, P<0.001), and TBW/FFM (ranked fifteenth; 0.085; r=0.020, P=0.716). Overall, the SHAP analysis suggests that the model’s predictions were primarily driven by pulmonary function (FEV1, FVC), age, and selected bioimpedance-derived variables, including lower-extremity phase angle and ECW/TBW. These patterns are physiologically plausible but should be interpreted as exploratory associations rather than evidence of a validated mechanistic pathway.

Exploratory threshold-based classification at the 20 mL·kg−1·min−1 threshold

To explore threshold-based classification performance, we assessed the models at the 20 mL·kg−1·min−1 VO2peak cutoff, a clinically relevant threshold used in ESTS/ERS guidance to inform further physiologic assessment before lung resection (5,17). In clinical practice, formal CPET contributes to the identification of patients with limited functional capacity who may require comprehensive physiological assessment. The present analysis did not directly predict postoperative outcomes but examined whether model-estimated VO2peak could support prioritisation for formal CPET. Accordingly, the classification analysis was interpreted as an exploratory assessment of potential CPET prioritisation, with attention to sensitivity and false-negative classifications because missing patients with low functional capacity could have important clinical consequences. The experimental settings for this classification analysis were identical to those employed for the regression models. Classification results across algorithms are summarized in Table 5, and ROC plots at the 20 mL·kg−1·min−1 VO2peak threshold are presented in Figure 2.

Table 5

Comparison of prediction model performance with 95% CI for VO2peak at 20 mL·kg−1·min−1 threshold, a critical cutoff in ESTS/ERS guidelines

| Model | AUROC ↑ | Accuracy ↑ | Recall ↑ | F1-score ↑ |
| --- | --- | --- | --- | --- |
| KNN | 0.762±0.129 (0.670–0.854) | 0.779±0.063 (0.734–0.825) | 0.269±0.202 (0.125–0.414) | 0.327±0.202 (0.183–0.472) |
| SVC | 0.813±0.074 (0.761–0.866) | 0.786±0.048 (0.752–0.821) | 0.255±0.186 (0.122–0.388) | 0.312±0.205 (0.165–0.458) |
| RF | 0.842±0.072 (0.790–0.894) | 0.797±0.077 (0.741–0.852) | 0.574±0.255 (0.391–0.756) | 0.538±0.178 (0.411–0.665) |
| XGB | 0.824±0.101 (0.752–0.896) | 0.817±0.075 (0.764–0.871) | 0.586±0.235 (0.417–0.754) | 0.574±0.191 (0.438–0.711) |
| DNN (MLP) | 0.797±0.072 (0.746–0.849) | 0.803±0.061 (0.760–0.847) | 0.493±0.226 (0.331–0.654) | 0.506±0.184 (0.374–0.638) |
| FT-Transformer | 0.819±0.093 (0.752–0.885) | 0.787±0.078 (0.732–0.843) | 0.677±0.194 (0.538–0.816) | 0.594±0.151 (0.487–0.702) |
| TabTransformer | 0.815±0.091 (0.749–0.880) | 0.753±0.063 (0.708–0.798) | 0.648±0.179 (0.520–0.776) | 0.545±0.115 (0.462–0.627) |
| TabNet | 0.810±0.134 (0.714–0.905) | 0.769±0.081 (0.711–0.827) | 0.734±0.232 (0.568–0.900) | 0.591±0.146 (0.486–0.695) |

Data are presented as mean ± standard deviation (95% CI). Upward arrows (↑) indicate that higher values represent better predictive performance. AUROC, area under the receiver operating characteristic curve; CI, confidence interval; DNN, deep neural network; ERS, European Respiratory Society; ESTS, European Society of Thoracic Surgeons; KNN, k-nearest neighbors; MLP, multilayer perceptron; RF, random forest; SVC, support vector classification; VO2peak, peak oxygen uptake; XGB, extreme gradient boosting.
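For readers who wish to relate the standard deviations in Table 5 to the intervals in parentheses, the sketch below computes a two-sided 95% interval as mean ± t × SD/√n. It assumes n=10 (the number of repeated test evaluations stated in the Figure 2 caption) and a Student t critical value of 2.262 for 9 degrees of freedom. The use of a t-based interval is an assumption made here; it is not stated in this section, although it reproduces the tabulated intervals for the two examples shown. The function name `t_interval` is hypothetical.

```python
from math import sqrt

# Two-sided 95% Student t critical value for 9 degrees of freedom (n = 10).
T_CRIT_DF9 = 2.262


def t_interval(mean, sd, n=10, t_crit=T_CRIT_DF9):
    """Return (lower, upper) bounds of mean +/- t_crit * sd / sqrt(n), rounded to 3 decimals."""
    half_width = t_crit * sd / sqrt(n)
    return (round(mean - half_width, 3), round(mean + half_width, 3))


# Mean and SD taken from the AUROC column of Table 5.
knn_auroc_ci = t_interval(0.762, 0.129)  # Table 5 reports 0.670–0.854
rf_auroc_ci = t_interval(0.842, 0.072)   # Table 5 reports 0.790–0.894
```

Because only 10 evaluations contribute to each estimate, the intervals are wide relative to the differences between models, which is worth keeping in mind when comparing rows.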

Figure 2 Mean receiver operating characteristic curves for the top five full predictor-set models based on AUROC in the threshold-based classification task for VO2peak <20 mL·kg−1·min−1. Curves were averaged across 10 repeated test evaluations. The shaded areas represent variability across repeated evaluations. AUROC, area under the receiver operating characteristic curve; RF, random forest; SVC, support vector classification; VO2peak, peak oxygen uptake; XGB, extreme gradient boosting.

RF achieved the highest AUROC (0.842±0.072), indicating the best overall discrimination across classification thresholds, whereas XGBoost achieved the highest accuracy (0.817±0.075), reflecting the highest overall proportion of correctly classified patients. However, because this analysis aimed to identify patients with VO2peak <20 mL·kg−1·min−1, recall (sensitivity) is also important: it reflects the proportion of low-VO2peak patients correctly identified by the model. In this respect, TabNet showed the highest recall (0.734±0.232), followed by FT-Transformer (0.677±0.194) and TabTransformer (0.648±0.179). FT-Transformer achieved the highest F1-score (0.594±0.151), which reflects the balance between precision and recall, with TabNet showing a very similar value (0.591±0.146). Taken together, these findings suggest a trade-off between overall discrimination (RF), classification accuracy (XGBoost), and sensitivity-oriented classification performance (TabNet and FT-Transformer), which should be considered when interpreting false-positive and false-negative classifications.
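The trade-off between accuracy and recall can be made concrete with the standard confusion-matrix definitions. The sketch below uses hypothetical counts for 100 patients, of whom 25 have VO2peak <20 mL·kg−1·min−1; the counts are illustrative and are not taken from the study. The function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall (sensitivity), precision, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Hypothetical classifier A: rarely flags patients, so it misses most low-VO2peak cases.
conservative = classification_metrics(tp=7, fp=3, fn=18, tn=72)

# Hypothetical classifier B: flags more patients, catching more low-VO2peak cases
# at the cost of more false positives.
sensitive = classification_metrics(tp=18, fp=16, fn=7, tn=59)
```

With these counts, classifier A has slightly higher accuracy (0.79 versus 0.77) but much lower recall (0.28 versus 0.72). When the positive class is the minority, accuracy alone can therefore favor a model that misses most of the patients the analysis is intended to identify.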


Discussion

Preoperative functional assessment remains central to perioperative evaluation in thoracic oncology because reduced cardiopulmonary reserve is associated with postoperative pulmonary complications after lung resection (18). Recent TLCR work has further shown that preoperative factors such as age, body mass index, smoking, poor physical condition, respiratory disease, diabetes, and neurological comorbidity carry substantial predictive value for postoperative pneumonia after thoracoscopic lung cancer surgery, reinforcing the importance of preoperative physiologic evaluation (19). Although CPET is the reference standard, its availability remains limited in some settings (20). Estimating VO2peak from routine preoperative data may therefore help prioritise formal CPET or inform consideration of prehabilitation when resources are limited, rather than replace formal testing (21,22).

VO2peak was estimated from non-invasive preoperative variables with moderate regression performance (R2=0.323±0.153; MAE=2.901±0.599 mL·kg−1·min−1) and reasonable discrimination at the 20 mL·kg−1·min−1 threshold (AUROC up to 0.842). The modest R2 indicates that the model explained only approximately one-third of the variance in measured VO2peak. This is consistent with recent applications of machine learning and interpretability methods in cardiothoracic risk assessment (23-26). Overall, RF provided the strongest discrimination, XGBoost showed the most favorable overall classification accuracy, and TabNet and FT-Transformer offered more sensitivity-oriented classification performance, with FT-Transformer achieving the best balance between precision and recall.
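To clarify what the two regression metrics measure, the following sketch computes R2 (the proportion of variance in measured values explained by the predictions) and MAE (the average absolute error in the outcome's own units) using their standard definitions. The measured and predicted values are hypothetical toy numbers in mL·kg−1·min−1, chosen only so that the results fall in a range similar to that reported above.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured and predicted VO2peak values for five patients.
measured = [14.0, 18.0, 20.0, 22.0, 26.0]
predicted = [19.0, 20.0, 18.0, 21.0, 22.0]

mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
```

In this toy example the predictions explain 37.5% of the variance with an average error of 2.8 mL·kg−1·min−1. Errors of this size matter most for patients whose measured values lie close to the 20 mL·kg−1·min−1 threshold, because a small error can move them across it.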

Importantly, the present model was not designed to predict postoperative pulmonary complications, mortality, or other surgical outcomes directly. Rather, it was designed to estimate CPET-derived VO2peak, a physiologic measure associated with perioperative risk. This distinction is clinically relevant because the model is intended to support an earlier decision point in the preoperative pathway: identifying patients who may warrant prioritised formal CPET or further preoperative evaluation. In this sense, the proposed approach should be interpreted as an exploratory CPET-prioritisation aid based on routinely available data, rather than as a replacement for formal physiologic testing or a stand-alone operative risk calculator.

Although VO2peak is a clinically important CPET-derived measure, it represents only one dimension of CPET. Formal CPET provides multiple additional parameters, including the ventilatory equivalent for carbon dioxide (VE/VCO2) slope, anaerobic threshold, electrocardiographic response, and evidence of ventilatory limitation. These indices may provide information on ventilatory efficiency, cardiovascular response, respiratory mechanics, and postoperative risk that cannot be captured by VO2peak estimation alone. Therefore, the present model should not be interpreted as capturing the full multidimensional physiologic information provided by CPET or as replacing formal CPET. Future studies should evaluate whether routinely available preoperative data can be used to estimate VE/VCO2 slope or multiple CPET-derived indices, and whether multi-index prediction improves CPET prioritisation and perioperative decision support.

Predicted VO2peak should therefore be viewed as supportive information for considering CPET prioritisation and possible prehabilitation assessment, not as a basis to defer formal testing (27,28). This potential role is also consistent with TLCR guidance on enhanced recovery after lung surgery and with recent TLCR work that frames the preoperative treatment interval as a potential window for prehabilitation rather than merely a waiting period (29,30). Although the model showed reasonable discrimination at the 20 mL·kg−1·min−1 threshold, sensitivity was only moderate. This is clinically important because false-negative classification of patients with low VO2peak could delay formal CPET or further physiologic assessment. The decision threshold should therefore be tuned to the intended clinical use. In a screening or prioritisation context, a more sensitivity-oriented threshold may be preferable to reduce false negatives, although this would increase false positives and additional CPET referrals. The ROC analysis showed reasonable discrimination across thresholds for the top five full predictor-set models ranked by AUROC. However, ROC curves do not by themselves define an optimal clinical threshold, and prospective validation and calibration are needed before any threshold can be recommended for clinical use.
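The effect of a more sensitivity-oriented decision rule can be sketched as follows. Patients are flagged for formal CPET when their predicted VO2peak falls below a decision cutoff, while true low functional capacity is defined as measured VO2peak <20 mL·kg−1·min−1. All values are hypothetical, and the approach of shifting the cutoff applied to predicted values is an illustrative assumption rather than the procedure used in the study.

```python
LOW_VO2PEAK = 20.0  # clinical threshold applied to measured VO2peak (mL/kg/min)


def sensitivity_and_false_positives(measured, predicted, decision_cutoff):
    """Flag patients whose predicted value is below decision_cutoff.

    Returns (sensitivity, number of false-positive referrals).
    """
    truly_low = [m < LOW_VO2PEAK for m in measured]
    flagged = [p < decision_cutoff for p in predicted]
    tp = sum(1 for t, f in zip(truly_low, flagged) if t and f)
    fn = sum(1 for t, f in zip(truly_low, flagged) if t and not f)
    fp = sum(1 for t, f in zip(truly_low, flagged) if not t and f)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return sensitivity, fp


# Hypothetical measured and model-predicted VO2peak values for eight patients.
measured = [15.0, 17.5, 19.0, 19.5, 21.0, 22.5, 24.0, 27.0]
predicted = [18.0, 20.5, 19.5, 21.5, 20.8, 23.0, 21.8, 26.0]

results = {
    cutoff: sensitivity_and_false_positives(measured, predicted, cutoff)
    for cutoff in (20.0, 21.0, 22.0)
}
```

In this toy cohort, raising the decision cutoff from 20 to 22 increases sensitivity from 0.50 to 1.00 while adding two false-positive referrals. The acceptable balance depends on local CPET capacity and on the clinical cost of a missed patient.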

Model interpretability is important for clinical adoption of machine learning (31) and supports the biological plausibility of this VO2peak prediction approach. This interpretation-first framing is also in line with recent TLCR original research in thoracic oncology, in which IML using readily available data was developed to support clinical interpretation rather than replace clinical judgment (32). The prominence of spirometric indices, particularly FEV1 and FVC, is consistent with the contribution of ventilatory capacity to exercise tolerance among candidates for lung resection, although these variables may also partly reflect body size. Age also contributed materially, consistent with the age-related decline in cardiorespiratory fitness. These findings suggest that routine demographic and pulmonary function measures capture substantial information about functional reserve. However, the additional feature-set comparison indicated that the expanded model was not driven solely by spirometric variables. Rather, the body-composition-only and basic-plus-body-composition models showed performance comparable to or better than the corresponding spirometry-only feature sets, and the full predictor set showed the strongest overall regression performance. Therefore, the practical value of the expanded model should be interpreted not as replacing spirometry, but as integrating spirometric and bioimpedance-derived information in an interpretable manner.

The inclusion of bioimpedance-derived variables was motivated by their routine availability and their potential to capture non-respiratory dimensions of preoperative physiologic status, such as body composition, hydration, and tissue-related characteristics. In the present model, several bioimpedance-derived variables appeared among the influential predictors. The feature-set comparison further suggested that bioimpedance-derived variables may provide complementary information beyond spirometric variables, as the body-composition-only and basic-plus-body-composition feature sets performed favorably compared with the corresponding spirometry-based feature sets. Nevertheless, these results should not be interpreted as indicating that bioimpedance-derived variables replace established spirometric predictors. Rather, they suggest that bioimpedance-derived variables may add valuable complementary information to conventional pulmonary function measures in routine preoperative assessment.

Phase angle measures contributed to prediction alongside spirometric variables, with lower-extremity phase angle ranking among the influential non-spirometric predictors. This finding is physiologically plausible because phase angle has been interpreted as a marker related to cellular membrane properties, hydration status, and nutritional or functional status (33). However, the present study was designed for prediction rather than mechanistic validation. Therefore, the association between phase angle and estimated VO2peak should be interpreted as exploratory and should not be taken as evidence of a validated causal or mechanistic pathway. Although conventional muscle-mass indices such as SMI were less influential than phase-angle variables, collinearity among bioimpedance-derived variables and the non-causal nature of SHAP preclude mechanistic inference (34,35). Exercise performance may reflect not only pulmonary limitation and body size, but also peripheral muscle function and metabolic reserve (36), which are relevant to multimodal prehabilitation frameworks (37,38). Nevertheless, focused prospective studies are needed to evaluate whether phase angle has an independent physiologic relationship with CPET-derived exercise capacity.

From a practical perspective, these findings suggest that bioimpedance-derived variables may enrich routine preoperative assessment without adding a substantial procedural burden when bioelectrical impedance analysis is already available in clinical practice. These markers should not be interpreted causally or in isolation, but they may provide supportive information about muscle quality, hydration status, or systemic reserve that could inform further clinical assessment. Accordingly, the added value of an interpretable model in this setting lies not only in prediction accuracy, but also in making these multidimensional contributors clinically visible at the individual-patient level.

Adiposity and fluid-distribution indices also contributed to prediction, suggesting that the model captures physiological burden not fully represented by body weight alone. PBF may reflect mechanical load, whereas extracellular water to total body water ratios may reflect altered body composition or fluid balance. Because the outcome was weight-normalized VO2peak, some associations may partly reflect ratio scaling. Even so, these variables may broaden characterization of preoperative vulnerability beyond pulmonary function alone.

These findings remain relevant to contemporary thoracic practice, in which perioperative outcomes depend on patient-level physiological reserve as well as operative approach. Recent debate regarding the incremental predictive utility of CPET underscores the need for objective, clinically interpretable measures that capture systemic fitness rather than lung mechanics alone (7).

If implemented clinically, the model could be considered as supportive information when CPET capacity is constrained, particularly for prioritising formal CPET or considering prehabilitation assessment, rather than as a substitute for formal testing. Its safety would depend on calibration and threshold-specific operating characteristics, and a decision-analytic evaluation would be needed to determine whether it improves outcomes or resource allocation.

Several limitations should be acknowledged. First, the single-centre retrospective design limits generalizability, and external multicentre validation is required before the model can be considered for clinical implementation. Predictive performance was moderate, likely reflecting heterogeneity not fully captured by the available predictors. DLCO was unavailable for all patients, limiting comparability with ERS/ESTS guideline-based assessment. Other relevant variables, including comorbidity burden, smoking exposure, anaemia, tumor stage, and planned extent of resection, were not included. In addition, because the final analytic cohort was based on complete-case inclusion, patients with missing required predictor or outcome data were excluded. This may have introduced selection bias if excluded patients differed systematically from those included in the analysis. Although routine preoperative assessments were required to have been performed within 6 months of CPET and the observed intervals were generally short, temporal variability between routine assessments and CPET may still have influenced the measured associations in some patients.

Second, the scope of the model should be interpreted cautiously. Although phase angle was among the influential non-spirometric predictors, this observational prediction study cannot determine whether phase angle has a causal or mechanistic relationship with VO2peak. The analysis focused on VO2peak and a single clinically relevant threshold, rather than the full continuum of exercise capacity, other CPET-derived indices, or postoperative outcomes. The model did not estimate other clinically important CPET-derived information, such as VE/VCO2 slope, anaerobic threshold, electrocardiographic response, or ventilatory limitation, and therefore cannot capture the full multidimensional physiologic assessment provided by formal CPET. Future studies should evaluate whether VE/VCO2 slope or multiple CPET-derived indices can be estimated from routinely available preoperative data. Prospective studies are also needed to determine whether model-estimated VO2peak can inform CPET prioritisation or prehabilitation assessment and whether such use improves postoperative outcomes. In addition, threshold selection was not prospectively optimized for clinical implementation. Although the model showed reasonable discrimination, sensitivity was moderate at the evaluated threshold, and alternative thresholds may be required depending on whether the intended use prioritizes sensitivity, specificity, or resource efficiency. The 20 mL·kg−1·min−1 threshold may function as a sensitivity- or negative predictive value-oriented screening threshold for CPET prioritisation, whereas the 15 mL·kg−1·min−1 threshold may represent a specificity- or positive predictive value-oriented high-risk cutoff (39). Because few patients in this cohort had VO2peak ≤15 mL·kg−1·min−1, the performance and clinical role of this lower threshold should be evaluated in larger external cohorts.


Conclusions

Limited access to CPET remains a practical bottleneck in preoperative lung resection pathways. Using routinely available clinical, spirometric, and bioimpedance-derived measures, this study developed and internally evaluated an IML model to estimate VO2peak and explore threshold-based classification at 20 mL·kg−1·min−1. The model may provide supportive information for prioritising formal CPET when resources are constrained, while offering patient-level explanations of model predictions. Further multicentre validation and prospective implementation studies are needed to determine whether model-informed CPET prioritisation improves perioperative outcomes or CPET utilization.


Acknowledgments

During the preparation of this work, the authors used ChatGPT (OpenAI) to improve the manuscript’s grammar, readability, and phrasing. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the integrity and accuracy of the published article.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/dss

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/prf

Funding: This study was supported by the Biomedical Research Institute Grant, Pusan National University Hospital (No. 20240050).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/coif). Se-Hun Kim, S.E.P., C.H.H., K.H.K., and Sang-Hun Kim report receiving support from Biomedical Research Institute Grant, Pusan National University Hospital. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Pusan National University Hospital (IRB No. 2407-016-141), and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Stéphan F, Boucheseiche S, Hollande J, et al. Pulmonary complications following lung resection: a comprehensive analysis of incidence and possible risk factors. Chest 2000;118:1263-70.
  2. Benzo R, Kelley GA, Recchi L, et al. Complications of lung resection and exercise capacity: a meta-analysis. Respir Med 2007;101:1790-7.
  3. American Thoracic Society. ATS/ACCP Statement on cardiopulmonary exercise testing. Am J Respir Crit Care Med 2003;167:211-77.
  4. Brunelli A, Kim AW, Berger KI, et al. Physiologic evaluation of the patient with lung cancer being considered for resectional surgery: Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e166S-e190S.
  5. Brunelli A, Charloux A, Bolliger CT, et al. ERS/ESTS clinical guidelines on fitness for radical therapy in lung cancer patients (surgery and chemo-radiotherapy). Eur Respir J 2009;34:17-41.
  6. Pele I, Mihălțan FD. Cardiopulmonary exercise testing in thoracic surgery. Pneumologia 2020;69:3-10.
  7. Filakovszky Á, Brat K, Tschoellitsch T, et al. Cardiopulmonary exercise testing before lung resection surgery: still indicated? Evaluating predictive utility using machine learning. Thorax 2026;81:474-82.
  8. Akamatsu Y, Kusakabe T, Arai H, et al. Phase angle from bioelectrical impedance analysis is a useful indicator of muscle quality. J Cachexia Sarcopenia Muscle 2022;13:180-9.
  9. Abdelnour D, Grove Ii M, Pulford-Thorpe K, et al. Associations between absolute and relative handgrip strength with fitness and fatness. Sports Med Int Open 2025;9:a25377537.
  10. Fearon K, Strasser F, Anker SD, et al. Definition and classification of cancer cachexia: an international consensus. Lancet Oncol 2011;12:489-95.
  11. Jeong D, Oh YM, Lee SW, et al. Comparison of Predicted Exercise Capacity Equations in Adult Korean Subjects. J Korean Med Sci 2022;37:e113.
  12. Kelly CJ, Karthikesalingam A, Suleyman M, et al. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.
  13. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in neural information processing systems 2017;30.
  14. Huang X, Khetan A, Cvitkovic M, et al. Tabtransformer: Tabular data modeling using contextual embeddings. arXiv preprint arXiv:2012.06678. 2020.
  15. Gorishniy Y, Rubachev I, Khrulkov V, et al. Revisiting deep learning models for tabular data. Advances in neural information processing systems 2021;34:18932-43.
  16. Arik SÖ, Pfister T. Tabnet: Attentive interpretable tabular learning. Proceedings of the AAAI conference on artificial intelligence 2021;35:6679-87.
  17. Orlandi R, Rinaldo RF, Mazzucco A, et al. Early outcomes of "low-risk" patients undergoing lung resection assessed by cardiopulmonary exercise testing: Single-institution experience. Front Surg 2023;10:1130919.
  18. Petrella F, Cara A, Cassina EM, et al. Evaluation of preoperative cardiopulmonary reserve and surgical risk of patients undergoing lung cancer resection. Ther Adv Respir Dis 2024;18:17534666241292488.
  19. Bian H, Liu M, Liu J, et al. Seven preoperative factors have strong predictive value for postoperative pneumonia in patients undergoing thoracoscopic lung cancer surgery. Transl Lung Cancer Res 2023;12:2193-208.
  20. Ferguson M, Shulman M. Cardiopulmonary Exercise Testing and Other Tests of Functional Capacity. Curr Anesthesiol Rep 2022;12:26-33.
  21. Melnyk M, Casey RG, Black P, et al. Enhanced recovery after surgery (ERAS) protocols: Time to change practice? Can Urol Assoc J 2011;5:342-8.
  22. Solaini L, Prusciano F, Bagioni P, et al. Video-assisted thoracic surgery (VATS) of the lung: analysis of intraoperative and postoperative complications over 15 years and review of the literature. Surg Endosc 2008;22:298-310.
  23. Betts KS, Marathe SP, Chai K, et al. A machine learning approach to predicting 30-day mortality following paediatric cardiac surgery: findings from the Australia New Zealand Congenital Outcomes Registry for Surgery (ANZCORS). Eur J Cardiothorac Surg 2023;64:ezad160.
  24. Hui V, Litton E, Edibam C, et al. Using machine learning to predict bleeding after cardiac surgery. Eur J Cardiothorac Surg 2023;64:ezad297.
  25. Qiu X, Hu S, Dong S, et al. Construction of an automated machine learning-based predictive model for postoperative pulmonary complications risk in non-small cell lung cancer patients undergoing thoracoscopic surgery. PLoS One 2025;20:e0333413.
  26. Chen S, Deng T, Yang Q, et al. Development and validation of an explainable machine learning model for predicting postoperative pulmonary complications after lung cancer surgery: a machine learning study. EClinicalMedicine 2025;86:103386.
  27. Zhou N, Ripley-Gonzalez JW, Zhang W, et al. Preoperative exercise training decreases complications of minimally invasive lung cancer surgery: A randomized controlled trial. J Thorac Cardiovasc Surg 2025;169:516-528.e10.
  28. Guo Y, Pan M, Xiong M, et al. Efficacy of preoperative pulmonary rehabilitation in lung cancer patients: a systematic review and meta-analysis of randomized controlled trials. Discov Oncol 2025;16:56.
  29. Geomini LD, van Steenwijk QCA, Janki S, et al. Redefining treatment interval in lung cancer surgery in the era of prehabilitation: a systematic review. Transl Lung Cancer Res 2025;14:5082-98.
  30. Gao S, Barello S, Chen L, et al. Clinical guidelines on perioperative management strategies for enhanced recovery after lung surgery. Transl Lung Cancer Res 2019;8:1174-87.
  31. Band SS, Yarahmadi A, Hsu CC, et al. Application of explainable artificial intelligence in medical health: A systematic review of interpretability methods. Informatics in Medicine Unlocked 2023;40:101286.
  32. Chen Y, Jin J, Mao Y, et al. Development and validation of an interpretable machine learning model for prediction of occult lymph node metastasis in clinical stage T1 lung adenocarcinoma. Transl Lung Cancer Res 2025;14:5415-30.
  33. Prete M, Ballarin G, Porciello G, et al. Bioelectrical impedance analysis-derived phase angle (PhA) in lung cancer patients: a systematic review. BMC Cancer 2024;24:608.
  34. Kirk B, Cawthon PM, Arai H, et al. The Conceptual Definition of Sarcopenia: Delphi Consensus from the Global Leadership Initiative in Sarcopenia (GLIS). Age Ageing 2024;53:afae052.
  35. Hou S, Zhao X, Wei J, et al. The diagnostic performance of phase angle for sarcopenia among older adults: A systematic review and diagnostic meta-analysis. Arch Gerontol Geriatr 2025;131:105754.
  36. Sietsema KE, Stringer WW, Sue DY, et al. Wasserman & Whipp’s: principles of exercise testing and interpretation: including pathophysiology and clinical applications. Lippincott Williams & Wilkins, 2020.
  37. Cruz Mosquera FE, Murillo SR, Naranjo Rojas A, et al. Effect of Exercise and Pulmonary Rehabilitation in Pre- and Post-Surgical Patients with Lung Cancer: Systematic Review and Meta-Analysis. Medicina (Kaunas) 2024;60:1725.
  38. Granger C, Cavalheri V. Preoperative exercise training for people with non-small cell lung cancer. Cochrane Database Syst Rev 2022;9:CD012020.
  39. Yang N, Shi Z, Liu J, et al. Key cardiopulmonary exercise testing indicators for predicting the risk of postoperative cardiopulmonary complications in patients undergoing thoracoscopic lung resection. Front Surg 2025;12:1765398.
Cite this article as: Kim SH, Park SE, Hong CH, Park TS, Shin MJ, Kim KH, Kim SH. Prediction of peak oxygen uptake using interpretable machine learning on routinely available preoperative assessments before lung resection. Transl Lung Cancer Res 2026;15(6):171. doi: 10.21037/tlcr-2026-0351