Prediction of peak oxygen uptake using interpretable machine learning on routinely available preoperative assessments before lung resection

Se-Hun Kim; Sa-Eun Park; Cho Hui Hong; Tae-Sung Park; Myung-Jun Shin; Ki-Hun Kim; Sang-Hun Kim

doi:10.21037/tlcr-2026-0351

Original Article


Se-Hun Kim¹, Sa-Eun Park¹, Cho Hui Hong², Tae-Sung Park²,³, Myung-Jun Shin²,⁴, Ki-Hun Kim¹,⁵, Sang-Hun Kim²,⁴

¹Department of Industrial Engineering, Pusan National University, Busan, Republic of Korea; ²Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea; ³Department of Convergence Medical Institute of Technology, Pusan National University Hospital, Busan, Republic of Korea; ⁴Department of Rehabilitation Medicine, Pusan National University Hospital, Pusan National University School of Medicine, Busan, Republic of Korea; ⁵Graduate School of Data Science, Pusan National University, Busan, Republic of Korea

Contributions: (I) Conception and design: All authors; (II) Administrative support: All authors; (III) Provision of study materials or patients: All authors; (IV) Collection and assembly of data: CH Hong; (V) Data analysis and interpretation: Se-Hun Kim, SE Park, KH Kim, Sang-Hun Kim; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Ki-Hun Kim, PhD. Department of Industrial Engineering, Graduate School of Data Science, Pusan National University, 2 Busandaehak-ro 63beon-gil, Geumjeong-gu, 46241, Busan, Republic of Korea. Email: kihun@pusan.ac.kr; Sang-Hun Kim, MD, PhD. Department of Rehabilitation Medicine, Biomedical Research Institute, Pusan National University Hospital, Pusan National University School of Medicine, 179 Gudeok-Ro Seo-Gu, 49241, Busan, Republic of Korea. Email: kel5504@gmail.com.

Background: Cardiopulmonary exercise testing (CPET) is the reference standard for preoperative functional assessment before lung resection, but its use is limited by resource and practical constraints. This study developed and internally evaluated an interpretable machine learning model to estimate preoperative peak oxygen uptake (VO₂peak) from routine clinical assessments, quantify the contribution of individual predictors to model estimation, and evaluate classification performance at the clinically relevant threshold of 20 mL·kg⁻¹·min⁻¹.

Methods: This single-centre retrospective study included 320 patients with complete data, drawn from a consecutive cohort scheduled for lung resection in South Korea, who underwent preoperative treadmill CPET between April 2018 and March 2024. Thirty-three routinely available predictors, including demographic characteristics, anthropometric measures, pulmonary function results, and bioimpedance-derived indices, were used to train predictive models. Model development employed 10 repeats of 5-fold nested cross-validation. The best-performing model was interpreted using Shapley additive explanations. Potential prioritisation performance was assessed by classifying patients with VO₂peak below the prespecified threshold.

Results: Random forest showed the best performance, with a root mean square error of 3.750±0.731 mL·kg⁻¹·min⁻¹, a mean absolute error of 2.901±0.599 mL·kg⁻¹·min⁻¹, and a coefficient of determination of 0.323±0.153, indicating moderate explanatory performance for VO₂peak estimation. The most influential predictors were forced expiratory volume in 1 second, forced vital capacity, age, and bioimpedance-derived phase angle. At the threshold of 20 mL·kg⁻¹·min⁻¹, the model achieved an area under the receiver operating characteristic curve of 0.842±0.072.

Conclusions: This predictive model showed moderate performance for estimating VO₂peak, identified clinically relevant contributors, and may support prioritisation for CPET and consideration of prehabilitation assessment when resources are limited.

Keywords: Thoracic surgery; preoperative risk assessment; interpretable machine learning (IML); peak oxygen uptake (VO₂peak); bioelectrical impedance analysis

Submitted Mar 30, 2026. Accepted for publication May 14, 2026. Published online May 28, 2026.


Highlight box

Key findings

• An interpretable machine learning model was developed to estimate preoperative peak oxygen uptake (VO₂peak) from routinely available demographic, spirometric, and bioimpedance-derived variables in patients undergoing lung resection, while quantifying the contribution of individual predictors.

• Random forest showed the best overall regression performance, with moderate explanatory performance for continuous VO₂peak estimation, and achieved reasonable discrimination at the 20 mL·kg⁻¹·min⁻¹ threshold.

• Forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), age, and bioimpedance-derived phase angle were among the most influential predictors.

What is known and what is new?

• Cardiopulmonary exercise testing (CPET) is the reference standard for functional assessment before lung resection, but access may be limited by equipment, personnel, and workflow constraints.

• This study provides an interpretable, exploratory approach for estimating CPET-derived VO₂peak using routine preoperative data and suggests that bioimpedance-derived variables may offer complementary information beyond conventional spirometry.

What is the implication, and what should change now?

• The model may support prioritisation for formal CPET and inform targeted prehabilitation when testing capacity is limited.

• It should complement, not replace, standard physiologic evaluation before lung resection, and requires external validation before clinical implementation.

Introduction

Lung cancer remains the leading cause of cancer-related mortality worldwide, and surgical resection is central to curative treatment in operable disease. However, lung resection is associated with postoperative morbidity and mortality, particularly cardiopulmonary complications such as pneumonia and respiratory failure, which are strongly related to limited cardiopulmonary reserve (1,2). Accurate preoperative functional assessment is therefore essential to support patient selection, perioperative planning, and risk-informed care.

Cardiopulmonary exercise testing (CPET) is regarded as the reference standard for preoperative functional evaluation before major thoracic surgery (3). Among CPET-derived variables, peak oxygen uptake (VO₂peak) is strongly associated with postoperative outcomes and is incorporated into clinical guidelines for patients with impaired pulmonary function or elevated surgical risk (4,5). Despite its clinical value, CPET is not universally available because it requires specialized equipment, trained personnel, and time-intensive protocols. As a result, access varies across institutions and may delay preoperative pathways (6). In contemporary lung resection cohorts, the incremental predictive value of CPET-derived variables beyond routinely available clinical and pulmonary function data may be less pronounced than previously assumed (7). Rather than diminishing the importance of CPET, this perspective underscores the need for complementary tools that may support more efficient and targeted use of CPET based on accessible preoperative information.

Several preoperative measures are already collected in routine care, including demographic characteristics, anthropometrics, pulmonary function test (PFT) results (3), physical performance metrics (9), and bioimpedance-derived body composition indices, which may reflect muscle mass, fluid distribution, and tissue-related characteristics (8). Nevertheless, exercise capacity in patients with lung cancer is influenced by complex disease-specific factors often absent in healthy cohorts (10), including coexisting chronic obstructive pulmonary disease (COPD), cumulative smoking history, cancer-associated cachexia, and frailty. Consequently, prediction equations derived from general populations, which often fail to account for ethnic and temporal variations in exercise capacity (11), may not apply directly to patients being evaluated for lung resection. Moreover, evidence in lung resection cohorts remains limited regarding practical approaches that use routine preoperative data to generate clinically interpretable estimates of functional reserve.

In developing such tools, conventional prediction equations in preoperative care have largely relied on linear regression. Although highly interpretable, linear models may fail to capture the complex, non-linear interrelationships among physiological variables in a heterogeneous patient population. Standard machine learning models can address this non-linearity, but their “black-box” nature limits clinical interpretability and can undermine clinician trust. Interpretable machine learning (IML) addresses this gap by modeling non-linear relationships while preserving transparency about how each predictor contributes to the estimate (12). This transparency is crucial in preoperative care, where decisions regarding further functional testing and perioperative optimisation must remain clinically justifiable. Rather than replacing formal CPET, an IML model based on routinely available data may serve as a preliminary step to prioritise patients for formal CPET and may inform consideration of prehabilitation, pending further validation.

This study aimed to develop and internally evaluate an IML model for estimating preoperative VO₂peak in patients scheduled for lung resection. We present this article in accordance with the TRIPOD reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/rc).

Methods

The overall study process comprised four steps: (I) collection of preoperative clinical data, including demographic characteristics, anthropometrics, body composition indices, and PFT results; (II) development and internal evaluation of prediction models for estimating VO₂peak; (III) application of SHapley Additive exPlanations (SHAP) for model interpretation (13); and (IV) exploratory assessment of threshold-based classification performance at the 20 mL·kg⁻¹·min⁻¹ VO₂peak cutoff to examine the model’s potential role in CPET prioritisation. This approach was intended to explore whether routinely available preoperative data could provide supportive information for decision-making when access to formal testing is constrained.

Study population and baseline characteristics

This single-centre, retrospective study initially screened a consecutive source cohort of 688 patients scheduled for lung resection at Pusan National University Hospital between April 2018 and March 2024. The inclusion criteria were: (I) completion of preoperative CPET; (II) availability of demographics, anthropometrics, bioimpedance-derived body composition, and PFT results obtained within 6 months of the CPET; and (III) availability of data recorded before surgery. When multiple examinations were available for a single patient, measurements temporally closest to the CPET were selected.

A complete-case analysis was performed: patients with missing required predictor or outcome variables were excluded, and no imputation (mean, multiple, or model-based) was performed. Applying these criteria yielded a final complete-case analytic cohort of 320 patients.
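The two data-preparation rules above (keep the assessment closest in time to the CPET, then drop incomplete records) can be sketched in pandas as follows. This is a minimal illustration, not the study's pipeline: the long-format layout, the column names (`patient_id`, `assess_date`, `cpet_date`, `fev1`), the helper names, and the 183-day reading of "6 months" are all assumptions.

```python
import pandas as pd


def closest_to_cpet(assess: pd.DataFrame, cpet: pd.DataFrame, max_days: int = 183) -> pd.DataFrame:
    """Per patient, keep the assessment row closest in time to the CPET date (hypothetical schema)."""
    merged = assess.merge(cpet, on="patient_id", how="inner")
    # Absolute gap in days between each assessment and that patient's CPET.
    merged["gap_days"] = (merged["assess_date"] - merged["cpet_date"]).dt.days.abs()
    # Inclusion window: 6 months, assumed here to be 183 days.
    merged = merged[merged["gap_days"] <= max_days]
    # idxmin gives the row label of the smallest gap within each patient.
    best = merged.groupby("patient_id")["gap_days"].idxmin()
    return merged.loc[best].reset_index(drop=True)


def complete_case(df: pd.DataFrame, required: list) -> pd.DataFrame:
    """Drop any patient missing a required predictor or outcome; no imputation."""
    return df.dropna(subset=required).reset_index(drop=True)
```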

Although the inclusion criteria allowed routine preoperative assessments to be performed within 6 months of the CPET, the actual intervals were substantially shorter in most patients. We separately calculated the intervals between CPET and major routine preoperative assessments. The median interval between CPET and PFT was 11.0 days [interquartile range (IQR), 5.0–27.0 days], with 96.0% of patients undergoing PFT within 3 months of CPET. For bioelectrical impedance analysis (BIA), the median interval from CPET was 0.0 days (IQR, 0.0–0.0 days), with 99.3% of patients undergoing BIA within 3 months of CPET. The maximum intervals were 167.0 days for PFT and 164.0 days for BIA.
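The interval summaries reported above (median, IQR, share within 3 months, maximum) can be reproduced from per-patient day differences. The sketch below assumes 3 months = 90 days and uses NumPy's default linear-interpolation percentiles; the study's software may use a different quantile convention, so small discrepancies are possible. The function name is hypothetical.

```python
import numpy as np


def summarise_intervals(days, window_days=90):
    """Median, IQR, % within a window (3 months assumed = 90 days), and maximum of |interval| in days."""
    d = np.abs(np.asarray(days, dtype=float))
    q1, med, q3 = np.percentile(d, [25, 50, 75])  # linear interpolation (NumPy default)
    pct_within = 100.0 * np.mean(d <= window_days)
    return {"median": med, "iqr": (q1, q3), "pct_within": pct_within, "max": d.max()}
```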

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study protocol was approved by the Institutional Review Board of Pusan National University Hospital (IRB No. 2407-016-141), and the requirement for individual informed consent was waived due to the retrospective nature of the analysis.

CPET

All patients underwent symptom-limited CPET on a treadmill using a breath-by-breath gas analysis system (COSMED, Rome, Italy). Tests were conducted according to a modified Bruce protocol. Gas analyzers were calibrated before each assessment. Breath-by-breath data were time-averaged over 10-second intervals. The primary outcome, VO₂peak, was defined as the highest 30-second averaged VO₂ value obtained at test termination. This value was extracted directly from the standardized system report (PFT Ergo 10.0).
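In the study VO₂peak was read from the system report, so the following is only a hypothetical reconstruction of the averaging described above: breath-by-breath values are averaged into 10-s bins, the 30-s value is taken (by assumption) as the rolling mean of three consecutive bins, and the highest such value is returned. The sketch also assumes that every 10-s bin contains at least one breath.

```python
import numpy as np


def vo2peak_from_breaths(t_s, vo2_ml_kg_min, bin_s=10, window_bins=3):
    """Hypothetical: 10-s bin means, then the highest rolling 30-s (3-bin) mean."""
    t = np.asarray(t_s, dtype=float)
    v = np.asarray(vo2_ml_kg_min, dtype=float)
    bins = np.floor(t / bin_s).astype(int)
    # Mean VO2 within each 10-s bin, in time order (assumes no empty bins).
    bin_means = np.array([v[bins == b].mean() for b in np.unique(bins)])
    if bin_means.size < window_bins:
        raise ValueError("test too short for the averaging window")
    # Rolling mean over consecutive bins; 'valid' keeps only full windows.
    rolling = np.convolve(bin_means, np.ones(window_bins) / window_bins, mode="valid")
    return float(rolling.max())
```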

Bioimpedance-derived indices

BIA was performed using a multifrequency analyzer (InBody S10; InBody Co., Ltd., Seoul, Republic of Korea). Measurements were obtained with patients in the supine position after they had fasted for at least 2 hours, using a tetrapolar electrode system. Patients were instructed to avoid strenuous exercise for at least 12 hours before testing. The following BIA-derived indices were recorded:

Phase angle: calculated as the arctangent of the reactance-to-resistance ratio (Xc/R), expressed in degrees, at 5, 50, and 250 kHz for each body segment, including right arm (RA), left arm (LA), right leg (RL), left leg (LL), trunk (TR), and whole body (WB). Higher values are generally considered to reflect better cellular membrane integrity and nutritional or functional status.
Extracellular water-to-total body water ratio (ECW/TBW): calculated as extracellular water divided by total body water and reported for the whole body and individual body segments.
Skeletal muscle index (SMI): calculated as appendicular skeletal muscle mass divided by height squared (kg/m²).
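The three index definitions above translate directly into arithmetic. In the study these values were recorded from BIA; the helper functions below (hypothetical names, illustrative inputs) only restate the definitions, with phase angle converted from radians to degrees.

```python
import math


def phase_angle_deg(resistance_ohm: float, reactance_ohm: float) -> float:
    """Phase angle in degrees: arctan(Xc / R), converted from radians."""
    return math.degrees(math.atan(reactance_ohm / resistance_ohm))


def ecw_tbw_ratio(ecw_l: float, tbw_l: float) -> float:
    """Extracellular water divided by total body water."""
    return ecw_l / tbw_l


def skeletal_muscle_index(asm_kg: float, height_m: float) -> float:
    """Appendicular skeletal muscle mass divided by height squared (kg/m^2)."""
    return asm_kg / height_m ** 2
```

For example, R = 500 Ω and Xc = 50 Ω give a phase angle of about 5.71°.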

Model development and evaluation

Predictive performance was evaluated using 10 repeats of 5-fold nested cross-validation, with five outer folds used for performance estimation and five inner folds used for hyperparameter tuning. Each repeat was generated from an independent random partition to improve the robustness of performance estimates. As VO₂peak was treated as a continuous outcome, all models were trained as regression models. Continuous predictors were z-score standardised using statistics derived from each training split and then applied to the corresponding validation and test data. Sex was treated as a categorical variable and encoded as an indicator variable without standardisation.

Multiple linear regression (MLR) was used as a baseline model commonly employed in medical statistics. Additional models included k-nearest neighbour regression (kNN), support vector regression (SVR), random forest (RF), extreme gradient boosting (XGB), multi-layer perceptron (MLP), and tabular deep-learning architectures [TabTransformer (14), FT-Transformer (15), TabNet (16)]. Hyperparameters were optimised using Optuna within the inner cross-validation loop. The selected hyperparameter set was then refitted on the full outer-training split and evaluated on the corresponding held-out outer test fold.
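The nested design described in the two paragraphs above can be sketched with scikit-learn. This is an illustration with stated substitutions: `GridSearchCV` with a tiny illustrative grid stands in for the Optuna search, only RF is shown, and the function name `nested_cv_rmse` is hypothetical. Placing `StandardScaler` inside the pipeline means scaling statistics are re-estimated on each training split only, while the 0/1 sex indicator passes through unscaled.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_rmse(X, y, continuous_idx, binary_idx, n_repeats=10, seed=0):
    """Repeated 5x5 nested CV for RF; returns one mean outer-fold RMSE per repeat."""
    pre = ColumnTransformer([
        ("z", StandardScaler(), continuous_idx),  # fitted on each training split only
        ("sex", "passthrough", binary_idx),       # 0/1 indicator left unscaled
    ])
    pipe = Pipeline([("pre", pre), ("rf", RandomForestRegressor(random_state=seed))])
    # Illustrative grid only; a stand-in for the Optuna search space.
    grid = {"rf__n_estimators": [50], "rf__max_depth": [None, 4]}

    repeat_rmse = []
    for r in range(n_repeats):
        # A new random partition for every repeat.
        outer = KFold(n_splits=5, shuffle=True, random_state=seed + r)
        fold_rmse = []
        for train, test in outer.split(X):
            inner = KFold(n_splits=5, shuffle=True, random_state=seed + r)
            search = GridSearchCV(pipe, grid, cv=inner,
                                  scoring="neg_root_mean_squared_error")
            # refit=True (default): the best setting is refitted on the whole outer-training split.
            search.fit(X[train], y[train])
            pred = search.predict(X[test])
            fold_rmse.append(np.sqrt(mean_squared_error(y[test], pred)))
        repeat_rmse.append(np.mean(fold_rmse))
    return np.array(repeat_rmse)
```

Each element of the returned array is one repeat's mean outer-fold RMSE; summarising these values as mean ± SD is one natural way to obtain summaries of the form reported in the Results.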

The full predictor set (Set F), comprising demographic, anthropometric, spirometric, and bioimpedance-derived body composition variables, was used for the primary model-development analysis. To assess whether this full predictor set provided incremental value beyond simpler variable combinations, this study additionally conducted feature-set comparison analyses using five reduced comparator sets:

Set A: demographic and anthropometric variables;
Set B: spirometric variables only, including FEV1, FVC, and FEV1/FVC;
Set C: bioimpedance-derived body composition variables only;
Set D: demographic and anthropometric variables combined with spirometric variables;
Set E: demographic and anthropometric variables combined with bioimpedance-derived body composition variables.
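A feature-set comparison of this kind amounts to re-running the same evaluation on different column subsets. The sketch below uses hypothetical column groups and, for brevity, a single 5-fold CV with MLR (which has no hyperparameters to tune) instead of the full repeated nested procedure; the key design choice is that every set is scored on identical splits, so differences reflect the predictors rather than the partition.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical column indices for each variable group.
GROUPS = {"demo": [0, 1], "spiro": [2, 3], "bia": [4, 5]}
FEATURE_SETS = {
    "A": ["demo"],
    "B": ["spiro"],
    "C": ["bia"],
    "D": ["demo", "spiro"],
    "E": ["demo", "bia"],
    "F": ["demo", "spiro", "bia"],
}


def compare_feature_sets(X, y, model=None, seed=0):
    """Mean 5-fold CV RMSE for each predictor set, all scored on the same splits."""
    if model is None:
        model = LinearRegression()
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, groups in FEATURE_SETS.items():
        cols = [c for g in groups for c in GROUPS[g]]
        scores = cross_val_score(model, X[:, cols], y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
        results[name] = -scores.mean()  # flip sign: scikit-learn scores are maximised
    return results
```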

For the feature-set comparison, we focused on the top-performing regression models identified in the full predictor-set analysis. The hyperparameter search spaces for the evaluated models are summarized in Table S1.

Interpretable post-hoc analysis of the best prediction model

For clinical interpretation, the RF model, which achieved the best overall performance, was selected as the final predictive model. To maximise clinical interpretability, this study applied the TreeSHAP algorithm to quantify how each variable increased or decreased the model's predicted VO₂peak relative to the average prediction. Global importance was summarised by the mean absolute SHAP values, and the sign of each SHAP value indicated whether that variable raised or lowered the individual prediction.
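TreeSHAP computes Shapley values efficiently for tree ensembles (in Python, typically via `shap.TreeExplainer`). To show what the resulting numbers mean, the sketch below instead enumerates every feature coalition by brute force, using an interventional value function averaged over a background sample. It is exponential in the number of features and is a teaching sketch, not TreeSHAP (whose path-dependent variant can give somewhat different values); the function names are hypothetical. Two properties used in the text are visible: per-patient values sum to the prediction minus the baseline (average prediction), and global importance is the mean of |SHAP| across patients.

```python
import itertools
import math

import numpy as np


def exact_shap(f, x, background):
    """Exact interventional Shapley values for one row x (brute force, O(2^M))."""
    x = np.asarray(x, dtype=float)
    bg = np.asarray(background, dtype=float)
    m = x.size

    def value(coalition):
        # Features in the coalition take x's values; others keep each background row's values.
        z = bg.copy()
        idx = list(coalition)
        if idx:
            z[:, idx] = x[idx]
        return f(z).mean()

    phi = np.zeros(m)
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            w = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for s in itertools.combinations(others, size):
                phi[j] += w * (value(s + (j,)) - value(s))
    return phi


def mean_abs_shap(f, X, background):
    """Global importance: mean |SHAP| per feature across the rows of X."""
    return np.mean([np.abs(exact_shap(f, row, background)) for row in X], axis=0)
```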

Statistical analysis

Descriptive statistics were used to summarize the baseline characteristics of the study population, with continuous predictors presented as means ± standard deviations. To evaluate the linear associations between measured VO₂peak and the most influential predictors identified by the machine learning model, Pearson’s correlation coefficients (r) were calculated. A two-sided P value <0.05 was considered statistically significant. All predictive modeling and statistical analyses were performed using Python version 3.9.
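Per-predictor Pearson correlations and two-sided P values can be obtained with `scipy.stats.pearsonr`; the helper name below is hypothetical. Pearson's r captures only linear association, so a predictor can carry a sizeable mean(|SHAP|) while having r near zero, as weight does in Table 4.

```python
import numpy as np
from scipy import stats


def correlate_with_vo2(X, vo2, names):
    """Pearson r and two-sided P value of each predictor column against measured VO2peak."""
    out = {}
    for j, name in enumerate(names):
        r, p = stats.pearsonr(X[:, j], vo2)
        out[name] = (float(r), float(p))
    return out
```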

Results

Baseline characteristics

Baseline characteristics are shown in Table 1. A total of 320 patients were included (mean age 69.122±7.640 years; 65.9% male), with a mean VO₂peak of 23.431±4.921 mL·kg⁻¹·min⁻¹.

Table 1

Baseline characteristics of the study cohort

Characteristics and CPET values	Female (N=109)	Male (N=211)	All (N=320)
Demographics & anthropometrics
Age (years)	67.174±7.625	70.128±7.469	69.122±7.640
Height (cm)	154.673±5.253	165.488±10.915	161.804±10.683
Weight (kg)	57.734±9.010	66.082±14.228	63.238±13.283
Body composition indices
250kHz-LA phase angle (°)	6.386±2.071	6.502±1.298	6.463±1.601
250kHz-LL phase angle (°)	4.415±0.961	4.529±0.918	4.490±0.933
250kHz-RA phase angle (°)	6.600±2.285	6.655±1.443	6.636±1.771
250kHz-RL phase angle (°)	4.807±1.002	4.878±1.103	4.854±1.069
250kHz-TR phase angle (°)	3.654±4.623	3.892±3.218	3.811±3.750
50kHz-LA phase angle (°)	5.170±2.443	5.534±1.540	5.410±1.900
50kHz-LL phase angle (°)	5.458±1.172	5.846±1.188	5.714±1.195
50kHz-RA phase angle (°)	5.350±2.354	5.738±1.785	5.606±2.002
50kHz-RL phase angle (°)	5.705±1.193	6.063±1.288	5.941±1.266
50kHz-TR phase angle (°)	5.362±5.051	5.542±3.146	5.481±3.893
50kHz-whole body phase angle (°)	5.428±1.893	5.796±1.418	5.671±1.603
5kHz-LA phase angle (°)	2.157±1.531	2.329±1.095	2.271±1.261
5kHz-LL phase angle (°)	2.328±0.704	2.525±0.763	2.458±0.748
5kHz-RA phase angle (°)	2.212±1.217	2.422±1.148	2.350±1.175
5kHz-RL phase angle (°)	2.420±0.790	2.667±0.863	2.583±0.846
5kHz-TR phase angle (°)	3.211±4.132	3.020±2.331	3.085±3.060
Protein (kg)	7.658±1.207	10.799±9.196	9.729±7.641
SMI (kg/m²)	7.206±1.418	8.750±2.749	8.224±2.488
TBW/FFM (%)	73.626±0.336	75.228±15.272	74.683±12.416
PBF (%)	30.928±8.236	20.893±8.432	24.311±9.616
ECW/TBW (ratio)	0.392±0.011	0.389±0.010	0.390±0.011
ECW/TBW (LA) (ratio)	0.379±0.015	0.379±0.009	0.379±0.012
ECW/TBW (LL) (ratio)	0.398±0.013	0.397±0.012	0.397±0.012
ECW/TBW (RA) (ratio)	0.377±0.015	0.376±0.011	0.376±0.013
ECW/TBW (RL) (ratio)	0.393±0.013	0.390±0.014	0.391±0.013
ECW/TBW (TR) (ratio)	0.391±0.009	0.387±0.010	0.389±0.010
Pulmonary function indices
FEV1 (L)	1.912±0.409	2.311±0.632	2.175±0.596
FEV1/FVC (%)	74.358±9.148	67.009±10.765	69.513±10.807
FVC (L)	2.570±0.461	3.435±0.716	3.140±0.761
Cardiopulmonary exercise index
VO₂peak (mL·kg⁻¹·min⁻¹)	23.005±4.382	23.651±5.173	23.431±4.921

Data are presented as mean ± standard deviation. BIA, bioelectrical impedance analysis; CPET, cardiopulmonary exercise testing; ECW, extracellular water; ECW/TBW, extracellular water-to-total body water ratio; FEV1, forced expiratory volume in 1 s; FEV1/FVC, ratio of forced expiratory volume in 1 s to forced vital capacity; FFM, fat-free mass; FVC, forced vital capacity; LA, left arm; LL, left leg; PBF, percent body fat; RA, right arm; RL, right leg; SMI, skeletal muscle index; TBW, total body water; TR, trunk; VO₂peak, peak oxygen uptake; WB, whole body. Phase angles were calculated at 5, 50, and 250 kHz.

Comparison of model performance

Table 2 summarises model performance. RF showed the best overall regression performance [root mean square error (RMSE) 3.750±0.731; mean absolute error (MAE) 2.901±0.599; coefficient of determination (R²) 0.323±0.153], although its margin over MLR (RMSE 3.791±0.914; R² 0.322±0.169) was small and the 95% CIs overlapped. SVR and XGB also showed competitive performance, whereas the deep-learning models generally did not outperform RF.
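For reference, the three regression metrics can be computed as below (RMSE and MAE in mL·kg⁻¹·min⁻¹). R² compares the squared error with that of predicting the mean of the evaluated fold, so a negative R², as for TabTransformer, means the model did worse than that constant prediction. The function name is hypothetical; scikit-learn's `r2_score` uses the same definition of R².

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for one evaluation fold."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # R^2 = 1 - SS_res / SS_tot; negative when worse than predicting the fold mean.
    r2 = 1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return rmse, mae, r2
```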

Table 2

Regression performance of prediction models for VO₂peak with 95% CIs

Model	RMSE ↓	MAE ↓	R² ↑
MLR	3.791±0.914 (3.137–4.445)	2.946±0.716 (2.434–3.458)	0.322±0.169 (0.202–0.443)
KNN	4.016±0.744 (3.484–4.549)	3.256±0.628 (2.807–3.706)	0.232±0.134 (0.136–0.328)
SVR	3.817±0.697 (3.319–4.315)	3.047±0.592 (2.623–3.471)	0.305±0.118 (0.221–0.390)
RF	3.750±0.731 (3.227–4.272)	2.901±0.599 (2.472–3.330)	0.323±0.153 (0.214–0.433)
XGB	3.840±0.671 (3.360–4.320)	3.049±0.547 (2.657–3.440)	0.287±0.165 (0.168–0.405)
MLP	4.411±0.944 (3.736–5.086)	3.573±0.830 (2.979–4.166)	0.070±0.217 (−0.085–0.226)
FT-Transformer	4.089±0.771 (3.537–4.640)	3.284±0.661 (2.811–3.757)	0.172±0.326 (−0.061–0.405)
TabTransformer	4.498±0.836 (3.900–5.096)	3.661±0.739 (3.132–4.189)	−0.036±0.489 (−0.386–0.313)
TabNet	4.442±0.861 (3.826–5.059)	3.601±0.821 (3.014–4.188)	0.058±0.201 (−0.085–0.202)

Data are presented as mean ± standard deviation (95% CI); RMSE and MAE are expressed in mL·kg⁻¹·min⁻¹. Downward arrows (↓) indicate that lower values represent better performance, whereas upward arrows (↑) indicate that higher values represent better performance. CI, confidence interval; KNN, k-nearest neighbors; MAE, mean absolute error; MLP, multilayer perceptron; MLR, multiple linear regression; R², coefficient of determination; RF, random forest; RMSE, root mean square error; SVR, support vector regression; VO₂peak, peak oxygen uptake; XGB, extreme gradient boosting.
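The text does not state how the 95% CIs were formed. The tabulated intervals are consistent, to rounding, with a two-sided t interval over 10 repeat-level values (df = 9), that is, mean ± t(0.975, 9) × SD/√10; the check below reproduces the RF and MLR RMSE intervals under that assumption. Because all repeats reuse the same 320 patients, such an interval reflects variability across cross-validation partitions rather than sampling of new patients. The function name is hypothetical.

```python
import math

from scipy import stats


def t_interval(mean, sd, n=10, level=0.95):
    """Two-sided t-based CI for the mean of n repeat-level estimates (assumed construction)."""
    half = stats.t.ppf(0.5 + level / 2, df=n - 1) * sd / math.sqrt(n)
    return mean - half, mean + half


lo, hi = t_interval(3.750, 0.731)  # RF RMSE, Table 2
```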

The additional feature-set comparison is summarised in Table 3. Across RF, MLR, and SVR, the full predictor set (Set F) showed the strongest overall regression performance, and within Set F, RF achieved the lowest RMSE and MAE and the highest R². Among the reduced predictor sets, bioimpedance-derived body composition variables alone (Set C) generally performed better than spirometric variables alone (Set B): the lowest RMSE for Set C was observed with MLR [RMSE 3.979; 95% confidence interval (CI), 3.479–4.480], whereas the lowest RMSE for Set B was observed with SVR (RMSE 4.245; 95% CI, 3.668–4.822). Similarly, adding bioimpedance-derived body composition variables to demographic and anthropometric variables (Set E) performed better than adding spirometric variables (Set D); Set E achieved its lowest RMSE with SVR (RMSE 3.931; 95% CI, 3.419–4.442) and Set D with MLR (RMSE 4.061; 95% CI, 3.466–4.656). Although the corresponding CIs overlapped, these comparisons suggest that bioimpedance-derived body composition variables may provide complementary predictive information beyond conventional spirometry.

Table 3

Regression performance of different feature sets for VO₂peak prediction with 95% CIs

Predictor set	Model	RMSE ↓	MAE ↓	R² ↑
Set A	RF	4.503±0.725 (3.984–5.021)	3.623±0.699 (3.124–4.123)	0.024±0.166 (−0.094–0.143)
	MLR	4.405±0.745 (3.872–4.938)	3.573±0.680 (3.087–4.060)	0.073±0.125 (−0.016–0.162)
	SVR	4.813±0.937 (4.143–5.484)	3.818±0.582 (3.402–4.234)	−0.373±1.436 (−1.401–0.654)
Set B	RF	4.366±0.808 (3.788–4.944)	3.485±0.680 (2.999–3.971)	0.098±0.120 (0.012–0.183)
	MLR	4.251±0.788 (3.688–4.815)	3.403±0.661 (2.930–3.875)	0.146±0.111 (0.066–0.225)
	SVR	4.245±0.807 (3.668–4.822)	3.469±0.719 (2.955–3.983)	0.148±0.119 (0.063–0.233)
Set C	RF	4.021±0.796 (3.452–4.591)	3.229±0.647 (2.766–3.692)	0.229±0.155 (0.118–0.339)
	MLR	3.979±0.700 (3.479–4.480)	3.131±0.612 (2.693–3.569)	0.236±0.169 (0.115–0.357)
	SVR	3.995±0.732 (3.471–4.519)	3.147±0.639 (2.690–3.604)	0.241±0.110 (0.163–0.320)
Set D	RF	4.098±0.699 (3.598–4.598)	3.311±0.513 (2.945–3.678)	0.190±0.154 (0.080–0.300)
	MLR	4.061±0.832 (3.466–4.656)	3.254±0.628 (2.804–3.703)	0.219±0.123 (0.131–0.307)
	SVR	4.222±0.470 (3.886–4.557)	3.474±0.423 (3.172–3.777)	0.095±0.375 (−0.173–0.364)
Set E	RF	3.958±0.772 (3.406–4.511)	3.177±0.622 (2.732–3.621)	0.250±0.159 (0.137–0.364)
	MLR	4.009±0.806 (3.432–4.586)	3.133±0.661 (2.660–3.606)	0.233±0.164 (0.116–0.350)
	SVR	3.931±0.715 (3.419–4.442)	3.156±0.611 (2.719–3.593)	0.261±0.136 (0.164–0.358)
Set F	RF	3.750±0.731 (3.227–4.272)	2.901±0.599 (2.472–3.330)	0.323±0.153 (0.214–0.433)
	MLR	3.791±0.914 (3.137–4.445)	2.946±0.716 (2.434–3.458)	0.322±0.169 (0.202–0.443)
	SVR	3.817±0.697 (3.319–4.315)	3.047±0.592 (2.623–3.471)	0.305±0.118 (0.221–0.390)

Data are presented as mean ± standard deviation (95% CI). Set A: demographic and anthropometric variables; Set B: spirometric variables only, including FEV1, FVC, and FEV1/FVC; Set C: bioimpedance-derived body composition variables only; Set D: demographic and anthropometric variables combined with spirometric variables; Set E: demographic and anthropometric variables combined with bioimpedance-derived body composition variables; Set F: full predictor set. Downward arrows (↓) indicate that lower values represent better performance, whereas upward arrows (↑) indicate that higher values represent better performance. CI, confidence interval; FEV1, forced expiratory volume in 1 s; FEV1/FVC, ratio of forced expiratory volume in 1 s to forced vital capacity; FVC, forced vital capacity; MAE, mean absolute error; MLR, multiple linear regression; R², coefficient of determination; RF, random forest; RMSE, root mean square error; SVR, support vector regression; VO₂peak, peak oxygen uptake.

Variable importance results from SHAP analysis

Figure 1 shows the SHAP summary plot for the 15 predictors with the highest mean absolute SHAP values (mean(|SHAP|)) across observations, and Table 4 quantifies their mean(|SHAP|) rankings together with Pearson’s correlation coefficients with VO₂peak. Features are ordered by decreasing mean(|SHAP|). In the SHAP plot, each point represents an individual observation; the x-axis denotes the SHAP value (feature contribution relative to the model baseline), and colour indicates the feature value (blue = low, red = high). Overall, the SHAP patterns suggest that predicted VO₂peak reflects multiple domains of physiologic reserve, including pulmonary function, aging, skeletal muscle quality, body composition, and fluid distribution.

Figure 1 SHAP summary plot of the top 15 predictors in the VO₂peak prediction model. Predictors are ranked in descending order of importance on the y-axis according to mean absolute SHAP values. The x-axis represents the SHAP value (impact on model output), and each dot represents an individual patient. Dot colour indicates the feature value from low (blue) to high (red). ECW, extracellular water; ECW/TBW, extracellular water-to-total body water ratio; FEV1, forced expiratory volume in 1 s (L); FFM, fat-free mass; FVC, forced vital capacity (L); FEV1/FVC, ratio of FEV1 to FVC (%); PBF, percent body fat (%); kHz, kilohertz; phase angle, bioimpedance-derived phase angle (°); LL/RA/RL/TR, left leg/right arm/right leg/trunk; SHAP, Shapley Additive exPlanations; TBW, total body water; VO₂peak, peak oxygen uptake (mL·kg⁻¹·min⁻¹).

Table 4

Mean absolute SHAP values (A) of the top 15 predictors in the SHAP analysis and their Pearson’s correlation coefficients with VO₂peak (B)

Rank	Predictors	(A)/(B)
1	FEV1 (L)	0.648/0.423 (P<0.001*)
2	FVC (L)	0.449/0.380 (P<0.001*)
3	Age	0.325/−0.377 (P<0.001*)
4	50kHz-RL phase angle	0.303/0.261 (P<0.001*)
5	PBF	0.266/−0.302 (P<0.001*)
6	50kHz-LL phase angle	0.239/0.302 (P<0.001*)
7	Weight	0.201/0.019 (P=0.734)
8	50kHz-TR phase angle	0.151/0.004 (P=0.938)
9	50kHz-whole body phase angle	0.129/0.126 (P=0.024*)
10	ECW/TBW (TR)	0.125/−0.275 (P<0.001*)
11	250kHz-LL phase angle	0.099/0.266 (P<0.001*)
12	ECW/TBW (LL)	0.092/−0.267 (P<0.001*)
13	5kHz-RA phase angle	0.092/0.163 (P=0.003*)
14	ECW/TBW (RL)	0.090/−0.229 (P<0.001*)
15	TBW/FFM	0.085/0.020 (P=0.716)

Global importance is summarised by mean(|SHAP|), the mean absolute SHAP value. Pearson’s correlation coefficients are presented as r with corresponding P values. *P<0.05 indicates statistical significance. ECW, extracellular water; FEV1, forced expiratory volume in 1 second; FFM, fat-free mass; FVC, forced vital capacity; LL, left leg; PBF, percent body fat; RA, right arm; RL, right leg; SHAP, Shapley Additive exPlanations; TBW, total body water; TR, trunk; VO₂peak, peak oxygen uptake.

Pulmonary reserve measures were the dominant contributors. FEV1 ranked first [mean(|SHAP|) = 0.648] and FVC ranked second (0.449); both showed moderate positive correlations with VO₂peak (FEV1: r=0.423; FVC: r=0.380; both P<0.001). Consistent with the SHAP distributions, higher FEV1 and FVC values were predominantly associated with positive SHAP values, indicating upward shifts in predicted VO₂peak. Age was the third most important predictor (0.325) and demonstrated an inverse pattern in both analyses, with higher age associated with negative SHAP values and a negative correlation with VO₂peak (r=−0.377, P<0.001).

Bioimpedance-derived markers commonly interpreted as reflecting skeletal muscle quality also contributed substantially to the model. Phase angle at 50 kHz, particularly in the lower extremities, featured prominently: RL phase angle ranked fourth (0.303; r=0.261, P<0.001) and LL phase angle ranked sixth (0.239; r=0.302, P<0.001). In the SHAP plot, higher lower-extremity phase angle values tended to correspond to positive SHAP values, consistent with the observed positive association between phase angle and measured VO₂peak. In contrast, indices reflecting adverse body composition and fluid distribution showed negative directionality. Percent body fat (PBF) ranked fifth (0.266) and was inversely associated with VO₂peak (r=−0.302, P<0.001). Weight (ranked seventh; 0.201) and trunk phase angle (ranked eighth; 0.151) exhibited comparatively small, more centrally distributed SHAP effects, and neither showed a significant linear correlation with VO₂peak (weight: r=0.019, P=0.734; trunk phase angle: r=0.004, P=0.938).

Whole-body phase angle ranked ninth (0.129) and showed a small but statistically significant positive correlation with VO₂peak (r=0.126, P=0.024), whereas ECW/TBW (TR) ranked tenth (0.125) and correlated negatively with VO₂peak (r=−0.275, P<0.001), with higher values tending to contribute negative SHAP values. Additional lower-ranked contributors included 250kHz-LL phase angle (ranked eleventh; 0.099; r=0.266, P<0.001), ECW/TBW (LL) (ranked twelfth; 0.092; r=−0.267, P<0.001), 5kHz-RA phase angle (ranked thirteenth; 0.092; r=0.163, P=0.003), ECW/TBW (RL) (ranked fourteenth; 0.090; r=−0.229, P<0.001), and TBW/FFM (ranked fifteenth; 0.085; r=0.020, P=0.716). Overall, the SHAP analysis suggests that the model’s predictions were driven primarily by pulmonary function (FEV1, FVC), age, and selected bioimpedance-derived variables, including lower-extremity phase angle and ECW/TBW. These patterns are physiologically plausible but should be interpreted as exploratory associations rather than evidence of a validated mechanistic pathway.

Exploratory threshold-based classification at the 20 mL·kg⁻¹·min⁻¹ threshold

To explore threshold-based classification performance, we assessed the models at the 20 mL·kg⁻¹·min⁻¹ VO₂peak cutoff, a clinically relevant threshold used in ESTS/ERS guidance to inform further physiologic assessment before lung resection (5,17). In clinical practice, formal CPET contributes to the identification of patients with limited functional capacity who may require comprehensive physiological assessment. The present analysis did not directly predict postoperative outcomes but examined whether model-estimated VO₂peak could support prioritisation for formal CPET. Accordingly, the classification analysis was interpreted as an exploratory assessment of potential CPET prioritisation, with attention to sensitivity and false-negative classifications because missing patients with low functional capacity could have important clinical consequences. The experimental settings for this classification analysis were identical to those employed for the regression models. Classification results across algorithms are summarised in Table 5, and ROC plots at the 20 mL·kg⁻¹·min⁻¹ VO₂peak threshold are presented in Figure 2.
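The threshold task can be made concrete with a few lines of NumPy. The sketch assumes that the positive class is VO₂peak <20 mL·kg⁻¹·min⁻¹ (as in the Figure 2 caption) and uses hypothetical helper names; AUROC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. Recall is the metric tied to the false negatives emphasised above, and accuracy alone can look favourable when positives are the minority, which is consistent with the SVC row in Table 5 (accuracy 0.786, recall 0.255).

```python
import numpy as np

THRESHOLD = 20.0  # mL/kg/min


def label_low_capacity(vo2peak):
    """Positive class (1) = VO2peak below the 20 mL/kg/min cutoff (assumed coding)."""
    return (np.asarray(vo2peak, dtype=float) < THRESHOLD).astype(int)


def auroc(y, score):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    y = np.asarray(y)
    s = np.asarray(score, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def recall_f1(y, y_hat):
    """Recall (sensitivity) and F1 for the positive (low-capacity) class."""
    y = np.asarray(y)
    y_hat = np.asarray(y_hat)
    tp = np.sum((y == 1) & (y_hat == 1))
    fn = np.sum((y == 1) & (y_hat == 0))
    fp = np.sum((y == 0) & (y_hat == 1))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1
```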

Table 5

Classification performance of prediction models with 95% CIs at the 20 mL·kg⁻¹·min⁻¹ VO₂peak threshold, a cutoff used in the ERS/ESTS guidelines

Model	AUROC ↑	Accuracy ↑	Recall ↑	F1-score ↑
KNN	0.762±0.129 (0.670–0.854)	0.779±0.063 (0.734–0.825)	0.269±0.202 (0.125–0.414)	0.327±0.202 (0.183–0.472)
SVC	0.813±0.074 (0.761–0.866)	0.786±0.048 (0.752–0.821)	0.255±0.186 (0.122–0.388)	0.312±0.205 (0.165–0.458)
RF	0.842±0.072 (0.790–0.894)	0.797±0.077 (0.741–0.852)	0.574±0.255 (0.391–0.756)	0.538±0.178 (0.411–0.665)
XGB	0.824±0.101 (0.752–0.896)	0.817±0.075 (0.764–0.871)	0.586±0.235 (0.417–0.754)	0.574±0.191 (0.438–0.711)
DNN (MLP)	0.797±0.072 (0.746–0.849)	0.803±0.061 (0.760–0.847)	0.493±0.226 (0.331–0.654)	0.506±0.184 (0.374–0.638)
FT-Transformer	0.819±0.093 (0.752–0.885)	0.787±0.078 (0.732–0.843)	0.677±0.194 (0.538–0.816)	0.594±0.151 (0.487–0.702)
TabTransformer	0.815±0.091 (0.749–0.880)	0.753±0.063 (0.708–0.798)	0.648±0.179 (0.520–0.776)	0.545±0.115 (0.462–0.627)
TabNet	0.810±0.134 (0.714–0.905)	0.769±0.081 (0.711–0.827)	0.734±0.232 (0.568–0.900)	0.591±0.146 (0.486–0.695)

Data are presented as mean ± standard deviation (95% CI). Upward arrows (↑) indicate that higher values represent better predictive performance. AUROC, area under the receiver operating characteristic curve; CI, confidence interval; DNN, deep neural network; ERS, European Respiratory Society; ESTS, European Society of Thoracic Surgeons; KNN, k-nearest neighbors; MLP, multilayer perceptron; RF, random forest; SVC, support vector classification; VO₂peak, peak oxygen uptake; XGB, extreme gradient boosting.
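The reported 95% CIs appear consistent, to within rounding, with a t-based interval for a mean over the 10 repeated test evaluations mentioned in the Figure 2 legend, that is, mean ± t(0.975, 9) × SD/√10. The sketch below checks this for the RF AUROC row. The t-based interval with n = 10 is our inference from the published numbers, not a method stated in this section, and `t_interval` is a hypothetical helper.

```python
import math

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (n = 10 repeats).
T_975_DF9 = 2.2622


def t_interval(mean, sd, n_repeats=10, t_crit=T_975_DF9):
    """95% CI for a mean over repeated evaluations, assuming a t-based interval."""
    half_width = t_crit * sd / math.sqrt(n_repeats)
    return mean - half_width, mean + half_width


# RF AUROC from Table 5: 0.842 +/- 0.072, reported 95% CI 0.790-0.894.
low, high = t_interval(0.842, 0.072)
print(f"{low:.3f}-{high:.3f}")  # prints 0.790-0.894
```

The same calculation reproduces, for example, the KNN AUROC interval (0.670–0.854); occasional third-decimal differences in other rows would be expected if the underlying means and SDs were rounded before publication.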

Figure 2 Mean receiver operating characteristic curves for the top five full predictor-set models based on AUROC in the threshold-based classification task for VO₂peak <20 mL·kg⁻¹·min⁻¹. Curves were averaged across 10 repeated test evaluations. The shaded areas represent variability across repeated evaluations. AUROC, area under the receiver operating characteristic curve; RF, random forest; SVC, support vector classification; VO₂peak, peak oxygen uptake; XGB, extreme gradient boosting.
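The legend states that ROC curves were averaged across 10 repeated test evaluations but does not say how. One common approach, assumed here purely for illustration, is vertical averaging: each repeat's ROC curve is linearly interpolated onto a shared false-positive-rate grid and the true-positive rates are averaged pointwise. All function names, scores, and labels below are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR); label 1 = VO2peak < 20, higher score = more likely low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Cases with tied scores cross the threshold together, giving one point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """Linearly interpolated TPR at a given FPR; vertical jumps take their upper value."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            y = y1 if x1 == x0 else y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def mean_roc(repeats, grid_size=101):
    """Vertically average ROC curves from (scores, labels) pairs, one per repeat."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    curves = [roc_points(s, y) for s, y in repeats]
    mean_tpr = [sum(tpr_at(c, f) for c in curves) / len(curves) for f in grid]
    return grid, mean_tpr


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Two hypothetical repeats (predicted probability of VO2peak < 20, true label).
repeats = [
    ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),  # perfect ranking, AUROC 1.0
    ([0.9, 0.7, 0.6, 0.1], [1, 0, 1, 0]),  # one misordered pair, AUROC 0.75
]
grid, mean_tpr = mean_roc(repeats)
mean_auc = trapezoid_auc(grid, mean_tpr)
```

The area under the averaged curve (about 0.876 here) is close to, but not identical with, the mean of the per-repeat AUROCs (0.875), because of grid resolution; tabulated mean AUROCs and the area under a plotted mean curve therefore need not match exactly.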

RF achieved the highest AUROC (0.842±0.072), indicating the best overall discrimination across classification thresholds, whereas XGBoost achieved the highest accuracy (0.817±0.075), that is, the highest overall proportion of correctly classified patients. However, because this analysis aimed to identify patients with VO₂peak <20 mL·kg⁻¹·min⁻¹, recall (sensitivity), the proportion of low-VO₂peak patients correctly identified by the model, is also important. In this respect, TabNet showed the highest recall (0.734±0.232), followed by FT-Transformer (0.677±0.194) and TabTransformer (0.648±0.179). FT-Transformer achieved the highest F1-score (0.594±0.151), with TabNet showing a very similar value (0.591±0.146); because the F1-score is the harmonic mean of precision and recall, these values indicate the most favourable balance between the two. The 95% CIs for AUROC and accuracy overlapped substantially across models, so differences on these metrics should be interpreted cautiously. Taken together, these findings suggest a trade-off between overall discrimination (RF), classification accuracy (XGBoost), and sensitivity-oriented classification performance (TabNet and FT-Transformer), which should be considered when interpreting false-positive and false-negative classifications.
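Because the positive class here is VO₂peak <20 mL·kg⁻¹·min⁻¹, recall is the share of truly low-VO₂peak patients that a model flags, and the F1-score is the harmonic mean of precision and recall. The sketch below derives these metrics from a confusion matrix. The labels and predictions are hypothetical, and the study's own implementation may differ in details such as how zero denominators are handled.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), precision and F1 for positive class 1.

    1 = VO2peak < 20 mL/kg/min (low functional capacity), 0 = otherwise.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "false_negatives": fn}


# Hypothetical cohort of 10 patients, 3 of whom truly have VO2peak < 20.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
m = classification_metrics(y_true, y_pred)
```

In this toy cohort accuracy is 0.80 even though one of the three low-VO₂peak patients is missed. When the low-VO₂peak class is the minority, high accuracy can coexist with modest recall, as with KNN in Table 5 (accuracy 0.779, recall 0.269).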

Discussion

Preoperative functional assessment remains central to perioperative evaluation in thoracic oncology because reduced cardiopulmonary reserve is associated with postoperative pulmonary complications after lung resection (18). Preoperative factors such as age, body mass index, smoking, poor physical condition, respiratory disease, diabetes, and neurological comorbidity carry substantial predictive value for postoperative pneumonia after thoracoscopic lung cancer surgery, reinforcing the importance of preoperative physiologic evaluation (19). Although CPET is the reference standard, its availability remains limited in some settings (20). Estimating VO₂peak from routine preoperative data may therefore help prioritise formal CPET or inform consideration of prehabilitation when resources are limited, rather than replace formal testing (21,22).

The model demonstrated moderate regression performance (R²=0.323±0.153; MAE=2.901±0.599 mL·kg⁻¹·min⁻¹), explaining approximately one-third of the variance in measured VO₂peak, and the classification models showed reasonable discrimination (AUROC up to 0.842). This level of explanatory power is in line with recent applications of machine learning in cardiothoracic risk assessment (23-26). Overall, RF provided the strongest discrimination, XGBoost the highest overall classification accuracy, and TabNet and FT-Transformer more sensitivity-oriented classification performance, with FT-Transformer achieving the highest F1-score and thus the best balance between precision and recall.
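For the regression task, R² compares the model's squared errors with those of a constant prediction at the mean measured VO₂peak, so R² ≈ 0.32 corresponds to roughly one-third of the variance explained; MAE is the average absolute error in mL·kg⁻¹·min⁻¹. A minimal sketch of both definitions (the standard ones, as used for example by scikit-learn's `r2_score` and `mean_absolute_error`), with hypothetical values:

```python
from statistics import fmean


def r2_and_mae(measured, predicted):
    """Coefficient of determination (R^2) and mean absolute error."""
    mean_measured = fmean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))  # model error
    ss_tot = sum((y - mean_measured) ** 2 for y in measured)         # error of predicting the mean
    r2 = 1 - ss_res / ss_tot
    mae = fmean(abs(y - p) for y, p in zip(measured, predicted))
    return r2, mae


measured = [14.0, 18.0, 22.0, 26.0]   # hypothetical VO2peak, mL/kg/min
predicted = [17.0, 18.0, 21.0, 24.0]  # hypothetical model estimates
r2, mae = r2_and_mae(measured, predicted)
```

Here R² = 1 − 14/80 = 0.825 and MAE = 1.5 mL·kg⁻¹·min⁻¹. Because MAE is expressed in the outcome's own units, an MAE of about 2.9 mL·kg⁻¹·min⁻¹ can be read directly against the width of the band around the 20 mL·kg⁻¹·min⁻¹ threshold.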

Crucially, the present model is not intended as a direct prognosticator of postoperative complications or mortality. Rather, it serves as a surrogate estimator for CPET-derived VO₂peak—a well-established physiologic proxy for perioperative risk. This distinction bears clinical relevance, as the model aims to support an earlier decision point in the preoperative pathway: identifying patients who may warrant prioritised formal CPET or further preoperative evaluation. In this sense, the proposed approach should be interpreted as an exploratory CPET prioritisation aid based on routinely available data, rather than as a replacement for formal physiologic testing or a stand-alone operative risk calculator.

Although VO₂peak is a clinically important CPET-derived measure, it represents only one dimension of CPET. Formal CPET provides multiple additional parameters, including the ventilatory equivalent for carbon dioxide (VE/VCO₂) slope, anaerobic threshold, electrocardiographic response, and evidence of ventilatory limitation. These indices may provide information on ventilatory efficiency, cardiovascular response, respiratory mechanics, and postoperative risk that cannot be captured by VO₂peak estimation alone. Therefore, the present model should not be interpreted as capturing the full multidimensional physiologic information provided by CPET or as replacing formal CPET. Future studies should evaluate whether routinely available preoperative data can be used to estimate VE/VCO₂ slope or multiple CPET-derived indices, and whether multi-index prediction improves CPET prioritisation and perioperative decision support.

Predicted VO₂peak should therefore be viewed as supportive information for considering CPET prioritisation and possible prehabilitation assessment, not as a basis to defer formal testing (27,28). This potential role is also consistent with clinical guidelines on enhanced recovery after lung surgery published in TLCR and with recent TLCR work that frames the preoperative treatment interval as a potential window for prehabilitation rather than merely a waiting period (29,30). Although the models showed reasonable discrimination at the 20 mL·kg⁻¹·min⁻¹ threshold, sensitivity was only moderate. This is clinically important because false-negative classification of patients with low VO₂peak could delay formal CPET or further physiologic assessment. In a screening or prioritisation context, a more sensitivity-oriented threshold may be preferable to reduce false negatives, although this would increase false positives and additional CPET referrals. The ROC analysis supported this interpretation by showing reasonable threshold-dependent discrimination for the top five full predictor-set models based on AUROC; however, ROC curves alone do not define an optimal clinical threshold. Threshold selection should therefore be tuned to the intended clinical use, and prospective validation and calibration are needed before any threshold can be recommended for clinical use.
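To illustrate the sensitivity-oriented threshold tuning discussed above, the sketch below selects the highest probability cutoff that still reaches a target sensitivity and reports the resulting specificity and number of patients flagged for CPET. The 0.90 target, the function name, and the scores are hypothetical; in practice such a cutoff would need to be fixed on development data and then checked in prospective validation.

```python
def cutoff_for_sensitivity(scores, labels, target=0.90):
    """Highest cutoff whose rule 'flag if score >= cutoff' reaches the target sensitivity.

    labels: 1 = VO2peak < 20 mL/kg/min.
    Returns (cutoff, sensitivity, specificity, n_flagged), or None if unreachable.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Lowering the cutoff can only add flagged patients, so scan from high to low.
    for cutoff in sorted(set(scores), reverse=True):
        flagged = [s >= cutoff for s in scores]
        tp = sum(f and y == 1 for f, y in zip(flagged, labels))
        tn = sum((not f) and y == 0 for f, y in zip(flagged, labels))
        sensitivity = tp / n_pos
        if sensitivity >= target:
            return cutoff, sensitivity, tn / n_neg, sum(flagged)
    return None


# Hypothetical predicted probabilities of VO2peak < 20 and true labels.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
cutoff, sens, spec, n_flagged = cutoff_for_sensitivity(scores, labels, target=0.90)
```

With these toy data, reaching 90% sensitivity requires lowering the cutoff to 0.35, which flags 6 of 10 patients and reduces specificity to 4/6; a 50% target would flag only 2 patients with specificity 1.0. This is the false-negative versus referral-volume trade-off described in the text.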

Model interpretability is important for clinical adoption of machine learning (31) and allows the biological plausibility of this VO₂peak prediction approach to be examined. This interpretation-first framing is also in line with recent TLCR original research in thoracic oncology, in which IML using readily available data was developed to support clinical interpretation rather than replace clinical judgment (32). The prominence of spirometric indices, particularly FEV1 and FVC, is consistent with the contribution of ventilatory capacity to exercise tolerance among candidates for lung resection, although these variables may also partly reflect body size. Age also contributed materially, consistent with the age-related decline in cardiorespiratory fitness. These findings suggest that routine demographic and pulmonary function measures capture substantial information about functional reserve. However, the additional feature-set comparison indicated that the expanded model was not driven solely by spirometric variables: the body-composition-only and basic-plus-body-composition feature sets performed comparably to or better than the corresponding spirometry-only feature sets, and the full predictor set showed the strongest overall regression performance. Therefore, the practical value of the expanded model lies not in replacing spirometry but in integrating spirometric and bioimpedance-derived information in an interpretable manner.

The inclusion of bioimpedance-derived variables was motivated by their routine availability and their potential to capture non-respiratory dimensions of preoperative physiologic status, such as body composition, hydration, and tissue-related characteristics. In the present model, several bioimpedance-derived variables appeared among the influential predictors, and the feature-set comparison suggested that they may provide information complementary to spirometric variables, as the body-composition-only and basic-plus-body-composition feature sets performed favourably compared with the corresponding spirometry-based feature sets. Nevertheless, these results should not be interpreted as indicating that bioimpedance-derived variables replace established spirometric predictors; rather, they may complement conventional pulmonary function measures in routine preoperative assessment.

Phase angle measures contributed to prediction alongside spirometric variables, with lower-extremity phase angle ranking among the influential non-spirometric predictors. This finding is physiologically plausible because phase angle has been interpreted as a marker related to cellular membrane properties, hydration status, and nutritional or functional status (33). However, the present study was designed for prediction rather than mechanistic validation, so the predictive contribution of phase angle should be interpreted as exploratory and not as evidence of a validated causal or mechanistic pathway. Although conventional muscle-mass indices such as SMI were less influential than phase-angle variables, mechanistic conclusions cannot be drawn because SHAP values represent model-specific feature contributions rather than causal effects, particularly in the presence of correlated bioimpedance-derived variables (34,35). Exercise performance may reflect not only pulmonary limitation and body size, but also peripheral muscle function and metabolic reserve (36), which are relevant to multimodal prehabilitation frameworks (37,38). Focused prospective studies are therefore needed to evaluate whether phase angle has an independent physiologic relationship with CPET-derived exercise capacity.

From a practical perspective, these findings suggest that bioimpedance-derived variables may enrich routine preoperative assessment without adding a substantial procedural burden when bioelectrical impedance analysis is already available in clinical practice. These markers should not be interpreted causally or in isolation, but they may provide supportive information about muscle quality, hydration status, or systemic reserve that could inform further clinical assessment. Accordingly, the added value of an interpretable model in this setting lies not only in prediction accuracy, but also in making these multidimensional contributors clinically visible at the individual-patient level.

Adiposity and fluid-distribution indices also contributed to prediction, suggesting that the model captures physiological burden not fully represented by body weight alone. PBF may reflect mechanical load, whereas extracellular water to total body water ratios may reflect altered body composition or fluid balance. Because VO₂peak was normalised to body mass, associations with weight-, BMI-, or adiposity-related predictors may partly reflect mathematical coupling with the denominator rather than differences in physiological exercise capacity alone. Even so, these variables may broaden the characterisation of preoperative vulnerability beyond pulmonary function alone.
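A worked example of the mathematical coupling described above, using hypothetical values: with identical absolute oxygen uptake, relative VO₂peak falls as body mass rises, purely through the denominator.

```latex
\[
\dot{V}\mathrm{O}_{2,\mathrm{peak}}^{\mathrm{rel}}
  = \frac{\dot{V}\mathrm{O}_{2,\mathrm{peak}}^{\mathrm{abs}}}{m_{\mathrm{body}}},
\qquad
\frac{1600\ \mathrm{mL\cdot min^{-1}}}{80\ \mathrm{kg}} = 20\ \mathrm{mL\cdot kg^{-1}\cdot min^{-1}},
\qquad
\frac{1600\ \mathrm{mL\cdot min^{-1}}}{100\ \mathrm{kg}} = 16\ \mathrm{mL\cdot kg^{-1}\cdot min^{-1}}
\]
```

The heavier patient falls below the 20 mL·kg⁻¹·min⁻¹ threshold without any difference in absolute aerobic capacity, so a weight- or adiposity-related predictor can appear informative partly through this denominator alone.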

These findings remain relevant to contemporary thoracic practice, in which perioperative outcomes depend on patient-level physiological reserve as well as operative approach. Recent debate regarding the incremental predictive utility of CPET underscores the need for objective, clinically interpretable measures that capture systemic fitness rather than lung mechanics alone (7).

If implemented clinically, the model could serve as supportive information when CPET capacity is constrained, particularly for prioritising formal CPET or considering prehabilitation assessment, rather than as a substitute for formal testing. Its safety would depend on calibration and threshold-specific operating characteristics, and a decision-analytic evaluation would be needed to determine whether it improves outcomes or resource allocation.
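One standard quantity such a decision-analytic evaluation could use is net benefit from decision curve analysis. Net benefit weighs true positives against false positives at a chosen threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t). The sketch below compares a model-guided referral rule with referring everyone and referring no one. The counts and the 0.20 threshold probability are hypothetical, and this section does not report such an analysis.

```python
def net_benefit(tp, fp, n, threshold_prob):
    """Net benefit (decision curve analysis) of a referral rule at threshold probability p_t."""
    odds = threshold_prob / (1 - threshold_prob)
    return tp / n - (fp / n) * odds


# Hypothetical cohort: 100 patients, 25 of whom have VO2peak < 20 mL/kg/min.
n, n_low = 100, 25
# Hypothetical p_t = 0.20: accepting up to 4 unnecessary CPET referrals per low-VO2peak patient found.
p_t = 0.20

nb_model = net_benefit(tp=20, fp=15, n=n, threshold_prob=p_t)  # model flags 35 patients
nb_refer_all = net_benefit(tp=n_low, fp=n - n_low, n=n, threshold_prob=p_t)
nb_refer_none = 0.0  # referring no one yields no true or false positives
```

With these toy counts the model-guided rule (NB = 0.1625) outperforms both referring everyone (0.0625) and referring no one (0). In a real evaluation the comparison would be repeated across a range of clinically plausible p_t values rather than at a single point.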

Several limitations should be acknowledged. First, the single-centre retrospective design limits generalisability, and external multicentre validation is required before the model can be considered for clinical implementation. Predictive performance was moderate, likely reflecting heterogeneity not fully captured by the available predictors. DLCO was unavailable for all patients, limiting comparability with ERS/ESTS guideline-based assessment. Other relevant variables, including comorbidity burden, smoking exposure, anaemia, tumour stage, and planned extent of resection, were not included. In addition, because the final analytic cohort was based on complete-case inclusion, patients with missing required predictor or outcome data were excluded. This may have introduced selection bias if excluded patients differed systematically from those included in the analysis. Although routine preoperative assessments were required to have been performed within 6 months of CPET and the observed intervals were generally short, temporal variability between routine assessments and CPET may still have influenced the measured associations in some patients.

Second, the scope of the model should be interpreted cautiously. Although phase angle was among the influential non-spirometric predictors, this observational prediction study cannot determine whether phase angle has a causal or mechanistic relationship with VO₂peak. The analysis focused on VO₂peak and a single clinically relevant threshold, rather than the full continuum of exercise capacity, other CPET-derived indices, or postoperative outcomes. The model did not estimate other clinically important CPET-derived information, such as VE/VCO₂ slope, anaerobic threshold, electrocardiographic response, or ventilatory limitation, and therefore cannot capture the full multidimensional physiologic assessment provided by formal CPET. Future studies should evaluate whether VE/VCO₂ slope or multiple CPET-derived indices can be estimated from routinely available preoperative data. Prospective studies are also needed to determine whether model-estimated VO₂peak can inform CPET prioritisation or prehabilitation assessment and whether such use improves postoperative outcomes. In addition, threshold selection was not prospectively optimised for clinical implementation. Although the model showed reasonable discrimination, sensitivity was moderate at the evaluated threshold, and alternative thresholds may be required depending on whether the intended use prioritises sensitivity, specificity, or resource efficiency. The 20 mL·kg⁻¹·min⁻¹ threshold may function as a sensitivity- or negative predictive value-oriented screening threshold for CPET prioritisation, whereas the 15 mL·kg⁻¹·min⁻¹ threshold may represent a specificity- or positive predictive value-oriented high-risk cutoff (39). Because few patients in this cohort had VO₂peak ≤15 mL·kg⁻¹·min⁻¹, the performance and clinical role of this lower threshold should be evaluated in larger external cohorts.
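The contrast between NPV-oriented and PPV-oriented cutoffs depends on prevalence. For fixed sensitivity and specificity, PPV falls as the target condition becomes rarer, which is one reason performance at a less common category such as VO₂peak ≤15 mL·kg⁻¹·min⁻¹ needs larger cohorts. A sketch via Bayes' rule, with hypothetical operating characteristics and prevalences:

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical classifier (sensitivity 0.80, specificity 0.80) at two prevalences.
common = ppv_npv(0.80, 0.80, prevalence=0.30)  # e.g. a common low-VO2peak category
rare = ppv_npv(0.80, 0.80, prevalence=0.05)    # e.g. a rare, more severe category
```

With identical sensitivity and specificity, PPV drops from about 0.63 to about 0.17 as prevalence falls from 30% to 5%, while NPV rises from about 0.90 to about 0.99. A cutoff for a rare category can therefore look reassuring on NPV while yielding mostly false-positive flags.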

Conclusions

Limited access to CPET remains a practical bottleneck in preoperative lung resection pathways. Using routinely available clinical, spirometric, and bioimpedance-derived measures, this study developed and internally evaluated an IML model to estimate VO₂peak and explore threshold-based classification at 20 mL·kg⁻¹·min⁻¹. The model may provide supportive information for prioritising formal CPET when resources are constrained, while offering patient-level explanations of model predictions. Further multicentre validation and prospective implementation studies are needed to determine whether model-informed prioritisation optimises CPET utilisation and ultimately improves perioperative outcomes.

Acknowledgments

During the preparation of this work, the authors used ChatGPT (OpenAI) to improve the manuscript’s grammar, readability, and phrasing. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the integrity and accuracy of the published article.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/dss

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/prf

Funding: This study was supported by Biomedical Research Institute Grant, Pusan National University Hospital (20240050).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0351/coif). Se-Hun Kim, S.E.P., C.H.H., K.H.K., and Sang-Hun Kim report receiving support from Biomedical Research Institute Grant, Pusan National University Hospital. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Institutional Review Board of Pusan National University Hospital (IRB No. 2407-016-141), and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Stéphan F, Boucheseiche S, Hollande J, et al. Pulmonary complications following lung resection: a comprehensive analysis of incidence and possible risk factors. Chest 2000;118:1263-70.
Benzo R, Kelley GA, Recchi L, et al. Complications of lung resection and exercise capacity: a meta-analysis. Respir Med 2007;101:1790-7.
American Thoracic Society. ATS/ACCP Statement on cardiopulmonary exercise testing. Am J Respir Crit Care Med 2003;167:211-77.
Brunelli A, Kim AW, Berger KI, et al. Physiologic evaluation of the patient with lung cancer being considered for resectional surgery: Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e166S-e190S.
Brunelli A, Charloux A, Bolliger CT, et al. ERS/ESTS clinical guidelines on fitness for radical therapy in lung cancer patients (surgery and chemo-radiotherapy). Eur Respir J 2009;34:17-41.
Pele I, Mihălțan FD. Cardiopulmonary exercise testing in thoracic surgery. Pneumologia 2020;69:3-10.
Filakovszky Á, Brat K, Tschoellitsch T, et al. Cardiopulmonary exercise testing before lung resection surgery: still indicated? Evaluating predictive utility using machine learning. Thorax 2026;81:474-82.
Akamatsu Y, Kusakabe T, Arai H, et al. Phase angle from bioelectrical impedance analysis is a useful indicator of muscle quality. J Cachexia Sarcopenia Muscle 2022;13:180-9.
Abdelnour D, Grove Ii M, Pulford-Thorpe K, et al. Associations between absolute and relative handgrip strength with fitness and fatness. Sports Med Int Open 2025;9:a25377537.
Fearon K, Strasser F, Anker SD, et al. Definition and classification of cancer cachexia: an international consensus. Lancet Oncol 2011;12:489-95.
Jeong D, Oh YM, Lee SW, et al. Comparison of Predicted Exercise Capacity Equations in Adult Korean Subjects. J Korean Med Sci 2022;37:e113.
Kelly CJ, Karthikesalingam A, Suleyman M, et al. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.
Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (NIPS 2017). Available online: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
Huang X, Khetan A, Cvitkovic M, et al. TabTransformer: Tabular data modeling using contextual embeddings. arXiv preprint arXiv:2012.06678. 2020. Available online: https://arxiv.org/abs/2012.06678
Gorishniy Y, Rubachev I, Khrulkov V, et al. Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems 34 (NeurIPS 2021):18932-43. Available online: https://proceedings.neurips.cc/paper/2021/hash/9d86d83f925f2149e9edb0ac3b49229c-Abstract.html
Arik SÖ, Pfister T. TabNet: Attentive interpretable tabular learning. Proceedings of the AAAI Conference on Artificial Intelligence 2021:6679-87.
Orlandi R, Rinaldo RF, Mazzucco A, et al. Early outcomes of "low-risk" patients undergoing lung resection assessed by cardiopulmonary exercise testing: Single-institution experience. Front Surg 2023;10:1130919.
Petrella F, Cara A, Cassina EM, et al. Evaluation of preoperative cardiopulmonary reserve and surgical risk of patients undergoing lung cancer resection. Ther Adv Respir Dis 2024;18:17534666241292488.
Bian H, Liu M, Liu J, et al. Seven preoperative factors have strong predictive value for postoperative pneumonia in patients undergoing thoracoscopic lung cancer surgery. Transl Lung Cancer Res 2023;12:2193-208.
Ferguson M, Shulman M. Cardiopulmonary Exercise Testing and Other Tests of Functional Capacity. Curr Anesthesiol Rep 2022;12:26-33.
Melnyk M, Casey RG, Black P, et al. Enhanced recovery after surgery (ERAS) protocols: Time to change practice? Can Urol Assoc J 2011;5:342-8.
Solaini L, Prusciano F, Bagioni P, et al. Video-assisted thoracic surgery (VATS) of the lung: analysis of intraoperative and postoperative complications over 15 years and review of the literature. Surg Endosc 2008;22:298-310.
Betts KS, Marathe SP, Chai K, et al. A machine learning approach to predicting 30-day mortality following paediatric cardiac surgery: findings from the Australia New Zealand Congenital Outcomes Registry for Surgery (ANZCORS). Eur J Cardiothorac Surg 2023;64:ezad160.
Hui V, Litton E, Edibam C, et al. Using machine learning to predict bleeding after cardiac surgery. Eur J Cardiothorac Surg 2023;64:ezad297.
Qiu X, Hu S, Dong S, et al. Construction of an automated machine learning-based predictive model for postoperative pulmonary complications risk in non-small cell lung cancer patients undergoing thoracoscopic surgery. PLoS One 2025;20:e0333413.
Chen S, Deng T, Yang Q, et al. Development and validation of an explainable machine learning model for predicting postoperative pulmonary complications after lung cancer surgery: a machine learning study. EClinicalMedicine 2025;86:103386.
Zhou N, Ripley-Gonzalez JW, Zhang W, et al. Preoperative exercise training decreases complications of minimally invasive lung cancer surgery: A randomized controlled trial. J Thorac Cardiovasc Surg 2025;169:516-528.e10.
Guo Y, Pan M, Xiong M, et al. Efficacy of preoperative pulmonary rehabilitation in lung cancer patients: a systematic review and meta-analysis of randomized controlled trials. Discov Oncol 2025;16:56.
Geomini LD, van Steenwijk QCA, Janki S, et al. Redefining treatment interval in lung cancer surgery in the era of prehabilitation: a systematic review. Transl Lung Cancer Res 2025;14:5082-98.
Gao S, Barello S, Chen L, et al. Clinical guidelines on perioperative management strategies for enhanced recovery after lung surgery. Transl Lung Cancer Res 2019;8:1174-87.
Band SS, Yarahmadi A, Hsu CC, et al. Application of explainable artificial intelligence in medical health: A systematic review of interpretability methods. Informatics in Medicine Unlocked 2023;40:101286.
Chen Y, Jin J, Mao Y, et al. Development and validation of an interpretable machine learning model for prediction of occult lymph node metastasis in clinical stage T1 lung adenocarcinoma. Transl Lung Cancer Res 2025;14:5415-30.
Prete M, Ballarin G, Porciello G, et al. Bioelectrical impedance analysis-derived phase angle (PhA) in lung cancer patients: a systematic review. BMC Cancer 2024;24:608.
Kirk B, Cawthon PM, Arai H, et al. The Conceptual Definition of Sarcopenia: Delphi Consensus from the Global Leadership Initiative in Sarcopenia (GLIS). Age Ageing 2024;53:afae052.
Hou S, Zhao X, Wei J, et al. The diagnostic performance of phase angle for sarcopenia among older adults: A systematic review and diagnostic meta-analysis. Arch Gerontol Geriatr 2025;131:105754.
Sietsema KE, Stringer WW, Sue DY, et al. Wasserman & Whipp’s: principles of exercise testing and interpretation: including pathophysiology and clinical applications. Lippincott Williams & Wilkins, 2020.
Cruz Mosquera FE, Murillo SR, Naranjo Rojas A, et al. Effect of Exercise and Pulmonary Rehabilitation in Pre- and Post-Surgical Patients with Lung Cancer: Systematic Review and Meta-Analysis. Medicina (Kaunas) 2024;60:1725.
Granger C, Cavalheri V. Preoperative exercise training for people with non-small cell lung cancer. Cochrane Database Syst Rev 2022;9:CD012020.
Yang N, Shi Z, Liu J, et al. Key cardiopulmonary exercise testing indicators for predicting the risk of postoperative cardiopulmonary complications in patients undergoing thoracoscopic lung resection. Front Surg 2025;12:1765398.

Cite this article as: Kim SH, Park SE, Hong CH, Park TS, Shin MJ, Kim KH, Kim SH. Prediction of peak oxygen uptake using interpretable machine learning on routinely available preoperative assessments before lung resection. Transl Lung Cancer Res 2026;15(6):171. doi: 10.21037/tlcr-2026-0351
