Assessment of tumor mutation burden based on whole-exome sequencing of extracellular vesicle-derived DNA from bronchoalveolar lavage fluid in advanced non-small cell lung cancer treated with pembrolizumab: a prospective, multicenter, observational study
Original Article

Heejoung Kim1, Jaeyoung Hur1,2, Wan Seop Kim1,2, In Ae Kim1, Dongil Park3, In-Jae Oh4, Seung-Jae Noh5, Hyoeun Bang6, Kye Young Lee1,7

1Precision Medicine Lung Cancer Center, Konkuk University Medical Center, Seoul, Republic of Korea; 2Department of Pathology, Konkuk University Medical Center, Seoul, Republic of Korea; 3Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, College of Medicine, Chungnam National University, Daejeon, Republic of Korea; 4Department of Internal Medicine, Chonnam National University Medical School and Hwasun Hospital, Jeonnam, Republic of Korea; 5Neogenlogic, Seongnam, Republic of Korea; 6Division of AI Data Science, The University of Suwon, Hwaseong, Republic of Korea; 7Department of Pulmonary Medicine, Konkuk University School of Medicine, Seoul, Republic of Korea

Contributions: (I) Conception and design: KY Lee; (II) Administrative support: H Kim, D Park, IJ Oh, KY Lee; (III) Provision of study materials or patients: H Kim, J Hur, WS Kim, IA Kim, D Park, IJ Oh, SJ Noh; (IV) Collection and assembly of data: H Kim, J Hur, WS Kim, SJ Noh, H Bang; (V) Data analysis and interpretation: H Kim, J Hur, SJ Noh, H Bang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Professor Kye Young Lee, MD, PhD. Precision Medicine Lung Cancer Center, Konkuk University Medical Center, Seoul, Republic of Korea; Department of Pulmonary Medicine, Konkuk University School of Medicine, 120-1 Neungdong-ro, Gwangjin-Gu, Seoul 05030, Republic of Korea. Email: kyleemd@kuh.ac.kr.

Background: Obtaining sufficient tumor tissue for molecular testing remains challenging in advanced non-small cell lung cancer (NSCLC). We evaluated tumor mutation burden (TMB) using extracellular vesicle (EV)-derived DNA from bronchoalveolar lavage fluid (BALF) and assessed the correlation between BALF and tissue TMB.

Methods: In this prospective, multicenter, observational study, we enrolled patients with pathologically confirmed stage IV NSCLC who received pembrolizumab at three academic institutions in South Korea. TMB was quantified by whole-exome sequencing (WES) of EV-derived DNA isolated from BALF and from matched tumor tissue obtained prior to treatment. The concordance of TMB between BALF EV-derived DNA and matched tumor tissue was assessed using Spearman’s rank correlation coefficient. Associations between BALF EV-TMB and clinical outcomes—including overall survival (OS), progression-free survival (PFS), and objective response rate—were evaluated over a median follow-up of 17.0 months (range, 1–63 months).

Results: Of 64 patients, 46 (71.9%) had evaluable tissue TMB, and 53 (82.8%) had evaluable BALF TMB. There were no significant differences between BALF and tissue regarding DNA amount, estimated library size, mean depth, or uniformity. The median tissue TMB and BALF TMB were 4.738 and 1.825 mut/Mb, respectively. A modest positive correlation was observed between tissue TMB and BALF TMB (r=0.61, P<0.001). Neither tissue TMB nor BALF TMB was associated with a significant difference in PFS. However, higher TMB was associated with more favorable OS in patients with advanced NSCLC.

Conclusions: These findings demonstrate the feasibility of WES-based TMB quantification from BALF EV-derived DNA and suggest its potential as a complementary prognostic measure in pembrolizumab-treated NSCLC. Validation in larger, prospective cohorts is warranted before clinical implementation.

Keywords: Tumor mutation burden (TMB); extracellular vesicles (EVs); bronchoalveolar lavage fluid (BALF); pembrolizumab


Submitted Mar 12, 2026. Accepted for publication May 28, 2026. Published online Jun 24, 2026.

doi: 10.21037/tlcr-2026-0294


Highlight box

Key findings

• This study demonstrates the clinical utility of a novel liquid biopsy approach that calculates tumor mutation burden (TMB) using extracellular vesicle (EV)-derived DNA from bronchoalveolar lavage fluid (BALF) in patients with advanced non-small cell lung cancer (NSCLC).

• BALF EV-TMB showed a statistically significant correlation with tissue TMB (r=0.614, P<0.001).

• High BALF EV-TMB was associated with improved overall survival (OS) in patients receiving pembrolizumab, highlighting its potential as a complementary prognostic measure when tissue is insufficient or unavailable.

What is known and what is new?

• TMB is a biomarker for predicting the efficacy of immune checkpoint inhibitors (ICIs) in NSCLC. However, obtaining adequate tissue for whole-exome sequencing (WES) is often challenging, and blood-based liquid biopsy may have limited sensitivity due to low tumor DNA shedding.

• We introduce BALF TMB assessment using EV-derived DNA in BALF, which is highly feasible and correlates positively with tissue TMB (r=0.614). High BALF TMB was found to be modestly associated with improved OS in pembrolizumab-treated NSCLC patients.

What is the implication, and what should change now?

• BALF EV DNA provides a robust and enriched source of tumor-derived genetic material, complementing the quantitative limitations of tissue and blood samples. It serves as a reliable surrogate for genomic profiling and TMB calculation in clinical practice.

• For advanced NSCLC patients where tissue biopsy is unavailable, BALF-based liquid biopsy should be considered as a feasible alternative for molecular stratification. Incorporating BALF EV-TMB into the diagnostic workflow can help clinicians more accurately identify candidates who will benefit most from immunotherapy.


Introduction

Pembrolizumab is a representative immunotherapy drug that works by blocking the programmed cell death protein 1 (PD-1)/programmed death-ligand 1 (PD-L1) pathway to prevent cancer cells from evading immune surveillance (1). Its indications have expanded to various cancer types on the basis of extensive clinical trial results, beginning with Food and Drug Administration (FDA) approval for malignant melanoma (2). Pembrolizumab with or without chemotherapy is the preferred first-line treatment in patients with metastatic non-small cell lung cancer (NSCLC), although the regimen and efficacy may differ slightly depending on the expression level of PD-L1 and histological subtype (3). Immune checkpoint inhibitors (ICIs) have produced durable benefit in a subset of patients, including those with advanced or metastatic disease, although the majority of patients do not experience such benefit. Considering cost-effectiveness as well as the treatment opportunities for individual patients, it is crucial to select the patient groups expected to benefit the most. Currently, guidelines recommend conducting PD-L1 immunohistochemistry (IHC) to select drugs based on the level of expression (3). KEYNOTE-024 demonstrated that pembrolizumab as a first-line treatment is more effective than chemotherapy [overall survival (OS): 30.0 vs. 14.2 months] in metastatic NSCLC patients with a tumor proportion score (TPS) of PD-L1 equal to or greater than 50% (4). However, KEYNOTE-042 and KEYNOTE-047 showed favorable outcomes in patients with PD-L1 scores ranging from 1% to 49% and those with high PD-L1 expression (5). In addition, several clones (22C3, 28-8, SP263, and SP142) are used in PD-L1 IHC; although some of them are interchangeable for selecting specific drugs, others show decreased sensitivity (6). Applying a 1% or 50% cut-off as the criterion for treatment with ICIs is further limited by small specimen size, tumor heterogeneity, and inter-observer discrepancy. Therefore, PD-L1 IHC results alone may not be sufficient as a predictive biomarker for treatment decisions.

Tumor mutation burden (TMB) refers to the total number of mutations per megabase that occur in the tumor genome and is measured by whole-exome sequencing (WES). As TMB increases, the immune system can more effectively detect and attack cancer cells, and patients with high TMB tend to show better responses to immunotherapy (7). TMB has demonstrated clinical utility as a predictive biomarker; in the CheckMate-227 trial, high TMB (≥10 mut/Mb) was significantly associated with improved OS in patients treated with nivolumab plus ipilimumab (8). However, the predictive value of TMB for progression-free survival (PFS) has been inconsistent across trials, and subsequent analyses have raised questions about the optimal TMB threshold and its generalizability across different tumor types and sequencing platforms (8). Furthermore, TMB assessment carries practical limitations: there is no universally standardized sequencing method or threshold definition, and in large-scale trials such as CheckMate-227 and CheckMate-568, TMB calculation failed in 34–42% of subjects owing to insufficient tumor tissue quantity or inadequate quality (9).

To overcome the tissue requirement, blood-based liquid biopsy using cell-free DNA (cfDNA) has been extensively investigated; however, its clinical applicability is constrained by several limitations. In patients with low tumor burden, circulating tumor DNA (ctDNA) concentrations in peripheral blood are disproportionately low, and the resulting low variant allele frequency (VAF) increases the risk of false-negative results and technical noise in sequencing data (10,11). These limitations are particularly relevant for WES, which requires high-quality, sufficiently long DNA fragments for accurate TMB quantification.

Bronchoalveolar lavage (BAL) is a diagnostic procedure used in various respiratory diseases, in which 100–150 mL of saline is instilled into the subsegment of the affected area and then retrieved. BAL is a routinely performed bronchoscopic procedure that retrieves cellular and non-cellular material from the distal airways and alveoli. Bronchoalveolar lavage fluid (BALF) has been established as a reliable source for detecting driver mutations in lung cancer with high sensitivity (12), and targeted next-generation sequencing (NGS) using BALF-derived DNA has been demonstrated to be technically feasible (13). Extracellular vesicles (EVs) have recently gained considerable attention as a liquid biopsy analyte across multiple body fluids, including blood, urine, saliva, exhaled breath condensate, pleural effusion, cerebrospinal fluid (CSF), and BALF, given their capacity to carry tumor-derived DNA in a protected, double-stranded form within a lipid bilayer membrane (12,13). Unlike plasma cfDNA, which is released passively through apoptosis and necrosis and is highly fragmented and rapidly cleared from circulation, EV-derived DNA is shielded from nuclease-mediated degradation by this membrane structure, yielding longer, more intact fragments that are well suited for WES-based TMB quantification (13,14). When isolated from BALF, EVs offer the additional advantage of being collected directly from the pulmonary compartment in contact with the tumor, inherently enriching tumor-derived nucleic acids relative to peripheral blood which is subject to systemic dilution (13).

Despite growing interest in liquid biopsy-based TMB assessment, the concordance between blood-derived and tissue-derived TMB remains suboptimal, and no validated surrogate for tissue TMB has been established in clinical practice. BALF EV-derived DNA, with its unique biological properties and anatomical proximity to lung tumors, represents an unexplored but theoretically superior alternative. Therefore, this study aimed to (I) evaluate the technical feasibility and concordance of WES-based TMB quantification using BALF EV-derived DNA compared with matched tumor tissue, and (II) investigate the prognostic value of BALF EV-TMB for predicting OS, PFS, and objective response in patients with advanced NSCLC treated with pembrolizumab. We present this article in accordance with the STROBE reporting checklist (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0294/rc).


Methods

Study design and patients

This was a prospective, multicenter, observational study conducted at Konkuk University Medical Center (KUMC), Chungnam National University Hospital (CNUH), and Chonnam National University Hwasun Hospital (CNUHH) in the Republic of Korea. Patient enrollment was conducted between January 2019 and December 2022. Pathologically confirmed stage IV NSCLC patients, who were treated with pembrolizumab, were enrolled. Exclusion criteria included: (I) prior immunotherapy, (II) active autoimmune disease requiring systemic treatment, and (III) insufficient BALF volume (<5 mL) for DNA isolation.

Patients received pembrolizumab intravenously at a dose of 200 mg every 3 weeks until disease progression, unacceptable adverse events (AEs), or loss to follow-up. Some patients who received second-line pembrolizumab treatment were administered a dosage of 2 mg/kg, as specified by the Korean Ministry of Food and Drug Safety. Tumor response was assessed every 9 weeks (3 cycles) by computed tomography (CT) per RECIST version 1.1 by the treating physician; independent radiological review was not performed. If there was no evidence of disease progression, pembrolizumab was administered for up to 35 cycles over a period of 2 years. Dose reductions were not permitted; however, pembrolizumab could be interrupted or discontinued due to toxicity.

Patients were followed up for a median duration of 17 months (range, 1–63 months) to assess survival outcomes and disease progression. The primary outcome was the concordance of TMB between BALF EV-derived DNA and matched tumor tissue. Secondary outcomes included OS and PFS stratified by BALF EV-TMB levels, as well as objective response rate (ORR). OS was defined as the time from treatment initiation to death from any cause; PFS was defined as the time from treatment initiation to radiographic progression per RECIST 1.1 or death from any cause, whichever occurred first.

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board (IRB) of each institution, and written informed consent was obtained from all patients.

Sample collection and processing

BALF

BALF was obtained at the time of diagnosis or before treatment with pembrolizumab, and at least 10 aliquots of 1 mL BALF were frozen at −80 ℃. The collected BALF was placed into 50 mL centrifuge tubes and immediately centrifuged at 1,000 ×g at 4 ℃ for 15 minutes to eliminate cells and debris. The obtained supernatants were stored at −80 ℃ for EV isolation. Subsequently, the cell- and debris-free BALF was transferred to an ultracentrifuge tube and centrifuged at 200,000 ×g at 4 ℃ for 1 hour using a Beckman rotor (Beckman Coulter, Brea, CA, USA). EVs were isolated from 5 mL BALF. After carefully removing the supernatant, the resulting pellet was resuspended in 200 µL PBS. The size distribution of the purified EVs was analyzed by nanoparticle tracking analysis (NTA) using a NanoSight instrument (Malvern Instruments, Worcestershire, UK).

To remove free-floating DNA external to the EVs, the purified EVs were placed in a microtube, treated with 10× reaction buffer [containing 200 mM Tris-HCl (pH 8.3) and 20 mM MgCl2] along with DNase I (Sigma, St. Louis, MO, USA), and incubated at room temperature for 15 minutes. To chelate calcium and magnesium ions and thereby inactivate DNase I, a stop solution [50 mM ethylenediaminetetraacetic acid (EDTA)] was added to the samples. The samples were then heated at 70 ℃ for 10 minutes and chilled on ice. EVs were lysed, and EV-derived DNA was extracted using the High-Pure polymerase chain reaction (PCR) Template Preparation Kit (Roche Diagnostics, Mannheim, Germany). Subsequently, the quality and fragment length of the purified DNA were assessed using TapeStation and Genomic DNA ScreenTape (Agilent Technologies, Santa Clara, CA, USA). The concentration and purity of the DNA samples were measured using a NanoDrop spectrophotometer (Thermo Scientific, Waltham, MA, USA).

Tumor tissue

At least 10 slides, each 5 µm thick, were collected from formalin-fixed, paraffin-embedded (FFPE) tissue blocks. IHC staining for PD-L1 (22C3 PharmDx) was performed as a standard procedure, and the results from additional clones (SP263 and SP142) were recorded when available. Tumor tissue for WES was isolated from the slides by laser capture microdissection (LCM).

WES and bioinformatics analysis

Library preparation and sequencing

WES was performed to analyze somatic variants and assess TMB in BALF-EV samples from patients with NSCLC. WES libraries were constructed from BALF EV-derived DNA of 54 tumor-normal matched pairs using the Agilent SureSelect XT Human All Exon v5 (50 Mb) kit according to the manufacturer’s protocol. DNA (25–200 ng) was sheared into fragments with an average length of 150–200 bp by sonication using a Covaris M220 instrument. The resulting fragments were subjected to end repair, A-tailing, adapter ligation, and PCR amplification before target enrichment. DNA was purified using AMPure beads after each enzymatic reaction, and the size and concentration of the pre-capture library were determined using an Agilent 2200 TapeStation and Quant-iT PicoGreen dsDNA Assay. Then, 750 ng of Illumina paired-end pre-capture library DNA was hybridized to Agilent SureSelect human exome capture probes according to the manufacturer’s protocol. After hybridization, the captured library was amplified and indexed using the SureSelect primer and SureSelect indexing PCR reverse primer. The quality of the post-capture library was assessed using an Agilent 2200 TapeStation and Quant-iT PicoGreen dsDNA assay for product size and concentration. Libraries were normalized to a final concentration of 4 nM based on DNA concentration and average fragment size, and pooled in equal volumes. The libraries were denatured with 0.2 N NaOH and diluted to 20 pM with hybridization buffer (Illumina). Libraries were sequenced on an Illumina HiSeq2500 or NovaSeq6000 platform (2×100 bp paired-end reads), with a target mean coverage of 240× for tumor samples and 60× for matched normal samples. Reads passing the Illumina chastity filter were converted to FASTQ format using bcl2fastq (v2.18.0).
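The normalization of libraries to 4 nM from DNA concentration and average fragment size can be illustrated with a short calculation. The sketch below is not taken from the study protocol; it assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA, and the function names, the example concentration, and the 20 µL final volume are hypothetical.

```python
# Illustrative conversion of a dsDNA library concentration to molarity,
# assuming an average mass of 660 g/mol per base pair (approximation).
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Return library molarity in nM from ng/uL and mean fragment length in bp."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilution_volumes(stock_nM: float, target_nM: float = 4.0, final_ul: float = 20.0):
    """Return (library_ul, diluent_ul) needed to reach target_nM in final_ul."""
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    library_ul = final_ul * target_nM / stock_nM
    return library_ul, final_ul - library_ul


# Hypothetical library: 2.0 ng/uL with a mean fragment size of 300 bp.
stock = library_molarity_nM(2.0, 300)
library_ul, diluent_ul = dilution_volumes(stock)
print(f"{stock:.2f} nM stock: mix {library_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```

Because molarity depends on fragment length, two libraries with the same mass concentration but different mean sizes require different dilutions before equal-volume pooling.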

Read alignment and preprocessing

The generated FASTQ files were analyzed using an in-house pipeline designed for WES of matched tumor-normal samples. Normal DNA was extracted from the buffy coat of peripheral blood samples. Raw sequencing data were preprocessed using Cutadapt (v2.8) to remove adapter sequences and trim low-quality reads, with a minimum read length of 30 bp. Trimmed reads were aligned to the human reference genome (hg19) using BWA-MEM (v0.7.17). Aligned sequences were sorted using SAMtools (v1.10) and deduplicated using MarkDuplicates (GATK v4.1.4.0) to remove PCR duplicates. WES reads were then realigned, and base quality scores were recalibrated according to GATK best practices to reduce erroneous variant calls.

Somatic variant calling

To improve the accuracy of somatic mutation identification, SomaticSeq (v3.4.0) was applied by integrating results from multiple variant callers. Eight variant callers were used: Mutect2 (v4.0.5), VarScan2 (v2.3.7), VarDict (v1.7.0), LoFreq (v2.1.3.1), Strelka (v2.9.5), JointSNVMix2 (v0.7.5), SomaticSniper (v1.0.5), and MuSE (v1.0). BALF-EV samples were processed identically to tumor tissue samples. Somatic variant calls meeting the following criteria were retained for further analysis: read depth ≥8 in both normal and tumor/BALF-EV samples; ≥3 alternative reads supporting single nucleotide polymorphisms (SNPs) or ≥2 supporting insertions/deletions (INDELs); calls from ≥3 variant callers for SNPs and ≥2 for INDELs (with SomaticSeq counted as one caller); and a maximum population allele frequency <0.01 across 1000 Genomes, gnomAD, and KRGDB (15). Variants were annotated using the Ensembl Variant Effect Predictor (VEP, v107) and the OncoKB Application Programming Interface (API) system. Variants identified as oncogenic by OncoKB were classified as oncogenic variants. Germline variants (SNPs and INDELs) were identified using the GATK HaplotypeCaller.
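The retention criteria listed above can be expressed as a simple predicate. The following is a minimal sketch under stated assumptions, not the in-house pipeline: the `SomaticCall` record and its field names are hypothetical, and each call is assumed to have already been summarized with its read depths, alternative read count, number of supporting callers, and maximum population allele frequency.

```python
from dataclasses import dataclass

# Thresholds as stated in the filtering criteria in the text.
MIN_DEPTH = 8
MIN_ALT_READS = {"SNP": 3, "INDEL": 2}
MIN_CALLERS = {"SNP": 3, "INDEL": 2}
MAX_POP_AF = 0.01


@dataclass(frozen=True)
class SomaticCall:
    """Hypothetical per-variant summary; field names are illustrative only."""
    kind: str            # "SNP" or "INDEL"
    tumor_depth: int     # read depth in tumor or BALF-EV sample
    normal_depth: int    # read depth in matched normal sample
    alt_reads: int       # reads supporting the alternative allele
    n_callers: int       # number of callers reporting the variant
    max_pop_af: float    # maximum allele frequency across population databases


def passes_filters(call: SomaticCall) -> bool:
    """Return True if the call meets every retention criterion."""
    if call.kind not in MIN_ALT_READS:
        raise ValueError(f"unknown variant kind: {call.kind!r}")
    return (
        call.tumor_depth >= MIN_DEPTH
        and call.normal_depth >= MIN_DEPTH
        and call.alt_reads >= MIN_ALT_READS[call.kind]
        and call.n_callers >= MIN_CALLERS[call.kind]
        and call.max_pop_af < MAX_POP_AF
    )


calls = [
    SomaticCall("SNP", 120, 45, 9, 5, 0.0),
    SomaticCall("SNP", 120, 45, 9, 2, 0.0),     # too few callers for an SNP
    SomaticCall("INDEL", 60, 30, 2, 2, 0.001),
]
kept = [c for c in calls if passes_filters(c)]
print(len(kept), "of", len(calls), "calls retained")
```

Keeping the thresholds in named constants makes the different requirements for SNPs and INDELs explicit and easy to audit.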

Biomarker analyses

TMB

TMB was calculated by dividing the number of filtered somatic missense mutations by an estimated coding sequence (CDS) size of 35.5 Mb.
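As a worked illustration of this definition, the sketch below divides a missense mutation count by the 35.5 Mb CDS size and assigns a high or low group. The function names are hypothetical, the mutation count is invented for the example, and treating values strictly greater than the cut-off as "high" is an assumption based on how the cut-offs are written in the Results (for example, ">0.594").

```python
CDS_SIZE_MB = 35.5  # estimated coding sequence size stated in the text


def tmb_per_mb(n_missense: int, cds_size_mb: float = CDS_SIZE_MB) -> float:
    """Return TMB in mutations per megabase."""
    if n_missense < 0 or cds_size_mb <= 0:
        raise ValueError("mutation count must be >= 0 and CDS size > 0")
    return n_missense / cds_size_mb


def tmb_group(tmb: float, cutoff: float) -> str:
    """Assign 'high' when TMB exceeds the cut-off (assumed strict inequality)."""
    return "high" if tmb > cutoff else "low"


# Hypothetical BALF-EV sample with 71 filtered somatic missense mutations.
value = tmb_per_mb(71)
print(value, tmb_group(value, cutoff=1.5))
```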

Copy number variation (CNV) and tumor purity

CNVkit program (16) was used for copy number determination from tumor-normal paired BAM files. A reference file (.cnn) was generated using the matched normal BAM, and copy numbers were calculated from the tumor BAM using both on- and off-target reads, yielding copy ratio (.cnr) and copy segment (.cns) files. THetA2 (17) was applied to the .cns file and Mutect2 somatic variant (.vcf) output to estimate tumor purity and subclonal fractions. The estimated tumor purity fractions were incorporated into CNVkit (using the call command) to obtain tumor purity-adjusted copy number data.

Mutational signature analysis

Mutational signatures were investigated using the Mutalisk R package (18) and COSMIC mutational signatures v2. Annotated and filtered VCF files were used as input to obtain decomposed fraction scores for 30 mutational signatures per sample, which were compared across the cohort.

Neoantigen prediction

Predicted neoantigens were derived from human leukocyte antigen (HLA) types and somatic missense variants using DeepNeo software (19), which integrates major histocompatibility complex (MHC) binding affinity and T cell receptor (TCR) reactivity to identify immunogenic neoantigens. HLA typing was performed using POLYSOLVER (20), OptiType (21) and HLA-HD (22). For class I alleles (HLA-A and HLA-B), alleles identified by at least two of the three tools were selected. For class II (HLA-DRB1 and HLA-DQB1), HLA-HD results alone were used. Class I neoantigens were predicted from 9-mer neoepitope candidates derived from missense mutations, and class II epitopes were predicted from 15-mer neoepitope candidates. Neoantigen load was quantified as follows: position-based, as the number of mutations generating class I and class II neoantigens; epitope-based, as the number of class I and class II neoepitopes; and gene-based, as the number of genes harboring class I and II neoantigens.
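The class I consensus rule (alleles identified by at least two of the three typing tools) amounts to a simple vote count. The sketch below is illustrative only: the function name is hypothetical, the allele calls are invented, and it assumes that all tools report alleles at the same resolution and in the same nomenclature so that strings can be compared directly.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_alleles(calls_by_tool: Dict[str, Iterable[str]], min_tools: int = 2) -> List[str]:
    """Return alleles reported by at least `min_tools` tools, sorted by name."""
    votes = Counter()
    for alleles in calls_by_tool.values():
        votes.update(set(alleles))  # each tool votes at most once per allele
    return sorted(a for a, n in votes.items() if n >= min_tools)


# Invented HLA-A calls for one hypothetical patient.
hla_a_calls = {
    "POLYSOLVER": ["HLA-A*02:01", "HLA-A*24:02"],
    "OptiType": ["HLA-A*02:01", "HLA-A*11:01"],
    "HLA-HD": ["HLA-A*24:02", "HLA-A*02:01"],
}
print(consensus_alleles(hla_a_calls))
```

An allele supported by only one tool, such as the invented `HLA-A*11:01` call here, is dropped by this rule.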

Statistical analysis

A formal sample size calculation was performed a priori. Assuming a target Spearman’s correlation coefficient of 0.85 with a 95% confidence interval (CI) width of 0.200 and an anticipated dropout rate of 30%, a minimum of 47 evaluable patients was required, necessitating enrollment of 62 patients. The final evaluable cohort of 44 patients for the TMB concordance analysis was close to, but slightly below, this pre-specified minimum. The survival analyses were not included in the original power calculation and should be interpreted as exploratory.

The concordance of TMB between BALF EV-derived DNA and matched tumor tissue was assessed using Spearman’s rank correlation coefficient, given the non-normal distribution of TMB values. The correlation between VAF from BALF and tissue samples was similarly evaluated using Spearman’s rank correlation. Pearson’s correlation coefficient was used to assess the linear association between TMB scores and PD-L1 TPS.
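For readers unfamiliar with the statistic, Spearman’s coefficient is Pearson’s coefficient computed on ranks, with tied values receiving their average rank. The sketch below is a minimal pure-Python illustration with invented paired values; it is not the analysis code used in this study, and it does not compute a P value.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Return Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Return Spearman's rank correlation coefficient."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with >= 3 pairs")
    return pearson(average_ranks(x), average_ranks(y))


# Invented paired TMB values (mut/Mb) for five hypothetical patients.
tissue_tmb = [1.0, 2.0, 3.0, 4.0, 5.0]
balf_tmb = [0.5, 1.4, 0.9, 3.1, 2.2]
print(round(spearman(tissue_tmb, balf_tmb), 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed, non-normal distribution of TMB values, which is the reason given for choosing it here.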

To determine the optimal TMB cut-off values for stratifying patients into high and low TMB groups, maximally selected log-rank statistics were applied using the ‘maxstat’ package in R. This method identified the cut-off value that maximized the log-rank test statistic for survival outcomes. Where the optimal cut-off produced a substantially imbalanced group distribution, the second maximum was selected to minimize potential bias. Survival outcomes, including OS and PFS, were estimated using the Kaplan-Meier method and compared between TMB groups using the log-rank test. Hazard ratios (HRs) and 95% CIs were calculated using univariable Cox proportional hazards regression and visualized in forest plots.
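The cut-off search can be sketched as a scan over candidate thresholds, ranking each by its two-group log-rank statistic so that the maximum and the second maximum can be compared alongside the resulting group sizes. This is a simplified pure-Python illustration and not the ‘maxstat’ implementation, which additionally approximates P values for the maximally selected statistic; the function names, the `min_group` parameter, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def logrank_chi2(times: Sequence[float], events: Sequence[int],
                 in_high: Sequence[bool]) -> float:
    """Return the two-group log-rank chi-square statistic (1 df)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                       # at risk, all
        n1 = sum(1 for ti, g in zip(times, in_high) if ti >= t and g)  # at risk, high
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)    # events, all
        d1 = sum(1 for ti, e, g in zip(times, events, in_high)
                 if ti == t and e and g)                             # events, high
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def scan_cutoffs(tmb: Sequence[float], times: Sequence[float],
                 events: Sequence[int], min_group: int = 2
                 ) -> List[Tuple[float, float, int, int]]:
    """Return (statistic, cutoff, n_high, n_low) tuples, best statistic first."""
    results = []
    for cutoff in sorted(set(tmb))[:-1]:  # drop the maximum: high group would be empty
        in_high = [v > cutoff for v in tmb]
        n_high = sum(in_high)
        n_low = len(in_high) - n_high
        if min(n_high, n_low) >= min_group:
            results.append((logrank_chi2(times, events, in_high), cutoff, n_high, n_low))
    return sorted(results, reverse=True)


# Toy data: six hypothetical patients, all with an observed event.
toy_tmb = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
toy_months = [2, 3, 4, 10, 12, 14]
toy_events = [1, 1, 1, 1, 1, 1]
for stat, cutoff, n_high, n_low in scan_cutoffs(toy_tmb, toy_months, toy_events):
    print(f"cutoff>{cutoff}: chi2={stat:.2f}, high={n_high}, low={n_low}")
```

Reporting group sizes next to each statistic makes it straightforward to see when the top-ranked cut-off yields an imbalanced split and the second maximum is the more defensible choice.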

Continuous variables were expressed as mean ± standard deviation (SD) or median with range, as appropriate. Categorical variables were presented as frequencies and percentages.

Regarding missing data, nine patients in whom tumor tissue NGS was unsuccessful were excluded from the TMB concordance analysis but were retained in the BALF NGS feasibility analysis. No imputation was performed for missing tissue NGS results, as the missingness was attributable to specimen insufficiency rather than systematic bias.

Data preprocessing and statistical analyses were performed using Python 3.10 and R 4.3, and a P value of less than 0.05 was considered statistically significant.


Results

Baseline characteristics

We initially enrolled 64 patients (Figure 1). Ten patients were excluded: seven because of insufficient tissue for NGS, two because of laboratory errors, and one because of inadequate BALF DNA. A further ten patients were excluded from the statistical analysis: five because of mismatches between the locations of tumor tissue and BALF (in four, BALF was collected from the bronchus adjacent to the endobronchial lesion; in one, tissue was obtained from a supraclavicular lymph node), three because of a mean NGS depth below 50× (two tissue samples and one BALF sample), and two because they were foreign nationals. After these exclusions, 44 patients were included in the comparative TMB analysis between BALF and tissue samples. Notably, BALF analysis was successful in all nine patients in whom tissue NGS failed. BALF NGS was attempted in 54 patients and completed successfully in 53 (98.1%), indicating a higher success rate for BALF-based testing.

Figure 1 Patient enrollment and selection flowchart. Of 64 enrolled patients, 10 were excluded before NGS analysis (7 insufficient tissue, 2 laboratory errors, 1 insufficient BALF DNA), leaving 54 patients who underwent BALF NGS. A further 10 were excluded from the TMB concordance analysis due to anatomically discordant sampling sites (n=5), low sequencing depth (n=3), or foreign nationality (n=2), yielding a final analysis cohort of 44 patients. Of the 54 patients in whom BALF NGS was attempted, sequencing was successfully completed in 53 (98.1%). BALF, bronchoalveolar lavage fluid; LN, lymph node; NGS, next-generation sequencing; NSCLC, non-small cell lung cancer; TMB, tumor mutation burden.

The cohort comprised 52 males (81.3%) and 12 females (18.8%), aged 52 to 89 years (mean: 68.5±8.9 years) (Table 1). Smoking status was categorized as never smoker (18.8%), ex-smoker (29.7%), and current smoker (51.6%). Adenocarcinoma was the most prevalent histology (64.1%), followed by squamous cell carcinoma (25.0%) and non-small cell carcinoma (10.9%). The average tumor purity was 44.6% [standard deviation (SD): 20.2%], ranging from 5% to 90%, categorized into 5% to <30% (17%), 30% to <50% (27.8%), and ≥50% (35.2%), with 1.9% of cases not available. PD-L1 expression (22C3) was ≥50% in 90.7% and <50% in 9.3% of patients. Pembrolizumab was administered as first-line treatment in 17.2%, second-line treatment in 60.9%, and third-line or later treatment in 21.9% of patients. While 95.3% of patients received pembrolizumab as a single agent, 4.7% received it in combination with pemetrexed and carboplatin. The average number of pembrolizumab treatment cycles was 12.6 (SD: 10.8; range, 1–35; median: 9), with 48.4% receiving 1–8 cycles and 51.6% receiving 9–35 cycles (more than 6 months). The best response was partial response in 29.7%, stable disease in 59.4%, and progressive disease in 7.8% of patients; 3.1% were not evaluated. At the time of analysis, the median follow-up duration was 17.0 months (range, 1–63 months).

Table 1

Baseline characteristics

Characteristics Value
Age (years) 68.5±8.9 [52–89]
Sex
   Male 52 (81.3)
   Female 12 (18.8)
Smoking status
   Never smoker 12 (18.8)
   Ex-smoker 19 (29.7)
   Current smoker 33 (51.6)
Pathology
   Adenocarcinoma 41 (64.1)
   Squamous cell carcinoma 16 (25.0)
   Non-small cell carcinoma 7 (10.9)
Tumor purity (%) 44.6±20.2 [5–90]
   5 to <30 9 (17.0)
   30 to <50 15 (27.8)
   ≥50 19 (35.2)
   N/A 1 (1.9)
PD-L1 (22C3, %)
   ≥50 58 (90.7)
   <50 5 (9.3)
Order of pembrolizumab
   1st line 11 (17.2)
   2nd line 39 (60.9)
   3rd line or more 14 (21.9)
Regimen of pembrolizumab
   Single 61 (95.3)
   Combination (with pemetrexed/carboplatin) 3 (4.7)
Treatment cycles of pembrolizumab 12.6±10.8; 9 [1–35]
   1–8 31 (48.4)
   9–35 (longer than 6 months) 33 (51.6)
Best response
   Partial response 19 (29.7)
   Stable disease 38 (59.4)
   Progressive disease 5 (7.8)
   N/A 2 (3.1)

Data are presented as mean ± SD [range], n (%), or mean ± SD; median [range]. N/A, not available; PD-L1, programmed death-ligand 1; SD, standard deviation.

NGS analysis and TMB scores of BALF and tissue samples

We compared DNA quality between BALF and tissue samples across several metrics (Figure 2). Although the total DNA amount was lower in BALF than in tissue (tissue: 372.5 ng vs. BALF: 63.5 ng), the difference was not statistically significant (P=0.778). Other parameters, including library size, mean depth, uniformity, tumor purity, and fragment length distribution, showed similar patterns in BALF and tissue samples. These findings indicate that DNA obtained from BALF is of comparable quality to that from tissue, supporting the potential of BALF as a reliable source for genomic analysis of lung cancer.

Figure 2 Comparison of DNA quality between BALF EV-derived DNA and matched tumor tissue (n=54). Box plots show (A) DNA amount, (B) estimated library size, (C) mean sequencing depth, (D) uniformity, and (E) tumor purity. (F) Bean plots depict DNA fragment length distribution (tissue median 156 bp; BALF median 168 bp). No significant differences were observed between groups across all metrics (all P>0.05). BALF, bronchoalveolar lavage fluid; EV, extracellular vesicle.

A comparative analysis of the TMB scores and VAF derived from BALF and tissue samples is presented in Figure 3. The left panel shows the distribution of TMB scores as box plots, with a median TMB of 4.738 mut/Mb in tissue samples, higher than the median of 1.825 mut/Mb in BALF samples. TMB in BALF and tissue samples showed a modest positive correlation, as evidenced by the scatter plot and a correlation coefficient (R) of 0.614 (Figure S1). The right panel displays the correlation between VAF in the tissue and BALF samples; the scatter plot revealed a moderate positive correlation, with a correlation coefficient (R) of 0.463.

Figure 3 Correlation of TMB and VAF between BALF EV-derived DNA and matched tumor tissue (n=44). Box plots and scatter plots show (A) TMB (tissue median 4.738 vs. BALF median 1.825 mut/Mb; R=0.614, P<0.001) and (B) VAF (tissue median 0.245 vs. BALF median 0.157; R=0.463, P<0.001). Shaded areas represent 95% confidence intervals. BALF, bronchoalveolar lavage fluid; EV, extracellular vesicle; TMB, tumor mutation burden; VAF, variant allele frequency.

At the time of patient enrollment, local insurance criteria permitted pembrolizumab treatment only for those with PD-L1 expression levels ≥50%, resulting in over 90% of participants meeting this criterion. Given the predominance of high PD-L1 expression, we were unable to determine its correlation with TMB (Figure S2). Pearson’s correlation coefficient between TMB score and PD-L1 also did not show a significant result. Furthermore, this skewed representation appeared to limit the discriminative power of weighted prognostic predictions using the TPS of PD-L1.

Treatment responses according to TMB scores

The comparative analysis of survival outcomes, measured as PFS and OS, in cohorts with high versus low TMB derived from BALF and tissue samples is shown in Figures 4,5 and Figure S2. To establish criteria for differentiating high and low TMB groups in BALF and tissue, an analysis using a TMB cut-off of 0.5 was conducted (Figures S3,S4) (19,23). This examination revealed that optimal separation was achieved at a threshold of 1.5/Mb for BALF and 4.5/Mb for tissue. To determine the optimal cut-offs, we employed the ‘maxstat’ R package, which uses maximally selected log-rank statistics to identify the best separation based on survival outcomes (Figure S5). With the log-rank cut-off approach, the cut-offs that produced the maximum log-rank statistic were EV TMB >0.594 and tissue TMB >4.639. However, for EV TMB, a cut-off of >0.594 yielded a high group of 40 patients and a low group of 13 patients, which markedly skewed the distribution toward the high group and potentially introduced bias. Selecting the second maximum gave an EV TMB cut-off of 1.471, which is nearly identical to our best P value cut-off of 1.5. Similarly, for tissue TMB, the log-rank method produced a cut-off of 4.639, which was very close to the reported cut-off of 4.5. Consequently, we applied these criteria in the subsequent analyses in this study. While PFS did not differ significantly between the high and low TMB groups (Figure 4), OS analysis indicated a marked improvement in the high TMB group (Figure 5). This suggests that a higher TMB is associated with a more favorable OS outcome in patients with advanced lung cancer, underscoring the prognostic value of TMB in predicting long-term survival. When stratified according to histological type, the results were aligned with the overall cohort findings: PFS did not differ significantly, whereas OS was more favorable in the high TMB group. This pattern suggests that a high TMB may be a consistent indicator of improved OS across various types of lung cancer histology. Due to the limited number of participants with specific subtypes, it was not feasible to conduct separate statistical analyses for these groups. Therefore, these patients were included in broader categories designated as ‘non-adenocarcinoma’ (Figure S5) and ‘non-squamous cell carcinoma’ (Figure S6) for the purposes of our analysis.
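The cut-off search described above can be illustrated with a short sketch. It is not the ‘maxstat’ implementation, which additionally standardizes the statistics and adjusts the P value for multiple testing; it only scans candidate thresholds and ranks them by the two-group log-rank chi-square statistic. The function names, the `min_group` constraint, and the data are hypothetical assumptions introduced for illustration.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, overall and in the high group.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, in_high) if ti >= t and g)
        # Events occurring exactly at time t, overall and in the high group.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, in_high)
                 if ti == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def scan_cutoffs(tmb, times, events, min_group=3):
    """Return (cut-off, chi-square) pairs sorted by descending chi-square.

    'High' means tmb > cut-off. min_group is a hypothetical guard against
    very unbalanced splits; it is not a parameter taken from the study.
    """
    results = []
    for c in sorted(set(tmb))[:-1]:  # the largest value would leave 'high' empty
        in_high = [v > c for v in tmb]
        n_high = sum(in_high)
        if min(n_high, len(tmb) - n_high) < min_group:
            continue
        results.append((c, logrank_chi2(times, events, in_high)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical data for illustration only, NOT study data.
tmb = [0.3, 0.5, 0.8, 1.1, 1.4, 1.6, 1.9, 2.3, 2.8, 3.5]   # mut/Mb
months = [4, 6, 5, 8, 7, 15, 18, 14, 20, 22]               # follow-up time
died = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]                      # 1 = death, 0 = censored

ranked = scan_cutoffs(tmb, months, died)
for cutoff, chi2 in ranked:
    print(f"TMB > {cutoff}: log-rank chi-square = {chi2:.2f}")
```

In such a ranking, `ranked[0]` corresponds to the maximally selected cut-off and `ranked[1]` to the next candidate, which is analogous to choosing the second maximum when the top cut-off produces badly unbalanced groups. Because the threshold is chosen on the same data used to test it, the resulting P values are optimistic unless corrected, which is one reason dedicated tools such as ‘maxstat’ are used in practice.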

Figure 4 PFS according to TMB status derived from BALF EVs and tissue. Kaplan-Meier curves for PFS stratified by (A) BALF EV-TMB (cut-off 1.471 mut/Mb; n=53; HR =0.597, P=0.13) and (B) tissue TMB (cut-off 4.639 mut/Mb; n=48; HR =0.544, P=0.10). No significant difference in PFS was observed between high and low TMB groups in either analysis. BALF, bronchoalveolar lavage fluid; EV, extracellular vesicle; HR, hazard ratio; PFS, progression-free survival; TMB, tumor mutation burden.
Figure 5 OS according to TMB status derived from BALF EVs and tumor tissue. Kaplan-Meier curves for OS stratified by (A) BALF EV-TMB (cut-off 1.471 mut/Mb; n=53; HR =0.292, P=0.008) and (B) tissue TMB (cut-off 4.639 mut/Mb; n=48; HR =0.129, P<0.001). High TMB was significantly associated with improved OS in both analyses. BALF, bronchoalveolar lavage fluid; EV, extracellular vesicle; HR, hazard ratio; OS, overall survival; TMB, tumor mutation burden.

Discussion

Our findings demonstrate the technical feasibility of WES-based TMB quantification using BALF EV-derived DNA and provide preliminary evidence for its prognostic relevance in pembrolizumab-treated advanced NSCLC. We further attempted to determine the utility of BALF TMB as an effective biomarker in identifying NSCLC patients likely to benefit from treatment with pembrolizumab.

The success rate of NGS was higher in BALF than in tissue samples, with successful sequencing in 53 out of 64 (82.8%) BALF samples compared to 46 out of 64 (71.9%) tissue samples. Although the absolute BALF EV-TMB values were lower than tissue TMB (median 1.825 vs. 4.738 mut/Mb), likely reflecting dilution of tumor-derived EVs during the BAL procedure in which 100–150 mL of saline is instilled, the DNA quality of BALF was comparable to that of tissue and sufficient for TMB calculation. Moreover, the correlation of TMB between BALF and tissue samples was modest and statistically significant (R=0.614, P<0.001). Tissue biopsy remains the standard for lung cancer diagnosis; however, it presents certain challenges, as approximately 30% of biopsies do not provide the molecular data needed to determine the best therapeutic strategy. Liquid biopsy offers a promising complementary approach owing to its convenience, ease of performance, monitoring capability, and less invasive nature, despite its lower sensitivity. Liquid biopsy is already actively utilized for specific mutation testing, and analyzing TMB through NGS has shown results comparable to those obtained from tissue samples (24,25). Accordingly, most assays to which blood-based analysis is applicable can also be conducted using BALF. Unlike plasma cfDNA, which largely originates from passive release through apoptosis and necrosis of dying tumor cells, a substantial proportion of EV-derived DNA is thought to be actively secreted by viable tumor cells (26,27). This biological distinction suggests that BALF EV-TMB may more accurately reflect the current mutational burden of viable, immunologically active tumor cells, potentially explaining its association with OS despite lower absolute mutation counts. In lung cancer patients, BALF is anticipated to contain materials originating from or secreted by tumor tissue in a higher proportion than blood, thus holding the potential to increase sensitivity and to be a promising specimen for reliable biomarker detection.
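As a quick arithmetic check of the sequencing success rates quoted at the start of this paragraph, the percentages can be recomputed from the reported counts. The helper name `success_rate` is a hypothetical convenience; the counts are those stated in the text.

```python
def success_rate(successes, total):
    """Percentage of samples that were sequenced successfully."""
    return 100.0 * successes / total


balf_rate = success_rate(53, 64)     # BALF samples, counts as reported
tissue_rate = success_rate(46, 64)   # tissue samples, counts as reported

print(f"BALF: {balf_rate:.1f}%  tissue: {tissue_rate:.1f}%  "
      f"difference: {balf_rate - tissue_rate:.1f} percentage points")
```

The recomputed values agree with the reported 82.8% and 71.9% after rounding to one decimal place.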

NGS assays have been approved for identifying mutations predictive of response to targeted therapies, and the use of targeted-gene NGS panels on ctDNA could enhance clinical practice by identifying actionable genomic alterations, thus allowing tailored treatments to be administered (28,29). It has been postulated that using targeted panel sequencing rather than WES might have yielded better results, potentially because of the higher quality and depth of sequencing achievable with targeted panels (13,30). Targeted panel NGS can also save both time and cost. This approach is particularly relevant in cases where NGS using tissue samples is not feasible due to tissue shortages and DNA quality issues. In these patients, BALF-based NGS has emerged as a feasible alternative. Although pathologists reviewed the tissue slides and conducted LCM on lesions with a high tumor proportion for NGS, the analysis failed in nine cases. We attempted to identify a distinctive mutational signature in smokers using WES, but no specific pattern was observed (Figure S7).

Additionally, patients with higher TMB from BALF EV DNA exhibited improved OS compared to those with lower TMB, underscoring the prognostic value of high TMB levels in advanced NSCLC patients who were treated with pembrolizumab. Most participants received pembrolizumab as second-line treatment, and a significant number were treated with pembrolizumab monotherapy. Despite these clinical characteristics, patients with high TMB exhibited better prognoses. These results align with the findings of previous studies (30,31) and highlight the potential of TMB as a biomarker for stratifying patients in terms of their overall prognosis. As shown in Figure S2, neoantigens were also detectable in BALF, and similar to tissue samples, patients with detectable neoantigens in BALF exhibited a trend toward better OS. However, the role of neoantigen detection in BALF as a standalone biomarker remains unclear.

Our study has several limitations. First, although a formal a priori sample size calculation was conducted and the enrolled cohort met the pre-specified requirement, the overall sample size remained limited (n=44 for TMB concordance analysis). The survival analyses in particular were exploratory in nature and were not powered a priori, and therefore these findings require validation in larger prospective cohorts. Second, the clinical cohort was obtained during the coronavirus disease 2019 (COVID-19) pandemic, which might have influenced the overall study duration and the patient enrollment criteria (32,33). The process of registering patients extended over a year beyond the anticipated timeframe. Concurrently, the National Health Insurance Service in South Korea expanded reimbursement coverage for pembrolizumab. These changes over this period led to more diverse characteristics in the patient population under study. Third, we hypothesized that performing BAL in the adjacent bronchus, when endobronchial lesions made conventional BAL challenging, could still provide a tumor microenvironment similar to that of the primary lesion. However, the substantially lower TMB observed in BALF compared to tissue samples suggests that this location may not adequately reflect the tumor’s characteristics. Furthermore, the BAL procedure lavages a broader anatomical compartment than a single-site tissue biopsy, integrating tumor-derived EVs from across the lung segment. This ‘pooling effect’ may allow BALF EV-TMB to capture a more representative sample of the tumor’s overall mutational landscape, potentially mitigating the impact of spatial intratumoral heterogeneity that limits the interpretability of single-site tissue biopsy results. 
Fourth, the predominance of patients with PD-L1 TPS ≥50% (90.7%) reflects the Korean national health insurance reimbursement criteria at the time of enrollment, which may limit the generalizability of our findings to patients with lower PD-L1 expression.


Conclusions

These findings demonstrate the feasibility of WES-based TMB quantification from BALF EV-derived DNA and suggest its potential as a complementary prognostic measure in pembrolizumab-treated NSCLC. Validation in larger, prospective cohorts is warranted before clinical implementation.


Acknowledgments

A preliminary version of this abstract was presented as a poster at the American Association for Cancer Research (AACR) International Conference in Orlando, USA, Apr 14–19, 2023 (Cancer Res, 2023; Abstract 1027).


Footnote

Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0294/rc

Data Sharing Statement: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0294/dss

Peer Review File: Available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0294/prf

Funding: This research was supported in part by a research grant from the Investigator-Initiated Studies Program of Merck Sharp & Dohme Corp. (grant No. MK3475-936). The opinions expressed in this paper are those of the authors and do not necessarily represent those of Merck Sharp & Dohme Corp.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tlcr.amegroups.com/article/view/10.21037/tlcr-2026-0294/coif). All authors report funding from a Merck Investigator Studies Program Grant from Merck Sharp & Dohme LLC (Investigator Initiated Studies Oncology Translational Studies Program; No. MK3475-936). S.J.N. is an employee of Neogenlogic. The authors have no other conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board (IRB) of each institution, and written informed consent was obtained from all patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Khoja L, Butler MO, Kang SP, et al. Pembrolizumab. J Immunother Cancer 2015;3:36.
  2. Vaddepally RK, Kharel P, Pandey R, et al. Review of Indications of FDA-Approved Immune Checkpoint Inhibitors per NCCN Guidelines with the Level of Evidence. Cancers (Basel) 2020;12:738.
  3. National Comprehensive Cancer Network. NCCN Clinical Practice Guidelines in Oncology: Non-Small Cell Lung Cancer. Version 4. 2026. Available online: https://www.nccn.org/professionals/physician_gls/pdf/nscl.pdf
  4. Reck M, Rodríguez-Abreu D, Robinson AG, et al. Pembrolizumab versus Chemotherapy for PD-L1-Positive Non-Small-Cell Lung Cancer. N Engl J Med 2016;375:1823-33.
  5. Mok TSK, Wu YL, Kudaba I, et al. Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial. Lancet 2019;393:1819-30.
  6. Torlakovic E, Lim HJ, Adam J, et al. "Interchangeability" of PD-L1 immunohistochemistry assays: a meta-analysis of diagnostic accuracy. Mod Pathol 2020;33:4-17.
  7. Sholl LM, Hirsch FR, Hwang D, et al. The Promises and Challenges of Tumor Mutation Burden as an Immunotherapy Biomarker: A Perspective from the International Association for the Study of Lung Cancer Pathology Committee. J Thorac Oncol 2020;15:1409-24.
  8. Hellmann MD, Paz-Ares L, Bernabe Caro R, et al. Nivolumab plus Ipilimumab in Advanced Non-Small-Cell Lung Cancer. N Engl J Med 2019;381:2020-31.
  9. Addeo A, Banna GL, Weiss GJ. Tumor Mutation Burden-From Hopes to Doubts. JAMA Oncol 2019;5:934-5.
  10. Fenizia F, Pasquale R, Roma C, et al. Measuring tumor mutation burden in non-small cell lung cancer: tissue versus liquid biopsy. Transl Lung Cancer Res 2018;7:668-77.
  11. Wan JCM, Massie C, Garcia-Corbacho J, et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat Rev Cancer 2017;17:223-38.
  12. Hur JY, Kim HJ, Lee JS, et al. Extracellular vesicle-derived DNA for performing EGFR genotyping of NSCLC patients. Mol Cancer 2018;17:15.
  13. Lee SE, Park HY, Hur JY, et al. Genomic profiling of extracellular vesicle-derived DNA from bronchoalveolar lavage fluid of patients with lung adenocarcinoma. Transl Lung Cancer Res 2021;10:104-16.
  14. Tsering T, Li M, Chen Y, et al. EV-ADD, a database for EV-associated DNA in human liquid biopsy samples. J Extracell Vesicles 2022;11:e12270.
  15. Jung KS, Hong KW, Jo HY, et al. KRGDB: the large-scale variant database of 1722 Koreans based on whole genome sequencing. Database (Oxford) 2020;2020:baz146.
  16. Talevich E, Shain AH, Botton T, et al. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol 2016;12:e1004873.
  17. Oesper L, Satas G, Raphael BJ. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics 2014;30:3532-40.
  18. Lee J, Lee AJ, Lee JK, et al. Mutalisk: a web-based somatic MUTation AnaLyIS toolKit for genomic, transcriptional and epigenomic signatures. Nucleic Acids Res 2018;46:W102-8.
  19. Kim JY, Cha H, Kim K, et al. MHC II immunogenicity shapes the neoepitope landscape in human tumors. Nat Genet 2023;55:221-31.
  20. Shukla SA, Rooney MS, Rajasagi M, et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol 2015;33:1152-8.
  21. Szolek A, Schubert B, Mohr C, et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 2014;30:3310-6.
  22. Kawaguchi S, Higasa K, Shimizu M, et al. HLA-HD: An accurate HLA typing algorithm for next-generation sequencing data. Hum Mutat 2017;38:788-97.
  23. Kim K, Kim HS, Kim JY, et al. Predicting clinical benefit of immunotherapy by antigenic or functional mutations affecting tumour immunogenicity. Nat Commun 2020;11:951.
  24. Marinello A, Tagliamento M, Pagliaro A, et al. Circulating tumor DNA to guide diagnosis and treatment of localized and locally advanced non-small cell lung cancer. Cancer Treat Rev 2024;129:102791.
  25. Horndalsveen H, Haakensen VD, Madebo T, et al. Blood-based tumor mutational burden as a biomarker in unresectable non-small cell lung cancer treated with chemoradiotherapy and durvalumab. Front Oncol 2025;15:1681420.
  26. Hu Z, Chen H, Long Y, et al. The main sources of circulating cell-free DNA: Apoptosis, necrosis and active secretion. Crit Rev Oncol Hematol 2021;157:103166.
  27. Tsering T, Nadeau A, Wu T, et al. Extracellular vesicle-associated DNA: ten years since its discovery in human blood. Cell Death Dis 2024;15:668.
  28. Téllez Castillo N, Goyeneche-García AM, Montoya Quesada LM, et al. Diagnostic accuracy of next-generation sequencing (NGS) for identifying actionable mutations in advanced non-small cell lung cancer: Systematic Review and Meta-Analysis. Clin Transl Oncol 2026;28:1005-15.
  29. Raez LE, Brice K, Dumais K, et al. Liquid Biopsy Versus Tissue Biopsy to Determine Front Line Therapy in Metastatic Non-Small Cell Lung Cancer (NSCLC). Clin Lung Cancer 2023;24:120-9.
  30. Nair VS, Hui AB, Chabon JJ, et al. Genomic Profiling of Bronchoalveolar Lavage Fluid in Lung Cancer. Cancer Res 2022;82:2838-47.
  31. Aggarwal C, Ben-Shachar R, Gao Y, et al. Assessment of Tumor Mutational Burden and Outcomes in Patients With Diverse Advanced Cancers Treated With Immunotherapy. JAMA Netw Open 2023;6:e2311181.
  32. Park JJH, Mogg R, Smith GE, et al. How COVID-19 has fundamentally changed clinical research in global health. Lancet Glob Health 2021;9:e711-20.
  33. McDonald K, Seltzer E, Lu M, et al. Quantifying the impact of the COVID-19 pandemic on clinical trial screening rates over time in 37 countries. Trials 2023;24:254.
Cite this article as: Kim H, Hur J, Kim WS, Kim IA, Park D, Oh IJ, Noh SJ, Bang H, Lee KY. Assessment of tumor mutation burden based on whole-exome sequencing of extracellular vesicle-derived DNA from bronchoalveolar lavage fluid in advanced non-small cell lung cancer treated with pembrolizumab: a prospective, multicenter, observational study. Transl Lung Cancer Res 2026;15(6):176. doi: 10.21037/tlcr-2026-0294
