The path to a better biomarker: application of a risk management framework for the implementation of PD-L1 and TILs as immuno-oncology biomarkers in breast cancer clinical trials and daily practice
Abstract
Immune checkpoint inhibitor therapies targeting PD-1/PD-L1 are now the standard of care in oncology across several hematologic and solid tumor types, including triple negative breast cancer (TNBC). Patients with metastatic or locally advanced TNBC with PD-L1 expression on immune cells occupying ≥1% of tumor area demonstrated survival benefit with the addition of atezolizumab to nab-paclitaxel. However, concerns regarding variability between immunohistochemical PD-L1 assay performance and inter-reader reproducibility have been raised. High tumor-infiltrating lymphocytes (TILs) have also been associated with response to PD-1/PD-L1 inhibitors in patients with breast cancer (BC). TILs can be easily assessed on hematoxylin and eosin–stained slides and have shown reliable inter-reader reproducibility. As an established prognostic factor in early stage TNBC, TILs are soon anticipated to be reported in daily practice in many pathology laboratories worldwide. Because TILs and PD-L1 are parts of an immunological spectrum in BC, we propose the systematic implementation of combined PD-L1 and TIL analyses as a more comprehensive immuno-oncological biomarker for patient selection for PD-1/PD-L1 inhibition-based therapy in patients with BC. Although practical and regulatory considerations differ by jurisdiction, the pathology community has the responsibility to patients to implement assays that lead to optimal patient selection. We propose herewith a risk-management framework that may help mitigate the risks of suboptimal patient selection for immuno-therapeutic approaches in clinical trials and daily practice based on combined TILs/PD-L1 assessment in BC. © 2020 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Introduction
Immune checkpoint inhibitor (ICI) therapies targeting programmed cell death 1 (PD-1) and programmed death ligand 1 (PD-L1) are now the standard of care in oncology. Anti-PD-1 pembrolizumab (Keytruda, Merck & Co. Inc., Kenilworth, NJ, USA) and nivolumab (Opdivo, Bristol-Myers Squibb Company, New York, NY, USA), and anti-PD-L1 atezolizumab (Tecentriq, Genentech Inc, South San Francisco, CA, USA), durvalumab (Imfinzi, AstraZeneca plc, Cambridge, UK), and avelumab (Bavencio, Merck KGaA, Darmstadt, Germany) have been approved to treat multiple tumor types in many countries. To date, atezolizumab specifically has been approved for triple-negative breast cancer (TNBC). At the same time, immunohistochemistry (IHC)–based detection of PD-L1 expression has been proposed as the predictive biomarker to select patients who may benefit from these therapies. Five primary antibody clones have been developed in the form of assays paired with a specific staining platform. PD-L1 22C3 (Agilent Technologies Inc., Santa Clara, CA, USA), 28-8 (Agilent Technologies Inc.), SP142 (Roche Tissue Diagnostics, Tucson, AZ, USA), SP263 (Roche Tissue Diagnostics), and 73-10 (Agilent Technologies Inc.) have been used in clinical trials of the above-mentioned drugs, respectively. In addition, laboratory-developed tests (LDTs) using any of the above-mentioned primary antibodies or the E1L3N clone with different staining platforms are in use in research and clinical scenarios. Parallel to the multiple assays, multiple scoring systems exist. Table 1 shows technical details and defines the scoring methods used for each antibody. Furthermore, different cut-offs are used to define PD-L1 positivity for different tumor types, whereas for certain indications PD-L1 testing is not required for PD-1/PD-L1 inhibition–based therapy, hereafter referred to as ICI.
Commercial diagnostic assays used in clinical trials | Biosimilar diagnostic antibodies used in clinical practice | |||||
---|---|---|---|---|---|---|
Assay | SP142 | 22C3 | SP263 | 73-10 | 28-8 | E1L3N (Cell Signaling Technology), CAL10 (Zytomed), QR1 (Quartett), ZR3 (Cell Marque) |
Binding epitope | C-terminus cytoplasmic domain | Discontinuous segments on the extracellular domain | C-terminus cytoplasmic domain | C-terminus cytoplasmic domain | Discontinuous segments on the extracellular domain | E1L3N: C-terminus cytoplasmic domain |
Platform | Ventana BenchMark ULTRA | Agilent Link 48 | Ventana BenchMark ULTRA | Agilent Link 48 | Agilent Link 48 | Any |
Scored cell type | IC | IC and TC | IC or TC | IC or TC | TC | Depending on score |
Scoring system | ICA: PD-L1+ IC / tumor area | CPS: (PD-L1+ IC + PD-L1+ TC) / TC; TPS: PD-L1+ TC / TC | TC%: PD-L1+ TC / TC; ICIC%: PD-L1+ IC / IC | TC%: PD-L1+ TC / TC; ICIC%: PD-L1+ IC / IC | TC%: PD-L1+ TC / TC | Depending |
Partner drug | Atezolizumab | Pembrolizumab | Durvalumab | Avelumab | Nivolumab | Any |
Breast cancer clinical trials | IMpassion130; NCT01375842; NCT01633970; KATE-2 | KEYNOTE-119; KEYNOTE-150; KEYNOTE-086; KEYNOTE-012; PANACEA; KEYNOTE-173; KEYNOTE-522; TONIC (Nivolumab) | GeparNuevo | JAVELIN | None | |
- CPS, combined positive score; IC, immune cells; ICA, proportion of tumor area occupied by PD-L1-positive immune cells; TC, tumor cells; TPS, tumor proportion score.
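The scoring systems in Table 1 are simple ratios, and a short sketch can make the differences in numerator and denominator explicit. The function and parameter names below are illustrative and not part of any assay's documentation. Two conventions are assumptions added here rather than taken from the table: scores are multiplied by 100, and CPS is capped at 100.

```python
def ica(pdl1_pos_ic_area: float, tumor_area: float) -> float:
    """ICA: PD-L1+ immune-cell area as a percentage of tumor area."""
    return 100.0 * pdl1_pos_ic_area / tumor_area


def tps(pdl1_pos_tc: int, viable_tc: int) -> float:
    """TPS (TC%): PD-L1+ tumor cells as a percentage of all viable tumor cells."""
    return 100.0 * pdl1_pos_tc / viable_tc


def cps(pdl1_pos_tc: int, pdl1_pos_ic: int, viable_tc: int) -> float:
    """CPS: PD-L1+ tumor cells plus PD-L1+ immune cells, relative to viable tumor cells.

    The x100 scaling and the cap at 100 are assumptions of this sketch.
    """
    return min(100.0, 100.0 * (pdl1_pos_tc + pdl1_pos_ic) / viable_tc)


# Hypothetical counts for one slide: the same case can fall on different
# sides of a cut-off depending on which score is used.
print(tps(10, 200))      # 5.0
print(cps(10, 15, 200))  # 12.5
```

Because CPS adds immune cells to the numerator but keeps tumor cells as the denominator, it is always at least as large as TPS for the same slide.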
For several years the oncology and pathology communities have raised concerns about the reliability of IHC-based detection of PD-L1 to appropriately select patients for ICI. Although PD-L1 is currently the only approved biomarker for these agents, it remains controversial given the complexities of its clinical use: variability in assay performance of the PD-L1 IHC antibodies, spatial and temporal heterogeneity, absence of a unified scoring system, and concerns about inter-reader reproducibility for scoring PD-L1 on immune cells (ICs). Due to these inconsistencies, some patients who could benefit might not receive treatment, whereas others may be treated based on erroneous test results, exposing them to potential adverse side effects with no drug benefit. In addition, because PD-1/PD-L1 interaction is only one of many factors that may determine the clinical response to immunotherapeutics, it is unlikely that a single biomarker will sufficiently predict clinical outcomes in response to ICI. The use of composite biomarkers can provide biologically relevant information on multiple factors that determine response. In a meta-analysis, combined biomarker approaches, such as PD-L1 IHC with tumor mutational burden (TMB), and multiplex fluorescent IHC evaluating protein co-expression and spatial relationships, demonstrated improved performance over PD-L1 or TMB alone 1. As guardians of patients' samples, pathologists, partnered with clinicians, industry, and regulators, must guide evidence-based inclusion of biomarkers in clinical trials and daily practice to ensure the best patient outcomes possible. Stromal tumor-infiltrating lymphocytes (TILs) have also been studied as a predictive biomarker of response to ICI for a variety of cancers including breast cancer (BC). TILs can be assessed on a simple hematoxylin and eosin (H&E) slide with reliable reproducibility among pathologists when they adhere to the standardized method 2, 3.
We propose PD-L1 and TILs as a more comprehensive composite biomarker.
A good biomarker should be analytically valid, robust, reproducible, and clinically useful. To be incorporated into daily practice, it must also be affordable and accessible to pathologists in both academic and community-hospital practices worldwide 4. In this review, we propose a systematic implementation of combined PD-L1 and TIL analysis as a comprehensive immuno-oncological biomarker for patient selection for ICI in both clinical trials and daily practice. In support of this position, we outline the evolution of PD-L1 and TILs as biomarkers, from the analytical and clinical validation phases through clinical implementation, review the challenges we have encountered, and propose mitigation approaches within a risk-management framework as previously published 5. Taken together, the available evidence suggests that systematic implementation of combined PD-L1 and TIL analysis would enhance patient selection and safety.
Technical validation phase: analytical validity of PD-L1 IHC
Biomarker development starts with an initial discovery in pre-clinical studies, which we do not cover in this review, followed by a validation phase in which the biomarker is adapted to clinically applicable assay platforms and subjected to analytical and clinical validation 6. For PD-L1 IHC, analytical validity refers to the accuracy and consistency of the technique to detect the presence of PD-L1 protein. To be able to analyze the accuracy and consistency of the test we must first define the presence of PD-L1 protein. PD-L1 can be expressed on solid and hematologic tumor cells (TCs) and on ICs, including macrophages, dendritic cells, lymphocytes, and granulocytes 7, 8. PD-L1 is expressed in the cytoplasm and/or on the cell membrane. A PD-L1-positive (PD-L1+) TC has been defined as showing partial or complete membranous staining of any intensity 8-13. Accompanying cytoplasmic staining is often observed but ignored in TC. On the other hand, a PD-L1+ IC is one that shows membranous or cytoplasmic staining of any intensity. Cytoplasmic staining may show a punctate or granular pattern, most commonly observed with SP142 11, 12, 14. IC can be observed in aggregates or as single cells dispersed in the intratumoral or peritumoral stroma as well as admixed with TC 8, 14.
Chromogenic IHC-based detection of PD-L1 has been largely concordant with other methods to detect PD-L1 expression, such as immunofluorescence, mass spectrometry, and RNA in situ hybridization 9, 15-18. Each PD-L1 diagnostic kit has shown precision, reproducibility, and robustness when standard operating procedures and optimization of conditions are followed 8, 14, 19-21. Studies comparing PD-L1 assay performance on archival, routine clinical practice, and clinical trial TNBC samples have shown discrepancies among the SP142, SP263, and 22C3 assays. With PD-L1 positivity defined as a proportion of tumor area occupied by PD-L1-positive immune cells (ICA) of ≥1%, SP142 identified 20–38, 10–35, and 7–19 percentage points fewer PD-L1+ cases than SP263 ICA ≥1%, 22C3 combined positive score (CPS) ≥1, and 22C3 ICA ≥1%, respectively 22-26. Prevalence with each assay is shown in Table 2. Similar findings were observed in previous multi-institutional studies on archival clinical non–small cell lung cancer (NSCLC) and urothelial carcinoma specimens, in which results between 22C3, 28-8, SP263, 73-10, and E1L3N assays were broadly comparable, whereas SP142 has shown lower PD-L1 expression on both TC and IC 9, 10, 12, 13, 16, 38-43.
Study | Number and site of samples | SP142 | SP263 | 22C3 | Others |
---|---|---|---|---|---|
Scott et al 22 | 196 TNBC | ICA ≥1%: 32%; CPS ≥1: 35%; TC% ≥1%: 11% | ICA ≥1%: 54%; CPS ≥1: 64%; TC% ≥1%: 53% | ICA ≥1%: 51%; CPS ≥1: 60%; TC% ≥1%: 50% | 28-8: ICA ≥1%: 46%; CPS ≥1: 52%; TC% ≥1%: 35% |
Noske et al 23 | 30 primary TNBC samples | ICA ≥1%: 50% | ICA ≥1%: 87% | ICA ≥1%: 57%; CPS ≥1: 60% | 28-8: ICA ≥1%: 63% |
Noske et al 23 | 104 primary TNBC samples | ICA ≥1%: 44% | ICA ≥1%: 82% | | |
Reisenbichler et al 25 | 68–76 primary TNBC samples | ICA ≥1%: 58% (n = 68) | ICA ≥1%: 78% (n = 76) | | |
IMpassion130 NCT02425891 24 | 614 primary and metastatic TNBC samples | ICA ≥1%: 46% | IC ≥1%: 75% | CPS ≥1: 81% | |
IMpassion130 NCT02425891 | 902 primary and metastatic TNBC samples | All: ICA ≥1%: 41%; primary: ICA ≥1%: 44%; metastatic: ICA ≥1%: 36%; all: TC% ≥1%: 9% (n = 900) | | | |
FDA SSED 14 | 2744 primary and 50 metastatic TNBC samples | All: ICA ≥1%: 50%; primary: ICA ≥1%: 50%; metastatic: ICA ≥1%: 78% | | | |
Carter et al 29 | 500 chemotherapy-naïve TNBC | ICA ≥1%: 46%; TC% ≥1%: 9% | | | |
Downes et al 26 | 30 BC | ICA ≥1%: 47–50% | | CPS ≥1: 53–63% | E1L3N: ICA ≥1%: 53–63%; CPS ≥1: 53–67% |
NCT01633970 30 | 24 TNBC | ICA ≥1%: 50%; TC ≥1%: 17% (of which 92% were ICA ≥1%) | | | |
NCT01375842 7 | 112 TNBC | ICA ≥1%: 78%* | | | |
KEYNOTE-119 NCT02555657 31 | 622 TNBC | | | CPS ≥1: 65%; CPS ≥10: 31%; CPS ≥20: 18% | |
KEYNOTE-012 NCT01848834 32 | 111 TNBC | | | CPS ≥1: 59% | |
KEYNOTE-086 NCT02447003 33 | 170 primary and metastatic TNBC samples | | | CPS ≥1: 62% | |
KEYNOTE-150 NCT02513472 34 | 107 TNBC | | | CPS ≥1: 46% | |
JAVELIN NCT01772004 35 | 136 BC, 48 TNBC | | | | 73-10: all: IC ≥10%: 9%; TNBC: IC ≥10%: 19% |
TONIC NCT02499367 36 | 70 metastatic TNBC samples | | | IC ≥1%: 86%; IC ≥5%: 67% | |
GeparNuevo NCT02685059 37 | 158 TNBC | | ICIC% and/or TC% ≥1%: 87% | | |
- CPS, combined positive score; FDA SSED, U.S. Food and Drug Administration summary of safety and effectiveness data; IC, immune cells; met, metastatic or non-primary sample; n, number of patients included in the analysis; prim, primary sample; TC, tumour cells.
- * The first 25 patients were selected only if PD-L1+, then enrolment was extended to all patients, explaining the higher PD-L1 prevalence.
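The gaps between SP142 and the other assays quoted in the text can be recomputed from Table 2. The sketch below transcribes the prevalence values of the head-to-head studies that report single values (the ranges reported by Downes et al are left out) and takes the difference, in percentage points, between each comparator and SP142 ICA ≥1%. The dictionary layout and function name are illustrative only.

```python
# PD-L1+ prevalence (%) transcribed from Table 2; SP142 is always ICA >= 1%.
prevalence = {
    "Scott et al": {"SP142 ICA": 32, "SP263 ICA": 54, "22C3 CPS": 60, "22C3 ICA": 51},
    "Noske et al (n=30)": {"SP142 ICA": 50, "SP263 ICA": 87, "22C3 CPS": 60, "22C3 ICA": 57},
    "Noske et al (n=104)": {"SP142 ICA": 44, "SP263 ICA": 82},
    "Reisenbichler et al": {"SP142 ICA": 58, "SP263 ICA": 78},
    "IMpassion130 (n=614)": {"SP142 ICA": 46, "SP263 ICA": 75, "22C3 CPS": 81},
}


def gap_range(comparator: str) -> tuple[int, int]:
    """Smallest and largest percentage-point gap between a comparator and SP142."""
    gaps = [
        row[comparator] - row["SP142 ICA"]
        for row in prevalence.values()
        if comparator in row
    ]
    return min(gaps), max(gaps)


for comparator in ("SP263 ICA", "22C3 CPS", "22C3 ICA"):
    print(comparator, gap_range(comparator))
# SP263 ICA (20, 38)
# 22C3 CPS (10, 35)
# 22C3 ICA (7, 19)
```

These are absolute differences in prevalence, not relative reductions; note also that Reisenbichler et al evaluated slightly different numbers of samples for the two assays.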
To investigate this discordance, a study mapped the antibody-binding sites for each antibody 44. SP142, SP263, and E1L3N bind amino acid residues in the cytoplasmic tail of PD-L1 14, 44, 45, whereas 22C3 and 28-8 target the extracellular domain 44, 46. 22C3 and 28-8 binding sites contain N-linked glycosylation sites, which may lead to variability in antigen retrieval. N-glycosylation may also affect binding efficacy of antibodies with cytoplasmic binding; differences between mass spectrometry and E1L3N IHC were reported on melanoma samples with high glycan modifications, suggesting that posttranslational modifications could interfere with recognition of binding sites 17. SP142 and SP263 bind to the same epitope 44; hence the above-described discordance between these assays may be due to differences in assay protocol leading to insufficient antibody saturation. The visualization and amplification methods have been shown to affect the extent and pattern of expression of PD-L1 on IC and TC 47, at least partly explaining the discordance among assays.
Inter-observer reproducibility represents a major challenge to the reliable assessment of any IHC assay; this is especially true for PD-L1. Although inter-pathologist reproducibility for the assessment of PD-L1 on TC is high, concordance has been lower for IC evaluation across multiple tumor types 10, 13, 39, irrespective of the assay. Scoring IC is more difficult from a methodological standpoint. Identification of IC may be straightforward in some cases, but complex in others, especially when attempting to differentiate between TC and intra-tumoral monocytic (macrophages/dendritic) cells, which cannot be easily distinguished on H&E. In addition, the four kits reportedly show different IC staining patterns: 22C3, 28-8, and SP263 assays mainly stain macrophages and dendritic cells, whereas the SP142 assay, while staining a lower number of ICs, also identifies some lymphocyte-like cells 47. Using SP142, the majority of non-neoplastic cells were CD68+, whereas 5% were CD8+ 48. Two multi-institutional studies, including up to 19 pathologists, show moderate agreement (intraclass correlation coefficient [ICC] 0.560–0.805) between pathologists for the SP142 assay on TNBC samples 23, 25. In one of these studies, pathologists were trained on the evaluation of PD-L1 IHC and were required to pass a proficiency test 23. Agreement for other assays was slightly lower. Table 3 shows details of studies evaluating inter-observer reproducibility on BC samples. Of interest, SP142 has been shown to have the highest concordance among readers for PD-L1 IC ≥1% in studies including other tumor types 10-12, although the differences are not statistically significant. This may be because SP142 stains TC with lower prevalence, allowing the IC staining to be more easily identified.
Study | Assay and scoring | Participating pathologists | Samples evaluated | Training | Concordance |
---|---|---|---|---|---|
Reisenbichler et al 25 | SP142 CDA ICA ≥1% | 19 | 68 primary TNBC | No specific training for the study. | ICC 0.560, OPA 41% |
Reisenbichler et al 25 | SP263 CDA ICA ≥1% | | | | ICC 0.513 |
Noske et al 23 | SP142 CDA ICA ≥1% | 7 | 30 primary TNBC | Trained on digital platform for the evaluation of PD-L1 IC with SP142 and had to pass a proficiency exam. | ICC 0.805 |
Noske et al 23 | SP263 CDA ICA ≥1% | | | | ICC 0.616 |
Noske et al 23 | 22C3 CDA ICA ≥1% | | | | ICC 0.605 |
Noske et al 23 | 28-8 CDA ICA ≥1% | | | | ICC 0.460 |
FDA SSED 14 | SP142 CDA ICA ≥1% | 3 | 60 TNBC | Not specified. | OPA 91.1% |
Dennis et al 49 | SP142 CDA ICA ≥1% | 903 | 28 TNBC | Regional trainer-led sessions and digital platform training conducted by the Roche International Pathologist Training program. A proficiency test was evaluated. | OPA 98% |
Downes et al 26 | SP142 CDA ICA ≥1% | 3 | 30 BC | Not specified. | ICC 0.956, OPA 98% |
Downes et al 26 | 22C3 CDA CPS ≥1 | | | | ICC 0.862, OPA 93% |
Downes et al 26 | E1L3N LDT IC ≥1% | | | | ICC 0.862, OPA 93% |
Downes et al 26 | E1L3N LDT CPS ≥1 | | | | ICC 0.815, OPA 91% |
Solinas et al 50 | E1L3N LDT IC ≥1% | 2 | 441 BC | Not specified. | ICC 0.10–0.58 for primary treatment-naïve tumours, ICC 0.94 [0.84–0.97] for NAC-treated, ICC 0.00 [−0.54–0.35] for relapses |
- Overall percentage agreement (OPA) is calculated as the total number of times in which the readers agree, divided by the total number of readings. The OPA is expected to vary by classification difficulty and by the number of observers but does not take chance into account. Kappa does and should therefore be calculated as an associated measurement. Agreement measurements focus on the reliability of evaluations between different readers and do not require a standard reference, and thus should not be confused with studies of accuracy. When using these measures of agreement, the FDA recommends clearly stating the calculations being performed. These calculations were not available for all the studies in Table 3, precluding fair comparison among studies.
- CDA, commercial diagnostic assay; FDA SSED, U.S. Food and Drug Administration summary of safety and effectiveness data; ICC, intraclass correlation coefficient; LDT, laboratory developed test; OPA, overall percent agreement.
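The footnote to Table 3 points out that OPA ignores chance agreement and that kappa should accompany it. For more than two readers, one common chance-corrected statistic is Fleiss' kappa. The sketch below is a minimal, dependency-free implementation under the assumption that every sample is scored by the same number of readers; it is offered as an illustration and is not the calculation used in any of the cited studies.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical calls.

    ratings: one list per sample, holding one call per reader
             (for example "pos"/"neg" for PD-L1 ICA >= 1%).
    """
    n_samples = len(ratings)
    n_readers = len(ratings[0])
    if n_readers < 2 or any(len(r) != n_readers for r in ratings):
        raise ValueError("every sample needs the same number (>= 2) of readers")

    counts = [Counter(sample) for sample in ratings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing reader pairs, averaged over samples.
    p_obs = sum(
        (sum(c * c for c in cnt.values()) - n_readers) / (n_readers * (n_readers - 1))
        for cnt in counts
    ) / n_samples

    # Agreement expected by chance from the overall category frequencies.
    p_exp = sum(
        (sum(cnt[cat] for cnt in counts) / (n_samples * n_readers)) ** 2
        for cat in categories
    )
    if p_exp == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical calls from three readers on four samples.
calls = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
]
print(round(fleiss_kappa(calls), 3))  # 0.657
```

In this toy example the readers agree unanimously on three of four samples (OPA 75%), yet kappa is about 0.66 because part of that agreement is expected by chance.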
Overall percent agreement (OPA) is the proportion of samples that are classified the same by all observers. The U.S. Food and Drug Administration (FDA) summary of safety and effectiveness data for SP142 showed an OPA of 91.1%; however, this study included only three pathologists 14. In contrast, the study including 19 pathologists found an OPA of 41% with SP142. Recently, Reisenbichler et al 25 described a new method for analyzing OPA as a function of the number of observers. The resulting graph reaches a plateau at the number of observers required to provide a realistic concordance estimate. If concordance is high, the plot plateaus at a high OPA with a small number of observers. In contrast, OPA for PD-L1 ICA ≥1% decreased as the number of observers increased, reaching a plateau of 40% at nine observers. Results of real-world training conducted by Roche demonstrated an OPA of 98% among 903 pathologists from 75 countries assessing 28 TNBC cases in a proficiency test; however, the methodology for calculating OPA was not disclosed in the abstract 49. On re-analysis of the National Comprehensive Cancer Network (NCCN) study with lung cancer samples, OPA among 13 pathologists increased from 0% with a three-category score to 18% using a two-category scale (IC ≥1% and <1%), or even 67% if an outlier pathologist is excluded 38, showing that two categories are more reproducible. Moreover, low cut-off values, such as 1%, show lower inter-reader reproducibility 51.
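The dependence of OPA on the number of observers can be illustrated with a short sketch. It uses the all-observers-agree definition given above and averages OPA over every possible subset of readers of a given size. This is written in the spirit of the analysis described, not as a reproduction of the published method, whose exact procedure is not given here; the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def opa(calls, readers):
    """Fraction of samples on which all the given readers make the same call.

    calls[s][r] is the binary call of reader r on sample s.
    """
    agree = sum(1 for sample in calls if len({sample[r] for r in readers}) == 1)
    return agree / len(calls)


def opa_by_observer_count(calls):
    """Mean OPA over all reader subsets of each size, from 2 up to all readers."""
    n_readers = len(calls[0])
    return {
        k: mean(opa(calls, subset) for subset in combinations(range(n_readers), k))
        for k in range(2, n_readers + 1)
    }


# Hypothetical PD-L1 calls (1 = ICA >= 1%) from three readers on four samples.
calls = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
curve = opa_by_observer_count(calls)
print(curve[3])  # 0.5
```

Under this definition the averaged OPA can only stay flat or fall as readers are added, which is why an OPA reported for three readers is not directly comparable with one reported for 19. For a large panel, enumerating every subset becomes expensive, and random sampling of subsets would be the practical substitute.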
Clinical validation phase: Clinical validity and utility of PD-L1 IHC and TILs as predictive biomarkers of response to PD-1/PD-L1 inhibitors
Clinical validation refers to how reliably the biomarker correlates with response to ICI and divides the patient population into groups with divergent expected outcomes. Clinical utility is a measure of whether clinical use of a test improves clinical outcome and assists clinical decision-making 52. The gold standard for evaluating biomarker clinical utility is the outcome of prospective randomized trials, which include biomarker evaluation in the study design, such that it is powered to specifically evaluate the benefit derived from the new drug according to biomarker status 52-54. However, most randomized trials adopt a primary end point of drug efficacy and do not employ a biomarker design. Table 4 shows the characteristics and results of clinical trials utilizing PD-L1 IHC and TILs as predictive biomarkers of response to ICI in BC.
Clinical trial | Drug | Tumor type (n) | Biomarker details (n) | Predictive capacity of PD-L1/TILs |
---|---|---|---|---|
Advanced setting | | | | |
IMpassion130 NCT02425891 | Nab-paclitaxel +/− atezolizumab randomized phase III | UnTx LAdv or mTNBC (902) | PD-L1 (SP142) was prospectively tested at BTx (902) and used as a stratification factor for randomization. TILs were evaluated retrospectively (460 and 614). PD-L1 SP263 and 22C3 were performed retrospectively on BEP (614) in a post-hoc exploratory analysis. | Improved PFS (HR 0.62 [0.49–0.78]) and OS (HR 0.62 [0.45–0.86]) with the addition of atezolizumab in PD-L1+ tumors (SP142 ICA ≥1%). ORR 56 versus 46% in the ITT population and 59 versus 43% in PD-L1+ tumors (p = 0.002). Better PFS (0.53 [0.38–0.74]) and OS (0.57 [0.35–0.92]) for the TILs >10% and PD-L1 ≥1% population (n = 460). PD-L1+ cases showed higher median TILs (10% [IQR: 5–20]) on BEP. Improved PFS and OS (0.64 [0.53–0.79]; 0.75 [0.59–0.96]) with the addition of atezolizumab in SP263 (IC ≥1%) and (0.68 [0.56–0.82]; 0.78 [0.62–0.99]) 22C3 (CPS ≥1) on BEP. Median PFS SP142 4.2 months, 22C3 2.1 months, SP263 2.2 months, and median OS SP142 9.4 months, 22C3 2.4 months, SP263 3.3 months. |
NCT01375842 7 | Atezolizumab single arm phase Ib | PreTx mTNBC (116) | PD-L1 (SP142) tested prospectively (116). TICs (116). | PD-L1 ICA ≥1% (ORR: 12 versus 0%; HR: 0.55 [0.33–0.92]) and TICs >10% (HR: 0.54 [0.35–0.83]) were associated with better outcome. TICs >10% was independently associated with ORR, PFS and OS in multivariate analysis. PD-L1 TC ≥1% was not associated with response. |
NCT01633970 30 | Atezolizumab + nab-paclitaxel single arm phase Ib | UnTx (13) and PreTx (20) mTNBC (33) | PD-L1 (SP142) and TILs tested retrospectively at BTx (23 and 20) and PostTx (11 and 15, respectively). | No statistically significant association of baseline PD-L1 or TILs with response. Numerically higher ORR (41.7 versus 33.3%) and longer PFS (6.9 versus 5.1 months) and OS (21.9 versus 11.4 months) in PD-L1+ tumors (ICA ≥1%). Numerically higher OS in TILs >5%. Changes in PD-L1 or TILs were not associated with clinical response. |
KEYNOTE-119 NCT02555657 | Pembrolizumab versus physician's choice chemo randomized phase III | PreTx mTNBC (622) | PD-L1 (22C3) was prospectively tested at BTx (622) and used as a stratification factor for randomization. | No improved outcome in ITT population or PD-L1+ tumors (CPS ≥10 p = 0.057; CPS ≥1 p = 0.073). For CPS ≥20 HR OS: 0.58 [0.38–0.88]. Better OS for TILs ≥5% (0.75 [0.59–0.96]) in the pembrolizumab arm but not the chemotherapy arm (1.46 [1.11–1.92]). TILs and PD-L1 CPS moderately correlated (0.45). TILs (p = 0.004) and CPS (p = 0.09) were independently predictive. |
KEYNOTE-150 NCT02513472 | Pembrolizumab + eribulin single arm phase Ib/II | PreTx and UnTx mTNBC (107) | PD-L1 (22C3) | ORR independent (30.6 versus 22.4%) of PD-L1 status (CPS ≥1).5 |
KEYNOTE-086 NCT02447003 33, 56, 57 | Pembrolizumab single arm phase II | A: PreTx mTNBC (170); B: UnTx PD-L1+ mTNBC (84) | PD-L1 (22C3) was prospectively tested at BTx (254). TILs were evaluated retrospectively (193). | ORR independent of PD-L1 status (CPS ≥1) in cohort A (5.7 versus 4.7%). No difference in PFS or OS between PD-L1+ and PD-L1-. 21.4% ORR in cohort B. Better ORR in patients with TILs > median in cohort A (6 versus 2%) and B (39 versus 9%) and combined cohorts (OR: 1.26 [1.03–1.55]). Higher median TILs in responders versus non-responders in cohort A (10 versus 5%) and cohort B (50 versus 15%). |
KEYNOTE-012 NCT01848834 | Pembrolizumab single arm phase Ib | PreTx PD-L1+ mTNBC (32) | PD-L1 (22C3) was prospectively tested at BTx (32) | Increasing ORR (p = 0.028) and reduction in HR (p = 0.012) with increasing PD-L1 expression. |
TONIC NCT02499367 | Nivolumab + previous induction therapy (Rx/chemo) randomized phase II | PreTx and UnTx mTNBC (67) | PD-L1 (22C3) and TILs at BTx, after induction and PostTx. | Higher BTx TILs (median 12.5 versus 6%, p = 0.004) and PD-L1 on IC (median 15 versus 5%) in responders versus non-responders. Better PFS and OS were observed in PD-L1 IC ≥5% patients. No difference was observed between PD-L1 TC ≥1% and <1% populations. |
PANACEA NCT02129556 | Pembrolizumab + trastuzumab single arm phase Ib/II | PreTx LAdv or mHER2+ BC; Ib: PD-L1+ (6); II: PD-L1+ & PD-L1- (52) | PD-L1 (QualTek/22C3) tested prospectively at BTx (58). TILs were evaluated retrospectively (48). | II: Higher ORR (15 versus 0%) in PD-L1+ (CPS ≥1). Longer OS for PD-L1+ population. Higher TIL levels in objective responders (median ~25 versus 1.5%, p = 0.006) and in PD-L1+ population (p = 0.0004). |
KATE-2 NCT02924883 | T-DM1 +/− atezolizumab randomized phase II | PreTx LAdv or mHER2+ BC (202) | PD-L1 (SP142) tested prospectively (202). | PFS benefit (HR 0.60 [0.32–1.11]) and numerically higher ORR (54 versus 33%) in PD-L1+ tumors (ICA ≥1%) with the addition of atezolizumab. |
JAVELIN NCT01772004 | Avelumab single arm phase Ib | PreTx LAdv or mBC (168) | PD-L1 (73-10) evaluated prospectively on IC and TC (168). | Better ORR in PD-L1+ (IC ≥10%) BC (16.7 versus 1.6%, p = 0.039; 22.2 versus 2.6% in TNBC). No association between outcome and PD-L1+ (HR PFS: 0.66 [0.34–1.26], OS: 0.62 [0.25–1.54]). PD-L1 on TC showed no association with response. |
Neoadjuvant setting | | | | |
KEYNOTE-522 NCT03036488 | Neoadjuvant paclitaxel + carboplatin + AC/EC +/− pembrolizumab randomized phase III | UnTx TNBC (602) | 22C3 | pCR achieved irrespective of PD-L1 status (CPS ≥1) with the addition of pembrolizumab (68.9 versus 45.3%). |
KEYNOTE-173 NCT02622074 | Neoadjuvant pembrolizumab + nab-paclitaxel +/− carboplatin +/− AC randomized phase Ib | UnTx LAdv TNBC (60) | PD-L1 (22C3) retrospectively tested on BTx (52). TILs were retrospectively evaluated on BTx (53) and OnTx (50) samples. | Higher BTx (p = 0.028) and OnTx TILs (p = 0.005) and BTx PD-L1 CPS (p = 0.021) were associated with pCR. Responders had higher median pre-treatment (40 versus 10%) and OnTx (65 versus 22.5%) TILs. |
- Δ, change between baseline and after treatment; AC, doxorubicin + cyclophosphamide; BEP, biomarker evaluable population; BTx, baseline or pre-treatment; CPS, combined positive score; EC, epirubicin + cyclophosphamide; HR, hazard ratio; IC, immune cells; IQR, interquartile range; ITT, intention-to-treat population; LAdv, unresectable locally advanced; mBC, metastatic breast cancer, all subtypes; mTNBC, metastatic TNBC; n, number of patients included in the analysis; OnTx, on treatment; OR, odds ratio; ORR, objective response rate; OS, overall survival; PFS, progression-free survival; PostTx, post-treatment; PreTx, previously treated; Rx, radiation; TC, tumor cells; TIC, tumor-infiltrating immune cells (lymphocytes, macrophages, dendritic cells and granulocytes) scored as a percentage of tumor area; UnTx, untreated.
Patients with newly diagnosed metastatic or locally advanced PD-L1 ICA ≥1% TNBC demonstrated survival benefit with the addition of the PD-L1 inhibitor atezolizumab to nab-paclitaxel in the randomized phase III IMpassion130 trial in which all patients were prospectively tested for PD-L1 with SP142 28. Evaluation of progression-free survival (PFS) and overall survival (OS) in the PD-L1+ subgroup was one of the primary efficacy end points. Although the primary endpoint of OS for the intention-to-treat (ITT) population was not reached, and although a pre-specified statistical testing hierarchy prevented further formal analysis, OS was improved within the PD-L1+ subgroup with the addition of atezolizumab 28, 63.
No improved outcome was observed for pre-treated metastatic TNBC patients with the PD-1 inhibitor pembrolizumab as monotherapy compared to chemotherapy (treatment per physician's choice: vinorelbine, capecitabine, or gemcitabine) in the ITT population or PD-L1+ populations (PD-L1 CPS ≥1 or CPS ≥10 with 22C3) in the randomized phase III KEYNOTE-119 study 31. Large randomized trials with survival end points, like the aforementioned, are generally required to establish the medical utility of a predictive biomarker. Nevertheless, retrospective analysis of specimens collected from prospective trials may also establish biomarker clinical utility if appropriately designed and if archival tissue is available from enough patients to have adequate statistical power 64. An exploratory analysis with a cut-off of CPS ≥20 did show longer OS with pembrolizumab than with chemotherapy 31. To reliably establish clinical utility, these results should be validated in similar, but separate, cohorts 64. Likewise, response to pembrolizumab monotherapy or in combination with chemotherapy was independent of PD-L1 status (CPS ≥1) in the single-arm KEYNOTE-086 and KEYNOTE-150 trials, respectively 33, 34. Of note, patients participating in these studies were pre-treated. TNBC patients with PD-L1 IC ≥1% and IC ≥5% showed improved survival outcomes with nivolumab after induction treatment in the phase II TONIC trial 36.
For patients with metastatic trastuzumab-resistant HER2-positive (HER2+) BC, PD-L1 CPS ≥1 was predictive of response to the pembrolizumab plus trastuzumab combination in the single-arm phase II PANACEA trial 58. Conversely, on the phase II randomized KATE-2 trial, although the response was numerically higher in patients with PD-L1 ICA ≥1% tumors, no significant benefit was observed with the addition of atezolizumab to T-DM1 59. Notably, in an exploratory biomarker-analysis, the hazard ratio (HR) for OS was similar for PD-L1 as for TILs in this trial, suggesting that both predict benefit from the addition of atezolizumab to T-DM1.
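Whether a reported hazard ratio supports a benefit can be checked from its confidence interval. The sketch below recovers an approximate z statistic and two-sided p-value from an HR and its 95% CI, assuming the interval is a Wald interval that is symmetric on the log scale. That assumption and the function name are introduced here for illustration, and the results are approximations rather than the trial-reported p-values.

```python
import math


def hr_z_and_p(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Approximate z statistic and two-sided p-value from an HR and its 95% CI.

    Assumes a Wald interval on the log scale: log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Values taken from Table 4.
_, p_kate2 = hr_z_and_p(0.60, 0.32, 1.11)         # KATE-2, PFS in PD-L1+ tumors
_, p_impassion = hr_z_and_p(0.62, 0.49, 0.78)     # IMpassion130, PFS in PD-L1+ tumors
print(p_kate2 > 0.05, p_impassion < 0.001)        # True True
```

The KATE-2 interval crosses 1, so the point estimate of 0.60 is compatible with no effect at the conventional 5% level, whereas the IMpassion130 interval lies well below 1.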
In the neoadjuvant setting, the increase in pathological complete response (pCR) rate observed with the addition of pembrolizumab to chemotherapy was independent of PD-L1 status (CPS ≥1) in the randomized phase III KEYNOTE-522 trial 60. Similarly, PD-L1 ICIC% ≥1% not only failed to predict pCR after the addition of durvalumab to chemotherapy, but in fact was predictive of response in the chemotherapy-only arm of the phase II randomized GeparNuevo trial 37.
Exploratory analysis of the randomized phase III KEYNOTE-119 trial showed that patients with TILs higher than the median (5%) had better OS in the pembrolizumab monotherapy arm but not in the chemotherapy arm 55. TILs greater than the median were also shown to be predictive of response to single-agent pembrolizumab regardless of PD-L1 status on retrospective biomarker analysis of the previously treated PD-L1 unselected cohort A of KEYNOTE-086 (median TILs 5%), but even more so within PD-L1+ treatment-naïve cases on cohort B (median TILs 17.5%) 57. Furthermore, patients with TNBC and HER2+ BC who responded to treatment with pembrolizumab alone and in combination with trastuzumab showed higher median TILs on the single arm phase II KEYNOTE-086 and PANACEA trials 57, 58 and on the TONIC phase II trial evaluating nivolumab after induction treatment 36.
In the neoadjuvant setting, baseline TILs evaluated as a continuous variable and stratified (≤10, 11–59, ≥60%) were predictive of pCR in both the durvalumab plus chemotherapy and chemotherapy plus placebo arms of GeparNuevo 37. In addition, overall T-cell density was associated with pCR in response to pembrolizumab in the randomized phase II I-SPY 2 trial 65.
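The stratified TIL bands can be written as a small helper. This is a sketch under two assumptions that are not stated in the text: that a score of exactly 10% belongs to the lowest band, and that the bands are labelled low, intermediate, and high.

```python
def til_category(stromal_tils_percent: float) -> str:
    """Map a stromal TIL score (0-100%) to an illustrative three-level band."""
    if not 0 <= stromal_tils_percent <= 100:
        raise ValueError("TIL score must be a percentage between 0 and 100")
    if stromal_tils_percent <= 10:
        return "low"           # assumed label for the lowest band
    if stromal_tils_percent < 60:
        return "intermediate"  # assumed label for 11-59%
    return "high"              # assumed label for >= 60%


print([til_category(v) for v in (5, 10, 35, 60)])
# ['low', 'low', 'intermediate', 'high']
```

Keeping the continuous score alongside the band preserves the option of re-analysing with other cut-offs later.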
It is important to keep in mind that TILs have also proven predictive of response to neoadjuvant chemotherapy (NAC) in patients with TNBC and HER2+ BC 66, 67 and strongly prognostic of outcome in patients with early TNBC treated with standard anthracycline-based adjuvant chemotherapy 68-70 on phase III and pooled trials. In addition, in early stage treatment-naïve TNBC patients, high TIL-counts predict >98% 5-year survival, suggesting that the benefit of chemotherapy is probably very limited in this group 71, 72. PD-L1 baseline expression has also been positively associated with response to anthracycline-based NAC in hormone receptor–positive BC 73 and TNBC 74. However, both PD-L1 and TILs are predictive of response to monotherapy ICI, proving predictive capacity beyond chemotherapy treatment.
Clinical implementation: Inclusion of PD-L1 and TILs in clinical trials
Given the existing evidence, we propose systematic implementation of combined PD-L1 and TIL analyses as a comprehensive immuno-oncological integral biomarker for patient selection for ICI in BC clinical trials. Because both have proven to be influential determinants of response to ICI, the use of both markers as stratification factors in randomized clinical trial designs could improve the balance of baseline characteristics among arms. Trial design should include PD-L1 and TIL analyses in real time, pre-specifying the inclusion of both biomarkers in the protocol and ensuring well-powered biomarker clinical utility data that can be used for regulatory submissions of both TILs and PD-L1 as markers of efficacy for immunotherapy. In addition, new protocols can be written to conduct prospective–retrospective biomarker analysis on archival tissues from completed trials. All studies must be conducted and analyzed in a standardized manner per Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) criteria 75, 76. TILs should be scored as recommended by the International Immuno-oncology Biomarker Working Group (TIL-WG) 2, 3 as a continuous variable with clinically relevant cut-offs in mind. A recent publication demonstrated the feasibility of applying a web-based TIL scoring platform to enable the use of TILs as a stratification factor in an immunotherapy clinical trial for TNBC within a risk-management framework 77. This pilot study proposes a standardized workflow that can be used in future clinical trials.
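As a minimal sketch of how both markers could act as stratification factors, the following Python example assigns patients to arms using permuted blocks within strata defined by PD-L1 status and TIL category. The cut-offs (PD-L1+ IC ≥1% of tumor area, TILs ≥5%), the block size, and the arm labels are illustrative assumptions and are not taken from any trial protocol.

```python
import random
from collections import defaultdict

PDL1_CUTOFF = 1.0   # % of tumor area with PD-L1+ IC (assumed for illustration)
TIL_CUTOFF = 5.0    # % stromal TILs (assumed for illustration)
ARMS = ("ICI+chemo", "chemo")  # hypothetical arm labels


def stratum(pdl1_ic_pct, til_pct):
    """Return the stratum key (PD-L1 positive?, TILs high?) for one patient."""
    return (pdl1_ic_pct >= PDL1_CUTOFF, til_pct >= TIL_CUTOFF)


class StratifiedBlockRandomizer:
    """Permuted-block randomization carried out separately in each stratum."""

    def __init__(self, block_size=4, seed=0):
        if block_size % len(ARMS) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum key -> assignments still unused in the current block
        self.pending = defaultdict(list)

    def assign(self, pdl1_ic_pct, til_pct):
        key = stratum(pdl1_ic_pct, til_pct)
        if not self.pending[key]:
            # Start a new block with equal numbers of each arm, in random order.
            block = list(ARMS) * (self.block_size // len(ARMS))
            self.rng.shuffle(block)
            self.pending[key] = block
        return self.pending[key].pop()


if __name__ == "__main__":
    randomizer = StratifiedBlockRandomizer(block_size=4, seed=42)
    # Hypothetical patients: (PD-L1+ IC %, TILs %)
    for pdl1, tils in [(2.0, 10.0), (0.0, 1.0), (5.0, 30.0), (0.5, 8.0)]:
        print(stratum(pdl1, tils), randomizer.assign(pdl1, tils))
```

Because each stratum completes its own blocks, the arms stay balanced for PD-L1 status and TIL category at every multiple of the block size.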
In BC, both PD-L1 and TILs have shown higher expression in primary tumor samples than in metastases 2, 24, 57. Nonetheless, PD-L1 expression on either primary breast (HR PFS: 0.61 [0.47–0.81]) or metastatic lesion samples (HR OS: 0.55 [0.32–0.93]) was predictive of response to the atezolizumab and nab-paclitaxel combination 24. Although the most recent sample may be more representative of the current immunologic status, evaluating all available samples in clinical trials would provide useful data to define the most appropriate time point for testing. Pre- and on-treatment TILs have been associated with response to ICI 61, 62. On-treatment biopsies could be included in protocols, since they may provide real-time information to help guide future treatment choices.
Furthermore, the existence of multiple scoring systems for PD-L1 assays precludes the harmonization of assays and complicates reproducibility of scoring among pathologists. A single scoring system would allow a more accurate and direct comparison among assays and simplify scoring, likely facilitating adoption into clinical practice. For BC patients, clinical benefit has been correlated with PD-L1 expression on IC 7, 27, 35, 36. Moreover, PD-L1 expression on macrophages was associated with outcome in response to neoadjuvant durvalumab 78. Although PD-L1 expression on TCs with SP263 was predictive of response to durvalumab in the neoadjuvant setting 37, in the advanced setting, expression on TCs evaluated by SP142 7, 24, 27, 22C3 36, and 73-10 35 was not predictive. We therefore encourage reporting PD-L1 expression as IC, TC%/tumor proportion score (TPS), and CPS separately for all assays in clinical trials to assess which scoring system is most clinically relevant for each setting. Note that IC scored as the proportion of tumor area occupied by PD-L1-expressing IC is not equivalent to IC as a percentage of TC, given that most BCs contain distinct stromal areas between tumor areas; a score normalized by cross-sectional area produces lower scores than a score normalized by number of TCs.
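The difference between these scoring systems can be illustrated with a short Python sketch that computes each score from the same slide. The cell counts and areas are invented for illustration, and the function names are hypothetical. The formulas follow the commonly used definitions: TPS as the percentage of viable TC with PD-L1 staining, CPS as PD-L1+ TC plus PD-L1+ IC per 100 viable TC (capped at 100), and the IC area score as the percentage of tumor area occupied by PD-L1+ IC.

```python
def tps(pdl1_pos_tc, total_tc):
    """Tumor proportion score: % of viable tumor cells with PD-L1 staining."""
    return 100.0 * pdl1_pos_tc / total_tc


def cps(pdl1_pos_tc, pdl1_pos_ic, total_tc):
    """Combined positive score: PD-L1+ TC and IC per 100 viable TC, capped at 100."""
    return min(100.0, 100.0 * (pdl1_pos_tc + pdl1_pos_ic) / total_tc)


def ic_area_score(pdl1_pos_ic_area, tumor_area):
    """IC score by area: % of tumor area occupied by PD-L1+ immune cells."""
    return 100.0 * pdl1_pos_ic_area / tumor_area


def ic_per_tc(pdl1_pos_ic, total_tc):
    """PD-L1+ IC expressed relative to the number of viable tumor cells (%)."""
    return 100.0 * pdl1_pos_ic / total_tc


if __name__ == "__main__":
    # Hypothetical slide: counts in cells, areas in square micrometres.
    total_tc, pos_tc, pos_ic = 1000, 10, 40
    tumor_area, pos_ic_area = 10_000_000, 80_000

    print("TPS            :", tps(pos_tc, total_tc))
    print("CPS            :", cps(pos_tc, pos_ic, total_tc))
    print("IC by area (%) :", ic_area_score(pos_ic_area, tumor_area))
    print("IC per TC (%)  :", ic_per_tc(pos_ic, total_tc))
```

With these invented numbers the same slide falls below a 1% cut-off when IC is normalized by tumor area but above it when IC is normalized by TC count, which is why reporting each score separately matters.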
We believe that the application of systematic criteria for combined PD-L1 and TIL analyses to future clinical trial designs will produce reliable data to better understand which patients will benefit the most from ICI. The resultant data could ultimately allow a meta-analysis to be conducted, providing clinically impactful data. Nevertheless, PD-L1 expression and IC presence are subject to dynamic regulation processes that are incompletely understood biologically. In addition, several other factors also influence responses to ICI, including tumor neoantigen load, IC composition, and expression of other costimulatory and inhibitory molecules. Additional biomarkers may help further refine patient selection. These potential biomarkers will likely be predictive in a tumor type–specific manner. For instance, TMB has been shown to be a predictive biomarker of response to ICI across multiple cancers in retrospective studies 79. However, mutational load is relatively low in BC. In addition, TMB estimates are variable across laboratories 80, with slower turnaround and higher cost compared to IHC. Class II major histocompatibility complex (MHC-II) tumor expression has been associated with response to ICI in breast 81 and other tumor types. Further investigation of these and other biomarkers, such as those evaluated by multiplex fluorescence IHC or gene-expression profiling, is warranted in correlative studies in clinical trials.
Clinical implementation: Inclusion of PD-L1 and TILs in daily practice
An analytically and clinically validated biomarker assay can be implemented into clinical care, but level 1 evidence is needed to change clinical practice. Results from randomized phase III IMpassion130 28 led to the accelerated approval of atezolizumab and nab-paclitaxel as the standard treatment regimen for PD-L1+ (ICA ≥1%) metastatic TNBC in many countries. Clinical implementation of a biomarker requires three key elements: regulatory approval, reimbursement by health systems, and incorporation into clinical practice guidelines 6. Regulatory approval is different in every country. Only the SP142 assay has been approved by regulatory agencies as the companion diagnostic test for the administration of atezolizumab and nab-paclitaxel in countries such as the United States, Japan, Sweden, Peru, and Argentina. In contrast, in certain countries in the European Union (EU), as well as in China and Brazil, any PD-L1 assay can be used as long as it has been validated. In the EU, drugs are generally not regulatorily linked to a companion diagnostic test. The NCCN and other guidelines 82 include PD-L1 diagnostic testing as part of the workup for recurrent or metastatic TNBC as well as other tumor types. However, to date, in most countries, PD-L1 testing is not performed routinely on metastatic TNBC, but mainly upon oncologist request.
Following regulatory approval and incorporation into clinical practice guidelines, a biomarker must also be affordable and accessible to pathologists in both academic and community-hospital practices worldwide to be successfully incorporated into daily practice. In Japan, where the SP142 assay is the approved companion diagnostic test for TNBC, only this assay is covered by the health system. In the United States, the SP142 assay and LDTs are covered by health insurance. In Peru, PD-L1 testing is covered by prepaid health insurance but it is not yet covered by the public health system. In Argentina, Australia, Brazil, Chile, India, Morocco, and some countries in the EU, the test is not yet covered by the health system. In the UK, the National Institute for Health and Care Excellence (NICE), the UK regulatory agency that evaluates drug efficacy, reported: ‘Atezolizumab with nab-paclitaxel […] does not meet NICE's criteria for inclusion in the Cancer Drugs Fund. This is because it does not have the potential to be cost effective at the current price, and there is no clear evidence that further trial data would resolve the uncertainties’ 83.
Subsequently, each pathology laboratory faces challenges including sample selection, sample processing, choice of assay, quality assurance, and interpretation to ensure correct implementation and consequent accurate patient selection. Table 5 summarizes these and previously stated risks along with proposed mitigation approaches to ease the implementation of PD-L1 testing into clinical practice. It has been suggested that labs should test as many time points as are available so as to maximize patient eligibility for treatment. However, such an approach will be costly without proven benefit to the patient. It is also unclear whether insurance companies will pay for testing of multiple samples.
Risk | Description of risk | Mitigation approach/Recommendation |
---|---|---|
Risks to patient safety | ||
Provision of inappropriate treatment because of false-positive or false-negative test results | Inter-pathologist variability and use of different assays with different sensitivities may lead to miscategorization of PD-L1 status. Incorrect results lead to inappropriate treatment allocation and put patient safety at risk. | See below. |
Physical harm or inconvenience associated with tissue biopsy | Heterogeneity of PD-L1 expression between primary and metastatic lesions in TNBC 14 can lead to misleading categorization depending on the sample tested. | Define optimal sample for PD-L1 testing from data of future clinical trials. When both primary and metastatic samples available, test both if possible. |
Operational risks | ||
Failure of sample collection, processing and quality | Poor quality samples can result in unreliable test results. | Ensure correct sample fixation for 6 to 72 h and processing. Determine sample adequacy on H&E: presence of TC and tumor-associated IC. Cut 4 µm sections for PD-L1 IHC testing along with sections for other IHC to preserve tissue in biopsy samples. Use within 2 months of cutting 14. |
Within laboratory assay variability | Drifts in assay results over time can result in unreliable test results. | Follow staining protocol with optimized conditions. Include control tissue (tonsil) to test acceptance criteria 14. Internal and external quality assurance. Audit positivity rates 87. |
Risks to biomarker development | ||
Difference in PD-L1 expression prevalence among assays | SP142 has shown PD-L1 expression on a lower number of TC and IC compared to the other assays 9, 10, 12, 13, 16, 22-24, 38-42. | It is more important that an assay identifies the patients who will most likely respond, than identifying a greater proportion of PD-L1 positive patients. Even though assays are not analytically equivalent, clinical utility interchangeability must be further studied. |
Use of multiple scoring systems | The existence of multiple scoring systems for the PD-L1 assays precludes the harmonization of assays and complicates reproducibility. | For BC, PD-L1 expressed in IC and not in TC has been shown to be predictive of response 7, 27, 35, 36. Future clinical trials should evaluate the most effective PD-L1 scoring system. Cut-points must be reproducible. |
Inter-pathologist variability in reading the assay | Quantification of PD-L1 on IC has been shown not to be reproducible to expected standards 10-13, 18, 23, 38, 39. | Ensure training on expected staining profile and cut-off for pathologists participating in clinical trials. Use of a single scoring system. Automated quantification by computer-based image analysis. Evaluate interobserver variability with a sufficiently large and statistically powered number of pathologists to ensure reproducibility. |
Temporal and Spatial heterogeneity | Both PD-L1 and TILs have demonstrated higher expression in primary tumors than in metastases 2, 24, 57. | Evaluating all available samples on clinical trials would provide useful data, since the most appropriate time point for testing has not yet been clearly established. |
Unique biomarker as companion diagnostic test | Due to the complexity of the immune response, it is unlikely that a single biomarker will sufficiently predict response to ICI. | Since both PD-L1 and TILs have been shown to be predictive of response to ICI 28, 57, the use of both as stratification factors and for composite biomarker analysis in future clinical trials may help further optimize patient selection. Enough samples should be secured to further investigate other biomarkers on exploratory analysis. |
Risks to biomarker implementation into daily practice | ||
Regulatory approval differs per country | Implementation into daily practice is dependent on regulatory approval. | Thorough and timely scientific interaction between the pathology community, industry and regulatory and national reimbursement agencies is needed. |
Biomarker accessibility and affordability | PD-L1 testing is not yet covered by health insurance in many countries. | Thorough and timely scientific interaction between the pathology community, industry and regulatory and national reimbursement agencies is needed. |
Use of multiple PD-L1 assays for a single analyte | With multiple PD-L1 assays available, pathology labs cannot be expected to have all tests available, causing variability in test results between laboratories. | Choice of assay will depend on regional regulations, availability of antibody, automated staining platform and optimized assay currently in use. Consider LDTs. Outsource to reference laboratories. |
Difference in PD-L1 expression prevalence among assays | SP142 has shown PD-L1 expression on a lower number of TC and IC compared to the other assays 9, 10, 12, 13, 16, 22-24, 38-42. | It is more important that an assay identifies the patients who will most likely respond, than identifying a greater proportion of PD-L1+ patients. For BC, SP142, SP263 and 22C3 have been shown to identify patients who derive better outcomes in response to atezolizumab and nab-paclitaxel 24. |
Inter-pathologist variability in reading the assay | Quantification of PD-L1 on IC has been shown not to be reproducible to expected standards 10-13, 18, 23, 38, 39. | Training on expected staining profile and cut-off. Interpretation guideline. Use of a single scoring system. Automated quantification by computer-based image analysis. |
Unique biomarker as companion diagnostic test | Due to the complexity of the immune response, it is unlikely that a single biomarker will sufficiently predict response to ICI. The use of PD-L1+ IC score as a unique biomarker test may be suboptimal in real-world conditions. | Since TILs and PD-L1 are part of an immunological spectrum and PD-1/PD-L1 interaction is only one of many factors that may determine the clinical outcome of immunotherapies, assessing both as a composite biomarker could be a better way to identify patients most likely to respond to ICI. |
- H&E, hematoxylin and eosin; IC, immune cells; IHC, immunohistochemistry; LDTs, laboratory-developed tests; PD-L1+ IC, proportion of tumor area covered by IC with discernible PD-L1 staining of any intensity expressed as a percentage; ICI, PD-1/PD-L1 inhibition–based therapy; TC, tumor cell; TILs, tumor-infiltrating lymphocytes; TNBC, triple-negative breast cancer.
From a clinical perspective, it is imperative that an assay identifies patients likely to respond to ICI, rather than identifying a greater proportion of PD-L1+ patients. The lower prevalence of PD-L1+ cases detected by the SP142 assay could potentially lead to fewer patients selected for therapy (false-negative tests), whereas use of SP263 or 22C3 could lead to greater patient eligibility at the expense of false-positive tests, unnecessarily subjecting a subset of these patients to toxicity and financial costs without clinical benefit. In an exploratory post hoc analysis of IMpassion130, the PD-L1+ population identified by each assay independently showed clinical benefit with similar hazard ratios (HR [95% CI]: SP142 ICA ≥1%: PFS: 0.60 [0.47–0.78], OS: 0.74 [0.54–1.01]; 22C3 CPS ≥1: PFS: 0.68 [0.56–0.82], OS: 0.78 [0.62–0.99]; SP263 IC ≥1%: PFS: 0.64 [0.53–0.79], OS: 0.75 [0.59–0.96]) 24. 22C3 and SP263 identified a larger PD-L1+ population, of which the SP142-positive cases are a subgroup. Of note, the biomarker evaluable population (BEP) included only 68% of the original ITT population, and although it may be adequately sized to reliably identify a larger treatment effect in the two-category test-positive patients, it could be underpowered to analyze a tripartite population of dual-assay analysis. OPA for analytical concordance with SP142 (ICA ≥1%) was 64% (22C3 CPS ≥1) and 69% (SP263 IC ≥1%), demonstrating that the assays are not equivalent 24. Nevertheless, even if mostly driven by the SP142-positive subpopulation, SP263 and 22C3 identified patients that showed improved PFS and OS, making them clinically interchangeable, since they identify populations with near-similar clinical outcomes 9. Further studies such as this, done in partnership between academia, industry, and regulatory entities, need to be encouraged, preferably before formal regulatory approval of an assay as a companion diagnostic linked to a specific drug.
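Analytical concordance figures such as the OPA quoted above are derived from paired positive/negative calls on the same cases. The sketch below computes overall, positive, and negative percent agreement against a reference assay. The ten paired calls are invented for illustration and do not reproduce any trial data, and the function name is hypothetical.

```python
def agreement(reference, comparator):
    """Return (OPA, PPA, NPA) in percent for paired binary PD-L1 calls.

    reference  -- calls from the reference assay (True = PD-L1 positive)
    comparator -- calls from the assay being compared, same case order
    """
    if len(reference) != len(comparator) or not reference:
        raise ValueError("need two non-empty call lists of equal length")
    pairs = list(zip(reference, comparator))
    both_pos = sum(1 for r, c in pairs if r and c)
    both_neg = sum(1 for r, c in pairs if not r and not c)
    ref_pos = sum(1 for r in reference if r)
    ref_neg = len(reference) - ref_pos
    opa = 100.0 * (both_pos + both_neg) / len(pairs)
    # PPA/NPA are undefined when the reference has no positives/negatives.
    ppa = 100.0 * both_pos / ref_pos if ref_pos else float("nan")
    npa = 100.0 * both_neg / ref_neg if ref_neg else float("nan")
    return opa, ppa, npa


if __name__ == "__main__":
    # Invented example: every reference-positive case is also comparator-positive,
    # but the comparator calls additional cases positive.
    reference = [True] * 4 + [False] * 6
    comparator = [True] * 7 + [False] * 3
    opa, ppa, npa = agreement(reference, comparator)
    print(f"OPA={opa:.0f}%  PPA={ppa:.0f}%  NPA={npa:.0f}%")
```

In this toy pattern the reference-positive cases form a subgroup of the comparator-positive cases, so positive agreement is complete while negative agreement, and therefore OPA, is reduced.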
In a meta-analysis including samples from various tumor types, each diagnostic kit was found to better match with properly validated corresponding LDTs than with other diagnostic kit assays 43. Although further studies are warranted, the use of LDTs is a reality in daily practice.
From a practical point of view, a single pathology laboratory cannot have all assays available. Labs performing PD-L1 IHC testing for NSCLC already use other assays, most commonly 22C3 and SP263 assays or an LDT 38, 40. Developing and validating the SP142 assay could be an unwarranted burden for some laboratories. SP142 and 22C3 commercial diagnostic assays are performed on different platforms, each a large capital expenditure. In countries where regulatory agencies permit, PD-L1 could be performed as an LDT, if analytically validated. For the SP142 antibody, similar PD-L1 expression was observed with different platforms 15, although using a different detection method has proven to impact assay performance 47. In countries where the regulatory agencies mandate the use of the SP142 assay, smaller hospitals will likely need to outsource testing to a reference laboratory. To date, in most countries, only a handful of large academic hospitals and reference labs are performing PD-L1 testing for TNBC. The choice of assay should be an agreement between pathologist, oncologist, and patients, and be directed by good laboratory practices and common sense. Patient advocates need to be aware of how the choice of an assay can influence treatment decisions.
For quality assurance purposes, tonsil-control tissue must be included as positive and negative controls alongside the clinical case to accept or reject the assay run. Tonsil tissue is recommended because it demonstrates granular punctate staining on lymphocytes arranged in aggregates and dispersed single-cell patterns, diffuse staining in the reticulated crypt epithelium, and absence of staining on superficial squamous epithelium 8. A control sample staining close to the cut-off point is also recommended 87. Unlike HER2, PD-L1 has no reflex alternative testing method that can be employed to ascertain accuracy. In addition, because the different PD-L1 assays are not equivalent, they cannot be tested against each other for accuracy. Pathology laboratories must audit their PD-L1 positivity rates as part of internal quality assurance. Prevalence of PD-L1+ (ICA ≥1%) TNBC with SP142 was 41% (44% on primary and 36% on metastatic samples) in IMpassion130 24, 28. Other studies have shown a similar range of prevalence, 32–58%, on TNBC samples using SP142 ICA ≥1% 14, 22-25, 28-30; one study had an outlier prevalence of 78%, in which the first 25 patients were selected only if PD-L1+, after which enrollment was extended to all patients 7. However, PD-L1+ prevalence reaches 54–87% and 46–86% when using SP263 ICA ≥1% and 22C3 CPS ≥1, respectively 22-25, 31-34. Prevalence of PD-L1+ in each of the cited studies is shown in Table 2. As part of an external quality assessment and validation, samples with known PD-L1 expression should be tested and compared in proficiency tests. A validated standardized PD-L1 Index Tissue Microarray 16 containing cell-line samples with known varying PD-L1 expression levels could be used for this purpose. For LDTs, laboratories must show results comparable to those obtained in clinical trials, with a diagnostic assay validated to predict potential response to a particular drug in a particular disease as a gold standard 84.
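One way a laboratory might operationalize the positivity-rate audit is sketched below: the observed PD-L1+ rate is summarized with a Wilson score confidence interval and flagged only when the whole interval falls outside an expected range. The default range of 32–58% is the SP142 ICA ≥1% range quoted above. The flagging rule, the function names, and the example counts are assumptions for illustration, not an established quality-assurance standard.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Wilson score interval (default ~95%) for a binomial proportion."""
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need n > 0 and 0 <= positives <= n")
    p = positives / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def audit_positivity(positives, n, expected_low=0.32, expected_high=0.58):
    """Flag a lab's PD-L1+ rate if its whole interval lies outside the expected range."""
    low, high = wilson_interval(positives, n)
    flagged = high < expected_low or low > expected_high
    return {"rate": positives / n, "ci": (low, high), "flagged": flagged}


if __name__ == "__main__":
    # Hypothetical audit periods: (PD-L1 positive cases, total TNBC cases tested)
    for positives, n in [(41, 100), (80, 100), (10, 100)]:
        result = audit_positivity(positives, n)
        low, high = result["ci"]
        print(f"{positives}/{n}: rate={result['rate']:.2f} "
              f"CI=({low:.2f}, {high:.2f}) flagged={result['flagged']}")
```

Using an interval rather than the raw rate avoids flagging small laboratories whose rates fluctuate simply because few cases were tested; a flagged result would prompt review rather than prove an assay problem.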
The Canadian Association of Pathologists has published a guide to ensure the quality of PD-L1 testing 85.
As discussed previously, inter-observer reproducibility is one of the main pitfalls regarding PD-L1 validity as a viable prognostic or predictive marker. Errors in patient selection arising from poor reproducibility not only put patients at risk, but also generate extra costs for health systems, raising issues at the national regulatory level regarding reimbursement criteria. Pathologists must be trained to interpret and score PD-L1 assays. Training material developed by assay manufacturers, including a digital training platform with a proficiency test, can be accessed freely 86, 88. The value of training should be established in statistically rigorous studies that include post-training evaluation with proper decay time. In addition, pathologists must participate in external quality assurance programs. A guideline for the interpretation of PD-L1 IHC developed by pathologists for pathologists, like those for TILs 2, 3, 89, ER 90, and HER2 91, is needed. Such a guideline developed by the International Association for the Study of Lung Cancer is available 92. Even though reproducibility among pathologists has been shown to be higher with two-category scoring 38, we believe the percentage of PD-L1+ ICA should be incorporated into the pathology report in addition to a positive or negative PD-L1 determination.
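Agreement between two readers on two-category (positive/negative) calls is often summarized with Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The sketch below is a minimal pure-Python version. The reader calls are invented, and the choice of kappa is our illustration rather than the statistic used in any particular cited study.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers' categorical calls on the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(reader_a)
    observed = sum(1 for a, b in zip(reader_a, reader_b) if a == b) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: product of each reader's marginal frequencies per category.
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1.0:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented calls for ten cases from two hypothetical pathologists.
    reader_a = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
    reader_b = ["pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg", "neg"]
    print(f"kappa = {cohens_kappa(reader_a, reader_b):.2f}")
```

Here the readers agree on 8 of 10 cases, but because agreement of 52% would be expected by chance given their marginal rates, kappa is noticeably lower than the raw 80% agreement.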
Another tool available for pathologists that can improve reproducibility is digital image analysis of whole-slide images. Evaluation of TILs in solid tumors is a highly suitable application for computational assessment; automated quantification by computer-based image analysis provides accurate and reproducible results that can aid pathologists, especially for borderline cases surrounding the clinically relevant 1% cut-off that are challenging to distinguish by eye. In the basic retrospective research realm, image analysis algorithms have shown better or comparable concordance between the automated algorithm score and the mean pathologist score than between pathologists 9, 93. Like any biomarker, computer-based image analysis algorithms would need to be analytically and clinically validated with demonstrated clinical utility, such that results are consistent with trial materials used to establish cut-points for clinical decision-making, and approved by corresponding regulatory agencies before they can be applied in daily practice. A recent publication outlines possible workflows and challenges for analytical and clinical validation of computational TIL assessment 94, paving the path for its incorporation into clinical trials and daily practice.
In view of the considerable level Ib evidence for the prognostic value of TILs, the expert panels at St Gallen 2019 95 and authors of the 2019 edition of the World Health Organization Classification of Tumors of the Breast recommended quantification of TILs in TNBC. Internationally, some institutions have already begun incorporating TILs into pathology reports, paving the way for TIL counts to inform BC therapies. Going forward, a standardized format for reporting TIL counts, similar to those used to report hormone receptors, will need to be adopted. Given the inherent variability in TIL distribution and heterogeneity of sampling, we propose that TIL counts should be scored in treatment-naïve and advanced-setting BC specimens, while in the clinical post-treatment setting TILs should be scored only on clinical trial samples according to established guidelines 96. TILs should be scored as recommended by the TIL-WG 2, 3 as a continuous variable, with clinically relevant cut-offs in mind.
Even though TILs will require validation in accordance with regulatory standards prior to being clinically recommended as a predictive biomarker for response to ICI, TILs ≥5% have been shown to be predictive of response to pembrolizumab on the exploratory analysis of the randomized phase III KEYNOTE-119 clinical trial 57. In addition, TILs have been analytically validated, with three ring studies showing reliable inter-reader reproducibility 97-99, and have the advantage of being easily assessed on a simple H&E slide with an existing standardized method that is available to the pathology community through numerous publications and at the TIL-WG website 2, 89. In a recent publication, an analysis of the most discordant cases in the ring studies identified possible pitfalls for scoring TILs, including technical factors, sample heterogeneity, variability in defining tumor boundaries, differentiating lymphocytes from mimics, and limited stroma for evaluation. Approaches to avoid these pitfalls have been covered in the publication, and associated educational resources are available at the TIL-WG website 89, 97. Once pathologists score TILs in their daily practice for prognostic purposes, this information will already be present in the report. As shown by Liu et al using SP142 LDT, a significant proportion of PD-L1+ ICs are macrophages 48, whereas TILs are composed of lymphocytes and plasma cells. In addition to providing this biologically relevant predictive information, TILs can also serve as a starting point. It is improbable that a tumor with no TILs will be PD-L1+. Similarly, PD-L1 borderline cases are likely to have low TILs. At the same time, cases with high TILs are highly likely to be PD-L1+, as evidenced on the BEP of IMpassion130 exploratory analysis, in which virtually all cases with TILs >20% were PD-L1+ 24.
Therefore, used in combination with TILs it may conceptually not matter which PD-L1 assay is used, as long as it is validated according to international standards. TILs are highly likely to be the backbone of predictive and prognostic information.
In conclusion, pathologists have a responsibility to patients to implement assays that lead to optimal selection of patients for immunotherapies. Solving the current issues in implementation of PD-L1 assays in clinical trials and daily practice requires a partnership between industry, academia, and regulating agencies, involving patient advocates. Because TILs and PD-L1 are part of an immunological spectrum in BC, and PD-1/PD-L1 interaction is only one of many factors that may determine the clinical outcome of immunotherapies, assessing both as a composite biomarker may be the best way to identify patients most likely to respond to ICI. However, reality and regulatory implementations dictate that practices will vary across different jurisdictions. We propose herewith a risk-management framework that may help mitigate the risks of suboptimal patient selection for immuno-therapeutic approaches in BC.
Acknowledgements
The authors recognize the members of the International Immuno-Oncology Biomarker Working Group for reviewing and providing critical feedback on the manuscript. RS is supported by the Breast Cancer Research Foundation, New York, USA. SL is supported by the National Breast Cancer Foundation of Australia Endowed Chair and the Breast Cancer Research Foundation, New York. EAT is supported by the Breast Cancer Research Foundation.
Author contributions statement
RS conceived the presented idea; PGE did the literature search and took the lead in writing the manuscript, with the guidance of RS and MS. All authors provided critical feedback and helped shape the manuscript.
Disclaimer
This work includes contributions from, and was reviewed by, individuals who are employed by Bristol-Myers Squibb and Merck & Co, Inc. The content is solely the responsibility of the authors and does not necessarily represent the official views of Bristol-Myers Squibb or Merck & Co, Inc. Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization.