Machine-learning-based image analysis algorithms improve interpathologist concordance when scoring PD-L1 expression in non-small-cell lung cancer

Introduction

Treatment of patients with non-small-cell lung cancer (NSCLC) has been dramatically improved by the use of immune modulators (IMs) targeted against the PD-1/PD-L1 (programmed death-1/programmed death ligand-1) checkpoint in tumours that exploit it as a mechanism of immune escape.1–4 Currently, the only validated predictive biomarker for guiding the use of these therapies in NSCLC is assessment of PD-L1 expression on tumour cells (TCs) by immunohistochemistry.1–6 Unfortunately, several key weaknesses limit the predictive power of PD-L1 expression, one of which is poor concordance between pathologists when interpreting it.7 8 This has important implications, since categorisation of PD-L1 expression around agreed thresholds is the cornerstone of clinical decision-making on whether prescribing IMs is appropriate.9 10

One potential solution to interpretative discordance is the adoption of image analysis tools made possible by recent developments in digital pathology (DP), a technology that is becoming commonplace in diagnostic and predictive pathology.11 12 In addition to providing benefits in ‘routine’ morphology-based reporting, DP and image analysis are increasingly being applied to ‘quantitative immunohistochemistry’ to predict response to targeted therapies.13 14

In this preliminary study, we explore whether the use of a machine learning-derived image analysis tool can improve interpathologist concordance in assessing PD-L1 expression in NSCLC and thereby improve consistency of interpretation.

Methods

Study design

Thirteen biopsy specimens of NSCLC (seven adenocarcinomas and six squamous cell carcinomas) were selected to represent a range of PD-L1 expression levels. Serial 4 µm sections were stained with H&E for assessment of morphology and immunostained for PD-L1 using the Ventana SP263 antibody clone with a validated kit and protocol.15 Slides were scanned at 20× magnification on the Roche Ventana DP200 slide scanner, and images were viewed and assessed using Roche Navify DP software and the uPath PD-L1 (SP263) image analysis algorithm for NSCLC.

Five pathologists who routinely score PD-L1 expression in NSCLC specimens at a major regional laboratory, serving two large lung cancer units and a thoracic surgical centre, participated in the study. Each pathologist independently scored every case with the assistance of the machine learning-based image analysis algorithm. After a 6-week ‘washout’ period, the same specimens were reassessed without the image analysis tool. Pathologists were blinded to their colleagues’ assessments and to their own initial, assisted scores.

Assessment of PD-L1 expression was performed according to the VENTANA PD-L1 (SP263) interpretation guide. The number of PD-L1-positive TCs as a proportion of the total number of TCs in the section was expressed as the tumour proportion score (TPS/TC content), and each specimen was placed, using established cut-off points, into one of three categories: ‘negative’ (<1%), ‘weak positive’ (1%–49%) and ‘strong positive’ (≥50%).
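
As a minimal sketch of the scoring rule just described (not the uPath implementation, whose internals are not described here), the TPS is the number of PD-L1-positive tumour cells divided by the total number of tumour cells, multiplied by 100, and the clinical category follows from the <1% and ≥50% cut-offs. The function names are hypothetical, and treating every score below 50% as weak positive (so that, for example, 49.6% is not rounded up) is an assumption.

```python
def tumour_proportion_score(positive_tcs: int, total_tcs: int) -> float:
    """TPS (%) = PD-L1-positive tumour cells / all tumour cells x 100."""
    if total_tcs <= 0:
        raise ValueError("total_tcs must be positive")
    if not 0 <= positive_tcs <= total_tcs:
        raise ValueError("positive_tcs must lie between 0 and total_tcs")
    return 100.0 * positive_tcs / total_tcs


def tps_category(tps: float) -> str:
    """Place a TPS (%) into the three clinical categories used in this study.

    Assumption: the 1%-49% band covers every score from 1% up to, but not
    including, 50%.
    """
    if tps < 1.0:
        return "negative"         # <1%
    if tps < 50.0:
        return "weak positive"    # 1%-49%
    return "strong positive"      # >=50%
```

For example, the counts reported for figure 1 (4889 positive of 6717 tumour cells) give `tumour_proportion_score(4889, 6717)` of about 72.8, which `tps_category` places in the strong positive group.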

The PD-L1 image analysis tool

Prior to the study, the pathologists involved were trained in using the tool by assessing a range of specimens.

Using the Navify DP software, images of H&E-stained and PD-L1-immunolabelled sections were viewed digitally side by side. The H&E-stained slide was used to assess the morphological features of the tumour and its heterogeneity, particularly the extent and distribution of stroma, inflammatory infiltration and necrosis. On this basis, ‘regions of interest’ (ROIs) were identified and digitally selected using a ‘drawing tool’, and the analysis tool was then run, calculating a TPS both for each individual ROI and for the combined ROIs across the entire section (figure 1). The pathologist then considered all available data, with and without the tool, and determined a final ‘assisted TPS’.
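
The text does not say how the tool combines several ROIs into a section-level TPS. The figure 1 totals (4889 positive of 6717 tumour cells across 8 ROIs, 72.8%) are consistent with pooling cell counts before dividing, and the sketch below assumes that approach. For contrast it also computes a simple mean of per-ROI scores. The ROI counts, class and function names are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RoiCounts:
    """Hypothetical per-ROI tumour-cell counts from an image analysis run."""
    positive: int
    negative: int

    @property
    def total(self) -> int:
        return self.positive + self.negative

    @property
    def tps(self) -> float:
        return 100.0 * self.positive / self.total


def pooled_tps(rois: Sequence[RoiCounts]) -> float:
    """Assumed combined TPS: sum cell counts over all ROIs, then divide."""
    positive = sum(r.positive for r in rois)
    total = sum(r.total for r in rois)
    return 100.0 * positive / total


def mean_of_roi_tps(rois: Sequence[RoiCounts]) -> float:
    """Unweighted mean of per-ROI TPS values, shown only for contrast."""
    return sum(r.tps for r in rois) / len(rois)
```

With one large ROI of 900 positive and 100 negative cells and one small ROI of 5 positive and 95 negative cells, `pooled_tps` gives about 82.3% (strong positive), whereas the mean of the two per-ROI scores is 47.5% (weak positive). The choice of combining rule, like the size and placement of ROIs, can therefore move a borderline case across a cut-off.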

Figure 1

Applying annotations to NSCLC tissue in uPath. (A) Multiple annotations are drawn around each area of interest. (B) Higher-power view of an annotated region. (C) Annotated region after image analysis; note the PD-L1-positive tumour cells (red) and PD-L1-negative tumour cells (blue). The stroma and anthracotic pigment in the central area are ignored. (D) Final data for all annotations: 8 ROIs totalling 6717 tumour cells, 4889 PD-L1-positive and 1828 PD-L1-negative, giving a TPS of 72.8%. NSCLC, non-small-cell lung cancer; PD-L1, programmed death ligand 1; TPS, tumour proportion score; ROI, region of interest.

The second ‘postwashout’ assessment utilised the same images, viewed on the uPath software, but scored conventionally, without the assistance of the image analysis tool, providing an ‘unassisted TPS’.

Statistical analysis

Statistical analysis was performed using IBM SPSS Statistics software, V.26 (IBM Corp). PD-L1 expression was captured both as continuous data (expressed as an absolute percentage) and as categorical data (negative/weak positive/strong positive). Interpathologist concordance was calculated for both: the TPS as a continuous variable was assessed using a two-way random-effects intraclass correlation coefficient (ICC), interpreted as per Koo and Li (2016), and the clinical categories were assessed using Fleiss’ kappa, interpreted as previously described.16 Statistical significance was set at p<0.05.

Results

Concordance for categories of PD-L1 expression

The categorisation of PD-L1 expression for each specimen into the ‘clinically relevant’ categories of negative, weak positive or strong positive is shown for each pathologist, with and without the assistance of image analysis (tables 1 and 2). Agreement between pathologists when assisted by image analysis was ‘very good’ (κ 0.886 (95% CI 0.881 to 0.891) p<0.0005), compared with ‘good’ when unassisted (κ 0.613 (95% CI 0.608 to 0.617) p<0.0005).
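
For readers who want to reproduce this kind of categorical agreement analysis outside SPSS, the following is a minimal pure-Python sketch of Fleiss’ kappa for a fixed number of raters per case (in this study, 13 cases rated by 5 pathologists into 3 categories). The case-level labels of tables 1 and 2 are not reproduced here, so no study values are recomputed, and confidence intervals and p values are not calculated. The function names are hypothetical, and the bands in `altman_label` are an assumption: Altman-style bands (≤0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, >0.80 very good) yield the ‘very good’ and ‘good’ labels reported above.

```python
from collections import Counter


def rating_counts(ratings, categories):
    """Turn per-case rater labels into an N x k table of category counts.

    ratings: one sequence per case, holding each pathologist's category label.
    """
    table = []
    for case in ratings:
        counts = Counter(case)
        unknown = set(counts) - set(categories)
        if unknown:
            raise ValueError(f"unexpected categories: {sorted(unknown)}")
        table.append([counts[c] for c in categories])
    return table


def fleiss_kappa(table):
    """Fleiss' kappa for N cases, each rated by the same number of raters."""
    n_cases = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_total = n_cases * n_raters
    # Proportion of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in table) / n_total for j in range(len(table[0]))]
    # Per-case observed agreement: fraction of rater pairs that agree.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_obs = sum(p_case) / n_cases
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1:
        raise ValueError("kappa is undefined when every rating is identical")
    return (p_obs - p_exp) / (1 - p_exp)


def altman_label(kappa):
    """Assumed Altman-style interpretation bands for kappa."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"
```

Each row of the input to `rating_counts` would hold the five pathologists’ labels for one case, using the category names negative, weak positive and strong positive.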

Table 1

Categorisation of each case by PD-L1 TPS into clinically relevant categories of negative (<1%), weak (1%–49%) and strong (≥50%) for each pathologist when assisted by the image analysis tool

Table 2

Categorisation of each case by PD-L1 TPS into clinically relevant categories of negative (<1%), weak (1%–49%) and strong (≥50%) for each pathologist when unassisted by the image analysis tool

Concordance for TPS as absolute values

Agreement between pathologists when PD-L1 expression was expressed as the TPS (online supplemental tables S1 and S2) was ‘excellent’ when assisted by image analysis (ICC 0.954 (95% CI 0.903 to 0.984) p<0.0005) as compared with ‘good’ when unassisted (ICC 0.837 (95% CI 0.686 to 0.938) p<0.0005).
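
A minimal sketch of a two-way random-effects ICC follows, using the Shrout and Fleiss mean-square formulation under absolute agreement. The text does not state whether single-measure or average-measure ICCs (or consistency rather than absolute agreement) were reported, so the sketch returns both single- and average-measure values; this, the hypothetical function names and the omission of confidence intervals are assumptions. `koo_li_label` applies the Koo and Li (2016) bands (<0.5 poor, 0.5–0.75 moderate, 0.75–0.9 good, >0.9 excellent) to the point estimate, as the labels above appear to do.

```python
def icc_two_way_random(scores):
    """Two-way random-effects ICCs (absolute agreement), Shrout & Fleiss style.

    scores: one row per case, one TPS value per pathologist in each row.
    Returns (single-measure ICC(2,1), average-measure ICC(2,k)).
    """
    n = len(scores)          # number of cases
    k = len(scores[0])       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return icc_single, icc_average


def koo_li_label(icc):
    """Koo and Li (2016) bands, applied here to the point estimate only."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"
```

The input would be the absolute TPS values, one row per case and one column per pathologist, as in online supplemental tables S1 and S2.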

Feedback from participants

Feedback from pathologists highlighted the marked subjectivity involved in selecting ROIs and its implications for assessment by the image analysis tool. Each pathologist annotated ROIs differently on the digital slides with clear variability in the shape and size of regions considered suitable for assessment by the tool (figure 2). There was a consensus view that the main value of the tool was in either confirming or questioning assessments made without it, thereby acting as a form of continuous ‘quality control’ that might be of particular value in combating ‘interpretative drift’.

Figure 2

Case assessed for PD-L1 (SP263) using uPath to draw annotations for image analysis. (A) 13 annotations drawn by pathologist C and (B) 4 annotations drawn by pathologist B. PD-L1, programmed death ligand 1.

Discussion

Many factors weaken the predictive power of PD-L1 expression as a guide to the response of NSCLC to IMs. These include variation in tissue preparation and processing and the use of different assays, elements that can be controlled to ensure consistency.8 17 Other factors, such as the biological heterogeneity of the tumour and the interpretative discordance addressed in this report, are more difficult to address.7 17 A recent UK-based survey of laboratories routinely interpreting PD-L1 expression in NSCLC highlighted the challenge of interpathologist discordance and considered strategies to address it.18

In this study, we used a machine learning-derived image analysis tool to assist in the interpretation of PD-L1 expression in NSCLC. Even though the five participating pathologists are highly experienced in assessing PD-L1 expression, the image analysis algorithm still improved concordance. Critically, the most marked improvement in consistency was in the placement of cases into the three clinically crucial categories of ‘negative’, ‘weak positive’ and ‘strong positive’ (κ rising from 0.613 to 0.886).

An interesting observation was that each pathologist had an individual approach to applying the image analysis tool, with considerable variation in the number and shape of ROIs annotated for analysis (figure 2). Improving consistency in this step through further training may yield even greater value from the tool.

Image analysis may be especially valuable for specific specimens, for example, biopsies comprising multiple fragments of tissue that cannot all be viewed simultaneously at higher power (figure 3). Conversely, some specimens may derive minimal benefit from this process, such as those that are entirely positive or entirely negative (figure 4). It is therefore critical that, as with all tools, image analysis is not applied ‘blindly’ or without human oversight, but is integrated carefully with appropriate guidance.

Figure 3

Example of when application of image analysis adds benefit. (A) Biopsy of NSCLC with multiple fragments of tissue which cannot all be viewed at high power simultaneously. Manually scored as 50% TPS for PD-L1 (SP263). (B) Higher power view of a fragment showing tumour with heterogeneous expression of PD-L1. (C) Image analysis applied to annotated regions shows scoring of heterogeneous area. (D) Image analysis applied to multiple annotations returns a total of 19 965 tumour cells scored, with 10 615 PD-L1 positive and 9350 PD-L1 negative tumour cells to give a TPS of 53.2%, confirming this case is ≥50% TPS. NSCLC, non-small-cell lung cancer; PD-L1, programmed death ligand 1; TPS, tumour proportion score.

Figure 4

Example of samples likely to yield minimal added value with image analysis. (A1) A biopsy of NSCLC with homogeneous strong staining for PD-L1 (SP263), scored manually as 95% TPS. (A2) Application of image analysis returns a TPS of 92.5%. (B1) A core biopsy of NSCLC pan-negative for PD-L1, scored manually as 0% TPS. (B2) Application of image analysis returns a TPS of 0.1%. In both instances, image analysis can be applied and returns an accurate result, but the added benefit is minimal. NSCLC, non-small-cell lung cancer; PD-L1, programmed death ligand 1; TPS, tumour proportion score.

An important consideration is whether the increased concordance seen in this study (and in other related studies) reflects agreement on the ‘ground truth’ or whether we are simply ‘agreeing incorrectly’ more often. A weakness of this study is the absence of clinical follow-up to determine patient outcome. A challenge in developing a quantitative immunohistochemical assay is choosing cut-offs that accurately reflect biology, clearly define patient groups and are simple enough to achieve good interpathologist concordance.

We believe the first step in resolving these challenges is to demonstrate that a high level of agreement among pathologists can be achieved, so that specific cut-offs for predicting patient response to therapy can be refined, and that image analysis, by identifying and quantifying ‘positive’ staining consistently and objectively, is a critical tool for achieving this.

In conclusion, the results of this preliminary study, particularly the improved concordance in categorisation of PD-L1 expression into the three clinically relevant categories, suggest that image analysis should, ultimately, result in more robust data and better-informed clinical decision-making.

The need to expand studies like this to include a much larger network of pathologists and laboratories is clear.

Ethics statements

Patient consent for publication

Ethics approval

This study is a retrospective analysis of digital images of fully anonymised archival tissue samples of NSCLC. Ethical approval was granted by the NHS Health Research Authority (HRA) and Health and Care Research Wales (HCRW) (reference number: 20/EM/0091).

Acknowledgments

We acknowledge that this work is part of the Northern Pathology Imaging Co-operative (NPIC; Project no. 104687), which is supported by a £50m investment from the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, managed and delivered by UK Research and Innovation (UKRI).
