Improving practice in PD-L1 testing of non-small cell lung cancer in the UK: current problems and potential solutions

Assessment of PD-L1 expression as detected by IHC is currently the only ‘test’ used universally to guide the prescription of IMs to treat patients with NSCLC, and its implementation has not been straightforward. Variation in specimen processing and in the experience of pathologists engaged in its interpretation augments the unavoidable challenges inherent in its biology and weakens its predictive power. The results of our survey clearly highlight this variability and raise the obvious question of how it might be reduced. In the context of UK practice, we believe our survey to be the most comprehensive yet performed in this area of diagnostics, in terms of both coverage of those active in the field and the data collected. A more detailed understanding of why such variability exists is a prerequisite to devising a strategy to reduce it, assuming that variability is detrimental to the desired endpoints.

Laboratory practice

Variability in laboratory practice (the handling, processing and preparation of specimens before assessment) is almost a tradition in pathology, a legacy of an approach that, until recently, owed more to cookery than to uniform, evidence-based, regulated and tightly controlled practice. Such variability was highlighted in a recent review addressing the use of cytology specimens for assessing PD-L1 expression in NSCLC11 and is important because its ultimate consequence is that specimens prepared by different laboratories might already vary in how PD-L1 expression is manifested before they are interpreted by a pathologist. Such variability has been brought into sharp focus by the increasing requirement for broader predictive ‘biomarker testing’ of NSCLC using IHC, and by studies showing how variation in such techniques can affect treatment choices. This is illustrated, for example, by the results reported by the UK National External Quality Assessment Service (NEQAS) on assessing expression of anaplastic lymphoma kinase (ALK) fusion protein.12 The ready availability of EQA schemes, across the developed world at least, provides an obvious mechanism for standardising laboratory practice and reducing variability.13 A comparison can be drawn between the current situation with PD-L1 testing and the serious variability in the technical quality of specimens of breast cancer assessed for human epidermal growth factor receptor 2 (HER2) expression that became apparent in the early 2000s, when UK NEQAS established an EQA scheme specifically for this predictive test.14 15 A similar scheme for PD-L1 expression in NSCLC is now well established by UK NEQAS and is generating valuable information about interlaboratory variability; in the UK, subscription to such schemes is mandatory for laboratories performing such analyses in order for them to obtain UK Accreditation Service accreditation (standard ISO 15189).6 It is important, however, that this information is acted on and the effect of any resulting improvements re-audited. It is sobering also to realise that, in many countries, subscription to such EQA schemes is not mandatory.

Interpretation

Identifying the reasons for, and then improving, interpretation of PD-L1 expression by pathologists is more challenging still. The most worrying result of our survey is the wide variability in scoring PD-L1 expression across the three broad groups: ‘negative’ (<1%), ‘low’ (1%–49%) and ‘high’ (≥50%). These scores, the ultimate endpoints of PD-L1 testing on which crucial clinical decisions are made, should show relatively limited variation between centres, since it is unlikely, in the context of UK patients with NSCLC, that significant variation in the range of PD-L1 expression will occur for reasons of biology or geography. It is well established from clinical trials and other reports that the distribution of PD-L1 TPSs is approximately even across the three categories of ‘negative’, ‘low’ and ‘high’, with, perhaps, a tendency for slightly fewer cases in the middle category, leading towards a bimodal distribution.5 7–11 Broadly speaking, therefore, there is evidence from our survey that some centres may be ‘under-reporting’ the PD-L1 TPS. With the deployment of stage-agnostic reflex testing, which appears to be the dominant approach in this survey of UK centres, there could be a slight bias towards a greater, though still relatively small, proportion of early-stage disease in the test population when compared with data from clinical trials of patients with more advanced disease. Although there is evidence for lower PD-L1 expression in early-stage disease,16 this still would not account for the ‘outliers’ in this survey reporting high proportions of specimens as ‘negative’. Most of the laboratories in our survey used trial-validated companion diagnostic assays, so it is unlikely that the observed variation is due to poor assay sensitivity.
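For clarity, the three reporting bands described above can be expressed as a simple rule. The sketch below is purely illustrative: the function name and its handling of the boundary values are our own, not part of any assay protocol or of the practice of the surveyed laboratories.

```python
def tps_category(tps_percent: float) -> str:
    """Map a PD-L1 tumour proportion score (TPS, as a percentage of
    viable tumour cells showing membranous staining) to the three
    clinical reporting categories discussed in the text."""
    if not 0.0 <= tps_percent <= 100.0:
        raise ValueError("TPS must be a percentage between 0 and 100")
    if tps_percent < 1.0:
        return "negative"   # TPS < 1%
    if tps_percent < 50.0:
        return "low"        # TPS 1%-49%
    return "high"           # TPS >= 50%
```

Note that, under this convention, the clinically decisive cut-points sit exactly at 1% and 50%, which is why small systematic differences in counting faintly stained cells can move a case between categories.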

Of course, there will always be some variability; interpreting PD-L1 expression is, by its very nature, subjective, but we do not believe that the variability we reveal here is acceptable. Guidelines for which pathologists should and should not interpret PD-L1 expression in NSCLC have emerged over recent years, but are difficult, if not impossible, to enforce. It has been suggested, for example, that interpretation should be restricted to pathologists who see at least 200 diagnostic lung cancer specimens a year, have undergone appropriate formal training (which results in some evidence of competence) and subscribe to an appropriate EQA scheme that is interpretative, not technical.7 Even among the laboratories covered by our survey, in which at least one pathologist, as a member of the APP, clearly has an interest in thoracic pathology, there are some worrying trends. For example, more than a third of laboratories handle fewer than five PD-L1 tests a week and, in more than 15%, the PD-L1 testing workload is spread between five and eight pathologists (online supplemental file 1).

All pathologists involved in PD-L1 scoring are aware of how difficult it can be and of its subjectivity. In the training programmes delivered for PD-L1 assessment by means of a TPS, emphasis is placed on how to (semi)quantify, if not actually count, the number of tumour cells in the sample and the proportion that are ‘positive’. All levels of staining intensity are relevant and are counted. In a proportion of cases, staining can be weak, requiring examination at high magnification. As pathologists become more familiar with an assay such as PD-L1 scoring, the time required for each assessment will inevitably fall. Anecdotally, we also hear reports of a more ‘gestalt’ approach to assessment that could conceivably lead to small numbers of positive cells, or cells with light staining, being missed. As many pathologists are currently practising under pressurised conditions, with poor staff/workload ratios and pressure to improve TAT, taking such shortcuts is understandable; more than a quarter of respondents in this survey reported average TATs of 5 days or more.

In comparison with clinical trials, from which cytology specimens were excluded, it is difficult to know precisely what impact the regular, routine testing of such specimens might have had on our observed outcomes. Most pathologists acknowledge that, in general, PD-L1 scoring of cytology specimens can be challenging and require more time, but there is no conclusive evidence that PD-L1 scores per se are lower in cytology as compared with histology (‘biopsy’) specimens.7 10 11 As discussed above, there is considerable variability in how cytology specimens are processed, and this may well contribute to variability in the results obtained from their assessment.17

In view of these challenges, there is growing interest, as in other difficult areas of diagnostic pathology, in the use of image analysis, algorithms and machine learning as an aid to interpretation. For example, the validation of such software as an aid to interpretation of PD-L1 expression in NSCLC is a component of the Northern Pathology Imaging Co-operative project,18 which is currently assessing its utility to a range of pathologists with varying levels of experience across six universities in the North of England.

Some variability is inevitable in systems as complex as laboratories, in which activity is run and undertaken by individuals who vary in their approach, practice and the skills they possess. Indeed, a very similar pattern of variability, although in a slightly different context, was revealed by the LungPath study.19 That study examined the approach of laboratories and pathologists to subclassifying NSCLCs into squamous carcinoma and adenocarcinoma, and its findings are largely recapitulated by those we describe here. This is not to say, however, that such variability cannot be reduced.

We suggest that a formal network be established of all laboratories engaged in PD-L1 testing of NSCLC, with a view to sharing details of practice and data resulting from testing. This would provide a basis for standardising and improving practice and would carry an important educational component.

Ultimately, however, encouraging and supporting adoption of best practice might require a more rigorous approach by those institutions, such as the Royal College of Pathologists and Institute of Medical Laboratory Scientists, that are responsible for training, examining and maintaining standards. Part of the approach to remedying the serious inconsistencies in assessing specimens of breast cancer for Her2 expression referred to above consisted of removing the service from ‘failing’ laboratories. This greatly improved quality and consistency and set an important precedent.

Adequacy of samples

The only objective metric we have for sample adequacy for PD-L1 testing is the presence of at least 100 viable tumour cells in the tissue section being assessed. Intuitively, this makes sense when one is delivering a percentage score on a sample that is already severely challenged by biological heterogeneity and sampling ‘error’, but it raises questions about how representative of the patient’s disease burden the rendered score actually is. There is evidence that TPSs reported on samples that have <100 tumour cells are much less predictive of response to IMs than scores derived from samples that are richly cellular.20 It is comforting that awareness and reporting of this criterion of sufficiency seems to be universal in our survey.
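The adequacy criterion and the score it gates can be sketched as a short check. The function below is a hypothetical illustration (the names, the 100-cell threshold as a hard gate and the use of `None` for an inadequate sample are our own conventions), not a reporting algorithm used by any of the surveyed laboratories.

```python
def score_sample(positive_tumour_cells: int, viable_tumour_cells: int):
    """Return a TPS (percentage, one decimal place) for a sample,
    or None if the sample fails the >=100 viable tumour cell
    adequacy criterion and should be reported as insufficient
    rather than given a score."""
    if viable_tumour_cells < 100:
        return None  # inadequate sample: no TPS should be issued
    tps = 100.0 * positive_tumour_cells / viable_tumour_cells
    return round(tps, 1)
```

The point of the gate is that a percentage derived from very few cells carries wide sampling error; withholding the score, rather than reporting an unreliable one, is what the adequacy criterion enforces.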

Our survey is by no means the first to highlight the problems and challenges with PD-L1 testing in NSCLC, which were clearly apparent, for example, in the global survey conducted by the Pathology Committee of the International Association for the Study of Lung Cancer.21 However, we wished to concentrate specifically on practice in the UK so that addressing and resolving any problems that might become apparent could be managed efficiently under the auspices of the APP, which is a UK-based association with strong national links.

It is gratifying, for example, that the College of American Pathologists is currently developing guidelines for PD-L1 testing of patients with lung cancer in an attempt to standardise and improve assessment, a strategy that also considers the possible utility of assessing tumour mutational burden as an adjunctive investigation.22

It is always politically difficult to impose what are often interpreted as restrictions on what individuals might or might not do, even to the point of their being seen as a threat to individuality. In the end, however, the only significant measure of quality of any test we perform, or assessment we make, is arriving at the right answer for the patient, the ultimate user of the service we provide.
