An Adjustable Positivity Threshold for Non-invasive Screening Tests for Colorectal Neoplasms Can Improve Screening Program Effectiveness and Feasibility

Choosing a Predictive Threshold that Matches the Desired Program Outcomes

The results of this literature review and modeling confirm that f-Hb is not simply a predictive biomarker, but is also program-outcome-selective, when considered from a quantitative perspective. The consequences of threshold adjustment based on a once-only test have major implications for screening program outcomes and are summarized in Table 3.

Table 3 Consequences of fecal hemoglobin (f-Hb) threshold adjustment for screening program outcomes

The dwell time of precursor lesions (i.e. the several years it takes for lesions to progress to an invasive CRC) adds a layer of complexity to determining the precursor lesion detection rate in the context of repeated screening with FIT. For example, an APL missed at one screening round might be detected at a subsequent round even though the threshold was not changed. On the other hand, some missed APL might progress to invasive CRC in the interval between two screening rounds, with the chance of this happening likely to depend on a host of individual-dependent variables. These rapidly progressing lesions might be the ones we would most like to detect. To add further complexity, some APL missed in previous rounds but detected at subsequent rounds might be slowly growing lesions with minimal risk of progressing to invasive CRC during the screening participant’s lifetime. This inability to identify the highest risk precursor lesions might be improved by incorporating non-hemoglobin predictive biomarkers into new screening tests.

The false-negative rate is an important consideration. A surrogate measure for this is the interval cancer rate. A rise in the interval cancer rate might be associated with an increase in CRC mortality, although in the original gFOBT randomized trials survival of those with interval CRC remained better than in those who did not undergo screening [22].

From a global perspective, there are two main types of screening program [1]: PBOS based on a WHO-style public health model [23] and structured opportunistic screening based on jurisdictional standards for practitioner practice [24]. Whether PBOS or opportunistic in nature, program priorities can differ widely, as can the ways in which countries provide screening, determine positivity thresholds, and choose the outcomes appropriate for the applicable health-care system [1].

Feedback from the global survey confirmed a prior view [5] that several scenarios require consideration by a screening jurisdiction conducting PBOS programs [19]. The following scenarios are likely to require different thresholds:

1. Be highly discriminatory between those with CRC or APL and those who are normal or with other pathologies (e.g. non-advanced precursor lesions).

2. Detect most CRCs.

3. Detect as many advanced precursor lesions as feasible to reduce incidence.

4. Minimize the chance of returning a false-positive result in a person without any CRC or APL.

5. Provide a manageable population colonoscopy workload.

6. Achieve equity across age-bands and/or for gender.
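To make the trade-offs between these scenarios concrete, the following is a minimal sketch of how the outcomes of a quantitative test could be tabulated at candidate thresholds and how a program might pick the lowest threshold that fits its colonoscopy capacity. The cohort, the candidate thresholds and the names `outcome_at` and `lowest_feasible_threshold` are hypothetical and purely illustrative; they are not the data or the modeling method used in this study.

```python
from dataclasses import dataclass

# Purely synthetic cohort: (f-Hb in µg Hb/g feces, colonoscopy diagnosis).
# These values are invented for illustration and are not study data.
SYNTHETIC_RECORDS = [
    (5, "none"), (8, "none"), (12, "none"), (3, "none"),
    (25, "none"), (0, "none"), (2, "none"), (40, "none"),
    (15, "APL"), (60, "APL"), (9, "APL"), (30, "APL"),
    (150, "CRC"), (45, "CRC"), (18, "CRC"),
]


@dataclass
class ThresholdOutcome:
    threshold: float
    sensitivity_crc: float
    sensitivity_apl: float
    specificity: float                   # among people with neither CRC nor APL
    positivity_rate: float               # proxy for colonoscopy workload
    number_needed_to_colonoscope: float  # positives per CRC or APL detected


def outcome_at(threshold, records):
    """Summarize outcomes when f-Hb >= threshold counts as a positive test.

    Assumes every diagnosis group ("CRC", "APL", "none") is non-empty.
    """
    def positives(group):
        return sum(1 for f_hb, dx in records if dx == group and f_hb >= threshold)

    def total(group):
        return sum(1 for _, dx in records if dx == group)

    true_pos = positives("CRC") + positives("APL")
    all_pos = true_pos + positives("none")
    return ThresholdOutcome(
        threshold=threshold,
        sensitivity_crc=positives("CRC") / total("CRC"),
        sensitivity_apl=positives("APL") / total("APL"),
        specificity=1 - positives("none") / total("none"),
        positivity_rate=all_pos / len(records),
        number_needed_to_colonoscope=all_pos / true_pos if true_pos else float("inf"),
    )


def lowest_feasible_threshold(records, candidates, max_positivity_rate):
    """Lowest candidate threshold whose positivity rate fits the capacity limit."""
    for threshold in sorted(candidates):
        if outcome_at(threshold, records).positivity_rate <= max_positivity_rate:
            return threshold
    return None  # no candidate keeps the workload within capacity


if __name__ == "__main__":
    for t in (10, 20, 50):
        print(outcome_at(t, SYNTHETIC_RECORDS))
    print(lowest_feasible_threshold(SYNTHETIC_RECORDS, [10, 20, 50], 0.45))
```

In this sketch, raising the threshold lowers sensitivity for both CRC and APL while raising specificity and reducing the positivity rate, which is the trade-off that a program must resolve according to its chosen scenario.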

Towards Predictive Options with New Tests

Is it feasible for new tests, especially those based on multiple-biomarker panels, to be configured to provide a range of diagnostic accuracies and workloads that enable screening program providers to choose a configuration that fits one of the above scenarios and suits the demands of a program’s health-care context?

As technological capabilities have rapidly increased, other predictive biomarkers detected in a range of biological samples, including feces and blood, have emerged and continue to emerge [25]. Multiple biomarkers have typically been included in test panels in an effort to improve disease discrimination. Machine-learning algorithms, often lacking transparency, are typically applied to the biomarker panels to identify who is more likely to benefit from colonoscopic follow-up by virtue of the test’s predictive value.

An established multi-marker non-invasive screening test is a multi-target fecal test that incorporates hemoglobin and DNA biomarkers [26]. Blood tests based on a range of biomarkers are also being developed and evaluated for a range of cancers [27]. One cell-free DNA blood-based test has 87% sensitivity for stages I-III CRC but just 13% for APL at a corresponding specificity of just under 90% [28]. Tests such as these use an algorithm to generate a qualitative result (to identify who proceeds to colonoscopy) with a corresponding clinical accuracy. In the USA for example, the United States Preventive Services Task Force (USPSTF) has suggested that a non-invasive screening test should be at least 70% sensitive for CRC with a specificity for cancer plus APL of at least 90% [24]. Similar guidance is provided by the United States Centers for Medicare & Medicaid Services (74% sensitivity and 90% specificity) [29]. Such a specification would not suit a proportion of other countries that undertake population-based organized screening where a higher specificity and a more manageable colonoscopy workload are important [19]. So far, no new multi-marker test provides the capacity to choose a different positivity threshold with an alternative predictive outcome.

Achieving threshold adjustability with such tests would require a multilayered approach. There are at least two ways to achieve this. One is to provide capacity to users to apply different weights to algorithm components, although this would require algorithm transparency and adaptation to the population of intended use in which demographics including ethnicity and other factors might require careful reconsideration. The other is to generate a series of predictive algorithms with differing emphases on screening program outcomes. Transparency of algorithms would be desirable. Either approach would facilitate choice of a threshold that suited the chosen scenarios, even if not enabling threshold adjustment per se, and would improve the actionability of a new test for widespread use in organized screening programs.
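The two approaches can be illustrated with a deliberately simplified sketch. It assumes a transparent linear score over normalized marker values, which is an assumption made only for illustration; the marker names, weights, cutoffs and preset names are all hypothetical, and existing multi-marker tests use their own algorithms that are not represented here.

```python
# Hypothetical presets: each pairs a set of marker weights with a positivity
# cutoff. All numbers are invented for illustration only.
PRESETS = {
    "favor_sensitivity": {
        "weights": {"hemoglobin": 0.5, "dna_marker": 0.5},
        "cutoff": 0.3,
    },
    "favor_specificity": {
        "weights": {"hemoglobin": 0.7, "dna_marker": 0.3},
        "cutoff": 0.6,
    },
}


def panel_score(markers, weights):
    """Approach 1: a transparent weighted sum whose weights a user could adjust.

    `markers` maps marker name -> value normalized to the range 0..1 (assumed).
    """
    return sum(weights[name] * value for name, value in markers.items())


def is_positive(markers, preset_name):
    """Approach 2: choose one of several pre-built algorithm configurations."""
    preset = PRESETS[preset_name]
    return panel_score(markers, preset["weights"]) >= preset["cutoff"]


if __name__ == "__main__":
    sample = {"hemoglobin": 0.4, "dna_marker": 0.5}  # hypothetical participant
    for name in PRESETS:
        print(name, is_positive(sample, name))
```

Here the same hypothetical participant is referred for colonoscopy under one configuration but not the other, which is the kind of choice a program would make according to its preferred scenario.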

The relationships between laboratory certification of a diagnostic test, documentation of marketing claims and acceptance by health-care funders are complex and vary widely among those countries that conduct screening programs. Generally, a test’s analytical characteristics would ideally conform to recommended protocols described by relevant standards such as those of the international Clinical and Laboratory Standards Institute (CLSI) or the Quality System Requirements (QSR) of the USA [5]. Requirements for competence and quality that apply to medical laboratories will be set by bodies such as QSR or the international standards ISO 15189, 13485-16 and 14971.

Despite these complexities, manufacturers are encouraged to consider these options for test configuration. It should be noted, however, that safety and efficacy data requirements might be more rigorous if a test is presented as a quantitative (rather than qualitative) test, or where the end-user has freedom to choose the threshold that best delivers their program outcomes. The studies required to establish the necessary evidence are likely to be larger than those currently needed when comparing non-invasive tests, each with just a single predictive threshold. The recent expert consensus described how the potential impact of a new test on program outcomes including mortality could be readily inferred by a paired comparison to an established FIT without the need to colonoscope everyone [5]. Outcomes at different thresholds could be identified by using a FIT with a threshold that is low (and hence a positivity rate that is high) yet still practical in the study context, and then simulating the relative performance of the two tests as the positivity threshold is raised [30]. Doing so would reduce the concern about different missed-lesion rates not being verified by colonoscopy. Alternatively, false-negative rates could be documented either by including a random subgroup subjected to colonoscopy or by several years of follow-up with linkage to population CRC registries.

Implications for New Tests

Measurement of f-Hb using quantitative FIT has a capability that is unique among current non-invasive screening tests. It enables screening program providers to take an outcome-selective approach, choosing the test positivity threshold that delivers the preferred test accuracy while ensuring that the resultant colonoscopy workload is feasible for the applicable health-care system. The accompanying paper documenting the wide variation in the thresholds used in population-based programs [19] shows that thresholds can be much higher than the optimal discriminatory point found in this population. A qualitative test with a fixed threshold (the format emerging with new non-transparent algorithm-based tests) would not deliver this flexibility.

In modeling a much larger dataset than has previously been reported in the literature, we confirm that f-Hb is predictive for different stages of colorectal neoplasia and that the chosen predictive threshold is a key determinant of program impact and feasibility. Qualitative tests are not favored as their use is limited to a single fixed program outcome scenario. The recommendation made here to provide an adjustable threshold is based not just on modeling but also on practice, given that programs in New Zealand and the Netherlands have already changed their thresholds [31, 32].

It also needs to be noted that a single test metric might not be appropriate for subpopulations characterized by differing age distributions, sex ratio, ethnicity or other underlying risk factors that influence prevalence of CRC and are relevant to the predictive biomarkers. It is apparent from population studies that age-band and sex are also significant determinants of f-Hb [33,34,35]. An important paper from Finland [36] has shown that if the same threshold is applied to men and women, then women receive proportionally fewer colonoscopies and a lower proportion of prevalent CRC is diagnosed in women. Consequently, Sweden has now introduced different FIT positivity thresholds for men and women, which are higher in men [37]. Scotland is also considering adopting this approach [34]. While there is increasing interest in achieving equity for different subpopulations with FIT or new tests, doing so is going to depend on the health-care context. Decisions will be needed to determine whether equity is to be based upon similar false-negative rates, similar colonoscopy rates or more complex metrics such as cost-effectiveness.
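As one illustration of how subgroup-specific thresholds could be derived, the sketch below equalizes the positivity rate (a proxy for the colonoscopy rate) between two groups. Equalizing positivity is only one of the equity definitions mentioned above; the f-Hb values, the target rate and the function name `threshold_for_positivity` are hypothetical and do not reflect the method used by any national program.

```python
def threshold_for_positivity(f_hb_values, target_rate):
    """Smallest observed f-Hb value at which the positivity rate
    (share of results >= threshold) does not exceed target_rate."""
    n = len(f_hb_values)
    for threshold in sorted(set(f_hb_values)):
        rate = sum(v >= threshold for v in f_hb_values) / n
        if rate <= target_rate:
            return threshold
    return None  # even the highest observed value exceeds the target rate


# Synthetic f-Hb results (µg Hb/g feces), invented for illustration only.
SYNTHETIC_F_HB = {
    "men": [0, 2, 5, 8, 12, 20, 35, 50, 80, 120],
    "women": [0, 1, 2, 4, 6, 9, 15, 22, 40, 70],
}

if __name__ == "__main__":
    thresholds = {
        group: threshold_for_positivity(values, target_rate=0.3)
        for group, values in SYNTHETIC_F_HB.items()
    }
    print(thresholds)
```

Because the synthetic f-Hb distribution is shifted upward in men, the same target positivity rate yields a higher threshold in men than in women. A program basing equity on false-negative rates instead would need diagnostic outcome data for each subgroup.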

The strength of the modeling of FIT data presented here is that it overcomes deficiencies in earlier studies where false-negative rates could not be determined, case numbers were small, or workload feasibility was not considered.

There are several possible weaknesses. FIT accuracy was not adjusted according to age and/or sex, primarily because even 74 CRC cases were insufficient to provide precise estimates. This might be important for novel markers if they are dependent on age or sex. The population modeled is not a typical “average-risk” screening population, as pointed out in the prior publication [20]. However, symptomatic cases were excluded, the overall test positivity rate was comparable to population screening studies [20], and the CRC incidence was consistent with prior screened populations [20]. Nevertheless, the ratio of APL to CRC was approximately 7.5:1. This is somewhat higher than that expected in an average-risk population and perhaps reflects some bias due to the nature of the population where APL are likely to be more frequent. This means that the observed FIT accuracies according to the f-Hb might not match those found at the same f-Hb in a different population, especially a population with a differing proportion of bleeding lesions or major demographic differences. We did not attempt to model the relative impact of different thresholds on CRC mortality and incidence as costs vary so widely around the world. However, impact on mortality and incidence can be implied by sensitivity for CRC and/or APL, while costs and cost-effectiveness are implied from colonoscopy workloads and number needed to colonoscope to detect a relevant neoplastic lesion.

The global experience with FIT and the modeling provided here together highlight the need for programs to be able to ensure that test positivity specifications are compatible with the desired program outcome scenario. Ideally, new tests based on new predictive biomarkers will provide capacity for program providers to select the positivity threshold that delivers the desired screening outcomes, including feasibility as well as detection.
