Cost-effectiveness analysis of molecular testing for cytologically indeterminate thyroid nodules

This cost-effectiveness model and analysis suggest that using Afirma in the testing of indeterminate solitary thyroid nodules is a cost-effective strategy for avoiding unnecessary thyroid surgery. At a willingness-to-pay threshold of $5000 per surgery avoided, the Afirma strategy is the more cost-effective strategy with a certainty of 62.6%. Although including Afirma within the treatment algorithm costs more than standard management ($8176.28 versus $6016.83), molecular testing was more effective in avoiding unnecessary surgeries (0.58 versus 0.07). Thus, the results of this study may help inform the decisions of clinicians and patients as they weigh the case-by-case usefulness of molecular testing in indeterminate thyroid nodules. For many patients and health care providers alike, the incremental cost of molecular testing (approximately $4000 per surgery avoided) is easily offset by the number of surgeries avoided and the resultant cost savings.
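The arithmetic behind these base-case figures can be sketched as follows. This is a minimal illustration using only the four point estimates quoted above; the function and variable names are ours, not part of the published model, and the calculation ignores the uncertainty captured by the probabilistic analysis.

```python
# Base-case point estimates quoted in the text (per patient).
COST_AFIRMA = 8176.28      # cost of the Afirma strategy
COST_STANDARD = 6016.83    # cost of standard management
EFFECT_AFIRMA = 0.58       # unnecessary surgeries avoided, Afirma strategy
EFFECT_STANDARD = 0.07     # unnecessary surgeries avoided, standard management


def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("equal effectiveness: the ICER is undefined")
    return (cost_new - cost_old) / delta_effect


delta_cost = COST_AFIRMA - COST_STANDARD
delta_effect = EFFECT_AFIRMA - EFFECT_STANDARD
ratio = icer(COST_AFIRMA, COST_STANDARD, EFFECT_AFIRMA, EFFECT_STANDARD)

print(f"Incremental cost per patient:   ${delta_cost:.2f}")
print(f"Incremental surgeries avoided:  {delta_effect:.2f}")
print(f"Incremental cost per surgery avoided: ${ratio:.2f}")
```

With these inputs the incremental cost per surgery avoided falls below the $5000 threshold, which is consistent with the conclusion stated above.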

An interesting finding of this study is that cost-effectiveness varies significantly with the cost of the molecular test. In the one-way sensitivity analysis, the cost of the molecular test was the largest contributor to cost-effectiveness. Interestingly, the test became cost-neutral at a cost of $2778.06 and was dominant at lower costs (i.e. at a lower cost of the test, this strategy would be both cheaper and more effective). Therefore, as the cost of the molecular test decreases, testing becomes the clear choice in managing indeterminate solitary thyroid nodules. Other molecular tests are in development and are likely to be made available at lower costs [15]. Of note, while it has been shown that patients with a high pre-test probability of malignancy, based on worrisome findings on high-fidelity ultrasound with standardized reporting, are less likely to benefit from molecular testing, this study assumes the base case of an individual with low/intermediate risk features on ultrasound [16]. Further, high-quality ultrasound with standardized reporting is not yet available in many Canadian regions, so risk stratification based on it does not reflect current practice in those regions.

While this study suggests molecular testing is a cost-effective strategy, the results from previous studies have been varied [13, 17, 18, 19]. These conflicting results likely relate to the differences in model construction, outcome measures and cost estimates in different health systems. Despite this, only a few studies conclude molecular testing is not cost-effective and is, instead, dominated by standard practice.

One difference between these studies and the current study is the choice of outcome measure. Some previously published studies assess effectiveness through Quality Adjusted Life Years (QALYs), as opposed to surgeries avoided, the measure used in this study [17, 20]. QALYs gained is an established outcome measure in cost-effectiveness analyses, allowing for inclusion of broader health states related to less quantifiable metrics such as the impact of time away from work, subsequent diagnostic testing, anxiety related to the disease state and the societal impact. While QALY is an important metric when assessing the cost-efficacy of two different treatment interventions (which have long-term consequences on patient outcome), it may not be the optimal metric for assessing the immediate impact of a diagnostic test. The utility of a diagnostic test such as Afirma is best evaluated by determining its impact on avoiding a more invasive diagnostic procedure (lobectomy). It has little impact on the long-term disease state for the individual patient. Regardless of choice of treatment strategy, patients in both arms will be subject to some degree of follow-up testing, and the anxiety related to this. Furthermore, given that overall outcomes for patients diagnosed with indeterminate thyroid nodules are excellent, and the risk of surgical complications that negatively affect long-term quality of life is low, the broad health states captured by QALY are not dissimilar, regardless of whether a molecular test was used [18]. Using QALY would therefore significantly dilute the impact of the cost-efficacy of a molecular test. Finally, the health utility values utilized in the analyses using QALYs are extrapolated from a small sample survey study and may not be accurate, and attaching costs to these utilities is even more problematic [21]. This is in large part because the factors that affect global health states (time away from work, anxiety) occur inconsistently between individual patients, and the degree to which they occur varies immensely. This is particularly important as models were highly sensitive to the valuation of health states and, hence, inaccurate health utility values would challenge the robustness of the model [17]. Using surgeries avoided as an outcome of effectiveness more directly addresses the strength of using a molecular test and provides a more practical and immediate sense of the benefit gained. Similar to our study, a study assessing surgeries avoided as an outcome of effectiveness found molecular testing to be more effective compared to standard practice [18]. Additionally, a study assessing the cost-effectiveness of molecular testing compared to diagnostic thyroid lobectomy using correct diagnosis as the outcome, rather than QALY, found molecular testing to be the superior, more cost-effective strategy [24].

When constructing the decision tree model, another variable that differs across studies is the extent of ongoing surveillance required for indeterminate thyroid nodules deemed benign or “negative” by molecular testing. Models concluding standard practice to be more cost-effective compared to molecular testing included ongoing follow-up and surveillance of “negative” nodules over the course of the model, typically 5 years [17, 20]. The appropriate surveillance for these “negative” nodules has yet to be elucidated, and annual follow-up may be excessive and therefore incur unnecessary costs, leading to an inflated cost estimation of the molecular testing strategy. Further, regardless of use of a molecular test, the long-term outcomes of the two treatment arms are not dissimilar in terms of favourable clinical outcomes for indeterminate thyroid nodules, as well as the use of ongoing tests for surveillance. To avoid masking the immediate impact of the molecular test on avoiding surgery, our model’s time horizon included a surveillance duration of one year. During this one-year time horizon, standardized follow-up testing and the associated probabilities of negative outcomes were incorporated into the observation portion of the model, to allow for real-world simulation. The creation of a robust costing model with a finite time horizon allows for direct comparison of several commercially available molecular tests, and will highlight any subtle differences in cost-efficacy.

It is important to note that nodules with cytology consistent with either Bethesda III or IV were pooled into one analysis. While in theory a separate analysis for each category could yield a separate cost-efficacy outcome, there were several practical reasons to amalgamate these two categories. Firstly, the ATA guidelines state that both Bethesda III and IV nodules are deemed indeterminate in terms of malignancy risk and could be managed via diagnostic lobectomy. The reported range of malignancy for both categories is wide and overlapping; in Table 1 of the ATA guideline [4], the range for Bethesda III is 6–48% and for Bethesda IV is 14–34%. Therefore, there would be little value in running the model separately for Bethesda III and Bethesda IV, given that the probabilities of malignancy of the two categories are similar. Additionally, both Bethesda III and IV nodules are suitable candidates for molecular testing. Finally, with regard to model construction, while the model could stratify for Bethesda risk category in the standard treatment arm, the rates of malignancy in the Afirma arm, stratified by Bethesda category, are not well known. This would force the model to pool the analysis in one treatment arm, while not pooling in the other. To avoid this, a decision was made to remain consistent with previously published studies that have similarly attributed a pooled malignancy risk to both Bethesda III and IV nodules [4, 5]. In this study, that risk was 19.5%. It should be noted that the model did vary the malignancy risk up to an upper limit of 50% (to accommodate centres with higher rates), but this did not have a significant impact on the outcome of the analysis.

The ATA Guideline recommendation 15 suggests that nodules with initial AUS/FLUS cytology could undergo either repeat FNA or molecular testing [4]. Our model therefore did not consider repeat FNA as an option, since the goal was to address the impact of a decision to use molecular testing instead. Further, given that repeat FNA is inconsistently used in practice, eliminating this option from the model allows for evaluation of a homogeneous patient population that was subjected to the same preliminary investigations.

There are several unique strengths of our study. Firstly, this is the first cost-effectiveness analysis for Afirma in the management of indeterminate nodules using Canadian-specific data and a single-payer model. This is particularly important as the cost of the test varies among countries. The results of this study are contextualized to the Canadian health care system, can provide unique insight into the value of molecular testing in Canada, and can inform potential decisions by provincial health systems to fund this test. Secondly, the costs employed in this model are not reported or aggregate costs. Instead, a micro-costing approach was used to populate our model to allow for a more robust and accurate cost estimation [14, 22]. Thirdly, we used a short time horizon for this model: one year. While some studies use a longer time horizon, these models may dilute the impact of a diagnostic test, as its contribution to cost or efficacy is primarily in the first cycle or year of the model. Further iterations become more reliant on factors that are difficult to control (and not related to the molecular test), such as final pathology, risk of recurrence, and relevant findings on ongoing surveillance investigations, which may trigger further costly interventions. Fourthly, while Afirma was the molecular test used to construct the costing model for this study, the model allows for the substitution of any commercially available molecular test, so that comparisons can be made. Lastly, the treatment algorithm utilized in the model is based on the most recent ATA Guidelines, published in 2015, thereby making the model more aligned with current clinical practice and the most up-to-date cost-effectiveness analysis of molecular testing.

There are some limitations to this study. Firstly, all models inherently must incorporate assumptions, and expert opinion must be used where there is a paucity of published literature. In this model, all nodules that grew in size following a negative Afirma test underwent diagnostic lobectomy. However, the rates of growth in this population are unknown, as is the rate of eventual malignancy. Therefore, the rates of growth used for Afirma-negative nodules were identical to those for all thyroid nodules that have not undergone testing. This may overestimate the rate of growth in the Afirma-negative nodules, and therefore overestimate the probability of diagnostic lobectomy, malignancy, complications, and the related costs [19]. However, this may strengthen the conclusion that Afirma is cost-effective, as over-estimates of the cost associated with the Afirma strategy would bias the results towards standard practice as the more effective approach. Secondly, the willingness-to-pay threshold varies from previously published literature. The most common value cited is a WTP of $100,000/QALY gained. However, as the primary outcome in this paper is “unnecessary surgery avoided”, the WTP threshold relates not to QALYs but to surgeries avoided. To provide the most conservative estimate possible, a WTP threshold similar to the cost of surgery ($5000) was used, on the assumption that an individual would be willing to pay at least that same amount for the test in order to avoid the surgery. Of note, costs associated with surgery are already accounted for in the costing model and the effectiveness metric measures only the willingness to pay over and above the financial cost of surgery. Practically, a payer would likely be willing to pay much more than $5000, given that this cost does not account for the monetary loss of time away from work, and decreased productivity in relation to contribution to society. Similar approaches have been used in other papers where the long-term quality of life does not differ substantially between the two groups, as is the case in this patient population. Additionally, other published WTPs for surgery avoided are much higher. For example, a study comparing early versus late tracheostomy cited a WTP of $80,000 per tracheostomy avoided [23]. Had a larger WTP threshold been used, the certainty that molecular testing was the more cost-effective strategy would have increased; however, the goal was to be conservative in the conclusions. Thirdly, we utilized a healthcare perspective (single government payer) and thus this model does not include societal costs such as time away from work, income loss, and delays for procedures. These are important yet difficult factors to incorporate. In addition, we utilized the micro-costing approach based on local data, making our data more relevant to the Canadian context, but limiting the generalizability of our findings to settings outside of Canada. Despite these limitations, the unique strengths and perspective of this study support the conclusion of this cost-effectiveness analysis.
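The effect of the willingness-to-pay threshold discussed in the limitations above can be illustrated with a net monetary benefit calculation. This is a rough, deterministic sketch using only the base-case point estimates quoted in this discussion; it does not reproduce the probabilistic analysis behind the reported certainty, and the function name and the $4000 comparison value are illustrative assumptions. The $80,000 value is borrowed from the tracheostomy example purely for contrast.

```python
# Base-case increments derived from the point estimates quoted in this discussion.
DELTA_COST = 8176.28 - 6016.83   # extra cost per patient with Afirma
DELTA_EFFECT = 0.58 - 0.07       # extra unnecessary surgeries avoided per patient


def net_monetary_benefit(wtp, delta_effect, delta_cost):
    """Value of the extra effect at the given WTP, minus the extra cost.

    A positive result means the new strategy is preferred at that threshold.
    """
    return wtp * delta_effect - delta_cost


# 4000 is a hypothetical lower threshold; 5000 is the threshold used in the
# study; 80000 is the tracheostomy-avoided figure cited for comparison.
nmb_by_wtp = {
    wtp: net_monetary_benefit(wtp, DELTA_EFFECT, DELTA_COST)
    for wtp in (4000, 5000, 80000)
}

for wtp, nmb in nmb_by_wtp.items():
    verdict = "Afirma preferred" if nmb > 0 else "standard management preferred"
    print(f"WTP ${wtp:>6}: net monetary benefit ${nmb:>10.2f} ({verdict})")
```

On these point estimates the net benefit is positive at $5000 and grows as the threshold rises, which matches the direction of the argument that a larger threshold would favour molecular testing more strongly.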
