Validation of Deauville score for response evaluation in Hodgkin's lymphoma
Junita Rachel John1, Regi Oommen1, Julie Hephzibah1, David Mathew1, Anu Korula2, Nylla Shanthly1, Anu Eapen3
1 Department of Nuclear Medicine, Christian Medical College, Vellore, Tamil Nadu, India
2 Department of Hematology, Christian Medical College, Vellore, Tamil Nadu, India
3 Department of Radiodiagnosis, Christian Medical College, Vellore, Tamil Nadu, India
Correspondence Address:
Dr. Regi Oommen
Department of Nuclear Medicine, Christian Medical College, Vellore - 632 004, Tamil Nadu
India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/ijnm.ijnm_102_22
Context: Positron emission tomography (PET) using F-18 fluorodeoxyglucose (FDG) for treatment monitoring in patients with lymphoma is one of the most well-developed clinical applications. The Deauville five-point score (DS) is recommended for response assessment in international guidelines. DS allows the threshold for adequate or inadequate response to be adapted according to the clinical context or research question. Aims: We aimed to validate DS in Hodgkin's lymphoma (HL) by retrospectively assigning this score to F-18 FDG PET-computed tomography (CT) studies done before 2016 and analyzing its concordance with the line of management. The secondary aim was to assess the reproducibility of DS in the interpretation of PET-CT scans. Subjects and Methods: A total of 100 eligible consecutive patients underwent F-18 FDG PET-CT scans between January 2014 and December 2015. Their interim, end of treatment, and follow-up PET scans were retrospectively visually analyzed and assigned DS by three nuclear medicine physicians. Concordance was defined as agreement between the DS assigned and the line of treatment. Interobserver variability was calculated using weighted Kappa and presented with 95% confidence interval. Results: Among 212 scans assigned DS, 165 scans showed agreement between the DS and line of treatment. Of the concordant scans scored DS 1–3, 95.2% were from patients who were kept on follow-up or continued on the same treatment plan and did well. Among the scans that showed discordance, 24 scans scored DS 4/5 were from patients who were continued on the same treatment regimen, and the next assessment showed disease progression. Conclusions: Our study confirmed that DS is a useful tool to aid in reporting F-18 FDG PET-CT in the management of HL with good positive and negative predictive values. This study also demonstrated good interobserver agreement.
Keywords: Deauville score, F-18 fluorodeoxyglucose positron emission tomography-computed tomography, Hodgkin's lymphoma
F-18 fluorodeoxyglucose (FDG) positron emission tomography (PET)-Computed tomography (CT) for treatment monitoring in patients with Hodgkin's lymphoma (HL) is one of the most well-known clinical applications. It is used during treatment to assess chemosensitivity with response-adapted therapy, assess remission from disease and to predict prognosis in the pretransplant setting. The level of FDG uptake can be assessed semiquantitatively using a standardized uptake value (SUV).[1]
F-18 FDG PET-CT plays a crucial role in the staging of HL with a sensitivity of 97% and specificity of 100%.[2] F-18 FDG PET has also been proven to be superior to CT for staging and has enabled upstaging in 32% of HL.[3],[4] PET-CT also has prognostic implications, as a positive study at the end of treatment (EOT) is usually associated with a higher relapse rate.[3] A study by Jerusalem et al. showed that patients with a residual mass after therapy and positive PET-CT results had a 100% relapse rate, in contrast to patients who showed no activity in the residual mass, where the relapse rate was only 26%.[5]
In 1999, the National Cancer Institute Lymphoma International Working Group published the first imaging and clinical response guidelines for non-Hodgkin's lymphoma, known as the Cheson 1999 criteria, which were based only on CT imaging.[6] These were revised in 2007; the Cheson 2007 criteria, or the International Harmonization Project criteria, incorporated bone marrow (BM) immunohistochemistry, flow cytometry, and the use of F-18 FDG PET imaging as an effective modality for visualizing the presence and distribution of lymphoma at the EOT.[6] In these revised criteria, the lesions were divided into “PET-positive” and “PET-negative” following treatment. This, however, gave rise to significant potential for ambiguity in the interpretation of images based on SUV, which had inherent errors.
To standardize the criteria, the First International Workshop on PET scan in lymphoma was conducted in Deauville, France in 2009, and it was decided that the Deauville score (DS) should be applied for reporting scans by visual analysis.[7] Previous studies have shown good interobserver agreement and confirmed that DS could predict outcomes using less stringent criteria.[8],[9] In 2014, the Lugano guidelines came out with a revision of the 5-point DS for interim and EOT analysis.[10] DS gives the threshold for adequate or inadequate response to be adapted according to the clinical context or research question.
A reduction in metabolic activity is indicative of response, and interim FDG PET/CT (iPET) negativity is associated with improved outcomes in HL. The benefit of iPET evaluation lies in its potential to guide “response-adapted therapy,” whereby treatment can be de-escalated in the background of a satisfactory early response or escalated if the early response is inadequate.[11],[12],[13],[14],[15] Both approaches have shown promise in HL.[16],[17] A negative FDG PET/CT scan at the EOT (EOT-PET) excludes residual viable tumor with high certainty in both HL and diffuse large B-cell lymphoma, with higher negative predictive value in HL.[18],[19]
DS has been in use in our institution since 2016. The purpose of this study was to validate DS by retrospectively assigning the score to F-18 FDG PET-CT studies done before 2016 and analyzing its concordance with the line of management and disease outcome. The secondary aim was to assess the reproducibility of DS in the interpretation of PET-CT scans.
Subjects and Methods
Study design
A pilot study was done in 11 patients with HL who underwent F-18 FDG PET-CT in December 2014. Each patient's F-18 FDG PET-CT scans were reviewed and DS was assigned to interim, EOT, and follow-up scans. Data analysis revealed a concordance of 54% between DS and treatment response and the Kappa coefficient was found to be 0.3165. A sample of 100 participants (based on the pilot study done) was required to obtain a 95% confidence interval (CI) of ± 10% around a concordance rate of 54% [Figure 1].
Expected prevalence = 54%
Precision = 10%
Z-value for 95% CI = 1.96
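The three parameters above can be turned into a sample size with a short calculation. The sketch below assumes the standard single-proportion formula n = Z² p(1 − p)/d², which is our reading of the listed parameters rather than something the text states explicitly; the function name is hypothetical. With these inputs it gives roughly 95.4, i.e. 96 after rounding up, which is consistent with the 100 participants quoted above.

```python
import math


def sample_size_for_proportion(p, precision, z=1.96):
    """Minimum n to estimate a proportion p within +/- precision.

    Assumes the usual normal-approximation formula n = z^2 * p * (1 - p) / d^2.
    This formula is an assumption; the source only lists the three parameters.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if precision <= 0:
        raise ValueError("precision must be positive")
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


# Parameters taken from the pilot study described in the text.
n_min = sample_size_for_proportion(p=0.54, precision=0.10, z=1.96)
print(n_min)
```

The study's figure of 100 presumably reflects rounding this minimum up to a convenient number.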
Continuous data were summarized using mean (standard deviation) or median, depending on normality, and categorical data were presented as numbers along with percentages. Interobserver variability was calculated using weighted Kappa and presented with 95% CI. All analyses were done using STATA/IC 16.0 software (StataCorp. 2019. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC).
Patients provided consent for the scans; a waiver of informed consent was approved for this retrospective series, and the study was approved by the Institutional Review Board.
Imaging
PET imaging was carried out in accordance with our standard clinical PET protocol. Patients were injected intravenously with F-18 FDG at 3.7 MBq/kg body weight, to a maximum dose of 370 MBq, after a 4–6 h fasting period. All patients were imaged with an integrated PET-CT system (Siemens Biograph TruePoint 6). After a 45 min to 1 h uptake period at rest in a dimly lit, quiet room, images were acquired at 2 min per bed position. The PET scan was acquired together with a contrast-enhanced CT (CECT) scan. The CT scans were used for attenuation correction and anatomical localization.
The iPET, EOT-PET, and follow-up PET scans were retrospectively analyzed visually, quantified by SUV, and assigned DS by three trained nuclear medicine physicians. The visual evaluation was performed in direct comparison to automatically coregistered slices of PET-CT images. All involved sites were checked for increased FDG uptake as an indicator of residual tumor. The region with the highest residual FDG uptake was identified. Visual assessment was performed blinded to the results of quantitative measurements (i.e. SUV). The metabolic response was scored according to the DS [Table 1].
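As a small worked illustration of the weight-based dosing rule in the protocol above, the sketch below computes the injected activity for a given body weight. The function name and the example weights are hypothetical; only the 3.7 MBq/kg rate and the 370 MBq ceiling come from the text.

```python
def fdg_activity_mbq(weight_kg, mbq_per_kg=3.7, cap_mbq=370.0):
    """Return the F-18 FDG activity in MBq for a patient of the given weight.

    Uses the protocol values from the text: 3.7 MBq/kg, capped at 370 MBq.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return min(weight_kg * mbq_per_kg, cap_mbq)


# Hypothetical example weights, for illustration only.
for weight in (50, 70, 120):
    print(weight, "kg:", round(fdg_activity_mbq(weight), 1), "MBq")
```

With these values the cap takes effect for patients heavier than 100 kg.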
Concordance was defined as an agreement between the DS assigned and the line of treatment.
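The concordance definition can be expressed as a simple rule. The sketch below is one possible reading, assuming (as the Results grouping suggests) that DS 1–3 is treated as PET-negative and DS 4/5 as PET-positive, and that a positive scan is concordant when management was revised while a negative scan is concordant when it was not. The function name and the boolean `management_changed` flag are hypothetical simplifications, and sub-scores are collapsed to integers 1–5.

```python
def is_concordant(deauville_score, management_changed):
    """Return True if the DS and the line of treatment agree.

    Assumed rule (not stated verbatim in the text):
      - DS 1-3 (PET-negative) agrees with follow-up / unchanged treatment.
      - DS 4-5 (PET-positive) agrees with a revised treatment regimen.
    """
    if deauville_score not in (1, 2, 3, 4, 5):
        raise ValueError("Deauville score must be an integer from 1 to 5")
    pet_positive = deauville_score >= 4
    return pet_positive == management_changed


# Illustrative cases, not patient data.
print(is_concordant(2, management_changed=False))  # negative scan, plan unchanged
print(is_concordant(4, management_changed=False))  # positive scan, plan unchanged
```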
Results
There were a total of 121 patients with HL who underwent F-18 FDG PET-CT scans between January 2014 and December 2015. Among these, 21 patients did not meet the inclusion criteria, i.e. they either did not have biopsy-proven HL or had incomplete data/images or had a follow-up of <6 months. A total of 251 PET-CT scans were done during the study for the 100 patients included in the study. The baseline characteristics of the patients are described in [Table 2].
There was a male preponderance with 67%. Fifty-four patients (54%) were between 21 and 20 years of age.
DS was retrospectively assigned to the iPET, EOT-PET, and follow-up PET scans, i.e. a total of 212 scans.
Among 212 scans, 138 were scored DS 1–3 and 74 scans were scored DS 4/5. These scans were further assessed and we found that overall 165/212 scans (77.83%) showed good agreement between the retrospectively assigned DS and the line of treatment, suggesting concordance.
Among the scans that showed concordance, 125 scans were scored DS 1–3. In 119/125 (95.2%) scans, patients were either kept on follow-up or continued on the same treatment plan, and all of them showed complete response [Figure 2]. Six scans (4.8%) with a DS of 1–3 were from patients who, although continued on the same treatment regimen, relapsed within 1 year. Forty scans had DS 4/5 and were managed with a revised treatment regimen. In 18/40 (45%) scans, management was changed based on clinical evaluation and all patients showed good response to treatment. However, in 22/40 scans, the patients showed disease progression despite the change in treatment regimen. Hence, DS aided in accurate treatment planning in 159/165 scans.
Figure 2: A 32-year-old female with HL after 3 cycles of ABVD; iPET showed significant disease regression. The same treatment regimen was continued and the patient did well, as seen in the EOT scan. EOT: End of treatment, HL: Hodgkin's lymphoma, ABVD: Adriamycin, Bleomycin, Vinblastine, and Dacarbazine
Among the scans (47/212) that showed discordance, 13 (27.65%) scans were scored DS 1–3 (i.e. they were negative) but management was changed based on adverse clinical findings and persistent metabolic activity on imaging. However, if DS had been followed, in 9/13 (69.23%) scans patients could have achieved the same good outcome with de-escalation/continuation of the same therapy. Four out of 13 patients showed disease progression, and so the retrospectively assigned DS of 1–3 was misrepresentative.
Of the remaining 34 (72.34%) scans with DS 4/5, in 24 (70.58%) scans, the patients were continued on the same regimen but later had disease progression [Figure 3]. In 10 (29.42%) scans, the patients still had disease regression although the treatment plan was not changed.
Figure 3: A 45-year-old male with HL (MC) after 3 cycles of chemotherapy; iPET was given DS 4. However, as the scan showed disease regression, the same treatment regimen was continued, and EOT PET-CT showed disease progression with DS 5B (new lesions in pelvic bones). In this case, DS at iPET would have aided in earlier escalation of treatment. PET: Positron emission tomography, EOT: End of treatment, CT: Computed tomography, HL: Hodgkin's lymphoma, MC: Mixed cellularity, DS: Deauville score
The score that had maximum discordance was DS 4. Fifteen out of 23 times (65.21%), a DS of 4 accurately helped in predicting disease progression at the next assessment when management was not changed.
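The scan counts reported in the Results can be cross-checked with a short tally. The sketch below only re-adds the numbers given in the text (125 and 40 concordant, 13 and 34 discordant); the dictionary layout and variable names are our own.

```python
# Scan counts as reported in the Results, keyed by (DS group, agreement status).
counts = {
    ("DS 1-3", "concordant"): 125,
    ("DS 1-3", "discordant"): 13,
    ("DS 4/5", "concordant"): 40,
    ("DS 4/5", "discordant"): 34,
}

total = sum(counts.values())
concordant = sum(n for (_, status), n in counts.items() if status == "concordant")
discordant = total - concordant

# Totals per DS group, to compare with the 138 / 74 split quoted in the text.
by_group = {}
for (group, _), n in counts.items():
    by_group[group] = by_group.get(group, 0) + n

concordance_pct = round(100 * concordant / total, 2)
print(total, concordant, discordant, by_group, concordance_pct)
```

The tallies reproduce the reported totals: 212 scans overall, 138 scored DS 1–3 and 74 scored DS 4/5, with 165 concordant scans (77.83%).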
All 100 patients received chemotherapy as part of their treatment, whereas 24 patients received radiotherapy and 10 patients underwent autologous stem cell transplant also. The mean follow-up period was found to be 4.4 years (range 1–13 years). Four out of 100 patients died due to disease-/treatment-related complications.
The interobserver agreement of DS was calculated using weighted Kappa and presented with 95% CI. The overall weighted Kappa between reviewers was 0.853.
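For readers unfamiliar with the statistic, the sketch below shows how a weighted Kappa between two readers can be computed for five-point scores. It assumes linear disagreement weights and a pairwise (two-reader) comparison; the text does not state which weighting scheme was used or how the three readers were combined, and the example scores are invented for illustration, not study data. The study itself used Stata, not this code.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's weighted kappa with linear weights for two raters.

    kappa_w = 1 - (observed weighted disagreement / expected weighted disagreement),
    where the disagreement weight for categories i and j is |i - j| / (k - 1).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rater lists must be non-empty and of equal length")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_dis += w * observed[(i, j)] / n
            exp_dis += w * (marg_a[i] / n) * (marg_b[j] / n)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs_dis / exp_dis


# Invented Deauville scores from two hypothetical readers.
reader_1 = [1, 2, 3, 4, 5, 4, 2, 3]
reader_2 = [1, 2, 3, 5, 5, 4, 2, 2]
print(round(weighted_kappa(reader_1, reader_2), 3))
```

Linear weights penalize a DS 4 versus DS 5 disagreement less than a DS 1 versus DS 5 disagreement, which suits an ordinal scale such as DS.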
Discussion
FDG PET-CT is increasingly used for staging and response assessment in lymphoma, both for early assessment during treatment, commonly referred to as iPET-CT, and for remission assessment at the EOT.[20],[21] The five-point scale was adopted as the preferred reporting method at the First International Workshop on PET in Lymphoma in Deauville, France (i.e. Deauville criteria), and in several international trials.[19],[22]
In 2014, the Lugano criteria were put forward which reiterated the beneficial use of FDG PET-CT in the assessment of FDG avid lymphomas. It also supported the use of DS in addition to measuring the maximum SUV (SUVmax) of the most metabolically active lesion. However, there are many inherent errors associated with the use of SUVmax for comparison between scans as SUV estimation depends on the dose injected, the time of scan after FDG injection, and patient's blood glucose levels and imaging parameters (image artifacts and patient respiratory motion).[23]
In our study, among the discordant scans scored DS 4/5, 24 (70.58%) were from patients who were continued on the same treatment regimen. At the next assessment, they were found to have disease progression, thus showing that iPET DS could predict the response. Furthermore, among the scans that showed agreement and were scored DS 1–3, 119 (95.2%) were from patients who were kept on follow-up or continued on the same treatment plan, and these patients did well. This implies that DS has very good positive and negative predictive value. This is in agreement with Sedig et al.,[22] who also concluded that DS improved the clinical utility of end-of-chemotherapy PET, as evidenced by an increase in positive predictive value from 44.4% to 72.7% compared with characterization on the basis of the report alone. The negative predictive value remained >95% by both methods.
The discordant cases were typical of cases that are challenging in daily practice. There was difficulty distinguishing the healing process from residual disease in pathologic fractures, separating physiologic from pathologic uptake with prominent brown fat uptake and separating misregistered physiologic uptake in the gut from liver uptake. These can lead to more equivocal PET reports.
In 9 (69.23%) scans, had DS been known, escalation of treatment could have been avoided and patients might have done well. The predictive value of DS in these cases could also reduce the risk of treatment-related toxicities. One of the drawbacks of our study was that only 39% of patients had a baseline FDG PET-CT (others had various basic imaging modalities), and so DS in iPET might have been compromised. A baseline pretreatment FDG PET-CT is always recommended to allow accurate comparisons later during treatment, as was shown in a comprehensive comparison of FDG-PET/CT and CT alone in 1214 HL patients (RATHL trial between 2008 and 2012).[24]
This study revealed good interobserver agreement on reporting with the DS. The weighted Kappa between reviewers was 0.853. Similarly, very good agreement was reported between expert readers from four different European centers using the five-point scale in a study done by Barrington et al.[25] and separately in another study by Biggi et al.[13] Better interobserver agreement has been reported in HL than in other lymphomas. Interreader concordance of scoring may depend on the population included and possibly on the nature and cycle number of previously administered chemotherapy.[26]
The highest discrepancies between observers were seen for DS 4. Among the scans that scored DS 4, in two scans one observer gave a DS of 3 while in the remaining the debate was between DS 4 and DS 5. Similar findings were reported in a study by Itti et al.[26] This could suggest the need to further define the DS and refine what is meant by “moderately more” and “markedly more” than liver. Discrepancies were due to misidentification of the residual tumor in areas of high physiological uptake (tongue, tonsil, thymus, stomach, muscle, and liver), or due to postchemotherapy or radiation effects (BM and spleen activation and thyroid inflammation), or without any obvious reasons.[26]
A proposed “Refinement of the Lugano Classification lymphoma response criteria in the era of immunomodulatory therapy” was published in 2016.[27] This refinement suggested the addition of a new response category, “indeterminate response,” to recognize that increases in the extent of FDG-avid disease following immunomodulatory therapy do not necessarily indicate the presence of progressive disease. This adds flexibility in terms of the interpretation of disease response and enables clinicians to direct patient management while maintaining close follow-up monitoring. However, further validation of this revision is required.[27]
With the advent of PET-CT, the management of HL has evolved. This, however, brings an added financial strain on the patient. A recent retrospective study from our center did not show a difference in relapse rates between patients who were examined with suboptimal imaging techniques and clinical examination and those who had a PET/CT, and it reiterates the use of basic imaging modalities as assessment tools when resources are a constraint.[28] However, it is difficult to distinguish posttreatment fibrosis from active residual disease using basic imaging modalities, which results in an increasing number of false-negative reports. Although PET/CT has many strengths and is widely recognized as a standard assessment modality in HL, it is important to acknowledge that there are limitations to its use, which become apparent during routine clinical practice. The use of PET/CT to predict clinical outcomes in HL must be accompanied by recognition that this modality has imperfect sensitivity and specificity, which result in false-negative and false-positive findings, respectively. Combining FDG-PET with molecular biomarkers, such as circulating tumor DNA, is likely to enhance the predictive value of response assessment and help to refine response-adapted treatment approaches. Other approaches, including metabolic tumor volume and its prognostic impact on HL, need to be investigated further.[29] Our study has a few limitations, which include its retrospective nature, single-center experience, and small sample size.
Despite these limitations, our results support the concept that DS is a valuable tool for the interpretation of response assessment in PET scans in HL and can help the treating oncologist in planning the next steps in management. The use of a graded visual response assessment reflects that F-18 FDG uptake is a continuum, with the likelihood of malignancy increasing as the level of F-18 FDG uptake increases, rather than a black-or-white phenomenon indicating the presence or absence of malignancy.
Conclusions
DS is a useful tool to aid the reporting of F-18 FDG PET-CT in the management of HL with good positive and negative predictive values. The inclusion of DS makes equivocal scans easier to interpret for the treating physician. This study also demonstrated good interobserver agreement and indicates that DS could predict outcomes in the majority of patients. F-18 FDG PET-CT provides a valuable means of stratifying HL patients at various points (iPET, EOT, and follow-up) before planning the next treatment regimen.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References