Worldwide sources of data in haematology: Importance of clinician-biostatistician collaboration

Observational studies using registry-based data have contributed substantially to the understanding and care of patients with hematologic diseases. Interest in registries and observational research in hematologic diseases has grown in recent years, likely driven by the increasing emphasis on real-world evidence, the wider availability of electronic health records (EHRs), key legislative changes over the past decade and events such as the worldwide coronavirus pandemic. For the purposes of this review, a registry is an “organized system that uses observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition or exposure, and that serves one or more predetermined scientific, clinical or policy purposes” as defined by the United States (U.S.) Agency for Healthcare Research and Quality (AHRQ) [1]. In this review, we describe the characteristics of high-quality registries and the use of registries in research, and provide an overview of registries focused on hematologic diseases, including hematopoietic cell transplantation (HCT) and cellular therapies (CT). We also highlight the importance of collaboration between biostatisticians and haematologists in designing and conducting registry-based research.

The quality of registry-based research depends heavily on the quality of the data collected by the registry. The foundation of a well-designed registry is a clearly defined purpose, together with scientific, clinical or policy objectives developed with input from key stakeholders [1]. These may include patients, health care providers, biostatisticians, bioinformaticians, researchers, community-based or patient advocacy organizations, communities, pharmaceutical companies or device manufacturers, policy makers and governmental health agencies. The registry purpose serves as the guidepost for defining the population, the data sources used (e.g., EHRs, administrative databases, patient-reported data), the data elements collected, the key outcomes measured and the overall registry design. Many registries operate in a centralized manner, collecting and storing data in a central location with ongoing data curation by trained registry personnel. This approach decreases the resources needed at centres and gives researchers ready access to formatted data, but requires ongoing data submission from centres. Conversely, federated data registries request data from participating sites on demand, which allows for local data storage and control but requires standardization and coordination among sites [2].

Quality registries adhere to all applicable ethical, privacy and security legal requirements, have a robust data quality program, and maintain strong data governance policies and procedures that reflect the requirements of oversight organizations and regulatory bodies. Quality registries should also periodically seek feedback and adapt to advances in the field or the changing needs of stakeholders. More detailed resources describing these key components are available [1,3,4,5], and the Registry Evaluation and Quality Standards Tool (REQueST) was recently developed to evaluate the data quality of registries [6,7].

While randomized controlled trials (RCTs) have traditionally been considered the “gold standard” of study design, registries offer advantages that allow them to address a wider range of study questions (Table 1). RCTs typically have restrictive inclusion criteria that can limit generalizability to the larger, more heterogeneous population that is often better represented in registries. RCTs also often place a relatively high burden on participants (e.g., travel to the study site) and are costly to run, especially over the long term, so registries are much better suited to collecting longitudinal data. Registries can also capture subsequent treatments, allowing late effects studies to account for treatments given outside of an RCT.

Clinical hematologic registries generally collect information on patients who share a common health characteristic, such as a disease (or condition) or a therapy received. Disease-focused registries are ideal for natural history studies, which are especially important for rare hematologic diseases, and can be used to estimate the incidence and/or prevalence of a disease [8]. Therapy-focused registries can be used to evaluate the real-world efficacy, utilization and safety of medicines (e.g., new chemotherapeutic or disease-modifying agents) or procedures (e.g., HCT, CT or gene therapy (GT)) [9,10,11,12]. Registries can also help identify trends and patterns of care, provide follow-up for late effects, study underserved populations, identify health disparities, and inform quality improvement initiatives and guideline development; they are particularly valuable when RCTs are not ethical or feasible (e.g., because of cost, or because the field is advancing quickly) [1,13,14,15]. Registries with established biorepositories can also support studies of genetic predisposition, studies aimed at understanding the underlying drivers of disease, and correlative studies of patient and donor factors that influence outcomes.

The composition of a registry cohort directly affects how generalizable study results are to the larger population and whether disparities in access to treatments and outcomes can be detected [16]. Enrolling individuals who are representative of the larger population requires access and resources, and may involve incentives. Launched in 2008, the Hematologic Disease Registry of the Japanese Society of Haematology has enrolled 20,000–40,000 patients with newly diagnosed hematologic diseases per year, amassing data on nearly 400,000 persons as of 2020 [17]. Centres are incentivized to enrol patients in order to be certified by the Society as an Education Centre for Haematology. Reporting to registries can also be mandated. For example, U.S. HCT centres are required to submit essential transplant data for all allogeneic HCTs to the Center for International Blood and Marrow Transplant Research (CIBMTR), as holder of a U.S. government contract [18]. Beyond these essential data, the CIBMTR collects more detailed data on a subset of consenting patients for research purposes and uses sampling techniques to enrol a more representative sample, thereby maximizing external validity while making efficient use of resources. These patients are randomly selected into the research cohort using a weighted propensity score, which promotes representative assignment within constrained resources while also increasing the selection of patients who contribute to high-priority research questions.
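
Weighted random selection of this kind can be illustrated with a minimal sketch. It is not the CIBMTR's actual algorithm: it assumes a hypothetical per-patient `priority_score` standing in for the weighted propensity score, and uses Poisson sampling, in which each patient is included independently with a probability proportional to that score (capped at 1). Each selected patient carries a design weight equal to the inverse of their inclusion probability, so that analyses of the research cohort can be re-weighted back to the full registry population.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    # Hypothetical selection score: values above 1 raise the chance of selection,
    # e.g. for under-represented groups or high-priority research questions.
    priority_score: float


def select_research_cohort(patients, target_size, seed=0):
    """Poisson sampling with inclusion probability proportional to priority_score.

    Probabilities are capped at 1, so the expected cohort size equals target_size
    unless capping occurs, in which case it is somewhat smaller.
    Returns (patient, design_weight) pairs, where design_weight = 1 / probability.
    """
    rng = random.Random(seed)
    total = sum(p.priority_score for p in patients)
    cohort = []
    for p in patients:
        prob = min(1.0, target_size * p.priority_score / total)
        if rng.random() < prob:
            cohort.append((p, 1.0 / prob))
    return cohort


# Example: 1,000 eligible patients, every tenth one given double priority.
eligible = [Patient(f"P{i}", 2.0 if i % 10 == 0 else 1.0) for i in range(1000)]
cohort = select_research_cohort(eligible, target_size=200, seed=1)
print(len(cohort), "patients selected")
```

Poisson sampling is used here only because it makes the inclusion probabilities, and hence the analysis weights, explicit; fixed-size weighted sampling schemes are common alternatives.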

The essential (or “core”) data elements included in a registry should be relevant and clearly defined. Common data elements for various hematologic diseases (e.g., leukaemia [19,20,21], sickle cell disease (SCD) [22,23,24]) and treatments (e.g., HCT [25], CT [26]) help ensure that relevant data are collected and encourage standardization. Data collected by hematologic registries often include the following: the patient population (e.g., demographics, co-morbidities), the disease or condition (e.g., diagnosis, disease burden, laboratory, radiological, genetic and/or genomic features, prior treatments), therapies (e.g., dosing, timing, expected and unexpected adverse events, costs) and outcomes. To help guide the selection of outcome measures, the AHRQ designed a content model, the Outcome Measures Framework (OMF), that includes five categories: survival, clinical response/status, events of interest, patient-reported outcomes and resource utilization [1]. In general, the data collected should allow a researcher to describe the population and perform the risk adjustments needed to assess the outcome of interest.
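
One way to make core data elements and outcome measures explicit is to encode them as typed records. The sketch below is hypothetical: the field names and example measures are illustrative rather than a published common data element set, and only the five category labels are taken from the AHRQ OMF. The helper reports which OMF categories a proposed set of outcome measures does not yet cover.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class OMFCategory(Enum):
    """The five outcome categories of the AHRQ OMF."""
    SURVIVAL = "survival"
    CLINICAL_RESPONSE = "clinical response/status"
    EVENTS_OF_INTEREST = "events of interest"
    PATIENT_REPORTED = "patient-reported"
    RESOURCE_UTILIZATION = "resource utilization"


@dataclass
class OutcomeMeasure:
    name: str
    category: OMFCategory
    definition: str  # a clear, pre-specified definition


@dataclass
class CoreRecord:
    """Hypothetical minimal core record: patient, disease, therapy and outcomes."""
    patient_id: str
    birth_year: int
    sex: str
    diagnosis_code: str
    diagnosis_date: date
    comorbidities: list = field(default_factory=list)
    therapy: Optional[str] = None
    outcomes: dict = field(default_factory=dict)


def uncovered_categories(measures):
    """Return the OMF categories not addressed by any of the given measures."""
    covered = {m.category for m in measures}
    return [c for c in OMFCategory if c not in covered]


measures = [
    OutcomeMeasure("overall survival", OMFCategory.SURVIVAL,
                   "time from diagnosis to death from any cause"),
    OutcomeMeasure("complete remission", OMFCategory.CLINICAL_RESPONSE,
                   "complete remission per protocol-specified criteria"),
    OutcomeMeasure("grade 3+ infection", OMFCategory.EVENTS_OF_INTEREST,
                   "infection of grade 3 or higher within 100 days"),
]
print([c.value for c in uncovered_categories(measures)])
```

Here the check would prompt the registry team to consider patient-reported and resource utilization measures before finalizing the data collection forms.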

Data may originate from various sources (e.g., patients, the medical team, EHRs, claims or billing data), but should be reliable and feasible to collect [3]. Manual data entry has historically been, and remains, the primary method of data collection for many registries, but it is time-consuming and resource-intensive and is a source of transcription errors. Adequate training of data entry staff, data quality validation and audits can detect and correct these errors, but they are also resource-intensive. A major challenge facing most registries is the limited funding available to support data collection, so any increase in data collection must carefully consider centre burden or provide additional resources to centres. To decrease centre burden, there is much interest in leveraging EHRs for data capture; this is the approach being taken by the CIBMTR and interested partner centres to collect demographics and laboratory values [18]. Ongoing work is focused on expanding the types of discrete data that can be obtained directly from the EHR, and the bioinformatics community is working to improve natural language processing technology to further expand what data can be extracted from EHRs.
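
Automated validation checks are one way to catch transcription errors at the point of entry, before they reach an audit. The sketch below assumes hypothetical field names and plausibility limits (in practice these would come from the registry's data dictionary); it flags missing required fields, out-of-range values and an impossible date order.

```python
from datetime import date

# Hypothetical data dictionary rules; real limits would be set by the registry.
REQUIRED_FIELDS = ("patient_id", "diagnosis_date", "hct_date")
PLAUSIBLE_RANGES = {
    "hemoglobin_g_dl": (3.0, 25.0),
    "platelets_10e9_l": (0.0, 2000.0),
}


def validate_record(record):
    """Return a list of human-readable data quality issues for one record."""
    issues = []
    for name in REQUIRED_FIELDS:
        if record.get(name) is None:
            issues.append(f"missing required field: {name}")
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(name)
        if value is not None and not low <= value <= high:
            issues.append(f"{name}={value} outside plausible range [{low}, {high}]")
    dx, hct = record.get("diagnosis_date"), record.get("hct_date")
    if dx is not None and hct is not None and hct < dx:
        issues.append("hct_date precedes diagnosis_date")
    return issues


# A haemoglobin entered in g/L instead of g/dL is a typical transcription error.
entry = {
    "patient_id": "P001",
    "diagnosis_date": date(2022, 5, 10),
    "hct_date": date(2022, 1, 3),
    "hemoglobin_g_dl": 102,
}
for issue in validate_record(entry):
    print(issue)
```

Checks like these reduce, but do not replace, the need for trained staff and periodic audits, since a plausible but wrong value passes every rule.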

When designing registry-based studies, researchers should partner with biostatisticians to determine the optimal design and whether the quality and scope of the available data are sufficient for the study question. Observational registry-based studies do not typically involve randomization: treatment decisions are made by patients and their healthcare teams and are likely influenced by each patient's risk factors, so treated and untreated groups may differ systematically. Biostatisticians therefore need to adjust for these risk factors in the analysis, for example using matching techniques (propensity score or exact matching) or covariate adjustment in regression models. Biostatisticians can also assess sources of bias that can affect the validity of a study, such as missing data, participant attrition or incomplete follow-up. Close collaboration between biostatisticians and researchers is important to ensure that the analysis adequately addresses the question, controls for bias in the sample and leads to correct conclusions.
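
As a minimal sketch of propensity score matching, the code below simulates a registry cohort in which older patients and those with a comorbidity are more likely to receive a treatment, estimates each patient's propensity score with a logistic regression fitted by Newton-Raphson, and then performs greedy 1:1 nearest-neighbour matching without replacement on the logit of the propensity score. The simulated covariates, the function names and the caliper of 0.2 standard deviations of the logit (a commonly used rule of thumb) are assumptions for illustration; a real analysis would also check covariate balance after matching and would typically use established statistical software.

```python
import numpy as np


def fit_propensity(X, treated, n_iter=25):
    """Logistic regression of treatment on covariates, fitted by Newton-Raphson.

    Returns each patient's estimated probability of treatment."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        gradient = Xd.T @ (treated - p)
        hessian = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def match_nearest(ps, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on logit(ps).

    Treated patients are processed in index order; those whose nearest available
    control lies outside the caliper remain unmatched."""
    logit = np.log(ps / (1.0 - ps))
    caliper = caliper_sd * logit.std()
    controls = list(np.flatnonzero(treated == 0))
    pairs = []
    for t in np.flatnonzero(treated == 1):
        if not controls:
            break
        distances = np.abs(logit[controls] - logit[t])
        j = int(np.argmin(distances))
        if distances[j] <= caliper:
            pairs.append((int(t), int(controls.pop(j))))
    return pairs


# Simulated registry cohort (hypothetical): treatment depends on age and comorbidity.
rng = np.random.default_rng(0)
n = 500
age_std = rng.normal(0.0, 1.0, n)      # standardized age
comorbidity = rng.binomial(1, 0.3, n)  # 0/1 indicator
X = np.column_stack([age_std, comorbidity])
true_logit = -0.5 + 0.8 * age_std + 0.7 * comorbidity
treated = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ps = fit_propensity(X, treated)
pairs = match_nearest(ps, treated)
print(f"{len(pairs)} matched pairs from {int(treated.sum())} treated patients")
```

Greedy matching is simple but order-dependent; optimal matching or weighting approaches (e.g., inverse probability of treatment weighting) are common alternatives, as is direct covariate adjustment in a regression model.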

While many registry studies use retrospective cohort designs, registry infrastructure can also be leveraged to conduct prospective cohort studies, possibly with supplemental data collection to address the study objectives. Registries can also be used to identify patients for case-control or case-cohort studies, on whom supplemental data may be collected. Registries can provide comparator cohorts (synthetic control arms) for single-arm prospective interventional clinical trials, such as early phase trials or trials in rare diseases [11,27,28]. The relatively large size of registry cohorts facilitates the matching of patient characteristics, through either exact matching or propensity score matching, that such comparative analyses often require. Registries can also be used to provide digital twins to increase the efficiency of randomized clinical trials [29]. Tools to evaluate comparative effectiveness research studies have also been developed [30].
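
Exact matching for a registry-based comparator cohort can be sketched as follows. The stratification variables (age group, disease risk group), the record layout and the choice of up to two registry controls per trial participant are hypothetical; controls are drawn at random, without replacement, from registry patients in exactly the same stratum.

```python
import random
from collections import defaultdict


def exact_match(trial, registry, keys, k=2, seed=0):
    """Match each trial participant to up to k registry patients with identical values for `keys`.

    Controls are sampled without replacement, so each registry patient is used at most once."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for patient in registry:
        pool[tuple(patient[key] for key in keys)].append(patient["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random order within each stratum
    matches = {}
    for participant in trial:
        available = pool.get(tuple(participant[key] for key in keys), [])
        matches[participant["id"]] = [available.pop() for _ in range(min(k, len(available)))]
    return matches


trial = [
    {"id": "T1", "age_group": "18-39", "risk": "high"},
    {"id": "T2", "age_group": "18-39", "risk": "high"},
    {"id": "T3", "age_group": "60+", "risk": "low"},
]
registry = [
    {"id": f"R{i}", "age_group": "18-39", "risk": "high"} for i in range(3)
] + [{"id": "R9", "age_group": "40-59", "risk": "low"}]

print(exact_match(trial, registry, keys=("age_group", "risk")))
```

When controls in a stratum run out, later participants receive fewer matches (T2 gets one control and T3 none in this toy example), which is why larger registry cohorts make exact matching on several characteristics practical.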

The past two decades have seen increasing emphasis on the patient's perspective in research. Patient-reported outcomes (PROs) are standardized measures that can assess a patient's health-related quality of life. The use of PROs has been endorsed by many governmental agencies (e.g., in the European Union (EU), the U.S. and elsewhere) [31], and PROs are commonly included in haematological clinical trials [32,33]. Numerous studies have described health-related quality of life in cohorts of paediatric and adult patients with hematologic diseases [34,35,36] or who have undergone HCT [37,38] or CT [39]. Registries are starting to collect PRO data routinely, and core domains have been identified for different populations [37,40,41]. Centralized PRO collection leverages the registry infrastructure already in place, decreases centre burden, augments the registry database with valuable data for use in studies and allows for more complete and accurate socio-economic data collection, among other benefits. Barriers may include obtaining correct contact information for patients, achieving a representative cohort of respondents and participant burden [42]. The CIBMTR routinely collects PROs from HCT and CT recipients and has recently started providing PRO data back to centres for their own use [42].
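
One way to check whether PRO respondents are representative is to compare them with non-respondents using standardized mean differences (SMDs), a common balance diagnostic. The sketch below assumes hypothetical variables, toy data and the frequently used flagging threshold of |SMD| > 0.1; it is a descriptive check only and does not by itself correct for non-response.

```python
from math import sqrt
from statistics import mean, variance


def smd_continuous(a, b):
    """Difference in means divided by the pooled standard deviation."""
    denom = sqrt((variance(a) + variance(b)) / 2)
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def smd_binary(a, b):
    """SMD for 0/1 variables, using the Bernoulli variance p(1 - p)."""
    p1, p0 = mean(a), mean(b)
    denom = sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return 0.0 if denom == 0 else (p1 - p0) / denom


def flag_imbalance(respondents, nonrespondents, continuous, binary, threshold=0.1):
    """Return {variable: SMD} for variables whose |SMD| exceeds the threshold."""
    checks = [(v, smd_continuous) for v in continuous] + [(v, smd_binary) for v in binary]
    flags = {}
    for var, smd in checks:
        d = smd([r[var] for r in respondents], [r[var] for r in nonrespondents])
        if abs(d) > threshold:
            flags[var] = round(d, 3)
    return flags


# Toy data (hypothetical): respondents are older; rural residence is balanced.
respondents = [{"age": 60, "rural": 0}, {"age": 62, "rural": 1}, {"age": 64, "rural": 0}]
nonrespondents = [{"age": 40, "rural": 1}, {"age": 42, "rural": 0}, {"age": 44, "rural": 0}]
flags = flag_imbalance(respondents, nonrespondents, continuous=["age"], binary=["rural"])
print(flags)
```

Flagged variables could then inform targeted outreach to under-represented groups or non-response weighting in analyses of PRO data.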

The availability of biorepository samples further expands the range of studies that a registry can support, especially in rare diseases [43]. Some biorepositories have a broad focus, such as the UK Biobank, which has been collecting biological samples and medical data (e.g., imaging, genomic, questionnaires) from over half a million participants in the United Kingdom since 2006 [44]. This resource has been used to study a wide range of diseases, including SCD, leukaemia and lymphoma [44]. Other biorepositories have a narrower focus, such as that of the National Marrow Donor Program (NMDP), which maintains samples from 60,000 donor (related, unrelated or cord blood) and HCT recipient pairs [45]. The NMDP has a research partnership with the CIBMTR that pairs clinical data with biological samples. Samples from biorepositories without formal registry partnerships can also be linked to existing registries using probabilistic data linkage strategies.
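
Probabilistic linkage is commonly based on the Fellegi-Sunter model, in which each compared field contributes an agreement weight log2(m/u) or a disagreement weight log2((1 - m)/(1 - u)); m is the probability that the field agrees for a true match and u the probability that it agrees by chance for a non-match. The sketch below assumes hypothetical fields, m- and u-probabilities and decision thresholds; in practice these are estimated from the data (e.g., with the EM algorithm), and names are usually compared with approximate string similarity rather than exact equality.

```python
from math import log2

# Hypothetical (m, u) probabilities per comparison field.
FIELD_PARAMS = {
    "birth_date": (0.97, 0.003),
    "sex": (0.99, 0.50),
    "surname": (0.92, 0.01),
    "postcode": (0.90, 0.02),
}


def match_weight(a, b, params=FIELD_PARAMS):
    """Sum of Fellegi-Sunter log2 weights over the compared fields."""
    total = 0.0
    for name, (m, u) in params.items():
        if a.get(name) is None or b.get(name) is None:
            continue  # a missing value contributes no evidence either way
        if a[name] == b[name]:
            total += log2(m / u)
        else:
            total += log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Hypothetical thresholds: link, non-link, or send for clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


sample = {"birth_date": "1985-02-14", "sex": "F", "surname": "Lopez", "postcode": "AB1 2CD"}
registry_row = {"birth_date": "1985-02-14", "sex": "F", "surname": "Smith", "postcode": "ZZ9 9ZZ"}
w = match_weight(sample, registry_row)
print(round(w, 2), classify(w))
```

Here the birth date and sex agree but the surname and postcode do not (as might happen after a marriage and a move), so the pair falls between the thresholds and is sent for clerical review rather than linked automatically.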

Partnerships between registries are common, as they augment the available data while conserving resources. Smaller registries often partner with larger registries to leverage their infrastructure; the CIBMTR, for example, provides this service for several national HCT registries. Similarly, the ASH Research Collaborative supports the Latin American Registry for Aplastic Anemia [46]. Taking the approach “collect once and use often”, coordinated registry networks leverage individual registries or other data sources to expand the scope of available data. An example is the U.S. National Cancer Institute (NCI) National Childhood Cancer Registry (NCCR), which is working to centralize paediatric cancer data by leveraging existing, primarily adult-focused cancer registries [47]. Currently, the NCCR contains cancer data representing 66% of all U.S. children (age <20 years) [47]. Registries can also partner with clinical trials networks to collect data: the Blood and Marrow Transplant Clinical Trials Network (BMT CTN) [48] leverages the infrastructure of the CIBMTR [18] for data collection, allowing data to be submitted once and used for both purposes and thus decreasing centre burden. An increasing number of registries have established, or participate in, research data collaboratives or data hubs that centralize various data (e.g., clinical, genomic) for researchers. Within the EU, the HARMONY Alliance is a public-private endeavour, including registry partners, that was established to promote big data research in hematologic malignancies [49]. The ASH Research Collaborative Data Hub is bringing together multiple myeloma and SCD data from partner centres [23], and was also mobilized during the pandemic to establish the COVID-19 Registry for Hematology [50]. The NCI National Childhood Cancer Data Initiative is creating a data ecosystem that includes the NCI NCCR discussed above [47].
As these data ecosystems continue to evolve, registries can lead the effort to build the necessary infrastructure (e.g., data standardization, quality and governance) to accommodate expanded data types such as genomics and to manage increasing data volumes (e.g., from wearable devices). Clinicians and biostatisticians will have key roles in determining the relevance of these data and in developing the statistical approaches needed to analyse them.
