The Manitoba Population Research Data Repository holds an extensive collection of administrative datasets that are population-based and linkable across domains at the individual and family levels, making it a valuable resource for population health research.
We had access to multiple sources of relevant data (COVID-19 test and vaccination data, physician billing claim codes, prescribed medications and physician visit and hospitalisation patterns) from which to triangulate our approach and identify people with post-COVID condition.
Despite the strengths of this approach, we cannot be certain that all of the symptoms and health service use patterns we identified as post-COVID condition were directly related, since providers and patients may not have recognised their symptoms as being post-COVID condition at the time of the visit, or may have been seeking medical help for symptoms that were not actually post-COVID condition.
Using natural language processing to incorporate free text from physicians’ clinical notes into our analysis would provide additional context and nuance to our administrative data-driven approach, and may be an option for future research in this area.
Introduction
The SARS-CoV-2 pandemic created significant societal disruption across the globe, with over 774 million confirmed cases and more than 7 million deaths as of early 2024.1 While COVID-19 is primarily considered an acute illness, some individuals experience long-term symptoms in a syndrome referred to as ‘long COVID’ or ‘post-COVID condition’ (PCC). More than 200 symptoms have been reported to occur in the weeks following SARS-CoV-2 infection, including shortness of breath, cognitive dysfunction, and fatigue, as well as multiple conditions that impact daily functions, such as neurological and psychiatric disorders, hypertension, respiratory difficulties, heart palpitations, chest pain, myalgia and digestive disorders.2–4
Our current understanding of PCC is mainly derived from a series of clinical cohort, cross-sectional and survey-based studies.5–11 Estimates of PCC prevalence vary, with 32–87% of COVID-19 survivors reporting persistent or new symptoms 2–3 months after acute infection.12 Many individuals with PCC are young and previously healthy and had mild forms of COVID-19 that did not require hospitalisation.13 There appears to be a higher prevalence of PCC in particular groups, including females and individuals aged 35–45.13–15 SARS-CoV-2 infection has also been shown to disproportionately impact many communities who are lower income, have higher rates of comorbidities, live in crowded or inadequate housing conditions and have poorer access to health services than the general population.16 This could mean a higher prevalence of PCC in, for example, Indigenous populations or immigrant groups.17 18
PCC can have a tremendous impact on quality of life and has been shown to substantially increase health service use.19–21 The underlying mechanisms of PCC are now presumed to be mediated by the immune system.22 23 The link between complement dysregulation and PCC explains the three principal mechanisms: immune dysregulation, immune priming and microvascular blood clotting23; the activation of cytomegalovirus- and Epstein–Barr virus-specific antibodies may also contribute to the pathophysiology of PCC.23 24
Despite PCC becoming better understood, several important clinical questions remain. Due to the emerging nature of the illness, the large number of associated symptoms and the diverse methodologies used thus far to study PCC, there is currently no consensus on its true clinical definition. While the WHO, the Government of Canada, the National Institute for Clinical Excellence and the Centres for Disease Control and Prevention have each proposed different clinical definitions, these may continue to undergo refinement as the understanding of PCC and its pathophysiology evolves. In the meantime, an analytic approach for identifying PCC in the general population would be a significant asset, since it would support the investigation of risk characteristics, medium- to long-term health outcomes and recovery trajectories of those with PCC. This evidence is critical for informing the development of clinical treatment guidelines and health system resource planning and would also aid in the development of COVID-related health policy worldwide.
Several editorials have called for data science-based solutions to address some of the challenges the pandemic has presented.25–27 Administrative data are a powerful resource for researchers, clinicians, patients and health system decision-makers and are currently being used to help us understand COVID-19 and its impact across various populations. An administrative data approach has the advantage of capturing a whole population of interest, thereby limiting selection bias and loss to follow-up, and because administrative data include community-based care, they allow for the inclusion of subpopulations with less severe diseases that may not be present in hospital or critical care studies. Research using administrative data to examine the prevalence of poorly defined syndromes in the US population, for example, post-treatment Lyme disease syndrome28 and (most recently) PCC,29 has demonstrated the utility of such an approach. In this study, we constructed a PCC cohort using administrative health data from a population-based data repository in Manitoba, Canada and described PCC risk and protective factors in the COVID-positive population.
Methods
Study setting
Manitoba is a province with approximately 1.4 million residents in the geographical centre of Canada. The provincial single-payer healthcare system covers over 99% of Manitoba residents, excluding only individuals who are federally insured (eg, those incarcerated in federal prisons, members of the military and some First Nations and Inuit populations).30 31 The provincial health insurance coverage includes all hospitalisations, medically necessary physician services and prescription drug dispensations by Manitoba pharmacies.
During the study period (March 2020 to December 2021), the total number of people residing in Manitoba was 1 465 704. In March 2020, the population was 1 385 424, and in December 2021, it was 1 405 498, with an overall increase of 20 074 people, a growth rate of 1.4% during the study period. There were 30 237 births (2.06%), 21 721 deaths (1.48%), 52 243 people who moved into Manitoba or initiated health insurance coverage for a reason other than birth (3.56%) and 40 764 people who moved out of the province or ceased health insurance coverage for a reason other than death (2.78%).
Data sources
The Manitoba Population Research Data Repository at the Manitoba Centre for Health Policy comprises over 100 databases of whole-population, individual-level administrative data from the health, social services, legal and education systems in Manitoba. All repository databases are de-identified (names and addresses removed), but they are linkable at the individual and family level using a scrambled Patient Health Identification Number attached to each record and to a central population registry.31 32 The repository data have been used extensively in research, and the validity of the databases to examine population health has been well documented.33–35 The databases used in this study are presented in table 1. Both the hospital discharge abstract data and the medical claims data have been shown to be comprehensive for the population of Manitoba.
Table 1. Databases and key variables from the Manitoba Population Research Data Repository
Patient and public involvement
At least one member of the research team was an individual who was experiencing PCC at the time of the study. The study conception, design, conduct and interpretation were informed by their personal experiences with PCC and as a family physician. They are listed as an author on the study.
Ethics and privacy
Ethics approvals were granted by the University of Manitoba’s Health Research Ethics Board (#HS25090 H2021:279) and the Health Information Research Governance Committee at the Manitoba First Nations Health and Social Secretariat (2020). The study protocol was reviewed by the Manitoba Government Health Information Privacy Committee (HIPC #2021/2022-18).
Constructing the PCC cohort
The cohort construction comprised three key steps, an overview of which is presented in figure 1. All cohort construction steps and analyses were completed in SAS/STAT software (V.15.1).36
Figure 1. Post-COVID condition cohort development.
Step 1: identify the population at risk for PCC
We began by identifying all Manitobans at risk for developing PCC, that is, those with a positive COVID-19 test result from 1 March 2020 to 31 December 2021 (Step 1 cohort). People in the cohort were followed from their inclusion in the COVID-positive cohort for at least 90 days and for at most 365 days after the index date or until they were lost to follow-up (due to death or a move out of province) or the end of study date (30 June 2021). The province of Manitoba provided whole-population COVID-19 PCR testing from March 2020 to December 2021; thus, we were able to include information on the alpha, delta and early omicron waves of infection. The province’s centralised testing approach was discontinued on 31 December 2021.
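The follow-up rule in Step 1 can be illustrated with a minimal Python sketch. It is not the authors' SAS code: the function names, the treatment of the study end as a parameter and the example dates are all assumptions made for illustration. It takes the end of follow-up as the earliest of the index date plus 365 days, the date of loss to follow-up and the study end, and then checks the 90-day minimum.

```python
from datetime import date, timedelta
from typing import Optional

MIN_FOLLOW_UP_DAYS = 90   # minimum follow-up stated in the text
MAX_FOLLOW_UP_DAYS = 365  # maximum follow-up stated in the text


def follow_up_end(index_date: date, study_end: date,
                  lost_date: Optional[date] = None) -> date:
    """Earliest of: index date + 365 days, loss to follow-up, study end."""
    candidates = [index_date + timedelta(days=MAX_FOLLOW_UP_DAYS), study_end]
    if lost_date is not None:
        candidates.append(lost_date)
    return min(candidates)


def has_minimum_follow_up(index_date: date, study_end: date,
                          lost_date: Optional[date] = None) -> bool:
    """True when at least 90 days of follow-up are available."""
    end = follow_up_end(index_date, study_end, lost_date)
    return (end - index_date).days >= MIN_FOLLOW_UP_DAYS


# Hypothetical dates for illustration only (not study data).
example_index = date(2021, 1, 1)
example_study_end = date(2022, 3, 31)
print(follow_up_end(example_index, example_study_end))
print(has_minimum_follow_up(example_index, example_study_end,
                            lost_date=date(2021, 2, 15)))
```

Passing the study end as an argument keeps the sketch independent of any particular censoring date.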
Step 2: use diagnosis codes and prescription drug codes to narrow to those with PCC
From the rapidly evolving literature on COVID-19,3 20 37–39 we gathered information on symptoms commonly associated with PCC and medications used to treat these symptoms. We determined the corresponding International Classification of Disease (ICD)-9 (outpatient) and ICD-10 (in hospital) diagnosis and procedure codes and Anatomical Therapeutic Chemical (ATC) codes (online supplemental table S1) and used these to narrow the cohort to only people who reported new symptoms or received a new prescription related to PCC symptoms 90 days or more after their positive COVID-19 test date (Step 2 cohort). We excluded anyone who reported the same symptoms in the 3 years before their positive COVID-19 test. However, given the broad range of symptoms associated with COVID-19, we knew this step likely would not be exhaustive because individuals with PCC might not have reported new symptoms (many are common to other illnesses) and might not have received a new prescription if their symptoms were ones they had experienced previously. Thus, we used another strategy to identify additional individuals in the Step 1 cohort likely to have experienced PCC but not captured in Step 2.
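One possible reading of the Step 2 rule is sketched below in Python. It is an illustration under stated assumptions rather than the study's implementation: the code values are placeholders (the actual ICD and ATC codes are in online supplemental table S1), the 3-year lookback is approximated as 3 × 365 days, and the exclusion is applied per code, which is only one interpretation of "the same symptoms".

```python
from datetime import date, timedelta
from typing import Iterable, Set, Tuple

LAG_DAYS = 90             # PCC-related codes must appear 90+ days after the test
LOOKBACK_DAYS = 3 * 365   # assumption: 3-year lookback approximated in days


def new_pcc_codes(test_date: date,
                  claims: Iterable[Tuple[str, date]],
                  pcc_codes: Set[str]) -> Set[str]:
    """Return PCC-related codes recorded 90+ days after the positive test
    that were NOT recorded in the 3 years before the test."""
    claims = list(claims)
    lookback_start = test_date - timedelta(days=LOOKBACK_DAYS)
    window_start = test_date + timedelta(days=LAG_DAYS)
    prior = {code for code, day in claims
             if code in pcc_codes and lookback_start <= day < test_date}
    later = {code for code, day in claims
             if code in pcc_codes and day >= window_start}
    return later - prior


# Placeholder codes and dates for illustration only.
example_claims = [
    ("CODE_A", date(2020, 6, 1)),   # recorded before the test: excludes CODE_A
    ("CODE_A", date(2021, 5, 1)),
    ("CODE_B", date(2021, 5, 1)),   # new after day 90: counts towards Step 2
    ("CODE_Z", date(2021, 5, 1)),   # not a PCC-related code: ignored
]
print(new_pcc_codes(date(2021, 1, 1), example_claims, {"CODE_A", "CODE_B"}))
```

Under this reading, a person would enter the Step 2 cohort when the returned set is non-empty.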
Step 3: identify additional individuals with PCC based on predictive modelling
In this final step, we used a predictive modelling approach to identify individuals in Manitoba likely to have PCC based on their physician visit rates. This approach allowed us to incorporate sociodemographic and health service use variables into the comparison between the Step 2 cohort and the rest of the population of Manitoba using a High Dimensional Propensity Score (HDPS).
First, in the PCC cohort from Step 2, we calculated the physician visit rate from the 91st day after their first positive COVID-19 test date to 31 March 2022 (PCC period) and compared it with their visit rate from 1 April 2019 to 31 March 2020 (pre-COVID period). We also compared physician visit rates for these two periods among those who tested positive for COVID-19 in Step 1 but were not part of the Step 2 cohort. In this latter group, those with visit rates the same or greater than those of the Step 2 cohort were added to the final cohort. Then, we trained several different regression model builds (stepwise, best subset and lasso) to determine the best fit. Model selection was done by considering the Brier score, the Schwarz Bayesian Criterion and the C-statistic,40 41 and the selected model was then internally validated by bootstrapping 500 samples, yielding a C-statistic of 0.69387 after optimism correction.42 This method ensured that we used all the data we had available in the prediction process. The selected model was then re-fitted to the entire group of individuals who tested positive for COVID-19 (Step 1 cohort) to determine the predicted probability of any of those individuals having PCC. The list of variables considered in these models and the ones that were used in the selected model are shown in online supplemental table S2.
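Two of the model selection criteria named above, the Brier score and the C-statistic, can be computed directly from observed outcomes and predicted probabilities. The Python sketch below shows the standard definitions only; the function names and example values are hypothetical, and the model fitting, the Schwarz Bayesian Criterion and the bootstrap optimism correction are not reproduced here.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Proportion of (case, non-case) pairs in which the case has the
    higher predicted probability; ties count as half."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Made-up outcomes and predictions for illustration only.
outcomes = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.4, 0.6]
print(brier_score(outcomes, predictions))
print(c_statistic(outcomes, predictions))
```

The pairwise loop is quadratic in cohort size, so it suits small examples; a rank-based formula would be preferable for a cohort of tens of thousands.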
One of the notable challenges in predictive modelling is determining the optimal cut-off with which to classify individuals as predicted to have the outcome. We tested several methods to obtain the highest optimal cut-point for a more conservative classification43 and then classified the Step 1 cohort into those who were and were not predicted to have PCC.
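The text does not name the cut-point methods that were tested. As one common example, the Python sketch below selects the threshold that maximises Youden's J (sensitivity + specificity − 1). The choice of Youden's J, the function name and the example data are assumptions for illustration; ties are resolved towards the higher threshold to reflect the preference for a more conservative classification.

```python
from typing import Sequence, Tuple


def youden_cut_point(y_true: Sequence[int],
                     y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising sensitivity + specificity - 1.
    A person is classified as positive when probability >= threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_cut, best_j = 0.0, float("-inf")
    for cut in sorted(set(y_prob)):
        true_pos = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cut)
        true_neg = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cut)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j >= best_j:  # ">=" keeps the higher threshold when J is tied
            best_cut, best_j = cut, j
    return best_cut, best_j


# Made-up outcomes and predicted probabilities for illustration only.
outcomes = [0, 0, 0, 1, 1, 1, 1]
predictions = [0.1, 0.2, 0.6, 0.5, 0.7, 0.9, 0.8]
print(youden_cut_point(outcomes, predictions))
```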
Validating Step 3 of cohort construction
Our efforts to identify individuals with PCC in Step 3 were validated in a three-way comparison: we compared three health service use indicators (physician visit rates, hospitalisations and emergency department visits) in the Step 2 cohort, the additional individuals identified in Step 3 and a group matched on an HDPS that encompassed both sociodemographic characteristics and health service use data dimensions. We hypothesised that the Step 3 cohort would have health service use patterns similar to the Step 2 cohort.
High-dimensional propensity score matching
Following the method developed by Li et al,44 we hard-matched on age, sex and region of residence (urban/rural) at the time of the first positive COVID-19 test using a ratio of 25:1 and calculated standardised differences for these variables. Then, we generated the HDPS using demographic variables (age, sex, region of residence and income quintile) and health service use data dimensions (hospital diagnosis, hospital procedures, physician diagnosis, physician tariff codes and ATC codes for prescription drugs) (online supplemental figure S1). The HDPS models were stratified by sex and run separately for three health service use indicators: physician visits, hospitalisations and emergency department visits. To create the final matched group, the five closest matches to the HDPS in each pool of 25 were selected. The case count was n=66 365. A calliper of 0.2× the pooled SD of the logit (p-score) was applied to avoid selecting pairs with unsuitably large differences in p-score.
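The final selection step (up to five closest matches from each hard-matched pool, subject to the calliper) can be sketched as follows. This is a simplified Python illustration, not the HDPS procedure itself: the function names and example scores are hypothetical, the pooled SD is assumed to be the square root of the mean of the two group variances, and propensity scores are taken as already estimated.

```python
import math
from statistics import variance
from typing import List, Sequence


def logit(p: float) -> float:
    """Log-odds of a propensity score strictly between 0 and 1."""
    return math.log(p / (1 - p))


def calliper_width(case_logits: Sequence[float],
                   control_logits: Sequence[float],
                   multiplier: float = 0.2) -> float:
    """0.2 x pooled SD of the logit of the propensity score
    (assumption: pooled SD = sqrt of the mean of the two variances)."""
    pooled_sd = math.sqrt((variance(case_logits) + variance(control_logits)) / 2)
    return multiplier * pooled_sd


def closest_matches(case_ps: float, pool_ps: Sequence[float],
                    calliper: float, k: int = 5) -> List[int]:
    """Indices of up to k pool members nearest to the case on the logit
    scale, keeping only those within the calliper."""
    case_logit = logit(case_ps)
    distances = sorted((abs(logit(p) - case_logit), i)
                       for i, p in enumerate(pool_ps))
    return [i for dist, i in distances if dist <= calliper][:k]


# Made-up propensity scores for illustration only.
pool = [0.50, 0.52, 0.90, 0.48, 0.55, 0.45, 0.10]
print(closest_matches(0.50, pool, calliper=0.25))
```

A case whose pool contains fewer than five members within the calliper receives one to four matches, or none, which corresponds to the completeness categories reported for the matched groups.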
In terms of completeness, for physician visits, 52 960 cases had five matches (79.80%), 12 392 cases had one to four matches (18.67%) and 1013 cases had no match (1.53%). For hospitalisations, 52 867 cases had five matches (79.66%), 12 539 cases had one to four matches (18.89%) and 959 cases had no match (1.45%). For emergency department visits, 52 885 cases had five matches (79.69%), 12 468 cases had one to four matches (18.79%) and 1012 cases had no match (1.52%).
Statistical analysis
The results of the three-way comparison used to validate the Step 3 cohort creation process are shown in figure 2. We examined physician visit rates using a negative binomial distribution with the log of person-days as the offset, obtaining rate ratios and 95% CI, and we conducted time-to-event analyses of first hospitalisations and first emergency department visits after the index date, obtaining hazard ratios and 95% CI. The time-to-event approach was used for the latter two indicators because they were rare events.
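The role of person-days as the offset can be illustrated with a crude, unadjusted rate ratio. The Python sketch below is not the negative binomial model used in the study: it uses a simple Poisson-based approximation for the CI on the log scale, and the function name and counts are made up for illustration.

```python
import math
from typing import Tuple


def crude_rate_ratio(events_a: int, person_days_a: float,
                     events_b: int, person_days_b: float,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Rate ratio (group A vs group B) with an approximate 95% CI.
    Assumes Poisson counts; SE of log(RR) = sqrt(1/events_a + 1/events_b)."""
    rate_a = events_a / person_days_a
    rate_b = events_b / person_days_b
    ratio = rate_a / rate_b
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Made-up counts: 300 visits over 2000 person-days vs 100 over 1000.
print(crude_rate_ratio(300, 2000, 100, 1000))
```

Dividing by person-days puts groups with different follow-up lengths on a common scale, which is what the log person-days offset does inside the regression model.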
Figure 2. Three-way health service use comparison to examine the validity of the physician visit/predictive modelling approach to identifying post-COVID condition (rate ratios and hazard ratios with 95% CI). ED, emergency department.
Predictors of post-COVID condition: univariate and multivariate analyses
We used univariate and multivariate logistic regression modelling to identify sociodemographic and clinical variables that were predictive or protective of PCC, producing ORs and 95% CI. Each of the univariate models had PCC as the dependent variable and one of the following as the independent variable: sex (male/female); age group (0–18, 19–29, 30–59, 60–79 and 80+); region of residence (urban/rural); income quintile (Q1–Q5); Regional Health Authority (Winnipeg, Interlake-Eastern Regional Health Authority, Prairie Mountain Health, Southern Health/Santé-Sud and Northern Health Region); hospitalisation within 14 days of positive COVID-19 test for any reason (Y/N); number of COVID-19 vaccine doses before first positive COVID-19 test (0–3+); comorbidity prior to the start of the study period (Charlson Comorbidity Index, 0–3+); immigrant to Manitoba within the last 5 years (Y/N); and First Nations (Y/N). The multivariate models included all of the variables in the list above.
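For a single binary predictor, the OR from a univariate logistic regression equals the cross-product ratio of the 2×2 table, and a Wald 95% CI follows from the SE of the log OR. The Python sketch below shows that calculation with made-up counts; the function name and numbers are illustrative assumptions, not study results.

```python
import math
from typing import Tuple


def odds_ratio_wald(exposed_cases: int, exposed_non_cases: int,
                    unexposed_cases: int, unexposed_non_cases: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """OR with Wald 95% CI from a 2x2 table.
    SE of log(OR) = sqrt(1/a + 1/b + 1/c + 1/d)."""
    a, b = exposed_cases, exposed_non_cases
    c, d = unexposed_cases, unexposed_non_cases
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Made-up counts: 20 of 100 exposed vs 10 of 100 unexposed develop the outcome.
print(odds_ratio_wald(20, 80, 10, 90))
```

An OR above 1 with a CI that excludes 1 would be read as a predictor of the outcome, and one below 1 as protective; multivariate estimates additionally adjust for the other variables and cannot be read from a single table.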
Results
PCC prevalence
As shown in figure 1, we identified 66 365 individuals who tested positive for COVID-19 from 1 March 2020 to 31 December 2021 in Step 1. Among these individuals, 11 316 (17.1%) were identified as having PCC based on their diagnostic codes and prescription drug codes in Step 2. Physician visit patterns for the remaining 55 049 individuals who tested positive for COVID-19 were then further examined in Step 3. The pre-COVID versus PCC period physician visit rate ratio for the Step 2 cohort (n=11 316) was 1.75; among the other 55 049 individuals who tested positive for COVID-19, we identified 4155 (7.5% of 55 049) individuals as likely having PCC in Step 3 based on their physician visit rate and/or the predictive modelling. This group was added to the Step 2 cohort for a total of 15 471 PCC cases. Thus, we determined the prevalence of PCC to be 23.3% in the COVID-positive population of Manitoba.
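The cohort counts reported above can be reconciled with a few lines of arithmetic. The Python sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
step1_covid_positive = 66_365
step2_pcc = 11_316
step3_additional_pcc = 4_155

remaining_after_step2 = step1_covid_positive - step2_pcc   # examined in Step 3
final_pcc_cohort = step2_pcc + step3_additional_pcc

step2_share = 100 * step2_pcc / step1_covid_positive
step3_share = 100 * step3_additional_pcc / remaining_after_step2
prevalence = 100 * final_pcc_cohort / step1_covid_positive

print(remaining_after_step2, final_pcc_cohort)
print(round(step2_share, 1), round(step3_share, 1), round(prevalence, 1))
```

The results agree with the reported values of 55 049 remaining individuals, 15 471 PCC cases, and 17.1%, 7.5% and 23.3%, respectively.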
Rate ratios for physician visits and hazard ratios for hospitalisations and emergency department visits were calculated as part of the validation process for constructing the final PCC cohort to determine whether there were significant differences in health service use at each step. Shown in figure 2, these data confirm that the rates of health service use were either the same or slightly higher between the Step 2 and Step 3 cohorts.
Predictors of post-COVID condition
In the final PCC cohort, we conducted univariate and multivariate analyses to determine the odds of developing PCC based on the cohort’s sociodemographic characteristics. The univariate analyses are presented in table 2 as ORs and 95% CI. Female individuals and those over the age of 59 had higher odds of developing PCC, as did people in lower-income brackets, people with comorbidities (measured by the Charlson Comorbidity Index) and people who were hospitalised within 14 days of a positive COVID-19 test. First Nations also had higher odds of developing PCC than other Manitobans, whereas immigrants who arrived in Manitoba within the last 5 years had lower odds. Receiving the COVID-19 vaccine was protective against developing PCC, and a higher number of vaccine doses (from 0 to 3+) conferred greater protection against PCC.
Table 2. Predictive characteristics for developing PCC in the population of Manitoba, Canada (univariate ORs and 95% CI)
The multivariate analyses are shown in table 3. Even after adjusting for age, sex, income and region of residence, many of the same factors were predictive of developing PCC, but a few discrepancies between the univariate and the multivariate findings are notable: in the multivariate findings, lower income was no longer a significant predictor of PCC, nor was living in the more remote regions of the province (eg, the Northern Health Region).
Table 3. Predictive characteristics for developing PCC in the population of Manitoba, Canada (multivariate ORs and 95% CI)
Discussion
Our study aimed to develop an administrative data-driven analytic approach to identifying PCC cases in the general population. We identified PCC cases by first identifying individuals with a positive COVID-19 PCR test and then used diagnostic codes, prescription drug codes and patterns of health service use to develop a cohort of 15 471 individuals with PCC, which represents a prevalence of 23.3% in the segment of the Manitoba population who tested positive for COVID-19. In this cohort, individuals who were female, were aged 59+, were First Nations, had one or more comorbidity, or were hospitalised within 2 weeks of a positive COVID-19 test had higher odds of developing PCC than other Manitobans. Vaccination for COVID-19 (one or more doses) was protective against PCC.
In the literature, we observed that others using administrative data to examine PCC prevalence generally sought to define PCC using a combination of health provider claims, hospital discharge abstracts and/or electronic health records.29 45 46 A study from British Columbia, Canada, used data from patients attending a PCC-specific clinic and patients with an ICD-10-coded hospital admission for PCC and then identified other potential PCC cases with similar demographics, pre-existing conditions and COVID-19 symptoms using an elastic net penalised logistic regression model; they reported a PCC prevalence of 18% in the COVID-positive population.47 The authors acknowledged that they likely underestimated the prevalence of PCC since their analyses were based on the characteristics of severely impacted patients who were either hospitalised or referred to the specialised PCC clinic. However, they had the advantage of established ICD-10 codes, whereas we did not have a specific code for PCC at the time of our study. Other possible reasons for the difference between our estimate and others’ are the lag time between a positive PCR test and the development of PCC symptoms (some authors used 30–60 days,48 49 while we used the 90-day WHO definition50) and the severity of illness in the starting population (eg, PCC among people hospitalised for acute COVID-19 infection51).
The major challenge we encountered in developing our analytic approach was that people experiencing PCC symptoms were not readily identified in the Manitoba Population Research Data Repository. Without access to a specific ICD code for PCC, we instead used surrogate markers (physician billing claim codes, prescribed medications and physician visit and hospitalisation patterns), but we cannot be certain that these were all directly related to PCC. In addition, outpatient billing practices in Manitoba capture only one main diagnosis, potentially limiting the information we had available to identify PCC cases. As well, a possible lack of awareness or understanding of PCC at the time of providing the clinical care and completing the billing claims could have resulted in providers not screening for persistent symptoms and patients not seeking medical help specifically for PCC, especially if symptoms were mild. In an online survey of Manitobans in 2022, 62.6% indicated that PCC symptoms impacted their daily lives, but only 50.8% sought healthcare (among these, 65.9% primary care, 15.2% emergency department and 32.0% specialist or therapist).52 To account for these limitations, we relied on the breadth of data available in the repository, drawing from multiple linked databases and using analytic techniques to assemble the PCC cohort. At each analytic step, the research team discussed how the results aligned with their clinical observations and with the literature, and this discussion informed the next step.
A second challenge in our approach was that we were not certain which variables should be included in the predictive models to identify potential PCC cases, and it is possible that our approach overestimated the prevalence of PCC by including false positives in the cohort. As more research on PCC becomes available, our model could be adjusted to include additional variables, thus increasing sensitivity, and leading to a more refined classification of PCC cases. It would also be helpful to include clinical notes in our administrative data definition of PCC. Our structured administrative claims data currently lack the context that would be provided by linked clinical notes from physician visits. Recent advances in natural language processing would facilitate the incorporation of free text into analyses of structured administrative data.53–55
With respect to PCC predictors, we identified several sociodemographic factors that were associated with higher odds of developing PCC. Several other studies have reported similar findings, particularly the higher odds of PCC among female and/or older individuals, people who were severely ill after contracting acute COVID-19 (eg, required hospitalisation) and people with certain types of comorbidities.54 56–60 Higher odds of PCC among First Nations have not been reported previously, although there is ample published research showing how colonial practices and policies have disadvantaged First Nations and other Indigenous populations over many generations and placed them at risk for poor health outcomes, particularly in a public health crisis like the SARS-CoV-2 pandemic.61–63 A recent study reports on the public health measures implemented in Manitoba, how First Nations acted to protect their communities from COVID-19 and their success in advocating for vaccine prioritisation.64 Immigrants having lower odds of developing PCC than other Manitobans can perhaps be explained by the healthy immigrant effect,18 although examining sub-groups of immigrants would likely provide more insight into this finding. In our current study, vaccination was found to be protective against PCC, and this finding is generally supported by existing literature,54 57 59 65 although a 2023 study by Luo et al 56 did not find a significant decrease in PCC risk with more vaccine doses, possibly because the study examined a later time period than ours, where most individuals already had 3+ doses of the COVID-19 vaccine. Finally, while some of our analyses (data not presented here) indicate an income gradient in positive COVID-19 tests, this gradient is not evident in the PCC population identified in our study.
The method we present here for estimating the prevalence of PCC using linked administrative data is widely generalisable to other jurisdictions with access to health system data and can be used to follow long-term outcomes in the PCC population. It will serve as the basis for further research (ours and others’) to follow the natural history and health outcomes of PCC. In particular, we plan to examine PCC outcomes in Indigenous and immigrant populations in more detail. Our findings are being shared widely within our research team’s networks of healthcare providers and planners to support the resourcing and implementation of health services for patients living with PCC. As well, the ongoing discourse on the pathogenetic mechanisms of PCC draws many parallels with other post-viral conditions,66 and as such, our approach could inform future research to better understand the prevalence and course of these conditions.
A few limitations of our study approach warrant mention. By definition, our study cohorts only included those with a positive COVID-19 test result. Access to COVID-19 tests may not have been equitable across all of Manitoba for the entire study period, especially during early stages of the pandemic. It is possible that individuals with minor symptoms chose not to complete a test and thus our method selects for more severe disease. We also assumed that patients hospitalised with COVID-19 had more severe disease than those who were not hospitalised. Finally, the constant genetic variation, the evolution of new variants and sub-variants and the changing application of testing and vaccination modalities limit our ability to describe the current prevalence of COVID-19 and PCC. However, as more data become available, we could conduct time series analyses based on the emergence of new variants to address this limitation.
Conclusion
Population-wide administrative data have the potential to address important gaps in our knowledge of emerging health conditions and can provide nuanced information for clinical and health system recovery and prevention efforts in the aftermath of a global public health crisis like the SARS-CoV-2 pandemic. Their value may be further enhanced by the use of natural language processing to link unstructured clinical data to existing repositories, which would provide additional context to structured administrative data.
Data availability statement
Data are available upon reasonable request. Data may be obtained from a third party and are not publicly available. The source data used in this study were originally collected during the routine administration of health services in Manitoba. They were provided to the Manitoba Centre for Health Policy (MCHP) for secondary use in research under specific data sharing agreements between the data trustees and MCHP. The data are approved for use at MCHP only. They are not owned by the researchers or by MCHP and cannot be deposited in a public repository. To review source data specific to this article or project, interested parties should contact the MCHP Repository Access & Use team at MCHP.Access@umanitoba.ca. The team will then facilitate data access by seeking the consent of the original data holders and the required privacy and ethics review bodies. Research studies using First Nations data require ethics approval from the Health Information Research Governance Committee at the First Nations Health and Social Secretariat of Manitoba (https://www.fnhssm.com/hirgc), and we comply with their policies for data access, linkage and sharing. For inquiries about accessing Manitoba First Nations data, please contact info@fnhssm.com.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
Ethics approvals were granted by the University of Manitoba’s Health Research Ethics Board (#HS25090 H2021:279) and the Health Information Research Governance Committee at the Manitoba First Nations Health and Social Secretariat (2020). The study protocol was reviewed by the Manitoba Government Health Information Privacy Committee (HIPC #2021/2022-18).
Acknowledgments
The authors acknowledge the Manitoba Centre for Health Policy for use of data contained in the Manitoba Population Research Data Repository under project #2021-033 (HIPC/PHRPC #2021/2022-18). Data used in this study are from the Manitoba Population Research Data Repository housed at the Manitoba Centre for Health Policy, University of Manitoba, and were derived from databases provided by Manitoba Health; Statistics Canada; Shared Health Manitoba; Immigration, Refugees, and Citizenship Canada; First Nations Health and Social Secretariat of Manitoba; and Manitoba Primary Care Research Network. The results and conclusions are those of the authors and no official endorsement by Manitoba Health or other data providers is intended or should be inferred. We would also like to thank Annshirley Aba Afful, Quinn Anhorn, Dana Kliewer, Katie Kitchen, Marlee Mayer, Saman Muthukumaranra and Gillian Martens for their contributions to analysis planning and manuscript writing.