Fusion versus decompression alone for lumbar degenerative spondylolisthesis and spinal stenosis: a target trial emulation with index trial benchmarking

Study design, target trial emulation, and benchmarking against index trial

We described and emulated a hypothetical, pragmatic target trial mimicking a state-of-the-art index RCT comparing decompression with or without fusion for degenerative spondylolisthesis (DS): the Norwegian Degenerative Spondylolisthesis and Spinal Stenosis trial (NORDSTEN-DS) [13]. We used data from the Lumbar Stenosis Outcome Study (LSOS), a multicenter cohort study conducted in 4 hospitals and 8 specialized clinical units in Zurich, Switzerland. Details on the LSOS are available elsewhere [17]. The target trial was specified in terms of eligibility criteria, treatment strategies, treatment assignment, outcomes, follow-up, causal estimand, and statistical analysis (sTable 1 in Supplemental File 1). Time zero was the time at which a patient met the eligibility criteria and was assigned to treatment. Comparative effectiveness estimates for the primary outcome from the emulation were benchmarked against those of the index RCT at 2 years. If these estimates led to the same clinical decisions as the index RCT, the emulation would be extended to 3 years [18]. The study was preregistered in the Open Science Framework (OSF) [19]. Our study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) recommendations [20] and was informed by applicable work-in-progress guidelines [21].

Eligibility criteria

A total of 1716 patients were screened for eligibility to participate in the LSOS from December 2010 to December 2015. To be eligible for the LSOS, patients had to be 50 years or older, have unilateral or bilateral neurogenic claudication, have a life expectancy of more than one year, be able to provide informed consent, and be fluent in German [17]. Exclusion criteria for the LSOS were evidence of tumor, fracture, or infection, lumbar scoliosis of more than 15°, and peripheral artery occlusive disease. For this analysis, we included patients with lumbar spinal stenosis and concomitant spondylolisthesis, verified on magnetic resonance imaging (MRI) as slippage of one vertebra over the adjacent one, who underwent decompression with or without fusion within 6 months after enrollment in the study. For the primary analysis, we restricted our study population to patients with complete primary outcome follow-up data. The LSOS followed the ethical principles of the Declaration of Helsinki and all applicable laws and regulations. Institutional review board approval was obtained from the Cantonal Ethics Committee of Zurich (KEK-ZH-Nr: 2010-0395/0).

Treatment groups

Patients undergoing decompression alone received open bilateral decompression or unilateral laminotomy with bilateral decompression of the affected disc level. Those in the decompression and fusion group additionally received pedicle screws with rods plus interbody fusion cages (sFigure 1). Procedures were performed only by experienced orthopedic surgeons or neurosurgeons with more than 10 years of experience.

Follow-up and outcomes

The primary outcome was the change in health-related quality of life at 3-year follow-up, measured with the EuroQol Health-Related Quality of Life 5-Dimension 3-Level questionnaire (EQ-5D-3L), a well-established instrument comprising five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Health states from the EQ-5D-3L were converted into a summary index score using country-specific value sets; German and French value sets were used to calculate the summary indices.
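The conversion from a health state to a summary index can be illustrated with a minimal Python sketch of a generic additive value set. This is an assumption-laden illustration, not the German or French value set: the decrements and constant below are toy numbers chosen only to show the arithmetic, and some published value sets add further terms (for example, an indicator for any dimension at level 3). The function name `eq5d3l_index` is hypothetical.

```python
from typing import Mapping

# Dimension order of the EQ-5D-3L descriptive system
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def eq5d3l_index(state: str,
                 decrements: Mapping[str, Mapping[int, float]],
                 constant: float = 0.0) -> float:
    """Generic additive index: 1 minus a constant (applied only when the state
    is not full health) minus one decrement per dimension at level 2 or 3.
    `state` is a 5-digit profile such as "21232"."""
    if len(state) != 5 or any(c not in "123" for c in state):
        raise ValueError(f"invalid EQ-5D-3L state: {state!r}")
    levels = [int(c) for c in state]
    if all(level == 1 for level in levels):
        return 1.0  # full health anchors the index at 1
    index = 1.0 - constant
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            index -= decrements[dim][level]
    return index


# TOY decrements for illustration only; NOT a published value set
TOY = {d: {2: 0.1, 3: 0.3} for d in DIMENSIONS}
baseline = eq5d3l_index("22232", TOY, constant=0.05)
follow_up = eq5d3l_index("11121", TOY, constant=0.05)
change = follow_up - baseline  # change score analysed as the outcome
print(round(baseline, 3), round(follow_up, 3), round(change, 3))
```

In practice the decrements would be replaced by the published coefficients of each country-specific value set, which is why the German and French indices for the same patient can differ.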

Secondary outcomes at 3-year follow-up included:

Change in Numeric Rating Scale (NRS) for back or leg pain intensity. NRS scores range from 0 to 10, with 10 indicating worst pain imaginable.

Change in Spinal Stenosis Measure (SSM) satisfaction subscale scores. SSM satisfaction subscale scores range from 1 to 4, with 4 indicating very dissatisfied.

Physical therapy utilization (binary: yes/no).

Oral analgesic use (binary: yes/no).

Exploratory adverse event outcomes included:

Complications, either intraoperative (e.g., bleeding, dural injury) or postoperative (e.g., revision due to wound infection, hematoma evacuation, wound revision, dural leakage revision).

Revision surgery, either due to restenosis, infection, or epidural hemorrhage, or with additional fusion.

Statistical analysis

The primary analysis compared standardized mean differences in German and French EQ-5D-3L change scores between patients undergoing decompression with or without fusion at 3-year follow-up. Standardized mean differences in change scores and 95% confidence intervals (CIs) were estimated using weighted ordinary least squares models. The secondary analysis compared standardized mean differences in NRS and SSM satisfaction subscale change scores at 3 years. We used inverse probability weighting for confounding control to obtain groups balanced at baseline [22]. Individual observations were weighted by the inverse of their probability of receiving decompression plus fusion surgery given a set of confounders (i.e., the higher the probability, the smaller the weight assigned to an individual observation). The probability of receiving decompression plus fusion was estimated from a logistic regression model with fusion as the dependent variable and the following baseline covariates: age, sex, body mass index (BMI), months since onset of current complaints, compromise of the foraminal zone (left and right), smoking status, educational level, back pain, gluteal pain, thigh pain, lower leg pain, presence of spinal instability (operationalized as facet degeneration with effusion of > 2 mm), spondylolisthesis severity (Meyerding grading system), number of vertebral levels operated on, number of vertebral levels with spondylolisthesis (Meyerding grade I or higher), diabetes, civil risk (operationalized as either living alone, or living in a nursing or residential home and being single, divorced, or widowed), anxiety (Hospital Anxiety and Depression Scale [HADS] anxiety subscale), depression (HADS depression subscale), and EQ-5D-3L scores (German and French). For added modelling flexibility, age, BMI, anxiety, and depression were entered into the model as B-splines with three degrees of freedom. Extreme weights were trimmed at the 99th percentile, in line with best practice [22].
Confounder selection was guided by published literature, clinical expertise, and causal thinking. Two-sided p values < 0.05 indicated statistical significance.
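The weighting steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' R code: it uses the common average-treatment-effect form of the weights (1/p for decompression plus fusion, 1/(1 - p) for decompression alone), interprets "trimming" as capping weights at the 99th percentile, replaces the full covariate list with a single simulated confounder, and omits the B-spline terms (in R such a basis could be built with `splines::bs(x, df = 3)`). Dividing by the standard deviation of the change scores to obtain a standardized difference is also an assumption. All function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Unpenalized logistic regression by Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def ipw_weights(covariates, treated, trim_percentile=99.0):
    """Inverse probability of treatment weights, capped at a percentile.
    Fusion patients get 1/ps; decompression-alone patients get 1/(1 - ps)."""
    X = np.column_stack([np.ones(len(treated)), covariates])
    ps = 1.0 / (1.0 + np.exp(-(X @ fit_logistic(X, treated))))
    w = np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    return np.minimum(w, np.percentile(w, trim_percentile))


def weighted_ols_effect(change, treated, w):
    """Treatment coefficient from weighted OLS of change on [1, treated]."""
    X = np.column_stack([np.ones(len(change)), treated])
    XtW = X.T * w                       # X'W without forming an n-by-n matrix
    beta = np.linalg.solve(XtW @ X, XtW @ change)
    return beta[1]


# Simulated example (not LSOS data): one confounder x drives both treatment
# choice and outcome; the true effect on the change score is 0.2.
rng = np.random.default_rng(2023)
n = 20_000
x = rng.normal(size=n)
treated = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * x))).astype(float)
change = 0.2 * treated + 0.5 * x + rng.normal(scale=0.3, size=n)

w = ipw_weights(x[:, None], treated)
est = weighted_ols_effect(change, treated, w)
naive = weighted_ols_effect(change, treated, np.ones(n))
smd = est / np.std(change, ddof=1)      # assumed standardization
print(f"IPW: {est:.3f}  naive: {naive:.3f}  standardized: {smd:.3f}")
```

With only the treatment indicator in the outcome model, the weighted OLS coefficient equals the weighted difference in mean change scores; confidence intervals would additionally need robust (sandwich) or bootstrap standard errors, which are not shown.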

The healthcare resource utilization analysis compared the odds of physical therapy utilization and oral analgesic use at 1-, 2-, and 3-year follow-up. Odds ratios and 95% CIs were estimated using weighted generalized linear models with a logit link.
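A minimal Python sketch of the weighted odds ratio follows, assuming the logit model contains only an intercept and the treatment indicator (the section does not state whether further covariates were included). Such a model is saturated, so its weighted maximum-likelihood fit reproduces the weighted outcome proportion in each arm, and the odds ratio can be computed directly from the weighted odds. The function name and toy data are hypothetical.

```python
import numpy as np


def weighted_odds_ratio(outcome, treated, w):
    """Odds ratio from a weighted logit model containing only an intercept and
    a treatment indicator. That model is saturated, so exp(beta_treatment)
    equals the ratio of the weighted odds in the two arms."""
    outcome, treated, w = (np.asarray(a, dtype=float) for a in (outcome, treated, w))

    def weighted_odds(mask):
        p = np.sum(w[mask] * outcome[mask]) / np.sum(w[mask])
        if not 0.0 < p < 1.0:
            raise ValueError("odds undefined when an arm has all 0 or all 1 outcomes")
        return p / (1.0 - p)

    return weighted_odds(treated == 1) / weighted_odds(treated == 0)


# Toy example: arm 1 = decompression plus fusion, arm 0 = decompression alone
used_pt = [1, 1, 1, 0, 1, 0, 0, 0]
fusion = [1, 1, 1, 1, 0, 0, 0, 0]
# With equal weights the odds are 3 and 1/3, so the odds ratio is about 9
print(weighted_odds_ratio(used_pt, fusion, np.ones(8)))
```

In the analysis described above, this would be repeated for each outcome at 1, 2, and 3 years using the inverse probability weights; as with the primary analysis, confidence intervals would require robust or bootstrap standard errors.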

For the primary outcome, we prespecified a superiority threshold of 0.20 for the standardized mean difference in the German EQ-5D-3L summary index favoring decompression plus fusion, informed by suggested minimal clinically important differences in the LSOS [23].

Sensitivity analysis

Because we restricted the study population for the primary analysis to patients with complete 3-year primary outcome follow-up data, we assessed the robustness of our comparative effectiveness estimates by repeating the primary analysis in all patients who underwent surgery within 6 months after baseline, irrespective of the completeness of their follow-up data. Under the assumption that data were missing at random, we performed multiple imputation by chained equations (MICE) with predictive mean matching, generating 20 imputed datasets for the 258 patients [24]. We analyzed each imputed dataset and combined the comparative effectiveness estimates using Rubin's rules. All analyses were performed with R version 4.2.2 [25].
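Rubin's rules, used to combine the per-imputation estimates, can be sketched in a few lines of Python. The pooled estimate is the mean of the m estimates; the total variance is T = W + (1 + 1/m)B, where W is the mean within-imputation variance and B the between-imputation variance. This sketch uses the classic large-sample degrees-of-freedom formula; R's `mice::pool()` applies the Barnard-Rubin small-sample adjustment, so its degrees of freedom can differ. The function name `pool_rubin` is hypothetical.

```python
import math
from statistics import fmean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = fmean(estimates)            # pooled point estimate
    w_bar = fmean(variances)            # within-imputation variance
    b = variance(estimates)             # between-imputation variance (m - 1 denominator)
    total = w_bar + (1 + 1 / m) * b     # total variance T
    if b == 0:
        return q_bar, total, math.inf
    r = (1 + 1 / m) * b / w_bar         # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2     # classic Rubin degrees of freedom
    return q_bar, total, df


q, t_var, df = pool_rubin([0.10, 0.20, 0.30], [0.01, 0.01, 0.01])
print(q, t_var, df)
```

A 95% CI then follows as the pooled estimate plus or minus the 0.975 quantile of a t distribution with `df` degrees of freedom times the square root of T (for example via `scipy.stats.t.ppf(0.975, df)`).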

Power consideration

With a study population of 215 patients with complete follow-up data, an expected mean difference of 0.2 in EQ-5D-3L scores, and an assumed standard deviation of 0.3 [12], we expected 99% power to reject the null hypothesis of no between-group difference at a two-sided α level of 0.05. Under the same sample size, standard deviation, and α level, and aiming for 80% power, we anticipated being able to detect a between-group difference of 0.13 in EQ-5D-3L scores.
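The power reasoning above can be reproduced approximately with a two-sample normal (z) approximation, sketched below in Python using only the standard library. The split of the 215 patients between the two groups is not reported in this section, so the 107/108 split used here is a hypothetical 1:1 allocation; the detectable difference grows as the allocation becomes more unbalanced. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power_two_sample(diff, sd, n1, n2, alpha=0.05):
    """Approximate two-sided power for a difference in means (z approximation)."""
    se = sd * sqrt(1 / n1 + 1 / n2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # probability of falling in either rejection region
    return Z.cdf(diff / se - z_crit) + Z.cdf(-diff / se - z_crit)


def detectable_difference(sd, n1, n2, power=0.80, alpha=0.05):
    """Smallest true difference detectable with the given power."""
    se = sd * sqrt(1 / n1 + 1 / n2)
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * se


# Hypothetical 1:1 split of 215 patients
print(f"power for 0.2: {power_two_sample(0.2, 0.3, 107, 108):.4f}")
print(f"detectable at 80%: {detectable_difference(0.3, 107, 108):.3f}")
```

A t-based calculation (as in most sample size software) gives very similar numbers at this sample size.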
