Forgetting‐by‐not‐doing: The case of surgeons and cesarean sections

1 INTRODUCTION

Since the seminal report by Luft et al. (1979), there has been growing evidence of a positive association between volume and quality in the provision of health services for a wide variety of procedures, time periods, and locations.1 Nevertheless, the debate about the causal direction of this relationship is far from settled (Halm et al., 2002; Ho, 2014).

Two principal hypotheses have been put forward to explain this association: (i) “learning-by-doing” (or “practice-makes-perfect”) and (ii) “selective referral” (Luft et al., 1987).2 Under “learning-by-doing,” increased experience leads to improvement in skills, which in turn results in better quality as measured by patient outcomes. “Selective referral,” instead, occurs when providers with higher quality attract a larger volume of patients. Identifying which of the two drives the correlation between volume and outcome matters because they have opposite policy implications. If volume causes outcome, as learning-by-doing suggests, then concentrating procedures in fewer and bigger providers would raise quality. However, if causality runs from outcome to volume, those benefits disappear, and concentration would only lead to reduced competition between providers and lower geographical coverage.

This paper aims at causally identifying whether learning-by-doing is present at the individual level in the healthcare sector, more specifically, for surgeons performing cesarean sections (C-sections). In particular, I look at whether a surgeon's recent procedure volume affects patient outcomes.3 In order to establish a causal relationship, I benefit from the fact that, due to state regulation, most pregnant women in Italy do not choose the gynecologist that will help them give birth within the public system. This institutional feature creates a setup where selective referral is not possible.

I make use of a census of birth certificates from a large public hospital in Italy for the period 2011–2014 that contains a surgeon identifier for each surgery. Even though patients cannot choose a particular physician, the hospital may assign physicians with higher skills to patients with a higher health risk (selective allocation). To address this concern I use a fixed effect model and rely on changes in volume within surgeon for the estimation. I find strong evidence of learning-by-doing for C-section surgeons: operations performed by physicians with higher recent experience result in better newborn health. More specifically, I find that a one standard deviation increase in a surgeon's experience in the previous 4 weeks lowers a newborn's probability of having a low Apgar score by 13.2%4 and of being admitted to a neonatal intensive care unit (NICU) by 13.8%. These effects are only present for emergent C-sections (not for elective C-sections), that is, cases in which the surgeon has to make crucial decisions against the clock.

One important assumption for these results to hold is the absence of any form of dynamic matching between physicians and patients. If, for example, hospitals aware of depreciating skills assigned healthier patients to physicians with low recent activity, the estimates would be biased toward zero. In this case the results should be considered a lower bound of the true effect. To alleviate this concern, I first show that pre-treatment pregnancy and mother characteristics are uncorrelated with a physician's recent experience. Second, one would expect that, if there is some form of dynamic matching between physicians and patients, emergent cases should get the more experienced physicians, biasing my results downwards. However, as mentioned before, I see the strongest effects of recent experience on these non-elective cases. Third, I implement a sensitivity check using a bounding approach following Oster (2019) and find that unobservables are unlikely to explain these results. Finally, I perform a robustness check by using different windows for recent experience, from 4 to 52 weeks. Results show that only very recent experience matters, providing further evidence for the human capital depreciation hypothesis.

Cesarean sections are an attractive procedure for testing the surgeon “learning-by-doing” hypothesis. Unlike other highly studied procedures that are performed by a team of surgeons, C-sections are executed by only one surgeon, allowing for better estimates of the individual surgeon's learning curve. In addition, for many developed countries, C-sections have become the most common surgical procedure.5 Furthermore, the discussion on volume-outcome effects becomes all the more relevant in view of the recent wave of closures of maternity services in various countries (e.g., US, Canada, UK, Japan, France, the Netherlands, and others).6 To the best of my knowledge, this is the first paper to obtain causal estimates of learning-by-doing for the case of cesarean section surgeons.

1.1 Literature review

There are hundreds of papers in the medical literature finding an association between higher hospital or surgeon procedure volume and better health outcomes (Birkmeyer et al., 2003; Chowdhury et al., 2007; Halm et al., 2002). However, these studies are mostly observational and tend to neglect the potentially endogenous nature of provider volume. Few studies have attempted to translate the association between volume and outcome into a causal relationship, and most rigorous econometric analyses have failed to identify learning-by-doing (Ho, 2014).

At the hospital-level, studies on learning-by-doing typically use lagged or cumulative volume as covariates of interest, and find no support for the learning-by-doing hypothesis (Gaynor et al., 2005; Ho, 2002; Sfekas, 2009). One exception is Avdic et al. (2019), who find a positive effect of hospital operation volume on patients' survival using Swedish register data on advanced cancer surgery procedures. They exploit the closures and openings of entire cancer clinics as an exogenous variation for volume in an instrumental variable set up. Importantly, they provide suggestive evidence that the effect on outcome is mainly due to increases in individual surgeon's experience.7

However, the literature testing for volume-outcome effects using individual (surgeon) level data is much more limited and it finds mixed results. On the one hand, Huesch (2009) and Contreras et al. (2011) fail to find any association between cumulative surgeon procedure volume and patient's health. Using longitudinal data for a specific eye surgery (LASIK) in one clinic in Colombia, Contreras et al. (2011) find no effect of cumulative volume on outcome. They exploit a quasi-random allocation of surgeons to patients which makes selective allocation less of a problem. Similarly, Huesch (2009) fails to find any effect of cumulative volume on outcome for a panel of surgeons performing coronary artery bypass grafts (CABG) in Florida in the period 1998–2006. Moreover, he finds that almost all prior experience is depreciated from one quarter to the next. The author uses a choice model and predicted volume to mitigate potential issues of selective referral, although he does not reject exogeneity of volume.

On the other hand, Ramanarayanan (2008) and Huckman and Pisano (2006) find evidence of strong learning-by-doing effects at the physician level when using a measure of recent experience as their covariate of interest—instead of cumulative volume. Ramanarayanan (2008) studies the same dataset for CABG surgeons as Huesch (2009) but uses the departure of a surgeon as an exogenous shock to the yearly volume of the remaining physicians. Instead, Huckman and Pisano (2006) do not discuss potential bias to a great extent and confine themselves to using surgeon risk-adjusted mortality as quality controls. They also focus on CABG cases—although their data comes from Pennsylvania for 1994 and 1995—and find that the mortality rate of patients decreases significantly with increases in the surgeon's experience in the previous calendar quarter.8

The current paper contributes to the existing literature in several ways. First, it provides new evidence of the causal link between patient outcomes and surgeon experience. As shown above, the literature on volume-outcome effects at the individual level is in its early stages and more research is needed. Second, previous studies looking at the causal effects of volume on outcome rely mostly on instrumental variable estimates. This paper complements them by exploiting a setup where selective referral is not possible, together with a dataset that allows me to estimate the effect from within-surgeon variation in volume. In addition, most previous studies use health care data from the United States, and focus almost exclusively on coronary artery disease procedures. Finally, the data employed allow me to measure more precisely the volume of patients seen by each surgeon, while previous studies have relied mainly on yearly or quarterly data.

2 CLINICAL AND INSTITUTIONAL SETTING

2.1 The performance and organization of cesarean sections

A cesarean section (C-section) is a major surgical procedure in which a fetus is delivered through an incision in the mother's abdomen and uterus (American College of Obstetricians and Gynecologists, 2010). The procedure typically takes 45 min to an hour, and most mothers and babies stay in the hospital for 2 to 3 days.

Based on their degree of urgency, C-sections are typically classified in two groups: elective (or scheduled), and emergent (Lucas et al., 2000). The first group includes all C-sections scheduled in advance to occur before labor begins on the basis of an obstetrical or medical indication, although there is no immediate maternal or fetal compromise. The second group includes all cases where the patient attempts to have a vaginal delivery (either through the natural onset of labor or medical induction) but ends up delivering by C-section instead. This occurs when the patient develops complications that endanger the health of the infant and/or the mother, and the physician therefore recommends switching the delivery method to surgery.

2.2 The Italian health care system and C-sections

Italian health care is a universal, public-private insurance system. The public part is the national health service, Sistema Sanitario Nazionale (SSN), which is administered on a regional basis. According to the World Health Organization, in 2000 the Italian system provided the second best overall health care in the world, after France (WHO, 2000). Furthermore, it has the lowest maternal mortality rate worldwide at 1.94 for every 100,000 births (WHO et al., 2019).

Under this system, a pregnant woman cannot choose the physician or midwife who will assist her during delivery unless she pays. Furthermore, given how well the system functions, the vast majority (89%) of women choose to use the public service (Ministero della Salute, 2019).9 This institutional feature eliminates the risk of selective referral, where institutions or surgeons with better performance attract higher volumes of patients, a common endogeneity issue in studies of learning-by-doing.

In the year 2016, Italy had an overall C-section rate of 36.8%, with great disparities between the public (31.7%) and the private sector (50.9%) (Ministero della Salute, 2019). According to the Health Statistics 2019 from the Organisation for Economic Co-operation and Development (OECD), Italy has the seventh largest C-section rate among OECD countries, with Turkey showing the highest rate at 53.1% and Israel the lowest one at 14.8% (the OECD average was 28.1%). This cross-country variation can be linked not only to patient characteristics, but also to non-medical reasons. Previous work has found that higher C-section rates can be the result of higher fees in comparison to vaginal deliveries (Alexander, 2013; Allin et al., 2015; Gruber et al., 1999; Gruber & Owings, 1996), defensive medicine (Bertoli & Grembi, 2019; Currie & MacLeod, 2008; Dranove & Watanabe, 2010) and physician's scheduling convenience (Facchini, 2020; Lefèvre, 2014).10

Physicians working in the delivery room in public hospitals are paid a fixed salary, which means they have no personal financial incentive to recommend any particular treatment. They have full-time contracts and are hence not allowed to work in other hospitals, either public or private—although they may take leaves to visit other institutions.

3 EMPIRICAL METHODOLOGY

3.1 Empirical model

The main question addressed in this paper is whether there is learning-by-doing in cesarean section surgeons. I test this by looking at whether a surgeon's recent experience ($e_{st}$) has an impact on the next surgery's outcome. Thus I estimate a reduced-form model of the following type:

$$y_{ist} = \beta e_{st} + \gamma d_{st} + x_{it}'\delta + \phi_t + \eta_s + \varepsilon_{ist} \qquad (1)$$

where $y_{ist}$ is a health indicator for patient $i$ whose procedure was performed by surgeon $s$ at time $t$. Surgeon's recent experience is defined as the number of C-sections performed in the 4 weeks leading up to and including the procedure on the patient surgeon $s$ operated on just before operating on patient $i$.11 $d_{st}$ is a control for the number of days since the prior cesarean section surgeon $s$ performed.12 $x_{it}$ contains individual-level control variables for mother and pregnancy characteristics.13 $\phi_t$ is a vector of indicators for year, month and day of the week of delivery.
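The recent-experience measure and the days-since-prior-surgery control described above can be illustrated with a small sketch. It is only one reading of the definition: it assumes a 28-day window, assumes that surgeries by a given surgeon are already sorted by date, and counts only earlier surgeries, so the current one is excluded. The function name `recent_experience` and the toy dates are hypothetical and do not come from the paper's data.

```python
from datetime import date, timedelta

def recent_experience(surgery_dates, window_days=28):
    """For ONE surgeon's surgeries (dates sorted ascending), return a list of
    (number of earlier surgeries within the previous `window_days`,
     days since the prior surgery, or None for the first observed surgery)."""
    out = []
    for i, day in enumerate(surgery_dates):
        window_start = day - timedelta(days=window_days)
        # Only surgeries with a smaller index are counted: the current one is excluded.
        count = sum(1 for prev in surgery_dates[:i] if prev >= window_start)
        gap = (day - surgery_dates[i - 1]).days if i > 0 else None
        out.append((count, gap))
    return out

# Hypothetical dates for a single surgeon.
dates = [date(2012, 1, 2), date(2012, 1, 10), date(2012, 1, 25), date(2012, 3, 1)]
result = recent_experience(dates)
for d, (count, gap) in zip(dates, result):
    print(d, "recent experience:", count, "days since prior CS:", gap)
```

In this toy series the last surgery follows a gap longer than 28 days, so its recent experience is zero; observations of that kind are the ones the "active surgeon" restriction in Section 3.2 removes.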

Individual surgeon fixed effects ($\eta_s$) are included to mitigate concerns that the captured relationship between outcomes and recent experience is driven by composition effects. Surgeon fixed effects ensure that the recent experience parameter in Equation (1) is identified from changes in volume within surgeon.14 As discussed above, if physicians' skills improve with recent repetition, then $\beta$ should be negative: since outcomes are defined as adverse, a higher recent volume of surgeries would help (partially) avoid the loss of skills. By contrast, a coefficient close to zero would imply that there is full depreciation and recent experience does not affect current outcome.
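To make the role of surgeon fixed effects concrete, the sketch below applies the within transformation (subtracting surgeon-specific means) before computing a single-regressor OLS slope. It is a simplified illustration, not the paper's estimation code: it omits the controls, time indicators and clustered standard errors, and the function name `within_slope` and the toy numbers are hypothetical.

```python
from collections import defaultdict

def within_slope(surgeon_ids, x, y):
    """OLS slope of y on x after demeaning both within surgeon
    (fixed-effects / within estimator with a single regressor)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # per surgeon: sum of x, sum of y, count
    for s, xi, yi in zip(surgeon_ids, x, y):
        sums[s][0] += xi
        sums[s][1] += yi
        sums[s][2] += 1
    sxy = sxx = 0.0
    for s, xi, yi in zip(surgeon_ids, x, y):
        sum_x, sum_y, n = sums[s]
        dx = xi - sum_x / n  # deviation from the surgeon's mean experience
        dy = yi - sum_y / n  # deviation from the surgeon's mean outcome
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx

# Toy data: surgeon "b" has both higher volume and worse average outcomes,
# while within each surgeon the adverse outcome falls by 0.02 per extra C-section.
surgeons = ["a", "a", "a", "b", "b", "b"]
experience = [1, 2, 3, 4, 5, 6]
outcome = [0.18, 0.16, 0.14, 0.42, 0.40, 0.38]

beta_within = within_slope(surgeons, experience, outcome)
# Treating everyone as one group gives the pooled slope, which mixes in composition effects.
beta_pooled = within_slope(["all"] * 6, experience, outcome)
print(beta_within, beta_pooled)
```

In this constructed example the pooled slope is positive even though the within-surgeon slope is negative, which is the kind of composition effect the fixed effects are meant to remove.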

One important assumption for the previous model to obtain causal effects is the lack of any compositional effect of patients between physicians with different levels of recent experience. To test whether selection bias affects my estimates, I regress each pre-treatment characteristic on the treatment—that is, the number of C-sections performed in the last 4 weeks. If observed characteristics were associated with recent experience, it would be a sign of patient selection. The results for these estimations are reported graphically in Figure 1. After controlling for physician and time fixed effects, the treatment does not predict any of the observed maternal and pregnancy characteristics.15 This provides further evidence that mothers undergoing surgery with a physician with higher or lower recent experience are similar in observable characteristics.

FIGURE 1. Balanced pre-treatment characteristics. The figure represents the coefficients and 95% confidence intervals from separate regressions of each predetermined variable on recent experience, controlling for days since prior C-section, physician fixed effects, and day-of-the-week, month, and year of birth fixed effects.

Even if physician fixed effects help alleviate issues of selection of patients based on physician's skills, there could still be problems of endogeneity if some sort of dynamic matching, or “selective allocation,” exists (Huesch & Sakakibara, 2009). For instance, hospitals aware of depreciating skills may assign healthier patients to physicians coming back from a period of low activity. In this case my estimates of the impact of recent experience on patients' health would suffer from a downward bias.16 To mitigate these concerns, I estimate a separate coefficient for different types of C-sections depending on their emergency status. For patients admitted emergently, given the unexpected nature of these cases, surgeons need to make fast decisions under pressure and skill depreciation should be of particular relevance.17 On the other hand, if selective allocation exists, we would expect these difficult cases to be taken by more experienced surgeons, which would bias my estimates downwards.

3.2 Data

This study uses birth certificates from the maternity ward of a large public university hospital in Tuscany (Italy) for the years 2011 through 2014. The hospital has an average C-section rate of 31%, close to the national Italian rate of 33% among public hospitals in 2012.18 Birth certificates constitute a census of all births that took place in the hospital in this period. They contain information on mother characteristics (e.g., community of residence, education, civil status, age, previous deliveries, etc.), pregnancy characteristics (e.g., weeks of gestation, controls, assisted reproduction, etc.), birth characteristics (e.g., time of birth, type of labor, attendant, place, etc.) and indicators of newborns' health (e.g., weight, height, Apgar score, death, etc.). This information is complemented with the surgeon's ID.19

The richness of this dataset comes at a cost: because the information available corresponds to just one hospital in a 4-year period, the sample size is relatively small. There were approximately 12,343 newborns during the period under study, 4413 (35%) of which were delivered via C-section. Almost half of these C-sections were planned in advance between the physician and the patient (elective C-sections). Of the 4413 cases, I keep only one observation per pregnancy and drop 427 observations from plural births. In addition, I drop from the analysis 86 births that have missing information in at least one of the variables used. Then I restrict the sample to surgeons who have performed, on average, at least 12 C-sections a year. This leaves the sample with 60 surgeons who performed 3599 (92%) surgeries. In addition, since my measure of experience is the number of C-sections in the past 4 weeks by surgeon, I drop all births that take place in the first 28 days after a surgeon's first observed C-section in my data. This leaves a sample of 3467 births performed in the 4-year period. Finally, I restrict the analysis to observations in which the surgeon has performed at least one C-section in the preceding 4 weeks (“active” surgeons). I do this to avoid using surgeons who have spent some time practicing in another institution and whose recent experience I cannot observe.20 The final sample has 2982 births performed by 59 surgeons. Importantly, the empirical results are robust to the sample restriction on days from previous C-section.
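The sample flow above can be checked with simple arithmetic. The sketch assumes that the 92% figure refers to the share of surgeries that remain after the plural-birth and missing-data drops; that reading is an inference from the reported counts, not something stated explicitly. Variable names are illustrative.

```python
# Counts reported in the text.
c_sections = 4413
dropped_plural = 427    # observations from plural births
dropped_missing = 86    # births with missing information

# Surgeries remaining before the minimum-volume restriction.
after_cleaning = c_sections - dropped_plural - dropped_missing

# Surgeries by the 60 surgeons averaging at least 12 C-sections a year.
kept_by_volume_rule = 3599
share_kept = kept_by_volume_rule / after_cleaning

after_first_28_days = 3467  # after dropping each surgeon's first 28 observed days
final_sample = 2982         # "active" surgeons only

print(after_cleaning, round(100 * share_kept, 1))
print(after_first_28_days, final_sample)
```

Under that assumption the reported 92% is consistent with 3599 of 3900 surgeries.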

Table 1 summarizes the variables used in the analysis. Mean admission to NICU was approximately 21%—including both intensive and sub-intensive units—and mean low Apgar score was 11%. As expected, emergently admitted patients have a higher probability of both having a low Apgar score and of being admitted to NICU than elective patients. The average age of patients is 34.5, and about 41% of them are first-time mothers—although this number is higher (49%) for non-elective procedures. About 21% of all babies are born with less than 37 weeks of gestation.

TABLE 1. Summary statistics

| | All CS: Mean | All CS: SD | Elective CS: Mean | Elective CS: SD | Non-elective CS: Mean | Non-elective CS: SD |
|---|---|---|---|---|---|---|
| Outcomes | | | | | | |
| % NICU | 21.0 | 40.7 | 18.0 | 38.4 | 23.3 | 42.3 |
| % Apgar score < 9 | 11.4 | 31.7 | 7.5 | 26.3 | 14.4 | 35.1 |
| Provider characteristics | | | | | | |
| (Mean) CS in past 4 weeks | 3.0 | 1.9 | 2.7 | 1.8 | 3.2 | 1.9 |
| (Mean) days since last CS | 8.4 | 7.9 | 9.5 | 8.1 | 7.6 | 7.6 |
| Patient characteristics | | | | | | |
| (Mean) age | 34.5 | 5.5 | 35.3 | 5.4 | 33.9 | 5.5 |
| % University degree | 31.4 | 46.4 | 33.2 | 47.1 | 30.0 | 45.8 |
| % First-time mothers | 41.1 | 49.2 | 31.0 | 46.3 | 49.0 | 50.0 |
| Pregnancy characteristics | | | | | | |
| % Male | 51.9 | 50.0 | 51.2 | 50.0 | 52.5 | 50.0 |
| (Mean) weight in grams | 2993.3 | 751.4 | 2989.9 | 650.7 | 2995.9 | 820.7 |
| % Low birthweight (<2500 g) | 20.7 | 40.5 | 19.9 | 39.9 | 21.3 | 40.9 |
| (Mean) weeks of gestation | 37.8 | 3.0 | 37.7 | 2.1 | 37.8 | 3.5 |
| % Preterm (<37 weeks of gestation) | 20.5 | 40.3 | 19.1 | 39.3 | 21.5 | 41.1 |
| % with at least 1 ER visit | 19.5 | 39.6 | 22.4 | 41.7 | 17.2 | 37.8 |
| Observations | 2982 | | 1298 | | 1684 | |

Note: Table contains variables used in the empirical analysis for the main estimation sample and the restricted sample of physicians who performed at least one C-section (CS) in the past 4 weeks for the period 2011–2014.
Abbreviation: NICU, neonatal intensive care unit.

The mean number of procedures performed in the previous 4 weeks was 3 for the whole sample. Surgeons performing non-elective C-sections have a slightly higher mean recent experience than those performing elective procedures. Figure 2 shows the frequency distribution of the measure of recent experience for the study sample.

FIGURE 2. Frequency distribution of recent experience. The figure represents the distribution of recent experience measured at 4 weeks. The red dashed line is the mean.

3.3 Outcomes

The most common outcome (almost exclusively) used in the health economics literature analyzing learning-by-doing and forgetting by hospitals and physicians is the death of the patient—both during and after surgery. As mentioned before, one important drawback of the database used here is the small sample size. Both maternal and fetal deaths are rare events, more so in developed countries, hence there are very few observations experiencing either one of these outcomes (e.g., there are only 12 stillbirths in the study sample). This impedes their use as outcomes for this study. However, one may also argue that mortality alone, being an extreme outcome, is an inadequate measure for capturing the full spectrum of the effects of learning-by-doing on patient health and hospital costs (e.g., morbidity or ordered procedures may also be important outcomes).

The data at hand contain other potential outcomes for patients' health beyond death that can be affected by surgeons' skills. As proxies for newborns' health, this study uses the probability of needing to be transferred to a NICU and the probability of having a low Apgar score (at 5 min). The first one measures whether the newborn had to be transferred to a NICU. All else equal, a newborn that is transferred to the NICU is likely to have worse health than one that is not. Furthermore, NICU admissions are among the most expensive treatments in regular hospitals, with the cost of one day exceeding $3000. Finally, there are also psychological costs for the parents of the infant. The second outcome is based on a total score of 0 to 10, where the higher the score the better the baby is doing after birth. This test is done to determine whether a newborn needs help breathing or is having heart trouble. Any score lower than 7 is a sign that the baby needs medical attention. In this study, there are only 72 newborns with a score below 7. For this reason, a new measure was constructed that sets the bar higher: all births with a score lower than 9 are considered to be of “relatively” lower health. This does not necessarily mean a bad score that doctors should act on, but it can be argued that a newborn with an Apgar score below 9 is in relatively worse health than a newborn with a score of 9 or 10.

4 RESULTS

4.1 The effect of recent practice on patient health

Figure 3 provides the most direct illustration of the effect of experience on outcomes by plotting this bivariate relationship. The dots are 50 equally sized bins plotting the mean of the y-variable (probability of neonatal ICU or probability of an Apgar score below 9) against the mean of the x-variable (surgeon's experience in the last 4 weeks). The dashed lines show the fitted values of a linear regression. The graphs on the left-hand side correspond to elective C-sections, while the ones on the right use only non-elective C-sections. For the elective cases, we cannot observe any clear relationship between experience and either outcome. For emergent cases, by contrast, we observe a negative effect of experience on both outcomes.

FIGURE 3. Visualization of the relationship between experience and outcomes. (a) Elective, (b) non-elective. The dots are the mean outcome as a function of experience (after controlling for fixed effects for year, month and day-of-the-week of birth, mother and pregnancy controls mentioned in Section 3.1 and surgeon fixed effects) using a binned scatter plot with 50 equally sized bins (using “binscatter” in Stata [Stepner, 2013]). The dashed line shows the fitted values of a linear regression. The left-hand side graphs use only the sample of elective C-sections while the right-hand side graphs use the sample of non-elective C-sections.

Table 2 shows the results of estimating Equation (1) using a linear probability model for each outcome.21 Panel A shows results using the whole sample, while Panel B and Panel C repeat the exercise for elective and non-elective C-sections, respectively. For each outcome, column one shows results of a model with controls, and physician and additive time fixed effects. Column two adds a control for the surgeon's number of days since the last C-section, and column three adds surgeon-specific time trends. Standard errors are clustered at the surgeon level in all specifications.

TABLE 2. Effect of recent experience on birth outcomes

| | Neonatal ICU (1) | Neonatal ICU (2) | Neonatal ICU (3) | Apgar < 9 (4) | Apgar < 9 (5) | Apgar < 9 (6) |
|---|---|---|---|---|---|---|
| Panel (A): All C-sections | | | | | | |
| Experience (4w) | −0.008*** | −0.010*** | −0.008** | −0.005 | −0.005 | −0.005 |
| | (0.003) | (0.003) | (0.003) | (0.003) | (0.004) | (0.004) |
| Days since last CS | | −0.002** | −0.001* | | −0.000 | −0.001 |
| | | (0.001) | (0.001) | | (0.001) | (0.001) |
| Observations | 2982 | 2982 | 2982 | 2982 | 2982 | 2982 |
| Mean dep. | 0.210 | 0.210 | 0.210 | 0.114 | 0.114 | 0.114 |
| Panel (B): Elective C-sections | | | | | | |
| Experience (4w) | 0.004 | 0.000 | 0.002 | 0.001 | 0.001 | 0.000 |
| | (0.005) | (0.005) | (0.006) | (0.005) | (0.005) | (0.006) |
| Days since last CS | | −0.002* | −0.002 | | 0.000 | 0.000 |
| | | (0.001) | (0.001) | | (0.001) | (0.001) |
| Observations | 1297 | 1297 | 1297 | 1297 | 1297 | 1297 |
| Mean dep. | 0.180 | 0.180 | 0.180 | 0.075 | 0.075 | 0.075 |
| Panel (C): Non-elective C-sections | | | | | | |
| Experience (4w) | −0.015*** | −0.017*** | −0.015*** | −0.009** | −0.010** | −0.010* |
| | (0.003) | (0.003) | (0.004) | (0.004) | (0.005) | (0.005) |
| Days since last CS | | −0.001 | −0.001 | | −0.001 | −0.001 |
| | | (0.001) | (0.001) | | (0.001) | (0.001) |
| Observations | 1681 | 1681 | 1681 | 1681 | 1681 | 1681 |
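The percentage effects quoted in the introduction can be approximately recovered from the reported numbers. The sketch below is a back-of-the-envelope check under stated assumptions: it uses the Panel C coefficients from columns (2) and (5), and takes the standard deviation of recent experience (1.9) and the baseline outcome rates (23.3% and 14.4%) from the non-elective columns of Table 1. Which inputs the author actually used is not stated here, and the published coefficients are rounded, so small discrepancies are expected. The helper `relative_effect` is hypothetical.

```python
def relative_effect(coef, sd_experience, baseline_rate):
    """Change in the outcome probability from a one-SD increase in recent
    experience, expressed as a fraction of the baseline rate."""
    return coef * sd_experience / baseline_rate

# Assumed inputs: Table 2, Panel C, columns (2) and (5); Table 1, non-elective CS.
nicu_effect = relative_effect(coef=-0.017, sd_experience=1.9, baseline_rate=0.233)
apgar_effect = relative_effect(coef=-0.010, sd_experience=1.9, baseline_rate=0.144)

print(f"NICU: {100 * nicu_effect:.1f}%  low Apgar: {100 * apgar_effect:.1f}%")
```

Both values land close to the 13.8% (NICU) and 13.2% (low Apgar) reductions reported in the introduction.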
