How effective has the Spanish lockdown been to battle COVID‐19? A spatial analysis of the coronavirus propagation across provinces

This section outlines the main features of the empirical strategy used in this paper to assess both the propagation of coronavirus across the Spanish provinces and the effectiveness of the control measures implemented to contain the outbreak. We also discuss, in two separate sub-sections, the drawbacks of our empirical strategy and the choice of the most suitable econometric specification for achieving these objectives.

2.1 Epidemic curve specification

This sub-section introduces the functional form of the epidemic curve to be estimated and the set of variables that will be used to capture the spread of the virus within and between provinces.

Consider a panel of $N$ provinces observed on $T$ days. Let $t_{0i}$ denote the onset date of the epidemic, that is, the date on which province $i$ reports its first coronavirus case. We then analyze the development of the epidemic in each province, that is, the temporal evolution of coronavirus cases once each province reports its first coronavirus case.

Let $C_{it}$ denote the cumulative number of confirmed (reported) coronavirus cases until day $t$ in province $i$. As is customary in panel data settings, we next assume that the number of cases on day $t$ can be expressed as a function of the number of cases on a previous day as follows:

$$C_{it} = \beta_{it} C_{it-1} \qquad (1)$$

where $\beta_{it}$ can be interpreted as a heteroskedastic autoregressive parameter. For ease of notation, we have chosen a single temporal lag, $t-1$, to represent this relationship. The autoregressive model (1) can be viewed as a reduced-form model that simply aims to fit the observed epidemic curve of cumulative cases. Therefore, our model does not make assumptions on the underlying parameters that determine the contagion of COVID-19. In this sense, we later show that a linear specification of Equation (1) can fit the epidemic curve of several epidemiological models, which differ in their assumptions on the incubation period and other critical parameters. For instance, the popular Susceptible, Infected and Recovered (SIR) and Susceptible, Exposed, Infectious and Recovered (SEIR) epidemiological models yield time-varying growth rates of cumulative cases, regardless of whether daily or longer temporal lags are used.

In this sense, a key variable for this analysis is the epidemic time $D_{it}$, which denotes the number of days relative to the onset date. We expect the rate of growth of coronavirus cases to vary with $D_{it}$, as the traditional epidemic curve for a single wave is S-shaped.

The key aim of the coronavirus control measures is to reduce $\beta_{it}$. If $\beta_{it}$ is equal to one, there are no new infections, and the pandemic has therefore been controlled. If $\beta_{it}$ is greater than unity, new infections have been reported and the coronavirus pandemic is still spreading among the population despite the efforts to prevent the propagation of the virus. Our beta parameter (i.e., the rate of growth of cumulative cases) thus plays the same role as the so-called “reproductive number of the infection” ($R$), a fundamental epidemiological quantity representing the average number of infections per infected case over the course of their infection. As we will show later, our beta parameter is also related to another commonly used epidemiological quantity: the so-called “growth rate,” which is often defined as the proportional (per capita) change in the number of new cases per unit of time.
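As a rough illustration of the autoregressive relationship in Equation (1), the sketch below backs out the growth factor of cumulative cases from a short series. The case counts and the function name `growth_factors` are hypothetical and are not taken from the paper's data.

```python
# Illustrative cumulative case counts for one province (made-up numbers).
cumulative_cases = [1, 2, 4, 7, 11, 11, 14]


def growth_factors(cases):
    """Return beta_t = C_t / C_{t-1} for each day after the first."""
    return [curr / prev for prev, curr in zip(cases, cases[1:])]


betas = growth_factors(cumulative_cases)
new_cases = [curr - prev for prev, curr in zip(cumulative_cases, cumulative_cases[1:])]

# beta equals 1 on days with no new cases and exceeds 1 otherwise,
# because cumulative counts cannot decrease.
for beta, new in zip(betas, new_cases):
    print(f"beta = {beta:.3f}, new cases = {new}")
```

In this toy series the only day with a growth factor of exactly one is the day on which no new case is reported, which is the situation the text describes as the pandemic being controlled.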

In order to obtain a simple empirical specification of Equation (1), we take natural logarithms and first-difference the model. This yields the following expression:

$$\ln C_{it} - \ln C_{it-1} = \ln \beta_{it} = \exp\left(\alpha_i + X_{it}'\gamma + \rho W_i D_t\right) \qquad (2)$$

where $\alpha_i$ is a set of province-specific but time-invariant fixed effects, $\exp(\cdot)$ is an exponential function of a set of covariates in order to impose the theoretical restriction $\beta_{it} \geq 1$, $X_{it}$ is a $K \times 1$ vector of explanatory variables of the Spanish provinces (with coefficient vector $\gamma$), and $W_i$ is a spatial weight vector, applied to the vector $D_t$ of provincial epidemic times, where the weights ($w_{ij}$) measure the degree of human mobility (connectivity) between provinces. The $\rho$ parameter is the spatial autoregressive coefficient that measures the degree of spatial correlation between provinces. In our application, it can be interpreted as the propagation effect caused by the mobility of people across provinces.

The vector of covariates $X_{it}$ includes two sets of variables. First, $X_{it}$ includes a third-order function of $D_{it}$ in order to capture the temporal pattern of the virus epidemic, conditional on the set of control measures.2 The growth rate of cumulative cases in a simple SIR epidemiological model changes (decreases) with $D_{it}$ either in levels or logs (see the temporal evolution of the growth rate of cumulative cases provided in Figure A1 of Appendix A, which can be obtained by replicating the simulation of Chudik et al., 2020). The decline in growth rates for this model is not linear over time, which in turn explains why the epidemic curve of cumulative cases is S-shaped. Similar comments can be made if we introduce an incubation period into the SIR model, obtaining an SEIR model (see, e.g., Institute for Disease Modelling, 2020). Moreover, if the model is deterministic, the simulated growth rates of cumulative cases can be predicted accurately using a third-order function of $D_{it}$ (see again Figure A1 in Appendix A).
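The declining growth rate of cumulative cases in a deterministic SIR model can be sketched with a few lines of simulation. This is not the simulation of Chudik et al. (2020); it is a minimal discrete-time version with hypothetical parameter values (population, transmission rate, recovery rate) chosen only for illustration.

```python
import math

# Hypothetical SIR parameters, chosen only for illustration.
POPULATION = 1_000_000
TRANSMISSION_RATE = 0.30  # new infections per infected person per day
RECOVERY_RATE = 0.10      # share of infected people who recover each day
DAYS = 120


def simulate_log_growth(population, transmission, recovery, days):
    """Simulate a discrete-time deterministic SIR model and return the daily
    log growth rate of cumulative cases, ln(C_t / C_{t-1})."""
    susceptible = population - 1.0
    infected = 1.0
    cumulative = 1.0  # everyone ever infected
    log_growth = []
    for _ in range(days):
        new_infections = transmission * susceptible * infected / population
        susceptible -= new_infections
        infected += new_infections - recovery * infected
        previous = cumulative
        cumulative += new_infections
        log_growth.append(math.log(cumulative / previous))
    return log_growth


log_growth = simulate_log_growth(POPULATION, TRANSMISSION_RATE, RECOVERY_RATE, DAYS)
# Day-to-day declines in the growth rate; unequal values mean a non-linear path.
declines = [earlier - later for earlier, later in zip(log_growth, log_growth[1:])]

print("log growth on first day:", round(log_growth[0], 4))
print("log growth on last day:", round(log_growth[-1], 6))
print("smallest and largest daily decline:", min(declines), max(declines))
```

Under these assumptions the growth rate falls every day, but by unequal amounts, which is the kind of non-linear decline that a polynomial in epidemic time is meant to approximate.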

As pointed out by a referee, the time-varying growth rate of cumulative cases decreases in an SIR model probably because, in this model, a higher percentage of the population is no longer susceptible to the virus as time passes. Obviously, in a more realistic model, the growth rate of cumulative cases might also change over time due to the importation of cases (from other provinces or geographical areas) and the introduction of non-pharmaceutical interventions. On the one hand, these phenomena might explain why the S-shape of the epidemic curve cannot be perceived visually; on the other hand, they justify the inclusion of other explanatory variables, such as spatially lagged indicators of the pandemic in neighboring provinces and dummy variables to capture the Spanish lockdown.

Second, $X_{it}$ includes a dummy variable $L_t$ that takes the value 1 from March 14, 2020, the day marking the imposition of most of the coronavirus control measures by the Spanish Government. We also include 1- and 2-week lags of this dummy variable (i.e., $L_{t-7}$ and $L_{t-14}$) in order to capture effects of the lockdown that grow as time passes. Such a pattern is expected because of the gap between when a person becomes infected and when they might subsequently infect another person, which is on average about 6 or 7 days (see Flaxman et al., 2020, p. 18). Moreover, as pointed out by a referee, this pattern might also be caused by the lag between infection and the onset of symptoms and by the large proportion of under-reported cases, as testing in March was reserved and prioritized for only the most severe hospital cases.
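The lockdown dummy and its lags can be built directly from calendar dates. The sketch below assumes that a lagged dummy simply takes the value the contemporaneous dummy had 7 or 14 days earlier; the function name `lockdown_dummies` is hypothetical.

```python
from datetime import date, timedelta

LOCKDOWN_START = date(2020, 3, 14)  # date stated in the text


def lockdown_dummies(day):
    """Return (contemporaneous, 1-week lag, 2-week lag) lockdown dummies."""
    current = int(day >= LOCKDOWN_START)
    lag_one_week = int(day - timedelta(days=7) >= LOCKDOWN_START)
    lag_two_weeks = int(day - timedelta(days=14) >= LOCKDOWN_START)
    return current, lag_one_week, lag_two_weeks


for day in (date(2020, 3, 13), date(2020, 3, 14), date(2020, 3, 21), date(2020, 3, 28)):
    print(day.isoformat(), lockdown_dummies(day))
```

Because the three dummies switch on one week apart, their coefficients add up over time, which lets the estimated lockdown effect grow during the first two weeks.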

Notice that our model specification resembles a Difference-in-Difference (DiD) model, in which we compare an outcome variable before and after treatment (a policy measure), having controlled for unobserved differences across units (provinces). Although the lockdown of the population in Spain was implemented in all provinces on March 14, 2020, the advance of the pandemic in each province was rather different at that time. Therefore, our identification strategy is based on the relatively large dispersion of pandemic developments (i.e., onset dates) across provinces, and on the onset dates being orthogonal to the lockdown implementation date.

We estimate the above model after taking natural logarithms to make it linear. Once we take natural logarithms and add a traditional noise term, the model to be estimated is:

$$\ln y_{it} = \alpha_i + X_{it}'\gamma + \rho W_i D_t + v_{it} \qquad (3)$$

where $y_{it} = \ln C_{it} - \ln C_{it-1}$, and $v_{it}$ is a mean-zero error term capturing random shocks, measurement or specification errors, and other unobservable variables not correlated with the determinants of the growth rates. We use the logarithmic transformation of the growth rates because the resulting model can be estimated using the standard linear Fixed-Effect (FE) estimator, which is equivalent to a linear panel data DiD estimation (Lechner, 2010, p. 189). This estimator yields consistent estimates of the causal effects attributable to a given policy measure, even in those cases where the time-invariant unobservable variables are correlated with the treatment variable (the lockdown dummy variable in our case). For instance, in our application, we might think that the centrality of Madrid and the greater mobility of the people living in Madrid and other populated cities/provinces were responsible for triggering the implementation of the Spanish lockdown.
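The logic of the FE (within) estimator can be sketched for the simplest case of a single regressor. The panel below is synthetic and noise-free, the province labels and parameter values are hypothetical, and the covariate ranges differ by province so that the unit effects are correlated with the regressor.

```python
# Hypothetical noise-free panel: ln(y_it) = alpha_i + gamma * x_it.
TRUE_GAMMA = -0.05
province_effects = {"A": -1.0, "B": -1.5, "C": -0.7}
covariate_values = {"A": range(1, 11), "B": range(5, 15), "C": range(9, 19)}

panel = [
    (province, float(x), province_effects[province] + TRUE_GAMMA * x)
    for province, xs in covariate_values.items()
    for x in xs
]


def within_estimator(rows):
    """Single-regressor fixed-effects (within) estimator.

    Demeans x and y by unit and runs OLS on the demeaned data, which removes
    any time-invariant unit effect."""
    sums = {}
    for unit, x, y in rows:
        total = sums.setdefault(unit, [0.0, 0.0, 0])
        total[0] += x
        total[1] += y
        total[2] += 1
    means = {unit: (sx / n, sy / n) for unit, (sx, sy, n) in sums.items()}
    numerator = sum((x - means[u][0]) * (y - means[u][1]) for u, x, y in rows)
    denominator = sum((x - means[u][0]) ** 2 for u, x, y in rows)
    return numerator / denominator


gamma_hat = within_estimator(panel)
print("estimated gamma:", gamma_hat)
```

A full application would include several regressors and robust standard errors, for which a dedicated panel-data library is the more natural tool; the point here is only that demeaning by province removes the fixed effects before the slope is estimated.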

It is also worth mentioning that in our paper we are not examining causal epidemiological effects in the sense that, for instance, infected individuals in period t cause secondary infections in period t + 1, and so on. This type of causal effect cannot be examined using a reduced-form model that simply aims to fit the observed epidemic curve of cumulative cases. However, the DiD specification of our reduced-form model is able to measure causal effects of a different nature, that is, those attributable to the public control measures implemented nationwide in Spain around March 14, 2020 aimed at containing the coronavirus outbreak during the first wave of the pandemic.

There is an extensive literature on human mobility for measuring the spread of infectious diseases. In this sense, it is worth mentioning the articles by Belik et al. (2011) and Bajardi et al. (2011), among others, that provide computational and theoretical models seeking to address the effect of human mobility and mobility restrictions on containing outbreaks of infectious diseases. Findlater and Bogoch (2018) find that the increasing volume of passenger travel, especially by air, has enabled global epidemic transmission. More recently, the use of new technologies such as mobile phones has facilitated the measurement of human mobility and its effects on disease connectivity (Lai et al., 2019). Recent research focuses on severe acute respiratory syndrome coronavirus 2, concluding that human mobility predicts the spread and size of the epidemic and that travel restrictions are particularly useful in the early stage of the outbreak (see, e.g., Kraemer et al., 2020). This literature also demonstrates that viruses can spread through human contact patterns (Liu et al., 2020), given that human mobility promotes social interaction (Mollgaard et al., 2017). Several studies corroborate these findings for Europe (see, e.g., Iacus et al., 2020; Lemey et al., 2021).

We use a Spatial Lag of X (SLX) model specification to examine the role of human mobility in spreading the virus across the Spanish provinces. Inter-provincial mobility is captured using the spatial weight matrix $W$. This spatial matrix can be computed in different ways. We follow Giuliani et al. (2020) and Gross et al. (2020) and use a contiguity or binary $W$ matrix, where the weights equal one for adjacent units and zero for non-bordering units. In their spatial analysis of the spread of COVID-19 in Italy, Bourdin et al. (2021) performed several tests to select the best spatial weight matrix and selected, like us, the first-order contiguity matrix.

We select the epidemic time of neighboring provinces (i.e., $W_i D_t$) in order to capture the potential propagation effects between provinces for two reasons. First, this variable is exogenous by construction. In a Spatial Autoregressive (SAR) model specification, $W_i D_t$ is replaced with (a transformation of) the dependent variable, which is endogenous and should thus be instrumented as long as good instruments are available. Second, Vega and Elhorst (2015, p. 342) suggest taking the SLX model as a point of departure because this is not only the simplest specification but is also more flexible in modeling spatial spillover effects than other specifications.
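A binary contiguity matrix and the resulting spatially lagged epidemic time can be sketched as follows. The four-province map, the border list, and the epidemic times are made up for illustration, and no row-standardization of the weights is assumed.

```python
# Hypothetical map of four provinces; the borders are made up for illustration.
borders = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
provinces = sorted(borders)

# Binary first-order contiguity matrix: 1 for adjacent provinces, 0 otherwise.
W = [[1 if other in borders[prov] else 0 for other in provinces] for prov in provinces]

# Epidemic time (days since the first reported case) on a given calendar day.
epidemic_time = {"A": 12, "B": 9, "C": 3, "D": 0}


def spatial_lag(weights, values):
    """Return W @ values: for each province, the weighted sum over neighbors."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


lagged = spatial_lag(W, [epidemic_time[p] for p in provinces])
print(dict(zip(provinces, lagged)))
```

Each entry of `lagged` depends only on the neighbors' epidemic times, not on the province's own case counts, which is why this regressor can be treated as exogenous in the sense discussed above.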

2.2 Drawbacks

Three drawbacks of our empirical strategy are worth noting. First, although the linear FE model (3) has some features that are very appealing for our application, estimating the above logged linear model implies dealing with the zero growth rates of cumulative cases that often appear at the beginning of outbreaks. We could address this issue by dropping such observations from the sample. As this approach might generate sample selection bias if the missing observations are not random, we instead replace the zero values with a tiny but positive number before taking logs and keep the adjusted zero-value observations in our sample. We include a new dummy variable controlling for (adjusted) zero values as an additional explanatory variable. This variable not only allows us to control for potential measurement issues but also prevents the sharp declines in growth rates caused by zero values from distorting the third-order parametric function of epidemic time.
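The zero-value adjustment can be sketched in a few lines. The replacement value `EPSILON` is a hypothetical choice, since the text only says "a tiny but positive number", and the growth rates are made up.

```python
import math

EPSILON = 1e-6  # hypothetical "tiny but positive" replacement value


def log_growth_with_zero_flag(growth_rates, epsilon=EPSILON):
    """Return (log growth rate, zero dummy) pairs.

    Zero growth rates are replaced by epsilon before taking logs, and the
    dummy marks the adjusted observations so a regression can control for them."""
    adjusted = []
    for rate in growth_rates:
        is_zero = int(rate == 0)
        adjusted.append((math.log(epsilon if is_zero else rate), is_zero))
    return adjusted


# Made-up growth rates of cumulative cases at the start of an outbreak.
sample = [0.0, 0.69, 0.0, 0.41, 0.22]
rows = log_growth_with_zero_flag(sample)
for rate, (log_rate, flag) in zip(sample, rows):
    print(rate, round(log_rate, 3), flag)
```

The adjusted observations take a very negative logged value, so including the dummy as a regressor keeps them from pulling the fitted polynomial in epidemic time downward.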

Second, in the first wave of the pandemic, no European country had sufficient testing capacity, so reported cases are a small fraction of the true number of infections. We can discuss whether this issue matters in our empirical application using the preliminary results of Orea et al. (2021), an ongoing study that complements the current paper as it tries to account for the prevalence of undocumented cases. In that paper we propose a stochastic frontier analysis approach for estimating epidemic curves, where the unobserved cases are proxied using a one-sided random term in the same fashion as firms' inefficiency in production economics. We find that the average reporting rate is around 42%. Despite this, we obtain very similar effects of the lockdown on the growth rates of coronavirus cases (6.8 percentage points [pp] on average) compared to our non-frontier application. Our results therefore seem to be quite robust to this issue.

Another, related matter has to do with the onset date of the pandemic used in our paper. Our epidemic time variable is defined as the number of days relative to the observed onset date of the pandemic, which relies on reported cases. Therefore, because of underreporting, the first reported case on a certain date did not necessarily seed the pandemic in a particular province. In order to see whether in practice the gap between observed and true onset dates is an important issue, we have modified the simulation of Chudik et al. (2020) and simulated several scenarios with different observed onset dates due to underreporting.3 Two results of the simulation are worth mentioning. First, the goodness-of-fit of our model does not deteriorate when underreporting increases if the level of underreporting is common to all provinces. Second, the goodness-of-fit of the model does deteriorate when underreporting is large and the gap between observed and true onset dates varies notably across provinces. In this case, however, a linear model with fixed effects allowed us to retrieve the predictive capabilities of the model.

2.3 Discussion on modeling choice

2.3.1 Local versus global spatial spillovers

In this sub-section we discuss the nature of the spillovers generated by the SLX spatial specification of our epidemic curve. The spillovers induced by an SLX model are local in the sense that once the virus is transmitted from a province to another neighboring province, the transmission does not feed back and does not reverberate to other provinces. In this case, only adjacent neighbors are involved, but not higher-order neighbors. In contrast, the SAR model yields a more global spillover effect because it assumes that an impact on neighboring provinces reverberates to the neighbors of the neighboring provinces, to the neighbors of those neighbors, and so on, thus generating endogenous interaction and feedback effects (see LeSage, 2014). In this case, the propagation of an original outbreak involves more spatial observations.
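The difference between local and global spillovers can be illustrated numerically. The sketch below assumes a chain of four hypothetical provinces (A - B - C - D), a binary contiguity matrix, and an arbitrary spatial parameter; the SAR multiplier $(I - \rho W)^{-1}$ is approximated by its power series, which converges for the values chosen here.

```python
RHO = 0.3  # hypothetical spatial parameter

# Binary contiguity matrix for a chain of provinces A - B - C - D.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def matmul(left, right):
    """Multiply two square matrices stored as lists of rows."""
    size = len(left)
    return [
        [sum(left[i][k] * right[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def sar_multiplier(weights, rho, terms=50):
    """Approximate (I - rho W)^(-1) by the series I + rho W + rho^2 W^2 + ..."""
    size = len(weights)
    total = [[float(i == j) for j in range(size)] for i in range(size)]
    power = [row[:] for row in total]
    for k in range(1, terms + 1):
        power = matmul(power, weights)
        total = [
            [total[i][j] + rho ** k * power[i][j] for j in range(size)]
            for i in range(size)
        ]
    return total


# SLX: a change in province A (column 0) reaches only its direct neighbor B.
slx_effect = [RHO * W[i][0] for i in range(4)]
# SAR: a unit shock originating in A reaches every province and feeds back to A.
sar_effect = [row[0] for row in sar_multiplier(W, RHO)]

print("SLX:", slx_effect)
print("SAR:", [round(value, 4) for value in sar_effect])
```

In the SLX column only B is affected, whereas in the SAR column the effect on A itself exceeds one (feedback) and C and D receive smaller, higher-order effects.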

The epidemiology literature focusing on the spatial propagation of COVID-19 highlights the contribution to the spread of the virus of both cross-border travel (Lemey et al., 2021) and local transmission (du Plessis et al., 2021). However, these papers do not discuss explicitly whether their transmission channels do have feedback effects between geographical units. This is the key issue that should guide the selection of a spatial econometric model. Although we believe that most of the inter-provincial mobility is local in nature due to regular commuting, we cannot rule out the possibility of more global effects caused by the transportation of goods or by business and leisure travelers.

As we do not have a theoretical justification for the selected spatial specification, we will proceed as follows with our empirical application. First, we will verify that the SLX model is able to capture all the spatial dependence in the dependent variable through a set of spatial autocorrelation tests on the model's residuals. We will next provide the parameter estimates of an SLX model that uses a W matrix defined using information on human mobility across all Spanish provinces, that is, not only between adjacent provinces. In this case, more spatial observations are involved, as occurs in the SAR and spatial Durbin models.
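One widely used statistic for checking residual spatial dependence is Moran's I. The text does not say which tests are applied, so the sketch below is only an assumption about what such a check could look like; it computes the statistic itself and leaves out the inference step. The residual vectors and the chain-shaped contiguity matrix are made up.

```python
def morans_i(values, weights):
    """Moran's I statistic for spatial autocorrelation.

    I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z are deviations
    from the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Binary contiguity matrix for a chain of four hypothetical provinces.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered_residuals = [1.0, 1.0, -1.0, -1.0]    # similar values sit next to each other
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # neighbors always differ

i_clustered = morans_i(clustered_residuals, W)
i_alternating = morans_i(alternating_residuals, W)
print(i_clustered, i_alternating)
```

A positive value indicates that neighboring residuals resemble each other, which would suggest that the spatial specification has not absorbed all the spatial dependence; a value that is statistically indistinguishable from its expectation under no autocorrelation is what the verification step looks for.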

2.3.2 Linear versus count regression models

In this sub-section we discuss the advantages of using a linear model instead of a count model. Both models mainly differ in their dependent (outcome) variables and distributional assumptions.4 Despite these differences, the parameter estimates in our linear model can be interpreted as a semi-elasticity of the number of new cases with respect to an explanatory variable, in the same fashion as in count regression models.5

Although the interpretation of the estimated parameters is the same, the linear specification has some features that are critical in our application in order to measure the effectiveness of the Spanish lockdown in containing the propagation of COVID-19. First, running a linear model allows us to estimate a DiD model using the traditional FE estimator. Estimating a DiD model using a count regression model is contentious, as different empirical strategies exist for incorporating fixed effects into a count regression model, and some of them are not true FE models (see Allison & Waterman, 2002). Moreover, Lechner (2010, p. 196) shows that estimating a DiD model with the standard specification of a count regression model (and other popular nonlinear models) would usually lead to an inconsistent estimator. Second, as the growth rate of cumulative cases is much less volatile than the number of new cases (or its growth rate), our linear model provides more accurate predictions than a count model. This feature is important in our application because we use predicted values to carry out our counterfactual analyses aimed at examining the effect of the Spanish lockdown.

Despite the fact that the FE linear model has some features that are very appealing for our application, we also provide the parameter estimates of a Negative Binomial (NB) model for robustness analyses. The NB model is also estimated using two different W matrices, in the same fashion as the linear models. Whereas the contiguity-based W matrix is computed using binary values indicating adjacent provinces, the so-called mobility-based W matrix is computed using information on human mobility across all the Spanish provinces.
