The Backdoor Criterion: Covariate Selection for Causal Inference
In contrast to model selection approaches, a causal inference methodology that has recently emerged in ecology is Judea Pearl’s structural causal model (SCM; Pearl 2009). This framework uses DAGs to visualize researchers’ assumptions about the causal structure of a system or process under study. Once a DAG has been created, a graphical rule known as the backdoor criterion can be applied to determine the covariates required to answer a causal question from observational data.
Conceptually, the backdoor criterion instructs us to block all non-causal paths between a predictor and a response variable of interest, while leaving all causal paths open. Graphically, this translates to blocking all backdoor paths between the predictor and the response. Backdoor paths are sequences of nodes and arrows with an arrow pointing into both the predictor and the response variable of interest; if left open, they can induce non-causal associations between the two variables. To block a backdoor path, we can either (1) adjust for an intermediate arrow-emitting variable or (2) not adjust for a variable with two incoming arrows (i.e., a collider variable: → X ←).
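The two blocking rules can be expressed as a short check on a single path. The following is a minimal Python sketch rather than part of the original analysis: the function names `descendants` and `path_is_blocked`, the edge-set representation, and the toy variables X, Y, Z, C and D are all hypothetical. It also encodes one standard refinement from d-separation that the rules above leave implicit, namely that adjusting for a descendant of a collider opens the path just as adjusting for the collider itself does.

```python
def descendants(node, edges):
    """Return every node reachable from `node` by following arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_is_blocked(path, edges, adjusted):
    """Check whether one path is blocked given the adjusted variables.

    path     -- node names in order, e.g. ["X", "Z", "Y"]
    edges    -- set of (parent, child) arrows in the DAG
    adjusted -- variables included as controls
    """
    adjusted = set(adjusted)
    # Inspect every intermediate node together with its two neighbours.
    for left, mid, right in zip(path, path[1:], path[2:]):
        is_collider = (left, mid) in edges and (right, mid) in edges
        if is_collider:
            # Rule 2: a collider blocks the path unless it, or one of
            # its descendants, has been adjusted for.
            opened = mid in adjusted or bool(descendants(mid, edges) & adjusted)
            if not opened:
                return True
        elif mid in adjusted:
            # Rule 1: adjusting for an arrow-emitting node blocks the path.
            return True
    return False


# Hypothetical fork: X <- Z -> Y (Z emits both arrows).
fork_edges = {("Z", "X"), ("Z", "Y")}
fork_path = ["X", "Z", "Y"]
print(path_is_blocked(fork_path, fork_edges, adjusted=set()))   # False (open)
print(path_is_blocked(fork_path, fork_edges, adjusted={"Z"}))   # True (blocked)

# Hypothetical collider with a descendant: X -> C <- Y, C -> D.
collider_edges = {("X", "C"), ("Y", "C"), ("C", "D")}
collider_path = ["X", "C", "Y"]
print(path_is_blocked(collider_path, collider_edges, adjusted=set()))   # True
print(path_is_blocked(collider_path, collider_edges, adjusted={"C"}))   # False
print(path_is_blocked(collider_path, collider_edges, adjusted={"D"}))   # False
```

The sketch checks one path at a time; an adjustment set satisfies the backdoor criterion only if every backdoor path is blocked and no adjusted variable is a descendant of the predictor.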
For example, given our DAG in Fig 1, to determine the total effect of forestry on species Y, there are four backdoor paths that must be blocked:
  1. Species Y ← Climate → Forestry
  2. Species Y Climate Fire Species A Species Y
  3. Species Y ← Species A ← Fire ← Climate → Forestry
  4. Species Y ← Human Gravity → Forestry
The first three backdoor paths can each be blocked by adjusting for the intermediate arrow-emitting variable climate. The fourth backdoor path can be blocked by adjusting for the intermediate arrow-emitting variable human gravity. Therefore, to determine the total effect of forestry on species Y, we must adjust for climate and human gravity. Following covariate selection, researchers can determine the appropriate statistical analysis, given their data. It is important to note that DAGs and the backdoor criterion are compatible with both linear and non-parametric approaches (Pearl 2009; Elwert 2013). As our simulated data were created using linear relationships, we chose a linear regression model with species Y as the response, forestry as the predictor, and climate and human gravity as controls. This model returned an accurate total causal estimate of -0.75 [-0.77, -0.73] (Appendix S1). Applying the backdoor criterion becomes increasingly complex with larger DAGs; tools such as ‘dagitty’ (www.dagitty.net; instructions within site) can help in composing DAGs and specifying causal questions, and will then identify the required backdoor adjustment sets.
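The effect of adjusting for climate and human gravity can be illustrated with a small simulation. The sketch below is not the simulation reported in Appendix S1: the edge set (including an assumed arrow from forestry to species A), every coefficient, the sample size, and the helper name `first_slope` are hypothetical, chosen only so that the true total effect of forestry on species Y is -0.75 and easy to compare against.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical linear DAG (all coefficients are illustrative assumptions).
climate = rng.normal(size=n)
human_gravity = rng.normal(size=n)
fire = 0.5 * climate + rng.normal(size=n)
forestry = 0.6 * climate + 0.7 * human_gravity + rng.normal(size=n)
species_a = 0.4 * fire - 0.5 * forestry + rng.normal(size=n)
species_y = (-0.5 * forestry + 0.5 * species_a
             + 0.8 * climate - 0.3 * human_gravity + rng.normal(size=n))

# Total effect = direct path + path through species A
#              = -0.5 + (-0.5 * 0.5) = -0.75
TRUE_TOTAL_EFFECT = -0.5 + (-0.5) * 0.5


def first_slope(response, *predictors):
    """Least-squares fit with an intercept; return the first predictor's slope."""
    design = np.column_stack([np.ones(len(response)), *predictors])
    coefs, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefs[1]


# No controls: the backdoor paths through climate and human gravity stay open.
naive = first_slope(species_y, forestry)

# Backdoor adjustment set: climate and human gravity.
adjusted = first_slope(species_y, forestry, climate, human_gravity)

# Also controlling for the mediator species A closes a causal path, so the
# estimate targets the direct effect (-0.5) rather than the total effect.
over_adjusted = first_slope(species_y, forestry, climate, human_gravity, species_a)

print(f"true total effect:     {TRUE_TOTAL_EFFECT:.2f}")
print(f"no controls:           {naive:.2f}")
print(f"climate + gravity:     {adjusted:.2f}")
print(f"plus mediator (sp. A): {over_adjusted:.2f}")
```

Under these assumptions, only the model that controls for climate and human gravity should recover a value close to -0.75. The model without controls is confounded, and the model that also includes species A blocks part of the causal effect, which is why the criterion asks that causal paths be left open.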