Week 4 Worksheet Exercises
Interactions: Estimating, graphing and interpreting interaction effects
Aims
By the end of the session you will learn how to:
- fit, visualise and interpret results from regression models that include interaction terms
Setup
Create a worksheet document
In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive (e.g. OneDrive) so you can access it from any computer. You will be working in this folder. If it’s missing, complete Exercise 3 from the Week 1 Worksheet.
Create a new Quarto markdown file (.qmd) for this session (e.g. “Lab_4.qmd”) and work in it to complete the exercises and report on your final analysis.
Load R packages
Using functions learnt in Week 1, load (and install, if needed) the following R packages:
Introduction
This session explores examples of interactions in regression models. We will look more closely at some of the multiple regression models we have fit in previous labs and ask whether the effects of core explanatory variables depend on (are conditional on) the values of other explanatory variables included in the model. In practice, this involves including the product of two explanatory variables in the model. Thus, taking a regression model of the generic form \(y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon\), we ask whether there is an interaction between variables \(X_1\) and \(X_2\) (i.e. whether the effect of one depends on the values of the other) by fitting a model of the form \(y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \color{green}{\beta_3(X_1 \times X_2)} + \epsilon\), where we are interested in the “interaction effect” represented by \(\beta_3\).
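Differentiating the interaction model with respect to each variable shows why \(\beta_1\) and \(\beta_2\) no longer describe “the” effect of \(X_1\) or \(X_2\) on their own: each is the effect of its variable only when the other variable equals zero. The derivation below assumes \(E[\epsilon \mid X_1, X_2] = 0\); the second line is the standard variance of a linear combination of estimates, which is what you need for a confidence interval around a conditional effect.

```latex
\frac{\partial\, E[y \mid X_1, X_2]}{\partial X_1} = \beta_1 + \beta_3 X_2
\qquad
\frac{\partial\, E[y \mid X_1, X_2]}{\partial X_2} = \beta_2 + \beta_3 X_1

\widehat{\mathrm{Var}}\big(\hat\beta_1 + \hat\beta_3 X_2\big)
  = \widehat{\mathrm{Var}}(\hat\beta_1)
  + X_2^2\,\widehat{\mathrm{Var}}(\hat\beta_3)
  + 2X_2\,\widehat{\mathrm{Cov}}(\hat\beta_1, \hat\beta_3)
```

Because the conditional effect changes with \(X_2\), so does its standard error: an interaction can be “significant” at some values of the moderator and not at others, which is why plotting the conditional effect across the observed range of \(X_2\) is more informative than reading \(\beta_3\) alone.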
While fitting such interaction models is simple in practice, understanding when they are needed, and interpreting and communicating their results, can be challenging; a growing literature in the social sciences has explored best practices for such models (Berry, Golder, and Milton 2012; Brambor, Clark, and Golder 2006; Clark and Golder 2023; Hainmueller, Mummolo, and Xu 2019). Advances in software have also made it easier to interpret and present results from interaction models, with several notable R packages entering this space over the past few years. We have already encountered the marginaleffects and ggeffects packages when visualising results from logistic regression models, and examining results from interaction models presents very similar challenges. In this week’s lab we will use some of these tools to undertake comprehensive analyses of interaction effects.
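To see what packages such as marginaleffects automate, the sketch below computes conditional slopes by hand in base R. It is a minimal illustration under stated assumptions: the data are simulated (the names `y`, `x1`, `x2` and the chosen values of `x2` are hypothetical), and the standard errors use the delta-method formula for \(\hat\beta_1 + \hat\beta_3 X_2\). Functions like `slopes()` in marginaleffects return the same kind of quantities without the manual algebra.

```r
# Minimal sketch with simulated data (all variable names are hypothetical):
# the effect of x1 on y is set up to depend on x2.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

# In an R formula, x1 * x2 expands to x1 + x2 + x1:x2
m <- lm(y ~ x1 * x2, data = d)
b <- coef(m)
V <- vcov(m)

# Conditional slope of x1 at selected values of x2: beta1 + beta3 * x2
x2_vals <- c(-1, 0, 1)
slope   <- b["x1"] + b["x1:x2"] * x2_vals

# Delta-method SE: Var(b1) + x2^2 * Var(b3) + 2 * x2 * Cov(b1, b3)
se <- sqrt(V["x1", "x1"] + x2_vals^2 * V["x1:x2", "x1:x2"] +
           2 * x2_vals * V["x1", "x1:x2"])

cond_slopes <- data.frame(x2 = x2_vals, slope = unname(slope), se = se)
print(cond_slopes)
```

At `x2 = 0` the conditional slope and its standard error coincide with the `x1` row of `summary(m)`, a quick check that the hand calculation is consistent.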
We have already encountered interactions in all of the application readings so far. However, these were mostly more complex cases, which first require a basic understanding of how interaction effects work and how they can be implemented in practice. Wu (2021) and Dingemans and Van Ingen (2015) use “cross-level” interactions, which we will revisit when learning about multilevel models (Week 6). Mitchell (2021) uses higher-level interactions on data aggregated at country level as an analysis supplementing the main models, while Österman (2021) uses a quasi-experimental design in which the interaction effect is taken to reveal a more directly causal effect in the data. This latter example is “more complex” only at the conceptual and study-design level, and we will replicate it in the lab exercises. However, we begin with a conceptually simpler example in which interaction effects are at the core of the analysis, also from the field of social trust research.
Exercise 1. Akaeda (2023)
Following a review of the literature on the relationship between attitudes towards redistribution, education, and social and institutional trust, Akaeda (2023) derives the hypothesis that “trust decreases the gap in preferences for redistribution due to education” (p. 296). They break the hypothesis down into two parts, one relating to social trust and the other to institutional trust. In this exercise, follow the description and steps in the Notes page to answer a similar question using only data on European countries from the European Values Study.
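The hypothesis is a statement about an interaction: the education gap in support for redistribution should shrink as trust rises. The sketch below shows one way to make that gap visible from a fitted model using `predict()`. It is not the specification in the Notes: the data are simulated, and the names `redist`, `high_educ`, `trust` and the trust values 2 and 8 are hypothetical stand-ins for the EVS variables.

```r
# Hypothetical sketch: simulated data, not the EVS.
set.seed(1)
n         <- 1000
high_educ <- rbinom(n, 1, 0.4)   # 1 = higher education (hypothetical coding)
trust     <- runif(n, 0, 10)     # 0-10 trust score (hypothetical)
# Simulated so that the (negative) education gap narrows as trust rises
redist    <- 6 - 1.5 * high_educ + 0.1 * trust +
             0.12 * high_educ * trust + rnorm(n)
evs_sim   <- data.frame(redist, high_educ, trust)

m_akaeda <- lm(redist ~ high_educ * trust, data = evs_sim)

# Predicted support for redistribution by education at low and high trust
grid <- expand.grid(high_educ = c(0, 1), trust = c(2, 8))
grid$pred <- predict(m_akaeda, newdata = grid)

# Education gap (higher minus lower education) at each trust level
gap <- with(grid, pred[high_educ == 1] - pred[high_educ == 0])
names(gap) <- c("trust = 2", "trust = 8")
print(gap)
```

Each gap equals \(\beta_{\text{educ}} + \beta_{\text{educ} \times \text{trust}} \times \text{trust}\), so the hypothesis corresponds to an interaction coefficient with the opposite sign to the education main effect. ggeffects (`ggpredict()`) and marginaleffects produce the same kind of predictions, with confidence intervals, over a whole grid of trust values.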
Exercise 2. Österman (2021): Model 1 and Model 2, main variables
Following the second example in the Notes, fit models osterman_m1 and osterman_m2.
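Österman’s key term is an interaction between two binary variables, reform exposure and high parental education. In a model containing just those two dummies and their product, the interaction coefficient is exactly a difference in differences of the four cell means, which is what gives it its quasi-experimental reading. The sketch below checks this on simulated data; the names `trust`, `reform` and `high_paredu` are hypothetical, and the actual `osterman_m1` and `osterman_m2` specifications in the Notes may include further terms.

```r
# Hypothetical variable names; simulated data, not the ESS.
set.seed(2023)
n           <- 2000
reform      <- rbinom(n, 1, 0.5)   # 1 = exposed to school reform (hypothetical)
high_paredu <- rbinom(n, 1, 0.3)   # 1 = high parental education (hypothetical)
trust       <- 5 + 0.2 * reform + 0.6 * high_paredu -
               0.3 * reform * high_paredu + rnorm(n, sd = 2)
ess_sim     <- data.frame(trust, reform, high_paredu)

m_int <- lm(trust ~ reform * high_paredu, data = ess_sim)

# Mean trust in each reform (rows) x parental-education (columns) cell
cell <- tapply(ess_sim$trust,
               list(ess_sim$reform, ess_sim$high_paredu), mean)

# Reform effect among high-paredu minus reform effect among low-paredu
did <- (cell["1", "1"] - cell["0", "1"]) - (cell["1", "0"] - cell["0", "0"])
print(c(did = did, interaction = unname(coef(m_int)["reform:high_paredu"])))
```

Once further covariates are added, the equality is no longer exact, but the interaction coefficient keeps the same “difference in the reform effect between parental-education groups” interpretation, now conditional on the covariates.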
Exercise 3. Österman (2021): Model 1 and Model 2, full covariate models
As described in the Notes, the models reported in Table 3 of Österman (2021) also included a number of additional covariates for statistical control (fbrneur, mbrneur, fnotbrneur, mnotbrneur, agea, essround, yrbrn, reform_id_num), as well as interactions between some of those control variables (yrbrn*yrbrn, yrbrn*reform_id_num, agea*reform_id_num, agea*agea, agea*agea*reform_id_num).
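One R-specific trap when translating this list into a formula: in R’s formula algebra a variable interacted with itself reduces to the variable, so writing `yrbrn*yrbrn` does not add a squared term. The sketch below assumes the `*` notation above is meant as squared age and birth-year terms (as the Stata-style notation suggests) and uses `terms()` to show which terms R actually builds; no data are needed because `terms()` only parses the formula. Only the covariate names come from the list above; the outcome name `trust` is a hypothetical placeholder.

```r
# Literal translation: x * x collapses to x, so no squared terms are created
f_star <- trust ~ yrbrn * yrbrn + agea * agea * reform_id_num
lab_star <- attr(terms(f_star), "term.labels")
print(lab_star)

# Squared terms must be wrapped in I() so that ^ is read as arithmetic.
# This gives yrbrn, yrbrn^2, agea, agea^2, reform_id_num, and the interactions
# of reform_id_num with yrbrn, agea and agea^2.
f_sq <- trust ~ I(yrbrn^2) + reform_id_num * (yrbrn + agea + I(agea^2))
lab_sq <- attr(terms(f_sq), "term.labels")
print(lab_sq)
```

The remaining covariates (fbrneur, mbrneur, fnotbrneur, mnotbrneur, essround) can simply be added with `+`. If reform_id_num is a numeric code identifying reforms and is meant to capture reform fixed effects, it may need to enter the model as `factor(reform_id_num)`; check how the Notes code it.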
Fit a more detailed version of model osterman_m2 from Exercise 2 that also includes these covariates, and compare your results to the simpler models fit in Exercise 2 and to those reported by Österman (2021). Keep in mind that your results will still diverge from those reported in the original study because you are not using sample weights or clustering.
Exercise 4. Österman (2021): Model 3, “flexible interaction” model
Österman (2021:223) writes:
“In the third main model I allow the effect of parental education to vary with all other independent variables, including the reform-fixed effects. This flexible model explores whether there exists any other conditional relationship between parental education and the covariates that could potentially bias the interaction estimate between parental education and reform exposure”
He calls this a “flexible interaction” model; because the moderating variable is binary, it is also referred to as a “fully dummy-interactive” model.
Attempt to fit a model that replicates the “flexible interaction” specification of Model 3 in Table 3 of Österman (2021). This model includes not only the interaction between “Reform” and “High parental education”, but also interactions between parental education and all the other covariates.
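In R, a fully dummy-interactive model can be written compactly by multiplying the binary moderator into a parenthesised list of covariates. The sketch below uses simulated data with hypothetical names (`trust`, `reform`, `high_paredu`, `agea`) and only two covariates; in a replication the parentheses would hold all the covariates, including any reform fixed effects entered as a factor. It also checks a useful property of such models: their point estimates are identical to those from separate regressions fitted within each parental-education group.

```r
# Hypothetical names; simulated data, not the ESS.
set.seed(7)
n           <- 1500
high_paredu <- rbinom(n, 1, 0.3)
reform      <- rbinom(n, 1, 0.5)
agea        <- sample(25:70, n, replace = TRUE)
trust       <- 4 + 0.3 * reform + 0.5 * high_paredu + 0.02 * agea -
               0.2 * reform * high_paredu + 0.01 * agea * high_paredu +
               rnorm(n, sd = 2)
d <- data.frame(trust, reform, high_paredu, agea)

# Flexible model: high_paredu interacted with every other covariate
m_flex <- lm(trust ~ high_paredu * (reform + agea), data = d)

# Separate regressions within each parental-education group
m_low  <- lm(trust ~ reform + agea, data = subset(d, high_paredu == 0))
m_high <- lm(trust ~ reform + agea, data = subset(d, high_paredu == 1))

b <- coef(m_flex)
# Reform effect among high-paredu = main reform effect + interaction term
reform_high_flex <- unname(b["reform"] + b["high_paredu:reform"])
print(c(flex = reform_high_flex, separate = unname(coef(m_high)["reform"])))
```

The point estimates match, but the standard errors generally do not: the pooled flexible model assumes a common error variance across the two groups, whereas separate regressions estimate one per group. Keeping a single pooled model is what lets you test the reform × parental-education interaction directly.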