Week 4 Application Lab

Interactions: Estimating, plotting and interpreting interaction effects

Setup

In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive so you can access it from any computer (e.g. OneDrive). You will be working in this folder.

Create a new Quarto markdown file (.qmd) for this session (e.g. “Lab_4.qmd”) and work in it to complete the exercises and report on your final analysis.

Aims

By the end of the session you will learn how to:

fit, visualise and interpret results from regression models that include interaction terms

Setup

Load R packages

Using functions learnt in Week 1, load (and install, if needed) the following R packages:

Introduction

This session explores examples of interactions in regression models. We will look more closely at some of the multiple regression models we have fit in previous labs and ask whether the effects of core explanatory variables could be said to depend on (or are conditioned on) the values of other explanatory variables included in the model. In practice, this will involve including the product of two explanatory variables in the model. Thus, taking a regression model of the generic form \(y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon\), we ask whether there is an interaction between variables \(X_1\) and \(X_2\) (i.e. whether the effect of one depends on the values of the other) by fitting a model of the form \(y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \color{green}{\beta_3(X_1 \times X_2)} + \epsilon\), where we are interested in the “interaction effect” represented by \(\beta_3\).
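
To make the role of the product term concrete, here is a minimal sketch on simulated data (the variable names and coefficient values are made up for illustration and are not part of the lab data). It shows that adding the product of the two predictors by hand gives the same estimates as asking R for an interaction term with the `:` operator, which is introduced later in this lab.

```r
# Simulated data (illustrative only; not the lab data)
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
# True coefficients chosen arbitrarily for the illustration
y  <- 1 + 0.5 * x1 - 0.3 * x2 + 0.8 * (x1 * x2) + rnorm(n)

# Adding the product of the two predictors by hand ...
x1x2  <- x1 * x2
fit_a <- lm(y ~ x1 + x2 + x1x2)

# ... is the same as asking R for the interaction term
fit_b <- lm(y ~ x1 + x2 + x1:x2)

# The last coefficient in each model is the estimate of beta_3
coef(fit_a)
coef(fit_b)
```

Both models should report the same four coefficients, with the last one close to the value of 0.8 used in the simulation.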

While fitting such interaction models is simple in practice, understanding when they are needed and interpreting and communicating their results can be challenging. A growing literature in the social sciences has explored best practices in relation to such models (Berry, Golder, and Milton 2012; Brambor, Clark, and Golder 2006; Clark and Golder 2023; Hainmueller, Mummolo, and Xu 2019). Advancements in software have also made it easier to interpret and present results from interaction models, with some notable R packages entering this space over the past few years; we have already encountered the marginaleffects and ggeffects packages when visualising results from logistic regression models, and examining results from interaction models presents very similar challenges. In this week’s lab we will use some of these tools to undertake comprehensive analyses of interaction effects.

We have already encountered interactions in all of the application readings we have engaged with so far. However, they mostly involved more complex cases of interactions, whose understanding will first require a more basic knowledge of how interaction effects work and how they can be implemented in practice. Wu (2021) and Dingemans and Van Ingen (2015) make use of “cross-level” interactions, which we will revisit when learning about multilevel models (Week 6). Mitchell (2021) uses higher-level interactions on data aggregated at country level as an additional analysis to their main models, while Österman (2021) operates with a quasi-experimental design in which the interaction effect is taken to reveal a more directly causal effect in the data. This latter example is “more complex” only at a conceptual and study design level, and we will replicate it in the lab exercises. However, we begin by looking at a conceptually less involved example where interaction effects are at the core of the analysis, also from the field of social trust research.

Example 1. Akaeda (2023): Are preferences for redistribution associated with education in European countries, and is the association moderated by social trust?

The research question in the title of this exercise relates directly to the research reported by Akaeda (2023). Following a detailed review of the literature on the relationship between attitudes towards redistribution, education, and social and institutional trust, the author derives a hypothesis they want to investigate: “trust decreases the gap in preferences for redistribution due to education” (p. 296). They break down the hypothesis into two parts, one relating to social trust and the other to institutional trust.

Akaeda (2023) combines data from several waves of the World Values Survey (WVS) and the European Values Study (EVS) to test this research hypothesis. Their combined dataset contains data on “74 countries, 26 years, 259 country-years, and 254,214 individuals”. Their interest is in modelling the hypothesised relationship on a global level, accounting both for various individual explanatory variables and for macro-societal factors at the level of countries and country-years (i.e. change over time within countries).

Here we will attempt to fit a simpler version of their “Model 3” (reported in Table 2). In terms of data, we focus specifically on “European” countries; for this purpose, we will use data from the latest wave of the European Values Study survey. In terms of variable selection and transformations, we will roughly follow Akaeda (2023), with some adjustments for convenience. Akaeda (2023:297–98) describes their variable selection procedure in these words:

the dependent variable is the score of preferences for redistribution. This score is based on a question … that asks respondents to indicate on a scale from 1 to 10 whether ‘Incomes should be made more equal (1)’ or ‘We need larger income differences as incentives for individual effort (10)’. In accordance with previous research, the scores were reversed such that a higher score indicates stronger support for redistribution;

the level of education is a key independent variable because the association between education and support for redistribution is a main focus of this study. Because previous studies have shed light on the mechanisms of university education related to a conservative view of redistribution, this analysis adopts the dummy for university or higher degree as an independent variable;

as a key moderator variable … social trust as measured by the question, ‘Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?’, which had the following two potential responses: ‘1. Most people can be trusted’, ‘2. Need to be very careful’. Based on previous research involving social trust, this analysis employs the dummy for social trust (1 = ‘Most people can be trusted’);

the following individual-level controls: gender (1 = female, 0 = male), age, age squared, employment status (employed, self-employed, unemployed, retired, other), household income (z scored for country-year units), marital status (married = 1), child (having child = 1), and political orientation (1 = right to 10 = left)
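
Two of the transformations mentioned in this description, reversing a 1 to 10 scale and z-scoring a variable within groups, can be sketched in base R as follows. The data frame and its column names are invented for illustration, and grouping is by a single hypothetical `country` column rather than by country-year; the lab itself uses helper functions for the recoding, as shown in the data preparation steps.

```r
# Toy data, made up for illustration (not the EVS data)
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  equal   = c(1, 5, 10, 2, 9, 10),      # 1 = incomes more equal ... 10 = larger differences
  income  = c(10, 20, 30, 100, 200, 300)
)

# Reverse a 1-10 scale so that higher values mean stronger support for redistribution
toy$redistribute <- 11 - toy$equal

# z-score income within each group (mean 0, standard deviation 1 per country)
toy$income_z <- ave(toy$income, toy$country,
                    FUN = function(x) (x - mean(x)) / sd(x))

toy
```

After z-scoring, incomes are expressed relative to the respondent's own group, so the very different income levels in the two toy countries end up on the same scale.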

Data preparation

The original data is freely accessible from the European Values Study website in various formats. For the purposes of this lab, we can also access a lightly edited dataset in native .Rds format from the Data page (evs2017.rds). We can either download the dataset to a local data folder and load it from there, or load it directly from the web:

evs <- datawizard::data_read("https://cgmoreh.github.io/HSS8005/data/evs2017.rds")

Once the data is loaded in the workspace environment, we can select the variables that we are interested in, inspect them in a codebook and apply any necessary transformations:

# Select the variables needed:

evs <- evs |> 
  select(v106,      # redistribution pref.
         v243_r,    # education
         v31,       # social trust
         v225,      # sex/gender
         age,
         v244,      # employment status
         v261_ppp,  # household income (corrected)
         v234,      # marital status
         v239_r,    # number of children
         v102,      # political orientation (1=left...10=right)
         )

View the codebook for the selected variables:

data_codebook(evs) |> View()

Recode variables to match Akaeda (2023) as closely as possible:

akaeda <- evs |> 
  data_modify(redistribute = reverse(v106),
              # recode education to a binary; 
              # treat the original variable as a numeric variable in the process to allow referring to numeric codes rather than the long text labels
              educ_univ = recode_values(as_numeric(v243_r),
                                   recode = list(`1` = 3,
                                                 `0` = c(1, 2, 66))),
              # inverse to make "most people can be trusted" the second (indicator) category
              trusting = recode_values(as_numeric(v31),
                                       recode = list(`1` = 1,
                                                     `0` = 2)),
              # just rename as "female" is already the second category
              female = v225,
              # recode "employment" as a factor/categorical (without treating it as numeric and adding labels later)
              employment = recode_values(v244,
                                         recode = list("Employed" = c("30h a week or more", 
                                                                      "less then 30h a week"), 
                                                       "Self-employed" = "self employed", 
                                                       "Unemployed" = "unemployed", 
                                                       "Retired" = "retired/pensioned", 
                                                       "Other" = c("military service", 
                                                                   "homemaker not otherwise employed", 
                                                                   "student", 
                                                                   "disabled", 
                                                                   "other"))),
              hh_income = v261_ppp,
              # recoding marital status as numeric
              married = recode_values(as_numeric(v234),
                                      recode = list("1" = 1,
                                                    "0" = 2:6)),
              
              child = recode_values(v239_r,
                                    recode = list("0" = 1,
                                                  "1" = 2:6)),
              politics = reverse(v102)
              ) |> 
  set_labels(married, labels = c("Not married", "Married")) |> 
  set_labels(trusting, labels = c("Not trusting", "Trusting")) |> 
  set_labels(educ_univ, labels = c("No university education", "University education")) |> 
  set_labels(child, labels = c("No children", "Has children")) |> 
  # Treat them as factors again for better reporting of labels
  to_label(married, trusting, educ_univ, child)

View the codebook for the recoded dataset:

data_codebook(akaeda) |> View()

Relationships between the main variables of interest

Following Akaeda (2023), the main variables of interest are redistribute, educ_univ and trusting. We can inspect some basic relationships between them using boxplots and cross-tabulations:

# Boxplots:

ggformula::gf_boxplot(redistribute ~ educ_univ, data = akaeda)

ggformula::gf_boxplot(redistribute ~ trusting, data = akaeda)

# Cross-tabulations:

# To check the values of a categorical variable by another categorical variable, a contingency table (cross-tabulation) is preferable.
# Native R is not very good at cross-tabulations; there are some better user-written functions, but only relying on packages we have already used so far,
# some options are:

## In base R (table):
### Simple frequencies
table(akaeda$educ_univ, akaeda$trusting)
### Frequencies with added row marginal totals
table(akaeda$educ_univ, akaeda$trusting) |> addmargins(margin = 2)
### Proportions, by row
table(akaeda$educ_univ, akaeda$trusting) |> prop.table(margin = 1)
### Proportions, by column with added column marginal totals
table(akaeda$educ_univ, akaeda$trusting) |> prop.table(margin = 2) |> addmargins(margin = 1)


## In base R (xtabs - formula style):
xtabs(~ educ_univ + trusting, data = akaeda)
#### Proportions by column, rounded to 2 decimals, added column marginal totals
xtabs(~ educ_univ + trusting, data = akaeda)|> prop.table(margin = 2) |> round(2) |> addmargins(margin = 1)

## In tidyverse (dplyr + tidyr)
### Frequencies
akaeda |> 
  count(educ_univ, trusting) |> 
  pivot_wider(names_from = trusting, values_from = n)
### Proportions (within each education category, i.e. by row)
akaeda |> 
  count(educ_univ, trusting) |> 
  group_by(educ_univ) |> 
  mutate(prop = n / sum(n)) |> 
  ungroup() |> 
  select(educ_univ, trusting, prop) |> 
  pivot_wider(names_from = trusting, values_from = prop)
  # or, to show the education categories as columns instead:
  # pivot_wider(names_from = educ_univ, values_from = prop)


## In {datawizard}:
### Both frequencies and % in a listing format rather than crosstabulated
akaeda |> 
  data_group(educ_univ) |>
  data_tabulate(trusting, collapse = TRUE)


## In {gtsummary}:
akaeda |> 
  gtsummary::tbl_cross(educ_univ, trusting,
                     percent = "row",
                     missing = "no")

Multiple regression model without interaction terms

Our dependent variable (redistribute) can be treated as numeric, so we will fit a linear (ordinary least squares) regression model using the lm() function:

m1 <- lm(redistribute ~ educ_univ + trusting + female + employment + hh_income + married + child + politics, data = akaeda)

Let’s then check the model coefficients (parameters):

model_parameters(m1)

Model with an interaction between education and social trust

We can interact predictor variables in regression models using the : and * operators. With :, we include in the regression model only the interaction term, but not the component variables (i.e. their main effects); to include the component variables as well, we need to add them in as usual with the + operator. Note that we normally want to include both the main effects and the interaction effect in a regression model. With *, we include both the main effects and the interaction effect, so it provides a shortcut to using :, but it is often useful to combine the two in more complex interaction scenarios.

The specifications below are equivalent:

# Interaction using ":"
# including the constituent terms must be done manually
m2 <- lm(redistribute ~ educ_univ + trusting + educ_univ:trusting + female + employment + hh_income + married + child + politics, data = akaeda)

# Interaction using "*"
# including the constituent terms is done automatically
m2 <- lm(redistribute ~                        educ_univ*trusting + female + employment + hh_income + married + child + politics, data = akaeda)
m2 <- lm(redistribute ~ educ_univ + trusting + educ_univ*trusting + female + employment + hh_income + married + child + politics, data = akaeda)

But the one below is missing the constituent terms and is very likely not what we want:

m2_wrong <- lm(redistribute ~                  educ_univ:trusting + female + employment + hh_income + married + child + politics, data = akaeda)

Out of precaution, it is advisable to use the * operator and to manually include all the constituent terms as well, as in the third version above.
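
One way to check what a formula expands to, without fitting any model, is to inspect its term labels with base R's `terms()` function. The sketch below uses placeholder variable names `a` and `b`; no data are needed.

```r
# Which terms does each formula expand to? (no data needed)
only_int <- attr(terms(y ~ a:b), "term.labels")
manual   <- attr(terms(y ~ a + b + a:b), "term.labels")
star     <- attr(terms(y ~ a * b), "term.labels")
both     <- attr(terms(y ~ a + b + a * b), "term.labels")

only_int  # only the interaction term, no constituent terms
manual    # constituent terms plus the interaction
star      # same terms as 'manual'
both      # same again: repeated terms are only included once
```

The last three specifications contain the same three terms, while the first contains the interaction term alone.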

Let’s then check the model coefficients (parameters):

model_parameters(m2)

We can compare our results with those reported in Model 3 (Table 2) of Akaeda (2023), keeping in mind that our data are very different. Nonetheless, we find a similar positive effect for the interaction term (0.41), which in our case is stronger.

Interpreting interactions

As with results from logistic regression models, interpreting interaction results is not straightforward.

First of all, the coefficients associated with the individual variables included in the interaction term can no longer be interpreted in the usual direct way. Each now describes the effect of that variable only when the other variable in the interaction is zero (here, at its reference category), so we usually no longer interpret them as the “main effects” of educ_univ and trusting, and focus instead on the multiplicative effect of the interaction between them. The interpretation of the numeric results, however, is rather convoluted and prone to misinterpretation. In essence, the interaction coefficient gives the difference between the effect of a unit-change in trust among the university-educated and the effect of a unit-change in trust among the non-university-educated.
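
The arithmetic behind this interpretation can be sketched on simulated data with two binary predictors. The names `univ` and `trust` and all numeric values below are hypothetical and only loosely mirror the lab variables; the model has no other covariates, which is why its coefficients reproduce the raw group means exactly.

```r
# Simulated data (illustrative only)
set.seed(2)
n     <- 4000
univ  <- rbinom(n, 1, 0.4)   # hypothetical 0/1 education indicator
trust <- rbinom(n, 1, 0.5)   # hypothetical 0/1 trust indicator
y     <- 5 - 1 * univ + 0.2 * trust + 0.4 * univ * trust + rnorm(n)

fit <- lm(y ~ univ * trust)
b   <- coef(fit)

# Effect of trust among the non-university-educated (univ = 0)
eff_trust_nonuniv <- unname(b["trust"])
# Effect of trust among the university-educated (univ = 1)
eff_trust_univ    <- unname(b["trust"] + b["univ:trust"])

# The same quantities computed from the raw group means
m <- tapply(y, list(univ = univ, trust = trust), mean)
diff_nonuniv <- unname(m["0", "1"] - m["0", "0"])
diff_univ    <- unname(m["1", "1"] - m["1", "0"])

# The interaction coefficient is the difference between the two effects
c(model = eff_trust_univ - eff_trust_nonuniv,
  means = diff_univ - diff_nonuniv)
```

The two numbers printed at the end should match: the interaction coefficient is a "difference in differences" between the four group means.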

As with interpreting logistic regression, it can be much easier and more reliable to interpret a plot of the “marginal effects”. As we already know from the earlier lab on logistic regression, that can be achieved using the ggpredict() and the plot() functions, and we would add both constitutive terms of the interaction. Depending on which variable we write first we get either the effect of trust by education, or the effect of education by trust:

Education by trust:

ggpredict(m2, terms = c("educ_univ", "trusting")) |> 
  plot()

Trust by education:

ggpredict(m2, terms = c("trusting", "educ_univ")) |> 
  plot()

The latter plot reproduces Figure 1 from Akaeda (2023). To get an even closer reproduction of that Figure, we could add an additional argument to the plotting function, asking for the point-predictions to be connected with lines:

ggpredict(m2, terms = c("trusting", "educ_univ")) |> 
  plot(connect_lines = TRUE)

By looking at whether, and by how much, the confidence intervals around the predicted values overlap with the other point predictions, we can visually assess the interaction effect. We find that a positive difference in the level of social trust has a much steeper effect on redistributive attitudes among the highly educated than among the less educated.
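
The kind of quantity displayed in such plots, a predicted value with a confidence interval for each combination of the two interacted variables, can also be computed by hand with base R's `predict()`. The sketch below uses simulated data with hypothetical variable names and no other covariates; it is meant to illustrate the idea, not to reproduce what ggpredict() does internally.

```r
# Simulated data (illustrative only)
set.seed(3)
n   <- 2000
dat <- data.frame(
  univ  = factor(rbinom(n, 1, 0.4), labels = c("No university", "University")),
  trust = factor(rbinom(n, 1, 0.5), labels = c("Not trusting", "Trusting"))
)
dat$y <- 5 - 1 * (dat$univ == "University") +
  0.2 * (dat$trust == "Trusting") +
  0.4 * (dat$univ == "University") * (dat$trust == "Trusting") +
  rnorm(n)

fit <- lm(y ~ univ * trust, data = dat)

# All four combinations of the two interacted variables
grid <- expand.grid(univ = levels(dat$univ), trust = levels(dat$trust))

# Predicted value (fit) with lower and upper 95% confidence limits (lwr, upr)
pred <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))
pred
```

Comparing the `lwr` and `upr` columns across rows is the tabular equivalent of checking for overlap between the intervals in the plot.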

Example 2. Österman (2021): Do educational reforms have different effects by levels of parental education?

This research question is at the core of the study conducted by Österman (2021). We have explored this article and used its data to fit simple binary and multiple regression models in Week 2. We can now take that analysis further and replicate some of the models reported in Table 3 of that article:

Exercise 1. Akaeda (2023)

Following a review of the literature on the relationship between attitudes towards redistribution, education, and social and institutional trust, Akaeda (2023) derives a hypothesis they want to investigate: “trust decreases the gap in preferences for redistribution due to education” (p. 296). They break down the hypothesis into two parts, one relating to social trust and the other to institutional trust. In this exercise, follow the description and steps in the Notes page to answer a similar question using only data on European countries from the European Values Study.

Questions

How would you interpret the effect of education and of social trust on redistributive attitudes based on Model 1?
How would you interpret the interaction effect of education by social trust on redistributive attitudes based on Model 2?
How does Akaeda (2023) discuss their findings with respect to social trust? Read the relevant sections in the journal article and attempt to write down your own findings along those lines.
Fit another model with an interaction between female and social trust and attempt an interpretation of the results.

Exercise 2. Österman (2021): Model 1 and Model 2, main variables

Following the second example in the Notes, fit models osterman_m1 and osterman_m2.

Questions

Read through the Österman (2021) article and check your understanding of how the author interprets these results. In particular, how does the interaction model help us elucidate causal factors in this regression model?
Fit another model similar to osterman_m2, but with an interaction between female and reform1_7. How do you interpret the results from this interaction? Is there a difference in how educational reforms affect social trust among men and women?

Exercise 3. Österman (2021): Model 1 and Model 2, full covariate models

As described in the Notes, the models reported in Table 3 of Österman (2021) also included a number of additional covariates for statistical control (fbrneur, mbrneur, fnotbrneur, mnotbrneur, agea, essround, yrbrn, reform_id_num), including interactions between some of those control variables (yrbrn*yrbrn, yrbrn*reform_id_num, agea*reform_id_num, agea*agea, agea*agea*reform_id_num).
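
A practical point when translating this list into R: assuming that terms such as agea*agea are meant as squared terms, note that R's formula syntax does not create a squared term when a variable is "interacted" with itself; the usual way is to wrap the power in `I()`. The sketch below only inspects how formulas expand, using the variable names from the list above with a placeholder outcome `y`; it is one possible way of writing such terms, not necessarily the specification used in the original study.

```r
# A variable interacted with itself collapses back to the variable
self_int <- attr(terms(y ~ agea * agea), "term.labels")
self_int   # just "agea": no squared term is created

# A squared term needs I() so that ^ is treated as arithmetic
squared <- attr(terms(y ~ agea + I(agea^2)), "term.labels")
squared

# Interacting both the linear and the squared term with another variable
sq_int <- attr(terms(y ~ (agea + I(agea^2)) * reform_id_num), "term.labels")
sq_int
```

If reform_id_num is meant to be treated as categorical, it would also need to be a factor (or wrapped in factor()) in the model formula.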

Fit a more detailed version of model osterman_m2 from Exercise 2 that also includes these covariates, and compare your results to the simpler models fit in Exercise 2 and those reported by Österman (2021). Keep in mind that the results will still diverge from those reported in the original study because you are still not using sample weights or clustering.

Exercise 4. Österman (2021): Model 3, “flexible interaction” model

Österman (2021:223) writes:

“In the third main model I allow the effect of parental education to vary with all other independent variables, including the reform-fixed effects. This flexible model explores whether there exists any other conditional relationship between parental education and the covariates that could potentially bias the interaction estimate between parental education and reform exposure”

He calls this a “flexible interaction” model - also referred to as a “fully dummy-interactive” model given the binary nature of the interaction variable.

Attempt to fit a model that replicates the “flexible interaction” specification of Model 3 in Table 3 of Österman (2021). This model includes interaction terms not only between “Reform” and “High parental education”, but also between parental education and all the other covariates.
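
As a hint on the formula syntax, interacting one variable with a whole set of others can be written compactly by multiplying it with a parenthesised sum. The sketch below only shows how such a formula expands; `paredu`, `reform`, `female` and `agea` are placeholder names chosen for illustration, and the covariate set is deliberately shorter than the one needed for the exercise.

```r
# Hypothetical variable names, for illustration only
f <- y ~ paredu * (reform + female + agea)

flex_terms <- attr(terms(f), "term.labels")
flex_terms
```

The expansion contains every variable on its own plus the interaction of `paredu` with each of the variables inside the parentheses.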

References

Akaeda, Naoki. 2023. “Trust and the Educational Gap in the Demand for Redistribution: Evidence from the World Values Survey and the European Value Study.” International Sociology 38(3):290–310. doi: 10.1177/02685809231167834.

Berry, W. D., M. Golder, and D. Milton. 2012. “Improving Tests of Theories Positing Interaction.” Journal of Politics 74(3):653–71. doi: 10.1017/S0022381612000199.

Brambor, T., W. R. Clark, and M. Golder. 2006. “Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14(1):63–82. doi: 10.1093/pan/mpi014.

Clark, William Roberts, and Matt Golder. 2023. Interaction Models: Specification and Interpretation. Cambridge: Cambridge University Press.

Dingemans, Ellen, and Erik Van Ingen. 2015. “Does Religion Breed Trust? A Cross-National Study of the Effects of Religious Involvement, Religious Faith, and Religious Context on Social Trust.” Journal for the Scientific Study of Religion 54(4):739–55. doi: 10.1111/jssr.12217.

Hainmueller, J., J. Mummolo, and Y. Q. Xu. 2019. “How Much Should We Trust Estimates from Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice.” Political Analysis 27(2):163–92. doi: 10.1017/pan.2018.46.

Mitchell, Jeffrey. 2021. “Social Trust and Anti-Immigrant Attitudes in Europe: A Longitudinal Multi-Level Analysis.” Frontiers in Sociology 6.

Österman, Marcus. 2021. “Can We Trust Education for Fostering Trust? Quasi-experimental Evidence on the Effect of Education and Tracking on Social Trust.” Social Indicators Research 154(1):211–33. doi: 10.1007/s11205-020-02529-y.

Wu, Cary. 2021. “Education and Social Trust in Global Perspective.” Sociological Perspectives 64(6):1166–86. doi: 10.1177/0731121421990045.