## Install first if not yet installed
library("tidyverse")
library("sjlabelled")
library("easystats")
library("ggformula")
library("marginaleffects")
library("modelsummary")
library("gtsummary")
library("sjPlot")
library("sjmisc")
library("ggeffects")
Week 7 Application Lab
Paths: Graphical models and considerations for causal analysis
Aims
This session will help you think more carefully about the causal claims you can make from a regression-based analysis of observational (i.e. non-experimental) data. We will practice some more elementary methods for slicing data to assess group differences and relate them to the linear regression methods familiar from previous weeks. We will also take a step back from data to think more conceptually about some of the models we have encountered so far, and to begin thinking about the research questions and the theoretical and empirical estimands (quantities of interest) that you would like to investigate on a dataset of your choice. These will be the first steps towards the analysis you will be submitting for assessment in two months’ time.
In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive so you can access it from any computer (e.g. OneDrive). You will be working in this folder.
Create a new Quarto markdown file (.qmd) for this session (e.g. “Lab_7.qmd”) and work in it to complete the exercises and report on your final analysis.
Packages
Introduction
Exercise 1: Estimating the persuasive power of the news media
In this exercise we will focus on data from an article by Ladd and Lenz (2009) (follow the doi link in the citation to access the article; you can also download the article from here and the article’s appendix from here). In this article the authors aim to estimate whether (print) media has the power to persuade people to vote differently. In general terms, this is a very interesting question, even though the power of print media has definitely weakened over the past few decades and measuring the effect of alternative media sources would present additional challenges. Nevertheless, the authors take advantage of a unique natural experiment that arguably facilitates tackling this causal question: between the 1992 and 1997 UK general elections, four newspapers (the Sun, Daily Star, Independent, and Financial Times) changed their editorial stance and tone from supporting the Conservative Party to supporting Tony Blair’s Labour Party. Such radical shifts in editorial stance are extremely rare. The data they use come from several waves of the British Election Panel Study between 1992 and 1997 and include variables on voting behaviour in 1992 and 1997, as well as whether the respondent was a reader of one of the “switching” newspapers. These are the main variables useful for tackling the causal effect involved in the research question, but there are several other variables that provide various controls.
Data
We can import the dataset (available in Stata’s .dta format) from the course website. For the purpose of this exercise, we select only three core variables of interest:
# Let's call the data object "news" for simplicity
news <- sjlabelled::read_stata("https://cgmoreh.github.io/HSS8005-data/LaddLenz.dta") |>
select(tolabor, vote_l_92, vote_l_97) |>
# let's also rename the variable `tolabor` to `reader`, which sounds more intuitive, perhaps
rename(reader = tolabor)
data_codebook(news)
Comparing proportions
The aim of Ladd and Lenz (2009) is to estimate the effect of “reading a switching newspaper” on respondents’ voting behaviour change between the 1992 and 1997 elections. In terms of the variables in our dataset, they aim to estimate vote_l_97 (voted Labour in 1997) from vote_l_92 (prior Labour vote in 1992) as moderated by reader (treatment: indicator of whether the respondent read a switching newspaper). All three variables are binary indicator variables.
We have data consisting of measurements on the same individuals at two different time points (1992 and 1997), which allows us to think in causal terms. But the question could first be broken down into three smaller comparisons exploring:
- the average treatment effect of reader in a cross-sectional design (oblivious of prior vote)
- a before/after comparison of the treated group, comparing their average vote in 1997 to their average vote in 1992
- a differences-in-differences comparison of the average change over time in the treatment group and the average change over time in the control group.
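These three comparisons can be sketched compactly in base R on a toy dataset (the values below are made up for illustration; they are not the lab data):

```r
# Toy data: outcome measured before (y92) and after (y97) for a treated
# (d = 1) and an untreated (d = 0) group; illustrative values only
d   <- c(0, 0, 0, 1, 1, 1)
y92 <- c(0, 1, 0, 0, 1, 1)
y97 <- c(0, 1, 1, 1, 1, 1)

# (1) Cross-sectional difference between groups in 1997 (naive "ATE")
ate_97 <- mean(y97[d == 1]) - mean(y97[d == 0])

# (2) Before/after change within the treated group
before_after <- mean(y97[d == 1]) - mean(y92[d == 1])

# (3) Difference-in-differences: treated change minus untreated change
did <- (mean(y97[d == 1]) - mean(y92[d == 1])) -
       (mean(y97[d == 0]) - mean(y92[d == 0]))
```

We will compute exactly these quantities on the real news data in the subsections below.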
Overall mean vote for Labour in 1997 and 1992
What is the overall proportion of those voting Labour in the 1997 elections in the sample?
mean(news$vote_l_97)
[1] 0.4507219
And in 1992:
mean(news$vote_l_92)
[1] 0.3314501
As with all proportions, these can be read as percentages if multiplied by 100, and we can also round the numbers to one decimal point:
round(mean(news$vote_l_97) * 100, 1)
[1] 45.1
round(mean(news$vote_l_92) * 100, 1)
[1] 33.1
So the difference in the 1997 and 1992 Labour vote is:
mean(news$vote_l_97 - news$vote_l_92)
[1] 0.1192718
Or 11.9 percentage points.
To get a better sense of the data that generates these averages, let’s tabulate the 97-92 Labour vote difference:
data_tabulate(news$vote_l_97 - news$vote_l_92)
outcome: voted labor in 1997 (news$vote_l_97 - news$vote_l_92) <numeric>
# total N=1593 valid N=1593
Value | N | Raw % | Valid % | Cumulative %
------+------+-------+---------+-------------
-1 | 49 | 3.08 | 3.08 | 3.08
0 | 1305 | 81.92 | 81.92 | 85.00
1 | 239 | 15.00 | 15.00 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
It’s important to understand how these data come about.
Think about (look again at) the coding of the variables involved. What do you think the values in the table represent?
- In which category are those who voted Labour in 1992 but did not vote Labour in 1997?
- In which category are those who did not vote Labour in 1992 but voted Labour in 1997?
- What does the 0 category stand for here?
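One way to see mechanically how these values arise is to difference two small binary vectors (a toy illustration, not the lab data):

```r
# Differencing two binary indicators can only yield -1, 0, or 1
v92 <- c(1, 0, 0, 1)  # voted Labour in 1992?
v97 <- c(0, 1, 0, 1)  # voted Labour in 1997?
v97 - v92
# -1: Labour in 1992 only; 1: Labour in 1997 only; 0: no change either way
```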
Conditional averages
What about the proportion of Labour voters among those who did and did not read papers that shifted their editorial support?
# We can first break down the dataset into two:
not_reader <- news |> filter(reader == 0)
reader <- news |> filter(reader == 1)
# Then calculate means within each:
not_reader_97_lab_vote_mean <- mean(not_reader$vote_l_97)
reader_97_lab_vote_mean <- mean(reader$vote_l_97)
# With the result:
not_reader_97_lab_vote_mean
[1] 0.4305355
reader_97_lab_vote_mean
[1] 0.5829384
# The same group means can also be obtained in a single step with dplyr:
news |> group_by(reader) |> summarise(mean(vote_l_97, na.rm = TRUE))
# A tibble: 2 × 2
  reader `mean(vote_l_97, na.rm = TRUE)`
   <dbl>                           <dbl>
1      0                           0.431
2      1                           0.583
Difference between “treatment” groups in 1997: “Average Treatment Effect”
The average treatment effect would be the difference between the two groups in the 1997 vote share:
between_97 <- reader_97_lab_vote_mean - not_reader_97_lab_vote_mean
between_97
[1] 0.1524029
ATE <- between_97
round(ATE * 100, 1)
[1] 15.2
Between-group difference in 1992
not_reader_92_lab_vote_mean <- mean(not_reader$vote_l_92)
reader_92_lab_vote_mean <- mean(reader$vote_l_92)
not_reader_92_lab_vote_mean
[1] 0.3227207
reader_92_lab_vote_mean
[1] 0.3886256
between_92 <- reader_92_lab_vote_mean - not_reader_92_lab_vote_mean
between_92
[1] 0.0659049
round(between_92 * 100, 1)
[1] 6.6
Before/After difference within “treatment” groups
mean(news$vote_l_92[news$reader == 0], na.rm = TRUE)
[1] 0.3227207
mean(news$vote_l_92[news$reader == 1], na.rm = TRUE)
[1] 0.3886256
within_nonreader <- not_reader_97_lab_vote_mean - not_reader_92_lab_vote_mean
within_reader <- reader_97_lab_vote_mean - reader_92_lab_vote_mean
within_nonreader
[1] 0.1078148
within_reader
[1] 0.1943128
Differences-in-Differences
We can get the “difference in differences” effect in several ways:
DiD <- within_reader - within_nonreader
DiD
[1] 0.08649803
between_97 - between_92
[1] 0.08649803
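The two computations agree because the double difference can be grouped either by treatment status or by year. Writing \(\bar{y}_{g,t}\) for the mean Labour vote of group \(g\) (readers \(R\) or non-readers \(N\)) in year \(t\):

$$
\text{DiD} = \underbrace{(\bar{y}_{R,97} - \bar{y}_{R,92})}_{\text{within readers}} - \underbrace{(\bar{y}_{N,97} - \bar{y}_{N,92})}_{\text{within non-readers}} = \underbrace{(\bar{y}_{R,97} - \bar{y}_{N,97})}_{\text{between, 1997}} - \underbrace{(\bar{y}_{R,92} - \bar{y}_{N,92})}_{\text{between, 1992}}
$$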
We can also obtain the DiD estimate with a linear regression model, in which the outcome \(y\) is the within-treatment-group difference and the predictor \(x\) is the “treatment” (reader). We can calculate the \(y\) as part of the regression command, but to include a mathematical operation within the regression formula, we need to “isolate” the mathematical operation using the I() function:
lm(I(vote_l_97 - vote_l_92) ~ reader, data = news) |> get_parameters()
    Parameter   Estimate
1 (Intercept) 0.10781476
2      reader 0.08649803
Look back at the tabulation of the vote_l_97 - vote_l_92 difference from earlier to check again the values taken by \(y\) in this model.
- What does the (Intercept) term represent here?
Linear probability model
The comparison of proportions that we calculated earlier relies on comparing average changes. We can also assess the effects involved using ordinary least squares (OLS) linear regression (a so-called “linear probability model”, given that our outcome is binary). For example, we can check what the overall mean vote for Labour in 1997 was in the sample, and the average treatment effect we calculated earlier:
## the overall mean of Labour vote in 1997
mean_l_97 <- lm(vote_l_97 ~ 1, data = news)
coefficients(mean_l_97)
(Intercept)
0.4507219
In the above, we do not include any predictor \(x\) variables; instead, we specify the number 1 on the right-hand side of the equation to get an “intercept”: the average of the entire sample.
Including the “treatment” variable reader in the model gives us the average treatment effect from earlier:
ATE_reg <- lm(vote_l_97 ~ reader, data = news)
coefficients(ATE_reg)
(Intercept) reader
0.4305355 0.1524029
The “(Intercept)” here absorbs only one variable category - when the value of reader is 0 (i.e. when the respondent is a “non-reader”, or an “untreated” case) - and as such it tells us the proportion of the 1997 Labour vote among those who were not readers of one of the endorsement-shifting newspapers.
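As a quick check of this logic on toy data (made-up values, not the lab dataset): with a single binary predictor, the lm() intercept always equals the outcome mean in the 0 group:

```r
# Toy binary outcome y and binary predictor x
y <- c(0, 1, 1, 0, 1, 1)
x <- c(0, 0, 0, 1, 1, 1)
fit <- lm(y ~ x)

# The intercept equals the mean of y among observations with x == 0,
# and the slope is the difference in means between the two groups
unname(coef(fit)[1])        # same as mean(y[x == 0])
unname(coef(fit)[2])        # same as mean(y[x == 1]) - mean(y[x == 0])
```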
It can often be useful to know how to extract specific coefficients/parameters from regression model results, which allows us to perform further operations on them. Below we use both base-R functions and some equivalent - and simpler - functions from the insight package which is included as part of the easystats meta-package:
## base R: extract intercept coefficient, turn into a percentage and round to 1 decimal
coefficients(ATE_reg)["(Intercept)"]
(Intercept)
0.4305355
coefficients(ATE_reg)["(Intercept)"][[1]]
[1] 0.4305355
round(coefficients(ATE_reg)["(Intercept)"][[1]] * 100, 1)
[1] 43.1
## Extracting the intercept value with base-R and formatting it using easystats {insight}
coefficients(ATE_reg)["(Intercept)"][[1]] |> format_percent(digits = 1)
[1] "43.1%"
## Extracting the intercept value and formatting it using easystats {insight}
get_intercept(ATE_reg) |> format_percent(1)
[1] "43.1%"
format_percent(get_intercept(ATE_reg), 1)
[1] "43.1%"
What happens if we add together the two coefficients from the ATE_reg model?
coefficients(ATE_reg)["(Intercept)"] + coefficients(ATE_reg)["reader"]
(Intercept)
0.5829384
We can also use the get_parameters() function from insight for this:
# Extracting parameter values (including for the intercept) and manipulating them, using {insight}
get_parameters(ATE_reg)
Parameter Estimate
1 (Intercept) 0.4305355
2 reader 0.1524029
get_parameters(ATE_reg)[2]
Estimate
1 0.4305355
2 0.1524029
get_parameters(ATE_reg)[2] |> sum()
[1] 0.5829384
get_parameters(ATE_reg)[[2]]
[1] 0.4305355 0.1524029
get_parameters(ATE_reg)[[2]] |> sum()
[1] 0.5829384
We can also obtain this parameter directly from a regression specification that omits the intercept term from the model. In practice, it is almost never a good idea to omit the intercept term from a model, but this example can provide an easy explanation of what happens if we do omit it. To leave out the intercept from a regression in R, we can replace the value of 1 on the right-hand side of the equation (that value is always added into the regression model by default, even if we do not specify it) with either a 0 or a -1. We can check below whether the two procedures produce equivalent results:
## the mean of Labour vote in 1997 among the "treated"
reader_mean_l_97a <- lm(vote_l_97 ~ 0 + reader, data = news)
reader_mean_l_97b <- lm(vote_l_97 ~ -1 + reader, data = news)
get_parameters(reader_mean_l_97a)
Parameter Estimate
1 reader 0.5829384
get_parameters(reader_mean_l_97b)
Parameter Estimate
1 reader 0.5829384
## Are the two coefficients for `reader` obtained above identical?
### Extract coefficients with base-R
identical(coefficients(reader_mean_l_97a),
coefficients(reader_mean_l_97b)
)
[1] TRUE
### Extract parameters (names and coefficients) with {insight}
identical(get_parameters(reader_mean_l_97a),
get_parameters(reader_mean_l_97b)
)
[1] TRUE
If we treat the reader variable as a factor (categorical) instead of numeric, we can see more clearly what the lack of an intercept means, and how the coefficient of reader changes meaning compared to the model with the intercept included:
reader_mean_l_97c <- lm(vote_l_97 ~ 0 + as.factor(reader), data = news)
## Pretty print parameters, keeping only the coefficients
get_parameters(reader_mean_l_97c)
Parameter Estimate
1 as.factor(reader)0 0.4305355
2 as.factor(reader)1 0.5829384
model_parameters(reader_mean_l_97c, ci = NULL) |> print(digits = 3)
Parameter  | Coefficient
------------------------
reader [0] |       0.431
reader [1] |       0.583
compare_parameters(reader_mean_l_97c, ci = NULL) |> print(digits = 3)
Parameter | reader_mean_l_97c
--------------------------------
reader [0] | 0.431
reader [1] | 0.583
--------------------------------
Observations | 1593
compare_parameters(reader_mean_l_97a,
reader_mean_l_97b,
reader_mean_l_97c,
ci = NULL) |> print(digits = 3)
Parameter | reader_mean_l_97a | reader_mean_l_97b | reader_mean_l_97c
------------------------------------------------------------------------
reader | 0.583 | 0.583 |
reader [0] | | | 0.431
reader [1] | | | 0.583
------------------------------------------------------------------------
Observations | 1593 | 1593 | 1593
Check these regression results against the values for not_reader_97_lab_vote_mean and reader_97_lab_vote_mean obtained previously and attempt an interpretation.
We can now add the third variable - Labour vote in 1992 - into the model:
m1 <- lm(vote_l_97 ~ reader + vote_l_92, data = news)
coefficients(m1)
(Intercept) reader vote_l_92
0.2113743 0.1076466 0.6791049
Questions
What does this simple model tell us?
What about if we also include the interaction effect between the two predictors?
m1_int <- lm(vote_l_97 ~ factor(reader) * factor(vote_l_92), data = news)
summary(m1_int)

Call:
lm(formula = vote_l_97 ~ factor(reader) * factor(vote_l_92),
    data = news)
Residuals:
Min 1Q Median 3Q Max
-0.92683 -0.20513 -0.20513 0.09641 0.79487
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.20513 0.01235 16.607 < 2e-16 ***
factor(reader)1 0.15921 0.03549 4.486 7.77e-06 ***
factor(vote_l_92)1 0.69846 0.02174 32.124 < 2e-16 ***
factor(reader)1:factor(vote_l_92)1 -0.13597 0.05763 -2.359 0.0184 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3779 on 1589 degrees of freedom
Multiple R-squared: 0.4247, Adjusted R-squared: 0.4236
F-statistic: 390.9 on 3 and 1589 DF, p-value: < 2.2e-16
Question: How do we interpret the coefficients on this simple interaction model?
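To unpack the interaction, note that each of the four group means implied by the model is a sum of coefficients (the values below are copied from the model summary above):

```r
# Coefficients from the interaction model summary above
b0   <-  0.20513   # (Intercept)
b_r  <-  0.15921   # factor(reader)1
b_v  <-  0.69846   # factor(vote_l_92)1
b_rv <- -0.13597   # factor(reader)1:factor(vote_l_92)1

# Implied mean 1997 Labour vote in each of the four groups:
b0                      # non-reader, did not vote Labour in 1992
b0 + b_r                # reader, did not vote Labour in 1992
b0 + b_v                # non-reader, voted Labour in 1992
b0 + b_r + b_v + b_rv   # reader, voted Labour in 1992
```

The negative interaction term thus means that the “reader” gap in 1997 is smaller among 1992 Labour voters than among 1992 non-Labour voters.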
If the interpretation of the interaction model proves difficult (as it should), maybe it’s better to visualise the model by plotting it:
predict_response(m1_int, terms = c("reader", "vote_l_92")) |>
plot(connect_lines = TRUE)
To make the meaning of the plot even more straightforward for our interpretative purposes, we can make some changes to the y-axis to display vertical lines at the probability-levels that we got from the model coefficients, and label them with the name of the regression terms. Check the code below against the information from the model summary:
predict_response(m1_int, terms = c("reader", "vote_l_92")) |>
plot(connect_lines = TRUE, dodge = NULL) +
theme(panel.grid.minor = element_blank(),
axis.title.y = element_blank()) +
expand_limits(x = 0, y = 0) +
scale_y_continuous(breaks = c(0,
0.20513,
0.20513 + 0.15921,
0.20513 + 0.69846,
0.20513 + 0.15921 + 0.69846 - 0.13597,
1),
labels = c("0%",
"92 Not-Lab Not-Reader \n Intercept (20.5%)",
"92 Not-Lab Reader",
"92 Lab Not-Reader",
"92 Lab Reader",
"100%")) +
scale_x_continuous(breaks = c(0, 1))
predict_response(m1_int, terms = c("vote_l_92","reader")) |>
plot(connect_lines = TRUE, dodge = NULL) +
theme(panel.grid.minor = element_blank(),
axis.title.y = element_blank()) +
expand_limits(x = 0, y = 0) +
scale_y_continuous(breaks = c(0,
0.20513,
0.20513 + 0.15921,
0.20513 + 0.69846,
0.20513 + 0.15921 + 0.69846 - 0.13597,
1),
labels = c("0%",
"92 Not-Lab Not-Reader \n Intercept (20.5%)",
"92 Not-Lab Reader",
"92 Lab Not-Reader",
"92 Lab Reader",
"100%")) +
scale_x_continuous(breaks = c(0, 1))