HSS8005
  • Module plan
  • Materials
  • Resources
  • Data
  • Assessment
  • Canvas
  1. Week 7
  2. Notes
  • Weekly materials

  • Week 1
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 2
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 3
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 4
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 5
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 6
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 7
    • Slides
    • Notes
    • Exercises
    • Handout
  • Week 8
    • Slides
    • Notes
    • Exercises
    • Handout

On this page

  • Aims
  • R packages
  • Example 1: basic tabulations of hierarchical and time-series structures
  • Example 2: (Mitchell 2021): longitudinal multi-level analysis
    • Model 1:
    • Model 2:
    • Model 3:
    • Model 4:
    • Tabulating all the models:
  • Example 3: Longer panel data estimation with {plm}
    • Pooled model
    • Fixed effects model
    • Random effects
    • Tabulating all the models:
  1. Week 7
  2. Notes

Week 7 Worksheet Notes

Temporalities: Time series, panel and longitudinal data analysis

Aims

By the end of the session you will learn how to:

  • fit random-intercepts and random coefficients models to hierarchical data that contains a temporal dimension
  • use several R packages for modelling time-series data

R packages

# Just in case we get errors asking that the package repositories be explicitly set when installing new packages:

options(repos = list(CRAN = "http://cran.rstudio.com/"))

# Install and load required packages

# We can use the {pacman} package to easily install and load several packages:
# ({pacman} itself may need installing if not yet installed)

pacman::p_load(
  tidyverse, sjlabelled, easystats, 
  ggformula, ggeffects, marginaleffects, 
  modelsummary, gtsummary, sjPlot,
  survey, sandwich, lmtest, lme4,            # for general hierarchical modelling
  panelr, plm, pglm)                         # for time-series hierarchical data structures

Example 1: basic tabulations of hierarchical and time-series structures

Let’s import this simulated toy dataset:

test <- data_read("https://cgmoreh.github.io/HSS8005-24/data/test.dta")
Reading data...
Variables where all values have associated labels are now converted into
  factors. If this is not intended, use `convert_factors = FALSE`.
1 out of 9 variables were fully labelled and converted into factors.

we can peek at the data to get a feel for the variables:

datawizard::data_peek(test) |> print_md()
Data frame with 375 rows and 9 variables
Variable Type Values
teacher numeric 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
student numeric 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, …
age numeric 14, 15, 16, 17, 18, 14, 15, 16, 17, 18, 14, …
cage numeric -2, -1, 0, 1, 2, -2, -1, 0, 1, 2, -2, -1, 0, …
sex factor Male, Male, Male, Male, Male, Male, …
score numeric 64, 68, 69, 71, 75, 58, 60, 71, 69, 72, 54, …
pass numeric 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, …
experience numeric 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
zero numeric 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

We can also check the first few and last few observations in a dataframe using the head() and tail() functions included in base-R. The {gt} package also has a nice function (gt_preview()) that can do the same more flexibly and output it as a single table:

gt::gt_preview(test, top_n = 3, bottom_n = 3, incl_rownums = TRUE)
teacher student age cage sex score pass experience zero
1 1 1 14 -2 Male 64 0 5 0
2 1 1 15 -1 Male 68 0 5 0
3 1 1 16 0 Male 69 0 5 0
4..372
373 3 75 16 0 Female 81 1 8 0
374 3 75 17 1 Female 85 1 8 0
375 3 75 18 2 Female 86 1 8 0

This allows us to see more clearly how teachers, students and ages relate to each other and how they are nested.

To check more directly the number of students and teachers in the dataset, we could use the n_unique() function from {dplyr} in the {tidyverse}:

n_unique(test$student)
[1] 75
n_unique(test$teacher)
[1] 3

We can also check the distribution of the variables; we can break the sex variable (a categorical variable) into its two constituent categories and treat it as numeric for this purpose:

test |> datawizard::to_numeric() |> 
  datawizard::describe_distribution()
Variable   |  Mean |    SD | IQR |          Range | Skewness | Kurtosis |   n | n_Missing
-----------------------------------------------------------------------------------------
teacher    |  2.00 |  0.82 |   2 |   [1.00, 3.00] |     0.00 |    -1.50 | 375 |         0
student    | 38.00 | 21.68 |  38 |  [1.00, 75.00] |     0.00 |    -1.20 | 375 |         0
age        | 16.00 |  1.42 |   2 | [14.00, 18.00] |     0.00 |    -1.30 | 375 |         0
cage       |  0.00 |  1.42 |   2 |  [-2.00, 2.00] |     0.00 |    -1.30 | 375 |         0
sex.Female |  0.51 |  0.50 |   1 |   [0.00, 1.00] |    -0.03 |    -2.01 | 375 |         0
sex. Male  |  0.49 |  0.50 |   1 |   [0.00, 1.00] |     0.03 |    -2.01 | 375 |         0
score      | 68.71 |  7.20 |  10 | [51.00, 87.00] |     0.14 |    -0.28 | 375 |         0
pass       |  0.39 |  0.49 |   1 |   [0.00, 1.00] |     0.46 |    -1.80 | 375 |         0
experience |  6.33 |  1.25 |   3 |   [5.00, 8.00] |     0.38 |    -1.50 | 375 |         0
zero       |  0.00 |  0.00 |   0 |   [0.00, 0.00] |          |          | 375 |         0

Finally, it’s worth looking at the timeline of how test scores change across time (i.e. ages) for

ggplot(test, aes(x= age, score)) +
  theme_bw() +
  geom_line(aes(group= student), color= "blue")

Example 2: (Mitchell 2021): longitudinal multi-level analysis

The article by Mitchell (2021) uses the European Social Survey (2002–2016) to test how differences in social trust, both within and between countries influence attitudes about immigrants. The ESS is a cross-national survey with representative samples of 34 countries, all of them are included in this study. Three items were included in each of the eight waves of the ESS to measure attitudes about immigrants:

  • “Would you say it is generally bad or good for [country]’s economy that people come to live here from other countries?” (0= “Bad for the economy,” 10 = “Good for the economy”).
  • “Would you say that [country]’s cultural life is generally undermined or enriched by people coming to live here from other countries?” (0 = “Cultural life undermined,” 10= “Cultural life enriched”).
  • “Is [country] made a worse or a better place to live by people coming to live here from other countries?,” (0= “Worse place to live,” 10 = “Better place to live”).

An index was created by averaging responses to each of the three questions.

The main independent variable measures generalized trust. At the individual level, trust is measured through the question, “generally speaking, would you say that most people can be trusted, or that you can’t be too careful in dealing with people?” On a 10 point scale (0= “You cant be too careful,” 10= “most people can be trusted”). To isolate individuals’ trust levels in a way that is relative to the respondents’ country, responses were centered against the average level of generalized trust in that country at that time. This means that, the variable included in the analysis is the level of difference people say that they can trust others, compared to others in their country during the survey wave.

Due to the structure of the data, a multi-level modeling approach that nests respondents inside of country-years and counties was employed for this study. This allows for both the estimation of the relationship between contextual variables of interest and anti-immigrant attitudes both cross-sectionally and longitudinally. Since the responses to the surveys are in part dependent on the groups that the respondents belong, models that accommodate for this country and country-year grouping are better suited for this analysis than models with no grouping structure.

The analysis follows a model building approach where Model 1 is an “empty” model for the dependent variable and its nesting structure in the three level model. Model 2 tests the relationship between the individual level independent variables and the dependent variable measuring respondents’ attitudes about immigrants. Model 3 adds the country level independent variables, including country level social trust. In the first three models random intercepts are used in estimation, and Model 4 adds random slopes to the estimates in the three- level models. These models are presented in Table 1 of the article (see there).

First we load the already prepared dataset:

mitchell <- data_read("https://cgmoreh.github.io/HSS8005-24/data/mitchell2021.rds")
Reading data...
Variables where all values have associated labels are now converted into
  factors. If this is not intended, use `convert_factors = FALSE`.
18 out of 70 variables were fully labelled and converted into factors.
Following 1 variables are empty:
  edulvlb
  
Use `remove_empty_columns()` to remove them from the data frame.

Model 1:

#Null model ------------------------------------------------------------------------------------------------------
m1  <- lmer(immatt~ 1 + 
            (1|cntry/cntryyr), data=mitchell)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00277535 (tol = 0.002, component 1)
m1.1<- lmer(immatt~ 1 + 
              (1|cntry), data=mitchell)

m1.2<- lmer(immatt~ 1 + 
              (1|cntryyr), data=mitchell)

m1.3<- lmer(immatt~ 1 + 
              (1|cntry) +
              (1|cntry:cntryyr),
            data=mitchell)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00277535 (tol = 0.002, component 1)
m1.4<- lmer(immatt~ 1 + 
              (1|cntry) +
              (1|cntryyr:cntry),
            data=mitchell)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00277535 (tol = 0.002, component 1)
summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: immatt ~ 1 + (1 | cntry/cntryyr)
   Data: mitchell

REML criterion at convergence: 1334089

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3198 -0.6271  0.0599  0.6635  3.5527 

Random effects:
 Groups        Name        Variance Std.Dev.
 cntryyr:cntry (Intercept) 0.07828  0.2798  
 cntry         (Intercept) 0.61843  0.7864  
 Residual                  4.03754  2.0094  
Number of obs: 314934, groups:  cntryyr:cntry, 200; cntry, 34

Fixed effects:
            Estimate Std. Error t value
(Intercept)   4.9262     0.1368   36.02
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00277535 (tol = 0.002, component 1)
parameters::compare_models(m1, m1.1, m1.2, m1.3, effects = "all")
# Fixed Effects

Parameter   |                m1 |              m1.1 |              m1.2 |              m1.3
-------------------------------------------------------------------------------------------
(Intercept) | 4.93 (4.66, 5.19) | 4.92 (4.65, 5.19) | 4.99 (4.88, 5.10) | 4.93 (4.66, 5.19)

# Random Effects

Parameter                     |        m1 |      m1.1 |      m1.2 |      m1.3
-----------------------------------------------------------------------------
SD (Residual)                 | 2.01 (, ) | 2.03 (, ) | 2.01 (, ) | 2.01 (, )
SD (Intercept: cntry)         | 0.79 (, ) | 0.80 (, ) |           | 0.79 (, )
SD (Intercept: cntryyr:cntry) | 0.28 (, ) |           |           |          
SD (Intercept: cntryyr)       |           |           | 0.76 (, ) |          
SD (Intercept: cntry:cntryyr) |           |           |           | 0.28 (, )
modelsummary::modelsummary(list("cntry/cntryyr" = m1, m1.1, m1.2, m1.3, "cntry+cntryyr:cntry" = m1.4))
cntry/cntryyr cntry+cntryyr:cntry
(Intercept) 4.926 4.918 4.990 4.926 4.926
(0.137) (0.138) (0.054) (0.137) (0.137)
SD (Observations) 2.009 2.025 2.009 2.009 2.009
SD (Intercept cntry) 0.786 0.804 0.786 0.786
SD (Intercept cntryyrcntry) 0.280 0.280
SD (Intercept cntryyr) 0.763
SD (Intercept cntrycntryyr) 0.280
Num.Obs. 314934 314934 314934 314934 314934
R2 Marg. 0.000 0.000 0.000 0.000 0.000
R2 Cond. 0.147 0.136 0.126 0.147 0.147
AIC 1334097.2 1338521.2 1334365.6 1334097.2 1334097.2
BIC 1334139.9 1338553.2 1334397.6 1334139.9 1334139.9
ICC 0.1 0.1 0.1 0.1 0.1
RMSE 2.01 2.03 2.01 2.01 2.01

Model 2:

#Model with individual level variables ----------------------------------------------------------------------------------
m2<-lmer(immatt~  ppltrustd+ young+ old + university + gndr+ hincfel+ lrscale+(1|cntry)+  (1|cntryyr), data=mitchell)
summary(m2)
Linear mixed model fit by REML ['lmerMod']
Formula: immatt ~ ppltrustd + young + old + university + gndr + hincfel +  
    lrscale + (1 | cntry) + (1 | cntryyr)
   Data: mitchell

REML criterion at convergence: 1096546

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9489 -0.6138  0.0431  0.6506  4.3466 

Random effects:
 Groups   Name        Variance Std.Dev.
 cntryyr  (Intercept) 0.07645  0.2765  
 cntry    (Intercept) 0.46891  0.6848  
 Residual             3.43915  1.8545  
Number of obs: 268995, groups:  cntryyr, 198; cntry, 34

Fixed effects:
                                         Estimate Std. Error t value
(Intercept)                              5.374154   0.121853  44.103
ppltrustd                                0.195515   0.001637 119.448
young                                    0.158414   0.010878  14.563
old                                     -0.284244   0.009190 -30.931
university                               0.706875   0.009983  70.805
gndr                                    -0.023206   0.007199  -3.223
hincfelCoping on present income         -0.279744   0.009005 -31.064
hincfelDifficult on present income      -0.535303   0.012056 -44.403
hincfelVery difficult on present income -0.803543   0.017141 -46.879
lrscale1                                 0.085591   0.030502   2.806
lrscale2                                -0.655995   0.027094 -24.212
lrscale3                                 0.202945   0.025126   8.077
lrscale4                                 0.167045   0.023010   7.260
lrscale5                                 0.071751   0.022953   3.126
lrscale6                                -0.267820   0.020911 -12.808
lrscale7                                -0.155967   0.023003  -6.780
lrscale8                                -0.241845   0.022773 -10.620
lrscale9                                -0.335819   0.023464 -14.312
lrscaleRight                            -0.488616   0.029146 -16.765

Correlation matrix not shown by default, as p = 19 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

Model 3:

#Full model with context level variables ---------------------------------------------------------------------------------
m3<-lmer(immatt~  ppltrustd+ young +old + university+ gndr+ hincfel+ lrscale+ 
           countrytrustD+ countrytrustM+ nwolD+ nwolM + lGDPD+
           lGDPM + PctFBD + PctFBM+ essround + 
           (1|cntry) +  
           (1|cntryyr), 
         data=mitchell)
summary(m3)
Linear mixed model fit by REML ['lmerMod']
Formula: immatt ~ ppltrustd + young + old + university + gndr + hincfel +  
    lrscale + countrytrustD + countrytrustM + nwolD + nwolM +  
    lGDPD + lGDPM + PctFBD + PctFBM + essround + (1 | cntry) +  
    (1 | cntryyr)
   Data: mitchell

REML criterion at convergence: 1036621

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9669 -0.6108  0.0419  0.6506  4.3838 

Random effects:
 Groups   Name        Variance Std.Dev.
 cntryyr  (Intercept) 0.06101  0.2470  
 cntry    (Intercept) 0.30562  0.5528  
 Residual             3.40292  1.8447  
Number of obs: 254959, groups:  cntryyr, 189; cntry, 32

Fixed effects:
                                         Estimate Std. Error t value
(Intercept)                             10.244812   4.465218   2.294
ppltrustd                                0.197809   0.001682 117.605
young                                    0.140804   0.011116  12.666
old                                     -0.280120   0.009353 -29.950
university                               0.711735   0.010111  70.389
gndr                                    -0.033242   0.007357  -4.518
hincfelCoping on present income         -0.280695   0.009103 -30.834
hincfelDifficult on present income      -0.534626   0.012359 -43.256
hincfelVery difficult on present income -0.807173   0.017778 -45.403
lrscale1                                 0.069075   0.031374   2.202
lrscale2                                -0.733225   0.028132 -26.063
lrscale3                                 0.184455   0.025777   7.156
lrscale4                                 0.134272   0.023604   5.689
lrscale5                                 0.031266   0.023557   1.327
lrscale6                                -0.320745   0.021487 -14.928
lrscale7                                -0.209380   0.023602  -8.871
lrscale8                                -0.302706   0.023374 -12.951
lrscale9                                -0.399881   0.024124 -16.576
lrscaleRight                            -0.564745   0.030231 -18.681
countrytrustD                            0.731459   0.118322   6.182
countrytrustM                            0.375491   0.146294   2.567
nwolD                                   -0.021017   0.009739  -2.158
nwolM                                   -0.098709   0.037864  -2.607
lGDPD                                    0.114872   0.361877   0.317
lGDPM                                   -0.592714   0.465571  -1.273
PctFBD                                   0.017653   0.020988   0.841
PctFBM                                   0.012114   0.021314   0.568
essround                                -0.025951   0.016855  -1.540

Correlation matrix not shown by default, as p = 28 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

Model 4:

#Full model with random slopes for between country trust -----------------------------------------------------------------
# due to convergence issues the REML=FALSE but the estimates are the same
m4<-lmer(immatt~  ppltrustd+ young +old + university+ gndr+ hincfel+ lrscale+ 
           countrytrustD+ countrytrustM+ nwolD+ nwolM + lGDPD+
           lGDPM + PctFBD + PctFBM+ essround + 
           (countrytrustD|cntry) +  
           (1|cntryyr), 
         data=mitchell)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00286425 (tol = 0.002, component 1)
summary(m4)
Linear mixed model fit by REML ['lmerMod']
Formula: immatt ~ ppltrustd + young + old + university + gndr + hincfel +  
    lrscale + countrytrustD + countrytrustM + nwolD + nwolM +  
    lGDPD + lGDPM + PctFBD + PctFBM + essround + (countrytrustD |  
    cntry) + (1 | cntryyr)
   Data: mitchell

REML criterion at convergence: 1036619

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9681 -0.6107  0.0419  0.6507  4.3840 

Random effects:
 Groups   Name          Variance Std.Dev. Corr
 cntryyr  (Intercept)   0.06002  0.2450       
 cntry    (Intercept)   0.30433  0.5517       
          countrytrustD 0.04453  0.2110   1.00
 Residual               3.40292  1.8447       
Number of obs: 254959, groups:  cntryyr, 189; cntry, 32

Fixed effects:
                                         Estimate Std. Error t value
(Intercept)                             11.494698   4.152133   2.768
ppltrustd                                0.197812   0.001682 117.607
young                                    0.140817   0.011116  12.667
old                                     -0.280137   0.009353 -29.952
university                               0.711716   0.010111  70.388
gndr                                    -0.033238   0.007357  -4.518
hincfelCoping on present income         -0.280658   0.009103 -30.830
hincfelDifficult on present income      -0.534571   0.012359 -43.252
hincfelVery difficult on present income -0.807094   0.017778 -45.399
lrscale1                                 0.069090   0.031374   2.202
lrscale2                                -0.733242   0.028132 -26.064
lrscale3                                 0.184458   0.025777   7.156
lrscale4                                 0.134287   0.023604   5.689
lrscale5                                 0.031286   0.023557   1.328
lrscale6                                -0.320695   0.021487 -14.925
lrscale7                                -0.209369   0.023602  -8.871
lrscale8                                -0.302694   0.023374 -12.950
lrscale9                                -0.399874   0.024124 -16.576
lrscaleRight                            -0.564760   0.030231 -18.681
countrytrustD                            0.728344   0.123881   5.879
countrytrustM                            0.408231   0.138753   2.942
nwolD                                   -0.021657   0.009677  -2.238
nwolM                                   -0.106870   0.034795  -3.071
lGDPD                                    0.011592   0.358608   0.032
lGDPM                                   -0.723821   0.433199  -1.671
PctFBD                                   0.018206   0.021093   0.863
PctFBM                                   0.010639   0.020044   0.531
essround                                -0.022654   0.016835  -1.346

Correlation matrix not shown by default, as p = 28 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00286425 (tol = 0.002, component 1)

Tabulating all the models:

We can use the same modelsummary() function we have already used before:

modelsummary::modelsummary(list(m1, m2, m3, m4))
 (1)   (2)   (3)   (4)
(Intercept) 4.926 5.374 10.245 11.495
(0.137) (0.122) (4.465) (4.152)
ppltrustd 0.196 0.198 0.198
(0.002) (0.002) (0.002)
young 0.158 0.141 0.141
(0.011) (0.011) (0.011)
old −0.284 −0.280 −0.280
(0.009) (0.009) (0.009)
university 0.707 0.712 0.712
(0.010) (0.010) (0.010)
gndr −0.023 −0.033 −0.033
(0.007) (0.007) (0.007)
hincfelCoping on present income −0.280 −0.281 −0.281
(0.009) (0.009) (0.009)
hincfelDifficult on present income −0.535 −0.535 −0.535
(0.012) (0.012) (0.012)
hincfelVery difficult on present income −0.804 −0.807 −0.807
(0.017) (0.018) (0.018)
lrscale1 0.086 0.069 0.069
(0.031) (0.031) (0.031)
lrscale2 −0.656 −0.733 −0.733
(0.027) (0.028) (0.028)
lrscale3 0.203 0.184 0.184
(0.025) (0.026) (0.026)
lrscale4 0.167 0.134 0.134
(0.023) (0.024) (0.024)
lrscale5 0.072 0.031 0.031
(0.023) (0.024) (0.024)
lrscale6 −0.268 −0.321 −0.321
(0.021) (0.021) (0.021)
lrscale7 −0.156 −0.209 −0.209
(0.023) (0.024) (0.024)
lrscale8 −0.242 −0.303 −0.303
(0.023) (0.023) (0.023)
lrscale9 −0.336 −0.400 −0.400
(0.023) (0.024) (0.024)
lrscaleRight −0.489 −0.565 −0.565
(0.029) (0.030) (0.030)
countrytrustD 0.731 0.728
(0.118) (0.124)
countrytrustM 0.375 0.408
(0.146) (0.139)
nwolD −0.021 −0.022
(0.010) (0.010)
nwolM −0.099 −0.107
(0.038) (0.035)
lGDPD 0.115 0.012
(0.362) (0.359)
lGDPM −0.593 −0.724
(0.466) (0.433)
PctFBD 0.018 0.018
(0.021) (0.021)
PctFBM 0.012 0.011
(0.021) (0.020)
essround −0.026 −0.023
(0.017) (0.017)
SD (Intercept cntry) 0.786 0.685 0.553 0.552
SD (countrytrustD cntry) 0.211
Cor (Intercept~countrytrustD cntry) 1.000
SD (Observations) 2.009 1.854 1.845 1.845
SD (Intercept cntryyr) 0.276 0.247 0.245
SD (Intercept cntryyrcntry) 0.280
Num.Obs. 314934 268995 254959 254959
R2 Marg. 0.000 0.108 0.163 0.165
R2 Cond. 0.147 0.230 0.244 0.246
AIC 1334097.2 1096590.3 1036683.2 1036685.1
BIC 1334139.9 1096821.4 1037007.1 1037029.9
ICC 0.1 0.1 0.1 0.1
RMSE 2.01 1.85 1.84 1.84

Or the {sjPlot} package would replicate the output presented in the article exactly:

sjPlot::tab_model(m1, m2, m3, m4)
  Immigration bad or good
for country's economy
Immigration bad or good
for country's economy
Immigration bad or good
for country's economy
Immigration bad or good
for country's economy
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 4.93 4.66 – 5.19 <0.001 5.37 5.14 – 5.61 <0.001 10.24 1.49 – 19.00 0.022 11.49 3.36 – 19.63 0.006
Most people can be
trusted or you can't be
too careful
0.20 0.19 – 0.20 <0.001 0.20 0.19 – 0.20 <0.001 0.20 0.19 – 0.20 <0.001
young 0.16 0.14 – 0.18 <0.001 0.14 0.12 – 0.16 <0.001 0.14 0.12 – 0.16 <0.001
old -0.28 -0.30 – -0.27 <0.001 -0.28 -0.30 – -0.26 <0.001 -0.28 -0.30 – -0.26 <0.001
university 0.71 0.69 – 0.73 <0.001 0.71 0.69 – 0.73 <0.001 0.71 0.69 – 0.73 <0.001
Gender -0.02 -0.04 – -0.01 0.001 -0.03 -0.05 – -0.02 <0.001 -0.03 -0.05 – -0.02 <0.001
Feeling about household's
income nowadays: Coping
on present income
-0.28 -0.30 – -0.26 <0.001 -0.28 -0.30 – -0.26 <0.001 -0.28 -0.30 – -0.26 <0.001
Feeling about household's
income nowadays:
Difficult on present
income
-0.54 -0.56 – -0.51 <0.001 -0.53 -0.56 – -0.51 <0.001 -0.53 -0.56 – -0.51 <0.001
Feeling about household's
income nowadays: Very
difficult on present
income
-0.80 -0.84 – -0.77 <0.001 -0.81 -0.84 – -0.77 <0.001 -0.81 -0.84 – -0.77 <0.001
Placement on left right
scale: 1
0.09 0.03 – 0.15 0.005 0.07 0.01 – 0.13 0.028 0.07 0.01 – 0.13 0.028
Placement on left right
scale: 2
-0.66 -0.71 – -0.60 <0.001 -0.73 -0.79 – -0.68 <0.001 -0.73 -0.79 – -0.68 <0.001
Placement on left right
scale: 3
0.20 0.15 – 0.25 <0.001 0.18 0.13 – 0.23 <0.001 0.18 0.13 – 0.23 <0.001
Placement on left right
scale: 4
0.17 0.12 – 0.21 <0.001 0.13 0.09 – 0.18 <0.001 0.13 0.09 – 0.18 <0.001
Placement on left right
scale: 5
0.07 0.03 – 0.12 0.002 0.03 -0.01 – 0.08 0.184 0.03 -0.01 – 0.08 0.184
Placement on left right
scale: 6
-0.27 -0.31 – -0.23 <0.001 -0.32 -0.36 – -0.28 <0.001 -0.32 -0.36 – -0.28 <0.001
Placement on left right
scale: 7
-0.16 -0.20 – -0.11 <0.001 -0.21 -0.26 – -0.16 <0.001 -0.21 -0.26 – -0.16 <0.001
Placement on left right
scale: 8
-0.24 -0.29 – -0.20 <0.001 -0.30 -0.35 – -0.26 <0.001 -0.30 -0.35 – -0.26 <0.001
Placement on left right
scale: 9
-0.34 -0.38 – -0.29 <0.001 -0.40 -0.45 – -0.35 <0.001 -0.40 -0.45 – -0.35 <0.001
Placement on left right
scale: Right
-0.49 -0.55 – -0.43 <0.001 -0.56 -0.62 – -0.51 <0.001 -0.56 -0.62 – -0.51 <0.001
countrytrust D 0.73 0.50 – 0.96 <0.001 0.73 0.49 – 0.97 <0.001
countrytrust M 0.38 0.09 – 0.66 0.010 0.41 0.14 – 0.68 0.003
nwol D -0.02 -0.04 – -0.00 0.031 -0.02 -0.04 – -0.00 0.025
nwol M -0.10 -0.17 – -0.02 0.009 -0.11 -0.18 – -0.04 0.002
l GDPD 0.11 -0.59 – 0.82 0.751 0.01 -0.69 – 0.71 0.974
l GDPM -0.59 -1.51 – 0.32 0.203 -0.72 -1.57 – 0.13 0.095
Pct FBD 0.02 -0.02 – 0.06 0.400 0.02 -0.02 – 0.06 0.388
Pct FBM 0.01 -0.03 – 0.05 0.570 0.01 -0.03 – 0.05 0.596
ESS round -0.03 -0.06 – 0.01 0.124 -0.02 -0.06 – 0.01 0.178
Random Effects
σ2 4.04 3.44 3.40 3.40
τ00 0.08 cntryyr:cntry 0.08 cntryyr 0.06 cntryyr 0.06 cntryyr
0.62 cntry 0.47 cntry 0.31 cntry 0.30 cntry
τ11       0.04 cntry.countrytrustD
ρ01       1.00 cntry
ICC 0.15 0.14 0.10 0.10
N 200 cntryyr 34 cntry 32 cntry 32 cntry
34 cntry 198 cntryyr 189 cntryyr 189 cntryyr
Observations 314934 268995 254959 254959
Marginal R2 / Conditional R2 0.000 / 0.147 0.108 / 0.230 0.163 / 0.244 0.165 / 0.246

And we can also plot the results from a model of interest:

sjPlot::plot_model( m4)

Example 3: Longer panel data estimation with {plm}

Panel data gathers information about several individuals (cross-sectional units) over several periods. The panel is balanced if all units are observed in all periods; if some units are missing in some periods, the panel is unbalanced. A wide panel has the cross-sectional dimension (N) much larger than the longitudinal dimension (T); when the opposite is true, we have a long panel. Normally, the same units are observed in all periods; when this is not the case and each period samples mostly other units, the result is not a proper panel data, but pooled cross-sections model.

In the previous examples our panels were fairly short. In this exercise, we use longer panel data to also demonstrate some additional functionalities in the {plm} package. The dataset is more typical of econometric data and originates from Hill et al.’s book “Principles of Econometrics”.

We can load the dataset from the module website and inspect its contents:

nls <- data_read("https://cgmoreh.github.io/HSS8005-24/data/nls_panel.rda")
Reading data...
Variables where all values have associated labels are now converted into
  factors. If this is not intended, use `convert_factors = FALSE`.
0 out of 18 variables were fully labelled and converted into factors.

we can peek at the data to get a feel for the variables:

datawizard::data_peek(nls) |> print_md()
Data frame with 3580 rows and 18 variables
Variable Type Values
id numeric 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
year numeric 82, 83, 85, 87, 88, 82, 83, 85, 87, 88, 82, …
lwage numeric 1.808289, 1.863417, 1.789367, 1.84653, …
hours numeric 38, 38, 38, 40, 40, 48, 43, 35, 42, 42, 48, …
age numeric 30, 31, 33, 35, 37, 36, 37, 39, 41, 43, 35, …
educ numeric 12, 12, 12, 12, 12, 17, 17, 17, 17, 17, 12, …
collgrad numeric 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, …
msp numeric 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
nev_mar numeric 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
not_smsa numeric 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
c_city numeric 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
south numeric 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
black numeric 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
union numeric 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, …
exper numeric 7.666667, 8.583333, 10.17949, 12.17949, …
exper2 numeric 58.77777, 73.67361, 103.622, 148.3399, …
tenure numeric 7.666667, 8.583333, 1.833333, 3.75, 5.25, …
tenure2 numeric 58.77777, 73.67361, 3.361111, 14.0625, …

We can also check the first few and last few observations in a dataframe using the head() and tail() functions included in base-R. The {gt} package also has a nice function (gt_preview()) that can do the same more flexibly and output it as a single table:

gt::gt_preview(nls, top_n = 3, bottom_n = 3, incl_rownums = TRUE)
id year lwage hours age educ collgrad msp nev_mar not_smsa c_city south black union exper exper2 tenure tenure2
1 1 82 1.808289 38 30 12 0 1 0 0 1 0 1 1 7.666667 58.77777 7.666667 58.777770
2 1 83 1.863417 38 31 12 0 1 0 0 1 0 1 1 8.583333 73.67361 8.583333 73.673610
3 1 85 1.789367 38 33 12 0 0 0 0 1 0 1 1 10.179490 103.62200 1.833333 3.361111
4..3577
3578 716 85 1.427116 40 39 12 0 1 0 1 0 1 0 0 15.435900 238.26690 6.083333 37.006950
3579 716 87 1.494368 30 41 12 0 1 0 1 0 1 0 0 17.435900 304.01050 8.166667 66.694450
3580 716 88 1.341422 40 42 12 0 1 0 1 0 1 0 0 18.858970 355.66090 9.583333 91.840270

To check more directly the number of ids and years in the dataset, we could use the n_unique() function from {dplyr} in the {tidyverse}:

n_unique(nls$id)
[1] 716
n_unique(nls$year)
[1] 5

We can also check the distribution of the variables:

datawizard::describe_distribution(nls)
Variable |   Mean |     SD |    IQR |          Range | Skewness | Kurtosis |    n | n_Missing
---------------------------------------------------------------------------------------------
id       | 358.50 | 206.72 | 358.50 | [1.00, 716.00] |     0.00 |    -1.20 | 3580 |         0
year     |  85.00 |   2.28 |   4.00 | [82.00, 88.00] |     0.00 |    -1.57 | 3580 |         0
lwage    |   1.92 |   0.46 |   0.64 |   [0.14, 4.25] |     0.26 |     0.49 | 3580 |         0
hours    |  38.03 |   7.97 |   2.00 |  [1.00, 80.00] |    -1.20 |     3.85 | 3580 |         0
age      |  35.88 |   3.89 |   6.00 | [28.00, 46.00] |     0.11 |    -0.62 | 3580 |         0
educ     |  13.02 |   2.44 |   3.00 |  [4.00, 18.00] |     0.24 |     0.15 | 3580 |         0
collgrad |   0.23 |   0.42 |   0.00 |   [0.00, 1.00] |     1.28 |    -0.36 | 3580 |         0
msp      |   0.66 |   0.47 |   1.00 |   [0.00, 1.00] |    -0.67 |    -1.55 | 3580 |         0
nev_mar  |   0.14 |   0.35 |   0.00 |   [0.00, 1.00] |     2.07 |     2.30 | 3580 |         0
not_smsa |   0.30 |   0.46 |   1.00 |   [0.00, 1.00] |     0.88 |    -1.23 | 3580 |         0
c_city   |   0.30 |   0.46 |   1.00 |   [0.00, 1.00] |     0.86 |    -1.25 | 3580 |         0
south    |   0.42 |   0.49 |   1.00 |   [0.00, 1.00] |     0.31 |    -1.91 | 3580 |         0
black    |   0.28 |   0.45 |   1.00 |   [0.00, 1.00] |     0.97 |    -1.06 | 3580 |         0
union    |   0.26 |   0.44 |   1.00 |   [0.00, 1.00] |     1.07 |    -0.86 | 3580 |         0
exper    |  12.03 |   3.86 |   5.35 |  [1.06, 27.19] |     0.02 |    -0.13 | 3580 |         0
exper2   | 159.60 |  95.46 | 128.86 | [1.12, 739.42] |     0.92 |     1.32 | 3580 |         0
tenure   |   6.95 |   5.17 |   7.83 |  [0.00, 24.75] |     0.64 |    -0.41 | 3580 |         0
tenure2  |  75.01 |  93.66 | 101.83 | [0.00, 612.56] |     1.74 |     2.91 | 3580 |         0

Panel datsets can be organized in mainly two forms: the long form has a column for each variable and a row for each individual-period; the wide form has a column for each variable-period and a row for each individual. Most panel data methods require the long form, but many data sources provide one wide-form table for each variable; assembling the data from different sources into a long form data frame is often not a trivial matter.

The next code sequence creates a panel structure for the dataset using the function pdata.frame() of the {plm} package and displays a small part of this datase:

nlspd <- pdata.frame(nls, index=c("id", "year"))

The function pdim() extracts the dimensions of the panel data:

pdim(nlspd)
Balanced Panel: n = 716, T = 5, N = 3580

In the following, we will fit a series of models estimating wage as a function of education, experience (and a second-order polynomial for experience) and race (“black” vs. “white”), with the aim of demonstrating functions in the plm() package. We could also fit these models using the hierarchical modelling functions form previous examples and exercises.

Pooled model

A pooled model does not allow for intercept or slope differences among individuals. Such a model can be estimated with plm() using the specification pooling:

wage.pooled <- plm(lwage ~ educ + exper + I(exper^2) + black, 
  model="pooling", 
  data = nlspd)
summary(wage.pooled)
Pooling Model

Call:
plm(formula = lwage ~ educ + exper + I(exper^2) + black, data = nlspd, 
    model = "pooling")

Balanced Panel: n = 716, T = 5, N = 3580

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-1.7043039 -0.2469447 -0.0062387  0.2421563  2.6619636 

Coefficients:
               Estimate  Std. Error t-value  Pr(>|t|)    
(Intercept)  0.42706205  0.05691619  7.5033 7.807e-14 ***
educ         0.07451334  0.00274756 27.1198 < 2.2e-16 ***
exper        0.06272218  0.00797946  7.8605 5.026e-15 ***
I(exper^2)  -0.00122307  0.00032263 -3.7910 0.0001525 ***
black       -0.13616209  0.01485167 -9.1681 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    772.56
Residual Sum of Squares: 550.24
R-Squared:      0.28777
Adj. R-Squared: 0.28697
F-statistic: 361.111 on 4 and 3575 DF, p-value: < 2.22e-16
model_parameters(wage.pooled)
# Fixed Effects

Parameter   | Coefficient |       SE |         95% CI | t(3575) |      p
------------------------------------------------------------------------
(Intercept) |        0.43 |     0.06 | [ 0.32,  0.54] |    7.50 | < .001
educ        |        0.07 | 2.75e-03 | [ 0.07,  0.08] |   27.12 | < .001
exper       |        0.06 | 7.98e-03 | [ 0.05,  0.08] |    7.86 | < .001
exper^2     |   -1.22e-03 | 3.23e-04 | [ 0.00,  0.00] |   -3.79 | < .001
black       |       -0.14 |     0.01 | [-0.17, -0.11] |   -9.17 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

Fixed effects model

The fixed effects model takes into account individual differences, translated into different intercepts of the regression line for different individuals. Variables that change little or not at all over time, such as some individual characteristics should not be included in a fixed effects model because they produce collinearity with the fixed effects.

To estimate a fixed effects with {plm}, we can use the option model=“within”, and dummy variables for individuals are created automatically in the background and included in the model:

wage.within <- plm(lwage ~ educ + exper + I(exper^2) + black, 
  model="within", 
  data = nlspd)
summary(wage.within)
Oneway (individual) effect Within Model

Call:
plm(formula = lwage ~ educ + exper + I(exper^2) + black, data = nlspd, 
    model = "within")

Balanced Panel: n = 716, T = 5, N = 3580

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-1.1798141 -0.0693879  0.0052198  0.0789143  1.9969132 

Coefficients:
              Estimate  Std. Error t-value  Pr(>|t|)    
exper       0.05521048  0.00580433  9.5120 < 2.2e-16 ***
I(exper^2) -0.00105655  0.00022838 -4.6263 3.888e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    126.95
Residual Sum of Squares: 110.37
R-Squared:      0.13057
Adj. R-Squared: -0.087246
F-statistic: 214.901 on 2 and 2862 DF, p-value: < 2.22e-16
model_parameters(wage.within)
# Fixed Effects

Parameter | Coefficient |       SE |         95% CI | t(2862) |      p
----------------------------------------------------------------------
exper     |        0.06 | 5.80e-03 | [ 0.04,  0.07] |    9.51 | < .001
exper^2   |   -1.06e-03 | 2.28e-04 | [ 0.00,  0.00] |   -4.63 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

As we can see in the results above, the “race” variable was excluded from the model because it does not vary over time (it stays constant “within” an individual). This is one major limitation of so-called fixed effects (“within”) models.

The within method is equivalent to including the individual “dummies” in the model.

Random effects

The random effects model elaborates on the fixed effects model by recognizing that, since the individuals in the panel are randomly selected, their characteristics, measured by the intercept \(\beta_{1i}\) should also be random. Random effects estimators are reliable under the assumption that individual characteristics (heterogeneity) are exogenous, that is, they are independent with respect to the regressors in the random effects equation. We have already fit random effects models using the {lme4} packages previously.

In {plm}, the same function we used for fixed effects can be used for random effects, but setting the argument model= to ‘random’ and selecting the random.method as one out of four possibilities: “swar” (default), “amemiya”, “walhus”, or “nerlove”:

wage.random <- plm(lwage ~ educ + exper + I(exper^2) + black, 
  model = "random",
  random.method = "swar",
  data = nlspd)
summary(wage.random)
Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = lwage ~ educ + exper + I(exper^2) + black, data = nlspd, 
    model = "random", random.method = "swar")

Balanced Panel: n = 716, T = 5, N = 3580

Effects:
                  var std.dev share
idiosyncratic 0.03857 0.19638  0.25
individual    0.11592 0.34047  0.75
theta: 0.7502

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-1.2666036 -0.0928955  0.0057325  0.0984554  2.0978278 

Coefficients:
               Estimate  Std. Error z-value  Pr(>|z|)    
(Intercept)  0.47585529  0.08084927  5.8857 3.964e-09 ***
educ         0.07488688  0.00547943 13.6669 < 2.2e-16 ***
exper        0.05638773  0.00560490 10.0604 < 2.2e-16 ***
I(exper^2)  -0.00108166  0.00022148 -4.8838 1.041e-06 ***
black       -0.13627562  0.02973398 -4.5832 4.580e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    167.23
Residual Sum of Squares: 137.85
R-Squared:      0.17566
Adj. R-Squared: 0.17474
Chisq: 761.809 on 4 DF, p-value: < 2.22e-16
model_parameters(wage.random)
# Fixed Effects

Parameter   | Coefficient |       SE |         95% CI |     z |   df |      p
-----------------------------------------------------------------------------
(Intercept) |        0.48 |     0.08 | [ 0.32,  0.63] |  5.89 | 3575 | < .001
educ        |        0.07 | 5.48e-03 | [ 0.06,  0.09] | 13.67 | 3575 | < .001
exper       |        0.06 | 5.60e-03 | [ 0.05,  0.07] | 10.06 | 3575 | < .001
exper^2     |   -1.08e-03 | 2.21e-04 | [ 0.00,  0.00] | -4.88 | 3575 | < .001
black       |       -0.14 |     0.03 | [-0.19, -0.08] | -4.58 | 3575 | < .001

Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald z-distribution approximation.

Tabulating all the models:

We can now collate the models into a summary tabe to compare them more closely:

modelsummary::modelsummary(list(wage.pooled, wage.within, wage.random))
 (1)   (2)   (3)
(Intercept) 0.427 0.476
(0.057) (0.081)
educ 0.075 0.075
(0.003) (0.005)
exper 0.063 0.055 0.056
(0.008) (0.006) (0.006)
I(exper^2) −0.001 −0.001 −0.001
(0.000) (0.000) (0.000)
black −0.136 −0.136
(0.015) (0.030)
Num.Obs. 3580 3580 3580
R2 0.288 0.131 0.176
R2 Adj. 0.287 −0.087 0.175
AIC 3467.1 −2290.1 −1488.2
BIC 3504.2 −2271.6 −1451.1
RMSE 0.39 0.18 0.20

We can compare these results from those obtaied using other hierarchical modelling functions, such as the lmer() function we used in earlier examples and exercises.

References

Mitchell, Jeffrey. 2021. “Social Trust and Anti-Immigrant Attitudes in Europe: A Longitudinal Multi-Level Analysis.” Frontiers in Sociology 6.
Slides
Exercises