# Just in case we get errors asking that the package repositories be explicitly set when installing new packages:
options(repos = list(CRAN = "http://cran.rstudio.com/"))
# Install and load required packages
# We can use the {pacman} package to easily install and load several packages:
# ({pacman} itself may need installing if not yet installed)
pacman::p_load(
  tidyverse, sjlabelled, easystats,
  ggformula, ggeffects, marginaleffects,
  modelsummary, gtsummary, sjPlot,  # for general hierarchical modelling
  survey, sandwich, lmtest, lme4,   # for time-series hierarchical data structures
  panelr, plm, pglm)
Week 7 Worksheet Notes

Temporalities: Time series, panel and longitudinal data analysis

Aims

By the end of the session you will learn how to:
- fit random-intercepts and random-coefficients models to hierarchical data that contains a temporal dimension
- use several R packages for modelling time-series data

R packages
Example 1: basic tabulations of hierarchical and time-series structures
Let’s import this simulated toy dataset:
test <- data_read("https://cgmoreh.github.io/HSS8005-24/data/test.dta")
Reading data...
Variables where all values have associated labels are now converted into
factors. If this is not intended, use `convert_factors = FALSE`.
1 out of 9 variables were fully labelled and converted into factors.
We can peek at the data to get a feel for the variables:
datawizard::data_peek(test) |> print_md()
Variable | Type | Values |
---|---|---|
teacher | numeric | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … |
student | numeric | 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, … |
age | numeric | 14, 15, 16, 17, 18, 14, 15, 16, 17, 18, 14, … |
cage | numeric | -2, -1, 0, 1, 2, -2, -1, 0, 1, 2, -2, -1, 0, … |
sex | factor | Male, Male, Male, Male, Male, Male, … |
score | numeric | 64, 68, 69, 71, 75, 58, 60, 71, 69, 72, 54, … |
pass | numeric | 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, … |
experience | numeric | 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, … |
zero | numeric | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … |
We can also check the first few and last few observations in a dataframe using the head() and tail() functions included in base R. The {gt} package also has a nice function, gt_preview(), that can do the same more flexibly and output it as a single table:
gt::gt_preview(test, top_n = 3, bottom_n = 3, incl_rownums = TRUE)
teacher | student | age | cage | sex | score | pass | experience | zero | |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 14 | -2 | Male | 64 | 0 | 5 | 0 |
2 | 1 | 1 | 15 | -1 | Male | 68 | 0 | 5 | 0 |
3 | 1 | 1 | 16 | 0 | Male | 69 | 0 | 5 | 0 |
4..372 | |||||||||
373 | 3 | 75 | 16 | 0 | Female | 81 | 1 | 8 | 0 |
374 | 3 | 75 | 17 | 1 | Female | 85 | 1 | 8 | 0 |
375 | 3 | 75 | 18 | 2 | Female | 86 | 1 | 8 | 0 |
This allows us to see more clearly how teachers, students and ages relate to each other and how they are nested.
To check more directly the number of students and teachers in the dataset, we could use the n_unique() function from {dplyr} in the {tidyverse}:
n_unique(test$student)
[1] 75
n_unique(test$teacher)
[1] 3
We can also check the distribution of the variables; we can break the sex variable (a categorical variable) into its two constituent categories and treat them as numeric for this purpose:
test |>
  datawizard::to_numeric() |>
  datawizard::describe_distribution()
Variable | Mean | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing
-----------------------------------------------------------------------------------------
teacher | 2.00 | 0.82 | 2 | [1.00, 3.00] | 0.00 | -1.50 | 375 | 0
student | 38.00 | 21.68 | 38 | [1.00, 75.00] | 0.00 | -1.20 | 375 | 0
age | 16.00 | 1.42 | 2 | [14.00, 18.00] | 0.00 | -1.30 | 375 | 0
cage | 0.00 | 1.42 | 2 | [-2.00, 2.00] | 0.00 | -1.30 | 375 | 0
sex.Female | 0.51 | 0.50 | 1 | [0.00, 1.00] | -0.03 | -2.01 | 375 | 0
sex.Male | 0.49 | 0.50 | 1 | [0.00, 1.00] | 0.03 | -2.01 | 375 | 0
score | 68.71 | 7.20 | 10 | [51.00, 87.00] | 0.14 | -0.28 | 375 | 0
pass | 0.39 | 0.49 | 1 | [0.00, 1.00] | 0.46 | -1.80 | 375 | 0
experience | 6.33 | 1.25 | 3 | [5.00, 8.00] | 0.38 | -1.50 | 375 | 0
zero | 0.00 | 0.00 | 0 | [0.00, 0.00] | | | 375 | 0
Finally, it’s worth looking at the timeline of how test scores change across time (i.e. ages) for each student:
ggplot(test, aes(x = age, y = score)) +
  theme_bw() +
  geom_line(aes(group = student), color = "blue")
Example 3: Longer panel data estimation with {plm}
Panel data gather information about several individuals (cross-sectional units) over several periods. The panel is balanced if all units are observed in all periods; if some units are missing in some periods, the panel is unbalanced. A wide panel has the cross-sectional dimension (N) much larger than the longitudinal dimension (T); when the opposite is true, we have a long panel. Normally, the same units are observed in all periods; when this is not the case and each period samples mostly different units, the result is not proper panel data but a pooled cross-sections model.
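As a minimal base-R sketch (using hypothetical toy data, not the dataset below), a panel is balanced exactly when every unit is observed the same number of times in the same periods:

```r
# Toy data: unit "B" is missing from 2021, so this panel is unbalanced
toy <- data.frame(
  id   = c("A", "A", "A", "B", "B"),
  year = c(2020, 2021, 2022, 2020, 2022)
)

obs_per_unit <- as.vector(table(toy$id))   # observations per cross-sectional unit
balanced <- length(unique(obs_per_unit)) == 1
balanced
#> [1] FALSE
```

The {plm} function pdim() used later in this worksheet reports the same information (n, T, N and balancedness) directly for a panel data frame.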
In the previous examples our panels were fairly short. In this exercise, we use longer panel data to also demonstrate some additional functionalities in the {plm}
package. The dataset is more typical of econometric data and originates from Hill et al.’s book “Principles of Econometrics”.
We can load the dataset from the module website and inspect its contents:
nls <- data_read("https://cgmoreh.github.io/HSS8005-24/data/nls_panel.rda")
Reading data...
Variables where all values have associated labels are now converted into
factors. If this is not intended, use `convert_factors = FALSE`.
0 out of 18 variables were fully labelled and converted into factors.
We can peek at the data to get a feel for the variables:
datawizard::data_peek(nls) |> print_md()
Variable | Type | Values |
---|---|---|
id | numeric | 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, … |
year | numeric | 82, 83, 85, 87, 88, 82, 83, 85, 87, 88, 82, … |
lwage | numeric | 1.808289, 1.863417, 1.789367, 1.84653, … |
hours | numeric | 38, 38, 38, 40, 40, 48, 43, 35, 42, 42, 48, … |
age | numeric | 30, 31, 33, 35, 37, 36, 37, 39, 41, 43, 35, … |
educ | numeric | 12, 12, 12, 12, 12, 17, 17, 17, 17, 17, 12, … |
collgrad | numeric | 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, … |
msp | numeric | 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, … |
nev_mar | numeric | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … |
not_smsa | numeric | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … |
c_city | numeric | 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … |
south | numeric | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … |
black | numeric | 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, … |
union | numeric | 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, … |
exper | numeric | 7.666667, 8.583333, 10.17949, 12.17949, … |
exper2 | numeric | 58.77777, 73.67361, 103.622, 148.3399, … |
tenure | numeric | 7.666667, 8.583333, 1.833333, 3.75, 5.25, … |
tenure2 | numeric | 58.77777, 73.67361, 3.361111, 14.0625, … |
We can also check the first few and last few observations in a dataframe using the head() and tail() functions included in base R. The {gt} package also has a nice function, gt_preview(), that can do the same more flexibly and output it as a single table:
gt::gt_preview(nls, top_n = 3, bottom_n = 3, incl_rownums = TRUE)
id | year | lwage | hours | age | educ | collgrad | msp | nev_mar | not_smsa | c_city | south | black | union | exper | exper2 | tenure | tenure2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 82 | 1.808289 | 38 | 30 | 12 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 7.666667 | 58.77777 | 7.666667 | 58.777770 |
2 | 1 | 83 | 1.863417 | 38 | 31 | 12 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 8.583333 | 73.67361 | 8.583333 | 73.673610 |
3 | 1 | 85 | 1.789367 | 38 | 33 | 12 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 10.179490 | 103.62200 | 1.833333 | 3.361111 |
4..3577 | ||||||||||||||||||
3578 | 716 | 85 | 1.427116 | 40 | 39 | 12 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 15.435900 | 238.26690 | 6.083333 | 37.006950 |
3579 | 716 | 87 | 1.494368 | 30 | 41 | 12 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 17.435900 | 304.01050 | 8.166667 | 66.694450 |
3580 | 716 | 88 | 1.341422 | 40 | 42 | 12 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 18.858970 | 355.66090 | 9.583333 | 91.840270 |
To check more directly the number of ids and years in the dataset, we could use the n_unique() function from {dplyr} in the {tidyverse}:
n_unique(nls$id)
[1] 716
n_unique(nls$year)
[1] 5
We can also check the distribution of the variables:
datawizard::describe_distribution(nls)
Variable | Mean | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing
---------------------------------------------------------------------------------------------
id | 358.50 | 206.72 | 358.50 | [1.00, 716.00] | 0.00 | -1.20 | 3580 | 0
year | 85.00 | 2.28 | 4.00 | [82.00, 88.00] | 0.00 | -1.57 | 3580 | 0
lwage | 1.92 | 0.46 | 0.64 | [0.14, 4.25] | 0.26 | 0.49 | 3580 | 0
hours | 38.03 | 7.97 | 2.00 | [1.00, 80.00] | -1.20 | 3.85 | 3580 | 0
age | 35.88 | 3.89 | 6.00 | [28.00, 46.00] | 0.11 | -0.62 | 3580 | 0
educ | 13.02 | 2.44 | 3.00 | [4.00, 18.00] | 0.24 | 0.15 | 3580 | 0
collgrad | 0.23 | 0.42 | 0.00 | [0.00, 1.00] | 1.28 | -0.36 | 3580 | 0
msp | 0.66 | 0.47 | 1.00 | [0.00, 1.00] | -0.67 | -1.55 | 3580 | 0
nev_mar | 0.14 | 0.35 | 0.00 | [0.00, 1.00] | 2.07 | 2.30 | 3580 | 0
not_smsa | 0.30 | 0.46 | 1.00 | [0.00, 1.00] | 0.88 | -1.23 | 3580 | 0
c_city | 0.30 | 0.46 | 1.00 | [0.00, 1.00] | 0.86 | -1.25 | 3580 | 0
south | 0.42 | 0.49 | 1.00 | [0.00, 1.00] | 0.31 | -1.91 | 3580 | 0
black | 0.28 | 0.45 | 1.00 | [0.00, 1.00] | 0.97 | -1.06 | 3580 | 0
union | 0.26 | 0.44 | 1.00 | [0.00, 1.00] | 1.07 | -0.86 | 3580 | 0
exper | 12.03 | 3.86 | 5.35 | [1.06, 27.19] | 0.02 | -0.13 | 3580 | 0
exper2 | 159.60 | 95.46 | 128.86 | [1.12, 739.42] | 0.92 | 1.32 | 3580 | 0
tenure | 6.95 | 5.17 | 7.83 | [0.00, 24.75] | 0.64 | -0.41 | 3580 | 0
tenure2 | 75.01 | 93.66 | 101.83 | [0.00, 612.56] | 1.74 | 2.91 | 3580 | 0
Panel datasets can be organized in two main forms: the long form has a column for each variable and a row for each individual-period; the wide form has a column for each variable-period and a row for each individual. Most panel data methods require the long form, but many data sources provide one wide-form table for each variable; assembling the data from different sources into a long-form data frame is often not a trivial matter.
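As a sketch of that wide-to-long conversion, base R's reshape() can stack per-period columns into long form (the wage columns here are hypothetical, chosen only to mirror the structure of the nls data):

```r
# Hypothetical wide table: one column per variable-period (wage in years 82, 83)
wide <- data.frame(id = 1:2,
                   wage.82 = c(1.8, 2.1),
                   wage.83 = c(1.9, 2.2))

# direction = "long" splits "wage.82"/"wage.83" (at the default sep = ".")
# into a 'wage' value column and a 'year' time column,
# giving one row per individual-period
long <- reshape(wide, direction = "long",
                varying = c("wage.82", "wage.83"),
                idvar = "id", timevar = "year")
long[order(long$id, long$year), ]
```

tidyr's pivot_longer() offers the same transformation with a tidyverse interface; reshape() is shown here only because it needs no extra packages.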
The next code sequence creates a panel structure for the dataset using the function pdata.frame() from the {plm} package:
nlspd <- pdata.frame(nls, index = c("id", "year"))
The function pdim()
extracts the dimensions of the panel data:
pdim(nlspd)
Balanced Panel: n = 716, T = 5, N = 3580
In the following, we will fit a series of models estimating wage as a function of education, experience (plus a second-order polynomial for experience) and race (“black” vs. “white”), with the aim of demonstrating functions in the {plm} package. We could also fit these models using the hierarchical modelling functions from previous examples and exercises.
Pooled model
A pooled model does not allow for intercept or slope differences among individuals. Such a model can be estimated with plm() using the specification model = "pooling":
wage.pooled <- plm(lwage ~ educ + exper + I(exper^2) + black,
                   model = "pooling",
                   data = nlspd)
summary(wage.pooled)
Pooling Model
Call:
plm(formula = lwage ~ educ + exper + I(exper^2) + black, data = nlspd,
model = "pooling")
Balanced Panel: n = 716, T = 5, N = 3580
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-1.7043039 -0.2469447 -0.0062387 0.2421563 2.6619636
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 0.42706205 0.05691619 7.5033 7.807e-14 ***
educ 0.07451334 0.00274756 27.1198 < 2.2e-16 ***
exper 0.06272218 0.00797946 7.8605 5.026e-15 ***
I(exper^2) -0.00122307 0.00032263 -3.7910 0.0001525 ***
black -0.13616209 0.01485167 -9.1681 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 772.56
Residual Sum of Squares: 550.24
R-Squared: 0.28777
Adj. R-Squared: 0.28697
F-statistic: 361.111 on 4 and 3575 DF, p-value: < 2.22e-16
model_parameters(wage.pooled)
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | t(3575) | p
------------------------------------------------------------------------
(Intercept) | 0.43 | 0.06 | [ 0.32, 0.54] | 7.50 | < .001
educ | 0.07 | 2.75e-03 | [ 0.07, 0.08] | 27.12 | < .001
exper | 0.06 | 7.98e-03 | [ 0.05, 0.08] | 7.86 | < .001
exper^2 | -1.22e-03 | 3.23e-04 | [ 0.00, 0.00] | -3.79 | < .001
black | -0.14 | 0.01 | [-0.17, -0.11] | -9.17 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
Fixed effects model
The fixed effects model takes into account individual differences, translated into different intercepts of the regression line for different individuals. Variables that change little or not at all over time, such as some individual characteristics, should not be included in a fixed effects model because they produce collinearity with the fixed effects.
To estimate a fixed effects model with {plm}, we can use the option model = "within"; dummy variables for individuals are then created automatically in the background and included in the model:
wage.within <- plm(lwage ~ educ + exper + I(exper^2) + black,
                   model = "within",
                   data = nlspd)
summary(wage.within)
Oneway (individual) effect Within Model
Call:
plm(formula = lwage ~ educ + exper + I(exper^2) + black, data = nlspd,
model = "within")
Balanced Panel: n = 716, T = 5, N = 3580
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-1.1798141 -0.0693879 0.0052198 0.0789143 1.9969132
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
exper 0.05521048 0.00580433 9.5120 < 2.2e-16 ***
I(exper^2) -0.00105655 0.00022838 -4.6263 3.888e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 126.95
Residual Sum of Squares: 110.37
R-Squared: 0.13057
Adj. R-Squared: -0.087246
F-statistic: 214.901 on 2 and 2862 DF, p-value: < 2.22e-16
model_parameters(wage.within)
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | t(2862) | p
----------------------------------------------------------------------
exper | 0.06 | 5.80e-03 | [ 0.04, 0.07] | 9.51 | < .001
exper^2 | -1.06e-03 | 2.28e-04 | [ 0.00, 0.00] | -4.63 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald t-distribution approximation.
As we can see in the results above, the “race” variable was excluded from the model because it does not vary over time (it stays constant “within” an individual). This is one major limitation of so-called fixed effects (“within”) models.
The within method is equivalent to including the individual “dummies” in the model.
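That equivalence can be checked with a short base-R sketch on simulated data (hypothetical variables, not the nls dataset): regressing within-unit demeaned variables on each other gives exactly the same slope as including the unit dummies explicitly:

```r
# Simulated panel: 50 units, 4 periods each, unit-specific intercepts
set.seed(1)
d <- data.frame(id = factor(rep(1:50, each = 4)), x = rnorm(200))
d$y <- 0.5 * d$x + rep(rnorm(50), each = 4) + rnorm(200, sd = 0.1)

# (1) Least-squares dummy variable (LSDV): explicit unit dummies
lsdv_slope <- coef(lm(y ~ x + id, data = d))["x"]

# (2) "Within" transformation: demean x and y within each unit
d$x_dm <- d$x - ave(d$x, d$id)
d$y_dm <- d$y - ave(d$y, d$id)
within_slope <- coef(lm(y_dm ~ 0 + x_dm, data = d))["x_dm"]

all.equal(unname(lsdv_slope), unname(within_slope))
#> [1] TRUE
```

This is an instance of the Frisch–Waugh–Lovell theorem; {plm}'s model = "within" uses the demeaning route, which avoids estimating hundreds of dummy coefficients.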
Random effects
The random effects model elaborates on the fixed effects model by recognizing that, since the individuals in the panel are randomly selected, their characteristics, measured by the intercept \(\beta_{1i}\), should also be random. Random effects estimators are reliable under the assumption that individual characteristics (heterogeneity) are exogenous, that is, independent of the regressors in the random effects equation. We have already fit random effects models using the {lme4} package previously.
In {plm}, the same function we used for fixed effects can be used for random effects, but setting the argument model = "random" and selecting random.method as one of four possibilities: "swar" (the default), "amemiya", "walhus", or "nerlove":
wage.random <- plm(lwage ~ educ + exper + I(exper^2) + black,
                   model = "random",
                   random.method = "swar",
                   data = nlspd)
summary(wage.random)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = lwage ~ educ + exper + I(exper^2) + black, data = nlspd,
model = "random", random.method = "swar")
Balanced Panel: n = 716, T = 5, N = 3580
Effects:
var std.dev share
idiosyncratic 0.03857 0.19638 0.25
individual 0.11592 0.34047 0.75
theta: 0.7502
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-1.2666036 -0.0928955 0.0057325 0.0984554 2.0978278
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) 0.47585529 0.08084927 5.8857 3.964e-09 ***
educ 0.07488688 0.00547943 13.6669 < 2.2e-16 ***
exper 0.05638773 0.00560490 10.0604 < 2.2e-16 ***
I(exper^2) -0.00108166 0.00022148 -4.8838 1.041e-06 ***
black -0.13627562 0.02973398 -4.5832 4.580e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 167.23
Residual Sum of Squares: 137.85
R-Squared: 0.17566
Adj. R-Squared: 0.17474
Chisq: 761.809 on 4 DF, p-value: < 2.22e-16
model_parameters(wage.random)
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | z | df | p
-----------------------------------------------------------------------------
(Intercept) | 0.48 | 0.08 | [ 0.32, 0.63] | 5.89 | 3575 | < .001
educ | 0.07 | 5.48e-03 | [ 0.06, 0.09] | 13.67 | 3575 | < .001
exper | 0.06 | 5.60e-03 | [ 0.05, 0.07] | 10.06 | 3575 | < .001
exper^2 | -1.08e-03 | 2.21e-04 | [ 0.00, 0.00] | -4.88 | 3575 | < .001
black | -0.14 | 0.03 | [-0.19, -0.08] | -4.58 | 3575 | < .001
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald z-distribution approximation.
Tabulating all the models:
We can now collate the models into a summary table to compare them more closely:
modelsummary::modelsummary(list(wage.pooled, wage.within, wage.random))
(1) | (2) | (3) | |
---|---|---|---|
(Intercept) | 0.427 | 0.476 | |
(0.057) | (0.081) | ||
educ | 0.075 | 0.075 | |
(0.003) | (0.005) | ||
exper | 0.063 | 0.055 | 0.056 |
(0.008) | (0.006) | (0.006) | |
I(exper^2) | −0.001 | −0.001 | −0.001 |
(0.000) | (0.000) | (0.000) | |
black | −0.136 | −0.136 | |
(0.015) | (0.030) | ||
Num.Obs. | 3580 | 3580 | 3580 |
R2 | 0.288 | 0.131 | 0.176 |
R2 Adj. | 0.287 | −0.087 | 0.175 |
AIC | 3467.1 | −2290.1 | −1488.2 |
BIC | 3504.2 | −2271.6 | −1451.1 |
RMSE | 0.39 | 0.18 | 0.20 |
We can compare these results with those obtained using other hierarchical modelling functions, such as the lmer() function we used in earlier examples and exercises.
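As a sketch of that comparison (assuming the nls data and the packages loaded earlier in this worksheet), the random-effects specification above corresponds to a random-intercept model in {lme4}, where (1 | id) adds a random intercept per individual:

```r
# Random-intercept counterpart of wage.random, fitted with {lme4}
# (assumes `nls` has been loaded as above; this is a sketch, not run here)
wage.lmer <- lme4::lmer(lwage ~ educ + exper + I(exper^2) + black + (1 | id),
                        data = nls)
lme4::fixef(wage.lmer)  # fixed-effect coefficients, comparable to the plm estimates
```

The point estimates will be close to, but not identical with, the "swar" random-effects results, since lmer() uses (RE)ML estimation rather than the feasible GLS transformation applied by {plm}.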