Week 3 Worksheet Exercises
Categories: Logistic regression and other generalised linear models
Aims
This session introduces binary logistic regression models. These belong to a broader class called generalised linear models, which are applicable when the outcome (“dependent”, “response”, “explained”, etc.) variable cannot be assumed to follow a Gaussian (i.e. “normal”) distribution, but is instead a bounded or discrete measurement (e.g. think of variables whose values cannot be negative, i.e. have a lower limit of 0, or fall into discrete categories such as “yes”/“no”, “disagree”/“neither agree nor disagree”/“agree”, or “blue”/“green”/“black”/“brown”/“other”). Binary logistic regression is the simplest case, where the outcome can take only two values (therefore “binary”). However, the logic that underpins it is similar to that of other generalised linear models.
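As a brief sketch of that logic, in standard textbook notation (the symbols below are our own choice, not taken from the worksheet): rather than modelling the binary outcome directly, the model relates the log-odds of the outcome to a linear combination of the predictors.

```latex
% y_i is the binary outcome (0 or 1) for case i, p_i its probability of being 1,
% and x_{i1}, ..., x_{ik} are the predictors.
\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},
\qquad p_i = \Pr(y_i = 1)

% Inverting the logit link gives predicted probabilities bounded between 0 and 1:
p_i = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})\right)}
```

Other generalised linear models keep the linear right-hand side and swap in a different link function and outcome distribution.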
By the end of the session you will learn how to:
- Fit and summarise logistic regression models in R
- Interpret results from logistic regression models
- Manipulate the regression output to ease interpretation
- Plot and visualise results from logistic regression models to aid interpretation
Setup
In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive so you can access it from any computer (e.g. OneDrive). You will be working in this folder. If it’s missing, complete Exercise 3 from the Week 1 Worksheet.
Create a new Quarto markdown file (.qmd) for this session (e.g. “Lab_3.qmd”) and work in it to complete the exercises and report on your final analysis.
Exercise 0: Load (and install) R packages needed for this lab
Using functions learnt in Week 1, load (and install, if needed) the following R packages:
Exercise 2: Refit the model for two single countries
You will carry out this exercise on your own, and you’ll make two adjustments compared to the previous exercise. Instead of treating the entire dataset as undifferentiated, originating from one single population, we will acknowledge the fact that the data originate from various countries and that the local socio-cultural context has an impact on social behaviours and attitudes. To account for this, re-fit the logistic regression model from the previous exercise in two different ways:
1. fit the same model as before, but add the *country* variable to the model as a covariate;
2. select *two* countries of your choice and, for each one separately, reduce the dataset to that country and fit the model on that subset.
In order to select countries from the data, you will need to use another function, filter(), which lets us select rows (cases) given some criteria.
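The two approaches could be sketched roughly as follows. This is a minimal illustration on simulated data, not the lab's actual dataset or model: the data frame `survey`, the variables `trust`, `age` and `country`, and the country labels are all hypothetical stand-ins, and we assume that `filter()` here refers to the `dplyr` function.

```r
library(dplyr)

set.seed(8005)

# Simulated stand-in data; all names and values here are hypothetical
n <- 600
survey <- data.frame(
  country = sample(c("A", "B", "C"), n, replace = TRUE),
  age     = round(runif(n, min = 18, max = 80))
)
# Binary outcome whose probability depends on age and on country
survey$trust <- rbinom(
  n, size = 1,
  prob = plogis(-1 + 0.02 * survey$age + 0.5 * (survey$country == "B"))
)

# 1. Pooled model with country added as a covariate
m_pooled <- glm(trust ~ age + country, data = survey,
                family = binomial(link = "logit"))

# 2. Separate models, each fitted on the rows of a single country
survey_a <- survey |> filter(country == "A")
survey_b <- survey |> filter(country == "B")

m_a <- glm(trust ~ age, data = survey_a, family = binomial(link = "logit"))
m_b <- glm(trust ~ age, data = survey_b, family = binomial(link = "logit"))

summary(m_pooled)
summary(m_a)
summary(m_b)
```

In the pooled model, country shifts only the intercept (one coefficient per country relative to the reference category), whereas the single-country models allow every coefficient to differ between countries.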