Week 3 — Coding playground

Curves: Logistic regression and other generalised linear models

Description

This coding playground introduces, at a basic level, the elements of the R programming language that are most useful for the applied purposes of this course. It is therefore a pragmatic rather than a comprehensive introduction to R.

This page contains interactive WebR code fields. If the WEBR STATUS below is 🟢Ready!, then the fields are ready to evaluate the code within them. You can edit the code in the fields as necessary, even completely replacing the default content. The ‘Start over’ refresh button at the top right end of the fields reinstates the original content. The code in the fields can also be copied, either by manually selecting it or using the copy button at the top right end of the fields.

R packages

Using functions learnt in Week 1, load (and install, if needed) the following R packages:
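The package list is not reproduced on this page, so the following is only a sketch of one way to install and load packages. The vector `pkgs` is an assumption inferred from the functions used further down (`here::here()`, the `datawizard` data-management functions, and `read_spss()`, which is exported by both `sjlabelled` and `haven`); adjust it to the packages actually required. The helper `missing_packages()` is a hypothetical name introduced for illustration.

```r
# ASSUMPTION: package names inferred from the code on this page, not an official list
pkgs <- c("here", "datawizard", "sjlabelled")

# Hypothetical helper: return the subset of `pkgs` that is not yet installed
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing, and only in an interactive session
to_install <- missing_packages(pkgs)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}

# Attach every package that is available;
# library() needs character.only = TRUE when given a string
for (p in setdiff(pkgs, missing_packages(pkgs))) {
  library(p, character.only = TRUE)
}
```

Checking for missing packages first avoids reinstalling packages that are already available each time the script is run.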

Data

The data required for the analysis in the application lab can be downloaded freely from the World Values Survey website.

Download Wave 5 (2005-2009) and Wave 6 (2010-2014) data from there.

We can also find the questionnaires and the codebooks on the survey website, which will help identify the relevant variables.

Data management

The code below imports the original WVS datasets, selects the relevant variables, and creates a data frame called wvs56 that contains all the variables needed:

## Paths to files --------------------------------------------------------------------------------------------------------------

### Data was downloaded in SPSS (.sav) format from the WVS website (https://www.worldvaluessurvey.org/WVSContents.jsp)
### Downloaded data files were extracted and saved in the folder "raw" within the folder "data"
### Paths to the data files are created from the RProject root folder; cross-check the created path string variables:

wvs5_path <- here::here("data", "raw", "WV5_Data_Spss_v20180912.sav")
wvs6_path <- here::here("data", "raw", "WV6_Data_sav_v20201117.sav")


## Select variables ------------------------------------------------------------------------------------------------------------

### Create a vector storing names for the variables that will be used, based on Wu (2021: 1170-1172):

wu_vars <- c("country", "year", "trust", "risktaker", "education", "sex", "age", "income", "marstat", "employment")

### Create vectors storing the original names of the relevant variables in the WVS5 and WVS6 datasets:

wvs5_vars <- c("V2", "V260", "V23", "V86", "V238", "V235", "V237", "V253", "V55", "V241")
wvs6_vars <- c("V2", "V262", "V24", "V76", "V248", "V240", "V242", "V239", "V57", "V229")


## Read in the data files -----------------------------------------------------------------------------------------------------

wvs5 <- read_spss(wvs5_path) |> 
  data_select(select = c(wvs5_vars)) 

wvs6 <- read_spss(wvs6_path) |> 
  data_select(select = c(wvs6_vars))


## Check the codebooks --------------------------------------------------------------------------------------------------------

### Create codebooks to check variables
### Comparing the codebooks we see that the coding of all the variables is similar
### WVS6 has shorter value labels on some variables

wvs5_codebook <- wvs5 |> 
  data_codebook()

wvs6_codebook <- wvs6 |> 
  data_codebook() 

 
## Rename variables -----------------------------------------------------------------------------------------------------------

### It's safe to replace the original var names with more human-readable names; we use `datawizard::data_rename()`
### Replace the value codes of `country` and `year` with the labels; will make merging datasets less error-prone

wvs5 <- wvs5 |> 
  data_rename(c(wvs5_vars), c(wu_vars)) |> 
  labels_to_levels(c("country", "year"))

wvs6 <- wvs6 |> 
  data_rename(c(wvs6_vars), c(wu_vars)) |> 
  labels_to_levels(c("country", "year"))

## Merge data files ------------------------------------------------------------------------

### Now that the variable names are the same in both datasets, the two can be joined
### `datawizard::data_merge(..., join = "bind")` is similar to `dplyr::bind_rows()` but also copies value labels from the first-mentioned dataset
### We mention the `wvs6` dataset first to keep its (shorter) version of value labels

wvs56 <- data_merge(wvs6, wvs5, join = "bind")

## Create new `country_year` variable -----------------------------------------------------------------------------------------

wvs56 <- data_unite(wvs56, new_column = "country_year", select = c("country", "year"), append = TRUE)
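To see what the rename, merge and unite steps above do without downloading the full WVS files, the same sequence can be tried on two tiny made-up data frames. This is only an illustrative sketch: the values and the objects `wave_a`, `wave_b`, `new_names` and `both` are invented for the example, and it assumes the `datawizard` package is installed.

```r
library(datawizard)

# Two toy "waves" with different original column names (made-up values)
wave_a <- data.frame(V2 = c("Chile", "Japan"), V260 = c(2006, 2005), V23 = c(1, 2))
wave_b <- data.frame(V2 = c("Chile", "Japan"), V262 = c(2012, 2010), V24 = c(2, 2))

# Shared, human-readable names, in the same order as the original names
new_names <- c("country", "year", "trust")

wave_a <- data_rename(wave_a, c("V2", "V260", "V23"), new_names)
wave_b <- data_rename(wave_b, c("V2", "V262", "V24"), new_names)

# Stack the rows of both data frames (columns are matched by name)
both <- data_merge(wave_b, wave_a, join = "bind")

# Paste `country` and `year` into one new column, keeping the originals
both <- data_unite(
  both,
  new_column = "country_year",
  select = c("country", "year"),
  separator = "_",
  append = TRUE
)

both
```

Because the columns are matched by name when binding, the renaming step has to come first; otherwise `V260` and `V262` would end up as two separate, half-empty columns.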

It’s a good idea to save the created dataset to a local folder so you have access to it later:

## Save the dataset as an .rds file -------------------------------------------------------
## Check out the {here} package, which makes it easier to specify folders regardless of the operating system

saveRDS(wvs56, "enter/path/to/the/desired/folder/and/give/the/file/a/name/and/extension", compress = "bzip2")

# e.g. saveRDS(wvs56, "data/wvs56.rds", compress = "bzip2")
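A saved .rds file can be read back with `readRDS()`, which detects the compression type automatically. The sketch below shows the round trip on a small made-up data frame written to a temporary folder; `toy`, `rds_path` and `restored` are names invented for the example. In a project, the path would instead point to your own data folder, for instance one built with `here::here()`.

```r
# Made-up data standing in for `wvs56`
toy <- data.frame(country = c("Chile", "Japan"), year = c(2012, 2010))

# Build the path from its components; file.path() inserts the separators
rds_path <- file.path(tempdir(), "toy.rds")

# Save with the same compression setting as above
saveRDS(toy, rds_path, compress = "bzip2")

# Read the object back in; it must be assigned to a name
restored <- readRDS(rds_path)

# Check that nothing was lost in the round trip
identical(toy, restored)
```

Unlike `save()` and `load()`, `saveRDS()` stores a single object without its name, so the object can be assigned to any name when it is read back in.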