Week 3 Worksheet

Learning outcomes

By the end of the session, you should be familiar with:

how to create and interpret cross-tabulations (contingency tables)
how to create some of the most commonly used plots to visually summarise relationships between two or more variables
the basic intuition behind “associations” among variables

Intro

We will continue where we left off last week, completing any exercises that remained unfinished. Then, using the same datasets that we downloaded last week (WVS7, ESS10), we will tabulate or plot (as appropriate for the given data type) the relationship between the “social trust” variable and some other variables. We will then use a country-level dataset (available here) to reproduce the association between inequality and social trust presented in Figure 4.1 in Wilkinson and Pickett (2010).

Exercise 0: Your module data and analysis folder

In Week 2 you created a folder for this module on your institutional OneDrive storage drive (e.g. C:\OneDrive - Newcastle University\SOC2069) and within that a sub-folder called “Data”. You used that sub-folder to save the datasets downloaded as part of the exercises in the Week 2 Worksheet. If you haven’t done so, go back to the Week 2 Worksheet and follow the guidance there to set up your folder structure for future use.

Exercise 1: Contingency tables

In Week 2 Worksheet - Exercise 3 we explored univariate distributions (i.e. single variables). As part of that exercise, you used country-specific data from the “World Values Survey, Wave 7”, which you downloaded directly from the WVS website. You may also have saved your work (and modified dataset) from Week 2 as a .jasp file with the name “wvs7-example.jasp” (or anything else that you found useful) as instructed in Week 2 Worksheet - Exercise 3 - Point 8. If so, you can open that file for this exercise and continue where you left off. Otherwise, follow the instructions in Week 2 Worksheet - Exercise 1 and 3 to download the WVS dataset (if needed) and load it into JASP.

As part of Week 2 Worksheet - Exercise 3 you have probably created a frequency table of the “social trust” variable. In this exercise, you will check how those frequencies are distributed across the levels of another categorical variable. Contingency tables are a useful tool for this purpose.

Open the “WVS7” dataset. You should now see something like this:

Using the survey questionnaire, identify the following two variables in the dataset: “social trust” and “sex”. In the original dataset these should be named Q57 and Q260, respectively (however, you may have already renamed the “social trust” variable as part of last week’s exercise). Using the Descriptives > Descriptive Statistics menu option in JASP, create frequency tables for these two variables. You can request frequency tables for your variables in the Tables sub-option:

Questions

What percentage of the respondents in your dataset had answered that “Most people can be trusted”?
What is the percentage of female respondents in your dataset?
What is each variable’s measurement level (Column type)? Is it correct?

Tip

If the two variables were recorded as “Scale” in your dataset, then it is useful to change the column type (measurement level) to “Categorical”, which is the correct variable type for these two variables. We can do this by clicking on the Edit Data menu tab and scrolling horizontally to the variable of interest, then clicking on the data type icon next to the variable name and selecting the appropriate type for that variable; e.g.:

Once you change the variable type, you will notice that the value labels appear in the frequency tables instead of their numeric values, making the tables easier to understand.

If the variables were recognised as “Ordinal” by the software, then the labels are also identified and used in outputs, just like with “Nominal” variables.
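For readers who want to see what a labelled frequency table does behind the scenes, here is a minimal sketch in plain Python. It is not part of the JASP workflow. The numeric codes (1 and 2) and the ten responses are made up for illustration; only the two answer labels come from this worksheet.

```python
from collections import Counter

# Assumed coding of the "social trust" answers (hypothetical codes).
LABELS = {1: "Most people can be trusted", 2: "Need to be very careful"}

# Made-up responses for illustration only; these are not WVS values.
responses = [1, 2, 2, 1, 2, 2, 2, 1, 2, 2]


def frequency_table(values, labels):
    """Return {label: (count, percent)} for a coded categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {
        labels[code]: (n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    }


table = frequency_table(responses, LABELS)
for label, (n, pct) in table.items():
    print(f"{label}: {n} ({pct}%)")
```

Treating the codes as categories rather than numbers is what allows the labels to replace the raw values in the output, which is the same idea as switching the column type away from “Scale” in JASP.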

As a next step, we are interested in finding out the distribution of social trust among men and women. For this, we will create a contingency table. Use the Frequencies > [Classical] Contingency Tables menu option. Move the “social trust” variable to the Rows field and the “sex” variable to the Columns field.
Interpret the Contingency Tables part of the output (you can hide the Chi-Squared Tests table from the output by un-ticking the \(\chi^2\) option under the Statistics sub-menu). Expand the Cells sub-menu and experiment with the options.

Questions

How many women in your dataset answered “Need to be very careful”?
What percentage of the men in your dataset answered that “Most people can be trusted”?
What percentage of those who answered “Most people can be trusted” are men, and what percentage are women?
Based on your cross-tabulation, do women or men in your dataset have a higher level of “social trust”?

Solutions

The cross-tabulation should look something like this:

791 women in this dataset answered “Need to be very careful”.
To get percentages, we need to tick the desired Cell options. If we are interested in the percentage of the row variable distribution within each of the column categories, we would request “Column” percentages:

This will look within the “Male” category (i.e. the Male column adds up to 100%) and tell us that, among men, 48.67% stated that “Most people can be trusted”.
To get the percentage of a column variable distribution within each of the row categories, we would request “Row” percentages instead, and find that among those who answered “Most people can be trusted” 44.89% are men and 55.11% are women in this dataset:
To sum up the cross-tabulation, we see that overall there are more women than men in this dataset, but men have a somewhat higher level of social trust (48.7% say that “Most people can be trusted” as opposed to 45.2% of the women).
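The difference between “Column” and “Row” percentages is easy to mix up, so a small sketch in plain Python may help. The counts below are made up (they are not the WVS figures), and the short labels “Trust” and “Careful” are hypothetical stand-ins for the two answer options.

```python
from collections import Counter

# Made-up (trust, sex) pairs for illustration only; not the WVS data.
rows = (
    [("Trust", "Male")] * 4 + [("Careful", "Male")] * 4
    + [("Trust", "Female")] * 6 + [("Careful", "Female")] * 14
)

counts = Counter(rows)  # cell counts keyed by (trust, sex)


def column_pct(trust, sex):
    """Share of a trust answer within one sex (each column sums to 100%)."""
    col_total = sum(n for (t, s), n in counts.items() if s == sex)
    return 100 * counts[(trust, sex)] / col_total


def row_pct(trust, sex):
    """Share of one sex within a trust answer (each row sums to 100%)."""
    row_total = sum(n for (t, s), n in counts.items() if t == trust)
    return 100 * counts[(trust, sex)] / row_total


print(column_pct("Trust", "Male"), column_pct("Trust", "Female"))
print(row_pct("Trust", "Male"), row_pct("Trust", "Female"))
```

In this toy table, men have the higher trust rate when compared within columns, yet women make up the majority of those who trust, simply because there are more women overall. Column percentages answer “which group trusts more?”, while row percentages answer “who are the trusting respondents?”.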

Now find a few other “Nominal” or “Ordinal” variables in the dataset and add them to the analysis. Repeat point 4 above with other combinations of variables. What happens if you add a third variable to Layers?

Exercise 2: Scatterplots

For this exercise and the next, we will use a different dataset: a dataset containing country-level (macro-) variables describing aggregate features of various societies. The dataset can be downloaded from this page, or through the link below (where you can also read a brief description of the dataset):

Trust & Inequality: trust_inequality.dta – Download

This dataset combines data on “generalised/social trust” from the latest waves of the World Values Survey and the European Values Study with macro (country)-level data on World Development Indicators (WDI) provided by the World Bank. The main variables of interest taken from the WDI refer to measurements of economic inequality within countries. The dataset allows us to replicate, using the latest available data, the analysis of the relationship between inequality and trust presented in Chapter 4 (“Community life and social relations”, pp. 49-62) of Wilkinson and Pickett (2010).

Download the dataset to your Data (sub)folder and open it in JASP.
Create summary descriptive statistics for all the variables in the dataset (Descriptives > Descriptive Statistics menu option).

Tip

Because there are only a few variables in the dataset, you can select them all by clicking one of them, then pressing Ctrl + A, and moving them all over to the Variables box.

Because we have quite a few variables to display, it’s better to transpose (rotate) the table so that the variable names are listed in the rows and the summary statistics in the columns. Tick the Transpose descriptives table box for this:

Create a new descriptive analysis in your JASP session (Descriptives > Descriptive Statistics menu option). This will keep the summary statistics table you created earlier. In this new analysis, play around with the univariate descriptions that you are already familiar with and create a few frequency tables and plots for each variable. Make sure to choose the most appropriate tabulation/visualisation option for the given variable type.
Now let’s explore the relationship between “inequality” and “social trust” using a scatter plot.

Create another new descriptive analysis in your JASP session (Descriptives > Descriptive Statistics menu option), to keep your univariate descriptives above intact on the output page.
Move the two variables over to the Variables box. The variable measuring “social trust” at the country level is trust_pct (the % of respondents in the given country who answered “Most people can be trusted” to the standard social trust question in the WVS/EVS), while the variable measuring “inequality” is inequality_s80s20 (a commonly employed measure of income inequality representing the ratio between the total net disposable income of the 20% of people having the highest income (S80) and the total net disposable income of the 20% of people having the lowest income (S20)).
Expand the Customizable plots option and tick Scatter plots
Play around with the options under Scatter plots to simplify the plot (e.g. remove the univariate distributions of the variables displayed on the margins or change them to histograms, remove the regression line, etc.). You can also manually increase the size of the plot by dragging the bottom-right corner.
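To make the S80/S20 ratio behind inequality_s80s20 more concrete, here is a simplified sketch in plain Python. The function name `s80s20` and the ten incomes are hypothetical. Official figures are computed from survey data with weights and adjustments that this sketch ignores.

```python
def s80s20(incomes):
    """Ratio of the total income of the top 20% to that of the bottom 20%."""
    ordered = sorted(incomes)
    k = len(ordered) // 5  # number of people in each 20% group
    if k == 0:
        raise ValueError("need at least five incomes")
    return sum(ordered[-k:]) / sum(ordered[:k])


# Made-up incomes for ten people (arbitrary units), for illustration only.
incomes = [10, 15, 18, 20, 22, 25, 30, 40, 60, 90]
ratio = s80s20(incomes)
print(ratio)
```

Here the two richest people together earn 150 and the two poorest together earn 25, so the ratio is 6: the top fifth receives six times the income of the bottom fifth. Higher values of the ratio indicate greater income inequality.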

Questions

What is the scatter plot telling us? Note down everything that comes to your mind.

Solutions

The scatterplot would look something like this:

We see a somewhat negative, downward-tilting trend in the relationship between trust and inequality.
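A “downward-tilting” trend means that the straight line fitted through the points has a negative slope. The sketch below, in plain Python, computes a least-squares slope for five made-up countries. The values are invented for illustration and are not taken from the trust_inequality dataset, and the helper name `ols_slope` is hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line predicting y from x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Made-up country values for illustration only.
inequality = [3.5, 4.0, 5.0, 6.5, 8.0]   # S80/S20 ratio
trust = [60.0, 55.0, 40.0, 30.0, 20.0]   # % saying most people can be trusted

slope = ols_slope(inequality, trust)
print("negative trend" if slope < 0 else "non-negative trend")
```

A negative slope says that, on average, countries with higher inequality have a lower share of trusting respondents. It describes an association in the data and does not by itself show that one variable causes the other.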

Now play around with the variables and check the association between some other Scale type variables in the dataset using scatterplots.

Questions

What happens if you add the Region variable to the Split box?
Try to interpret what this tri-variate plot is telling us.

Solutions

The tri-variate scatter-plot would look something like this:

The scatterplot is not very easy to read, but we get a sense that there is a negative, downward-tilting trend in the relationship between trust and inequality. We also see that countries in “Europe & Central Asia” tend to have lower values on the inequality scale and they are also more represented at the higher end of the social trust scale.

Exercise 3: Box plots

Create another new descriptive analysis in your JASP session (Descriptives > Descriptive Statistics menu option), to keep your previous analysis above intact on the output page.
We will now check the distribution of “social trust” in different world regions using box plots. Move the trust_pct variable to the Variables box and the Region variable to the Split field. You can resize the output graph so that the Region labels are easier to see.

Questions

What is the box plot telling us? Note down everything that comes to your mind. You can use the lecture slides to remind yourself of the information contained in box plots.

Solutions

The boxplot would look something like this:

The number of regions is a bit too high and the labels are long, so it’s not easy to get a good visual representation, but we see differences between regions in their median level of social trust (the black horizontal line inside the box) and also find that some regions have higher variability than others (the height of the boxes and the “whiskers”). Much of this variability, however, is also due to the number of countries within each region, with some regions having only a few countries within them.
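The numbers that a box plot draws can also be computed directly. The sketch below, in plain Python, summarises two made-up regions. The region names and values are hypothetical, and the “inclusive” quartile method is an assumption, since different software may compute quartiles slightly differently.

```python
from statistics import quantiles

# Made-up trust percentages per region, for illustration only.
trust_by_region = {
    "Region A": [20, 30, 40, 50, 60],  # five countries
    "Region B": [10, 12],              # only two countries
}


def box_summary(values):
    """Return the quantities a box plot is built from."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,  # the line inside the box
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,    # the height of the box
    }


summaries = {region: box_summary(v) for region, v in trust_by_region.items()}
for region, s in summaries.items():
    print(region, s)
```

With only two countries, the box for “Region B” is built from very little information, which illustrates why regions with few countries should be compared with caution. Note that many box plots draw whiskers up to 1.5 times the interquartile range and mark more extreme points separately, rather than extending to the minimum and maximum as this summary does.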

Exercise 4: Continue your analysis for Assignment 1

Building on the work you have done in Week 2 Worksheet - Exercise 4, open the dataset you have used to address one of the questions below.

Identify (the) two main variables relevant to the question, perform univariate descriptive statistics, and check their relationship using one of the plots/tabulations practised in the previous exercises above.
Select some other variables of different types and check their association with your main dependent variable using the plots/tabulations practised in the previous exercises above.

Reminder of the research questions to choose from to address in Assignment 1:

Are religious people more satisfied with life?
Are older people more likely to see the death penalty as justifiable?
What factors are associated with opinions about future European Union enlargement among Europeans?
Is higher internet use associated with stronger anti-immigrant sentiments?
How does victimisation relate to trust in the police?
What factors are associated with belief in life after death?
Are government/public sector employees more inclined to perceive higher levels of corruption than those working in the private sector?

For now, choose the question that you find most appealing (you don’t need to stick with it for the assignment, but you could if you wanted to!). All of the questions can be answered with at least one of the survey datasets that you downloaded (the “WVS7” or “ESS10”), and often both contain relevant variables.

References

Wilkinson RG and Pickett K (2010) The Spirit Level: Why Greater Equality Makes Societies Stronger. New York: Bloomsbury Press.