Week 2 Worksheet

Learning outcomes

By the end of the session, you should be familiar with:

some important cross-national survey programmes
how to find, download and read data documentation, including survey questionnaires
the challenges of measuring sociological concepts
the JASP interface

Intro

The lecture has introduced the complexities of measuring sociological concepts. Throughout this short course, we will be using the measurement and estimation of social trust as a guiding example. However, the aim is to help you build up your skills and confidence in asking and addressing research questions of your own, and the assessment questions will ask you to analyse some other chosen research topic.

In this first IT workshop, you will begin your data analysis journey by exploring some online sources of cross-national sociological data available for secondary analysis, and you will practise performing basic descriptive analyses of selected variables.

Exercise 0: Create a folder structure for this module

Create a folder for this module on your institutional OneDrive (e.g. C:\OneDrive - Newcastle University\SOC2069). Within that folder, create a sub-folder called “Data”. You will then be able to save data files and documentation as you progress.
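If you prefer to script the folder set-up rather than click through it, the structure described above can be sketched in Python. This is only an illustration: the helper name create_module_folders is made up for this example, and the demonstration runs in a temporary directory rather than in your real OneDrive folder, whose path will differ from computer to computer.

```python
from pathlib import Path
import tempfile


def create_module_folders(base_dir, module_name="SOC2069"):
    """Create <base_dir>/<module_name>/Data and return the Data folder path.

    Hypothetical helper for illustration; base_dir stands in for your
    OneDrive location.
    """
    data_dir = Path(base_dir) / module_name / "Data"
    # parents=True also creates the module folder; exist_ok=True makes the
    # call safe to repeat if the folders already exist.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


# Demonstration in a throwaway directory, so nothing is written to OneDrive.
with tempfile.TemporaryDirectory() as tmp:
    created = create_module_folders(tmp)
    print(created.relative_to(tmp))
```

To use it for real, you would pass the path of your own OneDrive folder as base_dir.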

Exercise 1: Explore the World Values Survey data documentation

About 20 minutes

Navigate to the World Values Survey (WVS) website and read about the survey.

Don’t spend too much time browsing the website, but make sure you can answer the following questions:

When was the WVS started, and how many waves of data collection have there been so far?
In which year(s) did the most recent wave of data collection take place?
How many countries were covered by the most recent wave?
Can you name some of the topics covered by the WVS?
What population are the national samples representative of?
What is the general minimum sample size per country?
Is the WVS data freely available for academic research purposes?
Do you need to cite/reference the WVS data if you use it in your research?
Bonus: what is the difference between a time series dataset and a panel dataset, and which of the two can the WVS provide? (tip: look under Data and Documentation > Data Download > Timeseries (1981-2022) for a concise discussion)

Navigate to the Data and Documentation > Data Download > Wave 7 (2017-2022) page and explore the Questionnaire and Documentation. Download the Master Questionnaire and the Codebook documents and, using the search function, identify all the survey questions and variables that relate to “trust”.

Questions

How many questions relating to trust have you identified in the questionnaires?
What is the difference between the questionnaire and the codebook?
In how many different ways is trust measured in the WVS?
What are, in your opinion, the advantages and disadvantages of the different measurements of trust?

Within the “Data” sub-folder you created earlier, create another sub-folder called “WVS7”. You can now save the Master Questionnaire and the Codebook into that folder, and because it is stored on the institutional OneDrive, you will always have access to it on any computer, once you log in to your institutional Microsoft Windows account.
In the Select a country panel on the right side of the Wave 7 (2017-2022) page select one country at random and under Data files download the country-specific dataset (for consistency, download the first data type option, CSV (i.e. comma separated values)) to the “WVS7” subfolder you created earlier.

Exercise 3: Describing “trust” using JASP

About 30 minutes

Open the JASP software from the Start menu. If you are using your own laptop, you can install the latest version of JASP from https://jasp-stats.org/.

The opening page should look something like this:

To open a dataset stored on your computer, you can navigate to the three horizontal bars (“hamburger”) menu icon > Open > Computer > Browse.

Note

To be able to see and load the data you downloaded from the WVS and ESS, you need to first extract the files from the compressed folder. Navigate to your Data folder outside the JASP data import window and extract your files, then return to the JASP window to open one of them.

Open the “WVS7” dataset. You should now see something like this:

You can scroll up-down and left-right in the dataset to have a look at the spreadsheet and its contents.

In JASP you can work on one dataset at a time (or in one window). When you finish your analysis, you can save both the data and the outputs you generated as one .jasp file, which you can then open later and continue or alter your analyses.

Today, we will explore some descriptive statistics using the Descriptives > Descriptive Statistics menu option. If you click through, you should see something like this:

Using the survey questionnaire, identify the variable coding “social/generalised trust” and move that variable to the Variables field
Apart from the default Descriptive Statistics table, select Basic plots > Distribution plots as well and interpret the distribution of the variable in your dataset

Questions

How many levels (categories) does the variable have in your dataset? Is it what you had expected based on the questionnaire information?
Does the variable have any missing values in your dataset?
What is the “Mode” of the variable, and what does that mean?
What is the “Mean” of the variable, and what does that mean? Is it a useful statistic for this variable?
What percentage of the respondents in your dataset had answered that “Most people can be trusted”?

Solution

Using the WVS7 dataset for the United Kingdom, and having identified the “social trust” variable from the Questionnaire and/or Codebook to be named “Q57” in the dataset, we obtain the following descriptive statistics:

There appear to be 3 levels in the variable. From the questionnaire we know that there should be 2 valid categories only: 1 (Most people can be trusted) and 2 (Need to be very careful);
From the Codebook we know that there may be some other values coded by the researchers as well: -1 for “Don’t know”, -2 for “No answer”, -4 for “Not asked” and -5 for “Missing”. These are coded with negative values in order to stand out as non-standard categories of answers. They code various reasons why values may be “missing” in the dataset. In the UK dataset we are using here we have three of these values present;
By looking at the distribution plot, we see that the tallest bar (the largest answer category) is the one coded as 2. The largest (most common) category of a categorical variable is called the “mode”. We can also request that the “mode”, the “median” and other potentially informative summary statistics be included in the Descriptive Statistics table by ticking the appropriate boxes under the Statistics option-bar in the Descriptive Statistics builder on the left-hand side:

According to the Descriptive Statistics table, the mean of the variable is 1.488. However, in this case this is not a sensible statistic. First, we have a categorical variable, so an average value between 1 (Most people can be trusted) and 2 (Need to be very careful) does not mean much. If anything, with only two categories, we know that if category 1 and category 2 were of equal size (i.e. the same number of people had chosen each of them), then the mean would be \({1 + 2 \over 2} = 1.5\), so a mean of 1.488 would indicate that the category coded as 1 is slightly larger. Second, in our case the negative values are also included in the calculation of the mean, which pulls it downwards. Under these circumstances, the mean cannot tell us anything accurate or useful.
To get a more precise percentage of the distribution of the answer options (categories) across the variable, we would need to request a Frequency table in the Descriptive Statistics builder, which we can do under the Tables option-bar:

The result would be:

We therefore find that almost 46% of the respondents in the dataset answered that “Most people can be trusted”. However, this percentage is calculated over all responses, including the “missing” ones. If we would like to get the more accurate “Valid Percent” (i.e. the percentage distribution among only those with valid responses, 1 or 2), then we need to tell JASP to treat the negative values as “missing”.
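To make the difference between a plain percentage and a “Valid Percent” concrete, here is a minimal Python sketch of a frequency table. The responses are invented for illustration (they are not the UK WVS figures), the function name frequency_table is made up for this example, and the code is not how JASP computes its tables; it only mirrors the logic.

```python
from collections import Counter

# Hypothetical answers to a Q57-style question (NOT real WVS data):
# 1 = Most people can be trusted, 2 = Need to be very careful,
# negative codes = reasons for a missing answer (-1 Don't know, -2 No answer).
responses = [1] * 45 + [2] * 50 + [-1] * 3 + [-2] * 2


def frequency_table(values):
    """Return (code, count, percent, valid_percent) rows, sorted by code.

    Percent uses all responses as the base; valid percent uses only the
    responses with a positive (valid) code, and is None for missing codes.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = sum(n for code, n in counts.items() if code > 0)
    rows = []
    for code in sorted(counts):
        n = counts[code]
        percent = 100 * n / total
        valid_percent = 100 * n / valid_total if code > 0 else None
        rows.append((code, n, percent, valid_percent))
    return rows


for code, n, pct, valid_pct in frequency_table(responses):
    valid_txt = f"{valid_pct:.1f}" if valid_pct is not None else "-"
    print(f"{code:>4} {n:>5} {pct:>7.1f} {valid_txt:>7}")
```

With these invented numbers, category 1 makes up 45.0% of all responses but about 47.4% of the valid ones, because the five missing answers are dropped from the base.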

If you see more than two levels/categories in your variable (and you are sure that you have chosen the correct variable), it means that in your dataset the variable contains some custom missing values. In the WVS data these can be recognised by their negative sign (e.g. -2). To set them as custom missing values, we can edit the dataset manually.

Click on the Edit Data menu tab and scroll horizontally to your variable. Double-click on the variable column. In the middle tabs, check the label editor and identify any values that should be set as missing (e.g. -2). Click on the Missing values tab > Use custom values and in the narrow field to the right type in the value you want to set as missing and click the + sign. You should be seeing something like below:

You can check back in the Label editor that the redundant value is no longer there.
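The effect of declaring custom missing values can also be sketched outside JASP. The Python example below uses invented responses (not real WVS data) and a made-up helper called set_missing; the missing codes are the ones listed in the Codebook. It shows how the mean changes once the negative codes are no longer treated as real answers.

```python
from statistics import mean

# Missing-value codes listed in the WVS Codebook.
MISSING_CODES = {-1, -2, -4, -5}

# Hypothetical answers (NOT real WVS data): 45 people chose 1, 50 chose 2,
# and 5 have a missing code.
responses = [1] * 45 + [2] * 50 + [-1] * 3 + [-2] * 2


def set_missing(values, missing_codes):
    """Replace every custom missing code with None (i.e. 'no value')."""
    return [None if v in missing_codes else v for v in values]


recoded = set_missing(responses, MISSING_CODES)
valid = [v for v in recoded if v is not None]

mean_all = mean(responses)  # negative codes wrongly counted as answers
mean_valid = mean(valid)    # only the valid answers 1 and 2

print(f"Mean including missing codes: {mean_all:.3f}")
print(f"Mean of valid answers only:   {mean_valid:.3f}")
```

In this invented example the mean over all values is 1.380, which would wrongly suggest that category 1 is the larger one, while the mean of the valid answers is about 1.526, in line with category 2 being the mode.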

In the same data editor window we can also make other changes to our variables. For example, let’s change the non-informative variable Name to something that better reflects the meaning of the variable, for example “soc-trust”, and let’s also give it a more descriptive Long name and even a Description if we want to. The Long name could be something like “Social trust”, and for the Description we could copy the original survey question out from the questionnaire: “Q57. Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?”.

In the Label editor we should also attach labels to the values so that we can better read the outputs produced. We can copy the labels from the questionnaire here too: “1 = Most people can be trusted” and “2 = Need to be very careful”.

Your data editor window should look something like this:

Question

The variable measurement level (Column type) appears as Ordinal. Is that correct? If you feel that the variable type should be changed, you can also do that here using the drop-down menu next to Column type.

To return to your data analysis, you can click on the Analyses menu tab. If you have changed the variable’s name, you can move it back to the Variables field to see the updated outputs.

Find the other variables in the dataset that relate to social, interpersonal or institutional trust, and obtain similar descriptive statistics for them. Try to answer the same questions as above for each of them.
Add your notes to the Results output. By clicking on the small black down-arrowheads that appear next to the headings in the Results window when you hover over them with your mouse you can access small menu options that allow you to do various operations with the outputs (copy, save, edit, etc.), including the possibility to Add note. Your note will appear under the item to which it is added, and you can type in your note. The field acts as a basic text editor, which you can use to jot down your interpretation of the results and keep them close to the output. You can add in here your answers to some of the questions above.

Your results and notes could look something like this (based on the WVS7 dataset for Andorra):

Save your analysis. Click through the hamburger menu > Save as > Computer > Browse and save the analysis in your “WVS7” folder with the name “wvs7-example.jasp” (or anything else that you find useful). Once it’s saved, you can close the analysis. You can now open the .jasp file you saved and continue or modify the analysis you have started.

Tip

For simplicity we have downloaded the dataset in CSV format, which is a simple plain text format that can be opened with text editors or spreadsheet tools such as Microsoft Excel. However, the formats that belong to proprietary statistical software packages like SPSS or Stata usually store more information (e.g. variable and value labels), so downloading the data in one of those formats and importing it into JASP may make it easier to explore the data without needing to make too many manual changes.

You can experiment by downloading the data in another format (Stata appears to be the most stable for importing to JASP) and checking the data editor window.
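To see why a CSV file carries less information, here is a small Python sketch. The three-row “file” is invented for illustration (the ID column and all values are hypothetical; only the Q57 name and the two value labels come from the questionnaire). It shows that a CSV stores bare codes as text, so the labels have to be attached separately, which is what we did by hand in the JASP Label editor.

```python
import csv
import io

# Hypothetical three-row extract standing in for a downloaded CSV file;
# a real WVS file has many more columns and rows.
raw_csv = "ID,Q57\n1,1\n2,2\n3,-1\n"

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Every value in a CSV arrives as text, so the codes must be converted.
q57_codes = [int(row["Q57"]) for row in rows]

# Value labels copied from the questionnaire; the CSV does not contain them.
VALUE_LABELS = {
    1: "Most people can be trusted",
    2: "Need to be very careful",
}


def label_for(code):
    """Return the questionnaire label, or a placeholder for other codes."""
    return VALUE_LABELS.get(code, "(missing)")


for code in q57_codes:
    print(code, "->", label_for(code))
```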

Now open a new JASP session and import the “ESS10” dataset you downloaded earlier, and perform a similar descriptive analysis on the variables related to social, interpersonal and institutional trust, as you have done above. When complete, save that analysis to a .jasp file too.

Exercise 4: Begin your analysis for Assignment 1!

Below are some research questions that you can choose from to address in Assignment 1:

Are religious people more satisfied with life?
Are older people more likely to see the death penalty as justifiable?
What factors are associated with opinions about future European Union enlargement among Europeans?
Is higher internet use associated with stronger anti-immigrant sentiments?
How does victimisation relate to trust in the police?
What factors are associated with belief in life after death?
Are government/public sector employees more inclined to perceive higher levels of corruption than those working in the private sector?

For now, choose the question that you find most appealing (you don’t need to stick with it for the assignment, but you could if you wanted to!). All of the questions can be answered with at least one of the survey datasets that you downloaded (the “WVS7” or “ESS10”) and often they both contain relevant variables.

Identify your “explanandum” - i.e. the core phenomenon/concept/behaviour/etc. that the research question aims to explain. The questions all postulate a relationship/association between two or more variables (the topic of next week), but for now, think carefully about the question and how it is formulated, and identify which is the variable that will be the target of explanation, and which variable (if mentioned) will be used for explaining it. For example, in the research question “Does education increase social trust?”, the variable we are interested in explaining is “social trust”, while “education” is the variable that we will use to explain it. In later weeks we will develop better vocabulary to describe associations between variables.
Once the core phenomenon to be explained is identified, look through the two survey questionnaires to identify any variables in the dataset that might capture it. This may require some trial and error with different search words.
Once you have found one (or several) candidate variable(s), navigate to the relevant survey website and select a single country for which to download data. You will be working with single-country datasets for your assignment. Download the dataset, import it into JASP, find the relevant variable and perform some descriptive analysis on the chosen variable as you have done in the previous exercise.
Make sure to add your notes and interpretations to the analysis results and save your analysis for later. You could create a new sub-folder for your “Assignment 1” work and save your analysis there for future use. If you end up liking your chosen question, you can continue this analysis next week.
