Quantitative analysis
HSS8005 • Intermediate/Advanced stream • 2023
Newcastle University (UK)
Module leader
Teaching Assistants
- Bilal Alsharif
- Fengting Du
Session dates
- Thursdays
- Check on your Timetable app
- Lecture: 10:00-11:30
- Labs: 13:00-14:30 (Group 03)
14:30-16:00 (Group 04)
Assessment
- 26th April 2023
- 3,500-word long research report
- Submit to Turnitin via Canvas
Chris’s mastodon feed
where he posts stuff of interest to #HSS8005
Module overview
This module is offered by School X - Researcher Education and Development to postgraduate students within the Faculty of Humanities and Social Sciences at Newcastle University. The module aims to provide a broad applied introduction to more advanced methods in quantitative analysis for students from various disciplinary backgrounds. See the module plan page for details about the methods covered. The course content consists of eight lectures (1.5 hours each) and eight IT labs (1.5 hours) . The course stands on three pillars: application, reproducibility and computation.
Application: we will work with real data originating from large-scale representative surveys or published research, with the aim of applying methods to concrete research scenarios. IT lab exercises will involve reproducing small bits of published research, using the data and (critically) the modelling approaches used by the authors. The aim is to see how methods have been used in practice in various disciplines and learn how to reproduce (and potentially improve) those analyses. This will then enable students to apply this knowledge to their own research questions. The data used in IT labs may be cleansed to allow focusing more on modelling tasks than on data wrangling, but exercises will address some of the more common data manipulation challenges and will cover essential functions. Data cleansing scripts will also be provided so that interested students can use them in their own work.
Reproducibility: developing a reproducible workflow that allows your future self or a reviewer of your work to understand your process of analysis and reproduce your results is essential for reliable and collaborative scientific research. We enforce the ideas and procedures of reproducible research both through replicating published research (see above) and in our practice (in the IT labs and the assignment). For an overview of why it’s important to develop a reproducible workflow early on in your research career and how to do it using (some) of the tools used in this module, read Chapter 3 of TSD (see Resources>Readings). It’s also worth reading through Kieran Healy’s The Plain Person’s Guide to Plain Text Social Science, although there are now better software options than those discussed there. In this course, we will be using a suite of well-integrated free and open-source software to aid our reproducible workflow: the statistical programming language and its currently most popular dialect – the {tidyverse}
– via the IDE for data analysis, and for scientific writing and publishing (see Resources>Software).
Computation: the development of computational methods underpins the application of the most important statistical ideas of the past 50 years (see Andrew Gelman’s article on these developments here or an online workshop talk here; Richard McElreath’s great talk on Science as Amateur Software Development is well worth watching too). This module aims to develop basic computational skills that allow the application of complex statistical models to practical scientific problems without advanced mathematical knowledge, and which lay the foundation on which students can then pursue further learning and research in computational humanities and social sciences.
The course and the website were written and are maintained by Chris Moreh.
Prerequisites
To benefit the most from this module, students are expected to have a foundational level of knowledge in quantitative methods: a good understanding of data types and distributions, familiarity with inferential statistics, and some exposure to linear regression. This is roughly equivalent to the content covered in the Introductory stream of the module or a textbook such as OpenIntro Statistics (which you can download for free in PDF).
Those who don’t feel completely up to date with linear regression but are determined to advance more quickly and read/practice beyond the compulsory material during weeks 1-3 are also encouraged to sign up.
Those with a stronger background in multiple linear regression (e.g. students with undergraduate-level training in econometrics) will still benefit from weeks 1-3 as the approach we are taking is probably different from the one they are familiar with.
No previous knowledge of or command-based statistical analysis software is needed. Gaining experience with using statistical software is part of the skills development aims of the module. However, it is not a general data science module, and the IT labs will cover a very limited number of functions (from both base R
, the tidyverse
and other reliable user-written packages) that are most useful for tackling specific analysis tasks. Students are advised to complete some additional self-paced free online training in the use of the software, such as Data Carpentry’s R for Social Scientists, and to consult Wickham, Çetinkaya-Rundel and Grolemund’s R for Data Science (2nd ed.) online book.