Topic plan and materials
Topic details
Topic | Description |
---|---|
Week 1 Gamblers, God, Guinness and peas A brief history of statistics |
In the first contribution to a series of articles on the history of probability and statistics in the journal Biometrika, Florence Nightingale David (1955) (no linear relationship with the famous social reformer) paraphrased a contemporary archaeologist who quipped that “a symptom of decadence in a civilization is when men become interested in their own history”, giving the interest in his own discipline as proof of the validity of his statement. David, however, thought that this did not hold true for scientists’ and statisticians’ emerging interest in their own disciplines. She was right, in that the critical examination of the intellectual development of statistics and probability theory that followed has improved the discipline by excavating ideas that had been buried by mainstream statistics, but she was also mistaken, in that this activity threw light on the decadence of mainstream statistical practice. In this lecture we will look back on the development of some basic statistical concepts and learn about the ideas and preoccupations that influenced them over the centuries. The aim of this overview is to build up essential intuition about the concepts and methods that we will learn later. Brains-on activities will include casting astragali, fighting Laplace’s Demon, tasting tea, and comparing peas in a pod. By the end, we will have gained a clearer understanding of the limits of statistical analysis and the dangers of not acknowledging those limits. |
Week 2 Revisiting Flatland A review of general linear models |
In Edwin Abbott’s 1884 novella, the inhabitants of Flatland are geometric shapes living in a two-dimensional world, incapable of imagining the existence of higher dimensions. A sphere passing through the plane of their world is a fascinating but incomprehensible event: Flatlanders can only see a dot becoming a circle, increasing in circumference, then shrinking back in size and disappearing. There are, in this universe, worlds with even more limited views, like the one-dimensional Lineland and the zero-dimensional Pointland. Any attempt to expand the perspective of their inhabitant(s) is doomed to failure. But as in any good adventure story, a chosen Flatland native embarks on a journey of discovery and revelation - and ostracism and imprisonment. The story is usually interpreted as an allegorical criticism of Victorian-age social structure, but it can equally describe the limitations of uncritically inhabiting a methodological world in which all data are ‘normal’ and all relationships are linear. Moving beyond linearity and acquiring the statistical intuition needed to think in higher dimensions and perceive more complex relationships is indeed a matter of practice-induced revelation. It’s unlikely that we will reach statistical nirvana in this short course, but we’ll attempt to build some more substantial structures upon the arid plains of linear regression. We start by looking around in the Flat-, Line- and Point-lands of quantitative analysis. Incorrigible procrastinators may want to check out a full-length computer animated film version of Flatland on YouTube. Others may be better served by this brief TED-Ed animation. |
Week 3 Dear Prudence, Help! I may be cheating with my X Interactions and the logic of causal inference |
Much of what we do in quantitative data analysis is about examining relationships. We are often interested in proposing and testing models of relationships between two or more variables. Sometimes our variables cry out to us begging for help, and we turn into agony aunts and uncles to our data. Other times we must psychoanalyse our data to uncover hidden associations and interactions. This is not an easy task. Do it carelessly, and you may unwittingly cheat yourself and the readers of your research. This week we’ll build some intuition for detecting complex and uneasy relationships within the design matrix. |
Week 4 The Y question Generalised linear models |
It wasn’t until the last quarter of the 20th century that a unified vision of statistical modelling emerged, allowing practitioners to see how the general linear model we have explored so far is only a specific case of a more general class of models. We could have had a fancy, memorable name for this class of models - as John Nelder, one of its inventors, acknowledged later in life (Senn 2003:127) - but back then academics were not required to undertake marketing training on the tweetability factor of the chosen names for their theories; so we ended up with “generalised linear models”. These models can be applied to explananda (“explained”, “response”, “outcome”, “dependent” etc. variables, our
Week 5 Do we live in a simulation? Basic data simulation for statistical inference and power analysis |
We have known ever since science-fiction author Philip K. Dick’s memorable “Metz address” of 1977 that our world is a computer simulation. Of course, like some common-currency theories in the social sciences, this knowledge will never be truly verified. We won’t even attempt to get to the bottom of it in class; instead, we’ll practice some basic methods of computer simulation for statistical inference and for generating data that has some idealised characteristics. Such methods play an increasingly important role in computational statistics and are extremely useful for designing robust data collection and analysis plans. If you make a mistake in the code and end up in an infinite loop, but you’re afraid that stopping the process may cause the known universe to implode, you can watch Dick on YouTube while you wait. If something like this can happen to our data, who says it couldn’t happen to us? |
Week 6 Challenging hierarchies Multilevel models |
By now we have got a sense that every new thing we learn about turns out to be merely a specific case of a larger class of things. So, all the models we have covered so far are specific, single-level versions of multilevel models, in which our cases can be seen as clustered within larger entities. Sometimes they are part of several cross-cutting clusters and/or the clusters are themselves clustered. In general terms, we must acknowledge that there are dependencies in our data that may influence their behaviour. It turns out that data about humans living in societies look somewhat like humans living in societies. The importance of including information about hierarchical dependencies in our models is probably emphasised by no one more than McElreath (2020:15), who wants “to convince the reader of something that appears unreasonable: multilevel regression deserves to be the default form of regression. Papers that do not use multilevel models should have to justify not using a multilevel approach.” We will encounter some of the uses and challenges of multilevel modelling. |
Week 7 The unobserved Latent variables and structural models |
The unobserved sounds like the title of a promising horror film; if we have achieved our aims in the module so far, our horror should be ‘merely’ metaphysical by now (Kołakowski anyone? No? Okay, never mind). We have already had to deal with various aspects of latency in our analyses. At the most fundamental level, we speak about population parameters, but we never actually observe them; even a sample statistic can be a purely imaginary case that doesn’t occur in real life. We have discussed the effects of omitted variables, which are thus unobserved by our model, but which we may have access to in our data. And, of course, our most interesting measurements are likely to be proxies of some unobservable theoretical construct (Mulvin (2021) has recently published a wonderfully rich book about proxies in general). This week we pick up an earlier thread from week 4, where we thought about binary and ordered multinomial variables as discretised manifestations of some continuous ‘latent variable’. We expand on this idea by exploring simple and then more complex latent variable models (factor analysis, structural equation modelling), as a further generalisation of the hierarchical perspective introduced earlier. This gives us a few more tools to deal with our radical uncertainty. (n.b. missing data points are another challenge that could fall under this heading, and learning how to deal with them is extremely important; but “The missing” is too good a title not to deserve a high-budget, weak-storyline, full-on special effects sequel somewhere else) |
Week 8 Words, words, mere words… Text as data |
As researchers in the humanities and the social sciences, we use words both as tools of analysis and as sources of data. Words, and more broadly, texts, are also increasingly important for quantitative research in an age of so-called ‘big data’, when the digital world is saturated with unstructured textual information. But the statistical inspection of text is neither new, nor restricted to the humanistic tail of the social sciences. For example, a documented interest in the statistical study of literary style for the purposes of attributing authorship dates back to the mid-1850s (see Lord 1958); and investors can use textual data such as minutes from the Bank of England’s Monetary Policy Committee’s deliberations to estimate future monetary policy decisions before they are actually taken (cf. El-Shagi and Jung 2015). Methods for the collection and quantitative analysis of large-scale textual data are increasingly available, but their technical implementation is complex and requires an efficient combination of humanistic subject knowledge and statistical expertise. Faced with words, one is understandably caught between Shakespeare’s Troilus and Wilde’s Dorian Gray. “Words, words, mere words, no matter from the heart; th’ effect doth operate another way. … My love with words and errors still she feeds, but edifies another with her deeds” - believed the betrayed Troilus. “Words! Mere words! How terrible they were! How clear, and vivid, and cruel! One could not escape from them. And yet what a subtle magic there was in them! They seemed to be able to give a plastic form to formless things, and to have a music of their own as sweet as that of viol or of lute. Mere words! Was there anything so real as words?” - pondered Dorian. |