Week 8: Words, words, mere words…
Text as data
Description
As researchers in the humanities and the social sciences, we use words both as tools of analysis and as sources of data. Words, and texts more broadly, are also increasingly important for quantitative research in an age of so-called ‘big data’, when the digital world is saturated with unstructured textual information. But the statistical inspection of text is neither new nor restricted to the humanistic tail of the social sciences. For example, a documented interest in the statistical study of literary style for the purpose of attributing authorship dates back to the 1850s (see Lord 1958), and investors can use textual data such as the minutes of the Bank of England Monetary Policy Committee’s deliberations to predict its monetary policy decisions before they are actually taken (cf. El-Shagi and Jung 2015). Methods for the collection and quantitative analysis of large-scale textual data are increasingly available, but their technical implementation is complex and requires an efficient combination of humanistic subject knowledge and statistical expertise.
Faced with words, one is understandably caught between Shakespeare’s Troilus and Wilde’s Dorian Gray. “Words, words, mere words, no matter from the heart; th’ effect doth operate another way. … My love with words and errors still she feeds, but edifies another with her deeds,” lamented the betrayed Troilus. “Words! Mere words! How terrible they were! How clear, and vivid, and cruel! One could not escape from them. And yet what a subtle magic there was in them! They seemed to be able to give a plastic form to formless things, and to have a music of their own as sweet as that of viol or of lute. Mere words! Was there anything so real as words?” pondered Dorian.
References
David, F. N. 1955. “Studies in the History of Probability and Statistics I. Dicing and Gaming (A Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.
El-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.
Lord, R. D. 1958. “Studies in the History of Probability and Statistics VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282. https://doi.org/10.2307/2333072.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.
Mulvin, Dylan. 2021. Proxies: The Cultural Work of Standing In. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.
Senn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489.