by Jonathan Hope (Strathclyde University, Glasgow)
The transcription of a substantial proportion of Early Modern English books by the Text Creation Partnership has placed more than 60,000 digital texts in the hands of literary and linguistic researchers. Linguists are in many cases used to dealing with large electronic corpora, but for literary scholars this is a new experience. Used to arguing from the quality, rather than the quantity, of evidence, literary scholars have a new set of norms and procedures to learn, and are faced with the exciting, or perhaps depressing, prospect that their object of study has changed.
In this talk I’ll look at some specific case studies that illustrate the potential, and the problems, of quantity-based studies – and will highlight key areas where literary scholars need to reassess their expectations of ‘evidence’, and the texts we use. A possible alternative title might be ‘Learning to live with error: gappy texts and crappy metadata’.
A screencast of the talk can be found below.
This paper was read at the Philological Society meeting in Oxford, Wolfson College, on Saturday, 11 March, 4.15pm.
Quantitative methods in historical linguistics are most often used to answer ‘variationist’ questions. We assume that we know what the possible forms of a language were, but ask questions about their distribution: when was one form replaced by another? Who used which forms? Were some more common in particular linguistic contexts, genres or text types? For this reason, quantitative methods might seem unappealing to historical linguists primarily interested in describing a historical variety—its grammar and lexicon—or describing etymologies. From time to time, however, quantitative data can throw a light on these more basic descriptive questions.
Old Norwegian, unlike its better-studied West Nordic sister Old Icelandic, exhibited height harmony of unstressed non-low vowels. Readers familiar with Old Icelandic texts will expect to see three distinct vowels in unstressed syllables: /a i u/ written <a i u>. In Old Norwegian texts we find an additional two graphemes, <e o>, in complementary distribution with <i u>. These vowels agree in height with the vowel of the stressed syllable: <i u> appear in unstressed syllables whenever the stressed vowel was high and <e o> whenever it was non-high. There are two exceptions to this rule. When the stressed syllable contained the vowel normalised ǫ, which was the u-umlaut product of *a, we find unstressed syllables with <u> and either <e> or <i>. When the stressed syllable contained the i-umlaut product of *a (usually normalised e but sometimes written ę to distinguish it from /e/ < Proto-Germanic *e), we find unstressed syllables with <i> and either <u> or <o>.
In theory, then, we could use the vowel harmony to distinguish between the stressed phonemes /e/ and /ę/ which were not (consistently) distinguished in the orthography: the former should have harmony vowels <e o> while the latter should have <i o/u>. However, Old Norwegian vowel harmony is a slippery creature. Few texts exhibit it totally consistently, making it difficult to sort out what is orthographic and what phonological variation. If we take a qualitative approach in which we read individual texts and describe their orthographies, we can’t confidently interpret deviations from vowel harmony as meaningful. If, on the other hand, we take a quantitative approach which includes data from many different texts, interesting patterns may become clear.
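As a minimal sketch of the harmony rule described above, the following Python snippet encodes which unstressed graphemes would be expected after a given stressed vowel. The function names, the set of high vowels, and the decision to leave diphthongs out are illustrative assumptions, not anything taken from the study itself.

```python
# Stressed vowels treated as high. This inventory is a simplifying
# assumption for illustration; diphthongs are deliberately left out.
HIGH = {"i", "í", "y", "ý", "u", "ú"}


def expected_unstressed(stressed):
    """Return the unstressed graphemes expected after a stressed vowel.

    The result maps "front" and "back" to the set of permitted graphemes.
    """
    if stressed == "ǫ":  # u-umlaut product of *a: <u> and either <e> or <i>
        return {"front": {"e", "i"}, "back": {"u"}}
    if stressed == "ę":  # i-umlaut product of *a: <i> and either <u> or <o>
        return {"front": {"i"}, "back": {"u", "o"}}
    if stressed in HIGH:  # high stressed vowel: <i u>
        return {"front": {"i"}, "back": {"u"}}
    return {"front": {"e"}, "back": {"o"}}  # any other (non-high) vowel: <e o>


def is_harmonic(stressed, unstressed):
    """Check whether an unstressed grapheme fits the rule sketched above."""
    if unstressed == "a":  # the low vowel does not take part in the harmony
        return True
    allowed = expected_unstressed(stressed)
    return unstressed in (allowed["front"] | allowed["back"])
```

A checker of this kind could in principle be run over every word form in a corpus, so that forms written with stressed <e> can be grouped by whether their unstressed vowels pattern like /e/ or like /ę/.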
Round table discussion with Aaron Ecay (University of York), Seth Mehl (University of Sheffield), Nick Zair (University of Cambridge), chaired by Cécile De Cat (University of Leeds)
Is linguistics an empirical science? How reliable are the data on which linguistic analyses and theories are based? These questions are not new, but in light of the disturbing findings of the Reproducibility Project in psychological sciences, the need to revisit them has become more pressing. This round table discussion will start with presentations from three postdoctoral researchers, who will discuss the question of data collection and analysis and the interpretation of linguistic evidence.
This panel will be held on 11 November 2016 at 4.15pm in the Great Woodhouse Room, University House, University of Leeds, LS2 9JS.