Big and small data in ancient languages

by Nicholas Zair (University of Cambridge)

Back in November I gave a talk at the Society’s round table on ‘Sources of evidence for linguistic analysis’ on ‘Big and small data in ancient languages’. Here I’m going to focus on one of the case studies I considered under the heading of ‘small data’, which is based on an article that Katherine McDonald and I have written (more details below) about a particular document from ancient Italy known as the Tabula Bantina.

It comes from Bantia, modern-day Banzi in Basilicata, and is written in Oscan, a language which was spoken in Southern Italy in the second half of the first millennium BC, including in Pompeii prior to a switch to speaking Latin towards the end of that period. Since Oscan did not survive as a spoken language, we know it almost entirely from inscriptions written on non-perishable materials such as stone, metal and clay. There aren’t very many of these inscriptions: perhaps a few hundred, depending on definitions (for instance, do you include control marks consisting of a single letter?). We are lucky that Oscan is an Indo-European language, and, along with a number of other languages from ancient Italy, quite closely related to Latin, so we can make good headway with it. Nonetheless, our knowledge of Oscan and its speakers is fairly limited: it is certainly a language that comes under the heading of ‘small data’.

 

One of the ways scholars have addressed the problem of so-called corpus languages like Oscan, and even better-attested but still limited ones like Latin, has been to combine as many relevant sources of information as possible, from ancient historians to the insights of modern sociolinguistic theory, as a way of squeezing as much information as we can from what we have – and trying to fill in the blanks where information is lacking. This has been a huge success, but the approach can also be dangerous, especially when it comes to studying language death. Given that we know a language will die out in the end, it is very tempting to see every piece of evidence as a staging post in the process, and try to fit it into our narrative of language death. Often this provides very plausible histories, but we must remember that, while in hindsight history can look teleological, things are rarely so clear at the time.

The Tabula Bantina is a bronze tablet with a Latin law on one side and an Oscan law on the other side. It is generally agreed that the Latin text was written before the Oscan one, but the Oscan is not a translation of the Latin: the writer of the Oscan text simply used the conveniently blank side of the tablet to write the new material on. The striking things about the Oscan text are that it is written in the Latin alphabet, and there are lots of mistakes. It also strongly resembles Latin legal language. The date of this side is probably between about 100 and 90 BC, just before Rome’s ‘allies’, which is to say conquered peoples and cities in Italy, rose up against it in a rebellion generally known as the Social War.

The Morphological-to-Analytic Causative Continuum in Hausa: New Insights and Analyses in a Typological Perspective

by Philip J. Jaggar (School of Oriental and African Studies, University of London)

Over the last few decades, linguists have devoted considerable attention to both homogeneity and variation in the expression of causal events across languages. However, most studies, whether typological or language-specific, have focused on the category of morphologically overt (e.g., ‘lie/lay X down’) causatives, to the relative neglect of complex periphrastic (e.g., ‘get X to lie down’) formations.

The present study addresses this imbalance by elucidating a wide spectrum of causative expressions in Hausa (Chadic/Afroasiatic), supported by a strong cross-linguistic perspective. In line with contemporary approaches located within a general typology of causation, the analysis invokes the widely-accepted dichotomy between direct and indirect causative constructions. Direct causation associates with morphological causatives, indirect causation with periphrastic expressions—compare morphological ‘I lay X down’ (direct, with no intermediary) with periphrastic ‘I got X to lie down’ (indirect, where X also functions as an intervening actor/cause).

Hausa uses an indirect periphrastic causative usually formed with sâa ‘cause’ (lit. ‘put’) as the higher causal verb, e.g., nâs taa sâa yaaròn yaa kwântaa ‘the nurse got the boy to lie down’ (= intransitive kwântaa ‘lie down’). Direct morphological causatives, in contrast, associate with a specific derivational formation, known as “Grade 5” (Parsons 1960/61), e.g., nâs zaa tà kwantar̃ dà yaaròn ‘the nurse will lay the boy down’.

The monograph systematically explores, for the first time in an African language to our knowledge, the key design-features that distinguish the two mechanisms, in addition to demonstrating that Hausa periphrastic causatives can also differ from each other, e.g., in implicational strength, depending on the modal (TAM) properties of the lower clause. In so doing, it provides a rare account of how the two types are used to describe pragmatically different causal events and participant roles.


Jaggar, Philip J. (2017) The Morphological-to-Analytic Causative Continuum in Hausa: New Insights and Analyses in a Typological Perspective. (Abhandlungen für die Kunde des Morgenlandes, Band 109). Wiesbaden: Harrassowitz.
Available from 1 June 2017.

Latin in Medieval Britain

by Richard K. Ashdowne (University of Oxford; Honorary Membership Secretary, PhilSoc)

Of the many languages in use in Britain in the middle ages, Latin is arguably the best attested and yet most overlooked. Not the native language of any of its users and employed especially—though certainly not exclusively—in written functions, Latin has tended to be the elephant in the room despite its indisputable importance for its users and their societies.

After the departure of the Roman legions from Britain, Latin’s continued use was by no means assured, but there is a continuous train of use down to the time of the Tudors and beyond. Over more than a thousand years British medieval Latin was employed for all manner of functions from accountancy to zoology.

In this new collection of papers, arising from the conference held to celebrate the completion in print of the Dictionary of Medieval Latin from British Sources, the place of Latin in medieval Britain is examined from a variety of historical, cultural and linguistic perspectives and in relation to some of its many different contexts.

In the first part, David Howlett, Neil Wright, Wendy Childs and Robert Swanson look successively at the start of the Anglo-Latin tradition, the twelfth-century renaissance, the use of Latin in historiography and record-keeping in the fourteenth century, and the continued use of Latin in the medieval tradition into the fifteenth and sixteenth centuries. The vitality of the language over the ages and its users’ constant reinvention of its role emerge as central themes.

In the second part, attention is directed to particular fields, namely law (Paul Brand), musical theory (Leofranc Holford-Strevens), the church (Carolinne White) and science (Charles Burnett), as examples of how the Latin language was used and adapted to its roles. That it was being employed in historical, social, cultural and linguistic settings quite different from its ancient ancestor had important consequences. It meant that, for instance, Latin was frequently in need of new terminology for the contemporary world, especially in some of these more technical areas. Borrowing, calquing and native word-formation processes were all ways of meeting this need, reflecting the inherent contact between Latin and its users’ native vernacular languages.

In the third and final part, these linguistic contacts become the central focus in chapters examining the relationship between Welsh and Latin (Paul Russell), the relationship between Latin and English (Richard Sharpe), the development of a mixed-language code (Laura Wright), the relationship of Germanic, Anglo-Norman French and Latin (David Trotter), and the relationship between English and Latin (Philip Durkin and Samantha Schad). The final chapter, by David Howlett, ties in with some of the lexicographical questions raised by Sharpe, Trotter, and Durkin and Schad, and looks back at the process of preparing the Dictionary of Medieval Latin from British Sources.

Latin in Medieval Britain is edited by Richard Ashdowne and Carolinne White and published by the British Academy in association with OUP. Many of the contributors are members of the Society and current or former members of Council.


Further information, including abstracts of all the chapters, can be found on the DMLBS blog and the book can be obtained directly from OUP and all good booksellers.

One Language, Two Grammars: the ‘Plight’ of Classical Armenian

by Robin Meyer (University of Oxford; Hon. Secretary for Student Associate Members)

Armenian is one of those Indo-European languages that very rarely gets much attention from students of historical linguistics or comparative philology; most frequently, it crops up only in discussions of the augment, laryngeals, and the Glottalic Theory. This, alas, is unlikely to change.
Yet, Armenian can serve as an interesting case study for a number of fields within linguistics, not least language contact and corpus linguistics. With these two topics in mind, allow me to introduce you to Armenian – albeit in extreme brevity –, and to illustrate one of its more curious traits: its two grammars.

Map of Armenia in the 2nd and 1st centuries BCE

An exceedingly short introduction: Iranian, Greek, and the Armenian language(s)

Armenian, attested in its Classical form (called գրաբար |grabar|) since the 5th century CE, is a language with a couple of twists. Until a ground-breaking paper by Heinrich Hübschmann (1875), Armenian was thought to belong to the Iranian language family. In fact, Armenian is most closely related to Greek – and even that not all that closely (Clackson 1994). For the most part, this relationship is not immediately obvious at the surface, particularly if compared to the similarities between, for instance, Vedic and Old Avestan, or Latin and Oscan.
The reason for its historical allocation to the Iranian family lies in the inordinate number of Iranian loan words and calques, both lexical and phraseological, in Armenian. These are mostly taken from Parthian (North West Middle Iranian; Meillet 1911–12, Schmitt 1983). Less obviously, even certain Iranian syntactic structures and patterns have been replicated (Meyer 2013, 2016). These borrowings are, without doubt, owed to long-lasting contact between Armenian and Parthian speakers. From the 5th century BCE, Armenia was under Iranian rule in one form or another: Achaemenid, Artaxiad, Arsacid Parthian, and later Sasanian Persian. For the most part, an Armenian king of Iranian origin ruled as primus inter pares among other Armenian and Iranian noble families. The history and ethnic composition of Armenia are, of course, far more complex than can be described in one sentence; excellent summaries can be found in Hovannisian (1997).

So far, so good.

Transitive nouns and adjectives: evidence from Early Indo-Aryan

by John J. Lowe (University of Oxford)

Transitivity is typically thought of as a property of verbs, and perhaps of adpositions, but it is not a typical property of nouns or adjectives. In the influential cross-classification of syntactic categories developed by Chomsky (e.g. 1981: 48), nouns and adjectives are actually defined in opposition to adpositions and verbs by their inability to govern objects, that is by their inability to be transitive. A few authors have discussed exceptions to this generalization, but they tend to be rare and non-productive; for example in English there may be only a single transitive adjective, near, which is a historically explicable exception to an otherwise consistent synchronic rule that nouns and adjectives cannot govern ‘bare’ noun phrase complements (Maling, 1983). As a second example, in early Latin there are a few nouns and adjectives which may govern accusative case objects, but the process is not productive and is entirely eliminated by Classical Latin.

gnaruris            vos          volo       esse     hanc       rem
acquainted.ACC.PL   you.ACC.PL   wish.1SG   be.INF   this.ACC   matter.ACC
‘I wish you to be acquainted with this matter.’ (Latin: Plautus Most. 100)

In the early Indo-Aryan languages, however, there is a relative wealth of transitive noun and adjective categories. In my forthcoming monograph Transitive Nouns and Adjectives: evidence from early Indo-Aryan (OUP, July 2017), I investigate the evidence from four periods of early Indo-Aryan, discussing the synchronic and diachronic explanation for this unusual phenomenon.

The majority of transitive noun/adjective categories in early Indo-Aryan fall under the traditional heading of ‘agent noun’ (including agentive adjectives, used in the same way); these are the categories whose transitivity is most clear, and most common. For example, in the sentence below the ‘agent adjective’ kāmin- ‘desirous, desiring’ governs an accusative object ‘drink’.

kāmī                hi    vīraḥ           sadam    asya     pītim
desirous.NOM.SG.M   for   hero.NOM.SG.M   always   it.GEN   drink.ACC
‘For the hero (is) always desirous (of) a drink of it.’ (Sanskrit: RV 2.14.1c)

Superficially, kāmī here looks similar to a participle, i.e. to a word category which, as a non-finite verbal category, could unproblematically govern an object. However, I show that the majority of transitive nouns and adjectives attested in early Indo-Aryan cannot be analysed as non-finite verb forms, but must be acknowledged as part of a distinct constructional type in early Indo-Aryan.

Other transitive nouns fall under the traditional heading of ‘action nouns’; I show that for the most part action nouns are transitive only when used as infinitives, and hence their transitivity can be explained as the unexceptional transitivity of non-finite verb forms. There are also nouns and adjectives whose transitivity is adpositional, rather than verbal.

Crucially, I show that there is a statistical correlation between transitivity of nouns and adjectives and the syntactic context of predication: nouns and adjectives which are used as the primary predicate in a (perhaps null) copular construction are statistically more likely to be transitive than those which are used in other ways. This correlation is unique to transitive nouns and adjectives and securely distinguishes this formation from transitivity with non-finite verb categories.

The book provides a detailed introduction to transitivity (verbal and adpositional), to the categories of agent and action noun, and to early Indo-Aryan. The four periods of early Indo-Aryan selected for study are: Rigvedic Sanskrit, the earliest Indo-Aryan; Vedic Prose, a slightly later form of Sanskrit; Epic Sanskrit, a form of Sanskrit close to the standardized ‘Classical’ Sanskrit; and Pali, the early Middle Indo-Aryan language of the Buddhist scriptures. I show that while each linguistic stage is different, there are shared features of transitive nouns and adjectives which apply throughout the history of early Indo-Aryan.

The data is set in the wider historical context, from Proto-Indo-European to Modern Indo-Aryan, and a formal linguistic analysis of transitive nouns and adjectives is provided in the framework of Lexical-Functional Grammar.


References:

Chomsky, Noam (1981), Lectures on Government and Binding: The Pisa Lectures, Dordrecht: Foris.

Lowe, John J. (2017), Transitive Nouns and Adjectives: Evidence from Early Indo-Aryan, volume 25 in the series Oxford Studies in Diachronic and Historical Linguistics. Oxford: Oxford University Press. c. 400 pp. ISBN: 978-0-19-879357-1.

Maling, Joan (1983), ‘Transitive adjectives: a case of categorial reanalysis’, in Frank Heny & Barry Richards (eds.), Linguistic Categories: Auxiliaries and Related Puzzles, volume 1. Dordrecht: Reidel. 253–289.

‘The Word Detective’ serialised on BBC Radio 4

by John Simpson (Chief Editor, Oxford English Dictionary, 1993–2017)

John Simpson
(© Bloomington Photography)

A generation ago, my colleagues and I at the OED were starting to become increasingly aware that the dictionary was in danger of drifting away from its audience. Or, to put it more accurately, the dictionary was standing still while its audience moved into the twentieth and then the twenty-first centuries.

Historical lexicography is demanding. There are few short cuts; standards are exacting. The editors of the First Edition of the OED had laboured for many years to capture the history of our language, and its format reflected nineteenth-century expectations about how knowledge should be presented. Nowadays the level of scholarship at the OED is the same – it has to be. But a wider audience expects to be able to access and understand the dictionary in radically new ways.  One of the challenges of the last few decades has been how to present the content of the OED to a new readership in the digital age.

I wrote The Word Detective to give readers an informal, behind-the-scenes look at the OED and the extraordinary things it has set out to achieve over the last forty years. In addition, I wanted to convey to readers the excitement of researching and defining the language – because that’s what we all felt as editors.

The Word Detective will be broadcast at 7.45 p.m. this Monday to Friday (13–17 March), on BBC Radio 4. See if I achieved it!

 

 


John Simpson’s ‘The Word Detective’ is published by Little Brown in the UK, and Basic Books in the USA.

Old Norwegian vowel harmony and the value of quantitative data for descriptive linguistics

by Tam Blaxter (University of Cambridge)

Quantitative methods in historical linguistics are most often used to answer ‘variationist’ questions. We assume that we know what the possible forms of a language were, but ask questions about their distribution: when was one form replaced by another? Who used which forms? Were some more common in particular linguistic contexts, genres or text types? For this reason, quantitative methods might seem unappealing to historical linguists primarily interested in describing a historical variety—its grammar and lexicon—or describing etymologies. From time to time, however, quantitative data can throw a light on these more basic descriptive questions.

An excerpt from the Old Norwegian Homily Book

Old Norwegian, unlike its better-studied West Nordic sister Old Icelandic, exhibited height harmony of unstressed non-low vowels. Readers familiar with Old Icelandic texts will expect to see three distinct vowels in unstressed syllables: /a i u/ written <a i u>. In Old Norwegian texts we find an additional two graphemes, <e o>, in complementary distribution with <i u>. These vowels agree with the vowel of the stressed syllable for height: <i u> appear in unstressed syllables whenever the stressed syllable was high and <e o> whenever it was non-high. There are two exceptions to this rule: when the stressed syllable contained the vowel normalised ǫ, which was the u-umlaut product of *a, we find unstressed syllables with <u> and either <e> or <i>, and when the stressed syllable contained the i-umlaut product of *a (usually normalised e but sometimes written ę to distinguish it from /e/ < Proto-Germanic *e), we find unstressed syllables with <i> and either <u> or <o>.
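The rule and its two exceptions can be written down as a small lookup table. The Python sketch below is only an illustration of the description above: the class labels ("high", "non-high", "ǫ", "ę"), the "front"/"back" series names and the function name is_harmonic are labels assumed here for convenience, and deciding which class a given stressed vowel belongs to is left to the reader.

```python
# Expected unstressed graphemes for each class of stressed vowel,
# following the description in the text. The class labels and the
# "front"/"back" series names are assumptions made for this sketch.
EXPECTED = {
    "high":     {"front": {"i"},      "back": {"u"}},
    "non-high": {"front": {"e"},      "back": {"o"}},
    "ǫ":        {"front": {"e", "i"}, "back": {"u"}},       # u-umlaut product of *a
    "ę":        {"front": {"i"},      "back": {"u", "o"}},  # i-umlaut product of *a
}


def is_harmonic(stressed_class, unstressed):
    """Return True if the unstressed grapheme is one the rule allows."""
    if unstressed == "a":
        # The low vowel takes no part in height harmony.
        return True
    if unstressed in ("i", "e"):
        series = "front"
    elif unstressed in ("u", "o"):
        series = "back"
    else:
        raise ValueError(f"unexpected unstressed grapheme: {unstressed!r}")
    return unstressed in EXPECTED[stressed_class][series]
```

For instance, is_harmonic("high", "e") is False, while is_harmonic("ę", "o") is True because the second exception allows either back grapheme.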

In theory, then, we could use the vowel harmony to distinguish between the stressed phonemes /e/ and /ę/ which were not (consistently) distinguished in the orthography: the former should have harmony vowels <e o> while the latter should have <i o/u>. However, Old Norwegian vowel harmony is a slippery creature. Few texts exhibit it totally consistently, making it difficult to sort out what is orthographic and what is phonological variation. If we take a qualitative approach in which we read individual texts and describe their orthographies, we can’t confidently interpret deviations from vowel harmony as meaningful. If, on the other hand, we take a quantitative approach which includes data from many different texts, interesting patterns may become clear.
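To make the quantitative idea concrete, here is a minimal Python sketch of one way such pooling could work. It is an illustration under stated assumptions, not the method of the study: the function name share_of_i, the input format and the placeholder stems "stem_A" and "stem_B" are all invented for the example, and the counts are toy values, not real attestations. For each stem written with stressed <e>, it computes the share of unstressed <i> among <i>/<e> tokens pooled across texts. A high share would point towards /ę/ and a low share towards /e/.

```python
from collections import Counter


def share_of_i(attestations):
    """Share of unstressed <i> among <i>/<e> tokens, per stem.

    `attestations` is an iterable of (stem, unstressed_grapheme) pairs
    pooled from many texts (a hypothetical input format).
    """
    counts = {}
    for stem, vowel in attestations:
        if vowel in ("i", "e"):  # only the front pair separates /e/ from /ę/
            counts.setdefault(stem, Counter())[vowel] += 1
    return {stem: c["i"] / (c["i"] + c["e"]) for stem, c in counts.items()}


# Toy data with placeholder stems; not real Old Norwegian attestations.
toy = [
    ("stem_A", "e"), ("stem_A", "e"), ("stem_A", "i"), ("stem_A", "a"),
    ("stem_B", "i"), ("stem_B", "i"), ("stem_B", "i"), ("stem_B", "e"),
]
profile = share_of_i(toy)
```

No single scribe's spelling settles the question here. The signal comes from the aggregate proportion, which is why inconsistent individual texts need not be an obstacle.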