Sound change and analogy in morphological paradigms: Why automated inference is on the horizon

Erich Round (Max Planck Institute for the Science of Human History, Jena; University of Queensland) 

The comparative method is one of the greatest methodological achievements in the history of linguistics. And yet, despite its relatively precise formulation, we do not have an automated implementation of it, and consequently we face a very long wait to know more about the inferable history of language families around the globe. One may well ask why. As it happens, in a mathematical PhD thesis from 2010, Alexandre Bouchard-Côté demonstrated why, by showing that even the inference of sound change was computationally infeasible. Bouchard-Côté pointed to two impediments: (1) a factorial explosion in the difficulty of the computational task, and (2) a paucity of evidence when the data consists of a short list of basic vocabulary. However, recent progress in computational statistics provides reason to believe that impediment (1) may be overcome for at least some models of linguistic change. Impediment (2) might be alleviated by allowing the algorithms to look at richer sources of data (as we humans do), such as inflectional paradigms. And so, in this talk I discuss the prospects for trying to automate a core aspect of the comparative method: the inference of sound change and analogy in paradigms, with an emphasis on analogy. I discuss what is already known about analogy; what it might entail to model that knowledge explicitly; the role to be played by mathematical models of language change; and what research questions the exercise might realistically help us to ask.

This talk will take place at 4:15 on Friday 10 January 2020 at SOAS University of London, the Brunei Gallery Building (opposite the Main Building) in Room B103.

The Loss of the Latin Case System – A New Morphological Approach

by Zeprina-Jaz Ainsworth (University of Oxford)

Much work has already been done on the development of the Latin case system, which has been lost almost entirely from nouns and adjectives in Romance. Scholars such as Herman (2000) have outlined phonetic, analogical, functional, and syntactic changes which may have contributed to the opacification of certain morphological case forms. However, none of the previous analyses account for the near-total loss of the case category in Romance. For instance, as the result of regular phonological changes, the singular forms in the first declension would not have ‘fallen together’ into a single, invariant shape:

Singular    Classical Latin  Sound change                        Result
Accusative  MENSAM           Loss of final -m                    **mensa
Ablative    MENSĀ            Loss of vowel length distinctions
Genitive    MENSAE           ae > [e]
Dative      MENSAE           ae > [e]

Table 1: Phonetic erosion in first declension singular case/number suffixes
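
The changes in Table 1 can be sketched as a short Python function, which makes it easy to check that the four forms do not collapse into one shape. This is only a minimal sketch: the order in which the changes apply, the spelling of [e] as plain e, and the names FORMS, apply_sound_changes and reflexes are illustrative assumptions rather than part of the analysis.

```python
# Illustrative sketch: the three changes named in Table 1, applied in an
# assumed order to the first-declension singular forms listed there.
FORMS = {
    "accusative": "mensam",
    "ablative": "mens\u0101",   # mensā, with the macron written as an escape
    "genitive": "mensae",
    "dative": "mensae",
}


def apply_sound_changes(form: str) -> str:
    """Apply the Table 1 changes to one lower-case wordform."""
    if form.endswith("m"):              # loss of final -m
        form = form[:-1]
    form = form.replace("\u0101", "a")  # loss of length (only ā occurs in this data)
    if form.endswith("ae"):             # ae > [e], spelled "e" here by assumption
        form = form[:-2] + "e"
    return form


def reflexes(forms: dict[str, str]) -> dict[str, str]:
    """Map each case label to the outcome of the changes for its form."""
    return {case: apply_sound_changes(form) for case, form in forms.items()}


if __name__ == "__main__":
    for case, result in reflexes(FORMS).items():
        print(f"{case:<11}{FORMS[case]:<8}> {result}")
```

On these assumptions the accusative and ablative fall together while the genitive and dative share a different outcome, so two distinct shapes remain.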

Moreover, cross-linguistic comparison indicates that, despite phonological, analogical, and functional developments, languages do not necessarily always lose their case systems. Finnish, for instance, retains the fifteen case values (for nouns and adjectives) reconstructed for proto-Finnic (although the abessive, comitative, instructive and prolative are now in restricted usage), and has even begun to develop new morphological suffixes:

Proto-Finnic: nominative, genitive, partitive, essive, translative, elative, inessive, illative, ablative, adessive, allative, abessive, comitative, instructive, prolative
Modern Finnish: nominative, genitive, partitive, essive, translative, elative, inessive, illative, ablative, adessive, allative, (abessive, comitative, instructive, prolative), comitative2, excessive

Table 2: Case values in proto-Finnic and modern Finnish

This study is concerned with answering the question: why do we find such different developments cross-linguistically?

One major difference between these two languages is that Latin is characterized predominantly by fusional morphology, whilst Finnish exhibits an abundance of agglutinative structure. By analysing these structures from a unit-agnostic ‘abstractive’ approach (as opposed to a ‘constructive’ perspective, in which forms are considered to be ‘built’ up of sub-word parts),[1] we may best understand how they behave in significantly different ways in diachrony.

In Latin for instance, the fully-inflected wordform and the relationship it bears to other forms in the paradigm provides the language-user with informative patterns which may be extended in the inflexion of other lexemes – there is no need to posit ‘underlying’ forms or identify sub-word morphs in order to ‘construct’ new forms. For instance, if the language-user knows a nominative singular form ending in -a, the lexeme must belong to the first declension. In the second and fourth declensions, however, even if both the nominative singular and accusative singular forms are known, there is residual ambiguity about the inflexion class to which the lexeme belongs:

Nom. sg.  PUELLA   1st declension  SERVUS  2nd/4th declension  GRADUS  2nd/4th declension
Acc. sg.  PUELLAM  1st declension  SERVUM  2nd/4th declension  GRADUM  2nd/4th declension
Gen. sg.  PUELLAE  1st declension  SERVĪ   2nd declension      GRADŪS  4th declension

Table 3: Implicational relations in a sub-set of Latin nouns
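
The implicational reasoning in Table 3 can be sketched in Python as a filter over candidate inflexion classes. This is a hedged illustration only: the word-final strings in PATTERNS are a shorthand read off the three exemplar nouns in the table, used here as a crude stand-in for whole-word patterns rather than as a claim about sub-word morphs, and the names PATTERNS and compatible_classes are hypothetical.

```python
# Word-final patterns read off the exemplars in Table 3 (lower-case; long
# vowels written as escapes: \u012b = ī, \u016b = ū).
PATTERNS = {
    "1st": {"nom_sg": "a", "acc_sg": "am", "gen_sg": "ae"},
    "2nd": {"nom_sg": "us", "acc_sg": "um", "gen_sg": "\u012b"},
    "4th": {"nom_sg": "us", "acc_sg": "um", "gen_sg": "\u016bs"},
}


def compatible_classes(known: dict[str, str]) -> set[str]:
    """Return the classes whose patterns fit every known form of a lexeme."""
    return {
        cls
        for cls, cells in PATTERNS.items()
        if all(form.endswith(cells[cell]) for cell, form in known.items())
    }


if __name__ == "__main__":
    print(compatible_classes({"nom_sg": "puella"}))
    print(compatible_classes({"nom_sg": "servus", "acc_sg": "servum"}))
    print(compatible_classes(
        {"nom_sg": "servus", "acc_sg": "servum", "gen_sg": "serv\u012b"}
    ))
```

With only these three classes in play, a nominative singular in -a leaves a single candidate, the nominative and accusative of SERVUS leave two, and adding the genitive singular resolves the ambiguity.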

In Finnish, implicative relations provide information about inflexion class, whilst the frequent isomorphic form~function mapping exhibited by inflexional suffixes provides absolute certainty in the expression of most case functions.

Nom. sg.   ajatus ‘thought’  -Vs ~ -Vks-/-Vs ~ -VV-  vieras ‘stranger’  -Vs ~ -Vks-/-Vs ~ -VV-
Part. sg.  ajatusta          -Vs ~ -Vks-/-Vs ~ -VV-  vierasta           -Vs ~ -Vks-/-Vs ~ -VV-
Gen. sg.   ajatuksen         -Vs ~ -Vks- + [n]       vieraan            -Vs ~ -VV- + [n]

Table 4: Implicational relations and sub-word units in a sub-set of Finnish nouns

Whilst multiple forms are required in Finnish to determine the declension class to which a lexeme with a nominative singular form in -s belongs, there is certainty in many cells as to the inflexional material that will follow the lexical stem.

The abstract patterns that exist in Latin are not maximally informative; that is, there is occasionally still uncertainty about the shape of an unknown form, even given knowledge of two forms in the language (consider Table 3). In Finnish, on the other hand, there is a sub-word area of absolute certainty in most of the cells in the inflexional paradigm. In addition to implicational relations, therefore, a Finnish speaker who lacks sufficient information to deduce the inflexion class of a lexeme may still utilize maximally-predictable sub-word forms to produce a form (whether or not the ‘correct’ one) which may be interpreted correctly by a hearer.[2]
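
To make the contrast concrete in information-theoretic terms, the following Python sketch computes the conditional entropy of the genitive singular given the nominative singular for the toy data in Tables 3 and 4. It rests on strong assumptions that are mine, not the abstract's: every lexeme in the tables is weighted equally, the sample is limited to the three Latin and two Finnish nouns shown, and the function name conditional_entropy and the pattern labels are illustrative. It is not the measure used in the cited literature.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """H(Y | X) in bits for (x, y) observations, each weighted equally."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    total = len(pairs)
    h = 0.0
    for counts in by_x.values():
        n_x = sum(counts.values())
        # Entropy of Y within this value of X, weighted by P(X = x).
        h_x = -sum((c / n_x) * log2(c / n_x) for c in counts.values())
        h += (n_x / total) * h_x
    return h


# Latin (Table 3): nominative singular pattern -> genitive singular pattern.
latin = [("-a", "-ae"), ("-us", "-\u012b"), ("-us", "-\u016bs")]

# Finnish (Table 4): nominative singular pattern -> genitive singular,
# split into the stem alternation and the final suffix.
finnish_stem = [("-Vs", "-Vks-"), ("-Vs", "-VV-")]
finnish_suffix = [("-Vs", "-n"), ("-Vs", "-n")]

if __name__ == "__main__":
    print("Latin gen. sg. given nom. sg.:", conditional_entropy(latin))
    print("Finnish stem given nom. sg.:", conditional_entropy(finnish_stem))
    print("Finnish suffix given nom. sg.:", conditional_entropy(finnish_suffix))
```

On this toy sample the Finnish stem alternation remains uncertain (one bit), while the suffix carries no uncertainty at all, which is the sub-word area of certainty described above; the Latin genitive singular retains about two thirds of a bit of uncertainty with no comparable certain sub-part.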

The observations offered here accord with language-learning data. Niemi and Niemi (1987) and Laalo (2009), for instance, observe that Finnish children recognise early the direct mapping of the suffix -n and genitive singular functions; they then utilise this knowledge in the deduction of previously unencountered forms. In Latin, exemplary paradigms and principal parts have long been used to capture the inflexional variation exhibited by lexemes. The implicational relations that exist between the nominative singular and genitive singular forms of a noun, for instance, are sufficient to enable L2 learners to ‘match’ novel items to the correct inflexion class.

I suggest that understanding the way in which morphological structures are recognised and exploited by language-users may help to explain (in conjunction with, e.g., phonological or analogical developments) whether morphological case distinctions are likely to be lost or maintained. In Latin, the implicational relations, although informative, are not always maximally-predictive, and became opacified through time following regular phonological developments (such as those given in Table 1). As a result of such phonetic erosion, the area of informativeness in the Latin case system has shifted from the area of suffixal variation, distinct across declensions, towards the certainty associated with the invariant form of the lexeme. By contrast, the maximally-predictable sub-word elements in Finnish may be rote-learned, which provides them with diachronic stability. These units, in addition to the less informative abstract relations, offer language-users on average more information in language use than is available to a learner of Latin in the production of novel inflected forms. Consideration of the morphological structures found in a given language and the ways in which they are recognised and exploited in language use may therefore offer some additional insight into why the robust Latin case system is not found in Romance.


Blevins, J.P., 2006. ‘Word-based Morphology’. In Journal of Linguistics 42:3. 531-573.

Blevins, J.P., 2016. Word and Paradigm Morphology. Oxford: Oxford University Press.

Blevins, J.P., P. Milin, and M. Ramscar. 2017. ‘The Zipfian Paradigm Cell Filling Problem’. In F. Kiefer, J.P. Blevins, and H. Bartos (eds.). Perspectives on Morphological Structure: Data and Analyses. Leiden: Brill. 139-158.

Herman, J., 2000. Vulgar Latin. Pennsylvania: Pennsylvania State University Press.

Laalo, K., 2009. ‘Acquisition of Case and Plural in Finnish’. In U. Stephany and M. Voeikova (eds.). Development of Nominal Inflection in First Language Acquisition: a Cross-Linguistic Perspective. Berlin: Mouton de Gruyter. 49-90.

Milin, P., V. Kuperman, A. Kostić and H.R. Baayen, 2009. ‘Words and paradigms bit by bit: An information-theoretic approach to the processing of inflection and derivation’. In J.P. Blevins and J. Blevins (eds.). Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press. 214-252.

Niemi, J. and S. Niemi, 1987. ‘Acquisition of inflectional marking: A case study of Finnish’ in Nordic Journal of Linguistics 10:1. 59-89.

[1] The terms ‘abstractive’ and ‘constructive’ are from Blevins (2006).

[2] This discussion may be recast in terms of the information-theoretic notion of ‘entropy’. See, e.g., Milin et al. (2009) and Blevins (2016:171-196).

Functions of Vowel Length in Language: Phonological, Grammatical, & Pragmatic Consequences

by Larry Hyman (University of California, Berkeley)

In this talk my starting point is to frame the different functions of vowel length (lexical, morphological, syntactic, pragmatic) in terms of how they compare with other phonological properties, in particular tone, which has been claimed to be able to do things that “nobody” else can do (Hyman 2011). Rather than providing a cross-linguistic typology, I focus on the different functions of vowel length in Bantu—as well as how these functions have changed. Although Proto-Bantu had a vowel length contrast on roots which survives in many daughter languages today, many other Bantu languages have modified the inherited system. In this talk I distinguish between four types of Bantu languages:

  1. Those which maintain the free occurrence of the vowel length contrast inherited from the proto language;
  2. Those which maintain the contrast, but have added restrictions which shorten long vowels in pre-(ante-)penultimate word position and/or on head nouns and verbs that are not final in their XP;
  3. Those which have lost the contrast with or without creating new long vowels (e.g. from the loss of an intervocalic consonant flanked by identical vowels);
  4. Those which have lost the contrast but have added phrase-level penultimate lengthening.

I will propose that the positional restrictions fed into the ultimate loss of the contrast in types (3) and (4), with a concomitant shift from root prominence (at the word level) to penultimate prominence (at the intonational and phrase level). In the course of covering the above typology and historical developments in Bantu, I will show that there are some rather interesting Bantu vowel length systems that may or may not be duplicated elsewhere in the world and that vowel length is probably second only to tone in what it can do.

This paper was read at the Philological Society meeting at SOAS, University of London, Djam Lecture Theatre (DLT, Main SOAS Building), on Friday, 15 February, 4.15pm.

Bashkardi – a language by convergence?

by Agnes Korn (CNRS, Paris)

Bashkardi, spoken in Southern Iran inland from the Strait of Hormuz, is a very little known language. The dialects differ on all levels of grammar and show strong influence of Persian. This talk will present some salient features of the phonology and morphology of Bashkardi and compare them to other Iranian languages to shed light on the development of the grammatical structures. I will examine the hypothesis that Bashkardi is not a genetic entity, but a group of Iranian dialects of diverse origin which developed common traits by a process of convergence, having found themselves next to each other in a small region that remains remote even today.

This paper will be read at the Philological Society meeting in London, SOAS, Brunei Gallery building, first floor, room B104, on Friday, 11 May, 4.15pm.

The Morphological-to-Analytic Causative Continuum in Hausa: New Insights and Analyses in a Typological Perspective

by Philip J. Jaggar (School of Oriental and African Studies, University of London)

Over the last few decades, linguists have devoted considerable attention to both homogeneity and variation in the expression of causal events across languages. However, most studies, whether typological or language-specific, have focused on the category of morphologically overt (e.g., ‘lie/lay X down’) causatives, to the relative neglect of complex periphrastic (e.g., ‘get X to lie down’) formations.

The present study addresses this imbalance by elucidating a wide spectrum of causative expressions in Hausa (Chadic/Afroasiatic), supported by a strong cross-linguistic perspective. In line with contemporary approaches located within a general typology of causation, the analysis invokes the widely-accepted dichotomy between direct and indirect causative constructions. Direct causation associates with morphological causatives, indirect causation with periphrastic expressions—compare morphological ‘I lay X down’ (direct, with no intermediary) with periphrastic ‘I got X to lie down’ (indirect, where X also functions as an intervening actor/cause).

Hausa uses an indirect periphrastic causative usually formed with sâa ‘cause’ (lit. ‘put’) as the higher causal verb, e.g., nâs taa sâa yaaròn yaa kwântaa ‘the nurse got the boy to lie down’ (= intransitive kwântaa ‘lie down’). Direct morphological causatives, in contrast, associate with a specific derivational formation, known as “Grade 5” (Parsons 1960/61), e.g., nâs zaa tà kwantar̃ dà yaaròn ‘the nurse will lay the boy down’.

The monograph systematically explores, for the first time in an African language to our knowledge, the key design-features that distinguish the two mechanisms, in addition to demonstrating that Hausa periphrastic causatives can also differ from each other, e.g., in implicational strength, depending on the modal (TAM) properties of the lower clause. In so doing, it provides a rare account of how the two types are used to describe pragmatically different causal events and participant roles.

Jaggar, Philip J. (2017) The Morphological-to-Analytic Causative Continuum in Hausa: New Insights and Analyses in a Typological Perspective. (Abhandlungen für die Kunde des Morgenlandes, Band 109). Wiesbaden: Harrassowitz.
Available from 1 June 2017.

TPS 115(2) – Abstract 1

Welsh svarabhakti as stem allomorphy

by Pavel Iosad (University of Edinburgh)

In this paper I propose an analysis of the repairs of sonority sequencing violations in South Welsh in terms of a non-phonological process of stem allomorphy. As documented by Hannahs (2009), modern Welsh uses a variety of strategies to avoid word-final rising-sonority consonant clusters, depending in part on the number of syllables in the word. In particular, while some lexical items epenthesise a copy of the rightmost underlying vowel in the word, others delete one of the consonants in the cluster. In this paper, I argue that at least the deletion is not a live phonological process, and suggest viewing it as an instance of stem allomorphy in a stratal OT framework (Bermúdez-Otero 2013). This accounts for the lexical specificity of the pattern, which has been understated in the literature, and for the fact that cyclic misapplication of deletion and diachronic change are constrained by part-of-speech boundaries.

DOI: 10.1111/1467-968X.12085

TPS 115(1) – Abstract 2

Verbal triplication morphology in Stau རྟའུ། (Mazi dialect)

by Jesse P. Gates (Southwest University for Nationalities)

This paper presents the first documentation and analysis of a typologically remarkable process of verbal triplication in the Stau language (Sino-Tibetan). Moreover, Stau’s triplication of verbs to index multiple agents (S/A) and to pragmatically highlight those agents, as is demonstrated in this study, is a morphological process that has not been documented among any of the world’s spoken languages to date. Stau’s verbal triplication, although unique in many regards, falls into a broader typological linguistic pattern of iconicity, demonstrating that there is often a strong tie between form and function.

DOI: 10.1111/1467-968X.12083

TPS 115(1) – Abstract 1

Words and Paradigms: Peter H. Matthews and the Development of Morphological Theory

by Stephen Anderson (Yale University)

The tension between morpheme-based views of word structure, on which words are exhaustively divided into atomic units linking form and content, and word and paradigm views, on which words are analyzed in terms of their relations to others, goes back at least to the beginning of the twentieth century. The history of this opposition within modern linguistics is described, and the specific role of Peter H. Matthews in promoting the superiority of a non-morphemic approach to morphology is highlighted. Arguments for such an approach are briefly reviewed, with discussion of the response to these on the part of the broader field of linguistics.

DOI: 10.1111/1467-968X.12090

Varro’s ‘De lingua Latina’ (‘On the Latin language’)

by Wolfgang D. C. de Melo (University of Oxford)

I must begin this blog post with a little confession. As an undergraduate and to a large extent still as a graduate, I found it hard to get excited about the history of linguistics. Of course I respected the great achievements of the Neogrammarians and of early phoneticians like Henry Sweet or Daniel Jones; but I was more interested in the results of their work than in how they got there. Any linguistic work written before the nineteenth century left me cold. Like any other classics undergraduate, I read through various grammarians. I liked the fact that they preserved so many quotations from early literature that had otherwise been lost. But beyond that I could not see anything of value in them. To me, Nonius was an encyclopaedia of errors; Isidore made me shudder; and, as Eduard Norden, the great authority on Latin style, told us, Varro had the worst prose style of any Latin writer before the Middle Ages.

In view of all this, it came as a bit of a shock to me when I was asked by OUP whether I would be willing to edit Varro’s De lingua Latina, our earliest extant treatise on Latin grammar. I had to think long and hard about it before I said yes. One thing that I consider vital for a text like this is a translation and a commentary. They are necessary because the text is both fragmentary and technical. I have now been working on Varro for a few years, and during this time I have come to respect, admire, and even like him.

Marcus Terentius Varro (116–27 BC) was born in Reate, modern Rieti. He was politically active and had his own farm, and yet, despite all this, he managed to write several hundred books on philosophy, history, agriculture, and language. An ancient book corresponds to a modern book chapter in length, but even so this output is astounding. Of course, quantity is not the same as quality, and there are indications that Varro often wrote in haste and that his work would have been better had he taken more time. However, on the whole he is an original and thoughtful writer with many valid and interesting insights.

Originally, the De lingua Latina comprised twenty-five books. An introductory volume was followed by six books on etymology, six on morphology, and twelve on syntax. Sadly, we only have fragments of the books on syntax. What we do have in almost complete form is books 5-10, that is, the second half of the etymological part and the first half of the morphological part.

Of the etymological books, the first three covered the theory of etymology. The three books that we still have deal with the practical side. Book 5 gives us hundreds of etymologies of places and things; book 6 deals with the etymologies of times and actions; and book 7 discusses all these concepts in poetry.

Varro did not know that sound change is regular, and of course he had never heard of the comparative method. It comes as no surprise that many of his etymologies are, by modern standards, ‘wrong’. But wrong does not equal stupid. His method is surprisingly sound. He identified loan words, and did so by and large correctly. Among native words, he looked for words that are similar in sound and meaning. This approach enabled him to find many etymological connections that we can confirm today with the help of the comparative method.

Perhaps a few examples will show more clearly how Varro’s mind works.
