Old Norwegian vowel harmony and the value of quantitative data for descriptive linguistics

by Tam Blaxter (University of Cambridge)

Quantitative methods in historical linguistics are most often used to answer ‘variationist’ questions. We assume that we know what the possible forms of a language were, but ask questions about their distribution: when was one form replaced by another? Who used which forms? Were some more common in particular linguistic contexts, genres or text types? For this reason, quantitative methods might seem unappealing to historical linguists primarily interested in describing a historical variety (its grammar and lexicon) or in describing etymologies. From time to time, however, quantitative data can shed light on these more basic descriptive questions.

[Figure: An excerpt from the Old Norwegian Homily Book]

Old Norwegian, unlike its better-studied West Nordic sister Old Icelandic, exhibited height harmony of unstressed non-low vowels. Readers familiar with Old Icelandic texts will expect to see three distinct vowels in unstressed syllables: /a i u/ written <a i u>. In Old Norwegian texts we find an additional two graphemes, <e o>, in complementary distribution with <i u>. These vowels agree with the vowel of the stressed syllable for height: <i u> appear in unstressed syllables whenever the stressed vowel was high and <e o> whenever it was non-high. There are two exceptions to this rule. When the stressed syllable contained the vowel normalised ǫ, the u-umlaut product of *a, we find unstressed syllables with <u> and either <e> or <i>. When the stressed syllable contained the i-umlaut product of *a (usually normalised e but sometimes written ę to distinguish it from /e/ < Proto-Germanic *e), we find unstressed syllables with <i> and either <u> or <o>.
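The rule just described can be written down as a small lookup, which puts the two exceptions side by side with the regular cases. This is only an illustrative sketch: the class labels, the `EXPECTED` table and the `is_harmonic` function are names invented here, and unstressed /a/, which does not take part in the harmony, is left out.

```python
# Expected spellings of the unstressed non-low vowels, keyed by the class of
# the stressed vowel. The class labels are hypothetical names for this sketch.
EXPECTED = {
    "high":     {"front": {"i"},      "back": {"u"}},
    "non-high": {"front": {"e"},      "back": {"o"}},
    "ǫ":        {"front": {"e", "i"}, "back": {"u"}},       # u-umlaut of *a
    "ę":        {"front": {"i"},      "back": {"u", "o"}},  # i-umlaut of *a
}

FRONT = {"i", "e"}
BACK = {"u", "o"}


def is_harmonic(stressed_class: str, unstressed: str) -> bool:
    """Return True if the unstressed grapheme fits the rule stated above."""
    if unstressed in FRONT:
        slot = "front"
    elif unstressed in BACK:
        slot = "back"
    else:
        raise ValueError(f"not a harmonising grapheme: {unstressed!r}")
    return unstressed in EXPECTED[stressed_class][slot]


print(is_harmonic("high", "i"))      # True
print(is_harmonic("non-high", "i"))  # False
print(is_harmonic("ę", "o"))         # True
```

Keeping the front and back slots separate matters for the two umlaut classes, where only one of the two unstressed vowels is restricted.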

In theory, then, we could use the vowel harmony to distinguish between the stressed phonemes /e/ and /ę/ which were not (consistently) distinguished in the orthography: the former should have harmony vowels <e o> while the latter should have <i o/u>. However, Old Norwegian vowel harmony is a slippery creature. Few texts exhibit it totally consistently, making it difficult to sort out what is orthographic and what phonological variation. If we take a qualitative approach in which we read individual texts and describe their orthographies, we can’t confidently interpret deviations from vowel harmony as meaningful. If, on the other hand, we take a quantitative approach which includes data from many different texts, interesting patterns may become clear.

The Medieval Nordic Text Archive (MeNoTA) contains 21 digitised Old Norwegian texts from manuscripts ranging from the first half of the thirteenth century (such as AM 619 4to, dated 1200-1225) to the fourteenth century (such as AM 78 4to); these are long texts (narratives, religious and law texts) comprising a total of 798,828 words. The Diplomatarium Norvegicum (DN) contains over ten thousand digitised Old and Middle Norwegian texts, primarily ranging from the thirteenth to the mid-sixteenth century; these are mostly legal charters, and are all short texts. Restricting ourselves to original texts dated before 1350, we find 1044 texts and a total of 243,153 words.

So what kinds of vowel harmony patterns do we find in these texts? First, let’s consider some words whose stressed vowel is in no doubt.

                                                                     proportion high orthographies
word (normalised)       stressed vowel  expected unstressed vowel    MeNoTA      DN
hafi, hafir, hafit      a               non-high                     18.77%      3.75%
dómi, dómsins           ó               non-high                     14.45%      5.66%
[missing]               æ               non-high                     8.82%       11.11%
[missing]               ei              high                         98.48%      91.88%
ríki, ríkis, ríkisins   í               high                         91.80%      99.33%

Here we see roughly what we expected from the standard accounts of vowel harmony. There is a big difference in the rate of high vowel orthographies (<i>, <í>, etc.) for the unstressed vowel depending on the stressed vowel: after the non-high stressed vowels a, ó and æ the unstressed vowel is not usually spelled with a high vowel, with rates ranging from 3.75 to 18.77%; after the high stressed vowels ei and í the unstressed vowel is usually spelled high, with rates ranging from 91.80 to 99.33%. As expected, the vowel harmony isn’t totally consistent—but it is a very strong pattern.
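For readers who want to see how a figure of this kind is arrived at, here is a minimal sketch of the calculation: count the tokens whose unstressed vowel is spelled high and divide by the total. The token list is invented toy data, not drawn from MeNoTA or the DN, and the function name and the exact set of "high" spellings are assumptions (the text names only <i> and <í> explicitly).

```python
# Spellings counted as "high". The text names <i> and <í> and adds "etc.";
# including <u> and <ú> here is an assumption of this sketch.
HIGH_SPELLINGS = {"i", "í", "u", "ú"}


def proportion_high(unstressed_spellings):
    """Share of tokens whose unstressed vowel is spelled with a high vowel."""
    tokens = list(unstressed_spellings)
    if not tokens:
        raise ValueError("no tokens to count")
    high = sum(1 for s in tokens if s in HIGH_SPELLINGS)
    return high / len(tokens)


# Invented toy data: the unstressed vowel in eight tokens of one word.
toy_tokens = ["e", "e", "i", "e", "e", "e", "i", "e"]
print(f"{proportion_high(toy_tokens):.2%}")  # 25.00%
```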

So how about after the problem vowels e and ę, which aren’t distinguished in the orthography? Most of these behave as we’d expect on the basis of their etymologies. Take þessir ‘this (m.nom.pl.)’ and þessi ‘this (f.nom.sg. / f.dat.sg. / n.nom.acc.pl.)’. As this stem is from Proto-Germanic *þess– (cf. Gothic þis) we expect the stressed vowel to be non-high e and so the unstressed vowel to be written <e>. And indeed, this is basically what we find: in the MeNoTA texts, 73.92% of tokens are spelled with <e>, very close to the rates we found after the other non-high vowels; in the DN the rate is a little higher. Conversely, we find that in words with stressed ę (i.e. the i-umlaut product of *a) the unstressed vowel is spelled with <i> in a majority of tokens: hęfi, hęfir ‘have’ and sęgi, sęgir ‘say’ (cf. forms of the verbs without umlaut such as the preterites hafði, sagði); ęptir ‘after’ (cf. the related form aptr ‘back’).

                                                                     proportion high orthographies
word (normalised)       stressed vowel  expected unstressed vowel    MeNoTA      DN
þessir, þessi           e               non-high                     26.08%      20.18%
hęfi, hęfir             ę               high                         93.48%      70.37%
[missing]               [missing]       high                         93.21%      92.29%
sęgi, sęgir             ę               high                         86.24%      98.77%
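The reasoning in this section amounts to a majority rule on the unstressed front vowel: mostly <e> points to stressed e, mostly <i> to stressed ę. The sketch below applies that rule to the first percentage reported for three of the words in the table above. The function name and the 50% cut-off are assumptions made purely for illustration, not a threshold proposed in this post.

```python
def likely_stressed_vowel(share_high: float, cutoff: float = 0.5) -> str:
    """Guess e vs ę from the share of high spellings of unstressed -i.

    The 0.5 cutoff is an illustrative assumption, not an established value.
    """
    if not 0.0 <= share_high <= 1.0:
        raise ValueError("share_high must be between 0 and 1")
    return "ę" if share_high > cutoff else "e"


# First percentage reported for each word in the table above.
reported = {
    "þessir, þessi": 0.2608,
    "hęfi, hęfir": 0.9348,
    "sęgi, sęgir": 0.8624,
}
for word, share in reported.items():
    print(word, "->", likely_stressed_vowel(share))
```

A rule this crude is only meaningful over tokens pooled from many texts, since few individual texts apply the harmony consistently.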

But we do find some words that don’t behave as we’d expect. The etymology of hęndi ‘hand (dat.sg.)’ is not in any doubt: the vowel of the stressed syllable must be the i-umlaut product of *a, as demonstrated by other forms of the noun (e.g. gen.sg. handar) and cognates (e.g. Gothic handus). Yet in a (small) majority of tokens, the unstressed vowel is spelled non-high <e>. Here, we might point to the fact that hęndi underwent grammaticalisation to give the Modern Norwegian adverb hende ‘in hand’ and already in Old Norwegian occurred in a long list of idiomatic phrases: perhaps an early element of this grammaticalisation process was an irregular change of the stressed vowel ę > e?

The negative pronoun engi ‘none, no’ and its neuter form ekki ‘nothing; not’ are descended by irregular sound change from einn+gi and eitt+gi. As the stressed vowel was originally /ei/ we might assume that it had become /e/, but in that case we would expect the unstressed vowel usually to be written as non-high <e>: instead, we find it is usually spelled high. Looking at the dative forms of this word with unstressed –u we find an equivocal picture, with close to 50% of tokens spelled with a high vowel. A consistently high <i> alongside variable <u>/<o> is the pattern described above for stressed ę, so all this strongly suggests that engi and ekki were really ęngi and ękki.

The verb gera is also an interesting case. It has a variety of forms in the different Old Nordic languages (ger(v)a, gjǫr(v)a, gør(v)a), of which gera is by far the most common in Old Norwegian. Its different forms and its cognates (OE gearwian/gierwian, OS gerwian/garuwian, OHG garawen) make it clear that the –e– forms must be the result of the i-umlaut of *a (the Íslensk orðsifjabók reconstructs *garw(i)a- for PG). Yet it turns out that its unstressed vowel is usually spelled with non-high <e>. Here, we might wonder whether the –ǫ– or –ø– forms (for which the expected harmony vowel would be non-high) were influencing the spelling or pronunciation. More interestingly, perhaps the –ǫ– or –ø– forms were more common in Old Norwegian speech than we thought, but gera was the standard form in writing and obscured this.

                                                                     proportion high orthographies
word (normalised)       stressed vowel  expected unstressed vowel    MeNoTA      DN
[missing]               e               non-high                     95.69%      64.71%
[missing]               e               non-high                     88.77%      95.20%
engu, engum             e               non-high                     56.37%      41.18%
[missing]               [missing]       high                         42.46%      30.00%
geri, gerir             [missing]       non-high                     28.98%      35.27%

This is just a small sample of the fascinating patterns to be found in the vowel harmony data for Old Norwegian: my future work will illuminate more about this phenomenon and the sound changes by which it later broke down in Middle Norwegian. In the meantime, I hope I’ve convinced you that quantitative methods can sometimes offer us valuable evidence on descriptive questions in historical linguistics.

One thought on “Old Norwegian vowel harmony and the value of quantitative data for descriptive linguistics”

  1. As the stressed vowel was originally /ei/ we might assume that it had become /e/

    I wouldn’t assume that if ę was [e] (while e was of course [ɛ]). And indeed, i-umlaut of *a in the absence of blocking factors produced [e] in Old High German, and still in extant Upper German dialects in many cases.

    A change from unstressed [e] to [ɛ] would explain hende, too…


