The Origin of /ɬ/ in Southern Pinghua

by Xiaolan Cao (University of Melbourne)

In this post, I will discuss the origin of the voiceless lateral fricative /ɬ/ in Southern Pinghua, one of the two branches of Pinghua and a minority Sinitic language. Southern Pinghua is mostly spoken in Southern Guangxi in China (Qin 2000) by approximately 1.8 million native speakers (Min 2013). However, some of the dialects have experienced huge trans-generational language loss and are hence potentially endangered (Cao 2019). Most Southern Pinghua speakers identify as ethnic Han, the majority ethnic group in China, while most of the rest identify as ethnic Zhuang.

In Southern Pinghua, the voiceless lateral fricative /ɬ/ is a consonant phoneme occurring in the onset position of a syllable. The phonemicity of /ɬ/ can be established by the minimal pair in Table 1 below.[1]

Word Gloss
/ɬa33/ ‘spread’
/sa33/ ‘sprinkle’

Table 1: a minimal pair of voiceless lateral fricative /ɬ/
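
The minimal-pair test behind Table 1 can be made explicit with a small sketch. The Python below is illustrative only: the function name `is_minimal_pair` is hypothetical, and it assumes that each segment and each tone digit is written as a single character, which holds for the two transcriptions in Table 1 but not for transcriptions in general.

```python
def is_minimal_pair(a: str, b: str) -> bool:
    """Return True if two transcriptions differ in exactly one position.

    Assumption (not from the post): every segment and tone digit is one
    character, so a position-by-position comparison is meaningful.
    """
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# The pair from Table 1: only the onset differs; the rhyme and tone match.
print(is_minimal_pair("ɬa33", "sa33"))
```

A fuller treatment would first segment the transcription into onset, rhyme, and tone before comparing.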

Commonly, /ɬ/ is not considered an internal development of Sinitic languages, primarily because it rarely occurs in present-day Sinitic languages. Within China, it is distributed in the former Baiyue area, once occupied by the ancestors of Tai-Kadai speakers (Li 2000). Besides Southern Pinghua dialects, Cantonese dialects located in Southern Guangxi and Western Guangdong also have the phoneme /ɬ/. Outside Guangxi and Guangdong, /ɬ/ can be found only in three small regions in China: in some dialects of Min in non-contiguous geographical pockets in Fujian Province, in some dialects of Hui in Anhui Province, and in some dialects of various Sinitic languages spoken on the west coast of Hainan Province (de Sousa 2015: 166-168, quoting Liu X 2006, Liu F 2007, Akitani 2008, and Meng 1981). Due to its limited distribution in present-day Sinitic languages, /ɬ/ is not reconstructed for Middle Chinese or Old Chinese in the literature; see Zhengzhang (2003), Li (1971), Baxter and Sagart (2014), and Wang (1985).

On the other hand, /ɬ/ is common in present-day dialects of Zhuang, a Tai-Kadai language mainly spoken in Guangxi (Zheng 1998). According to works by Mai (2009, 2011), Ouyang (1995), Yuan (1989), Zheng (1998), and Zhao (2015), the phoneme /ɬ/ in Sinitic languages may have developed under the influence of Zhuang loanwords through language contact. However, the opposing view has also been suggested in the Chinese-language literature: because the phoneme /ɬ/ in Zhuang corresponds to *s in Proto-Tai, it is likely that Zhuang developed this phoneme under the influence of Sinitic languages, rather than the reverse (de Sousa 2015, quoting Li F 1977 and Pittayaporn 2009).

The two views on the origin of /ɬ/ in Sinitic languages have some limitations. First, the argument that /ɬ/ is not an internal development of Sinitic languages, simply because of its limited distribution and its absence from reconstructions of Middle Chinese or Old Chinese, does not preclude the possibility that /ɬ/ developed in Southern Pinghua after the Middle Chinese period.

Further, the evidence does not indicate whether /ɬ/ is an internal development in Southern Pinghua or a phoneme that developed under the influence of loanwords from Zhuang through language contact. As for its distribution in Southern Pinghua, the phoneme /ɬ/ can be found in both the Sinitic stratum and the Zhuang stratum. According to a survey by Cao (2018), in the Sinitic stratum, Chinese characters (Chinese cognates) whose Southern Pinghua pronunciations contain onset /ɬ/ were mostly recorded as having the Middle Chinese onset denoted as 心 (*s) in Qieyun, a rhyming dictionary published in 601 CE during the Sui dynasty (581–618). This correspondence exists not only in common words, such as /ɬam52/ (‘three’) and /ɬɜm52/ (‘heart’), but also in literary words, like /ɬɜw52/ (‘constellation’) and /ɬoŋ52/ (‘lofty’).

The correspondence between /ɬ/ in Southern Pinghua and onset 心 (*s) in Middle Chinese suggests that /ɬ/ is of Sinitic origin. However, the same survey found ninety-one admissible syllables that start with /ɬ/ in total, of which twenty-six cannot be associated with Chinese characters (Chinese cognates). For Southern Pinghua syllables, being identifiable with a Chinese character normally is a strong indication of Sinitic origin. These twenty-six syllables are therefore possibly not of Sinitic origin but were introduced to the language through loanwords from other languages, such as Zhuang. The distribution of /ɬ/ in Southern Pinghua thus does not decide between /ɬ/ being an internal development and its being induced by language contact with Zhuang.

In addition to the distributional features of /ɬ/ in Southern Pinghua, the historical developments of /s/-phonemes in Southern Pinghua may also shed some light on the development of /ɬ/. In Southern Pinghua, Chinese characters pronounced with the onset /s/ mostly correspond to those recorded in Qieyun with the onsets 审 (*ɕ), 禅 (*ʑ), and 邪 (*z). Based on the fact that these three Middle Chinese onsets did not develop into /ɬ/, we may speculate that the Middle Chinese onset 心 (*s) has some features that make it prone to change to /ɬ/ under certain influences, such as loanwords from Zhuang.
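
The onset correspondences described above can be summarised as a small lookup table. This is only a sketch: the dictionary name and the helper function are hypothetical, and the mapping records the majority correspondences reported in this post, not exceptionless sound laws.

```python
from collections import defaultdict

# Majority correspondences reported in the post: Qieyun onset category
# (with its Middle Chinese value) -> Southern Pinghua onset.
MC_TO_SOUTHERN_PINGHUA = {
    "心 (*s)": "ɬ",
    "审 (*ɕ)": "s",
    "禅 (*ʑ)": "s",
    "邪 (*z)": "s",
}


def sources_by_reflex(mapping):
    """Group Middle Chinese onsets by their Southern Pinghua reflex."""
    grouped = defaultdict(list)
    for mc_onset, reflex in mapping.items():
        grouped[reflex].append(mc_onset)
    return dict(grouped)


for reflex, sources in sources_by_reflex(MC_TO_SOUTHERN_PINGHUA).items():
    print(f"/{reflex}/ <- {', '.join(sources)}")
```

Grouping by reflex makes the asymmetry visible: /s/ has three Middle Chinese sources, while /ɬ/ has only one.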

Finally, the geographical distribution of /ɬ/ is not as discontiguous as described in previous studies. The geographical distribution of /ɬ/ is contiguous in Southern Guangxi and Western Guangdong. These two adjacent regions together occupy approximately 184,000 square kilometres[2] of densely populated area, which is larger than Cambodia (181,035 square kilometres) or Nepal (147,181 square kilometres). Therefore, it may not be accurate to describe the territory of /ɬ/ in Southern Guangxi and Western Guangdong as small or isolated, and /ɬ/ can be considered an areal feature for further studies in historical linguistics, areal linguistics, and linguistic typology. Drawing on the analysis and evidence given in the discussion above, I would like to pose some questions for further investigation.

  1. Why is /ɬ/ so prevalent in Southern Pinghua and Cantonese dialects found in the area of Southern Guangxi and Western Guangdong, but not in the other areas?
  2. If language contact with Zhuang is a contributing factor to the development of /ɬ/, why does /ɬ/ occur in Southern Pinghua dialects but not most Northern Pinghua dialects, given both Pinghua branches have similar contact with Zhuang?
  3. Similarly, why do Cantonese dialects in Western Guangdong have /ɬ/ but not those in Eastern Guangdong, considering that Cantonese dialects have mostly had similar exposure to Zhuang historically?
  4. Can the peculiar distributions of /ɬ/ in Pinghua and Cantonese dialects be explained by a mere historical accident?

In sum, the two opposing views on the origin of /ɬ/ in Southern Pinghua are questionable because the evidence is inconclusive. At this stage, the origin of /ɬ/ in Southern Pinghua dialects remains unclear, and further investigations are still required.


The Preterite and Perfect in Middle English

by Morgan Macleod (University of Ulster)

The Proto-Germanic tense system, consisting only of a present and a preterite, was augmented in Old English by the addition of a periphrastic perfect. This perfect had already been grammaticalized to the point where it could be used even with intransitive verbs, e.g. þin folc hæfð gesyngod ‘your people have sinned’ (Mitchell 1985: I, 289). However, it was still possible to use the preterite to express similar temporal content, e.g. Ic heold nu nigon gear[…] þines fæder gestreon ‘I (have) now held your father’s property nine years’ (ÆLS I.21.42). For many Old English authors the preterite was in fact the preferred mode of expression; previous research on a sample of Old English texts found that the new periphrastic perfect was used only in 26% (95/360) of the cases where it would have been possible semantically (see Macleod 2014). However, little previous quantitative work exists on the subsequent development of the perfect and preterite towards the modern system, in which the two categories are paradigmatically opposed and can seldom be interchanged without altering the meaning of an utterance.

A preliminary investigation of the preterite and perfect in Middle English was performed using the Helsinki Corpus (Rissanen et al. 1996). Such a corpus, small in size yet selected for balanced content, was ideal for a form of analysis involving manual review of entire textual passages. The methodology was based on that of Macleod (2014): texts from the earliest Middle English period, 1150–1250, were analysed to identify all situations for which a present perfect would be an appropriate representation, and the relevant verbs were identified either as preterites or as perfects. This research revealed an abrupt transition between Old English and Middle English; in Middle English, not including texts that represent late copies of Old English works, the periphrastic perfect was used in 94% (258/274) of cases. It is possible that the earlier stages of this transition took place within OE, where they were obscured by the relatively homogeneous nature of the textual record. In addition, some ME authors seem to show awareness of a new opposition between the preterite and the perfect, e.g. Orm 197 Þe þridde god uss hafeþþ don / Þe Laferrd Crist onn erþe, / Þurrh þatt he ȝaff hiss aȝhenn lif ‘The Lord Christ has done us the third good on earth in that He gave His own life’. Here the same situation is described with a preterite to position it within a historical narrative and with a perfect to highlight its continuing relevance, showing a clearer contrast than seems to have existed in Old English.
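
The two proportions reported so far can be checked and compared directly. The sketch below is not part of the original study: the function names are hypothetical, and the Pearson chi-square statistic is added here only as one conventional way of gauging how far the two periods differ.

```python
def share(perfects: int, total: int) -> float:
    """Proportion of eligible contexts expressed with the periphrastic perfect."""
    return perfects / total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts from the text: perfects out of all contexts where a perfect was possible.
old_english = (95, 360)
middle_english = (258, 274)

for label, (perfects, total) in [("Old English", old_english),
                                 ("Middle English", middle_english)]:
    print(f"{label}: {share(perfects, total):.1%} perfect")

# Rows are periods; columns are perfect vs. preterite.
statistic = chi_square_2x2(95, 360 - 95, 258, 274 - 258)
print(f"chi-square (1 df) = {statistic:.1f}")
```

The statistic treats each verb token as independent, which is a simplification for corpus data drawn from a limited number of texts.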

Although the majority of Middle English examples seem to conform to the modern pattern, a small number of exceptions remain, a fact noted by previous authors such as Mustanoja (1960) and Fischer (1992). One factor involved in these exceptions may lie in the variation observed (e.g. Elsness 1997) among varieties of English in their tense preferences: constructions such as American English I already ate can be paralleled in Middle English examples such as Ich ne seh him neauer ‘I never saw Him’ (St Juliana 100.15), while examples such as mare wunder ilomp ‘greater wonders (have) happened’ (Ancrene Wisse 32.9) may show an even greater tolerance for the preterite than would be possible in present-day American English. This variation may best be interpreted as a difference not in the temporal meaning of the forms involved, but in the pragmatic presuppositions created by their use, in keeping with the approach of Portner (2003).

Some Middle English examples also involve the use of a past tense under a present-tense verb in a way that would be of marginal acceptability in Modern English. This can be seen in examples such as Brut I.384.7424, Ich þonkie mine Drihte[…] þet he swulche mildce; sent to moncunne ‘I thank my Lord that He sent such mercy to mankind’. Although much research on the sequence of tenses (e.g. Abusch 1997; Gennari 2003) has tended to focus on cases in which the matrix verb is in the past tense, it is known that sequence-of-tense phenomena are subject to cross-linguistic variation in their construction and interpretation. Examples such as the above may reflect an underlying difference between Middle English and Modern English in their sequence-of-tense rules.

This preliminary investigation has found a high degree of similarity between Middle English and Modern English in their use of the perfect even at a very early date, in sharp contrast to the patterns found in Old English texts. While the explanations proposed here may help to explain the small number of apparent counterexamples, more work is needed to substantiate these proposals. In particular, a larger data sample might provide further examples to clarify the factors influencing speakers’ choice between the perfect and the preterite, while a more general examination of the sequence of tenses found in Middle English would be essential to establish the details of the system obtaining at this period and the ways in which it might differ from Modern English. Further research in this area has the potential to illuminate many currently obscure details of the Middle English verbal system.


The Loss of the Latin Case System – A New Morphological Approach

by Zeprina-Jaz Ainsworth (University of Oxford)

Much work has already been done on the development of the Latin case system, which has been lost almost entirely from nouns and adjectives in Romance. Scholars such as Herman (2000) have outlined phonetic, analogical, functional, and syntactic changes which may have contributed to the opacification of certain morphological case forms. However, none of the previous analyses accounts for the near-total loss of the case category in Romance. For instance, regular phonological changes alone would not have caused the singular forms in the first declension to ‘fall together’ into a single, invariant shape:

Singular     Classical Latin   Sound Change                        Result
Accusative   MENSAM            Loss of final -m                    mensa
Ablative     MENSĀ             Loss of vowel length distinctions
Genitive     MENSAE            ae > [e]
Dative       MENSAE            ae > [e]

Table 1: Phonetic erosion in first declension singular case/number suffixes
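
To make the point concrete, the three sound changes in Table 1 can be applied mechanically to the four singular forms. This is a rough sketch under stated assumptions: the function `erode` is hypothetical, it works on spelling and not on phonetic representations, and it applies the changes in a fixed order that the post does not specify.

```python
import unicodedata


def erode(form: str) -> str:
    """Apply the three changes from Table 1 to a Classical Latin spelling."""
    # Normalise so that a vowel plus macron is a single character.
    f = unicodedata.normalize("NFC", form).lower()
    if f.endswith("m"):          # loss of final -m
        f = f[:-1]
    f = f.replace("ā", "a")      # loss of vowel length distinctions
    if f.endswith("ae"):         # ae > [e]
        f = f[:-2] + "e"
    return f


singular = {
    "accusative": "MENSAM",
    "ablative": "MENSĀ",
    "genitive": "MENSAE",
    "dative": "MENSAE",
}

outcomes = {case: erode(form) for case, form in singular.items()}
for case, result in outcomes.items():
    print(f"{case:<11}{result}")

# Count the distinct shapes that survive the three changes.
print(len(set(outcomes.values())))
```

Because more than one distinct shape survives, these changes on their own do not collapse the paradigm into a single invariant form.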

Moreover, cross-linguistic comparison indicates that, despite phonological, analogical, and functional developments, languages do not necessarily always lose their case systems. Finnish, for instance, retains the fifteen case values (for nouns and adjectives) reconstructed for proto-Finnic (although the abessive, comitative, instructive and prolative are now in restricted usage), and has even begun to develop new morphological suffixes:

Proto-Finnic: nominative, genitive, partitive, essive, translative, elative, inessive, illative, ablative, adessive, allative, abessive, comitative, instructive, prolative
Modern Finnish: nominative, genitive, partitive, essive, translative, elative, inessive, illative, ablative, adessive, allative, (abessive, comitative, instructive, prolative), comitative2, excessive

Table 2: Case values in proto-Finnic and modern Finnish

This study is concerned with answering the question: why do we find such different developments cross-linguistically?

One major difference between these two languages is that Latin is characterized predominantly by fusional morphology, whilst Finnish exhibits an abundance of agglutinative structure. By analysing these structures from a unit-agnostic ‘abstractive’ approach (as opposed to a ‘constructive’ perspective, in which forms are considered to be ‘built’ up of sub-word parts),[1] we may best understand how they behave in significantly different ways in diachrony.

In Latin for instance, the fully-inflected wordform and the relationship it bears to other forms in the paradigm provides the language-user with informative patterns which may be extended in the inflexion of other lexemes – there is no need to posit ‘underlying’ forms or identify sub-word morphs in order to ‘construct’ new forms. For instance, if the language-user knows a nominative singular form ending in -a, the lexeme must belong to the first declension. In the second and fourth declensions, however, even if both the nominative singular and accusative singular forms are known, there is residual ambiguity about the inflexion class to which the lexeme belongs:

Nom. sg.   PUELLA    1st declension   SERVUS   2nd/4th declension   GRADUS   2nd/4th declension
Acc. sg.   PUELLAM   1st declension   SERVUM   2nd/4th declension   GRADUM   2nd/4th declension
Gen. sg.   PUELLAE   1st declension   SERVĪ    2nd declension       GRADŪS   4th declension

Table 3: Implicational relations in a sub-set of Latin nouns
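
The implicational reasoning behind Table 3 can be sketched as a filter over candidate inflexion classes. The code is a toy model, not the analysis proposed in this post: the names are hypothetical, only the three classes and three cells shown in the table are included, and matching is done on spelled endings.

```python
# Endings for the three cells and three classes shown in Table 3 only.
PARADIGMS = {
    "1st": {"nom_sg": "a", "acc_sg": "am", "gen_sg": "ae"},
    "2nd": {"nom_sg": "us", "acc_sg": "um", "gen_sg": "ī"},
    "4th": {"nom_sg": "us", "acc_sg": "um", "gen_sg": "ūs"},
}


def candidate_classes(known_forms: dict) -> list:
    """Return the classes compatible with every known form of a lexeme."""
    return [
        cls
        for cls, endings in PARADIGMS.items()
        if all(form.endswith(endings[cell]) for cell, form in known_forms.items())
    ]


# A nominative in -a is already decisive.
print(candidate_classes({"nom_sg": "puella"}))

# Nominative and accusative together still leave two classes open.
print(candidate_classes({"nom_sg": "servus", "acc_sg": "servum"}))

# The genitive singular resolves the residual ambiguity.
print(candidate_classes({"nom_sg": "servus", "acc_sg": "servum", "gen_sg": "servī"}))
print(candidate_classes({"nom_sg": "gradus", "acc_sg": "gradum", "gen_sg": "gradūs"}))
```

Adding the remaining declensions and cells would enlarge the candidate sets, but the filtering logic would stay the same.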

In Finnish, implicative relations provide information about inflexion class, whilst the frequent isomorphic form~function mapping exhibited by inflexional suffixes provides absolute certainty in the expression of most case functions.

Nom. sg.    ajatus ‘thought’   -Vs ~ -Vks-/-Vs ~ -VV-   vieras ‘stranger’   -Vs ~ -Vks-/-Vs ~ -VV-
Part. sg.   ajatusta           -Vs ~ -Vks-/-Vs ~ -VV-   vierasta            -Vs ~ -Vks-/-Vs ~ -VV-
Gen. sg.    ajatuksen          -Vs ~ -Vks- + [n]        vieraan             -Vs ~ -VV- + [n]

Table 4: Implicational relations and sub-word units in a sub-set of Finnish nouns

Whilst multiple forms are required in Finnish to determine the declension class to which a lexeme with a nominative singular form in -s belongs, there is certainty in many cells as to the inflexional material that will follow the lexical stem.

The abstract patterns that exist in Latin are not maximally informative; that is, there is occasionally still uncertainty about the shape of an unknown form, even given knowledge of two forms in the language (consider Table 3). In Finnish, on the other hand, there is a sub-word area of absolute certainty in most of the cells in the inflexional paradigm. In addition to implicational relations, therefore, a Finnish speaker, even without sufficient information to deduce the inflexion class of a lexeme, may utilize maximally-predictable sub-word forms to produce a form (whether or not the ‘correct’ one) which may be interpreted correctly by a hearer.[2]
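
The contrast can be illustrated for the Finnish nouns in Table 4. The sketch below is a deliberate simplification: the function name is hypothetical, the stem-final e in the -Vks- pattern is taken from the attested form ajatuksen, and only the two alternation patterns from the table are modelled.

```python
def genitive_candidates(nominative: str) -> dict:
    """Build one genitive singular candidate per alternation pattern.

    Assumes a nominative singular ending in vowel + s, as in Table 4.
    """
    base, vowel = nominative[:-2], nominative[-2]
    return {
        "-Vs ~ -Vks-": base + vowel + "kse" + "n",
        "-Vs ~ -VV-": base + vowel + vowel + "n",
    }


for noun in ("ajatus", "vieras"):
    for pattern, form in genitive_candidates(noun).items():
        print(f"{noun}: {pattern} -> {form}")
```

Whichever pattern a speaker guesses, every candidate ends in -n, so the genitive function remains recoverable even when the stem is wrong.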

The observations offered here accord with language-learning data. Niemi and Niemi (1987) and Laalo (2009), for instance, observe that Finnish children recognise early the direct mapping of the suffix -n and genitive singular functions; they then utilise this knowledge in the deduction of previously unencountered forms. In Latin, exemplary paradigms and principal parts have long been used to capture the inflexional variation exhibited by lexemes. The implicational relations that exist between the nominative singular and genitive singular forms of a noun, for instance, are sufficient to enable L2 learners to ‘match’ novel items to the correct inflexion class.

I suggest that understanding the way in which morphological structures are recognised and exploited by language-users may help to explain (in conjunction with, e.g., phonological or analogical developments) whether morphological case distinctions are likely to be lost or maintained. In Latin, the implicational relations, although informative, are not always maximally predictive, and became opacified through time following regular phonological developments (such as those given in Table 1). As a result of such phonetic erosion, the area of informativeness in the Latin case system has shifted from the area of suffixal variation, distinct across declensions, towards the certainty associated with the invariant form of the lexeme. By contrast, the maximally-predictable sub-word elements in Finnish may be rote-learned, which provides them with diachronic stability. These units, in addition to the less informative abstract relations, offer language-users on average more information in language use than is available to a learner of Latin in the production of novel inflected forms. Consideration of the morphological structures found in a given language and the ways in which they are recognised and exploited in language use may therefore offer some additional insight into why the robust Latin case system is not found in Romance.


Functions of Vowel Length in Language: Phonological, Grammatical, & Pragmatic Consequences

by Larry Hyman (University of California, Berkeley)

In this talk my starting point is to frame the different functions of vowel length (lexical, morphological, syntactic, pragmatic) in terms of how they compare with other phonological properties, in particular tone, which has been claimed to be able to do things that “nobody” else can do (Hyman 2011). Rather than providing a cross-linguistic typology, I focus on the different functions of vowel length in Bantu—as well as how these functions have changed. Although Proto-Bantu had a vowel length contrast on roots which survives in many daughter languages today, many other Bantu languages have modified the inherited system. In this talk I distinguish between four types of Bantu languages:

  1. Those which maintain the free occurrence of the vowel length contrast inherited from the proto language;
  2. Those which maintain the contrast, but have added restrictions which shorten long vowels in pre-(ante-)penultimate word position and/or on head nouns and verbs that are not final in their XP;
  3. Those which have lost the contrast with or without creating new long vowels (e.g. from the loss of an intervocalic consonant flanked by identical vowels);
  4. Those which have lost the contrast but have added phrase-level penultimate lengthening.

I will propose that the positional restrictions fed into the ultimate loss of the contrast in types (3) and (4), with a concomitant shift from root prominence (at the word level) to penultimate prominence (at the intonational and phrase level). In the course of covering the above typology and historical developments in Bantu, I will show that there are some rather interesting Bantu vowel length systems that may or may not be duplicated elsewhere in the world and that vowel length is probably second only to tone in what it can do.

This paper will be read at the Philological Society meeting at SOAS, University of London, Djam Lecture Theatre (DLT, Main SOAS Building), on Friday, 15 February, 4.15pm.

A spoken corpus of Cameroon Pidgin English: Compilation, applications and next steps

by Melanie Green (Sussex) & Gabriel Ozón (Sheffield)

Cameroon Pidgin English (CPE) is an expanded pidgin/creole spoken in some form by an estimated 50% of Cameroon’s 22,000,000 population (Simons & Fennig 2017). CPE is spoken primarily in the Anglophone west regions, but also in urban centres throughout Cameroon. As a predominantly spoken language, CPE has no standardised orthography, but enjoys a vigorous oral tradition, not least through its presence in the broadcast media. The language has stigmatised status in the face of French and English, prestige languages of Cameroon, where it also co-exists with an estimated 280 indigenous languages (Simons & Fennig 2017).

We describe the spoken corpus of CPE, a British Academy/Leverhulme-funded pilot study (Green et al. 2016, Ozón et al. 2017). The corpus consists of 30 hours of recordings made in five locations, resulting in a total of 240,000 words (80 texts of 15 minutes/3,000 words). Proportions of text types are guided by the International Corpus of English project (Nelson 1996), and the texts contain mark-up and part-of-speech-tagging. The corpus files, which are freely available from the Oxford Text Archive, include sound files (*.mp3 and *.wav), raw and annotated text files, participant metadata, a field manual, a tagging manual and a spelling list.

We then briefly describe some case studies of linguistic phenomena that the pilot corpus allows us to investigate, focusing on grammatical and lexical phenomena, as well as codeswitching, demonstrating that while a small corpus provides a robust test-bed for the investigation of grammatical phenomena, a larger dataset is required for the full investigation of lexical and sociolinguistic phenomena. Finally, we outline our plans for a 1-million-word corpus, a project for which a funding application is in preparation.

This paper was read at the Philological Society meeting at SOAS, University of London, on Friday, 18 January 2019, 4.15pm.

