The moment of truth: Testing the Matrix Language Frame model in English–Vietnamese bilingual speech

by Li Nguyen (University of Cambridge)

Over the last few decades, there has been burgeoning interest in the study of code-switching in the research of bilingualism. Despite various definitions of what the phenomenon might entail, it is generally agreed in the literature that code-switching broadly refers to bilinguals’ ability to effortlessly alternate between two different languages in their daily speech (Bullock and Toribio 2008:1). This ability enables speakers’ behaviour of language mixing, which, as researchers have come to realise, is far from random but rather governed by specific structural constraints (Poplack 1980; Bullock & Toribio 2009). The nature of such constraints has inspired the search for a ‘universal pattern’, resulting in new investigations involving a number of language pairs, such as English–Spanish (Poplack 1980; Travis & Torres Cacoullos 2013; Aaron 2015), English–Welsh (Stammers & Deuchar 2012), Ukrainian–English (Budzhak-Jones & Poplack 1997), Igbo–English (Eze 1997), or Acadian French–English (Turpin 1998).

One of the most influential theoretical accounts in the code-switching literature is Myers-Scotton’s (2002) Matrix Language Frame model (MLF), which assumes an asymmetrical relationship between the two languages in bilingual discourse. The MLF starts from the observation that ‘speakers and hearers generally agree on which language the mixed sentence is “coming from”’ (Joshi 1985:190–191), and it is this language that constitutes the ‘matrix language’ (ML) of the conversation. In a code-switched clause, the MLF predicts that the ML (i) supplies closed-class system morphemes such as finite verbs or function words, and (ii) determines word order. Although the need for and the practicality of identifying an ML in some language pairs are debatable (Sankoff & Poplack 1981; Clyne 1987), the asymmetrical relationship between the two languages involved is borne out in many existing datasets. Most often, the asymmetry is more obvious in pairs that are structurally different, with existing evidence heavily involving an Indo-European language and an Asian or African language (see Chan 2009:184 for an exhaustive list). The question is then: does the MLF actually generate accurate predictions in spontaneous speech?

In this project, I am testing the applicability of the MLF in English–Vietnamese code-switching data. This pair provides an interesting testing platform, since the two languages share a similar surface word order (SVO) despite other typological differences. In other words, at the clausal level, the word-order morpheme principle cannot be used to determine the Matrix Language. The focus of the study thus lies on the so-called ‘conflict sites’, points at which the word order of the participating languages differs. These conflicts involve the order of head and modifier within NPs and Possessive Phrases. Specifically, modifiers and possessors precede head nouns in English, but follow head nouns in Vietnamese. When bilingual speakers are presented with such a conflict, the MLF predicts that the matrix language (i.e. the language of the finite verbs or function words) should determine the word order. Furthermore, as an isolating language, Vietnamese has virtually no overt morphology. This adds an extra layer of complexity to determining the Matrix Language at the clausal level, which is traditionally assigned on the basis of the language of the finite verb, and thereby tests the MLF predictions when these two languages come into contact.
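The prediction at a conflict site can be sketched as a simple check. This is only an illustration of the reasoning above, not the coding scheme used in the project: the `MixedNP` structure, the language labels and the two example phrases are all hypothetical, and the phrases are constructed for the sketch rather than taken from the corpus.

```python
from dataclasses import dataclass

# Hypothetical language labels used only in this sketch.
EN, VI = "en", "vi"

# Whether modifiers/possessors precede the head noun, as stated in the text:
# they precede the head in English and follow it in Vietnamese.
MODIFIER_FIRST = {EN: True, VI: False}


@dataclass
class MixedNP:
    """A noun phrase from a code-switched clause (illustrative structure)."""
    finite_verb_lang: str  # language of the clause's finite verb / function words
    tokens: list           # (word, role) pairs; role is "head" or "mod"


def observed_modifier_first(phrase: MixedNP) -> bool:
    """True if the modifier comes before the head noun in the phrase."""
    roles = [role for _, role in phrase.tokens]
    return roles.index("mod") < roles.index("head")


def consistent_with_mlf(phrase: MixedNP) -> bool:
    """True if the observed order matches the order of the assumed matrix language."""
    matrix_lang = phrase.finite_verb_lang
    return observed_modifier_first(phrase) == MODIFIER_FIRST[matrix_lang]


# Constructed examples (not corpus data), both in clauses with a Vietnamese finite verb.
head_first = MixedNP(VI, [("car", "head"), ("mới", "mod")])      # head, then modifier
modifier_first = MixedNP(VI, [("new", "mod"), ("xe", "head")])   # modifier, then head

print(consistent_with_mlf(head_first))      # order follows Vietnamese
print(consistent_with_mlf(modifier_first))  # order follows English
```

Under these assumptions, a phrase counts as supporting the MLF when its head-modifier order matches that of the language supplying the finite verb, and as a counterexample otherwise.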

Thanks to fieldwork funding support from the Philological Society, I was able to carry out my fieldwork in Canberra, Australia, where I had existing connections with the Vietnamese bilingual community. Data collection took place between June and September 2017. My principle in building the corpus was drawn from Labov’s emphasis on the vernacular, where ‘minimum attention is paid to speech’ (Labov 1984:29). This approach was chosen because the vernacular reflects the most natural, systematic form of the language acquired by the speaker ‘before any subsequent efforts at (hyper-) correction or style shifting are made’ (Poplack 1993:252). Recruited speakers were thus free to choose their own interlocutors, in an environment that they were most comfortable with. They were asked to self-record a conversation of at least 30 minutes on their personal mobile phone. After the recording was returned, speakers were asked to fill in a questionnaire designed to gather information on extra-linguistic variables. The questionnaire consists of 18 questions, available in both English and Vietnamese.

The data collection process was successfully completed, resulting in a corpus of 10 hours of spontaneous speech. Results from this research should offer concrete, empirical evidence for or against the applicability of the MLF in language contact situations in which the participating languages are typologically disparate. If the MLF is found not to apply, it is hoped that the patterns found will form the foundation of a new theoretical framework accounting for the data in question. Methodologically, the study demonstrates a systematic approach to determining the ML, especially in problematic situations where the overarching word order of the participating languages converges and one of the languages lacks overt morphology. When made publicly available, the data will also constitute the first digitalised English–Vietnamese bilingual corpus, providing a valuable resource for future research on this language pair in particular, and for bilingualism research as a whole.


References:

Aaron, J. E. (2015). Lone English-origin nouns in Spanish: The precedence of community norms. International Journal of Bilingualism 19(4), 429–480.

Budzhak-Jones, S. & Poplack, S. (1997). Two generations, two strategies: the fate of bare English-origin nouns in Ukrainian. Journal of Sociolinguistics 1(2), 225–258.

Bullock, B. & Toribio, J. (2008). Cambridge Handbook of Linguistic Code-switching. Cambridge: Cambridge University Press.

Chan, B. (2009). Code-switching between typologically distinct languages. In B. Bullock & A. Toribio (eds.), The Cambridge Handbook of Linguistic Code-switching. Cambridge: Cambridge University Press, 182–198.

Clyne, M. (1987). Constraints on code-switching: How universal are they? Linguistics 25, 739–76.

Eze, E. (1997). Aspects of language contact: A variationist perspective on codeswitching and borrowing in Igbo-English bilingual discourse. PhD dissertation. Ottawa: University of Ottawa.

Joshi, A. K. (1985). Processing of sentences with intrasentential code switching. In D. R. Dowty, L. Karttunen & A. Zwicky (eds.), Natural language parsing. Cambridge: Cambridge University Press, 190–205.

Labov, W. (1984). Field methods of the project on linguistic change and variation. In J. Baugh & J. Sherzer (eds.), Language in use: Readings in sociolinguistics. Englewood Cliffs, NJ: Prentice Hall, 28–53.

Myers-Scotton, C. (2002). Contact Linguistics: Bilingual Encounters and Grammatical Outcomes. Oxford: Oxford University Press.

Poplack, S. (1980). Sometimes I’ll start a sentence in Spanish y termino en español: Toward a typology of codeswitching. Linguistics 18(7–8), 581–618. 

Poplack, S. (1993). Variation theory and language contact. In D. Preston (ed.), American dialect research: An anthology celebrating the 100th anniversary of the American Dialect Society. Amsterdam: Benjamins, 251–268.

Sankoff, D. & Poplack, S. (1981). A formal grammar for code-switching. Papers in Linguistics 14(1), 3–46.

Stammers, J. & Deuchar, M. (2012). Testing the nonce borrowing hypothesis: Counter-evidence from English-origin verbs in Welsh. Bilingualism: Language and Cognition 15(3), 630–664.

Travis, C. & Torres Cacoullos, R. (2013). Making voices count: Corpus compilation in bilingual communities. Australian Journal of Linguistics 33(2), 170–194.

Turpin, D. (1998). ‘Le français, c’est le last frontier’: The status of English-origin nouns in Acadian French. International Journal of Bilingualism 2(2), 221–233.

Trilingual families in bilingual capital cities

by Kaisa Pankakoski (Cardiff University)

Open borders, superdiversity and globalisation have enabled the formation of a large number of families where children are potentially multilingual and may have more than one native language. The parents of multilingual children have different strategies, methods and principles in place to promote intergenerational language transmission or the passing of a non-native language to their offspring.

What principles and other factors influence bringing up a trilingual child? How do the potentially multilingual children feel about their complex language repertoires? Is there a link between a certain method and the children’s attitudes towards their languages?

In my thesis I investigate trilingual families, the factors influencing language transmission, and the perspectives of the multilingual children in my two home cities: Helsinki and Cardiff. These two capital cities are compared because they have very different approaches to bilingual education and heritage language promotion while sharing several similarities, from a visible minority-language population to substantial government support for the minority languages. The two countries are also officially bilingual, which offers a different foundation for trilingual language transmission than, for instance, monolingual countries do.

Previous research
There are various aspects influencing the transmission of minority languages in the home. These consist of linguistic environment factors such as families’ language strategies and methods of transmission; sociocultural factors including parental and societal attitudes, the roles of the languages or parental and societal support; and finally familial factors that may involve siblings, extended family and possible family mobility.

The most recent research strand in multilingualism, Family Language Policy (FLP), looks at the importance of parental strategies, which are fluid and may change over time. As with multilingualism research in general, most FLP and language transmission research is based on bilingual rather than multilingual contexts.

Previous work has not looked at trilingual children’s perceptions or the link between perceptions and language strategies. Furthermore, most multilingualism studies fall into the category of linguistics and language acquisition rather than sociolinguistics. There is no transmission research in contexts with a community majority and minority language.

Funding from PhilSoc to carry out fieldwork in Helsinki
From April 2017 until August 2017 I was based in Finland at the University of Helsinki, Department of Modern Languages. This enabled me to interview seven multilingual case study families living in the Helsinki Metropolitan Area. The families were settled in the country and each had at least one trilingual primary-school-aged child speaking the two official languages of the country (Swedish and Finnish) and one or more additional language(s).

The methodological approach draws on a qualitative, mixed-methods approach to data collection and analysis. First, the parents filled in an online questionnaire to clarify the family’s language pattern. Then semi-structured interviews and observations within the family homes explored issues that affect language acquisition within families. Both parents and children aged five to twelve were interviewed.

I spent three to six hours with each family in their homes. The data collected includes fourteen filled-in questionnaires, fifteen hours of audio-recorded interviews, seven hours of recorded audio and/or video observation, as well as photographs and notes on each family participating in the research.

This winter, any extended family members will be sent an online questionnaire, which will hopefully reveal their perspectives. After completing the fieldwork in Helsinki I will carry out the interviews and observations in Cardiff.

The PhilSoc Travel and Fieldwork bursary covered a part of the expenses of the fieldwork allowing me to take time off work while I concentrated full-time on my PhD research.

 

More information about the research
There is a news item on the Cardiff University website as well as a Welsh-language BBC article about my research and fieldwork in Helsinki. For more information about my research questions and methods, see my Cardiff University page.


Read more
Braun, A., 2006. The effect of sociocultural and linguistic factors on the language use of parents in trilingual families in England and Germany.
Bryman, A., 2015. Social research methods. Oxford University Press.
Murrell, M. 1966. Language acquisition in a trilingual environment: notes from a case-study. Studia linguistica 20(1), pp. 9-34.
Ronjat, J., 1913. Le développement du langage observé chez un enfant bilingue.
Stavans, A. and Hoffman, C. 2015. Multilingualism. Cambridge: Cambridge University Press.
Yamamoto, M. 2001. Language use in interlingual families: A Japanese-English sociolinguistic study. Multilingual Matters.

Natural Language Processing meets social media corpora

by Yin Yin Lu (University of Oxford)

From 17 to 19 May I attended the CLARIN workshop on the ‘Creation and Use of Social Media Resources’ in Kaunas, Lithuania. The thirty participants represented a broad range of backgrounds: computer science, corpus linguistics, political science, sociology, communication and media studies, sociolinguistics, psychology, and journalism. Our goal was to share best practices in the large-scale collection and analysis of social media data, particularly from a natural language processing (NLP) perspective.

As Michael Beißwenger noted during the first workshop session, there is a ‘social media gap’ in the corpus linguistics landscape. This is because social media corpora are the “naughty stepchild” of text and speech corpora. Traditional natural language processing tools (for, e.g., news articles, political documents, speeches, essays, books) are not always appropriate for social media texts, given the unique communicative characteristics of such texts. Part-of-speech tagging, tokenisation, dependency parsing, sentiment analysis, irony detection, and topic modelling are notoriously difficult. In addition, the personal nature of much social media creates legal and ethical challenges for the data mining and dissemination of social media corpora: Twitter, for example, forbids researchers from publishing collections of tweets; only their IDs can be shared.

I made invaluable connections with researchers at the intersection of NLP and social media data – and Twitter data in particular, which is the area of my own research. Dirk Hovy, an associate professor at the University of Copenhagen, spoke broadly about the challenges of NLP: engineers assume that all language is identically and independently distributed. This is clearly not true, as language is driven by demographic differences. How can we add extra-linguistic information to NLP models? His proposed solution is word embedding: transforming words into vectors, trained on large amounts of data from different demographic groups. These vectors should capture the linguistic peculiarities of the groups.

A variant of word embedding is document embedding – and tweets can be treated as documents. Thus, it should be possible to transform tweets into vectors to capture the demographic-driven linguistic differences that they contain. I will be applying this approach to my own corpus of 12 million tweets related to the EU referendum.
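As a rough illustration of what turning a tweet into a vector can mean, the sketch below averages the word vectors of a tweet’s tokens. This is only one simple way to build a document vector and is not necessarily the method discussed at the workshop; the function name `embed_document` and the two-dimensional toy vectors are invented for the example, whereas real vectors would be trained on large amounts of data.

```python
def embed_document(tokens, word_vectors):
    """Average the vectors of the tokens that have a vector.

    Returns None when no token is in the vocabulary.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy vectors with made-up values, purely for illustration.
toy_vectors = {
    "leave": [1.0, 0.0],
    "remain": [0.0, 1.0],
    "vote": [0.5, 0.5],
}

tweet = "vote leave today".split()  # "today" has no vector and is skipped
tweet_vector = embed_document(tweet, toy_vectors)
print(tweet_vector)  # [0.75, 0.25]
```

Tweets represented this way can then be compared with one another, or across demographic groups, using any standard vector similarity measure.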

Andrea Cimino, a postdoc from the Italian NLP Lab, spoke about his work on adapting existing NLP tools, which are trained on traditional text, for social media text. The NLP Lab has developed the best POS tagger for social media, based upon deep neural networks (long short-term memory) that are able to capture long-distance relationships between words in a sentence. The tagger has achieved 93.2% accuracy, and currently works only on Italian texts. Similar taggers can be developed for English texts, given the appropriate training data.

Rebekah Tromble, an assistant professor at Leiden University, presented on the limitations and biases of data collected from Twitter’s Application Programming Interface (API). There are two public APIs that can be used: the historic Search API and the real-time Streaming API. The former returns up to 18,000 tweets from the preceding seven to ten days, whichever limit is reached first. The Streaming API allows for up to 1% of all tweets to be collected in real time; as there are 500 million tweets a day, this is approximately 5 million tweets a day.


Membership survey 2016 

by Richard K. Ashdowne (University of Oxford; Honorary Membership Secretary, PhilSoc)

In spring 2016 the Council of the Society ran an online survey to find out members’ views on matters to do with the Society’s current activities, and in particular its programme of meetings.

More than 200 members completed the survey, from a wide range of the Society’s very diverse membership, including new and student associate members and those who have been members of the Society for many decades.

The chief results of the survey were that more than half of the respondents typically do not attend any meetings of the Society each year, while less than 10% of respondents said they typically manage to attend three or more meetings. Over a quarter of those who completed the survey said they had never attended a meeting of the Society.

The most frequently given reasons for being unable to attend meetings were the difficulty and/or cost of travel to meetings and the pressure of other work or family commitments. A number of other reasons were given by smaller numbers of respondents.

The Society very much understands that the investment of time and money for a member to attend a meeting in person is often considerable. For this reason we have now encouraged speakers to provide a brief abstract that will enable members to make a more informed decision about attending.

With a view to making its meetings more accessible to UK members living outside the southeast of England, the Society is continuing to arrange at least one of its regular meetings each year outside this area. Recent events of this kind have included those in Newcastle and Leeds in 2016. The Society – via the Secretary – is keen to hear from members who would be willing to host such events in the future.

The survey asked whether respondents had viewed the videos of some of the Society’s joint events with the British Academy and whether members would watch recordings of other meetings in addition to or instead of attending. Since this possibility was generally welcomed by those who responded, the Society has now begun to experiment with making video recordings of some of its regular meetings and making these available via YouTube. It is hoped that members who are unable to attend meetings in person may find these of interest. We would be interested in any feedback on these videos in comments on this post.

Council keeps the arrangements for meetings under regular review and so we’d also be interested in any comments in general on the Society’s events via the comments on this post.

Fieldwork on West Polesian

by Kristian Roncero (University of Surrey)

West Polesian belongs to the Eastern Slavonic subgroup and is spoken in the Polish region of Podlasie, the south-western half of the Brest region in Belarus, and the Volynsk region in Ukraine. West Polesian has hardly been studied separately, yet it differs considerably from the national standard (or literary) languages of the countries where it is spoken. One of the main reasons is its isolation. Older stages of the Common Eastern Slavonic language and culture have been preserved thanks to the fact that Polesians live in a marshy area which can be difficult to access as it is frequently flooded. In Žydča (see map), some speakers remember the times when they were kids and a helicopter would bring bread to the village as the ‘road’ was flooded (before some roads were drained in the 1980s and 1990s).

Map of the studied villages in the region of Brest (Belarus)

There is very little work on West Polesian grammar, which is why I decided that I needed to get it from first-hand witnesses.

Exaptation: acquiring the unacquirable

by Benjamin Lowell Sluckin (Humboldt University of Berlin, formerly University of Cambridge)

I was fortunate enough to receive a PhilSoc Masters Bursary in 2015/16, which has been of greater value to me than the £4000 awarded. It enabled me to study for an MPhil in Theoretical and Applied Linguistics at my institution of choice, the University of Cambridge. I’m happy to say it was worth it! So, before I get down to writing about my experiences of postgraduate study and research, I want to thank PhilSoc for their generosity and for seeing value in that hopeful letter of application penned in early spring 2015.

First I’ll say a bit about my general experience and then I’ll get down to the linguistic meat. Cambridge is a weird and wonderful place. It is like stepping into a time machine and stepping out in 1870, except that everyone has a MacBook. It is a bubble, as everyone says; the real world seems distant and at times one can feel claustrophobic. However, the bubble is good for doing research. It is quiet, there are talks almost every day and there was always the possibility of valuable academic discussion with my peers and seniors in the department, from whom I learnt a great deal. As at any university, but perhaps especially here, there is also the constant opportunity to have your assumptions about everything and anything challenged by those who know better, or at least pretend to do so. The Masters Bursary allowed me not only to learn some serious linguistics, but also to acquire the ability to power a very unstable boat with a very long stick. All in all, I learnt a great deal. I can now say with some confidence that I understand enough syntax to understand what people are disagreeing about most of the time, but not always to understand why they insist on disagreeing.

In my bursary application I said I wanted to specialise in diachronic morphosyntax in Germanic and I specifically “promised” to look at exaptive changes in language (my thanks to George Walkden, whose support and lectures got me thinking about these things). In short, Lass (1990, 1997) said that when form-to-function mappings are eroded in language, we can be left with functionless linguistic “junk” which can then be co-opted for an unrelated function. The canonical example from Lass (1990) is the recycling of Afrikaans gender marking, from Dutch syntactic agreement marking for gender and definiteness (1a,b) to conditioning by the morphological character of the adjective itself (1c,d): simple vs complex. I found Lass’s ideas interesting and I knew that David Willis in Cambridge had been working on this topic, so I was keen to get in on the action (for lack of a better term). Once I had arrived, he was always ready to challenge my ideas and encourage me to refine my arguments.

(1) Examples
a. Dutch common/neuter definite & common indefinite

de gevaarlijk-e muis/paard
the dangerous-e mouse.com/horse.neut

b. Dutch neuter, indefinite

een gevaarlijk-∅ paard
a dangerous-∅ horse.neut
(adapted from ex.23, Norde & Trousdale 2016:187)

c. Afrikaans simple adjective

die groot groep
the large-∅ group
([Lubbe & Plessis 2014:28] cf. Sluckin 2016:6)

d. Afrikaans complex adjective

die belangrik-e rol
the important-e role
([Lubbe & Plessis 2014:21] cf. Sluckin 2016:6)

Scholars have argued about exaptation for 25 years; so I will admit now that I approach this problem from a minimalist perspective. That means: I focus on Child Language Acquisition as the primary locus of morphosyntactic change, I reject junk, i.e. functionless material as impossible (like many but not all), and crucially my work assumes that the syntactic architecture is based on a hierarchical generation of formal features and projecting heads, and so on and so on….

This type of change is especially interesting because, in my mind, it shows the incredible capacity of the child acquiring language to regularise seemingly incoherent data. Research into exaptive reanalyses can tell us something about how humans can make good data from bad data.

So what is bad data? Well, “junk” doesn’t work if we assume that every utterance is somehow a representation of linguistic units stored in the lexicon, or whatever we call it. Sadly, I don’t have the space to elaborate on all past approaches (see Vincent 1995; Willis 2010, 2016; Lass 1997, and Van de Velde & Norde 2016 for a review), but my hypothesis can be summed up as follows: breakdown in language can, over time, render structures increasingly difficult to acquire; this can reach a point where the target structure (dare I say parameter) is no longer acquirable from the input. The child is faced with the choice of losing the structure or finding any other possible analysis. What’s the difference between this and any other reanalysis, I hear you ask. Well, one standard view is that reanalysis works on the basis of ambiguity between possible analyses; so if there are two or more possible analyses, the child is more likely to choose the simpler one (2a). If the more economical analysis were not found, the original would still be available from the input. I argue that for exaptation what we instead find is that the original analysis is removed completely for the acquirer (2b). Therefore, any new analysis does not rely on ambiguity between the target and other analyses, as the target just doesn’t factor in for the child making sense of the input.

I have tried to test this for syntax alone, whereas past work focused more on morphosyntax. The questions I am trying to answer are: how pervasive is exaptive reanalysis, and what strategies do children use to find analyses when they can’t draw on strategies of economy? To these ends, I am looking for explanations orthogonal to Universal Grammar. My MPhil thesis research on the collapse of V2 and its reanalysis as Locative Inversion in Early Modern English, involving the actuation of locative formal features (e.g. out of the woods came the bear), seems to suggest that phonologically silent syntactic heads might be especially vulnerable to this kind of change, as their acquisition is purely dictated by overt syntax (3a,b: trees for those who like them – click on the “Read more” button). Metaphorically speaking, we knew Pluto was there before we could see it because we could see things orbiting it. Syntax works similarly; the only difference is that if we change an orbit we change the planet, or rather the syntactic head, too. I am pursuing these ideas with larger case studies as part of my PhD project at the Humboldt University in Berlin, where I am now part of Artemis Alexiadou’s research group. I am also trying to see how grammar competition, language contact and exaptive reanalysis might go hand in hand in certain situations.
