What is language revitalization about? Some insights from Provence

by James Costa (Sorbonne Nouvelle / UMR LACITO (CNRS), Paris)

Should you find yourself in Provence this summer, you might wonder why some villages have bilingual signs at the entrance. Your surprise would be forgiven, since you are unlikely to have heard anything but French in most places, and likely a lot of English as you approach the Mediterranean. But if you listen more closely, observe more closely, you might come across a world that is fast vanishing, but that is still present. You might stumble upon a concert in a language that you cannot identify, or wonder why some street names don’t sound French. You might even hear people speak Occitan—for this is what it is, a language also known as Provençal, one which many locals will refer to as “Patois” (a derogatory term in France to refer to anything other than French traditionally spoken in the country).

Bilingual sign (French, Provençal)

This sort of experience might happen to you in Provence, but not only there. Across the European Union, several million people speak a language that is not the official language of the state they live in. Across Europe, there are language advocates who defend and promote the right to speak one’s language. This struggle for language rights also extends to Latin America, North America, Australia, and many other places. This, many scholars assert, is a consequence of globalization: a backlash against uniformity, if you like, a way of being oneself, of finding meaning locally in a world that seems to be getting smaller. In my recent book, Revitalising Language in Provence: A Critical Approach, I argue otherwise. Those movements are not a reaction to globalization; they are, on the contrary, a way of taking part in this process, on the very terms defined by those who define what globalization is (and not on their own terms, as Leena Huss [2008, 133] asserts).

But let’s start from the beginning. This book focuses on Provence, home to what is perhaps the earliest language reclamation movement, or at least one of the earliest. Poets had already started writing texts in defense of Gascon, Provençal or Languedocien (all dialects of what most scholars of Romance linguistics view as Occitan) back in the 16th and 17th centuries. This is perhaps a consequence of an increasingly aggressive move to promote French in all administrative domains at the expense of Latin and Occitan, which had been used for official purposes for centuries in what is now Southern France. But it was after the Terror government of the French Revolution (after 1793) sought to eradicate the “patois” that a genuine interest was born in various parts of France, resulting in the south in a rediscovery of the poetry of the medieval Troubadours and in a scholarly interest in the history of Provence and Languedoc before their annexation to France. It wasn’t, however, until the 1850s that an organized language-based movement was formed, under the aegis of poets such as Frédéric Mistral and Joseph Roumanille.

The Felibrige was the name they gave to their movement, a name whose origin remains mysterious. The Felibres sought to revive the Provençal or Occitan language (which was still almost universally spoken in all of Southern France) through poetry and literature. And indeed, Mistral published a series of long, epic poems that were hailed across Europe as monuments of literature. Mirèio is probably his best-known poem, a love story set in the Crau region of Provence and an allegory of the language revival movement. Mirèio was acclaimed in Paris as a chef d’œuvre, and was prefaced by Lamartine.

I recount parts of the history of the movement in the book, but for this blog post, suffice it to say that while successful on a literary level, it never succeeded in political terms. Provençal was long banned in education, and despite a strong Occitan movement throughout the 20th century, the use of Provençal continued (and continues) to decline. But the story I tell in this book isn’t the story of the language movement. Instead, following a two-year ethnographic study in Provence, I ask why the movement was based on language at all, like so many others afterwards but, crucially, none before, or at least none before the 1840s.

AGM & The President’s Lecture: Standards, norms and prescriptivism

The Annual General Meeting of the Philological Society was held on 17 June at Selwyn College, Cambridge.

Having completed a four-year term of office, Prof. Wendy Ayres-Bennett stood down as President of the Society; she is succeeded by Prof. Aditi Lahiri FBA.

The following Members of Council have served their term on council or wished to retire early, and did not stand for re-election: Prof. Ruth Kempson FBA (KCL); Prof. Aditi Lahiri FBA (Oxford); Dr John Penney (Oxford); Dr George Walkden (Manchester).

In their place, the following new Ordinary Members of Council have been elected: Prof. Eleanor Dickey (Reading); Dr Mary MacRobert (Oxford); Prof. Maj-Britt Mosegaard-Hansen (Manchester); Dr David Willis (Cambridge).

The 9th RH Robins Prize was awarded to Jade Jørgen Sandstedt (Edinburgh) for a paper entitled ‘Transparency and blocking in Old Norwegian height harmony’, which will be published in TPS.

The outgoing President delivered her President’s Lecture on ‘Standards, norms and prescriptivism’, an audio recording and screencast of which can be found below and on the Society’s YouTube channel.

The Faces of PhilSoc: Melanie Green


Name: Melanie Green

Position: Reader in Linguistics and English Language

Institution: University of Sussex

Role in PhilSoc: Council Member

 


About You

How did you become a linguist – was there a decisive event, or was it a gradual development?

Somewhere between doing my A-levels (in English, French and Latin) and applying for university, when I found the SOAS prospectus in the school cupboard. At that point I realised that studying language didn’t have to mean studying literature, and I applied to study Hausa at SOAS. In my final year, I took a course that focused on the linguistic description of Hausa (taught by Professor Philip Jaggar), and it was this course that led me upstairs to the Linguistics Department, where I then took my MA and PhD.

What was the topic of your doctoral thesis? Do you still believe in your conclusions?

My doctoral thesis was on focus and copular constructions in Hausa, and offered a minimalist analysis. I still believe in the descriptive conclusions, which relate to the grammaticalisation of a non-verbal copula into a focus marker, but I’m less convinced these days by formal theory. I still enjoy teaching it though, because I think it makes students think carefully (and critically) about formal similarities and differences between languages.

On what project / topic are you currently working?

Together with Gabriel Ozon at Sheffield and Miriam Ayafor at Yaoundé I, I’ve just completed a BA/Leverhulme-funded project to build a pilot spoken corpus of Cameroon Pidgin English. Based on this corpus, Miriam and I co-authored a descriptive grammar of the variety, which is in press.

What directions in the future do you see your research taking?

In my dreams, typologically-framed language documentation. In reality, probably more corpus linguistics, since this seems to be what attracts funding at the moment.

How did you get involved with the Philological Society?

The PhilSoc published my first book, Focus in Hausa.


‘Personal’ Questions

Do you have a favourite language – and if so, why?

No.

Minimalism or LFG?

Minimalism.

Teaching or Research?

Both.

Do you have a linguistic pet peeve?

No.

 


Looking to the Future

Is there something that you would like to change in academia / HE?

I would like there to be more funding for language documentation. Languages are dying faster than we can describe them.

(How) Do you manage to have a reasonable work-life balance?

I do, but that only became possible in mid-career. I achieve it with careful planning, so when I’m off work, I’m really off work.

What is your prime tip for younger colleagues?

Start publishing as early as possible. 

Natural Language Processing meets social media corpora

by Yin Yin Lu (University of Oxford)

From 17 to 19 May I attended the CLARIN workshop on the ‘Creation and Use of Social Media Resources’ in Kaunas, Lithuania. The thirty participants represented a broad range of backgrounds: computer science, corpus linguistics, political science, sociology, communication and media studies, sociolinguistics, psychology, and journalism. Our goal was to share best practices in the large-scale collection and analysis of social media data, particularly from a natural language processing (NLP) perspective.

As Michael Beißwenger noted during the first workshop session, there is a ‘social media gap’ in the corpus linguistics landscape. This is because social media corpora are the “naughty stepchild” of text and speech corpora. Traditional natural language processing tools (for, e.g., news articles, political documents, speeches, essays, books) are not always appropriate for social media texts, given the unique communicative characteristics of such texts. Part-of-speech tagging, tokenisation, dependency parsing, sentiment analysis, irony detection, and topic modelling are notoriously difficult. In addition, the personal nature of much social media creates legal and ethical challenges for the data mining and dissemination of social media corpora: Twitter, for example, forbids researchers from publishing collections of tweets; only their IDs can be shared.

I made invaluable connections with researchers at the intersection of NLP and social media data – and Twitter data in particular, which is the area of my own research. Dirk Hovy, an associate professor at the University of Copenhagen, spoke broadly about the challenges of NLP: engineers assume that all language is independently and identically distributed. This is clearly not true, as language is driven by demographic differences. How can we add extra-linguistic information to NLP models? His proposed solution is word embedding: transforming words into vectors, trained on large amounts of data from different demographic groups. These vectors should capture the linguistic peculiarities of the groups.

A variant of word embedding is document embedding – and tweets can be treated as documents. Thus, it should be possible to transform tweets into vectors to capture the demographic-driven linguistic differences that they contain. I will be applying this approach to my own corpus of 12 million tweets related to the EU referendum.
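To make the idea of treating a tweet as a document vector concrete, here is a minimal sketch of one simple baseline: averaging the vectors of the words in a tweet. This is an illustration only, not the method presented at the workshop or the one used for the referendum corpus. The tiny three-dimensional vectors and the names `TOY_VECTORS`, `embed_tweet` and `cosine` are hypothetical; real embeddings would be trained on large corpora.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration.
# Real embeddings are learned from data and have many more dimensions.
TOY_VECTORS = {
    "leave": [1.0, 0.0, 0.0],
    "remain": [0.0, 1.0, 0.0],
    "vote": [0.5, 0.5, 0.0],
    "eu": [0.0, 0.0, 1.0],
}


def embed_tweet(text, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a tweet.

    Returns None when no word of the tweet is in the vocabulary.
    """
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


leave_tweet = embed_tweet("vote leave")
remain_tweet = embed_tweet("vote remain")
print(leave_tweet, remain_tweet, cosine(leave_tweet, remain_tweet))
```

With vectors trained separately on text from different demographic groups, the same comparison could in principle show how far apart the groups' tweets sit in the vector space.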

Andrea Cimino, a postdoc from the Italian NLP Lab, spoke about his work on adapting existing NLP tools, which are trained on traditional text, for social media text. The NLP Lab has developed the best POS tagger for social media based upon deep neural networks (long short-term memory), which are able to capture long-range relationships between words in a sentence. The tagger has achieved 93.2% accuracy, but currently works only on Italian texts. Similar taggers could be developed for English texts, given the appropriate training data.

Rebekah Tromble, an assistant professor at Leiden University, presented on the limitations and biases of data collected from Twitter’s Application Programming Interface (API). There are two public APIs that can be used: the historic Search API and the real-time Streaming API. Up to 18,000 tweets can be harvested from the former, covering at most the last seven to ten days, whichever limit is reached first. The Streaming API allows for up to 1% of all tweets to be collected in real time; as there are 500 million tweets a day, this is approximately 5 million tweets a day.
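The Streaming API arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch using only the figures quoted above. The function names are hypothetical, and the second function assumes, unrealistically, that every tweet delivered by the stream is relevant to the corpus being built.

```python
import math

TWEETS_PER_DAY = 500_000_000  # daily volume quoted in the talk
STREAMING_PERCENT = 1         # share of all tweets the Streaming API may return


def streaming_ceiling(tweets_per_day=TWEETS_PER_DAY, percent=STREAMING_PERCENT):
    """Upper bound on tweets per day obtainable from the Streaming API."""
    return tweets_per_day * percent // 100


def min_days_to_collect(corpus_size, per_day=None):
    """Fewest whole days needed to reach corpus_size at the daily ceiling.

    Assumes every streamed tweet is relevant, so this is a lower bound.
    """
    if per_day is None:
        per_day = streaming_ceiling()
    return math.ceil(corpus_size / per_day)


print(streaming_ceiling())              # 5000000
print(min_days_to_collect(12_000_000))  # 3
```

In practice a topical corpus fills only a small part of that ceiling each day, so collecting 12 million relevant tweets takes far longer than this lower bound suggests.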


Big and small data in ancient languages

by Nicholas Zair (University of Cambridge)

Back in November I gave a talk at the Society’s round table on ‘Sources of evidence for linguistic analysis’ on ‘Big and small data in ancient languages’. Here I’m going to focus on one of the case studies I considered under the heading of ‘small data’, which is based on an article that Katherine McDonald and I have written (more details below) about a particular document from ancient Italy known as the Tabula Bantina.


It comes from Bantia, modern-day Banzi in Basilicata, and is written in Oscan, a language which was spoken in Southern Italy in the second half of the first millennium BC, including in Pompeii prior to a switch to speaking Latin towards the end of that period. Since Oscan did not survive as a spoken language, we know it almost entirely from inscriptions written on non-perishable materials such as stone, metal and clay. There aren’t very many of these inscriptions: perhaps a few hundred, depending on definitions (for instance, do you include control marks consisting of a single letter?). We are lucky that Oscan is an Indo-European language, and, along with a number of other languages from ancient Italy, quite closely related to Latin, so we can make good headway with it. Nonetheless, our knowledge of Oscan and its speakers is fairly limited: it is certainly a language that comes under the heading of ‘small data’.

 


One of the ways scholars have addressed the problem of so-called corpus languages like Oscan, and even better-attested but still limited ones like Latin, has been to combine as many relevant sources of information as possible, from ancient historians to the insights of modern sociolinguistic theory, as a way of squeezing as much information as possible from what we have – and trying to fill in the blanks where information is lacking. This has been a huge success, but this approach can also be dangerous, especially when it comes to studying language death. Given that we know a language will die out in the end, it is very tempting to see every piece of evidence as a staging post in the process, and try to fit it into our narrative of language death. Often this provides very plausible histories, but we must remember that, while in hindsight history can look teleological, things are rarely so clear at the time.

The Tabula Bantina is a bronze tablet with a Latin law on one side and an Oscan law on the other side. It is generally agreed that the Latin text was written before the Oscan one, but the Oscan is not a translation of the Latin: the writer of the Oscan text simply used the conveniently blank side of the tablet to write the new material on. The striking things about the Oscan text are that it is written in the Latin alphabet, and there are lots of mistakes. It also strongly resembles Latin legal language. The date of this side is probably between about 100 and 90 BC, just before Rome’s ‘allies’, which is to say conquered peoples and cities in Italy, rose up against it in a rebellion generally known as the Social War.

The Faces of PhilSoc: Karen Corrigan


 

Name: Karen Corrigan

Position: Professor of Linguistics and English Language

Institution: Newcastle University

Role in PhilSoc: Council Member


About You

How did you become a linguist – was there a decisive event, or was it a gradual development?

Even as a child I was fascinated by all things linguistic. I grew up in Northern Ireland at the height of The Troubles and the arrival of the British Army was my first exposure to accents and dialects that were not native to the region, since Northern Ireland back then was synonymous with emigration rather than immigration. My younger sister and I – despite being teenagers – didn’t get out much on account of the security situation and used to entertain ourselves confined to quarters by challenging each other to mimic the English and Scottish accents we had begun to hear around us. I suppose that was our way of trying to make light of the threat which the army represented in our lives. When I went to University as an undergraduate, opting for English, Irish and Linguistics was thus a no-brainer for me.

What was the topic of your doctoral thesis? Do you still believe in your conclusions?

“The Syntax of South Armagh English in its Socio-Historical Perspective.” Amongst my conclusions were the ideas that:

  1. Irish English needed to be investigated from a contact linguistic perspective since it did after all develop from the L2 acquisition of English on a massive scale by L1 speakers of Irish;
  2. Taking a mixed Sociolinguistic and Biolinguistic approach to syntactic variation and change can be more illuminating than viewing it through a single lens.

I still believe in both of these conclusions and the latter, in particular, has become associated with a new sub-discipline in linguistics known as ‘Socio-Syntax’ which I have continued to work in since and which is being further supported by the research of other scholars too.

On what project / topic are you currently working?

Research on language in Northern Ireland (including my own prior to 2014) tends to focus on the varieties spoken by the major ethnicities. Their linguistic heritages have been hotly disputed and scholarship reflects the socio-political conflict of ‘The Troubles’. The Peace Process has ensured greater protection for Irish and Ulster Scots and has also made the region more attractive, resulting in unprecedented immigration. New ethnicities have become increasingly visible and audible. The project I am currently working on was supported by an AHRC Research Leadership Fellowship and explores these connected communities in the light of historical emigration.
The project addresses the following issues arising from these inward and outward migratory trends:

  1. How can a cross-disciplinary approach to migration and language contribute to our knowledge of the ways in which socially meaningful spaces are constructed by human agents?
  2. How do speakers make use of linguistic variation to express local belonging and/or dissonance, thereby developing, and displaying to each other, ‘a sense of place’ (Convery, Corsane and Davis 2012)?
    In other words:

    • Do ethnic minorities maintain their community languages to assert social distance?
    • What are the constraints on linguistic variation amongst indigenous young people?
    • Are new inward migrants acquiring the same constraints as their local peers?
    • Are there differences between the constraints discernible amongst the Northern Irish English varieties used by newer and older minority ethnic groups?
  3. Do diverse NI social groups hold similar or different attitudes towards minority and regional languages and their speakers?
  4. To what extent are the migratory experiences of the Irish Diaspora and inward migrants to NI similar and can historical records of emigration by the majority ethnic groups be used to promote tolerance towards ethnic minority communities now living in NI?
  5. What ‘best practice’ educational support is there for regional and minority languages in NI?

What directions in the future do you see your research taking?

I will continue to work on language and dialect issues in Ireland alongside keeping up my interests in the ‘Diachronic Electronic Corpus of Tyneside English’ project, which I have been developing since joining Newcastle University in the 1990s.

How did you get involved with the Philological Society?

I became a member of the Society through communications with Prof. Keith Brown, former Honorary Secretary for Publications of the Society, whom I first met as an undergraduate. Keith was the external examiner of our Linguistics programme at University College, Dublin and the practice there was to have a viva with the external as part of the examination process from the First Year onwards.


‘Personal’ Questions

Do you have a favourite language – and if so, why?

It has to be Irish because it is a minority language and could do with the support!

Minimalism or LFG?

Minimalism but with a Sociolinguistic twist.

Teaching or Research?

I have to say I really enjoy both.

Do you have a linguistic pet peeve?

Approaches to contact varieties that do not consider Mufwene’s wonderful ‘Founder Principle’.

What’s your (main) guilty pleasure?

Chocolate.


Looking to the Future

Is there something that you would like to change in academia / HE?

I think HE should be free to all.

(How) Do you manage to have a reasonable work-life balance?

If I am honest, I’m afraid that I don’t …

What is your prime tip for younger colleagues?

Learn to be collaborative and collegiate.

The Morphological-to-Analytic Causative Continuum in Hausa: New Insights and Analyses in a Typological Perspective

by Philip J. Jaggar (School of Oriental and African Studies, University of London)

Over the last few decades, linguists have devoted considerable attention to both homogeneity and variation in the expression of causal events across languages. However, most studies, whether typological or language-specific, have focused on the category of morphologically overt (e.g., ‘lie/lay X down’) causatives, to the relative neglect of complex periphrastic (e.g., ‘get X to lie down’) formations.

The present study addresses this imbalance by elucidating a wide spectrum of causative expressions in Hausa (Chadic/Afroasiatic), supported by a strong cross-linguistic perspective. In line with contemporary approaches located within a general typology of causation, the analysis invokes the widely accepted dichotomy between direct and indirect causative constructions. Direct causation associates with morphological causatives, indirect causation with periphrastic expressions: compare morphological ‘I lay X down’ (direct, with no intermediary) with periphrastic ‘I got X to lie down’ (indirect, where X also functions as an intervening actor/cause).

Hausa uses an indirect periphrastic causative usually formed with sâa ‘cause’ (lit. ‘put’) as the higher causal verb, e.g., nâs taa sâa yaaròn yaa kwântaa ‘the nurse got the boy to lie down’ (= intransitive kwântaa ‘lie down’). Direct morphological causatives, in contrast, associate with a specific derivational formation, known as “Grade 5” (Parsons 1960/61), e.g., nâs zaa tà kwantar̃ dà yaaròn ‘the nurse will lay the boy down’.

The monograph systematically explores, for the first time in an African language to our knowledge, the key design-features that distinguish the two mechanisms, in addition to demonstrating that Hausa periphrastic causatives can also differ from each other, e.g., in implicational strength, depending on the modal (TAM) properties of the lower clause. In so doing, it provides a rare account of how the two types are used to describe pragmatically different causal events and participant roles.


Jaggar, Philip J. (2017) The Morphological-to-Analytic Causative Continuum in Hausa: New Insights and Analyses in a Typological Perspective. (Abhandlungen für die Kunde des Morgenlandes, Band 109). Wiesbaden: Harrassowitz.
Available from 1 June 2017.