Saussure vindicated

by George Walkden (University of Manchester)

A new paper in the journal PNAS provides the most striking and robust empirical support ever found for Ferdinand de Saussure’s notion of the arbitrariness of the linguistic sign.

That’s not how the authors (henceforth Blasi et al. 2016) interpret it. Nor is it how it’s been reported in the many media outlets that have seized on it. For example, writing for the Guardian, David Shariatmadari describes their findings as “the hidden sound patterns that could overturn years of linguistic theory”. (The issue of why linguistics papers published in “science” journals get so much press while the same papers published in Diachronica or our own Transactions would be largely ignored is a topic for another blog post.) The authors, for their part, state that “These striking similarities call for a reexamination of the fundamental assumption of the arbitrariness of the sign”. ABC News goes even further: “The breakthrough finding disproves one of the most fundamental concepts in linguistics — the idea that the relationship between the sound of a word and its meaning is unrelated”.

To see why these are probably overstatements, let’s go back to the source.

The arbitrariness of the sign

Here’s what the Course in General Linguistics actually says (§2). I quote from Roy Harris’s standard translation (1983).

The link between signal and signification is arbitrary. Since we are treating a sign as the combination in which a signal is associated with a signification, we can express this more simply as: the linguistic sign is arbitrary.

There is no internal connexion, for example, between the idea ‘sister’ and the French sequence of sounds s-ö-r which acts as its signal. The same idea might as well be represented by any other sequence of sounds. This is demonstrated by differences between languages, and even by the existence of different languages. The signification ‘ox’ has as its signal b-ö-f on one side of the frontier, but o-k-s (Ochs) on the other side.

No one disputes the fact that linguistic signs are arbitrary. But it is often easier to discover a truth than to assign it to its correct place.

(I’m committing the usual sin of referring to Saussure and the Course in General Linguistics more-or-less synonymously, when in fact it’s not clear how much of the Course reflects Saussure’s thought rather than that of his students and editors: see Joseph 2012 and Stawarska 2015 for discussion. But since the Ferdinand de Saussure that’s shaped linguistic discourse over the last century is the Saussure of the Course, I’ll continue with this sin in what follows.)

A number of things are interesting about this passage for those who’ve read the Blasi et al. paper recently. Some of them will be discussed later. But for now just note that Saussure evidently viewed the arbitrariness of the sign as i) a testable claim rather than a tautology or an analytic truth (“This is demonstrated by…”) and ii) so blindingly obvious as to be effectively indisputable (“No one disputes the fact that linguistic signs are arbitrary”). The simple fact that not all languages are the same is enough, for Saussure, to demonstrate that signs are arbitrary.

Saussure is also aware that some people might raise objections, and discusses the cases of onomatopoeia and exclamations in detail. The passage on genuine onomatopoeia is worth quoting in full:

As for genuine onomatopoeia (e.g. French glou-glou (‘gurgle’), tic-tac (ticking of a clock)), not only is it rare but its use is already to a certain extent arbitrary. For onomatopoeia is only the approximate imitation, already partly conventionalised, of certain sounds. This is evident if we compare a French dog’s ouaoua with a German dog’s wauwau. In any case, once introduced into the language, onomatopoeic words are subjected to the same phonetic and morphological evolution as other words. The French word pigeon (‘pigeon’) comes from Vulgar Latin pīpīo, itself of onomatopoeic origin, which clearly proves that onomatopoeic words themselves may lose their original character and take on that of the linguistic sign in general, which is unmotivated.

The findings of Blasi et al. (2016)

The paper by Blasi et al. isn’t very long, so you should read it rather than relying on my summary. Unfortunately it’s behind a paywall, so here’s an overview for those who can’t access it. I’m not a statistician and can’t evaluate their methods in detail, nor am I a phonologist or phonological typologist, so I’ll take it as given that their data is good and that their method works as described.

Blasi et al. take word lists from 6,452 linguistic varieties. Each word list has between 28 and 100 lexical items on it, all of which are items of “basic vocabulary” (scare quotes in original). Their aim is to find associations between particular segments and particular concepts (basically, significations), and they do this by running a battery of statistical tests that evaluate the presence of a symbol in a word against the presence of the same symbol in a set of other words. As part of this, they try to screen out a number of possible confounds, including word length, phonotactic restrictions, and areal contact. They only accept sound-meaning pairings found in at least three different macro-areas and across ten different lineages. These restrictions mean that the procedure is conservative and likely to have a large number of false negatives.
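The geographic and genealogical threshold described above can be sketched in a few lines of Python. This is only an illustration of the acceptance criterion as I've summarised it, not the authors' actual procedure, which also involves the statistical tests and confound controls. The function name, the data layout and the example data are all invented, and I'm assuming that both numbers are minimums.

```python
def passes_spread_filter(attestations, min_areas=3, min_lineages=10):
    """Check whether a sound-meaning pairing is widespread enough to accept.

    attestations: (macro_area, lineage) pairs, one per variety in which
    the pairing shows up. Layout and names are hypothetical.
    """
    areas = {area for area, _ in attestations}
    lineages = {lineage for _, lineage in attestations}
    return len(areas) >= min_areas and len(lineages) >= min_lineages


# Invented data: ten lineages spread over three macro-areas...
wide = [(f"area{i % 3}", f"lineage{i}") for i in range(10)]
# ...versus ten lineages that all sit in a single macro-area.
narrow = [("area0", f"lineage{i}") for i in range(10)]

print(passes_spread_filter(wide))    # True
print(passes_spread_filter(narrow))  # False
```

A pairing attested in many lineages of one region is rejected here, which is the sense in which the criterion guards against areal contact.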

They find that 74 sound-concept associations pass the test, involving 30 different concepts and 23 different symbols. Interestingly, not all of these are positive associations: 36 of the 74 are “negative” associations. For instance, the symbols u, p, b, t, s, l and r are cross-linguistically unlikely to form part of a first person pronoun, and the word for ‘dog’ is unlikely to contain a t. On the other hand, the word for ‘dog’ is disproportionately likely to contain an s, and the word for ‘knee’ is disproportionately likely to contain the symbols u, o, p, k, q. A useful measure is the risk ratio (RR) for a symbol-concept pairing: this gives the ratio of the frequency of a given symbol in words for that concept to the frequency of the same symbol in other words. The only really substantial ratio is for the symbol C (a voiceless palato-alveolar affricate) and the concept ‘small’, which is 5.12. That means (if I’ve understood correctly) that the voiceless palato-alveolar affricate is more than 5 times as frequent in the word for ‘small’ as it is in the average word. All the other risk ratios for positive associations are between 1.17 and 2.77, and the risk ratios for negative associations are between 0.18 and 0.81. The strongest negative association is between the symbol p and the first person singular pronoun: with an RR of 0.18, other words are more than five times as likely to have this symbol (1/0.18 ≈ 5.6).
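To make the risk ratio concrete, here is a minimal Python sketch of the definition as I've paraphrased it above. The counts are invented purely so that the result matches the reported figure of 5.12; they are not taken from the paper, and the function name is my own.

```python
def risk_ratio(concept_hits, concept_total, other_hits, other_total):
    """Frequency of a symbol in words for one concept, divided by its
    frequency in all other words."""
    return (concept_hits / concept_total) / (other_hits / other_total)


# Invented counts: the symbol occurs in 256 of 1,000 words for 'small'
# and in 500 of 10,000 other words.
rr_small = risk_ratio(256, 1000, 500, 10000)
print(f"{rr_small:.2f}")  # 5.12

# A negative association is easier to read when inverted: an RR of 0.18
# means other words are 1 / 0.18 times as likely to contain the symbol.
print(f"{1 / 0.18:.2f}")  # 5.56
```

An RR of 1 would mean the symbol is exactly as frequent in words for the concept as elsewhere, so values above 1 are positive associations and values below 1 are negative ones.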

Blasi et al. vs. Saussure

The first thing to observe is that Saussurean arbitrariness is about the link between a signal and a signification. What Blasi et al. are looking for isn’t that: they’re only looking at the link between a particular symbol and a signification. But signals are a lot more than symbols: they also involve linear ordering. Since signals are otherwise basically the sum of their parts, though, we’ll grant that associations between symbols and concepts are relevant to arbitrariness, and move on.

(NB: the term signal is used by Blasi et al. to refer to associations between symbols and concepts, both positive and negative. This is evidently not the same as signal in Harris’s translation, which translates signifiant in the original French, i.e. the phonological form of a word.)

Recall that, for Saussure, the fact that different languages had different words for things at all was sufficient to demonstrate the arbitrariness of the linguistic sign. It would be very easy in principle to show that arbitrariness was not complete, if the evidence existed: just find a word that is the same in all languages. This is obviously not what Blasi et al. do, however. In fact, they don’t find a single exceptionless link between a symbol and a concept. What they instead find are tendencies: some subtle, some less so. But if it is possible for words to vary in form across languages, then arbitrariness is a fact about human language.

The reader might object at this point that I’m retreating to a very weak notion of arbitrariness. Actually, I’m not retreating to anything: I’m just recapitulating what was published in the Course almost exactly a hundred years ago. Saussurean arbitrariness makes clear predictions, which this study entirely (and impressively) fails to falsify despite submitting them to rigorous empirical examination. Instead, when we look at the world’s languages, we find not a single example of a cross-linguistically constant pairing of signal and signification. We don’t even find a cross-linguistically constant pairing of symbol and signification.

There is a stronger hypothesis that one could entertain, which is that there should be no meaningful cross-linguistic associations between symbols and concepts at all, or at least no more than chance scattering of symbols would predict. The paper does seem to falsify that. (Though even that’s not entirely clear: there are 39 symbols in the ASJP database they use, and in principle 100 concepts, so there are 3,900 possible associations, of which only 74 are detected by the analysis, i.e. 1.9%. That doesn’t seem enormously impressive, even if the method is biased towards conservativity.) However, this is a clear straw man: no one has ever proposed this, and it’s not obvious why anyone ever would.
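The back-of-the-envelope figure in the parenthesis above is easy to check. The sketch below uses only the numbers quoted in this post (39 symbols, 100 concepts, 74 detected associations); the variable names are my own.

```python
n_symbols = 39    # symbols in the ASJP transcription, as quoted above
n_concepts = 100  # maximum number of concepts on a word list
detected = 74     # associations that pass the tests

possible = n_symbols * n_concepts
share = detected / possible

print(f"{possible} possible pairings, {share:.1%} detected")
# 3900 possible pairings, 1.9% detected
```

This treats every symbol-concept pairing as equally testable, which is a simplification: not every word list contains all 100 concepts.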

Can Saussure’s approach deal with the kinds of positive association that this study demonstrates? Sure. Saussure doesn’t discuss the full gamut of sound-symbolic and iconicity-based reasons for cross-linguistic symbol-concept association that Blasi et al. refer to, but recall what he says about onomatopoeia. While Latin pīpīo has developed into French pigeon, according to Saussure, the initial consonant p and vowel i are retained. If the sound of a pigeon’s call is the original motivation for the creation of the word pīpīo, as Saussure implies, and if that can happen in any language for similar reasons, then it’s clear that the residue of motivation can be retained over time and find parallels cross-linguistically. The prediction would be that these associations would be reasonably subtle, and certainly not absolute, given the distorting effect of regular sound change etc. – in other words, exactly the sort of thing we find in the Blasi et al. paper.

Another thing that makes me uneasy about the interpretation of these findings is independent of the concerns mentioned above. It goes as follows: even if Blasi et al. were able to demonstrate that symbol-concept associations (or signal-signification associations) were incredibly widespread cross-linguistically to the point of being the norm, that wouldn’t mean that we were dealing with violations of arbitrariness, even under the strong straw-man hypothesis presented above. In the case of glou-glou and gurgle, we are dealing with a principled link between sound and meaning. That also holds for some of the cases found by Blasi et al. The links between ‘tongue’ and the sound l and between ‘nose’ and the sound n make sense in terms of articulation. Similarly, the oft-wheeled-out case of ‘breasts’ and m, related to what suckling babies do, has a story behind it.

But other cases are much more mysterious. Why should there be an association between ‘ash’ and u? Or between ‘one’ and t? Blasi et al. don’t tell us. Even more mind-boggling are the negative associations. More than half of these – 19, in fact, which is more than a quarter of the whole dataset – relate to personal pronouns. What could possibly be the explanation for the fact that ‘I’ doesn’t like to co-occur with u, p, b, t, s, l, r? Or for the fact that ‘you’ doesn’t like to co-occur with u, o, p, t, d, q, s, r, l? Or ‘we’ with p, l, s? Evidently this unholy trinity doesn’t like being associated with local personal pronouns at all, for some reason. Blasi et al. have nothing at all to say about causes for the negative associations. The upshot is that, without a principled explanation, I don’t see why quantitative evidence of association alone would be enough of a warrant for stating that a connection is non-arbitrary in a linguistic sense. Perhaps there’s a theory that neatly derives the patterns found in this dataset without resorting to post hoc justifications for individual cases. Or maybe there will be one in future. But for the moment I’m not aware of one.
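For anyone who wants to check the pronoun tallies, the following Python sketch recomputes them from the symbol lists given in this post. Nothing here comes from the paper's data files; the dictionary is simply my transcription of the lists above.

```python
# Symbols negatively associated with each pronoun, as listed above.
avoided = {
    "I": set("upbtslr"),
    "you": set("uoptdqsrl"),
    "we": set("pls"),
}

pronoun_total = sum(len(symbols) for symbols in avoided.values())
shared = sorted(set.intersection(*avoided.values()))

print(pronoun_total)                # 19
print(shared)                       # ['l', 'p', 's']
print(f"{pronoun_total / 36:.1%}")  # 52.8% of the 36 negative associations
print(f"{pronoun_total / 74:.1%}")  # 25.7% of all 74 associations
```

The intersection picks out p, l and s as the only symbols avoided by all three pronouns.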

The paper calls for “a reexamination of the fundamental assumption of the arbitrariness of the sign”. It looks like the only people who should be reexamining this fundamental assumption are those who think that the data presented in this paper in any way challenges the original statement of the arbitrariness of the sign in the Course in General Linguistics. Moreover, the best way to reexamine it is to go back and read what the Course actually says.

So what does this paper tell us?

I’ve argued that the findings of this paper don’t contradict Saussure’s original position in the slightest – in fact, they provide impressive (if perhaps unnecessary) support for it.

One might try to make the case that the target of the Blasi et al. paper wasn’t Saussure or the Course. That seems implausible to me, as the Course is the very first citation in the very first sentence (though Albert Riedlinger’s name is misspelled in the list of references). More generally, the way this paper is structured puts it in a long line of papers where an orthodoxy is challenged. We see this time and time again in linguistics, for instance with Chomsky, the poverty of the stimulus, and recursion. In the last two cases, challengers have taken a strong version of the claim and argued that it is false, and in both cases some defenders have argued that the original claim, while still contentful, does not in fact make the predictions that the challengers suggest. It certainly makes for a great media game – though only if the orthodoxy is very well established, either by virtue of its age or by being associated with one of the most prominent figures in the field. In the case of Saussurean arbitrariness, both of these prerequisites are met.

The paper does demonstrate very clearly, however, that there are associations between particular segments and particular meanings that are cross-linguistically robust, and that geographical proximity and common ancestry can’t be the only cause of this. While none of this would have surprised Saussure, it’s still a valuable demonstration in its own right.

Released under a CC-BY-SA 4.0 licence.


  • Blasi, Damián E., Wichmann, Søren, Hammarström, Harald, Stadler, Peter F., and Christiansen, Morten. 2016. Sound-meaning association biases evidenced across thousands of languages. Proceedings of the National Academy of Sciences of the United States of America, Early Edition. DOI: 10.1073/pnas.1605782113.
  • Joseph, John E. 2012. Saussure. Oxford: Oxford University Press.
  • Saussure, Ferdinand de. 1983 [1916]. Course in General Linguistics. Eds. Charles Bally & Albert Sechehaye. Trans. Roy Harris. La Salle, Illinois: Open Court.
  • Stawarska, Beate. 2015. Saussure’s philosophy of language as phenomenology: undoing the doctrine of the Course in General Linguistics. Oxford: Oxford University Press.

2 thoughts on “Saussure vindicated”

  1. Just to throw in the most obvious hypothesis as to why personal pronouns do not contain certain letters. They are used commonly, so they need to be pronounced with minimum expenditure of energy or time (hence they are so short). Perhaps some letters are more suitable for that purpose than the others?


  2. Yes, I was wondering whether word length might be a factor. I don’t know why p, l and s would be any worse for short words than other sounds, though. And they do attempt to control for word length in ways that seem reasonable. Also, negation markers tend to be very short, but they don’t find any associations there at all.


