Language Log » More on UM and UH
A few days ago ("Fillers: Autism, gender, and age", 7/30/2014), I noted an apparent similarity between male/female differences in UM/UH usage, and an autistic/typical difference reported in a poster by Gorman et al. at the IMFAR 2014 conference.
This morning I thought I'd take a closer look at the patterns in a large published conversational-speech dataset.

Executive summary: there is a large sex difference in filled-pause usage, favoring males by about 38%; there is an enormous sex difference in UM/UH ratio, favoring females by a factor of about 3.1; these sex differences are mainly driven by the difference in UH usage, which favors males by a factor of about 2.5; and older speakers use UH more and UM less, resulting in a large decrease of UM/UH ratios with age.
The general pattern of gendered filled-pause usage in English has been at least partly replicated in several other datasets, including the spoken part of the British National Corpus, but the details are sometimes quite different. (See my earlier post, and planned future posts, for some discussion.)

But all the important questions remain open, for example: Are the sex effects due to functional, iconic, or physiological differences between UM and UH, or are they arbitrary gender markers? Do the age effects reflect a change in progress, or a life-cycle effect (e.g. due to changes in sex hormone levels)? Are the patterns the same or different across geographical, socio-economic, and ethnic varieties of English? Are there analogous phenomena in other languages?
I'm looking at data from the Fisher collections (part 1 and part 2). If we lump all the 11,972 speakers together into categories based on the gender assessments made during call auditing, we get these overall statistics for the two types of filled pause:

| | Total words | um | um % | uh | uh % | um+uh % | um/uh ratio |
|---|---|---|---|---|---|---|---|
| Males | 10,163,101 | 82,659 | 0.81% | 120,563 | 1.19% | 2.00% | 0.686 |
| Females | 12,986,418 | 128,368 | 0.99% | 60,395 | 0.47% | 1.45% | 2.125 |
These numbers establish the basic pattern for this dataset: females overall use um somewhat more than males (about 22% more on average); males use uh a lot more than females (about 2.5 times as often on average); males use filled pauses (um+uh) more than females (about 38% more on average); the um/uh ratio is more than 3 times as large for females as for males.
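The comparisons above can be re-derived from the raw counts. The following is a minimal sketch, not the script used for the post; the counts are copied from the overall table, and the dictionary layout and function name are assumptions made for illustration.

```python
# Counts copied from the overall table (speakers grouped by audited gender).
counts = {
    "male": {"words": 10_163_101, "um": 82_659, "uh": 120_563},
    "female": {"words": 12_986_418, "um": 128_368, "uh": 60_395},
}


def rates(c):
    """Return per-word percentages and the um/uh ratio for one group."""
    um_pct = 100 * c["um"] / c["words"]
    uh_pct = 100 * c["uh"] / c["words"]
    return {
        "um_pct": um_pct,
        "uh_pct": uh_pct,
        "filled_pct": um_pct + uh_pct,
        "um_uh_ratio": c["um"] / c["uh"],
    }


m = rates(counts["male"])
f = rates(counts["female"])

# Between-group comparisons, each expressed as a ratio of the two groups' values.
comparisons = {
    "um rate, female / male": f["um_pct"] / m["um_pct"],
    "uh rate, male / female": m["uh_pct"] / f["uh_pct"],
    "um+uh rate, male / female": m["filled_pct"] / f["filled_pct"],
    "um/uh ratio, female / male": f["um_uh_ratio"] / m["um_uh_ratio"],
}

for label, value in comparisons.items():
    print(f"{label}: {value:.2f}")
```

A value of 1.38 for the um+uh comparison corresponds to "about 38% more"; the uh and um/uh comparisons are easier to read as factors than as percentages.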
There's a modest amount of accommodation to interlocutor sex: males use uh about 12% less often when talking with a female rather than a male, and females use uh about 20% more often when talking with a male rather than a female:

| Speaker | Interlocutor | Total words | uh count | uh % |
|---|---|---|---|---|
| Male | Male | 6,322,608 | 78,680 | 1.24% |
| Male | Female | 3,840,493 | 41,883 | 1.09% |
| Female | Male | 3,656,054 | 19,433 | 0.53% |
| Female | Female | 9,330,364 | 40,962 | 0.44% |
There's a smaller accommodation effect for um: males use um about 8% more often when talking with a female, and females use um about 1% less often when talking with a male:

| Speaker | Interlocutor | Total words | um count | um % |
|---|---|---|---|---|
| Male | Male | 6,322,608 | 49,973 | 0.79% |
| Male | Female | 3,840,493 | 32,686 | 0.85% |
| Female | Male | 3,656,054 | 35,922 | 0.98% |
| Female | Female | 9,330,364 | 92,446 | 0.99% |
As a result, the overall male um/uh ratio is about 23% higher (a shift in the female direction) with an opposite-sex interlocutor, while the female um/uh ratio is about 18% lower (a shift in the male direction):

| | Male Interlocutor | Female Interlocutor |
|---|---|---|
| Male Speaker | 0.635 | 0.780 |
| Female Speaker | 1.849 | 2.257 |
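The four um/uh ratios follow directly from the um and uh counts in the two accommodation tables. Here is a small sketch of that arithmetic; the data structure and variable names are assumptions chosen for illustration, and only the counts come from the post.

```python
# (speaker sex, interlocutor sex) -> (um count, uh count),
# copied from the two accommodation tables.
cells = {
    ("male", "male"): (49_973, 78_680),
    ("male", "female"): (32_686, 41_883),
    ("female", "male"): (35_922, 19_433),
    ("female", "female"): (92_446, 40_962),
}

# um/uh ratio for each speaker/interlocutor combination.
ratios = {pair: um / uh for pair, (um, uh) in cells.items()}

# Shift with an opposite-sex interlocutor, relative to a same-sex one.
male_shift = ratios[("male", "female")] / ratios[("male", "male")]
female_shift = ratios[("female", "male")] / ratios[("female", "female")]

for pair, value in ratios.items():
    print(pair, f"{value:.3f}")
print(f"male opposite/same: {male_shift:.2f}")
print(f"female opposite/same: {female_shift:.2f}")
```

Note that the size of the female shift depends on which ratio is taken as the base: the opposite-sex ratio is about 18% lower than the same-sex one, or equivalently the same-sex ratio is about 22% higher than the opposite-sex one.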
At the level of individual speakers, the basic patterns hold up. For this part of the study, I restricted the dataset to the 19,906 calls where the evaluation by post-call auditors agrees with the subject's demographic information collected before the call. In a small fraction of cases, it seems that a different member of the same household answered the phone and participated in a call.
Males tend to use filled pauses (UM+UH) somewhat more than females do, as shown in the "violin plot" below:

Much of the difference is in the high-usage tail, so that the ratio of median UM+UH percentages (male 1.59/female 1.29 = ratio of 1.23) is quite a bit smaller than the ratio of means (male 2.00/female 1.45 = ratio of 1.38).
For looking at individual variation in um vs. uh usage, it makes more sense to use the proportion of um in the total, UM/(UM+UH), rather than the UM/UH ratio, which is unstable for small samples. If we limit consideration to those speakers whose transcripts contain at least 10 filled pauses, we get this:
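To make the stability point concrete, here is a minimal sketch of the per-speaker measure with the minimum-count filter. The function names and the example speakers are hypothetical; only the formula UM/(UM+UH) and the threshold of 10 filled pauses come from the text.

```python
def um_proportion(um, uh, min_fillers=10):
    """Return UM/(UM+UH), or None if the speaker has too few filled pauses."""
    total = um + uh
    if total < min_fillers:
        return None
    return um / total


def um_uh_ratio(um, uh):
    """Return UM/UH, or None when UH is zero and the ratio is undefined."""
    return um / uh if uh else None


# Hypothetical speakers: name -> (um count, uh count).
speakers = {"A": (3, 0), "B": (9, 1), "C": (12, 12), "D": (1, 19)}

for name, (um, uh) in speakers.items():
    print(name, um_uh_ratio(um, uh), um_proportion(um, uh))
```

The proportion always lies between 0 and 1, whereas the ratio is undefined for a speaker with no UH and can swing widely when the UH count is small.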
In order to look at age effects, I've divided speakers at the 33rd and 67th percentiles by age, which in this collection comes out to those younger than 28, those between 28 and 40, and those older than 40.
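The age split can be sketched as follows. The cut points 28 and 40 are the ones stated above; the group labels, the sample ages, and the use of `statistics.quantiles` to find tercile boundaries are assumptions for illustration, since the post does not say how the percentiles were computed.

```python
import statistics


def age_group(age, lower=28, upper=40):
    """Assign a speaker to one of three age bands using the stated cut points."""
    if age < lower:
        return "younger"
    if age <= upper:
        return "middle"
    return "older"


# Hypothetical ages, only to show how tercile cut points could be estimated.
ages = [19, 22, 25, 27, 28, 31, 35, 40, 44, 52, 60, 67]
cut_points = statistics.quantiles(ages, n=3)  # two boundaries for three groups

print(cut_points)
print([age_group(a) for a in ages])
```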
Quite personally, I find 'um' and 'uh' somewhat different. 'Um' is the sound you make when you're actively 'wondering' or trying to recall something. 'Uh' is the sound you make when you're trying to think of the next word. 'Um' therefore sounds less conf