Tuesday, November 4, 2014

Language Log » UM / UH geography
Attached is a locally autocorrelated map based on the percent of um vs uh (i.e. um/(um+uh)) in a few billion words of geocoded tweets from 2013 (about 40,000 tokens each). Red are areas where "uh" is relatively more common and blue are areas where "um" is more common. It's quite a clear pattern, and probably the clearest Midland (only?) lexical pattern I've ever found.
The maps could be improved too. They're based on only about a third of the corpus, and there is also various noisy data in there that we need to strip out (e.g. blogs, retweets, Spanish). But overall the basic map is right. I'll get you some cleaned up data when I have a chance.
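For readers who want to poke at the idea, here is a minimal sketch of the two quantities mentioned above: the um share, um/(um+uh), and a local spatial autocorrelation score. It is not the procedure used for the map. It assumes a rectangular grid of made-up "counties" with rook (shared-edge) adjacency and uses the textbook local Moran's I with row-standardized weights, which may or may not be the statistic behind the map. All function names and the simulated counts are hypothetical.

```python
import random


def um_share(um, uh):
    """Share of um among all um/uh tokens: um / (um + uh)."""
    total = um + uh
    if total == 0:
        raise ValueError("no um/uh tokens in this unit")
    return um / total


def rook_neighbors(rows, cols):
    """Map each grid cell index to the indices of cells sharing an edge."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cells = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cells.append(rr * cols + cc)
            nbrs[r * cols + c] = cells
    return nbrs


def local_morans_i(values, nbrs):
    """Local Moran's I per unit, using row-standardized neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    if m2 == 0:
        raise ValueError("values have no variance")
    scores = []
    for i in range(n):
        # Average deviation of the neighbors (the "spatial lag" of unit i).
        lag = sum(z[j] for j in nbrs[i]) / len(nbrs[i])
        scores.append(z[i] * lag / m2)
    return scores


def global_morans_i(values, nbrs):
    """With row-standardized weights, global I is the mean of the local scores."""
    local = local_morans_i(values, nbrs)
    return sum(local) / len(local)


def simulate_shares(rows, cols, tokens, p_um, rng):
    """Hypothetical data: each cell gets `tokens` um/uh tokens, with
    P(um) given by p_um(row, col). Returns the um share per cell."""
    shares = []
    for r in range(rows):
        for c in range(cols):
            um = sum(rng.random() < p_um(r, c) for _ in range(tokens))
            shares.append(um_share(um, tokens - um))
    return shares


if __name__ == "__main__":
    rng = random.Random(2014)
    nbrs = rook_neighbors(10, 10)
    # No real geography: every cell has the same underlying preference.
    noise = simulate_shares(10, 10, 100, lambda r, c: 0.5, rng)
    # A built-in regional split: the left half favors uh, the right half um.
    split = simulate_shares(10, 10, 100, lambda r, c: 0.4 if c < 5 else 0.6, rng)
    print("global I, pure noise:    ", round(global_morans_i(noise, nbrs), 3))
    print("global I, regional split:", round(global_morans_i(split, nbrs), 3))
```

On data with no real geography, the scores should hover around zero. A genuine regional split should push them clearly positive, because units with a high um share then tend to sit next to other high-share units.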
Update: Perhaps all of these patterns are just social amplification of random fluctuations in cultural signifiers. In fact, at some level that has to be true. But it's also possible that what's really happening involves variation at the level of different conversational functions of what we transcribe as UM and UH, and the meaningful variation might be that some people tend to vocalize certain conversational functions differently, or that some people tend to perform certain conversational functions more often.
Some of this is obvious: we know that older people are more likely to take longer to find a word than younger people, and so the tendency for older people to use UH more often than younger people might be for this reason.
But some (of the many) possible stories of this type are less obvious. For example, there's a usage that we might call the "Awkward UM", exemplified by Alice's response "Um, yeah, go Golden Dragons" in the first panel of the 8/12/2014 Dumbing of Age strip:
So maybe Midland people are on average less likely to signal interpersonal awkwardness, or at least to signal it with phrase-initial UM. This is probably not true; but it's a plausible example of the kind of thing that might be behind the complex demographics of these simple words. [I need to get the explanation of the colors correct, though...]
I'm just waiting for someone to compile all of these indicators into one analysis that concludes that the one person in the US most likely to use "uh" is (name) of (hometown). Alert the media so they can camp on his doorstep waiting for an interview, and display in the bottom right corner of the screen during the interview a running counter of the number of times "uh" is said.
Mike said,
ahaha apparently the Unicode thumbs up character (U+1F44D) not only was eaten by the commenting system, but also caused the rest of the comment to disappear! be careful with that thumb.
Brett said,
BTW Texas is among the youngest states (median age 33.6, 2010 Census) and Florida among the oldest (40.7). Anyone surprised? You can look at this pdf report from the Census Bureau.
J. W. Brewer said,
I find it odd for a version of the U.S. "Midland" in regional-dialect-variation terms to include all of Del./Md. but exclude all of SE Pa. and South Jersey. In fact, having grown up only a few miles on the southern side of the border between New Castle Co., Del. (light blue) and Delaware Co., Pa. (pink), I find it odd for that particular bit of curved state line ( http://en.wikipedia.org/wiki/Twelve_Mile_Circle ) to be an isogloss of any sort for any linguistic phenomenon.
[(myl) The raw map (displayed in response to the previous comment) is a better place to look for details of that kind -- and remember anyhow that the geographical "atom" in Jack's analysis is the county. And since these maps were based on only a couple of billion words of tweets, the total number of tweets containing the relevant features was only 50,000 or so, so that the proportions given for individual counties are probably pretty noisy.]
J. W. Brewer said,
Yeah, so once you remove the "smoothing" the prior boundary that struck me funny isn't there and the whole Phila to Balt and environs area is a bit of a muddle, but perhaps a cohesive muddle (i.e. perhaps so close to 50/50 throughout the whole region that which counties are 53/47 in which direction is just random noise).
JW Mason said,
What kind of pattern would this smoothing algorithm produce applied to random data? Maybe someone can generate a couple examples so we can better judge how significant this apparent pattern is.
AntC said,
@JW The pattern that was called "a kind of spatial smoothing" has been replaced with (probably) the percent um = um x 100/(um+uh). The original "local spatial autocorrelation" is a measure of the tendency to cluster, i.e. if you are in a high "uh" county, how likely the next county is to be "uh" as well. Random data would give a random plot with values near zero.
Ben Zimmer said,
Along with "Awkward UM," in online use we should also consider "Dismissive UM" or "Snotty UM." Forum administrators on the Television Without Pity boards (before they were shut down) would actually ban commenters who used
