Language Log » Um, there's timing information in Switchboard?
The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in and-uh), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.
Clark and Fox Tree (2002) have presented empirical evidence, based primarily on the London-Lund corpus (LL; Svartvik & Quirk, 1980), that the fillers uh and um are conventional English words that signal a speaker's intention to initiate a minor and a major delay, respectively. We present here empirical analyses of uh and um and of silent pauses (delays) immediately following them in six media interviews of Hillary Clinton. Our evidence indicates that uh and um cannot serve as signals of upcoming delay, let alone signal it differentially: In most cases, both uh and um were not followed by a silent pause, that is, there was no delay at all; the silent pauses that did occur after um were too short to be counted as major delays; finally, the distributions of durations of silent pauses after uh and um were almost entirely overlapping and could therefore not have served as reliable predictors for a listener. The discrepancies between Clark and Fox Tree's findings and ours are largely a consequence of the fact that their LL analyses reflect the perceptions of professional coders, whereas our data were analyzed by means of acoustic measurements with the PRAAT software (www.praat.org). [...] Clark and Fox Tree's analyses were embedded within a theory of ideal delivery that we find inappropriate for the explication of these phenomena.
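The abstract makes two quantitative claims. First, most fillers were followed by no silent pause at all. Second, the post-uh and post-um pause durations overlap almost completely. Both are easy to state as computations.

The sketch below is only an illustration, not the procedure used in the quoted paper. It assumes you already have one post-filler pause duration per token, in seconds, with 0.0 meaning that speech resumed without a pause. It uses a simple histogram overlap coefficient, with an arbitrarily chosen 50 ms bin width, as one possible way to put a number on "almost entirely overlapping". The function names are hypothetical.

```python
from collections import Counter


def share_followed_by_pause(pauses):
    """Fraction of fillers whose following silent pause is longer than zero.

    `pauses` holds one value per filler token: the duration in seconds of the
    silence immediately after it, or 0.0 if speech resumed without a pause.
    """
    if not pauses:
        return 0.0
    return sum(1 for p in pauses if p > 0) / len(pauses)


def binned_proportions(durations, bin_width):
    """Map bin index -> fraction of the durations that fall in that bin."""
    counts = Counter(int(d // bin_width) for d in durations)
    total = len(durations)
    return {b: c / total for b, c in counts.items()}


def overlap_coefficient(a, b, bin_width=0.05):
    """Histogram overlap: the sum over bins of min(p_a, p_b).

    1.0 means the two binned distributions are identical; 0.0 means they share
    no bins, so pause duration alone would perfectly separate them.
    """
    pa = binned_proportions(a, bin_width)
    pb = binned_proportions(b, bin_width)
    return sum(min(pa.get(k, 0.0), pb.get(k, 0.0)) for k in pa.keys() | pb.keys())
```

An overlap near 1.0 would mean that knowing whether you just heard uh or um tells you very little about how long the following pause is likely to be.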
I haven't seen any recent defenses of the Clark & Fox Tree position on this issue, which I think is too bad, since the core of their position (that filled pauses are part of the linguistic signaling system, rather than simply symptoms of its malfunction) seems worth preserving. But the debate is apparently still alive, since there are recent publications like Ian Finlayson and Martin Corley, "Disfluency in dialogue: An intentional signal from the speaker?", Psychonomic Bulletin and Review 2012:
[P]articipants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.
So I thought I'd report, FWIW, on a Breakfast Experiment that looks at the duration distribution of um, uh, and adjacent silences in the Switchboard corpus. This exploration is connected to our recent flurry of posts on UM and UH (see here for some links), but it also highlights the curious disconnect between speech science and speech technology, in ways that I'll point out as they emerge.
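For anyone who wants to try a similar Breakfast Experiment, here is a minimal Python sketch of the bookkeeping.

It assumes time-aligned word transcripts with one token per line in the form `<utterance-id> <start> <end> <word>`. Times are in seconds and silences are written as `[silence]`. That layout is an assumption: it resembles some published Switchboard word alignments, but check it against the files you actually have. The function names are hypothetical.

For each uh and um, the sketch records the token's own duration and the durations of any silences immediately before and after it within the same utterance.

```python
from collections import defaultdict


def parse_alignment(lines):
    """Parse lines assumed to look like '<utt-id> <start> <end> <word>'."""
    tokens = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        utt, start, end, word = fields
        tokens.append((utt, float(start), float(end), word.lower()))
    return tokens


def filler_and_silence_durations(tokens, fillers=("uh", "um"), silence="[silence]"):
    """For each filler token, record its duration and the adjacent silences.

    Returns {filler: {"self": [...], "before": [...], "after": [...]}} with one
    entry per token in each list. A neighbour that is not a silence, or that
    belongs to a different utterance, is recorded as 0.0 (no pause).
    """
    result = {f: defaultdict(list) for f in fillers}
    for i, (utt, start, end, word) in enumerate(tokens):
        if word not in result:
            continue
        result[word]["self"].append(end - start)
        for key, j in (("before", i - 1), ("after", i + 1)):
            gap = 0.0
            if 0 <= j < len(tokens):
                n_utt, n_start, n_end, n_word = tokens[j]
                if n_utt == utt and n_word == silence:
                    gap = n_end - n_start
            result[word][key].append(gap)
    return result
```

Recording a missing silence as 0.0, rather than dropping the token, keeps the "no pause at all" cases in the distributions. That is exactly the category at issue in the debate quoted above.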
The primary evidence for our proposal comes from the London Lund corpus (hereafter LL corpus). It consists of 170,000 words from 50 face-to-face conversations (numbered S.1.1 through S.3.6) from the Svartvik and Quirk (1980) corpus of English conversations. [...]
Brief pauses of one light foot are marked with periods (.), and unit pauses of one stress unit with dashes (-). When we need a measure of pause length, we treat the unit pause as 1 unit long, and the brief pause as 0.5 units long, so . - is a 1.5 unit pause, and - - - is a 3 unit pause. [...] Prolonged syllables are marked with colons (:), as in u:m. Uh and um were sometimes pronounced in brief or normal form, which we will write uh and um, and other times in prolonged form, which we will write u:h and u:m. The surreptitiously recorded speakers produced 3904 fillers (uh 898, u:h 1213, um 530, u:m 1263).
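The LL pause arithmetic is simple enough to write down directly. This sketch does two things. It converts a run of pause marks such as . - into units, using the 0.5 and 1 unit values given in the quote. It also splits a filler spelling such as u:m into its base form and a prolongation flag. The function names are hypothetical.

Applied to the reported counts, it gives 2111 tokens of uh (1213 of them prolonged) and 1793 of um (1263 prolonged). Together these account for the 3904 total.

```python
from collections import Counter

# Pause values from the LL conventions: brief pause "." = 0.5 units, unit pause "-" = 1 unit.
PAUSE_VALUES = {".": 0.5, "-": 1.0}

# Filler counts reported for the LL corpus, keyed by transcribed form.
LL_COUNTS = {"uh": 898, "u:h": 1213, "um": 530, "u:m": 1263}


def pause_units(notation):
    """Convert a run of LL pause marks, e.g. '. -', into a length in units."""
    total = 0.0
    for symbol in notation:
        if symbol.isspace():
            continue  # marks may or may not be separated by spaces
        if symbol not in PAUSE_VALUES:
            raise ValueError(f"not a pause mark: {symbol!r}")
        total += PAUSE_VALUES[symbol]
    return total


def classify_filler(token):
    """Split an LL filler spelling into (base form, prolonged?), e.g. 'u:m' -> ('um', True)."""
    base = token.replace(":", "")
    if base not in ("uh", "um"):
        raise ValueError(f"not a filler: {token!r}")
    return base, ":" in token


def tally(counts):
    """Aggregate counts by base form, and separately count the prolonged tokens."""
    totals, prolonged = Counter(), Counter()
    for form, n in counts.items():
        base, is_prolonged = classify_filler(form)
        totals[base] += n
        if is_prolonged:
            prolonged[base] += n
    return totals, prolonged
```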
For auxiliary analyses, we draw on an answering machine corpus (AM corpus), the Switchboard corpus (SW corpus), and the Pear stories (Pear corpus). The AM corpus consists of 5000 words in 63 calls to telephone answering machines, section S.9.3 in the full computerized version of the LL corpus. It contains only 319 fillers (uh 69, u:h 166, um 6, u:m 78). The SW corpus is a 2.7 million word corpus of telephone conversations (Godfrey, Holliman, & McDaniel, 1992). It marks uh, um, and sentence boundaries, but not prolongations or pauses [...]