The effect of rhythm on selective listening in multiple-source environments for young and older adults

Difficulty understanding speech in noisy environments is one of the most common problems reported by older listeners. This problem is partly explained by the prevalence of hearing impairment, which increases with age (World Health Organization, 2018). However, difficulty understanding speech in noise is also common among listeners with normal hearing (Dubno et al., 1984; Summers and Molis, 2004). Many studies have compared speech-in-noise perception in hearing-matched groups of older and younger listeners and have repeatedly found that audibility differences do not fully account for individual differences in speech-in-noise performance (Gordon-Salant and Fitzgibbons, 1993; Humes et al., 1994; Pichora-Fuller et al., 1995; Frisina and Frisina, 1997).

One possible non-audiometric explanation for these individual differences in speech-in-noise abilities is variation in listeners' ability to use speech rhythm to aid recognition. Naturally produced speech has rhythmic regularities on multiple levels (Rosen, 1992; Greenberg, 1999; Ghitza and Greenberg, 2009; Peelle and Davis, 2012). These quasi-rhythmic patterns have been theorized to provide a framework for forming temporal expectations, which enable listeners to predict the timing of information-bearing components of the acoustic signal (Jones, 1976; Large and Jones, 1999; Barnes and Jones, 2000; McAuley, 2010). Recent research in the speech domain has demonstrated that the natural rhythmic regularities of speech provide important cues for navigating complex listening environments. In this research, altering (disrupting) the natural rhythm of a to-be-attended (target) speech signal presented with competing background signals has been found to reduce listener comprehension (Aubanel et al., 2016; Wang et al., 2018; McAuley et al., 2020; McAuley et al., 2021).

For example, Wang et al. (2018) manipulated naturally produced Chinese nonsense sentences by first slowing down or speeding up their speech rate. Rhythmically altered sentences were then created by combining parts of three different sentences: one at a slow rate (approximately 2.6 syllables per second), one at a natural rate (approximately 5.0 syllables per second), and one at a fast rate (approximately 7.7 syllables per second). Both the rhythmically altered sentences and unaltered natural versions of the sentences were presented in a two-talker masker consisting of naturally produced nonsense sentences. For the unaltered sentences, listeners identified syllables more accurately in words near the end of the sentence than in words at the beginning. However, this benefit disappeared for the rhythmically altered sentences, suggesting that consistent rhythmic cues were essential to this word-position effect.

More recently, McAuley et al. (2020) examined a similar effect using naturally produced Coordinate Response Measure sentences (Bolia et al., 2000). Rhythm was altered by systematically varying the speech rate of target or background sentences, temporally compressing and expanding sections of each sentence in a sinusoidal pattern. When the sentences were presented in a multi-talker babble background, listeners identified key words less accurately in rhythmically altered target sentences than in unaltered versions of the same sentences; the authors referred to this as a target rhythm effect. A background rhythm effect was also found: altering the rhythm of the background sentences facilitated perception of the target speech, indicating reduced interference from rhythmically altered background sentences.
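
The sinusoidal rate manipulation can be pictured as a smooth time warp. The sketch below is only an illustration under stated assumptions and not the processing pipeline of McAuley et al. (2020): the function name `warp_time`, the local stretch profile s(t) = 1 + depth·sin(2πt/period), and the `depth` and `period` values are all hypothetical. Integrating s(t) gives the mapping from original to altered time, so equally spaced events in the input become unequally spaced in the output, while the total duration is preserved over whole modulation periods.

```python
import math


def warp_time(t: float, depth: float = 0.25, period: float = 1.0) -> float:
    """Map original time t (s) to altered time under a hypothetical
    sinusoidal stretch factor s(t) = 1 + depth * sin(2*pi*t / period).

    s(t) > 1 expands (slows) that stretch of the signal; s(t) < 1 compresses it.
    The returned value is the integral of s from 0 to t:
        tau(t) = t + depth * period / (2*pi) * (1 - cos(2*pi*t / period))
    """
    # depth < 1 keeps s(t) positive, so the warp never reverses time.
    if not 0.0 <= depth < 1.0:
        raise ValueError("depth must be in [0, 1) so the warp stays monotonic")
    w = 2.0 * math.pi / period
    return t + (depth / w) * (1.0 - math.cos(w * t))


if __name__ == "__main__":
    # Hypothetical, equally spaced word onsets (s) in an unaltered sentence.
    onsets = [0.0, 0.25, 0.5, 0.75, 1.0]
    warped = [round(warp_time(t), 3) for t in onsets]
    print(warped)
```

With the default parameters, the intervals between warped onsets first lengthen and then shorten, while the final onset stays at 1.0 s: the rhythm is disrupted without adding or removing material.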

A theoretical framework that may explain the role of rhythm in speech understanding is dynamic attending theory (DAT). DAT proposes that a listener's attention fluctuates according to an internal attentional rhythm. These fluctuations are proposed to be entrained (synchronized) by external (stimulus) rhythms, producing quasi-periodic peaks in attention that align with important information-bearing time points in the auditory stimulus (Jones, 1976; Jones and Boltz, 1989; Large and Jones, 1999). From this perspective, a regular target rhythm benefits speech understanding because attentional peaks align more closely with information in the speech signal for regular (intact) rhythms than for irregular (disrupted) rhythms. Less temporally regular speech rhythms are therefore expected to reduce the degree of entrainment and, in turn, speech understanding.
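
The DAT prediction can be illustrated with a deliberately simplified expectancy model. This toy sketch is an assumption for illustration and is not the oscillator model of Large and Jones (1999): a hypothetical function `mean_prediction_error` places each expected onset one period after the previous event, measures how far the actual onset falls from that expectation, and then partially corrects its period estimate by a hypothetical factor `beta`. With the same mean inter-onset interval, a regular sequence yields near-zero misalignment, whereas an irregular one does not.

```python
from statistics import mean


def mean_prediction_error(onsets, initial_period, beta=0.5):
    """Toy expectancy model (hypothetical): mean absolute difference (s)
    between expected and actual event onsets."""
    if len(onsets) < 2:
        raise ValueError("need at least two onsets")
    period = initial_period
    errors = []
    for prev, curr in zip(onsets, onsets[1:]):
        # Expected onset: one period after the previous event.
        expected = prev + period
        errors.append(abs(curr - expected))
        # Partial period correction toward the interval just observed.
        period += beta * ((curr - prev) - period)
    return mean(errors)


if __name__ == "__main__":
    # Both sequences have a mean inter-onset interval of 0.3 s and end at 1.5 s.
    regular = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5]
    irregular = [0.0, 0.22, 0.6, 0.82, 1.2, 1.5]
    print(mean_prediction_error(regular, 0.3))
    print(mean_prediction_error(irregular, 0.3))
```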

Two experiments were conducted in the current study. The goal of the first experiment was to replicate the beneficial effect of rhythm in a target signal (the target rhythm effect) using a novel synthetic vowel-sequence paradigm. Natural speech stimuli, and rhythm-alteration processes that temporally expand and compress speech, can introduce confounds, raising the possibility that non-rhythmic factors contribute to the target rhythm effect. To control for this possibility and further isolate rhythm as a variable, a vowel-sequence identification task was developed. In this paradigm, listeners identified synthetic vowels presented in target vowel sequences, both in quiet and in the presence of a competing background sequence. Comparing vowel identification performance for temporally regular (isochronous) and temporally irregular (anisochronous) vowel sequences allows the potential benefit of temporal regularity to be examined.
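
The contrast between isochronous and anisochronous sequences can be made concrete by generating onset times. The following is a minimal sketch under assumed parameters; the function `onset_times`, the uniform jitter scheme, and the example values are hypothetical and do not describe the stimulus construction used in this study. The irregular version is rescaled so that it contains the same number of vowels and spans the same total duration as the isochronous version, leaving temporal regularity as the only difference between the two.

```python
import random


def onset_times(n_vowels, mean_ioi, jitter=0.0, seed=None):
    """Return hypothetical vowel onset times (s), starting at 0.

    jitter = 0 gives an isochronous sequence. jitter > 0 draws each
    inter-onset interval (IOI) uniformly from mean_ioi * [1 - jitter, 1 + jitter],
    then rescales all IOIs so the sequence spans (n_vowels - 1) * mean_ioi,
    matching the total span of the isochronous version.
    """
    if n_vowels < 2:
        raise ValueError("need at least two vowels")
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    rng = random.Random(seed)
    iois = [mean_ioi * (1.0 + rng.uniform(-jitter, jitter)) for _ in range(n_vowels - 1)]
    # Rescale so isochronous and anisochronous sequences have equal total duration.
    scale = (n_vowels - 1) * mean_ioi / sum(iois)
    onsets = [0.0]
    for ioi in iois:
        onsets.append(onsets[-1] + ioi * scale)
    return onsets


if __name__ == "__main__":
    print(onset_times(6, 0.3))                     # isochronous
    print(onset_times(6, 0.3, jitter=0.4, seed=1)) # anisochronous, same span
```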

This vowel-sequence identification paradigm has a number of advantages. The rhythm of the target sequence can be altered without distorting individual target vowels or varying the amount of energetic masking from the background sequence. This ensures that any effect of rhythm alteration on vowel identification performance cannot be attributed to changes in the target vowels or in the amount of energetic masking. Additionally, the simple vowel stimuli minimize the involvement of high-level language processes, so that performance on the experimental task does not depend on vocabulary size or on the use of syntactic or semantic context. If the beneficial effect of rhythmic regularity on speech understanding found with naturally produced speech can be replicated using the current vowel-sequence identification paradigm, free of potential influences from stimulus distortion, energetic masking, and high-level language processes, this would bolster support for DAT-based explanations of the effect of rhythmic regularity.

One factor we considered when replicating the target rhythm effect with the synthetic vowel-sequence paradigm was the rate (tempo) of the competing background material. It is well established that the temporal characteristics of competing background stimuli affect speech understanding (Miller and Licklider, 1950; Festen and Plomp, 1990; Takahashi and Bacon, 1992; Gustafsson and Arlinger, 1994). For example, the temporal fluctuations of a to-be-ignored background stimulus may mask the envelope of the target speech (Stone et al., 2012; Stone and Moore, 2014; Fogerty et al., 2016) or interfere with it (Yost et al., 1989; Yost and Sheft, 1994; Grose et al., 1994), especially when the target and background fluctuate at similar rates or have similar tempos. On the other hand, listeners may use tempo differences between the target and background as a segregation cue (Grimault et al., 2002). Accordingly, listeners' use of the rhythm of the target speech is not expected to be fully independent of the temporal envelope of the background. To this end, multiple background tempos were examined with young listeners to determine the effect, if any, of background tempo on listeners' ability to benefit from temporal regularity in the target signal.

The goal of the second experiment was to investigate the effect of age on the target rhythm effect. Perceiving speech rhythm depends on the ability to process temporal information in the speech signal, which is known to become more difficult with age. Older adults have shown greater difficulty than young adults in auditory temporal processing tasks such as gap discrimination, duration discrimination, and temporal ordering (Schneider et al., 1994; Fitzgibbons and Gordon-Salant, 1994; Snell and Frisina, 2000; Gordon-Salant and Fitzgibbons, 2004; Lister and Tarver, 2004; Humes et al., 2013; Gallego Hiroyasu and Yotsumoto, 2020; Humes, 2021). Further, age-related declines in auditory processing have been demonstrated for signals with faster rates (Fitzgibbons and Gordon-Salant, 2001; Fitzgibbons et al., 2006) and for time-compressed speech (Konkle et al., 1977; Wingfield et al., 1985; Gordon-Salant and Fitzgibbons, 1993; Gordon-Salant et al., 2004). When allowed to adjust the rate of time-altered speech, older listeners have also preferred significantly slower rates than young listeners (Wingfield and Ducharme, 1999). Results from tapping experiments show that older listeners tend to tap more slowly than younger listeners, both when asked to tap at a comfortable rate and when asked to tap as fast as they can maintain (McAuley et al., 2006; Turgeon et al., 2011). The consistency of rhythmic tapping also declines with age: when asked to tap at an isochronous rate, older listeners produce more variable inter-tap intervals than younger listeners (Turgeon and Wing, 2012). These well-documented age-related changes in rhythm perception and production suggest that older adults may benefit less than young listeners from the temporal regularities in a target signal.
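
Tapping consistency of the kind described above is often summarized with the coefficient of variation (CV) of the inter-tap intervals, that is, their standard deviation divided by their mean. The sketch below assumes that measure; the function name `iti_cv` and the example tap times are hypothetical, and the studies cited above may have quantified variability differently.

```python
from statistics import mean, stdev


def iti_cv(tap_times):
    """Coefficient of variation (SD / mean) of inter-tap intervals.

    tap_times: tap onset times in seconds, in increasing order.
    Assumes CV as the variability measure (hypothetical choice).
    """
    itis = [b - a for a, b in zip(tap_times, tap_times[1:])]
    # statistics.stdev needs at least two intervals, i.e. three taps.
    if len(itis) < 2:
        raise ValueError("need at least three taps")
    return stdev(itis) / mean(itis)


if __name__ == "__main__":
    steady = [0.0, 0.5, 1.0, 1.5, 2.0]     # hypothetical steady tapper
    variable = [0.0, 0.45, 1.0, 1.48, 2.05] # hypothetical variable tapper
    print(iti_cv(steady), iti_cv(variable))
```

Because the CV is normalized by the mean interval, it allows variability to be compared between listeners who tap at different preferred rates.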
