
Abstract
When adults attempt to learn new speech sounds, they do so in the context of the phonology of their native language. The purpose of the present work is to investigate, in individuals, 1) the process of learning to perceive a new speech sound, 2) the impact of the new sound on an acoustically similar native category, and 3) transfer of learning to a novel phonetic context.
Monolingual American English speakers were required to learn to recognize the Hindi voiced, unaspirated, dental stop consonant. A synthetic voiced speech continuum was created, spanning a range from Hindi dental to American English alveolar stop consonants. An initial perceptual mapping procedure that included identification, judged goodness, and difference-rating tasks established how participants perceived the synthetic stimuli. To learn the required perceptual distinction, participants completed a two-alternative, forced-choice training program using a variety of voiced, natural speech stimuli produced by several talkers. Learning was monitored throughout training. The original mapping procedure was repeated immediately following training and again after at least a two-week delay.
Results indicated that the nature of change during the learning process was speaker-specific and depended on how the listener perceived the stimuli prior to training. Contextual effects were also noted, with the order of stimulus presentation strongly influencing the perception of phonetic differences.
The Dynamics of Learning to Hear New Speech Sounds
One of the outstanding questions in speech research concerns how the perception of speech sounds by infants, children, and adults is altered by linguistic experience. This question, and its subsequent answer, will likely shed light on how phonological distinctions can be regained in aphasia, or learned in developmental language disorders.
One way to examine this issue is to study how adult listeners become attuned to distinctions not inherent in their native phonology. Not all non-native distinctions are equally easy (or equally difficult) to perceive, or learn to perceive. For example, English speaking listeners are quite good at perceiving place and voice contrasts across Zulu (oral) clicks (Best, McRoberts, & Sithole, 1988) but they do not reliably perceive the dental-retroflex contrast in Hindi even after a year of training (Tees & Werker, 1984). It may be that the ability to perceive or learn nonnative distinctions is defined by the similarity or dissimilarity of the new sounds to native ones. According to Best (1994), sounds close to those in one's native language can be assimilated into existing categories and are thus harder to learn than dissimilar sounds.
One mechanism for assimilation was proposed by Kuhl (1993). In this so-called "native language magnet theory," the best exemplar of each category is considered to be a linguistic prototype, and the distribution of prototypes in acoustic/perceptual space is the background against which the acquisition of new speech sounds occurs. Prototypes are hypothesized to function like perceptual magnets, attracting acoustically similar members of a category, making them more perceptually similar to, and hence less discriminable from, the category prototype. Discriminability depends on how far each non-native sound is from the native prototype. When non-native sounds are attracted to different native prototypes or magnets, they will be perceptually distinct.
Experiments that support perceptual assimilation or native language magnet theory typically consider the native sound or prototype as an element in the general language system (see contributions in Strange, 1995, for several examples). However, the stimulus determined to be the best exemplar is dependent upon experimental conditions, changing when a different set of stimuli or even a subset of the original stimuli is used, or when the distinction of interest is presented in a different context, or when attention is directed elsewhere (e.g., Iverson and Kuhl, 1995; Nosofsky, 1984; 1986). Rather than a ''thing,'' such as a prototype or magnet, it appears that we need to understand a process that evolves over time, sensitive to current context and formed by its history.
From our perspective, it makes sense to think of perceptual space as a dynamical system and perceptual learning as a process that modifies this dynamical system (Kelso, 1995). Briefly, a dynamical system is one that evolves over time such that its present state always depends in some rule-governed way on previous states. Differential equations or maps of relevant variables offer a mathematical description of the system's behavior as time passes and parameters change. What may typically be observed in such a system are its stable behaviors, referred to as attractors. The attractor layout, or set of possible behaviors of a system, may change over time in such a way that observed behaviors change gradually or abruptly. Abrupt, or qualitative, changes are called phase transitions or bifurcations. For more information on dynamical systems, see Abraham and Shaw (1992), Kelso (1995), Kelso, Ding, and Schöner (1992), and Port and van Gelder (1995).
Our earlier work on perceptual categorization demonstrated dynamical effects in speech perception and delineated some of the factors responsible (Tuller, Case, Ding, & Kelso, 1994). A theoretical model described the stability and change of phonetic categorization and offered predictions that were later tested and confirmed (Case, Tuller, Ding, & Kelso, 1995). In this view, a perceptual category is deemed to change when the attractor corresponding to the initial category loses stability, especially in the presence of noise. The perceived transition between categories, conceptualized as attractive states of the underlying dynamical system, is an especially important point for uncovering the basic dynamics of speech perception. The learning of a new sound is viewed as the creation of an attractor that modifies the existing dynamics.
One benefit of our dynamical perspective is that it allows predictions about how learning will proceed, depending on how the stimuli are initially perceived (based, presumably, on existing preferences), not simply whether or not something will be learned. Evidence that learning consists of the interaction between pre-existing constraints that the participant brings into the learning situation and the behavior to be learned has been provided by Schöner and Kelso (1988; see also Schöner, Zanone, & Kelso, 1992). In their model, behavioral information (such as the task to be learned) acts as a parameter of the attractor dynamics, attracting behavior toward that required. When the behavioral information does not correspond to a stable attractor of the existing, intrinsic dynamics, learning is predicted to take the form of a phase transition: a new behavioral attractor is found that alters the entire dynamics. When the required task is close to, or coincides with, an existing stable pattern, cooperative mechanisms ensure that learning will proceed rapidly and smoothly. Zanone and Kelso (1992; 1994; 1997; see also Kelso, 1990) tested these model predictions in perceptuomotor learning. When participants were required to learn an environmentally-specified temporal relationship that did not correspond to any of the initially stable patterns (determined by observing how individual participants produced many different timing relationships prior to training), a phase transition occurred as the learned pattern emerged as an attractor of nearby states. When the temporal relationship to be learned coincided with an initially stable pattern, learning entailed the rapid stabilization of performance.
How might these ideas illuminate how a person learns to distinguish a sound that they have never heard (or spoken)? If a listener initially can perceive a non-native sound as ''different'' from a native one, although perhaps still acceptable as an exemplar of the native category, the existing perceptual landscape cooperates with the sound to be learned. Operationally, the rate of change of the landscape to include the sound to be learned, the progressive stabilization of the new sound, should be relatively fast. In contrast, if a listener initially perceives the non-native sound as indistinguishable from a native one, then learning to recognize the non-native sound competes with the existing perceptual space. In this case, the strength of the attraction of the to-be-learned sound increases until a qualitative change (a bifurcation, or phase transition) reflects the emergence of a new attractor. The rate of change of the perceptual space to the new sound should be slower than when the initial perceptual landscape cooperates with the new sound, and the bifurcation would be marked by high variability.
In order to test these ideas, it is necessary to modify the standard experimental techniques used in phonetic perceptual learning tasks in two ways. First, it is not sufficient simply to note that learning did or did not occur with a particular stimulus set and training regime. Observations of the changes in each listener’s behavior as learning proceeds must supplement measures of whether the trained distinction was finally learned to some criterion.
Second, the focus of analysis must be the individual, not the language. As an example, consider Iverson and Kuhl's (1996) investigation of native English speakers' perception of English /r/ and /l/, in which multidimensional scaling analyses of individual listeners' similarity ratings of stimulus pairs revealed that the warping of perceptual space corresponded best to the listener's own identification patterns. Similarly, Aaltonen, Eerola, Hellström, Uusipaikka and Lang (1997) show individual differences in mismatch negativity EEG patterns depending on how the participant categorized the stimulus sequence. In other words, perceptual learning as a result of language training must be assessed relative to the individual's ''perceptual space'' pre-training. To do this, appropriate probes, or maps, of the latter should be conducted prior to, and during, the learning process. This is particularly important in speech pathology and rehabilitation.
In the present work, the phonetic category to be learned is the voiced Hindi dental stop consonant /d̪/ and the existing category is the American English alveolar stop consonant /d/. The major articulatory distinction between these two sounds is in place of articulation: in /d̪/ the tongue tip is placed against the upper front teeth, and in /d/ the tongue tip is against the alveolar ridge. There is no phonemic contrast between the dental and alveolar place of articulation in either Hindi or American English; however, it is potentially a useable distinction in that there are at least a half dozen languages (including Malayalam and several Australian and African languages) in which such a contrast does exist (Jongman, Blumstein, & Lahiri, 1985).
In the present experiment, we ask the following questions: Does a period of training result in learning to perceive reliably a new speech sound? Does transfer of learning occur from natural speech to synthetic speech? What are the effects of learning a new speech sound on an acoustically/articulatorily close native speech sound? That is, does an individual's phonetic system reorganize during learning by modifying native categories (e.g., Flege, 1995)? What are the dynamics of the learning process itself? Does the form that learning takes depend on the relationship between the sounds to be learned and how the individual initially perceives them? Does learning persist?
To answer these questions, we use a perceptual mapping procedure that includes three different tasks (identification, judged goodness and difference ratings). These tasks together allow a more complete assessment of each listener's perceptual space than use of any of the tasks alone. Each of the tasks taps somewhat different aspects of speech perception. Identification tasks encourage phonetic coding and, when used in a variable stimulus context including different speakers, utterances, and phonetic contexts, facilitate robust category formation with training (Lively, Logan, & Pisoni, 1993; Pisoni & Lively, 1995). The judged goodness task examines the internal structure of a category in a way that an identification task obscures, allowing the listener to determine how good an exemplar of a category a given stimulus is, focusing attention on differences among stimuli. Data from the difference-rating task allow one to investigate the internal structure of one or more categories simultaneously. Comparing the results of these tasks gives a fuller picture of how a given listener perceives the stimuli.
A group of monolingual American English listeners first completed the three-task perceptual mapping procedure and then participated in a 15-session training program distributed over a three-week period; their progress was monitored throughout training. Following training, the perceptual mapping procedure was repeated. A pre-training/post-training comparison and daily assessments during the training process were performed to assess whether learning occurred and, if so, to reveal its dynamics. Persistence of learning was evaluated by follow-up testing administered a few weeks after the training was completed.
Methods
Participants
Two groups of individuals participated for credit in their undergraduate psychology classes or for remuneration of $5 per hour. One group consisted of 9 native speakers of Hindi (all of whom were at least bilingual)1, and the other group consisted of 9 monolingual speakers of American English. All participants reported normal hearing and filled out a language background survey.
Training stimuli
Four native speakers of Hindi (H) and four native speakers of American English (AE) were asked to produce a list of /CV/ syllables and /aCV/ disyllables. The consonant was either /d̪/ or /d/ and the vowels were those in ''hot'', ''heat'', ''hoot'' and ''hut.'' Hindi speakers were instructed in the production of the alveolar stop, and AE speakers were instructed in the production of the dental stop. Three native speakers of AE rated all intended alveolar productions and three native speakers of H rated all intended dental productions. Only productions judged to be acceptable by all native listeners were used in training. Dental productions of two AE speakers failed to meet criterion. Training stimuli included 3 tokens each of the 16 different syllables (8 dental, 8 alveolar) from the remaining four H speakers and two AE speakers, resulting in a training stimulus pool of 288 tokens (3 X 16 X 6).2
Test stimuli
A synthetic continuum of eleven syllables was constructed with an initial voiced stop consonant spanning a range from the unaspirated Hindi dental /d̪/ to the American English alveolar /d/ (Figure 1) followed by the vowel /a/. The syllables, each 315 ms in duration, were created using the parallel tract of a Klatt synthesizer by manipulating the second (F2) and third (F3) formant onset frequencies. A 55 ms period of prevoicing was followed by a 5 ms burst, 5 ms of silence, 35 ms transition, and 215 ms steady state vowel. Klatt parameters were based on acoustic analyses of natural Hindi stimuli (Polka, 1991), Hindi synthesis parameters (Agrawal and Stevens, 1992; Agrawal, personal communication; Werker and Lalonde, 1988), alveolar synthesis parameters (Stevens and Blumstein, 1975; 1978), and extensive pilot work. The starting value of F2 was 1425 Hz for the stimulus at the dental end of the continuum and was increased in 37-38 Hz steps to 1800 Hz for the stimulus at the alveolar end. The starting value of F3 was 2650 Hz for the stimulus at the dental end of the continuum, and was increased in 15 Hz steps to 2800 Hz for the stimulus at the alveolar end of the continuum. For all stimuli, the value of F1 was 200 Hz throughout prevoicing, the burst, and vowel onset, with a 35 ms transition to the steady state vowel. The steady state vowel formants were F1 = 685 Hz, F2 = 1185 Hz, and F3 = 2585 Hz. F4 and F5 were set at 3600 and 4500 Hz, respectively, at vowel onset and remained constant throughout the remainder of the syllable. Fundamental frequency (F0) was set at 130 Hz through the prevoicing and burst to vowel onset (from 0 ms through 65 ms), fell linearly to 120 Hz at 100 ms, and then rose linearly to 140 Hz at 315 ms.
Amplitude of voicing (AV) was set to 50 dB at 0 ms and remained constant through 45 ms, ramped to 43 dB at 55 ms, 0 dB at 60 ms, turned on to 60 dB at 65 ms, remained constant through 215 ms, fell linearly to 50 dB at 265 ms, and then fell linearly to 0 dB at 315 ms. The burst was created by setting F1 = 200 Hz, F4 = 3600 Hz, amplitude of frication (AF) = 30 dB, amplitude of frication for the fourth formant relative to other formants (A4F) = 65 dB and amplitude of voicing for the first formant relative to other formants (A1V) = 20 dB at 55 ms for the 5 ms update interval. AF and A4F were set to 0 for all other update intervals. A1V was set at 40 dB for the prevoicing, 20 dB for the burst, and 60 dB (the default value) for the vowel transition and steady state. The open quotient (OQ)--the ratio of the open time of a glottal cycle to the total duration of the period--was set to 60% throughout prevoicing, 20% for the burst, and 50% (the default value) from vowel onset to the end of the syllable. Spectral tilt (TL) was set to 40 dB throughout prevoicing, then set to 0 for the remainder of the syllable. Flutter (FL), which produces a quasi-random fluctuation in F0 and makes for a more natural sounding vowel, was set to 15% at vowel onset and remained constant throughout the transition and vowel. All other Klatt parameters were left at their default values throughout the syllable.
 Figure 1. The F2 and F3 onsets for stimulus 1 and stimulus 11 (the endpoint stimuli) from the voiced synthetic continuum. For both F2 and F3, stimulus 11 has the steeper slope.
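As a rough check on the continuum described above, the F2 and F3 onset frequencies can be reproduced by linear interpolation between the stated endpoint values. The sketch below is illustrative only: the function name `onset_continuum` is hypothetical, and equal spacing between endpoints is an assumption (the reported 37-38 Hz F2 steps are consistent with rounding a 37.5 Hz step). It also confirms that the listed segment durations add up to the 315 ms syllable.

```python
def onset_continuum(start_hz, end_hz, n_stimuli=11):
    """Return n_stimuli onset frequencies equally spaced from start_hz to end_hz.

    Equal spacing is an assumption made for illustration.
    """
    step = (end_hz - start_hz) / (n_stimuli - 1)
    return [start_hz + i * step for i in range(n_stimuli)]


# Endpoint values from the stimulus description (dental end -> alveolar end).
f2_onsets = onset_continuum(1425, 1800)  # 37.5 Hz per step
f3_onsets = onset_continuum(2650, 2800)  # 15 Hz per step

# Segment durations in ms: prevoicing, burst, silence, transition, steady-state vowel.
segments_ms = [55, 5, 5, 35, 215]
total_ms = sum(segments_ms)

for i, (f2, f3) in enumerate(zip(f2_onsets, f3_onsets), start=1):
    print(f"stimulus {i:2d}: F2 onset = {f2:6.1f} Hz, F3 onset = {f3:6.1f} Hz")
print("total duration (ms):", total_ms)
```

The actual synthesis used whole-Hz values, so the onset values fed to the synthesizer presumably differed from these by rounding.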
Procedure
Participants, tested individually, were seated directly in front of a computer in a sound-insulated booth. All stimuli were presented binaurally through headphones at a comfortable listening level of about 75 dB.
Hindi Group
All participants were native speakers of Hindi (H) who participated in two sessions of about one hour each. In the first session, they performed judged goodness and identification tasks. In the second session, they performed a difference rating task. We refer to this set of three tasks as ''perceptual mapping.'' Prior to each task, the participants heard 5 AABB sets of the two endpoint stimuli from the synthetic continuum with information regarding what they heard. After the demonstration, participants had an opportunity to practice responding to 10 stimulus presentations.
For the judged goodness procedure, participants were presented with a randomized set of ten tokens each of the eleven unique stimuli. The task was to rate from 1 to 7 (poorest to best) how good an exemplar of /d/ (for the voiced continuum) the stimulus was. All ratings were made by pressing a numerically labeled key on the computer keypad. There was a 3 sec. ISI during which participants responded, and a 6 sec. interval after every 11 tokens.
For the identification task, participants were presented with a differently randomized set of ten tokens each of the eleven unique stimuli. Participants were told that stimuli would be either a synthesized version of an American English alveolar /d/ or a Hindi dental /d̪/. Differences in how the two sounds are produced were described and examples of the endpoint stimuli from the continuum representing the two sounds were demonstrated. The two-alternative forced-choice task was to identify the stimulus as either alveolar or dental. All identifications were made by pressing an appropriately labeled key on the computer keypad. There was a 3 sec. ISI during which participants responded, and a 6 sec. interval after every 11 tokens.
In the difference rating task participants heard all possible pairs of stimuli from a 6-stimulus subset of each 11-stimulus continuum (stimuli 1, 3, 5, 7, 9, and 11). Pairs were rated on a scale from 1 to 7, with 1 being 'exactly the same' and 7 being 'most different.' There were 30 possible DIFFERENT pairs of 6 unique stimuli, taking order into account, and 6 possible SAME pairs. Participants were presented with each of the possible DIFFERENT pairs four times and each of the possible SAME pairs eight times for a total of 168 pairs. Stimuli were presented in AABB format with a 1 sec. ISI and a 3 sec. ITI. After every twelve trials there was a longer break (at least 10 sec.).
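The trial counts in the difference rating task follow directly from the pair structure described above. The following minimal sketch enumerates the pairs and repetitions; the helper name `build_trials` and the use of a seeded shuffle are assumptions for illustration, not the software actually used in the experiment.

```python
import random
from itertools import permutations

STIMULI = [1, 3, 5, 7, 9, 11]  # 6-stimulus subset of the 11-stimulus continuum


def build_trials(seed=None):
    """Return a randomized list of (first, second) stimulus pairs for one test."""
    different_pairs = list(permutations(STIMULI, 2))  # ordered pairs, first != second
    same_pairs = [(s, s) for s in STIMULI]
    # DIFFERENT pairs are presented four times, SAME pairs eight times.
    trials = different_pairs * 4 + same_pairs * 8
    random.Random(seed).shuffle(trials)  # a new randomization for each test
    return trials


trials = build_trials(seed=0)
n_different = sum(1 for a, b in trials if a != b)
n_same = sum(1 for a, b in trials if a == b)
print(n_different, n_same, len(trials))
```

Counting order gives 6 x 5 = 30 DIFFERENT pairs and 6 SAME pairs, so the repetitions yield 120 + 48 = 168 trials, matching the total in the text.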
American English Group
This group of monolingual AE speakers participated in an initial 2-session perceptual mapping as defined above, 15 training sessions within a 3-week period, a 2-session perceptual mapping just after training was complete and another at least two weeks later. Each daily training session consisted of (in order) an initial free exploration period, a training set of 48 natural speech stimuli randomly chosen from the full set of 288 natural speech syllables or disyllables available, a difference rating test, a different training set of 48 natural speech stimuli, and another difference rating test. A new randomization of stimulus pairs was used for each difference rating test.
In the free exploration period of the training sessions, participants were permitted up to 10 min. to listen to some of the sounds that would be presented during the training session. Fourteen natural speech pairs, each of which differed only in the presence of /d̪/ or /d/, that included syllables and disyllables, all 4 vowel contexts, and all 6 speakers were available for comparison listening. An AABB set and a BBAA set of the two endpoint stimuli from the synthetic voiced continuum were also available.
The more regimented part of the training procedure consisted of a two-alternative forced-choice task with feedback. A single stimulus from the training set was presented for identification as either dental or alveolar. When the response was correct, a message to that effect appeared on the computer monitor and the next stimulus was presented. If the response was incorrect, that information appeared on the monitor along with the correct response. The same stimulus was played again, and then the next one was presented. If the participant failed to respond to any stimulus within 3 sec., the sentence ''You are taking a long time to respond, listen again.'' appeared on the monitor and the stimulus was presented again. Participants were permitted to listen as often as they liked to a given stimulus before responding, and the number of times the participant listened to each stimulus was recorded along with the response.
Results
A basic premise of this investigation is that the individual is the appropriate unit of analysis. To the extent that groups exhibit learning, it is a collective consequence of individuals learning. So although group analyses were performed and are reported here, they are followed by close examination of individual participants' performance.
Group effects
1. Do native speakers of Hindi and American English initially perceive the synthetic continuum differently?
The perceptual mapping data prior to any training were compared for the two language groups. Two separate ANOVAs were performed, one on the identification data and one on the judged goodness data. Each was a 2-factor ANOVA with language group (AE and H) as a between-participants factor and stimulus (1-11) as a within-participant factor. In addition, Multidimensional Scaling (MDS) solutions of the difference rating task were performed for each of the two language groups. MDS is a method that uses proximities (any measure of how similar or how different stimuli are or are perceived to be) to generate a spatial representation reflecting hidden structure in the data. In such a configuration, the farther apart two points are, the more dissimilar the stimuli are (Kruskal & Wish, 1978; see also Kewley-Port & Atal, 1989; Iverson & Kuhl, 1995; Davis & Kuhl, 1994; Pols, van der Kamp, & Plomp, 1969; Shepard, 1972, among others, for speech-related applications of MDS). All MDS solutions were performed with SAS statistical software and, following Iverson and Kuhl (1995), used the Kruskal algorithm with Kruskal’s stress formula 1, a Euclidean distance metric, and other default settings except in cases where there were obvious degenerate solutions (artificially strong clustering). In those cases, a linear transformation was applied.
When participants were asked to rate each stimulus as a member of their native language category, there was a main effect of stimulus [F(10,160) = 2.11, p < .001] as well as an interaction of group and stimulus [F(10,160) = 5.98, p < .001]. Figure 2a plots the mean judged goodness by stimulus and reveals that, in absolute terms, stimuli from the dental end of the continuum are judged by H participants to be better exemplars of their native category than are stimuli from the alveolar end of the continuum. The opposite is true of AE participants. Note that despite the interaction, all stimuli were judged by both groups to be acceptable members of their native category. The lowest mean judged goodness of any stimulus by any group was 3.99, where 4 is defined as ''okay'' or ''acceptable''.
Figure 2b shows percent alveolar identification as a function of stimulus, separately for the two language groups. The 95% confidence interval around 50%, or chance performance, goes from ~18% to ~82%. Thus, all stimuli were identified as alveolar at levels not significantly different from chance. When the percentages were converted to proportions, an arcsine transform applied, and a group by stimulus ANOVA performed, the only significant effect was of stimulus [F(10,160) = 9.41, p < .001]; stimuli from the alveolar end of the continuum were more likely to be identified as alveolar than were stimuli from the dental end. Note that this continuum is not expected to show a typical sigmoid identification function given that the stimuli do not span two phonetic categories in either H or AE.3
Figure 2c depicts the MDS solutions for H and AE participants, based on difference ratings of pairs of stimuli. For both participant groups, the MDS solution shows acoustically ordered stimuli that are evenly spaced, following the equivalent acoustic spacing in the stimulus set. This finding is consistent with the fact that all the stimuli were generally perceived to be acceptable members of the listeners' native category, but none was rated as particularly better than the others.
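Kruskal's stress formula 1 measures how well the inter-point distances in an MDS configuration match the disparities (the monotonically transformed proximities). The sketch below shows that fit measure only. It is not the SAS procedure used in the study, the function names are hypothetical, and the disparities are supplied directly rather than derived by the monotone regression step that nonmetric MDS performs.

```python
import math


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs (i < j) of configuration points."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dists.append(math.dist(points[i], points[j]))
    return dists


def kruskal_stress1(distances, disparities):
    """Stress-1 = sqrt( sum (d - d_hat)^2 / sum d^2 ); 0 means a perfect fit."""
    numerator = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)


# Hypothetical one-dimensional configuration of three evenly spaced stimuli.
config = [(0.0,), (1.0,), (2.0,)]
d = pairwise_distances(config)  # pairs (0,1), (0,2), (1,2)

print(kruskal_stress1(d, d))                # distances match disparities exactly
print(kruskal_stress1(d, [1.0, 1.0, 1.0]))  # disparities that ignore stimulus spacing
```

Lower stress indicates that the spatial configuration reproduces the rank order of the rated differences more faithfully.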
Figure 2. Language group comparisons of stimuli from the voiced synthetic continuum. Results of the judged goodness task are in (a) and results of the identification task are in (b). The dental stimulus represents the Hindi sound, and the alveolar stimulus represents the American English sound. Solid lines correspond to AEs and dashed lines to Hs. In (c), the MDS solutions to the difference ratings task are presented. Note that in (c) (and in all figures illustrating MDS solutions that follow), the letter ''E'' represents stimulus 11.
In summary, the H group heard the synthetic stimuli differently than the AE group in a manner that reflects the fact that their native language contains a dental stop consonant. Thus for AE participants, the continuum is appropriate for examining the modification of perception with training.
2. Does a period of training with natural speech result in learning to identify reliably a non-native speech sound?
For each of the 15 days of training with natural speech stimuli, the percent correct identifications for each participant were determined. A repeated measures ANOVA was performed on the arcsin transform of the proportion of correct identifications. Vowel context (4 levels), week of training (3), speaker (6), and consonant position (2) were all treated as within-participants factors. Weekly scores were obtained by averaging the daily percent correct identifications.
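The arcsine transform applied to proportions before the ANOVA can be sketched as follows. The text does not say which variant was used; the version below, asin(sqrt(p)), is one common form and is an assumption (another common form doubles this value). The function name and the sample percentages are hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


# Hypothetical daily percent-correct scores for one week, for illustration only.
daily_percent_correct = [55.0, 62.5, 70.0, 72.5, 80.0]
weekly_mean_percent = sum(daily_percent_correct) / len(daily_percent_correct)
weekly_transformed = arcsine_transform(weekly_mean_percent / 100.0)

print(weekly_mean_percent, round(weekly_transformed, 4))
```

The transform stretches values near 0 and 1, which helps stabilize the variance of proportion data before an analysis of variance.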
The target consonant was easier for participants to identify in some vowel contexts than others [F(3,24) = 7.04, p < .01], and there were several significant interactions with vowel context including a two-way interaction with speaker [F(15,120) = 5.26, p < .001] and a three-way interaction of vowel, speaker, and consonant position (syllable initial or intervocalic) [F(11,88) = 5.52, p < .001]. These effects had no obvious acoustic or featural explanations.
The significant main effect of week [F(2,16) = 23.50, p < .001] demonstrates the effectiveness of training. Tukey HSDs revealed significant improvements from week 1 to week 2 and week 2 to week 3 that can be seen in Figure 3a. Every participant showed some improvement from the first day to the last day of training as well as in the weekly averages, although improvement was minimal for 3 of the listeners. For eight out of nine cases, the first week averages were higher than the first day percent correct identifications, indicating training effects beginning during the first week.
 Figure 3. In (a), the effectiveness of training is illustrated by the improved percent correct identifications across participants each week. In (b), percent correct identifications by speaker across participants are shown.
The significant main effect of speaker [F(5,40) = 22.74, p < .001] is shown in Figure 3b. Tukey HSDs revealed that AE listeners were consistently better able to differentiate between the native and new speech sounds when they were produced by a native speaker of their own language. This difference occurred in spite of the fact that all AE speakers' intended dental utterances were judged as perfectly acceptable Hindi dental stop consonants by all native Hindi judges in the intelligibility test -- a criterion for inclusion in the training set.
One possibility is that familiarity with some speakers' voices could account for the fact that AE speakers' productions were responded to more accurately than H speakers' utterances. Two participants were very familiar with only one of the AE speakers and they both identified productions of that speaker most accurately. Three participants were familiar with both AE speakers' voices and their identification accuracy was higher for both AE speakers than for the H speakers. However, of the four participants who were not familiar with any of the voices on the training tapes, all showed greater response accuracies to each of the AE speakers than to any of the H speakers in corresponding weeks. Thus, although there are some speaker familiarity effects, they cannot entirely account for the greater accuracy of identifying non-native (Hindi) sounds produced by native speakers of American English than of identifying non-native (Hindi) sounds produced by Hindi speakers. Moreover, the lack of a significant interaction of week with speaker [F(10,80) = 1.48, p > .1] means that in spite of effective training that resulted in an overall improvement of identification, responding to AE speakers continued to be more accurate than to H speakers.
To summarize, training was effective and there were differences in the accuracy of identification of stimuli produced by AE versus H speakers that could not be accounted for on the basis of speaker familiarity. This last result is suggestive for the pairing of pathologists and clients in the clinical setting.
3. Is there transfer of learning from natural to synthetic voiced stop consonants?
As noted earlier, three participants showed only minimal improvement in natural speech identification after training. These participants also failed to distinguish between dental and alveolar synthetic stimuli. Of the six participants who showed more substantial improvements on natural speech, three were also able to distinguish reliably between dental and alveolar place of articulation in the synthetic continuum after training. These three are identified as "good learners". For all three, most of the improvement on the natural speech stimuli occurred between weeks 1 and 2; as performance accuracy improved over time, variability decreased. Average day-to-day variability of the MDS scores for good learners was compared with that of the six listeners identified as non-learners. Day-to-day variability was calculated by subtracting the Day 1 dimension score for each stimulus from the Day 2 score, the Day 2 from the Day 3 score, and so forth. The squared difference scores were then summed and the standard deviation for each interval found. Learners showed a decrease in variability as training progressed that non-learners did not (Figure 4). These results support the classification of participants as learners or non-learners and suggest that patterns of variability accurately reflect the learning process. In addition, within individuals, when variability in MDS scores in the first interval was compared to variability in the last interval, only the learners showed a decrease in variability regardless of order of presentation of stimuli. Next we examine data from these three individual learners in some detail.
 Figure 4. Day-to-day variability decreased for learners (dashed line), but not non-learners (solid line), over the course of training.
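The day-to-day variability measure described above can be illustrated with a minimal sketch. It assumes one-dimensional MDS scores stored as one list per day, and it reads "summed and the standard deviation found" as the root-mean-square change per interval; that reading, the function name, and the example numbers are assumptions for illustration, not the authors' code or data.

```python
import math

def day_to_day_variability(scores):
    """scores: list of days, each a list of dimension scores (one per stimulus).
    Returns one value per day-to-day interval (Day 1 to 2, Day 2 to 3, ...)."""
    n_stimuli = len(scores[0])
    variability = []
    for prev_day, next_day in zip(scores, scores[1:]):
        # Squared change in each stimulus's score from one day to the next, summed
        sum_sq = sum((n - p) ** 2 for p, n in zip(prev_day, next_day))
        # Assumed reading: root-mean-square change across stimuli for this interval
        variability.append(math.sqrt(sum_sq / n_stimuli))
    return variability

# Made-up scores for 3 days and 4 stimuli (illustrative only)
example = [[1.0, 0.5, -0.5, -1.0],
           [1.0, 0.5, -0.5, -1.0],
           [2.0, 0.5, -0.5, -2.0]]
variability = day_to_day_variability(example)
```

Under this reading, a listener whose MDS configuration stops changing from day to day yields values near zero, which is the pattern reported for the learners late in training.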
Learner 1.
In the pre-training identification task, Learner 1 showed chance responding to all stimuli except the three most extreme dental-end stimuli, which were nearly always identified as dental (Figure 5a left). A sequential probabilities analysis (computing the conditional probability of identifying each stimulus as a member of a particular category as a function of the response to the preceding stimulus) revealed that the response to stimuli 4-11 was likely to be the same as to the preceding stimulus presented. This clear assimilative effect can be considered additional evidence for the attractive influence of stimuli on succeeding ones reported in Tuller et al. (1994) and Case et al. (1995). In the judged goodness task (Figure 5a right), Learner 1 rated all stimuli as relatively good members of the alveolar category. Mean judged goodness ranged from 5.4 to 6.2, with the highest absolute rating given to stimulus 7. These results are intriguing in that stimuli consistently identified as dental were still judged as relatively good alveolars. This underscores not only the poverty of using only a single measure of an individual's phonetic perception but also the flexibility of perception.
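As a rough illustration of the sequential probabilities analysis, the sketch below estimates, for each stimulus, the probability of an "alveolar" response conditional on the response given to the immediately preceding stimulus. The data layout, the function name, and the trial sequence are hypothetical; the original analysis may have differed in detail.

```python
from collections import defaultdict

def sequential_probabilities(trials, category="alveolar"):
    """trials: list of (stimulus, response) tuples in presentation order.
    Returns {stimulus: {previous_response: P(response == category)}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [hits, total]
    for (_, prev_resp), (stim, resp) in zip(trials, trials[1:]):
        cell = counts[stim][prev_resp]
        cell[0] += resp == category  # count responses matching the category
        cell[1] += 1
    return {stim: {prev: hits / total for prev, (hits, total) in by_prev.items()}
            for stim, by_prev in counts.items()}

# Made-up trial sequence (illustrative only)
trials = [(1, "dental"), (6, "dental"), (11, "alveolar"), (6, "alveolar"),
          (1, "dental"), (6, "dental"), (11, "alveolar"), (6, "alveolar")]
probs = sequential_probabilities(trials)
```

In these terms, an assimilative effect for a stimulus corresponds to a higher probability of "alveolar" after an "alveolar" response than after a "dental" response; a contrast effect is the reverse.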
In both the post-training and follow-up identification tasks, the identification functions partitioned the stimuli into two clear categories (Figure 5a left) and the sequential probabilities analysis no longer revealed any order effect. The post-training judged goodness showed a significant effect of stimulus [F(10,99)=7.38, p<0.001] such that the same stimuli identified as alveolar (stimuli 6-11) were judged to be better alveolars than stimuli identified as dentals (stimuli 1-5). In the follow-up judged goodness task, stimuli on the dental end of the continuum were again judged to be poor exemplars of the alveolar category and stimuli on the alveolar end were judged to be good exemplars [F(10,99)=14.39, p<0.001]. The ''best'' judged alveolar did not shift in the post-training judged goodness task, although it did become more extreme in the follow-up. No stimulus was judged a better alveolar after training than before.
 Figure 5. Pretest (solid lines), post-test (dotted lines), and follow-up (dashed lines) performances of Learner 1 (a), Learner 2 (b), and Learner 3 (c) on identification and judged goodness tasks are illustrated as labeled. Stimuli from the dental end of a continuum represent Hindi sounds, and stimuli from the alveolar end of a continuum represent American English sounds.
MDS analyses based on difference ratings are displayed in Figure 6. Pre-training, stimuli that are acoustically closest to the best exemplar are ''attracted'' or ''pulled'' such that stimulus 5, which is identified most often as dental, is pulled toward stimulus 7, away from the more dental stimuli. During the first several days of training, the two most dental stimuli form a group and stimulus 5, although also assigned a positive dimension score, is somewhat more separated. The three alveolar-end stimuli, all assigned negative dimension scores, are intertwined. On subsequent days, the two extreme stimuli on either end cluster and the mid-range stimuli oscillate in their attraction to either end.
 Figure 6. Learner 1’s MDS scores for pre-training, 15 days of training, post-training, and follow-up. See text for description.
Although in many perceptual studies order of presentation of stimuli in a pair is presumed to have no effect (Schiffman, Reynolds, & Young, 1981), our pilot data and the sequential probabilities analysis of identification responses suggested that order of pair elements might indeed influence difference ratings. For these reasons, MDS solutions were obtained separately for stimulus pairs with the acoustically more dental stimulus presented first (D1st pairs) and for those with the acoustically more alveolar stimulus presented first (A1st pairs).4 Results are presented in Figure 7a. In the pre-training data, when the first stimulus in a pair belongs to the participant's native category (the A1st condition), stimuli that are acoustically closest to the best exemplar are attracted or pulled in; dental-end stimuli cluster separately from the alveolar-end stimuli. When the acoustically more dental stimulus is presented first (the D1st condition), there is little if any evidence of stimulus grouping before training. In the post-training and follow-up testing, the D1st pairs also show a magnet-like effect but still a weaker one than observed for the A1st pairs. When the day-to-day variability of the MDS solutions is calculated, total variability is relatively low from the beginning of training and quickly decreases over the first six days, remaining low thereafter. The initially higher variability in the total is exclusively due to the A1st pairs (Figure 7b).
 Figure 7. MDS scores taking order of presentation into account are presented for pretest, post-test, and follow-up in (a). Day-to-day variability in the dental first condition (dotted line), alveolar first condition (dashed line), and without regard to order of presentation (solid line) for Learner 1 are shown in (b).
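The separation of difference ratings by order of presentation, prior to scaling, might be organized as in the sketch below. It assumes stimulus numbers increase from the dental end to the alveolar end of the continuum and that each trial is recorded as (first stimulus, second stimulus, rating); the function name and the example ratings are hypothetical. Each resulting set of mean ratings would then be passed to whatever MDS routine is in use.

```python
from collections import defaultdict

def split_by_order(ratings):
    """ratings: iterable of (first_stimulus, second_stimulus, difference_rating).
    Returns {"D1st": {...}, "A1st": {...}}, each mapping a (lower, higher)
    stimulus pair to its mean difference rating for that presentation order."""
    collected = {"D1st": defaultdict(list), "A1st": defaultdict(list)}
    for first, second, rating in ratings:
        if first == second:
            continue  # identical pairs carry no order information
        # Assumption: lower stimulus number = acoustically more dental
        order = "D1st" if first < second else "A1st"
        pair = (min(first, second), max(first, second))
        collected[order][pair].append(rating)
    return {order: {pair: sum(vals) / len(vals) for pair, vals in pairs.items()}
            for order, pairs in collected.items()}

# Made-up ratings (illustrative only)
ratings = [(1, 5, 6), (5, 1, 3), (1, 5, 4), (5, 9, 2), (9, 5, 1)]
by_order = split_by_order(ratings)
```

Keeping the two orders apart rather than averaging them is what allows an asymmetry such as the one described for Learner 1 to show up in the separate solutions.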
To summarize, Learner 1's pre-, post-, and follow-up test results showed changes in responding to the stimuli in all three tasks consistent with transfer of learning from the natural speech stimuli to the synthetic continuum and those changes persisted over several weeks without further training. Assimilative effects that were evident in sequential probabilities analyses were also apparent in the differences in configurations of MDS solutions to D1st and A1st pairs. Learning was rapid and, as indexed by MDS solution convergence, occurred over the first six days of training.
Learner 2.
Learner 2's results for pre-training, post-training, and follow-up identification and judged goodness responses to voiced stimuli are shown in Figure 5b. Pre-training, only stimuli 7 and 8 were identified at levels different from chance (both as alveolar). Stimulus 8 was also judged as ''best'' although all stimuli were judged as acceptable members of the alveolar category, with mean judged goodness ranging from 3.6 to 5.1. After training and in follow-up testing, Learner 2's identification functions showed categorization of the stimuli into alveolar and dental percepts and stimuli on the alveolar end of the continuum were judged to be better exemplars of the alveolar category than were stimuli from the dental end [F(10,99)=6.60, p < 0.001]. Stimulus 11, judged the best alveolar after training, was also judged a better alveolar than before training. In the follow-up test, this effect was magnified [F(10,99)=8.88, p < 0.001].
The sequential probabilities analysis on pre-training identification responses revealed a clear assimilative effect across the range of stimuli except, of course, for 7 and 8 which were identified as alveolar regardless of the preceding response. The assimilative effect diminished after training although there were still remnants of it for mid-range stimuli. In the follow-up, only stimuli that were perceived as alveolar at chance levels displayed a consistently assimilative effect.
MDS analyses based on difference ratings are displayed in Figure 8. Pre-training, there is no evidence of parsing the stimulus set. During the first two days of training, the stimuli vary wildly in their relationship to each other. Shortly thereafter, stimuli separate so that the three alveolar-end stimuli are assigned negative dimension scores and the three dental-end stimuli are assigned positive dimension scores, but the absolute difference in scores is still large and the arrangement does not reflect acoustic order. On days 7 and 8, there is a widening of the alveolar group to include both mid-range stimuli. Thereafter, the two extreme alveolar stimuli were consistently close together and separate from the more dental stimuli, a pattern maintained through follow-up testing.
 Figure 8. Learner 2’s MDS scores for pre-training, 15 days of training, post-training, and follow-up. See text for description.
 Figure 9. MDS scores taking order of presentation into account are presented for pretest, post-test, and follow-up in (a). Day-to-day variability in the dental first condition (dotted line), alveolar first condition (dashed line), and without regard to order of presentation (solid line) for Learner 2 is shown in (b).
When the MDS analyses are performed taking order into account, the solution for pre-training does not respect acoustic ordering (see Figure 9a). By the post-training evaluation, both orders show grouping of stimuli, but in the A1st condition there is a tighter clustering of stimuli into two groups corresponding to alveolars and dentals. Grouping of stimuli is tighter in the follow-up as well, although the grouping of stimuli in the D1st condition looks more like the A1st grouping at this point. The day-to-day variability in the MDS solutions is shown in Figure 9b. Total variability for Learner 2 is initially higher than for Learner 1 and shows a steady decline until a few days of higher variability, around Days 8 and 9, precede a sharp drop to levels equivalent to those observed for Learner 1. Note that the peak in total variability is primarily due to variability in judging the A1st pairs.
To summarize, pre-, post-, and follow-up test results for Learner 2 showed a change in responding to the stimuli consistent with transfer of learning from the natural speech stimuli to the synthetic continuum, a change that persisted over several weeks without further training. The rate of convergence of the MDS solution was slower than for Learner 1 and the pattern of convergence more volatile in that the variability grows just before settling. As for Learner 1, the growth in variability stems from the A1st pairs. Order effects are also observed pre-training that persist for mid-continuum stimuli after training and into the follow-up two weeks later.
Learner 3.
In the pre-training identification task with synthetic stimuli, Learner 3 identified all stimuli as alveolar most of the time (Figure 5c). Hence, the sequential probabilities analysis could reveal no order effects. In the post-training and follow-up identification tasks, Learner 3 showed a clear ability to distinguish among the stimuli, with extreme dental-end stimuli identified as dental and alveolar-end stimuli identified as alveolar. The sequential probabilities analysis revealed a contrast effect for stimuli 1-5, assimilation for stimulus 6, and no discernible pattern for stimuli 7-11. In the follow-up, only stimulus 4 shows a strong (assimilative) order effect.
In the pre-training judged goodness task, all stimuli were judged to be acceptable members of the alveolar category (mean judged goodness ranging from 3.8 to 4.9 with the peak value occurring for stimulus 6 in the middle of the continuum). Post-training judged goodness data revealed a significant effect of stimulus [F(10,99) = 3.49, p < .001] with goodness (as alveolar) of stimuli 1-4 judged as 3.2 or less and goodness of stimuli 8-11 judged as 5.0 or better. Stimulus 11 was judged the best alveolar post-training and was judged a better alveolar than before. A similar stimulus effect was obtained in the follow-up judged goodness task [F(10,99) = 6.55, p < .001] with mean judged goodness ranging from 2.6 to 6.5 and the peak value occurring at stimulus 11. That is, after training the stimulus judged to be the best alveolar shifted to a more extreme stimulus.
In the pre-training MDS analysis (Figure 10), stimuli were rather disordered and spread out. That is, the stimuli all appeared to be confusable members of a single category. The daily MDS solutions show some separation between the two most extreme alveolar stimuli and the other four stimuli on Day 6 of training, but in general the MDS solutions fluctuate widely. However, during the last two days of training the solutions seem to settle with the three dental-end stimuli being assigned positive dimension scores and the three alveolar-end stimuli having negative dimension scores.
 Figure 10. Learner 3’s MDS scores for pre-training, 15 days of training, post-training, and follow-up. See text for description.
 Figure 11. MDS scores taking order of presentation into account are presented for pretest, post-test, and follow-up in (a). Day-to-day variability in the dental first condition (dotted line), alveolar first condition (dashed line), and without regard to order of presentation (solid line) for Learner 3 is shown in (b).
Learner 3's MDS solutions showed only small differences as a function of order of pair members (Figure 11a), consistent with the lack of an order effect in the identification task. In the pre-training MDS analysis, alveolar-end stimuli were closer together for A1st pairs than for D1st pairs. Post-training, only A1st pairs showed a grouping of the two dental-end stimuli. Day-to-day, the total variability of Learner 3's rated differences starts high and remains relatively high, as if this listener is still exploring the perceptual space (Figure 11b). That some stabilization process has in fact begun is indicated not only by the identification and judged goodness data but also by acoustic ordering of the MDS values post-training and the continued arrangement according to acoustics in follow-up testing. Additional evidence that learning has occurred lies in the improvement in post-training performance on each task and persistence in the follow-up. But there is also evidence that learning is still very much in progress in the patterns of variability which remain high throughout the course of training.
Discussion
The present study examined several issues regarding the learning of new speech sounds. The methodology evaluated initial perceptual constraints and probed the learning process as it unfolded in order to determine how perceptual modification occurs. Native AE speakers were trained with a wide array of natural speech exemplars to distinguish between alveolar and dental stop consonants. All participants showed at least some improvement in their ability to identify the consonants correctly. A very interesting asymmetry occurred with respect to who was doing the talking. Listeners were both better able initially to make the distinction between voiced dentals and alveolars, and to learn it, when the utterances were produced by native speakers of their own language. In previous work that reported differences in phonetic perceptual learning as a function of speaker (e.g., Lively, et al., 1994), the task was to distinguish between two non-native speech sounds in a training set that, understandably, only included speakers of the non-native language. The synthetic test continuum used here encompasses both a native and a non-native sound. We did not want listeners simply to attend to the differences between the general characteristics of the H and AE speech used in the training set and assign the label ''dental'' to anything that sounded foreign and ''alveolar'' to anything that sounded native. To avoid that, AE speakers were trained to produce dental sounds and H speakers were trained to produce alveolars, so that both consonants produced by native speakers of both languages could be used for identification training.
The advantage for training stimuli produced by AE speakers cannot be explained as a result of speaker familiarity or on the basis of differences in intelligibility across the language groups. The issue of speaker familiarity arose because at least five members of the experimental group were quite familiar with one or both AE speakers. Although speaker familiarity should not be dismissed as a factor in learning (Lively, Logan, & Pisoni, 1993; Logan, Lively, & Pisoni, 1991), it does not completely account for the asymmetry in response accuracy observed here, as the distinction between responses to AE speakers vs. Hindi speakers occurred whether individuals were familiar with the speakers or not. Intelligibility of AE and H productions was assessed as a criterion for inclusion in the training set. No alveolar productions were used that were not judged to be acceptable exemplars by AE speakers, and no dental productions were included that were not judged acceptable by H speakers. Yet it is easier to hear or to learn to hear the differences between speech sounds when the speaker has the same native language as the learner (the H speakers in the experiment did speak English fluently, but all had noticeable accents).
A clue to the differences between listeners' responses to AE speakers and H speakers may lie in an interaction with the following vowel. In the AE speakers' productions, the following vowels sound familiar and quite similar whether the consonant is dental or alveolar, perhaps allowing the listener to focus on the consonant distinction. With the H speakers' productions, the vowels are often produced with more nasality than AE vowels; they are unfamiliar, and may draw attention from the consonantal distinction. It must be remembered that the interaction of speaker and following vowel context affected the accuracy of consonant identification, so the differences are unlikely to be strictly a featural distinction. However, AE speakers may signal place of articulation using acoustic cues that AE listeners are attuned to, but that H listeners still accept as possible.
In order to assess whether listeners transferred the ability to distinguish among naturally spoken dental and alveolar stops to the synthetic continua, we assessed how each listener perceived the synthetic stimuli before and after training with natural speech. For several of the participants there was transfer of learning from the natural speech to the voiced synthetic speech. For participants who improved on the natural speech training task but did not show transfer, we speculate that learning was based on an acoustic dimension(s) different from the one manipulated in the synthetic speech continuum.
Although three participants successfully learned to distinguish between synthetic dental and alveolar stop consonants, the way in which learning progressed was not uniform within a participant or across participants. The three good learners might be construed as representing a ''continuum of closeness'' to an initial ability to hear dental-end stimuli as different from alveolar-end stimuli. Learner 1 shows the greatest ability to hear dental-end stimuli as different in the pre-training identification task and a separation of stimuli into perceptual clusters is evident in the MDS solutions after only a day or two of training. Day-to-day variability of the MDS begins fairly low and, over the first four days of training, settles quickly to an even lower level. This pattern is consistent with the idea of progressively stabilizing an already existing stable pattern.
Learner 2 shows much less evidence for an initial ability to hear dental-end stimuli as different from alveolar-end stimuli (although there is some tendency in that direction for at least one stimulus). Variability of the MDS solutions begins at a level nearly three times greater than initial variability for Learner 1, quickly begins to decrease, but peaks again just prior to reliable clustering of the MDS solutions, which occurred after 9 days of training. The peak in variability occurred almost exclusively in pairs whose first member had higher F2 and F3 values (i.e., the A1st pairs). After the settling of the MDS solutions into clusters, variability fell markedly to a level equivalent to that shown by Learner 1 and did not differ systematically between A1st and D1st pairs.
For Learner 3, there is no evidence of any initial ability to hear dental-end stimuli as different from alveolar-end stimuli. The day-to-day variability of the MDS solutions takes almost the entire training time to settle and the level to which variability falls is still higher than that exhibited by either of the other two trained listeners. As for Learner 2, order effects were noted in the variability of the difference ratings such that day-to-day variability in the A1st pairs peaked just prior to the division of the stimulus range into clusters.
Thus, for the three participants who showed learning that could be assessed using the synthetic continuum, two things were noted. First, the rate of contraction of the stimuli into groups (the time it took for perceptual clustering in the MDS solutions and for variability of the MDS solutions to fall) increased with the initial ability to hear distinctions within the continuum. Second, the absolute level of variability across the learning interval decreased with that initial ability. The pattern of variability was also influenced by how the individual perceived the stimuli prior to training, i.e., if the listener could not initially identify any stimulus as the non-native sound, a local increase in variability (analogous to critical fluctuations preceding bifurcations; cf. Schöner, Haken, & Kelso, 1986) presaged the perceptual division of the continuum.
Although order effects are not typically evaluated in multidimensional scaling analyses based on difference ratings (Iverson & Kuhl, 1995, 1996), the MDS solutions in the present work showed substantial order effects. Participants were less sensitive to small differences between stimuli (showed more perceptual grouping of stimuli) when the first member of the pair was ''more alveolar,'' that is, the pair involved a decrease in F2 and F3 from one stimulus to the next. The difference in sensitivity may be at least in part dependent on the direction of formant frequency change; the sensitivities shown by these participants are in the same direction as those of AE listeners judging differences in tokens of [i] (Sussman & Lauckner-Morano, 1995) and in glide detection (Dooley & Moore, 1988). However, the direction of frequency change is unlikely to be the whole story because the strength of the asymmetry in sensitivity adjusts with learning.
The dynamical approach to speech production and perception (Kelso, Saltzman, & Tuller, 1986; Tuller & Kelso, 1990; Tuller et al., 1994; Case et al., 1995), here tailored for learning, can provide a framework by which Kuhl's (1993) perceptual magnet effect of category prototypes and Best's patterns of assimilation may occur. In addition, because they depend crucially on initial conditions defined for the individual at a given moment in time, dynamical systems can elucidate the context dependence of phonetic perception including the strong order effects observed. In the present case, for example, when a better alveolar is presented first in a pair, the acoustic/perceptual space warps so that more stimuli are included in its basin of attraction (i.e., acoustically different stimuli are perceptually equivalent). When the order of stimuli is reversed the attraction is more limited so that the same acoustic difference now produces a perceptual difference.
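The idea that context can enlarge one category's basin of attraction can be made concrete with a toy simulation. The sketch below uses a tilted double-well potential, V(x) = kx - x^2/2 + x^4/4, with the perceptual state following the downhill gradient. The specific potential, the use of the tilt k to stand in for the influence of the preceding stimulus, the choice of the negative well as the favored category, and all parameter values are assumptions made for illustration; they are not taken from the present text.

```python
def settle(x0, k, dt=0.02, steps=5000):
    """Integrate dx/dt = -dV/dx for V(x) = k*x - x**2/2 + x**4/4, starting at x0.
    Returns the state after settling into one of the two wells."""
    x = x0
    for _ in range(steps):
        x += dt * (-(k - x + x ** 3))  # simple Euler step down the gradient
    return x

def basin_fraction(k, n=101):
    """Fraction of starting states in [-1.5, 1.5] that end in the negative-x well."""
    starts = [-1.5 + 3.0 * i / (n - 1) for i in range(n)]
    return sum(settle(x0, k) < 0 for x0 in starts) / n

# The same starting state (hypothetical value 0.1) under two hypothetical contexts
without_tilt = settle(0.1, k=0.0)  # settles in the positive-x well
with_tilt = settle(0.1, k=0.2)     # settles in the negative-x well
```

With the tilt applied, a larger share of starting states ends in the favored well, so states that would otherwise be categorized differently become perceptually equivalent; removing the tilt restores the distinction.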
For all three learners, the perceptual categorization of some stimuli as dental persists or strengthens over time, suggesting some kind of consolidation. Although there was no further explicit training or exposure to the training stimuli between the post- and follow-up tests, it was certainly the case that the participants continued to be exposed to alveolar stop consonants in day-to-day discourse. Note that learning the non-native sound modified perception of the native one (cf. Flege, 1992; 1995), especially for listeners who did not initially parse the stimulus continuum. After learning, not only did the stimulus judged as the best alveolar exemplar shift away from the dental group, but the best exemplar was also a better exemplar post-training than pre-training. Thus the perception of everyday speech sounds between the post-training and follow-up testing was in the context of a system with different sensitivities than prior to training.
Similarly, Fowler (1995) argues that speakers’ productions are constantly changing to conform more to the speech of those around them. She reported a bilingual speaker of Portuguese and English whose VOTs in voiceless stop consonants drifted toward a value typical of the ambient language environment. For example, when in the U.S., VOTs of the speaker’s Portuguese productions of voiceless stops increased.
Conclusions
In the cognitive, behavioral, and brain sciences, large strides have been made in understanding pattern formation using the concepts of self-organization and the mathematical tools of nonlinear dynamical systems (e.g., see Haken & Stadler, 1990 for a variety of different contributions in this context; Kelso, 1995). Explicitly dynamical investigations of speech include attempts to identify phonological units with dynamically specified gestures (Browman & Goldstein, 1986, 1989, 1992; Kelso, Saltzman, & Tuller, 1986; Kelso, Tuller, & Harris, 1983), to construct a topology of vowels (Wildgen, 1990) and consonants (Petitot-Cocorda, 1985) in terms of a landscape of attractors and repellers within an articulatory or acoustic space, and to model the phonological system of artificial languages as a self-organized solution of talker-based and listener-based constraints (Lindblom, MacNeilage, & Studdert-Kennedy, 1983). In our own work (Tuller et al., 1994; Case et al., 1995), we demonstrated that changes in perception that occur as the acoustic signal is altered are indicative of a pattern formation process in perception. A model of the results was proposed and unique predictions of the model tested and confirmed. The model was also extended to a description of a well-known perceptual phenomenon, the verbal transformation effect, and simulations agreed with empirical observations to a remarkable degree (Ditzinger, Tuller, Kelso, & Haken, 1997; Ditzinger, Tuller, & Kelso, 1997).
More recently, the approach has been applied to an analysis and model of auditory streaming (Almonte, Jirsa, Large, & Tuller, submitted) and it shares much with studies of perceptuomotor learning (Kelso, 1990; Kelso & Zanone, in press; Schöner, Zanone, & Kelso, 1992; Zanone & Kelso, 1992, 1994, 1997), the effects of attention on behavioral patterns (e.g., Temprado, Zanone, Monno, & Laurent, 1999), and many other investigations of learning from behavioral, theoretical, and neurophysiological perspectives (e.g., Jantzen, Fuchs, Mayville, & Kelso, 2001; Kelso, 1995; Sporns & Edelman, 1993). Important for an understanding of the mechanisms underlying speech disorders, theoretical work at the neural level is rapidly becoming more neurobiologically grounded (e.g., Frank, Daffertshofer, Peper, Beek, & Haken, 2000; Fuchs, Jirsa & Kelso, 2000; Jirsa, Fuchs & Kelso, 1998; Jirsa & Haken, 1997).
The dynamical approach on which the present experiment is based provides a theoretically motivated way to understand the process of learning to perceive new speech sounds. Fundamental to this approach is a methodological difference: instead of studying features of objectively existing prototypes (either as abstract linguistic entities or as stored multiple exemplars) in a group of listeners, we focus on the interaction of an individual perceiver with speech stimuli in context. Here we have taken a first step in this direction. We have observed patterns of phonetic perceptual learning that are consistent with the notion that reliably perceiving a new speech sound depends on whether the new sound cooperates or competes with an individual's initial perceptual capabilities and that learning serves to reorganize the perceptual space.
References
Aaltonen, O., Eerola, O., Hellstrom, A., Uusipaikka, E., & Lang, A. H. (1997). Perceptual magnet effect in the light of behavioral and psychophysiological data. Journal of the Acoustical Society of America, 101, 1090-1105.
Abraham, R. H., & Shaw, C. C. (1992). Dynamics: The geometry of behavior. Redwood City, CA: Addison-Wesley Publishing Company.
Agrawal, S., & Stevens, K. N. (1992). Synthesizing high quality Hindi speech using KLSYN88. Journal of the Acoustical Society of India, 20, 46-55.
Almonte, F., Jirsa, V.K., Large, E., & Tuller, B. (submitted). Neural model of streaming in rhythm perception.
Best, C., McRoberts, G., & Sithole, N. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345-360.
Best, C. T. (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In J. C. Goodman & H. C. Nusbaum (Eds.), The development of speech perception: The transition from speech sounds to spoken words (pp.167-224). Cambridge, MA: MIT Press.
Browman, C., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
Browman, C., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 62, 210-251.
Browman, C., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49, 155-180.
Case, P., Tuller, B., Ding, M., & Kelso, J. A. S. (1995). Evaluation of a dynamical model of speech perception. Perception and Psychophysics, 57, 977-988.
Davis, K., & Kuhl, P. K. (1994). Tests of the perceptual magnet effect for American English /g/ and /k/. Cambridge, MA: Poster presented at the 127th meeting of the Acoustical Society of America.
Ditzinger, T., Tuller, B., & Kelso, J. A. S. (1997). Temporal patterning in an auditory illusion: The verbal transformation effect. Biological Cybernetics, 77, 23-30.
Ditzinger, T., Tuller, B., Kelso, J. A. S., & Haken, H. (1997). A synergetic model for the verbal transformation effect. Biological Cybernetics, 77, 31-40.
Flege, J. E. (1992). Speech learning in a second language. In C. A. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.). Phonological development: Models, research, implications. Timonium, MD: York Press.
Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233-277). Baltimore, MD: York Press.
Fowler, C. A. (1995). A realist perspective on some relations among speaking, listening, and speech learning. Proceedings of the XIIIth International Congress of Phonetic Sciences, Vol. 1, Stockholm, 12-19 August, 470-477.
Frank, T.D., Daffertshofer, A., Peper, C.E., Beek, P.J., & Haken, H. (2000). Towards a comprehensive theory of brain activity: Coupled oscillator systems under external forces. Physica D, 144, 62-86.
Fuchs, A., Jirsa, V.K., & Kelso, J.A.S. (2000). Theory of the relation between human brain activity (MEG) and hand movements. NeuroImage, 11, 359-369.
Haken, H., & Stadler, M. (1990). Synergetics of Cognition. Berlin: Springer-Verlag.
Iverson, P., & Kuhl, P. K. (1995). Mapping the perceptual magnet effect for speech using signal detection theory and multidimensional scaling. Journal of the Acoustical Society of America, 97, 553-562.
Iverson, P., & Kuhl, P. K. (1996). Influences of phonetic identification and category goodness on American listeners’ perception of /r/ and /l/. Journal of the Acoustical Society of America, 99, 1130-1140.
Jantzen, K.J., Fuchs, A., Mayville, J.M., & Kelso, J.A.S. (2001). Neuromagnetic activity in alpha and beta bands reflects learning-induced increases in coordinative stability. Clinical Neurophysiology, 112, 1685-1697.
Jirsa, V.K., Fuchs, A., & Kelso, J.A.S. (1998). Neural field theory connecting cortical and behavioral dynamics: Bimanual coordination. Neural Computation, 10, 2019-2045.
Jirsa V.K & Haken, H. (1997). A Derivation of a Macroscopic Field Theory of the Brain from the Quasi-microscopic Neural Dynamics, Physica D, 99, 503-526.
Jongman, A. Blumstein, S. E., & Lahiri, A. (1985). Acoustic properties for dental and alveolar stop consonants: a cross-language study. Journal of Phonetics, 13, 235-251.
Kelso, J. A. S. (1990). Phase transitions and critical behavior in human bimanual coordination. American Journal of Physiology: Regulatory, Integrative, and Comparative Physiology, 15, R1000-R1004.
Kelso, J. A. S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Kelso, J. A. S., Ding, M., & Schöner, G. (1992). Dynamic pattern formation: A primer. In Baskin & J. Mittenthal (Eds.), Principles of organization in organisms (pp. 397-439). Santa Fe Institute, Santa Fe, NM: Addison-Wesley Publishing Co.
Kelso, J. A. S., Saltzman, E., & Tuller, B. (1986). The dynamical perspective on speech production: Data and theory. Journal of Phonetics, 14, 29-60.
Kelso, J. A. S., Tuller, B., & Harris, K. (1983). Converging evidence for the role of relative timing in speech. Journal of Experimental Psychology: Human Perception and Performance, 9, 829-835.
Kelso, J.A.S., & Zanone, P.G. (in press). Coordination dynamics of learning and generalization across different effector systems. Journal of Experimental Psychology: Human Perception and Performance.
Kewley-Port, D., & Atal, B. S. (1989). Perceptual differences between vowels located in a limited phonetic space. Journal of the Acoustical Society of America, 85, 1726-1740.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling, Sage University paper series on quantitative applications in the social sciences, 07-011. Beverly Hills and London: Sage Publications.
Kuhl, P. K. (1993). Early linguistic experience and phonetic perception: Implications for theories of developmental speech perception. Journal of Phonetics, 21, 125-139.
Lindblom, B., MacNeilage, P., & Studdert-Kennedy, M. (1983). Self-organization processes and the explanation of phonological universals. In B. Butterworth, B. Comrie, & O. Dahl (Eds.), Explanations of linguistic universals. The Hague: Mouton.
Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/: II. The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94, 1242-1255.
Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America, 89, 874-886.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
Petitot-Cocorda, J. (1985). Les catastrophes de la parole. De Roman Jakobson à René Thom. Paris: Maloine.
Pisoni, D. B., & Lively, S. E. (1995). Variability and invariance in speech perception: A new look at some old problems in perceptual learning. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 433-459). Baltimore, MD: York Press.
Polka, L. (1991). Cross-language speech perception in adults: Phonemic, phonetic, and acoustic contributions. Journal of the Acoustical Society of America, 89, 2961-2977.
Pols, L. C. W., Van der Kamp, L. J. T., & Plomp, R. (1969). Perceptual and physical spaces of vowel sounds. Journal of the Acoustical Society of America, 46, 458-467.
Schiffman, S. S., Reynolds, M. L., & Young, F. W. (1981). Introduction to multidimensional scaling. New York: Academic Press.
Schöner, G., & Kelso, J. A. S. (1988). A synergetic theory of environmentally-specified and learned patterns of movement coordination. I. Relative phase dynamics. Biological Cybernetics, 58, 71-80.
Schöner, G., Zanone, P. G., & Kelso, J. A. S. (1992). Learning as a change of coordination dynamics: Theory and experiment. Journal of Motor Behavior, 24, 29-48.
Schöner, G., Haken, H., & Kelso, J. A. S. (1986). A stochastic theory of phase transitions in human hand movement. Biological Cybernetics, 53, 442-452.
Shepard, R. N. (1972). Introduction to volume 1. In R. N. Shepard, A. K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences, Volume 1/Theory (pp. 1-19). New York: Seminar Press.
Sporns, O., & Edelman, G.M. (1993). Solving Bernstein’s problem: A proposal for the development of coordinated movement by selection. Child Development, 64, 960-981.
Stevens, K. N. & Blumstein, S. E. (1975). Quantal aspects of consonant production and perception: A study of retroflex consonants. Journal of Phonetics, 3, 215-233.
Stevens, K. N., & Blumstein, S. E. (1978). The search for invariant acoustic correlates of phonetic features. In P. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, NJ: Erlbaum.
Strange, W. (1995). Speech perception and linguistic experience: Issues in cross-language research. Baltimore, MD: York Press.
Sussman, J., & Lauckner-Morano, V. J. (1995). Further tests of the "perceptual magnet effect" in the perception of [I]: Identification and change/no change discrimination. Journal of the Acoustical Society of America, 97, 539-552.
Tees, R. C., & Werker, J. F. (1984). Perceptual flexibility: Maintenance or recovery of the ability to discriminate nonnative speech sounds. Canadian Journal of Psychology, 38, 579-590.
Temprado, J.J., Zanone, P.G., Monno, A., & Laurent, M. (1999). Attentional load associated with performing and stabilizing preferred bimanual patterns. Journal of Experimental Psychology: Human Perception and Performance, 25, 1595-1608.
Tuller, B., & Kelso, J. A. S. (1990). Phase transitions in speech production and their perceptual consequences. In M. Jeannerod (Ed.), Attention and Performance XIII (pp. 429-452). Hillsdale, NJ: Erlbaum.
Tuller, B., Case, P., Ding, M., & Kelso, J. A. S. (1994). The nonlinear dynamics of speech categorization. Journal of Experimental Psychology: Human Perception and Performance, 20, 3-1
van Gelder, T., & Port, R. (1995). It's about time: An overview of the dynamical approach to cognition. In R. Port & T. van Gelder (Eds.), Mind as Motion: Explorations in the dynamics of Cognition (pp. 1-44). Cambridge, MA: MIT Press.
Werker, J. F., & Lalonde, C. (1988). Cross-language speech perception: Initial capabilities and developmental change. Developmental Psychology, 24, 672-683.
Wildgen, W. (1990). Basic principles of self-organization in language. In H. Haken & M. Stadler (Eds.), Synergetics of cognition (pp. 415-426). Berlin: Springer-Verlag.
Zanone, P. G., & Kelso, J. A. S. (1992). The evolution of behavioral attractors with learning: Nonequilibrium phase transitions. Journal of Experimental Psychology: Human Perception and Performance, 18, 403-421.
Zanone, P. G., & Kelso, J. A. S. (1994). The coordination dynamics of learning: Theoretical structure and experimental agenda. In S. P. Swinnen, H. Heuer, J. Massion, & P. Casaer (Eds.), Interlimb coordination: Neural, dynamical, and cognitive constraints (pp. 461-490). San Diego: Academic Press.
Zanone, P. G., & Kelso, J. A. S. (1997). The coordination dynamics of learning and transfer: A multilevel study. Journal of Experimental Psychology: Human Perception and Performance, 23, 1454-1481.
Author Note
Pamela Case, Department of Psychology, St. Andrews Presbyterian College, Laurinburg, NC 28345.
Betty Tuller and J. A. Scott Kelso, Program in Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL 33062.
This work was performed by the first author in partial fulfillment of the requirements for a doctoral degree in Complex Systems and Brain Sciences at Florida Atlantic University. The work received support from NIMH grant R01-MH42900, NIMH Training Program T32-MH19116, and NSF grant DBS-9213995. The manuscript was completed while BT and JASK were Visiting Fellows at The Neurosciences Institute, San Diego, CA, and their support is gratefully acknowledged.
We wish to thank Shyam Agrawal and Linda Polka, respectively, for early support in the development of this project in the form of Klatt synthesis document files and a tape of Hindi speech.
Footnotes
1 Four of the Hindi participants were also speakers on the training tape.
2 Three of the speakers failed to produce three acceptable tokens for up to two of the 16 utterance types; in those cases, additional acceptable tokens were substituted from another speaker with the same native language.
3 One Hindi listener who had been exposed to Malayalam for several years in her youth showed a strongly sigmoidal identification function.
4 MDS solutions that take order of presentation into account were performed as follows: lower half triangular matrices were constructed for each listener. The matrices had data only for pairs of stimuli presented with the acoustically more dental stimulus first (D1st pairs) or only for pairs of stimuli presented with the acoustically more alveolar stimulus first (A1st pairs). Note that the validity of the MDS solution decreases with N.
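As an illustration only, the order-specific matrices described in footnote 4 might be assembled along the following lines. This is a minimal sketch, not the procedure actually used: the function name `half_matrices`, the coding of stimuli as integer steps along the continuum (0 = most dental), the averaging of repeated ratings, and the example ratings are all assumptions introduced here.

```python
from collections import defaultdict


def half_matrices(trials, n_stimuli):
    """Split difference ratings by presentation order into two lower-half matrices.

    trials: iterable of (first, second, rating). Stimuli are indexed along the
    continuum, 0 = most dental, n_stimuli - 1 = most alveolar (assumed coding).
    Returns (d1st, a1st). Cell [i][j] with i > j holds the mean rating for that
    pair in that order, or None where no rating exists.
    """
    sums = {"D1st": defaultdict(float), "A1st": defaultdict(float)}
    counts = {"D1st": defaultdict(int), "A1st": defaultdict(int)}
    for first, second, rating in trials:
        if first == second:
            continue  # identical pairs carry no order information
        # The more dental stimulus has the lower index (assumption).
        order = "D1st" if first < second else "A1st"
        cell = (max(first, second), min(first, second))  # lower-half position
        sums[order][cell] += rating
        counts[order][cell] += 1

    def build(order):
        matrix = [[None] * n_stimuli for _ in range(n_stimuli)]
        for (i, j), total in sums[order].items():
            # Averaging repeated ratings is an assumption of this sketch.
            matrix[i][j] = total / counts[order][(i, j)]
        return matrix

    return build("D1st"), build("A1st")


# Hypothetical ratings for a three-step continuum (made-up numbers).
trials = [(0, 1, 2.0), (0, 1, 4.0), (1, 0, 5.0), (2, 0, 7.0), (1, 2, 3.0)]
d1st, a1st = half_matrices(trials, n_stimuli=3)
```

Each resulting matrix could then be passed, per listener, to whatever MDS routine is in use; cells left as `None` mark pairs that were never rated in that presentation order.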