From the Desk of Ann Kummer
When I was in graduate school for my master’s degree, I took a required class on speech science. I have to confess that I found it very boring, and I did not see how the information that I was learning had any relevance to my future clinical practice. That was my fault (or maybe, in part, the fault of the professor). As I developed a passion for cleft palate and velopharyngeal dysfunction in my clinical practice, I came to realize that the acoustics of what we hear in speech is all due to the physics of sound transmission and airflow. The more I paid attention to the science, the better I understood the effects of abnormal structure on resonance and speech sound disorders.
I now believe that it is very important for speech-language pathologists to consider the science of speech production in their clinical practice. Therefore, I am very excited that Dr. Amy Neel agreed to put together this excellent tutorial on speech science in clinical practice for us.
By way of introduction, Dr. Neel received a bachelor’s degree in Speech-Language Pathology from Texas Christian University, a master’s degree in Speech-Language Pathology from the University of Oklahoma Health Sciences Center, and a Ph.D. in Speech and Hearing Science and Cognitive Science from Indiana University. Her teaching interests include speech science, phonetics, and motor speech disorders. Her research focuses on speech intelligibility in normal speakers and speakers with dysarthria, including those with Parkinson's disease, Pompe disease, and oculopharyngeal muscular dystrophy. Currently, she is investigating speech biomarkers for repetitive head injury in professional fighters. She is the Coordinator for ASHA Special Interest Group 19 for Speech Science.
In this article, Dr. Neel discusses why and how speech science is useful in understanding speech disorders and differences. She provides several examples of how to use speech science principles and techniques in assessing and treating speech disorders. I think you will love this article as much as I do.
Now…read on, learn, and enjoy!
Ann W. Kummer, PhD, CCC-SLP, FASHA, 2017 ASHA Honors
Browse the complete collection of 20Q with Ann Kummer CEU articles at www.speechpathology.com/20Q
20Q: Using Speech Science in Clinical Practice
After this course, readers will be able to:
- Describe methods used by speech scientists to understand speech
- Explain how the source-filter theory of speech applies to speech and voice disorders
- Discuss objective acoustic and physiologic measures for speech
- Give examples of speech science-based biofeedback measures for treating speech disorders
1. What is speech science?
Speech science is an interdisciplinary field that involves the study of the production, transmission, and perception of speech. For most people, speech is the primary mode of language expression – we make use of our body’s ability to produce sounds to convey linguistic information to listeners. Speech science includes the anatomic and physiologic study of the body systems used to produce and perceive sound. It can also involve psychological aspects of communication such as sensation, perception, and cognition. Speech science incorporates acoustics, the branch of physics dedicated to understanding how sound waves are produced, transmitted, and received. The field has expanded to include computer-based speech recognition, such as the technology used for voice assistants like Siri and Alexa.
2. How do speech scientists approach the study of speech in their research?
Some speech scientists use acoustic techniques in studying typical speech, speech disorders, and speech differences. For example, I use waveforms, spectrograms, pitch and intensity trackers, and other acoustic measures to understand reduced speech intelligibility in speakers with neuromuscular disorders. Other researchers employ physiologic measures such as ultrasound, electromagnetic articulography, electropalatography, and pressure transducers to examine articulatory movements; electroglottography and videostroboscopy to assess laryngeal behavior; magnetometry and spirometry to characterize respiratory support for speech; and MRI and nasoendoscopy to evaluate velopharyngeal activity. Some speech scientists use auditory-perceptual techniques with listeners to study speech intelligibility, naturalness, dialect, and accent.
Speech science instructors and researchers come from a variety of backgrounds. Some of us have clinical backgrounds as speech-language pathologists and audiologists. Others have advanced degrees in linguistics, psychology, cognitive science, computer science, physiology, and neuroscience.
3. Why do so many students find speech science classes difficult to understand, intimidating, or just plain boring?
Speech science classes often feature the use of mathematics to understand the production and transmission of sound. The deep desire of SLPs to help people communicate is not always accompanied by a passion for math and the physical sciences. In fact, students report anxiety and low confidence about using math (Smith, 2017). So, it’s understandable that having to use logarithms to calculate intensity levels in decibels or to employ trigonometric functions to understand sine waves is stressful for many communication sciences and disorders students.
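To see that the dreaded decibel formula is really just a single logarithm, here is a minimal Python sketch (the 10^-12 W/m^2 reference is the conventional threshold-of-hearing value used in most textbooks; the example intensity is invented for illustration):

```python
import math

def intensity_level_db(intensity_w_m2, reference_w_m2=1e-12):
    """Intensity level in decibels: IL = 10 * log10(I / I_ref).
    The default reference is the conventional threshold of hearing."""
    return 10 * math.log10(intensity_w_m2 / reference_w_m2)

# A sound 1,000 times more intense than the reference is 30 dB IL:
print(intensity_level_db(1e-9))  # 30.0
```

Every factor of 10 in intensity adds exactly 10 dB, which is why the scale compresses the enormous range of audible intensities into manageable numbers.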
Learning about the nature of sound waves can seem a bit remote from clinical applications as well. But clinicians need to recognize that effective communication can be affected at any stage in the speech chain. Speech may be produced in an atypical manner by the speaker, the environment may interfere with the clear transmission of sound, or the listener may not receive or interpret acoustic information properly. All of those aspects must be considered in assessing and treating communication disorders and differences.
It can also be difficult to bridge the gap between the theoretical knowledge gained in speech science classes and the clinical knowledge that students wish to acquire. Although many students are taught the source-filter theory of speech production, it may be difficult to apply it to clinical situations without scaffolding from instructors. A greater focus on active learning and clinical applications is occurring in our field (for example, see the ASHA Teaching Symposium on Foundational CSD Science Courses), but much work remains to be done by speech science instructors.
4. My speech science class focused a lot on the source-filter theory of speech production. Can you remind me what the source-filter theory is about?
The source-filter (or acoustic) theory of speech production is an explanation of the two-stage process of creating speech sounds. For voiced phonemes (vowels, liquids, nasals, and glides), the source is the complex glottal tone produced by the vibration of the vocal folds. The vocal tract, which consists of the throat, mouth, and nose, filters or shapes the tone produced by the vocal folds into speech sounds. The vocal tract filter allows some frequencies of that multi-frequency glottal tone to pass through to the outside air and reduces the intensity of other frequencies. When you change the shape of your vocal tract by moving your tongue, lips, and jaw, you change the frequencies that pass through or are filtered out. For fricative, stop, and affricate consonants, the source created by turbulent airflow at the site of vocal tract constriction is also filtered by the vocal tract.
The source-filter theory is important for understanding differences among vowel sounds. Different vocal tract shapes result in different patterns of formant frequencies, the bands of acoustic energy that we use to perceive vowel phonemes. For example, the vowel /i/ as in “heat” is produced with the tongue positioned high (close to the roof of the mouth) and relatively forward in the mouth. It has a low-frequency first formant, around 300 Hz, and a high-frequency second formant, around 2500 Hz. If you keep your tongue high in position but move it back toward the soft palate to produce the vowel /u/ as in “hoot,” the first formant remains low (300 Hz), and the second formant moves lower in frequency (around 900 Hz). Changes in the shape of the vocal tract result in changes in the patterns of acoustic energy that pass through the mouth, and those acoustic patterns are perceived by listeners as different vowel sounds.
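To make the filter stage concrete, the neutral (schwa-like) vocal tract is often idealized in speech science textbooks as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. A minimal sketch (the 17.5 cm length and 35,000 cm/s speed of sound are the usual textbook values, not measurements from this article):

```python
def tube_formants_hz(length_cm, n_formants=3, speed_of_sound_cm_s=35000):
    """Resonances of a uniform tube closed at one end (the glottis) and
    open at the other (the lips): F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]

# A 17.5 cm neutral vocal tract yields the classic 500/1500/2500 Hz pattern:
print(tube_formants_hz(17.5))  # [500.0, 1500.0, 2500.0]
```

Moving the tongue, lips, and jaw perturbs the tube away from this uniform shape, which is exactly what shifts the formants toward the vowel-specific values described above.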
5. But why do clinicians need to understand source-filter theory to assess and treat speech disorders?
In addition to explaining the production of typical speech sounds, the source-filter theory can provide the foundation for differential diagnosis of speech disorders and differences. You can think about whether the source or the filter or both components of speech are affected, and that knowledge can guide assessment and treatment considerations. For someone with a hoarse voice whose consonants and vowels are clearly produced, the source component is involved, so you choose assessment techniques that elucidate the function of the larynx and respiratory system. When you hear imprecise consonants accompanied by normal voice quality, you select articulation as your treatment target because the filter is affected. Sometimes speech disorders are complex, especially those associated with neurogenic disorders because several speech systems are affected. The source-filter theory can help you sort out which deficits – those related to producing the laryngeal tone and those responsible for shaping the tone into speech sounds – contribute the most to speech intelligibility. For example, people with hypokinetic dysarthria related to Parkinson's disease can have low-intensity and breathy voice quality, restricted range of pitch and loudness, and distorted consonants. Because both source and filter components of speech are involved, you should consider treatments that address both phonation and articulation, such as loud speech or clear speech.
6. Can you give me another concrete example of the clinical applicability of the source-filter theory?
Let’s think about a client who has had a laryngectomy, removal of their “voice box” because of head and neck cancer. Because they no longer have vocal folds, they can’t produce the complex glottal tone that serves as the source for speech. The vocal tract – all of the speech structures above the vocal folds – is still intact and able to filter sound. Rehabilitation focuses on replacing the speech source with something else that produces a complex sound. The buzzing sound of an electrolarynx placed against the neck is shaped by the articulators into speech sounds. For esophageal speech users, vibration of the upper esophageal tissues creates the sound that is shaped into speech sounds by the articulators.
7. If auditory-perceptual evaluation by listeners is the “gold standard” for clinical decision-making in speech disorders, why do we need to understand acoustic and physiologic measures from speech science?
Listener judgments are incredibly important in diagnosing speech disorders, determining severity, and tracking changes in speech over time, because the primary aim of speech therapy is for clients to effectively communicate to listeners. As described by Kent (1996), however, these subjective judgments of speech are susceptible to error and bias, and listeners do not always agree on the nature and severity of speech deficits. Objective acoustic and physiologic measures can be useful in assessing and treating speech disorders as well as in documenting progress in treatment along with auditory-perceptual judgments.
8. Okay, let’s look at how speech science can help me in the clinic. How can acoustic speech science techniques help me diagnose and treat speech sound disorders in children?
Phonetic transcription is often used in assessing speech sound errors, but it is not always a straightforward task. Armed with knowledge of the acoustic characteristics of speech sounds, you can use waveforms and spectrograms to detect subtle features of consonant production. For example, children who do not seem to contrast initial voiced and voiceless stops may actually produce small differences in voice onset time (VOT), the interval between the stop burst and the onset of vocal fold vibration for the following vowel. These small VOT differences can be seen in acoustic displays but are not perceptible to our ears. Children who demonstrate this phonological knowledge may acquire the voicing contrast rapidly with therapy (Tyler et al., 1990).
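Once the burst and the voicing onset have been marked with cursors on a waveform, the VOT arithmetic itself is trivial. A minimal sketch (the cursor times below are hypothetical examples, not data from the cited study):

```python
def voice_onset_time_ms(burst_s, voicing_onset_s):
    """VOT in milliseconds: the interval from the stop burst to the onset
    of vocal fold vibration (positive when voicing lags the burst)."""
    return (voicing_onset_s - burst_s) * 1000

# Hypothetical cursor times (in seconds) marked on a waveform:
long_lag = voice_onset_time_ms(0.100, 0.165)   # about 65 ms, /p/-like
short_lag = voice_onset_time_ms(0.100, 0.110)  # about 10 ms, /b/-like
print(long_lag, short_lag)
```

A child who consistently produces, say, 15 ms versus 35 ms lags is marking the contrast acoustically even though both tokens may sound like /b/ to a listener.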
Acoustic measures can serve as powerful visual biofeedback tools for remediating speech sound errors, especially the difficult-to-treat rhotic phonemes /ɹ/ and /ɚ/. Visual biofeedback allows children to see some aspect of articulatory behavior in real time so that they can modify their movements to better match typical positions. The rhotic phonemes are characterized by low third-formant frequencies and relatively high second-formant frequencies compared to non-rhotic vowels. With target formant frequencies displayed in a real-time LPC spectrum on a computer screen, children can vary their tongue placement until they match a template for a more typical tongue position and achieve more accurate rhotic productions (McAllister Byun, 2017). A free application in the iOS App Store called staRt makes this LPC biofeedback technique available to clinicians.
9. Are there any speech-science based physiologic techniques I can use with childhood speech sound disorders?
Rhotic errors can also be remediated with ultrasound and electropalatography (EPG) biofeedback. In ultrasound, a probe placed under the chin during speech production allows children to compare a visual image of their tongue movement with a model to achieve better production of rhotic phonemes (Preston et al., 2019). EPG uses a retainer-like pseudopalate containing small electrodes that register and display tongue contact with the hard palate on a computer screen. For some children, EPG tongue contact displays in conjunction with templates for more typical productions result in more accurate rhotic sound production (Hitchcock et al., 2017).
10. Can speech science measures help me work with second language learners who have accented speech?
Vowel production is one of the most common issues in producing native-like English for second-language (L2) learners. Tense vowels such as /i/ and /u/ that are common across languages are difficult for many L2 learners to differentiate from lax vowels such as /ɪ/ and /ʊ/ that are relatively infrequent in other languages. The real-time LPC display of the F1 × F2 vowel quadrilateral in the Visi-Pitch (PENTAX Medical) supplies information about tongue height and tongue advancement during vowel production. Speakers who wish to produce more native-like English can produce better tense-lax vowel distinctions using this form of visual biofeedback (Carey et al., 2015). Using knowledge gained from speech science classes, clinicians can implement biofeedback procedures for consonant production, stress, and prosody to enhance intelligibility if desired by the L2 client.
11. Tell me about some physiologic techniques for diagnosing voice disorders.
Voice specialists have a long tradition of using objective acoustic and physiologic instrumentation in their practice along with auditory-perceptual measures, so understanding the relationships between laryngeal behavior and instrumental measures is important. Imaging techniques such as videostroboscopy and high-speed photography permit careful observation of vocal fold appearance and vibration. These approaches are invasive and expensive relative to other procedures, however, so they are not accessible to many clinicians. Less invasive physiologic techniques include aerodynamic measurement of glottal airflow rates and subglottal air pressure and electroglottographic assessment of contact between the vocal folds throughout the vibratory cycle. These measures also require special equipment, so their use in clinical settings is limited.
12. If I don’t have access to specialized equipment, are there other ways I can obtain objective measures of voice?
With inexpensive acoustic analysis packages, it is easy to obtain measures of vocal pitch, intensity, and quality. Pitch tracking software can be used to assess and treat clients with phonatory issues such as mutational falsetto, limited intonational range in dysarthria, and transgender voice change. Intensity tracking software and sound level meters provide feedback about vocal amplitude to clients undergoing loud speech therapy such as LSVT. Acoustic measures associated with breathy, strained, or hoarse vocal quality, such as cepstral peak prominence, jitter and shimmer, and low/high spectral ratio, can be used to detect and monitor voice disorders during treatment.
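To show how simple one of these quality measures is at its core, local jitter can be computed directly from a sequence of glottal period durations. A minimal sketch (the period values are invented for illustration; clinical packages such as Praat apply this same basic formula, along with additional safeguards, to periods extracted automatically from the recording):

```python
def local_jitter_percent(periods_s):
    """Local jitter: the mean absolute difference between consecutive
    glottal periods, divided by the mean period, as a percentage."""
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return 100 * mean_diff / mean_period

# Four invented period durations near 10 ms (an F0 of roughly 100 Hz):
print(local_jitter_percent([0.0100, 0.0102, 0.0099, 0.0101]))  # roughly 2.3
```

Healthy voices typically show local jitter well under a couple of percent, so large cycle-to-cycle variability like this would be worth investigating further.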
13. Can speech science help me treat clients with motor speech disorders?
Keeping in mind that strength training for speech disorders is controversial (Lee, 2021), knowledge of pressure concepts from speech science can be helpful in assessing and treating weakness of the speech mechanism. First, you need to determine that weakness is the likely cause of speech deficits in your client – weakness rarely causes childhood speech sound disorders, but it can affect speech in some neuromuscular disorders. Then you need to provide resistance to build muscle strength and follow the principle of specificity of training. Relatively inexpensive handheld devices such as the Iowa Oral Performance Instrument (IOPI) and the Tongueometer can be used to obtain objective measures of tongue and lip strength and to provide resistance and biofeedback for strengthening the tongue. Respiratory muscle trainers have been used to improve respiratory and phonatory function for a variety of neuromuscular disorders, and continuous positive airway pressure (CPAP) devices have been used to treat hypernasality by increasing velopharyngeal strength.
14. Speaking of hypernasality, what other speech science techniques are helpful for children with velopharyngeal dysfunction?
To understand hypernasality, you need a thorough understanding of the resonatory system for speech. Most speech sounds in English are resonated (or filtered) in the oral cavity – speakers close off the nose from the mouth by elevating the velum and moving the walls of the upper pharynx inward. For the three nasal sounds, /m/, /n/, and /ŋ/, the velum remains lowered so that acoustic energy flows into both the nose and the mouth. Individuals with velopharyngeal dysfunction caused by cleft palate, structural anomalies of the soft palate and pharynx, or neuromuscular disorders affecting the velopharyngeal valve are unable to separate the nose from the mouth for oral sounds, resulting in hypernasal speech. In addition to subjective judgments of hypernasality, we can use the objective measure of nasalance to quantify resonance balance. Nasometers use two microphones separated by a plate to compare the amount of acoustic energy coming from the nose with the amount coming from the mouth to generate a single nasalance value – higher nasalance values signal that more acoustic energy is present in the nasal cavity. Nasometry can also be used as visual biofeedback for clients who may benefit from behavioral therapy to reduce hypernasality.
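The nasalance ratio itself is straightforward once the two microphone signals have been reduced to energy measures. A minimal sketch (the energy values are hypothetical; actual Nasometer hardware derives its energy estimates from filtered amplitude measurements of the two channels):

```python
def nasalance_percent(nasal_energy, oral_energy):
    """Nasalance: nasal acoustic energy as a percentage of the total
    (nasal + oral) acoustic energy."""
    return 100 * nasal_energy / (nasal_energy + oral_energy)

# Hypothetical energies from the nasal and oral microphones:
print(nasalance_percent(nasal_energy=1.0, oral_energy=4.0))  # 20.0
```

Because the result is a ratio of the two channels, it is relatively robust to overall speaking level: talking louder raises both energies together and leaves the nasalance value largely unchanged.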
15. I work with clients who rely on speech-generating devices for communication. Is it helpful for me to understand speech science for AAC?
Knowledge of digital signal processing, speech synthesis, and automatic speech recognition can be important for clinicians who work in augmentative and alternative communication. Older AAC devices featured synthetic speech that did not sound natural and was less intelligible than natural speech, and effectively incorporating prosody into synthetic speech remains challenging to this day. Speech science research has led to significant improvements in intelligibility and naturalness of synthetic speech. For clinicians who assist clients in voice banking and message banking for more personalized AAC output, understanding audio recording technology is helpful. SLPs may also be interested in improving the ability of voice assistants, such as Siri and Alexa, in recognizing the speech of people with communication disorders. For example, speech scientists are involved in Google Research’s Project Euphonia which aims to help voice-activated technology better understand atypical speech.
16. Are there any communication disorders for which speech science isn’t helpful?
Speech science is useful in understanding any speech disorder that affects respiration, phonation, articulation, resonance, prosody, neural control for speech, speech intelligibility, or speech naturalness. It can also help us understand the effect of hearing loss and other listener-related factors on speech perception and production. Although some concepts learned in speech science classes may help you treat swallowing disorders, speech science generally doesn’t help clinicians work with language, cognitive, or pragmatic communication disorders and differences.
17. With the focus on diversity, equity, and inclusion in our profession, is speech science relevant to providing culturally responsive services?
Our speech conveys a great deal of information about our identity regardless of the intended linguistic message (Neel, 2021). Listeners make judgments about the age, gender, sexual orientation, region of origin, race, linguistic background, physical size, and health of the speaker from acoustic cues in the speech signal. These judgments about personal identity can evoke expectations, stereotypes, and prejudices that affect clinical care. Awareness of cues for personal identity in speech can help clinicians avoid misdiagnosis of speech disorders and differences and other discriminatory practices leading to health disparities in communication disorders.
18. I’m interested in using speech science in my clinical practice, but I’m a bit apprehensive about technology. What’s a good way to get started?
You can easily incorporate technology into your clinical practice by using acoustic analysis software to calculate diadochokinetic rates, measure maximum vowel duration, and estimate pitch range. For example, the free and easy-to-use application WASP2 (Huckvale, 2021) allows you to display waveforms, spectrograms, and pitch tracks in any computer browser. To obtain diadochokinetic rates in syllables per second using WASP2, record your client repeating alternating or sequential syllables, select five seconds of syllable production with the right and left mouse cursors, count the number of syllables, and divide that total by five. Counting syllable “blobs” in the waveform is much easier than marking dots on paper while managing a stopwatch! Similarly, it’s easy to mark the beginning and end of a sustained vowel and read the resulting duration for that interval from the computer screen. To obtain pitch range from a vowel glide or sentence, record the fundamental frequency at the highest and lowest points of the displayed pitch track.
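The arithmetic behind these clinical measures is simple once the acoustic landmarks have been marked on the display. A minimal sketch (the syllable count and F0 values below are hypothetical; the semitone conversion is one common way to express pitch range independent of a speaker's habitual pitch):

```python
import math

def ddk_rate(syllable_count, duration_s):
    """Diadochokinetic rate in syllables per second."""
    return syllable_count / duration_s

def pitch_range_semitones(f0_low_hz, f0_high_hz):
    """Pitch range in semitones: 12 * log2(high / low)."""
    return 12 * math.log2(f0_high_hz / f0_low_hz)

# 31 syllable "blobs" counted in a 5-second stretch of "puh-tuh-kuh":
print(ddk_rate(31, 5.0))  # 6.2 syllables per second

# A vowel glide from 100 Hz up to 400 Hz spans two octaves:
print(pitch_range_semitones(100, 400))  # 24.0
```

Expressing range in semitones rather than Hz makes it fair to compare speakers with different habitual pitches, since a one-octave glide is 12 semitones whether it runs from 100 to 200 Hz or from 200 to 400 Hz.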
For more advanced acoustic analysis users, the software package Praat (Boersma & Weenink, 2022) can be downloaded for Macintosh, Windows, and other operating systems. Praat is often used by speech scientists because it has tools for speech analysis, speech synthesis, manipulation of pitch and duration, and listening experiments. It allows users to write scripts to automate data collection for large sets of speech samples. Maryn (2017) has provided a tutorial for clinical voice analysis using Praat, and Styler’s “Using Praat for Linguistic Research” is helpful for SLPs as well.
19. Are there any books to help me review basic speech science concepts?
There are several textbooks that are accessible and contain information about the clinical application of speech science. Take a look at Speech and Voice Science by Alison Behrman and Speech Science: An Integrated Approach to Theory and Clinical Practice by Carole Ferrand. Preclinical Speech Science: Anatomy, Physiology, Acoustics, and Perception (Hixon, Weismer, and Hoit) is also an excellent resource.
20. Does ASHA have any resources for speech science?
ASHA’s SIG 19, the Special Interest Group for Speech Science, is an excellent way to learn more about speech science research, teaching, and clinical applications. SIG 19 maintains an online community where affiliates share information about speech science topics, it sponsors speech-science related sessions at annual ASHA conventions, and it holds live online discussions for professional development hours each year. Articles on speech science topics are published throughout the year in the journal Perspectives of the ASHA Special Interest Groups.
References

American Speech-Language-Hearing Association. Special Interest Group (SIG) 19 for speech science. https://www.asha.org/sig/19/.
Behrman, A. (2021). Speech and voice science. Plural Publishing.
Boersma, P., & Weenink, D. (2022). Praat: Doing phonetics by computer (Version 6.2.15). http://www.praat.org/
Carey, M. D., Sweeting, A., & Mannell, R. (2015). An L1 point of reference approach to pronunciation modification: Learner-centred alternatives to ‘listen and repeat’. Journal of Academic Language and Learning, 9(1), A18-A30.
e2 Scientific. The Tongueometer. https://e2scientific.com/the-tongueometer/
Ferrand, C. (2018). Speech science: An integrated approach to theory and clinical practice. Pearson.
Google Research. Project Euphonia. https://sites.research.google/euphonia/about/.
Hitchcock, E. R., Byun, T. M., Swartz, M., & Lazarus, R. (2017). Efficacy of electropalatography for treating misarticulation of /r/. American Journal of Speech-Language Pathology, 26(4), 1141-1158.
Hixon, T. J., Weismer, G., & Hoit, J. D. (2018). Preclinical speech science: Anatomy, physiology, acoustics, and perception. Plural Publishing.
Huckvale, M. (2021). WASP2, Version 2.1. https://www.speechandhearing.net/laboratory/wasp/
IOPI Medical. Iowa Oral Performance Instrument. https://iopimedical.com/products/
Kent, R. D. (1996). Hearing and believing: Some limits to the auditory-perceptual assessment of speech and voice disorders. American Journal of Speech-Language Pathology, 5(3), 7-23.
Lee, A. (2021). 20Q: Non-speech oral motor treatments: Any evidence? SpeechPathology.com.
Maryn, Y. (2017). Practical acoustics in clinical voice assessment: A Praat primer. Perspectives of the ASHA Special Interest Groups, 2(3), 14-32.
McAllister, T. staRt: Speech therapist’s app for /R/ treatment. https://wp.nyu.edu/byunlab/projects/start/
McAllister Byun, T. (2017). Efficacy of visual–acoustic biofeedback intervention for residual rhotic errors: A single-subject randomization study. Journal of Speech, Language, and Hearing Research, 60(5), 1175-1193.
Neel, A. T. (2021). Promoting cultural and linguistic competence in speech science courses. Perspectives of the ASHA Special Interest Groups, 6(1), 207-213.
PENTAX Medical. Visi-Pitch, Model 3950C. https://www.pentaxmedical.com/pentax/en/99/1/Visi-Pitch-Model-3950C-Computerized-Speech-Lab-CSL-Model-4500B
Preston, J. L., McAllister, T., Phillips, E., Boyce, S., Tiede, M., Kim, J. S., & Whalen, D. H. (2019). Remediating residual rhotic errors with traditional and ultrasound-enhanced treatment: A single-case experimental study. American Journal of Speech-Language Pathology, 28(3), 1167-1183.
Smith, J. M. (2017). Math anxiety among first-year graduate students in communication sciences and disorders. Gauisus, 5, 1-13.
Styler, W. (2013). Using Praat for linguistic research. University of Colorado at Boulder Phonetics Lab. https://wstyler.ucsd.edu/praat/
Tyler, A. A., Edwards, M. L., & Saxman, J. H. (1990). Acoustic validation of phonological knowledge and its relationship to treatment. Journal of Speech and Hearing Disorders, 55(2), 251-261.
Neel, A. T. (2022). 20Q: Using Speech Science in Clinical Practice. SpeechPathology.com. Article 20533. Available at www.speechpathology.com