A Comparison of Articulatory Assessment: the Goldman-Fristoe Test of Articulation-2 versus the Arizona Articulation Proficiency Scale-3

Amy C. Ogburn, Ph.D., CCC-SLP
December 3, 2012

Abstract
Purpose: Most speech-language pathologists (SLPs) would agree that articulation tests are valid and, when properly administered, should in theory generate similar results. However, most SLPs would also agree that this is not the case. Several possible explanations have been posed to account for this effect (Schissel & James, 1979). The purpose of the study was to evaluate the efficacy of two standardized articulation instruments, the Goldman-Fristoe Test of Articulation, Second Edition (GFTA-2; Goldman & Fristoe, 2000) and the Arizona Articulation Proficiency Scale, Third Revision (Arizona-3; Fudala, 2000).
Methods: Forty-one normally developing 3- and 4-year-old children were assessed with both instruments, administered in counterbalanced order. Standard scores, along with percentage consonants correct (PCC; Shriberg & Kwiatkowski, 1982; Shriberg, Kwiatkowski, Best, Hengst, & Terselic-Weber, 1986), were calculated for each child.
Results: Mean standard scores differed significantly between the two instruments in the study population, with the mean Arizona-3 score 6.4 points lower than the mean GFTA-2 score. PCC also revealed disparities between the two instruments.
Conclusions: Possible explanations for these findings include differences in the composition of each instrument, the questionable validity of both instruments, and clinically significant normative data differences for specific phonemes classified as either glides or fricatives.

Most speech-language pathologists (SLPs) would agree that the majority of the standardized tests on the market today are valid and reliable indicators of articulatory development and behavior and that, when administered to an individual, these instruments should, at least in theory, generate similar results. However, most SLPs would also agree that this is not observed to be the case.
Several possible explanations may be posed to account for this effect. Schissel and James (1979) suggested that either (1) the tests are not valid or (2) the testing instruments are assessing different facets of articulatory behavior. Another possible explanation may be that the normative data for these tests are not identical, thus creating different age criteria for the production of individual phonemes.
Schissel and James (1979) evaluated differences between the Deep Test of Articulation (DTA; McDonald, 1964) and the Arizona Articulation Proficiency Scale: Revised (AAPS; Fudala, 1974) and compared performance on individual phonemes, total test performance, and differences in children who were judged as needing treatment. The DTA was chosen because it provides the clinician with multiple opportunities to observe production of each target in a variety of contexts. The AAPS was created to determine the precision of articulation and to identify individuals requiring intervention. The DTA and the AAPS were selected as the best instruments because their theoretical bases lie in direct opposition to one another. The DTA allows multiple productions of a phoneme, whereas the AAPS elicits the phoneme only once, a method more representative of traditional articulation tests. Schissel and James found that differences in the accuracy of phoneme production occurred on 8.2% of test items, but that the trend appeared in approximately 83% of children assessed.
The authors go on to note: “Though the decision is subjective, it is thought that 18 of these subjects produced correctly one or more sounds with sufficiently greater frequency on the DTA than on the AAPS as to constitute an important difference; that is, a difference sufficiently great as possibly to alter conclusions about the adequacy of the child’s production” (p. 366).
Schissel and James suggested that 14 participants (N = 29) demonstrated more accurate production on one or more phonemes from the AAPS as compared to the DTA, and 8 children displayed greater production accuracy on some phonemes from the AAPS and some from the DTA. In their study, only 5 children presented with consistent performance across both measures. However, when the severity rating of the AAPS and the performance on the DTA were compared, results were similar in terms of which children would need follow-up therapy. Children with a moderate or severe impairment rating on the AAPS were found to be candidates for treatment based on results from the DTA. The authors stated that children with a rating of severe would likely qualify for therapy on the basis of either test alone, but children in the moderate category may or may not receive services based on other variables, such as the size of a clinician’s caseload or the level of parental concern.
Schissel and James concluded that scores from the AAPS may be deceptive in that some phonemes may appear to have consistently correct production when, in reality, they do not. The investigators suggest that this is illustrated by the fact that accurate production varied between instruments on only 8.2% of the test items, yet the pattern was apparent in 83% of children assessed. These investigators proposed that the clinician must determine whether this is “an acceptable trade-off” for the speed and ease associated with the administration of the AAPS. These researchers also indicate the AAPS scoring system may be defective.
For example, the number of trials for each phoneme on the AAPS is too low, thus limiting a child’s response to either 0% or 100% production accuracy. Also, they suggest that the AAPS only credits some of the phonemes, specifically /ʤ/, which is tested only once, and the phonemes /ʃ, θ/, which are assessed twice, with 0.5 points in each instance counting toward the total score of 100. Because each phoneme is assigned a value based on its frequency of occurrence in American English, the test can reflect the severity of impairment in an individual’s speech. Further, the AAPS does not take into account the frequency of misarticulated target sounds, which affects specific classes of phonemes (i.e., consonants) more than others. Schissel and James concluded, “thus, the system is based on the implicit assumption that a sound produced either correctly or incorrectly in the few contexts in which it is tested by the AAPS will be produced in the same fashion consistently. Results of this study indicate that this assumption is tenuous” (p. 370).
However, when the overall findings were compared between the DTA and the AAPS, results suggested that in the majority...
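The scoring concern raised by Schissel and James can be made concrete with a small sketch. The Python below is illustrative only and is not the published AAPS scoring procedure. It assumes that each missed item subtracts its value from a full score of 100, and it includes only the three phonemes whose 0.5-point items are described above. The names `Item`, `ITEMS`, `weighted_score`, and `phoneme_accuracy`, and the `trial` index, are hypothetical.

```python
from dataclasses import dataclass

FULL_SCORE = 100.0  # total possible score named in the text


@dataclass(frozen=True)
class Item:
    phoneme: str
    trial: int    # hypothetical index distinguishing repeated trials
    value: float  # points this item contributes to the total


# Only the items described in the text: /ʤ/ tested once, /ʃ/ and /θ/ twice,
# each instance worth 0.5 points. A real form would list every test item.
ITEMS = [
    Item("ʤ", 1, 0.5),
    Item("ʃ", 1, 0.5),
    Item("ʃ", 2, 0.5),
    Item("θ", 1, 0.5),
    Item("θ", 2, 0.5),
]


def weighted_score(missed):
    """Full score minus the values of missed items (assumed scoring rule)."""
    lost = sum(item.value for item in ITEMS if (item.phoneme, item.trial) in missed)
    return FULL_SCORE - lost


def phoneme_accuracy(phoneme, missed):
    """Percentage of the tested instances of one phoneme produced correctly."""
    trials = [item for item in ITEMS if item.phoneme == phoneme]
    correct = sum((item.phoneme, item.trial) not in missed for item in trials)
    return 100.0 * correct / len(trials)


# Hypothetical child: misses the single /ʤ/ item and one of the two /ʃ/ items.
missed = {("ʤ", 1), ("ʃ", 2)}
print(weighted_score(missed))         # 99.0
print(phoneme_accuracy("ʤ", missed))  # 0.0
print(phoneme_accuracy("ʃ", missed))  # 50.0
print(phoneme_accuracy("θ", missed))  # 100.0
```

With a single trial, the accuracy for /ʤ/ can only be 0% or 100%, and even two trials allow only 0%, 50%, or 100%. This is the limitation the authors describe: a handful of elicitations cannot show how consistently a child produces a sound.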

Dr. Amy Ogburn is an associate professor at Auburn University Montgomery. She received a Ph.D. from the University of South Alabama in Mobile, Alabama in 2003. Currently, she teaches, performs research and is engaged in clinical practice with both children and adults.

