
Note: This article will be peer-reviewed for ASHA CEUs. If you are interested in receiving ASHA CEUs, be sure to check back.
Introduction
Study Purposes
The data collected during the LENA Natural Language Study have produced a corpus of spontaneous speech representative of the language environment of infants and toddlers (2 months – 48 months of age). Day-long speech recordings collected during this study provided the basis for developing the advanced audio processing algorithms central to the LENA System. Following Hart and Risley’s (1995) seminal language development research, these data were also intended to establish normative information about patterns of talk, including Adult Word Counts (AWC), Child Vocalizations (CV), and Conversational Turns (CT), in the households of infants and toddlers, and to validate the earlier Hart and Risley research.
Study Overview
The LENA Natural Language Study is an ongoing multiphase data collection effort. This paper describes Phase I, the normative data collection phase that took place between January and June 2006, and the first 18 months of Phase II, an extended longitudinal data collection phase that began in July 2006. At the onset of Phase I, parents of infants and toddlers (predominantly 2 – 36 months of age, plus an additional 15 children aged 37 – 48 months) were recruited through advertisements in local newspapers and direct mail solicitation. Potential participants were selected based on demographic considerations such as the child’s age and the mother’s education level. Participating families in Phase I provided day-long audio recordings once per month for six months and visited LENA’s child language research center for a standard evaluation by a certified speech-language pathologist. We compiled the recordings into the LENA Natural Language Corpus, from which we produced normative estimates of daily AWC, CT, and CV in the language environment of infants and toddlers. Phase I audio data have been supplemented by additional recordings collected during Phase II to provide normative information for children up to 48 months old. The LENA Natural Language Study has been reviewed and approved by the Essex Institutional Review Board (IRB) to help ensure that the rights and welfare of research participants were protected and that the study was conducted ethically.
Methods
Demographics
A total of 334 children ages 2 – 48 months from monolingual English-speaking households began the normative Phase I of the LENA Natural Language Study. Because of attrition and other factors, 311 participants completed this phase, and 329 participants contributed at least one valid 12-hour recording during Phase I.
Inclusion/Exclusion Criteria
To ensure a representative sample, we recruited children across an even age distribution and attempted to match the US Census distribution of mothers’ attained education level, which has been shown to be correlated with child language development (e.g., Arterberry, Midgett, Putnick, & Bornstein, 2007). We selected families according to the following criteria:
- Age distribution: We recruited roughly eight children from each age month, ages 2 – 36 months, plus 15 children ages 37 – 48 months.
- Mother’s education: The normative development sample was intended to represent the US population with respect to mothers’ education level (i.e., 23% college diploma, 29% some college, 26% high school diploma and no college, 22% no high school diploma). The goal was to match this distribution for each age month interval. For example, for the eight children who were six months old at the beginning of the study, on average two of them should have had mothers with college degrees, two should have had mothers with some college experience, two should have had mothers with high school diplomas (but no college classes), and two should have had mothers who did not graduate from high school.
- Household language: Participants were selected from English-speaking households only.
- Normative language development: Families with children who had been diagnosed with a language or developmental delay or disability were excluded from the original development study, as the data were collected for the purpose of establishing normative information about a typically developing population.
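As a rough illustration of the education targets described above, the following Python sketch converts the census proportions into expected and whole-child targets for one age-month cell. The function names and the largest-remainder rounding rule are our own assumptions for illustration; the report only states that, on average, about two of every eight children should fall in each group.

```python
# Shares of mothers' attained education used as recruitment targets (from the text).
EDUCATION_TARGETS = {
    "college diploma": 0.23,
    "some college": 0.29,
    "high school diploma, no college": 0.26,
    "no high school diploma": 0.22,
}


def expected_counts(children_per_month: int) -> dict[str, float]:
    """Expected (fractional) number of children per education group in one age month."""
    return {group: share * children_per_month for group, share in EDUCATION_TARGETS.items()}


def rounded_targets(children_per_month: int) -> dict[str, int]:
    """Whole-child targets using largest-remainder rounding (an illustrative choice).

    Each group first gets the integer part of its expected count; the remaining
    slots go to the groups with the largest fractional parts, so the targets
    always add up to children_per_month.
    """
    raw = expected_counts(children_per_month)
    targets = {group: int(value) for group, value in raw.items()}
    leftover = children_per_month - sum(targets.values())
    by_fraction = sorted(raw, key=lambda g: raw[g] - targets[g], reverse=True)
    for group in by_fraction[:leftover]:
        targets[group] += 1
    return targets


print(expected_counts(8))  # 1.84, 2.32, 2.08, 1.76
print(rounded_targets(8))  # two children in each group
```

With eight children per age month, the fractional expectations (1.84 to 2.32) all round to two, which matches the "on average two" example in the text.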
We actively sought a representative distribution of children who were born prematurely, children who attended daycare, and children who had older or younger siblings, since any of these factors could influence language development. Table 1 summarizes the distribution of mothers’ attained education levels, compared with the US Census, for the 329 participants who contributed at least one valid recording. Appendix A lists the distributions of mothers’ education levels, child gender, and number of participants at each age level at the beginning of the normative Phase I of the LENA Natural Language Study (January 2006).
Table 1. Mother's Attained Education Compared to the 2004 US Census.[2]

Attrition and Elimination
Of the 334 participants who started the study, 311 completed it. Ten participants proved too difficult to work with and were asked to leave the study, and thirteen dropped out because they moved out of the area or for other reasons.[3]
Participant Recruitment
We recruited participants through advertising in local newspapers and direct mail solicitation. Interested parents responded by contacting a call center representative who asked them to provide demographic information about their child and household. We selected potential participants based on criteria such as child age and mother’s education level. Research assistants called to explain the study further; parents who remained interested were sent an informed consent form and questionnaire to collect additional demographic information. In total, 334 parents returned the consent form and were assigned participant ID numbers. Table 2 details the number of respondents at each stage of the recruitment process.[4]
Table 2. Number of Respondents at Each Recruitment Stage.

Additional Recruitment of 2-month-olds
To ensure that a sufficient number of audio recordings from very young children were collected for the normative database, approximately eight 2-month-olds were recruited each month between February and June 2006. Those recruited in February contributed five recordings (February through June), those recruited in March contributed four recordings (March through June), and so forth. Note that for simplicity, the summary in Table 2 includes information for children recruited from February through June. Appendix B provides further information about the number of 2-month-olds recruited each month after January.
Materials[5]
Preliminary Documents
Prior to the first recording session, participants were asked to sign an informed consent form detailing the study procedures and requirements for participation. They also completed a demographics questionnaire that requested additional detailed information about the child and the household.
Recording Session Materials
Participating families received a packet containing recording materials the day before a scheduled recording session. The packet included several instructional documents: a “how to record” booklet with step-by-step instructions for the recording day, and quick-reference inserts about the materials and study protocol.
At the end of each recording session, parents completed a Session Questions form providing detailed information about the events of that recording day. Parents of children between 8 and 30 months of age were also asked to complete the MacArthur-Bates Communicative Development Inventories (Fenson et al., 2007), a parent-report survey about the child’s language development and the words the child says and understands.
Professional Evaluation Session Materials
Table 3 describes the standard developmental assessments administered during professional evaluation sessions. Because of age or time constraints, not every assessment was administered in every session.
Table 3. Standard Assessments Administered During Professional Evaluation Sessions.

Apparatus
Phase II participants recorded with an early prototype of the LENA digital language processor (DLP). This DLP prototype weighed 2.5 ounces. In order for the battery to run for a full day, the LENA DLP required charging for at least four hours using the LENA charger. Participants were sent LENA vests to wear on their recording days. The LENA DLP slipped into the front pocket of the LENA vest.
Figure 1. LENA DLP Prototype, Charger, and Vest Used by Study Participants

Design
Phase I: Recording Sessions
All participants during Phase I were asked to record one day each month and were required to record for at least 12 consecutive hours each session. In addition, 61 participants completed a “double recording session” whereby they recorded on two consecutive days.
Phase I: Language Evaluation Sessions
Nearly all participants visited the LENA Foundation at least once to be evaluated by a certified speech-language pathologist (SLP), who administered three or four standard language assessments.[6] The purpose was to obtain an independent assessment of each participating child’s language abilities.
Approximately half of the participants completed two additional evaluation sessions (one every two months) during the six-month study. The purposes of the repeated sessions were 1) to determine the reliability of the individual assessments over time, and 2) to investigate the correlation between change over time in the assessments and change over time in the acoustic properties of the audio recordings. Table 4 shows the distribution of participants who completed single vs. repeated evaluation sessions by maternal education level. Table 5 lists average standard scores for the three most frequently administered language assessments.
Table 4. Distribution of Participants Completing Single and Repeated Observation Sessions by Maternal Education Level.

Table 5. Average Standard Scores for Language Assessments.

Phase I: Cognitive Evaluation Sessions
We selected a subset of participants (N=79) for an additional cognitive evaluation. During these visits a trained professional research assistant administered the Bayley Scales of Infant and Toddler Development (Bayley, 2006), a standardized assessment that provides information about cognitive abilities. Cognitive evaluations took place once every two months for six months. The purpose of these evaluations was to obtain detailed information about participants’ intellectual abilities for future analyses and correlations with LENA measurements. Table 6 details the demographic distribution of participants who completed cognitive evaluations by maternal education level.
Table 6. Maternal Education Distribution for Participants Completing Cognitive Assessment Sessions.

Phase II: Extended Longitudinal Study
Eighty Phase I participants were selected to participate in an extended longitudinal study and continue to provide natural language environment data. Phase II participants were chosen to provide a representative sample with respect to mother’s education. Participants continue to provide monthly audio recordings and to complete language evaluation sessions at approximately six-month intervals. All valid recordings from this extended study are included in the normative database; thus, these participants may have contributed more than six recordings. Appendix C provides demographic information for Phase II (extended longitudinal) participants.
Procedures
Recording Sessions
For the first two recording sessions in Phase I, parents were contacted individually to schedule recording appointments. During the second phone call, research assistants scheduled the remaining appointments through the end of the study by assigning each family a “magic number” (e.g., the 9th of the month). Parents were asked to record on the same date each month, corresponding to their magic number. This procedure saved staff time on scheduling and also ensured that the day of the week of each family’s recording sessions varied from month to month. Appendix D shows the distribution of Phase I and II recording sessions by day of the week.
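The short Python sketch below illustrates why a fixed date each month moves across weekdays: most months are not a whole number of weeks long. The function name and the example family are hypothetical, and the sketch assumes a magic number of 28 or less so that the date exists in every month. In a 28-day February, a date falls on the same weekday in February and March.

```python
from datetime import date

# Fixed English names, so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def session_weekdays(magic_day: int, year: int, months: range) -> list[str]:
    """Weekday on which a family's fixed 'magic number' date falls in each month."""
    return [WEEKDAYS[date(year, month, magic_day).weekday()] for month in months]


# Hypothetical family with magic number 9, over the Phase I months January-June 2006.
print(session_weekdays(9, 2006, range(1, 7)))
# ['Monday', 'Thursday', 'Thursday', 'Sunday', 'Tuesday', 'Friday']
```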
Parents received a recording packet via FedEx at least one day before each recording session. They were instructed to take the charger out of the packet immediately and charge the LENA DLP overnight. We asked parents to begin recording as soon as their child woke up in the morning and to record continuously until their child went to bed that night. Parents were informed that if they were uncomfortable with any aspect of a recording session, the recording would be erased at their request and excluded from the normative data, at no penalty to them.
Once activated, the LENA DLP could not be turned off. We told parents to remove the LENA vest during baths or nap time (the vest is not intended as sleepwear), but to place it near the child and to continue recording during that time. Participants were asked to behave as they would on any other day and to engage in any regularly scheduled routines with one exception: for the first three months of the study, parents were asked to turn off any ambient noise (e.g., TV, radio), and for the second three months of the study they were told that ambient noise was okay. At the end of a recording session day parents were instructed to complete the included paperwork (i.e., Session Questions, language questionnaires) and to put all materials into a FedEx return envelope which they left on their doorstep for pick up the next morning. On delivery to the LENA Foundation, we uploaded the speech data to our electronic database and entered the other documentation into various databases.
Language and Cognitive Evaluation Sessions
Evaluation session appointments were scheduled within two weeks of participants’ recording sessions. Participants were the same age (in months) for each evaluation session and the corresponding recording session.
Participants were invited to come to the LENA Foundation in Boulder, Colorado for language and cognitive evaluations. They were told that the sessions would last approximately one hour and that during the session their child would interact with a LENA Foundation employee who would ask the child to do things like repeat words or point to pictures. Parents typically scheduled appointments for weekdays, but special Saturday evaluations were sometimes arranged. If a family had been selected for repeated evaluations but had scheduling conflicts, they were switched to single session status and completed only one evaluation. Parents were not permitted to bring siblings to appointments. At the start of a language evaluation session, the SLP explained to parents that she would be interacting with the child and taking notes. She told parents the sessions would be videotaped to provide a visual record of each child’s development.
The SLP administered as many assessments as were feasible within a one-hour period, or for as long as the child remained sufficiently attentive. Typically, the Receptive-Expressive Emergent Language Test, Third Edition (REEL-3) (Bzoch, League, & Brown, 2003), the Preschool Language Scale, Fourth Edition (PLS-4) (Zimmerman, Steiner, & Pond, 2002), and the Cognitive Adaptive Test and Clinical Linguistic and Auditory Milestone Scale (CAT/CLAMS) (Accardo & Capute, 2005) were administered. If time permitted and the child was amenable, the Peabody Picture Vocabulary Test (PPVT) (Dunn & Dunn, 1997) or the Goldman-Fristoe Test of Articulation 2 (GFTA-2) (Goldman & Fristoe, 2000) was also administered. After each evaluation, the SLP scored the assessments and entered the results into an electronic database. Participants who completed cognitive evaluations were administered the Cognitive, Gross Motor, and Fine Motor sections of the Bayley Scales of Infant and Toddler Development (Bayley, 2006), plus, time permitting, the Expressive and Receptive Language sections. Research assistants entered all data a second time to check for data entry errors.
Participant Compensation
Participants were compensated $75 for each recording session ($6.25/hour for a 12-hour recording) and $100 for each evaluation session ($50 for the session, $25 for travel, and $25 for child care). Participants could also earn a $200 bonus at the end of the study. We provided a list of bonus violations (e.g., missing evaluation appointments or forgetting recording sessions) and deducted $50 from the bonus for each violation. Most participants received some portion of the bonus.
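The compensation rules above reduce to simple arithmetic, sketched here in Python. The participant in the example is hypothetical, and flooring the bonus at zero once four or more violations are recorded is our assumption; the text does not say what happened beyond four deductions.

```python
RECORDING_PAYMENT = 75               # dollars per recording session
RECORDING_HOURS = 12                 # minimum recording length per session
EVALUATION_PAYMENT = 50 + 25 + 25    # session + travel + child care
MAX_BONUS = 200
DEDUCTION_PER_VIOLATION = 50


def hourly_rate() -> float:
    """Recording payment expressed per hour of a 12-hour session."""
    return RECORDING_PAYMENT / RECORDING_HOURS


def total_compensation(recordings: int, evaluations: int, violations: int) -> int:
    """Total dollars for one participant; the bonus is floored at zero (assumption)."""
    bonus = max(0, MAX_BONUS - DEDUCTION_PER_VIOLATION * violations)
    return RECORDING_PAYMENT * recordings + EVALUATION_PAYMENT * evaluations + bonus


# Hypothetical participant: six recordings, three evaluations, one violation.
print(hourly_rate())                 # 6.25
print(total_compensation(6, 3, 1))   # 450 + 300 + 150 = 900
```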
Results
Estimating Normative Population Values
A primary goal of Phase I and Phase II was to determine population estimates of LENA measurements for reference purposes. All 3,066 recording sessions completed by participants from January 2006 through December 2007 were considered, and 87.5% were ultimately included in the normative set presented here.[7] We excluded the remaining 12.5% of recording sessions for the reasons detailed in Table 7.[8] The final normative sample included 2,682 12-hour recording sessions from 329 participants aged 2 to 48 months.
Table 7. Normative Recording Sample Exclusions.

Adult Word Count
The Adult Word Count (AWC) report within the LENA System software estimates the total number of adult words the child hears per hour, per day, and per month. Because participants’ ages (2 months to 48 months) span several developmental periods and participants typically recorded over a six-month (or longer) span, any significant increase or decrease in AWC with age could bias normative estimates. Thus, before estimating normative values, we examined the relationship between AWC and chronological age. We found no significant correlation (r(327) = .04, p = .47), so no age-related adjustments to AWC norms were made.[9]
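To make the reported statistic concrete: r(327) denotes a Pearson correlation with 327 degrees of freedom, i.e., N = 329 participants. The Python sketch below recovers an approximate two-sided p-value from r using the standard t transformation. For simplicity it swaps the exact t distribution for a normal approximation, which is close at this many degrees of freedom. It is a consistency check under that assumption, not the analysis code used in the study.

```python
import math


def correlation_p_value(r: float, df: int) -> float:
    """Approximate two-sided p-value for Pearson's r with df = N - 2.

    Computes t = r * sqrt(df / (1 - r^2)) and approximates the t distribution
    by a standard normal, which is reasonable for large df.
    """
    t = r * math.sqrt(df / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal tail area


print(round(correlation_p_value(0.04, 327), 2))  # AWC vs. age: 0.47
print(correlation_p_value(0.51, 327) < 0.01)     # CT vs. age: True
```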
During Phase I, participants contributed from one to seven recording sessions to the final normative sample (M = 4.6, SD = 1.4). During Phase II, participants contributed an additional 1 to 19 sessions (M = 14.6, SD = 4.7). To control for the variable number of recording sessions contributed by each participant family, we first computed each family’s average AWC from all usable sessions and from these values computed the full sample mean and standard deviation (M = 12,297, SD = 6,462).[10]
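The two-stage averaging can be sketched in Python as follows. The family IDs and counts in the example are invented for illustration, and the use of the sample (n - 1) standard deviation is an assumption. The example shows why the order matters: families that record more often would otherwise pull a pooled session mean toward their own values.

```python
from statistics import mean, stdev


def family_level_summary(sessions_by_family: dict[str, list[float]]) -> tuple[float, float]:
    """Average each family's sessions first, then summarize across families.

    Each family counts once, however many sessions it contributed.
    """
    family_means = [mean(counts) for counts in sessions_by_family.values() if counts]
    return mean(family_means), stdev(family_means)


# Hypothetical AWC values for three families with 2, 1, and 4 sessions.
example = {"A": [10_000, 12_000], "B": [9_000], "C": [15_000, 16_000, 17_000, 18_000]}
m, sd = family_level_summary(example)
pooled = mean(c for counts in example.values() for c in counts)
print(round(m), round(pooled))  # 12167 13857
```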
We generated AWC estimates for the 1st to the 99th percentiles from these values, assuming a normal (Gaussian) distribution. We implemented the AWC estimate for each percentile in the LENA software as a floor value; for example, LENA System users whose counts are greater than or equal to the 50th-percentile value but less than the 51st-percentile value see a ranking at the 50th percentile.[11]
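A minimal Python sketch of this percentile scheme, using the full-sample mean and standard deviation reported above and `statistics.NormalDist` from the standard library. The function names are ours, and showing counts below the 1st-percentile value as rank 1 is an assumption; the report does not say how such counts are displayed. The sketch also makes visible that a pure normal model with these parameters places the 1st- and 2nd-percentile AWC values below zero.

```python
from bisect import bisect_right
from statistics import NormalDist

AWC_MEAN, AWC_SD = 12_297, 6_462  # full-sample family-level mean and SD


def percentile_cutoffs(mean: float, sd: float) -> list[float]:
    """Counts at the 1st..99th percentiles of a normal distribution."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf(p / 100) for p in range(1, 100)]


def percentile_rank(count: float, cutoffs: list[float]) -> int:
    """Floor ranking: a count >= the pth cutoff and < the (p+1)th is shown as p.

    bisect_right returns how many cutoffs are <= count. Counts below the
    1st-percentile cutoff are shown as 1 here (an assumption).
    """
    return max(1, bisect_right(cutoffs, count))


cutoffs = percentile_cutoffs(AWC_MEAN, AWC_SD)
print(percentile_rank(12_297, cutoffs))       # 50
print(round(cutoffs[0]), round(cutoffs[1]))   # both negative under this model
```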
Conversational Turns
The Conversational Turns (CT) report within the LENA System software provides an estimate of the total number of conversational turns the child engages in with an adult per hour, per day, and per month. As might be expected and in contrast to AWC, CT counts increase significantly by month of age (r(327) = 0.51, p < .01); further analyses suggested that both the variability and degree of positive skew of CT counts also increase with age. Thus, we computed a unique mean and standard deviation for CT for each month of age.[12] Table 8 displays final CT mean and SD by age month.
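As an illustration of the per-month computation, this Python sketch groups hypothetical session-level CT counts by age month and computes a mean and sample SD for each month. Treating each session as one observation and using the sample SD are our assumptions; the report does not specify either.

```python
from collections import defaultdict
from statistics import mean, stdev


def monthly_norms(sessions: list[tuple[int, float]]) -> dict[int, tuple[float, float]]:
    """Mean and SD of CT counts for each month of age.

    sessions: (age_in_months, ct_count) pairs, one per recording session.
    Months with fewer than two sessions are skipped because an SD needs two.
    """
    by_month: dict[int, list[float]] = defaultdict(list)
    for age, count in sessions:
        by_month[age].append(count)
    return {age: (mean(counts), stdev(counts))
            for age, counts in sorted(by_month.items()) if len(counts) >= 2}


# Hypothetical sessions: (age in months, CT count).
example = [(6, 300), (6, 500), (7, 400), (12, 600), (12, 800), (12, 1000)]
print(monthly_norms(example))  # month 7 is skipped: only one session
```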
Table 8. Conversational Turns (CT) Estimates by Month of Age.*

To further reduce age-related and random variability and to improve interpretability, we performed separate regressions of CT means and standard deviations across month of age.[13] From the best-fit regression solutions we computed estimated CT means and standard deviations for each month of age from 0 to 48 months. We then generated final CT counts for the 1st to the 99th percentiles for each age month from these values, assuming a normal (Gaussian) distribution as was done for AWC.[14] Figure 2 displays CT mean and standard deviation values along with the best-fit regression lines.
Figure 2. Best-fit Regression Solutions for Conversational Turn Counts.

Child Vocalization Frequency
The Child Vocalization (CV) report within the LENA System software provides an estimate of the total number of vocalizations the child produces per hour, per day, and per month. As was true for CT counts, CV counts increase significantly by month of age (r(327) = 0.61, p < .01), and further analyses suggested that both the variability and degree of positive skew of CV counts also increase with age. Thus, we computed a unique mean and standard deviation for CV for each month of age.[15] Table 9 displays final CV mean and SD by age month.
Table 9. Child Vocalization (CV) Estimates by Month of Age*

As was done for conversational turns, we performed separate regressions of CV means and standard deviations across month of age to improve interpretability and reduce variability.[16] From these best-fit regression solutions we generated estimated CV means and standard deviations for each month of age from 0 to 48 months. Again assuming a normal (Gaussian) distribution, we then generated final CV counts for the 1st to the 99th percentiles for each age month.[17] Figure 3 displays CV mean and standard deviation values along with the best-fit regression lines.
Figure 3. Best Fit Regression Solutions for Child Vocalization Counts

Normative Values for Comparison Groups
We estimated AWC and CT values for a college-educated comparison group by selecting 696 audio recordings contributed by 84 families from the normative sample in which the mothers had graduated college. We computed the mean AWC for this subsample and determined that the average college-educated AWC fell near the 70th percentile. We then generated a new set of comparison data based on the subsample mean. Similarly, we examined data from 102 participants in a separate pilot user study and determined the average LENA software user’s AWC fell near the 76th percentile compared with the reference sample. Based on this value we generated a second set of comparison data.
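The percentile positions mentioned above can be related back to AWC counts under the reference normal model. This Python sketch (function names ours) converts in both directions. It does not reproduce how the comparison data sets themselves were generated, which the report does not detail.

```python
from statistics import NormalDist

# Reference AWC distribution from the full normative sample (reported above).
REFERENCE_AWC = NormalDist(mu=12_297, sigma=6_462)


def awc_at_percentile(percentile: float) -> float:
    """AWC value at a given percentile of the reference normal distribution."""
    return REFERENCE_AWC.inv_cdf(percentile / 100)


def percentile_of(awc: float) -> float:
    """Percentile position of an AWC value (e.g., a subsample mean) in the reference."""
    return 100 * REFERENCE_AWC.cdf(awc)


for label, pct in [("college-educated mothers", 70), ("pilot LENA users", 76)]:
    print(f"{label}: {pct}th percentile is about {awc_at_percentile(pct):,.0f} adult words")
```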
Summary
The research described in this paper represents the first known attempt to assemble and analyze a large corpus of full-day spontaneous speech data from the households of infants and toddlers. As the first such database, it establishes a baseline for future research. However, we must acknowledge that the lack of a valid external source of comparison data makes it difficult to identify unusual or suspect patterns of results. We hope that other researchers will use the LENA System to collect similar data that may further validate the results reported here and in the LENA software V3.1.0.
Although we made every effort to recruit a sample that was representative of the US population, several potential sources of selection bias remain. First, the sample included only parents who responded to our recruitment advertisements. It is unclear whether these parents were more enthusiastic than other parents and whether they might have had higher word counts or more advanced children. Second, although we originally included parents who had completed the GED in our “No High School Education” group, we have now moved them to the “High School” group to conform more closely to US Census grouping. This change makes it clearer that the group with the least education is likely underrepresented in our sample.
Research at the LENA Foundation is ongoing, and our research participants continue to provide data that will enable us to refine our normative estimates further. In particular, as our sample of older children increases, the variability in our AWC, CV, and CT estimates for these ages should be reduced.
Notes
1. See www.lenababy.com/Study.aspx for more information about Hart and Risley (1995), and The Power of Talk (Gilkerson & Richards, 2009) for a sample of findings from the LENA Natural Language Study.
2. Note that the percentages in the first two education groups have changed since LTR-01-1 because the participants who had obtained GEDs were moved from the ‘Some High School’ group to the ‘High School Diploma’ group to more closely reflect the census grouping.
3. Most of the 23 participants who dropped out or were eliminated provided no usable recordings.
4. Of the 435 families who were invited to participate, 117 had previously participated in a pilot study conducted by the LENA Foundation in 2005.
5. Samples of all materials are available on request.
6. We were unable to schedule assessment sessions with 16 participants.
References
Accardo, P.J., & Capute, A.J. (2005). The Capute Scales: Cognitive Adaptive Test/Clinical Linguistic & Auditory Milestone Scale. Baltimore: Paul H. Brookes Publishing Co.
Arterberry, M.E., Midgett, C., Putnick, D.L., & Bornstein, M.H. (2007). Early attention and literacy experiences predict adaptive communication. First Language, 27(2), 175-189.
Bayley, N. (2006). Bayley Scales of Infant and Toddler Development, Third Edition. San Antonio: Harcourt Assessment, Inc.
Bzoch, K.R., League, R., & Brown, V.L. (2003). Receptive-Expressive Emergent Language Test, Third Edition. Austin: PRO-ED.
Dunn, L.M., & Dunn, L.M. (1997). Peabody Picture Vocabulary Test, Third Edition. Circle Pines: American Guidance Service.
Fenson, L., Dale, P.S., Reznick, J.S., Thal, D., Bates, E., Hartung, J.P., Pethick, S., & Reilly, J.S. (2007). MacArthur-Bates Communicative Development Inventories, Second Edition. Baltimore: Paul H. Brookes Publishing Co.
Gilkerson, J., & Richards, J. (2009). The Power of Talk: The Impact of Adult Talk, Conversational Turns, & TV during the Critical Years (0-4) of Child Development. Retrieved April 27, 2009, from www.speechpathology.com
Goldman, R., & Fristoe, M. (2000). Goldman-Fristoe Test of Articulation 2. Circle Pines: American Guidance Service, Inc.
Hart, B., & Risley, T.R. (1995). Meaningful Differences in the Everyday Experience of Young American Children. Baltimore: Paul H. Brookes Publishing Co.
Zimmerman, I.L., Steiner, V.G., & Pond, R.E. (2002). Preschool Language Scale, Fourth Edition. San Antonio: The Psychological Corporation.
Appendix A. Gender and Education Distribution by Age at First Recording (cont.)

Appendix B. Gender and Education Distribution for 2-Month Olds

Appendix C. Gender and Education Distribution by Age for 80 Longitudinal Participants in July 2006


Appendix D. Phase I and II Recording Sessions Per Day of the Week
