
Back to Basics: Speech Audiometry

Janet R. Schoepflin, PhD


Editor's Note: This is a transcript of an AudiologyOnline live seminar. Please download the supplemental course materials.

Speech is the auditory stimulus through which we communicate. The recognition of speech is therefore of great interest to all of us in the fields of speech and hearing. Speech audiometry developed originally out of work conducted at Bell Labs in the 1920s and 1930s on the efficiency of communication systems, and it gained real momentum after World War II as returning veterans presented with hearing loss. The methods and materials for testing speech intelligibility were of interest then and are still of interest today. That ongoing interest, seen in the questions students ask during classes, in the questions new audiologists raise as they begin their practice, and in the comments and questions on various audiology listservs about the most efficient and effective ways to test speech in the clinical setting, is why AudiologyOnline proposed this webinar as part of its Back to Basics series. I am delighted to participate. I am presenting a review of the array of speech tests that we use in clinical evaluation, with a summary of some of the old and new research that supports the recommended practices. The topics I will address today are an overview of speech threshold testing, suprathreshold speech recognition testing, most comfortable listening level testing, uncomfortable listening level testing, and a brief mention of some new directions that speech testing is taking. In the context of testing speech, I will assume that the environment in which you are testing meets the ANSI permissible noise criteria and that the audiometer transducers being used for speech testing are calibrated to the ANSI standards for speech. I will not be talking about those standards, but it is of course important to keep them in mind.

Speech threshold testing involves several considerations: the purposes of the test, or the reasons for performing it; the materials that should be used; and the method or procedure for testing.

Purposes of Speech Threshold Testing

A number of purposes have been given for speech threshold testing. In the past, speech thresholds were used as a means to cross-check the validity of pure tone thresholds. This purpose has lost some of its force because we now have other physiologic and electrophysiologic procedures, such as OAEs and immittance test results, to help us in that cross-check. However, the speech threshold measure is a test of hearing, so it is not entirely invalid as a cross-check for pure tone hearing. I think sometimes we are anxious to get rid of things because we feel we have a better handle from other tests, but in this case it may not be the wisest thing to toss out. Also in past years, speech thresholds were used to determine the level for suprathreshold speech recognition testing. That also lacks validity, because the level at which suprathreshold testing is conducted depends on the reason you are doing the test itself. It is necessary to test speech thresholds if you are going to bill 92557. Aside from that, the current purpose for speech threshold testing is in the evaluation of pediatric and difficult-to-test patients. Clinical practice surveys tell us that the majority of clinicians test speech thresholds for all their patients, whether for billing purposes or not. It is always important that testing is done in the recommended, standardized manner. The accepted measures for speech thresholds are the Speech Recognition Threshold (SRT) and the Speech Detection Threshold (SDT). Those terms are used because they specify the material or stimulus, i.e., speech, as well as the task the listener is required to do, which is recognition or identification in the case of the SRT and detection, or noticing presence versus absence of the stimulus, in the case of the SDT. The terms also specify the criterion for performance, which is threshold, generally 50%. The SDT is most commonly performed on individuals who have been unable to complete an SRT, such as very young children. Because recognition is not required in the speech detection task, the SDT is expected to be about 5 to 10 dB better than the SRT, which requires recognition of the material.

Materials for Speech Threshold Testing

The materials used in speech threshold testing are spondees, which are familiar two-syllable words that have a fairly steep psychometric function. Cold running speech or connected discourse is an alternative for speech detection testing, since recognition is not required in that task. Whatever material is used, it should be noted on the audiogram. It is important to make notations on the audiogram about the protocols and materials we are using, although in common practice many of us are lax in doing so.

Methods for Speech Threshold Testing

The methods consideration in speech threshold testing is how we are going to do the test. This includes whether we use monitored live voice or recorded materials, whether we familiarize the patient with the materials, and the technique we use to elicit threshold. Monitored live voice and recorded speech can both be used in SRT testing. However, recorded presentation is recommended because recorded materials standardize the test procedure.
With live voice presentation, monitoring each syllable of each spondee so that it peaks at 0 on the VU meter can be fairly difficult, and the consistency of the presentation is then lost. Using recorded materials is recommended, but it is less important in speech threshold testing than it is in suprathreshold speech testing. As with the materials that are used, it is important to note on the audiogram what method of presentation has been used. As far as familiarization goes, we have known for about 50 years, since Tillman and Jerger (1959) identified familiarity as a factor in speech thresholds, that familiarization of the patient with the test words should be included as part of every test. Several clinical practice surveys suggest that familiarization is often not done with patients. This is not good practice, because familiarization does influence thresholds and should be part of the procedure. The last consideration under methods is the technique that will be used. Several different techniques have been proposed for determining the SRT. Clinical practice surveys suggest the most commonly used method is a bracketing procedure. The typical down 10 dB, up 5 dB approach is often used, with two to four words presented at each level, and threshold is then defined as the lowest level at which 50% or more of the words are correctly repeated. This is not the procedure recommended by ASHA (1988). The ASHA-recommended procedure is a descending technique in which two spondees are presented at each decrement from the starting level. Other modifications have been proposed, but they are not widely used.

Suprathreshold speech testing involves considerations as well. They are similar to those for threshold testing, but they are more complicated. They include the purposes of the testing, the materials that should be used, whether the material should be delivered by monitored live voice or by recording, the level or levels at which testing should be conducted, whether a full list, half list, or abbreviated word list should be used, and whether the test should be given in quiet or in noise.

Purposes of Suprathreshold Testing

There are several reasons to conduct suprathreshold tests. They include estimating the communicative ability of the individual at a normal conversational level; determining whether a more thorough diagnostic assessment will be conducted; hearing aid considerations; and analysis of the error patterns in speech recognition. When the purpose of testing is to estimate communicative ability at a normal conversational level, the test should be given at a level around 50 to 60 dB HL, since that is representative of a normal conversational level at a communicating distance of about 1 meter. While monosyllabic words in quiet do not give a complete picture of communicative ability in daily situations, it is a procedure people like to use to get some broad sense of overall communicative ability. If the purpose of the testing is diagnostic assessment, then a psychometric or performance-intensity function should be obtained. If the reason for the testing is hearing aid considerations, then the test is often given using words or sentences, either in quiet or in a background of noise. Another purpose is the analysis of error patterns in speech recognition, and in that situation a test other than an open-set monosyllabic word test would be appropriate.

Materials for Suprathreshold Testing

The choice of materials for testing depends on the purpose of the test and on the age and abilities of the patient. The issues in materials include the response set (closed or open) and the test items themselves.

Closed set vs. open set. The first consideration is whether a closed set or an open set is appropriate. Closed-set tests limit the number of response alternatives to a fairly small set, usually between 4 and 10 depending on the procedure. The number of alternatives influences the guess rate, which is a consideration as well. The Word Intelligibility by Picture Identification (WIPI) test is a commonly used closed-set test for children, as it requires only a picture-pointing response and its receptive vocabulary level is as low as about 5 years. It is very useful in pediatric evaluations, as is another closed-set test, the Northwestern University Children's Perception of Speech test (NU-CHIPS).

In contrast, an open-set protocol provides an unlimited number of response alternatives, so open-set tests are more difficult. The available clinical practice surveys suggest that monosyllabic word lists are the most widely used materials for routine suprathreshold speech recognition testing, but sentences in noise are gaining popularity for hearing aid purposes.

CID W-22 vs. NU-6. The most common materials for speech recognition testing are monosyllabic words: the Central Institute for the Deaf (CID) W-22 lists and the Northwestern University Auditory Test No. 6 (NU-6) lists. These are the most common open-set materials, and there has been some discussion among audiologists about the differences between them. From a historical perspective, the CID W-22 lists came from the original Harvard PAL PB-50 words; the W-22s are a group of the more familiar of those words, developed into four 50-word lists. They are still commonly used by audiologists today. The NU-6 lists were developed later and, instead of phonetic balance, aimed for a more phonemic balance. The articulation function for both, using recorded materials, is about the same: 4% per dB. The NU-6 lists are considered somewhat more difficult than the W-22s. Clinical surveys show that both materials are used by practicing audiologists, with use of the NU-6 lists beginning to surpass that of the W-22s.

Nonsense materials. There are other materials that are available for suprathreshold speech testing. There are other monosyllabic word lists like the Gardner high frequency word list (Gardner, 1971) that could be useful for special applications or special populations. There are also nonsense syllabic tasks which were used in early research in communication. An advantage of the nonsense syllables is that the effects of word familiarity and lexical constraints are reduced as compared to using actual words as test materials. A few that are available are the City University of New York Nonsense Syllable test, the Nonsense Syllable test, and others.

Sentence materials. Sentence materials are gaining popularity, particularly in hearing aid applications, because speech that contains contextual cues and is presented in a noise background is expected to have better predictive validity than words in quiet. The two popular sentence procedures are the Hearing In Noise Test (HINT) (Nilsson, Soli, & Sullivan, 1994) and the QuickSIN (Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004). Other sentence tests with particular applications are the Synthetic Sentence Identification (SSI) test, the Speech Perception in Noise (SPIN) test, and the Connected Speech Test.

Monitored Live Voice vs. Recorded. As with speech threshold testing, the use of recorded materials for suprathreshold speech testing standardizes the test administration. The recorded version of the test is actually the test in my opinion. This goes back to a study in 1969 where the findings said the test is not just the written word list, but rather it is a recorded version of those words.

Inter-speaker and intra-speaker variability makes using recorded materials the method of choice in almost all cases for suprathreshold testing. Monitored live voice (MLV) is not recommended. In years gone by, recorded materials were difficult to manipulate, but the ease and flexibility that is afforded us by CDs and digital recordings makes recorded materials the only way to go for testing suprathreshold speech recognition. Another issue to consider is the use of the carrier phrase. Since the carrier phrase is included on recordings and recorded materials are the recommended procedure, that issue is settled. However, I do know that monitored live voice is necessary in certain situations and if monitored live voice is used in testing, then the carrier phrase should precede the test word. In monitored live voice, the carrier phrase is intended to allow the test word to have its own natural inflection and its own natural power. The VU meter should peak at 0 for the carrier phrase and the test word then is delivered at its own natural or normal level for that word in the phrase.  

Levels. The level at which testing is done is another consideration. The psychometric or performance-intensity function plots speech performance in percent correct on the Y-axis as a function of the level of the speech signal on the X-axis. This is important because testing at only one level, which is fairly common, gives us insufficient information about the patient's optimal performance, or what we commonly call the PB-max. It also does not tell us anything about possible deterioration in performance if the level is increased. As a reminder, normal hearers show a function that reaches its maximum around 25 to 40 dB SL (re: SRT), which is why suprathreshold testing is often conducted at that level. For normals, performance remains at that maximum, 100% or so, as the level increases. People with conductive hearing loss show a similar function. Individuals with sensorineural hearing loss, however, show a performance function that reaches its maximum at generally less than 100%. They can either show performance that stays at that level as intensity increases, or they can show a curve that reaches its maximum and then decreases as intensity increases. This is known as roll-over. A single level is not the best way to go, because we cannot anticipate which patients may show roll-over unless we test at a level higher than the one where the maximum score was obtained. I recognize that there are often time constraints in everyday practice, but two levels are recommended so that the performance-intensity function can be observed for an individual patient, at least in an abbreviated way.
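To make the idea of a performance-intensity function concrete, here is a minimal sketch in Python; the function name and the sample scores are hypothetical illustrations, not data from the presentation. It takes scores measured at several levels and returns the best score and the level at which it occurred, which is the PB-max described above.

```python
def pb_max(pi_function):
    """Return (best score in %, level in dB HL) from a performance-intensity function
    given as (level, percent_correct) pairs."""
    level, score = max(pi_function, key=lambda pair: pair[1])
    return score, level

# Hypothetical scores at four presentation levels; the drop at 85 dB HL suggests roll-over.
print(pb_max([(40, 60), (55, 88), (70, 92), (85, 76)]))  # (92, 70)
```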

Recently, Guthrie and Mackersie (2009) published a paper that compared several presentation levels to determine which level would result in maximum word recognition in individuals with different hearing loss configurations. They looked at presentation levels ranging from 10 dB above the SRT up to the UCL (uncomfortable listening level) minus 5 dB. Their results indicated that individuals with mild to moderate losses and those with more steeply sloping losses reached their best scores at UCL minus 5 dB. That was also true for patients with moderately severe to severe losses. The best phoneme recognition scores for their populations were achieved at a level of UCL minus 5 dB. As a reminder about speech recognition testing, masking is frequently needed because the test is presented at a level above threshold, in many cases well above threshold. Masking will always be needed for suprathreshold testing when the presentation level in the test ear is 40 dB or more above the best bone conduction threshold in the non-test ear if supra-aural phones are used.
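As a rough illustration of that masking rule, here is a minimal sketch in Python. The function name and the example values are hypothetical; the 40 dB default reflects the supra-aural earphone criterion stated above (a larger value would apply to insert earphones).

```python
def masking_needed(presentation_level_db_hl, best_nontest_bc_db_hl, criterion_db=40):
    """True when contralateral masking is indicated for suprathreshold speech testing:
    the presentation level exceeds the best non-test-ear bone conduction threshold
    by the criterion amount (40 dB for supra-aural earphones)."""
    return (presentation_level_db_hl - best_nontest_bc_db_hl) >= criterion_db

# Words presented at 75 dB HL, best non-test-ear bone conduction threshold of 30 dB HL
print(masking_needed(75, 30))  # True: the 45 dB difference meets the 40 dB criterion
```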

Full lists vs. half-lists. Another consideration is whether a full list or a half-list should be administered. Original lists were composed of 50 words and those 50 words were created for phonetic balance and for simplicity in scoring. It made it easy for the test to be scored if 50 words were administered and each word was worth 2%. Because 50-word lists take a long time, people often use half-lists or even shorter lists for the purpose of suprathreshold speech recognition testing. Let's look into this practice a little further.

An early study by Thornton and Raffin (1978) used the binomial distribution model to investigate the critical differences between a score and a retest score that would be necessary for the two to be considered statistically significantly different. Their findings showed that as set size increases, variability decreases, so it would seem that more items are better. More recently, Hurley and Sells (2003) conducted a study aimed at developing a test methodology that would identify those patients requiring a full 50-item suprathreshold test while allowing abbreviated testing of patients who do not. Using Auditec recordings, they developed 10-word and 25-word screening tests. They found that the 10-word and 25-word screening versions of the four NU-6 lists were able to differentiate listeners with impaired word recognition, who needed a full 50-word list, from those with unimpaired word recognition, who only needed the 10-word or 25-word list. If abbreviated testing is important, this would seem to be the protocol to follow. These screening lists are available in a recorded version, and the findings were based on a recorded version. Once again, it is important to use recorded materials whether you use a full list or an abbreviated list.
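The point about set size and variability can be illustrated with a small sketch. Treating a word recognition score as a binomial proportion, as Thornton and Raffin did, the standard error of the score shrinks as the number of scored items grows; the function below is only a simple illustration of that relationship, not a reproduction of their critical-difference tables.

```python
import math

def score_standard_error(percent_correct, n_items):
    """Standard error (in percentage points) of a recognition score treated as a
    binomial proportion over n_items scored items."""
    p = percent_correct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_items)

# The same 80% score is far less repeatable from a 10-word list than from a 50-word list.
for n in (10, 25, 50, 100):
    print(n, round(score_standard_error(80, n), 1))  # 12.6, 8.0, 5.7, 4.0
```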

Quiet vs. noise. Another consideration in suprathreshold speech recognition testing is whether to test in quiet or in noise. The effects of sensorineural hearing loss beyond the threshold loss, such as impaired frequency resolution or impaired temporal resolution, make speech recognition performance in quiet a poor predictor of how those individuals will perform in noise. Speech recognition in noise is being promoted by a number of experts because adding noise improves the sensitivity and the validity of the test. Giving the test at several levels provides better separation between people who have hearing loss and those who have normal hearing. We know that individuals with hearing loss have much more difficulty with speech recognition in noise than those with normal hearing, and that those with sensorineural hearing loss often require a much greater signal-to-noise ratio (SNR), 10 to 15 dB better, than normal hearers.

Monosyllabic words in noise have not been widely used in clinical evaluation, although several word lists are available. One of them is the Words-in-Noise (WIN) test, which presents NU-6 words in multi-talker babble. The words are presented at several different SNRs with the babble remaining at a constant level. One advantage of these kinds of tests is that they are adaptive: they can be administered in a shorter period of time, and they do not run into the same problems that we see with ceiling effects and floor effects. As I mentioned earlier, sentence tests in noise have become increasingly popular in hearing aid applications. Testing speech in noise is one way to look at amplification pre- and post-fitting. The Hearing in Noise Test (HINT) and the QuickSIN have gained popularity in those applications. The HINT was developed by Nilsson and colleagues in 1994 and later modified. It is scored as the signal-to-noise ratio in dB needed to achieve 50% correct performance on the sentences. The sentences are the BKB (Bamford-Kowal-Bench) sentences. They are presented in sets of 10, and the listener must repeat the entire sentence correctly to receive credit. In the HINT, the speech-spectrum noise stays constant and the signal level is varied to find that 50% point. The QuickSIN, developed by Killion and colleagues (2004), uses the IEEE sentences. It has six sentences per list, with five key words scored in each sentence. All are presented in multi-talker babble. The sentences are presented one at a time in 5 dB decrements from a high positive SNR down to 0 dB SNR. Again, the test is scored as the 50% point in terms of dB signal-to-noise ratio. The guide proposed by Killion is that an SNR loss of around 0 to 3 dB is considered normal, 3 to 7 dB a mild SNR loss, 7 to 15 dB a moderate SNR loss, and greater than 15 dB a severe SNR loss.
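A minimal sketch of that categorization guide follows, assuming scores that fall exactly on a boundary belong to the lower category (the presentation does not specify how to treat boundary values).

```python
def classify_snr_loss(snr_loss_db):
    """Map a QuickSIN SNR loss (dB) to the descriptive categories attributed to Killion."""
    if snr_loss_db <= 3:
        return "normal"
    if snr_loss_db <= 7:
        return "mild SNR loss"
    if snr_loss_db <= 15:
        return "moderate SNR loss"
    return "severe SNR loss"

print(classify_snr_loss(5.5))   # mild SNR loss
print(classify_snr_loss(16.0))  # severe SNR loss
```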

Scoring. Scoring is another issue in suprathreshold speech recognition testing. It is generally done on a whole-word basis, but phoneme scoring is another option. Phoneme scoring is a way of increasing the set size, giving you more items to score without adding to the time of the test. If whole-word scoring is used, the word must be exactly correct; in this situation, being close does not count. Over time, different scoring categorizations have been proposed, although the percentages attributed to those categories vary among the different proposals.

The traditional categorizations include excellent, good, fair, poor, and very poor. These categories are defined as:  

  • Excellent or within normal limits = 90 - 100% on whole word scoring
  • Good or slight difficulty = 78 - 88%
  • Fair to moderate difficulty = 66 - 76%
  • Poor or great difficulty = 54 - 64%
  • Very poor = < 52%
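As a quick sketch, these categories can be mapped in code. The cut-offs below follow the list above; because 50-word lists are scored in 2% steps, the apparent gaps between categories (for example, 89%) do not occur in practice.

```python
def categorize_wrs(score_percent):
    """Map a whole-word recognition score (%) to the traditional descriptive category."""
    if score_percent >= 90:
        return "excellent / within normal limits"
    if score_percent >= 78:
        return "good / slight difficulty"
    if score_percent >= 66:
        return "fair / moderate difficulty"
    if score_percent >= 54:
        return "poor / great difficulty"
    return "very poor"

print(categorize_wrs(76))  # fair / moderate difficulty
```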

A very useful test, routinely administered to those being considered for hearing aids, is the most comfortable listening level (MCL), the level at which a listener finds listening most comfortable. The materials used for this are usually cold running speech or connected discourse. The listener is asked to rate the level at which listening is most comfortable. Several trials are usually completed because most comfortable listening is typically a range, not a single value; people sometimes want sounds a little louder or a little softer, so a range is the more appropriate term. Whatever is obtained, whether a most comfortable level or a most comfortable range, should be recorded on the audiogram, and again, the material used should also be noted. As I mentioned earlier, the MCL is often not the level at which a listener achieves maximum intelligibility, so using the MCL to determine where the suprathreshold speech recognition measure will be done is not a good application of this test. The study I mentioned earlier showed that maximum intelligibility was reached for most people with hearing loss at UCL minus 5 dB. The MCL is useful, however, in determining the acceptable noise level (ANL).

The uncomfortable listening level (UCL) is also measured with cold running speech. The instructions for this test can certainly influence the outcome, since what some individuals report as uncomfortable or uncomfortably loud may not really be their UCL but rather a preference for listening at a softer level. It is important to define for the patient what you mean by uncomfortably loud. The utility of the UCL is in providing an estimate of the dynamic range for speech, which is the difference between the UCL and the SRT. In normals, this range is usually 100 dB or more, but it is reduced, often dramatically, in ears with sensorineural hearing loss. By measuring the UCL, you can get an estimate of the individual's dynamic range for speech.
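A one-line illustration of that calculation, with hypothetical example values:

```python
def speech_dynamic_range(ucl_db_hl, srt_db_hl):
    """Dynamic range for speech: the difference between the UCL and the SRT."""
    return ucl_db_hl - srt_db_hl

print(speech_dynamic_range(105, 5))  # 100 dB, typical of a normal ear
print(speech_dynamic_range(95, 60))  # 35 dB, a dramatically reduced range
```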

Acceptable Noise Level (ANL) is the amount of background noise that a listener is willing to accept while listening to speech (Nabelek, Tucker, & Letowski, 1991). It is a test of noise tolerance, and it has been shown to be related to successful use of hearing aids and to potential benefit from hearing aids (Nabelek, Freyaldenhoven, Tampas, & Muenchen, 2006). It uses the MCL and a measure known as the background noise level (BNL). To conduct the test, a recorded speech passage is presented to the listener in the sound field at the MCL; again, note the use of recorded materials. Noise is then introduced and raised to the highest level the person is able to accept, or "put up with," while listening to and following the story in the speech passage. The ANL is the difference between the MCL and the BNL. Individuals with very low ANLs are considered successful hearing aid users or good candidates for hearing aids; those with very high ANLs are considered unsuccessful users or poor candidates. Obviously there are a number of other applications for speech in audiologic practice, not the least of which is the assessment of auditory processing; many seminars could be conducted on that topic alone. Another application, or future direction, for speech audiometry is to more realistically assess hearing aid performance in "real world" environments. This is an area where research is currently underway.
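A minimal sketch of the ANL calculation, with hypothetical example levels:

```python
def acceptable_noise_level(mcl_db_hl, bnl_db_hl):
    """ANL = MCL minus BNL (Nabelek et al., 1991)."""
    return mcl_db_hl - bnl_db_hl

# A listener whose MCL is 55 dB HL and who accepts babble up to 48 dB HL has an ANL of 7 dB.
print(acceptable_noise_level(55, 48))  # 7; lower ANLs are associated with successful hearing aid use
```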

Question: Are there any more specific instructions for the UCL measurement?

Answer: Instructions are very important. We need to make it clear to a patient exactly what we expect them to do. I personally do not like things loud. If I am asked to indicate what is uncomfortably loud, I am much below what is really my UCL. I think you have to be very direct in instructing your patients: you are not looking for a little uncomfortable, but for the level where they just do not want to hear it or cannot take it.

Question: Can you sum up what the best methods are to test hearing aid performance? I assume this means with speech signals.

Answer: I think the use of the HINT or the QuickSIN would be the most useful as a behavioral test. We have other ways of looking at performance that are not behavioral.

Question: What about dialects? In my area, some of the local dialects have clipped words during speech testing. I am not sure if I should count those as correct or incorrect.

Answer: It all depends on your situation. If a patient's production is really reflective of the dialect of that region and they are saying the word as everyone else in that area would say it, then I would say they do have the word correct. If necessary, if you are really unclear, you can always ask the patient to spell the word or write it down. This extra time can be inconvenient, but that is the best way to be sure that they have correctly identified the word.

Question: Is there a reference for the bracketing method?

Answer: The bracketing method is based on the old modified Hughson-Westlake procedure that many people use for pure tone threshold testing. It is very similar to that traditional down 10 dB, up 5 dB approach. I am sure there are more references, but the Hughson-Westlake is what bracketing is based on.

Question: Once you get an SRT result, if you want to compare it to the thresholds to validate your pure tones, how do you compare it to the audiogram?

Answer: If it is a flat hearing loss, then you can compare to the 3-frequency pure tone average (PTA). If there is a high-frequency loss, where audibility at perhaps 2000 Hz is greatly reduced, then it is better to use just the average of 500 Hz and 1000 Hz as your comparison. If it is a steeply sloping loss, then you look for agreement with the best threshold, which would probably be the 500 Hz threshold. The reverse is also true for patients who have rising configurations. Compare the SRT to the best two frequencies of the PTA if the loss has either a steep slope or a steep rise, or to the best frequency in the PTA if it is a really precipitous change in configuration.

Question: Where can I find speech lists in Russian or other languages?

Answer: Auditec has some material available in languages other than English - it would be best to contact them directly. You can also view their catalog at www.auditec.com

Carolyn Smaka: This raises a question I have. If an audiologist is not fluent in a particular language, such as Spanish, is it ok to obtain a word list or recording in that language and conduct speech testing?

Janet Schoepflin: I do not think that is a good practice. If you are not fluent in a language, you do not know all the subtleties of that language and the allophonic variations. People want to get an estimation of suprathreshold speech recognition and this would be an attempt to do that. This goes along with dialect.
Whether you are using a recording or doing your best to say the words exactly as they are supposed to be said, if your patient is fluent in a language and says the word back to you, since you are not familiar with all the variations in that language, it is possible that you will score the word incorrectly. You may think it is correct when it is actually incorrect, or you may think it is incorrect when it is correct, based on the dialect or variation of that language.

Question: In school we were instructed to use the full 50-word list for any word discrimination testing at suprathreshold levels, but that if we are pressed for time, a half word list would be okay. However, my professor warned us that we absolutely must go in order on the word list. Can you clarify this?

Answer: I'm not sure why that might have been said. I was trained in the model to use the 50-word list, because the phonetic balance that was proposed for those words was based on the 50 words. If you only used 25 words, you were not getting the phonetic balance. I think the more current findings from Hurley and Sells show us that it is possible to use a shorter list developed specifically for this purpose. It should be the recorded version of those words. These lists are available through Auditec.

Question: On the NU-6 list, the words 'tough' and 'puff' are next to each other. 'Tough' is often mistaken for 'puff', so then when we read 'puff', the person looks confused. Is it okay to mix up the order on the word list?

Answer: I think in that case it is perfectly fine to move that one word down.

Question: When do you recommend conducting speech testing, before or after pure tone testing?

Answer: I have always been a person who likes to interact with my patients. My own procedure is to do an SRT first. Frequently for an SRT I do use live voice; I do not use monitored live voice for suprathreshold testing. It gives me a time to interact with the patient. People feel comfortable with speech. It is a communicative act. Then I do pure tone testing. Personally I would not do suprathreshold testing until I finished pure tone testing. My sequence is often SRT, pure tones, and then suprathreshold testing. If this is not a good protocol for you based on time, then I would conduct pure tone testing, SRT, and then suprathreshold testing.

Question: Some of the spondee words are outdated, such as inkwell and whitewash. Is it okay to substitute other words that we know are spondee words but may not be on the list? Or if we familiarize people, does it matter?

Answer: The words that are on the list were put there for their so-called familiarity, but also because they were somewhat homogeneous and equal in intelligibility. I think inkwell, drawbridge, and whitewash are outdated. If you follow a protocol where you are using a representative sample of the words and you are familiarizing, I think it is perfectly fine to eliminate the words you do not want to use. You just do not want to end up using only five or six words, as that will limit the test set.

Question: At what age is it appropriate to expect a child to perform suprathreshold speech recognition testing?

Answer: If the child has a receptive language age of around 4 or 5 years, maybe even 3 years, it is possible to use the NU-CHIPS as a measure. It really depends on language more than anything else, and on whether the child can sit still for a period of time to do the test.
Question: Regarding masking, when you are going 40 dB above the bone conduction threshold in the non-test ear, what frequency are you looking at? Are you comparing speech presented at 40 dB above a pure tone average of the bone conduction thresholds?

Answer: The best bone conduction threshold in the non-test ear is what should be used.

Question: When seeing a patient in follow-up after an ENT prescribes steroid therapy for hydrops, do you recommend using the same word list to compare their suprathreshold speech recognition?

Answer: I think it is better to use a different list, personally. Word familiarity, as we said, can influence even threshold, and it certainly can affect suprathreshold performance. I think it is best to use a different word list.

Carolyn Smaka: Thanks to everyone for their questions. Dr. Schoepflin has provided her email address with the handout. If your question was not answered or if you have further thoughts after the presentation, please feel free to follow up directly with her via email.

Janet Schoepflin: Thank you so much. It was my pleasure and I hope everyone found the presentation worthwhile.

American Speech-Language-Hearing Association. (1988). Determining threshold level for speech [Guidelines]. Available from www.asha.org/policy

Gardner, H. (1971). Application of a high-frequency consonant discrimination word list in hearing-aid evaluation. Journal of Speech and Hearing Disorders, 36, 354-355.

Guthrie, L., & Mackersie, C. (2009). A comparison of presentation levels to maximize word recognition scores. Journal of the American Academy of Audiology, 20(6), 381-390.

Hurley, R., & Sells, J. (2003). An abbreviated word recognition protocol based on item difficulty. Ear & Hearing, 24(2), 111-118.

Killion, M., Niquette, P., Gudmundsen, G., Revit, L., & Banerjee, S. (2004). Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. Journal of the Acoustical Society of America, 116(4 Pt 1), 2395-2405.

Nabelek, A., Freyaldenhoven, M., Tampas, J., Burchfield, S., & Muenchen, R. (2006). Acceptable noise level as a predictor of hearing aid use. Journal of the American Academy of Audiology, 17, 626-639.

Nabelek, A., Tucker, F., & Letowski, T. (1991). Toleration of background noises: Relationship with patterns of hearing aid use by elderly persons. Journal of Speech and Hearing Research, 34, 679-685.

Nilsson, M., Soli, S., & Sullivan, J. (1994). Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. Journal of the Acoustical Society of America, 95(2), 1085-1099.

Thornton, A., & Raffin, M. (1978). Speech-discrimination scores modeled as a binomial variable. Journal of Speech and Hearing Research, 21, 507-518.

Tillman, T., & Jerger, J. (1959). Some factors affecting the spondee threshold in normal-hearing subjects. Journal of Speech and Hearing Research, 2, 141-146.


Chair, Communication Sciences and Disorders, Adelphi University

Janet Schoepflin is an Associate Professor and Chair of the Department of Communication Sciences and Disorders at Adelphi University and a member of the faculty of the Long Island AuD Consortium.  Her areas of research interest include speech perception in children and adults, particularly those with hearing loss, and the effects of noise on audition and speech recognition performance.






Speech Audiometry: An Introduction

Table of contents:

  • What is speech audiometry?
  • Why perform speech audiometry?
  • Contraindications and considerations
  • Audiometers that can perform speech audiometry
  • How to perform speech audiometry
  • Results interpretation
  • Calibration for speech audiometry

Speech audiometry is an umbrella term used to describe a collection of audiometric tests using speech as the stimulus. You can perform speech audiometry by presenting speech to the subject in both quiet and in the presence of noise (e.g. speech babble or speech noise). The latter is speech-in-noise testing and is beyond the scope of this article.

Speech audiometry is a core test in the audiologist’s test battery because pure tone audiometry (the primary test of hearing sensitivity) is a limited predictor of a person’s ability to recognize speech. Improving an individual’s access to speech sounds is often the main motivation for fitting them with a hearing aid. Therefore, it is important to understand how a person with hearing loss recognizes or discriminates speech before fitting them with amplification, and speech audiometry provides a method of doing this.

A decrease in hearing sensitivity, as measured by pure tone audiometry, results in greater difficulty understanding speech. However, the literature also shows that two individuals of the same age with similar audiograms can have quite different speech recognition scores. Therefore, by performing speech audiometry, an audiologist can determine how well a person can access speech information.

Acquiring this information is key in the diagnostic process. For instance, it can assist in differentiating between different types of hearing loss. You can also use information from speech audiometry in the (re)habilitation process. For example, the results can guide you toward the appropriate amplification technology, such as directional microphones or remote microphone devices. Speech audiometry can also provide the audiologist with a prediction of how well a subject will hear with their new hearing aids. You can use this information to set realistic expectations and help with other aspects of the counseling process.

Below are some more examples of how you can use the results obtained from speech testing.

Identify need for further testing

Based on the results from speech recognition testing, it may be appropriate to perform further testing to get more information on the nature of the hearing loss. An example could be to perform a TEN test to detect a dead region or to perform the Audible Contrast Threshold (ACT™) test.

Inform amplification decisions

You can use the results from speech audiometry to determine whether binaural amplification is the most appropriate fitting approach or if you should consider alternatives such as CROS aids.

You can use the results obtained through speech audiometry to discuss and manage the amplification expectations of patients and their communication partners.

Unexpected asymmetric speech discrimination, significant roll-over, or particularly poor speech discrimination may warrant further investigation by a medical professional.

Non-organic hearing loss

You can use speech testing to cross-check the results from pure tone audiometry for suspected non‑organic hearing loss.

Contraindications and considerations when performing speech audiometry

Before speech audiometry, it is important that you perform pure tone audiometry and otoscopy. Results from these procedures can reveal contraindications to performing speech audiometry.

Otoscopic findings

Speech testing using headphones or inserts is generally contraindicated when the ear canal is occluded with:

  • Foreign body
  • Infective otitis externa

In these situations, you can perform bone conduction speech testing or sound field testing.

Audiometric findings

Speech audiometry can be challenging to perform in subjects with severe-to-profound hearing losses as well as asymmetrical hearing losses where the level of stimulation and/or masking noise  required is beyond the limits of the audiometer or the patient's uncomfortable loudness levels (ULLs).

Subject variables

Depending on the age or language ability of the subject, complex words may not be suitable. This is particularly true for young children and adults with learning disabilities or other complex presentations such as dementia and reduced cognitive function.

You should also perform speech audiometry in a language which is native to your patient. Speech recognition testing may not be suitable for patients with expressive speech difficulties. However, in these situations, speech detection testing should be possible.

Before we discuss speech audiometry in more detail, let’s briefly consider the instrumentation to deliver the speech stimuli. As speech audiometry plays a significant role in diagnostic audiometry, many audiometers include – or have the option to include – speech testing capabilities.

Table 1 outlines which audiometers from Interacoustics can perform speech audiometry.

Table 1: Audiometers from Interacoustics that can perform speech audiometry.

Because speech audiometry uses speech as the stimulus and languages differ across the globe, the way in which speech audiometry is implemented varies depending on the country where the test is being performed. For the purposes of this article, we will start by addressing how to measure speech in quiet, using the International Organization for Standardization standard ISO 8253-3:2022 as the reference for the terminology and processes encompassing speech audiometry. We will describe two tests: speech detection testing and speech recognition testing.

Speech detection testing

In speech detection testing, you ask the subject to identify when they hear speech (not necessarily understand). It is the most basic form of speech testing because understanding is not required. However, it is not commonly performed. In this test, words are normally presented to the ear(s) through headphones (monaural or binaural testing) or through a loudspeaker (binaural testing).

Speech detection threshold (SDT)

Here, the tester will present speech at varying intensity levels and the patient identifies when they can detect speech. The goal is to identify the level at which the patient detects speech in 50% of the trials. This is the speech detection threshold. It is important not to confuse this with the speech discrimination threshold. The speech discrimination threshold looks at a person’s ability to recognize speech and we will explain it later in this article.

The speech detection threshold has been found to correlate well with the pure tone average, which is calculated from pure tone audiometry. Because of this, the main application of speech detection testing in the clinical setting is confirmation of the audiogram.

Speech recognition testing

In speech recognition testing, also known as speech discrimination testing, the subject must not only detect the speech, but also correctly recognize the word or words presented. This is the most popular form of speech testing and provides insights into how a person with hearing loss can discriminate speech in ideal conditions.

Across the globe, the methods of obtaining this information are different and this often leads to confusion about speech recognition testing. Despite there being differences in the way speech recognition testing is performed, there are some core calculations and test parameters which are used globally.

Speech recognition testing: Calculations

There are two main calculations in speech recognition testing.

1. Speech recognition threshold (SRT)

This is the level in dB HL at which the patient recognizes 50% of the test material correctly. This level will differ depending on the test material used. Some references describe the SRT as the speech discrimination threshold or SDT. This can be confusing because the acronym SDT belongs to the speech detection threshold. For this reason, we will not use the term discrimination but instead continue with the term speech recognition threshold.

2. Word recognition score (WRS)

In word recognition testing, you present a list of phonetically balanced words to the subject at a single intensity and ask them to repeat the words they hear. You score whether the patient repeats each word correctly or incorrectly. The score, expressed as a percentage of correct words, is calculated by dividing the number of words correctly identified by the total number of words presented.
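A minimal sketch of that calculation; the function name and values are illustrative only.

```python
def word_recognition_score(num_correct, num_presented):
    """Word recognition score: percentage of presented words repeated correctly."""
    return 100.0 * num_correct / num_presented

print(word_recognition_score(45, 50))  # 90.0
```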

In some countries, multiple word recognition scores are recorded at various intensities and plotted on a graph. In other countries, a single word recognition score is performed using a level based on the SRT (usually presented 20 to 40 dB louder than the SRT).

Speech recognition testing: Parameters

Before completing a speech recognition test, there are several parameters to consider.

1. Test transducer

You can perform speech recognition testing using air conduction, bone conduction, and speakers in a sound-field setup.

2. Types of words

Speech recognition testing can be performed using a variety of different words or sentences. Some countries use monosyllabic words such as ‘boat’ or ‘cat’ whereas other countries prefer to use spondee words such as ‘baseball’ or ‘cowboy’. These words are then combined with other words to create a phonetically balanced list of words called a word list.

3. Number of words

The number of words in a word list can impact the score. If there are too few words in the list, then there is a risk that not enough data points are acquired to accurately calculate the word recognition score. However, too many words may lead to increased test times and patient fatigue. Word lists often consist of 10 to 25 words.

4. Scoring

You can either score words as whole words or by the number of phonemes they contain.

An example of scoring can be illustrated by the word ‘boat’. When scoring using whole words, anything other than the word ‘boat’ would result in an incorrect score.

However, in phoneme scoring, the word ‘boat’ is broken down into its individual phonemes: /b/, /oa/, and /t/. Each phoneme is then scored as a point, meaning that the word boat has a maximum score of 3. An example could be that a patient mishears the word ‘boat’ and reports the word to be ‘float’. With phoneme scoring, 2 points would be awarded for this answer whereas in word scoring, the word float would be marked as incorrect.
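The 'boat'/'float' example can be expressed as a small sketch. Real clinical scoring is done by listening; the simple phoneme-set comparison below is only an illustration of how the two scoring methods diverge for the same response.

```python
def score_response(target_phonemes, response_phonemes):
    """Return (phoneme points, whole-word correct) for one test item."""
    phoneme_points = len(set(target_phonemes) & set(response_phonemes))
    whole_word_correct = target_phonemes == response_phonemes
    return phoneme_points, whole_word_correct

# 'boat' = /b/ /oa/ /t/, heard as 'float' = /f/ /l/ /oa/ /t/
print(score_response(["b", "oa", "t"], ["f", "l", "oa", "t"]))  # (2, False)
```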

5. Delivery of material

Modern audiometers have the functionality of storing word lists digitally onto the hardware of the device so that you can deliver a calibrated speech signal the same way each time you test a patient. This is different from the older methods of testing using live voice or a CD recording of the speech material. Using digitally stored and calibrated speech material in .wav files provides the most reliable and repeatable results as the delivery of the speech is not influenced by the tester.

6. Aided or unaided

You can perform speech recognition testing either aided or unaided. When performing aided measurements, the stimulus is usually played through a loudspeaker and the test is recorded binaurally.

Global examples of how speech recognition testing is performed and reported

Below are examples of how speech recognition testing is performed in the US and the UK. This will show how speech testing varies across the globe.

Speech recognition testing in the US: Speech tables

In the US, the SRT and WRS are usually performed as two separate tests using different word lists for each test. The results are displayed in tables called speech tables.

The SRT is the first speech test which is performed and typically uses spondee words (a word with two equally stressed syllables, such as ‘hotdog’) as the stimulus. During this test, you present spondee words to the patient at different intensities, and a bracketing technique establishes the threshold at which the patient correctly identifies 50% of the words.

In the below video, we can see how an SRT is performed using spondee words.

Below, you can see a table showing the results from an SRT test (Figure 1). Here, we can see that the SRT has been measured in each ear. The table shows the intensity at which the SRT was found as well as the transducer, word list, and the level at which masking noise was presented (if applicable). Here we see an unaided SRT of 30 dB HL in both the left and right ears.

For both ears, the transducer type is phone and the masking level is 15 dB HL. The word list for the right ear is Spondee A, while the word list for the left ear is Spondee B.

Once you have established the intensity of the SRT in dB HL, you can use it to calculate the intensity at which to present the next list of words to measure the WRS. In WRS testing, it is common to start at an intensity of between 20 dB and 40 dB louder than the speech recognition threshold and to use a different word list from the SRT. The word lists most commonly used in the US for WRS are the NU-6 and CID W-22 word lists.

In word recognition score testing, you present an entire word list to the test subject at a single intensity and score each word based on whether the subject can correctly repeat it or not. The results are reported as a percentage.

The video below demonstrates how to perform the word recognition score.

Below is an image of a speech table showing the word recognition score in the left ear using the NU‑6 word list at an intensity of 55 dB HL (Figure 2). Here we can see that the patient in this example scored 90%, indicating good speech recognition at moderate intensities.


Speech recognition testing in the UK: Speech audiogram

In the UK, speech recognition testing is performed with the goal of obtaining a speech audiogram. A speech audiogram is a graphical representation of how well an individual can discriminate speech across a variety of intensities (Figure 3).


In the UK, the most common method of recording a speech audiogram is to present several different word lists to the subject at varying intensities and calculate multiple word recognition scores. The AB (Arthur Boothroyd) word lists are the most used lists. The initial list is presented around 20 to 30 dB sensation level with subsequent lists performed at quieter intensities before finally increasing the sensation level to determine how well the patient can recognize words at louder intensities.

The speech audiogram is made up of plotting the WRS at each intensity on a graph displaying word recognition score in % as a function of intensity in dB HL. The following video explains how it is performed.

Below is an image of a completed speech audiogram (Figure 4). There are several components.

Point A on the graph shows the intensity in dB HL where the person identified 50% of the speech material correctly. This is the speech recognition threshold or SRT.

Point B on the graph shows the maximum speech recognition score which informs the clinician of the maximum score the subject obtained.

Point C on the graph shows the reference speech recognition curve; this is specific to the test material used (e.g., AB words) and method of presentation (e.g., headphones), and shows a curve which describes the median speech recognition scores at multiple intensities for a group of normal hearing individuals.

Point A is at about 45 dB HL. Point B is at about 70 dB HL.

Having this displayed on a single graph can provide a quick and easy way to determine and analyze the ability of the person to hear speech and compare their results to a normative group. Lastly, you can use the speech audiogram to identify roll-over. Roll-over occurs when the speech recognition deteriorates at loud intensities and can be a sign of retro-cochlear hearing loss. We will discuss this further in the interpretation section.

Masking in speech recognition testing

Just like in audiometry, cross hearing can also occur in speech audiometry. Therefore, it is important to mask the non-test ear when testing monaurally. Masking is important because word recognition testing is usually performed at supra-threshold levels. Speech encompasses a wide spectrum of frequencies, so the use of narrowband noise as a masking stimulus is not appropriate, and you need to modify the masking noise for speech audiometry. In speech audiometry, speech noise is typically used to mask the non-test ear.

There are several approaches to calculating required masking noise level. An equation by Coles and Priede (1975) suggests one approach which applies to all types of hearing loss (sensorineural, conductive, and mixed):

  • Masking level = D_S + max ABG_NT − 40 + E_M

It considers the following factors.

1. Dial setting

D_S is the dial setting in dB HL at which speech is presented to the test ear.

2. Air-bone gap

Max ABG_NT is the maximum air-bone gap between 250 and 4000 Hz in the non-test ear.

3. Interaural attenuation

Interaural attenuation: The value of 40 comes from the minimum interaural attenuation for masking in audiometry using headphones (for insert earphones, this would be 55 dB).

4. Effective masking

E_M is effective masking. Modern audiometers are calibrated in E_M, so you don’t need to include this in the calculation. However, if you are using an old audiometer calibrated to an older calibration standard, then you should calculate E_M.

You can calculate it by measuring the difference in the speech dial setting presented to normal listeners at a level that yields a score of 95% in quiet and the noise dial setting presented to the same ear that yields a score less than 10%. 
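Putting the terms together, here is a minimal sketch of the Coles and Priede calculation; the function name and the example values are hypothetical.

```python
def speech_masking_level(dial_setting_db_hl, max_abg_nontest_db,
                         interaural_attenuation_db=40, effective_masking_db=0):
    """Masking level = D_S + max ABG_NT - interaural attenuation + E_M.
    The 40 dB default assumes supra-aural headphones (use 55 for insert earphones);
    E_M is 0 on modern audiometers calibrated in effective masking."""
    return (dial_setting_db_hl + max_abg_nontest_db
            - interaural_attenuation_db + effective_masking_db)

# Speech at 70 dB HL with a 20 dB maximum air-bone gap in the non-test ear, supra-aural phones
print(speech_masking_level(70, 20))  # 50 dB HL of speech noise to the non-test ear
```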

You can use the results from speech audiometry for many purposes. The below section describes these applications.

1. Cross-check against pure tone audiometry results

The cross-check principle in audiology states that no auditory test result should be accepted and used in the diagnosis of hearing loss until you confirm or cross-check it by one or more independent measures (Hall J. W., 3rd, 2016). Speech-in-quiet testing serves this purpose for the pure tone audiogram.

The following scores and their descriptions identify how well the speech detection threshold and the pure tone average correlate (Table 2).

Table 2: Correlation between speech detection threshold and pure tone average.

If there is a poor correlation between the speech detection threshold and the pure tone average, it warrants further investigation to determine the underlying cause or to identify if there was a technical error in the recordings of one of the tests.

2. Detect asymmetries between ears

Another core use of speech audiometry in quiet is to determine the symmetry between the two ears and whether it is appropriate to fit binaural amplification. Significant differences between ears can occur when there are two different etiologies causing hearing loss.

An example of this could be a patient with sensorineural hearing loss who then also develops unilateral Meniere's disease. In this example, it would be important to understand whether there are significant differences in the word recognition scores between the two ears. If there are, it may not be appropriate to fit binaural amplification, and other forms of amplification such as contralateral routing of sound (CROS) devices may be more appropriate.

3. Identify if further testing is required

The results from speech audiometry in quiet can identify whether further testing is required. This could be highlighted in several ways.

One example could be a large discrepancy between the SRT and the pure tone average. Another could be a significant asymmetry between the two ears. Lastly, very poor speech recognition scores in quiet might also be a red flag for further testing.

In these examples, the clinician might decide to perform a test for cochlear dead regions, such as the TEN test, or an ACT test to get more information.

4. Detect retro-cochlear hearing loss

In subjects with retro-cochlear causes of hearing loss, speech recognition can begin to deteriorate as sounds are made louder. This is called ‘roll-over’ and is calculated by the following equation:

  • Roll-over index = (maximum score − minimum score) / maximum score

If the roll-over index exceeds a criterion value (which depends on the word list chosen for testing but is commonly 0.4), it is considered a sign of retro-cochlear pathology. This can then influence the fitting strategy for patients exhibiting these results.
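
A minimal sketch of the roll-over calculation with hypothetical scores (the 0.4 criterion, as noted above, varies with the word list):

```python
def rollover_index(max_score, min_score):
    """Roll-over index = (maximum score - minimum score) / maximum score,
    where scores are word recognition scores (%) at increasing presentation levels."""
    return (max_score - min_score) / max_score


ri = rollover_index(80, 40)   # hypothetical: 80% at a moderate level, 40% at a high level
print(ri, ri > 0.4)           # 0.5 True -> consistent with a retro-cochlear pattern
```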

It is important to note, however, that as the cross-check principle states, you should interpret any roll-over with caution and perform additional tests, such as acoustic reflexes, reflex decay, or auditory brainstem response measurements, to confirm the presence of a retro-cochlear lesion.

5. Predict success with amplification

The maximum speech recognition score is a useful measure for predicting whether a person will benefit from hearing aids. More recent, advanced tests, such as the ACT test combined with the Acceptable Noise Level (ANL) test, offer good alternatives for predicting success with amplification.

Just like in pure tone audiometry, the stimuli presented during speech audiometry require annual calibration by a specialized technician. The audiometer's transducers should also be checked daily to determine whether the speech stimulus contains any distortions or level abnormalities; this replicates the daily checks a clinician would do for pure tone audiometry. If speech is being presented using a sound field setup, then you can use a sound level meter to check that the material is being presented at the correct level.

The next level of calibration depends on how the speech material is delivered to the audiometer. Speech material can be presented in many ways including live voice, CD, or installed WAV files on the audiometer. Speech being presented as live voice cannot be calibrated but instead requires the clinician to use the VU meter on the audiometer (which indicates the level of the signal being presented) to determine if they are speaking at the correct intensity. Speech material on a CD requires daily checks and is also performed using the VU meter on the audiometer. Here, a speech calibration tone track on the CD is used, and the VU meter is adjusted accordingly to the desired level as determined by the manufacturer of the speech material.

The most reliable way to deliver a speech stimulus is through a WAV file. By presenting through a WAV file, you can skip the daily tone-based calibration as this method allows you to calibrate the speech material as part of the annual calibration process. This saves the clinician time and ensures the stimulus is calibrated to the same standard as the pure tones in their audiometer. To calibrate the WAV file stimulus, the speech material is calibrated against a speech calibration tone. This is stored on the audiometer. Typically, a 1000 Hz speech tone is used for the calibration and the calibration process is the same as for a 1000 Hz pure tone calibration.

Lastly, if the speech is being presented through the sound field, a calibration professional should perform an annual sound field speaker calibration using an external free field microphone aimed directly at the speaker from the position of the patient’s head.

References

Coles, R. R., & Priede, V. M. (1975). Masking of the non-test ear in speech audiometry. The Journal of Laryngology and Otology, 89(3), 217–226.

Graham, J., & Baguley, D. (2009). Ballantyne's Deafness (7th ed.). Wiley-Blackwell.

Hall, J. W., 3rd. (2016). Crosscheck principle in pediatric audiology today: A 40-year perspective. Journal of Audiology & Otology, 20(2), 59–67.

Katz, J. (2009). Handbook of Clinical Audiology. Wolters Kluwer.

Killion, M. C., Niquette, P. A., Gudmundsen, G. I., Revit, L. J., & Banerjee, S. (2004). Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America, 116(4), 2395–2405.

Stach, B. A. (1998). Clinical Audiology: An Introduction. Cengage Learning.


Speech Audiometry

Introduction

Speech audiometry is an important component of a comprehensive hearing evaluation. There are several kinds of speech audiometry, but the most common uses are to 1) verify the pure tone thresholds, 2) determine speech understanding, and 3) determine most comfortable and uncomfortable listening levels. The results are used with the other tests to develop a diagnosis and treatment plan.

SDT = Speech Detection Threshold, SAT = Speech Awareness Threshold.   These terms are interchangeable and they describe the lowest level at which a patient can hear the presence of speech 50% of the time.   They specifically refer to the speech being AUDIBLE, not INTELLIGIBLE.

This test is performed by presenting spondee (two-syllable) words such as baseball, ice cream, and hotdog; the patient responds when they hear the speech. This is often used with non-verbal patients such as infants or other difficult-to-test populations. The thresholds should correspond to the PTA and are used to verify the pure tone threshold testing.

How to Test:      

Instruct the patient that he or she will be hearing words that have two parts, such as “mushroom” or “baseball.” The patient should repeat the words and if not sure, he or she should not be afraid to guess.

Using either live voice or recorded speech, present the spondee word lists testing the better ear first. Start 20 dB above the 1000 Hz pure tone threshold level. Present one word on the list and, if the response is correct, lower the level by 5 dB. Continue until the patient has difficulty with the words. When this occurs, present more words for each 5 dB step.

Speech Reception Threshold (SRT)

SRT, or speech reception threshold, is a fast way to help verify that the pure tone thresholds are valid. Common compound words, or spondee words, are presented at varying degrees of loudness until they are too soft for the patient to hear. SRT scores are compared to the pure tone average as part of the cross-check principle. When these two values agree, the reliability of testing is improved.

Word Recognition

Instruct the patient that he or she is to repeat the words presented. Using either live voice or recorded speech, present the standardized PB word list of your choice. Present the words at a level comfortable to the patient; at least 30 dB and generally 35 to 50 dB above the 1000 Hz pure tone threshold. Using the scorer buttons on the front panel, press the “Correct” button each time the right response is given and the “Incorrect” button each time a wrong response is given.

Speech Audiometry Testing Screen

The Discrimination Score is the percentage of words repeated correctly: Discrimination % at HL = 100 × (Number of Correct Responses / Number of Trials).
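
For example (a trivial sketch with hypothetical numbers):

```python
def discrimination_score(num_correct, num_trials):
    """Discrimination % at a given HL = 100 * correct responses / trials."""
    return 100.0 * num_correct / num_trials


print(discrimination_score(23, 25))  # 92.0% for a hypothetical 25-word list
```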

WRS = Word Recognition Score, SRS = Speech Reception Score, Speech Discrimination Score.   These terms are interchangeable and describe the patient’s capability to correctly repeat a list of phonetically balanced (PB) words at a comfortable level.   The score is a percentage of correct responses and indicates the patient’s ability to understand speech.

Word Recognition Score (WRS)

WRS, or word recognition score, is a type of speech audiometry that is designed to measure speech understanding. Sometimes it is called word discrimination. The words used are common and phonetically balanced and typically presented at a level that is comfortable for the patient. The results of WRS can be used to help set realistic expectations and formulate a treatment plan.

Speech In Noise Test

Speech in noise testing is a critical component to a comprehensive hearing evaluation. When you test a patient's ability to understand speech in a "real world setting" like background noise, the results influence the diagnosis, the recommendations, and the patient's understanding of their own hearing loss.

Auditory Processing

Sometimes, a patient's brain has trouble making sense of auditory information. This is called an auditory processing disorder. It's not always clear that this lack of understanding is a hearing issue, so it requires a very specialized battery of speech tests to identify what kind of processing disorder exists and develop recommendations to improve the listening and understanding for the patient.

QuickSIN is a quick sentence-in-noise test that quantifies how a patient hears in noise. The patient repeats sentences that are embedded in different levels of restaurant noise, and the result is an SNR loss, or signal-to-noise ratio loss. Taking a few additional minutes to measure the SNR loss of every patient seen in your clinic provides valuable insights on the overall status of the patient's auditory system and allows you to counsel more effectively about communication in real-world situations. Using the QuickSIN to make important decisions about hearing loss treatment and rehabilitation is a key differentiator for clinicians who strive to provide patient-centered care.

Speech-in-Noise Audiometry Testing Screen

BKB-SIN is a sentence-in-noise test that quantifies how patients hear in noise. The patient repeats sentences that are embedded in different levels of restaurant noise, and the result is an SNR loss, or signal-to-noise ratio loss. This test is designed to evaluate patients of many ages and has normative corrections for children and adults. Taking a few additional minutes to measure the SNR loss of every patient seen in your clinic is a key differentiator for clinicians who strive to provide patient-centered care.
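
As a hedged sketch of the QuickSIN scoring described by Killion et al. (2004) (constants apply to a standard single list; BKB-SIN uses its own age-based corrections supplied with the test material):

```python
def quicksin_snr_loss(key_words_correct):
    """One QuickSIN list: 6 sentences, 5 key words each, at SNRs from +25 to 0 dB
    in 5 dB steps. SNR-50 = 27.5 - total key words correct; SNR loss is referenced
    to the ~2 dB SNR-50 of normal-hearing listeners."""
    total = sum(key_words_correct)
    snr_50 = 27.5 - total
    return snr_50 - 2.0


# Hypothetical patient: 5, 5, 4, 3, 2, 1 key words correct across the six sentences
print(quicksin_snr_loss([5, 5, 4, 3, 2, 1]))  # 5.5 dB SNR loss
```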


BASLP COURSE

Speech Reception Thresholds – Procedure and Application

The speech reception threshold is the minimum hearing level for speech (ANSI, 2010) at which an individual can recognize 50% of the speech material. Speech reception thresholds are obtained for each ear. The term speech reception threshold is synonymous with speech recognition threshold.

Purpose of Speech Reception Thresholds:

  • To validate the thresholds obtained through PTA
  • To serve as a reference point for supra-threshold tests
  • To ascertain the need for aural (re)-habilitation and monitor its progress
  • To determine hearing sensitivity in difficult to test population

Materials for Speech Reception Thresholds:

  • Spondaic words are the usual and recommended test material for the SRT test. They are 2-syllable words that have equal stress on both syllables.
  • Word familiarization can be done prior to the start of the test. This ensures that the client is familiar with the test vocabulary and that the client's responses can be accurately interpreted by the clinician. Care should be taken to eliminate visual cues during familiarization.
  • Based on the circumstances or the individual (age, language facility, physical condition), the standard word list can be modified; however, the use of speech stimuli with less homogeneity than spondaic words may compromise the reliability of this measure.
  • The test material used should be noted in reporting of the results.

Response Format / Mode of Speech Reception Thresholds:

  • The usual response mode for obtaining the SRT is repetition of the stimulus item.
  • For many patients it is not possible to obtain verbal responses, necessitating alternative response modes such as writing down the responses or a closed set of choices such as picture pointing, signing, or visual scanning.
  • If picture pointing mode is to be used, then the clinician should be cautious in choosing the number of response items (e.g., between 6 and 12 words usually is appropriate).

Procedure of Speech Reception Thresholds:

There are different methods to obtain SRT – ascending or descending method.


Generally, descending method (ASHA, 1988) is preferred and is described below.

  • Obtain the pure tone average (PTA).
  • Starting level for the SRT: 30–40 dB above the anticipated SRT, or 20 dB SL (with reference to the PTA).
  • Present one spondee at a time at this level. Decrease in 10 dB decrements whenever the client's response is correct. The 10 dB decrements continue until one word is missed (the client responds incorrectly).
  • Now present a second spondaic word at the same level at which the client responded incorrectly.
  • If the second word is correctly identified by the client, the level is attenuated by 10 dB and two spondees are presented. This process continues until two spondees are incorrectly identified at one level. This completes the preliminary phase; the actual test phase then begins and is performed in 5 dB steps (Martin & Sides, 1985).
  • If you get a correct response for at least one spondee, reduce the intensity by 5 dB and present three spondees at that level.
  • Continue the same procedure until there is no correct response for all three spondees. Then increase by 5 dB, presenting three spondees at each level, until you get 2 of 3 spondees correct (>50%). That level can be considered the SRT (a procedural sketch is given below).

Tally sheet and calculation of the SRT with 5 dB steps according to the ASHA (1988) method.
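
The final (5 dB) test phase described in the list above can be sketched as follows; present_spondee is a hypothetical callback standing in for the clinician presenting a word and scoring the repetition, and the bracketing logic follows the description above rather than any official implementation:

```python
def srt_test_phase(start_level_db_hl, present_spondee, step_db=5):
    """Descend in 5 dB steps, three spondees per level, until none is repeated
    correctly; then ascend in 5 dB steps until at least 2 of 3 are correct (>50%).
    That level is taken as the SRT.

    present_spondee(level_db_hl) -> True if the client repeats the word correctly.
    """
    level = start_level_db_hl

    # Descending run: stop once all three spondees at a level are missed
    while True:
        correct = sum(present_spondee(level) for _ in range(3))
        if correct == 0:
            break
        level -= step_db

    # Ascending run: the first level with 2 or more of 3 correct is the SRT
    while True:
        level += step_db
        correct = sum(present_spondee(level) for _ in range(3))
        if correct >= 2:
            return level
```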

Interpretation of Speech Reception Thresholds:

  • The SRT shall be recorded in dB HL. The results should be recorded for each ear on the same form that contains the client's results for pure tone audiometry. Additional space should be available to report other pertinent information that describes the test situation, such as alternative materials or response modes.
  • The SRT and PTA usually agree within 6–12 dB.
  • If there is disagreement, it could indicate one of several possibilities: misunderstanding of the instructions, functional (non-organic) hearing loss, instrument malfunction, pathology along the central auditory nervous system (CANS) including the VIII nerve, cognitive and language difficulties, etc. For example, the SRT can be poorer than the PTA in elderly clients and in auditory processing disorders, whereas the SRT can be better than the PTA in cases of malingering/functional hearing loss.

Masking of Speech Reception Thresholds:

  • Masking should be applied to the non-test ear when the obtained SRT in one ear exceeds the apparent SRT or a pure tone BC threshold at 500, 1000, 2000, or 4000 Hz in the contralateral ear by 40 dB or more (a minimal check is sketched after this list).
  • The masker used should have a wide band spectrum (white, pink or speech noise) to effectively mask the speech stimuli.
  • The level of effective masking used should be sufficient to eliminate reception by the non-test ear without causing over masking and should be recorded on the same form as that used to record audiometric results.
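
A minimal sketch of the masking decision rule in the first bullet above, assuming supra-aural headphones (40 dB minimum interaural attenuation); the threshold values in the example are hypothetical:

```python
def srt_masking_needed(test_ear_srt_db_hl, nontest_bc_thresholds_db_hl,
                       interaural_attenuation_db=40):
    """Mask the non-test ear when the test-ear SRT exceeds the best (lowest)
    non-test-ear bone-conduction threshold at 500-4000 Hz by 40 dB or more."""
    best_bc = min(nontest_bc_thresholds_db_hl)
    return test_ear_srt_db_hl - best_bc >= interaural_attenuation_db


# Hypothetical: SRT 65 dB HL; non-test-ear BC thresholds 20, 25, 30, 35 dB HL
print(srt_masking_needed(65, [20, 25, 30, 35]))  # True (65 - 20 = 45 >= 40)
```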

Application of Speech Reception Thresholds:

Speech recognition measures have been used in every phase of audiology, such as

  • To describe the extent of hearing impairment in terms of how it affects speech understanding,
  • In the differential diagnosis of auditory disorders,
  • For determining the needs for amplification and other forms of audiologic rehabilitation,
  • For making comparisons between various hearing aids and amplification approaches,
  • For verifying the benefits of hearing aid use and other forms of audiologic rehabilitation, and
  • For monitoring patient performance over time for either diagnostic or rehabilitative purposes.

References:

⇒ https://www.ishaindia.org.in/pdf/Guidelines-Standard-Audiometric-Screening-Procedures.PDF
⇒ https://www.asha.org/PRPSpecificTopic.aspx?folderid=8589935335&section=Assessment#Speech_Audiometry
⇒ Gelfand, S. A. Essentials of Audiology.


Journey into the world of hearing

Speech audiometry

Authors: Benjamin Chaix, Rebecca Lewis. Contributors: Diane Lazard, Sam Irving.


Speech audiometry is routinely carried out in the clinic. It is complementary to pure tone audiometry, which only gives an indication of absolute perceptual thresholds of tonal sounds (peripheral function), whereas speech audiometry determines speech intelligibility and discrimination (between phonemes). It is of major importance during hearing aid fitting and for diagnosis of certain retrocochlear pathologies (tumour of the auditory nerve, auditory neuropathy, etc.) and tests both peripheral and central systems.

Speech audiogram

Normal hearing and hearing impaired subjects.

The speech recognition threshold (SRT) is the lowest level at which a person can identify a sound from a closed set list of disyllabic words.

The word recognition score (WRS) test requires a list of single-syllable words unknown to the patient to be presented at the speech recognition threshold + 30 dB HL. The number of correct words is scored out of the number of presented words to give the WRS. A score of 85-100% correct is considered normal when pure tone thresholds are normal (A), but it is common for WRS to decrease with increasing sensorineural hearing loss.

The curve 'B', on the other hand, indicates hypoacusis (a slight hearing impairment), and 'C' indicates a profound loss of speech intelligibility with distortion occurring at intensities greater than 80 dB HL.

It is important to distinguish between WRS, which gives an indication of speech comprehension, and SRT, which is the ability to distinguish phonemes.

Phonetic materials and testing conditions

Various tests can be carried out using lists of sentences, monosyllabic or dissyllabic words, or logatomes (words with no meaning, also known as pseudowords). Dissyllabic words require mental substitution (identification by context); the others do not.

A few examples

The test stimuli can be presented through headphones to test each ear separately, or in freefield in a sound attenuated booth to allow binaural hearing to be tested with and without hearing aids or cochlear implants. Test material is adapted to the individual's age and language ability.

What you need to remember

In the case of a conductive hearing loss:

  • the response curve has a normal 'S' shape, there is no deformation
  • there is a shift to the right compared to the reference (normal threshold)
  • there is an increase in the threshold of intelligibility

In the case of sensorineural hearing loss:

  • there is an increased intelligibility threshold
  • the curve can appear normal except in the higher intensity regions, where deformations indicate distortions

Phonetic testing is also carried out routinely in the clinic (especially in the case of rehabilitation after cochlear implantation). It is relatively long to carry out, but enables the evaluation of the real social and linguistic handicaps experienced by hearing impaired individuals. Cochlear deficits are tested using the “CNC (Consonant Nucleus Consonant) Test” (short words requiring little mental recruitment - errors are apparent on each phoneme and not over the complete word) and central deficits are tested with speech in noise tests, such as the “HINT (Hearing In Noise Test)” or “QuickSIN (Quick Speech In Noise)” tests, which are sentences carried out in noise.

Speech audiometry generally confirms pure tone audiometry results, and provides insight to the perceptual abilities of the individual. The intelligibility threshold is generally equivalent to the average of the intensity of frequencies 500, 1000 and 2000 Hz, determined by tonal audiometry (conversational frequencies). In the case of mismatch between the results of these tests, the diagnostic test used, equipment calibration or the reliability of the responses should be called into question.

Finally, remember that speech audiometry is a more sensitive indicator than pure tone audiometry in many cases, including rehabilitation after cochlear implantation.


How To Calculate Speech Recognition Threshold

  • Success Team
  • March 29, 2023


Are you looking to understand how to calculate speech recognition threshold? Speech recognition technology is becoming increasingly popular, and knowing how to calculate the threshold is an important part of its successful use. This article will explain what a speech recognition threshold is, how it is calculated, and the importance of understanding this metric.

What Is a Speech Recognition Threshold?

In order for a speech recognition system to accurately recognize spoken words, it must be able to distinguish between background noise and the intended words. The speech recognition threshold is the point at which the system can differentiate between speech and other noise.

How Is the Speech Recognition Threshold Calculated?

The speech recognition threshold is calculated by taking the ratio of the signal-to-noise ratio (SNR) and the signal-to-background noise ratio (SBR). The SNR is the ratio of the signal power to the noise power, while the SBR is the ratio of the signal power to the background noise power.

The higher the SNR and SBR, the higher the speech recognition threshold. This means that if the SNR and SBR are both high, the speech recognition system will be able to recognize more words.

What Is the Importance of Understanding Speech Recognition Threshold?

Understanding the speech recognition threshold is important for ensuring the accuracy of a speech recognition system. By calculating the SNR and SBR, and then taking their ratio, it is possible to determine the point at which the system will be able to accurately recognize spoken words.

This can be especially beneficial for businesses, as it can help them ensure their speech recognition systems are working accurately and efficiently. It can also help them understand how to optimize their systems for better performance.

Knowing how to calculate speech recognition threshold is an important part of using speech recognition technology. By understanding the SNR and SBR, and then taking their ratio, it is possible to determine the point at which a speech recognition system will be able to accurately recognize spoken words. This can help businesses ensure their speech recognition systems are working accurately and efficiently.


Acoustical Society of America

The interpretation of speech reception threshold data in normal-hearing and hearing-impaired listeners: Steady-state noise



Smits, C., & Festen, J. M. (2011). The interpretation of speech reception threshold data in normal-hearing and hearing-impaired listeners: Steady-state noise. The Journal of the Acoustical Society of America, 130(5), 2987–2998. https://doi.org/10.1121/1.3644909


Speech-in-noise measurements are important in clinical practice and have been the subject of research for a long time. The results of these measurements are often described in terms of the speech reception threshold (SRT) and SNR loss. Using the basic concepts that underlie several models of speech recognition in steady-state noise, the present study shows that these measures are ill-defined, most importantly because the slope of the speech recognition functions for hearing-impaired listeners always decreases with hearing loss. This slope can be determined from the slope of the normal-hearing speech recognition function when the SRT for the hearing-impaired listener is known. The SII function (i.e., the speech intelligibility index (SII) against SNR) is important and provides insights into many potential pitfalls when interpreting SRT data. Standardized SNR loss, sSNR loss, is introduced as a universal measure of hearing loss for speech in steady-state noise. Experimental data demonstrate that, unlike the SRT or SNR loss, sSNR loss is invariant to the target point chosen, the scoring method, or the type of speech material.


  • Open access
  • Published: 09 November 2020

Speech in noise perception improved by training fine auditory discrimination: far and applicable transfer of perceptual learning

Xiang Gao, Tingting Yan, Ting Huang, Xiaoli Li & Yu-Xuan Zhang

Scientific Reports, volume 10, Article number: 19320 (2020)


A longstanding focus of perceptual learning research is learning specificity, the difficulty for learning to transfer to tasks and situations beyond the training setting. Previous studies have focused on promoting transfer across stimuli, such as from one sound frequency to another. Here we examined whether learning could transfer across tasks, particularly from fine discrimination of sound features to speech perception in noise, one of the most frequently encountered perceptual challenges in real life. Separate groups of normal-hearing listeners were trained on auditory interaural level difference (ILD) discrimination, interaural time difference (ITD) discrimination, and fundamental frequency (F 0 ) discrimination with non-speech stimuli delivered through headphones. While ITD training led to no improvement, both ILD and F 0 training produced learning as well as transfer to speech-in-noise perception when noise differed from speech in the trained feature. These training benefits did not require similarity of task or stimuli between training and application settings, construing far and wide transfer. Thus, notwithstanding task specificity among basic perceptual skills such as discrimination of different sound features, auditory learning appears readily transferable between these skills and their “upstream” tasks utilizing them, providing an effective approach to improving performance in challenging situations or challenged populations.


Introduction

To extract target information from a competing and intervening background environment, such as speech perception in noise, is a major perceptual challenge that people encounter daily. Improving perception in such situations is of great interest in rehabilitative, professional, and educational settings. However, the benefit of perceptual learning is often bound to the training material and task for review, 1 , 2 . For example, training word recognition in noise with one word set failed to improve performance with another set 3 , and training discrimination of one sound feature did not transfer to discrimination of another feature even with the same sound 4 . The past decade has seen vigorous research and considerable progress on understanding and overcoming learning specificity 5 , 6 , 7 , 8 . To date, such research has primarily focused on stimulus specificity due to both practical and theoretical concerns. Theoretically, stimulus specificity of learning has often been linked to stimulus selectivity of neural responses along the sensory processing hierarchy to shed light onto learning loci 1 , 8 . Practically, stimulus specificity is the foremost limit of learning utility. As different perceptual tasks and their situations of application typically involve different stimuli, across-task transfer of learning appears, if not impossible, at least impractical before stimulus specificity can be resolved. Therefore, though nearly all basic perceptual skills can improve with training, whether training these skills can benefit real-life perceptual challenges such as speech recognition in noise has rarely been examined. Here, we propose and confirm that, notwithstanding learning specificity of and among the basic skills, such benefits can be attained.

We started with a simple assumption that any perceptual performance, may it be as simple as deciding if two pure tones are the same or as complicated as speech comprehension at a cocktail party, would depend on a hierarchical network of sensory, perceptual, cognitive, and affective processes, in which processes at similar levels of the hierarchy such as extraction of different stimulus features can function in parallel while those at different levels are serially organized 5 . According to this network view, specificity of learning reflects presence of parallel processing at the learning level: stimulus specificity arises when different stimuli are processed separately for the training task, and task specificity arises when the transfer task relies on a process parallel to the learned one at that level of processing. This account, while concurring with most current theories of learning in terms of stimulus specificity, has generated a contrasting prediction regarding task specificity: learning should transfer to other tasks that engage the trained component process. Supporting this prediction, we have shown that learning can transfer between perceptual (tone frequency discrimination) and cognitive (n-back) tasks 9 , which presumably share critical memory processes. A more direct test of the prediction is whether training basic perceptual skills should benefit “up-stream” tasks employing those skills, with the trained skills themselves serving as shared processes. Towards this end, we examined whether speech perception in noise, one of the most frequently encountered real-life perceptual challenges, could benefit from training fine discrimination of sound features useful for signal–noise separation with non-speech stimuli.

Among the sound features that can contribute to signal–noise separation, the most studied ones are cues to sound source location, primarily interaural time differences (ITDs) and interaural level differences (ILDs). These cues, among others, are used to reduce the masking effect of noise originating from sources spatially separate from the target sound, a phenomenon known as spatial release from masking, e.g., 10 , 11 though in the case of ILDs, it has been argued that the masking release could be to certain extent attributed to “better-ear listening”, i.e., listening to the ear with better signal-to-noise ratios 12 , 13 , 14 , 15 . Similarly, noise masking can be reduced with spectral or temporal cues. For example, fundamental frequency (F 0 ) is used to separate speech of different voices (speakers), attenuating speech-to-speech masking 16 . F 0 perception under some conditions, such as with “unresolved” tones that cannot be separated by auditory filters at the peripheral auditory system, relies on processing of “temporal fine structure” 17 , a temporal skill important for speech perception in noise with amplitude fluctuations 18 , 19 , 20 . Human discrimination of ILDs 21 , 22 , ITDs 23 , 24 , and F 0 25 has been demonstrated, to various extent, to improve with training. Thus, these cues were chosen to test whether learning of basic auditory skills can transfer, across task and stimulus differences, to speech perception in noise.

Methods and materials

Participants and equipment

A total of 83 young healthy adults (54 females, mean age 21.9 ± 2.5 years) participated in the experiment. They were recruited from Beijing Normal University campus, and gave written consent for participation. All of the participants had normal hearing (tone threshold ≤ 20 dB HL from 0.25 to 8 kHz at each ear) and no previous experiences with psychoacoustic studies. The experimental procedure was approved by the Beijing Normal University Research Committee. The study was carried out in accordance with relevant guidelines and regulations. All participants provided informed consent.

Testing and training were conducted in a double-walled sound attenuating booth using custom computer programs based on the Psychtoolbox for Matlab 26 , 27 . Auditory stimuli were digitally generated. The sampling rate was 192 kHz to increase time resolution close to 5 μs when interaural time difference (ITD) was manipulated and was 44.1 kHz otherwise. Speech stimuli were manipulated with Praat 28 for duration and pitch adjustments. Sounds were presented binaurally via circumaural headphones (Sennheiser HD-650).

Experimental design

The study consisted of three training experiments: two training spatial skills (interaural level difference, or ILD, and interaural time difference, or ITD, discrimination) and one training a spectral skill (fundamental frequency, or F0, discrimination). A pretest-training-posttest design was used for all experiments. Training involved repetitive practice on a single auditory task for approximately half an hour per day for six to seven consecutive days, excluding weekends. In the pre- and posttests, the trained group, together with an untrained control group, was tested on the training task as well as a speech-in-noise task. Details of the training procedure and testing tasks are described in the following sections.

Tasks and stimuli

Auditory discrimination tasks

For the three training tasks, ILD, ITD, and F0 discrimination, performance was measured and trained with a two- (ITD and ILD) or three- (F0) interval, forced-choice procedure and adaptive staircases. Each staircase consisted of 60 trials (a block), beginning with a lead-in phase in which the discrimination signal was increased after each incorrect response and decreased after each correct response. The point at which the direction of signal change switched from increasing to decreasing or from decreasing to increasing was denoted as a reversal. After the third reversal, the adaptive rule switched to 3-down-1-up (ITD and ILD) or 2-down-1-up (F0) to estimate discrimination threshold corresponding to 79% (ITD and ILD) or 71% (F0) correct performance on the psychometric function 29 . A visual feedback was provided after each response.
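
As a rough illustration of the adaptive rule described above (the authors used custom Psychtoolbox/Matlab programs; this Python sketch with a placeholder respond() callback is not their code, and the ITD and F0 tasks would additionally step on a logarithmic scale):

```python
def run_staircase_block(respond, start_signal, lead_in_step, main_step,
                        n_trials=60, n_down=3, n_lead_in_reversals=3):
    """Sketch of one 60-trial block: a 1-up/1-down lead-in until the third
    reversal, then an n_down-down/1-up rule (3 -> ~79% correct, 2 -> ~71%).
    respond(signal) -> True on a correct trial (placeholder for the listener).
    Threshold is estimated as the mean of the post-lead-in reversal signals."""
    signal = start_signal
    reversals = []
    last_direction = None
    correct_streak = 0

    for _ in range(n_trials):
        in_lead_in = len(reversals) < n_lead_in_reversals
        step = lead_in_step if in_lead_in else main_step

        if respond(signal):
            correct_streak += 1
            if not in_lead_in and correct_streak < n_down:
                continue                     # signal unchanged until n_down correct in a row
            direction = 'down'
            correct_streak = 0
            signal = max(signal - step, 0)   # make the task harder
        else:
            direction = 'up'
            correct_streak = 0
            signal += step                   # make the task easier

        if last_direction is not None and direction != last_direction:
            reversals.append(signal)         # direction change = a reversal
        last_direction = direction

    tail = reversals[n_lead_in_reversals:]
    return sum(tail) / len(tail) if tail else None
```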

Interaural level difference (ILD) discrimination

On each trial, two 300-ms (including 10-ms rise/fall raised cosine ramps) sounds differing only in ILD value were presented binaurally with a 500-ms silence gap in between. The sounds were Gaussian noise lowpass filtered at 1 kHz sinusoidally amplitude modulated at 8 Hz with no interaural time or phase differences. Amplitude modulation has been shown to enhance across-stimulus transfer of ILD discrimination learning 22 . The low-frequency region was chosen because it carries a greater share of speech energy and produced effective ILD learning in a pilot experiment. Listeners were instructed to report whether the second sound was to the left or right of the first sound by pressing the left or right arrow key on a computer keyboard. ILD difference between the two sounds (ΔILD) served as the discrimination signal. For each block, ΔILD started at 6 dB and was adaptively changed with a step size of 0.8 dB in the lead-in phase and 0.2 dB thereafter. The ILD value was fixed in one of the two sounds randomly selected at each trial, referred to as the standard ILD. ILD in the other sound was the standard ILD plus ΔILD. Each sound was presented at the left ear at 70 dB SPL minus 0.5 times the desired ILD, and at the right ear at 70 dB SPL plus 0.5 times the desired ILD.
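
A small sketch of the per-ear presentation levels described above (names are illustrative, not the authors' code):

```python
def ild_presentation_levels(ild_db, overall_level_db_spl=70.0):
    """Split a desired ILD symmetrically around the overall level:
    left ear at level - ILD/2, right ear at level + ILD/2 (positive ILD -> right)."""
    return overall_level_db_spl - 0.5 * ild_db, overall_level_db_spl + 0.5 * ild_db


print(ild_presentation_levels(6.0))  # (67.0, 73.0) dB SPL for a 6-dB ILD
```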

Participants were instructed to attend to the sound image inside their head and indicate the sound that was lateralized further to their right ear. Though discouraged, ILD discrimination could be performed by listening to sound level change at a single ear (level difference = ΔILD/2) while ignoring input from the other. Possible implications of this alternative strategy will be elaborated in Discussion.

ILD training consisted of 6 to 7 daily sessions, 12 blocks per session, of ILD discrimination with a standard ILD of 0 dB (perceived at around the midline of the head). During pre- and posttests, both the training condition and an untrained condition with a standard ILD of 6 dB were tested for 2 blocks per condition.

Interaural time difference (ITD) discrimination

On each trial, two 300-ms (including 10-ms rise/fall raised cosine ramps) 1-kHz lowpass Gaussian noise with a 500-ms inter-stimulus interval were presented binaurally at 70 dB SPL. The two sounds differed only in their ongoing ITD. This difference (ΔITD) served as discrimination signal. Task instruction was the same as the ILD task. Each sound was gated on and off simultaneously at the two ears. Ongoing ITDs were set by playing to the two ears two 300-ms sections of a slightly longer noise sample, the onsets of which were separated by the desirable ITD. Discrimination was conducted around a nominal standard ITD: At each trial, ITD in one sound was standard ITD plus 0.5 times ΔITD, and in the other was standard ITD minus 0.5 times ΔITD. The presentation order was randomized across trials. ΔITD started at 500 μs for each staircase, and was adaptively varied on the logarithmic scale 30 . The step size was 2 during the lead-in phase, and was 1.41 thereafter. Threshold estimation and subsequent analyses were also conducted on the logarithmic scale.

ITD training consisted of 7 daily sessions of 12 blocks with a nominal standard ITD of 0 μs. During pre- and posttests, both the training condition and an untrained condition with a nominal standard ITD of 150 μs were tested for 2 blocks per condition.

Fundamental frequency (F 0 ) discrimination

The F0 task was modified after two previous studies on F0 discrimination training 31 , 32 . Each trial consisted of three 200-ms harmonic complexes (with 10-ms rise/fall ramps) separated by 300-ms inter-stimulus intervals presented within a pink noise background that started 300 ms earlier and ended 300 ms later than the complex tones. Two of the complexes were identical (the standard), and the third, randomly selected at each trial, had a higher F0. The F0 difference (ΔF0) served as discrimination signal. Listeners were instructed to indicate which sound was different from the others by pressing a key on the keyboard. Each complex tone was generated by adding in sine (0°) phase the 5th to 27th harmonics of the desirable F0 and bandpass filtered the stimulus between the 10th and the 20th order of the standard F0 (e.g., between 2 to 4 kHz for a standard F0 of 200 Hz). Relatively high-order harmonics were used because compared to lower-order ones, they appeared to generate less specific learning 33 . The filter had a flat top and a slope of 80 dB/octave. The same filter was applied to all of the three complex tones at each trial. The background noise was intercepted from a 10-s pink noise generated offline with a 6 dB/octave slope and presented with an overall level of 55 dB SPL. The complex tones were presented at 65 dB SPL. Within each block, standard F0 was roved between 120 and 240 Hz, with the constraint that variation between consecutive trials should be between 5 and 30 Hz. Standard roving has been shown to enhance magnitude and transferability of frequency learning 9 , 34 . ΔF0 started at 50% and was adaptively adjusted on the logarithmic scale. Similar to the ITD task, the step size was 2 during the lead-in phase and was 1.41 thereafter. All subsequent calculations were also conducted on the logarithmic scale.

The roving condition was used for both training and testing. Training consisted of 7 daily sessions of 12 blocks, while 3 blocks were conducted in each of the pre- and posttests.

Speech perception in noise

Speech perception in noise was measured using word identification in the ILD and ITD training experiments and vowel identification in the F0 training experiment.

Word identification

At each trial, a monosyllable Chinese word spoken by a native male voice was presented within a noise masker. Different stimulus sets were used in the pre- and posttests. Each stimulus set was comprised of 16 syllables each with 4 variations in lexical tone, resulting in a one-interval, 64-alternative forced choice task. The choice options were displayed on the computer screen, with a 4 × 4 grid containing the Chinese spelling (Pinyin) of the 16 syllables flanked on the right by a 4 × 1 grid containing the digits (1 to 4) denoting the lexical tones. Listeners were instructed to indicate the perceived syllable and tone by mouse clicks. There was no trial-by-trial feedback, but overall performance in percent correct was visually displayed upon finishing a block. All of the speech tokens were presented at a constant level of 65 dB SPL, and at their originally recorded durations (340 to 780 ms long, 539 ms on average). The masker was Gaussian noise filtered to match the long-term spectrum of spoken Chinese characters, gated on and off simultaneously with the speech stimuli. Noise was presented at four signal-to-noise ratios (SNRs), − 12, − 9, − 6, and − 3 dB. The SNRs were determined based on a pilot study to cover the major portion of performance range in most listeners.

To examine the use of spatial skills, the task was conducted under two spatial configurations. While the target speech stimuli were always presented diotically (perceived approximately in the middle of the head), the noise masker was either co-located with or spatially separated from the target. In the co-located condition, the masker was also diotic, with both ILD and ITD set to zero. In the separated condition, the masker was lateralized to the right by ILD (6 dB, by increasing the sound level at the right ear and decreasing the sound level at the left ear by 3 dB) in the ILD training experiment and by ITD (150 μs) in the ITD training experiment.

Word identification was assessed in each of the pre- and posttests with 80 trials (20 trials per SNR mixed in random order) for each spatial condition. The order of spatial conditions was randomized across listeners but maintained for each listener through the tests. In case dramatic improvement resulted from testing and masked the effect of spatial training, in the ILD training experiment the word identification task was 'pre-trained' for six blocks before the pretest for all groups. Because improvement caused by such pre-training was moderate (3.9 ± 10% in identification accuracy), in the following ITD and F0 training experiments, no pre-training was provided for the speech task.

Vowel identification task

At each trial, a 350-ms monophthong Chinese vowel (the target) embedded in the middle of a 1000-ms clip of babble noise (the masker) was presented binaurally. Listeners were instructed to select the perceived vowel from a 2 × 3 grid labelling in Pinyin all of the Mandarin Chinese vowels (a, o, e, i, u, ü). The six vowels were presented equally frequently but in randomized order. All of the vowels were pronounced in tone 1. The level of target stimuli was fixed at 65 dB SPL, and the level of noise masker was varied to produce SNRs of − 13, − 9, − 5, − 1, and 3 dB.

This task tapped into the ability to take advantage of pitch-related spectral and temporal skills for hearing in noise. The noise masker was generated by mixing six sound tracks of 10-s random words spoken by six different male talkers. The multi-talker babble masker has been shown to produce greater masking for phoneme identification than steady-state noise and single-talker competing speech 35 , 36 . F 0 s of the six talkers were adjusted to distribute evenly between 87 and 161 Hz. The target vowels were spoken either by a male talker with an F 0 in the middle of those of the babble noise (124 Hz), or by a female talker with an F 0 10 semitones higher (229 Hz).

In each of the pre- and posttests, listeners completed three blocks of 60 trials, with SNRs and target talker conditions randomized. Before the test, listeners practiced another block of 60 trials, half with and half without the babble noise, to familiarize themselves with the task and the target talkers’ voices.

Auditory working memory (WM) task

A tone n-back task was used to assess and train auditory WM 9 . At each trial, a sequence of 40 + n pure tones was presented at the rate of 2.5 s/item. A tone matching the one presented n positions back was denoted as a target, and there were twelve targets randomly distributed in each sequence. Before and during each trial, n was displayed on the screen. Listeners were instructed to indicate a target by pressing a key and to make no response for non-targets. Visual feedback was provided after each response and upon finishing a sequence. All tones were 100-ms long (including 10-ms raised cosine ramps) and presented at 60 dB SPL. There were eight sets of eight tone frequencies selected from the range of 1080 to 4022 Hz, with neighboring frequencies in each set separated by at least one equivalent rectangular bandwidth (ERB) so that they were clearly distinguishable from each other. WM performance was indexed by d', calculated as Z(hit rate) − Z(false alarm rate), where Z is the inverse cumulative Gaussian distribution.
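
A minimal sketch of the d' computation above, using SciPy's inverse cumulative Gaussian (the hit and false-alarm rates in the example are hypothetical):

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """d' = Z(hit rate) - Z(false alarm rate), where Z is the inverse
    cumulative Gaussian. Rates of exactly 0 or 1 should be adjusted
    (e.g., by 1/(2N)) before calling to avoid infinite values."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)


print(d_prime(0.85, 0.15))  # ~2.07 for hypothetical rates
```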

WM training was used as active control for F0 training and similar to F0 training, consisted of 7 daily sessions of approximately half an hour of practice per session. To enable learning, WM training started with 2-back and switched to 3-back after three sessions 9 . Twelve sequences were completed in each training session and two sequences were completed in each of the pre- and posttests.

Training spatial skills

We first examined whether training spatial discrimination could improve speech perception in noise. Healthy young adults practiced on discrimination of one of the two sound localization cues, interaural level and time differences (ILDs and ITDs), for six to seven daily 35-min sessions. During training, the listeners were instructed to indicate direction of changes in perceived sound location (Fig.  1 A) caused by changes in either ILD (N = 10) or ITD (N = 12). Before and after training, the training groups, together with their respective no-training control groups (ILD-control: N = 10; ITD-control: N = 12), were tested on a Mandarin word-in-noise recognition task as well as the respective training task.

Figure 1. ILD discrimination training. (A) Illustration of the ILD discrimination task, shown in 3 trials. (B–D) Individual (grey lines) and group mean (filled symbols) ILD discrimination thresholds through training sessions (B) and between the pre- and post-training tests for the training condition (C) and an untrained condition with 6-dB standard ILD (D). Error bars in all figures stand for one S.E.M.

ILD training

ILD discrimination (Fig.  1 A) was trained with a noise low-passed at 1 kHz sinusoidally amplitude modulated (AM) at 8 Hz, with a standard location of 0-dB ILD (the midline). The low-pass AM noise was chosen because the low-frequency region contains most energy in speech stimuli and amplitude modulation has been shown to enhance transfer of ILD learning 22 .

ILD discrimination threshold decreased with training (Fig.  1 B; linear regression: F 1,5  = 10.88, p = 0.022, adjusted R 2  = 0.622). However, compared to the ILD-control group (N = 10), the ILD-train group did not improve more on the trained condition (Fig.  1 C; repeated measure ANOVA, group by test interaction: F 1, 18  = 0.03, p = 0.865, partial η 2  = 0.002; group effect: F 1, 18  = 0.027, p = 0.872, partial η 2  = 0.001; test effect: F 1, 18  = 1.31, p = 0.267, partial η 2  = 0.068). Instead, they improved more on an untrained condition of the training task, where the standard sound location was 6-dB instead of 0-dB ILD (Fig.  1 D; group by test interaction: F 1, 18  = 7.78, p = 0.012, partial η 2  = 0.302; group effect: F 1, 18  = 0.34, p = 0.856, partial η 2  = 0.002; test effect: F 1, 18  = 8.67, p = 0.009, partial η 2  = 0.325). Compared to the trained location, ILD discrimination threshold at the untrained location was significantly higher before training (rmANOVA, effect of condition: F 1, 9  = 5.63, p = 0.042, partial η 2  = 0.385), but not after (F 1, 9  = 0.006, p = 0.942, partial η 2  = 0.001).

Speech perception was measured by Mandarin word identification (Fig.  2 A) in collocated and spatially separated (by 6-dB ILD) long-term speech shaped noise. The task was pre-trained before the pretest to allow for rapid learning of the speech task, in case such learning should confound with the ILD training effect.

Figure 2. Effect of ILD discrimination training on speech perception in noise. (A) Illustration of the Mandarin word-in-noise task. (B,C) Mandarin word identification score (in % correct) across SNR levels for the ILD-train (B) and ILD-control (C) groups. (D) Speech reception threshold (SNR at 50% correct identification). (E) Pre-to-posttest gain in spatial release from masking (SRM). (F) Correlation between ILD learning at the 6-dB ILD condition and SRM gain at −12-dB SNR.

At the pretest, speech perception performance did not differ between groups (Fig.  2 B,C; rmANOVA, effect of group: F 1, 18  = 2.81, p = 0.111, partial η 2  = 0.135), but differed markedly between the collocated and spatially separated noise conditions (effect of condition: F 1, 18  = 63.00, p < 0.001, partial η 2  = 0.778), indicating that the 6-dB ILD difference successfully produced spatial release from masking (SRM). Between the pre- and posttests, word identification score (in percent correct) of the ILD-control group did not change for either spatially separated (Fig.  2 C; rmANOVA, effect of test: F 1, 9  = 0.987, p = 0.346, partial η 2  = 0.099) or collocated (F 1, 9  = 0.018, p = 0.896, partial η 2  = 0.002) noise configuration, indicating that word identification performance was successfully stabilized by the pre-training. Critically, the ILD-train group improved significantly for spatially separated (Fig.  2 B; rmANOVA, effect of test: F 1, 9  = 12.94, p = 0.006, partial η 2  = 0.590), but not for collocated (F 1, 9  = 0.05, p = 0.829, partial η 2  = 0.005) noise, consistent with our prediction that improved spatial perception transfers to separation of signal from noise.

Following convention, speech reception threshold (SRT) was calculated as the SNR that corresponds to the 50% point in the psychometric function fitted for each individual and each noise condition. SRT decreased more in the ILD-train than in the ILD-control group for spatially separated noise (Fig.  2 D; rmANOVA, group by test interaction: F 1, 15  = 13.02, p = 0.003, partial η 2  = 0.465; effect of test: F 1, 15  = 8.27, p = 0.012, partial η 2  = 0.355; effect of group: F 1, 15  = 0.19, p = 0.666, partial η 2  = 0.013). Post hoc comparisons revealed that SRT improved in the ILD-train group (p = 0.001), but not in the ILD-control group (p = 0.576), consistent with the pattern of raw identification score and our hypothesis.
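As an illustrative sketch of that convention (a simple logistic fit with SciPy; this is not the authors' code, and guessing and lapse parameters are omitted):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr_db, srt_db, slope):
    """Proportion correct as a logistic function of SNR; srt_db is the 50% point."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - srt_db)))

def fit_srt(snrs_db, proportion_correct):
    """Fit the logistic to per-SNR identification scores and return the SRT."""
    (srt_db, _slope), _ = curve_fit(logistic, snrs_db, proportion_correct,
                                    p0=[-7.0, 0.5])
    return srt_db

# Hypothetical scores at the four SNRs used in the word-in-noise task
print(fit_srt([-12, -9, -6, -3], [0.20, 0.45, 0.70, 0.90]))  # SRT in dB SNR
```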

We predicted that ILD training would improve speech-in-noise perception by enhancing signal–noise separation using ILD, i.e., ILD-based spatial release from masking (SRM). ILD-based SRM, calculated as SRT difference between spatially separated and collocated conditions (Fig.  2 D), was enhanced by ILD training (rmANOVA, group by test interaction: F 1, 16  = 4.69, p = 0.046, partial η 2  = 0.227). When calculated as increase in identification score brought about by spatial separation of noise across all SNRs, SRM showed only a trend of ILD-training induced improvement (rmANOVA, group by test interaction: F 1, 54  = 3.93, p = 0.063, partial η 2  = 0.179). Between group comparisons revealed that the SRM gain took place primarily at the lower SNRs (Fig.  2 E; one-way ANOVA, SNR of − 12 dB: F 1, 18  = 7.97, p = 0.011, partial η 2  = 0.307; p > 0.03 for SNR of − 9 dB and p > 0.1 for higher SNRs; alpha was set at 0.013 for correction of multiple comparisons). Moreover, the SRM improvement at SNR of − 12 dB correlated positively with ILD learning at the 6-dB standard location (Fig.  2 F; r = 0.519, p = 0.019).

ITD training

ITD discrimination was trained with a 1-kHz low-pass noise around a standard location of 0-μs ITD (the midline). Unlike ILD training, ITD discrimination threshold did not improve with training (Fig. 3A; rmANOVA, effect of session: F(6,66) = 1.74, p = 0.125, partial η² = 0.137; linear regression: F(1,5) = 0.05, p = 0.832, adjusted R² = 0.01). Between the pre- and posttests, the ITD-train group also performed similarly to untrained controls at both the trained location (Fig. 3B; rmANOVA, group effect: F(1,22) = 0.54, p = 0.471, partial η² = 0.024; test effect: F(1,22) = 1.61, p = 0.22, partial η² = 0.068; group by test interaction: F(1,22) = 3.47, p = 0.076, partial η² = 0.136) and an untrained location of 150-μs ITD (Fig. 3C; group effect: F(1,22) = 0.08, p = 0.778, partial η² = 0.004; test effect: F(1,22) = 2.72, p = 0.114, partial η² = 0.11; group by test interaction: F(1,22) = 0.64, p = 0.432, partial η² = 0.028). The lack of training-induced learning in ITD discrimination was consistent with previous reports 21,37.
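
A stimulus of the kind described here (a 1-kHz low-pass noise carrying a small ITD) can be sketched as follows; the sample rate, duration, filter order, and the specific 150-μs delay are assumptions, not parameters taken from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 44100
dur = 0.3
itd = 150e-6  # seconds; here the left channel is delayed relative to the right

noise = np.random.randn(int(fs * dur))
b, a = butter(4, 1000 / (fs / 2))          # 1-kHz low-pass filter
noise = filtfilt(b, a, noise)

# Impose a fractional-sample delay via a frequency-domain phase shift.
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(noise.size, d=1 / fs)
delayed = np.fft.irfft(spec * np.exp(-2j * np.pi * freqs * itd), n=noise.size)

stereo = np.column_stack([delayed, noise])  # left ear delayed by `itd`
```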

Figure 3. ITD discrimination performance through training. Individual (grey lines) and group mean (filled symbols) ITD discrimination thresholds are plotted through training sessions (A) and between the pre- and post-training tests for the training condition (B) and an untrained location of 150-μs standard ITD (C).

Speech perception in noise was measured with the same task as in the ILD training experiment, except that ITD instead of ILD was varied to lateralize the noise in the spatially separated condition. Also, because the pre-training session in the ILD training experiment had produced only a moderate learning effect on the speech task (an increase of 3.9 ± 10% in identification score), the pre-training session was omitted in the ITD training experiment.
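
The manipulation described here, lateralizing only the masking noise while the target remains diotic, can be sketched as below. The helper function, the 6-dB ILD split, the 300-μs ITD value, and the omission of level/SNR scaling are all illustrative assumptions.

```python
import numpy as np

def lateralize_noise(noise, fs, cue='itd', ild_db=6.0, itd_s=300e-6):
    """Return a stereo (2-column) version of `noise` carrying an ILD or an ITD."""
    if cue == 'ild':
        # Split the level difference symmetrically across ears (+/- ild_db/2).
        g = 10 ** (ild_db / 40.0)          # half the ILD, expressed as amplitude gain
        return np.column_stack([noise * g, noise / g])
    # Otherwise delay one ear (integer-sample approximation for brevity).
    d = int(round(itd_s * fs))
    return np.column_stack([np.r_[np.zeros(d), noise[:-d]], noise])

# Usage sketch (word and noise are hypothetical 1-D signals at sample rate fs):
# mixture = np.column_stack([word, word]) + lateralize_noise(noise, fs, cue='itd')
```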

ITD training failed to impact speech perception in noise. Before training, similar to the ILD experiment, an ITD-based SRM was observed in both groups (Fig. 4A,B; rmANOVA, effect of condition: F(1,22) = 17.49, p < 0.001, partial η² = 0.443; effect of group: F(1,22) = 1.195, p = 0.286, partial η² = 0.052; group by condition interaction: F(1,22) = 0.02, p = 0.899, partial η² = 0.001). Between the pre- and posttests, the ITD-control (Fig. 4A) and the ITD-train (Fig. 4B) groups improved similarly in identification score (rmANOVA, effect of test: F(1,22) = 20.59, p < 0.001, partial η² = 0.483; effect of group: F(1,22) = 1.24, p = 0.278, partial η² = 0.053; group by test interaction: F(1,22) = 0.28, p = 0.604, partial η² = 0.012; group by test by condition interaction: F(1,22) = 0.05, p = 0.824, partial η² = 0.002). Speech reception threshold (SRT) also showed similar pre-to-posttest improvements (Fig. 4C) between the two groups and the two spatial configurations (rmANOVA, effect of test: F(1,16) = 10.41, p = 0.05, partial η² = 0.394; effect of group: F(1,16) = 0.33, p = 0.577, partial η² = 0.02; all interaction effects: p > 0.4), indicating a nonspecific test–retest effect. Finally, ITD-based SRM (Fig. 4D) did not improve with training (rmANOVA, group effect: F(1,22) = 0.05, p = 0.824, partial η² = 0.002; SNR effect: F(3,66) = 0.82, p = 0.488, partial η² = 0.036; group by SNR interaction: F(3,66) = 0.379, p = 0.768, partial η² = 0.017).

Figure 4. Effect of ITD discrimination training on speech perception in noise. (A,B) Mandarin word identification score (in % correct) across SNR levels for the ITD-train (A) and ITD-control (B) groups. (C) Speech reception threshold (SNR at 50% correct identification) changes between pre- and post-training tests. (D) Spatial release from masking (SRM) gain in identification score.

Training spectral skills

In the second study, we trained a new group of listeners on F0 discrimination (N = 13; Fig. 5A) with high-order (10th to 20th) harmonic tones. To promote transferable learning, the standard F0 was roved between 120 and 240 Hz, approximately the range of the human voice. According to our hypothesis and a previous study 9, roving the standard frequency during frequency discrimination training should engage constant updating of frequency representations in working memory (WM), leading to WM improvement. To control for a possible effect of working-memory learning, we trained a separate group (N = 13) on Tone n-back, an auditory WM task. Before and after training, the F0-train and WM-train groups, as well as an untrained F0-control group (N = 13), were tested on the F0 task, the WM task, and speech perception in noise.
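
A harmonic complex of the kind described here (harmonics 10–20 of a standard F0 roved between 120 and 240 Hz) can be generated as in the sketch below. Duration, sample rate, phases, the comparison step size, and the amplitude scaling are assumptions.

```python
import numpy as np

def harmonic_complex(f0, fs=44100, dur=0.3, lo=10, hi=20):
    """Sum of equal-amplitude sine harmonics lo..hi of fundamental f0."""
    t = np.arange(int(fs * dur)) / fs
    tone = sum(np.sin(2 * np.pi * f0 * h * t) for h in range(lo, hi + 1))
    return tone / (hi - lo + 1)

rng = np.random.default_rng()
standard_f0 = rng.uniform(120.0, 240.0)       # roved standard F0 for this trial
comparison_f0 = standard_f0 * 1.05            # hypothetical 5% F0 increment
standard = harmonic_complex(standard_f0)
comparison = harmonic_complex(comparison_f0)
```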

Figure 5. F0 discrimination training. (A) Illustration of the F0 discrimination task (for 2 consecutive trials). (B,C) Individual (grey lines) and group mean (filled symbols) F0 discrimination thresholds through training sessions (B) and between the pre- and post-training tests (C).

F0 discrimination threshold decreased over the seven training sessions (Fig. 5B; linear regression: F(1,5) = 60.57, p = 0.001, adjusted R² = 0.909). Consistently, between the pre- and posttests, the F0-train group improved more than the WM-train and F0-control groups (Fig. 5C; rmANOVA, group by test interaction: F(2,36) = 17.13, p < 0.001, partial η² = 0.488; effect of test: F(1,36) = 84.92, p < 0.001, partial η² = 0.702; effect of group: F(2,36) = 3.98, p = 0.028, partial η² = 0.181). Interestingly, F0 discrimination did not differ between the WM-train and F0-control groups (group by test interaction: F(1,24) = 0.06, p = 0.806, partial η² = 0.003), indicating that F0 perception, unlike pure-tone pitch perception 9, did not benefit from WM training.
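
The session-by-session regression reported above is a straight-line fit of threshold against training session; a minimal sketch follows. The threshold values are hypothetical, since the paper reports only the fit statistics, and scipy's linregress returns r² rather than the adjusted R² quoted in the text.

```python
import numpy as np
from scipy.stats import linregress

sessions = np.arange(1, 8)                                       # seven training sessions
mean_threshold = np.array([6.0, 4.8, 4.1, 3.5, 3.1, 2.8, 2.6])   # hypothetical thresholds

fit = linregress(sessions, mean_threshold)
print(f"slope = {fit.slope:.2f}, r^2 = {fit.rvalue**2:.2f}, p = {fit.pvalue:.4f}")
```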

Speech perception in noise was measured by identification of Mandarin vowels spoken by a target speaker in babble noise consisting of the mixed speech of six different speakers (Fig. 6A). All masker voices were male, with F0s equally distributed between 78 and 161 Hz. The target speaker was either a male with an F0 equal to the mean of the six masker F0s (the embedded condition) or a female with an F0 well above the masker F0 range (the spectrally separated condition). Before training, all groups performed better in the spectrally separated than in the embedded condition (Fig. 6B–D; rmANOVA, effect of condition: F(1,36) = 116.91, p < 0.001, partial η² = 0.765; effect of group: F(2,36) = 0.005, p = 0.995, partial η² < 0.002; group by condition interaction: F(2,36) = 0.66, p = 0.525, partial η² = 0.035), demonstrating pitch-based masking release. Between the pre- and posttests, the three groups improved equally in the embedded condition (rmANOVA, test effect: F(1,36) = 40.94, p < 0.001, partial η² = 0.532; group effect: F(2,36) = 1.98, p = 0.153, partial η² = 0.099; group by test interaction: F(2,36) = 1.10, p = 0.343, partial η² = 0.058), indicating test–retest learning. The spectrally separated condition, however, showed different amounts of learning across groups (group by test interaction: F(2,36) = 4.62, p = 0.016, partial η² = 0.204; test effect: F(1,36) = 12.92, p = 0.001, partial η² = 0.264; group effect: F(2,36) = 0.50, p = 0.612, partial η² = 0.027). Between-group comparisons revealed that the F0-train group (group by test interaction: F(1,24) = 7.90, p = 0.010, partial η² = 0.248), but not the WM-train group (group by test interaction: F(1,24) = 2.34, p = 0.139, partial η² = 0.089), improved more than the F0-control group. The group differences were also evident in the speech reception threshold (Fig. 6E), which improved equivalently among the three groups in the embedded condition (rmANOVA, group by test interaction: F(2,31) = 0.564, p = 0.575, partial η² = 0.035; test effect: F(1,31) = 20.04, p < 0.001, partial η² = 0.393; group effect: F(2,31) = 1.23, p = 0.306, partial η² = 0.074), but improved more in the F0-train group than in the other two groups in the spectrally separated condition (group by test interaction: F(2,28) = 5.976, p = 0.007, partial η² = 0.299; test effect: F(1,28) = 5.04, p = 0.033, partial η² = 0.153; group effect: F(1,28) = 0.737, p = 0.487, partial η² = 0.050). A closer examination of performance change in the spectrally separated condition (Fig. 6F) revealed that the additional learning of the F0-train group occurred primarily at the mid-SNR level (rmANOVA, group by SNR interaction: F(8,144) = 2.348, p = 0.021, partial η² = 0.115; post hoc group comparison, Sidak: p < 0.001 at −5-dB SNR, p > 0.6 at other SNRs). Moreover, performance improvement at this SNR level correlated positively with F0 discrimination learning (Fig. 6G; r = 0.49, p = 0.002).
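
The F0 layout of target and maskers described at the start of this paragraph reduces to simple arithmetic; the sketch below makes it explicit, with the assumption that "equally distributed" means linearly spaced F0s.

```python
import numpy as np

masker_f0s = np.linspace(78.0, 161.0, 6)   # 78, 94.6, 111.2, 127.8, 144.4, 161 Hz
embedded_target_f0 = masker_f0s.mean()     # 119.5 Hz, inside the masker F0 range
print(masker_f0s, embedded_target_f0)
# In the spectrally separated condition, the female target's F0 lies well above
# 161 Hz, i.e., outside the masker F0 range.
```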

Figure 6. Effect of F0 discrimination training on speech perception in noise. (A) Illustration of the F0 relationship of the target and masking speech in the vowel identification task. (B–D) Vowel identification score (in % correct) across SNR levels for the F0-train (B), WM-train (C) and F0-control (D) groups. (E) Speech reception threshold (SNR at 50% correct identification). (F) Pre-to-posttest changes of identification score in the spectrally separated condition. (G) Correlation between vowel identification improvement at the mid-SNR level (−5 dB) and F0 discrimination learning.

Discussion

The current results demonstrate that training basic auditory perception, namely discrimination of fine spatial or spectral differences in simple non-speech sounds, can improve speech recognition in noise. Because the trained spatial and spectral cues are used to separate signal from noise, the results support our hypothesis that learning transfers between tasks whose processes lie at different levels of information processing. This hypothesis challenges the current view of learning specificity to the training task, suggesting that learning transfer between tasks is widespread. To emphasize that such between-task transfer is contingent on the relation between the tasks, we refer to this hypothesis as the principle of vertical transfer. In the perceptual learning literature, the current results would constitute "far transfer", as the training and transfer tasks differed categorically in task demand and stimulus type. However, the transfer was not boundless, but displayed a number of limitations or specificities. First, improvement of discrimination performance appears to be a prerequisite for transfer. With a similar amount and method of training, ILD discrimination improved and transferred to speech-in-noise perception, whereas ITD discrimination did not. Although ITD discrimination has been shown to improve with training under some circumstances 23,24, the lack of a training effect comparable to that for ILD is consistent with previous reports 21,37. In this sense, ITD discrimination training can serve as an active control for ILD discrimination training, indicating that the time, exposure, and effort involved in training were by themselves insufficient to produce the far transfer, and that learning of the trained task was necessary. Second, speech perception improved only when the noise was separable from the target using the trained spatial or spectral cue, indicating that discrimination learning specifically improved the ability to release masking by noise, not speech processing per se. Further, in the separated conditions, transfer was significant only at middle to low signal-to-noise ratios, consistent with the fact that the performance benefit of noise separation depends on the nature and amount of noise masking 38. Third, training and learning of auditory working memory did not transfer to speech-in-noise perception (Fig. 6), despite the critical role suggested for working memory in speech recognition 39. This is probably due to the use of vowel identification as the target task, which involved only isolated monosyllables, making it unlikely for working memory to become a performance-limiting factor. Taken together, the far transfer from fine discrimination of sound features to speech-in-noise perception by no means overturns the specificities that have long been observed for perceptual learning, but rather coexists with them. Indeed, the coexisting specificities support our hypothesis by demonstrating that between-task transfer occurs only when the proposed contingency is met.

Under the current experimental design, the exact nature of the learned skills, and hence the specific mechanisms of their contribution to speech perception in noise, cannot be determined. The training conditions were designed on the basis of previous learning studies 22,34 to promote the likelihood of learning and across-stimulus transfer, with little effort to rule out multiple learning mechanisms. For ILD training, ILD was applied to sounds presented through headphones by increasing the sound level at one ear and decreasing it at the other. Although instructed to indicate a change in the lateralized sound image, a listener could perform the discrimination task by listening to the level change at one ear only, and could acquire the spatial release for speech-in-noise perception by listening to the ear with the better signal-to-noise ratio, namely "better-ear listening" 12,13,14,15. Spatial separation by an ILD of 6 dB would yield a 3-dB better-ear advantage. The observed spatial release, when calculated in SRT (Fig. 2C), was 2.4 dB before and 3.8 dB after training, not much beyond the expected better-ear advantage. As an alternative to improving ILD discrimination, ILD training might have improved monaural level discrimination while input from the other ear was ignored, which could have transferred to speech-in-noise perception by improving better-ear listening. Thus, the ILD training benefits could be binaural, monaural, or a combination of the two in nature. For F0 training, the use of relatively high-order harmonics (10th to 20th order for F0s of 120 to 240 Hz) may promote the use of "temporal fine structure" 17, a skill considered by some researchers to be important for speech perception in noise with amplitude fluctuations because it allows "temporal glimpsing" 18,19,20. On the other hand, in the case of two competing speech stimuli, the contribution of high-order harmonics to masking release is much smaller than that of low-order ones, particularly for small F0 differences 40,41. Further, the use of multi-talker babble noise, a highly effective masker for phoneme stimuli 35,36, discourages speech segregation mechanisms that rely on the masker's harmonicity, such as the harmonic cancellation model 42,43, but leaves intact other mechanisms such as spectral glimpsing 44,45. Indeed, it has been suggested that F0-difference-based separation of speech from masker involves a combination of temporal and spectral mechanisms 46 and that the pattern and mechanism of masking release depend on the nature of the masker 47. Thus, the current F0 training benefits could be spectral, temporal, or a combination of the two in mechanism. For both training experiments, because all candidate skills for learning are also skills that contribute to masking release, the uncertainty in learning and transfer mechanisms bears little consequence for our proposal and examination of between-task learning transfer.
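
The better-ear arithmetic invoked in this paragraph can be made explicit: if a 6-dB ILD is applied to the noise as a symmetric ±3-dB level change while the target is unchanged, the SNR at the ear with the attenuated noise improves by 3 dB relative to the collocated case. The absolute levels below are arbitrary placeholders.

```python
# Collocated reference: identical speech and noise levels at both ears (SNR = 0 dB).
speech_db = 65.0
noise_db = 65.0

# 6-dB ILD on the noise, split symmetrically across ears.
noise_left = noise_db + 3.0
noise_right = noise_db - 3.0

snr_better_ear = speech_db - noise_right          # SNR at the favorable (right) ear
print(snr_better_ear - (speech_db - noise_db))    # 3.0 dB better-ear advantage
```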

While the exact mechanisms of the observed far transfers remain to be specified, there is a straightforward functional link between improved perceptual acuity and reduced noise masking. When the discrimination threshold for a sound feature (namely ILD or F0) decreases with training, the perceived distance of a fixed amount of variation in that feature, or in its associated cues, increases correspondingly, causing greater separation of signal and noise along that perceptual dimension. This idea is supported by the correlations between the threshold decrease on the trained discrimination tasks and the speech intelligibility increase on the speech-in-noise tasks (Figs. 2F, 6G). Alternatively, discrimination training could have led to cognitive changes, such as improved attention control or working memory for better-ear listening in the presence of an ILD difference, or for temporal/spectral glimpsing in the presence of an F0 difference, thereby enhancing the utility of that feature in separating noise from signal. The cognitive view, though tempting because it easily accounts for far transfers, is not compatible with the aforementioned specificities that coexist with learning transfer, particularly the lack of transfer from working-memory training.

The observed "vertical" transfer between tasks of different levels of complexity and neural processing may be the rule rather than an oddity of perceptual learning. Most reports of task specificity have examined transfer between tasks of similar levels, such as feature discrimination along different stimulus dimensions (for review, see 1,5,6,7). The critical skills trained with such tasks can be deemed "parallel", in that they involve information at similar levels of the perceptual processing hierarchy that could be computed separately from and independently of each other. In the few cases where non-parallel tasks have been examined (e.g., 48,49), across-task transfer has indeed been reported, with the transfer pattern matching the relation between the tasks in question. For example, training on an asynchrony task (whether two tones ended at the same time) transferred to an order task (which tone ended earlier), but not vice versa 48, which was interpreted as the training of the two tasks affecting "asymmetric" neural circuits. In another case 49, learning was reported to transfer between a visual alignment task (whether three elements were aligned) and a bisection task (whether three elements were equally spaced), which was accounted for by the two tasks sharing the same skill (positional judgement along the same spatial axis). Together with our previous report of learning transfer between auditory frequency discrimination and working memory 9 and the current data, a pattern emerges: learning transfers readily between non-parallel tasks that share component processes or that contribute to each other. That is, perceptual learning is intrinsically capable of "far", across-task transfer despite its specificity for stimulus and task variation at "near" range. While most preceding theories of perceptual learning account for learning specificity or transfer in terms of the locus 50 and/or mechanism 51,52,53,54 of neural modification, the principle of vertical transfer assumes that auditory performance in most situations involves a shared hierarchical network of sensory, perceptual, affective and cognitive processes, organized in parallel within a level and serially across levels, and accounts for learning specificity or transfer in terms of the relationship between the trained process and the processing network of the transfer task 5. For example, in light of the "learning loci" theories, the current results would be interpreted as learning taking place somewhere "high" along, or even beyond, the perceptual processing hierarchy, where neurons respond broadly to different stimuli and task demands. In contrast, according to the principle of vertical transfer, learning could take place at a relatively low level of sensory processing, befitting the trained sound feature, yet be transferable to "upstream", more complicated tasks because performance of such tasks engages the low-level sensory processes. The proposed principle of learning is in line with the multiplexing theory of the auditory system 55, as well as with a wealth of evidence for rapid, goal-oriented plasticity of auditory cortices that allows the same neurons to subserve multiple tasks 56,57.

On the practical side, the principle of vertical transfer supports broad and effective applications of perceptual learning. Much effort has long been devoted to overcoming learning specificities so that perception in challenging environments, or in challenged populations, can benefit from perceptual training 8,58. Novel training regimens 4,59,60 have been designed, and recreational video games have been exploited 61,62, to boost learning and its transferability. The current results indicate that "vertical", across-task transfers, which are far relative to the aims of most previous endeavors, may have been available all along. Given this principle, an effective way to improve real-life perceptual performance would be to train "the shared ground", i.e., the basic skills most widely involved in the target situations of application. The current study, demonstrating that speech perception in noise can benefit from discrimination training on different sound features, provides a first successful example of such an application.

References

1. Wright, B. A. & Zhang, Y. A review of the generalization of auditory learning. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 301–311 (2009).
2. Sagi, D. Perceptual learning in vision research. Vision Res. 51, 1552–1566 (2011).
3. Burk, M. H. & Humes, L. E. Effects of long-term training on aided speech-recognition performance in noise in older adults. J. Speech Lang. Hear. Res. 51, 759–771 (2008).
4. Wright, B. A., Sabin, A. T., Zhang, Y., Marrone, N. & Fitzgerald, M. B. Enhancing perceptual learning by combining practice with periods of additional sensory stimulation. J. Neurosci. 30, 12868–12877 (2010).
5. Amitay, S., Zhang, Y. X., Jones, P. R. & Moore, D. R. Perceptual learning: Top to bottom. Vision Res. 99, 69–77 (2014).
6. Kawato, M. et al. Perceptual learning – the past, present and future. Vision Res. 99, 1–4 (2014).
7. Irvine, D. R. F. Auditory perceptual learning and changes in the conceptualization of auditory cortex. Hear. Res. (2018).
8. Li, W. Perceptual learning: Use-dependent cortical plasticity. Annu. Rev. Vis. Sci. 2, 109–130 (2016).
9. Zhang, Y. X. et al. Auditory discrimination learning: Role of working memory. PLoS ONE 11, e0147320 (2016).
10. Culling, J. F., Hawley, M. L. & Litovsky, R. Y. The role of head-induced interaural time and level differences in the speech reception threshold for multiple interfering sound sources. J. Acoust. Soc. Am. 116, 1057–1065 (2004).
11. Gallun, F. J., Mason, C. R. & Kidd, G. Jr. Binaural release from informational masking in a speech identification task. J. Acoust. Soc. Am. 118, 1614–1625 (2005).
12. Edmonds, B. A. & Culling, J. F. The spatial unmasking of speech: Evidence for better-ear listening. J. Acoust. Soc. Am. 120, 1539–1545 (2006).
13. Glyde, H. et al. The effect of better-ear glimpsing on spatial release from masking. J. Acoust. Soc. Am. 134, 2937–2945 (2013).
14. Zurek, P. M. A predictive model for binaural advantages in speech intelligibility. J. Acoust. Soc. Am. 71 (1983).
15. Hawley, M. L., Litovsky, R. Y. & Culling, J. F. The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer. J. Acoust. Soc. Am. 115, 833–843 (2004).
16. Brown, C. A. & Bacon, S. P. Fundamental frequency and speech intelligibility in background noise. Hear. Res. 266, 52–59 (2010).
17. Moore, B. C., Hopkins, K. & Cuthbertson, S. Discrimination of complex tones with unresolved components using temporal fine structure information. J. Acoust. Soc. Am. 125, 3214–3222 (2009).
18. Hopkins, K., Moore, B. C. & Stone, M. A. Effects of moderate cochlear hearing loss on the ability to benefit from temporal fine structure information in speech. J. Acoust. Soc. Am. 123, 1140–1153 (2008).
19. Moon, I. J. et al. Optimal combination of neural temporal envelope and fine structure cues to explain speech identification in background noise. J. Neurosci. 34, 12145–12154 (2014).
20. Lorenzi, C., Gilbert, G., Carn, H., Garnier, S. & Moore, B. C. Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proc. Natl. Acad. Sci. U.S.A. 103, 18866–18869 (2006). https://doi.org/10.1073/pnas.0607364103
21. Wright, B. A. & Fitzgerald, M. B. Different patterns of human discrimination learning for two interaural cues to sound-source location. Proc. Natl. Acad. Sci. U.S.A. 98, 12307–12312 (2001).
22. Zhang, Y. & Wright, B. A. An influence of amplitude modulation on interaural level difference processing suggested by learning patterns of human adults. J. Acoust. Soc. Am. 126, 1349–1358 (2009).
23. Ortiz, J. A. & Wright, B. A. Differential rates of consolidation of conceptual and stimulus learning following training on an auditory skill. Exp. Brain Res. 201, 441–451 (2010).
24. Rowan, D. & Lutman, M. E. Learning to discriminate interaural time differences at low and high frequencies. Int. J. Audiol. 46, 585–594 (2007).
25. Miyazono, H., Glasberg, B. R. & Moore, B. C. Perceptual learning of fundamental frequency discrimination: Effects of fundamental frequency, harmonic number, and component phase. J. Acoust. Soc. Am. 128, 3649–3657 (2010).
26. Brainard, D. H. The Psychophysics Toolbox. Spat. Vis. 10, 433–436 (1997).
27. Kleiner, M., Brainard, D. & Pelli, D. What's new in Psychtoolbox-3? Perception 36, 1 (2007).
28. Praat: Doing Phonetics by Computer [Computer Program], version 6.0.17, retrieved April 24, 2016 from https://www.praat.org/ (2016).
29. Levitt, H. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 49 (Suppl. 2), 467+ (1971).
30. Saberi, K. Some considerations on the use of adaptive methods for estimating interaural-delay thresholds. J. Acoust. Soc. Am. 98, 1803–1806 (1995).
31. Carcagno, S. & Plack, C. J. Subcortical plasticity following perceptual learning in a pitch discrimination task. J. Assoc. Res. Otolaryngol. 12, 89–100 (2011).
32. Grimault, N., Micheyl, C., Carlyon, R. P. & Collet, L. Evidence for two pitch encoding mechanisms using a selective auditory training paradigm. Percept. Psychophys. 64, 189–197 (2002).
33. Carcagno, S. & Plack, C. J. Pitch discrimination learning: Specificity for pitch and harmonic resolvability, and electrophysiological correlates. J. Assoc. Res. Otolaryngol. 12, 503–517 (2011).
34. Amitay, S., Hawkey, D. J. & Moore, D. R. Auditory frequency discrimination learning is affected by stimulus variability. Percept. Psychophys. 67, 691–698 (2005).
35. Garcia Lecumberri, M. L. & Cooke, M. Effect of masker type on native and non-native consonant perception in noise. J. Acoust. Soc. Am. 119 (2006).
36. Simpson, S. A. & Cooke, M. Consonant identification in N-talker babble is a nonmonotonic function of N. J. Acoust. Soc. Am. 118, 2775–2778 (2005).
37. Zhang, Y. & Wright, B. A. Similar patterns of learning and performance variability for human discrimination of interaural time differences at high and low frequencies. J. Acoust. Soc. Am. 121, 2207–2216 (2007).
38. Bronkhorst, A. W. The cocktail-party problem revisited: Early processing and selection of multi-talker speech. Atten. Percept. Psychophys. 77, 1465–1487 (2015).
39. Rudner, M., Davidsson, L. & Ronnberg, J. Effects of age on the temporal organization of working memory in deaf signers. Neuropsychol. Dev. Cogn. B Aging Neuropsychol. Cogn. 17, 360–383 (2010).
40. Culling, J. F. & Darwin, C. J. Perceptual separation of simultaneous vowels: Within and across-formant grouping by F0. J. Acoust. Soc. Am. 93, 3454–3467 (1993).
41. Oxenham, A. J. & Simonson, A. M. Masking release for low- and high-pass-filtered speech in the presence of noise and single-talker interference. J. Acoust. Soc. Am. 125, 457–468 (2009).
42. de Cheveigne, A. Cancellation model of pitch perception. J. Acoust. Soc. Am. 103, 1261–1271 (1998).
43. Deroche, M. L. & Culling, J. F. Voice segregation by difference in fundamental frequency: Evidence for harmonic cancellation. J. Acoust. Soc. Am. 130, 2855–2865 (2011).
44. Deroche, M. L., Culling, J. F., Chatterjee, M. & Limb, C. J. Roles of the target and masker fundamental frequencies in voice segregation. J. Acoust. Soc. Am. 136, 1225 (2014).
45. Guest, D. R. & Oxenham, A. J. The role of pitch and harmonic cancellation when listening to speech in harmonic background sounds. J. Acoust. Soc. Am. 145, 3011 (2019).
46. Assmann, P. F. & Summerfield, Q. Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies. J. Acoust. Soc. Am. 88, 680–697 (1990).
47. Deroche, M. L. & Culling, J. F. Voice segregation by difference in fundamental frequency: Effect of masker type. J. Acoust. Soc. Am. 134, EL465–EL470 (2013).
48. Mossbridge, J. A., Scissors, B. N. & Wright, B. A. Learning and generalization on asynchrony and order tasks at sound offset: Implications for underlying neural circuitry. Learn. Mem. 15, 13–20 (2008).
49. Webb, B. S., Roach, N. W. & McGraw, P. V. Perceptual learning in the absence of task or stimulus specificity. PLoS ONE 2, e1323 (2007).
50. Ahissar, M. & Hochstein, S. The reverse hierarchy theory of visual perceptual learning. Trends Cogn. Sci. 8, 457–464 (2004).
51. Shibata, K., Sagi, D. & Watanabe, T. Two-stage model in perceptual learning: Toward a unified theory. Ann. N. Y. Acad. Sci. 1316, 18–28 (2014).
52. Lu, Z. L., Liu, J. & Dosher, B. A. Modeling mechanisms of perceptual learning with augmented Hebbian re-weighting. Vision Res. 50, 375–390 (2010).
53. Gold, J., Bennett, P. J. & Sekuler, A. B. Signal but not noise changes with perceptual learning. Nature 402, 176–178 (1999).
54. Dosher, B. A. & Lu, Z. L. Perceptual learning reflects external noise filtering and internal noise reduction through channel reweighting. Proc. Natl. Acad. Sci. U.S.A. 95, 13988–13993 (1998).
55. Irvine, D. R. F. Plasticity in the auditory system. Hear. Res. 362, 61–73 (2018).
56. Fritz, J., Shamma, S., Elhilali, M. & Klein, D. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat. Neurosci. 6, 1216–1223 (2003).
57. Fritz, J. B., Elhilali, M. & Shamma, S. A. Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. J. Neurosci. 25, 7623–7635 (2005).
58. Horton, J. C., Fahle, M., Mulder, T. & Trauzettel-Klosinski, S. Adaptation, perceptual learning, and plasticity of brain functions. Graefe's Arch. Clin. Exp. Ophthalmol. 255, 435–447 (2017).
59. Xiao, L. Q. et al. Complete transfer of perceptual learning across retinal locations enabled by double training. Curr. Biol. 18, 1922–1926 (2008).
60. Kattner, F., Cochrane, A., Cox, C. R., Gorman, T. E. & Green, C. S. Perceptual learning generalization from sequential perceptual training as a change in learning rate. Curr. Biol. 27, 840–846 (2017).
61. Bejjanki, V. R. et al. Action video game play facilitates the development of better perceptual templates. Proc. Natl. Acad. Sci. U.S.A. 111, 16961–16966 (2014).
62. Green, C. S. & Bavelier, D. Learning, attentional control, and action video games. Curr. Biol. 22, R197–R206 (2012).


Acknowledgements

The work was funded by the National Natural Science Foundation of China (91432102) and State Key Development Program for Basic Research of China (2014CB846101).

Author information

Authors and affiliations

State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China

Xiang Gao, Tingting Yan, Ting Huang, Xiaoli Li & Yu-Xuan Zhang


Contributions

X.G. and Y.-X.Z. designed the study. X.G., T.Y. and T.H. performed the experiments and analyzed data. Y.-X.Z. and X.G. prepared the manuscript. Y.-X.Z. and X.L. revised the manuscript. All authors have read and approved the content of the manuscript.

Corresponding author

Correspondence to Yu-Xuan Zhang .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Gao, X., Yan, T., Huang, T. et al. Speech in noise perception improved by training fine auditory discrimination: far and applicable transfer of perceptual learning. Sci. Rep. 10, 19320 (2020). https://doi.org/10.1038/s41598-020-76295-9


Received: 30 September 2019

Accepted: 21 October 2020

Published: 09 November 2020

DOI: https://doi.org/10.1038/s41598-020-76295-9
