Score Normalisation in Voice Biometrics - Term Paper Example



Abstract

Speaker verification involves confirming the claimed identity of a speaker, and speaker identification involves finding the registered speaker who matches the input voice. Score normalisation techniques transform a system's output scores to reduce misalignments in the score distributions of different speaker models. These misalignments are caused by speaker-dependent or speaker-independent factors, such as test data conditions and training conditions. The two main categories of score normalisation are Bayesian methods and standardisation of score distributions. Bayesian methods include cohort normalisation, world model normalisation, and unconstrained cohort normalisation. Standardisation methods include Z-norm and T-norm. Score normalisation helps achieve separation between the score distributions of known and unknown speakers, and its use reduces the equal error rate.

Introduction

Speaker recognition is required in applications that operate in uncontrolled environments or transmit speech over communication channels. Speaker verification involves assessing similarity scores between registered or unregistered users and reference models. Verification scores are expected to be high for true speakers and low for impostors. However, true speaker scores can be adversely affected by background noise, variations in the speaker's speech, variations caused by the recording apparatus, and effects of the communication channel. Score distribution plots show true speaker scores and impostor scores relative to each other.

Figure 1. True and Impostor Speaker Score Distribution (Ariyaeeinia, 2006)

Test utterances obtained experimentally from true speakers and impostors can be used to generate score distribution plots (see fig. 1). Since the true and impostor score distributions overlap, an acceptance threshold must be chosen.
The accuracy of the verification process improves as the separation between the two score distributions increases. Overlapping score distributions result in two kinds of error: false acceptances, in which impostors are accepted as true speakers, and false rejections, in which true speakers are rejected. Adjusting the threshold reduces one type of error while increasing the other. A common compromise is to set the threshold so that the two error rates are equal. The operating point at which the false acceptance rate equals the false rejection rate is known as the equal error rate (see fig. 2).

Figure 2: Setting Threshold for Equal Error Rate (Ariyaeeinia, 2006)

The accuracy of speaker verification is represented by a detection error trade-off plot (see fig. 3).

Figure 3: Detection Error Trade-off Plot (Ariyaeeinia, 2006)

Variations in speech characteristics are caused by background noise and/or channel noise. Together with speaker-generated variations, these can cause a mismatch between training and test utterances, which reduces accuracy (Ariyaeeinia, 2006).

Score Normalisation Methods

Score normalisation methods have been widely used to improve accuracy. Several methods exist, depending on the approximation approach. Most are based on the mean of the scores for background speaker models, given by the expression Snorm = (score for target model) / (mean of scores for background models), where Snorm is the normalised score. Using a ratio of scores instead of absolute scores improves verification performance, since the ratio of the score for the target model to a statistic of the scores obtained against the same background remains largely unchanged. The cohort normalisation method uses scores for a cohort of speaker models, where the competing speakers are selected before testing on the basis of the closeness of their models to the target model.
In this method, an impostor's test utterance may be equally dissimilar from the competing models and the target model, which gives rise to the possibility of false acceptance. Unconstrained cohort normalisation uses scores for the cohort of background speaker models closest to the test utterance. The background speaker models are selected during testing, which reduces impostor scores relative to true speaker scores. The method has been successful in reducing both false acceptances and false rejections. When the number of background models is increased, however, the capability to suppress impostor scores diminishes.

Other score normalisation methods are based on standardisation of score distributions. It is desirable to use a single threshold for all registered speakers. However, the score distributions for impostors and true speakers have different characteristics. A widely used practice is to standardise the impostor score distribution. T-norm is an effective normalisation method in which the normalisation parameters are determined dynamically during testing. It is given by the expression $S_{norm} = (S_t - \mu_T) / \sigma_T$, where $S_{norm}$ is the normalised score, $S_t$ is the initial score for the target speaker model, $\mu_T$ is the average of the scores for background speaker models, and $\sigma_T$ is the standard deviation of the scores for background speaker models (Ariyaeeinia, 2006).

Snelick (2005) has described other common score normalisation methods, which include:
Min Max Method: Raw scores are mapped to the range 0 to 1. It can be expressed as $n = (s - \min(S)) / (\max(S) - \min(S))$, where n is the normalised score, s is the raw matching score, and max(S) and min(S) are the maximum and minimum points of the score range.
Z-score Method: The method is given by the expression $n = (s - \mathrm{mean}(S)) / \mathrm{std}(S)$, where n is the normalised score, s is the raw matching score, mean(S) is the arithmetic mean and std(S) is the standard deviation.
Tanh Method: The method is given by the expression $n = \frac{1}{2}\left[\tanh\left(0.01\,(s - \mathrm{mean}(S)) / \mathrm{std}(S)\right) + 1\right]$, where n is the normalised score, s is the raw matching score, mean(S) is the arithmetic mean, std(S) is the standard deviation, and tanh() is the hyperbolic tangent function.

Score Normalisation Advantages

Ariyaeeinia et al. (2006) have emphasised that score normalisation helps achieve separation between the score distributions of known and unknown speakers. The two main categories of score normalisation are Bayesian methods and standardisation of score distributions. Bayesian methods include cohort normalisation, world model normalisation, and unconstrained cohort normalisation. Standardisation methods include Z-norm and T-norm. Speaker identification involves determining the correct speaker from a registered population. Speaker verification involves determining whether a speaker is who s/he claims to be. In the study, cohort methods exhibited the best performance. Score normalisation is used to overcome problems in scores that are affected by distortions in test utterance characteristics, speaker model misalignment, and unseen data.

In a comparative study of decoupled and adapted Gaussian mixture models in open-set text-independent speaker identification, Fortuna et al. (2005) found that cohort approaches, particularly unconstrained cohort normalisation, were equally capable of good performance with both models. The normalisations in the study included world model normalisation, cohort normalisation, unconstrained cohort normalisation, T-norm and Z-norm. T-norm was among the worst performers with decoupled Gaussian mixture models and among the best performers with adapted Gaussian mixture models.

Score normalisation techniques are used to transform a system's output scores, reducing misalignments in the score distributions of different speaker models. Misalignments are caused by speaker-dependent or speaker-independent factors, such as test data conditions and training conditions.
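The Min Max, Z-score and Tanh methods attributed to Snelick (2005) earlier can be illustrated with a minimal Python sketch. The function names are hypothetical, the set S is assumed to be a plain list of raw matching scores, and the use of the population standard deviation for std(S) is an assumption, since the text does not say which estimator is intended.

```python
import math
from statistics import mean, pstdev


def _mean_std(scores):
    """Return mean(S) and std(S); population std is an assumption."""
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("scores have zero spread; cannot normalise")
    return mean(scores), sigma


def min_max(s, scores):
    """Min Max: map raw score s into [0, 1] using the range of S."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores have zero range; cannot normalise")
    return (s - lo) / (hi - lo)


def z_score(s, scores):
    """Z-score: distance of s from mean(S) in units of std(S)."""
    mu, sigma = _mean_std(scores)
    return (s - mu) / sigma


def tanh_norm(s, scores):
    """Tanh: squash the scaled z-score into the open interval (0, 1)."""
    mu, sigma = _mean_std(scores)
    return 0.5 * (math.tanh(0.01 * (s - mu) / sigma) + 1.0)
```

Min Max depends on the extreme values of S, so a single outlier shifts every normalised score, whereas the Tanh form changes only slowly with s because of the 0.01 scaling inside the hyperbolic tangent.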
T-norm has been widely deployed as a score normalisation technique for improving the performance of speaker verification systems because of its low false acceptance rates. T-norm is a test-dependent normalisation technique, which estimates the score distribution of the test speech from a set of impostor models. A novel speaker-adaptive technique based on T-norm has been proposed for speaker verification. The technique uses a fast approximation of the Kullback-Leibler divergence for Gaussian mixture models. Stable improvements in error reduction rates were obtained for all conditions (Ramos-Castro, 2005).

Score Normalisation Case Studies

In a study of text-dependent speaker verification, Ariyaeeinia et al. (1997) examined the verification performance of various types of vector quantisation and dynamic time warping classifiers, together with algorithmic issues and verification accuracy. Performance degradation caused by the linear filtering effect of a telephone channel was minimised by the cepstral mean normalisation approach, in which the average of the cepstral feature vectors is computed and subtracted from the individual feature vectors.

Ma (2003) conducted a comparison of discriminative training methods for speaker verification. Widely used score normalisation techniques, such as T-norm and Z-norm, have been used in speaker verification systems to perform channel and handset compensation. Applying a discriminative score normalisation technique improved performance. However, this required additional speech data or external speakers. In the experiment, a logistic regression model was used. Logistic regression proved to be an effective score normalisation technique, which could be combined with other model training methods.

A normalised discriminant analysis method for speaker verification has been presented by Li et al. (1996) to address problems in the use of linear discriminant analysis in the design of classifiers.
Because the training data were small, discriminant scores from different classifiers were scaled differently. In the technique, the projected data from the true speaker and the impostors were maximally separated by the use of a weight vector. An equal error rate of 6.13 percent was achieved, while the use of Fisher linear discriminant analysis resulted in an error rate of 18.18 percent. A hybrid speaker verification system combining the method with Hidden Markov Models resulted in an error rate of 4.32 percent, which was lower than the 5.30 percent obtained with the Hidden Markov Model with cohort normalisation.

Alsaade et al. (2008) have proposed unconstrained cohort normalisation for multimodal biometrics in the score-level fusion process. The study examined the application to other biometrics of score normalisation methods widely used in voice biometrics. The normalisation methods considered were cohort normalisation, unconstrained cohort normalisation, universal background model normalisation, T-norm and Z-norm. Speaker recognition involves computing the probability of the target model given the test utterance, where statistical classifiers provide the verification score. Another approach to score normalisation is based on standardisation of score distributions. The aim is to facilitate the use of a single threshold for all speakers. However, the impostor score distribution and the true score distribution have different characteristics for different speakers. Standardising the impostor score distribution has been the current practice.

In a study of speaker verification using mixture decomposition discrimination, Sukkar et al. (2000) showed that an error reduction of 46 percent was achieved by a hybrid verification system involving speaker-dependent Hidden Markov Modelling with cohort normalisation. In the experiment, the same word spoken by different speakers caused different Hidden Markov Model mixture components to dominate.
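The standardisation approach discussed in this paper (T-norm and Z-norm) applies the same arithmetic, $(S - \mu) / \sigma$, to different sets of impostor scores. The sketch below is illustrative only. The function and parameter names are hypothetical, the scores are assumed to be plain floats already produced by some classifier, and the population standard deviation is an assumed choice of estimator.

```python
from statistics import mean, pstdev


def standardise(score, reference_scores):
    """Shift and scale a score by the mean and std of reference scores."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)  # population std: an assumption
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    return (score - mu) / sigma


def t_norm(target_score, test_vs_cohort_scores):
    """T-norm: statistics come from scoring the *test utterance* against
    a set of impostor (background) models, so they are found at test time."""
    return standardise(target_score, test_vs_cohort_scores)


def z_norm(target_score, impostor_utts_vs_target_scores):
    """Z-norm: statistics come from scoring a set of *impostor utterances*
    against the target model, so they can be estimated in advance."""
    return standardise(target_score, impostor_utts_vs_target_scores)
```

After either normalisation the impostor scores for every target model are centred near zero with roughly unit spread, which is what makes a single acceptance threshold for all speakers plausible.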
Speaker verification output scores are transformed during score normalisation, which serves to enhance the effectiveness of the detection threshold. This is achieved by aligning the score distributions of individual speaker models and reducing the effects of speaker-dependent and speaker-independent modifications of the signal. T-norm and Z-norm are common normalisation techniques. T-norm estimates its parameters using scores derived from impostor models. Z-norm estimates its parameters using scores from a set of impostor utterances. In an experiment, T-norm was extended to Adaptive T-norm, which offers advantages over the standard T-norm. This was achieved by adjusting the speaker set to the target model, which resulted in lower error rates than the traditional T-norm (Sturim, 2005).

Hébert et al. (2005) have described a T-norm technique for text-dependent speaker verification. T-norm is an extension of cohort normalisation, which has proved to be very effective in normalising verification scores. In a text-dependent task, mismatch between the lexicon of the target speaker and that of the cohort speaker models makes the deployment of T-norm a challenge. The researchers proposed a hybrid scoring scheme using T-norm and a background model to overcome the problem. This resulted in a 31 percent relative error rate reduction compared with the use of T-norm alone.

Score Normalisation Applications

The Naval Research Laboratory has embarked on a study of voice biometrics, with the ultimate goal of enabling the use of voice as a password. The speech normalisation methods used in the study included normalisation of the peak amplitude of the speech waveform, adaptive boosting of high-frequency speech for spectral analysis, a fixed rule to crop the speech waveform, a wider bandwidth for extracting more voice features, and removal of the speech distortion caused by the use of a gas mask.
A voice biometrics system has been designed, which involves selection of the test phrase by the speaker, the speaker carrying his or her own speech template, pre-processing of the speech waveform for normalisation, optimisation of voice biometrics performance, and calibration of the self-test score (Kang, 2002).

Arslan et al. proposed a speaker authentication and identification system in the VOICIFY project. Speaker verification involves determination of the identity of the speaker from a voice sample. Speaker identification involves determination of matches to the input voice. Other systems include text-dependent systems and vocabulary-dependent systems. The system was designed for high precision and for robustness against channel transformations and noise, making it suitable for telephony applications for security purposes. Speaker verification has been proposed in three steps. The first step is to extract features that are speaker dependent. The second step is to build a statistical model representing the characterisation of the feature set. The third step involves making a decision about the input voice by comparing it with previously developed speaker models. The benefits of the proposed project include improved security, reduced costs, improved service and time savings across industry sectors such as financial services, telecom, retail, enterprise and information technology, travel, internet, hospitals, insurance, government, and military.

Conclusion

Score normalisation methods include Bayesian methods and standardisation of score distributions. Score normalisation helps achieve separation between the score distributions of known and unknown speakers. A reduction in equal error rate is achieved by the use of score normalisation methods (see fig. 4).

Figure 4: Effectiveness of Score Normalisation (Ariyaeeinia, 2006)

References

Alsaade, F. (2008). Enhancement of multimodal biometric segregation using unconstrained cohort normalisation. Pattern Recognition. 41 (2008), 814-820.
Ariyaeeinia, A. (2006). Verification effectiveness in open-set speaker identification. IEE Proc.-Vis. Image Signal Process. 153 (5), 618-624.
Ariyaeeinia, A. (1997). Comparison of VQ and DTW classifiers for speaker verification. European Conference on Security and Detection. Conference Publication No. 437 (28-30 April), 142-146.
Arslan, L. (2009). Handset normalization for voice authentication (VOICIFY). GVZ Speech Technologies Co. 2009, 1-5.
Fortuna, J. (2005). On the use of decoupled and adapted Gaussian mixture models for open-set speaker identification. Proceedings of The Third COST 275 Workshop. Biometrics on The Internet (2005), 41-44.
Hébert, M. (2005). T-Norm for text-dependent commercial speaker verification applications: Effect of lexical mismatch. ICASSP 2005. ICASSP (0-7803-8874-7/05/), 729-732.
Kang, G. (2002). Voice biometrics for information assurance applications. Naval Research Laboratory. NRL/FR/5550--02-10,044 (December 5), 1-44.
Li, Q. (1996). Normalized discriminant analysis with application to a hybrid speaker-verification system. IEEE. 1996 (0-7803-3192-3), 681-684.
Ma, C. (2003). Comparison of discriminative training methods for speaker verification. ICASSP 2003. ICASSP (0-7803-7663-3/0), 192-195.
Ramos-Castro, D. (2005). Speaker verification using fast adaptive Tnorm based on Kullback-Leibler divergence. Proceedings of The Third COST 275 Workshop. Biometrics on The Internet (2005), 49-52.
Snelick, R. (2005). Large scale evaluation of multimodal biometric authentication using state-of-the-art systems. IEEE Transactions on Pattern Analysis and Machine Intelligence. 27 (3), 450-455.
Sturim, D. (2005). Speaker adaptive cohort selection for Tnorm in text-independent speaker verification. ICASSP 2005. ICASSP (0-7803-8874-7/05/), 741-744.
Sukkar, R. (2000). Speaker verification using mixture decomposition discrimination. IEEE Transactions on Speech and Audio Processing. 8 (3), 292-299.