StudentShare

Automatic Speaker Recognition - Thesis Example


Extract of sample "Automatic Speaker Recognition"

Summary of individual chapters

CHAPTER 1

This chapter introduces the field of Automatic Speaker Recognition (ASR) and the recognition of speakers from their voices. The advantages and applications of this method of authentication are explained in detail, and it is compared with other authentication methods to emphasize what makes automatic speaker recognition distinctive. A brief note describes the two main stages of an automatic speaker recognition system, enrollment and testing. Automatic speaker recognition is then classified into Automatic Speaker Identification (ASI) and Automatic Speaker Verification (ASV); both are treated in detail and compared to highlight the advantages of each. The common errors, False Acceptance and False Rejection, are discussed, together with their dependence on the decision threshold. The relation between the Equal Error Rate (EER) and the threshold is outlined to derive a system performance index that is useful in the implementation and testing stages. Finally, automatic speaker recognition is also classified into text-dependent and text-independent systems. With this introduction, the thesis turns to a detailed discussion of text-independent speaker identification.

CHAPTER 2: TEXT INDEPENDENT SPEAKER IDENTIFICATION SYSTEM

This chapter discusses the theory and methodology behind text-independent speaker identification systems. It begins with the human voice and the speech production mechanism. Modeling the human vocal tract as an acoustic tube relates the physical shape of the vocal tract to the resonant properties of the tube.
This model simplifies the modeling of the speech signal and the extraction of its parameters. Voiced and unvoiced sounds, plosives and related sound classes are described in detail. The next section explains the purpose and process of extracting features from a speaker's speech signal. The prominent feature extraction methods, Linear Prediction Cepstral Coefficients (LPCC), Mel Frequency Cepstral Coefficients (MFCC), Bark Frequency Cepstral Coefficients (BFCC) and Uniform Frequency Cepstral Coefficients (UFCC), are analyzed. The derivation of the LP coefficients by the Yule-Walker method is shown with the aid of a diagram. It is shown that, for better speaker identification performance, LPCC combined with the Mahalanobis distance measure is preferred. The MFCC, BFCC and UFCC feature extractors are also explained in detail. Under pattern matching, template models (Dynamic Time Warping, Vector Quantization) and stochastic models (GMM, HMM) are explained, and the use of neural networks for training and testing on speech is also analyzed. More emphasis is given to the GMM, which gives a smooth approximation to arbitrarily shaped densities.

CHAPTER 3: THE DESCRIPTION AND PERFORMANCE OF THE SYSTEM

This chapter describes the implementation of the speaker identification system. It first describes the corpus used for evaluation and then carries out the three phases of training, testing and performance evaluation in detail. The performance evaluation is done on the TIMIT database, which contains utterances from around 630 speakers. During the training phase, utterances of 24 seconds were used. The feature extraction methods are LPCC, MFCC, BFCC and UFCC. For LPCC, the LPC coefficients are computed with the Levinson-Durbin method and then converted into cepstral coefficients; for MFCC, BFCC and UFCC, the entire utterance is converted into feature vectors.
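The LPCC path described above (autocorrelation, Levinson-Durbin solution of the Yule-Walker equations, then conversion of the predictor coefficients to cepstral coefficients) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, the predictor convention is x[n] ≈ Σ a_k x[n−k], and the conversion uses the standard recursion c_n = a_n + Σ_{k=1}^{n−1} (k/n) c_k a_{n−k}, with a_n = 0 for n > p. Framing, pre-emphasis and windowing are left out.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations recursively (hypothetical helper).

    Returns (a, err): a[0..order-1] holds a_1..a_p for the predictor
    x[n] ~ sum_k a_k x[n-k]; err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)            # a[0] unused, a[k] = a_k
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                  # reflection (PARCOR) coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k             # prediction error shrinks at each order
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p into cepstral coefficients c_1..c_n."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)           # c[0] (log gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]   # a[n-k-1] stores a_{n-k}
        c[n] = acc
    return c[1:]
```

For the autocorrelation of an ideal first-order process, r[k] = 0.9^k, `levinson_durbin(r, 2)` returns coefficients close to [0.9, 0.0] with an error of about 0.19, and `lpc_to_cepstrum(a, 3)` gives approximately [0.9, 0.405, 0.243], matching the analytic cepstrum 0.9^n / n of that model.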
The thesis models each speaker with a GMM trained by the EM algorithm, and the best match is found by likelihood calculation. The main performance parameter is the percentage of correct identification. The evaluation is done on TIMIT speech signals at varying SNRs, with test utterances of 3 and 6 seconds and feature orders of 8, 10 and 12. It is shown that performance increases when the utterance length is increased from 3 to 6 seconds, and that combining the feature extractors with the Gp parameter improves the rate of correct identification.

CONCLUSION

This thesis presents Automatic Speaker Recognition as a mechanism by which a person is recognized from a spoken phrase. Such systems can be used to identify a person or to verify a person's claimed identity. This leads to the discussion of speech production, speech processing, extraction of distinctive features from the speaker's utterance, speaker modeling and, finally, pattern matching to identify the speaker. The main aim of the thesis is to implement and evaluate a text-independent speaker identification system. Such a system must identify the speaker from an utterance even if the spoken words are new to it. This is more sophisticated than text-dependent identification, which is confined to utterances whose text was already used in training and stored in the system. The thesis also explains the errors that occur in these methods of speaker recognition, false acceptance and false rejection, and the equal error rate between them, which is used for system performance evaluation. The system's corpus is the easily accessible TIMIT (Texas Instruments / Massachusetts Institute of Technology) database.
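To make the false acceptance / false rejection trade-off and the equal error rate concrete, the sketch below sweeps a decision threshold over verification scores. It assumes that higher scores mean a trial is more likely to come from the claimed speaker and that a trial is accepted when its score reaches the threshold. The function names are hypothetical, and reporting the average of FAR and FRR at the closest observed threshold is a simple approximation; finer estimates interpolate between thresholds.

```python
def error_rates(genuine, impostor, threshold):
    """False acceptance rate and false rejection rate at one threshold.

    A trial is accepted when its score is >= threshold (assumed convention:
    higher score = more likely the claimed speaker).
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the threshold over every observed score and return (eer, threshold)
    at the point where FAR and FRR are closest; the EER is their average there."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]
```

With genuine scores [0.9, 0.8, 0.7, 0.4] and impostor scores [0.1, 0.2, 0.3, 0.6], the sweep returns an EER of 0.25 at threshold 0.6: one impostor is accepted and one genuine trial is rejected. Raising the threshold lowers false acceptance at the cost of more false rejections, which is why the EER serves as a convenient single performance index.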
TIMIT provides ten utterances per speaker, each lasting about 3 seconds. The database is phonetically rich, comprising dialect sentences (SA), phonetically diverse sentences (SI) and phonetically compact sentences (SX). From this database, clean speech recorded in a sound booth was chosen to evaluate the system. The identification accuracy is shown by histograms for the various feature extractors and by the comparison tables in Chapter 3. The thesis uses four prominent types of feature extractors: Linear Prediction Cepstral Coefficients (LPCC), Mel Frequency Cepstral Coefficients (MFCC), Bark Frequency Cepstral Coefficients (BFCC) and Uniform Frequency Cepstral Coefficients (UFCC). These feature extraction methods are still being improved, and much further research is expected in this area; the present LPCC method is good for many cases of speaker feature extraction. During the training phase, each speech utterance is passed through the feature extractors. For LPCC, the Levinson-Durbin algorithm computes the prediction coefficients, which are then converted into cepstral coefficients. For MFCC and BFCC, a bank of M triangular filters followed by the Discrete Cosine Transform (DCT) yields the cepstral coefficients. The performance of these feature extractors is illustrated by their histograms. The speakers are modeled with a stochastic Gaussian Mixture Model with 32 mixture components, trained with the EM algorithm. The analysis in this thesis shows that the GMM is the preferred model for speaker identification. This stochastic model uses a conditional probability that depends on the speaker's utterance.
This conditional probability density function is evaluated over a set of feature vectors; vectors that fit a speaker's density well give a high probability that the utterance belongs to that speaker. The model parameters are estimated with the Expectation Maximization (EM) algorithm, which iteratively computes a new model from the previous one until convergence, so that the iteration reaches a (locally) maximum-likelihood fit of the model to the feature vectors. The performance parameter for evaluating the system is the percentage of correct speaker identification, computed from the number of correctly identified segments and the total number of segments. Analyzing the system with 32 mixtures and 24 seconds of training speech under different SNR (Signal to Noise Ratio) conditions shows that LPCC feature extraction performs better at higher SNRs (61.98 %), while the UFCC extractor works better at lower SNRs (4.71 %). Under clean speech conditions LPCC also proved better: when 10 % of the TIMIT speakers were evaluated with the LPCC feature extractor, the identification rate was 100 %. The best performance was obtained with a model order of 16 and a feature order of 13 for MFCC, BFCC and UFCC combined with Gp. Of these 13 features, 12 coefficients come from the feature extractor and one is Gp, a parameter derived by the Levinson-Durbin algorithm as described for the LPC computation. With this combination of feature extractors and Gp, the best identification rate, 99.20 %, was obtained by the MFCC + Gp combination. Evaluating the system for different numbers of coefficients shows that MFCC performs well with fewer coefficients (e.g. 8), while UFCC does better with more coefficients (e.g. 12).
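A minimal sketch of GMM training with EM, maximum-likelihood identification and the percentage-of-correct-segments measure is shown below in plain Python. Diagonal covariances are a common choice for speaker GMMs but are an assumption here, since the covariance type is not stated; the function names, random initialization from training vectors, variance floor and stopping tolerance are likewise our own choices. A real system with 32 mixtures would more likely use a library implementation such as scikit-learn's `GaussianMixture`.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def component_logs(x, model):
    """log(w_j) + log N(x; mu_j, Sigma_j) for every mixture component j."""
    weights, means, variances = model
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def train_gmm(data, n_mix, max_iter=100, tol=1e-4, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to one speaker's feature vectors with EM."""
    dim, n = len(data[0]), len(data)
    means = [list(v) for v in random.Random(seed).sample(data, n_mix)]
    mu = [sum(x[d] for x in data) / n for d in range(dim)]
    var = [max(var_floor, sum((x[d] - mu[d]) ** 2 for x in data) / n) for d in range(dim)]
    model = ([1.0 / n_mix] * n_mix, means, [var[:] for _ in range(n_mix)])
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities gamma[t][j] = P(component j | x_t)
        gamma, total = [], 0.0
        for x in data:
            logs = component_logs(x, model)
            norm = logsumexp(logs)
            total += norm
            gamma.append([math.exp(lp - norm) for lp in logs])
        # M-step: re-estimate weights, means and (floored) variances
        weights, means, variances = [], [], []
        for j in range(n_mix):
            nj = sum(g[j] for g in gamma) + 1e-10
            m = [sum(g[j] * x[d] for g, x in zip(gamma, data)) / nj for d in range(dim)]
            v = [max(var_floor, sum(g[j] * (x[d] - m[d]) ** 2 for g, x in zip(gamma, data)) / nj)
                 for d in range(dim)]
            weights.append(nj / n)
            means.append(m)
            variances.append(v)
        model = (weights, means, variances)
        # stop once the average log-likelihood no longer improves noticeably
        if total / n - prev < tol:
            break
        prev = total / n
    return model


def avg_log_likelihood(vectors, model):
    """Average per-vector log-likelihood of a segment under one speaker's GMM."""
    return sum(logsumexp(component_logs(x, model)) for x in vectors) / len(vectors)


def identify(segment, models):
    """Maximum-likelihood decision: the speaker whose GMM best explains the segment."""
    return max(models, key=lambda spk: avg_log_likelihood(segment, models[spk]))


def percent_correct(test_segments, models):
    """Percentage of test segments, given as (true_speaker, vectors), identified correctly."""
    hits = sum(identify(seg, models) == spk for spk, seg in test_segments)
    return 100.0 * hits / len(test_segments)
```

As a toy usage, two synthetic "speakers" whose 2-D vectors are drawn around (0, 0) and (6, 6) can each be modeled with a 2-component GMM; test segments drawn from each cluster are then assigned to the right speaker, and `percent_correct` reports 100 %. Real feature vectors would be the 12- or 13-dimensional cepstral vectors described above.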
The identification performance was also tested on utterances longer than 3 seconds. With 6-second utterances, performance improved, reaching 96.83 % for MFCC, 97.14 % for UFCC and 97.78 % for BFCC. It can therefore be concluded that calculating Gp with the Levinson-Durbin algorithm and combining it with the feature vectors greatly improves correct speaker identification, and that longer utterances further increase the percentage of correct identification.
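The MFCC front end (M triangular filters followed by a DCT) and the 12 + 1 feature vector that appends Gp can be sketched as follows. Several points are assumptions rather than details taken from the thesis: Gp is taken to be the LP model gain, the square root of the final Levinson-Durbin prediction error, computed here with LP order 12; the mel mapping 2595·log10(1 + f/700) is one common convention; and the Hamming window, 24 filters and 256-point FFT are arbitrary example settings. A naive DFT keeps the block dependency-free, and all function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """|X[k]|^2 for k = 0..n_fft/2 via a naive DFT (frame length must be <= n_fft)."""
    x = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append(re * re + im * im)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """n_filters triangular filters with edges equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    bins = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
            for i in range(n_filters + 2)]      # filter edges as fractional FFT bins
    return [[(k - lo) / (mid - lo) if lo < k <= mid else
             (hi - k) / (hi - mid) if mid < k < hi else 0.0
             for k in range(n_fft // 2 + 1)]
            for lo, mid, hi in zip(bins, bins[1:], bins[2:])]


def mfcc(frame, sample_rate, n_filters=24, n_ceps=12, n_fft=256):
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))  # Hamming window
                for i, s in enumerate(frame)]
    spec = power_spectrum(windowed, n_fft)
    log_e = [math.log(max(1e-12, sum(w * p for w, p in zip(row, spec))))
             for row in mel_filterbank(n_filters, n_fft, sample_rate)]
    # DCT-II of the log filter-bank energies, keeping c_1..c_n_ceps (c_0 dropped)
    return [sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
            for q in range(1, n_ceps + 1)]


def lp_gain(frame, order=12):
    """ASSUMED definition of Gp: sqrt of the final Levinson-Durbin prediction error.

    Expects a non-silent frame (r[0] > 0).
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i - lag] for i in range(lag, n)) for lag in range(order + 1)]
    a, err = [0.0] * (order + 1), r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] - k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return math.sqrt(max(err, 0.0))


def mfcc_plus_gp(frame, sample_rate):
    """13-dimensional feature vector: 12 MFCCs followed by Gp (12 + 1 split)."""
    return mfcc(frame, sample_rate, n_ceps=12) + [lp_gain(frame, order=12)]
```

Replacing `hz_to_mel` / `mel_to_hz` with a Bark or linear frequency mapping would give BFCC- or UFCC-style features, assuming those variants differ only in how the filters are spaced; the thesis's exact filter designs may differ.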
Cite this document
(Automatic Speaker Recognition Thesis Example | Topics and Well Written Essays - 1500 words, n.d.)
Automatic Speaker Recognition Thesis Example | Topics and Well Written Essays - 1500 words. https://studentshare.org/finance-accounting/2032021-the-conclusion

CHECK THESE SAMPLES OF Automatic Speaker Recognition

Information Retrieval, Inverse Document Frequency

… The paper "Information Retrieval, Inverse Document Frequency" is an outstanding example of management coursework.... With the development of information technology more and more data is being stored in electronic and other forms.... Finding the correct data especially from the electronically stored information is becoming more important by the day....
19 Pages (4750 words) Coursework

Speech and Speaker Recognition

… The paper "Speech and speaker recognition" is a great example of a finance and accounting assignment.... Dysarthria speech recognition ... There are various speech recognition approaches employed in dealing with dysarthric speech including the automatic speech recognition system which is essential in the assessment of the dysarthric speech, the Sy and Horowitz's model (1) for determining the link between judgment from naïve listeners and the dynamic time warping response....
9 Pages (2250 words) Assignment

Industrial Relations and Workplace Change

According to Pyman, Cooper, Teicher & Holland (2006), the first step in transforming an organization into an all-time productive entity is the recognition of trade unions by employers.... … The paper "Industrial Relations and Workplace Change" is a wonderful example of an assignment on management....
9 Pages (2250 words) Assignment

Leadership, Motivation, and Communication

… The paper "Leadership, Motivation, and Communication" is a wonderful example of a report on management.... The document entails a brief discussion of the leadership, motivation, communication, and individual personality and values topics, which constitute the broad organizational behavior....
13 Pages (3250 words)

ATM and Process of Withdrawing Money

… The paper “ATM and Process of Withdrawing Money, Possible Resistance That Financial Service Firms Might Face during Systems Change” is a thoughtful example of the literature review on finance & accounting.... Due to Microsoft announcing that it will stop its support on Windows XP that most ATMs run on, most banks will have to change to a different operating system....
9 Pages (2250 words) Literature review

Comparison of Asda, Marks and Spencer and Tesco

United kingdom has major of its challenges in the supermarkets and manufacturing sector while with recognition of the Indian they exhibit problematic handling of consumerism underuse of the slave-related actions on their labor force....
6 Pages (1500 words) Assignment

Business intelligence: a Managerial Approach

When considering buying a new laptop, a number of activities are involved, the first activity entails defining the decision problem and determining requirements-need recognition (intelligence phase)....
12 Pages (3000 words) Essay

Principles of Management - Causes of Conflict in a Workplace

In addition, I learned that competition within employees especially for position advancement as well as recognition was another cause of conflict within a workplace (Raines, 2013).... … The paper "Principles of Management - Causes of Conflict in a Workplace" is an engrossing example of coursework on management....
7 Pages (1750 words) Coursework