Speaker Identification System - Thesis Example


Extract of sample "Speaker Identification System"

SPEAKER IDENTIFICATION SYSTEM

SUMMARY FOR INDIVIDUAL CHAPTERS

CHAPTER 1 - INTRODUCTION

This chapter introduces the concepts of speaker recognition and identification. The initial discussion highlights the uniqueness, applications, and advantages of speaker recognition and speaker identification as a method of authentication, and compares it with other methods of authentication. The concepts of speech processing, speech recognition, and speaker recognition are also introduced. The section on speech processing covers the characteristics of the speech signal: the different qualities of speech, the variations in the acoustic properties of the speech signal, and the stages that convert the human speech signal into a digital speech signal. This digital speech is of main interest for the speaker authentication system. A spectrogram of a speech signal is shown to characterize the signal energy in it. Speech recognition is described with a block diagram that details stages such as speech processing, word detection, and pattern matching. The basic classification of speaker recognition systems is then described with a diagram, and a clear demarcation is drawn between speaker identification and speaker verification systems. These basic concepts lay the foundation for the discussions in later chapters.

CHAPTER 2 - SPEAKER IDENTIFICATION

This chapter deals with speaker identification in detail. First, the human speech production mechanism is described. Next, the types of speaker identification systems are explained according to their dependency on text information; the two main types discussed are text-independent and text-dependent speaker identification. After a short discussion of speech models, the speaker identification system is described in detail with a block diagram.
The stages of the system, namely pre-emphasis filtering, analog-to-digital conversion, frame blocking, windowing, and autocorrelation analysis, are discussed in detail. Pre-emphasis filtering is treated both for a frame-by-frame speech signal sequence and for the entire speech signal. The analog-to-digital converters used in practice are highlighted along with their specifications. The theory behind frame blocking is explained with the aid of a diagram. The characteristics and performance parameters of various windowing methods are shown. Finally, the use of autocorrelation analysis for extracting the harmonic and formant properties of the speech signal is emphasized.

CHAPTER 3 - FEATURE EXTRACTION

This chapter describes how the appropriate features of the speech signal are selected and estimated. Linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), mel filter bank cepstral coefficients (MFCC), Bark filter bank cepstral coefficients (BFCC), and uniform filter bank cepstral coefficients (UFCC) are dealt with in detail. It is shown that the linear prediction coefficients give information about the formant frequencies and bandwidths of the speech signal. Nonetheless, LPCC is a more suitable alternative to LPC. The cepstral coefficients other than the zeroth coefficient represent the features of the speech signal. In the mel filter bank method, cepstral coefficients are calculated on the mel scale using triangular filters. This frequency mapping is covered in the chapter, and the cepstral coefficients are computed according to the given equations. The chapter also outlines the advantages of MFCC when applied to GMMs and speaker identification systems. Another feature extraction method, BFCC, is also discussed; its performance is similar to that of MFCC. Finally, UFCC is discussed; it has lower performance than MFCC and BFCC.
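The mel-scale mapping and triangular filters mentioned for MFCC can be illustrated with a minimal sketch. The thesis equations are not reproduced in this summary, so the code assumes one widely used form of the mel formula, mel = 2595 log10(1 + f/700); the number of filters (20) and the band edges (0 Hz to 8 kHz) are likewise illustrative assumptions, and all function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (assumed common formula, not taken from the thesis)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(n_filters, f_low_hz, f_high_hz):
    """Center frequencies (Hz) of n_filters filters spaced equally on the mel scale."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]

def triangular_weight(f_hz, left, center, right):
    """Weight of one triangular filter at f_hz: 0 at the edges, 1 at the center."""
    if f_hz <= left or f_hz >= right:
        return 0.0
    if f_hz <= center:
        return (f_hz - left) / (center - left)
    return (right - f_hz) / (right - center)

# Assumed setup: 20 filters up to 8 kHz, the Nyquist frequency for 16 kHz sampling.
centers = mel_filter_centers(20, 0.0, 8000.0)
```

Because the filters are equally spaced in mels, their spacing in Hz widens with frequency, which gives finer resolution at low frequencies than a uniform filter bank.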
Even so, UFCC remains suitable for speaker identification because it gives uniform resolution at all frequencies. Thus this chapter gives a broad overview of the various methods of speech feature extraction.

CHAPTER 4 - SPEAKER MODELLING

The discussion of speaker modeling in this chapter begins with a brief introduction to human voice production and its uniqueness. The different models for speakers, such as template models and stochastic models, are explained. The Gaussian Mixture Model (GMM) and the Vector Quantization (VQ) model are analyzed in depth. It is shown that a GMM with around 32 mixture components performs well. These mixture components influence the amplitude of the speaker's reference signal. It is shown that the feature vectors in the GMM are modeled by conditional probability density functions, which depend on the speaker's voice characteristics. The best match is chosen by the maximum likelihood estimate method. A template model, namely the vector quantization (VQ) method, is also explained in detail. The VQ codebook method is discussed with diagrams showing the clustering and the partitioning sequence based on the Euclidean distance measure. In VQ, the best match is shown to be obtained under minimum distortion conditions. The rest of the chapter covers the k-means algorithm and the Expectation Maximization (EM) algorithm, which are used in the best-match detection procedure.

CHAPTER 5 - THE SYSTEM PERFORMANCE AND RESULTS

In this chapter the best approach for speaker identification is chosen and evaluated on a selected speech database, the TIMIT database. This database has a vast collection of speaker utterances, each of 3 seconds duration. The system evaluation is done in two stages, namely the training phase and the testing phase. During training, eight utterances are combined to form a speech frame.
This speech is subjected to signal processing and to feature extraction methods such as LPCC, MFCC, BFCC, and UFCC. This thesis uses the GMM for speaker modeling, and each speaker has a model. Testing is carried out for different values of feature order, different numbers of mixture components, and different signal-to-noise ratios (SNR). The performance parameter used for evaluation is the percentage of correct identifications. The performance is checked for various combinations of order, mixture components, and so on. The best results are obtained for lengthy utterances and with the use of an additional vector, Gp, from the Levinson-Durbin algorithm. The best percentage, 99.20 %, is obtained for MFCC and Gp taken together.

CONCLUSION

In this thesis, text-independent speaker identification is done using Gaussian Mixture Models (GMM). The discussion covers the various concepts in speech processing, speech recognition, speaker recognition, and speaker modeling. For each stage of the speaker identification process, the related methods are discussed. More emphasis is given to feature extraction and speaker modeling, which are essential in speaker identification and are generally done through pattern matching. Text-independent speaker identification is chosen rather than the text-dependent method because it is more sophisticated to use. For such a speaker identification system, mainly stochastic models are used for speaker modeling. In this thesis the stochastic Gaussian Mixture Model has been chosen for speaker modeling because of its high rate of success in the field of speaker identification. One Gaussian Mixture Model is used to represent each speaker in the training set. These Gaussian Mixture Models are obtained with the help of the k-means algorithm for clustering and the Expectation Maximization (EM) algorithm for maximizing the likelihood match.
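The idea of one GMM per speaker, with the best match chosen by maximum likelihood, can be sketched as follows. This is only an illustration of the scoring step under stated assumptions: diagonal covariances are assumed, the model parameters are toy values rather than trained ones (training by k-means and EM is not shown), and the speaker names, feature values, and function names are all hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of the frames under gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        total += log_sum_exp([
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ])
    return total

def identify_speaker(frames, speaker_models):
    """Return the speaker whose GMM gives the highest log-likelihood, plus all scores."""
    scores = {name: gmm_log_likelihood(frames, gmm) for name, gmm in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Toy models (hypothetical): two mixture components, two-dimensional features.
speaker_models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best, scores = identify_speaker(test_frames, speaker_models)
```

Working in the log domain avoids numerical underflow when the per-frame likelihoods of a long utterance are multiplied together.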
The results of the tests show that the Gaussian Mixture Model is superior to any other type of speaker modeling. The tests are based on the TIMIT (Texas Instruments / Massachusetts Institute of Technology) database. This database is chosen specifically because of its wide accessibility and use. The database has speech from around 630 speakers, with ten conversations for each speaker. The database also provides various environmental noises characterized at different signal-to-noise ratios. The signal-to-noise ratios available are 15, 20, 25, and 30 dB, plus clean speech. Each speaker is allowed ten utterances, and each utterance lasts 3 seconds. The sampling frequency of these speech signals is 16 kHz, without session interval. During the training phase, eight utterances from each noise environment were chosen. During the testing phase, the remaining two utterances were used.

The experimental tests were conducted in two phases, namely the training phase and the testing phase. The training phase started by combining the eight utterances into one long speech signal of 24 seconds. After passing through the speech processing stages, this speech frame is subjected to feature extraction by linear prediction cepstral coefficients, mel filter bank cepstral coefficients, Bark filter bank cepstral coefficients, and uniform filter bank cepstral coefficients. The feature extraction results of LPCC, MFCC, BFCC, and UFCC are compared by computing histograms for each method. The normalized curves for these methods are also shown. After feature extraction, the GMM is used to model every speaker. In the testing phase, the same steps are repeated up to feature extraction. Then, to identify the speaker correctly, the EM algorithm has been adopted. The main performance criterion is the percentage of correct identifications against the total number of speakers.
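The performance criterion described above can be written out as a short sketch. The function name and the label lists are hypothetical; the only assumption is that each test yields one predicted label and one true label per speaker.

```python
def identification_rate(predicted, actual):
    """Percentage of test cases in which the predicted speaker matches the true speaker."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)

# Hypothetical example: three of four speakers identified correctly.
rate = identification_rate(["s1", "s2", "s3", "s4"], ["s1", "s2", "s9", "s4"])
```

As a plausibility check on the arithmetic, 629 correct identifications out of 630 speakers would give 99.8413 %, and 616 out of 630 would give 97.7778 %.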
The experimental results for LPCC, MFCC, BFCC, and UFCC under different noise levels (SNR levels) are analyzed. SNR levels of 15, 20, 25, and 30 dB and clean speech were used. The performance of all four methods was found to be poor at an SNR of 15 dB. At an SNR of 25 dB the performance is good, with values of 50.32 % for MFCC, 47.54 % for BFCC, 61.19 % for UFCC, and 61.98 % for LPCC. For these results the experimental parameters were an utterance length of 3.6 seconds, feature orders of 8, 10, and 12, and 8, 16, and 32 mixture components. The results show that the LPCC method performs well compared to the other three methods. For the same set of parameters, the performance figure is high when the number of mixture coefficients is considered rather than the SNR. The performance values show that MFCC performs well compared to the others when the tests are conducted with 8 coefficients, whereas BFCC performs well with 12 coefficients. Increasing the feature order above 32 does not improve the system performance.

When the speech frame length was increased to 6 seconds, all the methods performed well for higher feature extraction orders, for example 12 coefficients. The lower range of feature extraction orders, with 8 coefficients, shows variations, and MFCC proves to be the better option. For 8, 16, and 32 mixture components, the system performance varies at different levels. BFCC and UFCC show good performance of 99.8413 % with 8 mixture components. UFCC proved better with a performance of 99.841 % for 16 mixtures, and BFCC was found to have good performance of 97.7778 % with 32 mixtures.

This thesis uses the Levinson-Durbin algorithm to compute Gp for each frame. This computation of Gp increases the energy content of the speech signal.
When this Gp vector is included in the feature matrix, it improves the overall performance of the system in correctly identifying the speaker. In this analysis, MFCC together with Gp showed a good performance of 99.20 % correct identification. Next to MFCC, UFCC together with Gp achieved 98.89 % correct identification. Finally, BFCC together with Gp achieved 98.49 %. Thus we can conclude that using the Levinson-Durbin algorithm to compute Gp, and including this Gp as an additional feature vector alongside the other feature vectors, improves the system performance to a great extent. It has also been shown that the best performance in correct speaker identification is obtained by increasing the length of the test utterances. Thus in this thesis a sophisticated method of speaker identification has been derived and tested, and the results shown are the best with respect to correct speaker identification.
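The front-end stages and the Levinson-Durbin recursion mentioned in the summary can be sketched for a single frame as follows. Several points are assumptions rather than details from the thesis: the pre-emphasis coefficient 0.97, the Hamming window, the frame length of 160 samples, the order of 8, and the synthetic test signal. The summary does not define Gp; the sketch assumes it denotes the LPC gain term, taken here as the square root of the final prediction-error energy, which is one common by-product of the recursion. All function names are hypothetical.

```python
import math
import random

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def hamming_window(length):
    """Hamming window coefficients for a frame of the given length."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1)) for n in range(length)]

def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame (unnormalized)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, error): a[0] = 1 and a[1..order] are the coefficients of the
    prediction-error filter A(z) = 1 + sum_k a[k] z^-k; error is the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("degenerate frame: prediction error is not positive")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error

# Hypothetical 10 ms frame at 16 kHz: a sinusoid plus a little seeded noise.
rng = random.Random(0)
signal = [math.sin(0.3 * n) + 0.1 * rng.uniform(-1.0, 1.0) for n in range(160)]
window = hamming_window(len(signal))
frame = [s * w for s, w in zip(pre_emphasis(signal), window)]
order = 8
r = autocorrelation(frame, order)
lpc, error = levinson_durbin(r, order)
gp = math.sqrt(error)  # ASSUMPTION: Gp read as the LPC gain term; not defined in the summary
```

Under this reading, appending `gp` to each frame's cepstral feature vector would add one extra feature per frame, in line with the summary's description of Gp as an additional feature vector.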