Speech Recognition and Speaker Recognition - Coursework Example

Speech recognition and speaker identification are important for verification and authentication in security applications; however, they are hard to achieve (Kamruzzaman et al.). Speaker verification is a subclass of automatic speaker recognition (ASR) and can be used to establish a person's identity. As a result, speaker verification is an accept-or-reject (true-false) decision problem (Chen & Luo, 2009). Background noise significantly affects the overall effectiveness of a voice recognition system, and handling it is a major challenge for speaker recognition. With advances in communication and security technologies, the robustness of embedded biometric systems has become important (Khan, Farhan & Ali, 2011). This chapter reviews the methods used for speaker recognition and summarizes the main strengths and weaknesses of each technique.

2.1 Introduction

Speaker identification has been an active research topic for many years and has many potential applications in which the legitimacy of the information source is a concern. Speaker identification is the process by which a machine automatically recognizes a speaker from the sound of their voice. The most popular applications of speaker identification systems are in access control, for example access to privileged information over the phone or admittance to a room. It is also valuable for speaker adaptation in automatic speech recognition systems. Speaker recognition can be classified into two categories: verification and identification. Speaker identification is the process of determining which registered speaker produced a given utterance. Speaker verification is the process of accepting or rejecting a speaker's identity claim.
Speaker recognition methods can also be divided into text-dependent and text-independent methods (Kamruzzaman et al.).

2.2 Description of Technique

2.2.1 MFCC

Mel-frequency cepstral coefficients (MFCCs) are perhaps the best-known and most popular representation of the speech signal for speaker recognition. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency.

2.2.2 The MFCC processor

Figure 2.1.1: Block diagram of the MFCC processor (Kamruzzaman et al.).

Figure 2.1.1 shows the structure of the MFCC processor. The speech input is recorded at a sampling rate of 22050 Hz. This sampling frequency is chosen to minimize the effects of aliasing in the analog-to-digital conversion (Kamruzzaman et al.).

2.2.3 Mel-frequency wrapping

A speech signal consists of tones at different frequencies. For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on the mel scale (Kamruzzaman et al.). The mel is a unit based on the frequency perceived by the human ear. The mel scale has approximately linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz (Nijhawan & Soni, 2014). The following formula computes the mels for a given frequency f in Hz (Kamruzzaman et al.):

$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (1)$$

2.2.4 Cepstrum

In the final step, the log mel spectrum is converted back to the time domain; the result is the mel-frequency cepstral coefficients (MFCCs). Because the mel spectrum coefficients are real numbers, they can be converted to the time domain using the discrete cosine transform (DCT) (Kamruzzaman et al.). The MFCCs are calculated as:

$$\tilde{c}_n = \sum_{k=1}^{K} \left(\log \tilde{S}_k\right) \cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right]$$

where $\tilde{S}_k$, k = 1, 2, …, K, are the mel power spectrum coefficients and n = 1, 2, …, K. The number of mel cepstrum coefficients, K, is typically chosen as 20.
The first component, $\tilde{c}_0$, is excluded from the DCT because it represents the mean value of the input signal, which carries little speaker-specific information (Kamruzzaman et al.). One approach to simulating the subjective spectrum is to use a filter bank spaced uniformly on the mel scale, as shown in Figure 2.1.2 (Khan, Farhan & Ali, 2011).

Figure 2.1.2: An example of a mel-spaced filter bank (Khan, Farhan & Ali, 2011).

LINEAR PREDICTIVE CODING (LPC)

LPC is one of the most powerful speech analysis techniques and is a useful method for encoding good-quality speech at a low bit rate. The basic idea behind linear predictive analysis is that a specific speech sample at the current time can be approximated as a linear combination of past speech samples (Shrawankar). The LP model is based on human speech production. It uses a conventional source-filter model in which the glottal, vocal tract, and lip radiation transfer functions are combined into a single all-pole filter that simulates the acoustics of the vocal tract. The principle behind LPC is to minimize the sum of the squared differences between the original speech signal and the estimated speech signal over a finite duration (Shrawankar).

POWER SPECTRAL ANALYSIS (FFT)

One of the most common techniques for studying a speech signal is the power spectrum. The power spectrum of a speech signal describes the frequency content of the signal over time. The first step in computing the power spectrum is to perform a Discrete Fourier Transform (DFT), which computes the frequency information of the equivalent time-domain signal. Because a speech signal contains only real values, a real-input Fast Fourier Transform (FFT) can be used for increased efficiency. The resulting output contains both the magnitude and the phase information of the original time-domain signal (Shrawankar).
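As a minimal sketch of the power spectrum step and the mel mapping of equation (1), the following Python fragment computes the power spectrum of one real-valued frame with a real-input FFT and converts the peak frequency to mels. It assumes NumPy is available; the function names, the Hamming window, the 1024-sample frame length, and the synthetic 1000 Hz test tone are illustrative assumptions, not details taken from the cited papers. Only the 22050 Hz sampling rate and the mel formula come from the text.

```python
import numpy as np

SAMPLE_RATE = 22050  # Hz, the sampling rate quoted in the text


def hz_to_mel(f_hz):
    """Equation (1): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def power_spectrum(frame, sample_rate=SAMPLE_RATE):
    """Return (frequencies in Hz, power) for one real-valued frame."""
    # Windowing is an assumption of this sketch; the text does not specify one.
    windowed = frame * np.hamming(len(frame))
    # Real-input FFT: only the non-negative frequency bins are computed.
    spectrum = np.fft.rfft(windowed)
    # Squared magnitude gives the power; the phase information is discarded.
    power = np.abs(spectrum) ** 2 / len(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, power


# Synthetic 1000 Hz tone standing in for a speech frame (illustrative only).
t = np.arange(1024) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 1000.0 * t)

freqs, power = power_spectrum(frame)
peak_hz = freqs[np.argmax(power)]   # strongest frequency bin, in Hz
peak_mel = hz_to_mel(peak_hz)       # the same frequency on the mel scale
```

With a 1024-sample frame the bins are about 21.5 Hz apart, so the peak lands on the bin nearest 1000 Hz rather than exactly on it. A full MFCC front end would additionally apply a mel-spaced filter bank, a logarithm, and the DCT; those steps are omitted here.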
Support vector machine

The support vector machine (SVM) was developed by Vapnik (1998). It is among the most important advances in pattern recognition in the last decade. Other feature-matching techniques, such as Gaussian mixture models (GMM) and hidden Markov models (HMM), are prone to overfitting and do not directly optimize discrimination (Chen & Luo, 2009).

2.1.1.1 Three alternative methods for training SVMs

Chunking: The chunking algorithm exploits the fact that the value of the quadratic form is unchanged if the rows and columns of the matrix that correspond to zero Lagrange multipliers are removed. The large quadratic optimization problem can therefore be split into a series of smaller quadratic problems, whose ultimate goal is to identify all the non-zero Lagrange multipliers and discard all the zero Lagrange multipliers (Kamruzzaman et al.).

Osuna's algorithm: In 1997, Osuna proved a theorem that suggests a whole new set of quadratic optimization algorithms for SVMs. The theorem shows that the large quadratic optimization problem can be split into a series of smaller quadratic sub-problems (Kamruzzaman et al.).

SMO: Sequential minimal optimization (SMO) is a simple algorithm that can quickly solve the SVM problem without any extra matrix storage and without using numerical quadratic optimization steps at all. SMO decomposes the overall quadratic problem into quadratic sub-problems, using Osuna's theorem to ensure convergence (Kamruzzaman et al.).

Three steps of each method are illustrated in Figure 2.1.3. The thin horizontal line at each step represents the training set, and the thick boxes represent the Lagrange multipliers being optimized at that step. For chunking, a fixed number of examples is added at each step, while the zero Lagrange multipliers are discarded at each step. The number of examples trained at each step therefore tends to grow.
For Osuna's algorithm, a fixed number of examples is optimized at each step: the same number of examples is added to and removed from the problem at each step. For SMO, only two examples are optimized analytically at each step, so each step is very fast (Kamruzzaman et al.).

Figure 2.1.3: Three alternative methods for training SVMs: chunking, Osuna's algorithm, and SMO (Kamruzzaman et al.).

LDA: Linear Discriminant Analysis is a statistical technique that reduces the dimension of the features while maximizing the information preserved in the reduced feature space. Applying LDA after MFCC drastically reduces the feature dimension, because LDA finds an optimal transformation matrix that preserves most of the information and can be used to discriminate between the different classes (Khan, Farhan & Ali, 2011).

Vector Quantization (VQ): VQ is the process of mapping vectors from a large vector space to a finite number of regions in that space (Manjunath & PB, 2012). The purpose of VQ is to compress the data and select the more effective features rather than using the entire set of feature vectors. Speaker models are produced by clustering a speaker's feature vectors into a known number of clusters. Each cluster is represented by its centroid, called a code vector, and the code vectors together form the speaker's codebook. Each input feature vector is then compared against all the codebooks, and the codebook with the minimum distance is selected as the best match (Nijhawan & Soni, 2014).

2.3 Comparative Analysis

| Method | Strengths | Weaknesses |
|---|---|---|
| SMO | Faster training time than the other methods; SMO outperforms chunking and Osuna's algorithm in computation time (Kamruzzaman et al.). | - |
| Chunking | Can solve a quadratic optimization problem. | Cannot solve large-scale training problems, since even the reduced matrix cannot fit in memory. |
| Osuna's algorithm | Suggests keeping a constant-size matrix for every quadratic sub-problem, which means adding and deleting the same number of examples at every step. | It is inefficient. |
| Mel-frequency cepstral coefficients (MFCC) | The best-known and most commonly used feature representation for both speaker recognition and speech recognition. Cost-effective and robust computation. | - |
| Linear predictive coding (LPC) | Minimizes the sum of the squared differences between the original speech signal and the estimated speech signal over a finite duration (Shrawankar). | |
| Power spectral analysis (FFT) | The FFT-based approach is valued for its linearity in the frequency domain and its computational speed. The FFT does not discard or distort information in any anticipatory manner (Shrawankar). | |
| SVM | Highly accurate. Directly optimizes discrimination. Easy to train and scales to complex high-dimensional data compared with neural networks (Khan, Farhan & Ali, 2011). | |
| LDA | Reduces the dimension of the features while maximizing the information preserved in the reduced feature space. Finds an optimal transformation matrix that preserves most of the information and can be used to discriminate between the different classes (Khan, Farhan & Ali, 2011). | |
| Neural networks | Models are noticeably trained. | Not generalizable. |
| Hidden Markov models | HMM systems consistently produce excellent performance. | - |
| VQ | Highly accurate and easy to implement (Manjunath & PB, 2012). | |

Table 2.1.1: Comparisons between the techniques.

2.4 Conclusions

This paper described several methods used in speaker recognition and speech recognition systems to improve security and recognition results. A comparative analysis was presented for each technique.

2.5 References

Chen, S., & Luo, Y. (2009). Speaker Verification Using MFCC and Support Vector Machine. Proceedings of the International MultiConference of Engineers and Computer Scientists. Retrieved from http://www.iaeng.org/publication/IMECS2009/IMECS2009_pp532-535.pdf

Kamruzzaman, S. M., Rezaul Karim, A. N. M., Saiful Islam, Md., & Emdadul Haque, Md. Speaker Identification using MFCC-Domain Support Vector Machine. Retrieved from http://arxiv.org/ftp/arxiv/papers/1009/1009.4972.pdf

Khan, A., Farhan, M., & Ali, A. (2011). Speech Recognition: Increasing Efficiency of Support Vector Machines. International Journal of Computer Applications, 35. Retrieved from http://arxiv.org/ftp/arxiv/papers/1204/1204.4257.pdf

Manjunath, N., & PB, M. (2012). Isolated Word Speech Recognition Using Vector Quantization (VQ). International Journal of Advanced Research in Computer Science and Software Engineering, 2(5). Retrieved from http://www.ijarcsse.com/docs/papers/May2012/Volum2_issue5/V2I500451.pdf

Nijhawan, G., & Soni, M. K. (2014). Speaker Recognition using Support Vector Machine. International Journal of Computer Applications, 87(2). Retrieved from http://research.ijcaonline.org/volume87/number2/pxc3893379.pdf

Shrawankar, U. Techniques for Feature Extraction in Speech Recognition System: A Comparative Study. Retrieved from http://arxiv.org/ftp/arxiv/papers/1305/1305.1145.pdf