Speech Recognition Software Report Example | Topics and Well Written Essays

Institution: xxxxxxxxxxx
Title: Speech Recognition Software
Tutor: xxxxxxxxxxx
Course: xxxxxxxxxxx
@2010

1.0 Introduction

Speech recognition software enables people to control a computer by speaking to it through a microphone, either to enter text or to issue commands. Such software has existed for the past twenty years, although the early products were very expensive and depended on very powerful computers to run. In recent years, a number of manufacturers have produced different versions (Honeycutt 2003), and the underlying technology continues to change.

Speech recognition software is a program that works together with a microphone to identify spoken words and turn them into text documents. It is driven by algorithms that are very complex to code. In practice, an individual talks to the computer and checks that it has recognized what he or she said. Single-user speech recognition software is built into some portable digital voice recorders and is available for both desktops and laptops (Borowitz 2001). The software has become increasingly popular and benefits business executives, people with disabilities, college students, transcribers and translators.
1.1 Aims and Objectives

The overall aim of this report is to discuss the brief history of speech recognition software, its different versions and types, how the technology works, its uses and advantages, the obstacles and constraints involved, and future expectations. The report traces the transition from the early speech recognition software to the later versions. Both commercial and free speech recognition software, such as DragonDictate, Dragon NaturallySpeaking and IBM's ViaVoice, will also be examined.

2.0 Historical Summary

The early software consisted only of dictation products for personal computers. Dragon Systems produced DragonDictate for Windows 1.0 in 1994. Later, IBM introduced the first continuous speech recognition software, known as MedSpeak/Radiology. Such systems normally carried five-figure price tags and could only run on expensive personal computers. Continuous speech technology allowed users to speak in a natural, conversational way, relieving the tedium of discrete speech dictation (Juang & Rabiner 2004).

In June 1997, Dragon Systems made enormous strides by releasing the first general-purpose continuous speech program, NaturallySpeaking. This program was far more affordable than the earlier ones and opened continuous speech recognition to a wide range of users. Two months later, IBM released a strongly competing continuous speech product known as ViaVoice (Devine, Gaehde & Curtis 2000). Much is demanded of speech recognition programs: accuracy is critical, and speed is essential to an effective program.
In addition to these demands, human speech varies enormously in inflection, pattern, rate and pitch. Such variations are an extraordinary test of a program's flexibility. Given the complex choices and tremendous flexibility demanded of voice recognition software, very powerful computers are required to run such programs. A realistic assessment is that speech recognition technology has developed impressively over recent years (Juang & Rabiner 2004).

3.0 How Speech Recognition Operates

Speech recognition software depends on advanced algorithms because each person's speech patterns and speaking style are unique. Regional and foreign accents, together with the various dialects, determine how words are spoken. In addition, lazy enunciation changes how people sound out their words. This challenges the software, because it works by matching the possible sounds of a spoken word to its counterpart in written form.

Since each person's voice is unique, users need to train and configure the software so that it recognizes their distinctive style of speech. Some applications are supplied with training programs that contain specially selected texts for the user to read into the software (Grabianowski 2006). The instructions usually ask the user to speak in his or her normal voice. After the program has transcribed the user's spoken words, it lets the user make any necessary corrections. At times, the program may learn from its own mistakes and adjust its interpretations automatically.
As a result, the longer an individual uses a particular program, the fewer mistakes it makes. Because the program becomes tuned to a specific user, it will not work as well for another user. Voice recognition software operates not only by decoding phonemes, or sounds, but also by using the context of the words before and after each word. It also uses probability tables, referred to as bigrams and trigrams, that linguists develop by scouring each country's regional speech samples. Bigrams are two-word phrases, while trigrams are three-word phrases. Accuracy is therefore improved not only by clear enunciation but also by speaking in phrases (Grabianowski 2006).

4.0 The Different Types of Speech Recognition Software

Speech recognition software falls into distinct classes according to the types of utterances it can recognize. Dragon NaturallySpeaking and IBM ViaVoice are the most common commercial speech recognition packages on the market. Dragon NaturallySpeaking has emerged as the best speech recognition software, although IBM's 40-year commitment to speech research and development has also led to the wide distribution of ViaVoice. Dragon NaturallySpeaking performs very well because it includes advanced features that give users, particularly trained ones, easy access to the nuts and bolts of the software. In contrast, IBM ViaVoice is very slow, hangs frequently and takes noticeably longer to transcribe speech. Its cross-software compatibility is also considerably poor. Other commercial speech software includes Vocalis Speechware, Babel Technologies, SpeechWorks, Nuance, Abbot (or AbbotDemo) and Entropic.
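The bigram probability tables described in section 3.0 can be illustrated with a short sketch. It is only an assumption-laden illustration, not the method of any product named in this report: the sample sentences, the function names and the add-one smoothing are all hypothetical choices, and trigrams are left out for brevity. The sketch counts two-word phrases in a tiny text sample and uses them to choose between two transcriptions that sound alike.

```python
import math
from collections import Counter

# Hypothetical sample text standing in for the linguists' speech samples.
SAMPLE_TEXT = [
    "please write a letter",
    "turn right at the corner",
    "write a short note",
    "the answer is right",
]


def train_bigrams(sentences):
    """Count single words and two-word phrases (bigrams) in the sample."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_score(candidate, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate transcription."""
    vocab_size = len(unigrams)
    words = ["<s>"] + candidate.lower().split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        probability = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(probability)
    return total


def pick_transcription(candidates, unigrams, bigrams):
    """Return the candidate whose word sequence is most probable."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))


unigrams, bigrams = train_bigrams(SAMPLE_TEXT)
# "write" and "right" sound the same; word context decides between them.
CANDIDATES = ["write a letter", "right a letter"]
best = pick_transcription(CANDIDATES, unigrams, bigrams)
print(best)
```

In this toy sample the phrase "write a" occurs twice and "right a" never occurs, so the context favours "write a letter" even though both candidates sound identical. A real recognizer would combine such scores with acoustic evidence and would draw its tables from far larger samples.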
Several free speech recognition packages also exist. XVoice is a dictation, or continuous speech, recognizer that can be used with various X Window-based applications. It provides user-defined macros and is considered a fine program with a definite future. Once properly set up, XVoice performs with adequate accuracy. To deploy it, XVoice must be downloaded and installed, and IBM's free ViaVoice must be installed and configured for it to operate correctly; it also requires LessTif or Motif. Finally, because the program interacts with X Windows, the X resources have to be left open on the machine. This calls for caution if the software is used on a networked or multi-user machine (Girard & Dillon 1997).

CVoiceControl (Console Voice Control) began as KVoiceControl (KDE Voice Control) and has since replaced it. It is a speech recognition system that lets users execute Linux commands by speaking them. It contains a microphone level configuration utility, a vocabulary model editor for adding new commands and utterances, and the speech recognition system itself. CVoiceControl is considered an excellent starting point for experienced users who want to get started with automatic speech recognition (ASR). Although it does not lead in user-friendliness, CVoiceControl can be very beneficial to the user if correctly trained.

GVoice is a speech ASR library that uses IBM's ViaVoice SDK to control Gtk or GNOME applications. It contains libraries for initialization, the recognition engine, vocabulary manipulation and panel control.
Open Mind Speech is free software whose name has changed repeatedly, from VoiceControl to SpeechInput to FreeSpeech. Today it is part of the Open Mind Initiative and is in operation mainly for developers. The Institute for Signal and Information Processing (ISIP) at Mississippi State University has produced its own speech recognition engine. The toolkit that accompanies it includes a front end, a decoder and a training module. This software is also useful to developers. CMU Sphinx was originally developed at CMU and has recently been released as open source. It is a fairly large program that comprises various tools and information. Although still under development, CMU Sphinx includes trainers, recognizers, acoustic models, language models and a little documentation. Other free speech recognition software includes Ears, the NICO ANN Toolkit, Myer's Hidden Markov Model software and Jialong He's Speech Recognition and Research Tool.

5.0 Uses and Advantages

Almost any task accomplished through a computer interface can potentially be achieved with Automatic Speech Recognition (ASR), but a few applications are the most common. Dictation is the leading use of ASR systems today. The major dictation areas are medical transcription, legal and business dictation, and general word processing. Special vocabularies are used in some cases to improve system accuracy. In embedded applications, some cellular phones include command-and-control (C&C) speech recognition that allows utterances such as “Call Home”. This is considered a major factor in the future of ASR and of Linux.
In medical and disability applications, speech recognition helps the many people who have difficulty typing because of physical limitations such as repetitive strain injuries (RSI) or muscular dystrophy (Bergeron 2004). The technology also suits people with hearing difficulties, who can use a system connected to their telephone to convert a caller's speech to text. On wearable devices, where input options are limited, speaking is a natural possibility. Command and control systems, which are designed to perform functions and actions, use utterances such as “Open Netscape” and “Start a new xterm”. In telephony, some PBX and voice mail systems allow callers to speak commands instead of pressing buttons to send specific tones. Further points to note are that the technology allows users to be mobile, that the interface is better suited to transactions than to browsing, that individual preferences vary widely between directed and natural-language interfaces, and that speech is a one-dimensional interface.

6.0 Obstacles and Constraints

Speech recognition software begins with a pre-programmed database of sound patterns, but the speech of actual users varies. A user's pronunciation of a word can easily change, the microphone that collects the sound patterns may be of poor quality, and ambient noise can completely change the sound pattern of a given word. In addition, voice recognition software works best only once it has collected speech pattern data from each user. The software therefore has an initial learning curve: it becomes effective over time but makes many mistakes at first.
Voice recognition software has other potential drawbacks. Large amounts of computer memory are required to store the voice files (Ramaswamy et al. 2000). The software cannot be used in classroom settings because of noise interference, and it makes many frustrating errors if adequate support is not given. The software must be trained to recognize each user's voice, which is very hard for poor decoders. Speech recognition software can also be inaccurate and clumsy, and it does not offer enough security. It is therefore important for every user to train the software to recognize his or her style of speech. Some voice recognition software comes with training programs that contain specific texts for the user to read into the program. The instructions may ask the user to speak in his or her usual voice so that the program can transcribe the spoken words and the user can correct any mistakes (Ramaswamy et al. 2000).

7.0 The Future: Expectations and Solutions

Microsoft is determined to move speech recognition technology into the mainstream and so widen the reach of the speech industry. Kokanee, for instance, is the codename given to major research and development efforts at Microsoft that are specifically designed to help the speech industry grow. Microsoft's focus on delivering the .NET Speech Platform is intended to make the development and deployment of speech-enabled applications simpler and faster (Grabianowski 2006). The argument is that if Kokanee succeeds in fostering the development of killer speech applications, people will soon find themselves talking to computers frequently and enjoying the experience.
The shifts between the various versions of speech recognition software have taken place over a period of about eight years (Detmer 1995). During this period, both software and hardware have become increasingly powerful. The change from DragonDictate to NaturallySpeaking was considered the greatest because it introduced continuous speech. Speech recognition software has been upgraded continuously to cope with increasing demand. Much is therefore expected of the current and most preferred release, NaturallySpeaking version 7, in meeting dictation requirements, particularly for writing scientific papers or PhD theses.

Administrators in medical offices regard speech recognition software highly as an alternative to the record-completion delays, expense and error rates associated with conventional transcription (Zick & Olsen 2001). Given the recent advances in the software, many medical transcriptionists are turning to this emerging technology as a powerful way of accomplishing crucial record-keeping tasks.

At some future point, speech recognition is expected to turn into speech understanding. The statistical models that enable computers to decide what a person has said may in future allow them to understand the logic behind the words. Although voice recognition software makes heavy demands on computational power and software sophistication, various researchers argue that developments in speech recognition provide a very direct line from current computers to true artificial intelligence. Today, people can successfully talk to their computers, but in 25 years the computers are most likely to talk back (Girard & Dillon 1997).
8.0 Conclusion

Various manufacturers have produced different versions of speech recognition software, and the underlying technology continues to change. Continuous speech technology allowed users to speak in a natural, conversational way, relieving the tedium of discrete speech dictation. A realistic assessment is that speech recognition technology has developed impressively over recent years. Much is demanded of speech recognition programs: accuracy is critical, and speed is essential to an effective program. The software depends on advanced algorithms because each person's speech patterns and speaking style are unique. Regional and foreign accents, together with the various dialects, determine how words are spoken, and lazy enunciation changes how people sound out their words. Voice recognition software operates not only by decoding phonemes, or sounds, but also by using the context of the words before and after each word. The software has been upgraded continuously to cope with increasing demand. Developments in speech recognition provide a very direct line from current computers to true artificial intelligence.

Bibliography

Bergeron, B. (2004). Voice recognition and medical transcription. MedGenMed, 6(3), 54.
Borowitz, S.M. (2001). Computer-based speech recognition as an alternative to medical transcription. J Am Med Inform Assoc, 8(1), 101–102.
Detmer, W.M. (1995). A continuous-speech interface to a decision support system: II. An evaluation using a Wizard-of-Oz experimental paradigm. J Am Med Inform Assoc, 2(1), 46–57.
Devine, E.G., Gaehde, S.A. & Curtis, A.C. (2000). Comparative evaluation of three continuous speech recognition software packages in the generation of medical reports. J Am Med Inform Assoc, 7(5), 462–468.
Girard, K. & Dillon, N. (1997). Market grows for voice applications. Computerworld, 31(32), 55–56.
Grabianowski, E. (2006). How Speech Recognition Works. Retrieved December 24, 2010.
Honeycutt, L. (2003). Researching the use of voice recognition writing software. Computers and Composition, 20(1), 77–95.
Juang, B.H. & Rabiner, L.R. (2004). Automatic Speech Recognition – A Brief History of the Technology Development. Retrieved December 24, 2010.
Ramaswamy et al. (2000). Continuous speech recognition in MR imaging reporting: advantages, disadvantages, and impact. American Journal of Roentgenology, 174(3), 617–622.
Zick, R.G. & Olsen, J. (2001). Voice recognition software versus a traditional transcription service for physician charting in the ED. Am J Emerg Med, 19(4), 295–298.
