Improving Accuracy of Answer Extraction in a Question Answering System (QA System) / Goal-Driven Answer Extraction

Abstract

This paper aims to provide an approach for improving the accuracy of answer extraction in a question answering system. The approach proposed in this research helps retrieve relevant passages, rank them in order of importance, and reduce the errors of obtaining irrelevant documents. It also highlights the passages within documents that contain the exact information the user is querying. The proposed system improves on corpus-based question answering by introducing semantic and information-based retrieval models. The system will reduce the amount of time spent collecting information and will rank the retrieved data according to the accuracy of the information it contains. The passage that contains the exact answer to the query will be highlighted within the retrieved documents. The system rests on the premise that improved retrieval will improve the quality of the information obtained, and that this depends on system accuracy. To this end, this empirical review analyses current and previous literature by various experts in order to build a foundation for enhancing the accuracy of information retrieval and answer extraction in a question answering system. The review also provides brief accounts of QA system research history, the issues and challenges such systems address, and current research.

Acknowledgement

First of all, I would like to express my deepest appreciation to my supervisor, Dr. Fei Liu, for being an understanding and flexible person, and above all for her useful comments, notes and engagement throughout the learning process of this thesis survey. I would also like to thank Prof. Wenny Rahayu for encouraging me to work hard this year and for her support along the way. I also thank the people who shared their valuable time during this survey. Finally, I would like to thank my wife, who supported me throughout the entire process, both by keeping me harmonious and by helping me put the pieces together. I will be forever grateful for your love.

Table of Contents

Abstract
Acknowledgement
1 Introduction
1.1 Research Field
1.2 Research Aim
1.3 QA System Research History
1.4 Current Research Works
1.5 Issues/Challenges to be Addressed
2 Current Research
2.1 Query Analysis
2.2 Information Retrieval
2.2.1 Information Retrieval in Isolation
2.2.1.1 The OpenEphyra QA System
2.2.1.2 Test Collection
2.2.1.3 High-Accuracy Text Retrieval
2.3 Answer Extraction
2.3.1 The Question-Answer Database (QUAB) Model for Factoid Question Answering
2.3.1.1 Developing the QUAB
3 Conclusions and Research Proposal
3.1 Conclusion
3.2 Research Proposal
3.3 Evaluation
4 References

Table of Figures

Figure 1: General Architecture of Question Answering System
Figure 2: A sample process in a QA system, from the Tapeh and Rahgozar (2008) study

1 Introduction

Information technology has grown to the point where retrieving information from the Internet is a necessity. The entire world accesses information through the Internet, and that information is written to be understood by human beings. Retrieving it from search engines such as Yahoo and Google requires users who already have adequate knowledge of a topic before searching.
This limits individuals who have little knowledge of a given topic, and it calls for an accurate way of extracting answers from such engines. The current question answering system (QA system) is a dedicated form of information retrieval: when given a set of documents, it tries to recover exact answers to questions posed in natural language. According to Zhenqiu (2012), open-domain question answering requires QA systems to answer questions about every imaginable subject, and such systems cannot rely on hand-designed domain knowledge to find and interpret the correct answers. Moreover, although several QA systems have been deployed, little effort has gone into establishing an evaluation paradigm for them.

A QA system helps retrieve the documents that contain the relevant information, but it does not provide a means of locating the exact passages within those documents, leaving the user with the task of extracting them. This is a tedious exercise once the number of supplied documents is taken into account. It is therefore essential to improve the way information is retrieved by reducing the amount of documents and text that the user receives; to obtain the most relevant extracts, the QA system needs to be improved. Developments such as corpus-based question answering come closer to providing an accurate information retrieval system, but they still have shortcomings: such a system finds data or information expressed in natural language by using surface patterns or lexico-syntactic patterns to extract information from a corpus offline.

Figure 1: General Architecture of Question Answering System

The system to be developed here is therefore intended to assist researchers in extracting accurate information for their queries. It will combine a semantics-based search engine with an accurate answer extraction component to produce an accurate question answering system. The system should integrate the retrieval mechanism and answer extraction, since the two are interrelated; keeping answer extraction and the retrieval mechanism in isolation will not improve accuracy as expected, because they are inseparable. A new system that provides this improvement is therefore required.

1.1 Research Field

There are many fields of study related to QA systems, including query analysis, answer extraction and information retrieval. Query analysis involves processing a question posed in natural language, in a form acceptable to the system, to classify the information that is needed from the question. Once the information need has been identified, the answer is retrieved from the target collection using the information retrieval components. Yen, Wu and Yang (2013) reviewed existing QA system paradigms to assess the state of understanding of user acceptance of novel information technologies. The procedure of searching for answers to a natural language question has customarily been assumed to consist of three distinct phases: first, question processing, which involves recognising the information that is required from the question (Hickl 1265); second, passage retrieval, which involves retrieving relevant text using the key phrases and keywords mined from the question; and third, extraction of the text that parallels the question's precise answer. Huang and Yao (2004) argue that this is not the only applicable model for factoid question answering (QA).
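To make the three-phase view concrete, the following minimal Python sketch outlines such a pipeline. All function names and the simple keyword-overlap scoring are illustrative assumptions for this review, not the implementation of any system cited above.

```python
# Minimal three-phase QA pipeline sketch (illustrative only).
# The phase split follows the question processing / passage retrieval /
# answer extraction stages described above; the scoring is a toy
# keyword-overlap heuristic, not any cited system.
import re
from typing import List, Tuple

STOPWORDS = {"who", "what", "when", "where", "the", "a", "an", "is", "was", "of", "to"}

def question_processing(question: str) -> List[str]:
    """Phase 1: extract content keywords from the natural-language question."""
    tokens = re.findall(r"\w+", question.lower())
    return [t for t in tokens if t not in STOPWORDS]

def passage_retrieval(keywords: List[str], documents: List[str]) -> List[Tuple[float, str]]:
    """Phase 2: rank candidate passages (here, whole documents) by keyword overlap."""
    ranked = []
    for doc in documents:
        doc_tokens = set(re.findall(r"\w+", doc.lower()))
        score = sum(1.0 for k in keywords if k in doc_tokens)
        ranked.append((score, doc))
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)

def answer_extraction(keywords: List[str], passage: str) -> str:
    """Phase 3: return the sentence of the top passage that shares the most keywords."""
    sentences = re.split(r"(?<=[.!?])\s+", passage)
    return max(sentences, key=lambda s: sum(k in s.lower() for k in keywords))

if __name__ == "__main__":
    docs = ["Ferdinand Magellan sailed across the Pacific Ocean in 1519. He was Portuguese."]
    kws = question_processing("Who was the first person to sail across the Pacific Ocean?")
    best_score, best_doc = passage_retrieval(kws, docs)[0]
    print(answer_extraction(kws, best_doc))
```

The sketch only shows how errors in an early phase propagate to the later ones; real systems replace each toy function with the richer components surveyed in the following sections.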
As an alternative to extracting answers from groups of retrieved texts, this review introduces a model which discerns precise answers to factoid questions by leveraging a database of question-answer pairs corresponding to all of the questions and answers that can be obtained from a given text collection. The review analyses the available literature to provide insight into this novel paradigm of answer extraction, which computes the value of candidate answers not merely against a set of attributes mined from a question, but within the broader perspective of the information contained in the corpus as a whole.

1.2 Research Aim

The aim of the research is to develop a system that improves the accuracy of answer extraction in a question answering system. The research aims to provide a brief review of query analysis, information retrieval and answer extraction in natural language, and to introduce a novel paradigm of answer influence that permits question answering systems to estimate the value of answers. This will require designing new features that interact with the question in order to improve information retrieval precision. Furthermore, the research will compare the functionality of a TREC information retrieval system with a system that employs a similar text retrieval component but no natural language processing. This will be followed by illustrations of the approaches used to search for literal answer strings in a domain, with the aim of presenting a new method based on semantic constraints to improve the functionality and portability of a reformulation-based question answering system. Finally, the research aims to illustrate a technique for acquiring semantics-based reformulations automatically and for producing patterns from texts retrieved from the Web based on syntactic, lexical and semantic constraints.

1.3 QA System Research History

Question answering systems have existed since 1977, when a system called STUDENT was developed with the specific aim of helping students solve algebra problems. Another system, GISENG, was later developed with the ability to predict input commands as the user typed. This line of work led to START, which contains three main parts; the system had the ability to generate a sentence for a question, and it leads to the current question answering systems. These are the systems that need to be improved in order to obtain accurate answers to queries (Veeravalli and Varma 563), because a query is an explicitly designed string containing keywords as well as search engine commands. Among the first question answering systems was BASEBALL, developed by Green et al. in the 1960s, which offered a natural language user interface to a catalogue of baseball data and information (Oh, Myaeng and Jang 3695). LUNAR, another question answering system, was developed by Woods in the 1970s and permitted NASA geologists to pose questions to a catalogue containing data and analyses of lunar rock and soil samples collected from the Apollo 11 mission (Buscaldi and Rosso 4). At present there are various forms of question answering systems, and in general they fall into two groups: open-domain QA systems and closed-domain QA systems. According to Vilares, Vilares and Otero (2011), a closed-domain question answering system is developed to answer questions that fall within an explicit subject field.
For instance, there is a medication QA system developed by Yun and Graeme and an aircraft maintenance QA system developed by Rinaldi et al. Open-domain systems, on the other hand, deal with general questions about almost anything; an example of an open-domain QA system is MIT's START web-based system, developed by Katz in 1997 (Kosseim and Yousefi 61).

1.4 Current Research Works

Many projects are currently under way to improve accuracy in QA systems. Fundamentally, natural language questions are a natural method of articulating an exact information requirement (Davide Buscaldi 443). Recent research has been dedicated to information retrieval paradigms that take into account representations obtained from natural language questions and from the answers found in documents. Several reviews, such as Guda, Sanampudi and Manikyamba (2011) and Buscaldi, Rosso and Gómez-Soriano (2010), have systematically examined potential representation formats and corresponding retrieval paradigms that permit systems to find answers in documents directly and efficiently, and to combine incomplete answers obtained from different pieces of a document or from different documents. In addition to textual databases, stored multimedia content is becoming progressively more prominent. Aktolga, Allan and Smith (2011) posit that content recognition in media will be a key research field; there has traditionally been interest in using passage-based queries over multimedia assets, where possible in the form of natural language questions or answers. A further factor that will almost certainly increase the need for QA technology is the growing use of mobile devices, such as smartphones, for accessing information, for which traditional queries consisting of typed keywords are not especially convenient (García-Cumbreras, Martínez-Santiago and Ureña-López 416). Voice-enabled natural language interfaces may in future make it possible to ask factual questions, or pose questions, in spoken natural language. There have been numerous surveys of question answering technology in earlier periods; for instance, Mori (2005) studied the foremost approaches to QA in English, while Yen, Wu and Yang (2013) discussed the history, motivation and general approaches of open-domain QA, strongly supported by the Text Retrieval Conference (TREC). An update on the approaches employed in open-domain QA was presented by Moreda, Llorens and Saquete (2011) and Wacholder, Kelly and Kantor (2007), who both presented their work in a different manner from the earlier surveys. First, their work does not distinguish between open-domain and closed-domain QA, and it is not limited to the approaches reported in the QA domain alone; rather, it presents question answering from an information retrieval perspective. In addition, it stresses the significance of the retrieval paradigms, specifically the retrieval tasks and the representation of queries and documents, which are used for estimating the relevance between a query and an answer candidate (Attardi, Cisternino and Formica 6).

1.5 Issues/Challenges to be Addressed

Goal-driven answer extraction is an entirely new concept in the domain of answer extraction, which makes it hard to trace any related work that can be used to develop a scientific literature.
As a result, this literature shortage will affect the selection of techniques and their build-up at all levels of the system, a problem that stems from the techniques used to operate the system. At present, there is a lack of knowledge of the significance of the retrieval system within the proposed goal-driven paradigm, and consequently a failure to reduce the number of documents needed for passage retrieval. Additionally, there is insufficient literature on the further use of sentence retrieval in the TREC novelty track, in which the task is to decrease the quantity of superfluous and irrelevant information in a provided set of documents.

2 Current Research

According to Fan, Wang and Wang (2009), a modular architecture is a QA system design that has minimal deviation during information retrieval, as it uses isolated components to retrieve documents relevant to the query made by the user. The system works by imposing a linear control mechanism over the classification of the information generated after querying. The extracted information is classified in order of importance, measured by the proportion of keywords and semantics it contains (Kang, Liu and Zhuang 97). According to Liu, Chen and Kao (2013), a modular design is important and easier to implement because of its ability to be analysed: the components of the system can be coupled or decoupled during analysis, which makes extraction of the information easier. In a typical QA system, the answer retrieval and question evaluation elements depend on processing tools that identify passages containing answers that match the question queried. This dependency is essential so that the answer retrieval method can establish whether answers exist in an extracted passage, by evaluating it and comparing it against the question evaluation module's answer requirement (Kolomiyets and Moens 5432). In fact, the passage extraction module does not use a common representation for recording passages; either the question evaluation unit or a separate query-design module records them in a representation that the passage extraction module can query. According to Bilotti and Nyberg (2008), a pipelined modular QA system design includes the composition of the retrieval mechanism. It is easy to see that errors propagate as the QA process moves through downstream units, and this leads to the intuition that maximising the functionality of individual components reduces the error at every level of the pipeline, which should in turn maximise overall system accuracy (Mengqiu 3).

Figure 2: A sample process in a QA system, from the Tapeh and Rahgozar (2008) study

2.1 Query Analysis

Query analysis involves analysing and ranking the query to determine whether it requires more than one type of information to be retrieved. This is done by query operators, which encode the query so that it is machine-understandable. It can also involve ranking a question in terms of the passages to be retrieved and the answer that is required. According to Kim and Kim (2), filtering questions helps determine the number of answers to be generated, and this is done by named-entity recognition (NER) (Kim and Kim 367). If the questions are not analysed or filtered, the generated answers may have little meaning. A minimal sketch of this kind of question analysis is shown below; the data flow of the proposed QA system is then shown in Figure 3.
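The following sketch assumes a simple pattern-based classifier that assigns an expected answer type to a question before retrieval. It is only an illustration of the idea of question analysis, not the NER-based filtering of Kim and Kim; the patterns and type labels are assumptions made for this example.

```python
# Illustrative question analysis: assign a coarse expected answer type to a
# question from simple surface patterns. Real systems use far richer features
# (e.g. NER over the question); this is only a sketch.
import re

ANSWER_TYPE_PATTERNS = [
    (r"^\s*who\b", "PERSON"),
    (r"^\s*where\b", "LOCATION"),
    (r"^\s*when\b|\bwhat year\b", "DATE"),
    (r"\bhow (many|much)\b", "NUMBER"),
]

def expected_answer_type(question: str) -> str:
    """Return a coarse expected answer type for the question."""
    q = question.lower()
    for pattern, answer_type in ANSWER_TYPE_PATTERNS:
        if re.search(pattern, q):
            return answer_type
    return "OTHER"  # fall back when no pattern matches

if __name__ == "__main__":
    for q in ["Who discovered penicillin?",
              "When did the Apollo 11 mission return?",
              "How many moons does Mars have?"]:
        print(q, "->", expected_answer_type(q))
```

The expected answer type can then be used downstream to filter candidate answers, so that, for example, a "who" question is only ever answered with a PERSON-typed entity.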
Figure 3: Data flow of the proposed QA system (Kim and Kim 3)

The data flow diagram above shows that the system first splits the query; named-entity recognition is then applied to generate the question filters to be used, and these questions are filtered before they are indexed.

2.2 Information Retrieval

Abney introduced a new information retrieval system called SMART IR. SMART IR retrieves a much smaller portion of information in order to extract ranked passages relevant to the question, and in addition it extracts specific answers from document collections. SMART IR was applied on the TREC-8 question answering track. This section evaluates the system with respect to its passage retrieval and entity extraction methods.

2.2.1 Information Retrieval in Isolation

Bilotti and Nyberg (2008) try to enhance the functionality of a QA system by replacing its existing passage retrieval module with a high-accuracy retrieval system that can test linguistic and semantic constraints at retrieval time. The system is a stand-alone component which was applied to the OpenEphyra QA system, a test collection, and a high-precision text retrieval system, as follows.

2.2.1.1 The OpenEphyra QA System

According to Bilotti and Nyberg (2), OpenEphyra is a freely accessible, open-source QA system with four phases that support information retrieval: query formation, question analysis, answer search, and retrieval and selection (Bilotti and Nyberg 2). It applies Internet search to find answers to queries, reinforced by a scheme within the QA system for passage retrieval. The occurrences of entities in a passage retrieved using OpenEphyra are marked within the text using semantic tags; in this case ASSERT is used to annotate the data so that it has well-marked extractable answers (Srihari and Li 2000). OpenEphyra has a baseline that assists in extracting and ranking answers for queries; it uses this baseline to extract information from a given document using the entity relationships identified during querying (Bilotti and Nyberg 2). It also has a filter which helps select documents whose passages contain information relevant to the query. Bilotti and Nyberg (3) give the example "Richard loves Jane", annotated as follows:

"[ARG0 [PERSON Richard]] [TARGET loves] [ARG1 [PERSON Jane]]" (Bilotti and Nyberg 2)

This system provides a starting point for a proposed QA system that would improve the accuracy of answer extraction from such engines. The system should be improved to include ranking of the retrieved information as well as to increase the percentage accuracy.

2.2.1.2 Test Collection

Test collection involves collecting the answers provided by the system during querying. Bilotti and Nyberg (3) used the AQUAINT corpus to test the 109 questions that were queried. The experiment used verb recognition to identify the text related to the information to be retrieved; questions that did not return answers were reframed, or their verbs were changed, in order to obtain possible answers. Bilotti and Nyberg (2008) note that a set of document-level judgements was prepared by manually ascertaining whether each sentence matching the TREC-provided answer pattern for the given question actually answered it. This follows the criterion that an answer-bearing sentence entirely contains and supports the answer to the question, without requiring inference or information from outside that sentence.
Furthermore, questions without any answer-bearing sentences were eliminated from the experimental collection. Bilotti and Nyberg (2008) afterwards manually reformulated the questions so that they contained predicates; these reformulated questions are used as input to the high-accuracy text retrieval component.

Figure 4: The Ephyra QA system at TREC 2006 (Liu, Chen and Kao 316)

Reformulation is applied when the system fails to return an answer for a query. However, it compromises the accuracy of the retrieved information if verbs are included that are not related to the query being made.

2.2.1.3 High-Accuracy Text Retrieval

High-accuracy text retrieval was made possible by Liu, Chen and Kao (2013) altering the OpenEphyra system. The altered system had a high-precision text retrieval component developed for a modified Indri search engine. It had a textual retrieval component similar to OpenEphyra's but dependent on input keywords, and it had features that helped retrieve the required information within a shorter period while reducing the number of errors to which OpenEphyra was prone. It contained components with features similar to OpenEphyra's, as well as high-precision answer retrieval components. The system ranks retrieved information in order of the relevant information it contains, and when a query has multiple questions the results show the documents with the most probable answers. The high-accuracy text retrieval component stores extents representing sentences, target verbs and their arguments, and named-entity types as fields in the index (Kang, Liu and Zhuang 13). At query time, constraints on these fields can be tested by means of structured query operators. In essence, high-accuracy textual retrieval systems use verbs structured in a machine-acceptable language, arranged so that the extracted text is understandable. The following example is given by Bilotti and Nyberg (2008), exhibited in Indri syntax, for the question "Who was the first human being to sail across the Pacific Ocean?":

Baseline query: #merge[sentence](#any:person first human being to sail across Pacific Ocean)
Top-ranked result: Portuguese navigator Ferdinand Magellan was the first navigator to sail across the Pacific Ocean.
Second-ranked result: He sailed across the Pacific Ocean in 1519.
High-accuracy query: a nested #merge query over the [sentence], [target] and argument fields, requiring a PERSON-typed argument and the target verb "sail" with "Pacific Ocean" as its argument (the full nested query is given in Bilotti and Nyberg (2008)).
Top-ranked result: [ARG1 Portuguese navigator [PERSON Ferdinand Magellan]] [TARGET became] [ARG2 [ARG0 first human being] to [TARGET sail across] [ARG1 [LOCATION Pacific Ocean]]] (Bilotti and Nyberg 3)

It can be argued that the high-accuracy textual retrieval employed by Liu, Chen and Kao (2013) only helped isolate some keywords but did not improve answer generation for queries, since failures still meant that the generated information was not 100% correct. High-accuracy retrieval involves the use of a high-precision text retrieval module that supports storage of the various keywords used during retrieval; it can also store and monitor the time spent during the retrieval process. Querying is done by combining sentences as well as ranking them; after the questions have been ranked, they are encoded in the form of a predicate-argument structure. This improves the accuracy of the retrieved information (Bilotti and Nyberg 5).
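To illustrate the kind of constraint checking that such predicate-argument indexing enables, the sketch below matches a question's verb and argument constraints against pre-annotated sentences. The annotation format and matching logic are simplified assumptions for this review, not the Indri implementation described above.

```python
# Illustrative constraint matching over semantic-role annotated sentences.
# Each sentence is represented as a target verb plus labelled argument types;
# a question supplies the constraints it needs satisfied. This is a toy
# stand-in for field-based retrieval in an engine such as Indri.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AnnotatedSentence:
    text: str
    target: str                  # the predicate (verb lemma)
    arguments: Dict[str, str]    # e.g. {"ARG0": "PERSON", "ARG1": "LOCATION"}

def satisfies(sentence: AnnotatedSentence, target: str, required_args: Dict[str, str]) -> bool:
    """True if the sentence's predicate and argument types meet all constraints."""
    if sentence.target != target:
        return False
    return all(sentence.arguments.get(role) == etype for role, etype in required_args.items())

def retrieve(sentences: List[AnnotatedSentence], target: str, required_args: Dict[str, str]) -> List[str]:
    """Return only the sentences that satisfy the predicate-argument constraints."""
    return [s.text for s in sentences if satisfies(s, target, required_args)]

if __name__ == "__main__":
    corpus = [
        AnnotatedSentence("Ferdinand Magellan sailed across the Pacific Ocean.",
                          target="sail", arguments={"ARG0": "PERSON", "ARG1": "LOCATION"}),
        AnnotatedSentence("He was born in Portugal.",
                          target="bear", arguments={"ARG1": "PERSON", "ARGM-LOC": "LOCATION"}),
    ]
    # Question: "Who sailed across the Pacific Ocean?" -> needs verb "sail" with a PERSON ARG0.
    print(retrieve(corpus, target="sail", required_args={"ARG0": "PERSON"}))
```

The point of the sketch is that constraints are checked at retrieval time, so sentences that merely share keywords with the question but lack the required predicate-argument structure are never returned.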
Summary: It has been argued that text retrieval in isolation may not greatly improve QA system accuracy, because the passages generated by text retrieval in isolation could not be handled by the different answer extraction modules (Bilotti and Nyberg 4). This led to a shortfall in QA performance, so more research is required to combine the mechanisms of information retrieval and answer extraction.

2.3 Answer Extraction

Information retrieval is concerned with retrieving documents that contain information relevant to the user's question. When a user issues a query, an answer is extracted in the form of a number of documents containing relevant information; in extracting that information, the QA system enables the user to obtain it from the engine. The answer extracted for a query depends on the type of information required, and it involves several components: passage extraction through entity extraction, entity classification, query classification, and entity ranking (Abney, Collins and Singhal 2). Passage retrieval helps identify relevant documents that have passages relating to the query, while entity extraction assists in extracting relevant information from passages. Entity classification groups entities into various classes, and query classification identifies the category into which the query falls. The study by Abney, Collins and Singhal (2000) introduces a novel information retrieval system called SMART IR, which retrieves a much smaller piece of information in order to return ranked passages pertinent to the query; additionally, it retrieves explicit answers from document collections. According to Abney, SMART IR was used on the TREC-8 question answering track. Passage extraction involves recognising the pertinent documents that probably contain the question's answer. In the postulated approach, the process begins by searching for potential passages that contain the query's answer, after which SMART IR is employed to retrieve the set of documents pertinent to the question. Abney, Collins and Singhal (2000) define passages as overlapping windows consisting of a sentence and its two immediate neighbours. The score of passage i is then obtained by summing the scores of the sentences in its window, where S_j, the score of sentence j, is the sum of the IDF weights of the content terms it shares with the question, plus a supplemental bonus for word pairs (bigrams) shared by the query and the sentence. A small sketch of this scoring scheme is given after the discussion of Figure 5 below.

Figure 5: Sample sketch of a SMART information retrieval system (Quan, Wenyin and Qiu 1112)

The figure above shows how information is retrieved from a database. It involves making a query, which is categorised and weighted so that it can be compared with the questions in the database. If similarities are found, a semantic mapping is developed, followed by a question vector; after the question vector has been developed, the similarity index is calculated.
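The following sketch implements the passage scoring just described, under the simplifying assumptions that a passage is a three-sentence window and that the passage score is the plain sum of its sentence scores. The tokenisation, IDF computation and bigram bonus weight are illustrative choices, not those of Abney, Collins and Singhal.

```python
# Toy SMART-style passage scoring: sentence score = sum of IDF weights of
# question terms the sentence shares, plus a small bonus per shared bigram;
# passage score = sum of the scores of a sentence and its two neighbours.
# The window size, bonus weight and tokenisation are illustrative assumptions.
import math
import re
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def idf_table(sentences: List[str]) -> Dict[str, float]:
    """Compute IDF over the collection, treating each sentence as a document."""
    n = len(sentences)
    vocab = {t for s in sentences for t in tokenize(s)}
    idf = {}
    for term in vocab:
        df = sum(1 for s in sentences if term in tokenize(s))
        idf[term] = math.log(n / df)
    return idf

def sentence_score(question: str, sentence: str, idf: Dict[str, float],
                   bigram_bonus: float = 0.5) -> float:
    q_tokens, s_tokens = tokenize(question), tokenize(sentence)
    shared = set(q_tokens) & set(s_tokens)
    score = sum(idf.get(t, 0.0) for t in shared)
    q_bigrams = set(zip(q_tokens, q_tokens[1:]))
    s_bigrams = set(zip(s_tokens, s_tokens[1:]))
    return score + bigram_bonus * len(q_bigrams & s_bigrams)

def passage_scores(question: str, sentences: List[str]) -> List[float]:
    """Score each passage i (sentence i plus its two neighbours) as a sum of sentence scores."""
    idf = idf_table(sentences)
    s = [sentence_score(question, sent, idf) for sent in sentences]
    return [sum(s[max(0, i - 1): i + 2]) for i in range(len(sentences))]
```

Under these assumptions, a passage is rewarded both for sharing informative (high-IDF) terms with the question and for preserving the question's word order through shared bigrams.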
2.3.1 The Question-Answer Database (QUAB) Model for Factoid Question Answering

A novel architecture for improving accuracy in information retrieval was introduced by Hickl (2008). The system retrieves information from a stockpile in which an excellent answer is identified among the documents that have been retrieved; it introduces a new architecture for QA which leverages a new representation of the information stored in a document collection in order to extract the best answer in response to a submitted question. The Internet has always been a good place to look for answers to questions. According to Vila, Mazón and Ferrández (2011), while sets of frequently asked questions (FAQs) have been a popular way for content suppliers to distribute data to interested users, community-based question answering websites such as Live QnA, Yahoo! Answers, Google Answers and Wondir have lately advanced one step further: they enlist the community's assistance with the intention of generating large, open-domain FAQ collections which can be searched or browsed on the web. While these sites continue to offer a novel means for users to gather data related to a specialty, they suffer from several of the same restrictions met by traditional search applications. First, just as with any information extraction module, Hickl (2008) expects that the functionality of community-based QA sites is ultimately limited by the quantity and quality of the question-answer pairs (QAPs) held in a given database. Secondly, Hickl (2008) anticipates that estimating whether an exact answer to a user's question is held in a corpus will remain a challenge; in numerous instances, systems will continue to extract answers as long as there is some QAP which shares attributes with the originally submitted question. In this regard, Hickl (2008) expects that access to a set of unified question-answer pairs of this size could considerably alter how QA systems extract answers to questions: rather than retrieving answers from collections of extracted texts, systems would find precise answers by recognising the one or more pairs from this set which come closest to matching the information requirement articulated by the user's question.

Figure 6: The workflow of automatic categorization of questions (Kolomiyets and Moens 5429)

Figure 6 above shows how information is retrieved in an answer query system. The diagram indicates how the system operates from the time an information enquiry is made to the time the information is presented in a transformed manner. The system begins by receiving an enquiry for some information, for which it prepares a retrieval from a storage system in the form of a database. Once the information is retrieved from the database, it is presented in a transformed manner understandable to the user.

2.3.1.1 Developing the QUAB

Hickl (2008) describes a QUAB as a weighted directed graph G = (V, E), where V is the set of nodes corresponding to the questions (qi) and answers (ai) stored in a large collection of question-answer pairs Q = {(q1, a1), ..., (qn, an)} (Hickl 1263), and E is the set of weighted directed edges corresponding to the relations that can be inferred to hold between any two nodes of V (for instance, (qi → ai), (qi → aj), (ai → aj), (qi → qj)) (Hickl 1263). Hickl (2008) assumes that a question node qi corresponds to any well-formed natural language question for which there exists at least one entity, word or phrase that constitutes a complete and suitable answer to qi. An answer node ai is defined as a pair (ri, si), where ri denotes the word, phrase or entity that satisfies some factoid question qi, and si denotes the sentence that both mentions ri and offers the context needed to support its recognition as a suitable answer to qi. On this view, the same sentence si can generate n applicable question-answer pairs {(qi1, ai1), ..., (qin, ain)}, where n is the number of candidate answers r contained in si.
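A minimal sketch of such a QUAB follows, combining the node definitions above with the entailment-thresholded edge rule described in the next paragraph. The entailment probability function here is a word-overlap placeholder, not a trained textual entailment model, and the edge normalisation used by Hickl is omitted; everything beyond the node/edge structure is an assumption for illustration.

```python
# Toy QUAB: nodes for questions and answer sentences, a guaranteed edge from
# each question to its own answer, and extra edges between nodes whenever a
# placeholder textual-entailment probability exceeds a threshold (lambda_te).
# The entailment estimate here is plain word overlap - purely illustrative.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Quab:
    nodes: List[str] = field(default_factory=list)
    edges: Dict[Tuple[int, int], float] = field(default_factory=dict)  # (src, dst) -> weight

def entail_prob(x: str, y: str) -> float:
    """Placeholder for p_TE(yes, (x, y)): here, the fraction of y's words found in x."""
    xw, yw = set(x.lower().split()), set(y.lower().split())
    return len(xw & yw) / max(len(yw), 1)

def build_quab(pairs: List[Tuple[str, str]], lambda_te: float = 0.5) -> Quab:
    quab = Quab()
    for question, answer_sentence in pairs:
        qi = len(quab.nodes)
        quab.nodes.append(question)
        ai = len(quab.nodes)
        quab.nodes.append(answer_sentence)
        quab.edges[(qi, ai)] = 1.0  # a question always points to its own answer
    # Add entailment-based edges between any two distinct nodes above the threshold.
    for i, x in enumerate(quab.nodes):
        for j, y in enumerate(quab.nodes):
            if i != j and (i, j) not in quab.edges:
                p = entail_prob(x, y)
                if p >= lambda_te:
                    quab.edges[(i, j)] = p
    return quab

if __name__ == "__main__":
    quab = build_quab([("Who sailed across the Pacific Ocean?",
                        "Ferdinand Magellan sailed across the Pacific Ocean in 1519.")])
    print(quab.edges)
```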
Furthermore, two forms of information are used to infer the edges connecting pairs of nodes in V (Hickl 1263). This is achieved by presuming that a directed edge extends from a question qi to its matching answer node ai (qi → ai) for every question-answer pair held in Q. In addition, Hickl (2008) supposes that directed edges exist between any pair of nodes (x, y) whose content overlaps. Consequently, Hickl (2008) employs a system for recognising textual entailment in order to estimate the probability that the content of node y can be inferred from the content of node x. The following definition summarises the assumptions used in that study to develop the proposed recognising textual entailment (RTE) approach:

    w(x → y) = p_TE(yes, (x, y)) / N,  if p_TE(yes, (x, y)) ≥ λ_TE
    w(x → y) = 1,                      if x = qi and y = ai
    w(x → y) = 0,                      otherwise
    (Hickl 1263)

"Since textual entailment (TE) relations are (by definition) asymmetric, we assume that an edge (x → y) exists if x |=te y. We assume that an edge (x → y) exists between two nodes (x, y) iff the probability that x textually entails y (pTE(yes, (x, y))) is above a pre-defined threshold λTE. (N is used as a normalization factor to ensure that all of the edge probabilities sum to 1)" (Hickl 1263).

3 Conclusions and Research Proposal

3.1 Conclusion

The aim of the study was to design a system that improves the accuracy of answer extraction in a question answering system. This will reduce the number of documents retrieved by the QA system and help the user locate the actual answer to a query as a text or passage within the retrieved document. This is important because the number of documents is reduced and unnecessary passages are ignored. An improved corpus-based question answering system that uses natural language and surface/lexico-syntactic patterns will be used to improve the quality of the information extracted from such engines (Abouenour, Bouzouba and Rosso 45). The assumption here is that improvements to information extraction are interrelated with system performance. The current research therefore attempts to improve the system so as to reduce the number of extracted documents and provide documents that are relevant to the query asked. The role of answer extraction is critical to this system, since irrelevant passages are not highlighted; the system concentrates on extracting the answer after extracting the relevant text, thereby reducing the time the user takes to find the required solution. The system will have the ability to check the semantics and linguistics of the QA system in order to improve retrieval quality. The paper has sought to improve the semantic aspects of information retrieval so as to improve the quality of the answers obtained by the QA system. The use of semantics will not only improve the quality of the information retrieved but also yield a higher proportion of complete passages during retrieval. The system will ensure that the retrieved information is accurate by adding semantics to the keywords used for retrieval, and it will rank the retrieved documents in order of similarity to the required answer (Abouenour, Bouzouba and Rosso 50).

3.2 Research Proposal

This literature survey proposes a QA system that will improve the accuracy of answer extraction by reducing the number of documents retrieved and by ranking the documents, in order to improve corpus management, text cleaning and fixed-size passage retrieval.
The proposed system is Goal-Driven Answer Extraction, which generates an answer from a user query and source documents (Laszlo, Kosseim and Lapalme). The Goal-Driven system consists of two components, which return (1) a long answer of exactly 250 characters and (2) a short answer of no more than 50 characters. Moreover, Laszlo, Kosseim and Lapalme (2011) have shown that extracting accurate answers depends primarily on the set of documents generated by the IR process; however, the proposed system does not implement its own IR process. We therefore aim to implement the Goal-Driven system with the SMART IR process, which retrieves a much smaller piece of information in order to return ranked passages. For this purpose, we will improve the SMART IR paradigm by implementing syntactic structures for passage ranking (Aktolga, Allan and Smith). Answer extraction in the proposed system will score goal-type expressions along with the passage ranking to generate the final answer; a sketch of this scoring combination is given after Section 3.3. Building on these algorithms, we will develop an enhanced goal-driven paradigm with no need for active participants to extract the answer (Jijkoun, de Rijke and Mur 2). Furthermore, we will experiment with query expansion that depends on the form of the query, and, based on a TREC corpus, we will show that the proposed approach offers performance superior to the typical techniques.

3.3 Evaluation

To evaluate the performance of the proposed system, extensive experiments will be carried out comparing the current Goal-Driven QA system with the proposed one in terms of passage retrieval and answer extraction. The number of documents generated by the two systems, as well as the accuracy of each system within the documents, will be compared (Laszlo, Kosseim and Lapalme 2). This will provide a way of evaluating whether the proposed system is more accurate than the current one. The method of input will also be evaluated, since the two systems will differ in how their passages are ranked. The proposed system is expected to be question-oriented rather than answer-driven, which is expected to improve the QA system and further improve the current accuracy of answer extraction in a question answering system.
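As a closing illustration, the sketch below combines a passage-ranking score with a goal-type expression score to produce a final answer score, as envisaged in Section 3.2. The linear combination, its weight, and the keyword-based goal-type score are assumptions made for illustration only, not the design of Laszlo, Kosseim and Lapalme.

```python
# Illustrative final answer scoring for the proposed goal-driven system:
# combine a passage-ranking score with a goal-type expression score.
# The 0.7/0.3 weighting and the keyword-membership goal-type score are
# assumptions for illustration only.
from typing import List, Tuple

def goal_type_score(candidate: str, goal_keywords: List[str]) -> float:
    """Fraction of goal-type cues (e.g. expected answer-type keywords) present in the candidate."""
    text = candidate.lower()
    hits = sum(1 for k in goal_keywords if k.lower() in text)
    return hits / max(len(goal_keywords), 1)

def final_answer(candidates: List[Tuple[str, float]], goal_keywords: List[str],
                 passage_weight: float = 0.7) -> str:
    """Pick the candidate maximising a weighted mix of passage rank score and goal-type score."""
    def combined(item: Tuple[str, float]) -> float:
        text, passage_score = item
        return passage_weight * passage_score + (1 - passage_weight) * goal_type_score(text, goal_keywords)
    best_text, _ = max(candidates, key=combined)
    return best_text

if __name__ == "__main__":
    ranked_passages = [("Magellan sailed across the Pacific Ocean in 1519.", 0.9),
                       ("The Pacific Ocean is the largest ocean.", 0.8)]
    print(final_answer(ranked_passages, goal_keywords=["Magellan", "1519"]))
```

The returned text could then be trimmed to the 250-character long-answer and 50-character short-answer formats described in the proposal.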