Information Extraction System Using Keyword Matching - Report Example

Summary
This report "Information Extraction System Using Keyword Matching" presents an information extraction system by use of keywords. The system can easily be adapted for different hospital or clinical departments. The overall accuracy of the system is difficult to ascertain without an extensive review…

Extract of sample "Information Extraction System Using Keyword Matching"

Table of Contents
Information Extraction System Using Keyword Matching
Target Population
Project Duration
Current Situation (Problems and Needs)
Justification and Benefits
Benefits
Project Objectives
Project Description
Algorithm
Interface and Functionality
Project Management and Organisation
Expected Result and Environmental
Evaluation
Conclusions
Future work
References

Target Population
This project will be implemented in a hospital with a population of over 1000 people, including patients and workers. The system will manage information for in-patients and out-patients, for research studies, and for the management of personnel. It will also be used to extract necessary medical information such as disease status and prediction, medical documents, knowledge discovery, disease diagnosis, treatment and prevention.

Project Duration
This project is expected to take three months, taking into account data collection, processing, implementation and other management requirements.

Current Situation (Problems and Needs)
Dealing with in-patients presents a big problem in the hospital. Patients go to the hospital at most once per day, but their records are kept active for several years. A patient may stay in the hospital for a short time, and during that period data entry and retrieval may occur within a short time, usually minutes. In-patient services are therefore active for a shorter time frame than out-patient services. Input, processing and output of data have to be fast, but the data may not remain in an active state for a long time. The information extraction system in the hospital has to be managed in a new way to provide efficient services to the people (Carter, 2001). Rapid communication due to efficient information extraction will reduce the length of stay and eliminate redundant diagnostic instructions.
Another area which needs information management in the hospital is clinical studies (Palley 75). Since information is required in controlled clinical trials and in open studies, careful management of data is crucial. As studies become larger in terms of observation period and population, a systematic management system becomes essential. A database management system is in use, with well-defined manual procedures, to complement the traditional facilities with database tools such as data verification, dictionaries and periodic backups. There are still inefficiencies in time management (Bigelow, 2005).

Searching for clinical data in patients' records has remained a major obstacle to taking full advantage of clinical information systems. One strategy used to solve the problem was to enhance the availability of data by training the authors of clinical reports to submit structured data. Nevertheless, for clinical objectives this strategy is too restrictive, and other alternatives have been sought to fit the authors' needs. Such alternatives include computer-generated narratives that contain essential clinical data that may not be found anywhere else in the document, so the problem of hidden information remains (Bigelow, 2005).

Information extraction by keyword matching is the process of extracting user-requested text from a set of documents. The goal is to capture the information within a short duration. This strategy is easier to implement than a natural language processing system because syntactic characteristics and language understanding may not be necessary. Although natural language processing is one of the most promising approaches for extracting clinical information from clinical narratives, information extraction that does not rely on full parsing has also demonstrated good results, specifically in domains in which language displays more regularity and is limited in scope (Carter, 2001).

Justification and Benefits
The aim of this project is to draw up a methodology for managing information systems that will present an efficient and easy way to find accurate clinical data. It will enhance preventive measures for patients' diseases by providing the most recent and detailed information from medical journal articles or the computer without spending much time on searching. An information extraction system is used to identify or find the desired information in a document, which is done by converting the keywords into a database with specific fields. The database is composed of data and of programs or software to input and process the data. The data and the programs are stored in the computer which supports the database. Technological tools and program subsystems are the necessary components of the software database (Chung and Murphy, 2005). The subsystems are:
a. Storage systems, which allocate and manage space, particularly on large storage devices like disks and tapes.
b. File access software, which updates and accesses the data that are referred to by the machine and the user.
c. Data languages, which include programs that allow the user to extract and use the data conveniently (Chung and Murphy, 2005).
This will enable health professionals to access important information accurately within a short time. Training of staff will also enable users to make use of this technology easily.

Benefits
The system is effective in that a lot of information relevant to various organizational purposes is obtained from a single, comprehensive database. Data sharing through the system will promote information consistency, which is important for decision making, and will reduce duplication of collected data.
Another benefit of the system in health care is the application of the information to service management and to the allocation of the resources required for the services. It is also important that health providers communicate through information sharing and that medical care is validated from observations on patients. An information extraction system and the use of a database facilitate the organization, collection, storage and processing of data. The system enables processing of data from a variety of sources and therefore presents information that may not have been available before (Dicheva and Dochev, 2010).

Project Objectives
(a) The relationship to national development objectives
To enhance information sharing and ensure good communication, an efficient information extraction system is required. This information extraction system will enhance the management of information in the facility. The use of this system will enable the government to capture population data for the diseases that are most costly and prevalent in the nation. Sometimes specific legislation is focused on them; the use of this system will help in policy and health care delivery (Dicheva and Dochev, 2010).
(b) Sectoral objectives
The system will enhance communication of information in the sector and hence improve the management of information and personnel. In addition, it will help overcome the difficulty of comparing and managing large amounts of information.
(c) Immediate objectives, which are quantifiable and constitute the basic performance indicators for monitoring and evaluation
The basic indicator is the time taken to retrieve the required information. It is measured using the percentage precision and the amount of information that can be extracted.

Project Description
Keywords
Specific keywords are set that will help us to extract relevant, useful and related information from a document. This will reduce the search space and make the data relevant.
Keywords fall into two categories: relevant information and irrelevant information. The categories are shown below.
Relevant keywords: proposal, conclusion, experimental results, recommendations, prove and response.
Irrelevant keywords: tests, comparison, occurrences, test, report case, number of patients, recent literature, number of occurrence, symptoms, clinical course and periodic information.

Tokenization
In this step, the text is split into smaller units: single words, phrases, sentences and paragraphs. A common delimiter is a space or a string of arbitrary length. This step is done at two levels, one at sentence level (splitting at the full stop) and the other at paragraph level. The sentence level shows the actual relation of the medical terms, and the paragraph level shows the context in which the terms have been used.

Stopword/Stemming
A stopword list is developed for the purpose of removing semantically insignificant words. It contains high-frequency terms that do not give any useful information and can be removed from the text. Stopwords include phrases, common words and characters. The most common are ‘of’, ‘the’, ‘a’, ‘an’ etc. Stemming is reducing a word to its root. For example, ‘worker’ and ‘working’ are both reduced to ‘work’ so that the terms can be detected at the same time. Stemming depends on the language of the text.

Medical Dictionary
The required medical terms are selected by matching all terms against the Oxford Medical Dictionary. Any sentence that contains a medical term, in a single instance or in multiple instances, is captured. Non-medical terms are pruned.

POS Tagger
The POS tagger labels the syntactic category of words, such as verb, noun, adjective and pronoun, among the medical terms. Examples of nouns are malaria, lungs etc.; pronouns are words that point to nouns; verbs deal with the relationship between nouns; and adjectives, like severe or slow, show the strength of the relation.
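The tokenization, stopword removal, stemming and dictionary matching steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the word lists `STOPWORDS`, `MEDICAL_TERMS` and `SUFFIXES` are small hypothetical stand-ins (the report does not give the actual lists), the suffix stripping is a naive substitute for a real stemmer, and all function names are invented for this example.

```python
import re

# Hypothetical, illustrative word lists; the report does not specify them.
STOPWORDS = {"of", "the", "a", "an", "is", "in", "and", "to", "with"}
MEDICAL_TERMS = {"malaria", "fever", "lung", "quinine"}  # stand-in for a medical dictionary
SUFFIXES = ("ing", "er", "ed", "s")  # naive suffix stripping, not a real stemmer


def split_sentences(text):
    """Split text into sentences at full stops, question marks and exclamation marks."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters of the root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Stem the dictionary with the same function so that both sides are compared as roots.
MEDICAL_STEMS = {stem(term) for term in MEDICAL_TERMS}


def medical_terms_per_sentence(text):
    """For each sentence that contains a dictionary term, return the matching words."""
    results = []
    for sentence in split_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in STOPWORDS]
        found = [t for t in tokens if stem(t) in MEDICAL_STEMS]
        if found:  # sentences without any medical term are pruned
            results.append(found)
    return results
```

Calling `medical_terms_per_sentence` on a short note keeps only the sentences that mention a dictionary term. Stemming the dictionary entries with the same function as the text is a design choice that keeps the crude stemmer consistent on both sides of the comparison.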
All the required terms are tagged and passed on to the next stage for processing.

Association Rule Mining
This step uses implications of the form A → B, where A and B are item sets of medical terms that have some relationship. Items with the maximum frequency of association will be provided to the user.

Disambiguation
This is concerned with finding the most probable meaning of a word, which is determined from the context in which the term is used. Words like ill, sick and poor health are mapped to the same root, ill. Disambiguation can be supervised or unsupervised. The former is carried out by means of a dictionary, while the latter is carried out by use of the Yarowsky model, which is known for its high accuracy.

Expert Evaluation
An expert will critically examine the nature of the patterns and rules before they are presented to the user. The useful and verified patterns are adopted for use in the health care setting.

Algorithm
Start.
Input: Keywords.
Output: Semantic relationship, medicine.
For any input, extract the abstract of the paper.
Find any rule/keyword from the rules catalogue.
For all keywords x in catalogue Y, where x ∈ Y.
For each x_i (i = 1, ..., n):
Tokenize the specific sentence.
Remove stopwords.
Do stemming.
Extract words from the tokenized sentence with medicine and disease.
Pass the filtered sentence through the POS tagger.
Extract diseases/medicines and their actions.
Associate medicines and rank them based on frequency and superiority.
Replace multiple synonym actions.
Have the identified rules validated by an expert.
Present verified rules and semantic relationships to the user.
Exit.

Interface and Functionality
The user interface is divided into four sections: the main page, the registration section, the user login and the retrieval section. The main functions are accessed in the retrieval page.

Project Management and Organisation
After the database has been prepared, the right decisions for the implementation are made.
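Returning to the Association Rule Mining and ranking steps of the algorithm above, the following is a minimal sketch of how rules of the form disease → medicine could be ranked by frequency. It is an assumption-laden illustration, not the report's implementation: the term sets `DISEASES` and `MEDICINES` and the function `rank_rules` are hypothetical, co-occurrence within a sentence is assumed to signal an association, and the confidence score (pair count divided by disease count) is the standard association rule measure rather than something the report defines.

```python
from collections import Counter
from itertools import product

# Hypothetical term sets; a real system would draw these from the medical dictionary.
DISEASES = {"malaria", "tuberculosis"}
MEDICINES = {"quinine", "artemisinin", "isoniazid"}


def rank_rules(sentences):
    """Rank rules disease -> medicine by co-occurrence count, then by confidence.

    `sentences` is a list of token lists already filtered down to dictionary terms.
    Returns tuples (disease, medicine, count, confidence).
    """
    pair_counts = Counter()
    disease_counts = Counter()
    for tokens in sentences:
        terms = set(tokens)
        diseases = terms & DISEASES
        medicines = terms & MEDICINES
        disease_counts.update(diseases)
        # Every disease/medicine pair in the same sentence counts as one co-occurrence.
        pair_counts.update(product(diseases, medicines))

    rules = []
    for (disease, medicine), count in pair_counts.items():
        confidence = count / disease_counts[disease]
        rules.append((disease, medicine, count, confidence))
    # Most frequent association first; ties broken by confidence, then alphabetically.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return rules
```

In this sketch the first rule returned is the one with the highest association frequency, which corresponds to the item that would be shown to the user before expert validation.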
Some of the issues which need to be tackled include adaptation to the project and its requirements, reliability, planning for growth and technical updating of the facilities. A database administrator will deal with the operational issues. The project will require strong support from the management and high-quality technical personnel who will be responsible for day-to-day operations (Carter, 2001). The project will be implemented in different categories of health care setting, listed below.
Maintenance of data records: these are the medical records for in-patients and out-patients. Effective management of the database is very important.
Individual practice: application for an individual may be through billing and time scheduling. This practice is not economical.
For group operation of paramedical personnel and physicians in health care, access to the medical records is generated at multiple sites, but a data record must be legible and complete whenever it is extracted. This will aid health care, as data entry, storage, diagnosis, prescription, procedure and follow-up become easy (Carter, 2001).
The system should be made as reliable as possible. To ensure reliability, the following have to be provided: central computer hardware, reliable electrical power, easy-to-understand software, a reliable user operation site, and good communication services with consistency and minimum noise (Fries 78). This calls for a major management effort to provide a well-integrated system, which significantly increases the operating cost.

Expected Result and Environmental
The information extraction system supports the statistical analysis model used for interactive statistics when planning for environmental conservation. Statistics packages in medical research are useful in general marketing and planning, and specifically in the research support database system.

Evaluation
A comprehensive evaluation of this project has not been completed.
Nevertheless, a user interview indicated that the doctors and other users believe that the information extraction system is convenient for their daily work, since without the system they would have to find the keyword or concept by searching all notes until they get the desired content. They mention that it is easy to use, since you only type a keyword for what you are researching and you obtain results immediately. The user interface is made friendly enough that users will be willing to use the new technology actively. It is therefore important to know what users like and dislike. With the help of feedback, a comprehensive system is developed to serve the hospital's needs.
The performance of the information extraction system can be assessed from two quantities: the number of extracted features and the number of recognized features in a test set. Precision is the number of correctly extracted feature descriptions divided by the number of all extracted features. Recall is the number of correctly extracted feature descriptions divided by the number of all features in the test set. Precision measures the success and recall measures the sensitivity of the algorithm (Dicheva and Dochev, 2010). The performance measure F is calculated as follows.
F = 2 * Precision * Recall / (Precision + Recall) (Dicheva and Dochev, 2010)

Conclusions
This paper presents an information extraction system based on keywords. The system can easily be adapted for different hospital or clinical departments. The overall accuracy of the system is difficult to ascertain without an extensive review. The developers should review the system and seek feedback from the users so that it can be improved to meet all of the users' needs.

Future work
In future, the project should be expanded to include the root cause of the disease by taking the patient's history and condition and providing the correct dose. Also, large-scale performance tests with accurate measurements need to be done to include more data.
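As a small worked illustration of the precision, recall and F measures defined in the Evaluation section above, the sketch below computes them from a set of extracted features and a set of features known to be in the test set. The function name `precision_recall_f` and the example feature names are hypothetical, and returning 0.0 for empty inputs is an assumption made here to avoid division by zero.

```python
def precision_recall_f(extracted, relevant):
    """Return (precision, recall, F) for extracted features against the test set."""
    extracted, relevant = set(extracted), set(relevant)
    correct = len(extracted & relevant)  # correctly extracted features
    # Assumption: a measure with an empty denominator is reported as 0.0.
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 4 features extracted, 3 features in the test set, 2 in common.
p, r, f = precision_recall_f(
    {"malaria", "fever", "cough", "rash"},
    {"malaria", "fever", "anaemia"},
)
```

With these example sets, precision is 2/4, recall is 2/3, and F works out to 4/7, which shows how F sits between the two measures while leaning toward the lower one.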
Manual classification should be done by more people so that inter-rater error can be measured. If the test set is big enough to include trained classification techniques like Bayesian classification, a comparison between this approach and other traditional approaches can be performed.

References
Bigelow, J. H., Rand Corporation, et al., 2005. Analysis of healthcare interventions that change patient trajectories. Santa Monica, CA: RAND.
Carter, J. H., American College of Physicians-American Society of Internal Medicine, 2001. Electronic medical records: A guide for clinicians and administrators. Philadelphia: American College of Physicians-American Society of Internal Medicine.
Chung, J. and Murphy, S., 2005. Concept-Value Pair Extraction from Semi-Structured Clinical Narrative: A Case Study Using Echocardiogram Reports. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1560613/
Dicheva, D. and Dochev, D., 2010. Artificial Intelligence: Methodology, Systems, and Applications: 14th International Conference, AIMSA 2010, Varna, Bulgaria, September 8-10, 2010. Proceedings. Volume 6304 of Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence. Berlin: Springer.
Wiederhold, G., 1980. Databases in Healthcare. Available at: ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/790/CS-TR-80-790.pdf