Smart Database Design to Avoid Fault Data

Abstract

Every organization deals with information about products and people, including employees, customers, and prospective benefactors, who sustain the organization's functions and services. This crucial information must therefore be accurate and stored correctly in reliable databases for enduring use. This paper describes the diverse ways of entering data into databases, the reasons poor-quality data is entered and stored, and its impacts on organizations. One of those reasons is improper database design; to help avoid poor-quality data, this paper presents the features of a good database design along with guidelines for developing a smart database that avoids faulty data.

Keywords: database design, data quality, avoiding faulty information, garbage in garbage out (GIGO), database normalization, smart database design.

Introduction

Today, every decision, from solving a particular problem to deciding the future of an organization, is based on the availability, accuracy, and quality of information. "Information is an organizational asset, and, according to its value and scope, must be organized, inventoried, secured, and made readily available in a usable format for daily operations and analysis by individuals, groups, and processes, both today and in the future" (Neilson, 2007). Organizational information is neither just bits and bytes saved on a server nor limited to client data and the hardware and software that store it. Working with data is a process of gathering, normalizing, and sharing information with all of an organization's stakeholders. Managing such a large body of information manually is difficult, which is why databases were developed and remain in high demand. A database makes it easy to store, handle, and use an organization's remarkably diverse information. A database can be defined as a "collection of information that is organized so that it can easily be accessed, managed, and updated" (Rouse, 2006). Developing a database is not a complicated process, nor is using and manipulating the information stored in it. A database smooths the task of maintaining order in what could otherwise be an extremely chaotic information environment. In a database, each piece of information is stored individually, and management begins with indexing the existing data: categorizing the stored items by common identifying factors, achieved by assigning values that signify the appropriate identity (e.g., national identity numbers, names, cell numbers). Undoubtedly, if the data gathering and storing processes malfunction, the stored data will be incorrect as well; this principle is known as garbage in, garbage out (GIGO). The quality and accuracy of data are critical and fundamental for any database an organization develops or maintains, whether the database serves a small goal with limited scope or a multi-billion-dollar information system. The value of data is directly proportional to its quality. This is one of many reasons an inadequately designed database may present incorrect information, may be complicated to use, or may even stop working correctly.
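To make the idea of indexing by a common identity factor concrete, the following minimal sketch (my illustration, not from the paper; the table and column names are hypothetical) uses Python's built-in sqlite3 module to key records on a national identity number, so a duplicate or conflicting identity is rejected at entry rather than becoming garbage in the database.

```python
import sqlite3

# A minimal sketch of indexing stored information by a common identifying
# factor, here a hypothetical national identity number.
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE person (
        national_id TEXT PRIMARY KEY,   -- common identity factor
        full_name   TEXT NOT NULL,
        cell_number TEXT
    )
""")

conn.execute("INSERT INTO person VALUES ('NID-001', 'Shawn Michel Johnson', '555-0100')")

# Because national_id is the primary key, a duplicate identity is rejected
# instead of silently creating a second, conflicting record (garbage in).
try:
    conn.execute("INSERT INTO person VALUES ('NID-001', 'S. M. Johnson', NULL)")
except sqlite3.IntegrityError as e:
    print("rejected duplicate identity:", e)
```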
Why Poor Data Quality?

There are a number of ways to enter data into databases, including initial data conversion (conversion from a previously existing data source), consolidation of an existing database with a new one, manual data entry, batch feeds, and real-time data entry interfaces. Consequently, many diverse root causes exist for the storage of inaccurate, poor-quality data. Some stem from inappropriate database design, while others are due to external outage factors. The basis of these errors is much more than a stumble-fingered typist (typographical error). Causes of poor-quality data other than database design include receiving erroneous data from apparently reliable sources, imperfect real-time information collection procedures, improper usage of correct data, failure to maintain and update existing data, and blind trust in automated systems. Outage-related causes of inaccurate data include cluster-wide failure, storage failure, and data corruption caused by media corruption. Further discussion of causes of poor data other than database design is out of the scope of this paper.

Most of the poor-data-quality problems that arise from inappropriate design involve redundant data and anomalies. Redundant data is the needless reoccurrence of data (repeated collection and storage of the same facts). Anomalies are events that deteriorate the integrity of stored data through asymmetrical or incompatible storage, and they may occur when insert, update, or delete queries are executed. Design faults generate problems as data is entered, and the effectiveness of a database can be further restricted by improper structure, idiosyncrasies in how fields are set up, and a lack of consistency in common fields.

Impacts of Poor Data Quality

Poor data quality affects an organization in ways ranging from minor losses to multi-million-dollar losses, from developing customers' mistrust to losing customers, and from small internal decisions to unfortunate strategy and policy decisions. To be more specific, poor data quality within a billing system not only compels the organization to badly compromise income collection, payables, and reconciliations but also loses customers' trust through inaccurate billing. It likewise affects acquisitions, which often alter customer-billing contacts without regard for the errors introduced, directly impacting accounts and payment receipts when incorporating data from several sources. As an example, an order entered by a Customer Service Representative (CSR) updates the manufacturing database with program and article information, but the data values from one database do not fit structurally into the other (inconsistent database design). The result is data corruption that leads to ineffectiveness and lower productivity both on the shop floor and in the warehouse. The correctness of inventory evaluations is directly proportional to the accuracy of the data: physical stock counts frequently fail to match the data in inventory systems because of theft or because of errors caused by flawed database design. A proper, smart database design, together with scrutiny of data consistency, can help identify the sources of errors and discrepancies.
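The redundancy and update-anomaly problem described above can be seen in a few lines of code. The sketch below (my own; the orders schema is hypothetical, not from the paper) stores a customer's city redundantly on every order row, then updates only one copy, leaving two contradictory versions of the same fact in the database.

```python
import sqlite3

# Illustration of an update anomaly: the customer's city is stored
# redundantly on every order row, so a partial update leaves the
# database contradicting itself.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,   -- redundant: repeated on every order
        item          TEXT
    )
""")
conn.executemany(
    "INSERT INTO orders (customer_name, customer_city, item) VALUES (?, ?, ?)",
    [("Acme Ltd", "Boston", "widget"), ("Acme Ltd", "Boston", "gear")],
)

# Update anomaly: only one of the two copies of the city gets changed.
conn.execute("UPDATE orders SET customer_city = 'Chicago' WHERE order_id = 1")
for row in conn.execute("SELECT customer_name, customer_city FROM orders"):
    print(row)   # ('Acme Ltd', 'Chicago') and ('Acme Ltd', 'Boston') disagree
```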
Features of Smart Database Design

The following five features characterize high-quality data, and they can be achieved by designing a high-quality database.

Accuracy. A database is considered accurate if the values stored in its fields are correct. Since an organization relies on and presumes that the information entered into a database is faultless, the design of the database should be accurate and reliable; this not only helps in pursuing new business ideas but also plays a vital role in promoting organizational goals.

Completeness. End users of the database must be aware of the scope of the information to be entered and be entirely clear about what a specific piece of information encompasses. The design of the database should compel the user to input complete information rather than partial or incomplete information.

Consistency. Comprehensive or condensed information is based on the underlying field-level data stored in the database. Consistency is as important as the accuracy and completeness of a database.

Uniqueness. The data in one field must represent one and only one real-world entity. For example, the name Shawn Michel Johnson may not be stored in any other field of the database as S. M. Johnson or in any other form.

Timeliness. Updated, current data is always important to an organization; therefore, stored data has to be kept up to date with respect to the organization's requirements. End users of the database should be made aware of any variation through a standard update schedule. Real-time information is a key component of timeliness.

Guidelines for Smart Database Design

Database design is the process of developing a comprehensive data model of a database in order to fulfill the requirements of an organization. Numerous aspects have to be considered during the design process, including, but not restricted to, historical and prospective views of the data, organizational requirements, security, performance, and cost; the most important factors for this paper are data integrity, completeness, accuracy, and uniqueness. The methodology for developing a database design has the following three main phases (Elmasri & Navathe, 2004):

i. Conceptual database design
ii. Logical database design
iii. Physical database design

Since the objective of this paper is to provide design guidelines for avoiding faulty data, and the logical database design phase is the most critical phase, in which the designer can avoid faulty data by removing redundancies and anomalies, all three phases are discussed below, but the main emphasis is on the logical database design phase (Kaula, 2007).

Conceptual Database Design

Conceptual design is the process of developing an Entity Relationship Model (ER Model) for an actual real-world problem without considering the physical design aspects of the database. According to Connolly (2008), conceptual database design is "the process of constructing a model of the data used in an enterprise independent of all physical consideration." ER modeling is a graphical representation of the real-world problem in terms of entities, attributes, and relationships between entities, where an entity is an object or concept and attributes are the properties or characteristics of that object or concept. A small sketch of carrying an ER model, and the quality features listed above, into a table definition follows this section.
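As a bridge between the feature list and the design phases, here is a hedged sketch (my own; the Employee entity and its attributes are hypothetical) of how a tiny ER model might be declared as a table whose constraints enforce the completeness (NOT NULL), uniqueness (PRIMARY KEY/UNIQUE), and accuracy (CHECK) features described above.

```python
import sqlite3

# Hypothetical mapping of a small ER model (an Employee entity) to a table
# whose declarative constraints enforce three of the five quality features.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,                -- uniqueness
        national_id TEXT NOT NULL UNIQUE,               -- uniqueness + completeness
        full_name   TEXT NOT NULL,                      -- completeness
        hire_date   TEXT NOT NULL
                    CHECK (hire_date LIKE '____-__-__'),  -- accuracy: ISO date shape
        salary      REAL NOT NULL CHECK (salary > 0)    -- accuracy: plausible range
    )
""")

# Incomplete or implausible rows are rejected by the design itself rather
# than being stored as faulty data.
try:
    conn.execute(
        "INSERT INTO employee (national_id, full_name, hire_date, salary) "
        "VALUES ('NID-7', 'Jane Roe', '2024-01-15', -100)"
    )
except sqlite3.IntegrityError as e:
    print("rejected by CHECK constraint:", e)
```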
Logical Database Design

Logical database design is the process of developing a model of the data based on the conceptual design, which can then be mapped onto storage objects supported by the Database Management System. Most problems of illogical, inaccurate, and inconsistent stored data result from two bad design features: redundant data and anomalies. These problems can be addressed in this phase through the process of normalization, which eliminates redundancy and anomalies in a table by dividing it into two or more tables and defining relationships between the divided tables (Kaula, 2007; Mannino, 2006). This process also ensures that the data in one table depends only on the related data in another table, so data is stored logically and the process of modifying or updating it becomes easier and more consistent. Data modeling is iterative: the model constructed in the conceptual design phase is refined in the logical design phase through the application of normalization.

In total, six normal forms have been identified by authors of database theory, along with the Boyce-Codd Normal Form (BCNF), defined in 1974 and named for Raymond F. Boyce and Edgar F. Codd. These normal forms are used to refine the database model developed in the previous phase. The process of normalization removes the problems of functional dependencies, transitive dependencies, multi-valued dependencies, join dependencies, and composite entities through the application of the normal forms; the rules and guidelines for applying normalization are discussed below. Applying first normal form (1NF) removes repeating groups of data from a table by designating a primary key. Second normal form (2NF) requires 1NF and additionally removes partial functional dependencies on the primary key. Third normal form (3NF) removes transitive dependencies among non-key attributes by developing one-to-many relationships. Boyce-Codd Normal Form (BCNF) has to be achieved before fourth normal form (4NF), which removes multi-valued dependencies.

Many authors have provided guidelines for refining an ER model; the following were provided by Michael V. Mannino in 2006:

"Transform attributes into entity types. This transformation involves the addition of an entity type and a 1-M (one-to-many) relationship.
Split compound attributes into smaller attributes. A compound attribute contains multiple kinds of data.
Expand entity types into two entity types and a relationship. This transformation can be useful to record a finer level of detail about an entity.
Transform a weak entity type into a strong entity type. This transformation is most useful for associative entity types.
Add historical details to a data model. Historical details may be necessary for legal as well as strategic reporting requirements. This transformation can be applied to attributes and relationships.
Add generalization hierarchies by transforming entity types into a generalization hierarchy."

The major goals of a normalized database design are the elimination of redundant and inconsistent data, an uncomplicated representation of information, and the elimination of insert, update, and delete anomalies. Sometimes these goals are also referred to as data integrity or referential integrity. A short worked example of this decomposition appears below.
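To make the decomposition tangible, the sketch below (my own hypothetical schema, not an example from Kaula or Mannino) takes the redundant orders table shown earlier and splits it into a customer table and an order table linked by a foreign key, so each fact is stored exactly once and the earlier update anomaly cannot occur. The final CREATE INDEX line belongs to the physical design phase discussed next.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Normalized decomposition: the customer's city now lives in exactly one
# row, and orders reference it through a foreign key.
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE "order" (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    -- Physical-design step: index the foreign key for efficient joins.
    CREATE INDEX idx_order_customer ON "order"(customer_id);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd', 'Boston')")
conn.executemany('INSERT INTO "order" (customer_id, item) VALUES (?, ?)',
                 [(1, "widget"), (1, "gear")])

# One UPDATE now changes the single authoritative copy of the city;
# every order sees the new value through the join.
conn.execute("UPDATE customer SET city = 'Chicago' WHERE customer_id = 1")
for row in conn.execute("""
        SELECT o.item, c.city FROM "order" o
        JOIN customer c ON c.customer_id = o.customer_id"""):
    print(row)   # ('widget', 'Chicago') and ('gear', 'Chicago')
```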
If the goals of normalization are achieved in the logical database design phase, the basic problems that result in faulty data are eliminated, and faulty data in the database can thus be avoided. Solutions for the other causes of faulty data (given in the earlier section) are not discussed here, as they are out of the scope of this paper.

Physical Database Design

In this phase, the designers make decisions regarding the implementation and configuration of the database on storage. The designer describes the base relations, creates indexes, and organizes the files. Indexes are used to attain efficient access to the data as well as to any coupled integrity constraints.

Conclusion

Today the world revolves around information, and accurate information is a key factor that leads organizations to success, whereas faulty information leads to failure. Information passes through plentiful procedures, most of which affect its quality to some degree. As organizations' data and reporting necessities have turned into complicated processes, database designers too often simply produce the required data from incorrectly designed databases. This is one source of inadequate and faulty data, produced through data redundancies and all their associated anomalies; an improper database design reduces the advantage of a database and its applications, and can even leave them worse than useless. In short, a quality database design avoids anomalies, inaccurate data, and redundant data by compelling the user to enter accurate values, which helps achieve quality data. To ensure a quality database design, the main emphasis should be on data model development through the application of normalization, following the provided guidelines and quality-assurance features. Normalization is one of the solutions for avoiding faulty data through the proper mapping of real-world views into entities, attributes, and their relationships, using the guidelines provided in this paper. An inadequately designed database, by contrast, produces inaccurate and inconsistent data that leads to terrible decisions and can guide an organization to failure.

References

Connolly, T. M. (2008). Database Systems: A Practical Approach to Design, Implementation and Management (4th ed.). Pearson Education India. pp. 438-470.

Elmasri, R., & Navathe, S. B. (2004). Fundamentals of Database Systems. Addison Wesley. pp. 58-97.

Kaula, R. (2007). Normalizing with Entity Relationship Diagramming. Retrieved from http://www.tdan.com/view-articles/4583/

Mannino, M. V. (2006). Database Design, Application Development, and Administration (3rd ed.). McGraw-Hill, NJ. pp. 38-123.

Neilson, P. (2007). Data Architecture. Retrieved from http://sqlblog.com/blogs/paul_nielsen/archive/2007/11/25/data-architecture.aspx

Rouse, M. (2006). Database. Retrieved from http://searchsqlserver.techtarget.com/definition/database