Database Design for Existing Dataset and Analysis - Business Plan Example

Table of Contents
Hardware Requirements Specifications
Technical Hardware Specifications
Introduction
Relational Databases
UML Database Diagram Design
Database Analysis
First Normalization
Second Normalization
Third Normalization
Conclusion
References

Abstract
Linked data platforms have been used increasingly over the last five years, as many people realized that they could obtain almost any information they needed over the internet. This pushed developers to find ways of relating data so as to deliver it in a results-oriented way. This database design project explores the implementation of a similar project that can deliver data to the end user and, at the same time, generate reports when queried. The database is expected to use the existing linked data platform to achieve this objective seamlessly.

Requirements for Implementation of the System

Database Support Software
Given the data that will be stored in the database and the requirements for retrieving it, a number of criteria were identified that the database software has to meet. The software's minimum requirements are as follows:

1. It should support stored SQL scripts (stored procedures) that can be run on demand. Many processes, such as editing, querying, updating and facility management, can be automated with stored scripts. This control becomes even more important when users access the database over the internet, because calling and running a stored script is easier than writing a new script before each process.
2. It should allow restrictions (constraints) on the values that are entered into table columns.
3. It should make it easy to create multiple indexes, and a single index should be able to span several columns.
   This allows faster querying and sorting on the many parameters that the database presents. Duplicate data across several columns can be prevented by creating a unique index on those columns.
4. Multiple users should be able to access tables simultaneously, because data will be accessed from a number of locations through web browsers.
5. The software must support the relational database model, which is an industry standard. This makes it easier to integrate with the other technologies present at the various locations from which the data will be accessed.
6. The software should allow various views to be created on the entered data. This keeps the amount of stored data to a minimum, with derived data presented through any number of virtual tables. Views make calculated values available without adding extra columns to the data tables, and they allow customized views of the data by linking multiple tables.
7. The software must support replication between servers, since the data is going to reside on two different servers.
8. The software should support triggers on data tables. Triggers define in advance the actions to be taken when information is entered into or deleted from a table.
9. The software should allow data to be entered over the internet.
10. The software has to run in a Linux environment.

Based on these requirements, the most suitable software combination was found to be the LAMP stack, that is, Linux, the Apache web server, MySQL and PHP. These products fully meet the requirements of this implementation, and MySQL was found to process data quickly for web services.
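Requirements 2, 3, 6 and 8 above can be made concrete with a small sketch. This sketch is not part of the original design. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the SQL follows SQLite's dialect. The following are all hypothetical and loosely modelled on the dataset variables analyzed later in this paper: the `programme` table, its columns, the allowed `media_type` values, and the assumption that epoch values are in seconds.

```python
import sqlite3

# Hypothetical schema for illustration only: names are loosely based on the
# dataset variables; the allowed media_type values are assumptions.
SCHEMA = """
-- Requirement 2: restrict the values that may be entered into columns.
CREATE TABLE programme (
    pid         TEXT PRIMARY KEY,
    service     TEXT NOT NULL,
    media_type  TEXT NOT NULL CHECK (media_type IN ('audio', 'video')),
    epoch_start INTEGER NOT NULL,
    epoch_end   INTEGER NOT NULL,
    is_clip     INTEGER NOT NULL CHECK (is_clip IN (0, 1)),
    CHECK (epoch_end >= epoch_start)
);

-- Requirement 3: one index spanning two columns; UNIQUE also stops a
-- service from having two programmes with the same start time.
CREATE UNIQUE INDEX idx_service_start ON programme (service, epoch_start);

-- Requirement 6: a view exposes a calculated value without storing it.
CREATE VIEW programme_duration AS
    SELECT pid, epoch_end - epoch_start AS duration_seconds
    FROM programme;

-- Requirement 8: a trigger defines in advance what happens on deletion.
CREATE TABLE deletion_log (pid TEXT NOT NULL, deleted_at TEXT NOT NULL);
CREATE TRIGGER log_programme_delete AFTER DELETE ON programme
BEGIN
    INSERT INTO deletion_log (pid, deleted_at)
    VALUES (OLD.pid, datetime('now'));
END;
"""


def build_schema(conn: sqlite3.Connection) -> None:
    """Create the illustrative table, index, view and trigger."""
    conn.executescript(SCHEMA)


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one programme row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO programme VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    print(try_insert(conn, ("p001", "bbc_one", "video", 1000, 4600, 0)))  # True
    print(try_insert(conn, ("p002", "bbc_one", "video", 5000, 4000, 0)))  # False: ends before it starts
    print(conn.execute("SELECT * FROM programme_duration").fetchall())   # [('p001', 3600)]
```

The CHECK constraints reject rows that end before they start or that use a media type outside the allowed list. The unique two-column index rejects a second programme starting at the same time on the same service. Deleting a row leaves an entry in `deletion_log`. If these ideas are carried over to MySQL, note that MySQL only enforces CHECK constraints from version 8.0.16 onwards; earlier versions parse them but ignore them.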
PostgreSQL could be used as an alternative; it is also open source, which would keep the costs of implementing this project low.

Hardware Requirements Specifications
The server will be used as both the database server and the web server. A chassis that can accommodate an internal LTO tape drive has been chosen for this purpose. The server will be accessed over the internet. The hardware configuration has been matched to the software configuration, and the hardware manufacturer should be globally known and highly reputable.

Technical Hardware Specifications
These are the detailed requirements that the server should meet. The specifications are determined mainly by the size of the database, the life expectancy of the project and the number of users. The project was assumed to have a life expectancy of five years. The specifications are outlined below.

Front Side Bus: 300 MHz
Memory: Expandable to 8 DIMM sockets, with the board configurable up to 6 GB
Hard Drive: 5 GB, 6000 rpm
Diskette Drive: 1.5 MB
Monitor: 14 inch
Optical Drive: CD-ROM/DVD-ROM
Graphics Card: 12 MB RAM with integrated controller
Cache: 1024 KB
Expansion Slots: 4 (2 x 64-bit/100 MHz, 2 x 64-bit/133 MHz)
Processors: Dual core, 1.8 GHz
Mouse: Scroll wheel, left button and right button
Keyboard: Standard
Network Adapter: Load balancing support; failover support; connection speed up to 100 Mbps; dual port
Hard Drive Backplane: On-board RAID allowing 5 drive connections
RAID Controller: Embedded RAID; supports RAID 5 and RAID 1
Memory: 1.5 GB, 512 MHz
Secondary Controller: Compatible with the tape backup unit
Power Supplies: 1000 W; voltage: 200 VAC
Chassis: Tower

Operating System: Ubuntu Linux, versions 10.04 to 14.04
Updated Drivers: NIC
OS Documentation: Ubuntu 14.04 Linux documentation
Software for Management: Administration access; remote management; automatic diagnosis
Management Features: Facilitates installation, configuration and system set-up; paging or email support; fault management alarms; system resource monitor; inventory; server resource management
RAID Controller: Array management
Environmental Parameters: 25 °C average operating temperature; 12% to 75% non-condensing relative humidity (operating); 7% to 90% non-condensing relative humidity (storage)
UPS: Stand-alone 2100 VA/1200 W, 200 V UPS allowing 45 minutes of runtime at half load. Features: 40/50 Hz +/- 2 Hz auto-sensing frequency; 2.5-hour recharge time; leakproof; 2-year replace-or-repair warranty; noise filtering; zero clamping; optional EPO; 360-joule surge energy rating
User Manuals: Installation guides

Table 1: Hardware Technical Specifications

Introduction
As consumer demands for data content keep changing, there has been a growing need for systems that are adaptive, that is, systems that can easily be shaped to fit consumers' needs. This is also true of systems accessed over the internet. On these platforms, integrating different technologies has become increasingly important for addressing constantly changing consumer demands. The main aim of this design project is to develop a database that can provide services seamlessly. Linking data has become necessary because of clients' changing demands about how they want to access it. An example is the BBC's data content provision system.
The system links the BBC's audiences with its applications by giving them access to those applications, using a number of access protocols. The database design project covered in this paper likewise uses a number of accepted protocols to build a database that acts as a linked data platform. Its scope is not limited to a geographical area, because it can be accessed through web interfaces written in PHP and running on an Apache web server. The data are stored in MySQL on a turnkey LAMP server running in Oracle's VirtualBox. The design will allow easy data access and entry from many separate places and continents. The main driving force behind this project was to produce a design that is efficient, easily normalized and scalable, and that makes both data access and data entry easy. Achieving this fulfils the main objective of the design process.

Relational Databases
Relational databases are the central topic of this paper, and the database designed here is itself a relational database. This type of database grew out of a paper published by Edgar F. Codd in 1970. That paper later became the foundation on which Oracle Corporation, the biggest relational database company, was built. The relational model was independent of any particular platform, which gave it an advantage over the other database systems of the time. Relational database systems store data as records (rows) in tables, each with specific attributes (columns). Tables are linked by sharing primary keys: a primary key that is stored in another table is referred to there as a foreign key. Through these shared keys, records in different tables can be related to one another, subject to the relationships and permissions defined between the tables. The primary key uniquely identifies a given record in a table.
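The primary key and foreign key relationship described above can be illustrated with a minimal sketch. It again uses Python's built-in sqlite3 module as a stand-in for MySQL. The `brand` and `programme` tables are hypothetical, borrowing the `brand_pid`, `pid` and `complete_title` names from the dataset, and the sample values in the demo are invented for illustration.

```python
import sqlite3


def build_linked_tables() -> sqlite3.Connection:
    """Create two hypothetical tables linked by a primary/foreign key pair."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE brand (
            brand_pid  TEXT PRIMARY KEY,          -- uniquely identifies a brand
            brand_name TEXT NOT NULL
        );
        CREATE TABLE programme (
            pid            TEXT PRIMARY KEY,      -- uniquely identifies a programme
            complete_title TEXT NOT NULL,
            brand_pid      TEXT NOT NULL
                REFERENCES brand (brand_pid)      -- foreign key: brand's primary key
        );
    """)
    return conn


def add_programme(conn: sqlite3.Connection, pid: str, title: str, brand_pid: str) -> bool:
    """Insert a programme; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO programme VALUES (?, ?, ?)", (pid, title, brand_pid))
        return True
    except sqlite3.IntegrityError:
        return False


def titles_with_brand(conn: sqlite3.Connection) -> list:
    """Follow the foreign key back to the brand table with a join."""
    return conn.execute("""
        SELECT p.pid, p.complete_title, b.brand_name
        FROM programme AS p
        JOIN brand AS b ON b.brand_pid = p.brand_pid
        ORDER BY p.pid
    """).fetchall()


if __name__ == "__main__":
    conn = build_linked_tables()
    conn.execute("INSERT INTO brand VALUES ('b001', 'Example Brand')")
    print(add_programme(conn, "p001", "Example Brand: Episode 1", "b001"))  # True
    print(add_programme(conn, "p002", "Orphan Episode", "b999"))            # False: unknown brand
    print(titles_with_brand(conn))
```

A programme whose `brand_pid` does not exist in `brand` is rejected, and so is one that reuses an existing `pid`. The join follows the foreign key back to the brand name. In MySQL, the equivalent foreign key is enforced by the InnoDB storage engine; the older MyISAM engine accepts the syntax but does not enforce it.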
This approach to data storage proved more efficient than earlier ones because it significantly reduced the disk space required while increasing the speed of access to database records. Relational databases are manipulated using Structured Query Language, commonly referred to as SQL. SQL was standardized by the American National Standards Institute (ANSI) in 1986. At the time of writing (2015), its most recent revision was SQL:2011. The language is grounded in relational algebra and is relatively easy to use. Examples of relational database servers include Microsoft SQL Server and Oracle Database. Microsoft Access offers many features of a relational database, for instance the ability to use SQL in its projects. However, it is a desktop database application rather than a full relational database server of this kind.

Understanding BBC Ontologies

Data Organization
Before the release of SQL Server 2008, storing and managing unstructured data was difficult. Unstructured data is data that has no predefined organization. It is normally text-heavy and is the most common form in which information is carried and relayed. Such data could be stored inside the database in IMAGE or VARBINARY columns. This ensured transactional consistency and reduced management complexity, but it reduced performance. The common alternative was to store the data in files on disk and link them to structured data in the database, which introduced considerable complexity, overhead and inconsistency. The same applied to data that was to be accessed over the internet. Because unstructured data was also the most frequently accessed, these difficulties made it hard to serve. The BBC provides this kind of data to many customers spread across the world.
This raises the challenge of delivering content efficiently and seamlessly, regardless of its data structure or the client's location. The BBC does this through ontologies covering the different areas its audience may find interesting: BBC Education, BBC Sport, BBC Music, BBC Concepts and News Projects, among many others. These ontologies are the basis of Linked Data platforms, the subject matter of this paper. Ontologies offer a business many advantages. One of the most notable is that they can be expanded as business requirements and consumer needs grow. With such a platform in place, a business can invest confidently in producing more content, knowing that it will be delivered to consumers efficiently. Ontologies build on the existing Linked Data platform to deliver this wide range of services. Linked Data is simply the linking of web content so that a person can explore more of what they are already looking at: once a user chooses to view some content, related content on the same topic is offered. A link is a relationship between resources that refer to one another through uniform resource identifiers (URIs). Whereas hyperlinks connect hypertext documents, which are primarily written in HTML, RDF (Resource Description Framework) describes the links that connect data resources. A number of conditions have to be met to link data resources. The first is that resources are identified by URIs, which act as names for them; naming resources in this way makes it easy to relate similar ones. The second is that useful information should be served when a URI is looked up.
Through such a system, the classes and properties of information defined in OWL, RDFS and RDF ontologies can easily be accessed. These, among others, are the main conditions for realizing a linked data platform.

UML Database Diagram Design
Fig. 1: UML Database Design

Database Analysis
Analysis of the SQL data involves interlinking related data so that simpler models of access and organization are achieved. The following variables will be analyzed in this process:
pid
start_time
end_time
epoch_start
epoch_end
complete_title
media_type
masterbrand
service
brand_pid
is_clip
categories
tags

Analyzing the SQL file involves breaking the data down into parts that can be understood and organized in the database system. This process of decomposing data into well-structured, related tables is called normalization. Normalization has different levels:
First Normalization (first normal form, 1NF)
Second Normalization (second normal form, 2NF)
Third Normalization (third normal form, 3NF)

First Normalization
In the first normalization, the data are linked as they generally appear in the UML design above: tables are drawn and attributes are linked broadly according to how they are related. Formally, a table is in first normal form when every column holds a single, atomic value and there are no repeating groups. For instance, since media type is the most frequently appearing entity, it appears as the linking entity in the first normalized table of this database, as shown below:

Media Type: Brand, Service, Time, Media Type, Category, Tags, Epoch, pid, Masterbrand, Is_clip, Complete_title

It is worth noting that these relations focus on the main entities in the database.

Second Normalization
The second normal form of the database is created by building tables that further relate the attributes identified in the first normal form. Formally, a table is in second normal form when it is in first normal form and every non-key attribute depends on the whole of its primary key rather than on only part of it. Here, the relations go into the attributes that a given entity has. For instance, one such entity is time, which has four attributes: start_time, end_time, epoch_start and epoch_end.
This is shown below:

Media_type: Media_type_id, Category
Time: Start_time, End_time, Epoch_start, Epoch_end
Service: Media_type_id, Service_Name, Service_Type
Brand: Masterbrand, Brand_PID, Brand_Name

All of these attributes, despite having distinct features, are placed in the same table as long as they belong to the same entity. This type of normalization moves from the overall level to the entity level.

Third Normalization
In the third normalization of the database, the attributes themselves are defined: tables are created so that each focuses on a single unique feature of a given attribute. Formally, a table is in third normal form when it is in second normal form and no non-key attribute depends on another non-key attribute, that is, there are no transitive dependencies. An example is the time entity, which has both epoch and time attributes. In the third stage of normalization these two features are separated into two different tables, as shown below:

Time: Start_time, End_time
Epoch: Epoch_start, Epoch_end
Service: Service_Name, Service_ID
Brand: Brand_PID, Brand_Name

Normalization is instrumental in making data easily accessible to all.

Conclusion
The data was analyzed by normalizing its tables into parts that a user can easily relate to and query. The data was then linked to a web-based platform where PHP code generated reports for any query a user needed. The project as a whole was a success.

References
BBC (n.d.) "Ontologies", BBC Online. Available at: http://www.bbc.co.uk/ontologies (last accessed March 5, 2015).