StudentShare
Contact Us
Sign In / Sign Up for FREE
Search
Go to advanced search...
Free

DB Design From Existing Dataset - Report Example

Cite this document
Summary
This report "DB Design From Existing Dataset" focuses on a project that aims at modeling a database to fit the data retrieved from BBC ontologies. The main objective of the project is to facilitate data linkage and integrity. The final design is a robust system. …
Download full paper File format: .doc, available for editing
GRAB THE BEST PAPER98.8% of users find it useful
DB Design From Existing Dataset
Read Text Preview

Extract of sample "DB Design From Existing Dataset"

Logo) and DB Design from Existing Dataset March 24 This project aims at modeling a database to fit the data retrieved from BBC ontologies. The main objective of the project is facilitate data linkage and integrity. The final design is a robust system that assures data consistency and reliability for relative large data-sets. In the report an attempt has been made to first establish an understanding of how the BBC ontologies work as it is important before embarking on any activity of representing their data in the database. With the knowledge of how ontologies operate, a sufficient foundation had been created that was used to design the database to hold the programme ontology. Finally, an attempt was made to analyze the effectiveness of the database designed by demonstrating how queries and views could be generated from the database. The design is convincing enough in attempting to model linked data using a relational database. PostgreSQL was used in modeling the database. Table of Contents 1Abstract 2 2Requirements for implementation of the system 4 3Introduction 4 4Understanding BBC Ontologies data organization 6 5Design of Relations (Tables) 9 5.1Categories Table 10 5.2Brands table 11 5.3Media_types Table 12 5.4Masterbrands Table 12 5.5Services Table 13 5.6Programmes Table 13 6UML Diagram for the Database 16 7Analysis 17 7.1Justification for the database choice and data clean-up done 17 7.2Sample Queries from the database 18 7.2.1What category and/or brand do we have clips for? 18 7.2.2 What programmes are about money and politics 19 7.2.3What programmes are about music and politics 21 7.2.4 Are there any...? Queries 21 7.2.5 How many...? Queries. 22 7.2.6Other useful queries include: 24 7.3Views 24 8Conclusion 26 9Reference 27 2 Requirements for implementation of the system I. Object Relational Database System like MySQL and PostgreSQL. The latter was chosen because sufficient grasp of concepts in it was available. It is an ORDBMS that properly handles data relationships via features like inheritance and other OR features like joins and rules. II. Due to the number of records in the database, a software for maintaining the data and viewing is necessary preferably phpMyAdmin because it supports both PostgreSQL and MySQL databases. Also because the web system is to be implemented upon PHP as the server side language. pgAdmin III could also be used to manage the postgres data III. A web server, Apache. Preferred because it is open source thus readily available and has enough support, documentation and is widely used. IV. An able computer system running on UNIX like systems, Windows or OSX. However, most people report that postgreSQL does not work well on Windows, this leaves the two platforms as the most preferable. 3 Introduction In a world with millions of people in search for information, knowing exactly what each needs is not only impossible but unrealistic. The contemporary world has seen momentous advancements in the way information is accessed compared to earlier centuries. Several media exist through which information is disseminated to cater for users with diverse needs and capabilities. Audio, video, print and the web are probably the most striking media for this purpose. Providers of such services run into the challenge of organizing their data in way that presentation is easier and in a way that approximates what people might look for whenever they access these media. It becomes therefore, a matter of abject significance to model systems that organize data in way that it can be easily presented in a logical manner with economical use of resources and time. BBC Ontologies is one technology reminiscent of such systems and has been largely relied upon in creating a system that attempts to present linked data to users via the web. The web system modeled presents part of BBC data customized for presenting available programmes. To properly represent this data, a basic understanding of the manner in which BBC organizes its data has been established a requisite to modeling a database driven web based system that is both scalable and linked. The data is adapted from BBC programmes ontology and is stored in a relational database, PostgreSQL, which is an open source ORDB (Object Relational Database). The front end system will be a web system running PHP as the server side language, Apache as the server and the browser as the client. 4 Understanding BBC Ontologies data organization BBC has ontologies that describe varied fields of the data they provide. The data items are linked in way that facilitates a logical retrieval for use. These ontologies include sport, storyline, wildlife, programme, politics, food, creative works and some other categories. Data is grouped in classes that in turn have properties. The programme Ontology presented for design in this assignment organizes the data in the following manner. NO. CLASS SUB-CLASSES PROPERTIES 1. Programme (Encompasses almost all the listed properties as its sub-classes too) Actor, anchor, author, category, commentator, director, executive_producer, format, genre, masterbrand, microsite, news_reader, participants, performer, person, place, producer, service, subject, synopsis 2. Channel AM, LW, DAB, DUB, FM, IP Stream frequency 3. Version AudioDescribed version, OriginalVersion, shortenedversion, signLanguage Aspect-ratio, format, subtitleLanguage 4. Brand Value e.g (Top gear) 5. Broadcat Values can be like( FirstBroadcast, Repeat) broadcast_of, broadcast_on, schedule_date 6. Broadcaster 7. Category Format, genre, palce, subject, person 8. Episode 9. Radio NationalRadio, LocalRadio, RegionalRadio 10. Service outlet(Radio 4 LW), Radio, TV, Web Broadcaster, channel, location, outlet, ParentService 11. Season broadcast 12. Segment 13. Series Parent series 14. Subtitle text Table 1.0. Table of classes and sub-classes of BBC programme ontology. The dataset given has the fields listed below which we can map onto the original BBC class-property relationship shown in the table above so as to be able to model our database effectively. The mapping will help to achieve scalability of the database so that it can receive similar data from BBC as required in the assignment instructions and also help establish the right data linkage so that systems built upon this database may be able to retrieve related information easily. twoweek_proglist.csv fields used pid start_time end_time epoch_start epoch_stop complete_title media_type masterbrand service brand_pid is_clip categories tags A closer examination of the data contained in these fields helps determine which are ones are just properties of the programme class and which ones are sub-classes. In this implementation, properties have been modeled as columns of table programme while sub-classes have been treated as separate tables linked to the programme table. Properties identified from the list of fields above are pid start_time, end_time, epoch_start, epoch_stop, complete_title, is_clip and tags. This is because their values are so varied that separating them in individual tables does not eliminate redundancy but rather adds to the size of the database. Sub-classes have been considered independent tables because far as they help reduce replication of data over rows, they contain some other information that is indirectly related to a programme. Another reason why sub-classes have been represented as tables is because the data from the various ontologies other than programme ontology use the information from these subclasses and it would be inefficient to tie them up to a single table where other ontology tables would find hard to utilize the information. For instance, classes like Season, Brand, Broadcast and Category would be sub-classes of such ontologies as sport, food, politics, and of course programme. When we include the sub-classes tied into a single table, it implies that every ontology will have separate columns/fields for each of the sub-classes. This has several downsides, first it greatly increases the database size making it necessary to have more disk space. Second, it brings about data redundancy in the schema which has its own disadvantages to the performance of the database system. In addition to these, duplicate fields make it hard to establish links between fields and relations thus making the system ineffective in meeting the main objective (in our case) modelling linked data using relational databases. Therefore, in effort to meet the objectives of this project – a database to facilitate linked data and that can be extended to accommodate new data – the fields, media_type, masterbrand, service brand_id, and categories will be contained in independent tables which are linked to the programmes table.They contain data that is repeated over numerous rows and this step significantly solves this replication. The approach also tries as much as possible to mimic the data organizaation at BBC by having classes and sub-classes and their corresponding properties (which may be sub-classes of that same class like the case in question where category, service, masterbrand and others are sub-classes of programmes and at the same time are its properties). 5 Design of Relations (Tables) Breakdown of tables to be used programmes categories masterbrands media_types services Given that programmes table extracts most of its data from the other tables, it is reasonable and necessary to first create these other tables before creating the programmes table otherwise its creation will be prevented because of the foreign key constraints that are used in establishing links. In this project, data linking has been implemented solely based on relational database structures such as rules and joins. It was desired that inheritance be implemented where tables that inherit other tables be used. However, that would require complete knowledge of the data to anticipate from BBC besides the data available in the proglist.csv did not provide complete information on individual tables making it hard to establish clearly which tables are children of other tables and based on what fields? Efficient implementation of rules (foreign keys and primary keys) has been utilized, however, to offer the same linkage that would be offered by inheritance features. The tables designed do not exhaustively represent the BBC programme data because some fields did not have data provided and have been left empty until such data is available before they are populated through updates. In light of this, some data in tables has been combined into single fields because an understanding of how the merging of fields was done is missing such as in the categories and tags columns of the raw data. The columns have combined data which is hard to split into individual attributes owing to lack of the format with which they were combined. 5.1 Categories Table This table describes a programme in terms of its format, genre, place, subject and person. Classification based on these groups can be seen in the categories column on the raw data: format (92000059:2, 91000098:1 etc), genre (factual, drama, learning etc), place (pre-school, scotland), subject (politics, history, psychology) and person (adult, children). Information of the data types expected can be easily gathered from this data and used to create tables to hold such data. The table below describes how to create the categories table. The actual SQL syntax for PostgreSQL is available in the .sql file provided. Note that the category_id has been added to be used to establish relationship with other tables. As mentioned earlier, the way in which the various properties of categories have been combined is not known so populating the fields independently is not possible at this stage. The table will therefore combine the fields as in the raw data into another field called details which would still represent the same data as in the individual fields. Categories table column_name data_type Constraint (s) null_allowed category_id Serial Primary key, index_column NO details text NO An index is created on category_id because it is the one used in retrieving values in the correspond row. Indexing speeds up retrieval of data from a table and categories has considerably huge amount of data. Note: Given the categories are combined differently, it is not possible to implement a single table to accommodate these varied data-sets. Due to this, the column for categories is entered into the programmes table as a field rather than a link to another table. 5.2 Brands table This table does not contain much data, it just gives the IDs of brands whose names have not been presented in the raw data. This table is therefore modeled with empty field for brand names for such information can be easily fed against the PIDs. Brand_pid is the primary and index column used to reference other tables and for searching. Brands table column_name data_type Constraint (s) null_allowed brand_pid varchar(10) Primary key, index_column NO brand_name varchar(100) YES Note: Given that brand_names have not been given, this table will not be created for the brand_pids available are sufficient to represent the required information. It has been included in this discussion just in case categorization of brands by name is done, then table need to be created. 5.3 Media_types Table Data in the column describes the type of media such as audio and video. There are date values in the column which have clearly been misplaced and will be replaced by a string “unknown” which can be replaced with appropriate media type upon confirmation. Media_type_id is the primary key and index column for referencing and searching. Media_types Table column_name data_type Constraint (s) null_allowed media_type_id Serial (integer) Primary key, index_column NO media_type_name varcha(20) NO Syntax CREATE TABLE media_types ( media_type_id SERIAL PRIMARY KEY, media_type_name VARCHAR(20) ); 5.4 Masterbrands Table Data presented describes the various masterbrands that BBC broadcasts through like bbc_four, bbc_three among others. Like the preceding tables, masterbrand_id is the primary key and because it does not have large data set, it does not have an index column. Masterbrands Table column_name data_type Constraint (s) null_allowed masterbrand_id Serial (integer) Primary key NO masterbrand_name varchar(50) NO Syntax CREATE TABLE masterbrands ( masterbrand_id SERIAL PRIMARY KEY, masterbrand_name VARCHAR(50) ); 5.5 Services Table Data is similar to that masterbrands table but not entirely. The table is created as shown below. Services Table column_name data_type Constraint (s) null_allowed service_id Serial (int) Primary key NO service_name varchar(50) NO Syntax CREATE TABLE services ( service_id SERIAL PRIMARY KEY, service_name VARCHAR(50) ); 5.6 Programmes Table This is the main table that is linked to the above mentioned tables. It has the tables as its properties together with the other properties shown in the raw data sheet. The table instead of having the service_name or category_name, it has an ID corresponding to a column in the respective table. This does not only simplify the design of the programmes table through reduction in complexity but also helps eliminate redundancy in the database where fields are repeated severally in a single relation. The properties entered directly into the table are programme_pid, start_time, end_time, epoch_start, epoch_end, complete_title, is_clip and tags because they are unique values for almost every row in the table. Linked fields are media_type_id, masterbrand_id, service_id, brand_pid and category_id. The relationship is enforced via foreign key rule defined on those columns in the programmes table. This is the main table upon which data retrieval for the web system will be built. It dictates the kind of data to be retrieved and the various links possible. Therefore it forms the central part of the system upon which the logic of the system is controlled. Based on this table, a program can be defined to have start time and end time, has a title and tags relating it to other programs or things in the database, belongs to a particular brand, category and service and is a certain media_type like audio or video. Programmes Table column_name data_type Constraint (s) null_allowed programme_pid Serial (integer) Primary key, Index column NO start_time real NO end_time real NO epoch_start vrchar() NO epoch_end varchar() NO complete_title varchar(100) NO media_type_id integer foreign_key references media_types NO masterbrand_id integer foreign_key, references masterbrands NO service_id integer foreign_key, references services NO brand_pid varchar(10) YES is_clip boolean Default false YES categories varchar(150) NO tags text YES Syntax CREATE TABLE programmes ( programme_pid VARCHAR(10) PRIMARY KEY, start_time REAL, end_time REAL, epoch_start VARCHAR(20), epoch_end VARCHAR(20), complete_title VARCHAR(150), media_type_id INTEGER references media_types, masterbrand_id INTEGER references masterbrands, service_id INTEGER references services, brand_pid VARCHAR(10), is_clip BOOLEAN default false, categories VARCHAR(250), tags TEXT ); 6 UML Diagram for the Database 7 Analysis 7.1 Justification for the database choice and data clean-up done For analytic purposes, performance of PostgreSQL is quite powerful as compared to MySQL which is itself tailored for transaction based operations. Given that the problem in question is more of analysis rather than transaction, preference was given to PostgreSQL. This was in effort to exploit the vast features that the DBMS offers to facilitate data analyses which happens to be just what objective requires. These include ability to define complex links via multi-value fields also called nested tables and class based inheritance. To add on this, PostgreSQL has wide support and has been known to work well for large data driven systems such as Instagram and Amazon, has wide range of data-types including user-defined ones and supports loadable modules where one can write procedures in other languages like C, Perl and Python. As mentioned earlier, several modifications had to be done on the original before it could fit the design desired. The data was redundant in some cases and had to be cleaned-up before storing in the tables. Use of separate tables for services, media types and master brands, and their IDs in the programmes table is one such effort made to eliminate data redundancy in the system. In addition, there were entries in some fields that did not rhyme with the design so they had to be replaced by meaningful values. This was performed on masterbrands that were empty because it was thought illogical for a program no to belong to any master brand. Popular master brands were assigned to such empty fields. Similar actions were performed on is_clip columns by defining default value because the field presents a yes-no question which must have a value. The data type of some columns was changed too so as to achieve some level of logic in the relationship between data. For instance, is_clip is originally presented as SMALLINT/BIT having either 0 or 1 to indicate true or false. Implementation however, treats the column as BOOLEAN type and the values 0 and 1 are converted to false and true respectively. Because there are only two possible values for the column, empty values were defaulted to false. Categories table was not implemented, despite having been considered in the design, for a number of reasons. To have a separate table would require that clear knowledge about the data to be included in it is available. However, in this case, though the knowledge of what columns to include in the table based on the definitions at BBC site, the manner in which the category field data was combined remained unknown. Even though it was possible to store a category against its format, recombining them and linking to programmes table would prove an uphill task. The field was therefore included on the programmes table as it was rather than in a separate table linked to programmes as in the case with services, masterbrands and media_types. The decisions to represent data exactly as provided has some downsides on the operation of the system. To retrieve a program under a given category would therefore require the use of LIKE operator and REGULAR EXPRESSIONS because the column does not take on single categories but several concatenated categories. Tags column also suffers a similar disadvantage. The rigor of designing queries that use LIKE operator and Regex is much less compared to that of combining foreign keys into a single column which in fact makes the linked tables lose their tie. Therefore, separate tables for categories and tags were not created rather, the columns were inserted into the programmes table as they were. It is evident that after the clean-up on the original data, the resulting database is less bulky, less redundant and more meaningful as compared to the original data. 7.2 Sample Queries from the database 7.2.1 What category and/or brand do we have clips for? Here, we retrieve a result set containing category, masterbrand and (though not very necessary because we do not know exactly if a brand_pid defines a masterbrand or otherwise) brand_pid. This query in itself can help establish whether the relationship exists. Using this query, one can know how much of the programmes offered are clips and to which brand(s) and category(s) they belong. Syntax SELECT categories, brand_pid, masterbrand_name FROM programmes INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes.masterbrand_id WHERE is_clip = true ; Sample output from the database system designed "[9100098:1:comedy.9200011:2:sketch]";"p002s0wp";"bbc_radio_four" "[9100005:1:factual.9200059:2:science_and_nature]";"b00lwxj1";"bbc_one" "[9100005:1:factual.9200059:2:science_and_nature]";"b00lwxj1";"bbc_one" "[9100098:1:comedy.9200011:2:sketch]";"p002s0wp";"bbc_radio_four" From this result, we can speculate that the brand_pid correspond to masterbrands but because no specification has been provided to indicate that there are brands and masterbrands existing separately, this implementation was reluctant to assume that the brand_pid actually describe masterbrands. 7.2.2 What programmes are about money and politics Aim is to identify programmes that are associated with money and politics. Therefore, given that we have no column that explicitly stores money or politics as its only value, it becomes inevitable to use operators so that we find phrases containing the keywords mentioned above – money and politics. Such searches in this database can be done by defining a WHERE clause that requires matching columns to have a certain pattern, in this a pattern with money and politics in it as shown below. SELECT programme_pid, complete_title FROM programmes WHERE categories LIKE %money%politics%; This means that the result set will have columns that contain the words money and politics in their categories column. Note that in this case, only those columns that have money coming before politics will be retrieved. To retrieve all cases where the two words have been mentioned regardless the order, an OR keyword must be added that defines the reverse order. SELECT programme_pid, complete_title FROM programmes WHERE categories LIKE %money%politics% OR categories LIKE %politics%money%; % sign is used to show ANYTHING. Reading the condition says that find, a program where the categories column has anything followed by the word money, followed by anything then the word politics which is in turn followed by anything. The other condition after OR is the exact reverse of the one before OR. Sample output and analysis of the result set "b00qn37q";"the_virtual_revolution:_the_great_levelling?" "b00qsbvv";"the_virtual_revolution:_enemy_of_the_state?" This result alone is not so helpful. However one also be interested to know the brands offering this programmes so that they know what to tune in to. An improved query involving a join has been provided below that gives a more informative result. SELECT complete_title, masterbrands.masterbrand_name, media_types.media_type_name FROM programmes INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes. masterbrand_id INNER JOIN media_types ON media_types.media_type_id = programmes.media_type_id WHERE categories LIKE %money%politics% OR categories LIKE %politics%money%; Sample output "the_virtual_revolution:_the_great_levelling?";"bbc_two";"video" "the_virtual_revolution:_enemy_of_the_state?";"bbc_two";"video" "the_world_debate:_whos_business_is_it?";"bbc_three";"audio" "americana:_24/05/2010";"bbc_three";"audio" "on_the_money_with_robert_peston";"bbc_two_wales_digital";"video" "the_bottom_line:_30/03/2010";"bbc_three";"audio" From this, we can see that most of the programmes on money and politics are aired mostly on bbc_two and bbc_three both in audio and video. More inferences can be made on the data if a larger result set is examined. 7.2.3 What programmes are about music and politics The query is similar to that in the second question only that the keywords change to be music and politics. Here is the SQL code for the query. SELECT complete_title, masterbrands.masterbrand_name, media_types.media_type_name FROM programmes INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes. masterbrand_id INNER JOIN media_types ON media_types.media_type_id = programmes.media_type_id WHERE categories LIKE %music%politics% OR categories LIKE %politics%music%; Similar analysis can be done on the results as in the previous query. 7.2.4 Are there any...? Queries 7.2.4.1 Are there any masterbrands similar to services? SELECT DISTINCT service_name FROM programmes INNER JOIN services ON services.service_id = programmes.service_id INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes.masterbrand_id WHERE masterbrand_name = service_name ORDER BY service_name ; 7.2.4.2 Are there any programmes starting on the same date? SELECT complete_title FROM programmes WHERE epoch_start = 2012-07-25T01:13:01; 7.2.4.3 Can someone search for a programme details if they know some words but not the complete_title? SELECT complete_title, masterbrands.masterbrand_name, epoch_start, epoch_end FROM programmes INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes.masterbrand_id WHERE complete_title LIKE %mesopotamia%; Sample output "crime_catch-up:_hercule_poirot_-_murder_in_mesopotamia";"bbc_7";"2010-05-29T15:24:25";"2010-06-05T15:24:25" "crime_catch-up:_hercule_poirot_-_murder_in_mesopotamia";"bbc_7";"2010-05-29T15:32:00";"2010-06-05T15:32:00" "hercule_poirot_-_murder_in_mesopotamia:_episode_1";"radio_wales_arts_show";"2012-07-24T01:13:06";"2012-07-31T01:13:06" "poirot_-_murder_in_mesopotamia:_episode_2";"radio_wales_arts_show:";"2012-07-25T01:13:01";"2012-08-01T01:13:01" "poirot_-_murder_in_mesopotamia:_episode_4";"radio_wales_arts_show:";"2012-07-26T06:33:37";"2012-08-02T06:33:37" Studying the output above reveals some important information that first programmes can be aired several times in a day on a single channel. For example, crime_catch-up:_hercule_poirot_-_murder_in_mesopotamia is aired on BBC 7 several times in a day from 2010/05/29 to 2010/06/29 at 15:24 and 15:32 hours each day of the interval. This is shows that it is possible for users to perform searches in the system by specifying just a little information that they require. 7.2.5 How many...? Queries. These type of queries are important in carrying out analyses about the programmes thus educating us more about the kind of organization that BBC, in this case, has. They can also be used to carry out administrative tasks on the data for example searching and deleting empty columns or knowing if data is inconsistent by simply counting rows and other attributes. 7.2.5.1 How many programs are aired in a given interval? SELECT COUNT (*) FROM programmes WHERE epoch_start LIKE %2010-05% AND epoch_end LIKE %2010-06% Result: (bigint) 1850 7.2.5.2 How many programmes do not have any tags? SELECT COUNT (*) FROM programmes WHERE tags = []; 7.2.5.3 How many programmes are under a particular masterbrand? SELECT COUNT (*) FROM programmes WHERE masterbrand_id = 23; Or one can still may desire to retrieve all programmes that are under a certain masterbrand by their titles but since the result is bound to be large, the best option is always to count the rows matching the search criterion before attempting to display them. In the web system where queries return a large result set it is wise to either limit the number of rows returned although this affects performance because the search result could be among the data that is cut out of the allowable interval set. To help reduce this effect, queries should narrow as close to the target value as possible. General queries are harmful to the performance of the system. Lastly one can also desire to know how many programmes are under a given category. It was seen earlier that due to the lack of knowledge about how categories have been concatenated, retrieval of data on this column would prove challenging. However, with a little knowledge on how to use operators and regular expressions, this system limitation is overcome or at least its effect is smothered partially. We can comfortably retrieve data in a given category. The disadvantage though is that you cannot associate a programme with exactly category nor can you get the properties of a category to which a programme belongs for the same reason that its format is not uniform. Below is an example of how to retrieve and count programmes in a given category. SELECT complete_title FROM programmes WHERE categories LIKE %drama%; Or search by its number as follows SELECT complete_title FROM programmes WHERE categories LIKE %9100003:1%; Or simply count as follows SELECT COUNT (*) FROM programmes WHERE categories LIKE %history%; 7.2.6 Other useful queries include: 7.2.6.1 Retrieve programmes that are clips SELECT programme_id, complete_title FROM programmes WHERE is_clip = true ; 7.2.6.2 Retrieve masterbrand name, service name and media type for a particular programme SELECT masterbrands.masterbrand_name, services.service_name, media_types.media_type_name FROM programmes INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes. masterbrand_id INNER JOIN services ON services.service_id = programmes.service_id INNER JOIN media_types ON media_types.media_type_id = programmes.media_type_id 7.3 Views Views can also be created to store data that is frequently accessed so that new queries do not have to be written for every request a client makes. Example of a view CREATE VIEW vw_programmes AS SELECT masterbrands.masterbrand_name, services.service_name, media_types.media_type_name FROM programmes INNER JOIN masterbrands ON masterbrands.masterbrand_id = programmes. masterbrand_id INNER JOIN services ON services.service_id = programmes.service_id INNER JOIN media_types ON media_types.media_type_id = programmes.media_type_id This view can be used to store the query written above so that it does not have to be written again. Views behave like tables only that they cannot be updated as they actually are not tables. However reports can be built upon a view. Views have the advantage that they always contain up to date information about the database tables. These are suitable in reports as they will consequently always give up to date reports. To automate activities upon the database like tracking data fields that have been manipulated or deleted, triggers can be enforced on fields to log such changes. With triggers are functions that can be used to combine data in ways that result in more constructive information. The database has managed to implement several aspects desired for this project. First, data has been linked through linking tables via foreign keys and primary keys that are referenced during query execution. Through these keys, rules have been enforced too that dictate what data should go into certain columns and if it should be null or not. The database can be scaled up via additional tables and columns onto the existing ones. Column names can be altered too and dropped if needed. This flexibility is desirable because the data received for storage changes often according to the demands of the users and organizational objectives. Updates on the database cannot be haphazardly done because there are constraints on tables and columns. Deletion cannot for instance succeed on a column referenced by another table (unless cascaded). Triggers can be defined on columns so that an update on one column would take effect in dependent columns. This way, it is possible to circumvent the limitations of not using inheritance on the tables. Restricting manipulation of tables and their columns helps not only to preserve data integrity but also helps maintain the relationship between tables that brings about linkage in data. 8 Conclusion As shown in the above discussion, the database design attempts to accomplish the objectives of the study. Given that the sample data upon which the system is built was not too huge, the implementation chosen handles such data comfortably. There are not too many fields and tables to implement therefore the approach comfortably handles relationships between the few tables using relational joins, views and functions. It is effective and simpler compared to noSQL and XML based because they require complex codes to read. The approach was motivated by the availability of sufficient documentation on the implementation of various aspects of the database. PostgreSQL is an open source object relational database system that has vast support and enhancing the design later would be simpler given the active updates available for the database. Much as the design herein attempts to qualify the requirements, it can be inferred that it has not exhaustively exploited the linking capabilities of PostgreSQL. Given large data with numerous tables and relationships, the use of foreign keys alone cannot be used to implement such a system. The lack of inheritance also impacts negatively on the extent to which the design meets the objectives. However, to drive a web system with data of the quantity provided, this design is sufficient. 9 Reference BBC. Ontologies, [www]. BBC Online, n.d., n.p. Available at: http://www.bbc.co.uk/ontologies [accessed March 10, 2015] Read More
Tags
Cite this document
  • APA
  • MLA
  • CHICAGO
(DB Design From Existing Dataset Report Example | Topics and Well Written Essays - 5000 words, n.d.)
DB Design From Existing Dataset Report Example | Topics and Well Written Essays - 5000 words. https://studentshare.org/information-technology/1861478-db-design-for-existing-dataset-and-analysis-s
(DB Design From Existing Dataset Report Example | Topics and Well Written Essays - 5000 Words)
DB Design From Existing Dataset Report Example | Topics and Well Written Essays - 5000 Words. https://studentshare.org/information-technology/1861478-db-design-for-existing-dataset-and-analysis-s.
“DB Design From Existing Dataset Report Example | Topics and Well Written Essays - 5000 Words”. https://studentshare.org/information-technology/1861478-db-design-for-existing-dataset-and-analysis-s.
  • Cited: 0 times

CHECK THESE SAMPLES OF DB Design From Existing Dataset

Managerial Decision-Making in Enron

The non-rational escalation of commitment is one of the cognitive actions that can be related to a decision to be able to recover from certain loses which can be attributed to decisions made by the company.... For that matter commitments and decisions from the group which for some reasons can be irrational in certain cases and standards are made (Bachman & Zaheer, 2006).... Such actions target the virtually Running Head: Research design A Research design In APA Style I....
4 Pages (1000 words) Essay

Are the German Banks Riskier than the European Competitors

The present paper tries to explore the fact that the German banks have in the process been successful in implementing anti-risk measures in their policies in comparison to their European peers.... Use of statistical software STATA has been made to draw a comparative analysis.... … According to the report Are the German Banks Riskier than the European Competitors?...
7 Pages (1750 words) Essay

Flash Game using existing code

Some of the common codes used for making Game Flash Using existing Game Flash Using existing s In the modern world, games are everywhere.... om/gameflahplayer) Even for a developer who has experience, few key tools can be helpful since they can implement existing codes with simple text editor such as; (Notepad or Text editor) and any other HTML5-capable web browser; if one intends to do any serious work, it would be necessary to have syntax highlighting, a web inspector, JavaScript debugger and JavaScript console....
1 Pages (250 words) Coursework

Facility design :Clean Room Reducing Level

Due to this There is intense pressure from the management to flatly reduce costs of this site or risk closure.... The cost reduction option which was developed by the site team was to outsource three of the mature drugs from another BFT site located within the country and with lower labor rates.... This was to be followed by downgrading both cleanrooms from class 100 to 1000.... When it comes to the design difference between class 100 and 1000 clean rooms, it is evident that class 100 has unidirectional air flow system with average airflow velocity of 50-90 fpm (1)....
4 Pages (1000 words) Essay

Individual Project

dataset on cash flows for Coca-Cola Company was retrieved from the quarterly financial reports of the company as posted on its website and reported by the media.... From the dataset on the quarterly financial postings of Coca-Cola, the cash flows before the 2010 FIFA World Cup were $1.... From this online dataset, Stanford reports a 16% increase in sales volume in North America for Coca-Cola since 2007, just as the overall change in sales.... FIFA World Cup, a football tournament bringing together teams from around the world, is one event where Coca-Cola features extensively....
2 Pages (500 words) Research Paper

IC Insights - Dataset Analysis

Generally, contents of a single data matrix are recorded in a dataset to give the user a comprehensive view of a particular topic (Mirer, 2014, n.... A dataset is prepared using relevant portions of a database.... In a dataset,… Each value of the variable in a dataset is called datum (Johnson, 2000, p.... Depending on the number of rows, one or more members and corresponding data may In the current complex business environment, dataset is widely used in several forms to analyse and interpret data and to draw up meaningful conclusions....
12 Pages (3000 words) Essay

Dataset for Salary

The assumption of matched participants did not change the hypothesis or assumption of a directional relationship, so it did not change the type of t-test; however, equal variances could be assumed with this dataset for Salary dataset for Salary A two-tailed t-test with unequal variances should be performed for the sample data.... No, the dataset could not be from a true experiment because it was not randomly assigned.... This is clear from examining the T Test results that were constructed, which… 4....
1 Pages (250 words) Essay

Statistical Dataset Variables Significance

The essay "Statistical dataset Variables Significance" focuses on the statistical analysis of the dataset variables' stylistic significance.... The coefficient values range from -1 to +1, any value close to +1 means that there is a strong positive linear correlation while a value of -1 means a perfect negative linear correlation....
6 Pages (1500 words) Essay
sponsored ads
We use cookies to create the best experience for you. Keep on browsing if you are OK with that, or find out how to manage cookies.
Contact Us