Relational Database Management System Critique

University Affiliation:

Relational Database and Web Integration

Introduction

The database management system (DBMS) was introduced in the 1960s. A DBMS is system software for creating, retrieving, updating, and managing databases. A decade later, in the 1970s, the relational database management system (RDBMS) was introduced and outshone the older DBMS for several reasons, among them query speed. The RDBMS was widely recommended for data-heavy organizations such as banks, not only for its data independence and concurrency control but also because its relational model processed queries faster than the older navigational model. This paper discusses the alternatives of RDBMS and big data under the subtopics of volumes of data, types of data, data storage, and security.

RDBMS Structure

Krishnamurthy, Thombre, Conway, Li and Hoyer (2014) explain that relational databases store data in tables (rows and columns), where each table has a primary key that uniquely identifies each record in that table and cannot be NULL. Tables also have foreign keys, which link rows to related rows in a different table. In most circumstances, a foreign key is the primary key of the related table. The relational model also uses indexes to locate rows quickly, which makes storing and fetching data much faster than in the navigational models (Krishnamurthy et al., 2014). An RDBMS database server has two sections: the database itself and the instance. The database consists of the storage model, in which control files, logs, and data (in tables and indexes) are stored on disk. The instance consists of the process model and the memory model. The process model handles processes such as the reader, writer, checkpoint, and logging processes. The memory model handles the data held in memory (SQL statements, meta code, plans, and tables) (Krishnamurthy et al., 2014).
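
The roles of primary keys, foreign keys, and indexes can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a full RDBMS server, and the department and employee tables are hypothetical names chosen only for this example.

```python
import sqlite3

# In-memory SQLite database used purely for illustration.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
-- The index speeds up lookups of employees by department.
CREATE INDEX idx_employee_dept ON employee(dept_id);
""")

conn.execute("INSERT INTO department VALUES (1, 'Finance')")
conn.execute("INSERT INTO employee VALUES (10, 'Ada', 1)")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Reusing primary key 10 violates uniqueness.
duplicate_key_ok = try_insert("INSERT INTO employee VALUES (10, 'Bob', 1)")
# Department 99 does not exist, so the foreign key rejects the row.
orphan_row_ok = try_insert("INSERT INTO employee VALUES (11, 'Cy', 99)")

# The foreign key lets related rows from both tables be joined.
rows = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id"
).fetchall()
```

Both rejected inserts leave the tables unchanged, so the join returns only the one valid employee together with the name of the related department.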

Based on market share, there are three main alternatives of RDBMS: the Oracle database, Microsoft SQL Server, and MySQL.

Oracle Database

Oracle database software was developed in the late 1970s by the company now known as Oracle Corporation. Oracle database is a widely used and trusted RDBMS because of its security features. Oracle databases are queried with Structured Query Language (SQL), whose keywords are case insensitive. For example, both “select UserID, PropertyValue from UserProfile” and “SELECT UserID, PropertyValue FROM UserProfile” execute the same way. Oracle database is organized into two separate structures: the logical structure and the physical structure. Because the two are separate, a user can manage the physical storage of data without affecting access to the logical storage structures (Harrington, 2016).
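
The case insensitivity of SQL keywords can be checked with a short sketch. It runs the two statements quoted above against SQLite through Python's sqlite3 module, which is an assumption made for convenience; it is not an Oracle server, and the single sample row is hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample row is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserProfile (UserID INTEGER, PropertyValue TEXT)")
conn.execute("INSERT INTO UserProfile VALUES (1, 'blue')")

# The same statement written with lowercase and uppercase keywords.
lower = conn.execute("select UserID, PropertyValue from UserProfile").fetchall()
upper = conn.execute("SELECT UserID, PropertyValue FROM UserProfile").fetchall()
same_result = lower == upper

# Keywords ignore case, but string values are still compared exactly
# under SQLite's default collation.
exact_matches = conn.execute(
    "SELECT COUNT(*) FROM UserProfile WHERE PropertyValue = 'BLUE'"
).fetchone()[0]
```

The two queries return identical rows, while the comparison against 'BLUE' finds nothing, which shows that case insensitivity applies to the language keywords and not to the stored data.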

Oracle database classifies security into two categories: system security and data security. Elmasri and Navathe (2015) explain that system security controls access to and use of the database at the system level. Its main duty is to check whether a user has the correct credentials to connect to the database, what permissions have been granted to the user, and whether database auditing is active. Data security, on the other hand, controls access to and use of the database at the schema object level. In other words, data security determines which users have access to which schema object, which actions are audited for each schema object, and how data is encrypted to prevent unauthorized persons from bypassing Oracle security measures to gain access (Krishnamurthy et al., 2014).
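
The difference between the two levels can be sketched as a toy model. Nothing here is Oracle's actual mechanism: the user table, the grants, and the function names are all hypothetical, and the plain SHA-256 hash is used only to keep the example short.

```python
import hashlib
import hmac

# Hypothetical credential store. Real systems use salted, slow password
# hashes; plain SHA-256 keeps this sketch short.
USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}

# Hypothetical grants: (user, schema object) -> allowed actions.
GRANTS = {("alice", "hr.employees"): {"SELECT"}}


def can_connect(user, password):
    """System security: may this user connect to the database at all?"""
    stored = USERS.get(user)
    if stored is None:
        return False
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(stored, supplied)


def can_access(user, schema_object, action):
    """Data security: may this user perform the action on this object?"""
    return action in GRANTS.get((user, schema_object), set())


def authorize(user, password, schema_object, action):
    """A request passes only if both levels allow it."""
    return can_connect(user, password) and can_access(user, schema_object, action)
```

In this model a user with valid credentials can still be refused a DELETE on a table for which only SELECT was granted, which is the separation the two categories describe.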

Oracle database has a backup and recovery feature. A database backup is simply a copy of the current database. The objective of backup and recovery is that, in case of a hardware or software malfunction, corrupted or lost data can be tracked and retrieved. In Oracle database, backup and recovery tasks are performed either with the Recovery Manager (RMAN) tool, which is integrated with the running session, or with a user-managed solution that relies on operating system commands and SQL*Plus commands (Krishnamurthy et al., 2014).

Microsoft SQL Server

Microsoft SQL Server is the flagship RDBMS developed by Microsoft Corporation. A database in SQL Server is created at the disk level and consists of two types of file: the data file and the log file. Data pages in the data file physically hold the data, while the log file stores the transactions executed on the database (Krishnamurthy et al., 2014). Unlike Oracle database, SQL Server uses an extended version of SQL called Transact-SQL (T-SQL). The architecture of SQL Server can be described under seven headings: pages and extents, files and filegroups, transaction log, query processing, memory management, thread and task, and table and index data structures (Elmasri & Navathe, 2015).

SQL Server has three major components: the relational engine, SQL OS, and the storage engine. The relational engine (also known as the query processor) manages the execution of query statements: it determines what each query requires and the best way to execute it, then returns the results to the user (Elmasri & Navathe, 2015). The main tasks of the relational engine include memory management, query processing, buffer management, thread and task management, and distributed query processing (Harrington, 2016). The second component is SQL OS, which sits between SQL Server and the host operating system, for instance Windows 10. SQL OS is a highly configurable layer with advanced parallelism that takes care of all activities carried out on the database engine. The last component is the storage engine, whose main task is to store and retrieve data from the database.

SQL Server also has backup and recovery features, and the one chosen depends on the disaster to be guarded against. The available solutions are grouped into four categories: failover clustering, database mirroring, log shipping, and peer-to-peer transactional replication (Elmasri & Navathe, 2015).

MySQL Database

MySQL server is a relational, open source, fast, easy to use, reliable, and scalable database server that is popular with newcomers to databases because of its well-constructed documentation (Erl, Khattak & Buhler, 2016). MySQL architecture comprises two layers: the application layer and the logical layer. Clients and users interact with the MySQL software in the application layer, whose three main components are clients, administrators, and query users. The logical layer is subdivided into four subsystems: transaction management, storage management, the query processor, and recovery management (Erl et al., 2016). These subsystems work together to process requests sent to the database server.

MySQL requires all its users to have strong passwords to protect the database against attacks, because the server has no other way to tell who exactly is trying to execute statements. When dealing with privileges, Erl et al. (2016) urge database administrators not to grant the FILE privilege to non-administrator users. They explain that with this privilege a user can write a file anywhere in the file system that the mysqld daemon has permission to write to, which can compromise the database. Another security measure is to ensure that the only Unix user with read and write permissions on the database directories is the account used to run mysqld. Lastly, MySQL adds a root user by default, and it is not advisable to run the MySQL server as the Unix root user (Erl et al., 2016).

MySQL server supports up to 11 backup and recovery types: logical versus physical (raw) backups, online versus offline backups, local versus remote backups, snapshot backups, full versus incremental backups, and full versus point-in-time (incremental) recovery (Erl et al., 2016). A remote backup is performed from a different host, whereas a local backup is performed on the same host where the server runs; in both cases, mysqldump can connect to the local or remote server. Lastly, as the name suggests, snapshot backups are available when the underlying file system supports taking a “snapshot”. MySQL is mostly preferred by learning institutions.
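
The distinction between logical and physical backups can be sketched in a few lines. The example uses SQLite through Python's sqlite3 module rather than MySQL, which is an assumption made so the sketch runs without a server: iterdump() plays the role of a logical dump (SQL statements, as mysqldump produces), and backup() copies the database pages directly. The student table is hypothetical.

```python
import sqlite3

# Source database with a hypothetical student table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
source.commit()

# Logical backup: a script of SQL statements that can recreate the data.
logical_dump = "\n".join(source.iterdump())

# Recovery from the logical backup: replay the script into an empty database.
restored = sqlite3.connect(":memory:")
restored.executescript(logical_dump)

# Physical-style backup: copy the database pages without going through SQL text.
page_copy = sqlite3.connect(":memory:")
source.backup(page_copy)

query = "SELECT id, name FROM student ORDER BY id"
original_rows = source.execute(query).fetchall()
restored_rows = restored.execute(query).fetchall()
copied_rows = page_copy.execute(query).fetchall()
```

Both routes reproduce the same rows. The logical dump is readable and portable text, while the page copy avoids regenerating and replaying SQL statements.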

Big data

In the database field, the phrase big data describes data of huge volume (structured or unstructured), high velocity, and high variety, which is difficult for older DBMSs to handle. More advanced technology is required to gather, store, and retrieve data from big data databases. These types of databases are mainly used in colleges, hospitals, and organizations that rely on the data stored in them for further analysis and better decision making (Mahajan, Gaba & Chauhan, 2016).

Volumes of data

Volume in big data is a term that describes not only the size or growth of data but also the ability to handle the variety of data types it holds. Coronel and Morris (2016) argue that the common misunderstanding of volume in big data is to reduce it to a matter of terabytes or petabytes. They explain that the idea of big data originates from any combination of volume, velocity, and variety. Yahoo and Google were among the first companies to confront volume in big data, and Amazon and Facebook later followed suit, because they ran huge enterprises that needed space to store documents, texts, and images and to retrieve them as fast as possible when required (Coronel & Morris, 2016).

For example, Facebook alone can generate over ten profiles a second and continuously receives over 180,000 new pictures of different formats and about 1,000 comments in less than five minutes. When this kind of data is collected continuously for a month, the resulting amount cannot be stored in the old database models, which forces organizations to turn to distributed systems, where data is stored in segments and more sophisticated software puts the segments together for analysis (Coronel & Morris, 2016). From the Facebook example alone, it is clear that volume in big data cannot be ignored, as there is no other way today to collect, analyze, and quickly retrieve such volumes of data at that speed (Harrington, 2016). The demand for large-scale storage has also brought about market competition, in which an organization such as Equifax can become a leading store of a nation’s data.
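
The idea of storing data in segments and putting the segments together for analysis can be sketched as follows. This is a single-process toy, not a real distributed system: the node count, the record layout, and the function names are hypothetical, and the "nodes" are plain Python lists.

```python
import zlib
from collections import Counter

NODE_COUNT = 3  # hypothetical cluster size


def node_for(key):
    """Pick a node from a stable hash of the record key."""
    return zlib.crc32(key.encode("utf-8")) % NODE_COUNT


def distribute(records):
    """Store each record on the node chosen by its user key."""
    nodes = [[] for _ in range(NODE_COUNT)]
    for record in records:
        nodes[node_for(record["user"])].append(record)
    return nodes


def count_by_type(nodes):
    """Each node counts its own segment; the partial counts are then merged."""
    total = Counter()
    for segment in nodes:
        total.update(Counter(record["type"] for record in segment))
    return total


# Hypothetical activity records.
records = [
    {"user": "u1", "type": "photo"},
    {"user": "u2", "type": "comment"},
    {"user": "u3", "type": "photo"},
    {"user": "u1", "type": "comment"},
    {"user": "u4", "type": "profile"},
]

nodes = distribute(records)
merged = count_by_type(nodes)
```

Because the node is chosen from the user key, all records of one user land in the same segment, and the merged counts equal what a single large store would report.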

The greatest challenge of data volume lies in converting all the data (of all data types) into value while analyzing and storing it. Traditional models handle a limited amount of data and fewer data types, and they take too long to retrieve data (Mahajan et al., 2016).

Types of data

Types of data in big data refers to the different sources of data, either structured or unstructured. In the past, data was captured and stored in spreadsheets, which could only store strings, integers, and doubles such as currency. Although spreadsheets saved records in rows and columns, their data had to be accessed manually, so they could not be used as databases. With today’s advancing technology, data comes in various forms such as emails, monitoring device readings, audio, video, pictures, and PDFs. Coronel and Morris (2016) argue that the wide variety of unstructured data complicates data mining, storage, and analysis. A wide range of data types requires more processing power and time to extract value from the data, which old database models could not manage.

On the other hand, Structured Query Language (SQL) is a database programming language used to query and manage the structured data of an RDBMS. SQL was originally developed by IBM in the early 1970s, and Relational Software, which is now Oracle Corporation, developed it commercially (Coronel & Morris, 2016). In an RDBMS, every column is assigned a data type that determines the values the column can hold. The CREATE command is used to create new objects such as databases and tables (CREATE DATABASE, CREATE TABLE), while CREATE TYPE defines a new data type. To retrieve data from the database, the SELECT command is used, and the WHERE clause restricts the range of data to be retrieved, which makes retrieval faster and fairly cheap because it uses few server resources (Coronel & Morris, 2016). Lastly, the DROP command completely erases a database or a table. This command is generally irreversible once committed.
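
The commands described above can be tried in a short sketch. It assumes SQLite through Python's sqlite3 module as the database, and the account table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE defines the table and assigns a data type to every column.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Ada", 120.0), (2, "Bob", 35.5), (3, "Cy", 990.0)],
)

# SELECT retrieves data; WHERE restricts the rows to the requested range.
in_range = conn.execute(
    "SELECT owner FROM account WHERE balance BETWEEN ? AND ? ORDER BY owner",
    (100, 1000),
).fetchall()


def table_exists(name):
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


existed_before_drop = table_exists("account")

# DROP TABLE removes the table definition together with all its rows.
conn.execute("DROP TABLE account")
exists_after_drop = table_exists("account")
```

The WHERE clause returns only the two owners whose balance falls in the requested range, and after DROP the table no longer appears in the catalog.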

Big data storage

According to Coronel and Morris (2016), big data storage refers to storage infrastructure designed mainly to manage, store, and retrieve large amounts of data. Big data is sorted and stored in a way that lets third-party tools working on big data access, retrieve, and process it with ease. Big data storage can also scale flexibly as required (Mahajan et al., 2016).

For big data storage to work efficiently, some requirements must be met so that it keeps scaling with data growth. Large big data practitioners such as Apple and Google rely on hyper-scale computing environments (Mahajan et al., 2016). According to Mahajan et al. (2016), storage architectures fall into four categories: object-based storage, widely distributed nodes, all-solid-state drive arrays, and scale-out network-attached storage. To choose the best of these four architectures, consider the following. The first is performance versus capacity: between processing power and storage space, which is needed most? Another consideration is how critical the stored data is. Lastly, consider the input/output data patterns (Mahajan et al., 2016). Common big data storage considerations include storage specifications, infrastructure choice, analytics, problem areas, data protection, and Hadoop (Mahajan et al., 2016).

Big data security

Data privacy gains more attention day by day. With cyber attackers active, somewhere in the world today a system is being attacked and data is being breached. Since much of this data comes from relational databases, it takes very little time for an attacker to retrieve it once they have access to the server. Most organizations hold their customers’ personal data and assure each customer that the records are safe with them. Coronel and Morris (2016) argue that a lack of security and privacy for customers’ data can cost an organization the trust of its customers.

In big data, security faces many challenges. Non-relational (NoSQL) databases are still developing, which makes it hard for security to keep up with demand. In most situations, automated data transfers need additional security measures, which are mostly unavailable (Mahajan et al., 2016). Another security issue is that big data tools are developed to receive both large and small amounts of data. When they receive huge amounts of data, the program is supposed to validate the input and query only accurate data, but this does not always happen (Mahajan et al., 2016). A privacy and security drawback also arises when inexperienced IT specialists log in to the server and use data from the database, such as a customer’s email address and mobile number, without the customers’ approval, which is unethical and unacceptable.
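
Input validation of the kind mentioned above can be sketched as a filter that accepts only well-formed records before they are stored or queried. The record layout, the validation rules, and the deliberately simple email pattern are hypothetical and chosen only for illustration.

```python
import re

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid(record):
    """Accept a record only if its email and age fields look plausible."""
    email = record.get("email")
    age = record.get("age")
    return (
        isinstance(email, str)
        and EMAIL_RE.match(email) is not None
        and isinstance(age, int)
        and 0 <= age <= 130
    )


def split_valid(records):
    """Separate incoming records into accepted and rejected lists."""
    accepted, rejected = [], []
    for record in records:
        (accepted if is_valid(record) else rejected).append(record)
    return accepted, rejected


# Hypothetical incoming batch.
incoming = [
    {"email": "ada@example.com", "age": 36},
    {"email": "not-an-email", "age": 41},
    {"email": "bob@example.com", "age": -5},
    {"age": 20},
]

accepted, rejected = split_valid(incoming)
```

Only the first record passes both checks. Keeping the rejected records in a separate list, rather than dropping them silently, leaves a trail that can be reviewed later.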

To improve big data security, Coronel and Morris (2016) argue that developers should focus on application security rather than device security. Another way is to isolate servers and devices that hold sensitive information. Providing proactive and reactive protection also improves security. Lastly, giving administrators real-time security information and event management could help them foresee a data breach or an unauthorized log-in to the server (Coronel & Morris, 2016).

Big data required tools

In big data, the required tools are those that help in query analysis or that provide a user interface for interacting with the database via forms. One such tool is Hadoop, developed by Apache and widely used by leading corporations. One of the best features of Hadoop is its ability to process large collections of data through the use of effective programming models (Witten, Frank, Hall & Pal, 2016). Large corporations prefer the tool not only for its processing capabilities but also because regular updates from its developers make it easier to use and understand.
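
The paragraph above does not name a specific programming model; the best-known one associated with Hadoop is MapReduce, so the sketch below assumes that model. It is a plain-Python word count run in a single process and does not call any Hadoop API; the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a collection of documents."""
    pairs = []
    for document in documents:
        # In a real cluster, each document could be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data", "Big tools"])
```

Because each document is mapped independently and each key is reduced independently, the work can be spread across many machines, which is what makes the model suitable for large collections of data.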

The second tool is Talend, which offers a collection of data product services. The tool is open source (free) and combines applications with real-time data. Because it is free, it has gained popularity in the market for its ability to serve a business at any stage.

The third tool is Qubole, mainly used at the enterprise level. The tool is commercial, but a 30-day trial version is available for testing purposes. It is flexible and accessible because query scripts can be run either from a file or directly from the interface provided (Witten et al., 2016). One of its advantages over other tools is that it scales with big data, speeds up processing, and simplifies workloads. The tool has few terms of use, which makes it appropriate for both beginners and advanced IT specialists.

The last tool is Teradata. What makes this tool unique is that it recognizes when the data being queried falls under big data, even when whoever is running the scripts does not know exactly what is happening. In other words, the tool offers end-to-end services and solutions in big data, data warehousing, and marketing applications (Witten et al., 2016). With all these services, a user can become a data-driven business analyst, since the tool also offers other hosted services, including implementation.

In summary, the paper has discussed RDBMS alternatives and big data under data storage, volumes of data, security, and types of data. It has also discussed the structure of an RDBMS and how data is stored, manipulated, and retrieved using the SQL programming language. The paper points out that Oracle database by Oracle Corporation, Microsoft SQL Server by Microsoft, and MySQL, also by Oracle Corporation, are the three main RDBMS alternatives with the largest market share. The paper also explains why an RDBMS is considered the best option for an organization, basing its arguments on data security and processing speed. The paper recommends Oracle as the best RDBMS currently on the market, since it is developed with features that ensure the security and privacy of customers’ data in an organization.

References

Krishnamurthy, S., Thombre, N., Conway, N., Li, W. H., & Hoyer, M. (2014). U.S. Patent No. 8,745,070. Washington, DC: U.S. Patent and Trademark Office.

Elmasri, R., & Navathe, S. B. (2015). Fundamentals of database systems. Pearson.

Harrington, J. L. (2016). Relational database design and implementation. Morgan Kaufmann.

Coronel, C., & Morris, S. (2016). Database systems: design, implementation, & management. Cengage Learning.

Mahajan, P., Gaba, G., & Chauhan, N. S. (2016). Big Data Security. IITM Journal of Management and IT, 7(1), 89-94.

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: concepts, drivers & techniques. Prentice Hall Press.
