Storage of User Generated Data Using Distributed Backup Method - Essay Example

Summary

This paper 'Storage of User Generated Data Using Distributed Backup Method' tells us that many companies around the world face a major problem of dealing with big data. Big data is a term used in the description of a massive volume of unstructured and structured data so large that it presents considerable difficulty…

Subject: Information Technology
Type: Essay
Level: Ph.D.
Pages: 5 (1250 words)
Downloads: 2
Author: gkessler

Extract of sample "Storage of User Generated Data Using Distributed Backup Method"

Storage of User-generated Data using Distributed Backup Method

Abstract

Many companies around the world face a major problem in dealing with big data. Big data describes a volume of structured and unstructured data so large that it is difficult to process with conventional software and database techniques. An IDC report predicts that the volume of global data will grow by a factor of about 300 between 2005 and 2020 (Aluru & Simmhan, 2013). This growth translates to an increase from 130 exabytes to 40,000 exabytes. Sustained over 15 years, that is roughly 46 percent growth per year, meaning that data will double in size approximately every two years.

In recognition of the magnitude of the problems involved in managing big data, many companies are investing considerable amounts of money in researching better methods of big data management. These companies seek reliable ways of organizing, storing and managing their machine, user and application data, which is rapidly growing to petabytes and exabytes. The volumes are so large that many organizations find it difficult to process, store and access the data they need using traditional databases and storage systems (BVT, n.d.).

Further complexity arises in the now common scenario where companies dispatch teams to different places around the world, yet those teams must collaborate on the same data because they work on the same project. This highlights the need for data solutions that allow employees to add, change, check in and modify content without creating problems for others in the organization (Kumar, 2012).

Literature Review

Conventional storage systems typically require a system revision or tech refresh every three years (sometimes four) so that the company can keep up with new requirements and growth.
This, in many instances, requires expensive and disruptive data migrations, replacement of storage capacity and regular upgrades of software licenses (Leavitt, 2013). This paper looks into how companies use big data technology to store exabytes of user-generated data by means of the distributed backup method.

Distributed backup mechanism

The distributed backup method allows companies to store more data in a scalable way across networks of storage nodes. The main purpose of backing up data is for an organization to have at least one copy of whatever it considers important. If the organization's storage systems fail, a backup system ensures that lost or damaged files can be retrieved. An organization that relies only on individual storage devices cannot recover damaged or lost data (Pandey & Nepal, 2013).

In networked distributed storage systems, erasure codes such as Reed-Solomon codes, long used in media such as CDs and DVDs and in RAID-like systems, have become more popular. However, existing coding techniques designed for communications fail to address all the issues that arise in networked distributed storage. Of main concern is that existing coding techniques do not offer immediate mechanisms for repair in this setting (Leavitt, 2013). When storage devices fail, the lost redundancy must be recreated on new storage devices. This has led to increased research into designing codes that can be repaired more efficiently (Aluru & Simmhan, 2013). Research in this area has produced numerous findings that have proven important for the storage of big data.
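
The idea of recreating lost redundancy can be illustrated with a minimal sketch. It assumes the simplest possible erasure code, a single XOR parity chunk of the kind used in RAID-like systems, rather than Reed-Solomon itself, and it tolerates the loss of only one chunk. The function names (`encode`, `repair`, `decode`) are hypothetical and chosen for illustration only.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def repair(stored: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing chunk (marked None) from the survivors."""
    missing = [i for i, chunk in enumerate(stored) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost chunk")
    repaired = list(stored)
    if missing:
        survivors = [chunk for chunk in stored if chunk is not None]
        # XOR of all surviving chunks equals the lost chunk.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(stored: List[bytes], original_len: int) -> bytes:
    """Concatenate the data chunks (all but the parity) and strip the padding."""
    return b"".join(stored[:-1])[:original_len]


message = b"user generated data"
nodes = encode(message, k=4)   # five chunks, one per storage node
nodes[2] = None                # simulate the failure of one node
restored = repair(nodes)
print(decode(restored, len(message)))
```

Note that the repair step reads every surviving chunk to rebuild one lost chunk. That network cost is the kind of inefficiency that research into more repairable codes aims to reduce.
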

Distributed storage increases both storage space and performance. One reason is that data is stored in separate chunks. With the chunks distributed across more than one location, data sits in strategic places from which it can be accessed at any time. In many cases, a distributed architecture applies additional security features to every chunk in the hierarchy. This has to be a priority, because when data is stored in more than one place, confidentiality, integrity and availability may become major issues (Leavitt, 2013).

One of the most widely used distributed storage systems is the Hadoop Distributed File System (HDFS). Hadoop is designed to hold huge quantities of data and is mostly installed on clusters of interconnected computers. Hadoop and its ecosystem draw on Google's published designs for the Google File System, BigTable and MapReduce. HDFS can run on any computer that supports Java. An HDFS cluster consists of one NameNode and a number of DataNode machines. The NameNode is a master server that maintains and manages the file system metadata, including the location of blocks on the DataNodes, in its random access memory. DataNodes manage the storage attached to the nodes on which they run. Both the NameNode and the DataNode are software programs designed to run on commodity machines, typically under GNU/Linux operating systems (Kumar, 2012). HDFS is compatible with data rebalancing schemes. DataNodes allow data to be distributed across nodes. If a node does not have enough space to store data, the next node in line stores it. This increases storage capacity, allowing more data to be stored (BVT, n.d.).

Distributed storage systems fall into two categories.
One category is designed for internet services, while the other supports data-intensive applications and includes Ceph FS, the Lustre file system, the Parallel Virtual File System and the Fraunhofer File System. Ceph FS, for instance, provides high reliability and performance (Kumar, 2012).

Future scope of study

One area that requires further research is the design of codes (Pandey & Nepal, 2013). Designing codes is a complex activity that requires special skills and training in information technology. Enough research should be undertaken on the best methods and practices for designing codes so that information held by organizations remains secure. Codes can be custom-designed for organizations that store and transmit big data to enhance the security of that information. Code design matters for the distributed backup method because backed-up information must be protected against unauthorized access. Further, access to data stored and backed up through this system should be restricted by means of the designed codes. This reinforces the need for further research into designing codes for organizations. In addition, aspects of code design that best suit networked distributed storage systems, such as data insertion, repairability, mutable content and algorithms, need adequate research to improve performance in big data storage (BVT, n.d.).

Conclusion

Despite the challenges of storing and managing big data, the distributed backup method ensures that exabytes of user-generated data are stored safely and can be retrieved when the need arises, and it should therefore be adopted by companies dealing with big data (Aluru & Simmhan, 2013).
The software industry has to prepare for the projected increase in data held by companies so that systems do not crash under the magnitude of the data. The distributed backup method enables organizations to store sensitive information in more than one storage location. This enhances security by reducing the probability that unauthorized people can access such information, which is another reason for companies to back up information this way. The distributed backup method is a viable way of backing up information, and institutions worldwide therefore need to invest in better-integrated distributed storage systems for better data management. This will promote the security of the information in their possession (Pandey & Nepal, 2013).

References

Aluru, S., & Simmhan, Y. (2013). A Special Issue of Journal of Parallel and Distributed Computing: Scalable Systems for Big Data Management and Analytics. Journal of Parallel and Distributed Computing, 73(6), 896. DOI:10.1016/j.jpdc.2013.04.004

Business Value of Technology. (n.d.). State of Enterprise Storage. Retrieved September 11, 2014, from http://www.seiservice.com/wp-content/uploads/2014/07/2014StateofEnterpriseStorage.pdf

Kumar, A. (2012). Distributed and Big Data Storage Management in Grid Computing. International Journal of Grid Computing & Applications, 3(2), 19-28. DOI:10.5121/ijgca.2012.3203

Leavitt, N. (2013). Storage Challenge: Where Will All That Big Data Go? Computer, 46(9), 22-25. DOI:10.1109/MC.2013.326

Pandey, S., & Nepal, S. (2013). Cloud Computing and Scientific Applications- Big Data, Scalable Analytics, and Beyond. Future Generation Computer Systems, 29(7), 1774-1776. DOI:10.1016/j.is.2014.07.006
