The Essential Topics in Computer Science and Information Theory - Essay Example

Summary
The paper "The Essential Topics in Computer Science and Information Theory" analyzes data compression and its role in everyday computer use. Data compression is useful as it reduces the consumption of expensive resources, such as disk space, although additional computation time is required for the compression and decompression processes…

Extract of sample "The Essential Topics in Computer Science and Information Theory"

Data Compression

Table of Contents
1. Introduction
2. Need of Data Compression
3. Lossless vs. Lossy Data Compression
4. Huffman Coding
5. JPEG Image Compression
6. Conclusion
7. References

1. Introduction

In the present-day world, the use of computers has become a daily activity. Computers work with and manipulate data (audio, video, text, images) that is either stored on them or transferred through the Internet, an intranet, or storage devices. Computers are limited in memory and processing time, while the Internet is limited in bandwidth and constrained by transmission time. One of the essential topics in computer science and information theory is therefore data compression, in which the size of stored or transferred data is reduced in order to improve memory and bandwidth utilization and to lower computation and transfer time (Chen & Fowler, 2003; Salomon, 2007). This report gives a brief technical overview of data compression and the need for it, and of the lossless and lossy compression approaches, discussing one technique of each: Huffman coding (lossless compression) and JPEG image compression (lossy compression).

2. Need of Data Compression

As mentioned previously, owing to resource limitations (storage space, computation, bandwidth, and transfer time), data that is shared over the Internet or an intranet, or archived on a computer or storage discs, needs to be reduced in size (Salomon, 2007). In a computer, data is stored in the form of bits (0s and 1s). Data compression techniques enable the computer to represent the original number of bits with fewer bits. This process of bit reduction is called encoding and follows a scheme. The sender and receiver of the compressed data must agree on the same encoding scheme in order to compress and to decode/decompress the data. For example, if the sender uses the convention of encoding the word "compression" as "comp", then the receiver must also know this convention in order to decompress. A popular example of compression is the zipping feature supported by Windows, in which several files are compressed into one Zip file. The images viewed, downloaded and shared on the Internet in JPEG format are also in a compressed image format. Data compression can be lossless or lossy (Sayood, 2000).

3. Lossless vs. Lossy Data Compression

Lossless data compression makes use of compression algorithms that ensure the exact original data can be reconstructed from the compressed data (Sayood, 2000). It is used in scenarios where all portions of the data are critical and the decompressed content must be identical to the original, e.g. executable code, word-processing files, tabulated numbers, etc. The Zip file format discussed previously uses a lossless compression algorithm, so the original documents can be recovered after decompression. Certain image formats, such as PNG and GIF, also use lossless compression techniques; Huffman coding is one such technique. Lossy data compression, as the name suggests, loses certain parts of the original data in the compression process, and the decompressed file is not the same as the original: the discarded bits are lost permanently. This form of compression is generally used for data whose finer details are less critical, such as audio, video and images, for example MPEG and JPEG (Joint Photographic Experts Group). An advantage that lossy compression methods have over lossless methods is that in certain scenarios a lossy method produces an incomparably smaller compressed file than any lossless method, while still meeting the application's requirements.
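The lossless round-trip property described in section 3 can be illustrated with a few lines of Python. The sketch below uses the standard-library zlib module (a DEFLATE-based lossless codec, chosen only for illustration and not one of the techniques covered in this report) to compress a byte string and then verifies that decompression restores it exactly; the sample text and variable names are illustrative assumptions.

    import zlib

    # Illustrative input: repetitive text compresses well.
    original = b"compression compression compression compression"

    compressed = zlib.compress(original, 9)   # lossless DEFLATE encoding
    restored = zlib.decompress(compressed)    # exact reconstruction

    print(len(original), "->", len(compressed), "bytes")
    assert restored == original  # lossless: decompressed data equals the original

Replacing repeated content with shorter codes is exactly the kind of redundancy a lossless codec exploits; a lossy codec would instead be free to discard detail that the application does not need.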
4. Huffman Coding

Huffman coding (Huffman, 1952) is an entropy-encoding algorithm employed for lossless data compression. It uses a variable-length code table for encoding each symbol in the source file (such as a character in a data file). The variable-length code table is derived from the estimated probability of occurrence of every possible value of the source symbol. The underlying idea in Huffman coding is to assign the shortest code word (i.e. the fewest bits) to the most frequently occurring (highest-probability) symbol and the longest code word (i.e. the most bits) to the symbol with the lowest probability of occurrence; this is why the codes are variable in length. Figure 1 shows the encoding mechanism. The Huffman code is obtained by merging the two least probable characters; this process is repeated until only one character remains, and as an outcome a code tree is generated. The Huffman code is then obtained by labelling this code tree. An example of how this is done is shown in Figure 1.

Figure 1: Huffman coding mechanism

In the above example, the two least probabilities, those of 'j' and 'b', are merged. This merged probability is still the smallest and is merged with that of 'g'. Thereafter the next two least probabilities are merged, and the process continues until all the probabilities are merged into 1. Labelling is done from the root of the tree, i.e. probability 1, assigning '0' to the branch with the lower probability. The code for each symbol is obtained by tracing the path from the symbol to the root. These variable-length codes are then packed together, for instance into a data stream. In computer storage and transmission, data is normally represented in fixed-length codes, i.e. a character is represented by a fixed 8 bits, so a character can easily be extracted by separating every chunk of 8 bits. In Huffman coding, each character has a different number of assigned bits. During transmission and storage the compressed data is still handled in 8-bit units, yet during decompression the standard 8-bit byte grouping is overruled: the coding scheme is known to the receiver, and variable-bit-length grouping is done accordingly. Figure 2 shows a simplified Huffman encoding scheme. Characters from A to G occur with different probabilities. As A is the most commonly occurring character, it is represented by a single bit, coded as '1'. The next most commonly occurring character, B, is represented by two bits, coded as '01'. The representation continues in this way until the least frequently occurring character, G, is assigned the six-bit code '000011'. As depicted in Figure 2, the variable-length codes are initially packed into 8-bit groups, which is the standard for computer use. During decompression, however, all the 8-bit groups are placed end-to-end, forming a long serialized string of 1s and 0s. The decompression program then parses the data stream, separating each valid code and converting it into its standard 8-bit representation.

Figure 2: Decompression of Huffman code

To implement Huffman encoding, the compression and decompression algorithms must agree on the binary codes used to represent each character (or character group). This can be achieved either by using a predefined, fixed encoding table that never changes irrespective of the data content, or by using an encoding scheme optimized for the specific data. The latter, however, requires the encoding table to be included within the compressed data file so that it can be used by the decompression program.
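To make the merge-and-label procedure of section 4 concrete, the Python sketch below builds a Huffman code from an assumed probability table (the symbols and probabilities are illustrative, not those of Figure 1 or Figure 2) and then decodes a packed bit string by recognising complete code words, which is the variable-bit-length grouping described above. It is a minimal sketch of the general algorithm, not a reproduction of the figures.

    import heapq

    # Illustrative symbol probabilities (not the values of Figure 1 or 2).
    probs = {'a': 0.40, 'b': 0.25, 'c': 0.15, 'd': 0.10, 'e': 0.10}

    # Repeatedly merge the two least probable entries into one node.
    # Each heap item is (probability, tie_breaker, {symbol: code_so_far}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, low = heapq.heappop(heap)    # least probable branch
        p2, _, high = heapq.heappop(heap)   # second least probable branch
        merged = {s: '0' + c for s, c in low.items()}   # '0' on the lower branch
        merged.update({s: '1' + c for s, c in high.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    codebook = heap[0][2]   # symbol -> variable-length bit string

    # The most probable symbol receives the shortest code.
    print(sorted(codebook.items(), key=lambda kv: len(kv[1])))

    # Encode a short message, then decode it using the prefix-free property.
    message = 'abacade'
    bits = ''.join(codebook[s] for s in message)
    decode_table = {code: s for s, code in codebook.items()}
    decoded, current = [], ''
    for bit in bits:
        current += bit
        if current in decode_table:          # a complete, valid code word
            decoded.append(decode_table[current])
            current = ''
    assert ''.join(decoded) == message

Because no code word is a prefix of another, the decoder never needs byte boundaries to tell where one symbol ends and the next begins, which is why the fixed 8-bit grouping can safely be overruled during decompression.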
5. JPEG Image Compression

JPEG is an image compression algorithm that is an amalgam of customized sub-algorithms (Pennebaker & Mitchell, 1993). The compression is lossy, and the amount of information loss is specified according to the application's requirements. The JPEG algorithm can be tuned to produce small, highly compressed images of low but still acceptable quality, as well as high-quality compressed images whose size is still far smaller than that of the original. JPEG is designed to eliminate detail that is not normally visible to the human eye (a slight variation in colour is unnoticeable to the eye, while variations in intensity/brightness are not). Hence, JPEG's encoding mechanism concentrates more on reducing the colour information in an image than the intensity information. The amount of achievable compression depends on the image content; compression ratios of 20:1 to 25:1 can be achieved without noticeable quality degradation. The rendering quality of the JPEG encoder can be set by configuring the 'Q factor', which typically ranges from 1 (lowest quality) to 100 (highest quality). The optimal Q value is the lowest value at which the quality degradation remains unnoticeable, and it is relative to the quality of the input image. The JPEG compression encoding scheme is divided into several stages (Milburn, 2006), as shown in Figure 3.

Figure 3: JPEG compression and decompression

The raw image is initially transformed into a suitable colour space, generally YUV or YCbCr, because the colour and intensity components are separate in these colour spaces, unlike in RGB or CMY. The colour components (Cb and Cr in the YCbCr colour space) are then down-sampled by averaging pixels grouped into blocks. Redundant image data is removed by applying the Discrete Cosine Transform (DCT) to the pixel blocks. Each block of DCT coefficients is then quantized using weighting functions optimized for the human eye. The resulting coefficients (i.e. the image data) are then stripped of remaining redundancy by encoding them with the Huffman variable word-length algorithm. The result is the compressed image. The decompression process is the reverse of all the compression steps.
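The transform-and-quantize stage, which is where JPEG actually discards information, can be sketched in isolation. The Python code below (assuming NumPy is available) applies a two-dimensional DCT to a single 8x8 block of pixel values and quantizes the coefficients; the block contents and the quantization table are illustrative assumptions, not the standardized JPEG luminance table, and the rest of the pipeline (colour-space conversion, down-sampling, zig-zag ordering and Huffman coding) is omitted.

    import numpy as np

    N = 8

    # Orthonormal DCT-II basis matrix for an 8-point transform.
    D = np.zeros((N, N))
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        for x in range(N):
            D[u, x] = alpha * np.cos((2 * x + 1) * u * np.pi / (2 * N))

    # Illustrative 8x8 block of pixel intensities (0-255), level-shifted
    # to be centred on zero, as JPEG does before the transform.
    block = np.tile(np.arange(0, 256, 32), (N, 1)).astype(float) - 128.0

    coeffs = D @ block @ D.T                  # 2-D DCT of the block

    # Illustrative quantization table: coarser steps at higher frequencies.
    # (Not the standard JPEG table.)
    Q = 16 + 4 * (np.arange(N)[:, None] + np.arange(N)[None, :])
    quantized = np.round(coeffs / Q)          # rounding is where detail is lost

    # Decoder side: dequantize and invert the DCT (D is orthonormal).
    restored = D.T @ (quantized * Q) @ D + 128.0
    print(np.abs(restored - (block + 128.0)).max())   # reconstruction error from quantization

Because the high-frequency coefficients are divided by larger steps before rounding, fine detail is the first information to be sacrificed, which matches the perceptual goal described above.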
6. Conclusion

Data compression is useful because it reduces the consumption of expensive resources such as disk space and transmission bandwidth. With lossless compression algorithms the original data can be recovered exactly; with lossy compression, part of the original information is lost permanently during the encoding process. Furthermore, additional computation time is required for the compression and decompression processes at the sender and receiver respectively. Therefore, although data compression algorithms provide a useful solution for data storage and transmission, they come with trade-offs between several factors: the degree of compression, the amount of distortion introduced (when a lossy encoding scheme is used), and the computational resources required to compress and decompress the data.

7. References

Chen, M. & Fowler, M.L. (2003). 'The Importance of Data Compression for Energy Efficiency in Sensor Networks'. Conference on Information Sciences and Systems, The Johns Hopkins University.
Huffman, D.A. (1952). 'A Method for the Construction of Minimum-Redundancy Codes'. Proceedings of the IRE, Vol. 40, pp. 1098-1101.
Milburn, K. (2006). Digital Photography Expert Techniques. 2nd Edition. O'Reilly Media.
Pennebaker, W.B. & Mitchell, J.L. (1993). JPEG Still Image Data Compression Standard. Springer.
Salomon, D. (2007). Data Compression: The Complete Reference. 4th Edition. Springer.
Sayood, K. (2000). Introduction to Data Compression. 2nd Edition. Morgan Kaufmann.