Revolution of High Performance Computing - Essay Example

Summary
The author of the paper "Revolution of High-Performance Computing" begins with the statement that the high-performance computing industry's demand for computation is increasing, as large and complex computational problems are now common across most industry segments…

Extract of sample "Revolution of High Performance Computing"

How GPUs have Revolutionized High Performance Computing

Abstract

The high performance computing industry's demand for computation is increasing, as large and complex computational problems are now common across most industry segments. However, conventional CPU technology cannot scale in performance adequately to meet this demand. The parallel processing capacity of graphics processing units (GPUs) helps the CPU by subdividing complex computing tasks into pieces that can run at the same time. This capability is helping computational researchers and scientists tackle many of the world's most challenging computational problems. GPU-accelerated computing is the use of a GPU together with a CPU to accelerate enterprise, engineering, and scientific applications. Pioneered by NVIDIA in 2007, GPU-accelerated computing is now used to power energy-efficient data centers in small and medium businesses, enterprises, universities, and government labs around the globe. The purpose of this paper is to examine how GPUs have revolutionized high performance computing.

I. Introduction

The high performance computing industry's demand for computation is increasing, as large and complex computational problems are now common across most industry segments. However, conventional CPU technology cannot scale in performance adequately to meet this demand. The parallel processing capacity of GPUs helps the CPU by subdividing complex computing tasks into pieces that can run at the same time. GPUs have been used for general purpose computation for over ten years now; general purpose computation on GPUs is usually abbreviated as GPGPU [1]. Initially, the GPU was used much like a calculator: it exposed a set of fixed functions that were manipulated to obtain a desired outcome. Since the GPU is designed to produce a 2-D image from a 3-D virtual world, the functions it could execute were essentially graphics-related, and early GPU programs were expressed as operations on graphical primitives. These programs were hard to develop, debug, and optimize, and compiler errors were common. Nevertheless, proof-of-principle programs showed that GPUs could outperform CPUs for specific algorithms [2], and research on GPUs soon led to higher-level intermediate languages that abstracted away the graphics. These intermediate languages were in turn quickly abandoned as hardware vendors introduced dedicated non-graphics languages that allowed GPUs to be used for general purpose computing.

The main benefit of GPU computing over CPU computing is raw capability: there is currently a gap of roughly seven times between the GPU and the CPU when comparing theoretical peak gigaflops and bandwidth [1] (see Figure 1). The performance gap is attributed to architectural differences and physical per-core constraints between GPU and CPU processors. The GPU is highly parallel and is quickly gaining popularity as a powerful engine for computationally demanding applications, and its potential and performance make it promising for future computing systems. The standard approach to improving performance in GPUs is parallelism.
Parallelism is a viable technique for improving performance, and there are a number of applications that present inherently parallel workloads well matched to GPUs.

Figure 1: Historical comparison of theoretical peak gigaflops and bandwidth performance. Source: [1]

Currently, there are three main GPU vendors for the PC market: Intel, AMD, and NVIDIA. Intel is considered the largest GPU vendor; however, it is dominant only in the integrated, low-performance market [4]. For discrete, high-performance graphics, NVIDIA and AMD are the dominant suppliers, and in industrial and academic environments NVIDIA appears to be the single dominant supplier. The purpose of this paper is to examine how GPUs have revolutionized high performance computing.

II. Discussion

A. CPU and GPU Integration

Over the last few years, many CPU applications have been ported to GPUs. Some implementations map entirely to the GPU, while others map only specific kernels to it. The GPU and the CPU are distinct processors that operate asynchronously [3]. This means the GPU and the CPU can carry out different tasks at the same time, which is a critical component of heterogeneous computing. This capability is exposed as a stream in the CUDA API [5]. Each stream is an in-order queue of tasks carried out by the GPU, including kernel launches and memory transfers. Independent streams are also supported; they may execute their tasks concurrently as long as each stream preserves its own ordering. Current GPUs can support 16 concurrent kernel launches [4], which means it is possible to exploit task parallelism, in the form of different coexisting kernels, as well as data parallelism, in the form of a computational grid of blocks. In addition, GPUs support overlapping memory transfers between the GPU and the CPU with kernel execution. It is therefore possible to simultaneously copy data from the CPU to the GPU, execute sixteen different kernels, and copy data back to the CPU, provided each operation is scheduled into an appropriate stream [5]. When transferring data between the CPU and the GPU across the PCI Express (PCIe) bus, it is advisable to use page-locked memory. This prevents the operating system from paging the memory out, so the memory region is guaranteed to remain resident in physical random access memory (RAM). Nonetheless, page-locked memory is a limited resource and is quickly exhausted if not used carefully. CUDA also supports a unified address space, in which the actual location of a pointer is determined automatically; this means data can be moved between the GPU and the CPU without specifying the direction of the copy [1].

GPU-accelerated computing is the use of a GPU together with a CPU to accelerate enterprise, engineering, and scientific applications. Pioneered by NVIDIA in 2007, it is now used to power energy-efficient data centers in small and medium businesses, enterprises, universities, and government labs around the globe. GPU-accelerated computing provides extraordinary application performance by offloading the compute-intensive segments of an application to the GPU, while the rest of the code continues to run on the CPU. Figure 2 shows how GPU acceleration works.
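To make the stream mechanism described above concrete, the following is a minimal sketch, not drawn from the paper itself, of how page-locked host memory, asynchronous copies, and two CUDA streams can be combined so that transfers and kernel execution overlap. The kernel name, the problem size, and the two-way split are illustrative assumptions.

#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: each thread scales one element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                       // illustrative problem size
    const int half = n / 2;

    float *h_data;                               // page-locked (pinned) host buffer
    cudaMallocHost(&h_data, n * sizeof(float));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Each half of the array is copied in, processed, and copied back in its
    // own stream, so the two streams' work can overlap.
    for (int k = 0; k < 2; ++k) {
        int offset = k * half;
        cudaMemcpyAsync(d_data + offset, h_data + offset, half * sizeof(float),
                        cudaMemcpyHostToDevice, s[k]);
        scale<<<(half + 255) / 256, 256, 0, s[k]>>>(d_data + offset, half, 2.0f);
        cudaMemcpyAsync(h_data + offset, d_data + offset, half * sizeof(float),
                        cudaMemcpyDeviceToHost, s[k]);
    }
    cudaDeviceSynchronize();                     // wait for both streams to finish

    printf("h_data[0] = %f\n", h_data[0]);       // expected: 2.000000

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}

Because the pinned buffer can be reached directly by the device's DMA engines, the asynchronous copies queued in one stream can proceed while the kernel in the other stream executes, which is exactly the overlap of transfers and computation described above.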
A good way to understand the difference between a GPU and a CPU is to compare how they process tasks. A GPU consists of thousands of smaller, efficient cores designed to carry out many tasks simultaneously, while a CPU consists of a few cores optimized for sequential processing [3] (see Figure 3).

Figure 2: How GPU acceleration works

Figure 3: CPU with a few cores versus GPU with thousands of cores. Source: [3]

B. General-Purpose Computing on the GPU (GPGPU)

GPGPU is the use of the GPU to carry out computation in applications conventionally handled by the CPU. Mapping general-purpose computation onto the GPU uses the graphics hardware in the same way as any typical graphics application. Because of this resemblance, the process is both easier and harder to describe: the underlying operations are essentially unchanged, but the way they are expressed differs between general-purpose and graphics use. Historically, the major difficulty in programming general-purpose GPU applications was that, even though their functions had nothing to do with graphics, the applications had to be programmed through graphics APIs [1], and the code had to be structured in terms of the graphics pipeline. Currently, GPU computing applications are structured as follows: a) the programmer expresses the computation domain of interest directly as a structured grid of threads; b) an SPMD (single-program, multiple-data) general-purpose program computes the value of each thread; c) the value of each thread is computed by a combination of math operations and both read and write accesses to global memory, and in this approach one buffer may be used for both reading and writing, permitting more flexible algorithms; and d) the resulting buffer in global memory may then be used as an input to subsequent computation [6]. This programming model is powerful for several reasons. First, it allows the hardware to fully exploit the application's data parallelism by identifying that parallelism explicitly in the code. Second, it balances restrictions and generality to deliver high performance. Lastly, its direct access to the programmable units eliminates most of the difficulties early general-purpose GPU programmers faced in pushing their computations through the graphics interface. As a result, programs are usually written in a familiar programming language and are simpler to create and debug. The outcome is a programming model that lets its users take advantage of the GPU's powerful hardware, while also enabling an increasingly high-level programming style that supports productive authoring of complex applications [1].

C. Multiple GPUs

Although GPU devices offer a very powerful computing architecture, a single GPU may still not perform well enough for high-performance computing applications. A logical solution for such applications is to distribute the computational tasks among multiple GPU devices. Multi-GPU computing consists of solving a computational task on more than one GPU, hosted either on the same machine or on different hosts connected by a network [3]. The benefit of this strategy is that both the amount of available memory and the number of GPU cores available for the computation increase.
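The sketch below, again not taken from the paper, illustrates both ideas at once under simple assumptions: a small SPMD kernel expressed as a grid of threads (Section B) is launched on every CUDA device visible to the host, with each device working on its own slice of the data (Section C). The kernel name, the element-squaring operation, and the assumption that the problem size divides evenly among the devices are all illustrative.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Illustrative SPMD kernel: each thread squares one element of its slice.
__global__ void square(float *slice, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) slice[i] *= slice[i];
}

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0) { printf("No CUDA devices found.\n"); return 1; }

    const int n = 1 << 22;                    // illustrative total problem size
    const int chunk = n / deviceCount;        // assumes an even split
    std::vector<float> h_data(n, 3.0f);
    std::vector<float *> d_slices(deviceCount);

    // Send one slice to each device and launch the kernel on it; the kernel
    // launches are asynchronous, so the devices can work concurrently.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&d_slices[dev], chunk * sizeof(float));
        cudaMemcpy(d_slices[dev], h_data.data() + dev * chunk,
                   chunk * sizeof(float), cudaMemcpyHostToDevice);
        square<<<(chunk + 255) / 256, 256>>>(d_slices[dev], chunk);
    }

    // Gather the results back to the host and release the per-device buffers.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaMemcpy(h_data.data() + dev * chunk, d_slices[dev],
                   chunk * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_slices[dev]);
    }

    printf("h_data[0] = %f\n", h_data[0]);    // expected: 9.000000
    return 0;
}

Each device holds only its own slice, which is also why multi-GPU computing increases the total memory available to a problem: the data set is partitioned across the devices rather than replicated on each of them.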
Many GPU applications are constrained by the relatively small amount of memory available on each device; by dividing a task between several devices, the total device memory available increases [4]. Graphics cards are connected to their hosts through the PCI Express bus, and GPUs perform best when connected to a PCIe x16 slot. Many motherboards have four PCI Express slots and can therefore host four cards, and since dual-GPU graphics cards exist, a single host can accommodate up to eight GPUs [4]. Another way to attach multiple devices to one machine is to place them in a PCIe expansion chassis connected to the host through a PCI Express card. The final configuration of a multi-GPU system is to host the devices in distributed nodes connected by a network. This kind of multi-GPU system is more scalable, because additional nodes can easily be added to the network [2]. Nevertheless, its disadvantage is increased communication time between devices. GPUs hosted by the same machine share a PCI Express bus and can communicate with other devices or the host CPU over that bus, but if the devices are hosted by different machines, this communication has to go over a comparatively slow network [2].

D. Applications for GPUs

Many applications use GPUs to make the computing experience more productive, and most users run more than one application at a time; a gamer, for instance, may occasionally use a photo editor, a slideshow, a web browser, and a spreadsheet. This section describes three application areas that make use of GPUs.

The first application is gaming. PC GPUs were originally invented for 3-D gaming on PCs. Sid Meier's Civilization is one of the longest-running gaming franchises on the PC [7]. The latest version, Civilization V, has reinvented the game again: it creatively employs Microsoft's DirectX 11 graphics API to immerse the player in the game. Rolling clouds conceal unexplored areas of the map in a literal fog of war, and individual civilization leaders are rendered artistically in real time, instead of with the previous canned animations, using graphical effects such as cloth animation and heat shimmer that bring realism to the virtual opponents. Modern GPUs have also allowed Civilization V's developers to create animated characters that bring the maps to life, giving the game robust replayability and rich gameplay [7].

The second application involves productivity. Microsoft Office 2010 and subsequent versions offer GPU acceleration for many of their graphical elements, such as PowerPoint and WordArt transitions. Although Office's use of the GPU will not overtax an AMD Radeon graphics card, AMD's Eyefinity technology can run Office across three to six displays using a single enabled Radeon HD 6000- or 5000-series card [7]. The combination of GPU acceleration for critical elements of Office 2010 and later with three-monitor AMD Eyefinity technology is a powerful one: having PowerPoint, Word, and Excel in large windows, each on its own screen, makes combining data across multiple applications faster and easier than ever [7].

The third application involves video editing. Video editing makes heavy use of system resources even on high-performance desktops, and consumer applications such as Adobe Premiere Elements 9 now provide features that were previously available only to professionals [7].
Transitions such as card flip, sphere, and page curl are all GPU-accelerated in Adobe Premiere Elements 9, as are effects such as ripple and refraction. An AMD Radeon graphics card will speed up final rendering and preview, making it faster and more enjoyable to create a video [7].

III. Conclusion

The parallel processing capacity of graphics processing units (GPUs) helps the CPU by subdividing complex computing tasks into pieces that can run at the same time. This capability is helping computational researchers and scientists deal with many of the world's most challenging computational problems. GPU-accelerated computing is the use of a GPU together with a CPU to accelerate enterprise, engineering, and scientific applications. GPUs have been used for general purpose computation for over ten years now. The main benefit of GPU computing over CPU computing is raw capability: there is currently a gap of roughly seven times between the GPU and the CPU when comparing theoretical peak gigaflops and bandwidth. The performance gap is attributed to architectural differences and physical per-core constraints between GPU and CPU processors. Currently, there are three main GPU vendors for the PC market: Intel, AMD, and NVIDIA. Intel is considered the largest GPU vendor; however, it is dominant only in the integrated, low-performance market. For discrete, high-performance graphics, NVIDIA and AMD are the dominant suppliers. GPU-accelerated computing provides extraordinary application performance by offloading the compute-intensive segments of an application to the GPU, while the rest of the code continues to run on the CPU. General-purpose computing on the GPU is the use of the GPU to carry out computation in applications conventionally handled by the CPU, and mapping such computation onto the GPU uses the graphics hardware just like any standard graphics application. Although GPU devices offer a very powerful computing architecture, a single GPU may still not perform well enough for high-performance computing applications; the logical solution is to distribute the computational tasks among multiple GPU devices. There are many applications that use GPUs, but this paper has focused on only three: gaming, Microsoft Office productivity, and video editing.

References

[1] A. R. Brodtkorb, T. R. Hagen and M. L. Saetra, "Graphics processing unit (GPU) programming strategies and trends in GPU computing," Journal of Parallel and Distributed Computing, vol. 73, no. 1, pp. 4-13, 2013.
[2] M. Arora, S. Nath, S. Mazumdar, S. B. Baden and D. M. Tullsen, "Redefining the role of the CPU in the era of CPU-GPU integration," The IEEE Computer Society, 15 June 2012. [Online]. Available: http://cseweb.ucsd.edu/~marora/files/papers/cpu-gpu-ieeemicro-2012.pdf. [Accessed 5 December 2013].
[3] J. Lee, S. Li, H. Kim and S. Yalamanchili, "Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture," Journal of Parallel and Distributed Computing, vol. 73, no. 12, pp. 1525-1538, 2013.
[4] I. Chakroun, N. Melab, M. Mezmaz and D. Tuyttens, "Combining multi-core and GPU computing for solving combinatorial optimization problems," Journal of Parallel and Distributed Computing, vol. 73, no. 12, pp. 1563-1577, 2013.
[5] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer and K. Skadron, "A performance study of general purpose applications on graphics processors using CUDA," Journal of Parallel and Distributed Computing, vol. 68, no. 10, pp. 1370-1380, 2008.
[6] C. Schulz, "Efficient local search on the GPU - Investigation on the vehicle routing problem," Journal of Parallel and Distributed Computing, vol. 73, no. 1, pp. 14-31, 2013.
[7] L. Case, "Three killer applications for GPUs," IDG Creative Lab, 9 December 2010. [Online]. Available: http://www.pcworld.com/article/213109/gpu.html. [Accessed 4 December 2013].