Pros and Cons of Directory-Based and Snoopy Protocols - Literature Review

A Survey of Literature on Cache Coherence

Abstract: Many multiprocessor chips and computer systems today have hardware that supports shared memory. This is because shared-memory multicore chips are considered a cost-effective way of providing increased computing speed and power, since they utilize economically interconnected low-cost microprocessors. Shared-memory multiprocessors use caches to reduce memory access latency and to significantly reduce the bandwidth demands on the global interconnect and the local memory modules. However, a problem still exists in these systems: the cache coherence problem introduced by local caching of data, which leads to reduced processor execution speeds. In today's microprocessors, the cache coherence problem is mitigated in hardware through the implementation of various cache coherence protocols. This article reviews the literature on cache coherence, with particular attention to the cache coherence problem and the protocols, both hardware and software, that have been proposed to solve it. Most importantly, it identifies a specific problem associated with cache coherence and proposes a novel solution.

Keywords: microprocessor, latency, cache coherence, bandwidth, multiprocessor, cache coherence protocol, shared memory, multicore processor

I. Introduction

Currently, there is considerable interest in the computer architecture domain in shared-memory multiprocessors. Often, proposed multiprocessor designs include a private cache for each processor in the system. This, in turn, gives rise to the cache coherence problem (Cheng, Carter, & Dai, 2007). In this situation, several caches are allowed to hold simultaneous copies of a given memory location, so a mechanism must be in place to ensure that when the contents of that location are changed, all copies remain consistent.
Consequently, some systems employ a software mechanism to ensure multiple copies do not occur. This is achieved by labeling shared blocks so that they are not cached (Chang & Sohi, 2006). Additionally, task data in all caches are prohibited or restricted from migrating. Alternatively, all blocks may be cached by all processors, with a cache coherence protocol responsible for ensuring consistency. Various such protocols have been proposed, designed, and described, some ideal for a shared bus and others specifically suited to a general-purpose interconnection network.

There is a substantial difference between shared-bus protocols and general network protocols. First, shared-bus protocols depend on every cache controller monitoring the bus transactions of all the other processors in the system and taking appropriate action to maintain consistency. Second, each block's state is encoded in a distributed manner among all the cache controllers. Cache controllers that monitor bus traffic for coherence purposes in this way are referred to as snooping cache controllers (Kurian et al., 2010).

Recently, much research has focused on shared-memory multiprocessors. They are common mainly because of their simple programming model: the address space is shared among all processors, enabling them to communicate with one another through a single address space. As noted earlier, a cache coherence problem arises when the same cache block resides in multiple caches (Stenstrom, 1990). This does not affect reads; however, when a processor writes to a location, the resulting change must be propagated to all caches.
Therefore, cache coherence, according to Archibald and Baer (1986), means that all caches hold consistent data after a write. A cache coherence protocol is a distributed algorithm that multiprocessor architects use to deal with the cache coherence problem (Archibald & Baer, 1986). Different types of cache coherence protocol have been discussed and designed; they differ from one another in the scope of the area that a write updates. These protocols have a notable impact on multiprocessor performance, a parameter that is normally very difficult to estimate, and a system's performance is closely tied to its memory access latency.

This paper reviews the literature on cache coherence, with particular attention to the cache coherence problem and the protocols, both hardware and software, that have been proposed to solve it. Most importantly, it identifies a specific problem and proposes a solution, a cache coherence protocol based on those reviewed (Loghi, Poncino, & Benini, 2006). The first part describes cache coherence and analyzes the cache coherence problem. Part two reviews the literature on cache coherence protocols and describes various protocols. The third part compares and contrasts the protocols discussed in part two with regard to their advantages and disadvantages; additionally, a specific cache coherence problem is identified and a suitable solution is proposed. A conclusion then summarizes the subject of cache coherence and recommends, based on trade-offs, which cache coherence protocol is ideal.

II. Cache Coherence Problem – An Analysis

In this part, cache coherence is discussed and analyzed with regard to distributed and shared memory. For more than two decades, cache memories have been heavily relied on to ensure that microprocessors achieve rapid, high performance.
Caches enable the rapid increase of processor speeds by exploiting locality during memory accesses. They are very effective in operation and have little or no effect on the compiler or programmer, and the details of the cache hierarchy do not affect the instruction set architecture. Initially, in uniprocessors, the implementation of a cache hierarchy had little or no ramification for memory consistency (Molka, Hackenberg, Schöne, & Müller, 2009). With the multiprocessor paradigm, however, the cache hierarchy has had an adverse effect on memory consistency, which is attributed to store propagation.

For instance, two processors p1 and p2 can each hold the same memory block in their private caches. After a subsequent store, the values in the two caches will differ: if p1 stores to a memory block that is present in both caches, then p2's cache holds a stale value, since p1 by default stored only into its own cache. If p2 never loaded that memory block again, or if the multiprocessor did not support shared memory, this cache incoherence would not be problematic. But multiprocessors do support shared memory, and so there is a point at which p2 must observe the value p1 stored. This implies that p1's store must affect the status of p2's cache to ensure consistency; the technique for doing this is what is referred to as cache coherence (Lai, Liu, Wang, & Feng, 2012).

A system is considered cache coherent if every execution results in a valid ordering of the reads and writes to each memory location: a total order in which each read returns the value written by the last preceding write to that location. In a cache-coherent memory system, writes to a given location must be totally ordered with respect to the other reads and writes to that location (Sorin, Hill, & Wood, 2011).
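The p1/p2 store-propagation scenario above can be reproduced in a minimal Python sketch; the variable names are invented for illustration, and no coherence mechanism is modeled, which is precisely what makes the stale read visible:

```python
# Toy reproduction of the store-propagation problem: two private caches,
# no coherence mechanism. All names here are illustrative.

memory = {"x": 0}
cache_p1 = {}
cache_p2 = {}

# Both processors load x into their private caches.
cache_p1["x"] = memory["x"]
cache_p2["x"] = memory["x"]

# p1 stores 1 to x; by default the store lands only in p1's cache
# (and, under write-through, in memory). Nothing touches p2's cache.
cache_p1["x"] = 1
memory["x"] = 1

# p2 now reads a stale value: exactly the incoherence described above.
assert cache_p2["x"] == 0
assert cache_p1["x"] == 1
```

A coherence protocol would either invalidate or update `cache_p2["x"]` at the moment of p1's store, so the final assertion on the stale value would no longer hold.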
Despite this, a common optimization permits partial ordering of reads to a memory location: either many readers or one writer, but not both, may exist at a time. This ensures that all processors see all writes to a given memory location in the same order; otherwise, a cache coherence problem arises.

Fig. 1 illustrates a cache coherence problem (Sorin, Hill, & Wood, 2011). Initially, memory location x holds the value 0, and both processor 0 and processor 1 read location x into their caches. If processor 0 then writes the value 1 to location x, processor 1's cache still contains the value 0 for location x. If processor 1 subsequently keeps reading location x, the cached, stale value 0 will be returned each time. This is not what a programmer expects: the anticipated behavior is that any read returns the most up-to-date copy of the data.

When multiple users depend on a single common memory resource, this problem of ensuring consistent, up-to-date data, referred to as the cache coherence problem, is aggravated. A cache coherence protocol is used to counter it: cache coherence protocols ensure that any request for a datum returns the most up-to-date value (Archibald & Baer, 1986). According to Marty (2008), they achieve this by springing into action whenever there is a write to a memory location.

III. Cache Coherence Protocol

Cache coherence protocols ensure that any request for a datum returns the most up-to-date value (Archibald & Baer, 1986). Given that a cache line is the granularity of a coherence protocol, whenever a cache line is written, the cache coherence protocol springs into action.
Normally, protocols take one of two actions when a cache line is written. First, they may invalidate (destroy) all copies of that cache line in the other caches in the system. Second, the protocol may instead update those cache lines with the newly written value. Most cache-coherent memory systems use the invalidation mechanism rather than the update one simply because it is easier to implement in hardware (Chen, 2008). Furthermore, the continuous increase in cache line sizes has kept invalidation protocols relevant and popular. However, there are situations where update-based protocols are ideal, for instance when accessing certain synchronization variables and heavily contended lines.

According to Stenstrom (1990), two main classes of cache coherence protocols exist: directory-based protocols and snoopy protocols. Al-Hothali, Soomro, Tanvir, and Tuli (2010) consider snoopy protocols broadcast systems. Snoopy protocols utilize a broadcast medium within a system and are only applicable in bus-based multiprocessors (Borodin & Juurlink, 2008). In a snoopy protocol, every cache "snoops" on the bus, monitoring for any transactions that affect it. If a cache discovers a read request on the bus and finds that it holds the most up-to-date copy of the datum, it responds to the bus request. If the cache observes a write on the bus and the line is present, the line is invalidated out of the cache. Building snoopy bus-based systems is very easy (Ferdman et al., 2012). However, an increase in the number of processors on a bus results in a bandwidth bottleneck, which in turn makes dependence on broadcast techniques a scalability nightmare. Snoopy protocols are commonly used in commercial multicore processors.
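The snooping behavior described above, combined with the invalidation action, can be illustrated with a small write-through/write-invalidate simulation. This is a sketch only: the class and method names are invented for this example, and real controllers are hardware state machines, not Python objects.

```python
# Minimal sketch of write-through / write-invalidate snooping.
# Every cache observes all bus writes and drops its stale copies.

class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr):
        # Every other cache "snoops" the write and invalidates its copy.
        for c in self.caches:
            if c is not writer:
                c.snoop_write(addr)

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}          # address -> cached value
        self.bus = bus
        bus.attach(self)

    def read(self, addr, memory):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value, memory):
        self.lines[addr] = value
        memory[addr] = value              # write-through to memory
        self.bus.broadcast_write(self, addr)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)        # invalidate any stale copy

memory = {0x10: 0}
bus = Bus()
p0, p1 = SnoopyCache(bus), SnoopyCache(bus)

p1.read(0x10, memory)              # p1 caches the value 0
p0.write(0x10, 1, memory)          # the bus write invalidates p1's copy
assert 0x10 not in p1.lines        # p1's line was invalidated
assert p1.read(0x10, memory) == 1  # p1 misses and re-fetches the new value
```

The `broadcast_write` loop is also where the scalability problem shows: every write to a shared line touches every cache on the bus, so traffic grows with the number of processors.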
Several snoopy bus-based protocols have been suggested; they are primarily categorized as write-through/write-invalidate and write-through/write-update (Archibald & Baer, 1986). In a write-invalidate protocol, as mentioned earlier, all the other caches containing a copy are invalidated. This protocol is advantageous because of its simplicity; however, it leads to cache misses on subsequent accesses (Al-Hothali, Soomro, Tanvir, & Tuli, 2010). The write-broadcast (update) protocol, on the other hand, updates all cached copies, so such misses cannot occur; it is disadvantageous because it uses more bandwidth, since every write to a shared cache line must be broadcast.

These problems have been addressed by the adoption of distributed shared-memory (DSM) architectures, where each multiprocessor node contains a processor, a portion of the system's physically distributed memory, its caches, and a node controller (Lai, Liu, Wang, & Feng, 2012). The node controller manages communication between and within nodes. Instead of being connected by a shared bus, the nodes are connected by a scalable interconnection network, which allows multiprocessors to scale to thousands of nodes. However, because a broadcast medium is lacking, snoopy bus-based protocols become inappropriate, and architecture designers must resort to directory-based cache coherence protocols (Sorin, Hill, & Wood, 2011).

A directory-based cache coherence protocol relies on a secondary data structure that monitors the status of each cache line in the system (Kent, 1987). The directory tracks, for each cache line, which cache has the most up-to-date copy if the line is exclusively held, or which caches have read-only copies.
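One common organization of the tracking structure described above, a full-map directory, might be sketched as follows. All names are invented for illustration; the point is that on a write miss the directory can send point-to-point invalidations to exactly the caches that hold copies, with no broadcast medium required.

```python
# Sketch of a full-map directory: per-line sharer set plus exclusive owner.

class DirectoryEntry:
    def __init__(self):
        self.sharers = set()   # caches holding a read-only copy
        self.owner = None      # cache holding an exclusive (writable) copy

class Directory:
    def __init__(self):
        self.entries = {}      # cache-line address -> DirectoryEntry

    def read_miss(self, addr, cache_id):
        e = self.entries.setdefault(addr, DirectoryEntry())
        if e.owner is not None:
            e.sharers.add(e.owner)   # demote the exclusive holder to sharer
            e.owner = None
        e.sharers.add(cache_id)
        return e

    def write_miss(self, addr, cache_id):
        e = self.entries.setdefault(addr, DirectoryEntry())
        holders = e.sharers | ({e.owner} if e.owner else set())
        invalidate = holders - {cache_id}
        e.sharers.clear()      # point-to-point invalidations go only to
        e.owner = cache_id     # the caches in `invalidate`, not to everyone
        return invalidate

d = Directory()
d.read_miss(0x40, "c0")
d.read_miss(0x40, "c1")
# c2 writes: only the two known sharers receive invalidations.
assert d.write_miss(0x40, "c2") == {"c0", "c1"}
assert d.entries[0x40].owner == "c2"
```

Note that the sharer set grows with the number of caches, which is why the chained and limited directory variants mentioned below trade precision for smaller per-line state.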
A directory-based cache-coherent system consults the directory on every cache miss and takes an appropriate action based on the directory's current state and the request type. To do away with the bottleneck a single monolithic directory would cause, the directory is distributed throughout the system, just as main memory is. Three categories of directory-based cache coherence protocols exist: chained directories, full-map directories, and limited directories (Chou, Aisopos, Lau, Kurosawa, & Jayasimha, 2007).

Despite all these advantages and disadvantages, it is clear that a problem exists: there needs to be some sort of trade-off between directory-based and snoopy protocols. When a snoopy cache coherence protocol is used, the speed of the shared bus and the cache coherence overhead limit the bandwidth available for broadcasting messages. When a directory-based protocol is used, the extra interconnect traversal and the directory access lie on the critical path of cache-to-cache misses while the directory state is manipulated. These problems cannot be fully addressed by either class of cache coherence protocol. As such, on the basis of the review offered herein, it would be ideal to design a hybrid-compatible cache coherence protocol, one that can be used on the same system at the same time by different caches. This would correct the demerits of both snoopy bus-based and directory-based cache coherence protocols.

Conclusion

The two classes of cache coherence protocols discussed, directory-based protocols and snoopy protocols, both have merits and demerits. As noted, directory-based protocols are viable as far as scalability is concerned: they can scale distributed shared-memory multicore processors to thousands of processors. Therefore, compared to snoopy protocols, directory-based protocols scale much better.
Additionally, they can exploit arbitrary point-to-point interconnects. Snoopy protocols, on the other hand, require a broadcast medium to monitor the memory transactions of other processors. Snoopy protocols are advantageous in that their average miss latency is low; however, the speed of the shared bus and the cache coherence overhead limit the bandwidth available for broadcasting messages, and consequently snoopy protocols are inefficient in terms of power dissipation.

The question therefore arises: which cache coherence protocol is better? Each protocol is better in its own right depending on applicability. Generally, for smaller systems a snooping protocol is better, whereas for large systems that need a performance boost, directory-based protocols are ideal. Therefore, this paper recommends that a hybrid-compatible cache coherence protocol be designed, one that can be used on the same system at the same time by different caches. This would correct the demerits of both snoopy bus-based and directory-based cache coherence protocols.

IV. References

Al-Hothali, S., Soomro, S., Tanvir, K., & Tuli, R. (2010). Snoopy and directory based cache coherence protocols: A critical analysis. Journal of Information & Communication Technology, 1–10.
Archibald, J., & Baer, J.-L. (1986). Cache coherence protocols: Evaluation using a multiprocessor simulation model. ACM Transactions on Computer Systems, 4(4), 273–298.
Barroso, L. A., & Dubois, M. (1991). Cache coherence on a slotted ring. ICPP.
Borodin, D., & Juurlink, B. (2008). A low-cost cache coherence verification method for snooping systems. Delft: Delft University of Technology.
Chaiken, D., Kubiatowicz, J., & Agarwal, A. (1991). LimitLESS directories: A scalable cache coherence scheme. ASPLOS-IV.
Chang, J., & Sohi, G. S. (2006). Cooperative caching for chip multiprocessors. Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA) (pp. 264–276).
Chen, X. (2008). Verification of hierarchical cache coherence protocols for futuristic processors. The University of Utah, School of Computing, Utah.
Cheng, L., Carter, J. B., & Dai, D. (2007). An adaptive cache coherence protocol optimized for producer–consumer sharing. University of Utah.
Chou, C.-C., Aisopos, K., Lau, D., Kurosawa, Y., & Jayasimha, D. N. (2007). Using OCP and coherence extensions to support system-level cache coherence. OCP.
Ferdman, M., Adileh, A., Kocberber, O., Volos, S., Alisafaee, M., Jevdjic, D., et al. (2012). Clearing the clouds: A study of emerging scale-out workloads on modern hardware. Proceedings of the 17th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012). London: ACM.
Glasco, D. B., Delagi, B. A., & Flynn, M. J. (1993). Update-based cache coherence protocols for scalable shared-memory multiprocessors. Stanford: Stanford University, Departments of Electrical Engineering and Computer Science.
Kent, C. A. (1987). Cache coherence in distributed systems. Palo Alto: Western Research Laboratory.
Kurian, G., Miller, J. E., Psota, J., Eastep, J., Liu, J., Michel, J., et al. (2010). ATAC: A 1000-core cache-coherent processor with on-chip optical network. Proceedings of Parallel Architectures and Compilation Techniques (PACT). Vienna: ACM.
Lai, X., Liu, C., Wang, Z., & Feng, Q. (2012). A cache coherence protocol using distributed data dependence violation checking in TLS. Intelligent System Design and Engineering Application (ISDEA), 2012 Second International Conference (pp. 5–10). Sanya: IEEE.
Lee, D., Wester, B., Veeraraghavan, K., Narayanasamy, S., Chen, P. M., & Flinn, J. (2010). Respec: Efficient online multiprocessor replay via speculation and external determinism. ASPLOS'10. Pittsburgh: ACM.
Loghi, M., Poncino, M., & Benini, L. (2006). Cache coherence tradeoffs in shared-memory MPSoCs. ACM Transactions on Embedded Computing Systems, 5(2), 383–407.
Marty, M. R. (2008). Cache coherence techniques for multicore processors. University of Wisconsin, Computer Sciences, Madison.
Molka, D., Hackenberg, D., Schöne, R., & Müller, M. S. (2009). Memory performance and cache coherency effects. 18th International Conference on Parallel Architectures and Compilation Techniques (PACT) (pp. 261–270). IEEE.
Sorin, D. J., Hill, M. D., & Wood, D. A. (2011). A primer on memory consistency and cache coherence. (M. D. Hill, Ed.) San Rafael, CA: Morgan & Claypool Publishers.
Stenstrom, P. (1990). A survey of cache coherence schemes for multiprocessors. Lund: Lund University.