News Column

"Using Cache Hit Information to Manage Prefetches" in Patent Application Approval Process

June 19, 2014



By a News Reporter-Staff News Editor at Computer Weekly News -- A patent application by the inventor Chaudhary, Anurag (San Jose, CA), filed on November 27, 2012, was made available online on June 5, 2014, according to news reporting originating from Washington, D.C., by VerticalNews correspondents.

This patent application is assigned to Nvidia Corporation.

The following quote was obtained by the news editors from the background information supplied by the inventors: "A primary factor in the utility of a computer system is the speed at which the computer system can execute an application. It is important to have instructions and data available at least as fast as the rate at which they can be executed, to prevent the computer system from idling (stalling) while it waits for the instructions and/or data to be fetched from main memory.

"A widely used solution to reduce or prevent stalling is to implement a hierarchy of caches in the computer system. In essence, one or more caches are situated between the main memory and the central processing unit (CPU). The caches store recently used instructions and data based on the assumption that information might be needed again. By storing information in a hierarchical manner, the caches can reduce latency by providing information more rapidly than if the information had to be retrieved from, for example, the main memory.

"The closer a cache is to the CPU, the shorter the latency between the cache and the CPU. The cache closest to the CPU is usually referred to as the level one (L1) cache, the next cache is usually referred to as the level two (L2) cache, and so on. Information most likely to be needed by the CPU, or information more recently accessed by the CPU, is stored in the L1 cache, the next tier of information is stored in the L2 cache, and so on.

"Latency can be further reduced by prefetching information into the caches. Prefetching involves, in essence, making a prediction of the information that may be needed by an application, and then prefetching that information from, for example, the main memory into a cache, or from one cache into a cache that is closer to the CPU (e.g., from the L2 cache to the L1 cache).

"Hardware-initiated prefetching is typically based on a pattern-matching mechanism. The traffic stream (e.g., the stream of access requests for instructions or data) is monitored to try to find a pattern to the requests. If a pattern can be found, then that pattern can be used to anticipate subsequent requests for information, so that information can be prefetched. For example, if the prefetcher determines that data has been requested from addresses 2, 4, and 6 in the L2 cache because of cache misses in the L1 cache (e.g., a pattern of every other address, corresponding to every other cache line), then the prefetcher can anticipate that the cache line at address 8 might also be needed and can prefetch that cache line.

"There is a basic tradeoff in prefetching. As noted above, prefetching can improve performance by reducing latency. On the other hand, if too much information (e.g., too many cache lines) is prefetched, then the efficiency of the prefetcher may be reduced. Furthermore, if too much information is prefetched, then the cache might become polluted with cache lines that might not actually be needed. If the cache is full, then prefetching new cache lines into the cache can cause useful lines to be prematurely evicted in order to make room for the new lines.

"The benefits and risks of prefetching both can increase as the prefetch distance is increased. The prefetch distance is a measure of how far to prefetch based on an observed pattern. If, for instance, data is fetched from addresses 2, 4, and 6 (a pattern of every other address), then data can be prefetched from address 8 if the prefetch distance is one, from addresses 8 and 10 if the prefetch distance is two, and so on. In general, the prefetch distance specifies the number of accesses projected along a pattern starting from a starting point in the pattern (usually, from the last demand access that is a part of the pattern).

"The prefetch distance can be managed using a confidence value associated with the pattern. The confidence value, in effect, is a measure of how often the pattern is observed or, equivalently, the number of elements that make up the pattern. The confidence value, and hence the prefetch distance, may initially be zero; that is, prefetching might not begin as soon as an apparent pattern is detected. Instead, prefetching might begin only if the pattern is observed repeatedly; each time the pattern is observed, the confidence value can be incremented, and the prefetch distance can be increased when the confidence value reaches a threshold. In the example above, if the pattern indeed continues as expected and ends up including addresses 8 and 10 in addition to addresses 2, 4, and 6, then the confidence value might be incremented and prefetching can begin. If the pattern continues beyond address 10, then the confidence value and consequently the prefetch distance can again be increased. In other words, if the actual pattern continues to match the predicted pattern, then the confidence value can be increased and, in turn, the prefetch distance can be increased."

In addition to the background information obtained for this patent application, VerticalNews journalists also obtained the inventor's summary information for this patent application: "As used herein, an access request refers to a request for information (data and/or instructions) from a memory element. An access request may be a demand request issued by a processing unit, or it may be a request issued by a computer system's operating system or memory management system. In response to an access request, information may be fetched from a memory element to a processing unit, or it may be fetched from one memory element to another memory element (e.g., from main memory to a cache, or from one cache to another).

"As noted above, as the confidence level increases, the prefetch distance can increase as well. At some point, the prefetch distance can reach its design or user-specified limit, the maximum prefetch distance. However, there is an advantage to capping the prefetch distance at less than the maximum prefetch distance. For example, if L1 cache misses start hitting in the L2 cache when the prefetch distance for the L2 cache (the number of cache lines being prefetched into the L2 cache) is less than the maximum prefetch distance, then the prefetch distance for the L2 cache can be maintained at its current value so that a greater number of cache lines are not unnecessarily prefetched. That is, the fact that L1 cache misses are hitting in the L2 cache provides an indication that prefetching into the L2 cache is occurring far enough ahead, and thus it is not necessary to continue increasing the prefetch distance for the L2 cache beyond the current prefetch distance. By not increasing the prefetch distance beyond a value that is demonstrated as being sufficient, the possibility of a useful cache line being evicted prematurely from the L2 cache is reduced or eliminated. Also, overall efficiency is improved by some measures; for example, the prefetcher is not consuming bandwidth by prefetching more cache lines than are needed.

"Embodiments according to the present invention utilize cache hit information to manage (e.g., cap) the prefetch distance for a cache. In an embodiment in which there is a first cache and a second cache, where the second cache (e.g., an L2 cache) has greater latency than the first cache (e.g., an L1 cache), a prefetcher prefetches cache lines to the second cache and is configured to receive feedback from that cache. The feedback indicates whether an access request issued in response to a cache miss in the first cache results in a cache hit in the second cache. The prefetch distance for the second cache is determined according to the feedback. In one such embodiment, the prefetch distance for the second cache is held at its current value, even if the confidence level continues to increase and even if the current value is less than the absolute maximum value.

"In another embodiment, if the feedback indicates that an access request issued in response to a miss in the first cache results in a cache hit in the second cache, then the prefetch distance for the second cache is allowed to increase from its current value by a specified amount to a new (second) value, then held at the second value even if that value is less than the absolute maximum value. In this manner, an amount of margin is included in the prefetch distance for the second cache to account for changes in latencies in caching and/or prefetching. For example, if access requests are occurring relatively close to one another, then a larger prefetch distance may be needed; conversely, if access requests are occurring less frequently, then a smaller prefetch distance may be satisfactory. Thus, if the prefetch distance is capped, without margin, at its current value based on the current frequency of access requests, then that prefetch distance may not be satisfactory if the frequency of access requests increases. Without margin, some access requests issued in response to a cache miss in the first cache might result in a cache miss in the second cache as well, at least for the amount of time it takes for the prefetcher to catch up to the increased frequency of access requests (that is, until the prefetcher is able to detect the pattern of cache misses in the second cache and increase the prefetch distance for the second cache accordingly). With margin, this situation can be avoided.

"The prefetch distance for each cache in a hierarchy of caches can be managed in a manner similar to that described above. The prefetch distance can be different for each of the caches. Thus, for example, the prefetch distance for the L1 cache can be less than the prefetch distance for the L2 cache. This would reduce overall latency while also reducing pollution of the L1 cache.

"Thus, there are a number of other advantages associated with the features of the present invention. For example, the prefetch distance can be managed so that it is satisfactory for reducing cache misses and for mitigating the latencies associated with such misses, but does not result in evicting potentially useful cache lines from the caches and does not consume computer resources (e.g., bandwidth) by prefetching relatively large numbers of cache lines when it is not necessary to do so. By managing prefetch distance, information is still retrieved in advance of when it is needed but not too far in advance; that is, a sufficient number of cache lines are prefetched but not too many cache lines. In effect, some prefetches are postponed without negatively affecting computer system performance.

"These and other objects and advantages of the various embodiments of the present disclosure will be recognized by those of ordinary skill in the art after reading the following detailed description of the embodiments that are illustrated in the various drawing figures.

BRIEF DESCRIPTION OF THE DRAWINGS

"The accompanying drawings, which are incorporated in and form a part of this specification and in which like numerals depict like elements, illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the disclosure.

"FIG. 1 is a block diagram of an example of a computer system capable of implementing embodiments according to the present invention.

"FIG. 2A is a block diagram of a computer system that includes a prefetcher in an embodiment according to the present invention.

"FIG. 2B is a block diagram illustrating the flow of information between elements including a prefetcher in an embodiment according to the present invention.

"FIG. 3 illustrates prefetch distances in an embodiment according to the present invention.

"FIG. 4 is a block diagram of a computer system that includes a prefetcher in an embodiment according to the present invention.

"FIGS. 5A and 5B illustrate examples of prefetch distances over time in embodiments according to the present invention.

"FIG. 6 is a flowchart of an example of a computer-implemented method for managing prefetches in an embodiment according to the present invention."

For the URL and more information on this patent application, see: Chaudhary, Anurag. Using Cache Hit Information to Manage Prefetches. Filed November 27, 2012, and posted June 5, 2014. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=476&p=10&f=G&l=50&d=PG01&S1=20140529.PD.&OS=PD/20140529&RS=PD/20140529

Keywords for this news article include: Nvidia Corporation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Computer Weekly News

