News Column

Patent Issued for Area and Power Efficient Data Coherency Maintenance

July 1, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- From Alexandria, Virginia, VerticalNews journalists report that a patent by the inventors Craske, Simon John (Cambridge, GB); Penton, Antony John (Cambridge, GB); Pierron, Loic (Cambridge, GB); Rose, Andrew Christopher (Cambridge, GB), filed on February 2, 2010, was published online on June 17, 2014.

The patent's assignee for patent number 8756377 is ARM Limited (Cambridge, GB).

News editors obtained the following quote from the background information supplied by the inventors: "The field of the invention relates to data processing and in particular to maintaining memory coherency in a data processing apparatus having multiple masters, at least one local cache and a memory.

"FIG. 1 shows schematically a very simple system where coherency problems can arise. This system has a DMA 5 (direct memory access device) that accesses a memory 25 via a bus 20. There is also a processor CPU 10 that has a local cache and that also accesses the memory 25. In this example the cache of the CPU 10 is configured as a write-through cache so that data that the CPU 10 writes to the memory is written to the cache as well. This allows the CPU 10 to access this data more quickly later. However, as the DMA 5 is also accessing the memory 25 it may overwrite a data item stored in the memory that is also stored in the cache of the CPU. This would result in the CPU 10 storing an out of date value for that data item which if not corrected could result in errors in the CPU's processing. To protect against this there is a monitoring circuit 12 provided that snoops writes sent from the DMA 5 on the bus 20 and in response to detecting a write to an address stored in the cache of CPU 10 it invalidates the line in the cache storing this value. This means that a future access to the data item by the CPU 10 will miss in the cache and the CPU will access the memory 25 and retrieve the correct value. A problem with this system is that snooping of the bus and invalidation of the line in the cache takes time and in order to avoid errors it must happen quickly enough to keep up with the DMA writes, otherwise if an interrupt occurs between the DMA 5 updating a value of a data item in the memory and the corresponding cache line being invalidated an incorrect value could be stored in the CPU.

"One way of addressing this problem is to put 'back pressure' on the DMA so that it is stalled until the CPU has completed its work on the cache. FIG. 2 shows an example of a system having a write-back cache where the CPU 30 writes a data value to its cache and marks it as dirty and updates the memory and then marks the value as clean. This increases the speed of the writes but makes the coherency scheme more complex. In such a system, the most up to date value of a data item may be stored in the cache and not in the memory and thus, the snoop unit blocks any DMA writes if it detects the value to be stored in the CPU until the cache has been invalidated and cleaned if required. This maintains coherency but introduces significant delays as the DMA writes are stalled until the CPU has done the required work on its cache.

"It would be desirable to be able to maintain cache coherency without unduly increasing processing delays."
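The stale-data hazard described in the background for FIG. 1 can be illustrated with a short Python sketch. This is a toy model written for this article, not code from the patent: the class `WriteThroughCache`, the function `dma_write_then_read`, the address 0x40 and the string values are all hypothetical, and the DMA write is modelled as a plain update of a dictionary standing in for memory.

```python
class WriteThroughCache:
    """Hypothetical toy cache: keeps local copies of values read from memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch from memory and keep a local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def invalidate(self, addr):
        # Drop the local copy so the next read misses and goes to memory.
        self.lines.pop(addr, None)


def dma_write_then_read(invalidate_before_read):
    """Model a DMA write followed by a CPU read of the same address."""
    memory = {0x40: "old"}
    cache = WriteThroughCache(memory)
    cache.read(0x40)        # CPU caches the value "old"
    memory[0x40] = "new"    # DMA writes to memory, bypassing the cache
    if invalidate_before_read:
        cache.invalidate(0x40)  # the snoop logic kept pace with the write
    return cache.read(0x40)
```

In this model, calling `dma_write_then_read(False)` stands for the case where the CPU reads before the invalidation has happened and therefore sees the out-of-date value; `dma_write_then_read(True)` stands for the case where the invalidation arrived in time.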

As a supplement to the background information on this patent, VerticalNews correspondents also obtained the inventors' summary information for this patent: "A first aspect of the present invention provides an apparatus for storing data being processed comprising: a cache associated with at least one device and for storing a local copy of data items stored in a memory for use by said at least one device; monitoring circuitry associated with said cache for monitoring write transaction requests to said memory initiated by at least one further device, said at least one further device being configured not to store data in said cache, said monitoring circuitry being responsive to detecting a write transaction request to write a data item, a local copy of which is stored in said cache, to block a write acknowledge signal transmitted from said memory to said at least one further device indicating said write has completed and to invalidate said stored local copy in said cache and on completion of said invalidation to send said write acknowledge signal to said at least one further device.

"The present invention recognises the competing problems associated with cache coherency operations. These operations need to keep pace with data writes if errors are to be avoided; however, doing this by delaying the writes increases delays in the system. The present invention addresses these competing problems as follows: rather than blocking a write request until coherency operations have been performed, it allows the request to proceed and the actual write to be performed, but prevents the write from completing by blocking the transmission of the write acknowledge signal. When the write acknowledge signal is detected by the monitoring circuit it is blocked and invalidation of the corresponding cache entry is performed. Thus, the write acknowledge signal is blocked until the entry invalidation is completed, whereupon the write acknowledge signal is unblocked and allowed to travel to its destination. As the write has not completed until the write acknowledge signal is received, no interrupt can be processed until this occurs, which avoids coherency errors. Furthermore, as it is only the write acknowledge signal that is blocked and needs to be buffered, not much information needs to be stored, and thus there is little additional storage requirement. If it were the write transaction requests that were delayed, considerably more storage would be required.
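(Illustration added for this article, not part of the patent text.) The ordering described in the paragraph above, in which the write proceeds immediately and only the acknowledge is held back until the invalidation is done, might be sketched in Python roughly as follows. The class `AckBlockingMonitor`, its method names and the log strings are hypothetical; invalidation is assumed to complete instantly, and memory is modelled as a dictionary.

```python
from collections import deque


class AckBlockingMonitor:
    """Toy behavioural model of the monitoring circuitry (names are assumptions)."""

    def __init__(self, cached_addresses):
        self.valid = set(cached_addresses)  # addresses with a valid local copy
        self.held_acks = deque()            # only small ack records are buffered
        self.log = []                       # records the order of events

    def on_write_request(self, addr, memory, value):
        # The write request is not stalled: the write is performed straight away.
        memory[addr] = value
        self.log.append(f"write {addr:#x} performed")

    def on_write_ack(self, addr):
        # The acknowledge is intercepted on its way back to the writing device.
        if addr in self.valid:
            self.held_acks.append(addr)
            self.log.append(f"ack {addr:#x} blocked")
            self.valid.discard(addr)        # invalidate the stale local copy
            self.log.append(f"line {addr:#x} invalidated")
            self.held_acks.popleft()        # release the held acknowledge
        self.log.append(f"ack {addr:#x} forwarded")
```

The point of the sketch is the storage argument made in the quote: the only thing queued is a record of the pending acknowledge, not the write address and data, so the buffer stays small.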

"In some embodiments, the apparatus further comprises a master port for receiving transaction requests from said at least one further device to said memory; an interconnect port for accessing said memory via an interconnect; said apparatus comprising channels for transmitting said transaction requests to said interconnect port, at least some of said channels being routed through said monitoring circuitry to said interconnect port.

"Although the monitoring circuit can monitor the write traffic in a number of ways, in some embodiments the channels being monitored are passed through the monitoring circuitry on their way to the interconnect and the monitoring circuitry can then monitor them and also block signals as required.

"In some embodiments, said channels comprise a write address channel for transmitting an address of a write transaction request, a response channel for transmitting said write acknowledge signal, a write data channel for transmitting data to be written by said write transaction request, a read address channel for transmitting an address of a read transaction request and a read data channel for transmitting data that has been read, said response channel and said write address channel being routed through said monitoring circuitry to said interconnect port and said other channels being routed directly to said interconnect port.

"In order for the monitoring circuitry to be able to monitor the write traffic from the at least one further device the write address channel is routed through the monitoring circuitry. This channel carries the information that the monitoring circuitry requires to determine if the write request is to a data item that is stored in the cache. The response channel is also routed through the monitoring circuitry enabling the monitoring circuitry to block the write acknowledge signal and then to transmit it once it has invalidated any local copies of the data that require invalidating. This is generally done by setting a valid indicator associated with the storage location to invalid.

"In some embodiments, said monitoring circuitry is configured to monitor said write address channel to determine whether said write transaction request is to write a data item, a local copy of which is stored in said cache.

"In some embodiments, said monitoring circuitry is responsive to a coherency indicator associated with a write transaction request having a predetermined value not to block said write acknowledge signal transmitted from said memory to said at least one further device, and being responsive to said coherency indicator not having said predetermined value to block said write acknowledge signal.

"Although the monitoring circuitry can maintain coherency, there are circumstances where accesses are to regions where one can be sure that there are no coherency problems. In such a case, a coherency indicator has a predetermined value that indicates to the monitoring circuitry that coherency operations do not need to be performed for this access and thus, the write acknowledge signal does not need to be blocked and no line needs to be invalidated. Such a coherency indicator may be set to this predetermined value by the further device, and it may indicate that the device is writing to a region where local copies of the data are never taken.
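(Illustration added for this article, not part of the patent text.) A minimal Python sketch of the coherency-indicator bypass described above is given here. The constant `NON_COHERENT`, its encoding as 1, and the function `handle_write` are assumptions made for the example; the patent text only speaks of a "predetermined value". The sketch also assumes that, for coherent writes, the acknowledge is held only when the address hits in the cache.

```python
NON_COHERENT = 1  # hypothetical encoding of the "predetermined value"
COHERENT = 0      # hypothetical encoding of any other value


def handle_write(addr, sideband, valid_lines):
    """Return (ack_blocked, line_invalidated) for one monitored write.

    valid_lines is a set of addresses that currently have a valid local copy;
    it is updated in place when a line is invalidated.
    """
    if sideband == NON_COHERENT:
        # The writer has declared that no local copies exist for this region,
        # so the monitor neither holds the acknowledge nor touches the cache.
        return (False, False)
    hit = addr in valid_lines
    if hit:
        valid_lines.discard(addr)  # invalidate while the acknowledge is held
    return (hit, hit)
```

Note that in this model a non-coherent write to an address that happens to be cached leaves the stale copy in place, which is why the indicator is only safe for regions where local copies are never taken.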

"In some embodiments, said coherency indicator comprises a sideband signal associated with an address of said write request.

"Although the coherency indicator can be transmitted to the monitoring circuitry in a number of ways, it is quite convenient to transmit it as a sideband signal associated with an address to the write request. As the monitoring circuitry may use the write address to determine whether a local copy of the data item is stored in the cache, it will need to monitor this signal and as such, a sideband signal associated with it can also be monitored quite easily without requiring additional circuitry or routing.

"In some embodiments, said cache comprises a write-through storage region, in which data is stored in said cache at a same time as it is stored in said memory and a write-back storage region in which data is stored first in said cache and marked as dirty and is stored in said memory later whereupon the local copy stored in said cache is no longer marked as dirty, said monitoring circuitry being responsive when trying to invalidate a stored local copy, to detection that said stored local copy is marked as dirty to assert an error indicator indicating failed invalidation and not to invalidate said storage location.

"As noted in the introduction, caches can be configured to operate in either a write-through or a write-back mode. In a write-through mode the data is stored in the cache at the same time as it is stored in the memory while in a write-back mode it is stored in the cache first and stored in the memory later and thus, the cache may hold a value of the data item more recent than the value in memory. If it does hold this value it is marked as dirty. The coherency operations of the present invention are not suitable for write-back regions of the cache. This is because these coherency operations simply invalidate the local copy of the data item and if this is a dirty copy then it should be cleaned first, i.e. the value should be written to memory. However, it should be noted that if the access is to a write-back region of the cache where the data item stored is not dirty then the coherency operations will function correctly. Thus, embodiments of the present invention detect whether or not the item is marked as dirty and if it is they do not invalidate the stored local copy but rather they assert an error indicator which indicates a failed invalidation. In this way, the most up-to-date value of the data item is not overwritten and the processor knows from the error indicator that there is a programming error and it needs to react accordingly. Thus, although coherency is not maintained where there is a write-back region and a write request is made to memory where a dirty value of that item is stored in the cache, the system does provide an indication to the processor that an error has occurred and thus, the processor does not continue to operate with false data.
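(Illustration added for this article, not part of the patent text.) The dirty-line check described above could be modelled along the following lines in Python. The `Line` dataclass and the function `try_invalidate` are hypothetical names; the sketch assumes one valid bit and one dirty bit per cache line.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical cache line state: one valid bit and one dirty bit."""
    valid: bool = True
    dirty: bool = False


def try_invalidate(line):
    """Attempt to invalidate a line; return True if an error must be signalled."""
    if not line.valid:
        return False       # nothing cached, nothing to do
    if line.dirty:
        # The cache holds the newest value; discarding it would lose data,
        # so the line is left untouched and a failed invalidation is reported.
        return True
    line.valid = False     # clean copy: safe to drop
    return False
```

A clean line in a write-back region is handled exactly like a write-through line here, matching the quote's remark that the scheme works in write-back regions as long as the stored item is not dirty.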

"Although the error indicator can be flagged in a number of ways, in some embodiments said error indicator forms part of said write acknowledge response.

"As the write acknowledge response is returned to the device that tried to access the memory which is storing an old version of the data, it is appropriate that the error response is sent to this device so that this device knows that coherency could not be maintained for this access. A convenient way of transmitting it is with the write acknowledge response, possibly as a side band signal to this response.

"In some embodiments, said apparatus further comprises cache control circuitry for controlling storage of data items in said cache.

"The cache may have associated with it cache control circuitry that controls the storage of the data and sets indicators such as the dirty bit.

"In some embodiments, said monitoring circuitry is responsive to detection that said cache controller and said cache are powered down not to block said write acknowledge signal and not to invalidate said local copy.

"If the cache controller and the cache are powered down then the monitoring circuitry is responsive to detection of this and does not block the write acknowledge signal nor does it invalidate the local copy. This is because if the cache is powered down then after it is powered up all the lines must be invalidated before it is enabled.

"In some embodiments, said monitoring circuitry is configured in response to detection that said cache controller is powered down and said cache is powered up and in response to detecting a write transaction request to write a data item, a local copy of which is stored in said cache, to assert an error indicator indicating failed invalidation and not to invalidate said local copy.

"If the cache controller is powered down but the cache itself is not powered down then in response to detecting a write transaction request to write a data item, a local copy of which is stored in the cache, an error indicator is asserted and the local copy is not invalidated. If the cache controller is powered down then the cache data storage can no longer be correctly controlled and this can be signalled to any devices that are processing the data by an error signal. An error signal is also used to indicate times when the cache is operating in write-back mode and coherency cannot be maintained. Thus, the same indicator can be used for both situations and, in this area-efficient way, times when the coherency of the stores cannot be guaranteed can be indicated to any processing apparatus.
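(Illustration added for this article, not part of the patent text.) The power-state cases described in the preceding paragraphs can be summarised as a small decision function. The enum `Power`, the function `monitor_response` and the three result strings are hypothetical. The text does not say what happens on a cache miss, so the sketch assumes a miss simply passes through.

```python
from enum import Enum


class Power(Enum):
    """Hypothetical names for the three power situations discussed in the text."""
    BOTH_DOWN = 0                 # cache controller and cache powered down
    CONTROLLER_DOWN_CACHE_UP = 1  # controller down, cache storage still powered
    BOTH_UP = 2                   # normal operation


def monitor_response(power, hit, dirty=False):
    """Return "pass", "invalidate" or "error" for one monitored write.

    "pass":       acknowledge not blocked, nothing invalidated
    "invalidate": acknowledge held while the local copy is invalidated
    "error":      error indicator asserted, local copy left in place
    """
    if power is Power.BOTH_DOWN:
        return "pass"   # all lines are invalidated at power-up anyway
    if not hit:
        return "pass"   # assumption: writes that miss need no action
    if power is Power.CONTROLLER_DOWN_CACHE_UP:
        return "error"  # cache contents can no longer be controlled
    if dirty:
        return "error"  # same indicator reused for the dirty write-back case
    return "invalidate"
```

The two `"error"` branches reflect the quote's observation that a single indicator can cover both the powered-down controller and the dirty write-back line.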

"A second aspect of the present invention provides a data processing apparatus comprising: said apparatus for storing data being processed according to a first aspect of the present invention; a processor for processing said data; and an interconnect port associated with said processor for accessing said memory via an interconnect; wherein said cache is associated with and stores data processed by said processor.

"Although the cache may be a stand alone cache that stores data for one or more devices such as video processors or various CPU's, in some embodiments it is a cache that is associated with a processor and stores data locally for that processor to use.

"In some embodiments, said data processing apparatus comprises a further processor, a further cache associated with said further processor for storing a local copy of a data item stored in a memory and an interconnect port associated with said further processor for accessing said memory via an interconnect and monitoring circuitry associated with said further cache for monitoring write traffic on said interconnect; wherein said monitoring circuitry associated with each of said cache and said further cache is configured to monitor write transaction requests from a processor not associated with said respective cache and to respond to detecting a write transaction request to write a data item, a local copy of which is stored in said cache, to block a write acknowledge signal transmitted from said memory to said processor not associated with said cache indicating said write has completed and to invalidate said stored local copy in said cache and thereafter to send said acknowledge signal to said processor.

"Embodiments of the present invention can provide a cross-coupled system where cache coherency between two cached processors is maintained in an area efficient manner by allowing the monitoring circuit of each to monitor the write requests of the other one and to invalidate cache lines where they are writing to data that is stored in respective caches.

"In some embodiments, said data apparatus further comprises said at least one further device, said at least one further device comprising a direct memory access (DMA) device.

"The at least one further device may comprise any device that accesses the memory, however in some embodiments it comprises a direct memory access device.

"In some embodiments, said monitoring circuitry is responsive to a coherency indicator associated with a write transaction request having a predetermined value not to block said write acknowledge signal transmitted from said memory to said at least one further device, and being responsive to said coherency indicator not having said predetermined value to block said write acknowledge signal; and said direct memory access device is configured to set said coherency indicator as a sideband signal of write address data in said write transaction request in dependence upon whether an access is to be performed coherently or non-coherently.

"The direct memory access device may perform data accesses such as copy operations coherently or non-coherently. If they are being performed non-coherently then it indicates this on a sideband signal of the write address data and the monitoring circuit then knows that it does not need to block the write acknowledge signal or invalidate any lines.

"A third aspect of the present invention provides a method of maintaining coherency between a cache and a memory comprising: monitoring write transaction requests to said memory initiated by a device that does not store data in said cache; detecting one of said monitored write transaction requests that is to write a data item, a local copy of which may be stored in said cache; blocking a write acknowledge signal transmitted from said memory to said at least one further device indicating said detected write has completed; determining if said data item is stored in said cache and if so invalidating said stored local copy in said cache; unblocking said write acknowledge signal and transmitting said write acknowledge signal to said at least one further device.

"The above, and other objects, features and advantages of this invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings."

For additional information on this patent, see: Craske, Simon John; Penton, Antony John; Pierron, Loic; Rose, Andrew Christopher. Area and Power Efficient Data Coherency Maintenance. U.S. Patent Number 8756377, filed February 2, 2010, and published online on June 17, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8756377.PN.&OS=PN/8756377RS=PN/8756377

Keywords for this news article include: ARM Limited, Information Technology, Information and Data Processing.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Information Technology Newsweekly

