News Column

"Apparatus and Method for Controlling Access to a Memory Device" in Patent Application Approval Process

September 4, 2014



By a News Reporter-Staff News Editor at Politics & Government Week -- A patent application by the inventors CAMPBELL, Michael Andrew (Cambridge, GB); CONWAY, Thomas Kelshaw (Cambridge, GB), filed on February 11, 2013, was made available online on August 21, 2014, according to news reporting originating from Washington, D.C., by VerticalNews correspondents.

This patent application is assigned to Arm Limited.

The following quote was obtained by the news editors from the background information supplied by the inventors: "The present invention relates to an apparatus and method for controlling access to a memory device, and in particular for controlling access to a memory device where error correction codes are added to data to be stored in the memory device in order to provide error correcting capabilities when the data is subsequently retrieved from the memory device.

"It is known to use error correction codes (ECC) in order to protect a data packet from various forms of data corruption. Typically, this is achieved by treating the data packet as a series of data symbols of fixed length, and then adding a number of ECC symbols so that the data symbols and ECC symbols collectively form a code word. Using such a technique, if m ECC symbols are added when forming the code word, then up to m/2 randomly located symbol errors can be located and corrected within the code word. There are various known ECC coding techniques for generating the symbols of the code word. For example, one technique uses Reed Solomon codes, these codes being based on Galois field mathematics and having properties which make them suitable for hardware implementation.

"One practical application for such an ECC coding technique is in memory devices, for example memory devices using DRAM (Dynamic Random Access Memory). One known arrangement of such a memory device involves providing a number of Dual Inline Memory Modules (DIMMs), where each DIMM consists of a number of DRAM chips on a circuit board, including at least one chip reserved for storing ECC information. Often, such a memory device is accessed via burst access operations, each burst comprising a plurality of beats, and the DRAM chips of the DIMM being accessed during each beat. In such an arrangement, it is known to treat the entirety of the data to be written to the memory device via a burst write access as forming the data packet, with a plurality of ECC codes then being generated to add to that data packet in order to form the code word. As mentioned earlier, if the code word includes m ECC symbols, then up to m/2 randomly located symbol errors can be corrected when the data is subsequently read from the memory via a burst read access.

"There are various applications where data stored in the memory device may be subjected to such ECC coding techniques. One particular example is in Reliability, Availability, Serviceability (RAS) critical applications such as data server applications, where the use of such techniques provides greater reliability and availability of service.

"When the memory device is arranged as discussed earlier by employing a number of DIMMs, it is easy to replace any one of the DIMMs in the event of a failure. In particular, if one or more individual DRAM chips within a DIMM fail, then that can be notified to an entity responsible for managing the memory device, and the relevant DIMM can be replaced. Accordingly, current ECC coding techniques are targeted at maintaining service until a failed module can be replaced.

"However, such memory devices cannot always be deployed in convenient locations, and accordingly there can be some delay in replacing a failed DIMM. During such time, it would be preferable for the memory device to continue to be operational. Whilst one known way to achieve this is to provide one or more redundant blocks of memory, which can be switched in in the event of a failure, this obviously increases the size and cost of the memory device, and is not appropriate in many applications.

"Furthermore, it is increasingly the case that low cost, low power, servers are being built with solder/down memory parts. Unlike the above arrangement that uses replaceable DIMM modules, once such a memory device is assembled, it cannot be maintained in a similar fashion, and accordingly once sufficient memory failures have accumulated past the capability of the ECC protection scheme, the memory device is rendered unusable. It would accordingly be desirable to prolong the usability of such memory devices.

"The paper 'Virtualised ECC: Flexible Reliability in Main Memory', by Doe Hyun Yoon et al, Micro, IEEE, Volume 31, Issue 1, pages 11-19 (Digital Object identifier 10.1109/MM.2010.103) describes a system in which an operating system may decide, when allocating a portion of main memory to a particular application, how to apportion that allocated memory portion between the storage of data and the storage of related ECC information, with the goal of maintaining a constant error protection rate without requiring dedicated memory area for ECC storage."

In addition to the background information obtained for this patent application, VerticalNews journalists also obtained the inventors' summary information for this patent application: "Viewed from a first aspect, the present invention provides an apparatus for controlling access to a memory device configured to store code words, the apparatus comprising: encoding circuitry responsive to a write transaction to generate one or more code words for storage in the memory device, each code word comprising a plurality of symbols, said plurality of symbols comprising data symbols and associated error correction code (ECC) symbols; decoding circuitry responsive to a read transaction to decode one or more code words read from the memory device in order to generate read data for outputting in response to the read transaction, the decoding circuitry comprising error correction circuitry configured, for each read code word, to perform an error correction process to detect and correct errors in up to P symbols of said code word, where P is dependent on the number of ECC symbols in the code word; and error tracking circuitry configured to determine error quantity indication data indicative of the errors detected by the error correction circuitry; in response to the error quantity indication data indicating that an error threshold condition has been reached, the apparatus being caused to transition from a normal mode of operation to a safety mode of operation, in said safety mode of operation the encoding circuitry being configured such that the number of symbols in each code word generated by the encoding circuitry is no greater than in the normal mode of operation but each code word has a higher ratio of ECC symbols to data symbols than in said normal mode of operation.

"In accordance with the present invention, error tracking circuitry is used to determine error quantity indication data indicative of the amount of errors occurring within code words read from the memory device, this information being obtained from the error detection and correction activity of the error correction circuitry within the decoding circuitry of the apparatus. If a situation arises where the error quantity indication data indicates that an error threshold condition has been reached, the apparatus then is made to transition from its normal mode of operation to a safety mode of operation. In the safety mode of operation, the encoding circuitry is reconfigured such that each generated code word contains no more symbols than were contained in each code word generated during the normal mode of operation, but each code word has a higher ratio of ECC symbols to data symbols than in the normal mode of operation.

"As a result, the effective data capacity of the memory device is decreased, since the actual amount of the data contained within each code word stored to the memory device is decreased. However, the increased ratio of ECC symbols to data symbols provides an increased robustness to errors, and hence allows a memory device that might otherwise be unusable (due to the number of errors exceeding the error correction capabilities when in the normal mode of operation) to continue to be used in the safety mode of operation, albeit with a lower effective data storage capacity. Hence, the safety mode of operation provides a safe operating mode for the memory device with increased reliability and stability. In a memory device that uses replaceable modules such as the earlier described DIMM arrangement, the use of such a safety mode may allow the memory device to continue to function whilst awaiting replacement of the relevant DIMM. Similarly, for a memory device constructed with solder down memory parts that are non-replaceable, such a safety mode of operation will allow the memory device to continue to function, albeit with a reduced capacity, in situations where the memory device would otherwise be rendered unusable.

"The memory device can be constructed in a variety of ways. However, in one embodiment the memory device comprises a plurality of memory regions, and the apparatus is configured to allocate a first subset of the memory regions for storing the data symbols of each code word and to allocate a second subset of the memory regions for storing the ECC symbols of each code word. In such an embodiment, when the apparatus is caused to transition from the normal mode of operation to the safety mode of operation, the apparatus may be configured to alter which memory regions are within the first subset and the second subset having regards to the higher ratio of ECC symbols to data symbols that is employed when in the safety mode of operation. In particular, the number of memory regions within the first subset used to store data symbols can be reduced when in the safety mode of operation.

"When the number of memory regions within the first subset is decreased, the number of memory regions within the second subset can be correspondingly increased, so that the same total number of symbols is stored within each code word written into the memory device.

"However, in an alternative embodiment, if it can be determined that one or more of the memory regions is exhibiting a failure condition, such that that region can no longer reliably store data and each symbol read from that memory region needs correcting using the ECC symbols, then a decision can be taken to exclude any such failing memory region from use in the safety mode of operation. There are a number of ways in which such a situation can be detected. However, in one embodiment the error quantity indication data identifies error quantity information for each memory region, and if the error quantity indication data indicates that an error threshold condition has been reached, and identifies at least one memory region that is exhibiting a failure condition, that at least one memory region is excluded from use in the safety mode of operation. Hence, the number of symbols within each code word as stored in the memory device is reduced when in the safety mode of operation.

"When entering the safety mode of operation, both the encoding circuitry and the decoding circuitry need to be reconfigured to take account of the change in ratio between the ECC symbols and data symbols within each code word. In situations where the same number of symbols are contained within each code word when operating in the safety mode of operation or the normal mode of operation, and hence any decrease in the number of memory regions used to store data symbols results in a corresponding increase to the number of memory regions used to store ECC symbols, this can be readily achieved by identifying to the encoding circuitry and the decoding circuitry the number of data symbols within each code word.

"When operating in accordance with the embodiment discussed earlier, where one or more regions is excluded from use in the safety mode of operation, and hence the number of symbols in the code word as actually written into, and read from, the memory device decreases when in the safety mode of operation, there are a number of ways in which the operation of the encoding circuitry and the decoding circuitry can be managed. However, in one particular embodiment, in the safety mode of operation, the encoding circuitry is configured to generate a code word having the same number of symbols as in the normal mode of operation, with an ECC symbol being associated with each of said at least one excluded memory regions, and the ECC symbol associated with each of said at least one excluded memory regions not being written to the memory device. Hence, in such an embodiment, the encoding circuitry merely needs to be reconfigured to take into account the reduced number of data symbols in each code word, but the same basic process as used in the normal mode is still used to generate the code word since the overall number of symbols is unchanged. The apparatus is then arranged to ensure that the ECC symbols associated with any excluded memory regions are not written to the memory device.

"Similarly, in one embodiment, when each code word is read from said memory device in the safety mode of operation, dummy data is added at the symbol positions associated with each of said at least one excluded memory regions, such that each code word decoded by the decoding circuitry has the same number of symbols as in the normal mode of operation. Hence, the decoding circuitry also performs the same process as in the normal mode of operation, and merely needs to be reconfigured to take account of the reduced number of data symbols within each code word that it decodes.

"The memory regions can take a variety of forms. However, in one embodiment the memory device comprises a plurality of memory chips, with each memory chip forming one of the memory regions. The memory chips may be provided in one or more DIMMs, or may be non-replaceable.

"Whilst there are a number of ways in which the number of data symbols within each code word may be reduced when operating in the safety mode of operation, it is implementationally more straight forward (both in terms of the operation of the encoding circuitry and decoding circuitry, and in the management of available memory and translation of accesses to memory) to reduce the number of data symbols by a factor of 2. In particular, in one embodiment, in said safety mode of operation the encoding circuitry is configured such that each code word generated by the encoding circuitry has half the number of data symbols as are provided within each code word generated by the encoding circuitry in the normal mode of operation.

"There are a number of ways in which the apparatus can be configured to operate in either the normal mode of operation or the safety mode of operation. However, in one embodiment the apparatus further comprises mode control storage configured to store configuration data used to control operation of the apparatus, initial configuration data within the mode control storage causing the apparatus to operate in the normal mode of operation, but in response to the error quantity indication data indicating that said error threshold condition has been reached, the configuration data being updated within the mode control storage to cause the apparatus to enter the safety mode of operation.

"There are a variety of ways in which the configuration data can be provided to the mode control storage. However, in one embodiment, the error tracking circuitry is configured to output the error quantity indication data to control circuitry, and the mode control storage is configured to update said configuration data upon receipt of control signals from the control circuitry in response to the control circuitry determining that the error quantity indication data indicates that said error threshold condition has been reached. Hence, in this embodiment, the control circuitry is arranged to determine when the error quantity indication data indicates that the error threshold condition has been reached.

"However, in an alternative embodiment, the error tracking circuitry is configured to determine when the error quantity indication data indicates that said error threshold condition has been reached, and upon such determination to output a trigger signal to control circuitry, and the mode control storage is configured to update said configuration data upon receipt of control signals generated by the control circuitry in response to the trigger signal. Accordingly, in this embodiment, the error tracking circuitry determines when the error quantity indication data indicates that the error threshold condition has been reached, and the control circuitry responds to a trigger signal issued by the error tracking circuitry upon detection of the error threshold condition.

"In both of the above embodiments, the control circuitry may be provided either within the apparatus, or external to the apparatus. In one embodiment, the apparatus takes the form of a memory controller unit, and in one particular embodiment the control circuitry is provided external to that memory controller unit. Such an approach enables the control circuitry to coordinate with other elements of the system in which the apparatus is employed, so as to coordinate any activities that are required prior to transitioning the apparatus from the normal mode of operation to the safety mode of operation. For example, it may be necessary to flush the current contents of the memory device from the memory device, store them temporarily in another memory within the system, and then, following entry into the safety mode of operation, to rewrite that data into the memory device using the new code word format for the safety mode of operation. Sometimes, it may not be necessary to flush the memory device, but it may still be appropriate to overwrite all of the data in the memory device with some default data so as to place the memory device into the default state prior to transition from the normal mode of operation to the safety mode of operation.

"Whilst in one embodiment the apparatus may be arranged to operate in only the normal mode of operation or the safety mode of operation, in other embodiments additional modes of operation may be provided. For example, in one embodiment, the error tracking circuitry is configured in the safety mode of operation to continue to maintain error quantity indication data indicative of the errors detected by the error correction circuitry. In response to the error quantity indication data indicating that a further error threshold condition has been reached, the apparatus is then caused to transition from said safety mode of operation to a further safety mode of operation, in said further safety mode of operation the encoding circuitry being configured such that the number of symbols in each code word generated by the encoding circuitry is no greater than in the safety mode of operation but each code word has a higher ratio of ECC symbols to data symbols than in said safety mode of operation. Hence, such an approach allows another level of fallback to be provided, where the memory device can still operate reliably and correctly, albeit with an even further reduced effective capacity.

"In one embodiment, in said further safety mode of operation the encoding circuitry is configured such that each code word generated by the encoding circuitry has half the number of data symbols as are provided within each code word generated by the encoding circuitry in the safety mode of operation.

"Viewed from a second aspect, the present invention provides a method of controlling access to a memory device configured to store code words, the method comprising: generating, in response to a write transaction, one or more code words for storage in the memory device, each code word comprising a plurality of symbols, said plurality of symbols comprising data symbols and associated error correction code (ECC) symbols; decoding, in response to a read transaction, one or more code words read from the memory device in order to generate read data for outputting in response to the read transaction, the decoding step comprising, for each read code word, performing an error correction process to detect and correct errors in up to P symbols of said code word, where P is dependent on the number of ECC symbols in the code word; determining error quantity indication data indicative of the errors detected by the error correction process; and in response to the error quantity indication data indicating that an error threshold condition has been reached, transitioning the apparatus from a normal mode of operation to a safety mode of operation, in said safety mode of operation the encoding step being arranged such that the number of symbols in each generated code word is no greater than in the normal mode of operation but each code word has a higher ratio of ECC symbols to data symbols than in said normal mode of operation.

"Viewed from a third aspect, the present invention provides an apparatus for controlling access to a memory device configured to store code words, the apparatus comprising: encoding means, responsive to a write transaction, for generating one or more code words for storage in the memory device, each code word comprising a plurality of symbols, said plurality of symbols comprising data symbols and associated error correction code (ECC) symbols; decoding means, responsive to a read transaction, for decoding one or more code words read from the memory device in order to generate read data for outputting in response to the read transaction, the decoding means comprising error correction means for performing, for each read code word, an error correction process to detect and correct errors in up to P symbols of said code word, where P is dependent on the number of ECC symbols in the code word; error tracking means for determining error quantity indication data indicative of the errors detected by the error correction means; in response to the error quantity indication data indicating that an error threshold condition has been reached, the apparatus being caused to transition from a normal mode of operation to a safety mode of operation, in said safety mode of operation the encoding means generating code words such that the number of symbols in each generated code word is no greater than in the normal mode of operation but each code word has a higher ratio of ECC symbols to data symbols than in said normal mode of operation.

BRIEF DESCRIPTION OF THE DRAWINGS

"The present invention will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:

"FIG. 1 is a block diagram illustrating an apparatus in accordance with one embodiment;

"FIG. 2 is a block diagram illustrating the operation of the encoder and decoder of FIG. 1 in accordance with one embodiment;

"FIG. 3 is a block diagram illustrating in more detail components provided within the syndrome decoder and repair circuitry of FIG. 2 in accordance with one embodiment;

"FIG. 4 is a diagram schematically illustrating how the chips of the memory device may be reallocated between the normal mode, the safety mode and further safety mode, in accordance with one embodiment;

"FIG. 5 is a diagram schematically illustrating how the chips of the memory device may be reallocated between the normal mode, the safety mode and further safety mode, in accordance with an alternative embodiment; and

"FIG. 6 is a flow diagram illustrating the operation of the control circuitry of FIG. 1 in accordance with one embodiment."

For the URL and more information on this patent application, see: CAMPBELL, Michael Andrew; CONWAY, Thomas Kelshaw. Apparatus and Method for Controlling Access to a Memory Device. Filed February 11, 2013 and posted August 21, 2014. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=321&p=7&f=G&l=50&d=PG01&S1=20140814.PD.&OS=PD/20140814&RS=PD/20140814

Keywords for this news article include: Arm Limited.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Politics & Government Week

