"Data Error Detection and Correction Using Hash Values" in Patent Application Approval Process

July 15, 2014

By a News Reporter-Staff News Editor at Information Technology Newsweekly -- A patent application by the inventors Kalach, Ran (Bellevue, WA); Hasan, Kashif (Snoqualmie, WA); Oltean, Paul Adrian (Redmond, WA); Benton, James R. (Hanover, NH); Cheung, Chun Ho (Redmond, WA); El-Shimi, Ahmed Moustafa (Seattle, WA), filed on December 21, 2012, was made available online on July 3, 2014, according to news reporting originating from Washington, D.C., by VerticalNews correspondents.

This patent application is assigned to Microsoft Corporation.

The following quote was obtained by the news editors from the background information supplied by the inventors: "Data stored in memory and/or storage media, such as hard disks, is susceptible to hardware and/or software errors caused by myriad of reasons. Such errors often lead to data corruption and/or loss, which negatively affect computing device operation and thus, user productivity and monetary loss. A file, for instance, may be rendered inaccessible or unreadable due to data corruption and/or data loss. A bad hard disk sector renders any data stored therein unreadable and thus, lost. File system corruption may result in a loss of entire files. Corrupted data, typically, includes incorrect data that can no longer be used in the file. These data errors represent only a fraction of the potential data errors that cause data loss and/or corruption.

"Because data errors occur in conventional data storage systems, mitigating and/or preventing such data errors is a significant aspect of maintaining data integrity. Furthermore, protecting sensitive data is often considered an information customer requirement. One common solution implements a redundancy mechanism where redundant copies of data are organized to prevent data loss. Another common solution uses backup and restore mechanisms where a data backup allows a user to recover lost and/or corrupted data. Such solutions incur substantial costs, including from offline administration, and often allow data corruption to remain undetected."

In addition to the background information obtained for this patent application, VerticalNews journalists also obtained the inventors' summary information for this patent application: "This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.

"Briefly, various aspects of the subject matter described herein are directed towards data error self-healing using substantially collision-free hash values. If a data block's hash value is recorded in a persisted index, when a data error is detected at a later point in time, the hash value is used to determine whether a duplicate/identical data block exists and if so, recover original data. In one aspect, the duplicate data block is found as new data is being stored in a storage system. In another aspect, the duplication data block is found in an existing data store. In yet another aspect, the duplicate data block is generated through manipulation of a corrupted data block.

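The third aspect, producing a duplicate by manipulating a corrupted block, can be pictured with a minimal Python sketch. It is an illustration only and is not drawn from the application: the choice of SHA-256, the single-bit-flip search, and the name `repair_by_bit_flip` are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def sha256(data: bytes) -> bytes:
    """Hash a block; SHA-256 stands in for a 'substantially collision-free' hash."""
    return hashlib.sha256(data).digest()


def repair_by_bit_flip(corrupted: bytes, expected_hash: bytes) -> Optional[bytes]:
    """Hypothetical helper: try every single-bit flip of the corrupted block and
    return the candidate whose hash equals the recorded one, or None."""
    if sha256(corrupted) == expected_hash:
        return corrupted  # the block already matches; nothing to repair
    candidate = bytearray(corrupted)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            candidate[byte_index] ^= 1 << bit   # flip one bit
            if sha256(bytes(candidate)) == expected_hash:
                return bytes(candidate)
            candidate[byte_index] ^= 1 << bit   # undo the flip and keep searching
    return None


original = b"example data block"
recorded_hash = sha256(original)        # what a persisted index would hold
damaged = bytearray(original)
damaged[3] ^= 0b00010000                # simulate a single-bit storage error
repaired = repair_by_bit_flip(bytes(damaged), recorded_hash)
```

The search is only practical for small, localized errors; each additional flipped bit multiplies the number of candidates to hash.
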
"In one aspect, the data storage service integrates data deduplication and data error detection and repair using strong hash-based identification. Data deduplication generally refers to detecting, uniquely identifying and eliminating redundant data blocks and thereby reducing the physical amount of bytes of data that need to be stored on disk and/or transmitted across a network. When one of these redundant data blocks is lost and/or corrupted, the data storage service searches one or more hash indexes for a duplicate data block having a matching hash value. If such a duplicate data block is found, the data storage service replaces the lost and/or corrupted redundant data block with the duplicate data block. In one aspect, the data storage service detects and repairs lost and/or corrupted data blocks without user intervention.

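One way to picture the combination of deduplication and hash-based repair is the Python sketch below. It is a simplified illustration under stated assumptions, not the implementation described in the application: the `DedupStore` class, its in-memory dictionaries, and the use of SHA-256 are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def block_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical store that keeps one physical copy per distinct block."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}    # hash index: hash -> stored block
        self.refcount: Dict[str, int] = {}    # how many logical copies share it

    def put(self, data: bytes) -> str:
        key = block_hash(data)
        if key not in self.blocks:            # store only the first copy
            self.blocks[key] = data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def is_intact(self, key: str) -> bool:
        """A block is intact if it exists and still hashes to its key."""
        data = self.blocks.get(key)
        return data is not None and block_hash(data) == key

    def repair_from(self, key: str, others: Iterable["DedupStore"]) -> bool:
        """Search other hash indexes for an intact block with a matching hash
        and, if one is found, use it to replace the damaged block."""
        if self.is_intact(key):
            return True
        for other in others:
            if other.is_intact(key):
                self.blocks[key] = other.blocks[key]
                return True
        return False


primary, replica = DedupStore(), DedupStore()
key = primary.put(b"block A")
primary.put(b"block A")                 # duplicate: no second physical copy
replica.put(b"block A")
primary.blocks[key] = b"block X"        # simulate corruption on the primary
repaired = primary.repair_from(key, [replica])
```

Because the key is derived from the content, the same lookup that eliminates duplicates also locates a replacement when a stored block goes bad.
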
"In one aspect, in addition to a primary hash index for the redundant data blocks described herein, the data storage service maintains a sub-index comprising hash values corresponding to corrupted and/or lost data blocks. When looking for the duplicate data block, the data storage service searches the sub-index and any main index of redundant data blocks. If there is a match in the sub-index, the data storage service retains the correct data block, updates the sub-index with a reference to the correct data block, and deletes the corrupted or lost data block.

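The sub-index idea can likewise be sketched in a few lines of Python. The sketch is illustrative only: the `SelfHealingStore` class, the `scrub` pass that populates the sub-index, and the in-memory dictionaries are assumptions for the example rather than details taken from the application.

```python
import hashlib
from typing import Dict, Optional


def block_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class SelfHealingStore:
    """Hypothetical store with a main hash index and a sub-index of bad blocks."""

    def __init__(self) -> None:
        self.main_index: Dict[str, bytes] = {}
        # Sub-index: hash of a corrupted or lost block -> reference to a
        # correct copy once one has been seen (None while still unrepaired).
        self.corrupt_index: Dict[str, Optional[bytes]] = {}

    def scrub(self) -> None:
        """Record in the sub-index the hash of every block whose content
        no longer matches its key."""
        for key, data in self.main_index.items():
            if block_hash(data) != key:
                self.corrupt_index.setdefault(key, None)

    def ingest(self, data: bytes) -> str:
        """Store incoming data, repairing a known-bad block if the hashes match."""
        key = block_hash(data)
        if key in self.corrupt_index and self.corrupt_index[key] is None:
            self.main_index[key] = data       # retain the correct block, dropping the bad one
            self.corrupt_index[key] = data    # sub-index now references the good copy
        elif key not in self.main_index:
            self.main_index[key] = data
        return key


store = SelfHealingStore()
key = store.ingest(b"payload")
store.main_index[key] = b"pay\x00oad"   # simulate on-disk corruption
store.scrub()                           # detection populates the sub-index
store.ingest(b"payload")                # an identical block arrives later
```

Keeping the damaged hashes in a separate, smaller index means incoming data can be checked against outstanding repairs without scanning the whole store.
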
"Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.


"The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

"FIG. 1 is a block diagram illustrating an example system for repairing data blocks to correct at least one data error according to one example implementation.

"FIG. 2 is a block diagram representing example components/phases of a deduplication process according to one example implementation.

"FIG. 3 is a flow diagram illustrating example steps for persisting a sub-index corresponding to data blocks having at least one error according to one example implementation.

"FIG. 4 is a flow diagram illustrating example steps for manipulating a corrupted data block to produce another data block for repairing the corrupted data block according to one example implementation.

"FIG. 5 is a block diagram representing example non-limiting networked environments in which various embodiments described herein can be implemented.

"FIG. 6 is a block diagram representing an example non-limiting computing system or operating environment in which one or more aspects of various embodiments described herein can be implemented."

For the URL and more information on this patent application, see: Kalach, Ran; Hasan, Kashif; Oltean, Paul Adrian; Benton, James R.; Cheung, Chun Ho; El-Shimi, Ahmed Moustafa. Data Error Detection and Correction Using Hash Values. Filed December 21, 2012 and posted July 3, 2014. Patent URL:

Keywords for this news article include: Microsoft Corporation, Information Technology, Information and Data Storage, Information and Data Loss and Recovery.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC

Source: Information Technology Newsweekly
