The patent's assignee is
News editors obtained the following quote from the background information supplied by the inventors: "This invention relates generally to data storage technologies, in particular, to distributed fault-tolerant storage in independent storage locations.
"Modern Internet-scale applications, such as social networking websites, receive and generate vast quantities of data continuously. This data includes user information, images, videos, text posts, emails, performance logs, search indices, meta data, etc. This data must be stored securely and reliably, and it must be accessible despite data disruption events such as natural disasters, power failures, disk failures, server failures, etc. In the past, reliability and accessibility of data in Internet applications was provided by storing many copies of the same data in geographically separate data centers. By having distinct, separate copies of the same data in multiple locations, a system could ensure that at least one copy of the data was accessible at any time, despite the occurrence of data disruption events.
"But data mirroring has a cost associated with it. Each copy of data requires additional storage resources, and if multiple copies of the same data are maintained, the storage overhead becomes prohibitive for large data sets. One solution to this problem is to not maintain full redundant copies of the data, but rather to compute smaller recovery codes from the data, where the recovery codes allow a lost piece of the data to be recovered using the remaining data. In the simplest case a recovery code can be generated by splitting the data into N pieces and computing an XOR across the pieces. The N pieces of data can then be distributed to N separate data storage locations. If any one of the N pieces is lost, the lost piece can be reconstructed by XORing the recovery code against the remaining pieces. In this simple case the storage scheme requires 1/N of the data as additional storage overhead to maintain the recovery codes, but this is still an improvement over the complete data duplication required in data mirroring. The simple scheme guards against the loss of only a single one of the N pieces of data, however, other methods of generating recovery codes allow for greater redundancy, but may require additional storage overhead as a tradeoff. The data storage locations can be established in geographically separate sites so that the probability of a single data disruption event effecting all locations is minimized.
"Systems that provide redundant storage as described above are sometimes called Reliable Arrays of Independent Nodes (RAIN). RAIN systems are often efficient in terms of the storage overhead that they require to provide data redundancy, but they are inefficient in terms of the network usage. When a piece of data is lost at one of the nodes of a RAIN system due to a data disruption event (e.g., hard disk failure), the information to reconstruct that lost data must be fetched from other nodes since all the recovery codes and the other data pieces will not be locally stored. The RAIN system cannot keep all the recovery codes and other data pieces locally because doing so would adversely affect the fault-tolerance characteristics of the system--the failure of a single machine or location could cause the system to lose access to all the locally stored data. Therefore, when data recovery is necessary, both the recovery codes and data pieces necessary for data reconstruction must be sent over the network to the location where the lost data is being reconstructed.
"Depending on the frequency and severity of data disruption events, the network traffic initiated by data reconstruction processes may cause network congestion and other issues. For extremely large data sets, such as those generated by Internet scale applications--e.g. social networking systems, search engines, web services providers, etc.--handling the traffic between data storage locations may be very expensive."
As a supplement to the background information on this patent application, VerticalNews correspondents also obtained the inventors' summary information for this patent application: "Embodiments of the invention provide fault-tolerant storage for systems that use large data sets stored across a distributed storage system. In one embodiment, input data is received from clients and the received data is divided into data blocks for storage. The data blocks are processed using a coding scheme that generates redundant level one error correction blocks (L1EC blocks). The L1EC blocks enable the reconstruction of one or more damaged or inaccessible data blocks, so long as sufficient undamaged elements are still accessible. The L1EC blocks and the data blocks are divided into distribution sets and these sets are stored at a plurality of data storage locations. At each data storage location, additional level two error correction blocks (L2EC blocks) are generated that provide local data redundancy. The L2EC blocks enable reconstruction of damaged elements at a data storage location without requiring communication with the other data storage locations. Upon detecting a data disruption event, an inaccessible data storage location is identified and the elements that were stored at the inaccessible data storage location are reconstructed.
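The two-level arrangement in the summary can be illustrated with a toy model. The summary does not name the coding scheme, so the sketch below substitutes plain XOR parity at both levels. The stripe layout, the rule that site k stores element k of every stripe, and all variable names are assumptions chosen for the example rather than details from the patent application.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Six equal-size data blocks, grouped into two stripes of three (assumed layout).
data_blocks = [bytes((17 * i + j) % 256 for j in range(8)) for i in range(1, 7)]
stripes = [data_blocks[0:3], data_blocks[3:6]]

# Level one: append one XOR-based L1EC block to each stripe.
l1_stripes = [stripe + [xor_blocks(stripe)] for stripe in stripes]

# Distribution sets: site k stores element k of every stripe (four sites).
sites = [[stripe[k] for stripe in l1_stripes] for k in range(4)]

# Level two: each site computes an L2EC block over its own elements only.
l2ec = [xor_blocks(elements) for elements in sites]

# Local failure: site 1 loses its first element and repairs it without the network.
rebuilt_local = xor_blocks([sites[1][1], l2ec[1]])
assert rebuilt_local == sites[1][0]

# Site failure: site 2 is inaccessible, so each of its elements is rebuilt
# from the matching elements held at the other three sites.
rebuilt_site = [
    xor_blocks([sites[k][s] for k in range(4) if k != 2])
    for s in range(len(l1_stripes))
]
assert rebuilt_site == sites[2]
```

In this toy layout, a single damaged element is repaired entirely from blocks at the same site, and cross-site traffic is needed only when a whole location becomes inaccessible.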
BRIEF DESCRIPTION OF THE DRAWINGS
"FIG. 1 is a figure illustrating one example embodiment of a distributed data storage system connected to clients in a network environment.
"FIG. 2 illustrates the components of a distributed data storage system in one example embodiment.
"FIG. 3A illustrates an example process for processing and storing data, to enable data recovery, in a distributed data storage system.
"FIG. 3B illustrates an embodiment of the data structures that are created at various stages in the course of data storage in a distributed data storage system.
"FIG. 4 illustrates a process for recovering from a data disruption event that makes an entire data storage location inaccessible.
"FIG. 5 illustrates a process for recovering from a data disruption event that makes a local storage resource inaccessible.
"The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein."
For additional information on this patent application, see: Borthakur, Dhrubajyoti; Brashers, Per; Taylor,
Keywords for this news article include:
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC