Patent Application Titled "Massively Scalable Object Storage System" Published Online

February 25, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- According to news reporting originating from Washington, D.C., by VerticalNews journalists, a patent application by the inventors Barton, Michael (San Antonio, TX); Reese, Will (San Antonio, TX); Dickinson, John A. (Schertz, TX); Payne, Jay B. (San Antonio, TX); Thier, Charles B. (San Antonio, TX); Holt, Gregory (Hollywood Park, TX), filed on October 7, 2013, was made available online on February 13, 2014.

The assignee for this patent application is Rackspace US, Inc.

Reporters obtained the following quote from the background information supplied by the inventors: "The present disclosure relates generally to cloud computing, and more particularly to a massively scalable object storage system to provide storage for a cloud computing environment.

"Cloud computing is location-independent computing, whereby shared servers provide resources, software, and data to computers and other devices on demand. As a term, 'cloud computing' describes a consumption and delivery model for IT services based on the Internet, and it typically involves over-the-Internet provisioning of dynamically scalable and often virtualized resources. This frequently takes the form of web-based tools or applications that users can access and use through a web browser as if it were a program installed locally on their own computer. Details are abstracted from consumers, who no longer have need for expertise in, or control over, the technology infrastructure 'in the cloud' that supports them. Most cloud computing infrastructures consist of services delivered through common centers and built on servers. Clouds often appear as single points of access for consumers' computing needs.

"As the use of cloud computing has grown, cloud service providers such as Rackspace Hosting Inc. of San Antonio, Tex., have been confronted with the need to greatly expand file storage capabilities rapidly while making such expansions seamless to their users. Conventional file storage systems and methods to expand such systems suffer from several limitations that can jeopardize data stored in the object storage system. In addition, known techniques use up substantial resources of the object storage system to accomplish expansion while also ensuring data safety. Finally, the centralization of data storage brings with it issues of scale. A typical local storage system (such as the hard drive in a computer) may store thousands or millions of individual files for a single user. A cloud-computing-based storage system is designed to address the needs of thousands or millions of different users simultaneously, with corresponding increases in the number of files stored.

"Accordingly, it would be desirable to provide an improved scalable object storage system."

In addition to obtaining background information on this patent application, VerticalNews editors also obtained the inventors' summary information for this patent application: "According to one embodiment, the improved scalable object storage system includes a method for storing data, comprising providing a plurality of physical storage pools, each storage pool including a plurality of storage nodes coupled to a network, each storage node further providing a non-transient computer readable medium for data storage; classifying a plurality of availability zones, wherein the storage nodes within an availability zone are subject to a correlated loss of access to stored data; defining a plurality of abstract partitions, wherein each possible input data management request deterministically corresponds to one of the plurality of abstract partitions; mapping the plurality of abstract partitions to the plurality of physical storage pools such that each mapped physical storage pool includes a replica of the data associated with the associated mapped abstract partition, and each replica for a particular abstract partition is mapped to a physical storage pool in a different availability zone; receiving a data management request over the network, the data management request associated with a data object; identifying a first partition corresponding to the received data management request; and manipulating the data object in the physical storage pools mapped to the first partition in accordance with the data management request.
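As an illustration only, and not taken from the patent application, the partition mapping in this embodiment might be sketched in Python as follows. The pool names, zone names, partition count, replica count, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

# Hypothetical layout: storage pool name -> availability zone.
POOLS = {
    "pool-a": "zone-1",
    "pool-b": "zone-2",
    "pool-c": "zone-3",
    "pool-d": "zone-1",
}
PARTITION_COUNT = 8  # assumed number of abstract partitions
REPLICAS = 3         # assumed number of replicas per partition


def partition_for(object_name: str) -> int:
    """Deterministically map a request key to one abstract partition."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PARTITION_COUNT


def build_mapping() -> dict:
    """Give each partition REPLICAS pools, no two in the same zone."""
    names = sorted(POOLS)
    mapping = {}
    for part in range(PARTITION_COUNT):
        chosen, zones = [], set()
        # Rotate the starting pool so partitions spread across pools.
        for i in range(len(names)):
            pool = names[(part + i) % len(names)]
            if POOLS[pool] not in zones:
                chosen.append(pool)
                zones.add(POOLS[pool])
            if len(chosen) == REPLICAS:
                break
        if len(chosen) < REPLICAS:
            raise ValueError("not enough availability zones for the replica count")
        mapping[part] = chosen
    return mapping


MAPPING = build_mapping()


def pools_for(object_name: str) -> list:
    """Return the pools that hold replicas of the named object."""
    return MAPPING[partition_for(object_name)]
```

Calling pools_for("photo.jpg") returns the three pools, one per zone, that a request for that object would be sent to.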

"According to another embodiment, the improved scalable object storage system includes a distributed storage coupled to a network, the distributed storage including a first storage pool and a second storage pool from a plurality of storage pools, the first storage pool in a first availability zone and the second storage pool in a second availability zone, each storage pool including at least one processor, a computer readable medium, and a communications interface; a director coupled to the network, the director including a processor, a computer readable medium, and a communications interface; a ring structure associated with the director, wherein the ring structure is adapted to associate a storage request with a first abstract partition from a plurality of abstract partitions, and wherein the ring structure is further adapted to selectively associate a first abstract partition with a first fault-tolerant multi-master replication target, the first replication target including the first storage pool and the second storage pool; wherein the director is adapted to route inbound storage requests to the replication target and outbound storage responses from the replication target.

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor at a first time, initialize a ring by retrieving a set of ring parameters, the ring parameters including a number of abstract partitions, a number of physical storage pools, and a set of performance constraints; performing a consistent hashing function associating a first range of inputs with a first abstract partition and a second range of inputs with a second abstract partition; and allocating the available physical storage pools by mapping each abstract partition to one or more storage pools in accordance with the set of performance constraints; at a second time, opaquely route an input request to a correct storage pool in accordance with the initialized ring; and at a third time, rebalance the ring by retrieving the set of ring parameters, performing a consistent hashing function associating the range of inputs with the first abstract partition and the second range of inputs with the second abstract partition; and allocating the available storage pools mapping each abstract partition to one or more storage pools in accordance with the set of performance constraints such that each abstract partition has zero or one changes in the physical storage pools allocated thereto.

"According to another embodiment, the improved scalable object storage system includes a system for coordinating events in a distributed system, comprising a plurality of subsidiary nodes coupled to a network, each subsidiary node including at least one processor, a computer-readable medium, and a communications interface, wherein information in a first subsidiary node needs to be synchronized with the information in a second subsidiary node in response to a time-varying series of requests; a first gateway, including a first processor, a first local clock, and a first communications interface; a second gateway, including a second processor, a second local clock, and a second communications interface; a timekeeping node coupled to the network, including a master clock; and a synchronization rectifier coupled to the first and second subsidiary nodes; wherein the timekeeping node is operationally coupled to the first and second gateways to reduce clock skew between the master clock, the first local clock and the second local clock below a configurable threshold; wherein the first gateway uses the first processor to timestamp a first request received over the first communications interface according to the time of the first local clock with a granularity at least equal to the configurable threshold; wherein the second gateway uses the second processor to timestamp a second request received over the second communications interface according to the time of the second local clock with a granularity at least equal to the configurable threshold; wherein synchronization between the first subsidiary node and the second subsidiary node is controlled by the later-occurring request if the first request and the second request are separated by a time greater than the configurable threshold; and wherein synchronization between the first subsidiary node and the second subsidiary node is controlled by the synchronization rectifier if the first request and the second request are 
separated by a time smaller than the configurable threshold.

"According to another embodiment, the improved scalable object storage system includes a method for coordinating events in a distributed system, comprising synchronizing a master clock to coordinated universal time within a master skew threshold; synchronizing a first local clock at a first gateway with the master clock within a system skew threshold, and synchronizing a second local clock at a second gateway with the master clock within the system skew threshold; receiving, at the first gateway, a first request to manipulate a non-volatile data storage, and marking the first request with the time of reception according to the first local clock, with a granularity at least equal to the system skew threshold; receiving, at the second gateway, a second request to manipulate the non-volatile data storage, and marking the second request with the time of reception according to the second local clock, with a granularity at least equal to the system skew threshold; evaluate the first request and the second request to determine if they are unambiguously ordered by determining if the first request and the second request are temporally ordered with a granularity greater than the system skew threshold; if the first request and the second request are unambiguously ordered, modifying the non-volatile data storage as directed in the later request; and if the first request and the second request are not unambiguously ordered, modifying the non-volatile data storage as directed by a deterministic tiebreaker.

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor synchronize a first local clock with a second local clock within a system skew threshold; receive a first request to manipulate a system resource and marks the first request with the time of reception according to the first local clock, with a granularity at least equal to the system skew threshold; receive a second request to manipulate the system resource and marks the second request with the time of reception according to the second local clock, with a granularity at least equal to the system skew threshold; evaluate the first request and the second request to determine if they are unambiguously ordered by determining if the first request and the second request are temporally ordered with a granularity greater than the system skew threshold; if the first request and the second request are unambiguously ordered, manipulates the system resource as directed in the later request; and if the first request and the second request are not unambiguously ordered, executes tiebreaker instructions controlling the system resource; and returns a success or error depending on the outcome of the tiebreaker instructions.

"According to another embodiment, the improved scalable object storage system includes a method for managing data items in a distributed storage pool, comprising providing a plurality of physical storage pools, each storage pool including a plurality of storage nodes coupled to a network, each storage node further providing a non-transient computer readable medium for data storage; storing a first replica of a data item in a first physical storage pools, and storing a second replica of the data item in a second physical storage pool; in response to receiving a modification instruction for the data item, selectively modifying the first replica of the data item, creating a first modification sentinel file, and storing the first modification sentinel file in the first physical storage pool; in response to encountering the first modification sentinel file during a data item replication process, modifying the second replica of the data item and creating a second modification sentinel file in the second physical storage pool.

"According to another embodiment, the improved scalable object storage system includes a system for out-of-band communication of object storage metadata, the system comprising a distributed storage system coupled to a network, the distributed storage including a first storage pool and a second storage pool from a plurality of storage pools, the first and second storage pools each including at least one processor, a computer readable medium, and a communications interface; wherein the first storage pool includes a first replica of a data item, and the second storage pool includes a second replica of the data; an object service responsive to modification instructions; and a replicator adapted to create a second replica of the data item in the second storage pool; wherein the object service responds to an out-of-band instruction by selectively modifying the first replica of the data item, creating a first modification sentinel file, and storing the first modification sentinel file in the first physical storage pool; and wherein the replicator responds to encountering the first modification sentinel file during a data item replication process by modifying the second replica of the data item and creating a second modification sentinel file in the second physical storage pool.

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor at a first time, run a replication procedure that takes a first copy of a data item in a first location and makes an identical second copy of the data item in a second location; at a second time, run an out-of-band modification procedure to selectively modify the first copy of the data item, create a first modification sentinel file, and store the first modification sentinel file in the first location; and at a third time, change the execution of the replication procedure to modify a the second copy of the data item and create a second modification sentinel file in the second location.

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor at a first time, run a replication procedure that takes a first copy of a data item in a first location and makes an identical second copy of the data item in a second location; at a second time, run an out-of-band modification procedure to selectively modify the first copy of the data item, create a first modification sentinel file, and store the first modification sentinel file in the first location; and at a third time, change the execution of the replication procedure to modify a the second copy of the data item and create a second modification sentinel file in the second location.

"According to another embodiment, the improved scalable object storage system includes a distributed information synchronization system, comprising a first subsidiary node coupled to a network, the first subsidiary node including a first non-transitory computer-readable medium wherein the first computer-readable medium includes a first structured information repository, and wherein information in the first structured information repository is subject to internal consistency constraints; a second subsidiary node coupled to a network, the second subsidiary node including a second non-transitory computer-readable medium wherein the second computer-readable medium includes a second structured information repository, and wherein information in the second structured information repository is subject to internal consistency constraints; a repository synchronizer coupled to the first and second structured information repositories; the repository synchronizer further including a consistency evaluation module adapted to evaluate the differences between the first structured information repository and the second structured information repository; an internal modification module adapted to modify the internal structures of a structured information repository; an external replication module adapted to delete a target structured information repository and replace it with a replicated copy of a source structured information repository; and a threshold comparator; wherein the repository synchronizer is adapted to evaluate the first and second structured information repositories and determine a level of difference and compare the level of difference to a configurable threshold using the threshold comparator; if the level of difference is above the configurable threshold, modify the internal structures of a selected structured information repository using the internal modification module; and if the level of difference is below the configurable threshold, delete the selected 
structured information repository and replace it with a replicated copy of a consistent structured information repository using the external replication module.

"According to another embodiment, the improved scalable object storage system includes a method for synchronizing structured information in a distributed system, comprising storing a first structured information repository on a first non-transitory computer-readable medium, wherein information in the first structured information repository is subject to internal consistency constraints; storing a second structured information repository on a second non-transitory computer-readable medium, wherein information in the second structured information repository is subject to internal consistency constraints; evaluating the differences between the first structured information repository and the second structured information repository to determine a preferred state and a difference measurement quantifying a difference from the preferred state; determining whether the difference measurement exceeds a configurable threshold; modifying a selected structured information repository if the difference measurement for the selected structured information repository is less than the configurable threshold, wherein the modification of the selected structured information repository is subject to the internal consistency constraints of the selected structured information repository, deleting the selected structured information repository if the difference measurement for the selected structured information repository is greater than the configurable threshold, and replacing the selected structured information repository with a replica of a structured information repository in the preferred state, wherein either modifying the selected structured information repository or deleting and replacing the structured information repository changes the non-transitory computer-readable medium storing the selected structured information repository such that the selected structured information repository is both compliant with its internal consistency constraints and in the preferred state. 
The method may also include determining that both the first structured information repository and the second structured information repository are not in the preferred state; pre-selecting the structured information repository that is closer to the preferred state and modifying the pre-selected structured information repository to bring the pre-selected structured information repository to the preferred state, subject to the internal consistency requirements of the pre-selected structured information repository, regardless of the configurable threshold.
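The threshold decision in this method embodiment can be sketched in Python as below. The sketch models each repository as a dictionary, counts differing keys as the difference measurement, and treats a difference exactly equal to the threshold as a case for in-place modification; all three choices are assumptions, and internal consistency constraints are not modeled.

```python
_MISSING = object()  # marks a key that is absent from one side


def difference(repo: dict, preferred: dict) -> int:
    """Count the keys whose values differ from the preferred state."""
    keys = repo.keys() | preferred.keys()
    return sum(
        1 for key in keys
        if repo.get(key, _MISSING) != preferred.get(key, _MISSING)
    )


def synchronize(repo: dict, preferred: dict, threshold: int):
    """Bring repo to the preferred state; return (repository, action taken)."""
    diff = difference(repo, preferred)
    if diff == 0:
        return repo, "unchanged"
    if diff > threshold:
        # Too far gone: discard the repository and use a fresh replica.
        return dict(preferred), "replaced"
    # Small difference: patch the existing repository in place.
    for key in list(repo):
        if key not in preferred:
            del repo[key]
    for key, value in preferred.items():
        if repo.get(key, _MISSING) != value:
            repo[key] = value
    return repo, "modified"
```

The trade-off being illustrated is cost: patching a few entries is cheaper than shipping a whole replica, while wholesale replacement is cheaper once most entries differ.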

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor update a first structured information repository on a first non-transitory computer-readable medium, subject to internal consistency constraints; update a second structured information repository on a second non-transitory computer-readable medium, subject to internal consistency constraints; evaluate the differences between the first structured information repository and the second structured information repository to determine a preferred state and a difference measurement quantifying a difference from the preferred state; determine whether the difference measurement exceeds a configurable threshold; modify a selected structured information repository if the difference measurement for the selected structured information repository is less than the configurable threshold, subject to the internal consistency constraints of the selected structured information repository, delete the selected structured information repository if the difference measurement for the selected structured information repository is greater than the configurable threshold, and replace the selected structured information repository with a replica of a structured information repository in the preferred state.

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor update a first structured information repository on a first non-transitory computer-readable medium, subject to internal consistency constraints; update a second structured information repository on a second non-transitory computer-readable medium, subject to internal consistency constraints; evaluate the differences between the first structured information repository and the second structured information repository to determine a preferred state and a difference measurement quantifying a difference from the preferred state; determine whether the difference measurement exceeds a configurable threshold; modify a selected structured information repository if the difference measurement for the selected structured information repository is less than the configurable threshold, subject to the internal consistency constraints of the selected structured information repository, delete the selected structured information repository if the difference measurement for the selected structured information repository is greater than the configurable threshold, and replace the selected structured information repository with a replica of a structured information repository in the preferred state.

"The specifics of these embodiments as well as other embodiments are described with particularity below.

BRIEF DESCRIPTION OF THE DRAWINGS

"FIG. 1a is a schematic view illustrating an embodiment of a file storage system.

"FIG. 1b is a schematic view illustrating an embodiment of an information handling system used in the file storage system of FIG. 1a.

"FIG. 2 is a schematic view illustrating an embodiment of a logical structure provided by the file storage system of FIG. 1a.

"FIG. 3 is a schematic view illustrating an embodiment of a user account.

"FIG. 4 is a flow chart illustrating an embodiment of a method for storing an object.

"FIG. 5 is a flow chart illustrating an embodiment of a method for creating a ring

"FIG. 6 flow chart illustrating an embodiment of a method for reassigning partitions in a ring."

For more information, see this patent application: Barton, Michael; Reese, Will; Dickinson, John A.; Payne, Jay B.; Thier, Charles B.; Holt, Gregory. Massively Scalable Object Storage System. Filed October 7, 2013 and posted February 13, 2014. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=586&p=12&f=G&l=50&d=PG01&S1=20140206.PD.&OS=PD/20140206&RS=PD/20140206

Keywords for this news article include: Rackspace US Inc, Information Technology, Information and Data Storage, Information and Data Archiving, Information and Data Management.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC


Source: Information Technology Newsweekly

