News Column

Patent Issued for Higher Efficiency Storage Replication Using Compression

July 22, 2014

By a News Reporter-Staff News Editor at Information Technology Newsweekly -- According to news reporting originating from Alexandria, Virginia, by VerticalNews journalists, a patent by the inventors Holt, Gregory Lee (Hollywood Park, TX); Gerrard, Clay (San Antonio, TX); Goetz, David Patrick (San Antonio, TX); Barton, Michael (San Antonio, TX), filed on October 21, 2011, was published online on July 8, 2014.

The assignee for this patent, patent number 8775375, is Rackspace US, Inc. (San Antonio, TX).

Reporters obtained the following quote from the background information supplied by the inventors: "The present disclosure relates generally to cloud computing, and more particularly to a massively scalable object storage system to provide storage for a cloud computing environment. Cloud computing services can provide computational capacity, data access, networking/routing and storage services via a large pool of shared resources operated by a cloud computing provider. Because the computing resources are delivered over a network, cloud computing is location-independent computing, with all resources being provided to end-users on demand with control of the physical resources separated from control of the computing resources.

"As a term, 'cloud computing' describes a consumption and delivery model for IT services based on the Internet, and it typically involves over-the-Internet provisioning of dynamically scalable and often virtualized resources. This frequently takes the form of web-based tools or applications that users can access and use through a web browser as if it were a program installed locally on their own computer. Details are abstracted from consumers, who no longer have need for expertise in, or control over, the technology infrastructure 'in the cloud' that supports them. Most cloud computing infrastructures consist of services delivered through common centers and built on servers. Clouds often appear as single points of access for consumers' computing needs, and do not require end-user knowledge of the physical location and configuration of the system that delivers the services.

"Because the flow of services provided by the cloud is not directly under the control of the cloud computing provider, cloud computing requires the rapid and dynamic creation and destruction of computational units, frequently realized as virtualized resources. Maintaining the reliable flow and delivery of dynamically changing computational resources on top of a pool of limited and less-reliable physical servers provides unique challenges. Accordingly, it is desirable to provide a better-functioning cloud computing system with superior operational capabilities."

In addition to obtaining background information on this patent, VerticalNews editors also obtained the inventors' summary information for this patent: "In one embodiment, a trust and federation relationship is established between a first cluster and a second cluster. This is done by designating a first cluster as a trust root. The trust root receives contact from another cluster, and the two clusters exchange cryptographic credentials. The two clusters mutually authenticate each other based upon the credentials, and optionally relative to a third identity, authorization, or authentication service. Following the authentication of the two clusters, a service connection is established between the two clusters and services from the remote cluster are registered as being available to the cluster designated as the trust root. In further embodiments, a multi-cluster gateway is designated as the trust root, and the two clusters can be mutually untrusting. In a third embodiment, the remote cluster can be also designated as a trust root, and two one-way trust and federation relationships can be set up to form a trusted bidirectional channel.
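A minimal Python sketch of the one-way and bidirectional relationships described above. It assumes a hypothetical `Cluster` record and a pluggable `authenticate` callback that stands in for whatever credential check is actually used. None of these names come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cluster:
    """Hypothetical cluster record; field names are illustrative only."""
    name: str
    services: list[str]
    # Services registered from remote clusters: remote name -> service list.
    remote_services: dict[str, list[str]] = field(default_factory=dict)


# Stand-in for the real credential check (keys, certificates, Kerberos, ...).
# Returns True if the first cluster accepts the second one's credentials.
Authenticator = Callable[[Cluster, Cluster], bool]


def federate(trust_root: Cluster, remote: Cluster, authenticate: Authenticator) -> bool:
    """One-way trust and federation: the remote contacts the trust root,
    both sides authenticate each other, and on success the remote's
    services are registered as available to the trust root."""
    if not (authenticate(trust_root, remote) and authenticate(remote, trust_root)):
        return False
    trust_root.remote_services[remote.name] = list(remote.services)
    return True


def federate_bidirectional(a: Cluster, b: Cluster, authenticate: Authenticator) -> bool:
    """Two one-way relationships, with each cluster acting as a trust root.
    The second relationship is attempted only if the first succeeds."""
    return federate(a, b, authenticate) and federate(b, a, authenticate)


east = Cluster("east", ["object-store"])
west = Cluster("west", ["object-store", "cdn"])
federate(east, west, lambda x, y: True)   # accept-all authenticator, demo only
print(east.remote_services)               # {'west': ['object-store', 'cdn']}
```

Only the trust root gains a view of the remote's services; the remote learns nothing until a second relationship is set up in the other direction.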

"When a trusted connection is set up between the two clusters, a user working with the first cluster, or with a multi-cluster gateway, can ask for services and have the request or data transparently proxied to the second cluster. Cross-cluster replication is one anticipated service, as are multi-cluster compute or storage farms based upon spot availability or various provisioning policies. For example, a vendor providing a cloud storage 'frontend' could provide multiple backends simultaneously using the trust and federation relationship.

"In one embodiment, a multi-cluster gateway can have a two, three, or higher-level ring that transparently matches an incoming request with the correct cluster. In the ring, a request is first mapped to an abstract 'partition' based on a consistent hash function, and then one or more constrained mappings map the partition number to an actual resource. In another embodiment, the multi-cluster gateway is a dumb gateway, and the rings are located only at the cluster level.
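A rough sketch of a two-level ring, assuming an MD5-based hash with a hypothetical partition power of 8 and cluster and device names invented for illustration. The first level (at the gateway) maps a partition to a cluster; the second (inside each cluster) maps the same partition to a device. The shuffled, near-evenly balanced assignment is a simplification of the "constrained mappings" the inventors mention, not their actual algorithm.

```python
import hashlib
import random

PART_POWER = 8  # hypothetical: 2**8 = 256 partitions


def partition_for(path: str, part_power: int = PART_POWER) -> int:
    """Consistent hash: the top `part_power` bits of the path's MD5 digest."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - part_power)


def build_assignment(targets: list[str], seed: int, part_power: int = PART_POWER) -> list[str]:
    """Constrained mapping from partition number to target: each target gets
    a near-equal share of partitions, shuffled deterministically by `seed`."""
    slots = [targets[i % len(targets)] for i in range(2 ** part_power)]
    random.Random(seed).shuffle(slots)
    return slots


# Level 1 (gateway): partition -> cluster.
gateway_ring = build_assignment(["cluster-a", "cluster-b"], seed=1)
# Level 2 (inside each cluster): the same partition -> a local device.
cluster_rings = {
    "cluster-a": build_assignment(["a-dev0", "a-dev1", "a-dev2"], seed=2),
    "cluster-b": build_assignment(["b-dev0", "b-dev1"], seed=3),
}


def route(path: str) -> tuple[str, str]:
    """Resolve a request path to (cluster, device) through both levels."""
    part = partition_for(path)
    cluster = gateway_ring[part]
    return cluster, cluster_rings[cluster][part]


print(route("/account/container/object"))
```

Because the hash only ever yields a partition number, clusters or devices can be reassigned by editing the lookup tables without changing how paths are hashed. In the "dumb gateway" variant, only the `cluster_rings` level would exist.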

"Various embodiments use existing cryptographic or authentication protocols when exchanging tokens or verifying each other; shared secrets, public/private keypairs, digital certificates, Kerberos, XAUTH and OAUTH are all contemplated. Separate authentication entities are also contemplated, such as an OpenID provider, LDAP store, or RADIUS server.
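Of the listed options, a shared secret is the simplest to illustrate. The following sketch assumes both clusters hold the same pre-shared key and performs an HMAC-SHA256 challenge-response in each direction. The `Party` class and `mutually_authenticate` function are hypothetical; only the `hmac`, `hashlib`, and `secrets` calls are standard-library APIs.

```python
import hashlib
import hmac
import secrets


class Party:
    """Hypothetical cluster endpoint holding a pre-shared secret."""

    def __init__(self, name: str, shared_secret: bytes):
        self.name = name
        self._secret = shared_secret

    def new_challenge(self) -> bytes:
        # A fresh random nonce prevents replaying an earlier response.
        return secrets.token_bytes(32)

    def respond(self, challenge: bytes) -> bytes:
        # Prove knowledge of the secret without sending it. Binding the
        # responder's name into the MAC keeps one side's answer from being
        # reflected back as the other side's.
        return hmac.new(self._secret, self.name.encode() + challenge, hashlib.sha256).digest()

    def verify(self, peer_name: str, challenge: bytes, response: bytes) -> bool:
        expected = hmac.new(self._secret, peer_name.encode() + challenge, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking timing information.
        return hmac.compare_digest(expected, response)


def mutually_authenticate(a: Party, b: Party) -> bool:
    """Each side challenges the other; both must verify the answers."""
    c_a = a.new_challenge()  # a challenges b
    c_b = b.new_challenge()  # b challenges a
    return a.verify(b.name, c_a, b.respond(c_a)) and b.verify(a.name, c_b, a.respond(c_b))


key = secrets.token_bytes(32)
print(mutually_authenticate(Party("east", key), Party("west", key)))  # True
```

Certificate- or Kerberos-based variants would replace the shared key with each side's own credentials, but the two-way challenge structure stays the same.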

"In another embodiment, there is a multi-cluster synchronization system between two or more clusters. Each cluster has a cluster-internal network, with object storage services and container services. The container services track and replicate metadata associated with the object storage service. An intercluster network connects the two clusters and performs a one-way synchronization of the objects and metadata associated with a particular container. This can be done either through the direct association of the container and object storage services, such as through a trust and federation relationship, or it can be opaque, so that the cross-cluster replication treats the remote repository as a black box and uses the external API to call and manipulate the files.
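A simplified sketch of a one-way container synchronization. It assumes each object carries a timestamp and that the newer copy wins. Plain dictionaries stand in for the source container and for the remote repository, which in the opaque case would be reached through its external API. Deletions are not modeled, and `ObjectRecord` and `sync_container` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    timestamp: float   # last-modified time tracked by the container service
    data: bytes
    metadata: dict


def sync_container(source: dict[str, ObjectRecord], dest: dict[str, ObjectRecord]) -> list[str]:
    """One-way sync: for every object in `source`, make `dest` hold the same
    or a newer copy. Returns the names that were copied, in sorted order.
    Objects present only in `dest` are left alone."""
    copied = []
    for name in sorted(source):
        rec = source[name]
        current = dest.get(name)
        if current is None or current.timestamp < rec.timestamp:
            # Copy data and metadata together so they stay consistent.
            dest[name] = ObjectRecord(rec.timestamp, rec.data, dict(rec.metadata))
            copied.append(name)
    return copied
```

Running the function a second time copies nothing, which is what lets a periodic sync process repeat cheaply once the remote side has caught up.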

"In a further embodiment, multiple synchronization relationships can be set up, either in a cycle (with two or more participants), in a line, or in a tree. For example, the multi-cluster replication could be used to transparently synchronize objects in a CDN network.
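How an update travels through such relationships can be sketched as a breadth-first walk over one-way sync links. This assumes each cluster forwards a given update only once, a stand-in for checking whether the peer already holds that version. That check is what lets a cycle terminate. The topology names are invented.

```python
from collections import deque


def propagate(edges: dict[str, list[str]], origin: str) -> list[str]:
    """Follow one-way sync relationships from `origin` and return clusters
    in the order they receive the update. Each cluster is visited once, so
    a cycle ends when the update returns to a cluster that already has it."""
    order, seen = [origin], {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in edges.get(node, []):
            if peer not in seen:
                seen.add(peer)
                order.append(peer)
                queue.append(peer)
    return order


cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
line = {"A": ["B"], "B": ["C"]}
tree = {"origin": ["edge-1", "edge-2"], "edge-1": ["pop-1a", "pop-1b"]}
print(propagate(cycle, "A"))  # ['A', 'B', 'C']
```

A line only pushes updates downstream from wherever they enter, while a cycle reaches every member from any starting point; a tree such as a CDN fan-out reaches every leaf from the root.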

"In another embodiment, the multi-cluster synchronization system uses variable compression to optimize the transfer of information between multiple clusters. Aside from the simple use of compression to minimize the total number of bytes sent between the two clusters, the size of the objects sent across the wire can be dynamically changed using file compression to optimize for higher throughput after considering packet loss, TCP windows, and block sizes. This includes both the packaging of multiple small files together into one larger compressed file, saving on TCP and header overhead, and the chunking of large files into multiple smaller files that are less likely to have difficulties due to intermittent network congestion or errors. Depending on the state of the network and disks, the best size can vary; examples range from approximately 4 MB (largest non-fragmented packet using jumbo frames) to 64 MB (block size on some distributed filesystems) to 1 GB and above. A further embodiment uses forward error correction to maximize the chances that the remote end will be able to correctly reconstitute the transmission.
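A sketch of the two size adjustments described above, using Python's standard `tarfile` and `zlib` modules. Small files are grouped into gzip-compressed tar bundles of roughly a target size, and large files are split into independently compressed chunks. The fixed 4 MB target is an assumption, since the inventors describe choosing the size dynamically from network and disk conditions. Forward error correction is not shown, and the function names are hypothetical.

```python
import io
import tarfile
import zlib

TARGET = 4 * 1024 * 1024  # hypothetical transfer-unit size in bytes


def _bundle(group: list[tuple[str, bytes]]) -> bytes:
    """Write (name, data) pairs into one in-memory .tar.gz bundle."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in group:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def pack_small_files(files: dict[str, bytes], target: int = TARGET) -> list[bytes]:
    """Group small files into compressed bundles of roughly `target`
    uncompressed bytes each, saving per-file TCP and header overhead."""
    bundles: list[bytes] = []
    group: list[tuple[str, bytes]] = []
    size = 0
    for name in sorted(files):
        data = files[name]
        # Start a new bundle when adding this file would exceed the target.
        if group and size + len(data) > target:
            bundles.append(_bundle(group))
            group, size = [], 0
        group.append((name, data))
        size += len(data)
    if group:
        bundles.append(_bundle(group))
    return bundles


def chunk_large_file(data: bytes, chunk_size: int = TARGET) -> list[bytes]:
    """Split one large file into independently compressed chunks, so a
    network error forces a resend of one chunk rather than the whole file."""
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
```

On the receiving side, the chunks are reassembled by decompressing each with `zlib.decompress` in order and concatenating the results; bundles are unpacked with `tarfile.open(..., mode="r:gz")`.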

"According to another embodiment, the improved scalable object storage system includes a distributed information synchronization system, comprising a first subsidiary node coupled to a network, the first subsidiary node including a first non-transitory computer-readable medium wherein the first computer-readable medium includes a first structured information repository, and wherein information in the first structured information repository is subject to internal consistency constraints; a second subsidiary node coupled to a network, the second subsidiary node including a second non-transitory computer-readable medium wherein the second computer-readable medium includes a second structured information repository, and wherein information in the second structured information repository is subject to internal consistency constraints; a repository synchronizer coupled to the first and second structured information repositories; the repository synchronizer further including a consistency evaluation module adapted to evaluate the differences between the first structured information repository and the second structured information repository; an internal modification module adapted to modify the internal structures of a structured information repository; an external replication module adapted to delete a target structured information repository and replace it with a replicated copy of a source structured information repository; and a threshold comparator; wherein the repository synchronizer is adapted to evaluate the first and second structured information repositories and determine a level of difference and compare the level of difference to a configurable threshold using the threshold comparator; if the level of difference is above the configurable threshold, modify the internal structures of a selected structured information repository using the internal modification module; and if the level of difference is below the configurable threshold, delete the selected structured information repository and replace it with a replicated copy of a consistent structured information repository using the external replication module.

"According to another embodiment, the improved scalable object storage system includes a method for synchronizing structured information in a distributed system, comprising storing a first structured information repository on a first non-transitory computer-readable medium, wherein information in the first structured information repository is subject to internal consistency constraints; storing a second structured information repository on a second non-transitory computer-readable medium, wherein information in the second structured information repository is subject to internal consistency constraints; evaluating the differences between the first structured information repository and the second structured information repository to determine a preferred state and a difference measurement quantifying a difference from the preferred state; determining whether the difference measurement exceeds a configurable threshold; modifying a selected structured information repository if the difference measurement for the selected structured information repository is less than the configurable threshold, wherein the modification of the selected structured information repository is subject to the internal consistency constraints of the selected structured information repository, deleting the selected structured information repository if the difference measurement for the selected structured information repository is greater than the configurable threshold, and replacing the selected structured information repository with a replica of a structured information repository in the preferred state, wherein either modifying the selected structured information repository or deleting and replacing the structured information repository changes the non-transitory computer-readable medium storing the selected structured information repository such that the selected structured information repository is both compliant with its internal consistency constraints and in the preferred state. The method may also include determining that both the first structured information repository and the second structured information repository are not in the preferred state; pre-selecting the structured information repository that is closer to the preferred state and modifying the pre-selected structured information repository to bring the pre-selected structured information repository to the preferred state, subject to the internal consistency requirements of the pre-selected structured information repository, regardless of the configurable threshold.
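The threshold logic of the method embodiment can be sketched as follows. The repository model (a dictionary of timestamped entries), the rule that the newest entry per key defines the preferred state, and the fraction-of-differing-keys metric are all assumptions for illustration. The sketch modifies a repository in place when its difference is below the threshold and deletes and replaces it otherwise. When neither repository is in the preferred state, it first modifies the closer one regardless of the threshold.

```python
# Hypothetical repository model: key -> (timestamp, value).
Repo = dict[str, tuple[float, str]]


def preferred_state(a: Repo, b: Repo) -> Repo:
    """Assumption: the preferred state keeps the newest entry for each key."""
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or merged[key][0] < entry[0]:
            merged[key] = entry
    return merged


def difference(repo: Repo, target: Repo) -> float:
    """Fraction of keys whose entry differs from the preferred state."""
    keys = set(repo) | set(target)
    if not keys:
        return 0.0
    return sum(repo.get(k) != target.get(k) for k in keys) / len(keys)


def modify_in_place(repo: Repo, target: Repo) -> None:
    """Apply only the differing entries (a stand-in for edits made under
    the repository's internal consistency constraints)."""
    for key in [k for k in repo if k not in target]:
        del repo[key]
    for key, entry in target.items():
        if repo.get(key) != entry:
            repo[key] = entry


def replace_with_replica(repo: Repo, target: Repo) -> None:
    """Delete the repository's contents and replace them with a replica."""
    repo.clear()
    repo.update(target)


def synchronize(a: Repo, b: Repo, threshold: float) -> dict[str, str]:
    """Bring both repositories to the preferred state. Returns the action
    taken for each repository ("a" or "b") that was out of date."""
    target = preferred_state(a, b)
    repos = {"a": a, "b": b}
    diffs = {name: difference(repo, target) for name, repo in repos.items()}
    actions: dict[str, str] = {}
    if all(d > 0 for d in diffs.values()):
        # Neither is in the preferred state: modify the closer one
        # regardless of the threshold.
        closer = min(diffs, key=diffs.get)
        modify_in_place(repos[closer], target)
        actions[closer] = "modified"
    for name, repo in repos.items():
        if name in actions or diffs[name] == 0:
            continue
        if diffs[name] < threshold:
            modify_in_place(repo, target)
            actions[name] = "modified"
        else:
            replace_with_replica(repo, target)
            actions[name] = "replaced"
    return actions
```

The design trade-off is that small divergences are cheaper to patch entry by entry, while heavily diverged copies are cheaper and safer to discard and re-replicate wholesale.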

"According to another embodiment, the improved scalable object storage system includes a non-transient computer readable medium containing executable instructions, which when executed on a processor update a first structured information repository on a first non-transitory computer-readable medium, subject to internal consistency constraints; update a second structured information repository on a second non-transitory computer-readable medium, subject to internal consistency constraints; evaluate the differences between the first structured information repository and the second structured information repository to determine a preferred state and a difference measurement quantifying a difference from the preferred state; determine whether the difference measurement exceeds a configurable threshold; modify a selected structured information repository if the difference measurement for the selected structured information repository is less than the configurable threshold, subject to the internal consistency constraints of the selected structured information repository, delete the selected structured information repository if the difference measurement for the selected structured information repository is greater than the configurable threshold, and replace the selected structured information repository with a replica of a structured information repository in the preferred state.

"The specifics of these embodiments as well as other embodiments are described with particularity below."

For more information, see this patent: Holt, Gregory Lee; Gerrard, Clay; Goetz, David Patrick; Barton, Michael. Higher Efficiency Storage Replication Using Compression. U.S. Patent Number 8775375, filed October 21, 2011, and published online on July 8, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8775375.PN.&OS=PN/8775375RS=PN/8775375

Keywords for this news article include: Rackspace US Inc, Information Technology, Information and Cryptography, Information and Data Archiving.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC

Source: Information Technology Newsweekly