Patent Issued for Data De-Duplication in Computer Storage Systems

July 29, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- A patent by the inventors Chavda, Kavita (Roswell, GA); Davis Rozier, Eric W. (Champaign, IL); Mandagere, Nagapramod S. (San Jose, CA); Uttamchandani, Sandeep M. (San Jose, CA); Zhou, Pin (San Jose, CA), filed on November 1, 2011, was published online on July 15, 2014, according to news reporting originating from Alexandria, Virginia, by VerticalNews correspondents.

Patent number 8781800 is assigned to International Business Machines Corporation (Armonk, NY).

The following quote was obtained by the news editors from the background information supplied by the inventors: "Data de-duplication is increasingly being adopted to reduce the data footprint of backup and archival storage, and more recently has become available for near-line primary storage controllers. Scale-out file systems are increasingly diminishing the silos between primary and archival storage by applying de-duplication to unified petabyte-scale data repositories spanning heterogeneous storage hardware. Cloud providers are also actively evaluating de-duplication for their heterogeneous commodity storage infrastructures and ever-changing customer workloads.

"While the cost of data de-duplication in terms of time spent on de-duplicating and reconstructing data is reasonably well understood, the impact of de-duplication on data reliability may not be as well known (e.g., especially in large-scale storage systems with heterogeneous hardware). Since traditional de-duplication keeps only a single instance of redundant data, such an approach magnify the negative impact of losing a data chunk in chunk-based de-duplication that divides a file into multiple chunks, or of missing a file in de-duplication using delta encoding that stores the differences among files. Administrators and system architects have found understanding the data reliability of systems under de-duplication to be important but challenging."

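To illustrate the reliability concern described in the background, the following is a minimal Python sketch, not taken from the patent, of chunk-based de-duplication with fixed-size chunks. The chunk size, the file names and contents, and the helper functions chunk_ids, build_store, and files_lost are hypothetical and chosen only for illustration. It shows how losing a single shared chunk can affect every file that references it.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # bytes; unrealistically small, chosen only for illustration


def chunk_ids(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def build_store(files):
    """Return (recipes, referrers).

    recipes maps each file name to its ordered list of chunk digests.
    referrers maps each unique chunk digest to the set of files that use it.
    """
    recipes = {}
    referrers = defaultdict(set)
    for name, data in files.items():
        recipes[name] = chunk_ids(data)
        for cid in recipes[name]:
            referrers[cid].add(name)
    return recipes, referrers


def files_lost(referrers, lost_chunk):
    """Return the files that can no longer be reconstructed if lost_chunk is lost."""
    return set(referrers.get(lost_chunk, set()))


# Hypothetical toy data: every file contains the chunk b"AAAA".
files = {
    "a.txt": b"AAAABBBB",
    "b.txt": b"AAAACCCC",
    "c.txt": b"DDDDAAAA",
}
recipes, referrers = build_store(files)

total_chunks = sum(len(r) for r in recipes.values())  # logical chunks
unique_chunks = len(referrers)                        # chunks actually stored
shared = hashlib.sha256(b"AAAA").hexdigest()
affected = files_lost(referrers, shared)
```

In this toy data set, six logical chunks are stored as four unique chunks. Losing the one shared chunk damages all three files, whereas losing an unshared chunk damages only one.
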
In addition to the background information obtained for this patent, VerticalNews journalists also obtained the inventors' summary information for this patent: "In general, embodiments of the present invention provide an approach that utilizes discrete event simulation to quantitatively analyze the reliability of a modeled de-duplication system in a computer storage environment. In addition, the approach described herein can perform such an analysis on systems having heterogeneous data stored on heterogeneous storage systems in the presence of primary faults and their secondary effects due to de-duplication. In a typical embodiment, data de-duplication parameters and a hardware configuration are received in a computer storage medium. A data de-duplication model is then applied to a set of data and to the data de-duplication parameters, and a hardware reliability model is applied to the hardware configuration. Then a set (at least one) of discrete events is simulated based on the data de-duplication model as applied to the set of data and the data de-duplication parameters, and the hardware reliability model as applied to the hardware configuration. Based on the simulation, a set of data reliability and availability estimations/estimates can be generated (e.g., and outputted/provided).

"A first aspect of the present invention provides a computer-implemented method for analyzing de-duplicated data storage systems, comprising: receiving data de-duplication parameters in a computer storage medium; receiving a hardware configuration for a data de-duplication system in the computer storage medium; applying a data de-duplication model to a set of data and to the data de-duplication parameters; applying a hardware reliability model to the hardware configuration; simulating a set of discrete events based on the data de-duplication model as applied to the set of data and the data de-duplication parameters, and the hardware reliability model as applied to the hardware configuration; and generating a set of data reliability and availability estimations based on the set of discrete events.

"A second aspect of the present invention provides a system for analyzing de-duplicated data storage systems, comprising: a memory medium comprising instructions; a bus coupled to the memory medium; and a processor coupled to the bus that when executing the instructions causes the system to: receive data de-duplication parameters in a computer storage medium; receive a hardware configuration for a data de-duplication system in the computer storage medium; apply a data de-duplication model to a set of data and to the data de-duplication parameters; apply a hardware reliability model to the hardware configuration; simulate a set of discrete events based on the data de-duplication model as applied to the set of data and the data de-duplication parameters, and the hardware reliability model as applied to the hardware configuration; and generate a set of data reliability and availability estimations based on the set of discrete events.

"A third aspect of the present invention provides a computer program product for analyzing de-duplicated data storage systems, the computer program product comprising a computer readable storage media, and program instructions stored on the computer readable storage media, to: receive data de-duplication parameters in a computer storage medium; receive a hardware configuration for a data de-duplication computer program product in the computer storage medium; apply a data de-duplication model to a set of data and to the data de-duplication parameters; apply a hardware reliability model to the hardware configuration; simulate a set of discrete events based on the data de-duplication model as applied to the set of data and the data de-duplication parameters, and the hardware reliability model as applied to the hardware configuration; and generate a set of data reliability and availability estimations based on the set of discrete events.

"A fourth aspect of the present invention provides a method for deploying a system for analyzing de-duplicated data storage systems, comprising: providing a computer infrastructure being operable to: receive data de-duplication parameters in a computer storage medium; receive a hardware configuration for a data de-duplication computer program product in the computer storage medium; apply a data de-duplication model to a set of data and to the data de-duplication parameters; apply a hardware reliability model to the hardware configuration; simulate a set of discrete events based on the data de-duplication model as applied to the set of data and the data de-duplication parameters, and the hardware reliability model as applied to the hardware configuration; and generate a set of data reliability and availability estimations based on the set of discrete events."

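The approach summarized above can be sketched, under strong simplifying assumptions, as a small discrete event simulation in Python. This is an illustrative sketch and not the patented method: the parameter names (copies, disks, mttf_hours, repair_hours), the exponential failure model, the random chunk placement, and the functions simulate and estimate_reliability are all assumptions introduced here. It estimates only reliability (the fraction of simulated runs in which no file loses a chunk), not availability.

```python
import heapq
import random


def simulate(dedup_params, hardware, files, horizon_hours, seed=0):
    """Run one simulated history and return the set of files that lost data.

    dedup_params: {"copies": int}, replicas kept per unique chunk (assumed knob)
    hardware: {"disks": int, "mttf_hours": float, "repair_hours": float}
    files: {file_name: [chunk_id, ...]}, the de-duplicated file recipes
    """
    rng = random.Random(seed)
    n_disks = hardware["disks"]

    # De-duplication model (assumed): each unique chunk is placed on
    # `copies` distinct disks chosen at random.
    chunks = sorted({c for recipe in files.values() for c in recipe})
    placement = {c: rng.sample(range(n_disks), dedup_params["copies"])
                 for c in chunks}

    # Hardware reliability model (assumed): exponential time to failure.
    rate = 1.0 / hardware["mttf_hours"]
    events = [(rng.expovariate(rate), "fail", d) for d in range(n_disks)]
    heapq.heapify(events)

    down = set()
    lost = set()
    while events:
        t, kind, disk = heapq.heappop(events)  # next event in time order
        if t > horizon_hours:
            break
        if kind == "fail":
            down.add(disk)
            heapq.heappush(events, (t + hardware["repair_hours"], "repair", disk))
            # Secondary effect: a chunk is lost when all of its replicas are
            # on failed disks at once, and every file referencing it is lost.
            for name, recipe in files.items():
                if any(all(d in down for d in placement[c]) for c in recipe):
                    lost.add(name)
        else:
            down.discard(disk)
            heapq.heappush(events, (t + rng.expovariate(rate), "fail", disk))
    return lost


def estimate_reliability(dedup_params, hardware, files, horizon_hours, runs=200):
    """Fraction of independent runs (seeds 0..runs-1) in which no file is lost."""
    survived = sum(
        1 for seed in range(runs)
        if not simulate(dedup_params, hardware, files, horizon_hours, seed)
    )
    return survived / runs


# Hypothetical inputs for illustration only.
files = {"a": ["c1", "c2"], "b": ["c1", "c3"], "c": ["c4", "c1"]}
hardware = {"disks": 4, "mttf_hours": 1000.0, "repair_hours": 10.0}
single_copy = estimate_reliability({"copies": 1}, hardware, files, 1000.0)
two_copies = estimate_reliability({"copies": 2}, hardware, files, 1000.0)
```

Comparing single_copy with two_copies gives a rough sense of how keeping more than one instance of a shared chunk changes the estimate under these assumptions. A realistic study would need failure distributions and workloads measured from actual hardware and data.
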
For the URL and more information on this patent, see: Chavda, Kavita; Davis Rozier, Eric W.; Mandagere, Nagapramod S.; Uttamchandani, Sandeep M.; Zhou, Pin. Data De-Duplication in Computer Storage Systems. U.S. Patent Number 8781800, filed November 1, 2011, and published online on July 15, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8781800.PN.&OS=PN/8781800RS=PN/8781800

Keywords for this news article include: Information Technology, Information and Data Storage, International Business Machines Corporation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC



Source: Information Technology Newsweekly

