
Patent Issued for Data Deletion in a Distributed Data Storage System

September 9, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- Solidfire, Inc. (Suwanee, GA) has been issued patent number 8819208, according to news reporting originating out of Alexandria, Virginia, by VerticalNews editors.

The patent's inventor is Wright, David D. (Dacula, GA).

This patent was filed on March 4, 2011 and was published online on August 26, 2014.

From the background information supplied by the inventor, news correspondents obtained the following quote: "Particular embodiments generally relate to a distributed data storage system.

"A unit of data, such as a file or object, includes one or more storage units (e.g., bytes), and can be stored and retrieved from a storage medium. For example, disk drives in storage systems are divided into logical blocks that are addressed using logical block addresses (LBAs). The disk drives use spinning disks where a read/write head is used to read/write data to/from the drive. It is desirable to store an entire file in a contiguous range of addresses on the spinning disk. For example, the file may be divided into blocks or extents of a fixed size. Each block of the file may be stored in a contiguous section of the spinning disk. The file is then accessed using an offset and length of the file. The contiguous range of addresses is used because disks are good at sequential access, but suffer performance degradation when random access to different non-contiguous locations is needed.

"Storage systems typically do not have a mechanism to minimize the amount of storage used when duplicate copies of data are stored. Duplicate data may occur at different locations within a single file or between different independent files all in the same file system. However, because clients store data based on addresses in the storage medium, duplicate data is typically stored. For example, a first client stores a first file in a first range of addresses and a second client stores a second file in a second range of addresses. Even if duplicate data is found in the first file and the second file, storage systems prefer to store the first file and the second file in separate contiguous locations so that the data for either file can be accessed sequentially.

"Some storage systems, such as a write-anywhere file layout (WAFL), a logical volume manager (LVM), or new technology file system (NTFS), allow multiple objects to refer to the same blocks through a tree structure to allow for efficient storage of previous versions. For example, a snapshot feature may eliminate some duplicate data caused by multiple versions of the same file, but this is only to the extent that different versions are created and controlled by the file system itself.

"Some data storage systems can identify and eliminate duplicate copies of data within or between files. However, these systems typically deal with monolithic systems. For example, the elimination may occur on a single computer system.

"At some point, data may be deleted from the data storage system. When there is a 1:1 mapping between client addresses and stored data blocks, the data may be deleted using the client address. However, the process of deleting data that is referenced by multiple client addresses is more complicated because other client addresses may be referencing the data, and deletion of the data should not be performed if other client addresses still are referencing the data."

Supplementing the background information on this patent, VerticalNews reporters also obtained the inventor's summary information for this patent: "In one embodiment, a method for removing unused storage units is provided. One or more storage units are referenced by multiple client addresses. The method includes constructing, on a metadata server, a filter on at least a portion of block identifiers that identify storage units currently being referenced by client addresses. The metadata server stores information on which storage unit identifiers are referenced by which client addresses. The filter is transmitted from the metadata server to a block server. The filter is used by the block server to test whether storage unit identifiers that exist on the block server are present in the filter. The block server stores information on where a storage unit is stored on the block server for a storage unit identifier. Storage unit identifiers not present in the filter and associated storage units are deleted from the block server.
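The steps in this first embodiment can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not the patented implementation: the filter is modeled as an exact set of identifiers, the "transmission" is an ordinary function call, and the block server maps identifiers directly to data rather than to storage locations. Class and method names are hypothetical.

```python
class MetadataServer:
    """Knows which block identifiers each client address references."""

    def __init__(self):
        self.address_to_block_id = {}

    def build_filter(self):
        # Assumption: an exact set stands in for the filter.
        return frozenset(self.address_to_block_id.values())


class BlockServer:
    """Knows where (here: what data) each block identifier is stored."""

    def __init__(self):
        self.block_id_to_data = {}

    def sweep(self, live_filter):
        """Delete every stored identifier that is absent from the filter."""
        stale = [bid for bid in self.block_id_to_data
                 if bid not in live_filter]
        for bid in stale:
            del self.block_id_to_data[bid]
        return stale


meta = MetadataServer()
meta.address_to_block_id = {"vol1:0": "aa", "vol2:7": "aa", "vol1:1": "bb"}

block_server = BlockServer()
block_server.block_id_to_data = {"aa": b"1", "bb": b"2", "cc": b"3"}

removed = block_server.sweep(meta.build_filter())
```

Here block "aa" is referenced by two client addresses and block "cc" by none, so only "cc" is removed. The block server never needs to know which addresses reference a block, only whether the identifier appears in the filter.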

"In one embodiment, the filter includes a Bloom filter. Storage unit identifiers that exist on the block server are tested with the Bloom filter to determine if any storage unit identifiers stored on the block server are currently referenced by any client addresses.

"In one embodiment, a method for removing unused storage units is provided. One or more storage units are referenced by multiple client addresses. The method includes: receiving, at a block server, a filter generated by a metadata server on at least a portion of storage unit identifiers that identify storage units currently being referenced by a client address, wherein the metadata server stores information on which storage unit identifiers are referenced by which client addresses; using the filter to test whether storage unit identifiers that exist on the block server are present in the filter, wherein the block server stores information on where a storage unit is stored on the block server for a storage unit identifier; and deleting, from the block server, storage unit identifiers not present in the filter and associated storage units.

"In one embodiment, a system includes a metadata server and a block server. The metadata server is configured to: construct a filter on at least a portion of storage unit identifiers that identify storage units, wherein the metadata server stores information on which storage unit identifiers are referenced by which client addresses. The block server is configured to: use the filter to test whether storage unit identifiers that exist on the block server are present in the filter, wherein the block server stores information on where a storage unit is stored on the block server for a storage unit identifier and delete, from the block server, storage unit identifiers not present in the filter and associated storage units.

"In one embodiment, the system includes a plurality of block servers where block servers are designated to store storage units associated with different ranges of storage unit identifiers. Each block server is configured to: receive a filter that includes storage unit identifiers in the range associated with the block server; and use the filter to test whether storage unit identifiers in the range associated with the block server are present in the filter.

"The following detailed description and accompanying drawings provide a better understanding of the nature and advantages of the present invention."

For the URL and additional information on this patent, see: Wright, David D. Data Deletion in a Distributed Data Storage System. U.S. Patent Number 8819208, filed March 4, 2011, and published online on August 26, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8819208.PN.&OS=PN/8819208RS=PN/8819208

Keywords for this news article include: Information Technology, Information and Data Storage, Solidfire, Solidfire Inc.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Information Technology Newsweekly

