
Patent Issued for Hierarchical Storage Management for Database Systems

February 11, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- According to news reporting originating from Alexandria, Virginia, by VerticalNews journalists, a patent by the inventors Augenstein, Oliver (Boeblingen, DE); Bender, Stefan (Woerrstadt, DE); Fleckenstein, Karl (Malsch, DE); Uhl, Andreas (Herrenberg, DE), filed on July 31, 2012, was published online on January 28, 2014.

The assignee for this patent, patent number 8639880, is International Business Machines Corporation (Armonk, NY).

Reporters obtained the following quote from the background information supplied by the inventors: "The present invention relates in general to computers, and more particularly to apparatus, method and computer program product embodiments managing data in a hierarchical storage server storing data blocks on primary and secondary storage devices, the primary storage devices being in an active mode and the secondary storage devices being in an active or passive mode.

"Hierarchical Storage Management HSM is a data storage technique which automatically moves data between a primary and a secondary storage tier. HSM is sometimes also referred to as tiered storage. In HSM systems, data files that are frequently used are stored on high-speed storage devices of the primary storage tier, such as hard disk drive arrays. They are more expensive per byte stored than slower devices of the secondary storage tier, such as optical discs and magnetic tape drives. The bulk of application data is stored on the slower low-cost secondary storage devices and copied to the faster high-cost disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system migrates data files from the primary disk drives to the secondary tape drives if they have not been used for a certain period of time, typically a few months. This data migration frees expensive disk space on the primary storage devices. If an application does reuse a file which is on a secondary storage device, it is automatically recalled, that is, moved back to the primary disk storage. Due to this transparent file recall capability, the file remains accessible from a client application although it has been physically migrated to the secondary storage. HSM is implemented, for example, in the Tivoli.RTM. Storage Manager. For details, see http://www-.ibm.com/software/tivoli/products/storage-mgr/.

"HSM is advantageous for file systems that contain a large quantity of inactive files that are used only sporadically. In a typical HSM system, 90 percent of the file data can be stored off-site on secondary storage devices. Only a small quantity of spare storage space is needed on the primary storage devices to recall a small portion of the file data from the secondary storage devices when the client application requests access to this small portion.

"For this reason, HSM is, however, not well suited for databases. Database systems store table and index data in large datasets or files that are accessed randomly. Even if the tables and indices are assigned to different datasets or files, the database system can infrequently access all data at periodic points in time, for example, at the end of a month or year. Since the prior art HSM systems only provide a small quantity of free space on the primary devices to store recalled data, this free space would not be sufficient to store a large quantity of the recalled data temporarily."

In addition to obtaining background information on this patent, VerticalNews editors also obtained the inventors' summary information for this patent: "Storage controllers often support a suite of replication services that allow copy of a volume to be created. The types of copies supported may vary by product, but may include such features as a point in time copy or a continuous copy of either a synchronous or asynchronous copy. The copy function is generally invoked by establishing a relationship between two or more volumes. The topology relationships between the two or more volumes may be linear (e.g., A→B→C→D ...) or may be branched (e.g., A→B & C).

"In view of the foregoing, a need exists for a data management method using a hierarchical storage server system that saves costs of purchase and operation and provides faster access to larger quantities of stored data that are requested infrequently. Accordingly, various embodiments for managing data in a hierarchical storage server storing data blocks on primary and secondary storage devices are provided. In one such embodiment, by way of example only, the primary storage devices are in an active mode and the secondary storage devices are in one of an active and passive mode. A read request is received that has been sent from a client application to read a data block from a logical storage location of the hierarchical storage server. A physical storage location and one of the primary and secondary storage devices are determined when the determined physical storage location is associated with the requested logical storage location according to stored logical-to-physical mappings and the determined physical storage location resides on the determined primary or secondary storage device. When the determined physical storage location resides on the determined primary storage device, the data block is read from the determined physical storage location. When the determined physical storage location resides on the determined secondary storage device being in the active mode, the data block is read from the determined physical storage location. When the determined physical storage location resides on the determined secondary storage device being in the passive mode, the determined secondary storage device is switched over from the passive mode to the active mode and the data block is read from the determined physical storage location on the determined secondary storage device having been switched over from the passive mode to the active mode. The read data block is returned to the client application.

"According to a pre-defined data recall policy, sets of source physical storage locations may be determined when the source physical storage locations have been read-accessed within a pre-defined minimum activation time and each of the sets resides on a respective secondary storage device in the active mode. The data recall policy may comprise rules and configurable parameters thresholds to control a data recall. References to the source physical storage locations may be recorded. According to the data recall policy, one or more of the determined sets of source physical storage locations may be determined as sources for a data recall when quantities of the recorded sets of source physical storage locations fall below respective data recall source low-water marks for the respective secondary storage devices. The thresholds of the data recall policy may comprise the minimum activation time and the data recall low-water marks. Target physical storage locations residing on one or more primary storage devices may be determined. The data recall may be performed by copying data blocks from the determined sets of source physical storage locations to the determined target physical storage locations.

"Each of the logical-to-physical mappings may associate one respective physical storage location with one respective logical storage location. The associated logical and physical storage locations may be marked as used or unused. The client application may be only able to read from and write to logical storage locations that are marked as used.

"A write request may be received that has been sent from the client application to write a data block to a first logical storage location of the hierarchical storage server. A used first physical storage location and one of the primary and secondary storage devices may be determined when the determined first physical storage location is associated with the first logical storage location according to the stored logical-to-physical mappings and when the determined first physical storage location resides on the determined primary or secondary storage device. When the determined first physical storage location resides on the determined primary storage device, the data block may be written to the determined first physical storage location on the determined primary storage device. When the determined first physical storage location resides on the determined secondary storage device, an unused second physical storage location and a further one of the primary storage devices may be determined, where the determined second physical storage location resides on the determined further primary storage device. The further primary storage device can be different from the primary storage device of the preceding case. The second physical storage location may be associated with a second logical storage location according to the stored logical-to-physical mappings. The determined second physical storage location may be marked as used. The data block may be written to the determined second physical storage location. The logical-to-physical mappings may be modified by associating the first logical storage location with the second physical storage location and the second logical storage location with the first physical storage location. The determined first physical storage locations may be marked as unused. The modification of the logical-to-physical mappings does not require switching over the secondary storage device from the passive mode to the active mode.

"Quantities of used and unused physical logical storage locations of the primary and secondary storage devices may be collected.

"Last read access timestamps of the respective secondary storage devices may be recorded. A set of secondary storage devices in the active mode may be determined. A first subset and a second subset of the determined set of secondary storage devices may be determined when the determined set of secondary storage devices exceeds a pre-defined maximum number, the first subset of secondary storage devices has at most the pre-defined maximum number, and the secondary storage devices of the second subset have received no read requests from the client application within a pre-defined minimum activation time and have lower percentages of unused physical storage locations than the secondary storage devices of the first subset. The secondary storage devices of the determined second subset may be switched over from the active mode to the passive mode.

"According to a pre-defined data migration policy, source primary storage devices may be determined as sources for a data migration and target secondary storage devices may be determined as targets for the data migration. The data migration policy may comprise rules and configurable parameters thresholds to control the data migration. According to the data migration policy, pairs of respective used source physical storage locations and respective unused target physical storage locations may be determined. The determined source physical storage locations reside on the determined source primary storage devices and are associated with respective source logical storage locations. The target physical storage locations reside on the determined target secondary storage devices and are associated with respective target logical storage locations. The determined target physical storage locations may be marked as used. The data migration may be performed by copying for each of the determined pairs a respective data block from the respective source physical storage location to the respective target physical storage location. The logical-to-physical mappings may be modified by associating for each of the determined pairs the respective target logical storage location with the respective source physical storage location and the respective source logical storage location with the respective target physical storage location. The determined source physical storage locations may be marked as unused.

"Source physical storage locations may be determined in response to a data migration request sent from the client application.

"The data migration policy may comprise the following steps: Last access timestamps of used source physical storage locations on the primary storage devices may be recorded. First quantities of used source physical storage locations on the respective source primary storage devices may be determined. As sources for the data migration, source primary storage devices may be determined when the respective first quantities exceed respective data migration source high-water marks. Second quantities of used target physical storage locations on the respective target secondary storage devices may be determined. As targets for the data migration, target secondary storage devices may be determined when the respective second quantities fall below respective data migration target low-water marks. The determined target secondary storage devices may be switched over from the passive mode to the active mode when the determined target secondary storage devices are in the passive mode and before a data migration to the determined target secondary storage devices is started. An ascending order of last access timestamps of the used source physical storage locations may be determined. The data migration of data blocks may be started from subsets of used source physical storage locations residing on the respective determined source primary storage devices to unused target physical storage locations residing on the determined target secondary storage devices being in the active mode. The data migration may be performed in the determined order. The data migration may be stopped for the determined source primary storage devices when the respective first quantities fall below respective data migration source low-water marks and for the determined target secondary storage devices when the respective second quantities exceed respective data migration target high-water marks.

"First quantities of used physical storage locations on the respective primary storage devices may be determined. According to the data recall policy, target primary storage devices may be determined as targets for the data recall when the respective first quantities of the target primary storage devices fall below respective data recall target low-water marks. According to the data recall policy, pairs of respective used source physical storage locations and respective unused target physical storage locations may be determined. The determined source physical storage locations reside on the determined source secondary storage devices and are associated with respective source logical storage locations. The determined target physical storage locations reside on the determined target primary storage devices and are associated with respective target logical storage locations. The determined target physical storage locations may be marked as used. The data recall may be performed by copying for each of the determined pairs a respective data block from the respective source physical storage location to the respective target physical storage location. The logical-to-physical mappings may be modified by associating for each of the determined pairs the respective target logical storage location with the respective source physical storage location and the respective source logical storage location with the respective target physical storage location. The determined source physical storage locations may be marked as unused.

"In addition to the foregoing exemplary embodiment, various other system and computer program product embodiments are provided and supply related advantages."

For more information, see this patent: Augenstein, Oliver; Bender, Stefan; Fleckenstein, Karl; Uhl, Andreas. Hierarchical Storage Management for Database Systems. U.S. Patent Number 8639880, filed July 31, 2012, and published online on January 28, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=17&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=803&f=G&l=50&co1=AND&d=PTXT&s1=20140128.PD.&OS=ISD/20140128&RS=ISD/20140128

Keywords for this news article include: Information Technology, Information and Data Migration, International Business Machines Corporation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Information Technology Newsweekly

