News Column

Patent Issued for Equi-Joins between Split Tables

August 12, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- From Alexandria, Virginia, VerticalNews journalists report that a patent by the inventors Peh, Thomas (Heidelberg, DE); Schwedes, Holger (Kraichtal, DE); Stephan, Wolfgang (Heidelberg, DE), filed on May 27, 2011, was published online on July 29, 2014.

The patent's assignee for patent number 8793287 is SAP AG (Walldorf, DE).

News editors obtained the following quote from the background information supplied by the inventors: "The present invention relates to database operations, and in particular to equi-join operations among split tables.

"Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

"A common database operation in a relational database is the join operation. Generally, a join of two data sources creates an association of objects in one data source with objects that share a common attribute in another data source. A typical data structure for data sources is a data table (or simply, table) comprising rows and columns. Each row (or record) of the data table represents an object. Each column represents attributes of the object. For example, a data table may be defined for inventory in a retail store. The inventory items (e.g., pants, shirts, toasters, lamps, etc.) may constitute the objects represented by the data table. The attributes of each item may include such information as the name of the item, the number of items at that store, the location of the item in the store, and so on. Instances of an attribute are referred to as 'attribute values', 'actual values', or simply 'values.' An example of such a data table is shown in FIG. 14A, where each row 1402 represents a store item. Each row 1402 comprises attributes of the item columns 1404a-1404c. Each row 1402 may include an ID attribute 106 that identifies the row. For example, the ordinal position of a row 1402 in the data table may be used as the ID attribute.

"FIG. 14B shows an example of another data table called Mail-Order. A join operation between the Inventory and Mail Order data tables can be performed. For example, consider a so-called 'equi join' type of join operation where the join condition (join predicate) specifies a relationship (e.g., equality) between attributes that are common to both data tables. Suppose the join condition is: items in the Inventory data table that are the same as the items in the Mail-Order data table. For example, the join expression might be formulated as 'Table Inventory inner join Table MailOrder on Inventory.Item=Mail-Order.Item'.

"An execution plan (query plan) for performing the join operation may include the following steps: 1. read out a row from the Inventory table 2. compare the actual value of the Item attribute in the row that was read out from the Inventory table with the actual value of the Item attribute in a row of the Mail-Order table 3. if there is a match, then output the row that was read out from the Inventory table and the matching row in the Mail-Order table 4. repeat steps 2 and 3 for each row in the Mail-Order table 5. repeat steps 1-4 for each row in the Inventory table A result of the join operation can be represented by the data table shown in FIG. 14C.

"A database may comprise data tables that contain thousands of records each. In addition, records may have tens to hundreds of attributes each, and the actual values of some attributes may be lengthy (e.g., an attribute that represents the name of a person may require an allocation of 10-20 characters of storage space). Such databases can impose heavy requirements in the storage of their data. Accordingly, a practice of using dictionaries has arisen, where the actual values (e.g., 10-20 characters in length) of instances of an attribute in the data table are replaced by (or otherwise mapped to) an associated 'value ID' (e.g., two or three bytes in length).

"Consider the Inventory table and the Mail-Order table, for example. The actual values for instances of the Item attribute in the Inventory table include 'pants', 'shirts', 'toasters', and 'lamps'. A dictionary can be defined for the Item attribute. For example, the dictionary may store the actual values of the Item attribute in alphabetical order and the value IDs that are associated with the actual values might be the ordinal position of the actual values in the dictionary.

"An actual value in the data table is represented only once in the dictionary. For example, the actual value 'lamps' occurs in twice in the Mail-Order table, but there is only one entry in the dictionary; thus, the dictionary might look like: lamps pants shirts toasters The value ID associated with the actual value 'lamps' could be 1, being located in the first position in the dictionary. The value ID associated with the actual value 'pants' could be 2, being the second position in the dictionary, and so on.

"FIG. 15 shows the Inventory and Mail-Order tables of FIGS. 14A and 14B, modified by the use of a dictionary, more specifically a central dictionary. In particular, the actual values for instances of the Item attribute in the data tables (i.e., text) have been replaced by their corresponding associated value IDs (i.e., an integer). It can be appreciated that the use of dictionaries can reduce the storage burden of large databases.

"The distribution of databases across separate database servers is commonly employed, for example, to distribute the storage burden across multiple sites. In a distributed database configuration, one or more constituent data tables of the database are partitioned (split) into some number of 'partitions,' and the partitions are distributed across many database servers. While the processing of certain queries in a distributed database configuration may be accomplished using only the data within a given partition of a data table, queries that involve a join operation require access to data from all of the partitions of the data tables being joined.

"The execution plan of a join operation involving split (partitioned) data tables conventionally involves communicating the actual values of the attribute(s) specified in the join condition among the partitions in order to evaluate the join condition. One can appreciate that the execution plan may therefore entail a significant amount of data communication among the constituent partitions. As explained above, a dictionary can be used to reduce the space requirements for storing attribute values. Accordingly, each partition may be provided with its own local dictionary (rather than the central dictionary indicated in FIG. 15), the idea being that the associated value IDs can then be communicated among the partitions instead of the actual values. However, the value IDs in a given local dictionary are generated independently of the values IDs in the other local dictionaries. In other words, value IDs locally generated in one partition of a data table may have no correlation to value IDs locally generated in another partition of that data table. Suppose, for example, the Item attribute is specified in a join condition. Suppose further that the actual value 'pants' has a value ID of 2 in the local dictionary of one partition, a value ID of 7 in the local dictionary of another partition, and a value ID of 15 in yet another partition. The execution plan for the join operation may communicate the multiple different value IDs for 'pants' (i.e., 2, 7, 15) among the partitions. However, the value IDs would be meaningless in any one partition for the join operation because value IDs only have meaning for the partition in which they were generated. For example, while the value ID 2 may be associated with 'pants' in one partition, the value IDs 7 and 15 do not, and in fact very likely may be associated with completely different items; the value IDs could not be used to perform a join operation.

"These and other issues are addressed by embodiments of the present invention, individually and collectively."

As a supplement to the background information on this patent, VerticalNews correspondents also obtained the inventors' summary information for this patent: "In embodiments, a join operation between a first split data table and a second split data table includes receiving reduction data from each of first partitions of the first data table. In a second partition, actual values of a join attribute that occurs in the second partition which also occur in one of the first partitions is assigned a global ID. A globalized list for the second partition includes a Doc ID that identifies a data record in the second partition for which the actual value of the join attribute also occurs in one of the first partitions. The corresponding global ID is associated with that actual value. Each first partition receives a table of global IDs that are associated with actual values in the first partition. Each first partition creates a globalized list that includes a Doc ID identifying data records in the first partition for which the actual value of the join attribute is identified by a global ID in the received table. The join operation can then be performed using the globalized lists of the first partitions and the globalized lists of the second partitions.

"In an embodiment, a computer system can have stored therein executable program code configured to cause the computer system to perform the foregoing steps.

"In embodiments, the same global ID may be associated with an actual value that occurs in the second partition and in at least one or more of the first partitions.

"In embodiments, multiple occurrences of an actual value in the second partition are associated with the same global ID.

"The following detailed description and accompanying drawings provide a better understanding of the nature and advantages of the present invention."

For additional information on this patent, see: Peh, Thomas; Schwedes, Holger; Stephan, Wolfgang. Equi-Joins between Split Tables. U.S. Patent Number 8793287, filed May 27, 2011, and published online on July 29, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=8793287.PN.&OS=PN/8793287RS=PN/8793287

Keywords for this news article include: SAP AG, Information Technology, Information and Data Tabulation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC


For more stories covering the world of technology, please see HispanicBusiness' Tech Channel



Source: Information Technology Newsweekly


Story Tools






HispanicBusiness.com Facebook Linkedin Twitter RSS Feed Email Alerts & Newsletters