Patent number 8655877 is assigned to
The following quote was obtained by the news editors from the background information supplied by the inventors: "In order to handle an ever growing amount of information, database management systems have recently been implemented on what are known as distributed or parallel architectures. One form of distributed architecture is the shared-nothing architecture. This architecture is called shared-nothing because the architecture does not include a single point of failure, such as a shared disk or shared memory.
"A typical shared nothing system includes a plurality of database partitions coupled together by a network. The database partitions generally include a set of storage devices, such as disk drives, and one or more processors. The network may comprise various types of networks, such as a local area network, and physically diverse paths. Data in a shared nothing system is partitioned using a partitioning scheme such as a hash partitioning scheme and stored across the various database partitions that are connected together through the network. Thus, each database partition keeps a part of the overall database and various operations on parts of the database in different database partitions may be performed in parallel.
"Data in the database is kept as tuples in relations that comprises of logically grouped attributes. Tuples and attributes in relations are also referred to as rows and columns in tables. An important feature of any database system, including shared-nothing systems, is the ability to process operations between rows in tables, such as relating data in one table to data in another table. These types of operations are particularly useful when querying related data from multiple tables such that the data needs to be retrieved in a meaningful way. For example, structured query language ('SQL') supports what are known as join operations for querying data from multiple tables. Indeed, many consider the join operation as the cornerstone of database management systems.
"A join operation may be expensive to perform in that it may require a large amount of processing and communication resources. One form of join operation in a distributed database system is the semi-join operation. A semi-join operation involves two tables R and S that are stored on different sites within a distributed system. A semi-join from R to S is performed by first projecting R on the join attributes at the site of R. The join attributes typically include one or more keys for selecting various tuples from tables R and S. Site R sends the resulting projection to the site of S, where a join operation is performed. The projection of R essentially acts as a filter of S, and thus, the result of the semi-join is usually smaller than S. Site S then sends to site R the semi-join result to complete the join operation. Hence, semi-joins can require creating and sending projections of one of the tables involved in the join across a network when the tables involved in the join are stored at different sites. This could be expensive and hence techniques to reduce semi-join processing are needed.
"Using bit filters is one technique used to reduce the amount of communication during the execution of semi-joins. For example, bit filters can be used to reduce the size of the projection of the joining keys, and thus, reduce the amount of data communication needed. A bit filter operates by hashing the projection data into a set of bits. This set of bits serves as a compressed representation of the hashed data. Bit filters are also known as Bloom filters or Hash filters by those skilled in the art. Prior art that looked at semi-joins in the context of distributed systems essentially dealt with joining tables R and S located at two sites where each site contained a single table.
"In general, data in tables in a partitioned environment may be distributed over multiple database partitions. One way to improve the performance of a database system is to divide the join operation or query so that each database partition can work in parallel with little inter-node communication. If data within two tables is partitioned in a similar way, a collocated join would be the most efficient join method. This can be done if the join attributes are the same as the partitioning attributes of the tables. No data would need to be moved to do the join. More often than not, this match of the partitioning keys and the join attributes is not possible and tuples of one or both tables need to be moved across the database partitions in order to perform the join operation. To minimize movement of tuples, bit filters might be used. In known systems, bit filters are broadcast between the various database partitions during various stages of a database operation. However, broadcasting bit filters themselves can consume a large amount of communication resources of the system. For example, when a particular database partition of one table is performing operations for a join, it must broadcast its updated bit filter to all the database partitions of the other table participating in the join. Some systems might use compressed bit filters to improve their efficiency. Unfortunately, even in a compressed form, bit filters can become a tremendous burden particularly when there are a large number of database partitions involved. Each database partition associated with one table requires enough memory space to store all the bit filters of all the other table's database partitions. In addition, database partitions may update their data frequently, and thus, may frequently need to broadcast/re-broadcast their bit filters. This can consume significant communication resources in the database system.
"Therefore, it would be desirable to provide methods and systems that efficiently uses processing and communications resources when performing database operations. It would also be desirable, among other things, to provide methods and systems that efficiently use memory and data communications resources to perform various types of join operations across database partitions."
In addition to the background information obtained for this patent, VerticalNews journalists also obtained the inventors' summary information for this patent: "In an aspect of the present invention, there is provided a data processing system for joining rows associated with a column of a source table with rows associated with a column of a target table, the data processing system including a source node containing the source table and including a target node containing the target table, the data processing system including a generating module for generating a reduced representation of selected rows associated with the column of the source table, and generating a representation of the column of the target table, a filtering module for filtering the generated reduced representation of selected rows associated with the column of the source table through the generated representation of the column of the target table, the filtered generated reduced representation of selected rows identifying source table rows that do not have to be joined with the target table, and a joining module for joining, to the rows associated with the column of the target table, the rows associated with the column of the source table minus the filtered generated reduced representation of selected rows.
"In another aspect of the present invention, there is provided an article of manufacture for directing a data processing system to join rows associated with a column of a source table with rows associated with a column of a target table, the data processing system including a source node containing the source table and including a target node containing the target table, the article of manufacture including a program usable medium embodying one or more instructions executable by the data processing system, the one or more instructions including data processing system executable instructions for generating a reduced representation of selected rows associated with the column of the source table, and generating a representation of the column of the target table, data processing system executable instructions for filtering the generated reduced representation of selected rows associated with the column of the source table through the generated representation of the column of the target table, the filtered generated reduced representation of selected rows identifying source table rows that do not have to be joined with the target table, and data processing system executable instructions for joining, to the rows associated with the column of the target table, the rows associated with the column of the source table minus the filtered generated reduced representation of selected rows.
"Additional features of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The features of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
"It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed."
URL and more information on this patent, see: Lariba-Pey,
Keywords for this news article include: Information Technology, Information and Data Management, Information and Data Processing,
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC
Most Popular Stories
- Apple, HP, Intel May Take a Hit from Slowdown in Smartphone Sales Growth
- FDIC Files Lawsuit on Behalf of Banks Allegedly Hurt by Libor Scandal
- Some California Cities Seeking Water Independence
- Motley Crue's Nikki Sixx Marries Model Courtney Bingham
- Chinese e-Commerce Giant Alibaba Gears for IPO in U.S.
- Will Missing Malaysian Jet Prompt Aviation System Change?
- Jack Daniel's Resists Changes to Tenn. Whiskey Law