"System for Multi-Store Analytics Execution Environments with Storage Constraints" in Patent Application Approval Process

August 12, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- A patent application by the inventors Hacigumus, Vahit Hakan (San Jose, CA); Sankaranarayanan, Jagan (Santa Clara, CA); LeFevre, Jeffrey Paul (Santa Cruz, CA); Tatemura, Junichi (Cupertino, CA); Polyzotis, Neoklis (Santa Cruz, CA), filed on November 6, 2013, was made available online on July 31, 2014, according to news reporting originating from Washington, D.C., by VerticalNews correspondents.

This patent application is assigned to NEC Laboratories America, Inc.

The following quote was obtained by the news editors from the background information supplied by the inventors: "The present system relates to Multi-store Analytics Execution Environments with Storage Constraints.

"A database stores information as records, and records are stored within data pages on disk. The physical design of a database refers to the configurable physical data layout and auxiliary data structures, such as indexes and materialized views. The physical design can greatly affect database processing speed, and designs are tuned to improve query performance. The physical design tuning problem can be stated informally as: 'Given a workload W and storage budget b, obtain a physical design that minimizes the cost to evaluate W and fits within b'. W is a workload that may contain query and update statements. The storage budget is the storage space allotted for the physical design. Physical designs can include secondary data structures to enable faster data access such as indexes and materialized views, and may also include data partitioning strategies that affect the physical data layout. Commercial tools that automate this process and recommend beneficial physical designs exist in major DBMSs, such as IBM DB2's Design Advisor and the MS SQL Index Tuning Wizard.
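As an illustration outside the quoted text, the informal problem statement above can be sketched as a budgeted selection. The sketch below is a simple greedy heuristic, not the method of the application or of any commercial advisor. The `Candidate` class, the candidate names, and all sizes and benefit figures are hypothetical, and the sketch assumes each structure's benefit is independent of the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical auxiliary structure (index or materialized view)."""
    name: str
    size: float     # storage the structure occupies, same unit as the budget
    benefit: float  # estimated reduction in the cost of evaluating W


def greedy_design(candidates, budget):
    """Pick candidates by benefit per unit of storage while they fit in the budget."""
    chosen, used = [], 0.0
    ranked = sorted(candidates, key=lambda c: c.benefit / c.size, reverse=True)
    for c in ranked:
        # Skip anything that does not help or would exceed the storage budget b.
        if c.benefit > 0 and used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen, used


# Illustrative numbers only (assumed, not taken from the application).
candidates = [
    Candidate("idx_orders_date", size=4.0, benefit=40.0),
    Candidate("mv_daily_sales", size=10.0, benefit=60.0),
    Candidate("idx_customer_id", size=3.0, benefit=15.0),
]
design, used = greedy_design(candidates, budget=8.0)
print([c.name for c in design], used)
```

A greedy ratio ordering is easy to follow but can miss the best combination; it is used here only to make the roles of W's cost, the candidate structures, and the budget b concrete.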

"In 'Optimizing analytic data flows for multiple execution engines', a single data flow is optimized across multiple execution engines by utilizing their unique performance capabilities to reduce total execution time of the flow. Input data is 'pinned' to a store, i.e., where it currently resides, and output data is pinned to the DW as a reporting requirement. They consider data shipping and function shipping, and these decisions are affected by the availability of data and functions on each of the engines. Data movement cost is modeled as the network cost. To solve the problem they model it as a state space using a binary-valued matrix. Each row represents an operation in a data flow (in sequence) and each column represents an execution engine. A '1' indicates that an operation is available on an engine, and a '0' that it is not. They enumerate all valid paths top-down through the matrix. When creating a path, moves may go straight down or diagonally, but may not pass through a '0' cell. After enumerating all valid paths, they prune extensively to eliminate non-preferred paths, based on their desired heuristics. Heuristics include 1) prefer a certain engine (favor bit) for a flow, 2) disallow too many data movements between engines, 3) prefer engines that have more of the functions required for a data flow (considering functionality is not identical on all stores), and several other heuristics. They also require all flows to terminate in the DW. However, this approach does not provide a direct solution for our problem because it only considers optimizing for a single flow (query). Our proposed work will optimize for a workload of queries. We will leverage multiple engines by allowing opportunistic views to be moved between stores during query processing if moving the view will reduce total workload cost. Since their solution considers only a single query, it can be sub-optimal when considering multiple queries. A straightforward example of why their solution can be sub-optimal is the following. Suppose that, when optimizing a single query, a decision is made not to move a materialized view from engine A to engine B because the cost of movement is too high for that single query to benefit. However, if all queries in the workload had been considered, moving the view could result in great benefit to other queries, potentially outweighing the data movement cost."
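As an illustration outside the quoted text, the matrix-based state space described above might be enumerated as follows. This is a minimal sketch under stated assumptions: "diagonal" is read as a shift of exactly one column between consecutive rows, a path may start at any available cell in the first row, and the only pruning rule shown is a cap on data movements. The function names and the example matrix are hypothetical and are not taken from the cited work.

```python
def enumerate_paths(available):
    """Return every valid top-down path as a list of engine (column) indexes.

    available[r][c] is 1 if operation r can run on engine c, else 0.
    """
    if not available:
        return []
    n_engines = len(available[0])
    paths = [[c] for c in range(n_engines) if available[0][c] == 1]
    for row in available[1:]:
        extended = []
        for path in paths:
            last = path[-1]
            # Straight down keeps the engine; diagonal shifts one column.
            for c in (last - 1, last, last + 1):
                if 0 <= c < n_engines and row[c] == 1:
                    extended.append(path + [c])
        paths = extended
    return paths


def movements(path):
    """Count engine changes along a path (one data movement per change)."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)


# Three operations (rows) over two engines (columns); values are illustrative.
matrix = [
    [1, 0],
    [1, 1],
    [1, 1],
]
paths = enumerate_paths(matrix)
# Example pruning heuristic: drop paths with more than one data movement.
kept = [p for p in paths if movements(p) <= 1]
print(paths)
print(kept)
```

The other heuristics mentioned in the passage (a favored engine, function coverage, termination in the DW) would be further filters over the same list of paths.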

In addition to the background information obtained for this patent application, VerticalNews journalists also obtained the inventors' summary information for this patent application: "Systems and methods are disclosed for managing a multi-store execution environment by applying opportunistic materialized views to improve workload performance and executing a plan on multiple database engines to increase query processing speed by leveraging unique capabilities of each engine by enabling stages of a query to execute on multiple engines, and by moving materialized views across engines.

"Advantages of the preferred embodiment may include one or more of the following. The system can move any view to any engine without requiring a very large search space for engine designs. A choice to move an individual view affects the benefit of all future queries in a sequence. Furthermore, since views can have interactions, these benefits are determined in light of other views. The space of views to move is not fixed but is always changing, as it depends upon the opportunistic views created by the last plan to execute. The multi-store execution problem makes the correct view movement choices during execution of the workload sequence. An optimal solution may be impractical due to the exponential space. Because a globally optimal solution may not be practical, we can use a local solution.
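As an illustration outside the quoted text, one way to picture a local decision of this kind is a greedy step taken after each plan executes. The sketch below is an assumption for illustration only and is not the procedure claimed in the application: `choose_moves`, the view names, and every number are hypothetical, and interactions between views, which the summary says do matter, are deliberately ignored.

```python
def choose_moves(new_views, free_space):
    """Greedy local step: after a plan runs, pick which of its views to move.

    new_views: list of (name, size, move_cost, future_benefits) tuples, where
    future_benefits holds the estimated saving for each remaining query.
    """
    scored = []
    for name, size, move_cost, future_benefits in new_views:
        # Judge the move against the whole remaining sequence, not one query.
        net = sum(future_benefits) - move_cost
        if net > 0:
            scored.append((net / size, name, size))
    moved = []
    # Highest net benefit per unit of storage first, while space remains.
    for _, name, size in sorted(scored, reverse=True):
        if size <= free_space:
            moved.append(name)
            free_space -= size
    return moved


# Hypothetical opportunistic views produced by the last plan.
views = [
    ("v1", 5.0, 50.0, [20.0, 15.0, 25.0, 10.0]),  # no single query recoups 50
    ("v2", 4.0, 30.0, [10.0, 5.0]),               # never pays off
    ("v3", 6.0, 10.0, [30.0]),                    # pays off, but may not fit
]
moved = choose_moves(views, free_space=8.0)
print(moved)
```

View `v1` shows the workload-level effect: no individual query saves more than the movement cost of 50, yet the summed saving of 70 does, so the move is worthwhile once the rest of the sequence is taken into account.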

"BRIEF DESCRIPTION OF THE DRAWINGS

"FIG. 1 shows an exemplary multi-store environment with multiple engines.

"FIGS. 2-4 show exemplary processes supporting Multi-store Analytics Execution Environments with Storage Constraints.

"FIG. 5 shows an exemplary computer for the processes of FIGS. 2-4."

For the URL and more information on this patent application, see: Hacigumus, Vahit Hakan; Sankaranarayanan, Jagan; LeFevre, Jeffrey Paul; Tatemura, Junichi; Polyzotis, Neoklis. System for Multi-Store Analytics Execution Environments with Storage Constraints. Filed November 6, 2013 and posted July 31, 2014. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=720&p=15&f=G&l=50&d=PG01&S1=20140724.PD.&OS=PD/20140724&RS=PD/20140724

Keywords for this news article include: Information Technology, NEC Laboratories America Inc, Information and Data Architecture.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Information Technology Newsweekly

