The patent's assignee is
News editors obtained the following quote from the background information supplied by the inventors: "Map-reduce is a data processing system that supports distributed processing of an input across a cluster of computing nodes. Traditionally, the map step of the map-reduce data processing system involves a root level computing node dividing an input into smaller sub-problems, and distributing the smaller sub-problems to a cluster of lower level computing nodes for processing. Each lower level computing node may process a corresponding sub-problem and return a solution to the root level computing node. In the reduce step of the map-reduce data processing system, the root level computing node may collect the solutions from the lower level computing nodes and combine the solutions to form an output that is the solution to the input. In some instances, there may be multiple lower levels of computing nodes.
"However, current map-reduce data processing systems are no longer limited to performing map operations and then reduce operations. Instead, current map-reduce data processing system use logical graphs to process operations. As a result, current map-reduce data processing systems are highly dependent on an initial selection of an appropriate execution plan. The selection of an appropriate execution plan may include the selection of properties for the map-reduce code used, properties of the data to be processed, and properties related to the interactions of the data and the code. However, such properties may be difficult to estimate due to the highly distributed nature of map-reduce data processing systems and the fact that the systems offer users the ability to specify arbitrary code as operations on the data. As a result, today's map-reduce data processing systems may use fixed apriori estimates to choose execution plans. In many instances, the use of such fixed a prior estimates may result in poor data processing performance."
As a supplement to the background information on this patent application, VerticalNews correspondents also obtained the inventors' summary information for this patent application: "Described herein are techniques for using statistical data collected during the execution of tasks for a job by a distributed data parallel computation system to optimize the performance of the tasks or similar recurring tasks of another job. In various embodiments, the distributed data parallel computation system may be a map-reduce data processing system that executes tasks on multiple computing devices. The jobs that are executed by the map-reduce data processing system are specified by execution plans. An execution plan may designate tasks that include operations to be performed in a distributed fashion by the multiple computing devices of the map-reduce data processing system. For example, the execution plan may encode, among other things, the sequence in which operations are to be performed, the partition of data in a data table, the degree of parallelism for each operation, and the implementations of the operations.
"The statistics that are collected during the execution of a job may include statistics about resource usage, computation cost, and cardinality. Such statistics may be used by an optimizer of the map-reduce data processing system to optimize an execution sequence for a set of operations, a degree of parallelism for the execution of operations, the number of partitions to use for the execution of operations, particular operations to fit within a single task, and/or other optimizations. The optimization may result in an optimized execution plan that uses less computation resources for the job or a similar recurring job. Such an optimized execution plan may be superior to an optimized execution plan for the same job that is produced using solely fixed apriori estimates.
"In at least one embodiment, an execution plan for a job is initially generated, in which the execution plan includes tasks. Statistics regarding operations performed in the tasks are collected while the tasks are executed via parallel distributed execution. Another execution plan is then generated for another recurring job, in which the additional execution plan has at least one task in common with the execution plan for the job. The additional execution plan is subsequently optimized based on the statistics to produce an optimized execution plan.
"This Summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
"The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference number in different figures indicates similar or identical items.
"FIG. 1 is a block diagram that illustrates an example map-reduce data processing system that uses collected statistical data to optimize data parallel task executions.
"FIG. 2 is an illustrative diagram that shows example components of a computing device that implements a job manager for collecting statistical data on task performance.
"FIG. 3 is an illustrative diagram that shows implementation scenarios for collecting statistical data and using the collected statistical data to optimize data parallel computations.
"FIG. 4 is a flow diagram that illustrates an example process for using collected statistics from an execution of a current job to optimize the execution of a similar recurring job.
"FIG. 5 is a flow diagram that illustrates an example process for using statistics collected during an execution of a job to perform on-the-fly optimization of the operations of the job.
"FIG. 6 is a flow diagram that illustrates an example process for collecting statistics regarding a set of operations that are executed according to an execution plan for a job."
For additional information on this patent application, see: Bruno, Nicolas; Zhou, Jingren; Kandula, Srikanth;
Keywords for this news article include:
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC