News Column

Patent Issued for Method for Multithreading an Application Using Partitioning to Allocate Work to Threads

July 22, 2014

By a News Reporter-Staff News Editor at Information Technology Newsweekly -- Oracle America, Inc. (Redwood City, CA) has been issued patent number 8776077, according to news reporting originating out of Alexandria, Virginia, by VerticalNews editors.

The patent's inventor is Neary, Paul (Greystones, IE).

This patent was filed on April 2, 2008 and was published online on July 8, 2014.

From the background information supplied by the inventors, news correspondents obtained the following quote: "Multithreaded programming permits concurrent execution of computational tasks to improve application performance. Thread synchronization methods such as semaphores, mutual exclusion locks, and readers/writer locks are generally used to guarantee the atomicity of operations on shared data and to provide a consistent view of memory across concurrently executing threads. Multithreaded programming generally employs a model for assigning computational tasks to threads. Conventional models include creating a thread for each task and a thread pool (a special case of which is the Boss/Worker model). Creating a thread per task may cause performance issues when the frequency of task creation is high and mean task duration is low. A thread pool typically incorporates some form of queue data structure to manage the work/resources assignable to each thread in the pool. This in turn requires some form of synchronization to prevent threads interfering with one another while accessing the queue data structure.

"In a thread pool model, a number of threads are created to perform a number of tasks which are usually organized in a queue referred to as a task queue. Typically there are many more tasks than threads. A thread requests the next task from the task queue upon completion of its current task. When all tasks have been completed (i.e., the task queue is empty), the threads can terminate or sleep until new tasks become available.

"Thread synchronization mechanisms may cause execution bottlenecks when multiple threads are blocked while attempting to access a protected data structure or code segment. In addition to the overhead (and reduced concurrency) introduced by the use of a synchronization mechanism to parallelize an application, there may be overhead associated with the Thread Pool assignment model employed. The management of work/resources through the use of a queue generally requires synchronization as well. For example, in the Boss/Worker model, a main (Boss) thread performs the task of finding the work (i.e., filling the queue) with the worker threads selecting and completing the work from the queue. Since all worker threads require access to a single queue, synchronized access is generally required to provide a consistent view of the queue data structure among all executing threads. This may result in a performance bottleneck when multiple worker threads are blocked while attempting to access the queue.

"Appendix I, which is incorporated herein by reference, shows the source code for matrix multiplication employing a conventional thread pool model of multithreading programming that incorporates the use of work/resource queues. Matrix A has dimensions (N, M) and matrix B has dimensions (M, K) and the results matrix C has dimensions (N, K). The worker threads process individual rows of A and individual columns of B resulting in individual elements of C. The total number of tasks that can be performed in parallel is N*K. It should be noted that matrix multiplication is intrinsically parallel in that the calculation of any of the tasks is independent of all the others. However, use of a queue to manage thread assignment typically reduces concurrency.

"When the matrix multiplication is performed, a mutual exclusion (mutex) lock is acquired to ensure that only one matrix multiplication is in progress. A mutex lock typically is used to synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. The mutex locks are statically initialized to zero before use. The main thread (the boss thread) checks whether its worker threads have been created. If not, it creates one for each CPU.

"Once the worker threads have been created, the boss thread sets up a counter of work to do and signals the workers with a condition variable. Each worker thread selects a row and column for the input matrices, then updates the row and column variables so that the next worker thread will get the next row or column or both.

"The mutex lock is then released so that computing the vector product can proceed in parallel. When the results are ready, a worker thread reacquires the mutex lock and updates the counter of work completed. The worker thread that completes the last bit of work signals the boss thread that the matrix multiplication is complete.

"Porting legacy code to utilize multithreading typically requires significant changes to the legacy code. As the code in Appendix I illustrates, the multithreaded code for matrix multiplication using a queue and locking mechanisms is very different from its single threaded counterpart. The multithreaded version typically involves porting of the single threaded counterpart to insert queue structures and locks to synchronize access to the queue structures. It should be noted that as the number of threads increases, contention for the queue increases due to the increased locking activity. This generally results in less concurrency, a result of using a non-parallel construct (e.g., a queue) to parallelize an application.

"Additionally, the number of threads created for the thread pool with a queue structure is a parameter that typically has to be tuned for best performance. The cost of having a larger thread pool is increased resource usage. Additionally, too many threads may hurt performance due to increased context switching overhead while too few threads may not fully utilize all the resources.

"What is needed is a method of assigning work to threads that does not require synchronization. What is further needed is a method that eliminates work queues to provide improved concurrency and increased application performance in a multithreaded programming environment."

Supplementing the background information on this patent, VerticalNews reporters also obtained the inventor's summary information for this patent: "One aspect of the present invention involves a method for assigning work to a plurality of threads. The method involves identifying a primitive data attribute and partitioning a work load into a plurality of partitions using the primitive data attribute. The method further involves assigning a first partition of the plurality of partitions to a first thread of the plurality of threads and assigning a second partition of the plurality of partitions to a second thread of the plurality of threads.

"Another aspect of the present invention involves a method for improving the concurrency of a multithreaded program. The method involves identifying a queue structure within a multithreaded program that stores a plurality of tasks to be performed by a plurality of threads. The method further involves determining a primitive data attribute that identifies each task of the plurality of tasks and selecting a partition function to assign each task to one of the plurality of threads. The method finally involves replacing the queue structure with the partition function.

"Yet, another aspect of the present invention involves a computer readable storage medium. Stored on the computer readable storage medium are computer instructions for partitioning a workload into a plurality of partitions that are executable in parallel and include one or more tasks, computer instructions for creating a plurality of threads, and computer instructions for allocating a first partition of the plurality of partitions to a first thread of the plurality of threads.

"Yet, another aspect of the present invention involves a computer system including a processor unit configured to run a plurality of threads of execution and a system memory coupled to the processor unit that stores a multithreaded program. The multithreaded program includes a workload partitioned into a plurality of partitions using a primitive data element. A first partition of the plurality of partitions is assigned to a first thread of the plurality of threads for execution."

For the URL and additional information on this patent, see: Neary, Paul. Method for Multithreading an Application Using Partitioning to Allocate Work to Threads. U.S. Patent Number 8776077, filed April 2, 2008, and published online on July 8, 2014. Patent URL:

Keywords for this news article include: Oracle America Inc, Information Technology, Information and Data Architecture.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC

For more stories covering the world of technology, please see HispanicBusiness' Tech Channel

Source: Information Technology Newsweekly

Story Tools Facebook Linkedin Twitter RSS Feed Email Alerts & Newsletters