News Column

Patent Issued for Techniques for Data Assignment from an External Distributed File System to a Database Management System

May 13, 2014

By a News Reporter-Staff News Editor at Information Technology Newsweekly -- From Alexandria, Virginia, VerticalNews journalists report that a patent by the inventors Qi, Yan (Fremont, CA); Xu, Yu (Burlingame, CA); Kostamaa, Olli Pekka (Santa Monica, CA); Wen, Jian (Irvine, CA), filed on December 29, 2011, was published online on April 29, 2014.

The patent's assignee for patent number 8713057 is Teradata US, Inc. (Dayton, OH).

News editors obtained the following quote from the background information supplied by the inventors: "After over two decades of electronic data automation and the improved ability for capturing data from a variety of communication channels and media, even the smallest of enterprises find that the enterprise is processing terabytes of data with regularity. Moreover, mining, analysis, and processing of that data have become extremely complex. The average consumer expects electronic transactions to occur flawlessly and with near instant speed. The enterprise that cannot meet expectations of the consumer is quickly out of business in today's highly competitive environment.

"Consumers have a plethora of choices for nearly every product and service, and enterprises can be created and up-and-running in the industry it mere days. The competition and the expectations are breathtaking from what existed just a few short years ago.

"The industry infrastructure and applications have generally answered the call providing virtualized data centers that give an enterprise an ever-present data center to run and process the enterprise's data. Applications and hardware to support an enterprise can be outsourced and available to the enterprise twenty-four hours a day, seven days a week, and three hundred sixty-five days a year.

"As a result, the most important asset of the enterprise has become its data. That is, information gathered about the enterprise's customers, competitors, products, services, financials, business processes, business assets, personnel, service providers, transactions, and the like.

"Updating, mining, analyzing, reporting, and accessing the enterprise information can still become problematic because of the sheer volume of this information and because often the information is dispersed over a variety of different file systems, databases, and applications.

"In response, the industry has recently embraced a data platform referred to as Apache Hadoop.TM. (Hadoop.TM.). Hadoop.TM. is an Open Source software architecture that supports data-intensive distributed applications. It enables applications to work with thousands of network nodes and petabytes (1000 terabytes) of data. Hadoop.TM. provides interoperability between disparate file systems, fault tolerance, and High Availability (HA) for data processing. The architecture is modular and expandable with the whole database development community supporting, enhancing, and dynamically growing the platform.

"However, because of Hadoop's.TM. success in the industry, enterprises now have or depend on a large volume of their data, which is stored external to their core in-house database management system (DBMS). This data can be in a variety of formats and types, such as: web logs; call details with customers; sensor data, Radio Frequency Identification (RFID) data; historical data maintained for government or industry compliance reasons; and the like. Enterprises have embraced Hadoop.TM. for data types such as the above referenced because Hadoop.TM. is scalable, cost efficient, and reliable.

"One challenge in integrating Hadoop.TM. architecture with an enterprise DBMS is efficiently assigning data blocks and managing workloads between nodes. That is, even when the same hardware platform is used to deploy some aspects of Hadoop and a DBMS the resulting performance of such a hybrid system can be poor because of how the data is distributed and how workloads are processed."

As a supplement to the background information on this patent, VerticalNews correspondents also obtained the inventors' summary information for this patent: "In various embodiments, techniques for data assignment from an external distributed file system (DFS) to a DBMS are presented. According to an embodiment, a method for data assignment from an external DFS to a DBMS is provided.

"Specifically, an initial assignment for first nodes to second nodes is received in a bipartite graph. The first nodes represent data blocks in an external distributed file system and the second nodes represent access module processors of a database management system (DBMS). A residual graph is constructed with a negative cycle having the initial assignment. The residual graph is processed through iterations, with each of which the initial assignment is adjusted to eliminate negative cycles. Finally, a final assignment is achieved by removing all negative cycles of the residual graph, for each of the data blocks to one of the access module processors as an assignment flow."

For additional information on this patent, see: Qi, Yan; Xu, Yu; Kostamaa, Olli Pekka; Wen, Jian. Techniques for Data Assignment from an External Distributed File System to a Database Management System. U.S. Patent Number 8713057, filed December 29, 2011, and published online on April 29, 2014. Patent URL:

Keywords for this news article include: Teradata US Inc, Information Technology, Information and Data Management.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC

Source: Information Technology Newsweekly