Researchers Submit Patent Application, "Extract-Transform-Load Processor Controller", for Approval

July 15, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- From Washington, D.C., VerticalNews journalists report that a patent application by the inventors Leonard D. Greenwood, Arron J. Harden, and Julian J. Vizor (all of Milton Keynes, GB), filed on October 23, 2013, was made available online on July 3, 2014.

The assignee of the patent application is International Business Machines Corporation.

News editors obtained the following quote from the background information supplied by the inventors: "The present invention relates to the field of controllers and methods for controlling Extract-Transform-Load (ETL) processors.

"In the field of very large data storage repositories, such as data warehouses, there is frequently a need to take data from a plurality of sources, often under the control of heterogeneous data storage systems, and to aggregate the data in such a way as to make it capable of coherent processing. The need for aggregation of data from such a plurality of data sources has given rise to a number of systems designed to perform the tasks of extracting, transforming and loading the data.

"Before a repository of data can be effectively used as a source of truly usable information, it is usually created or updated using many sources. Most often, the data that is accumulated (and later used for update of the repository) is of a different format residing on an external system than what is ultimately needed in the repository. The process of acquiring this data and converting it into useful, compatible and accurate data is often labelled ETL (for Extraction, Transformation, and Load).

"Extraction is the task of acquiring the data (in whatever format might be possible) from the source systems. This can be as simple as dumping a flat file from a database or spreadsheet, or as sophisticated as setting up relationships with external systems that then supervise the transportation of data to the target system.

"Transformation is often more than just converting data formats (although this is a critical step in getting the data to the target system). Data from external systems may contain incompatible or incorrect information, depending on the checks and balances that were in effect on the external system. Part of the transformation step is to 'cleanse' or 'reject' the data that does not conform. Common techniques used as part of this step include character examination (for example, reject numeric value fields that contain characters) and range checking (reject values outside of an acceptable range). Rejected records are usually deposited in a separate file and are then processed by a more sophisticated tool or manually to correct the problems. The values are then rolled into the transformed set.
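The cleansing step described in the passage above can be illustrated with a minimal Python sketch. It is not taken from the patent application: the function name `cleanse`, the field names, and the reject reasons are hypothetical, and the numeric pattern is only one possible definition of "character examination".

```python
import re

# Hypothetical pattern: an optional sign, digits, and an optional decimal part.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")

def cleanse(records, field, low, high):
    """Split records into accepted and rejected sets for one numeric field."""
    accepted, rejected = [], []
    for rec in records:
        raw = str(rec.get(field, "")).strip()
        # Character examination: reject numeric fields that contain other characters.
        if not NUMERIC.fullmatch(raw):
            rejected.append((rec, "non-numeric"))
            continue
        value = float(raw)
        # Range checking: reject values outside the acceptable range.
        if not (low <= value <= high):
            rejected.append((rec, "out-of-range"))
            continue
        accepted.append({**rec, field: value})
    return accepted, rejected

records = [{"id": 1, "age": "34"}, {"id": 2, "age": "3x4"}, {"id": 3, "age": "240"}]
accepted, rejected = cleanse(records, "age", 0, 130)
```

In this sketch the rejected list plays the role of the separate reject file: each entry keeps the original record and a reason, so it can be corrected later and rolled back into the transformed set.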

"Load is the stage in which the captured and transformed data is deposited into the new data store (warehouse, mart, etc.). For SQL-compatible database systems, this process can be accomplished with SQL commands (IMPORT), utilities (LOAD), or integrated tools. Additionally, the total ETL process can be accomplished via third party applications, often decreasing or eliminating the need for custom programming. The ETL process can be as simple as transferring some data from one table to another on the same system. It can also be as complex as taking data from an entirely different system that is thousands of miles away and rearranging and reformatting it to fit a very different system.
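As a rough illustration of the load stage, the sketch below deposits already-transformed rows into a target table using plain SQL. It assumes Python's built-in `sqlite3` module as a stand-in for the warehouse; the table name `warehouse_sales` and its columns are hypothetical, and the IMPORT and LOAD utilities named in the quote are not shown.

```python
import sqlite3

def load(rows, conn):
    """Insert transformed (id, amount) rows into the target table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse_sales "
        "(id INTEGER PRIMARY KEY, amount REAL)"
    )
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO warehouse_sales (id, amount) VALUES (?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]

conn = sqlite3.connect(":memory:")
total = load([(1, 9.5), (2, 20.0)], conn)
```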

"At its very simplest level an ETL (Extract Transform Load) job is a process that reads data from one source (such as a database), transforms it (for example, remove trailing spaces), and finally writes it to a target (such as a file). An ETL job design consists of one or more stages, each stage performing a discrete function such as read from database, sort data, merge data etc. The data read from, or written to, stages may be represented as links that join the stages together. ETL job designs can vary from the simple, consisting of only a handful of stages, to the complex where the number of stages can exceed one hundred.
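The simplest job described above (read, remove trailing spaces, write) can be sketched as three stages joined by links. This is an illustrative Python sketch only: the stage function names are hypothetical, generators stand in for the links, and an in-memory list and buffer stand in for the database source and the file target.

```python
import io

def read_stage(source):
    """Source stage: emit rows one at a time (stands in for a database read)."""
    yield from source

def trim_stage(rows):
    """Transform stage: remove trailing spaces from each row."""
    for row in rows:
        yield row.rstrip()

def write_stage(rows, target):
    """Target stage: write each row on its own line and return the row count."""
    count = 0
    for row in rows:
        target.write(row + "\n")
        count += 1
    return count

def run_job(source, target):
    # Each nested call is a link that joins one stage's output to the next stage's input.
    return write_stage(trim_stage(read_stage(source)), target)

buf = io.StringIO()
written = run_job(["alice   ", "bob "], buf)
```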

"An ETL job design is typically constructed by the user (an 'ETL developer') dragging and dropping stages onto a graphical canvas and then linking their inputs and outputs together. The stages chosen, the way they are joined together, and the values of properties set will together satisfy the high level requirements for that job. Currently ETL developers need to be extremely knowledgeable about the ETL application and know exactly what stages they should use to achieve this requirement. This becomes a barrier for customers who want to get their developers up and running quickly with their ETL applications. Even for developers who are proficient with the application, it can be hard to remember exactly what stage can be linked to other stages and in what circumstances such links are recommended or not. Such barriers to learning add cost to the process and introduce potentially significant opportunities for human error."

As a supplement to the background information on this patent application, VerticalNews correspondents also obtained the inventors' summary information for this patent application: "In one embodiment of the present invention, a controller is coupled to an Extract-Transform-Load (ETL) processor, which is connected to one or more first data storage devices and adapted to render contextual assistance to a user on a display device. The controller comprises: a hardware storage device; a storage control component for storing, on said hardware storage device, a set of criteria for preferredness of ETL stage placements; an I/O input device detecting component for recognizing a proposed placement of a stage on a GUI canvas on the display device; an analytical component for analyzing an eventual result of the proposed placement in an ETL activity represented on the GUI canvas; a comparator for comparing the eventual result of the proposed placement in the ETL activity with the set of criteria; and an indicator control component for, responsive to an outcome of an operation of the comparator, providing to the user an indicator of a degree of preferredness of said proposed placement according to the set of criteria.

"In one embodiment of the present invention, a method and/or computer program product operates a controller for an Extract-Transform-Load (ETL) processor connected to one or more first data storage devices and adapted to render contextual assistance to a user on a display device. A storage control component on a hardware storage device stores a set of criteria for preferredness of ETL stage placements, wherein the storage control component is implemented by one or more processors. An I/O input device detecting component recognizes a proposed placement of a stage on a GUI canvas on the display device, wherein the I/O input device detecting component is implemented by one or more processors. An analytical component analyzes an eventual result of the proposed placement in an ETL activity represented on the GUI canvas, wherein the analytical component is implemented by one or more processors. A comparator compares the eventual result of the proposed placement in the ETL activity with the set of criteria, wherein the comparator is implemented by one or more processors. An indicator control component, responsive to an outcome of an operation of the comparator, provides to the user an indicator of a degree of preferredness of said proposed placement according to the set of criteria, wherein the indicator control component is implemented by one or more processors.
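The flow summarized above (stored criteria, analysis of a proposed placement, comparison, and an indicator shown to the user) might be sketched as follows. Everything in this Python sketch is an assumption made for illustration: the application as quoted does not specify what the criteria look like, so the stage names, the numeric scores, the thresholds, and the three indicator labels are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    upstream: str   # type of the stage already on the canvas
    proposed: str   # type of the stage the user is about to drop after it

# Hypothetical stored criteria: a preferredness score for each stage pairing.
CRITERIA = {
    ("database_read", "sort"): 1.0,
    ("sort", "merge"): 1.0,
    ("merge", "sort"): 0.5,
    ("file_write", "sort"): 0.0,
}

def analyse(placement):
    """Analytical component: derive the eventual result of the placement (here, the link it creates)."""
    return (placement.upstream, placement.proposed)

def compare(result, criteria):
    """Comparator: look up the result in the criteria; unknown pairings get a neutral score."""
    return criteria.get(result, 0.5)

def indicator(score):
    """Indicator control: map a score to a degree of preferredness for display."""
    if score >= 0.8:
        return "preferred"
    if score >= 0.4:
        return "acceptable"
    return "discouraged"

def assess(placement, criteria=CRITERIA):
    return indicator(compare(analyse(placement), criteria))

hint = assess(Placement("database_read", "sort"))
```

A graphical tool built along these lines could call `assess` while the user drags a stage over the canvas and color the drop target according to the returned label.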

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

"An illustrative embodiment of the present invention will now be described by way of example only, with reference to the accompanying drawings, in which:

"FIG. 1 shows a controller arrangement according to one embodiment of the present invention;

"FIG. 2 shows a method of operation of a controller arrangement according to one embodiment of the present invention;

"FIG. 3 shows a simple exemplary operation of an embodiment of the present invention; and

"FIG. 4 shows a further simple exemplary operation of an embodiment of the present invention."

For additional information on this patent application, see: GREENWOOD, LEONARD D.; HARDEN, ARRON J.; VIZOR, JULIAN J. Extract-Transform-Load Processor Controller. Filed October 23, 2013 and posted July 3, 2014. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=1032&p=21&f=G&l=50&d=PG01&S1=20140626.PD.&OS=PD/20140626&RS=PD/20140626

Keywords for this news article include: Information Technology, Information and Data Storage, International Business Machines Corporation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Information Technology Newsweekly

