News Column

Researchers Submit Patent Application, "Systems and Methods for Multi-Source Data-Warehousing", for Approval

August 19, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- From Washington, D.C., VerticalNews journalists report that a patent application by the inventor Guerra, Joseph (Cheshire, CT), filed on December 27, 2013, was made available online on August 7, 2014.

No assignee for this patent application has been made.

News editors obtained the following quote from the background information supplied by the inventors: "Data warehouses provide systems for storing and organizing data that organizations use to plan and conduct business operations, for example. Data is organized using extraction, transform and load (ETL) operations to enable use of computer systems to access data for specific organizational needs. However, as the amount and complexity of data increases, existing tools are inadequate to provide access to the types of data that businesses need to conduct operations at the pace that is now required. Unfortunately, existing data warehouses are not a panacea for all business needs. Particularly, many warehouses are inefficient in their implementation and perform conventional operations in a manner which may render the system impractical for dealing with large datasets in a timely manner. There exists a need for novel systems and methods to improve data warehousing operations and to better coordinate data organization for analysis, input, and retrieval."

As a supplement to the background information on this patent application, VerticalNews correspondents also obtained the inventor's summary information for this patent application: "Data warehouses typically maintain a copy of information from source transaction systems. This architecture provides the opportunity to perform a variety of functions. For example, the warehouse may be used to maintain data history, even if the source transaction systems do not maintain a history. The warehouse can also integrate data from multiple source systems, enabling a central view across the enterprise. This is particularly valuable when the organization has grown by one or more mergers, for example. A warehouse can also restructure the data to deliver excellent query performance, even for complex analytic queries, without impacting the transactional database systems. A warehouse may also present the organization's information in a consistent manner and restructure the data so that it makes sense to the business users. A warehouse may provide a single common data model for all data of interest regardless of the data's source.

"Different data sources typically have different characteristics requiring different processes to perform data formatting and transfer into different data warehouses. Many organizations or entities (e.g. businesses, governmental organizations, non-profit entities) utilize two or more data sources to generate reports or facilitate decision making. However, such entities typically experience difficulties in accessing and analyzing data from these different sources. Preferred embodiments of the invention utilize different data transfer processes, often referred to as ETL operations, to enable the organization to manage the movement of data from a plurality of sources into a data warehouse. The ETL system is configured to provide for the loading of data from a plurality of sources having different characteristics into a data storage system. The ETL system can utilize a plurality of stages in order to organize data into the required format to achieve reporting of information from a single storage platform so that data from different sources can be retrieved and reported in a single reporting sequence. In a preferred embodiment, a plurality of ETL processes serve to load data from a corresponding plurality of sources into a corresponding plurality of intermediate storage devices referred to herein as repositories. A second plurality of ETL processes can then extract data from the repositories, and transform and load the data into a single data warehouse. The second stage ETL process can be associated with a single source, or a plurality of sources. The different sources, ETL system elements and storage devices can utilize separate servers that are connected by a communication network to facilitate data transfer and storage. System operation can be managed by one or more data processors to provide automated control of data management operations.

"In this manner the warehouse adds value to operational business applications. The warehouse may be built around a carefully designed data model that transforms production data from a high speed data entry design to one that supports high speed retrieval. This improves data quality, by providing consistent codes and descriptions, and possibly flagging bad data. A preferred embodiment of the invention uses a derived surrogate key in which an identifier is formed from field entrees in the source table in which transaction data has been positioned. Different combinations of fields can be employed to generate derived surrogate keys depending on the nature of the data and the fields in use for a given data warehouse. It is generally preferred to use a specific combination of fields, or a specific formula, to form the derived surrogate keys for a particular data warehouse. This provides for data consistency and accuracy, and avoids the look-up operations commonly used in generating surrogate keys in existing data warehouses. Preferred embodiments of the invention utilize the derived surrogate key methodology to provide faster access to more complex data systems, such as the merger of disparate source data into a single warehouse.

"A preferred embodiment of the invention uses the advantages provided by the derived surrogate key methodology in a hierarchical structure that uses a hierarchy table with a plurality of customer dimensions associated with a plurality of levels of an interim table. As hierarchy reporting requirements change it is no longer necessary to alter the dimension of the hierarchy table, as the interim table can be altered to provide for changed reporting requirements. Thus, a preferred method of the invention includes altering the interim table to provide for a change in reporting without the need for changing of each dimension. A preferred embodiment includes altering a rolling format which can include, for example, resetting the offset distance to identify which level in an interim table is used to retrieve the appropriate data. Thus, preferred methods involve setting the parameters such as the number of levels to be traversed in order to populate the interim table with an ETL tool. The interim table is then connected to the fact table and the dimension table to enable the generation of reports. The interim table can comprise a plurality of rows and a plurality of columns to provide a multidimensional array of fields in which keys are stored. Various dimensions of this key table can be extended to accommodate different reporting formats or the addition of additional data sources. A preferred embodiment operates to populate the fields of this key table with derived surrogate keys associated with each distinct data source, for example. This system can operate as an in-memory system with a cloud computing capability to support real time data management and analysis functions.

BRIEF DESCRIPTION OF THE DRAWINGS

"FIG. 1 is a high level representation of a data warehouse design used in certain embodiments, including a source system feeding the data warehouse and being utilized by a business intelligence (BI) toolset, according to an example embodiment.

"FIG. 2A is an exemplary computing device which may be programmed and/or configured to implement certain processes described in relation to various embodiments of the present disclosure, according to an example embodiment.

"FIG. 2B illustrates a networked communication system for performing multi-source data warehousing operations.

"FIG. 3 illustrates an example database topology for pulling data from multiple data sources using an Extract, Transform, and Load (ETL) software tool, according to an example embodiment.

"FIG. 4 illustrates an example of a database topology for creating a separate Central Repository (CR) for each of the separate data sources that uses a separately maintained ETL process, according to a preferred embodiment.

"FIG. 5 illustrates an example of the separate business subjects (data marts) that may be included in the data warehouse, according to an example embodiment.

"FIG. 6 illustrates an Accounts Receivable (AR) business subject (data mart) that may be included in the data warehouse, according to an example embodiment.

"FIG. 7 illustrates an example embodiment to move data from the separate source transactional data stores into the AR Data Mart Fact table and the subordinate source specific extension tables, according to an example embodiment.

"FIG. 8 illustrates an example embodiment to move data from the separate source transactional data stores into the Data Mart Fact Header table associated with each data source, according to an example embodiment.

"FIG. 9 illustrates a method of creation and usage of system generated surrogate keys according to prior art.

"FIG. 10A is a flow diagram depicting examples steps in a derived numeric surrogate key creation process, according to an example embodiment.

"FIG. 10B illustrates a preferred method of forming a derived surrogate key.

"FIG. 10C is a flow diagram depicting example steps in a derived surrogate key creation process without performing a lookup operation, according to an example embodiment.

"FIG. 10D is a flow diagram depicting example steps in a derived surrogate key creation process without performing a lookup operation, according to an example embodiment.

"FIG. 11A illustrates a flow diagram for forming a derived character surrogate key in accordance with preferred embodiments of the invention.

"FIG. 11B illustrates a method of creation and usage of simple derived numeric surrogate keys based on application data in certain embodiments.

"FIG. 12A illustrates a flow diagram for forming a derived multiple field numeric surrogate key in accordance with preferred embodiments of the invention.

"FIG. 12B illustrates a method of creation and usage of simple derived character surrogate keys based on application data in certain embodiments.

"FIG. 13A is a flow diagram for forming a derived multiple field character surrogate key in accordance with preferred embodiments of the invention.

"FIG. 13B illustrates the method of certain embodiments for creating and using derived complex numeric surrogate keys based on application data.

"FIG. 14A is a flow diagram for forming a derived surrogate key with a combination of numeric and character natural keys in accordance with preferred embodiments of the invention.

"FIG. 14B illustrates the method of certain embodiments for creating and using derived complex character surrogate keys based on application data.

"FIG. 15 illustrates the method of certain embodiments for creating and using a source control.

"FIG. 16 is a flow diagram depicting a method for providing multisource control in certain embodiments.

"FIG. 17A illustrates the method of certain embodiments for using audit controls.

"FIG. 17B illustrates an ETL process for moving a source system table into a dimension table.

"FIG. 18A-D illustrate various prior art methods of utilizing hierarchies.

"FIG. 19A illustrates the method of utilizing hierarchies in certain of the embodiments, overcoming certain of the deficiencies of the structures of FIGS. 18A-D.

"FIG. 19B is a flowchart of an exemplary method of generating an interim table.

"FIG. 19C is a flowchart of an exemplary method of using an interim table.

"FIG. 19D illustrates a method for traversing an hierarchical table.

"FIG. 20A illustrates a method used in certain embodiments to build a dates dimension.

"FIG. 20B illustrates a flow diagram for forming a dates dimension.

"FIG. 21 is a flow diagram depicting a method used in certain embodiments to create a dates dimension.

"FIGS. 22A-B show an example of the dates dimension in certain embodiments.

"FIG. 23 is a flow diagram depicting steps in a process for providing a cross-module linkages table.

"FIG. 24 is a process flow diagram illustrating a method for traversing a cross-module linkages table to generate reports.

"FIG. 25 illustrates a method of forming a derived composite key.

"FIG. 26A illustrates a process flow for forming a dates pattern table.

"FIG. 26B illustrates variables in the process flow sequence of FIG. 26A.

"FIGS. 26C-26G illustrate flow diagram for forming a dates pattern.

"FIGS. 27A-27E illustrate methods for periodic dates pattern information processing.

"FIGS. 28A-28G illustrate methods of processing dates information in accordance with preferred embodiments of the invention."

For additional information on this patent application, see: Guerra, Joseph. Systems and Methods for Multi-Source Data-Warehousing. Filed December 27, 2013 and posted August 7, 2014. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=920&p=19&f=G&l=50&d=PG01&S1=20140731.PD.&OS=PD/20140731&RS=PD/20140731

Keywords for this news article include: Patents, Information Technology, Information and Data Management, Information and Data Aggregation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC


For more stories covering the world of technology, please see HispanicBusiness' Tech Channel



Source: Information Technology Newsweekly


Story Tools






HispanicBusiness.com Facebook Linkedin Twitter RSS Feed Email Alerts & Newsletters