The assignee for this patent, patent number 8653993, is
Reporters obtained the following quote from the background information supplied by the inventors: "The present invention relates in general to data compression and data encoding. In particular, the present invention relates to generating occurrence information for data values in a data set to be encoded or compressed.
"Data compression is an important aspect of various computing and storage systems. While data warehouses are discussed in some detail as an example of systems where data compression is relevant, it is appreciated that data compression and efficient handling of compressed data is relevant in many other systems where large amounts of data are stored. In general, data warehouses are repositories of an organization's electronically stored data, which are designed to facilitate reporting and analysis.
"The effectiveness of data warehouses that employ table scans for fast processing of queries relies on efficient compression of the data. With adequate data compression method, table scans can be directly applied on the compressed data, instead of having to decode each value first. Also, well designed algorithms can scan over multiple compressed values that are packed into one word size in each loop. Therefore, shorter code typically means faster table scan. The following compression methods are well-known. Dictionary based compression encodes a value from a large value space but relatively much smaller set of actual values (cardinality) with a dictionary code. Offset based compression compresses data by subtracting a common base value from each of the original values and uses the remaining offset to represent the original value. The prefix-offset compression encodes a value by splitting its binary representation into prefix bits and offset bits, and concatenates the dictionary code of the prefix bits with the offset bits as the encoding code.
"One of the most important criteria for compression efficiency is the average code length, which is the total size of compressed data divided by the number of values in it. One way of achieving better compression efficiency, i.e. smaller average code length, is to encode the values with a higher probability with a shorter code."
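To make the quoted background concrete, the following Python sketch illustrates the three compression methods and the average code length it describes. It is not taken from the patent. The function names, the use of the minimum as the common base value, and the sample values and code lengths are assumptions made purely for illustration.

```python
def dictionary_encode(values):
    """Dictionary-based compression: map each distinct value to a small code."""
    # Assumption: codes are assigned in sorted order of the distinct values.
    dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
    return [dictionary[v] for v in values], dictionary


def offset_encode(values):
    """Offset-based compression: subtract a common base from every value."""
    # Assumption: the common base is the smallest value in the data.
    base = min(values)
    return base, [v - base for v in values]


def prefix_offset_encode(values, offset_bits):
    """Prefix-offset compression for non-negative integers.

    Each value is split into prefix bits and offset bits; the prefix is
    replaced by its dictionary code and concatenated with the offset bits.
    """
    mask = (1 << offset_bits) - 1
    prefixes = sorted({v >> offset_bits for v in values})
    prefix_code = {p: i for i, p in enumerate(prefixes)}
    codes = [
        (prefix_code[v >> offset_bits] << offset_bits) | (v & mask)
        for v in values
    ]
    return codes, prefix_code


def average_code_length(code_lengths, values):
    """Total compressed size in bits divided by the number of values."""
    return sum(code_lengths[v] for v in values) / len(values)


# Hypothetical sample data: one value is far more frequent than the others.
values = [1000, 1003, 1000, 1000, 1007, 1000]

fixed_width = {1000: 2, 1003: 2, 1007: 2}      # every value gets 2 bits
frequency_aware = {1000: 1, 1003: 2, 1007: 2}  # frequent value gets 1 bit

print(average_code_length(fixed_width, values))
print(average_code_length(frequency_aware, values))
```

With this sample, the fixed 2-bit code averages 2 bits per value, while giving the most frequent value a 1-bit code lowers the average to about 1.33 bits. That is the effect the inventors describe when they say more probable values should receive shorter codes.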
In addition to obtaining background information on this patent, VerticalNews editors also obtained the inventors' summary information for this patent: "According to an exemplary embodiment, a computerized method for generating occurrence data of data values for enabling encoding of a data set, the method includes determining occurrences of data values in a first data batch and determining occurrence count information for at most a first number of most frequent data values in the first data batch, the occurrence count information identifying the most frequent data values and their occurrence counts. The method also includes generating for rest of the data values in the first data batch at least a first histogram having a second number of buckets and merging the occurrence count information of the first data batch to merged occurrence count information of a second data batch. The method further includes merging the first histogram of the first data batch to a merged histogram corresponding to the second data batch and processing a next data batch as a first data batch until the data set to be encoded is processed in batches.
"According to another exemplary embodiment, a data processing system includes input means for receiving data to be encoded and splitting means for splitting data to be encoded into data batches. The system also includes batch histogram means for determining occurrences of data values in a data batch, the batch histogram means is adapted to determine occurrence count information for at most a first number of most frequent data values in the data batch. The occurrence count information identifies the most frequent data values and their occurrence counts. The batch histogram means is also adapted to generate for rest of the data values in the data batch at least a first histogram having a second number of buckets. The system also includes merging means, operably connected to the batch histogram means, for merging the occurrence count information of a first data batch to merged occurrence count information of at least one further data batch and for merging the first histogram of a first data batch to a merged histogram corresponding to the at least one further data batch.
"Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with the advantages and the features, refer to the description and to the drawings."
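The batch-wise procedure in the inventors' summary can be sketched roughly as follows. This is a simplified reading, not the patented implementation. The function names, the equal-width buckets over a known value range [lo, hi), and merging by simple addition are all assumptions. The sketch also does not trim the merged table of frequent values back to a fixed size, which a real system might well do.

```python
from collections import Counter


def summarize_batch(batch, top_k, num_buckets, lo, hi):
    """Summarize one batch.

    Returns exact counts for at most top_k most frequent values, plus a
    histogram with num_buckets buckets covering all remaining values.
    """
    counts = Counter(batch)
    top = dict(counts.most_common(top_k))

    # Assumption: equal-width buckets over the numeric range [lo, hi).
    width = (hi - lo) / num_buckets
    histogram = [0] * num_buckets
    for value, n in counts.items():
        if value not in top:
            bucket = min(int((value - lo) / width), num_buckets - 1)
            histogram[bucket] += n
    return top, histogram


def merge(merged_top, merged_hist, top, histogram):
    """Fold one batch summary into the running (merged) summary in place."""
    merged_top.update(top)  # Counter.update adds counts per value
    for i, n in enumerate(histogram):
        merged_hist[i] += n


def occurrence_info(data, batch_size, top_k, num_buckets, lo, hi):
    """Process the data set batch by batch and return the merged summary."""
    merged_top = Counter()
    merged_hist = [0] * num_buckets
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        top, hist = summarize_batch(batch, top_k, num_buckets, lo, hi)
        merge(merged_top, merged_hist, top, hist)
    return merged_top, merged_hist


# Hypothetical data: two batches of five values each, values in [0, 10).
data = [5, 5, 5, 1, 9, 5, 5, 5, 2, 8]
top, hist = occurrence_info(data, batch_size=5, top_k=1,
                            num_buckets=2, lo=0, hi=10)
print(top, hist)
```

In this sketch each batch needs only a bounded amount of memory for its summary (at most top_k exact counts and num_buckets histogram cells), which appears to be the motivation for processing the data set in batches. The exact counts and the histogram totals together still account for every value in the input.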
For more information, see this patent: Bendel, Peter; Draese, Oliver; Hrle, Namik; Li, Tianchao. Data Value Occurrence Information for Data Compression. U.S. Patent Number 8653993, filed
Keywords for this news article include: Information Technology, Information and Data Aggregation, Information and Data Compression,
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC