News Column

Patent Issued for Self-Indexing Data Structure

July 22, 2014

By a News Reporter-Staff News Editor at Information Technology Newsweekly -- A patent by the inventors Green, Edward A (Englewood, CO); Rivas, Luis (Denver, CO); Kreider, Mark (Arvada, CO), filed on October 16, 2009, was published online on July 8, 2014, according to news reporting originating from Alexandria, Virginia, by VerticalNews correspondents.

Patent number 8775433 is assigned to Oracle International Corporation (Redwood City, CA).

The following quote was obtained by the news editors from the background information supplied by the inventors: "Generally, in database systems, several individual records of data may be stored in tables. Each table may identify fields, or columns, and individual records may be stored as rows, with a data entry in each column. For example, in a parts database, there may be a table 'Parts' which includes fields such as part name, part size, part brand, and the like. One record, which includes data in the several columns, would be entered in the Parts table for each part.

"One operation that may be performed on database systems is locating specific records within individual tables based on criteria of one or more fields (or columns). The database system may scan through every entry in a particular table to locate the desired records. However, this method may require the database system to scan an entire table which may undesirably consume a considerable amount of time.

"To reduce the time required to locate particular records in a database, database indexes may be established. Generally, a database index is a data structure that improves the speed of operations on a database table. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random look ups and efficient access of ordered records. The disk space required to store the index may be less than that required by the table (since indexes usually contain only key-fields according to which the table is to be arranged, and excludes the other details in the table), yielding the possibility to store indexes in memory for a table whose data is too large to store in memory.

"When an index is created, it may record the location of values in a table that are associated with the column that is to be indexed. Entries may be added to the index when new data is added to the table. When a query is executed against the database and a condition is specified on a column that is indexed, the index is first searched for the values specified. If the value is found in the index, the index may return the location of the searched data in the table to the entity requesting the query."

In addition to the background information obtained for this patent, VerticalNews journalists also obtained the inventors' summary information for this patent: "The following embodiments and aspects thereof are described and illustrated in conjunction with systems, tools, and methods which are meant to be exemplary and illustrative, and not limiting in scope. In various embodiments, one or more of the above-described problems have been reduced or eliminated, while other embodiments are directed to other improvements.

"The present invention is directed to a computer-based tool and associated methodology for transforming electronic information so as to facilitate communications between different semantic environments and access to information across semantic boundaries. More specifically, the present invention is directed to a self-indexing data structure and associated methodology that is automatically generated for data stored in a database. As set forth below, the present invention may be implemented in the context of a system where a semantic metadata model (SMM) for facilitating data transformation. The SMM utilizes contextual information and standardized rules and terminology to improve transformation accuracy. The SMM can be based at least in part on accepted public standards and classification or can be proprietary. Moreover, the SMM can be manually developed by users, e.g., subject matter experts (SMEs) or can be at least partially developed using automated systems, e.g., using logic for inferring elements of the SMM from raw data (e.g., data in its native form) or processed data (e.g., standardized and fully attributed data). The present invention allows for sharing of knowledge developed in this regard so as to facilitate development of a matrix of transformation rules ('transformation rules matrix'). Such a transformation system and the associated knowledge sharing technology are described in turn below.

"In a preferred implementation, the invention is applicable with respect to a wide variety of content including sentences, word strings, noun phrases, and abbreviations and can even handle misspellings and idiosyncratic or proprietary descriptors. The invention can also manage content with little or no predefined syntax as well as content conforming to standard syntactic rules. Moreover, the system of the present invention allows for substantially real-time transformation of content and handles bandwidth or content throughputs that support a broad range of practical applications. The invention is applicable to structured content such as business forms or product descriptions as well as to more open content such as information searches outside of a business context. In such applications, the invention provides a system for semantic transformation that works and scales.

"The invention has particular application with respect to transformation and searching of both business content and non-business content. For the reasons noted above relating to abbreviation, lack of standardization and the like, transformation and searching of business content presents challenges. At the same time the need for better access to business content and business content transformation is expanding. It has been recognized that business content is generally characterized by a high degree of structure and reusable 'chunks' of content. Such chunks generally represent a core idea, attribute or value related to the business content and may be represented by a character, number, alphanumeric string, word, phrase or the like. Moreover, this content can generally be classified relative to a taxonomy defining relationships between terms or items, for example, via a hierarchy such as of family (e.g., hardware), genus (e.g., connectors), species (e.g., bolts), subspecies (e.g., hexagonal), etc.

"Non-business content, though typically less structured, is also amenable to normalization and classification. With regard to normalization, terms or chunks with similar potential meanings including standard synonyms, colloquialisms, specialized jargon and the like can be standardized to facilitate a variety of transformation and searching functions. Moreover, such chunks of information can be classified relative to taxonomies defined for various subject matters of interest to further facilitate such transformation and searching functions. Thus, the present invention takes advantage of the noted characteristics to provide a framework by which locale-specific content can be standardized and classified as intermediate steps in the process for transforming the content from a source semantic environment to a target semantic environment and/or searching for information using locale-specific content. Such standardization may encompass linguistics and syntax as well as any other matters that facilitate transformation. The result is that content having little or no syntax is supplied with a standardized syntax that facilitates understanding, the total volume of unique chunks requiring transformation is reduced, ambiguities are resolved and accuracy is commensurately increased and, in general, substantially real-time communication across semantic boundaries is realized. Such classification further serves to resolve ambiguities and facilitate transformation as well as allowing for more efficient searching. For example, the word 'butterfly' of the term 'butterfly valve' when properly chunked, standardized and associated with tags for identifying a classification relationship, is unlikely to be mishandled. Thus, the system of the present invention does not assume that the input is fixed or static, but recognizes that the input can be made more amenable to transformation and searching, and that such preprocessing is an important key to more fully realizing the potential benefits of globalization. As will be understood from the description below, such standardization and association of attribute fields and field content allows for substantially automatic generation of database indexes having a useful relation to the indexed item of data.

"According to one aspect of the present invention, a computer-implemented method for automatically generating an index in a database system is provided. The method includes receiving raw data that includes human directed information (e.g., human readable text strings), and processing the raw data into a standardized format to produce standardized data. The standardized data includes information about an attribute or attribute value of the raw data. For example, the data may be a product and attribute data may include the brand, size, color, or other information about the product. It will be appreciated that the investigation is equally applicable to any data capable of being structured in this regard. In addition, the method includes generating a plurality of identifiers (e.g., index values) for the standardized data based on an attribute or attribute value of the raw data. For example, the identifiers may encode one or more attributes or attribute values of the raw data, such that the data may be accessed more rapidly using the identifiers. The method further includes storing the plurality of identifiers and the standardized data in a data storage structure.

"According to another aspect of the present invention, an apparatus for automatically generating an index structure in a database system is provided. The apparatus includes a conversion module operative to receive raw data and to convert the raw data to standardized data comprising a plurality of data objects. Further, the standardized data includes information about an attribute or attribute value of the data objects. The apparatus also includes an index generator module operative to generate a plurality of index values, wherein each of the index values is associated with a data object. Each of the index values encodes an attribute or attribute value of its associated data object. For example, in the case where the data objects are associated with parts in a catalogue, the index value for each part may encode information about the type of part, quantity, size, and the like. Further, the apparatus includes a data storage structure operative to store the plurality of index values and the plurality of data objects.

"According to another aspect of the present invention, a method for use in facilitating electronic communication between first and second data systems, wherein the first data system operates in a first semantic environment defined by at least one of linguistics and syntax is provided. The method includes providing a computer-based processing tool operating on a computer system. The method also includes first using the computer-based processing tool to access the communication and convert at least a first term of the communication between the first semantic environment and a second semantic environment that is different from the first semantic environment, and second using the computer-based processing tool to associate a classification with one of the first term and the converted term, the classification identifying the one of the first term and the converted term as belonging to a same class as at least one other term based on a shared characteristic of the at least one other term and the one of the first term and the converted term. Additionally, the method includes third using the classification to automatically generate an identifier for the converted term, and storing the identifier in a data storage structure.

"In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the drawings and by study of the following descriptions."

URL and more information on this patent, see: Green, Edward A; Rivas, Luis; Kreider, Mark. Self-Indexing Data Structure. U.S. Patent Number 8775433, filed October 16, 2009, and published online on July 8, 2014. Patent URL:

Keywords for this news article include: Information Technology, Information and Data Storage, Oracle International Corporation, Information and Data Architecture.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC

For more stories covering the world of technology, please see HispanicBusiness' Tech Channel

Source: Information Technology Newsweekly

Story Tools Facebook Linkedin Twitter RSS Feed Email Alerts & Newsletters