The assignee for this patent application is
Reporters obtained the following quote from the background information supplied by the inventors: "The invention relates in general to the field of computer-implemented methods for identifying, managing and displaying a large set of relationships between entities. In particular, it relates to co-clustering methods.
"Graphs are a popular data representation for modeling relationships, connections, etc., between entities. For example, bi-partite graphs have been the focus of a broad spectrum of studies spanning from document analysis to bioinformatics. A bi-partite graph paradigm may indeed be relied upon to represent various kinds of relationships, e.g., between parts of a computer-aided designed or CAD complex objects, real-world objects and attributes, etc., or even to represent data acquisition patterns between sets of processor cores and sets of data. Analysis of such related data is therefore of great importance for many companies, which accumulate increasingly large amounts of interaction data.
"One common approach involves the identification of groups of objects or entities that share common properties, have similar attribute values, etc. The availability of such information is advantageous in many respects, as patterns can be detected, improper relations can be repaired or even anticipated.
"Studies have suggested that matrix-based representations are more suitable and offer 'superior readability' compared to node-link representations, particularly when analyzing large number of subjects/variables. In some cases, one has interest in visualizing thousands of subjects and several dozens to hundreds of variables, therefore a matrix representation can advantageously be adopted for bi-partite graphs. Given a matrix data representation, the problem of simultaneous group discovery across two data dimensions can be mapped to a matrix co-clustering instance. The goal is to reveal the latent structure of a seemingly unordered matrix. This is achieved by discovering a permutation of matrix rows and columns, and a respective grouping, such that the resulting matrix is as homogeneous as possible. In a typical setting as contemplated herein, the rows represent the subjects (CAD objects or parts, cores, etc.) and the columns identify the variables (other entities to which the subject entities relate, attribute values, data accessed by a given processor, etc.).
"Presently, techniques for matrix co-clustering are predominantly based either on hierarchical clustering or on spectral clustering principles. As we discuss in more detail later on, both approaches exhibit limited scalability. The aim of the present approach is to provide a highly scalable approach that supports the analysis of thousands of graph nodes, and can easily drive interactive visual interfaces.
"The principle of co-clustering was introduced first by Hartigan with the goal of 'clustering cases and variables simultaneously'. Initial applications were for the analysis of voting data. Since then, several co-clustering algorithms have been proposed, broadly belonging into two classes, based on: a) hierarchical clustering, and b) spectral clustering.
"Agglomerative hierarchical clustering approaches are widely used in biological and medical sciences. In this setting, co-clustering also appears under the term 'bi-clustering'. One application is for the analysis of gene expression profiles. Columns and rows of an expression profile matrix are sorted using the relative orders of the leaves of the corresponding dendrograms constructed for genes and for arrays. The reordering of the dendrogram leaf objects is called seriation. Hierarchical clustering approaches can lead to discovery of very compact clusters. However, this comes at a high runtime complexity, i.e., ranging from O(n.sup.2) to O(n.sup.2 log.sup.2 n)--n being the number of objects--depending on the agglomeration process. Therefore, their applicability is limited to data instances that typically do not exceed several hundreds of objects. Such approaches are deemed prohibitive, even for today's computers, if one considers interactive response times.
"Spectral co-clustering approaches view the co-clustering problem as an instance of graph partitioning. Essentially, the problem is relegated to an eigenvector computation. Spectral clustering approaches are powerful for detecting non-linear cluster relationships (e.g., concentric circles). However, for some cases, including those contemplated here, one is interested in detecting rectangular clusters; hence, it can be realized that computationally simpler techniques may also discover the existence of rectangular co-clusters. The complexity of spectral approaches is in the order of O(n log.sup.2n). Recent works report a runtime of several seconds for a few thousands of objects; as such, their usefulness is typically limited to small data instances (fewer than 10.sup.4 nodes).
"In the last years, approaches have appeared that view co-clustering from a purely optimization perspective and do cluster assignments using an information theoretic objective function. So, the optimal co-clustering maximizes the mutual information between the clustered random variables.
"In the field of visualization, several techniques have been proposed for visualizing bipartite graphs. Such approaches do usually not involve co-clustering.
"Finally, there exist approaches that encapsulate hybrid visualization methods, using a combination of matrix and node-link techniques, so as to accommodate a more holistic graph exploration experience."
In addition to obtaining background information on this patent application, VerticalNews editors also obtained the inventors' summary information for this patent application: "In one embodiment, a computer-implemented method for identifying relationships between entities includes accessing a first data structure being a two-dimensional array of scalar elements (e, e.sub.ij, e.sub.kl.sup.(i)) representable as a matrix, each of the scalar elements capturing a relationship between two entities; reorganizing the first data structure by clustering the scalar elements separately on each dimension of the two-dimensional array, to obtain a second data structure, representable as a K.times.M block matrix, which is an arrangement of rows and columns of blocks, wherein each block is a reordered sequence of rows and/or columns of the first data structure; compacting the second data structure by: determining two parallel block sequences, which are the most similar according to a given distance measure, the parallel block sequences being either distinct rows or distinct columns of blocks of the second data structure; and reorganizing the second data structure by merging the two determined sequences into a single block sequence, wherein the n.sup.th block of the single sequence is the union of: the n.sup.th block of a first one of the two parallel sequences; and the n.sup.th block of a second one of the two parallel sequences, wherein a compacted data structure is obtained which is representable as a K-1.times.M or a K.times.M-1 block matrix; repeating the compacting, using a compacted data structure as input, in place of the second data structure; and identifying, in a graphical user interface, one or more blocks of a compacted data structure and/or selected scalar elements therein.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
"FIG. 1 represents a general purpose computerized system, suited for implementing one or more method steps as involved in embodiments of the invention;
"FIG. 2A shows a typical example of a bipartite graph representation;
"FIG. 2B is an adjacency matrix representation of the same relationships as captured in the bipartite graph representation of FIG. 2A;
"FIG. 3 is a flowchart diagram illustrating steps and components involved in a method for identifying relationships between entities, according to embodiments;
"FIG. 4 is a flowchart showing only the succession of steps involved in FIG. 3;
"FIGS. 5A, 5B, 5C and 5D illustrate, graphically, successive operations of (i) determinations of most similar parallel block sequences (either two rows or columns of blocks), and (ii) subsequent merging of the determined sequences into single block sequences, for an example of input matrix;
"FIG. 6 shows an example of an input matrix (left) and a final matrix representation (right) of a data structure compacted according to embodiments;
"FIG. 7 illustrates, graphically, ten intermediate steps to compact the input matrix of FIG. 6 (left) and arrive at the final matrix of FIG. 6 (right), and the corresponding minimization of an information theoretic objective function E, as in embodiments;
"FIG. 8 illustrates examples of a graphical interface displaying dense blocks of a compacted data structure as well as selected scalar elements thereof, the elements capturing relationships between pairs of (real-world) entities, as in embodiments;
"FIG. 9 is a graph comparing runtime performances of an embodiment of the invention vs. a spectral co-clustering approach; and
"FIGS. 10A and 10B show a comparison of present methods (in embodiments which automatically determine the number of final co-clusters) with spectral-based methods (requiring as input the number of co-clusters)."
For more information, see this patent application: Labbi, Abderrahim; Vlachos, Michail. Identifying Relationships between Entities. Filed
Keywords for this news article include: Information Technology, Information and Data Architecture,
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC
Most Popular Stories
- Prosecutor to Investigate Walmart Police Shooting
- Chrysler Gets Nod as a Top Employer for Hispanic Women
- GM to Announce New Jobs in Tennessee
- Mark Sanchez Suddenly a Hot QB Commodity
- Hispanic Entrepreneurs Set Pace in Florida
- Smith & Wesson Misses Target
- Laid-off Workers Return to Their Fields
- Emirates Hit Libyan Targets With Airstrikes
- Marco Rubio Warns Obama on Deportations
- Michael Brown Funeral: Can Americans Change the Script of Violence?