Patent Issued for Entity Assessment and Ranking

February 11, 2014



By a News Reporter-Staff News Editor at Information Technology Newsweekly -- According to news reporting originating from Alexandria, Virginia, by VerticalNews journalists, a patent by the inventors Cumby, Chad Michael (Chicago, IL); Probst, Katharina (Atlanta, GA); Ghani, Rayid (Chicago, IL), filed on December 29, 2008, was published online on January 28, 2014.

The assignee for this patent, patent number 8639682, is Accenture Global Services Limited (Dublin, IE).

Reporters obtained the following quote from the background information supplied by the inventors: "With the advent of the World Wide Web and Internet, the volume of publicly available information has grown at an unprecedented rate. In order to make sense of this ever-expanding collection, significant attention has been paid to the development of improved document searching techniques, such as search engines and the like. While such techniques have greatly improved the speed, cost and accuracy of locating relevant documents in an essentially unstructured knowledge base, the realm of entity retrieval and ranking, until recently, has been the subject of limited research. As used herein, an entity is defined by its ability to be described by one or more nouns, e.g., a person, place or thing. By way of non-limiting example, in the context of commercial enterprises, entities may comprise employees, clients, projects, partners, alliances, facility locations, competitors, etc. Of course, similar entities will be readily apparent in numerous endeavors beyond the commercial context. Regardless, the ability to quickly identify entities relevant to a given topic of interest will find application in a wide variety of applications.

"For example, referring again to the commercial context, the preparation of business proposals may be made more efficient if one is able to quickly identify subject matter experts within the organization submitting the proposal. In a similar vein, the ability to accurately identify the most qualified potential team members with specific skill sets would improve project staffing. Further still, identifying the best vendors for certain equipment or service needs would be greatly simplified through provision of a system that enables quick and accurate identification of relevant entities. Stated more generally, various knowledge management tasks can be greatly simplified or assisted by delivering relevant information about entities to those responsible for such knowledge management tasks.

"Currently, it is very difficult to retrieve entity-related information. In a business context, any commercial enterprise search engine, in a manner akin to web search engines, will yield a list of documents relevant to a particular topic query. However, such engines are of little help in retrieving a reliable ranked list of entities relevant to the topic, and it is left to the requester to sift through the returned documents to identify any particularly relevant entities.

"More recently, entity, and especially expert, ranking has received a growing amount of attention. For example, the Initiative for the Evaluation of XML Retrieval (INEX) has introduced an entity ranking track. Such systems currently rely on the retrieved entities being marked up with Extensible Markup Language (XML). However, not all content within a given knowledge base may have entities tagged with appropriate mark-up. The Text Retrieval Conference (TREC) recently introduced an enterprise track, including an expert finding task. In one approach, a list of experts is provided and, for a given expert, a pseudo-document is created from all documents located that include a mention of that expert. In another approach, potentially relevant documents for a topic are retrieved and experts are subsequently extracted from (i.e., identified in) the set of documents. Ranking of the extracted experts according to their relevance to the topic is inferred by the number of mentions for each expert; more mentions results in higher rankings. However, to the extent that the number of mentions of an expert in a set of documents is subject to numerous other factors beyond relevance to a given topic, such systems are susceptible to providing inaccurate results. Further still, some expert identification techniques exploit structural information of documents, such as references from other, topically relevant documents or, in the example of emails, explicit links to other emails. With regard to these expert identification techniques, expert retrieval, while important, is appropriately viewed as a subset of entity retrieval and ranking and is thus limited in scope. That is, a more general entity retrieval and ranking approach represents a more scalable solution allowing for application to a wider variety of situations, and would therefore represent an advancement in the art."

In addition to obtaining background information on this patent, VerticalNews editors also obtained the inventors' summary information for this patent: "The instant disclosure describes techniques for general entity retrieval and ranking based on specific topic queries directed to document repositories. In particular, the instant disclosure describes techniques that leverage the availability of metadata about the documents being searched, which metadata is often more available in enterprise document repositories. The disclosed techniques may be implemented using suitable processing devices, such as general purpose or application specific computers, or other equivalent implementation techniques known in the art.

"In one embodiment, a user may directly, or via an intervening component, provide a topic that is subsequently formed into a query. Based on the query, a first set of documents is retrieved from one or more document repositories, for example via a suitable search engine. The first set of documents have first metadata values for a corresponding plurality of metadata attributes. The first set of documents is then characterized based on the first set of metadata values. One or more candidate entities are then identified based on the first set of documents. For example, candidate entities may be identified through text-extraction applied to the first set of documents, the number of mentions in the first set of documents or directly from the first metadata values. In one embodiment, the one or more candidate entities are selected according to an entity type, potentially provided by the querying user.

"Thereafter, the original query is augmented according to the one or more candidate entities. That is, document repository(ies) are searched again based on the original query and one of the candidate entities. The resulting second set of documents is then characterized on the basis of the same metadata attributes and the second metadata values associated with the second set of documents. In one embodiment, a document set is characterized by creating a vector in which each of the metadata values constitutes a separate dimension, optionally with weighting values for specific metadata values applied. Regardless, the first and second document set characterizations are then compared (e.g., through a vector comparison) to determine their degree of similarity. Increasingly similar document set characterizations lead to the inference that the candidate entity giving rise to the second document set is increasingly relevant to the original query. The intuition behind this metric is that the metadata values provide a detailed model of the documents in a retrieved set. Further, the metadata describes not only the content of each document (e.g., by automatic retrieval of named entities or keywords), but also other information associated with it. In other words, it describes the 'essence' of the document along with all relevant data about it, e.g., its type, associated part of an organization, etc. Repeating this process for each of the one or more candidate entities can give rise to rankings according to the respective degrees of similarity, which rankings can be subsequently provided in ordered list form. In this manner, the disclosed techniques represent an advancement in the art."
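The procedure the summary describes (retrieve a first document set, characterize it by its metadata values, re-query with each candidate entity, and compare the characterizations) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory DOCS list, the search, characterize, cosine and rank_entities helpers, the attribute names "type" and "unit", and the choice of unweighted counts with cosine similarity as the vector comparison are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy repository: each document has text plus metadata attributes.
DOCS = [
    {"text": "cloud migration plan alice", "meta": {"type": "proposal", "unit": "infrastructure"}},
    {"text": "cloud cost review alice", "meta": {"type": "report", "unit": "infrastructure"}},
    {"text": "cloud security memo bob", "meta": {"type": "memo", "unit": "security"}},
    {"text": "payroll update bob", "meta": {"type": "memo", "unit": "finance"}},
]


def search(terms):
    """Stand-in for a search engine: documents containing every query term."""
    wanted = [t.lower() for t in terms]
    return [d for d in DOCS if all(t in d["text"].lower().split() for t in wanted)]


def characterize(docs):
    """Build a vector with one dimension per (attribute, value) pair (unweighted counts)."""
    return Counter((attr, value) for d in docs for attr, value in d["meta"].items())


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_entities(topic, candidates):
    """Score each candidate by how closely its augmented-query result set
    resembles the original result set, then sort by descending similarity."""
    base = characterize(search([topic]))
    scores = {}
    for entity in candidates:
        second = characterize(search([topic, entity]))  # query augmented with the entity
        scores[entity] = cosine(base, second)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_entities("cloud", ["alice", "bob"])
for entity, score in ranking:
    print(f"{entity}: {score:.3f}")
```

In this toy data, the documents retrieved for "cloud" plus "alice" share most of the metadata profile of the documents retrieved for "cloud" alone, so "alice" ranks above "bob". The optional weighting of specific metadata values mentioned in the summary could be added by scaling the counts in characterize.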

For more information, see this patent: Cumby, Chad Michael; Probst, Katharina; Ghani, Rayid. Entity Assessment and Ranking. U.S. Patent Number 8639682, filed December 29, 2008, and published online on January 28, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=21&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1001&f=G&l=50&co1=AND&d=PTXT&s1=20140128.PD.&OS=ISD/20140128&RS=ISD/20140128

Keywords for this news article include: Information Technology, Accenture Global Services Limited, Information and Knowledge Management.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC


Source: Information Technology Newsweekly

