The assignee for this patent, patent number 8639682, is
Reporters obtained the following quote from the background information supplied by the inventors: "With the advent of the World Wide Web and Internet, the volume of publicly available information has grown at an unprecedented rate. In order to make sense of this ever-expanding collection, significant attention has been paid to the development of improved document searching techniques, such as search engines and the like. While such techniques have greatly improved the speed, cost and accuracy of locating relevant documents in an essentially unstructured knowledge base, the realm of entity retrieval and ranking, until recently, has been the subject of limited research. As used herein, an entity is defined by its ability to be described by one or more nouns, e.g., a person, place or thing. By way of non-limiting example, in the context of commercial enterprises, entities may comprise employees, clients, projects, partners, alliances, facility locations, competitors, etc. Of course, similar entities will be readily apparent in numerous endeavors beyond the commercial context. Regardless, the ability to quickly identify entities relevant to a given topic of interest will find application in a wide variety of applications.
"For example, referring again to the commercial context, the preparation of business proposals may be made more efficient if one is able to quickly identify subject matter experts within the organization submitting the proposal. In a similar vein, the ability to accurately identify the most qualified potential team members with specific skill sets would improve project staffing. Further still, identifying the best vendors for certain equipment or service needs would be greatly simplified through provision of a system that enables quick and accurate identification of relevant entities. Stated more generally, various knowledge management tasks can be greatly simplified or assisted by delivering relevant information about entities to those responsible for such knowledge management tasks.
"Currently, it is very difficult to retrieve entity-related information. In a business context, any commercial enterprise search engine, in a manner akin to web search engines, will yield a list of documents relevant to a particular topic query. However, such engines are of little help in retrieving a reliable ranked list of entities relevant to the topic, and it is left to the requester to sift through the returned documents to identify any particularly relevant entities.
"More recently, entity, and especially expert, ranking has received a growing amount of attention. For example, the Initiative for the Evaluation of XML Retrieval (INEX) has introduced an entity ranking track. Such systems currently rely on the retrieved entities being marked up with Extensible Markup Language (XML). However, not all content within a given knowledge base may have entities tagged with appropriate mark-up."
In addition to obtaining background information on this patent, VerticalNews editors also obtained the inventors' summary information for this patent: "The instant disclosure describes techniques for general entity retrieval and ranking based on specific topic queries directed to document repositories. In particular, the instant disclosure describes techniques that leverage the availability of metadata about the documents being searched, which metadata is often more available in enterprise document repositories. The disclosed techniques may be implemented using suitable processing devices, such as general purpose or application specific computers, or other equivalent implementation techniques known in the art.
"In one embodiment, a user may directly, or via an intervening component, provide a topic that is subsequently formed into a query. Based on the query, a first set of documents is retrieved from one or more document repositories, for example via a suitable search engine. The first set of documents have first metadata values for a corresponding plurality of metadata attributes. The first set of documents is then characterized based on the first set of metadata values. One or more candidate entities are then identified based on the first set of documents. For example, candidate entities may be identified through text-extraction applied to the first set of documents, the number of mentions in the first set of documents or directly from the first metadata values. In one embodiment, the one or more candidate entities are selected according to an entity type, potentially provided by the querying user.
"Thereafter, the original query is augmented according to the one or more candidate entities. That is, document repository(ies) are searched again based on the original query and one of the candidate entities. The resulting second set of documents is then characterized on the basis of the same metadata attributes and the second metadata values associated with the second set of documents. In one embodiment, a document set is characterized by creating a vector in which each of the metadata values constitutes a separate dimension, optionally with weighting values for specific metadata values applied. Regardless, the first and second document set characterizations are then compared (e.g., through a vector comparison) to determine their degree of similarity. Increasingly similar document set characterizations lead to the inference that the candidate entity giving rise to the second document set is increasingly relevant to the original query. The intuition behind this metric is that the metadata values provide a detailed model of the documents in a retrieved set. Further, the metadata describes not only the content of each document (e.g., by automatic retrieval of named entities or keywords), but also other information associated with it. In other words, it describes the 'essence' of the document along with all relevant data about it, e.g., its type, associated part of an organization, etc. Repeating this process for each of the one or more candidate entities can give rise to rankings according to the respective degrees of similarity, which rankings can be subsequently provided in ordered list form. In this manner, the disclosed techniques represent an advancement in the art."
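The ranking procedure described in the summary can be illustrated with a short Python sketch. This is not the inventors' implementation: the substring-matching search function, the dictionary layout of each document, the sample repository and the choice of cosine similarity as the "vector comparison" are all assumptions made for illustration. Each distinct metadata attribute and value pair becomes one dimension of the vector, with an optional per-attribute weight.

```python
import math
from collections import Counter


def search(repo, terms):
    """Toy stand-in for a search engine (assumption): keep documents
    whose text contains every term, ignoring case."""
    return [d for d in repo if all(t.lower() in d["text"].lower() for t in terms)]


def characterize(docs, weights=None):
    """Summarize a document set as a sparse vector with one dimension
    per (attribute, value) pair; weights are optional and per attribute."""
    weights = weights or {}
    vec = Counter()
    for doc in docs:
        for attr, value in doc["metadata"].items():
            vec[(attr, value)] += weights.get(attr, 1.0)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors (an assumed choice of
    vector comparison); returns 0.0 if either vector is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_entities(repo, query, candidates, weights=None):
    """Score each candidate by how closely the documents retrieved for
    (query + candidate) resemble those retrieved for the query alone."""
    first = characterize(search(repo, [query]), weights)
    scores = {}
    for entity in candidates:
        second = characterize(search(repo, [query, entity]), weights)
        scores[entity] = cosine(first, second)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample repository, invented for this example.
REPO = [
    {"text": "Cloud migration plan by Alice",
     "metadata": {"type": "proposal", "unit": "tech"}},
    {"text": "Cloud cost review by Alice",
     "metadata": {"type": "report", "unit": "tech"}},
    {"text": "Cloud vendor invoice handled by Bob",
     "metadata": {"type": "invoice", "unit": "finance"}},
]

ranked = rank_entities(REPO, "cloud", ["Alice", "Bob", "Carol"])
```

In this toy repository, the documents that mention Alice share more metadata with the full "cloud" result set than the single document that mentions Bob, so Alice ranks first. A candidate that retrieves no documents at all, such as Carol, scores zero.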
For more information, see this patent: Cumby,
Keywords for this news article include: Information Technology,
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC