Researchers Submit Patent Application, "Disambiguation and Tagging of Entities", for Approval

February 25, 2014

By a News Reporter-Staff News Editor at Information Technology Newsweekly -- From Washington, D.C., VerticalNews journalists report that a patent application by inventor David F. Houghton (Brattleboro, VT), filed on August 28, 2013, was made available online on February 13, 2014.

The patent application's assignee is Comcast Interactive Media, LLC.

News editors obtained the following quote from the background information supplied by the inventor: "With the advent of the Internet and computing technologies in general, information about a wide array of topics has become readily available. The accessibility of such information allows a person to read about a topic and immediately obtain additional information about an entity mentioned in the article, webpage, white paper or other media. The entity may be a person, a movie, a song, a book title and the like. Alternatively, a person may wish to add the article or webpage to a database of information about the entity mentioned. However, the process of confirming that the entity mentioned corresponds to a particular known entity (e.g., a known entity in a database or an entity identified through a search) may be tedious and time consuming. Furthermore, tagging or associating an entity with the wrong person or title may lead to various inefficiencies in a system."

As a supplement to the background information on this patent application, VerticalNews correspondents also obtained the inventor's summary information for this patent application: "The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the more detailed description provided below.

"One or more aspects described herein relate to identifying and tagging entities in a content item. In one example, an article about a scientific breakthrough may name the scientists that were involved in the effort and the institution (e.g., a school) where the research took place. The scientists and the institution may each be tagged as a known entity if those scientists or the institution are known to a database or system. By tagging the scientists or institution, a processing system may link a user to additional information about each of the entities such as other articles, videos and the like. Additionally or alternatively, content items, once tagged, may be organized or sorted based on entities that are referenced therein.

"According to another aspect, candidate entities (i.e., entities that have not been confirmed as references to known entities) may be associated with some level of ambiguity in view of the candidate entity's similarity to multiple known entities. In such instances, the ambiguity is resolved before the candidate entity is tagged. Thus, disambiguation may be performed and may include the sorting and ranking of the multiple known entities for which the conflicted candidate entity may be a match according to a hierarchy of criteria. Once sorted, the lowest ranked known entity may be removed from consideration. The process may repeat until a single known entity remains, at which point the candidate entity may be tagged as corresponding to the remaining known entity.
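The rank-and-eliminate loop described in the summary can be pictured with a minimal Python sketch. It is an illustration only and is not taken from the application: the function name disambiguate, the entity records, and the two criteria (name_match, context_overlap) are hypothetical stand-ins for whatever hierarchy of criteria a real system would use.

```python
from typing import Callable, Optional, Sequence

Entity = dict
Criterion = Callable[[dict, Entity], float]


def disambiguate(candidate: dict,
                 known_entities: Sequence[Entity],
                 criteria: Sequence[Criterion]) -> Optional[Entity]:
    """Repeatedly rank the known entities and drop the lowest ranked one."""
    remaining = list(known_entities)
    while len(remaining) > 1:
        # Hierarchy of criteria: earlier criteria dominate, later ones
        # only break ties (tuples compare element by element).
        remaining.sort(
            key=lambda entity: tuple(c(candidate, entity) for c in criteria),
            reverse=True,
        )
        remaining.pop()  # remove the lowest ranked known entity
    return remaining[0] if remaining else None


# Hypothetical criteria, listed from most to least important.
def name_match(candidate: dict, entity: Entity) -> float:
    return 1.0 if candidate["name"].lower() == entity["name"].lower() else 0.0


def context_overlap(candidate: dict, entity: Entity) -> float:
    return float(len(candidate["context"] & entity["keywords"]))


# Made-up data for illustration.
known_entities = [
    {"id": "senator-007", "name": "Alex Rivera", "keywords": {"senate", "bill"}},
    {"id": "singer-001", "name": "Alex Rivera", "keywords": {"album", "tour"}},
    {"id": "chef-042", "name": "Alexa Rivera", "keywords": {"tour", "menu"}},
]
candidate = {"name": "Alex Rivera", "context": {"album", "tour", "fans"}}

winner = disambiguate(candidate, known_entities, [name_match, context_overlap])
print(winner["id"])
```

With scores that do not change between rounds, as in this sketch, the loop is equivalent to picking the top-ranked entity once. It is written out round by round to mirror the description, which leaves room for criteria that are re-evaluated as entities are eliminated.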

"According to yet another aspect, the identification, classification and disambiguation process for candidate entities may be based on prior knowledge that is collected from a variety of sources either automatically or manually or both. For example, some articles or other content items may be manually tagged to identify people mentioned in those content items. Accordingly, the manual decisions and taggings may serve as a basis for the matching, categorization and disambiguation of candidate entities. Language models and finite state automata (e.g., built by the prior knowledge) may also be used to classify and identify candidate entities in a content item. Finite state automata (FSA) refer generally to process models comprising a number of finite states and transitions between the states and actions. FSAs may be used to identify subsequences of characters in strings, e.g., to find potential names. The language model may then assign probabilities to the identified strings, allowing for the identification of unusual uses of language, and in particular ordinary phrases used as names.
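To make the two-stage idea concrete, the following Python sketch pairs a very small finite state automaton with a toy language model. Everything in it is an assumption made for illustration rather than the application's method: the two states, the capitalization rule, the add-one-smoothed unigram model, and the sample sentence and corpus are all hypothetical.

```python
import math
from collections import Counter


def find_candidate_names(text: str) -> list:
    """Two-state automaton over tokens; emits runs of capitalized words."""
    OUTSIDE, IN_NAME = "outside", "in_name"
    state, current, found = OUTSIDE, [], []

    def flush():
        if current:
            found.append(" ".join(current))
            current.clear()

    for token in text.split():
        word = token.strip(".,;:!?\"'()")
        if word[:1].isupper():
            state = IN_NAME        # transition: capitalized word seen
            current.append(word)
        else:
            flush()                # transition: run of capitals has ended
            state = OUTSIDE
        if state == IN_NAME and token[-1:] in ".,;:!?":
            flush()                # trailing punctuation also ends a run
            state = OUTSIDE
    flush()
    return found


class UnigramModel:
    """Toy language model: add-one-smoothed unigram counts, lowercased."""

    def __init__(self, corpus: str):
        self.counts = Counter(corpus.lower().split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def avg_log_prob(self, phrase: str) -> float:
        words = phrase.lower().split()
        if not words:
            return float("-inf")
        logs = [math.log((self.counts[w] + 1) / (self.total + self.vocab))
                for w in words]
        return sum(logs) / len(logs)  # per-word average, so length is fair


sentence = "Yesterday the band Quiet Morning Rain played with Alex Rivera in Boston."
names = find_candidate_names(sentence)
model = UnigramModel("the quiet morning rain fell on the quiet town in the morning")
for name in names:
    print(name, round(model.avg_log_prob(name), 2))
```

The automaton deliberately over-generates (it also picks up the sentence-initial "Yesterday"), leaving the scoring step to sort the candidates out. In this toy setup a candidate such as the made-up band name "Quiet Morning Rain" scores as ordinary language, while "Alex Rivera" does not, which is one way a system might notice an ordinary phrase being used as a name.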

"According to one or more configurations, a feature detector may be used to identify attributes of a tagged content item or entity that may help with the matching, classification and disambiguation of other content items or entities. For example, if a person is referred to using an epithet in a tagged content item, the processing system may use or look for the epithet to determine whether a candidate entity in another content item refers to the same person.
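The epithet example can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the application's feature detector: the regular expression, the function names learn_epithets and entities_suggested_by_epithet, the entity identifier, and the sample texts are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed surface pattern: Name, known as "Epithet" (and two variants).
EPITHET_PATTERN = r'{name},? (?:known as|nicknamed|dubbed) "([^"]+)"'


def learn_epithets(tagged_items):
    """Collect epithets from already tagged items.

    tagged_items: iterable of (text, entity_id, name) tuples.
    """
    epithets = defaultdict(set)
    for text, entity_id, name in tagged_items:
        pattern = EPITHET_PATTERN.format(name=re.escape(name))
        for match in re.finditer(pattern, text):
            epithets[entity_id].add(match.group(1).lower())
    return epithets


def entities_suggested_by_epithet(text, epithets):
    """Return ids of known entities whose epithet appears in the text."""
    lowered = text.lower()
    return sorted(
        entity_id
        for entity_id, names in epithets.items()
        if any(epithet in lowered for epithet in names)
    )


# Made-up tagged item and untagged item for illustration.
tagged = [
    ('Alex Rivera, known as "the Valley Nightingale", released a new album.',
     "singer-001", "Alex Rivera"),
]
epithets = learn_epithets(tagged)
untagged = "Fans waited hours to see the Valley Nightingale perform."
print(entities_suggested_by_epithet(untagged, epithets))
```

A match of this kind would be one signal among several; in a fuller system it could feed into the ranking criteria used during disambiguation rather than decide the tag on its own.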

"In other embodiments, the present invention can be partially or wholly implemented on a computer-readable medium, for example, by storing computer-executable instructions or modules, or by utilizing computer-readable data structures.

"Of course, the methods and systems of the above-referenced embodiments may also include other additional elements, steps, computer-executable instructions, or computer-readable data structures. In this regard, other embodiments are disclosed and claimed herein as well.

"The details of these and other embodiments of the present invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will be apparent from the description and drawings, and from the claims.

"The present disclosure is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

"FIG. 1 illustrates an example network distribution system in which content items may be provided to subscribing clients.

"FIG. 2 illustrates an example content item that may be analyzed and tagged according to one or more aspects described herein.

"FIG. 3 illustrates an example method for identifying and classifying candidate entities in a content item according to one or more aspects described herein.

"FIG. 4 illustrates an example of co-occurrence in a content item according to one or more aspects described herein.

"FIG. 5 illustrates an example method for disambiguating candidate entities according to one or more aspects described herein.

"FIGS. 6A and 6B illustrate example reference chains according to one or more aspects described herein.

"FIG. 7 illustrates an example block diagram of an apparatus for receiving content item data and generating content item recommendations according to one or more aspects described herein.

"FIG. 8 illustrates a tagged content item and information accessible through the tagged content item according to one or more aspects described herein.

"FIG. 9 illustrates a method for associating links to additional information with a tagged content item according to one or more aspects described herein."

For additional information on this patent application, see: Houghton, David F. Disambiguation and Tagging of Entities. Filed August 28, 2013 and posted February 13, 2014. Patent URL:

Keywords for this news article include: Information Technology, Information and Data Architecture.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC

Source: Information Technology Newsweekly
