The patent's assignee for patent number 8639701 is
News editors obtained the following quote from the background information supplied by the inventors: "The subject matter of this specification relates generally to cross-language information retrieval.
"Internet search engines aim to identify resources (e.g., web pages, images, text documents, multimedia context) that are relevant to a user's needs and to provide information about the resources in a manner that is most useful to the user. Internet search engines return a set of search results in response to a user submitted query.
"With the increasing number of non-English language users and content providers, there is a significant amount of non-English content on the web.
As a supplement to the background information on this patent, VerticalNews correspondents also obtained the inventors' summary information for this patent: "This specification describes technologies relating to cross-language information retrieval.
"Cross-language information retrieval can be performed without a user specifying any particular languages to search. One or more languages can be automatically selected for cross-language information retrieval for a received query. The query is translated into the one or more languages and respective searches are performed. Search results responsive to the respective queries are identified and one or more search results are provided, e.g., for presentation or display in a search results interface.
"In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of identifying a structured collection of documents, the collection of documents being organized according to a hierarchy of categories; extracting entities from structured collection of document; assigning language scores to each document in the collection of documents; assigning language scores to entities based on scores of associated documents of the collection of documents; and generating a mapping between entities and language scores. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
"These and other embodiments can each optionally include one or more of the following features. The method further includes extracting queries leading to documents in the structured collection of documents; and augmenting the mapping to incorporate queries associated with particular entities associated with the respective documents in the structure collection of documents. Extracting entities is based on capitalization within the structured collection of documents. Extracting entities is based on terms in the structured collection of documents that reference other content in the structured collection of documents. Assigning language scores to each document is based on a language score or scores for hierarchical categories in the structured collection of documents. The method further includes receiving a user input query and matching one or more query terms to one or more entities and using the mapping to determine language scores for the one or more query terms. The method further includes storing the mapping between entities and language scores. The mapping is stored in a table.
"Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Cross-language searching is simplified for users by automatically selecting one or more languages likely to be most relevant to search. Additionally, by identifying the relevant language or languages, relevant search results can be efficiently identified. Computing resources can be focused on the selected languages allowing more resources to be devoted to improving translation quality or to adding more synonyms in the selected languages to improve information retrieval.
"The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims."
For additional information on this patent, see: Lim,
Keywords for this news article include:
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC