
"Text Auto-Correction via N-Grams" in Patent Application Approval Process

January 30, 2014

By a News Reporter-Staff News Editor at Politics & Government Week -- A patent application by the inventors Caskey, Sasha P. (New York, NY); Kanevsky, Dimitri (Ossining, NY); Kozloski, James R. (New Fairfield, CT); Sainath, Tara N. (New York, NY), filed on July 9, 2012, was made available online on January 16, 2014, according to news reporting originating from Washington, D.C., by VerticalNews correspondents.

This patent application is assigned to International Business Machines Corporation.

The following quote was obtained by the news editors from the background information supplied by the inventors: "Text-based communications using electronic devices such as computers and mobile phones require users of these devices to enter text using real or virtual keyboards. Some devices provide for spoken text entry by translating spoken words into text. Existing methods of text entry have limitations that yield inaccuracies in the text. For example, the small size of virtual keyboards results in the selection of the wrong characters. In addition, text recognition software is not completely accurate due to variances in speech quality and voice tone. In certain applications such as text-based messaging, the desire by users is to accomplish text-based communication at speeds that rival spoken communications. However, the entry of text takes longer than speaking. Devices attempt to overcome errors and inaccuracies and to improve communication speeds by providing auto-complete and auto-correction functionality in association with text entry.

"Text input devices such as cellular telephones or smartphones provide users with scrollable and selectable lists of words and auto-corrections upon receipt of only the first few letters of any given word. These devices utilize methods such as iTap and T9 to provide this functionality. These capabilities, however, only apply to single words and to the current word being entered. There is no predictive capability or applicability to groups of words or phrases. N-grams have been used extensively in speech recognition and natural language processing to assign probabilities to a current word, given the previous two words. The use of n-grams to auto-complete and correct text inputs has been applied to the current word but not to corrections or predictions of previous and subsequent words or to phrases within a given categorical context. Therefore, systems and methods are desired that provide for the auto-correction and prediction of a current phrase as well as previous and subsequent phrases."
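The filing does not publish code. As a minimal illustration of the trigram idea the background describes (a probability for the current word given the previous two), the sketch below estimates P(w3 | w1, w2) as count(w1 w2 w3) / count(w1 w2) from a tiny tokenized corpus. The corpus, the function names and the plain maximum-likelihood estimate (no smoothing) are assumptions made for this example.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word histories in tokenized sentences."""
    trigrams, histories = Counter(), Counter()
    for words in sentences:
        for i in range(len(words) - 2):
            history = (words[i], words[i + 1])
            trigrams[history + (words[i + 2],)] += 1
            histories[history] += 1
    return trigrams, histories

def trigram_prob(trigrams, histories, w1, w2, w3):
    """Maximum-likelihood P(w3 | w1, w2); returns 0.0 for an unseen history."""
    n = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / n if n else 0.0

# Toy corpus (illustrative only).
corpus = [
    ["i", "will", "call", "you"],
    ["i", "will", "see", "you"],
    ["i", "will", "call", "back"],
]
trigrams, histories = train_trigrams(corpus)
print(trigram_prob(trigrams, histories, "i", "will", "call"))  # 2/3 ≈ 0.667
```

Real systems add smoothing so unseen trigrams do not get zero probability; it is omitted here to keep the arithmetic visible.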

In addition to the background information obtained for this patent application, VerticalNews journalists also obtained the inventors' summary information for this patent application: "Exemplary embodiments of systems and methods in accordance with the present invention provide for the correction of an entered text phrase containing a plurality of words, i.e., 'n' words, preferably using an n-gram language model to correct the series of 'n' words. Therefore, a user has a higher likelihood of typing a sentence without errors and does not have to worry about misspelling either a current or previous word. This allows the user to type much faster, since the n-gram model corrects multiple words together. Currently, n-grams give probabilities of a word given the previous n-1 words. Given a current word or 'phrase' entered by a user, the next n-1 words are predicted, allowing a user to auto-complete sentences and to minimize the number of words entered. In addition, words or phrases that precede a current set of n-1 words are predicted. For example, the n-gram determines probabilities of the n-1 words that may have preceded the entered text string, allowing the user to auto-complete preceding text strings as well as subsequent text strings.
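To make the forward and backward prediction concrete, here is a rough sketch, assuming a trigram model (n = 3, so n-1 = 2 words are predicted) built from raw counts and a greedy "most frequent next or previous word" rule. The helper names, the toy corpus and the greedy choice are hypothetical; the application does not say how candidates are chosen.

```python
from collections import Counter

def trigram_counts(sentences):
    """Count every three-word window in tokenized sentences."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts

def extend(counts, phrase, steps, forward=True):
    """Greedily add up to `steps` words after (forward) or before (backward) a phrase.

    The phrase needs at least two words, since a trigram conditions on two.
    """
    phrase = list(phrase)
    for _ in range(steps):
        if forward:
            context = tuple(phrase[-2:])
            options = {t[2]: c for t, c in counts.items() if t[:2] == context}
        else:
            context = tuple(phrase[:2])
            options = {t[0]: c for t, c in counts.items() if t[1:] == context}
        if not options:
            break  # no observed continuation in this direction
        best = max(options, key=options.get)
        phrase = phrase + [best] if forward else [best] + phrase
    return phrase

corpus = [
    ["see", "you", "at", "the", "meeting"],
    ["see", "you", "at", "the", "office"],
    ["see", "you", "at", "the", "meeting", "today"],
]
counts = trigram_counts(corpus)
print(extend(counts, ["at", "the"], 2))                 # ['at', 'the', 'meeting', 'today']
print(extend(counts, ["at", "the"], 2, forward=False))  # ['see', 'you', 'at', 'the']
```

The same counts serve both directions: forward prediction looks up trigrams that start with the last two words, backward prediction looks up trigrams that end with the first two.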

"The phrase that is currently being entered can also be predicted, auto-completed or corrected. For example, a user chooses 'core' words from the desired text and enters these words. The phrase is then completed using just the entered core words. In one embodiment, the entered core words are displayed to the user in a suitable graphical user interface. The user selects an entered core word and indicates either a forward or backward direction from that core word for auto-completion of a phrase, for example using n-gram auto-completion. N-gram auto-completion auto-completes previous or subsequent text strings, e.g., characters, letters, words or phrases, depending on the indicated direction. In one embodiment, a plurality of candidate phrases or words are displayed, and the user scrolls through the plurality of displayed candidates, selecting one of the phrases or words. This process is repeated iteratively at each core word by selecting words to fill in the phrase and changing the direction of completion based on the context and expected behavior of the n-gram model.
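The core-word interaction can be pictured as a ranked candidate list per direction. The sketch below is one possible reading, assuming bigram counts, a top-k list standing in for the scrollable display, and user choices simulated in code; the function names and corpus are illustrative, not taken from the filing.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count adjacent word pairs in tokenized sentences."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts

def candidates(counts, core, direction, k=3):
    """Top-k words seen after ('forward') or before ('backward') a core word."""
    if direction == "forward":
        scored = Counter({b: c for (a, b), c in counts.items() if a == core})
    else:
        scored = Counter({a: c for (a, b), c in counts.items() if b == core})
    return [word for word, _ in scored.most_common(k)]

corpus = [["meet", "me", "at", "noon"], ["call", "me", "at", "noon"], ["meet", "me", "later"]]
counts = bigram_counts(corpus)

phrase = ["me"]  # core word entered by the user
print(candidates(counts, "me", "backward"))  # ['meet', 'call'] -> user picks 'meet'
phrase.insert(0, "meet")
print(candidates(counts, "me", "forward"))   # ['at', 'later'] -> user picks 'at'
phrase.append("at")
print(phrase)                                # ['meet', 'me', 'at']
```

Each pick becomes a new anchor, so repeating the loop with a chosen direction fills in the phrase step by step, as the summary describes.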

"In one embodiment, an n-gram model is used that starts with base probabilities for phrase completion derived from a bootstrap system. However, the n-gram probabilities dynamically adjust to a given user based on a vocabulary and phrase history associated with that user. In addition, the present invention allows for the user to add 'n-grams' into the dictionary based on commonly used phrases.
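One common way to let bootstrap probabilities "dynamically adjust" to a user is linear interpolation between a fixed base model and a model estimated from the user's own text. The application does not name its adaptation method, so the sketch below is an assumption: a bigram model, a hypothetical `user_weight` of 0.5, and an `add_phrase` method that covers both typed history and phrases the user adds to the dictionary.

```python
from collections import Counter

class AdaptiveBigramModel:
    """Mixes bootstrap bigram probabilities with a per-user bigram history."""

    def __init__(self, base_probs, user_weight=0.5):
        self.base = base_probs          # {(prev, word): P_base(word | prev)}
        self.pairs = Counter()          # user's bigram counts
        self.contexts = Counter()       # how often each word starts a user bigram
        self.user_weight = user_weight  # hypothetical mixing weight

    def add_phrase(self, text):
        """Record a phrase the user typed or explicitly added to the dictionary."""
        words = text.split()
        for prev, word in zip(words, words[1:]):
            self.pairs[(prev, word)] += 1
            self.contexts[prev] += 1

    def prob(self, prev, word):
        base = self.base.get((prev, word), 0.0)
        n = self.contexts[prev]
        if n == 0:
            return base  # no user history for this context: use the bootstrap value
        user = self.pairs[(prev, word)] / n
        return (1 - self.user_weight) * base + self.user_weight * user

model = AdaptiveBigramModel({("good", "morning"): 0.6, ("good", "luck"): 0.1})
print(model.prob("good", "luck"))     # 0.1 (bootstrap only)
model.add_phrase("good luck with the demo")
print(model.prob("good", "luck"))     # ≈ 0.55
print(model.prob("good", "morning"))  # 0.3
```

Falling back to the base value when the user has no history for a context keeps new users on the bootstrap model until their own data accumulates.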

"In one embodiment, a user types two words, word 1 and word 2, that have a high bigram score and are accepted by the system. The user then types a third word, word 3, such that the phrase word 1, word 2, word 3 has a relatively low 3-gram score. The system identifies a substitute word, substitute 1, that has a small Hamming distance to word 1; however, the 3-gram probability associated with the phrase substitute 1, word 2, word 3 is much higher than the 3-gram probability of the phrase word 1, word 2, word 3. Therefore, the input phrase is auto-corrected to the phrase substitute 1, word 2, word 3.
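The back-correction example can be sketched directly. Below, `p3` stands in for a trained 3-gram model (here a toy lookup table with made-up numbers), Hamming distance is computed only between equal-length words as its definition requires, and "much higher" is read as a hypothetical 10x ratio test. The thresholds, vocabulary and probabilities are assumptions for illustration.

```python
def hamming(a, b):
    """Count differing positions; Hamming distance is defined only for equal lengths."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def back_correct(phrase, vocab, p3, max_dist=1, ratio=10.0):
    """Replace word 1 with a close vocabulary word if that makes the 3-gram much more likely."""
    w1, w2, w3 = phrase
    original = p3(w1, w2, w3)
    best, best_p = tuple(phrase), original
    for cand in vocab:
        d = hamming(cand, w1)
        if d is None or d == 0 or d > max_dist:
            continue  # skip the word itself and anything too far away
        p = p3(cand, w2, w3)
        if p > best_p and p >= ratio * original:  # "much higher": a hypothetical ratio test
            best, best_p = (cand, w2, w3), p
    return best

# Toy 3-gram probabilities (illustrative numbers, not from the filing).
P3 = {("heat", "the", "oven"): 0.4, ("hear", "the", "music"): 0.3, ("heat", "the", "music"): 0.001}

def p3(a, b, c):
    return P3.get((a, b, c), 0.0)

vocab = ["hear", "heat", "head", "beat", "heart"]
print(back_correct(("heat", "the", "music"), vocab, p3))  # ('hear', 'the', 'music')
print(back_correct(("heat", "the", "oven"), vocab, p3))   # ('heat', 'the', 'oven')
```

Here "heat the" is a plausible bigram that the system would accept, and only the arrival of "music" reveals that word 1 was probably mistyped.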

"In one embodiment, back auto-correction, i.e., correction of preceding text strings including words and phrases, is done using topics that are identified dynamically. As the user enters words or phrases, a given categorical topic is identified that relates to the entered words, i.e., the words relate to a given subject. The suggested phrases and n-grams are adapted to the categorical topic. Candidate text strings are selected and suggestions are made that relate to this topic. The present invention also provides for the entry of words and phrases in different languages or a mix of different languages as well as the translation of phrases among different languages. For example, when multi-lingual users mix multiple languages when typing, the present invention performs auto-correction and prediction using words from different languages. In addition to spelling and content correction, the present invention provides for translation of a phrase entered in a first language into a second language. The combination of the modeling and the user interface allows a user to type a message in one language and view the output in a different language, making changes as the text strings are entered.
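The application says topics are "identified dynamically" without specifying how. As a stand-in, the sketch below assumes small hand-written topic lexicons, picks the topic with the greatest word overlap, and multiplies the scores of on-topic correction candidates by a hypothetical `boost` factor. A production system would more likely use a trained classifier and topic-specific n-gram models.

```python
TOPIC_LEXICONS = {  # hypothetical topic word lists
    "sports": {"game", "score", "team", "match"},
    "cooking": {"oven", "bake", "recipe", "flour"},
}

def identify_topic(words):
    """Pick the topic whose lexicon overlaps most with the entered words (None if no overlap)."""
    overlap = {topic: len(lex & set(words)) for topic, lex in TOPIC_LEXICONS.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None

def rank_candidates(candidates, topic, boost=2.0):
    """Re-sort (word, probability) pairs, multiplying on-topic words by `boost`."""
    lex = TOPIC_LEXICONS.get(topic, set())
    return sorted(candidates, key=lambda wp: wp[1] * (boost if wp[0] in lex else 1.0), reverse=True)

entered = ["preheat", "the", "oven", "and", "bake", "with", "flout"]
topic = identify_topic(entered)
print(topic)  # cooking
print(rank_candidates([("float", 0.3), ("flour", 0.2), ("flout", 0.05)], topic))
# [('flour', 0.2), ('float', 0.3), ('flout', 0.05)]
```

Without the topic signal, "float" would outrank "flour" as the correction for the typo "flout"; the cooking context reverses that order.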

"In one embodiment, systems and methods in accordance with the present invention are extended to correcting entire sentences and paragraphs. A template is provided to users, and the users fill in certain sentences and phrases in the template. Then the same n-gram model evaluates the parts of the template that have been filled and uses this evaluation to hypothesize parts of phrases and even sentences that have not been written, completing the sentences, paragraphs and expressions of the users. For an example involving the writing of a letter of recommendation for a candidate, the template asks for certain adjectives to describe the candidate, e.g., smart, hard-working and creative, and uses these words to create a paragraph describing these qualities of the candidate in more detail. In one embodiment, the language model is narrowed to a domain for a given topic related to a form that a user is completing, and the method uses the entered information to complete the form.


"FIG. 1 is a schematic representation of an embodiment of a system for text auto-correction in accordance with the present invention;

"FIG. 2 is a flow chart of an embodiment of a method for text auto-correction in accordance with the present invention;

"FIG. 3 is a schematic representation of an embodiment of a graphical user interface for input text string completion in accordance with the present invention."

For the URL and more information on this patent application, see: Caskey, Sasha P.; Kanevsky, Dimitri; Kozloski, James R.; Sainath, Tara N. Text Auto-Correction via N-Grams. Filed July 9, 2012 and posted January 16, 2014. Patent URL:

Keywords for this news article include: International Business Machines Corporation.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC


Source: Politics & Government Week
