"While speech synthesis systems are useful in reading back text for a known language, speech synthesis becomes more problematic when dealing with text messages that include slang terms, abbreviations, and other non-standard words used in text messages. The speech synthesis systems rely on a model that maps known words to an audio model for speech synthesis. When synthesizing unknown words, many speech synthesis systems fall back to imperfect phonetic approximations of words, or spell out words letter-by-letter. In these conditions, the output of the speech synthesis system does not follow the expected flow of normal speech, and the speech synthesis system can become a distraction. Other text processing systems, including language translation systems and natural language processing systems, may have similar problems when text messages include non-standard spellings and word forms.
"While existing dictionaries may provide translations for common slang terms and abbreviations, the variety of alternative spellings and constructions of standard words that are used in text messages is too broad to be accommodated by a dictionary compiled from standard sources. Additionally, portable electronic device users are continually forming new variations on existing words that could not be available in a standard dictionary. Moreover, the mapping from standard words to their nonstandard variations is many-to-many, that is, a nonstandard variation may correspond to different standard word forms and vice versa. Consequently, systems and methods for predicting variations of standard words to enable normalization of alternative word forms to standard dictionary words would be beneficial."
In addition to the background information, VerticalNews reporters obtained the inventors' summary for this patent application: "In one embodiment, a method for generating non-standard tokens from a standard token stored in a memory has been developed. The method includes selecting a standard token from a plurality of standard tokens stored in the memory, the selected token having a plurality of input characters, selecting an operation from a plurality of predetermined operations in accordance with a random field model for each input character in the plurality of input characters, performing the selected operation on each input character to generate an output token that is different from each token in the plurality of standard tokens, and storing the output token in the memory in association with the selected token.
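As a rough illustration of the token-generation embodiment described above, the following Python sketch applies one operation per input character and keeps only outputs that are not already standard tokens. It is not the inventors' implementation: the vocabulary, the two operations ("keep" and "delete"), and the hand-set weights in `operation_weights` are all assumptions, and the weight table merely stands in for a trained random field model.

```python
import random

# Hypothetical standard vocabulary (stands in for the tokens stored in memory).
STANDARD_TOKENS = {"tomorrow", "great", "please", "before"}
VOWELS = set("aeiou")


def operation_weights(token, i):
    """Toy stand-in for the random field model (an assumption).

    Returns a weight for each candidate operation on character i.
    A trained model would derive these scores from learned features.
    """
    if i == 0:
        return {"keep": 1.0}  # never drop the first character
    if token[i] in VOWELS:
        return {"keep": 0.5, "delete": 0.5}
    return {"keep": 0.9, "delete": 0.1}


def generate_variant(token, rng):
    """Select and perform one operation per input character."""
    out = []
    for i, ch in enumerate(token):
        weights = operation_weights(token, i)
        ops = list(weights)
        op = rng.choices(ops, weights=[weights[o] for o in ops])[0]
        if op == "keep":
            out.append(ch)
        # "delete" contributes nothing to the output token
    return "".join(out)


def generate_nonstandard_tokens(token, n_attempts=50, seed=0):
    """Collect output tokens that differ from every standard token."""
    rng = random.Random(seed)
    variants = set()
    for _ in range(n_attempts):
        candidate = generate_variant(token, rng)
        if candidate not in STANDARD_TOKENS:
            variants.add(candidate)
    return variants


# Store each output token in association with the standard token it came from.
variant_table = {t: generate_nonstandard_tokens(t) for t in STANDARD_TOKENS}
```

Because the mapping between standard and non-standard forms is many-to-many, a fuller version would probably also index the table in the other direction, from each variant back to every standard token that can produce it.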
"In another embodiment, a method for generating operational parameters for use in a random field model has been developed. The method includes comparing each token in a first plurality of tokens stored in a memory to a plurality of standard tokens stored in the memory, identifying a first token in the first plurality of tokens as a non-standard token in response to the first token being different from each standard token in the plurality of standard tokens, identifying a second token in the first plurality of tokens as a context token in response to the second token providing contextual information for the first token, generating a database query including the first token and the second token, querying a database with the generated query, identifying a result token corresponding to the first token from a result obtained from the database, and storing the result token in association with the first token in a memory.