
Patent Issued for Methods and Apparatus for Formant-Based Voice Synthesis

May 6, 2014



By a News Reporter-Staff News Editor at Journal of Technology -- According to news reporting originating from Alexandria, Virginia, by VerticalNews journalists, a patent by the inventors Edgington, Michael D. (Bridgewater, MA); Gillick, Laurence (Newton, MA); Cohen, Jordan R. (Gloucester, MA), filed on February 27, 2013, was published online on April 22, 2014.

The assignee for this patent, patent number 8706488, is Nuance Communications, Inc. (Burlington, MA).

Reporters obtained the following quote from the background information supplied by the inventors: "Speech synthesis is a growing technology with applications in areas that include, but are not limited to, automated directory services, automated help desks and technology support infrastructure, human/computer interfaces, etc. Speech synthesis typically involves the production of electronic signals that, when broadcast, mimic human speech and are intelligible to a human listener or recipient. For example, in a typical text-to-speech application, text to be converted to speech is parsed into labeled phonemes which are then described by appropriately composed signals that drive an acoustic output, such as one or more resonators coupled to a speaker or other device capable of broadcasting sound waves.

"Speech synthesis can be broadly categorized as using either concatenative or formant-based methods to generate synthesized speech. In concatenative approaches, speech is formed by appropriately concatenating pre-recorded voice fragments together, where each fragment may be a phoneme or other sound component of the target speech. One advantage of concatenative approaches is that, since it uses actual recordings of human speakers, it is relatively simple to synthesize natural sounding speech. However, the library of pre-recorded speech fragments needed to synthesize speech in a general manner requires relatively large amounts of storage, limiting application of concatenative approaches to systems that can tolerate a relatively large footprint, and/or systems that are not otherwise resource limited. In addition, there may be perceptual artifacts at transitions between speech fragments.

"Formant-based approaches achieve voice synthesis by generating a model configured to build a speech signal using a relatively compact description or language that employs at least speech formants as a basis for the description. The model may, for example, consider the physical processes that occur in the human vocal tract when an individual speaks. To configure or train the model, recorded speech of known content may be parsed and analyzed to extract the speech formants in the signal. The term formant refers herein to certain resonant frequencies of speech. Speech formants are related to the physical processes of resonance in a substantially tubular vocal tract. The formants in a speech signal, and particularly the first three resonant frequencies, have been identified as being closely linked to, and characteristic of, the phonetic significance of sounds in human speech. As a result, a model may incorporate rules about how one or more formants should transition over time to mimic the desired sounds of the speech being synthesized.

"Generally speaking, there are at least two phases to formant-based speech synthesis: 1) generating a speech synthesis model capable of producing a formant tract characteristic of target speech; and 2) speech production. Generating the speech synthesis model may include analyzing recorded speech signals, extracting formants from the speech signals and using knowledge gleaned from this information to train the model. Speech production generally involves using the trained speech synthesis model to generate the phonetic descriptions of the target speech, for example, generating an appropriate formant tract, and converting the description (e.g., via resonators) to an acoustic signal comprehensible to a human listener."
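As a rough illustration of the speech production phase described in the background, the following Python sketch drives a cascade of digital resonators with an impulse train. It is not taken from the patent. The function names, the 17 cm uniform-tube approximation of the vocal tract, the 80 Hz bandwidth and the 100 Hz pitch are all assumptions chosen for illustration. The resonator uses the standard second-order recurrence common in formant synthesizers.

```python
import math


def tube_formants(length_m=0.17, speed_of_sound=343.0, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L).

    A textbook approximation of a neutral vocal tract (assumed here, not from the patent).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants_hz, pitch_hz=100.0, sample_rate=16000,
                     duration_s=0.05, bandwidth_hz=80.0):
    """Pass an impulse train (a crude glottal source) through one resonator per formant."""
    n = int(sample_rate * duration_s)
    period = int(round(sample_rate / pitch_hz))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f in formants_hz:
        signal = resonator(signal, f, bandwidth_hz, sample_rate)
    return signal


formants = tube_formants()             # roughly 504, 1513 and 2522 Hz
samples = synthesize_vowel(formants)   # 50 ms of a neutral, vowel-like sound
```

In a full system the formant frequencies would vary over time according to the trained model rather than staying fixed as they do in this sketch.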

In addition to obtaining background information on this patent, VerticalNews editors also obtained the inventors' summary information for this patent: "One embodiment according to the present invention includes a method of processing a voice signal to extract information to facilitate training a speech synthesis model, the method comprising acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison.

"Another embodiment according to the present invention includes a computer readable medium encoded with a program for execution on at least one processor, the program, when executed on the at least one processor, performing a method of processing a voice signal to extract information from the voice signal to facilitate training a speech synthesis model, the method comprising acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison.

"Another embodiment according to the present invention includes a computer readable medium encoded with a speech synthesis model adapted to, when operating, generate human recognizable speech, the speech synthesis model trained to generate the human recognizable speech, at least in part, by performing acts of detecting a plurality of candidate features in the voice signal, performing a comparison between combinations of the candidate features and the voice signal, and selecting a desired set of features from the candidate features based, at least in part, on the comparison."
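The summary does not say which features are detected or how the comparison is made. Purely as a hedged sketch, the Python below assumes the candidate features are spectral peaks, that each combination is compared with the signal by the squared error between a modeled spectrum and the observed one, and that the lowest-error combination is selected. Every function name, the bell-shaped peak model and the toy spectrum are hypothetical and are not drawn from the patent.

```python
import itertools


def resonance_curve(freqs, center_hz, bandwidth_hz=100.0):
    """Bell-shaped peak used as a stand-in for the spectral shape of one formant."""
    return [1.0 / (1.0 + ((f - center_hz) / bandwidth_hz) ** 2) for f in freqs]


def model_spectrum(freqs, centers):
    """Spectrum predicted by a combination of candidate peaks (sum of their curves)."""
    curves = [resonance_curve(freqs, c) for c in centers]
    return [sum(values) for values in zip(*curves)]


def detect_candidates(freqs, spectrum):
    """Detect candidate features: every local maximum of the magnitude spectrum."""
    return [freqs[i] for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]]


def select_features(freqs, spectrum, candidates, set_size=3):
    """Compare each combination of candidates with the signal; keep the best fit."""
    best, best_error = None, float("inf")
    for combo in itertools.combinations(candidates, set_size):
        model = model_spectrum(freqs, combo)
        error = sum((m - s) ** 2 for m, s in zip(model, spectrum))
        if error < best_error:
            best, best_error = combo, error
    return best, best_error


# Toy "voice signal" spectrum: three strong peaks plus one weak spurious peak.
freqs = list(range(0, 4000, 50))
clean = model_spectrum(freqs, [500, 1500, 2500])
spurious = resonance_curve(freqs, 3300, bandwidth_hz=60.0)
spectrum = [c + 0.3 * s for c, s in zip(clean, spurious)]

candidates = detect_candidates(freqs, spectrum)
selected, error = select_features(freqs, spectrum, candidates)
```

On this toy input the detector proposes four candidates, and the comparison step rejects the weak peak at 3300 Hz because any combination that includes it fits the observed spectrum worse than the three strong peaks do.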

For more information, see this patent: Edgington, Michael D.; Gillick, Laurence; Cohen, Jordan R. Methods and Apparatus for Formant-Based Voice Synthesis. U.S. Patent Number 8706488, filed February 27, 2013, and published online on April 22, 2014. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=30&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1493&f=G&l=50&co1=AND&d=PTXT&s1=20140422.PD.&OS=ISD/20140422&RS=ISD/20140422

Keywords for this news article include: Technology, Nuance Communications Inc.

Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC





Source: Journal of Technology

