<!-- X-URL: http://www.gnu.org/software/gnuspeech/gnuspeech.html --> <!-- <BASE HREF="http://www.gnu.org/software/gnuspeech/gnuspeech.html"> --> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <HTML> <HEAD> <TITLE>Gnuspeech - GNU Project - Free Software Foundation (FSF)</TITLE> <LINK REV="made" HREF="mailto:webmasters@www.gnu.org"> <META NAME="keywords" CONTENT="gnuspeech articulatory speech synthesis tube model distinctive region formant sensitivity"> <meta http-equiv="content-type" content='text/html; charset=us-ascii'> </HEAD> <BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#1F00FF" ALINK="#FF0000" VLINK="#9900DD"> <A HREF="http://www.gnu.org"><IMG SRC="http://www.gnu.org/graphics/gnu-head-sm.jpg" ALT=" [image of the Head of a GNU] " WIDTH="129" HEIGHT="122"></A> <P> <H1><A NAME="TOC"><I>gnuspeech</I></A></H1> <OL> <LI><A NAME="TOCWhatis" HREF="#Whatis">What is <I>gnuspeech</I>?</A> <LI><A NAME="TOCGoal" HREF="#Goal">What is the goal of the <I>gnuspeech</I> project?</A> <LI><A NAME="TOCReleases" HREF="#Releases">Releases</A> <UL> <LI><A NAME="TOCDevelopment" HREF="#Development"> Development &amp; &#8220;Coming Soon&#8221;</A> <!-- <LI><A NAME="TOCStable" HREF="#Stable">Current Stable Release</A> <LI><A NAME="TOCHistory" HREF="#History">Release History</A> --> </UL> <!-- <LI><A NAME="TOCPlatforms" HREF="#Platforms">Supported Platforms</A> --> <LI><A NAME="TOCWhy" HREF="#Why">Why is it called gnuspeech?</A> <LI><A NAME="TOCObtaining" HREF="#Obtaining">Obtaining gnuspeech</A> <LI><A NAME="TOCHelp" HREF="#Help">Getting help with gnuspeech</A> <UL> <LI><A NAME="TOCManuals" HREF="#Manuals">Manuals</A> <!-- <LI><A NAME="TOCFAQ" HREF="#HelpFAQ">FAQ</A> <LI><A NAME="TOCHelpMailing" HREF="#HelpMailing">Mailing Lists</A> <LI><A NAME="TOCHelpUsenet" HREF="#HelpUsenet">Usenet</A> --> </UL> <LI><A NAME="TOCFindingPackages" HREF="#FindingPackages">Finding additional packages for <I>gnuspeech</I></A> <LI><A NAME="TOCFurther" HREF="#Further">Further information</A> <LI><A 
NAME="TOCYouHelp" HREF="#YouHelp"> If you want to help with <I>gnuspeech</I></A> <LI><A NAME="TOCCredits" HREF="#Credits"> Those who have helped develop and port <I>gnuspeech</I></A> </OL> <BR> <HR> <H2><A NAME="Whatis" HREF="#TOCWhatis">What is <I>gnuspeech</I>?</A></H2> <P> <B>gnuspeech</B> makes it easy to produce high-quality computer speech output, design new language databases, and create controlled speech stimuli for psychophysical experiments. </P> <P> The suite of programs uses a true articulatory model of the vocal tract and incorporates models of English rhythm and intonation based on extensive research that set a new standard for synthetic speech. </P> <P> The original <I>NeXT</I> computer implementation is complete. The ports to both OS X and GNU/Linux provide English text-to-speech capability, but parts of the database creation tools are still being ported. </P> <P> The research that provides the foundation of the system was carried out in research departments in France, Sweden, Poland, and Canada and is ongoing. The original system was commercialised by a now-liquidated University of Calgary spin-off company, <I>Trillium Sound Research Inc</I>. All the software has subsequently been donated by its creators to the <I>Free Software Foundation</I>, forming the basis of the GNU Project <I>gnuspeech</I>. It is freely available under a General Public Licence, as described herein. </P> <P> The features of <I>gnuspeech</I>, and the tools that are part of the software suite, include: <UL> <LI><I>A Tube Resonance Model</I> (<I>TRM</I>) for the human vocal tract (also known as a transmission-line analog, or a waveguide model) that truly represents the physical properties of the tract, including the energy balance between the nasal and oral cavities as well as the radiation impedance at lips and nose. 
<LI>A <I>TRM Control Model</I>, based on formant sensitivity analysis, that provides a simple but accurate method of low-level articulatory control and avoids the need for an excessive number of control parameters (only eight varying tube section radii need be specified, an approach based on Carr&eacute;'s <I>Distinctive Region Model</I> or <I>DRM</I>). Additionally, a vocal fold waveform and various suitably &#8220;coloured&#8221; noises may be injected at the appropriate restrictions in the tract model (when they occur) to emulate voicing, aspiration, frication and noise bursts. The model, which together with the TRM is the heart of the <I>TextToSpeech Server</I>, is based on research at KTH in Stockholm, LCTI (ENST) in Paris, and the University of Calgary. <LI><I>Databases</I> specifying the characteristics of the articulatory postures (which loosely correspond to <I>phonemes</I>), rules for combinations of postures, and information about voicing, frication and aspiration, as required to produce specific spoken languages from an augmented phonemic input. Currently, only the database for the English language exists, although it also includes French vowels. <LI>A text-to-augmented-phonetics conversion module (the <I>Parser</I>) to convert arbitrary text, preferably incorporating normal punctuation, into the form required for applying the synthesis methods. The <I>Parser</I> is an integral component of the <I>TextToSpeech Server</I> but can also be used separately (it is, for example, also built into the current <I>Monet</I> database editor). <LI><I>Models of English rhythm and intonation</I> based on research at IPO in The Netherlands, the University of Essex (UK) and the University of Calgary. 
<LI><I>&#8220;Monet&#8221;</I>: a database creation and editing system with a carefully designed graphical user interface (GUI) that allows the databases containing the necessary phonetic data and dynamic rules to be set up and modified so that the computer can &#8220;speak&#8221; arbitrary languages. At present, as noted, only the English database exists. <LI>A 70,000+ word English <I>Pronouncing Dictionary</I> with rules for derivatives such as plurals and adverbs. The dictionary also provides part-of-speech information for later addition of grammatical parsing and includes 6000 given names. <LI>Sub-dictionaries that allow different user- or application-specific pronunciations to be substituted for the default pronunciations coming from the main dictionary. <LI>Letter-to-sound rules to deal with spellings and words that are not in the dictionaries. <LI>Tools for managing the dictionary and carrying out analysis of speech. <LI><I>&#8220;Synthesizer&#8221;</I>: a GUI-based application to allow experimentation with a stand-alone <I>TRM</I>. All parameters, both static and dynamic, may be varied and the output can be monitored and analysed. It is an important component in the research needed to create the databases for target languages. </UL> <A NAME="diagram"></A> <CENTER> <!-- <IMG SRC="http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/software/gnuspeech/tts-block-diagram.png?rev=HEAD&cvsroot=www.gnu.org&content-type=image/jpeg"> <IMG SRC="http://www.gnu.org/software/gnuspeech/tts-block-diagram.png"/> --> <IMG SRC="./tts-block-diagram.png" WIDTH="800" HEIGHT="364"/> <BR> <BR> <H2> Overview of the main Articulatory Speech Synthesis System </H2> </CENTER> <P> More detailed information on the components noted above appears in what follows, with an indication of their current state in the OS X and GNU/Linux GNUStep ports, their origin, and some suggestions for further work. 
</P> <H2><A NAME="Why" HREF="#TOCWhy">Why is it called <I>gnuspeech</I>?</A></H2> <P> It is a play on words. This is a new (g-nu) &#8220;event-based&#8221; approach to speech synthesis from text that uses an accurate articulatory model rather than a formant-based approximation. It is also a GNU project, aimed at providing high-quality text-to-speech output for GNU/Linux (and Mac OS X). In addition, it provides comprehensive tools for psychophysical and linguistic experiments as well as for creating the databases for arbitrary languages. </P> <H2><A NAME="Goal" HREF="#TOCGoal">What is the goal of the <I>gnuspeech</I> project?</A></H2> <P> The goal of the project is to create the best speech synthesis software on the planet. </P> <H2><A NAME="Releases" HREF="#TOCReleases">Releases</A></H2> <P> <B>gnuspeech</B> is currently under development. It is being ported from an original NeXTSTEP 3.x version. Pre-release versions for both GNU/Linux (under GNUStep) and Mac OS X are available in the project SVN repository, as is the complete original NeXTSTEP version. Both ports provide text-to-speech capability, and some of the database creation tools (such as <I>Synthesizer</I>) can be used as intended, but work remains to be done to complete the database creation components of <I>Monet</I> that are needed for psychophysical/linguistic experiments and for setting up new languages. An official release is expected &#8220;real soon now&#8221;. The SVN Repository material may soon be migrated to a Git Repository instead. Stay tuned. </P> <H2><A NAME="Development" HREF="#TOCDevelopment"> Development &amp; &#8220;Coming Soon&#8221;</A></H2> <P> It would be very helpful if those obtaining and using the pre-release material would also join the mailing list (as explained below) to provide feedback, ask questions, and so on. 
</P> <P> Those willing to help with the project are invited to contact the authors/developers through the <A HREF="http://savannah.gnu.org/projects/gnuspeech" TARGET="gnu2">gnu project facilities</A>. Both helpers and users can join the project mailing list by visiting <A HREF="http://mail.gnu.org/mailman/listinfo/gnuspeech-contact" TARGET="gnu3">the subscription page</A>, and can then send mail to the group. Offers of help receive special attention! </P> <H2><A NAME="Project State" HREF="#TOCState">A brief technical history of <I>gnuspeech</I>, incorporating the current state</A></H2> <P> The <A HREF="./project-history.html" TARGET="history">project history</A>, explaining the components, is presented on a separate page to reduce clutter. </P> <P> In summary, much of the core software has been or is being ported to the Mac under OS X, and to GNU/Linux under GNUStep. All sources and builds are currently in the SVN repository under three branches (for the NeXT, Mac OS X, and GNU/Linux under GNUStep versions; see below). Speech may be produced from input text. The development facilities for managing and creating new language databases, or modifying the existing English database for text-to-speech, are incomplete but coming along. These facilities also provide the tools needed for psychophysical and linguistic experiments. <I>Synthesizer</I>, which gives direct access to the tube model, is about 70% complete and already usefully functional, although some of the data displays are stubs at present and clean-up is needed. Some accessory tools are available. In addition to those credited below, Greg Casamento, Adam Fedor and the Savannah Hackers provided valuable support in getting the <I>gnuspeech</I> project established, as well as initial work that facilitated the port, including making ubiquitous and tedious changes to the entire <I>NeXT</I> source code to bring it up to <I>OpenStep</I> standards. This work and support is gratefully acknowledged. 
It involved a lot of effort but is largely invisible to all but the developers involved, and made the actual port to OS X and GNUStep much less painful. </P> <H2><A NAME="Obtaining" HREF="#TOCObtaining">Obtaining gnuspeech</A></H2> <P> <I>gnuspeech</I> is currently fully available as a NeXTSTEP 3.x version, and partly available (with working text-to-speech) as a version that compiles for both Mac OS X and GNU/Linux under GNUStep. Additionally, OS X .dmg files that can be directly installed and run are available. These files are held in the Subversion Repository (<I>not</I> the CVS Repository) on the <A HREF="https://savannah.gnu.org/projects/gnuspeech">Savannah web page for the project</A>, under &#8220;-Browse Sources Repository&#8221; in the &#8220;Development Tools&#8221; section. The material is organised according to the three branches previously mentioned (gnustep/, nextstep/ and osx/). The material may soon be migrated to an easier-to-use Git Repository. Stay tuned. </P> <H2><A NAME="Help" HREF="#TOCHelp">Getting Help with gnuspeech</A></H2> <P> Developers should contact the authors/developers through the <A HREF="http://savannah.gnu.org/projects/gnuspeech" TARGET="gnu2">gnu project facilities</A>. To join the project mailing list, you can go directly to <A HREF="http://mail.gnu.org/mailman/listinfo/gnuspeech-contact" TARGET="gnu3">the subscription page</A>. Papers and manuals are available on-line (see below). </P> <H2><A NAME="Manuals" HREF="#TOCManuals">Manuals and papers</A></H2> <P> A number of papers and manuals relevant to gnuspeech exist: </P> <UL> <LI><A HREF="http://www.cpsc.ucalgary.ca/~hill/papers/monman/index.html" TARGET="new1">The <I>&#8220;Monet&#8221;</I> manual</A> provides a detailed view of the facilities and screens associated with the MONET subsystem, but does not describe the MONET engine that is used for real-time construction of parameters. 
<LI><A HREF="http://www.cpsc.ucalgary.ca/~hill/papers/synthesizer/index.html" TARGET="new2">The <I>&#8220;Synthesizer&#8221;</I> manual</A> provides a detailed view of the interactive application that allows access to all the parameters and facilities of the Tube Resonance Model (TRM) synthesiser as a learning tool and a development tool. <LI><A HREF="http://www.cpsc.ucalgary.ca/~hill/papers/avios95/index.htm" TARGET="new3">A paper presented at the American Voice I/O Society conference</A> in 1995 provides a reasonably detailed explanation of the theory underlying the tube resonance model. <LI><A HREF="http://www.cpsc.ucalgary.ca/~hill/papers/conc/index.htm" TARGET="new4">A heavily cross-referenced &#8220;conceptionary&#8221;</A> is available to provide access to some of the background terms and research in the relevant scientific fields. <LI><A HREF="http://www.cpsc.ucalgary.ca/~hill/papers/pronguid.htm" TARGET="new5">A guide to the pronunciation notation used in the text-to-speech work</A> shows the relationship between standard forms (IPA, Webster's) and the ASCII-friendly form used in gnuspeech, with examples of actual pronunciations. <LI><A HREF="./trm-write-up.pdf" TARGET="new6">The Tube Resonance Model</A>, a write-up of the waveguide model of the acoustic tubes that form the underlying model of the human vocal apparatus. <LI><A HREF="http://pages.cpsc.ucalgary.ca/~hill/gnuspeech/gnuspeech-index.htm" TARGET="new7" >Additional material, including sound files</A>, is also available on Professor Hill's university web site. <LI><A HREF="http://pages.cpsc.ucalgary.ca/~hill/papers/index.htm" TARGET="new8"> Papers</A> related to the research that has led to <I>gnuspeech</I> are also collected on Professor Hill's university web site. These include the development of the &#8220;event-based&#8221; approach to speech synthesis, which is also applicable to speech recognition. 
</UL> <P> Examples of papers by other researchers that we drew on in developing <I>gnuspeech</I> include: <UL> <LI>Carr&eacute;, R. and Mrayati, M. (1992) Distinctive regions in acoustic tubes. Speech production modelling. <I>J. Acoustique</I> <B>5</B>, 141-151 <LI>Fant, G. &amp; Pauli, S. (1974) Spatial characteristics of vocal tract resonance models. <I>Proc. Stockholm Speech Communication Seminar</I>, KTH, Stockholm, Sweden. <LI>Smith, J.O. (1992) Physical modelling using digital waveguides. <I>Computer Music Journal</I>, <B>16</B> (4) 74-91 <LI>Cook, P.R. (1989) Synthesis of the singing voice using a physically parameterised model of the human vocal tract. <I>International Computer Music Conference</I>, Columbus, Ohio. <LI>Liberman, A.M., Ingemann, F., Lisker, L., Delattre, P. &amp; Cooper, F.S. (1959) Minimal rules for synthesising speech. <I>J. Acoust. Soc. Amer.</I> <B>31</B> (11), 1490-1499, Nov <LI>Wells, J.C. (1963) A study of the formants of the pure vowels of British English, <I>Progress report for July</I>, University College, London. </UL> <P> but there are far too many to list, and further papers may be found in the citations incorporated in the relevant papers listed on David Hill's <A HREF="http://pages.cpsc.ucalgary.ca/~hill" TARGET="papers">university web site</A>. </P> <H2><A NAME="FindingPackages" HREF="#TOCFindingPackages">Finding packages for gnuspeech</A> </H2> <P> Mac OS X and GNUStep versions of <I>Monet</I>, <I>Synthesizer</I> and the <I>TextToSpeech Server</I>, including the <I>Parser</I>, dictionary, and other components needed to allow arbitrary English text to be changed to spoken output, are all available in source form and, for Mac OS X, as a .dmg file. As there is not yet an official &#8220;release&#8221;, check out <A HREF="http://savannah.gnu.org/projects/gnuspeech" TARGET="gnu4">the Savannah SVN repository</A>, opening it by clicking &#8220;-Browse Sources Repository&#8221; under &#8220;Development Tools&#8221;. 
Note: this is <I>not</I> the CVS Repository, and <I>not</I> the &#8220;Source Code Manager&#8221; for the Subversion Repository. &#8220;Anonymous&#8221; checkout can then be executed (the SVN commands are conveniently provided on the various pages accessed). The current working revision (as of April 18th 2012) is &#8220;669&#8221;. An official release should be available soon, but will be very similar to this current pre-release software. There are plans to migrate all the material to a new Git repository in the near future. </P> <P> The original <I>NeXT</I> User and Developer Kits are also available, and are complete, but do not run under OS X or under GNUStep on GNU/Linux. They also suffer from the limitations of a slow machine, so that shorter <I>TRM</I> lengths cannot be used. To activate the <I>NeXT</I> kits, choose any password from the very large selection provided in the file &#8220;nextstep / trunk / priv / SerialNumbers&#8221;, such as &#8220;bb976d4a&#8221; for User 26 or &#8220;ebe22748&#8221; for Dev 15. In fact, you can use these passwords. But you need a <I>NeXT</I> computer, of course (try <A HREF="http://www.blackholeinc.com">Black Hole, Inc.</A> if you'd like one). </P> <H2><A NAME="Further" HREF="#TOCFurther">Further information?</A></H2> <P> See the section on <A HREF="#Manuals">Manuals and papers</A>. </P> <H2><A NAME="YouHelp" HREF="#TOCYouHelp">How to help with gnuspeech</A></H2> <P> To contact the maintainers of gnuspeech, report a bug, contribute fixes or improvements, join the development team, or join the <I>gnuspeech</I> mailing list, please visit <A HREF="http://savannah.gnu.org/projects/gnuspeech" TARGET="gnu4">the <I>gnuspeech</I> project page</A> and use the facilities provided. The mailing list can be accessed under the section &#8220;Communication Tools&#8221;. To help with the project work you can also contact <A HREF="mailto:hilld@ucalgary.ca">Professor David Hill</A> directly. 
<P> <H2><A NAME="Credits" HREF="#TOCCredits">Thanks to those who have helped</A></H2> <P> Many people have contributed to the work that has resulted in <I>gnuspeech</I>, either directly on the project, or indirectly through relevant research. The latter appear in the citations to the papers referenced above. Of particular note are Perry Cook &amp; Julius Smith (Center for Computer Research in Music and Acoustics), for the waveguide model and the DSP Music Kit, and Ren&#233; Carr&#233; (at the D&#233;partement Signal, &#201;cole Nationale Sup&#233;rieure des T&#233;l&#233;communications in Paris). The original system was created over several years from 1990 to 1995 by the University of Calgary technology-transfer spin-off company <I>Trillium Sound Research Inc.</I>, founded by David Hill, Leonard Manzara and Craig Schock at Leonard's suggestion. The work was mainly performed by the following: <UL> <LI>David Hill designed the event-based approach to speech synthesis. He compiled the pronunciation dictionary, following initial work by Adam Rostis, ported <I>Synthesizer</I> to Mac OS X, and ran the project. <LI> Craig Schock designed and developed <I>Monet</I> to create the databases needed to re-implement David Hill's event-based approach to speech synthesis. He created dictionary creation tools. He wrote <I>WhosOnFirst</I>, the "say" command line tool, the <I>Speech Manager</I>,... and was the project software architect. <LI> Leonard Manzara wrote the "C" implementation of the tube model that forms the acoustic core of the synthesis system, and then re-implemented it on the DSP56000 to make it run in real time. He created the original <I>Synthesizer</I> app for the NeXT. He wrote <I>BigMouth</I> to add speech as a service. <LI> Vince Demarco and David Marwood wrote the original <I>PrEditor</I>. <LI> Eric Zoerner did an initial port of <I>PrEditor</I> to Mac OS X. 
<LI> Michael Forbes refactored <I>PrEditor</I>. <LI> The Savannah hackers set up the original GNU project files, including the CVS repository. <LI> Adam Fedor worked through the original <I>NeXTSTEP</I> source code to bring it to OpenStep standards. <LI> Steven Nygard provided the major effort needed to port the original NeXTSTEP version of <I>Monet</I> and related items to Mac OS X, adding to the original CVS repository material in the process. <LI> Dalmazio Brisinda took over from Steven and extended the Mac OS X port of <I>gnuspeech</I> modules, including integrating the parser and migrating all the material to the current SVN repository, reorganizing it in the process to make it easier to manage and access. <LI> Marcelo Matuda worked with Dalmazio to produce the first port to GNU/Linux <I>GNUStep</I>. </UL> <HR> <P> Return to <A HREF="http://www.gnu.org">GNU's home page</A>. </P> <P> Please send FSF &amp; GNU inquiries &amp; questions to </P> <P> <A HREF="mailto:gnu@gnu.org"><EM>gnu@gnu.org</EM></A>. </P> <P> David Hill is responsible for writing this <I>gnuspeech</I> page. Thanks to Steve Nygard for his helpful criticisms. </P> <P> Please send comments on these web pages to <A HREF="mailto:webmasters@www.gnu.org"><EM>webmasters@www.gnu.org</EM></A>, and other questions to <A HREF="mailto:gnu@gnu.org"><EM>gnu@gnu.org</EM></A>. </P> <P> Copyright (C) 1998, 2001 Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111, USA </P> <P> Verbatim copying and distribution of this article <I>in its entirety</I> is permitted in any medium, provided this copyright notice is preserved. </P> <HR> <P> Page originally created in the mists of time (2004?) </P> <address></address> <!-- hhmts start -->Last modified: Sat May 12 18:48:31 PDT 2012 <!-- hhmts end --> </body> </html>