Subject: Re: connection between lisp and speech recognition?
From: rpw3@rpw3.org (Rob Warnock)
Date: Fri, 16 Jan 2004 07:11:57 -0600
Newsgroups: comp.lang.lisp
Message-ID: <Jrmdnea2pIOAfprdRVn-hA@speakeasy.net>
Richard Fateman  <fateman@cs.berkeley.edu> wrote:
+---------------
| I'm trying to build a clean way of allowing a lisp program to
| listen to voice input...
+---------------

As much as it pains me to suggest it, look at some of the VoiceXML ("VXML")
vendors, such as Nuance, VoiceGenie, SpeechWorks, TellMe, LumenVox, etc.
They basically make boxes which you can pre-load with a URL, and when a
call comes in, the box makes an HTTP "GET" request to that URL (possibly
providing some query parameters, depending on the box and the application).
The HTTP server -- which could easily be a Lisp-based web server -- replies
with a script (written in either the VoiceXML scripting language or some
proprietary scripting language) that tells the box what words (grammar) to
expect and how to proceed with the call (i.e., what state transitions to
make and the various URLs to "GET" or "POST" with the results of each
state transition).
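To make that request/response cycle concrete, here is a minimal sketch (in Python rather than Lisp, purely for illustration) of the kind of VoiceXML 2.0 script such a server might hand back: one spoken question, an inline SRGS grammar listing the words to expect, and a <submit> that POSTs the recognized word to the next URL. The function name build_vxml, the field name "answer", and any URL you pass in are hypothetical; vendor-proprietary dialects will look different.

```python
from xml.sax.saxutils import escape, quoteattr


def build_vxml(prompt: str, words: list[str], submit_url: str) -> str:
    """Build a minimal VoiceXML 2.0 page: speak `prompt`, listen for one of
    `words` (inline SRGS grammar), then POST the result to `submit_url`.

    Hypothetical helper for illustration; real vendors may need more."""
    # Each acceptable word becomes one <item> in a <one-of> alternative.
    items = "".join(f"<item>{escape(w)}</item>" for w in words)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form id="ask">\n'
        '    <field name="answer">\n'
        f'      <prompt>{escape(prompt)}</prompt>\n'
        '      <grammar version="1.0" root="top" mode="voice" xml:lang="en-US"\n'
        '               xmlns="http://www.w3.org/2001/06/grammar">\n'
        f'        <rule id="top"><one-of>{items}</one-of></rule>\n'
        '      </grammar>\n'
        '      <filled>\n'
        # The "state transition": send the recognized word to the next URL.
        f'        <submit next={quoteattr(submit_url)} method="post" namelist="answer"/>\n'
        '      </filled>\n'
        '    </field>\n'
        '  </form>\n'
        '</vxml>\n'
    )
```

For example, build_vxml("Say yes or no.", ["yes", "no"], "/answer") yields a page the box can act on; an HTTP handler would just return that string, typically with Content-Type application/voicexml+xml.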

TellMe and others even allow you limited free access to a unit if you
register as a developer. You point them to a URL on your server, and
they assign you a telephone number at their site. Then when anyone[1]
calls that number, their VXML box does a "GET" from your URL across the
public Internet, and you're off and debugging...
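The server you point them at can be very small. The following sketch (again Python, standard library only, via WSGI) answers the box's initial GET with a question and handles the POSTed result as the next state. The paths /start and /result, the field name "color", and the prompts are all made up for illustration; which query parameters a real box sends depends on the vendor and the application.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical two-state dialog: /start asks a question, /result answers it.
VXML_HEAD = ('<?xml version="1.0" encoding="UTF-8"?>\n'
             '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n')

START_PAGE = VXML_HEAD + """  <form>
    <field name="color">
      <prompt>Say red or blue.</prompt>
      <grammar version="1.0" root="c" mode="voice" xml:lang="en-US"
               xmlns="http://www.w3.org/2001/06/grammar">
        <rule id="c"><one-of><item>red</item><item>blue</item></one-of></rule>
      </grammar>
      <filled><submit next="/result" method="post" namelist="color"/></filled>
    </field>
  </form>
</vxml>
"""


def say_and_hang_up(text: str) -> str:
    """VoiceXML page that speaks `text` once and then ends the call."""
    return (VXML_HEAD + "  <form><block><prompt>" + escape(text) +
            "</prompt><exit/></block></form>\n</vxml>\n")


def app(environ, start_response):
    """WSGI application; each URL is one state of the call."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")
    if path == "/start" and method == "GET":
        body = START_PAGE
    elif path == "/result" and method == "POST":
        # <submit method="post"> sends form-urlencoded data by default.
        size = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        color = fields.get("color", ["nothing"])[0]
        body = say_and_hang_up(f"You said {color}. Goodbye.")
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/voicexml+xml"),
                              ("Content-Length", str(len(data)))])
    return [data]


def run(port: int = 8000) -> None:
    """Serve the dialog with the stdlib reference server (blocks forever)."""
    from wsgiref.simple_server import make_server
    make_server("", port, app).serve_forever()
```

In principle, calling run() and registering something like http://your-host:8000/start (a placeholder) as the application URL is all the server side needs; a Lisp web server could play exactly the same role.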

VoiceGenie [I think] will alternatively sell you software that runs on
*your* platforms and takes raw PCM audio streams in over the 'Net (from
codecs co-located with your server or even somewhere else) and does the
voice-recognition function and then plays the same VXML game with your
HTTP server (which can also be either co-located or somewhere else).

From a programming-language point of view, VXML (and the related
vendor-proprietary languages) is (are) a horrible hack, but there are
*LOTS* of people deploying voice-based interactive applications out there
these days using VXML (and not just to emulate traditional touch-tone
menus, either)...

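Anyway, my point is simply that hooking Common Lisp to a VXML box should
be really straightforward: the Lisp side only has to be a web server that
hands out VXML scripts and accepts the results the box sends back...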


-Rob

[1] Normally you're the only one who calls it, at least during initial
    development, but once it's sort of running you could tell a small
    group of others whom you wanted to try out your application.

-----
Rob Warnock			<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607