Passa al contingut principal

Voice Driven Web Apps: Introduction to the Web Speech API

The new JavaScript Web Speech API makes it easy to add speech recognition to your web pages. This API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later. Here's an example with the recognized text appearing almost immediately while speaking.



DEMO / SOURCE


Let’s take a look under the hood. First we check to see if the browser supports the Web Speech API by checking if the webkitSpeechRecognition object exists. If not, we suggest the user upgrades his browser. (Since the API is still experimental, it's currently vendor prefixed.) Lastly, we create the webkitSpeechRecognition object which provides the speech interface, and set some of its attributes and event handlers.



if (!('webkitSpeechRecognition' in window)) {
upgrade();
} else {
var recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;

recognition.onstart = function() { ... }
recognition.onresult = function(event) { ... }
recognition.onerror = function(event) { ... }
recognition.onend = function() { ... }
...


The default value for continuous is false, meaning that when the user stops talking, speech recognition will end. This mode is great for simple text like short input fields. In this demo, we set it to true, so that recognition will continue even if the user pauses while speaking.


The default value for interimResults is false, meaning that the only results returned by the recognizer are final and will not change. The demo sets it to true so we get early, interim results that may change. Watch the demo carefully, the grey text is the text that is interim and does sometimes change, whereas the black text are responses from the recognizer that are marked final and will not change.


To get started, the user clicks on the microphone button, which triggers this code:



function startButton(event) {
...
final_transcript = '';
recognition.lang = select_dialect.value;
recognition.start();


We set the spoken language for the speech recognizer "lang" to the BCP-47 value that the user has selected via the selection drop-down list, for example “en-US” for English-United States. If this is not set, it defaults to the lang of the HTML document root element and hierarchy. Chrome speech recognition supports numerous languages (see the “langs” table in the demo source), as well as some right-to-left languages that are not included in this demo, such as he-IL and ar-EG.


After setting the language, we call recognition.start() to activate the speech recognizer. Once it begins capturing audio, it calls the onstart event handler, and then for each new set of results, it calls the onresult event handler.



recognition.onresult = function(event) {
var interim_transcript = '';

for (var i = event.resultIndex; i < event.results.length; ++i) {
if (event.results[i].isFinal) {
final_transcript += event.results[i][0].transcript;
} else {
interim_transcript += event.results[i][0].transcript;
}
}
final_transcript = capitalize(final_transcript);
final_span.innerHTML = linebreak(final_transcript);
interim_span.innerHTML = linebreak(interim_transcript);
};
}


This handler concatenates all the results received so far into two strings: final_transcript and interim_transcript. The resulting strings may include "\n", such as when the user speaks “new paragraph”, so we use the linebreak function to convert these to HTML tags <br> or <p>. Finally it sets these strings as the innerHTML of their corresponding <span> elements: final_span which is styled with black text, and interim_span which is styled with gray text.


interim_transcript is a local variable, and is completely rebuilt each time this event is called because it’s possible that all interim results have changed since the last onresult event. We could do the same for final_transcript simply by starting the for loop at 0. However, because final text never changes, we’ve made the code here a bit more efficient by making final_transcript a global, so that this event can start the for loop at event.resultIndex and only append any new final text.


That’s it! The rest of the code is there just to make everything look pretty. It maintains state, shows the user some informative messages, and swaps the GIF image on the microphone button between the static microphone, the mic-slash image, and mic-animate with the pulsating red dot.


The mic-slash image is shown when recognition.start() is called, and then replaced with mic-animate when onstart fires. Typically this happens so quickly that the slash is not noticeable, but the first time speech recognition is used, Chrome needs to ask the user for permission to use the microphone, in which case onstart only fires when and if the user allows permission. Pages hosted on HTTPS do not need to ask repeatedly for permission, whereas HTTP hosted pages do.


So make your web pages come alive by enabling them to listen to your users!


We’d love to hear your feedback...







via HTML5Rocks http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+html5rocks+%28HTML5+Rocks%29

Comentaris

Entrades populars d'aquest blog

Learn Composition from the Photography of Henri Cartier-Bresson

“Do you see it?” This question is a photographic mantra. Myron Barnstone , my mentor, repeats this question every day with the hopes that we do “see it.” This obvious question reminds me that even though I have seen Cartier-Bresson’s prints and read his books, there are major parts of his work which remain hidden from public view. Beneath the surface of perfectly timed snap shots is a design sensibility that is rarely challenged by contemporary photographers. Henri Cartier-Bresson. © Martine Franck Words To Know 1:1.5 Ratio: The 35mm negative measures 36mm x 24mm. Mathematically it can be reduced to a 3:2 ratio. Reduced even further it will be referred to as the 1:1.5 Ratio or the 1.5 Rectangle. Eyes: The frame of an image is created by two vertical lines and two horizontal lines. The intersection of these lines is called an eye. The four corners of a negative can be called the “eyes.” This is extremely important because the diagonals connecting these lines will form the breakdown ...

El meu editor de codi preferit el 2024, que això ja se sap que va canviant 😄

Visual Code Visual Code és un editor de codi font lleuger, però potent que s’executa al teu escriptori i està disponible per a Windows, macOS i Linux. Compta amb suport integrat per a JavaScript, TypeScript i Node.js i té un ric ecosistema d’extensions per a altres llenguatges i entorns d’execució (com C++, C#, Java, Python, PHP, Go, .NET).  És una eina ideal per a desenvolupar i depurar aplicacions web i en el núvol. Per què Visual Code? Visual Code té molts avantatges com a editor de codi font, com per exemple: És gratuït, ràpid i fàcil d’instal·lar i actualitzar. Té un ampli ecosistema d’extensions que et permeten afegir funcionalitats i personalitzar la teva experiència de desenvolupament. Té un suport integrat per a molts llenguatges i entorns d’execució, i et permet depurar i executar el teu codi des del mateix editor. Té una interfície senzilla i elegant, amb diferents temes i modes de visualització. Té un sistema de sincronització de configuracions que et permet guardar les...

Las Mejores Aplicaciones Gratis para iPad de 2012

Las Mejores Aplicaciones Gratis para iPad de 2012 : ¿No tienes ni un duro? No te preocupes, pues hoy os traemos una extensa selección de las mejores apps gratuitas que puedes conseguir en la App Store para que llenes tu iPad de calidad, sin gastar nada de nada.   ¿Estás buscando juegos o apps gratis para tu iPad? En la App Store hay más de 500,000 apps y juegos, y una gran cantidad de ellos está disponible de forma totalmente gratuita. Aquí vamos con la selección de las mejores Apps gratis para iPad (todos los modelos), organizada por categoría. ¿Estás preparado? Las Mejores Apps Gratis de Redes Sociales para iPad Nombre Facebook Gratis Categoría Redes sociales Facebook es la red social más famosa del mundo , con casi mil millones de usuarios. Su app para iPad ha tardado, pero aquí está. Nombre Twitter Gratis Categoría Redes sociales Twitter es la red de microblogging por excelencia. La forma más rápida y directa de informar y mantenerse informado de las cosa...