Grammar in Google Web Speech API - javascript

Can I improve the Google speech API's recognition by giving it a word list (in my case the user's requests are very predictable) to make recognition more accurate?

Correct answer is: no, you can't. =(

I can't speak for Chrome, but in Android they are quite clear that you cannot provide a grammar. In Android speech recognition you are limited to a choice of two models: "free form" and "web search".
See Android: Speech Recognition Append Dictionary?
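Since a grammar cannot be supplied, one possible workaround for a predictable vocabulary is to ask the recognizer for several alternatives and keep the first one that matches your word list. The following is only a sketch of that idea, not something the API does for you: pickExpectedPhrase and normalise are hypothetical helper names, and the phrase list is made up for illustration. The browser wiring relies on the standard maxAlternatives property and is skipped where the API is not available.

```javascript
// Hypothetical helper: lower-case the text and strip punctuation and extra
// whitespace so that "Turn right." and "turn right" compare as equal.
function normalise(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical helper: return the first expected phrase found among the
// recognizer's alternatives, or null when none of them matches.
// `alternatives` is an array of { transcript, confidence } objects,
// the same shape as SpeechRecognitionAlternative.
function pickExpectedPhrase(alternatives, expectedPhrases) {
  const expected = new Map(expectedPhrases.map((p) => [normalise(p), p]));
  for (const alt of alternatives) {
    const hit = expected.get(normalise(alt.transcript));
    if (hit !== undefined) return hit;
  }
  return null;
}

// Browser wiring, skipped when the Web Speech API is not exposed.
if (typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition)) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognition = new Recognition();
  recognition.maxAlternatives = 5; // request several hypotheses, not just one
  recognition.onresult = (event) => {
    const alternatives = Array.from(event.results[0]);
    // Example vocabulary (an assumption for this sketch).
    console.log(pickExpectedPhrase(alternatives, ['turn left', 'turn right', 'stop']));
  };
  // Call recognition.start() from a user gesture, e.g. a button click.
}
```

This does not make the recognizer itself more accurate; it only lets a lower-ranked hypothesis win when it is one of the phrases you expect.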

For the Google Cloud Speech API (not the Web Speech API), some may find this useful:
Although it is currently in beta, Google has released a new capability that lets you include a list of phrases to act as "hints" to Cloud Speech-to-Text. Providing these hints, a technique called speech adaptation, helps the Speech-to-Text API recognize the specified phrases in your audio data.
See https://cloud.google.com/speech-to-text/docs/context-strength
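As a rough illustration of what such hints look like in a request, the sketch below builds the JSON body for the REST speech:recognize method with a speechContexts entry. The function names, the audio encoding, the sample rate, the language code and the use of an API key in the query string are all assumptions made for this example; check them against the current Cloud Speech-to-Text documentation before relying on them.

```javascript
// Hypothetical helper: build the request body for speech:recognize,
// passing the expected phrases as speech adaptation hints.
function buildRecognizeRequest(base64Audio, phrases) {
  return {
    config: {
      encoding: 'LINEAR16',          // assumption: 16-bit PCM audio
      sampleRateHertz: 16000,        // assumption: recorded at 16 kHz
      languageCode: 'en-US',         // assumption: US English
      speechContexts: [{ phrases }], // the "hints" described above
    },
    audio: { content: base64Audio },
  };
}

// Hypothetical helper: send the request. `apiKey` is a placeholder; it is
// not called here, so nothing is sent when this file is loaded.
async function recognizeWithHints(base64Audio, phrases, apiKey) {
  const url = 'https://speech.googleapis.com/v1/speech:recognize?key=' +
    encodeURIComponent(apiKey);
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRecognizeRequest(base64Audio, phrases)),
  });
  if (!response.ok) {
    throw new Error('Recognition request failed with status ' + response.status);
  }
  return response.json();
}
```

A call might look like recognizeWithHints(audio, ['turn left', 'turn right'], key), where audio is the base64-encoded recording.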

Related

Web Speech API filter audio input

I am using the Web Speech API for a Chrome extension.
What I want to do is filter the audio being sent to the recognition instance. Does anyone know of a way to do this? The API really only gives you controls to the output of the recognition.

Why is the JavaScript Speech Recognition API not working without internet?

I was working with the JavaScript speech recognition API (new webkitSpeechRecognition()) and I was surprised that it does not work without an internet connection; since it is JavaScript code, I expected it to work offline.
I checked the network section of Chrome developer tools, and it does not even show a request to the internet.
On Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.
Looking at https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition:
SpeechRecognition.serviceURI: Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition. The default is the user agent's default speech service.
The actual recognition is done by a third-party server. I assume the task of speech recognition is currently just too much for a browser to cope with on its own, or requires a big database.
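Because recognition depends on a remote service, it can help to handle the failure explicitly rather than let it fail silently. Here is a minimal sketch, assuming you want a readable message per error code: describeRecognitionError is a hypothetical helper, while "network", "not-allowed", "service-not-allowed" and "no-speech" are error codes defined by the Web Speech API specification. The message wording is made up.

```javascript
// Hypothetical helper: map a SpeechRecognition error code (plus the browser's
// online flag) to a message that can be shown to the user.
function describeRecognitionError(errorCode, isOnline) {
  if (errorCode === 'network' || !isOnline) {
    return 'Speech recognition needs an internet connection.';
  }
  if (errorCode === 'not-allowed' || errorCode === 'service-not-allowed') {
    return 'Microphone or speech service access was denied.';
  }
  if (errorCode === 'no-speech') {
    return 'No speech was detected.';
  }
  return 'Speech recognition failed (' + errorCode + ').';
}

// Browser wiring, skipped when the Web Speech API is not exposed.
if (typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition)) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognition = new Recognition();
  recognition.onerror = (event) => {
    // event.error carries the error code; navigator.onLine is only a hint.
    console.warn(describeRecognitionError(event.error, navigator.onLine));
  };
}
```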

Is it possible and advisable to call Google Cloud Speech APIs directly from browsers, including Safari?

I am starting to explore Google Cloud Speech APIs.
I have read that
"Speech API supports any device that can send a REST request"
Therefore I am thinking that I could potentially call these APIs from any browser (on both laptops and mobile devices). Specifically, I am interested in scenarios where the APIs are used to translate "voice" to text. I have in mind something like the following:
the user records his/her voice and streams it to the API
the API transforms it to text, which is sent back to the browser
the browser takes actions using the text received (e.g. saves the text on a back-end DB)
I have searched a bit and collected some information, but I have some big areas of doubt which I would like to clear up before actually moving along this path:
Is it possible and simple to call Google Cloud APIs directly from the browser, i.e. using JavaScript? The doubt comes from the fact that the documentation shows Node.js examples but not pure JavaScript ones.
Can this scenario be implemented using Safari (both on desktop and on mobile)? The doubt comes from the fact that all the searches I have made so far point to pages saying that Safari does not support audio recording (i.e. the getUserMedia API of HTML5).
Any direction on these points will be very much appreciated.
Since iOS 11, Apple has added support for the getUserMedia API.
You can find out more here.
Update
Streaming Speech Recognition is a potential solution for streaming audio (https://cloud.google.com/speech/docs/streaming-recognize)
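For the recording side in the browser, a minimal sketch could first check whether getUserMedia is available and then capture a short clip. canRecordAudio and recordClip are hypothetical names, and the use of MediaRecorder is an assumption of this example: its availability and the audio format it produces vary by browser, so the result may need converting before a speech API accepts it. Sending the clip to the API is not shown.

```javascript
// Hypothetical helper: true when the given navigator-like object exposes
// navigator.mediaDevices.getUserMedia.
function canRecordAudio(nav) {
  return Boolean(
    nav &&
    nav.mediaDevices &&
    typeof nav.mediaDevices.getUserMedia === 'function'
  );
}

// Hypothetical helper: record `durationMs` milliseconds from the microphone
// and resolve with a Blob. Intended for browsers only; not called here.
async function recordClip(nav, durationMs) {
  if (!canRecordAudio(nav)) {
    throw new Error('getUserMedia is not available in this browser');
  }
  const stream = await nav.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;
  stream.getTracks().forEach((track) => track.stop()); // release the microphone
  return new Blob(chunks, { type: recorder.mimeType });
}
```

In a page, this might be used as recordClip(navigator, 3000) from a click handler, since browsers generally ask for microphone permission at that point.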

Use of Web Speech API in commercial application

I want to create a commercial web application based on speech recognition. I have found the Web Speech API (https://w3c.github.io/speech-api/), which is currently only supported by Chrome.
Can I use this API for free for my commercial application? Is there a limit on the number of uses per day, or a free quota that I must not exceed?
From https://lists.w3.org/Archives/Public/public-speech-api/2013Jul/0001.html:
To clarify, commercial apps that use the Web Speech API https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html can be used on browsers that support it, such as Chrome (for complete details, see the Terms of Service in the About menu).
References:
https://groups.google.com/a/chromium.org/g/chromium-html5/c/31rlwXxzQGs/m/BbeSI_waCQAJ
More information can be searched here:
https://groups.google.com/a/chromium.org/g/chromium-html5/search?q=web%20speech%20api

Speech recognition and getUserMedia

I'm building a web application and plan on using both speechRecognition and navigator.getUserMedia for audio input.
I noticed that my desktop browser (Chrome on Mac, v. 31.0.1650.63) asks twice for permission to use the microphone. While this may be a little bit annoying for the user, both voice recognition and audio input seem to work.
However, if I open the same page on Android (Nexus 7, Android v4.4.2; Chrome v31.0.1650.59), it asks twice for permission to use my microphone, but I can only use one of the two (whichever was started first). Sometimes I also get a speech recognition "not-allowed" error, even though I gave permission to access the microphone.
I made a jsFiddle, here: http://jsfiddle.net/5xBpW/
My question is: Is there a way to perform speech recognition on an input stream? Or is there any other way to have both functionalities work on Chrome for Android?
Have you considered other tools? There is an exciting new tool / product from Nuance (a company whose history traces back to Ray Kurzweil, now a director of engineering at Google) that translates voice data into actions using proprietary learning algorithms, i.e. machine intelligence.
This tool understands context and can apply that to specific actions so the user doesn't have to use an exact phrase:
https://developer.nuance.com/public/index.php?task=mix
Tour: https://developer.nuance.com/views/templates/mix/howDoesMixWork/phone/index.html
The downside is that you are relying on a third party, but since the API you are looking at is also experimental this could be of interest.
