I was working with the JavaScript speech recognition API (new webkitSpeechRecognition()) and I was surprised that it does not work without an internet connection; since it is JavaScript code, I expected it to work offline.
I checked the Network tab of the Chrome developer tools, and it is not even making any requests to the internet.
On Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.
Looking at https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition:
SpeechRecognition.serviceURI: Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition. The default is the user agent's default speech service.
The actual recognition is done by a third-party server. I assume that speech recognition is currently just too much for a browser to cope with on its own, or requires too large a database.
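You can observe this directly: a minimal sketch like the one below, run while offline, fires an error event whose error field is "network" instead of producing a result:

```js
// Minimal sketch: when offline, Chrome surfaces the failure as a
// "network" error instead of returning results.
const recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';

recognition.onresult = (event) => {
  console.log('Heard:', event.results[0][0].transcript);
};

recognition.onerror = (event) => {
  // event.error is "network" when the server-side engine is unreachable
  console.error('Recognition error:', event.error);
};

recognition.start();
```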
Related
I want to make use of the W3C Web Speech API, which already has a demo at https://www.google.com/intl/en/chrome/demos/speech.html.
I want to enhance it into a continuous speech recognition service, and hopefully run it from a local HTML file on an Android phone, with features added on top of the current demo. However, I find that a local file cannot access the microphone when running in Chrome or an Android WebView.
I know little about how to set up this API. For example, what files do I need to install in order to make use of it?
Also, does this recognition service need to be paid for?
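For orientation: there is nothing to install, since the recognition engine ships with the browser, and Chrome's service is currently free to use from web pages. A minimal sketch of continuous recognition follows, assuming Chrome and a page served over HTTP(S); microphone access is generally blocked for local file:// pages, which would explain the behaviour described above:

```js
// Minimal sketch: no extra files or SDKs are needed; continuous mode
// keeps listening across pauses. Works in Chrome, and the page must be
// served over http(s) rather than opened from file:// to get mic access.
const recognition = new webkitSpeechRecognition();
recognition.continuous = true;      // keep recognizing across pauses
recognition.interimResults = true;  // surface partial hypotheses as you speak

recognition.onresult = (event) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    console.log(event.results[i][0].transcript);
  }
};

recognition.start();
```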
I want to use the Google Speech Recognition API (Google Speech API Reference) in a web page by sending the request directly from the browser to Google.
Test requests using base64-encoded sample files have already worked.
I am trying to use https://github.com/higuma/ogg-vorbis-encoder-js to record an Ogg file and send it to the API.
The request itself is a simple AJAX request.
Has anybody already implemented this in a web browser? (I also need iOS support, which should be possible since Safari was recently updated and now supports recording.)
At the moment I am a bit stuck, since the API just answers "{}".
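For comparison, here is a sketch of the kind of request that works against the REST endpoint; API_KEY and the base64 audio string are placeholders you must supply. An empty {} response usually means the request was accepted but no speech was recognized, which often points at a mismatched encoding or sampleRateHertz. Note that Cloud Speech's supported encodings include OGG_OPUS but, as far as I know, not Ogg Vorbis, so the encoder choice itself may be the issue:

```js
// Sketch of a browser-side request to the Cloud Speech REST endpoint.
// API_KEY and base64Audio are assumptions you must supply yourself.
async function recognize(base64Audio) {
  const response = await fetch(
    'https://speech.googleapis.com/v1/speech:recognize?key=' + API_KEY,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        config: {
          encoding: 'OGG_OPUS',   // must match what the encoder produced
          sampleRateHertz: 48000, // must match the recording rate
          languageCode: 'en-US',
        },
        audio: { content: base64Audio },
      }),
    }
  );
  // "{}" here means no speech was recognized in the supplied audio
  return response.json();
}
```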
I am starting to explore Google Cloud Speech APIs.
I have read that
"Speech API supports any device that can send a REST request"
Therefore I am thinking that I could potentially call these APIs from any browser (on both laptops and mobile devices). Specifically, I am interested in scenarios where the APIs are used to translate "voice" to text. I am picturing something like the following (a sketch of the browser side of this flow follows the list):
- the user records his/her voice and streams it to the API
- the API transforms it to text, which is sent back to the browser
- the browser takes action using the received text (e.g. saves the text in a back-end DB)
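A hedged sketch of that flow using getUserMedia plus MediaRecorder; the /transcribe endpoint is hypothetical and stands in for whatever backend forwards the audio to the Speech API and stores the result:

```js
// Sketch of the browser side of the flow above. The "/transcribe"
// endpoint is a hypothetical backend that relays audio to the Speech API.
async function recordAndTranscribe() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const blob = new Blob(chunks, { type: recorder.mimeType });
    // POST the recording to the backend, which returns the transcript
    const res = await fetch('/transcribe', { method: 'POST', body: blob });
    const { transcript } = await res.json();
    console.log('Transcript:', transcript); // e.g. save it to the back-end DB
  };

  recorder.start();
  setTimeout(() => recorder.stop(), 5000); // record five seconds for the demo
}
```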
I have searched a bit and collected some information, but I have some big areas of doubt that I would like to clear up before actually moving down this path:
- Is it possible, and simple, to call the Google Cloud APIs directly from the browser, i.e. using JavaScript? The doubt comes from the fact that the documentation shows Node.js examples but no pure client-side JavaScript ones.
- Can this scenario be implemented in Safari (both on desktop and on mobile)? The doubt comes from the fact that all the searches I have made so far point to pages saying that Safari does not support audio recording (i.e. the getUserMedia API of HTML5).
Any direction on these points will be very much appreciated.
As of iOS 11, Apple has added support for the getUserMedia API.
Update
Streaming Speech Recognition is a potential solution for streaming audio (https://cloud.google.com/speech/docs/streaming-recognize)
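For reference, a minimal sketch of streaming recognition using the official Node.js client, since streamingRecognize is exposed through the gRPC client libraries rather than as a plain REST call. It assumes @google-cloud/speech is installed and GOOGLE_APPLICATION_CREDENTIALS is configured:

```js
// Sketch using the official Node.js client (npm install @google-cloud/speech).
// Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account key.
const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();

const recognizeStream = client
  .streamingRecognize({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
    interimResults: true, // emit partial hypotheses as audio streams in
  })
  .on('error', console.error)
  .on('data', (data) => {
    if (data.results[0] && data.results[0].alternatives[0]) {
      console.log('Transcript:', data.results[0].alternatives[0].transcript);
    }
  });

// Pipe raw LINEAR16 audio (e.g. from the microphone or a file) into it:
process.stdin.pipe(recognizeStream);
```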
I'm building a web application and plan on using both speechRecognition and navigator.getUserMedia for audio input.
I noticed that my desktop browser (Chrome on Mac, v. 31.0.1650.63) asks twice for permission to use the microphone. While this may be a little bit annoying for the user, both voice recognition and audio input seem to work.
However, if I open the same page on Android (Nexus 7, Android 4.4.2; Chrome 31.0.1650.59), it asks twice for permission to use my microphone, but I can only use one of the two (whichever was started first). Sometimes I also get a speech recognition "not-allowed" error, even though I gave permission to access the microphone.
I made a jsFiddle, here: http://jsfiddle.net/5xBpW/
My question is: Is there a way to perform speech recognition on an input stream? Or is there any other way to have both functionalities work on Chrome for Android?
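For reference, a sketch of the pattern in question (roughly what the fiddle does, written against the modern navigator.mediaDevices API): each API opens its own microphone capture, which is why two prompts appear:

```js
// Sketch of the problematic pattern: SpeechRecognition and getUserMedia
// each open their own microphone capture, so each triggers its own
// permission prompt, and on Android only the first one gets the mic.
const recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.onerror = (e) => console.error('Recognition error:', e.error);
recognition.start(); // prompt #1

navigator.mediaDevices.getUserMedia({ audio: true }) // prompt #2
  .then((stream) => {
    // There is no supported way to hand this stream to SpeechRecognition:
    // the API exposes no method that accepts a MediaStream as input.
    console.log('Got audio stream:', stream.id);
  })
  .catch((err) => console.error('getUserMedia error:', err));
```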
Have you considered other tools? There is an interesting new tool/product from Nuance (whose lineage traces back to Ray Kurzweil's companies; Kurzweil is now a director of engineering at Google) that translates voice data into actions using proprietary learning algorithms, i.e. machine intelligence.
This tool understands context and can apply that to specific actions so the user doesn't have to use an exact phrase:
https://developer.nuance.com/public/index.php?task=mix
Tour: https://developer.nuance.com/views/templates/mix/howDoesMixWork/phone/index.html
The downside is that you are relying on a third party, but since the API you are looking at is also experimental, this could be of interest.
Can I improve Google Speech API recognition by giving it a word list (in my case the user's requests are very predictable) to make recognition more accurate?
The correct answer is: no, you can't. =(
I can't speak for Chrome, but in Android they are quite clear that you cannot provide a grammar. In Android speech recognition you are limited to a choice of two models: "free form" and "web search".
See Android: Speech Recognition Append Dictionary?
For the Google Cloud Speech API (not the Web Speech API), but some may find this useful:
Although currently in beta, Google has released a new capability which allows you to include a list of phrases to act as "hints" to Cloud Speech-to-Text. Providing these hints, a technique called speech adaptation, helps the Speech-to-Text API recognize the specified phrases in your audio data.
See https://cloud.google.com/speech-to-text/docs/context-strength
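A sketch of what such a request looks like; the phrase list is illustrative, and API_KEY / base64Audio are placeholders you must supply. The boost field (context strength) was in beta at the time of writing, hence the v1p1beta1 endpoint; a phrases list alone also works on v1:

```js
// Sketch of a recognize request using speech adaptation "hints".
// API_KEY, base64Audio, and the phrase list below are placeholders.
const requestBody = {
  config: {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
    speechContexts: [
      {
        phrases: ['weather', 'timer', 'lights on', 'lights off'], // expected user requests
        boost: 20.0, // context strength; higher values favor these phrases more
      },
    ],
  },
  audio: { content: base64Audio }, // base64-encoded recording
};

fetch(
  'https://speech.googleapis.com/v1p1beta1/speech:recognize?key=' + API_KEY,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(requestBody),
  }
)
  .then((r) => r.json())
  .then(console.log);
```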