I'm building a web application and plan to use both SpeechRecognition and navigator.getUserMedia for audio input.
I noticed that my desktop browser (Chrome on Mac, v31.0.1650.63) asks twice for permission to use the microphone. While this may be a little annoying for the user, both voice recognition and audio input seem to work.
However, if I open the same page on Android (Nexus 7, Android 4.4.2; Chrome v31.0.1650.59), it also asks twice for permission to use my microphone, but I can only use one of the two (whichever was started first). Sometimes I also get a speech recognition "not-allowed" error, even though I gave permission to access the microphone.
I made a jsFiddle, here: http://jsfiddle.net/5xBpW/
My question is: Is there a way to perform speech recognition on an input stream? Or is there any other way to have both functionalities work on Chrome for Android?
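For reference, a minimal sketch of the kind of setup that produces the two separate prompts (using the webkit-prefixed recognition API, as in Chrome; the stream and recognizer are independent, which is why permission is requested twice):

// Microphone access for raw audio input (first permission prompt).
navigator.getUserMedia(
  { audio: true },
  function (stream) { console.log('got audio stream'); },
  function (err) { console.error('getUserMedia failed:', err); }
);

// Speech recognition (second, independent permission prompt).
var recognition = new webkitSpeechRecognition();
recognition.onresult = function (event) {
  console.log(event.results[event.results.length - 1][0].transcript);
};
recognition.onerror = function (event) {
  console.error('recognition error:', event.error); // e.g. 'not-allowed'
};
recognition.start();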
Have you considered other tools? There is an exciting new product from Nuance (whose speech technology traces back to Ray Kurzweil, now a Director of Engineering at Google) that translates voice data into actions using proprietary learning algorithms, i.e. machine intelligence.
This tool understands context and can apply that to specific actions so the user doesn't have to use an exact phrase:
https://developer.nuance.com/public/index.php?task=mix
Tour: https://developer.nuance.com/views/templates/mix/howDoesMixWork/phone/index.html
The downside is that you are relying on a third party, but since the API you are looking at is also experimental this could be of interest.
Some browsers (the mobile Mi Browser, for instance) don't support WebRTC: they have no RTCPeerConnection API. So users of your WebRTC web app have to open it in another browser.
Is there a way to make your WebRTC app work without an explicit browser-change action from the user, especially on a mobile device?
I tried to investigate the following:
1. Deep link. It looks like we can't redirect the user to another browser using a deep link (I haven't found a Chrome deep link for mobile).
2. Send the WebRTC sources to the browser / use a third-party WebRTC lib. This won't work either; WebRTC support has to be built into the browser's source code.
WebRTC is a framework based on a set of standards. It includes not only the capability to get information about the user's input/output devices, but also a set of network protocols built on UDP (from obtaining the client's IP to transferring arbitrary data through a data channel using the SCTP protocol). So, as you may already guess, it's impossible to support it in a browser that doesn't ship it, which is why point (2) will not work.
As for point (1), opening Chrome: on iOS there is a custom URL scheme for opening a URL in Chrome, "googlechromes://stackoverflow.com". But it's better to explicitly tell the user that the current browser doesn't support the required functionality, and to provide links to a list of popular browsers (Chrome, Firefox, etc.); those sites will then redirect the user to the proper store to download the native app.
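A minimal sketch of that fallback (the googlechromes:// scheme is iOS-only; everything else is plain feature detection):

// Feature-detect WebRTC before starting the app.
if (!window.RTCPeerConnection) {
  var isIOS = /iPad|iPhone|iPod/.test(navigator.userAgent);
  if (isIOS) {
    // Possible, but abrupt: hand the current page off to Chrome.
    // location.href = 'googlechromes://' + location.host + location.pathname;
  }
  // Clearer UX: explain the problem and link to supported browsers.
  document.body.innerHTML =
    '<p>Your browser does not support WebRTC. Please open this page in ' +
    '<a href="https://www.google.com/chrome/">Chrome</a> or ' +
    '<a href="https://www.mozilla.org/firefox/">Firefox</a>.</p>';
}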
I was working with the JavaScript speech recognition API (new webkitSpeechRecognition()), and I was amazed that it doesn't work without internet access; since it is JavaScript code, I expected it to work offline.
I checked the network section of the Chrome developer tools, and it doesn't even seem to make a request to the internet.
On Chrome, using speech recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline. (The request is made by the browser itself rather than by the page, which is why nothing shows up in the DevTools network panel.)
Looking at https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition:
SpeechRecognition.serviceURI
Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition. The default is the user agent's default speech service.
The actual recognition is done by a third-party server. I assume the task of speech recognition is currently just too much for a browser to cope with on its own, or requires too big a database.
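You can see the dependency from the page itself: when the service is unreachable, the recognizer fails with a 'network' error. A minimal sketch (webkit-prefixed, as in Chrome):

var recognition = new webkitSpeechRecognition();
recognition.onresult = function (event) {
  console.log('heard:', event.results[0][0].transcript);
};
recognition.onerror = function (event) {
  // Offline this fires with event.error === 'network', because the audio
  // never reaches the remote recognition service.
  console.error('recognition error:', event.error);
};
recognition.start();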
I am starting to explore Google Cloud Speech APIs.
I have read that
"Speech API supports any device that can send a REST request"
Therefore I am thinking that I could potentially call such APIs from any browser (both on laptops and on mobile devices). Specifically, I am interested in scenarios where the APIs are used to translate voice to text. I am envisioning something like the following:
1. the user records his/her voice and streams it to the API
2. the API transforms it to text, which is sent back to the browser
3. the browser takes actions using the received text (e.g. saves the text in a back-end DB)
I have searched a bit and collected some information, but I have some big areas of doubt which I would like to clear up before actually moving along this path:
1. Is it possible and simple to call the Google Cloud APIs directly from the browser, i.e. using JavaScript? The doubt comes from the fact that the documentation shows Node.js examples but not pure browser JavaScript ones.
2. Can this scenario be implemented in Safari (both on desktop and on mobile)? The doubt comes from the fact that all the searches I have made so far point to pages saying that Safari does not support audio recording (i.e. the getUserMedia API of HTML5).
Any direction on these points will be very much appreciated.
As of iOS 11, Apple has added support for the getUserMedia API in Safari.
Update
Streaming Speech Recognition is a potential solution for streaming audio (https://cloud.google.com/speech/docs/streaming-recognize)
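As for calling it from plain browser JavaScript: the non-streaming recognize endpoint is an ordinary REST call, so fetch works. A sketch, assuming you already have a base64-encoded audio clip and an API key (in production the key should live behind your own backend rather than be shipped to the browser):

// Send base64-encoded LINEAR16 audio to Cloud Speech-to-Text and
// resolve with the first transcript, if any.
function transcribe(base64Audio, apiKey) {
  return fetch('https://speech.googleapis.com/v1/speech:recognize?key=' + apiKey, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      config: {
        encoding: 'LINEAR16',   // must match how the audio was captured
        sampleRateHertz: 16000,
        languageCode: 'en-US'
      },
      audio: { content: base64Audio }
    })
  })
    .then(function (res) { return res.json(); })
    .then(function (data) {
      return data.results ? data.results[0].alternatives[0].transcript : '';
    });
}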
Getting access to the user's microphone through navigator.getUserMedia is pretty easy. But what if I'm using a mobile browser and want to pick up audio from a distance, like with a "speakerphone" mode?
How would I go about achieving this? There seem to be native apps that can achieve this, but what about Web Audio?
The purpose of this is for sending messages between devices using DTMF. I have already achieved this with my laptop because its microphone can record surrounding audio from a great distance, but any mobile phone I have access to seems to only record audio near the "mouthpiece" and so I have to hold the phone extremely close to the source speaker for even a slight chance of having a message received. This defeats the purpose unless I can get the mobile microphone to pick up audio from a distance.
EDIT: By distance, I mean greater than a few feet, as opposed to mere centimeters. Ambient sounds, as opposed to sound localized next to the microphone.
I am answering my own question here. Thanks to everyone who helped out, though none of the actual answers posted here were satisfactory, IMO.
On newer versions of Chrome, navigator.mediaDevices has a function called enumerateDevices, which can be used to list the available hardware devices, including microphones. As it turns out, this does return a "speakerphone" device on my Android phone. So, if you have a device where you suspect the speakerphone isn't set as the default browser microphone, and you (or your user) are on Chrome version 47 or above, you can use the device IDs returned by enumerateDevices to select a specific microphone.
So, if your user chooses an option in a select element for a specific microphone device, you would take the ID for that device and pass it to getUserMedia.
navigator.getUserMedia({ audio: { deviceId: { exact: <insert device uuid here> } } }, successCallback, errorCallback)
Note that, as of this posting, the enumerateDevices API is only available on Chrome. In any other browser or web view, this probably won't work.
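For completeness, a sketch of listing the microphones so the user can pick one (note that device labels are only populated once the page has been granted microphone access):

navigator.mediaDevices.enumerateDevices().then(function (devices) {
  devices
    .filter(function (d) { return d.kind === 'audioinput'; })
    .forEach(function (d) {
      // On my Android phone, one of these labels mentions 'speakerphone';
      // its deviceId is what gets passed to getUserMedia as shown above.
      console.log(d.label, d.deviceId);
    });
});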
If the volume of the microphone happens to be too low for your application, you can increase it by creating a GainNode on your AudioContext.
context = new AudioContext() // or webkitAudioContext on older Chrome
volume = context.createGain()
volume.gain.value = 3 // raises the volume to 300%
audioInput = context.createMediaStreamSource(stream) // stream from getUserMedia
audioInput.connect(volume)
// connect `volume` to whatever comes next, e.g. a processor node or context.destination
Or, if you are dealing with raw samples, you can literally multiply each sample by a factor before passing it to whatever function you are using to process them, as in the sketch below.
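That amounts to a simple in-place loop (beware that values outside the ±1.0 range will clip):

// Boost a Float32Array of PCM samples in place.
function amplify(samples, gain) {
  for (var i = 0; i < samples.length; i++) {
    samples[i] *= gain;
  }
  return samples;
}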
This cannot be done, as it's directly related to the hardware of the device. If the device hardware (the microphone) cannot pick up sounds from meters away, then there's nothing that can be done in software.
Two years ago, I implemented a WebRTC app (using Google's example) that works in mobile web browsers, and the sound is captured with ambient levels. I didn't do a deep analysis of Google's libraries, but maybe you can start there.
In WebRTC, at least using Chrome, it's possible to capture the screen in order to stream it. This is accomplished using the experimental chromeMediaSource constraint.
I would like to do the same but capture only audio, in order to send it to a webpage. That is, I would like to capture not the microphone but the audio played by my machine, and send it to a website.
Is there such a constraint in WebRTC? If the answer is yes, is there a Firefox equivalent?
You may want to look at the MediaStream Recording API, which has been implemented in Firefox Nightly and has an Intent to Implement for Chrome.
I've put a demo at simpl.info/mediarecorder.
With Web Audio, try RecorderJS: there's a demo at webaudiodemos.appspot.com/AudioRecorder.
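For reference, a minimal MediaRecorder sketch (recording a microphone stream here; capturing system audio itself still depends on constraint support like chromeMediaSource, as discussed in the question):

navigator.mediaDevices.getUserMedia({ audio: true }).then(function (stream) {
  var chunks = [];
  var recorder = new MediaRecorder(stream);
  recorder.ondataavailable = function (event) { chunks.push(event.data); };
  recorder.onstop = function () {
    var blob = new Blob(chunks, { type: recorder.mimeType });
    // The blob can now be uploaded, or played back via URL.createObjectURL(blob).
  };
  recorder.start();
  setTimeout(function () { recorder.stop(); }, 5000); // record five seconds
});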