What are the ways to implement speech recognition in Electron? - javascript

I have an Electron app that uses the Web Speech API (SpeechRecognition) to capture the user's voice; however, it's not working. The code:
if ("webkitSpeechRecognition" in window) {
  let SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  let recognition = new SpeechRecognition();
  recognition.onstart = () => {
    console.log("We are listening. Try speaking into the microphone.");
  };
  recognition.onspeechend = () => {
    recognition.stop();
  };
  recognition.onresult = (event) => {
    let transcript = event.results[0][0].transcript;
    console.log(transcript);
  };
  recognition.start();
} else {
  alert("Browser not supported.");
}
It logs "We are listening..." to the console, but no matter what I say, it produces no output. Running the exact same code in Google Chrome, on the other hand, works: whatever I say is logged by the console.log(transcript); line. After some more research, it turns out that Google has recently stopped supporting the Web Speech API in shell-based Chromium windows (to my knowledge, everything that is not Google Chrome or MS Edge), which seems to be why it does not work in my Electron app.
See: the end of the electron-speech library; an Artyom.js issue; another Stack Overflow question regarding this.
So is there any way I can get it to work in Electron?

I ended up writing an implementation that uses the MediaDevices API to capture the user's speech from the microphone and sends it over a WebSocket to a Python server. The server feeds the audio stream to the SpeechRecognition pip package and returns the transcribed text to the client (the Electron app).
This is what I implemented. It is far too long for something this simple, so if someone has a better suggestion, please let me know by writing an answer.
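For reference, here is a minimal sketch of what the client half of that design could look like. It assumes the renderer has `navigator.mediaDevices`, `MediaRecorder` and `WebSocket`, and that a hypothetical Python server at an address such as `ws://localhost:8765` accepts binary audio chunks and replies with plain-text transcripts. The function name, the URL, the 250 ms chunk interval and the reply format are all assumptions, not the author's actual code. The browser objects are passed in as parameters so the logic is not tied to a global `window`.

```javascript
// Sketch: stream microphone audio to a (hypothetical) transcription server over a WebSocket.
// Browser constructors/objects are injected rather than read from `window`.
async function streamMicrophoneToServer({
  mediaDevices,      // e.g. navigator.mediaDevices
  MediaRecorderCtor, // e.g. window.MediaRecorder
  WebSocketCtor,     // e.g. window.WebSocket
  url,               // assumed server address, e.g. "ws://localhost:8765"
  onTranscript,      // called with each text message the server sends back
  timesliceMs = 250, // assumed chunk length in milliseconds
}) {
  // Ask for microphone access only (no video).
  const stream = await mediaDevices.getUserMedia({ audio: true });
  const socket = new WebSocketCtor(url);

  // Treat every text message from the server as a transcript; ignore binary messages.
  socket.onmessage = (event) => {
    if (typeof event.data === "string") onTranscript(event.data);
  };

  const recorder = new MediaRecorderCtor(stream);
  // Forward each non-empty recorded chunk while the socket is open (readyState 1 === OPEN).
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0 && socket.readyState === 1) {
      socket.send(event.data);
    }
  };

  // Start recording only once the connection is open, emitting a chunk every timesliceMs.
  socket.onopen = () => recorder.start(timesliceMs);

  // Return a function that stops recording, releases the microphone and closes the socket.
  return function stop() {
    if (recorder.state !== "inactive") recorder.stop();
    stream.getTracks().forEach((track) => track.stop());
    socket.close();
  };
}
```

In the renderer it would be called with `navigator.mediaDevices`, `MediaRecorder` and `WebSocket`, and the returned `stop` function called when listening should end (the very last chunk may be dropped because the socket is closed right away). In Chromium, MediaRecorder usually produces compressed WebM/Opus chunks, so the Python side would likely need to decode them before handing the audio to SpeechRecognition.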

Related

Web SpeechRecognition API- workaround for mobile devices not supporting continuous listening

I'm trying to use the SpeechRecognition API to create a speech-to-text interface.
Here's my configuration:
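const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.lang = 'en-US';
recognition.onresult = (event) => { /* ... handle results ... */ };
recognition.start();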
It works great on desktop, but it seems that most mobile devices don't support continuous listening, and when trying to record from them the microphone turns on and off constantly.
I tried to work around it by immediately starting it again after the onend event:
recognition.onend = () => { recognition.start(); };
The problem is that onend fires after the microphone has already been turned off, which causes constant beeping on some devices and prevents the speech from being fully recognized.
Is there any way to intervene before the microphone is turned off?
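The restart-on-`onend` workaround described above can at least be made stoppable and made to keep the final text from every session. This is a sketch only: it does not prevent the microphone from turning off between sessions, so the beeping described above would remain. The factory name and the `onFinalText` callback are hypothetical; the recognizer constructor is injected (`window.SpeechRecognition || window.webkitSpeechRecognition` in a browser).

```javascript
// Sketch: restart recognition whenever it ends, until stop() is called explicitly.
function createAutoRestartingRecognizer(SpeechRecognitionCtor, { lang = "en-US", onFinalText = () => {} } = {}) {
  const recognition = new SpeechRecognitionCtor();
  recognition.continuous = true; // honoured on desktop; mobile may still end after each utterance
  recognition.interimResults = false;
  recognition.lang = lang;

  let keepListening = false;
  const finalTexts = [];

  recognition.onresult = (event) => {
    // Only read results added since the last event (event.resultIndex onward).
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) {
        const text = result[0].transcript;
        finalTexts.push(text);
        onFinalText(text);
      }
    }
  };

  recognition.onend = () => {
    // The microphone is already off at this point; restart only if stop() was not called.
    if (keepListening) recognition.start();
  };

  return {
    start() { keepListening = true; recognition.start(); },
    stop() { keepListening = false; recognition.stop(); },
    // All final transcripts collected across restarts, joined with spaces.
    get transcript() { return finalTexts.join(" "); },
  };
}
```

The `keepListening` flag is what distinguishes an intentional `stop()` from the browser ending the session on its own; without it, the original `onend` handler would restart recognition forever.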

how to request camera and microphone access again using getUserMedia() after being denied

How can I request camera and microphone access again with getUserMedia() after the user has denied it?
Is there any function that can reset the browser settings so that the user is prompted again, specifically for Edge? I'm using Tauri to build a video-conferencing desktop app on top of WebView2, which is similar to the Edge browser. If the user denies access the first time, there is no way to undo it, there is no going back, and the app becomes useless.
const getLocalPreview = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    return stream;
  } catch (error) {
    // This runs when the user doesn't allow access to the media devices
    console.log(error);
  }
};
There is currently no API for this; see https://github.com/tauri-apps/tauri/issues/4434#issuecomment-1209259672
The problem is that solutions are blocked by https://github.com/MicrosoftEdge/WebView2Feedback/issues/2427 and by extension https://github.com/MicrosoftEdge/WebView2Feedback/issues/2672
Until upstream support is available, the only solution is to edit or remove the WebView2 files in C:\Users\<user-name>\AppData\Local\<your-app-bundle-identifier>\EBWebView\ directly, specifically the \EBWebView\Default\Preferences file. Unfortunately, you have to do this from Rust while no WebView2 window is open, unless simply telling your users to do it themselves works for you.
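While that remains the case, the app can at least detect a denial instead of only logging the error, so it can show the user what to do next. The sketch below does not re-prompt the user. It assumes a Chromium-based webview where `getUserMedia` rejects with a `DOMException` named `NotAllowedError` when access is blocked and `NotFoundError` when no matching device exists; the function name and the returned `{ status, stream, error }` shape are hypothetical, and `mediaDevices` is injected (`navigator.mediaDevices` in the webview).

```javascript
// Sketch: request camera + microphone and classify failures instead of swallowing them.
async function requestLocalPreview(mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
    return { status: "granted", stream };
  } catch (error) {
    if (error && error.name === "NotAllowedError") {
      // Access was blocked (by the user or a stored "deny" decision). Per the question,
      // WebView2 will not prompt again, so this is where the app could show instructions.
      return { status: "denied", error };
    }
    if (error && error.name === "NotFoundError") {
      // No camera or microphone matching the constraints is available.
      return { status: "no-device", error };
    }
    // Anything else (e.g. the device is in use) is reported as a generic error.
    return { status: "error", error };
  }
}
```

The UI can then branch on `status` and, in the `"denied"` case, point users at the manual fix described above.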

Speech Recognition API in Microsoft Edge (Not defined)

I have been attempting to use the SpeechRecognition API (https://wicg.github.io/speech-api/#examples-recognition) in a recent project.
I am currently using Microsoft Edge, and according to https://caniuse.com/#feat=speech-recognition it is only partially supported there.
From the looks of it, the text-to-speech feature (SpeechSynthesis) is supported in Edge, but the speech recognition feature is not: no matter which prefix I use for the SpeechRecognition (speech-to-text) API, Edge never recognizes it and reports that it "is not defined".
Does anyone have any clarity on this situation, or know how to get speech recognition working in Edge with JavaScript?
Cheers
UPDATE: As of 1/18/2022 the Speech Recognition part of the JavaScript Web Speech API seems to be working in Edge Chromium. Microsoft seems to be experimenting with it in Edge. It automatically adds punctuation, and there seems to be no way to disable that. I'm not sure which languages it supports, but so far it seems to work in English, Spanish, German, French, Chinese Simplified and Japanese. I'm leaving the information below for history.
As of 6/4/2020 Edge Chromium does not really support the Speech Recognition part of the Web Speech API. Microsoft seems to be working on it for Edge Chromium. It will probably never work for Edge Legacy (non-Chromium).
developer.microsoft.com says incorrectly that it is "Supported" but also says, "Working draft or equivalent". (UPDATE: As of 2/18/2021 it now says: "NOT SUPPORTED")
developer.mozilla.org compatibility table also incorrectly says that it is supported in Edge.
caniuse correctly shows that it is not supported in Edge Chromium: the API appears to be present, but the proper events are not fired.
The only other browsers besides Chrome and Chromium in which I have seen the Speech Recognition part of the Web Speech API work are Brave and Yandex. Yandex probably connects to a server in Russia to process the speech recognition, and it does not do a good job, at least in English. At the moment Brave returns a "Network" error. According to this GitHub discussion on Brave, Brave would have to pay Google in order to use the speech-to-text service.
Here is some quick code that can be used to test whether Speech Recognition works in a browser; it displays all the errors and events in the body. It only works over HTTPS, and it does not seem to work in CodePen or JSFiddle.
var msg = document.body;
var cr = "<br />";
var event_list = ["onaudioend", "onaudiostart", "onend", "onerror", "onnomatch", "onresult", "onsoundend", "onsoundstart", "onspeechend", "onspeechstart", "onstart"];
var sr = window.SpeechRecognition || window.webkitSpeechRecognition || false;
if (sr) {
  var recognition = new sr();
  event_list.forEach(function(e) {
    // Receive the event as a parameter instead of relying on the non-standard global window.event
    recognition[e] = function(event) {
      console.log(event);
      var txt = event.type + ": ";
      if (event.results) txt += event.results[0][0].transcript;
      if (event.error) txt += event.error; // "not-allowed" is usually caused by not using the secure https protocol
      if (event.type == "end")
        recognition.start(); // Start recognition again
      msg.innerHTML += txt + cr;
    };
  });
  recognition.start();
}
else {
  msg.innerHTML += "This browser does not support SpeechRecognition or webkitSpeechRecognition." + cr;
}

Web Speech API lag time: how can I fix it?

I am using the Web Speech API in a web page. However, I am finding that it takes 3-5 seconds to bring back a result, which is a lot of lag time in today's web world. Has anyone else had this problem? Has anyone found a solution?
Here's a bare-bones version of what I have so far. It works, per se, but I need it to be faster.
var recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';
recognition.onresult = function(evt) {
  console.log(evt);
};
recognition.start();
Assuming you don't have a network speed issue, setting
recognition.continuous = false;
should give you a faster result (with some caveats).
See my detailed answer here.
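To check whether the `continuous = false` setting actually helps, it can be useful to measure the delay rather than estimate it. A minimal sketch, assuming the recognizer constructor is injected (`webkitSpeechRecognition` in Chrome) and using hypothetical `onResult`/`onLatency` callbacks: it records when `speechend` fires and reports how long the first result took after that.

```javascript
// Sketch: non-continuous recognition that reports the delay between the end of
// speech (as detected by the recognizer) and the arrival of the result.
function recognizeOnce(SpeechRecognitionCtor, { lang = "en-US", now = () => Date.now(), onResult, onLatency } = {}) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = lang;
  recognition.continuous = false; // end after a single utterance, as suggested above
  let speechEndedAt = null;

  recognition.onspeechend = () => {
    speechEndedAt = now();
  };

  recognition.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    // Latency is only reported if speechend fired before this result arrived.
    if (speechEndedAt !== null && onLatency) onLatency(now() - speechEndedAt);
    if (onResult) onResult(transcript);
  };

  recognition.start();
  return recognition;
}
```

Running it once with `continuous = false` and once with the line changed to `true` gives a rough before/after comparison on the same network.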

RTCIceCandidate instance cannot be created in browsers on mobile devices

I've recently been trying out some great features of HTML5 and WebRTC, and am building a site that lets multiple people video chat.
Everything works fine on my PC, and HTML5 Media Capture works like a charm. But when I set up a video source on my PC and try to connect to it from my Android phone, iPhone or iPad, it just doesn't work. I checked the logs, and they suggest that creating the RTCIceCandidate failed for some unknown reason:
// To be processed as either Client or Server
case "CANDIDATE":
  trace("************important*********", "we get in");
  var candidate = new RTCIceCandidate({candidate: msg.candidate});
  trace("************important*********", JSON.stringify(candidate));
  break;
It turns out the second log never shows up.
Does anyone have any idea? Is it because these features are not yet available on mobile devices, or should I do something special for mobile devices?
Oh, and this is the onIceCandidate callback, which is never called:
// This function sends candidates to the remote peer, via the node server
var onIceCandidate = function(event) {
  if (event.candidate) {
    trace("openChannel", "Sending ICE candidate to remote peer : " + event.candidate.candidate);
    var msgCANDIDATE = {};
    msgCANDIDATE.msg_type = 'CANDIDATE';
    msgCANDIDATE.candidate = event.candidate.candidate;
    msgCANDIDATE.peer = server;
    msgCANDIDATE.me = weAreActingAs;
    //trace("openChannel","candidate peer : " + JSON.stringify(event));
    socket.send(JSON.stringify(msgCANDIDATE));
  } else {
    trace("onIceCandidate", "End of candidates");
  }
}
The server is written in Node.js.
Thanks so much, everyone! I need your help!
You should be able to test device support here: http://www.simpl.info/getusermedia/
I'm no expert on WebRTC, but according to the following site there should be support for iOS and Android (http://updates.html5rocks.com/2012/12/WebRTC-hits-Firefox-Android-and-iOS), though you'll need to use the Ericsson browser.
One of the comments says that the Ericsson browser uses the deprecated ROAP signaling and can't be used for peer communication with (for example) Chrome. Another comment states that the native BlackBerry browser now supports getUserMedia, so maybe Android and iOS will follow. There is no native support at the moment, though, and the Ericsson browser's implementation seems to be based on deprecated standards.
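Independently of mobile support, note that the signalling code in the question forwards only `event.candidate.candidate` (the candidate string) and drops `sdpMid` and `sdpMLineIndex`. Under the current WebRTC specification, the `RTCIceCandidate` constructor throws a `TypeError` when both of those are null, so it may be worth sending all three fields. A sketch, assuming the same message layout as the question (`msg_type`, `candidate`, `peer`, `me`); the two helper names are hypothetical.

```javascript
// Sketch: keep sdpMid and sdpMLineIndex in the signalling message so the receiver
// can rebuild a valid RTCIceCandidateInit dictionary.
function buildCandidateMessage(iceCandidate, server, weAreActingAs) {
  return JSON.stringify({
    msg_type: "CANDIDATE",
    candidate: {
      candidate: iceCandidate.candidate,         // the "candidate:..." string
      sdpMid: iceCandidate.sdpMid,               // media stream identification tag
      sdpMLineIndex: iceCandidate.sdpMLineIndex, // index of the m= line in the SDP
    },
    peer: server,
    me: weAreActingAs,
  });
}

// Receiving side: parse the message back into an RTCIceCandidateInit dictionary,
// or return null if it is not a candidate message.
function candidateInitFromMessage(json) {
  const msg = JSON.parse(json);
  if (msg.msg_type !== "CANDIDATE") return null;
  const { candidate, sdpMid, sdpMLineIndex } = msg.candidate;
  return { candidate, sdpMid, sdpMLineIndex };
}
```

In the browser, the resulting dictionary can be passed to `new RTCIceCandidate(init)` or directly to `peerConnection.addIceCandidate(init)`, where `peerConnection` stands for whatever RTCPeerConnection the page already uses.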
