How to capture generated audio from window.speechSynthesis.speak() call?

How to capture generated audio from window.speechSynthesis.speak() call? - javascript

Previous questions have presented this same or similar inquiry
Can Web Speech API used in conjunction with Web Audio API?
How to access audio result from Speech Synthesis API?
Record HTML5 SpeechSynthesisUtterance generated speech to file
generate audio file with W3C Web Speech API
yet no workarounds appear to be have been created using window.speechSynthesis(). Though there are workarounds using epeak , meSpeak How to create or convert text to audio at chromium browser? or making requests to external servers.
How to capture and record audio output of window.speechSynthesis.speak() call and return result as a Blob, ArrayBuffer, AudioBuffer or other object type?

The Web Speech API Specification does not presently provide a means or hint on how to achieve returning or capturing and recording audio output of window.speechSynthesis.speak() call.
See also
MediaStream, ArrayBuffer, Blob audio result from speak() for recording?
Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording?
Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording?. In pertinent part, use cases include, but are not limited to
Persons who have issues speaking; i.e.g., persons whom have suffered a
stroke or other communication inhibiting afflictions. They could convert
text to an audio file and send the file to another individual or group.
This feature would go towards helping them communicate with other persons,
similar to the technologies which assist Stephen Hawking communicate;
Presently, the only person who can hear the audio output is the person
in front of the browser; in essence, not utilizing the full potential of
the text to speech functionality. The audio result can be used as an
attachment within an email; media stream; chat system; or other
communication application. That is, control over the generated audio output;
Another application would be to provide a free, libre, open source audio
dictionary and translation service - client to client and client to server,
server to client.
It is possible to capture the output of audio output of window.speechSynthesis.speak() call utilizing navigator.mediaDevices.getUserMedia() and MediaRecorder(). The expected result is returned at Chromium browser. Implementation at Firefox has issues. Select Monitor of Built-in Audio Analog Stereo at navigator.mediaDevices.getUserMedia() prompt.
The workaround is cumbersome. We should be able to get generated audio, at least as a Blob, without navigator.mediaDevices.getUserMedia() and MediaRecorder().
More interest is evidently necessary by users of browsers, JavaScript and C++ developers, browser implementers and specification authors for further input; to create a proper specification for the feature, and consistent implementation at browsers' source code; see How to implement option to return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak() call.
At Chromium a speech dispatcher program should be installed and the instance launched with --enable-speech-dispatcher flag set, as window.speechSynthesis.getVoices() returns an empty array, see How to use Web Speech API at chromium?.
Proof of concept
// SpeechSynthesisRecorder.js guest271314 6-17-2017
// Motivation: Get audio output from `window.speechSynthesis.speak()` call
// as `ArrayBuffer`, `AudioBuffer`, `Blob`, `MediaSource`, `MediaStream`, `ReadableStream`, or other object or data types
// See https://lists.w3.org/Archives/Public/public-speech-api/2017Jun/0000.html
// https://github.com/guest271314/SpeechSynthesisRecorder
// Configuration: Analog Stereo Duplex
// Input Devices: Monitor of Built-in Audio Analog Stereo, Built-in Audio Analog Stereo
class SpeechSynthesisRecorder {
constructor({text = "", utteranceOptions = {}, recorderOptions = {}, dataType = ""}) {
if (text === "") throw new Error("no words to synthesize");
this.dataType = dataType;
this.text = text;
this.mimeType = MediaRecorder.isTypeSupported("audio/webm; codecs=opus")
? "audio/webm; codecs=opus" : "audio/ogg; codecs=opus";
this.utterance = new SpeechSynthesisUtterance(this.text);
this.speechSynthesis = window.speechSynthesis;
this.mediaStream_ = new MediaStream();
this.mediaSource_ = new MediaSource();
this.mediaRecorder = new MediaRecorder(this.mediaStream_, {
mimeType: this.mimeType,
bitsPerSecond: 256 * 8 * 1024
});
this.audioContext = new AudioContext();
this.audioNode = new Audio();
this.chunks = Array();
if (utteranceOptions) {
if (utteranceOptions.voice) {
this.speechSynthesis.onvoiceschanged = e => {
const voice = this.speechSynthesis.getVoices().find(({
name: _name
}) => _name === utteranceOptions.voice);
this.utterance.voice = voice;
console.log(voice, this.utterance);
}
this.speechSynthesis.getVoices();
}
let {
lang, rate, pitch
} = utteranceOptions;
Object.assign(this.utterance, {
lang, rate, pitch
});
}
this.audioNode.controls = "controls";
document.body.appendChild(this.audioNode);
}
start(text = "") {
if (text) this.text = text;
if (this.text === "") throw new Error("no words to synthesize");
return navigator.mediaDevices.getUserMedia({
audio: true
})
.then(stream => new Promise(resolve => {
const track = stream.getAudioTracks()[0];
this.mediaStream_.addTrack(track);
// return the current `MediaStream`
if (this.dataType && this.dataType === "mediaStream") {
resolve({tts:this, data:this.mediaStream_});
};
this.mediaRecorder.ondataavailable = event => {
if (event.data.size > 0) {
this.chunks.push(event.data);
};
};
this.mediaRecorder.onstop = () => {
track.stop();
this.mediaStream_.getAudioTracks()[0].stop();
this.mediaStream_.removeTrack(track);
console.log(`Completed recording ${this.utterance.text}`, this.chunks);
resolve(this);
}
this.mediaRecorder.start();
this.utterance.onstart = () => {
console.log(`Starting recording SpeechSynthesisUtterance ${this.utterance.text}`);
}
this.utterance.onend = () => {
this.mediaRecorder.stop();
console.log(`Ending recording SpeechSynthesisUtterance ${this.utterance.text}`);
}
this.speechSynthesis.speak(this.utterance);
}));
}
blob() {
if (!this.chunks.length) throw new Error("no data to return");
return Promise.resolve({
tts: this,
data: this.chunks.length === 1 ? this.chunks[0] : new Blob(this.chunks, {
type: this.mimeType
})
});
}
arrayBuffer(blob) {
if (!this.chunks.length) throw new Error("no data to return");
return new Promise(resolve => {
const reader = new FileReader;
reader.onload = e => resolve(({
tts: this,
data: reader.result
}));
reader.readAsArrayBuffer(blob ? new Blob(blob, {
type: blob.type
}) : this.chunks.length === 1 ? this.chunks[0] : new Blob(this.chunks, {
type: this.mimeType
}));
});
}
audioBuffer() {
if (!this.chunks.length) throw new Error("no data to return");
return this.arrayBuffer()
.then(ab => this.audioContext.decodeAudioData(ab))
.then(buffer => ({
tts: this,
data: buffer
}))
}
mediaSource() {
if (!this.chunks.length) throw new Error("no data to return");
return this.arrayBuffer()
.then(({
data: ab
}) => new Promise((resolve, reject) => {
this.mediaSource_.onsourceended = () => resolve({
tts: this,
data: this.mediaSource_
});
this.mediaSource_.onsourceopen = () => {
if (MediaSource.isTypeSupported(this.mimeType)) {
const sourceBuffer = this.mediaSource_.addSourceBuffer(this.mimeType);
sourceBuffer.mode = "sequence"
sourceBuffer.onupdateend = () =>
this.mediaSource_.endOfStream();
sourceBuffer.appendBuffer(ab);
} else {
reject(`${this.mimeType} is not supported`)
}
}
this.audioNode.src = URL.createObjectURL(this.mediaSource_);
}));
}
readableStream({size = 1024, controllerOptions = {}, rsOptions = {}}) {
if (!this.chunks.length) throw new Error("no data to return");
const src = this.chunks.slice(0);
const chunk = size;
return Promise.resolve({
tts: this,
data: new ReadableStream(controllerOptions || {
start(controller) {
console.log(src.length);
controller.enqueue(src.splice(0, chunk))
},
pull(controller) {
if (src.length = 0) controller.close();
controller.enqueue(src.splice(0, chunk));
}
}, rsOptions)
});
}
}
Usage
let ttsRecorder = new SpeechSynthesisRecorder({
text: "The revolution will not be televised",
utternanceOptions: {
voice: "english-us espeak",
lang: "en-US",
pitch: .75,
rate: 1
}
});
// ArrayBuffer
ttsRecorder.start()
// `tts` : `SpeechSynthesisRecorder` instance, `data` : audio as `dataType` or method call result
.then(tts => tts.arrayBuffer())
.then(({tts, data}) => {
// do stuff with `ArrayBuffer`, `AudioBuffer`, `Blob`,
// `MediaSource`, `MediaStream`, `ReadableStream`
// `data` : `ArrayBuffer`
tts.audioNode.src = URL.createObjectURL(new Blob([data], {type:tts.mimeType}));
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
// AudioBuffer
ttsRecorder.start()
.then(tts => tts.audioBuffer())
.then(({tts, data}) => {
// `data` : `AudioBuffer`
let source = tts.audioContext.createBufferSource();
source.buffer = data;
source.connect(tts.audioContext.destination);
source.start()
})
// Blob
ttsRecorder.start()
.then(tts => tts.blob())
.then(({tts, data}) => {
// `data` : `Blob`
tts.audioNode.src = URL.createObjectURL(blob);
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
// ReadableStream
ttsRecorder.start()
.then(tts => tts.readableStream())
.then(({tts, data}) => {
// `data` : `ReadableStream`
console.log(tts, data);
data.getReader().read().then(({value, done}) => {
tts.audioNode.src = URL.createObjectURL(value[0]);
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
})
// MediaSource
ttsRecorder.start()
.then(tts => tts.mediaSource())
.then(({tts, data}) => {
console.log(tts, data);
// `data` : `MediaSource`
tts.audioNode.srcObj = data;
tts.audioNode.title = tts.utterance.text;
tts.audioNode.onloadedmetadata = () => {
console.log(tts.audioNode.duration);
tts.audioNode.play();
}
})
// MediaStream
let ttsRecorder = new SpeechSynthesisRecorder({
text: "The revolution will not be televised",
utternanceOptions: {
voice: "english-us espeak",
lang: "en-US",
pitch: .75,
rate: 1
},
dataType:"mediaStream"
});
ttsRecorder.start()
.then(({tts, data}) => {
// `data` : `MediaStream`
// do stuff with active `MediaStream`
})
.catch(err => console.log(err))
plnkr

This is an updated code from previous answer which works in Chrome 96:
make sure to select "Share system audio" checkbox in "Choose what to share" window
won't run via SO code snippet (save to demo.html)
<script>
(async () => {
const text = "The revolution will not be televised";
const blob = await new Promise(async resolve => {
console.log("picking system audio");
const stream = await navigator.mediaDevices.getDisplayMedia({video:true, audio:true});
const track = stream.getAudioTracks()[0];
if(!track)
throw "System audio not available";
stream.getVideoTracks().forEach(track => track.stop());
const mediaStream = new MediaStream();
mediaStream.addTrack(track);
const chunks = [];
const mediaRecorder = new MediaRecorder(mediaStream, {bitsPerSecond:128000});
mediaRecorder.ondataavailable = event => {
if (event.data.size > 0)
chunks.push(event.data);
}
mediaRecorder.onstop = () => {
stream.getTracks().forEach(track => track.stop());
mediaStream.removeTrack(track);
resolve(new Blob(chunks));
}
mediaRecorder.start();
const utterance = new SpeechSynthesisUtterance(text);
utterance.onend = () => mediaRecorder.stop();
window.speechSynthesis.speak(utterance);
console.log("speaking...");
});
console.log("audio available", blob);
const player = new Audio();
player.src = URL.createObjectURL(blob);
player.autoplay = true;
player.controls = true;
})()
</script>

Related

audio file doesn't work on ios iphone neither safari nor chrome

I'm creating a chat with the ability to send voice notes.
and voice notes work perfectly on desktop and android but on ios things start to crash
once the audio files load, the chrome console on ios shows an error
mediaError {code:4, message:Unsupported source type, MEDIA_ERR_ABORTED:1, MEDIA_ERR_NETWORK:2, MEDIA_ERR_DECODE:3}
and if I click on the play button it gives the error DOMException
This is the function that records audio
const recordAudio = async (_) => {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
const mediaRecorder = new MediaRecorder(stream, {
mimeType:'audio/mp4',
audioBitrate: '128000',
})
mediaRecorder.start()
const audioChunks = []
mediaRecorder.addEventListener('dataavailable', (event) => {
audioChunks.push(event.data)
})
mediaRecorder.addEventListener('stop', () => {
const audioBlob = new Blob(audioChunks, { type: 'audio/mp3' })
composeMessage('audio', audioBlob)
setIsRecording(false)
})
setTimeout(() => {
mediaRecorder.stop()
}, 30000)
}
The function that creates audio file
const createAudioFile = () => {
const audio = new Audio()
audio.setAttribute('preload', 'metadata')
const source = document.createElement('source')
source.setAttribute('src', URL)
source.setAttribute('type', 'audio/mp3')
audio.appendChild(source)
setAudioFile(audio)
}
and this is the function that triggers the audio file
const playAudioHandler = () => {
const playPromise = audioFile.play()
if (playPromise !== undefined) {
playPromise
.then((_) => {
audioFile.play()
setIsPlaying(true)
})
.catch((error) => {
pauseAudioHandler()
})
}
}

Web MediaRecorder API cannot record audio and video simultaneously

I've been trying to record video and audio with the MediaRecorder API but it will only let me record my screen without audio. Do I need to have two separate streams and merge them into one? But why would it be possible to set { audio: true, video: true } in the navigator.mediaDevices.getDisplayMedia() method in this case?
This is my code:
async function startRecording() {
let mimeType = "video/webm;codecs=vp9";
try {
const mediaDevices = navigator.mediaDevices as any;
const stream = await mediaDevices.getDisplayMedia({
audio: true,
video: true,
});
const options = {
mimeType: mimeType,
bitsPerSecond: 500000,
};
let recorder = new MediaRecorder(stream, options);
const chunks = [];
recorder.ondataavailable = (e) => {
if (e.data.size > 0) {
chunks.push(e.data);
} else {
console.log("no data to push");
}
};
recorder.onstop = (e) => {
const completeBlob = new Blob(chunks, {
type: chunks[0].type
});
stream.getTracks().forEach((track) => {
track.stop();
console.log(track);
});
setVideoData({
recorded: true,
localVideoURL: URL.createObjectURL(completeBlob),
blob: completeBlob,
});
};
recorder.start();
} catch (error) {
console.log(error);
}
}
Any pointers greatly appreciated.

Most browsers don't support capturing audio with display media. Even in Chrome and Chromium variants, capture support depends on the OS.
https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getDisplayMedia#Browser_compatibility

Creating AudioBuffers from MediaRecorder Chunks

I'm trying to analyse Audio in realtime as it flows in, using a function that takes an AudioBuffer. I convert the Blob that the recorder gives me with the following function, but it throws a DOMException: Unable to decode audio data.
async function audioBufferFromBlob(blob: Blob, audioCtx: AudioContext): Promise<AudioBuffer> {
return await audioCtx.decodeAudioData(await new Response(blob).arrayBuffer());
}
I believe this is due to the fact, that the recorders ondataavailable blobs aren't full audio streams only chunks. Is there a way convert the blobs to full audio streams?
Full code:
import PitchFinder = require('pitchfinder');
const amdf = PitchFinder.AMDF();
function logPitch(pitch: [number, number], elem: HTMLElement) {
elem.innerText = pitch.toString();
}
function detectPitch(audioBuffer: AudioBuffer): [number, number]{
let frequency = amdf(audioBuffer.getChannelData(0));
return [frequency, noteFromFrequency(frequency)]
}
async function audioBufferFromBlob(blob: Blob, audioCtx: AudioContext): Promise<AudioBuffer> {
return await audioCtx.decodeAudioData(await new Response(blob).arrayBuffer());
}
function noteFromFrequency(frequency: number ): number {
let noteNum = 12 * (Math.log( frequency / 440 )/Math.log(2) );
return Math.round( noteNum ) + 69;
}
export function displayPitch(displayElem: HTMLElement) {
let audioCtx = new AudioContext();
navigator.getUserMedia({audio: true, video: false}, successCallback, errorCallback);
function errorCallback() {
alert("Something went wrong.");
}
function successCallback(stream: MediaStream) {
let recorder = new MediaRecorder(stream, { mimeType: 'audio/webm'});
recorder.ondataavailable = async (event: BlobEvent) => {
let buffer = await audioBufferFromBlob(event.data, audioCtx);
logPitch(detectPitch(buffer), displayElem);
};
recorder.start(500);
}
}

Duration of recordered file WebRTC

I implement a video streaming service with WebRTC an signaling WebSockets-Server. It is a electron-App. I want to record some video fragments and get a webm file to download. It works fine if i record a local stream (from navigator.mediaDevices.getUserMedia), but if I try to record a remote stream, it will be a video file generated with duration < 1 sec, this trouble is only in chrome (and electron) in Firefox works it fine. If I call console.log(blob) in stopRecording () it gives somethink like this:
Blob(1875185) {size: 1875185, type: "video/webm"}
So blob size is correct and after downloading is file size is also correct, what in this case?
class WebRTC {
...........
constructor () {
navigator.mediaDevices.getUserMedia({
'audio': true,
'video': false
}).then(
(stream) => {
this.peerConnection = new RTCPeerConnection(peerConnectionConfig)
this.peerConnection.addEventListener('addstream', event => this.gotRemoteMediaStream(event))
this.peerConnection.addEventListener('icecandidate', e => this.handleConnection(e))
this.peerConnection.addStream(stream)
}
).catch(e => console.log('error', e))
}
gotRemoteMediaStream (event) {
console.log(event)
const mediaStream = event.stream
this.video.srcObject = mediaStream
this.video.play()
this.remoteStream = mediaStream
console.log('Remote peer connection received remote stream.')
}
startRecording () {
console.log('RECORD')
this.recorder = new MediaRecorder(this.remoteStream)
this.recorder.addEventListener('dataavailable', e => this.onRecordingReady(e))
this.recorder.start(15)
}
onRecordingReady (e) {
console.log(e)
this.chunks.push(e.data)
}
stopRecording () {
this.recorder.stop()
var a = document.createElement('a')
document.body.appendChild(a)
a.style = 'display: none'
a.download = 'test.webm'
setTimeout(() => {
console.log(this)
var blob = new Blob(this.chunks, { type: 'video/webm' })
console.log(this.chunks)
console.log(blob)
let url = URL.createObjectURL(blob)
a.href = url
a.click()
window.URL.revokeObjectURL(url)
}, 500)
}
}

React Native. MP3 Binary String (Uint8Array(9549)) to stream or file

I am trying to play an audio file with binary string format that Amazon Polly returns.
For that, I am using 'react-native-fetch-blob' and reading a stream, but just keep getting errors from the bridge saying 'Invalid data message - all must be length: 8'.
It happens when I try to open the stream: ifstream.open()
This is the code:
//polly config
const params = {
LexiconNames: [],
OutputFormat: "mp3",
SampleRate: "8000",
Text: "All Gaul is divided into three parts",
TextType: "text",
VoiceId: "Joanna"
};
Polly.synthesizeSpeech(params, function(err, data) {
let _data = "";
RNFetchBlob.fs.readStream(
// file path
data.AudioStream,
// encoding, should be one of `base64`, `utf8`, `ascii`
'ascii'
)
.then((ifstream) => {
ifstream.open()
ifstream.onData((chunk) => {
_data += chunk
})
ifstream.onError((err) => {
console.log('oops', err.toString())
})
ifstream.onEnd(() => {
//pasing _data to streaming player or normal audio player
ReactNativeAudioStreaming.play(_data, {showIniOSMediaCenter: true, showInAndroidNotifications: true});
})
})
});
Another solution I have also tried is to save the stream into a file to load it later on, but I got similars bugs.
RNFetchBlob.fs.createFile("myfile.mp3", dataG.AudioStream, 'ascii');
Huge thanks in advance

You could use the getSynthesizeSpeechUrl method from AWS.Polly.Presigner. I’m doing this and using react-native-sound to play the mp3. I ran into an issue where the mp3 wouldn’t play because my presigned URL contained special characters, but there’s a fix here.

You can use fetch() to request one or more media resources, return Response.body.getReader() from .then() to get a ReadableStream of the response. Read the Uint8Array values returned as the stream as read with .read() method of the ReadableStream, append to value to SourceBuffer of MediaSource to stream the media at an HTMLMediaElement.
For example, to output the audio of two requested audio resources, in sequence
window.addEventListener("load", () => {
const audio = document.createElement("audio");
audio.controls = "controls";
document.body.appendChild(audio);
audio.addEventListener("canplay", e => {
audio.play();
});
const words = ["hello", "world"];
const mediaSource = new MediaSource();
const mimeCodec = "audio/mpeg";
const mediaType = ".mp3";
const url = "https://ssl.gstatic.com/dictionary/static/sounds/de/0/";
Promise.all(
words.map(word =>
fetch(`https://query.yahooapis.com/v1/public/yql?q=select * from data.uri where url="${url}${word}${mediaType}"&format=json&callback=`)
.then(response => response.json())
.then(({
query: {
results: {
url
}
}
}) =>
fetch(url).then(response => response.body.getReader())
.then(readers => readers)
)
)
)
.then(readers => {
audio.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener("sourceopen", sourceOpen);
async function sourceOpen() {
var sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);
// set `sourceBuffer` `.mode` to `"sequence"`
sourceBuffer.mode = "segments";
const processStream = ({
done,
value
}) => {
if (done) {
return;
}
// append chunk of stream to `sourceBuffer`
sourceBuffer.appendBuffer(value);
}
// at `sourceBuffer` `updateend` call `reader.read()`,
// to read next chunk of stream, append chunk to
// `sourceBuffer`
for (let [index, reader] of Object.entries(readers)) {
sourceBuffer.addEventListener("updateend", function() {
reader.read().then(processStream);
});
let stream = await reader.read().then(processStream)
.then(() => reader.closed)
.then(() => "done reading stream " + index);
console.log(stream);
}
}
})
})
plnkr http://plnkr.co/edit/9zHwmcdG3UKYMghD0w3q?p=preview

We Keep Coding

JavaScript is the programming language of the Web.

How to capture generated audio from window.speechSynthesis.speak() call? - javascript

Related

audio file doesn't work on ios iphone neither safari nor chrome

Web MediaRecorder API cannot record audio and video simultaneously

Creating AudioBuffers from MediaRecorder Chunks

Duration of recordered file WebRTC

React Native. MP3 Binary String (Uint8Array(9549)) to stream or file

Categories

Resources