generate audio file with W3C Web Speech API - javascript

Is it possible to use the W3C Web Speech API to write JavaScript code that generates an audio file (wav, ogg or mp3) of a voice speaking the given text? I mean, I want to do something like:
window.speechSynthesis.speak(new SpeechSynthesisUtterance("0 1 2 3"))
but I want the generated sound to be written to a file rather than output to the speakers.

This is not possible using the Web Speech API alone; see Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording? and How to implement option to return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak() call.
It is possible, though, using a library such as espeak or meSpeak; see How to create or convert text to audio at chromium browser?.
fetch("https://gist.githubusercontent.com/guest271314/f48ee0658bc9b948766c67126ba9104c/raw/958dd72d317a6087df6b7297d4fee91173e0844d/mespeak.js")
.then(response => response.text())
.then(text => {
const script = document.createElement("script");
script.textContent = text;
document.body.appendChild(script);
return Promise.all([
new Promise(resolve => {
meSpeak.loadConfig("https://gist.githubusercontent.com/guest271314/8421b50dfa0e5e7e5012da132567776a/raw/501fece4fd1fbb4e73f3f0dc133b64be86dae068/mespeak_config.json", resolve)
}),
new Promise(resolve => {
meSpeak.loadVoice("https://gist.githubusercontent.com/guest271314/fa0650d0e0159ac96b21beaf60766bcc/raw/82414d646a7a7ef11bb04ddffe4091f78ef121d3/en.json", resolve)
})
])
})
.then(() => {
// takes approximately 14 seconds to get here
console.log(meSpeak.isConfigLoaded());
console.log(meSpeak.speak("what it do my ninja", {
amplitude: 100,
pitch: 5,
speed: 150,
wordgap: 1,
variant: "m7",
rawdata: "mime"
}));
})
.catch(err => console.log(err));
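To turn that return value into an actual file, the string has to be decoded into bytes first. The following is a minimal sketch, assuming that meSpeak returns a base64 data URL when rawdata is "mime"; dataUrlToBytes is a hypothetical helper name, not part of meSpeak.

```javascript
// Hypothetical helper (not part of meSpeak): decode a base64 data URL into raw bytes.
function dataUrlToBytes(dataUrl) {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new TypeError("Expected a data URL");
  }
  // Header looks like "audio/x-wav;base64"; keep only the MIME type.
  const header = dataUrl.slice(5, comma);
  const mimeType = header.split(";")[0] || "application/octet-stream";
  // atob yields a binary string with one character per byte.
  const binary = atob(dataUrl.slice(comma + 1));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mimeType, bytes };
}

// Browser-only usage (assumes a DOM and that meSpeak has finished loading):
// const { mimeType, bytes } = dataUrlToBytes(meSpeak.speak("0 1 2 3", { rawdata: "mime" }));
// const a = document.createElement("a");
// a.href = URL.createObjectURL(new Blob([bytes], { type: mimeType }));
// a.download = "speech.wav";
// a.click();
```

The download part is left commented out because it needs a page context; the decoding step itself has no DOM dependency.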
There is also a workaround using MediaRecorder, which depends on the system's audio hardware; see How to capture generated audio from window.speechSynthesis.speak() call?.

Related

Detect if AVIF image is animated using JavaScript

Is there a way to detect if an AVIF image is animated using JavaScript?
Absolutely no frameworks or libraries.
The new ImageDecoder API can tell you this.
You'd pass a ReadableStream of your data to it, and then check if one of the decoder's tracks has its animated metadata set to true:
if (!window.ImageDecoder) {
console.warn("Your browser doesn't support the ImageDecoder API yet, we'd need to load a library");
}
// from https://colinbendell.github.io/webperf/animated-gif-decode/avif.html
fetch("https://colinbendell.github.io/webperf/animated-gif-decode/6.avif").then((resp) => test("animated", resp.body));
// from https://github.com/link-u/avif-sample-images cc-by-sa 4.0 Kaede Fujisaki
fetch("https://raw.githubusercontent.com/link-u/avif-sample-images/master/fox.profile1.8bpc.yuv444.avif").then((resp) => test("static", resp.body));
document.querySelector("input").onchange = ({target}) => test("your image", target.files[0].stream());
async function test(name, stream) {
const decoder = new ImageDecoder({ data: stream, type: "image/avif" });
// wait until some metadata is available
await decoder.tracks.ready;
// log if one of the tracks is animated
console.log(name, [...decoder.tracks].some((track) => track.animated));
}
<input type=file>
However, beware that this API is still not widely supported: at the time of writing, only Chromium-based browsers implement it.
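Where ImageDecoder is missing, one library-free fallback is to inspect the file header directly. The sketch below rests on an assumption that is not part of the answer above: AVIF image sequences declare the "avis" brand in the leading ftyp box. A file could still declare that brand while containing a single frame, so treat the result as a hint rather than a guarantee. hasAvisBrand is a hypothetical helper name.

```javascript
// Sketch: look for the "avis" brand in the leading ftyp box of an AVIF file.
function hasAvisBrand(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Read four bytes as an ASCII tag.
  const fourcc = (offset) =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);
  if (bytes.byteLength < 16 || fourcc(4) !== "ftyp") return false;
  // The box size covers the 8-byte header, major brand, minor version and compatible brands.
  const boxEnd = Math.min(view.getUint32(0), bytes.byteLength);
  if (fourcc(8) === "avis") return true; // major brand
  for (let offset = 16; offset + 4 <= boxEnd; offset += 4) {
    if (fourcc(offset) === "avis") return true; // compatible brand
  }
  return false;
}

// Browser usage with a File from an <input type=file> (only the first bytes are needed):
// file.slice(0, 64).arrayBuffer().then((buf) => console.log(hasAvisBrand(new Uint8Array(buf))));
```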

Is there a `loadedmetadata` for html5 img?

I'm rendering large images which are streamed natively by the browser.
What I need is a Javascript event that indicates that the image's dimensions were retrieved from its metadata. The only event that seems to be firing is the onload event, but this is not useful as the dimensions were known long before that. I've tried loadstart but it does not fire for img elements.
Is there a loadedmetadata event for the img element in html5?
There is not an equivalent of loadedmetadata for img elements.
The most up-to-date specs at the time of writing are the W3C Recommendation (5.2) (or the W3C WD (5.3)) and the WHATWG Living Standard, although I find it easier to browse through all the events on MDN; its docs are more user-friendly.
You can check that loadedmetadata is the only event related to metadata and that it applies just to HTMLMediaElements.
You could take advantage of the Streams API to access the streams of data, process them and extract the metadata yourself. It has two caveats, though: it is an experimental technology with limited support and you will need to look for a way to read the image dimensions from the data stream depending on the image format.
I put together an example for PNG images based on MDN docs.
Following the PNG spec, the dimensions of a PNG image come right after the signature, at the beginning of the IHDR chunk (i.e., width at bytes 16-19, height at bytes 20-23, both big-endian). Although it is not guaranteed, the metadata of most image formats is likely to be available in the first chunk that you receive.
const image = document.getElementById('img');
// Fetch the original image
fetch('https://upload.wikimedia.org/wikipedia/commons/d/de/Wikipedia_Logo_1.0.png')
// Retrieve its body as ReadableStream
.then(response => {
const reader = response.body.getReader();
return new ReadableStream({
start(controller) {
let firstChunkReceived = false;
return pump();
function pump() {
return reader.read().then(({
done,
value
}) => {
// When no more data needs to be consumed, close the stream
if (done) {
controller.close();
return;
}
// Log the chunk of data in console
console.log('data chunk: [' + value + ']');
// Retrieve the metadata from the first chunk
if (!firstChunkReceived) {
firstChunkReceived = true;
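// DataView takes (buffer, byteOffset, byteLength); PNG integers are big-endian
let width = (new DataView(value.buffer, value.byteOffset + 16, 4)).getUint32(0);
let height = (new DataView(value.buffer, value.byteOffset + 20, 4)).getUint32(0);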
console.log('width: ' + width + '; height: ' + height);
}
// Enqueue the next data chunk into our target stream
controller.enqueue(value);
return pump();
});
}
}
})
}).then(stream => new Response(stream))
.then(response => response.blob())
.then(blob => URL.createObjectURL(blob))
.then(url => console.log(image.src = url))
.catch(err => console.error(err));
<img id="img" src="" alt="Image preview...">
Disclaimer: when I read this question I knew that the Streams API could be used, but I had never needed to extract metadata, so I had not researched it at all. There may be other APIs or libraries that do a better job, more straightforwardly and with wider browser support.
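The byte offsets from the PNG spec can also be isolated into a small function that is independent of the streaming code. This is only a sketch: readPngDimensions is a hypothetical name, and it assumes the first chunk holds at least the first 24 bytes of the file.

```javascript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns { width, height } for a chunk that starts a PNG file, otherwise null.
function readPngDimensions(chunk) {
  if (chunk.byteLength < 24) return null;
  if (!PNG_SIGNATURE.every((byte, i) => chunk[i] === byte)) return null;
  // Respect byteOffset in case the chunk is a view into a larger buffer.
  const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  // Width at bytes 16-19 and height at bytes 20-23, both big-endian.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```

Inside a stream reader it could be called on the first value returned by reader.read(); a null result means either the data is not a PNG or the chunk was too short.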

D3.js pulling and embedding DataURI images with Promises

I'm building a data visualization which relies on a lot of small raster images, delivered as AWS URLs via JSON API.
This works fairly well, until I try to implement my next step, which is rendering the data visualization as a PNG to download. In the PNG, the raster images are broken.
I've understood that to solve this, I need to embed images as Data URLs.
Here's what I've got so far:
const companies_base64 = companies.map(c => {
var o = Object.assign({}, c)
o.base64 = imageToBase64(c.mimetype, c.logo)
return o
})
Where companies is an array of objects. Here's imageToBase64, the Heroku app being a clone of CORS anywhere:
function imageToBase64(mimetype, logo) {
var url = 'https://example.herokuapp.com/' + logo
return d3.blob(url)
.then(blob => blobToBase64(blob))
.then(base64 => mimetype + base64)
.catch(error => console.error(error))
}
function blobToBase64(blob) {
return new Promise((resolve, reject) => {
let reader = new FileReader()
reader.onload = () => {
let dataUrl = reader.result
let base64 = dataUrl.split(',')[1]
resolve(base64)
}
reader.onerror = () => {
reject("Error")
}
reader.readAsDataURL(blob)
})
}
This results in a Promise being returned when accessing base64 on any of the objects in companies_base64, the [[PromiseValue]] being of course what I'm after. How am I supposed to make sure it is what gets returned so I can, ultimately, place it inside the xlink:href attributes of the <image>s in my <svg>?
I think that once it works and I can call imageToBase64 wherever, it's something I want to do only when the user presses Download. I imagine I can do this using D3, iterating over the <image>s and swapping out their xlink:href. Or should I go about it another way?
I have also tried getting the images as objects and then converting them to base64 in my RoR backend so they come packaged with the JSON, via an Image#to_base64 method. This does work, but it A) feels very wrong and B) is obviously very slow on initial load.
Thank you for your time and please bear with me as I am a beginner.
Your imageToBase64 function returns a promise, not the resolved data URL. That means you have to wait before you can attach them to the companies_base64 members. It is your choice if you do that as soon as the individual base64 string is ready, or if you wait for them all:
Promise.all(companies.map(c => {
return imageToBase64(c.mimetype, c.logo)
.then(u => Object.assign({ base64: u }, c))
.then(/* change the image reference here one by one... */)
}))
.then(companies_base64 => /* ...or here, in a loop over the array */)
.catch(error => console.error(error))
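The "wait for them all" option can also be written with async/await. The following is a minimal sketch, not the asker's actual code: fakeImageToBase64 is a hypothetical stand-in for imageToBase64 that resolves immediately instead of fetching anything, and withDataUrls is a hypothetical name.

```javascript
// Hypothetical stand-in for imageToBase64: resolves with a data URL without any network access.
function fakeImageToBase64(mimetype, logo) {
  return Promise.resolve(`data:${mimetype};base64,` + btoa(logo));
}

// Resolves with copies of the companies, each carrying its resolved base64 value.
function withDataUrls(companies, toDataUrl) {
  return Promise.all(
    companies.map(async (c) => ({ ...c, base64: await toDataUrl(c.mimetype, c.logo) }))
  );
}

// Usage: the array is only available inside then() (or after await).
withDataUrls([{ name: "Acme", mimetype: "image/png", logo: "x" }], fakeImageToBase64)
  .then((companiesBase64) => console.log(companiesBase64[0].base64))
  .catch((error) => console.error(error));
```

Because the values exist only once the promise settles, swapping the xlink:href attributes would happen inside that then() callback, for example in the Download button's click handler.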

Getting number of audio channels for an AudioTrack

I have a video element, with data being added via MSE. I'm trying to determine how many audio channels there are in each track.
The AudioTrack objects themselves don't have a property with this information. The only way I know to go about it is to use the Web Audio API:
const v = document.querySelector('video');
const ctx = new OfflineAudioContext(32, 48000, 48000);
console.log(Array.from(v.audioTracks).map((track) => {
return ctx.createBufferSource(track.sourceBuffer).channelCount;
}));
For a video with a single mono track, I expect to get [1]. For a video with a single stereo track, I expect to get [2]. Yet, every time I get [2] no matter what the channel count is in the original source.
Questions:
Is there a proper direct way to get the number of channels in an AudioTrack?
Is there something else I could be doing with the Web Audio API to get the correct number of channels?
I stumbled upon an answer for this that seems to be working. It looks like by using decodeAudioData we can grab some buffer data about a file. I built a little function that returns a Promise with the buffer data that should return the correct number of channels of an audio file:
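// Assumes a browser environment; one AudioContext is shared by all calls
const audioContext = new AudioContext();
function loadBuffer(path) {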
return fetch(path)
.then(response => response.arrayBuffer())
.then(
buffer =>
new Promise((resolve, reject) =>
audioContext.decodeAudioData(
buffer,
data => resolve(data),
err => reject(err)
)
)
)
}
Then you can use it like this:
loadBuffer(audioSource).then(data => console.log(data.numberOfChannels))
Might be best to store and reuse the data if it can be called multiple times.
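One way to store and reuse the result is to cache the promise per path. This is a sketch under the assumption that the same path always yields the same audio; createCachedLoader is a hypothetical helper that wraps any promise-returning loader.

```javascript
// Wraps a promise-returning loader so that each path is loaded at most once.
function createCachedLoader(load) {
  const cache = new Map();
  return function cachedLoad(path) {
    if (!cache.has(path)) {
      const promise = load(path).catch((err) => {
        // Forget failed loads so that a later call can retry.
        cache.delete(path);
        throw err;
      });
      cache.set(path, promise);
    }
    return cache.get(path);
  };
}

// Usage (assumes a promise-returning loader such as the loadBuffer function described above):
// const loadBufferCached = createCachedLoader(loadBuffer);
// loadBufferCached(audioSource).then(data => console.log(data.numberOfChannels));
```

Caching the promise rather than the decoded data means that concurrent calls for the same path share a single fetch and decode.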

How to create or convert text to audio at chromium browser?

While trying to find a solution to How to use Web Speech API at chromium?, I found that
var voices = window.speechSynthesis.getVoices();
returns an empty array for voices.
I am not certain whether the lack of support in the Chromium browser is related to the issue described in Not OK, Google: Chromium voice extension pulled after spying concerns.
Questions:
1) Are there any workarounds which can implement the requirement of creating or converting audio from text at chromium browser?
2) How can we, the developer community, create an open source database of audio files reflecting both common and uncommon words; served with appropriate CORS headers?
I have found several possible workarounds which provide the ability to create audio from text; two of them require requesting an external resource, and the other uses meSpeak.js by #masswerk.
The first uses the approach described at Download the Audio Pronunciation of Words from Google. It suffers from not being able to determine in advance which words actually exist as a file at the resource without writing a shell script or performing a HEAD request to check whether a network error occurs. For example, the word "do" is not available at the resource used below.
window.addEventListener("load", () => {
const textarea = document.querySelector("textarea");
const audio = document.createElement("audio");
const mimecodec = "audio/webm; codecs=opus";
audio.controls = "controls";
document.body.appendChild(audio);
audio.addEventListener("canplay", e => {
audio.play();
});
let words = textarea.value.trim().match(/\w+/g);
const url = "https://ssl.gstatic.com/dictionary/static/sounds/de/0/";
const mediatype = ".mp3";
Promise.all(
words.map(word =>
fetch(`https://query.yahooapis.com/v1/public/yql?q=select * from data.uri where url="${url}${word}${mediatype}"&format=json&callback=`)
.then(response => response.json())
.then(({query: {results: {url}}}) =>
fetch(url).then(response => response.blob())
.then(blob => blob)
)
)
)
.then(blobs => {
// const a = document.createElement("a");
audio.src = URL.createObjectURL(new Blob(blobs, {
type: mimecodec
}));
// a.download = words.join("-") + ".webm";
// a.click()
})
.catch(err => console.log(err));
});
<textarea>what it does my ninja?</textarea>
Resources at Wikimedia Commons Category:Public domain are not necessarily served from the same directory; see How to retrieve Wiktionary word content? and wikionary API - meaning of words.
If the precise location of the resource is known, the audio can be requested, though the URL may include prefixes other than the word itself.
fetch("https://upload.wikimedia.org/wikipedia/commons/c/c5/En-uk-hello-1.ogg")
.then(response => response.blob())
.then(blob => new Audio(URL.createObjectURL(blob)).play());
I am not entirely sure how to use the Wikipedia API (see How to get Wikipedia content using Wikipedia's API? and Is there a clean wikipedia API just for retrieve content summary?) to get only the audio file. The JSON response would need to be parsed for text ending in .ogg, then a second request would need to be made for the resource itself.
fetch("https://en.wiktionary.org/w/api.php?action=parse&format=json&prop=text&callback=?&page=hello")
.then(response => response.text())
.then(data => {
new Audio(location.protocol + data.match(/\/\/upload\.wikimedia\.org\/wikipedia\/commons\/[\d-/]+[\w-]+\.ogg/).pop()).play()
})
// "//upload.wikimedia.org/wikipedia/commons/5/52/En-us-hello.ogg\"
which logs
Fetch API cannot load https://en.wiktionary.org/w/api.php?action=parse&format=json&prop=text&callback=?&page=hello. No 'Access-Control-Allow-Origin' header is present on the requested resource
when not requested from the same origin. We would need to try YQL again, though I am not certain how to formulate the query to avoid errors.
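Independently of how the response is obtained, the parsing step can be isolated and tried on its own. The sketch below assumes that Commons upload paths follow the one-hex-character / two-hex-character directory layout seen in the URLs in this post; extractOggUrl is a hypothetical helper name.

```javascript
// Returns the first .ogg upload URL found in the text, or null if there is none.
function extractOggUrl(text, protocol = "https:") {
  const match = text.match(
    /\/\/upload\.wikimedia\.org\/wikipedia\/commons\/[0-9a-f]\/[0-9a-f]{2}\/[^"'\\\s<>]+?\.ogg/
  );
  // The URLs in the response are protocol-relative, so prepend a protocol.
  return match ? protocol + match[0] : null;
}

// Usage on a fragment shaped like the escaped HTML inside the JSON response:
console.log(extractOggUrl('src=\\"//upload.wikimedia.org/wikipedia/commons/5/52/En-us-hello.ogg\\"'));
```

Accepting hex directory names also covers paths such as c/c5/, which a digits-only pattern would miss.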
The third approach uses a slightly modified version of meSpeak.js to generate the audio without making an external request. The modification was to create a proper callback for the .loadConfig() method:
fetch("https://gist.githubusercontent.com/guest271314/f48ee0658bc9b948766c67126ba9104c/raw/958dd72d317a6087df6b7297d4fee91173e0844d/mespeak.js")
.then(response => response.text())
.then(text => {
const script = document.createElement("script");
script.textContent = text;
document.body.appendChild(script);
return Promise.all([
new Promise(resolve => {
meSpeak.loadConfig("https://gist.githubusercontent.com/guest271314/8421b50dfa0e5e7e5012da132567776a/raw/501fece4fd1fbb4e73f3f0dc133b64be86dae068/mespeak_config.json", resolve)
}),
new Promise(resolve => {
meSpeak.loadVoice("https://gist.githubusercontent.com/guest271314/fa0650d0e0159ac96b21beaf60766bcc/raw/82414d646a7a7ef11bb04ddffe4091f78ef121d3/en.json", resolve)
})
])
})
.then(() => {
// takes approximately 14 seconds to get here
console.log(meSpeak.isConfigLoaded());
meSpeak.speak("what it do my ninja", {
amplitude: 100,
pitch: 5,
speed: 150,
wordgap: 1,
variant: "m7"
});
})
.catch(err => console.log(err));
One caveat of the above approach is that it takes approximately 14 and a half seconds for the three files to load before the audio is played back. However, it avoids external requests.
It would be a positive to do either or both of the following: 1) create a FOSS, developer-maintained database or directory of sounds for both common and uncommon words; 2) develop meSpeak.js further to reduce the load time of the three necessary files, and use Promise-based approaches to provide notifications of the loading progress of the files and the readiness of the application.
In this user's estimation, it would be a useful resource if developers themselves created and contributed to an online database of files which responded with an audio file of the specific word. I am not entirely sure whether GitHub is the appropriate venue to host audio files. I will have to consider the possible options if interest in such a project is shown.
