I'm trying to implement Amazon Polly in an MVC application. I was able to retrieve the audio from the text, and that works fine. Now I'm trying to highlight the corresponding text on the webpage while the audio plays, the way ReadSpeaker does.
My aim is to implement this without a third-party application. I went through the documentation but couldn't find anything useful; Amazon Polly itself doesn't seem to offer an option that automatically highlights the text.
Can we do anything with speech marks for this? Is there any other way to do it?
Thanks in Advance :)
Edit
I have the speech mark JSON result. Now I'm stuck on how to sync this result with an HTML audio tag.
{"time":6,"type":"word","start":0,"end":2,"value":"Hi"}
{"time":587,"type":"word","start":4,"end":6,"value":"my"}
{"time":754,"type":"word","start":7,"end":11,"value":"name"}
{"time":1147,"type":"word","start":12,"end":14,"value":"is"}
{"time":1305,"type":"word","start":15,"end":19,"value":"John"}
I have the speech mark JSON result. Now I'm stuck on how to sync this result with an HTML audio tag.
You can use the timeupdate event on an audio element and sync the audio.currentTime property with the best-matching speech mark from Polly.
This assumes that audio is the JavaScript variable for the HTML audio element whose src is set to the audio file returned from Polly that corresponds to the text's speech marks:
// Find the latest speech mark whose time is at or before the given time (in ms)
function getSpeechMarkAtTime(speechMarks, time) {
  const length = speechMarks.length
  let match = speechMarks[0]
  let found = false
  let i = 1
  while (i < length && !found) {
    if (speechMarks[i].time <= time) {
      match = speechMarks[i]
    } else {
      found = true
    }
    i++
  }
  return match
}
function onTimeUpdate(speechMark) {
  /**
   * Update your HTML and CSS based on the attributes
   * of the speech mark at the audio's current time.
   */
}
audio.addEventListener('timeupdate', () => {
  // Polly speech marks use milliseconds
  const currentTime = audio.currentTime * 1000
  const speechMark = getSpeechMarkAtTime(speechMarksJSONResult, currentTime)
  // Some custom callback
  onTimeUpdate(speechMark)
})
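For example, if the spoken text lives in a container element (here a hypothetical #transcript element holding the exact string that was sent to Polly), onTimeUpdate could wrap the current word in a <mark> using the speech mark's start/end character offsets. A rough sketch, not a full implementation:
// Sketch only: transcriptText is assumed to be the exact string synthesized by Polly,
// and #transcript is a hypothetical element displaying it.
const transcriptEl = document.getElementById('transcript')
const transcriptText = 'Hi my name is John'

function onTimeUpdate(speechMark) {
  if (!speechMark) return
  // start/end are character offsets into the synthesized text
  transcriptEl.innerHTML =
    transcriptText.slice(0, speechMark.start) +
    '<mark>' + transcriptText.slice(speechMark.start, speechMark.end) + '</mark>' +
    transcriptText.slice(speechMark.end)
}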
Related
I've been building a music app, and today I finally got to the point where I started working on actually playing the music in it.
As an outline of how my environment is set up, I am storing the music files as MP3s which I have uploaded into a MongoDB database using GridFS. I then use a socket.io server to download the chunks from the MongoDB database and send them as individual emits to the front end, where they are processed by the Web Audio API and scheduled to play.
When they play, they are all in the correct order but there is this very tiny glitch or skip at the same spots every time (presumably between chunks) that I can't seem to get rid of. As far as I can tell, they are all scheduled right up next to each other so I can't find a reason why there should be any sort of gap or overlap between them. Any help would be appreciated. Here's the code:
Socket Route
socket.on('stream-audio', () => {
  db.client.db("dev").collection('music.files').findOne({"metadata.songId": "3"}).then((result) => {
    const bucket = new GridFSBucket(db.client.db("dev"), {
      bucketName: "music"
    });
    bucket.openDownloadStream(result._id).on('data', (chunk) => {
      socket.emit('audio-chunk', chunk);
    });
  });
});
Front end
//These variables are declared as object variables, hence all of the "this" keywords
context: new (window.AudioContext || window.webkitAudioContext)(),
freeTime: null,
numChunks: 0,
chunkTracker: [],
...
this.socket.on('audio-chunk', (chunk) => {
  //Keeping track of chunk decoding status so that they don't get scheduled out of order
  const chunkId = this.numChunks
  this.chunkTracker.push({
    id: chunkId,
    complete: false,
  });
  this.numChunks += 1;
  //Callback to the decodeAudioData function
  const decodeCallback = (buffer) => {
    var shouldExecute = false;
    const trackIndex = this.chunkTracker.map((e) => e.id).indexOf(chunkId);
    //Checking if either it's the first chunk or the previous chunk has completed
    if (trackIndex !== 0) {
      const prevChunk = this.chunkTracker.filter((e) => e.id === (chunkId - 1))
      if (prevChunk[0].complete) {
        shouldExecute = true;
      }
    } else {
      shouldExecute = true;
    }
    //THIS IS THE ACTUAL WEB AUDIO API STUFF
    if (shouldExecute) {
      if (this.freeTime === null) {
        this.freeTime = this.context.currentTime
      }
      const source = this.context.createBufferSource();
      source.buffer = buffer
      source.connect(this.context.destination)
      if (this.context.currentTime >= this.freeTime) {
        source.start()
        this.freeTime = this.context.currentTime + buffer.duration
      } else {
        source.start(this.freeTime)
        this.freeTime += buffer.duration
      }
      //Update the tracker of the chunks that this one is complete
      this.chunkTracker[trackIndex] = {id: chunkId, complete: true}
    } else {
      //If the previous chunk hasn't processed yet, check again in 50ms
      setTimeout((passBuffer) => {
        decodeCallback(passBuffer)
      }, 50, buffer);
    }
  }
  decodeCallback.bind(this);
  this.context.decodeAudioData(chunk, decodeCallback);
});
Any help would be appreciated, thanks!
As an outline of how my environment is set up, I am storing the music files as MP3s which I have uploaded into a MongoDB database using GridFS.
You can do this if you want, but these days we have tools like Minio, which can make this easier using more common APIs.
I then use a socket.io server to download the chunks from the MongoDB database and send them as individual emits to the front end
Don't go this route. There's no reason for the overhead of web sockets, or Socket.IO. A normal HTTP request would be fine.
where they are processed by the Web Audio API and scheduled to play.
You can't stream this way. The Web Audio API doesn't support useful streaming unless you happen to have raw PCM chunks, which you don't.
As far as I can tell, they are all scheduled right up next to each other so I can't find a reason why there should be any sort of gap or overlap between them.
Lossy codecs aren't going to give you sample-accurate output. Especially with MP3, if you give it some arbitrary number of samples, you're going to end up with at least one full MP3 frame (~576 samples) output. The reality is that you need data ahead of the first audio frame for it to work properly. If you want to decode a stream, you need a stream to start with. You can't independently decode MP3 this way.
Fortunately, the solution also simplifies what you're doing. Simply return an HTTP stream from your server, and use an HTML audio element <audio> or new Audio(url). The browser will handle all the buffering. Just make sure your server handles range requests, and you're good to go.
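A minimal sketch of that approach, assuming an Express server, an already-connected MongoClient in a variable named client, and the same music bucket as in your code (the route path and content type are illustrative, and error handling is omitted):
// Sketch only: stream a GridFS file over HTTP with Range support.
const express = require('express');
const { GridFSBucket } = require('mongodb');

const app = express();

app.get('/songs/:songId', async (req, res) => {
  const db = client.db('dev'); // assumes an existing, connected MongoClient named `client`
  const file = await db.collection('music.files').findOne({ 'metadata.songId': req.params.songId });
  if (!file) return res.sendStatus(404);

  const bucket = new GridFSBucket(db, { bucketName: 'music' });
  const range = req.range(file.length); // parses the Range header, if present

  if (range && range.type === 'bytes') {
    const { start, end } = range[0];
    res.status(206).set({
      'Content-Type': 'audio/mpeg',
      'Content-Range': `bytes ${start}-${end}/${file.length}`,
      'Content-Length': end - start + 1,
      'Accept-Ranges': 'bytes'
    });
    // GridFS treats `end` as exclusive while HTTP ranges are inclusive, hence end + 1
    bucket.openDownloadStream(file._id, { start, end: end + 1 }).pipe(res);
  } else {
    res.set({
      'Content-Type': 'audio/mpeg',
      'Content-Length': file.length,
      'Accept-Ranges': 'bytes'
    });
    bucket.openDownloadStream(file._id).pipe(res);
  }
});
On the front end the whole chunking and scheduling pipeline then collapses to something like:
const audio = new Audio('/songs/3'); // or point an <audio> element's src at the route
audio.play();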
I have been trying to build a media player in React Native using Expo to be able to play audio in my music project.
I have successfully hacked one together with the preferred design etc., but I still have a minor bug. I receive information from an API endpoint with links to files stored on a server. The audio plays when the filename is just one word, but when there are spaces in the name the file does not play, e.g. .../musics/test.mp3 plays while .../musics/test 32.mp3 does not. Any idea on how to handle this issue in React Native will be highly appreciated. My play function:
startPlay = async (index = this.index, playing = false) => {
  const url = this.list[index].url;
  this.index = index;
  console.log(url);
  // Checking if music is currently playing; if yes, stop it
  if (playing) {
    await this.soundObject.stopAsync();
  } else {
    // Checking if the item is already loaded; if yes just play, else load the music before playing
    if (this.soundObject._loaded) {
      await this.soundObject.playAsync();
    } else {
      await this.soundObject.loadAsync(url);
      await this.soundObject.playAsync();
    }
  }
};
url is the link to the file.
I am working on a streaming platform and I would love to get a player similar to this:
Something like this: https://hackernoon.com/building-a-music-streaming-app-using-react-native-6d0878a13ba4
But I am using React Native with Expo. All the implementations I have come across online use plain React Native without Expo. Any pointers to existing work on this using Expo (e.g. packages) would be of great help.
Thanks.
The URLs should be encoded:
const uri = this.list[index].url;
this.index = index;
const url = encodeURI(uri);
console.log(url);
The uri = "../musics/test 32.mp3" will be encoded to url = "../musics/test%2032.mp3"
I'm a web developer from Japan.
This is my first question on Stack Overflow.
I'm creating a simple music web application. I am a complete beginner at making a music program, so I am struggling to implement it.
After various investigations, I concluded that the Web Audio API was the best choice, so I decided to use it.
▼ What I want to achieve
Load multiple WAV files with the Web Audio API, combine them into one WAV file, and be able to download it from the browser.
For example, load multiple WAV files such as guitar, drums, and piano,
edit them in the browser, and finally output them as one WAV file.
Then we can download that edited WAV file from the browser and play it in iTunes.
▼ Question
Is it possible to achieve these requirements using just the Web Audio API,
or do we need to use another library?
I checked Record.js on GitHub, but development stopped about 2~3 years ago, it has many open issues, and I cannot get support, so I decided not to use it.
I also checked the similar question Web audio API: scheduling sounds and exporting the mix,
but since the information is old, I do not know if it still applies.
Thanks.
Hi and welcome to Stack Overflow!
Is it possible to achieve this just using the web audio api?
In terms of merging/mixing the files together this is perfectly achievable! This article goes through many (if not all) of the steps you will need to carry out the task you suggested.
Each file you want to upload can be loaded into an AudioBufferSourceNode (examples are explained in the article linked above). Here is an example of setting up a buffer source once the audio data has been loaded in:
play: function (data, callback) {
  // create audio node and play buffer
  var me = this,
      source = this.context.createBufferSource(),
      gainNode = this.context.createGain();
  if (!source.start) { source.start = source.noteOn; }
  if (!source.stop) { source.stop = source.noteOff; }
  source.connect(gainNode);
  gainNode.connect(this.context.destination);
  source.buffer = data;
  source.loop = true;
  source.startTime = this.context.currentTime; // important for later!
  source.start(0);
  return source;
}
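For reference, here is a rough sketch (not from the article) of loading and decoding a file with fetch before handing the resulting AudioBuffer to a play function like the one above; the player object and file URL are placeholders, and it assumes the promise-based decodeAudioData available in modern browsers:
// Sketch only: fetch an audio file and decode it into an AudioBuffer.
const context = new (window.AudioContext || window.webkitAudioContext)();

async function loadAudio(url) {
  const response = await fetch(url);
  const arrayBuffer = await response.arrayBuffer();
  return context.decodeAudioData(arrayBuffer); // resolves to an AudioBuffer
}

loadAudio('guitar.wav').then((buffer) => {
  player.play(buffer); // hypothetical player object exposing the play method above
});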
There are then also specific nodes already designed for your mixing purposes, like the ChannelMergerNode (which combines multiple mono inputs into a single multi-channel output). Use these if you don't want to do the signal processing yourself in JavaScript; they will be faster, since the Web Audio nodes are native compiled code already inside the browser.
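As a rough sketch of using such a node (assuming an existing AudioContext called context and two already-decoded mono AudioBuffers with the hypothetical names guitarBuffer and drumBuffer):
// Sketch only: route two mono sources into one stereo output via a ChannelMergerNode.
const merger = context.createChannelMerger(2);

const guitarSource = context.createBufferSource();
guitarSource.buffer = guitarBuffer;
guitarSource.connect(merger, 0, 0); // output 0 of the source -> merger input 0 (left)

const drumSource = context.createBufferSource();
drumSource.buffer = drumBuffer;
drumSource.connect(merger, 0, 1); // output 0 of the source -> merger input 1 (right)

merger.connect(context.destination);
guitarSource.start(0);
drumSource.start(0);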
Following the complete guide linked before, there are also options to export the file (as a .wav in the demo's case) using the following code:
var rate = 22050;

function exportWAV(type, before, after) {
  if (!before) { before = 0; }
  if (!after) { after = 0; }
  var channel = 0,
      buffers = [];
  for (channel = 0; channel < numChannels; channel++) {
    buffers.push(mergeBuffers(recBuffers[channel], recLength));
  }
  var i = 0,
      offset = 0,
      newbuffers = [];
  for (channel = 0; channel < numChannels; channel += 1) {
    offset = 0;
    newbuffers[channel] = new Float32Array(before + recLength + after);
    if (before > 0) {
      for (i = 0; i < before; i += 1) {
        newbuffers[channel].set([0], offset);
        offset += 1;
      }
    }
    newbuffers[channel].set(buffers[channel], offset);
    offset += buffers[channel].length;
    if (after > 0) {
      for (i = 0; i < after; i += 1) {
        newbuffers[channel].set([0], offset);
        offset += 1;
      }
    }
  }
  var interleaved;
  if (numChannels === 2) {
    interleaved = interleave(newbuffers[0], newbuffers[1]);
  } else {
    interleaved = newbuffers[0];
  }
  var downsampledBuffer = downsampleBuffer(interleaved, rate);
  var dataview = encodeWAV(downsampledBuffer, rate);
  var audioBlob = new Blob([dataview], { type: type });
  this.postMessage(audioBlob);
}
So I think Web Audio has everything you could want for this purpose! It could be challenging depending on your web development experience, but it's a skill definitely worth learning!
Do we need to use another library?
If you can, I think it's definitely worth trying it with Web Audio, as you'll almost certainly get the best processing speed, but there are other libraries such as Pizzicato.js, just to name one. I'm sure you will find plenty of others.
Is there a global way to detect when audio is playing or starts playing in the browser?
Something along the lines of if (window.mediaPlaying()) {...},
without having the code tied to a specific element?
EDIT: What's important here is being able to detect ANY audio, no matter where it comes from: an iframe, a video, the Web Audio API, etc.
No one should use this but it works.
Basically the only way that I found to access the entire window's audio is using MediaDevices.getDisplayMedia().
From there a MediaStream can be fed into an AnalyserNode that can be used to check if the audio volume is greater than zero.
Only works in Chrome and maybe Edge (only tested in Chrome 80 on Linux).
JSFiddle with <video>, <audio> and YouTube!
Important bits of code (cannot post in a working snippet because of the Feature Policies on the snippet iframe):
var audioCtx = new AudioContext();
var analyser = audioCtx.createAnalyser();
var bufferLength = analyser.fftSize;
var dataArray = new Float32Array(bufferLength);

window.isAudioPlaying = () => {
  analyser.getFloatTimeDomainData(dataArray);
  for (var i = 0; i < bufferLength; i++) {
    if (dataArray[i] != 0) return true;
  }
  return false;
}

navigator.mediaDevices.getDisplayMedia({
  video: true,
  audio: true
})
  .then(stream => {
    if (stream.getAudioTracks().length > 0) {
      var source = audioCtx.createMediaStreamSource(stream);
      source.connect(analyser);
      document.body.classList.add('ready');
    } else {
      console.log('Failed to get stream. Audio not shared or browser not supported');
    }
  }).catch(err => console.log("Unable to open capture: ", err));
I read all the MDN docs about the Web Audio API, but I didn't find any global flag on window that shows audio playing. However, I found a workaround that checks whether any <audio> or <video> element on the page is currently playing (it does not cover the Web Audio API):
const allAudio = Array.from( document.querySelectorAll('audio') );
const allVideo = Array.from( document.querySelectorAll('video') );
const isPlaying = [...allAudio, ...allVideo].some(item => !item.paused);
Now, using the isPlaying flag, we can detect whether any audio or video element on the page is playing.
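If you need to keep watching for playback rather than check once, a simple variant (just a sketch) is to wrap the same check in a function and poll it, so elements added to the page later are also covered:
// Sketch only: re-query the DOM each time so newly added elements are included.
function anyMediaPlaying() {
  const media = Array.from(document.querySelectorAll('audio, video'));
  return media.some(item => !item.paused);
}

setInterval(() => {
  console.log('media playing:', anyMediaPlaying());
}, 1000);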
There is a playbackState property (https://developer.mozilla.org/en-US/docs/Web/API/MediaSession/playbackState), but not all browsers support it.
if(navigator.mediaSession.playbackState === "playing"){...
I was looking for a solution on Google, but I didn't find anything yet.
Maybe you could check some value that only changes when audio is playing. If you have a button that starts playing the audio file, you could make sure the audio is playing by adding an event listener to that button.
You could also add an event listener to the audio tag itself; if I remember correctly, the audio element has a paused attribute.
Also, you may want to check this topic: HTML5 check if audio is playing?
I've been working on using the HTML audio tag to play some audio files. The audio plays fine, but the duration property of the audio tag always returns Infinity.
I tried the accepted answer to this question but with the same result. Tested with Chrome, IE and Firefox.
Is this a bug with the audio tag, or am I missing something?
Some of the code I'm using to play the audio files:
JavaScript function called when the play button is pressed:
function playPlayerV2(src) {
  document.getElementById("audioplayerV2").addEventListener("loadedmetadata", function (_event) {
    console.log(player.duration);
  });
  var player = document.getElementById("audioplayerV2");
  player.src = src;
  player.load();
  player.play();
}
The audio tag in HTML:
<audio controls="true" id="audioplayerV2" style="display: none;" preload="auto">
Note: I'm hiding the standard audio player with the intent of using a custom layout and controlling the player via JavaScript; this does not seem to be related to my problem.
Try this:
var getDuration = function (url, next) {
  var _player = new Audio(url);
  _player.addEventListener("durationchange", function (e) {
    if (this.duration != Infinity) {
      var duration = this.duration
      _player.remove();
      next(duration);
    };
  }, false);
  _player.load();
  _player.currentTime = 24 * 60 * 60; //fake big time
  _player.volume = 0;
  _player.play();
  //waiting...
};

getDuration('/path/to/audio/file', function (duration) {
  console.log(duration);
});
I think this is due to a Chrome bug. Until it's fixed:
if (video.duration === Infinity) {
  video.currentTime = 10000000;
  setTimeout(() => {
    video.currentTime = 0; // to reset the time, so it starts at the beginning
  }, 1000);
}
let duration = video.duration;
This works for me
const audio = document.getElementById("audioplayer");

audio.addEventListener('loadedmetadata', () => {
  if (audio.duration === Infinity) {
    audio.currentTime = 1e101
    audio.addEventListener('timeupdate', getDuration)
  }
})

function getDuration() {
  audio.currentTime = 0
  audio.removeEventListener('timeupdate', getDuration)
  console.log(audio.duration)
}
In case you control the server and can make it send proper media headers - this is what helped the OP.
I faced this problem with files stored in Google Drive when fetching them in the mobile version of Chrome. I cannot control Google Drive's response, so I have to deal with it somehow.
I don't have a solution that satisfies me yet, but I tried the idea from both posted answers - which is basically the same: make the audio/video object seek the real end of the resource. Once Chrome finds the real end position, it gives you the duration. However, the result is unsatisfying.
What this hack really does is force Chrome to load the resource into memory completely. So if the resource is too big, or the connection is too slow, you end up waiting a long time for the file to be downloaded behind the scenes. And you have no control over that file - it is handled by Chrome, and once it decides the file is no longer needed, it disposes of it, so the bandwidth may be spent inefficiently.
So, in case you can load the file yourself, it is better to download it (e.g. as a blob) and feed it to your audio/video control.
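A rough sketch of that idea, assuming the file URL is reachable with fetch (CORS permitting) and audioElement is your <audio> element:
// Sketch only: download the whole file as a blob, then hand it to the <audio> element.
async function playFromBlob(url, audioElement) {
  const response = await fetch(url);
  const blob = await response.blob();
  audioElement.src = URL.createObjectURL(blob);
  audioElement.addEventListener('loadedmetadata', () => {
    // The whole file is local now, so duration is a real number instead of Infinity
    console.log('duration:', audioElement.duration);
  });
  await audioElement.play();
}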
If this is a Twilio mp3, try the .wav version. The mp3 is coming across as a stream and it fools the audio players.
To use the .wav version, just change the format of the source url from .mp3 to .wav (or leave it off, wav is the default)
Note - the wav file is 4x larger, so that's the downside to switching.
Not a direct answer, but in case anyone using blobs came here: I managed to fix it using a package called webm-duration-fix.
import fixWebmDuration from "webm-duration-fix";
...
fixedBlob = await fixWebmDuration(blob);
...
If you want to fix the video file itself, you can use the webmFixDuration package. The other methods are applied only at the display level on the video tag; with this method, the complete video file is modified.
webmFixDuration GitHub example:
mediaRecorder.onstop = async () => {
  const duration = Date.now() - startTime;
  const buggyBlob = new Blob(mediaParts, { type: 'video/webm' });
  const fixedBlob = await webmFixDuration(buggyBlob, duration);
  displayResult(fixedBlob);
};