Everything is being done in the front end.
My goal is to be able to create an audio track in real time and play it instantly for the user. The file would be roughly 10 minutes long. However, the files are very simple: mostly silence, with a few sound clips sprinkled around (the sound clip is 2 kB). So the process for generating the data (the raw bytes) is very simple: either write the 2 kB sound clip or write n bytes of 0x00 for the silence, for the full 10 minutes. But instead of generating the entire file and then playing it, I would like to stream the audio, ideally generating more and more of the file while the audio is playing. That would prevent any noticeable delay between when the user clicks play and when the audio starts. Generating a file can take anywhere from 20 to 500 milliseconds; different files are created based on user input.
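For illustration, here is a minimal sketch of that generation step (buildTrack, clipBytes, and clipOffsets are names made up for this example, and it ignores any file header):

// Hypothetical sketch of the generation step described above: start from a
// zero-filled buffer (silence) and copy the 2 kB clip in at each offset.
function buildTrack(totalBytes, clipBytes, clipOffsets) {
  const track = new Uint8Array(totalBytes); // Uint8Array starts zero-filled
  for (const offset of clipOffsets) {
    track.set(clipBytes, offset); // write the clip bytes at this position
  }
  return track;
}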
The only problem is: I have no idea how to do this. I've read ideas about using WebSockets, but that implies the data would come from a server, and I see no reason to bother a server with this when JavaScript can easily generate the audio data on its own.
I've been researching and experimenting with the Web Audio API and the Media Streams API for the past several hours, and I keep going in circles and am totally confused. I'm starting to think these APIs are meant for gathering data from a user's mic or webcam, not for being fed data directly from a readable stream.
Is what I want to do possible? Can it be achieved using something like a MediaStreamAudioSourceNode or is there another simpler way that I haven't noticed?
Any help on this topic would be so greatly appreciated. Examples of a simple working version would be even more appreciated. Thanks!
I'm going to follow this question, because a true streaming solution would be very nice to know about. My experience is limited to using the Web Audio API to play two sounds with a given pause in between them. The data is actually generated at the server and downloaded using Thymeleaf into two JavaScript variables that hold the PCM data to be played, but this data could easily have been generated at the client itself via JavaScript.
The following is not great, but it could almost be workable, given that there are extensive silences. I'm thinking: manage an ordered FIFO queue holding the variable name and some sort of timing value for when you want the associated audio played, and have a function that periodically polls the queue and loads commands into JavaScript setTimeout calls, with the delay amount calculated from the timing values given in the queue.
For the one limited app I have, the button calls the following (where playTone is a method I wrote that plays the sound held in the JavaScript variable):
playTone(pcmData1);
setTimeout(() => playTone(pcmData2), 3500);
I have the luxury of knowing that pcmData1 is 2 seconds long and that there is a fixed pause interval between the two sounds. I'm also not counting on significant timing accuracy. For your continuous playback tool, you would just use the setTimeout part, with the pcmData variable and the timing obtained from the scheduling FIFO queue.
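A minimal sketch of that queue-and-setTimeout idea (it assumes playTone and the pcmData variables from above already exist; the queue format is invented for this example):

// Hypothetical scheduling queue: each entry names the PCM data to play
// and when (in milliseconds from now) it should be played.
const queue = [
  { pcm: pcmData1, at: 0 },
  { pcm: pcmData2, at: 3500 },
];

// Walk the queue and hand each entry to setTimeout, with the delay taken
// from the entry's timing value.
function schedule(entries) {
  for (const entry of entries) {
    setTimeout(() => playTone(entry.pcm), entry.at);
  }
}

schedule(queue);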
Whether this is helpful and triggers a useful idea, IDK. Hopefully, someone with more experience will show us how to stream data on the fly. This is certainly something that can easily be done in Java, using its SourceDataLine class, which has useful blocking-queue aspects, but I haven't located a JavaScript equivalent yet.
I'll try my best to explain this as well as I can:
I have programmed an art installation (an interactive animation with three.js). It runs very smoothly on my laptop, but not on older machines and on almost no mobile devices. Is there any way to run the JavaScript animation on a powerful machine/server but display/render it on a website, so that it also runs on less powerful devices? I was thinking about something like what is done for cloud gaming (as in shadow.tech) or services like parsec.app, maybe...
Any ideas/hints?
I've thought about the same concept before too. The answer to your question (rendering three.js on a server and displaying it on an older machine) will be something like this:
You are going to have to render three.js on a server, basically live stream it to the client, then capture the client's controls and send them back to the server to apply them. Latency is going to be a pain, so write your code nice and neat, with a lot of error handling. Come to think of it, Google Stadia does the same thing when you play games on YouTube.
Server Architecture
So, roughly thinking about it, I would build my renderer backend like this:
First, create a room with multiple instances of your three.js code (i.e. index.js, index1.js, index2.js, etc.). Then decide whether you want to do this with the straightforward Recorded and Stream method or capture the stream directly from the canvas and broadcast it.
Recorded and Stream method
This means you create the JS room, render and display it on the server machine, capture what is showing on the screen using something like OBS Studio, and then broadcast it to the client. I don't recommend this method.
OR
captureStream() method
After creating your room on your server, run the three.js code, then access the canvas you are rendering the three.js code into, like this:
const canvas = document.querySelector('canvas');
const video = document.querySelector('video');
// Optional frames-per-second argument.
const stream = canvas.captureStream(60);
// Set the source of the <video> element to be the stream from the <canvas>.
video.srcObject = stream;
Then use something like WebRTC to broadcast it to the client.
This is an example site you can look at: https://webrtc.github.io/samples/src/content/capture/canvas-pc/
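As a rough sketch of that WebRTC hand-off (it assumes you already have a signaling channel, e.g. a WebSocket, to exchange the offer/answer and ICE candidates; sendToClient is a placeholder for that channel, and stream is the canvas stream captured above):

// Hypothetical broadcast of the captured canvas stream over WebRTC.
const pc = new RTCPeerConnection();

// Add the canvas track(s) to the peer connection.
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Forward ICE candidates to the client over your own signaling channel.
pc.onicecandidate = event => {
  if (event.candidate) sendToClient({ candidate: event.candidate });
};

// Create the offer and send it; apply the client's answer when it arrives.
pc.createOffer()
  .then(offer => pc.setLocalDescription(offer))
  .then(() => sendToClient({ sdp: pc.localDescription }));

On the client you would do the mirror image: set the remote description from the offer, create an answer, and attach the incoming track to a <video> element.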
Honestly, don't ditch the client-side idea. Instead, write code to detect latency beforehand, then decide whether to enable this for that specific client.
Hope this helps
I've seen things like waveform.js which uses the Web Audio API to display waveform data, and there are many other tools out there which are able to analyze the exact sound points of an audio file in JavaScript.
If so, it should be possible to use this analysis power for real-time lip syncing with JavaScript, i.e., to get an animated character to speak at the same time the user is speaking, simply by using an audio context and somehow reading the data points to find the right sounds.
So the question becomes, more specifically:
How exactly do I analyze audio data to extract what exact sounds are made at specific timestamps?
I want to get the end result of something like Rhubarb Lip Sync, except with JavaScript, and in real time. It doesn't have to be exact, but as close as possible.
There is no algorithm that allows you to detect phonemes correctly 100% of the time.
You didn't say whether this was for real-time use or for offline use, but that would strongly affect which algorithm you'd use.
An algorithm based on mel frequency cepstral coefficients would be expected to give you about 80% accuracy, which would be good enough for video games or the like.
Deep learning systems based on convolutional neural nets would give you excellent recognition, but they are not real-time systems (yet).
You could maybe start with Meyda, for example, and compare the audio features of the signal you're listening to, with a human-cataloged library of audio features for each phoneme.
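For example, a rough sketch of that approach with Meyda, listening to the microphone (matchPhoneme and updateMouthShape are placeholders for the catalog lookup and the animation hook you would write yourself):

// Hypothetical real-time MFCC extraction with Meyda.
navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
  const audioContext = new AudioContext();
  const source = audioContext.createMediaStreamSource(stream);

  const analyzer = Meyda.createMeydaAnalyzer({
    audioContext,
    source,
    bufferSize: 512,
    featureExtractors: ['mfcc'],
    callback: features => {
      // Compare against your own catalog of per-phoneme MFCC vectors,
      // e.g. nearest neighbour by Euclidean distance.
      const phoneme = matchPhoneme(features.mfcc);
      updateMouthShape(phoneme); // drive the animated character
    },
  });

  analyzer.start();
});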
I am working with a live video stream coming from AWS MediaLive, and CloudFront as a CDN. We have a player inside a Vue front-end, which is showing the stream, and rendering HTML/JS/CSS items on the screen, based on what is happening in the stream.
Now, in our very naive solution we have a service pushing events over a web socket to the front end to render these things "in sync" with the stream delay, but this doesn't work, since the stream delay is neither stable, nor is it consistent across different screens.
As far as I can read, there should be ways to embed the data/events we need directly into the stream as meta-data.
I think SCTE-35 could be exploited here; even though it is really meant for ad insertion, I think we should be able to use it to encode other events/data.
My question is two-fold: is what I describe above (SCTE-35) the way to go about this, or should I use something else (ID3 seems like another option)?
And, more importantly, how can I read/respond to these events on the front-end (JavaScript) side? That is the part I can't seem to find any information on.
Thanks in advance.
That is generally handled via timed metadata. HLS uses ID3, or DATERANGE in the manifest; CMAF has emsg; DASH... doesn't really have a standard that I know of. You need a player that knows how to extract the information and execute a callback.
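On the front end, as a sketch: if you play the HLS stream with hls.js, parsed ID3 timed metadata is exposed through an event, with a presentation timestamp you can schedule against (Safari's native HLS exposes similar data as cues on a 'metadata' text track). The URL, video element, and handler below are placeholders:

// Hypothetical sketch: reading ID3 timed metadata with hls.js.
const hls = new Hls();
hls.loadSource('https://example.com/live/stream.m3u8'); // placeholder URL
hls.attachMedia(videoElement); // your <video> element

hls.on(Hls.Events.FRAG_PARSING_METADATA, (event, data) => {
  data.samples.forEach(sample => {
    // sample.data is the raw ID3 payload and sample.pts its timestamp;
    // decode the payload and fire the matching overlay at the right time.
    handleStreamEvent(sample.pts, sample.data); // placeholder handler
  });
});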
Hello Stack Overflow community,
I'm a rather novice coder, but I have a project I've been devising that looks more and more complicated every day, and I don't know where to start.
With inspiration taken from Synchtube & Phonoblaster, I'm looking to create something for my website that will allow visitors to watch YouTube videos and playlists that I have curated, together in real-time, in-sync.
Because I want to be able to put this in the context of my own website, I can't use the services listed above that already do this - so I wanted to figure out how to roll my own.
Some things have been written about this topic on Stack Overflow and on other blogs: HERE and HERE.
Because I still consider myself a novice programmer, and a lot of the information I've found on Google and Stack Overflow tends to be more than 1 or 2 years old, I'm still unsure where to begin or whether that information is outdated; specifically, which languages and tools I should be learning.
From what I've gathered so far, things like JavaScript, Node.js, and the YouTube API would form the crux of it. I've not used any of these before, but I would be interested to see whether more experienced coders have their own suggestions or ideas they could point me towards.
I appreciate you taking time out to read this post!
Hope to hear from some of you soon :)
Many thanks.
It partially sounds like you need a live stream from YouTube. You can find more info here: https://support.google.com/youtube/bin/answer.py?hl=en&answer=2474026
If you can get that going, then syncing playback between any number of users is as simple as embedding a regular YouTube embed of your stream in a browser.
Looking past that, if you wanted to sync video playback amongst any number of users, the first big problem is learning how to set time on a video. Luckily, that's easy with the hashbang #t=seconds.
Eg: http://www.youtube.com/watch?v=m38RdUGqBPM&feature=g-high-rec#t=619s will start this HuskyStarcraft video at 619 seconds into the video.
The next step is to have some backend server that keeps track of what the current time is. Node.js with Socket.io is incredibly easy to get set up. Socket.io is a wonderful library that gracefully handles concurrent connections, from WebSockets all the way down to long polling and more, and it works well even on very old browsers. Note that WebSockets aren't even required, but they will be the most modern and foolproof method for you; otherwise it's hacks and workarounds.
One way this could work would be as follows.
User1 visits your site and starts playing the video first. A script on your page sends an XHR request to your server that says, "video started at time X". X then gets stored as the start time.
At this point, you could go one of two routes. You could have a client-side script use the YouTube API to poll the video and get its current status every second. If the status or time changes, send another request back to the server to update the state.
Another simple route would be to have the page load for User2+ and then send an XHR request asking for the video play time. The server sends back the time elapsed since User1's start time, and the client script sets the 't' hashbang on the YouTube player for User2+. This lets you sync start times, but if any user pauses or rewinds the video, those states don't get updated. A subsequent page refresh might do that, though.
The entire application's complexity depends on exactly what requirements you have. If it's just synchronized start times, then route #2 (sketched below) should work well enough. It doesn't require sockets and is easy to do with jQuery or just straight JavaScript.
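A bare-bones sketch of route #2: a tiny Node/Express endpoint that remembers User1's start time, and a client snippet that seeks the embedded player (created with the YouTube IFrame API) to the elapsed offset. The /start-time route, port, and player variable are invented for this example:

// --- server.js: remember when User1 started the video ---
const express = require('express');
const app = express();

let startTime = null; // epoch ms when User1 pressed play

app.post('/start-time', (req, res) => {
  if (startTime === null) startTime = Date.now();
  res.sendStatus(204);
});

app.get('/start-time', (req, res) => {
  // Report how many seconds of the video have already elapsed.
  const elapsed = startTime ? (Date.now() - startTime) / 1000 : 0;
  res.json({ elapsed });
});

app.listen(3000);

// --- client sketch: seek User2+'s player to the elapsed offset ---
// `player` is assumed to be a ready YT.Player instance.
fetch('/start-time')
  .then(resp => resp.json())
  .then(({ elapsed }) => {
    player.seekTo(elapsed, true); // YouTube IFrame API
    player.playVideo();
  });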
If you need a really synchronized experience where any user can start/stop/pause/fast forward/rewind the video, then you're looking at either using an established library solution or writing your own.
Sorry this answer is kind of open ended, but so was your question. =)
I was given two MP3 files, one 4.5 MB and one 5.6 MB. I was instructed to have them play on a website I am managing. I have found a nice, clean-looking, CSS-based jQuery audio player.
My question is, is this the right solution for files that big? I am not sure if the player preloads the file, or streams it? (If that is the correct terminology) I don't deal much with audio players and such...
This player is from happyworm.com/jquery/jplayer/latest/demo-01.htm
Is there another approach I should take to get this to play properly? I don't want it to have to buffer, make the visitor wait, or slow page loading, etc. I want it to play cleanly and not affect the visitor's session on the site.
The name is a bit misleading: the MP3 playing is done in a Flash component, as in all other similar players. The jQuery part of it is the control and customization of the player (which is very nice; I'm not saying anything against the product).
The player should be capable of playing an MP3 file while it loads. It's not going to be real streaming (because you can't skip to arbitrary positions), but it should work out all right.
Make sure you test the buffering yourself, using a big MP3 file. Remember to encode the MP3 files according to the rules because otherwise the files will act up, especially in older players.