Buffering/synchronizing multiple audio streams in JavaScript - javascript

Here's a high level overview of my problem:
There will be a central computer at an art gallery, and three separate remote sites, say up to a mile away from central. Each site has a musician. The central computer sends a live backing track over the internet to each of the three musicians, who play along to it and are each recorded as a live stream. Each of the three streams is then played back at the gallery, in-sync with the backing track and with the other musicians, as though all the musicians were playing live in the same room. The client has requested that the musicians appear to play PRECISELY in time with each other, i.e. no apparent latency between each musician. The musicians cannot hear each other, they only hear the backing track.
Here's what I see as the technical solution:
Each backing track packet is sent out from the gallery with the current timestamp. As a musician plays and is recorded, the packet currently being recorded is marked with the timestamp of the current backing track packet. When the three audio streams are sent back, they are buffered. Each packet is then played, say, ten seconds after its timestamp. i.e. At 11:00:00 AM, all of the packets marked 10:59:50 AM are played.
Or to think of it another way, each incoming stream is delayed 10 seconds behind real time. This buffering should allow for any network blips. It is also acceptable since there is no apparent latency to the viewers at the gallery, and everything is being played "as-live." We are assuming there is a good quality internet connection to each remote site.
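To make the delay-buffer idea concrete, here is a rough, illustrative sketch (the packet source, playSamples(), and the 50 ms polling interval are all hypothetical; note that the answer below argues for sample offsets rather than wall-clock timestamps):

// Illustrative only: queue packets as they arrive, keyed by the backing-track
// timestamp they were recorded against, and play each one DELAY_MS later.
const DELAY_MS = 10 * 1000;
const pending = [];                       // [{ timestamp, samples }, ...] in arrival order

incomingStream.onpacket = (packet) => {   // hypothetical packet callback
  pending.push(packet);
};

setInterval(() => {
  const playBefore = Date.now() - DELAY_MS;
  while (pending.length && pending[0].timestamp <= playBefore) {
    playSamples(pending.shift().samples); // hand off to the audio pipeline (hypothetical)
  }
}, 50);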
I'm ideally looking for a JavaScript solution to this, as it's what I'm most familiar with (but other solutions would be interesting to know about as well).
Does anyone know of any JavaScript libraries with built-in functionality to allow this sort of buffering?

To be clear, it sounds like it doesn't matter that the musicians play back in time with each other... only that they play in time with the backing track, and within ~10 seconds of each other, correct? Assuming that's the case...
Site-to-Site Connectivity
You can use WebRTC for this, but we'll only be using the data channel. No need for media streams. This is a performance that requires precise timing, and I'm assuming decent quality audio as well. Let's just leave it in PCM and send that over the WebRTC data channels.
Alternatively, you could have a server host Web Socket connections which relay data to the sites.
Audio Recording and Playback
You can use the ScriptProcessorNode to play and record. This gives you raw access to the PCM stream. Just send/receive the bytes via your data channel. If you want to save some bandwidth, you can reduce the floating point samples down to 16-bit integers. Just crank them back up to floats on the receiving end.
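For example, a minimal sketch of that 16-bit packing, assuming the samples are Float32Arrays in the usual [-1, 1] range:

// Pack Float32 samples into Int16 before dataChannel.send(), and unpack
// them back to floats on the receiving end.
function floatTo16BitPCM(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));   // clamp to [-1, 1]
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return int16;
}

function int16ToFloat(int16) {
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / (int16[i] < 0 ? 0x8000 : 0x7FFF);
  }
  return float32;
}

// e.g. dataChannel.send(floatTo16BitPCM(samples).buffer);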
Synchronization
The main synchronization needs to occur where playback of the backing track and recording occur at the same time. Immediately upon starting your playback, start recording. If you're using the ScriptProcessorNode as mentioned previously, you can actually do both in the same node, guaranteeing sample-accurate synchronization.
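A minimal sketch of doing both in one node (ScriptProcessorNode is deprecated in favor of AudioWorklet, but it still illustrates the idea; backingTrack is assumed to be a Float32Array of already-decoded backing-track samples):

const ctx = new AudioContext();
const node = ctx.createScriptProcessor(4096, 1, 1);
let playhead = 0;                 // sample offset into the backing track
const recordedChunks = [];        // mic audio, aligned to the same sample offsets

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  ctx.createMediaStreamSource(stream).connect(node);
  node.connect(ctx.destination);  // processing starts once connected
});

node.onaudioprocess = (e) => {
  const mic = e.inputBuffer.getChannelData(0);
  const out = e.outputBuffer.getChannelData(0);
  // Play the backing track (the musician hears only this)...
  out.set(backingTrack.subarray(playhead, playhead + out.length));
  // ...and record the mic against the same sample offset.
  recordedChunks.push({ offset: playhead, samples: new Float32Array(mic) });
  playhead += out.length;
};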
On playback, simply buffer all your tracks until you have your desired buffer level, and then play them back simultaneously inside your ScriptProcessorNode. Again, this is sample-accurate.
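And a sketch of the gallery-side playback with the same ScriptProcessorNode idea, assuming each incoming stream has already been decoded and appended into one Float32Array per site (siteA/siteB/siteC and the 10-second target are illustrative):

const galleryCtx = new AudioContext();
const buffers = [siteA, siteB, siteC];            // growing Float32Arrays, all starting at sample 0
const targetDelay = galleryCtx.sampleRate * 10;   // ~10 seconds of audio per site
let readOffset = 0;
let started = false;

const player = galleryCtx.createScriptProcessor(4096, 0, 1);
player.connect(galleryCtx.destination);

player.onaudioprocess = (e) => {
  const out = e.outputBuffer.getChannelData(0);
  if (!started) {
    started = buffers.every((b) => b.length >= targetDelay);  // wait until every site is buffered
    if (!started) return;                                     // output stays silent
  }
  for (let i = 0; i < out.length; i++) {
    let mix = 0;
    for (const b of buffers) mix += b[readOffset + i] || 0;   // sample-accurate sum
    out[i] = mix / buffers.length;                            // simple average to avoid clipping
  }
  readOffset += out.length;
};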
The only thing you might have to deal with now is clock drift. What's 44.1kHz to you might actually be 44.099kHz to me. This can add up over time, but is generally not something you need to concern yourself with as long as you reset all this once in a while. That is, as long as you're not recording for a whole day or more without stopping, it probably won't be an issue for you.
The recorded packets should be marked with the timestamp of the incoming backing track packet
No, this synchronization should not happen at the network layer. If you're using a reliable transport with WebRTC data channels or Web Socket, you don't have to do anything but start all your streams at byte 0. Don't use timestamps, use sample offsets.
Does anyone know of any JavaScript libraries with built-in functionality to allow this sort of buffering?
I've actually built a project for doing similar things that allows for sample-accurate internet radio hand-offs from one site to another. It builds up a buffer over time, and then for the hand-off it basically re-syncs to a master clock from the new site. Since the new site is behind the old site, and since we can't bend space/time very easily, I drop out of the buffer a bit and pick up at the new site's master clock. (Not any different if there were a buffer underrun from a single site!) Anyway, I don't know of any other code that does this.

Related

Video encoding on client side vs server side

I am very new to this topic and I searched Google but wasn't able to find anything concrete.
So, I was going through this article where the author says
In most video workflows there will be a transcoding server or serverless cloud function, that encodes the video into various resolutions and bitrates for optimal viewing in all devices and network speeds.
If you don’t want to use a transcoding server or API (which can be quite pricey), and depending on the kind of videos your app needs to upload and view, you can choose to forego server side transcoding altogether, and encode the videos solely on the client. This will save considerable costs, but will put the burden of encoding videos on the clients.
Even if you do use some server-side transcoding solution, you’ll probably want to perform minimal encoding on the client. The raw video sizes (especially on iOS), can be huge, and you don’t want to be wasteful of the user’s data plan, or force them to wait for WiFi unnecessarily.
I am curious to know the advantages of encoding video on the client side versus the server side, and since HTTPS determines quality based on bandwidth speed (?), does it have any effect on encoding video on the client side?
Have you ever encoded a video on your computer? It uses a lot of CPU resources. The fan comes on, etc. If you start creating 4-5 different versions of the video on the client's phone, you'll drain BOTH their data plan and their battery.
Using an encoding API to create these variations does cost some money. But - one advantage of using such a service is that you get video experts helping you build the right sizes and resolutions for all of your videos. Some even handle all the hosting and delivery of the videos.
I suppose it depends how far down the rabbit hole you wish to go. You can build it yourself, but there are many services like api.video that can do this work for you.

Buffering/Syncing remote (webRTC) mediaStreams

I am currently building a WebRTC application that streams audio (the classic server/client, one-to-many model). Communication and signaling are done through sockets.
The problem I have found is that there is a lot of variability when streaming to smart devices (mainly due to varying processing power), even on a local network.
Hence, I am trying to add functionality that syncs the stream between devices. At a high level I was thinking of buffering the incoming stream; once all devices are connected, the last peer to connect will share something that indicates where that specific peer's buffer starts, and all peers will play the buffer from that position.
Does this sound possible? Is there a better way to sync up remote streams? If I was to go along this path, how would I go about buffering a remote MediaStream object (or data from a BlobURL) potentially into some form of array which can be used to identify a common starting location between the streams?
Would I potentially use the JavaScript AudioContext API?
I have also looked at NTP protocols and other syncing mechanisms, but I couldn't find how to apply them in the context of a WebRTC application.
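For reference, a rough, untested sketch of what tapping a remote MediaStream with the AudioContext API might look like, so chunks can be indexed by sample offset and compared across peers (remoteStream would come from the RTCPeerConnection; ScriptProcessorNode is the older API, with AudioWorklet as its replacement):

const ctx = new AudioContext();
const source = ctx.createMediaStreamSource(remoteStream);  // remote WebRTC MediaStream
const tap = ctx.createScriptProcessor(4096, 1, 1);
const chunks = [];                                          // Float32Array chunks, in arrival order

tap.onaudioprocess = (e) => {
  chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
};

source.connect(tap);
tap.connect(ctx.destination);   // the node must be connected for onaudioprocess to fire;
                                // the output buffer is left silent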
Any help, pointers, or direction would be greatly appreciated.

How to calculate exact streaming duration time in web?

I am running a stream on Wowza Streaming Server, but I am not able to find the exact duration of the streaming session.
I run a music website on which we play the stream in the user's native player. For our tracking we want to record the exact duration of the audio listened to by the user.
As others have said in the comments, it isn't possible to determine the exact streaming time.
Different clients have different behavior with how they handle streams. Consider the case where a browser client may pre-buffer data. If the user goes to a page and the browser begins downloading audio data, the server will think that the client is listening to the stream when in fact the data is just sitting in memory. When the user does start playing the audio, say 1 minute later, the server now believes they have already been listening for a minute. When the user goes to a new page, the connection to the server is dropped, stopping audio at the same time as disconnection.
In other cases, media players can actually be paused mid-stream where they buffer the data for several seconds before disconnecting.
The best you could do is use client-side analytics, but this isn't possible in all circumstances since you are not always in control over the client, and not all clients would be capable.
You can't do that on the server (Wowza) side. Well, you can, but the data won't be accurate because of buffering and how HTTP streaming protocols work in general.
However, you can still aggregate this data using some JavaScript on the client side.
You have to listen for player events like play, pause, stop, and even seek. Most web players have callbacks to track those events. Then collect the data and send it to your database for storage.
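As a rough illustration (the /analytics endpoint and the exact set of events are up to you):

// Accumulate actual listening time from an HTML5 <audio> element and report it.
const audio = document.querySelector('audio');
let listened = 0;        // seconds of audio actually heard
let lastTime = null;

audio.addEventListener('playing', () => { lastTime = audio.currentTime; });
audio.addEventListener('timeupdate', () => {
  if (lastTime !== null && !audio.paused) {
    listened += Math.max(0, audio.currentTime - lastTime);
    lastTime = audio.currentTime;
  }
});
audio.addEventListener('pause',   () => { lastTime = null; });
audio.addEventListener('seeking', () => { lastTime = null; });
audio.addEventListener('seeked',  () => { lastTime = audio.currentTime; });

// Send whatever has accumulated when the user leaves the page.
window.addEventListener('beforeunload', () => {
  navigator.sendBeacon('/analytics', JSON.stringify({ listened }));
});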
To get the duration of the stream on the Wowza side, you have to develop a custom module. There is an event called onMediaStreamDestroy; using the IMediaStream object you can get the duration.
public class MyMediaStreamListener implements IMediaStreamNotify
{
    @Override
    public void onMediaStreamDestroy(IMediaStream stream)
    {
        stream.length(); // returns the length of the stream in seconds
    }
}

WebRTC - scalable live stream broadcasting / multicasting

PROBLEM:
WebRTC gives us peer-to-peer video/audio connections. It is perfect for p2p calls, hangouts. But what about broadcasting (one-to-many, for example, 1-to-10000)?
Let's say we have a broadcaster "B" and two attendees "A1", "A2". Of course it seems to be solvable: we just connect B with A1 and then B with A2. So B sends one video/audio stream directly to A1 and another stream to A2. B sends the streams twice.
Now let's imagine there are 10000 attendees: A1, A2, ..., A10000. That means B must send 10000 streams. Each stream is ~40KB/s, which means B needs 400MB/s of outgoing bandwidth to maintain this broadcast. Unacceptable.
ORIGINAL QUESTION (OBSOLETE)
Is it possible somehow to solve this, so B sends only one stream on some server and attendees just pull this stream from this server? Yes, this means the outgoing speed on this server must be high, but I can maintain it.
Or maybe this means ruining WebRTC idea?
NOTES
Flash does not work for my needs due to poor UX for end customers.
SOLUTION (NOT REALLY)
26.05.2015 - There is no such solution for scalable WebRTC broadcasting at the moment that does not use media servers at all. There are server-side solutions as well as hybrid (p2p + server-side, depending on conditions) solutions on the market.
There are some promising technologies, though, like https://github.com/muaz-khan/WebRTC-Scalable-Broadcast, but they need to answer these possible issues: latency, overall network connection stability, and the scalability formula (they are probably not infinitely scalable).
SUGGESTIONS
Decrease CPU/Bandwidth by tweaking both audio and video codecs;
Get a media server.
As was pretty much covered here, what you are trying to do is not possible with plain, old-fashioned WebRTC (strictly peer-to-peer), because, as was said earlier, WebRTC connections negotiate encryption keys for each session. So your broadcaster (B) will indeed need to upload its stream as many times as there are attendees.
However, there is a quite simple solution which works very well (I have tested it): a WebRTC gateway. Janus is a good example. It is completely open source (GitHub repo here).
This works as follows: your broadcaster contacts the gateway (Janus) which speaks WebRTC. So there is a key negotiation: B transmits securely (encrypted streams) to Janus.
Now, when attendees connect, they connect to Janus: again, WebRTC negotiation, secured keys, etc. From now on, Janus will emit the streams back to each attendee.
This works well because the broadcaster (B) only uploads its stream once, to Janus. Janus then decrypts the data using its own key, has access to the raw data (that is, RTP packets), and can emit those packets back to each attendee (Janus takes care of encryption for you). And since you put Janus on a server, it has great upload bandwidth, so you will be able to stream to many peers.
So yes, it does involve a server, but that server speaks WebRTC, and you "own" it: you implement the Janus part, so you don't have to worry about data corruption or a man in the middle. Well, unless your server is compromised, of course. But there is only so much you can do.
To show you how easy it is to use: in Janus, you have a function called incoming_rtp() (and incoming_rtcp()) which gives you a pointer to the rt(c)p packets. You can then send them to each attendee (they are stored in sessions that Janus makes very easy to use). Look here for one implementation of the incoming_rtp() function; a couple of lines below you can see how to transmit the packets to all attendees, and here you can see the actual function that relays an RTP packet.
It all works pretty well, the documentation is fairly easy to read and understand. I suggest you start with the "echotest" example, it is the simplest and you can understand the inner workings of Janus. I suggest you edit the echo test file to make your own, because there is a lot of redundant code to write, so you might as well start from a complete file.
Have fun! Hope I helped.
As @MuazKhan noted above:
https://github.com/muaz-khan/WebRTC-Scalable-Broadcast
works in Chrome, with no audio broadcast yet, but it seems to be a first solution.
A Scalable WebRTC peer-to-peer broadcasting demo.
This module simply initializes socket.io and configures it in a way
that single broadcast can be relayed over unlimited users without any
bandwidth/CPU usage issues. Everything happens peer-to-peer!
This should definitely be possible to complete.
Others are also able to achieve this: http://www.streamroot.io/
AFAIK the only current implementation of this that is relevant and mature is Adobe Flash Player, which has supported p2p multicast for peer to peer video broadcasting since version 10.1.
http://tomkrcha.com/?p=1526.
"Scalable" broadcasting is not possible on the Internet, because the IP UDP multicasting is not allowed there. But in theory it's possible on a LAN. The problem with Websockets is that you don't have access to RAW UDP by design and it won't be allowed.
The problem with WebRTC is that it's data channels use a form of SRTP, where each session has own encryption key. So unless somebody "invents" or an API allows a way to share one session key between all clients, the multicast is useless.
There is the solution of peer-assisted delivery, meaning the approach is hybrid. Both server and peers help distribute the resource. That's the approach peer5.com and peercdn.com have taken.
If we're talking specifically about live broadcast it'll look something like this:
Broadcaster sends the live video to a server.
The server saves the video (usually also transcodes it to all the relevant formats).
Metadata about this live stream is created, compatible with HLS, HDS, or MPEG-DASH.
Consumers browse to the relevant live stream; there the player gets the metadata and knows which chunks of the video to get next.
At the same time, the consumer is connected to other consumers (via WebRTC).
Then the player downloads the relevant chunk either directly from the server or from peers (see the sketch below).
Following such a model can save up to ~90% of the server's bandwidth, depending on the bitrate of the live stream and the collaborative uplink of the viewers.
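Roughly, the player's chunk fetch in such a hybrid setup might look like this (peer.requestChunk() and the URL scheme are made up for illustration; real peer-assisted products wrap this logic for you):

// Try connected peers first, then fall back to the CDN/origin server.
async function fetchChunk(chunkId, peers, serverUrl) {
  for (const peer of peers) {
    try {
      const chunk = await peer.requestChunk(chunkId);   // over a WebRTC data channel
      if (chunk) return chunk;
    } catch (err) {
      // Peer unavailable or timed out; try the next one.
    }
  }
  const res = await fetch(`${serverUrl}/chunks/${chunkId}`);
  return new Uint8Array(await res.arrayBuffer());
}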
Disclaimer: the author works at Peer5.
My master's research is focused on the development of a hybrid CDN/P2P live streaming protocol using WebRTC. I've published my first results at http://bem.tv
Everything is open source and I'm looking for contributors! :-)
The answer from Angel Genchev seems to be correct; however, there is a theoretical architecture that allows low-latency broadcasting via WebRTC. Imagine B (broadcaster) streams to A1 (attendee 1). Then A2 (attendee 2) connects. Instead of streaming from B to A2, A1 starts relaying the video it receives from B to A2. If A1 disconnects, then A2 starts receiving from B.
This architecture could work if there were no latency or connection timeouts. So theoretically it is right, but not practically.
At the moment I am using server side solution.
I'm developing a WebRTC broadcasting system using the Kurento Media Server. Kurento supports several kinds of streaming protocols, such as RTSP, WebRTC, and HLS. It works well in terms of real-time performance and scaling.
However, Kurento doesn't support RTMP, which is used by YouTube and Twitch at the moment. One of my remaining problems is the number of concurrent users this can handle.
Hope it helps.
You are describing using WebRTC with a one-to-many requirement. WebRTC is designed for peer-to-peer streaming, however there are configurations that will let you benefit from the low latency of WebRTC while delivering video to many viewers.
The trick is to not tax the streaming client with every viewer and, like you mentioned, have a "relay" media server. You can build this yourself but honestly the best solution is often to use something like Wowza's WebRTC Streaming product.
To stream efficiently from a phone you can use Wowza's GoCoder SDK but in my experience a more advanced SDK like StreamGears works best.

webrtc streaming video on one side and receiving on another? html5

I was wondering if it is possible to capture video input from a client like the following https://apprtc.appspot.com/?r=91737737 and display it on another so that any viewer can see it. My issue is that I do not have a webcam on my second computer and I would like to receive the video using WebRTC. Is it possible to capture from one end and receive it on another? Perhaps if this isn't possible, are websockets the best way to do this?
I see no reason it shouldn't be possible apart from being imperfect due to performance/bandwidth issues.
The most widely supported HTML5 solution at the moment, I would imagine, is the use of getUserMedia, which is available in Chrome (consider getUserMedia.js for broader support of camera input, although I haven't used it).
Scenario
We will have a capturer, a server that broadcasts the stream and the watchers that receive the final stream.
Plan
Capture phase
Use getUserMedia to get data from the camera
Draw it on a canvas (maybe you could skip this)
Post the frame as image data using websockets (e.g. via socket.io for broader support) to the server (e.g. Node.js); see the sketch after this plan.
Broadcast phase
Receive the image data and just broadcast to the subscribed watchers
Watch phase
The watcher will have a websocket connection with the server
On every new frame received from the server it will have to draw the received frame to a canvas
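A minimal sketch of the capture phase, assuming socket.io and a made-up 'frame' event name (the watcher does the reverse: listen for 'frame' and drawImage the received data URL onto its own canvas):

const socket = io('https://your-relay-server.example');   // hypothetical relay server
const canvas = document.createElement('canvas');
const ctx2d = canvas.getContext('2d');

navigator.mediaDevices.getUserMedia({ video: true }).then((stream) => {
  const video = document.createElement('video');
  video.srcObject = stream;
  video.play();

  // Send ~10 frames per second as JPEG data URLs.
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx2d.drawImage(video, 0, 0);
    socket.emit('frame', canvas.toDataURL('image/jpeg', 0.6));
  }, 100);
});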
Considerations
You should take into account that performance of the network will affect the playback.
You could enforce an FPS cap on the client side to avoid jittery playback speed.
A buffer pool would be nice if it fits your case for smoother playback.
Future
You could use the PeerConnection and MediaSource APIs when they become available, since this is what they are made for, although that will probably increase CPU usage depending on the browsers' performance.
