I want to add a hidden participant to a group video call that plays a song stream as its voice (without video), with some control so that we can start or stop it whenever we want. I'm trying to pass a media stream from a URL as tracks while making a connect request to join the room. I'm using the quickstart example for this task:
try {
// Fetch an AccessToken to join the Room.
const response = await fetch(`/token?identity=${identity}`);
// Extract the AccessToken from the Response.
const token = await response.text();
// Add the specified Room name to ConnectOptions.
connectOptions.name = roomName;
const audio = new Audio("http://mediaserv30.live-streams.nl:8086/live");
const ctx = new (window.AudioContext || window.webkitAudioContext)();
const stream_dest = ctx.createMediaStreamDestination();
const source = ctx.createMediaElementSource(audio);
source.connect(stream_dest);
const stream = stream_dest.stream;
console.log("==================", stream.getAudioTracks());
const tracks = stream.getTracks().map(track => track.kind === 'audio' ? new LocalAudioTrack(track) : new LocalVideoTrack(track));
connectOptions.tracks = tracks;
await joinRoom(token, connectOptions);
} catch (error) {
console.error(`Unable to connect to Room: ${error.message}`);
}
Here is what I'm getting after running this:
Any help is really appreciated. I've been stuck on this problem for a few days.
Twilio developer evangelist here.
Your code looks correct to me. And the error message in the screenshot doesn't have anything to do with what you have written.
The suggestion in the screenshot is that you are perhaps not running the application in a secure context. In order to use WebRTC you need to do so from either localhost or a site served over HTTPS. Are you testing this on a development domain over HTTP or using a local IP address?
If you are testing this on localhost then perhaps there is another issue. Please share any error logs from the developer tools that may be relevant too.
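If it helps to rule that out quickly, here is a minimal, hedged check using the standard window.isSecureContext property that you could log before connecting; the warning text is only illustrative.

// Quick sanity check: WebRTC APIs require a secure context (HTTPS or localhost).
if (!window.isSecureContext) {
  console.warn("This page is not a secure context; getUserMedia/WebRTC will not be available.");
}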
Related
I'm completely new to the Web Audio API, and not terribly proficient in JavaScript. However, I have a specific feature I want to implement into a website I'm working on that requires Google's TTS API, which returns Base64 audio, to go through a reverb filter and then (preferably) autoplay the resulting audio.
So here's how the workflow looks.
TTS request to Google => Base64 response from Google => Base64 converted & sent through Convolver (reverb) node => Output sent to user's output device.
So what I'm struggling with first and foremost is getting ANY sort of response out of an audio file going through the nodes. After that, I can deal with the Base64 conversions.
Any help would be appreciated. My IDEs are no help whatsoever; they all basically tell me "Congrats, this code looks fantastic!" Meanwhile, I'm over here pulling my hair out, two lines of code away from jumping out my window.
Here's the code I've been working with. This obviously wouldn't be the entirety of it, but I thought I should first get some sound coming out of it before moving on.
let context;
let compressor;
let reverb;
let source1;
let avaAudio; // holds the decoded audio buffer once the fetch below resolves
let lowpassFilter;
let waveShaper;
let panner;
let wet;
let dry;
let masterDry;
let masterWet;
function effectsBoard () {
context = new (window.AudioContext || window.webkitAudioContext)();
// Effects Setup
lowpassFilter = context.createBiquadFilter();
waveShaper = context.createWaveShaper();
panner = context.createPanner();
compressor = context.createDynamicsCompressor();
reverb = context.createConvolver();
//Master Gains for Wet and Dry
masterDry = context.createGain();
masterWet = context.createGain();
//Connect the compressor (the last effect) to the final destination (audio output)
compressor.connect(context.destination);
//Connect the Master Wet and Dry signals to the compressor for mixing before the output.
masterDry.connect(compressor);
masterWet.connect(compressor);
//Connect Reverb to the Wet Master Gain
reverb.connect(masterWet);
//Connect source1 to the effects - first the dry signal and then the wet
source1.connect(lowpassFilter);
lowpassFilter.connect(masterDry);
lowpassFilter.connect(reverb);
//Create a Source Buffer
fetch("voice.mp3")
.then(data => data.arrayBuffer())
.then(arrayBuffer => context.decodeAudioData(arrayBuffer))
.then(decodedAudio => {
avaAudio = decodedAudio;
});
}
//Then start the sources on run event
function playback() {
source1 = context.createBufferSource();
source1.buffer = avaAudio;
source1.start(context.currentTime);
}
window.addEventListener("mousedown", playback);
In skimming through your code, it looks okay. I think you're getting bit by autoplay policy.
When you create an audio context, it usually starts out as paused. You need to call context.resume(), but you can only do that on a trusted event.
mousedown isn't a trusted event. You actually need a full click event for that.
Also, at least in the code you show here, it seems effectsBoard() is never called, but I assume that there's more code.
Use your browser's developer tools to see what errors are actually being thrown.
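A minimal sketch of that fix, assuming effectsBoard() has already been called so that context and the audio graph exist: resume the context from inside a click handler, then start the buffer source.

// A click is a trusted user gesture, so resume the (possibly suspended)
// AudioContext here before starting playback.
window.addEventListener("click", async () => {
  if (context.state === "suspended") {
    await context.resume();
  }
  playback();
});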
So I'm trying to manipulate the audio which comes from the client on the server and then return the newly manipulated audio back to the client. Of course this runs into the problem that Node does not support the Web Audio API, so I'm using the Node version (web-audio-api) along with a WebRTC library.
As I'm new to the WebRTC world, I've been building on these examples, which use the WebRTC library. Using the audio-video-loopback example as the starting point, I've utilised some of the library's nonstandard APIs to create an audio sink that lets me access the samples from the client directly. For now I just want to change the volume, so I'm just changing the sample values and pushing them into a new track, which is how the doc (scroll down to Programmatic Audio) says to do this. At the end I just want to return the newly created track, which is done using the .replaceTrack method (which I believe retriggers a renegotiation).
Here's what I got so far for the server code (client is the same as the original example found in the link above):
const { RTCAudioSink, RTCAudioSource } = require("wrtc").nonstandard;
function beforeOffer(peerConnection) {
const audioTransceiver = peerConnection.addTransceiver("audio");
const videoTransceiver = peerConnection.addTransceiver("video");
let { track } = audioTransceiver.receiver;
const source = new RTCAudioSource();
const newTrack = source.createTrack();
const sink = new RTCAudioSink(track);
const sampleRate = 48000;
const samples = new Int16Array(sampleRate / 100); // 10 ms of 16-bit mono audio
const dataObj = {
samples,
sampleRate,
};
const interval = setInterval(() => {
// Update audioData in some way before sending.
source.onData(dataObj);
});
sink.ondata = (data) => {
// Do something with the received audio samples.
const newArr = data.samples.map((el) => el * 0.5);
dataObj[samples] = newArr;
};
return Promise.all([
audioTransceiver.sender.replaceTrack(newTrack),
videoTransceiver.sender.replaceTrack(videoTransceiver.receiver.track),
]);
}
Not surprisingly, this doesn't work; I just get silence back, even though dataObj contains the correctly manipulated samples, which are then passed to newTrack when source.onData is called.
Is what I'm trying to do even possible server side? Any suggestions are welcome; like I said, I'm very green with WebRTC.
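For what it's worth, here is a hedged sketch (not a verified fix) of how the push loop could look if the sink's output is fed back through the RTCAudioSource on an explicit 10 ms cadence and assigned to a plain samples property; it reuses sink, source and sampleRate from beforeOffer above, and the latestSamples/pushInterval names are mine.

// Keep the most recent scaled frame from the sink...
let latestSamples = new Int16Array(sampleRate / 100); // one 10 ms frame at 48 kHz

sink.ondata = (data) => {
  // Halve the volume of the received 16-bit samples.
  latestSamples = data.samples.map((sample) => sample * 0.5);
};

// ...and push one frame into the RTCAudioSource every 10 ms.
const pushInterval = setInterval(() => {
  source.onData({ samples: latestSamples, sampleRate });
}, 10);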
I'm developing a website where the user can send audio commands which are captured with getUserMedia (only audio) and interpreted in the backend with a Speech-to-Text service. In order to keep the latency as low as possible, I'm sending small audio chunks to my server. This is working just fine on Chrome/Firefox and even Edge. However, I'm struggling with iOS Safari. I know that Safari is my only choice on Apple devices because of the missing WebRTC support on iOS Chrome/Firefox.
The problem is that I normally get the user's voice a couple of times (for some commands), but then, without any pattern, the stream suddenly contains only empty bytes. I tried a lot of different strategies, but in general I stuck to the following plan:
After user clicks a button, call getUserMedia (with audio constraint) and save stream to a variable
Create AudioContext (incl. Gain, MediaStreamSource, ScriptProcess) and connect the audio stream to the MediaStreamSource
Register an event listener to the ScriptProcessor and send audio chunks in callback to the server
When a result is returned from the server, close the AudioContext and the audio MediaStream
The interesting part is what happens on a subsequent user command. I tried various things: calling getUserMedia again for each command and closing the MediaStream track each time, reusing the initially created MediaStream and reconnecting the event handler every time, closing the AudioContext after every call, using only one initially created AudioContext... All my attempts failed so far, because I either got empty bytes from the stream or the AudioContext was created in a "suspended" state. Only closing the MediaStream/AudioContext and recreating them every time seems to be more stable, but fetching the MediaStream with getUserMedia takes quite a while on iOS (~1.5-2 s), which makes for a bad user experience.
I'll show you my latest attempt where I tried to mute/disable the stream in between user commands and keep the AudioContext open:
var audioStream: MediaStream;
var audioContext: AudioContext;
var scriptProcessor: ScriptProcessorNode;
var startButton = document.getElementById("startButton");
startButton.onclick = () => {
if (!audioStream) {
getUserAudioStream();
} else {
// unmute/enable stream
audioStream.getAudioTracks()[0].enabled = true;
}
}
var stopButton = document.getElementById("stopButton");
stopButton.onclick = () => {
// mute/disable stream
audioStream.getAudioTracks()[0].enabled = false;
}
function getUserAudioStream(): Promise<any> {
return navigator.mediaDevices.getUserMedia({
audio: true
} as MediaStreamConstraints).then((stream: MediaStream) => {
audioStream = stream;
startRecording();
}).catch((e) => { ... });
}
const startRecording = () => {
const ctx = (window as any).AudioContext || (window as any).webkitAudioContext;
if (!ctx) {
console.error("No Audio Context available in browser.");
return;
} else {
audioContext = new ctx();
}
const inputPoint = audioContext.createGain();
const microphone = audioContext.createMediaStreamSource(audioStream);
scriptProcessor = inputPoint.context.createScriptProcessor(4096, 1, 1);
microphone.connect(inputPoint);
inputPoint.connect(scriptProcessor);
scriptProcessor.connect(inputPoint.context.destination);
scriptProcessor.addEventListener("audioprocess", streamCallback);
};
const streamCallback = (e) => {
const samples = e.inputBuffer.getChannelData(0);
// Here I stream audio chunks to the server and
// observe that buffer sometimes only contains empty bytes...
}
I hope the snippet makes sense to you; I left some stuff out to keep it readable. I think I made it clear that this is only one of many attempts, and my actual question is: is there some special characteristic of WebRTC/getUserMedia on iOS that I've missed so far? Why does iOS treat MediaStream differently than Chrome/Firefox on Windows? As a last comment: I know that ScriptProcessorNode is no longer recommended. I'd actually like to use MediaRecorder instead, but it is also not yet supported on iOS. The polyfill I know of isn't really suitable either, because it only supports ogg for streaming audio, which also leads to problems because I would need to set the sample rate to a fixed value.
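One detail worth double-checking, since suspended contexts are mentioned above: a hedged sketch of the startButton handler that keeps a single AudioContext and resumes it from inside the click (a user gesture), instead of recreating it.

startButton.onclick = () => {
  // iOS Safari only lets audio start/resume from a user-initiated event.
  if (audioContext && audioContext.state === "suspended") {
    audioContext.resume();
  }
  if (!audioStream) {
    getUserAudioStream();
  } else {
    // unmute/enable stream
    audioStream.getAudioTracks()[0].enabled = true;
  }
};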
I want to create a live audio stream from one device to a node server, which can then broadcast that live feed to several front ends.
I have searched extensively for this and have really hit a wall so hoping somebody out there can help.
I am able to get my audio input from the window.navigator.getUserMedia API.
getAudioInput(){
const constraints = {
video: false,
audio: {deviceId: this.state.deviceId ? {exact: this.state.deviceId} : undefined},
};
window.navigator.getUserMedia(
constraints,
this.initializeRecorder,
this.handleError
);
}
This then passes the stream to the initializeRecorder function, which uses the AudioContext API to create a MediaStreamSource:
initializeRecorder = (stream) => {
const audioContext = window.AudioContext;
const context = new audioContext();
const audioInput = context.createMediaStreamSource(stream);
const bufferSize = 2048;
// create a javascript node
const recorder = context.createScriptProcessor(bufferSize, 1, 1);
// specify the processing function
recorder.onaudioprocess = this.recorderProcess;
// connect stream to our recorder
audioInput.connect(recorder);
// connect our recorder to the previous destination
recorder.connect(context.destination);
}
In my recorderProcess function, I now have an AudioProcessingEvent object which I can stream.
Currently I am emitting the audio as a stream via a socket connection, like so:
recorderProcess = (e) => {
const left = e.inputBuffer.getChannelData(0);
this.socket.emit('stream', this.convertFloat32ToInt16(left))
}
Is this the best or only way to do this? Would it be better to use fs.createReadStream and then post to an endpoint via Axios? As far as I can tell, that will only work with a file, as opposed to a continuous live stream.
Server
I have a very simple socket server running on top of express. Currently I listen for the stream event and then emit that same input back out:
io.on('connection', (client) => {
client.on('stream', (stream) => {
client.emit('stream', stream)
});
});
Not sure how scalable this is but if you have a better suggestion, I'm very open to it.
Client
Now this is where I am really stuck:
On my client I am listening for the stream event and want to listen to the stream as audio output in my browser. I have a function that receives the event but am stuck as to how I can use the arrayBuffer object that is being returned.
retrieveAudioStream = () => {
this.socket.on('stream', (buffer) => {
// ... how can I listen to the buffer as audio
})
}
Is the way I am streaming audio the best / only way I can upload to the node server?
How can I listen to the arrayBuffer object that is being returned on my client side?
Is the way I am streaming audio the best / only way I can upload to the node server?
Not really the best, but I have seen worse. It's not the only way either; using websockets is considered OK from this point of view, since you want things to be "live" and not keep sending an HTTP POST request every 5 seconds.
How can I listen to the arrayBuffer object that is being returned on my client side?
You can try BaseAudioContext.decodeAudioData to listen to the streamed data; the example in the docs is pretty simple.
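A minimal sketch of that on your client, assuming each chunk arriving over the socket is a complete, decodable encoded buffer (decodeAudioData cannot decode arbitrary slices of raw PCM); playbackContext is a name I introduce here.

const playbackContext = new (window.AudioContext || window.webkitAudioContext)();

retrieveAudioStream = () => {
  this.socket.on('stream', (buffer) => {
    // Decode the received chunk and play it through the Web Audio API.
    playbackContext.decodeAudioData(buffer, (decoded) => {
      const source = playbackContext.createBufferSource();
      source.buffer = decoded;
      source.connect(playbackContext.destination);
      source.start();
    });
  });
}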
From the code snippets you provide, I assume you want to build something from scratch to learn how things work.
In that case, you can try the MediaStream Recording API along with a websocket server that sends the chunks to X clients so they can reproduce the audio, etc.
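For instance, here is a hedged sketch of the recording side with MediaRecorder; socket stands for whatever connected socket.io client you already have, and the 250 ms timeslice is an arbitrary choice.

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (event) => {
    // event.data is a Blob holding an encoded chunk (typically webm/opus).
    socket.emit('stream', event.data);
  };
  recorder.start(250); // deliver a chunk roughly every 250 ms
});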
It would also make sense to invest time in the WebRTC API, to learn how to stream from one client to another.
Also take a look at the links below for some useful information.
(stackoverflow) Get live streaming audio from NodeJS server to clients
(github) video-conference-webrtc
twitch.tv tech stack article
rtc.io
I see a lot of questions about how to record audio, then stop recording, and then play the audio or save it to a file, but none of that is what I want.
tl;dr Here's my question in a nutshell: "How can I immediately play audio recorded from the user's microphone?" That is, I don't want to save a recording and play it when the user hits a "Play" button, I don't want to save a recording to a file on the user's computer and I don't want to use WebRTC to stream audio anywhere. I just want to talk into my microphone and hear my voice come out the speakers.
All I'm trying to do is make a very simple "echo" page that just immediately plays back audio recorded from the mic. I started with a MediaRecorder object, but that wasn't working, and from what I can tell it's meant for recording full audio files, so I switched to an AudioContext-based approach.
A very simple page would just look like this:
<!DOCTYPE html>
<head>
<script type="text/javascript" src="mcve.js"></script>
</head>
<body>
<audio id="speaker" volume="1.0"></audio>
</body>
and the script looks like this:
if (navigator.mediaDevices) {
var constrains = {audio: true};
navigator.mediaDevices.getUserMedia(constrains).then(
function (stream) {
var context = new AudioContext();
var source = context.createMediaStreamSource(stream);
var proc = context.createScriptProcessor(2048, 2, 2);
source.connect(proc);
proc.onaudioprocess = function(e) {
console.log("audio data collected");
let audioData = new Blob(e.inputBuffer.getChannelData(0), {type: 'audio/ogg' } )
|| new Blob(new Float32Array(2048), {type: 'audio/ogg'});
var speaker = document.getElementById('speaker');
let url = URL.createObjectURL(audioData);
speaker.src = url;
speaker.load();
speaker.play().then(
() => { console.log("Playback success!"); },
(error) => { console.log("Playback failure... ", error); }
);
};
}
).catch( (error) => {
console.error("couldn't get user media.");
});
}
It can record non-trivial audio data (i.e. not every collection winds up as a Blob made from the new Float32Array(2048) call), but it can't play it back. It never hits the "could not get user media" catch, but it always hits the "Playback Failure..." catch. The error prints like this:
DOMException [NotSupportedError: "The media resource indicated by the src attribute or assigned media provider object was not suitable."
code: 9
nsresult: 0x806e0003]
Additionally, the message Media resource blob:null/<long uuid> could not be decoded. is printed to the console repeatedly.
There are two things that could be going on here, near as I can tell (maybe both):
I'm not encoding the audio. I'm not sure if this is a problem, since I thought that data collected from the mic came with 'ogg' encoding automagically, and I've tried leaving the type property of my Blobs blank to no avail. If this is what's wrong, I don't know how to encode a chunk of audio given to me by the audioprocess event, and that's what I need to know.
An <audio> element is fundamentally incapable of playing audio fragments, even if properly encoded. Maybe by not having a full file, there's some missing or extraneous metadata that violates encoding standards and is preventing the browser from understanding me. If this is the case, maybe I need a different element, or even an entirely scripted solution. Or perhaps I'm supposed to construct a file-like object in-place for each chunk of audio data?
I've built this code on examples from MDN and SO answers, and I should mention I've tested my mic at this example demo and it appears to work perfectly.
The ultimate goal here is to stream this audio through a websocket to a server and relay it to other users. I DON'T want to use WebRTC if at all possible, because I don't want to limit myself to only web clients - once it's working okay, I'll make a desktop client as well.
Check the example https://jsfiddle.net/greggman/g88v7p8c/ from https://stackoverflow.com/a/38280110/351900
It is required to be run from HTTPS.
navigator.getUserMedia = navigator.getUserMedia ||navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
var aCtx;
var analyser;
var microphone;
if (navigator.getUserMedia) {
navigator.getUserMedia(
{audio: true},
function(stream) {
aCtx = new AudioContext();
microphone = aCtx.createMediaStreamSource(stream);
var destination=aCtx.destination;
microphone.connect(destination);
},
function(){ console.log("Error 003.")}
);
}
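If you also want control over the echo volume (not part of the linked example), a hedged variation of the same idea routes the microphone through a GainNode; this sketch uses the promise-based navigator.mediaDevices.getUserMedia instead of the legacy callback form.

navigator.mediaDevices.getUserMedia({ audio: true }).then(function (stream) {
  var aCtx = new AudioContext();
  var microphone = aCtx.createMediaStreamSource(stream);
  var echoGain = aCtx.createGain();
  echoGain.gain.value = 0.8; // arbitrary starting volume; set to 0 to mute
  microphone.connect(echoGain);
  echoGain.connect(aCtx.destination);
});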