Rxjs: Observable with takeUntil(timer) keeps emitting after the timer has ticked - javascript

I have run into a very strange behavior of takeUntil(). I create an observable timer:
let finish = Observable.timer(3000);
Then I wait for some time and call
// 2500 ms later
someObservable.takeUntil(finish);
I would expect said observable to stop emitting after the timer "ticks", i.e. about 500ms after its creation. In reality it keeps emitting for 3000ms after its creation, way beyond the moment when the timer "ticks". This does not happen if I create the timer with a Date object containing absolute time value.
Is this by design? If yes, what is the explanation?
Here's complete code, runnable with node.js (it requires npm install rx):
let {Observable, Subject} = require("rx")
let start = new Date().getTime();
function timeMs() { return new Date().getTime() - start };
function log(name, value) {
console.log(timeMs(), name, value);
}
Observable.prototype.log = function(name) {
this.subscribe( v=>log(name,v),
err=>log(name, "ERROR "+err.message),
()=>log(name, "DONE"));
return this;
}
let finish = Observable.timer(3000).log("FINISH");
setTimeout( ()=>Observable.timer(0,500).takeUntil(finish).log("seq"), 2500);
This generates the following output:
2539 'seq' 0
3001 'FINISH' 0
3005 'FINISH' 'DONE'
3007 'seq' 1
3506 'seq' 2
4006 'seq' 3
4505 'seq' 4
5005 'seq' 5
5506 'seq' 6
5507 'seq' 'DONE'
If I create the timer using absolute time:
let finish = Observable.timer(new Date(Date.now()+3000)).log("FINISH");
Then it behaves as expected:
2533 'seq' 0
3000 'seq' 'DONE'
3005 'FINISH' 0
3005 'FINISH' 'DONE'
This behavior seems to be rather consistent across situations. E.g. if you take an interval and create child sequences using mergeMap() or switchMap(), the result is similar: child sequences keep emitting beyond the finish event.
Thoughts?

You are forgetting the first rule of cold Observables: Each subscription is a new stream.
Your log operator has a bug: it subscribes once to the Observable (thus creating the first subscription) and then returns the original Observable, which gets subscribed to again, implicitly, when you pass it to the takeUntil operator. Thus in reality you have two FINISH timer streams active, each counting 3000 ms from its own subscription, and both are behaving correctly.
It works in the absolute case because you are basically setting each stream to emit at a specific time, not at a time relative to when the subscription occurs.
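To make the relative-versus-absolute distinction concrete, here is a minimal sketch in plain JavaScript that only models when each subscription would see the timer fire. The helper names (relativeTimer, absoluteTimer) are made up for illustration and are not part of RxJS; the subscription times are the approximate ones from the question.

```javascript
// Hypothetical model, not RxJS: a "timer" is a function that, given the time
// at which it is subscribed to, returns the time at which it fires.

// Relative timer: every subscription starts its own countdown (cold).
const relativeTimer = (delayMs) => (subscribedAt) => subscribedAt + delayMs;

// Absolute timer: every subscription fires at the same wall-clock time.
const absoluteTimer = (fireAt) => (_subscribedAt) => fireAt;

// The buggy log() subscribes at ~0 ms; takeUntil() subscribes again at ~2500 ms.
const subscriptionTimes = [0, 2500];

const relativeFires = subscriptionTimes.map(relativeTimer(3000));
const absoluteFires = subscriptionTimes.map(absoluteTimer(3000));

console.log(relativeFires); // [ 3000, 5500 ]
console.log(absoluteFires); // [ 3000, 3000 ]
```

The second relative subscription fires at about 5500 ms, which lines up with the 'seq' 'DONE' entry at 5507 in the question's output.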
If you want to see it work I would suggest you change your implementation to:
let start = new Date().getTime();
function timeMs() { return new Date().getTime() - start };
function log(name, value) {
console.log(timeMs(), name, value);
}
Observable.prototype.log = function(name) {
// Use do instead of subscribe since this continues the chain
// without directly subscribing.
return this.do(
v=>log(name,v),
err=>log(name, "ERROR "+err.message),
()=>log(name, "DONE")
);
}
let finish = Observable.timer(3000).log("FINISH");
setTimeout(()=>
Observable.timer(0,500)
.takeUntil(finish)
.log("seq")
.subscribe(),
2500);

For completeness, here's the code that actually does what I wanted. By using Observable.publish().connect() it creates a "hot" timer that starts ticking immediately, and keeps the same time for all subscribers. It also avoids unwanted subscriptions in the "log" method, as suggested by @paulpdaniels.
Warning: beware of the race condition. If the child sequence starts after the timer has ticked, it will never stop. To demonstrate, change timeout in the last line from 2500 to 3500.
let {Observable, Subject, Scheduler, Observer} = require("rx")
let start = new Date().getTime();
function timeMs() { return new Date().getTime() - start };
function log(name, value) {
console.log(timeMs(), name, value);
}
var logObserver = function(name) {
return Observer.create(
v=>log(name,v),
err=>log(name, "ERROR "+err.message),
()=>log(name, "DONE"));
}
Observable.prototype.log = function(name) { return this.do(logObserver(name)); }
Observable.prototype.start = function() {
var hot = this.publish(); hot.connect();
return hot;
}
let finish = Observable.timer(3000).log("FINISH").start();
setTimeout(()=>
Observable.timer(0,500)
.takeUntil(finish)
.log("seq")
.subscribe(),
2500);
The output is
2549 'seq' 0
3002 'FINISH' 0
3006 'seq' 'DONE'
3011 'FINISH' 'DONE'
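The race condition mentioned in the warning can be illustrated with a simplified model of a shared (hot) tick. This is only a sketch: createSharedTick and its methods are hypothetical names, not RxJS API, and the model assumes that a listener registered after the tick never receives it.

```javascript
// Simplified model of a published (hot) timer: one shared tick for everyone.
// Not RxJS code; createSharedTick and its methods are hypothetical names.
function createSharedTick() {
  const listeners = [];
  let ticked = false;
  return {
    // Listeners registered after the tick are never called in this model.
    onTick(listener) {
      if (!ticked) listeners.push(listener);
    },
    // Called once, when the shared timer elapses.
    tick() {
      ticked = true;
      listeners.splice(0).forEach((listener) => listener());
    },
  };
}

const finish = createSharedTick();
let earlyStopped = false;
let lateStopped = false;

finish.onTick(() => { earlyStopped = true; }); // child started at ~2500 ms
finish.tick();                                 // shared timer elapses at ~3000 ms
finish.onTick(() => { lateStopped = true; });  // child started at ~3500 ms

console.log({ earlyStopped, lateStopped }); // { earlyStopped: true, lateStopped: false }
```

The late child never sees the tick, which is why a sequence started after the timer has fired would keep running.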

Related

setTimeout callbacks have different execution orders in Firefox and Chrome

When I run this code in Firefox and Chrome, the results are different:
function run() {
setTimeout(() => console.log("1"), 0);
setTimeout(() => console.log("2"), 100);
let start = Date.now();
while (Date.now() - start < 200) {
// do nothing
}
setTimeout(() => {
console.log("3");
}, 0);
start = Date.now();
while (Date.now() - start < 200) {
// do nothing
}
setTimeout(() => {
console.log("4");
}, 0);
}
run();
In Chrome (and Node.js), this is printed:
1
3
2
4
In Firefox, this is printed:
1
2
3
4
But if I remove the line 2 (setTimeout(() => console.log("1"), 0);), then the same thing is printed on every platform:
2
3
4
How to explain these different results?
Thanks!
The explanation: It doesn't matter.
The details of when deferred "messages" are added to the event loop message queue are implementation details, not documented guarantees. By the time your function yields control back to the event loop, all of your setTimeout calls are eligible to execute (three of them were scheduled to run immediately, one of them was scheduled to run in 100 ms) and you've guaranteed it's been at least 400 ms since you scheduled the deferred one.
The difference between the two could be as simple as whether they choose to look for deferred tasks that have become ready (to move from the deferred queue to the main "ready to go" message queue) immediately before or immediately after new items are inserted in the main message queue. Chrome chooses to move immediately after 3 is scheduled (so 3 goes in, then the deferred 2), Firefox immediately before (moving in 2 before it puts 3 in).
Both of them could change in the next release without violating any documented guarantees. Don't rely on it, don't expect it to be stable. While immediately scheduled tasks are guaranteed to execute in FIFO order, there are no guarantees on when deferred tasks get moved onto the "ready-to-go" message queue. The spec seems to require that 1, 3 and 4 execute in that order (since they were all immediately ready, not deferred), with only the ordering of 2 being flexible, but even that isn't a true guarantee; it can get weird with the various ways in which an "immediate" setTimeout task may not actually be scheduled immediately.
You may be interested in the MDN docs on why setTimeout can take longer than expected; it explains by side-effect a lot of how the event loop works, even as it carefully provides no guarantees on the details you're exploring.
I can't give you a full detailed explanation, but the second parameter of setTimeout and setInterval doesn't mean the callback will execute exactly at that time. The callback is put in a queue, so that it can be executed later in the background.
The browser has a lifecycle when to execute specific steps to update the data and the styles.
I can only send you this youtube link, that helped me to learn more about it:
https://www.youtube.com/watch?v=MCi6AZMkxcU
1, 2, 3, 4 is the behavior that is expected.
The specs ask to
Wait until any invocations of this algorithm that had the same global and orderingIdentifier, that started before this one, and whose milliseconds is equal to or less than this one's, have completed.
So any calls to setTimeout that were made before this one, and had their milliseconds set to an equal or lower value, should be called first.
Firefox, Safari, and the current stable channel of Chrome all do this.
So when the event loop gains control again, it sees that all the timers are ready to be called, and it queues a task for each, in the order of their scheduled times:
"1": scheduled-time = t=0 + 0 = 0
"2": scheduled-time = t=0 + 100 = 100
"3": scheduled-time = t=200 + 0 = 200
"4": scheduled-time = t=400 + 0 = 400
But what Chrome apparently used to do, and still does in its other branches, is to look only at the milliseconds param to do the ordering and ignore the first "that started before this one" condition.
So in there we've got,
"1": milliseconds = 0
"3": milliseconds = 0
"4": milliseconds = 0
"2": milliseconds = 100
Below is a rewrite of this logic:
// We use a MessageChannel to hook on each iteration of the event loop
function postTask(cb) {
const channel = postTask.channel ??= new MessageChannel();
const { port1, port2 } = channel;
port1.addEventListener("message", (evt) => { cb() }, { once: true });
port1.start();
port2.postMessage("");
}
const timers = new Set();
let ended = false; // So we can stop our loop after some time
function timeoutChecker() {
const now = performance.now();
const toCall = Array.from(timers)
.filter(({ startTime, millis }) => startTime + millis <= now)
.sort((a, b) => a.millis - b.millis);
while(toCall.length) {
const timer = toCall.shift();
timers.delete(timer);
timer.callback();
}
if (!ended) {
postTask(timeoutChecker);
}
}
function myTimeout(callback, millis) {
const startTime = performance.now();
timers.add({ startTime, millis, callback });
}
// Begin our loop
postTask(timeoutChecker);
// OP's code
function run() {
myTimeout(() => console.log("1"), 0);
myTimeout(() => console.log("2"), 100);
let start = Date.now();
while (Date.now() - start < 200) {
// do nothing
}
myTimeout(() => {
console.log("3");
}, 0);
start = Date.now();
while (Date.now() - start < 200) {
// do nothing
}
myTimeout(() => {
console.log("4");
}, 0);
}
run();
// all should be done after 1s
setTimeout(() => ended = true, 1000);
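The two orderings described above can also be compared without real timers. The following is a small sketch, assuming the simplified model from this answer (each timer is just its registration time plus its milliseconds argument); the timers array and the dueOrder helper are hypothetical and do not reproduce any browser's actual implementation.

```javascript
// Hypothetical model: each timer records when it was registered and its delay.
const timers = [
  { label: "1", start: 0, millis: 0 },
  { label: "2", start: 0, millis: 100 },
  { label: "3", start: 200, millis: 0 },
  { label: "4", start: 400, millis: 0 },
];

// Return the labels of the timers that are due at `now`, ordered by `compare`.
function dueOrder(now, compare) {
  return timers
    .filter((t) => t.start + t.millis <= now)
    .sort(compare) // Array.prototype.sort is stable, so ties keep call order
    .map((t) => t.label);
}

// Order by scheduled time (registration time + delay).
const bySchedule = (a, b) => (a.start + a.millis) - (b.start + b.millis);
// Order by the milliseconds argument only, ignoring registration time.
const byMillisOnly = (a, b) => a.millis - b.millis;

const scheduleOrder = dueOrder(400, bySchedule);
const millisOnlyOrder = dueOrder(400, byMillisOnly);

console.log(scheduleOrder.join(" "));   // 1 2 3 4
console.log(millisOnlyOrder.join(" ")); // 1 3 4 2
```

In this model the milliseconds-only ordering puts "2" last, because all four timers are treated as due at t=400.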
As for why you sometimes may see "2" before "4" in Chrome and node.js, it's because they clamp a 0ms timeout to 1ms (though they're working on removing this in Chrome). So when the event loop gains control at t=400, the log("4") timeout may not have met the timer condition yet.
Finally about Chrome's branch thing, I must admit I'm not sure at all what happens there. Running a bisect (against Canary branch) I couldn't find a single revision where the current stable branch behavior happens, so this must be a branch settings thing.

In RxJS, should complete be called in an observable that emits no items?

In my last post, I was trying to buffer pending http requests using RxJS. I thought bufferCount was what I needed, but I found that if my items were under the buffer size, it would just wait, which is not what I was after.
I now have a new scheme, using take. It seems to do what I am after, except when my resulting observable has no items (left), the complete is never called.
Eg I have something like the following..
const pendingRequests = this.store$.select(mySelects.getPendingRequests).pipe(
// FlatMap turns the observable of a single Requests[] to observable of Requests
flatMap(x => x),
// Only get requests unprocessed
filter(x => x.processedState === ProcessedState.unprocessed),
// Batches of batchSize in each emit
take(3),
);
let requestsSent = false;
pendingRequests.subscribe(nextRequest => {
requestsSent = true;
this.sendRequest(nextRequest);
},
error => {
this.logger.error(`${this.moduleName}.sendRequest: Error ${error}`);
},
() => {
// **** This is not called if pendingRequests is empty ****
if (requestsSent ) {
this.store$.dispatch(myActions.continuePolling());
} else {
this.store$.dispatch(myActions.stopPolling());
}
}
);
So the take(3) will get the next 3 pending requests and send them (where I also dispatch an action to set the processed state to not ProcessedState.pending, so we don't get them in the next poll).
This all works fine, but when pendingRequests eventually returns nothing (is empty), the completed block, marked with the ****, is not called. I would have thought this would just be called straight away.
I am not sure if this matters, as since I don't then dispatch the action to continue polling, the polling does stop.
But my biggest concern is if pendingRequests is not completed, do I need to unsubscribe from it to prevent any leaks? I assume if the complete is called I do not need to unsubscribe?
Update
To get pendingRequests to always complete, I have taken a slightly different approach. Rather than using the rx operators to "filter", I just get the whole list every time, and just take(1) on it. I will always get the list, even if it is empty, so pendingRequests will complete every time.
ie
const pendingRequests = this.store$.select(mySelects.getPendingRequests).pipe(take(1))
And then I can just filter and batch inside the observable..
pendingRequests.subscribe(allRequests => {
  // Keep only the unprocessed requests, then send at most batchSize of them
  let requestsToSend = allRequests.filter(x => x.processedState === ProcessedState.unprocessed);
  const totalPendingCount = requestsToSend.length;
  requestsToSend = requestsToSend.slice(0, this.batchSize);
  for (const nextRequest of requestsToSend) {
    this.sendRequest(nextRequest);
  }
  // More requests than one batch: keep polling
  if (totalPendingCount > this.batchSize) {
    this.store$.dispatch(myActions.continuePolling());
  }
});
In my testing so far, I have now always got the complete to fire.
Also, by having 2 actions (a startPolling, and a continuePolling) I can put the delay just in the continuePolling, so the first time we start the polling (eg the app has just come back online after being out of network range), we submit straight away, and only delay if we have more than the batch size
Maybe this is not 100% the "rxy" way of doing it, but seems to work so far. Is there any problem here?
I would substitute take with toArray and a bit of buffering logic afterwards.
This is how your code could look like. I have added the delay logic, which I think was suggested by your previous post, and provided comments to describe each line added
// implementation of the chunk function used below
// https://www.w3resource.com/javascript-exercises/fundamental/javascript-fundamental-exercise-265.php
const chunk = (arr, size) =>
Array.from({ length: Math.ceil(arr.length / size) }, (v, i) =>
arr.slice(i * size, i * size + size)
);
const pendingRequests = this.store$.select(mySelects.getPendingRequests).pipe(
// FlatMap turns the observable of a single Requests[] to observable of Requests
flatMap(x => x),
// Only get requests unprocessed
filter(x => x.processedState === ProcessedState.unprocessed),
// Read all the requests and store them in an array
toArray(),
// Split the array in chunks of the specified size, in this case 3
map(arr => chunk(arr, 3)), // the implementation of chunk is provided above
// Create a stream of chunks
concatMap((chunks) => from(chunks)),
// make sure each chunk is emitted after a certain delay, e.g. 2 sec
concatMap((chunk) => of(chunk).pipe(delay(2000))),
// mergeMap to turn an array into a stream
mergeMap((val) => val)
);
let requestsSent = false;
pendingRequests.subscribe(nextRequest => {
requestsSent = true;
this.sendRequest(nextRequest);
},
error => {
this.logger.error(`${this.moduleName}.sendRequest: Error ${error}`);
},
() => {
// **** THIS NOW SHOULD BE CALLED ****
if (requestsSent ) {
this.store$.dispatch(myActions.continuePolling());
} else {
this.store$.dispatch(myActions.stopPolling());
}
}
);
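As a quick check of the chunk helper on its own, here is a small sketch that runs it on a hypothetical list of request ids (the ids are made up for illustration). It also shows that an empty input produces no chunks at all.

```javascript
// Same chunk helper as in the answer: split `arr` into arrays of at most `size` items.
const chunk = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_unused, i) =>
    arr.slice(i * size, i * size + size)
  );

const requests = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]; // hypothetical ids
const batches = chunk(requests, 3);
const noBatches = chunk([], 3);

console.log(batches);   // [ [ 'r1', 'r2', 'r3' ], [ 'r4', 'r5', 'r6' ], [ 'r7' ] ]
console.log(noBatches); // []
```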
I doubt that pendingRequests will ever complete by itself. The Store, at least in ngrx, is a BehaviorSubject. So, whenever you do store.select() or store.pipe(select()), you're just adding another subscriber to the internal list of subscribers maintained by the BehaviorSubject.
The BehaviorSubject extends Subject, and here is what happens when the Subject is being subscribed to:
this.observers.push(subscriber);
In your case, you're using take(3). After 3 values, the take will emit a complete notification, so your complete callback should be called. And because the entire chain is actually a BehaviorSubject's subscriber, it will remove itself from the subscribers list on complete notifications.
I assume if the complete is called I do not need to unsubscribe
Here is what happens when a subscriber(e.g TakeSubscriber) completes:
protected _complete(): void {
this.destination.complete();
this.unsubscribe();
}
So, there is no need to unsubscribe if a complete/error notification already occurred.
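The behavior described above can be mimicked with a hand-rolled stand-in. This is a sketch only: createSubject and take below are simplified, hypothetical versions written for illustration, not the RxJS implementations, and the subject here does not replay a current value the way a BehaviorSubject does.

```javascript
// Minimal subject: keeps a list of observers and pushes values to them.
function createSubject() {
  const observers = [];
  return {
    observers,
    next(value) {
      observers.slice().forEach((observer) => observer.next(value));
    },
    subscribe(observer) {
      observers.push(observer);
      // Returns an unsubscribe function that removes the observer again.
      return () => {
        const index = observers.indexOf(observer);
        if (index !== -1) observers.splice(index, 1);
      };
    },
  };
}

// Minimal take: completes and unsubscribes after `count` values.
function take(source, count) {
  return {
    subscribe(observer) {
      let seen = 0;
      const unsubscribe = source.subscribe({
        next(value) {
          seen += 1;
          observer.next(value);
          if (seen === count) {
            observer.complete();
            unsubscribe();
          }
        },
      });
      return unsubscribe;
    },
  };
}

const store = createSubject();
const events = [];
take(store, 3).subscribe({
  next: (value) => events.push(value),
  complete: () => events.push("complete"),
});

store.next("a");
store.next("b");
const completedAfterTwo = events.includes("complete"); // fewer than 3 values so far
const observersAfterTwo = store.observers.length;

store.next("c");
const completedAfterThree = events.includes("complete");
const observersAfterThree = store.observers.length;

console.log(completedAfterTwo, observersAfterTwo);     // false 1
console.log(completedAfterThree, observersAfterThree); // true 0
```

With only two values the subscriber neither completes nor leaves the observers list; after the third value it completes and removes itself.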

Why is my code executing although it should pause?

I have an API which is limited regarding how many requests per minute (50/minute) I can send to any endpoint provided by that API.
In the following code section, I filter the orders objects, each of which has a URL as a property; every object with a URL that provides data should be stored in successfulResponses in my app.component.ts.
Promise.all(
orders.map(order => this.api.getURL(order.resource_url).catch(() => null))
).then(responses => {
const successfulResponses = responses.filter(response => response != null)
for(let data of successfulResponses) {
// some other requests should be sent with data here
}
});
There are more than 50 orders to check, but I just can check maximum 50 orders at once, so I try to handle it in my service. I set the first date when the first request is sent. After that I compare the dates of the new request with the first one. If the difference is over 60, I set the current date to the new one and set maxReq again to 50. If it is under 60, I check if there are requests left, if yes I send the request and if not I just wait one minute :
sleep(ms){
return new Promise(resolve => setTimeout(resolve, ms));
}
async getURL(url){
if(!this.date){
let date = new Date();
this.date = date;
}
if((Date.now() - this.date.getTime()) / 1000 > 60){ // elapsed seconds; getSeconds() only returns 0-59
this.maxReq = 50;
this.date = new Date();
return this.http.get(url, this.httpOptions).toPromise();
} else {
if(this.maxReq > 0){
this.maxReq -= 1;
return this.http.get(url, this.httpOptions).toPromise();
} else{
console.log("wait");
await this.sleep(60*1000);
this.maxReq = 50;
this.date = new Date();
return this.http.get(url, this.httpOptions).toPromise();
}
}
}
However, the code in app.component.ts is not waiting for the function getURL() and executes further code with requests, which leads to the problem that I send "too many requests too quickly".
What can I do about that?
I had a similar problem while trying to use promises with multiple async functions. It's an easy thing to forget, but in order to make them all wait, you have to use await on the root line that calls the function in question.
I'm not entirely certain, but my presumption is that your await this.sleep(60*1000); line is indeed waiting for a timeout to occur, but whilst it is doing this, the code that called getURL() is executing the rest of its lines, because it did not have an await (or equivalent, like .then) before getURL().
The way I discovered this in my case was by using a good debugging tool (I used Chrome DevTools's own debugging features). I advise you do the same, adding breakpoints everywhere, and see where your code is going with each line.
Here is a short, rough example to show what I mean:
// This code increments a number from 1 to 2 to 3 and returns it each time after a delay of 1 second.
async function loop() {
for (let i = 1; i <= 3; i++) {
console.log('Input start');
/* The following waits for result of aSync before continuing.
Without 'await', it would execute the last line
of this function whilst aSync's own 'await'
waited for its result.
--- This is where I think your code goes wrong. --- */
await aSync(i);
console.log('Input end');
}
}
async function aSync(num) {
console.log('Return start');
/* The following waits for the 1-second delay before continuing.
Without 'await', it would return a pending promise immediately
each time. */
let result = await new Promise(
// I'm not using arrow functions to show what it's doing more clearly.
function(rs, rj) {
setTimeout(
function() {
/* For those who didn't know, the following passes the number
into the 'resolved' ('rs') parameter of the promise's executor
function. Without doing this, the promise would never be fulfilled. */
rs(num);
}, 1000
)
}
);
console.log(result);
console.log('Return end');
}
loop();
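Applied to the question's rate limit, one possible shape is to await each batch before starting the next, instead of starting every request at once with a single Promise.all. This is only a sketch under assumptions: fetchInBatches, fakeFetch and the small numbers in the demo are hypothetical, and fetchOne stands in for this.api.getURL. Real code would pass 50 and 60 * 1000.

```javascript
// Assumption: `fetchOne` stands in for this.api.getURL and returns a Promise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Send at most `batchSize` requests at a time and pause between batches.
async function fetchInBatches(urls, fetchOne, batchSize, pauseMs) {
  const responses = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    if (i > 0) await sleep(pauseMs); // pause before every batch except the first
    const batch = urls.slice(i, i + batchSize);
    // Failed requests become null, as in the question's code.
    const settled = await Promise.all(
      batch.map((url) => fetchOne(url).catch(() => null))
    );
    responses.push(...settled);
  }
  return responses.filter((response) => response != null);
}

// Demo with a fake request function that tracks how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
const fakeFetch = async (url) => {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await sleep(5);
  inFlight -= 1;
  if (url === "bad") throw new Error("request failed");
  return `data for ${url}`;
};

const demo = fetchInBatches(["a", "b", "bad", "c", "d"], fakeFetch, 2, 10);
demo.then((responses) => console.log(responses, "max in flight:", maxInFlight));
```

Pausing a fixed time between batches is a conservative choice; it never exceeds the limit but may wait longer than strictly necessary.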

JS async / await tasks queue

In my JS app I'm using the async / await feature. I would like to perform multiple API calls and would like them to be fired one after other. In other words I would like to replace this simple method:
const addTask = async (url, options) => {
return await fetch(url, options)
}
with something more complex.. like:
let tasksQueue = []
const addTask = async (url, options) => {
tasksQueue.push({url, options})
...// perform fetch in queue
return await ...
}
What will be the best way to handle the asynchronous returns?
You could use a Queue data structure as your base and add special behavior in a child class. A Queue has a well-known interface of two methods: enqueue() (add new item to end) and dequeue() (remove first item). In your case dequeue() awaits the async task.
Special behavior:
Each time a new task (e.g. fetch('url')) gets enqueued, this.dequeue() gets invoked.
What dequeue() does:
if queue is empty ➜ return false (break out of recursion)
if queue is busy ➜ return false (prev. task not finished)
else ➜ remove first task from queue and run it
on task "complete" (successful or with errors) ➜ recursive call dequeue() (2.), until queue is empty..
class Queue {
constructor() { this._items = []; }
enqueue(item) { this._items.push(item); }
dequeue() { return this._items.shift(); }
get size() { return this._items.length; }
}
class AutoQueue extends Queue {
constructor() {
super();
this._pendingPromise = false;
}
enqueue(action) {
return new Promise((resolve, reject) => {
super.enqueue({ action, resolve, reject });
this.dequeue();
});
}
async dequeue() {
if (this._pendingPromise) return false;
let item = super.dequeue();
if (!item) return false;
try {
this._pendingPromise = true;
let payload = await item.action(this);
this._pendingPromise = false;
item.resolve(payload);
} catch (e) {
this._pendingPromise = false;
item.reject(e);
} finally {
this.dequeue();
}
return true;
}
}
// Helper function for 'fake' tasks
// Returned Promise is wrapped! (tasks should not run right after initialization)
let _ = ({ ms, ...foo } = {}) => () => new Promise(resolve => setTimeout(resolve, ms, foo));
// ... create some fake tasks
let p1 = _({ ms: 50, url: '❪𝟭❫', data: { w: 1 } });
let p2 = _({ ms: 20, url: '❪𝟮❫', data: { x: 2 } });
let p3 = _({ ms: 70, url: '❪𝟯❫', data: { y: 3 } });
let p4 = _({ ms: 30, url: '❪𝟰❫', data: { z: 4 } });
const aQueue = new AutoQueue();
const start = performance.now();
aQueue.enqueue(p1).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // = 50
aQueue.enqueue(p2).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // 50 + 20 = 70
aQueue.enqueue(p3).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // 70 + 70 = 140
aQueue.enqueue(p4).then(({ url, data }) => console.log('%s DONE %fms', url, performance.now() - start)); // 140 + 30 = 170
Interactive demo:
Full code demo: https://codesandbox.io/s/async-queue-ghpqm?file=/src/index.js
You can play around and watch results in console and/or dev-tools "performance" tab. The rest of this answer is based on it.
Explain:
enqueue() returns a new Promise, that will be resolved(or rejected) at some point later. This Promise can be used to handle the response of your async task Fn.
enqueue() actually push() an Object into the queue, that holds the task Fn and the control methods for the returned Promise.
Since the unwrapped returned Promise insta. starts to run, this.dequeue() is invoked each time we enqueue a new task.
With some performance.measure() added to our task, we get good visualization of our queue:
(*.gif animation)
1st row is our queue instance
Newly enqueued tasks have a "❚❚ waiting.." period (3rd row) (might be < 1ms if the queue is empty)
At some point it is dequeued and "▶ runs.." for a while (2nd row)
Log output (console.table()):
Explain:
1st task is enqueue()d at 2.58ms right after queue initialization.
Since our queue is empty there is almost no ❚❚ waiting (0.04ms ➜ ~40μs).
Task runtime 13.88ms ➜ dequeue
Class Queue is just a wrapper for native Array Fn´s!
You can of course implement this in one class. I just want to show, that you can build what you want from already known data structures. There are some good reasons for not using an Array:
A Queue data-structure is defined by an Interface of two public methods. Using an Array might tempt others to use native Array methods on it like .reverse(),.. which would break the definition.
enqueue() and dequeue() are much more readable than push() and shift()
If you already have an implemented Queue class, you can extend from it (re-usable code)
You can replace the item Array in class Queue with another data structure: a "Doubly Linked List", which would reduce the time complexity of dequeue() from the O(n) [linear] of Array.shift() to O(1) [constant]. (➜ better time complexity than native array Fn!) (➜ final demo)
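A minimal sketch of that last point follows. The text mentions a doubly linked list; for a FIFO queue a singly linked list with head and tail pointers is enough, so that is what is sketched here. LinkedQueue is a hypothetical name; it keeps the same enqueue()/dequeue()/size interface as the Queue class above.

```javascript
// Sketch: same enqueue()/dequeue()/size interface as the Queue class,
// backed by a singly linked list so dequeue() does not shift array elements.
class LinkedQueue {
  constructor() {
    this._head = null; // next node to dequeue
    this._tail = null; // last node enqueued
    this._size = 0;
  }
  enqueue(item) {
    const node = { item, next: null };
    if (this._tail) this._tail.next = node;
    else this._head = node;
    this._tail = node;
    this._size += 1;
  }
  dequeue() {
    if (!this._head) return undefined; // same result as [].shift()
    const { item, next } = this._head;
    this._head = next;
    if (!next) this._tail = null;
    this._size -= 1;
    return item;
  }
  get size() { return this._size; }
}

const linkedQueue = new LinkedQueue();
linkedQueue.enqueue("a");
linkedQueue.enqueue("b");
linkedQueue.enqueue("c");
const drained = [
  linkedQueue.dequeue(),
  linkedQueue.dequeue(),
  linkedQueue.dequeue(),
  linkedQueue.dequeue(), // queue is already empty here
];
console.log(drained, linkedQueue.size); // [ 'a', 'b', 'c', undefined ] 0
```

Because the interface is unchanged, a class like AutoQueue could extend it without modification.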
Code limitations
This AutoQueue class is not limited to async functions. It handles anything that can be called like await item.action(this):
let task = queue => {..} ➜ sync functions
let task = async queue => {..} ➜ async functions
let task = queue => new Promise(resolve => setTimeout(resolve, 100)) ➜ new Promise()
Note: We already call our tasks with await, where await wraps the response of the task into a Promise.
Nr 2. (async function), always returns a Promise on its own, and the await call just wraps a Promise into an other Promise, which is slightly less efficient.
Nr 3. is fine. Returned Promises will not get wrapped by await
This is how async functions are executed: (source)
The result of an async function is always a Promise p. That Promise is created when starting the execution of the async function.
The body is executed. Execution may finish permanently via return or throw. Or it may finish temporarily via await; in which case execution will usually continue later on.
The Promise p is returned.
The following code demonstrates how that works:
async function asyncFunc() {
console.log('asyncFunc()'); // (A)
return 'abc';
}
asyncFunc().
then(x => console.log(`Resolved: ${x}`)); // (B)
console.log('main'); // (C)
// Output:
// asyncFunc()
// main
// Resolved: abc
You can rely on the following order:
Line (A): the async function is started synchronously. The async function’s Promise is resolved via return.
Line (C): execution continues.
Line (B): Notification of Promise resolution happens asynchronously.
Read more: "Callable values"
Read more: "Async functions"
Performance limitations
Since AutoQueue is limited to handle one task after the other, it might become a bottleneck in our app. Limiting factors are:
Tasks per time: ➜ Frequency of new enqueue()d tasks.
Runtime per task ➜ Blocking time in dequeue() until task complete
1. Tasks per time
This is our responsibility! We can get the current size of the queue at any time: size = queue.size. Your outer script needs a "fail-over" case for a steadily growing queue (check "Stacked wait times" section).
You want to avoid a "queue overflow" like this, where average/mean waitTime increases over time.
+-------+----------------+----------------+----------------+----------------+
| tasks | enqueueMin(ms) | enqueueMax(ms) | runtimeMin(ms) | runtimeMax(ms) |
| 20 | 0 | 200 | 10 | 30 |
+-------+----------------+----------------+----------------+----------------+
➜ Task 20/20 waits for 195ms until exec starts
➜ From the time, our last task was randomly enqueued, it takes another + ~232ms, until all tasks were resolved.
2. Runtime per task
This one is harder to deal with. (Waiting for a fetch() can not be improved and we need to wait until the HTTP request completed).
Maybe your fetch() tasks rely on each others response and a long runtime will block the others.
But there are some things we could do:
Maybe we could cache responses ➜ Reduce runtime on next enqueue.
Maybe we fetch() from a CDN and have an alternative URI we could use. In this case we can return a new Promise from our task that will be run before the next task is enqueue()d. (see "Error handling"):
queue.enqueue(queue => Promise.race([fetch('url1'), fetch('url2')])); // Promise.race expects an iterable
Maybe you have some kind of "long polling" or periodic ajax task that runs every x seconds and that can not be cached. Even if you can not reduce the runtime itself, you could record the runtimes, which would give you an approximate estimate of the next run. Maybe you can move long-running tasks to other queue instances.
Balanced AutoQueue
What is an "efficient" Queue? - Your first thought might be something like:
The most efficient Queue handles most tasks in shortest period of time?
Since we can not improve our task runtime, can we lower the waiting time? The example is a queue with zero (~0ms) waiting time between tasks.
Hint: In order to compare our next examples we need some base stats that will not change:
+-------+----------------+----------------+------------------+------------------+
| count | random fake runtime for tasks | random enqueue() offset for tasks |
+-------+----------------+----------------+------------------+------------------+
| tasks | runtimeMin(ms) | runtimeMax(ms) | msEnqueueMin(ms) | msEnqueueMax(ms) |
| 200 | 10 | 30 | 0 | 4000 |
+-------+----------------+----------------+------------------+------------------+
Avg. task runtime: ⇒ (10ms + 30ms) / 2 = 20ms
Total time: ⇒ 20ms * 200 = 4000ms ≙ 4s
➜ We expect our queue to be resolved after ~4s
➜ For consistent enqueue() frequency we set msEnqueueMax to 4000
AutoQueue finished last dequeue() after ~4.12s (^^ see tooltip).
Which is ~120ms longer than our expected 4s:
Hint: There is a small "Log" block after each task (~0.3ms), where I build/push an Object with log marks to a global Array for the console.table() log at the end. This explains 200 * 0.3ms = 60ms. The missing 60ms are untracked (you see the small gap between the tasks) -> 0.3ms/task for our test loop and probably some delay from open Dev-Tools.
We come back to these timings later.
The initialization code for our queue:
const queue = new AutoQueue();
// .. get 200 random Int numbers for our task "fake" runtimes [10-30]ms
let runtimes = Array.from({ length: 200 }, () => rndInt(10, 30));
let i = 0;
let enqueue = queue => {
if (i >= 200) {
return queue; // break out condition
}
i++;
queue
.enqueue(
newTask({ // generate a "fake" task with of a rand. runtime
ms: runtimes[i - 1],
url: _(i)
})
)
.then(payload => {
enqueue(queue);
});
};
enqueue(queue); // start recursion
We recursively enqueue() our next task, right after the previous finished. You might have noticed the analogy to a typical Promise.then() chain, right?
Hint: We don´t need a Queue if we already know the order and total number of tasks to run in sequence. We can use a Promise chain and get the same results.
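For comparison, a Promise chain for a known list of tasks could be sketched as follows. The fakeTask helper and the task labels are hypothetical; each task is a function that returns a Promise, like the fake tasks used earlier in this answer.

```javascript
// Hypothetical fake task: resolves with `label` after `ms` milliseconds.
const fakeTask = (label, ms) => () =>
  new Promise((resolve) => setTimeout(resolve, ms, label));

const tasks = [fakeTask("first", 15), fakeTask("second", 5), fakeTask("third", 10)];
const finished = [];

// Chain the tasks: each one starts only after the previous one has resolved.
const chain = tasks.reduce(
  (previous, task) =>
    previous.then(task).then((label) => { finished.push(label); }),
  Promise.resolve()
);

chain.then(() => console.log(finished)); // [ 'first', 'second', 'third' ]
```

The tasks finish in list order even though their runtimes differ, because no task is started before the previous one has resolved.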
Sometimes we don´t know all next steps right at the start of our script..
..You might need more flexibility, and the next task we want to run depends on the response of the previous task. - Maybe your app relies on a REST API (multiple endpoints), and you are limited to max X simultaneous API request(s). We can not spam the API with requests from all over your app. You don't even know when the next request gets enqueue()d (e.g. API requests are triggered by click() events).
Ok, for the next example I changed the initialization code a bit:
We now enqueue 200 tasks randomly within [0-4000ms] period. - To be fair, we reduced the range by 30ms (max task runtime) to [0-3970ms]. Now our randomly filled queue has a chance to keep inside 4000ms limit.
What we can get out of the Dev-Tools performance log:
Random enqueue() leads to a big number of "waiting" tasks.
Makes sense: since we enqueued all tasks within the first ~4000ms, they must overlap somehow. Checking the table output we can verify: max queue.size is 22 at the time task 170/200 was enqueued.
Waiting tasks are not evenly distributed. Right after the start there are even some idle sections.
Because of the random enqueue() it is unlikely to get a 0ms offset for our first task.
~20ms runtime for each task lead to the stacking effect over time.
We can sort tasks by "wait ms" (see screen): Longest waiting time was >400ms.
There might be a relation between queue.size (column: sizeOnAdd) and wait ms (see next section).
Our AutoQueue completed its last dequeue() ~4.37s after its initialization (check the tooltip in the "performance" tab). An average runtime of 20.786ms / task (expected: 20ms) gives us a total runtime of 4157.13ms (expected: 4000ms ≙ 4s).
We still have our "Log" blocks and the execution time of our test script itself, ~120ms. Still ~37ms longer? Summing up all idle "gaps" right at the start explains the missing ~37ms.
Back to our initial "definition"
The most efficient Queue handles the most tasks in the shortest period of time?
Assumption: Apart from the random offset at which tasks get enqueue()d in the previous example, both queues handled the same number of tasks (equal avg. runtime) within the same period of time. Neither the waiting time of an enqueued task nor the queue.size affects the total runtime. Both have the same efficiency?
Since a Queue, by its nature, limits our coding possibilities, it is best not to use a Queue if efficiency (tasks per time) is the goal.
A queue helps us to straighten tasks in an async environment into a sync pattern. That is exactly what we want. ➜ "Run an unknown sequence of tasks in a row".
If you find yourself saying things like: "If a new task gets enqueued into an already filled queue, the time we have to wait for our result is increased by the runtimes of the others. That's less efficient!"
Then you are doing it wrong:
Either you enqueue tasks that have no dependency (logical or programmatic) on each other, or there is a dependency that would not increase the total runtime of our script, because we have to wait for the others anyway.
Stacked wait times
We have seen a peak wait time of 461.05ms for a task before it runs. Wouldn't it be great if we could forecast the wait time for a task before we decide to enqueue it?
First we analyse the behavior of our AutoQueue class over a longer time.
(re-post screens)
We can build a chart from the console.table() output:
Besides the wait time of a task, we can see the random [10-30ms] runtime and 3 curves representing the current queue.size, recorded at the time a task ..
.. is enqueue()d
.. starts to run (dequeue())
.. has finished (right before the next dequeue())
2 more runs for comparison (similar trend):
chart run 2: https://i.imgur.com/EjpRyuv.png
chart run 3: https://i.imgur.com/0jp5ciV.png
Can we find dependencies between these curves?
If we could find a relation between any of those recorded chart lines, it might help us to understand how a queue behaves over time (➜ constantly filled up with new tasks).
Digression: What is a relation?
We are looking for an equation that projects the wait ms curve onto one of the 3 queue.size records. This would prove a direct dependency between both.
For our last run, we changed our start parameters:
Task count: 200 ➜ 1000 (5x)
msEnqueueMax: 4000ms ➜ 20000ms (5x)
+-------+---------------------------------+-------------------------------------+
| count | random fake runtime for tasks   | random enqueue() offset for tasks   |
+-------+----------------+----------------+------------------+------------------+
| tasks | runtimeMin(ms) | runtimeMax(ms) | msEnqueueMin(ms) | msEnqueueMax(ms) |
+-------+----------------+----------------+------------------+------------------+
| 1000  | 10             | 30             | 0                | 20000            |
+-------+----------------+----------------+------------------+------------------+
Avg. task runtime: ⇒ (10ms + 30ms) / 2 = 20ms (like before)
Total time: ⇒ 20ms * 1000 = 20000ms ≙ 20s
➜ We expect our queue to be resolved after ~20s
➜ For consistent enqueue() frequency we set msEnqueueMax to 20000
(interactive chart: https://datawrapper.dwcdn.net/p4ZYx/2/)
We see the same trend: wait ms increases over time (nothing new). Since our 3 queue.size lines at the bottom were drawn into the same chart (the Y-axis has a ms scale), they are barely visible. A quick switch to a logarithmic scale allows a better comparison:
(interactive chart: https://datawrapper.dwcdn.net/lZngg/1/)
The two dotted lines for queue.size [on start] and queue.size [on end] pretty much overlap each other and fall down to "0" once our queue gets empty, at the end.
queue.size [on add] looks very similar to the wait ms line. That is what we need.
{queue.size [on add]} * X = {wait ms}
⇔ X = {wait ms} / {queue.size [on add]}
This alone does not help us at runtime because wait ms is unknown for a newly enqueued task (it has not run yet). So we still have 2 unknown variables: X and wait ms. We need another relation that helps us.
First of all, we plot our new ratio {wait ms} / {queue.size [on add]} in the chart (light green), along with its mean/average (light green horizontal dashed line). This is pretty close to 20ms (the avg. run ms of our tasks), right?
Switch back to a linear Y-axis and set its "max scale" to 80ms to get a better view of it. (Hint: wait ms is now beyond the viewport.)
(interactive chart: https://datawrapper.dwcdn.net/Tknnr/4/)
Back to the random runtimes of our tasks (dot cloud). We still have our "total mean" of 20.72ms (dark green dashed horizontal line). We can also calculate the mean of our previous tasks at runtime (e.g. task 370 gets enqueued ➜ what is the current mean runtime for tasks [1, .., 369]?). But we could be even more precise:
The more tasks we enqueue, the less impact each has on the total "mean runtime". So let's just calculate the "mean runtime" of the last e.g. 50 tasks, which leads to a consistent impact of 1/50 per task on the "mean runtime". ➜ Peak runtimes get smoothed and the trend (up/down) is taken into account (dark green curve next to the light green one from our 1st equation).
Things we can do now:
We can eliminate X from our 1st equation (light green). ➜ X can be expressed by the "mean runtime" of the previous n (e.g. 50) tasks (dark green).
Our new equation depends only on variables that are known at runtime, right at the point of enqueue:
// mean runtime of the prev. n tasks (here n = 50):
X = { (runtime[-50] + .. + runtime[-2] + runtime[-1]) / n } ms
// .. replace X in the 1st equation:
⇒ {wait ms} = {queue.size [on add]} * { (runtime[-50] + .. + runtime[-2] + runtime[-1]) / n } ms
We can add a new curve to our chart and check how close it is to the recorded wait ms (orange).
(interactive chart: https://datawrapper.dwcdn.net/LFp1d/2/)
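The forecast equation could be sketched in code roughly like this. WaitForecaster is a hypothetical helper, not part of the AutoQueue class; it assumes you call record() with the measured runtime whenever a task finishes and pass the current queue.size when a new task is about to be enqueued:

```javascript
// Hypothetical helper: keeps the runtimes of the last `windowSize` finished tasks.
class WaitForecaster {
  constructor(windowSize = 50) {
    this.windowSize = windowSize;
    this.runtimes = [];
  }
  // Call with the measured runtime (ms) whenever a task finishes.
  record(runtimeMs) {
    this.runtimes.push(runtimeMs);
    if (this.runtimes.length > this.windowSize) this.runtimes.shift();
  }
  // Mean runtime of the recorded window (0 while nothing is recorded).
  meanRuntime() {
    if (this.runtimes.length === 0) return 0;
    const sum = this.runtimes.reduce((a, b) => a + b, 0);
    return sum / this.runtimes.length;
  }
  // {wait ms} ≈ {queue.size [on add]} * {mean runtime of the window}
  forecastWait(queueSizeOnAdd) {
    return queueSizeOnAdd * this.meanRuntime();
  }
}

const forecaster = new WaitForecaster(3);
[10, 20, 30, 40].forEach(ms => forecaster.record(ms)); // window keeps 20, 30, 40
console.log(forecaster.forecastWait(5)); // 5 * 30 = 150
```

The window size of 3 is only chosen to keep the example short; the text above uses the last 50 tasks.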
Conclusion
We can forecast the wait for a task before it gets enqueued, provided the runtimes of our tasks can be determined somehow. So it works best in situations where you enqueue tasks of the same type/function:
Use case: An AutoQueue instance is filled up with render tasks for your UI components. Render time might not change that much (compared to fetch()). Maybe you render 1000 location marks on a map. Each mark is an instance of a class with a render() Fn.
Tips
Queues are used for various tasks. ➜ Implement dedicated Queue class variations for different kinds of logic (do not mix different logic in one class).
Check all tasks that might be enqueued to the same AutoQueue instance (now or in the future); they could be blocked by all the others.
An AutoQueue will not improve the runtime; at best, the runtime will not get worse.
Use different AutoQueue instances for different Task types.
Monitor the size of your AutoQueue, particularly ..
.. on heavy usage (high frequency of enqueue())
.. on long or unknown task runtimes
Check your error handling. Errors inside your tasks will just reject the promise returned on enqueue (promise = queue.enqueue(..)) and will not stop the dequeue process. You can handle errors..
.. inside your tasks ➜ try{..} catch(e){ .. }
.. right after it (before the next) ➜ return new Promise()
.. "async" ➜ queue.enqueue(..).catch(e => {..})
.. "global" ➜ error handler inside the AutoQueue class
Depending on the implementation of your Queue you might watch the queue.size. An Array filled with 1000 tasks is less efficient than a decentralized data structure like the "Doubly Linked List" I used in the final code.
Avoid recursion hell. (It is OK to use tasks that enqueue() others.) But it is no fun to debug an AutoQueue where tasks are dynamically enqueue()d by others in an async environment.
At first glance a Queue might solve a problem (at a certain level of abstraction). However, in most cases it reduces existing flexibility. It adds an additional "control layer" to our code (which, in most cases, is what we want); at the same time, we sign a contract to accept the strict rules of a Queue. Even if it solves a problem, it might not be the best solution.
Add more features [basic]
Stop "Auto dequeue()" on enqueue():
Since our AutoQueue class is generic and not limited to long-running HTTP requests, you can enqueue() any function that has to run in sequence, even functions that run for 3 minutes, e.g. "store updates for modules". When you enqueue() 100 tasks in a loop, you cannot guarantee that the previously added task has not already been dequeue()d.
You might want to prevent enqueue() from calling dequeue() until all tasks were added.
enqueue(action, autoDequeue = true) { // new
return new Promise((resolve, reject) => {
super.enqueue({ action, resolve, reject });
if (autoDequeue) this.dequeue(); // new
});
}
.. and then call queue.dequeue() manually at some point.
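To see the effect in isolation, here is a small self-contained sketch. MiniAutoQueue is a simplified, array-backed stand-in written for this example (it is not the AutoQueue/Queue pair from this answer), but its enqueue() uses the same autoDequeue flag:

```javascript
// Simplified stand-in for the AutoQueue above (array-backed, no base class).
class MiniAutoQueue {
  constructor() {
    this._items = [];
    this._pendingPromise = false;
  }
  enqueue(action, autoDequeue = true) {
    return new Promise((resolve, reject) => {
      this._items.push({ action, resolve, reject });
      if (autoDequeue) this.dequeue();
    });
  }
  async dequeue() {
    if (this._pendingPromise) return false; // a task is already running
    const item = this._items.shift();
    if (!item) return false; // nothing left to run
    this._pendingPromise = true;
    try {
      item.resolve(await item.action(this));
    } catch (e) {
      item.reject(e);
    } finally {
      this._pendingPromise = false;
      this.dequeue(); // continue with the next waiting task
    }
    return true;
  }
  get size() {
    return this._items.length;
  }
}

const miniQueue = new MiniAutoQueue();
const started = [];
const batch = [1, 2, 3].map(n =>
  miniQueue.enqueue(async () => { started.push(n); return n * 10; }, false)
);
console.log(miniQueue.size, started.length); // 3 0, nothing has run yet
miniQueue.dequeue(); // start manually once everything is added
Promise.all(batch).then(values => console.log(values)); // [ 10, 20, 30 ]
```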
Control methods: stop / pause / start
You can add more control methods. Maybe your app has multiple modules that all try to fetch() their resources on page load. An AutoQueue works like a controller. You can monitor how many tasks are "waiting.." and add more controls:
class AutoQueue extends Queue {
  constructor() {
    super(); // required before using `this` in a derived class
    this._stop = false; // new
    this._pause = false; // new
  }
  enqueue(action) { .. }
  async dequeue() {
    if (this._pendingPromise) return false;
    if (this._pause) return false; // new: keep waiting tasks, just do not run them
    if (this._stop) { // new: drop all waiting tasks
      this._queue = [];
      this._stop = false;
      return false;
    }
    let item = super.dequeue();
    ..
  }
  stop() { // new
    this._stop = true;
  }
  pause() { // new
    this._pause = true;
  }
  async start() { // new; async because it awaits dequeue()
    this._stop = false;
    this._pause = false;
    return await this.dequeue();
  }
}
Forward response:
You might want to process the "response/value" of a task in the next task. There is no guarantee that the previous task has not already finished by the time we enqueue the 2nd task.
Therefore it might be best to store the response of the previous task inside the class and forward it to the next: this._payload = await item.action(this._payload)
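A rough sketch of the forwarding idea. ForwardingQueue is a hypothetical, stripped-down class that chains promises instead of using the dequeue() logic of AutoQueue; only the this._payload line mirrors the suggestion above:

```javascript
// Hypothetical minimal queue that passes each task's result to the next task.
class ForwardingQueue {
  constructor(initialPayload) {
    this._payload = initialPayload;
    this._tail = Promise.resolve();
  }
  enqueue(action) {
    const run = this._tail.then(async () => {
      this._payload = await action(this._payload); // forward the previous result
      return this._payload;
    });
    // Keep the chain alive even if this task rejects.
    this._tail = run.catch(() => {});
    return run;
  }
}

const forwardingQueue = new ForwardingQueue(1);
forwardingQueue.enqueue(async n => n + 1);
const lastResult = forwardingQueue.enqueue(async n => n * 10);
lastResult.then(value => console.log(value)); // 20
```

If a task rejects, this sketch leaves the stored payload untouched, so the next task receives the last successful result. Whether that is the right behavior depends on your use case.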
Error handling
Errors thrown inside a task Fn reject the promise returned by enqueue() and will not stop the dequeue process. You might want to handle errors before the next task starts to run:
queue.enqueue(queue => myTask() ).catch(e => { .. }); // async error handling
queue.enqueue(queue =>
myTask()
.then(payload=> otherTask(payload)) // .. inner task
.catch(() => { .. }) // sync error handling
);
Since our Queue is dumb and just awaits the resolution of our tasks (item.action(this)), no one prevents you from returning a new Promise() from the currently running task Fn. It will be resolved before the next task gets dequeued.
You can throw new Error() inside task Fns and handle them "outside"/after the run: queue.enqueue(..).catch().
You can easily add custom error handling inside the dequeue() method that calls this.stop() to clear "on hold" (enqueued) tasks.
You can even manipulate the queue from inside your task functions. Note that await item.action(this) passes this to the task and gives it access to the Queue instance. (This is optional.) There are use cases where task Fns should not be able to do that.
Add more features [advanced]
... text limit reached :D
more: https://gist.github.com/exodus4d/6f02ed518c5a5494808366291ff1e206
Read more
Blog: "Asynchronous Recursion with Callbacks, Promises and Async"
Book: "Callable values"
Book: "Async functions"
You could save the previous pending promise and await it before calling the next fetch.
// fake fetch for demo purposes only
const fetch = (url, options) => new Promise(resolve => setTimeout(resolve, 1000, {url, options}))
// task executor
const addTask = (() => {
let pending = Promise.resolve();
const run = async (url, options) => {
// wait for the previous task; ignore its failure so the chain keeps going
await pending.catch(() => {});
return fetch(url, options);
}
// update pending promise so that next task could await for it
return (url, options) => (pending = run(url, options))
})();
addTask('url1', {options: 1}).then(console.log)
addTask('url2', {options: 2}).then(console.log)
addTask('url3', {options: 3}).then(console.log)
Here's one I made earlier (also available in TypeScript):
function createAsyncQueue(opts = { dedupe: false }) {
const { dedupe } = opts
let queue = []
let running
const push = task => {
if (dedupe) queue = []
queue.push(task)
if (!running) running = start()
return running.finally(() => {
running = undefined
})
}
const start = async () => {
const res = []
while (queue.length) {
const item = queue.shift()
res.push(await item())
}
return res
}
return { push, queue, flush: () => running || Promise.resolve([]) }
}
// ----- tests below 👇
const sleep = ms => new Promise(r => setTimeout(r, ms))
async function test1() {
const myQueue = createAsyncQueue()
myQueue.push(async () => {
console.log(100)
await sleep(100)
return 100
})
myQueue.push(async () => {
console.log(10)
await sleep(10)
return 10
})
console.log(await myQueue.flush())
}
async function test2() {
const myQueue = createAsyncQueue({ dedupe: true })
myQueue.push(async () => {
console.log(100)
await sleep(100)
return 100
})
myQueue.push(async () => {
console.log(10)
await sleep(10)
return 10
})
myQueue.push(async () => {
console.log(9)
await sleep(9)
return 9
})
// only 100 and 9 will be executed
// concurrent executions will be deduped
console.log(await myQueue.flush())
}
test1().then(test2)
Example usage:
const queue = createAsyncQueue()
const task1 = async () => {
await fetchItem()
}
queue.push(task1)
const task2 = async () => {
await fetchItem()
}
queue.push(task2)
// task1 will be guaranteed to be executed before task2
I would think there is a simple solution, as follows.
class AsyncQueue {
constructor() {
this.promise = Promise.resolve()
}
push = (task) => {
this.promise = this.promise.then(task)
}
}
let count = 0
let dummy = () =>
new Promise((res) => {
const ms = 400 + Math.ceil(1200 * Math.random())
console.log('will wait', ms, 'ms')
setTimeout(res, ms)
})
const foo = async (args) => {
const s = ++count
console.log('start foo', s)
await dummy()
console.log('end foo', s)
}
console.log('begin')
const q = new AsyncQueue()
q.push(foo)
q.push(foo)
q.push(foo)
q.push(foo)
console.log('end')
For your case you can do something like this:
const q = new AsyncQueue()
const addTask = (url, options) => {
q.push(() => fetch(url, options))
}
If you want to handle some results:
const q = new AsyncQueue()
const addTask = (url, options, handleResults) => {
q.push(async () => handleResults(await fetch(url, options)))
}
Not sure about performance, I just think it is a quick clean solution.
https://stackoverflow.com/a/71239408/8784402
Schedule tasks from an array in parallel, without waiting for any to finish, within the allowed number of threads:
const fastQueue = async <T, Q>(
x: T[],
threads: number,
fn: (v: T, i: number, a: T[]) => Promise<Q>
) => {
let k = 0;
const result = Array(x.length) as Q[];
await Promise.all(
[...Array(threads)].map(async () => {
while (k < x.length) result[k] = await fn(x[k], k++, x);
})
);
return result;
};
const demo = async () => {
const wait = (x: number) => new Promise(r => setTimeout(r, x, x))
console.time('a')
console.log(await fastQueue([1000, 2000, 3000, 2000, 2000], 4, (v) => wait(v)))
console.timeEnd('a')
}
demo();

Rx distinctUntilChanged allow repetition after configurable time between events

Let's consider for a moment the following code
Rx.Observable.merge(
Rx.Observable.just(1),
Rx.Observable.just(1).delay(1000)
).distinctUntilChanged()
.subscribe(x => console.log(x))
We expect that 1 is logged just once. However what if we wanted to allow repetition of a value if its last emission was a configurable amount of time ago? I mean to get both events logged.
For example it would be cool to have something like the following
Rx.Observable.merge(
Rx.Observable.just(1),
Rx.Observable.just(1).delay(1000)
).distinctUntilChanged(1000)
.subscribe(x => console.log(x))
In which distinctUntilChanged() accepts some sort of timeout to allow repetition on the next element. However, such a thing does not exist, and I was wondering if anybody knows an elegant way to achieve this using high-level operators, without resorting to a filter that would require handling state.
Unless I am misunderstanding, I am pretty sure this could be accomplished in a relatively straightforward manner with windowTime:
Observable
.merge(
Observable.of(1),
Observable.of(1).delay(250), // Ignored
Observable.of(1).delay(700), // Ignored
Observable.of(1).delay(2000),
Observable.of(1).delay(2200), //Ignored
Observable.of(2).delay(2300)
)
// Converts the stream into a stream of streams each 1000 milliseconds long
.windowTime(1000)
// Flatten each of the streams and emit only the latest (there should only be one active
// at a time anyway).
// We apply distinctUntilChanged to each window before flattening
.switchMap(source => source.distinctUntilChanged())
.timeInterval()
.subscribe(
value => console.log(value),
error => console.log('error: ' + error),
() => console.log('complete')
);
See the example here (borrowed @martin's example inputs)
This is an interesting use-case. I wonder whether there's an easier solution than mine (note that I'm using RxJS 5):
let timedDistinctUntil = Observable.defer(() => {
let innerObs = null;
let innerSubject = null;
let delaySub = null;
function tearDown() {
delaySub.unsubscribe();
innerSubject.complete();
}
return Observable
.merge(
Observable.of(1),
Observable.of(1).delay(250), // ignored
Observable.of(1).delay(700), // ignored
Observable.of(1).delay(2000),
Observable.of(1).delay(2200), // ignored
Observable.of(2).delay(2300)
)
.do(undefined, undefined, () => tearDown())
.map(value => {
if (innerObs) {
innerSubject.next(value);
return null;
}
innerSubject = new BehaviorSubject(value);
delaySub = Observable.of(null).delay(1000).subscribe(() => {
innerObs = null;
});
innerObs = innerSubject.distinctUntilChanged();
return innerObs;
})
// filter out all skipped Observable emissions
.filter(observable => observable)
.switch();
});
timedDistinctUntil
.timestamp()
.subscribe(
value => console.log(value),
error => console.log('error: ' + error),
() => console.log('complete')
);
See live demo: https://jsbin.com/sivuxo/5/edit?js,console
The entire logic is wrapped into Observable.defer() static method because it requires some additional variables.
A couple of points on how this all works:
The merge() is the source of items.
I use do() to properly catch when the source completes so I can shutdown the internal timer and send proper complete notification.
The map() operator is where the most interesting things happen. If there's already a valid Observable (it was created less than 1000ms ago, i.e. innerObs != null), I re-emit the received value through the Subject and return null. Otherwise I create a new BehaviorSubject, through which all items are going to be re-emitted, and return it chained with .distinctUntilChanged(). Finally I schedule a 1s delay to set innerObs = null, which means that when another value arrives it'll return a new Observable with a new .distinctUntilChanged().
Then filter() will let me ignore all null values returned. This means it won't emit a new Observable more than once a second.
Now I need to work with so-called higher-order Observables (Observables emitting Observables). For this reason I use the switch() operator, which always subscribes only to the newest Observable emitted by the source. In our case we emit Observables at most once a second (thanks to the filter() used above), and the inner Observable itself can emit as many values as it wants; all of them are passed through distinctUntilChanged(), so duplicates are ignored.
The output for this demo will look like the following:
Timestamp { value: 1, timestamp: 1484670434528 }
Timestamp { value: 1, timestamp: 1484670436475 }
Timestamp { value: 2, timestamp: 1484670436577 }
complete
As you can see, the value 1 is emitted twice, about 2s apart. However, the value 2 passed without any problem after 100ms thanks to distinctUntilChanged().
I know this isn't simple but I hope it makes sense to you :)
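To make the intended timing rule easier to follow, here is a plain-JavaScript model of it that does not use RxJS at all. timedDistinct is a made-up function for illustration: it walks over a list of { time, value } events and applies the same rule as the operator chain above (a new window opens with the first value that arrives after the previous window expired, and duplicates are dropped inside a window):

```javascript
// Plain model of "distinctUntilChanged that resets after windowMs" (no RxJS).
function timedDistinct(events, windowMs) {
  const emitted = [];
  let windowStart = null; // time at which the current window was opened
  let lastValue;
  for (const { time, value } of events) {
    const expired = windowStart === null || time - windowStart >= windowMs;
    if (expired) {
      // first value after the window expired: always passes and opens a new window
      windowStart = time;
      lastValue = value;
      emitted.push({ time, value });
    } else if (value !== lastValue) {
      // inside a window only changed values pass
      lastValue = value;
      emitted.push({ time, value });
    }
  }
  return emitted;
}

// Same inputs as the merge() in the demo above (delays in ms).
const sampleEvents = [
  { time: 0, value: 1 },
  { time: 250, value: 1 },
  { time: 700, value: 1 },
  { time: 2000, value: 1 },
  { time: 2200, value: 1 },
  { time: 2300, value: 2 }
];
console.log(timedDistinct(sampleEvents, 1000));
// passes the values at 0ms, 2000ms and 2300ms
```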
