I am learning node.js and most of examples I can find are dealing with simple examples. I am more interested in building real-world complicated systems and estimating how well event based model of node.js can handle all the use cases of a real application.
One of the common patterns that I want to apply is let blocking execution to time-out if it does not occur within certain timeout time. For example if it takes more than 30 seconds to execute a database query, it might be too much for certain application. Or if it takes more than 10 seconds to read a file.
For me the ideal program flow with timeouts would be similar to the program flow with exceptions. If an event does not occur within certain predefined timeout limit, then the event listener would be cleared from the event loop and a timeout event would be generated instead. This timeout event would have an alternate listener. If the event is handled normally, then both the timeout listener and event listener are cleared from the event loop.
Is there a general mechanism for timeout handling and cleaning up timed out processes? I know some types such as socket have timeout parameter but it is not a general mechanism that applies to all events.
There is nothing like this at the moment (that i know of, but i don't know everything).
The only thing i can think of is that you reset it yourself somehow. I've given an example below but I think it may have some scope issues. Should be solvable though.
var to
function cb() {
clearTimeout(to)
// do stuff
}
function cbcb() {
cb()
}
function cancel() {
cb = function() {} // notice empty
}
fs.doSomethingAsync(file, cbcb)
to = setTimeout(cancel, 10000)
Related
Lets assume I run this piece of code.
var score = 0;
for (var i = 0; i < arbitrary_length; i++) {
async_task(i, function() { score++; }); // increment callback function
}
In theory I understand that this presents a data race and two threads trying to increment at the same time may result in a single increment, however, nodejs(and javascript) are known to be single threaded. Am I guaranteed that the final value of score will be equal to arbitrary_length?
Am I guaranteed that the final value of score will be equal to
arbitrary_length?
Yes, as long as all async_task() calls call the callback once and only once, you are guaranteed that the final value of score will be equal to arbitrary_length.
It is the single-threaded nature of Javascript that guarantees that there are never two pieces of Javascript running at the exact same time. Instead, because of the event driven nature of Javascript in both browsers and node.js, one piece of JS runs to completion, then the next event is pulled from the event queue and that triggers a callback which will also run to completion.
There is no such thing as interrupt driven Javascript (where some callback might interrupt some other piece of Javascript that is currently running). Everything is serialized through the event queue. This is an enormous simplification and prevents a lot of stickly situations that would otherwise be a lot of work to program safely when you have either multiple threads running concurrently or interrupt driven code.
There still are some concurrency issues to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access. While only one will ever be accessing it at any given time, it is still possible that a piece of code that contains several asynchronous operations could leave some state in an "in between" state while it was in the middle of several async operations at a point where some other async operation could run and could attempt to access that data.
You can read more about the event driven nature of Javascript here: How does JavaScript handle AJAX responses in the background? and that answer also contains a number of other references.
And another similar answer that discusses the kind of shared data race conditions that are possible: Can this code cause a race condition in socket io?
Some other references:
how do I prevent event handlers to handle multiple events at once in javascript?
Do I need to be concerned with race conditions with asynchronous Javascript?
JavaScript - When exactly does the call stack become "empty"?
Node.js server with multiple concurrent requests, how does it work?
To give you an idea of the concurrency issues that can happen in Javascript (even without threads and without interrupts, here's an example from my own code.
I have a Raspberry Pi node.js server that controls the attic fans in my house. Every 10 seconds it checks two temperature probes, one inside the attic and one outside the house and decides how it should control the fans (via relays). It also records temperature data that can be presented in charts. Once an hour, it saves the latest temperature data that was collected in memory to some files for persistence in case of power outage or server crash. That saving operation involves a series of async file writes. Each one of those async writes yields control back to the system and then continues when the async callback is called signaling completion. Because this is a low memory system and the data can potentially occupy a significant portion of the available RAM, the data is not copied in memory before writing (that's simply not practical). So, I'm writing the live in-memory data to disk.
At any time during any of these async file I/O operations, while waiting for a callback to signify completion of the many file writes involved, one of my timers in the server could fire, I'd collect a new set of temperature data and that would attempt to modify the in-memory data set that I'm in the middle of writing. That's a concurrency issue waiting to happen. If it changes the data while I've written part of it and am waiting for that write to finish before writing the rest, then the data that gets written can easily end up corrupted because I will have written out one part of the data, the data will have gotten modified from underneath me and then I will attempt to write out more data without realizing it's been changed. That's a concurrency issue.
I actually have a console.log() statement that explicitly logs when this concurrency issue occurs on my server (and is handled safely by my code). It happens once every few days on my server. I know it's there and it's real.
There are many ways to work around those types of concurrency issues. The simplest would have been to just make a copy in memory of all the data and then write out the copy. Because there are not threads or interrupts, making a copy in memory would be safe from concurrency (there would be no yielding to async operations in the middle of the copy to create a concurrency issue). But, that wasn't practical in this case. So, I implemented a queue. Whenever I start writing, I set a flag on the object that manages the data. Then, anytime the system wants to add or modify data in the stored data while that flag is set, those changes just go into a queue. The actual data is not touched while that flag is set. When the data has been safely written to disk, the flag is reset and the queued items are processed. Any concurrency issue was safely avoided.
So, this is an example of concurrency issues that you do have to be concerned about. One great simplifying assumption with Javascript is that a piece of Javascript will run to completion without any thread of getting interrupted as long as it doesn't purposely return control back to the system. That makes handling concurrency issues like described above lots, lots easier because your code will never be interrupted except when you consciously yield control back to the system. This is why we don't need mutexes and semaphores and other things like that in our own Javascript. We can use simple flags (just a regular Javascript variable) like I described above if needed.
In any entirely synchronous piece of Javascript, you will never be interrupted by other Javascript. A synchronous piece of Javascript will run to completion before the next event in the event queue is processed. This is what is meant by Javascript being an "event-driven" language. As an example of this, if you had this code:
console.log("A");
// schedule timer for 500 ms from now
setTimeout(function() {
console.log("B");
}, 500);
console.log("C");
// spin for 1000ms
var start = Date.now();
while(Data.now() - start < 1000) {}
console.log("D");
You would get the following in the console:
A
C
D
B
The timer event cannot be processed until the current piece of Javascript runs to completion, even though it was likely added to the event queue sooner than that. The way the JS interpreter works is that it runs the current JS until it returns control back to the system and then (and only then), it fetches the next event from the event queue and calls the callback associated with that event.
Here's the sequence of events under the covers.
This JS starts running.
console.log("A") is output.
A timer event is schedule for 500ms from now. The timer subsystem uses native code.
console.log("C") is output.
The code enters the spin loop.
At some point in time part-way through the spin loop the previously set timer is ready to fire. It is up to the interpreter implementation to decide exactly how this works, but the end result is that a timer event is inserted into the Javascript event queue.
The spin loop finishes.
console.log("D") is output.
This piece of Javascript finishes and returns control back to the system.
The Javascript interpreter sees that the current piece of Javascript is done so it checks the event queue to see if there are any pending events waiting to run. It finds the timer event and a callback associated with that event and calls that callback (starting a new block of JS execution). That code starts running and console.log("B") is output.
That setTimeout() callback finishes execution and the interpreter again checks the event queue to see if there are any other events that are ready to run.
Node uses an event loop. You can think of this as a queue. So we can assume, that your for loop puts the function() { score++; } callback arbitrary_length times on this queue. After that the js engine runs these one by one and increase score each time. So yes. The only exception if a callback is not called or the score variable is accessed from somewhere else.
Actually you can use this pattern to do tasks parallel, collect the results and call a single callback when every task is done.
var results = [];
for (var i = 0; i < arbitrary_length; i++) {
async_task(i, function(result) {
results.push(result);
if (results.length == arbitrary_length)
tasksDone(results);
});
}
No two invocations of the function can happen at the same time (b/c node is single threaded) so that will not be a problem. The only problem would be ifin some cases async_task(..) drops the callback. But if, e.g., 'async_task(..)' was just calling setTimeout(..) with the given function, then yes, each call will execute, they will never collide with each other, and 'score' will have the value expected, 'arbitrary_length', at the end.
Of course, the 'arbitrary_length' can't be so great as to exhaust memory, or overflow whatever collection is holding these callbacks. There is no threading issue however.
I do think it’s worth noting for others that view this, you have a common mistake in your code. For the variable i you either need to use let or reassign to another variable before passing it into the async_task(). The current implementation will result in each function getting the last value of i.
From what I see, if an event in Node take a "long time" to be dispatched, Node creates some kind of "queue of events", and they are triggered as soon as possible, one by one.
How long can this queue be?
While this may seem like a simple question, it is actually a rather complex problem; unfortunately, there's no simple number that anyone can give you.
First: wall time doesn't really play a part in anything here. All events are dispatched in the same fashion, whether or not things are taking "a long time." In other words, all events pass through a "queue."
Second: there is no single queue. There are many places where different kinds of events can be dispatched into JS. (The following assumes you know what a tick is.)
There are the things you (or the libraries you use) pass to process.nextTick(). They are called at the end of the current tick until the nextTick queue is empty.
There are the things you (or the libraries you use) pass to setImmediate(). They are called at the start of the next tick. (This means that nextTick tasks can add things to the current tick indefinitely, preventing other operations from happening whereas setImmediate tasks can only add things to the queue for the next tick.)
I/O events are handled by libuv via epoll/kqueue/IOCP on Linux/Mac/Windows respectively. When the OS notifies libuv that I/O has happened, it in turn invokes the appropriate handler in JS. A given tick of the event loop may process zero or more I/O events; if a tick takes a long time, I/O events will queue in an operating system queue.
Signals sent by the OS.
Native code (C/C++) executed on a separate thread may invoke JS functions. This is usually accomplished through the libuv work queue.
Since there are many places where work may be queued, it is not easy to answer "how many items are currently queued", much less what the absolute limit of those queues are. Essentially, the hard limit for the size of your task queues is available RAM.
In practice, your app will:
Hit V8 heap constraints
For I/O, max out the number of allowable open file descriptors.
...well before the size of any queue becomes problematic.
If you're just interested in whether or not your app under heavy load, toobusy may be of interest -- it times each tick of the event loop to determine whether or not your app is spending an unusual amount of time processing each tick (which may indicate that your task queues are very large).
Handlers for a specific event are called synchronously (in the order they were added) as soon as the event is emitted, they are not delayed at all.
The total number of event handlers is limited only by v8 and/or the amount of available RAM.
I believe you're talking about operations that can take an undefined amount of time to complete, such as an http request or filesystem access.
Node gives you a method to complete these types of operations asynchronously, meaning that you can tell node, or a 3rd party library, to start an operation, and then call some code (a function that you define) to inform you when the operation is complete. This can be done through event listeners, or callback functions, both of which have their own limitations.
With event listeners the maximum amount of listeners you can have is dependent on the maximum array size of your environment. In the case of node.js the javascript engine is v8, but according to this post there is a maximum set out by the 5th ECMA standard of ~4billion elements, which is a limit that you shouldn't ever overcome.
With callbacks the limitation you have is the max call stack size, meaning how deep your functions can call each other. For instance you can have a callback calling a callback calling a callback calling another callback, etc etc. The call stack size dictates how may callbacks calling callbacks you can have. Note that the call stack size can be a limitation with event listeners as well as they're essentially callbacks that can be executed multiple times.
And these are the limitations with each.
Tx is committed when :
request success callback returns
- that means that multiple requests can be executed within transaction boundaries only when next request is executed from success callback of the previous one
when your task returns to event loop
It means that if no requests are submitted to it, it is not committed until it returns to event loop. These facts pose 2 problematic states :
placing a new IDB request by enqueuing a new task to event loop queue from within the success callback of previous request instead of submitting new request synchronously
in that case the first success callback immediately returns but another IDB request has been scheduled
are all the asynchronous requests executed within the single initial transaction? This is quite essential in case you want to implement result pulling with back-pressure where consumer gives you a feedback in form of a Future that it is ready to consume another response
creating a ReadWrite tx, not placing any requests against it and creating another one before returning to event loop
does creating a new one implicitly commits the previous tx ? If not, serious write lock starvations might occur, because :
If multiple "readwrite" transactions are attempting to access the same
object store (i.e. if they have overlapping scope), the transaction
that was created first must be the transaction which gets access to
the object store first. Due to the requirements in the previous
paragraph, this also means that it is the only transaction which has
access to the object store until the transaction is finished.
The example of enqueuing a new task to event loop queue from within the success callback by recursive request submission with back-pressure :
function recursiveFn(key) {
val req = store.get(key)
req.onsuccess = function() {
observer.onNext(req.result).onsuccess { recursiveFn(nextKey) }
}
}
Observer#onNext // returns Future[Ack] Ack is either Continue or Cancel
Now can onsuccess or onNext do a setTimeout(0) or not to make the whole thing be part of one transaction?
Bonus question :
I think that ReadOnly transactions are exposed to the consumer/user just because it would be hard to detect the end of a batch read if you recursively submit new requests from the success callback of the previous one right? Otherwise I don't see any other reason for them to be exposed to a user, right ?
I'm not sure I understand your question completely but I'll offer an answer on whether you can safely use IDB transaction events to move a state machine.
Yes and no. Yes in theory, no in practice.
I think you understand the transaction lifetime. But to rehash:
The lifetime of a transactions lasts as long as it's referenced: it's "active" so long as it's being referenced, after which it is said to be "finished" and the transaction is committed.
In theory, oncomplete should fire whenever a transaction successfully commits. There's a useful tip in the spec on this that suggests how you could loop:
To determine if a transaction has completed successfully, listen to the transaction’s complete event rather than the IDBObjectStore.add request’s success event, because the transaction may still fail after the success event fires.
To safely use this mechanism be sure to watch for non-success events including onblocked and onabort as well.
Practically speaking, I've found transactions to be flakey when long-lived or done consecutively in batches (as you've noted in another IDB comment). I'm generally not engineering my apps to require tricky behavior because, no matter how the spec says it should behavior, I'm seeing wonky transactions in both Firefox and Chromium (but mostly Blink, interestingly) especially when multiple tabs are open.
I spent many weeks rewriting dash to reuse transactions for supposed performance gains. In the end it could not pass even my basic write tests and I was forced to abandon simultaneous/queued/consecutive transactions and rewrite once again. This time I picked a one-transaction-at-a-time model which is slower but, for me, more reliable (and suggest to avoid my lib and use something like ydn for bulk inserts).
I'm not sure on your application requirements, but in my humble opinion tying in your I/O into your event loop seems like a disastrous idea. If I needed an event loop as what I understand to be the term I would definitely use requestAnimationFrame() and throttle that callback if I needed fewer ticks than one per ~33 milliseconds.
Let's say I do this:
var timer = setTimeout(function() {
console.log("will this happen?");
}, 5000);
And then after just less than 5 seconds, another callback (from a network event in NodeJS for example) fires and clears it:
clearTimeout(timer);
Is there any possibility that the callback from the setTimeout call is already in the queue to be executed at this point, and if so will the clearTimeout be in time to stop it?
To clarify, I am talking about a situation where the setTimeout time actually expires and the interpreter starts the process of executing it, but the other callback is currently running so the message is added to the queue. It seems like one of those race condition type things that would be easy to not account for.
Even though Node is single thread, the race condition the question describes is possible.
It can happen because timers are triggered by native code (in lib_uv).
On top of that, Node groups timers with the same timeout value. As a result, if you schedule two timers with the same timeout within the same ms, they will be added to the event queue at once.
But rest assured node internally solves that for you. Quoting code from node 0.12.0:
timer.js > clearTimeout
exports.clearTimeout = function(timer) {
if (timer && (timer[kOnTimeout] || timer._onTimeout)) {
timer[kOnTimeout] = timer._onTimeout = null;
// ...
}
}
On clearing a timeout, Node internally removes the reference to the callback function. So even if the race condition happens, it can do no harm, because those timers will be skipped:
listOnTimeout
if (!first._onTimeout) continue;
Node.js executes in a single thread.
So there cannot be any race conditions and you can reliably cancel the timeout before it triggers.
See also a related discussion (in browsers).
I am talking about a situation where the setTimeout time actually expires and the interpreter starts the process of executing it
Without having looked at Node.js internals, I don't think this is possible. Everything is single-threaded, so the interpreter cannot be "in the process" of doing anything while your code is running.
Your code has to return control before the timeout can be triggered. If you put an infinite loop in your code, the whole system hangs. This is all "cooperative multitasking".
This behavior is defined in the HTML Standard, the fired task starts with:
If the entry for handle in the list of active timers has been cleared, then abort these steps.
Therefore even if the task has been queued already, it'll be aborted.
Whether this applies to Node.js, however, is debatable, as the documentation just states:
The timer functions within Node.js implement a similar API as the timers API provided by Web Browsers but use a different internal implementation that is built around the Node.js Event Loop.
I call a function that stacks two async calls and calls a callback when they have both completed. I am using a really simple method to keep track of the calls that have not completed lock++lock--The problem is that the program exits before the operation of the two functions is complete. I noticed this was the problem when I was debugging and gave the process time to complete before it exits. How can I fix this? (At the moment I am at a bit of a loss on how to exactly explain my problem please ask me anything you need to so I can clarify the question)
-----EDIT
With the script below why when I run it does it just exit? I thought that by calling on I was registering to the event que and the script should continue to run?
var events = require('events');
var eventEmitter = new events.EventEmitter();
eventEmitter.on('spo',function(){
console.log('spo');
});
The problem is that you setup an event emitter and attach a handler to the "spo" event and then you do nothing else. So the node runtime sees that there is nothing left to do and exits. Try this:
eventEmitter.on('spo', function() {
console.log('OK: got event "spo"');
});
eventEmitter.emit('spo');
COMING From a Git Hub thread.
What keeps the event loop alive is handles (sockets, timers, etc.) of which your script has none.
EventEmitter instances are synchronous - that is, they run immediately
- so in your example, once the event has fired, the script is done.
Think of it like this: an EventEmitter in itself isn't useful, it only
becomes useful when it's tied to something that emits interesting
events (data from the network or the file system, a timer that
expires, etc.).
I think that what they are saying is that it is the handle outside of Node into C land that holds the script open.