When is it appropriate to use a setTimeout vs a Cron? - javascript

I am building a Meteor application that is using a mongo database.
I have a collection that could potentially have 1000s of documents that need to be updated at different times.
Do I set a timeout for each document on creation, or run a cron job every second that loops over every document?
What are the pros and cons of doing each?
To put this into context:
I am building an online tournament system. I can have 100s of tournaments running which means I could have 1000s of matches.
Each match absolutely needs to end at a specific time, and can end earlier if a certain condition is met.

Using an OS-level cron job won't work because you can only check with a 60-second resolution. So by "cron job", I think you mean a single setTimeout (or synced-cron). Here are some thoughts:
Single setTimeout
strategy: Every second wake up and check a large number of matches, updating those which are complete. If you have multiple servers, you can prevent all but one of them from doing the check via synced-cron.
The advantage of this strategy is that it's straightforward to implement. The disadvantages are:
You may end up doing a lot of unnecessary database reads.
You have to be extremely careful that your processing time does not exceed the length of the period between checks (one second).
I'd recommend this strategy if you are confident that the runtime can be controlled. For example, if you can index your matches on an endTime so only a few matches need to be checked in each cycle.
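To illustrate, here's a minimal sketch of that check, assuming a hypothetical Matches collection with an index on status and endTime, and the percolate:synced-cron package (all names are illustrative):

SyncedCron.add({
  name: 'Close expired matches',
  schedule: function (parser) {
    return parser.text('every 1 second'); // later.js text schedule
  },
  job: function () {
    // The index keeps this read cheap: only active matches whose
    // end time has passed are touched.
    Matches.update(
      { status: 'active', endTime: { $lte: new Date() } },
      { $set: { status: 'finished' } },
      { multi: true }
    );
  }
});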
Multiple setTimeouts
strategy: Add a setTimeout for each match on creation or when the server starts. As each timeout expires, update the corresponding match.
The advantage of this strategy is that it potentially removes a considerable amount of unnecessary database traffic. The disadvantages are:
It may be a little trickier to implement, e.g. you have to consider what happens on a server restart.
The naive implementation doesn't scale past a single server (see 1).
I'd recommend this strategy if you think you will use a single server for the foreseeable future.
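A minimal sketch of this strategy, assuming the same hypothetical Matches collection; the restart concern is handled by rebuilding the timeouts on startup:

function scheduleMatchEnd(match) {
  var delay = Math.max(0, match.endTime.getTime() - Date.now());
  Meteor.setTimeout(function () {
    // Guard on status: the match may already have ended early.
    Matches.update(
      { _id: match._id, status: 'active' },
      { $set: { status: 'finished' } }
    );
  }, delay);
}

Meteor.startup(function () {
  // Timeouts die with the process, so rebuild them for every match
  // that is still active when the server comes back up.
  Matches.find({ status: 'active' }).forEach(scheduleMatchEnd);
});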
Those are the trade-offs which occurred to me given the choices you presented. A more robust solution would probably involve technology outside of the meteor/mongo stack. For example, storing match times in redis and then listening for keyspace notifications.
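A sketch of that redis idea, assuming the classic node_redis callback API and a redis server configured with notify-keyspace-events Ex so that key-expiry events are published (note that the event fires when redis notices the expiry, so timing is close but not exact):

var redis = require('redis');

var client = redis.createClient();     // for writes
var subscriber = redis.createClient(); // a subscribing client can do nothing else

function scheduleMatch(matchId, secondsUntilEnd) {
  // The value is irrelevant; only the key's expiry matters.
  client.set('match:' + matchId, '1', 'EX', secondsUntilEnd);
}

subscriber.subscribe('__keyevent@0__:expired'); // expiry events for db 0
subscriber.on('message', function (channel, key) {
  if (key.indexOf('match:') === 0) {
    var matchId = key.slice('match:'.length);
    // ... mark the match as finished in mongo ...
  }
});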

This is all a matter of preference, to be honest with you.
I'm a big fan of writing small, independent programs that each do one thing and do it well. If you're also like this, it's probably better to write separate programs to run periodically via cron.
This way you get guaranteed OS-controlled precision for the time, and small, simple programs that are easy to debug outside the context of your webapp.
This is just a preference though.

Related

Memory considerations or risks to a very long (once per day) debounced function in JavaScript

I'm wondering if there are any possible risks or memory considerations to using a debounce (in our case, Lodash's implementation: https://github.com/lodash/lodash/blob/ddfd9b11a0126db2302cb70ec9973b66baec0975/lodash.js#L10304) with very large intervals, i.e. 24 hours. Specifically my use case is an event-based call to the backend to send a reminder email, and we don't want to spam customers with more than 1 reminder per day.
This question Building a promise chain recursively in javascript - memory considerations has a similar discussion, but about promises, and many of them at that (1000s or more). This would be a bit different, as it is a single debounced function, and I believe the Lodash implementation of debounce handles and manages the setTimeout and timeout ids properly anyway, so maybe I'm over-engineering / overthinking this and we should just ship it and see :)
To sum up, I understand this functionality should be delegated to some sort of backend queue at some point, but is it something we can "get away with" for now in the client?
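For reference, the setup being asked about is roughly this (a sketch; sendReminderEmail is a hypothetical backend call):

var debounce = require('lodash/debounce');

var ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Lodash keeps a single timer and timeout id internally, so the long
// wait costs no extra memory; leading/trailing control when it fires.
var remindAtMostDaily = debounce(sendReminderEmail, ONE_DAY_MS, {
  leading: true,  // send on the first event...
  trailing: false // ...and swallow the rest of the burst
});

Two behavioural caveats deserve more attention than the memory: debounce resets its timer on every call, so a steady stream of events can keep postponing a trailing invocation (lodash's throttle is the closer match for "at most once per day regardless of event frequency"), and the timer only lives as long as the page, so a reload or closed tab silently drops any pending call. Both points argue for the backend queue eventually, as the question anticipates.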

Why is Non blocking asynchronous single-threaded faster for IO than blocking multi-threaded for some applications

It helps me understand things by using real world comparison, in this case fastfood.
In Java, for synchronous blocking, I understand that each request processed by a thread can only be completed one at a time. Like ordering through a drive-through: if I'm tenth in line, I have to wait for the 9 cars ahead of me. But I can open up more threads so that multiple orders are completed simultaneously.
In JavaScript you can have asynchronous non-blocking but single-threaded. As I understand it, multiple requests are made, those requests are immediately accepted, and each request is processed by some background process at some later time before returning. I don't understand how this would be faster. If you order 10 burgers at the same time, the 10 requests are put in immediately, but since there is only one cook (single thread) it still takes the same time to create the 10 burgers.
I mean, I understand the reasoning for why non-blocking async single-threaded "should" be faster for some things, but the more questions I ask myself, the less I understand it.
I really don't understand how non-blocking async single-threaded can be faster than sync blocking multi-threaded for any type of application, including IO.
Non-blocking async single threaded is sometimes faster
That's unlikely. Where are you getting this from?
In multi-threaded synchronous I/O, this is roughly how it works:
The OS and appserver platform (e.g. a JVM) work together to create 10 threads. These are data structures represented in memory, and a scheduler running at the kernel/OS level will use these data structures to tell one of your CPU cores to 'jump to' some point in the code to run the commands it finds there.
The datastructure that represents a thread contains more or less the following items:
What is the location in memory of the instruction we were running
The entire 'stack'. If some function invokes a second function, then we need to remember all local variables and the point we were at in that original method, so that when the second method 'returns', it knows how to do that. e.g. your average java program is probably ~20 methods deep, so that's 20x the local vars, 20 places in code to track. This is all done on stacks. Each thread has one. They tend to be fixed size for the entire app.
What cache page(s) were spun up in the local cache of the core running this code?
The code in the thread is written as follows: all commands that interact with 'resources' (which are orders of magnitude slower than your CPU; think network packets, disk access, etc.) return the requested data immediately, but only if everything you asked for is already available in memory. If that is impossible, because the data you wanted just isn't there yet (let's say the packet carrying the data you want is still on the wire, heading to your network card), there's only one thing the code that powers the 'get me network data' function can do: wait until that packet arrives and makes its way into memory.
So as not to just sit there doing nothing, the OS/CPU will work together to take that datastructure that represents the thread, freeze it, find another such frozen datastructure, unfreeze it, and jump to the 'where did we leave things' point in the code.
That's a 'thread switch': Core A was running thread 1. Now core A is running thread 2.
The thread switch involves moving a bunch of memory around: All those 'live' cached pages, and that stack, need to be near that core for the CPU to do the job, so that's a CPU loading in a bunch of pages from main memory, which does take some time. Not a lot (nanoseconds), but not zero either. Modern CPUs can only operate on the data loaded in a nearby cachepage (which are ~64k to 1MB in size, no more than that, a thousand+ times less than what your RAM sticks can store).
In single-threaded asynchronous I/O, this is roughly how it works:
There's still a thread of course (all things run in one), but this time the app in question doesn't multithread at all. Instead, it, itself, creates the data structures required to track multiple incoming connections, and, crucially, the primitives used to ask for data work differently. Remember that in the synchronous case, if the code asks for the next bunch of bytes from the network connection then the thread will end up 'freezing' (telling the kernel to find some other work to do) until the data is there. In asynchronous modes, instead the data is returned if available, but if not available, the function 'give me some data!' still returns, but it just says: Sorry bud. I have 0 new bytes for you.
The app itself will then decide to go work on some other connection, and in that way, a single thread can manage a bunch of connections: Is there data for connection #1? Yes, great, I shall process this. No? Oh, okay. Is there data for connection #2? and so on and so forth.
Note that, if data arrives on, say, connection #5, then this one thread, to do the job of handling this incoming data, will presumably need to load, from memory, a bunch of state info, and may need to write it.
For example, let's say you are processing an image, and part of the PNG data arrives on the wire. There's not a lot you can do with it, so this one thread will create a buffer and store that partial PNG inside it. As it then hops to another connection, it needs to load the ~15% of the image it already got, and add onto that buffer the 10% of the image that just arrived in a network packet.
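In Node.js terms, that per-connection buffering looks roughly like this sketch, using the standard net module:

var net = require('net');

var server = net.createServer(function (socket) {
  var chunks = []; // the state this one thread keeps for this connection

  socket.on('data', function (chunk) {
    // A partial upload arrived; stash it and return to the event loop
    // so the other connections can be serviced in the meantime.
    chunks.push(chunk);
  });

  socket.on('end', function () {
    var image = Buffer.concat(chunks); // the full PNG, reassembled
    // ... process the image ...
  });
});

server.listen(8080);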
This app is also causing a bunch of memory to be moved around into and out of cache pages just the same, so in that sense it's not all that different, and if you want to handle 100k things at once, you're inevitably going to end up having to move stuff into and out of cache pages.
So what is the difference? Can you put it in fry cook terms?
Not really, no. It's all just data structures.
The key difference is in what gets moved into and out of those cache pages.
In the case of async it is exactly what the code you wrote wants to buffer. No more, no less.
In the case of synchronous, it's that 'datastructure representing a thread'.
Take java, for example: That means at the very least the entire stack for that thread. That's, depending on the -Xss parameter, about 128k worth of data. So, if you have 100k connections to be handled simultaneously, that's 12.8GB of RAM just for those stacks!
If those incoming images really are all only about 4k in size, you could have done it with 4k buffers, for only 0.4GB of memory needed at most, if you handrolled that by going async.
That is where the gain lies for async: by handrolling your buffers, you can't avoid moving memory into and out of cache pages, but you can ensure the chunks are smaller, and that will be faster.
Of course, to really make it faster, the buffer for storing state in the async model needs to be small (not much point to this if you need to save 128k into memory before you can operate on it, that's how large those stacks were already), and you need to handle so many things at once (10k+ simultaneous).
There's a reason we don't write all code in assembler or why memory managed languages are popular: Handrolling such concerns is tedious and error-prone. You shouldn't do it unless the benefits are clear.
That's why synchronous is usually the better option, and in practice, often actually faster (those OS thread schedulers are written by expert coders and tweaked extremely well. You don't stand a chance to replicate their work) - that whole 'by handrolling my buffers I can reduce the # of bytes that need to be moved around a ton!' thing needs to outweigh the losses.
In addition, async is complicated as a programming model.
In async mode, you can never block. Wanna do a quick DB query? That could block, so you can't do that, you have to write your code as: Okay, fire off this job, and here's some code to run when it gets back. You can't 'wait for an answer', because in async land, waiting is not allowed.
In async mode, anytime you ask for data, you need to be capable of dealing with getting half of what you wanted. In synchronous mode, if you ask for 4k, you get 4k. The fact that your thread may freeze during this task until the 4k is available is not something you need to worry about; you write your code as if the data simply arrives, complete, as you ask for it.
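The 'never block' rule looks like this in callback style (a sketch with a hypothetical db client; none of these names are a real driver API):

db.query('SELECT * FROM orders WHERE id = ?', [orderId], function (err, rows) {
  if (err) return handleError(err);
  render(rows); // runs later, whenever the result arrives
});
// Execution continues here immediately, long before the query completes.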
Bbbuutt... fry cooks!
Look, CPU design just isn't simple enough to put in terms of a restaurant like this.
You are mentally moving the bottleneck from your process (the burger orderer) to the other process (the burger maker).
This will not make your application faster.
When considering the single-threaded async model, the real benefit is that your process is not blocked while waiting for the other process.
In other words, do not associate async with the word fast but with the word free. Free to do other work.

Why observing oplog takes so much time in meteor / mongo?

I have a MongoLab cluster, which allows me to use Oplog tailing in order to improve performance, availability and redundancy in my Meteor.js app.
Problem is: since I've been using it, all my publications take more time to finish. When it only takes around 200ms, that's not a problem, but it often takes much more, like here, where I'm subscribing to the publication I described here.
This publication already has too long a response time, and oplog observations slow it down further, though it's far from the only publication where observing the oplog takes that much time.
Could anyone explain to me what's happening? Nowhere I've searched on the web have I found any explanation of why observing the oplog slows my publications that much.
Here are some screenshots from Kadira to illustrate what I'm saying :
Here is a screenshot from another pub/sub :
And finally, one where observing oplogs take a reasonable time (but still slow my pub/sub a bit) :
Oplog tailing is very fast. Oplog tailing isn't the issue here.
You're probably doing a lot of things that you don't realize make publications slow:
One-by-one find-and-update loops: You're probably doing a document update inside the body of a Collection.forEach call. This is incredibly slow, and the origin of your poor performance in method bodies. Every time you do a single document update that's listened to by hundreds of concurrent connections, each of those needs to be updated; by doing a query followed by an update one document at a time, neither Mongo nor Meteor can optimize, and they must wait for every single user to be updated on every change. The cost compounds along two axes at once: documents * observers. Solution: Think about how to do the update using {multi: true} (see the first sketch after this list).
Unique queries for every user: If you make a single change to a user document that has, say, 100 concurrent unique subscriptions connected to it, the connections will be notified serially. That means that if the first connection is notified after 90ms, the last connection will be notified roughly 90ms * 100 users = 9 seconds later. That's the other reason your observeChanges are slow. Solution: Think about whether you really need a unique subscription on each user's document. Meteor has optimizations for identical subscriptions shared between multiple concurrent connections.
Lots of documents: You're probably encoding each thread comment, post, chat message, etc. as its own document. Each document needs to be sent individually to each client, introducing overhead. This is the right schema for a relational database, and the wrong one for a document-based database. Solution: Try to hold everything you need to render a page for a user in a single document (de-normalization); see the second sketch after this list. With regards to chat, you should have a single "conversation" document that contains all the messages between two or more users.
Database isn't co-located with your host: If you're using MongoLab, your database may not be in the same datacenter as your web host (which I assume is Galaxy or Modulus). Inter-datacenter latencies can be very, very high, and this is probably the explanation for your poor collection reads. Indices, as other commenters have noted, might play a role, but my bet is that you have fewer than a hundred records in any of these collections, so they won't really matter.
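First sketch: replacing the per-document update loop with a single multi-document update (Posts is a hypothetical collection):

// Slow: one observed write per document, multiplied by every
// concurrent observer.
Posts.find({ flagged: true }).forEach(function (post) {
  Posts.update(post._id, { $set: { hidden: true } });
});

// Fast: one write that Mongo and Meteor can process in a single pass.
Posts.update(
  { flagged: true },
  { $set: { hidden: true } },
  { multi: true }
);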
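Second sketch: a de-normalized "conversation" document with the messages embedded, rather than one document per message (all names illustrative):

var conversation = {
  _id: 'conv42',
  participants: ['alice', 'bob'],
  messages: [
    { from: 'alice', text: 'hi there', at: new Date('2016-03-01T10:00:00Z') },
    { from: 'bob', text: 'hey!', at: new Date('2016-03-01T10:00:05Z') }
  ]
};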

How to call a mongodb function using monk/node.js?

Trying to implement this:
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/#auto-increment-counters-collection
to provide sequence counters for a few things.
I've got the function stored in my db, but I can't call it without an error:
var model_id = db.eval('getNextSequence("model")');
Returns:
Object #<Object> has no method 'getNextSequence'
Is this because monk doesn't support the use of db functions via eval?
The getNextSequence method is part of your application code and runs in your application, not in the database. In the example from the mongodb docs, that application is the mongo shell, which is basically a simple JavaScript wrapper where you can easily declare your own methods.
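What you can do instead is implement the counter in your application code. A sketch, assuming monk 2.x (promise-based; the exact option name for "return the updated document" varies by driver version):

var db = require('monk')('localhost/mydb');
var counters = db.get('counters');

function getNextSequence(name) {
  // Atomically increment the counter and resolve with the new value.
  return counters.findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnOriginal: false }
  ).then(function (doc) {
    return doc.seq;
  });
}

getNextSequence('model').then(function (modelId) {
  // use modelId here
});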
In any case, implementing reliable, gap-free counters isn't trivial and should be avoided unless absolutely required:
incrementing a counter before insert will lead to gaps if the subsequent insert fails because of client crash, network partition, etc.
incrementing the counter after insert requires an optimistic insert loop (but doesn't require an explicit counter, as demonstrated in the link). This is more reliable, but gets inefficient with concurrent writers because it performs lots of queries and failed updates.
In a nutshell, such counters are OK if you're using them e.g. within a single account / tenant where the users of an account are humans. If your accounts are huge or if you have API clients, things get messy because they might burst and do 10,000 inserts in a few seconds which leads to a whole lot of conflicts.
Never use increment keys as primary keys.

Two JavaScript timers vs one timer, for performance, is it worth dropping one?

In a web app which I'm building, I have two loosely related bits of code running in two separate timers every one second.
I'm looking to optimize the JavaScript: is it worth merging these two timers into one, or is that just over the top?
Realistically, am I going to increase performance (considering that we don't know what sort of system a visitor is running) by merging two 1-second intervals into one 1-second interval?
As I understand it, JavaScript is single threaded so the more things happening, the more these stack up and block other things from happening (timers especially). I just don't know whether one measly timer running every second is an issue at all.
The reason for keeping the two timers separate would purely be code readability, which is fine on the server side where you control the hardware but I don't know what sort of browser or hardware my visitors will be running.
Thanks.
In terms of the overall number of operations that can be completed, no, there isn't going to be a measurable difference. It is possible for there to be a perceived performance advantage in keeping multiple timers, however. The more code you have running synchronously in a single timer iteration, the longer all DOM updates and certain types of user interactions are "halted". By splitting these up into multiple timers, you allow other updates to take place in between timer iterations, and therefore the user gets a "smoother" experience.
Odds are that in this case there won't be a difference in perceived performance either, though, so I'd do it whichever way makes the code organization simpler.
If performance really is an issue, you could just create one timer and, for example, use it to call both functions:
function update()
{
  A(); // do your first task
  B(); // do the second
  setTimeout(update, 1000); // pass the function itself, not a string to eval
}
update();
However, how sure are you that the bottleneck is within this timer? Try to measure first, and don't optimise the wrong parts of your application.
I would bet that you'd increase performance by eliminating clock handling at the JS level. You certainly won't degrade performance and, with just one timer running, I'd think that you'd enhance code maintainability, if not readability. In the app I'm working on right now, I have one timer running to handle three tasks: a special kind of scrolling, changing the background image of perhaps 300 cells, and checking to see if it's time to refresh the page and issuing an AJAX request if so. That timer is running with a 1/10-sec interval and things are tight, for sure, but the code gets through all of those jobs, once in a while with one clock tick coming on top of the previous.
So I doubt you'll have any trouble with a 1-sec interval and just one tick handler.
