Node.js single-thread mechanism - javascript

I learned that Node.js is single-threaded and non-blocking. I saw a nice explanation here: How, in general, does Node.js handle 10,000 concurrent requests?
But the first answer says:
The seemingly mysterious thing is how both the approaches above manage to run workload in "parallel"? The answer is that the database is threaded. So our single-threaded app is actually leveraging the multi-threaded behaviour of another process: the database.
(1) This confuses me. Take a simple Express application as an example:
var express = require('express');
var monk = require('monk');

var router = express.Router();
var db = monk('localhost:27017/databaseName');

router.get('/QueryVideo', function(req, res) {
  var collection = db.get('videos');
  collection.find({}, function(err, videos) {
    if (err) throw err;
    res.render('index', { videos: videos });
  });
});
When my router responds to multiple requests by running this simple MongoDB query, are those queries handled by different threads? I know there is only one thread in Node routing client requests, though.
(2) My second question is: how does such a single-threaded Node application ensure security? I don't know much about security, but it looks like multiple requests should be isolated (at least given separate memory space?). Even multi-threaded applications can't fully ensure this, because threads still share many things. This may not be a well-formed question, but in today's cloud services isolation seems to be an important topic, and I am lost in topics such as serverless, Wasm, and Wasm-based serverless using the Node.js environment.
Thank you for your help!!

Since you asked about my answer, I guess I can help clarify.
1
For the specific case of handling multiple parallel client requests which triggers multiple parallel MongoDB queries you asked:
Are those queries handled by different threads?
On Node.js, since MongoDB connects via the network stack (TCP/IP), all parallel requests are handled in a single thread. The magic is a system API that allows your program to wait in parallel. Node.js uses libuv to select which API to use for each OS at compile time, but the specific API does not matter: it is enough to know that all modern OSes have APIs that let you wait on multiple sockets in parallel (instead of the usual waiting on a single socket in multiple threads/processes). These APIs are collectively called asynchronous I/O APIs.
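A minimal sketch of that parallel waiting (the ports are hypothetical): two TCP connections are in flight at once, and libuv watches both sockets with a single OS-level wait (epoll, kqueue, or IOCP depending on the platform):

const net = require('net');

function firstChunk(port) {
  return new Promise(function (resolve, reject) {
    const socket = net.connect(port, 'localhost');
    socket.once('data', function (chunk) { socket.end(); resolve(chunk.toString()); });
    socket.once('error', reject);
  });
}

// Neither wait parks the thread; both sockets are watched in parallel.
Promise.all([firstChunk(27017), firstChunk(6379)]).then(console.log, console.error);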
On MongoDB... I don't know much about MongoDB. Mongo may be implemented with multiple threads, or it may be single-threaded like Node.js. Disk I/O is itself handled in parallel by the OS without using threads, via I/O channels (e.g., PCI lanes) and DMA channels. Basically, both threads/processes and asynchronous I/O are generally implemented by the OS (at least on Linux and Mac) on top of the same underlying mechanism: OS events, which are just functions that handle interrupts. Anyway, this is straying quite far from the discussion about databases...
I know that MySQL is multithreaded, and PostgreSQL uses a process per connection to the same effect: query processing in SQL is basically a set of operations that loop through rows and filter the result, which requires both I/O and CPU; that is why these servers parallelize it.
If you are still curious how computers can do things (like wait for I/O) without the CPU executing a single instruction you can check out my answers to the following related questions:
Is there any other way to implement a "listening" function without an infinite while loop?
What is the mechanism that allows the scheduler to switch which threads are executing?
2
Security is ensured by the language being interpreted and by making sure the interpreter does not have any stack overflow or underflow bugs. For the most part this is true of all modern JavaScript engines. The main mechanism for injecting and executing foreign code via program input is a buffer overflow or underflow; being able to execute foreign code is what then lets an attacker access memory. If you cannot execute foreign code, being able to access memory is largely moot.
There is a second mechanism for injecting foreign code, prevalent in some programming-language cultures: code eval (I'm looking at you, PHP!). Using string substitution to construct database queries in any language opens you up to a code-eval attack (more commonly called SQL injection), regardless of your program's memory model. JavaScript itself has an eval() function; to protect against this, JavaScript programmers simply consider eval evil. Basically, protection against eval comes down to good programming practices, and Node.js being open source allows anyone to look at the code and report any cases where a code-evaluation attack is possible. Historically Node.js has been quite good in this regard, so your main guarantee about security from code eval is Node's reputation.
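A minimal sketch of the difference (db.query here is a generic placeholder, not a specific driver's API):

var userId = req.query.id; // attacker-controlled input, e.g. "1 OR 1=1"

// Vulnerable: string substitution lets input become query syntax.
db.query("SELECT * FROM videos WHERE owner = " + userId);

// Safer: a parameterized query keeps data separate from syntax.
db.query("SELECT * FROM videos WHERE owner = ?", [userId]);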

(1) The big picture goes like this: in Node.js there are two kinds of thread, the event thread (single) and the workers (a pool). As long as you don't block the event loop, Node.js hands a blocking I/O call to a worker thread and moves on to service the next request; the worker then places the completed I/O back on the event loop for the next course of action.
In short, the main thread's rule is: "Do something else when you need to wait, come back and continue when the wait is over, and do this one thing at a time."
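A minimal sketch of that behaviour, using a file read (which libuv does hand to the worker pool):

const fs = require('fs');

fs.readFile('big.log', function (err, data) {
  // Runs later, back on the event loop, once a worker finishes the read.
  console.log('I/O finished');
});
console.log('event thread moved on already'); // prints first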
This reactive mechanism has nothing to do with threads running in another process (i.e., the database). The database may deploy an entirely different thread-management scheme.
(2) The 'memory space' in your question lives within a single process's address space. A thread belongs to a process (e.g., Express app A) and never runs in another process's (e.g., Fastify app B) space.

Related

Why does async/await perform better than threads if it is just a wrapper around them?

This topic has been on my mind for a long time.
Let's assume we have a typical web server, one in Node.js and the other in Java (or any other language with threads).
Why would Node perform better (handle more I/O- or network-based requests per second) than a Java server just because it uses async/await? Isn't it just syntactic sugar that utilizes the same threads Java/C#/C++ use behind the scenes?
There is no reason to expect Node to be faster than a server written in Java. Why do you think it might be?
It seems the other answers here (so far) are explaining the benefits of asynchronous programming in JS compared to single-threaded synchronous operations -- that's obvious, but not the question.
The key point everyone agrees on is that certain operations are inherently slow (e.g.: waiting for network requests, waiting for disk/database access), and it's efficient to let the CPU do something else while such operations are in flight. Using several threads in your application is one well-established way to do that; but of course that's only possible in languages that give you threads. Many traditional server implementations (in Java, C, C++, ...) use one thread per request (or, equivalently, a thread pool to distribute incoming requests over). These threads can block waiting for, say, the database -- that's okay, the operating system will put the waiting thread to sleep and let the CPU work on another thread (handling another request) in the meantime. The end result is fairly similar to what you get with Node.
JavaScript, of course, doesn't make threads available to the programmer. But instead, it has this concept of scheduling requests with the JavaScript engine and providing a callback to be invoked upon completion of the request. That's how the overall system behaves similarly to a traditional threaded programming language: user code can say "do this stuff, then schedule a database access, and when the result is available, continue with this [callback] code here", and while waiting for the database request, the CPU gets to execute some other code. What you want to avoid is the CPU sitting around idly waiting while there is other work waiting for the CPU to have time for it, and both approaches (Java threads and JavaScript callbacks) accomplish that.
Finally, async/await (just like Promises) is indeed just syntactic sugar that makes it easier to write callback-based code. Code using async/await isn't any faster than old-style code using callbacks directly, just prettier and less error-prone. It also isn't any faster than a (well-written) Java-based server.
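For example, a minimal sketch (db.query and db.queryAsync are hypothetical stand-ins for any callback-based and promise-based driver): the two functions below schedule exactly the same work, only the notation differs:

function getUser(id, callback) {
  db.query('SELECT * FROM users WHERE id = ?', [id], function (err, rows) {
    if (err) return callback(err);
    callback(null, rows[0]);
  });
}

async function getUserAwait(id) {
  const rows = await db.queryAsync('SELECT * FROM users WHERE id = ?', [id]);
  return rows[0]; // no extra speed, just flatter code
}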
Node.js is popular because it's convenient to use the same language for the client and server parts of an app. From a performance point of view, it's not better than traditional alternatives, it's just also not worse (or at least not much; in practice how efficiently you design your app matters more than whether you implement it in Java or JavaScript). Don't fall for the hype :-)
Asynchrony (async/await) is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource is sometimes slow or delayed. If such an activity blocks within a synchronous process, the entire application must wait.
In an asynchronous process (thread), the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
Though you should understand the differences between threads and async/await programming: https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/hh191443(v=vs.110)?redirectedfrom=MSDN#threads
In regards to asynchronous programming, usability is the key. For instance, one request can be split up into smaller tasks (fetching intermediate results, reading, writing, establishing connections, etc.), so half the time would otherwise be wasted waiting on dependent tasks. Asynchronous models use this time to handle other incoming requests: they register a callback function in a queue, save the state, and become available for the next request. Thus, they can handle more requests.
Understand more on handling requests: https://www.youtube.com/watch?v=cCOL7MC4Pl0

Redis caching in nodejs

So I was looking at this module, and I cannot understand why it uses callbacks.
I thought that memory caching is supposed to be fast, and that is also the reason someone would use caching in the first place: because it's fast... like instant.
Adding callbacks implies that you need to wait for something.
But how long do you actually need to wait? If the result comes back very fast, aren't you slowing things down by wrapping everything in callbacks, plus promises on top (because as a user of this module you are forced to promisify those callbacks)?
By design, JavaScript is asynchronous for most of its external calls (HTTP, third-party libraries, ...).
As mentioned here:
JavaScript is a single-threaded language. This means it has one call stack and one memory heap. As expected, it executes code in order and must finish executing a piece of code before moving on to the next. It's synchronous, but at times that can be harmful. For example, if a function takes a while to execute or has to wait on something, it freezes everything up in the meanwhile.
A synchronous function blocks the thread and the execution of the script. To avoid any blocking (due to networking, file access, etc.), it is recommended to fetch this information asynchronously.
Most of the time, the Redis call will take a few milliseconds. However, the asynchronous API guards against possible network lag and will keep your application up and running, with a tiny number of connectivity errors for your customers.
TLDR: reading from memory is fast, reading from network is slower and shouldn't block the process
You're right. Reading from a memory cache is very fast; it's as fast as accessing any variable (micro- or nanoseconds), so there would be no good reason to implement it as a callback/promise, which would be significantly slower. But this is only true if you're reading from the Node.js process's own memory.
The problem with using Redis from Node is that the memory cache is stored on another machine (the Redis server), or at least in another process. So even if Redis reads the data very quickly, the result still has to travel over the network back to your Node server, and that isn't always guaranteed to be fast (usually a few milliseconds at least). For example, if your Redis server is not physically close to your Node.js server, or you have too many network requests, the request can take longer to reach Redis and return to your server. Now imagine this were blocking by default: it would prevent your server from doing anything else until the request completed, resulting in very poor performance as your server sits idle waiting for the network round trip. That's the reason why any I/O operation (disk, network, ...) in Node.js should be async by default.
Alex, you remarked: "I thought that memory caching is supposed to be fast and that is also the purpose someone would use caching, because it's fast... like instant." And you're very nearly right.
Now, what does Redis actually mean?
It means REmote DIctionary Server.
~ FAQ - Redis
Yes, a dictionary usually performs in O(1) time. However, note that this performance is only observed from within the process that holds the dictionary. Access to memory owned by the Redis process from another process goes through a channel of operations that is not O(1).
So, because Redis is a REmote DIctionary Server, asynchronous APIs are required to access its service.
As has already been answered here, your Redis instance could be on your machine (and accessing Redis's RAM storage is nearly as fast as accessing a regular JavaScript variable), but it could also be on another machine/cloud/cluster/you name it. In that case, network latency could be problematic; that's why the promise/callback syntax exists.
If you are 100% confident that your Redis instance will always live on the same machine as your code, and you accept that each call briefly blocks your logic, you could use the ES2017 await syntax to write the calls in a synchronous-looking style and avoid explicit callbacks or promise chains :)
But I'm not sure it is worth it, in terms of coding habits and scalability. Then again, every project is different, and that might suit yours.
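For illustration, a minimal sketch with await, assuming the node-redis v4+ promise-based client:

const { createClient } = require('redis');

async function getCached(key) {
  const client = createClient(); // defaults to localhost:6379
  await client.connect();
  const value = await client.get(key); // synchronous *in style*, still non-blocking
  await client.quit();
  return value;
}

getCached('videos:latest').then(console.log).catch(console.error);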

NodeJS: understanding nonblocking / event queue / single thread

I'm new to Node and am trying to understand its non-blocking nature.
In the image below I've created a high level diagram of the request.
As I understand, all processes from a single user for a single app run on a single thread.
What I would like to understand is how the logic of the event loop fits into this diagram. Is the event loop the same as the processor pipeline where instructions are queued?
Imagine that we load an app page into RAM and it creates a stream for the program to read from:
readstream.on('data', function(data) {});
The instructions create the readstream and wait for data to arrive: does this instruction "hang" in a register in the processor (waiting for the I/O to finish), whereas in a multithreaded environment the processor just doesn't take new instructions from RAM until the result of the previous I/O request has been returned to RAM?
Or is the way I see this entirely/partially wrong?
Just a supplementary (related, perhaps naive) question: do different users run on different threads on the server, and isn't the single-threaded benefit only for a single user?
I'm new to this type of detail, so excuse me if this question doesn't entirely make sense to you. But understanding this seems essential for me before moving forward.
Event-driven non-blocking I/O relies on the fact that modern operating systems have a 'select' method that performs polling at the O/S level (not wasting CPU cycles). The select method allows you to register callbacks for certain I/O events. This tends to be much more efficient than the 'thread-per-connection' model commonly used in thread enabled languages. For more info, do a 'man select' on a Unix/Linux OS.
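A minimal sketch of what that buys you: one Node process, one thread, arbitrarily many sockets, with the OS reporting which socket is ready:

const net = require('net');

const server = net.createServer(function (socket) {
  // Fires whenever the OS reports data on any connection; no thread
  // is parked waiting on any individual socket.
  socket.on('data', function (chunk) { socket.write(chunk); }); // echo
});

server.listen(8124); // thousands of idle connections cost almost nothing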
Threads and I/O have to do with operating system implementation and services, not CPU architecture.
Operations that involve input/output devices of any kind — mass storage, networks, serial ports, etc. — are structured as requests from the CPU to an external device that, by one of several possible mechanisms, are later satisfied.
On top of that reality, operating systems provide alternative programming models. In one model, the true nature of input/output operations is essentially disguised, so that executing programs are given an API that appears synchronous. In a C program, a call to the write() system call will cause the entire process to pause until the operation has completed.
Another programming model more closely exposes the asynchronous reality of the system. That's what Node uses. Operating systems provide ways to initiate long-duration asynchronous operations, along with ways for a process to either check for results or to block and wait for results. In Node, the runtime system can juggle lots of separate operations because the entire model is based on code running in response to events. An event can be a synthetic thing (such as the "event" of a Node module being loaded and run initially), or it can be something that's a result of actual asynchronous external events. In the case of input/output operations, the Node runtime waits for operating system notification and translates that into an event that causes some JavaScript code to run.

Web Workers vs Promises

In order to make a web app responsive you use asynchronous non-blocking requests. I can envision two ways of accomplishing this. One is to use deferreds/promises. The other is Web Workers. With Web Workers we end up introducing another process and we incur the overhead of having to marshal data back and forth. I was looking for some kind of performance metrics to help understand when to chose simple non-blocking callbacks over Web Workers.
Is there some way of formulating which to use without having to prototype both approaches? I see lots of tutorials online about Web Workers, but I don't see many success/failure stories. All I know is I want a responsive app. I'm thinking of using a Web Worker as the interface to an in-memory data structure that may be anywhere from 0.5-15MB (essentially a DB) that the user can query and update.
As I understand javascript processing, it is possible to take a single long-running task and slice it up so that it periodically yields control allowing other tasks a slice of processing time. Would that be a sign to use Web Workers?
Deferred/Promises and Web Workers address different needs:
Deferred/promise are constructs to assign a reference to a result not yet available, and to organize code that runs once the result becomes available or a failure is returned.
Web Workers perform actual work asynchronously (using operating system threads, not processes, so they are relatively lightweight).
In other words, JavaScript being single threaded, you cannot use deferred/promises to run code asynchronously — once the code runs that fulfills the promise, no other code will run (you may change the order of execution, e.g. using setTimeout(), but that does not make your web app more responsive per se). Still, you might somehow be able to create the illusion of an asynchronous query by e.g. iterating over an array of values by incrementing the index every few milliseconds (e.g. using setInterval), but that's hardly practical.
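For completeness, a minimal sketch of that "slice it up" idea from the question, yielding to the event loop between chunks with setTimeout:

function filterInChunks(data, predicate, done, chunkSize) {
  chunkSize = chunkSize || 10000;
  var results = [];
  var i = 0;
  (function step() {
    var end = Math.min(i + chunkSize, data.length);
    for (; i < end; i++) {
      if (predicate(data[i])) results.push(data[i]);
    }
    if (i < data.length) setTimeout(step, 0); // yield, then continue
    else done(results);
  })();
}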
In order to perform work such as your query asynchronously, and thus offload this work from your app's UI, you need something that actually works asynchronously. I see several options:
use an IndexedDB which provides an asynchronous API,
run your own in-memory data structure, and use web workers, as you indicated, to perform the actual query,
use a server-side script engine such as NodeJS to run your code, then use client-side ajax to start the query (plus a promise to process the results),
use a database accessible over HTTP (e.g. Redis, CouchDB), and from the client issue an asynchronous GET (i.e. ajax) to query the DB (plus a promise to process the results),
develop a hybrid web app using e.g. Parse.
Which approach is the best in your case? Hard to tell without exact requirements, but here are the dimensions I would look at:
Code complexity — if you already have code for your data structure, probably Web Workers are a good fit, otherwise IndexedDB looks more sensible.
Performance — if you need consistent performance, a server-side implementation or DB seems more appropriate.
Architecture/complexity — do you want all processing to be done client side or can you afford the effort (cost) to manage a server side implementation?
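If you go the Web Worker route (option 2 above), a minimal sketch might look like this (file names and message shape are hypothetical):

// main.js — hand the query to a worker; data crosses via structured clone.
const worker = new Worker('query-worker.js');
worker.onmessage = function (e) { console.log('matches:', e.data); };
worker.postMessage({ term: 'foo' });

// query-worker.js — owns the in-memory data structure, off the UI thread.
let rows = ['foo bar', 'baz']; // stand-in for the 0.5-15MB structure
onmessage = function (e) {
  postMessage(rows.filter(function (r) { return r.indexOf(e.data.term) !== -1; }));
};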
I found this book a useful read.

What is Node.js? [closed]

I don't fully get what Node.js is all about. Maybe it's because I am mainly a web based business application developer. What is it and what is the use of it?
My understanding so far is that:
The programming model is event driven, especially the way it handles I/O.
It uses JavaScript and the parser is V8.
It can be easily used to create concurrent server applications.
Are my understandings correct? If so, what are the benefits of evented I/O? Is it just for the concurrency stuff? Also, is Node.js heading toward becoming a framework-like, JavaScript-based (V8-based) programming model?
I use Node.js at work, and find it to be very powerful. Forced to choose one word to describe Node.js, I'd say "interesting" (which is not a purely positive adjective). The community is vibrant and growing. JavaScript, despite its oddities, can be a great language to code in. And you will daily rethink your own understanding of "best practice" and the patterns of well-structured code. There's an enormous energy of ideas flowing into Node.js right now, and working in it exposes you to all this thinking - great mental weightlifting.
Node.js in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the documentation. With Node.js v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Ruby on Rails has this exact problem. It is solved via two complementary mechanisms: 1) Putting Ruby on Rails/Node.js behind a dedicated webserver (written in C and tested to hell and back) like Nginx (or Apache / Lighttpd). The webserver can efficiently serve static content, access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual node service, the webserver proxies the request through. 2) Using a framework like Unicorn that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node.js serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".
Reading frameworks like Express makes it seem like the standard practice is to just serve everything through one jack-of-all-trades Node.js service ... "app.use(express.static(__dirname + '/public'))". For lower-load services and development, that's probably fine. But as soon as you try to put big-time load on your service and have it run 24/7, you'll quickly discover the motivations that push big sites to have well-baked, hardened C code like Nginx fronting their site and handling all of the static content requests (...until you set up a CDN, like Amazon CloudFront). For a somewhat humorous and unabashedly negative take on this, see this guy.
Node.js is also finding more and more non-service uses. Even if you are using something else to serve web content, you might still use Node.js as a build tool, using npm modules to organize your code, Browserify to stitch it into a single asset, and uglify-js to minify it for deployment. For dealing with the web, JavaScript is a perfect impedance match and frequently that makes it the easiest route of attack. For example, if you want to grovel through a bunch of JSON response payloads, you should use my underscore-CLI module, the utility-belt of structured data.
Pros / Cons:
Pro: For a server guy, writing JavaScript on the backend has been a "gateway drug" to learning modern UI patterns. I no longer dread writing client code.
Pro: Tends to encourage proper error checking (err is returned by virtually all callbacks, nagging the programmer to handle it; also, async.js and other libraries handle the "fail if any of these subtasks fails" paradigm much better than typical synchronous code)
Pro: Some interesting and normally hard tasks become trivial - like getting status on tasks in flight, communicating between workers, or sharing cache state
Pro: Huge community and tons of great libraries based on a solid package manager (npm)
Con: JavaScript has no standard library. You get so used to importing functionality that it feels weird when you use JSON.parse or some other built-in method that doesn't require adding an npm module. This means that there are five versions of everything. Even the modules included in the Node.js "core" have five more variants should you be unhappy with the default implementation. This leads to rapid evolution, but also some level of confusion.
Versus a simple one-process-per-request model (LAMP):
Pro: Scalable to thousands of active connections. Very fast and very efficient. For a web fleet, this could mean a 10X reduction in the number of boxes required versus PHP or Ruby
Pro: Writing parallel patterns is easy. Imagine that you need to fetch three (or N) blobs from Memcached. Do this in PHP ... did you just write code that fetches the first blob, then the second, then the third? Wow, that's slow. There's a special PECL module to fix that specific problem for Memcached, but what if you want to fetch some Memcached data in parallel with your database query? In Node.js, because the paradigm is asynchronous, having a web request do multiple things in parallel is very natural.
Con: Asynchronous code is fundamentally more complex than synchronous code, and the up-front learning curve can be hard for developers without a solid understanding of what concurrent execution actually means. Still, it's vastly less difficult than writing any kind of multithreaded code with locking.
Con: If a compute-intensive request runs for, for example, 100 ms, it will stall processing of other requests that are being handled in the same Node.js process ... AKA, cooperative-multitasking. This can be mitigated with the Web Workers pattern (spinning off a subprocess to deal with the expensive task). Alternatively, you could use a large number of Node.js workers and only let each one handle a single request concurrently (still fairly efficient because there is no process recycle).
Con: Running a production system is MUCH more complicated than a CGI model like Apache + PHP, Perl, Ruby, etc. Unhandled exceptions will bring down the entire process, necessitating logic to restart failed workers (see cluster). Modules with buggy native code can hard-crash the process. Whenever a worker dies, any requests it was handling are dropped, so one buggy API can easily degrade service for other cohosted APIs.
Versus writing a "real" service in Java / C# / C (C? really?)
Pro: Doing asynchronous in Node.js is easier than doing thread-safety anywhere else and arguably provides greater benefit. Node.js is by far the least painful asynchronous paradigm I've ever worked in. With good libraries, it is only slightly harder than writing synchronous code.
Pro: No multithreading / locking bugs. True, you invest up front in writing more verbose code that expresses a proper asynchronous workflow with no blocking operations. And you need to write some tests and get the thing to work (it is a scripting language and fat fingering variable names is only caught at unit-test time). BUT, once you get it to work, the surface area for heisenbugs -- strange problems that only manifest once in a million runs -- that surface area is just much much lower. The taxes writing Node.js code are heavily front-loaded into the coding phase. Then you tend to end up with stable code.
Pro: JavaScript is much more lightweight for expressing functionality. It's hard to prove this with words, but JSON, dynamic typing, lambda notation, prototypal inheritance, lightweight modules, whatever ... it just tends to take less code to express the same ideas.
Con: Maybe you really, really like coding services in Java?
For another perspective on JavaScript and Node.js, check out From Java to Node.js, a blog post on a Java developer's impressions and experiences learning Node.js.
Modules
When considering node, keep in mind that your choice of JavaScript libraries will DEFINE your experience. Most people use at least two, an asynchronous pattern helper (Step, Futures, Async), and a JavaScript sugar module (Underscore.js).
Helper / JavaScript Sugar:
Underscore.js - use this. Just do it. It makes your code nice and readable with stuff like _.isString(), and _.isArray(). I'm not really sure how you could write safe code otherwise. Also, for enhanced command-line-fu, check out my own Underscore-CLI.
Asynchronous Pattern Modules:
Step - a very elegant way to express combinations of serial and parallel actions. My personal recommendation. See my post on what Step code looks like.
Futures - much more flexible (is that really a good thing?) way to express ordering through requirements. Can express things like "start a, b, c in parallel. When A, and B finish, start AB. When A, and C finish, start AC." Such flexibility requires more care to avoid bugs in your workflow (like never calling the callback, or calling it multiple times). See Raynos's post on using futures (this is the post that made me "get" futures).
Async - more traditional library with one method for each pattern. I started with this before my religious conversion to step and subsequent realization that all patterns in Async could be expressed in Step with a single more readable paradigm.
TameJS - Written by OKCupid, it's a precompiler that adds a new language primitive, "await", for elegantly writing serial and parallel workflows. The pattern looks amazing, but it does require pre-compilation. I'm still making up my mind on this one.
StreamlineJS - competitor to TameJS. I'm leaning toward Tame, but you can make up your own mind.
Or to read all about the asynchronous libraries, see this panel-interview with the authors.
Web Framework:
Express Great Ruby on Rails-esque framework for organizing web sites. It uses Jade as an HTML templating engine, which makes building HTML far less painful, almost elegant even.
jQuery While not technically a node module, jQuery is quickly becoming a de-facto standard for client-side user interface. jQuery provides CSS-like selectors to 'query' for sets of DOM elements that can then be operated on (set handlers, properties, styles, etc). Along the same vein, Twitter's Bootstrap CSS framework, Backbone.js for an MVC pattern, and Browserify.js to stitch all your JavaScript files into a single file. These modules are all becoming de-facto standards so you should at least check them out if you haven't heard of them.
Testing:
JSHint - Must use; I didn't use this at first, which now seems incomprehensible. JSLint adds back a bunch of the basic verifications you get with a compiled language like Java. Mismatched parentheses, undeclared variables, typos of many shapes and sizes. You can also turn on various forms of what I call "anal mode" where you verify style of whitespace and whatnot, which is OK if that's your cup of tea -- but the real value comes from getting instant feedback on the exact line number where you forgot a closing ")" ... without having to run your code and hit the offending line. "JSHint" is a more-configurable variant of Douglas Crockford's JSLint.
Mocha competitor to Vows which I'm starting to prefer. Both frameworks handle the basics well enough, but complex patterns tend to be easier to express in Mocha.
Vows Vows is really quite elegant. And it prints out a lovely report (--spec) showing you which test cases passed / failed. Spend 30 minutes learning it, and you can create basic tests for your modules with minimal effort.
Zombie - Headless testing for HTML and JavaScript using JSDom as a virtual "browser". Very powerful stuff. Combine it with Replay to get lightning fast deterministic tests of in-browser code.
A comment on how to "think about" testing:
Testing is non-optional. With a dynamic language like JavaScript, there are very few static checks. For example, passing two parameters to a method that expects 4 won't break until the code is executed. Pretty low bar for creating bugs in JavaScript. Basic tests are essential to making up the verification gap with compiled languages.
Forget validation, just make your code execute. For every method, my first validation case is "nothing breaks", and that's the case that fires most often. Proving that your code runs without throwing catches 80% of the bugs and will do so much to improve your code confidence that you'll find yourself going back and adding the nuanced validation cases you skipped.
Start small and break the inertial barrier. We are all lazy, and pressed for time, and it's easy to see testing as "extra work". So start small. Write test case 0 - load your module and report success. If you force yourself to do just this much, then the inertial barrier to testing is broken. That's <30 min to do it your first time, including reading the documentation. Now write test case 1 - call one of your methods and verify "nothing breaks", that is, that you don't get an error back. Test case 1 should take you less than one minute. With the inertia gone, it becomes easy to incrementally expand your test coverage.
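A minimal sketch of those first two cases (the module name is hypothetical):

var assert = require('assert');

// Test 0: the module loads without throwing.
var mymodule = require('./mymodule');
console.log('test 0 passed: module loaded');

// Test 1: call one method and verify "nothing breaks".
assert.doesNotThrow(function () { mymodule.doSomething(); });
console.log('test 1 passed: no error thrown');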
Now evolve your tests with your code. Don't get intimidated by what the "correct" end-to-end test would look like with mock servers and all that. Code starts simple and evolves to handle new cases; tests should too. As you add new cases and new complexity to your code, add test cases to exercise the new code. As you find bugs, add verifications and / or new cases to cover the flawed code. When you are debugging and lose confidence in a piece of code, go back and add tests to prove that it is doing what you think it is. Capture strings of example data (from other services you call, websites you scrape, whatever) and feed them to your parsing code. A few cases here, improved validation there, and you will end up with highly reliable code.
Also, check out the official list of recommended Node.js modules. However, GitHub's Node Modules Wiki is much more complete and a good resource.
To understand Node, it's helpful to consider a few of the key design choices:
Node.js is EVENT BASED and ASYNCHRONOUS / NON-BLOCKING. Events, like an incoming HTTP connection will fire off a JavaScript function that does a little bit of work and kicks off other asynchronous tasks like connecting to a database or pulling content from another server. Once these tasks have been kicked off, the event function finishes and Node.js goes back to sleep. As soon as something else happens, like the database connection being established or the external server responding with content, the callback functions fire, and more JavaScript code executes, potentially kicking off even more asynchronous tasks (like a database query). In this way, Node.js will happily interleave activities for multiple parallel workflows, running whatever activities are unblocked at any point in time. This is why Node.js does such a great job managing thousands of simultaneous connections.
Why not just use one process/thread per connection like everyone else? In Node.js, a new connection is just a very small heap allocation. Spinning up a new process takes significantly more memory, a megabyte on some platforms. But the real cost is the overhead associated with context-switching. When you have 10^6 kernel threads, the kernel has to do a lot of work figuring out who should execute next. A bunch of work has gone into building an O(1) scheduler for Linux, but in the end, it's just way way more efficient to have a single event-driven process than 10^6 processes competing for CPU time. Also, under overload conditions, the multi-process model behaves very poorly, starving critical administration and management services, especially SSHD (meaning you can't even log into the box to figure out how screwed it really is).
Node.js is SINGLE THREADED and LOCK FREE. Node.js, as a very deliberate design choice only has a single thread per process. Because of this, it's fundamentally impossible for multiple threads to access data simultaneously. Thus, no locks are needed. Threads are hard. Really really hard. If you don't believe that, you haven't done enough threaded programming. Getting locking right is hard and results in bugs that are really hard to track down. Eliminating locks and multi-threading makes one of the nastiest classes of bugs just go away. This might be the single biggest advantage of node.
But how do I take advantage of my 16 core box?
Two ways:
For big heavy compute tasks like image encoding, Node.js can fire up child processes or send messages to additional worker processes. In this design, you'd have one thread managing the flow of events and N processes doing heavy compute tasks and chewing up the other 15 CPUs.
For scaling throughput on a webservice, you should run multiple Node.js servers on one box, one per core, using cluster (With Node.js v0.6.x, the official "cluster" module linked here replaces the learnboost version which has a different API). These local Node.js servers can then compete on a socket to accept new connections, balancing load across them. Once a connection is accepted, it becomes tightly bound to a single one of these shared processes. In theory, this sounds bad, but in practice it works quite well and allows you to avoid the headache of writing thread-safe code. Also, this means that Node.js gets excellent CPU cache affinity, more effectively using memory bandwidth.
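A minimal sketch of that pattern with the built-in cluster module (worker-recycling logic reduced to a single line):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) cluster.fork(); // one worker per core
  cluster.on('exit', function () { cluster.fork(); }); // recycle dead workers
} else {
  http.createServer(function (req, res) {
    res.end('handled by pid ' + process.pid);
  }).listen(8000); // workers share this socket; accepts are balanced
}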
Node.js lets you do some really powerful things without breaking a sweat. Suppose you have a Node.js program that does a variety of tasks, listens on a TCP port for commands, encodes some images, whatever. With five lines of code, you can add in an HTTP based web management portal that shows the current status of active tasks. This is EASY to do:
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end(myJavascriptObject.getSomeStatusInfo());
}).listen(1337, "127.0.0.1");
Now you can hit a URL and check the status of your running process. Add a few buttons, and you have a "management portal". If you have a running Perl / Python / Ruby script, just "throwing in a management portal" isn't exactly simple.
But isn't JavaScript slow / bad / evil / spawn-of-the-devil? JavaScript has some weird oddities, but with "the good parts" there's a very powerful language there, and in any case, JavaScript is THE language on the client (browser). JavaScript is here to stay; other languages are targeting it as an IL, and world class talent is competing to produce the most advanced JavaScript engines. Because of JavaScript's role in the browser, an enormous amount of engineering effort is being thrown at making JavaScript blazing fast. V8 is the latest and greatest javascript engine, at least for this month. It blows away the other scripting languages in both efficiency AND stability (looking at you, Ruby). And it's only going to get better with huge teams working on the problem at Microsoft, Google, and Mozilla, competing to build the best JavaScript engine (It's no longer a JavaScript "interpreter" as all the modern engines do tons of JIT compiling under the hood with interpretation only as a fallback for execute-once code). Yeah, we all wish we could fix a few of the odder JavaScript language choices, but it's really not that bad. And the language is so darn flexible that you really aren't coding JavaScript, you are coding Step or jQuery -- more than any other language, in JavaScript, the libraries define the experience. To build web applications, you pretty much have to know JavaScript anyway, so coding with it on the server has a sort of skill-set synergy. It has made me not dread writing client code.
Besides, if you REALLY hate JavaScript, you can use syntactic sugar like CoffeeScript. Or anything else that creates JavaScript code, like Google Web Toolkit (GWT).
Speaking of JavaScript, what's a "closure"? - Pretty much a fancy way of saying that you retain lexically scoped variables across call chains. ;) Like this:
var myData = "foo";
database.connect('user:pass', function myCallback(result) {
  database.query("SELECT * from Foo where id = " + myData);
});
// Note that doSomethingElse() executes _BEFORE_ "database.query", which is inside a callback
doSomethingElse();
See how you can just use "myData" without doing anything awkward like stashing it into an object? And unlike in Java, the "myData" variable doesn't have to be read-only. This powerful language feature makes asynchronous-programming much less verbose and less painful.
Writing asynchronous code is always going to be more complex than writing a simple single-threaded script, but with Node.js, it's not that much harder and you get a lot of benefits in addition to the efficiency and scalability to thousands of concurrent connections...
I think the advantages are:
Web development in a dynamic language (JavaScript) on a VM that is incredibly fast (V8). It is much faster than Ruby, Python, or Perl.
Ability to handle thousands of concurrent connections with minimal overhead on a single process.
JavaScript is perfect for event loops with first class function objects and closures. People already know how to use it this way having used it in the browser to respond to user initiated events.
A lot of people already know JavaScript, even people who do not claim to be programmers. It is arguably the most popular programming language.
Using JavaScript on a web server as well as the browser reduces the impedance mismatch between the two programming environments which can communicate data structures via JSON that work the same on both sides of the equation. Duplicate form validation code can be shared between server and client, etc.
V8 is an implementation of JavaScript. It lets you run standalone JavaScript applications (among other things).
Node.js is simply a library written for V8 which does evented I/O. This concept is a bit trickier to explain, and I'm sure someone will answer with a better explanation than I... The gist is that rather than doing some input or output and waiting for it to happen, you just don't wait for it to finish. So for example, ask for the last edited time of a file:
// Pseudo code
stat( 'somefile' )
That might take a couple of milliseconds, or it might take seconds. With evented I/O you simply fire off the request and instead of waiting around you attach a callback that gets run when the request finishes:
// Pseudo code
stat( 'somefile', function( result ) {
// Use the result here
} );
// ...more code here
This makes it a lot like JavaScript code in the browser (for example, with Ajax style functionality).
For more information, you should check out the article Node.js is genuinely exciting which was my introduction to the library/platform... I found it quite good.
Node.js is an open-source command-line tool built to run server-side JavaScript code. You can download a tarball, compile, and install the source. It lets you run JavaScript programs.
The JavaScript is executed by V8, a JavaScript engine developed by Google and used in the Chrome browser. Node provides a JavaScript API for accessing the network and file system.
It is popular for its performance and the ability to perform parallel operations.
Understanding node.js is the best explanation of node.js I have found so far.
Following are some good articles on the topic.
Learning Server-Side JavaScript with Node.js
This Time, You’ll Learn Node.js
Closures are a way to execute code in the context in which it was created.
What this means for concurrency is that you can define variables, then initiate a non-blocking I/O function, and pass it an anonymous function as its callback.
When the task is complete, the callback function will execute in the context with the variables, this is the closure.
The reason closures are so good for writing applications with nonblocking I/O is that it's very easy to manage the context of functions executing asynchronously.
Two good examples concern how you manage templates and use progressive enhancement. You just need a few lightweight pieces of JavaScript code to make it work perfectly.
I strongly recommend that you watch and read these articles:
Video glass node
Node.js YUI DOM manipulation
Pick up any language and try to remember how you would manage your HTML file templates and what you had to do to update a single CSS class name in your DOM structure (for instance, a user clicked on a menu item and you want that marked as "selected" and update the content of the page).
With Node.js it is as simple as doing it in client-side JavaScript code. Get your DOM node and apply your CSS class to that. Get your DOM node and innerHTML your content (you will need some additional JavaScript code to do this. Read the article to know more).
Another good example, is that you can make your web page compatible both with JavaScript turned on or off with the same piece of code. Imagine you have a date selection made in JavaScript that would allow your users to pick up any date using a calendar. You can write (or use) the same piece of JavaScript code to make it work with your JavaScript turned ON or OFF.
There is a very good fast food place analogy that best explains the event driven model of Node.js, see the full article, Node.js, Doctor’s Offices and Fast Food Restaurants – Understanding Event-driven Programming
Here is a summary:
If the fast food joint followed a traditional thread-based model, you'd order your food and wait in line until you received it. The person behind you wouldn't be able to order until your order was done. In an event-driven model, you order your food and then get out of line to wait. Everyone else is then free to order.
Node.js is event-driven, but most web servers are thread-based. York explains how Node.js works:
1. You use your web browser to make a request for "/about.html" on a Node.js web server.
2. The Node.js server accepts your request and calls a function to retrieve that file from disk.
3. While the Node.js server is waiting for the file to be retrieved, it services the next web request.
4. When the file is retrieved, a callback function is inserted into the Node.js server's queue.
5. The Node.js server executes that function, which in this case renders the "/about.html" page and sends it back to your web browser.
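A minimal sketch of those five steps (assuming an about.html file exists next to the script):

var http = require('http');
var fs = require('fs');

http.createServer(function (req, res) {
  // Step 2: kick off the disk read, then go service other requests (step 3).
  fs.readFile('about.html', function (err, contents) {
    // Steps 4-5: this callback is queued when the read completes, then run.
    if (err) { res.writeHead(500); return res.end('error'); }
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(contents);
  });
}).listen(8080);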
Well, I understand that:
"Node's goal is to provide an easy way to build scalable network programs. Node is similar in design to and influenced by systems like Ruby's Event Machine or Python's Twisted. Evented I/O for V8 JavaScript."
For me that means that you were correct in all three assumptions. The library sure looks promising!
Also, do not forget to mention that Google's V8 is VERY fast. It actually compiles the JavaScript code to machine code, with performance approaching that of a compiled binary. So along with all the other great things, it's INSANELY fast.
Q: The programming model is event driven, especially the way it handles I/O.
Correct. It uses call-backs, so any request to access the file system would cause a request to be sent to the file system and then Node.js would start processing its next request. It would only worry about the I/O request once it gets a response back from the file system, at which time it will run the callback code. However, it is possible to make synchronous I/O requests (that is, blocking requests). It is up to the developer to choose between asynchronous (callbacks) or synchronous (waiting).
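For example, a minimal sketch of both styles with the fs module:

var fs = require('fs');

// Asynchronous (callback): Node services other requests while the disk works.
fs.readFile('/etc/hosts', 'utf8', function (err, data) {
  if (err) throw err;
  console.log('async read:', data.length, 'chars');
});

// Synchronous (blocking): nothing else runs until the call returns.
var data = fs.readFileSync('/etc/hosts', 'utf8');
console.log('sync read:', data.length, 'chars');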
Q: It uses JavaScript and the parser is V8.
Yes
Q: It can be easily used to create concurrent server applications.
Yes, although you'd need to hand-code quite a lot of JavaScript. It might be better to look at a framework, such as http://www.easynodejs.com/ - which comes with full online documentation and a sample application.
