Should I multithread my Node JS web server?


I have a simple Node.js HTTPS web server using socket.io. Users can send the server lots of different information. Within a few seconds, the server may receive multiple one-line objects from a single user (e.g. { code: '124' }), and multiple users may be doing this at once. The server then returns information to all users in that socket.io room.
As this information comes in, the web server intermittently saves it to a simple MySQL database, although I throttle the saves so that there aren't multiple small MySQL writes per second.
My concern is that as more users connect to the web server, the code may become blocked because Node.js is single-threaded. This could cause lag in getting information back to users in real time, or problems saving the latest data to the database. I was thinking of doing something like this, so that the database updates are handled by a separate worker thread:
const { Worker } = require('worker_threads');

socket.on('data', function(msg) {
  try {
    // Note: this starts a brand-new worker thread for every incoming message
    const newWorker = new Worker('./src/worker.js');
    newWorker.on('message', function(result) {
      io.to(`${socketID}`).emit('newData', result);
    });
    newWorker.on('error', (err) => {
      // Send the message text; an Error object serializes to an empty object
      io.to(`${socketID}`).emit('newData', { error: err.message });
      console.dir(err);
    });
    newWorker.postMessage(msg);
  } catch(e) {
    console.log(e);
    io.to(`${socketID}`).emit('criticalError', "We ran into an error - try refreshing");
  }
});
I know, however, that Node.js can run asynchronous work on multiple threads in the background, and that Node.js is generally quite performant.
My question is: given that there may be multiple database writes happening simultaneously, as well as multiple pieces of information coming in that need to be sent back to users in any given second, does it make sense for me to use worker threads to make this kind of process multithreaded? Or is Node.js capable of handling all of this in the background on multiple threads without me needing to worry?

Related

Web client polls backend to check it's up

A web client should only expose some features when a backend API is up and running. Therefore, I'm looking for a clean way to monitor the availability of this backend.
As a quick fix, I made a timer-based function that performs a basic GET on the API root. It's not very clean, generates lots of traffic, and pollutes the JavaScript console with errors when the server is down.
How should one deal with such a situation?

You can trigger something along the lines of this when you need it:
function checkServerStatus()
{
    setServerStatus("unknown");
    // An off-DOM image is enough to trigger the request
    var img = new Image();
    img.onload = function()
    {
        setServerStatus("online");
    };
    img.onerror = function()
    {
        setServerStatus("offline");
    };
    // The timestamp query string keeps the browser from answering from its cache
    img.src = "http://myserver.com/ping.gif?t=" + Date.now();
}
Make ping.gif small (1 pixel) so that it loads as fast as possible.
Of course, you could do this more cleanly by calling an API endpoint that returns true with a very small response time, but that requires some back-end coding, whereas this approach only needs a 1-pixel GIF placed in the correct directory on the server. You can use any picture already present on the server, but expect more traffic and longer checks as the image grows larger.
Now either wrap this in a function that calls it on a delay, or simply call it whenever you need to check the status; it's up to you.
If you need your app to be notified when the server goes down, you need to implement push:
https://en.wikipedia.org/wiki/Push_technology
Ideally, you would have a separate, highly reliable server with a fast response time pinging the target server at some interval to determine whether it is up, and then using push to get that information to your app. This way, that third server would only send you a push when the status of your app server has changed. Ideally, this server's requests have high priority in your app server's queue, and the two servers are well connected and close to each other, but not on the same network in case that network fails.
Recommendation:
The first approach should serve you well, since it's simple to implement and requires the least knowledge.
Consider the second if:
You need a very short checking interval, which would make your application slower and network traffic higher.
You have multiple applications that need the same check, which increases the load on each application, the network, and the server. The second approach lets a single ping determine the status for all apps.
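The "only push when the status has changed" idea from the second approach can be sketched as a small tracker on the monitoring server. This is a minimal illustration, not a full monitor: the names createStatusTracker, report, and notify are hypothetical, and the actual probing and push transport are left out.

```javascript
// Hypothetical monitor-side helper: remembers the last known status and
// calls `notify` only when a new probe result differs from it.
function createStatusTracker(notify) {
  let lastStatus = "unknown";
  return {
    // `isUp` is the boolean result of one health probe.
    report(isUp) {
      const status = isUp ? "online" : "offline";
      if (status !== lastStatus) {
        lastStatus = status;
        notify(status); // e.g. push to all subscribed apps
      }
      return status;
    },
    current() {
      return lastStatus;
    },
  };
}

// Usage: three probes, but only two notifications are sent.
const sent = [];
const tracker = createStatusTracker((status) => sent.push(status));
tracker.report(true);  // unknown -> online: notifies
tracker.report(true);  // unchanged: silent
tracker.report(false); // online -> offline: notifies
console.log(sent); // [ 'online', 'offline' ]
```

Because unchanged results are dropped, the number of pushes depends on how often the status flips, not on how often the monitor probes.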

To limit the number of requests, a simple solution is to use server-sent events. This protocol, which runs on top of HTTP, allows the server to push multiple updates in response to a single client request.
Client-side code (JavaScript):
var evtSource = new EventSource("backend.php");
evtSource.onmessage = function(e) {
  console.log('status:' + e.data);
};
evtSource.onerror = function(e) {
  // add some retry then display error to the user
};
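Backend code (PHP; other languages can do the same):
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
while (true) {
    // Every 30 s, send an OK status as one event: a "data:" line followed by a blank line
    echo "data: OK\n\n";
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
    sleep(30);
}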
This limits the number of requests (only one per "session"), but it keeps one socket open per client, which may also be too heavy for your server.
If you really want to lower the workload, you should delegate it to an external monitoring platform that exposes an API for publishing the backend status.
Such a service may already exist if your backend is hosted on a cloud platform.
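Since the answer notes that other languages can serve the same stream, here is a rough Node.js counterpart of the PHP loop above. It is a sketch under assumptions: formatSseEvent and statusStreamHandler are hypothetical names, the 30-second interval mirrors the PHP example, and port 8080 in the last comment is arbitrary.

```javascript
// Format one server-sent event. In the event-stream format, every line of
// the payload gets a "data: " prefix and a blank line ends the event.
function formatSseEvent(data) {
  const lines = String(data).split(/\r?\n/);
  return lines.map((line) => `data: ${line}`).join("\n") + "\n\n";
}

// Hypothetical request handler for Node's built-in http module.
function statusStreamHandler(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  res.write(formatSseEvent("OK")); // send a first status immediately
  const timer = setInterval(() => res.write(formatSseEvent("OK")), 30000);
  // Stop the timer when the client disconnects.
  req.on("close", () => clearInterval(timer));
}

// To serve it:
// require("http").createServer(statusStreamHandler).listen(8080);
```

Unlike the blocking PHP loop, the handler returns right away and the timer does the periodic writes, so one Node process can hold many of these connections open.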

Can WebSockets replace AJAX when it comes to Database requests?

This may seem like an extremely dumb question, but I am currently switching my website from polling with the EventSource constructor to the WebSocket standard, implemented in Node.js.
Originally, the entire backend of my website was handled with PHP. With the introduction of Node.js, I am trying to switch as much as I can without going outside of the "standard". By standard, I mean that I typically see WebSocket implementations that send small amounts of data and receive small amounts back, rather than performing database queries and then sending large amounts of data back to the client.
Can WebSockets replace AJAX when it comes to Database requests?
Let's consider a small hello world program in PHP/JavaScript (AJAX) vs Node.js/JavaScript (WebSockets)
PHP/JavaScript (AJAX)
// HelloWorld.php with Laravel in the Backend
Table::update([ 'column' => $_POST['message'] ]);
echo $_POST['message'];
Ajax.js with a custom ajax function
Global.request("HelloWorld.php").post({
  message: "Hello World"
}).then(message => alert(message));
Node.js/JavaScript (WebSockets)
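// skip all the server setup
server.on('connection', function (socket) {
  // Listen and reply on the per-client socket, not on the server object
  socket.on('message', function (message) {
    sqlConnection.query("UPDATE `table` SET `column` = ?", [message], function (err) {
      if (err) { return console.error(err); }
      socket.send(message);
    });
  });
});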
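WebSocket.js:
let socket = new WebSocket('ws://example.com');
socket.onmessage = function (event) {
  // The payload is on event.data; the argument itself is a MessageEvent
  alert(event.data);
};
// send() throws if it is called before the connection is open
socket.onopen = function () {
  socket.send("Hello World");
};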
They both essentially do the same thing in slightly different ways. At this scale, it would not make sense to use WebSockets. But as a thought experiment, imagine this scaled up to the point where Node.js is processing bigger queries and sending lots of data to the client. Is this acceptable?

Yes, in principle you can trigger a database query over WebSockets. Both HTTP and WebSockets are built on TCP, which does the job of carrying data between client and server.
The bigger point is that WebSockets were intended to lessen the burden of repeatedly opening and closing connections, which you may have to do for AJAX requests. Keeping one connection open brings several application-level benefits, including real-time media streaming.
So what's the benefit of sticking with HTTP if you don't have a specific use case for WebSockets? HTTP has a robust ecosystem of tools and is largely plug and play; think of things like security and standardization. WebSockets are a relatively new technology and have not developed the same ecosystem.

How do I sync data with remote database in case of offline-first applications?

I am building a "TODO" application which uses Service Workers to cache the request's responses and in case a user is offline, the cached data is displayed to the user.
The server exposes a RESTful API with POST, PUT, DELETE, and GET endpoints for the resources.
When the user is offline and submits a TODO item, I save it to a local IndexedDB, but I can't send the POST request to the server since there is no network connection. The same is true for PUT and DELETE requests, where a user updates or deletes an existing TODO item.
Questions
What patterns are in use to sync the pending requests with the REST-ful Server when the connection is back online?

What patterns are in use to sync the pending requests with the REST-ful Server when the connection is back online?
Background Sync API will be suitable for this scenario. It enables web applications to synchronize data in the background. With this, it can defer actions until the user has a reliable connection, ensuring that whatever the user wants to send is actually sent. Even if the user navigates away or closes the browser, the action is performed and you could notify the user if desired.
Since you're saving to IndexedDB, you could register for a sync event when the user adds, deletes, or updates a TODO item:
function addTodo(todo) {
  return addToIndexedDB(todo).then(() => {
    // Wait for the scoped service worker registration to get a
    // service worker with an active state
    return navigator.serviceWorker.ready;
  }).then(reg => {
    return reg.sync.register('add-todo');
  }).then(() => {
    console.log('Sync registered!');
  }).catch(() => {
    console.log('Sync registration failed :(');
  });
}
You've registered a sync event with the tag add-todo, which you'll listen for in the service worker. When you get this event, you retrieve the data from IndexedDB and POST it to your RESTful API.
self.addEventListener('sync', event => {
  if (event.tag === 'add-todo') {
    event.waitUntil(
      getTodo().then(todos => {
        // Post the messages to the server
        return fetch('/add', {
          method: 'POST',
          body: JSON.stringify(todos),
          headers: { 'Content-Type': 'application/json' }
        }).then(() => {
          // Success!
        });
      })
    );
  }
});
This is just an example of how you could achieve it using Background Sync. Note that you'll have to handle conflict resolution on the server.
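The question also covers PUT and DELETE, so the same idea can be extended by queuing each pending operation and replaying the queue in order. The sketch below makes several assumptions that are not in the original answer: the /todos routes, the { type, id, todo } record shape, and the names toRequest and replayQueue are all hypothetical.

```javascript
// Map one queued operation to the arguments for fetch().
// The "/todos" routes and the { type, id, todo } shape are assumptions.
function toRequest(op) {
  const headers = { "Content-Type": "application/json" };
  switch (op.type) {
    case "add":
      return { url: "/todos", init: { method: "POST", headers, body: JSON.stringify(op.todo) } };
    case "update":
      return {
        url: `/todos/${encodeURIComponent(op.id)}`,
        init: { method: "PUT", headers, body: JSON.stringify(op.todo) },
      };
    case "delete":
      return { url: `/todos/${encodeURIComponent(op.id)}`, init: { method: "DELETE" } };
    default:
      throw new Error(`Unknown operation type: ${op.type}`);
  }
}

// Replay operations in the order they were queued. `send` would be fetch
// inside the service worker; it is a parameter so it can be stubbed.
async function replayQueue(ops, send) {
  for (const op of ops) {
    const { url, init } = toRequest(op);
    const response = await send(url, init);
    if (!response.ok) {
      throw new Error(`Sync failed for ${init.method} ${url}`);
    }
  }
}
```

Inside the sync handler this could be wired up as event.waitUntil(getPendingOps().then(ops => replayQueue(ops, fetch))), where getPendingOps is a hypothetical IndexedDB helper. Replaying in order matters: an update queued after an add must not reach the server first.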
Alternatively, you could use PouchDB on the client and Couchbase or CouchDB on the server. With PouchDB, you can save data on the client and set it to automatically sync/replicate whenever the user is online. When the database synchronizes and there are conflicting changes, CouchDB will detect this and flag the affected document with the special attribute "_conflicts":true. It determines which revision to use as the latest and saves the others as previous revisions of that record. It does not attempt to merge the conflicting revisions; it is up to you to dictate how merging should be done in your application. Couchbase behaves similarly. See the links below for more on conflict resolution.
Conflict Management with CouchDB
Understanding CouchDB Conflict
Resolving Couchbase Conflict
Demystifying Conflict Resolution in Couchbase Mobile
I've used PouchDB and Couchbase/CouchDB/IBM Cloudant, but I've done that through Hoodie. It has user authentication out of the box, handles conflict management, and a few more things. Think of it as your backend. For your TODO application, Hoodie would be a great fit. I've written about how to use Hoodie; see the links below:
How to build offline-smart application with Hoodie
Introduction to offline data storage and sync with PouchDB and Couchbase

At the moment I can think of two approaches, and which one fits depends on the storage you are using at your backend.
If you are using an RDBMS to back up all data:
The problem with offline-first systems in this approach is the possibility of conflicts when posting new data or updating existing data.
As a first measure to avoid conflicts, generate unique IDs for all objects on the client, in such a way that they remain unique when posted to the server and saved in the database. You can rely on UUIDs for this: a randomly generated UUID is, for practical purposes, unique across all systems in a distributed setup, and most languages provide a way to generate UUIDs without any hassle.
Design your local database so that it uses UUIDs as the primary key. On the server, you can have both an auto-incremented, indexed integer primary key and a VARCHAR column to hold the UUID. The primary key on the server uniquely identifies objects in that table, while the UUID uniquely identifies records across tables and databases.
When posting an object to the server during a sync, you just have to check whether an object with that UUID is already present and take appropriate action from there. When you fetch objects from the server, send both the primary key from the server table and the UUID. This way, when you deserialize the response into model objects or save them in the local database, you can tell synced objects from unsynced ones: objects that still need syncing have only the UUID in your local database, not a server primary key.
There may be cases where your server malfunctions and refuses to save data while you are syncing. To handle this, keep an integer counter on each object that records how many times you have tried to sync it. If this number exceeds a certain value, say 3, move on to the next object. What you do with the unsynced objects depends on your policy for them; for example, you could discard them or keep them only locally.
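The UUID scheme and the retry counter described above might look roughly like this. It is a sketch under assumptions: the field names (uuid, serverId, syncAttempts) and function names are hypothetical, a Map stands in for the server table, and crypto.randomUUID requires a reasonably recent Node.js or browser.

```javascript
const { randomUUID } = require("crypto");

const MAX_SYNC_ATTEMPTS = 3; // the "say 3" threshold from the text

// Client side: a new object gets a UUID immediately and no server key yet.
function createLocalTodo(title) {
  return { uuid: randomUUID(), serverId: null, title, syncAttempts: 0 };
}

// An object still needs syncing while it has no server primary key
// and has not used up its attempts.
function needsSync(todo) {
  return todo.serverId === null && todo.syncAttempts < MAX_SYNC_ATTEMPTS;
}

// Called when the server refuses to save the object.
function recordFailedAttempt(todo) {
  todo.syncAttempts += 1;
}

// Server side, with a Map standing in for the table (keyed by UUID).
// Posting the same UUID twice updates the row instead of duplicating it.
// table.size + 1 imitates an auto-increment key and ignores deletions.
function upsertByUuid(table, todo) {
  const existing = table.get(todo.uuid);
  const serverId = existing ? existing.serverId : table.size + 1;
  const row = { uuid: todo.uuid, serverId, title: todo.title };
  table.set(todo.uuid, row);
  return row;
}

// Usage: a retried POST of the same object does not create a second row.
const table = new Map();
const todo = createLocalTodo("buy milk");
const first = upsertByUuid(table, todo);
const second = upsertByUuid(table, { ...todo, title: "buy oat milk" });
console.log(table.size, first.serverId === second.serverId); // 1 true
```

The client would copy the returned serverId into its local record, which is what marks the object as synced.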
If you are not using RDBMS
As an alternative approach, instead of storing whole objects you could store the transactions that each client performs locally and send those to the server. Each client syncs just the transactions, and when fetching you get the current state by replaying all the transactions from the oldest to the newest. This is similar to how a Git history works: changes are recorded as a sequence of commits saying what was added (or removed) and by whom, and the current state of the repository for each user is worked out from that history. This approach is meant to avoid conflicts, but as you can see it's a little tricky to develop.
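Working out the current state from a list of transactions can be sketched as a simple replay. The { op, id, fields } transaction shape and the replay function are assumptions for illustration only.

```javascript
// Rebuild the current TODO list by replaying transactions oldest-first.
// The { op, id, fields } shape is an assumption for illustration.
function replay(transactions) {
  const state = new Map();
  for (const tx of transactions) {
    if (tx.op === "add") {
      state.set(tx.id, { ...tx.fields });
    } else if (tx.op === "update" && state.has(tx.id)) {
      state.set(tx.id, { ...state.get(tx.id), ...tx.fields });
    } else if (tx.op === "remove") {
      state.delete(tx.id);
    }
  }
  return state;
}

const log = [
  { op: "add", id: "a", fields: { title: "buy milk", done: false } },
  { op: "add", id: "b", fields: { title: "call Bob", done: false } },
  { op: "update", id: "a", fields: { done: true } },
  { op: "remove", id: "b" },
];
const current = replay(log);
console.log(JSON.stringify([...current]));
// [["a",{"title":"buy milk","done":true}]]
```

Clients only ever append to the log, so syncing means exchanging transactions rather than overwriting each other's objects.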

How to run simultaneous Node child processes

TL;DR: I have an endpoint on an Express server that runs some cpu-bound logic in a child_process. The problem is that if the server gets more than one request for that endpoint it won't run both requests simultaneously- it queues them up and runs them one-at-a-time. Is there a way to use Node child_process so that my server will perform multiple child processes simultaneously?
Long version: The major downside of Node is that it is single-threaded, and a logic-heavy (CPU-bound) request can stop the server dead in its tracks so that it can't take any more requests until that logic has finished running. I thought I could work around this using child_process, which works great for freeing up my server to take other requests. But it will only execute child processes one at a time, creating a queue that can get pretty backed up. I also have a Node cluster set up so that my server is split into 8 separate "virtual servers" (8-core machine), so I guess I can technically run 8 of these child processes at once, but I want to be able to handle more traffic than that. I'm looking for a solution that will still allow me to use Node and Express; please only suggest different technologies if you are absolutely sure this can't be done efficiently in my current environment. Thanks in advance for the help!
Endpoint:
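app.get('/cpu-exec-file', function(req, res) {
  child_process.execFile('node', ['./blocking_tasks/mathCruncher.js'], {timeout: 30000}, function(err, stdout, stderr) {
    // A non-zero exit or a timeout arrives here as err; stdout may be incomplete
    if (err) {
      return res.status(500).send({ error: err.message });
    }
    var data = JSON.parse(stdout);
    res.send(data);
  });
});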
mathCruncher.js:
var obj = {};
function myLoop(i) {
  setTimeout(function () {
    obj[i] = Math.random() * 100;
    if (--i) {
      myLoop(i);
    } else {
      // Declare the variable instead of leaking an implicit global
      var string = JSON.stringify(obj);
      console.log(string); // goes to stdout.
    }
  }, 1000);
}
myLoop(10);

Is there a way to use Node child_process so that my server will perform multiple child processes simultaneously?
Use a message queue and a back-end process.
I do exactly what you're wanting, using RabbitMQ. There are several other great messaging systems out there, like ZeroMQ and even Redis with some pub-sub libraries on top of it.
The gist of it is to send a request to your queueing system and have another process pick up the message, then run the process to do the work.
If you need a response from the worker, you can use bi-directional messaging with a Request/Reply setup, or use status messages for really long-running things.
If you're interested in the RabbitMQ side of things, I have a free email course on various patterns with RabbitMQ, including Request/Reply and status emails: http://derickbailey.com/email-courses/rabbitmq-patterns-for-applications/
And if you're interested in ground-up training on RMQ with Node, check out my training course at http://rabbitmq4devs.com
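To show the shape of the Request/Reply pattern without a broker, here is an in-memory stand-in. It is only a sketch: JobQueue, request, and workOnce are hypothetical names and not a RabbitMQ API, and in a real setup the queue would live in the broker while the worker ran as a separate process.

```javascript
// In-memory stand-in for a message broker, to show the pattern only.
// None of these names belong to RabbitMQ or any other real library.
class JobQueue {
  constructor() {
    this.pending = [];              // messages waiting for a worker
    this.replyHandlers = new Map(); // correlationId -> reply callback
    this.nextId = 1;
  }

  // Web server side: enqueue the work and return immediately.
  request(payload, onReply) {
    const correlationId = String(this.nextId++);
    this.replyHandlers.set(correlationId, onReply);
    this.pending.push({ correlationId, payload });
    return correlationId;
  }

  // Worker side: take one message, do the work, send the reply back.
  // Returns false when there was nothing to do.
  workOnce(handler) {
    const message = this.pending.shift();
    if (!message) return false;
    const result = handler(message.payload);
    const onReply = this.replyHandlers.get(message.correlationId);
    this.replyHandlers.delete(message.correlationId);
    onReply(result);
    return true;
  }
}

// Usage: two requests are queued, then a worker drains the queue.
const queue = new JobQueue();
const replies = [];
queue.request({ n: 10 }, (result) => replies.push(result));
queue.request({ n: 20 }, (result) => replies.push(result));
const square = ({ n }) => n * n;
while (queue.workOnce(square)) {}
console.log(replies); // [ 100, 400 ]
```

The correlation ID is what lets the web server match a reply to the HTTP request that is still waiting for it, even when several workers answer out of order.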

Node JS live text update with CloudMQTT

I have a node server which is connecting to CloudMQTT and receiving messages in app.js. I have my client web app running on the same node server and want to display my messages received in app.js elsewhere in a .ejs file, I'm struggling as to how best to do this.
app.js
// Create a MQTT Client
var mqtt = require('mqtt');
// Create a client connection to CloudMQTT for live data
var client = mqtt.connect('xxxxxxxxxxx', {
  username: 'xxxxx',
  password: 'xxxxxxx'
});
client.on('connect', function() { // When connected
  console.log("Connected to CloudMQTT");
  // Subscribe to the temperature
  client.subscribe('Motion', function() {
    // When a message arrives, do something with it
    client.on('message', function(topic, message, packet) {
      // ** Need to pass message out **
    });
  });
});

Basically you need a way for the client (browser code with EJS - HTML, CSS and JS) to receive live updates. There are basically two ways to do this from the client to the node service:
A websocket session instantiated by the client.
A polling approach.
What's the difference?
Under the hood, a WebSocket is a full-duplex communication mechanism. That means you can open a socket from the client (browser) to the Node server, and the two can talk to each other both ways over a long-lived session. The pro is that updates are often instantaneous, without the cost of making another HTTP request as in the polling case. The con is that it uses a socket connection that may be long-lived, and any server can only handle a limited number of open sockets. There are ways to scale around this issue, but if it's a big concern for you, you may want to go with polling.
Polling is where you set up an endpoint on your server that the client JS code hits every now and then. That endpoint will return you the updated information. The con is that you are now making a new request in order to get updates, which may not be desirable if a lot of updates are expected to come through and the app is expected to be updated in the timeliest manner possible (most of the time polling is sufficient though). The pro is that you do not have a live connection open on the server indefinitely.
Again, there are many more pros and cons, these are just the obvious ones. You decide how to implement it. When the client receives the data from either of these mechanisms, you may update the UI in any suitable manner.
From the server end, you will need a way to persist the information coming from CloudMQTT. There are multiple ways to do this. If you do not care about memory consumption and are ok with potentially throwing away old data if a client does not ask for it for a while, then it may be ok to just store this in memory in a regular javascript object {}. If you do care about persisting the data between server restarts/crashes (probably best), then you can persist to something like Redis, Mongo, any of the SQL stores if your data is relational in nature, or even a regular JSON file on disk (see fs.writeFile).
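The in-memory option could be as small as the sketch below, which keeps only the latest message per topic. The names createLatestStore, record, and get are hypothetical, and the Express route in the comment is an assumption about how a polling client might read the value.

```javascript
// Keep only the most recent message per topic in a plain Map.
// Everything is lost on restart, as the text above warns.
function createLatestStore() {
  const latest = new Map();
  return {
    // MQTT payloads arrive as Buffers, so convert to a string here.
    record(topic, message) {
      latest.set(topic, { value: message.toString(), receivedAt: Date.now() });
    },
    get(topic) {
      return latest.get(topic) || null;
    },
  };
}

const store = createLatestStore();

// In app.js the MQTT handler would feed the store:
// client.on('message', (topic, message) => store.record(topic, message));
store.record("Motion", Buffer.from("detected"));

// Hypothetical Express route for a polling client:
// app.get('/latest/:topic', (req, res) => res.json(store.get(req.params.topic)));
console.log(store.get("Motion").value); // detected
```

For the WebSocket approach, the same record call is the natural place to also broadcast the new value to connected clients.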
Hope this helped give you a step in the right direction!
