So here I am with a cron job that runs a Node.js script. Currently there is no guarantee that the script will not be run several times at once; I want each invocation to exit immediately if another instance is already running.
Semaphores are the usual way to do this. As I'm fairly new to Node.js, I wanted to study and use an existing module for that, if possible.
So I found semaphore for node, seemingly the most popular pre-made solution. However, the documentation is extremely sparse, and studying the code gives me a headache:
https://github.com/abrkn/semaphore.js/blob/master/lib/semaphore.js
I thought the module would use some kind of file to know whether the process is already running; instead, it uses global to store that information. How can that work? It would certainly prevent two sub-scripts from running at the same time within a single cron invocation, but it wouldn't prevent two separate cron invocations from executing simultaneously, would it?
Or is global somehow shared between all running Node.js processes?
EDIT:
I should clarify that this app is not an HTTP application; it is just a collection of cron-launched scripts. So maybe that semaphore module isn't even the right direction to take, since it is apparently meant to be used with createServer().
You can use the flock command, which serializes access to a resource through a lock file, for instance:
* * * * * /usr/bin/flock -w 0 /path/to/cron.lock /usr/bin/node /path/to/script.js
The moment flock starts, it acquires a lock on the lock file you specify in the command; with -w 0 it gives up immediately if the lock is already held, so the second invocation simply exits. That makes flock a good way to prevent cron jobs from overlapping with a single extra command-line tool.
There's a good explanation in your local man page, or at http://man7.org/linux/man-pages/man2/flock.2.html
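If you'd rather handle the locking inside Node itself instead of wrapping the cron entry, a similar effect can be achieved with an exclusive-create lock file. A minimal, untested sketch (the lock path is an arbitrary choice); note that unlike flock, this leaves a stale lock behind if the process dies before cleaning up:

var fs = require('fs');
var LOCK_PATH = '/tmp/myscript.lock'; // illustrative path

var fd;
try {
  fd = fs.openSync(LOCK_PATH, 'wx'); // 'wx' fails if the file already exists
} catch (e) {
  console.log('Another instance is already running; exiting.');
  process.exit(0);
}

process.on('exit', function () {
  fs.closeSync(fd);
  fs.unlinkSync(LOCK_PATH); // release the lock when this instance finishes
});

// ... the actual script goes here ...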
I have a script that is mostly callbacks for network events. The callbacks finish very quickly. In contrast, the script takes a relatively long time to initialize. I don't really care how long it takes to initialize, I just want to optimize the event callbacks. If I run node --prof, most of the results are from the initialization.
How can I make Node not record anything until it's done initializing? In other words, how can I programmatically enable and disable profiling?
You could always pipe the output into flamebearer to get a flame graph.
node --prof app.js
node --prof-process --preprocess -j isolate*.log | npx flamebearer
It won't stop node from profiling your entire application, but it will likely give you enough information to drill down in the flame graph and ignore the pieces around the edges that you don't want to see, like the app bootstrap.
You could also try executing the code that you want to profile in a loop, to give it a larger representation in the profile. Or use node --prof in tandem with a benchmark library.
node --prof is meant to profile the entire application from the beginning. You could try another profiler, such as VTune or the one built into WebStorm. Another thing I would try is the debug module, which can log the time elapsed between events; I think that is what you are going for.
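That said, newer Node versions (8 and up) do let you start and stop the CPU profiler programmatically through the built-in inspector module, which is exactly what the question asks for. A minimal sketch (the output file name is an arbitrary choice):

const inspector = require('inspector');
const fs = require('fs');

const session = new inspector.Session();
session.connect();

// Call this once initialization is finished
function startProfiling() {
  session.post('Profiler.enable', () => {
    session.post('Profiler.start');
  });
}

// Call this when the window you care about is over
function stopProfiling() {
  session.post('Profiler.stop', (err, { profile }) => {
    // The resulting .cpuprofile can be loaded into Chrome DevTools
    if (!err) fs.writeFileSync('./after-init.cpuprofile', JSON.stringify(profile));
  });
}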
I am using jimp (https://www.npmjs.com/package/jimp) in Meteor to generate an image server-side. In other words, I am 'calculating' the pixels of the image using a recursive algorithm. The algorithm takes quite some time to complete.
The issue I am having is that this completely blocks the Meteor server: users trying to visit the webpage while an image is being generated are forced to wait, and the site is not rendered at all until the job finishes.
Is there any (meteor) way to run the heavy recursive algorithm in a thread or something so that it does not block the entire website?
Node (and consequently Meteor) runs your JavaScript in a single thread, so CPU-bound work blocks the event loop. In short, Node works really well when you are IO-bound, but as soon as you do anything compute-bound you need another approach.
As was suggested in the comments above, you'll need to offload this CPU-intensive activity to another process which could live on the same server (if you have multiple cores) or a different server.
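For illustration, a bare-bones version of that offloading with Node's child_process.fork might look like the following; the file and function names are stand-ins, not part of any particular setup:

// worker.js -- runs in its own process, so the heavy loop cannot block the app
process.on('message', function (msg) {
  var result = generateImage(msg.params); // stand-in for the CPU-heavy algorithm
  process.send({ id: msg.id, result: result });
});

function generateImage(params) {
  // ... heavy recursive pixel calculation ...
  return 'done';
}

// in the main app
var worker = require('child_process').fork(__dirname + '/worker.js');
worker.send({ id: 1, params: {} });
worker.on('message', function (msg) {
  // store or serve the finished image here
});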
We have a similar problem at Edthena, where we need to transcode a subset of our video files. For now I decided to use a Meteor-based solution, because it was easy to set up. Here's what we did:
When new transcode jobs need to happen, we insert a "video job" document into the database.
On a separate server (we max out the CPU when transcoding), we have an app that calls observe like this:
Meteor.startup(function () {
  // Listen for non-failed transcode jobs in creation order. Use a limit of 1
  // to prevent multiple jobs of this type from running concurrently.
  var selector = {
    type: 'transcode',
    state: { $ne: 'failed' }
  };
  var options = {
    sort: { createdAt: 1 },
    limit: 1
  };

  VideoJobs.find(selector, options).observe({
    added: function (videoJob) {
      transcode(videoJob);
    }
  });
});
As the comments indicate, this allows only one job to run at a time, which may or may not be what you want. It has the further limitation that you can only run it on one app instance (multiple instances calling observe would each pick up and process the same job). So it's a pretty simplistic job queue, but it may work for your purposes for a while.
As you scale, you could use a more robust mechanism for dequeuing and processing the tasks, like Amazon's SQS service. You can also explore other Meteor-based solutions like job-collection.
I believe you're looking for Meteor.defer(yourFunction).
Relevant Kadira article: https://kadira.io/academy/meteor-performance-101/content/make-your-app-faster
Thanks for the comments and answers! It seems to be working now. I did what David suggested: I am running a second Meteor app on the same server, and that app handles the image generation. However, it was still eating up all the processing power.
To fix that, I set a slightly lower priority on the generating process with the renice command on its PID (https://www.nixtutor.com/linux/changing-priority-on-linux-processes/). This works! Any time a user visits the website, the client-facing Meteor application gains priority over the generating algorithm. There is no noticeable delay at all anymore.
The only remaining issue is that whenever the server restarts, I have to rerun the renice command by hand.
Since I am using Meteor Up for deployment, both apps run as the same user with the same command: node main.js. I am currently trying to figure out how to run the nice command from within the Meteor Up startup script (located at /etc/init/.conf).
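As an aside, if upgrading Node is ever an option: recent Node versions (10.10+) can lower a process's priority from inside the script via os.setPriority, which would remove the need to renice after every restart. A minimal sketch (the priority value is an arbitrary choice):

// at the top of the generator app's main.js
var os = require('os');
os.setPriority(process.pid, 10); // positive values mean lower priority, as with nice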
I have a radio station on TuneIn.com. In order to update album art and artist information, I need to send the following request:
# Update the song now playing on a station
GET http://air.radiotime.com/Playing.ashx?partnerId=<id>&partnerKey=<key>&id=<stationid>&title=Bad+Romance&artist=Lady+Gaga
The only way I can think of to do this would be to set up a PHP/JS script that updates the title and artist parameters of the URL and sends the request whenever the song changes. But I'd have to execute it every second, or at least every few seconds, using cron.
Are there any other more efficient ways this could be done?
Thank you for your help.
None of the code in this answer was tested. Use at your own risk.
Since you do not control the third-party API, and it cannot push information to you when it becomes available (which would be the ideal situation), your only option is to poll the API at some interval, look for changes, and make updates as necessary. (Be sure the API provider is okay with such an approach, as it might violate terms of use designed to prevent system abuse.)
You need some sort of long-running process that will execute at a given interval.
You mentioned cron calling a PHP script, which is one option (here cron itself is the long-running process). Cron is very stable and would be a good choice, though note that standard cron has a minimum interval of 1 minute. I'm sure there are similar tools out there, but they might require full control over your server.
You could also make a PHP script the long-running process with something like this:
while (true) {
    doUpdates(); # Call the API, make updates, etc.
    sleep(5);    # Wait 5 seconds
}
If you do go down the PHP route, error handling of some sort will be a must:
while (true) {
    try {
        doUpdates();
    } catch (Exception $e) {
        # Manage the error
    }
    sleep(5);
}
Personal Advice
Using PHP as a daemon is possible, but it is not as well tested as the typical use of PHP. If this task were given to me, I'd write the application in JavaScript using Node.js. I would prefer Node because it is designed to run as a long-lived process, intervals and events are a core part of JavaScript, and I would be more confident in it working well for this specific task than PHP.
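To make that concrete, here is an untested sketch of the Node.js version. getCurrentTrack() is a hypothetical hook into your own station's metadata, and the partner credentials are placeholders:

var http = require('http');
var querystring = require('querystring');

var lastTrack = null;

// Hypothetical stub: replace with however your station exposes the current song
function getCurrentTrack() {
  return { title: 'Bad Romance', artist: 'Lady Gaga' };
}

function doUpdates() {
  var track = getCurrentTrack();
  if (lastTrack && track.title === lastTrack.title && track.artist === lastTrack.artist) {
    return; // nothing changed, skip the API call
  }
  lastTrack = track;

  var qs = querystring.stringify({
    partnerId: 'ID', partnerKey: 'KEY', id: 'STATIONID', // placeholders
    title: track.title,
    artist: track.artist
  });
  http.get('http://air.radiotime.com/Playing.ashx?' + qs, function (res) {
    res.resume(); // drain the response; only the GET itself matters
  }).on('error', function (err) {
    console.error('Update failed: ' + err.message);
  });
}

setInterval(doUpdates, 5000); // poll every 5 seconds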
I'm learning Node.js (-awesome-), and I'm toying with the idea of using it to create a next-generation MUD (online text-based game). In such games, there are various commands, skills, spells, etc. that can be used to kill bad guys as you run around and explore hundreds of rooms/locations. Generally speaking, these features are pretty static: you can't usually create new spells or build new rooms. However, I would like to create a MUD where the code that defines spells, rooms, etc. can be edited by users.
That has some obvious security concerns; a malicious user could, for example, upload some JS that spawns a child process running 'rm -r /'. I'm not as concerned with protecting the internals of the game (I'm securing as much as possible, but there's only so much you can do in a language where everything is public); I could always track code changes wiki-style, and punish users who e.g. crash the server or boost their power over 9000. But I'd like to solidly protect the server's OS.
I've looked into other SO answers to similar questions, and most people suggest running a sandboxed version of Node. This won't work in my situation (at least not well), because I need the user-defined JS to interact with the MUD's engine, which itself needs to interact with the filesystem, system commands, sensitive core modules, etc. etc. Hypothetically all of those transactions could perhaps be JSON-encoded in the engine, sent to the sandboxed process, processed, and returned to the engine via JSON, but that is an expensive endeavour if every single call to get a player's hit points needs to be passed to another process. Not to mention it's synchronous, which I would rather avoid.
So I'm wondering if there's a way to "sandbox" a single Node module. My thought is that such a sandbox would need to simply disable the 'require' function, and all would be bliss. So, since I couldn't find anything on Google/SO, I figured I'd pose the question myself.
Okay, so I thought about it some more today, and I think I have a basic strategy:
var require = function (module) {
  throw "Uh-oh, untrusted code tried to load module '" + module + "'";
};
var module = null;
// use a similar strategy for anything else susceptible

var loadUntrusted = function () {
  eval(code);
};
Essentially, we just use variables in a local scope to hide the Node API from eval'ed code, and then run the code. Another point of vulnerability would be objects from the Node API that are passed into untrusted code. If, for example, a Buffer were passed to an untrusted object/function, that code could work its way up the prototype chain and replace a key Buffer method with its own malicious version. That would make all Buffers used for file IO, piping system commands, etc., vulnerable to injection.
So, if I'm going to succeed at this, I'll need to partition untrusted objects into their own world: the outside world can call methods on them, but they cannot call methods on the outside world. Please feel free to point out any further security vulnerabilities you can think of in this strategy.
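For comparison, Node's built-in vm module can evaluate code against a fresh global object, which covers the variable-shadowing part of this strategy without relying on eval, though vm is explicitly not a hard security boundary either. A minimal sketch:

var vm = require('vm');

// The sandbox object is the untrusted code's entire global scope;
// require, process, Buffer, etc. simply do not exist in there.
var sandbox = { hitPoints: 100 };
vm.runInNewContext('hitPoints -= 10;', sandbox);
console.log(sandbox.hitPoints); // 90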
I've built a system in which multiple modules are loaded into an app.js file. Each module has a route and a schema attached. There will be times when one module needs to request data from another module's schema. Because I want to keep my code DRY, I want to tell another module that I need a certain piece of data and receive its response.
I've looked at using the following:
dnode (RPC calls)
dnode seems more suited to inter-process communication; I want to keep these internal messages within the process.
Faye (Pubsub)
Also seems geared toward inter-process communication, and like overkill for this.
EventEmitter
I was advised by someone on #Node.js to stay away from EventEmitter if there is potentially a large number of modules (and therefore a large number of subscriptions).
Any suggestions would be very much appreciated!
Dependency injection and invoking other modules directly both work.
So either
var m = require("othermodule");
m.doStuff();
Or use a DI library like nCore.
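For illustration, hand-rolled dependency injection can be as simple as passing the modules you depend on into a factory function; the module names here are hypothetical:

// users.js -- receives its dependencies instead of require()-ing them itself
module.exports = function createUsersModule(deps) {
  return {
    getAuthorArticles: function (userId, callback) {
      deps.articles.findByAuthor(userId, callback); // ask the other module for data
    }
  };
};

// app.js -- wire everything together in one place
var articles = require('./articles');
var users = require('./users')({ articles: articles });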