dealing with long server side calculations in meteor - javascript

I am using jimp (https://www.npmjs.com/package/jimp) in meteor JS to generate an image server side. In other words I am 'calculating' the pixels of the image using a recursive algorithm. The algorithm takes quite some time to complete.
The issue I am having is that this seems to completely block the meteor server. Users trying to visit the webpage while an image is being generated are forced to wait. The website is therefore not rendered at all.
Is there any (meteor) way to run the heavy recursive algorithm in a thread or something so that it does not block the entire website?

Node (and consequently Meteor) runs your JavaScript in a single thread, so CPU-bound work blocks the event loop. In short, node works really well when you are IO-bound, but as soon as you do anything that's compute-bound you need another approach.
As was suggested in the comments above, you'll need to offload this CPU-intensive activity to another process which could live on the same server (if you have multiple cores) or a different server.
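For what it's worth, here is a minimal sketch of one way to do that offloading with Node's built-in child_process module; the worker file name, generateImageAsync, and runRecursivePixelAlgorithm are placeholders, not part of the original setup:

// main app (sketch): fork a worker so the recursive jimp algorithm runs outside the web process
var child_process = require('child_process');

function generateImageAsync(params, callback) {
  var worker = child_process.fork('image-worker.js'); // hypothetical worker script, see below
  worker.once('message', function (result) {
    worker.kill();
    callback(null, result);
  });
  worker.once('error', callback);
  worker.send(params); // parameters describing the image to generate
}

// image-worker.js (sketch): does the CPU-heavy work in its own process and reports back
process.on('message', function (params) {
  var result = runRecursivePixelAlgorithm(params); // placeholder for the jimp-based algorithm
  process.send(result);
});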
We have a similar problem at Edthena where we need to transcode a subset of our video files. For now I decided to use a meteor-based solution, because it was easy to set up. Here's what we did:
When new transcode jobs need to happen, we insert a "video job" document into the database.
On a separate server (we max out the full CPU when transcoding), we have an app which calls observe like this:
Meteor.startup(function () {
  // Listen for non-failed transcode jobs in creation order. Use a limit of 1 to
  // prevent multiple jobs of this type from running concurrently.
  var selector = {
    type: 'transcode',
    state: { $ne: 'failed' },
  };

  var options = {
    sort: { createdAt: 1 },
    limit: 1,
  };

  VideoJobs.find(selector, options).observe({
    added: function (videoJob) {
      transcode(videoJob);
    },
  });
});
As the comments indicate, this allows only one job to run at a time, which may or may not be what you want. It has the further limitation that you can only run it on one app instance (multiple instances calling observe would all pick up the same job). So it's a pretty simplistic job queue, but it may work for your purposes for a while.
As you scale, you could use a more robust mechanism for dequeuing and processing the tasks, like Amazon's SQS service. You can also explore other meteor-based solutions like job-collection.

I believe you're looking for Meteor.defer(yourFunction).
Relevant Kadira article: https://kadira.io/academy/meteor-performance-101/content/make-your-app-faster
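For context, here is a minimal sketch of how Meteor.defer is typically used inside a method (the method name, generateImage, and the return value are placeholders). Note that the deferred function still runs in the same Node process, so it lets the method return immediately but does not by itself move CPU-bound work off the event loop:

Meteor.methods({
  // 'requestImage' and generateImage are placeholders for whatever the app actually uses
  requestImage: function (params) {
    Meteor.defer(function () {
      generateImage(params); // runs after the method has returned to the client
    });
    return 'queued';
  }
});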

Thanks for the comments and answers! It seems to be working now. I did what David suggested: I am running a second meteor app on the same server, and this app deals with generating the images. However, that app still ate up all the processing power.
Because of this I set a slightly lower priority on the generating process with the renice command on its PID. (https://www.nixtutor.com/linux/changing-priority-on-linux-processes/) This works! Any time a user logs into the website, the other (client-facing) meteor application gains priority over the generating algorithm. Absolutely no delay at all anymore now.
The only issue I am having now is that whenever the server restarts I have to run the renice command again.
Since I am using meteor up for deployment, both apps run as the same user with the same command: node main.js. I am currently trying to figure out how to run the nice command within the startup script of meteor up. (located at /etc/init/.conf)

Related

Performance in Websockets blob decoding causing memory problems

I have a performance problem in JavaScript that has lately been causing crashes at work. With the objective of modernising our applications, we are looking into running them as web servers, to which our clients would connect via a browser (Chrome, Firefox, ...), with all our interfaces running as HTML+JS web pages.
To give you an overview of our performance needs: our applications run image processing from camera sources, in some cases at more than 20 fps, but in most cases around 2-3 fps max.
Basically, we have a web server written in C++ which handles HTTP requests and provides the user with the HTML pages of the interface and the corresponding JS scripts of the application.
In order to simplify the communication between the two applications, I then open a WebSocket between the web page and the C++ server to send formatted messages back and forth. These messages can be pretty big, up to several MB.
It all works pretty well as long as the FPS stays relatively low. When the FPS increases, one of two things happens.
Either the C++ web server's memory footprint increases pretty fast and it crashes when no more memory is available. After investigation, this happens when the network is saturated and the WebSocket's send cache fills up. I think this is due to the TCP nature of WebSockets, as the socket must wait for one message to be sent and acknowledged before sending the next one.
Or the browser crashes after a while, showing the "Aw, Snap!" screen (see figure below). In that case more or less the same thing seems to happen, but this time apparently due to the garbage collection strategy. The other figure below shows a screenshot of the memory usage while the application is running, clearly showing a sawtooth pattern. It seems to indicate that garbage collection is doing its work at intervals that are further and further apart.
I have tracked the problem down to very big messages (>100 KB) being sent at a fast rate, and the bigger the message, the faster it happens. In order to use a message I receive, I start a web worker, pass the blob I received to the web worker, the web worker uses a FileReaderSync to convert the message to an ArrayBuffer, and passes it back to the main thread. I expect this to involve quite a lot of copies under the hood, but I am not well versed enough in JS yet to be sure of that statement. Also, I initially did the same thing without the web worker (FileReader), but the framerate and CPU usage were really bad...
Here is the code I call to decode the messages:
function OnDataMessage(msg)
{
    // please no comments about this, it's actually a bit nicer on the CPU than reusing the same worker :-)
    var webworkerDataMessage = new Worker('/js/EDXLib/MessageDecoderEvent.js');
    webworkerDataMessage.onmessage = MessageFileReaderOnLoadComManagerCBack;
    webworkerDataMessage.onerror = ErrorHandler;
    webworkerDataMessage.postMessage(msg.data);
}

function MessageFileReaderOnLoadComManagerCBack(e)
{
    comManager.OnDataMessageReceived(e.data);
}
and the webworker code:
function DecodeMessage(msg)
{
    var retMsg = new FileReaderSync().readAsArrayBuffer(msg);
    postMessage(retMsg);
}

function receiveDecodingRequest(e)
{
    DecodeMessage(e.data);
}

addEventListener("message", receiveDecodingRequest, true);
My questions are the following:
Is there a way to make the GC not have to collect so much memory, for instance by telling some of the parts I use to reuse buffers instead of recreating them, or by keeping the GC work intervals fixed? This is something I know how to do in C++, but how in JS?
Is there another method I should use for my big payloads? Keep in mind that the transmission should be as fast as possible.
Is there another method for reading blob data as ArrayBuffers that would be faster than what I did?
I thank you in advance for your help/comments.
As it turns out, the memory problem was due to the new WebWorker line and the new FileReaderSync line in the WebWorker.
Removing these greatly improved the performance!
Also, it turns out that this decoding operation is not necessary at all if I consume the WebSocket data as an ArrayBuffer: I just need to set the binaryType attribute of the WebSocket to "arraybuffer"...
So all in all, a very simple solution to a pain in the *** problem :-)
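For reference, a minimal sketch of that binaryType fix (the URL is a placeholder; the handler reuses the comManager call from the code above):

var ws = new WebSocket('ws://example.com/data'); // placeholder URL
ws.binaryType = 'arraybuffer'; // messages now arrive as ArrayBuffer instead of Blob

ws.onmessage = function (msg) {
  comManager.OnDataMessageReceived(msg.data); // msg.data is already an ArrayBuffer
};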

Meteor js publish and subscribe is very slow to react

I have used Meteor's publish and subscribe methods to interact between client and server. In my scenario I am using D3.js to generate a bar chart, and as soon as data is entered into the MongoDB collection I use a client-side function to redraw the bar chart. My issue is that publish and subscribe is too slow to react. Even if I limit the number of documents returned by MongoDB, the issue persists. It is also inconsistent, i.e. it will react in under 1 second sometimes and other times it will take 4-5 seconds. Please guide me on what to do and what is wrong with my implementation.
Here is the server side code,
Test = new Mongo.Collection("test");

Meteor.publish('allowedData', function() {
  return Test.find({});
});
and here is the client side code,
Test = new Mongo.Collection("test");
Meteor.subscribe('allowedData');

Meteor.setTimeout(function() {
  Test.find().observe({
    added: function(document) {
      //something
    },
    changed: function() {
      //something
    },
    removed: function() {
      //something
    },
  });
}, 1000); // delay value is an assumption
From your comments I see that you need a report chart which is reactive. Even though it is your requirement, it is too expensive to have a chart like this. In fact, when your system grows bigger, say you have around 10000 documents for one single chart, this kind of chart will crash your server frequently.
To work around this problem, I have two suggestions:
Define a method that returns data for the chart. Set up a job/interval timer in the client to call that method periodically (see the sketch after these suggestions). The interval value depends on your needs; 10 seconds should be fine for charts. It is not completely reactive this way, you only get the newest data after each interval, but it is still better than a slow and crash-prone system. You could find good modules to manage jobs/timers here.
Use this Meteor package meteor-publish-join (disclaimer: I am the author), it is made to solve the kind of problem you have: the need to do reactive aggregations/joins on a big data set and still have good overall performance.
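For the first suggestion, here is a minimal sketch of a method plus a client-side poller (the method name, the limit, the interval, and drawBarChart are assumptions):

// server: a plain method instead of a publication
Meteor.methods({
  chartData: function () {
    return Test.find({}, { limit: 100 }).fetch(); // the limit is an assumption
  }
});

// client: poll the method every 10 seconds and redraw the chart
Meteor.setInterval(function () {
  Meteor.call('chartData', function (err, docs) {
    if (!err) {
      drawBarChart(docs); // placeholder for the D3 drawing function
    }
  });
}, 10000);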

Save to 3 firebase locations with a slow internet connection

Sometimes I'm having issues with Firebase when the user is on a slow mobile connection. When the user saves an entry to Firebase I actually have to write to 3 different locations. Sometimes the first one works, but if the connection is slow the 2nd and 3rd may fail.
This leaves me with entries in the first location that I constantly need to clean up.
Is there a way to help prevent this from happening?
var newTikiID = ref.child("tikis").push(tiki, function(error) {
  if (!error) {
    console.log("new tiki created");
    var tikiID = newTikiID.key();
    saveToUser(tikiID);
    saveToGeoFire(tikiID, tiki.tikiAddress);
  } else {
    console.log("an error occurred during tiki save");
  }
});
There is no Firebase method to write to multiple paths at once. Some future tools planned by the team (e.g. Triggers) may resolve this in the future.
This topic has been explored before and the firebase-multi-write README contains a lot of discussion on the topic. The repo also has a partial solution to client-only atomic writes. However, there is no perfect solution without a server process.
It's important to evaluate your use case and see if this really matters. If the second and third writes (e.g. to the geo index) fail, chances are there's really no consequence. Most likely, it's essentially the same as if the first write had failed, or if all writes had failed; the entry simply won't appear in searches by geo location. Thus, the complexity of resolving this issue is probably not worth the time.
Of course, it does cost a few bytes of storage. If we're working with millions of records, that may matter. A simple solution for this scenario would be to run an audit report that detects broken links between the data and GeoFire tables and cleans up the orphaned data.
If an atomic operation is really necessary, such as gaming mechanics where fairness or cheating could be an issue, or where integrity is lost by having partial results, there are a couple options:
1) Master Record approach
Pick a master path (the one that must exist) and use security rules to ensure other records cannot be written, unless the master path exists.
".write": "root.child('maste_path').child(newData.child('master_record_id')).exists()"
2) Server-side script approach
Instead of writing the paths separately, use a queue strategy.
Create a single event by writing one event document to a queue
Have a server-side process monitor the queue and process events
The server-side process does the multiple writes and ensures they all succeed
If any fail, the server-side process handles rollbacks or retries
By using the server-side queue, you remove the risk of a client going offline between writes. The server can safely survive restarts and retry events or failures when using the queue model.
I have had the same problem and I ended up using conditional requests with the Firebase REST API in order to write data transactionally. See my question and answer: Firebase: How to update multiple nodes transactionally? Swift 3.
If you need to write concurrently (but not transactionally) to several paths, you can do that now as Firebase supports multi-path updates. https://firebase.google.com/docs/database/rest/save-data
https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html
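As a rough illustration of a multi-path update built from the paths in the question (the users/geofire key structure and userID are assumptions; note that GeoFire normally writes its own data format through its own API, so that line is only illustrative):

var newTikiID = ref.child("tikis").push().key(); // generate a key without writing anything yet

var updates = {};
updates["tikis/" + newTikiID] = tiki;
updates["users/" + userID + "/tikis/" + newTikiID] = true; // userID and this structure are assumptions
updates["geofire/" + newTikiID] = tiki.tikiAddress;        // illustrative only; GeoFire has its own write API

// all three paths succeed or fail together
ref.update(updates, function (error) {
  if (error) {
    console.log("multi-path tiki save failed");
  }
});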

Node memory issues with Socket.io, how to collect and debug heap snapshots? How to do load tests?

I am having much trouble debugging my node app. I am using the Usage package to monitor memory like this:
var usage = require('usage');
var CronJob = require('cron').CronJob; // the cron package provides CronJob

module.exports = function (io, app, db) {
  // every 3 seconds
  new CronJob('*/3 * * * * *', function() {
    var pid = process.pid;
    usage.lookup(pid, function(err, result) {
      io.sockets.emit('usage', result);
    });
  }, null, true); // the last argument starts the job immediately
};
Just after I start the app with $ nodemon app.js it has a footprint of about 80 megabytes. With almost every refresh the memory usage slowly increases; I got to about 160 megabytes (how can I automate this process? I cannot hit the refresh button forever). Sometimes it lowers the memory usage, but only slightly.
I also found out that emitting a large amount of data (2 MB) with Socket.io like this:
socket.emit(emitString, seasonsArray);
will cause memory usage to change randomly between about 150, 250 and 350 MB, and that's only from manually refreshing the browser tab. I don't plan to always send this amount of data to every user, but it's still worrying.
I decided to try to find out why Socket.io causes such large memory usage. Limiting transports didn't help:
io.set('transports', ['jsonp-polling', 'polling']);
I was trying to find a better way to look into memory usage. I installed node-webkit-agent, but the heap snapshots I am able to collect are only about 20 MB, so I think it doesn't work for me. Using it without nodemon doesn't help either.
var agent = require('webkit-devtools-agent');
agent.start();
When I try to use it Node logs:
Timeline.supportsFrameInstrumentation is not implemented
Timeline.canMonitorMainThread is not implemented
CSS.getSupportedCSSProperties is not implemented
Network.enable is not implemented
Network.enable is not implemented
Page.getResourceTree is not implemented
CSS.enable is not implemented
Database.enable is not implemented
DOMStorage.enable is not implemented
So my questions are:
How can I automate page refreshes so I can test how my app responds to load and whether there are any memory leaks?
Why does Socket.io cause my app's footprint to jump up randomly by as much as about 250 MB?
And finally how should I debug and collect proper heap snapshots of my app? What does the community use?
The Node version I use is 0.10.26, Socket.io is 1.1.0, Express is 3.4.8. Please help :)
You can use Selenium to automate browsers. I prefer to use Protractor, which is an abstraction on top of Selenium that is perfect for E2E testing of Angular applications.
In Protractor/Selenium you can run an almost unlimited number of browser instances at once and have them load your app and perform any task a user would.
This is called e2e testing.
What you probably want is to start with regular unit testing - socket.io has a Node client, so just use that and you can open as many connections as you like from your unit test. Your backend won't know the difference between the Node client and a browser client.
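For example, a minimal load-test sketch with the socket.io-client package (the URL, connection count, and event name are assumptions):

var io = require('socket.io-client');

// open many connections to simulate load
for (var i = 0; i < 100; i++) {
  var socket = io.connect('http://localhost:3000', { forceNew: true });

  socket.on('connect', function () {
    // each connection behaves like another browser tab
  });

  socket.on('usage', function (result) {
    // receives the same 'usage' emits the browser would
  });
}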
250 MB is not normal for socket.io 1.1.0. I suspect that you are probably leaking memory in some loop which runs a lot. Try tracking it down with https://github.com/lloyd/node-memwatch. It is very easy to create a memory leak in JS, especially when using events or any other pattern which expects the programmer to clean up manually.
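A minimal sketch of hooking up node-memwatch, assuming the API described in its README:

var memwatch = require('memwatch');

// fired when the heap keeps growing over several consecutive garbage collections
memwatch.on('leak', function (info) {
  console.log('possible leak:', info);
});

// a HeapDiff shows which object types grew between two points in time
var hd = new memwatch.HeapDiff();
// ... exercise the suspicious code path (e.g. emit the big payload a few times) ...
var diff = hd.end();
console.log(JSON.stringify(diff, null, 2));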
Older versions of socket.io did indeed leak a little bit, but that was fixed in 1.0.0.
Big memory consumption is not abnormal for Node. Garbage collection is a very expensive operation, so if you have free memory, V8 just creates more garbage rather than freezing your whole process for a few precious milliseconds.
For debugging Node.js, most people use node-inspector.
One afterthought: when limiting transports, try only websocket. Theoretically, WebSocket should be the least memory-hungry transport.

Alternatives to executing a script using cron job every second?

I have a radio station at Tunein.com. In order to update album art and artist information, I need to send the following
# Update the song now playing on a station
GET http://air.radiotime.com/Playing.ashx?partnerId=<id>&partnerKey=<key>&id=<stationid>&title=Bad+Romance&artist=Lady+Gaga
The only way I can think of to do this would be to set up a PHP/JS page that updates the &title and &artist parts of the URL and sends it off if there is a change. But I'd have to execute it every second, or at least every few seconds, using cron.
Are there any other more efficient ways this could be done?
Thank you for your help.
None of the code in this answer was tested. Use at your own risk.
Since you do not control the third-party API and the API is not capable of pushing information to you when it's available (an ideal situation), your only option is to poll the API at some interval to look for changes and to make updates as necessary. (Be sure the API provider is okay with such an approach as it might violate terms of use designed to prevent system abuse.)
You need some sort of long-running process that will execute at a given interval.
You mentioned cron calling a PHP script which is one option (here cron is the long-running process). Cron is very stable and would be a good choice. I believe though that cron has a minimum interval of 1 minute. I'm sure there are similar tools out there, but those might require you to have full control over your server.
You could also make a PHP script the long-running process with something like this:
while (true) {
    doUpdates(); # Call the API, make updates, etc.
    sleep(5);    # Wait 5 seconds
}
If you do go down the PHP route, error handling of some sort will be a must:
while (true) {
    try {
        doUpdates();
    } catch (Exception $e) {
        # manage the error
    }
    sleep(5);
}
Personal Advice
Using PHP as a daemon is possible, but it is not as well tested as the typical use of PHP. If this task were given to me, I'd write a server/application in JavaScript using Node.js. I would prefer Node because it is designed to work as a long-running process, intervals/events are a key part of JavaScript, and I would be more confident in that working well for this specific task than PHP.
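For illustration, a minimal sketch of such a Node.js poller (getCurrentTrack is hypothetical; the interval and the placeholder partner credentials come from the question):

var http = require('http');

var lastTitle = null;

// poll every 5 seconds and only notify Tunein when the track actually changes
setInterval(function () {
  getCurrentTrack(function (track) {         // hypothetical: however you read the now-playing data
    if (track.title === lastTitle) return;   // nothing changed, skip the request
    lastTitle = track.title;

    var url = 'http://air.radiotime.com/Playing.ashx?partnerId=<id>&partnerKey=<key>&id=<stationid>' +
              '&title=' + encodeURIComponent(track.title) +
              '&artist=' + encodeURIComponent(track.artist);

    http.get(url, function (res) {
      console.log('Tunein responded with status', res.statusCode);
    }).on('error', function (err) {
      console.log('update failed:', err.message);
    });
  });
}, 5000);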
