Relative newbie to Node.js here, trying to figure out a performance issue in a newly built application.
I am running a performance test against my Node 0.12.7 app and I find the server hanging intermittently; it needs a restart once it reaches that state. After confirming there is no memory leak (the process heap does not exceed 500 MB, whereas the default max heap size is about 1.4 GB, as I understand it), we moved on to checking CPU profiles. I used a snippet of code depending on v8-profiler to capture profiles at regular intervals.
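The exact snippet isn't reproduced here, but a minimal sketch of periodic profiling with v8-profiler's startProfiling/stopProfiling API might look like the following; the 30-second window, 5-minute interval, and file naming are assumptions, not the original code:

    var fs = require('fs');
    var profiler = require('v8-profiler');

    // Every 5 minutes, record a 30-second CPU profile and write it to disk
    // as a .cpuprofile file that Chrome DevTools can load.
    setInterval(function () {
      profiler.startProfiling('cpu', true);
      setTimeout(function () {
        var profile = profiler.stopProfiling('cpu');
        profile.export(function (err, result) {
          if (!err) {
            fs.writeFileSync('profile-' + Date.now() + '.cpuprofile', result);
          }
          profile.delete();
        });
      }, 30000);
    }, 300000);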
Here is one of the charts we got from JMeter (although the server didn't hang in this run):
We plotted flame graphs in Chrome by loading the CPU profiles. I was expecting to find the JS stuck somewhere at this point, but instead I find that exactly in that time range the Node server is idle for a long time. Could anyone help me understand the probable causes for the server staying idle while being bombarded with client requests, and then recovering and continuing to operate after about 10 minutes?
I have unfortunately lost the data needed to check whether the responses between 16:48:10 and 16:57:40 were errors or successes, but it is very likely they were error responses from the proxy, since Node didn't have a care in the world.
Here are the flame charts seen in Chrome
Before 16:47:
Around 16:47:
A couple of minutes after 16:47:
There could be multiple reasons for this.
The server is not accepting requests. Do you see a drop in throughput after you reach the peak?
Have you checked the server logs to see if any exceptions are logged?
Try plotting trends of response time and throughput for your test duration.
You may want to look for any I/O-bound or blocking operations in your code (a quick event-loop check is sketched after this list).
Check the processor queue length. You should see it building up if processes are not getting enough CPU.
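On the Node side, one cheap way to tell "blocked JS" apart from "idle but starved" is an event-loop lag probe; a minimal sketch, with the 1-second tick and 100 ms threshold picked arbitrarily:

    // Schedule a tick every second; if it fires much later than
    // scheduled, the event loop was blocked or the process got no CPU.
    var last = Date.now();
    setInterval(function () {
      var lag = Date.now() - last - 1000;
      if (lag > 100) {
        console.warn('event loop lag: ' + lag + 'ms');
      }
      last = Date.now();
    }, 1000);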
I'm developing an app that should receive a .CSV file, save it, scan it, insert the data of every record into the DB, and finally delete the file.
With a file of about 10,000 records there are no problems, but with a larger file the PHP script runs correctly and all the data is saved into the DB, yet the page prints ERROR 504: The server didn't respond in time.
I'm scanning the .CSV file with the PHP function fgetcsv().
I've already edited settings in the php.ini file (max_execution_time = 120, etc.), but nothing changes; after 1 minute the error is shown.
I've also tried using a JavaScript function to show an alert every 10 seconds, but the error is still shown in this case too.
Is there a solution to avoid this problem? Is it possible to send some data from server to client every few seconds to avoid the error?
Thanks
It's typically when scaling issues pop up that you need to start evolving your system architecture, and here your application will need to work asynchronously. The problem you're having is very common (some of my team are dealing with one as I write this), but everyone has to deal with it eventually.
Solution 1: Cron Job
The most common solution is to create a cron job that periodically scans a queue for new work to do. I won't explain the nature of the queue since everyone has their own; some are alright and others are really bad, but typically it involves a DB table with the relevant information and a job status (<-- one of the bad solutions), or a solution involving Memcached; MongoDB is also quite popular.
The "problem" with this solution is, again, ultimately "scaling". Cron jobs run periodically at fixed intervals, so if a task takes a particularly long time, jobs are likely to overlap. This means you need to build in some kind of locking, or use a scheduler that guarantees the job runs sequentially.
In the end, you won't run into the timeout problem, and you can typically dedicate an entire machine to running these tasks so memory isn't as much of an issue either.
Solution 2: Worker Delegation
I'll use Gearman as an example for this solution, but other tools implement standards like AMQP, such as RabbitMQ. I prefer Gearman because it's simpler to set up and it's designed for work processing rather than messaging.
This kind of delegation has the advantage of running immediately after you call it. The server is basically waiting for stuff to do (not unlike an Apache server); when it gets a request, it shifts the workload from the client onto one of your "workers": scripts you've written that run indefinitely, listening to the server for work.
You can have as many of these workers as you like, each running the same or different types of tasks. This means scaling is determined by the number of workers you have, and this scales horizontally very cleanly.
Conclusion:
Cron jobs are fine, in my opinion, for automated maintenance, but they run into problems when they need to work concurrently, which makes workers the better choice.
Either way, you are going to need to change the way users receive feedback on their requests. They will need to be informed that their request is processing and to check back later for the result; alternatively, you can periodically report the status of the running task to give the user real-time feedback via ajax (a client-side sketch follows the link below). That's a little tricky with cron jobs, since you will need to persist the state of the task during its execution, but Gearman has a nice built-in solution for doing just that.
http://php.net/manual/en/book.gearman.php
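For the ajax status feedback mentioned above, a minimal client-side sketch; the /job-status.php endpoint, the id parameter, and the response shape are all invented for illustration:

    // Poll the server every 5 seconds until the background job reports done.
    function pollStatus(jobId) {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', '/job-status.php?id=' + encodeURIComponent(jobId));
      xhr.onload = function () {
        var job = JSON.parse(xhr.responseText);
        if (job.done) {
          document.getElementById('status').textContent = 'Done!';
        } else {
          document.getElementById('status').textContent = 'Processing...';
          setTimeout(function () { pollStatus(jobId); }, 5000);
        }
      };
      xhr.send();
    }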
I'm working on a pretty simple web application that displays the top users hitting a web page. I've written some d3 code to convert the data into a bar graph and a simple PHP script to pull the data from a database. I use an ajax request to pull the data once every 5 seconds and update the bar graph. Without fail, if I leave the page open in the background, it eventually gets to the point of the old
aw snap google chrome has run out of memory
I've gone through a bunch of sites on memory leaks and done what I could to prevent them, but there is a decent chance I messed something up. The problem is that, other than that error arriving without fail on this page (and I've written plenty of JavaScript applications where this doesn't happen), there is absolutely no evidence the leak is happening. The data I'm retrieving every 5 seconds is 212 KB, and when I do a heap recording, heap memory peaks at 25 MB and doesn't really increase (it generally bounces between 10 and 25 MB), so it seems like that 212 KB is being garbage collected and is not accumulating in the heap. Similarly, I've watched the task manager with this as the only open tab; there is a good amount of fluctuation but definitely no trend upwards. I've taken heap snapshots, which tend to be in the 10-15 MB range, though I don't really understand how to read them. Here's what one looks like after running for 15 minutes or so:
It's just getting extremely frustrating; I've spent 20-30 hours on this, but it seems like a case of a watched pot never boiling. If I look for evidence of it happening I can never find any, but if I leave the page open while I'm away from my computer for a few hours, the page crashes without fail. It's almost like the garbage collector just waits for me to leave my computer before deciding to stop running.
Any thoughts on what could be the culprit here or next steps to attempt to identify the culprit? This is my first experience with a memory leak and it's driving me crazy. Thanks.
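For reference, the usual way to keep a periodically refreshed d3 chart from accumulating DOM nodes is the enter/update/exit join. A generic sketch (not the asker's actual code; the #chart selector and the user/hits fields are invented):

    // Generic d3 (v3-era) bar update: join data, add bars for new users,
    // remove bars for departed users, and update the rest in place.
    function redraw(data) {
      var bars = d3.select('#chart').selectAll('rect')
          .data(data, function (d) { return d.user; });
      bars.enter().append('rect');
      bars.exit().remove();
      bars
          .attr('width', 20)
          .attr('height', function (d) { return d.hits; });
    }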
In one of our apps, we're finding that when usage spikes, RSS (as reported by top(1) for example) increases, hits a plateau and starts oscillating, but never comes back down. After a bit of a break, stressing the server again results in another higher plateau. This trend continues into the many gigabytes of RAM.
At first we suspected a memory leak on the Javascript side, and so started tracking the statistics returned by process.memoryUsage():
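Along these lines (the exact formatting is illustrative; the one-minute interval matches the traces mentioned below):

    // Log RSS and V8 heap statistics once a minute.
    setInterval(function () {
      var m = process.memoryUsage();
      console.log(new Date().toISOString(),
          'rss=' + (m.rss / 1048576).toFixed(1) + 'MB',
          'heapTotal=' + (m.heapTotal / 1048576).toFixed(1) + 'MB',
          'heapUsed=' + (m.heapUsed / 1048576).toFixed(1) + 'MB');
    }, 60000);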
A few phases here:
After starting the server, we hit it with 100 concurrent long-running requests for a few minutes and then left the server running for the evening. RSS stayed at around 320 MB for the night, and heap usage just kept oscillating. We were emitting traces every minute, so maybe this explains the movement in the heap.
At around 9 a.m., we hit it with another round of concurrent requests and transitioned to a workflow involving 2 threads of rolling long-running requests (downloads of 10 MB of data). We kept these running until around 2 p.m. and then stopped them. During this phase, after the initial bump, RSS again leveled off and stayed between around 400 and 450 MB.
The last bit is another round of concurrent requests. Here, RSS grew to between 480 MB and 500 MB.
I'm wondering if there are ways to figure out what's going on outside the heap that could be consuming all of this memory and not releasing it. Short of that, are there any hidden options for limiting memory usage outside of the heap?
And I have to ask. Are we making too much of this? Is this about normal for Node?
I'm getting some traffic to a server of mine and I'm not sure how to deal with this problem.
I've added nodetime to my app, and here's the result of a heap snapshot. Retainers > Other is up to 88% from 78% (in a matter of a couple of minutes).
The overall system's free memory is also decreasing:
It's slow, but it definitely happens. The jump around 21:20 is when I restarted the server.
The server itself basically collects logs: it saves incoming requests to MongoDB, reads from MongoDB once, and occasionally sets a Redis key. In other words, it's a pretty simple set-up.
How do I track down what this buffer is? In addition, is there a list somewhere of basic don'ts that can cause this type of issue?
I should also mention that running stress tests with ab causes the server to consume proportionately more memory, so it's definitely a node.js issue and likely not another process that's eating up the memory.
Would it be helpful to dig through the code and rename as many anonymous functions as possible?
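Named function expressions do show up by name in heap snapshots and CPU profiles, which makes retainers easier to attribute. A tiny illustration, with an invented collection and handler name:

    // Anonymous: appears as "(anonymous function)" in snapshots/profiles.
    collection.insert(entry, function (err) { /* ... */ });

    // Named function expression: appears as "onLogInsert" instead.
    collection.insert(entry, function onLogInsert(err) { /* ... */ });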
After having learnt Node, JavaScript and all the rest the hard way, I am finally about to release my first web app.
So I subscribed to Amazon Web Services and created a micro instance, counting on the first-year free tier to let me make the app available to the world.
My concern is more about hidden costs. I know that the free tier comes with 1 million I/O requests per month for Amazon EC2 EBS.
Thing is, I started testing my app on the EC2 instance to check that everything was running fine, and I am already at more than 100,000 I/O requests. And I have basically been the only one using it so far (the instance has been running for 37 hours).
So I am quite afraid of what could happen if my app gets some traffic, and I don't want to end up with a huge unexpected bill at the end of the month.
I find it quite surprising, because I mainly serve static content, and my server-side code consists of:
Receiving a search request from a client
1 HTTP request to a website
1 HTTPS request to the YouTube API
Saving the data to MongoDB
Sending the results to the client
Do you have any advice on how to dramatically reduce my IO?
I don't use any other Amazon services so far; maybe I am missing something?
Or maybe the Amazon free tier is not enough in my case, but then what can it be enough for? I mean, my app is really simple after all.
I'd be really glad for any help you could provide.
Thanks!
You did not mention the total number of visits to your app, so I am assuming you have fairly few visits.
What are I/O requests?
A single I/O request is a read/write instruction that reaches the EBS volume. Beware: large reads/writes are broken into multiple smaller pieces the size of the volume's block, so one large operation can count as many I/O requests. For example, if the block size were 16 KB, a single 1 MB write would count as 64 I/O requests.
Possible reasons for high I/O:
Your app uses a lot of RAM. Once you hit the limit, the OS starts constantly swapping memory to and from the swap area on your disk.
This is most likely the problem: the MongoDB searches. A MongoDB search can internally be a long, complex series of queries. In one of the answers to this question, the person was using MySQL and it cost him 1 billion I/O requests in 24 days. So one database search can translate into many I/O requests.
Caching is disabled, or you write/modify a lot of files. You mentioned you were testing; the free tier is just not suitable for developing stuff.
You should read this, in case you want to know what happens after the free tier expires.
I recently ran into a similar situation recording very high I/O request rates for a website with little to no traffic. The culprit seems to be a variation of what #prajwalkman discovered testing Chef deployments on a micro instance.
I am not using Chef, but I have been using boto3, Docker and Git to automatically 'build' test images inside a micro instance. Each time I went through my test script, a new image was built, and I had not been careful to read the fine print about the default value of the VolumeType argument to the boto3 run_instances command. Each test image was being built with the 'standard' volume type which, according to current EBS pricing, bills at $0.05 per million I/Os, whereas the 'gp2' general-purpose volume type has a flat cost of $0.10 per GB per month with no extra charge for I/O.
With a handful of lean Docker containers taking up a total of 2 GB, on top of the 1.3 GB for the amazon-ecs-optimized-ami, my storage is well within the free-tier allowance. So, once I fixed the VolumeType attribute in the BlockDeviceMappings settings in my scripts to 'gp2', I no longer had an I/O problem on my server.
Prior to that point, the constant downloading of Docker images and Git repos had produced nearly 10 million I/Os in less than a week.
The micro instance and the free tier are meant for testing Amazon's offerings, not as a free way for you to host your site/web application.
You may have to pay money at the end of the month, but I really doubt you could pay less by hosting with some other company. AFAIK, AWS really is the rock bottom of the price charts.
As for the IO requests themselves, it's hard to give generic advice. I once was in a situation where my micro instance racked up ridiculous number of IO requests. Turns out testing Chef deployments on EC2 is a bad idea.
I/O Requests have to do with reading and writing blocks to EBS volumes. You can reduce this by using as much in memory caching as possible. Micro instances only have about 613 MB of memory available, so you may not be able to do much here.
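A minimal sketch of that kind of in-memory caching for the search flow described above; the TTL, cache shape, and function names are assumptions:

    var cache = {};                          // term -> { time, data }
    var TTL = 10 * 60 * 1000;                // 10 minutes, arbitrary

    function cachedSearch(term, fetch, callback) {
      var hit = cache[term];
      if (hit && Date.now() - hit.time < TTL) {
        return callback(null, hit.data);     // served from RAM: no disk I/O
      }
      fetch(term, function (err, data) {     // your existing search path
        if (!err) cache[term] = { time: Date.now(), data: data };
        callback(err, data);
      });
    }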
Ok, so it seems like I/O requests are related to the EBS volume, and that caching may reduce them.
Something I had not considered though is all the operations I made to get my app running.
I updated the Linux image, installed Node and npm, several modules, MongoDB, ...
This is likely to be the main cause of the I/O.
The number of requests hasn't grown much in the last few days, during which the server stayed mostly idle.