I have some .js files that scrape information from web pages with Cheerio. What I want is to give them something like a setTimeout with a one-day period, so they restart themselves whether or not new data has arrived. I guess I shouldn't do this with setTimeout, because I'll have 15-20 bot files fetching data. Should I use threads instead, and if so, how do I run them like a service?
I would recommend using cron for Node; it's an implementation of cron and is really simple to use. It lets you schedule tasks to run exactly when you want them to, without overloading your server with setTimeout calls, although from what you say you won't have many, so it won't make much of an impact.
Actually, 15-20 jobs sounds fine to me for setTimeout.
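A minimal sketch of the setTimeout approach for 15-20 scrapers, assuming each scraper can be wrapped in an async function (the `scrapeSiteA`/`scrapeSiteB` names in the usage comment are hypothetical). Because scraping mostly waits on the network, one Node.js process can usually juggle that many timers without threads. Each job re-arms its own timer only after a run finishes, so runs never overlap:

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Runs `run` now, then again `periodMs` after each run finishes.
// Chaining setTimeout (instead of setInterval) means a slow scrape can
// never overlap its own next run.
function scheduleCrawler(name, run, periodMs = ONE_DAY_MS) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await run();
    } catch (err) {
      // A failing scraper logs and retries next period; others are unaffected.
      console.error(`[${name}] run failed:`, err);
    }
    if (!stopped) {
      timer = setTimeout(tick, periodMs);
    }
  }

  tick();
  return {
    stop() {
      stopped = true;
      clearTimeout(timer);
    },
  };
}

// Usage (scrapeSiteA and scrapeSiteB are hypothetical Cheerio scrapers):
// scheduleCrawler("site-a", scrapeSiteA);
// scheduleCrawler("site-b", scrapeSiteB);
```

The trade-off versus a cron library is that the schedule only exists in memory: restarting the process restarts every one-day countdown.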
You might want to check out cron-style tools such as https://www.npmjs.com/package/node-schedule, and then schedule your crawlers to rescan their targets as often as you need; this would be more efficient.
I have a Node.js server that runs some scripts and parses files for my client to access. The data sits at certain URLs and I fetch it on the client side. The problem is that the data goes out of date, so the scripts need to be run again. Is there a way to run these scripts on my server every, say, 15 minutes and post the result to the URL?
What you need is job scheduling. There are a few ways to achieve it:
Handle it inside your Node.js application:
Use the setInterval method to call a function repeatedly. Keep in mind that Node.js timers only guarantee a minimum delay: because of the event loop, a callback can run later than scheduled if the process is busy with other work.
Use a job-scheduling library that lets you schedule recurring tasks. A quick search turns up two: node-schedule and node-cron.
Create a system cron job that triggers your Node.js app to handle the recurring task.
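A minimal sketch of the first option (setInterval inside the Node.js app), assuming the scripts can be wrapped in one async function (`runScripts` below is a hypothetical placeholder) and that clients read the result from a `/data` URL on port 3000 (both also assumptions). The server keeps the latest result in memory and refreshes it every 15 minutes, skipping a tick if the previous run has not finished:

```javascript
const http = require("http");

// Hypothetical stand-in for the scripts that parse files in the question.
async function runScripts() {
  return { generatedAt: new Date().toISOString() };
}

let latest = null;   // last successful result, served to clients
let running = false; // guards against overlapping runs

async function refresh() {
  if (running) return; // previous run still in progress: skip this tick
  running = true;
  try {
    latest = await runScripts();
  } catch (err) {
    console.error("refresh failed, keeping previous data:", err);
  } finally {
    running = false;
  }
}

function start({ intervalMs = 15 * 60 * 1000, port = 3000 } = {}) {
  refresh(); // populate once at startup
  const timer = setInterval(refresh, intervalMs);

  const server = http.createServer((req, res) => {
    if (req.url !== "/data") {
      res.writeHead(404);
      return res.end();
    }
    if (latest === null) {
      res.writeHead(503); // first run has not finished yet
      return res.end();
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(latest));
  });
  server.listen(port);
  return { timer, server };
}
```

Calling `start()` begins serving; `clearInterval(timer)` and `server.close()` shut it down again.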
I need a Node.js script I wrote to run every 10 minutes and grab data from an API. I used to be a Unix admin, and something like this would be accomplished with a cron job. I know I'm going to have to set up some kind of scheduled execution on the server where my script resides. What's the best way to approach this?
I think node-cron would solve your problem.
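For comparison, a cron expression such as `*/10 * * * *` means "at every tenth minute of the hour", which is the kind of expression you would hand to node-cron. The sketch below imitates that schedule with only built-in timers, to show what a cron-style scheduler does differently from a plain `setInterval`: it aligns runs to clock boundaries (computed on the UTC clock here) instead of counting from process start. The function names are hypothetical:

```javascript
// First multiple of stepMs strictly after nowMs (epoch milliseconds, UTC).
function nextBoundary(nowMs, stepMs) {
  return (Math.floor(nowMs / stepMs) + 1) * stepMs;
}

// Runs `task` at hh:00, hh:10, hh:20, ... for stepMinutes = 10.
// Returns a function that stops the schedule.
function everyNMinutes(stepMinutes, task) {
  const stepMs = stepMinutes * 60 * 1000;
  let lastTarget = 0;
  let timer = null;
  let stopped = false;

  const arm = () => {
    // Never reuse a boundary we already ran for, even if the timer fired a
    // millisecond early; if a run overran, missed boundaries are skipped.
    lastTarget = nextBoundary(Math.max(Date.now(), lastTarget), stepMs);
    timer = setTimeout(async () => {
      try {
        await task();
      } catch (err) {
        console.error("scheduled task failed:", err);
      }
      if (!stopped) arm();
    }, lastTarget - Date.now());
  };

  arm();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```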
I know I'm going to have to set up some kind of scheduled execution on the server where my script resides
If you have just one script for now and don't expect to have many, you can simply keep it in your main repository with its own configuration.
Alternatively, you can set up a separate repository just for your scripts, which gives you much more control over how you run them, what language you write them in, who can access the code, etc.
I'm developing an app that should receive a .CSV file, save it, scan it, insert every record into the DB, and finally delete the file.
With a file of about 10,000 records there are no problems, but with a larger file the PHP script runs correctly and all the data is saved to the DB, yet "ERROR 504 The server didn't respond in time." is printed.
I'm scanning the .CSV file with PHP's fgetcsv() function.
I've already edited the settings in php.ini (max_execution_time = 120, etc.), but nothing changes: after 1 minute the error is shown.
I've also tried using a JavaScript function to show an alert every 10 seconds, but the error is still shown.
Is there a way to avoid this problem? Is it possible to pass some data from the server to the client every few seconds to avoid the error?
Thanks
Scaling issues like this are typically the point at which you need to start evolving your system architecture, and your application will need to work asynchronously. The problem you are having is very common (some of my team are dealing with one as I write), and everyone has to deal with it eventually.
Solution 1: Cron Job
The most common solution is to create a cron job that periodically scans a queue for new work to do. I won't explain the nature of the queue, since everyone has their own (some are all right and others are really bad), but it typically involves a DB table with the relevant information and a job status (one of the bad solutions), or something built on Memcached; MongoDB is also quite popular.
The "problem" with this solution is, once again, scaling. Cron jobs run periodically at fixed intervals, so if a task takes a particularly long time, jobs are likely to overlap. This means you need to build in some kind of locking or use a scheduler that supports running jobs sequentially.
In the end, you won't run into the timeout problem, and you can typically dedicate an entire machine to running these tasks, so memory isn't as much of an issue either.
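The locking mentioned above can be as simple as a lock file. A sketch in JavaScript (the question uses PHP, where `fopen()` mode `'x'` gives the same create-or-fail behavior), assuming each cron run calls `runQueueOnce` with a hypothetical `processQueue` function and lock path: opening the file with the `"wx"` flag fails with `EEXIST` if a previous run still holds the lock, so overlapping runs simply skip. A crash can leave a stale lock file behind, so a real setup would also check its age or the PID stored in it.

```javascript
const fs = require("fs");

// Creates the lock file exclusively; returns false if another run holds it.
function tryAcquireLock(lockPath) {
  try {
    const fd = fs.openSync(lockPath, "wx"); // fails if the file already exists
    fs.writeSync(fd, String(process.pid));  // record who holds the lock
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === "EEXIST") return false; // previous run still active
    throw err;
  }
}

function releaseLock(lockPath) {
  fs.rmSync(lockPath, { force: true });
}

// One cron-triggered pass over the queue, skipped if the last one is running.
async function runQueueOnce(lockPath, processQueue) {
  if (!tryAcquireLock(lockPath)) {
    console.log("previous run still active, skipping");
    return false;
  }
  try {
    await processQueue();
    return true;
  } finally {
    releaseLock(lockPath); // always release, even if processing throws
  }
}
```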
Solution 2: Worker Delegation
I'll use Gearman as an example for this solution, but there are also tools that implement standards like AMQP, such as RabbitMQ. I prefer Gearman because it's simpler to set up and it's designed more for work processing than for messaging.
This kind of delegation has the advantage that the work starts as soon as you submit it. The server is basically waiting for something to do (not unlike an Apache server); when it gets a request, it shifts the workload from the client onto one of your "workers". These are scripts you've written that run indefinitely, listening to the server for work.
You can have as many of these workers as you like, each running the same or different types of tasks. This means scaling is determined by the number of workers you have, and this scales horizontally very cleanly.
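To illustrate the scaling point, here is an in-process JavaScript sketch of the pattern, not Gearman's API: jobs sit in a shared first-in, first-out queue and a configurable number of workers pull from it concurrently. With Gearman or RabbitMQ the workers would be separate long-running processes, possibly on other machines, but the idea that throughput grows with the worker count is the same. `runWithWorkers` and its handler are hypothetical names.

```javascript
// Processes `jobs` with `workerCount` concurrent workers sharing one queue.
// Results are collected in completion order, not submission order.
async function runWithWorkers(jobs, workerCount, handler) {
  const queue = [...jobs]; // oldest job first
  const results = [];

  async function worker(id) {
    // Each worker keeps taking the oldest job until the queue is empty.
    while (queue.length > 0) {
      const job = queue.shift();
      results.push(await handler(job, id));
    }
  }

  const workers = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker(i));
  await Promise.all(workers);
  return results;
}
```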
Conclusion:
In my opinion, cron jobs are fine for automated maintenance, but they run into problems when work needs to run concurrently, which makes workers the ideal choice here.
Either way, you are going to need to change the way users receive feedback on their requests. They will need to be told that their request is processing and to check back later for the result; alternatively, you can periodically track the status of the running task and give the user real-time feedback via Ajax. That's a little tricky with cron jobs, since you need to persist the state of the task during its execution, but Gearman has a nice built-in solution for doing just that.
http://php.net/manual/en/book.gearman.php
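A sketch of the "respond now, check later" flow in JavaScript, assuming an in-memory `Map` as the status store (a real deployment would keep status somewhere every process can read, such as the DB, Redis, or Gearman's built-in status tracking). `submitJob` returns an id immediately, so the HTTP request finishes long before any 504 timeout, and the page can poll `getStatus(id)` through an Ajax endpoint. All names are hypothetical:

```javascript
const { randomUUID } = require("crypto");

const jobs = new Map(); // job id -> { state, done, total, error? }

// Registers the job and returns its id right away; work starts afterwards.
function submitJob(rows, processRow) {
  const id = randomUUID();
  jobs.set(id, { state: "queued", done: 0, total: rows.length });

  setImmediate(async () => {
    const job = jobs.get(id);
    job.state = "running";
    try {
      for (const row of rows) {
        await processRow(row); // e.g. insert one CSV record into the DB
        job.done++;            // progress the client can poll
      }
      job.state = "finished";
    } catch (err) {
      job.state = "failed";
      job.error = String(err);
    }
  });

  return id;
}

// What a polling endpoint would return as JSON.
function getStatus(id) {
  return jobs.get(id) ?? { state: "unknown" };
}
```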
What is a good approach to handling background processes in a Node.js application?
Scenario: after a user posts something to an app, I want to crunch the data, request additional data from external resources, etc. All of this is quite time-consuming, so I want it out of the req/res loop. Ideally there would be a queue of jobs that I can quickly push a job onto, and a daemon or task runner would always take the oldest one and process it.
In RoR I would have done it with something like Delayed Job. What is the Node equivalent of this API?
If you want something lightweight that runs in the same process as the server, I highly recommend Bull. It has a simple API that allows fine-grained control over your queues.
If you're familiar with Ruby's Resque, there is a Node implementation called node-resque.
Bull and node-resque are both backed by Redis, which is ubiquitous among Node.js worker queues. They can do what RoR's DelayedJob does; it's a matter of which specific features you want and your API preferences.
Background jobs are not directly related to your web service work, so they should not be in the same process. As you scale up, the memory usage of the background jobs will impact the web service performance. But you can put them in the same code repository if you want, whatever makes more sense.
One good choice for messaging between the two processes would be Redis, if dropping a message every now and then is OK. If you want "no message left behind", you'll need a more heavyweight broker like RabbitMQ. Your web service process can publish and your background job process can subscribe.
It is not necessary for the two processes to be co-hosted, they can be on separate VMs, Docker containers, whatever you use. This allows you to scale out without much trouble.
If you're using MongoDB, I recommend Agenda. That way you don't need to run a separate Redis instance, and features such as scheduling, queuing, and a web UI are all available. The Agenda UI is optional and can, of course, be run separately.
I would also recommend setting up a loosely coupled abstraction between your application logic and the queuing/scheduling system, so the entire background processing system can be swapped out if needed. In other words, keep as much application/processing logic as possible out of your Agenda job definitions to keep them lightweight.
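A minimal sketch of that separation, with hypothetical names: the job bodies are plain functions, and application code only sees an `enqueue(name, payload)` method. The in-memory adapter stands in for a real Agenda or Bull adapter, which would expose the same method and, inside its own job definition, just call the matching handler. Swapping queue systems then means writing a new adapter, not touching the handlers.

```javascript
// Application logic: plain functions that know nothing about the queue library.
const handlers = {
  async sendWelcomeEmail({ userId }) {
    // Hypothetical job body; real code would call your mailer here.
    return `welcome email sent to user ${userId}`;
  },
};

// Adapter: the only piece that changes if you swap queue systems.
// This in-memory version runs the handler immediately.
function createInMemoryQueue(handlerMap) {
  return {
    async enqueue(name, payload) {
      const handler = handlerMap[name];
      if (!handler) throw new Error(`no handler registered for "${name}"`);
      return handler(payload);
    },
  };
}

// Request-handling code depends only on the enqueue() interface.
async function onUserSignup(queue, userId) {
  return queue.enqueue("sendWelcomeEmail", { userId });
}
```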
I'd like to suggest using Redis for scheduling jobs. It has plenty of different data structures, so you can always pick the one that best suits your use case.
You mentioned RoR and DJ, so I assume you're familiar with Sidekiq. You can use node-sidekiq for job scheduling if you want to, but it's suboptimal in my opinion, since its main purpose is to integrate Node.js with RoR.
For daemonizing workers I'd recommend PM2. It's widely used and actively maintained. It solves a lot of problems (e.g. deployment, monitoring, clustering), so make sure it isn't overkill for you.
I tried bee-queue & bull and chose bull in the end.
I first chose bee-queue because it is quite simple and its examples are easy to understand, while Bull's examples are a bit complicated. The "Bee Queue's Origin" page in bee-queue's wiki also resonates with me. But bee-queue has two problems: (1) its issue resolution time is quite slow, and its latest update was 10 months ago; (2) I can't find an easy way to pause or cancel a job.
Bull, on the other hand, updates its code frequently and responds to issues. The "Node.js job queue evaluation" said Bull's weakness is "slow issues resolution time", but my experience is the opposite!
Anyway, their APIs are similar, so it is quite easy to switch from one to the other.
I suggest using a proper Node.js framework to build your app.
I think the most powerful and easiest to use is Sails.js.
It's an MVC framework, so if you are used to developing in RoR, you will find it very easy!
If you use it, a powerful (in JavaScript terms) job manager is already available:
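```javascript
// Six-field pattern (seconds first): fires at 01:01:00 every Sunday, Dublin time.
// Argument order matches the `cron` package's CronJob: pattern, onTick, onComplete, start, timeZone.
new sails.cronJobs('0 01 01 * * 0', function () {
  sails.log.warn("START ListJob");
}, null, true, "Europe/Dublin");
```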
If you need more info, don't hesitate to contact me!
I'm developing a website where I need to execute some code at a particular time.
Which is the faster and better choice: writing a cron job, or using a JavaScript timing event (or something similar in JavaScript)?
You are asking a question about two completely different things.
A cron job runs on the server, while JavaScript (unless you are using Node.js) runs on the client. Depending on whether this is a task that:
must be performed and cannot rely on the client (e.g. the data is sensitive), or
can depend on the client execution (which means the browser window should remain open and the JavaScript should be enabled),
choose Cron (1) or JavaScript (2) respectively.
It is really a comparison between apples and oranges. Unless you tell us whether you want orange juice or apple pie, we can't help you further. Just remember that cron gives you reliable server-side task execution, while a JavaScript timeout gives per-user (or rather per-client), less reliable execution.
It entirely depends on the nature of the code you need to execute at a particular time.
If it's something that has to happen every day at 2pm or whatever, regardless of whether or not anyone's looking at the website, then you should use a cron job for that.
On the other hand, if it's something that needs to execute at a certain time per user (e.g., automatically logging a user out of a page after some amount of idle time), then JavaScript timing functions are the appropriate choice.
JavaScript timing functions only work while someone is actually viewing the page, and they run separately for every user who has the page open, which may or may not be desirable depending on your situation.
Of course, you may be running Node.js on the server, in which case you can use JavaScript timing functions as if they were a cron job.
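A per-user idle timer like the one described can be a resettable `setTimeout`. In this sketch `onIdle` is a hypothetical callback (for example, redirecting to a logout URL), and in a browser you would call `touch()` from activity listeners such as `mousemove` or `keydown`. Every open page runs its own copy, which is exactly the per-client behavior described above:

```javascript
// Calls onIdle once no activity has been reported for timeoutMs.
function createIdleTimer(timeoutMs, onIdle) {
  let timer = setTimeout(onIdle, timeoutMs);
  return {
    // Report user activity: restart the countdown.
    touch() {
      clearTimeout(timer);
      timer = setTimeout(onIdle, timeoutMs);
    },
    // Cancel entirely, e.g. when the user logs out manually.
    stop() {
      clearTimeout(timer);
    },
  };
}
```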
In short, use cron
In your comment to another answer, you said:
I want to execute php function every week one time
In this case, you have one main option (assuming you are using *nix) and that is cron (I don't know what the Windows alternative is). Cron is specifically designed for this function, and whether or not you choose to use it, it is most likely running on your server anyway (for other system functions) so speed is not an issue.
Don't use Node.js
Node.js is an alternative server-side technology to PHP; you would use it on the server instead of PHP. If you're already using PHP, then forget it. The only reason Node.js has been mentioned is because you've asked about JavaScript.
Also, for a weekly timing event, a JavaScript timer wouldn't be a good idea. The setTimeout() function works in milliseconds and is good for delays of seconds and minutes (possibly hours), but not weeks.
If you were to use server-side JavaScript (like Node.js), you would probably need to do something similar to the PHP Alternative below.
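One concrete limit behind the advice about long timers: `setTimeout` stores its delay as a 32-bit signed integer, so anything above 2,147,483,647 ms (just under 25 days) is not honored, and Node.js documents that such a delay is replaced with 1 ms. A weekly delay (604,800,000 ms) does fit, but like any in-memory timer it is lost whenever the process restarts, which is why persisting the schedule is the safer route. If you still want a long in-process timer, a sketch that waits in chunks might look like this (`setLongTimeout` is a hypothetical helper, not a built-in):

```javascript
const MAX_TIMEOUT_MS = 2 ** 31 - 1; // largest delay setTimeout honors

// Schedules fn after delayMs, even beyond MAX_TIMEOUT_MS, by waiting in
// chunks. The schedule lives only in memory: a restart loses it.
function setLongTimeout(fn, delayMs) {
  const handle = { timer: null };
  const step = (remaining) => {
    if (remaining <= MAX_TIMEOUT_MS) {
      handle.timer = setTimeout(fn, remaining);
    } else {
      handle.timer = setTimeout(() => step(remaining - MAX_TIMEOUT_MS), MAX_TIMEOUT_MS);
    }
  };
  step(delayMs);
  return handle;
}

function clearLongTimeout(handle) {
  clearTimeout(handle.timer);
}
```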
PHP Alternative
Of course, depending on your hosting environment (especially cheaper ones), cron may not be available. In this case you would have to come up with a different strategy, and you would probably be best off using PHP. Something that I've seen done before goes along these lines:
1. Have some register of jobs that need to be performed (in a database, a file, or whatever).
2. Every time you run your main PHP script (usually index.php), check the register to see if there are any outstanding jobs.
2.a. Run the job.
2.b. Update the register, so you remember the last time the job was performed.
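The register steps above translate directly to JavaScript, which is roughly what the Node.js case mentioned earlier would need. A sketch assuming a JSON file as the register, mapping each job name to the time it last ran; the file path, job names, and intervals are hypothetical. Call `checkRegister()` on each request (or from a periodic timer) and it runs only the jobs that are due. Like the PHP version, two simultaneous requests could both see a job as due, so add a lock if that matters:

```javascript
const fs = require("fs");

// Reads the register; a missing file just means nothing has run yet.
function loadRegister(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return {};
    throw err;
  }
}

// jobs: [{ name, intervalMs, run }]. Returns the names of jobs that ran.
async function checkRegister(file, jobs, nowMs = Date.now()) {
  const register = loadRegister(file);
  const ran = [];
  for (const { name, intervalMs, run } of jobs) {
    const lastRun = register[name] ?? -Infinity; // never run: always due
    if (nowMs - lastRun >= intervalMs) {
      await run();                                      // 2.a. run the job
      register[name] = nowMs;                           // 2.b. remember when
      fs.writeFileSync(file, JSON.stringify(register));
      ran.push(name);
    }
  }
  return ran;
}
```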
Pros:
It works if you don't have access to cron.
Cons:
If your script is not run very often (because this method relies on people visiting your page), your jobs may not be run as often as you like.
If your script is run very often, you will suffer unnecessary overhead in your script.
If your jobs take a long time to run, they will affect page load times.
You're basically replicating cron, but using PHP which is far less efficient than using cron.
It's unlikely (unless you invest a lot of time) that you'll develop a solution that is as good as cron.
For a JavaScript timing event to run, you would need to open the web page. That means you have to expose that page publicly, and you don't want to do that. Cron jobs are easy and effective. I like them. You should do that.