My server guy wants to use Apache, and he also wants to program a daemon to handle automated file deletion, along with other automated file tasks. Can Node.js automatically delete files once they reach a certain age? And in addition, can Node.js read file timestamps? My server guy swears it cannot, and that "daemons are superior for automated file tasks!"
Thank you.
Sorry if my question isn't very comprehensible; I'm in a hurry to get this answered.
Yes, it can. Node's built-in fs module exposes file timestamps (atime, mtime, ctime) through fs.stat(), and fs.unlink() deletes files.
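For example, a minimal sketch that deletes everything older than a threshold; the directory path and age limit are made up for illustration:

    // Delete files older than MAX_AGE_MS from a directory.
    const fs = require('fs');
    const path = require('path');

    const DIR = '/tmp/uploads';              // hypothetical directory
    const MAX_AGE_MS = 24 * 60 * 60 * 1000;  // 24 hours

    for (const name of fs.readdirSync(DIR)) {
      const file = path.join(DIR, name);
      const { mtimeMs } = fs.statSync(file); // the timestamp the question asks about
      if (Date.now() - mtimeMs > MAX_AGE_MS) {
        fs.unlinkSync(file);                 // past the threshold: delete it
      }
    }

Run it from cron, or wrap it in setInterval if you want a long-running process.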
For such tasks, I think a cron job that executes some script (it can be Node.js, PHP, Perl, sh, whatever) would work just fine.
In the end, it depends on your problem. A daemon sounds like overkill, but it might be the only approach.
One way to use Node might be to use inotify/dnotify to be notified when files are created, then set callbacks with setTimeout to clear them; or keep a list, periodically iterate over it, and delete the files that have expired.
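A rough sketch of the first idea using the built-in fs.watch (inotify-backed on Linux); the directory and TTL are placeholders:

    // Watch a directory and schedule each new file for deletion.
    const fs = require('fs');
    const path = require('path');

    const DIR = '/tmp/uploads';    // hypothetical directory
    const TTL_MS = 60 * 60 * 1000; // keep files for 1 hour

    fs.watch(DIR, (eventType, filename) => {
      if (!filename) return;
      const file = path.join(DIR, filename);
      // 'rename' fires for both creation and deletion, so only schedule
      // cleanup if the file actually exists right now.
      if (eventType === 'rename' && fs.existsSync(file)) {
        setTimeout(() => fs.unlink(file, () => {}), TTL_MS);
      }
    });

Note that the pending watcher and timers keep the process alive, which ties into the next point.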
The distinction between a daemon and Node is not correct: a Node application can be a daemon or a one-off script. Node simply keeps running as long as the event loop has pending work (timers, watchers, open connections); once nothing is left, the script terminates.
I am developing a Node.js application in which a user can schedule a CPU-intensive job to be run. To keep the event loop free, I want to run each job in a separate process. When the user submits a job, I make an entry in the database (PostgreSQL) with the timestamp, along with some other information. The jobs should be run in FCFS order. Upon some research on Stack Overflow, I found people suggesting Bull (with Redis), Kue, RabbitMQ, etc. as solutions. My doubt is why do I need to use those when I can just poll the database and get the oldest job. I don't intend to poll the db at a regular interval, but only when the current job is done executing.
My application does not receive many simultaneous requests, and users do not wait for the job to be completed; instead they log out and are notified by mail when the job is done. What are the potential drawbacks of using the child_process (spawn/exec) module as a solution?
My doubt is why do I need to use those when I can just poll the database and get the oldest job.
How are you planning on handling failures? What if Node.js crashes with a job mid-progress: would that affect your users? Would you then retry a failed job? How do you support back-off? How many attempts before it should stop completely?
These questions are answered in the Bull implementation, RabbitMQ and almost every solution you'll find for your current challenge.
From what I've noticed, child_process is a lower-level building block (low-level within Node.js), meaning that a lot of the functionality you'll typically require (failover, retries, back-off) isn't included. You'll have to implement it yourself.
That's where it usually becomes more trouble than it's worth, although admittedly managing, monitoring and deploying a Redis server may not be the most convenient solution either.
Have you considered a different approach? How would a periodic cron job work, for example?
The challenge with such a system is usually how you plan to handle failure and what impact failure has on your application and end-users.
I will say, in defense of Bull: for a CPU-intensive task I prefer to have a separate instance of the worker process, which I can then re-deploy as many times as I need. This keeps my back-end code separated and generally easier to manage, whilst also giving me the ability to easily scale up or down when required.
EDIT: I said "more trouble than it's worth"; if you're looking to really learn how technology like this is developed, go with child_process and build your own abstractions on top. If it's something you need today, use Bull, RabbitMQ or any purpose-built alternative.
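For context, here is roughly what the hand-rolled version looks like; this is a minimal sketch, assuming a jobs table with id/status/created_at columns and a hypothetical worker.js script. Notice that retries, back-off and crash recovery, the parts Bull and RabbitMQ give you for free, are exactly what is missing:

    // Poll-when-done FCFS runner: fetch the oldest pending job, run it
    // in a child process, and only look for the next job on exit.
    const { spawn } = require('child_process');
    const { Pool } = require('pg');

    const pool = new Pool(); // connection settings come from the environment

    async function runNext() {
      const { rows } = await pool.query(
        "SELECT id FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 1"
      );
      if (rows.length === 0) {
        setTimeout(runNext, 5000); // queue is empty: check again shortly
        return;
      }
      const jobId = rows[0].id;
      await pool.query("UPDATE jobs SET status = 'running' WHERE id = $1", [jobId]);

      const worker = spawn('node', ['worker.js', String(jobId)], { stdio: 'inherit' });
      worker.on('exit', async (code) => {
        // No retry or back-off here; if this process itself crashes,
        // the job is stuck in 'running' forever.
        await pool.query("UPDATE jobs SET status = $1 WHERE id = $2",
          [code === 0 ? 'done' : 'failed', jobId]);
        runNext(); // FCFS: poll again only once the current job finishes
      });
    }

    runNext();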
I need a NodeJS script I wrote to run every 10 minutes and grab data from an API. I used to be a Unix admin and something like this would be accomplished with a cron job. I know I'm going to have to set up some kind of scheduled execution on the server where my script resides. What's the best way to approach this?
I think node-cron would solve your problem.
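For example, a minimal sketch; the URL is a placeholder, and the global fetch assumes Node 18+:

    // Run a fetch every 10 minutes using node-cron.
    const cron = require('node-cron');

    cron.schedule('*/10 * * * *', async () => {
      const res = await fetch('https://api.example.com/data'); // placeholder URL
      const data = await res.json();
      console.log('fetched', data); // replace with your own processing
    });

Unlike a system cron entry, this keeps one Node process running and schedules the work inside it.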
I know I'm going to have to set up some kind of scheduled execution on the server where my script resides
If you have just one script for now and you don't expect to add many more, you can simply place it in your main repository with its own configuration.
Alternatively, you can set up a whole new repository just for your scripts, which gives you a lot more control over how you run them, what language you write them in, who can access the code, etc.
I'm developing an app that should receive a .CSV file, save it, scan it, insert the data from every record into a DB, and finally delete the file.
With a file of about 10,000 records there are no problems, but with a larger file the PHP script runs correctly and all the data is saved into the DB, yet the response is ERROR 504: The server didn't respond in time.
I'm scanning the .CSV file with the PHP function fgetcsv().
I've already edited settings in the php.ini file (max_execution_time = 120, etc.) but nothing changed; after 1 minute the error is shown.
I've also tried using a JavaScript function to show an alert every 10 seconds, but in this case too the error is shown.
Is there a solution to avoid this problem? Is it possible to send some data from the server to the client every few seconds to avoid the error?
Thanks.
It's typically when scaling issues pop up that you need to start evolving your system architecture, and your application will need to work asynchronously. The problem you are having is very common (some of my team are dealing with one as I write), and everyone needs to deal with it eventually.
Solution 1: Cron Job
The most common solution is to create a cron job that periodically scans a queue for new work to do. I won't explain the nature of the queue, since everyone has their own; some are alright and others are really bad, but typically it involves a DB table with the relevant information and a job status (<-- one of the bad solutions), or a solution involving Memcached; MongoDB is also quite popular.
The "problem" with this solution is ultimately again "scaling". Cron jobs run periodically at fixed intervals, so if a task takes a particularly long time jobs are likely to overlap. This means you need to work in some kind of locking or utilize a scheduler that supports running the job sequentially.
In the end, you won't run into the timeout problem, and you can typically dedicate an entire machine to running these tasks so memory isn't as much of an issue either.
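The locking idea is language-agnostic; here it is sketched in Node.js to match the rest of this page, with an arbitrary lock path:

    // Refuse to start if a previous run is still holding the lock file.
    const fs = require('fs');
    const LOCK = '/tmp/import.lock'; // arbitrary path

    try {
      // 'wx' fails if the file already exists, i.e. another run is active.
      fs.closeSync(fs.openSync(LOCK, 'wx'));
    } catch (e) {
      console.log('previous run still in progress, exiting');
      process.exit(0);
    }

    try {
      // ... drain the queue here ...
    } finally {
      fs.unlinkSync(LOCK); // release the lock even if the work throws
    }

One caveat: a hard crash can leave a stale lock behind, which is why schedulers with built-in sequential execution are often preferred.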
Solution 2: Worker Delegation
I'll use Gearman as an example for this solution, but other tools implement standards like AMQP, such as RabbitMQ. I prefer Gearman because it's simpler to set up, and it's designed more for work processing than for messaging.
This kind of delegation has the advantage of running immediately after you call it. The server is basically waiting for stuff to do (not unlike an Apache server); when it gets a request, it shifts the workload from the client onto one of your "workers": scripts you've written that run indefinitely, listening to the server for work.
You can have as many of these workers as you like, each running the same or different types of tasks. This means scaling is determined by the number of workers you have, and this scales horizontally very cleanly.
Conclusion:
Crons are fine, in my opinion, for automated maintenance, but they run into problems when they need to work concurrently, which makes running workers the ideal choice.
Either way, you are going to need to change the way users receive feedback on their requests. They will need to be informed that their request is processing and to check back later for the result; alternatively, you can periodically track the status of the running task to provide real-time feedback to the user via AJAX. That's a little tricky with cron jobs, since you will need to persist the state of the task during its execution, but Gearman has a nice built-in solution for doing just that.
http://php.net/manual/en/book.gearman.php
This may be the wrong forum to ask this question, but is it possible to do a "touchless" Node.js deployment? For example, is it possible to push a copy of node.exe and the required packages to a physical location on the drive (assume the machine is generally in a disconnected state), then have a shortcut that executes the appropriate command line to get the Node process running?
I know that this is a loaded question, because without being physically installed on the box, and then running within a Windows Service, you lose all the lifetime management, and that is just scratching the surface of the things that need to be considered. Anyway, I truly appreciate the help and opinions.
Just coming back to clean up this answer: yes, this is possible, but not always advisable. It's better to do a proper installation or leverage containers.
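If you do go this route, the shortcut target is just a command line; a sketch, with hypothetical paths (copy node.exe, node_modules and your app into the same folder):

    C:\tools\myapp\node.exe C:\tools\myapp\server.js

Everything Node needs at runtime resolves relative to those paths, which is what makes the xcopy-style deployment work; what you give up, as noted above, is service lifetime management.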
In Node.js:
If Node is not using multiple threads, we can only run one function at a time. How can this work when a lot of requests arrive at the web server at the same time?
Can someone clarify the picture regarding threads and processes?
Theoretically, if a large number of requests happened in the same second, or if each request had to do something that takes a while (like hard math), your server could get bogged down: either by responding late (by milliseconds) to people at the "end" of the line, or by never finishing all the things Node needs to do to serve the requests at the "front" of the line.
In general, the strategy Node takes is that if you're going to perform a long operation, like querying the database, the program should not sit around waiting; it should "call back" to some other function when the database query is eventually done.
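A runnable illustration of that pattern, with a built-in async file read standing in for the database query (the path is just an example):

    const fs = require('fs');

    fs.readFile('/etc/hostname', 'utf8', (err, data) => {
      // Runs later, once the read completes.
      if (err) return console.error(err);
      console.log('file contents:', data);
    });

    // Execution reaches this line immediately; Node is free to handle
    // other work while the read is in flight.
    console.log('read scheduled');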
I talk more about this in another SO answer. You could Google "node.js is cancer" for other examples of just what you are talking about.
But the prevalence of this strategy is one of the major differences between Node and other languages/frameworks: that's just how Javascript deals.
Now, in practice, several things actually happen.
First, any production Node app really should be running with Cluster or some other solution that provides load balancing. Because you'd have multiple processes of your app working, your setup can do more than one thing at once.
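A minimal sketch of the built-in cluster module: one worker per CPU, all sharing the same listening port (cluster.isPrimary is cluster.isMaster on older Node versions):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {
      // Fork one worker per CPU core.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      // Each worker runs its own event loop on the shared port.
      http.createServer((req, res) => {
        res.end(`handled by pid ${process.pid}\n`);
      }).listen(3000);
    }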
Secondly, Node.js generally holds up really well, because of this idea of not waiting around for everything. It keeps your server busy instead of cooling its jets waiting for something to be done.
Thirdly, yes, you do have to be careful about what you do in the server. If something is going to take too long (modifying all the records in the database, say), it's probably wise to do it in the background via some kind of worker-queue system: "Hey, I need to update this person's username in all of these records" should probably be handled by yet another Node.js process acting as the worker.