Running background tasks in Meteor.js - javascript

This is my scenario:
1. Scrape some data every X minutes from example.com.
2. Insert it into a MongoDB database.
3. Subscribe to this data in the Meteor app.
Because I am currently not very good at Meteor, this is what I am going to do:
1. Write a scraper script for example.com in Python or PHP.
2. Run the script every X minutes with a cron job.
3. Insert the data into MongoDB.
Is it possible to do this completely within Meteor, without using Python or PHP? How can I handle a task that runs every X minutes?

There are cron-like schedulers for Meteor, such as percolate:synced-cron. With it, you register a job using Later.js syntax, as in this example taken from the percolate:synced-cron README:
SyncedCron.add({
  name: 'Crunch some important numbers for the marketing department',
  schedule: function(parser) {
    // parser is a later.parse object
    return parser.text('every 2 hours');
  },
  job: function() {
    var numbersCrunched = CrushSomeNumbers();
    return numbersCrunched;
  }
});
// Registered jobs only run once the scheduler has been started
SyncedCron.start();
If you want to rely on an OS level cron job, you could just provide an HTTP endpoint in your Meteor.js application that you could then access through curl at the chosen time.
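As a rough sketch of that endpoint idea: the handler below is written in the connect style (req, res), runs a job when the request carries a shared token, and refuses everything else. The names makeCronHandler and CRON_TOKEN, the X-Cron-Token header, and the /run-scraper path are all made up for illustration; the mounting call in the comments assumes Meteor's webapp package.

```javascript
// Hypothetical shared secret; in practice read it from configuration.
const CRON_TOKEN = 'change-me';

// Builds a connect-style (req, res) handler that runs `job` only when the
// request carries the expected token in the X-Cron-Token header.
function makeCronHandler(job, token) {
  return function handler(req, res) {
    // Node lower-cases incoming header names.
    if (req.headers['x-cron-token'] !== token) {
      res.writeHead(403, { 'Content-Type': 'text/plain' });
      res.end('forbidden');
      return;
    }
    try {
      job();
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('job failed');
    }
  };
}

// In a Meteor server file the handler could be mounted roughly like this
// (assumption: WebApp from the meteor/webapp package, scrape() is your job):
//   WebApp.connectHandlers.use('/run-scraper', makeCronHandler(scrape, CRON_TOKEN));
// and triggered from crontab with something like:
//   */5 * * * * curl -H "X-Cron-Token: change-me" http://localhost:3000/run-scraper
```

The token check is there because an unprotected endpoint would let anyone trigger the scraper.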

I can suggest Steve Jobs, my new package for scheduling background jobs in Meteor.
You can use the register, replicate, and remove actions:
// Register the job
Jobs.register({
  dataScraper: function (arg) {
    var data = getData();
    if (data) {
      this.replicate({
        in: {
          minutes: 5
        }
      });
      this.remove(); // or, this.success(data)
    } else {
      this.reschedule({
        in: {
          minutes: 5
        }
      });
    }
  }
});
// Schedule the job to run on Meteor startup
// `singular` ensures that there is only one pending job with the same configuration
Meteor.startup(function () {
  Jobs.run("dataScraper", {
    in: {
      minutes: 5
    },
    singular: true
  });
});
Depending on your preference, you can store the result in the database, as part of the jobs history, or remove it entirely.

Related

How to get the date every time a day passes in nodejs?

What I intend to build is a program that sends birthday greeting emails to several users. The program takes today's date and queries the database (an Excel file), reads each user's date of birth, and compares it with the current date; if the month and day match, an email is sent. I think it can be done with setInterval(), but I don't know whether that affects the program's performance, since it will be deployed on my company's Windows Server 2012 machine.
My code:
const express = require("express");
const app = express();
const excel = require('./public/scripts/readExcel.js');
const email = require('./services/email/sendEmail.js');
app.post('/send-email',(req, res)=>{
  setInterval(() => {
    email.sendEmail()
      .then((result) => {
        res.status(200).jsonp(req.body);
        console.log(result);
      }).catch((err) => {
        res.status(500).jsonp(req.body);
        console.log(err);
      });
  }, 3600000); // 1 hour
});
app.listen(4000, ()=>{
  console.log("Server on -> http://localhost:4000");
});
Basically, it calls the sendEmail function every hour; that function reads the Excel file (or the database), extracts the date fields, compares them with the current day, and sends the mail with Nodemailer to those who have birthdays. Also, should the setInterval go in the "/send-email" route, or how should the request be made?
For that, you can also run a cron job every hour using the npm package node-cron:
var cron = require('node-cron');
// '0 * * * *' fires at minute 0 of every hour
cron.schedule('0 * * * *', () => {
  console.log('running a task every hour');
});
I'll break down my answer into two parts:
1. What you need to do to make your solution work
2. How you can optimise the performance
1. What you need to do to make your solution work
There are two essential problems you need to resolve.
a. Your solution will work as it is; the only thing you need to do is call the /send-email endpoint once after starting your server. BUT... this comes with side effects.
setInterval calls the email.sendEmail... code block every hour, and that block calls res.status(200).jsonp(req.body) each time. In case you don't know, res.status... sets the response for the request you received, in this case the request to /send-email. The first time it works fine, because you are responding to that request. But when the block runs a second time, there is nothing left to respond to, because the request has already been answered. Remember, HTTP answers a request once, and then the request is complete; in Express, a second attempt to send the response fails with an error because the headers have already been sent. So the first thing to do is call res.status only once. I'd move this line out of the setInterval code block as follows:
app.post('/send-email',(req, res)=>{
  setInterval(() => {
    email.sendEmail()
      .then((result) => {
        console.log(result);
      }).catch((err) => {
        console.log(err);
      });
  }, 3600000); // 1 hour
  res.status(200).jsonp(req.body);
});
b. I also don't think you'd want the hassle of calling /send-email every time you start the server, so I'd make the birthday-wishes code block start automatically whenever the server starts. To do that, I'd remove the line app.post('/send-email',(req, res)=>{. Also note that, since the block no longer runs in response to a request, there is no request to respond to, so the res.status... line can be removed as well. Your code now looks like this:
const express = require("express");
const app = express();
const email = require('./services/email/sendEmail.js');
(function(){
  // Birthday wish email functionality
  setInterval(() => {
    email.sendEmail()
      .then((result) => {
        console.log(result);
      }).catch((err) => {
        console.log(err);
      });
  }, 3600000); // 1 hour
})(); // Immediately invoked function
That's it, your solution works now. To send birthday wish emails, you don't need to do anything other than start your server.
Let's move on to the second part now.
2. How can you optimise the performance
a. Set the interval to 24 hours instead of 1 hour
Why do you need to check for birthdays every hour? If you don't have a good answer, I'd definitely change the interval to 24 hours.
b. Make the code robust enough to deal with large data
As long as you have only hundreds of entries in your Excel file and they are not going to grow much in the future, I wouldn't make it more complex for the sake of performance.
But if your entries are destined to grow into the thousands and beyond, I'd suggest using a database (such as MongoDB, PostgreSQL, or MySQL) to store your data, and querying only the entries whose birthday matches the particular date.
I'd also implement a queuing system to process the query results and send the emails in batches instead of doing it all at once.
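To make the "match month and day, then send in batches" idea concrete, here is a minimal sketch in plain JavaScript. The helper names isBirthday and chunk, the sample rows, and the batch size of 50 are assumptions for illustration; the rows stand in for whatever your Excel reader or database query returns.

```javascript
// True when `birthDate` falls on the same month and day as `today`.
function isBirthday(birthDate, today) {
  return birthDate.getMonth() === today.getMonth() &&
         birthDate.getDate() === today.getDate();
}

// Splits `items` into arrays of at most `size` elements.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical rows, shaped like what an Excel reader might return.
const users = [
  { email: 'a@example.com', birthDate: new Date(1990, 4, 17) },
  { email: 'b@example.com', birthDate: new Date(1985, 11, 2) },
  { email: 'c@example.com', birthDate: new Date(2001, 4, 17) },
];

// Fixed date (17 May) so the example is repeatable; use new Date() for real.
const today = new Date(2024, 4, 17);

// Only the users whose month and day match today.
const celebrants = users.filter((u) => isBirthday(u.birthDate, today));

// Groups of at most 50 recipients, to be mailed one batch at a time.
const batches = chunk(celebrants, 50);
console.log(celebrants.length, batches.length);
```

Note that a plain month/day comparison never matches a 29 February birthday in a non-leap year, so you may want a rule for that case.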

Better way to schedule cron jobs based on job orders from php script

So I wrote a simple video creator script in Node.js.
It runs as a scheduled cron job.
I have a panel written in PHP, where the user enters details and clicks the "Submit new Video Job" button.
The new job is saved to the DB with its details, a jobId, and status="waiting".
A PHP API is responsible for returning one job at a time: when asked, it queries for status="waiting", limits the query to 1, and returns the data with the jobId.
The video creation script requests that API every x seconds to ask whether a new job is available.
It has 5 tasks:
available=true
1. Check whether a new job order is available (with a GET request every 20 seconds); if there is a new job:
available=false
2. Get the details (name, picture URL, etc.).
3. Create the video with the details.
4. Upload the video to FTP.
5. Post data to the API to update the details and mark that job as "done".
available=true
These tasks are async, so every task has to wait for the previous task to finish.
Right now, sending a GET or POST request to the API every 20 seconds (the exact interval doesn't matter) to check whether a new job is available seems like a bad approach to me.
So any way / package / system to accomplish this behavior?
Code Example:
const cron = require('node-cron');
const axios = require('axios');
let available = true;
var scheduler = cron.schedule(
  '*/20 * * * * *',
  () => {
    if (available) {
      makevideo();
    }
  },
  {
    scheduled: false,
    timezone: 'Europe/Istanbul',
  }
);
let makevideo = async () => {
  available = false;
  try {
    let {data} = await axios.get(
      'https://api/checkJob'
    );
    if (data == 0) {
      console.log('No Job');
      return;
    }
    let jobid = data.id;
    await createvideo();
    await sendToFTP();
    await axios.post('https://api/saveJob', {
      id: jobid,
      videoPath: 'somevideopath',
    });
  } catch (err) {
    console.error(err);
  } finally {
    // Always release the flag, even if a step above throws
    available = true;
  }
};
scheduler.start();
RabbitMQ is also a good queueing system.
Why?
It's really well documented (examples for many languages, including JavaScript and PHP).
The tutorials are simple while still exposing real use cases.
It has a REST API.
It ships with a monitoring UI.
How to use it to solve your problem?
On the job producer side: send messages (jobs) to a queue by following tutorial 1.
To consume jobs with your Node.js process: see RabbitMQ's tutorial 2.
Other suggestions:
Use a prefetch value of 1 with manual consumer acknowledgements, so that a consumer instance does not receive another message while a job is running; publisher confirms additionally tell the producer that the broker has accepted the job.
Roadmap for a quick prototype: tutorial 1, then tutorial 2. After sending and receiving messages, you can explore the options you can set on queues and messages.
Node.js package: http://www.squaremobius.net/amqp.node/
PHP package: https://github.com/php-amqplib/php-amqplib
While it is possible to use the database as a queue, this is commonly regarded as an anti-pattern (alongside using the database for logging). And since you are asking:
So any way / package / system to accomplish this behavior?
I'll take advantage of the open-ended form of your question (and the bounty placed on it) to suggest Beanstalk.
Beanstalk is a simple, fast work queue.
Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.
It has client libraries in the languages you mention in your question (and many more), is easy to develop with and to run in production.
What you are doing is a very standard system design paradigm, usually implemented with Apache Kafka or another queue-based system (e.g. RabbitMQ). You can read up on Kafka/RabbitMQ, but without going into details, it works basically like this:
There is a central queue.
When a user submits a job, the job gets added to the queue.
The video processor runs indefinitely, subscribed to the queue.
You can look up https://www.gentlydownthe.stream/ and you will recognize the similarities to what you are doing.
Here you don't need to poll yourself; you subscribe to an event, and the rest is managed by the queue.
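To illustrate the subscribe-instead-of-poll shape without installing a broker, here is a small in-process sketch built on Node's EventEmitter. It is only a stand-in: JobQueue, push, and subscribe are invented names, not the RabbitMQ, Kafka, or Beanstalk API, and nothing here survives a process restart the way a real queue would. What it does show is the behaviour you get from a broker with one job in flight at a time: the worker is woken when a job arrives, and the next job waits until the current one has finished.

```javascript
const { EventEmitter } = require('events');

// In-memory stand-in for a message queue (not a real broker client).
class JobQueue extends EventEmitter {
  constructor() {
    super();
    this.pending = [];
    this.busy = false;
  }

  // Producer side: add a job and notify the subscriber.
  push(job) {
    this.pending.push(job);
    this.emit('job');
  }

  // Consumer side: `worker` is called with one job at a time.
  subscribe(worker) {
    this.on('job', async () => {
      if (this.busy) return; // a job is running; it will pick up the rest
      this.busy = true;
      try {
        while (this.pending.length > 0) {
          const job = this.pending.shift();
          try {
            await worker(job);
          } catch (err) {
            this.emit('failed', job, err);
          }
        }
      } finally {
        this.busy = false;
        this.emit('idle');
      }
    });
    // Pick up anything that was queued before the worker subscribed.
    if (this.pending.length > 0) this.emit('job');
  }
}

const queue = new JobQueue();
const processed = [];

// Hypothetical worker; a real one would create and upload the video here.
queue.subscribe(async (job) => {
  processed.push(job.id);
});

queue.push({ id: 1 });
queue.push({ id: 2 });
```

With a real broker the PHP panel would be the producer and the Node.js script the subscriber, so the `available` flag and the 20-second timer are no longer needed.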

Why is this function not loading right first in Node.JS?

I'm writing a Twitter bot in Node.js, and I have a function that uses an npm library called "scrapejs" to grab data from Yahoo Finance. It works just fine; the problem is that the function's output appears only after the code at the bottom has already run, and I have no clue why. The tweeting section of the bot was working before, but if I'm grabbing data from the web scrape, I can't have the scrape finishing after the tweet is sent.
Here's the code:
console.log("The Bot Is Starting To Work.");
var sp = require('scrapejs').init({
  cc: 2, // up to 2 concurrent requests
  delay: 5 * 1 // delay .05 seconds before each request
});
sp.load('https://ca.finance.yahoo.com/q?s=CADUSD=X')
  .then(function ($, res) {
    //$.q("//h3[@class='r']/a").forEach(function(node){ //Adding the HTML Tags to filter the data
    console.log("Scraping The Web...");
    $.q("//span[@class='time_rtq_ticker']/span").forEach(function (node) {
      var res = {
        title: node.textContent //Always returns { title: 'NUMBER'} (When Working)
      };
      console.log(res);
      return res;
    });
  })
  .fail(function (err) {
    console.log(err);
  });
console.log("Why does this part load first??"); // This part comes first before the function above
Here's what the output looks like: https://imgur.com/YJD2FUW
Node.js performs I/O asynchronously. This means that the statements in your code are started one after the other, but an asynchronous call such as sp.load(...) only starts the request and returns immediately; its .then callback runs later, when the response has arrived. The last statement does not wait for that to happen.
Since your second call has to wait for a network request, it takes much longer than a simple console.log(), and therefore the last line prints first.
To make your life easier, you can use the deasync module.
https://github.com/abbr/deasync
Install it with: npm install deasync
Then you can wrap a callback-style async function so that it behaves like a sync one.
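An alternative that avoids blocking is to put whatever depends on the scraped value inside the callback itself. The sketch below shows the ordering with a stand-in: loadQuote is a hypothetical replacement for sp.load that resolves after a short timer, and the quote value is made up.

```javascript
const order = [];

// Hypothetical stand-in for sp.load(): resolves later, like a network request.
function loadQuote() {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ title: '0.75' }), 10); // made-up value
  });
}

order.push('start');

const done = loadQuote().then((quote) => {
  order.push('scraped');
  // Work that needs the scraped value belongs here, e.g. sending the tweet.
  order.push('tweeted ' + quote.title);
});

// Runs before the callback above, because loadQuote() has only started.
order.push('end of script');

done.then(() => console.log(order.join(' -> ')));
```

The tweet step runs only after the scrape has finished, even though the line after the promise chain still executes first.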

Force Meteor to Update on Remote Changes?

I have a Meteor app that's being modified via an external API. The API modifies the MongoDB database that the Meteor app reads from. The problem I'm running into is that the changes the API makes to the database are not rendered in my Meteor app as quickly as I'd like. If I post new data to my API every 10 seconds, my Meteor app seems to update only every 30 seconds. How can I increase the rate at which Meteor updates/listens for changes? Below is a sample of some code I wrote.
UsageData = new Mongo.Collection('UsageData');
if (Meteor.isClient) {
  // This code only runs on the client
  angular.module('dashboard', ['angular-meteor']);
  angular.module('dashboard').controller('DashboardCtrl', ['$scope', '$meteor',
    function($scope, $meteor) {
      $scope.$meteorSubscribe('usageData');
      $scope.query = {};
      $scope.data = $meteor.collection(function() {
        return UsageData.find($scope.getReactively('query'), {
          sort: {
            createdAt: -1
          },
          limit: 1
        });
      });
    }
  ]);
}
// This code only runs on the server
if (Meteor.isServer) {
  Meteor.publish('usageData', function() {
    return UsageData.find({}, {
      sort: {
        createdAt: -1
      },
      limit: 20
    });
  });
}
Have you provided the oplog URL (the MONGO_OPLOG_URL environment variable) to your Meteor backend?
If not, then Meteor is using the poll-and-diff algorithm, which
1. is expensive (CPU and memory)
2. runs only every 10 seconds (because of 1.)
By using the MongoDB oplog, changes are picked up immediately.
These should be useful regarding the oplog and Meteor:
https://meteorhacks.com/mongodb-oplog-and-meteor
The Meteor 0.7 blog post, from when they introduced oplog tailing for the first time:
http://info.meteor.com/blog/meteor-070-scalable-database-queries-using-mongodb-oplog-instead-of-poll-and-diff

meteor js create mongodb database hook to store data from API at fixed interval

tl;dr - What is the best pattern to create a 'proprietary database' with data from an API? In this case, using Meteor JS and collections in MongoDB.
Steps
1. Ping API
2. Insert Data into Mongo at some interval
In lib/collections.js
Prices = new Mongo.Collection("prices");
Basic stock api call, in server.js:
Meteor.methods({
  getPrice: function () {
    var result = Meteor.http.call("GET", "http://api.fakestockprices.com/ticker/GOOG.json");
    return result.data;
  }
});
Assume the JSON is returned clean and tidy, and that I want to store the entire object (how the returned data is manipulated is not important; storing the return value is).
We could manipulate the data in the Meteor.methods function above, but should we? In Angular, services are used to call APIs, and it's recommended to modularize and keep the API call in its own function. Let's borrow that, and Meteor.call the above getPrice.
Assume this is also done in server.js (please correct).
Meteor.call("getPrice", function(error, result) {
  if (error)
    console.log(error);
  var price = result;
  Meteor.setInterval(function() {
    Prices.insert(price);
  }, 1800000); // 30min
});
Once in the db, a pub/sub could be established, which I'll omit and link to this overview.
You may want to take a look at the synced-cron package.
With a cron job it's pretty easy, just call your method:
// server.js
SyncedCron.start();
SyncedCron.add({
  name: "get Price",
  schedule: function(parser){
    return parser.text('every 30 minutes');
  },
  job: function(){
    return Meteor.call("getPrice");
  }
});
Then in getPrice you can do var result = HTTP.call(/* etc */); and Prices.insert(result);. You would want some additional checks of course, as you have pointed out.
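As a sketch of what those additional checks might look like, the function below fetches a fresh price on every run and stores it only if the response looks usable. To keep it runnable outside Meteor, fetchPrice and insertPrice are hypothetical stand-ins for HTTP.call(...).data and Prices.insert(...), and makePriceJob, the fetchedAt field, and the sample values are made up.

```javascript
// Builds the job body. `fetchPrice` and `insertPrice` are injected so the
// same logic can be exercised with stubs (assumption: in Meteor they would
// wrap HTTP.call and Prices.insert on the server).
function makePriceJob(fetchPrice, insertPrice, now = () => new Date()) {
  return function priceJob() {
    let price;
    try {
      // Fetch a fresh value on every run instead of reusing an old one.
      price = fetchPrice();
    } catch (err) {
      return { stored: false, reason: 'request failed: ' + err.message };
    }
    if (price === null || typeof price !== 'object') {
      return { stored: false, reason: 'unexpected response' };
    }
    // Record when the value was fetched so documents can be sorted later.
    insertPrice({ ...price, fetchedAt: now() });
    return { stored: true };
  };
}

// Usage with stubs; the symbol and price are made-up sample values.
const stored = [];
const job = makePriceJob(
  () => ({ symbol: 'GOOG', price: 100 }),
  (doc) => stored.push(doc)
);
console.log(job(), stored.length);
```

Because each run calls fetchPrice again, this also avoids inserting the same stale price every 30 minutes, which is what happens when the fetch sits outside the interval.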
