Why is this function not running first in Node.js? - javascript

I'm writing a Twitter bot in Node.js. I have a function that uses an npm library called "scrapejs" to grab data from Yahoo Finance, and it works just fine. The problem is that the function runs after the code at the bottom of the file, and I have no clue why. I had the tweeting section of the bot working before, but since I'm grabbing data with the web scrape, I can't have that running after the tweet.
Here's the code:
console.log("The Bot Is Starting To Work.");

var sp = require('scrapejs').init({
    cc: 2,       // up to 2 concurrent requests
    delay: 5 * 1 // delay .05 seconds before each request
});

sp.load('https://ca.finance.yahoo.com/q?s=CADUSD=X')
    .then(function ($, res) {
        //$.q("//h3[@class='r']/a").forEach(function(node){ // Adding the HTML tags to filter the data
        console.log("Scraping The Web...");
        $.q("//span[@class='time_rtq_ticker']/span").forEach(function (node) {
            var res = {
                title: node.textContent // Always returns { title: 'NUMBER' } (when working)
            };
            console.log(res);
            return res;
        });
    })
    .fail(function (err) {
        console.log(err);
    });

console.log("Why does this part load first??"); // This part runs before the function above
Heres what the output looks like: https://imgur.com/YJD2FUW

Node.js works asynchronously. This means the statements in your code start one after another, and each finishes when it finishes - the third statement will not wait for the second one to complete before starting.
Since your second function is heavy, it takes some time to run - much more than a simple console.log() - therefore the last line finishes first.

To make your life easier you can use the deasync module.
https://github.com/abbr/deasync
Just install it with: npm i -s deasync
Then you can convert any async function into a sync one.
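If you'd rather keep things asynchronous, the usual fix is to move the dependent code into the .then callback instead of forcing synchronous behaviour. A minimal sketch of the ordering (the load function here is a stand-in for sp.load, not the real scrapejs API):

```javascript
// Stand-in for sp.load(...): resolves with a fake scrape result.
function load() {
    return Promise.resolve({ title: '0.7421' });
}

console.log("The Bot Is Starting To Work.");

load().then(function (res) {
    console.log("Scraping The Web...");
    console.log(res);
    // tweet here, once the scraped data is actually available
});

console.log("This line still runs first, before the .then callback.");
```

Anything that must happen after the scrape (like the tweet) belongs inside the .then; the lines after the chain will always run before the callback does.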

Related

How to get the date every time a day passes in nodejs?

What I intend to build is a program that sends congratulatory birthday emails to several users. The program will take today's date and run a query against the database (it is an Excel file), read each user's date of birth, and compare it with the current date; if the month and day coincide, the mail is sent. I think it can be done with a setInterval(), but I don't know if that affects the performance of the program, since it will be deployed on a Windows Server 2012 machine at my company.
My code:
const express = require("express");
const app = express();
const excel = require('./public/scripts/readExcel.js');
const email = require('./services/email/sendEmail.js');

app.post('/send-email', (req, res) => {
    setInterval(() => {
        email.sendEmail()
            .then((result) => {
                res.status(200).jsonp(req.body);
                console.log(result);
            }).catch((err) => {
                res.status(500).jsonp(req.body);
                console.log(err);
            });
    }, 3600000); // 1 hour
});

app.listen(4000, () => {
    console.log("Server on -> http://localhost:4000");
});
Basically what it does is call the sendEmail function every hour which reads the Excel file or the database and extracts the date fields, compares with the current day, and sends the mail with Nodemailer to those who have birthdays. Also, the setInterval would go in the route of "/send-email" or how would the request be made?
For that, you can also run a cron job every hour using the npm package node-cron:
var cron = require('node-cron');
cron.schedule('0 * * * *', () => {
    console.log('running a task every hour');
});
I'll break my answer down into two parts:
What you need to do to make your solution work
How can you optimise the performance
1. What you need to do to make your solution work
There are two essential problems you need to resolve.
a. Your solution will work as it is; the only thing you need to do is call the /send-email endpoint once after starting your server. BUT... this comes with side effects.
setInterval will call the email.sendEmail... code block every hour, and that block calls res.status(200).jsonp(req.body) every time. If you don't know, res.status... sets the response for the request you received - in this case, your request to /send-email. The first time it will work fine, because you are responding to that request. But when the block kicks in the second time, it has nothing to respond to, because the request has already been answered. Remember, HTTP responds to a request once; after that, the request is complete. For this reason the res.status... call becomes invalid on later runs. So call res.status only once - I'd move that line out of the setInterval block as follows:
app.post('/send-email', (req, res) => {
    setInterval(() => {
        email.sendEmail()
            .then((result) => {
                console.log(result);
            }).catch((err) => {
                console.log(err);
            });
    }, 3600000); // 1 hour
    res.status(200).jsonp(req.body);
});
b. Also, I don't think you'd want the hassle of calling /send-email every time you start the server, so I'd make sure the birthday-wish code block kicks off automatically on every server start. So I'd remove the line app.post('/send-email',(req, res)=>{. Also note that since I'm no longer handling a request, I have no request to respond to, so I can also remove the res.status... line. Your code now looks like this:
const express = require("express");
const app = express();
const email = require('./services/email/sendEmail.js');

(function () {
    // Birthday wish email functionality
    setInterval(() => {
        email.sendEmail()
            .then((result) => {
                console.log(result);
            }).catch((err) => {
                console.log(err);
            });
    }, 3600000); // 1 hour
})(); // Immediately invoked function
That's it, your solution works now. Now to send birthday wish emails, you don't need to do anything else other than just starting your server.
Let's move on to the second part now.
2. How can you optimise the performance
a. Set the interval to 24 hrs instead of 1 hr
Why do you need to check for birthdays every hour? If you don't have a good answer here, I'd definitely change the interval to 24 hrs.
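A minimal sketch of the 24-hour variant, assuming the sendEmail module from the question (DAY_MS and startBirthdayJob are illustrative names, not part of the original code):

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// Kick off the daily check; returns the timer handle so it can be cleared later.
function startBirthdayJob(sendEmail) {
    return setInterval(function () {
        sendEmail()
            .then(function (result) { console.log(result); })
            .catch(function (err) { console.log(err); });
    }, DAY_MS);
}
```

Returning the handle also makes it easy to stop the job with clearInterval during a graceful shutdown.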
b. Making the code more robust to deal with large data
As long as you only have hundreds of entries in your Excel files and they are not going to grow much in the future, I wouldn't add complexity for performance's sake.
But if your entries are destined to grow into the thousands and beyond, I'd suggest using a database (such as MongoDB, Postgres, or MySQL) to store the data and querying only the entries whose birthday matches the given date.
I'd also implement a queuing system to process the query and send emails in batches instead of doing all of that at once.
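The batching idea can be sketched like this (sendInBatches and sendOne are hypothetical names, not part of the question's code):

```javascript
// Send emails in fixed-size batches, waiting for each batch to finish
// before starting the next, instead of firing them all at once.
async function sendInBatches(recipients, sendOne, size) {
    var results = [];
    for (var i = 0; i < recipients.length; i += size) {
        var batch = recipients.slice(i, i + size);
        results = results.concat(await Promise.all(batch.map(sendOne)));
    }
    return results;
}
```

This keeps at most `size` sends in flight at a time, which is gentler on the mail server than mapping the whole list at once.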

Start and Restart async function with setIntervalAsync returns TypeError cannot convert undefined or null to object

I've been trying to implement a web scraper that will use data pulled from MongoDB to create an array of urls to scrape periodically with puppeteer. I have been trying to get my scraper function to scrape periodically by using setIntervalAsync.
Here is my code right now that throws "UnhandledPromiseRejectionWarning: TypeError: Cannot convert undefined or null to object at Function.values..."
puppeteer.js
async function scrape(array){
    // initialize for loop here
    let port = '9052';
    if (localStorage.getItem('scrapeRunning') == 'restart') {
        clearIntervalAsync(scrape);
        localStorage.setItem('scrapeRunning', 'go');
    } else if (localStorage.getItem('scrapeRunning') != 'restart') {
        /// Puppeteer scrapes urls in array here ///
    }
}
server.js
app.post('/submit-form', [
    // Form Validation Here //
], (req, res) => {
    async function submitForm(amazonUrl, desiredPrice, email){
        // Connect to MongoDB and update entry or create new entry
        // with post request data
        createMongo.newConnectToMongo(amazonUrl, desiredPrice, email)
            .then(() => {
                // Set local variable that will alert scraper to clearIntervalAsync
                localStorage.setItem('scrapeRunning', 'restart');
                // before pulling the new updated mongoDB data ...
                return createMongo.pullMongoArray();
            })
            .then((result) => {
                // and start scraping again with the new data
                puppeteer.scrape(result);
            });
    }
    submitForm(req.body.amazonUrl, req.body.desiredPrice, req.body.email);
});
createMongo.pullMongoArray()
    .then((result) => {
        setIntervalAsync(puppeteer.scrape, 10000, result);
    });
Currently the scraper starts as expected after the server is started and keeps 10 seconds between when the scrape ends and when it begins again. Once there is a post submit the MongoDB collection gets updated with the post data, the localStorage item is created, but the scrape function goes off the rails and throws the typeError. I am not sure what is going on and have tried multiple ways to fix this (including leaving setIntervalAsync and clearIntervalAsync inside of the post request code block) but have been unsuccessful so far. I am somewhat new to coding, and extremely inexperienced with asynchronous code, so if someone has any experience with this kind of issue and could shed some light on what is happening I would truly appreciate it!
I can only think it has something to do with async, as no matter what I've tried, it also seems to run the pullMongoArray function before the newConnectToMongo function is complete.
After a few more hours of searching around, I think I've found a workable solution. I've completely eliminated the use of localStorage and removed the if/else-if statements from the scrape function. I have also made a global timer variable and added control functions to this file.
puppeteer.js
let timer;

function start(result){
    timer = setIntervalAsync(scrape, 4000, result);
}

function stop(){
    clearIntervalAsync(timer);
}

async function scrape(array){
    // initialize for loop here
    let port = '9052';
    // Puppeteer scrapes urls from array here //
}
I've altered my server code a little, so on server start it pulls the results from MongoDB and uses them in the scraper's start function. A post request now calls the stop function before updating MongoDB, pulls a fresh result from MongoDB, and then calls the scraper's start function again.
server.js
createMongo.pullMongoArray()
    .then((result) => {
        puppeteer.start(result);
    });

app.post('/submit-form', [
    // Form Validation In Here //
], (req, res) => {
    async function submitForm(amazonUrl, desiredPrice, email){
        // Stop the current instance of the scrape function
        puppeteer.stop();
        // Connect to MongoDB and update entry or create new entry
        // with post request data
        createMongo.newConnectToMongo(amazonUrl, desiredPrice, email)
            .then(() => {
                // before pulling the new updated mongoDB data ...
                console.log('Pulling New Array');
                return createMongo.pullMongoArray();
            })
            .then((result) => {
                // and restarting the repeating scrape function
                puppeteer.start(result);
            });
    }
});
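The start/stop pattern above can be reproduced with plain setInterval as well; this is just an illustration of the idea (the function names mirror the answer's, the bodies are stand-ins, not the setIntervalAsync API):

```javascript
var timer = null;

// Start the repeating job, clearing any previous interval first
// so only one instance is ever running.
function start(fn, ms) {
    stop();
    timer = setInterval(fn, ms);
}

function stop() {
    if (timer) {
        clearInterval(timer);
        timer = null;
    }
}
```

Keeping the handle in one module-level variable is what lets a later request stop and restart the job cleanly.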

$.getJSON works when trying to find all records, but doesn't work when trying to find one specific record

I'm working on a section of my app where a user can add a note to his project. Each note has the capability to have additional comments appended to it.
So far the backend is working. When calling all records at once, or one record at a time (by id) manually - be it via Postman or simply adding the id of a project (taken from Mongo) to the browser URL - both pull up the records as specified.
The problem starts when I try to pull this information through the front end via
$.getJSON.
Say for example that I have two projects in my app.
Project 01 has an id of "123" and has 3 comments
and
Project 02 has an id of "456" and has 10 comments
When I call all projects on the front end of the app, I see both appear and all their comments come through OK. But when I try to call, for example, Project 02 by id, the project does show, yet I get 10 "undefined" values for its 10 comments. The same thing happens for any single record I call.
And again, this happens only when trying to call it via jquery $.getJSON because when I manually do it via postman/browser, they come through fine.
Here is some code below for when I try to find one record (not working fully).
This is the backend code:
app.get("/notesapi/:tabID", (request, response) => {
    var idNum = request.params.tabID;
    var newIdNumber = idNum.trim();
    NotesModel.findOne({_id: newIdNumber}, (error, data) => {
        if (error) {
            console.log("error in finding this particular record!");
            throw error;
        } else {
            console.log("data for this one record found YO");
            response.status(200);
            response.json(data);
        }
    });
});
And this is the front end code:
function getFreshComments(tabID){
    console.log("FRONTEND -- the link is: /notesapi/" + tabID);
    $.getJSON("/notesapi/456", showCommentsGotten);

    function showCommentsGotten(dataFedIn){
        var tabInnerComments = $("#" + tabID + " .theNote");
        var output = "";
        $.each(dataFedIn, (key, item) => {
            output += item.todoMessages + "<br>";
        });
        var normalComments = output;
        var newComments = normalComments.split(",").join("<br>");
        tabInnerComments.html(newComments);
    }
}
As the example explained above, if I wanted to pull the 10 comments from id 456, then when I use $.getJSON("/notesapi/456", showCommentsGotten);
This returns me 10 "undefined".
When I remove the id number from the URL, then it fetches me ALL the comments for ALL the notes.
I don't get any errors anywhere. What am I missing or doing wrong?

Running background tasks in Meteor.js

This is my scenario:
1. Scrape some data every X minutes from example.com
2. Insert it to Mongodb database
3. Subscribe for this data in Meteor App.
Because I am currently not very good at Meteor, this is what I am going to do:
1. Write scraper script for example.com in Python or PHP.
2. Run script every X minutes with cronjob.
3. Insert it to Mongodb.
Is it possible to do it completely with Meteor without using Python or PHP? How can I handle task that runs every X minutes?
There are cron-like systems for Meteor, such as percolate:synced-cron. There, you could register a job using Later.js syntax, similar to this example taken from the percolate:synced-cron readme file:
SyncedCron.add({
    name: 'Crunch some important numbers for the marketing department',
    schedule: function(parser) {
        // parser is a later.parse object
        return parser.text('every 2 hours');
    },
    job: function() {
        var numbersCrunched = CrushSomeNumbers();
        return numbersCrunched;
    }
});
If you want to rely on an OS level cron job, you could just provide an HTTP endpoint in your Meteor.js application that you could then access through curl at the chosen time.
I can suggest Steve Jobs, my new package for scheduling background jobs in Meteor.
You can use the register, replicate, and remove actions
// Register the job
Jobs.register({
    dataScraper: function (arg) {
        var data = getData();
        if (data) {
            this.replicate({
                in: {
                    minutes: 5
                }
            });
            this.remove(); // or, this.success(data)
        } else {
            this.reschedule({
                in: {
                    minutes: 5
                }
            });
        }
    }
});

// Schedule the job to run on Meteor startup
// `singular` ensures that there is only one pending job with the same configuration
Meteor.startup(function () {
    Jobs.run("dataScraper", {
        in: {
            minutes: 5
        },
        singular: true
    });
});
Depending on your preference, you can store the result in the database, as part of the jobs history, or remove it entirely.

Meteor - Server-side API call and insert into mongodb every minute

I am in the process of learning Meteor while at the same time experimenting with the TwitchTV API.
My goal right now is to call the Twitch API every minute and then insert part of the JSON object into the Mongo database. Since MongoDB matches on _id and Twitch uses _id as its key, I am hoping subsequent inserts will either update existing records or create new ones if the _id doesn't exist yet.
The call and insert (at least the initial one) seem to be working fine. However, I can't seem to get the Meteor.setTimeout() function to work. The call happens when I start the app but does not continue occurring every minute.
Here is what I have in a .js. file in my server folder:
Meteor.methods({
    getStreams: function() {
        this.unblock();
        var url = 'https://api.twitch.tv/kraken/streams?limit=3';
        return Meteor.http.get(url);
    },
    saveStreams: function() {
        Meteor.call('getStreams', function(err, res) {
            var data = res.data;
            Test.insert(data);
        });
    }
});

Deps.autorun(function(){
    Meteor.setTimeout(function(){ Meteor.call('saveStreams'); }, 1000);
});
Any help or advice is appreciated.
I made the changes mentioned by @richsilv and @saimeunt and it worked. Resulting code:
Meteor.startup(function(){
    Meteor.setInterval(function(){ Meteor.call('saveStreams'); }, 1000);
});
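The difference between the two calls is easy to see in plain JavaScript (Meteor.setTimeout and Meteor.setInterval wrap the standard timers): setTimeout fires once, while setInterval keeps firing until cleared.

```javascript
var onceCount = 0;
var repeatCount = 0;

setTimeout(function () { onceCount++; }, 10);                  // fires exactly once
var handle = setInterval(function () { repeatCount++; }, 10);  // fires repeatedly

setTimeout(function () {
    clearInterval(handle);
    // onceCount is 1 here; repeatCount has grown with every 10 ms tick
}, 200);
```

That is why the original code ran only on startup: the scheduled call fired once and was never rescheduled.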
