So I have an array of URLs. I want to pull the HTML from each (for which I am using the restler Node.js library), then select some of that data to act on via jQuery-style selectors (for which I am using the cheerio Node.js library).
The code I have works, but duplicates the pulled data by however many URLs there are.
I am doing this in Node, but suspect it's a general JavaScript matter that I don't understand too well.
url.forEach(function(ugh){
    rest.get(ugh).on('complete', function(data) {
        $ = cheerio.load(data);
        prices.push($(".priceclass").text());
        // I only want this code to happen once per item in the url array,
        // but it happens url.length times per item --
        // probably because I don't get events or async very well.
    });
});
So if there are 3 items in the url array, the prices array with the data I want will have 9 items, which I don't want.
EDIT: I added a counter to verify that the 'complete' callback was executing array-length times per array item.
var x = 0;
url.forEach(function(ugh){
    rest.get(ugh).on('complete', function(data) {
        var $ = cheerio.load(data);
        prices.push($(".priceclass").text());
        console.log(x = x + 1);
    });
});
The console outputs 1 2 3 4 5 6 7 8 9.
I was thinking that I might be going about this wrong: I've been trying to push some numbers onto an array, and then do something with that array outside the callbacks.
Anyway, it seems clear that multiple restler event listeners aren't going to work together at all.
Maybe rephrasing the question would help:
How would I scrape a number of URLs, then act on that data?
Currently looking into the request and async libraries, via code from the now-defunct node.io library.
To answer the rephrased question: scramjet is great for this if you use ES6+ and Node, which I assume you do.
How would I scrape a number of URLs, then act on that data?
Install the packages:
npm install scramjet node-fetch --save
Scramjet works on streams: it will read your list of URLs and turn each one into a stream entry that you can work with as simply as with arrays. node-fetch is a simple Node module that follows the standard Fetch Web API.
A simple example that also reads the URLs from a file, assuming you store them one per line:
const {StringStream} = require("scramjet");
const fs = require("fs");
const fetch = require("node-fetch");

fs.createReadStream(process.argv[2])     // open the file for reading
    .pipe(new StringStream())            // redirect it to a scramjet stream
    .split("\n")                         // split it line by line
    .map((url) => fetch(url))            // fetch each URL
    .map((resp) => resp.json())          // parse each response as JSON
    .toArray()                           // accumulate the data into an array
    .then(
        (data) => doYourStuff(data),     // do the calculations
        (err) => showErrorMessage(err)
    );
Thanks to the way scramjet works, you needn't worry about error handling (all errors are caught automatically) or about managing simultaneous requests. If you can process the file URL by URL, you can also make this very memory- and resource-efficient: it won't read and try to fetch all the items at once, but it will do some of the work in parallel.
There are more examples and the full API description in the scramjet docs.
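If you would rather not pull in a streaming library, the same idea works with plain promises. A minimal sketch, assuming the url array and cheerio setup from the question, node-fetch for the requests, and the same placeholder doYourStuff/showErrorMessage functions:

const fetch = require("node-fetch");
const cheerio = require("cheerio");

// One promise per URL; the results are only touched once all of them resolve.
Promise.all(url.map((ugh) =>
    fetch(ugh)
        .then((resp) => resp.text())
        .then((body) => cheerio.load(body)(".priceclass").text())
)).then(
    (prices) => doYourStuff(prices),   // exactly url.length entries, in order
    (err) => showErrorMessage(err)
);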
Related
I am new to programming, and I heard that some guys on this website are quite angry, but please don't be. I am creating a web app that has a web page, makes some calculations, and works with a database (NeDB). I have an index.js:
const selects = document.getElementsByClassName("sel");
const arr = ["Yura", "Nairi", "Mher", "Hayko"];
for (let el in selects) {
    for (let key in arr) {
        selects[el].innerHTML += `<option>${arr[key]}</option>`;
    }
}
I have a function which fills the select elements with data from an array.
In another file, named getData.js:
var Datastore = require("nedb");
var users = new Datastore({ filename: "players" });
users.loadDatabase();
const names = [];
users.find({}, function (err, doc) {
    for (let key in doc) {
        names.push(doc[key].name);
    }
});
I have some code that gets data from the DB and puts it in an array, and I need to use that data in the index.js mentioned above. The problem is that I don't know how to transfer the data from getData.js to index.js. I have tried module.exports, but it is not working: the browser console says that it can't recognize the require keyword. I also can't get the data directly in index.js, because the browser can't run the database-related code.
You need to provide a server, which is connected to the database.
Browser -> Server -> DB
Browser -> Server: the server provides endpoints where the browser (client) can fetch data from. https://expressjs.com/en/starter/hello-world.html
Server -> DB: the server gets the data out of the database and can do whatever it wants with it. In your case, the data should be provided to the client.
TODOs
Step 1: set up a server, for example with express.js (google it)
Step 2: learn how to fetch data from the browser (client); "AJAX" and "GET" are the keywords to google
Step 3: set up a database connection from your server and get your data
Step 4: do whatever you want with your data (see the sketch below)
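A minimal sketch of steps 1 and 3, assuming Express and the players NeDB file from the question; the /api/names endpoint name is made up for illustration:

// server.js - a rough sketch, not a drop-in solution
const express = require("express");
const Datastore = require("nedb");

const app = express();
const users = new Datastore({ filename: "players" });
users.loadDatabase();

// The endpoint the browser can fetch: reads from the DB, answers with JSON.
app.get("/api/names", function (req, res) {
    users.find({}, function (err, docs) {
        if (err) return res.status(500).send(err);
        res.json(docs.map((doc) => doc.name));
    });
});

app.listen(3000);

In the browser, index.js can then fetch the names instead of requiring them:

fetch("/api/names")
    .then((resp) => resp.json())
    .then((names) => {
        // fill the select elements here, as in the original loop
    });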
At first I thought it was a simple matter, but then I researched a little and realized that I didn't have enough information about how it really works. Now I have solved the problem using promises and the template engine EJS. Thank you all for your time. I appreciate your help :)
I am making a Discord bot in Node.js, mostly for fun and to get better at coding, and I want the bot to push a string into an array and update the array file permanently.
I have been using separate .js files for my arrays, such as this:
module.exports = [
"Map: Battlefield",
"Map: Final Destination",
"Map: Pokemon Stadium II",
];
and then requiring them in my main file. I tried using .push(), and it adds the desired string, but only for that session.
What is the best solution for an array I can update and whose entries I can save? Apparently JSON files are good for this.
Thanks, Carl
Congratulations on the idea of writing a bot to get some coding practice. I bet you will succeed with it!
I suggest you try to split your problem into small chunks, so it will be easier to reason about.
Step 1 - storing
I agree with you on using JSON files as data storage. For an app that is intended to be a "training gym" they are more than enough, and you have all the time in the world to look into databases like Postgres, MySQL or Mongo later on.
A JSON file to store a list of values may look like that:
{
    "values": [
        "Map: Battlefield",
        "Map: Final Destination",
        "Map: Pokemon Stadium II"
    ]
}
When you save this snippet as list1.json, you have your first data file.
Step 2 - reading
Reading a JSON file in NodeJS is easy:
const list1 = require('./path-to/list1.json');
console.log(list1.values);
This will load the entire content of the file in memory when your app starts. You can also look into more sophisticated ways to read files using the file system API.
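A sketch of the file-system route, reading and parsing the same list1.json with the asynchronous API:

const fs = require('fs');

fs.readFile('./path-to/list1.json', 'utf8', function (err, content) {
    if (err) return console.log(err);
    const list1 = JSON.parse(content); // same shape as the require() version
    console.log(list1.values);
});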
Step 3 - writing
Looks like you know your way around in-memory array modifications using APIs like push() or maybe splice(). Once you have the in-memory representation the way you want it, you need to persist the change into your file, which basically means writing it back down in JSON format.
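For example (the new map name here is made up):

list1.values.push("Map: Dreamland");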
Option n.1: you can use Node's file system API:
// https://stackoverflow.com/questions/2496710/writing-files-in-node-js
const fs = require('fs');

const filePath = './path-to/list1.json';
const fileContent = JSON.stringify(list1);

fs.writeFile(filePath, fileContent, function(err) {
    if (err) {
        return console.log(err);
    }
    console.log("The file was saved!");
});
Option n.2: you can use fs-extra, which is an extension over the basic API:
const fs = require('fs-extra');

const filePath = './path-to/list1.json';

fs.writeJson(filePath, list1, function(err) {
    if (err) {
        return console.log(err);
    }
    console.log("The file was saved!");
});
In both cases list1 comes from the previous steps, and it is where you modified the array in memory.
Be careful with asynchronous code: both writing examples use non-blocking asynchronous API calls (the link above points to a decent article on the topic).
For simplicity's sake, you can start with the synchronous APIs, which are basically:
fs.writeFileSync
fs.writeJsonSync
You can find all the details in the links above.
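A sketch of the synchronous route, with the same list1 and filePath as above:

const fs = require('fs');

// Blocks until the file is written - fine for a small bot.
fs.writeFileSync(filePath, JSON.stringify(list1, null, 2));
console.log("The file was saved!");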
Have fun with bot coding!
I am trying to do some web scraping for prices and shipping weights, so I can calculate the shipping costs for my items; I am using Amazon in this case. I tried to use Node.js to create an API so I can hook it up with a front end for ease of use, but somehow it doesn't return the element, even though the element clearly exists, and it works in Python as you will see below...
Here's my Node.js code; for the sake of this question I put the AMD Ryzen's link as the URL:
const cheerio = require('cheerio');
const request = require('request');
const url = `https://www.amazon.com/AMD-Ryzen-Processor-Wraith-Cooler/dp/B07B428M7F/ref/=sr_1_2/?ie\=UTF8\&qid\=1540883858\&sr\=8-2\&keywords\=amd`;
request(url, (error, response, body) => {
    if (error) console.log(error);
    let $ = cheerio.load(body);
    console.log($('#priceblock_ourprice').text()); // Returns an empty line, even though it works in Python.
});
And here's my Python code that works:
import requests, urllib, sys
from pyquery import PyQuery as pq
d = pq(url="https://www.amazon.com/AMD-Ryzen-Processor-Wraith-Cooler/dp/B07B428M7F/ref/=sr_1_2/?ie\=UTF8\&qid\=1540883858\&sr\=8-2\&keywords\=amd")
print(d('#priceblock_ourprice').text()) # Returns $309.89 as expected.
It uses the same URL but still returns the element as expected. I even tried different request modules for Node.js, with the same result. Could the problem be with cheerio? Any input is welcome.
So I finally worked around the problem. I honestly don't know why the span with that id shows up in the Python version and not in the Node.js version. What I did to debug this was dump the entire response to a file and then search to see whether the span with that specific id was there; it turned out it wasn't... Luckily for me, I found a div with data attributes attached, and one of those data attributes was the price, so I changed my DOM selector to:
$('#cerberus-data-metrics').data('asin-price')
And it works now.
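In context, this is presumably just the request handler from the question with the selector swapped (a sketch; everything else unchanged):

request(url, (error, response, body) => {
    if (error) console.log(error);
    let $ = cheerio.load(body);
    // Read the price from the data attribute instead of the missing span.
    console.log($('#cerberus-data-metrics').data('asin-price'));
});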
I use the lowdb dependency to manage JSON data with Express, and it actually works, but there is a bug and I cannot figure out how to solve it.
I created a /create page to add information to the JSON file; it contains four form fields and a submit button.
In Express, I wrote the code below: each form field's data is saved in a variable and pushed with the lowdb module.
router.post('/post', function (req, res) {
    let pjName = req.body.projectName;
    let pjURL = req.body.projectURL;
    let pjtExplanation = req.body.projectExplanation;
    let pjImgURL = req.body.projectImgURL;

    console.log(pjName);

    db.get('project').push({
        name: pjName,
        url: pjURL,
        explanation: pjtExplanation,
        imgurl: pjImgURL
    }).write();

    console.log(db.get('project'));
    console.log(db.get('project').value());

    res.redirect('/');
})
And it works well. But when I modify the JSON file myself (e.g. reset the JSON file) and run the request again, it still shows the data that I reset. I think something in this app saves all the data and writes it back into the array again.
When I shut down the app in CMD and execute it again, the array is initialized correctly.
As you may already know, lowdb persists the data to secondary memory (your HDD), and calling the write method may return a promise, depending on your environment. As mentioned in the docs:
Persists database using adapter.write (depending on the adapter, may return a promise).
So the data may still be in the middle of being written when you read it, which is why the old data is returned. Try this:
db.get('project').push({
    name: pjName,
    url: pjURL,
    explanation: pjtExplanation,
    imgurl: pjImgURL
}).write().then(() => {
    console.log(db.get('project'));
    console.log(db.get('project').value());
});
I am trying to get results from the Twilio API like so: Twilio => our secured API backend => our client app. We are doing this to protect our API keys, among other security purposes.
We have sending faxes down, and also checking for single fax instances. I am, however, having a hard time getting the list of faxes sent back to our client app after the call completes, mainly because it's a repeating call. This is what we have so far for this problem:
app.post('/fax', function (req, res) {
    const faxList = [];

    const getFax = client.fax.faxes.each((faxes) => {
        faxList.push(faxes);
        console.log(faxList);
    });
});
Right now when I run this I see the array populated one by one, just like it should be, but I can't seem to return the final result after it completes.
From my searches online it looks like I need to utilize Promise.all to send my completed res.status(200).json(faxList); so Express can send the list of faxes to our app. I'm having issues setting up the Promise.all, though, as the faxList variable is just empty, almost as if the pushes to the array don't survive once the call completes.
Does this have something to do with the way Twilio has their fax API set up (https://github.com/twilio/twilio-node/blob/master/lib/rest/fax/v1/fax.js), or is this me not understanding how Promise.all functions?
I'm newer to the Node side of JavaScript; I have more experience with other languages, so I apologize in advance.
I would try to get the whole list, if you have less than a page's worth of faxes (I think a page in Twilio is 50), like so:
// list() already returns a promise, so there is no need to wrap
// it in a new Promise yourself:
return client.faxes.list().then(function (faxes) {
    if (faxes.length > 0) {
        return faxes;
    }
});
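Wired into the Express route from the question, that might look like the following sketch (client.faxes as in the snippet above; minimal error handling):

app.post('/fax', function (req, res) {
    client.faxes.list()
        .then(function (faxes) {
            res.status(200).json(faxes); // the complete list, in one response
        })
        .catch(function (err) {
            res.status(500).json({ error: err.message });
        });
});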