google-play-scraper and node errors

I'm trying to use google-play-scraper to scrape 100 applications and list each application's permissions.
I've written multiple .js files to run via node, and all of them have returned errors.
I am not sure how to use promises here, so this code does not return any results, but it also does not give me a 404:
var scraper = require("google-play-scraper")

// get a list of apps
var apps = scraper.search({
  term: "foo",
  num: 100,
})

var meta = []

// function to get meta from the app
function getMeta(item) {
  var appId = item.appId
  var metadata = scraper.permissions({
    appId: appId,
  })
  // append metadata to meta somehow idk js that well anymore
}

// console.log(apps)
// await apps.forEach(getMeta)

async function printFiles () {
  const files = await apps.forEach(getMeta)
  for await (const file of fs.readFile(file, 'utf8')) {
    console.log(contents)
  }
}
The code sample below gives a 404; the error points at the location of the installed module:
var gplay = require('google-play-scraper');
gplay.app({appId: 'com.dxco.pandavszombies'})
.then(console.log, console.log);
Here is the error for the second code sample.
Error: App not found (404)
at C:\Users\Matt\Documents\node_modules\google-play-scraper\lib\utils\request.js:44:19
at processTicksAndRejections (internal/process/task_queues.js:93:5)
{
status: 404
}
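For what it's worth, here is a minimal sketch of how the two calls can be chained, assuming google-play-scraper's documented search() and permissions() methods, which both return promises (note that Google may throttle 100 parallel requests):

const scraper = require('google-play-scraper');

async function main() {
  // search() resolves with an array of app objects, each carrying an appId
  const apps = await scraper.search({ term: 'foo', num: 100 });

  // fire one permissions() request per app and wait for all of them
  const meta = await Promise.all(
    apps.map(app => scraper.permissions({ appId: app.appId }))
  );

  apps.forEach((app, i) => console.log(app.appId, meta[i]));
}

main().catch(console.error);

As for the second sample, a 404 from gplay.app() usually means the Play Store no longer lists that appId (apps get delisted), not that the module is broken; the stack trace merely shows where the library threw the error.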

Related

discord.js says data isn't defined even though it is

I'm making a Discord bot and following a guide. The following code is copied from the guide and registers slash commands:
const { REST } = require('@discordjs/rest');
const { Routes } = require('discord-api-types/v10');
const { clientId, token } = require('./config.json');
const fs = require('node:fs');

const commands = [];
// Grab all the command files from the commands directory you created earlier
const commandFiles = fs.readdirSync('./commands').filter(file => file.endsWith('.js'));

// Grab the SlashCommandBuilder#toJSON() output of each command's data for deployment
for (const file of commandFiles) {
  const command = require(`./commands/${file}`);
  commands.push(command.data.toJSON());
}

// Construct and prepare an instance of the REST module
const rest = new REST({ version: '10' }).setToken(token);

// and deploy your commands!
(async () => {
  try {
    console.log(`Started refreshing ${commands.length} application (/) commands.`);

    // The put method is used to fully refresh all commands in the guild with the current set
    await rest.put(
      Routes.applicationCommands(clientId),
      { body: commands },
    );

    console.log(`Successfully reloaded ${data.length} application (/) commands.`);
  } catch (error) {
    // And of course, make sure you catch and log any errors!
    console.error(error);
  }
})();
The commands folder only has 1 file right now and it is ping.js: (also copied from the guide)
const { SlashCommandBuilder } = require('discord.js');

module.exports = {
  data: new SlashCommandBuilder()
    .setName('ping')
    .setDescription('Replies with Pong!'),
  async execute(interaction) {
    await interaction.reply('Pong!');
  },
};
This code worked before: I tried it and it worked fine, even with 2 commands. But when I tried adding a third one (by copying ping and just changing the values), it started saying "ReferenceError: data is not defined" whenever I tried to run it. So I deleted the file and tried running it with the 2 that already worked, but now it gave this error with those 2 as well. Then I tried with only the ping file from the guide, and even tried copying from the guide again, and I still couldn't get it to work.
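The ReferenceError comes from the final console.log: it reads data, but nothing in the script declares it. In the guide's version of this script, the response of rest.put() is assigned to data, so the likely fix is:

// assign the API response so the final log's data.length has something to read
const data = await rest.put(
  Routes.applicationCommands(clientId),
  { body: commands },
);

console.log(`Successfully reloaded ${data.length} application (/) commands.`);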

Collect hundreds of json files from url and combine into one json file in JavaScript

I am trying to 1) retrieve hundreds of separate JSON files from https://bioguide.congress.gov/, a website that contains data on U.S. legislators, 2) process them, and 3) combine them into one big JSON file that contains all the individual records.
Some of the files I am working with (each legislator has a different URL that serves their data in JSON format) can be found at these URLs:
https://bioguide.congress.gov/search/bio/F000061.json
https://bioguide.congress.gov/search/bio/F000062.json
https://bioguide.congress.gov/search/bio/F000063.json
https://bioguide.congress.gov/search/bio/F000064.json
https://bioguide.congress.gov/search/bio/F000091.json
https://bioguide.congress.gov/search/bio/F000092.json
My approach is to create a for loop to loop over the different ids and combine all the records in an array of objects. Unfortunately, I am stuck trying to access the data.
So far, I have tried the following methods but I am getting a CORS error.
Using fetch:
url = "https://bioguide.congress.gov/search/bio/F000061.json"

fetch(url)
  .then((res) => res.text())
  .then((text) => {
    console.log(text);
  })
  .catch((err) => console.log(err));
Using the no-cors mode in fetch and getting an empty response:
url = "https://bioguide.congress.gov/search/bio/F000061.json"
const data = await fetch(url, { mode: "no-cors" })
Using d3:
url = "https://bioguide.congress.gov/search/bio/F000061.json"
const data = d3.json(url);
With all of them I am getting a CORS-related error: "blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource."
I would appreciate any suggestions and advice to work around this issue. Thanks.
Following on from what @code says in their answer, here's a contrived (but tested) NodeJS example that gets the range of data (60-69) from the server once a second and compiles it into one JSON file.
import express from 'express';
import fetch from 'node-fetch';
import { writeFile } from 'fs/promises';

const app = express();
const port = process.env.PORT || 4000;

let dataset;
let dataLoadComplete;

app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});

function getData() {
  return new Promise((res, rej) => {
    // Initialise the data array
    let arr = [];
    dataLoadComplete = false;

    // Initialise the page number
    async function loop(page = 0) {
      try {
        // Use the incremented page number in the url
        const uri = `https://bioguide.congress.gov/search/bio/F00006${page}.json`;

        // Get the data, parse it, and add it to the
        // array we set up to capture all of the data
        const response = await fetch(uri);
        const data = await response.json();
        arr = [...arr, data];
        console.log(`Loading page: ${page}`);

        // Call the function again to get the next
        // set of data if we've not reached the end of
        // the range (F000069, i.e. page 9, is the last
        // file), or return the finalised data in the
        // promise response
        if (page < 9) {
          setTimeout(loop, 1000, ++page);
        } else {
          console.log('API calls complete');
          res(arr);
        }
      } catch (err) {
        rej(err);
      }
    }
    loop();
  });
}

// Call the looping function and, once complete,
// write the JSON to a file
async function main() {
  const completed = await getData();
  dataset = completed;
  dataLoadComplete = true;
  writeFile('data.json', JSON.stringify(dataset, null, 2), 'utf8');
}

main();
Well, you're getting a CORS (Cross-Origin Resource Sharing) error because the website you're sending an AJAX request to (bioguide.congress.gov) has not explicitly enabled CORS, which means you can't send AJAX requests to it from client-side code, for security reasons.
If you want to request data from that site, you must do it from the server side (PHP, Node, Python, etc.).
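As a concrete illustration of the server-side approach, here is a minimal sketch of an Express proxy the browser could call instead of bioguide.congress.gov directly (the /bio/:id route name is made up for this example):

import express from 'express';
import fetch from 'node-fetch';

const app = express();

app.get('/bio/:id', async (req, res) => {
  try {
    // the server is not subject to the browser's CORS policy
    const upstream = await fetch(`https://bioguide.congress.gov/search/bio/${req.params.id}.json`);
    res.json(await upstream.json());
  } catch (err) {
    res.status(502).json({ error: err.message });
  }
});

app.listen(4000, () => console.log('Proxy listening on port 4000'));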

Trouble with node-fetch Javascript

So I'm trying to build a weather app by using data from a weather API.
import fetch from 'node-fetch'
//fetch weather API
let weather
let getWeather = async () => {
  let url = `https://api.openweathermap.org/data/2.5/weather?q=auckland&appid=c156947e2c7f0ccb0e2a20fde1d2c577`
  try {
    let res = await fetch(url)
    weather = await res.json()
  } catch (error) {
    console.log("error")
  }
  let weatherMain = weather.weather.map(el => el.description)
  if (weatherMain = "Rain") {
    console.log(weatherMain)
    // weatherImg = "https://icon-library.com/images/raining-icon/raining-icon-1.jpg"
  }
}
console.log(getWeather())
My problem is that I'm getting this error when running in vscode:
SyntaxError: Cannot use import statement outside a module
and this error when running in browser:
Uncaught TypeError: Failed to resolve module specifier "node-fetch". Relative references must start with either "/", "./", or "../".
Not sure what exactly is going on, Can someone please explain what's happening?
I've tried fetch API once before and that time I didn't need to import fetch, so I'm pretty confused.
SS
Edit: understood now. Running in the browser and running in Node via VS Code are two different things; what works in the browser won't necessarily work in Node.js.
When running in browser, there's no need to import fetch.
Thanks everyone.
let weather;

let getWeather = async () => {
  let url = `https://api.openweathermap.org/data/2.5/weather?q=auckland&appid=c156947e2c7f0ccb0e2a20fde1d2c577`;
  try {
    let res = await fetch(url);
    weather = await res.json();
    console.log('weather', weather);
  } catch (error) {
    console.log(error);
  }
  let weatherMain = weather.weather.map((el) => el.description);
  // test for rain with includes(); a plain `=` here would assign, not compare
  if (weatherMain.includes('Rain')) {
    console.log('weatherMain', weatherMain);
    let weatherImg = 'https://icon-library.com/images/raining-icon/raining-icon-1.jpg';
    return weatherImg;
  }
};

const main = async () => {
  const data = await getWeather();
  console.log('data', data);
};

main();
Yes, you are right that there's no need to import fetch if you are running the JS in the browser. But I see that you are importing node-fetch; this package brings fetch (window.fetch) to Node.
If you want to run it in Node, you should know that Node doesn't support ES6 modules out of the box, but you can use the experimental flag to run the code, e.g.
node --experimental-modules app.mjs
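On a recent Node version the flag is no longer needed: naming the file app.mjs (or setting "type": "module" in package.json) enables ES module syntax natively. A minimal sketch, with YOUR_KEY as a placeholder for the OpenWeatherMap API key:

// app.mjs - run with: node app.mjs (top-level await needs Node 14.8+)
import fetch from 'node-fetch';

const res = await fetch('https://api.openweathermap.org/data/2.5/weather?q=auckland&appid=YOUR_KEY');
console.log(await res.json());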

How can I troubleshoot this PDF generation?

An employee moved on and left me with this code, which was once working to generate PDFs. I haven't had any luck trying to debug (with breakpoints or even console.logs) the script listed at the bottom; is there a way to search the huge list of loaded scripts in Visual Studio?
C# error:
{System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.IO.IOException: The server returned an invalid or unrecognized response.
at System.Net.Http.HttpConnection.FillAsync()
Client-side error (is this because the server never returns anything?):
ERROR Error: Uncaught (in promise): DataCloneError: Failed to execute 'postMessage' on 'Worker': TypeError: Failed to fetch could not be cloned.
Error: Failed to execute 'postMessage' on 'Worker': TypeError: Failed to fetch could not be cloned.
at MessageHandler.postMessage (pdf.js:12334)
at sendStreamRequest (pdf.js:12151)
at Object.error (pdf.js:12194)
at eval (pdf.js:8419)
at ZoneDelegate.invoke (zone.js:392)
Controller method
public async Task<IActionResult> Generate(string id)
{
    try
    {
        var stream = await _reportService.GenerateReportAsync(id);
        return new FileStreamResult(stream, "application/pdf");
    }
    catch (Exception ex)
    {
        throw;
    }
}
Service method:
public async Task<Stream> GenerateReportAsync(string id)
{
    return await Services.InvokeAsync<Stream>("./Node/generate-pdf.js", Configuration.Url, id, new { format = "A4" });
}
generate-pdf.js:
const pdf = require('html-pdf');
const puppeteer = require('puppeteer');

module.exports = async function (result, url, id, options) {
  const browser = await createBrowser();
  const page = await browser.newPage();
  const css = await browser.newPage();

  await page.goto(`${url}/reports/${id}`, {
    waitUntil: 'networkidle2'
  });
  await css.goto(`${url}/styles.bundle.css`, {
    waitUntil: 'networkidle2'
  });
  await page.waitForSelector('.report-loaded');

  let cssBody = await css.evaluate(() => `<style>${document.documentElement.innerHTML}</style>`);
  let bodyHtml = await page.evaluate(() => document.documentElement.innerHTML);
  bodyHtml = bodyHtml.replace('<link href="styles.bundle.css" rel="stylesheet">', cssBody);

  browser.close();
  pdf.create(cssBody + bodyHtml, options).toStream((error, stream) => stream.pipe(result.stream));
}

async function createBrowser() {
  return await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
}
Looks like the generate-pdf.js script is using "html-pdf". This can be found on npm:
https://www.npmjs.com/package/html-pdf
And it has a github page:
https://github.com/marcbachmann/node-html-pdf
So the problem is going to be with the usage of that package, or some kind of bug in it (well, that's an assumption on my part; I don't know this package at all and have no experience working with it).
At this point I'd try to figure out which version of that package is being used, check out its source code, and try to find a hint in there.
This structure seems rather convoluted, though. Why not just generate the PDF in the client in the first place, or generate it in the C# code? That it was working at some point shouldn't be an argument, as you are now noticing it is proving difficult to maintain.
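One way to take the C# layer out of the equation is a tiny standalone harness that calls the exported function directly, so you can breakpoint or console.log generate-pdf.js under plain node. This is a hypothetical sketch; the url and report id are placeholders for your own values, and it assumes the module only ever touches result.stream:

// harness.js - run with: node harness.js
const fs = require('fs');
const generatePdf = require('./Node/generate-pdf.js');

// stand-in for the object NodeServices normally passes in
const result = { stream: fs.createWriteStream('test.pdf') };

generatePdf(result, 'https://localhost:5001', 'REPORT_ID', { format: 'A4' })
  .then(() => console.log('script finished; check test.pdf'))
  .catch(err => console.error('generate-pdf failed:', err));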

Detect if a web page is using google analytics

I have a Node server. I pass a URL into request and then extract the contents with cheerio. Now what I'm trying to do is detect whether that web page is using Google Analytics. How would I do this?
request({ uri: URL }, function (error, response, body) {
  if (!error) {
    const $ = cheerio.load(body);
    const usesAnalytics = body.includes('googletag') || body.includes('analytics.js') || body.includes('ga.js');
    const isUsingGA = ?;
  }
});
The official analytics site says you can find certain strings that indicate GA is active. I have tried scanning the body for these, but they always return false even if the page is running GA; I included this in the code above.
I've looked at websites that use it, and I can't see anything in their index page that suggests they are using it. It's only when I go to their sources that I see they are using it. How would I detect this in Node?
I have a Node script which uses Puppeteer to monitor the requests sent from a website.
I wrote this some time ago, so some parts might be irrelevant to you, but here you go:
'use strict';

const puppeteer = require('puppeteer');

function getGaTag(lookupDomain) {
  return new Promise((resolve) => {
    (async () => {
      var result = [];
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();
      await page.setRequestInterception(true);

      page.on('request', request => {
        const url = request.url();
        const regexp = /(UA|YT|MO)-\d+-\d+/i;
        // look for tracking script
        if (url.match(/^https?:\/\/www\.google-analytics\.com\/(r\/)?collect/i)) {
          console.log(url.match(regexp));
          console.log('\n');
          result.push(url.match(regexp)[0]);
        }
        request.continue();
      });

      try {
        await page.goto(lookupDomain);
        await page.waitFor(9000);
      } catch (err) {
        console.log("Couldn't fetch page " + err);
      }

      await browser.close();
      resolve(result);
    })();
  })
}

getGaTag('https://store.google.com/').then(result => {
  console.log(result)
})
Running node ga-check.js now returns the UA ID of the Google Analytics tracker on the lookup domain: [ 'UA-54090495-1' ], which in this case is https://store.google.com.
Hope this helps!
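For completeness, a static cheerio check along the lines of the original question is sketched below; it inspects both external script URLs and inline snippets for the usual GA markers. It will still miss trackers injected dynamically (e.g. via Google Tag Manager), which is exactly why the Puppeteer approach above, watching the actual collect requests, is more reliable:

const cheerio = require('cheerio');

function usesGoogleAnalytics(body) {
  const $ = cheerio.load(body);
  const markers = ['google-analytics.com', 'googletagmanager.com', 'gtag(', "ga('create'"];

  // external script URLs (e.g. https://www.google-analytics.com/analytics.js)
  const srcHit = $('script[src]').toArray()
    .some(el => markers.some(m => ($(el).attr('src') || '').includes(m)));

  // inline script bodies (the GA bootstrap snippet is usually inline)
  const inlineHit = $('script:not([src])').toArray()
    .some(el => markers.some(m => ($(el).html() || '').includes(m)));

  return srcHit || inlineHit;
}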
