Scraping site with Request and Cheerio gets strange html content - javascript

I'm trying to scrape a simple website with Cheerio and Request.
Here is my code:
import request from 'request';
request('http://michaelhyatt.com/page/2', function(err, res, html) {
console.log(html);
});
But the HTML that I get back is gibberish, some kind of weird encoded content:
���r� �lE�?��iSZb�,�DI�<��[k��-yy��v(#H�U������nE��y��y��9;��D����S֗�����M�duϲ�M�
H$�D"3��x����gg?�{����:�z���v�����4��7�c |���&����V��ڇ␌��3⎼�┌["�:��
What am I doing wrong? Other websites I have tried to scrape do not experience this issue.

I solved the same issue with axios by disabling content encoding through the request headers:
const response = await axios.get(baseUrl, {
headers: {
"Accept-Encoding" : null
}
});
console.log(response)
The answer was found from the comments above from here
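For anyone wondering why the body looks like that: gibberish of this kind usually means the server sent a compressed body that was printed without being decompressed. That is an assumption about this particular site, not something confirmed in the question. A minimal sketch with Node's built-in zlib module, using a made-up HTML string in place of the real page:

```javascript
const zlib = require('zlib');

// Stand-in for the page a server would return (hypothetical content).
const html = '<html><body><h1>Hello</h1></body></html>';

// Roughly what travels over the wire when the response is gzip-encoded.
const compressed = zlib.gzipSync(Buffer.from(html, 'utf8'));

// Reading the compressed bytes as text gives unreadable output,
// similar to what the question shows.
const gibberish = compressed.toString('utf8');

// Decompressing first recovers the original markup.
const decoded = zlib.gunzipSync(compressed).toString('utf8');

console.log(decoded === html);
```

With the request library, passing an options object that includes gzip: true asks it to decode such responses for you.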

Related

Unable to retrieve data from axios GET request

I've been trying to send a GET request to an api to fetch data using Axios but always get a response object with status, headers, config, agents etc and response.data is always empty.
For example, the following code returns me an Axios response object with the hasBody set to true and data being empty.
axios.get(`https://fantasy.premierleague.com/api/leagues-classic/12000/standings/`).then(response => {
  console.log(response);
  console.log(response.data);
});
However, when I switched over to using Request library which has been deprecated, I am able to get the response body. For example, the following code works:
request(`https://fantasy.premierleague.com/api/leagues-classic/12000/standings/`, { json: true }, (err, res, body) => {
if (err) { return console.log(err); }
console.log(body);
});
Can someone tell me what I am doing wrong and how I can get the response body using axios? I'm a beginner and have spent hours trying to figure this out, so I would really appreciate any help.
It's not an axios library issue. From what I can tell, the server doesn't like user agents starting with "axios/". Specifying a different user agent gives you the expected result:
const axios = require("axios");
axios.get(`https://fantasy.premierleague.com/api/leagues-classic/12000/standings`, {
headers: {
'user-agent': 'not axios',
}
}).then(response => {
console.log(response.data);
});
As for why the request library works but axios does not: axios sets the user-agent header to something like axios/0.21.1, or whatever version you have. request, on the other hand, leaves the user-agent header unset. It's the server's right to handle the request as it pleases.
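To make that reasoning concrete, here is a small sketch. The serverAccepts function is only a guess at the kind of rule the server might apply (nothing in the thread confirms the real one), and withUserAgent is a hypothetical helper that adds an explicit User-Agent header to an axios request config:

```javascript
// Hypothetical server-side rule, guessed from the observed behaviour:
// reject any client whose User-Agent starts with "axios/".
function serverAccepts(userAgent) {
  return !(userAgent || '').startsWith('axios/');
}

// Return a copy of an axios request config with an explicit User-Agent,
// keeping any headers that were already set.
function withUserAgent(config, userAgent) {
  return {
    ...config,
    headers: { ...(config.headers || {}), 'User-Agent': userAgent },
  };
}

const defaultAxiosUa = 'axios/0.21.1';
const config = withUserAgent({ timeout: 5000 }, 'not axios');

console.log(serverAccepts(defaultAxiosUa));               // false
console.log(serverAccepts(config.headers['User-Agent'])); // true
console.log(serverAccepts(undefined));                    // true (header unset)
```

The resulting config object can be passed as the second argument to axios.get.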
I have verified the response from this URL https://fantasy.premierleague.com/api/leagues-classic/12000/standings/ - there is no data property in the response
Try like below to read the values:
It seems that the URL https://fantasy.premierleague.com/api/leagues-classic/12000/standings/ returned an invalid response body.

Axios POST request sends data to Express server but Error 404

Hello, world, I am trying to build a user authentication server for a project I am working on, but I am running into a problem trying to send a POST request to my Node.js Express server.
I want to send a POST request using Axios containing a username and password from the browser. But once the request is sent, I get a 404 Not Found error. The request has to go to http://website/api/login and my Node.js code should return either "authed" or "invalid". I tested the API in Postman and it seems to be working. I also exported the request code from Postman and tested it with the fetch API, XHR, and Axios, all returning the same result.
The server receives the data and handles it properly, but when I look in the Chromium debugger it appears that the request URL is just http://website/ and not http://website/api/login. I am honestly lost and I have tried what feels like everything, but I can't seem to make it work. Any help in pointing me in the right direction would be amazing! Thank you!
The code I use for the POST request is:
const username = document.getElementById("username").value;
const password = document.getElementById("password").value;
const data = JSON.stringify({"username": username, "password":password});
const config = {
method: 'post',
url: 'http://website/api/login',
headers: {
'Content-Type': 'application/json'
},
data : data
};
axios(config).then(function (response) {
console.log(JSON.stringify(response.data));
}).catch(function (err) {
console.log(err);
})
This is what I see in the Chromium debugger: [screenshot: request headers]
This is my Node.js / Express code:
app.post('/api/login', function (req, res, next) {
scriptFile.authUser(req.body, function (err, state) {
if (err) console.log(err);
else {
if (state) {
res.send("authed");
} else {
res.send("invalid");
}
}
});
})
Thank you for any help I can get.
I am stupid. Breakdown of what happened:
Everything was working fine, except that I had put the inputs and the submit button inside a form, so submitting it reloaded the page.
I fixed it by changing the form to a div.
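An alternative to replacing the form with a div is to keep the form and cancel its default submit behaviour. A minimal sketch of that idea follows; the form id "login-form" and the sendLogin callback are hypothetical names, and the fake event only exists so the snippet also runs outside a browser:

```javascript
// Submit handler: cancel the browser's default form submission,
// which would otherwise reload the page while the request is in flight.
function handleLoginSubmit(event, sendLogin) {
  event.preventDefault();
  return sendLogin();
}

// In the browser this would be wired up roughly like so:
// document.getElementById('login-form')
//   .addEventListener('submit', (e) => handleLoginSubmit(e, postCredentials));

// Minimal stand-in for a DOM submit event.
const fakeEvent = {
  defaultPrevented: false,
  preventDefault() { this.defaultPrevented = true; },
};

const result = handleLoginSubmit(fakeEvent, () => 'request sent');
console.log(fakeEvent.defaultPrevented, result);
```

Keeping the form element preserves things like submitting with the Enter key, which a plain div does not give you.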
Hey, checking your Chrome console picture, it looks like your POST request is hitting the root address 'http://website/' and not the full path 'http://website/api/login'.

How to fetch data from an API from GSX2JSON?

So I have this link I generated with GSX2JSON, and it looks like this: http://gsx2json.com/api?id=136PcbZppJfCH1vbE_j4X803umxv0_EWEg5Tjxnvvp7o&sheet=1. Now, I want to fetch the data into a variable, and so I used this code:
async function deetdeet(){
let response = await fetch('http://gsx2json.com/api?id=136PcbZppJfCH1vbE_j4X803umxv0_EWEg5Tjxnvvp7o&sheet=1');
if (response.ok) {
let json = await response.json();
console.log(json)
console.log("hyeet")
} else {
alert("Err: " + response.status);
}
}
deetdeet()
Sadly, this doesn't seem to return the JSON that is shown in the API, and I can't figure out why. I've tried using fetch() and even .getJSON() from jQuery, all to no avail. Is there an issue with my code, or with the API I'm using?
Browsers block mixed content to protect users against various attacks, so fetching an HTTP resource from an HTTPS context will be blocked.
Look into proxying your request through an HTTPS API wrapper, or use an API that supports HTTPS.
If you are serving your site over HTTPS, make sure that all fetch() requests go over HTTPS as well.
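As a rough illustration of the check the browser is making, and of the simplest fix, here is a sketch. The page URL is a made-up example, and switching the scheme only helps if the API actually answers over HTTPS, which is assumed here rather than verified:

```javascript
// True when an HTTPS page tries to load a plain HTTP resource,
// which is the situation browsers block as mixed content.
function isMixedContent(pageUrl, resourceUrl) {
  return new URL(pageUrl).protocol === 'https:' &&
         new URL(resourceUrl).protocol === 'http:';
}

// Switch an http:// URL to https://, leaving everything else untouched.
function toHttps(resourceUrl) {
  const url = new URL(resourceUrl);
  if (url.protocol === 'http:') {
    url.protocol = 'https:';
  }
  return url.toString();
}

const pageUrl = 'https://example.com/index.html'; // hypothetical page
const apiUrl = 'http://gsx2json.com/api?id=136PcbZppJfCH1vbE_j4X803umxv0_EWEg5Tjxnvvp7o&sheet=1';

console.log(isMixedContent(pageUrl, apiUrl));          // true
console.log(isMixedContent(pageUrl, toHttps(apiUrl))); // false
```

The result of toHttps(apiUrl) is what you would then pass to fetch().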

Post request to JSON server through FETCH api refreshes the page

I am trying to send POST requests through the fetch API to JSON Server. The function is called on a simple button click (type 'button', not 'submit'). When I replace the POST request with a GET request, everything works as it is supposed to, but with POST I have a problem. The request goes through and the entity gets created on the JSON Server, but the page refreshes after each request. Also, I don't get a response from JSON Server; Google Chrome says 'Failed to load response data'.
Where am I making a mistake?
const comment = {
text: "test comment",
article_id: 3
};
console.log(JSON.stringify(comment));
const options = {
method: 'post',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify(comment)
}
fetch(`${URL_COMMENTS}`, options)
.then(response => { return response.json() })
.then(data => {
console.log(data)
});
If you use the Live Server extension, try disabling it and try again.
Check which port number JSON Server is running on on your machine.
Please attach the HTML form code so we can try to reproduce the issue on our local machine. That will make it easier to resolve.

Node.js can't request a squarespace site?

I'm trying to request this website but I keep getting a 400 Bad Request error. This code works for just about any other site I've tried that isn't built with Squarespace, so I'm guessing that's where the problem is.
var request = require('request');
var cheerio = require('cheerio');
var url = 'http://www.pond-mag.com/';
request(url, function(error, resp, body){
if(!error){
var $ = cheerio.load(body);
console.log(body);
}
});
Figured it out: I just had to set the headers object manually.
Here's the code that fixed it, in case anyone else has the problem:
var options = {
url : 'http://www.pond-mag.com/',
headers: {
'User-Agent': 'request'
}
};
Then just pass the options variable to request instead of the URL.
I use Squarespace for a lot of different apps and thought it was worth mentioning that Squarespace has native support for getting the JSON of any Squarespace page. If you append ?format=json to the URL, you can make the request and get JSON back.
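If you want to add that parameter in code rather than by hand, something like the following should work. toSquarespaceJsonUrl is a hypothetical helper name, and the second URL is a made-up example to show that an existing query string is preserved:

```javascript
// Add format=json to a page URL, keeping any existing query parameters.
function toSquarespaceJsonUrl(pageUrl) {
  const url = new URL(pageUrl);
  url.searchParams.set('format', 'json');
  return url.toString();
}

console.log(toSquarespaceJsonUrl('http://www.pond-mag.com/'));
// http://www.pond-mag.com/?format=json

console.log(toSquarespaceJsonUrl('http://www.pond-mag.com/blog?page=2'));
// http://www.pond-mag.com/blog?page=2&format=json
```

The returned string can be used as the url in the options object above.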
