Can't get json with axios.get and headers - javascript

I am trying to get the joke from https://icanhazdadjoke.com/. This is the code I used:

const getDadJoke = async () => {
  const res = await axios.get('https://icanhazdadjoke.com/', { headers: { Accept: 'application/json' } })
  console.log(res.data.joke)
}

getDadJoke()

I expected to get the joke, but instead I got the full HTML page, as if I hadn't specified the headers at all. What am I doing wrong?

If you look at the API documentation for icanhazdadjoke.com, there is a section titled "Custom user agent." In that section, they explain that they want all requests to set a custom User-Agent header. If you use Axios in a browser context, the User-Agent is set for you by your browser. But I'm going to go out on a limb and say that you are running this code via Node, in which case you may need to set the User-Agent header manually, like so:
const axios = require('axios')

const getDadJoke = async () => {
  const res = await axios.get(
    'https://icanhazdadjoke.com/',
    {
      headers: {
        'Accept': 'application/json',
        'User-Agent': 'my URL, email or whatever'
      }
    }
  )
  console.log(res.data.joke)
}

getDadJoke()
The docs say what they want you to put in the User-Agent, but I suspect it would honestly work with any User-Agent value at all.
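For reference, the docs ask you to identify yourself with something like a library or site name plus a URL or email; a hypothetical value might look like:

  'User-Agent': 'my-dad-joke-app (https://github.com/username/my-dad-joke-app)'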

The HTML page you're getting is a 503 response from Cloudflare.
As per the API documentation
Custom user agent
If you intend on using the icanhazdadjoke.com API we kindly ask that you set a custom User-Agent header for all requests.
My guess is they have a Cloudflare Browser Integrity Check configured that's triggering for the default Node / Axios user-agent.
Setting a custom user-agent appears to get around this...
const axios = require("axios");

const getDadJoke = async () => {
  try {
    const res = await axios.get("https://icanhazdadjoke.com/", {
      headers: {
        accept: "application/json",
        "user-agent": "My Node and Axios app", // use something better than this
      },
    });
    console.log(res.data.joke);
  } catch (err) {
    console.error(err.response?.data, err.toJSON());
  }
};

getDadJoke();
Given how unreliable Axios releases have been since v1.0.0, I highly recommend you switch to something else. The Fetch API is available natively in Node since v18:
const getDadJoke = async () => {
  try {
    const res = await fetch("https://icanhazdadjoke.com/", {
      headers: {
        accept: "application/json",
        "user-agent": "My Node and Fetch app", // use something better than this
      },
    });
    if (!res.ok) {
      const err = new Error(`${res.status} ${res.statusText}`);
      err.text = await res.text();
      throw err;
    }
    console.log((await res.json()).joke);
  } catch (err) {
    console.error(err, err.text);
  }
};

getDadJoke();

You can use Axios to call the REST API, which responds in JSON format.
If you use the API as described at https://icanhazdadjoke.com/api#authentication, Axios works fine.
Here is an example.
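A minimal sketch (assuming the Accept and User-Agent headers the docs ask for; the User-Agent value is a placeholder):

const axios = require('axios');

const getJoke = async () => {
  const res = await axios.get('https://icanhazdadjoke.com/', {
    headers: {
      'Accept': 'application/json',
      'User-Agent': 'my-joke-app (https://example.com)', // hypothetical identifier
    },
  });
  console.log(res.data.joke);
};

getJoke();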
Alternative method
You can also use a web-scraping approach, since https://icanhazdadjoke.com/ serves an HTML response by default.
Here is an example of how to scrape it using the puppeteer library in Node.js.
Demo code
Save it as a get-joke.js file.

const puppeteer = require("puppeteer");

async function getJoke() {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://icanhazdadjoke.com/');
    const joke = await page.evaluate(() => {
      const jokes = Array.from(document.querySelectorAll('p[class="subtitle"]'));
      return jokes[0].innerText;
    });
    await browser.close();
    return joke;
  } catch (error) {
    throw error;
  }
}

getJoke()
  .then((joke) => {
    console.log(joke);
  })
  .catch((error) => {
    console.error(error);
  });
Selector
The main idea is to use a DOM tree selector.
Chrome's DevTools (opened by pressing F12) shows the HTML DOM tree structure.
The <p> tag holding the joke has the class name subtitle, so it can be selected with:

document.querySelectorAll('p[class="subtitle"]')
Install the dependency and run it:

npm install puppeteer
node get-joke.js

Result
You will get the joke from that web site.

Related

Why use Next.js API route with an external API?

I am new to Next.js.
I want to know what the use of export default function handler is, since we can call the API directly using fetch.
In my HTML code I put the code below. When I click on the submit button, the sendformData() function is called.
<input type="button" value="Submit" onClick={() => this.sendformData()} ></input>

sendformData = async () => {
  const res = await fetch("/api/comments/getTwitFrmUrl?twitUrl=" + this.state.twitUrl, {
    headers: {
      "Content-Type": "application/json",
    },
    method: "GET",
  });
  const result = await res.json();
  this.setState({ data: result.data });
};
When sendformData is called, it hits the /api/comments/ route, which runs the handler function.
Here is the /api/comments/[id].js file code:
export default async function handler(req, res) {
  if (req.query.id == 'getTwitFrmUrl') {
    const resData = await fetch(
      "https://dev.. .com/api/getTwitFrmUrl?twitId=" + req.query.twitUrl
    ).then((response) => response.text()).then((result) => JSON.parse(result).data);
    res.status(200).json({ data: resData });
  } else if (req.query.id == 'getformdata') {
    console.log('getformdata api');
    res.status(200).json({ user: 'getuserData' });
  }
}
When I put the code below in sendformData, the same response is retrieved. So why do we need the export default function handler at all?

sendformData = async () => {
  const result = await fetch(
    "https://dev.. .com/api/getTwitFrmUrl?twitId=" + this.state.twitUrl
  ).then((response) => response.text()).then((result) => JSON.parse(result).data);
  this.setState({ data: result });
};
If you already have an existing API, there's no need to proxy requests to that API through an API route. It's completely fine to make a direct call to it.
However, there are some use cases for wanting to do so.
Security concerns
For security reasons, you may want to use API routes to hide an external API URL, or avoid exposing environment variables needed for a request from the browser.
Masking the URL of an external service (e.g. /api/secret instead of https://company.com/secret-url)
Using Environment Variables on the server to securely access external services.
— Next.js, API Routes, Use Cases
Avoid CORS restrictions
You may also want to proxy requests through API routes to circumvent CORS. By making the requests to the external API from the server, CORS restrictions will not apply.
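For illustration, a minimal sketch of such a proxy route (the file name, external URL, and environment variable are hypothetical):

// pages/api/secret.js — hypothetical route file
export default async function handler(req, res) {
  // The external URL and credential stay on the server; the browser only ever sees /api/secret
  const response = await fetch("https://company.com/secret-url", {
    headers: { Authorization: `Bearer ${process.env.SECRET_API_KEY}` },
  });
  const data = await response.json();
  res.status(200).json(data);
}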

How to do A/B testing with Cloudflare workers

I'm looking for an example of what these two lines of code look like in a functioning A/B testing worker. From https://developers.cloudflare.com/workers/examples/ab-testing
const TEST_RESPONSE = new Response("Test group") // e.g. await fetch("/test/sompath", request)
const CONTROL_RESPONSE = new Response("Control group") // e.g. await fetch("/control/sompath", request)
I used the examples, substituting the paths I'm using, and got a syntax error saying await can only be used in async functions. So I changed the function to async function handleRequest(request) and got a 500 error.
What should these two lines look like for the code to work?
Working example

document.getElementById("myBtn").addEventListener("click", myFun);

async function myFun() {
  const isTest = Math.random() > 0.5 // here you place your condition
  const res = isTest ? await fetch1() : await fetch1() // here you should have different fetches for A and B (this demo reuses fetch1)
  console.log(res)
  document.getElementById("demo").innerHTML = res.message;
}

async function fetch1() {
  // I took the link from some other SO answer as I was running into CORS problems with e.g. google.com
  return fetch("https://currency-converter5.p.rapidapi.com/currency/list?format=json", {
    "method": "GET",
    "headers": {
      "x-rapidapi-host": "currency-converter5.p.rapidapi.com",
      "x-rapidapi-key": "**redacted**"
    }
  })
    .then(response => response.json())
}
<button id="myBtn">button</button>
<div id="demo">demo</div>
Okay, I've worked on this a bit, and the worker below seems to fulfill your requirements:
async function handleRequest(request) {
  const NAME = "experiment-0";
  try {
    // The Responses below are placeholders. You can set up a custom path for each test (e.g. /control/somepath ).
    const TEST_RESPONSE = await fetch("https://httpbin.org/image/jpeg", request);
    const CONTROL_RESPONSE = await fetch("https://httpbin.org/image/png", request);
    // Determine which group this requester is in.
    const cookie = request.headers.get("cookie");
    if (cookie && cookie.includes(`${NAME}=control`)) {
      return CONTROL_RESPONSE;
    } else if (cookie && cookie.includes(`${NAME}=test`)) {
      return TEST_RESPONSE;
    } else {
      // If there is no cookie, this is a new client. Choose a group and set the cookie.
      const group = Math.random() < 0.5 ? "test" : "control"; // 50/50 split
      const response = group === "control" ? CONTROL_RESPONSE : TEST_RESPONSE;
      // Copy the headers explicitly; spreading a Headers object would drop them.
      const headers = new Headers(response.headers);
      headers.append("Set-Cookie", `${NAME}=${group}; path=/`);
      return new Response(response.body, {
        status: response.status,
        headers,
      });
    }
  } catch (e) {
    const response = [`internal server error: ${e.message}`];
    if (e.stack) {
      response.push(e.stack);
    }
    return new Response(response.join("\n"), {
      status: 500,
      headers: {
        "Content-type": "text/plain",
      },
    });
  }
}

addEventListener("fetch", (event) => {
  event.respondWith(handleRequest(event.request));
});
There are some issues with the snippet once you uncomment the fetch() parts:
Yes, the function needs to be async in order to use the await keyword, so I added that.
Additionally, I think the 500 error you were seeing was also due to a bug in the snippet: you'll notice I form a new Response instead of modifying the one chosen by the ternary expression. This is because responses are immutable in the Workers runtime, so adding headers to an already instantiated response results in an error. Therefore, you can just create an entirely new Response with all the bits from the original one, and it seems to work.
Also, to gain insight into errors in a worker, it's always a good idea to add a try/catch and render those errors in the worker response.
To get yours working, just replace the httpbin.org URLs with whatever you need to A/B test.
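For example, the two fetches might become something like this (the origin and paths are hypothetical placeholders for your own):

const TEST_RESPONSE = await fetch("https://example.com/test/somepath", request);
const CONTROL_RESPONSE = await fetch("https://example.com/control/somepath", request);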

Detect if a web page is using google analytics

I have a Node server. I pass a URL into request and then extract the contents with cheerio. Now what I'm trying to do is detect whether that webpage is using Google Analytics. How would I do this?
request({ uri: URL }, function (error, response, body) {
  if (!error) {
    const $ = cheerio.load(body);
    const usesAnalytics = body.includes('googletag') || body.includes('analytics.js') || body.includes('ga.js');
    const isUsingGA = ?;
  }
});
From the official analytics site, they say that you can find certain strings that would indicate GA is active. I have tried scanning the body for these, but they always return false even if the page is running GA. I included this in the code above.
I've looked at websites that use it and I can't see anything in their index page that would suggest they are using it. It's only when I go to their sources that I see they are using it. How would I detect this in Node?
I have a Node script which uses Puppeteer to monitor the requests sent from a website.
I wrote this some time ago, so some parts might be irrelevant to you, but here you go:
'use strict';

const puppeteer = require('puppeteer');

function getGaTag(lookupDomain) {
  return new Promise((resolve) => {
    (async () => {
      var result = [];
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();
      await page.setRequestInterception(true);
      page.on('request', request => {
        const url = request.url();
        const regexp = /(UA|YT|MO)-\d+-\d+/i;
        // look for the tracking script
        if (url.match(/^https?:\/\/www\.google-analytics\.com\/(r\/)?collect/i)) {
          console.log(url.match(regexp));
          console.log('\n');
          result.push(url.match(regexp)[0]);
        }
        request.continue();
      });
      try {
        await page.goto(lookupDomain);
        await page.waitFor(9000);
      } catch (err) {
        console.log("Couldn't fetch page " + err);
      }
      await browser.close();
      resolve(result);
    })();
  });
}

getGaTag('https://store.google.com/').then(result => {
  console.log(result);
});
Running node ga-check.js now returns the UA ID of the Google Analytics tracker on the lookup domain, [ 'UA-54090495-1' ], which in this case is https://store.google.com.
Hope this helps!

Manually change response URL during Puppeteer request interception

I'm having a hard time navigating relative URLs with Puppeteer for a specific use case. Below you can see the basic setup and a pseudo example describing the problem.
Essentially I want to change the current URL the browser thinks it is at.
What I already tried:
Manipulating the response body by resolving all relative URLs myself. This collides with some JavaScript-based links.
Triggering a new page.goto(response.url) if the request URL doesn't match the response URL, and returning the response from the previous request. I can't seem to pass in custom options, so I don't know which request is a fake page.goto.
Can somebody lend me a helping hand? Thanks in advance.
Setup:
const browser = await puppeteer.launch({
  headless: false,
});
const [page] = await browser.pages();

await page.setRequestInterception(true);

page.on('request', async (request) => { // the callback must be async to use await inside
  const resourceType = request.resourceType();
  if (['document', 'xhr', 'script'].includes(resourceType)) {
    // fetching takes place on a different instance and handles redirects internally
    const response = await fetch(request);
    request.respond({
      body: response.body,
      statusCode: response.statusCode,
      url: response.url // no effect
    });
  } else {
    request.abort('aborted');
  }
});
Navigation:
await page.goto('https://start.de');
// redirects to https://redirect.de
await page.click('a');
// relative href '/demo.html' resolves to https://start.de/demo.html instead of https://redirect.de/demo.html
await page.click('a');
Update 1
Solution
Manipulating the browser location directly via window.location.
await page.goto('https://start.de');
// redirects to https://redirect.de internally
await page.click('a');
// changing current window location
await page.evaluate(() => {
  window.location.href = 'https://redirect.de';
});
// correctly resolves to https://redirect.de/demo.html instead of https://start.de/demo.html
await page.click('a');
When you match a request whose body you want to edit, just take its URL and make a new call using the "node-fetch" or "request" module; when you receive the body, edit it and then send it as the response to the original request.
For example:
// note: request.respond() only works after awaiting page.setRequestInterception(true)
const requestModule = require("request-promise"); // the plain "request" module is not thenable
const cheerio = require("cheerio");

page.on("request", async (request) => {
  // Match the URL that you want
  const isMatched = /page-12/.test(request.url());
  if (isMatched) {
    // Make a new call
    requestModule({
      url: request.url(),
      resolveWithFullResponse: true,
    })
      .then((response) => {
        const { body, headers, statusCode, statusMessage } = response;
        const contentType = headers["content-type"];
        // Edit the body using the cheerio module
        const $ = cheerio.load(body);
        $("a").each(function () {
          $(this).attr("href", "/fake_pathname");
        });
        // Send the edited body as the response
        request.respond({
          ok: statusMessage === "OK",
          status: statusCode,
          contentType,
          body: $.html(),
        });
      })
      .catch(() => request.continue());
  } else request.continue();
});
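Since the request module is deprecated, here is a rough equivalent of the same flow using node-fetch instead (a sketch under that substitution, not the original answer's code):

const fetch = require("node-fetch");
const cheerio = require("cheerio");

page.on("request", async (request) => {
  if (!/page-12/.test(request.url())) return request.continue();
  try {
    // Re-fetch the matched URL ourselves
    const response = await fetch(request.url());
    const $ = cheerio.load(await response.text());
    // Rewrite every link before handing the page back to the browser
    $("a").each(function () {
      $(this).attr("href", "/fake_pathname");
    });
    await request.respond({
      status: response.status,
      contentType: response.headers.get("content-type"),
      body: $.html(),
    });
  } catch (err) {
    request.continue();
  }
});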

Programmatically capturing AJAX traffic with headless Chrome

Chrome officially supports running the browser in headless mode (including programmatic control via the Puppeteer API and/or the CRI library).
I've searched through the documentation, but I haven't found how to programmatically capture the AJAX traffic from the instances (i.e. start an instance of Chrome from code, navigate to a page, and access the background request/response calls and raw data, all from code, without using the developer tools or extensions).
Do you have any suggestions or examples detailing how this could be achieved? Thanks!
Update
As @Alejandro pointed out in the comment, resourceType is a function and its return value is lowercased:
page.on('request', request => {
  if (request.resourceType() === 'xhr') {
    // do something
  }
});
Original answer
Puppeteer's API makes this really easy:
page.on('request', request => {
  if (request.resourceType === 'XHR') {
    // do something
  }
});
You can also intercept requests with setRequestInterception, but it's not needed in this example if you're not going to modify the requests.
There's an example of intercepting image requests that you can adapt.
resourceTypes are defined here.
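For instance, a minimal interception sketch (only needed if you want to modify or block requests; blocking images here is purely illustrative):

await page.setRequestInterception(true);
page.on('request', request => {
  if (request.resourceType() === 'image') {
    request.abort(); // drop image requests
  } else {
    request.continue(); // let everything else through
  }
});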
I finally found out how to do what I wanted. It can be done with chrome-remote-interface (CRI) and Node.js. I'm attaching the minimal code required.
const CDP = require('chrome-remote-interface');

(async function () {
  // you need to have a Chrome open with remote debugging enabled
  // ie. chrome --remote-debugging-port=9222
  const protocol = await CDP({ port: 9222 });
  const { Page, Network } = protocol;
  await Page.enable();
  await Network.enable(); // need this to call Network.getResponseBody below
  Page.navigate({ url: 'http://localhost/' }); // your URL
  const onDataReceived = async (e) => {
    try {
      let response = await Network.getResponseBody({ requestId: e.requestId });
      if (typeof response.body === 'string') {
        console.log(response.body);
      }
    } catch (ex) {
      console.log(ex.message);
    }
  };
  protocol.on('Network.dataReceived', onDataReceived);
})();
Puppeteer's listeners can help you capture XHR responses via the response and request events.
You should check whether request.resourceType() is xhr or fetch first.
listener = page.on('response', response => {
  const isXhr = ['xhr', 'fetch'].includes(response.request().resourceType());
  if (isXhr) {
    console.log(response.url());
    response.text().then(console.log);
  }
});
Another approach is to listen on the page's underlying CDP session directly:

const browser = await puppeteer.launch();
const page = await browser.newPage();
const pageClient = page["_client"];

pageClient.on("Network.responseReceived", event => {
  if (~event.response.url.indexOf('/api/chart/rank')) {
    console.log(event.response.url);
    pageClient.send('Network.getResponseBody', {
      requestId: event.requestId
    }).then(async response => {
      const body = response.body;
      if (body) {
        try {
          const json = JSON.parse(body);
        } catch (e) {
        }
      }
    });
  }
});

await page.setRequestInterception(true);
page.on("request", async request => {
  request.continue();
});

await page.goto('http://www.example.com', { timeout: 0 });
