How to scrape video url using puppeteer? - javascript

I'm trying to scrape video url of Instagram videos using puppeteer but unable to do it. it is returning null as a response
here is my code
async function getVideo(){
const launch = await puppeteer.launch({headless: true});
const page = await launch.newPage();
await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
const video = await page.evaluate(() => {
return document.querySelector('video').src;
});
console.log(video); returns null
await launch.close();
}
example ur: https://instagram.fluh1-1.fna.fbcdn.net/v/t50.16885-16/290072800_730588251588660_5005285215058589375_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5pZ3R2LmJhc2VsaW5lIiwicWVfZ3JvdXBzIjoiW1wiaWdfd2ViX2RlbGl2ZXJ5X3Z0c19vdGZcIl0ifQ&_nc_ht=instagram.fluh1-1.fna.fbcdn.net&_nc_cat=100&_nc_ohc=ROJWkaOqkQcAX_z-_Ls&edm=AP_V10EBAAAA&vs=440468611258459_2442386419&_nc_vs=HBksFQAYJEdPQW9TaEUwaURaVmQ1Z0NBTC0yRkV0aVdIWkZidlZCQUFBRhUAAsgBABUAGCRHTEdvVHhGMWFjUUpsMzhDQUZNT0c1cV8wT3c1YnZWQkFBQUYVAgLIAQAoABgAGwGIB3VzZV9vaWwBMRUAACaa%2BO%2FYnLPeQBUCKAJDMywXQCDdsi0OVgQYEmRhc2hfYmFzZWxpbmVfMV92MREAdewHAA%3D%3D&ccb=7-5&oh=00_AfCBrACQlXOqmbGSWRk_6Urv_fmHJUFDIt-8w6EO0_UcHQ&oe=638D6CBD&_nc_sid=4f375e

You are loading the Instagram page. Since it takes a little while to load, I used setTimeout function to wait. Puppeteer also has many inbuilt functions you can use to obtain the src, such as the following.
async function getVideo(){
const launch = await puppeteer.launch({headless: false});
const page = await launch.newPage();
await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
setTimeout(async () => {
let src = await page.$eval("video", n => n.getAttribute("src"))
console.log(src);
await launch.close();
}, 1000)
}

Related

Trying to use puppeteer inside async function inside async function which already has puppeteer

I'm trying to build telegram bot to parse page on use request. My parsing code works fine inside one async function, but completeky falls on its face if I try to put it inside another async function.
Here is the relevant code I have:
const puppeteer = require('puppeteer');
const fs = require('fs/promises');
const { Console } = require('console');
async function start(){
async function searcher(input) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const url = ; //here is a long url combining logic, that works fine
await page.goto(url);
const currentUrl = requestPage.url();
console.log(currentUrl); //returns nothing.
//here is some long parsing logic
await browser.close();
return combinedResult;
}
//here is a bot code
const { Telegraf } = require('telegraf');
const bot = new Telegraf('my bot ID');
bot.command('start', ctx => {
console.log(ctx.from);
bot.telegram.sendMessage(ctx.chat.id, 'Greatings message', {});
bot.telegram.sendMessage(ctx.chat.id, 'request prompt ', {});
})
bot.on('text', (ctx) => {
console.log(ctx.message.text);
const queryOutput = searcher(ctx.message.text);
bot.telegram.sendMessage(ctx.chat.id, queryOutput, {});
});
bot.launch()
}
start();
Here is an error message:
/Users/a.rassanov/Desktop/Fetch/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:218
return Promise.reject(new Error(`Protocol error (${method}): Session closed. Most likely the ${this._targetType} has been closed.`));
^
Error: Protocol error (Page.navigate): Session closed. Most likely the page has been closed.
I'm very new to this, and your help is really appriciated.

Node using test for UI login

I use the following blog to use the playright
login and I need something similar to use for my app, when I use the headless:flase
I see it opens the UI with the user password in however, it doesnt click on the logon button, I adopt the code I try with the following , am I missing something?
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://app.com');
await page.fill('input[type="text"]', 'user#test.com');
await page.fill('input[type="password"]', 'Abcd1234!');
// page.click('div[data-testid="LoginForm_Login_Button"]');
page.click('div[id="logOnFormSubmit"]');
}
)();
You are currently using
page.click('div[id="logOnFormSubmit"]');
There is no div in your given code example with that ID, but instead there is a button. You'd need to change that line to reflect this. The final code would look like below.
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://app.com');
await page.fill('input[type="text"]', 'user#test.com');
await page.fill('input[type="password"]', 'Abcd1234!');
// page.click('div[data-testid="LoginForm_Login_Button"]');
page.click('button[id="logOnFormSubmit"]');
}
)();

Favicon missing in headless puppeteer

I'm writing a simple script to just check if all the resources load correctly (i check status codes of the responses). I've decided to use puppeteer for this and i wrote
(async () => {
const browser = await puppeteer.launch({headless: false});
const [page] = await browser.pages();
page.on("response", (res) => {
const url = res.url(), status=res.status();
// my functionality goes here
if (url.includes("favicon")) console.log(status, url); // not logging in headless
});
await page.goto("https://stackoverflow.com/", {waitUntil: 'networkidle2'});
await browser.close();
})();
the issue is that if i run my application in headless mode the favicon is missing from the responses, i assume it has something to do with puppeteer not loading a favicon in headless. Any built in functionality or workarounds?
From the lack of a better solution right now I'm evaluating favicons url and manually visiting it
async function checkFavicon(page){
const iconUrl = await page.$eval("link[rel*='icon']", ({href}) => href);
await page.goto(iconUrl);
await page.goBack();
}

I can't go from a page to another using page.goto() - Puppeteer

I'm trying to make a InstagramBot that logs in and then go to some profile, my code worked yesterday for awhile and than it just stopped working .
I've tried to clone my repository from github, but it does'n work either, sometimes it works again, but if I try to create another function, the code just ignore the line of the code that changes the page.
I've also tried to create a new page and then in this new page use the goto function and it worked, but the account doesn keep logged in
The version of puppeteer that I'm using: 1.16.0
The version of node.js that I'm using: v10.15.3
const puppeteer = require('puppeteer');
const BASE_URL = "https://www.instagram.com/accounts/login/?hl=en&source=auth_switcher";
const instagram = {
browser: null,
page: null,
profile_url: null,
initialize: async (profile) => {
instagram.browser = await puppeteer.launch({
headless: false
})
instagram.profile_url = await "https://www.instagram.com/" + profile;
instagram.page = await instagram.browser.newPage();
await instagram.page.goto(BASE_URL, {waitUntil: 'networkidle2'});
},
login: async(username, password) =>{
await instagram.page.waitFor(1000);
await instagram.page.type('input[name="username"]', username);
await instagram.page.type('input[name="password"', password);
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
await console.log(instagram.profile_url);
await instagram.page.goto(instagram.profile_url, {timeout: 0, waitUntil: 'domcontentloaded'}); // the code just ignore this line
await instagram.page.waitFor(1000);
},
getPhotosLinks: async() => {
console.log("Do something here");
}
}
module.exports = instagram;
It doesn't give any error message, just doesn't work
Replace
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
with
await Promise.all([
instagram.page.click('button[type="submit"]');,
instagram.page.waitForNavigation()
]);
and see if it works

Puppeteer: Grabbing html from page that doesn't refresh after input tag button is clicked

I am trying to grab some html after a input tag button is clicked. I am clicking the button with page.evaluate() since page.click() does not seem to work for an input tag button. I have tried visual debugging with headless:false in the puppeteer launch options to verify that the browser indeed navigated to the point after the button is clicked. I am unsure as to why page.content() returns the html before the button is clicked rather than the html after the event happens.
const puppeteer = require('puppeteer');
const url = 'http://www.yvr.ca/en/passengers/flights/departing-flights';
const fs = require('fs');
const tomorrowSelector = '#flights-toggle-tomorrow'
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto(url);
await page.evaluate((selector)=>document.querySelector(selector).click(),tomorrowSelector);
let html = await page.content();
await fs.writeFile('index.html', html, function(err){
if (err) console.log(err);
console.log("Successfully Written to File.");
});
await browser.close();
});
You can click on the label for the radio. Also, you need to wait for some sign of changed state (for XHR/fetch response or new selectors). For example, this code works for me, but you can use any other condition or just wait for some seconds.
const fs = require('fs');
const puppeteer = require('puppeteer');
const url = 'http://www.yvr.ca/en/passengers/flights/departing-flights';
const tomorrowLabelSelector = 'label[for=flights-toggle-tomorrow]';
const tomorrowLabelSelectorChecked = '.yvr-form__toggle:checked + label[for=flights-toggle-tomorrow]';
puppeteer.launch({ headless: false }).then(async (browser) => {
const page = await browser.newPage();
await page.goto(url);
await Promise.all([
page.click(tomorrowLabelSelector),
page.waitForSelector(tomorrowLabelSelectorChecked),
]);
const html = await page.content();
await fs.writeFile('index.html', html, (err) => {
if (err) console.log(err);
console.log('Successfully Written to File.');
});
// await browser.close();
});

Categories