I'm trying to scrape the video URL of Instagram videos using Puppeteer, but I'm unable to do it: it returns null as the response.
Here is my code:
async function getVideo(){
const launch = await puppeteer.launch({headless: true});
const page = await launch.newPage();
await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
const video = await page.evaluate(() => {
return document.querySelector('video').src;
});
console.log(video); // returns null
await launch.close();
}
Example URL: https://instagram.fluh1-1.fna.fbcdn.net/v/t50.16885-16/290072800_730588251588660_5005285215058589375_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5pZ3R2LmJhc2VsaW5lIiwicWVfZ3JvdXBzIjoiW1wiaWdfd2ViX2RlbGl2ZXJ5X3Z0c19vdGZcIl0ifQ&_nc_ht=instagram.fluh1-1.fna.fbcdn.net&_nc_cat=100&_nc_ohc=ROJWkaOqkQcAX_z-_Ls&edm=AP_V10EBAAAA&vs=440468611258459_2442386419&_nc_vs=HBksFQAYJEdPQW9TaEUwaURaVmQ1Z0NBTC0yRkV0aVdIWkZidlZCQUFBRhUAAsgBABUAGCRHTEdvVHhGMWFjUUpsMzhDQUZNT0c1cV8wT3c1YnZWQkFBQUYVAgLIAQAoABgAGwGIB3VzZV9vaWwBMRUAACaa%2BO%2FYnLPeQBUCKAJDMywXQCDdsi0OVgQYEmRhc2hfYmFzZWxpbmVfMV92MREAdewHAA%3D%3D&ccb=7-5&oh=00_AfCBrACQlXOqmbGSWRk_6Urv_fmHJUFDIt-8w6EO0_UcHQ&oe=638D6CBD&_nc_sid=4f375e
The Instagram page takes a little while to load, so I used setTimeout to wait before reading the element. Puppeteer also has built-in helpers you can use to obtain the src, such as page.$eval in the following code.
async function getVideo(){
const launch = await puppeteer.launch({headless: false});
const page = await launch.newPage();
await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
setTimeout(async () => {
let src = await page.$eval("video", n => n.getAttribute("src"))
console.log(src);
await launch.close();
}, 1000)
}
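A fixed 1000 ms delay can be too short on a slow connection. As a minimal sketch (not part of the answer above), the wait could instead use page.waitForSelector, which resolves once a matching element appears. The helper name getVideoSrc and the 15-second timeout are assumptions for illustration, and Instagram may serve a blob: URL or require login, in which case this will not yield a direct .mp4 link.

```javascript
// Hypothetical helper: waits for a <video> element, then reads its source URL.
// `page` is a Puppeteer Page; the default timeout is an arbitrary assumption.
async function getVideoSrc(page, timeoutMs = 15000) {
  // waitForSelector resolves with an ElementHandle once the element is in the DOM
  // and rejects with a timeout error if it never appears.
  const handle = await page.waitForSelector('video', { timeout: timeoutMs });
  // Runs in the browser context; currentSrc also covers <source> children.
  return handle.evaluate((el) => el.currentSrc || el.getAttribute('src'));
}

// Example wiring (not called automatically; requires the puppeteer package).
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
    console.log(await getVideoSrc(page));
  } finally {
    await browser.close();
  }
}
```

Calling main() runs the whole flow; getVideoSrc can also be reused with any page you already have open.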
I'm trying to build a Telegram bot that parses a page on user request. My parsing code works fine inside one async function, but completely falls on its face if I try to put it inside another async function.
Here is the relevant code I have:
const puppeteer = require('puppeteer');
const fs = require('fs/promises');
const { Console } = require('console');
async function start(){
async function searcher(input) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const url = ; //here is a long url combining logic, that works fine
await page.goto(url);
const currentUrl = requestPage.url();
console.log(currentUrl); //returns nothing.
//here is some long parsing logic
await browser.close();
return combinedResult;
}
//here is a bot code
const { Telegraf } = require('telegraf');
const bot = new Telegraf('my bot ID');
bot.command('start', ctx => {
console.log(ctx.from);
bot.telegram.sendMessage(ctx.chat.id, 'Greatings message', {});
bot.telegram.sendMessage(ctx.chat.id, 'request prompt ', {});
})
bot.on('text', (ctx) => {
console.log(ctx.message.text);
const queryOutput = searcher(ctx.message.text);
bot.telegram.sendMessage(ctx.chat.id, queryOutput, {});
});
bot.launch()
}
start();
Here is an error message:
/Users/a.rassanov/Desktop/Fetch/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:218
return Promise.reject(new Error(`Protocol error (${method}): Session closed. Most likely the ${this._targetType} has been closed.`));
^
Error: Protocol error (Page.navigate): Session closed. Most likely the page has been closed.
I'm very new to this, and your help is really appreciated.
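Two things in the code above stand out: requestPage is never defined (the page variable is named page), and searcher is an async function, so calling it without await yields a Promise rather than the result. Below is a minimal sketch of a text handler that awaits the result and reports failures, assuming searcher resolves to something that can be sent as a string. The name makeTextHandler is hypothetical, and whether this removes the "Session closed" error depends on the parsing logic that is elided above.

```javascript
// Hypothetical factory: wraps an async `searcher(text)` into a Telegraf text handler.
function makeTextHandler(searcher) {
  return async (ctx) => {
    try {
      // searcher is async, so its result must be awaited before it is sent.
      const queryOutput = await searcher(ctx.message.text);
      await ctx.reply(String(queryOutput));
    } catch (err) {
      // Report the failure instead of letting the rejection go unhandled.
      console.error(err);
      await ctx.reply('Sorry, the search failed.');
    }
  };
}
```

It would be registered as bot.on('text', makeTextHandler(searcher)); in place of the inline handler.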
I'm following a blog post on logging in with Playwright, and I need something similar for my app. When I use headless: false,
I see it open the UI and fill in the user and password; however, it doesn't click the logon button. I adapted the code as follows. Am I missing something?
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://app.com');
await page.fill('input[type="text"]', 'user#test.com');
await page.fill('input[type="password"]', 'Abcd1234!');
// page.click('div[data-testid="LoginForm_Login_Button"]');
page.click('div[id="logOnFormSubmit"]');
}
)();
You are currently using
page.click('div[id="logOnFormSubmit"]');
There is no div in your given code example with that ID, but instead there is a button. You'd need to change that line to reflect this. The final code would look like below.
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://app.com');
await page.fill('input[type="text"]', 'user#test.com');
await page.fill('input[type="password"]', 'Abcd1234!');
// page.click('div[data-testid="LoginForm_Login_Button"]');
await page.click('button[id="logOnFormSubmit"]');
}
)();
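Playwright's page actions return promises, so each one is best awaited; otherwise the script can move on before the action has completed and errors go unnoticed. Here is a minimal sketch of the same steps as a reusable function. The function name login and its parameters are assumptions for illustration; the selectors are the ones used above.

```javascript
// Hypothetical helper: fills the form and clicks the submit button.
// `page` is a Playwright Page; selectors are taken from the snippet above.
async function login(page, user, password) {
  await page.fill('input[type="text"]', user);
  await page.fill('input[type="password"]', password);
  // Awaiting the click lets Playwright's auto-waiting finish and surfaces
  // errors such as a selector that matches nothing.
  await page.click('button[id="logOnFormSubmit"]');
}
```

Inside the async wrapper it would be called as await login(page, 'user#test.com', 'Abcd1234!'); after page.goto.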
I'm writing a simple script to check whether all the resources load correctly (I check the status codes of the responses). I decided to use Puppeteer for this and wrote:
(async () => {
const browser = await puppeteer.launch({headless: false});
const [page] = await browser.pages();
page.on("response", (res) => {
const url = res.url(), status=res.status();
// my functionality goes here
if (url.includes("favicon")) console.log(status, url); // not logging in headless
});
await page.goto("https://stackoverflow.com/", {waitUntil: 'networkidle2'});
await browser.close();
})();
The issue is that if I run my application in headless mode, the favicon is missing from the responses. I assume it has something to do with Puppeteer not loading a favicon in headless mode. Is there any built-in functionality or workaround?
For lack of a better solution, I'm currently evaluating the favicon's URL and visiting it manually:
async function checkFavicon(page){
const iconUrl = await page.$eval("link[rel*='icon']", ({href}) => href);
await page.goto(iconUrl);
await page.goBack();
}
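One drawback of the workaround above is that it navigates the page under test away and back. A possible variation, sketched here under assumptions, loads the icon in a temporary tab and returns its status code directly. The name getFaviconStatus and the fallback to /favicon.ico when no link element is declared are illustrative choices, not something confirmed for every site.

```javascript
// Hypothetical variant: loads the favicon in a temporary tab and returns its status.
// `page` is a Puppeteer Page that has already navigated to the site.
async function getFaviconStatus(page) {
  // Resolve the icon URL in the page; fall back to /favicon.ico (an assumption)
  // when the document declares no <link rel="icon">.
  const iconUrl = await page.evaluate(() => {
    const link = document.querySelector("link[rel*='icon']");
    return link ? link.href : new URL('/favicon.ico', location.href).href;
  });
  const tab = await page.browser().newPage();
  try {
    // goto resolves with the main resource response, which can be null.
    const response = await tab.goto(iconUrl);
    return { url: iconUrl, status: response ? response.status() : null };
  } finally {
    await tab.close();
  }
}
```

The returned status can then be checked alongside the statuses collected in the response listener.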
I'm trying to make an Instagram bot that logs in and then goes to a profile. My code worked yesterday for a while and then it just stopped working.
I've tried cloning my repository from GitHub, but it doesn't work either. Sometimes it works again, but if I try to create another function, the code just ignores the line that changes the page.
I've also tried creating a new page and calling goto on that new page, and it worked, but the account doesn't stay logged in.
The version of puppeteer that I'm using: 1.16.0
The version of node.js that I'm using: v10.15.3
const puppeteer = require('puppeteer');
const BASE_URL = "https://www.instagram.com/accounts/login/?hl=en&source=auth_switcher";
const instagram = {
browser: null,
page: null,
profile_url: null,
initialize: async (profile) => {
instagram.browser = await puppeteer.launch({
headless: false
})
instagram.profile_url = await "https://www.instagram.com/" + profile;
instagram.page = await instagram.browser.newPage();
await instagram.page.goto(BASE_URL, {waitUntil: 'networkidle2'});
},
login: async(username, password) =>{
await instagram.page.waitFor(1000);
await instagram.page.type('input[name="username"]', username);
await instagram.page.type('input[name="password"', password);
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
await console.log(instagram.profile_url);
await instagram.page.goto(instagram.profile_url, {timeout: 0, waitUntil: 'domcontentloaded'}); // the code just ignore this line
await instagram.page.waitFor(1000);
},
getPhotosLinks: async() => {
console.log("Do something here");
}
}
module.exports = instagram;
It doesn't give any error message, just doesn't work
Replace
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
with
await Promise.all([
instagram.page.click('button[type="submit"]'),
instagram.page.waitForNavigation()
]);
and see if it works
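The reason for Promise.all is that the wait for the navigation has to be registered before the click triggers it; a fixed waitFor delay may expire while the login redirect is still in progress, so the following goto competes with it. A minimal sketch of this pattern as a helper follows; the name clickAndWaitForNavigation and the default networkidle2 option are assumptions for illustration.

```javascript
// Hypothetical helper: clicks `selector` and resolves once the resulting
// navigation has finished. `page` is a Puppeteer Page.
async function clickAndWaitForNavigation(page, selector, options = { waitUntil: 'networkidle2' }) {
  // Start listening for the navigation before clicking, so a fast
  // navigation is not missed.
  const [response] = await Promise.all([
    page.waitForNavigation(options),
    page.click(selector),
  ]);
  return response;
}
```

In the login method it would be used as await clickAndWaitForNavigation(instagram.page, 'button[type="submit"]'); before the goto to the profile URL.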
I am trying to grab some HTML after an input tag button is clicked. I am clicking the button with page.evaluate() since page.click() does not seem to work for an input tag button. I have tried visual debugging with headless: false in the Puppeteer launch options to verify that the browser indeed navigated to the point after the button is clicked. I am unsure why page.content() returns the HTML from before the button is clicked rather than the HTML after the event happens.
const puppeteer = require('puppeteer');
const url = 'http://www.yvr.ca/en/passengers/flights/departing-flights';
const fs = require('fs');
const tomorrowSelector = '#flights-toggle-tomorrow'
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto(url);
await page.evaluate((selector)=>document.querySelector(selector).click(),tomorrowSelector);
let html = await page.content();
await fs.writeFile('index.html', html, function(err){
if (err) console.log(err);
console.log("Successfully Written to File.");
});
await browser.close();
});
You can click on the label for the radio. Also, you need to wait for some sign of changed state (for XHR/fetch response or new selectors). For example, this code works for me, but you can use any other condition or just wait for some seconds.
const fs = require('fs');
const puppeteer = require('puppeteer');
const url = 'http://www.yvr.ca/en/passengers/flights/departing-flights';
const tomorrowLabelSelector = 'label[for=flights-toggle-tomorrow]';
const tomorrowLabelSelectorChecked = '.yvr-form__toggle:checked + label[for=flights-toggle-tomorrow]';
puppeteer.launch({ headless: false }).then(async (browser) => {
const page = await browser.newPage();
await page.goto(url);
await Promise.all([
page.click(tomorrowLabelSelector),
page.waitForSelector(tomorrowLabelSelectorChecked),
]);
const html = await page.content();
await fs.writeFile('index.html', html, (err) => {
if (err) console.log(err);
console.log('Successfully Written to File.');
});
// await browser.close();
});
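A side note on the file write: fs.writeFile with a callback does not return a promise, so putting await in front of it has no effect, and a following browser.close() or process exit would not wait for the write. Below is a minimal sketch of the same click, wait, and save sequence using the promise-based fs/promises API. The helper name clickAndSaveHtml and its parameters are assumptions for illustration.

```javascript
const fsp = require('fs/promises');

// Hypothetical helper: clicks, waits for a selector that signals the new
// state, then saves the resulting HTML. `page` is a Puppeteer Page.
async function clickAndSaveHtml(page, clickSelector, readySelector, outFile) {
  await page.click(clickSelector);
  // Wait for some sign that the page has changed before reading its content.
  await page.waitForSelector(readySelector);
  const html = await page.content();
  // fs/promises.writeFile returns a promise, so this await really waits for the write.
  await fsp.writeFile(outFile, html);
  return html;
}
```

With the selectors above it would be called as await clickAndSaveHtml(page, tomorrowLabelSelector, tomorrowLabelSelectorChecked, 'index.html');.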