So I'm trying to test a reCAPTCHA implementation by writing my own spambot using React and Puppeteer. My script can already do a single form submission when executed, but what I actually want is for it to loop through a database of form submission details, repeating the submission for every row of the CSV file until the file is exhausted.
So far I have the following script:
const puppeteer = require('puppeteer');
// Server Authentication
const username = "username";
const password = 'password';
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Pass server side authentication needed for website when accessing url
await page.authenticate({ username, password});
// Load page with form and take screenshot to confirm page has been reached
await page.goto('https://pagewiththeformimtryingtosubmit.com');
await page.screenshot({path: 'example2.png'});
// Click away cookie pop-up banner, take screenshot after
await page.keyboard.press('Escape');
await page.screenshot({path: 'cookiebotcleared.png', fullPage: true});
// Fill in first form fields, screenshot to see if it works
await page.type('#firstname', 'Somename', {delay:500});
await page.type('#lastname', 'Somelastname', {delay:500});
await page.type('#telephone', '123456789', {delay:500});
await page.type('#email_address', 'somename.lastname#gmail.com', {delay:500});
await page.type('#password', 'Chooseapassword', {delay:500});
await page.type('#password-confirmation', 'Chooseapassword', {delay:500});
await page.click('.consent', {delay:500});
await page.click('.submit', {delay: 500});
// Take screenshot after form submission to see if it has worked
await page.screenshot({path: 'formfillout.png', fullPage: true});
await browser.close();
})();
What I'm trying to do, is take a CSV with all the data I've randomized, and then have this script run but take elements from the csv for the various form inputs and loop that over.
I've tried working with CSVToJSON in order to process the csv database into objects that I should then be able to use in my code:
const CSVToJSON = require('csvtojson')
CSVToJSON().fromFile('formbot_database.csv')
.then(users => {
console.log(users);
console.log(users.firstname);
}).catch(err => {
console.log(err);
});
Here's where my first troubles start: I want to take the row headers of my database to map them to variables, so I can process those variables within my script. I first tried users.firstname, but when I console log that, it gives me undefined.
If anyone has any suggestion on how I can work this through, that'd be great. I've tried visiting multiple resources but can't figure it out I'm afraid.
Thanks in advance!
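For what it's worth, csvtojson resolves with an array containing one object per CSV row, so users.firstname is undefined while users[0].firstname is not. Below is a minimal sketch of looping over the rows. The sample rows and the helper names toFieldEntries and fillFormForEachRow are made up for illustration, and the selectors are simply the ones from the script above.

```javascript
// Stand-in for what CSVToJSON().fromFile(...) resolves with: an array holding
// one object per CSV row, keyed by the header names. These rows are invented.
const users = [
  { firstname: 'Ann', lastname: 'Lee', telephone: '111', email_address: 'ann@example.com' },
  { firstname: 'Bob', lastname: 'Ray', telephone: '222', email_address: 'bob@example.com' },
];

// The array itself has no `firstname` property; each row object does.
console.log(users.firstname);    // undefined
console.log(users[0].firstname); // Ann

// Hypothetical helper: pair each form selector from the question with a row value.
function toFieldEntries(user) {
  return [
    ['#firstname', user.firstname],
    ['#lastname', user.lastname],
    ['#telephone', user.telephone],
    ['#email_address', user.email_address],
  ];
}

// Hypothetical helper: fill the form once per row, strictly one row after another.
// `page` is expected to be a Puppeteer Page that already shows the form.
async function fillFormForEachRow(page, rows) {
  for (const user of rows) {
    for (const [selector, value] of toFieldEntries(user)) {
      await page.type(selector, value);
    }
    // Click consent and submit here, then navigate back to the form for the next row.
  }
}
```

In the real script, fillFormForEachRow(page, users) would be called inside the .then(users => ...) callback, or after awaiting fromFile, once the page has been opened.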
Related
I use the Playwright framework in JS to autofill an unknown Google Form (meaning I don't know the XPath of each answer field, only the question text. In my case, the form asks for address, name, size, and phone number).
const { webkit } = require('playwright');
const URL = 'https://forms.gle/B4r6qZKdyxZCApTWA';
(async () => {
const browser = await webkit.launch({ headless: false });
const page = await browser.newPage();
await page.goto(URL);
await page.fill('input:below(:has-text("Họ và tên"))','name');
await page.fill('input:below(:has-text("Số điện thoại"))','phone number');
await page.fill('input:below(:has-text("Địa chỉ"))','Address');
await page.fill('input:below(:has-text("CMND"))','id');
await page.fill('input:below(:has-text("Game"))','LOL');
await page.pause();
await browser.close();
})();
URL: https://forms.gle/B4r6qZKdyxZCApTWA
The name and phone number fields are fine, but things get messed up at the address field: it skips that field, jumps to the ID field, and fills in 'Address' -> 'LOL' -> 'id'.
Google Form answer fields come in two kinds: input and textarea. I could just change the selector for that one field, but is there a better, more general way to handle this kind of Google Form?
https://www.nhtsa.gov/ratings
I have been trying to make Puppeteer select an option from the dropdown menu so that I can scrape information from the website above.
1. I'm having trouble writing a selector that Puppeteer understands.
What I mean is that after telling Puppeteer to click "manufacturer" at the end of the paragraph, I can't seem to click (or maybe select?) an option.
The default option in this dropdown menu is "select a manufacturer".
2. I would also like to know how to select the 2nd, 3rd, and 4th options without hard-coding them.
I haven't even begun scraping information yet.
const puppeteer = require('puppeteer');
async function spider() {
try {
let browser = await puppeteer.launch({ headless: false});
let page = await browser.newPage();
await page.goto('https://www.nhtsa.gov/ratings');
await page.click('a[data-target=".manufacturer-search-modal"]');
await page.click('select');
await page.click('option[value="AUDI"]');
} catch(error) {
console.log(error)
await browser.close();
}
}
export default spider
I think there are a few issues. One, sometimes that page has a popup for a survey, which you may need to close first. Two, you need to wait a bit between clicking the .manufacturer-search-modal link and trying to interact with the select box, because the options in the box aren't populated immediately (maybe it's making a request to the server to get the list of options). Three, I think select boxes are a little special and clicking on them doesn't work, but you can use page.select instead. Putting that all together, plus some ugly code for selecting items by number:
async function spider() {
try {
let browser = await puppeteer.launch({ headless: false});
let page = await browser.newPage();
await page.goto('https://www.nhtsa.gov/ratings');
try {
// Give the 'take our survey' box a chance to pop up, and close it if it does
await page.waitForSelector('.acsCloseButton', { timeout: 1000 });
await page.click('.acsCloseButton')
} catch {
}
await page.click('a[data-target=".manufacturer-search-modal"]');
// wait for the options to be populated
await page.waitForSelector('option:nth-child(2)');
// search for AUDI
await page.select('select', 'AUDI');
await page.click('.manufacturer-search-submit');
// select third element in drop-down
// there's probably a better way to do this
// read each option's value inside the page instead of relying on the internal _value property
const values = await page.$$eval('option', options => options.map(option => option.value));
const value = values[2];
await page.select('select', value);
await page.click('.manufacturer-search-submit');
} catch(error) {
console.log(error)
await browser.close();
}
}
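On selecting the 2nd, 3rd and 4th options without hard-coding them: one possible approach is to read all option values up front and loop over them. This is only a sketch. It assumes the "select a manufacturer" placeholder has an empty value, reuses the .manufacturer-search-submit selector from the code above, and the helper names selectableValues and searchFirstManufacturers are made up.

```javascript
// Hypothetical helper: drop the placeholder entry, assumed to carry an empty value.
function selectableValues(optionValues) {
  return optionValues.filter((value) => value !== '');
}

// Hypothetical helper: run the search for the first `limit` real options in turn.
// `page` is expected to be a Puppeteer Page with the manufacturer modal open.
async function searchFirstManufacturers(page, limit) {
  const values = await page.$$eval('select option', (options) =>
    options.map((option) => option.value)
  );
  for (const value of selectableValues(values).slice(0, limit)) {
    await page.select('select', value);
    await page.click('.manufacturer-search-submit');
    // Scrape the results here, then reopen the modal before the next iteration.
  }
}
```

If the placeholder turns out to have a non-empty value on the real page, the filter would need to skip the first entry by position instead.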
I was trying to query a website, const url = "https://personal.vanguard.com/us/FixedIncomeHome", hoping to automate some functionality with Puppeteer.
I noticed that if I take a screenshot with page.screenshot("preclick.png"), it shows the page data with tabs. When I follow it up with a query, though, it does not seem to return the second tab (denoted by the selector a[container="CD"]):
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto(url, {waitUntil: 'networkidle2'})
page.screenshot("start.png")
page.evaluate( () => {
document.querySelectorAll("a[container='CD']")[0].click()
})
///...
I don't really know why this is the case. Ideally, I want to click CD and then run an empty search. Since session IDs are tracked, I wanted to do this as a sort of E2E test in order to get the resulting table data.
I can see that the tab content is loaded dynamically, so something is preventing the page from being queried.
I also tried waiting for the tag to appear, to see what would happen, but it just timed out after 30 seconds:
await page.waitForSelector("a[container='CD']").then( async resolve => {
page.execute( () => document.querySelector("a[container='CD']").click() );
});
I don't know why the screenshot shows the HTML while querying for it from within execute fails. It doesn't make sense to me. Ideally, I want to click the CD tab, then click Search, then loop through the 20 results in the table.
EDIT: I noticed that evaluate was not querying the component correctly because of an iframe. If I want to develop E2E tests, though, I assume there is a way to get a reference to the button and click it, or to simulate a click.
You can get the iframe from a selector. As the iframe has the ID TWRIFrame, you can wait for that selector, then get the contentFrame from that element.
Once you have the frame, the frame class has almost the same functions as the page class, e.g. click.
Note that, because the iframe comes from another domain, you need to launch the browser with the --disable-features=site-per-process flag.
const browser = await puppeteer.launch({headless: false, args: ['--disable-features=site-per-process']});
const page = await browser.newPage();
await page.goto('https://personal.vanguard.com/us/FixedIncomeHome', {waitUntil: 'networkidle2'});
await page.screenshot({path: 'start.png'});
await page.waitForSelector('#TWRIFrame');
const frameElement = await page.$('#TWRIFrame');
const frame = await frameElement.contentFrame();
await frame.click("a[container='CD']");
I'm trying to automate certain tasks for work. We have a portal that requires you to sign in through Google. I've created a Puppeteer instance that navigates to the Google auth page, types in my email and password, then stores the cookies so I can navigate through and manipulate the portal.
This works perfectly on my local environment, but I've deployed it to Heroku and Google adds a sign in challenge. After entering the password, I'm given the 'Verify it's you' page that says 'This device isn't recognized' and asks me to complete 2-FA auth.
I know I can't turn off 2-FA, so what would be the best way to bypass this?
Alternatively, is there an easier way to log in to a website guarded by Google auth and store the session cookies?
Here's my puppeteer code, any help would be much appreciated:
async function getCookies() {
const browser = await puppeteer.launch({
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-gpu'
]
})
const page = await browser.newPage()
await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36')
await page.goto(process.env.URL)
await page.waitForSelector('#identifierId')
await page.type('#identifierId', process.env.EMAIL, { delay: 5 })
await page.click('#identifierNext')
await page.waitForSelector('#password input[type="password"]', { visible: true });
await page.type('#password input[type="password"]', process.env.PASS, { delay: 5 })
await page.click('#passwordNext')
await page.waitFor(3000)
const cookies = await page.cookies()
await browser.close()
return cookies
}
Not possible, I am afraid, and not the answer you want.
"I know I can't turn off 2-FA, so what would be the best way to bypass this?"
If it were possible to bypass, it would open the door to hackers: two-factor authentication is an extra step in the process, a second security layer that reconfirms your identity. Its purpose is to make attackers' lives harder and reduce fraud risk.
I would have added an Android app in the mix too. You can set up the 2FA with SMS codes and an Android app with SMS read permission can read the SMS and connect with a backend.
The backend can then send a push message, probably using Firebase Cloud Messaging, to the local Node.js instance where the headless Chrome is running, so that the code can be entered on the 2FA screen.
I don't think there's any other way to do it. Although I would recommend not doing it, since it may open some backdoor for security issues.
It is actually possible, using the Twilio API alongside Puppeteer, to receive the SMS code programmatically. You will have to either set up a special Google account for this, with the Twilio number as its mobile phone, OR change your current Google account's primary mobile number to the Twilio number and use your regular number as a secondary contact in your Google account info.
My working solution (needs some refactoring)
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false, // for debugging only
ignoreHTTPSErrors: true // This happens when you use a self signed certificate locally
})
const page = await browser.newPage()
await page.setViewport({ width: 1280, height: 800 })
await page.goto('https://myawesomesystem/loginFrm01')
const navigationPromise = page.waitForNavigation()
// Clicks on the login button
const googleLoginButtonSelector = 'body > section > ... > div'
await page.waitForSelector( googleLoginButtonSelector )
await page.click( googleLoginButtonSelector )
// wait for the google oauth page to open
const googleOAuthTarget = await browser.waitForTarget( target => {
// console.log( target.url() ); // debugging
return target.url().indexOf('https://accounts.google.com/signin/oauth/identifier') !== -1
})
const googleOAuthPage = await googleOAuthTarget.page()
await googleOAuthPage.waitForSelector('#identifierId')
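// CRED is not defined in this snippet; it is assumed to be an object holding the account's user and pass
await googleOAuthPage.type('#identifierId', CRED.user, { delay: 5 } )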
await googleOAuthPage.click('#identifierNext')
await googleOAuthPage.waitForSelector('input[type="password"]', { visible: true })
await googleOAuthPage.type('input[type="password"]', CRED.pass )
await googleOAuthPage.waitForSelector('#passwordNext', { visible: true })
await googleOAuthPage.click('#passwordNext')
await navigationPromise
// HERE:
// the user has been authenticated
// or login window was closed
// or whatever else, please check
await browser.close()
})()
Using Puppeteer, how can you programmatically submit a form? So far I've been able to do this using page.click('input[type="submit"]') if the form actually includes a submit input. But for forms that don't include a submit input, focusing on the form's text input element and using page.press('Enter') doesn't seem to actually cause the form to submit:
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://stackoverflow.com/', {waitUntil: 'load'});
console.log(page.url());
// Type our query into the search bar
await page.focus('.js-search-field');
await page.type('puppeteer');
// Submit form
await page.press('Enter');
// Wait for search results page to load
await page.waitForNavigation({waitUntil: 'load'});
console.log('FOUND!', page.url());
// Extract the results from the page
const links = await page.evaluate(() => {
const anchors = Array.from(document.querySelectorAll('.result-link a'));
return anchors.map(anchor => anchor.textContent);
});
console.log(links.join('\n'));
browser.close();
})();
If you are attempting to fill out and submit a login form, you can use the following:
await page.goto('https://www.example.com/login');
await page.type('#username', 'username');
await page.type('#password', 'password');
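// Start waiting for the navigation before clicking, so a fast navigation is not missed
await Promise.all([
page.waitForNavigation(),
page.click('#submit'),
]);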
console.log('New Page URL:', page.url());
Try this
const form = await page.$('form-selector');
await form.evaluate(form => form.submit());
For v0.11.0 and later:
await page.$eval('form-selector', form => form.submit());
I was scraping a SPA, and I had to use waitForNetworkIdle since the form submit was not triggering a page navigation event. Instead it submitted data to the server, and updated the DOM of the page which was already loaded.
await Promise.all([
page.waitForNetworkIdle(),
page.click('#form-submit-button'),
]);
When to use waitForNetworkIdle
Open a normal web browser, submit the form, and check whether the page URL changes. If it does not change, I suspect you should use waitForNetworkIdle.
Also, take this advice with a grain of salt; I've only been using Puppeteer for an hour.
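To tie the answers above together, here is a rough sketch of a helper that picks the wait strategy depending on whether the form submission changes the URL. The name submitAndWait and its navigates flag are made up for illustration. The only Puppeteer calls used are page.click, page.waitForNavigation and page.waitForNetworkIdle, all of which appear in the answers above.

```javascript
// Hypothetical helper: click the submit control and wait for the page to settle.
// `navigates` should be true when submitting the form loads a new URL, and false
// when the page only updates its DOM in place (as in a single-page app).
async function submitAndWait(page, submitSelector, navigates) {
  // Start waiting before clicking, so a fast response is not missed.
  const settled = navigates
    ? page.waitForNavigation()
    : page.waitForNetworkIdle();
  await Promise.all([settled, page.click(submitSelector)]);
}
```

For example, await submitAndWait(page, '#form-submit-button', false) would match the SPA case described above, while passing true would match the login form example.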