Confusion over args for Puppeteer


I am a little confused about the arguments needed for Puppeteer, in particular when the puppeteer-extra stealth plugin is used. I am currently using Chromium with all the default settings, but I keep seeing examples like this:
let options = {
  headless: false,
  ignoreHTTPSErrors: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-sync',
    '--ignore-certificate-errors'
  ],
  defaultViewport: { width: 1366, height: 768 }
};
Do I actually need any of these to avoid being detected? I have been using Puppeteer without setting any of them and it passes the bot test out of the box. What is --no-sandbox for?

These are Chromium features, not Puppeteer-specific ones.
For --no-sandbox, for example, take a look at the following sections.
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#setting-up-chrome-linux-sandbox
Setting Up Chrome Linux Sandbox
In order to protect the host
environment from untrusted web content, Chrome uses multiple layers of
sandboxing. For this to work properly, the host should be configured
first. If there's no good sandbox for Chrome to use, it will crash
with the error No usable sandbox!.
If you absolutely trust the content you open in Chrome, you can launch
Chrome with the --no-sandbox argument:
const browser = await puppeteer.launch({args: ['--no-sandbox',
'--disable-setuid-sandbox']});
NOTE: Running without a sandbox is
strongly discouraged. Consider configuring a sandbox instead.
https://chromium.googlesource.com/chromium/src/+/HEAD/docs/linux/sandboxing.md#linux-sandboxing
Chromium uses a multiprocess model, which allows to give different
privileges and restrictions to different parts of the browser. For
instance, we want renderers to run with a limited set of privileges
since they process untrusted input and are likely to be compromised.
Renderers will use an IPC mechanism to request access to resource from
a more privileged (browser process). You can find more about this
general design here.
We use different sandboxing techniques on Linux and Chrome OS, in
combination, to achieve a good level of sandboxing. You can see which
sandboxes are currently engaged by looking at chrome://sandbox
(renderer processes) and chrome://gpu (gpu process).
. . .
You can disable all sandboxing (for
testing) with --no-sandbox.
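One way to act on the advice quoted above is to keep the sandbox on by default and opt out only on hosts where it cannot be configured. The following is a minimal sketch; `buildLaunchOptions` and its `disableSandbox` and `extraArgs` parameters are hypothetical names for illustration and are not part of Puppeteer.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// The sandbox-disabling switches are added only on explicit opt-in.
function buildLaunchOptions({ disableSandbox = false, extraArgs = [] } = {}) {
  const args = [...extraArgs];
  if (disableSandbox) {
    // Only for content you fully trust (see the NOTE quoted above).
    args.push('--no-sandbox', '--disable-setuid-sandbox');
  }
  return { args };
}

// Usage (assumes puppeteer is installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions());
console.log(buildLaunchOptions()); // { args: [] }
console.log(buildLaunchOptions({ disableSandbox: true }));
```

With the defaults, no sandbox switch is passed at all, which matches the asker's experience that Puppeteer works out of the box.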

Related

How to force IFrames to not share js resource thread with parent frame with same domain [duplicate]

Do web browsers use separate executional threads for JavaScript in iframes?
I believe Chrome uses separate threads for each tab, so I am guessing that JavaScript in an iframe would share the same thread as its parent window, however, that seems like a security risk too.

I recently tested whether JavaScript running in an iframe blocks JavaScript from running in the parent window.
iframe on the same domain as the parent:
Chrome 68.0.3440.84: Blocks
Safari 11.0.2 (13604.4.7.1.3): Blocks
Safari 15.1 on iOS: Blocks
Firefox 96: Blocks
iframe on a different domain from the parent:
Chrome 68.0.3440.84: Doesn't block
Safari 11.0.2 (13604.4.7.1.3): Blocks (outdated, but I don't have a macbook)
Safari 15.1 on iOS: Doesn't block
Firefox 96: Doesn't block
Chrome for Android 96: sometimes Blocks and sometimes Doesn't block (There are some complex rules in Chrome for Android that determine when Chrome for Android does and doesn't isolate a process, see chrome://process-internals and chrome://flags)
parent.html:
<body>
<div id="count"></div>
<iframe src="./spin.html"></iframe>
<script>
let i = 0;
let div = document.getElementById("count");
setInterval(() => {
div.innerText = i++;
}, 100);
</script>
</body>
spin.html:
<body>
<button id="spin">spin</button>
<script>
const spin = document.getElementById("spin");
spin.addEventListener('click', () => {
const start = Date.now();
while (Date.now() - start < 1000) { }
})
</script>
</body>

Before Chrome came along, all tabs of any browser shared the same single JavaScript thread. Chrome upped the game here, and some others have since followed suit.
This is a browser implementation detail, so there is no solid answer. Older browsers definitely don't. I don't know of any browser that definitely uses another thread for iframes, but to be honest I've never really looked into it.
It isn't a security risk, as no objects are brought along with the thread execution.

To sum up the other answers: No, iFrames usually run in the same thread/process as the main page.
However, it appears the Chromium team are working on further isolation in this area:
Chromium Issue 99379: Out of process iframes [sorry, link not working - if you can find a link to the issue that works, please let me know]
Design Plans for Out-of-Process iframes

I had the same question myself tonight, before checking for any existing answers. In the project I'm currently working on we have to load an iframe that uses a different framework, and I was curious whether that iframe could somehow block the thread and affect my app. The answer is yes, it can.
My test was done in Chrome. In the parent I loaded a child iframe and set an interval to console.log a message every few seconds. Then in the iframe I used a timeout to start a 'while' loop that blocks the thread. The result: the iframe uses the same thread.
Example:
In the parent:
setInterval(() => {
console.log('iFrame still using the thread');
}, 3000)
In the iFrame:
setTimeout(() => {
console.log('now the thread is not working in the iFrame anymore');
while (true) {
}
}, 10000)

2021 Update:
There is now the Origin-Agent-Cluster header which allows you to request dedicated resources for an iframe. It is currently supported on Chrome (88+) with positive reception from Mozilla and Safari.
Origin-Agent-Cluster is a new HTTP response header that instructs the browser to prevent synchronous scripting access between same-site cross-origin pages. Browsers may also use Origin-Agent-Cluster as a hint that your origin should get its own, separate resources, such as a dedicated process.
[...] For example, if https://customerservicewidget.example.com expects to use lots of resources for video chat, and will be embedded on various origins throughout https://*.example.com, the team maintaining that widget could use the Origin-Agent-Cluster header to try to decrease their performance impact on embedders.
To use the Origin-Agent-Cluster header, configure your web server to send the following HTTP response header: Origin-Agent-Cluster: ?1 The value of ?1 is the structured header syntax for a boolean true value.
More details here: https://web.dev/origin-agent-cluster/
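As an illustration of "configure your web server to send the header", here is a minimal sketch using Node's built-in http module. The wrapper name `withOriginAgentCluster`, the placeholder page, and the port in the comment are assumptions made for this example; only the header name and the `?1` value come from the text above.

```javascript
const http = require('http');

// Hypothetical wrapper: adds the header to every response produced by `handler`.
function withOriginAgentCluster(handler) {
  return (req, res) => {
    // "?1" is the structured-header encoding of boolean true.
    res.setHeader('Origin-Agent-Cluster', '?1');
    handler(req, res);
  };
}

// Assumed example handler serving a placeholder widget page.
const server = http.createServer(
  withOriginAgentCluster((req, res) => {
    res.setHeader('Content-Type', 'text/html');
    res.end('<h1>widget</h1>');
  })
);

// Start it with, for example: server.listen(8080);
```

The header is only a request: as the quoted text says, the browser may treat it as a hint and is not obliged to allocate a dedicated process.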

Only Chrome and Firefox on desktop (not mobile) separate the threads.
I've created a small page that runs a long loop on an interval in the main page, and shows an animation both in the main page and in an iframe.
You can open the site in whichever browser you wish to check.
If the lower animation (under 'crossorigin') runs without stopping, the iframe has a separate thread.
https://eylonsu.github.io/browser_thread/

Late on this but... good point, cause iframe js seems to be concurrent in Firefox 16.
Try with alert function (blocking), you'll see dialogs opening together.
You won't see that in Chrome or IE.
iframe js may access the parent window in Firefox 16 as usual, so I can think of possible race conditions arising.

Did some experimenting with this today in Chrome 28 in Ubuntu. Used this command to see Chrome's threads and processes
ps axo pid,nlwp,cmd | grep "chrome"
It looks like Chrome does not spawn new threads or processes for iframes. An interesting note is that it does spawn a new process for the dev tools pane.

2022 Update (Experimental)
Iframes can now be run in parallel in at least Chrome Canary on desktop computers, but this is still experimental.
Download Chrome Canary (https://www.google.com/chrome/canary/).
Navigate to "chrome://flags/".
Enable "Isolated sandboxed iframes".
Create "index.html" with the following content:
<h1>index.html</h1>
<iframe src="index-child.html" sandbox="allow-scripts"></iframe>
<script>
setInterval(() => {
console.log("index.html executed one iteration");
}, 1000)
</script>
Create "index-child.html" with the following content:
<h1>index-child.html</h1>
<script>
setTimeout(() => {
console.log("index-child.html started continuous execution");
while (true) {
}
}, 3000)
</script>
Open "index.html" in the browser.
Verify that the console is consistently logging "index.html executed one iteration". Thus, the iframe is executed in parallel.
Disable "Isolated sandboxed iframes" (or just use another browser) and open "index.html" again. The console is no longer consistently logging "index.html executed one iteration". Thus, the iframe is no longer executed in parallel.
Note: The sandbox attribute on the iframe tag must be correctly set for this to work. Additionally, only one extra process per site is currently supported, which means that multiple iframes will not all run in parallel.
The specific instructions from "chrome://flags/":
Isolated sandboxed iframes
When enabled, applies process isolation to iframes with the 'sandbox' attribute and without the 'allow-same-origin' permission set on that attribute. The current isolation model is that all sandboxed iframes from a given site will be placed into the same process, but alternative models may be introduced in future experiments. – Mac, Windows, Linux, Chrome OS, Fuchsia
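The condition in the flag description (a sandbox attribute is present and does not grant allow-same-origin) can be expressed as a small check. This is only a sketch of that rule as worded above, not of Chrome's actual implementation; `qualifiesForIsolation` is a hypothetical name.

```javascript
// Hypothetical helper mirroring the flag description: an iframe qualifies when
// it has a sandbox attribute whose token list lacks "allow-same-origin".
// Pass null (or undefined) when the attribute is absent.
function qualifiesForIsolation(sandboxAttr) {
  if (sandboxAttr === null || sandboxAttr === undefined) return false;
  // Sandbox tokens are space-separated and ASCII case-insensitive.
  const tokens = sandboxAttr.toLowerCase().split(/\s+/).filter(Boolean);
  return !tokens.includes('allow-same-origin');
}

console.log(qualifiesForIsolation('allow-scripts'));                   // true
console.log(qualifiesForIsolation('allow-scripts allow-same-origin')); // false
console.log(qualifiesForIsolation(null));                              // false
```

The first case corresponds to the sandbox="allow-scripts" iframe used in index.html above.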

For iFrames, no. However if you want to use threads in JavaScript you can use Web Workers, a working html5 draft supported by the new browsers. http://www.w3.org/TR/2009/WD-workers-20091029/

How to enable sharedArrayBuffer in chrome without cross-origin isolation

I have this experiment, which I only run on my local machine: I load an external webpage from, for example, https://example.com, and then with Puppeteer I inject a JavaScript file which is served from http://localhost:5000.
So far there are no issues. But, this injected javascript file loads a WebAssembly file and then I get the following error
Uncaught (in promise) ReferenceError: SharedArrayBuffer is not defined
....
And indeed, SharedArrayBuffer is not defined (Chrome v96), with the result that my code does not work at all (it used to work, though). So my question is, how can I solve this error?
Reading more about this, it seems that you can add two headers
res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
which I did for both files without much success. Maybe this will not work, given that the page is from a different domain than the injected JS and WASM files.
But maybe another solution is possible. Here is my command to start Chrome:
client.browser = await puppeteer.launch({
  headless: false,
  devtools: true,
  defaultViewport: null,
  executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  args: [
    '--debug-devtools',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-web-security',
    '--allow-running-insecure-content',
    '--disable-notifications',
    '--window-size=1920,1080'
  ]
  //slowMo: 500
});
I know Chrome has a great many options, so maybe there is one for this SharedArrayBuffer issue as well?
I hope someone knows how this works and can help me. Thanks a lot!

In this thread someone suggested to start chrome as follows
$> chrome --enable-features=SharedArrayBuffer
meaning I can add --enable-features=SharedArrayBuffer to my puppeteer config!
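A minimal sketch of adding that switch to an existing args array follows. `withEnabledFeature` is a hypothetical helper; it merges the name into any --enable-features switch already present so that a single comma-separated switch is passed. Whether the switch still has an effect depends on the Chrome version, which has not been verified here.

```javascript
// Hypothetical helper: adds a feature name to --enable-features, merging with
// any existing --enable-features=... switches instead of adding another one.
function withEnabledFeature(args, feature) {
  const prefix = '--enable-features=';
  const features = args
    .filter((a) => a.startsWith(prefix))
    .flatMap((a) => a.slice(prefix.length).split(','))
    .filter(Boolean);
  if (!features.includes(feature)) features.push(feature);
  const others = args.filter((a) => !a.startsWith(prefix));
  return [...others, prefix + features.join(',')];
}

const args = withEnabledFeature(
  ['--disable-notifications', '--window-size=1920,1080'],
  'SharedArrayBuffer'
);
console.log(args);
// Then pass it along with the other options from the question:
//   puppeteer.launch({ headless: false, args });
```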

Peter Beverloo made an extensive list of Chromium command line switches on his blog a while back.
There are lots of command lines which can be used with the Google Chrome browser. Some change behavior of features, others are for debugging or experimenting. This page lists the available switches including their conditions and descriptions. Last automated update occurred on 2020-08-12.
See https://peter.sh/experiments/chromium-command-line-switches/
If you're looking for a specific switch, it will be there; give it a shot. Though I'm pretty sure cross-origin restrictions were implemented specifically to prevent what you're trying to do.

How to open browser with open console [duplicate]

I want to ask how to open the Chrome developer console during Selenium test execution. Currently, when tests are running and I open the console manually by pressing F12, the tests stop responding immediately and fail after some time.
Can anyone tell me how I can start my tests with the developer console open, so I can catch and observe the console errors that occur during test execution?

Use --auto-open-devtools-for-tabs:
This flag makes Chrome auto-open DevTools window for each tab. It is intended to be used by developers and automation to not require user interaction for opening DevTools.
Source
How to use
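As a sketch of passing the flag from automation code, the capabilities object below puts it under "goog:chromeOptions", the vendor-prefixed key that ChromeDriver reads. `chromeCapabilitiesWithDevTools` is a hypothetical helper name, and how the object is handed to the driver depends on the client library in use.

```javascript
// Hypothetical helper: W3C-style capabilities for ChromeDriver with the
// DevTools auto-open switch placed first in the args list.
function chromeCapabilitiesWithDevTools(extraArgs = []) {
  return {
    browserName: 'chrome',
    'goog:chromeOptions': {
      args: ['--auto-open-devtools-for-tabs', ...extraArgs],
    },
  };
}

// For example, as one entry of a WebdriverIO capabilities array (assumed setup):
//   capabilities: [chromeCapabilitiesWithDevTools(['--window-size=1280,800'])]
console.log(JSON.stringify(chromeCapabilitiesWithDevTools(), null, 2));
```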

Note: this answer does not apply to current versions of Chrome.
You can't. The Chrome driver uses the Chrome remote debugging protocol to communicate with the browser. This is the same protocol that the developer console uses also. Unfortunately, Chrome is designed so that only one client can be attached using the protocol at a time, so that means either the developer tools, or the driver, but not both simultaneously.

Have you tried simulating the key press events for the shortcut of opening the dev tools in Chrome?
String openDevTools = Keys.chord(Keys.ALT, Keys.CONTROL, "i");
driver.findElement(By.tagName("body")).sendKeys(openDevTools);
This is not ideal, and in a rigorous testing regime you would need platform detection to ensure you are covering both Mac and Windows. I would absolutely recommend avoiding this (even if it works), but it's possible as a workaround if you really must.
I have a feeling it may also lose focus of the window itself if you do this. If this is the case, you'd need something like the following: -
String parentHandle = driver.getWindowHandle(); // get the current window handle
// do your dev tool stuff here
driver.switchTo().window(parentHandle); // switch back to the original window
Hope this helps.
Useful link if it does get you anywhere: How to handle the new window in Selenium WebDriver using Java?
Edit: Just re-read the question and don't think this will work anyway. Your unit tests should capture errors in the logic of your code. Your selenium tests should only test user journeys and capture errors when the user journey is cut short. You should never be testing code logic/error throwing through a selenium test.

This is working for me in webdriver.io (wdio.conf.js)
const configs = {
  chrome : {
    maxInstances: "5",
    browserName: "chrome",
    chromeOptions: {
      args: ['--window-size=1280,800', '--auto-open-devtools-for-tabs'],
      binary: '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'
    }
  },
  firefox : {
    maxInstances: "5",
    browserName: "firefox"
  },
  headless : {
    maxInstances: "5",
    browserName: "chrome",
    chromeOptions: {
      args: ['--headless', '--disable-gpu', '--window-size=1280,800'],
      binary: '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'
    }
  },
}

WebdriverIO automation testing when JavaScript is disabled

Is there a way in WebdriverIO framework to launch the browser with JavaScript disabled?
I want to automate a scenario with JavaScript being disabled. But, when I manually disable the JavaScript in Chrome, or Firefox and run the WDIO scripts, the browser always opens with JavaScript enabled.

Not anymore. (but you have a workaround below)
This used to be easily achieved using the chromium switches. But considering all driver implementations (chromedriver, geckodriver, etc.) now require JavaScript to drive your spawned browser instance, it's no longer possible.
It was achieved via chromeOptions arguments/switches:
capabilities: [{
  maxInstances: 2,
  browserName: config[env].browser,
  chromeOptions: {
    args: [
      '--disable-javascript',
      '--disable-javascript-harmony-shipping'
    ]
  }
}]
!!! LATER EDIT: You can achieve this by loading a custom profile.
Start your WebdriverIO test case, but add a browser.debug() after you load your page;
In the address bar, type chrome://settings/content and in the modal, check the Do not allow any site to run JavaScript option. Click Done. Now go to a random page and notice that JavaScript has been blocked on it.
Now we have to save this custom profile and load it each time you start a WebdriverIO test case. Type chrome://version in your address bar. Notice the Profile Path value. Copy the content of the folder (e.g.: For C:\Users\<yourUserName>\Desktop\scoped_dir18256_17319\Default, copy the scoped_dir18256_17319 folder on your Desktop). This folder contains all the actions (search history, extensions installed, accounts saved... in our case, JavaScript disabled option) on THIS current instance.
Now all we need to do, is add the path to that folder in your wdio.config.js file as a chromeOptions argument:
chromeOptions: {
  //extensions: ['./browserPlugins/Avira-SafeSearch-Plus_v1.5.1.crx'],
  args: ['--user-data-dir=/Users/<yourUserName>/Desktop/scoped_dir18256_17319']
}
Now all you have to do is run your test cases with this custom profile and JavaScript will be blocked on all websites. Hope this is the behavior you are looking for as there is no other way to achieve this behavior.
Cheers!
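Since the saved profile folder is easy to move or delete by accident, a small guard can fail early instead of letting Chrome silently start with a fresh profile. This is a sketch only; `profileArgs` is a hypothetical helper and not part of WebdriverIO.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the --user-data-dir switch for a saved profile
// folder, throwing if the folder does not exist.
function profileArgs(profileDir) {
  const resolved = path.resolve(profileDir);
  if (!fs.existsSync(resolved)) {
    throw new Error(`Profile folder not found: ${resolved}`);
  }
  return [`--user-data-dir=${resolved}`];
}

// Usage inside wdio.config.js, with the folder copied in the steps above:
//   chromeOptions: { args: profileArgs('/Users/<yourUserName>/Desktop/scoped_dir18256_17319') }
```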

Chrome Packaged app, Always on top window

I am writing a text editor, and I need the app window to stay always on top when I switch to a browser or e-book reader. As far as I know, Chrome doesn't provide any solution for Windows users. Is there a parameter I can pass when creating the window to make it always on top?
Or can I provide a button in the app to turn this feature on or off?
The code I use to create the window in bg.js:
var launch = function () {
  chrome.app.window.create('index.html', {
    type: 'shell',
    width: 440,
    height: 680,
    minWidth: 440,
    maxHeight: 680,
    id: 'paat-start'
  });
};
chrome.app.runtime.onLaunched.addListener(launch);
chrome.commands.onCommand.addListener(launch);
Thanks for any suggestions.

As Ben Wells mentioned above, this feature is now available in the stable release (either v33 or v34) via the alwaysOnTop option in chrome.app.window.create. Note that special permissions are required in the manifest.json file. Example:
background.js
chrome.app.window.create('window.html', {
  alwaysOnTop: true,
}, function (appWindow) {
  // Window created and will remain on top of others.
  // Change the property programmatically via:
  // appWindow.setAlwaysOnTop(false);
});
manifest.json
"permissions": [
"alwaysOnTopWindows"
]
This seems to have been added in issue 26427002, gone stable in issue 159523002 and issue 48113024 thanks to the community!
I had looked into this a while back and wanted to catalog my findings since historically there were some discrepancies in the documentation which previously stated the name of the required permission was alwaysOnTop, but using this caused a "permission is unknown" error.
Reading through the original proposal for this feature led me to issue 326361, which mentions that the permission setting is actually called alwaysOnTopWindows. Using this one back then, however, yielded a "requires Google Chrome dev channel or newer" error (probably since the feature wasn't yet stable).
Oddly, from browsing the source code, these two permissions might be aliases of each other, but that might be because I don't fully understand the Chromium codebase.
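The question also asked about a button to switch the behaviour on and off. A minimal sketch, assuming the alwaysOnTopWindows permission is declared as shown above: `toggleAlwaysOnTop` is a hypothetical helper that relies on the AppWindow methods isAlwaysOnTop() and setAlwaysOnTop(), and the element id in the comment is made up for illustration.

```javascript
// Hypothetical helper: flips the always-on-top state of a Chrome App window
// and returns the new state. `appWindow` is expected to be an AppWindow,
// for example the value of chrome.app.window.current().
function toggleAlwaysOnTop(appWindow) {
  const next = !appWindow.isAlwaysOnTop();
  appWindow.setAlwaysOnTop(next);
  return next;
}

// In the app page (the element id "pin" is an assumption):
//   document.getElementById('pin').addEventListener('click', () => {
//     toggleAlwaysOnTop(chrome.app.window.current());
//   });
```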

chrome.app.window.create does support a boolean alwaysOnTop option in more recent versions of Chrome. The feature is currently in beta channel on most platforms and at least dev channel on the rest.
