How to Bypass Puppeteer Blocking Systems


I would like to open https://krunker.io/ through Puppeteer. However, whenever I open Krunker.io through Puppeteer, it blocks me, saying "Puppeteer Detected". Is there an easy workaround for this?
One answer I got was this:
You need to make a matchmaker seek game request to get a websocket URL, and then you connect to it and simulate being a client
As I started coding in Node.js and JavaScript just under 5 weeks ago, I am not sure how to do this. (I asked, and he said "just do it". It's probably not that hard; I am just not that good at Node.) Here are all of the answers I came across:
i just made my rce code in assembly and then link it with chrome executable and then using a hex dumper replace the rce function call bytes with a reference pointer to my own code.
also you need to make sure your rce code has the correct signature otherwise the rebuilt chrome executable will crash as soon as it reaches your rce runtime code
you can also append a EYF_33 byte after the ACE_26 bytes to grant GET requests to make it possible to create 2 PATCH requests at a time with different structures making it possible to create fully independent websocket connection to the krunker api and send more AES authorization messages at a time
Not sure what this means ¯\_(ツ)_/¯.
Is there a simple way to do this, or better yet, a step-by-step tutorial on how to do this (on a mac)?
Thanks :)

In most cases the site detects Puppeteer through the user agent. The simplest approach is to use puppeteer-extra with the puppeteer-extra-plugin-stealth plugin, which changes the user agent and hides several other signs of automation.
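A minimal sketch of that suggestion, assuming the puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth packages are installed from npm. The helper name openWithStealth is made up for illustration, and nothing here guarantees that a particular site's detection will be satisfied:

```javascript
// Hypothetical helper: open a page with the stealth plugin enabled.
// Assumes: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
async function openWithStealth(url) {
  // Required lazily so this file still loads before the packages are installed.
  const puppeteer = require("puppeteer-extra");
  const StealthPlugin = require("puppeteer-extra-plugin-stealth");

  // Register the plugin before launching the browser.
  puppeteer.use(StealthPlugin());

  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    return await page.title();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Usage (not run automatically):
// openWithStealth("https://krunker.io/").then(console.log, console.error);
```

The function is meant to be called once per script run, since each call registers the plugin again.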

Related

Get a javascript variable from a web page without interaction/headlessly

Good afternoon!
We're looking to get a JavaScript variable from a webpage, which we can usually retrieve by typing app in the Chrome DevTools console.
However, we want to do this headlessly, as it has to be performed on numerous apps.
Our ideas:
Using a Puppeteer instance to go to the page, type the command and return the variable. This works, but it is very resource-intensive.
Using a GET/POST request to the page and trying to inject the JS command, but we didn't succeed.
We're wondering whether there is an easier solution, such as a special API that could extract the variable.
The goal would be to automate this process with no human interaction.
Thanks for your help!

Your question is not so much about a JS API (since the webpage is not yours to edit, you can only request it) as it is about webcrawling / browser automation.
You have to add details to get a definitive answer, but I see two scenarios:
The website actively checks for evidence of human browsing (for example, it sits behind CloudFlare and has requested this option), or the scripts depend heavily on a browser execution environment being available. In this case, the simplest option is to automate a browser, because a headless option has to get many things right to fool the server or the scripts. I would use Karate, which is easier than, say, Selenium and can execute in-browser scripts. It is written in Java, but you can execute it externally and just read its reports.
The website does not check for such evidence and the scripts do not really require a browser execution environment. Then you can simply download everything required locally and attempt to jury-rig the JS into executing in any JS environment. According to your post, this fails, but it is impossible to help unless you can describe how it fails. This option can be headless.
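For the second scenario, here is a rough sketch of running an already-downloaded script outside a browser with Node's built-in vm module. The helper name extractGlobal, the stub objects and the sample source are assumptions for illustration. Only globals created with var, function declarations or explicit assignment to the global object appear on the sandbox; let and const bindings do not. The vm module is not a security boundary, so this should not be pointed at untrusted code.

```javascript
const vm = require("node:vm");

// Hypothetical helper: run a downloaded script in a bare context and read
// back one of the globals it defines.
function extractGlobal(scriptSource, variableName) {
  // Stub objects standing in for the browser; real pages usually need more.
  const sandbox = { window: {}, document: {}, console };
  vm.createContext(sandbox);
  vm.runInContext(scriptSource, sandbox, { timeout: 1000 });
  return sandbox[variableName];
}

// Stand-in for the page's script; a real one would be fetched first.
const source = "var app = { name: 'demo', version: 3 };";
console.log(extractGlobal(source, "app").version); // prints 3
```

If the script touches browser APIs that the stubs do not provide, it will throw, which is the point at which automating a real browser becomes the simpler route.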

You can embed Chrome into your application and instrument it. It will be headless.
We've used this approach in the past to copy content from PowerPoint Online.
We were using .NET to do this and therefore used CEFSharp.

Phantom.js / Casper.js with rotating proxy?

I have a simple goal: load webpages with either phantom.js (out of the box) or casper.js (nicer and easier), but through a proxy, rotating to the next one in a list if the current one is bad (i.e. the webpage fails to load or something like that).
I know casper.js has a --proxy param, but it lets the user specify only ONE proxy, which is then used for the whole run.
Question #1 is: how to rotate proxy on the fly programmatically?
I did some research and found node-requester, but it's not integrated with casper.js. I tried to extract just the proxy feature from the code but didn't really understand how it works in a nutshell (I am not that smart, I guess).
So question #2: is there some simple implementation of proxy rotation that works with either phantom.js or casper.js?
I would prefer to use the fancier casper.js, but I will settle for bare phantom.js as well.

I had the same issue a while back when I worked with PhantomJS. The solution we ended up with was running PhantomJS as a child process of a larger Java/Scala server, which handled failures and assigned a different proxy when needed (by rerunning with different params in the --proxy arg).
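The answer used a Java/Scala parent process. As a sketch of the same rerun-with-another-proxy idea in Node.js instead, the code below tries each proxy in turn. The function names, the proxy addresses and load.js are placeholders, and runPhantom assumes a phantomjs binary on the PATH:

```javascript
// Try each proxy in turn until one attempt succeeds.
// `attempt` is a caller-supplied async function that rejects when the page
// fails to load through the given proxy.
async function withRotatingProxy(proxies, attempt) {
  let lastError = new Error("no proxies supplied");
  for (const proxy of proxies) {
    try {
      return { proxy, result: await attempt(proxy) };
    } catch (err) {
      lastError = err; // bad proxy: move on to the next one
    }
  }
  throw lastError;
}

// One possible `attempt`: rerun PhantomJS with a different --proxy argument.
function runPhantom(scriptPath, proxy) {
  const { execFile } = require("node:child_process");
  return new Promise((resolve, reject) => {
    execFile("phantomjs", [`--proxy=${proxy}`, scriptPath], (err, stdout) =>
      err ? reject(err) : resolve(stdout)
    );
  });
}

// Usage (not run automatically):
// withRotatingProxy(["10.0.0.1:8080", "10.0.0.2:8080"],
//   (proxy) => runPhantom("load.js", proxy)).then(console.log, console.error);
```

For a bad proxy to count as a failure, the PhantomJS script itself has to exit with a non-zero code when the page does not load.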

I had the same issue with Puppeteer, though the idea is the same.
I started a local Node.js proxy that routes through https://www.npmjs.com/package/gimmeproxy-request and pointed the Puppeteer instance at it.
With the local proxy server I could detect when a page failed to load and retry the request.

Urls for REST methods

I'm trying to get my head around the correct way to represent the following methods in a restful manner...
Say I have x servers and they run Windows, Linux and Unix.
Each server can have a ping, shutdown and user action run against it.
The API has no knowledge of the server so a request would have to provide the IP address along with server type, and action type (which it does know about).
With that in mind, these simple URLs come to mind but aren't restful in the slightest:
/192.168.1.3/Linux/ping
/192.168.1.5/windows/shutdown
Should I go down the restful route? Or is the above ok for a simple web API?
If restful, would it look like this?
GET /servertypes/{servertypeId}/actions/{actionId}?serverip=192.168.1.4

This seems to make more sense to me:
GET /servertypes/{servertypeId}/{serverip}/{action}

In my opinion, you should rethink your design.
You're effectively overriding the semantics of HTTP methods by placing the command names in the URLs, which are meant to represent scoping information. GET requests should not lead to changes of state. Ignoring the underlying protocol is not considered RESTful.
What you have here looks like an RPC model that doesn't really fit the principles of REST. It seems to me that you're trying to hammer a square peg into a round hole.
You should also ask yourself if it's really necessary to expose the underlying OS. Do you want to call system-specific commands? Should it be possible to run any executable or just a couple of common ones? Should you care if it's grep on a UNIX machine or findstr on a Windows box? Does the client care if they use a Windows or a Linux machine?
You could go for a simple pattern similar to what Uzar Sajid suggested. It would probably work all right. Just don't call it RESTful, because it's not.
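One narrow reading of the point about HTTP semantics can be sketched as a method check: ping only reads state, so GET is acceptable, while shutdown changes state and should not be reachable through GET. The action names come from the question; the mapping, the status codes chosen and the function name are assumptions, and a check like this does not by itself make the API RESTful.

```javascript
// Assumed classification of the question's actions: ping only reads state,
// shutdown changes it. The question's "user" action is too unspecified to
// classify, so it is left out.
const ACTION_METHODS = new Map([
  ["ping", "GET"],
  ["shutdown", "POST"],
]);

// Returns the HTTP status code for a request to perform `action` via `method`.
function checkRequest(method, action) {
  if (!ACTION_METHODS.has(action)) return 404; // unknown action
  if (ACTION_METHODS.get(action) !== method) return 405; // wrong method
  return 200;
}

console.log(checkRequest("GET", "ping")); // prints 200
console.log(checkRequest("GET", "shutdown")); // prints 405
```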

Live/Hot node.js server source code editing

I have a problem.
My problem is that every time I make changes to my node.js server code, I have to restart the entire thing to see the results.
Instead of this, I remember seeing something about being able to pipe Chrome directly into the server's source code and "hot edit" it. That is to say, changes to the code immediately take effect and the server keeps running.
I hope that I am being clear.
It would be a real time saver to directly edit code (especially for small things) while the server is actually running and have it instantly take effect.
Does anyone know how to do this?

See my answer to my own question that answers this question: https://stackoverflow.com/a/11157223/813718
In short, there's an npm module named forever which does what you want. It can monitor the source files and restart the node instance when a change has been detected.
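To make the watch-and-restart idea concrete, here is a bare sketch using only Node built-ins. It is not how forever is implemented; makeRestarter is a made-up helper and server.js is a placeholder filename.

```javascript
// Builds a restart function around a caller-supplied `start` callback, which
// must launch the server and return a handle with a kill() method
// (a ChildProcess from child_process.spawn fits).
function makeRestarter(start) {
  let child = null;
  return function restart() {
    if (child) child.kill(); // stop the old instance first
    child = start();
    return child;
  };
}

// Usage sketch: restart `node server.js` whenever server.js changes.
// const { spawn } = require("node:child_process");
// const fs = require("node:fs");
// const restart = makeRestarter(() =>
//   spawn(process.execPath, ["server.js"], { stdio: "inherit" }));
// restart();
// fs.watch("server.js", () => restart());
```

fs.watch can fire more than once for a single save, so a real tool would debounce the events. In-memory server state is lost on every restart.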

I do not quite understand the pipe-to-chrome part... But there seems to be a node module which listens for changes to user-defined files and restarts the server automatically:
How can I edit on my server files without restarting nodejs when i want to see the changes?
https://github.com/isaacs/node-supervisor

Yes, there is such a thing.
Just take advantage of JavaScript's so-called evil eval() function.
(You might need something like a WebSocket connection to the server to notify it of the change.)
I am halfway through implementing the same feature, but there are a lot of things to consider if you want to preserve the server's state (the current values of variables, for example).
About the pipe-to-chrome part:
Maybe this is what you were referring to?
https://github.com/node-inspector/node-inspector/wiki/LiveEdit

is json the answer to this: python program will talk and javascript will listen?

The same problem that was haunting me a month ago is still haunting me now. I know I've asked several questions about this on this site, and I am truly sorry for that. Your suggestions have all been excellent, but the answer is still elusive. I now realize that this is a direct result of my not being able to phrase my question properly, and for that I am sorry.
To give you a general view of things: I have 2 server-side scripts that I want to run.
A Python program/script that continuously outputs some numbers.
Based on the output from that Python script, a JavaScript script will perform some action on a webpage (e.g., change the background color, display an alert message, change some text).
I've studied the replies to my previous posts and found that what I want to accomplish is more or less accomplished by JSON. It is my understanding that JSON transforms 'program-specific' variables into a format that is more 'standard or general or global'.
Two different programs therefore now have the means to 'talk' with each other because they are now speaking the same 'language'.
The problem is then this: how do I actually facilitate their communication? What is the 'cellphone' between these server-side scripts? Do they even need one?
Thank you!

If I understand what you're asking, the "cellphone" is TCP/IP. The javascript is not server-side; it runs on the client side, and alters what the client's browser displays based on json data that it downloads from the server -- data that in this case is generated by Python.
This question provides a relevant example, though it's a bit technical: JSON datetime between Python and JavaScript
Here's a very basic tutorial that explains how to create a dynamic webpage using python and javascript. It doesn't appear to use json, but it should familiarize you with the fundamentals. Once you understand what's there, using json to transport more complicated data should be fairly straightforward.
http://kooneiform.wordpress.com/2010/02/28/python-and-ajax-for-beginners-with-webpy-and-jquery/

I assume you mean: Python is on the web server, and Javascript is running in the client's web browser.
Because browsers are all different (IE6 is terrible, Chrome is great), there are a huge number of ways people found to "hack" this "cellphone" into place. These techniques are called AJAX and COMET techniques. There is no one "cellphone", but a whole bunch of them! Hopefully, you can find a library to select the right technique for the browser, and you just have to worry about the messages.
Comet is harder to do, but lets the server "push" messages to the client.
Ajax can be easier - you just periodically "pull" messages from the server.
Start with Ajax, then look at Comet if you really need it. Just start by having the client (JavaScript) make a "GET" request to see if the number has changed.
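The polling approach could look roughly like this on the client. The sketch assumes the Python side serves JSON shaped like {"value": 42} at a URL such as /number; that shape, the URL and the function names are illustrative and not taken from the question.

```javascript
// Polls `url` and calls `onChange` whenever the reported number changes.
// `fetchFn` defaults to the global fetch (browsers, Node 18+); passing it in
// keeps the function easy to exercise without a server.
function createPoller(url, onChange, fetchFn = fetch) {
  let last; // last value seen so far
  return async function pollOnce() {
    const response = await fetchFn(url);
    const { value } = await response.json();
    if (value !== last) {
      last = value;
      onChange(value);
    }
  };
}

// Usage in a page (not run automatically):
// const poll = createPoller("/number", (n) => { document.title = String(n); });
// setInterval(() => poll().catch(console.error), 2000);
```

The interval trades latency against server load; two seconds here is an arbitrary choice.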

I don't know Javascript or json, but...
if you've ever seen a Unix-like operating system, you know about pipes. Like program1 | program2 | program3 ... Why don't you just connect the Python and JavaScript programs with pipes? The first one writes to stdout, and the next one reads from stdin.
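A sketch of the reading end of such a pipe, assuming the JavaScript side runs under Node.js (this does not apply to JavaScript running in a browser page) and the producer prints one number per line. The names readNumbers, producer.py and consumer.js are placeholders.

```javascript
const readline = require("node:readline");

// Reads `input` line by line and passes each numeric line to `onNumber`.
// Blank or non-numeric lines are skipped.
async function readNumbers(input, onNumber) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    const text = line.trim();
    const n = Number(text);
    if (text !== "" && !Number.isNaN(n)) onNumber(n);
  }
}

// Usage, as the last stage of a pipeline:
//   python producer.py | node consumer.js
// readNumbers(process.stdin, (n) => console.log("got", n));
```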

This probably isn't the answer that you are looking for, and without links to your previous posts, I don't have much to go on, but nonetheless...
JavaScript is client-side. I can interpret your question in 2 different ways:
Your Python script is running on your computer, and you want a script to actually alter your current browser window.
Not too sure, but writing a browser plugin may be the answer here.
Your Python script is running on the server, and as a result of the script running, you want the display of your site to change for the people viewing it.
In this case, you could use Ajax polling (or similar) on your site. Have your site poll the server with Ajax, call a server method that checks the output of the script (maybe written to a file?), and see if it has changed.

When two processes need to communicate, they need to agree on a common way to express things and a protocol for exchanging them.
In your case, since one of the processes is a browser, the protocol of choice is HTTP. So the browser needs to make an HTTP request to your Python process.
This Python process will need to be exposed via HTTP in one way or another.
There are several ways to build a web server in Python. You should read this article: http://fragments.turtlemeat.com/pythonwebserver.php as a jump start.
Once you have this, your browser will be able to issue HTTP GET requests to your server, and your server can reply with a string.
This string can be whatever you like. Nevertheless, if your answer contains structured data, a good start is to use XML or JSON notation.
JSON (which stands for JavaScript Object Notation) is very easy to use in JavaScript, which is why many people advised you to choose it.
I hope this will help you
Jérome wagner
