Phantom.js / Casper.js with rotating proxy?

I have a simple goal: load web pages with either PhantomJS (out of the box) or CasperJS (nicer and easier), but through a proxy, rotating to the next one in a list if the current one is bad (i.e. page loads fail or something like that).
I know CasperJS has a --proxy parameter, but it lets the user specify only ONE proxy, which is then used for the whole run.
Question #1: how can I rotate the proxy on the fly, programmatically?
I did some research and found node-requester, but it's not integrated with CasperJS. I tried to extract just the proxy feature from its code but didn't really understand how it works in a nutshell (I am not that smart, I guess).
Question #2: is there a simple implementation of proxy rotation that works with either PhantomJS or CasperJS?
I'd prefer the fancier CasperJS, but I'll go with bare PhantomJS as well.

I had the same issue a while back when working with PhantomJS. The solution we ended up with was running PhantomJS as a child process of a larger Java/Scala server, which handled failures and assigned a different proxy when needed (by rerunning PhantomJS with a different value in the --proxy argument).
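
For illustration, the same "rerun with a different --proxy" idea can be sketched in Node.js rather than Java/Scala. This is only a rough sketch under assumptions: the proxy addresses are placeholders, `runPhantom` and `loadWithRotation` are hypothetical helper names, a `phantomjs` binary is assumed to be on the PATH, and the PhantomJS script is assumed to exit with a non-zero code when the page fails to load.

```javascript
const { spawnSync } = require('child_process');

// Placeholder proxy list (assumption); replace with real host:port entries.
const PROXIES = ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080'];

// Default runner: start PhantomJS once with the given proxy and report
// whether it exited cleanly. A missing binary, a timeout, or a non-zero
// exit code all count as failure.
function runPhantom(proxy, script, url) {
  const result = spawnSync('phantomjs', [`--proxy=${proxy}`, script, url], {
    timeout: 30000, // treat a hung proxy as a failure after 30 seconds
  });
  return result.status === 0;
}

// Try each proxy in order until one run succeeds.
// Returns the proxy that worked, or null if all of them failed.
function loadWithRotation(proxies, script, url, run = runPhantom) {
  for (const proxy of proxies) {
    if (run(proxy, script, url)) {
      return proxy;
    }
  }
  return null;
}
```

A call might look like `loadWithRotation(PROXIES, 'load.js', 'http://example.com/')`, where `load.js` is your own PhantomJS script that calls `phantom.exit(1)` when `page.open` reports a status other than 'success'.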

I had the same issue with Puppeteer, though the idea is the same.
I started a local Node.js proxy that routes requests through https://www.npmjs.com/package/gimmeproxy-request and pointed the Puppeteer instance at it.
With the local proxy server I was able to detect when a page wouldn't load and retry the request.
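
As a rough sketch of the bookkeeping such a local proxy needs, here is a small round-robin pool that skips upstream proxies marked as bad. It is not the gimmeproxy-request API; `ProxyPool` and `puppeteerArgs` are hypothetical names, and the port is an assumption. The only real external detail used is Chromium's `--proxy-server` flag.

```javascript
// Round-robin pool of upstream proxies that skips entries marked as bad.
class ProxyPool {
  constructor(proxies) {
    this.proxies = [...proxies];
    this.bad = new Set();
    this.index = 0;
  }

  // Return the next usable proxy, or null when every proxy is marked bad.
  next() {
    for (let i = 0; i < this.proxies.length; i++) {
      const candidate = this.proxies[this.index];
      this.index = (this.index + 1) % this.proxies.length;
      if (!this.bad.has(candidate)) {
        return candidate;
      }
    }
    return null;
  }

  // Record that a proxy failed so that next() no longer returns it.
  markBad(proxy) {
    this.bad.add(proxy);
  }
}

// Build the Chromium flag that points the browser at the local proxy.
function puppeteerArgs(localPort) {
  return [`--proxy-server=http://127.0.0.1:${localPort}`];
}
```

The browser would then be started with something like `puppeteer.launch({ args: puppeteerArgs(8000) })`, while the local proxy calls `next()` for each request and `markBad()` when an upstream proxy fails.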

Related

Get a javascript variable from a web page without interaction/headlessly

Good afternoon!
We're looking to get a JavaScript variable from a web page. We are usually able to retrieve it by typing app in the Chrome DevTools console.
However, we're looking to do this headlessly, as it has to be performed on numerous apps.
Our ideas:
Using a Puppeteer instance to go to the page, type the command and return the variable. This works, but it's very resource-intensive.
Using a GET/POST request to the page and trying to inject the JS command, but we didn't succeed.
So we're wondering whether there is an easier solution, such as a dedicated API that could extract the variable.
The goal would be to automate this process with no human interaction.
Thanks for your help!

Your question is not so much about a JS API (since the webpage is not yours to edit, you can only request it) as it is about webcrawling / browser automation.
You have to add details to get a definitive answer, but I see two scenarios:
The website actively checks for evidence of human browsing (for example, it sits behind CloudFlare and has requested this option), or the scripts depend heavily on there being a browser execution environment available. In this case, the simplest option is to automate a browser, because a headless option has to get many things right to fool the server or the scripts. I would use Karate, which is easier than, say, Selenium and can execute in-browser scripts. It is written in Java, but you can execute it externally and just read its reports.
The website does not check for such evidence and the scripts do not really require a browser execution environment. Then you can simply download everything that is required locally and attempt to jury-rig the JS into executing in any JS environment. According to your post, this fails, but it is impossible to help unless you can describe how it fails. This option can be headless.
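
The second scenario can be sketched with Node's built-in vm module: run the downloaded script text in a separate context and read the global back. This is a minimal sketch, assuming the variable is created with `var` or assigned on `window` and that the script needs little of a real browser. `extractGlobal` and the stub objects are hypothetical, and the vm module is not a security sandbox, so this is only for scripts you are willing to run.

```javascript
const vm = require('vm');

// Run script text in a separate context and return one of its globals.
// `source` is the downloaded script; `name` is the global to read (e.g. "app").
function extractGlobal(source, name) {
  // Minimal stand-ins for browser globals (assumption); real pages
  // usually expect far more than this.
  const sandbox = { document: {}, console };
  sandbox.window = sandbox; // lets `window.app = ...` land on the same object

  vm.createContext(sandbox);
  // Top-level `var` declarations become properties of the sandbox object;
  // top-level `let` and `const` declarations do not.
  vm.runInContext(source, sandbox, { timeout: 1000 });
  return sandbox[name];
}
```

If the script throws because it touches browser features that the stubs lack, that error message is exactly the "how it fails" detail needed to decide whether this route is viable.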

You can embed Chrome into your application and instrument it. It will be headless.
We've used this approach in the past to copy content from PowerPoint Online.
We were using .NET to do this and therefore used CEFSharp.

How to Bypass Puppeteer Blocking Systems

I would like to open https://krunker.io/ through Puppeteer. However, whenever I open up Krunker.io through Puppeteer, it blocks me, saying "Puppeteer Detected". Is there an easy workaround to this?
One answer I got was this:
You need to make a matchmaker seek game request to get a websocket URL, and then you connect to it and simulate being a client
As I started coding in Node.js and JavaScript just under 5 weeks ago, I am not sure how to do this. (I asked, and he said "just do it". It's probably not that hard, I am just not that good at Node.) Here are all of the other answers I came across:
i just made my rce code in assembly and then link it with chrome executable and then using a hex dumper replace the rce function call bytes with a reference pointer to my own code.
also you need to make sure your rce code has the correct signature otherwise the rebuilt chrome executable will crash as soon as it reaches your rce runtime code
you can also append a EYF_33 byte after the ACE_26 bytes to grant GET requests to make it possible to create 2 PATCH requests at a time with different structures making it possible to create fully independent websocket connection to the krunker api and send more AES authorization messages at a time
Not sure what this means ¯\_(ツ)_/¯.
Is there a simple way to do this, or better yet, a step-by-step tutorial on how to do this (on a mac)?
Thanks :)

In most cases the site detects Puppeteer by its user agent. Put simply, you can use puppeteer-extra with the puppeteer-extra-plugin-stealth plugin to change your user agent.
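
A minimal sketch of that setup follows, assuming the packages puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth have been installed with npm. `getStealthPuppeteer` and `openWithStealth` are hypothetical helper names, and there is no guarantee that this gets past any particular site's detection.

```javascript
let stealthPuppeteer = null;

// Load puppeteer-extra lazily and register the stealth plugin exactly once.
function getStealthPuppeteer() {
  if (stealthPuppeteer === null) {
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());
    stealthPuppeteer = puppeteer;
  }
  return stealthPuppeteer;
}

// Open a page with the stealth plugin active and return its title.
async function openWithStealth(url) {
  const browser = await getStealthPuppeteer().launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.title();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}
```

Usage would be along the lines of `openWithStealth('https://krunker.io/').then(console.log)`.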

How can I debug javascript between client and server seamlessly

Question regarding javascript debugging:
We have a mobile app, made with JavaScript and HTML. This app is running on the mobile platform (BlackBerry 10, Android, iOS) inside a web container.
There is another part of this application running on the remote server. This part is implemented also with JavaScript running on Node.js.
Assume there is some communication established between the client (JS on mobile) and the server (JS on Node.js) sides using e.g. REST.
What I would like to know is whether it is possible to seamlessly debug both sides at the same time.
E.g. if I have a breakpoint on the mobile app client side, how would I be able to debug all the way through to the JS on the server side, if that is possible at all?
Any suggestions would help.

You can use node-inspector on the server. You'll then have TWO debugger instances, but the same debugger toolset in both.
If you're stepping through client code and want to debug "into" the server, you must place a breakpoint in the server debugger before making the GET/POST from the client.
TIP: Get a three-monitor (or at least two-monitor) setup.

Using node-inspector is a good strategy for debugging, but it would also be good to supplement the debugger with logging.
Debugging lets you step through an event chain and examine the values of variables and the output of functions in that chain, but it won't give you insight into the steps or production conditions that lead to the errors users are experiencing (e.g. Why do I have a null variable? Why is my response message wrong?). If you use debugging without logging, you'll be trying to replicate the production conditions that cause an error, which is an inefficient (and usually futile) way of doing things.
I'd say the best way to do what you want (solve issues while taking into account client and server events that happen concurrently) is to use an open source logging library like Log4j on both your server and your client, and to configure an appender that sends the logs to a log aggregator like Loggly. That gives you tools to analyze client and server logs in the same place, rather than extracting log files from both devices manually.
Once you've done this, you'll be able to distribute your application out to testers and you'll be able to see what actions, application logs, and hardware/network conditions surround certain issues. With this data in hand you'll know a lot better what leads to certain bugs and can use that information to much more effectively use node-inspector to debug them.
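
To make the idea concrete, here is a tiny sketch of a logger that could run unchanged in both the mobile web container and Node.js. It does not use Log4j or Loggly's API. `createLogger` is a hypothetical helper, and `send` stands in for whatever transport (appender) forwards entries to your aggregator.

```javascript
// Build a logger that tags every entry with its origin ("client" or "server")
// and passes the serialized entry to `send`, a caller-supplied transport.
function createLogger(source, send) {
  const log = (level) => (message, context = {}) => {
    send(JSON.stringify({
      time: new Date().toISOString(), // timestamp for ordering in the aggregator
      source,
      level,
      message,
      context,
    }));
  };
  return { info: log('info'), warn: log('warn'), error: log('error') };
}
```

For a quick local check you could pass `console.log` as `send`. In a real setup, `send` would post each line to the aggregator, and the shared `time` and `source` fields let you line up client and server events side by side.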

Running PhantomJS as a server

I'm looking into using PhantomJS to generate static HTML from a dynamic AngularJS app so that it can be indexed by Google. What I want to do is start a PhantomJS server that sits behind a proxy and receives the ?_escaped_fragment_ requests. PhantomJS appears to be (mainly) a command line tool (I have read the FAQ explaining why it's not a regular Node module), and although I have found a couple of Node.js bridges for it, they appear to be a little bit unreliable.
Therefore, I'm looking into running PhantomJS with an embedded HTTP server. I have seen some examples of a built-in web server in PhantomJS, but I'm not sure whether it's meant to be used this way. If not, is it possible to have PhantomJS use regular Node modules, e.g. Express, so I can use the PhantomJS runtime to also host a simple web server?

The bridge node-phantom isn't unreliable (phantom-node is unreliable, and overcomplicated, so don't use that one).
Phantom itself can sometimes be a bit unreliable, but it tends to be with specific web sites.
I'm not convinced Phantom is the right solution for you though - you might want to check out JSDom instead, and just have your code run in-process.
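
Whichever renderer ends up behind the proxy, the crawler's ?_escaped_fragment_ requests first have to be mapped back to the hashbang URLs the AngularJS app actually uses. A minimal sketch of that mapping in Node.js follows; `toHashbangUrl` is a hypothetical helper name and the URLs in the comments are made up.

```javascript
// Convert a crawler request such as
//   http://example.com/page?_escaped_fragment_=/users/42
// back to the hashbang URL the app serves:
//   http://example.com/page#!/users/42
// Returns null when the request is not an _escaped_fragment_ request.
function toHashbangUrl(requestUrl) {
  const url = new URL(requestUrl);
  const fragment = url.searchParams.get('_escaped_fragment_');
  if (fragment === null) {
    return null;
  }
  // Drop the crawler parameter but keep any other query parameters.
  url.searchParams.delete('_escaped_fragment_');
  url.hash = '#!' + fragment;
  return url.toString();
}
```

The resulting URL is what the renderer would load to produce the static HTML snapshot.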

Urls for REST methods

I'm trying to get my head around the correct way to represent the following methods in a RESTful manner...
Say I have x number of servers, and they run Windows, Linux and Unix.
Each server can have a ping, shutdown and user action run against it.
The API has no knowledge of the servers, so a request would have to provide the IP address along with the server type and the action type (which the API does know about).
With that in mind, these simple URLs come to mind, but they aren't RESTful in the slightest:
/192.168.1.3/Linux/ping
/192.168.1.5/windows/shutdown
Should I go down the RESTful route? Or is the above OK for a simple web API?
If RESTful, would it look like this?
GET /servertypes/{servertypeId}/actions/{actionId}?serverip=192.168.1.4

This seems to make more sense to me:
GET /servertypes/{servertypeId}/{serverip}/{action}

In my opinion, you should rethink your design.
You're effectively overriding the semantics of HTTP methods by placing the command names in the URLs, which are meant to represent scoping information. GET requests should not lead to changes of state. Ignoring the underlying protocol is not considered RESTful.
What you have here looks like an RPC model that doesn't really fit the principles of REST. It seems to me that you're trying to hammer a square peg into a round hole.
You should also ask yourself if it's really necessary to expose the underlying OS. Do you want to call system-specific commands? Should it be possible to run any executable or just a couple of common ones? Should you care if it's grep on a UNIX machine or findstr on a Windows box? Does the client care if they use a Windows or a Linux machine?
You could go for a simple pattern similar to what Uzar Sajid suggested. It would probably work alright. Just don't call it RESTful, because it's not.
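
To illustrate only the point about GET and state changes (this is not a full RESTful design), here is a small sketch in which each action declares the HTTP method it may be invoked with. The action table and `checkRequest` are hypothetical. The "user" action is left out because the question doesn't say what it does.

```javascript
// Hypothetical action table: the HTTP method each action must be called with.
const ACTION_METHODS = new Map([
  ['ping', 'GET'],      // read-only, so GET is fine
  ['shutdown', 'POST'], // changes server state, so it must not be a GET
]);

// Return the HTTP status code a request for `action` with `method` should get.
function checkRequest(method, action) {
  const expected = ACTION_METHODS.get(action);
  if (expected === undefined) {
    return 404; // unknown action
  }
  return method === expected ? 200 : 405; // 405 Method Not Allowed
}
```

With this table, `GET .../shutdown` is rejected with 405 instead of silently shutting a machine down when a crawler or prefetcher follows the link.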
