I'm looking into using PhantomJS to generate static HTML from a dynamic AngularJS app so that it can be indexed by Google. What I want to do is start a PhantomJS server that sits behind a proxy and receives the ?_escaped_fragment_= requests. PhantomJS appears to be (mainly) a command-line tool (I have read the FAQ explaining why it's not a regular Node module), and although I have found a couple of Node.js bridges for it, they appear to be a little unreliable.
Therefore, I'm looking into running PhantomJS with an embedded HTTP server. I have seen some examples of a built-in web server in PhantomJS, but I'm not sure whether it's meant to be used this way. If not, is it possible to have PhantomJS use regular Node modules, such as Express, so I can use the PhantomJS runtime to also host a simple web server?
The bridge node-phantom isn't unreliable (phantom-node is unreliable, and overcomplicated, so don't use that one).
Phantom itself can sometimes be a bit unreliable, but it tends to be with specific web sites.
I'm not convinced Phantom is the right solution for you though - you might want to check out JSDom instead, and just have your code run in-process.
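Whichever renderer ends up behind the proxy, the crawler's request has to be mapped back to the hashbang URL that the Angular app actually understands. Here is a minimal sketch of that mapping, assuming it runs in the Node.js proxy (not inside PhantomJS) and that the crawler uses the _escaped_fragment_ query parameter. The helper name toHashbangUrl is made up for illustration.

```javascript
// Sketch only: toHashbangUrl is a hypothetical helper, not part of any library.
// Assumes Node.js, where the WHATWG URL class is available as a global.
function toHashbangUrl(requestUrl) {
    var parsed = new URL(requestUrl);
    var fragment = parsed.searchParams.get('_escaped_fragment_');
    if (fragment === null) {
        return null; // ordinary request, nothing to pre-render
    }
    // Drop the crawler parameter and restore the fragment as "#!...".
    parsed.searchParams.delete('_escaped_fragment_');
    parsed.hash = '#!' + fragment;
    return parsed.toString();
}

console.log(toHashbangUrl('http://example.com/app?_escaped_fragment_=/users/42'));
```

The returned URL is what you would hand to the renderer; a null result means the request can be passed straight through to the normal app.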
Related
I get a lot of mixed results when I search for ways to emulate a browser. Long story short, I need my Node server to make GET and POST requests. Usually I'd just do this with the http package. However, there are some anti-scripting measures in place on the other side, namely JavaScript that lets the server know it's a real browser. So, I need these scripts to be executed.
I actually solved this problem like 5 years ago, but my site was only using PHP then. The solution involved using a Qt webkit widget, and a fake X-server. Not elegant, but it was pretty easy to do. The only javascript engines I found available in Perl, PHP, or Python at the time were crazy slow.
As NodeJS is built on V8, I gotta think there's an easy way to do this. For the record, I'm hoping to get something a la the following.
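// Error handling omitted
var http = require('http');

http.get('http://remote.site', function (res) {
    var body = '';
    // Accumulate the response; 'end' only fires once the data is consumed.
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
        // body is the page returned by the request. The hope is that
        // anything found in a <script> tag would have been executed.
    });
});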
"As NodeJS is built on V8, I gotta think there's an easy way to do this."
Actually, no! There's a lot more to running in the context of a browser than simply being able to execute JavaScript. All of the DOM stuff and what not is not present in Node.js. Node.js has the JavaScript engine only.
Without the browser engine, you won't know what scripts to load, in what order, or be able to provide everything that comes with the document or window, which is likely a required part of what you're trying to do.
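You can see this for yourself with a quick check. A tiny sketch, assuming it is run with plain Node.js and no DOM library loaded:

```javascript
// In plain Node.js the browser globals that page scripts rely on do not exist.
var hasWindow = typeof window !== 'undefined';
var hasDocument = typeof document !== 'undefined';

console.log('window available: ' + hasWindow);
console.log('document available: ' + hasDocument);
```

Any page script that touches window or document will fail at this point unless something supplies those objects.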
"The solution involved using a Qt webkit widget, and a fake X-server. Not elegant, but it was pretty easy to do."
This is actually the right solution... mostly. Fortunately these days there are existing tools which have optimized this reasonably well.
Take a look at PhantomJS (http://phantomjs.org/). You can write scripts for it in much the same way as you do for Node.js. (It supports require() for CommonJS-style modules, and NPM packages that don't depend on Node.js built-ins generally work.) PhantomJS will allow you to run the page and pull the DOM contents out easily.
In the event PhantomJS's built-in JavaScript environment doesn't contain some Node.js component you need (for filesystem or network access, for example), you can always control PhantomJS from your Node.js application. https://github.com/amir20/phantomjs-node
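To give an idea of what "run the page and pull the DOM contents out" looks like, here is a rough sketch of a PhantomJS script. The file name render.js and the helper renderPage are made up for illustration. The webpage module is passed in as a parameter so the logic can also be exercised outside PhantomJS with a stub.

```javascript
// Sketch only. Assumed usage: phantomjs render.js http://remote.site/
// renderPage is a hypothetical helper; webpageModule is require('webpage')
// when running under PhantomJS.
function renderPage(webpageModule, url, done) {
    var page = webpageModule.create();
    page.open(url, function (status) {
        if (status !== 'success') {
            done(new Error('Failed to load ' + url));
            return;
        }
        // page.content is the serialized DOM after the page's scripts have run.
        done(null, page.content);
    });
}

// Only start rendering when this file is actually run by PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    renderPage(require('webpage'), system.args[1], function (err, html) {
        console.log(err ? err.message : html);
        phantom.exit(err ? 1 : 0);
    });
}
```

Note that the callback fires when the page has loaded, so content that is filled in later by asynchronous scripts may need an extra wait before reading page.content.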
I've got a simple Javascript application with a JSON API. Currently it runs in the client, but I'd like to move it from the client to the server. I am accustomed to learning new platforms, but in this case, my time is very limited - so I need to find the absolute simplest way possible.
This should be an easy task, but all I'm finding are solutions that are way overcomplicated:
The application is currently hosted on an extremely basic server. Node.js is not available, and I do not have install privileges. I'll eventually move it to a different server, but I really don't know what will be available there.
Many solutions require installing and running a standalone server. Really? Just to evaluate Javascript server-side and spit out some data?
I can run Python and PHP, and I see that it's possible to call Javascript from inside a Python or PHP script. However, the specific Python solution that I've found also requires installing some Python support via pip or easy-install, so it's probably not an option. Also, this just feels overcomplicated, and I'm concerned about setting myself up for errors such as data conversion or permissions, etc.
Any help?
@Quentin is correct. There is no way to run javascript on a server without a javascript interpreter on the server.
Node.js is not only the most robust and widely used one, it's also the simplest. It is certainly possible to write your own javascript interpreter in PHP or Python, but that would be much more complicated than using Node.js.
Try really hard to find a server solution that allows you to use Node. In the end, it's going to save you (and any other stakeholders interested in the project) a lot of time and money.
What's the difference here? I want to create a small API for queueing jobs, but I am not sure which I should be using. I'm leaning towards just using Node, but I don't understand what the point of having a web server module for PhantomJS is.
From the PhantomJS docs, the Phantom webserver is still experimental and intended to manage other phantom scripts and provide an interface to those scripts from the web. It currently only supports up to 10 concurrent requests. I'd recommend using node if you are looking for a general purpose web server.
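For a sense of scale, a small job-queueing API in plain Node needs nothing beyond the http module. This is only a sketch: the /jobs route, the in-memory array and the names handleRequest and start are assumptions made for illustration, not anything from the question.

```javascript
// Sketch of a tiny job-queue API using only Node's built-in http module.
var http = require('http');

var jobs = []; // in-memory queue; contents are lost when the process exits

function sendJson(res, statusCode, value) {
    res.writeHead(statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(value));
}

function handleRequest(req, res) {
    if (req.method === 'POST' && req.url === '/jobs') {
        var body = '';
        req.on('data', function (chunk) { body += chunk; });
        req.on('end', function () {
            jobs.push({ id: jobs.length + 1, payload: body });
            sendJson(res, 202, { queued: jobs.length });
        });
        return;
    }
    if (req.method === 'GET' && req.url === '/jobs') {
        sendJson(res, 200, jobs);
        return;
    }
    sendJson(res, 404, { error: 'not found' });
}

// Call start(8080) to begin listening; nothing listens until you do.
function start(port) {
    return http.createServer(handleRequest).listen(port);
}
```

A worker (for example, something that launches PhantomJS scripts) could then pull entries off jobs; that part is left out here.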
I have a simple goal: load webpages with either phantom.js (out of the box) or casper.js (nice and easier) but using proxy and rotate it from a list if current one is bad (i.e. webpage loads fail or something like that).
I know casper.js has a --proxy param, but it lets the user specify only ONE proxy, which is then used for the whole run.
Question #1 is: how to rotate proxy on the fly programmatically?
I did some research and found this node-requester but it's not integrated with casper.js. I tried to extract just the proxy feature from the code but didn't really understand how it works in a nutshell (I am not that smart I guess).
So question #2: is there some simple implementation of proxy rotation that works with either phantom.js or casper.js?
I'd prefer to use the fancy casper.js, but I'm fine with bare phantom.js as well.
I had the same issue a while back when I worked with PhantomJS. The solution we ended up with was running PhantomJS as a child process of a larger Java/Scala server which then handled failures and assigned the different proxies when needed (by rerunning with different params in the --proxy arg).
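Our supervisor was Java/Scala, but the same idea can be sketched in Node.js. This is not our actual code: the names buildArgs and fetchWithRotation, the script name and the proxy addresses are all made up for illustration. It reruns the phantomjs command with the next --proxy value whenever a run fails.

```javascript
// Sketch only: rotate proxies by rerunning PhantomJS as a child process.
var childProcess = require('child_process');

function buildArgs(proxy, script, url) {
    // --proxy=host:port is PhantomJS's command-line proxy option.
    return ['--proxy=' + proxy, script, url];
}

// run defaults to child_process.execFile; it is a parameter so that the
// rotation logic can be tried out without PhantomJS installed.
function fetchWithRotation(proxies, script, url, done, run) {
    run = run || childProcess.execFile;
    var index = 0;

    function attempt() {
        if (index >= proxies.length) {
            done(new Error('All proxies failed'));
            return;
        }
        var proxy = proxies[index++];
        run('phantomjs', buildArgs(proxy, script, url), function (err, stdout) {
            if (err) {
                attempt(); // non-zero exit or spawn failure: try the next proxy
                return;
            }
            done(null, stdout, proxy);
        });
    }

    attempt();
}
```

For this to work, the PhantomJS script has to exit with a non-zero code when the page fails to load, so that the parent sees the failure and moves on to the next proxy.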
I had the same issue with Puppeteer, though the idea is the same.
I started a local Node.js proxy using https://www.npmjs.com/package/gimmeproxy-request and pointed the Puppeteer instance at it.
Using the local proxy server, I was able to detect when a page wouldn't load and retry the request.
I'm attempting to write a script that logs into a website. This specific website uses a Javascript form, so I had little to no luck with "mechanize".
I'm curious if there exist other solutions that I may be unaware of that would help me in my situation. If this particular question or some related variant has been asked here before, please excuse me, and I would prefer the link to this particular query. Otherwise, what are some common techniques/approaches for dealing with this issue?
Thanks.
I've recently been using PhantomJS for this kind of work - it's a command-line tool that allows you to run Javascript in a browser environment (based on Webkit). This allows you to do scraping and online interactions that require Javascript-enabled interfaces. There's a Python-based implementation here that's fully compatible with the API of the C++ version, or you could run either version in Python via subprocess.
Depending on what you're trying to do, another good option might be to use Selenium, which has a client driver implementation in Python. It's meant for integration testing, but it can do a lot of automation as long as you're okay with running the Java-based Selenium Server and having the automation happen in an open browser rather than as a background process.