NodeJS emulate browser for get/post requests


There are a lot of mixed results when I search around for emulating a browser. Long story short, I need my Node server to do GET and POST requests. Usually I'd just do this with the http package. However, there are some anti-scripting measures in place on the other side, namely JavaScript that lets the server know it's a real browser. So I need those scripts to be executed.
I actually solved this problem about 5 years ago, but my site was only using PHP then. The solution involved using a Qt WebKit widget and a fake X server. Not elegant, but it was pretty easy to do. The only JavaScript engines I found available in Perl, PHP, or Python at the time were crazy slow.
As NodeJS is built on V8, I gotta think there's an easy way to do this. For the record, I'm hoping to get something a la the following.
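// Error handling omitted
const http = require('http');

http.get('http://remote.site', function(res) {
  let data = '';
  res.on('data', function(chunk) { data += chunk; });
  res.on('end', function() {
    // `data` now holds the page returned by the request. What I'm hoping
    // for is that anything found in a <script> tag would also have been
    // executed, which plain http.get does not do.
  });
});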

As NodeJS is built on V8, I gotta think there's an easy way to do this.
Actually, no! There's a lot more to running in the context of a browser than simply being able to execute JavaScript. All of the DOM stuff and whatnot is not present in Node.js. Node.js has only the JavaScript engine.
Without the browser engine, you won't know what scripts to load, in what order, or be able to provide everything that comes with the document or window, which is likely a required part of what you're trying to do.
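A quick way to see that Node.js ships the engine without the browser is its built-in `vm` module: a fresh context has the ECMAScript built-ins (JSON, Math, Array, ...) but no `window` or `document`. This is a minimal sketch; `tryPageScript` is a hypothetical helper name.

```javascript
const vm = require("vm");

// Run a snippet of "page" JavaScript in a brand-new, empty V8 context and
// report either its result or the error it raised.
function tryPageScript(code) {
  const context = vm.createContext({}); // no window, no document, no DOM
  try {
    return { ok: true, value: vm.runInContext(code, context) };
  } catch (err) {
    // err comes from the context's own realm, so compare by name, not instanceof
    return { ok: false, error: err.name + ": " + err.message };
  }
}

console.log(tryPageScript("JSON.stringify([1, 2, 3])")); // engine built-ins work
console.log(tryPageScript("document.title"));            // browser objects do not
```

The second call fails with a `ReferenceError`, which is exactly the gap a browser engine (or an emulation layer) has to fill.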
The solution involved using a Qt webkit widget, and a fake X-server. Not elegant, but it was pretty easy to do.
This is actually the right solution... mostly. Fortunately these days there are existing tools which have optimized this reasonably well.
Take a look at PhantomJS. http://phantomjs.org/ You can write scripts for it much in the same way you do Node.js. (It supports require() and what not, and most of the NPM packages you'd want work.) PhantomJS will allow you to run the page and pull the DOM contents out easily.
In the event PhantomJS' built-in JavaScript environment doesn't contain some Node.js component you need (for filesystem or network access, for example), you can always control PhantomJS from your Node.js application. https://github.com/amir20/phantomjs-node
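If you drive a headless browser from Node.js, the simplest plumbing is to run it as a child process and read what it prints. The sketch below uses only `child_process.execFile`; it is not the API of the phantomjs-node module linked above. The `phantomjs` binary name and the `dump-dom.js` script in the usage comment are assumptions (that script is assumed to open the URL and print the rendered DOM).

```javascript
const { execFile } = require("child_process");

// Run an external program (e.g. a headless browser) and resolve with its stdout.
function runHeadless(binary, args, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    execFile(
      binary,
      args,
      { timeout: timeoutMs, maxBuffer: 10 * 1024 * 1024 }, // allow large pages
      (err, stdout, stderr) => {
        if (err) {
          err.stderr = stderr; // keep the browser's diagnostics for debugging
          return reject(err);
        }
        resolve(stdout);
      }
    );
  });
}

// Hypothetical usage, assuming phantomjs is on PATH and dump-dom.js exists:
// runHeadless("phantomjs", ["dump-dom.js", "http://remote.site"])
//   .then((html) => console.log(html));
```

Keeping the browser in its own process also means a hung or crashed page cannot take the Node server down with it; the timeout kills it.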

Related

Absolute simplest server-side Javascript mechanism?

I've got a simple Javascript application with a JSON API. Currently it runs in the client, but I'd like to move it from the client to the server. I am accustomed to learning new platforms, but in this case, my time is very limited - so I need to find the absolute simplest way possible.
This should be an easy task, but all I'm finding are solutions that are way overcomplicated:
The application is currently hosted on an extremely basic server. Node.js is not available, and I do not have install privileges. I'll eventually move it to a different server, but I really don't know what will be available there.
Many solutions require installing and running a standalone server. Really? Just to evaluate Javascript server-side and spit out some data?
I can run Python and PHP, and I see that it's possible to call JavaScript from inside a Python or PHP script. However, the specific Python solution that I've found also requires installing some Python support via pip or easy_install, so that's probably not an option. Also, this just feels overcomplicated, and I'm concerned about setting myself up for errors such as data conversion or permissions, etc.
Any help?

Quentin is correct. There is no way to run JavaScript on a server without a JavaScript interpreter on the server.
Node.js is not only the most robust and widely used one, it's also the simplest. It is certainly possible to write your own JavaScript interpreter in PHP or Python, but that would be much more complicated than using Node.js.
Try really hard to find a server solution that allows you to use Node. In the end, it's going to save you (and any other stakeholders interested in the project) a lot of time and money.
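For a sense of how small "simplest" can be once Node is available, here is a hypothetical JSON endpoint using only the built-in `http` module. `computeResult` is a placeholder for whatever the client-side application currently computes; the `q` query parameter is likewise an assumption.

```javascript
const http = require("http");

// Placeholder for the logic that currently runs in the browser.
function computeResult(query) {
  return { query, length: query.length };
}

// Minimal JSON API: GET /?q=... returns computeResult(q) as JSON.
function createServer() {
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost"); // req.url is path-only
    const q = url.searchParams.get("q") || "";
    const body = JSON.stringify(computeResult(q));
    res.writeHead(200, {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body),
    });
    res.end(body);
  });
}

// Usage: createServer().listen(3000); then request http://localhost:3000/?q=hello
```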

Windowless container for Google App Engine channel API client

I would like to write a command-line tool that receives notifications from Google App Engine's Channel API. This seems to be quite straightforward thanks to open JavaScript VMs such as V8 and js. One problem with this approach, though, is that these VMs do not provide standard browser objects such as window and document, which the Channel API references. Running such code therefore gives you window/document/... not found errors.
There seem to be two ways of circumventing this obstacle:
To write a lightweight header in javascript to emulate the behavior of the required objects.
To edit Google's javascript (/_ah/channel/jsapi) and eliminate references to such objects.
Does anyone know if there are existing implementations of these approaches, or know of a better idea? Furthermore, is there a clean, uncompressed version of the channel API client side javascript code available somewhere?

You can't edit the script used by /_ah/channel/jsapi -- it's only used when the channel is running against the dev app server. When running in production, that script redirects to https://talkgadget.google.com/talkgadget/channel.js
So you're left with emulating the required objects, or just using a hidden browser window. I would opt for the latter, since I think emulating all the DOM calls is going to get very difficult very quickly.
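A sketch of what "emulating the required objects" looks like with Node's `vm` module, assuming the client script only needs `document.title`, `document.cookie`, `document.createElement` and `location.href`. These stubs are hypothetical and are not a list of what the Channel API script actually touches. Any call you didn't anticipate, such as `document.querySelector`, fails immediately, which is why such stub lists tend to grow quickly.

```javascript
const vm = require("vm");

// Hypothetical, deliberately tiny stand-ins for browser globals.
function createFakeBrowserContext(href) {
  const createdElements = []; // lets the host inspect what the script built
  const document = {
    title: "",
    cookie: "",
    createElement(tagName) {
      const el = {
        tagName: String(tagName).toUpperCase(),
        attributes: {},
        setAttribute(name, value) { this.attributes[name] = String(value); },
      };
      createdElements.push(el);
      return el;
    },
    getElementsByTagName() { return []; }, // pretend the page is empty
  };
  // Properties of this object become globals inside the context.
  return vm.createContext({ document, location: { href }, createdElements });
}

// Usage:
// const ctx = createFakeBrowserContext("http://localhost/");
// vm.runInContext("document.title = 'ok'", ctx);  // works
// vm.runInContext("document.querySelector('body')", ctx);  // TypeError: not stubbed
```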

wget vs load page with QtWebkit

I am trying to understand the difference between the output of a simple page load with QtWebKit and a wget command, apart from the fact that QtWebKit has a large API that we can use from Python to do a lot of things with a webpage.
How does wget work, and how does it download a webpage with all its components (images, etc.)? Is there a difference in the output size of the two processes?
And last question: What is being executed (javascript) in a load page with QtWebkit (besides an onload event handler)?

By default, wget does not retrieve any page requisites unless you tell it to via the -p/--page-requisites or the -r/--recursive flags. It processes no JavaScript commands, nor does it attempt to do anything with the markup or CSS unless you specifically tell it to. Even then, I'm pretty sure it just uses simple string matching to determine resource names and link URLs. All in all, it's pretty stupid until you configure it correctly (the basis for just about every powerful *NIX tool).
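To illustrate the kind of string matching described above (an approximation for illustration only, not wget's actual implementation), the sketch below pulls `src`/`href` values out of `<img>`, `<script>` and `<link>` tags in raw HTML and resolves them against the page URL. Nothing is parsed or executed, so a resource that a script would add at runtime is never seen.

```javascript
// Naive "page requisites" discovery by regex over raw markup.
function findRequisites(html, baseUrl) {
  const pattern = /<(img|script|link)\b[^>]*?\b(src|href)\s*=\s*["']([^"']+)["']/gi;
  const urls = [];
  for (const match of html.matchAll(pattern)) {
    urls.push(new URL(match[3], baseUrl).href); // resolve relative URLs
  }
  return urls;
}

// Usage: the inline script's image ("d.png") is missed, because no JS runs.
const page =
  '<link rel="stylesheet" href="/a.css"><script src="b.js"></script>' +
  '<img src="img/c.png">' +
  '<script>var i = document.createElement("img"); i.src = "d.png";</script>';
console.log(findRequisites(page, "http://example.com/dir/page.html"));
```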
Since the WebKit library is so extensive, it would be useful to know what you're trying to do with it, for example what code you are executing. But since you already know that what you're doing involves JavaScript calls, it's reasonable to assume that it's doing a lot more than just retrieving the page.
Perhaps if you gave some examples of what you're trying to do I would be able to more thoroughly answer your question.

Execute javascript on IIS server

I have the following situation. A customer uses JavaScript with jQuery to create a complex website. We would like to use JavaScript and jQuery on the server (IIS) for the following reasons:
Skills transfer: we would like to use JavaScript and jQuery on the server and not have to use, e.g., VBScript/classic ASP. The .NET Framework, Java, etc. are ruled out for the same reason.
Improved options for search/accessibility. We would like to be able to use jQuery as a templating system, but this isn't viable for search engines and users with js turned off - unless we can selectively run this code on the server.
There is significant investment in IIS and Windows Server, so changing that is not an option.
I know you can run JScript on IIS using Windows Script Host, but I am unsure of the scalability and the process surrounding this. I am also unsure whether this would have access to the DOM.
Here is a diagram that hopefully explains the situation. I was wondering if anyone has done anything similar?
EDIT: I am not looking for criticism of the web architecture; I simply want to know if there are any options for manipulating the DOM of a page with JavaScript before it is sent to the client. Jaxer is one such product (but it doesn't run on IIS). Thanks.

Have a look at bringing the browser to the server, Rhino, and Use Microsoft's IIS as a Java servlet engine.
The first link is from John Resig's (jQuery's creator) blog.
Update August 2 2011
Node.js is coming to Windows.

The idea of reusing client JS on the server may sound tempting, but I am not sure that jQuery itself is ready to run in a server environment.
You will need to define a global context for jQuery somehow, by initializing window, document, self, location, etc. I am not sure that is doable.
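Defining that global context can at least be sketched with Node's `vm` module: make `window` and `self` refer to the context's own global object, as they do in a browser, and supply `document` and `location`. The `document` and `location` objects here are hypothetical placeholders; jQuery itself would need a far fuller DOM implementation than this.

```javascript
const vm = require("vm");

// Build a context whose global object also answers to `window` and `self`.
function createBrowserLikeGlobal(href) {
  const context = vm.createContext({
    document: { documentElement: {}, readyState: "complete" }, // placeholder
    location: { href },                                          // placeholder
  });
  // At the top level of a script, `this` is the context's global object,
  // so these aliases behave like the browser's window/self.
  vm.runInContext("var window = this; var self = this;", context);
  return context;
}

// Usage:
// const g = createBrowserLikeGlobal("http://example.com/");
// vm.runInContext("window.answer = 42; answer", g); // globals and window agree
```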
Besides, as Cheeso has mentioned, Active Server Pages is a very outdated technology; Microsoft replaced it with ASP.NET at the beginning of the century. I maintained a legacy system using ASP 3.0 for more than a year, and it was a pain. The most wonderful pastime was debugging: you will hardly find any tools for it today, and you will have to decipher beautiful errors like this one from the IIS log:
error '800a9c68'
Application-defined or object-defined error
Nevertheless, I can confirm that I managed to reuse JScript between client and server. But that was code I had written myself, knowing it was going to be used on the server.
P.S. I would not recommend going that way. There are plenty of templating frameworks that are familiar to those who write HTML and JavaScript.

JScript runs on IIS via something called ASP.
Active Server Pages.
It was first available in 1996.
Eventually ASP.NET was introduced as a successor. But ASP is still supported.
There is no DOM for the HTML page, though.
You might need to reconsider your architecture a bit.

I think the only viable solutions you're likely to find anywhere near ready to go involve putting IIS in front of Java. There are two browser-like environments I'm aware of coded for Java:
1) Env-js (see http://groups.google.com/group/envjs and http://github.com/thatcher/env-js )
I believe this one has contributions from jQuery's John Resig and was put together with jQuery testing/support in mind.
2) HTMLUnit (see http://htmlunit.sourceforge.net/ ) This one's older, and wasn't originally conceived around jQuery, but there are reports in the wild of using it to run jQuery's test suite successfully (http://daniel.gredler.net/2007/08/08/htmlunit-taming-jquery/ ).
If you want something pure-IIS/MS, I think your observation about Windows Script Host and/or something like the semi-abandoned JScript.NET is probably about as close as you're going to come, along with a port (which you'll probably have to start yourself) of something like Env-js or HTMLUnit.
Also, I don't know if you've seen the Wikipedia list of server-side JavaScript solutions:
http://en.wikipedia.org/wiki/Server-side_JavaScript
Finally... you could probably write a serviceable jQuery-like library in any language that already has some kind of DOM library and first-class functions (or, failing that, an eval facility). See, for example, pQuery for Perl (http://metacpan.org/pod/pQuery). This would get you the benefits of the jQuery style of manipulating documents. Skill transfer is great, and JavaScript has a wonderful confluence of very nice features, but on the other hand, having developers who care enough to learn multiple languages is also great, and JS isn't the only nice language out there.
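To show how little the jQuery style itself depends on (a tree to walk plus first-class functions to pass around), here is a toy chainable wrapper over plain objects. The `{ tag, attrs, children }` node shape and the `$`/`wrap`/`walk` helpers are assumptions for illustration; a real port would sit on top of an existing DOM library.

```javascript
// Depth-first traversal, calling `visit` on every node.
function walk(node, visit) {
  visit(node);
  for (const child of node.children || []) walk(child, visit);
}

// Select all nodes with the given tag name, jQuery-style.
function $(root, tag) {
  const matches = [];
  walk(root, (n) => { if (n.tag === tag) matches.push(n); });
  return wrap(matches);
}

// Wrap a node list in chainable methods.
function wrap(nodes) {
  return {
    length: nodes.length,
    attr(name, value) {
      if (value === undefined) return nodes.length ? nodes[0].attrs[name] : undefined; // getter: first match
      nodes.forEach((n) => { n.attrs[name] = value; }); // setter: all matches
      return this; // chainable, like jQuery setters
    },
    each(fn) {
      nodes.forEach((n, i) => fn.call(n, i, n));
      return this;
    },
    filter(pred) {
      return wrap(nodes.filter(pred));
    },
  };
}
```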

I think it's mainly a browser-based script, so you are probably better off using technologies based on VB or .NET to generate HTML from templates. I'm sure such tools exist, because in the Java world there are a few of these around (like Velocity). You'd then use jQuery to add client-side functionality and usability, so that the website is more usable than it would otherwise have been.
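Server-side templating of this kind boils down to substituting values into markup and escaping them, so the HTML arrives fully rendered for search engines and for users with JavaScript turned off. A minimal sketch in JavaScript; the `{{name}}` placeholder syntax and the `render`/`escapeHtml` names are assumptions, not any particular framework's API.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Replace {{ key }} placeholders with escaped values; unknown keys render empty.
function render(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : ""
  );
}

// Usage:
// render("<h1>{{ title }}</h1>", { title: "Tom & Jerry" })
```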

What exactly do you mean by
"A customer uses JavaScript with
jQuery to create a complex website"
Half the point of jQuery is to make it easy for the developer to manipulate the DOM, and therefore add interactive enhancements to a web site. By running the Javascript on the server and only rendering HTML you will lose the ability to add these enhancements, without doing a round trip to the server (think WebForms postback model...ugh).
Now if what you really mean is the customer uses a site builder based on jQuery, why not have that tool output flat HTML in the first place?

Take a look at this technology. You can invoke scripts to run at server, at client, or both. Plus, this really implements the firefox engine on the server. Take a look at it.
Aptana's Jaxer is the first AJAX web server so far. I have not tried it yet, but I will. It looks promising and very powerful.

HTTP Request, loading javascript DOM manipulations that have been made to the HTML

I'm currently using cURL to do HTTP requests, and it works fine. However I need to get the javascript code and execute it in the context of the HTML, making it manipulate the DOM exactly as if it were a web-browser.
The first thing that came to mind was to use firefox, there's a command-line interface so I thought it would be easy (maybe with some add-on) to programmatically do an HTTP request, let it natively run the javascript and manipulate the DOM, and get the generated HTML after the manipulation.
However, this is harder than I expected, especially since there are going to be problems fetching the data asynchronously.
Maybe someone has done this already and could give me some tips on what would be the best solution.
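On the asynchronous part: whatever drives the browser, a common approach is to poll for a condition (an element existing, a flag being set) with a timeout before reading the generated HTML. A minimal sketch of such a helper; `waitFor` and its defaults are hypothetical, and in a real setup the condition would query the browser's DOM rather than a local variable.

```javascript
// Poll `condition` every `intervalMs` until it returns truthy or `timeoutMs`
// elapses. Resolves with the elapsed milliseconds; rejects on timeout or if
// the condition itself throws.
function waitFor(condition, timeoutMs = 5000, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    (function check() {
      let done;
      try {
        done = condition();
      } catch (err) {
        return reject(err);
      }
      if (done) return resolve(Date.now() - start);
      if (Date.now() - start >= timeoutMs) {
        return reject(new Error("waitFor: timed out after " + timeoutMs + " ms"));
      }
      setTimeout(check, intervalMs);
    })();
  });
}

// Usage (hypothetical): waitFor(() => pageSaysReady(), 10000).then(dumpHtml);
```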

You could probably use Selenium remote control to achieve this.

I would recommend Watir
Watir, pronounced water, is an open-source (BSD) family of Ruby libraries for automating web browsers. It allows you to write tests that are easy to read and maintain. It is simple and flexible.

This is what you want to use for something like this:
http://code.google.com/p/envjs/
