I am writing a program to search for tags, but I can't get them if they are created with JavaScript.
There is no direct way to do this for JavaScript-generated content, because the JavaScript needs a browser in order to be executed.
Possible Solution
You can run a headless browser with Selenium, for example, wait for your DOM elements to be created, and then - using JavaScript - make a request to your PHP server with the window.document.querySelector('html') data that you want to parse.
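For illustration, a rough sketch of that idea with the selenium-webdriver package for Node.js; the URL and the CSS selector are placeholders, and it simply prints the rendered HTML rather than posting it to a PHP endpoint:

// Rough sketch: render the page in headless Chrome, wait for the JS-built element,
// then grab the final HTML. Assumes `npm install selenium-webdriver` and chromedriver.
const { Builder, By, until } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

(async () => {
    const driver = await new Builder()
        .forBrowser('chrome')
        .setChromeOptions(new chrome.Options().addArguments('--headless'))
        .build();
    try {
        await driver.get('https://example.com/page-built-with-js');               // placeholder URL
        await driver.wait(until.elementLocated(By.css('.generated-tag')), 10000); // placeholder selector
        const html = await driver.getPageSource();                                // fully rendered DOM
        console.log(html);   // or send it to your PHP server for parsing
    } finally {
        await driver.quit();
    }
})();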
Related
I built a random quotes app using vanilla JS, AJAX, and DOM manipulation, and was wondering if I can do the JS bit in the back end using Node/Express. However, I can't seem to figure out whether DOM manipulation (like generating a new quote upon clicking the button) can be done via Node.js?
You can parse HTML into a DOM with Node.js using a library such as libxmljs or cheerio which would then allow you to manipulate the result.
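For illustration, a minimal cheerio sketch (the markup here is just a stand-in):

// Parse an HTML string on the server and manipulate the result with a jQuery-like API.
// Assumes `npm install cheerio`; the markup is just a stand-in.
const cheerio = require('cheerio');

const $ = cheerio.load('<ul><li class="quote">Hello</li></ul>');
$('ul').append('<li class="quote">World</li>');   // manipulate the parsed tree
console.log($('.quote').length);                  // 2
console.log($.html());                            // serialized HTML result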
You can't use code running in Node.js to manipulate the DOM that a browser has built, or directly handle a click on a button rendered in the browser because both the button and DOM will be in the browser and not in the Node.js program.
The DOM is not part of JS. It is the browser's representation of the HTML document, and as such requires the context of the browser in order to become available.
What you want to do is continue your DOM manipulation in the browser and make API calls to your Node back end. This means using something like the fetch API to broker the data and then updating the DOM appropriately for your application when the response arrives.
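As a rough browser-side sketch of that idea (the element IDs, the /api/random_quote route, and the response shape are assumptions):

// Browser-side sketch: the click handler and the DOM update stay in the front end;
// only the quote data comes from the Node/Express back end.
// The #new-quote / #quote elements and the /api/random_quote route are assumptions.
document.querySelector('#new-quote').addEventListener('click', async () => {
    const res = await fetch('/api/random_quote');   // ask the back end for data
    const quote = await res.json();
    document.querySelector('#quote').textContent = quote.text;   // DOM update happens here
});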
You can try using a template rendering engine (EJS, Handlebars, ...) with your vanilla JavaScript code.
There are many template engines you can use to render your page on the server.
With template engines you can build the HTML and manipulate it on the server before serving it to the client.
First, you need to distinguish between the DOM and Node.
1.DOM
The DOM exists when the page is running in a browser, which needs three types of files:
js
css
html
2.Node
Node.js is a tool: a runtime that runs JavaScript outside the browser. You can't run Node in the browser, but you can create HTML/CSS/JS files with Node code, and you can also return them from a server framework (e.g. Node with Express).
3.My suggestion
If you truly want to write some of this code in Node/Express and do some DOM manipulation,
try:
SSR (Server-Side Rendering)
Write your DOM code in an EJS file or some other server-side HTML template
Return your page by writing a server API (Express is a good choice for that); a minimal sketch follows below
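A minimal sketch of that suggestion, assuming an Express app with EJS installed; the file names and the quotes array are placeholders:

// Minimal server-side-rendering sketch with Express + EJS.
// Assumes `npm install express ejs` and a views/quotes.ejs template such as:
//   <p><%= quote %></p>
// The quotes array is a placeholder.
const express = require('express');
const app = express();

app.set('view engine', 'ejs');

const quotes = ['First quote', 'Second quote', 'Third quote'];

app.get('/page/quotes', (req, res) => {
    // the "DOM manipulation" happens here, before the HTML leaves the server
    const quote = quotes[Math.floor(Math.random() * quotes.length)];
    res.render('quotes', { quote });   // renders views/quotes.ejs with the quote injected
});

app.listen(3000);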
Well, here Node/Express is a backend tool and your browser (where vanilla JS would run) is a front end.
A backend can generate output in the form of HTML/JSON/XML or even plain text! But just as we can't change a value a function has already returned, the backend cannot change its output once it has been sent.
This is where frontend tools come in: they are able to change the DOM, which in your case means vanilla JS.
So one way is to fetch a whole new page from the backend, OR, the smarter way, to fetch just the quote you want to change (by creating a REST API on the same backend).
So you need some route like /page/quotes that returns an HTML page,
and some route like /api/random_quote that returns a random quote in JSON format, which you can call from frontend JS using something like fetch.
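A rough sketch of those two routes in Express (the quotes array and the HTML file name are placeholders):

// Sketch of the two routes described above, using Express.
// quotes.html would contain the front-end JS that calls fetch('/api/random_quote').
const path = require('path');
const express = require('express');
const app = express();

const quotes = ['First quote', 'Second quote'];

// returns an HTML page
app.get('/page/quotes', (req, res) => {
    res.sendFile(path.join(__dirname, 'quotes.html'));   // placeholder file name
});

// returns just the data, as JSON, for the front end to consume
app.get('/api/random_quote', (req, res) => {
    res.json({ text: quotes[Math.floor(Math.random() * quotes.length)] });
});

app.listen(3000);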
I'm trying to scrape the following page using hQuery: http://www.oddsportal.com/search/Paris+SG/soccer/
I realised halfway through that the odds of each game are inserted using JS (before that, it's just a -). Is there any way to get the page after the JavaScript has been executed, or should I find another website?
My guess is that you would have to use an actual browser (not hQuery) and look into the code to see if there are any events being emitted, or requests being made, that you can hook into.
You cannot do this using PHP alone.
Scraping a site gives you whatever the server responds with to the HTTP request that you make (from which the "initial" state of the DOM tree is derived, if that content is HTML). It cannot take into account the "current" state of the DOM after it has been modified by Javascript.
You can use other, more powerful tools like Selenium.
You would need the PhantomJS PHP wrapper for that; it is easy to use and gives more control and features. Please see my answer here:
Scraping a dynamically loading website with php curl
Hope it helps
The JavaScript is printing HTML onto the page (example below). Is it possible to call a C function on it? For example, in C there is a function LANG_Str("text") which converts the text into the specified language. Would it be possible to use this function on the text below from inside JavaScript?
"<tr><th>Service</th><th>Target Allocation (%)</th><th></th>"
EDIT:
I basically want to do human-language translation. The site already supports multiple languages; the problem is that a custom screen like the one shown above, which gets generated in JavaScript, cannot use the function that normally translates text in C.
If it's running in the browser: no. Sorry.
You might be able to do it in server-side code beforehand (e.g. Python or PHP, which can call C) when putting together the page content. Alternatively, you can make an AJAX request to a server which exposes the C function as an HTTP API/endpoint (via CGI, FCGI, or Python/PHP/Perl). But not in the browser.
This is because the JS runs in a sandboxed virtual environment which has no access to system calls or anything outside the runtime.
EDIT
In response to your comment "The script is ran in the C using HTML_WriteToCgi": this suggests that you are putting together the HTML in C on your server. If this is correct, go for option 1 above and inject the values directly into the JS source code, provided all the values come from data the server already knows.
You might consider moving some functionality out of browser JS and back into server-side code to solve your problem.
You can make a request from the page, and the web server can handle that request and send the result back to the page.
JavaScript can't access any other processes directly, but it can make a server request for the information. The server can call a C function if need be.
In the end, it's not JavaScript calling the C function, it's the server (and whatever language it's using: Python, PHP, ASP.NET, JSP, etc) that would be calling the C function.
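As one possible illustration (not necessarily the asker's stack): a small Node/Express endpoint that shells out to a compiled C program; the ./lang_str binary and its command-line interface are hypothetical:

// Illustration only: an HTTP endpoint that delegates to a compiled C program.
// The ./lang_str binary and its arguments are hypothetical.
const express = require('express');
const { execFile } = require('child_process');
const app = express();

app.get('/translate', (req, res) => {
    // e.g. GET /translate?lang=de&text=Service
    execFile('./lang_str', [req.query.lang, req.query.text], (err, stdout) => {
        if (err) return res.status(500).send('translation failed');
        res.send(stdout);   // the C program's output goes back to the browser JS
    });
});

app.listen(3000);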
My interpretation is that your goal is to call a C function within HTML / Javascript and capture the output.
What you could do is make a VM. Basically, you have a huge array as "memory", a couple of "registers", etc... The hardest part is to make sure that the instruction set and the bytecodes of your VM mirror some common instruction set that there is a C compiler for. You compile the C code for that VM on your computer, save it to a file, and run it on the VM. If doing that is too hard, you could just get a C-to-assembly converter and define a couple of assembly instructions instead. There is a Linux emulator in pure JavaScript, with no server calls, that does precisely that.
You might consider creating a RESTful web service on your server that will receive the source text and target language id, then return the translated text. You could then access it from your webpage via an ajax call.
I'm not an expert on web development, but isn't it possible for JavaScript to invoke C using WebAssembly?
Not sure of its limitations/constraints though, such as memory.
Something like this?
https://developer.mozilla.org/en-US/docs/WebAssembly/C_to_wasm
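Roughly like that, yes. A hedged sketch of what that guide describes, assuming the site's LANG_Str can be compiled standalone with Emscripten and returns a C string; the emcc flags and names below are assumptions and vary between Emscripten versions:

// Hedged sketch of calling a C function from browser JS via WebAssembly/Emscripten.
// Assumes LANG_Str(const char *) was compiled with something like:
//   emcc lang.c -o lang.js -sEXPORTED_FUNCTIONS=_LANG_Str -sEXPORTED_RUNTIME_METHODS=ccall
// and that <script src="lang.js"></script> has loaded the generated glue code.
Module.onRuntimeInitialized = () => {
    const translated = Module.ccall(
        'LANG_Str',   // C function name
        'string',     // return type
        ['string'],   // argument types
        ['Service']   // argument values
    );
    console.log(translated);   // use it when building the generated table row
};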
I have been researching the standard practice for analyzing the markup of a web page after JavaScript processing, within a script or from the command line, i.e. without any browser.
This needs to happen in a Linux environment. Are there "installables" that would let you pass in HTML markup including JavaScript and return the markup after simulating a standard browser request, once all JavaScript calls have been made?
If there are any Perl modules you can think of, that would be of even more help.
I have been looking at https://developer.mozilla.org/en/SpiderMonkey and http://search.cpan.org/~mschilli/JavaScript-SpiderMonkey-0.12/SpiderMonkey.pm, but I am not sure whether these would allow me to pass in a full HTML document and get the processed version back with all JavaScript DOM manipulations applied.
Please let me know.
Update: I figured it out
I figured it all out - this is what needs to be done:
#!/usr/bin/perl
use WWW::Scripter;
$w = new WWW::Scripter;
$w->use_plugin('JavaScript');
$w->get('http://www.google.com');
print $w->content(),"\n";
You have to use a browser, a new one like WWW::Scripter::Plugin::Javascript
or an old one like WWW::Mechanize::Firefox
Maybe the solution could be a headless browser like PhantomJS. It is not a Perl module, but it is very practical for front-end testing and automation.
I use cURL in PHP and httplib2 in Python to fetch URLs.
However, there are some pages that use JavaScript (AJAX) to retrieve the data after you have loaded the page and they just overwrite a specific section of the page afterward.
So, is there any command line utility that can handle JavaScript?
To know what I mean go to: monster.com and try searching for a job.
You'll see that the Ajax is getting the list of jobs afterward. So, if I wanted to pull in the jobs based on my keyword search, I would get the page with no jobs.
But via browser it works.
You can use PhantomJS:
http://phantomjs.org
You can use it as below:
var page = require("webpage").create();   // create a page object (require alone only returns the module)
page.open("http://monster.com", function (status) {
    page.evaluate(function () {
        /* your javascript code here
        $.ajax("....", function (result) {
            phantom.exit(0);
        }); */
    });
});
Get FireBug and see the URL for that Ajax request. You may then use curl with that URL.
There are 2 ways to handle this. Write your screen scraper using a full browser-based client like WebKit, or go to the actual page, find out what the AJAX request is doing, and make that request directly. You then need to parse the results, of course. Use Firebug to help you out.
Check out this post for more info on the subject. The upvoted answer suggests using a test tool to drive a real browser.
What's a good tool to screen-scrape with Javascript support?
I think env.js can handle <script> elements. It runs in the Rhino JavaScript interpreter and has its own XMLHttpRequest object, so you should be able to at least run the scripts manually (select all the <script> tags, get the .js file, and call eval) if it doesn't automatically run them. Be careful about running scripts you don't trust though, since they can use any Java classes.
I haven't played with it since John Resig's first version, so I don't know much about how to use it, but there's a discussion group on Google Groups.
Maybe you could try and use features of HtmlUnit in your own utility?
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser. It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use. It is typically used for testing purposes or to retrieve information from web sites.
Use LiveHttpHeaders, a plugin for Firefox, to see all the URL details, and then use cURL with that URL.
LiveHttpHeaders shows all the information, like the type of method (POST or GET), the headers, the body, etc.
It also shows POST or GET parameters in the headers.
I think this may help you.