I'm trying to call the showPage('3'); function of this page so that I can use the resulting page source afterwards. I tried it with HtmlUnit like so:
WebClient webClient = new WebClient();
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
HtmlPage page = webClient.getPage("http://www.visittrentino.it/it/cosa_fare/eventi/risultati?minEventDate=09012014&maxEventDate=31012014&tp=searchForm.thismonth&ltp=gennaio");
String javaScriptCode = "showPage('3');";
ScriptResult result = page.executeJavaScript(javaScriptCode);
result.getJavaScriptResult();
System.out.println("result: "+ result);
But it's not working.
It prints out:
result: net.sourceforge.htmlunit.corejs.javascript.Undefined@a303147
and 10000 other warnings. What am I doing wrong? I need to change the page on this site in order to crawl its source code. Is there another (and maybe easier) way to call a JavaScript function from Java code and then navigate the source of the page?
Thank you for any help, have a nice day.
You are printing the ScriptResult object, not the content of the page. Change the System.out.println call to print result.getNewPage() instead.
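As a side note on why the printed value is Undefined: a script's result is the value of its last statement, and a function that only changes the page returns nothing. Here is a minimal sketch in plain JavaScript. The showPage body and the currentPage variable are hypothetical stand-ins, since the site's real function is not shown in the question.

```javascript
// Hypothetical stand-in for the site's showPage function:
// it changes some state (here a variable) and returns nothing.
var currentPage = '1';

function showPage(pageNumber) {
  currentPage = pageNumber;
}

// Evaluating the call yields the function's return value, which is undefined.
var scriptValue = eval("showPage('3');");

console.log(typeof scriptValue); // prints "undefined"
console.log(currentPage);        // prints "3"
```

So an undefined script result does not mean the call failed. The effect of the call is in the page state, which is why the page has to be inspected after the call rather than the script's return value.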
I am trying to log in to a website using AjaxForm. I managed to retrieve the forms and reach the desired button via XPath, but when I call #click I get this error:
EcmaError: lineNumber=[193] column=[0] lineSource=[<no source>] name=[ReferenceError] sourceName=[script in https://test.paypo.com/Account/Login?ReturnUrl=%2FHome%2FStart from (177, 32) to (221, 10)]
message=[ReferenceError: "Paypo" is not defined.
(script in https://test.paypo.com/Account/Login?ReturnUrl=%2FHome%2FStart from (177, 32) to (221, 10)#193)]
com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: "Paypo" is not defined. (script in https://test.paypo.com/Account/Login?ReturnUrl=%2FHome%2FStart from (177, 32) to (221, 10)#193)
I am honestly clueless about how to get around this. An important note: I have no access to the source of the website, and logging in on the actual website works perfectly fine.
I've tried every kind of BrowserVersion and different HtmlUnit versions...
Current code:
final HtmlPage thePage = ((HtmlPage) page);
final HtmlButtonInput button = (HtmlButtonInput) thePage.getByXPath("//input[@type='button']").get(0);
webClient.getOptions().setThrowExceptionOnScriptError(true);
final HtmlPage newPage = button.click();
The error arises when #click is called!
Any clue? Please!
OK, I have done a short check with this code:
final String url = "https://test.paypo.com/Account/Login?ReturnUrl=%2FHome%2FStart";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
HtmlPage page = webClient.getPage(url);
}
Running this produces a bunch of errors; the first one is
com.gargoylesoftware.htmlunit.ScriptException: identifier is a reserved word: class (https://test.paypo.com/bundles/SharedJS?v=qrYYsvxJCv4nRnx8xzi1sMLBQPQlIPteJjoj8eCO1go1#7)
What does this mean?
The page includes some JavaScript code from the URL https://test.paypo.com/bundles/SharedJS?v=qrYYsvxJCv4nRnx8xzi1sMLBQPQlIPteJjoj8eCO1go1, and there is a problem with this code. In detail, the code uses the JavaScript 'class' language feature, and HtmlUnit (in the end, Rhino) does not support this syntax in the current version.
Because of this, the JavaScript from this external resource is not 'compilable' and therefore not available to the other JavaScript on that page.
And finally, this leads to the error you are facing.
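To illustrate the chain of failures described above, here is a minimal sketch in plain JavaScript. It is not the Paypo code: sharedBundle, PaypoDemo and runScripts are hypothetical names, and a deliberately invalid "var class" stands in for syntax that the engine cannot parse.

```javascript
// Hypothetical stand-in for the shared bundle. It would define a global,
// but one statement the engine cannot parse rejects the WHOLE script.
var sharedBundle =
  "var PaypoDemo = { login: function () { return 'ok'; } };" +
  "var class = 1;"; // 'class' is a reserved word, so this cannot compile

// Hypothetical stand-in for an inline page script that relies on the bundle.
var inlineScript = "PaypoDemo.login();";

// Run each script in the global scope, the way a page runs its script tags,
// and collect the name of any error instead of stopping.
function runScripts(sources) {
  var errors = [];
  for (var i = 0; i < sources.length; i++) {
    try {
      (0, eval)(sources[i]); // indirect eval: evaluates in the global scope
    } catch (e) {
      errors.push(e.name);
    }
  }
  return errors;
}

var errors = runScripts([sharedBundle, inlineScript]);
console.log(errors.join(', ')); // prints "SyntaxError, ReferenceError"
```

The first script never runs at all, so the global it would have defined does not exist, and the second script fails with a ReferenceError. That is the same pattern as the "Paypo" is not defined error in the question.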
I'm trying to run JS code the same way I would run it from the console on a web page.
In the code below, "myhtml" is a string with the HTML of the page, and I want to get the value of #green.
The simplest answer would be to use WebBrowser, but I can't, as I'm not working with a single-threaded application (so I'd obviously get errors).
The "myhtml" has external scripts that need to be loaded, so the answer I'm looking for is something like WebBrowser that I can use in a multithreaded application.
Code I've tried (with JScript changed to HTML in js.Language):
Dim js As MSScriptControl.ScriptControlClass = New MSScriptControl.ScriptControlClass()
js.AllowUI = False
js.Language = "HTML"
js.Reset()
js.AddCode("myhtml")
Dim parms As Object() = New Object() {11}
Dim result As Integer = CInt(js.Run("alert(document.getElementById('green').value)", parms))
MsgBox(result)
With result of:
A script engine for the specified language can not be created.
I want to use HtmlUnit (v2.21) to get some search result pages from Google. This requires me to click on the "people also looked for" link when searching for a person (right side, see example link), which triggers some JavaScript and changes the content of the current page. But this gives me a JavaScript WrappedException (see below).
Clickable example link: https://www.google.de/search?ie=UTF-8&safe=off&q=nicki+minaj
Simple TestCase with errors:
String url = "https://www.google.de/search?ie=UTF-8&safe=off&q=nicki+minaj";
WebClient client = new WebClient(BrowserVersion.BEST_SUPPORTED);
HtmlPage page = client.getPage(url);
HtmlElement link = page.getFirstByXPath("//a[@class='_Zjg']");
HtmlPage newPage = link.click(); //throws exception
this.storeResultFile(newPage.asXml(), "test");
client.close();
Result:
net.sourceforge.htmlunit.corejs.javascript.WrappedException: Wrapped java.lang.NullPointerException
at net.sourceforge.htmlunit.corejs.javascript.Context.throwAsScriptRuntimeEx(Context.java:2053)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:947)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:1012)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:799)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:742)
at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:689)
I stored the xml of the "page" object and made sure that the XPath expression is valid and has results.
Anybody got any ideas?
It looks like the JavaScript engine (based on Rhino) is very easy to upset and quits on some script issues where other browsers are still able to run the script.
I don't know whether there is a mistake in Google's scripts, but these two lines solved it for me:
JavaScriptEngine engine = client.getJavaScriptEngine();
engine.holdPosponedActions();
Nevertheless, when running multiple HtmlUnit objects in multiple threads, it is still possible to run into this error. This is more a workaround than a solution.
I'm trying to get PhantomJS to take an html string and then have it render the full page as a browser would (including execution of any javascript in the page source). I need the resulting html result as a string. I have seen examples of page.open which is of no use since I already have the page source in my database.
Do I need to use page.open to trigger the JavaScript rendering engine in PhantomJS? Is there any way to do this all in memory (i.e. without page.open making a request or reading/writing HTML source from/to disk)?
I have seen a similar question and answer here but it doesn't quite solve my issue. After running the code below, nothing I do seems to render the javascript in the html source string.
var page = require('webpage').create();
page.setContent('raw html and javascript in this string', 'http://whatever.com');
//everything i've tried from here on doesn't execute the javascript in the string
--------------Update---------------
Tried the following based on the suggestion below but this still does not work. Just returns the raw source that I supplied with no javascript rendered.
var page = require('webpage').create();
page.settings.localToRemoteUrlAccessEnabled = true;
page.settings.webSecurityEnabled = false;
page.onLoadFinished = function(){
var resultingHtml = page.evaluate(function() {
return document.documentElement.innerHTML;
});
console.log(resultingHtml);
//console.log(page.content); // this didn't work either
phantom.exit();
};
page.url = input.Url;
page.content = input.RawHtml;
//page.setContent(input.RawHtml, input.Url); //this didn't work either
The following works
page.onLoadFinished = function(){
console.log(page.content); // rendered content
};
page.content = "your source html string";
But you have to keep in mind that if you set the page from a string, the domain will be about:blank. So if the html loads resources from other domains, then you should run PhantomJS with the --web-security=false --local-to-remote-url-access=true commandline options:
phantomjs --web-security=false --local-to-remote-url-access=true script.js
Additionally, you may need to wait for the JavaScript execution to complete, since it might not be finished when PhantomJS reports that the page has loaded. Use either setTimeout() to wait a static amount of time or waitFor() to wait for a specific condition on the page. More robust ways to wait for a full page are given in this question: phantomjs not waiting for “full” page load
The setTimeout made it work even though I'm not excited to wait a set amount of time for each page. The waitFor approach that is discussed here doesn't work since I have no idea what elements each page might have.
var system = require('system');
var page = require('webpage').create();
page.setContent(input.RawHtml, input.Url);
window.setTimeout(function () {
console.log(page.content);
phantom.exit();
}, input.WaitToRenderTimeInMilliseconds);
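If a fixed delay feels wasteful and there is no known element to wait for, one alternative (not from the answers above, just a sketch) is to poll page.content and stop once it has stopped changing. The helper names createStabilityChecker and printWhenStable, the repeat count of 3, and the intervalMs and maxWaitMs parameters are all assumptions for illustration.

```javascript
// Returns a function that reports true once the content it is given has
// matched the previous sample `requiredRepeats` times in a row.
function createStabilityChecker(requiredRepeats) {
  var last = null;
  var repeats = 0;
  return function (content) {
    if (content === last) {
      repeats += 1;
    } else {
      last = content; // content changed: remember it and start counting again
      repeats = 0;
    }
    return repeats >= requiredRepeats;
  };
}

// Hypothetical PhantomJS wiring (defined here, not invoked): poll every
// intervalMs and give up after maxWaitMs so the script always exits.
function printWhenStable(page, intervalMs, maxWaitMs) {
  var isStable = createStabilityChecker(3);
  var waited = 0;
  var timer = setInterval(function () {
    waited += intervalMs;
    if (isStable(page.content) || waited >= maxWaitMs) {
      clearInterval(timer);
      console.log(page.content);
      phantom.exit();
    }
  }, intervalMs);
}
```

In a PhantomJS script this would be called after page.setContent(...), for example printWhenStable(page, 250, 10000). Pages that keep changing their markup (clocks, tickers) never look stable, which is why the upper bound is still needed.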
Maybe not the answer you want, but using PhantomJsCloud.com you can do it easily. Here's an example:
http://api.phantomjscloud.com/api/browser/v2/a-demo-key-with-low-quota-per-ip-address/?request={url:%22http://example.com%22,content:%22%3Ch1%3ENew%20Content!%3C/h1%3E%22,renderType:%22png%22,scripts:{domReady:[%22var%20hiDiv=document.createElement%28%27div%27%29;hiDiv.innerHTML=%27Hello%20World!%27;document.body.appendChild%28hiDiv%29;window._pjscMeta.scriptOutput={Goodbye:%27World%27};%22]},outputAsJson:false}
The "New Content!" is the content that replaces the original content, and the "Hello World!" is placed in the page by a script.
If you want to do this via normal PhantomJS, you'll need to use the injectJs or includeJs functions after the page content is loaded.
I am looking for a simple way to take a screenshot of an iframe in my ASP page. I couldn't achieve it with C#, and I lack knowledge of JavaScript! Does anyone out there know the simplest and best way to achieve this?
What I am trying to do: I am building a website where students can log in to my country's e-government website and prove with a single click that they are still enrolled, so that they can get a discount on our service.
Edit: The problem should be solved locally.
This piece of code worked for me. I hope it does the same for others.
private void saveURLToImage(string url)
{
if (!string.IsNullOrEmpty(url))
{
string content = "";
System.Net.WebRequest webRequest = WebRequest.Create(url);
System.Net.WebResponse webResponse = webRequest.GetResponse();
System.IO.StreamReader sr = new StreamReader(webResponse.GetResponseStream(), System.Text.Encoding.GetEncoding("UTF-8"));
content = sr.ReadToEnd();
//save to file
byte[] b = Convert.FromBase64String(content);
System.IO.MemoryStream ms = new System.IO.MemoryStream(b);
System.Drawing.Image img = System.Drawing.Image.FromStream(ms);
img.Save(@"c:\pic.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
img.Dispose();
ms.Close();
}
}
Unless I'm misunderstanding you, this is impossible.
You cannot instruct the user's browser to take a screenshot (this would be a security risk … and has few use cases anyway).
You cannot load the page you want a screenshot of yourself (with server side code) because you don't have the credentials needed to access it.
Server side: Take a screenshot of a webpage with JavaScript?
JavaScript: http://html2canvas.hertzen.com/