Load web page contents in console application using c# - javascript

I want to load the contents of the web page below in a console application using C#.
http://justicecourts.maricopa.gov/findacase/casehistory.aspx
With the code below I get empty output on the screen, although it works perfectly if I load the google.com page.
With WebClient and WebRequest I got the error "Please enable javascript" and the content did not load, so I switched to the code below. The JavaScript error no longer appears, but the page content still does not load. I have been struggling with this for a long time and have read many posts about it, but could not get it to work.
Could anyone please help?
Thanks in advance.
class Program
{
private static bool completed = false;
private static WebBrowser wb;
[STAThread]
static void Main(string[] args)
{
wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate("http://justicecourts.maricopa.gov/findacase/casehistory.aspx");
while (!completed)
{
Application.DoEvents();
Thread.Sleep(100);
}
Console.Write("\n\nDone with it!\n\n");
Console.ReadLine();
}
static void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
Console.WriteLine(wb.Document.Body.InnerHtml);
completed = true;
}
}

If you literally just want to dump the contents of that URL out to the console, try this:
using(WebClient client = new WebClient()) {
Console.WriteLine(client.DownloadString(url));
}

Try adding more wait time.
static void Main(string[] args)
{
wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate("http://justicecourts.maricopa.gov/findacase/casehistory.aspx");
while (!completed)
{
Application.DoEvents();
Thread.Sleep(100);
}
//wait even more
for (int i = 0; i < 6; i++)
{
Application.DoEvents();
Thread.Sleep(1000);
}
Console.Write("\n\nDone with it!\n\n");
Console.ReadLine();
}
Otherwise you can use EO Browser. It is a paid product, but in your case the trial should work: the trial message is only shown in the GUI, and yours is not a GUI application.
In EO you can write:
EOContorol.WebView.LoadUrlAndWait(URL);

Try using PhantomJS.
It is basically like running a web browser without a window (headless).

Related

Dynamic content of Web Page not loaded totally using Htmlunit WebClient

I am trying to load a web page (https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en) for scraping using the HtmlUnit WebClient, but the content is not being loaded properly. For example, I am unable to find the Apply buttons.
My WebClient code is as below:
webClient.setCssErrorHandler(new DefaultCssErrorHandler());
webClient.setJavaScriptErrorListener(new DefaultJavaScriptErrorListener());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getCookieManager().setCookiesEnabled(true);
webClient.waitForBackgroundJavaScript(60000);
Can someone please help me with this?
This works for me
public static void main(String[] args) throws IOException{
final String url = "https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
HtmlPage page = webClient.getPage(url);
// waitForBackgroundJavaScript has to be called after every action
// this page is really slow wait for the last part of the dynamic content
while(!page.asText().contains("Previous\r\n1\r\n2\r\n3\r\n4\r\n")) {
webClient.waitForBackgroundJavaScript(1_000);
}
System.out.println("-------------------------------------------------------------------------------");
System.out.println(page.asText());
System.out.println("-------------------------------------------------------------------------------");
}
}
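The loop in the answer above never exits if the expected marker text does not show up. A minimal sketch of a bounded wait, assuming you want to give up after a fixed time: `BoundedWait` and `waitUntil` are hypothetical names, not part of HtmlUnit, and the HtmlUnit call is only mentioned in a comment so the example stays dependency-free.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {
    /**
     * Polls `condition` until it returns true or `timeoutMillis` elapses.
     * `betweenPolls` runs after each failed check; with HtmlUnit it could wrap
     * webClient.waitForBackgroundJavaScript(1_000) (an assumption, not called here).
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, Runnable betweenPolls) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the condition became true
            }
            betweenPolls.run();
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true after three polls.
        int[] polls = {0};
        boolean ready = waitUntil(() -> polls[0] >= 3, 5_000, () -> polls[0]++);
        System.out.println("ready=" + ready + " after " + polls[0] + " polls");
    }
}
```

In the scraping case the condition would be the `page.asText().contains(...)` check, and a `false` result signals that the page never reached the expected state.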

C# Webbrowser Programmatically Close JS Confirm Box

While using the WebBrowser control, I need to programmatically auto-close a JavaScript confirm box.
I used the user32.dll approach below, and it works fine on operating systems whose language is English.
[DllImport("user32.dll", CharSet = CharSet.Auto)]
static extern IntPtr SendMessage(IntPtr hWnd, UInt32 Msg, IntPtr wParam, IntPtr lParam);
But on a computer running a non-English OS it does not work, because I pass "OK" as the button text in the call above.
One approach that I suppose could work is to detect the OS language and then use the translated "OK" text in the call.
My question is: can I change the language of the current thread, and thus of the WebBrowser control, so that it shows the confirm box in English? In my opinion that would be an easy and fast solution.
Please suggest your solutions. Thanks in advance.
I am using a similar approach in my code; however, these solutions work only on English-language software. I am actually looking for a generic solution that also runs on a non-English OS.
A possible solution consists in injecting and immediately calling a Javascript function that hijacks the original confirm function:
function hijackConfirm(){
alert('yep!');
window.oldConfirm = window.confirm;
window.confirm = function(){ return true };
}
This is an example in WPF application with the standard WPF WebBrowser control, I'm quite confident that everything I do here can be adjusted to fit the WinForm control (since the underlying ActiveX is the same).
I have a UserControl that acts as an adapter of the WebBrowser, here is the XAML:
<UserControl x:Class="WebBrowserExample.WebBrowserAdapter"
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
mc:Ignorable="d"
d:DesignHeight="300" d:DesignWidth="300">
<Grid>
<WebBrowser x:Name="WebBrowserControl"></WebBrowser>
</Grid>
</UserControl>
First, in the WebBrowserAdapter class, you need a method to inject a javascript function in the current HTML document:
public void InjectScript(String scriptText)
{
HTMLDocument htmlDocument = (HTMLDocument)WebBrowserControl.Document;
var headElements = htmlDocument.getElementsByTagName("head");
if (headElements.length == 0)
{
throw new IndexOutOfRangeException("No element with tag 'head' has been found in the document");
}
var headElement = headElements.item(0);
IHTMLScriptElement script = (IHTMLScriptElement)htmlDocument.createElement("script");
script.text = scriptText;
headElement.AppendChild(script);
}
Then call InjectScript, when needed, whenever a document finishes loading:
void WebBrowserAdapter_Loaded(object sender, RoutedEventArgs e)
{
WebBrowserControl.LoadCompleted += WebBrowserControl_LoadCompleted;
WebBrowserControl.Navigate("http://localhost:9080/console/page.html");
}
void WebBrowserControl_LoadCompleted(object sender, NavigationEventArgs e)
{
//HookHTMLElements();
String script =
@" function hijackConfirm(){
alert('yep!');
window.oldConfirm = window.confirm;
window.confirm = function(){ return true };
}";
InjectScript(script);
WebBrowserControl.InvokeScript("hijackConfirm");
}
Here I navigate to http://localhost:9080/console/page.html, a test page hosted on my system. This works well in this simple scenario. If it applies to your case, you may need to tweak the code a little. To compile the code, you have to add Microsoft.mshtml to the project references.
EDIT: WinForm version
To make it work, you have to use the IE 11 engine in your application. Follow the instructions found here to set it
I just tried a WinForm version of this and it works with some minor changes. Here is the code of a form that has a WebBrowser control as one of its children:
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
this.Load += Form1_Load;
}
void Form1_Load(object sender, EventArgs e)
{
webBrowserControl.Navigate("file:///C:/Temp/page.html");
webBrowserControl.Navigated += webBrowserControl_Navigated;
}
void webBrowserControl_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
InjectConfirmHijack();
}
private void InjectConfirmHijack()
{
String script =
@" function hijackConfirm(){
alert('yep!');
window.oldConfirm = window.confirm;
window.confirm = function(){ return true };
}";
InjectScript(script);
webBrowserControl.Document.InvokeScript("hijackConfirm");
}
public void InjectScript(String scriptText)
{
//mshtml.HTMLDocument htmlDocument = (mshtml.IHTMLDocument) webBrowserControl.Document.get;
var headElements = webBrowserControl.Document.GetElementsByTagName("head");
if (headElements.Count == 0)
{
throw new IndexOutOfRangeException("No element with tag 'head' has been found in the document");
}
var headElement = headElements[0];
var script = webBrowserControl.Document.CreateElement("script");
script.InnerHtml = scriptText;
headElement.AppendChild(script);
}
}

Waiting for Javascript with HtmlUnit

I was experimenting with HtmlUnit the other day. I wrote a program that logs in to a site and gathers some information. But when I click a specific button, HtmlUnit doesn't wait for the resulting action. I tried waiting until all jobs from the JavaScriptJobManager were done, but it gets stuck at around 15 jobs. After that I tried waiting until the resulting HtmlPage changes, but that does not work either. What could I try next? Thanks for your time; I will try to implement any suggestion as fast as possible.
Edit: I'm completely aware that Facebook doesn't like web crawling, but I'm only doing this for study purposes, so no harm done. These are the error messages the program throws: http://www.pastebin.ca/3007578
When the infoButton gets clicked, a new window appears, making the old window inaccessible. http://imgur.com/aiF7nJR
final static WebClient webClient = new WebClient(BrowserVersion.FIREFOX_31);
public static void main(String [] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException, InterruptedException{
//init webclient
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(true);
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.waitForBackgroundJavaScript(12000);
webClient.setAlertHandler(new AlertHandler() {
@Override
public void handleAlert(Page arg0, String arg1) {
System.out.println("ALERT ON "+arg0.getUrl()+" :"+ arg1);
}
});
// perform the login
final HtmlPage loginPage = webClient.getPage("https://facebook.com");
final HtmlForm form = loginPage.getForms().get(0);
final HtmlTextInput username = form.getInputByName("email");
final HtmlPasswordInput password = form.getInputByName("pass");
final HtmlElement button = (HtmlElement) loginPage.getElementById("u_0_l");
username.setText("Your email");
password.setText("Your password");
final HtmlPage frontPage = (HtmlPage) button.click();
// The actual problem
final HtmlPage testPage = webClient.getPage("https://www.facebook.com/pages/Stackoverflow/1462865420609264");
HtmlElement infoButton = testPage.getFirstByXPath("//*[@class='share_action_link']"); // First share button.
HtmlPage testPage2 = infoButton.click();
JavaScriptJobManager manager = testPage2.getEnclosingWindow().getJobManager();
while (manager.getJobCount() > 0) {
Thread.sleep(1000);
webClient.waitForBackgroundJavaScript(100);
System.out.println(manager.getEarliestJob());
}
while(testPage == testPage2){
System.out.println("failed");
webClient.waitForBackgroundJavaScript(100);
Thread.sleep(5 * 1000);
}
}
With the latest snapshot (also available in Maven), there was an error that window.performance is not defined; it has been fixed.
EDIT: another error was detected and fixed, and a new snapshot has been deployed.
Please retest.

Downloading web pages and refreshing it when connection available

So I'm building an app with a lot of web content. I plan to release it using PhoneGap Build, but I will host all the content online and link to it. I was wondering whether the web pages can be downloaded for offline use while there is an active internet connection, and then refreshed when a connection is available again, preferably when the user is on a wifi connection. The site will mostly be HTML, JS, and PHP. I will be hosting with Bluehost.
Is there any way of doing this? Thanks in advance! Littleswany!
PhoneGap apps ARE downloaded to the device, when they are downloaded from the store. They are basically a wrapper around an index.html file, but the app is actually programmed in JavaScript, which is responsible for creating and displaying views etc. The only time you need to check for an internet connection is when you are communicating with your back end (PHP)... If the ajax request fails, the best solution is to provide the user with a button/link to try again when they have regained their internet connection, or set a timer which fires intermittently to keep trying again... NEVER use a while(true) loop in your Phone Gap app - it will just hang.
I am not familiar with Java, but I think I can provide the logic to get the job done.
You want an infinite loop that checks whether the user is on wifi. If so, use wget, rsync, or scp to download the website. Something like this:
while (true){
// Check whether the user is on wifi; if so, download the site with rsync or wget.
}
Info on how to nest if statements in while loops in Java: java loop, if else
I do not know whether wget, rsync, or scp can be run from Java. You'll need to look into it further or write your own function to do it. Something like:
function download_file() {
var url = "http://www.example.com/file.doc"
window.location = url;
}
You should be able to do it from Java like this:
String whatToRun = "/usr/local/bin/wget http://insitu.fruitfly.org/insitu_image_storage/img_dir_38/insitu38795.jpe";
Sources:
1. What is the equivalent of wget in javascript to download a file from a given url?
2. Call a command in terminal using Java (OSX)
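If the download really is driven from Java, shelling out to wget is not strictly necessary: the standard library can stream a URL to a file. A minimal sketch under that assumption follows; `PageDownloader` and `download` are hypothetical names, and the demo copies a local `file:` URI so it runs offline (an `http` or `https` URI is passed the same way).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PageDownloader {
    /** Streams the resource at `source` into `target`, replacing any older copy. Returns bytes written. */
    public static long download(URI source, Path target) throws IOException {
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo input: a temporary local file standing in for the remote page.
        Path original = Files.createTempFile("page", ".html");
        Files.writeString(original, "<html><body>cached copy</body></html>");
        Path offlineCopy = Files.createTempFile("offline", ".html");
        long bytes = download(original.toUri(), offlineCopy);
        System.out.println(bytes + " bytes saved to " + offlineCopy);
    }
}
```

Calling `download` again when connectivity returns overwrites the stored copy, which covers the "refresh" part of the question. Note that this fetches a single resource only; images, scripts, and stylesheets referenced by the page would each need their own request.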
First, create a connection-status class:
public class Connection_Status{
private static ConnectivityManager connectivityManager;
static boolean connected = false;
public static Boolean isOnline(Context ctx) {
try {
connectivityManager = (ConnectivityManager) ctx.getSystemService(Context.CONNECTIVITY_SERVICE);
NetworkInfo networkInfo = connectivityManager.getActiveNetworkInfo();
connected = networkInfo != null && networkInfo.isAvailable()&& networkInfo.isConnected();
return connected;
} catch (Exception e) {
System.out.println("CheckConnectivity Exception: " + e.getMessage());
}
return connected;
}
}
And in your Main class
public class Main extends Activity{
private WebView mWebView;
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
mWebView = (WebView) findViewById(R.id.webview);
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.getSettings().setBuiltInZoomControls(true);
if(Connection_Status.isOnline(Main.this)){
HttpClient httpclient = new DefaultHttpClient(); // Create HTTP Client
HttpGet httpget = new HttpGet("http://yoururl.com"); // Set the action you want to do
HttpResponse response = httpclient.execute(httpget); // Execute it
HttpEntity entity = response.getEntity();
InputStream is = entity.getContent(); // Create an InputStream with the response
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) // Read line by line
sb.append(line + "\n");
String resString = sb.toString(); //
is.close(); // Close the stream
}
}
}
Or you can use the WebView cache, e.g.:
mWebView.getSettings().setAppCacheMaxSize(1024*1024*8);
mWebView.getSettings().setAppCachePath(""+this.getCacheDir());
mWebView.getSettings().setAppCacheEnabled(true);
mWebView.getSettings().setCacheMode(WebSettings.LOAD_DEFAULT);
Don't forget to add the following permissions
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" /> <!-- for the connection status-->
Sources:
https://stackoverflow.com/a/6503817/1309629

Disabling caching, cookies and everything else in a WebView

I have a web service that I am trying to authenticate with in the background using a WebView. The first request works as expected (failure/success based on the credentials), but after that it seems like I am getting a cached response.
Here is my webview setup code:
WebView browser = new WebView(this);
WebSettings settings = browser.getSettings();
settings.setJavaScriptEnabled(true);
settings.setSavePassword(false);
settings.setCacheMode(WebSettings.LOAD_NO_CACHE);
settings.setAppCacheEnabled(false);
browser.setWebChromeClient(new WebChromeClient() {
public void onProgressChanged(WebView view, int progress) {
Log.d("BROWSERPROGRESS", Integer.toString(progress));
}
});
jsInterface = new AddAccountJSInterface();
browser.addJavascriptInterface(jsInterface, "ADDACCOUNTJSINTERFACE");
browser.setWebViewClient(new AddAccountClient(this));
So as you may see I have two additional classes controlling my webView:
An object that provides an interface for javascript (AddAccountJSInterface)
A WebViewClient
Additionally I do have a WebChromeClient, but it's only there for debugging and I'm pretty sure that it won't interfere with anything.
The JS interface simply provides an easy way of getting the body HTML for performing analysis, so I'm confident that isn't the issue either.
The WebViewClient has the following code in it which does most of the "custom" work for routing based on various responses from the webservice.
@Override
public boolean shouldOverrideUrlLoading(WebView view, String url) {
if(url.contains(INSTALL_PREFIX)) {
HashMap<String, String> params = extractParameters(url);
verificationComplete(params);
return true;
}
return false;
}
@Override
public void onPageFinished(WebView view, String url){
if(invalidShop(view)) {
Toast.makeText(context, context.getString(R.string.no_find_shop), Toast.LENGTH_SHORT).show();
shopAddressField.requestFocus();
replaceUiElements(loadingBar, addAccountButton);
} else if(url.contains(ADMIN_AUTH_LOGIN)) {
if(invalidLogin(view)) {
Toast.makeText(context, context.getString(R.string.invalid_login),Toast.LENGTH_SHORT).show();
emailField.requestFocus();
replaceUiElements(loadingBar, addAccountButton);
} else {
String email = emailField.getText().toString();
String password = passwordField.getText().toString();
String submitJS = String.format(FORM_SUBMISSION_JS, email, password);
jsInterface.setInnerHTML("");
browser.loadUrl(submitJS);
}
}
}
In my activity I have 3 text fields that I need to fill followed by clicking a button to submit it. The activity then takes in the data from 3 text fields (shopAddressField, usernameField, passwordField) and then executes some javascript that populates some form data (which was loaded in the invisible webView) then clicks the submit button.
It is the last part that is going wrong: the WebView appears to cache the response from the server (perhaps using cookies?) and returns that instead of asking the server whether the data is correct.
A bit of clarification:
JSInterface is simply a Java object that lets JavaScript executed on my WebView call a function within that object. In my case my JSInterface has one function, setInnerHtml(String html).
This is the javascript that is executed on the webview:
javascript:window.ADDACCOUNTJSINTERFACE.setInnerHTML(document.body.innerHTML)
And this is the setInnerHtml function:
public void setInnerHtml(String innerHtml) {
this.innerHtml = innerHtml;
}
So when I actually execute jsInterface.setInnerHtml("") I'm just over-writing the HTML that was pulled in (to be sure I'm not getting my old data from there for some reason).
As for my submitJS it is once again some Javascript that is executed on my webView as follows:
// submitJS will be something like this once all the credentials have been set
// Note: I know that the server will make jQuery available
// Note: Much of the Java string formatting has been removed to help clarify
// the code.
String submitJS =
"javascript:(function() {
$('login-input').value='username';
$('password').value='password';
$('sign-in-form').up().submit();
})()"
// I then simply get the webview to execute the javascript above
webView.loadUrl(submitJS);
So it turns out the problem wasn't based around the Caching, and possibly not cookies.
When you execute JavaScript on your WebView, it runs in a separate thread and can be quite slow. This led to a race condition that caused code to be executed in the wrong order.
I've solved this problem by using a Semaphore as a Mutex. This allows me to prevent my getter from returning before the Javascript on the webView is able to execute.
The interface I created now looks like this:
private class AddAccountJSInterface {
private final String TAG = getClass().getName().toUpperCase();
private Semaphore mutex = new Semaphore(1, false);
private String innerHTML;
public void aquireSemaphore() {
Log.d(TAG, "Attempting to lock semaphore");
try {
mutex.acquire();
} catch(InterruptedException e) {
Log.d(TAG, "Oh snap, we got interrupted. Just going to abort.");
return;
}
Log.d(TAG, "Semaphore has been aquired");
}
@SuppressWarnings("unused")
public void setInnerHTML(String html) {
this.innerHTML = html;
Log.d(TAG, "setInnerHTML is now releasing semaphore.");
mutex.release();
Log.d(TAG, "setInnerHTML has successfully released the semaphore.");
}
public synchronized String getInnerHTML() {
Log.d(TAG, "getInnerHTML attempting to aquire semaphore, may block...");
String innerHTML = "";
try {
mutex.acquire();
Log.d(TAG, "getInnerHTML has aquired the semaphore, grabbing data.");
innerHTML = this.innerHTML;
Log.d(TAG, "getInnerHTML no longer needs semaphore, releasing");
mutex.release();
} catch (InterruptedException e) {
Log.d(TAG, "Something has gone wrong while attempting to aquire semaphore, aborting");
}
return innerHTML;
}
}
Now the way I use this in my code is as follows:
// I have access to the jsInterface object which is an instance of the class above as well as a webView which I will be executing the javascript on.
String getInnerHtmlJS = "javascript:window.MYJSINTERFACE.setInnerHTML(document.body.innerHTML);";
jsInterface.aquireSemaphore();
// Execute my JS on the webview
webView.loadUrl(getInnerHtmlJS);
// Now we get our inner HTML
// Note: getInnerHTML will block since it must wait for the setInnerHTML (executed via the JS) function to release the semaphore
String theInnerHTML = jsInterface.getInnerHTML();
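The handoff described above can also be tried outside Android. The following is a simplified sketch in plain Java, not the answer's exact class: it assumes a semaphore that starts with zero permits (so no separate acquire call is needed before running the script) and adds a timeout so the reader cannot block forever. `HtmlHandoff` and `awaitInnerHtml` are hypothetical names, and a plain thread stands in for the WebView callback.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class HtmlHandoff {
    // Starts with zero permits, so readers block until a value has been published.
    private final Semaphore ready = new Semaphore(0);
    private volatile String innerHtml = "";

    /** Called from the producer side (in the answer above, the JavaScript bridge). */
    public void setInnerHtml(String html) {
        innerHtml = html;
        ready.release();
    }

    /** Waits up to `timeoutMillis` for a value; returns null on timeout. */
    public String awaitInnerHtml(long timeoutMillis) throws InterruptedException {
        if (!ready.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return null;
        }
        return innerHtml;
    }

    public static void main(String[] args) throws InterruptedException {
        HtmlHandoff handoff = new HtmlHandoff();
        // Stand-in for the WebView thread that eventually calls back.
        new Thread(() -> handoff.setInnerHtml("<p>loaded</p>")).start();
        System.out.println(handoff.awaitInnerHtml(2_000));
    }
}
```

Each `setInnerHtml` call releases one permit and each successful `awaitInnerHtml` consumes one, so the two calls should be paired, just as the acquire and release calls are in the answer's version.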
