HtmlUnit Load Facebook Photos - javascript

So, I have a project where I need to get the photos from a profile.
I am able to navigate to the photos page of a profile, but I believe the JavaScript is not loading.
I am currently using HtmlUnit, but if you know of another Java API that would help, I'm all ears.
Basically, when I view Facebook in a normal browser, the page loads fully and I can inspect the elements.
When inspecting, there is a div with the class fbStarGrid and a few other modifier classes. This div contains all the images of a user's profile.
When I use HtmlUnit, I cannot find that div. I had it print the full page XML to a file, and I found that the div is commented out. I believe this means the JavaScript that loads the content never ran.
After browsing a lot of JavaScript help on SO, I have found a few things that help with debugging, but I can't seem to fix the problem.
The first thing I did was create an instance of a JavaScriptJobManager and use it to see how much JavaScript had not completed. After waiting a while (10+ seconds), it says there are still 3 JS jobs incomplete. After a very long time (about 60 seconds), it says there are 2 JS jobs incomplete.
I do not know what is hanging with those JS jobs.
I get a warning on page load about an application/ld+json script not running, but I don't believe that part of the site is related to the photos.
Is there something I can do to force the JS to run? Is there a job it's stuck on and won't proceed to the next job?
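For reference, here is a stripped-down version of my setup, including the wait (waitForBackgroundJavaScript and JavaScriptJobManager are HtmlUnit's real APIs; the profile URL is a placeholder):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManager;

public class PhotoPageCheck {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Don't abort on Facebook's script errors
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // Placeholder URL for the profile's photos page
            HtmlPage page = webClient.getPage("https://www.facebook.com/some.profile/photos");

            // Give background JS up to 10 seconds to finish
            webClient.waitForBackgroundJavaScript(10_000);

            JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
            System.out.println("JS jobs still pending: " + manager.getJobCount());
        }
    }
}

Even with a much longer wait, getJobCount() never reaches zero for me.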
I've also wondered if it's an issue with the page not re-syncing.
I've tried two solutions related to this:
Setting the AjaxController to a NicelyResynchronizingAjaxController:
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
And someone suggested creating a custom controller that forces synchronous processing:
webClient.setAjaxController(new AjaxController() {
    @Override
    public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
        // Force every AJAX call to be processed synchronously
        return true;
    }
});
Neither of these seemed to affect the page.
If HtmlUnit is not the right library for the job, any other ideas? I need this to be headless/GUI-less so it can run on a Linux server. Java is preferred, but I can switch languages if necessary.
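If it helps anyone suggest alternatives: here is a minimal sketch of the kind of headless setup I'd be willing to switch to, e.g. Selenium driving headless Chrome (untested against Facebook; the fbStarGrid selector comes from my inspection above, everything else is standard Selenium API):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class HeadlessPhotoGrab {
    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless"); // no GUI, so it can run on a Linux server

        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://www.facebook.com/some.profile/photos"); // placeholder URL

            // Wait until the photo grid div actually appears in the DOM
            new WebDriverWait(driver, 30).until(
                    ExpectedConditions.presenceOfElementLocated(By.cssSelector("div.fbStarGrid")));

            for (WebElement img : driver.findElements(By.cssSelector("div.fbStarGrid img"))) {
                System.out.println(img.getAttribute("src"));
            }
        } finally {
            driver.quit();
        }
    }
}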

Related

Starting a vbs script from an HTML file

I've been wrapping my head around this problem for a couple of days, searching for all possible solutions on the forums and online, but I can't seem to get it working.
I'm calling a script via a link styled as a "button" to start a script on a server (in HTML):
<a href="#" onClick="RunScript();">
The script code is:
<script type="text/javascript" language="javascript">
    function RunScript() {
        var objShell = new ActiveXObject("WScript.Shell");
        objShell.Run("%comspec% /k my_projects_EN.vbs", 1, false);
    }
</script>
So why am I using a vbs? What I'm trying to do is create custom pages for each employee. The vbs checks the computer name, and an If clause directs the employee to a custom page. With my basic knowledge of programming and a lot of hours of searching, I have not found a better solution for this yet, so I'm trying to make this one work.
And it does work, but only if I'm running the script locally (from the desktop). Since the webpage will be used in an intranet location, the script will be on a server, and this is where it gets a bit hairy, as I can't seem to find the right combination of commands. I have already tried pushd to create a mounted volume and CurrentDirectory to set the script's location, but nothing seems to work completely.
I assume that I'm missing a subroutine for the function, as adding anything there just stops the script - but how to go about it is beyond me.
All help is appreciated, even if it means I have to bury myself in another programming language (not preferred, of course).
I am certain that there is a way to solve this other than sending a script to each employee to put on their desktop (each time a new employee comes to work).
Thanks
Edit: I see an additional clarification is in order:
We're creating an intranet webpage to help our employees work more efficiently. We're on the same level as everyone else - not IT, no admin rights - so we're on our own.
The point is to have a personal page for each employee, accessible via the same interface, so a link has to send each person to a different page; that is why I've created the vbs code, which helps with that. Of the several options I checked, this seemed the simplest and best one - and it works, at least partially. I don't see any security risks, as everything will be done on each client computer; the files themselves will be located on the server. The script itself doesn't pose any risk, at least not that I can see - but of course I'm not a specialist.
So in short this is what we're trying to do:
Main page -> link to My_projects button -> start script (located on the same server as the main page) -> determine the client computer name -> redirect to the right webpage.
Sorry for the lack of details; it's sometimes hard to explain exactly what you want when you're not a pro in these things.
Thanks again.
If those computers are physically located at your workplace and you have control over the system, it would be better to tweak DNS redirections on those computers. Otherwise, a more general and OS-independent solution would be a session, cookie, or token on the employee's computer. Still, some kind of authentication other than possession of a particular machine could be more versatile and secure (unless your PCs are 1000 feet underground :-) ).
Edit: What kind of info/data is sent to the server script? A server script runs on the server, and everything related to "this computer" (e.g. its name) actually refers to the server itself. Thus the script needs some data from the client to recognise their computer.
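For illustration, here is roughly how that could look done properly on the server side (a hypothetical Java servlet sketch; it assumes the intranet's reverse DNS lets the server resolve client machine names, which may not hold in your setup):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EmployeePageServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Resolves to the client machine's name only if reverse DNS lookups work;
        // otherwise this falls back to the client's IP address.
        String host = req.getRemoteHost();

        if ("XXXXXXXX".equalsIgnoreCase(host)) {
            resp.sendRedirect("name_surname.html");
        } else {
            resp.sendRedirect("default.html"); // hypothetical fallback page
        }
    }
}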
Thanks for the effort.
Everything is actually located on the server, so the client computer only runs the page or interface, which is in \\Server\folder\folder, for example.
In your browser you open the start page which contains a button with a link to this script (located on the same server).
When the script executes, it looks up the computer name and sends the user to their personal page:
Set wshShell = CreateObject( "WScript.Shell" )
strComputerName = wshShell.ExpandEnvironmentStrings( "%COMPUTERNAME%" )
On Error Resume Next

'#01 name_surname
If strComputerName = "XXXXXXXX" Then
    CreateObject("WScript.Shell").Run """name_surname.html"""
End If
and so on.
And this is all there is. As mentioned before, we don't have admin rights to change anything on the client computers, so nothing is done on the client side other than executing a script located on the server.

Chrome just doesn't finish loading JS files

I am writing an app that includes about 12 short JS files in the <head> section. (I know, I should merge them and move them just before the end of <body>; I'll get to that before production.)
The trouble is that when I try to load the app in Chrome, some files load immediately while others never finish loading at all! Chrome keeps trying to load the 12 JS files and never renders the page until I hit "Stop".
When I hit Stop, the HTML is rendered and the JS files show up as failed (screenshot omitted here).
Note that different JS files fail on each attempt! It's not the same file that gets stuck every time!
Inspecting the headers of the failed files shows "Caution: request is not finished yet". The files are stuck in "Receiving", sometimes for many minutes!
Now here's the fun part: after hitting Stop, if I focus on the omnibox and press Enter, all the JS files load instantly and the application works fine!
On the server side, I am using Apache, PHP and MySQL. Have I misconfigured something in Apache?
STATUS after 2 gruelling days: zilch, nothing, nada; this is driving me nuts. I have tried running the code from different machines, changing Apache cache settings, and changing myriad things in the JavaScript, but nothing has worked. The worst part is that no one can pinpoint where the problem is!
One possibility is that you have Keep-Alive enabled and it is not working properly. I've seen this before. The browser establishes its maximum number of connections to your server (typically 6) to download the first few files (CSS, JS, etc.), then those connections are not released until they time out. My symptoms were not quite the same as yours - the timeout was 20 seconds and everything would load in batches of 6 after that - but this could still be the cause.
In your Apache configuration (httpd.conf on most systems), look for the KeepAlive line (or add it if it's missing) and set it to Off.
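For example, in httpd.conf (these are standard Apache directives; the timeout value is only an illustration):

# Disable persistent connections entirely
KeepAlive Off

# Or, if you want to keep them, at least shorten how long idle connections are held:
# KeepAlive On
# KeepAliveTimeout 5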
More than an answer, here's how I would troubleshoot the problem.
One of the things I'd try is commenting out script tags one at a time and reloading, to see where the threshold is. Also, because this is probably a server configuration problem, I'd restart the server after each try to have a clean slate, so to speak - i.e., no state preserved between tries.
I'd try to get more hints by making the requests for the various JavaScript files from a script in your favourite language. Ideally, I'd GET the scripts one by one (say with curl), waiting a few milliseconds between requests. I imagine I'd hit a threshold here as well - like, getting one script per second works, but when requests get too close together, the server gets stuck.
If still no clue, I'd use tcpdump to watch the traffic between the browser and the server. Okay, this may be a little too low level!
But perhaps I'd use netstat to see how many connections the browser opens in parallel to the server to fetch the resources, and see if we hit a concurrency limit.
I'm sorry this isn't a solution, but I hope it gives you some ideas, and I'd be very curious to know what your problem turns out to be in the end!
We got exactly the same message "Caution: request is not finished yet" after a request in the browser.
The request itself was on the order of 15 MB of JSON, which was fed into an Angular 1 application. The request took about 2 seconds. Right after it finished, the Angular digest cycle blocked the browser for more than 12 seconds (visible when profiling during the request). During that time, Chrome showed the "Caution: request is not finished yet" message on the request while it had actually already finished.
Check the Content-Length header. The server may send an incorrect value - greater than the actual content length.
Try to reproduce the problem with https://www.stevesouders.com/cuzillion/ - it should tell you which side the problem is on: Chrome or the server.
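A quick way to compare the declared and actual lengths yourself (a small standalone Java sketch; the URL is a placeholder for one of the stalling script files):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ContentLengthCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://yourserver/scripts/app.js"); // placeholder
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        long declared = conn.getContentLengthLong(); // value of the Content-Length header

        long actual = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                actual += n;
            }
        }
        System.out.println("declared=" + declared + " actual=" + actual);
        // If declared > actual, the browser keeps waiting for bytes that never arrive.
    }
}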
There are several good answers, but if you want a foolproof method, you can use PHP to send all the scripts at once. If the problem really is too many connections to the server, you could add some HTML like this:
<script type="text/javascript" src="scripts/scripts.php"></script>
And then in scripts.php you use the include() function to include all of the JavaScript files like so:
<?php
header('Content-Type: application/javascript'); // control your headers; make sure the right MIME type is sent
include('jquery-1.9.1.min.js');
// ...include the rest of the scripts here
?>
If nothing else works and the problem is excessive connections, this should work.

What might my host be blocking? Dynamic scripts don't load

I am not an expert in JavaScript/Ajax, but I know some HTML, CSS and PHP.
I have a Joomla blog with Ajax extensions.
The problem is that some scripts are supposed to do dynamic loading, and they never work on my current host, yet they load just fine on my other hosting, on WAMP, etc. I don't know what is going on; I believe the host is blocking something, but their only answer is "contact the developer, we can't find anything" :(
I believe they are still blocking something, but I can't tell what. I don't even get error messages, and Firebug doesn't show any errors - things just don't load, and that's it.
The first case was with search results, which were supposed to load more results when scrolling down -> I got a blank page and no errors, even with error reporting set to the max.
Second case - now - the home page has article intros that load fine, but when I set the module option "load more on click of a button", nothing happens.
In both cases it works fine on WAMP and on another, similar hosting.
If you haven't already, I would try putting the URL that your Ajax call uses directly into your browser and see what it returns.

Full Ajax site, redirecting from "normal" URL to Ajax (fragment) URL automatically?

Ok, so all the rage these days is having a site like this:
mysite.com/
mysite.com/about
mysite.com/contact
But then if the user has Javascript enabled, then to have them browse those pages with Ajax:
mysite.com/#/
mysite.com/#/about
mysite.com/#/contact
That's all well and good. I have that all working perfectly well.
My question is, if the user arrives at "mysite.com/about", I want to automatically redirect them to "mysite.com/#/about" immediately if they have Javascript.
I have it working so that if they arrive at "mysite.com/about", that page will load fine on its own (no redirects) and all clicks after that load via Ajax, but the pre-fragment URL doesn't change. E.g. if they arrive at "mysite.com/about" and then click "contact", the new URL will be "mysite.com/about#/contact". I really don't like that, though; it's very ugly.
The only way I can think of to automatically redirect a user arriving at "mysite.com/about" to "mysite.com/#/about" is to have some javascript in the header that is ONLY run if the page is NOT being loaded via ajax. That code looks like this ($ = jQuery):
$(function(){
    if ( !location.hash || location.hash.substr(1,1) != '/' ) {
        location.replace( location.protocol + '//' + location.hostname + '/#' + location.pathname + location.search );
    }
});
That technically works, but it causes some very strange behavior. For example, normally when you "view source" for a page that has some Ajax content, that Ajax content will not be in the source, because you're viewing the original page's source. Well, when I view source after redirecting like this, the code I see is ONLY the code that was loaded via Ajax - I've never seen anything like that before. This happens in both Firefox 3.6 and Chrome 6.0. I haven't verified it with other browsers yet, but the fact that two browsers using completely different engines exhibit the same behavior indicates I am doing something bad (i.e., it's not just a bug with FF or Chrome).
So somehow the browser thinks the page I'm on "is" the Ajax page. I can continue to browse around and it works fine, but if I e.g. close Firefox and re-open it (and it re-opens the pages I was on), it only reloads the Ajax fragment of the page, and not the whole wrapper, until I do a manual refresh. (Chrome doesn't do this though, only Firefox). I've never seen anything like that.
I've tried using setTimeout so it does not do the redirect until after the page has fully loaded, but the same thing happens. Basically, as far as I can tell, this only works if the fragment is put there as the result of a user action (click), and not automatically.
So my question is - what's the best way to automatically redirect a Javascript capable browser from a "normal" URL to an Ajax URL? Anyone have experience doing this? I know there are sites that do this - e.g., http://rdio.com (a music site). No weirdness happens there, but I can't figure out how they're doing it.
Thanks for any advice.
This behavior is like the new Twitter. If you type the URL:
http://twitter.com/dinizz
You will be redirected to:
http://twitter.com/#!/dinizz
I realize that this is done not with JavaScript but on the server side. I am looking for a solution to implement this using Ruby on Rails.
Meanwhile, I suggest you take a look at this article: Making AJAX Applications Crawlable.

Is there a way to mitigate downloading of resources (images/CSS and JS files) with JavaScript?

I have an HTML page on my localhost - get_description.html.
The snippet below is part of the code:
<input type="text" id="url"/>
<button id="get_description_button">Get description</button>
<iframe id="description_container" src="#"></iframe>
When the button is clicked, the src of the iframe is set to the URL entered in the textbox. The pages fetched this way are very big, with lots of linked files. What interests me in the page is a block of text contained in a <div id="description"> element.
Is there a way to mitigate downloading of resources linked in the page that loads into the iframe?
I don't want to use curl because the data is only available to logged-in users, and the steps needed with curl to get the content are too complicated. The iframe is simple, as I use this on a box which sends the right cookies to identify the request as coming from a logged-in user; the problem is that it is very wasteful to download nearly 1 MB of data to keep 1 KB of it and throw out the rest.
Edit
If the proposed method only works in Firefox, that is fine, so I added the Firefox tag. Also, it is possible that the answer lies in the realm of Firefox add-on techniques, so I added that tag as well.
The problem is not that I cannot get at what I'm looking for; rather, it is that the easy iframe method is wasteful.
I know that Firefox does allow loading only the text of a page. If you open a page and press Ctrl+U, you are taken to the 'view page source' window. There, links behave as normal and are clickable; if you click on a link in source view, the source of the new page is loaded into the view-source window without the linked resources being downloaded - exactly what I'm trying to get. But I don't know how to access this behaviour.
Another example is the Adblock add-on. It somehow kills elements before they get loaded. With plain JavaScript this is not possible, because it is triggered too late to intervene in time.
The Same Origin Policy forbids any web page from accessing the contents of any page on a different domain, so basically you cannot do that.
However, it seems that some browsers allow access to another page's content if you are accessing it from a local web page, which seems to be your case.
Safari and IE 6/7/8 are browsers that allow a local web page to do so via XMLHttpRequest (source: Google Browser Security Handbook), so you may want to use one of those browsers for this (note that future versions of those browsers may not allow it anymore).
Apart from this solution, I see only two possibilities:
If the web pages you need to fetch content from are somehow controlled by you, you can create a simpler interface to let other web pages get the content you need (for example by allowing JSONP requests).
If the web pages you need to fetch content from are not controlled by you, the only solution I see is to fetch the content server side, logging in from the server directly (I know you don't want to do this, but I don't see any other possibility if the previous ones are not practicable).
Hope it helps.
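For the second option, here is a minimal server-side sketch using the jsoup library (the URL and the session cookie name/value are placeholders you would copy from a logged-in browser session):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DescriptionFetcher {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and session cookie taken from a logged-in browser
        Document doc = Jsoup.connect("https://example.com/some/page")
                .cookie("SESSIONID", "value-from-browser")
                .get();

        String description = doc.select("#description").text();
        System.out.println(description);
    }
}

jsoup downloads only the HTML document itself - linked images, CSS and scripts are never fetched - so this avoids pulling the whole 1 MB just to keep 1 KB.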
Actually, I've seen cross-domain jQuery .load requests before, here: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The author claims that code like the following, found on that page,
$('#container').load('http://google.com'); // SERIOUSLY!

$.ajax({
    url: 'http://news.bbc.co.uk',
    type: 'GET',
    success: function(res) {
        var headline = $(res.responseText).find('a.tsh').text();
        alert(headline);
    }
});
// Works with $.get too!
would work. (The BBC code might not work because of a recent redesign, but you get the idea.)
Apparently it uses YQL wrapped in a jQuery plugin to do the trick. Now, I can't say I fully understand what he is doing there, but it appears to work, and it fits the bill. Once you load the data, I suppose it is a simple matter of filtering out what you need.
If you prefer something that works at the browser level, may I suggest Mozilla's Jetpack framework for lightweight extensions. I've not yet read the documentations in its entirety but it should contain the APIs needed for this to work.
There are various ways to go about this in AJAX, I'm going to show the jQuery way for brevity as one option, though you could do this in vanilla JavaScript as well.
Instead of an <iframe> you can just use a container, let's say a <div> like this:
<div id="description_container"></div>
Then to load it:
$(function() {
    $("#get_description_button").click(function() {
        // The input's id is "url" in the markup above
        $("#description_container").load($("#url").val() + " #description");
    });
});
This uses the .load() method, which takes a string in the format .load("url selector"), then takes that element from the fetched page and places its content inside the container you're loading into, in this case #description_container.
This is just the jQuery route, mainly to illustrate that yes, you can do what you want, but you don't have to do it exactly like this; the point is getting what you want from an AJAX request rather than an <iframe>.
Your description sounds like you are fetching pages from the same domain (you said you need to be logged in and have session credentials), so have you tried an async request via XMLHttpRequest? It might complain if the HTML on a page is particularly messed up, but you should still be able to get the raw text via .responseText and extract what you need with a regex.
