So, I have a project where I need to get the photos from a profile.
I am able to navigate to the photos page of a profile, but I believe the JavaScript is not loading.
I am currently using HtmlUnit but if you know of another Java API that would help I'm all ears.
Basically, when I view Facebook in a normal browser, it will load all of the pages and I can inspect the elements.
When inspecting, there is a div with the class fbStarGrid and a few other modifier classes. This div contains all the images for a user's profile.
When I use HtmlUnit, I cannot find the div. I had it print the full page XML to a file, and I found that the div is commented out. I believe this means the JavaScript never ran to load the content.
After browsing a lot of javascript help on SO, I have found a few things that help with debugging but can't seem to fix the problem.
The first thing I've done is create an instance of a JavaScriptJobManager. I used it to see how many JavaScript jobs are still incomplete. After waiting for a while (10+ seconds), it says there are still 3 JS jobs incomplete. After a very long time (about 60 seconds), it says there are 2 JS jobs incomplete.
I do not know what is hanging with those JS jobs.
I get a warning upon page load about application/ld+json not running but I do not believe that part of the website is related to the photos.
Is there something I can do to force the JS to run? Is there a job it's stuck on and won't proceed to the next job?
I've also wondered if it's an issue with the page not re-syncing.
I've tried two solutions related to this:
Setting the AjaxController to NicelyResynchronizingAjaxController()
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
And someone suggested creating a custom controller that forces syncing.
webClient.setAjaxController(new AjaxController() {
    @Override
    public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
        // Force every AJAX call to be processed synchronously.
        return true;
    }
});
Neither of these seemed to affect the page.
If HtmlUnit is not the right library for the job, any other ideas? I need this to be headless/GUI-less to run on a Linux server. Java is preferred, but I can switch languages if necessary.
Due to a PC crash, my project has lost some unsaved code and I am running into different errors. One of them is that, on clicking any anchor tag on the page, I can see the ng-click function being called, but at the end the page also navigates to the server base path. I have no clue what causes this navigation and I am trying to find out. How can I debug to find out what is causing this page unload/navigation? Is there any defined rule in Angular that makes it go to the default path if something goes wrong?
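One way to instrument this (a rough sketch, assuming AngularJS 1.x; 'app' below is a placeholder for the real module name, not something from my project) would be to log every location change and break into the debugger, so the call stack hints at what triggered it:
// Hedged debugging sketch: log and pause on every AngularJS location change.
// 'app' is a placeholder module name; replace it with your own.
angular.module('app').run(['$rootScope', function($rootScope) {
    $rootScope.$on('$locationChangeStart', function(event, newUrl, oldUrl) {
        console.log('location change:', oldUrl, '->', newUrl);
        debugger; // inspect the call stack here to see what kicked off the change
    });
}]);

// A full page unload (e.g. a plain href navigation out of the Angular app)
// can be caught separately:
window.addEventListener('beforeunload', function() {
    debugger; // pauses just before the page is torn down
});
If only the unload handler fires, the navigation is most likely a plain browser navigation coming from the anchor's href (for example an empty href resolving against the document's base URL) rather than an Angular route change.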
In the debug console, I can also see a highlighted network request being made about which I have no clue.
What does 'Other' as the request source mean? The GetSourceList call specifies that AngularJS initiated it, but what about the requests tagged as 'Other'?
On my page I have bootstrapped a JavaScript file into the head section. The URL is correct, and the JavaScript file exists on my server at that URL. I checked with Fiddler2 and it shows that a request for the file is never sent to the server. The file has a simple test script:
$(document).ready(function() {
alert("blah blah");
});
Now, it was working just fine. It fired the alert every time I refreshed the page while I was working on that particular page for like 30 minutes. I got up, went to grab a soda and some leftover pizza. Papa John's, of course. But when I came back and refreshed the page... the JavaScript file no longer showed up bootstrapped to the head section. The alert no longer fires either.
<script type="text/javascript" src="js/events.js"></script>
Hmm so I cleared cache. Refreshed, but nothing.
Tried Firefox, Opera, IE, Safari and Chrome. Nothing.
Checked server. Still there and nothing. Chmod 777 on the file and its directory. Nothing.
Checked console for errors. No errors and still, nothing.
Saved the file and opened it in Notepad++, checked show all characters (to find hidden treacherous keys), nothing unusual. Just carriage return and tab. But still, of course, nothing.
Mind blown. Help?
I rebooted my server and mysteriously it started working again. I'm sure there is a perfectly logical explanation for this, but it's beyond the scope of my knowledge. If someone has an explanation, I would really like to hear it. I can pull out the server logs if you need them.
But suffice it to say, it's working again.
Since 12 June 2012, 11:20 UTC, I have been seeing very weird errors in my Varnish/Apache logs.
Sometimes, when a user has requested a page, several seconds later I see a similar request, but everything after the last / in the URL has been replaced by "undefined".
Example:
http://example.com/foo/bar triggers a http://example.com/foo/undefined request.
Of course these "undefined" pages do not exist, and my 404 page is returned instead (which is a custom page with the standard layout, not a classic Apache 404).
This happens with any page (from the homepage to the deepest), with various browsers (mostly Chrome 19, but also Firefox 3.5 to 12, IE 8/9...), but only for 1% of the traffic.
The headers sent by these requests are ordinary headers (and there are no AJAX headers).
For a given IP, this seems to occur randomly: sometimes on the first page visited, sometimes on a random page during the visit, sometimes on several pages during the visit...
Of course it looks like a JavaScript problem (I'm using jQuery 1.7.2 hosted by Google), but I haven't changed anything in the JS/HTML or the server configuration for several days, and I never saw this kind of error before. And of course, there are no such links in the HTML.
I also noticed some interesting facts:
the undefined requests are never found as the referer of other pages; instead, the "real" pages are used as the referer for the following requests from the same IP (the user can still use the regular menu on the 404 page)
I did not see any trace of these pages in Google Analytics, so I assume no JavaScript was executed (the tracker exists on all pages, including the 404)
nobody has contacted us about this, even when I brought up the problem on the website's social networks
most users continue their visit after that
All these facts make me think the problem occurs silently in the browsers, probably triggered by a buggy add-on, an antivirus, a browser toolbar, or some crappy manufacturer software integrated into browsers that was updated yesterday (but I didn't find any add-on released yesterday for Chrome, Firefox or IE).
Has anyone here noticed the same issue, or does anyone have a more complete explanation?
There is no simple straight answer.
You are going to have to debug this, and it is probably JavaScript, given the word 'undefined' in the URL. However, it doesn't have to be AJAX; it could be JavaScript creating any URL that is automatically resolved by the browser (e.g. JavaScript that sets the src attribute of an image tag, sets a CSS background-image property, etc.). I use Firefox with Firebug installed most of the time, so my directions will be with that in mind.
Firebug Initial Setup
Skip this if you already know how to use Firebug.
After the installs and restarting Firefox for Firebug, you are going to have to enable most of Firebug's 'panels'. To open Firebug there will be a little fire bug/insect looking thing in the top right corner of your browser or you can press F12. Click through the Firebug tabs 'Console', 'Script', 'Net' and enable them by opening them up and reading the panel's information. You might have to refresh the page to get them working properly.
Debugging User Interaction
Navigate to one of the pages that has the issue with Firebug open and the Net panel active. In the Net panel there will be a few options: 'Clear', 'Persist', 'All', 'Html', etc. Make sure ALL is selected. Don't do anything on the page and try not to mouse over anything on it. Look through the requests. The request for the invalid URL will be red and probably have a status of 404 Not Found (or similar).
See it on load? Skip to the next part.
Don't see it on initial load? Start using your page and continue here.
Start clicking on every feature, mouse over everything, etc. Keep your eyes on the Net panel and watch for requests that fail. You might have to be creative, but continue using your application till you see your browser make an invalid request. If the page makes many requests, feel free to hit the 'Clear' button on the top left of the Net panel to clear it up a bit.
If you submit the page and see a failed request go out really quick but then lose it because the next page loads, enable persistence by clicking 'Persist' in the top left of the Net panel.
Once it does, and it should, consider what you did to make that happen. See if you can make it happen again. After you figure out what user interaction is making it happen, dive into that code and start looking for things that are making invalid requests.
You can use the Script tab to set up breakpoints in your JavaScript and step through them. Investigate event handlers bound via $(element).bind/click/focus/etc. or via old-school event attributes like onclick=""/onfocus="" etc.
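If you suspect a jQuery-bound handler but can't locate it in the source, one inspection trick is jQuery's internal $._data() store (available since jQuery 1.7; it is undocumented, so treat this as a debugging aid only, not a stable API):
// List the event handlers jQuery has attached to a suspect element.
// '#suspect-link' is a made-up selector for illustration.
var el = document.querySelector('#suspect-link');
console.log($._data(el, 'events')); // an object keyed by event type: click, focus, ...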
If the request is happening as soon as the page loads
This is going to be a little harder to peg down. You will need to go to the Script tab and start adding break points to every script that runs on load. You do this by clicking on the left side of the line of JavaScript.
Reload your page and your break points should stop the browser from loading the page. Press the 'Continue' button on the script panel. Go to your net panel and see if your request was made, continue till it is found. You can use this to narrow down where the request is being made from by slowly adding more and more break points and then stepping into and out of functions.
What you are looking for in your code
Something that is similar to the following:
var url = workingUrl + someObject['someProperty'];
var url = workingUrl + someObject.someProperty;
Keep in mind that someObject might be an object {}, an array [], or any of the internal browser types. The point is that a property will be accessed that doesn't exist.
I don't see any 404/red requests
Then whatever is causing it isn't being triggered by your tests. Try using more things. The point is you should be able to make the request happen somehow. You just don't know yet. It has to show up in the Net panel. The only time it won't is when you aren't doing whatever triggers it.
Conclusion
There is no super easy way to peg down what exactly is going on. However, using the methods I outlined, you should at least be able to get close. It is probably something you aren't even considering.
Based on this post, I reverse-engineered the "Complitly" Chrome plugin/malware, and found that this extension injects an "improved autocomplete" feature that throws "undefined" requests at every site that has an input text field with a NAME or ID of "search", "q" and many others.
I also found that the enable.js file (one of Complitly's files) checks a global variable called "suggestmeyes_loaded" to see if it's already loaded (like a singleton). So, pre-setting this variable so the plugin thinks it has already loaded disables it.
To disable the malware and stop "undefined" requests, apply this to every page with a search field on your site:
<script type="text/javascript">
window.suggestmeyes_loaded = true;
</script>
This malware also redirects your users to a "searchcompletion.com" site, sometimes showing competitors' ads. So, it should be taken seriously.
You have correctly established that the undefined relates to a JavaScript problem and if your site users haven't complained about seeing error pages, you could check the following.
If JavaScript is used to set or change image locations, it sometimes happens that an undefined makes its way into the URI.
When that happens, the browser will happily try to load the image (no AJAX headers), but it will leave hints: it sets a particular Accept: header; instead of text/html, text/xml, ... it will use image/jpeg, image/png, ....
Once such a header is confirmed, you have narrowed down the problem to images only. Finding the root cause will possibly take some time though :)
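As a purely hypothetical illustration (the names below are invented, not taken from your site), this is the kind of code that produces such requests: a lookup silently returns undefined, the string "undefined" ends up in the image URL, and the browser fetches it with an image Accept header:
// Hypothetical example of how "undefined" sneaks into an image URL.
var photo = { thumb: 'photo-thumb.jpg' }; // note: there is no 'large' property
// photo.large is undefined, so the concatenation yields "/img/undefined"
// and the browser requests it with Accept: image/...
document.getElementById('photo').src = '/img/' + photo.large;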
Update
To help with debugging, you could override $.fn.attr() and invoke the debugger when something is being assigned an undefined value. Something like this:
(function($, undefined) {
    var $attr = $.fn.attr;
    $.fn.attr = function(attributeName, value) {
        // Handle both attr('src', value) and attr({ src: ... }) call styles
        var v = attributeName === 'src' ? value : (attributeName && attributeName.src);
        if (v === undefined || v === 'undefined') {
            debugger; // or: alert("Setting src to undefined");
        }
        return $attr.apply(this, arguments); // preserve `this` so the original attr() keeps working
    };
}(jQuery));
Some facts that have been established, especially in this thread: http://productforums.google.com/forum/#!msg/chrome/G1snYHaHSOc/p8RLCohxz2kJ
it happens on pages that have no javascript at all.
this proves that it is not an on-page programming error
the user is unaware of the issue and continues to browse quite happily.
it happens a few seconds after the person visits the page.
it doesn't happen to everybody.
happens on multiple browsers (Chrome, IE, Firefox, Mobile Safari, Opera)
happens on multiple operating systems (Linux, Android, NT)
happens on multiple web servers (IIS, Nginx, Apache)
I have one case of Googlebot following the link and claiming the same referrer. Google may just be trying to be clever: the browser communicated the URL to the mothership, which then sent out a bot to investigate.
I am fairly convinced by the proposal that it is caused by plugins. Complitly is one, but that doesn't support Opera. There may be others.
Though the mobile browsers weigh against the plugin theory.
Sysadmins have reported a major drop-off after adding some JavaScript to the page to trick Complitly into thinking it is already initialized.
Here's my solution for nginx:
location ~ undefined/?$ {
    return 204;
}
This returns "yeah okay, but no content for you".
If you are on website.com/some/page and you (somehow) navigate to website.com/some/page/undefined, the browser will show the URL as changed but will not even do a page reload. The previous page will stay as it was in the window.
If for some reason users experience this, they will have a clean no-op experience and it will not disturb whatever they were doing.
This sounds like a race condition where a variable is not getting properly initialized before getting used. Considering this is not an AJAX issue according to your comments, there will be a couple of ways of figuring this out, listed below.
Hook up a JavaScript exception logger: this will help you catch just about all random JavaScript exceptions in your log. Most of the time, programmatic errors will bubble up here. Put it before any other scripts. You will need to catch these on the server and write them to your logs for later analysis. This is your first line of defense. Here is an example:
window.onerror = function(m, f, l) {
    var e = window.encodeURIComponent;
    new Image().src = "/jslog?msg=" + e(m) + "&filename=" + e(f) + "&line=" + e(l) + "&url=" + e(window.location.href);
};
Search for window.location: for each of these instances, you should add logging or check for undefined being concatenated or appended to window.location. For example:
function myCode(loc) {
    // window.location.href = loc; // old
    typeof loc === 'undefined' && window.onerror(...); // new
    window.location.href = loc; // new
}
or the slightly cleaner:
window.setLocation = function(url) {
    /undefined/.test(url) ?
        window.onerror(...) : window.location.href = url;
}

function myCode(loc) {
    // window.location.href = loc; // old
    window.setLocation(loc); // new
}
If you are interested in getting stacktraces at this stage take a look at: https://github.com/eriwen/javascript-stacktrace
Grab all unhandled undefined links: besides window.location, the only thing left is the DOM links themselves. The third step is to check all unhandled DOM links for your invalid URL pattern (you can attach this right after jQuery finishes loading; the earlier the better):
$("body").on("click", "a[href$='undefined']", function() {
    window.onerror('Bad link: ' + $(this).html()); // alert home base
});
Hope this is helpful. Happy debugging.
I'm wondering if this might be an adblocker issue. When I search through the logs by IP address it appears that every request by a particular user to /folder/page.html is followed by a request to /folder/undefined
I don't know if this helps, but my website is replacing one particular *.webp image file with undefined after it's loaded in multiple browsers. Is your site hosting webp images?
I had a similar problem (but with /null 404 errors in the console) that @andrew-martinez's answer helped me to resolve.
Turns out that I was using img tags with an empty src field:
<img src="" alt="My image" data-src="/images/my-image.jpg">
My idea was to prevent the browser from loading the image at page load and to load it manually later, by setting the src attribute from the data-src attribute with JavaScript (lazy loading). But when combined with iDangerous Swiper, that method caused the error.
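For reference, here is a minimal sketch of the lazy-loading idea described above (it assumes jQuery is available; the selector and the load-on-ready timing are illustrative, not the exact code that was combined with Swiper):
$(function() {
    // Copy the real URL from data-src into src once the document is ready.
    $('img[data-src]').each(function() {
        var $img = $(this);
        $img.attr('src', $img.attr('data-src'));
    });
});
Using a tiny placeholder (for example a 1x1 transparent GIF data URI) instead of an empty src="" may also keep the browser, or a slider library, from trying to resolve an empty or undefined image URL in the meantime.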
Ok, so all the rage these days is having a site like this:
mysite.com/
mysite.com/about
mysite.com/contact
But then if the user has Javascript enabled, then to have them browse those pages with Ajax:
mysite.com/#/
mysite.com/#/about
mysite.com/#/contact
That's all well and good. I have that all working perfectly well.
My question is, if the user arrives at "mysite.com/about", I want to automatically redirect them to "mysite.com/#/about" immediately if they have Javascript.
I have it working so that if they arrive at "mysite.com/about", that page will load fine on its own (no redirects) and then all clicks after that load via AJAX, but the pre-fragment URL doesn't change. E.g. if they arrive on "mysite.com/about" and then click "contact", the new URL will be "mysite.com/about#/contact". I really don't like that though; it's very ugly.
The only way I can think of to automatically redirect a user arriving at "mysite.com/about" to "mysite.com/#/about" is to have some javascript in the header that is ONLY run if the page is NOT being loaded via ajax. That code looks like this ($ = jQuery):
$(function(){
    if( !location.hash || location.hash.substr(1,1) != '/' ) {
        location.replace( location.protocol+'//'+location.hostname+'/#'+location.pathname+location.search );
    }
});
That technically works, but it causes some very strange behavior. For example, normally when you "view source" for a page that has some AJAX content, that AJAX content will not be in the source because you're viewing the original page's source. Well, when I view source after redirecting like this, the code I see is ONLY the code that was loaded via AJAX - I've never seen anything like that before. This happens in both Firefox 3.6 and Chrome 6.0. I haven't verified it with other browsers yet, but the fact that two browsers using completely different engines exhibit the same behavior indicates I am doing something bad (i.e. it's not just a bug in FF or Chrome).
So somehow the browser thinks the page I'm on "is" the Ajax page. I can continue to browse around and it works fine, but if I e.g. close Firefox and re-open it (and it re-opens the pages I was on), it only reloads the Ajax fragment of the page, and not the whole wrapper, until I do a manual refresh. (Chrome doesn't do this though, only Firefox). I've never seen anything like that.
I've tried using setTimeout so it does not do the redirect until after the page has fully loaded, but the same thing happens. Basically, as far as I can tell, this only works if the fragment is put there as the result of a user action (click), and not automatically.
So my question is - what's the best way to automatically redirect a Javascript capable browser from a "normal" URL to an Ajax URL? Anyone have experience doing this? I know there are sites that do this - e.g., http://rdio.com (a music site). No weirdness happens there, but I can't figure out how they're doing it.
Thanks for any advice.
This behavior is like the new Twitter. If you type the URL:
http://twitter.com/dinizz
You will be redirected to:
http://twitter.com/#!/dinizz
I realize that this is done not with JavaScript but on the server side. I am looking for a solution to implement this using Ruby on Rails.
That said, I suggest you take a look at this article: Making AJAX Applications Crawlable