I am trying to run my Selenium JavaScript script against the site bet365.com. I am using Firefox (geckodriver); I tried both headless and normal mode, but for understanding/debugging the problem, non-headless mode is more helpful.
This is the code:
const { Builder } = require("selenium-webdriver");
// inside an async function:
const driver = await new Builder().forBrowser("firefox").build();
await driver.get("https://bet365.com");
The problem is that the site is not loading:
After 5 mins I then end up with the error
TimeoutError: TimedPromise timed out after 300000 ms
at Object.throwDecodedError (C:\1_code\RFB\git\webscraper-srv\node_modules\selenium-webdriver\lib\error.js:517:15)
at parseHttpResponse (C:\1_code\RFB\git\webscraper-srv\node_modules\selenium-webdriver\lib\http.js:671:13)
at Executor.execute (C:\1_code\RFB\git\webscraper-srv\node_modules\selenium-webdriver\lib\http.js:597:28)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async Driver.execute (C:\1_code\RFB\git\webscraper-srv\node_modules\selenium-webdriver\lib\webdriver.js:729:17)
at async C:\1_code\RFB\git\webscraper-srv\scripts\seleniumUtils.js:190:3
at async XWrap.<anonymous> (C:\1_code\RFB\git\webscraper-srv\scripts\utils.js:127:60)
at async XWrap.<anonymous> (C:\1_code\RFB\git\webscraper-srv\scripts\utils.js:127:60)
at async XWrap.<anonymous> (C:\1_code\RFB\git\webscraper-srv\scripts\utils.js:127:60)
at async XWrap.<anonymous> (C:\1_code\RFB\git\webscraper-srv\scripts\utils.js:127:60) {
remoteStacktrace: 'WebDriverError#chrome://marionette/content/error.js:181:5\n' +
'TimeoutError#chrome://marionette/content/error.js:450:5\n' +
'bail#chrome://marionette/content/sync.js:229:19\n'
I tried visiting a different site with Selenium and it works perfectly, so I don't think it's a problem with my setup.
If I try visiting the site with my normal Firefox browser it works too.
I also tried manually entering the URL in the browser window opened by the program, and it leads to the same endless loading. Manually opening other pages in this Selenium-opened browser works fine.
Is it possible for the server of a webpage to detect browsers that were started using Selenium? I always thought the only way to detect web scrapers was by looking at the frequency of visits and the clicks the scraper performs on a page, but this was the first time I had visited the page with Selenium...
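(As an aside on the detection question: the W3C WebDriver spec requires automated browsers to expose navigator.webdriver === true, so a page script can check it without looking at request frequency at all. Whether bet365 does exactly this is an assumption on my part; the sketch below only illustrates that such a check is trivial.)
// Minimal sketch of a client-side automation check a page could ship.
// navigator.webdriver is true in any WebDriver-controlled browser
// (e.g. Firefox started through geckodriver), headless or not.
if (navigator.webdriver) {
  // e.g. keep serving the loading screen instead of the real content
  console.log("Automation detected");
}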
If the server doesn't allow these kinds of requests, is there any way to still scrape data from this webpage? I also already tried opening it in headless mode...
This is the (entire) Firefox Network tab when it's stuck loading (sometimes it looks a little different):
This is the (beginning of the) Firefox Network tab when it's loaded normally:
I circled the requests that might be causing the problem. The bottom left of the browser tells me the entire time that it is transferring data from ff.kis.v2.scr.kaspersky-labs.com. I tried deactivating Kaspersky on my machine and also ran the program on a machine without Kaspersky, so I am not quite sure why this request is made. It might have something to do with HTTPS validation, but I am not sure.
Another interesting thing is that the response of the first request to www.bet365.com/ looks like this (even with Selenium):
So it does actually reach the server, but the server just sends a loading screen. Also, the following requests get the same response as with the normal browser. Only the requests with status 101 don't give back any response, unlike with the normal browser.
The last interesting thing is this request: www.bet365.com/increment?desktop-site-loaded_11=1. It is only made when the site is opened with the Selenium browser, not with a normal browser. This might mean that it's not a loading problem, but that the site is actively blocking the request and telling the backend to increment a counter of blocked requests.
Any ideas how I can get the code working or why this problem comes up?
Try
await driver.executeScript(`location.href='${url}';`);
instead of
driver.get(url)
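To expand on that in the question's JavaScript setup, a minimal sketch (not tested against bet365; the selector and the 30-second timeout are placeholder assumptions). The idea is that executeScript returns as soon as the script has run, so it does not sit on the 300000 ms page-load timeout the way driver.get() does; you then wait for a concrete element yourself:
const { Builder, By, until } = require("selenium-webdriver");

async function openViaScript(url) {
  const driver = await new Builder().forBrowser("firefox").build();
  // Navigate with a script instead of driver.get(); the command returns
  // immediately instead of blocking until the page's load event fires.
  await driver.executeScript("location.href = arguments[0];", url);
  // Wait for something concrete on the page instead of the load event.
  // The selector and timeout are placeholders, not values from the question.
  await driver.wait(until.elementLocated(By.css("body")), 30000);
  return driver;
}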
Related
I am writing an app that includes about 12 short JS files in the <head> section (I know, I should merge them and move them just before the end of <body>; I'll get to that before going to production).
The trouble is that when I try to load the app in Chrome, some files load immediately while others never finish loading at all! Chrome keeps trying to load the 12 JS files and never renders the page until I hit "Stop".
When I hit stop, the HTML is rendered and the JS files fail as in the image below:
Note that different JS files fail on each attempt! It's not the same file that gets stuck every time!
Inspecting the headers of the failed files shows "Caution: request is not finished yet". The files are stuck in "Receiving" sometimes for many minutes!
Now here's the fun part, after hitting stop, if I focus on the omnibar and press enter, all the JS files load instantly and the application works fine!
On the server side, I am using Apache, PHP and MySQL. Have I misconfigured something in Apache?
STATUS after 2 gruelling days: zilch, nothing, nada, this is driving me nuts. I have tried running the code from different machines, have tried changing Apache cache settings, and changed myriad things in the JavaScript, but nothing has worked. The worst thing is that no one can pinpoint where the problem is!
One possibility is that you have Keep-Alive enabled and it is not working properly. I've seen this before. The browser establishes its maximum number of connections to your server (typically 6) to download the first few files (CSS, JS, etc.), and then those connections are not released until they time out. My symptoms were not quite the same as yours - the timeout was 20 seconds and everything would load in batches of 6 after that - but this could still be the cause.
In your Apache configuration (httpd.conf on most systems), look for the KeepAlive line (or add it if it's missing) and set it to Off.
More than an answer, here's how I would troubleshoot the problem.
One of the things I'd try is commenting out the script tags one at a time and reloading, to see where the threshold is. Also, because this is probably a server configuration problem, I'd restart the server after each try, to have a clean slate, so to speak, i.e., no state preserved between tries.
I'd try to get more hints by making the requests for the various JavaScript files from a script in your favourite language, as in the sketch below. Ideally, I'd try GETting the scripts one by one (say with curl), waiting a few milliseconds between requests. I imagine I'd hit a threshold here as well. Like, getting one script per second works, but when requests get too close, the server gets stuck.
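A minimal sketch of that kind of probe in Node.js (18+, for the built-in fetch); the script URLs and the delay are placeholders for whatever your page actually loads:
// Fetch each script one by one with a small pause, to see whether the server
// starts stalling once requests get close together.
const scripts = [
  "http://localhost/js/a.js", // placeholder URLs
  "http://localhost/js/b.js",
];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  for (const url of scripts) {
    const start = Date.now();
    const res = await fetch(url);
    await res.text(); // force the whole body to be read
    console.log(`${url}: ${res.status} in ${Date.now() - start} ms`);
    await sleep(50); // shrink this delay to find the threshold
  }
})();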
If still no clue, I'd use tcpdump to watch the traffic between the browser and the server. Okay, this may be a little too low level!
But perhaps I'd use netstat to see how many connections the browser opens in parallel to the server to fetch the resources, and see if we hit a concurrency limit.
I'm sorry this isn't a real solution, but I hope you get some ideas from it, and I'd be very curious to know what your problem turns out to be, in the end!
We got exactly the same message "Caution: request is not finished yet" after a request in the browser.
The request itself was in the order of 15 MB of JSON, which was fed into an Angular 1 application. This request took about 2 seconds. Right after the request was finished, the Angular digest cycle blocked the browser for more than 12 seconds (this is visible by profiling during the request). During that time Chrome showed this "Caution: request is not finished yet" message on the request, whilst it actually was already finished.
Check the Content-Length header. The server may pass an incorrect value - greater than the actual content length.
Try to emulate the problem in https://www.stevesouders.com/cuzillion/
That should tell you which side the problem is on - Chrome or the server.
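A quick way to check that from Node.js (18+); the URL is a placeholder, and note that if the server compresses responses the header describes the compressed size:
// Compare the advertised Content-Length with the bytes actually received.
// A header larger than the real body would leave the browser waiting
// for data that never arrives.
const url = "http://localhost/js/app.js"; // placeholder URL

(async () => {
  const res = await fetch(url);
  const body = Buffer.from(await res.arrayBuffer());
  const advertised = Number(res.headers.get("content-length"));
  console.log(`Content-Length header: ${advertised}, actual bytes: ${body.length}`);
})();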
There are several good answers, but if you want a foolproof method of doing this, you can use PHP to send all the scripts at once. If the problem is truly too many connections to the server, you could add some html code like this:
<script type="text/javascript" src="scripts/scripts.php"></script>
And then in scripts.php you use the include() function to include all of the JavaScript files like so:
<?php
header('Content-Type: application/javascript'); // serve the combined output as JavaScript
include('jquery-1.9.1.min.js');
// rest of scripts
?>
If nothing else works and the problem is excessive connections, this should work.
I am currently using Faye for pub/sub and am disconnecting the client on the beforeunload event. While it disconnects fine during a tab close, during a page refresh it throws the following error:
The connection was interrupted while the page was loading
The code is
window.addEventListener('beforeunload', function(event) {
  fayeClient.disconnect();
  event.preventDefault();
});
Is there a way to stop Firefox from closing the connection before the call completes? The above code works perfectly in Chrome.
How can I prevent a page unload with jQuery?
Way down in the comments it says:
event.preventDefault() doesn't work in this case, presumably because modern browsers don't want malicious coders to hijack the window and make it un-closable? – yochannah May 9 '13 at 8:45
I don't believe it is possible to delay it longer than it takes for that code to execute, excluding any async returns and timeouts.
So, to hack around this: call your disconnect, then make a synchronous call to a file which does
<?php
sleep(1);
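A sketch of what the client side of that hack could look like; '/sleep.php' is just a hypothetical name for the file above, and synchronous XHR during unload is deprecated in modern browsers, so treat this strictly as a workaround:
window.addEventListener('beforeunload', function () {
  fayeClient.disconnect();
  // Synchronous request to a script that sleeps ~1 second, buying the
  // disconnect call time to complete before the page is torn down.
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/sleep.php', false); // false = synchronous
  xhr.send();
});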
My app at http://beta.billboard.fm is producing errors in my normal browsing session after playing a single song.
If I reload the page in incognito, the app works fully. I only recently started experiencing these issues. I have completely cleared the cache and it works again, but only temporarily before throwing the same errors.
Additionally, I have disabled all browser extensions.
But no matter what I do, I can't stop this error from being thrown by the YouTube API:
Unable to post message to http://www.youtube.com. Recipient has origin https://www.youtube.com
It looks like there is a mismatch in the security protocols. I tried changing them to https or just removing "http:" altogether on my side, but it did not resolve the issue.
Anyone have an idea of what is happening here?
It is quite clear to me at this point that this is a major bug in Google/YouTube's API. They have written some bad code somewhere. This bug is not a consistent thing. This is well documented by the fact that everybody's code works just fine for an extended period of time, and then they discover that all of a sudden their sites stop working properly. Additionally, all of my websites that had this problem last week are now working without a glitch - again, without me altering code.
So while it sucks to say this - the onus is on Google & YouTube to fix this and provide APIs that actually work as advertised... It doesn't look to me like there's anything we can do about it on our own :(
I am having the same problem - I also tried changing my links from http: to https: and vice-versa with no luck. I found this thread on Google Groups, but so far there has been no response. https://code.google.com/p/gdata-issues/issues/detail?id=4697
Clearing my cache allowed the player to work for a few videos, but after 3 or 4, the same error pops back up.
UPDATE 2 - Dec. 24, 2013: This solution has not actually fixed the problem at all:
After following a thread that poulified referred me to in his answer, a user in the forum posted the following solution which seems to be doing the trick for me (UPDATE: Still experiencing issues on random page loads :/):
Hi all,
It works by replacing http:// with https://
example: http://jsfiddle.net/8tkgW/29/
Please make sure of the following tips:
load the iframe API from https://www.youtube.com/player_api
load the iframe src from the https path: https://www.youtube.com/embed/0GN2kpBoFs4?rel=0
If you load the player via new YT.Player, you must check the iframe src path:
setTimeout(function() {
  var url = $('#iframe_youtube').prop('src');
  if (url.match('^http://')) {
    $('#iframe_youtube').prop('src', url.replace(/^http:\/\//i, 'https://'));
  }
}, 500);
Please refer to my GitHub project:
https://github.com/appleboy/js-video-player/blob/master/js/jsplayer.js#L120
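For reference, a minimal sketch of loading the player entirely over https, roughly along the lines of that project; the 'player' element id is a placeholder, and iframe_api is the canonical loader URL (player_api is an older alias):
// Load the IFrame Player API over https only.
var tag = document.createElement('script');
tag.src = 'https://www.youtube.com/iframe_api';
document.head.appendChild(tag);

// Called automatically by the API loader once it is ready.
function onYouTubeIframeAPIReady() {
  new YT.Player('player', {            // placeholder element id
    videoId: '0GN2kpBoFs4',            // the video id used in the answer above
    events: {
      onReady: function (e) { e.target.playVideo(); }
    }
  });
}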
I'm loading a JSON file via AJAX. If Chrome dev tools is open, everything functions perfectly. If Chrome dev tools is closed, it fails. Thankfully dev tools still keeps doing its thing even when closed, so I can still see the exception I get:
Failed to load resource: the server responded with a status of 412 (Precondition Failed) http://localhost/experiments/escape/maps/test.json
Why would there be a precondition on whether dev tools is open? Also, it seems unlikely that opening and closing the dev tools could in any way affect the server's behaviour so I think it is Chrome that is preventing the request rather than the server as suggested in the exception.
Unfortunately, dev tools does not keep track of network activity when closed so I can't use the network tab to get any further info.
The AJAX is handled via jQuery with the following code:
map.load = function(mapName, tileSource, tileWidth, tileHeight, onLoad) {
$.ajax({
url: '../escape/maps/'+mapName+'.json',
type: 'post',
success: function(mapData) {
// there's loads of stuff in here but I don't think it's relevant to the question as the failure prevents the success method from being called.
}
});
};
This code causes no issues in Firefox and so does seem specifically to be connected to Chrome Dev Tools. Any suggestions welcome as I'm completely flummoxed!
EDIT: OK, so it's not dev tools' fault at all - I had disabled the cache in dev tools; re-enabling it allows the script to work correctly. Why does my code depend on the cache? Disabling/enabling the cache in Firefox does not cause any issues.
EDIT 2: OK, I think I'm getting close. The precondition that is failing is the if-modified-since condition (the file hasn't changed). I assume that Chrome is sending this to confirm whether or not to use the cached version; however, despite the precondition failing, it does not load the cached version. I thought this might mean the cache was corrupted in some way, so I cleared the cache. Unfortunately this doesn't solve the issue. The file will happily load once, but the next time I'm back where I started with the same issue. Any ideas?
@Rondel - You've got it! The issue was that I was stupidly using 'post' to fetch a static file. POST requests are never supposed to be cached, so that is why Chrome doesn't retrieve it. I've still got no idea why Chrome still sends the if-modified-since header, but in any case changing the request type to 'get' is the solution to the problem. (Sorry, Chrome dev tools, for unfairly blaming you - the issue, as usual, was my code!)
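For completeness, a sketch of the corrected call under that fix; only the request type changes (dataType is an optional addition), and the success body stays whatever it was:
map.load = function(mapName, tileSource, tileWidth, tileHeight, onLoad) {
  $.ajax({
    url: '../escape/maps/' + mapName + '.json',
    type: 'get',        // GET instead of POST, so Chrome can cache/revalidate it normally
    dataType: 'json',   // optional: explicitly expect JSON
    success: function(mapData) {
      // process the map as before
    }
  });
};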
Since 12 June 2012 11:20 TU, I have been seeing very weird errors in my Varnish/Apache logs.
Sometimes, when a user has requested one page, several seconds later I see a similar request, but the whole string after the last / in the URL has been replaced by "undefined".
Example:
http://example.com/foo/bar triggers a http://example.com/foo/undefined request.
Of course these "undefined" pages do not exist, and my 404 page is returned instead (which is a custom page with a standard layout, not a classic Apache 404).
This happens with any page (from the homepage to the deepest),
with various browsers (mostly Chrome 19, but also Firefox 3.5 to 12, IE 8/9...), but only for 1% of the traffic.
The headers sent by these requests are classic headers (and there are no AJAX headers).
For a given IP, this seems to occur randomly: sometimes on the first page visited, sometimes on a random page during the visit, sometimes on several pages during the visit...
Of course it looks like a JavaScript problem (I'm using jQuery 1.7.2 hosted by Google), but I have changed absolutely nothing in the JS/HTML or the server configuration for several days, and I never saw this kind of error before. And of course, there are no such links in the HTML.
I also noticed some interesting facts:
the undefined requests are never found as the referer of other pages; instead, the "real" pages were used as the referer for the following requests from the same IP (the user has the ability to use the classic menu on the 404 page)
I did not see any trace of these pages in Google Analytics, so I assume no JavaScript has been executed (the tracker exists on all pages, including the 404)
nobody has contacted us about this, even when I raised the problem on the website's social networks
most of the users continue the visit after that
All these facts make me think the problem occurs silently in the browsers, probably triggered by a buggy add-on, antivirus, a browser toolbar, or some crappy manufacturer software integrated into browsers that was updated yesterday (but I didn't find any add-on released yesterday for Chrome, Firefox or IE).
Has anyone here noticed the same issue, or does anyone have a more complete explanation?
There is no simple straight answer.
You are going to have to debug this and it is probably JavaScript due to the 'undefined' word in the URL. However it doesn't have to be AJAX, it could be JavaScript creating any URL that is automatically resolved by the browser (e.g. JavaScript that sets the src attribute on an image tag, setting a css-image attribute, etc). I use Firefox with Firebug installed most of the time, so my directions will be with that in mind.
Firebug Initial Setup
Skip this if you already know how to use Firebug.
After the installs and restarting Firefox for Firebug, you are going to have to enable most of Firebug's 'panels'. To open Firebug there will be a little fire bug/insect looking thing in the top right corner of your browser or you can press F12. Click through the Firebug tabs 'Console', 'Script', 'Net' and enable them by opening them up and reading the panel's information. You might have to refresh the page to get them working properly.
Debugging User Interaction
Navigate to one of the pages that has the issue with Firebug open and the Net panel active. In the Net panel there will be a few options: 'Clear', 'Persist', 'All', 'Html', etc. Make sure ALL is selected. Don't do anything on the page and try not to mouse over anything on it. Look through the requests. The request for the invalid URL will be red and probably have a status of 404 Not Found (or similar).
See it on load? Skip to the next part.
Don't see it on initial load? Start using your page and continue here.
Start clicking on every feature, mouse over everything, etc. Keep your eyes on the Net panel and watch for requests that fail. You might have to be creative, but continue using your application till you see your browser make an invalid request. If the page makes many requests, feel free to hit the 'Clear' button on the top left of the Net panel to clear it up a bit.
If you submit the page and see a failed request go out really quick but then lose it because the next page loads, enable persistence by clicking 'Persist' in the top left of the Net panel.
Once it does, and it should, consider what you did to make that happen. See if you can make it happen again. After you figure out what user interaction is making it happen, dive into that code and start looking for things that are making invalid requests.
You can use the Script tab to set up breakpoints in your JavaScript and step through them. Investigate event handlers done via $(element).bind/click/focus/etc or from old-school event attributes like onclick=""/onfocus="" etc.
If the request is happening as soon as the page loads
This is going to be a little harder to peg down. You will need to go to the Script tab and start adding break points to every script that runs on load. You do this by clicking on the left side of the line of JavaScript.
Reload your page and your break points should stop the browser from loading the page. Press the 'Continue' button on the script panel. Go to your net panel and see if your request was made, continue till it is found. You can use this to narrow down where the request is being made from by slowly adding more and more break points and then stepping into and out of functions.
What you are looking for in your code
Something that is similar to the following:
var url = workingUrl + someObject['someProperty'];
var url = workingUrl + someObject.someProperty;
Keep in mind that someObject might be an object {}, an array [], or any of the internal browser types. The point is that a property will be accessed that doesn't exist.
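For instance, a small sketch of how the stray "undefined" ends up in the URL when that property is missing (the URLs here are made up):
var workingUrl = "http://example.com/foo/"; // example base URL
var someObject = {};                        // 'someProperty' was never set

var url = workingUrl + someObject.someProperty;
console.log(url); // "http://example.com/foo/undefined"

// Anything that then resolves the URL (an <img> src, a CSS background,
// window.location, ...) fires a request for /foo/undefined.
new Image().src = url;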
I don't see any 404/red requests
Then whatever is causing it isn't being triggered by your tests. Try using more things. The point is you should be able to make the request happen somehow. You just don't know yet. It has to show up in the Net panel. The only time it won't is when you aren't doing whatever triggers it.
Conclusion
There is no super easy way to peg down what exactly is going on. However, using the methods I outlined you should at least be able to get close. It is probably something you aren't even considering.
Based on this post, I reverse-engineered the "Complitly" Chrome plugin/malware, and found that this extension injects an "improved autocomplete" feature that was throwing "undefined" requests at every site that has an input text field with a NAME or ID of "search", "q" and many others.
I also found that the enable.js file (one of Complitly's files) was checking a global variable called "suggestmeyes_loaded" to see if it's already loaded (like a singleton). So, setting this variable to true disables the plugin.
To disable the malware and stop "undefined" requests, apply this to every page with a search field on your site:
<script type="text/javascript">
window.suggestmeyes_loaded = true;
</script>
This malware also redirects your users to a "searchcompletion.com" site, sometimes showing competitors' ads. So, it should be taken seriously.
You have correctly established that the undefined relates to a JavaScript problem and if your site users haven't complained about seeing error pages, you could check the following.
If JavaScript is used to set or change image locations, it sometimes happens that an undefined makes its way into the URI.
When that happens, the browser will happily try to load the image (no AJAX headers), but it will leave hints: it sets a particular Accept: header; instead of text/html, text/xml, ... it will use image/jpeg, image/png, ....
Once such a header is confirmed, you have narrowed down the problem to images only. Finding the root cause will possibly take some time though :)
Update
To help debugging you could override $.fn.attr() and invoke the debugger when something is being assigned to undefined. Something like this:
(function($, undefined) {
  var $attr = $.fn.attr;
  $.fn.attr = function(attributeName, value) {
    // Value about to be assigned to src, whether called as attr('src', v) or attr({src: v})
    var v = attributeName === 'src' ? value : attributeName.src;
    if (v === 'undefined' || v === undefined) {
      debugger; // break here (or alert) when src is about to become "undefined"
    }
    return $attr.apply(this, arguments); // keep the original 'this' binding
  };
}(jQuery));
Some facts that have been established, especially in this thread: http://productforums.google.com/forum/#!msg/chrome/G1snYHaHSOc/p8RLCohxz2kJ
it happens on pages that have no javascript at all.
this proves that it is not an on-page programming error
the user is unaware of the issue and continues to browse quite happily.
it happens a few seconds after the person visits the page.
it doesn't happen to everybody.
happens on multiple browsers (Chrome, IE, Firefox, Mobile Safari, Opera)
happens on multiple operating systems (Linux, Android, NT)
happens on multiple web servers (IIS, Nginx, Apache)
I have one case of Googlebot following the link and claiming the same referrer. They may just be trying to be clever, and the browser communicated it to the mothership, which then sent out a bot to investigate.
I am fairly convinced by the proposal that it is caused by plugins. Complitly is one, but that doesn't support Opera. There may be others.
Though the mobile browsers weigh against the plugin theory.
Sysadmins have reported a major drop-off after adding some JavaScript to the page to trick Complitly into thinking it is already initialized.
Here's my solution for nginx:
location ~ undefined/?$ {
return 204;
}
This returns "yeah okay, but no content for you".
If you are on website.com/some/page and you (somehow) navigate to website.com/some/page/undefined the browser will show the URL as changed but will not even do a page reload. The previous page will stay as it was in the window.
If for some reason this is something experienced by users then they will have a clean noop experience and it will not disturb whatever they were doing.
This sounds like a race condition where a variable is not getting properly initialized before getting used. Considering this is not an AJAX issue according to your comments, there will be a couple of ways of figuring this out, listed below.
Hook up a JavaScript exception logger: this will help you catch just about all random JavaScript exceptions in your log. Most of the time programmatic errors will bubble up here. Put it before any scripts. You will need to catch these on the server and print them to your logs for analysis later. This is your first line of defense. Here is an example:
window.onerror = function(m,f,l) {
var e = window.encodeURIComponent;
new Image().src = "/jslog?msg=" + e(m) + "&filename=" + e(f) + "&line=" + e(l) + "&url=" + e(window.location.href);
};
Search for window.location: for each of these instances you should add logging or check for undefined concats/appenders to your window.location. For example:
function myCode(loc) {
// window.location.href = loc; // old
typeof loc === 'undefined' && window.onerror(...); //new
window.location.href = loc; //new
}
or the slightly cleaner:
window.setLocation = function(url) {
/undefined/.test(url) ?
window.onerror(...) : window.location.href = url;
}
function myCode(loc) {
//window.location.href = loc; //old
window.setLocation(loc); //new
}
If you are interested in getting stacktraces at this stage take a look at: https://github.com/eriwen/javascript-stacktrace
Grab all unhandled undefined links: besides window.location, the only thing left is the DOM links themselves. The third step is to check all unhandled DOM links for your invalid URL pattern (you can attach this right after jQuery finishes loading; the earlier the better):
$("body").on("click", "a[href$='undefined']", function() {
window.onerror('Bad link: ' + $(this).html()); //alert home base
});
Hope this is helpful. Happy debugging.
I'm wondering if this might be an adblocker issue. When I search through the logs by IP address it appears that every request by a particular user to /folder/page.html is followed by a request to /folder/undefined
I don't know if this helps, but my website is replacing one particular *.webp image file with undefined after it's loaded in multiple browsers. Is your site hosting webp images?
I had a similar problem (but with /null 404 errors in the console) that @andrew-martinez's answer helped me to resolve.
Turns out that I was using img tags with an empty src field:
<img src="" alt="My image" data-src="/images/my-image.jpg">
My idea was to prevent the browser from loading the image at page load, and to load it manually later by setting the src attribute from the data-src attribute with JavaScript (lazy loading), as sketched below. But when combined with iDangerous Swiper, that method caused the error.
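For reference, a sketch of the lazy-loading pattern described above; the selector and the DOMContentLoaded trigger are assumptions, and the point is simply that src is never left empty for the browser (or a plugin) to resolve against the page URL:
// Copy data-src into src once the document is ready (lazy loading).
// In the markup, give the img a non-empty placeholder src (e.g. a tiny
// transparent data URI) instead of src="".
document.addEventListener('DOMContentLoaded', function () {
  document.querySelectorAll('img[data-src]').forEach(function (img) {
    img.src = img.getAttribute('data-src');
  });
});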