Splitting requests across domains—thwarting overzealous security - javascript

Following Steve (YSlow) Souders' evangelism, my site (LibraryThing.com) splits requests across domains to facilitate parallel loading. We do CSS, JS and images; you can also do Flash, etc. We also use Google's version of Prototype, which is cross-domain, not just cross-subdomain.
This is all great for speed, but for a small percent of users, it's going wrong. I think the problem is overzealous security settings, probably in IE, but perhaps in other browsers and/or upstream systems as well. I'm amazed Souders and others don't discuss this, as we get it a lot.
The question is: What is the best way to handle this?
Right now, when it hits the bottom of the page we're checking to see if some JS variable, declared in a script that should have loaded, is set. If it isn't set, it gets it from the main domain and sets a cookie so next time it won't load it from the subdomain. But we're only reloading the JS at the bottom, so if the CSS also failed, you're looking at junk.
Does anyone have a better or more generalized solution? I'm thinking that there could be a general "onload" or "onerror" script that sets the cookie AND loads the content?
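Roughly what I have in mind for that generalized fallback - all domain, cookie and file names here are placeholders, not our actual setup:

    // Sketch only - every name here is a placeholder.
    function loadWithFallback(path) {
      var s = document.createElement("script");
      s.src = "http://static.example.com" + path;   // try the subdomain first
      s.onerror = function () {
        document.cookie = "nosubdomain=1; path=/";  // skip the subdomain next time
        var retry = document.createElement("script");
        retry.src = path;                           // same file, from the main domain
        document.getElementsByTagName("head")[0].appendChild(retry);
      };
      document.getElementsByTagName("head")[0].appendChild(s);
    }

    loadWithFallback("/js/site.js");

A similar pattern could be attempted for the CSS link elements, though I gather onerror support for those is spottier.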

If this behavior always affects at least the JS files, one option would be to keep a cookie indicating whether the user's browser has been tested for this behavior yet. If they haven't been tested, insert (as the first script element in the head tag) a reference to a cross-domain script that simply sets this cookie to "success". Then immediately afterward have some inline JS that checks for this cookie and, if it isn't set, sets it to "failed" and reloads the page.
Then on the server-side just check for the same cookie, and ensure cross-site requests aren't sent to anyone with a "failed" result.
This approach should ensure that users with browsers that do support cross-site requests don't see any odd behavior, but should immediately fix the problem for other users at the cost of an automatic refresh the first time they visit.
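A minimal sketch of that test, assuming a hypothetical cookie name and file layout (the cross-domain test script itself contains nothing but the single document.cookie line):

    // 1) http://static.example.com/xdomain-test.js (served cross-domain) contains only:
    //      document.cookie = "xdomain=success; path=/";
    // 2) Inline on the page, immediately after that script tag:
    if (document.cookie.indexOf("xdomain=") === -1) {
      // The cross-domain script never ran: record the failure and reload so the
      // server can serve everything from the main domain from now on.
      document.cookie = "xdomain=failed; path=/";
      window.location.reload();
    }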

Do you have a specific list of user agents that present this behaviour?
Maybe Apache conf could solve this problem? (or create a new problem for you to solve :-) ).
Watch out for the cookie frenzy - the more cookies you add (especially on the main domain), the more data your clients will have to send along with their requests.
Souders talked about this too; it's always good to check the sent/received ratio of your clients' browser requests.

I'm going to take some wild guesses about your problem.
Cache. If you made changes to the script files, these problem users could have older versions cached. IE 6 is extremely bad with overzealous caching.
I notice your scripts don't have a build # in the URL; XYZ.js?version=3 will force the browser not to use the old cached XYZ.js?version=2. (Applies to images/CSS as well.)
You also have inline JavaScript mixed in with your HTML, which would also get cached.
3 domains is likely overkill unless your site has a TON of content on it (huge pages)
DNS lookups can be expensive and have very long timeout values.
I wouldn't be comfortable putting my sites javascript on a separate domain because of the possible security conflicts. You have to keep your javascript/ajax calls in sync with domains. Seems like more of a hassle than it's worth.
I've been using i.domain.com and domain.com for 5+ years with no issues.
I bet putting the JS back on the main domain will fix your problems. It will certainly make it less complex and easier to deal with.
But done properly, your 3 domains should work. Unfortunately I don't have enough info in this question to find the issue.

Related

How to execute external JS file blocked by users' adblockers

We use an external service (Monetate) to serve JS to our site so that we can perform ad hoc presentation-layer site updates without going through the process of a site re-deploy - which in our case is a time-consuming, monolithic process we can only afford to do about once per month.
However, users who use adblockers in the browser do not see some of these presentation-layer updates. This can negatively affect their experience of the site as we sometimes include time-sensitive promotions that those users may not be aware of.
To work around this, I was thinking of duplicating the JavaScript file that Monetate is serving and hosting it on separate infrastructure from the site. That way, if we needed to make updates to it, we could do so as needed without doing a full site re-deploy.
However, I'm wondering if there is some way to work around the blocking of the Monetate JS file and somehow execute the remote Monetate JS file from our own JS code in such a way that adblockers would not be able to block it? This would avoid the need to duplicate the file.
If that file is blocked by adblockers, chances are that it is used to serve ads. In fact, your description of time-sensitive promotions sounds an awful lot like ads, just not for an external provider, but for your own site.
Since adblockers usually match the URL, the easiest solution would indeed be to rehost this file, if possible under a different name. Instead of hosting a static copy, you can also implement a simple proxy with the equivalent of <?php readfile('http://monetate.com/file.js'); or Apache's mod_rewrite. While this will increase load times and can fail if the remote host goes down, it means the client will always get the newest version of the file.
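If PHP isn't convenient, the same proxy idea sketched in Node.js looks roughly like this (the upstream URL is a placeholder for whatever Monetate actually serves you):

    // Tiny same-origin proxy: relay the remote file so the client requests a URL
    // on your own domain that adblock filter lists are less likely to match.
    var http = require("http");
    var https = require("https");

    var UPSTREAM = "https://example.monetate.net/custom.js"; // placeholder URL

    http.createServer(function (req, res) {
      https.get(UPSTREAM, function (upstream) {
        res.writeHead(upstream.statusCode, { "Content-Type": "application/javascript" });
        upstream.pipe(res); // always relays the newest version of the file
      }).on("error", function () {
        res.writeHead(502);
        res.end("// upstream unavailable");
      });
    }).listen(8080);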
Apart from using a different URL, there is no client-side solution - adblockers are included in the browser (or an extension thereof), and you cannot modify that code for good reasons.
Beware that adblockers may decide to block your URL too, if the script is indeed used to serve ads.
Monetate is probably blacklisted in Adblock, so there's nothing you can do about that.
I think that self-hosting the Monetate script would require keeping it updated by checking for new versions from time to time (maintaining it could become a pain in the ass).
A good solution in my opinion is to inform your users about that limitation with a clear message.
Or, you can get in touch with Monetate and ask for a solution.

Javascript concept using history object

I am interested in making a website that flashes through a visitor's entire web history when they visit. I plan on using JavaScript to grab the history on each viewer's computer and animate through it with varying speeds depending on how much history they have. My thought was to use history.length to determine the length of the visitor's history, and then use history.go() to navigate -1, -2, -3, etc. through the entire web history. I recognize that load times would be HUGE, but right now I am just trying to think through the concept. This related question seems like what I would use as the basis of my code; however, I don't understand why they describe that this method would not work. I am a student who is very new to JavaScript.
Do you guys have any knowledge of whether or not this will work, or any ideas on ways to achieve my idea?
You can call history.go() once. That's about as far as you'll get. The reason is simple: once you're on the previous page, your JavaScript is gone. Iframes won't work either, because you can't execute your own JS in an iframe that contains a page from another domain. Read about the same-origin policy for more info on that.
The only real solution I can think of is a browser extension. The reason that'll work is due to the fact that your JS can persist across multiple sites. You'd probably just need a userscript in each page that does the following:
check a variable to see if the functionality is enabled
if it is, call history.go(-1) after a timeout (to control the speed)
I'm most familiar with Chrome, so I'm imagining a browserAction to enable/disable the script and a content script that does the redirect. Other potential options include Greasemonkey (Firefox), Tampermonkey (Chrome), and Personalized Web (Chrome) scripts.
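A very rough sketch of such a content script (the storage key and delay are assumptions; the browserAction would simply toggle the flag):

    // content-script.js - injected into every page by the extension manifest.
    // If the "replay" flag is on, step one entry back after a delay; because the
    // script is re-injected on each previous page, the walk continues until the
    // flag is turned off or the history runs out.
    chrome.storage.local.get({ replayEnabled: false }, function (items) {
      if (items.replayEnabled && window.history.length > 1) {
        setTimeout(function () {
          window.history.go(-1);  // the delay controls the "animation" speed
        }, 1000);
      }
    });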
As stated in the question you linked to, JavaScript and/or the DOM do not give you access to the entire browser history, since that would be a severe privacy violation. Imagine going to a site and having it know every site you ever visited in that browser.
This would potentially give the site access to:
Sessions you are still logged into on other sites (if they store the session key in the URL, as some sites do)
Insight into what kind of activities you perform (are you a moderator on site X?)
Enormous amounts of data on what you are interested in.
This is not something that standards bodies or browser manufacturers thought users would be interested in sharing with everybody. That's why there isn't an API to walk through the browser's entire history.
#sachleen has already provided a very nice in-depth answer on how you can get around this limitation for individual browsers if you want to build this application. For the sake of completeness I'll simply mention the key term: "browser extension". :-)

How to reliably load required JavaScript files?

I came across this problem when, due to internet connection problems, some of the required JavaScript files did not load. The body onload event still fires, but the classes required for the page's logic are not present.
One more thing: the problem I want to fix is not in a website, it is in a web application which does not have any image or CSS files. Just imagine JavaScript code running in an iframe. Thus, I have problems only with scripts.
Here are my ideas how to fix this, please comment/correct me if I'm wrong:
Obfuscate and combine the files into one when pushing to live, so the overall size of the files is decreased and the task comes down to reliably loading 1 file
Enable gzip compression on the server, so again the resulting file size will be much smaller
Put proper cache headers on that file, so once loaded it will be cached by the browser/proxy server
Finally, even with all this, there could be a case where the file is not loaded. In this case I plan to load that file dynamically from JavaScript once the page is loaded, with "retry failed load" logic and a maximum of 5 attempts, for example (see the sketch below)
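Something like this is what I have in mind for point 4 (the file name, attempt count and failure handling are just examples):

    // "Retry failed load" sketch: try the combined file up to N times,
    // then give up and run a failure handler.
    function loadScriptWithRetry(src, attemptsLeft, onFail) {
      var s = document.createElement("script");
      s.src = src;
      s.onerror = function () {
        s.parentNode.removeChild(s);
        if (attemptsLeft > 1) {
          loadScriptWithRetry(src, attemptsLeft - 1, onFail);  // try again
        } else if (onFail) {
          onFail();  // e.g. show a "please check your connection" message
        }
      };
      document.getElementsByTagName("head")[0].appendChild(s);
    }

    loadScriptWithRetry("/js/app.combined.js", 5, function () {
      alert("Could not load the required scripts. Please check your connection and reload.");
    });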
Any other ideas?
If the "retry" script fails to grab the required dependencies, redirect to a "no script" version of the site, if one is available. Or try to have a gracefully degrading page, so even if all steps fail, the site is still usable.
1 - Correct, but double-check that JavaScript functions from different files don't overlap each other.
2 - Correct; this should always be on.
3 - Correct, but the browser will still try to get an HTTP 304: Not Modified response from the server.
4 - Correct; consider falling back to a noscript version of the website after 1 or 2 failed attempts (5 is too many).
I don't personally think it's worth it to try to redo the logic that the browser has. What if the images in your page don't load? What if the main page doesn't load? If the user has internet connection problems, they need to fix those internet connection problems. Lots of things will not work reliably until they do.
So, are you going to check that every image your page displays loads properly and if it didn't load, are you going to manually try to reload those too?
In my opinion, it might be worth it to put some inline JS to detect whether an external JS file didn't load (all you have to do is check for the existence of one global variable or top level function in the external JS file) and then just tell the user that they are having internet connection problems and they should fix those problems and then reload the site.
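For example, something like this inline check placed right after the external script tag (the global name MyApp is just a placeholder for whatever your external file defines):

    // If the external file failed to load, the global it defines will be missing.
    if (typeof window.MyApp === "undefined") {
      alert("We could not load part of this page. Please check your internet " +
            "connection and then reload the site.");
    }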
Your points are valid for script loading, but you must also consider the website usage.
If the scripts are not loading for whatever reason, the site must still be completely usable and navigable. The user experience comes before everything else.
The scripts should be loaded after the website interface has been loaded and rendered by the browser, and should contain code that enhances the user experience, not something you must absolutely rely on.
This way, even when the connection is really slow, I will still be able to read content and choose to change page or go somewhere else, instead of having a blank page or a page with only the header displayed.
This to me is the most important point.
Also, are you sure about a retry approach? It causes more requests to the server. If the connection is slow or laggy, it may be best not to run the script at all, especially considering users may spend little time on the page and only need to quickly read the content. Also, if the connection is slow, how much time would you set for a timeout? What if the script is still downloading when your timeout fires and you retry again? How can you effectively determine that amount of time, and the "slowness" of the connection?
EDIT
Have you tried out head.js? It is a library aimed at the fastest possible script loading; maybe it will help.

Do browsers cache inline Javascript, and if so, how to force a reload?

We recently moved to jQuery 1.6 and ran into the attr() versus prop() back-compat issue. During the first few hours after the change was deployed everything was fine, then it started breaking for people. We identified the problem pretty quickly and updated the offending JS, which was inline.
Now we have a situation where some folks are still having issues. In every case thus far, I could get the user up and running again by telling them to load the page in question and then manually refresh it in the browser. So something must still be cached somewhere.
But there are basically only two potential culprits: First, the jQuery library itself, but this is loaded with the version number in the query string so I think browsers will be refreshing it in their cache. Second, the inline javascript. Is it possible that this is being cached in the browser?
We are using APC, apc.stat=1 so it should be detecting that the PHP files have changed. Just to be on the safe side I nuked the opcode cache anyway.
To summarize, I have two questions:
Could some browsers be ignoring the query string when jQuery is loaded?
Could some browsers be caching an older version of the inline javascript?
Any other ideas very welcome too.
UPDATE: In the course of checking that there wasn't any unexpected caching going on using Firebug, I discovered a case where the old jQuery library would load. That doesn't explain why we had trouble after deploying the site and before we updated the inline code, but if it solves the problem I'll take it.
The answer to both your questions is no, unless the whole page is being cached. A browser can't cache part of a file, since it would have to download it to know which parts it had cached, and by that time it has downloaded them all anyway. It makes no sense :)
You could try sending some headers along with your page that force the browser not to use its cached copy.
I'd say the answers to your questions are 1) No and 2) Yes.
jQuery versions are different URLs so there's no caching problems there unless you somehow edit a jQuery file directly without changing the version string.
Browser pages (including inline javascript) will get cached according to both the page settings and the browser settings (that's what browsers do). The inline javascript isn't cached separately, but if the web page is cached, then the inline javascript is cached with it. What kind of permissible caching have you set for your web pages (either in meta tags or via http headers)?
Lots to read on the subject of web page cache control here and here if needed.
It is very important to plan/design an upgrade strategy when you want to roll out upgrades that works properly with cached files in the browser. Doing this wrong can result in your user either staying on old content/code until caches expire or even worse ending with a mix of old/new content/code that doesn't work.
The safest thing to do is to change source URLs when you have new content. Then there is zero possibility that an old cached page will ever get the new content so you avoid the mixing possibility. For example, on the Smugmug photo sharing web site, whenever any site owner updates an image to a new version of the image, a version number in the image URL is changed. Then, when the source page that shows that image is served from the web server, it includes the new image URL so, no matter whether the old version of the image is in the browser cache or not, the new version image is shown to the user.
It is obviously not always practical to change the URL of all pages (especially top level pages) so those pages often have to be set with short cache settings so the browsers won't cache them for long and they will regularly pull fresh content.

Why are most marketing tags (Omniture, XE, etc) written with document.write()? [duplicate]

Possible Duplicate:
Why use document.write?
Considering the negative effects of document.write(), why are most tracking/marketing tags written using document.write()?
I've thought about it quite a bit, and the only respectable idea I've come up with is that by writing it client side, we're guaranteed the browser won't try to cache the resource. Is that it? Any other ideas?
It's definitely ugly, but it's about the most battle-hardened, dumbed-down, idiot-proof method there is. The paradigm of "write this content onto the page right here" leaves little room for surprises, and the implementation works reliably at least all the way back to v3 browsers, possibly earlier.
[edit] "v3" browsers, not as in Firefox 3, but as in Netscape 3. If Netscape still existed today I suppose it would be at version 11 by now.
Scripts that are inserted into the page using document.write do not block other script execution so the page load speed is not impacted when external resources for ads are inserted into the page. More info here: http://www.stevesouders.com/blog/2009/04/27/loading-scripts-without-blocking/
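For comparison, one of the non-blocking patterns from that post is the dynamically created script element, roughly like this (the URL is a placeholder):

    // "Script DOM Element" pattern: the script downloads in parallel and does
    // not block parsing of the rest of the page.
    var script = document.createElement("script");
    script.src = "http://ads.example.com/tracker.js";
    document.getElementsByTagName("head")[0].appendChild(script);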
I don't see that this method guarantees anything when it comes to browser cache...? If it requests an image, for instance, the browser might still elect to serve that image from cache, regardless of whether the request originates from an IMG tag in the original source, or a document.write-written IMG tag.
My best guess is that they want the script to be self-contained and easy to deploy. A lot of the time (and for obvious reasons) the script referenced (e.g. the URL to which the image points) is different depending on whether or not the current page is loaded over a secure HTTPS connection. If the current page is an https page, load https://omniture.com/xxx; otherwise, load http://omniture.com/yyy. That's very easy to achieve in JavaScript, but you cannot hard-code it in HTML. Given any server-side language it would be equally easy to achieve, and that would be preferable, but I figure they don't want to say "go on and implement this functionality in whatever way you prefer"; rather, they want to deliver a solution that works as well as possible, regardless of the environment, and with as few dependencies as possible.
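A minimal sketch of that protocol-switching pattern as it typically appears in vendor snippets (the URLs are placeholders, not Omniture's real ones):

    // Pick the scheme matching the current page, then document.write the tag so
    // the script is fetched and executed synchronously at this exact spot.
    (function () {
      var src = ("https:" === document.location.protocol)
        ? "https://secure.example-tracker.com/code.js"
        : "http://static.example-tracker.com/code.js";
      document.write('<script src="' + src + '" type="text/javascript"><\/script>');
    })();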
It has nothing to do with cache. It has to do with the script provider not knowing where in the document the script was placed. The script provider can't just do document.getElementsByTagName("script") and insert HTML after the first script tag with a matched URI, as it doesn't know if the URI contains a hash (http://example.com/blah.js#foo) or was hosted directly on the first-party server to lessen DNS requests. There is a workaround, but it involves using the dark magic from script file names exposed in caught errors. An implementation of this magic can be found in my implementation of document.write for asynchronous scripts. (<script async>)
