The Problem
We want to be able to securely add third-party JavaScript to our site to enable functionality. This could be Google Tag Manager, AppCues, Mixpanel, etc. The scripts may be hosted on a third-party site or CDN. These domains are not owned by us and should not be inherently trusted, because, as we know, if they are compromised, then by including their code our site is compromised as well.
Subresource Integrity
The W3C (and browsers) have attempted to address this issue with Subresource Integrity (SRI). This works well for libraries that are properly versioned: when you update the version, you update the integrity hash. However, it works terribly for scripts like GTM or Mixpanel that are designed to deliver seamless updates and therefore have constantly changing integrity hashes.
Proposals
It's very surprising that this issue is not well solved in 2021. There are some academic approaches to this here and here that discuss using signatures to solve it. There is even an approach to solving this using blockchain here!
The question
So, am I missing something? Is there a more widely recognized way to secure the inclusion of third-party JavaScript? We solved signing files and software libraries, so why can't we digitally sign JavaScript? Could JWT or PGP/GPG be used?
In particular, I am wondering whether there is a strategy that could leverage some basic coordination between the third-party JavaScript author and the including application. For instance, could the third-party author publish signed integrity hashes, so that we, as the including application, could pull the hashes and validate the signatures?
UPDATE
This is related to this SO question from 2016, before SRI was fully adopted; this question is the 2021 version. I believe it remains relevant in 2021 because of the advent of SaaS services that were not as prevalent in 2016. Many of these services provide hosted JavaScript files and deliberately do not version them, because they want to deliver constant updates.
Related
I (or a lot of German people) need your help.
In Germany, more and more website operators are receiving a legal letter with a warning and a demand to pay around €170. The problem is that it doesn't stop there: if you pay the €170, someone else can come along right away and warn you again.
It's about Google Fonts. Many WordPress websites use themes that load Google Fonts. A German court has ruled that sending the visitor's IP address to Google via Google Fonts is not allowed and violates the rights of the site's users.
Since I run a few websites, I'm now looking for a solution, but to be honest I'm coming up against technical limits. So I want to open this thread to discuss possibilities.
I have listed the issues below and mapped them to my proposed solutions.
I can think of the following options:
1. Create a child theme and then load the Google Fonts locally. (Issues: 1, 2, 3, 4)
2. A service worker that rewrites the URLs. (Issue: 5)
3. An nginx rewrite that filters the PHP output and replaces the Google Fonts URLs. (Issues: 1, 4)
4. More?
Issues:
1. If you have integrated a script (Google Maps, reCAPTCHA, Intercom, ...), it can happen that Google Fonts are reloaded by that JavaScript.
2. Theme updates overwrite the changes.
3. A lot of work when there are multiple customers.
4. Plugins load elements on certain pages, or only later, so it can happen that Google Fonts are loaded again.
5. Only works once the service worker is installed.
I am open to any ideas. It looks like Google will not fix this.
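The service-worker idea (option 2 above) essentially amounts to a fetch handler that rewrites font URLs before the request leaves the browser. The rewriting logic could look like this sketch, where the local path `/self-hosted-fonts` is an assumption about my own server layout:

```javascript
// Map a Google Fonts request URL to a self-hosted path, or return null
// if the request is unrelated and should pass through untouched.
function rewriteFontUrl(requestUrl) {
  const url = new URL(requestUrl);
  if (url.hostname === "fonts.googleapis.com" || url.hostname === "fonts.gstatic.com") {
    return "/self-hosted-fonts" + url.pathname;
  }
  return null;
}

// In the service worker itself (issue 5: this only helps once it is installed):
// self.addEventListener("fetch", (event) => {
//   const local = rewriteFontUrl(event.request.url);
//   if (local) event.respondWith(fetch(local));
// });
```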
There is no easy technical fix. The only long-term fix is to review how you include any third-party content on your websites, in case this embedding causes any visitor personal data to flow to such third parties.
This is not a new development. A lot of the relevant compliance steps already entered the (German) mainstream in the early 2010s when the problem was Facebook's “Like button”. The generally accepted solution for that is that the third party content is not loaded directly. Instead, a placeholder widget is rendered that indicates which content would be available there. Then, the user can give consent with one click and the actual embedded content is loaded.
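The placeholder pattern described above boils down to creating the embed element only inside the click handler, so no request leaves the browser before consent. A DOM sketch (the URL and label are illustrative):

```javascript
// Render a consent placeholder; the third-party iframe is created, and the
// third-party request triggered, only after the user clicks the button.
function makeConsentPlaceholder(doc, embedUrl, label) {
  const box = doc.createElement("div");
  const button = doc.createElement("button");
  button.textContent = "Load " + label + " (this sends your IP address to the provider)";
  button.addEventListener("click", function () {
    const frame = doc.createElement("iframe");
    frame.src = embedUrl; // the third-party request happens only now
    box.replaceChild(frame, button);
  });
  box.appendChild(button);
  return box;
}
```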
With Google Fonts, no such consent flow is needed or appropriate. All of the fonts on Google Fonts are Open Source licensed – you are allowed to use and distribute them for free, but subject to the license conditions (like making the license notice available to users). So on a technical level, it is easy to self-host the fonts in question.
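Self-hosting then comes down to serving the font files yourself and shipping a small stylesheet in place of the Google Fonts link tag. A minimal sketch, where the font family and file path are illustrative:

```css
/* Self-hosted replacement for a Google Fonts stylesheet. */
@font-face {
  font-family: "Open Sans";
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url("/fonts/open-sans-v34-latin-regular.woff2") format("woff2");
}
```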
What is tricky is efficiently rewriting the requests caused by your websites to point to your own servers instead of to Google's. You have identified a couple of approaches along with their pros and cons. Some comments:
Client-side rewriting sounds very fragile, I'd avoid it.
Server-side rewriting can be very powerful, but would also be somewhat fragile. The main advantage of such rewrites would be that it doesn't just handle Google Fonts embeds from your themes, but also requests inserted by server-side plugins.
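A server-side rewrite of that kind could, assuming an nginx reverse proxy in front of WordPress, look roughly like this (the upstream name and local font path are illustrative):

```nginx
# Rewrite Google Fonts references in the generated HTML/CSS to self-hosted copies.
location / {
    proxy_pass http://wordpress_backend;        # hypothetical upstream
    sub_filter_types text/html text/css;
    sub_filter 'https://fonts.googleapis.com' '/self-hosted-fonts';
    sub_filter 'https://fonts.gstatic.com'   '/self-hosted-fonts/files';
    sub_filter_once off;                        # replace every occurrence
}
```

Note that `sub_filter` requires the `ngx_http_sub_module` and only works on uncompressed upstream responses (e.g. with `proxy_set_header Accept-Encoding "";`), which is part of why this approach is somewhat fragile.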
Updating the theme is the only reliable long-term solution. Creating a child theme might be a suitable stop-gap measure until the theme developer fixes the problem. Similarly, it may be necessary to temporarily modify WordPress plugins.
I think that as a band-aid, server-side rewrites will be good enough to prevent many automated scanning tools used by these cease-and-desist lawyers from sounding the alarm on your sites.
However, you have correctly identified that especially JavaScript could cause problems for achieving actual compliance. This is why you should revisit your decisions about what plugins and scripts you have integrated. Loading third party JavaScript has pretty much the same legal consequences as loading fonts from Google, so you should only do it if it's actually necessary for your site (where necessity depends on the user's perspective), or if the user has given valid consent. For example, you can use the placeholder widget technique mentioned above for embedded content like Google Maps or Intercom, whereas loading a Captcha may indeed be strictly necessary on some pages.
For testing these issues, I'd recommend installing Firefox with the uBlock Origin addon, and setting the addon to hard mode. This will block all third-party/cross-origin requests. You can then allowlist those domains that are under your direct control, or are provided by your data processors (who are contractually bound to only use the personal data as instructed by you, and are considered first-party for GDPR purposes), or domains for which you have a legal basis to share the data (e.g. a “legitimate interest” to load stuff that is strictly necessary for your site to work, or to investigate what requests are made when the user gives consent).
IANAL, but these two sections may be relevant:
Using their APIs. From what I can tell, nothing here explicitly forbids proxying.
API Prohibitions on sublicensing. The last part of the statement, "and offer it for use by third parties", means you're okay as long as you're not offering it for other people to use.
I do have a Google Fonts Proxy Docker image which I use for my own stacks; I don't offer my running proxy for use with other services. That does not mean you can't simply deploy my image on your own servers.
This won't cover your third-party Google services such as Maps, though.
It's well known that Google and Microsoft host several common JavaScript libraries on their CDNs (content delivery networks). Unfortunately, neither seems to host JSON2.js.
I'm aware that I could upload a copy of JSON2.js to my server and serve it myself, but there are a number of advantages CDNs offer that I would like to take advantage of.
So with that in mind, are there any publicly available CDNs that host JSON2? If not, any idea why? Is there some sort of copyright reason?
Check out cdnjs.com:
http://cdnjs.com/libraries/json2/
It might also be worth investigating JSON3:
http://cdnjs.com/libraries/json3/
UPDATE: Some of the information was out of date, changed to better links.
json2.js can be found on Yandex CDN servers.
Full version: http://yandex.st/json2/2011-10-19/json2.js
Minified: http://yandex.st/json2/2011-10-19/json2.min.js
HTTPS also works.
I think it's probably too early to expect the big CDNs to start doing this. When enough sites use a library, the benefits become clear: greater availability, more frequent use, fewer client requests, and better performance for the end user. If only a few sites use it, the chance that a client already has a copy in its cache is low, and all the performance benefits are lost. All that's left then is MS and Google offsetting your bandwidth charges, which is not their intention. So the solution is to get more developers to use the library.
Plus the library is so tiny. The code is still only 3.5KB using conservative minification. For comparison, jQuery is 24KB and ext-core is 29KB. I'd personally recommend folding the library into your own site's base JS and get your performance boost there. At least until there's wider acceptance.
Also, it's funny: I'd have expected the JSON library to be hosted at Yahoo as well, but I can't find it. I mean, Crockford works there.
Thomas from cdnjs.com here, with two quick reasons why there is no minified version:
1) The script might not function as the author intended under the minification method we choose.
2) As a security step, we ensure that all file checksums match the original author's hosted files, so that community-submitted updates cannot contain malformed minified code.
So for now, that leaves us hosting Crockford's hosted un-minified version:
https://github.com/douglascrockford/JSON-js/raw/master/json2.js
There is now.
Douglas Crockford recently put JSON2 on GitHub; this URL will always link to the most recent version.
Edit:
It's not a good idea to use this method; see my comment below.
If I wanted to get a javascript library published to the ajax CDNs hosted by Google or Microsoft, what would I have to do?
Are there any formal requirements for this, like number of users etc?
I doubt there are any formal requirements, except that the library has to be wildly popular and will probably have to be regarded as high quality by the companies running the CDNs.
Google's Ajax libraries main page has this to say:
Google works directly with the key stake holders for each library effort and accepts the latest stable versions as they are released. Once we host a release of a given library, we are committed to hosting that release indefinitely.
I'd say that if you feel your library is popular and good enough (seeing as Google, for example, hosts only 12 projects at the moment, yours would have to be in the worldwide top twenty by some measure!), simply talk to Google and Microsoft and see what they say.
Here is a blog post that could provide you with some contacts to approach. Also, the author seems to be somehow affiliated with Google (he talks about "we").
The Google Ajax Library Blog may also be a good resource.
There are some tutorials which suggest using the jQuery path from Google, e.g.:
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
Is that safe to use in our projects?
Aren't we making ourselves dependent, since we can't be sure it will still be there in a year or beyond?
The reason I have asked this question is that there are some people who argue in favor of it.
From the documentation:
Google works directly with the key stake holders for each library effort and accepts the latest stable versions as they are released. Once we host a release of a given library, we are committed to hosting that release indefinitely.
It seems pretty low-risk to me. It's also more likely to already be in the user's cache, and it's served with the proper gzip and caching headers. It also won't eat up an HTTP request to your domain on browsers that only allow two simultaneous downloads per domain (e.g. IE6 and IE7).
I have an article for you that explains the benefits and cons of using this method: Here
I really doubt that Google will put this up for people to use and then suddenly take it down and cause issues for thousands of websites or more. It's not as if they will lose their domain or run out of bandwidth. The only issue I think you should worry about is whether the end users of your sites can reach Google. Personally, I just host the file on my own server anyway.
The short answer is yes, and I agree that if that include doesn't work, it is probably a sign of a much bigger problem. My general rule of thumb is that all public-facing apps use that include, whereas internal apps (which could theoretically be used without a connection to the outside world) include a local copy instead.
There will always be a chance that it will not be there after a year, the same as with Gmail, Google Docs, google.com...
For jQuery alone, I do not see a reason to use Google's copy, as the file is small and the impact on your server and bandwidth will not be much. But jQuery UI may be worth serving from Google.
It's pretty 'safe', as the other guys mentioned. You probably take a bit of load off your own server too. Even SO itself uses it.
But to be safe, always have a fallback plan and keep a local copy, just in case.
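The classic fallback pattern can be written as a small check right after the CDN script tag; here it is kept as a function for clarity, and the local path is a placeholder for wherever you keep your own copy:

```javascript
// If the CDN copy of jQuery failed to load, inject a local copy instead.
function ensureJQuery(win, doc) {
  if (!win.jQuery) {
    doc.write('<script src="/js/jquery-1.3.2.min.js"><\/script>');
  }
}

// In the page, in a <script> block placed right after the CDN <script> tag:
// ensureJQuery(window, document);
```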
There's not really much risk involved if you think about it. Suppose Google ceases to exist in a year (chuckle); it wouldn't take you more than a couple of minutes to replace the google.load command in your common file with a reference to your own local jQuery copy.
The worst-case scenario is that, in the improbable event of Google's demise, your hover effects stop working for 5 minutes :)
A similar question: Where do you include the jQuery library from? Google JSAPI? CDN?
Because of the answers from that question, I have started using:
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
I have it running on quite a number of sites. The only issue I've had is that some firewalls start blocking a site if there are too many requests (or at least that is my guess), which is the case on higher-traffic sites all used in one location.
This is a broad-based question around delivering a javascript library that other web developers will use on their site. Here's the scope of my library:
I'm providing a data service that's delivered in the form of a JS file. A similar implementation would be Google Analytics.
Will always be hosted by me. Developers will simply point the src attribute of a <script> tag at it.
My library consists of an object (let's call it Jeff for now) with a set of properties. No methods, just values.
The library isn't static, but session-based: we're providing data points that can only be determined at request time. (Think of a web service normally called through AJAX, but available at page load.)
This is not a free service; implementors will pay for usage.
The Jeff object will always be returned, though some properties may be left unpopulated if a runtime error occurred back at my server. The Jeff object includes a Response section that indicates success/failure and a description.
Now, to my question: what's ideal in terms of best practices for providing a service in the form of a JS library such as I've described? Standard Googling has not given me much to go on, but references to guidelines are greatly appreciated.
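For concreteness, the kind of payload described above could be a server-rendered object literal along these lines (every name here is hypothetical):

```javascript
// Rendered per request by the vendor's server and delivered as the JS file.
var Jeff = {
  // session-scoped data points, resolved at request time
  sessionId: "b41c9e",
  score: 0.87,
  segment: "returning-visitor",
  // always present, so integrators can check for partial failures
  Response: {
    success: true,
    description: "OK",
  },
};
```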
Doesn't sound like something I'd use. The fact that you want it always hosted on your server leaves any consumer of the service open to you substituting malicious code after they've reviewed it and determined it's useful and safe. So I'd see limited uptake unless you're a large corporation with a trustworthy reputation.
No comment on you personally, just how I'd view something like that and how Information Security overseers in larger companies would likely view it as well.
YUI hosts all their files for developers to access directly, with free use of their CDN to boot. Also, hundreds of thousands of companies worldwide use Google Analytics, which has the same risk profile as "Jeff".
Admittedly, the trust profile for Yahoo! and Google is a lot higher than it is for "Jeff", but still, my point is that there are plenty of precedents for this delivery model.
Personally (by the way, there is no right answer except the market's response), I believe it may have merit depending on the value proposition behind "Jeff". I agree with MadMurf: describe it as a 'web service' that requires only one JS file to integrate into a customer's website.
PS: I'm not sure if "javascript" was the best tag for discussing this. Maybe the "business" tag would have elicited wider feedback. Good luck!