In online mode, the service worker's navigate fallback takes over the server-side rendered page. When I first load the /about page, for example, the page source shows the home page rendered instead (because I defined / as navigateFallback in the sw-precache options):
I want this behavior only when I'm running in offline mode.
Here are the options I use:
{
cacheId: pkg.name,
dontCacheBustUrlsMatching: /./,
dynamicUrlToDependencies: {
'/': [ resolve(__dirname, '../server/views/index.ejs') ]
},
navigateFallback: '/',
staticFileGlobs: [
`${publicDir}/{bundle,vendor}.*.{js,css,gz}`,
`${publicDir}/manifest.json`
],
stripPrefix: publicDir,
runtimeCaching: [{
urlPattern: /api/,
handler: 'networkFirst'
}]
}
The expected use case of navigateFallback in sw-precache is to provide a static HTML document that can be served cache-first, immediately, without going against the network. This ensures that requests for any URL on your site can be fulfilled with something almost instantly, instead of waiting an indeterminate amount of time for a response from the network. This presupposes that you have a generic HTML page (referred to as an App Shell) that can fulfill any navigation request and implements client-side logic to examine the current URL and dynamically insert the appropriate content. If you already have code that supports Single Page App-style navigations, that's usually sufficient.
You can read more about this architectural pattern, and there's also a full example of a React-based PWA that uses SSR for the initial navigation (or for clients that don't support service workers) and then upgrades to the navigateFallback behavior once the service worker is installed.
If you'd prefer not to use this model, and instead want to always go against the network for the SSR version of each page, falling back to a cached page only when the network request fails, then navigateFallback is not the right choice. You can find some examples of offline fallbacks around the web.
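If you do go that route, the navigation handler itself is only a few lines. Here's a rough sketch (my own, not something sw-precache generates) that you could pull in via the importScripts option, assuming your SSR'd '/' page is already precached, as in your dynamicUrlToDependencies config:
// network-first-navigation.js: try the network for navigations first,
// and fall back to the cached app shell only when the fetch fails.
self.addEventListener('fetch', event => {
  if (event.request.mode === 'navigate') {
    event.respondWith(
      fetch(event.request)
        // {ignoreSearch: true} because sw-precache stores '/' with a
        // cache-busting search parameter appended.
        .catch(() => caches.match('/', {ignoreSearch: true}))
    );
  }
});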
Related
I'm new to Vue and created a project with the PWA service-worker plugin. After deploying a new version of my app, I get these messages in the console:
After refreshing the page (F5) these messages still appear the same way and the app is still in its old state. I tried everything to clear the cache, but it still won't load the new content.
I haven't changed anything from the default config after creating my project and didn't add any code that interacts with the service worker. What is going wrong? Am I missing something?
As I figured out, this question really only concerns beginners in PWA who don't know that you can (and need to) configure the PWA to achieve this. If you feel addressed now (and are using VueJS), remember:
To automatically download the new content, you need to configure the PWA. In my case (VueJS) this is done by creating a file vue.config.js in the root directory of the project (on the same level as package.json).
Inside this file you need this:
module.exports = {
pwa: {
workboxOptions: {
skipWaiting: true
}
}
}
This will automatically download your new content when it is detected.
However, the content won't be displayed to the client yet, since the page needs to refresh after downloading it. I did this by adding window.location.reload(true) to registerServiceWorker.js in my src/ directory:
updated () {
console.log('New content is available: Please refresh.')
window.location.reload(true)
},
Now, if the Service Worker detects new content, it will download it automatically and refresh the page afterwards.
I figured out a different approach to this and from what I've seen so far it works fine.
updated() {
console.log('New content is available; please refresh.');
caches.keys().then(function(names) {
for (let name of names) caches.delete(name);
});
},
What's happening here is that when the updated function gets called in the service worker, it goes through and deletes all the caches. This means that your app will start up more slowly when there is an update, but otherwise it will serve the cached assets. I like this approach better because service workers can be complicated to understand, and from what I've read, using skipWaiting() isn't recommended unless you know what it does and the side effects it has. This also works with injectManifest mode, which is how I'm currently using it.
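For reference, switching the Vue PWA plugin to injectManifest mode also happens in vue.config.js. A minimal sketch (the swSrc path is an assumption; it points at your own service worker source file):
module.exports = {
  pwa: {
    workboxPluginMode: 'InjectManifest',
    workboxOptions: {
      // Your hand-written service worker; Workbox injects the precache
      // manifest into it at build time.
      swSrc: 'src/service-worker.js'
    }
  }
}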
Alternatively, pass the registration argument and call update() on it.
The argument is a ServiceWorkerRegistration, so you can use that API:
updated (registration) {
console.log('New content is available; please refresh.')
registration.update()
},
I am making a web application with offline capabilities using a service worker generated by a Node.js module called sw-precache. Everything works fine and I have access to the html files and images offline.
But since no server-side language is available, is there a way to rewrite urls client-side, like an .htaccess file would do? For example, showing a "404 Page not found" page when no file matches the url? I know that redirects are possible using JavaScript or meta tags, but rewriting the url?
By default, sw-precache will only respond to fetch events when the URL being requested is a URL for a resource that it has cached. If someone navigates to a URL for a non-existent web page, sw-precache won't respond to the fetch event.
That does mean that you have a chance to run your own code in an additional fetch event handler that could implement custom behavior, like returning a 404.html page when a user navigates to a non-existent page while offline. You need to jump through a couple of hoops, but here's how to do it:
// In custom-offline-import.js:
self.addEventListener('fetch', event => {
if (event.request.mode === 'navigate') {
event.respondWith(
fetch(event.request)
.catch(() => caches.match('404.html', {ignoreSearch: true}))
// {ignoreSearch: true} is needed, since sw-precache appends a search
// parameter with versioning information.
);
}
});
// In your sw-precache config:
{
// Make sure 404.html is picked up in one of the glob patterns:
staticFileGlobs: ['404.html'],
// See https://github.com/GoogleChrome/sw-precache#importscripts-arraystring
importScripts: ['custom-offline-import.js'],
}
This shouldn't interfere with anything that sw-precache is doing, as it will just be used as a fallback.
A content script can be injected programmatically, or permanently by declaring it in the extension's manifest file. Programmatic injection requires a host permission, which is generally granted via the browser or page action.
In my use case, I want to inject into the Gmail, outlook.com and Yahoo Mail web sites without user action. I could do this by declaring all of them in the manifest, but doing so requires data access to all of those accounts. Some users may want to grant access only to outlook.com, but not Gmail. Programmatic injection doesn't work because I need to know when to inject, and using the tabs permission would also require another permission.
Is there a good way to optionally inject into web sites?
You cannot run code on a site without the appropriate permissions. Fortunately, you can add the host permissions to optional_permissions in the manifest file to declare them optional and still allow the extension to use them.
In response to a user gesture, you can use chrome.permissions.request to request additional permissions. This API can only be used in extension pages (background page, popup page, options page, ...). As of Chrome 36.0.1957.0, the required user gesture also carries over from content scripts, so if you want to, you can add a click event listener in a content script and use chrome.runtime.sendMessage to send the request to the background page, which in turn calls chrome.permissions.request.
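To make that concrete, here's a minimal sketch of that relay (the button selector and message name are hypothetical):
// In the content script: forward the user's click to the background page.
document.querySelector('#grant-access').addEventListener('click', () => {
  chrome.runtime.sendMessage({type: 'requestHostPermission'});
});
// In the background page: the user gesture carries over, so the
// permission request is allowed here.
chrome.runtime.onMessage.addListener(message => {
  if (message.type === 'requestHostPermission') {
    chrome.permissions.request({
      origins: ['https://mail.google.com/*'] // must be covered by optional_permissions
    }, granted => {
      console.log('Host permission granted?', granted);
    });
  }
});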
Optional code execution in tabs
After obtaining the host permissions (optional or mandatory), you have to somehow inject the content script (or CSS style) in the matching pages. There are a few options, in order of my preference:
1. Use the chrome.declarativeContent.RequestContentScript action to insert a content script in the page. Read the documentation if you want to learn how to use this API.
2. Use the webNavigation API (e.g. chrome.webNavigation.onCommitted) to detect when the user has navigated to the page, then use chrome.tabs.executeScript to insert the content script in the tab (or chrome.tabs.insertCSS to insert styles).
3. Use the tabs API (chrome.tabs.onUpdated) to detect that a page might have changed, and insert a content script in the page using chrome.tabs.executeScript.
I strongly recommend option 1, because it was specifically designed for this use case. Note: this API was added in Chrome 38, but has only worked with optional permissions since Chrome 39. Despite the "WARNING: This action is still experimental and is not supported on stable builds of Chrome." in the documentation, the API is actually supported on stable. Initially the idea was to wait for a review before publishing the API on stable, but that review never came, and the API has now been working fine for almost two years.
The second and third options are similar. The difference between the two is that using the webNavigation API adds an additional permission warning ("Read your browsing history"). In return for this warning, you get an API that can efficiently filter the navigations, so the number of chrome.tabs.executeScript calls can be minimized.
If you don't want to put this extra permission warning in your permission dialog, then you could blindly try to inject on every tab. If your extension has the permission, the injection will succeed; otherwise, it fails. This doesn't sound very efficient, and it is not... but on the bright side, this method does not require any additional permissions.
If you use either of the latter two methods, your content script must be designed so that it can handle multiple insertions (e.g. with a guard). Inserting into frames is also supported (allFrames: true), but only if your extension is allowed to access the tab's URL (or the frame's URL if frameId is set).
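For illustration, here's a rough sketch of option 2 (the host filter is just an example; this approach also needs the webNavigation permission):
// In the background page: inject after each committed navigation on the host.
chrome.webNavigation.onCommitted.addListener(details => {
  chrome.tabs.executeScript(details.tabId, {
    file: 'content.js',
    frameId: details.frameId // 0 is the top-level frame
  });
}, {
  url: [{hostEquals: 'mail.google.com'}] // example filter, minimizes wake-ups
});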
I advise against using declarativeContent APIs because they're deprecated and buggy with CSS, as described by the last comment on https://bugs.chromium.org/p/chromium/issues/detail?id=708115.
Use the new content script registration APIs instead. Here's what you need, in two parts:
Programmatic script injection
There's a new contentScripts.register() API which can programmatically register content scripts and they'll be loaded exactly like content_scripts defined in the manifest:
browser.contentScripts.register({
matches: ['https://your-dynamic-domain.example.com/*'],
js: [{file: 'content.js'}]
});
This API is only available in Firefox, but there's a Chrome polyfill you can use. If you're using Manifest v3, there's the native chrome.scripting.registerContentScripts, which does the same thing but slightly differently.
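For reference, a minimal Manifest v3 sketch (this assumes the "scripting" permission plus host access to the origin):
chrome.scripting.registerContentScripts([{
  id: 'dynamic-content-script', // arbitrary id chosen for this example
  matches: ['https://your-dynamic-domain.example.com/*'],
  js: ['content.js'],
  persistAcrossSessions: true // keep the registration across browser restarts
}]);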
Acquiring new permissions
By using chrome.permissions.request you can add new domains on which you can inject content scripts. An example would be:
// In a content script or options page
document.querySelector('button').addEventListener('click', () => {
chrome.permissions.request({
origins: ['https://your-dynamic-domain.example.com/*']
}, granted => {
if (granted) {
/* Use contentScripts.register */
}
});
});
And you'll have to add optional_permissions in your manifest.json to allow new origins to be requested:
{
"optional_permissions": [
"*://*/*"
]
}
In Manifest v3 this property was renamed to optional_host_permissions.
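So a minimal Manifest v3 equivalent of the manifest above would be:
{
  "manifest_version": 3,
  "optional_host_permissions": [
    "*://*/*"
  ]
}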
I also wrote some tools to further simplify this for you and for the end user, such as webext-domain-permission-toggle and webext-dynamic-content-scripts. They will automatically register your scripts on subsequent browser launches and allow the user to remove the new permissions and scripts.
Since the existing answer is a few years old: optional injection is now much easier and is described here. To inject a new file conditionally, you can use the following code:
// The lines I have commented are in the documentation, but the uncommented
// lines are the important part
// chrome.runtime.onMessage.addListener((message, callback) => {
//   if (message == "runContentScript") {
chrome.tabs.executeScript({
  file: 'contentScript.js'
});
//   }
// });
You will need the activeTab permission to do this.
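That permission is a single manifest entry (Manifest v2 shown, matching the chrome.tabs.executeScript call above):
{
  "permissions": [
    "activeTab"
  ]
}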
I've been looking into ways to improve SEO for angularJS apps that are hosted on a CDN like Amazon S3 (i.e. simple storage with no backend). Most of the solutions out there, PhantomJS, prerender.io, seo.js etc., rely on a backend to recognise the ?_escaped_fragment_ url that the crawler generates and then fetch the relevant page from elsewhere. Even grunt-html-snapshot ultimately needs you to do this, even though you generate the snapshot pages ahead of time.
This solution basically relies on using Cloudflare as a reverse proxy, which seems a bit of a waste given that most of the security apparatus their service provides is totally redundant for a static site. Setting up a reverse proxy myself, as suggested here, also seems problematic, given that it would require either (i) routing all the AngularJS apps I need static html for through one proxy server, which would potentially hamper performance, or (ii) setting up a separate proxy server for each app, at which point I may as well set up a backend, which isn't affordable at the scale I am working at.
Is there any way of doing this, or are statically hosted AngularJS apps with great SEO basically impossible until Google updates their crawlers?
Reposted on webmasters following John Conde's comments.
Actually this is a task that is indeed very troublesome, but I have managed to get SEO working nicely with my AngularJS SPA site (hosted on AWS S3) at http://www.jobbies.co/. The main idea is to pre-generate and populate the content into the HTML. The templates will still be loaded when the page loads, and the pre-rendered content will then be replaced.
You can read more about my solution at http://www.ericluwj.com/2015/11/17/seo-for-angularjs-on-s3.html, but do note that there are a lot of conditions.
Here is a full overview of how to make your app SEO-friendly on a storage service such as S3, with nice urls (no #), all driven by grunt with a single command to run after your build:
grunt seo
It's still a puzzle of workarounds, but it works, and it's the best you can do. Thank you to @ericluwj and his blog post, which inspired me.
Overview
The goal & url structure
The goal is to create one html file per state in your angular app. The only major assumption is that you remove the '#' from your url by using html5 history mode (which you should do!) and that all your paths are absolute or use angular states. There are plenty of posts explaining how to do so.
Urls end with a trailing slash, like this:
http://yourdomain.com/page1/
Personally I made sure that http://yourdomain.com/page1 (no trailing slash) also reaches its destination, but that's off topic here. I also made sure that every language has a different state and a different url.
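In case you still have the '#', here's a minimal sketch of enabling html5 mode in AngularJS (the module name is hypothetical, and you also need a <base href="/"> tag in your index.html):
// App configuration: switch from hashbang urls to html5 history mode.
angular.module('myApp').config(function ($locationProvider) {
  $locationProvider.html5Mode(true);
});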
The SEO logic
Our goal is that when someone reaches your website through an http request:
- If it's a search engine crawler: keep it on the page that contains the required html. The page also contains the angular logic (e.g. to start your app), but the crawler cannot read that, so it is intentionally stuck with the html you served and will index that.
- For normal humans and intelligent machines: make sure angular gets activated, erase the generated html and start your app normally.
The grunt tasks
Here we go with the grunt tasks:
//grunt plugins you will need:
grunt.loadNpmTasks('grunt-prerender');
grunt.loadNpmTasks('grunt-replace');
grunt.loadNpmTasks('grunt-wait');
grunt.loadNpmTasks('grunt-aws-s3');
//The grunt tasks in the right order
grunt.registerTask('seo', 'First launch server, then prerender and replace', function (target) {
grunt.task.run([
'concurrent:seo' //Step 1: in parallel, launch the server, then perform the so-called seotasks
]);
});
grunt.registerTask('seotasks', [
'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
'wait', // wait 1.5 sec to make sure that server is launched
'prerender', //Step 2: create a snapshot of your website
'replace', //Step 3: clean the mess
'sitemap', //Create a sitemap of your production environment
'aws_s3:dev' //Step 4: upload
]);
Step 1: Launch local server with concurrent:seo
We first need to launch a local server (like grunt serve) so that we can take snapshots of our website.
//grunt config
concurrent: {
seo: [
'connect:dist:keepalive', //Launching a server and keeping it alive
'seotasks' //now that we have a running server we can launch the SEO tasks
]
}
Step 2: Create a snapshot of your website with grunt prerender
The grunt-prerender plugin allows you to take a snapshot of any website using PhantomJS. In our case we want to take a snapshot of all the pages of the localhost website we just launched.
//grunt config
prerender: {
options: {
sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
//As you can see the source urls allow for multiple languages provided you have different states for different languages (see note below for that)
urls: ['/', '/projects/', '/portal/','/en/', '/projects/en/', '/portal/en/','/fr/', '/projects/fr/', '/portal/fr/'],//this var can be dynamically updated, which is done in my case in the callback of the http task
hashed: true,
dest: 'dist/SEO/',//where your static html files will be stored
timeout:5000,
interval:5000, //taking a snapshot of how the page looks after 5 seconds.
phantomScript:'basic',
limit:7 //# pages processed simultaneously
}
}
Step 3: Clean the mess with grunt replace
If you open the pre-rendered files, they will work for crawlers, but not for humans. For humans using Chrome, your directives will load twice. Therefore you need to redirect intelligent browsers to your home page before angular gets activated (i.e. right after head).
//Add the script tag to redirect if we're not a search bot
replace: {
dist: {
options: {
patterns: [
{
match: '<head>',
//redirect to a clean page if not a bot (to your index.html at the root basically).
replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
//note: your hashbang (#) will still work.
}
],
usePrefix: false
},
files: [
{expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''}
]
}
Also make sure you have this code in your index.html on your ui-view element, which clears all the generated html directives BEFORE angular starts.
<div ui-view autoscroll="true" id="ui-view"></div>
<!-- this script is needed to clear ui-view BEFORE angular starts to remove the static html that has been generated for search engines who cannot read angular -->
<script>
if(!/bot|googlebot|crawler|spider|robot|crawling/i.test( navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>
Step 4: Upload to aws
You first upload your dist folder which contains your build. Then you overwrite it with the files you prerendered and updated.
aws_s3: {
options: {
accessKeyId: "<%= aws.accessKeyId %>", // Use the variables
secretAccessKey: "<%= aws.secret %>", // You can also use env variables
region: 'eu-west-1',
uploadConcurrency: 5, // 5 simultaneous uploads
},
dev: {
options: {
bucket: 'xxxxxxxx'
},
files: [
{expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
{expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true},
]
}
}
That's it, you have your solution! Both humans and bots will be able to read your web app.
If you use ng-cloak in interesting ways, there could be a good solution here.
I haven't tried this myself, but it should work in theory.
The solution is highly dependent on CSS, but it should work perfectly well.
For example, say you have three states in your angular app:
- index (pathname: #/)
- about (pathname: #/about)
- contact (pathname: #/contact)
The base case for index can be added in too, but that's tricky, so I'll leave it out for now.
Make your HTML look like this:
<body>
<div ng-app="myApp" ng-cloak>
<!-- Your whole angular app goes here... -->
</div>
<div class="static">
<div id="about class="static-other">
<!-- Your whole about content here... -->
</div>
<div id="contact" class="static-other">
<!-- Your whole contact content here... -->
</div>
<div id="index" class="static-main">
<!-- Your whole index content here... -->
</div>
</div>
</body>
(It's important that you put your index case last, if you want to make it more awesome.)
Next, make your CSS look something like this:
[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
Just that will probably work well enough for you anyway.
The ng-cloak directive will keep your angular app hidden while angular is not loaded and will show your static content instead. Google will get your static content in the HTML.
As a bonus, end-users can also see well-styled static content while angular loads.
You can then get more creative if you start using :target pseudo-selectors in your CSS. You can use actual links in your static content; just make them links to the various ids. So in the #index div make sure you have links to #about and #contact. Note the missing '/' in the links: HTML ids can't start with a slash.
Then make your CSS look like this:
[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
.static-other {display: none;}
.static-other:target {display: block;}
.static-other:target ~ .static-main {display: none;}
You now have a fully functioning static app WITH ROUTING that works before angular starts up.
As an additional bonus, when angular starts up it is smart enough to convert #about to #/about automatically, and the experience shouldn't even break at all.
Also, let's not forget that the SEO problem is totally solved, of course. I've not used this technique yet, as I've always had a server to configure, but I'm very interested in how this works out for you.
Hope this helps.
As AWS is offering Lambda@Edge as a service, we can handle this issue without grunt or anything else (at least for basic stuff).
I tried Lambda@Edge and it worked as expected. In my case I just had all the routes set to "/" in Lambda@Edge (except for the files that are present in S3, like css, images, etc.).
The event I set the Lambda to is "viewerRequest", and the following is the code.
'use strict';
exports.handler = (event, context, callback) => {
console.log("Event received is", JSON.stringify(event));
console.log("Context received is", context);
const request = event.Records[0].cf.request;
if (request.uri.endsWith(".rt")) {
console.log("URI is matching with .rt, the URI is ", request.uri);
request.uri = "/";
} else {
console.log("URI is not ending with rt so letting it go URI is", request.uri);
}
console.log("Final request URI is", request.uri);
callback(null, request);
};
Logs in CloudWatch are a little difficult to check, as they are populated in the CloudWatch region nearest to the edge location handling the request. For example, though this Lambda is deployed/written for us-east, I see the logs in the ap-south region, as I am accessing CloudFront from Singapore.
I checked it with the "Fetch as Google" option in Google Webmaster Tools, and the page is rendered and viewed as expected.
I've been looking for days to find a solution for this. As far as I know, there isn't a nice solution to the problem. I hope Firebase will eventually enable user-agent redirects. If you have the money you could use MaxCDN enterprise. They offer Edge Rules, which include redirects by user agent.
https://www.maxcdn.com/features/rules/
I'm working on a macro recording and playback system with Selenium and JavaScript. At some point I run a JavaScript snippet that basically subscribes a new event handler to all window events and dumps some event data to localStorage, which I will collect later. The problem is that when the user clicks a link, or the page is reloaded for some other reason, the event handlers are lost. All the data so far is still in localStorage, but I cannot continue collecting new data.
I don't have control of the server, so I cannot insert code in the page source. I can only control the browser using Selenium, so all I can do is execute some JavaScript at some point to start dumping events, and some JavaScript at a later point to recover the event data. The user might be browsing StackOverflow, for all I know.
Is there any workaround?
PS: I'm using selenium for python, if that matters.
If you use a proxy, such as Squid, then you can integrate it with an ICAP* server to transform the pages before they arrive at your browser. This would allow any page to be altered before it reaches the browser, inserting your JavaScript.
Squid version 3 or greater comes with integrated ICAP support.
* Internet Content Adaptation Protocol - defined in RFC 3507
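As a rough sketch, the Squid side of that setup looks like this (the service name and ICAP endpoint are assumptions, and the ICAP server that actually rewrites the HTML to insert your JavaScript is yours to provide):
# squid.conf: route every response through a local ICAP RESPMOD service.
icap_enable on
icap_service js_inject respmod_precache icap://127.0.0.1:1344/respmod
adaptation_access js_inject allow all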
I think I have a really good, simple solution: to make the injection easiest, you can use the gatejs SPDY/HTTP proxy and its injection opcode - it works on both forward and reverse proxies.
Gatejs injection will try to add your html code to any content of type HTML (text/html).
Below is a forward proxy example using injection.
var serverConfig = function(bs) { return({
hostname: "testServer0",
runDir: "/tmp/gatejs",
dataDir: "/path/to/dataDir",
logDir: "/var/log/gatejs",
http: {
testInterface: {
type: 'forward',
port: 8080,
pipeline: 'pipetest'
},
},
pipeline: {
pipetest: [
['injection', {
code: "<h1>w00t injection</h1>"
}],
['proxyPass', { mode: 'host', timeout: 10 }]
],
}
})};
mk-