I'm currently working on a small web app which should implement a cached first scenario (users download the wep app in a wifi provided base and then should be able use it offline outside)
I'm not using any framework and therefore implement the caching (SW) myself.
As I also integrate some playcanvas content (which has its own loading screen) over an iframe I was wondering what overall strategy in terms of loading would make sense.
In a similar project I simply let the service worker download the assets parallel to the (initial) load of the application.
But it came to my mind that that it would be better to implement a workflow which is closer to an native app behavior - meaning showing a overall loading screen during the service worker download process and building/showing my main application after this process is finished (or did fail -> forced network scenario or did happen before -> offline scenario). Another solution would be to show a non blocking "Assets are still being downloaded" banner.
The main thoughts leading mit to the second workflow where:
The SW-Loading screen / banner could provide better feedback to the user: "All assets downloaded - I'm safe to go offline", while the old scenario could cause issues here - successfully showing the the user the first state - while some critical files are still downloaded in the back.
With the SW-Loading screen the download process is a bit more controllable/understandable for me - as the parallel process of an SW-Download and the Playcanvas Loading for example become sequential.
It would be great if someone could provide me feedback/info:
if I'm on the right track with this second scenario for being better or it just being overhead
how / if it might be possible to implement a cheap loading screen, meaning for example 100 of 230 Files downloaded or else.
better strategies for this scenario in general
As always, thanks for any heads up in advance.
A lot of this comes down to what you want your users to experience. The underlying technology is there to accomplish any of the scenarios you outline.
For instance, if you want to show information about the precaching progress during initial service worker installation, you could do that by adding code along the lines of the following.
In your service worker:
const PRECACHE_NAME = "...";
const URLS_TO_PRECACHE = [
// ...
];
async function notifyClients(urlsCached, totalURLs) {
const clients = await self.clients.matchAll({ includeUncontrolled: true });
for (const client of clients) {
client.postMessage({ urlsCached, totalURLs });
}
}
self.addEventListener("install", (event) => {
event.waitUntil(
(async () => {
const cache = await caches.open(PRECACHE_NAME);
const totalURLs = URLS_TO_PRECACHE.length;
let urlsCached = 0;
for (const urlToPrecache of URLS_TO_PRECACHE) {
await cache.add(urlToPrecache);
urlsCached++;
await notifyClients(urlsCached, totalURLs);
}
})()
);
});
In your client pages:
// Optional: if controller is not set, then there isn't already a
// previous service worker, so this is a "first-time" install.
// If you would prefer, you could add this event listener
// unconditionally, and you'll get update messages even when there's an
// updated service worker.
if (!navigator.serviceWorker.controller) {
navigator.serviceWorker.addEventListener("message", (event) => {
const { urlsCached, totalURLs } = event.data;
// Display a message about how many URLs have been cached.
});
}
Related
I know you can use navigator onLine inside the renderer process because it's a rendered inside a browser. But what I'm trying to do is something like this in the main process:
if (navigator.onLine){
mainWindow.loadURL("https://google.com")
} else {
mainWindow.loadFile(path.join(__dirname, 'index.html'));
}
So basically if the user is offline, just load a local html file, and if they're online, take them to a webpage. But, like expected, I keep getting the error that 'navigator is not defined'. Does anyone know how can I somehow import the navigate cdn in the main process? Thanks!
TL;DR: The easiest thing to do is to just ask Electron. You can do this via the net module from within the Main Process:
const { net } = require ("electron");
const isInternetAvailable = () => return net.isOnline ();
// To check:
if (isInternetAvailable ()) { /* do something... */ }
See Electron's documentation on the method; specifically, this approach doesn't tell you whether your service is accessible via the internet, but rather that a service can be contacted (or not even this, as the documentation mentions links which would not involve any HTTP request at all).
However, this is not a reliable measurement and you might want to increase its hit rate by manuallly checking whether a certain connection can be made.
In order to check whether an internet connection is available, you'll have to make a connection yourself and see if it fails. This can be done from the Main Process using plain NodeJS:
// HTTP code basically from the NodeJS HTTP tutorial at
// https://nodejs.dev/learn/making-http-requests-with-nodejs/
const https = require('https');
const REMOTE_HOST = "google.com"; // Or your domain
const REMOTE_EP = "/"; // Or your endpoint
const REMOTE_PAGE = "https://" + REMOTE_HOST + REMOTE_EP;
function checkInternetAvailability () {
return new Promise ((resolve, reject) => {
const options = {
hostname: REMOTE_HOST,
port: 443,
path: REMOTE_EP,
method: 'GET',
};
// Try to fetch the given page
const req = https.request (options, res => {
// Yup, that worked. Tell the depending code.
resolve (true);
req.destroy (); // This is no longer needed.
});
req.on ('error', error => {
reject (error);
});
req.on ('timeout', () => {
// No, connection timed out.
resolve (false);
req.destroy ();
});
req.end ();
});
}
// ... Your window initialisation code ...
checkInternetAvailability ().then (
internetAvailable => {
if (internetAvailable) mainWindow.loadURL (REMOTE_PAGE);
else mainWindow.loadFile (path.join (__dirname, 'index.html'));
// Call any code needed to be executed after this here!
}
).catch (error => {
console.error ("Oops, couldn't initialise!", error);
app.quit (1);
});
Please note that this code here might not be the most desirable since it just "crashes" your app with exit code 1 if there is any error other than connection timeout.
This, however, makes your startup asynchronous, which means that you need to pay attention on the execution chain of your app startup. Also, startup may be really slow in case the timeout is reached, it may be worth considering NodeJS' http module documentation.
Also, it makes sense to actually try to retrieve the page you're wanting to load in the BrowserWindow (constant values REMOTE_HOST and REMOTE_EP), because that also gives you an indication whether your server is up or not, although that means that the page will be fetched twice (in the best case, when the connection test succeeds and when Electron loads the page into the window). However, that should not be that big of a problem, since no external assets (images, CSS, JS) will be loaded.
One last note: This is not a good metric of whether any internet connection is available, it just tells you whether your server answered within the timeout window. It might very well be that any other service works or that the connection just is very slow (i.e., expect false negatives). Should be "good enough" for your use-case though.
Using Node.js, Chrome and puppeteer as headless on ubuntu server, I'm scraping a few different websites. One of the occasional task is to interact with the loaded page (click on a link to open another page and then possibly do another click to accept the terms and such).
I can do all this just fine, but I'm trying to understand how it will work if I have multiple pages open simultaneously and am trying to interact with different loaded pages at the same time (overlapping times).
To visualize this, I'm thinking how a user will do the same job. They'll have to open multiple browser windows, open the page and switch between them to see and then click on links.
But using puppeteer, we have separate browser object, we don't need to see the window or page to know where to click. We can traverse it through the browser object and then do a click on desired element without looking (headless).
I'm thinking I should be able to do multiple pages at the same time as long as I have CPU and memory available to handle them.
Does anyone have any experience with puppeteer interacting with multiple websites simultaneously? Anything I need to watch out for?
This is the problem the library puppeteer-cluster (I'm the author) is addressing. It allows you to build a pool of pages (or browsers) to use and run tasks inside.
You find several general code samples in the repository (and also on stackoverflow). Let me address your specific use case of running different tasks with an example.
Code Sample
The following code creates two tasks:
crawl: Opens the page and extracts an URL to then start the second task
screenshot: Takes a screenshot of the extracted URL
The process is started by queuing the crawl task with the URLs.
const { Cluster } = require('puppeteer-cluster');
(async () => {
const cluster = await Cluster.launch({ // use four pages in parallel
concurrency: Cluster.CONCURRENCY_PAGE,
maxConcurrency: 4,
});
// We define two tasks
const crawl = async ({ page, data: url }) => {
await page.goto(url);
const extractedURL = /* ... */; // extract an URL (or multiple) from the document somehow
cluster.queue(extractedURL, screenshot);
};
const screenshot = async ({ page, data: url }) => {
await page.goto(url);
await page.screenshot();
};
// Crawl some pages
cluster.queue('https://www.google.com/', crawl);
cluster.queue('https://github.com/', crawl);
// Wait until everything is done and close the cluster
await cluster.idle();
await cluster.close();
})();
This is a minimal example. I left out error handling, monitoring and the setup options.
I can usually get 5 or so browsers going on a 4GB server, if you're just popping urls off a queue it's pretty straightforward:
const puppeteer = require('puppeteer');
let queue = [
'http://www.amazon.com',
'http://www.google.com',
'http://www.fabebook.com',
'http://www.reddit.com',
]
const doQueue = async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
let url
while(url = queue.shift()){
await page.goto(url)
console.log(await page.title())
}
await browser.close()
}
[1,2,3].map(() => doQueue())
I am using workbox v4.3.1 to provide offline capability to the users of the web application.
While everything works perfectly in Chrome as you would expect a PWA to work (i.e everything is cached locally, all the updates from the APP are captured in IndexedDB and synced back to the server when the application is back online.
However the major use case for me is to provide support for iOS Safari and as a PWA.
While all the pages are cached locally using the Service Worker in Safari and all the offline updates are also captured in the indexed DB as shown below,
However, when the connection returns online, the sync event is not triggered by the browser (Safari in this case). While background sync is not supported natively by Safari, I would expect that when I refresh the page, SW initialisation should trigger the sync event manually if it finds some data to be refreshed to the server in the indexed DB.
But this is not happening and I tried to manually listen for the "message" - "replayRequests" and then replay the requests - that did not work as well.
Any help here would be appreciated. Here is the service worker code for reference.
// If we're not in the context of a Web Worker, then don't do anything
if ("function" === typeof importScripts) {
importScripts(
"https://storage.googleapis.com/workbox-cdn/releases/4.3.1/workbox-sw.js"
);
//Plugins
// Background Sync Plugin.
const bgSyncPlugin = new workbox.backgroundSync.Plugin("offlineSyncQueue", {
maxRetentionTime: 24 * 60
});
// Alternate method for creating a queue and managing the events ourselves.
const queue = new workbox.backgroundSync.Queue("offlineSyncQueue");
workbox.routing.registerRoute(
matchCb,
workbox.strategies.networkOnly({
plugins: [
{
fetchDidFail: async ({ request }) => {
await queue.addRequest(request);
}
}
]
}),
"POST"
);
// CacheKeyControlPlugin
const myCacheKeyPlugin = {
cacheKeyWillBeUsed: async ({ request, mode }) => {
normalizedUrl = removeTimeParam(request.url);
return new Request(normalizedUrl);
}
};
if (workbox) {
console.info("SW - Workbox is available and successfully installed");
} else {
console.info("SW - Workbox unavailable");
}
//Intercept all api requests
var matchCb = ({ url, event }) => {
// Filter out the presence api calls
return url.pathname.indexOf("somethingidontwanttocache") == -1;
};
function removeTimeParam(urlString) {
let url = new URL(urlString);
url.searchParams.delete("time");
return url.toString();
}
/* //Pre cache a page and see if it works offline - Temp code
workbox.precaching.precache(getPageAPIRequestURLs(), {
cleanUrls: false
}); */
workbox.routing.registerRoute(
matchCb,
new workbox.strategies.CacheFirst({
cacheName: "application-cache",
plugins: [myCacheKeyPlugin]
})
);
self.addEventListener("message", event => {
if (event.data === "replayRequests") {
queue.replayRequests();
}
});
}
workbox-background-sync emulates background sync functionality in browsers that lack native support by replaying queued requests whenever the service worker process starts up. The service worker process is meant to be lightweight and short lived, and is killed aggressively when there's a period of time without any events, and then is started up again in response to further events.
Reloading a web page may cause the service worker process to start up, assuming it had previously been stopped. But if the service worker is still running, then reloading the page will just cause a fetch event to be fired on the existing process.
The interval at which a service worker process can remain idle before it's killed is browser-dependent.
Chrome's DevTools offers a method of inspecting the state of a service worker and starting/stopping it on demand, but I don't believe Safari's DevTools offers that functionality. If you wanted to guanratee that a service worker was stopped and then start it up again, I would quit Safari, reopen it, and then navigate back to your web app.
My team and I have a project that was originally built as a PWA, but have since decided to scrap that idea as we realized it would need to change much more frequently than originally intended. However, the service worker is already live, as well as a newly redesigned landing page for the website. Despite all our efforts to clear the PWA caching, our clients are still reporting that they are receiving the old cached version of the website.
Currently, we have the service worker set up to delete all caches upon install (and whenever anything at all happens as a precaution), as well as some JavaScript to unregister the service worker when the new page actually loads. However, the problem is that none of this runs until the user makes a request to the website, and at that point the browser is already loading the cached content. Is it possible to clear this cache and prevent the browser from loading any content that was already cached?
Current service-worker.js
// Caching
var cacheCore = 'mkeSculptCore-0330121058';
var cacheAssets = 'mkeSculptAssets-0330121058';
self.addEventListener('install', function (event) {
self.skipWaiting();
caches.keys().then(function (names) {
for (let name of names)
caches.delete(name);
});
});
self.addEventListener('activate', function (event) {
caches.keys().then(function (names) {
for (let name of names)
caches.delete(name);
});
});
self.addEventListener('fetch', function (event) {
caches.keys().then(function (names) {
for (let name of names)
caches.delete(name);
});
});
Script in index.html
(function () {
if ('serviceWorker' in navigator) {
navigator.serviceWorker.getRegistrations().then(function (registrations) {
//returns installed service workers
if (registrations.length) {
for (let registration of registrations) {
registration.unregister();
}
}
});
}
})();
So far, I've read a few other similar StackOverflow answers, including this one, but they tend to rely on users manually doing something to fetch the new content, ie. via a hard reload or disabling the service worker manually through the browser settings. However, in my case, we cannot rely on manual user actions.
One way to solve this issue is to add a timestamp at end of the file(js, css) name so each time when it is making a request, the cache key is not available in the service worker and thus it tends to get a new version of the file at each load.
<script type="text/javascript" src="/js/scipt1.js?t=05042018121212"/>
For appending a new timestamp dynamically in the file name, please check this answer
But this may not be reliable if HTML itself is cached.
add this before to "update" all contents:
$.each(['index.html','file1.js','file2.js','file3.js'],function(index,file) {
$.get(file+'?t='+new Date().getTime(), function(){});
});
location.reload(true);
for the ServiceWorker to stop all windows using it must be closed.
If it's a webapp you can use window.close();
This code just loads a fresh version of the files in the list.
If there are any internal caches they will be all updated.
I'm coding a script in nodejs to automatically retrieve data from an online directory.
Knowing that I had never done this, I chose javascript because it is a language I use every day.
I therefore from the few tips I could find on google use request with cheerios to easily access components of dom of the page.
I found and retrieved all the necessary information, the only missing step is to recover the link to the next page except that the one is generated 4 seconds after loading of page and link contains a hash so that this step Is unavoidable.
What I would like to do is to recover dom of page 4-5 seconds after its loading to be able to recover the link
I looked on the internet, and much advice to use PhantomJS for this manipulation, but I can not get it to work after many attempts with node.
This is my code :
#!/usr/bin/env node
require('babel-register');
import request from 'request'
import cheerio from 'cheerio'
import phantom from 'node-phantom'
phantom.create(function(err,ph) {
return ph.createPage(function(err,page) {
return page.open(url, function(err,status) {
console.log("opened site? ", status);
page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function(err) {
//jQuery Loaded.
//Wait for a bit for AJAX content to load on the page. Here, we are waiting 5 seconds.
setTimeout(function() {
return page.evaluate(function() {
var tt = cheerio.load($this.html())
console.log(tt)
}, function(err,result) {
console.log(result);
ph.exit();
});
}, 5000);
});
});
});
});
but i get this error :
return ph.createPage(function (page) {
^
TypeError: ph.createPage is not a function
Is what I am about to do is the best way to do what I want to do? If not what is the simplest way? If so, where does my error come from?
If You dont have to use phantomjs You can use nightmare to do it.
It is pretty neat library to solve problems like yours, it uses electron as web browser and You can run it with or without showing window (You can also open developer tools like in Google Chrome)
It has only one flaw if You want to run it on server without graphical interface that You must install at least framebuffer.
Nightmare has method like wait(cssSelector) that will wait until some element appears on website.
Your code would be something like:
const Nightmare = require('nightmare');
const nightmare = Nightmare({
show: true, // will show browser window
openDevTools: true // will open dev tools in browser window
});
const url = 'http://hakier.pl';
const selector = '#someElementSelectorWitchWillAppearAfterSomeDelay';
nightmare
.goto(url)
.wait(selector)
.evaluate(selector => {
return {
nextPage: document.querySelector(selector).getAttribute('href')
};
}, selector)
.then(extracted => {
console.log(extracted.nextPage); //Your extracted data from evaluate
});
//this variable will be injected into evaluate callback
//it is required to inject required variables like this,
// because You have different - browser scope inside this
// callback and You will not has access to node.js variables not injected
Happy hacking!