I'm working on a closed system web application to aid companies in their everyday online commerce chores. That means on the one hand that it won't be open to the public, on the other: it will have to deal with large amounts of data while maintaining a fluent work experience.
This is why I turned to web workers in JS to run all sorts of database access and data loading in the background.
My understanding is that not only does the main UI/main JS remain uninterrupted, but the different web workers also run without hindering each other.
I now have the following setup:
mainJS: function statusCheck which runs on pageload:
function statusCheck() {
if(typeof(w__statusCheck) == "undefined") {
var w__statusCheck = new Worker("...statusCheck.js");
w__statusCheck.postMessage("go");
w__statusCheck.onmessage = function(e) {
var message = JSON.parse(e.data);
if(message.text!=undefined) displayMessage(message.text);
}
}
statusCheck.js which is the worker simply goes like this:
function checkStatus() {
console.log("statusCheck started");
// I will leave standard parts out:
// creating and testing the ajax variable against different browsers
ajaxRequest.onreadystatechange = function() {
if(ajaxRequest.readyState == 4) {
self.postMessage(ajaxRequest.responseText);
var timer;
timer = self.setTimeout(function(){
checkStatus();
}, 1000);
}
}
ajaxRequest.open("GET", "...worker_statusCheck.php", true);
ajaxRequest.send(null);
}
this.onmessage = function(e){
checkStatus();
};
As you can see, this restarts itself every second (for now). The interval might be longer in production.
worker_statusCheck.php simply gets different things from the database and knits them into a JSON object which gives me the system status.
This works beautifully.
Now I have another worker which is supposed to get initiated by a click on a link to effectively call some php to perform actions:
mainJS loadWorker
function loadWorker(url="") {
console.log("loadWorker started");
if(url!="") {
var uniqueID = "XXX" // creating a random ID based on timestamp and Math.random()
if(typeof(window[uniqueID]) == "undefined") {
var variables = { ajaxURL: url };
window[uniqueID] = new Worker("....loadWorker.js");
window[uniqueID].postMessage(JSON.stringify(variables));
window[uniqueID].onmessage = function(e) {
var message = JSON.parse(e.data);
if(message["success"]!=undefined) {
variables["close"] = "yes";
window[uniqueID].postMessage(JSON.stringify(variables));
}
}
}
With every click on a certain link this gets called, creates a uniquely named worker, runs it, receives the data and tells the worker to close().
The php again does its thing and writes a progress update in the DB after each step of the lengthy procedure. These progress updates I fetch from the DB with the above repeating statusCheck.
Now, I can see the entries in the DB with timestamp, so I know they get written each at their time.
So, both workers do their job and run reliably. But I have noticed, that whenever I initiate the manual (randomly named) worker the statusCheck actually stops performing. It just gets stuck... I was able to confirm this with console output from both workers. So it's not the main JS that seems stuck, but the statusCheck actually pauses... and resumes when loadWorker is done.
Am I missing something fundamental here? Any insight would be appreciated since I'm new to this concept of web workers.
Thanx :)
Your question lacks resources to truly figure out what exactly goes wrong. I can concur that two web workers can operate at the same time, even with synchronous operations. I tested this for both for loops and sync XHR requests.
There are multiple things I would recommend though.
First - unless you're processing the data with some CPU-heavy algorithm, web workers are a waste of time. XHR requests do not block the main thread (unless you explicitly ask them to).
In statusCheck() you declare var w__statusCheck, which makes it a local variable. Therefore it will always be undefined as seen from the outer scope, and because the declaration is hoisted, the typeof check inside the function is always true as well. The worker might also get garbage-collected once no code is running in it.
Do not use XMLHttpRequest.onreadystatechange. Use onload and onerror.
Random unique IDs for variables are almost always wrong. If you need to store the worker reference at all, either give it a reasonable name (e.g. the URL it's supposed to load) or use an incremental ID.
Do NOT stringify data that you post to a web worker. The browser already serializes it for you, possibly in a more optimal manner. Converting the data to something else first is the single most common stupid thing people do with web workers.
Also, when posting a question, at least make sure the code makes some sense. In your post the curly braces do not match.
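As a minimal sketch of the points about variable scope and posting plain objects: the names statusWorker, startStatusCheck, and createWorker are made up for illustration, and the worker file name in the comment is an assumption, not the path from the question.

```javascript
// Declared outside the function so the reference survives between calls.
let statusWorker;

// createWorker is a hypothetical factory; passing it in keeps this testable.
function startStatusCheck(createWorker, displayMessage) {
  if (statusWorker !== undefined) return statusWorker; // already running
  statusWorker = createWorker();
  statusWorker.onmessage = function (e) {
    const message = e.data; // already an object, so no JSON.parse is needed
    if (message.text !== undefined) displayMessage(message.text);
  };
  statusWorker.postMessage({ command: "go" }); // plain object, no JSON.stringify
  return statusWorker;
}

// Assumed browser usage:
// startStatusCheck(() => new Worker("statusCheck.js"), displayMessage);
// Worker side: self.postMessage({ text: "3 jobs pending" });
```

Calling startStatusCheck a second time returns the existing worker instead of creating another one.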
Alright.. I figured it out:
I was looking in all the wrong places. Turns out, I had initialized my php session in all the php scripts which are called by the workers. And my two parallel workers both called one. So the session file was locked by the first php script and the second had to wait until it was back open again. It was not the workers or the JS being hindered, it was the php.
I now took out the session initialization from my statusCheck.php and it works like a charm. I will keep it in those others that handle the user input responses because there it actually makes sense: user clicks on button "compile data XY" which is run by the worker and takes a while. Impatient as he is he already clicks the next button "show this data"... and due to the locked session file I have sort of a neat queue for those actions. :)
I still will take above recommendations to heart and see to it to improve my code. :)
Related
I have a wasm process (compiled from c++) that processes data inside a web application. Let's say the necessary code looks like this:
std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
process_data(data[i]);
if (i % 1000 == 0) {
bool is_cancelled = check_if_cancelled();
if (is_cancelled) {
break;
}
}
}
This code basically "runs/processes a query" similar to a SQL query interface:
However, queries may take several minutes to run/process and at any given time the user may cancel their query. The cancellation would originate in the normal JavaScript/web application, outside of the web worker running the wasm.
My question then is: what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process, so that it knows the query has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker (it needs to stay alive with its stored data, so another query can be run...).
What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?
Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.
Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:
1. Use two workers, with one worker storing the data and another worker processing the data. This way the processing worker can be terminated and the data will always remain. Feasible? Not really, as it would take far too much time to copy ~500MB of data over to the web worker whenever it starts. This could (previously) have been done using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...
2. Use a single worker using Emterpreter and emscripten_sleep_with_yield. Feasible? No, using Emterpreter destroys performance (as mentioned in its docs) and slows down all queries by about 4-6x.
3. Always run a second worker and in the UI just display the most recent. Feasible? No, it would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).
4. Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to poll with network requests every second from every running query.
5. Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require a tremendous amount of rewriting of the parsing functions so that every function supports this (the actual data parsing is handled in several functions: filter, search, calculate, group by, sort, etc.).
6. Use IndexedDB and store the state in JavaScript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read the database there and fill the memory at that pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.
[Anything else?]
In the bounty then I was wondering three things:
If the above six analyses seem generally valid?
Are there other (perhaps better) approaches I'm missing?
Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.
For Chrome (only) you may use shared memory (shared buffer as memory). And raise a flag in memory when you want to halt. Not a big fan of this solution (is complex and is supported only in chrome). It also depends on how your query works, and if there are places where the lengthy query can check the flag.
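A rough sketch of the flag idea, assuming SharedArrayBuffer is available (in current browsers this generally requires the page to be cross-origin isolated). The names requestCancel, isCancelled, and processAll are hypothetical, and processItem stands in for the per-row work that the question does in C++.

```javascript
// One 32-bit slot of shared memory used as a cancel flag.
// In a real page the buffer would be created on the main thread and
// posted to the worker once; both sides then see the same memory.
const flagBuffer = new SharedArrayBuffer(4);
const cancelFlag = new Int32Array(flagBuffer);

// Main thread: called from the cancel button's click handler.
function requestCancel(flag) {
  Atomics.store(flag, 0, 1);
}

// Worker: a cheap check that does not depend on the event loop.
function isCancelled(flag) {
  return Atomics.load(flag, 0) === 1;
}

// Worker: process items, polling the flag every `checkEvery` items.
// Returns how many items were processed before stopping.
function processAll(items, flag, processItem, checkEvery = 1000) {
  let processed = 0;
  for (let i = 0; i < items.length; i++) {
    if (i % checkEvery === 0 && isCancelled(flag)) break;
    processItem(items[i]);
    processed++;
  }
  return processed;
}
```

Because the flag lives in shared memory, the loop can see the cancel request without returning to the event loop, which is the part a plain postMessage cannot do while the worker is busy.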
Instead you should probably call the c++ function multiple times (e.g. for each query) and check if you should halt after each call (just send a message to the worker to halt).
What I mean by multiple times is to run the query in stages (multiple function calls for a single query). It may not be applicable in your case.
Regardless, AFAIK there is no way to send a signal to a Webassembly execution (e.g. Linux kill). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.
I'm attaching a code snippet that may explain this idea.
worker.js:
... init webassembly
onmessage = function(q) {
// query received from main thread.
const result = ... call webassembly(q);
postMessage(result);
}
main.js:
const worker = new Worker("worker.js");
// Both flags are reassigned below, so they are declared with let, not const.
let cancel = false;
let processing = false;
worker.onmessage = function(r) {
// when worker has finished processing the query.
// r.data is the results of the processing.
processing = false;
if (cancel === true) {
// processing is done, but result is not required.
// instead of showing the results, update that the query was canceled.
cancel = false;
// ... update UI "canceled".
return;
}
// ... update UI "results r.data".
};
function onCancel() {
// Occurs when user clicks on the cancel button.
if (cancel) {
// sanity test - prevent this in UI.
throw "already cancelling";
}
cancel = true;
// ... update UI "canceling".
}
function onQuery(q) {
if (processing === true) {
// sanity test - prevent this in UI.
throw "already processing";
}
processing = true;
// Send the query to the worker.
// When the worker receives the message it will process the query via webassembly.
worker.postMessage(q);
}
An idea from user experience perspective:
You may create ~two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once. (it will just mean that in the backend the 2nd worker will run the next query, and when the 1st finishes the cancellation, cancellation will again become immediate).
Shared Thread
Since the worker and the C++ function that it called share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think a solid option would be to minimize the amount of time that the thread is blocked by instead initiating one iteration at a time from the main application.
It would look something like this.
main.js -> worker.js -> C++ function -> worker.js -> main.js
Breaking up the Loop
Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory.
C++ function then performs one iteration of the loop, increments the variable to keep track of loop position, and immediately breaks.
size_t x = 0; // counter initialized at 0; it must live outside the function so it persists between calls
std::vector<JSONObject> data;
for (size_t i = x; i < data.size(); i++)
{
process_data(data[i]);
x++; // increment counter
break; // stop function until told to iterate again starting at x
}
Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.
Canceling the Operation
From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)
let continueOperation = true
// here you can set to false at any time since the thread is not blocked here
worker.expensiveThreadBlockingFunction()
// results in one iteration of the loop being iterated until message is received below
worker.onmessage = function(e) {
if (continueOperation) {
worker.expensiveThreadBlockingFunction()
// execute worker function again, ultimately continuing the increment in C++
} else {
return false
// or send message to worker to reset C++ counter to prepare for next execution
}
}
Continuing the Operation
Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.
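The stepwise flow described above could be sketched on the worker side roughly as follows. createSteppedQuery and processChunk are hypothetical names; processChunk stands in for an exported wasm function that handles rows [start, end). Processing a chunk per message rather than a single row is an assumption made here to keep the messaging overhead down.

```javascript
function createSteppedQuery(total, chunkSize, processChunk) {
  let position = 0; // plays the role of the C++ counter x
  return {
    // Runs one chunk and reports whether more work remains.
    step() {
      const end = Math.min(position + chunkSize, total);
      processChunk(position, end);
      position = end;
      return { position, finished: position >= total };
    },
    // Prepares for the next query after a cancel.
    reset() {
      position = 0;
    },
  };
}

// Assumed wiring inside the worker (the message names are made up):
// const query = createSteppedQuery(rowCount, 10000, wasmProcessRows);
// onmessage = (e) => {
//   if (e.data === "step") postMessage(query.step());
//   if (e.data === "reset") query.reset();
// };
// The main thread sends "step" again only while the user has not cancelled
// and the last reply did not say finished.
```

The finished field is the distinct "completed" signal mentioned above, so the main thread knows when to stop asking for more steps.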
I've done an HTML form which has a lot of questions (coming from a database) in many different tabs. User then gives answers in those questions. Each time a user changes a tab my Javascript creates a save. The problem is that I have to loop through all questions each time the tab is changed and it freezes the form for about 5 seconds every time.
I've been searching for an answer how I can run my save function in the background. Apparently there is no real way to run something in the background and many recommend using setTimeout(); For example this one How to get a group of js function running in background
But none of these examples explains or takes into consideration that even if I use something like setTimeout(saveFunction, 2000); it doesn't solve my problem. It only postpones it by 2 seconds in this case.
Is there a way to solve this problem?
You can use web workers. Some of the older answers here say that they're not widely supported (which I guess they weren't when those answers were written), but today they're supported by all major browsers.
To run a web worker, you need to create an instance of the built-in Worker class. The constructor takes one argument which is the URI of the javascript file containing the code you want to run in the background. For example:
let worker = new Worker("/path/to/script.js");
Web workers are subject to the same origin policy so if you pass a path like this the target script must be on the same domain as the page calling it.
If you don't want to create a new JavaScript file just for this, you can also use a data URI:
let worker = new Worker(
`data:text/javascript,
//Enter Javascript code here
`
);
Because of the same origin policy, you can't send an AJAX request from a data URI, so if you need to send an AJAX request in the web worker, you must use a separate Javascript file.
The code that you specify (either in a separate file or in a data URI) will be run as soon as you call the Worker constructor.
Unfortunately, web workers have access to neither the outside JavaScript variables, functions, or classes nor the DOM, but you can get around this by using the postMessage method and the onmessage event. In the outside code, these are members of the worker object (worker in the example above), and inside the worker, these are members of the global context (so they can be called either via this or with nothing in front).
postMessage and onmessage work both ways, so when worker.postMessage is called in the outside code, onmessage is fired in the worker, and when postMessage is called in the worker, worker.onmessage is fired in the outside code.
postMessage takes one argument, which is the variable you want to pass (but you can pass several variables by passing an array). Unfortunately, functions and DOM elements can't be passed, and when you try to pass an object, only its attributes will be passed, not its methods.
onmessage takes one argument, which is a MessageEvent object. The MessageEvent object has a data attribute, which contains the data sent using the first argument of postMessage.
Here is an example using web workers. In this example, we have a function, functionThatTakesLongTime, which takes one argument and returns a value depending on that argument, and we want to use web workers in order to find functionThatTakesLongTime(foo) without freezing the UI, where foo is some variable in the outside code.
let worker = new Worker(
`data:text/javascript,
function functionThatTakesLongTime(someArgument){
//There are obviously faster ways to do this, I made this function slow on purpose just for the example.
for(let i = 0; i < 1000000000; i++){
someArgument++;
}
return someArgument;
}
onmessage = function(event){ //This will be called when worker.postMessage is called in the outside code.
let foo = event.data; //Get the argument that was passed from the outside code, in this case foo.
let result = functionThatTakesLongTime(foo); //Find the result. This will take long time but it doesn't matter since it's called in the worker.
postMessage(result); //Send the result to the outside code.
};
`
);
worker.onmessage = function(event){ //Get the result from the worker. This code will be called when postMessage is called in the worker.
alert("The result is " + event.data);
}
worker.postMessage(foo); //Send foo to the worker (here foo is just some variable that was defined somewhere previously).
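To see what actually survives postMessage, here is a small sketch using the global structuredClone function, which applies the same structured clone algorithm that postMessage uses (it is available in current browsers and in Node.js 17 and later). The Point class is made up for the example.

```javascript
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  norm() {
    return Math.hypot(this.x, this.y);
  }
}

const copy = structuredClone(new Point(3, 4));
console.log(copy.x, copy.y);        // own data properties are copied
console.log(typeof copy.norm);      // methods on the prototype are not
console.log(copy instanceof Point); // the copy is a plain object

// A function stored directly on the object cannot be cloned at all:
// the call throws instead of silently dropping the property.
let cloneError = null;
try {
  structuredClone({ callback: function () {} });
} catch (err) {
  cloneError = err;
}
console.log(cloneError !== null);
```

If the receiving side needs the methods, one option is to rebuild the instance there from the plain data, for example new Point(data.x, data.y).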
Apparently there is no real way to run something on background...
There is on most modern browsers (but not IE9 and earlier): Web Workers.
But I think you're trying to solve the problem at the wrong level: 1. It should be possible to loop through all of your controls in a lot less than five seconds, and 2. It shouldn't be necessary to loop through all controls when only one of them has changed.
I suggest looking to those problems before trying to offload that processing to the background.
For instance, you could have an object that contains the current value of each item, and then have the UI for each item update that object when the value changes. Then you'd have all the values in that object, without having to loop through all the controls again.
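A minimal sketch of that idea, assuming each question has an id; createAnswerStore and its method names are hypothetical. It also tracks which answers changed since the last save, so a save only has to send those.

```javascript
function createAnswerStore() {
  const values = {};       // current value of every question, by id
  const dirty = new Set(); // ids changed since the last save
  return {
    // Call from each control's change handler.
    set(id, value) {
      values[id] = value;
      dirty.add(id);
    },
    // Returns only the changed answers and marks them as saved.
    takeChanges() {
      const changes = {};
      for (const id of dirty) changes[id] = values[id];
      dirty.clear();
      return changes;
    },
    // Returns a copy of all current answers.
    all() {
      return { ...values };
    },
  };
}

// Assumed browser wiring, using one delegated listener on the form:
// const store = createAnswerStore();
// form.addEventListener("change", (e) => store.set(e.target.name, e.target.value));
// On tab change: send store.takeChanges() to the server.
```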
You could take a look at HTML5 web workers, they're not all that widely supported though.
This works in background:
setInterval(function(){ d=new Date();console.log(d.getTime()); }, 500);
If you can't use web workers because you need to access the DOM, you can also use async functions. The idea is to create an async refreshUI function that refreshes the UI, and then call that function regularly in your function that takes long time.
The refreshUI function would look like this:
async function refreshUI(){
await new Promise(r => setTimeout(r, 0));
}
In general, if you put await new Promise(r => setTimeout(r, ms)); in an async function, it will run all the code before that line, then wait for ms milliseconds without freezing the UI, then continue running the code after that line. See this answer for more information.
The refreshUI function above does the same thing except that it waits zero milliseconds without freezing the UI before continuing, which in practice means that it refreshes the UI and then continues.
If you use this function to refresh the UI often enough, the user won't notice the UI freezing.
Refreshing the UI takes time though (not enough time for you to notice if you just do it once, but enough time for you to notice if you do it at every iteration of a long for loop). So if you want the function to run as fast as possible while still not freezing the UI, you need to make sure not to refresh the UI too often. So you need to find a balance between refreshing the UI often enough for the UI not to freeze, but not so often that it makes your code significantly slower. In my use case I found that refreshing the UI every 20 milliseconds is a good balance.
You can rewrite the refreshUI function from above using performance.now() so that it only refreshes the UI once every 20 milliseconds (you can adjust that number in your own code if you want) no matter how often you call it:
let startTime = performance.now();
async function refreshUI(){
if(performance.now() > startTime + 20){ //You can change the 20 to how often you want to refresh the UI in milliseconds
startTime = performance.now();
await new Promise(r => setTimeout(r, 0));
}
}
If you do this, you don't need to worry about calling refreshUI too often (but you still need to make sure to call it often enough).
Since refreshUI is an async function, you need to call it using await refreshUI() and the function calling it must also be an async function.
Here is an example that does the same thing as the example at the end of my other answer, but using this method instead:
let startTime = performance.now();
async function refreshUI(){
if(performance.now() > startTime + 20){ //You can change the 20 to how often you want to refresh the UI in milliseconds
startTime = performance.now();
await new Promise(r => setTimeout(r, 0));
}
}
async function functionThatTakesLongTime(someArgument){
//There are obviously faster ways to do this, I made this function slow on purpose just for the example.
for(let i = 0; i < 1000000000; i++){
someArgument++;
await refreshUI(); //Refresh the UI if needed
}
return someArgument;
}
alert("The result is " + await functionThatTakesLongTime(3));
This library helped me out a lot for a very similar problem that you describe: https://github.com/kmalakoff/background
It's basically a sequential background queue based on the WorkerQueue library.
Just create a hidden button. pass the function to its onclick event.
Whenever you want to call that function (in background), call the button's click event.
<html>
<body>
<button id="bgfoo" style="display:none;"></button>
<script>
function bgfoo(event)
{
var params = JSON.parse(event.target.innerHTML);
}
var params = {"params":"in JSON format"};
$("#bgfoo").html(JSON.stringify(params));
$("#bgfoo").click(bgfoo); // bind the handler once
$("#bgfoo").click(); // trigger it
$("#bgfoo").click(); // trigger it again
</script>
</body>
</html>
If I have a var on my main page, and have a worker thread trying to set this var, is there a way the page can access it? Assuming everything is synchronized?
var routeWorker = new Worker('getroute.js');
var checkPatrolRouteFoundTimer;
var rw_resultRoute;
var routeFound = false;
routeWorker.onmessage = function(e) {
rw_resultRoute = e.data.route;
routeFound = true;
}
function checkPatrolReady() {
if(!routeFound)
checkPatrolRouteFoundTimer = setTimeout("checkPatrolReady()", 1000);
}
function ForcePatrol(index) {
routeWorker.postMessage(index);
checkPatrolReady();
...
//do work on route
...
}
in this case, the var I'm talking about is rw_resultRoute, and I can see it get set correctly when debugging. But the only thing is that it's set in the worker thread, not in the page thread.
I flow through the ForcePatrol() method the way i'm expecting to, and it looks like the rw_resultRoute is being set, since routeFound evaluates to true after the worker finishes.
Technically, it doesn't make sense, since routeFound can be set by the worker and read by the page thread, but rw_resultRoute can only be accessed by the worker.
I truly hope this is possible, otherwise I don't see a purpose for worker threads other than showing alert() messages and updating page HTML.
I truly hope this is possible, otherwise I don't see a purpose for worker threads other than showing alert() messages and updating page HTML.
It is meant to handle processing that would normally lock up the browser. Great for crunching numbers for canvas and running hashing.
in this case, the var I'm talking about is rw_resultRoute, and I can see it get set correctly when debugging. But the only thing is that it's set in the worker thread, not in the page thread.
The worker is separate from the page that spawns it. The only way to pass data is through messaging. You need to send the data with postMessage and have the onmessage handler deal with the result. If you are handling different things, set up a switch statement to handle the different message types.
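A small sketch of such a switch, assuming the worker posts objects with a type field; the message types and the state object here are invented for the example.

```javascript
// Applies one worker message to the page-side state and returns the state.
function handleWorkerMessage(state, message) {
  switch (message.type) {
    case "route":
      state.route = message.route;
      state.routeFound = true;
      break;
    case "progress":
      state.progress = message.percent;
      break;
    default:
      throw new Error("Unknown message type: " + message.type);
  }
  return state;
}

// Assumed browser wiring:
// const state = { route: null, routeFound: false, progress: 0 };
// routeWorker.onmessage = (e) => handleWorkerMessage(state, e.data);
// Worker side: postMessage({ type: "route", route: computedRoute });
```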
I solved the problem. There was some synchronization I wasn't doing correctly. I was using the setTimeout in the wrong way.
var routeWorker = new Worker('getroute.js');
var checkPatrolRouteFoundTimer;
var rw_resultRoute;
var routeFound = false;
routeWorker.onmessage = function(e) {
rw_resultRoute = e.data.route;
routeFound = true;
}
function checkPatrolReady() {
if(routeFound) {
...
//do work on route
...
clearInterval(checkPatrolRouteFoundTimer);
} else {
// do any maint here?
}
}
function ForcePatrol(index) {
routeWorker.postMessage(index);
checkPatrolRouteFoundTimer = setInterval(checkPatrolReady, 1000); // pass the function itself rather than a string to be evaluated
}
Any call to setTimeout/setInterval returns immediately, so the calling code flows straight through, and in the first example I was using setTimeout instead of setInterval.
In the new way, calling ForcePatrol will setup the timer, and checkPatrolReady() will evaluate the flag, doing the work and clearing the timer if it is true.
So there is indeed nothing fancy in getting the results from web workers, but I was essentially creating a race condition with the worker results.
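For comparison, a sketch that avoids the timer entirely by doing the work directly in the message handler. requestRoute is a hypothetical helper, and it assumes the worker posts back an object with a route property, as the code above expects from getroute.js.

```javascript
function requestRoute(worker, index, onRoute) {
  // Runs once the worker has posted its result; no flag or timer is needed.
  worker.onmessage = function (e) {
    onRoute(e.data.route);
  };
  worker.postMessage(index);
}

// Assumed usage in the page:
// requestRoute(routeWorker, index, function (route) {
//   // do work on route
// });
```

The callback runs only after the result has arrived, so there is no window in which the page could read the route before it is set.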
I am porting an old game from C to Javascript. I have run into an issue with display code where I would like to have the main game code call display methods without having to worry about how those status messages are displayed.
In the original code, if the message is too long, the program just waits for the player to toggle through the messages with the spacebar and then continues. This doesn't work in javascript, because while I wait for an event, all of the other program code continues. I had thought to use a callback so that further code can execute when the player hits the designated key, but I can't see how that will be viable with a lot of calls to display.update(msg) scattered throughout the code.
Can I architect things differently so the event-based, asynchronous model works, or is there some other solution that would allow me to implement a more traditional event loop?
Am I making sense?
Example:
// this is what the original code does, but obviously doesn't work in Javascript
display = {
update : function(msg) {
// if msg is too long
// wait for user input
// ok, we've got input, continue
}
};
// this is more javascript-y...
display = {
update : function(msg, when_finished) {
// show part of the message
$(document).addEvent('keydown', function(e) {
// display the rest of the message
when_finished();
});
}
};
// but makes for amazingly nasty game code
do_something(param, function() {
// in case do_something calls display I have to
// provide a callback for everything afterwards
// this happens next, but what if do_the_next_thing needs to call display?
// I have to wait again
do_the_next_thing(param, function() {
// now I have to do this again, ad infinitum
});
});
The short answer is "no."
The longer answer is that, with "web workers" (part of HTML5), you may be able to do it, because it allows you to put the game logic on a separate thread, and use messaging to push keys from the user input into the game thread. However, you'd then need to use messaging the other way, too, to be able to actually display the output, which probably won't perform all that well.
Have a flag that you are waiting for user input.
var isWaiting = false;
and then check the value of that flag in do_something (obviously set it where necessary as well :) ).
if (isWaiting) return;
You might want to implement this higher up the call stack (what calls do_something()?), but this is the approach you need.
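One way this flag could fit together with the display code, as a rough sketch: createDisplay, the page size, and the method names are all assumptions, and show stands in for whatever actually draws text on screen.

```javascript
function createDisplay(pageSize, show) {
  const pending = [];  // pages not yet shown
  let waiting = false; // true while the player still has pages to read

  function showNext() {
    show(pending.shift());
    waiting = pending.length > 0; // wait only if more pages remain
  }

  return {
    // Splits a long message into pages and shows the first one.
    update(msg) {
      for (let i = 0; i < msg.length; i += pageSize) {
        pending.push(msg.slice(i, i + pageSize));
      }
      if (!waiting && pending.length > 0) showNext();
    },
    // Call from the keydown handler when the player presses space.
    advance() {
      if (pending.length > 0) showNext();
    },
    isWaiting() {
      return waiting;
    },
  };
}

// Game loop side (assumed): if (display.isWaiting()) return; // skip this tick
```

The game code can keep calling display.update(msg) without callbacks; it only has to check isWaiting() at one place higher up before advancing the game.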
I have a web application where there are number of Ajax components which refresh themselves every so often inside a page (it's a dashboard of sorts).
Now, I want to add functionality to the page so that when there is no Internet connectivity, the current content of the page doesn't change and a message appears on the page saying that the page is offline (currently, as these various gadgets on the page try to refresh themselves and find that there is no connectivity, their old data vanishes).
So, what is the best way to go about this?
navigator.onLine
That should do what you're asking.
You probably want to check that in whatever code you have that updates the page. Eg:
if (navigator.onLine) {
updatePage();
} else {
displayOfflineWarning();
}
It seems like you've answered your own question. If the gadgets send an asynch request and it times out, don't update them. If enough of them do so, display the "page is offline" message.
See the HTML 5 draft specification. You want navigator.onLine. Not all browsers support it yet. Firefox 3 and Opera 9.5 do.
It sounds as though you are trying to cover up the problem rather than solve it. If a failed request causes your widgets to clear their data, then you should fix your code so that it doesn't attempt to update your widgets unless it receives a response, rather than attempting to figure out whether the request will succeed ahead of time.
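A sketch of that approach, combined with the earlier suggestion to show the offline message once enough requests have failed. createWidgetState, the failure threshold, and the shape of the result are assumptions for illustration.

```javascript
function createWidgetState(initialData, maxFailures = 3) {
  let data = initialData;
  let failures = 0; // consecutive failed refreshes
  return {
    // Call when a refresh request succeeds.
    onSuccess(newData) {
      data = newData;
      failures = 0;
    },
    // Call when a refresh request fails or times out: the old data is kept.
    onFailure() {
      failures++;
    },
    // What the widget should render right now.
    current() {
      return { data, offline: failures >= maxFailures };
    },
  };
}
```

The widget never clears its content on a failed request; it only flips the offline indicator after several failures in a row and clears it again on the next success.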
One way to handle this might be to extend the XMLHttpRequest object with an explicit timeout method, then use that to determine if you're working in offline mode (that is, for browsers that don't support navigator.onLine). Here's how I implemented Ajax timeouts on one site (a site that uses the Prototype library). After 10 seconds (10,000 milliseconds), it aborts the call and calls the onFailure method.
/**
* Monitor AJAX requests for timeouts
* Based on the script here: http://codejanitor.com/wp/2006/03/23/ajax-timeouts-with-prototype/
*
* Usage: If an AJAX call takes more than the designated amount of time to return, we call the onFailure
* method (if it exists), passing an error code to the function.
*
*/
var xhr = {
errorCode: 'timeout',
callInProgress: function (xmlhttp) {
switch (xmlhttp.readyState) {
case 1: case 2: case 3:
return true;
// Case 4 and 0
default:
return false;
}
}
};
// Register global responders that will occur on all AJAX requests
Ajax.Responders.register({
onCreate: function (request) {
request.timeoutId = window.setTimeout(function () {
// If we have hit the timeout and the AJAX request is active, abort it and let the user know
if (xhr.callInProgress(request.transport)) {
var parameters = request.options.parameters;
request.transport.abort();
// Run the onFailure method if we set one up when creating the AJAX object
if (request.options.onFailure) {
request.options.onFailure(request.transport, xhr.errorCode, parameters);
}
}
},
// 10 seconds
10000);
},
onComplete: function (request) {
// Clear the timeout, the request completed ok
window.clearTimeout(request.timeoutId);
}
});
Hmm actually, now I look into it a bit, it's a bit more complicated than that. Have a read of these links on John Resig's blog and the Mozilla site. The above poster may also have a good point - you're making requests anyway, so you should be able to work out when they fail.. That might be a much more reliable way to go.
Make a call to a reliable destination, or perhaps a series of calls, ones that should go through and return if the user has an active net connection - even something as simple as a token ping to google, yahoo, and msn, or something like that. If at least one comes back green, you know you're connected.
I think Google Gears has such functionality; maybe you could check how they did that.
Use the relevant HTML5 API: online/offline status/events.
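A minimal sketch of listening for those events. watchConnectivity is a hypothetical helper; in a page the target would be window, which fires "online" and "offline" events. As other answers here note, these events reflect the browser's view of the network connection, not whether your server is actually reachable.

```javascript
function watchConnectivity(target, onChange) {
  const goOnline = () => onChange(true);
  const goOffline = () => onChange(false);
  target.addEventListener("online", goOnline);
  target.addEventListener("offline", goOffline);
  // Returns a function that removes both listeners again.
  return function stop() {
    target.removeEventListener("online", goOnline);
    target.removeEventListener("offline", goOffline);
  };
}

// Assumed browser usage (showOfflineBanner is a made-up function):
// const stop = watchConnectivity(window, (online) => showOfflineBanner(!online));
```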
One possible solution, if the page and the cached page have different URLs, is simply to check which URL you are on. If you are on the URL of the cached page, then you are in offline mode. This blog makes a good point about why navigator.onLine is broken.