Can I detect a hanging JS function and abort? - javascript

I am working on a tool that allows users to enter a regular expression for a find&replace, and then my tool will execute that find&replace and return the changed text. However, I recently ran into a situation where the find&replace simply froze, so I decided it would probably be best to somehow detect issues with regular expression matching, and abort after a certain amount of time has passed.
I've checked around, and using this answer I found that the problem I'm experiencing is called 'catastrophic backtracking'. That's useful to know, because it lets me build a minimal working example of where it goes wrong; it's less useful when the fix is "change the regular expression", because I have no control over the user's regex input (and there's no way I can write a regex parser advanced enough to reject patterns like this).
So in an attempt to solve this, I tried using promises, as suggested in this answer. I've made the 'catastrophic' match string in this example just long enough to hang my tab for a few seconds without completely crashing it. Results may vary on different hardware, though.
Just one heads-up: Executing this code might freeze your current tab. PLEASE make sure you do not have a partial answer written when executing this code, as it might cause loss of work.
var PTest = function () {
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            reject();
        }, 100);
        "xxxxxxxxxxxxxxxxxxxxxxxxx".match(/(x+x+)+y/);
        resolve();
    });
};
var myfunc = PTest();
myfunc.then(function () {
    console.log("Promise Resolved");
}).catch(function () {
    console.log("Promise Rejected");
});
On my computer, this causes the tab to freeze for about 4 seconds before showing "Promise Resolved" in the console.
My question now is: is it at all possible to "abort" the execution of a script like this, if execution takes too long (in the example: over 0.2 seconds)? I'd rather kill the regex find&replace than completely crash the tool, causing loss of work for the user.

I recommend using a Web Worker since it will run in its own sandbox: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API
The Web Worker is its own script, which you load from your JavaScript like so:
var worker = new Worker('/path/to/run-regex.js');
The following is untested code, but should get you going.
Your run-regex.js does the (potentially long-running) regex match:
function regexMatch(str, regexStr, callback) {
    let regex = new RegExp(regexStr);
    let result = str.match(regex);
    callback(result, '');
}

onmessage = function (e) {
    let data = e.data;
    switch (data.cmd) {
        case 'match':
            regexMatch(data.str, data.regex, function (result, err) {
                postMessage({ cmd: data.cmd, result: result, err: err });
            });
            break;
        case 'replace':
            //regexMatch(data.str, data.regex, data.replace, function(result, err) {
            //    postMessage({ cmd: data.cmd, result: result, err: err });
            //});
            break;
        default:
            postMessage({ err: 'Unknown command: ' + data.cmd });
            break;
    }
};
In your own script, load the Web Worker, and add an event listener:
if (window.Worker) {
    const myWorker = new Worker('/path/to/run-regex.js');

    myWorker.onmessage = function (e) {
        let data = e.data;
        if (data.err) {
            // handle error
        } else {
            // handle match result using data.result;
        }
    };

    function regexMatch(str, regex) {
        let data = { cmd: 'match', str: str, regex: regex.toString() };
        myWorker.postMessage(data);
    }

    regexMatch('xxxxxxxxxxxxxxxxxxxxxxxxx', /(x+x+)+y/);
} else {
    console.log('Your browser does not support web workers.');
}
With this, your main JavaScript thread stays non-blocking while the worker is working.
In case of a long running worker, you may add code to either:
ungracefully terminate the web worker using myWorker.terminate(), then restart it -- see Is it possible to terminate a running web worker?
or, try to close() from within the web worker scope -- see JavaScript Web Worker - close() vs terminate()
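For the original goal of aborting a runaway regex, a common pattern is to race the worker against a timer and terminate it if it does not answer in time. A rough sketch of the first option, assuming the run-regex.js worker above (the 200 ms budget, the matchWithTimeout name and the display() helper are illustrative, not part of any API):
function matchWithTimeout(worker, str, regex, timeoutMs, onDone) {
    var timer = setTimeout(function () {
        // the regex is still running: kill the whole worker
        worker.terminate();
        onDone(null, 'regex timed out after ' + timeoutMs + 'ms');
    }, timeoutMs);
    worker.onmessage = function (e) {
        clearTimeout(timer);
        onDone(e.data.result, e.data.err);
    };
    worker.postMessage({ cmd: 'match', str: str, regex: regex.toString() });
}

var worker = new Worker('/path/to/run-regex.js');
matchWithTimeout(worker, 'xxxxxxxxxxxxxxxxxxxxxxxxx', /(x+x+)+y/, 200, function (result, err) {
    if (err) {
        // the worker was killed mid-match; spawn a fresh one for the next attempt
        worker = new Worker('/path/to/run-regex.js');
        display('find&replace aborted: ' + err); // display() is a stand-in for your UI code
    } else {
        // use result
    }
});
Because terminate() kills the whole worker thread, the regex stops immediately and the main thread never blocks.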

Related

browser.executeScript(return window.document.readyState) is not getting resolved in protractor

I want my script to wait until the web page has loaded completely, for which I am using the JavaScript expression "window.document.readyState", which returns "complete" once the page is fully loaded.
function waitForWebpageLoadingCompletely(callback) {
    try {
        var status = "Incomplete";
        do {
            var flag = browser.executeScript("return window.document.readyState;");
            //console.log(flag)
            flag.then(function (state) {
                console.log(state);
                if (state === "complete")
                    callback();
                else {
                    //status = "Incomplete";
                    console.log(state);
                }
            }, function (err) {
                console.log(err);
            });
        } while (!(status === "complete"));
    } catch (e) {
        expect(false);
        console.log(e);
        callback();
    }
}
But executeScript is not resolving to either success or error. Execution stops at this line, and after some time it gives the error below:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
You need to use browser.wait - you mentioned in your comment that you needed to wait for the page title, so you will need to write something like this:
var WaitForPage = function (pageTitle, timeout = 30000) {
    var deferred = promise.defer();
    browser.waitForAngular().then(function () {
        var el = element(by.cssContainingText('h1', pageTitle));
        var EC = protractor.ExpectedConditions;
        browser.wait(EC.presenceOf(el), timeout).then(function () {
            deferred.fulfill();
        });
    });
    return deferred.promise;
};
This function will wait up to 30000 ms for an h1 element that contains the specified pageTitle text. Change as appropriate to suit your situation and then call WaitForPage('title') before you continue.
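For example ('Dashboard' is a placeholder page title):
WaitForPage('Dashboard').then(function () {
    // title found within the timeout; safe to continue the test
});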
Actually, webdriver already does this type of wait out of the box; I'm not quite sure why you need to implement your own polling for document ready state.
If you're working with AngularJS/ReactJS or other SPA applications, this wait actually buys you nothing, since in such apps all the work happens after the page has loaded and the JS starts executing.
I would prefer @m-hudson's suggestion of using browser.wait() for specific elements to appear.
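For reference, a minimal sketch of that element-based wait (the selector and timeout are placeholders):
var EC = protractor.ExpectedConditions;
// wait up to 10s for a concrete element instead of polling readyState
browser.wait(EC.visibilityOf(element(by.css('.main-content'))), 10000);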

Why does postMessage called on a terminated worker not throw an error?

After a web worker is terminated, why does postMessage not throw an error if I call it?
Is it possible to restart a worker from an existing terminated instance, without calling the constructor new Worker("same-worker.js") again?
const myWorker = new Worker("my-worker.js");
myWorker.addEventListener("message", function (event) {
    const message = event.data;
    console.log("from worker", message);
    myWorker.terminate();
    myWorker.postMessage("");
    // what happens here, is only silence
    // Why not throw error?
});
myWorker.postMessage(""); // start it, maybe receive a response
Edit: I am asking about the rationale for this particular design. XHR, WebSocket and WebRTC immediately throw when trying to do stuff on a terminated instance.
I cannot answer why, because I am not the one who wrote the JavaScript standard. But there is no indication whatsoever of whether the worker has terminated or not.
Detecting termination
If you need to detect this state, you must create your own API. I propose this simple function-based draft as a starting point:
// Override worker methods to detect termination
function makeWorkerSmart(workerURL) {
    // make normal worker
    var worker = new Worker(workerURL);
    // assume that it's running from the start
    worker.terminated = false;
    // allows the worker to terminate itself
    worker.addEventListener("message", function (e) {
        if (e.data == "SECRET_WORKER_TERMINATE_MESSAGE") {
            this.terminated = true;
            console.info("Worker killed itself.");
        }
    });
    // throws error if terminated is true
    worker.postMessage = function () {
        if (this.terminated)
            throw new Error("Tried to use postMessage on worker that is terminated.");
        // normal post message
        return Worker.prototype.postMessage.apply(this, arguments);
    };
    // sets terminated to true
    // throws error if called multiple times
    worker.terminate = function () {
        if (this.terminated)
            throw new Error("Tried to terminate terminated worker.");
        this.terminated = true;
        // normal terminate
        return Worker.prototype.terminate.apply(this, arguments);
    };
    // creates a NEW WORKER with the same URL as itself
    worker.restart = function () {
        return makeWorkerSmart(workerURL);
    };
    return worker;
}
To also detect termination from the side of the worker, you will need to run this code inside every worker:
function makeInsideWorkerSmart(workerScope) {
    var oldClose = workerScope.close;
    workerScope.close = function () {
        postMessage("SECRET_WORKER_TERMINATE_MESSAGE");
        oldClose();
    };
}
makeInsideWorkerSmart(self);
This will send a message to the main window when the worker terminates itself with close.
You can use it like this:
var worker = makeWorkerSmart(url);
worker.terminate();
worker.postMessage("test"); // throws error!
Restarting the worker
As for restarting the worker: it is not technically possible to resume from some previous state without saving it somewhere yourself. I propose the solution that I implemented above:
worker.terminate();
worker = worker.restart();
You can also clone the worker this way as it doesn't stop the original worker.
According to the docs, the terminate method of the worker
immediately terminates the Worker. This does not offer the worker an opportunity to finish its operations; it is simply stopped at once.
I think that since the instance is still there, even in this dead state, doing a postMessage will not throw; but since your worker is no longer processing any operation, it will simply do nothing.
There is no way, to my knowledge, of resuming a worker, but you could set a Boolean that pauses your processing via a message, if you want to be able to resume it at will (a sketch of that idea follows).
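A minimal sketch of that pause/resume idea, assuming you control the worker script (the 'pause'/'resume' message names and process() are made up for illustration):
// inside the worker
var paused = false;
var queue = [];

onmessage = function (e) {
    if (e.data === 'pause') { paused = true; return; }
    if (e.data === 'resume') {
        paused = false;
        while (queue.length) process(queue.shift()); // drain held work
        return;
    }
    if (paused) queue.push(e.data); // hold work while paused
    else process(e.data);
};

function process(job) {
    // actual work goes here
}
Note this only pauses between jobs; it cannot interrupt a job that is already running.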

After a 404 error my stream stops working in RxJS

I'm trying to learn RxJS. I have this working code, but after one AJAX error everything stops working.
(function ($, _) {
    var alertBox = $('.alert-box');
    alertBox.hide();
    var fetchRepoButton = $('.fetch-repo');
    var organization = $('#organization');
    var repositories = $('.repositories');

    var fetchRepoClickStream = Rx.Observable.fromEvent(fetchRepoButton, 'click');

    var requestStream = fetchRepoClickStream.map(function () {
        var theOrg = organization.val();
        return '/api/orgs/' + theOrg;
    });

    var responseStream = requestStream.flatMap(function (requestUrl) {
        return Rx.Observable.fromPromise($.getJSON(requestUrl)).catch(function () {
            alertBox.fadeIn('fast').delay(500).fadeOut('slow');
            return Rx.Observable.Empty();
        });
    });

    var renderRepositories = function (repos) {
        // render DOM
    };

    responseStream.subscribe(function (repos) {
        renderRepositories(repos);
    });
})($, _);
How do I recover from an AJAX error?
This could be because you are actually terminating the stream when you return Rx.Observable.Empty() from the flatMap. You could return the error and process it downstream instead of ending the stream.
But in the end, the answer to your question will depend on what makes sense for you (you can retry a number of times with exponentially increasing delay, you can abort and continue with something else, etc.).
Generally speaking, there are a variety of operators that help with error management. This would definitely be a good starting point for going deeper into error management with RxJS: https://xgrommx.github.io/rx-book/content/getting_started_with_rxjs/creating_and_querying_observable_sequences/error_handling.html
Among the interesting operators are:
retry/retryWhen
catch/finally
Interesting links on the subject from SO:
Rx.js and application workflow
How to build an rx poller that waits some interval AFTER the previous ajax promise resolves? (have a look at the final answer included in the question, it makes use of repeatWhen and retryWhen)
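As an illustration of those operators, a retrying variant of the response stream from the question might look like this sketch (RxJS 4 syntax; the retry count is arbitrary):
var responseStream = requestStream.flatMap(function (requestUrl) {
    // defer() so each retry issues a fresh request instead of replaying
    // the same already-settled promise
    return Rx.Observable.defer(function () {
        return Rx.Observable.fromPromise($.getJSON(requestUrl));
    })
    .retry(3) // up to 3 attempts in total
    .catch(function () {
        // still failing: show the alert and swallow the error so the
        // outer stream stays alive for the next click
        alertBox.fadeIn('fast').delay(500).fadeOut('slow');
        return Rx.Observable.empty();
    });
});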

Catch Error from gapi.client.load

I'm using Google App Engine with Java and Google Cloud Endpoints. In my JavaScript front end, I'm using this code to handle initialization, as recommended:
var apisToLoad = 2;
var url = '//' + $window.location.host + '/_ah/api';
gapi.client.load('sd', 'v1', handleLoad, url);
gapi.client.load('oauth2', 'v2', handleLoad);
function handleLoad() {
    // this only executes once,
    if (--apisToLoad === 0) {
        // so this is not executed
    }
}
How can I detect and handle when gapi.client.load fails? Currently I am getting an error printed to the JavaScript console that says: Could not fetch URL: https://webapis-discovery.appspot.com/_ah/api/static/proxy.html. Maybe that's my fault, or maybe it's a temporary problem on Google's end - right now that is not my concern. I'm trying to take advantage of this opportunity to handle such errors well on the client side.
So - how can I handle it? handleLoad is not executed for the call that errs, gapi.client.load does not seem to have a separate error callback (see the documentation), it does not actually throw the error (only prints it to the console), and it does not return anything. What am I missing? My only idea so far is to set a timeout and assume there was an error if initialization doesn't complete after X seconds, but that is obviously less than ideal.
Edit:
This problem came up again, this time with the message ERR_CONNECTION_TIMED_OUT when trying to load the oauth stuff (which is definitely out of my control). Again, I am not trying to fix the error, it just confirms that it is worth detecting and handling gracefully.
I know this is old but I came across this randomly. You can easily test for a fail (at least now).
Here is the code:
gapi.client.init({}).then(() => {
    gapi.client.load('some-api', "v1", (err) => { callback(err); }, "https://someapi.appspot.com/_ah/api");
}, err);

function callback(loadErr) {
    if (loadErr) { err(loadErr); return; }
    // success code here
}

function err(err) {
    console.log('Error: ', err);
    // fail code here
}
Unfortunately, the documentation is pretty useless here and it's not exactly easy to debug the code in question. What gapi.client.load() apparently does is insert an <iframe> element for each API. That frame then provides the necessary functionality and allows accessing it via postMessage(). From the look of it, the API doesn't attach a load event listener to that frame and rather relies on the frame itself to indicate that it is ready (this will result in the callback being triggered). So the missing error callback is an inherent issue - the API cannot see a failure because no frame will be there to signal it.
From what I can tell, the best thing you can do is to attach your own load event listener to the document (the event will bubble up from the frames) and check yourself when they load. Warning: While this might work with the current version of the API, it is not guaranteed to continue working in future as the implementation of that API changes. Currently something like this should work:
var framesToLoad = apisToLoad;
document.addEventListener("load", function(event)
{
    if (event.target.localName == "iframe")
    {
        framesToLoad--;
        if (framesToLoad == 0)
        {
            // Allow any outstanding synchronous actions to execute, just in case
            window.setTimeout(function()
            {
                if (apisToLoad > 0)
                    alert("All frames are done but not all APIs loaded - error?");
            }, 0);
        }
    }
}, true);
Just to repeat the warning from above: this code makes lots of assumptions. While these assumptions might stay true for a while with this API, it might also be that Google will change something and this code will stop working. It might even be that Google uses a different approach depending on the browser, I only tested in Firefox.
This is an extremely hacky way of doing it, but you could intercept all console messages, check what is being logged, and if it is the error message you care about, call another function.
function interceptConsole() {
    var errorMessage = 'Could not fetch URL: https://webapis-discovery.appspot.com/_ah/api/static/proxy.html';
    var console = window.console;
    if (!console) return;

    function intercept(method) {
        var original = console[method];
        console[method] = function () {
            if (arguments[0] == errorMessage) {
                alert("Error Occurred");
            }
            if (original.apply) {
                original.apply(console, arguments);
            } else {
                // IE
                var message = Array.prototype.slice.apply(arguments).join(' ');
                original(message);
            }
        };
    }

    var methods = ['log', 'warn', 'error'];
    for (var i = 0; i < methods.length; i++)
        intercept(methods[i]);
}

interceptConsole();

console.log('Could not fetch URL: https://webapis-discovery.appspot.com/_ah/api/static/proxy.html');
// alerts "Error Occurred", then logs the message
console.log('Found it');
// just logs "Found it"
An example is here - I log two things: one is the error message, the other is something else. You'll see that the first one causes an alert and the second one does not.
http://jsfiddle.net/keG7X/
You probably would have to run the interceptConsole function before including the gapi script, as it may make its own copy of console.
Edit - I use a version of this code myself, but just remembered it's from here, so giving credit where it's due.
I use a setTimeout to manually trigger an error if the API hasn't loaded yet (wrapped in a Promise here so that resolve/reject have something to act on; TAG is just a log prefix):
function loadApi() {
    const TAG = 'gapi: '; // log prefix
    return new Promise((resolve, reject) => {
        console.log(TAG + 'api loading...');
        let timer = setTimeout(() => {
            // Handle error
            reject('timeout');
            console.error(TAG + 'api loading error: timeout');
        }, 1000); // time till timeout
        let callback = () => {
            clearTimeout(timer);
            // api has loaded, continue your work
            console.log(TAG + 'api loaded');
            resolve(gapi.client.apiName);
        };
        gapi.client.load('apiName', 'v1', callback, apiRootUrl);
    });
}
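With that wrapper, usage might look like this (loadApi is the wrapper sketched above; apiName and apiRootUrl remain placeholders):
loadApi().then(
    function (api) { console.log('api ready', api); },
    function (err) { console.error('api failed to load:', err); }
);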

Handling interdependent and/or layered asynchronous calls

As an example, suppose I want to fetch a list of files from somewhere, then load the contents of these files and finally display them to the user. In a synchronous model, it would be something like this (pseudocode):
var file_list = fetchFiles(source);
if (!file_list) {
    display('failed to fetch list');
} else {
    for (file in file_list) { // iteration, not enumeration
        var data = loadFile(file);
        if (!data) {
            display('failed to load: ' + file);
        } else {
            display(data);
        }
    }
}
This provides decent feedback to the user and I can move pieces of code into functions if I so deem necessary. Life is simple.
Now, to crush my dreams: fetchFiles() and loadFile() are actually asynchronous. The easy way out is to transform them into synchronous functions. But this is not good if the browser locks up waiting for calls to complete.
How can I handle multiple interdependent and/or layered asynchronous calls without delving deeper and deeper into an endless chain of callbacks, in classic reductio ad spaghettum fashion? Is there a proven paradigm to cleanly handle these while keeping code loosely coupled?
Deferreds are really the way to go here. They capture exactly what you (and a whole lot of async code) want: "go away and do this potentially expensive thing, don't bother me in the meantime, and then do this when you get back."
And you don't need jQuery to use them. An enterprising individual has ported Deferred to underscore, and claims you don't even need underscore to use it.
So your code can look like this:
function fetchFiles(source) {
    var dfd = _.Deferred();
    // do some kind of thing that takes a long time
    doExpensiveThingOne({
        source: source,
        complete: function (files) {
            // this informs the Deferred that it succeeded, and passes
            // `files` to all its success ("done") handlers
            dfd.resolve(files);
            // if you know how to capture an error condition, you can also
            // indicate that with dfd.reject(...)
        }
    });
    return dfd;
}

function loadFile(file) {
    // same thing!
    var dfd = _.Deferred();
    doExpensiveThingTwo({
        file: file,
        complete: function (data) {
            dfd.resolve(data);
        }
    });
    return dfd;
}

// and now glue it together
_.when(fetchFiles(source))
    .done(function (files) {
        for (var file in files) {
            (function (file) { // capture `file` for the async fail handler
                _.when(loadFile(file))
                    .done(function (data) {
                        display(data);
                    })
                    .fail(function () {
                        display('failed to load: ' + file);
                    });
            })(file);
        }
    })
    .fail(function () {
        display('failed to fetch list');
    });
The setup is a little wordier, but once you've written the code to handle the Deferred's state and tucked it away in a function somewhere, you won't have to worry about it again, and you can play around with the actual flow of events very easily. For example:
var file_dfds = [];
for (var file in files) {
    file_dfds.push(loadFile(file));
}

_.when(file_dfds)
    .done(function (datas) {
        // this will only run if and when ALL the files have successfully
        // loaded!
    });
Events
Maybe using events is a good idea. It keeps you from creating code-trees and de-couples your code.
I've used bean as the framework for events.
Example pseudo code:
// async request for files
function fetchFiles(source) {
    IO.get(..., function (data, status) {
        if (data) {
            bean.fire(window, 'fetched_files', data);
        } else {
            bean.fire(window, 'fetched_files_fail', data, status);
        }
    });
}

// handler for when we get data
function onFetchedFiles(event, files) {
    for (file in files) {
        var data = loadFile(file);
        if (!data) {
            display('failed to load: ' + file);
        } else {
            display(data);
        }
    }
}

// handler for failures
function onFetchedFilesFail(event, status) {
    display('Failed to fetch list. Reason: ' + status);
}

// subscribe the window to these events
bean.on(window, 'fetched_files', onFetchedFiles);
bean.on(window, 'fetched_files_fail', onFetchedFilesFail);

fetchFiles(source);
Custom events and this kind of event handling are implemented in virtually all popular JS frameworks.
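If you would rather not pull in a library, the same pattern works with the native CustomEvent API. A rough sketch (the event names mirror the pseudo code above):
// fire: dispatch a DOM CustomEvent carrying the payload
function fire(name, detail) {
    window.dispatchEvent(new CustomEvent(name, { detail: detail }));
}

// subscribe the window, as bean.on does above
window.addEventListener('fetched_files', function (e) {
    onFetchedFiles(e, e.detail);
});
window.addEventListener('fetched_files_fail', function (e) {
    onFetchedFilesFail(e, e.detail);
});

// inside fetchFiles' callback you would then call
// fire('fetched_files', data) or fire('fetched_files_fail', status)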
Sounds like you need jQuery Deferred. Here is some untested code that might help point you in the right direction:
$.when(fetchFiles(source)).then(function (file_list) {
    if (!file_list) {
        display('failed to fetch list');
    } else {
        for (var file in file_list) {
            (function (file) { // capture `file` for the async handler
                $.when(loadFile(file)).then(function (data) {
                    if (!data) {
                        display('failed to load: ' + file);
                    } else {
                        display(data);
                    }
                });
            })(file);
        }
    }
});
I also found another decent post which gives a few use cases for the Deferred object.
If you do not want to use jQuery, what you could use instead are web workers in combination with synchronous requests. Web workers are supported across every major browser with the exception of any Internet Explorer version before 10.
Web Worker browser compatibility
Basically, if you're not entirely certain what a web worker is, think of it as a way for browsers to execute specialized JavaScript on a separate thread without impacting the main thread (Caveat: On a single-core CPU, both threads will run in an alternating fashion. Luckily, most computers nowadays come equipped with dual-core CPUs). Usually, web workers are reserved for complex computations or some intense processing task. Just keep in mind that any code within the web worker CANNOT reference the DOM nor can it reference any global data structures that have not been passed to it. Essentially, web workers run independent of the main thread. Any code that the worker executes should be kept separate from the rest of your JavaScript code base, within its own JS file. Furthermore, if the web workers need specific data in order to properly work, you need to pass that data into them upon starting them up.
Yet another important thing worth noting is that any JS libraries that you need to use to load the files will need to be copied directly into the JavaScript file that the worker will execute. That means these libraries should first be minified(if they haven't been already), then copied and pasted into the top of the file.
Anyway, I decided to write up a basic template to show you how to approach this. Check it out below. Feel free to ask questions/criticize/etc.
On the JS file that you want to keep executing on the main thread, you want something like the following code below in order to invoke the worker.
function startWorker(dataObj)
{
    var message = {},
        worker;
    try
    {
        worker = new Worker('workers/getFileData.js');
    }
    catch (error)
    {
        // Throw error
    }
    message.data = dataObj;
    // all data is communicated to the worker in JSON format
    message = JSON.stringify(message);
    // This is the function that will handle all data returned by the worker
    worker.onmessage = function(e)
    {
        display(JSON.parse(e.data));
    };
    worker.postMessage(message);
}
Then, in a separate file meant for the worker (as you can see in the code above, I named my file getFileData.js), write something like the following...
function fetchFiles(source)
{
    // Put your code here
    // Keep in mind that any requests made should be synchronous as this should not
    // impact the main thread
}

function loadFile(file)
{
    // Put your code here
    // Keep in mind that any requests made should be synchronous as this should not
    // impact the main thread
}

onmessage = function(e)
{
    var response = [],
        data = JSON.parse(e.data),
        file_list = fetchFiles(data.source),
        file, fileData;

    if (!file_list)
    {
        response.push('failed to fetch list');
    }
    else
    {
        for (file in file_list)
        { // iteration, not enumeration
            fileData = loadFile(file);
            if (!fileData)
            {
                response.push('failed to load: ' + file);
            }
            else
            {
                response.push(fileData);
            }
        }
    }
    response = JSON.stringify(response);
    postMessage(response);
    close();
};
PS: Also, I dug up another thread which would better help you understand the pros and cons of using synchronous requests in combination with web workers.
Stack Overflow - Web Workers and Synchronous Requests
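For completeness, a synchronous request inside the worker might be sketched like this (the URL handling is a placeholder; synchronous XHR is tolerable here only because it runs off the main thread):
// inside the worker file only -- never on the main thread
function loadFileSync(url) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, false); // third argument false = synchronous
    xhr.send(null);
    return xhr.status === 200 ? xhr.responseText : null;
}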
async is a popular asynchronous flow control library often used with node.js. I've never personally used it in the browser, but apparently it works there as well.
This example would (theoretically) run your two functions, returning an object of all the filenames and their load status. async.map runs in parallel, while waterfall is a series, passing the results of each step on to the next.
I am assuming here that your two async functions accept callbacks. If they do not, I'd require more info as to how they're intended to be used (do they fire off events on completion? etc).
async.waterfall([
    function (done) {
        fetchFiles(source, function (list) {
            if (!list) done('failed to fetch file list');
            else done(null, list);
        });
        // alternatively you could simply fetchFiles(source, done) here, and handle
        // the null result in the next function.
    },
    function (file_list, done) {
        // async.map's iterator receives (item, callback)
        var loadHandler = function (file, cb) {
            loadFile(file, function (data) {
                if (!data) {
                    display('failed to load: ' + file);
                } else {
                    display(data);
                }
                // if any of the callbacks to `map` returned an error, it would halt
                // execution and pass that error to the final callback. So we don't pass
                // an error here, but rather a tuple of the file and load result.
                cb(null, [file, !!data]);
            });
        };
        async.map(file_list, loadHandler, done);
    }
], function (err, result) {
    if (err) return display(err);
    // All files loaded! (or failed to load)
    // result would be an array of tuples like [[file, bool file loaded?], ...]
});
waterfall accepts an array of functions and executes them in order, passing the result of each along as the arguments to the next, along with a callback function as the last argument, which you call with either an error, or the resulting data from the function.
You could of course add any number of different async callbacks between or around those two, without having to change the structure of the code at all. waterfall is actually only 1 of 10 different flow control structures, so you have a lot of options (although I almost invariably end up using auto, which allows you to mix parallel and series execution in the same function via a Makefile-like requirements syntax).
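For reference, an async.auto version of the same flow might be sketched like this (task names are arbitrary; this uses the classic async 1.x dependent-task signature of (callback, results)):
async.auto({
    // no dependencies: runs immediately
    file_list: function (done) {
        fetchFiles(source, function (list) {
            done(list ? null : 'failed to fetch file list', list);
        });
    },
    // declared dependency: runs only after file_list has resolved
    contents: ['file_list', function (done, results) {
        async.map(results.file_list, function (file, cb) {
            loadFile(file, function (data) {
                cb(null, [file, !!data]);
            });
        }, done);
    }]
}, function (err, results) {
    if (err) return display(err);
    // results.contents is the array of [file, loaded?] tuples
});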
I had this issue with a webapp I'm working on and here's how I solved it (with no libraries).
Step 1: Wrote a very lightweight pubsub implementation. Nothing fancy: subscribe, unsubscribe, publish and log. Everything (with comments) adds up to 93 lines of JavaScript, 2.7 kB before gzip.
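A minimal sketch of such a pubsub (not the actual 93-line implementation, just the shape of it):
var pubsub = { notification: (function () {
    var subs = {}; // notification name -> { subscriberName: handler }
    return {
        subscribe: function (name, subscriber, handler) {
            (subs[name] = subs[name] || {})[subscriber] = handler;
        },
        unsubscribe: function (name, subscriber) {
            if (subs[name]) delete subs[name][subscriber];
        },
        publish: function (name, params, publisher) {
            Object.keys(subs[name] || {}).forEach(function (key) {
                // setTimeout keeps delivery non-blocking, as noted below
                setTimeout(function () {
                    subs[name][key]({ notificationParams: params, publisher: publisher });
                }, 0);
            });
        }
    };
})() };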
Step 2: Decoupled the process you were trying to accomplish by letting the pubsub implementation do the heavy lifting. Here's an example:
// listen for when files have been fetched and set up what to do when it comes in
pubsub.notification.subscribe(
    "processFetchedResults", // notification to subscribe to
    "fetchedFilesProcesser", // subscriber
    /* what to do when files have been fetched */
    function (params) {
        var file_list = params.notificationParams.file_list;
        for (file in file_list) { // iteration, not enumeration
            var data = loadFile(file);
            if (!data) {
                display('failed to load: ' + file);
            } else {
                display(data);
            }
        }
    }
);

// trigger fetch files
function fetchFiles(source) {
    // ajax call to source
    // on response code 200 publish "processFetchedResults"
    // set publish parameters as ajax call response
    pubsub.notification.publish("processFetchedResults", ajaxResponse, "fetchFilesFunction");
}
Of course this is very verbose in the setup and scarce on the magic behind the scenes.
Here are some technical details:
I'm using setTimeout to handle triggering subscriptions. This way they run in a non-blocking fashion.
The call is effectively decoupled from the processing. You can write a different subscription to the notification "processFetchedResults" and do multiple things once the response comes through (for example logging and processing) while keeping them in very separate, tiny and easily-managed code blocks.
The above code sample doesn't address fallbacks or run proper checks. I'm sure it will require a bit of tooling to get to production standards. Just wanted to show you how possible it is and how library-independent your solution can be.
Cheers!
