I have an array of files that I'd like to attack N at a time, and a function doWork that returns a promise.
var files = []
var doWork = function(file) {
return asyncFn(file)
}
I'd like to be able to push onto this queue dynamically.
Edit: I've tried various modules (promise-queue, async-q). They all work after a fashion, but they don't allow using an array as a queue. They have their own internal structure that you need to push onto.
The reason I need to use an array is that I want to be able to push an item onto the queue and check that it's not already on the queue.
Here is how you would do that with Bluebird which you indicated you were using.
var files = ["foo.txt", "bar.txt", "baz.txt"];
var task = Promise.map(files, doWork, {concurrency: 4}); // four at a time
task.then(function(results){
// results contains the results, tasks are executed at most 4 at a time
});
A word of caution: the limit applies only to this one invocation. Calling the function multiple times, or from multiple Node processes, will (obviously) give a different overall concurrency. In the simple case, though, this works.
You could do something like this:
var enq_head = Q(); // assumed starting point: an already-resolved promise
function enq (step) {
    var f = function() {
        var d = Q.defer();
        step(d);
        return d.promise;
    };
    enq_head = enq_head.then(f);
}
where step is a function that resolves the deferred it is passed. But I don't recommend it, because it's just a fancy way of doing what setTimeout does much more efficiently.
If you want to keep track of which files you've scheduled and/or completed, just put them in a done list, take them out of the todo list you get them from, or stick a bool in an object under the filename. It's a separate problem from the scheduling.
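To make the bookkeeping idea concrete, here is a minimal sketch of a plain array used as the queue, with a lookup object for the "already queued?" check and a cap of N tasks in flight. It uses native promises rather than Bluebird or Q, and `createQueue`, `push`, and `idle` are hypothetical names invented for this illustration, not part of any library.

```javascript
// Sketch: an array-backed queue drained by at most `limit` tasks at a time.
// createQueue, push and idle are hypothetical names, not a library API.
function createQueue(doWork, limit) {
  var files = [];                  // the plain array used as the queue
  var seen = Object.create(null);  // filename -> true once it has been queued
  var active = 0;                  // number of doWork calls in flight
  var idleResolvers = [];          // resolvers for pending idle() promises

  function settle() {
    active--;
    drain();
  }

  function drain() {
    while (active < limit && files.length > 0) {
      var file = files.shift();
      active++;
      // The executor runs synchronously, so `file` is captured correctly.
      // Wrapping in a promise also converts a synchronous throw into a rejection.
      new Promise(function (resolve) {
        resolve(doWork(file));
      }).then(settle, settle); // failures are ignored here; real code should report them
    }
    if (active === 0 && files.length === 0) {
      idleResolvers.splice(0).forEach(function (resolve) { resolve(); });
    }
  }

  return {
    files: files,
    // Returns false (and does nothing) if the file was queued before.
    push: function (file) {
      if (seen[file]) { return false; }
      seen[file] = true;
      files.push(file);
      drain();
      return true;
    },
    // Resolves once the queue is empty and nothing is running.
    idle: function () {
      return new Promise(function (resolve) {
        if (active === 0 && files.length === 0) { resolve(); }
        else { idleResolvers.push(resolve); }
      });
    }
  };
}
```

With this shape, `queue.push("foo.txt")` can be called at any time, including while earlier files are still being processed, and a second push of the same name is simply refused.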
Related
The issue
I have 5 work queues and each queue has 5 tasks.
First, I make an API call to get all work queues. I then send an ID from each queue to another endpoint which gets me the tasks for that queue.
I'm trying to push all tasks from all queues into one array, so ideally the end product would be an array of 25 tasks.
Due to Angular's asynchronous nature, I'm having trouble getting all the data into the right place at the right time.
My current code
var getAllTasks = function () {
var deferred = $q.defer();
var allTasks = [];
qSrv.getQueues().then(function (response) {
// Loop through all queues
response.data.forEach(function (e, i) {
// Get Queue task list by passing in Global Unique Identifier to endpoint
qSrv.getQueueByQueueGuid(e.workQueueGuid).then(function (queue) {
// Push all tasks within task list to one array
queue.data.taskList.forEach(function (ee, ii) {
ee.queueName = e.name;
allTasks.push(ee);
});
});
});
// Resolve promise with arrays of all 25 tasks
deferred.resolve(allTasks);
})
return deferred.promise;
}
Conclusion
I know there's something funky going on with the promises not finishing before being passed along to the next step, but I don't know how to structure it to avoid this problem.
I would appreciate any and all help, thank you.
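One way to structure this, sketched under assumptions: build one promise per queue, combine them, and return the combined promise instead of resolving a deferred by hand. The `qSrv` below is a hypothetical stub standing in for the real service so the example runs on its own, and native `Promise.all` is used in place of Angular's `$q.all`, which has the same shape.

```javascript
// Hypothetical stand-in for the question's qSrv, returning canned data.
var qSrv = {
  getQueues: function () {
    return Promise.resolve({ data: [
      { workQueueGuid: 'g1', name: 'Queue 1' },
      { workQueueGuid: 'g2', name: 'Queue 2' }
    ] });
  },
  getQueueByQueueGuid: function (guid) {
    return Promise.resolve({ data: { taskList: [
      { id: guid + '-t1' },
      { id: guid + '-t2' }
    ] } });
  }
};

function getAllTasks() {
  return qSrv.getQueues().then(function (response) {
    // One promise per queue, each resolving to that queue's tagged tasks.
    var perQueue = response.data.map(function (e) {
      return qSrv.getQueueByQueueGuid(e.workQueueGuid).then(function (queue) {
        return queue.data.taskList.map(function (task) {
          task.queueName = e.name;
          return task;
        });
      });
    });
    // Resolves only after every per-queue request has finished.
    return Promise.all(perQueue);
  }).then(function (taskLists) {
    // Flatten the array of arrays into one task array.
    return [].concat.apply([], taskLists);
  });
}
```

The key difference from the code above is that nothing is resolved until the inner requests have completed, because each `then` callback returns the promise it depends on.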
I've created this object which contains an array, which serves as a work queue.
It kind of works like this:
var work1 = new Work();
var work2 = new Work();
var queue = Workqueue.instance();
queue.add(work1) // Bluebird promise.
.then(function addWork2() {
return queue.add(work2);
})
.then(function toCommit() {
return queue.commit();
})
.then(function done(results) {
// obtain results here.
})
.catch(function(err){});
It works in that case and I can commit more than one task before I call the commit.
However if it's like this:
var work1 = new Work();
var work2 = new Work();
var queue = Workqueue.instance();
queue.add(work1)
.then(function toCommit1() {
return queue.commit();
})
.then(function done1(result1) {
// obtain result1 here.
})
.catch(function(err){});
queue.add(work2)
.then(function toCommit2() {
return queue.commit();
})
.then(function done2(result2) {
// obtain result2 here.
})
.catch(function(err){});
Something may go wrong, because if the first commit is called after the second commit (two works/tasks are already added), the first commit handler expects a result but they all go to the second commit handler.
The task involves a Web SQL database read and may also involve network access, so it's a complicated procedure in which the problem described above may surface. I could implement an addWorkAndCommit() that wraps the add and the commit together, but there is still no guarantee, because addWorkAndCommit() cannot be "atomic": it involves asynchronous calls. So even two calls to addWorkAndCommit() may fail. (I don't know how to describe it other than as "atomic"; JavaScript is single-threaded, yet this issue still crops up.)
What can I do?
The problem is that there is a commit() but no notion of a transaction, so you cannot explicitly have two isolated transactions running in parallel. From my understanding the Javascript Workqueue is a proxy for a remote queue and the calls to add() and commit() map directly to some kind of remote procedure calls having a similar interface without transactions. I also understand that you would not care if the second add() actually happened after the first commit(), you just want to write two simple subsequent addWorkAndCommit() statements without synchronizing the underlying calls in client code.
What you can do is write a wrapper around the local Workqueue (or alter it directly if it is your code), so that each update of the queue creates a new transaction and a commit() always refers to one such transaction. The wrapper then delays new updates until all previous transactions are committed (or rolled back).
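A minimal sketch of such a wrapper, assuming only that the underlying queue has `add()` and `commit()` methods that return promises, as in the question. `createSerializedQueue` and `transaction` are hypothetical names, native promises are used instead of Bluebird, and rollback is not modeled.

```javascript
// Sketch: run transactions strictly one after another, so each commit()
// only sees the work added by its own transaction.
function createSerializedQueue(queue) {
  var tail = Promise.resolve(); // settles when the previous transaction is done

  return {
    // `work` receives the queue and returns a promise covering its add() calls.
    transaction: function (work) {
      var result = tail.then(function () {
        return work(queue);
      }).then(function () {
        return queue.commit();
      });
      // Keep the chain usable even if this transaction fails.
      tail = result.then(function () {}, function () {});
      return result;
    }
  };
}
```

Each call to `transaction()` returns a promise for that transaction's own commit result, and a later transaction does not start adding work until the earlier one has committed or failed.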
Adopting Benjamin Gruenbaum's recommendation to use a disposer pattern, here is one, written as an adapter method for Workqueue.instance() :
Workqueue.transaction = function (work) { // `work` is a function
var queue = this.instance();
return Promise.resolve(work(queue)) // `Promise.resolve()` avoids an error if `work()` doesn't return a promise.
.then(function() {
return queue.commit();
});
}
Now you can write :
// if the order matters,
// then add promises sequentially.
Workqueue.transaction(function(queue) {
var work1 = new Work();
var work2 = new Work();
return queue.add(work1)
.then(function() {
return queue.add(work2);
});
});
// if the order doesn't matter,
// add promises in parallel.
Workqueue.transaction(function(queue) {
var work1 = new Work();
var work2 = new Work();
var promise1 = queue.add(work1);
var promise2 = queue.add(work2);
return Promise.all([promise1, promise2]); // Promise.all takes a single array
});
// you can even pass `queue` around
Workqueue.transaction(function(queue) {
var work1 = new Work();
var promise1 = queue.add(work1);
var promise2 = myCleverObject.doLotsOfAsyncStuff(queue);
return Promise.all([promise1, promise2]); // Promise.all takes a single array
});
In practice, an error handler should be included like this - Workqueue.transaction(function() {...}).catch(errorHandler);
Whatever you write, all you need to do is ensure that the callback function returns a promise that is an aggregate of all the component asynchronisms (component promises). When the aggregate promise resolves, the disposer will ensure that the transaction is committed.
As with all disposers, this one doesn't do anything you can't do without it. However, it:
serves as a reminder of what you are doing by providing a named .transaction() method,
enforces the notion of a single transaction by constraining a Workqueue.instance() to one commit.
If for any reason you should ever need to do two or more commits on the same queue (why?), then you can always revert to calling Workqueue.instance() directly.
In a language with threads and locks it is easy to implement a lazy load by checking the value of a variable, if it's null then lock the next section of code, check the value again and then load the resource and assign. This prevents it from being loaded multiple times and causes threads after the first to wait for the first thread to complete the action that's needed.
Pseudocode:
if(myvar == null) {
lock(obj) {
if(myvar == null) {
myvar = getData();
}
}
}
return myvar;
JavaScript runs in a single thread, however, it still has this type of issue because of asynchronous execution while one call is waiting on a blocking resource. In this Node.js example:
var allRecords;
module.exports = function getAllRecords(callback) {
    if (allRecords) {
        return callback(null, allRecords);
    }
    db.getRecords({}, function(err, records) {
        if (err) {
            return callback(err);
        }
        // Use existing object if it has been
        // set by another async request to this
        // function
        allRecords = allRecords || records;
        return callback(null, allRecords);
    });
};
I'm lazy loading all the records from a small DB table the first time this function is called and then returning the in-memory records on subsequent calls.
Problem: If multiple async requests are made to this function at the same time then the table is going to be loaded unnecessarily from the DB multiple times.
In order to solve this I could simulate a locking mechanism by creating a var lock; variable and setting it to true while the table is loading. I would then put the other async calls into a setTimeout() loop and check back on this variable every (say) 1 second until the data was available and then allow them to return.
The problems with that solution are:
It's fragile, what if the first async call throws and doesn't unset the lock.
How many times do we loop back into the timer before giving up?
How long should the timer be set for? In some environments 1 second might be way too long and inefficient.
Is there a best practice for solving this in JavaScript?
On the first call to the service, initialize an array. Start the fetch operation. Create a Promise, store it in the array.
On subsequent calls, if the data is there, return an already-fulfilled Promise. If not, add another Promise to the array and return that.
When the data arrives, resolve all the waiting Promise objects in the list. (You can throw away the list once the data's there.)
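The three steps above can be sketched as follows with native promises. `createLazyLoader` is a hypothetical name, and `fetchRecords` is an assumed function that returns a promise for the table contents; neither comes from the question.

```javascript
// Sketch: lazy loader that shares one fetch among all concurrent callers.
// `fetchRecords` is a hypothetical function returning a promise.
function createLazyLoader(fetchRecords) {
  var data;            // cached result once loaded
  var loaded = false;
  var waiting = null;  // list of {resolve, reject} while a fetch is in flight

  function settleAll(method, value) {
    var list = waiting;
    waiting = null;    // throw the list away; a later call may retry after a failure
    list.forEach(function (w) { w[method](value); });
  }

  return function getAllRecords() {
    if (loaded) {
      return Promise.resolve(data); // already-fulfilled promise
    }
    return new Promise(function (resolve, reject) {
      if (waiting === null) {
        // First caller: create the list and start the single fetch.
        waiting = [];
        Promise.resolve().then(fetchRecords).then(function (records) {
          data = records;
          loaded = true;
          settleAll('resolve', records);
        }, function (err) {
          settleAll('reject', err);
        });
      }
      waiting.push({ resolve: resolve, reject: reject });
    });
  };
}
```

Because a failed fetch rejects every waiter and clears the list, there is no lock left behind for later callers to get stuck on. A shorter variant would cache the fetch promise itself and hand that to every caller; the version here follows the description step by step.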
I really like the promise solution in the other answer: very clever, very interesting. Promises aren't the dominant methodology, so you may need to educate the team. I'm going to go in another direction though.
What you're after is a memoize function: an in-memory key/value cache of expensive results. JavaScript: The Good Parts has a memoize sample towards the end. Lodash has a memoize function. These assume synchronous processing, so they don't account for your scenario, which is to say they'd hit the database lots of times until one of the "threads" replied.
The async library also has a memoize function that does exactly what you want. In its innards, it keeps a queue array of callbacks, and once it gets the answer, it both caches it and calls all the callbacks.
If you're into inventing, by all means, use promises. If you'd just like a plug-n-play answer, use async#memoize.
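To illustrate the queue-of-callbacks idea, here is a simplified sketch. It is not the async library's actual source: `memoizeAsync` is a hypothetical name, it assumes a single string key plus a Node-style callback, and it deliberately does not cache errors.

```javascript
// Sketch: callback-style memoize that coalesces concurrent calls per key.
// Not the async library's implementation; an illustration of the idea only.
function memoizeAsync(fn) {
  var cache = Object.create(null);   // key -> cached value
  var queues = Object.create(null);  // key -> callbacks waiting on an in-flight call

  return function (key, callback) {
    if (key in cache) {
      // Answer from the cache, still asynchronously for consistent ordering.
      var value = cache[key];
      setTimeout(function () { callback(null, value); }, 0);
    } else if (key in queues) {
      // A call for this key is already running: just wait for it.
      queues[key].push(callback);
    } else {
      queues[key] = [callback];
      fn(key, function (err, result) {
        var waiting = queues[key];
        delete queues[key];
        if (!err) { cache[key] = result; } // failures are not cached
        waiting.forEach(function (cb) { cb(err, result); });
      });
    }
  };
}
```

Wrapping the database call once, for example `var getAllRecords = memoizeAsync(loadTable)` with a hypothetical `loadTable(key, callback)`, means any number of simultaneous callers trigger a single load.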
I have an http request that should return a list of tasks. However, those tasks are generated in a complex fashion. This is how it works.
Get all current tasks from the DB
expire tasks that are old
get user profiles from the DB
if the user doesn't have a profile and a task for creating the profile doesn't exist, add a task for creating the profile
additionally, for every subprofile the user has, make a daily task and save it to the DB, if a daily task hasn't already been created.
return all the tasks to the HTTP caller
I'm listing this all here, in case there's a better way to do it. From what I understand, I should have promises for both the DB calls that are then followed by promises that manipulate the task/profile data.
What I don't understand is how to add the N promises that are needed for daily tasks into my promise chain. I also need all the data to be available to the final step so it can return the newly created list of tasks. Should I be nesting promises somehow?
Currently, I imagine it being something like this:
var taskPromise = dbPromise(serverName, taskRequest, params);
var profilesPromise = dbPromise(serverName, profilesRequest, params);
Q.all([taskPromise, profilesPromise])
.then(function(arrayOfTasksAndProfiles) {
    // do calculations and create an object like {tasks:[], profile:profile, subprofiles:[]}
})
.then(function(currentDataObject) {
    var deferred = Q.defer();
    var newTasksToBeCreated = []; // make a list of all the new tasks I want to create
    var promisesForNewTasks = []; // create an array of promises that save each of the new tasks to the server
    Q.all(promisesForNewTasks)
    .then(function(returnedIDsForNewTasks) {
        // somehow match the returned IDs to the newTasksToBeCreated and add them on
        currentDataObject.newTasks = newTasksToBeCreated;
        deferred.resolve(currentDataObject);
    });
    return deferred.promise;
})
.then(function(currentDataObject) {
    // now that the currentDataObject has all the tasks from the DB, plus the new ones with their IDs, I can respond with that information
    res.json(currentDataObject);
})
.done();
I have to make multiple calls to the DB to create new tasks, and I need to return those appended to the other tasks I received from the DB, and the only way I can see to do that is to nest a Q.all() call.
"There's gotta be a better way."
Only one thing: Don't create a custom deferred that you need to manually resolve. Instead, just return from the then handler; and return the resulting promise of the .then() call.
.then(function(currentDataObject) {
    var newTasksToBeCreated = []; // make a list of all the new tasks I want to create
    var promisesForNewTasks = []; // create an array of promises that save each of the new tasks to the server
    return Q.all(promisesForNewTasks)
//  ^^^^^^
    .then(function(returnedIDsForNewTasks) {
        // somehow match the returned IDs to the newTasksToBeCreated and add them on
        currentDataObject.newTasks = newTasksToBeCreated;
        return currentDataObject;
//      ^^^^^^
    });
})
Otherwise, it looks quite fine. If you have problems matching the returned IDs to the tasks, don't do it that way. Instead, make each of the promisesForNewTasks resolve with its own task object (combined with the ID?).
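A minimal sketch of that last suggestion: each save promise resolves with the task it saved, so no matching step is needed afterwards. `saveAllTasks` and `saveTask` are hypothetical names (`saveTask` is assumed to store one task and resolve with its new ID), and native `Promise.all` stands in for `Q.all`, which is used the same way.

```javascript
// Sketch: every save resolves with its own task, already carrying the new ID.
// `saveTask(task)` is a hypothetical function resolving with the stored ID.
function saveAllTasks(newTasks, saveTask) {
  return Promise.all(newTasks.map(function (task) {
    return saveTask(task).then(function (id) {
      task.id = id;  // attach the ID to the task it belongs to
      return task;
    });
  }));
}
```

Inside the chain this becomes `return saveAllTasks(newTasksToBeCreated, saveTask).then(function (savedTasks) { currentDataObject.newTasks = savedTasks; return currentDataObject; });`.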
Is there a way to wait on a promise so that you can get the actual result from it and return that instead of returning the promise itself? I'm thinking of something similar to how the C# await keyword works with Tasks.
Here is an example of why I'd like to have a method like canAccess() that returns true or false instead of a promise so that it can be used in an if statement. The method canAccess() would make an AJAX call using $http or $resource and then somehow wait for the promise to get resolved.
It would look something like this:
$scope.canAccess = function(page) {
var resource = $resource('/api/access/:page');
var result = resource.get({page: page});
// how to await this and not return the promise but the real value
return result.canAccess;
}
Is there any way to do this?
In general that's a bad idea. Let me tell you why. JavaScript in a browser is basically a single threaded beast. Come to think of it, it's single threaded in Node.js too. So anything you do to not "return" at the point you start waiting for the remote request to succeed or fail will likely involve some sort of looping to delay execution of the code after the request. Something like this:
var semaphore = false;
var superImportantInfo = null;
// Make a remote request.
$http.get('some wonderful URL for a service').then(function (results) {
superImportantInfo = results;
semaphore = true;
});
while (!semaphore) {
// We're just waiting.
}
// Code we're trying to avoid running until we know the results of the URL call.
console.log('The thing I want for lunch is... ' + superImportantInfo);
But if you try that in a browser and the call takes a long time, the browser will think your JavaScript code is stuck in a loop and pop up a message in the user's face giving the user the chance to stop your code. JavaScript therefore structures it like so:
// Make a remote request.
$http.get('some wonderful URL for a service').then(function (results) {
// Code we're trying to avoid running until we know the results of the URL call.
console.log('The thing I want for lunch is... ' + results);
});
// Continue on with other code which does not need the super important info or
// simply end our JavaScript altogether. The code inside the callback will be
// executed later.
The idea being that the code in the callback will be triggered by an event whenever the service call returns. Because event driven is how JavaScript likes it. Timers in JavaScript are events, user actions are events, HTTP/HTTPS calls to send and receive data generate events too. And you're expected to structure your code to respond to those events when they come.
Can you not structure your code such that it thinks canAccess is false until such time as the remote service call returns and it maybe finds out that it really is true after all? I do that all the time in AngularJS code where I don't know what the ultimate set of permissions I should show to the user is because I haven't received them yet or I haven't received all of the data to display in the page at first. I have defaults which show until the real data comes back and then the page adjusts to its new form based on the new data. The two way binding of AngularJS makes that really quite easy.
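Outside of any framework, the "default first, update later" approach might be sketched like this. `createAccessState` and `fetchAccess` are hypothetical names; `fetchAccess` is assumed to return a promise for an object with a boolean `canAccess` field.

```javascript
// Sketch: start from a restrictive default and update when the answer arrives.
// `fetchAccess(page)` is a hypothetical function returning a promise.
function createAccessState(fetchAccess, page) {
  var state = { canAccess: false, loaded: false };
  state.ready = fetchAccess(page).then(function (result) {
    state.canAccess = result.canAccess === true;
    state.loaded = true;
    return state;
  }, function () {
    // On failure, keep the restrictive default.
    state.loaded = true;
    return state;
  });
  return state;
}
```

Code that renders the page reads `state.canAccess` at any time and gets `false` until the request completes; in AngularJS, putting such an object on the scope lets the bindings pick up the change.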
Use a .get() callback function to ensure you get a resolved resource.
Helpful links:
Official docs
How to add call back for $resource methods in AngularJS
You can't - there aren't any features in Angular, Q (promises) or JavaScript (at this point in time) that let you do that.
You will once async functions arrive (the await keyword, proposed for ES7 and eventually standardized in ES2017).
You can if you use another framework or a transpiler (as suggested in the article linked - Traceur transpiler or Spawn).
You can if you roll your own implementation!
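For reference, in an engine or transpiler that supports async functions, the `await` style mentioned above looks roughly like this. `fetchAccess` is a hypothetical stand-in for the `$http` or `$resource` request. Note that `canAccess` still returns a promise to its callers; `await` only unwraps the value inside another async function.

```javascript
// Hypothetical stand-in for the AJAX call in the question.
function fetchAccess(page) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ canAccess: page === 'home' });
    }, 5);
  });
}

// Reads like synchronous code, but the function itself returns a promise.
async function canAccess(page) {
  var result = await fetchAccess(page);
  return result.canAccess;
}

// The if statement works because it sits inside an async function.
async function showPage(page) {
  if (await canAccess(page)) {
    return 'showing ' + page;
  }
  return 'access denied';
}
```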
My approach was to create a function using plain old JavaScript objects, as follows:
var globalRequestSync = function (pUrl, pVerbo, pCallBack) {
var httpRequest = new XMLHttpRequest(); // declared locally to avoid leaking a global
httpRequest.onreadystatechange = function () {
if (httpRequest.readyState == 4 && httpRequest.status == 200) {
pCallBack(httpRequest.responseText);
}
}
httpRequest.open(pVerbo, pUrl, false);
httpRequest.send(null);
};
I recently had this problem and made a utility called 'syncPromises'. It takes what I call an "instruction list": an array of functions to be called in order. You call the first function to kick things off, then dynamically attach a new .then() for the next item in the instruction list each time a response comes back, so you need to keep track of the index.
// instructionList is an array of functions, each returning a promise.
function syncPromises (instructionList) {
    var i = 0,
        defer = $q.defer();
    function next() {
        // Resolve the outer promise once every instruction has run.
        if (i >= instructionList.length) {
            return defer.resolve();
        }
        // Call the current instruction, then move on to the next one.
        instructionList[i++]().then(next, function (err) {
            defer.reject(err);
        });
    }
    next();
    return defer.promise;
}
This I found gave us the most flexibility.
You can push operations programmatically to build an instruction list, and you can still append as many .then() handlers as you like to the promise the function returns. You can also chain multiple syncPromises calls, which will all happen in order.
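A comparable sequential runner can be sketched with native promises by folding the instruction list into one chain. `runInOrder` is a hypothetical name, and this is a variation on the idea rather than the utility described above.

```javascript
// Sketch: run promise-returning functions strictly one after another
// and collect their results in order.
function runInOrder(instructionList) {
  var results = [];
  return instructionList.reduce(function (chain, instruction) {
    // Start the next instruction only after the previous one has finished.
    return chain.then(function () {
      return instruction();
    }).then(function (value) {
      results.push(value);
    });
  }, Promise.resolve()).then(function () {
    return results;
  });
}
```

If any instruction rejects, the remaining ones are skipped and the returned promise rejects with that error.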