Javascript memory consumption with map() over a large set and callbacks - javascript

I don't even know how to properly ask this question but I have concerns about the performance (mostly memory consumption) of the following code. I am anticipating that this code will consume a lot of memory because of map on a large set and a lot of 'hanging' functions that wait for external service. Are my concerns justified here? What would be a better approach?
var list = fs.readFileSync('./mailinglist.txt') // say 1.000.000 records
.split("\n")
.map( processEntry );
var processEntry = function _processEntry(i){
i = i.split('\t');
getEmailBody( function(emailBody, name){
var msg = {
"message" : emailBody,
"name" : i[0]
}
request(msg, function reqCb(err, result){
...
});
}); // getEmailBody
}
var getEmailBody = function _getEmailBody(obj, cb){
// read email template from file;
// v() returns the correct form for person's name with web-based service
v(obj.name, function(v){
cb(obj, v)
});
}

If you're worried about submitting a million http requests in a very short time span (which you probably should be), you'll have to set up a buffer of some kind.
one simple way to do it:
var lines = fs.readFileSync('./mailinglist.txt').split("\n");
var entryIdx = 0;
var done = false;
var processNextEntry = function () {
if (entryIdx < lines.length) {
processEntry(lines[entryIdx++]);
} else {
done = true;
}
};
var processEntry = function _processEntry(i){
i = i.split('\t');
getEmailBody( function(emailBody, name){
var msg = {
"message" : emailBody,
"name" : name
}
request(msg, function reqCb(err, result){
// ...
!done && processNextEntry();
});
}); // getEmailBody
}
// getEmailBody didn't change
// you set the ball rolling by calling processNextEntry n times,
// where n is a sensible number of http requests to have pending at once.
for (var i=0; i<10; i++) processNextEntry();
Edit: according to this blog post node has an internal queue system, it will only allow 5 simultaneous requests. But you can still use this method to avoid filling up that internal queue with a million items if you're worried about memory consumption.

Firstly I would advise against using readFileSync, and instead favour the async equivalent. Blocking on IO operations should be avoided as reading from a disk is very expensive, and whilst that's the sole purpose of your code now, I would consider how that might change in the future - and arbitrarily wasting clock cycles is never a good idea.
For large data files I would read them in in defined chunks and process them. If you can come up with some schema, either sentinels to distinguish data blocks within the file, or padding to boundaries, then process the file piece by piece.
This is just rough, untested off the top of my head, but something like:
var fs = require("fs");
function doMyCoolWork(startByteIndex, endByteIndex){
fs.open("path to your text file", 'r', function(status, fd) {
var chunkSize = endByteIndex - startByteIndex;
var buffer = new Buffer(chunkSize);
fs.read(fd, buffer, 0, chunkSize, 0, function(err, byteCount) {
var data = buffer.toString('utf-8', 0, byteCount);
// process your data here
if(stillWorkToDo){
//recurse
doMyCoolWork(endByteIndex, endByteIndex + 100);
}
});
});
}
Or look into one of the stream library functions for similar functionality.
H2H
ps. Javascript and Node works extremely well with async and eventing.. using sync is an antipattern in my opinion, and likely to cause code to be a headache in future

Related

JavaScript function blocks web socket & causes sync issue & delay

I have a web socket that receives data from a web socket server every 100 to 200ms, ( I have tried both with a shared web worker as well as all in the main.js file),
When new JSON data arrives my main.js runs filter_json_run_all(json_data) which updates Tabulator.js & Dygraph.js Tables & Graphs with some custom color coding based on if values are increasing or decreasing
1) web socket json data ( every 100ms or less) -> 2) run function filter_json_run_all(json_data) (takes 150 to 200ms) -> 3) repeat 1 & 2 forever
Quickly the timestamp of the incoming json data gets delayed versus the actual time (json_time 15:30:12 vs actual time: 15:31:30) since the filter_json_run_all is causing a backlog in operations.
So it causes users on different PC's to have websocket sync issues, based on when they opened or refreshed the website.
This is only caused by the long filter_json_run_all() function, otherwise if all I did was console.log(json_data) they would be perfectly in sync.
Please I would be very very grateful if anyone has any ideas how I can prevent this sort of blocking / backlog of incoming JSON websocket data caused by a slow running javascript
function :)
I tried using a shared web worker which works but it doesn't get around the delay in main.js blocked by filter_json_run_all(), I dont thing I can put filter_json_run_all() since all the graph & table objects are defined in main & also I have callbacks for when I click on a table to update a value manually (Bi directional web socket)
If you have any ideas or tips at all I will be very grateful :)
worker.js:
const connectedPorts = [];
// Create socket instance.
var socket = new WebSocket(
'ws://'
+ 'ip:port'
+ '/ws/'
);
// Send initial package on open.
socket.addEventListener('open', () => {
const package = JSON.stringify({
"time": 123456,
"channel": "futures.tickers",
"event": "subscribe",
"payload": ["BTC_USD", "ETH_USD"]
});
socket.send(package);
});
// Send data from socket to all open tabs.
socket.addEventListener('message', ({ data }) => {
const package = JSON.parse(data);
connectedPorts.forEach(port => port.postMessage(package));
});
/**
* When a new thread is connected to the shared worker,
* start listening for messages from the new thread.
*/
self.addEventListener('connect', ({ ports }) => {
const port = ports[0];
// Add this new port to the list of connected ports.
connectedPorts.push(port);
/**
* Receive data from main thread and determine which
* actions it should take based on the received data.
*/
port.addEventListener('message', ({ data }) => {
const { action, value } = data;
// Send message to socket.
if (action === 'send') {
socket.send(JSON.stringify(value));
// Remove port from connected ports list.
} else if (action === 'unload') {
const index = connectedPorts.indexOf(port);
connectedPorts.splice(index, 1);
}
});
Main.js This is only part of filter_json_run_all which continues on for about 6 or 7 Tabulator & Dygraph objects. I wante to give an idea of some of the operations called with SetTimeout() etc
function filter_json_run_all(json_str){
const startTime = performance.now();
const data_in_array = json_str //JSON.parse(json_str.data);
// if ('DATETIME' in data_in_array){
// var milliseconds = (new Date()).getTime() - Date.parse(data_in_array['DATETIME']);
// console.log("milliseconds: " + milliseconds);
// }
if (summary in data_in_array){
if("DATETIME" in data_in_array){
var time_str = data_in_array["DATETIME"];
element_time.innerHTML = time_str;
}
// summary Data
const summary_array = data_in_array[summary];
var old_sum_arr_krw = [];
var old_sum_arr_irn = [];
var old_sum_arr_ntn = [];
var old_sum_arr_ccn = [];
var old_sum_arr_ihn = [];
var old_sum_arr_ppn = [];
var filtered_array_krw_summary = filterByProperty_summary(summary_array, "KWN")
old_sum_arr_krw.unshift(Table_summary_krw.getData());
Table_summary_krw.replaceData(filtered_array_krw_summary);
//Colour table
color_table(filtered_array_krw_summary, old_sum_arr_krw, Table_summary_krw);
var filtered_array_irn_summary = filterByProperty_summary(summary_array, "IRN")
old_sum_arr_irn.unshift(Table_summary_inr.getData());
Table_summary_inr.replaceData(filtered_array_irn_summary);
//Colour table
color_table(filtered_array_irn_summary, old_sum_arr_irn, Table_summary_inr);
var filtered_array_ntn_summary = filterByProperty_summary(summary_array, "NTN")
old_sum_arr_ntn.unshift(Table_summary_twd.getData());
Table_summary_twd.replaceData(filtered_array_ntn_summary);
//Colour table
color_table(filtered_array_ntn_summary, old_sum_arr_ntn, Table_summary_twd);
// remove formatting on fwds curves
setTimeout(() => {g_fwd_curve_krw.updateOptions({
'file': dataFwdKRW,
'labels': ['Time', 'Bid', 'Ask'],
strokeWidth: 1,
}); }, 200);
setTimeout(() => {g_fwd_curve_inr.updateOptions({
'file': dataFwdINR,
'labels': ['Time', 'Bid', 'Ask'],
strokeWidth: 1,
}); }, 200);
// remove_colors //([askTable_krw, askTable_inr, askTable_twd, askTable_cny, askTable_idr, askTable_php])
setTimeout(() => { askTable_krw.getRows().forEach(function (item, index) {
row = item.getCells();
row.forEach(function (value_tmp){value_tmp.getElement().style.backgroundColor = '';}
)}); }, 200);
setTimeout(() => { askTable_inr.getRows().forEach(function (item, index) {
row = item.getCells();
row.forEach(function (value_tmp){value_tmp.getElement().style.backgroundColor = '';}
)}); }, 200);
color_table Function
function color_table(new_arr, old_array, table_obj){
// If length is not equal
if(new_arr.length!=old_array[0].length)
console.log("Diff length");
else
{
// Comparing each element of array
for(var i=0;i<new_arr.length;i++)
//iterate old dict dict
for (const [key, value] of Object.entries(old_array[0][i])) {
if(value == new_arr[i][key])
{}
else{
// console.log("Different element");
if(key!="TENOR")
// console.log(table_obj)
table_obj.getRows()[i].getCell(key).getElement().style.backgroundColor = 'yellow';
if(key!="TIME")
if(value < new_arr[i][key])
//green going up
//text_to_speech(new_arr[i]['CCY'] + ' ' +new_arr[i]['TENOR']+ ' getting bid')
table_obj.getRows()[i].getCell(key).getElement().style.backgroundColor = 'Chartreuse';
if(key!="TIME")
if(value > new_arr[i][key])
//red going down
table_obj.getRows()[i].getCell(key).getElement().style.backgroundColor = 'Crimson';
}
}
}
}
Potential fudge / solution, thanks Aaron :):
function limiter(fn, wait){
let isCalled = false,
calls = [];
let caller = function(){
if (calls.length && !isCalled){
isCalled = true;
if (calls.length >2){
calls.splice(0,calls.length-1)
//remove zero' upto n-1 function calls from array/ queue
}
calls.shift().call();
setTimeout(function(){
isCalled = false;
caller();
}, wait);
}
};
return function(){
calls.push(fn.bind(this, ...arguments));
// let args = Array.prototype.slice.call(arguments);
// calls.push(fn.bind.apply(fn, [this].concat(args)));
caller();
};
}
This is then defined as a constant for a web worker to call:
const filter_json_run_allLimited = limiter(data => { filter_json_run_all(data); }, 300); // 300ms for examples
Web worker calls the limited function when new web socket data arrives:
// Event to listen for incoming data from the worker and update the DOM.
webSocketWorker.port.addEventListener('message', ({ data }) => {
// Limited function
filter_json_run_allLimited(data);
});
Please if anyone knows how websites like tradingview or real time high performance data streaming sites allow for low latency visualisation updates, please may you comment, reply below :)
I'm reticent to take a stab at answering this for real without knowing what's going on in color_table. My hunch, based on the behavior you're describing is that filter_json_run_all is being forced to wait on a congested DOM manipulation/render pipeline as HTML is being updated to achieve the color-coding for your updated table elements.
I see you're already taking some measures to prevent some of these DOM manipulations from blocking this function's execution (via setTimeout). If color_table isn't already employing a similar strategy, that'd be the first thing I'd focus on refactoring to unclog things here.
It might also be worth throwing these DOM updates for processed events into a simple queue, so that if slow browser behavior creates a rendering backlog, the function actually responsible for invoking pending DOM manipulations can elect to skip outdated render operations to keep the UI acceptably snappy.
Edit: a basic queueing system might involve the following components:
The queue, itself (this can be a simple array, it just needs to be accessible to both of the components below).
A queue appender, which runs during filter_json_run_all, simply adding objects to the end of the queue representing each DOM manipulation job you plan to complete using color_table or one of your setTimeout` callbacks. These objects should contain the operation to performed (i.e: the function definition, uninvoked), and the parameters for that operation (i.e: the arguments you're passing into each function).
A queue runner, which runs on its own interval, and invokes pending DOM manipulation tasks from the front of the queue, removing them as it goes. Since this operation has access to all of the objects in the queue, it can also take steps to optimize/combine similar operations to minimize the amount of repainting it's asking the browser to do before subsequent code can be executed. For example, if you've got several color_table operations that coloring the same cell multiple times, you can simply perform this operation once with the data from the last color_table item in the queue involving that cell. Additionally, you can further optimize your interaction with the DOM by invoking the aggregated DOM manipulation operations, themselves, inside a requestAnimationFrame callback, which will ensure that scheduled reflows/repaints happen only when the browser is ready, and is preferable from a performance perspective to DOM manipulation queueing via setTimeout/setInterval.

Any way to kill String.prototype.match after specified time passed? [duplicate]

Is it possible to cancel a regex.match operation if takes more than 10 seconds to complete?
I'm using an huge regex to match a specific text, and sometimes may work, and sometimes can fail...
regex: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG\s:\s+\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)(?:[\s\S]*?))Z6:\s+\d+
Working example: https://regex101.com/r/kU6rS5/1
So.. i want cancel the operation if takes more than 10 seconds. Is it possible? I'm not finding anything related in sof
Thanks.
You could spawn a child process that does the regex matching and kill it off if it hasn't completed in 10 seconds. Might be a bit overkill, but it should work.
fork is probably what you should use, if you go down this road.
If you'll forgive my non-pure functions, this code would demonstrate the gist of how you could communicate back and forth between the forked child process and your main process:
index.js
const { fork } = require('child_process');
const processPath = __dirname + '/regex-process.js';
const regexProcess = fork(processPath);
let received = null;
regexProcess.on('message', function(data) {
console.log('received message from child:', data);
clearTimeout(timeout);
received = data;
regexProcess.kill(); // or however you want to end it. just as an example.
// you have access to the regex data here.
// send to a callback, or resolve a promise with the value,
// so the original calling code can access it as well.
});
const timeoutInMs = 10000;
let timeout = setTimeout(() => {
if (!received) {
console.error('regexProcess is still running!');
regexProcess.kill(); // or however you want to shut it down.
}
}, timeoutInMs);
regexProcess.send('message to match against');
regex-process.js
function respond(data) {
process.send(data);
}
function handleMessage(data) {
console.log('handing message:', data);
// run your regex calculations in here
// then respond with the data when it's done.
// the following is just to emulate
// a synchronous computational delay
for (let i = 0; i < 500000000; i++) {
// spin!
}
respond('return regex process data in here');
}
process.on('message', handleMessage);
This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.
Another solution I found here:
https://www.josephkirwin.com/2016/03/12/nodejs_redos_mitigation/
Based on the use of VM, no process fork.
That's pretty.
const util = require('util');
const vm = require('vm');
var sandbox = {
regex:/^(A+)*B/,
string:"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC",
result: null
};
var context = vm.createContext(sandbox);
console.log('Sandbox initialized: ' + vm.isContext(sandbox));
var script = new vm.Script('result = regex.test(string);');
try{
// One could argue if a RegExp hasn't processed in a given time.
// then, its likely it will take exponential time.
script.runInContext(context, { timeout: 1000 }); // milliseconds
} catch(e){
console.log('ReDos occurred',e); // Take some remedial action here...
}
console.log(util.inspect(sandbox)); // Check the results

node.js - mutual variable

I'm new to node.js, so before releasing my node.js app, I need to be sure it will work as it should.
Let's say I have an array variable and I intialize it on beginning of my script
myArray = [];
then I pull some data from an external API, store it inside myArray, and use setInterval() method to pull this data again each 30 minutes:
pullData();
setInterval(pullData, 30*60*1000);
pullData() function takes about 2-3 seconds to finish.
Clients will be able to get myArray using this function:
http.createServer(function(request, response){
var path = url.parse(request.url).pathname;
if(path=="/getdata"){
var string = JSON.stringify(myArray);
response.writeHead(200, {'Content-Type': 'text/plain'});
response.end(string);
}
}).listen(8001);
So what I'm asking is, can next situation happen?:
An client tries to get data from this node.js server, and in that same moment, data is being written into myArray by pullData() function, resulting in invalid data being sent to client?
I read some documentation, and what I realized is that when pullData() is running, createServer() will not respond to clients until pullData() finishes its job?
I'm really not good at understanding concurrent programming, so I need your confirmation on this, or if you have some better solution?
EDIT: here is the code of my pullData() function:
var now = new Date();
Date.prototype.addDays = function(days){
var dat = new Date(this.valueOf());
dat.setDate(dat.getDate() + days);
return dat;
}
var endDateTime = now.addDays(noOfDays);
var formattedEnd = endDateTime.toISOString();
var url = "https://api.mindbodyonline.com/0_5/ClassService.asmx?wsdl";
soap.createClient(url, function (err, client) {
if (err) {
throw err;
}
client.setEndpoint('https://api.mindbodyonline.com/0_5/ClassService.asmx');
var params = {
"Request": {
"SourceCredentials": {
"SourceName": sourceName,
"Password": password,
"SiteIDs": {
"int": [siteIDs]
}
},
"EndDateTime" : formattedEnd
}
};
client.Class_x0020_Service.Class_x0020_ServiceSoap.GetClasses(params, function (errs, result) {
if (errs) {
console.log(errs);
} else {
var classes = result.GetClassesResult.Classes.Class;
myArray = [];
for (var i = 0; i < classes.length; i++) {
var name = classes[i].ClassDescription.Name;
var staff = classes[i].Staff.Name;
var locationName = classes[i].Location.Name;
var start = classes[i].StartDateTime.toISOString();
var end = classes[i].EndDateTime.toISOString();
var klasa = new Klasa(name,staff,locationName,start,end);
myArray.push(klasa);
}
myArray.sort(function(a,b){
var c = new Date(a.start);
var d = new Date(b.start);
return c-d;
});
string = JSON.stringify(myArray);
}
})
});
No, NodeJs is not multi-threaded and everything run on a single thread, this means except non-blocking calls (ie. IO) everything else will engage CPU until it returns, and NodeJS absolutely doesn't return half-way populated array to the end user, as long as you only do one HTTP call to populate your array.
Update:
As pointed out by #RyanWilcox any asynchronous (non-blocking syscall) call may hint NodeJS interpreter to leave your function execution half way and return to it later.
In general: No.
JavaScript is single threaded. While one function is running, no other function can be.
The exception is if you have delays between functions that access the value of an array.
e.g.
var index = i;
function getNext() {
async.get(myArray[i], function () {
i++;
if (i < myArray.length) {
getNext()
}
});
}
… in which case the array could be updated between the calls to the asynchronous function.
You can mitigate that by creating a deep copy of the array when you start the first async operation.
Javascript is single threaded language so you don't have to be worried about this kind of concurrency. That means no two parts of code are executed at the same time. Unlike many other programming languages, javascript has different concurrency model based on event loop. To achieve best performance, you should use non-blocking operations handled by callback functions, promises or events. I suppose that your external API provides some asynchronous i/o functions what is well suited for node.js.
If your pullData call doesn't take too long, another solution is to cache the data.
Fetch the data only when needed (so when the client accesses /getdata). If it is fetched you can cache the data with a timestamp. If the /getdata is called again, check if the cached data is older than 30 minutes, if so fetch again.
Also parsing the array to json..
var string = JSON.stringify(myArray);
..might be done outside the /getdata call, so this does not have to be done for each client visiting /getdata. Might make it slightly quicker.

Attempt to use Promises to delete records

Here is a simple task I would like to accomplish on Parse.com with Cloud Code.
The task consists to delete a Unit and what is related to it.
One Unit has several Sentences related to it and each Sentence has one or more Translations.
So when the task is performed, the Unit as well as the Sentence and Translations should be deleted.
I have a strong feeling I should be using Promises (and chain them up) in order to do what I want in a good manner.
Below is the code I wrote, but it works only partially (The translations are deleted, not the rest).
Parse.Cloud.define("deleteUnitAndDependencies", function(request, response) {
var unitListQuery = new Parse.Query("UnitList");
unitListQuery.equalTo("objectId", request.params.unitID);
unitListQuery.equalTo("ownerID", request.params.userID);
unitListQuery.find().then(function(resUnit) {
var sentenceListQuery = new Parse.Query("SentenceList");
sentenceListQuery.equalTo("unit", resUnit[0]);
return sentenceListQuery.find();
}).then(function(resSentence) {
var translatListQuery = new Parse.Query("TranslatList");
for (i = 0; i < resSentence.length; i++) {
var query = new Parse.Query("TranslatList");
query.equalTo("sentence", resSentence[i]);
translatListQuery = Parse.Query.or(translatListQuery, query);
}
return translatListQuery.find();
}).then(function(resTranslat) {
for (iT = 0; iT < resTranslat.length; iT++) {
resTranslat[iT].destroy({});
}
});
});
I surely need to add some lines of code like:
resSentence[x].destroy({});
and:
resUnit[0].destroy({});
The problem is that I do not quite see where is the adequate place for that.
Collect the objects to be deleted then use Parse.Object.destroyAll(someArray); to delete all at once.
In cases like this I like to use a scope variable to hold things for later use.
var scope = {
sentences: [],
units: []
};
// later inside then block...
scope.sentences.push(resSentence[i]);
// ...now we have them collected safely
.then(function() {
return Parse.Object.destroyAll(scope.sentences);
})

Do I ever need to synchronize node.js code like in Java?

I have only recently started developing for node.js, so forgive me if this is a stupid question - I come from Javaland, where objects still live happily sequentially and synchronous. ;)
I have a key generator object that issues keys for database inserts using a variant of the high-low algorithm. Here's my code:
function KeyGenerator() {
var nextKey;
var upperBound;
this.generateKey = function(table, done) {
if (nextKey > upperBound) {
require("../sync/key-series-request").requestKeys(function(err,nextKey,upperBound) {
if (err) { return done(err); }
this.nextKey = nextKey;
this.upperBound = upperBound;
done(nextKey++);
});
} else {
done(nextKey++);
}
}
}
Obviously, when I ask it for a key, I must ensure that it never, ever issues the same key twice. In Java, if I wanted to enable concurrent access, I would make make this synchronized.
In node.js, is there any similar concept, or is it unnecessary? I intend to ask the generator for a bunch of keys for a bulk insert using async.parallel. My expectation is that since node is single-threaded, I need not worry about the same key ever being issued more than once, can someone please confirm this is correct?
Obtaining a new series involves an asynchronous database operation, so if I do 20 simultaneous key requests, but the series has only two keys left, won't I end up with 18 requests for a new series? What can I do to avoid that?
UPDATE
This is the code for requestKeys:
exports.requestKeys = function (done) {
var db = require("../storage/db");
db.query("select next_key, upper_bound from key_generation where type='issue'", function(err,results) {
if (err) { done(err); } else {
if (results.length === 0) {
// Somehow we lost the "issue" row - this should never have happened
done (new Error("Could not find 'issue' row in key generation table"));
} else {
var nextKey = results[0].next_key;
var upperBound = results[0].upper_bound;
db.query("update key_generation set next_key=?, upper_bound=? where type='issue'",
[ nextKey + KEY_SERIES_WIDTH, upperBound + KEY_SERIES_WIDTH],
function (err,results) {
if (err) { done(err); } else {
done(null, nextKey, upperBound);
}
});
}
}
});
}
UPDATE 2
I should probably mention that consuming a key requires db access even if a new series doesn't have to be requested, because the consumed key will have to be marked as used in the database. The code doesn't reflect this because I ran into trouble before I got around to implementing that part.
UPDATE 3
I think I got it using event emitting:
function KeyGenerator() {
var nextKey;
var upperBound;
var emitter = new events.EventEmitter();
var requesting = true;
// Initialize the generator with the stored values
db.query("select * from key_generation where type='use'", function(err, results)
if (err) { throw err; }
if (results.length === 0) {
throw new Error("Could not get key generation parameters: Row is missing");
}
nextKey = results[0].next_key;
upperBound = results[0].upper_bound;
console.log("Setting requesting = false, emitting event");
requesting = false;
emitter.emit("KeysAvailable");
});
this.generateKey = function(table, done) {
console.log("generateKey, state is:\n nextKey: " + nextKey + "\n upperBound:" + upperBound + "\n requesting:" + requesting + " ");
if (nextKey > upperBound) {
if (!requesting) {
requesting = true;
console.log("Requesting new series");
require("../sync/key-series-request").requestSeries(function(err,newNextKey,newUpperBound) {
if (err) { return done(err); }
console.log("New series available:\n nextKey: " + newNextKey + "\n upperBound: " + newUpperBound);
nextKey = newNextKey;
upperBound = newUpperBound;
requesting = false;
emitter.emit("KeysAvailable");
done(null,nextKey++);
});
} else {
console.log("Key request is already underway, deferring");
var that = this;
emitter.once("KeysAvailable", function() { console.log("Executing deferred call"); that.generateKey(table,done); });
}
} else {
done(null,nextKey++);
}
}
}
I've peppered it with logging outputs, and it does do what I want it to.
As another answer mentions, you will potentially end up with results different from what you want. Taking things in order:
function KeyGenerator() {
// at first I was thinking you wanted these as 'class' properties
// and thus would want to proceed them with this. rather than as vars
// but I think you want them as 'private' members variables of the
// class instance. That's dandy, you'll just want to do things differently
// down below
var nextKey;
var upperBound;
this.generateKey = function (table, done) {
if (nextKey > upperBound) {
// truncated the require path below for readability.
// more importantly, renamed parameters to function
require("key-series-request").requestKeys(function(err,nKey,uBound) {
if (err) { return done(err); }
// note that thanks to the miracle of closures, you have access to
// the nextKey and upperBound variables from the enclosing scope
// but I needed to rename the parameters or else they would shadow/
// obscure the variables with the same name.
nextKey = nKey;
upperBound = uBound;
done(nextKey++);
});
} else {
done(nextKey++);
}
}
}
Regarding the .requestKeys function, you will need to somehow introduce some kind of synchronization. This isn't actually terrible in one way because with only one thread of execution, you don't need to sweat the challenge of setting your semaphore in a single operation, but it is challenging to deal with the multiple callers because you will want other callers to effectively (but not really) block waiting for the first call to requestKeys() which is going to the DB to return.
I need to think about this part a bit more. I had a basic solution in mind which involved setting a simple semaphore and queuing the callbacks, but when I was typing it up I realized I was actually introducing a more subtle potential synchronization bug when processing the queued callbacks.
UPDATE:
I was just finishing up one approach as you were writing about your EventEmitter approach, which seems reasonable. See this gist which illustrates the approach. I took. Just run it and you'll see the behavior. It has some console logging to see which calls are getting deferred for a new key block or which can be handled immediately. The primary moving part of the solution is (note that the keyManager provides the stubbed out implementation of your require('key-series-request'):
function KeyGenerator(km) {
this.nextKey = undefined;
this.upperBound = undefined;
this.imWorkingOnIt = false;
this.queuedCallbacks = [];
this.keyManager = km;
this.generateKey = function(table, done) {
if (this.imWorkingOnIt){
this.queuedCallbacks.push(done);
console.log('KG deferred call. Pending CBs: '+this.queuedCallbacks.length);
return;
};
var self=this;
if ((typeof(this.nextKey) ==='undefined') || (this.nextKey > this.upperBound) ){
// set a semaphore & add the callback to the queued callback list
this.imWorkingOnIt = true;
this.queuedCallbacks.push(done);
this.keyManager.requestKeys(function(err,nKey,uBound) {
if (err) { return done(err); }
self.nextKey = nKey;
self.upperBound = uBound;
var theCallbackList = self.queuedCallbacks;
self.queuedCallbacks = [];
self.imWorkingOnIt = false;
theCallbackList.forEach(function(f){
// rather than making the final callback directly,
// call KeyGenerator.generateKey() with the original
// callback
setImmediate(function(){self.generateKey(table,f);});
});
});
} else {
console.log('KG immediate call',self.nextKey);
var z= self.nextKey++;
setImmediate(function(){done(z);});
}
}
};
If your Node.js code to calculate the next key didn't need to execute an async operation then you wouldn't run into synchronization issues because there is only one JavaScript thread executing code. Access to the nextKey/upperBound variables will be done in sequence by only one thread (i.e. request 1 will access first, then request 2, then request 3 et cetera.) In the Java-world you will always need synchronization because multiple threads will be executing even if you didn't make a DB call.
However, in your Node.js code since you are making an async call to get the nextKey you could get strange results. There is still only one JavaScript thread executing your code, but it would be possible for request 1 to make the call to the DB, then Node.js might accept request 2 (while request 1 is getting data from the DB) and this second request will also make a request to the DB to get keys. Let's say that request 2 gets data from the DB quicker than request 1 and update nextKey/upperBound variables with values 100/150. Once request 1 gets its data (say values 50/100) then it will update nextKey/upperBound. This scenario wouldn't result in duplicate keys, but you might see gaps in your keys (for example, not all keys 100 to 150 will be used because request 1 eventually reset the values to 50/100)
This makes me think that you will need a way to sync access, but I am not exactly sure what will be the best way to achieve this.

Categories