Debouncing unique values with Bacon.js - javascript

I have a file system watcher producing a Bacon.js event stream of changed file paths. I'd like to filter and debounce this stream so that each unique file path only appears in the output stream after 5 seconds of no activity for that unique value. I essentially want to write the following pseudocode:
var outputStream = inputStream.groupBy('.path',
    function (groupedStream) { return groupedStream.debounce(5000); }
).merge();
I have a convoluted solution that involves creating a separate Bacon.Bus for each filtered stream, and creating a new Bus each time I encounter a new unique value. These are each debounced and plugged into an output Bus. Is there a better way? Would I be better off switching to RxJS and using its groupBy function?

It turns out Bacon.js recently added a groupBy function! I had been misled by searches that indicated it didn't exist. So this works for me:
var outputStream = inputStream.groupBy(function (item) { return item.path; })
    .flatMap(function (groupedStream) { return groupedStream.debounce(5000); });
Edit: here's a simplified version based on OlliM's comment (kiitos!):
var outputStream = inputStream.groupBy('.path')
    .flatMap(function (groupedStream) { return groupedStream.debounce(5000); });

Related

JSON.stringify is very slow for large objects

I have a very big object in JavaScript (about 10 MB).
When I stringify it, it takes a long time. I then send it to the backend and parse it into an object (actually nested objects with arrays), which also takes a long time, but that's not the problem in this question.
The problem:
How can I make JSON.stringify faster? Any ideas or alternatives? I need a JavaScript solution: libraries I can use, or ideas.
What I've tried
I googled a lot, and it looks like there is nothing faster than JSON.stringify, or my googling skills have gotten rusty!
Result
I'll accept any suggestion that shortens the long save (sending to the backend) in the request (I know it's a big request).
Code sample of the problem (details below)
Request URL:http://localhost:8081/systemName/controllerA/update.html;jsessionid=FB3848B6C0F4AD9873EA12DBE61E6008
Request Method:POST
Status Code:200 OK
I'm sending a POST to the backend, and then in Java I read
request.getParameter("BigPostParameter")
and convert it to an object using
public boolean fromJSON(String string) {
    if (string != null && !string.isEmpty()) {
        ObjectMapper json = new ObjectMapper();
        DateFormat dateFormat = new SimpleDateFormat(YYYY_MM_DD_T_HH_MM_SS_SSS_Z);
        dateFormat.setTimeZone(TimeZone.getDefault());
        json.setDateFormat(dateFormat);
        json.configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true);
        WebObject object;
        // Logger.getLogger("JSON Tracker").log(Level.SEVERE, "Start");
        try {
            object = json.readValue(string, this.getClass());
        } catch (IOException ex) {
            Logger.getLogger(JSON_ERROR).log(Level.SEVERE, "JSON Error: {0}", ex.getMessage());
            return false;
        }
        // Logger.getLogger("JSON Tracker").log(Level.SEVERE, "END");
        return this.setThis(object);
    }
    return false;
}
Like this:
BigObject someObj = new BigObject();
someObj.fromJSON(request.getParameter("BigPostParameter"));
P.S.: FYI, this line: object = json.readValue(string, this.getClass());
is also very, very slow.
Again, to summarize:
The problem is the posting time (stringify); it's a JavaScript bottleneck.
Another problem is parsing that stringified data into an object (using Jackson). The stringified object mainly holds SVG tag content in a style column; the other columns are mostly strings and ints.
As commenters said - there is no way to make parsing faster.
If the concern is that the app is blocked while it's stringifying/parsing, then try to split the data into separate objects, stringify them, and assemble them back into one object before saving on the server.
If the app's loading time is not a problem, you could try an ad-hoc incremental-change scheme on top of the existing app:
... App loading
    Load map data
    Make full copy of the data
... End loading
... App working without changes
... When saving changes
    diff the copy with the changed data to get a JSON diff
    send the changes (much smaller than the full data)
... On server
    apply the JSON diff changes to the full data stored on the server
    save the changed data
I used json-diff https://github.com/andreyvit/json-diff to calculate the changes, and there are a few analogues.
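As a rough sketch of the diff-based save (the helper names here are placeholders, not part of json-diff):
var jsonDiff = require('json-diff');

// 'original' is the full copy kept from app load; 'changed' is the current working data
var patch = jsonDiff.diff(original, changed);

// POST the (much smaller) patch instead of the full 10MB object
sendToServer(JSON.stringify(patch)); // sendToServer is a placeholder for your transport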
Parsing is a slow process. If what you want is to POST a 10MB object, turn it into a file, a blob, or a buffer. Send that file/blob/buffer using FormData instead of application/json or application/x-www-form-urlencoded.
Reference
An example using express/multer
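A minimal client-side sketch of the blob idea (the /upload endpoint and bigObject are placeholders; the server side would use something like multer):
const blob = new Blob([JSON.stringify(bigObject)], { type: 'application/json' });
const form = new FormData();
form.append('payload', blob, 'payload.json');

fetch('/upload', { method: 'POST', body: form })
    .then(res => console.log('upload status:', res.status));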
Solution
Well, as with most big "repeatable" problems, you could use async!
But wait, isn't JS still single-threaded even when it does async... yes... but you can use workers (dedicated Web Workers, as in the code below) to get true parallelism and serialize an object much faster by parallelizing the process.
General Approach
mainPage.js
//= Functions / Classes =============================================================|
// To tell JSON.stringify that this is already processed, don't touch
class SerializedChunk {
  constructor(data) { this.data = data }
  toJSON() { return this.data }
}
// Attach all events and props we need on workers to handle this use case
const mapCommonBindings = w => {
  w.addEventListener('message', e => w._res(e.data), false)
  w.addEventListener('error', e => w._rej(e.data), false)
  w.solve = async obj => { // must be async, since we await inside
    w._state && await w._state.catch(_ => _) // Wait for any older tasks to complete if there is another queued
    w._state = new Promise((_res, _rej) => {
      // Give this object promise bindings that can be handled by the event bindings
      // (just make sure not to fire 2 errors or 2 messages at the same time)
      Object.assign(w, { _res, _rej })
    })
    w.postMessage(obj)
    return await w._state // Return the final output, when we get the `message` event
  }
}
//= Initialization ===================================================================|
// Let's make our 10 workers
const workers = Array(10).fill(0).map(_ => new Worker('worker.js'))
workers.forEach(mapCommonBindings)
// A helper function that schedules workers in a round-robin
workers.schedule = async task => {
  workers._c = ((workers._c || -1) + 1) % workers.length
  const worker = workers[workers._c]
  return await worker.solve(task)
}
// A helper used below that takes an object key/value pair and uses a worker to solve it
const _asyncHandleValuePair = async ([key, value]) => [key, new SerializedChunk(
  await workers.schedule(value)
)]
//= Final Function ===================================================================|
// The new function (You could improve the runtime by changing how this function schedules tasks)
// Note! This is async now, obviously
const jsonStringifyThreaded = async o => {
  const f_pairs = await Promise.all(Object.entries(o).map(_asyncHandleValuePair))
  // Take all final processed pairs, create a new object, JSON.stringify the top level
  const final = f_pairs.reduce((o, [key, chunk]) => (
    o[key] = chunk, // Add current key / chunk to object
    o // Return the object to the next reduce
  ), {}) // Seed empty object that will contain all the data
  return JSON.stringify(final)
}
/* lots of other code, down to the function that actually uses this */
async function submitter() {
  // other stuff
  const payload = await jsonStringifyThreaded(input.value)
  await server.send(payload)
  console.log('Done!')
}
worker.js
self.addEventListener('message', function (e) {
  const obj = e.data
  self.postMessage(JSON.stringify(obj))
}, false)
Notes:
This works the following way:
Creates a list of 10 workers, and adds a few methods and props to them
We care about async .solve(Object): String, which solves our tasks using promises while masking away callback hell
Use of a new method, async jsonStringifyThreaded(Object): String, which does the JSON.stringify asynchronously
We break the object into entries and solve each one in parallel (this could be optimized to recurse to a certain depth; use your best judgement :))
Processed chunks are cast into SerializedChunk, which JSON.stringify will use as-is and not try to process (since it has .toJSON())
Internally, if the number of keys exceeds the number of workers, we round-robin back to the first worker and over-schedule them (remember, they can handle queued tasks)
Optimizations
You may want to consider a few more things to improve performance:
Use of Transferable Objects, which will significantly decrease the overhead of passing objects to the workers
Redesign jsonStringifyThreaded() to schedule more objects at deeper levels.
You can explore libraries like fast-json-stringify, which compile a stringifier from a template schema and use it when converting the JSON object, to boost performance. Check the article below.
https://developpaper.com/how-to-improve-the-performance-of-json-stringify/
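For a taste of what that looks like (the schema here is a made-up example, not tied to the question's data):
const fastJson = require('fast-json-stringify');

// Compile a stringifier once from a schema, then reuse it
const stringify = fastJson({
    type: 'object',
    properties: {
        style: { type: 'string' },  // e.g. the SVG content column
        count: { type: 'integer' }
    }
});

console.log(stringify({ style: '<svg>...</svg>', count: 42 }));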

Rxjs - Consume API output and re-query when cache is empty

I'm trying to implement a version of this intro to RxJS (fiddle here) that, instead of picking a random object from a returned API array, consumes a backthrottled stream of objects from the returned API array.
Here's a portion of the code that produces a controlled Observable from the API response (full fiddle here):
var responseStream = requestStream.flatMap(function (requestUrl) {
    return Rx.Observable.fromPromise(fetch(requestUrl));
}).flatMap(function (response) {
    return Rx.Observable.fromPromise(response.json());
}).flatMap(function (json) {
    return Rx.Observable.from(json);
}).controlled();
I just dump each emitted user in console.log, and use a click event stream to trigger the request() call in the controlled Observable:
responseStream.subscribe(function (user) {
    console.log(user);
});
refreshClickStream.subscribe(function (res) {
    responseStream.request(1);
});
There are about 50 user objects returned from the GitHub API, and I'd like to backthrottle-consume them one per click (as seen above). However, once I'm fresh out of user objects, I'd like to send another call to requestStream to fetch another API response, replenish the responseStream, and continue providing user objects to console.log on each click. What would be the RxJS-friendly way to do so?
I'd do it similarly to the article's example with combineLatest(), although I wonder if there's an easier way than mine.
I'm requesting only 3 items at a time. The three-item batch size is hardcoded, so you'll want to modify this. I thought about making it universal, but that would require using a Subject and would make it much more complicated, so I stayed with this simple example.
Also, I'm using concatMap() to trigger fetching more data. Otherwise, clicking the link triggers just the combineLatest(), which emits another item from the array.
See live demo: https://jsfiddle.net/h3bwwjaz/12/
var refreshButton = document.querySelector('#main');
var refreshClickStream = Rx.Observable.fromEvent(refreshButton, 'click')
    .startWith(0)
    .scan(function (acc, val, index) {
        return index;
    });

var usersStream = refreshClickStream
    .filter(function (index) {
        return index % 3 === 0;
    })
    .concatMap(function () {
        var randomOffset = Math.floor(Math.random() * 500);
        var url = 'https://api.github.com/users?since=' + randomOffset + '&per_page=3';
        return Rx.Observable.fromPromise(fetch(url))
            .flatMap(function (response) {
                return Rx.Observable.fromPromise(response.json());
            });
    })
    .combineLatest(refreshClickStream, function (responseArray, index) {
        return responseArray[index % 3];
    })
    .distinct();

usersStream.subscribe(function (user) {
    console.log(user);
});
I use refreshClickStream twice:
to emit the next item in the array in combineLatest()
to check whether we're at the end of the array and need to make another request (that's the filter() operator).
At the end, distinct() is required because each click where index % 3 === 0 in fact triggers two emissions: the first comes from downloading the data, and the second comes directly from combineLatest(); we want to ignore the latter because we don't want to iterate the same data again. Thanks to distinct(), the duplicate is ignored and only new values are passed along.
I tried to figure out a method that avoids distinct(), but I couldn't find any.

Will Rx.Observable.groupBy clean up empty streams?

In a Node application I'm trying to process a stream of events using RxJS. The event stream is a list of changes to many documents. I'm using groupBy to partition the stream into new streams by documentId. But I'm wondering: once a document is closed on the client and no new events are added to the stream for that documentId, will groupBy dispose of that document's stream? If not, how would I do that manually? I want to avoid a memory leak caused by new document streams being created but never destroyed.
What I'd suggest doing:
instead of just having a documentChanges observable, have a documentEvents observable.
Clients will send documentOpened events when they open a document, documentChanged events when they change a document and documentClosed events when they close a document.
By sending all 3 types of events through the same observable, you establish and guarantee an ordering. If a client sends documentOpened, documentChanged and documentClosed events in that order, then your server will see them in that order. Note there won't be any guarantees about the order of events sent by 2 different clients; this just ensures that the events sent by a particular client arrive in order.
And then, this is how you'd use groupByUntil:
documentEvents
    .groupByUntil(
        function (e) { return e.documentId; }, // key
        null, // element
        function (group) { // duration selector
            var documentId = group.key;
            return group.filter(function (e) { return e.eventType === 'documentClosed'; });
        })
    .flatMap(function (eventsForDocument) {
        var documentId = eventsForDocument.key;
        return eventsForDocument.whatever(...);
    })
    .subscribe(...);
Another option that is a lot simpler: you can just expire the group after an idle period. Depending on what you are doing with the events this may be more than sufficient. This example expires a group if the document has not been edited in 5 minutes. If more edits come in then a new group is spun up.
var idleTime = 5 * 60 * 1000;
events
    .groupByUntil(
        function (e) { return e.documentId; },
        null,
        function (g) { return g.debounce(idleTime); })
    .flatMap...
Since you included the .NET tag, I'll cover Rx.NET as well.
Your question is phrased a bit incorrectly. Streams are empty if and only if they never emit an event, so they can't become empty. A stream that isn't emitting data doesn't typically consume many resources, though.
In .NET, groups will not terminate until the source terminates. You can use GroupByUntil, which allows you to specify a durationSelector stream for each group. Observable.Timer often works well for this.
This means that you may get multiple non-concurrent streams with the same key appearing over time, but if (as is often the case) your group streams are flattened at some point, it won't matter.
In RxJS, we also have groupByUntil.
In Rx-Java, the groupByUntil method, which behaved similarly, was rolled into groupBy - see https://github.com/ReactiveX/RxJava/pull/1727 and https://github.com/benjchristensen/RxJava/commit/b9302956832e3e77579f63fd9db25aa60eb4192a for more details.
http://reactivex.io/documentation/operators/groupby.html says:
If you unsubscribe from one of the GroupedObservables, that GroupedObservable will be terminated. If the source Observable later emits an item whose key matches the GroupedObservable that was terminated in this way, groupBy will create and emit a new GroupedObservable to match the key.
So, in Rx-Java you must unsubscribe from a grouped observable stream to terminate it. takeUntil with a timer stream can work for this; see the sketch below.
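A rough sketch of that in RxJS 4 syntax, to match the rest of this answer (source and the five-minute window are placeholders):
source
    .groupBy(function (e) { return e.documentId; })
    .flatMap(function (group) {
        // unsubscribe from each group five minutes after it is created
        return group.takeUntil(Rx.Observable.timer(5 * 60 * 1000));
    })
    .subscribe(...);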
Addendum:
In response to your comment: a stream will not terminate until a downstream operator unsubscribes from it. The duration selector of groupByUntil is what causes that termination. If a document will not be opened again once closed, then you can just send a "documentClosed" event into the stream and use a regular groupBy with a takeWhile testing for "documentClosed", as sketched below.
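A minimal sketch of that approach, reusing the documentEvents/eventType shape from the first answer:
documentEvents
    .groupBy(function (e) { return e.documentId; })
    .flatMap(function (group) {
        // the group completes as soon as its documentClosed event arrives
        return group.takeWhile(function (e) { return e.eventType !== 'documentClosed'; });
    })
    .subscribe(...);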
The reason it's important that the document is not opened again is that with groupBy (in RxJS and Rx.NET) a new group will not be created if an already-seen key reappears.
If this is a problem, then you will need to use groupByUntil and use a published stream to watch for the documentClosed event - using a published stream will ensure you don't get subscription side effects.

Dynamically run a sequence of promises using Q

I am writing a SPA with TypeScript using Breeze and Knockout.
What I want to do is create a launch manager which can perform the necessary steps required to even start the site (e.g. read configuration JSON, download OData metadata, initialize the Breeze metadata store and so on).
I've created the following to represent each step in the launch sequence:
export enum LauncherProgressStatus {
    Ready,
    InProgress,
    Success,
    Failed,
    Aborted
}
export class LauncherProgressItem {
    public status: KnockoutObservable<LauncherProgressStatus> = ko.observable<LauncherProgressStatus>();
    public description: KnockoutObservable<String> = ko.observable<String>();
    public statusText: KnockoutComputed<String> = ko.computed<String>(() => {
        return LauncherProgressItem.getStatusText(this.status());
    });
    public start() {
        this.action(this);
    }
    constructor(descriptionText: String,
                public action: (arg: LauncherProgressItem) => Boolean) {
        this.status(LauncherProgressStatus.InProgress);
        this.description(descriptionText);
    }
    public static getStatusText(status: LauncherProgressStatus): String {
        switch (status) {
            case LauncherProgressStatus.Ready:
                return "Ready";
            case LauncherProgressStatus.InProgress:
                return "In progress";
            case LauncherProgressStatus.Success:
                return "Success";
            case LauncherProgressStatus.Aborted:
                return "Aborted";
            default:
                return "Failed";
        }
    }
}
TL;DR I create each step like this in code:
var item1 = new launcher.LauncherProgressItem("Loading configuration...", (item: LauncherProgressItem) => {
    cfgMgr.setConfigurationFromFile("config.json?bust=" + (new Date()).getTime());
    return true;
});
Now the problem: I want to utilize this to create a promise chain using Q. I can do this manually, i.e.
q.fcall(() => item1.action(item1))
    .then(() => item2.action(item2))
    .fail((r) => { console.log("Many whelps, HANDLE IT!") });
But I want to create some kind of manager object that doesn't really know how many steps are required. It will just be responsible for building an array of promises and executing them in sequence, while being able to detect errors (in the fail handler, presumably) and abort the sequence.
The manager will have some kind of collection containing the LauncherProgressItem steps. Then I'm looking to build a chain of promises based on the content of that collection.
I've been looking at this for a while now but can't really seem to get my head around how to do it with Q. I've seen some examples, but I don't really understand how they work.
Anyone got any suggestions on how to achieve this?
Update: I'll try to clarify what I am trying to achieve: my LauncherProgressItem wraps a lambda function and some state information which I bind to my view. This is why I'm using them, but it's kind of irrelevant to what I'm actually struggling with.
So let's assume I have a class which contains an array of lambdas. This class has a method which will run all these lambdas in sequence using Q, aborting on error; exactly what I would achieve with the following code:
Q.fcall(doSomething).then(doSomethingElse).fail(reportError);
However, in this case doSomething and doSomethingElse reside in an array of functions, rather than being a fixed number of steps. This is because I want it to be reusable, i.e. able to run in multiple scenarios depending on the task at hand. So I want to avoid hard-coding the chain of functions to run.
Sorry, I don't know TypeScript, but I thought the comment thread above was not going very well, so here's the function you asked for in plain JS:
function runInSequence(functions) {
    if (!functions || !functions.length) {
        return Q(); // nothing to run; hand back an already-resolved promise
    }
    var nextPromise = Q.fcall(functions[0]);
    functions.slice(1).forEach(function (f) {
        nextPromise = nextPromise.then(f);
    });
    // return the end of the chain so callers can observe completion/failure
    return nextPromise.fail(yourErrorHandler);
}
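In the question's terms, you could wrap each LauncherProgressItem's action in a thunk and pass the array in. A rough usage sketch (launcherItems is a hypothetical array of your progress items):
var steps = launcherItems.map(function (item) {
    return function () { return item.action(item); };
});
runInSequence(steps); // each step runs only after the previous one resolves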

Create a copy of a module instead of an instance in Node.js

This would be my first question ever on Stack Overflow; I hope this goes well.
I've been working on a game (using Corona SDK) and I used Node.js to write a small server to handle some chat messages between my clients; no problems there.
Now I'm working on expanding this little server to do more. What I'm thinking of doing is creating an external file (module) that holds an object with all the functions and variables I need to represent a room in my game's lobby, where 2 people can go to play against each other. Each time I have 2 players ready to play, I would create a copy of this empty room for them and then initialize the game in that room.
So I have an array in my main project file where each cell is a room, and my plan was to import my module into that array so I can init the game in that specific "room". The players would play, the game would go on, and all would be well... but... my code in main.js:
var new_game_obj = require('./room.js');
games[room_id] = new_game_obj();
games[room_id].users = [user1_name,user2_name];
Now, in my room.js, I have something of the sort:
var game_logistics = {};
game_logistics.users = new Array();
game_logistics.return_users_count = function () {
    return game_logistics.users.length;
}
module.exports = function () {
    return game_logistics;
}
So far so good, and this works just fine; I can simply call:
games[room_id].return_users_count()
and I will get 0, 1, or 2, depending of course on how many users have joined this room.
The problems start once I open a new room: Node.js caches the module I've created and hands back the same instance rather than making a copy. If I now create a new room, even if I eliminate and/or delete the old room, the new one will carry all the information from the old room, which I've already updated, instead of being a clean room. Example:
var new_game_obj = require('./room.js');
games["room_1"] = new_game_obj();
games["room_2"] = new_game_obj();
games["room_1"].users = ["yuval","lahav"];
_log(games["room_1"].return_users_count()); // outputs 2...
_log(games["room_2"].return_users_count()); // outputs 2...
Even doing this:
var new_game_obj = require('./room.js');
games["room_1"] = new_game_obj();
var new_game_obj2 = require('./room.js');
games["room_2"] = new_game_obj2();
games["room_1"].users = ["yuval","lahav"];
_log(games["room_1"].return_users_count()); // outputs 2...
_log(games["room_2"].return_users_count()); // outputs 2...
This gives the same result: it is all the same instance of the same module in all the "copies" I make of it.
So my question is as simple as that: how do I create a "clean" copy of my original module, instead of just getting the same instance over and over again and ending up with one messy room?
What you're doing is this (replacing your require() call with what gets returned):
var new_game_obj = function () {
    return game_logistics;
}
So, every time you call new_game_obj, you return the same instance of game_logistics.
Instead, you need to make new_game_obj return a new instance of game_logistics;
// room.js
function Game_Logistics() {
    this.users = [];
    this.return_users_count = function () {
        return this.users.length;
    };
}

module.exports = function () {
    return new Game_Logistics();
}
This is quite a shift in mentality. You'll see that we're using new on Game_Logistics in module.exports to return a new instance of Game_Logistics each time it's called.
You'll also see that inside Game_Logistics, this is being used everywhere rather than Game_Logistics; this is to make sure we're referencing the correct instance of Game_Logistics rather than the constructor function.
I've also capitalized your game_logistics function to adhere to the widely-followed naming convention that constructor functions should be capitalized (more info).
Taking advantage of the prototype chain in JavaScript is recommended when you're working with multiple instances of functions. You can peruse various articles on "JavaScript prototypal inheritance" (e.g. this one), but for now, the above will accomplish what you need.
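If you do want to lean on the prototype chain, a sketch of the same room with the method shared across instances (same names as above, method moved onto the prototype):
// room.js
function Game_Logistics() {
    this.users = [];
}

// shared by every instance instead of being re-created per room
Game_Logistics.prototype.return_users_count = function () {
    return this.users.length;
};

module.exports = function () {
    return new Game_Logistics();
};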
