Will Rx.Observable.groupBy clean up empty streams? - javascript

In a Node application I'm trying to process a stream of events using RxJS. The event stream is a list of changes to many documents. I'm using groupBy to partition the stream into new streams by documentId. But I'm wondering: once a document is closed on the client and no new events are added to the stream for that documentId, will groupBy dispose of that document's stream once it is empty? If not, how would I do that manually? I want to avoid a memory leak caused by new document streams being created but never destroyed.

What I'd suggest doing:
instead of just having a documentChanges observable, have a documentEvents observable.
Clients will send documentOpened events when they open a document, documentChanged events when they change a document and documentClosed events when they close a document.
By sending all 3 types of events through the same observable, you establish and guarantee an ordering. If a client sends documentOpened, documentChanged, and documentClosed events in that order, then your server will see them in that order. Note that there is no ordering guarantee across events sent by 2 different clients; this only ensures that the events sent by a particular client arrive in order.
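For illustration only, the events might look something like this (the exact shape is an assumption; the code below only relies on documentId and eventType):
// hypothetical event objects flowing through documentEvents
var exampleEvents = [
    { eventType: 'documentOpened',  documentId: 'doc-1' },
    { eventType: 'documentChanged', documentId: 'doc-1', change: { text: 'new text' } },
    { eventType: 'documentClosed',  documentId: 'doc-1' }
];
// in a real app these would come from your clients, e.g. over a socket
var documentEvents = Rx.Observable.from(exampleEvents);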
And then, this is how you'd use groupByUntil:
documentEvents
    .groupByUntil(
        function (e) { return e.documentId; }, // key
        null, // element
        function (group) { // duration selector
            var documentId = group.key;
            return group.filter(function (e) { return e.eventType === 'documentClosed'; });
        })
    .flatMap(function (eventsForDocument) {
        var documentId = eventsForDocument.key;
        return eventsForDocument.whatever(...);
    })
    .subscribe(...);
Another option that is a lot simpler: you can just expire the group after an idle period. Depending on what you are doing with the events this may be more than sufficient. This example expires a group if the document has not been edited in 5 minutes. If more edits come in then a new group is spun up.
var idleTime = 5 * 60 * 1000;
events
    .groupByUntil(
        function (e) { return e.documentId; },
        null,
        function (g) { return g.debounce(idleTime); })
    .flatMap...

Since you included the .NET tag, I'll cover Rx.NET as well.
Your question is phrased a bit incorrectly. Streams are empty if and only if they never emit an event, so they can't become empty. A stream that isn't emitting data doesn't typically consume much in the way of resources, though.
In .NET, groups will not terminate until the source terminates. You can use GroupByUntil, which allows you to specify a durationSelector stream for each group. Observable.Timer often works well for this.
This means that you may get multiple non-concurrent streams with the same key appearing over time, but if (as is often the case) your group streams are flattened at some point, it won't matter.
In RxJS, we also have groupByUntil.
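As a rough sketch of the same timer-based idea in RxJS (the 10-minute lifetime is an arbitrary assumption, not something from the question):
documentEvents
    .groupByUntil(
        function (e) { return e.documentId; },
        null,
        // each group lives for a fixed window; if events for that key arrive
        // later, groupByUntil spins up a new group
        function (group) { return Rx.Observable.timer(10 * 60 * 1000); })
    .flatMap(function (group) { return group; })
    .subscribe(function (e) { console.log(e.documentId, e.eventType); });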
In Rx-Java, the groupByUntil method, which behaved similarly, was rolled into groupBy - see https://github.com/ReactiveX/RxJava/pull/1727 and https://github.com/benjchristensen/RxJava/commit/b9302956832e3e77579f63fd9db25aa60eb4192a for more details.
http://reactivex.io/documentation/operators/groupby.html says:
If you unsubscribe from one of the GroupedObservables, that GroupedObservable will be terminated. If the source Observable later emits an item whose key matches the GroupedObservable that was terminated in this way, groupBy will create and emit a new GroupedObservable to match the key.
So, in Rx-Java you must unsubscribe from a grouped observable stream to terminate it. takeUntil with a timer stream can work for this.
Addendum:
In response to your comment: a stream will not terminate until a downstream operator unsubscribes from it. The duration selector of groupByUntil is what causes that termination. If a document will not be opened again once closed, then you can just send a documentClosed event into the stream and use a regular groupBy with a takeWhile testing for the documentClosed event (see the sketch below).
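A minimal sketch of that groupBy + takeWhile variant, assuming the same documentId/eventType fields as above:
documentEvents
    .groupBy(function (e) { return e.documentId; })
    .flatMap(function (group) {
        // the inner stream for a document completes as soon as its
        // documentClosed event arrives
        return group.takeWhile(function (e) { return e.eventType !== 'documentClosed'; });
    })
    .subscribe(function (e) { /* handle events for still-open documents */ });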
The reason it's important that the document is not opened again is that with groupBy (in RxJS and Rx.NET) a new group will not be created if an already-seen key reappears.
If this is a problem, then you will need to use groupByUntil and use a published stream to watch for the documentClosed event - using a published stream will ensure you don't get subscription side effects.
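Here is a rough sketch of how that could be wired up (one possible arrangement, not the only one):
var published = documentEvents.publish();

published
    .groupByUntil(
        function (e) { return e.documentId; },
        null,
        function (group) {
            // watch the shared, published stream for this document's close event,
            // so the duration selector doesn't cause extra subscriptions to the source
            return published.filter(function (e) {
                return e.documentId === group.key && e.eventType === 'documentClosed';
            });
        })
    .flatMap(function (group) { return group; })
    .subscribe(function (e) { console.log(e.documentId, e.eventType); });

published.connect(); // start the flow once everything is wired up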

Related

Web Worker Creating Garbage

I'm working on a project that utilizes web workers. It seems that the workers are generating quite a bit of extra garbage that has to be collected from the message passing.
I'm sending three things to the worker via postMessage from the main thread. The first is just a number, the second is an array with 7 numbers, and the third is a date. The first two are properties of an object, as seen below. This is called every 16ms on requestAnimationFrame (RAF) for about 20 objects. The GC ends up collecting 12MB every 2 seconds or so. I'm wondering if there is a way to do this without creating so much garbage? Thanks for any help!
// planetnum (a property of an object) is just a number like: 1
// planetele (a property of an object) looks like this:
// [19.22942, 313.4868, 0.04441, 0.7726, 170.5310, 73.9893, 84.3234]
// date is just a Date object
// posted to the worker like so:
planetWorker.postMessage({
    "planetnum": planet.num,
    "planetele": planet.ele,
    "date": datet
});

// the worker.js file uses that information to do calculations
// and sends back the planet number with xyz coordinates (4 numbers)
postMessage({data: {planetnum: planetnum, planetpos: planetpos}});
I tried two different avenues and ended up using a combination of them. First, before sending some of the values over, I used JSON.stringify to convert them to strings, then JSON.parse to get them back once they reached the worker. For the array I ended up using transferable objects. Here is a simplified example of what I did:
var ast = [];
ast.elements = new Float64Array([0.3871, 252.2507, 0.20563, 7.005, 77.4548, 48.3305, 0.2408]);
ast.num = 1;

var astnumJ = JSON.stringify(ast.num); // Probably not needed, just an example

// From the main thread, post a message to the worker
asteroidWorker.postMessage({
    "asteroidnum": astnumJ,
    "asteroidele": ast.elements.buffer
}, [ast.elements.buffer]);
This sends the array to the worker without copying it, which reduces the garbage created. The buffer is now not accessible in the main thread, so once the worker posts its message back, you have to send the array back to the main thread or it won't be accessible as a property of ast anymore. In my case, because I have 20 - 30 ast objects, I need to make sure they all have their elements restored via postMessage before I call another update on them. I did this with a simple counter in a loop (see the sketch at the end of this answer).
// In worker.js
asteroidele = new Float64Array(e.data.asteroidele); // cast to type
asteroidnum = JSON.parse(e.data.asteroidnum);       // parse JSON

// Do calculations with this information in the worker, then return it to the main thread

// Post message from the worker back to main
self.postMessage({
    asteroidnum: asteroidnum,
    asteroidpos: asteroidpos, // Calculated position from the elements
    asteroidele: asteroidele  // Return the elements buffer back to main
});

// Main thread worker onmessage function
asteroidWorker.onmessage = function (e) {
    var data1 = e.data;
    ast.ele = data1.asteroidele; // Restore elements back to the ast object
};
I'm not sure this is the best approach yet, but it does work for sending the array to and from the worker without creating a bunch of extra garbage. I think the best approach here will be to send the array to the worker and leave it there, then just return updated positions. I'm still working on that.
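For what it's worth, here is a rough sketch of the counter idea mentioned above (asteroids, pendingReplies and updateAll are made-up names; the message shapes follow the snippets above):
var asteroids = [];        // the 20-30 ast-like objects, filled elsewhere
var pendingReplies = 0;

function updateAll() {
    if (pendingReplies > 0) return;     // previous round hasn't fully returned yet
    pendingReplies = asteroids.length;
    asteroids.forEach(function (ast) {
        asteroidWorker.postMessage({
            asteroidnum: ast.num,
            asteroidele: ast.elements.buffer
        }, [ast.elements.buffer]);      // transfer the buffer, don't copy it
    });
}

asteroidWorker.onmessage = function (e) {
    var ast = asteroids[e.data.asteroidnum - 1];   // look up by number (assumes nums start at 1)
    ast.elements = e.data.asteroidele;             // the elements are usable on the main thread again
    ast.pos = e.data.asteroidpos;
    pendingReplies--;                              // once this hits 0, updateAll may run again
};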

Do RANGE_ADD mutations have to fetch the whole connection-array each time?

I finally got RANGE_ADD mutations to work. Now I'm a bit confused and worried about their performance.
One outputField of a RANGE_ADD mutation is the edge to the newly created node. Each edge has a cursor. cursorForObjectInConnection() (see the docs) is a helper function that creates that cursor. It takes an array and a member object. See:
newChatroomMessagesEdge: {
    type: chatroomMessagesEdge,
    resolve: async ({ chatroom, message }) => {
        clearCacheChatroomMessages();
        const messages = await getChatroomMessages(chatroom.id);

        let messageToPass;
        for (const m of messages) {
            if (m.id === message.id) {
                messageToPass = m;
            }
        }

        return {
            cursor: cursorForObjectInConnection(messages, messageToPass),
            node: messageToPass,
        };
    },
},
plus a similar edge for user messages.
Now this is what confuses me. Say you want to make a chatroom app. You have a userType, a chatroomType and a messageType. Both the userType and the chatroomType have a messages field. It queries for all of the user's or chatroom's messages respectively and is defined as a Relay connection that points to messageType. Now, each time a user sends a new message, two RANGE_ADD mutations are committed: one that creates an edge for the chatroom's messages and one that creates an edge for the user's messages. Each time, because of cursorForObjectInConnection(), a query for all of the user's messages and one for all of the chatroom's messages is sent to the database. See:
const messages = await getChatroomMessages(chatroom.id);
As one can imagine, a chatroom generates lots of these "message sent" mutations, and the number of messages in a chatroom grows rapidly.
So here is my question: Is it really necessary to query for all of the chatroom messages each time? Performance-wise, this seems like an awful thing to do.
Sure, although I have not looked into them yet, there are optimistic updates I can use to ease the pain client-side - a sent message gets displayed immediately. But still, the endless database queries continue.
Also, there is DataLoader. But as far as I understand DataLoader, it is used as a per-request batching and caching mechanism - especially in conjunction with GraphQL. Since each mutation is a new request, DataLoader does not seem to help on that front either.
If anyone can shed some light on my thoughts and worries, I'd be more than happy :)

Node.js Socket IO: How to continuously save socket data to MongoDB

I'm building a 3D game in the browser using THREE.js. Lots of fun, but I came across the following situation:
An object in my 3D scene is continuously moving around, driven by user input. I need to save the object's position to my database in real-time.
Let's start at the front-end. Angular.js is watching my object's position using its built-in $watch functionality. The object's position can change multiple times per second.
On each change, I emit an event to the backend Node.js server using Socket IO, like so:
socket.emit('update', {
    id: id,
    position: position
});
In the back-end, the event is caught and immediately emitted to the other members in the same Socket IO room. This way, everyone in the room gets the most real-time update possible.
Now, because the event can fire multiple times per second, I don't want to update my MongoDB collection on each change, since this would cause a lot of overhead. Instead, I'm looking for a way of only periodically saving the data to the database.
I've come up with a solution using Node's setInterval function, which saves the data every 1000ms. For each distinct id (which is unique per object) received on the backend, a new key is created on a JavaScript object, thus keeping track of changes on a per-object basis.
The (simplified) code on the backend:
let update_queue = {};

// ...

// Update Event
socket.on('update', (msg) => {
    // Flag changes and remember the latest position for this id
    if (!update_queue[msg.id]) update_queue[msg.id] = {};
    update_queue[msg.id].changes = true;
    update_queue[msg.id].position = msg.position;

    // Set Interval Timer
    if (!update_queue[msg.id].timer) {
        update_queue[msg.id].timer = setInterval(() => {
            if (!update_queue[msg.id].changes) {
                // nothing new since the last save; stop the timer until the next update
                clearInterval(update_queue[msg.id].timer);
                update_queue[msg.id].timer = null;
                return;
            }

            // This saves the latest position to MongoDB
            Object3DCollection.update(msg.id, update_queue[msg.id].position)
                .then((res) => {
                    console.log('saved');
                });

            // Unflag Changes
            update_queue[msg.id].changes = false;
        }, 1000);
    }

    // Immediate Broadcast to Socket Room
    socket.broadcast.to('some_room').emit('object_updated', msg);
});
The Question
Is this a proper way of handling very frequent socket data and still saving it to a database? Or are there other suggestions/solutions that are more robust or work better?
Note
I do not want to wait for my object to be saved to the database and then emit the saved data to the rest of the socket room. The delay of database write operations is not suitable for the real-time game situation I'm dealing with.
Thanks in advance! All suggestions/solutions are appreciated and will be considered.

Is localStorage thread safe?

I'm curious about the possibility of corrupting a localStorage entry by overwriting it in two browser tabs simultaneously. Should I create a mutex for local storage?
I was already thinking of a pseudo-class like this:
LocalStorageMan.prototype.v = LocalStorageMan.prototype.value = function(name, val) {
    // Set inner value
    this.data[name] = val;

    // Delay any changes if the local storage is being changed
    if (localStorage[this.name + "__mutex"] == 1) {
        var self = this;
        setTimeout(function() { self.v(name, val); }, 1); // `this` would be lost inside setTimeout
        return null; // Very good point #Lightness Races in Orbit
    }

    // Lock the mutex to prevent overwriting
    localStorage[this.name + "__mutex"] = 1;
    // Save serialized data
    localStorage[this.name] = this.serializeData;
    // Allow usage from other tabs
    localStorage[this.name + "__mutex"] = 0;
}
The function above implies a local storage manager that manages one specific key of the local storage - localStorage["test"] for example. I want to use this for Greasemonkey userscripts, where avoiding conflicts is a priority.
Yes, it is thread safe. However, your code isn't atomic, and that's the real problem here. I'll get to the thread safety of localStorage, but first, how to fix your problem.
Both tabs can pass the if check together and write to the item, overwriting each other. The correct way to handle this problem is using StorageEvents.
These let you notify other windows when a key has changed in localStorage, effectively solving the problem for you in a built-in, safe, message-passing way. Here is a nice read about them. Let's give an example:
// tab 1
localStorage.setItem("Foo", "Bar");

// tab 2
window.addEventListener("storage", function (e) {
    alert("StorageChanged!"); // this will run when localStorage is changed by another tab
});
Now, what I promised about thread safety :)
As I like to do, let's look at this from two angles: what the specification says and what the implementation does.
The specification
Let's show it's thread safe by specification.
If we check the specification of Web Storage we can see that it specifically notes:
Because of the use of the storage mutex, multiple browsing contexts will be able to access the local storage areas simultaneously in such a manner that scripts cannot detect any concurrent script execution.
Thus, the length attribute of a Storage object, and the value of the various properties of that object, cannot change while a script is executing, other than in a way that is predictable by the script itself.
It even elaborates further:
Whenever the properties of a localStorage attribute's Storage object are to be examined, returned, set, or deleted, whether as part of a direct property access, when checking for the presence of a property, during property enumeration, when determining the number of properties present, or as part of the execution of any of the methods or attributes defined on the Storage interface, the user agent must first obtain the storage mutex.
Emphasis mine. It also notes, in a side note, that some implementors don't like this.
In practice
Let's show it's thread safe in implementation.
Choosing a random browser, I picked WebKit (because I didn't know where that code was located before). If we check WebKit's implementation of Storage, we can see that it has its fair share of mutexes.
Let's take it from the start. When you call setItem or assign, this happens:
void Storage::setItem(const String& key, const String& value, ExceptionCode& ec)
{
    if (!m_storageArea->canAccessStorage(m_frame)) {
        ec = SECURITY_ERR;
        return;
    }

    if (isDisabledByPrivateBrowsing()) {
        ec = QUOTA_EXCEEDED_ERR;
        return;
    }

    bool quotaException = false;
    m_storageArea->setItem(m_frame, key, value, quotaException);
    if (quotaException)
        ec = QUOTA_EXCEEDED_ERR;
}
Next, this happens in StorageArea:
void StorageAreaImpl::setItem(Frame* sourceFrame, const String& key, const String& value, bool& quotaException)
{
    ASSERT(!m_isShutdown);
    ASSERT(!value.isNull());
    blockUntilImportComplete();

    String oldValue;
    RefPtr<StorageMap> newMap = m_storageMap->setItem(key, value, oldValue, quotaException);
    if (newMap)
        m_storageMap = newMap.release();
    if (quotaException)
        return;

    if (oldValue == value)
        return;

    if (m_storageAreaSync)
        m_storageAreaSync->scheduleItemForSync(key, value);
    dispatchStorageEvent(key, oldValue, value, sourceFrame);
}
Note the blockUntilImportComplete call here. Let's look at that:
void StorageAreaSync::blockUntilImportComplete()
{
    ASSERT(isMainThread());

    // Fast path. We set m_storageArea to 0 only after m_importComplete being true.
    if (!m_storageArea)
        return;

    MutexLocker locker(m_importLock);
    while (!m_importComplete)
        m_importCondition.wait(m_importLock);
    m_storageArea = 0;
}
They also went as far as to add a nice note:
// FIXME: In the future, we should allow use of StorageAreas while it's importing (when safe to do so).
// Blocking everything until the import is complete is by far the simplest and safest thing to do, but
// there is certainly room for safe optimization: Key/length will never be able to make use of such an
// optimization (since the order of iteration can change as items are being added). Get can return any
// item currently in the map. Get/remove can work whether or not it's in the map, but we'll need a list
// of items the import should not overwrite. Clear can also work, but it'll need to kill the import
// job first.
Explaining that this works, but that it could be more efficient.
No, it's not. The storage mutex was removed from the spec, and this warning was added instead:
The localStorage getter provides access to shared state. This specification does not define the interaction with other browsing contexts in a multiprocess user agent, and authors are encouraged to assume that there is no locking mechanism. A site could, for instance, try to read the value of a key, increment its value, then write it back out, using the new value as a unique identifier for the session; if the site does this twice in two different browser windows at the same time, it might end up using the same "unique" identifier for both sessions, with potentially disastrous effects.
See HTML Spec: 12 Web storage
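To make that warning concrete, here is a rough sketch of the read-increment-write pattern the spec describes (the key name sessionCounter is made up):
// each tab reads the current value, increments it, and writes it back
var current = parseInt(localStorage.getItem("sessionCounter") || "0", 10);
var id = current + 1;
localStorage.setItem("sessionCounter", String(id));
// without any locking, two tabs running this at the same time can both read the
// same old value and end up with the same "unique" id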

Subscribe to a count of an existing collection

I need to keep track of a count of a collection with a huge number of documents that's constantly being updated (think a giant list of logs). What I don't want to do is have the server send me a list of 250k documents; I just want to see a counter rising.
I found a very similar question here, and I've also looked into .observeChanges() in the docs, but once again, it seems that .observe() as well as .observeChanges() actually return the whole set before tracking what's been added, changed or deleted.
In the example above, the "added" function fires once for every document returned, to increment a counter.
This is unacceptable with a large set - I only want to keep track of a change in the count, since as I understand it, .count() bypasses fetching the entire set of documents. The former example involves counting only documents related to a room, which isn't something I want (or was able to reproduce and get working, for that matter).
I've gotta be missing something simple, I've been stumped for hours.
Would really appreciate any feedback.
You could accomplish this with the meteor-streams smart package by Arunoda. It lets you do pub/sub without needing the database, so one thing you could send over is a reactive number, for instance.
Alternatively, and this is slightly more hacky but useful if you've got a number of things you need to count or something similar, you could have a separate "Statistics" collection (name it whatever) with a document containing that count.
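A rough sketch of that idea (Statistics, Messages and addLog are made-up names here, and it assumes every log write goes through the method):
// shared: the counter lives in its own tiny collection
Statistics = new Meteor.Collection("statistics");

// server: bump the counter whenever a log document is written
Meteor.methods({
    addLog: function (logDoc) {
        Messages.insert(logDoc);
        Statistics.update({ _id: "logCount" }, { $inc: { count: 1 } }, { upsert: true });
    }
});

// server: publish just the single counter document
Meteor.publish("logCount", function () {
    return Statistics.find({ _id: "logCount" });
});

// client: subscribe to "logCount" and read Statistics.findOne("logCount").count reactively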
There is an example in the documentation about this use case. I've modified it to your particular question:
// server: publish the current size of a collection
Meteor.publish("nbLogs", function () {
    var self = this;
    var count = 0;
    var initializing = true;

    var handle = Messages.find({}).observeChanges({
        added: function (id) {
            count++;
            if (!initializing)
                self.changed("counts", "nbLogs", {nbLogs: count});
        },
        removed: function (id) {
            count--;
            self.changed("counts", "nbLogs", {nbLogs: count});
        }
        // don't care about moved or changed
    });

    // Observe only returns after the initial added callbacks have
    // run. Now return an initial value and mark the subscription
    // as ready.
    initializing = false;
    self.added("counts", "nbLogs", {nbLogs: count});
    self.ready();

    // Stop observing the cursor when the client unsubscribes.
    // Stopping a subscription automatically takes
    // care of sending the client any removed messages.
    self.onStop(function () {
        handle.stop();
    });
});

// client: declare a collection to hold the count object
Counts = new Meteor.Collection("counts");

// client: subscribe to the count
Meteor.subscribe("nbLogs");

// client: use the new collection
Deps.autorun(function () {
    var doc = Counts.findOne("nbLogs");
    if (doc)
        console.log("nbLogs: " + doc.nbLogs);
});
There might be some higher level ways to do this in the future.
