NodeJS + SocketIO large socket event management - javascript

I have a million Dinosaur Users all logged in.
Dinosaurs want to see when other Dinosaurs update their profile in real time, so they are hooked into the NodeJS/Mongoose model as:
dinosaur.schema.post('save', function (doc) {
  socket.emit('dinosaur:save', doc);
});
where socket is the socket of the connected Dinosaur.
Dinosaurs are also going to see real time updates from several other things. Maybe news, comments, etc etc.
So, my question, is there some instance where this emitting of events will grow large and impact performance?
On the client side, I'll have something like socket.on('dinosaur:save', function(){})... I destroy the listeners when not needed. BUT, if I'm listening to every dinosaur:save, I could theoretically be processing that for a million saves a second (say if every dinosaur updated their profile in the same second). It just seems like there's a better way to do that with large data sets.
I imagine there are several other events I may want to watch and I'm just wondering if there are some recommended methods for this kind of socket management.
EDIT: To be clear, I'm aware of rooms, but if I, for example, have a scrolling list of all nearby Dinosaurs in my area, I probably just want to hook into receiving all of the dinosaur:save events. So I'm still not sure.

Notifying a million clients of anything is a lot of packets, and if the thing you're notifying about occurs a lot, that's a lot of a lot, and it's even more than a lot to try to show on screen.
The usual first things to consider are:
How real-time do these notifications really have to be? Can you batch up 60 seconds or longer of notifications into one packet per user per notification period?
Does every user really have to see every single change from every other user? You know there's absolutely no way that any user interface can present the state of a million other users. So, I'd argue that every user doesn't have to know about the state of every other user. Maybe if there are 1-50 other users, but not if there's a million.
Can you algorithmically determine which users' state a given user might be interested in and only broadcast to them? For example, can you keep them up to date only on other users that are geographically near them?
Can you have a user interface where the user tells you which other users they want to track, so you only update those? Or perhaps some combination of users they tell you about and users who are geographically interesting to them. The point is that a user can't watch a million other users anyway, so you're going to have to invent a UI that shows a lot less than that.
You can always have a UI that will fetch the state of other users upon demand if the client doesn't already have that state, so you don't have to keep the state for all million users in each client (since it can't possibly all be shown at once anyway). If the user browses to see some things they don't already have, you just fetch it from the server via the socket or ajax call.
Oh, and at the scale you're talking about, you are probably going to need to have your user's connections spread out among several servers so you're going to have to handle that complexity too.
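The batching idea in the first point above can be sketched as follows. This is a minimal sketch, not the asker's actual code: the `dinosaur:batch` event name is invented, and the timer wiring is left as a comment.

```javascript
// Collect updates and flush them as one packet per interval,
// instead of emitting one 'dinosaur:save' per save.
function createBatcher(emitFn) {
  var pending = [];
  return {
    add: function (doc) { pending.push(doc); },
    flush: function () {
      if (pending.length === 0) return;
      emitFn('dinosaur:batch', pending.slice()); // one packet carrying many updates
      pending.length = 0;
    }
  };
}

// Hypothetical wiring: flush on a timer, e.g. one packet per socket per minute:
// var batcher = createBatcher(socket.emit.bind(socket));
// setInterval(batcher.flush, 60000);
```

The point is that a save now only appends to an array; the network cost is paid once per interval, no matter how many saves happened in it.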

In case anyone comes across this in the future, here's the problem and my current solution.
We want real time updates. If you're on someone's profile page, and they update it, show that. If you're viewing some catered result set, say, of user profiles, and any one of those users updates their profile, show that. If you're on another page and some counter changes, say, of users near you, show that. But we aren't going to be on all these pages at the same time, so on the client side I don't even want to know about the other changes if I'm not on those other pages. This is the problem that could cause me to be notified of everything, which could cause bandwidth issues and a whole bunch of unnecessary socket usage.
So, the way I'm solving my problem is using rooms. I'm using rooms over namespaces because namespaces are generally used for two mutually disjoint applications accessing the same socket resource. Rooms also just fit better with this application.
I've created a dynamic, on-the-fly room for every user profile page. When a visitor opens a profile page, have the client call:
socket.emit("joinRoom", modelName + ":" + modelObj._id);
and handle that on the server with:
socket.on('joinRoom', function (room) {
  socket.join(room);
});
This automatically creates the room if there isn't one yet and adds the user to it. modelName could be whatever we want; it's just a naming convention for how I split up the rooms. You can call rooms anything, BUT the important part is the ._id at the end. Using Mongoose, no two DB objects can have the same ._id, so this guarantees unique rooms.
When the owner of this profile page updates their information we call on the server:
io.sockets.in('Dinosaur:' + doc._id).emit('Dinosaur:' + doc._id + ':updated', doc);
And receive that on the client side using:
socket.on(modelName + ":" + modelObj._id + ":updated", function (msg) {
  // do something
});
Voilà, we have sent this necessary information only to the interested clients.
--
(A separate problem) --
Using this approach, we can also deliver data pertaining to multiple users. If we have a catered result list of user profiles, then for each profile we can add the current user into the rooms of all of those catered result profiles. (So they're in a room belonging to _id X, _id Y, _id Z, etc.)
The current user would then be in multiple rooms, all reflecting the immediate updates of those users, and therefore the entire catered result list, for whatever list it may be (perhaps it is "Dinosaurs nearby").
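Joining the rooms for a whole result set could look like this in Socket.IO terms. This is a sketch: the `watchProfiles` event name and the function names are invented for illustration; only the modelName + ':' + _id room-naming convention comes from the answer above.

```javascript
// Subscribe a client to the room of every profile in a result set,
// using the modelName + ':' + _id room-naming convention.
function watchProfiles(socket, modelName, ids) {
  ids.forEach(function (id) {
    socket.join(modelName + ':' + id);
  });
}

// And leave those rooms when the list page is closed:
function unwatchProfiles(socket, modelName, ids) {
  ids.forEach(function (id) {
    socket.leave(modelName + ':' + id);
  });
}

// Hypothetical wiring on the server:
// socket.on('watchProfiles', function (ids) {
//   watchProfiles(socket, 'Dinosaur', ids);
// });
```

Each profile update then goes only to sockets that joined that profile's room, so the "nearby Dinosaurs" list gets live updates without every client hearing every save.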
Another approach to this, especially if the list is somewhat more static, is to have the socket re-deliver the result set every X seconds, using the same socket, and just the same initial room.

Syncing multiple Socket.io clients

I'm building an app that uses Angular.js for the front and Socket.IO & Redis on Express on the back.
The base usage of sockets is to allow one type of users to push items to lists that are consumed by groups of a second type of users.
A simple example:
Students can push messages into a class list and only teachers of this class can see the list.
I'm trying to sync the list between multiple teachers that are connected at different times. The lists are stored in a Redis store, and I'm wondering which is the correct approach to sync clients:
A. Send the whole list on each update, saving the client from having to manage sync and avoiding potential mismatches.
B. Send the list only on connection and apply incremental updates on successive events.
I'm sure this has been addressed in the past as it seems quite a basic issue with socket communication but I was not able to find a definitive answer.
Thanks!
If the list is not particularly large, then I'd think you want to go with something simple. The simplest thing I can think of is as follows:
Student creates change to the list and sends message to the server (which could be an ajax call, doesn't have to be a web socket).
Server receives message and puts it into the appropriate list storage.
Server then looks for any clients monitoring that list and sends an update message to them.
Teacher connects to the server. Any lists that the teacher is monitoring are sent in their entirety to the teacher and they are subscribed to updates for those lists.
This way, you're never actually doing sync which simplifies matters a lot - you're just doing download list and then incremental updates. There's only one master store. If a client goes off-line, they just get a fresh copy of the list and resubscribe to updates when they come back on-line. Avoiding sync makes the whole solution a lot simpler. This assumes the data is not particularly large so it's feasible to just get a fresh copy of the list as needed.
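The four steps above can be sketched as a single in-memory store. This is a sketch, not the poster's implementation: the `send` method on clients and the event names are invented, and the Redis and Socket.IO wiring is omitted.

```javascript
// One master copy of the list on the server: subscribers get the full list
// when they connect and incremental updates afterwards. No real sync needed.
function createListStore() {
  var items = [];
  var subscribers = [];
  return {
    // Teacher connects: send the entire list, then subscribe to updates.
    subscribe: function (client) {
      client.send('list:full', items.slice());
      subscribers.push(client);
    },
    // Student posts: store the item, then notify every monitoring client.
    add: function (item) {
      items.push(item);
      subscribers.forEach(function (client) {
        client.send('list:update', item);
      });
    }
  };
}
```

A client that goes offline simply calls subscribe again when it comes back, getting a fresh full copy instead of reconciling anything.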
If you do want to do sync, then a fairly straightforward technique is to maintain one master copy of the store on the server and have every change transaction assigned a monotonically increasing transaction ID. Then each synced copy can just keep track of the last transaction ID it synced and request all transactions since then. The data store needs to keep track of all changes as transactions (often by writing each one to a transaction log, or perhaps via a feature in some databases) so any given set of transactions can be played back for any client that is syncing.
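The transaction-log variant could be sketched like this. It's a sketch under simplified assumptions: the log is a plain in-memory array rather than a database feature, and the names are invented.

```javascript
// Every change gets a monotonically increasing transaction ID; a syncing
// client tracks the last ID it saw and asks for everything after it.
function createTransactionLog() {
  var nextId = 1;
  var log = [];
  return {
    append: function (change) {
      var tx = { id: nextId++, change: change };
      log.push(tx);
      return tx.id;
    },
    // Replay all transactions a client has not seen yet.
    since: function (lastSeenId) {
      return log.filter(function (tx) { return tx.id > lastSeenId; });
    }
  };
}
```

A reconnecting client sends its last seen ID and gets back only the missed transactions, instead of the whole list.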

An alternate way of reloading a chat after few seconds

So I was implementing a chat room. I'll start off with the schema that I used.
I have a room table, that basically stores the information regarding the chatroom, like the number of participants, the topic, etc.
I have a users table that stores the users info.
I have a posts table that stores the posts. This has a foreign key from Users and from room tables.
Also, I have one final table that relates users and rooms; it just has the roomid and the userid of the users who are part of the room.
Now, I have three divs on page, one for the chatarea, the other where the people online are shown and then there is a text area to post the message.
What I am doing currently is to have a JavaScript function loadChats(); this method calls a PHP file that fetches all the posts in that particular room so far, and the result is dumped into my div, i.e. "chatroom".
Similarly, I have a loadParticipants() that loads the users every other second.
I am using jQuery.post for the purpose, and at the end of the function I do a setTimeout. Now here are my questions:
Of course I can make this better. Any suggestions? I was thinking of a few.
On every call to PHP, I get the entire chat history and send it back to the browser. Of course I can check if the count of messages is the same as it is on the client side, and if it is, then I won't send the messages. But is it going to make it any better? How?
Also, making a call to the server side every other second seems like overkill. Is there any way to do it such that, if some new chat is added to the posts table, then that particular chatroom is notified and updated? I.e., instead of constantly pinging the server to ask for new messages, ask it once and wait until there is something new. When that request is completed, ping the server again for the next update.
You should look into websockets (I've never used them with PHP but this seems really promising: http://socketo.me/). What you can do is have the server push any new messages to the client whenever they come in, and have each of the clients push to the server, etc. This way you won't have to keep pinging the server over and over every 2 seconds, and loading tons of data to compare. When there's a new message, the server saves it to some database and then pushes that message through all the open sockets. Same thing with logging in/logging off.
edit: Just looked through the page even more and their tutorial even goes through how to get it set up with a basic chatroom-esque functionality.
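The push pattern itself is independent of PHP. Stripped of the transport, it amounts to a broker that saves each new message once and fans it out to every open connection. This is a sketch in JavaScript with invented names; `saveFn` stands in for the database write and `send` for whatever the socket library provides.

```javascript
// Server keeps every open chat connection; a new message is persisted once
// and pushed to all of them, so clients never have to poll.
function createChatRoom(saveFn) {
  var connections = [];
  return {
    open: function (conn) { connections.push(conn); },
    close: function (conn) {
      connections = connections.filter(function (c) { return c !== conn; });
    },
    post: function (msg) {
      saveFn(msg); // persist to the posts table first
      connections.forEach(function (c) { c.send(msg); }); // then push to every open socket
    }
  };
}
```

Logging in and out map to open and close; the 2-second polling loop disappears entirely.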

Caching client side code results across pages

In our application we are painting the navigation component using JavaScript/jQuery, and because of authorization this involves complex logic.
The navigation component is required on almost all authenticated pages, so whenever the user navigates from one page to another, the complex logic is repeated on every page.
I am sure that under particular conditions the results of these complex calculations will not change for a certain period, so I feel recalculation is unnecessary under those conditions.
So I want to store/cache the results on the browser/client side. One solution, I feel, would be creating a cookie with the results.
I need suggestions if it is a good approach. If not, what else can I do here?
If you can rely on modern browsers, HTML5 web storage options are a good bet.
http://www.html5rocks.com/en/features/storage
Quote from above
There are several reasons to use client-side storage. First, you can
make your app work when the user is offline, possibly sync'ing data
back once the network is connected again. Second, it's a performance
booster; you can show a large corpus of data as soon as the user
clicks on to your site, instead of waiting for it to download again.
Third, it's an easier programming model, with no server infrastructure
required. Of course, the data is more vulnerable and the user can't
access it from multiple clients, so you should only use it for
non-critical data, in particular cached versions of data that's also
"in the cloud". See "Offline": What does it mean and why should I
care? for a general discussion of offline technologies, of which
client-side storage is one component.
if (typeof(Storage) !== "undefined") {
  // this will store and retrieve key / value for the browser session
  sessionStorage.setItem('your_key', 'your_value');
  sessionStorage.getItem('your_key');
  // this will store and retrieve key / value permanently for the domain
  localStorage.setItem('your_key', 'your_value');
  localStorage.getItem('your_key');
}
You can also try HTML5 Local Storage or Web SQL, which give you more options. Web SQL support is much narrower than Local Storage support. Have a look at this: http://diveintohtml5.info/storage.html
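For the navigation-caching case specifically, storing a timestamped entry lets you skip the recalculation until it goes stale. This is a sketch with invented names; the storage object and clock are passed in so the same function works with sessionStorage or localStorage.

```javascript
// Return a cached result if it is younger than maxAgeMs; otherwise
// recompute it and refresh the cache entry.
function cachedResult(storage, key, maxAgeMs, computeFn, now) {
  var raw = storage.getItem(key);
  if (raw) {
    var entry = JSON.parse(raw);
    if (now - entry.at < maxAgeMs) return entry.value; // still fresh
  }
  var value = computeFn();
  storage.setItem(key, JSON.stringify({ at: now, value: value }));
  return value;
}

// Hypothetical usage on each page load:
// var nav = cachedResult(localStorage, 'navData', 10 * 60 * 1000,
//                        buildNavigation, Date.now());
```

Unlike a cookie, this data never rides along on every HTTP request, which is another reason to prefer web storage for a cache like this.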

How to push update to user without multiple updates if more than one tab is open?

When a function is run, an update gets pushed to the user, but if the user has 2 or more tabs open, it'll update once per open tab.
I'd like the update to apply only once per user, regardless of how many connections they have to the server.
Any ideas on how to stop/fix this?
I think this is related to sharing a connection on the browser side. With this support, regardless of the number of tabs, a physical connection is established once per device, and notifications from the server are broadcast to the tabs. So the server doesn't need to consider that.
In fact, I added this support to my library, the portal, but there were some arguments. If you want to implement or enhance the connection sharing from scratch, see my answer to a similar question. However, that is somewhat out of date.
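The sharing idea boils down to one "leader" tab holding the physical connection and relaying what it receives to the other tabs, e.g. via localStorage and the window 'storage' event. Here is a rough sketch with invented names; the event plumbing is faked out so only the logic shows, and how the leader is elected is left out entirely.

```javascript
// Leader tab writes each server notification into shared storage; follower
// tabs pick it up from a 'storage' event instead of opening their own socket.
function createTabSharer(storage, isLeader) {
  return {
    // Called in the leader tab whenever the server pushes a notification.
    publish: function (notification) {
      if (!isLeader) return false; // followers never talk to the server
      storage.setItem('shared:notification', JSON.stringify(notification));
      return true;
    },
    // Called in follower tabs from a window 'storage' event handler.
    read: function () {
      var raw = storage.getItem('shared:notification');
      return raw ? JSON.parse(raw) : null;
    }
  };
}
```

The server then sees exactly one connection per device, so it delivers each update once no matter how many tabs are open.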

How to prevent multiple ajax requests when multiple browser windows are open

I have been working on an instant messaging system on a website (kind of like Facebook and Gmail). I have JavaScript poll the server for new messages.
If the user has multiple instances of the site open, is there any way to prevent each one from making requests?
You can assign each "new" load of the page a UUID, and drop requests from all UUIDs that are not the most recent one for the user. You need to send the UUID back in each request. If you want to get advanced, you can have the JavaScript on the page check the response to see if the server says it's an old UUID and that it should stop making requests.
Register each connection with a GUID generated on the fly in the browser. Check the GUID and the username pair to see which page was owner last. On page load, declare yourself a new window and that you're taking ownership. Sort of PageJustLoadedMakeMeOwner(myGuid, username)
Then have that GUID-tagged page update the server regularly to maintain its ownership of the page.
If it stops updating the server, then have rules on the server that allow the next page that contacts it to take ownership for that username.
Have pages that have lost ownership self-demote to only accessing once a minute or so.
The response to check if a given page is owner of that username is really fast. Takes almost no time to do, as far as the client is aware. So the AJAX there doesn't really restrict you.
Sort of an AmIOwner(username, myGuid) check (probably do this every five seconds or so). If true, then do the thing you want to happen. If false, then poll to see if the ownership of the page is vacant. If it is, take ownership; if not, poll again in xx seconds to see if the ownership is vacant.
Does that make any sort of sense?
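Server-side, "most recent UUID wins" is just a map from username to the last UUID that declared ownership. Here is a sketch with invented names, mirroring the PageJustLoadedMakeMeOwner / AmIOwner calls described above.

```javascript
// Track which page-load UUID currently owns polling for each user,
// so requests from stale UUIDs can be dropped.
function createOwnerRegistry() {
  var owners = {};
  return {
    // PageJustLoadedMakeMeOwner: the newest page load takes ownership.
    claim: function (username, uuid) { owners[username] = uuid; },
    // AmIOwner: only the current owner should keep making requests.
    isOwner: function (username, uuid) { return owners[username] === uuid; }
  };
}
```

Timeout handling (demoting an owner that stops checking in) would layer on top by storing a last-seen timestamp next to each UUID.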
You could do something for multiple instances in the same browser, but there's nothing you can do if the user has multiple browsers. (Granted, that's not a common scenario.)
If you still want to give it a try, probably the easiest way would be to keep a timestamp of the last request in a cookie and make a new request only past a certain threshold. You still might run a small race until the multiple instances settle down, but if you use a fuzzy time period for the polls, the instances should settle down pretty quickly to a stable state where one of them makes the call and the others reuse the result of the last call.
The main advantage of that approach is that the requests can be made by any of the instances, so you don't have to worry about negotiating a "primary" instance that makes the calls and figuring a fallback algorithm if the user closes the "primary" one. The main drawback is that since it's a fuzzy timing based algorithm, it does not fully eliminate the race conditions and occasionally you'll have two instances make the requests. You'll have to fine tune the timing a bit, to minimize that case, but you can't fully prevent it.
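The timestamp idea reduces to a check like this before each poll. This is a sketch with invented names; in the real page the shared timestamp would live in a cookie visible to all tabs, which here is just an injected object.

```javascript
// Poll only if no instance has polled within the threshold.
function shouldPoll(lastPollTs, now, thresholdMs) {
  return (now - lastPollTs) >= thresholdMs;
}

function pollIfDue(store, now, thresholdMs, pollFn) {
  if (!shouldPoll(store.lastPollTs || 0, now, thresholdMs)) return false;
  store.lastPollTs = now; // in the browser this write goes to the shared cookie
  pollFn();
  return true;
}
```

Because the read-then-write on the cookie is not atomic, two tabs can occasionally both decide to poll, which is exactly the race the answer describes; jittering each tab's poll interval shrinks that window but cannot close it.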
