I'm building an app that uses Angular.js on the front end and Socket.IO with Redis on top of Express on the back end.
The basic use of sockets is to let one type of user push items onto lists that are consumed by groups of a second type of user.
A simple example:
Students can push messages into a class list and only teachers of this class can see the list.
I'm trying to sync the list between multiple teachers who connect at different times.
The lists are stored in Redis, and I'm wondering which is the correct approach to sync clients:
A. Send the whole list on each update - saving the client from having to manage sync and avoiding potential mismatches.
B. Send the list only on connection and apply incremental updates on successive events.
I'm sure this has been addressed in the past as it seems quite a basic issue with socket communication but I was not able to find a definitive answer.
Thanks!
If the list is not particularly large, then I'd think you want to go with something simple. The simplest thing I can think of is as follows:
Student makes a change to the list and sends a message to the server (which could be an ajax call; it doesn't have to be a web socket).
Server receives message and puts it into the appropriate list storage.
Server then looks for any clients monitoring that list and sends an update message to them.
Teacher connects to the server. Any lists that the teacher is monitoring are sent in their entirety to the teacher and they are subscribed to updates for those lists.
This way, you're never actually doing sync which simplifies matters a lot - you're just doing download list and then incremental updates. There's only one master store. If a client goes off-line, they just get a fresh copy of the list and resubscribe to updates when they come back on-line. Avoiding sync makes the whole solution a lot simpler. This assumes the data is not particularly large so it's feasible to just get a fresh copy of the list as needed.
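As a rough sketch of that flow with Socket.IO and Redis (the event names, room/key naming and the callback-style redis client here are my own assumptions, not a fixed API):

    var io = require('socket.io')(3000);
    var redis = require('redis').createClient();

    io.on('connection', function (socket) {
        // Teacher subscribes: join the class room and receive the full list once.
        socket.on('subscribe', function (classId) {
            socket.join('class:' + classId);
            redis.lrange('list:' + classId, 0, -1, function (err, items) {
                socket.emit('list:full', items);
            });
        });

        // Student pushes an item: store it, then broadcast only the increment.
        socket.on('list:push', function (classId, item) {
            redis.rpush('list:' + classId, item, function (err) {
                io.to('class:' + classId).emit('list:item', item);
            });
        });
    });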
If you do want to do sync, then a fairly straightforward technique is to maintain one master copy of the store on the server and have every change transaction assigned a monotonically increasing transaction ID. Then, each synced copy can just keep track of the last transaction ID that it synced and request all transactions since then. The data store needs to keep track of all changes as transactions (often by writing each change to a transaction log, or by using a feature some databases provide) so any given set of transactions can be played back for any client that is syncing.
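A bare-bones sketch of that transaction-ID idea (in-memory and with made-up names, just to show the shape of it):

    // Master copy lives on the server; every change gets a monotonically increasing id.
    var transactions = [];   // e.g. { id: 3, op: 'add', item: {...} }
    var nextId = 1;

    function recordChange(op, item) {
        var txn = { id: nextId++, op: op, item: item };
        transactions.push(txn);          // the transaction log
        // ...also apply the change to the master copy of the list itself...
        return txn;
    }

    // A syncing client sends the last id it has seen and gets everything newer.
    function transactionsSince(lastSeenId) {
        return transactions.filter(function (t) { return t.id > lastSeenId; });
    }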
Related
We are in the middle of migrating a fairly complex web app over to a React/Redux architecture.
One major design question I have not been able to find an answer for is: how is data stored in Redux supposed to be 'refreshed'?
For example, say I load a list of items at a route like /items. Now the user wants to view a specific item and goes to /items/<id>.
The flow, as I understand it, should work something like this: on the /items request, we make an API request and store all our items in the Redux store. When a user clicks on a specific item, we pick that item out of the Redux store, with no need to make a new API request since we already have the data.
This is all fine and well. But the question then is: what is the correct pattern for 'invalidating' this data?
Say the user loads the list of items and walks away from their computer for a few hours. Now the list of items is theoretically out of date with respect to the server.
How does one then, go about keeping the store up to date with the server?
You could use one of the following:
1) short polling (i.e. polling your server once in a while and updating the store items)
2) long polling (you open connection, keep it until the data on the server changes, the server then sends you the new data and closes the connection, you reopen it etc...)
3) live updates with websockets, which provide bidirectional communication (this means the server can push data to the client)
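For example, option 1 can be as small as a timer that re-dispatches a fetch action (this assumes redux-thunk; the action type, endpoint and interval are placeholders):

    // A thunk that re-fetches the list and replaces the slice in the store.
    function refreshItems() {
        return function (dispatch) {
            return fetch('/api/items')
                .then(function (res) { return res.json(); })
                .then(function (items) {
                    dispatch({ type: 'ITEMS_REFRESHED', items: items, receivedAt: Date.now() });
                });
        };
    }

    // Short polling: the store can never be staler than the polling interval.
    function startItemsPolling(store) {
        return setInterval(function () { store.dispatch(refreshItems()); }, 5 * 60 * 1000);
    }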
When the state changes, React will automatically re-render the component, which might not be what you want. But what do you mean by the correct pattern for 'invalidating' this data? Something like dispatching an action every 30 minutes to check whether the state has changed?
I have a parse.com-based app with offline capabilities where the whole database is stored locally (localStorage on web clients and the parse.com local database on mobile clients). I am looking for a design solution to efficiently update the local database with the latest changes in the remote database. The options that I could think of are:
Journaling with code triggers. Set up cloud code triggers (afterSave, afterDelete) for every object and add an entry to the journal table every time an object has been saved or destroyed. The clients will query the table for updates and remember lastUpdateTime for subsequent requests.
Pros: a) we can have a very detailed summary of what has been changed and who made the change. b) all the changes are instantly available to other clients (e.g. the table can be polled for notifications in near real time with little delay)
Cons: a) there may be too many entries in the table
Journaling with background job. Set up a background job that queries all tables by updatedAt, populates the journal table and saves lastUpdateTime for subsequent requests.
Pros: a) fewer entries in the journal table
Cons: a) changes are available with an unpredictable delay (not suitable for real-time notifications?) b) cannot track deletes; there's still a need to set up another table to track deletes or implement soft delete c) less detail in the log (e.g. when an object is created by one user and deleted by another user, we will not know who created the object)
No journal. All clients query all tables by updatedAt and store lastUpdateTime for subsequent requests.
Pros: a) easy to implement, b) changes are instantly available
Cons: a) same problem with deletes as in 2, b) inefficient (I believe that having all clients query 20+ tables is not a good idea)
We also have a UI where the user can look through recent activity (who changed what), so I kind of lean towards approach number 1, but the potential size of the table worries me.
Client needs the ability to recover irrespective of its current state. This is critical if you are using local storage that may get cleared by the user; in that case you need a recoverable state. Additionally, the client needs to be able to fetch only the transactions required / relevant to it.
Implementing a transaction store on the backend.
Creating a recovery mechanism in case the offline localStorage copy is corrupted.
Journaling with code triggers, or an event-sourcing style mechanism, so that you have a complete history and can use it to build tables for the client.
In conclusion: modified journaling with code triggers (modified to support recovery by storing the client's sync state on the server and using that to query the data).
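To illustrate, an afterSave/afterDelete pair in parse.com Cloud Code that appends to a Journal table might look roughly like this (the class and field names are illustrative; you would repeat or generalize it for each tracked class):

    var Journal = Parse.Object.extend("Journal");

    Parse.Cloud.afterSave("Item", function (request) {
        var entry = new Journal();
        entry.set("className", "Item");
        entry.set("targetId", request.object.id);
        entry.set("action", request.object.existed() ? "update" : "create");
        entry.set("user", request.user);        // who made the change
        entry.save();
    });

    Parse.Cloud.afterDelete("Item", function (request) {
        var entry = new Journal();
        entry.set("className", "Item");
        entry.set("targetId", request.object.id);
        entry.set("action", "delete");
        entry.save();
    });

Clients can then query Journal for entries with createdAt greater than their stored lastUpdateTime.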
I have an ionic app and a Parse.com backend. My users can perform CRUD functions on exercise programmes, changing every aspect of the programme including adding, deleting, editing the exercises within it.
I am confused about when to save, when to call the server and how much data can be held in services / $rootScope?
Typical user flow is as below:
Create Programme and Client (Create both on server and store data in $localStorage).
User goes to the edit screen where they can perform CRUD functions on all exercises within the programme. Currently I perform a server call on each function so it is synced to the backend.
The user may go back and select a different programme - downloading the data and storing it in localStorage again.
My question is: how can I ensure that my users' data is always saved to the server while offering them a responsive, fast user experience?
Would it be normal to have a timeout function that triggers a save periodically? On mobile, the number of calls to the server is quite painful over a poor connection.
Any ideas on full local / remote sync with Ionic and Parse.com would be welcome.
From my experience, the best way to think of this is as follows:
localStorage is essentially a cache layer, which if up to date is great because it can reduce network calls. However it can be cleared or go stale at any time, and should be treated as volatile storage.
Your server is your source of truth, and as such, should always be updated.
What this means is that for reads, localStorage is great; you don't need to fetch your data a million times if it hasn't changed. For writes, always trust your server for long-term storage.
The pattern I suggest is: on load, fetch any relevant data and save it to localStorage. Any further reads should come from localStorage. Edits should go directly to the server, and on success you can write those changes to localStorage. This way, if you have an error on save, the user can be informed, and/or you can use localStorage as a queue and keep trying to post the data to the server until it fully succeeds.
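A minimal sketch of that pattern (the key names and the two server helpers, fetchProgrammeFromServer / saveProgrammeToServer, are assumptions standing in for your Parse calls):

    // Read: prefer the localStorage copy, fall back to the server and cache the result.
    function loadProgramme(id) {
        var cached = localStorage.getItem('programme:' + id);
        if (cached) return Promise.resolve(JSON.parse(cached));
        return fetchProgrammeFromServer(id).then(function (programme) {
            localStorage.setItem('programme:' + id, JSON.stringify(programme));
            return programme;
        });
    }

    // Write: server first; only update the cache once the server has confirmed.
    function saveProgramme(programme) {
        return saveProgrammeToServer(programme)
            .then(function () {
                localStorage.setItem('programme:' + programme.id, JSON.stringify(programme));
            })
            .catch(function (err) {
                // Queue the failed write so it can be retried until it fully succeeds.
                var queue = JSON.parse(localStorage.getItem('pendingWrites') || '[]');
                queue.push(programme);
                localStorage.setItem('pendingWrites', JSON.stringify(queue));
                throw err;
            });
    }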
This is called "offline sync" or sometimes "4-way data binding". The point is to cache data locally and sync it with a remote backend. This is a very common need, but the solutions are unfortunately not that common... The ideal flow would follow this philosophy:
save data locally
try to sync it with server (performing auto merges)
And periodically sync, using a timer and maybe some "connection resumed" event.
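In a browser that trigger can be as simple as a timer plus the 'online' event (syncWithServer here is an assumed function that pushes queued local changes and pulls remote ones):

    // Attempt a sync every few minutes...
    setInterval(syncWithServer, 5 * 60 * 1000);

    // ...and immediately when connectivity comes back.
    window.addEventListener('online', syncWithServer);

    function syncWithServer() {
        if (!navigator.onLine) return;   // skip while offline
        // push locally queued changes, then pull and merge remote changes
    }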
This is very hard to achieve manually. I've been searching for modules for a long time, and the only ones that come to mind don't really fit your needs (because they tend to be backend providers that give you frontend connectors, and you already have an opinionated backend), but here they are anyway:
Strongloop's Loopback.io
Meteor
PouchDB
I have a million Dinosaur Users all logged in.
Dinosaurs want to see when other Dinosaurs update their profile in real time, so they are hooked into the NodeJS/Mongoose model as:
    dinosaur.schema.post('save', function (doc) {
        socket.emit('dinosaur:save', doc);
    });
where socket is the socket of the connected Dinosaur.
Dinosaurs are also going to see real time updates from several other things. Maybe news, comments, etc etc.
So, my question, is there some instance where this emitting of events will grow large and impact performance?
On the client side, I'll have something like socket.on('dinosaur:save', function(){})... I destroy the listeners when not needed. BUT, if I'm listening to every dinosaur:save, I could theoretically be processing that for a million saves a second (say if every dinosaur updated their profile in the same second). It just seems like there's a better way to do that with large data sets.
I imagine there are several other events I may want to watch and I'm just wondering if there are some recommended methods for this kind of socket management.
EDIT: To be clear, I'm aware of rooms, but if I, for example, have a scrolling list of all nearby Dinosaurs in my area, I probably just want to hook into receiving all of the dinosaur:save events. So I'm still not sure.
Notifying a million of anything is a lot of packets and if the thing you're notifying occurs a lot, that's a lot of a lot and it's even more than a lot to try to show on screen.
The usual first things to consider are:
How real-time do these notifications really have to be? Can you batch up 60 seconds or longer of notifications into one packet per user per notification period?
Does every user really have to see every single change from every other user? You know there's absolutely no way that any user interface can present the state of a million other users. So, I'd argue that every user doesn't have to know about the state of every other user. Maybe if there are 1-50 other users, but not if there's a million.
Can you algorithmically determine which users' state a given user might be interested in and only broadcast to them? For example, can you keep them up to date only on other users that are geographically near them?
Can you have a user interface where the user tells you which other users they want to track so you only update those? Or perhaps some combination of users they tell you about and users who are geographically interesting to them. The point is that you can't watch a million users anyway, so you're going to have to invent a UI that shows a lot less than that anyway.
You can always have a UI that will fetch the state of other users upon demand if the client doesn't already have that state, so you don't have to keep the state for all million users in each client (since it can't possibly all be shown at once anyway). If the user browses to see some things they don't already have, you just fetch it from the server via the socket or ajax call.
Oh, and at the scale you're talking about, you are probably going to need to have your user's connections spread out among several servers so you're going to have to handle that complexity too.
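To sketch the batching idea from the first point above (the buffer structure, event name and interval are all made up for illustration, and io is assumed to be in scope):

    // Collect changed docs per room and flush them once per interval as a single packet.
    var pending = {};   // roomName -> array of changed docs

    function queueUpdate(roomName, doc) {
        (pending[roomName] = pending[roomName] || []).push(doc);
    }

    setInterval(function () {
        Object.keys(pending).forEach(function (roomName) {
            io.to(roomName).emit('batched:updates', pending[roomName]);
        });
        pending = {};   // start a fresh batch for the next interval
    }, 60 * 1000);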
In case anyone comes across this in the future, here's the problem and my current solution.
We want real time updates. If you're on someone's profile page, and they update it, show that. If you're viewing some catered result set, say, of user profiles, and any one of those users updates their profile, show that. If you're on another page and some counter changes, say, of users near you, show that. But we aren't going to be on all these pages at the same time, so on the client side I don't even want to know about the other changes if I'm not on those other pages. This is the problem that could cause me to be notified of everything, which could cause bandwidth issues and a whole bunch of unnecessary socket usage.
So, the way I'm solving my problem is using rooms. I'm using rooms over namespaces because namespaces are generally used for two mutually disjoint applications accessing the same socket resource. Rooms also just fit better with this application.
I've created a dynamic, on-the-fly room for every user profile page. When a visitor opens a profile page, have the client call:

    socket.emit("joinRoom", modelName + ":" + modelObj._id);

and handle that on the server with:

    socket.on('joinRoom', function (room) {
        socket.join(room);
    });

This automatically creates the room if there isn't one yet and adds the user to it. modelName could be whatever we want; it's just a naming convention for how I split up the rooms. You can call rooms anything, BUT the important part is the ._id at the end. Using Mongoose, no two DB objects can have the same ._id, so this guarantees unique rooms.
When the owner of this profile page updates their information we call on the server:
    io.sockets.in('Dinosaur:' + doc._id).emit("Dinosaur:" + doc._id + ":updated", doc);

And receive that on the client side using:

    socket.on(modelName + ":" + modelObj._id + ":updated", function (msg) {
        // do something with the updated doc
    });
Voilà, we have sent the necessary information only to the interested clients.
-- (A separate problem) --
Using this approach, we can also deliver data pertaining to multiple users. If we have a catered result list of user profiles, for each profile we can add the current user into the rooms of all of those catered result profiles (so they're in a room belonging to _id X, _id Y, _id Z, etc.).
The current user would then be in multiple rooms, all reflecting the immediate updates of those users, and therefore the entire catered result list, for whatever list it may be (perhaps it is "Dinosaurs nearby").
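Client side, that just means joining one room per profile in the result set (resultProfiles and modelName are illustrative names here):

    resultProfiles.forEach(function (profile) {
        socket.emit('joinRoom', modelName + ':' + profile._id);
        socket.on(modelName + ':' + profile._id + ':updated', function (doc) {
            // update just that entry in the rendered list
        });
    });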
Another approach to this, especially if the list is somewhat more static, is to have the socket re-deliver the result set every X seconds, using the same socket, and just the same initial room.
My app is built in PHP (served on nginx/php-fpm) and I use node.js with socket.io for user push notifications. These are pushed using Redis pub/sub to link PHP and node.js.
The node.js app maintains an array of online userid's. They get added when a user connects to socket.io and removed from the array when they disconnect.
MySQL is used as the main database and I have a standard relationship table which denotes who is following whom. A list of followers' userid's is retrieved when a user logs in and displayed to them.
I now wish to intersect these two sets of data to provide a live online status for each of these relationships, in a similar manner to Facebook (a green light for online and grey for offline).
What would be the most performant and most scalable way of managing this? My current thought process is along these lines:
On the client side we have a JavaScript array of followers' user id's. Set up a timer client side which pushes this array to the node.js app every 60 seconds or so. Node.js handles intersecting the followers' id's with its current array of online users and returns an object depicting which users are online.
Now this would work, but it feels like it might be a heavy load on node.js to be constantly looping through users' follower lists for every online user. Or perhaps I am wrong and this would be relatively trivial, considering the main application itself is served by PHP and not via node, which currently only handles notification pushing?
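For reference, the intersection step I have in mind would look something like this if the online ids are kept in a Set rather than a plain array (the event names are placeholders):

    // Node.js side: onlineUserIds is the set maintained on connect/disconnect.
    var onlineUserIds = new Set();

    socket.on('whoIsOnline', function (followerIds) {
        // followerIds: the array the client pushes every ~60 seconds
        var status = {};
        followerIds.forEach(function (id) {
            status[id] = onlineUserIds.has(id);   // true = green light, false = grey
        });
        socket.emit('onlineStatus', status);
    });

That makes each request proportional to the number of followers rather than a scan of the whole online list.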
Regardless, is there a better way? It's worth noting that I also use Redis to build users' activity streams (the data is stored in MySQL and Redis maintains lists of activity id's).
So seeing as I already have a Redis server active, would there be a better method leveraging Redis itself instead?
Thanks for any advice
If I remember right, Socket.IO periodically pings connected clients to check that the connection is still active. In the success callback you can put code that updates the user's last-active time in the DB, and then treat users whose last-active entry is older than 5 minutes as offline.
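A hedged sketch of that idea, using an explicit client heartbeat rather than Socket.IO's internal ping (updateLastSeen is an assumed helper that writes to your DB):

    // Client: announce presence every 30 seconds.
    setInterval(function () {
        socket.emit('heartbeat');
    }, 30 * 1000);

    // Server: record the last time each user was seen.
    socket.on('heartbeat', function () {
        updateLastSeen(socket.userId, Date.now());   // assumed DB helper
    });

    // Users whose last-seen timestamp is older than ~5 minutes can be shown as offline.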