NodeJs-Cluster with Socket.IO and Namespaces


I am building a browser game. The prototype is done, and now I am thinking about how to scale and host it. I already have a "gameserver", which currently handles one game session. A game session consists of Socket.IO connections to at most 8 players.
I would like to use e.g. Node's cluster module to run multiple "gameservers" in the same Docker container, each handling one or more game sessions, so that all CPU cores are used.
A game session is used by only 8 players, and game sessions should be isolated from each other.
As far as I know, the cluster module lets all the subprocesses share the same port.
I thought of having a namespace for each game session (e.g. an ID in the URL that the players connect to anyway. The goal is: you get a URL which you send to a friend in order to invite them, similar to e.g. skribbl.io - https://localhost/?gameid=1234). So in my case, joining a namespace is basically the same as joining a game.
My goal is to have several of these Docker containers and use a reverse proxy to route each incoming connection to the container that handles the game specified in the URL.
The problem is that I do not know how to forward the connection to the correct subprocess of the cluster.
In the end it should look like this:
GameServerMaster [maps ids to processes, like {ClusterNode1: [id1, id2, ...]}]
  - GameServerClusterNode1
    - GameSession1 [id1 io.of("/id1")]
    - GameSession2 [id2 io.of("/id2")]
    - ....
  - GameServerClusterNode2
    - GameSession3 [id3 io.of("/id3")]
    - GameSession4 [id4 io.of("/id4")]
    - ....
I have found socket.io-redis, but it seems like overkill, because there is no broadcast between games: each game is isolated. (I also do not want to host Redis just for this use case.)
I have also found sticky-session, but with it I do not know how to forward an incoming connection to the correct process. I think it just load-balances connections, which is bad, because each connection must be routed to the process that actually handles the requested game session. But I think I need something like that anyway, don't I?
Is what I am doing (or planning to do) even the right approach? Do you have any suggestions in terms of technology or architecture?
Would it be better to throw away the whole clustering part and just have each Docker container run one "node" that handles multiple game sessions?

I found myself in the same situation and came up with the following solution.
Idea:
Fork each new process with the cluster module on a different port, and let the master process act as a router.
Example:
Before a player joins a room, the client asks the router where room 123 is. The router asks its subprocesses (broadcast) which of them hosts room 123 and forwards the result (e.g. port 3001) to the client. The client then connects directly to that process.
Problems:
If more than one subprocess has the same roomId, there will be a conflict. The probability of this happening is very low, and I guess you will figure out a way to avoid it.
Note that this solution uses all the cores of one machine; it does not distribute the load across multiple machines.
If you want to distribute across multiple machines, you should reconsider Redis as your first choice.
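
The router idea above can be sketched as a small lookup table kept by the master process. This is only a minimal sketch, not the answerer's actual code: `RoomRouter`, `register`, `unregister` and `lookup` are hypothetical names, the ports are made up, and the message passing between master and workers is left out. It refuses a second registration of the same room id, which is one way to avoid the conflict mentioned above.

```javascript
// Hypothetical in-memory router kept by the master process.
// It maps a room id to the port of the worker that hosts the room.
class RoomRouter {
  constructor() {
    this.roomToPort = new Map();
  }

  // Called when a worker reports that it created a room.
  // Returns false if a different worker already owns that room id.
  register(roomId, port) {
    const owner = this.roomToPort.get(roomId);
    if (owner !== undefined && owner !== port) {
      return false;
    }
    this.roomToPort.set(roomId, port);
    return true;
  }

  // Called when a room is closed so the id can be reused.
  unregister(roomId) {
    this.roomToPort.delete(roomId);
  }

  // Answers the client's question "where is room X?".
  // Returns the port, or null if no worker hosts the room.
  lookup(roomId) {
    return this.roomToPort.has(roomId) ? this.roomToPort.get(roomId) : null;
  }
}

const router = new RoomRouter();
router.register("123", 3001);
router.register("456", 3002);
console.log(router.lookup("123")); // 3001
```

The client would first ask the master for the port and then open its Socket.IO connection against that port.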

Related

Should I use one publish per collection or several?

I am working on a user group system. Each group has several features and I want to make the interaction with the group collection as secure and simple as it can be since it is still at an early stage.
Right now, I have a group section on my website where I use several nested pages. The purpose of the section is to allow the user to join a group, request membership if the group is private, browse a group's objects, etc.
For example, within my group section, I can load in the yield a "see all groups" page, a "create a new group" page, a "see only my groups" page (the ones I am a member of) or a "view group" page to get a group's details.
My first approach was to create one controller.js file for each subpage, each of which calls one subscription tailored to the subpage's needs. For instance, I have an 'all_group' publication/subscription for the "see all groups" subpage and a "my_groups" one for the "see only my groups" subpage.
But this is becoming really messy. Additionally, I declared my "group" collection in the "both" folder, so I am not sure I follow where the data available to the client comes from.
Now that I have explained the situation, here are my questions:
When I do a console.table(Groups.find().fetch()); on the client, I see fields that shouldn't be there (i.e. not returned by my current publication or any other). Is that because I declared the "group" collection on the client side? How do I fix that?
Should I get rid of all these publications and create only one with everything the client is allowed to see? I would then subscribe to it from the group section page controller and work with a single set of data.
Should I simply block any insert/update/remove from client with allow/deny rules and make these using methods only?
Would it be safe/advised to put my methods in both folder so I don't lose the latency compensation feature?
EDIT
Ok, I was freaking out because I had all my collection data on client-side but it was just a bad query in the publish (I was using both field:1 and field:0 projections).
Two questions remain:
If I use methods, I assume I don't have to deny everything in the native driver; I just have to be more restrictive than what the methods allow, right?
If I put my methods in the "both" folder, they will be executed on both client and server. So in a "client offline" context, even if the client messes with my methods, the server should roll back the changes if the client's result differs from its own (assuming that the changes couldn't be made through the allow/deny rules)? And will I have latency compensation working even with the methods?

To better control and visualize your subscriptions, you can use msavin:mongol.
Creating one catch-all publication is not a good idea performance-wise (sending all data to all clients will be a pain to everyone involved).
If you use methods and have removed autopublish, then yes everything is denied... Except for updates on the user's own profile. You may want to manually deny that too.
With methods and collection rules you should share the validation code. This way, client and server validate the same way (and should always come up with the same results), so unless your client is screwing up with the console there should be no issue and lag compensation should remain.
If your server method does something the client should not know about, you can also define the method once on the server, and once on the client. Same effect.
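
Sharing validation code can be as simple as one plain function that both the client-side stub and the server-side method call. The sketch below is deliberately framework-agnostic and does not use any Meteor API; `validateGroup` and the field names `name` and `isPrivate` are hypothetical and not taken from the question, and wiring the function into the actual methods is left out.

```javascript
// Hypothetical validator, meant to live in code loaded on client and server.
// Returns a list of error messages; an empty list means the input is valid.
function validateGroup(input) {
  const errors = [];
  if (input === null || typeof input !== "object") {
    return ["group must be an object"];
  }
  if (typeof input.name !== "string" || input.name.trim().length === 0) {
    errors.push("name is required");
  } else if (input.name.length > 50) {
    errors.push("name must be at most 50 characters");
  }
  if (typeof input.isPrivate !== "boolean") {
    errors.push("isPrivate must be true or false");
  }
  return errors;
}

console.log(validateGroup({ name: "Chess club", isPrivate: false })); // []
console.log(validateGroup({ name: "", isPrivate: "no" }).length);     // 2
```

Because both sides run the same function on the same input, the client's optimistic result and the server's authoritative result should only differ when the client has been tampered with.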

How to set max number of clients in sockjs connection

I am currently experimenting with sockjs. How can I set the maximum number of clients that can join a sockjs server?
I understand that I can achieve this by simply closing any new connection if the total number of connections is above x, but I don't find that very elegant. I was hoping that there was some built-in way of doing this.
I am currently achieving this by:
var numConnections = currentConnections.length;
console.log('\nNumber of Connections = ' + numConnections);
// check the number of connections
if (numConnections >= 3) {
    // disconnect the client
    conn.end();
}

Since counting is very application-specific, especially in horizontally scaled environments with multiple load-balanced SockJS servers, it is up to you to implement the counting mechanics.
It is totally fine to have a minimalistic counter like yours. If you end up with multiple services, you can easily switch to Redis for storing the client count and share that value across processes.
My personal opinion: you should not worry about such simple things in the early stages of development; focus on real challenges and problems as they occur. It is good to think ahead, but experience will help you. Just don't waste time on such little things.
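
The minimalistic counter discussed above can be wrapped in a small helper so that the count also goes down when a client leaves. This is a sketch under assumptions: `createConnectionLimiter` and `fakeConnection` are hypothetical names, and the helper only assumes that a connection offers `end()` and `on("close", fn)`, as the connection object in the question appears to.

```javascript
// Hypothetical helper: tracks open connections and refuses new ones
// once `max` is reached. `conn` only needs `end()` and `on("close", fn)`.
function createConnectionLimiter(max) {
  const open = new Set();
  return {
    // Returns true if the connection was accepted, false if it was closed.
    accept(conn) {
      if (open.size >= max) {
        conn.end();
        return false;
      }
      open.add(conn);
      conn.on("close", () => open.delete(conn));
      return true;
    },
    count() {
      return open.size;
    },
  };
}

// Stand-in for a real connection, used only to exercise the helper.
function fakeConnection() {
  const handlers = {};
  return {
    ended: false,
    on(event, fn) { handlers[event] = fn; },
    end() { this.ended = true; if (handlers.close) handlers.close(); },
  };
}

const limiter = createConnectionLimiter(3);
const conns = [1, 2, 3, 4].map(() => fakeConnection());
const results = conns.map((c) => limiter.accept(c));
console.log(results); // [ true, true, true, false ]
```

In a real server, `limiter.accept(conn)` would be called from the connection handler. With several server processes the `Set` would have to be replaced by a shared counter, for example in Redis as suggested above.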

Memcache vs Redis vs Javascript Hash object

I know memcache and Redis are used when a cache needs to be shared by more than one server.
I'm creating a Node application that will run on a single server only and uses MySQL as its database. I need to hash around 100,000 keys, each holding a JSON string about 200 characters long, so that I don't have to call MySQL for reads.
If I use memcache or Redis, I will use a callback to get my data, but if I use a JavaScript hash I can get the data synchronously. Will that affect the application somehow, for example through high memory usage? Which one should I use for an application like this?

I know memcache and Redis are used when a cache needs to be shared by more than one server.
Not necessarily. For instance, Facebook puts a memcache instance in front of each of their MySQL servers. You can use Redis/Memcached for fast computation (e.g. real-time analytics) without having a whole cluster.
I need to hash around 100,000 keys, each holding a JSON string about 200 characters long, so that I don't have to call MySQL for reads.
That seems like premature optimization to me. If MySQL has enough RAM (so the dataset lives in memory), you don't have to worry about performance; that's just 100,000 keys.
If I use memcache or Redis, I will use a callback to get my data
It really depends on what language you use (Ruby and Python offer synchronous Redis clients) and what type of paradigm is used (event loop, thread pool...).
but if I use a JavaScript hash I can get the data synchronously
To be more precise, the callback comes from using node_redis, which is asynchronous; a JavaScript "hash" (in fact a plain object) lives in your own process, which is why reading from it is synchronous.
Will that affect the application somehow, for example through high memory usage?
It depends on whether you load all the keys into your process or not. If you use a Redis hash, you can query only the field you want instead of the whole hash each time.
Which one should I use for an application like this?
The best thing to keep in mind is to keep the number of applications you have to maintain in your stack low while still using the right tool for the right job. Here MySQL could be enough, but if you really want to use Redis or Memcached, I would go for Redis. It offers much the same features as Memcached with similar performance, while allowing you to use its other data structures in the future without needing another application in your stack.
Moreover, if you put all your data in a Redis hash, you can retrieve a single field (HGET), a group of fields (HMGET), or all fields (HGETALL) with just one call.
Finally, judging by recent statistics and the Redis ecosystem (GUIs, hosting, libraries, ...), Redis seems to be far more future-proof than Memcached if you really want to go that way.
Disclaimer: I am the founder of Redsmin, an online developer oriented service for administrating and monitoring Redis.

It depends. You could even opt for Memcached over MySQL :). For simple read-only access, just storing the data in your JavaScript code (as dictionary objects, I believe) is enough. But be sure that you have enough RAM :).
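
To get a feel for the in-process option, here is a minimal sketch that loads 100,000 entries of roughly 200 characters each into a `Map` and reads one back synchronously. The rows are generated instead of being read from MySQL, and `buildCache` and the `item:` key format are hypothetical. That is about 20 million characters of payload, so memory use is probably in the tens of megabytes, though the exact figure depends on the JavaScript engine.

```javascript
// Hypothetical in-process cache: key -> JSON string, loaded once at startup.
// In a real app the rows would come from MySQL; here they are generated.
function buildCache(rows) {
  const cache = new Map();
  for (const row of rows) {
    cache.set(row.key, row.json);
  }
  return cache;
}

// Generate 100,000 fake rows, each with a JSON string of about 200 characters.
const rows = [];
for (let i = 0; i < 100000; i++) {
  const json = JSON.stringify({ id: i, payload: "x".repeat(180) });
  rows.push({ key: "item:" + i, json });
}

const cache = buildCache(rows);

// Lookups are synchronous: no callback, no network round trip.
const hit = JSON.parse(cache.get("item:42"));
console.log(cache.size, hit.id);   // 100000 42
console.log(cache.get("missing")); // undefined
```

The trade-off is that the cache lives and dies with the process and has to be refreshed whenever the MySQL data changes.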

Javascript Multiplayer Game - Server Side Validation.

I'm trying my hand at creating a multiplayer game using HTML5 / JavaScript.
I'm currently doing some prototypes on different ideas on how to stop people from cheating. Now I know I need to do everything server side and just have the client send "actions".
My issue is I can't work out the best way to store the game state for each player.
So, just looking at something basic: two players running around an empty map.
Currently my idea is
Both clients (Socket.IO) send their actions to a Node.js server, which then responds with an X/Y coordinate. This is fine. But obviously both clients need to know where the other player is.
I thought about doing this by creating a new database table for each game and having the game state stored in there so the two Node.js connections can talk to each other.
My question is, is this the best way to interact between two Node.js connections, and would it be fast enough? Or is there a design pattern for this specific task that I'm missing?
Thanks

Creating a new table per game is generally considered a Terrible Idea.
If you don't need persistence -- ie. in the case your Node.js server croaks, a game may be lost -- you can simply store the games in memory, in an object, let's say games, which might contain an object that has an array of players, each of which might contain x and y coordinates, etc.
var games = {
    123019240: {
        players: {
            123: {x: 1, y: 1, name: 'Joe'},
            456: {x: 2, y: 2, name: 'Jimbob'}
        }
    }
};
If you do need persistence, though, you really should probably look into some other databases than SQL -- for instance Redis might be a good choice for the task.
In any case, SQL feels like the wrong tool, and creating new tables on demand even more so. (If you are using SQL, though, consider a table layout with game_id, player_id, x and y and be sure to have an index on game_id :) )
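
A minimal sketch of how such an in-memory `games` object might be maintained as players come and go. The helpers `addPlayer` and `removePlayer` are hypothetical and not part of the answer; the starting position (0, 0) is an arbitrary assumption.

```javascript
// In-memory game state, shaped like the `games` object above.
const games = {};

// Hypothetical helper: adds a player to a game, creating the game if needed.
function addPlayer(gameId, playerId, name) {
  if (!games[gameId]) {
    games[gameId] = { players: {} };
  }
  games[gameId].players[playerId] = { x: 0, y: 0, name };
}

// Hypothetical helper: removes a player and drops the game once it is empty,
// so finished games do not pile up in memory.
function removePlayer(gameId, playerId) {
  const game = games[gameId];
  if (!game) return;
  delete game.players[playerId];
  if (Object.keys(game.players).length === 0) {
    delete games[gameId];
  }
}

addPlayer(123019240, 123, "Joe");
addPlayer(123019240, 456, "Jimbob");
console.log(Object.keys(games[123019240].players)); // [ '123', '456' ]
removePlayer(123019240, 123);
removePlayer(123019240, 456);
console.log(Object.keys(games).length); // 0
```

Cleaning up empty games matters here because nothing else ever frees the memory of a long-running server process.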

The approach you need to take is the following:
Each client sends its coordinates to the server at specific intervals (using emit). The server checks this position for validity and stores it in the db. It then uses a broadcast message (http://socket.io/#how-to-use - Broadcasting messages) to send this position to all the clients.
Each client in turn will update the displayed position of the character/player that moved.
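
The validity check mentioned above could look roughly like this. It is a sketch with made-up rules: `applyMove`, `MAX_STEP` and `MAP_SIZE` are hypothetical, and a real game would have its own movement rules. The Socket.IO wiring (receiving the position and broadcasting the result) is left out.

```javascript
// Hypothetical rule: a player may move at most MAX_STEP units per update
// and must stay inside the map. Both constants are made up for the sketch.
const MAX_STEP = 5;
const MAP_SIZE = 100;

// Returns the accepted position, or the old position if the move is invalid.
function applyMove(player, requested) {
  const dx = requested.x - player.x;
  const dy = requested.y - player.y;
  const insideMap =
    requested.x >= 0 && requested.x < MAP_SIZE &&
    requested.y >= 0 && requested.y < MAP_SIZE;
  const closeEnough = Math.hypot(dx, dy) <= MAX_STEP;
  if (Number.isFinite(dx) && Number.isFinite(dy) && insideMap && closeEnough) {
    player.x = requested.x;
    player.y = requested.y;
  }
  return { x: player.x, y: player.y };
}

const joe = { x: 1, y: 1, name: "Joe" };
console.log(applyMove(joe, { x: 4, y: 5 }));   // { x: 4, y: 5 }
console.log(applyMove(joe, { x: 90, y: 90 })); // { x: 4, y: 5 }
```

The server would broadcast the returned position rather than the requested one, so a cheating client cannot teleport and an honest client is corrected if it drifts.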
You cannot create a direct connection between two players because they are using a browser. Each connection must pass through a single node.js server.
You can view some tutorials here:
http://www.nodejs-news.com/nodejs-tech/Nodejs-Socketio-Building-HTML5-game/
(They use the Impact engine but the principles are the same)

In my experience, if you have users interacting with each other in a clustered server environment, it's best to have them interacting on the same server.
Example:
5 game servers, people create games together in a lobby. A game contains a max number of people. With this scenario, it is much, much easier to keep the game on one server and make all users connect to that one game server. This avoids the need for communication between servers and keeps game state consistent and fast!
Eve Online is a great example of this. Each 'region' is its own server, and when you travel far enough through the universe, you are transparently moved to another game server. That way, if you're fighting somebody, chances are they're on the same server. Then the game server is free to periodically write data to the DB. This is the best way, as the user never has to wait for the DB. They communicate directly with the game server, and the game server communicates with the DB every once in a while. At most, your user would lose only a few seconds of data (and so would everyone else on that game server).

How to design a multi-user ajax web application to be concurrently safe

I have a web page that shows a large amount of data from the server. The communication is done via ajax.
Every time the user interacts and changes this data (Say user A renames something) it tells the server to do the action and the server returns the new changed data.
If user B accesses the page at the same time and creates a new data object it will again tell the server via ajax and the server will return with the new object for the user.
On A's page we have the data with a renamed object. And on B's page we have the data with a new object. On the server the data has both a renamed object and a new object.
What are my options for keeping the page in sync with the server when multiple users are using it concurrently?
Options such as locking the entire page or dumping the entire state to the user on every change should be avoided if possible.
If it helps, in this specific example the webpage calls a static webmethod that runs a stored procedure on the database. The stored procedure will return any data it has changed and no more. The static webmethod then forwards the return of the stored procedure to the client.
Bounty Edit:
How do you design a multi-user web application which uses Ajax to communicate with the server but avoids problems with concurrency?
I.e. concurrent access to functionality and to data on a database without any risk of data or state corruption

Overview:
Intro
Server architecture
Client architecture
Update case
Commit case
Conflict case
Performance & scalability
Hi Raynos,
I will not discuss any particular product here. What others mentioned is a good toolset to have a look at already (maybe add node.js to that list).
From an architectural viewpoint, you seem to have the same problem that can be seen in version control software. One user checks in a change to an object, another user wants to alter the same object in another way => conflict. You have to integrate users' changes to objects while at the same time being able to deliver updates promptly and efficiently, detecting and resolving conflicts like the one above.
If I was in your shoes I would develop something like this:
1. Server-Side:
Determine a reasonable level at which you would define what I'd call "atomic artifacts" (the page? Objects on the page? Values inside objects?). This will depend on your webservers, database & caching hardware, # of users, # of objects, etc. Not an easy decision to make.
For each atomic artifact have:
an application-wide unique-id
an incrementing version-id
a locking mechanism for write-access (mutex maybe)
a small history or "changelog" inside a ringbuffer (shared memory works well for those). A single key-value pair might be OK too though less extendable. see http://en.wikipedia.org/wiki/Circular_buffer
A server or pseudo-server component that is able to deliver relevant changelogs to a connected user efficiently. Observer-Pattern is your friend for this.
2. Client-Side:
A javascript client that is able to have a long-running HTTP-Connection to said server above, or uses lightweight polling.
A javascript artifact-updater component that refreshes the sites content when the connected javascript client notifies of changes in the watched artifacts-history. (again an observer pattern might be a good choice)
A javascript artifact-committer component that may request to change an atomic artifact, trying to acquire the mutex lock. It will detect whether the state of the artifact was changed by another user just seconds before (the latency of the javascript client and the commit process factors in) by comparing the known client-side artifact version id with the current server-side artifact version id.
A javascript conflict-solver allowing for a human which-change-is-the-right decision. You may not want to just tell the user "Someone was faster than you. I deleted your change. Go cry.". Many options from rather technical diffs or more user-friendly solutions seem possible.
So how would it roll ...
Case 1: kind-of-sequence-diagram for updating:
Browser renders page
javascript "sees" artifacts, each having at least one value field, a unique id and a version id
javascript client gets started, requesting to "watch" the found artifacts history starting from their found versions (older changes are not interesting)
Server process notes the request and continuously checks and/or sends the history
History entries may contain simple notifications "artifact x has changed, client pls request data" allowing the client to poll independently or full datasets "artifact x has changed to value foo"
javascript artifact-updater does what it can to fetch new values as soon as they are known to have been updated. It executes new ajax requests or is fed by the javascript client.
The pages DOM-content is updated, the user is optionally notified. History-watching continues.
Case 2: Now for committing:
artifact-committer knows the desired new value from user input and sends a change-request to the server
serverside mutex is acquired
Server receives "Hey, I know artifact x's state from version 123, let me set it to value foo pls."
If the server-side version of artifact x is equal to 123 (it cannot be less), the new value is accepted and a new version id of 124 is generated.
The new state-information "updated to version 124" and optionally new value foo are put at the beginning of the artifact x's ringbuffer (changelog/history)
serverside mutex is released
requesting artifact committer is happy to receive a commit-confirmation together with the new id.
meanwhile serverside server component keeps polling/pushing the ringbuffers to connected clients. All clients watching the buffer of artifact x will get the new state information and value within their usual latency (See case 1.)
Case 3: for conflicts:
artifact committer knows desired new value from user input and sends a change-request to the server
in the meanwhile another user updated the same artifact successfully (see case 2.) but due to various latencies this is yet unknown to our other user.
So a serverside mutex is acquired (or waited on until the "faster" user committed his change)
Server receives "Hey, I know artifact x's state from version 123, let me set it to value foo."
On the Serverside the version of artifact x now is 124 already. The requesting client can not know the value he would be overwriting.
Obviously the Server has to reject the change request (not counting in god-intervening overwrite priorities), releases the mutex and is kind enough to send back the new version-id and new value directly to the client.
confronted with a rejected commit request and a value the change-requesting user did not yet know, the javascript artifact committer refers to the conflict resolver which displays and explains the issue to the user.
The user, being presented with some options by the smart conflict-resolver JS, is allowed another attempt to change the value.
Once the user selected a value he deems right, the process starts over from case 2 (or case 3 if someone else was faster, again)
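
The commit and conflict cases above boil down to an optimistic version check. The following is a minimal single-process sketch, not the answerer's implementation: `Artifact`, `commit` and `changesSince` are hypothetical names, versions start at 1 instead of 123, and the changelog is a capped array standing in for the ring buffer. Because Node.js runs each call to completion, no explicit mutex is needed here; a setup with several server processes would need a shared lock.

```javascript
// Hypothetical "atomic artifact": a value, a version id and a short changelog.
class Artifact {
  constructor(id, value, historySize = 8) {
    this.id = id;
    this.value = value;
    this.version = 1;
    this.historySize = historySize;
    this.history = []; // newest entry first, capped like a ring buffer
  }

  // Tries to commit `newValue` on top of the version the client last saw.
  commit(knownVersion, newValue) {
    if (knownVersion !== this.version) {
      // Conflict: someone else committed first. Report the current state.
      return { accepted: false, version: this.version, value: this.value };
    }
    this.version += 1;
    this.value = newValue;
    this.history.unshift({ version: this.version, value: newValue });
    this.history.length = Math.min(this.history.length, this.historySize);
    return { accepted: true, version: this.version, value: this.value };
  }

  // Returns the changes a watching client has not seen yet, oldest first.
  changesSince(knownVersion) {
    return this.history.filter((e) => e.version > knownVersion).reverse();
  }
}

const title = new Artifact("x", "old name");
const first = title.commit(1, "foo");  // user A, based on version 1
const second = title.commit(1, "bar"); // user B, also based on version 1
console.log(first.accepted, second.accepted, second.value); // true false foo
```

The rejected response already carries the current version and value, which is what the conflict resolver needs in order to show user B what changed before they retry.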
Some words on Performance & Scalability
HTTP Polling vs. HTTP "pushing"
Polling creates requests, one per second, 5 per second, whatever you regard as an acceptable latency. This can be rather cruel to your infrastructure if you do not configure your (Apache?) and (php?) well enough to be "lightweight" starters. It is desirable to optimize the polling request on the server side so that it runs for far less time than the length of the polling interval. Splitting that runtime in half might well mean lowering your whole system load by up to 50%.
Pushing via HTTP (assuming webworkers are too far off to support them) will require you to have one apache/lighttpd process available for each user all the time. The resident memory reserved for each of these processes and your system's total memory will be one very certain scaling limit that you will encounter. Reducing the memory footprint of the connection will be necessary, as well as limiting the amount of continuous CPU and I/O work done in each of these (you want lots of sleep/idle time).
backend scaling
Forget database and filesystem, you will need some sort of shared memory based backend for the frequent polling (if the client does not poll directly then each running server process will)
If you go for memcache you can scale better, but it's still expensive.
The mutex for commits has to work globally, even if you want to have multiple frontend servers to load-balance.
frontend scaling
regardless if you are polling or receiving "pushes", try to get information for all watched artifacts in one step.
"creative" tweaks
If clients are polling and many users tend to watch the same artifacts, you could try to publish the history of those artifacts as a static file, allowing apache to cache it, while still refreshing it on the server side when artifacts change. This takes PHP/memcache out of the game for some requests. Lighttpd is very efficient at serving static files.
use a content delivery network like cotendo.com to push artifact history there. The push-latency will be bigger but scalability's a dream
write a real server (not using HTTP) that users connect to using java or flash(?). You have to deal with serving many users in one server thread, cycling through open sockets and doing (or delegating) the work required. This can scale via forking processes or starting more servers. Mutexes have to remain globally unique though.
Depending on load scenarios group your frontend- and backend-servers by artifact-id ranges. This will allow for better usage of persistent memory (no database has all the data) and makes it possible to scale the mutexing. Your javascript has to maintain connections to multiple servers at the same time though.
Well I hope this can be a start for your own ideas. I am sure there are plenty more possibilities.
I am more than welcoming any criticism or enhancements to this post, wiki is enabled.
Christoph Strasen

I know this is an old question, but I thought I'd just chime in.
OT (operational transforms) seem like a good fit for your requirement for concurrent and consistent multi-user editing. It's a technique used in Google Docs (and was also used in Google Wave).
There's a JS-based library for using Operational Transforms - ShareJS (http://sharejs.org/), written by a member from the Google Wave team.
And if you want, there's a full MVC web-framework - DerbyJS (http://derbyjs.com/) built on ShareJS that does it all for you.
It uses BrowserChannel for communication between the server and clients (and I believe WebSockets support should be in the works; it was in there previously via Socket.IO, but was taken out due to the developer's issues with Socket.IO). Beginner docs are a bit sparse at the moment, however.

I would consider adding time-based modified stamp for each dataset. So, if you're updating db tables, you would change the modified timestamp accordingly. Using AJAX, you can compare the client's modified timestamp with the data source's timestamp - if the user is ever behind, update the display. Similar to how this site checks a question periodically to see if anyone else has answered while you're typing an answer.
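
The timestamp comparison described above can be sketched in a few lines. `needsRefresh` is a hypothetical name and the ISO date strings are made-up sample values; how the server's timestamp reaches the client (for example via a periodic AJAX request) is left out.

```javascript
// Hypothetical check: the client remembers the modified timestamp of the data
// it last rendered and compares it with the one reported by the server.
function needsRefresh(clientModifiedAt, serverModifiedAt) {
  const clientTime = new Date(clientModifiedAt).getTime();
  const serverTime = new Date(serverModifiedAt).getTime();
  return serverTime > clientTime;
}

const shown = "2024-01-01T10:00:00Z";
console.log(needsRefresh(shown, "2024-01-01T10:00:05Z")); // true
console.log(needsRefresh(shown, "2024-01-01T10:00:00Z")); // false
```

One caveat: two writes that land within the same clock tick get the same timestamp, so an incrementing version counter is the more robust variant of the same idea.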

You need to use push techniques (also known as Comet or reverse Ajax) to propagate changes to the user as soon as they are made to the db. The best technique currently available for this seems to be Ajax long polling, but it isn't supported by every browser, so you need fallbacks. Fortunately there are already solutions that handle this for you. Among them are: orbited.org and the already mentioned socket.io.
In the future there will be an easier way to do this, called WebSockets, but it isn't clear yet when that standard will be ready for prime time, as there are security concerns about its current state.
There shouldn't be concurrency problems in the database with new objects. But when a user edits an object the server needs to have some logic that checks whether the object has been edited or deleted in the meantime. If the object has been deleted the solution is, again, simple: Just discard the edit.
But the most difficult problem appears when multiple users are editing the same object at the same time. If User 1 and User 2 start editing an object at the same time, they will both make their edits on the same data. Let's say the changes User 1 made are sent to the server first while User 2 is still editing the data. You then have two options: you could try to merge User 1's changes into the data of User 2, or you could tell User 2 that his data is out of date and show him an error message as soon as his data gets sent to the server. The latter isn't a very user-friendly option, but the former is very hard to implement.
One of the few implementations that really got this right for the first time was EtherPad, which was acquired by Google. I believe they then used some of EtherPad's technologies in Google Docs and Google Wave, but I can't tell that for sure. Google also opensourced EtherPad, so maybe that's worth a look, depending on what you're trying to do.
Simultaneous editing is really not easy to do, because latency makes it impossible to perform atomic operations on the web. Maybe this article will help you to learn more about the topic.

Trying to write all this yourself is a big job, and it's very difficult to get it right. One option is to use a framework that's built to keep clients in sync with the database, and with each other, in realtime.
I've found that the Meteor framework does this well (http://docs.meteor.com/#reactivity).
"Meteor embraces the concept of reactive programming. This means that you can write your code in a simple imperative style, and the result will be automatically recalculated whenever data changes that your code depends on."
"This simple pattern (reactive computation + reactive data source) has wide applicability. The programmer is saved from writing unsubscribe/resubscribe calls and making sure they are called at the right time, eliminating whole classes of data propagation code which would otherwise clog up your application with error-prone logic."

I can't believe that nobody has mentioned Meteor. It's a new and immature framework for sure (and only officially supports one DB), but it takes all the grunt work and thinking out of a multi-user app like the poster is describing. In fact, you can't NOT build a multi-user live-updating app. Here's a quick summary:
Everything is in Node.js (JavaScript or CoffeeScript), so you can share stuff like validations between the client and server.
It uses websockets, but can fall back for older browsers.
It focuses on immediate updates to local objects (i.e. the UI feels snappy), with changes sent to the server in the background. Only atomic updates are allowed, to make mixing updates simpler. Updates rejected on the server are rolled back.
As a bonus, it handles live code reloads for you, and preserves user state even when the app changes radically.
Meteor is simple enough that I would suggest you at least take a look at it for ideas to steal.

These Wikipedia pages may help add perspective to learning about concurrency and concurrent computing for designing an ajax web application that either pulls or is pushed state event (EDA) messages in a messaging pattern. Basically, messages are replicated out to channel subscribers which respond to change events and synchronization requests.
https://en.wikipedia.org/wiki/Category:Concurrency_control
https://en.wikipedia.org/wiki/Distributed_concurrency_control
https://en.wikipedia.org/wiki/CAP_theorem
https://en.wikipedia.org/wiki/Operational_transformation
https://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
There are many forms of concurrent web-based collaborative software.
There are a number of HTTP API client libraries for etherpad-lite, a collaborative real-time editor.
django-realtime-playground implements a realtime chat app in Django with various real-time technologies like Socket.io.
Both AppEngine and AppScale implement the AppEngine Channel API; which is distinct from the Google Realtime API, which is demonstrated by googledrive/realtime-playground.

Server-side push techniques are the way to go here. Comet is (or was?) a buzz word.
The particular direction you take depends heavily on your server stack and how flexible you and it are. If you can, I would take a look at socket.io, which provides a cross-browser implementation of websockets. These give you a very streamlined way to communicate bidirectionally with the server, allowing the server to push updates to the clients.
In particular, see this demonstration by the library's author, which demonstrates almost exactly the situation you describe.
