Publish/Subscribe reliable messaging: Redis VS RabbitMQ - javascript

Background
I am making a publish/subscribe typical application where a publisher sends messages to a consumer.
The publisher and the consumer are on different machines and the connection between them can break occasionally.
Objective
The goal here is to make sure that no matter what happens to the connection, or to the machines themselves, a message sent by a publisher is always received by the consumer.
Ordering of messages is not a must.
Problem
According to my research, RabbitMQ is the right choice for this scenario:
Redis Vs RabbitMQ as a data broker/messaging system in between Logstash and elasticsearch
However, although RabbitMQ has a tutorial about publish and subscribe, that tutorial does not introduce persistent queues, nor does it mention confirms, which I believe are the key to making sure messages are delivered.
On the other hand, Redis is also capable of doing this:
http://abhinavsingh.com/customizing-redis-pubsub-for-message-persistence-part-2/
but I couldn't find any official tutorials or examples, and my current understanding leads me to believe that persistent queues and message confirms would have to be implemented by us, as Redis is mainly an in-memory datastore rather than a message broker like RabbitMQ.
Questions
For this use case, which solution would be the easiest to implement? (Redis solution or RabbitMQ solution?)
Please provide a link to an example with what you think would be best!

Background
I originally wanted publish and subscribe with message and queue persistence.
This, in theory, does not exactly fit publish and subscribe:
this pattern doesn't care if the messages are received or not. The publisher simply fans out messages and if there are any subscribers listening, good, otherwise it doesn't care.
Indeed, looking at my needs I would need more of a Work Queue pattern, or even an RPC pattern.
Analysis
People say both should be easy, but that really is subjective.
RabbitMQ has better official documentation overall, with clear examples in most languages, while Redis information is mainly in third-party blogs and sparse GitHub repos, which makes it considerably harder to find.
As for the examples, RabbitMQ has two examples that clearly answer my questions:
Work queues
RPC example
By mixing the two I was able to have a publisher send reliable messages to several consumers, even if one of them fails. Messages are not lost, nor forgotten.
Downside of RabbitMQ:
The greatest problem of this approach is that if a consumer/worker crashes, you need to define the logic yourself to make sure that tasks are not lost. This happens because, following the RPC pattern with the durable queues from Work Queues, the server will keep re-sending a message to a worker until it comes back up and acknowledges it. But the worker doesn't know whether it has already processed a given message, so it may receive the same message several times. To handle this, each message needs an ID that you persist to disk (in case of failure), or the requests must be idempotent.
Another issue is that if the connection is lost, the clients blow up with errors as they cannot connect. This is also something you must prepare for in advance.
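The deduplication logic described above can be sketched in plain JavaScript (the names here are illustrative; in a real worker the set of processed IDs would be persisted to disk or a database so it survives crashes):

```javascript
// A sketch of the idempotency logic described above (names are illustrative).
// Each message carries an ID; a message delivered twice (e.g. re-sent after a
// crash) is only processed once. In a real worker, the set of processed IDs
// must be persisted to disk or a database so it survives restarts.
const processedIds = new Set();

function handleMessage(message, work) {
  if (processedIds.has(message.id)) {
    return false; // duplicate delivery: already handled, just re-ACK
  }
  work(message.payload);        // do the actual task
  processedIds.add(message.id); // mark as done only after the work succeeds
  return true;
}
```

Calling `handleMessage` twice with the same message ID runs the work only once, which is exactly what you need when the broker re-delivers after a crash.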
As for redis, it has a good example of durable queues in this blog:
https://danielkokott.wordpress.com/2015/02/14/redis-reliable-queue-pattern/
This follows the official recommendation; you can check the GitHub repo for more info.
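The pattern from that post boils down to atomically moving each task from a pending list to a per-worker processing list, removing it only once the work is done, and re-queuing anything left in processing lists after a crash. Here is the idea modeled in plain JavaScript (a sketch only; with real Redis you would issue the commands noted in the comments):

```javascript
// An in-memory model of the reliable-queue pattern from the post above:
// `pending` plays the role of the main Redis list and `processing` the
// per-worker backup list. With real Redis, take() would be RPOPLPUSH
// (LMOVE in Redis >= 6.2) and ack() would be LREM on the backup list.
function createReliableQueue() {
  const pending = [];
  const processing = [];
  return {
    push(task) { pending.unshift(task); },   // ~ LPUSH pending
    take() {                                  // ~ RPOPLPUSH pending processing
      const task = pending.pop();
      if (task !== undefined) processing.push(task);
      return task;
    },
    ack(task) {                               // task finished: ~ LREM processing
      const i = processing.indexOf(task);
      if (i !== -1) processing.splice(i, 1);
    },
    recover() {                               // after a crash: re-queue unacked tasks
      while (processing.length) pending.push(processing.pop());
    },
  };
}
```

The key property is that a task taken but never acknowledged is not lost: `recover()` puts it back at the front of the queue.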
Downside of Redis:
As with RabbitMQ, you still need to handle worker crashes yourself, otherwise in-progress tasks will be lost.
You have to poll: each consumer needs to ask the producer every X seconds whether there is anything new.
This makes it, in my opinion, worse than RabbitMQ here.
Conclusion
I ended up going with RabbitMQ for the following reasons:
More robust official online documentation, with examples.
No need for consumers to poll the producer.
Error handling is just as simple as in redis.
With this in mind, for this specific case, I am confident in saying that Redis is a worse fit than RabbitMQ in this scenario.
Hope it helps.

Regarding implementation, they should both be easy - they both have libraries in various languages, check here for redis and here for rabbitmq. I'll just be honest here: I don't use JavaScript, so I don't know how well the respective libraries are implemented or supported.
Regarding what you didn't find in the tutorial (or maybe missed in the second one, where there are a few words about durable queues, persistent messages and acknowledging messages), there are some nicely explained things:
about persistence
about confirms (same link as you've provided in the question, just listing it here for clarity)
about reliability
Publisher confirms are indeed not in the tutorial but there is an example on github in amqp.node's repo.
With RabbitMQ, a message travels (in most cases) like this:
publisher -> exchange -> queue -> consumer, and at each of these stops there is some sort of persistence to be achieved. Also, if you get into clusters and queue mirroring, you'll achieve even greater reliability (and availability, of course).
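As a sketch of where those knobs live in amqp.node, roughly (this assumes the amqplib package and a RabbitMQ broker on localhost, so it won't run as-is; the queue name is made up):

```javascript
// Sketch only: requires the amqplib package and a RabbitMQ broker on
// localhost. Shows where persistence is switched on at each stop.
const amqp = require('amqplib');

async function publishDurably() {
  const conn = await amqp.connect('amqp://localhost');
  // Confirm channel: the broker acknowledges each publish back to us.
  const ch = await conn.createConfirmChannel();
  await ch.assertQueue('tasks', { durable: true }); // queue survives restarts
  ch.sendToQueue(
    'tasks',
    Buffer.from('hello'),
    { persistent: true }, // message is written to disk
    (err) => {
      if (err) console.error('message was nacked by the broker');
      else console.log('broker confirmed the message');
    }
  );
  await ch.waitForConfirms();
  await conn.close();
}
```

Durable queue, persistent message, and publisher confirm are three separate settings; reliability needs all of them together.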

I think they are both easy to use, as there are many libraries developed for both of them.
There are a few to name, such as disque, bull, kue, amqplib, etc.
The documentation for them is pretty good. You can simply copy and paste and have it running in a few minutes.
I use seneca, and seneca-amqp-transport is a pretty good example:
https://github.com/senecajs/seneca-amqp-transport

Related

Run jobs on FCFS basis in Nodejs from a database

I am developing a NodeJS application wherein a user can schedule a job (CPU intensive) to be run. I am keeping the event loop free and want to run the job in a separate process. When the user submits the job, I make an entry in the database (PostgreSQL), with the timestamp along with some other information. The processes should be run in the FCFS order. Upon some research on stackoverflow, I found people suggesting Bulljs (with Redis), Kue, RabbitMQ, etc. as a solution. My doubt is why do I need to use those when I can just poll the database and get the oldest job. I don't intend to poll the db at a regular interval but instead only when the current job is done executing.
My application does not receive too many simultaneous requests. And also users do not wait for the job to be completed. Instead they logout and are notified through mail when the job is done. What can be the potential drawbacks of using child_process (spawn/exec) module as a solution?
My doubt is why do I need to use those when I can just poll the database and get the oldest job.
How are you planning on handling failures? What if Node.js crashes with a job mid-progress, would that affect your users? Would you then retry a failed job? How do you support back-off? How many attempts before it should completely stop?
These questions are answered in the Bull implementation, RabbitMQ and almost every solution you'll find for your current challenge.
From what I've seen, child_process is a lower-level building block in Node.js, meaning that a lot of the functionality you'll typically require (failover/backoff) isn't included. You'll have to implement it yourself.
That's where it usually becomes more trouble than it's worth, although admittedly managing, monitoring and deploying a Redis server may not be the most optimal solution either.
Have you considered a different approach - how would, for example, a periodic CRON job work?
The challenge with such a system is usually how you plan to handle failure and what impact failure has on your application and end-users.
I will say, in the defense of Bull, for a CPU intensive task I prefer to have a separated instance of the worker process, I can then re-deploy that single process as many times as I need. This keeps my back-end code separated and generally easier to manage, whilst also giving me the ability to easily scale up/down when required.
EDIT: I mention "more trouble than it's worth", if you're looking to really learn how technology like this is developed, go with child process and build your own abstractions on-top, if it's something you need today, use Bull, RabbitMQ or any purpose-built alternative.
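To illustrate the retry/backoff handling mentioned above, Bull lets you declare it per job. A sketch (this assumes the bull package and a Redis server on localhost, so it won't run as-is; `runHeavyTask` is a hypothetical helper):

```javascript
// Sketch only: requires the bull package and a Redis server on localhost.
const Queue = require('bull');

const jobs = new Queue('cpu-heavy-jobs'); // backed by redis://127.0.0.1:6379

// Workers pick jobs up in FIFO order; stalled jobs are retried automatically.
jobs.process(async (job) => {
  await runHeavyTask(job.data); // your CPU-intensive work (hypothetical helper)
});

// Failure handling is declarative: 5 attempts with exponential back-off.
jobs.add({ userId: 42 }, {
  attempts: 5,
  backoff: { type: 'exponential', delay: 1000 },
});
```

Those few option lines are the failover/backoff logic you would otherwise have to hand-roll around child_process.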

Background processes in Node.js

What is a good approach to handle background processes in a NodeJS application?
Scenario: After a user posts something to an app I want to crunch the data, request additional data from external resources, etc. All of this is quite time consuming, so I want it out of the req/res loop. Ideal would be to just have a queue of jobs where you can quickly dump a job on and a daemon or task runner will always take the oldest one and process it.
In RoR I would have done it with something like Delayed Job. What is the Node equivalent of this API?
If you want something lightweight, that runs in the same process as the server, I highly recommend Bull. It has a simple API that allows for a fine grained control over your queues.
If you're familiar with Ruby's Resque, there is a node implementation called Node-resque
Bull and Node-resque are both backed by Redis, which is ubiquitous among Node.js worker queues. They would be able to do what RoR's DelayedJob does; it's a matter of the specific features you want and your API preferences.
Background jobs are not directly related to your web service work, so they should not be in the same process. As you scale up, the memory usage of the background jobs will impact the web service performance. But you can put them in the same code repository if you want, whatever makes more sense.
One good choice for messaging between the two processes would be redis, if dropping a message every now and then is OK. If you want "no message left behind" you'll need a more heavyweight broker like Rabbit. Your web service process can publish and your background job process can subscribe.
It is not necessary for the two processes to be co-hosted, they can be on separate VMs, Docker containers, whatever you use. This allows you to scale out without much trouble.
If you're using MongoDB, I recommend Agenda. That way you avoid running a separate Redis instance, and features such as scheduling, queuing, and a web UI are all present. The Agenda UI is optional and can be run separately, of course.
Would also recommend setting up a loosely coupled abstraction between your application logic and the queuing / scheduling system so the entire background processing system can be swapped out if needed. In other words, keep as much application / processing logic away from your Agenda job definitions in order to keep them lightweight.
I'd like to suggest using Redis for scheduling jobs. It has plenty of different data structures, you can always pick one that suits better to your use case.
You mentioned RoR and DJ, so I assume you're familiar with sidekiq. You can use node-sidekiq for job scheduling if you want to, but it's suboptimal IMO, since its main purpose is to integrate nodejs with RoR.
For worker daemonising I'd recommend using PM2. It's widely used and actively maintained. It solves a lot of problems (e.g. deployment, monitoring, clustering), so make sure it isn't overkill for you.
I tried bee-queue & bull and chose bull in the end.
I first chose bee-queue because it is quite simple and its examples are easy to understand, while bull's examples are a bit complicated. bee's wiki page, Bee Queue's Origin, also resonates with me. But the problems with bee are that (1) its issue-resolution time is quite slow and its latest update was 10 months ago, and (2) I can't find an easy way to pause/cancel a job.
Bull, on the other hand, frequently updates its code and responds to issues. Node.js job queue evaluation said bull's weakness is "slow issue resolution time", but my experience is the opposite!
But anyway, their APIs are similar, so it is quite easy to switch from one to the other.
I suggest using a proper Node.js framework to build your app.
I think that the most powerful and easy to use is Sails.js.
It's an MVC framework, so if you are used to developing in RoR, you will find it very easy!
It already includes a powerful (in JavaScript terms) job manager.
new sails.cronJobs('0 01 01 * * 0', function () {
  sails.log.warn("START ListJob");
}, null, true, "Europe/Dublin");
If you need more info not hesitate to contact me!

How to achieve realtime updates on my website (with Flask)?

I am using Flask and I want to show the user how many visits that he has on his website in realtime.
Currently, I think one way is to create an infinite loop with some delay after every iteration, which makes an AJAX request to get the current number of visits.
I have also heard about Node.js; however, I think that running another process might make the computer it's running on slower (I'm assuming)?
How can I achieve the realtime updates on my site? Is there a way to do this with Flask?
Thank you in advance!
Well, there are many possibilities:
1) Polling - this is exactly what you've described. An infinite loop which makes an AJAX request every now and then. Easy to implement, and can easily be done with Flask, however quite inefficient - it eats lots of resources and scales horribly, making it a really bad choice (avoid it at all costs). You will either kill your machine with it, or the notification period (polling interval) will have to be so big that it will be a horrible user experience.
2) Long polling - a technique where the client makes an AJAX request but the server responds to it only when a notification is available. After receiving the notification, the client immediately makes a new request. You will require a custom web server for this - I doubt it can be done with Flask. A lot better than polling (many real websites use it), but it could still be more efficient. That's why we now have:
3) WebSockets - truly bidirectional communication. Each client maintains an open TCP connection with the server and can react to incoming data. But again: it requires a custom server, and only the most modern browsers support it.
4) Other stuff like Flash or Silverlight, or other HTTP tricks (chunked encoding): pretty much the same as 3), though more difficult to maintain.
So as you can see if you want something more elegant (and efficient) than polling it requires some serious preparation.
As for processes: you should not worry about that. It's not about how many processes you use but how heavy they are. One badly written process can easily kill your machine while 100 well-written ones will work smoothly. So make sure it is written in such a way that it won't freeze your machine (and I assure you that this can be done, up to some point defined by the number of simultaneous users).
As for language: it doesn't matter whether this is Node.js or any other language (like Python). Pick the one you feel better with. However, I am aware that there's a strong tendency to use Node.js for such projects, and thus there might be more suitable libraries out there on the internet. Or maybe not. Python has, for example, Twisted and/or Tornado especially for that (and probably much, much more).
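For reference, the client side of the long-polling variant (option 2) is only a few lines; the `/notifications` endpoint and the payload shape here are made up for illustration:

```javascript
// Client side of long polling: the request stays open until the server has a
// notification (or times out), then we immediately reconnect.
// '/notifications' is a hypothetical endpoint.
async function longPoll(url, onNotification) {
  for (;;) {
    try {
      const res = await fetch(url); // the server holds this request open
      if (res.ok) onNotification(await res.json());
    } catch (err) {
      // network hiccup: back off a little before reconnecting
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}

// longPoll('/notifications', (n) => console.log('visits:', n.count));
```

The server-side half (holding the request open until data is available) is where the custom web server mentioned above comes in.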
Websocket is an event-driven protocol, which means you can actually use it for truly real-time communication.
Kenneth Reitz wrote an extension named Flask-Sockets that is excellent for websockets:
Article: introducing-flask-sockets
Github: flask-sockets
In my opinion, the best option for achieving real time data streaming to a frontend UI is to use a messaging service like pubnub. They have libraries for any language you are going to want to be using. Basically, your user interfaces subscribe to a data channel. Things which create data then publish to that channel, and all subscribers receive the published message very quickly. It is also extremely simple to implement.
You can use PubNub and specifically PubNub presence meant especially for online presence detection. It provides key features like
Track online and offline status of users and devices in realtime
Occupancy to monitor user and machine presence in realtime
Join/Leave Notification for immediate updates of all client connections
Global Scale with synchronized servers across the PubNub Data Stream Network
PubNub Presence Tutorial provides code that can be downloaded to get presence up and running in a few minutes. Five Ways You Can Use PubNub Presence shows you the different ways you can use PubNub presence.
You can get started by signing up for PubNub and getting your API keys.

How to develop a large chat application with Socket.io and Node.js [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I've been working with Socket.io over the past few months, developing a fairly complex chat application with chat rooms, kicking/banning/moderators/friends/etc.
Over the course of development, I've rewritten the app several times and I'm still fighting with my code.
I really like JavaScript, but I find it really hard to maintain the application as it grows. I've read through huge amounts of "tutorials" on how to write chat applications, but they all cover just the most basic aspects. Same goes for all example apps on GitHub, and even most chat applications I found on the web (most of them are just simple IM without any user management).
Some use cases just seem too ridiculous to me, for example kicking a user from a room.
moderator clicks on a kick button -> emit event to the server
the server pairs the username with a socket (or just broadcast to all users and filter on the client side) -> emit to him kicked event
the user emits a logout event to the server and is also displayed a message that he was kicked (logout is just my implementation of punishment)
the user is removed from the chat room's user list -> emit current list of users to all users in the room
This doesn't seem too complex, but when I add all the callbacks that happen on the client side to manage the UI (since I use AngularJS, I use events to communicate between controllers), and also ton of callbacks on the server side, since everything is non-blocking, I find this really hard to test.
There's another problem on the client side, where I have to listen for the socket events in multiple places, so I have to have a kind of singleton global socket object, and hook event listeners in multiple places to it.
Am I doing something wrong, or is this callback hell the result of working with websockets with no way around it?
Are there any ways to make developing applications like this easier? For example, alternative technology to Socket.io? So far I only found NowJS, whose last commit was 5 months ago, and Meteor, which is really cool, but looking at the website it doesn't really seem to be stable.
Just use promises my friend. jQuery has them built in, but on the nodejs side you can use the q library. Don't write async code without them! They take a little getting used to, but once you're in the mindset writing async code is a breeze.
There are also libraries like async which give you utility functions around callback code. But don't let the popularity fool you. Promises are sweet.
For async code testing you don't need to go much farther than nodeunit which gives you a simple done() function you can call when your test is done, so it can run as long as you like.
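To make the promises suggestion concrete, wrapping a node-style callback function looks roughly like this (using built-in promises, which have since become the standard approach; `addLater` is an illustrative callback API, and Node's own `util.promisify` does the same job):

```javascript
// Turn a node-style (error-first) callback function into one that returns
// a promise. Node's util.promisify does exactly this; shown by hand here
// for clarity.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)));
    });
}

// Example callback-style API (illustrative):
function addLater(a, b, cb) {
  setTimeout(() => cb(null, a + b), 10);
}

const addAsync = promisify(addLater);
// addAsync(2, 3).then((sum) => console.log(sum)); // prints 5
```

Once the edges of your socket code return promises, the nested-callback pyramids flatten into chains.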
Whoa, let's break that into actual problems, as I don't see many defined:
Kicking user from a room:
moderator clicks the Remove button (emit should contain roomID and userID)
server code removes the user from the Room object and emits message to each user in the room that userID has been kicked. If you had user's socket join(roomID) earlier, server code for that would look like this:
sio.sockets.to(roomID).emit('kicked', { userID: uid });
That's all. In the client code, you should receive this "kicked" event and have code like this:
if (data.userID == myUserID)
  alert('You have been kicked by moderator');
else
  removeUserFromList(data.userID);
You should not have the client emit a message that he is leaving, as malicious users could write a client that ignores banning.
I have to listen for the socket events in multiple places
Why? What does "multiple places" mean exactly?
Are there any ways to make developing applications like this easier?
Don't just dive into code. Think how people would communicate directly. Clients are people sending messages, server is the post office. Translate the logic in code later.
Make sure that server is the only one in control and clients only give commands and display the state without doing any logic there.
I have developed a 4000-LOC multiplayer game where chat is only a small portion of it, with about 60,000 users playing it each day. Everything is plain socket.io code with some Express/EJS to get the screens up, and I cannot imagine it getting any simpler. Especially not by using some "magic" library that would hide all the communication from me and surely introduce its own set of bugs that are waiting to be discovered and fixed.
Disclosure: I'm the developer of scoop.
I had more or less the same problem, and it boils down to the usual callback pyramid, which can be solved with many libraries (there are dozens of them, just take a look).
I was very happy with step before I found a major drawback: you can't nest them (a step call which calls more step stuff). This was very important for me, and I didn't like the other async libs that much because they provided too much functionality which I wouldn't use anyway, so I wrote scoop.
It's a simple lib which tries to help as all the other async libs, modeled after step with some personal flavor. Take a look at the examples, it may fit for your needs.
You could also take a look at derby.js. It is a framework very similar to meteor but built on all the node.js "goodies" like npm, socket.io, express, etc. Derby includes a powerful data synchronization engine called Racer that automatically syncs data among browsers, servers, and a database. They even have a basic chat example.
Meteor uses a lot of its own technology (fibers, its own package manager). Like Meteor, Derby is still in the alpha phase but has gained a lot of traction lately. Airbnb recently announced that they will consider Derby for future implementations.
Some more information:
http://techcrunch.com/2012/07/27/move-over-meteor-derby-is-the-other-high-speed-node-js-framework-in-town/
http://blog.derbyjs.com/2012/04/14/our-take-on-derby-vs-meteor/
https://stackoverflow.com/questions/10374113/meteor-vs-derbyjs
Your question is hard to answer, but I can assure you I feel your pain... Even without node.js, callbacks can get hairy pretty quickly, and async testing is really hard to do. I guess I should rather say: hard to do well, but that might sound like I know how to do it easily and I don't. The background problem is that async development is hard, just like concurrent programming used to be.
I don't think that a different websocket library is going to help you, or even avoiding websockets completely. What might help you is to use a few tricks. Andy Ray above suggests promises; I have not used them extensively but it is worth a try.
Self-diagnosis is your friend. JavaScript is a dynamic language without a type system worth its salt and that masks null objects; only a huge battery of automatic tests can ensure quality. But as you say testing can be really hard.
Another trick: instrument your application like crazy. Send instrumentation events and check for them in your tests. We have built a cool test suite around a headless browser (PhantomJS) where we check with JavaScript that the client is sending the right events; it can be hard to debug but it works.
Of course, the usual design tips help: KISS, YAGNI, and so on, but I will not insult your intelligence with them. Good luck.

Client notification, should I use an AJAX Push or Poll?

I am working on a simple notification service that will be used to deliver messages to the users surfing a website. The notifications do not have to be sent in real time, but it might be a better user experience if they happened more frequently than, say, every 5 minutes. The data being sent to and from the client is not very large, and it is a straightforward database query to retrieve it.
In reading other conversations on the topic, it would appear that an AJAX push can result in higher server loads. Since I can tolerate longer server delays, is it worthwhile to have the server push notifications, or should I simply poll?
It is not much harder to implement the push scenario and so I thought I would see what the opinion was here.
Thanks for your help.
EDIT:
I have looked into a simple AJAX Push and implemented a simple demo based on this article by Mike Purvis.
The client load is fairly low at around 5k for the initial version and expected to stay that way for quite some time.
Thank you everyone for your responses. I have decided to go with the polling solution but to wrap it all within a utility library so that if they want to change it later it is easier.
I'm surprised no one here has mentioned long polling. Long polling means keeping an open connection for a longer period (say 30-60 seconds), and once it's closed, re-opening it again, simply having the socket/connection listen for responses. This results in fewer (but longer) connections, and means that responses are almost immediate (some may have to wait for a new polling connection). I'd like to add that, in combination with technologies like NodeJS, this results in a very efficient, resource-light solution that is 100% browser compatible across all major browsers and versions, and does not require any additional tech like Comet or Flash.
I realize this is an old question, but thought it might still be useful to provide this information :)
Definitely use push, it's much cooler. If you just want simple notifications, I would use something like StreamHub Push Server to do the heavy lifting for you. Developing your own Ajax push functionality is an extremely tricky and rocky road - you have to get it working in all browsers and then handle firewalls and proxies killing keep-alive connections, etc. Why re-invent the wheel? Also, it has a similarly low footprint of less than 10K, so it should suit you if that is a priority.
Both have different requirements and address different scenarios.
If you need realtime updates, like in an online chat, push is a must.
But if the refresh period is large, as it is in your case (5 minutes), then polling is the appropriate solution. Push, in this case, would require a lot of resources from both the client and the server.
Tip! Try to make the page that answers the poll fast and clean, so it doesn't consume a lot of server resources on each request. What I usually do is keep a flag in memory (like in a session variable) that says whether the message pool is empty or not, so I only do the heavy lookup in the pool when it is not empty. When the pool is empty, which is most of the time, the page request runs extremely fast.
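That flag idea in miniature (plain JavaScript; `expensiveLookup` stands in for the real database query):

```javascript
// Keep a cheap in-memory flag so most poll requests never touch the database.
let poolHasMessages = false;
let lookups = 0; // counts how often the expensive path actually ran

function expensiveLookup() { // stand-in for the real DB query
  lookups++;
  return ['some notification'];
}

function handlePollRequest() {
  if (!poolHasMessages) return []; // fast path: nothing to fetch
  poolHasMessages = false;
  return expensiveLookup();
}

function messageArrived() { poolHasMessages = true; }
```

The expensive query only runs on the rare polls that actually have something to deliver; every other poll is answered from memory.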
Because using a push requires an open HTTP connection to be maintained between your server and each client, I'd go for poll as well - not only is that going to consume a lot of server resources but it's also going to be significantly more tricky to implement as matt b mentioned.
My experience with polling is that if you have a frequent enough polling interval on a busy enough site your web server logs can get flooded with poll requests real quickly.
Edit (2017): I'd say your choices are now are between websockets and long polling (mentioned in another answer). Sounds like long polling might be the right choice based on the way the question mentions that the notifications don't need to be received in real time, an infrequent polling period would be pretty easy to implement and shouldn't be very taxing on your server. Websockets are cool and a great choice for many applications these days, sounds like that might be overkill in this case though.
I would implement a poll just because it sounds simpler to write, and keeping it simple is very valuable.
Not sure if you have taken a look at some of the COMET implementations out there (is that what you mean by AJAX push).
If the user is surfing the site, won't that in effect be requesting information from the server that this notification can piggy-back on?
It's impossible to say whether polling will be more expensive than pushing without knowing how many clients you'll have. I'd recommend polling because:
It sounds like you want to update data about once per minute. Unless notifications are able to arrive at a much faster rate than that, pushing would mean you're keeping an HTTP connection open but seeing very little activity on it.
Polling is built on top of existing HTTP conventions, so any server that talks to web browsers is already ready to respond to ordinary Ajax requests. A Comet– or Flash socket–based solution has different requirements; you'll need something like cometd on the server side and a client-side library that groks server-side push.
So if you needed something heavy-duty to manage a torrent of data and a crapload of clients, I'd recommend Comet. But that doesn't seem to be the case.
There's now a service, http://pusherapp.com, that is trying to solve this problem once and for all, in a blink. Might be worth checking out. (Disclaimer: I am in no way associated with them.)
I haven't tried it myself, but some say COMET works and is easier than you think. There's also a Ruby on Rails plug-in called Juggernaut that I've heard talked about highly. Again, I haven't used it, so YMMV, but my understanding is that it takes far fewer resources compared to polling. I believe (can someone confirm?) that COMET is how MacRumorsLive.com delivers live blogging of WWDC Stevenotes.
