Reading Event Hub documentation and creating a simple producer-consumer example
Link -> https://learn.microsoft.com/en-us/javascript/api/overview/azure/event-hubs-readme?view=azure-node-latest
I was wondering how this would work in a production application. The reason I ask is that the current implementation listens for a specific amount of time and then closes the connection.
Should we send the request to specific REST endpoints and activate the listeners after the producer finishes?
You are correct that in most production scenarios this does not work. It is best to keep the listener open for the lifetime of the application. In most cases, when a restart of the application is triggered, processing should resume from the last checkpoint. The example does not cover this.
From the docs:
For the majority of production scenarios, we recommend that you use the event processor client for reading and processing events. The processor client is intended to provide a robust experience for processing events across all partitions of an event hub in a performant and fault tolerant manner while providing a means to checkpoint its progress. Event processor clients can work cooperatively within the context of a consumer group for a given event hub. Clients will automatically manage distribution and balancing of work as instances become available or unavailable for the group.
Here is an example of processing events combined with checkpointing. For demo purposes the listener stops after a while. You will have to modify the code to run as long as the process is not stopped.
Checkpointing is important if you have a continuous flow of events being sent. If the listener is unavailable for some period, you do not want to resume processing from the very first event, nor only from new events. Instead, you will want to start from the last known processed event.
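As a minimal sketch of that pattern (using the v5 @azure/event-hubs package together with @azure/eventhubs-checkpointstore-blob; the connection strings, names and container below are placeholders you would substitute with your own):

const { EventHubConsumerClient } = require("@azure/event-hubs");
const { ContainerClient } = require("@azure/storage-blob");
const { BlobCheckpointStore } = require("@azure/eventhubs-checkpointstore-blob");

// Placeholders - replace with your own values.
const eventHubConnectionString = "<event-hub-connection-string>";
const eventHubName = "<event-hub-name>";
const storageConnectionString = "<storage-connection-string>";
const containerName = "<blob-container-name>";

// Checkpoints are persisted to blob storage so a restarted process resumes
// from the last known processed event instead of the beginning or only new events.
const containerClient = new ContainerClient(storageConnectionString, containerName);
const checkpointStore = new BlobCheckpointStore(containerClient);

const consumerClient = new EventHubConsumerClient(
  "$Default",               // consumer group
  eventHubConnectionString,
  eventHubName,
  checkpointStore
);

const subscription = consumerClient.subscribe({
  processEvents: async (events, context) => {
    for (const event of events) {
      console.log(`Received event from partition ${context.partitionId}:`, event.body);
    }
    if (events.length > 0) {
      // Record progress after the batch has been handled.
      await context.updateCheckpoint(events[events.length - 1]);
    }
  },
  processError: async (err, context) => {
    console.error(`Error on partition ${context.partitionId}:`, err);
  },
});

// Keep the subscription open for the lifetime of the application; call
// subscription.close() and consumerClient.close() only on shutdown.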
Related
It is my understanding that the Node event loop will continue to handle requests until the event loop is empty, at which point it will look to the event queue to complete the blocking I/O requests.
My question is: what happens if the event loop never becomes empty? Not due to bad code (i.e. a never-ending loop) but due to a constant stream of client requests (think of something like Google, which receives never-ending requests)?
I realize there is a possibility I am misunderstanding a fundamental aspect of how client requests are handled by a server.
There are actually several different phases of the event loop (timers, I/O, check events, pending callbacks, etc...) and they are checked in a circular order. In addition some things (like promises and other microtasks) go to the front of the line no matter what phase of the event loop is currently processing.
It is possible that a never ending set of one type of event can block the event queue from serving other types of events. That would be a design/implementation problem that needs to be prevented.
You can read a bit more about the different types of things in the event loop here: https://developer.ibm.com/tutorials/learn-nodejs-the-event-loop/ and https://www.geeksforgeeks.org/node-js-event-loop/ and https://snyk.io/blog/nodejs-how-even-quick-async-functions-can-block-the-event-loop-starve-io.
While it is possible to overload the event loop such that it doesn't get out of one phase, it's not very common, because most processing in nodejs consists of multiple events, which gives other things a chance to interleave. For example, processing an incoming http request consists of connecting, reading, writing, closing, etc., and the processing of that event may involve other types of events. So it can happen that you overload one type of event (I've done it only once in my coding, and that was because of poorly written communication between the main thread and a bunch of WorkerThreads; it was easily fixed once I realized what the problem was).
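As a contrived illustration of that kind of starvation (a minimal sketch, not from the question): recursively scheduling process.nextTick() keeps the microtask queue permanently non-empty, so other phases such as timers never get a turn.

// Contrived example: the microtask queue never drains, so the timer below never fires.
function starve() {
  process.nextTick(starve);
}

setTimeout(() => {
  console.log("timer fired"); // never reached while starve() keeps rescheduling itself
}, 100);

starve();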
When Node.js starts, it initializes the event loop, processes the provided input script which may make async API calls, schedule timers, or call process.nextTick(), then begins processing the event loop.
There are seven phases and each phase has its own event queue which is based on FIFO.
So the application makes a request, and the event demultiplexer gathers those requests and pushes them to the respective event queues.
For example, if my code makes two requests, one being setTimeout() and the other some API call, the demultiplexer will push the first one into the timer queue and the other into the poll queue.
The events sit there, the loop watches over those queues, and on completion it pushes the registered callback onto the call stack, where it is processed.
My questions are:
1) Who hands the events in the event queues over to the OS?
2) Does the event loop poll for event completion in each event queue, or does the OS notify it back?
3) Where, and by whom, is it decided whether to call a native asynchronous API or hand the work over to a thread pool?
I am on the verge of understanding this and have been struggling a lot to grasp the concepts. There is a lot of false information about the node.js event loop and how it handles asynchronous calls using one thread.
Please answer these questions if possible. Below are the references from which I could get some better insight.
https://github.com/nodejs/nodejs.org/blob/master/locale/en/docs/guides/event-loop-timers-and-nexttick.md
https://dev.to/lunaticmonk/understanding-the-node-js-event-loop-phases-and-how-it-executes-the-javascript-code-1j9
how does reactor pattern work in Node.js?
https://www.youtube.com/watch?v=PNa9OMajw9w&t=3s
Who hands the events in the event queues over to the OS?
How OS events work depends upon the specific type of event. Disk I/O works one way and Networking works a different way. So, you can't ask about OS events generically - you need to ask about a specific type of event.
Does the event loop poll for event completion in each event queue, or does the OS notify it back?
It depends. Timers, for example, are built into the event loop, and the head of the timer list is checked on each pass through the event loop to see if its time has come. File I/O is handled by a thread pool: when a disk operation completes, the thread inserts a completion event directly into the appropriate queue, so the event loop will just find it there the next time through.
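A small sketch of the two cases just described (the file read simply targets the script itself so the example is self-contained):

const fs = require("fs");

// A timer lives inside the event loop itself: the timers phase checks on each
// pass whether its due time has passed.
setTimeout(() => console.log("timer fired"), 0);

// A disk read goes to the libuv thread pool; when the read finishes, the worker
// queues a completion event and the event loop invokes this callback on a later pass.
fs.readFile(__filename, "utf8", (err, contents) => {
  if (err) throw err;
  console.log(`read ${contents.length} characters`);
});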
Where, and by whom, is it decided whether to call a native asynchronous API or hand the work over to a thread pool?
This was up to the designers of nodejs and libuv and varies for each type of operation. The design is baked into nodejs and you can't yourself change it. Nodejs generally uses libuv for cross platform OS access so, in most cases, it's up to the libuv design for how it handles different types of OS calls. In general, if all the OSes that nodejs runs on offer a non-blocking, asynchronous mechanism, then libuv and nodejs will use it (like for networking). If they don't (or it's problematic to make them all work similarly), then libuv will build their own abstraction (as with file I/O and a thread pool).
You do not need to know the details of how this works to program asynchronously in nodejs. You make a call and get a callback (or resolved promise) when it's done, regardless of how it works internally. For example, nodejs offers some asynchronous crypto APIs. They happen to be implemented using a thread pool, but you don't need to know that in order to use them.
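For instance, crypto.pbkdf2() is one of those thread-pool-backed APIs; from the caller's point of view it is just a callback (a minimal sketch with made-up inputs):

const crypto = require("crypto");

// The key derivation runs on the libuv thread pool, but the calling code only
// supplies a callback and never deals with threads directly.
crypto.pbkdf2("secret", "salt", 100000, 64, "sha512", (err, derivedKey) => {
  if (err) throw err;
  console.log(derivedKey.toString("hex"));
});

console.log("pbkdf2 scheduled; the event loop keeps running in the meantime");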
I'm using Azure Event Hub for a project via the npm package @azure/event-hubs.
I'm wondering if there's a way to make the event receiver only receive a new event when it is done processing a previously received event. The point is I want to process one event at a time.
My observation currently is that it sends events to the handler the moment they become available.
The API I'm using is client.receive(partitionId, onMessage, onError) from the docs.
I'm wondering if there's a way to achieve the mentioned behaviour with this API.
The client.receive() method returns a ReceiverHandler object that you could use to stop the stream via its stop() method. You would then start it again with a fresh client.receive().
Another option would be to use client.receiveBatch() with the max batch size set to 1.
Neither option is ideal; as Peter Bons mentioned, Event Hubs are not designed for a slow drip of data. The service assumes that you will be able to accept messages at the same rate they came in, and that you will have only one receiver per partition. Service Bus is indeed a good alternative to look into. You can choose how many messages to receive at a time and connect multiple receivers, each processing one message at a time, to scale your solution.
What you are describing is the need for back pressure, and the latest version of the @azure/event-hubs package has indeed solved this problem. It ensures that you receive events only after the previously received events are processed. By default, you will receive events in batches of size 10, and this size is configurable.
See the migration guide from v2 to v5 if you need to upgrade your current application to use the latest version of the package.
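For illustration, here is a minimal sketch of what that looks like with the v5 EventHubConsumerClient, setting the batch size down to 1 so the next event is only delivered after the previous handler has finished (the connection string, hub name and handleEvent function are placeholders):

const { EventHubConsumerClient } = require("@azure/event-hubs");

const client = new EventHubConsumerClient("$Default", "<connection-string>", "<event-hub-name>");

const subscription = client.subscribe(
  {
    processEvents: async (events, context) => {
      for (const event of events) {
        // The next batch is not delivered until this promise resolves,
        // which gives you one-at-a-time processing.
        await handleEvent(event); // handleEvent is your own processing function
      }
    },
    processError: async (err, context) => {
      console.error(err);
    },
  },
  { maxBatchSize: 1, maxWaitTimeInSeconds: 5 }
);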
I have been playing around with CQRS/event sourcing for a couple of months now. Currently, I'm having trouble with another experiment I'm trying and hope somebody could help, explain, or even hint at another approach than event sourcing.
I want to build a distributed application in which every user has governance of his/her data. So my idea is each user hosts his own event store while other users may have (conditional) access to it.
When user A performs some command, this may involve more than one event store. Two examples:
1) Delete a shared task from a tasklist hosted by both event store A and B
2) Adding the reference to a comment persisted in event store A to a post persisted in event store B.
My only solution currently seems to be a process manager attached to each event store, so that when an event is added to one event store, a saga deals with applying the event to the other related event stores as well.
Not sure what the purpose of your solution is, but if you want one system to react to events from another system: after events are saved to the store, a subscription (like the catch-up subscription provided by Greg Young's EventStore) publishes them on a message bus using pub-sub, and all interested parties can handle the event.
However, it would be wrong if they just "saved" this event to their own stores. Instead, they should have an event handler that produces a command inside the local service, and this command might (or might not) result in a local event if all conditions are met. Only something that happens within the boundary, under local control, should be saved to the local store.
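A minimal sketch of that handler pattern, using the shared-task example from the question; messageBus, taskListAggregate and localEventStore are hypothetical names standing in for whatever infrastructure you use:

// An event from another user's store arrives via the message bus; it is not
// copied into the local store, but translated into a local command.
messageBus.subscribe("SharedTaskDeleted", async (remoteEvent) => {
  const command = {
    type: "RemoveSharedTaskReference",
    taskId: remoteEvent.taskId,
    listId: remoteEvent.listId,
  };

  // The local aggregate decides whether the command is valid in its own context.
  const localEvents = await taskListAggregate.handle(command);

  // Only events produced inside the local boundary are appended to the local store.
  if (localEvents.length > 0) {
    await localEventStore.append(remoteEvent.listId, localEvents);
  }
});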
When using Server-Sent Events, should the client establish multiple connections to receive the different events it is interested in, or should there be a single connection with the client indicating what it is interested in via a separate channel? IMO the latter seems preferable, although to some it might make the client code more complex. The spec supports named events (events that relate to a particular topic), which to me suggests that a Server-Sent Events connection should be used as a single channel for all events.
The following code illustrates the first scenario, where multiple Server-Sent Event connections are initiated:
var eventSource1 = new EventSource("events/topic1");
eventSource1.addEventListener('topic1', topic1Listener, false);
var eventSource2 = new EventSource("events/topic2");
eventSource2.addEventListener('topic2', topic2Listener, false);
eventSource1 would receive "topic1" events and eventSource2 would receive "topic2" events. Whilst this is pretty straightforward, it is also pretty inefficient, with a hanging GET occurring for each topic you are interested in.
The alternative is something like the following:
var eventSource3 = new EventSource("/events?id=1234");
eventSource3.addEventListener('topic3', topic3Listener, false);
eventSource3.addEventListener('topic4', topic4Listener, false);
var subscription = new XMLHttpRequest();
subscription.open("PUT", "/events/topic3?id=1234", true);
subscription.send();
In this example a single EventSource would exist, and interest in a particular event would be specified by a separate request, with the Server-Sent Event connection and the registration being correlated by the id param. topic3Listener would receive "topic3" events and topic4Listener would not. Whilst requiring slightly more code, the benefit is that only a single connection is made, yet events can still be identified and handled differently.
There are a number of examples on the web that show the use of named events, but it seems the event names (or topics) are known in advance, so there is no need for a client to register interest with the server (example). Whilst I have yet to see an example showing multiple EventSource objects, I also haven't seen an example showing a client using a separate request to register interest in a particular topic, as I am doing above. My interpretation of the spec leads me to believe that indicating an interest in a certain topic (or event name) is entirely up to the developer, and that it can be done statically, with the client knowing the names of the events it is going to receive, or dynamically, with the client alerting the server that it is interested in receiving particular events.
I would be pretty interested in hearing other people's thoughts on the topic. NB: I am usually a Java dev so please forgive my mediocre JS code.. :)
I would highly recommend, IMHO, that you have one EventSource object per SSE-providing service, and then emit the messages using different types.
Ultimately, though, it depends on how similar the message types are. For example, if you have 5 different types of messages related to users, have a user EventSource and differentiate with event types.
If you have one event type about users, and another about sandwiches, I'd say keep them in different services, and thus EventSources.
It's a good idea to think of breaking up EventSources the same way you would a restful service. If you wouldn't get two things from the same service with AJAX, you probably shouldn't get them from the same EventSource.
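To make the "different types on one connection" idea concrete, here is a minimal server-side sketch (plain Node.js http, with an illustrative endpoint path and event names) that emits named events which the client-side addEventListener() calls above would pick apart:

const http = require("http");

http.createServer((req, res) => {
  if (req.url !== "/events") {
    res.writeHead(404);
    res.end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
  });

  // The "event:" field names the event type, so one connection can carry
  // several topics and the client routes them with addEventListener().
  res.write('event: topic3\ndata: {"msg":"hello from topic3"}\n\n');
  res.write('event: topic4\ndata: {"msg":"hello from topic4"}\n\n');
}).listen(8080);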
In response to vague and permissive browser standard interpretation*, browser vendors have inconsistently implemented restrictions on the number of persistent connections allowed to a single domain/port. As each event receiver in an async context takes up one persistent connection allocation for as long as that receiver is open, it is crucial that the number of EventSource listeners be strictly limited in order to avoid exceeding the varying, vendor-specific limits. In practice this limits you to about 6 EventSource/async context pairs per application. Degradation is graceful (e.g. additional EventSource connection requests will merely wait until a slot is available), but keep in mind there must be connections available for retrieving page resources, responding to XHR, etc.
*The W3C has issued standards with respect to persistent connections that contain language “… SHOULD limit the number of simultaneous connections…” This language means the standard is not mandatory so vendor compliance is variable.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4