I created an SQS queue with default settings. I published two messages to it, and I would like to read them back at the same time. I tried it like this:
const sqsClient = new SQSClient({ region: REGION });
const params = {
AttributeNames: ["SentTimestamp"],
MaxNumberOfMessages: 5,
MessageAttributeNames: ["All"],
QueueUrl: queueURL,
WaitTimeSeconds: 5,
};
const data = await sqsClient.send(new ReceiveMessageCommand(params));
const messages = data.Messages ?? [];
console.log(messages.length);
Unfortunately, only one message is returned, no matter what I provide in MaxNumberOfMessages. What can cause this, and how can I fix it?
I was able to find a similar question, but it has only one answer, referring to a third-party library.
A ReceiveMessageCommand does not guarantee that you will get exactly the number of messages specified for MaxNumberOfMessages. In fact the documentation says the following:
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. Thus, only the messages on the sampled machines are returned. If the number of messages in the queue is small (fewer than 1,000), you most likely get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response. If this happens, repeat the request.
Use long polling to improve your chances of receiving multiple messages. This essentially means setting WaitTimeSeconds to a value greater than 0 (5 seconds should be enough).
You also need a larger number of messages in the queue to reliably fetch several of them in one call.
To summarize:
SQS is a distributed system; a short-poll call samples only a subset of its machines.
Messages are distributed across those machines, so with a small number of messages you might fetch only one message, or none.
Test your code with a larger set of sent messages and put your receiving call in a loop.
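The advice above can be sketched as a loop that keeps calling ReceiveMessage until enough messages have arrived or several polls in a row come back empty. This is only a minimal sketch: receiveBatch is a hypothetical function you supply (for example, one that wraps sqsClient.send(new ReceiveMessageCommand(params)) and returns data.Messages ?? []), and receiveUpTo, wanted and maxEmptyPolls are assumed names, not part of the AWS SDK.

```javascript
// Collect messages across several receive calls.
// receiveBatch is a caller-supplied async function (hypothetical name) that
// performs one ReceiveMessage call and resolves to an array of messages.
async function receiveUpTo(receiveBatch, wanted, maxEmptyPolls = 3) {
  const collected = [];
  let emptyPolls = 0;
  while (collected.length < wanted && emptyPolls < maxEmptyPolls) {
    const batch = await receiveBatch();
    if (batch.length === 0) {
      emptyPolls += 1; // nothing was returned this time; poll again
      continue;
    }
    emptyPolls = 0;
    // A batch may push the total past `wanted`; all of it is kept.
    collected.push(...batch);
  }
  return collected;
}
```

Giving up after a few consecutive empty polls keeps the loop from spinning forever on an empty queue; with long polling enabled, each empty poll already waits up to WaitTimeSeconds.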
So I wrote a simple video creator script in Node.js.
It runs as a scheduled cron job.
I have a panel written in PHP; the user enters details and clicks the "Submit new Video Job" button.
The new job is saved to the DB with its details, a jobId, and status="waiting".
The PHP API returns one job at a time: when asked, it queries for status="waiting", limits the query to 1, and returns the job data with its jobId.
The video creation script calls that API every x seconds to ask whether a new job is available.
It has 5 tasks.
available=true.
Check if new job order available (With GET Request in every 20 seconds), if has new job;
available=false
Get details (name, picture url, etc.)
Create video with details.
Upload Video to FTP
Post data to API to update details. And Mark that job as "done"
available=true;
These tasks are async, so every task has to wait for the previous task to finish.
Right now, sending a GET or POST request to the API every 20 seconds (the interval doesn't matter) to check for a new job seems like a bad approach to me.
So any way / package / system to accomplish this behavior?
Code Example:
const cron = require('node-cron');
const axios = require('axios');
let available=true;
var scheduler = cron.schedule(
'*/20 * * * * *',
() => {
if (available) {
makevideo();
}
},
{
scheduled: false,
timezone: 'Europe/Istanbul',
}
);
let makevideo = async () => {
available = false;
let {data} = await axios.get(
'https://api/checkJob'
);
if (data == 0) {
console.log('No Job');
available = true;
} else {
let jobid = data.id;
await createvideo();
await sendToFTP();
await axios.post('https://api/saveJob', {
id: jobid,
videoPath: 'somevideopath',
});
available = true;
}
};
scheduler.start();
RabbitMQ is also a good queueing system.
Why ?
It's really well documented (examples for many languages including javascript & php).
Tutorials are simple while they're exposing real use cases.
It has a REST API.
It ships with a monitoring UI.
How to use it to solve your problem ?
On the job producer side : send messages (jobs) to a queue by following tutorial 1
To consume jobs with your nodejs process : see RabbitMQ's tutorial 2
Other suggestions :
Use a prefetch value of 1 with manual acknowledgements so that a consumer instance does not receive another message while a job is running, and publisher confirms so the producer knows the broker accepted each job.
Roadmap for a quick prototype : tutorial 1... then tutorial 2 x). After sending and receiving messages you can explore the options you can set on queues and messages
Nodejs package : http://www.squaremobius.net/amqp.node/
PHP package : https://github.com/php-amqplib/php-amqplib
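As a rough illustration of the consumer side with a prefetch of 1, here is a minimal sketch. It assumes an amqplib channel is passed in, that each job is published as a JSON string, and that handleJob is your own async function (a hypothetical name) that creates and uploads the video. startWorker is also an assumed name.

```javascript
// Consume jobs one at a time from a RabbitMQ queue.
// `channel` is an amqplib channel; `handleJob` is a caller-supplied async
// function (hypothetical) that does the actual work for one job.
async function startWorker(channel, queueName, handleJob) {
  await channel.assertQueue(queueName, { durable: true });
  // Deliver at most one unacknowledged message to this consumer.
  await channel.prefetch(1);
  await channel.consume(
    queueName,
    async (msg) => {
      if (msg === null) return; // the consumer was cancelled by the broker
      try {
        // Assumption: the producer sent the job as a JSON string.
        await handleJob(JSON.parse(msg.content.toString()));
        channel.ack(msg); // job done; the broker may deliver the next one
      } catch (err) {
        // Reject without requeueing so a broken job cannot loop forever.
        channel.nack(msg, false, false);
      }
    },
    { noAck: false }
  );
}
```

With amqplib, the channel would typically come from amqp.connect(url) followed by connection.createChannel(); the URL and queue name depend on your setup. Because the worker acknowledges only after the job finishes, no cron polling or "available" flag is needed.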
While it is possible to use the database as a queue, it is commonly known as an anti-pattern (next to using the database for logging), and as you are looking for:
So any way / package / system to accomplish this behavior?
I'll take advantage of the open-ended form of your question (and the bounty placed on it) to suggest Beanstalk.
Beanstalk is a simple, fast work queue.
Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.
It has client libraries in the languages you mention in your question (and many more), is easy to develop with and to run in production.
What you are doing is a very standard system design paradigm, usually implemented with Apache Kafka or any queue-based system (e.g. RabbitMQ). You can read up on Kafka or RabbitMQ, but without going into details:
There is a central Queue.
When user submits a job the job gets added to the Queue.
The video processor runs indefinitely subscribing to the queue.
You can go ahead and look up https://www.gentlydownthe.stream/ and you will recognize the similarities to what you are doing.
Here you don't need to poll yourself; you subscribe to an event, and everything else is managed by the queue.
Infra-Overview:
I have a setup where I read a set of messages from IBM MQ, process them in a Kubernetes (k8s) cluster environment, and send them to the destination host.
Issue:
I observed that sometimes the flow of messages is huge, and our pod fails and restarts before the messages are sent to the destination host. When that happens we lose all the in-flight messages, because we follow the read-and-delete approach from the IBM MQ example.
Expected Solution:
I am looking for a solution where, until these messages are sent to the destination host, we do not lose track of them.
What I tried:
IBM MQ has the concept of a unit of work, but we can't afford a delay in reading and processing: I can't wait for a single message to be processed before reading the next one, as that might be a major performance setback.
Code language:
NodeJs
As the comments suggest, there are a number of ways to skin this cat, but you will need to use transactions.
As soon as you create the connection with the transaction option, the transaction scope begins. It closes, and the next transaction begins, when you either commit or roll back.
So you should handle the messages in batches that make sense for your application, and commit when the batch is complete. If your application is killed by k8s, all uncommitted read messages are rolled back and return to the queue; a backout queue process then stops poison messages from looping.
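The batching idea can be sketched independently of the MQ client. In the sketch below, getNext, forward, commit and rollback are hypothetical async functions you would write yourself (for example, promise wrappers around a syncpoint get, your send to the destination host, mq.Cmit and mq.Back); processBatch and batchSize are assumed names as well.

```javascript
// Read up to batchSize messages under syncpoint, forward them, then commit.
// getNext, forward, commit and rollback are caller-supplied async functions
// (hypothetical names); getNext resolves to null when the queue is empty.
async function processBatch({ getNext, forward, commit, rollback }, batchSize) {
  let handled = 0;
  try {
    while (handled < batchSize) {
      const message = await getNext();
      if (message === null) break; // nothing left to read right now
      await forward(message);
      handled += 1;
    }
    // Committing removes the whole batch from the queue in one step.
    if (handled > 0) await commit();
    return handled;
  } catch (err) {
    // Everything read since the last commit goes back to the queue.
    await rollback();
    throw err;
  }
}
```

A larger batch size means fewer commits but more messages redelivered after a failure, so the right value depends on how expensive reprocessing is for your application.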
Section added to show sample code, and explanation of backout queues.
In your normal processing, if an app gets stopped before it has had time to process the message, you will want that message returned to the queue so that it is still available to be processed.
To enable this rollback you need to OR MQC.MQGMO_SYNCPOINT into the get message options:
gmo.Options |= MQC.MQGMO_SYNCPOINT
Then if all goes well, you can commit.
mq.Cmit(hConn, function(err) {
if (err) {
debug_warn('Error on commit', err);
} else {
debug_info('Commit was successful');
}
});
or rollback
mq.Back(hConn, function(err) {
if (err) {
debug_warn('Error on rollback', err);
} else {
debug_info('rollback was successful');
}
});
If you roll back, the message goes back to the queue, which means it is also the next message that your app will read. This can generate a poison message loop, so you should also set up a backout queue (with pass-all-context permissions for your app user) and a backout threshold.
Say you set the threshold to 5. The message can then be read and rolled back 5 times. Your app needs to compare the backout count with the threshold, decide that it is a poison message, and move it off the queue.
To check the backout threshold (and the backout queue name) you can use the following code
// Remember to or in the Inquire option on the Open
openOptions |= MQC.MQOO_INQUIRE;
...
attrs = [ new mq.MQAttr(MQC.MQIA_BACKOUT_THRESHOLD),
new mq.MQAttr(MQC.MQCA_BACKOUT_REQ_Q_NAME) ];
mq.Inq(hObj, attrs, (err, selectors) => {
if (err) {
debug_warn('Error retrieving backout threshold', err);
} else {
debug_info('Attributes have been found');
selectors.forEach((s) => {
switch (s.selector) {
case MQC.MQIA_BACKOUT_THRESHOLD:
debug_info('Threshold is ', s.value);
break;
case MQC.MQCA_BACKOUT_REQ_Q_NAME:
debug_info('Backout queue is ', s.value);
break;
}
});
}
});
When getting the message your app can use mqmd.BackoutCount to check how often the message has been rolled back.
if (mqmd.BackoutCount >= threshold) {
...
}
What I have noticed is that if the same application instance repeatedly calls rollback on the same message, an MQRC_HOBJ_ERROR error is thrown at the threshold. Your app can check for that error and then discard the message.
If it's a different app instance, it doesn't get the MQRC_HOBJ_ERROR error, so it can check the backout threshold and discard the message, remembering to commit the discard action.
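Put together, the poison message check might look like the following sketch. It is deliberately independent of the MQ client: work, moveToBackoutQueue, commit and rollback are hypothetical async functions you supply, message.backoutCount stands for the value read from mqmd.BackoutCount, and treating a threshold of 0 as "no limit" is an assumption of this sketch.

```javascript
// A message is considered poisoned once it has been rolled back
// `threshold` times. Assumption: a threshold of 0 means "no limit".
function isPoisonMessage(backoutCount, threshold) {
  return threshold > 0 && backoutCount >= threshold;
}

// Handle one message that was read under syncpoint.
// work, moveToBackoutQueue, commit and rollback are caller-supplied async
// functions (hypothetical names).
async function handleMessage(message, threshold, ops) {
  if (isPoisonMessage(message.backoutCount, threshold)) {
    await ops.moveToBackoutQueue(message);
    await ops.commit(); // commit the move so the message leaves the queue
    return 'backed-out';
  }
  try {
    await ops.work(message);
    await ops.commit();
    return 'processed';
  } catch (err) {
    await ops.rollback(); // the message returns to the queue for a retry
    return 'rolled-back';
  }
}
```

Checking the backout count before doing any work means a poisoned message is moved aside without being processed yet again.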
See https://github.com/ibm-messaging/mq-dev-patterns/tree/master/transactions/JMS/SE for more information.
As an alternative you could use KEDA (https://keda.sh), which works with k8s to monitor your queue depth and scale according to the number of messages waiting to be processed, as opposed to CPU / memory consumption. That way you can scale up when there are lots of messages waiting to be processed, and slowly scale down when the queue becomes manageable. Here is a link to getting started: https://github.com/ibm-messaging/mq-dev-patterns/tree/master/Go-K8s. The example is for a Go app, but it applies equally to Node.js.
I'm experimenting with Node and its child_process module.
My goal is to create a server which will run on a maximum of 3 processes (1 main and optionally 2 children).
I'm aware that code below may be incorrect, but it displays interesting results.
const app = require ("express")();
const {fork} = require("child_process")
const maxChildrenRuning = 2
let childrenRunning = 0
app.get("/isprime", (req, res) => {
if(childrenRunning+1 <= maxChildrenRuning) {
childrenRunning+=1;
console.log(childrenRunning)
const childProcess = fork('./isprime.js');
childProcess.send({"number": parseInt(req.query.number)})
childProcess.on("message", message => {
console.log(message)
res.send(message)
childrenRunning-=1;
})
}
})
function isPrime(number) {
...
}
app.listen(8000, ()=>console.log("Listening on 8000") )
I'm launching 3 requests with numbers around 5*10^9.
After 30 seconds I receive 2 responses with correct results.
The CPU stops doing hard work and goes idle.
Surprisingly, after another 1 minute 30 seconds, one thread starts to process the still-pending 3rd request and finishes after another 30 seconds with the correct answer. The console log is displayed below:
> node index.js
Listening on 8000
1
2
{ number: 5000000029, isPrime: true, time: 32471 }
{ number: 5000000039, isPrime: true, time: 32557 }
1
{ number: 5000000063, isPrime: true, time: 32251 }
Either Express listens and checks pending requests once in a while, or my browser resends the actual request every x seconds while it is pending. Can anybody explain what is happening here and why? How can I correctly achieve my goal?
The way your server code is written, if you receive a /isprime request and two child processes are already running, your request handler for /isprime does nothing. It never sends any response. You don't pass that first if test and then nothing happens afterwards. So, that request will just sit there with the client waiting for a response. Depending upon the client, it will probably eventually time out as a dead/inactive request and the client will shut it down.
Some clients (like browsers) may assume that something just got lost in the network and they may retry the request by sending it again. It would be my guess that this is what is happening in your case. The browser eventually times out and then resends the request. By the time it retries, there are fewer than two child processes running, so it gets processed on the retry.
You could verify that the browser is retrying automatically by going to the network tab in the Chrome debugger and watching exactly what the browser sends to your server and watch that third request, see it timeout and see if it is the browser retrying the request.
Note, this code seems to be only partially implemented because you initially start two child processes, but you don't reuse those child processes. Once they finish and you decrement childrenRunning, your code will then start another child process. Probably what you really want to do is to keep track of the two child processes you started and when one finishes, add it to an array of "available child processes" so when a new request comes in, you can just use an existing child process that is already started, but idle.
You also need to either queue incoming requests when all the child processes are full or you need to send some sort of error response to the http request. Never sending an http response to an incoming request is a poor design that just leads to great inefficiencies (connections hanging around much longer than needed that never actually accomplish anything).
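One way to queue incoming work instead of dropping it is a small concurrency limiter. The sketch below is an illustration only, with assumed names (createLimiter, limit): it runs at most a fixed number of tasks at once and holds the rest in a FIFO queue until a slot frees up.

```javascript
// Run at most `limit` tasks at once; extra tasks wait in a FIFO queue.
function createLimiter(limit) {
  let running = 0;
  const waiting = [];

  const next = () => {
    if (running >= limit || waiting.length === 0) return;
    running += 1;
    const { task, resolve, reject } = waiting.shift();
    // Promise.resolve().then(task) also catches synchronous throws in task.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        running -= 1;
        next(); // a slot is free; start the next waiting task, if any
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}
```

In the /isprime handler you would wrap the child process work in a function that returns a promise and pass it to the limiter, sending the response when it resolves and an error status when it rejects. That way the third request waits for a free slot instead of hanging until the browser retries.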
Strange situation.
I am trying to build a chat application.
I use PostgreSQL 9.3 and Tomcat as the web server.
Here is what happens when one browser sends a message to another:
1 - Browser A sends a message to the server (Tomcat)
2 - Tomcat puts the message into the database and gets its id
INSERT INTO messages VALUES ('first message') RETURNING id INTO MSGID
3 - Tomcat forwards the message to Browser B (the websocket recipient)
4 - Browser B sends a system answer: MSGID_READED
5 - Tomcat updates the message in the database
UPDATE messages SET readtime = now() WHERE id = MSGID
Everything works, but sometimes at step 5 the update can't find the message by MSGID...
This is very strange, because at step 2 I get the message record's id, but at step 5 the record is not found.
Could PostgreSQL be writing slowly, so that this record is not yet visible from a parallel db connection?
UPDATE
I found a solution that works for me: just put the insert inside a begin/exception/end block.
BEGIN
INSERT INTO messages (...)
VALUES (...)
RETURNING id INTO MSGID;
EXCEPTION
WHEN unique_violation THEN
-- nothing
END;
UPDATE 2
In detailed tests, the change above with the BEGIN block had no effect.
The solution was in JavaScript! I now send websocket messages from another thread, and the problem is solved!
// WebSocket send message function
// Part of code. so is a web socket
send = function(msg) {
if (msg != null && msg != '') {
var f = function() {
var mm = msg;
// JCC.log('SENT: [' + mm + ']');
so.send(mm);
};
setTimeout(f, 1);
}
};
Ok, so the problem is that normally writers do not block readers. This means that your insert happens, and the update fires before the insert commits. This creates a race condition in your application, which causes the problem you see.
Your best option here is either to switch to serializable snapshot isolation or to do what you have done and do exception handling on the insert. One way or another you end up with additional exception handling (with serializable isolation, a serialization failure exception may sometimes happen and you may have to wait and retry).
In your case, despite the performance penalty of exception handling in plpgsql, you are best off to do things the way you are currently doing them because that avoids the locking issues and waiting for the transaction to complete.
Is it possible to check/log how much data has been transferred during each run of PhantomJs/CasperJS?
Each instance of Phantom/Casper has an instance_id assigned to it (by the PHP function that spun up the instance). After the run has finished, the amount of data transferred and the instance_id need to be inserted into a MySQL database, possibly via the PHP function that spawned the instance. This way the bandwidth utilization of individual PhantomJS runs can be logged.
There can be many phantom/casper instances running, each lasting a minute or two.
The easiest and most accurate approach when trying to capture data is to get the collector and emitter as close as possible. In this case it would be ideal if phantomjs could capture that data that you need and send it back to your PHP function to associate it to the instance_id and do the database interaction. Turns out it can (at least partially).
Here is one approach:
var page = require('webpage').create();
var bytesReceived = 0;
page.onResourceReceived = function (res) {
if (res.bodySize) {
bytesReceived += res.bodySize;
}
};
page.open("http://www.google.com", function (status) {
console.log(bytesReceived);
phantom.exit();
});
This captures the size of all resources retrieved, adds them up, and spits out the result to standard output where your PHP code is able to work with it. This does not include the size of headers or any POST activity. Depending upon your application, this might be enough. If not, then hopefully this gives you a good jumping off point.