Child process setInterval sporadically not firing - javascript

My application places bets on certain sporting events, almost like an automatic betting bot. Part of this is tracking the current status of an event so the app can make an informed calculation on whether to place a bet. To do this, I poll the status of an event every minute using setInterval. A single sporting event is "watched" by one child process. There could be 100+ sporting events at any one time, meaning there could be 100+ child processes spawned and actively polling.
worker/index.js
const logger = require('../lib/logger/winston')
const utils = {
  updateEventState: require('../utils/update-event-state')
}
const db = require('../db')

let eventStatesInterval

module.exports = async function() {
  try {
    logger.info(`[STARTED FOR EVENT ${process.EVENT_ID} (Process: ${process.pid})]`)
    const event = await db.getEvent(process.EVENT_ID)
    logger.info(`[STARTING STATE POLLING FOR EVENT (${process.EVENT_ID} - Process: ${process.pid})]`)
    eventStatesInterval = setInterval(utils.updateEventState, 60000, event) // 1 min
    process.on('exit', code => {
      clearInterval(eventStatesInterval)
    })
  } catch (err) {
    throw err
  }
}
utils/update-event-state.js
const logger = require('../lib/logger/winston')
const db = require('../db')

module.exports = async function(event) {
  try {
    const update = {}
    if (!process.CHECKING_EVENT) {
      process.CHECKING_EVENT = true
      logger.info(`[POLLING STATE (${process.EVENT_ID} - Process: ${process.pid})]`)
      // Some async operations polling APIs to get the full status of an event
      const hasEnded = await api.getHasEventEnded(process.EVENT_ID)
      await db.updateEvent(process.EVENT_ID, update)
      if (hasEnded) {
        process.exit(0)
      }
      process.CHECKING_EVENT = false
    }
  } catch (err) {
    throw err
  }
}
It's also worth noting that a single child process could have more setIntervals running further down the line. For example, if I place a bet that is not fully matched, I poll to check when/if it gets matched; this also runs on a setInterval basis (about every 5 seconds). Checking the logs, some processes are polled correctly every minute, but a couple (an inconsistent number each time) are not being polled at all. Looking at the logs for a specific process, I get:
For reference, the current time was 22:33 when that screenshot was taken, so the interval had not fired in over an hour.
There were only 4 events (child processes) running at the time.
That is an example screenshot. A process can log several interval callbacks and then just... stop. No more logs at all. No errors or anything. It just stops.
This application runs on a DigitalOcean box with 4GB memory and 2 vCPUs. The app is dockerised. When running docker stats, 100% CPU is used constantly. When starting with docker run, I limit the memory usage but not the CPU. Could this be the issue? Could the setInterval callbacks fail to be invoked due to CPU constraints?
It's worth noting that this feature to poll the event state is new (5 days old) and I had never had an issue with setInterval beforehand (though I don't know how much CPU was being used then). I was initially polling the state every 30 seconds, which is when I noticed this problem; docker stats showed almost 200% CPU usage at that point. Lowering the polling to once a minute has improved that slightly. I have the process.CHECKING_EVENT global boolean there so that polls don't overlap and pile up tasks while a previous poll is still running.

Related

react-Countdown Timer (minor out of sync) problem after refreshing the screen

I have an auction site built on the MERN stack with socket.io, and I seem to have this unsolvable problem which I think is related to browsers and basic JS.
Flow:
Whenever a product is added by an admin, the socket broadcasts it with all of its details (including the time) to all clients.
The clients can bid on them and so on.
If a user refreshes the screen, it requests the latest product time from the socket and starts the countdown from that time.
Everything is fine, except that sometimes react-countdown lags 0.5 to 1 second behind whenever the page is refreshed (please note that the problem does not occur when opening the same auction in a new tab).
Note: I have also tried a self-made countdown timer using setInterval, but the problem does not go away.
I am seeking assistance with this problem and am willing to compensate someone for their time and effort to work with me directly to resolve it. Any help would be greatly appreciated.
Using setInterval and setTimeout means you are at the mercy of the browser. Browsers will often slow down execution of these timers if other work is happening and return to them once that's done, and if you switch away to another tab they will purposely reduce the tick rate. The level of accuracy you require is not easily achieved.
Firstly, I would suggest that getting the end time, finding the difference between then and now, and then counting down from that value in 1s decrements will aggravate this problem. If each "tick" is off by even a small amount, the error accumulates. This is probably what the library is doing by default. It may also have been what you were doing when you made your own timer, but I'd need to see it to confirm.
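To illustrate the drift, here is a minimal sketch of that decrement-per-tick approach (endTime and render are hypothetical stand-ins for your own values and UI code):
let remaining = Math.round((endTime - Date.now()) / 1000) // seconds left
const drifty = setInterval(() => {
  remaining -= 1 // assumes exactly 1s has passed; any lateness in each tick accumulates
  render(remaining)
  if (remaining <= 0) clearInterval(drifty)
}, 1000)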
Instead, you need to store the end time passed from the socket (or keep it in scope, as below) and on each "tick" work out the difference between then and now, every single time.
You could do this by using react-countdown in controlled mode and doing this logic in the parent.
I've made up a function here which would be the thing that receives the time from the socket, or is called from it. It's pseudo-code.
const timer = useRef(null)
const [timeLeft, setTimeLeft] = useState(null) // Remaining time in milliseconds

const handleSocketReceived = useCallback(({endTime}) => {
  timer.current = setInterval(() => {
    const newTimeLeft = endTime - Date.now() // Pseudo code, depends on what endTime is, but basically work out the diff
    setTimeLeft(newTimeLeft)
  }, 100) // A smaller period means more regular correction
}, [])

// ...

useEffect(() => {
  return () => clearInterval(timer.current)
}, [])

// ...

<Countdown date={timeLeft} controlled={true} />

Google Cloud Pub/Sub triggers high latency on low message throughput

I'm running a project which publishes messages to a Pub/Sub topic and triggers a background Cloud Function.
I read that with high volumes of messages it performs well, but for smaller amounts, like hundreds or even tens of messages per second, Pub/Sub may yield high latencies.
Code example to publish a single message:
const {PubSub} = require('@google-cloud/pubsub');
const pubSubClient = new PubSub();

async function publishMessage() {
  const topicName = 'my-topic';
  const dataBuffer = Buffer.from(data);
  const messageId = await pubSubClient.topic(topicName).publish(dataBuffer);
  console.log(`Message ${messageId} published.`);
}

publishMessage().catch(console.error);
Code example of the function triggered by Pub/Sub:
exports.subscribe = async (message) => {
  const name = message.data
    ? Buffer.from(message.data, 'base64').toString()
    : 'World';
  console.log(`Hello, ${name}!`);
}
Cloud Function Environment Details:
Node: 8
google-cloud/pubsub: 1.6.0
The problem is that when using Pub/Sub with a low throughput of messages (for example, 1 request per second), it sometimes struggles and shows incredibly high latency (up to 7-9 s or more).
Is there a way or a workaround to make Pub/Sub perform well every time (50 ms or less delay) even with a small number of incoming messages?
If you are always publishing to the same topic, you'd be better off keeping the object returned from pubSubClient.topic(topicName) and reusing it, whether you have a small or large number of messages. If you want to minimize latency, you'll also want to set the maxMilliseconds property of the batching settings. By default, this is 10ms. As the code stands now, every publish waits 10ms to send a message in the hope of filling the batch. Given that you create a new publisher via the topic call on every publish, you are guaranteed to always wait at least 10ms. You can set it when you call topic:
const publisher = pubSubClient.topic(topicName, {
  batching: {
    maxMessages: 100,
    maxMilliseconds: 1,
  },
});
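Putting both suggestions together, a minimal sketch of reusing a single cached publisher for every message might look like this (the topic name and payload are placeholders):
const {PubSub} = require('@google-cloud/pubsub');

const pubSubClient = new PubSub();
// Create the publisher once, with low-latency batching settings, and reuse it.
const publisher = pubSubClient.topic('my-topic', {
  batching: {
    maxMessages: 100,
    maxMilliseconds: 1,
  },
});

async function publishMessage(data) {
  // No topic() call per publish, so no extra setup or batching wait each time.
  const messageId = await publisher.publish(Buffer.from(data));
  console.log(`Message ${messageId} published.`);
}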
If, after reusing the object returned from pubSubClient.topic(topicName) and changing maxMilliseconds, you are still experiencing such a delay, then you should reach out to Google Cloud Support so they can look at the specific project and topic you are using, as that kind of latency is definitely not expected.

Error: 10 ABORTED: Too much contention on these documents. Please try again

What does this error mean?
In particular, what do they mean by "Please try again"?
Does it mean that the transaction failed and I have to re-run the transaction manually?
From what I understood from the documentation:
The transaction read a document that was modified outside of the transaction. In this case, the transaction automatically runs again. The transaction is retried a finite number of times.
If so, on which documents?
The error does not indicate which document it is talking about. I just get this stack trace:
{ Error: 10 ABORTED: Too much contention on these documents. Please try again.
    at Object.exports.createStatusError (node_modules\grpc\src\common.js:87:15)
    at ClientReadableStream._emitStatusIfDone (node_modules\grpc\src\client.js:235:26)
    at ClientReadableStream._receiveStatus (node_modules\grpc\src\client.js:213:8)
    at Object.onReceiveStatus (node_modules\grpc\src\client_interceptors.js:1256:15)
    at InterceptingListener._callNext (node_modules\grpc\src\client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (node_modules\grpc\src\client_interceptors.js:614:8)
    at C:\Users\Tolotra Samuel\PhpstormProjects\CryptOcean\node_modules\grpc\src\client_interceptors.js:1019:24
  code: 10,
  metadata: Metadata { _internal_repr: {} },
  details: 'Too much contention on these documents. Please try again.' }
To recreate this error, just run a for loop over the db.runTransaction method as indicated in the documentation.
We ran into the same problem with the Firebase Firestore database. Even small counters with fewer than 30 items to count were running into this issue.
Our solution was not to distribute the counter but to increase the number of retries for the transaction and to add a defer time between those retries.
The first step was to save the transaction action as a const which could be passed to another function.
const taskCountTransaction = async transaction => {
  const taskDoc = await transaction.get(taskRef)
  if (taskDoc.exists) {
    let increment = 0
    if (change.after.exists && !change.before.exists) {
      increment = 1
    } else if (!change.after.exists && change.before.exists) {
      increment = -1
    }
    let newCount = (taskDoc.data()['itemsCount'] || 0) + increment
    return await transaction.update(taskRef, { itemsCount: newCount > 0 ? newCount : 0 })
  }
  return null
}
The second step was to create two helper functions: one for waiting a specific amount of time, and the other to run the transaction and catch errors. If the abort error with code 10 occurs, we just run the transaction again for a specific number of retries.
const wait = ms => { return new Promise(resolve => setTimeout(resolve, ms)) }

const runTransaction = async (taskCountTransaction, retry = 0) => {
  try {
    await fs.runTransaction(taskCountTransaction)
    return null
  } catch (e) {
    console.warn(e)
    if (e.code === 10) {
      console.log(`Transaction abort error! Running it again after ${retry} retries.`)
      if (retry < 4) {
        await wait(1000)
        return runTransaction(taskCountTransaction, ++retry)
      }
    }
  }
}
Now that we have all we need, we can just call our helper function with await; our transaction call will run longer than a default one and will be deferred in time.
await runTransaction(taskCountTransaction)
What I like about this solution is that it doesn't require much more or more complicated code, and most of the already written code can stay as it is. It also uses more time and resources only if the counter gets to the point where it has to count more items. Otherwise the time and resources are the same as with default transactions.
For scaling up to large numbers of items we can increase either the number of retries or the waiting time. Both also affect the costs for Firebase. For the waiting part we also need to increase the timeout of our function.
DISCLAIMER: I have not stress tested this code with thousands of items or more. In our specific case the problems started with 20+ items and we needed up to 50 items for a task. I tested it with 200 items and the problem did not appear again.
The transaction does run several times if needed, but if the values read continue to be updated before the write or writes can occur it will eventually fail, thus the documentation noting the transaction is retried a finite number of times. If you have a value that is updating frequently like a counter, consider other solutions like distributed counters. If you'd like more specific suggestions, I recommend you include the code of your transaction in your question and some information about what you're trying to achieve.
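For reference, a distributed counter along those lines might look roughly like this with the Node.js Admin SDK (a sketch only; the shard count, collection names and initialization are assumptions):
const admin = require('firebase-admin')
admin.initializeApp()
const db = admin.firestore()

const NUM_SHARDS = 10

// Write to a random shard instead of one hot document.
async function incrementCounter(counterRef) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS)
  const shardRef = counterRef.collection('shards').doc(String(shardId))
  await shardRef.set({ count: admin.firestore.FieldValue.increment(1) }, { merge: true })
}

// Reading the total means summing all shards.
async function getCount(counterRef) {
  const snapshot = await counterRef.collection('shards').get()
  return snapshot.docs.reduce((sum, doc) => sum + (doc.data().count || 0), 0)
}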
Firestore re-runs the transaction only a finite number of times. As of writing, this number is hard-coded as 5, and cannot be changed. To avoid congestion/contention when many users are using the same document, normally we use the exponential back-off algorithm (but this will result in transactions taking longer to complete, which may be acceptable in some use cases).
However, as of writing, this has not been implemented in the Firebase SDK yet — transactions are retried right away. Fortunately, we can implement our own exponential back-off algorithm in a transaction:
const createTransactionCollisionAvoider = () => {
  let attempts = 0
  return {
    async avoidCollision() {
      attempts++
      await require('delay')(Math.pow(2, attempts) * 1000 * Math.random())
    }
  }
}
…which can be used like this:
// Each time we run a transaction, create a collision avoider.
const collisionAvoider = createTransactionCollisionAvoider()

db.runTransaction(async transaction => {
  // At the very beginning of the transaction run,
  // introduce a random delay. The delay increases each time
  // the transaction has to be re-run.
  await collisionAvoider.avoidCollision()

  // The rest goes as normal.
  const doc = await transaction.get(...)
  // ...
  transaction.set(...)
})
Note: The above example may cause your transaction to take up to 1.5 minutes to complete. This is fine for my use case. You might have to adjust the backoff algorithm for your use case.
I have implemented a simple back-off solution to share: maintain a global variable that assigns a different "retry slot" to each failed connection. For example, if 5 connections come in at the same time and 4 of them get a contention error, each would get a delay of 500ms, 1000ms, 1500ms and 2000ms before trying again, so they could potentially all resolve at the same time without any more contention.
My transaction runs in response to calling Firebase Functions. Each Functions compute instance can have a global variable nextRetrySlot that is preserved until it is shut down. So if error.code === 10 is caught for a contention issue, the delay time can be (nextRetrySlot + 1) * 500, and then, for example, nextRetrySlot = (nextRetrySlot + 1) % 10 so the next connections get a different time, round-robin, in the 500ms ~ 5000ms range.
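A rough sketch of that retry-slot idea (nextRetrySlot comes from the description above; db, the slot count and the attempt limit are assumptions):
let nextRetrySlot = 0
const SLOTS = 10
const SLOT_MS = 500

async function runWithSlotBackoff(transactionFn, attempts = 5) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await db.runTransaction(transactionFn)
    } catch (err) {
      if (err.code !== 10) throw err // only retry on contention
      const delay = (nextRetrySlot + 1) * SLOT_MS
      nextRetrySlot = (nextRetrySlot + 1) % SLOTS // round-robin over the slots
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
  throw new Error('Transaction still contended after all retries')
}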
Below are some benchmarks:
My situation is that I would like each new Firebase Auth registration to get a much shorter ID derived from the unique Firebase UID, so there is a risk of collision.
My solution is simply to check all registered short IDs, and if the query returns something, generate another one until it doesn't. Then we register this new short ID in the database. So the algorithm cannot rely only on the Firebase UID, but it is able to "move to the next one" in a deterministic way (not just random again).
This is my transaction: it first reads a database of all used short IDs and then writes a new one atomically, to prevent the extremely unlikely event that 2 new registrations come in at the same time with different Firebase UIDs that derive into the same short ID, and both see that the short ID is vacant at the same time.
I ran a test that intentionally registers 20 different Firebase UIDs which all derive into the same short ID (an extremely unlikely situation), all in a burst at the same time. First I tried using the same delay on each retry, so I expected them to clash with each other again and again while slowly resolving some connections.
Same 500ms delay on retry : 45000ms ~ 60000ms
Same 1000ms delay on retry : 30000ms ~ 49000ms
Same 1500ms delay on retry : 43000ms ~ 49000ms
Then with distributed delay time in slots :
500ms * 5 slots on retry : 20000ms ~ 31000ms
500ms * 10 slots on retry : 22000ms ~ 23000ms
500ms * 20 slots on retry : 19000ms ~ 20000ms
1000ms * 5 slots on retry : ~29000ms
1000ms * 10 slots on retry : ~25000ms
1000ms * 20 slots on retry : ~26000ms
This confirms that using different delay times definitely helps.
I found maxAttempts in the runTransaction code, which should modify the default of 5 attempts (but I haven't tested it yet).
Anyway, I think that a random wait (plus possibly a queue) is still the better option.
Firestore now supports server-side increment() and decrement() atomic operations.
You can increment or decrement by any amount. See their blog post for full details. In many cases, this will remove the need for a client side transaction.
Example:
document("fitness_teams/Team_1").
updateData(["step_counter" : FieldValue.increment(500)])
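The rough equivalent with the Node.js Admin SDK would be something like this (assuming an already-initialized admin app; the document path mirrors the example above):
const admin = require('firebase-admin')
const db = admin.firestore()

// Atomically add 500 to step_counter without a client-side transaction.
db.doc('fitness_teams/Team_1').update({
  step_counter: admin.firestore.FieldValue.increment(500)
})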
This is still limited to a sustained write limit of 1 QPS per document so if you need higher throughput, consider using distributed counters. This will increase your read cost (as you'll need to read all the shard documents and compute a total) but allow you to scale your throughput by increasing the number of shards. Now, if you do need to increment a counter as part of a transaction, it's much less likely to fail due to update contention.

How can I always terminate a NodeJs script with a timeout even if the event loop is occupied?

Is it possible to use setTimeout in NodeJS to terminate a process even if the event loop is being occupied by something else?
For example, say I have code that looks like the following
setTimeout(async () => {
  // Some code I run to gracefully exit my process.
}, timeout);

while (true) {
  let r = 1;
}
The callback in my timeout will never be hit since the while loop will occupy the event loop. Is there some way that I can say: "Execute the following code after N seconds regardless of everything else?"
I'm writing Selenium tests, but for some reason every once in a while a test will get "stuck" and never terminate. I basically want to always time out my tests after a certain amount of time so we don't end up with tests that run forever.
Thanks!
Since JavaScript is single threaded, what you want to do is create a worker using fork, which gives it the feeling of being multi-threaded. This actually just gives us two instances of Node, each with its own event loop. The fork will run your endless loop, which you can then kill with your timeout.
main.js
const cp = require('child_process')
const path = require('path')

// Create the child
const child = cp.fork(path.join(__dirname, './worker.js'), [])

// Kill after "x" milliseconds
setTimeout(() => {
  process.exit()
}, 5000);

// Listen for messages from the child
child.on('message', data => console.log(data))
Next you will set up your worker:
worker.js
let i = 0;

while (true) {
  // Send the value of "i" to the parent
  process.send(i++);
}
The child can communicate info about itself to the parent using process.send(data).
The parent can listen for messages from the child using child.on('message', ...).
Another thing we can do is kill the child instead of the main process, if you need the main process to keep doing other work. In that case you would call child.kill() inside the setTimeout instead.
const cp = require('child_process')
const path = require('path')

// Create the child
let child = cp.fork(path.join(__dirname, './worker.js'), [])

// Kill after "x" milliseconds
setTimeout(() => {
  child.kill()
}, 5000);
If there are no more events in the event loop, the main process will automatically close itself, so we don't need to call process.exit().

Semaphore equivalent in Node.js, variable getting modified in concurrent requests?

I have been facing this issue for the past week and I am just confused about it.
Keeping it short and simple to explain the problem.
We have an in-memory model which stores values like budget etc. When a call is made to the API it has a spend associated with it.
We then check the in-memory model, add the spend to the existing spend, and then compare it against the budget; if it exceeds the budget we do not accept any more clicks for that model. For each call we also update the db, but that is an async operation.
A short example
api.get('/clk/:spent/:id', function(req, res) {
  checkbudget(spent, id);
}

checkbudget(spent, id){
  var obj = in memory model[id]
  obj.spent += spent;
  obj.spent > obj.budget // if greater.
  obj.status = 11 // 11 is the stopped status
  update db and rebuild model.
}
This used to work fine, but now with concurrent requests we are getting false spends: the spend increases beyond the budget and it only stops after some time. We simulated the calls with JMeter and found this.
As far as we could tell, Node is async, so by the time the status is updated to 11, many callbacks have already added to the spend for the campaign.
How can I have semaphore-like logic in Node.js so that the spend variable stays in sync with the budget in the model?
update
db.addSpend(campaignId, spent, function(err, data) {
  campaign.spent += spent;
  var totalSpent = (+camp.spent) + (+camp.cpb);
  if (totalSpent > camp.budget) {
    logger.info('Stopping it..');
    camp.status = 11; // in-memory stop
    var History = [];
    History.push(some data);
    db.stopCamp(campId, function(err, data) {
      if (err) {
        logger.error('Error while stopping');
      }
      model.campMAP = buildCatMap(model);
      model.campKeyMap = buildKeyMap(model);
      db.campEventHistory(cpcHistory, false, function(err) {
        if (err) {
          logger.error(Error);
        }
      })
    });
  }
});
That's the gist of the code. Can anyone help, please?
Q: Is there semaphore or equivalent in NodeJs?
A: No.
Q: Then how do NodeJs users deal with race condition?
A: In theory you shouldn't have to, as there are no threads in JavaScript.
Before going deeper into my proposed solution, I think it is important for you to know how NodeJs works.
NodeJs is driven by an event-based architecture. This means that in the Node process there is an event queue that contains all the "to-do" events.
When an event is popped from the queue, Node executes all of the required code until it is finished. Any async calls made during that run are spawned as other events and are queued up in the event queue until a response comes back and it is time to run them again.
Q: So what can I do to ensure that only 1 request can perform updates to the database at a time?
A: I believe there are many ways you can achieve this, but one of the easier ways out is to use setTimeout.
Example:
api.get('/clk/:spent/:id', function(req, res) {
  var data = {
    id: id,
    spending: spent
  }
  canProceed(data, /*function to exec after canProceed=*/ checkbudget);
}

var canProceed = function(data, next) {
  var model = in memory model[data.id];
  if (model.is_updating) {
    // Still locked: schedule another check a second later.
    setTimeout(() => canProceed(data, next), /*try again in=*/ 1000 /*milliseconds*/);
  }
  else {
    // Lock is released. Proceed.
    next(data.spending, data.id)
  }
}

checkbudget(spent, id){
  var obj = in memory model[id]
  obj.is_updating = true; // Lock this model
  obj.spent += spent;
  obj.spent > obj.budget // if greater.
  obj.status = 11 // 11 is the stopped status
  update db and rebuild model.
  obj.is_updating = false; // Unlock the model
}
Note: What I have here is pseudo-code as well, so you may have to tweak it a bit.
The idea is to have a flag in your model that indicates whether an HTTP request can proceed to the critical code path; in this case your checkbudget function and beyond.
When a request comes in, it checks the is_updating flag to see if it can proceed. If the flag is true, it schedules an event to be fired a second later; this setTimeout callback basically becomes an event and gets placed into Node's event queue for later processing.
When that event fires, the check runs again. This repeats until the is_updating flag becomes false, at which point the request goes on to do its work, setting is_updating to true while the critical code runs and back to false when it is done.
It is not the most efficient way, but it gets the job done, and you can always revisit the solution when performance becomes a problem.
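If you would rather avoid polling with setTimeout, one alternative is to serialise the critical section per campaign id with a small promise chain. This is only a sketch, assuming checkbudget is (or can be made) async; the withLock helper is made up for illustration:
const locks = new Map()

function withLock(id, task) {
  const tail = locks.get(id) || Promise.resolve()
  // Queue the task behind whatever is already running for this id;
  // swallow earlier rejections so one failure doesn't block the queue.
  const run = tail.catch(() => {}).then(() => task())
  locks.set(id, run)
  return run
}

// Usage: requests for the same campaign run one at a time.
// api.get('/clk/:spent/:id', (req, res) => withLock(id, () => checkbudget(spent, id)))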
