I have some code for playing with worker threads in Node.js. When I set the loop condition in app.js to i < 100000000, my second thread does not start until the first thread has finished. How does thread synchronization work in Node.js, and how can I run two or more threads in parallel?
// Main script: spawn the workers
const { Worker } = require('worker_threads');
const path = require('path');

const WORKERS_NUMBER = 2;
for (let i = 1; i <= WORKERS_NUMBER; i++) {
  const w = new Worker(path.join(__dirname, './app.js'), { workerData: { id: i } });
  w.addListener("message", (message) => { console.log(message); });
}
// app.js: the worker script
const { workerData, parentPort } = require('worker_threads');

const id = workerData.id;
console.log(`Worker ${id} initialized.`);

let i = 0;
while (i < 10) {
  i++;
  // Defer the postMessage so the current value of i is captured
  process.nextTick((n) => {
    parentPort.postMessage(`${id}:${n}`);
  }, i);
}
You need to manually sync the threads' execution.
You can do it by listening on a certain message you'd send from workers to the main thread and using a counter to know when all the threads are done.
Otherwise, you can use the microjob lib, which wraps the worker thread execution inside a Promise, so the synchronisation becomes trivial.
Take a look at the examples inside the docs.
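For instance, here is a minimal sketch of the counter approach, assuming each worker simply exits once its work is done:
const { Worker } = require('worker_threads');

// Resolve once every worker has exited, using a simple counter
function runAll(workerFile, workersNumber) {
  return new Promise((resolve) => {
    let finished = 0;
    for (let i = 1; i <= workersNumber; i++) {
      const w = new Worker(workerFile, { workerData: { id: i } });
      w.on('message', (message) => console.log(message));
      w.on('exit', () => {
        finished += 1;
        if (finished === workersNumber) resolve(); // all workers done
      });
    }
  });
}

runAll('./app.js', 2).then(() => console.log('all workers finished'));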
Related
I am trying to create a simple server that hands every new request to a different worker. The DATA object is a simple JavaScript object in a separate file. The problem I am facing is the CONSISTENCY of this DATA object.
How do I prevent a worker from handling a request while the previous request is still being processed? For example, the first request is an UPDATE that takes longer, and the next request is a DELETE that completes faster. What Node tool or pattern do I need to use to be 100% sure that the DELETE happens after the UPDATE?
I also need to run every worker on a different port.
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

cluster.schedulingPolicy = cluster.SCHED_RR;
const PORT = 4000;

if (cluster.isMaster) {
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    if (req.url === '/users' && req.method === "PUT") {
      updateUser(req)
    } else if (req.url === '/users' && req.method === "DELETE") {
      deleteUser(req)
    }
  }).listen(PORT++);
}
Each worker must reserve ("lock") the DATA object for exclusive use before it can change it. This can be done by writing a lock file and deleting it again after successful object change.
const fs = require("fs");

try {
  // "wx+" fails with EEXIST if the lock file already exists
  const fd = fs.openSync("path/to/lock/file", "wx+");
  /* Change DATA object */
  fs.closeSync(fd);
  fs.rmSync("path/to/lock/file");
} catch (err) {
  if (err.code === "EEXIST") throw new Error("locking conflict");
  throw err; // propagate unexpected errors
}
The worker executing the first (UPDATE) request will succeed in writing the lock file, but a concurrent worker executing a second (DELETE) request will experience a locking conflict. It can then either report the failure to the user, or re-try after a short waiting time.
(If you decide to implement the lock in this way, the asynchronous fs methods may be more efficient.)
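For illustration, a minimal async sketch of the same lock, assuming the fs.promises API (fsp.rm requires Node 14.14+) and a simple retry loop that waits until the lock is free:
const fsp = require('fs').promises;

async function withLock(lockPath, changeData) {
  let handle;
  for (;;) {
    try {
      // "wx+" fails with EEXIST while another worker holds the lock
      handle = await fsp.open(lockPath, 'wx+');
      break;
    } catch (err) {
      if (err.code !== 'EEXIST') throw err;
      await new Promise((r) => setTimeout(r, 50)); // wait, then retry
    }
  }
  try {
    await changeData(); // change the DATA object
  } finally {
    await handle.close();
    await fsp.rm(lockPath); // release the lock even if changeData throws
  }
}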
Your code won't even create multiple servers, let alone on different ports; the PORT variable is a const, so PORT++ will throw rather than increment.
What node tool or pattern I need to use to be 100% percent sure that DELETE will happen after UPDATE?
Use some sort of lock; there is no built-in lock primitive in JavaScript yet.
Use a semaphore/mutex variable as the lock (see code).
Remember, JavaScript is a single-threaded language.
need to run every worker on a different port
For each worker, derive the listening port from the worker ID (see code). Remember that forking more workers than the number of CPU cores will not give you more parallelism.
Sample working code:
const express = require('express')
const cluster = require('cluster')
const os = require('os')

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork()
  }
} else {
  const app = express()

  // Semaphore/mutex variable. Note: cluster workers are separate
  // processes, so this flag is only shared within a single worker.
  var isUpdating = false;

  const worker = {
    handleRequest(req, res) {
      console.log("handleRequest on worker /" + cluster.worker.id);
      if (req.method == "GET") { // FOR BROWSER TESTING, CHANGE IT LATER TO PUT
        isUpdating = true;
        console.log("updateUser GET");
        // do updateUser(req);
        isUpdating = false;
        res.send("updated")
      } else if (req.method == "DELETE") {
        if (!isUpdating) { // check the update lock before deleting
          console.log("deleteUser DELETE");
          // do deleteUser(req)
          res.send("deleted")
        } else {
          res.status(409).send("update in progress") // locked, reject
        }
      }
    },
  }

  app.get('/users', (req, res) => {
    worker.handleRequest(req, res)
  })

  app.delete('/users', (req, res) => {
    worker.handleRequest(req, res)
  })

  // Each worker listens on its own port, derived from the worker ID
  app.listen(4000 + cluster.worker.id, () => {
    console.log(`Worker ${cluster.worker.id} started listening on port ${4000 + cluster.worker.id}`)
  })
}
I have been learning about the experimental worker threads module in Node.js. I've read the official documentation, as well as most available articles, which are still quite sparse.
I have created a simple example that spawns ten (10) Worker threads in order to generate 10,000 SHA256 digests and then digitally sign them.
Using ten (10) Workers takes around two (2) seconds to generate all 10,000. Without workers, it takes approximately fifteen (15) seconds.
In the official documentation, it states that creating a pool of Workers is recommended versus spawning Workers on demand.
I've tried to find articles on how I'd go about doing this, but I haven't had any luck thus far.
How would I create a pool of Worker threads? Would the worker.js file somehow be modified so that I could create the Workers in advance and then send messages to them that cause them to execute their code? Would the pool be specific to the use case, or is it possible to create a generic pool that could load a file and handle any use case?
Thank you.
MAIN
const { performance } = require('perf_hooks')
const { Worker } = require('worker_threads')

// Spawn a worker and resolve with its first message
const spawn = function spawnWorker(workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js', { workerData })
    worker.on('message', (message) => resolve(message))
    worker.on('error', reject)
    worker.on('exit', (code) => {
      if (code !== 0)
        reject(new Error(`Worker stopped with exit code ${code}`))
    })
  })
}

const generate = async function generateData() {
  const t0 = performance.now()
  // Split the 10,000 ids into chunks of 1,000, one chunk per worker
  const initArray = []
  for (let step = 1; step < 10000; step += 1000) {
    initArray.push({
      start: step,
      end: step + 999
    })
  }
  const workersArray = initArray
    .map(x => spawn(x))
  const result = await Promise.all(workersArray)
  let finalArray = []
  for (let x of result) {
    finalArray = finalArray.concat(x.data)
  }
  const t1 = performance.now()
  console.log(`Total time: ${t1 - t0} ms`)
  console.log('Length:', finalArray.length)
}

generate()
  .then(() => {
    console.log('EXITING!')
    process.exit(0)
  })
  .catch((err) => {
    console.error(err)
    process.exit(1)
  })
WORKERS
const { performance } = require('perf_hooks')
const { workerData, parentPort, threadId } = require('worker_threads')
const crypto = require('crypto')
const keys = require('./keys')

const hash = function createHash(data) {
  const result = crypto.createHash('sha256')
  result.update(data, 'utf8')
  return result.digest('hex')
}

const sign = function signData(key, data) {
  const result = crypto.createSign('RSA-SHA256')
  result.update(data)
  return result.sign(key, 'base64')
}

// Hash and sign every id in the range assigned to this worker
const t0 = performance.now()
const data = []
for (let i = workerData.start; i <= workerData.end; i++) {
  const digest = hash(i.toString())
  const signature = sign(keys.HTTPPrivateKey, digest)
  data.push({
    id: i,
    digest,
    signature,
  })
}
const t1 = performance.now()

parentPort.postMessage({
  workerData,
  data,
  time: t1 - t0,
  status: 'Done',
})
I would suggest using workerpool. It does all the pool management for you, and it supports both worker threads and clusters.
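For the example above, a minimal sketch using workerpool's pool/exec API (the signRange function name and the chunking are assumptions carried over from the question):
// main.js
const workerpool = require('workerpool');
const pool = workerpool.pool('./worker.js'); // pool of worker threads

const tasks = [];
for (let start = 1; start < 10000; start += 1000) {
  // each task runs signRange(start, end) on a free pooled worker
  tasks.push(pool.exec('signRange', [start, start + 999]));
}

Promise.all(tasks).then((results) => {
  console.log('chunks:', results.length);
  return pool.terminate();
});

// worker.js
const workerpool = require('workerpool');
const signRange = (start, end) => { /* hash and sign ids start..end */ };
workerpool.worker({ signRange }); // register the function with the pool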
When I use fs.readSync() in the renderer process, every element of the buffer is always 0.
Using fs.read() gives the correct result.
const electron = window.require('electron');
const { remote } = electron;
const fs = remote.require('fs');
const fd = fs.openSync(localPath, 'r');
const fileStat = fs.fstatSync(fd);
const { size: fileSize } = fileStat;
const dataBuffer = Buffer.alloc(fileSize);
const readSize = 1024;
for (let i = 0; i < fileSize; i += readSize) {
  fs.readSync(fd, dataBuffer, i, Math.min(fileSize - i, readSize), null);
  console.log(dataBuffer);
}
A buffer full of zeros probably means the synchronous operation failed through remote. You might capture the exception in the main process.
BTW: synchronous function calls are very slow in Node.js; that's why all these functions are named with the Sync suffix. I'd highly recommend writing async code everywhere it is possible.
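For reference, a minimal sketch of the same loop rewritten with the async fs.read (which the question reports works correctly), reusing fs, fd, fileSize, and readSize from the snippet above:
const chunk = Buffer.alloc(readSize);

function readNext(offset) {
  if (offset >= fileSize) return; // done
  fs.read(fd, chunk, 0, Math.min(fileSize - offset, readSize), offset,
    (err, bytesRead, buffer) => {
      if (err) return console.error(err); // errors surface here
      console.log(buffer.slice(0, bytesRead)); // the filled portion
      readNext(offset + bytesRead);
    });
}

readNext(0);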
I am using the child_process module in Node and accessing another file that does computations for me. The problem is that inside the on('message') event callback I am trying to access a variable of my class, and it says it is undefined. I would appreciate a well-explained solution.
_addBlock(newBlock)
{
  newBlock.previousHash = this._getLatestBlock().hash;
  var child = childProcess.fork(
    'C:\\Users\\Yoana\\WebstormProjects\\ChildProcess\\mining1.js'
  );
  child.on('message', function(newBlock)
  {
    // Receive results from child process.
    // Note: inside this regular function, 'this' is not the
    // Blockchain instance, which is why this.chain is undefined.
    console.log('received: ', newBlock);
    this.chain.push(newBlock);
  })
  // Send child process some work
  child.send(newBlock);
}
It says that this.chain.push is undefined. The method _addBlock is part of a class Blockchain and this.chain is globally accessible.
I'm not sure which model you are using, i.e. the Node.js master/worker architecture with the cluster native module, or the child_process native module with message passing, etc. By the way, sharing globals is not recommended (how do you handle the shared memory? how do you handle protected memory?), but you can do it this way:
global.GlobalBotFactory = function() {
  // Lazily create a single shared instance
  // (instance and options are assumed to be defined elsewhere)
  if (typeof instance == "undefined")
    instance = new MyClass(options);
  return instance;
}
and then you can reference it in other files like
this.instance = GlobalBotFactory(); // the shared factory instance
But this approach, although it works, could lead to several issues like
concurrent variable modification
shared memory corruption
reader/writer problem
etc., so I strongly suggest following a master/worker approach with the node cluster module and message passing:
/// node clustering
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) { // master node
  var masterConfig = require('./config/masterconfig.json');

  // Fork workers, never more than the number of CPU cores
  var maxCPUs = masterConfig.cluster.worker.num;
  maxCPUs = (maxCPUs >= numCPUs) ? numCPUs : maxCPUs;
  for (let i = 0; i < maxCPUs; i++) {
    const worker = cluster.fork();
  }

  var MasterNode = require('./lib/master');
  var master = new MasterNode(masterConfig);
  master.start()
    .then(done => {
      console.log(`Master ${process.pid} running on ${masterConfig.pubsub.node}`);
    })
    .catch(error => {
      console.error(`Master ${process.pid} error`, error);
    });
}
else if (cluster.isWorker) { // worker node
  var workerConfig = require('./config/workerconfig.json');
  var WorkerNode = require('./lib/worker');
  var worker = new WorkerNode(workerConfig);
  worker.start()
    .then(done => {
      console.log(`Worker ${process.pid} running on ${workerConfig.pubsub.node}`);
    })
    .catch(error => {
      console.error(`Worker ${process.pid} error`, error);
    });
}
For the message-passing part, take care: you will be dealing with asynchronously forked processes, and in Node.js there is no guarantee that a message will be delivered, so you need ack logic, or you can use a pub/sub approach (Redis offers this for free, please check here). Anyway, here you are:
for (let i = 0; i < 2; i++) {
  const worker = cluster.fork();

  // Receive messages from this worker and handle them in the master process
  worker.on('message', function(msg) {
    console.log('Master ' + process.pid + ' received message from worker ' + this.process.pid + '.', msg);
  });

  // Send a message from the master process to the worker
  worker.send({ msgFromMaster: 'This is from master ' + process.pid + ' to worker ' + worker.process.pid + '.' });
}
This will fork the workers and let the master and the workers exchange messages. But please keep in mind that the delivery logic is up to you. See here for more info about subprocess.send.
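A minimal sketch of that ack logic, reusing worker from the loop above (the { id, payload } and { type: 'ack' } message shapes are assumptions, not part of the cluster API):
const pending = new Map(); // message id -> retry timer
let nextId = 0;

// Master side: resend the payload until the worker acknowledges it
function sendWithAck(worker, payload, retryMs = 500) {
  const id = ++nextId;
  worker.send({ id, payload });
  pending.set(id, setInterval(() => worker.send({ id, payload }), retryMs));
}

// Master side: stop retrying once the ack for that id arrives
worker.on('message', (msg) => {
  if (msg.type === 'ack' && pending.has(msg.id)) {
    clearInterval(pending.get(msg.id));
    pending.delete(msg.id);
  }
});

// Worker side: acknowledge every message it receives
process.on('message', (msg) => process.send({ type: 'ack', id: msg.id }));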
I am looking for a way of getting the memory usage of any running process.
I am building a web application. I have a server (through Node.js), my file app.js, and an agent sending information to app.js through the server.
I would like to find a way to get the memory usage of any process (in order to then send this information to the agent).
Do you have any idea how I can do this? I have searched on Google but I haven't found an answer :/
Thank you
PS: I need a Windows-compatible solution :)
Windows
For Windows, use tasklist instead of ps.
In the example below, I use the Unix ps program, so it's not Windows-compatible.
Here, the %MEM value is the 4th element of each finalProcess entry.
On Windows, the memory usage is the 5th element.
var myFunction = function(processList) {
  // here, your code
};

var parseProcess = function(err, stdout, stderr) {
  var lines = stdout.split("\n"),
      finalProcess = [];
  // 1st line is the column header
  // on Windows (tasklist), skip more lines to clear the header
  for (var i = 1; i < lines.length; i++) {
    finalProcess.push(cleanArray(lines[i].split(" ")));
  }
  console.log(finalProcess);
  // callback to another function
  myFunction(finalProcess);
};

var getProcessList = function() {
  var exec = require('child_process').exec;
  exec('ps aux', parseProcess);
}

// thx http://stackoverflow.com/questions/281264/remove-empty-elements-from-an-array-in-javascript
function cleanArray(actual) {
  var newArray = [];
  for (var i = 0; i < actual.length; i++) {
    if (actual[i]) {
      newArray.push(actual[i]);
    }
  }
  return newArray;
}

getProcessList();
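For Windows, a minimal sketch along the same lines, assuming tasklist's CSV output (tasklist /fo csv /nh prints one quoted row per process with no header):
var getProcessListWindows = function() {
  var exec = require('child_process').exec;
  exec('tasklist /fo csv /nh', function(err, stdout) {
    if (err) return console.error(err);
    var finalProcess = stdout.trim().split("\r\n").map(function(line) {
      // strip the surrounding quotes and split the CSV row
      return line.replace(/^"|"$/g, "").split('","');
    });
    // the memory usage is the 5th element (index 4) of each row
    console.log(finalProcess);
    myFunction(finalProcess);
  });
};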