I need to create a service in Node.js that periodically executes a GET request to an API that returns all the Jobs/Tasks. The service then needs to create a CronJob for each task returned, while continuing to check for new tasks and creating new CronJobs for any that appear.
I made something similar with a service that runs a GET and then creates a new CronJob inside a forEach loop. But this doesn't take into account tasks that are created after the first initialization. How do I solve this? How do I make a service that is always looking for new tasks and dynamically creates jobs for them?
EDIT 1: the axios.post just posts a log to a database, nothing special.
const axios = require("axios");
const { CronJob } = require("cron");
const startCron = async () => {
const schedules = await axios
.get("http://127.0.0.1:4000/")
.then((res) => {
return res.data;
})
.catch((err) => console.log(err));
schedules.forEach((schedule) => {
return new CronJob(`${schedule.timing} * * * * *`, () => {
let d = new Date();
console.log(schedule.message + " in data: " + d);
axios.post(`http://127.0.0.1:4000/${schedule.id}`);
}).start();
});
};
startCron();
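One way to handle tasks that appear after startup (a minimal sketch, assuming each schedule returned by the API has a stable id field) is to poll the endpoint on an interval, keep the jobs you have already created in a Map keyed by that id, and only create a CronJob for ids you haven't seen yet:

const axios = require("axios");
const { CronJob } = require("cron");

const jobs = new Map(); // schedule.id -> CronJob

const syncJobs = async () => {
  try {
    const { data: schedules } = await axios.get("http://127.0.0.1:4000/");
    schedules.forEach((schedule) => {
      if (jobs.has(schedule.id)) return; // already scheduled, skip it
      const job = new CronJob(`${schedule.timing} * * * * *`, () => {
        console.log(schedule.message + " in data: " + new Date());
        axios.post(`http://127.0.0.1:4000/${schedule.id}`);
      });
      job.start();
      jobs.set(schedule.id, job);
    });
  } catch (err) {
    console.log(err);
  }
};

// initial load, then keep checking for new tasks every minute
syncJobs();
setInterval(syncJobs, 60 * 1000);

If tasks can also be removed or rescheduled, you would additionally compare the ids in the response against the Map and call stop() on jobs that are no longer returned.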
I have an application where I get the data from an API. Everything is working fine, but I want to know how much time a request takes. I used axios interceptors and I get the time in milliseconds, BUT the problem is that the request-duration value is only available after the request has responded, which is not useful; I want to know the time before or at the moment of calling the web service. What makes it hard for me is that the method I call is located in another file:
Request.js
export const getRequest = async (url, baseURL, headers) => {
const HTTP = axios.create({
baseURL,
headers,
});
HTTP.interceptors.request.use((config) => {
config.headers["request-startTime"] = new Date().getTime();
return config;
});
HTTP.interceptors.response.use((response) => {
const currentTime = new Date().getTime();
const startTime = response.config.headers["request-startTime"];
response.headers["request-duration"] = currentTime - startTime;
return response;
});
return HTTP.get(url);
};
Users.vue
async getUsers() {
try {
let url = `/users`;
let baseUrl = `baseURL`;
let headers = {};
const responseUsers = await getRequest(url,baseUrl,headers);
console.log(responseUsers.headers["request-duration"]); // shows how many milliseconds here
if (responseUsers.status === 200) {
const { data } = responseUsers;
this.users = data;
}
} catch (error) {
console.error(error);
}
}
You can get it simply by taking a timestamp before and after the call:
const before = Date.now();
const responseUsers = await getRequest(url,baseUrl,headers);
const after = Date.now();
const duration = after-before;
Another option is the User Timing API, but it is overkill for timing a single request.
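If you do want to try it, here is a minimal sketch of the User Timing API approach (standard browser API; the mark and measure names are arbitrary):

performance.mark('users-request-start');
const responseUsers = await getRequest(url, baseUrl, headers);
performance.mark('users-request-end');
performance.measure('users-request', 'users-request-start', 'users-request-end');
const [measure] = performance.getEntriesByName('users-request');
console.log(`getUsers took ${measure.duration} ms`);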
I'm developing an AWS Lambda in TypeScript that uses Axios to get data from an API; that data will be filtered and then written to a DynamoDB table.
The code looks as follows:
export {};
const axios = require("axios");
const AWS = require('aws-sdk');
exports.handler = async (event: any) => {
const shuttleDB = new AWS.DynamoDB.DocumentClient();
const startDate = "2021-08-16";
const endDate = "2021-08-16";
const startTime = "16:00:00";
const endTime = "17:00:00";
const response = await axios.post('URL', {
data:{
"von": startDate+"T"+startTime,
"bis": endDate+"T"+endTime
}}, {
headers: {
'x-rs-api-key': KEY
}
}
);
const params = response.data.data;
const putPromise = params.map(async(elem: object) => {
delete elem.feat1;
delete elem.feat2;
delete elem.feat3;
delete elem.feat4;
delete elem.feat5;
const paramsDynamoDB = {
TableName: String(process.env.TABLE_NAME),
Item: elem
}
shuttleDB.put(paramsDynamoDB).promise();
});
await Promise.all(putPromise);
};
This all works kind of fine. When the test button is pushed the first time, everything seems fine and works, e.g. I received all the console.logs during development, but the data is not put into the DB.
On the second try the output is the same, but this time the data is successfully written to the DB.
Any ideas regarding this issue? How can I solve this problem and have the data written to the DB on the first try?
Thanks in advance!
You need to return the promise from the DB call:
return shuttleDB.put(paramsDynamoDB).promise();
Also, Promise.all rejects as soon as any call fails (unlike Promise.allSettled), so it may be worth logging any errors that are happening too.
Better still, take a look at transactWrite (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html#transactWrite-property) to ensure that everything is written or nothing is.
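A minimal sketch combining the first two suggestions (returning each put promise and logging failures with Promise.allSettled, which needs Node.js 12.9+); it mirrors the field names from the question:

const putPromises = params.map((elem: any) => {
  delete elem.feat1;
  delete elem.feat2;
  delete elem.feat3;
  delete elem.feat4;
  delete elem.feat5;
  // return the promise so the handler actually waits for the write
  return shuttleDB.put({
    TableName: String(process.env.TABLE_NAME),
    Item: elem,
  }).promise();
});

const results = await Promise.allSettled(putPromises);
results
  .filter((r) => r.status === 'rejected')
  .forEach((r: any) => console.error('DynamoDB put failed:', r.reason));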
I have a function that inserts into the database with the POST method. To debug it, I test it with Postman by sending an empty POST request, so the controller is executed.
That function calls two more functions, and one of them is the one that inserts into the DB. OK, now I want to execute this function automatically with node-cron.
My functions
export class GettingInfo {
ReadingFileFromServer = () => {
const file = path.resolve(__dirname, '../../../dist/entity/PRUEBA.txt')
try {
const data = fs.readFileSync(file, 'utf-8');
const lines = data.split("\n")
let values = []
let bi = []
lines.forEach(line => {
line.trim()
values = line.split("|", 6).map(a => a.trim());
bi.push(values)
console.log(bi)
})
const convert = this.TransformingFiletoJson(bi)
console.log(convert)
const save = this.SavingReferences(convert)
console.log(save)
} catch (err) {
console.error(err, 'something has happened to the file');
}
}
}
For the moment, to test it, I call it in a controller.ts:
@Post('data')
createData(){
const tasks = new GettingInfo(this.referenceService)
tasks.ReadingFileFromServer()
return "created! 201 test.."
}
}
But now that I want it to run on its own, I created a file "execute.ts" with the following code, and it does not run by itself:
import cron = require("node-cron")
import {GettingInfo} from "./reference.task";
cron.schedule("5 * * * * *", ()=> {
const echale = new GettingInfo(this.referenceService)
echale.ReadingFileFromServer()
console.log("Executing...")
})
From what I can see in the node-cron documentation, you need to start the task in order to start the scheduled cron executions.
Change your code to:
const task = cron.schedule("5 * * * * *", ()=> {
const echale = new GettingInfo(this.referenceService)
echale.ReadingFileFromServer()
console.log("Executing...")
})
task.start()
And it should work.
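Note that node-cron also takes an options object; a task created with scheduled: false stays idle until start() is called, which makes the start explicit (a minimal sketch, leaving out the GettingInfo wiring):

const task = cron.schedule("5 * * * * *", () => {
  console.log("Executing...")
}, { scheduled: false })

// nothing runs until this point
task.start()

Also make sure the compiled execute file is actually executed (or imported from your app's entry point); the schedule only exists while that process is running.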
I have been learning about the experimental worker threads module in Node.js. I've read the official documentation, as well as most available articles, which are still quite sparse.
I have created a simple example that spawns ten (10) Worker threads in order to generate 10,000 SHA256 digests and then digitally sign them.
Using ten (10) Workers takes around two (2) seconds to generate all 10,000. Without workers, it takes approximately fifteen (15) seconds.
In the official documentation, it states that creating a pool of Workers is recommended versus spawning Workers on demand.
I've tried to find articles on how I'd go about doing this, but I haven't had any luck thus far.
How would I create a pool of Worker threads? Would the worker.js file somehow be modified so that I could create the Workers in advance and then send a message to the workers, which would cause them to execute their code? Would the pool be specific to the use case or is it possible to create a generic pool that could load a file or something and handle any use case?
Thank you.
MAIN
const { performance } = require('perf_hooks')
const { Worker } = require('worker_threads')
// Spawn worker
const spawn = function spawnWorker(workerData) {
return new Promise((resolve, reject) => {
const worker = new Worker('./worker.js', { workerData })
worker.on('message', (message) => resolve(message))
worker.on('error', reject)
worker.on('exit', (code) => {
if (code !== 0)
reject(new Error(`Worker stopped with exit code ${code}`))
})
})
}
const generate = async function generateData() {
const t0 = performance.now()
const initArray = []
for (let step = 1; step < 10000; step += 1000) {
initArray.push({
start: step,
end: step + 999
})
}
const workersArray = initArray
.map(x => spawn(x))
const result = await Promise.all(workersArray)
let finalArray = []
for (let x of result) {
finalArray = finalArray.concat(x.data)
}
const t1 = performance.now()
console.log(`Total time: ${t1 - t0} ms`)
console.log('Length:', finalArray.length)
}
generate()
.then(x => {
console.log('EXITING!')
process.exit(0)
})
WORKERS
const { performance } = require('perf_hooks')
const { workerData, parentPort, threadId} = require('worker_threads')
const crypto = require('crypto')
const keys = require('./keys')
const hash = function createHash(data) {
const result = crypto.createHash('sha256')
result.update(data, 'utf8')
return result.digest('hex')
}
const sign = function signData(key, data) {
const result = crypto.createSign('RSA-SHA256')
result.update(data)
return result.sign(key, 'base64')
}
const t0 = performance.now()
const data = []
for (let i = workerData.start; i <= workerData.end; i++) {
const digest = hash(i.toString())
const signature = sign(keys.HTTPPrivateKey, digest)
data.push({
id: i,
digest,
signature,
})
}
const t1 = performance.now()
parentPort.postMessage({
workerData,
data,
time: t1 - t0,
status: 'Done',
})
I would suggest using workerpool. It basically does all the pool management for you and it supports both worker threads and clusters.
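A minimal sketch of what that could look like here (function and file names are illustrative; the hashing/signing loop from the original worker would move into the pooled function):

// main.js
const workerpool = require('workerpool')

const pool = workerpool.pool('./worker.js')

const chunks = []
for (let step = 1; step < 10000; step += 1000) {
  chunks.push({ start: step, end: step + 999 })
}

Promise.all(chunks.map((chunk) => pool.exec('generate', [chunk])))
  .then((results) => {
    console.log('Chunks processed:', results.length)
    return pool.terminate()
  })
  .catch(console.error)

// worker.js
const workerpool = require('workerpool')

function generate({ start, end }) {
  const data = []
  for (let i = start; i <= end; i++) {
    data.push(i) // hash + sign each value here, as in the original worker
  }
  return data
}

// register the function so the pool can call it by name
workerpool.worker({ generate })

The pool spins the workers up once and reuses them for every exec call, which is the behaviour the documentation recommends over spawning a Worker per task.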
I have multiple scrapers like below:
await scraper1.run
await scraper2.run
// etc
To increase performance and response time, I used WebSockets, and I pass the socket connection down to each scraper and emit an event for each single item of the result.
const express = require('express')
const app = express()
const http = require('http').Server(app)
const cors = require('cors')
const puppeteer = require('puppeteer')
const io = require('socket.io')(http)
const mongoose = require('mongoose')
const _ = require('lodash')
const scraper1 = require('./scraper1')
const scraper2 = require('./scraper2')
mongoose.connect("mongodb://localhost:27017/test")
;(async function () {
try {
const browser = await puppeteer.launch({
headless: false
})
io.on('connection', async function (socket) {
socket.on('search', async function (query) {
// check whether document exists with user ip address then return
// otherwise run the scrapres
await scraper1.run(browser, socket, query)
await scraper2.run(browser, socket, query)
})
})
} catch (e) {
console.log(e)
}
})()
http.listen(3000)
Context: when the user refreshes and multiple socket connections are made, the scrapers run multiple times and the data gets duplicated. I prevented the duplication using MongoDB, but the performance issue remains because the scrapers run until their results are ready and only then do I check against the database.
Question: how do I lock or prevent the scrapers from running multiple times, and also wait for each scraper to be done over the WebSocket?
I can propose the following solution (I didn't test it, but I think you can figure out what I am trying to achieve; the explanation is below the example):
'use strict';
const queryState = {
};
const getQueryKey = (query) => {
// base64 but can be a hash like sha256
const key = Buffer.from(query).toString('base64');
return key;
};
/**
* Return query state
* @param {String} query
* @return {String} state [PENDING, DONE, null] null if query doesn't exist
*/
const getQueryState = (query) => {
const key = getQueryKey(query);
const state = queryState[key] || null;
return state;
};
/**
* Add a query and initialize it as pending
* @param {String} query
* @return {String} state
*/
const addQuery = (query) => {
const key = getQueryKey(query);
const state = 'PENDING';
queryState[key] = state;
return state;
};
/**
* Hashmap to associate pending queries to be notified to socket connections
* when query is done
* This structure keeps an array of callbacks per query key
*/
const observers = {
};
const addObserver = (query, callback) => {
const key = getQueryKey(query);
if (typeof observers[key] === 'undefined') {
observers[key] = [callback];
} else {
observers[key] = [...observers[key], callback];
}
};
const notifyObservers = (query) => {
const key = getQueryKey(query);
const callbacks = observers[key] || [];
// TODO: get query data scrapper from a cache / database / etc
const data = getDataFromQuery(query);
callbacks.forEach((callback) => {
callback(data);
});
};
/**
* Update query status to done
* Precondition: query must exist in queryState (previously added using addQuery)
* @param {String} query
* @return {String} state
*/
const endQuery = (query) => {
const key = getQueryKey(query);
const state = 'DONE';
queryState[key] = state;
return state;
};
io.on('connection', async function (socket) {
socket.on('search', async function (query) {
/**
* If query doesn't exist, scrap it
*/
const state = getQueryState(query);
if (state === null) {
addQuery(query);
await scraper1.run(browser, socket, query);
await scraper2.run(browser, socket, query);
endQuery(query);
// store scrapper data in cache / database / etc and
// socket send scraperData to the user
// notify pending queries to send data scrapper
notifyObservers(query);
} else if (state === 'PENDING') {
// add callback to return data to the user
addObserver(query, (scraperData) => {
// socket send scraperData to the user
});
} else {
// socket send scraperData to the user
}
});
});
To keep things simple, the example is basic and not the best in terms of performance/architecture. This solution covers:
Scenario 1 (someone asking for query1 first time)
A request (socket connection) came to the backend asking for a query
This query doesn't exist yet, so mark it as PENDING and start the scrapers
Scenario 2 (another one asking for query1)
A second connection appears asking for the same query as in scenario 1
The query is in state PENDING, so we add a callback to be called when the query finishes
Scenario 3 (query 1 finish)
The scrapers started in scenario 1 finish, so query1 is marked as DONE
Each request (observer) waiting for query1 is notified
This solution can be implemented in multiple ways, but my point is to expose the idea to you so you can adapt it however you want.
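For completeness, getDataFromQuery used above is only a placeholder; a minimal in-memory stand-in (hypothetical, a real implementation would read from your cache or MongoDB) could look like this:

const queryData = {};

// called by the scraping flow once the results for a query are ready
const storeDataForQuery = (query, data) => {
  queryData[getQueryKey(query)] = data;
};

const getDataFromQuery = (query) => {
  return queryData[getQueryKey(query)] || null;
};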
Hope this helps