I have a stream that is composed of a chain of pipes.
I am using the event-stream package to create the building blocks of the pipes.
The code gets a file from S3, unzips it, parses it, and sends the data to some async function.
I am trying to get the promise resolved when it has finished handling that file.
How can I be sure that the whole chain has finished draining?
My current solution looks like this.
It looks bad, and I still think there is a possibility that resolve() will be called while there are still data chunks in the gzReader, for example.
Thanks.
const inputStream = this.s3client.getObject(params).createReadStream()

inputStream.on('end', () => {
  console.log("Finished handling file " + fileKey)
  let stopInterval = setInterval(() => {
    if (counter == 0) {
      resolve(this.eventsSent)
      clearInterval(stopInterval)
    }
  }, 300)
})

const gzReader = zlib.createGunzip();

inputStream
  .pipe(gzReader)
  .pipe(es.split())
  .pipe(es.parse())
  .pipe(es.mapSync(data => {
    counter++
    this.eventsSent.add(data.data)
    asyncFunc(this.destinationStream, data.data)
      .then(() => {
        counter--
      })
      .catch((e) => {
        counter--
        console.error('Failed sending event ' + data.data + e)
      })
  }))
Because you never initialize counter, it is zero, and after the first 300 ms your function resolves (which can happen before your pipes are running and have increased the counter).
So don't use setInterval ;) You don't need it.
Also, there is no need to use mapSync if you are already calling an async function in it. Just use map and pass the data and a callback (https://github.com/dominictarr/event-stream#map-asyncfunction). Don't forget to call the callback in your async function!
Add a last step in your pipe: wait(callback) (https://github.com/dominictarr/event-stream#wait-callback).
There you can resolve.
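Putting those pieces together, a minimal sketch of the reworked pipe might look like this (untested, and assuming asyncFunc returns a promise exactly as in the question):

const inputStream = this.s3client.getObject(params).createReadStream()
const gzReader = zlib.createGunzip()

inputStream
  .pipe(gzReader)
  .pipe(es.split())
  .pipe(es.parse())
  .pipe(es.map((data, cb) => {
    this.eventsSent.add(data.data)
    asyncFunc(this.destinationStream, data.data)
      .then(() => cb(null, data)) // tell event-stream this chunk is done
      .catch((e) => {
        console.error('Failed sending event ' + data.data + e)
        cb(null, data) // still invoke the callback so the stream keeps draining
      })
  }))
  .pipe(es.wait(() => resolve(this.eventsSent))) // fires once everything upstream has drained

Because map only emits a chunk after its callback runs, wait cannot fire until every chunk, including the async work, has completed.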
I am trying to write code in Angular 11 for a scenario like this:
I have a list of files, and for every file I hit an API (say api1). I take a fileId from the response and pass it to another API (say api2). I want to keep hitting api2 every 3 seconds until I get status="available" in the response. Once I get the available status, I no longer need to hit api2 for that fileId, and we can start processing the next file in the loop.
This whole process repeats for every file that I have.
I understand we can achieve this using RxJS operators like mergeMap or switchMap (the sequence does not matter to me right now), but I am very new to RxJS and not sure how to put it together.
This is what I am doing right now:
this.filesToUpload.forEach((fileItem) => {
  if (!fileItem.uploaded) {
    if (fileItem.file.size < this.maxSize) {
      self.fileService.translateFile(fileItem.file).then( // hit api1
        (response) => {
          if (response && get(response, 'status') == 'processing') {
            // do some processing here
            this.getDocumentStatus(response.fileId);
          }
        },
        (error) => {
          // show error
        }
      );
    }
  }
});

getDocumentStatus(fileId: string) {
  this.docStatusSubscription = interval(3000) // hitting api2 every 3 seconds
    .pipe(takeWhile(() => !this.statusProcessing))
    .subscribe(() => {
      this.statusProcessing = false;
      this.fileService.getDocumentStatus(fileId).then((response) => {
        if (response.results.status == "available") {
          this.statusProcessing = true;
          // action complete for this fileId
        }
      }, (error) => {
      });
    })
}
Here's how I might do this, given the description of what you're after:

1. Create a list of observables of all the calls you want to make.
2. Concatenate the list together.
3. Subscribe.

The thing that makes this work is that we only subscribe once (not once per file); we let the operators handle subscribing and unsubscribing for everything else.
Since observables are lazy, nothing happens until we subscribe, and that way concat can do the heavy lifting for us. There's no need to track anything ourselves with variables like this.statusProcessing or anything like that. That's all handled for us! It's less error prone that way.
// Create callList. This is an array of observables that each hit the APIs and only
// complete when status == "available".
const callList = this.filesToUpload
  .filter(fileItem => !fileItem.uploaded && fileItem.file.size < this.maxSize)
  .map(fileItem => this.createCall(fileItem));

// Concatenate the array of observables by running each one after the previous one
// completes.
concat(...callList).subscribe({
  complete: () => console.log("All files have completed"),
  error: err => console.log("Aborted call list due to error,", err)
});

createCall(fileItem: FileItemType): Observable<never> {
  // Use defer to turn a promise into an observable
  return defer(() => this.fileService.translateFile(fileItem.file)).pipe(
    // If processing, then wait until available; otherwise just complete
    switchMap(translateFileResponse => {
      if (translateFileResponse && get(translateFileResponse, 'status') == 'processing') {
        // do some processing here
        return this.delayByDocumentStatus(translateFileResponse.fileId);
      } else {
        return EMPTY;
      }
    }),
    // Catch and then rethrow the error. Right now this doesn't do anything, but if
    // you handle the error here, you won't abort the entire call list below on
    // an error. Depends on the behaviour you're after.
    catchError(error => {
      // show error
      return throwError(() => error);
    })
  );
}

delayByDocumentStatus(fileId: string): Observable<never> {
  // Hit getDocumentStatus every 3 seconds, unless it takes more
  // than 3 seconds for the API to return a response; then wait 6 or 9 (etc.)
  // seconds.
  return interval(3000).pipe(
    exhaustMap(_ => this.fileService.getDocumentStatus(fileId)),
    takeWhile(res => res.results.status != "available"),
    ignoreElements(),
    tap({
      complete: () => console.log("action complete for this fileId: ", fileId)
    })
  );
}
In my code below I get an empty array from my console.log(response), but the console.log(filterdIds) inside the getIds function shows my desired data. I think my resolve is not right.
Note that I run the do..while only once for testing. The API is paged: if the records are from yesterday it keeps going; if not, the do..while stops.
Can somebody point me in the right direction?
const axios = require("axios");

function getToken() {
  // Get the token
}

function getIds(jwt) {
  return new Promise((resolve) => {
    let pageNumber = 1;
    const filterdIds = [];
    const config = {
      // Config stuff
    };

    do {
      axios(config)
        .then((response) => {
          response.forEach(element => {
            // Some logic, if true then:
            filterdIds.push(element.id);
            console.log(filterdIds);
          });
        })
        .catch(error => {
          console.log(error);
        });
    } while (pageNumber != 1)

    resolve(filterdIds);
  });
}

getToken()
  .then(token => {
    return token;
  })
  .then(jwt => {
    return getIds(jwt);
  })
  .then(response => {
    console.log(response);
  })
  .catch(error => {
    console.log(error);
  });
I'm also not sure where to put the reject inside the getIds function because of the do..while.
The fundamental problem is that resolve(filterdIds); runs synchronously before the requests fire, so it's guaranteed to be empty.
Promise.all or Promise.allSettled can help if you know how many pages you want up front (or if you're using a chunk size to make multiple requests--more on that later). These methods run in parallel. Here's a runnable proof-of-concept example:
const pages = 10; // some page value you're using to run your loop

axios
  .get("https://httpbin.org") // some initial request like getToken
  .then(response => // response has the token, ignored for simplicity
    Promise.all(
      Array(pages).fill().map((_, i) => // make an array of request promises
        axios.get(`https://jsonplaceholder.typicode.com/comments?postId=${i + 1}`)
      )
    )
  )
  .then(responses => {
    // perform your filter/reduce on the response data
    const results = responses.flatMap(response =>
      response.data
        .filter(e => e.id % 2 === 0) // some silly filter
        .map(({id, name}) => ({id, name}))
    );

    // use the results
    console.log(results);
  })
  .catch(err => console.error(err));
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
The network tab shows the requests happening in parallel.
If the number of pages is unknown and you intend to fire requests one at a time until your API informs you of the end of the pages, a sequential loop is slow but can be used. Async/await is cleaner for this strategy:
(async () => {
  // like getToken; should handle err
  const tokenStub = await axios.get("https://httpbin.org");
  const results = [];

  // page += 10 to make the snippet run faster; you'd probably use page++
  for (let page = 1;; page += 10) {
    try {
      const url = `https://jsonplaceholder.typicode.com/comments?postId=${page}`;
      const response = await axios.get(url);

      // check whatever condition your API sends to tell you no more pages
      if (response.data.length === 0) {
        break;
      }

      for (const comment of response.data) {
        if (comment.id % 2 === 0) { // some silly filter
          const {name, id} = comment;
          results.push({name, id});
        }
      }
    }
    catch (err) { // hit the end of the pages or some other error
      break;
    }
  }

  // use the results
  console.log(results);
})();
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
Here's the sequential request waterfall.
A task queue or chunked loop can be used if you want to increase parallelization. A chunked loop would combine the two techniques to request n records at a time and check each result in the chunk for the termination condition. Here's a simple example that strips out the filtering operation, which is sort of incidental to the asynchronous request issue and can be done synchronously after the responses arrive:
(async () => {
  const results = [];
  const chunk = 5;

  for (let page = 1;; page += chunk) {
    try {
      const responses = await Promise.all(
        Array(chunk).fill().map((_, i) =>
          axios.get(`https://jsonplaceholder.typicode.com/comments?postId=${page + i}`)
        )
      );

      for (const response of responses) {
        for (const comment of response.data) {
          const {name, id} = comment;
          results.push({name, id});
        }
      }

      // check end condition
      if (responses.some(e => e.data.length === 0)) {
        break;
      }
    }
    catch (err) {
      break;
    }
  }

  // use the results
  console.log(results);
})();
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
(The above image is an excerpt of the 100 requests, but the chunk size of 5 at a time is visible.)
Note that these snippets are proofs of concept; production code would want to be less indiscriminate about catching errors, ensure all throws are caught, and so on. When breaking the code into sub-functions, make sure to .then and await all promises in the caller; don't try to turn it into synchronous code.
See also
How do I return the response from an asynchronous call? and Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference which explain why the array is empty.
What is the explicit promise construction antipattern and how do I avoid it?, which warns against adding a new Promise to help resolve code that already returns promises.
To take a step back and think about why you ran into this issue, we have to think about how synchronous and asynchronous JavaScript code work together. Your synchronous getIds function runs to completion, stepping through each line until it gets to the end.
The axios invocation returns a Promise, which is an object that represents some future fulfillment or rejection value. That Promise isn't going to resolve until the next cycle of the event loop (at the earliest), and your code tells it to do some stuff when that pending value comes back (the callback passed to the .then() method).
But your main getIds function isn't going to wait around: it invokes axios, gives the returned Promise something to do in the future, and keeps going, moving past the do/while loop and on to the resolve call that settles the Promise you created at the beginning of the function. The axios Promise hasn't resolved by that point, and therefore filterdIds hasn't been populated.
When you moved the resolve call into the callback that the axios Promise invokes on fulfillment, it started working, because now your Promise waits for axios to resolve before resolving itself.
Hopefully that sheds some light on what you can do to get your multi-page goal to work.
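For a single request, a minimal sketch of that fix looks like the following (it also shows one sensible place for the reject you asked about). Note that simply returning the axios chain would avoid constructing a new Promise at all, as the antipattern link above explains:

function getIds(jwt) {
  return new Promise((resolve, reject) => {
    axios(config)
      .then((response) => {
        const filterdIds = [];
        response.forEach(element => {
          // some logic, if true then:
          filterdIds.push(element.id);
        });
        resolve(filterdIds); // resolve only after the data has actually arrived
      })
      .catch(reject); // reject when the request itself fails
  });
}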
I couldn't help thinking there was a cleaner way to let you fetch multiple pages at once, and then recursively keep fetching if the last page indicated there were additional pages to fetch. You may still need to add some logic to filter out any batch-fetched pages that don't meet whatever criteria you're looking for, but this should get you most of the way:
async function getIds(startingPage, pages) {
  const pagePromises = Array(pages).fill(null).map((_, index) => {
    const page = startingPage + index;
    // set the page however you do it with axios query params
    config.page = page;
    return axios(config);
  });

  // get the last page you attempted, and if it doesn't meet whatever
  // criteria you have to finish the query, submit another batch query
  const lastPage = await pagePromises[pagePromises.length - 1];

  // the result from getIds is an array of ids, so we recursively get the rest of the pages here
  // and end up with a single-level array of ids (or an empty array if there were no more pages to fetch)
  const additionalIds = lastPage.done ? [] : await getIds(startingPage + pages, pages);

  // now we wait for all page queries to resolve and extract the ids
  const resolvedPages = await Promise.all(pagePromises);
  const resolvedIds = [].concat(...resolvedPages).map(elem => elem.id);

  // and finally merge the ids fetched in this method's invocation with any fetched recursively
  return [...resolvedIds, ...additionalIds];
}
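For illustration, a hypothetical invocation that starts at page 1 and fetches five pages per batch (both numbers are made up, not from the question):

getIds(1, 5)
  .then(ids => console.log('fetched ' + ids.length + ' ids:', ids))
  .catch(err => console.error(err));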
I am working on a project where I am building a simple front end in Angular (TypeScript) / Node that makes calls to a back-end server to execute different tasks. These tasks take time to execute and thus need to be queued on the back-end server. I solved this by following this tutorial: https://github.com/realpython/flask-by-example and everything seems to work just fine.
Now I am finishing things up on the front end, where most of the code has already been written in TypeScript using Angular and RxJS. I am trying to replicate the following code in TypeScript:
https://github.com/dimoreira/word-frequency/blob/master/static/main.js
This code consists of two functions. The first function, "getModelSummary" (getResults in the example), calls a post method via:
public getModelSummary(modelSummaryParameters: ModelSummaryParameters): Observable<ModelSummary> {
  return this.http.post(`${SERVER_URL}start`, modelSummaryParameters)
    .map(res => res.json());
}
to put the job in the queue and assign a jobID to it on the back-end server. The second function, "listenModelSummary", should ideally run right after the first one, with the jobId as its input, and poll at a short interval to check whether the job has completed:
public listenModelSummary(jobID: string) {
  return this.http.get(`${SERVER_URL}results/` + jobID).map(
    (res) => res.json()
  );
}
Once the job is done, it needs to return the results, which update the front end.
I am new to TypeScript, Observables, and RxJS and wanted to ask about the right way of doing this. I do not want to fall back to plain JavaScript; I want to stick to TypeScript as much as possible in my front-end code. How can I use the first function to call the second function with its output, jobID, and have the second function run on an interval until the output comes back?
Observables are great, and are the type of object returned by Angular's HttpClient class, but sometimes, in my opinion, dealing with them is a lot more complicated than using promises.
Yes, there is a slight performance hit for the extra operation to convert the Observable to a Promise, but you get a simpler programming model.
If you need to wait for the first function to complete, and then hand the returned value to another function, you can do:
async getModelSummary(modelSummaryParameters: ModelSummaryParameters): Promise<ModelSummary> {
  return this.http.post(`${SERVER_URL}start`, modelSummaryParameters).toPromise();
}

async doStuff(): Promise<void> {
  const modelSummary = await this.getModelSummary(params);
  // not sure if you need to assign this to your viewmodel,
  // what's returned, etc.
  this.listenModelSummary(modelSummary);
}
If you're dead-set on using Observables, I would suggest using the concatMap pattern, which would go something like this:
doStuff(modelSummaryParameters: ModelSummaryParameters): Observable<ModelSummary> {
  return this.http
    .post(`${SERVER_URL}start`, modelSummaryParameters)
    .pipe(
      concatMap(modelSummary => <Observable<ModelSummary>>this.listenModelSummary(modelSummary))
    );
}
Here's an article on different mapping solutions for Observables: https://blog.angularindepth.com/practical-rxjs-in-the-wild-requests-with-concatmap-vs-mergemap-vs-forkjoin-11e5b2efe293 that might help you out.
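The question also asks to keep polling the results endpoint until the job finishes. A hedged sketch of that, layered on the concatMap pattern above; the 3-second interval and the 'done' status value are assumptions, not part of the original API (imports assumed: interval from 'rxjs'; concatMap, filter, take from 'rxjs/operators'):

pollModelSummary(modelSummaryParameters: ModelSummaryParameters): Observable<ModelSummary> {
  return this.http
    .post(`${SERVER_URL}start`, modelSummaryParameters)
    .pipe(
      concatMap(jobId =>
        // re-query every 3 seconds until the back end reports the job is done
        interval(3000).pipe(
          concatMap(() => this.listenModelSummary(jobId)),
          filter(summary => summary.status === 'done'), // hypothetical status field
          take(1) // complete once the first finished result arrives
        )
      )
    );
}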
You can try the following:
getModelSummary(modelSummaryParameters: ModelSummaryParameters): Promise<ModelSummary> {
  return this.http.post(`${SERVER_URL}start`, modelSummaryParameters).toPromise();
}

async someMethodInYourComponent() {
  const modelSummary = await this.getModelSummary(params);
  this.listenModelSummary(modelSummary);
}

// OR

someMethodInYourComponent() {
  this.getModelSummary(params).then((modelSummary) => {
    this.listenModelSummary(modelSummary);
  });
}
After doing more reading and research into RxJS, I was able to make my code work. I wanted to thank you all for the feedback and to post my code below.
In my services I created two observables.
The first one fetches a jobId returned by the queue server:
// API: GET / FUNCTION /:jobID
public getModelSummaryQueueId(modelSummaryParameters: ModelSummaryParameters): Observable<JobId> {
  return this.http.post(`${SERVER_URL}start`, modelSummaryParameters).map(
    (jobId) => jobId.json()
  )
}
The second uses the jobId from the first call to fetch the data:
// API: GET / FUNCTION /:results
public listenModelSummary(jobId: JobId): Observable<ModelSummary> {
  return this.http.get(`${SERVER_URL}results/` + jobId).map(
    (res) => res.json()
  )
}
Below is the component that works with the two services above:

this.subscription = this.developmentService.getModelSummaryQueueId(this.modelSummaryParameters)
  .subscribe((jobId) => {
    return this.developmentService.listenModelSummary(jobId)
      // use switchMap to pull the value from the observable and check if it completes
      .switchMap((modelSummary) =>
        // if the value has not changed, invoke the observable again, else return
        modelSummary.toString() === 'Nay!'
          ? Observable.throw(console.log('...Processing Request...'))
          // ? Observable.throw(this.modelSummary = modelSummary)
          : Observable.of(modelSummary)
      )
      .retryWhen((attempts) => {
        return Observable
          // specify the number of attempts
          .range(1, 20)
          .zip(attempts, function (i) {
            return i;
          })
          .flatMap((res: any) => {
            // res is a counter of how many attempts
            console.log("number of attempts: ", res);
            res = 'heartbeat - ' + res
            this.getProgressBar(res);
            // this.res = res;
            // delay request
            return Observable.of(res).delay(100)
          })
      })
      // .subscribe(this.displayData);
      // .subscribe(modelSummary => console.log(modelSummary));
      .subscribe((modelSummary) => {
        console.log("FINAL RESULT: ", modelSummary)
        this.modelSummary = modelSummary;
        this.getProgressBar('Done');
      });
  });
I have a Node.js script that subscribes to a notification service and runs a bunch of things when a push notification is received. However, the service sometimes sends multiple notifications for the same event, so to avoid duplicate work I made a basic semaphore to block the other tasks.
The problem is that Node still continues execution even though I can see the file created on disk. I've tried a few different solutions, but I think the problem comes from my lack of experience with the JS execution model; there's something I don't know about how it works that prevents my solution from working. How do I fix this?
const fse = require('fs-extra');

// notification handler callback
function handleRequest(data) {
  try {
    var semaphore = fse.readJsonSync(__dirname + '/' + objectId);
    console.log('task already running, stopping');
    return;
  } catch (err) {
    // semaphore doesn't exist, ok to proceed
    console.log('starting new task');
    fse.writeJson(__dirname + '/' + objectId, { objectId: objectId })
      .then(stepOne).catch(rejectPromise)
      .then(resp => stepTwo(resp, data)).catch(rejectPromise)
      .then(resp => stepThree(resp, extra)).catch(rejectPromise)
      .then(resp => stepFour(resp, argument)).catch(rejectPromise)
      .then(sleep(20000))
      .then(resp => releaseLock(objectId))
      .catch(resp => rejectionHandler(resp));
  }
}

function releaseLock(objectId) {
  return fse.remove(__dirname + '/' + objectId);
}
Other things I've tried:
- Creating the file in a separate function that returns a promise; same outcome.
- Using the Sync method to write the file, but then I'm unable to chain promises.
- Waiting synchronously after file creation; no effect.
There is no need to create an external file to maintain locks; you can do something like this, which will also give you a performance boost (fewer I/O operations).
const fse = require('fs-extra');

// notification handler callback
class NamedLocks {
  constructor() {
    this._pid = {};
  }

  acquire(pid) {
    if (this._pid[pid]) {
      // process is locked
      // handle it
      return Promise.reject();
    }
    this._pid[pid] = true;
    return Promise.resolve();
  }

  release(pid) {
    delete this._pid[pid];
  }
}

const userLocks = new NamedLocks();

function handleRequest(data) {
  userLocks.acquire(objectId)
    .then(() => {
      // semaphore doesn't exist, ok to proceed
      console.log('starting new task');
      fse.writeJson(__dirname + '/' + objectId, { objectId: objectId })
        .then(stepOne).catch(rejectPromise)
        .then(resp => stepTwo(resp, data)).catch(rejectPromise)
        .then(resp => stepThree(resp, extra)).catch(rejectPromise)
        .then(resp => stepFour(resp, argument)).catch(rejectPromise)
        .then(sleep(20000))
        .then(resp => userLocks.release(objectId))
        .catch(resp => rejectionHandler(resp))
    }).catch(() => {
      // handle lock exists condition here
    });
};
In this, you basically ask for a lock; if the lock already exists, handle that in the catch handler, otherwise do your thing and release the lock.
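One caveat worth flagging: if one of the steps rejects and rejectionHandler doesn't call release, the objectId stays locked forever. A small hypothetical wrapper that always releases (assuming Promise.prototype.finally, available since Node 10):

function withLock(locks, pid, task) {
  return locks.acquire(pid)
    .then(() => task().finally(() => locks.release(pid)));
}

// usage sketch: the lock is released whether the task resolves or rejects
withLock(userLocks, objectId, () => stepOne().then(resp => stepTwo(resp, data)))
  .catch(() => console.log('task already running, or task failed'));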
I'm fairly new to RxJS. I'm calling the function below; the whole stream is read and the per-line console statements are printed, but I never see "Subscribe done" and I don't know why. What will it take to get this stream to finish? Is something obviously wrong?
const readline$ = RxNode.fromReadLineStream(rl)
  .filter((element, index, observable) => {
    if (index >= range.start && index < range.stop) {
      console.log(`kept line is ${JSON.stringify(element)}`);
      return true;
    } else {
      console.log(`not keeping line ${JSON.stringify(element)}`);
      return false;
    }
  })
  .concatMap(line => Rx.Observable.fromPromise(myFunction(line)))
  .do(response => console.log(JSON.stringify(response)));

readline$.subscribe(
  i => { console.log(`Subscribe object: ${util.inspect(i)}`); },
  err => { console.error(`Subscribe error: ${util.inspect(err)}`); },
  () => {
    console.log("Subscribe done."); // NEVER CALLED
    anotherFunc(); // NEVER CALLED
  }
);
You can see from the source code that it sends the complete notification only when the source stream emits the close event: https://github.com/Reactive-Extensions/rx-node/blob/master/index.js#L100-L102
So if you need the complete handler to be called, you'll need to close the stream yourself; see How to close a readable stream (before end)?.
In other words, the Observable doesn't complete automatically after reading the entire file.
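For example, since only lines up to range.stop matter here, one hedged option (assuming rl is the Node readline interface from the question) is to close it once the index passes that point; closing emits the close event, which lets the observable complete:

const readline$ = RxNode.fromReadLineStream(rl)
  .filter((element, index) => {
    if (index >= range.stop) {
      rl.close(); // emits 'close', so fromReadLineStream sends complete
      return false;
    }
    return index >= range.start;
  });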