Strapi Async API Request in a Loop - javascript

I'm using a headless CMS (Strapi), which forces pagination on its GET endpoints, with up to 100 entries per page for a content type. I have ~400 entries of a Location content type that I want to show on the same page. I'm trying to read each page on load and store each Location in an array, but I'm having some issues fetching this data asynchronously. Instead of getting each page, I get the first page on every iteration of the loop. I'm new to async-await requests, so I'm not sure how to ensure that I am reading each page's data. Thanks in advance!
const getLocationsPage = async (page) => {
  return await useFetch(`${this.config.API_BASE_URL}/api/locations?sort=slug&pagination[page]=${page}&pagination[pageSize]=50`)
}

for (let i = 1; i < this.pageCount; i++) {
  const pageResults = await getLocationsPage(i)
  console.log(pageResults)
}
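One likely explanation, assuming this is Nuxt 3's useFetch composable: useFetch de-duplicates requests by a key that is auto-generated from the call site, so calling it in a loop from the same line keeps returning the cached result of the first call. A minimal sketch of a workaround, using $fetch (which performs a plain request every time) and assuming Strapi v4's response shape (data plus meta.pagination):

// Hypothetical sketch: fetch every page with $fetch and collect the entries.
const getAllLocations = async () => {
  const locations = []
  let page = 1
  let pageCount = 1
  do {
    // $fetch does not cache by key, so each page is actually requested
    const response = await $fetch(`${this.config.API_BASE_URL}/api/locations?sort=slug&pagination[page]=${page}&pagination[pageSize]=50`)
    locations.push(...response.data)
    // Strapi v4 reports the total page count in meta.pagination
    pageCount = response.meta.pagination.pageCount
    page++
  } while (page <= pageCount)
  return locations
}

Alternatively, passing an explicit, page-specific key to useFetch should also avoid the shared cache entry.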

Related

how to maintain the order of http requests using node js?

I have a bunch of data that I want to send to a server through HTTP. However, on the server side I need to process the data in the same order as it was sent (e.g. if the order of sending is elem1, elem2 and elem3, I would like to process elem1 first, then elem2 and then elem3). Since HTTP gives no guarantee that the order will be maintained, I need some way to maintain it.
Currently I am keeping the data in a queue and I send one element and await for the response. Once the response reaches me I send the next element.
while (!queue.isEmpty()) {
  let data = queue.dequeue();
  await sendDataToServer(data);
}
I am not very sure if this will actually work in a production environment and what will be the impact on the performance.
Any sort of help is much appreciated. Thank you
Sorry, I don't have enough reputation to comment, thus I am posting this as an answer.
Firstly, your code will work as intended.
However, since the server has to receive them in order, the performance won't be good. If you can change the server, I suggest you implement it like this:
1. Add an ID to each data item.
2. Send all the data items, no need to ensure order.
3. Create a buffer on the server; the buffer will be able to contain all the data items.
4. The server receives the items and puts them into the buffer in the right position.
Example code:
Client (see Promise.all)
let i = 0;
let promises = [];
await sendDataLengthToServer(queue.length());
while (!queue.isEmpty()) {
  let data = queue.dequeue();
  data.id = i;
  i++; // increment the ID so each item gets a unique position in the server buffer
  // no need to wait for a request to finish
  promises.push(sendDataToServer(data));
}
await Promise.all(promises);
Server (pseudo-code)
length = receiveDataLengthFromClient()
buffer = new Array(length)
int received = 0
onDataReceivedFromClient(data, {
  received = received + 1
  buffer[data.id] = data
  if (received == length) {
    // the buffer contains the data in the right order
  }
})
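For illustration only, a Node/Express version of that pseudo-code could look roughly like this; the route names and payload shape (length, id) are assumptions, not part of the original answer:

// Hypothetical Express sketch of the server-side buffer.
const express = require('express');
const app = express();
app.use(express.json());

let buffer = [];
let expected = 0;
let received = 0;

// the client announces how many items it will send
app.post('/data-length', (req, res) => {
  expected = req.body.length;
  buffer = new Array(expected);
  received = 0;
  res.sendStatus(200);
});

// items may arrive in any order; the id decides their position
app.post('/data', (req, res) => {
  buffer[req.body.id] = req.body;
  received += 1;
  if (received === expected) {
    // buffer now holds every item in the original order; process it here
  }
  res.sendStatus(200);
});

app.listen(3000);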

How to get only one user instead of a list?

I'm trying to go from a list of users to a single user and display their profile on another page.
I want to do so with routerLink, passing the id of this specific user to the next page.
The routing is working and I'm directed to the profile page, but when I log the results of the HTTP request I still get back the whole list of users, like on the users page, instead of the details of one user.
I have tried many things like changing the path of the URL in my user.service.ts, but that didn't solve the problem. I even got 404 errors when using the path ${this.url}/users/${id}/ instead of ${this.url}/users/?i=${id}/ (which is the one that works).
The API docs say that in order to retrieve one single user it's http://1234//users/{id}/, where id is an integer. But when I apply that scheme I get the 404 error.
That's why I have to use the ?i= version, but then the problem is that I only get the full list of users on the next page.
MY CODE:
user.service.ts
// get a user's profile
getUserDetails(id): Observable<any> {
  return this.http.get(`${this.url}/users/?i=${id}/`); // why add ?i
}
user.page.ts
// get all users in the users page
getAllUsers() {
  this.userList = this.userService.getList()
    .pipe(map(response => response.results));
}
user.page.html
<ion-avatar class="user-image" slot="start" [routerLink]="['/','profile', 'user.id']">
  <ion-img src="assets/22.jpeg"></ion-img>
</ion-avatar>
profile.page.ts
information = null;
...
ngOnInit() {
  // Get the ID that was passed with the URL
  let id = this.activatedRoute.snapshot.paramMap.get('id');
  // Get the information from the API
  this.userService.getUserDetails(id).subscribe(result => {
    this.information = result;
    console.log(result);
  });
}
It seems like the URL is wrong. If it were me, I would console.log the URL and compare it to the docs. Here's a snippet to try a few variations:
const id = 1;
const options = [
  `${this.url}/users/?i=${id}/`,
  `${this.url}/users/?i=${id}`,
  `${this.url}/users/i/${id}/`,
  `${this.url}/users/i/${id}`,
  `${this.url}/user/?i=${id}/`,
  `${this.url}/user/?i=${id}`,
  `${this.url}/user/i/${id}/`,
  `${this.url}/user/i/${id}`,
];
for (const option of options) {
  try {
    // convert the Observable returned by HttpClient into a promise so it can be awaited
    const response = await this.http.get(option).toPromise();
    console.log(option, response);
  } catch (e) {
    // ignore the variations that fail; only the working ones get logged
  }
}
I would also consider dropping the second http request. If the first request returns all the required data you could just store it in a variable on the service.
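A rough sketch of that idea; the class layout, base URL and caching property here are illustrative and not taken from the question's actual service:

// user.service.ts (hypothetical caching variant)
import { Injectable } from '@angular/core';
import { HttpClient } from '@angular/common/http';
import { Observable } from 'rxjs';
import { map, tap } from 'rxjs/operators';

@Injectable({ providedIn: 'root' })
export class UserService {
  private url = 'http://1234'; // assumed base URL, as in the question
  private cachedUsers: any[] = [];

  constructor(private http: HttpClient) {}

  getList(): Observable<any[]> {
    return this.http.get(`${this.url}/users/`).pipe(
      map((response: any) => response.results),
      // keep a copy so the profile page can reuse it without a second request
      tap(users => this.cachedUsers = users)
    );
  }

  getUserDetails(id): any {
    // look the user up in the cached list instead of calling the API again
    return this.cachedUsers.find(user => String(user.id) === String(id));
  }
}

Note that the cache is empty after a full page refresh, so a fallback HTTP call would still be needed in that case.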

Sending thousands of fetch requests crashes the browser. Out of memory

I was tasked with transferring a large portion of data using javascript and an API from one database to another. Yes I understand that there are better ways of accomplishing this task, but I was asked to try this method.
I wrote some JavaScript that makes a GET call to an API that returns an array of data, which I then turn around and send to another API as individual POST requests.
What I have written so far seems to work fairly well, and I have been able to send over 50k individual POST requests without any errors. But I am having trouble when the number of POST requests increases past around 100k. I end up running out of memory and the browser crashes.
From what I understand so far about promises, there may be an issue where promises (or something else?) are still kept in heap memory after they are resolved, which results in running out of memory after too many requests.
I've tried 3 different methods to get all the records to POST successfully after searching for the past couple of days. This has included using Bluebird's Promise.map, as well as breaking the array into chunks first before sending them as POST requests. Each method seems to work until it has processed about 100k records, then it crashes.
async function amGetRequest(controllerName) {
  try {
    const amURL = "http://localhost:8081/api/" + controllerName;
    const amResponse = await fetch(amURL, {
      "method": "GET",
    });
    return await amResponse.json();
  } catch (err) {
    closeModal()
    console.error(err)
  }
};
async function brmPostRequest(controllerName, body) {
  const brmURL = urlBuilderBRM(controllerName);
  const headers = headerBuilderBRM();
  try {
    await fetch(brmURL, {
      "method": "POST",
      "headers": headers,
      "body": JSON.stringify(body)
    });
  }
  catch (error) {
    closeModal()
    console.error(error);
  };
};
//V1.0 Send one by one and resolve all promises at the end.
const amResult = await amGetRequest(controllerName); //(returns an array of ~245,000 records)
let promiseArray = [];
for (let i = 0; i < amResult.length; i++) {
  promiseArray.push(await brmPostRequest(controllerName, amResult[i]));
};
const postResults = await Promise.all(promiseArray);
//V2.0 Use bluebirds Promise.map with concurrency set to 100
const amResult = await amGetRequest(controllerName); //(returns an array of ~245,000 records)
const postResults = Promise.map(amResult, async data => {
  await brmPostRequest(controllerName, data);
  return Promise.resolve();
}, {concurrency: 100});
//V3.0 Chunk array into max 1000 records and resolve 1000 promises before looping to the next 1000 records
const amResult = await amGetRequest(controllerName); //(returns an array of ~245,000 records)
const numPasses = Math.ceil(amResult.length / 1000);
for (let i = 0; i <= numPasses; i++) {
  let subset = amResult.splice(0, 1000);
  let promises = subset.map(async (record) => {
    await brmPostRequest(controllerName, record);
  });
  await Promise.all(promises);
  subset.length = 0; //clear out temp array before looping again
};
Is there something that I am missing about getting these promises cleared out of memory after they have been resolved?
Or perhaps a better method of accomplishing this task?
Edit: Disclaimer - I'm still fairly new to JS and still learning.
"Well-l-l-l ... you're gonna need to put a throttle on this thing!"
Without (pardon me ...) attempting to dive too deeply into your code, "no matter how many records you need to transfer, you need to control the number of requests that the browser attempts to do at any one time."
What's probably happening right now is that you're stacking up hundreds or thousands of "promised" requests in local memory – but, how many requests can the browser actually transmit at once? That should govern the number of requests that the browser actually attempts to do. As each reply is returned, your software then decides whether to start another request and if so for which record.
Conceptually, you have so-many "worker bees," according to the number of actual network requests your browser can simultaneously do. Your software never attempts to launch more simultaneous requests than that: it simply launches one new request as each one request is completed. Each request, upon completion, triggers code that decides to launch the next one.
So – you never are "sending thousands of fetch requests." You're probably sending only a handful at a time, even though, in this you-controlled manner, "thousands of requests do eventually get sent."
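The answer above describes the throttle only in prose; a minimal sketch of one way to implement it, reusing the question's brmPostRequest and an assumed limit of 6 concurrent requests, could be:

// Hypothetical throttle sketch: run at most `limit` POST requests at a time.
async function postAllWithLimit(controllerName, records, limit = 6) {
  let nextIndex = 0;

  // each "worker bee" grabs the next record as soon as its previous request finishes
  async function worker() {
    while (nextIndex < records.length) {
      const current = nextIndex++;
      await brmPostRequest(controllerName, records[current]);
    }
  }

  // start `limit` workers and wait until every record has been sent
  await Promise.all(Array.from({ length: limit }, () => worker()));
}

// usage:
// const amResult = await amGetRequest(controllerName);
// await postAllWithLimit(controllerName, amResult);

Only `limit` requests are in flight at any moment, so nothing accumulates in memory no matter how many records there are.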
As you are not interested in the values delivered by brmPostRequest(), there's no point mapping the original array; neither the promises nor the results need to be accumulated.
Not doing so will save memory and may allow progress beyond the 100k sticking point.
async function foo() {
  const amResult = await amGetRequest(controllerName);
  let counts = { 'successes': 0, 'errors': 0 };
  for (let i = 0; i < amResult.length; i++) {
    try {
      await brmPostRequest(controllerName, amResult[i]);
      counts.successes += 1;
    } catch (err) {
      counts.errors += 1;
    }
  }
  console.log(counts);
}

Best way to push one more scrape after all are done

I have following scenario:
My scrapes are behind a login, so there is one login page that I always need to hit first
then I have a list of 30 urls that can be scraped asynchronously for all I care
then at the very end, when all those 30 urls have been scraped I need to hit one last separate url to put the results of the 30 URL scrape into a firebase db and to do some other mutations (like geo lookups for addresses etc)
Currently I have all 30 urls in a request queue (through the Apify web-interface) and I'm trying to see when they are all finished.
But obviously they all run async so that data is never reliable
const queue = await Apify.openRequestQueue();
let pendingRequestCount = await queue.getInfo();
The reason why I need that last URL to be separate is two-fold:
- The most obvious reason is that I need to be sure I have the results of all 30 scrapes before I send everything to the DB.
- None of the 30 URLs allow me to do Ajax / Fetch calls, which I need for sending to Firebase and doing the geo lookups of addresses.
Edit: Tried this based on the answer from @Lukáš Křivka. handledRequestCount in the while loop reaches a max of 2, never 4 ... and Puppeteer just ends normally. I've put the "return" inside the while loop because otherwise requests never finish (of course).
In my current test setup I have 4 URLs to be scraped (in the Start URLs input field of Puppeteer Scraper on Apify.com) and this code:
let title = "";
const queue = await Apify.openRequestQueue();
let {handledRequestCount} = await queue.getInfo();
while (handledRequestCount < 4) {
  await new Promise((resolve) => setTimeout(resolve, 2000)) // wait for 2 secs
  handledRequestCount = await queue.getInfo().then((info) => info.handledRequestCount);
  console.log(`Currently handled here: ${handledRequestCount} --- waiting`) // this goes max to '2'
  title = await page.evaluate(() => { return $('h1').text() });
  return {title};
}
log.info("Here I want to add another URL to the queue where I can do ajax stuff to save results from above runs to firebase db");
title = await page.evaluate(()=>{ return $('h1').text()});
return {title};
I would need to see your code to answer completely correctly, but this problem has solutions.
Simply use Apify.PuppeteerCrawler for the 30 URLs. Then you run the crawler with await crawler.run().
After that, you can simply load the data from the default dataset via
const dataset = await Apify.openDataset();
const data = await dataset.getData().then((response) => response.items);
And do whatever you want with the data; you can even create a new Apify.PuppeteerCrawler to crawl the last URL and use the data.
If you are using Web Scraper though, it is a bit more complicated. You can either:
1) Create a separate actor for the Firebase upload and pass it a webhook from your Web Scraper to load the data from it. If you look at the Apify store, we already have a Firestore uploader.
2) Add logic that will poll the requestQueue like you did and only proceed when all the requests are handled. You can create some kind of loop that will wait, e.g.
const queue = await Apify.openRequestQueue();
let { handledRequestCount } = await queue.getInfo();
while (handledRequestCount < 30) {
  console.log(`Currently handled: ${handledRequestCount} --- waiting`)
  await new Promise((resolve) => setTimeout(resolve, 2000)) // wait for 2 secs
  handledRequestCount = await queue.getInfo().then((info) => info.handledRequestCount);
}
// Do your Firebase stuff
In the scenario where you have one async function that's called for all 30 URLs you scrape, first make sure the function returns its result after all necessary awaits; you could then await Promise.all(arrayOfAll30Promises) and run your last piece of code.
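A minimal sketch of that pattern; scrapeUrl and sendResultsToFirebase are stand-ins for whatever the real scrape and upload functions are:

// Hypothetical sketch of the Promise.all approach.
async function scrapeAllThenFinish(urls) {
  // start all 30 scrapes; each promise resolves with that page's result
  const arrayOfAll30Promises = urls.map(url => scrapeUrl(url));

  // wait until every scrape has finished
  const results = await Promise.all(arrayOfAll30Promises);

  // only now do the final step: push everything to Firebase / do the geo lookups
  await sendResultsToFirebase(results);
}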
Because I was not able to get consistent results with the {handledRequestCount} from getInfo() (see my edit in my original question), I went another route.
I'm basically keeping a record of which URL's have already been scraped via the key/value store.
urls = [
  { done: false, label: "vietnam", url: "https://en.wikipedia.org/wiki/Vietnam" },
  { done: false, label: "cambodia", url: "https://en.wikipedia.org/wiki/Cambodia" }
]

// Loop over the array and add them to the Queue
for (let i = 0; i < urls.length; i++) {
  await queue.addRequest(new Apify.Request({ url: urls[i].url }));
}

// Push the array to the key/value store with key 'URLS'
await Apify.setValue('URLS', urls);
Now every time I've processed a URL I set its "done" value to true.
When they are all true I push another (final) URL into the queue:
await queue.addRequest(new Apify.Request({ url: "http://www.placekitten.com" }));
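For illustration, the per-page bookkeeping could look roughly like this inside the page-handling function; matching on request.url and using the placekitten URL as the final request are assumptions based on the snippets above:

// Hypothetical sketch of the bookkeeping after each page is scraped.
const urls = await Apify.getValue('URLS');

// mark the entry that matches the page we just finished
const entry = urls.find(u => u.url === request.url);
if (entry) entry.done = true;
await Apify.setValue('URLS', urls);

// once every entry is done, enqueue the single final URL
if (urls.every(u => u.done)) {
  await queue.addRequest(new Apify.Request({ url: "http://www.placekitten.com" }));
}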

AngularFire / Firestore - Return collections and documents as a service

I have several pages that reference the same node in firestore, each pulling different segments from the firestore node. For example, a summary page might pull through album title, date, genre and image, whilst another page might pull through just the title, artist and record label. A couple of questions:
Is it possible to turn one of the firestore queries into a service?
If so, does that mean the data is only read once whilst navigating across different pages (angular components) that use the same service?
Will the query only run again when data is modified in firestore through the observable? ("return Observable.create(observer => {" )
I have tried a service with the code below. However, the issue observed is that on page refresh, the data isn't present. It is however present whilst navigating through the site. I believe this is because my page is running before the observable is returned. Is there a way to wrap up the query as an observable?
Any assistance would be greatly appreciated.
getAlbumData() {
  this.albumDoc = this.afs.doc(`albums/${this.albumId}`);
  this.album = this.albumDoc.snapshotChanges();
  this.album.subscribe((value) => {
    // The returned Data
    const data = value.payload.data();
    // Firebase Reference
    var storage = firebase.storage();
    // If album cover exists
    if (data.project_signature != undefined) {
      // Get the Image URL
      var image = data.album_cover_image;
      // Create an image reference to the storage location
      var imagePathReference = storage.ref().child(image);
      // Get the download URL and set the local variable to the result (url)
      imagePathReference.getDownloadURL().then((url) => {
        this.album_cover = url;
      });
    }
  });
}
When I build my observables, I try to use operators as much as I can until I get the data I want to display in my UI.
You don't want to implement too much code in the subscribe method because you break the reactive paradigm by doing so.
Instead, extract your data in your observable and display it in your template.
Don't forget to use the async pipe in your template to display your data when it gets fetched by your application.
I would do something like this:
// In AlbumService
getAlbumCover(albumId: string) {
  const albumDoc = this.afs.doc(`albums/${albumId}`);
  const album_cover$ = albumDoc.snapshotChanges().pipe(
    // get the album data from the firestore request
    map(data => {
      return {
        id: data.payload.id,
        ...data.payload.data()
      };
    }),
    // emits data only if it contains a defined project_signature
    filter(album => !!album.project_signature),
    // prepare the imagePath and get the album cover from the promise
    mergeMap(album => {
      const storage = firebase.storage();
      const image = album.album_cover_image;
      const imagePathReference = storage.ref().child(image);
      return imagePathReference.getDownloadURL();
    })
  );
  return album_cover$;
}
By doing so, when your data is updated in firestore, it will be fetched automatically by your application since you use an observable.
In your component, in the ngOnInit() method, after getting your album id from the URL:
this.album_cover$ = this.albumService.getAlbumCover(albumId);
Finally, in my template, I would do :
<div>{{album_cover$ | async}}</div>
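Putting the pieces together, the component could look roughly like this; the component name, route parameter and service import path are assumptions for illustration:

// album page component (hypothetical names and paths)
import { Component, OnInit } from '@angular/core';
import { ActivatedRoute } from '@angular/router';
import { Observable } from 'rxjs';
import { AlbumService } from './album.service';

@Component({
  selector: 'app-album',
  template: `<div>{{ album_cover$ | async }}</div>`
})
export class AlbumComponent implements OnInit {
  album_cover$: Observable<string>;

  constructor(private route: ActivatedRoute, private albumService: AlbumService) {}

  ngOnInit() {
    // read the album id from the route and build the observable once
    const albumId = this.route.snapshot.paramMap.get('id');
    this.album_cover$ = this.albumService.getAlbumCover(albumId);
  }
}

The async pipe subscribes and unsubscribes for you, so no manual subscribe call is needed in the component.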
