How to pass data between pageFunction executions in Apify web - javascript

I'm scraping a website with Apify. I want to scrape different types of pages and then combine the data into one dataset. Right now I have a separate set of data for each kind of page (users, shots). How can I transfer data between pageFunction executions, e.g. to calculate the number of followers for each shot's author?
async function pageFunction(context) {
    const { request, log, jQuery } = context;
    const $ = jQuery;
    if (request.url.indexOf('/shots/') > 0) {
        const title = $('.shot-title').text();
        return {
            url: request.url,
            title
        };
    } else if (request.userData.label === "USER") {
        var followers_count = $('.followers .count').first().text();
        return {
            url: request.url,
            followers_count
        };
    }
}

If I understand the question correctly, you can pass the data along through the crawled pages and save only one item at the end. For this use case you can work with userData, which you can pass with every request.
For example, if you would like to pass data from the /shots page to the USER page, you could do it like this. (It requires you to enqueue pages manually to control the flow of the data. This approach also expects that the /shots type of page is the first one you visit, and that you then continue to the user page.)
async function pageFunction(context) {
    const { request, log, jQuery } = context;
    const $ = jQuery;
    if (request.url.indexOf('/shots/') > 0) {
        const title = $('.shot-title').text();
        const userLink = 'some valid url to user page';
        // Add your request to the queue with the title in userData
        await context.enqueueRequest({
            url: userLink,
            userData: {
                label: 'USER',
                shotsTitle: title
            }
        });
    } else if (request.userData.label === "USER") {
        var followers_count = $('.followers .count').first().text();
        // Here you read shotsTitle from userData and return it
        return {
            url: request.url,
            followers_count,
            shotsTitle: request.userData.shotsTitle
        };
    }
}
If you need to share data between runs of the actor, that is a different topic. Let me know if this helped.
I would also recommend going through the getting started guide.


With JS fetch-PUT i update my db (and it does indeed) but with fetch-GET it displays the data just before the update

I'm working on a Django project, a kind of social network. I have JS code in which I update my db with a fetch PUT, and I can check from the file that it is updated.
function update_like(identity) {
    fetch('/like', {
        method: 'PUT',
        body: JSON.stringify({
            id: identity,
        })
    });
}
And then I try to retrieve the data with a fetch GET:
function show(identity) {
    fetch('/like', {
        headers: {
            'Cache-Control': 'no-cache'
        }
    })
    .then(response => response.json())
    .then(likedic => {
        if (likedic[identity]) {
            document.getElementById(`arithmos${identity}`).innerHTML = ` ${likedic[identity].length}`;
        } else {
            document.getElementById(`arithmos${identity}`).innerHTML = ' 0';
        }
    });
}
The thing is that every time, it displays the data from just before the latest update of the db.
I mean that the first time I run the update_like function, the show function displays the db as it was before update_like ran, even though I can see from the file that the db is updated.
The second time I run update_like, the show function displays the db as it should have been after the first run, even though I can again see from the file that the db is updated, and so on.
I suppose that it doesn't have enough time to read the updated db. I have tried so many things but I can't make it work. Below is my Python function:
def like(request):
    likedic = {}
    if request.method == 'GET':
        allcomments = Like.objects.all()
        for i in range(len(allcomments)):
            if allcomments[i] not in likedic.keys():
                likedic[allcomments[i]] = []
        return JsonResponse(likedic, safe=False)
    elif request.method == "PUT":
        data = json.loads(request.body)
        Likes = Like.objects.filter(comment = Comment.objects.get(id = data['id']), user = User.objects.get(username = request.user.username))
        if Likes:
            Likes = Like(comment = Comment.objects.get(id = data['id']), user = User.objects.get(username = request.user.username))
        return HttpResponse(status=204)
    # Email must be via GET or PUT
    return JsonResponse({
        "error": "GET or PUT request required."
    }, status=400)
I would really appreciate some advice. Thanks so much in advance.
Use async functions instead of regular functions, and await the completion of the first function before calling the second. For example, convert update_like to an async function that awaits (or returns) its fetch promise.
Later, call show only after awaiting update_like:
// `await` must be used within an async function
// So, I'll use await inside this immediately invoked function:
(async function() {
    let identity = "someIdentity";
    await update_like(identity); // Waits until update is complete
    return show(identity);
})();
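As a minimal sketch of what the async conversion could look like: the names `updateLike`, `fetchLikeCount`, and `updateThenShow` are hypothetical, the `/like` URL is taken from the question's `show` function, and the `fetchImpl` parameter is an assumption added only so the flow can be exercised without a server. The key point is that the GET is not started until the PUT has resolved.

```javascript
// Hypothetical sketch: the function names and the injectable `fetchImpl`
// parameter are assumptions for illustration, not part of the original code.
async function updateLike(identity, fetchImpl = globalThis.fetch) {
    // Await the PUT so callers can wait for the update to finish.
    const response = await fetchImpl('/like', {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ id: identity }),
    });
    if (!response.ok) {
        throw new Error(`Update failed with status ${response.status}`);
    }
}

async function fetchLikeCount(identity, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl('/like', {
        headers: { 'Cache-Control': 'no-cache' },
    });
    const likedic = await response.json();
    // Same fallback as the original show(): no entry means zero likes.
    return likedic[identity] ? likedic[identity].length : 0;
}

async function updateThenShow(identity, fetchImpl = globalThis.fetch) {
    await updateLike(identity, fetchImpl);       // 1. wait for the PUT
    return fetchLikeCount(identity, fetchImpl);  // 2. only then run the GET
}
```

In the page, the returned count would then be written into the `arithmos${identity}` element, as the original `show` does.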
I finally found a solution! I added a setTimeout call before calling the show function. I set it to 0.05 sec and it works! I don't know if this is the right or the best way to do it, but it finally works!
function update_like(identity) {
    fetch('/like', {
        method: 'PUT',
        body: JSON.stringify({
            id: identity,
        })
    });
    setTimeout(() => {
        show(identity);
    }, 50);
}

Script runs smoothly from start to finish, but the expected result doesn't happen

The project aims to study a new social media:
My needs are:
1 - Collect data from profiles that follow a specific profile.
2 - My account uses this data to follow the collected profiles.
3 - Among other possible options, also unfollow the profiles I follow.
The problem found in the current script:
The profile data in theory is being collected, the script runs perfectly until the end, but for some reason I can't specify, instead of following all the collected profiles, it only follows the base profile.
For example:
I want to follow all 250 profiles that follow the ID 123456
I activate the booyahGetAccounts(123456); script
In theory the end result would be my account following 250 profiles
But in the end I follow only the 123456 profile, so the count of people I'm following is 1
Complete Project Script:
const csrf = 'MY_CSRF_TOKEN';
async function booyahGetAccounts(uid, type = 'followers', follow = 1) {
    if (typeof uid !== 'undefined' && !isNaN(uid)) {
        const loggedInUserID = window.localStorage?.loggedUID;
        if (uid === 0) uid = loggedInUserID;
        const unfollow = follow === -1;
        if (unfollow) follow = 1;
        if (loggedInUserID) {
            if (csrf) {
                async function getUserData(uid) {
                    const response = await fetch(`${uid}`),
                        data = await response.json();
                    return data.user;
                }
                const loggedInUserData = await getUserData(loggedInUserID),
                    targetUserData = await getUserData(uid),
                    followUser = uid => fetch(`${loggedInUserID}/followings`, { method: (unfollow ? 'DELETE' : 'POST'), headers: { 'X-CSRF-Token': csrf }, body: JSON.stringify({ followee_uid: uid, source: 43 }) }),
                    logSep = (data = '', usePad = 0) => typeof data === 'string' && usePad ? console.log((data ? data + ' ' : '').padEnd(50, '━')) : console.log('━'.repeat(50),data,'━'.repeat(50));
                async function getList(uid, type, follow) {
                    const isLoggedInUser = uid === loggedInUserID;
                    if (isLoggedInUser && follow && !unfollow && type === 'followings') {
                        follow = 0;
                        console.warn('You already follow your followings. `follow` mode switched to `false`. Followings will be retrieved instead of followed.');
                    }
                    const userData = await getUserData(uid),
                        totalCount = userData[type.slice(0,-1)+'_count'] || 0,
                        totalCountStrLength = totalCount.toString().length;
                    if (totalCount) {
                        let userIDsLength = 0;
                        const userIDs = [],
                            nickname = userData.nickname,
                            nicknameStr = `${nickname ? ` of ${nickname}'s ${type}` : ''}`,
                            alreadyFollowedStr = uid => `User ID ${uid} already followed by ${loggedInUserData.nickname} (Account #${loggedInUserID})`;
                        async function followerFetch(cursor = 0) {
                            const fetched = [];
                            await fetch(`${uid}/${type}?cursor=${cursor}&count=100`).then(res => res.json()).then(data => {
                                const list = data[type.slice(0,-1)+'_list'];
                                if (list?.length) fetched.push(...list.map(e => e.uid));
                                if (fetched.length) {
                                    userIDsLength += fetched.length;
                                    if (follow) followUser(uid);
                                    console.log(`${userIDsLength.toString().padStart(totalCountStrLength)} (${(userIDsLength / totalCount * 100).toFixed(4)}%)${nicknameStr} ${follow ? 'followed' : 'retrieved'}`);
                                    if (fetched.length === 100) {
                                    } else {
                                        console.log(`END REACHED. ${userIDsLength} accounts ${follow ? 'followed' : 'retrieved'}.`);
                                        if (!follow) logSep(targetList);
                                    }
                                }
                            });
                        }
                        await followerFetch();
                        return userIDs;
                    } else {
                        console.log(`This account has no ${type}.`);
                    }
                }
                logSep(`${follow ? 'Following' : 'Retrieving'} ${targetUserData.nickname}'s ${type}`, 1);
                const targetList = await getList(uid, type, follow);
            } else {
                console.error('Missing CSRF token. Retrieve your CSRF token from the Network tab in your inspector by clicking into the Network tab item named "bug-report-claims" and then scrolling down in the associated details window to where you see "x-csrf-token". Copy its value and store it into a variable named "csrf" which this function will reference when you execute it.');
            }
        } else {
            console.error('You do not appear to be logged in. Please log in and try again.');
        }
    } else {
        console.error('UID not passed. Pass the UID of the profile you are targeting to this function.');
    }
}
This current question is a continuation of that answer from the link:
Collect the full list of buttons to follow without having to scroll the page (DevTools Google Chrome)
Since I can't offer more bounty on that question, I created this one to offer the new bounty to anyone who can fix the bug and make the script work.
Access account on Booyah website to use for tests:
Access by google:
Password: quartodemilha
I have to admit that your code is really hard to read; it took me less time to rewrite everything from scratch.
Given that we need a piece of code that can be cut and pasted into the JavaScript console of a web browser and that can store some data (i.e. the expiration of followings and the permanent followings), a few considerations are in order.
We can treat the expiration of followings as volatile data: if it is lost, it can be reset to 1 day after the moment we lost it. window.localStorage is a perfect candidate for storing this kind of data. If we change web browser, the only drawback is that we lose the expiration of followings, and we can tolerate resetting it to 1 day after the browser change.
To store the list of permanent followings, on the other hand, we need a store that survives a change of web browser. The best idea that came to my mind is to create an alternative account and use it to follow the users we never want to stop following. In my code I used uid 3186068 (a random user); once you have created your own alternative account, just replace the first line of the code block with its uid.
Another thing we need to take care of is error handling: API calls can always fail. The approach I chose is to write myFetch, which retries the same call up to three times when an error occurs; if the error persists, we are probably facing a temporary outage and just need to retry a bit later.
To provide a comfortable interface, the code block gathers the uid from window.location: to follow the followers of a user, just cut and paste the code block into a tab opened on that user's profile. For example, I ran the code from a tab open on
Last, to unfollow users, the clean function is first called 5 minutes after we paste the code (so as not to conflict with the calls that follow followers) and is then executed again one hour after it finishes its job. It is written to access localStorage in an atomic way, so you can have many instances running simultaneously in different tabs of the same browser without caring about it. The only thing you need to take care of is that when window.location changes, all the JavaScript events in the tab are reset; so I suggest keeping a tab open on the home page, pasting the code block into it, and forgetting about this tab: it will be the tab responsible for unfollowing users. Then open other tabs to do what you need; when you reach a user whose followers you want to follow, paste the block into that tab, wait until the job is finished, and continue to use the tab normally.
// The account we use to store followings
const followingsUID = 3186068;
// Gather the loggedUID from window.localStorage
const { loggedUID } = window.localStorage;
// Gather the CSRF-Token from the cookies
const csrf = document.cookie.split("; ").reduce((ret, _) => (_.startsWith("session_key=") ? _.substr(12) : ret), null);
// APIs could have errors, let's do some retries
async function myFetch(url, options, attempt = 0) {
    try {
        const res = await fetch("" + url, options);
        const ret = await res.json();
        return ret;
    } catch(e) {
        // After too many consecutive errors, let's abort: we need to retry later
        if(attempt === 3) throw e;
        return myFetch(url, options, attempt + 1);
    }
}
function expire(uid, add = true) {
    const { followingsExpire } = window.localStorage;
    let expires = {};
    try {
        // Get and parse followingsExpire from localStorage
        expires = JSON.parse(followingsExpire);
    } catch(e) {
        // In case of error (ex. new browsers) simply init to empty
        window.localStorage.followingsExpire = "{}";
    }
    if(! uid) return expires;
    // Set expire after 1 day
    if(add) expires[uid] = new Date().getTime() + 3600 * 24 * 1000;
    else delete expires[uid];
    window.localStorage.followingsExpire = JSON.stringify(expires);
}
async function clean() {
    try {
        const expires = expire();
        const now = new Date().getTime();
        for(const uid in expires) {
            if(expires[uid] < now) {
                await followUser(parseInt(uid), false);
                expire(uid, false);
            }
        }
    } catch(e) {}
    // Repeat clean in an hour
    window.setTimeout(clean, 3600 * 1000);
}
async function fetchFollow(uid, type = "followers", from = 0) {
    const { cursor, follower_list, following_list } = await myFetch(`users/${uid}/${type}?cursor=${from}&count=50`);
    const got = (type === "followers" ? follower_list : following_list).map(_ => _.uid);
    const others = cursor ? await fetchFollow(uid, type, cursor) : [];
    return [...got, ...others];
}
async function followUser(uid, follow = true) {
    console.log(`${follow ? "F" : "Unf"}ollowing ${uid}...`);
    return myFetch(`users/${loggedUID}/followings`, {
        method: follow ? "POST" : "DELETE",
        headers: { "X-CSRF-Token": csrf },
        body: JSON.stringify({ followee_uid: uid, source: 43 })
    });
}
async function doAll() {
    if(! loggedUID) throw new Error("Can't get 'loggedUID' from localStorage: try to login again");
    if(! csrf) throw new Error("Can't get session token from cookies: try to login again");
    console.log("Fetching current followings...");
    const currentFollowings = await fetchFollow(loggedUID, "followings");
    console.log("Fetching permanent followings...");
    const permanentFollowings = await fetchFollow(followingsUID, "followings");
    console.log("Syncing permanent followings...");
    for(const uid of permanentFollowings) {
        expire(uid, false);
        if(currentFollowings.indexOf(uid) === -1) {
            await followUser(uid);
        }
    }
    // Sync followingsExpire in localStorage
    for(const uid of currentFollowings) if(permanentFollowings.indexOf(uid) === -1) expire(uid);
    // Call first clean task in 5 minutes
    window.setTimeout(clean, 300 * 1000);
    // Gather uid from window.location
    const match = /\/studio\/(\d+)/.exec(window.location.pathname);
    if(match) {
        console.log("Fetching this user followers...");
        const followings = await fetchFollow(parseInt(match[1]));
        for(const uid of followings) {
            if(currentFollowings.indexOf(uid) === -1) {
                await followUser(uid);
            }
        }
    }
    return "Done";
}
await doAll();
The problem: I strongly suspect an API bug
To test my code I run it from
If I run it multiple times, I keep getting the following output:
Fetching current followings...
Fetching permanent followings...
Syncing permanent followings...
Following 1801775...
Following 143823...
Following 137017...
Fetching this user followers...
Following 16884042...
Following 16166724...
There is a bug somewhere! The expected output for subsequent executions in the same tab would be:
Fetching current followings...
Fetching permanent followings...
Syncing permanent followings...
Fetching this user followers...
After looking for the bug in my code without success, I checked the APIs: if I navigate to the following URLs (the uids are the ones the code keeps following in subsequent executions)
I can clearly see that I follow them, but if I navigate to the list of users I follow, I can't find them, even if I scroll the page to the end.
Since I make exactly the same calls the website does, I strongly suspect the bug is in the APIs, specifically in the way they handle the cursor parameter.
I suggest you open a support ticket with the support team. You could use the test account you provided us: I already provided you the details to do that. ;)

How to setup Appinsights with azure search javascript sdk

From the Azure Search documentation I know that we have to get some search information to set up Application Insights telemetry.
The problem is: how do I get the SearchID information from the @azure/search-documents SearchDocumentResult?
Using the @azure/search-documents module, you can set up your client and add custom headers to operations like so:
const { SearchClient, AzureKeyCredential } = require("@azure/search-documents");
const indexName = "nycjobs";
const apiKey = "252044BE3886FE4A8E3BAA4F595114BB";
const client = new SearchClient(
    "<search service endpoint>", // placeholder: the endpoint URL is missing from the post
    indexName,
    new AzureKeyCredential(apiKey)
);
async function main() {
    var searchId = '';
    const searchResults = await client.search('Microsoft', {
        top: 3,
        requestOptions: {
            customHeaders: {
                'Access-Control-Expose-Headers': 'x-ms-azs-searchid',
                'x-ms-azs-return-searchid': 'true'
            },
            // Called with the raw response, before the headers are stripped
            shouldDeserialize: (response) => {
                searchId = response.headers.get('x-ms-azs-searchid');
                return true;
            }
        }
    });
    console.log(`Search ID: ${searchId}\n`);
    for await (const result of searchResults.results) {
        console.log(result);
    }
}
It seems that, currently, the only way to get the headers out is the shouldDeserialize callback shown in the example: it gives you the raw response, including the headers, before deserialization, at which point the headers are stripped from some objects, such as the paged response objects returned by search.
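Once the search ID has been captured, it could be attached to a custom Application Insights event. The following is only a sketch under stated assumptions: `trackSearch` is a hypothetical helper, `appInsights` is assumed to be an object exposing `trackEvent({ name, properties })` as the Application Insights JavaScript SDK does, and the property names follow the search traffic analytics guidance as far as I can tell, so they should be checked against the documentation.

```javascript
// Hypothetical helper: forwards the metadata of one search to Application Insights.
// Assumption: `appInsights.trackEvent` accepts an object with `name` and `properties`.
function trackSearch(appInsights, { serviceName, indexName, searchId, queryTerms, resultCount }) {
    const event = {
        name: 'Search',
        properties: {
            SearchServiceName: serviceName, // assumed property names, verify against the docs
            SearchId: searchId,             // value read from the x-ms-azs-searchid header
            IndexName: indexName,
            QueryTerms: queryTerms,
            ResultCount: resultCount,
        },
    };
    appInsights.trackEvent(event);
    return event;
}
```

It would be called right after the search completes, passing the `searchId` captured in `shouldDeserialize` together with the query text and the index name.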
I'm assuming that you care more about search query telemetry and not indexer telemetry, but please correct me if I'm wrong. Is this documentation page helpful?
From that page, here is how you set the searchId:
request.setRequestHeader("x-ms-azs-return-searchid", "true");
request.setRequestHeader("Access-Control-Expose-Headers", "x-ms-azs-searchid");
var searchId = request.getResponseHeader('x-ms-azs-searchid');
Please let me know if I'm misunderstanding the question.

How do I only render the updated stat - websockets

Right now the entire div re-renders, but I am looking for a way to re-render only the updated statistic.
These are parts of what I have now:
document.querySelector("#submit").addEventListener("click", function (e) {
    let country = document.querySelector("#country").value
    let numberCases = document.querySelector("#number").value
    fetch(base_url + "/api/v1/stats/updatestats", {
        method: "put",
        headers: {
            "Content-Type": "application/json"
        },
        body: JSON.stringify({
            "country": country,
            "numberCases": numberCases
        })
    }).catch(err => {
        console.log(err)
    })
    primus.write({ "action": "update" })
})
primus.on("data", (json) => {
    if (json.action === "update") {
        document.querySelector("#overview").innerHTML = ""
        appendInfo()
    }
})
function appendInfo() {
    fetch(base_url + "/api/v1/stats", {
        method: "get",
        headers: {
            'Content-Type': 'application/json'
        }
    }).then(response => {
        return response.json();
    }).then(json => {
        // Assumption: the property path was lost in the post; json.data.stats is a guess
        json.data.stats.forEach(stat => {
            let country = stat.country
            let numberCases = stat.numberCases
            let p = document.createElement('p')
            let text = document.createTextNode(`${country}: ${numberCases}`)
            let overview = document.querySelector("#overview")
            p.appendChild(text)
            overview.appendChild(p)
        })
    }).catch(err => {
        console.log(err)
    })
}
window.onload = appendInfo();
h1 Cases by country
So if I only update the country Belgium, I only want that statistic to be changed. Now everything seems to reload.
What I meant with my suggestion is to keep the communication of data between client and server strictly in the sockets. That means that when one user updates a value on their end, that value is sent to the server and stored. After the server has finished storing the value, that same value is sent to all other clients. This way you only send and receive the parts that have changed, without having to download everything on every change.
I might not be able to write the code exactly as it should be, as I have limited experience with Primus.js and know little about your backend.
But I would think that your frontend part would look something like this. In the example below I've removed the fetch call from the click event. Instead, it sends the changed data to the server, which should handle those expensive tasks.
const countryElement = document.querySelector("#country");
const numberCasesElement = document.querySelector("#number");
const submitButton = document.querySelector("#submit");
submitButton.addEventListener("click", function (e) {
    let data = {
        action: 'update',
        country: countryElement.value,
        numberCases: numberCasesElement.value
    };
    // Send only the changed data through the socket
    primus.write(data);
});
Now the server should get a message that one of the clients has updated some of the data. It should then do something with that data, like storing it and letting the other clients know that this piece of data has been updated.
primus.on('data', data => {
    const { action } = data;
    if (action === 'update') {
        // Function that saves the data that the socket received.
        // saveData(data) for example.
        // Send changed data to all clients.
        primus.write(data);
    }
});
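As a slightly fuller sketch of that server-side step, assuming a Primus server instance where each connection is delivered as a `spark` through the `connection` event and where `primus.write` sends a message to every connected client. The names `attachStatsBroadcast` and `saveData` are hypothetical, and `saveData` stands in for whatever persistence the backend uses.

```javascript
// Hypothetical sketch: store an incoming update, then broadcast only that update.
// Assumptions: `primus` emits 'connection' with a spark, sparks emit 'data',
// and `primus.write` broadcasts to all connected clients.
function attachStatsBroadcast(primus, saveData) {
    primus.on('connection', spark => {
        spark.on('data', async data => {
            // Ignore anything that is not an update message.
            if (!data || data.action !== 'update') return;
            // Persist first, so clients never see a value that was not stored.
            await saveData({ country: data.country, numberCases: data.numberCases });
            // Broadcast just the changed statistic, not the whole list.
            primus.write({
                action: 'update',
                country: data.country,
                numberCases: data.numberCases,
            });
        });
    });
}
```

Broadcasting only after the save has resolved keeps every client in step with what is actually stored.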
The server should now have stored the changes and broadcast the change to all other clients. You and the other clients will now receive the data that has been changed and can render it. So, back to the frontend. We do the same trick as on the server: listen for the data event and check the action in the data object to figure out what to do.
You'll need a way to target the elements that you want to change. You could do this by giving your elements id attributes that correspond with the data. For example, if you want to change the 'Belgium' paragraph, it would come in handy to have a way to recognize it. I won't go into that too much and will just create something simple that might do the trick.
In the HTML example below I've given the paragraph an id. This id is the same as the country value that you want to update. It is a unique identifier for finding the element that you want to update, or for creating it if it is not there.
The JavaScript example after that receives the data from the server through the sockets and checks the action. This is the same data that we sent to the server, but only now, when everybody has received it, do we do something with it.
I've written a function that will update the data in your HTML. It checks for an element with the id that matches the country and updates the textContent property accordingly. This is almost the same as using document.createTextNode, but with fewer steps.
<div id="overview">
    <p id="Belgium">Belgium: 120</p>
</div>
const overview = document.querySelector("#overview");
primus.on('data', data => {
    const { action } = data;
    if (action === 'update') {
        updateInfo(data);
    }
});
function updateInfo(data) {
    const { country, numberCases } = data;
    // Select existing element with country data.
    const countryElement = overview.querySelector(`#${country}`);
    // Check if the element is already there, if not, then create it.
    // Otherwise update the values.
    if (countryElement === null) {
        const paragraph = document.createElement('p');
        paragraph.id = country;
        paragraph.textContent = `${country}: ${numberCases}`;
        overview.append(paragraph);
    } else {
        countryElement.textContent = `${country}: ${numberCases}`;
    }
}
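One caveat with using the country name directly as an id: a name containing spaces or punctuation (for example "United States") does not form a valid `#id` selector, so the lookup above would throw or miss. A minimal sketch of one way around that, using a hypothetical `countryToId` helper that derives a selector-safe id. It only keeps ASCII letters and digits, so two names that differ only in other characters could collide.

```javascript
// Hypothetical helper: turns a country name into an id that is safe to use
// in a `#id` selector. Only ASCII letters and digits are kept.
function countryToId(country) {
    const slug = String(country)
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-') // collapse anything else into a dash
        .replace(/^-+|-+$/g, '');    // drop leading and trailing dashes
    return `stat-${slug}`;
}
```

The same helper would be used in both places, e.g. `paragraph.id = countryToId(country)` when creating the element and `overview.querySelector('#' + countryToId(country))` when looking it up. In browsers, `CSS.escape` is another option if the raw name has to stay in the id.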
I hope that this is what you are looking for and / or is helpful for what you are trying to create. I want to say again that this is an example of how it could work and has not been tested on my end.
If you have any questions or I have been unclear, then please don't hesitate to ask.
To elaborate on @EmielZuurbier's suggestion in the comment, please try the following code.
primus.on("dataUpdated", (json) => {
    // handle the updated data here
});
primus.on('data', data => {
    //process it here and then
    //send it out again
    primus.emit('dataUpdated', 'the data you want to send to the front end');
});

How to edit request url in service worker?

I'm using a cache-first caching strategy for my PWA: for every GET request I first check whether that request exists in the cache; if it does, I return it and update the cache.
The problem is that users can switch between multiple projects, so when they switch to another project,
the first time they open some URL, they get the content from the previous project if it exists in the cache.
My solution is to try to add a GET parameter ?project=projectId (project=2, for example) in the service worker, so each project would have its own version of the request saved in the cache.
I wanted to concatenate the project id to event.request.url, but I've read here that it is read-only.
After doing that, hopefully I would have urls like this in cache:
Instead of:
I would have:
So I would get questions from the project I'm on, instead of getting questions from the previous project if /questions is already saved in the cache.
Is there a way to edit request url in service worker?
My service worker code:
self.addEventListener('fetch', function(event) {
    const url = new URL(event.request.clone().url);
    if (event.request.clone().method === 'POST') {
        // update project id in service worker when it's changed
        if(url.pathname.indexOf('/project/') != -1 ) {
            // update user data on project switch
            let splitUrl = url.pathname.split('/');
            if (splitUrl[2] && !isNaN(splitUrl[2])) {
                console.log( user );
                setTimeout(function() {
                    console.log( user );
                }, 1000);
            }
        }
        // do other unrelated stuff to post requests
    }
    // ideally,here I would be able to do something like this:
    if(typeof user.project_id !== 'undefined') {
        event.request.url = event.request.url + '?project=' + user.project_id;
    }
    event.respondWith(async function () {
        const cache = await caches.open('CACHE_NAME')
        const cachedResponsePromise = await cache.match(event.request.clone())
        const networkResponsePromise = fetch(event.request.clone())
        if (event.request.clone().url.startsWith(self.location.origin)) {
            event.waitUntil(async function () {
                const networkResponse = await networkResponsePromise.catch(function(err) {
                    console.log( 'CACHE' );
                    // return caches.match(event.request);
                    return caches.match(event.request).then(function(result) {
                        // If no match, result will be undefined
                        if (result) {
                            return result;
                        } else {
                            return caches.open('CACHE_NAME')
                                .then((cache) => {
                                    return caches.match('/offline.html');
                                });
                        }
                    });
                });
                await cache.put(event.request.clone(), networkResponse.clone())
            }())
        }
        // news and single photos should be network first
        if (url.pathname.indexOf("news") > -1 || url.pathname.indexOf("/photos/") > -1) {
            return networkResponsePromise || cachedResponsePromise;
        }
        return cachedResponsePromise || networkResponsePromise;
    }())
})
It's possible to use any URL as a cache key when reading/writing to the Cache Storage API. When writing to the cache via put(), for instance, you can pass in a string representing the URL you'd like to use as the first parameter:
// You're currently using:
await cache.put(event.request.clone(), networkResponse.clone())
// Instead, you could use:
await cache.put(event.request.url + '?project=' + someProjectId, networkResponse.clone())
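Note that plain string concatenation produces a malformed key when the request URL already has a query string. A minimal sketch of a helper that avoids this by going through the standard `URL` API; `projectCacheKey` is a hypothetical name, and the same key has to be used for both `cache.put()` and `cache.match()`.

```javascript
// Hypothetical helper: builds a per-project cache key from a request URL.
// Uses URLSearchParams so an existing query string is preserved.
function projectCacheKey(requestUrl, projectId) {
    const url = new URL(requestUrl);
    url.searchParams.set('project', String(projectId));
    return url.toString();
}
```

For example, `cache.put(projectCacheKey(event.request.url, someProjectId), networkResponse.clone())` would store the response, and `cache.match(projectCacheKey(event.request.url, someProjectId))` would read it back.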
But I think a better approach that would accomplish what you're after is to use different cache names for each project, and then within each of those differently-named caches you would not have to worry about modifying the cache keys to avoid collisions.
// You're currently using:
const cache = await caches.open('CACHE_NAME')
// Instead, you could use:
const cache = await caches.open('CACHE_NAME' + someProjectId)
(I'm assuming that you have some reliable way of figuring out what the correct someProjectId value should be inside of the service worker, based on which client is making the incoming request.)
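Put together, a per-project cache-first lookup might look roughly like the sketch below. The names `cacheNameForProject` and `cacheFirst` are hypothetical, and `cacheStorage` and `fetchImpl` are passed in as parameters only so the logic can be exercised outside a service worker; inside one they would be the global `caches` and `fetch`.

```javascript
// Hypothetical sketch of a cache-first lookup that keeps one cache per project.
const CACHE_PREFIX = 'CACHE_NAME-';

function cacheNameForProject(projectId) {
    // Fall back to a shared cache when no project is known yet.
    return `${CACHE_PREFIX}${projectId ?? 'default'}`;
}

async function cacheFirst(request, projectId, cacheStorage, fetchImpl) {
    const cache = await cacheStorage.open(cacheNameForProject(projectId));
    const cached = await cache.match(request);
    if (cached) return cached; // served from this project's cache only
    const response = await fetchImpl(request);
    if (response && response.ok) {
        // Store a copy; the original is returned to the page.
        await cache.put(request, response.clone());
    }
    return response;
}
```

Because each project has its own cache, the request itself can be used as the key unchanged, and a project's cached data can be dropped as a whole with `caches.delete(cacheNameForProject(id))`.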
