why does the browser freeze when executing many ajax requests? - javascript

I have a page that executes around 200 Ajax requests using jquery.load, but it behaves in a very un-Ajax way because the browser freezes while the results are fetched.
By freezing I mean losing control of the browser entirely, not even being able to scroll up and down. The results then all display at once when every request has finished, although I know from watching the access log of the target server that they are actually being fetched six at a time (the browser's per-host connection limit).
Though the jquery.load commands are built using a "foreach" loop, they are already written into the source of the page when the user loads it (so for all intents and purposes they could have been hand-written individually), so it's not as if the page is waiting for the loop to finish. The last "symptom" is that even with only 30 requests the issue is just the same.
So it's odd to me, and I am looking for ideas about what could cause this and how it could be worked around. It's definitely confusing to the end user, especially as it can take 90-100 seconds until all the responses are back and the user regains control of the browser.
One small update:
I have very similar code running in another webapp that makes around 20 requests simultaneously without issue. The difference is that instead of fetching a page, it SSHes to the server and reads/updates a file on the file system via a script. I would have thought that would actually have a little more overhead, but it has none of these issues.
And as I said, even 20 requests cause the same issue with the code in question... so I am tempted to think it's perhaps curl related, though that's pure speculation.
The Bigger update Now with infinitely more Code!!!
The fuller background to app is this. We run a cluster of some of the highest trafficked WebSphere AppServers in the world, which are running our Commerce applications. The intensity of the traffic means that if we simply let traffic on to an appserver before the JVM is warmed up, they crash! So we hit a few key pages before allowing traffic on, as this precompiles all the major servlets, proportions the JVM, and populates some of the servlet caches. Then the traffic can come onto the server with no issues and they run great.
We had a version of the app written as a CGI which worked, but was very slow because it was synchronous; some clusters took around 10 minutes to run. Because the requests were synchronous, only one thread on the appserver and one JDBC connection were being used.
So what the new webapp does is take a template of these key pages and combine it with a bunch of market definitions (country code, language code, catalog IDs, etc.) to produce a list of all the URLs that need to be hit. By hitting them all asynchronously it not only runs faster (now taking only 90 seconds), it also does a better job of proportioning the JVM, uses up to 30 threads, and opens the JDBC pool to its full number of connections. Thus it's REALLY in a production-like state by the time we let traffic on. So I am very pleased with the results, but this browser freeze is annoying me from a purely cosmetic and puzzle-solving point of view.
So now some code. The user simply selects an appserver, the app decides which cluster it is from, and displays the list of computed URLs it will hit. At this point the page is a table of 'Markets x URLs', with each cell having a unique id that the jQuery uses to put the right result in the right cell (as we can't guarantee the order in which the results come back, nor do we want to, as that takes us back into synchronous territory again).
So at the point at which the user is ready to click Go, the table is written and the jQuery commands prepared. On clicking Go, the jQuery script is executed, the URLs are hit, and each returns an HTTP status code so we know whether they were successful.
The JQ part generated looks like this (shortened to just a few markets):
$(document).ready(function () { // outer wrapper implied by the original second closing });
    $("a#submit").click(function (event) {
        alert(" booya ");
        $("#sesv-1").load("psurl.php?server=servera.domain.com&url=/se/sv");
        $("#sesv-2").load("psurl.php?server=servera.domain.com&url=/se/sv/catalog/productsaz/");
        $("#sesv-3").load("psurl.php?server=servera.domain.com&url=/se/sv/catalog/products/12345678");
        $("#sesv-4").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?storeId=14&productId=103406&StoreNumber=099&langId=-13&ddkey=http:StockSearch");
        $("#sesv-5").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?query=testProd&storeId=14&langId=-11&StoreNumber=011");
        $("#atde-1").load("psurl.php?server=servera.domain.com&url=/at/de");
        $("#atde-2").load("psurl.php?server=servera.domain.com&url=/at/de/catalog/productsaz/");
        $("#atde-3").load("psurl.php?server=servera.domain.com&url=/at/de/catalog/products/12345678");
        $("#atde-4").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?storeId=1&productId=103406&StoreNumber=114&langId=-99&ddkey=http:StockSearch");
        $("#atde-5").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?query=testProd&storeId=1&langId=-21&StoreNumber=273");
        $("#benl-1").load("psurl.php?server=servera.domain.com&url=/be/nl");
        $("#benl-2").load("psurl.php?server=servera.domain.com&url=/be/nl/catalog/productsaz/");
        $("#benl-3").load("psurl.php?server=servera.domain.com&url=/be/nl/catalog/products/12345678");
        $("#benl-4").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?storeId=18&productId=103406&StoreNumber=412&langId=-44&ddkey=http:StockSearch");
        $("#benl-5").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?query=testProd&storeId=18&langId=-23&StoreNumber=482");
        $("#befr-1").load("psurl.php?server=servera.domain.com&url=/be/fr");
        $("#befr-2").load("psurl.php?server=servera.domain.com&url=/be/fr/catalog/productsaz/");
        $("#befr-3").load("psurl.php?server=servera.domain.com&url=/be/fr/catalog/products/12345678");
        $("#befr-4").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?storeId=130&productId=103406&StoreNumber=048&langId=-73&ddkey=http:StockSearch");
        $("#befr-5").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?query=testProd&storeId=130&langId=-24&StoreNumber=482");
        $("#caen-1").load("psurl.php?server=servera.domain.com&url=/ca/en");
        $("#caen-2").load("psurl.php?server=servera.domain.com&url=/ca/en/catalog/productsaz/");
        $("#caen-3").load("psurl.php?server=servera.domain.com&url=/ca/en/catalog/products/12345678");
        $("#caen-4").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?storeId=30&productId=103406&StoreNumber=006&langId=-11&ddkey=http:StockSearch");
        $("#caen-5").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?query=testProd&storeId=30&langId=-15&StoreNumber=216");
        $("#cafr-1").load("psurl.php?server=servera.domain.com&url=/ca/fr");
        $("#cafr-2").load("psurl.php?server=servera.domain.com&url=/ca/fr/catalog/productsaz/");
        $("#cafr-3").load("psurl.php?server=servera.domain.com&url=/ca/fr/catalog/products/12345678");
        $("#cafr-4").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?storeId=33&productId=103406&StoreNumber=124&langId=-09&ddkey=http:StockSearch");
        $("#cafr-5").load("psurl.php?server=servera.domain.com&url=/webapp/wcs/stores/servlet/StockSearch?query=testProd&storeId=33&langId=-16&StoreNumber=216");
    });
});
psurl.php is simply a curl request wrapper that responds with the HTTP status code (404, 200, 500, etc.), which is then used to populate the relevant cell:
function getPage( $url ) {
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,       // return web page
        CURLOPT_HEADER         => true,       // return headers
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_ENCODING       => "",         // handle all encodings
        CURLOPT_USERAGENT      => "pre-surf", // who am i
        CURLOPT_AUTOREFERER    => true,       // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,        // timeout on connect
        CURLOPT_TIMEOUT        => 120,        // timeout on response
        CURLOPT_MAXREDIRS      => 10,         // stop after 10 redirects
        CURLOPT_POST           => 0,          // i am not sending post data
        CURLOPT_SSL_VERIFYHOST => 0,          // don't verify ssl
        CURLOPT_SSL_VERIFYPEER => false,
    );
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    $err     = curl_errno($ch);
    $errmsg  = curl_error($ch);
    $header  = curl_getinfo($ch);
    curl_close($ch);
    // $header['errno']   = $err;
    // $header['errmsg']  = $errmsg;
    // $header['content'] = $content;
    return $header['http_code']; // curl_getinfo() reports the status under 'http_code'
}

The problem here is not the Ajax requests themselves; the problem is that each one of those requests updates the DOM as it completes. The repeated browser redraws are what cause the browser to lock up.
You need to find a solution that does not write to the DOM so often.
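One way to cut down the redraws can be sketched as a small framework-free helper (the name `createResultBuffer` and the flush interval are my own, not from the question): let each request report its result into a buffer, then write the whole batch to the DOM in one pass instead of once per response.

```javascript
// Buffer per-cell results and flush them in one pass, instead of letting
// each of ~200 load() callbacks trigger its own DOM write and reflow.
// The flush callback is where the single jQuery/DOM update would happen;
// it is injectable here so the buffering logic stays framework-free.
function createResultBuffer(flush, intervalMs) {
    var pending = {};  // cellId -> latest result
    var timer = null;
    return {
        report: function (cellId, result) {
            pending[cellId] = result;
            if (timer === null) {
                timer = setTimeout(function () {
                    timer = null;
                    var batch = pending;
                    pending = {};
                    flush(batch);  // one DOM write for the whole batch
                }, intervalMs);
            }
        }
    };
}
```

In the generated script above, each line would then become something like `$.get(url, function (d) { buffer.report("sesv-1", d); })`, with the flush callback filling every ready cell in a single loop.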

Related

Express server that creates maximum of 2 child/worker processes

I'm experimenting with Node and its child_process module.
My goal is to create a server which will run on a maximum of 3 processes (1 main and optionally 2 children).
I'm aware that the code below may be incorrect, but it displays interesting results.
const app = require("express")();
const { fork } = require("child_process");

const maxChildrenRuning = 2;
let childrenRunning = 0;

app.get("/isprime", (req, res) => {
    if (childrenRunning + 1 <= maxChildrenRuning) {
        childrenRunning += 1;
        console.log(childrenRunning);
        const childProcess = fork('./isprime.js');
        childProcess.send({ "number": parseInt(req.query.number) });
        childProcess.on("message", message => {
            console.log(message);
            res.send(message);
            childrenRunning -= 1;
        });
    }
});

function isPrime(number) {
    ...
}

app.listen(8000, () => console.log("Listening on 8000"));
I'm launching 3 requests with numbers around 5*10^9.
After 30 seconds I receive 2 responses with correct results.
The CPU stops doing hard work and goes idle.
Surprisingly, after another 1 minute 30 seconds, a thread starts processing the still-pending 3rd request and finishes after another 30 seconds with the correct answer. Console log displayed below:
> node index.js
Listening on 8000
1
2
{ number: 5000000029, isPrime: true, time: 32471 }
{ number: 5000000039, isPrime: true, time: 32557 }
1
{ number: 5000000063, isPrime: true, time: 32251 }
Either Express checks pending requests once in a while, or my browser re-sends the actual request every so often while it is pending. Can anybody explain what is happening here and why? How can I correctly achieve my goal?
The way your server code is written, if you receive a /isprime request while two child processes are already running, your request handler for /isprime does nothing. It never sends any response. You don't pass that first if test, and then nothing happens afterwards. So that request will just sit there with the client waiting for a response. Depending on the client, it will probably eventually time out as a dead/inactive request and the client will shut it down.
Some clients (like browsers) may assume that something just got lost in the network and retry the request by sending it again. My guess is that this is what is happening in your case. The browser eventually times out and then resends the request. By the time it retries, there are fewer than two child processes running, so it gets processed on the retry.
You could verify that the browser is retrying automatically by going to the Network tab in the Chrome debugger and watching exactly what the browser sends to your server: watch that third request, see it time out, and see whether the browser retries it.
Note, this code seems to be only partially implemented, because you initially start two child processes but you don't reuse them. Once one finishes and you decrement childrenRunning, your code will then start another child process. What you probably really want is to keep track of the two child processes you started, and when one finishes, add it to an array of "available child processes" so that when a new request comes in you can use an existing child process that is already started but idle.
You also need to either queue incoming requests when all the child processes are busy, or send some sort of error response to the HTTP request. Never sending an HTTP response to an incoming request is poor design that just leads to great inefficiencies (connections hanging around much longer than needed that never actually accomplish anything).
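The queue-instead-of-drop idea can be sketched as a small concurrency cap with a waiting list (the names `createJobQueue`, `run`, and `done` are mine, and the fork()/child wiring is elided; `run` would wrap childProcess.send/on("message")):

```javascript
// Cap the number of jobs running at once and queue the overflow,
// instead of silently dropping requests past the limit.
function createJobQueue(maxConcurrent) {
    var running = 0;
    var waiting = [];
    function next() {
        if (running >= maxConcurrent || waiting.length === 0) return;
        running += 1;
        var job = waiting.shift();
        job.run(function (result) {   // job calls back when finished
            running -= 1;
            job.done(result);
            next();                   // start the next queued job, if any
        });
    }
    return {
        push: function (run, done) {
            waiting.push({ run: run, done: done });
            next();
        }
    };
}
```

In the Express handler, every /isprime request would be pushed onto the queue, with `done` sending the response, so the third request waits instead of hanging forever.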

Scraping a rendered javascript webpage

I'm trying to build a short Python program that extracts PewDiePie's number of subscribers, which is updated every second on SocialBlade, to show it in the terminal. I want this data roughly every 30 seconds.
I've tried using PyQt but it's slow; I've turned to dryscrape, which is slightly faster but doesn't work as I want it to either. I've just found Invader and written some short code that still has the same problem: the number returned is the one from before the JavaScript on the page is executed:
from invader import Invader
url = 'https://socialblade.com/youtube/user/pewdiepie/realtime'
invader = Invader(url, js=True)
subscribers = invader.take(['#rawCount', 'text'])
print(subscribers.text)
I know that this data is accessible via the site's API, but it's not always working; sometimes it just redirects to this.
Is there a way to get this number after the JavaScript on the page has modified the counter, and not before? And which method seems best to you? Extract it:
from the original page, which keeps returning the same number for hours?
from the API's page, which bugs out when not using cookies in the code and after a certain amount of time?
Thanks for your advice!
If you want to scrape a web page that has parts of it loaded in by JavaScript, you pretty much need to use a real browser.
In Python this can be achieved with pyppeteer:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    await page.goto('https://socialblade.com/youtube/user/pewdiepie/realtime', {
        'waitUntil': 'networkidle0'
    })
    count = int(await page.Jeval('#rawCount', 'e => e.innerText'))
    print(count)

asyncio.get_event_loop().run_until_complete(main())
Note: it does not seem like the website you mentioned above updates the subscriber count frequently any more (even with JavaScript). See: https://socialblade.com/blog/abbreviated-subscriber-counts-on-youtube/
For best success and reliability you will probably need to set the user agent (page.setUserAgent in pyppeteer), keep it up to date, and use proxies (so your IP does not get banned). This can be a lot of work.
It might be easier and cheaper (in time, and compared to buying a large pool of proxies) to use a service that will handle this for you, like Scraper's Proxy. It uses a real browser, returns the resulting HTML after the JavaScript has run, and routes all of your requests through a large network of proxies, so you can send a lot of requests without getting your IP banned.
Here is an example using the Scraper's Proxy API getting the count directly from YouTube:
import requests
from pyquery import PyQuery

# Send request to API
url = "https://scrapers-proxy2.p.rapidapi.com/javascript"
params = {
    "click_selector": '#subscriber-count',  # (Wait-for-selector work-around)
    "wait_ajax": 'true',
    "url": "https://www.youtube.com/user/PewDiePie"
}
headers = {
    'x-rapidapi-host': "scrapers-proxy2.p.rapidapi.com",
    'x-rapidapi-key': "<INSERT YOUR KEY HERE>"  # TODO
}
response = requests.request("GET", url, headers=headers, params=params)

# Query html
pq = PyQuery(response.text)
count_text = pq('#subscriber-count').text()

# Extract count from text
clean_count_text = count_text.split(' ')[0]
clean_count_text = clean_count_text.replace('K', '000')
clean_count_text = clean_count_text.replace('M', '000000')
count = int(clean_count_text)
print(count)
I know this is a bit late, but I hope this helps

Preventing client side abuse/cheating of repeating Ajax call that rewards users

I am working on a coin program to reward members for being on my site. The program makes two random numbers and compares them; if they are the same, you get a coin. The problem is that someone could go into the console and get "free" coins. They could also cheat by opening more tabs, or by writing a program to generate more coins, which is what I am trying to stop. I am thinking about converting it from JS to PHP to stop the cheating (for the most part), but I don't know how to do this. The code in question is:
$.ajax({
    type: 'post',
    url: '/version2.0/coin/coins.php',
    data: { Cid: cs, mode: 'updateCoins' },
    success: function (msg) {
        window.msg = msg;
    }
});
The code used in the console is the above with a loop around it. In the code above, "cs" is the id of the member, so replacing it with another member's id would let someone get all the coins they want.
Should I just have an include with the variables above it? But then how would I display the success message, which contains the current number of coins? Also, this code is in a setInterval function that repeats every 15 milliseconds.
There are multiple ways you could do this, but perhaps the simplest would be in your server-side code: when a request comes in, check the time of the last coin update. If there isn't one, run your coin code and save the time of this operation in their session. If there is a stored time, ensure that the required interval has passed. If it has, continue to the coin update; if it hasn't, simply respond with a 403 or another failure code.
In pseudo code:
if (!$userSession['lastCoinTime'] || $currentTime > $userSession['lastCoinTime'] + $delay) {
    // coin stuff
    $userSession['lastCoinTime'] = // new time
} else {
    // don't give them a chance at a coin; respond however you want
}
However, since you're talking about doing this check every 15 ms, I would use WebSockets so that the connection to the server is ongoing. Either way, the logic can be comparable.
Just in case there's any uncertainty about this: definitely do ALL of the coin logic on the server. You can never trust the client to send valid data. The most you can trust, depending on how your authentication is set up, is some kind of secret code only they would have that lets you know who they are, which is a technique used in place of persistent sessions. Unless you're doing that, rely on the session to know who the user is; definitely don't let them tell you that either!
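A sketch of that server-side check as a plain function (the names, the 50% odds, and the session shape are assumptions for illustration; in PHP the same logic would sit around $_SESSION):

```javascript
// All coin logic lives server-side: the time check AND the random award.
// "session" is any per-user server-side store keyed by the authenticated
// user, never by an id the client sends.
function tryAwardCoin(session, nowMs, delayMs) {
    if (session.lastCoinTime !== undefined && nowMs < session.lastCoinTime + delayMs) {
        return { ok: false };  // too soon: respond with 403
    }
    session.lastCoinTime = nowMs;
    var win = Math.random() < 0.5;  // assumed odds, stands in for the two-number check
    if (win) session.coins = (session.coins || 0) + 1;
    return { ok: true, won: win, coins: session.coins || 0 };
}
```

Looping this in the console then gains nothing: every extra call inside the delay window is rejected before the random draw ever happens.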

Check/Log how much bandwidth PhantomJS/CasperJS used

Is it possible to check/log how much data has been transferred during each run of PhantomJS/CasperJS?
Each instance of Phantom/Casper has an instance_id assigned to it (by the PHP function that spun up the instance). After the run has finished, the amount of data transferred and the instance_id have to make their way into a MySQL database, possibly via the PHP function that spawned the instance. This way the bandwidth utilization of individual PhantomJS runs can be logged.
There can be many phantom/casper instances running, each lasting a minute or two.
The easiest and most accurate approach when trying to capture data is to get the collector and the emitter as close together as possible. In this case it would be ideal if PhantomJS could capture the data you need and send it back to your PHP function, which associates it with the instance_id and does the database interaction. It turns out it can (at least partially).
Here is one approach:
var page = require('webpage').create();
var bytesReceived = 0;

page.onResourceReceived = function (res) {
    if (res.bodySize) {
        bytesReceived += res.bodySize;
    }
};

page.open("http://www.google.com", function (status) {
    console.log(bytesReceived);
    phantom.exit();
});
This captures the size of each resource retrieved, adds them up, and writes the result to standard output, where your PHP code can pick it up. It does not include the size of headers or any POST activity. Depending on your application, this might be enough. If not, then hopefully it gives you a good jumping-off point.
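If header bytes matter too, PhantomJS hands onResourceReceived the response headers as an array of {name, value} objects, so a rough estimate can be added to the tally. This is a sketch: it approximates the wire format as "name: value\r\n" and ignores the status line and the request side entirely.

```javascript
// Rough byte estimate for an array of {name, value} header objects,
// approximating each as "name: value\r\n" on the wire.
function headerBytes(headers) {
    var total = 0;
    for (var i = 0; i < headers.length; i++) {
        total += headers[i].name.length + 2 + headers[i].value.length + 2;
    }
    return total;
}
```

Inside onResourceReceived you would add `bytesReceived += headerBytes(res.headers || []);`, guarded with `res.stage === 'end'` so each response is only counted once.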

setInterval alternative

In my app I poll the web server for messages every second and display them in the frontend.
I use setInterval to achieve this. However, as long as the user stays on the page, the client keeps polling the server even when there is no data. The server does give an indication when no more messages are being generated, by setting a variable.
I thought of using this variable with clearInterval to stop the timer, but that didn't work. What else can I use in this situation?
I am using jquery and django. Here is my code:
jquery:
var refresh = setInterval(function () {
    var toLoad = '/myMonitor' + ' #content';
    $('#content').load(toLoad).show();
}, 1000); // refresh every 1000 milliseconds
html:
<div id="content"> ... </div>
I can access the Django variable for completion in the HTML with each refresh. How can I set clearInterval, if at all?
Thanks
Updated 03/16/2010
I must be doing something wrong but cannot figure it out. Here is my script with the clearInterval, and it does not work.
var timer = null;
$(function () {
    if ("{{status}}" == "False") {
        clearInterval(timer);
    } else {
        timer = setInterval(function () {
            var toLoad = '/myMonitor' + ' #content';
            $('#content').load(toLoad).show();
        }, 1000); // refresh every 1000 milliseconds
    }
});
status is a boolean set in "views.py" (Django).
Thanks a bunch.
A couple people have already answered with specific resources to your problem, so I thought I would provide a bit of background.
In short, you want the server to push data to the browser to avoid extensive client-side polling. There isn't a good cross-browser way to support server push, so a common solution that requires much less polling is the Comet (another cleaning product, like Ajax) long-poll technique.
With Comet, the browser makes a request and the server keeps the connection open without responding until new data is available. When the server does have new data, it sends it over the open connection and the browser receives it right away. If the connection times out, the browser opens a new one. This lets the server send data to the client as soon as it becomes available. As others have indicated, this approach requires special configuration of your web server: you need a script on the server that checks for data at an interval and responds to the client when it exists.
Something to keep in mind with this approach is that most web servers are built to receive a request from a client and respond as quickly as possible; they're not intended to keep connections open for a long period of time. With Comet you'll have far more open connections than normal, probably consuming more resources than you expect.
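The client side of that long-poll loop can be sketched transport-agnostically (`longPoll` and `fetchFn` are names I've made up; in jQuery, fetchFn would wrap $.ajax with a long timeout, calling back with (err, data) on response or timeout):

```javascript
// Long-poll loop: issue a request, deliver any data that arrives,
// then immediately reconnect, until the stop condition says otherwise.
function longPoll(fetchFn, onData, shouldStop) {
    function loop() {
        if (shouldStop()) return;
        fetchFn(function (err, data) {
            if (!err && data !== null) onData(data);
            loop();  // reconnect right away, whether it was data or a timeout
        });
    }
    loop();
}
```

The key design difference from setInterval polling is that the delay lives on the server (it holds the connection until there is something to say), so the client reconnects immediately instead of on a fixed timer.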
Your clearInterval check only runs once, when the document ready event fires.
If the code you gave is exactly what's in the browser, then you're comparing the string "{{status}}" to the string "False". I'd rather watch paint dry than wait for that to evaluate as true.
What if your requests take longer than 1 second to complete? You'll flood your server with requests.
function update() {
    $('#content').show().load('/myMonitor' + ' #content', function (response, status) {
        if (!done) { // "done" stands in for whatever you're trying to check
            setTimeout(update, 1000);
        }
    });
}

$(document).ready(function () {
    update();
});
This is closer than where you were, but you still need to work out how you're going to decide when to stop polling.
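One common way to make that decision, sketched with made-up names (the server includes a done flag in each response; none of this is from the question's Django app): reschedule only while the server says there is more to come.

```javascript
// Poll, and reschedule only while the server's response says we're not done.
// pollFn stands in for the actual request; it calls back with the parsed
// response, assumed here to carry a boolean "done" field.
function schedule(pollFn, delayMs) {
    pollFn(function (response) {
        if (!response.done) {
            setTimeout(function () { schedule(pollFn, delayMs); }, delayMs);
        }
    });
}
```

With the Django view returning something like {"done": true} once status flips, polling stops by itself instead of needing clearInterval at all.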
