Get large data from API with pagination - javascript

I'm trying to GET a large amount of data from an API (over 300k records). The API is paginated (25 records per page) and the request limit is 50 requests per 3 minutes. I'm using PHP cURL to get the data, and the API requires JWT token authorization. I can get a single page and put its records into an array.
...
$response = curl_exec($curl);
curl_close($curl);
$result = json_decode($response, true);
The problem is that I need to get all records from all pages and save them into an array or a file. How do I do that? Or would JavaScript be better suited for this?
Best regards and thank you.

Ideally use cron and some form of storage, database or a file.
It is important to ensure that a new call to the script doesn't start until the previous one has finished; otherwise the calls stack up, and after a few of them you will start getting server overload and failed scripts, and it gets messy.
Store a value to say the script is starting.
Run the CURL request.
Once curl has been returned and data is processed and stored change the value you stored at the beginning to say the script has finished.
Run this script as a cron in the intervals you deem necessary.
A simplified example:
<?php
// The "busy" flag has to persist between cron runs, e.g. as a lock file.
$lock = __DIR__ . '/api_import.lock';
if (file_exists($lock)) exit();   // previous run is still busy
touch($lock);                     // mark the script as busy
// YOUR CURL REQUEST AND PROCESSING HERE
unlink($lock);                    // mark the script as finished
?>
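For the "CURL REQUEST AND PROCESSING" part, a minimal sketch of what one cron run could look like is below. The endpoint URL, the page query parameter, the data key in the response and the state file are assumptions for the sketch; adjust them to the real API.
<?php
// Sketch only: fetch up to 50 pages per run, then stop until the next cron run.
$stateFile = __DIR__ . '/last_page.txt';
$page      = file_exists($stateFile) ? (int) file_get_contents($stateFile) + 1 : 1;
$token     = 'YOUR_JWT_TOKEN';
for ($i = 0; $i < 50; $i++, $page++) {                 // stays under 50 requests per 3 minutes
    $curl = curl_init('https://api.example.com/records?page=' . $page);  // assumed endpoint
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['Authorization: Bearer ' . $token],
    ]);
    $response = curl_exec($curl);
    curl_close($curl);
    $result = json_decode($response, true);
    if (empty($result['data'])) break;                 // assumed response shape; no more pages
    // Append each record as one JSON line so the full 300k set never sits in memory.
    foreach ($result['data'] as $record) {
        file_put_contents('records.jsonl', json_encode($record) . "\n", FILE_APPEND);
    }
    file_put_contents($stateFile, $page);              // remember progress for the next run
}
?>
Each run fetches at most 50 pages, so the request limit is respected as long as the cron interval is 3 minutes or more.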

I would use a series of requests. A typical request takes at most 2 seconds to fulfill, so 50 requests per 180 seconds does not require parallel requests (though curl does support parallelism, as far as I remember). Still, you need to measure time and wait if you don't want to be banned for DoS: when you reach the request limit you must use the sleep function to wait until you can send new requests. For PHP the real problem is that this is a long-running job, so you need to change settings, otherwise it will time out. You can do it this way: Best way to manage long-running php script? As for nodejs, I think it is a lot better solution for this kind of async task, because the required features come naturally with nodejs without extensions and such things, though I am biased towards it.
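If you go with one long-running script instead of cron, the throttling could be sketched like this; the 50-per-180-seconds numbers come from the question, and set_time_limit(0) deals with the timeout issue mentioned above.
<?php
set_time_limit(0); // a long-running job must not hit the default execution time limit
// Sleep out the rest of the window once $limit requests were sent within $window seconds.
function throttle(int &$sent, float &$windowStart, int $limit = 50, int $window = 180): void
{
    if ($sent < $limit) {
        return;
    }
    $elapsed = microtime(true) - $windowStart;
    if ($elapsed < $window) {
        sleep((int) ceil($window - $elapsed));
    }
    $sent        = 0;               // start a new window
    $windowStart = microtime(true);
}
// Usage: call throttle($sent, $windowStart) and then $sent++ before every curl_exec().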

Okay. I misinterpreted what you needed. I have more questions.
Can you do one request and get your 50 records immediately? That is assuming when you said 50 requests per 3 minutes you meant 50 records.
Why do you think there is this 50/3 limitation?
Can you provide a link to this service?
Is that 50 records per IP address?
Is leasing 5 or 6 IP addresses an option?
Do you pay for each record?
How many records does this service have total?
Do the records have a time limit on their viability?
I am thinking that if you can use 6 IP addresses (or 6 processes) you can run the 6 requests simultaneously using stream_socket_client(), which allows you to make simultaneous requests. You then create a loop that monitors each socket for a response.
About 10 years ago I made an app that evaluated web page quality. I ran
W3C Markup Validation
W3C CSS Validation
W3C Mobile OK
WebPageTest
My own performance test.
I put all the URLs in an array like this:
$urls = array();
$path = $url;
$url = urlencode("$url");
$urls[] = array('host' => "jigsaw.w3.org",'path' => "/css-validator/validator?uri=$url&profile=css3&usermedium=all&warning=no&lang=en&output=text");
$urls[] = array('host' => "validator.w3.org",'path' => "/check?uri=$url&charset=%28detect+automatically%29&doctype=Inline&group=0&output=json");
$urls[] = array('host' => "validator.w3.org",'path' => "/check?uri=$url&charset=%28detect+automatically%29&doctype=XHTML+Basic+1.1&group=0&output=json");
Then I'd make the sockets.
foreach($urls as $path){
    $host = $path['host'];
    $path = $path['path'];
    $http = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    $stream = stream_socket_client("$host:80", $errno, $errstr, 120, STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
    if ($stream) {
        $sockets[] = $stream; // supports multiple sockets
        $start[] = microtime(true);
        fwrite($stream, $http);
    }
    else {
        $err .= "$id Failed<br>\n";
    }
}
Then I monitored the sockets and retrieved the response from each socket.
while (count($sockets)) {
    $read = $sockets;
    stream_select($read, $write = NULL, $except = NULL, $timeout);
    if (count($read)) {
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $buffer_size);
            if (strlen($data) == 0) {
                // echo "$id Closed: " . date('h:i:s') . "\n\n\n";
                $closed[$id] = microtime(true);
                fclose($r);
                unset($sockets[$id]);
            }
            else {
                $result[$id] .= $data;
            }
        }
    }
    else {
        // echo 'Timeout: ' . date('h:i:s') . "\n\n\n";
        break;
    }
}
I used it for years and it never failed.
It would be easy to gather the records and paginate them.
After all sockets are closed you can gather the pages and send them to your user.
Do you think the above is viable?
JS is not better.
Or did you mean 50 records each 3 minutes?
This is how I would do the pagination.
I'd organize the response into pages of 25 records per page.
In the query results while loop I'd do this:
$cnt = 0;
$page = 0;
while(...){
    $cnt++;
    $response[$page][] = $record;
    if($cnt > 24){ $page++; $cnt = 0; }
}
header('Content-Type: application/json');
echo json_encode($response);

Related

Multi-threaded ajax call for PHP

I am trying to make a web app that will figure out whether one or more e-commerce items are out of stock from their URL(s) entered by the user. These URLs can be separated by commas. Currently, I make an ajax call to one of my PHP scripts for each URL after splitting them by comma in a JavaScript loop. Below is the code for that:
function sendRequest(urls) {
if (urls.length == 0) {
return;
} else {
var A = urls.split(',');
for (var i = 0; i < A.length; i++) {
var xmlhttp = new XMLHttpRequest();
xmlhttp.onreadystatechange = function () {
if (this.readyState == 4 && this.status == 200) {
var result_set = JSON.parse(this.responseText);
if (result_set.flag == 1) {
insertRow('stock-table', result_set.url, result_set.title); // It populates a table and insert row in it.
}
}
};
xmlhttp.open("GET", "scrapper.php?url=" + A[i], true);
xmlhttp.send();
}
}
}
The scrapper.php goes like this:
<?php
function get_title($data)
{
$title = preg_match('/<title[^>]*>(.*?)<\/title>/ims', $data, $matches) ? $matches[1] : null;
return $title;
}
if (!isset($_SESSION['username'])) {
header("Location: index.php");
}
else if (isset($_GET["url"])) {
$url = $_GET["url"];
$title = null;
$result_set = null;
$flag = 0;
$file = fopen($url,"r");
if (!$file) {
echo "<p>Unable to open remote file.\n";
exit;
}
while (!feof($file)) {
$line = fgets($file, 1024);
if ($title == null){
$title = get_title($line);
}
if (preg_match('/<span[^>]*>Add to Cart/i',
$line, $matches, PREG_OFFSET_CAPTURE)) {
break;
}
if (preg_match('/Sold Out|Out of Stock/i',
$line, $matches, PREG_OFFSET_CAPTURE)) {
$flag = 1;
break;
}
}
fclose($file);
$result_set = array("flag" => $flag,
"url" => $url,
"title" => $title
);
echo json_encode($result_set);
}
?>
Now the problem is: this program takes too much time even for two URLs, although I moved from file_get_contents() (which was even slower) to the fopen() solution shown here. I have a few questions in my mind:
Considering my JavaScript, is it like sending one ajax call, waiting for its response, and then sending the second one?
If point 1 is not true, will scrapper.php be able to respond to the second call from the loop, since it is busy handling the first ajax call's computation?
If point 2 is true, how can I make it multi-threaded so that ajax keeps sending calls until the loop is finished and scrapper.php handles each call in a different thread, replying back to the client once a thread completes its execution? (How can I make a pool of a limited number of threads and grant a new ajax response once a thread completes its execution? I have 200 URLs, so making 200 threads can't be the optimal solution.)
Is it a good solution to insert all URLs (around 200) into the database and then fetch all of them to make multi-threaded executions? In that case, how can I reply back with multiple results from multiple threads against a single ajax call?
Please help.
No. XMLHttpRequest defaults to async, which means every new request made this way will execute in parallel.
Completely depends on how you're running PHP. In typical setups (and it's unlikely you're doing otherwise), your HTTP server will wait for an available PHP worker from a pool, or execute a PHP binary directly. Either way, more than one PHP program can execute at once. (Think about how a regular website works: you need to be able to support more than one user at a time.)
N/A
If I'm understanding correctly, you just want to handle all requests in one Ajax call? Just send a list of all the URLs in the request and loop server-side, as sketched below. Your current way of doing it is fine; most likely the "slow" nature can be attributed to your connection to the remote URLs.
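A rough sketch of that batch version is below; the comma-separated urls parameter and the check_stock() helper (wrapping the existing per-URL logic) are placeholders, not part of the original code.
<?php
// Hypothetical batch endpoint: loop over all URLs server-side in one request.
$results = [];
foreach (explode(',', $_GET['urls'] ?? '') as $url) {
    $url = trim($url);
    if ($url === '') {
        continue;
    }
    $results[] = check_stock($url); // placeholder for the existing scraping logic
}
header('Content-Type: application/json');
echo json_encode($results);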
Some other notes:
I would validate the URL before passing it into fopen, especially considering the user can simply pass in a relative path and start reading your "private" files (see the sketch after this list).
I'd switch back to file_get_contents. It's pretty much equivalent to fopen but does much of the work for you.
Not sure if intentional, but I'd use the newer const keyword instead of var for the XMLHttpRequest variable in the for loop's inner block. Currently, the var gets hoisted to the top of the function scope and you are simply overwriting it every iteration of the loop. If you want to add more logic to the XMLHttpRequest, you may find yourself prone to some unintentional behaviour.
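A minimal sketch of the URL validation mentioned above, assuming only absolute http/https URLs should ever reach fopen():
<?php
// Reject anything that is not an absolute http(s) URL before it reaches fopen().
function is_safe_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // not a well-formed absolute URL
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
if (!is_safe_url($_GET['url'] ?? '')) {
    http_response_code(400);
    exit('Invalid URL');
}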

allow PHP script with long execution time to send updates back to the browser

I looked over a few of the questions, namely
Show progress for long running PHP script
How do you run a long PHP script and keep sending updates to the browser via HTTP?
and neither one seems to answer my question, part of which seems to be "how do I do this?" and the other half is "Hey, the way I'm doing this right now - is it the best way? Could I code this better?"
I have a simple ajax script that sends some data over to a PHP script:
$.ajax({
    type: 'POST',
    url: 'analysis.php',
    data: { reportID:reportID, type:type, value:value, filter_type:filter_type, filter_value:filter_value, year:year },
    success: function(dataReturn){
        analysis_data = JSON.parse(dataReturn);
        /* do stuff with analysis_data... */
    }
});
This PHP script takes about 3 minutes to run, as it loops through a database and runs some pretty complex queries:
<?php
session_start();
ob_start();
ini_set('max_execution_time', 180);
$breaks = [ 1000, 2000, 4000, 6000, 8000, 10000, 20000, 50000, 99999999 ];
$breaks_length = count($breaks);
$p = 0;
foreach ( $breaks as $b ) {
$p++;
$percentage_complete = number_format($p / $breaks_length,2) . "%";
$sql = "query that takes about 20 seconds to run each loop of $b....";
$query = odbc_exec($conn, $sql);
while(odbc_fetch_row($query)){
$count = odbc_result($query, 'count');
}
$w[] = $count;
/* tried this... doesn't work as it screws up the AJAX handler success which expects JSON
echo $percentage_complete;
ob_end_flush();
*/
}
echo json_encode($w);
?>
All of this works - but what I'd really like to do is find a way after each foreach loop, to output $percentage_complete back to the user so they can see it working, instead of just sitting there for 2 minutes with a FontAwesome icon spinning in front of them. I tried using ob_start();, but not only does it not output anything until the page is done running, it echoes the value, which is then part of what is sent back to my AJAX success handler, causing it to screw up. (I need the output in a JSON_encoded format as I use it for something else later.)
So far in threads I've read, my only thought is to start the $breaks array loop on the previous page, so instead of looping 6 times on the same page, I loop once, return an answer, then call analysis.php again using the second element of the $breaks array, but I'm not sure this is the best way to go about things.
Also - during the 3 minutes that the user is waiting for this script to execute, they cannot do anything else on the page, so they just have to sit and wait. I'm sure there's a way to get this script to execute in such a way it doesn't "lock down" the rest of the server for the user, but everything I've searched for in Google doesn't give me a good answer for this as I'm not sure exactly what to search for...
You are encountering what is known as Session Locking. Basically, PHP will not process another request that calls session_start() for the same session until the first request has finished.
The immediate fix to your issue is to remove session_start(); from line #1 completely because I can see that you do not need it.
Now, for your question about showing a percentage on-screen:
analysis.php (modified)
<?php
ob_start();
ini_set('max_execution_time', 180);
$breaks = [ 1000, 2000, 4000, 6000, 8000, 10000, 20000, 50000, 99999999 ];
$breaks_length = count($breaks);
$p = 0;
foreach ( $breaks as $b ) {
$p++;
session_start();
$_SESSION['percentage_complete'] = number_format($p / $breaks_length,2) . "%";
session_write_close();
$sql = "query that takes about 20 seconds to run each loop of $b....";
$query = odbc_exec($conn, $sql);
while(odbc_fetch_row($query)){
$count = odbc_result($query, 'count');
}
$w[] = $count;
/* tried this... doesn't work as it screws up the AJAX handler success which expects JSON
echo $percentage_complete;
ob_end_flush();
*/
}
echo json_encode($w);
check_analysis_status.php (get your percentage with this file):
<?php
session_start();
echo (isset($_SESSION['percentage_complete']) ? $_SESSION['percentage_complete'] : '0%');
session_write_close();
Once your AJAX makes a call to analysis.php then just call this piece of JS:
// every half second call check_analysis_status.php and get the percentage
var percentage_checker = setInterval(function(){
$.ajax({
url: 'check_analysis_status.php',
success:function(percentage){
$('#percentage_div').html(percentage);
// Once we've hit 100% then we don't need this no more
if(percentage === '100%'){
clearInterval(percentage_checker);
}
}
});
}, 500);
I have done this a couple different ways, but the pattern I like the best is to have three scripts (or one controller to handle all of this), analysis_create.php, analysis.php, and analysis_status.php. The key is to create a DB object that you reference in your status checks (analysis_status.php). analysis_create.php will store all the data in the post into a DB table that will also have a column for percent_complete. The analysis_create.php function should return an ID/Token for the analysis. Once the front-end has the ID, it would post to analysis.php and then after a delay (250ms) kill the request, because you don't want to wait for it to finish. analysis.php should read the data out of the DB and start doing the work. You will need to make sure ignore_user_abort is set properly in your analysis.php script. Once the request to analysis.php is killed, you will start long polling to analysis_status.php with that ID. As analysis.php is working through the query, it should be updating the corresponding DB record with the percentage complete. analysis_status.php should look up this record and return the percentage complete to the front end.
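A rough sketch of what analysis_status.php could look like under that pattern; the DSN, table name and column names are assumptions for the sketch.
<?php
// Hypothetical analysis_status.php: return the percent_complete stored by analysis.php.
$id  = (int) ($_GET['id'] ?? 0);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT percent_complete FROM analysis_jobs WHERE id = ?');
$stmt->execute([$id]);
$percent = $stmt->fetchColumn();
header('Content-Type: application/json');
echo json_encode(['percent_complete' => $percent === false ? 0 : (float) $percent]);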
I ran into the same issue, but what caused it is different from what people are suggesting here.
The reason was that gzip was enabled, leading to a type of session locking even without an actual session.
There are several ways to disable it for one specific file:
How to disable mod_deflate in apache2?
Put this in httpd.conf
SetEnvIfNoCase Request_URI getMyFile\.php$ no-gzip dont-vary

Execute imap_close not working

I have a page with customers, and with ajax I'm loading info on whether they have sent us an email or not.
Code looks like this:
$hostname = '{imap.gmail.com:993/imap/ssl}INBOX';
$username = 'email';
$password = 'password';
$this->session->data['imap_inbox'] = $inbox = imap_open($hostname,$username,$password) or die('Cannot connect to Gmail: ' . imap_last_error());
foreach($customers as $customer){
$emails = imap_search($inbox, 'FROM ' . $email);
// Processing info
}
But there are roughly 20-30 customers on one page, so the process sometimes takes about 10-20 seconds to finish, and I was unable to optimize it.
When the client tries to reload the page, it still waits for imap_search to finish, so reloading can take 20 seconds before the page actually reloads.
I have tried to abort the ajax with a beforeunload function and close the imap connection, but this is not working.
My code:
Ajax:
$(window).bind('beforeunload',function(){
imap_email.abort(); // the ajax is succesfully aborted(as showed in console), yet the page still takes considerable time to reload
$.ajax({
type: 'GET',
url: 'getimapmails&kill=1',
async:false
}); // ajax call to the same function to call imap_close
});
PHP:
if($this->request->get['kill'] == '1'){
imap_close($this->session->data['imap_inbox']);
unset($this->session->data['imap_inbox']);
$kill == 1;
exit;
}
But even though the ajax is aborted and imap_close is called on variable holding imap_open, it still takes 10-20 seconds for page to reload, so I'm assuming the imap was not closed.
How do I close the imap so the page can reload immediately?
I would recommend killing it by creating a file that causes a break:
$hostname = '{imap.gmail.com:993/imap/ssl}INBOX';
$username = 'email';
$password = 'password';
$this->session->data['imap_inbox'] = $inbox = imap_open($hostname,$username,$password) or die('Cannot connect to Gmail: ' . imap_last_error());
foreach($customers as $customer){
clearstatcache(); //Can't use the cached result.
if(file_exists('/tmp/kill_imap.'.$this->session->id)) break; //making the assumption that /tmp and session->id are set, but the idea is a temporary folder and a unique identifier to that session.
$emails = imap_search($inbox, 'FROM ' . $email);
// Processing info
}
if(file_exists('/tmp/kill_imap.'.$this->session->id)) unlink('/tmp/kill_imap.'.$this->session->id);
Then on your exit ajax, just call a PHP script that simply creates that file; it will break your loop, and the loop will remove the file afterwards.
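That kill script can be tiny; a sketch, assuming a plain PHP session (with the framework session object shown above you would use its id instead):
<?php
// Hypothetical kill script: create the flag file the IMAP loop checks for.
session_start();
touch('/tmp/kill_imap.' . session_id());
session_write_close();
echo 'ok';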
If I understood correctly, the time-consuming code lies within the foreach() loop.
Now, even if you make a second request to kill the IMAP session, that foreach() loop will continue until it either finishes or PHP kills it if (and when) execution time exceeds your max_execution_time setting.
In any case, you need something within your foreach() loop that checks on each iteration whether a condition to abort has been met, so as to swiftly terminate the current request and allow the client to make a new one.
I suggest you look at the PHP function connection_aborted(), which you could use to detect when the client aborts the current request, and more generally you could read up on the topic of connection handling to get a better sense of how connections and requests are handled in PHP.
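A minimal sketch of such a check inside the loop is below. Note that PHP usually only notices a dropped connection when it tries to send output, so the sketch emits a heartbeat byte before checking; whether that is acceptable depends on what the endpoint normally returns.
<?php
ignore_user_abort(true); // keep control in PHP instead of being terminated mid-output
foreach ($customers as $customer) {
    echo ' ';            // heartbeat byte so PHP can detect a client disconnect
    flush();
    if (connection_aborted()) {
        imap_close($inbox);
        exit;            // client is gone, stop the expensive IMAP work
    }
    $emails = imap_search($inbox, 'FROM ' . $email);
    // Processing info
}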

What is causing a very slow ajax response?

I wrote some PHP code to help me connect to a REST API for a telephone system (i.e. ICWS.php).
Then, to make my life easier, I wrote a small script (i.e. interations.php) that accepts two parameters, a method and an ID. This script basically calls a public method in my PHP connector.
In addition, I have another script (i.e. poll.php) that pings the API once every half second to see if there is a new message available. I am using server-side polling to handle this. The code below shows how poll.php works:
while(1){
//process Messages
$icws->processMessages();
//show the Calls Queue
$result = $icws->getCallsQueue();
//$current = $icws->getCurrentUserStatusQueue();
echo 'event: getMessagingQueue' . "\n";
echo 'data: ' . json_encode( array('calls' => $result));
echo "\n\n"; //required
ob_flush();
flush();
putToSleep($sleepTime);
}
function putToSleep($val = 1){
if( is_int($val) ){
sleep($val);
} else {
usleep($val * 1000000);
}
}
From my site (ie. phonesystem.html) I start server-side polling "which pings the API once every 1/2 seconds." From the same page, I can also make other direct calls (ie. Dial 7204536695); all requests are done via Ajax.
Here is my code that generates the server-side polling
//Server Side Message Polling
function startPolling(){
var evtSource = new EventSource("poll.php");
evtSource.addEventListener("getMessagingQueue", function(e) {
var obj = JSON.parse(e.data);
if(!obj.calls || obj.calls.length === 0){
console.log('no messages');
phoneKeyPad(false);
return;
}
processMessages(obj.calls);
}, false);
}
$(function(){
startPolling();
});
The problem that I am facing is that when making an ajax call, the response takes way too long (over 1 minute).
It also seems that the Apache server slows down, so using other applications becomes a little slower.
What can I check, and how can I troubleshoot this problem?

enhancing php script performance

(The subject of this question might not match the question itself, but I couldn't think of a better one.) I have a webpage where the user provides the email addresses of recipients; there can be 100 or more addresses, delimited by ;, provided in the textarea. Of course I have to send an email to all those addresses. I have 2 approaches in mind but couldn't decide which one would provide better user experience and performance.
Approach 1: I loop through all those emails in my JS and send an ajax request to the PHP script for each one. But then there would be 100 requests to the server, and if the user closes the browser in between, not all email addresses will go through.
Approach 2: I send all 100 email addresses in one go to the PHP script and let the PHP script loop through the emails. I am assuming that I would be able to echo some message back to the client after each loop iteration, and even if the client is gone, at least PHP will keep executing until the loop ends.
Can somebody please give me the pros and cons of these 2 approaches?
Here is an idea on how to implement a queue.
define('MAX_EMAIL_BUFFER_SIZE', 15);
// Do a query to see how many emails need to be sent; store this data
// in MySQL or some other place beforehand.
// array getEmails() { }
$total = count(getEmails());
$pages = ceil($total / MAX_EMAIL_BUFFER_SIZE);
for ($page = 0; $page < $pages; $page++) {
    $offset = $page * MAX_EMAIL_BUFFER_SIZE;
    /* query
       SELECT *
       FROM table
       ORDER BY name
       LIMIT MAX_EMAIL_BUFFER_SIZE
       OFFSET $offset
    */
    // The rows returned by the query are the emails you will send in this batch.
    // Run the above query in a function that returns the results as $data.
    foreach ($data as $email) {
        mail(...);
    }
    // Sleep for 10 seconds between batches.
    sleep(10);
}
