Loading CSVs Periodically using AJAX - javascript

I'm trying to load multiple CSV files every 5 seconds to update some displays in the Google Maps API v3, but only one or two ever load, never all of the files.
Here is the code:
setInterval(function() {
checkLaneStatus();
initMap();
}, 2000);
function checkLaneStatus(){
laneStatus('landing_lane.csv',landing_lane);
laneStatus('landing_curve.csv',landing_curve);
laneStatus('arrival_lane_1.csv',arrival_lane_1);
laneStatus('arrival_lane_2.csv',arrival_lane_2);
laneStatus('arrival_lane_3.csv',arrival_lane_3);
laneStatus('arrival_lane_4.csv',arrival_lane_4);
laneStatus('t1.csv',terminal1);
laneStatus('t2.csv',terminal2);
laneStatus('t3.csv',terminal3);
laneStatus('t4.csv',terminal4);
laneStatus('departure_lane_1.csv',departure_lane_1);
laneStatus('departure_lane_2.csv',departure_lane_2);
laneStatus('departure_lane_3.csv',departure_lane_3);
laneStatus('departure_lane_4.csv',departure_lane_4);
laneStatus('departure_curve.csv',departure_curve);
laneStatus('departure_lane.csv',departure_lane);
}
function laneStatus(file,lane){
var temp = lane;
$.ajax({
type: "GET",
url: file,
dataType: "text",
success: function(data) {processData(data);}
});
function processData(allText) {
console.log(allText);
var allTextLines = allText.split(/\r\n|\n/);
var entries = allTextLines[0].split(',');
if(entries[2] != -1){
temp.setOptions({strokeColor: colorRed, fillColor: colorRed});
}
}
}
CSV file example (id,name,status): a status of -1 means that the lane is free; any other number means that it is busy.
1,'Departure Lane 1',-1
The intended behavior is: load the same CSVs each interval and update the display colors, so all the CSVs should be checked to detect changes. But only one CSV gets loaded each interval, so only that "lane" is updated.
The laneStatus function receives the location of the CSV file, which is located in the root (the same folder as the index.html where the code runs). The "lane" argument is a google.maps.Rectangle object.
I hope I explained it well; I would appreciate any reply!
Thanks!

You are likely hitting your browser's limit on simultaneous requests. Because you cannot control your end users' browser configuration, what about implementing a simple queue that is just a first-in, first-out array? Add each request to the queue, and have a second interval that fires one of those requests every 500 milliseconds or so until the queue is empty.
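A minimal sketch of that idea, reusing the jQuery call from the question (the 5000/500 ms timings and the per-lane success handler are placeholders, not a tested drop-in):
var requestQueue = [];

// Every polling interval, queue the work instead of firing 16 requests at once.
setInterval(function() {
    requestQueue.push({ file: 'landing_lane.csv', lane: landing_lane });
    requestQueue.push({ file: 'departure_lane.csv', lane: departure_lane });
    // ...push the remaining lanes the same way
}, 5000);

// A second interval drains the queue one request at a time.
setInterval(function() {
    var next = requestQueue.shift();
    if (!next) return; // queue is empty, nothing to do
    $.ajax({
        type: "GET",
        url: next.file,
        dataType: "text",
        success: function(data) {
            var entries = data.split(/\r\n|\n/)[0].split(',');
            if (entries[2] != -1) {
                next.lane.setOptions({ strokeColor: colorRed, fillColor: colorRed });
            }
        }
    });
}, 500);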
If you are in control of the data, could you combine some of the files, with one more column for which lane each row corresponds to? Or even write a simple middleware layer that combines them; it would speed up the page quite a bit, as it only has to open the one connection.
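For example, if all the lanes were merged into one hypothetical combined_status.csv with a lane key in the second column, a single request per interval would be enough, roughly:
// Map of CSV lane keys to the google.maps.Rectangle objects from the question.
// The key names here are an assumption about how the combined file would be written.
var lanes = {
    landing_lane: landing_lane,
    departure_lane: departure_lane
    // ...and so on for the other rectangles
};

$.ajax({
    type: "GET",
    url: "combined_status.csv", // hypothetical merged file: id,lane_key,status per line
    dataType: "text",
    success: function(allText) {
        allText.split(/\r\n|\n/).forEach(function(line) {
            if (!line) return;
            var entries = line.split(',');
            var lane = lanes[entries[1]];
            if (lane && entries[2] != -1) {
                lane.setOptions({ strokeColor: colorRed, fillColor: colorRed });
            }
        });
    }
});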
I also like using web sockets (socket.io) if you are going to pull data that often, as they keep a persistent connection across requests.
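Something like this on the client, assuming the server emits a made-up 'laneStatus' event whenever a lane changes:
// Assumes the socket.io client script is already included on the page.
var socket = io();

socket.on('laneStatus', function(msg) {
    // msg shape is an assumption, e.g. { lane: 'departure_lane_1', status: 3 }
    var lane = window[msg.lane];
    if (lane && msg.status != -1) {
        lane.setOptions({ strokeColor: colorRed, fillColor: colorRed });
    }
});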

Related

Web Scraping interactive map (javascript) with R and PhantomJS

I am trying to scrape data from an interactive map (looking to get crime data for a county). I am using R (rvest) and trying to use phantomjs too. I'm new to web scraping, so I don't really understand yet how all the elements work together (trying to get there).
The problem I believe I am having is that after I run phantomjs and load the html using R's rvest package, I end up with more scripts and no clear data in the html. My code is below.
writeLines("var url = 'http://www.google.com';
var page = new WebPage();
var fs = require('fs');
page.open(url, function (status) {
just_wait();
});
function just_wait() {
setTimeout(function() {
fs.write('cool.html', page.content, 'w');
phantom.exit();
}, 2500);
}
", con = "scrape.js")
A function that takes in the url that I want to scrape
s_scrape <- function(url = "https://gis.adacounty.id.gov/apps/crimemapper/",
js_path = "scrape.js",
phantompath = "/Users/alihoop/Documents/phantomjs/bin/phantomjs"){
# this section will replace the url in scrape.js to whatever you want
lines <- readLines(js_path)
lines[1] <- paste0("var url ='", url ,"';")
writeLines(lines, js_path)
command = paste(phantompath, js_path, sep = " ")
system(command)
}
Execute the s_scrape() function and get an html file saved as "cool.html":
s_scrape()
Where I am not understanding what to do next is the below R code:
map_data <- read_html('cool.html') %>%
html_nodes('script')
The output I get in the HTML via phantomjs is just scripts again. I'm looking for help on how to proceed when what I'm faced with (in my mind) is javascript nested inside javascript scripts(?)
Thank you!
This site uses javascript to make queries to the server. One solution is to reproduce the REST request and read the returned JSON file directly. This avoids the need to use Phantomjs.
In your browser's developer tools, looking through the XHR requests, you will find one or more requests named "query" with a link similar to: "https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/11/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000"
Read this JSON response directly and convert it to a list using the jsonlite package:
library(jsonlite)
output<-jsonlite::fromJSON("https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/11/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000")
output$features
Find the first number in the link (11 in this case): "FeatureServer/11/query?f=json". This number determines which crime type to query the server for. I found it can take values from 0 to 11. Enter 0 for arson, 4 for drugs, 11 for vandalism, etc.
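If you ever want to hit the same endpoint from the browser instead of R, the idea is identical; a quick (untested) JavaScript sketch with the layer number pulled out into a variable:
// 0 = arson, 4 = drugs, 11 = vandalism, per the mapping described above.
var layer = 4;
var url = "https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/"
        + layer
        + "/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects"
        + "&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000";

fetch(url)
    .then(function(response) { return response.json(); })
    .then(function(json) { console.log(json.features); });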

Node.js: requesting a page and allowing the page to build before scraping

I've seen some answers to this that refer the asker to other libraries (like phantom.js), but I'm here wondering if it is at all possible to do this in just node.js?
Consider my code below. It requests a webpage using request, then explores the DOM with cheerio to scrape the page for data. It works flawlessly, and if everything had gone as planned, I believe it would have output a file as I imagined in my head.
The problem is that the page I am requesting builds the table I'm looking at asynchronously, using either AJAX or JSONP; I'm not entirely sure how .jsp pages work.
So here I am trying to find a way to "wait" for this data to load before I scrape the data for my new file.
var cheerio = require('cheerio'),
request = require('request'),
fs = require('fs');
// Go to the page in question
request({
method: 'GET',
url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
if (err) return console.error(err);
// Tell Cheerio to load the HTML
var $ = cheerio.load(body);
// Create an empty object to write to the file later
var toSort = {}
// Iterate over DOM and fill the toSort object
$('#emb table td.list_right').each(function() {
var row = $(this).parent();
toSort[$(this).text()] = {
[$("#lastdate").text()]: $(row).find(".idx1").html(),
[$("#currdate").text()]: $(row).find(".idx2").html()
}
});
//Write/overwrite a new file
var stream = fs.createWriteStream("/tmp/shipping.txt");
var toWrite = "";
stream.once('open', function(fd) {
toWrite += "{\r\n"
for(i in toSort){
toWrite += "\t" + i + ": { \r\n";
for(j in toSort[i]){
toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
}
toWrite += "\t" + "}, \r\n";
}
toWrite += "}"
stream.write(toWrite)
stream.end();
});
});
The expected result is a text file with information formatted like a JSON object.
It should look something like different instances of this
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
 "2016-09-29": 26.7,
"2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load asynchronously (the dates and values are async), I get a messed-up object.
I actually tried just setting a setTimeout in various places in the code. The script will only be touched by developers who can afford to run the script several times if it fails a few times. So while not ideal, even a setTimeout (up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of the page "as is" when I receive it, and I'm in fact not looking at a live thing that I can wait on to load its dynamic content.
I've considered investigating how to intercept the packets as they come in, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem here is that you are making a request against this url:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns the html, and in that html there are the js and css imports. That is the end of the story for your script: you just have the html and that's it. The browser, by contrast, knows how to parse the html document, understand the javascript it references, and execute/run it, and this is exactly your problem: your program does nothing with the scripts referenced by the HTML. You need to find or write a scraper that is able to run javascript. I just found this similar issue on stackoverflow:
Web-scraping JavaScript page with Python
The guy there suggests https://github.com/niklasb/dryscrape and it seems that this tool is able to run javascript. It is written in python though.
You are trying to scrape the original page that doesn't include the data you need.
When the page is loaded, browser evaluates JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, like PhantomJS does.
The other (and the one you seem to be interested in) is to investigate the page's network activity and understand what additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter to prevent caching.
jc is the name of a JS wrapper function which should be invoked with the result (https://en.wikipedia.org/wiki/JSONP)
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing the two additional requests, you will be able to combine them into the same data structure you see in the browser.
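A rough sketch of that second approach, using the same request module as the question (the jc callback name below is arbitrary; whether the server accepts an arbitrary value is an assumption worth verifying):
var request = require('request');

// Fetch a JSONP endpoint and strip the wrapper to get plain JSON.
function fetchJsonp(url, done) {
    request(url, function(err, response, body) {
        if (err) return done(err);
        // Body looks like jsonp1475577615267({...}); keep only what is inside the parentheses.
        var json = body.slice(body.indexOf('(') + 1, body.lastIndexOf(')'));
        done(null, JSON.parse(json));
    });
}

var url = 'http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast'
        + '?SpecifiedDate=&jc=jsonp1&_=' + Date.now();

fetchJsonp(url, function(err, data) {
    if (err) return console.error(err);
    console.log(data); // the dates and values that never appear in the scraped HTML
});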

How to find specific cache entries in firefox and turn them into a File or Blob object?

I have the following scenario:
A user can paste html content in a wysiwyg editor. When that pasted content contains images which are hosted on other domains, I want these to be uploaded to my server. Right now the only way of doing that is manually downloading via "save image as..." context menu, then uploading the image to the server via a form and updating the images in the editor.
I have to solve this client side.
I'm working on a firefox addon that can automate the process. Of course I could download these images, store them on the hard drive and then upload them with FormData (or better, pupload), but this seems clumsy: since the content is displayed in the browser, it must already have been downloaded and reside somewhere in memory. I would like to grab the image files from memory and tell firefox to upload them (being able to make a Blob of them would suffice, it seems).
However, I'm getting hopelessly lost in the API documentation for the several different caching systems on MDN and can't find any example code showing how to use them. I checked the code of other addons that access the cache, but most of it is uncommented and still quite cryptic.
Can you point me to some sample code for the recommended way to achieve this? The best possible solution would be if I could request the particular url from firefox so I can use it in FormData: if it isn't in the cache, firefox downloads it to memory, and if it's already there, I just get it directly.
The master documentation for Mozilla's version 2 HTTP Cache is located here. Aside from the blurbs on this page, the only way I was able to make sense of this new scheme is by looking at the actual code for each object and back-referencing almost everything. Even though I wasn't able to get a 100% clear picture of what exactly was going on, I figured out enough to get it working. In my opinion, Mozilla should have taken the time to create some simple-terms documentation before they went ahead and pushed out the new API. But, we get what they give us, I suppose.
On to your problem. We're assuming that the users who want to upload an image already have this image saved in their cache somewhere. In order to be able to pull it out of the user's cache for upload, you must first be able to determine the URI of the image before it can be pulled explicitly from the cache. For the sake of brevity, I'm going to assume that you already have this part figured out.
An important thing to note about the new HTTP Cache is that although it's all based on callbacks, there can still only ever be a single writing process. While in your example it may not be necessary to write to the descriptor, you should still request write access, since that will prevent any other process (i.e. the browser) from altering/deleting the data until you are done with it. Another side note, and a source of a lot of pain for me, was the fact that requesting a cache entry from the memory cache will ALWAYS create a new entry, overwriting any pre-existing entries. You shouldn't need this, but if it is necessary, you can access the memory cache through the disk cache (the disk cache is physical disk + memory cache -- Mozilla logic) without that side effect.
Once the URI is in hand, you can then make a request to pull it out of the cache. The new caching system is based completely on callbacks. There is one key object that we will need in order to be able to fetch the cache entry's data -- nsICacheEntryOpenCallback. This is a user-defined object that handles the response after a cache entry is requested. It must have two member functions: onCacheEntryCheck(entry, appcache) and onCacheEntryAvailable(descriptor, isnew, appcache, status).
Here is a cut-down example from my code of such an object:
var cacheWaiter = {
//This function essentially tells the cache service whether or not we want
//this cache descriptor. If ENTRY_WANTED is returned, the cache descriptor is
//passed to onCacheEntryAvailable()
onCacheEntryCheck: function( descriptor, appcache )
{
//First, we want to be sure the cache entry is not currently being written
//so that we can be sure that the file is complete when we go to open it.
//If predictedDataSize > dataSize, chances are it's still in the process of
//being cached and we won't be able to get an exclusive lock on it and it
//will be incomplete, so we don't want it right now.
try{
if( descriptor.dataSize < descriptor.predictedDataSize )
//This tells the nsICacheService to call this function again once the
//currently writing process is done writing the cache entry.
return Components.interfaces.nsICacheEntryOpenCallback.RECHECK_AFTER_WRITE_FINISHED;
}
catch(e){
//Also return the same value for any other error
return Components.interfaces.nsICacheEntryOpenCallback.RECHECK_AFTER_WRITE_FINISHED;
}
//If no exceptions occurred and predictedDataSize == dataSize, tell the
//nsICacheService to pass the descriptor to this.onCacheEntryAvailable()
return Components.interfaces.nsICacheEntryOpenCallback.ENTRY_WANTED;
},
//Once we are certain we want to use this descriptor (i.e. it is done
//downloading and we want to read it), it gets passed to this function
//where we can do what we wish with it.
//At this point we will have full control of the descriptor until this
//function exits (or, I believe that's how it works)
onCacheEntryAvailable: function( descriptor, isnew, appcache, status )
{
//In this function, you can do your cache descriptor reads and store
//it in a Blob() for upload. I haven't actually tested the code I put
//here, modifications may be needed.
var cacheentryinputstream = descriptor.openInputStream(0);
var blobarray = new Array(0);
var buffer = new Array(1024);
for( var i = descriptor.dataSize; i > 0; i -= 1024)
{
var chunksize = 1024;
if( i < 1024 )
chunksize = i;
try{
cacheentryinputstream.read( buffer, chunksize );
}
catch(e){
//Nasty NS_ERROR_WOULD_BLOCK exceptions seem to happen to me
//frequently. The Mozilla guys don't provide a way around this,
//since they want a responsive UI at all costs. So, just keep
//trying until it succeeds.
i += 1024;
continue;
}
for( var j = 0; j < chunksize; j++ )
{
blobarray.push(buffer.charAt(j));
}
}
var theblob = new Blob(blobarray);
//Do an AJAX POST request here.
}
};
Now that the callback object is set up, we can actually do some requests for cache descriptors. Try something like this:
var theuri = "http://www.example.com/image.jpg";
//Load the cache service
var cacheservice = Components.classes["@mozilla.org/netwerk/cache-storage-service;1"].getService(Components.interfaces.nsICacheStorageService);
//Select the default disk cache.
var hdcache = cacheservice.diskCacheStorage(Services.loadContextInfo.default, true);
//The IO service is needed to turn the URI string into an nsIURI.
var ioservice = Components.classes["@mozilla.org/network/io-service;1"].getService(Components.interfaces.nsIIOService);
//Request a cache entry for the URI. OPEN_NORMALLY requests write access.
hdcache.asyncOpenURI(ioservice.newURI(theuri, null, null), "", hdcache.OPEN_NORMALLY, cacheWaiter);
As far as actually getting the URI, you could provide a window for the user to drag-and-drop an image into, or perhaps just paste the URL of the image into. Then you could do an AJAX request to fetch the image (in case the user hasn't actually visited the image for some reason, it would then be cached). You could then use that URL to fetch the cache entry for upload. As an aesthetic touch, you could even show a preview of the image, but that's a bit out of scope of the question.
If you need any more clarifications, please feel free to ask!

Different external .js files with same variable names

I'm making a website that displays noise measurement data from different locations. The data for each location is captured on a sound level meter device and is then read with a windows-based application. The application then uploads the data to a web server as a .js file with an array variable in it. These .js files are refreshed every 5 minutes.
I first created a javascript application that displays live data for a single measuring unit. But now I need to display the data on a map for all the locations. The problem is that the windows application at each location creates a file with the same name and the same variable names, just for a different location, and I'm having some trouble reading the correct data.
This is what I did so far:
function removejscssfile(filename, filetype){
var targetelement=(filetype=="js")? "script" : (filetype=="css")? "link" : "none" //determine element type to create nodelist from
var targetattr=(filetype=="js")? "src" : (filetype=="css")? "href" : "none" //determine corresponding attribute to test for
var allsuspects=document.getElementsByTagName(targetelement)
for (var i=allsuspects.length; i>=0; i--){ //search backwards within nodelist for matching elements to remove
if (allsuspects[i] && allsuspects[i].getAttribute(targetattr)!=null && allsuspects[i].getAttribute(targetattr).indexOf(filename)!=-1)
allsuspects[i].parentNode.removeChild(allsuspects[i]) //remove element by calling parentNode.removeChild()
}
}
function updateData(){
var numberOfNoiseSniffers = noiseSniffers.length-1;
var j = 0;
for (i=0;i<=numberOfNoiseSniffers;i++) {
file = '../'+ noiseSniffers[i] + "/" + "CurrentMeasurement.js";
$.include(file,function(){
laeq[j] = currentMeas[1][1];
lastUpdate[j] = currentMeas[0][1];
if (j==numberOfNoiseSniffers){
updateMarkers();
}
removejscssfile(file[0], "js");
j++;
});
}
t=setTimeout(function() { updateData() }, 300000);
}
$(function (){
map = new google.maps.Map(document.getElementById("gMap"), myOptions);
//noiseSniffers is an array where I have save all the folder names of different measurement locations
var numberOfNoiseSniffers = noiseSniffers.length-1;
var j = 0;
for (i=0;i<=numberOfNoiseSniffers;i++) {
var file = '../'+ noiseSniffers[i] + "/" + "CurrentMeasurement.js";
//I am using include plugin for jquery to include files because it has a callback for when a file is actually loaded
$.include(file,function(){
//a set of global arrays that keep the data from the loaded file and this data is then displayed in google maps markers
laeq[j] = currentMeas[1][1];
lastUpdate[j] = currentMeas[0][2];
latitude[j] = systemstats[12][5];
longitude[j] = systemstats[11][6];
//checking to see if I am in the process of including the last file
if (j==numberOfNoiseSniffers){
//a function that creates google maps markers
createMarkers();
}
//after that I remove the files that were just included and read
removejscssfile(file, "js");
j++;
});
}
setTimeout(function() { updateData() }, 300000);
});
I got the function for removing my .js file here: Dynamically removing an external JavaScript or CSS file.
And this is the jquery plugin for loading the .js file: Include File On Demand.
The initial load usually works (sometimes only one marker, or none at all, gets loaded), but the update function mostly returns the same data for both locations.
So what I want to know is: first, how can I make my code work, and second, how can I optimize it? I posted just the main parts of the javascript code, but I can provide all of it if needed. Thanks for any help.
I think you need some sort of JSONP-like solution.
Basically, load the data on the server side, then wrap it in a method call before returning it to the client side. Your response should look something like this:
var location_data = [1,2,3,4]
updateLocation('location_id', location_data)
You define an updateLocation() function in your client-side script. Then, every time you need new data, you create a new 'script' tag with its src pointing at your server. When the response is loaded, your updateLocation() will be invoked with the correct params.
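A small sketch of what that could look like on the client (the data shape and the per-location path are assumptions for illustration):
// Defined once in the page; every generated .js response calls this.
function updateLocation(locationId, data) {
    // data is assumed to look like { laeq: 54.2, lastUpdate: '12:05' }
    laeq[locationId] = data.laeq;
    lastUpdate[locationId] = data.lastUpdate;
    updateMarkers();
}

// Whenever fresh data is needed, inject a new script tag pointing at the server.
function requestLocation(locationId) {
    var script = document.createElement('script');
    // Hypothetical per-location endpoint; the ?_= suffix just busts the browser cache.
    script.src = '../' + locationId + '/CurrentMeasurement.js?_=' + Date.now();
    script.onload = function() { script.parentNode.removeChild(script); };
    document.body.appendChild(script);
}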
I hope this is clear enough
You can maybe try some form of namespacing.
I didn't exactly understand your problem, but you may try this:
//put your code inside an anonymous function and execute it immediately
(function(){
//your javascript codes
//create variable with same names here
//
})();

jQuery Slimbox is not requesting files correctly

I am using jQuery slimbox with its API.
Here is my JavaScript that gets image paths via JSON, and then launches the slimbox via its API.
$('#main-container').append('<span id="check-our-adverts">Check our Adverts</span>');
var advertImages = [];
$.getJSON( config.basePath + 'get-adverts/', function(images) {
advertImages = images;
});
$('#check-our-adverts').click(function() {
console.log(advertImages);
$.slimbox(advertImages, 0);
});
The JSON is returning ["\/~wwwlime\/assets\/images\/adverts\/advert.jpg","\/~wwwlime\/assets\/images\/adverts\/advert2.jpg"].
The actual page is here. Click top red box next to the frog. If you have a console, check it for the JSON returned.
When I view the request using Live HTTP Headers, it seems slimbox is requesting vanquish.websitewelcome.com/ and nothing else.
This results in the slimbox being launched, with its throbber spinning forever.
What could be causing this problem? Thanks
Update
I added this inside the JSON callback
$.each(images, function(i, image) {
$('body').append('<a href="' + image + '">link</a>');
});
And clicking those links takes me directly to the image... what gives?
I am not 100% familiar with slimbox, but the API says that the method takes an array of arrays, so your return from JSON should, I believe, look more like:
[["\/~wwwlime\/assets\/images\/adverts\/advert.jpg"],["\/~wwwlime\/assets\/images\/adverts\/advert2.jpg"]]
making your call to slimbox:
$.slimbox( [["\/~wwwlime\/assets\/images\/adverts\/advert.jpg"],["\/~wwwlime\/assets\/images\/adverts\/advert2.jpg"]],0);
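If changing the server output isn't convenient, the flat array could also be reshaped on the client before handing it to slimbox, something like:
$('#check-our-adverts').click(function() {
    // Wrap each URL in its own one-element array: [url] (a caption could go in as a second entry).
    var slides = advertImages.map(function(src) { return [src]; });
    $.slimbox(slides, 0);
});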
let me know if that helps?
