I'm working on some changes to a page that needs to retrieve information from some files under /proc so the page can display version information to the user. Currently, the page is generated entirely by the Python script, which allows me to just read the file and put everything in the page at creation time.
However, this led to the issue that the version numbers wouldn't update when a new version of the software was uploaded. I don't want to regenerate the page every time a new package is installed, so I made the main page static and want to instead just query the information from a Python script and return it to the page to populate the page when loaded.
The Python scripts are set up as CGI and have sudo access, so there's no issue with them retrieving those files. However, if I wanted to use something like AJAX to call the Python script, is there any way I could return the data without using a REST framework such as Flask or Django? The application needs to be lightweight and preferably not rely on a new framework.
Is there a way I can do this with vanilla JavaScript and Python?
Ok, so the solution was fairly simple, I just made a few syntactical errors that led to it not working the first few times I tried it.
So the request looked like this:
window.onload = function() {
var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function() {
if((this.readyState == 4) && (this.status == 200)) {
var response = JSON.parse(this.responseText);
// Do stuff with the JSON here...
}
};
xhr.open("GET", scriptURL, true);
xhr.send();
}
From there, the Python script simply needed to do something like this to return JSON data containing my version numbers:
import sys, cgi, json
result = {}
result['success'] = True
result['message'] = "The command completed successfully"
d = {}
... write version information to the 'd' map ...
result['data'] = d
sys.stdout.write("Content-Type: text/plain\n\n")
sys.stdout.write(json.dumps(result))
sys.stdout.write("\n")
sys.stdout.close()
The most persistent problem that took me forever to find was I forgot a closing quotation in my script tag, which caused the whole page to not load.
Related
I am trying to scrape data from an interactive map (looking to get crime data for a county). I am using R (rvest) and trying to use phantomjs too. I'm new to web scraping so I am not really understanding how all the elements work together (trying to get there).
The problem I believe I am having is that after I run the phantomjs and upload the html using R's rvest package, I end up with more scripts and no clear data in the html. My code is below.
writeLines("var url = 'http://www.google.com';
var page = new WebPage();
var fs = require('fs');
page.open(url, function (status) {
just_wait();
});
function just_wait() {
setTimeout(function() {
fs.write('cool.html', page.content, 'w');
phantom.exit();
}, 2500);
}
", con = "scrape.js")
A function that takes in the url that I want to scrape
s_scrape <- function(url = "https://gis.adacounty.id.gov/apps/crimemapper/",
js_path = "scrape.js",
phantompath = "/Users/alihoop/Documents/phantomjs/bin/phantomjs"){
# this section will replace the url in scrape.js to whatever you want
lines <- readLines(js_path)
lines[1] <- paste0("var url ='", url ,"';")
writeLines(lines, js_path)
command = paste(phantompath, js_path, sep = " ")
system(command)
}
Execute the js_scrape() function and get a html file saved as "cool.html"
js_scrape()
Where I am not understanding what to do next is the below R code:
map_data <- read_html('cool.html') %>%
html_nodes('script')
The output I get in the HTML via phantomjs is just scripts again. Looking for help on how to proceed when faced (in my mind) is javascript nested in javascript scripts(?)
Thank you!
This site uses javascript to make queries to the server. One solution is to reproduce the rest request and read the returning JSON file directly. This avoids the need to use Phantomjs.
From the developer tools options from your browser and looking through the xhr files, you will find a file(s) named "query" with a link similar to: "https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/11/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000"
Read this JSON response directly and convert to a list with the use of the jsonlite package:
library(jsonlite)
output<-jsonlite::fromJSON("https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/11/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000")
output$features
Find the first number in the link, (11 in this case) "FeatureServer/11/query?f=json". This number will determine which crime to query the server with. I found, it can take a value from 0 to 11. Enter 0 for arson, 4 for drugs, 11 for vandalism, etc.
I've seen some answers to this that refer the askee to other libraries (like phantom.js), but I'm here wondering if it is at all possible to do this in just node.js?
Considering my code below. It requests a webpage using request, then using cheerio it explores the dom to scrape the page for data. It works flawlessly and if everything had gone as planned, I believe it would have outputted a file as i imagined in my head.
The problem is that the page I am requesting in order to scrape, build the table im looking at asynchronously using either ajax or jsonp, i'm not entirely sure how .jsp pages work.
So here I am trying to find a way to "wait" for this data to load before I scrape the data for my new file.
var cheerio = require('cheerio'),
request = require('request'),
fs = require('fs');
// Go to the page in question
request({
method: 'GET',
url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
if (err) return console.error(err);
// Tell Cherrio to load the HTML
$ = cheerio.load(body);
// Create an empty object to write to the file later
var toSort = {}
// Itterate over DOM and fill the toSort object
$('#emb table td.list_right').each(function() {
var row = $(this).parent();
toSort[$(this).text()] = {
[$("#lastdate").text()]: $(row).find(".idx1").html(),
[$("#currdate").text()]: $(row).find(".idx2").html()
}
});
//Write/overwrite a new file
var stream = fs.createWriteStream("/tmp/shipping.txt");
var toWrite = "";
stream.once('open', function(fd) {
toWrite += "{\r\n"
for(i in toSort){
toWrite += "\t" + i + ": { \r\n";
for(j in toSort[i]){
toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
}
toWrite += "\t" + "}, \r\n";
}
toWrite += "}"
stream.write(toWrite)
stream.end();
});
});
The expected result is a text file with information formatted like a JSON object.
It should look something like different instances of this
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
"2016-09-29": 26.7,
"2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load async, (the dates and values are async) I get a messed up object.
I tried Actually just setting a setTimeout in various places in the code. The script will only be touched by developers that can afford to run the script several times if it fails a few times. So while not ideal, even a setTimeout (up to maybe 5 seconds) would be good enough.
It turns out the settimeouts don't work. I suspect that once I request the page, I'm stuck with the snapshot of the page "as is" when I receive it, and I'm in fact not looking at a live thing I can wait for to load its dynamic content.
I've wondered investigating how to intercept the packages as they come, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem here is that you are making a request against this url:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns back the html and in this html there are the js and css imports. This is the end of your case, you just have the html and that's it. Instead the browser knows how to use and to parse the html document, so it is able to understand the javascript scripts and to execute/run them and this is exactly your problem. Your program is not able to understand that has something to do with the HTML contents. You need to find or to write a scraper that is able to run javascript. I just found this similar issue on stackoverflow:
Web-scraping JavaScript page with Python
The guy there suggests https://github.com/niklasb/dryscrape and it seems that this tool is able to run javascript. It is written in python though.
You are trying to scrape the original page that doesn't include the data you need.
When the page is loaded, browser evaluates JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, like PhantomJS do.
The other (and you seem to be interested in it) is to investigate the page's network activity and to understand what additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a decache parameter to prevent caching.
jc is a name of a JS wrapper function which should be invoked with the result (https://en.wikipedia.org/wiki/JSONP)
So, scrapping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing two additional requests you will be able to combine them into the same data structure you see in the browser.
I don't want solution using Node.js, FileReader, or whatever else exept javascript!
Developing the html page, I encountered a problem as follows:
I get accurate results with this procedure, unfortunately the procedure remembers result of the first login page. Whatever text file in the meantime change the content, the procedure returns the first result.
Can someone give advice!
var filePath = "../../dir/sub dir/text_file.txt";
function getBackData(filePath){
var axd, i, artx, txli, tdr;
if(window.XMLHttpRequest){
axd = new XMLHttpRequest();
}else{
axd = new ActiveXObject("Microsoft.XMLHTTP");
}
axd.open('GET', filePath, true);
axd.onreadystatechange = function(){
if(axd.readyState == 4 && axd.status == 200){
artx = axd.responseText;
txli = artx.split("\n");
for(i = 0; i < txli.length; i++){
alert(txli[i]);
}
}
}
axd.send(null);
}
You could try :axd setRequestHeader('Cache-Control', 'no-cache');
Or try: axd.open('GET', filePath+'?_=' + new Date().getTime()), true); This will prevent your server from using the cash, because each request is different.
This is probably because the browser cache it. If you send a parameter which is almost always different like timestamp, you can disable the cache. Or you can try to use POST, since post request are never cached.
You are not allowed to request local files directly with ajax - you need a server to serve them. The browser is sandboxed and cannot generally open local files. This is a security measure - imagine what would happen if any website was allowed to open your files!
There are ways to set up a simple server for your images, like http-server. This allows you to serve files directly off a chosen directory, like so:
npm install -g http-server
http-server path-to-text-files/
Then you can request the files normally with ajax, at a path relative to the one your server is serving, like so:
url = "/dir/subdir/text-file.txt";
...
ajax.open('GET', url, true);
...
ajax.send(null);
I would like to add a function in my javascript to write to a text file in the local directory where the javascript file is located. This means I'm not looking for some insecure way of accessing the user's file system in any way. All I care about is extracting the user's input into an html page that is accessed by my javascript then using that input as data externally. I just need a simple text file. This user input isn't actually text by the way, but rather a bunch of actions using my online game's components that the underlying javascript turns into a text string (so this particular string is what I want to save, not really even anything direct from the user).
I don't want to write to a user's file system, but rather, the file where the javascript (and html) code is located (a folder hosted on a server). Is there any simple way to get some file I/O going?
I know Javascript has a FileReader, is there any way to get it to do this in reverse? Like a FileWriter. GoogleClosure looks like it has a FileWriter, but it doesn't seem to quite work and I can't find any decent examples of how to get it to do this.
If this requires a different language, is there any way I can just get the relevant snippet and insert this into my Javascript file?
(the folder is hosted on a Linux system if that helps)
ADDENDUM: Elias Van Ootegem's solution below is excellent and I would highly recommend looking into it as it's a great example of client-server interaction and getting your system to provide you the data you're looking to extract. Workers are pretty interesting.
But for those of you looking at this post with that similar question that I initially had about JavaScript I/O, I found one other work-a-round depending on your case. My team's project site made use of a database site, MongoDB, that stored some of the user's interaction data if the user had hit a "Save" button. MongoDB, and other online database systems, provide a "dumping" function/script that you can call from your local machine/server and put that data into an output file (I was able to put the JSON data into a text file). From that output, you can write a parser to extract and format the data you desire from that output since databases like MongoDB can be pretty clear as to what format the text will be in (very structured, organized). I wrote a parser in C (with a few libraries I had written to extend the language) to do what I needed, but the idea is pretty generalizable to other programming/scripting languages.
I did look at leaving cookies as option as well, and made use of a test program to try it out (it works too!). However, one tradeoff for leaving cookies on a user's local system is that those cookies generally are meant to hold small amounts of data (usually things like username, date created, & expiration date of the cookie) and are dependent upon the user's local machine. Further, while you can extract the data in those cookies from JavaScript, you are back to the initial problem: the data still exists on the web, not on an output file on your server's file system. In the case you need to extract data and want some guarantee this data will exist on your machine, use Elias Van Ootegem's solution.
JavaScript code that is running client-side cannot access the server's filesystem at the same time, let alone write a file. People often say that, if JS were to have IO capabilities, that would be rather insecure... just imagine how dangerous that would be.
What you could do, is simply build your string, using a Worker that, on closing, returns the full data-string, which is then sent to the server (AJAX call).
The server-side script (Perl, PHP, .NET, Ruby...) can receive this data, parse it and then write the file to disk as you want it to.
All in all, not very hard, but quite an interesting project anyway. Oh, and when using a worker, seeing as it's an online game and everything, perhaps a setInterval to send (a part of) the data every 5000ms might not be a bad idea, either.
As requested - some basic code snippets.
A simple AJAX-setup function:
function getAjax(url,method, callback)
{
var ret;
method = method || 'POST';
url = url || 'default.php';
callback = callback || success;//assuming you have a default function called "success"
try
{
ret = new XMLHttpRequest();
}
catch (error)
{
try
{
ret= new ActiveXObject('Msxml2.XMLHTTP');
}
catch(error)
{
try
{
ret= new ActiveXObject('Microsoft.XMLHTTP');
}
catch(error)
{
throw new Error('no Ajax support?');
}
}
}
ret.open(method, url, true);
ret.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
ret.setRequestHeader('Content-type', 'application/x-www-form-urlencode');
ret.onreadystatechange = callback;
return ret;
}
var getRequest = getAjax('script.php?some=Get¶ms=inURL', 'GET');
getRequest.send(null);
var postRequest = getAjax('script.php', 'POST', function()
{//passing anonymous function here, but this could just as well have been a named function reference, obviously...
if (this.readyState === 4 && this.status === 200)
{
console.log('Post request complete, answer was: ' + this.response);
}
});
postRequest.send('foo=bar');//set different headers to pos JSON.stringified data
Here's a good place to read up on whatever you don't get from the code above. This is, pretty much a copy-paste bit of code, but if you find yourself wanting to learn just a bit more, Here's a great place to do just that.
WebWorkers
Now these are pretty new, so using them does mean not being able to support older browsers (you could support them by using the event listeners to send each morsel of data to the server, but a worker allows you to bundle, pre-process and structure the data without blocking the "normal" flow of your script. Workers are often presented as a means to sort-of multi-thread JavaScript code. Here's a good intro to them
Basically, you'll need to add something like this to your script:
var worker = new Worker('preprocess.js');//or whatever you've called the worker
worker.addEventListener('message', function(e)
{
var xhr = getAjax('script.php', 'post');//using default callback
xhr.send('data=' + e.data);
//worker.postMessage(null);//clear state
}, false);
Your worker, then, could start off like so:
var time, txt = '';
//entry point:
onmessage = function(e)
{
if (e.data === null)
{
clearInterval(time);
txt = '';
return;
}
if (txt === '' && !time)
{
time = setInterval(function()
{
postMessage(txt);
}, 5000);//set postMessage to be called every 5 seconds
}
txt += e.data;//add new text to current string...
}
Server-side, things couldn't be easier:
if ($_POST && $_POST['data'])
{
$file = $_SESSION['filename'] ? $_SESSION['filename'] : 'File'.session_id();
$fh = fopen($file, 'a+');
fwrite($fh, $_POST['data']);
fclose($fh);
}
echo 'ok';
Now all of this code is a bit crude, and most if it cannot be used in its current form, but it should be enough to get you started. If you don't know what something is, google it.
But do keep in mind that, when it comes to JS, MDN is easily the best reference out there, and as far as PHP goes, their own site (php.net/{functionName}) is pretty ugly, but does contain a lot of info, too...
I have a html page using javascript that gives the user the option to read and use his own text files from his PC. But I want to have an example file on the server that the user can open via a click on a button.
I have no idea what is the best way to open a server file. I googled a bit. (I'm new to html and javascript, so maybe my understanding of the following is incorrect!). I found that javascript is client based and it is not very straightforward to open a server file. It looks like it is easiest to use an iframe (?).
So I'm trying (first test is simply to open it onload of the webpage) the following. With kgr.bss on the same directory on the server as my html page:
<IFRAME SRC="kgr.bss" ID="myframe" onLoad="readFile();"> </IFRAME>
and (with file_inhoud, lines defined elsewhere)
function readFile() {
func="readFile=";
debug2("0");
var x=document.getElementById("myframe");
debug2("1");
var doc = x.contentDocument ? x.contentDocument : (x.contentWindow.document || x.document);
debug2("1a"+doc);
var file_inhoud=doc.document.body;
debug2("2:");
lines = file_inhoud.split("\n");
debug2("3");
fileloaded();
debug2("4");
}
Debug function shows:
readFile=0//readFile=1//readFile=1a[object HTMLDocument]//
So statement that stops the program is:
var file_inhoud=doc.document.body;
What is wrong? What is correct (or best) way to read this file?
Note: I see that the file is read and displayed in the frame.
Thanks!
Your best bet, since the file is on your server is to retrieve it via "ajax". This stands for Asynchronous JavaScript And XML, but the XML part is completely optional, it can be used with all sorts of content types (including plain text). (For that matter, the asynchronous part is optional as well, but it's best to stick with that.)
Here's a basic example of requesting text file data using ajax:
function getFileFromServer(url, doneCallback) {
var xhr;
xhr = new XMLHttpRequest();
xhr.onreadystatechange = handleStateChange;
xhr.open("GET", url, true);
xhr.send();
function handleStateChange() {
if (xhr.readyState === 4) {
doneCallback(xhr.status == 200 ? xhr.responseText : null);
}
}
}
You'd call that like this:
getFileFromServer("path/to/file", function(text) {
if (text === null) {
// An error occurred
}
else {
// `text` is the file text
}
});
However, the above is somewhat simplified. It would work with modern browsers, but not some older ones, where you have to work around some issues.
Update: You said in a comment below that you're using jQuery. If so, you can use its ajax function and get the benefit of jQuery's workarounds for some browser inconsistencies:
$.ajax({
type: "GET",
url: "path/to/file",
success: function(text) {
// `text` is the file text
},
error: function() {
// An error occurred
}
});
Side note:
I found that javascript is client based...
No. This is a myth. JavaScript is just a programming language. It can be used in browsers, on servers, on your workstation, etc. In fact, JavaScript was originally developed for server-side use.
These days, the most common use (and your use-case) is indeed in web browsers, client-side, but JavaScript is not limited to the client in the general case. And it's having a major resurgence on the server and elsewhere, in fact.
The usual way to retrieve a text file (or any other server side resource) is to use AJAX. Here is an example of how you could alert the contents of a text file:
var xhr;
if (window.XMLHttpRequest) {
xhr = new XMLHttpRequest();
} else if (window.ActiveXObject) {
xhr = new ActiveXObject("Microsoft.XMLHTTP");
}
xhr.onreadystatechange = function(){alert(xhr.responseText);};
xhr.open("GET","kgr.bss"); //assuming kgr.bss is plaintext
xhr.send();
The problem with your ultimate goal however is that it has traditionally not been possible to use javascript to access the client file system. However, the new HTML5 file API is changing this. You can read up on it here.