JavaScript: Writing an output file within a limited & secure scenario - javascript

I would like to add a function to my JavaScript that writes a text file to the local directory where the JavaScript file is located. This means I'm not looking for some insecure way of accessing the user's file system in any way. All I care about is extracting the user's input on an HTML page that my JavaScript has access to, then using that input as data externally. I just need a simple text file. This user input isn't actually text, by the way, but rather a bunch of actions using my online game's components that the underlying JavaScript turns into a text string (so this particular string is what I want to save, not really anything direct from the user).
I don't want to write to a user's file system, but rather to the folder where the JavaScript (and HTML) code is located (a folder hosted on a server). Is there any simple way to get some file I/O going?
I know JavaScript has a FileReader; is there any way to get it to do this in reverse, like a FileWriter? Google Closure looks like it has a FileWriter, but it doesn't seem to quite work and I can't find any decent examples of how to get it to do this.
If this requires a different language, is there any way I can just get the relevant snippet and insert it into my JavaScript file?
(The folder is hosted on a Linux system, if that helps.)
ADDENDUM: Elias Van Ootegem's solution below is excellent and I would highly recommend looking into it; it's a great example of client-server interaction and of getting your system to provide the data you're looking to extract. Workers are pretty interesting.
But for those of you looking at this post with the similar question I initially had about JavaScript I/O, I found one other workaround, depending on your case. My team's project site made use of a database system, MongoDB, which stored some of the user's interaction data whenever the user hit a "Save" button. MongoDB, and other online database systems, provide a "dumping" function/script that you can call from your local machine/server to put that data into an output file (I was able to put the JSON data into a text file). From that output, you can write a parser to extract and format the data you want, since databases like MongoDB are quite clear about what format the text will be in (very structured and organized). I wrote a parser in C (with a few libraries I had written to extend the language) to do what I needed, but the idea generalizes to other programming/scripting languages.
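For illustration, here's a minimal Node.js sketch of that dumping idea, using the official mongodb driver; the connection string and the gamedata/saves names are made up, so substitute your own:
var MongoClient = require('mongodb').MongoClient;
var fs = require('fs');

MongoClient.connect('mongodb://localhost:27017/gamedata', function(err, db) {
    if (err) throw err;
    db.collection('saves').find({}).toArray(function(err, docs) {
        if (err) throw err;
        //one JSON document per line, ready for a line-oriented parser
        fs.writeFileSync('saves.txt', docs.map(JSON.stringify).join('\n'));
        db.close();
    });
});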
I did look at leaving cookies as an option as well, and used a test program to try it out (it works, too!). However, one tradeoff of leaving cookies on a user's local system is that they are generally meant to hold small amounts of data (usually things like username, date created, and expiration date of the cookie) and are dependent on the user's local machine. Further, while you can extract the data in those cookies from JavaScript, you are back at the initial problem: the data still exists on the client, not in an output file on your server's file system. If you need to extract data and want some guarantee that it will exist on your machine, use Elias Van Ootegem's solution.
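For reference, the cookie experiment boiled down to something like this sketch, where dataString stands in for the serialized game actions:
//save a small string client-side (size limits apply, roughly 4KB per cookie)
document.cookie = 'gameData=' + encodeURIComponent(dataString) +
                  '; expires=Fri, 31 Dec 2021 23:59:59 GMT; path=/';

//read it back later:
var match = document.cookie.match(/(?:^|;\s*)gameData=([^;]*)/);
var saved = match ? decodeURIComponent(match[1]) : null;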

JavaScript code that is running client-side cannot access the server's filesystem, let alone write a file there. People often say that if JS were to have I/O capabilities of that kind, it would be rather insecure... just imagine how dangerous that would be.
What you could do is simply build your string using a Worker that, on closing, returns the full data string, which is then sent to the server (AJAX call).
The server-side script (Perl, PHP, .NET, Ruby...) can receive this data, parse it and then write the file to disk as you want it to.
All in all, not very hard, but quite an interesting project anyway. Oh, and when using a worker, seeing as it's an online game and everything, a setInterval that sends (a part of) the data every 5000 ms might not be a bad idea, either.
As requested - some basic code snippets.
A simple AJAX-setup function:
function getAjax(url, method, callback)
{
    var ret;
    method = method || 'POST';
    url = url || 'default.php';
    callback = callback || success;//assuming you have a default function called "success"
    try
    {
        ret = new XMLHttpRequest();
    }
    catch (error)
    {
        try
        {
            ret = new ActiveXObject('Msxml2.XMLHTTP');
        }
        catch (error)
        {
            try
            {
                ret = new ActiveXObject('Microsoft.XMLHTTP');
            }
            catch (error)
            {
                throw new Error('no Ajax support?');
            }
        }
    }
    ret.open(method, url, true);
    ret.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
    ret.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
    ret.onreadystatechange = callback;
    return ret;
}
var getRequest = getAjax('script.php?some=Get&params=inURL', 'GET');
getRequest.send(null);
var postRequest = getAjax('script.php', 'POST', function()
{//passing an anonymous function here, but this could just as well have been a named function reference, obviously...
    if (this.readyState === 4 && this.status === 200)
    {
        console.log('Post request complete, answer was: ' + this.response);
    }
});
postRequest.send('foo=bar');//set different headers to post JSON.stringified data
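As that last comment hints, you can also POST JSON.stringified data with a matching header. A quick sketch (note that getAjax() above always sets a form-encoded Content-type, and XHR headers can't be un-set once set on a request, so it's simplest to build this one directly):
var jsonXhr = new XMLHttpRequest();
jsonXhr.open('POST', 'script.php', true);
jsonXhr.setRequestHeader('Content-type', 'application/json');
jsonXhr.onreadystatechange = function()
{
    if (this.readyState === 4 && this.status === 200)
    {
        console.log('JSON post complete, answer was: ' + this.response);
    }
};
jsonXhr.send(JSON.stringify({foo: 'bar'}));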
MDN is a good place to read up on whatever you don't get from the code above. This is pretty much a copy-paste bit of code, but if you find yourself wanting to learn just a bit more, that's a great place to do just that.
WebWorkers
Now these are pretty new, so using them does mean not being able to support older browsers (you could support those by using the event listeners to send each morsel of data to the server instead), but a worker allows you to bundle, pre-process and structure the data without blocking the "normal" flow of your script. Workers are often presented as a means to sort-of multi-thread JavaScript code. MDN has a good intro to them.
Basically, you'll need to add something like this to your script:
var worker = new Worker('preprocess.js');//or whatever you've called the worker
worker.addEventListener('message', function(e)
{
    var xhr = getAjax('script.php', 'post');//using default callback
    xhr.send('data=' + e.data);
    //worker.postMessage(null);//clear state
}, false);
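Note that the listener above only handles messages coming back from the worker; your game code still has to feed it data. That would look something like this (the action string format here is made up, obviously):
//call this from your game's event handlers:
function recordAction(actionString)
{
    worker.postMessage(actionString);//e.g. 'player:move:12,34'
}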
Your worker, then, could start off like so:
var time, txt = '';
//entry point:
onmessage = function(e)
{
    if (e.data === null)
    {//null is the "clear state" signal
        clearInterval(time);
        time = null;//reset, so a new interval can be set up later
        txt = '';
        return;
    }
    if (txt === '' && !time)
    {
        time = setInterval(function()
        {
            postMessage(txt);
        }, 5000);//set postMessage to be called every 5 seconds
    }
    txt += e.data;//add new text to the current string...
}
Server-side, things couldn't be easier:
if ($_POST && $_POST['data'])
{
    //assumes session_start() was called and $_SESSION['filename'] may have been set
    $file = $_SESSION['filename'] ? $_SESSION['filename'] : 'File'.session_id();
    $fh = fopen($file, 'a+');
    fwrite($fh, $_POST['data']);
    fclose($fh);
}
echo 'ok';
Now all of this code is a bit crude, and most of it cannot be used in its current form, but it should be enough to get you started. If you don't know what something is, google it.
But do keep in mind that, when it comes to JS, MDN is easily the best reference out there, and as far as PHP goes, their own site (php.net/{functionName}) is pretty ugly, but does contain a lot of info, too...

Related

How to call a fetch request and wait for its answer inside an onBeforeRequest in a web extension

I'm trying to write a web extension that stops the requests from a URL list provided locally, fetches the URL's response, analyzes it in a certain way, and, based on the analysis results, blocks or doesn't block the request.
Is that even possible?
The browser doesn't matter.
If it's possible, could you provide some examples?
I tried doing it with Chrome extensions, but it seems like it's not possible.
I heard it's possible on Mozilla, though.
I think that this is only possible using the old webRequestBlocking API, which Chrome is removing as a part of Manifest V3. Fortunately, Firefox is planning to continue supporting blocking web requests even as they transition to Manifest V3 (read more here).
In terms of implementation, I would highly recommend referring to the MDN documentation for webRequest, in particular their section on modifying responses and their documentation for the filterResponseData method.
Mozilla have also provided a great example project that demonstrates how to achieve something very close to what I think you want to do.
Below I've modified their background.js code slightly so it is a little closer to what you want to do:
// 'mySpecialUrls' is assumed to be your locally provided list of URLs.
function listener(details) {
  if (mySpecialUrls.indexOf(details.url) === -1) {
    // Ignore this url, it's not on our list.
    return {};
  }
  let filter = browser.webRequest.filterResponseData(details.requestId);
  let decoder = new TextDecoder("utf-8");
  let encoder = new TextEncoder();
  filter.ondata = event => {
    let str = decoder.decode(event.data, {stream: true});
    // Just change any instance of Example in the HTTP response
    // to WebExtension Example.
    str = str.replace(/Example/g, 'WebExtension Example');
    filter.write(encoder.encode(str));
    filter.disconnect();
  };
  // This is a BlockingResponse object, you can set parameters here to e.g. cancel the request if you want to.
  // See: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/BlockingResponse#type
  return {};
}

browser.webRequest.onBeforeRequest.addListener(
  listener,
  // 'main_frame' means this will only affect requests for the main frame of the browser (e.g. the HTML for a page rather than the images, CSS, etc. that are loaded afterwards). You might want to look into whether you want to expand this.
  {urls: ["*://*/*"], types: ["main_frame"]},
  ["blocking"]
);
Correction:
The above example only works properly if the response data fits in one chunk. If it is larger (and you still want to inspect the entirety of the response data), you need to collect all of the chunks in a buffer and then work on them once all data has been received. See the documentation here for more information: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/StreamFilter/ondata#webextension_examples (the code section titled "This example combines all buffers into a single buffer" is probably of most interest to you).
In terms of using this API to block responses: data is only passed on to the browser if you call filter.write(), so if you don't like the response you can simply not call it (just call filter.close()) and an empty response will be returned. You can also return only part of the full response body by filter.write()ing only the bits that you want to return.
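To make that concrete, here's a rough, untested sketch that buffers the whole response before deciding; shouldBlock() is a placeholder for whatever analysis you want to run:
let filter = browser.webRequest.filterResponseData(details.requestId);
let decoder = new TextDecoder("utf-8");
let encoder = new TextEncoder();
let chunks = [];

filter.ondata = event => {
  // Buffer every chunk; don't write anything back yet.
  chunks.push(decoder.decode(event.data, {stream: true}));
};

filter.onstop = () => {
  let fullBody = chunks.join("") + decoder.decode();// flush the decoder
  if (shouldBlock(fullBody)) {
    // Close without writing: the page receives an empty response.
    filter.close();
  } else {
    filter.write(encoder.encode(fullBody));
    filter.close();
  }
};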

JavaScript / Python interaction in Linux without a REST framework?

I'm working on some changes to a page that needs to retrieve information from some files under /proc so the page can display version information to the user. Currently, the page is generated entirely by the Python script, which allows me to just read the file and put everything in the page at creation time.
However, this led to the issue that the version numbers wouldn't update when a new version of the software was uploaded. I don't want to regenerate the page every time a new package is installed, so I made the main page static and instead want to query the information from a Python script and return it to populate the page when it loads.
The Python scripts are set up as CGI and have sudo access, so there's no issue with them retrieving those files. However, if I wanted to use something like AJAX to call the Python script, is there any way I could return the data without using a REST framework such as Flask or Django? The application needs to be lightweight and preferably not rely on a new framework.
Is there a way I can do this with vanilla JavaScript and Python?
OK, so the solution was fairly simple; I had just made a few syntax errors that kept it from working the first few times I tried it.
So the request looked like this:
window.onload = function() {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function() {
        if ((this.readyState == 4) && (this.status == 200)) {
            var response = JSON.parse(this.responseText);
            // Do stuff with the JSON here...
        }
    };
    xhr.open("GET", scriptURL, true);
    xhr.send();
}
From there, the Python script simply needed to do something like this to return JSON data containing my version numbers:
import sys, cgi, json

result = {}
result['success'] = True
result['message'] = "The command completed successfully"
d = {}
# ... write version information to the 'd' map ...
result['data'] = d

# text/plain works fine with JSON.parse(); application/json would also be appropriate
sys.stdout.write("Content-Type: text/plain\n\n")
sys.stdout.write(json.dumps(result))
sys.stdout.write("\n")
sys.stdout.close()
The most persistent problem, which took me forever to find, was that I had forgotten a closing quotation mark in my script tag, which caused the whole page not to load.

Node.js: requesting a page and allowing the page to build before scraping

I've seen some answers to this that refer the asker to other libraries (like phantom.js), but I'm here wondering if it is at all possible to do this in just node.js?
Consider my code below. It requests a webpage using request, then uses cheerio to explore the DOM and scrape the page for data. It works flawlessly, and if everything had gone as planned, I believe it would have output a file as I imagined in my head.
The problem is that the page I am requesting builds the table I'm looking at asynchronously, using either AJAX or JSONP; I'm not entirely sure how .jsp pages work.
So here I am, trying to find a way to "wait" for this data to load before I scrape it for my new file.
var cheerio = require('cheerio'),
    request = require('request'),
    fs = require('fs');

// Go to the page in question
request({
    method: 'GET',
    url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
    if (err) return console.error(err);
    // Tell Cheerio to load the HTML
    var $ = cheerio.load(body);
    // Create an empty object to write to the file later
    var toSort = {};
    // Iterate over the DOM and fill the toSort object
    $('#emb table td.list_right').each(function() {
        var row = $(this).parent();
        toSort[$(this).text()] = {
            [$("#lastdate").text()]: $(row).find(".idx1").html(),
            [$("#currdate").text()]: $(row).find(".idx2").html()
        };
    });
    // Write/overwrite a new file
    var stream = fs.createWriteStream("/tmp/shipping.txt");
    var toWrite = "";
    stream.once('open', function(fd) {
        toWrite += "{\r\n";
        for (var i in toSort) {
            toWrite += "\t" + i + ": { \r\n";
            for (var j in toSort[i]) {
                toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
            }
            toWrite += "\t" + "}, \r\n";
        }
        toWrite += "}";
        stream.write(toWrite);
        stream.end();
    });
});
The expected result is a text file with information formatted like a JSON object.
It should look something like different instances of this:
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
    "2016-09-29": 26.7,
    "2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load asynchronously (the dates and values do), I get a messed-up object.
I actually tried just setting a setTimeout in various places in the code. The script will only be touched by developers who can afford to run it several times if it fails occasionally, so while not ideal, even a setTimeout (up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of the page "as is" when I receive it, and I'm in fact not looking at a live thing whose dynamic content I can wait for.
I've considered investigating how to intercept the packets as they come in, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem here is that you are making a request against this url:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns the HTML, and in this HTML there are the JS and CSS imports. That is where your case ends: you just have the HTML, and that's it. The browser, by contrast, knows how to use and parse the HTML document, so it is able to understand the JavaScript scripts and to execute/run them, and this is exactly your problem: your program is not able to execute the JavaScript that builds the table. You need to find or write a scraper that is able to run JavaScript. I just found this similar issue on Stack Overflow:
Web-scraping JavaScript page with Python
The answer there suggests https://github.com/niklasb/dryscrape, and it seems that this tool is able to run JavaScript. It is written in Python, though.
You are trying to scrape the original page that doesn't include the data you need.
When the page is loaded, browser evaluates JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, like PhantomJS does.
The other (and you seem to be interested in it) is to investigate the page's network activity and to understand what additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter to prevent caching.
jc is the name of a JS wrapper function which should be invoked with the result (https://en.wikipedia.org/wiki/JSONP)
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing the two additional requests, you will be able to combine them into the same data structure you see in the browser.
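For instance, a rough sketch of fetching the first servlet with the same request module used above and unwrapping the JSONP (this assumes the servlet echoes back whatever jc value you pass in):
var request = require('request');

// jc names the wrapper function; _ just busts the cache
var url = 'http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast' +
          '?SpecifiedDate=&jc=cb&_=' + Date.now();

request(url, function(err, response, body) {
    if (err) return console.error(err);
    // strip the "cb(...)" wrapper to get plain JSON
    var json = body.replace(/^[^(]*\(/, '').replace(/\)\s*;?\s*$/, '');
    console.log(JSON.parse(json));
});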

How to find specific cache entries in firefox and turn them into a File or Blob object?

I have the following scenario:
A user can paste html content in a wysiwyg editor. When that pasted content contains images which are hosted on other domains, I want these to be uploaded to my server. Right now the only way of doing that is manually downloading via "save image as..." context menu, then uploading the image to the server via a form and updating the images in the editor.
I have to solve this client side.
I'm working on a Firefox addon that can automate the process. Of course I could download these images, store them on the hard drive and then upload them with FormData (or better, the pupload), but this seems clumsy: since the content is displayed in the browser, it must already have been downloaded and reside somewhere in memory. I would like to grab the image files from memory and tell Firefox to upload them (being able to make a Blob of them would suffice, it seems).
However, I'm getting hopelessly lost in the API documentation for the several different caching systems on MDN, and I fail to find any example code of how to use them. I checked the code of other addons that access the cache, but most of it is uncommented and still quite cryptic.
Can you point me to some sample code of what the recommended way would be to achieve this? The best possible solution would be if I could request the particular URL from Firefox so I can use it in FormData: if it isn't in the cache, Firefox downloads it to memory, but if it's already there, I just get it directly.
The master documentation for Mozilla's version 2 HTTP Cache is located here. Aside from the blurbs on this page, the only way I was able to make sense of this new scheme was by looking at the actual code for each object and back-referencing almost everything. Even though I wasn't able to get a 100% clear picture of what exactly was going on, I figured out enough to get it working. In my opinion, Mozilla should have taken the time to create some simple-terms documentation before they went ahead and pushed out the new API. But, we get what they give us, I suppose.
On to your problem. We're assuming that the users who want to upload an image already have this image saved in their cache somewhere. In order to be able to pull it out of the user's cache for upload, you must first be able to determine the URI of the image before it can be pulled explicitly from the cache. For the sake of brevity, I'm going to assume that you already have this part figured out.
An important thing to note about the new HTTP Cache is that although it's all based on callbacks, there can still only ever be a single writing process. While in your example it may not be necessary to write to the descriptor, you should still request write access, since that will prevent any other processes (i.e. the browser) from altering/deleting the data until you are done with it. Another side note, and a source of a lot of pain for me, was the fact that requesting a cache entry from the memory cache will ALWAYS create a new entry, overwriting any pre-existing entries. You shouldn't need this, but if it is necessary, you can access the memory cache from the disk cache (the disk cache is physical disk + memory cache -- Mozilla logic) without that side effect.
Once the URI is in hand, you can then make a request to pull it out of the cache. The new caching system is based completely on callbacks. There is one key object that we will need in order to be able to fetch the cache entry's data -- nsICacheEntryOpenCallback. This is a user-defined object that handles the response after a cache entry is requested. It must have two member functions: onCacheEntryCheck(entry, appcache) and onCacheEntryAvailable(descriptor, isnew, appcache, status).
Here is a cut-down example from my code of such an object:
var cacheWaiter = {
    //This function essentially tells the cache service whether or not we want
    //this cache descriptor. If ENTRY_WANTED is returned, the cache descriptor is
    //passed to onCacheEntryAvailable()
    onCacheEntryCheck: function( descriptor, appcache )
    {
        //First, we want to be sure the cache entry is not currently being written
        //so that we can be sure that the file is complete when we go to open it.
        //If predictedDataSize > dataSize, chances are it's still in the process of
        //being cached and we won't be able to get an exclusive lock on it and it
        //will be incomplete, so we don't want it right now.
        try{
            if( descriptor.dataSize < descriptor.predictedDataSize )
                //This tells the nsICacheService to call this function again once the
                //currently writing process is done writing the cache entry.
                return Components.interfaces.nsICacheEntryOpenCallback.RECHECK_AFTER_WRITE_FINISHED;
        }
        catch(e){
            //Also return the same value for any other error
            return Components.interfaces.nsICacheEntryOpenCallback.RECHECK_AFTER_WRITE_FINISHED;
        }
        //If no exceptions occurred and predictedDataSize == dataSize, tell the
        //nsICacheService to pass the descriptor to this.onCacheEntryAvailable()
        return Components.interfaces.nsICacheEntryOpenCallback.ENTRY_WANTED;
    },
    //Once we are certain we want to use this descriptor (i.e. it is done
    //downloading and we want to read it), it gets passed to this function
    //where we can do what we wish with it.
    //At this point we will have full control of the descriptor until this
    //function exits (or, I believe that's how it works)
    onCacheEntryAvailable: function( descriptor, isnew, appcache, status )
    {
        //In this function, you can do your cache descriptor reads and store
        //them in a Blob() for upload. I haven't actually tested the code I put
        //here, modifications may be needed.
        var cacheentryinputstream = descriptor.openInputStream(0);
        var blobarray = [];
        for( var i = descriptor.dataSize; i > 0; i -= 1024 )
        {
            //the last chunk may be smaller than 1024 bytes
            var chunksize = ( i < 1024 ) ? i : 1024;
            var buffer;
            try{
                buffer = cacheentryinputstream.read( chunksize );
            }
            catch(e){
                //Nasty NS_ERROR_WOULD_BLOCK exceptions seem to happen to me
                //frequently. The Mozilla guys don't provide a way around this,
                //since they want a responsive UI at all costs. So, just keep
                //trying until it succeeds.
                i += 1024;
                continue;
            }
            for( var j = 0; j < chunksize; j++ )
            {
                blobarray.push( buffer.charAt(j) );
            }
        }
        var theblob = new Blob(blobarray);
        //Do an AJAX POST request with theblob here.
    }
};
Now that the callback object is set up, we can actually do some requests for cache descriptors. Try something like this:
var theuri = "http://www.example.com/image.jpg";
//Load the cache service
var cacheservice = Components.classes["@mozilla.org/netwerk/cache-storage-service;1"]
                             .getService(Components.interfaces.nsICacheStorageService);
//Select the default disk cache.
var hdcache = cacheservice.diskCacheStorage(Services.loadContextInfo.default, true);
//Request a cache entry for the URI. OPEN_NORMALLY requests write access.
//(Services.io is the standard nsIIOService; the original snippet assumed an
//"ioservice" variable was already defined.)
hdcache.asyncOpenURI(Services.io.newURI(theuri, null, null), "", hdcache.OPEN_NORMALLY, cacheWaiter);
As far as actually getting the URI goes, you could provide a window for the user to drag-and-drop an image into, or perhaps just paste the URL of the image into. Then you could do an AJAX request to fetch the image (in case the user hasn't actually visited the image for some reason, it would then be cached). You could then use that URL to fetch the cache entry for upload. As an aesthetic touch, you could even show a preview of the image, but that's a bit out of scope of the question.
If you need any more clarifications, please feel free to ask!

reading server file with javascript

I have an HTML page using JavaScript that gives the user the option to read and use his own text files from his PC. But I want to have an example file on the server that the user can open via a click on a button.
I have no idea what the best way is to open a server file. I googled a bit. (I'm new to HTML and JavaScript, so maybe my understanding of the following is incorrect!) I found that JavaScript is client-based, and it is not very straightforward to open a server file. It looks like it is easiest to use an iframe (?).
So I'm trying the following (the first test is simply to open it on load of the web page), with kgr.bss in the same directory on the server as my HTML page:
<IFRAME SRC="kgr.bss" ID="myframe" onLoad="readFile();"> </IFRAME>
and (with file_inhoud, lines defined elsewhere)
function readFile() {
    func="readFile=";
    debug2("0");
    var x=document.getElementById("myframe");
    debug2("1");
    var doc = x.contentDocument ? x.contentDocument : (x.contentWindow.document || x.document);
    debug2("1a"+doc);
    var file_inhoud=doc.document.body;
    debug2("2:");
    lines = file_inhoud.split("\n");
    debug2("3");
    fileloaded();
    debug2("4");
}
Debug function shows:
readFile=0//readFile=1//readFile=1a[object HTMLDocument]//
So the statement that stops the program is:
var file_inhoud=doc.document.body;
What is wrong? What is the correct (or best) way to read this file?
Note: I see that the file is read and displayed in the frame.
Thanks!
Your best bet, since the file is on your server is to retrieve it via "ajax". This stands for Asynchronous JavaScript And XML, but the XML part is completely optional, it can be used with all sorts of content types (including plain text). (For that matter, the asynchronous part is optional as well, but it's best to stick with that.)
Here's a basic example of requesting text file data using ajax:
function getFileFromServer(url, doneCallback) {
    var xhr;

    xhr = new XMLHttpRequest();
    xhr.onreadystatechange = handleStateChange;
    xhr.open("GET", url, true);
    xhr.send();

    function handleStateChange() {
        if (xhr.readyState === 4) {
            doneCallback(xhr.status == 200 ? xhr.responseText : null);
        }
    }
}
You'd call that like this:
getFileFromServer("path/to/file", function(text) {
    if (text === null) {
        // An error occurred
    }
    else {
        // `text` is the file text
    }
});
However, the above is somewhat simplified. It would work with modern browsers, but not some older ones, where you have to work around some issues.
Update: You said in a comment below that you're using jQuery. If so, you can use its ajax function and get the benefit of jQuery's workarounds for some browser inconsistencies:
$.ajax({
    type: "GET",
    url: "path/to/file",
    success: function(text) {
        // `text` is the file text
    },
    error: function() {
        // An error occurred
    }
});
Side note:
I found that javascript is client based...
No. This is a myth. JavaScript is just a programming language. It can be used in browsers, on servers, on your workstation, etc. In fact, JavaScript was originally developed for server-side use.
These days, the most common use (and your use-case) is indeed in web browsers, client-side, but JavaScript is not limited to the client in the general case. And it's having a major resurgence on the server and elsewhere, in fact.
The usual way to retrieve a text file (or any other server side resource) is to use AJAX. Here is an example of how you could alert the contents of a text file:
var xhr;
if (window.XMLHttpRequest) {
    xhr = new XMLHttpRequest();
} else if (window.ActiveXObject) {
    xhr = new ActiveXObject("Microsoft.XMLHTTP");
}
xhr.onreadystatechange = function(){alert(xhr.responseText);};
xhr.open("GET","kgr.bss"); //assuming kgr.bss is plaintext
xhr.send();
The problem with your ultimate goal, however, is that it has traditionally not been possible to use JavaScript to access the client's file system. However, the new HTML5 File API is changing this. You can read up on it here.
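For completeness, here's a minimal sketch of that File API in action, assuming the page has an <input type="file" id="picker"> element (the id is made up):
document.getElementById('picker').addEventListener('change', function() {
    // Read the first selected file as text, entirely client-side
    var reader = new FileReader();
    reader.onload = function(e) {
        console.log('File contents: ' + e.target.result);
    };
    reader.readAsText(this.files[0]);
}, false);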
