I have this code (https://gist.github.com/2402116) :
server.get('/', function(request, response) {
var k = fs.readFileSync('./index.html','utf8');
response.send( k );
});
Tries to read this file:
https://gist.github.com/2402070
and the browser keeps loading and never ends.
But if I remove all the JS includes from the HTML file, it works fine.
What am I doing wrong?
Your current server implementation does not do anything but serve index.html to requests for the base URL, i.e. '/'. You will need to write further code/routes to serve the requests for the JS includes in your index.html, i.e. '/app.js' and the various JS files in '/js/'.
Now, the routing implementation in the gist is quite crude and doesn't support many aspects of URL matching. The original code is clearly just demonstrating a concept for a single-page site with no resources. You will find it quickly becomes burdensome to get your code working, as you will effectively have to write a route for every resource request, e.g.
server.get('/app.js', function(request, response) {
var k = fs.readFileSync('./app.js','utf8');
response.send( k );
});
server.get('/js/jquery-1.7.2.js', function(request, response) {
var k = fs.readFileSync('./js/jquery-1.7.2.js','utf8');
response.send( k );
});
etc...
You are better off looking at a Node.js URL routing library that is already out there (e.g. director) or a web framework such as Express, which has built-in support for routing (and static file serving).
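For example, with Express the whole thing collapses to a few lines. This is just a sketch, assuming index.html, app.js and the js/ folder all sit in the directory you point express.static at, and the port is arbitrary:
var express = require('express');
var app = express();

// Serves index.html at '/', plus /app.js, /js/jquery-1.7.2.js, etc.
// straight from disk - no per-file routes needed.
app.use(express.static(__dirname));

app.listen(8080);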
You need a response.end() once you are done sending data to your browser.
Actually, since you are sending all of your data at once, you can just replace response.send(k) with response.end(k). That said, this approach is not recommended: I highly recommend reading your file asynchronously and sending it to the client chunk by chunk.
See also: http://nodejs.org/api/http.html#http_response_end_data_encoding
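A minimal sketch of the chunk-by-chunk approach with the plain http and fs modules (pipe handles the chunking and back-pressure for you; the port is arbitrary):
var http = require('http');
var fs = require('fs');

http.createServer(function(request, response) {
    response.writeHead(200, { 'Content-Type': 'text/html' });
    // Stream the file: each chunk is written to the response as it is read,
    // and the response is ended automatically when the stream finishes.
    fs.createReadStream('./index.html').pipe(response);
}).listen(8080);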
Try .toString() on k, and use .end instead of send:
response.end( k.toString() );
Maybe something weird happens and it tries to eval the code.
I've seen some answers to this that refer the asker to other libraries (like phantom.js), but I'm wondering here whether it is at all possible to do this in just node.js.
Consider my code below. It requests a webpage using request, then uses cheerio to explore the DOM and scrape the page for data. It works flawlessly, and if everything had gone as planned, I believe it would have output a file as I imagined it in my head.
The problem is that the page I am requesting builds the table I'm looking at asynchronously, using either AJAX or JSONP; I'm not entirely sure how .jsp pages work.
So here I am trying to find a way to "wait" for this data to load before I scrape the data for my new file.
var cheerio = require('cheerio'),
    request = require('request'),
    fs = require('fs');

// Go to the page in question
request({
    method: 'GET',
    url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
    if (err) return console.error(err);
    // Tell Cheerio to load the HTML
    var $ = cheerio.load(body);
    // Create an empty object to write to the file later
    var toSort = {};
    // Iterate over the DOM and fill the toSort object
    $('#emb table td.list_right').each(function() {
        var row = $(this).parent();
        toSort[$(this).text()] = {
            [$("#lastdate").text()]: $(row).find(".idx1").html(),
            [$("#currdate").text()]: $(row).find(".idx2").html()
        };
    });
    // Write/overwrite a new file
    var stream = fs.createWriteStream("/tmp/shipping.txt");
    var toWrite = "";
    stream.once('open', function(fd) {
        toWrite += "{\r\n";
        for (var i in toSort) {
            toWrite += "\t" + i + ": { \r\n";
            for (var j in toSort[i]) {
                toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
            }
            toWrite += "\t" + "}, \r\n";
        }
        toWrite += "}";
        stream.write(toWrite);
        stream.end();
    });
});
The expected result is a text file with information formatted like a JSON object.
It should look something like different instances of this
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
"2016-09-29": 26.7,
"2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load asynchronously (the dates and values do), I get a messed-up object.
Actually, I tried just setting a setTimeout in various places in the code. The script will only be touched by developers who can afford to run it several times if it fails a few times. So while not ideal, even a setTimeout (up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of the page "as is" when I receive it, and I'm in fact not looking at a live thing I can wait for to load its dynamic content.
I've considered investigating how to intercept the packets as they come in, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem here is that you are making a request against this url:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns the HTML, and in this HTML there are the JS and CSS imports. That is the end of your case: you just have the HTML and that's it. The browser, on the other hand, knows how to parse the HTML document, so it is able to find the JavaScript and execute/run it, and this is exactly your problem: your program does nothing with the scripts referenced in the HTML. You need to find or write a scraper that is able to run JavaScript. I just found this similar issue on Stack Overflow:
Web-scraping JavaScript page with Python
The answer there suggests https://github.com/niklasb/dryscrape, and it seems that this tool is able to run JavaScript. It is written in Python, though.
You are trying to scrape the original page that doesn't include the data you need.
When the page is loaded, browser evaluates JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, as PhantomJS does.
The other (and you seem to be interested in it) is to investigate the page's network activity and to understand what additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter used to prevent caching.
jc is the name of a JS wrapper function which should be invoked with the result (https://en.wikipedia.org/wiki/JSONP).
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing the two additional requests, you will be able to combine the results into the same data structure you see in the browser.
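As an illustration of the second approach, here is a rough sketch using the same request module as in the question. The helper name fetchJsonp is made up, and the jc/_ values are simply the ones captured above; in practice any unique strings will do, and the JSONP wrapper has to be stripped before the body can be parsed as JSON:
var request = require('request');

// Hypothetical helper: fetch one JSONP endpoint and unwrap its payload.
function fetchJsonp(url, callbackName, done) {
    request({ method: 'GET', url: url }, function(err, response, body) {
        if (err) return done(err);
        // Strip the "callbackName( ... )" wrapper so the rest parses as JSON.
        var json = body.replace(callbackName + '(', '').replace(/\);?\s*$/, '');
        done(null, JSON.parse(json));
    });
}

fetchJsonp(
    'http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626',
    'jsonp1475577615267',
    function(err, data) {
        if (err) return console.error(err);
        console.log(data); // combine this with the scraped table template
    }
);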
So I have a simple node.js server which serves only dynamic content:
// app.js
var app = require('express')();

app.get('/message/:poster', function(request, response) {
    response.writeHeader(200, {'content-type': 'text/html'});
    // some database queries
    response.end("" +
        "<!DOCTYPE html>" +
        "<html>" +
        "<head>" +
        "<title>messages of " + request.params.poster + "</title>" +
        "<script src='http://example.com/script.js'></script>" +
        "<link rel='stylesheet' href='http://example.com/style.css'>" +
        "</head>" +
        "<body>" +
        "" // and so on
    );
});

app.listen(2345);
Now, suppose I want to update my HTML.
And suppose further that I don't want to restart the server.
Is there a way of achieving this?
I tried exporting the part to send to an external file like:
// lib.js
module.exports.message = function(request, response) {
    response.writeHeader(200, {'content-type': 'text/html'});
    // some database queries
    response.end("" +
        "<!DOCTYPE html>" +
        "<html>" +
        "<head>" +
        "<title>messages of " + request.params.poster + "</title>" +
        "<script src='http://example.com/script.js'></script>" +
        "<link rel='stylesheet' href='http://example.com/style.css'>" +
        "</head>" +
        "<body>" +
        "" // and so on
    );
};
and
// app.js
var app = require('express')();
app.get('/message/:poster', require('./lib.js').message)
app.listen(2345);
And it works but if I update lib.js it doesn't update. It seems to be making a copy of that function.
Then I tried
// app.js
var app = require('express')();

app.get('/message/:poster', function(request, response) {
    require('./lib.js').message(request, response);
});

app.listen(2345);
But this doesn't update either. It seems the function gets cached and reused all the time (once I start the server). I dare say there must be a way to make it either revalidate the function on each call (checking whether the file containing it has changed) and update its cache if so, or refresh the function every n amount of time, or, even better, since we're in Node, listen for changes to the file containing the function so that when the file changes, the event fires and the cached function gets updated.
So how do we get one of the above behaviours, or something else? I know restarting the server may take only 100 ms, but restarting it would interrupt all the currently active WebSockets, which is not an option.
NOTE: I don't want to use any templating languages like Jade, EJS, etc.
When you require a module, its module.exports is cached for all future calls to require. You can programmatically empty the cache: http://nodejs.org/docs/latest/api/globals.html#globals_require_cache. If you additionally want to do it upon a file change, you can use fs.watch: http://nodejs.org/api/fs.html#fs_fs_watch_filename_options_listener.
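A minimal sketch of that combination, assuming lib.js sits next to app.js (require.resolve gives the exact key used in require.cache):
var fs = require('fs');
var app = require('express')();

var libPath = require.resolve('./lib.js');

// When lib.js changes, drop it from the require cache so the next
// require('./lib.js') re-reads the file and returns fresh module.exports.
fs.watch(libPath, function() {
    delete require.cache[libPath];
});

// Look the handler up on every request so it always uses the latest version.
app.get('/message/:poster', function(request, response) {
    require('./lib.js').message(request, response);
});

app.listen(2345);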
If you don't want to lose any requests in production, just gracefully restart your app using PM2, Forever, etc.
If you want your changes applied automatically in development, use Nodemon.
I don't know of any other reason why you wouldn't want to restart the app.
Exporting the content to another module seems like a good way to tackle this requirement. The problem is that modules are only instantiated once and cached for later requests; that's why, when you update lib.js, it doesn't reflect the updated content.
What you are looking for is a way to hot-reload Node.js modules so you get a fresh instance of the module every time it changes. You can use node-hotswap:
require('hotswap');
and in any module whose changes you want to track:
module.change_code = 1;
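Dropped into the example from the question, that would look roughly like this (a sketch based on node-hotswap's README, which says it patches the existing exports of a marked module in place when the file changes):
// app.js
require('hotswap');              // enable hot-swapping for marked modules
var app = require('express')();
var lib = require('./lib.js');   // this reference is updated in place on change

app.get('/message/:poster', function(request, response) {
    lib.message(request, response);
});
app.listen(2345);

// lib.js
module.change_code = 1;          // tell hotswap to reload this module on change
module.exports.message = function(request, response) {
    // ...same handler body as before
};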
Could you not just use the Node.js read/write (fs) API?
http://nodejs.org/api/fs.html
Request -> read file -> send content to client.
At least for static content such as HTML.
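In the context of the question, that would look roughly like this (a sketch; it re-reads the file from disk on every request, so edits to the HTML show up without a restart):
var fs = require('fs');
var app = require('express')();

app.get('/', function(request, response) {
    // Read the file on every request so changes on disk are picked up immediately.
    fs.readFile('./index.html', 'utf8', function(err, html) {
        if (err) { response.writeHead(500); return response.end('could not read index.html'); }
        response.writeHead(200, {'content-type': 'text/html'});
        response.end(html);
    });
});

app.listen(2345);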
I would like to add a function in my javascript to write to a text file in the local directory where the javascript file is located. This means I'm not looking for some insecure way of accessing the user's file system in any way. All I care about is extracting the user's input into an html page that is accessed by my javascript then using that input as data externally. I just need a simple text file. This user input isn't actually text by the way, but rather a bunch of actions using my online game's components that the underlying javascript turns into a text string (so this particular string is what I want to save, not really even anything direct from the user).
I don't want to write to a user's file system, but rather, the file where the javascript (and html) code is located (a folder hosted on a server). Is there any simple way to get some file I/O going?
I know JavaScript has a FileReader; is there any way to get it to do this in reverse, like a FileWriter? Google Closure looks like it has a FileWriter, but it doesn't seem to quite work and I can't find any decent examples of how to get it to do this.
If this requires a different language, is there any way I can just get the relevant snippet and insert it into my JavaScript file?
(the folder is hosted on a Linux system if that helps)
ADDENDUM: Elias Van Ootegem's solution below is excellent and I would highly recommend looking into it as it's a great example of client-server interaction and getting your system to provide you the data you're looking to extract. Workers are pretty interesting.
But for those of you looking at this post with the similar question I initially had about JavaScript I/O, I found one other workaround, depending on your case. My team's project site made use of a hosted database, MongoDB, that stored some of the user's interaction data if the user had hit a "Save" button. MongoDB, and other online database systems, provide a "dumping" function/script that you can call from your local machine/server to put that data into an output file (I was able to put the JSON data into a text file). From that output, you can write a parser to extract and format the data you desire, since databases like MongoDB can be pretty clear about what format the text will be in (very structured and organized). I wrote a parser in C (with a few libraries I had written to extend the language) to do what I needed, but the idea is pretty generalizable to other programming/scripting languages.
I did look at leaving cookies as option as well, and made use of a test program to try it out (it works too!). However, one tradeoff for leaving cookies on a user's local system is that those cookies generally are meant to hold small amounts of data (usually things like username, date created, & expiration date of the cookie) and are dependent upon the user's local machine. Further, while you can extract the data in those cookies from JavaScript, you are back to the initial problem: the data still exists on the web, not on an output file on your server's file system. In the case you need to extract data and want some guarantee this data will exist on your machine, use Elias Van Ootegem's solution.
JavaScript code that is running client-side cannot access the server's filesystem, let alone write a file to it. People often say that if JS were to have such IO capabilities, it would be rather insecure... just imagine how dangerous that would be.
What you could do is simply build your string using a Worker that, on closing, returns the full data string, which is then sent to the server (AJAX call).
The server-side script (Perl, PHP, .NET, Ruby...) can receive this data, parse it and then write the file to disk as you want it to.
All in all, not very hard, but quite an interesting project anyway. Oh, and when using a worker, seeing as it's an online game and everything, perhaps a setInterval to send (a part of) the data every 5000ms might not be a bad idea, either.
As requested - some basic code snippets.
A simple AJAX-setup function:
function getAjax(url, method, callback)
{
    var ret;
    method = method || 'POST';
    url = url || 'default.php';
    callback = callback || success;//assuming you have a default function called "success"
    try
    {
        ret = new XMLHttpRequest();
    }
    catch (error)
    {
        try
        {
            ret = new ActiveXObject('Msxml2.XMLHTTP');
        }
        catch(error)
        {
            try
            {
                ret = new ActiveXObject('Microsoft.XMLHTTP');
            }
            catch(error)
            {
                throw new Error('no Ajax support?');
            }
        }
    }
    ret.open(method, url, true);
    ret.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
    ret.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
    ret.onreadystatechange = callback;
    return ret;
}
var getRequest = getAjax('script.php?some=Get&params=inURL', 'GET');
getRequest.send(null);
var postRequest = getAjax('script.php', 'POST', function()
{//passing an anonymous function here, but this could just as well have been a named function reference, obviously...
    if (this.readyState === 4 && this.status === 200)
    {
        console.log('Post request complete, answer was: ' + this.response);
    }
});
postRequest.send('foo=bar');//set different headers to post JSON.stringified data
Here's a good place to read up on whatever you don't get from the code above. This is pretty much a copy-paste bit of code, but if you find yourself wanting to learn just a bit more, here's a great place to do just that.
WebWorkers
Now these are pretty new, so using them does mean not being able to support older browsers (you could support them by using event listeners to send each morsel of data to the server, but a worker allows you to bundle, pre-process and structure the data without blocking the "normal" flow of your script). Workers are often presented as a means to sort-of multi-thread JavaScript code. Here's a good intro to them.
Basically, you'll need to add something like this to your script:
var worker = new Worker('preprocess.js');//or whatever you've called the worker
worker.addEventListener('message', function(e)
{
    var xhr = getAjax('script.php', 'post');//using default callback
    xhr.send('data=' + e.data);
    //worker.postMessage(null);//clear state
}, false);
Your worker, then, could start off like so:
var time, txt = '';
//entry point:
onmessage = function(e)
{
    if (e.data === null)
    {
        clearInterval(time);
        txt = '';
        return;
    }
    if (txt === '' && !time)
    {
        time = setInterval(function()
        {
            postMessage(txt);
        }, 5000);//set postMessage to be called every 5 seconds
    }
    txt += e.data;//add new text to current string...
};
Server-side, things couldn't be easier:
if ($_POST && $_POST['data'])
{
    $file = $_SESSION['filename'] ? $_SESSION['filename'] : 'File'.session_id();
    $fh = fopen($file, 'a+');
    fwrite($fh, $_POST['data']);
    fclose($fh);
}
echo 'ok';
Now all of this code is a bit crude, and most of it cannot be used in its current form, but it should be enough to get you started. If you don't know what something is, google it.
But do keep in mind that, when it comes to JS, MDN is easily the best reference out there, and as far as PHP goes, their own site (php.net/{functionName}) is pretty ugly, but does contain a lot of info, too...
I'd like to inject a couple of local .js files into a webpage. I just mean client side, as in within my browser, I don't need anybody else accessing the page to be able to see it. I just need to take a .js file, and then make it so it's as if that file had been included in the page's html via a <script> tag all along.
It's okay if it takes a second after the page has loaded for the stuff in the local files to be available.
It's okay if I have to be at the computer to do this "by hand" with a console or something.
I've been trying to do this for two days; I've tried Greasemonkey, and I've tried manually loading files using a JavaScript console. It amazes me that there isn't (apparently) an established way to do this; it seems like such a simple thing to want to do. I guess simple isn't the same thing as common, though.
If it helps, the reason I want to do this is to run a chatbot on a JS-based chat client. Some of the bot's code is mixed into the pre-existing chat code; for that, I have Fiddler intercepting requests to .../chat.js and replacing it with a local file. But I have two .js files which are "independent" of anything on the page itself. There aren't any .js files requested by the page that I can substitute them for, so I can't use Fiddler.
Since you're already using a Fiddler script, you can do something like this in the OnBeforeResponse(oSession: Session) function:
if ( oSession.oResponse.headers.ExistsAndContains("Content-Type", "html") &&
     oSession.hostname.Contains("MY.TargetSite.com") ) {
    oSession.oResponse.headers.Add("DEBUG1_WE_EDITED_THIS", "HERE");
    // Remove any compression or chunking
    oSession.utilDecodeResponse();
    var oBody = System.Text.Encoding.UTF8.GetString(oSession.responseBodyBytes);
    // Find the end of the HEAD section, so you can inject a script block there.
    var oRegEx = /(<\/head>)/gi;
    // Replace the head-close tag with new-script + head-close
    oBody = oBody.replace(oRegEx, "<script type='text/javascript'>console.log('We injected it');</script></head>");
    // Set the response body to the changed body string
    oSession.utilSetResponseBody(oBody);
}
Working example for www.html5rocks.com:
if ( oSession.oResponse.headers.ExistsAndContains("Content-Type", "html") &&
     oSession.hostname.Contains("html5rocks") ) { // goto html5rocks.com
    oSession.oResponse.headers.Add("DEBUG1_WE_EDITED_THIS", "HERE");
    oSession.utilDecodeResponse();
    var oBody = System.Text.Encoding.UTF8.GetString(oSession.responseBodyBytes);
    var oRegEx = /(<\/head>)/gi;
    oBody = oBody.replace(oRegEx, "<script type='text/javascript'>alert('We injected it')</script></head>");
    oSession.utilSetResponseBody(oBody);
}
Note, you have to turn streaming off in Fiddler: http://www.fiddler2.com/fiddler/help/streaming.asp, and I assume you would need to decode HTTPS: http://www.fiddler2.com/fiddler/help/httpsdecryption.asp
I have been using FiddlerScript less and less, in favor of Fiddler .NET extensions - http://fiddler2.com/fiddler/dev/IFiddlerExtension.asp
If you are using Chrome then check out dotjs.
It will do exactly what you want!
How about just using jQuery's jQuery.getScript() method?
http://api.jquery.com/jQuery.getScript/
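For example (a sketch; it assumes jQuery is already present on the page, and the URL below is just a placeholder for wherever your local file is served from):
// Fetches the script, executes it in the page, then runs the callback.
jQuery.getScript('http://localhost/my-bot-helpers.js', function() {
    console.log('helper script loaded and executed');
});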
Save the normal HTML pages to the file system, add the JS files manually by hand, and then use Fiddler to intercept those calls so you get your version of the HTML file.
I'm getting a syntax error (undefined line 1 test.js) in Firefox 3 when I run this code. The alert works properly (it displays 'work') but I have no idea why I am receiving the syntax error.
jQuery code:
$.getJSON("json/test.js", function(data) {
alert(data[0].test);
});
test.js:
[{"test": "work"}]
Any ideas? I'm working on this for a larger .js file but I've narrowed it down to this code. What's crazy is if I replace the local file with a remote path there is no syntax error (here's an example):
http://api.flickr.com/services/feeds/photos_public.gne?tags=cat&tagmode=any&format=json&jsoncallback=?
I found a solution to get rid of that error:
$.ajaxSetup({
    'beforeSend': function(xhr) {
        if (xhr.overrideMimeType)
            xhr.overrideMimeType("text/plain");
    }
});
Now the explanation:
In Firefox 3 (and I assume only Firefox 3), every file that has the MIME type "text/xml" is parsed and syntax-checked. If your JSON starts with "[" it will raise a syntax error; if it starts with "{" it's a "malformed" error (my translation of "nicht wohlgeformt").
If I access my JSON file from a local script - no server is involved in this process - I have to override the MIME type... Maybe you have set the MIME type for that very file incorrectly...
However, adding this little piece of code will save you from the error message.
Edit: In jquery 1.5.1 or higher, you can use the mimeType option to achieve the same effect. To set it as a default for all requests, use
$.ajaxSetup({ mimeType: "text/plain" });
You can also use it with $.ajax directly, i.e., your call translates to
$.ajax({
    url: "json/test.js",
    dataType: "json",
    mimeType: "text/plain",
    success: function(data) {
        alert(data[0].test);
    }
});
getJSON may be insisting on at least one name:value pair.
A straight array ["item0","item1","Item2"] is valid JSON, but there's nothing to reference it with in the callback function for getJSON.
In this little array of Zip codes:
{"result":[["43001","ALEXANDRIA"],["43002","AMLIN"],["43003","ASHLEY"],["43004","BLACKLICK"],["43005","BLADENSBURG"],["43006","BRINKHAVEN"]]}
... I was stuck until I added the {"result": tag. Afterward I could reference it:
<script>
$.getJSON("temp_test_json.php", "",
    function(data) {
        $.each(data.result, function(i, item) {
            alert(item[0] + " " + i);
            if (i > 4) return false;
        });
    });
</script>
... I also found it was just easier to use $.each().
This may sound really, really dumb, but change the file extension for test.js from .js to .txt. I had the same thing happen with perfectly valid JSON data files with pretty much any extension except .txt (examples: .json, .i18n). Since I changed the extension, I get the data and use it just fine.
Like I said, it may sound dumb but it worked for me.
I have this same error when testing the web page on my local PC, but once it is up on the hosting server the error no longer happens. Sorry, I have no idea of the reason, but I thought it might help someone else track it down.
Try renaming "test.js" to "test.json", which is what Wikipedia says is the official extension for JSON files. Maybe it's being processed as Javascript at some point.
Have you tried disabling all the Firefox extensions?
I usually get some errors in the Firebug console that are caused by the extensions, not by the sites being visited.
Check whether there's a ; at the end of test.js. jQuery executes eval("(" + data + ")"), and a trailing semicolon would prevent Firefox from finding the closing parenthesis. There might also be other unseen characters that prevent it from doing so.
I can tell you why the remote location works, though: it's executed in a completely different manner. Since it has jsoncallback=? as part of the query parameters, jQuery treats it as JSONP and actually inserts it into the DOM inside <script> tags. Try using "json/test.js?callback=?" as the target; it might help too.
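That last suggestion would look like the sketch below. Note the caveat: with callback=? jQuery loads the URL as JSONP via a <script> tag, so the server (or the file itself) has to actually wrap the data in the generated callback for the response to be usable.
// Same request, but forced down the JSONP path instead of XHR + eval.
$.getJSON("json/test.js?callback=?", function(data) {
    alert(data[0].test);
});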
What kind of webserver are you running that on? I once had an issue reading a JSON file on IIS because it wasn't defined as a valid MIME type.
Try configuring the content type of the .js file. Firefox expects it to be text/plain, apparently. You can do that as Peter Hoffmann does above, or you can set the content type header server side.
This might mean a server-side configuration change (like Apache's mime.types file), or, if the JSON is served from a script, setting the Content-Type header in the script.
Or at least that seems to have made the error go away for me.
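For instance, if the JSON happened to be served by a Node/Express app, forcing the header could look like the sketch below (an assumption for illustration only; on Apache you would add or change the mapping in mime.types instead):
var express = require('express');
var app = express();

// Serve files from ./json, forcing text/plain for .js files so Firefox
// doesn't try to parse them as XML/JavaScript.
app.use('/json', express.static('json', {
    setHeaders: function(res, filePath) {
        if (filePath.slice(-3) === '.js') {
            res.setHeader('Content-Type', 'text/plain');
        }
    }
}));

app.listen(8080);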
I had a similar problem, but I was iterating through a for loop. I think the problem might be that the index is out of bounds.
For the people who don't use jQuery, you need to call the overrideMimeType method before sending the request:
var r = new XMLHttpRequest();
r.open("GET", filepath, true);     // filepath: the URL of your JSON file
r.overrideMimeType("text/plain");  // must be called before send()
r.onreadystatechange = function() { if (r.readyState === 4) { /* JSON.parse(r.responseText) */ } };
r.send(null);