How to update a server's content without restarting it? (node.js)

So I have a simple node.js server which serves only dynamic content:
// app.js
var app = require('express')();
app.get('/message/:poster', function(request, response) {
    // writeHead is the documented name; writeHeader is a deprecated alias
    response.writeHead(200, {'content-type': 'text/html'});
    // some database queries
    response.end(""+
        "<!DOCTYPE html>"+
        "<html>"+
        "<head>"+
        "<title>messages of "+request.params.poster+"</title>"+
        "<script src='http://example.com/script.js'></script>"+
        "<link rel='stylesheet' href='http://example.com/style.css'>"+
        "</head>"+
        "<body>"+
        "" // and so on
    );
});
app.listen(2345);
Now, suppose I want to update my HTML.
And suppose further that I don't want to restart the server.
Is there a way of achieving this?
I tried moving the part that sends the response into an external file:
// lib.js
module.exports.message = function(request, response) {
    response.writeHead(200, {'content-type': 'text/html'});
    // some database queries
    response.end(""+
        "<!DOCTYPE html>"+
        "<html>"+
        "<head>"+
        "<title>messages of "+request.params.poster+"</title>"+
        "<script src='http://example.com/script.js'></script>"+
        "<link rel='stylesheet' href='http://example.com/style.css'>"+
        "</head>"+
        "<body>"+
        "" // and so on
    );
};
and
// app.js
var app = require('express')();
app.get('/message/:poster', require('./lib.js').message)
app.listen(2345);
And it works but if I update lib.js it doesn't update. It seems to be making a copy of that function.
Then I tried
// app.js
var app = require('express')();
app.get('/message/:poster', function(request, response) {
    require('./lib.js').message(request, response);
});
app.listen(2345);
But this doesn't update either. It seems the function gets cached and reused for as long as the server runs. I dare say there must be a way to make it either revalidate the function on each call (check whether the file containing it has changed and, if so, update the cache), or refresh the function every n units of time, or, even better, since we're in node, listen for changes to the file containing the function and update the cached function when the event fires.
So how do we get one of the above behaviors? Or something else? I know restarting a server may take only 100ms, but restarting would interrupt all currently active websockets, which is not an option.
NOTE: I don't want to use any templating languages like Jade, EJS, etc.

When you require a module, its module.exports is cached for all future calls to require. You can empty the cache programmatically by deleting the module's entry from require.cache: http://nodejs.org/docs/latest/api/globals.html#globals_require_cache. If you additionally want to do this whenever the file changes, you can use fs.watch: http://nodejs.org/api/fs.html#fs_fs_watch_filename_options_listener.
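The two pieces mentioned above (deleting the entry in require.cache, and fs.watch) could be combined roughly as follows. This is only a sketch: requireFresh and invalidateOnChange are hypothetical helper names, and it assumes the route handler calls require('./lib.js') on every request, as in the third variant from the question. Note that fs.watch behaves differently across platforms, and some editors save by replacing the file, which may leave the watcher pointing at the old file.

```javascript
// hot-reload.js: a sketch, assuming lib.js sits next to this file
const fs = require('fs');

// Hypothetical helper: forget the cached copy of a module, then load it again.
function requireFresh(modulePath) {
  const resolved = require.resolve(modulePath);
  delete require.cache[resolved];
  return require(resolved);
}

// Hypothetical helper: drop the cache entry whenever the file changes, so the
// next plain require() call re-reads the file from disk.
function invalidateOnChange(modulePath) {
  const resolved = require.resolve(modulePath);
  return fs.watch(resolved, function () {
    delete require.cache[resolved];
  });
}

// Usage in app.js (third variant from the question):
//   invalidateOnChange('./lib.js');
//   app.get('/message/:poster', function (request, response) {
//     require('./lib.js').message(request, response);
//   });
```

Only the cache entry of lib.js itself is dropped here; modules that lib.js requires stay cached unless they are invalidated the same way.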

If you don't want to lose any requests in production, just gracefully restart your app using PM2, Forever, etc.
If you want to apply your changes automatically in development, use Nodemon.
I don't know of any other reason why you wouldn't want to restart the app.

Exporting the content to another module seems like a good way to tackle this requirement. But the problem is that modules are only instantiated once and cached for later requests, which is why updating lib.js doesn't change what the server sends.
What you are looking for is a way to hot-load node.js modules, so that you get a fresh instance of the module every time it changes. You can use node-hotswap:
require('hotswap');
and for any module you want to track the changes:
module.change_code = 1;

Could you not just use the file system (read/write) API of Node.js?
http://nodejs.org/api/fs.html
Request -> ReadFile -> Send content to client
At least for static content such as HTML.
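The Request -> ReadFile -> Send flow could look roughly like this. It is a sketch, not a drop-in replacement: serveFromDisk is a hypothetical helper, and the file name in the usage comment is made up. Because the file is read again on every request, edits to it show up without a restart; dynamic parts such as the poster name would still have to be inserted separately.

```javascript
const fs = require('fs');

// Hypothetical handler factory: read the file again on every request, so the
// response always reflects what is currently on disk.
function serveFromDisk(filePath) {
  return function (request, response) {
    fs.readFile(filePath, 'utf8', function (err, html) {
      if (err) {
        response.writeHead(500, {'content-type': 'text/plain'});
        response.end('could not read page');
        return;
      }
      response.writeHead(200, {'content-type': 'text/html'});
      response.end(html);
    });
  };
}

// Usage with the app from the question (file name is an assumption):
//   app.get('/message/:poster', serveFromDisk(__dirname + '/message.html'));
```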

Related

Input Processing in JavaScript

I'm new to Web Development (including JavaScript and HTML) and have a few issues within my personal project that seem to have no clear fixes.
Overview
My project is taking input from a user on the website, and feeding it to my back-end to output a list of word completion suggestions.
For example, input => "bass", then the program would suggest "bassist", "bassa", "bassalia", "bassalian", "bassalan", etc. as possible completions for the pattern "bass" (these are words extracted from an English dictionary text file).
The backend - running on Node JS libraries
trie.js file:
/* code for the trie not fully shown */
var Deque = require("collections/deque"); // to be used somewhere
function add_word_to_trie(word) { ... }
function get_words_matching_pattern(pattern, number_to_get = DEFAULT_FETCH) { ... }
// read in words from English dictionary
var file = require('fs');
const DICTIONARY = 'somefile.txt';
function preprocess() {
file.readFileSync(DICTIONARY, 'utf-8')
.split('\n')
.forEach( (item) => {
add_word_to_trie(item.replace(/\r?\n|\r/g, ""));
});
}
preprocess();
module.exports = get_words_matching_pattern;
The frontend
An HTML script that renders the visuals for the website, as well as getting input from the user and passing it onto the backend script for getting possible suggestions. It looks something like this:
index.html script:
<!DOCTYPE HTML>
<html>
<!-- code for formatting website and headers not shown -->
<body>
<script src = "./trie.js">
function get_predicted_text() {
const autofill_options = get_words_matching_pattern(input.value);
/* add the first suggestion we get from the autofill options to the user's input
arbitrary, because I couldn't get this to actually work. Actual version of
autofill would be more sophisticated. */
document.querySelector("input").value += autofill_options[0];
}
</script>
<input placeholder="Enter text..." oninput="get_predicted_text()">
<!-- I get a runtime error here saying that get_predicted_text is not defined -->
</body>
</html>
Errors I get
Firstly, I get the obvious error of 'require()' being undefined on the client-side. This, I fix using browserify.
Secondly, there is the issue of 'fs' not existing on the client-side, for being a node.js module. I have tried running the trie.js file using node and treating it with some server-side code:
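var fs = require('fs');
var http = require('http');
// PORT is assumed to be defined elsewhere
function respond_to_user_input() {
    fs.readFile('./index.html', null, (err, html) => {
        if (err) throw err;
        http.createServer( (request, response) => {
            response.write(html);
            response.end();
        }).listen(PORT);
    });
}
// call it once, outside the function body, so it does not recurse
respond_to_user_input();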
With this, I'm not exactly sure how to edit document elements, such as changing input.value in index.html, or calling the oninput event listener within the input field. Also, my CSS formatting script is not called if I invoke the HTML file through node trie.js command in terminal.
This leaves me with two questions. Is it even possible to run index.html directly (through Google Chrome) and have it use Node.js modules when it calls the trie.js script? And if I use the server-side code described above with the HTTP module, how can I fix the issues of loading my external CSS file (which my HTML file references via an href) and accessing document.querySelector("input") to edit my input field?

Node.js: requesting a page and allowing the page to build before scraping

I've seen some answers to this that refer the askee to other libraries (like phantom.js), but I'm here wondering if it is at all possible to do this in just node.js?
Consider my code below. It requests a webpage using request, then uses cheerio to explore the DOM and scrape the page for data. The code itself runs without errors, and if everything had gone as planned, I believe it would have output the file I had in mind.
The problem is that the page I am requesting builds the table I'm looking at asynchronously, using either ajax or jsonp; I'm not entirely sure how .jsp pages work.
So here I am trying to find a way to "wait" for this data to load before I scrape the data for my new file.
var cheerio = require('cheerio'),
request = require('request'),
fs = require('fs');
// Go to the page in question
request({
method: 'GET',
url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
if (err) return console.error(err);
// Tell Cheerio to load the HTML
var $ = cheerio.load(body);
// Create an empty object to write to the file later
var toSort = {}
// Iterate over the DOM and fill the toSort object
$('#emb table td.list_right').each(function() {
var row = $(this).parent();
toSort[$(this).text()] = {
[$("#lastdate").text()]: $(row).find(".idx1").html(),
[$("#currdate").text()]: $(row).find(".idx2").html()
}
});
//Write/overwrite a new file
var stream = fs.createWriteStream("/tmp/shipping.txt");
var toWrite = "";
stream.once('open', function(fd) {
toWrite += "{\r\n"
for(var i in toSort){
toWrite += "\t" + i + ": { \r\n";
for(var j in toSort[i]){
toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
}
toWrite += "\t" + "}, \r\n";
}
toWrite += "}"
stream.write(toWrite)
stream.end();
});
});
The expected result is a text file with information formatted like a JSON object.
It should look something like different instances of this
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
 "2016-09-29": 26.7,
"2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load async, (the dates and values are async) I get a messed up object.
I tried just adding a setTimeout in various places in the code. The script will only be touched by developers who can afford to run it several times if it fails a few times. So while not ideal, even a setTimeout (up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of the page "as is" when I receive it, and I'm in fact not looking at a live thing I can wait on while it loads its dynamic content.
I've considered investigating how to intercept the packets as they come, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference, even if you increase it to an hour. The problem is that you are making a request against this URL:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns the HTML, which contains the JS and CSS imports. For your program, that is the end of it: you have the HTML and nothing more. A browser, by contrast, knows how to parse the HTML document, so it also fetches the JavaScript and runs it, and this is exactly your problem: your program does not know that it has to do anything further with the HTML contents. You need to find or write a scraper that is able to run JavaScript. I just found this similar question on Stack Overflow:
Web-scraping JavaScript page with Python
The answer there suggests https://github.com/niklasb/dryscrape, and it seems that this tool is able to run JavaScript. It is written in Python, though.
You are trying to scrape the original page, which doesn't include the data you need.
When the page is loaded, the browser evaluates the JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, like PhantomJS does.
The other (and you seem to be interested in it) is to investigate the page's network activity and work out which additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter that prevents cached responses.
jc is the name of the JS wrapper function that is invoked with the result (https://en.wikipedia.org/wiki/JSONP)
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing the two additional requests, you will be able to combine them into the same data structure you see in the browser.
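Since the jc parameter names the wrapper function, the body of each of those responses should look like name(payload). A minimal sketch of unwrapping it is below. parseJsonp is a hypothetical helper, and it assumes the payload is strict JSON, which has not been checked against these endpoints; if the server returns a JavaScript object literal instead, JSON.parse will throw.

```javascript
// Hypothetical helper: strip the JSONP wrapper "callbackName( ... )" and
// parse what is inside the parentheses as JSON.
function parseJsonp(body, callbackName) {
  const text = body.trim();
  const prefix = callbackName + '(';
  if (!text.startsWith(prefix)) {
    throw new Error('response is not wrapped in ' + callbackName + '(...)');
  }
  const end = text.lastIndexOf(')');
  if (end < prefix.length) {
    throw new Error('missing closing parenthesis');
  }
  return JSON.parse(text.slice(prefix.length, end));
}

// Usage with the request library from the question (sketch):
//   request({ url: jsonpUrl }, function (err, response, body) {
//     if (err) return console.error(err);
//     var data = parseJsonp(body, 'jsonp1475577615267');
//   });
```

The callback name passed in has to match the value sent in the jc query parameter.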

Serving Static Files Node

I am trying to use only Node (no additional npm packages or Express). Here is my current route handler:
function requestHandler(req, res) {
    var localFolder = __dirname + '/views/',
        page404 = localFolder + '404.html',
        fileToServe = "";
    if(/\/posts\/\d+/.test(req.url)){
        fileToServe = __dirname + req.url.match(/\/posts\/\d+/) +'.json';
        fs.stat(fileToServe, function(err, contents){
            if(!err && contents){
                res.end(templateEngine('post',fileToServe));
            } else {
                res.end(templateEngine('error', err))
            }
        });
    } else if (/\/posts\//.test(req.url)){
    } else if (/.+[^\W]$/.test(req.url)){
        fileToServe = __dirname + '/views' + req.url.match(/\/.+[^\W]$/gi);
        fs.stat(fileToServe, function(err, contents){
            if(!err && contents){
                res.end(fs.readFileSync(fileToServe));
            } else {
                res.end(templateEngine('error', err))
            }
        });
    }
}
I have two questions:
In one of my views I have a <link> tag with a css file. When I go straight to the path, it is served (this is the regex that catches it: /.+[^\W]$/.test(req.url)). However, as mentioned, one of my views built from the template engine uses the css file.
How does the browser work when it sees the link tag? Does it send a GET request to my local server (node)? If it does, why doesn't my server send a response back? When I go directly to the link, it sends the response perfectly fine.
Furthermore, when I try going to the page that uses the css file in the link tag, it just hangs on an empty page. When I kill the server, it says it never received a response (once again, when I go to the link directly, I get the proper file).
Do I have to re-organize my routes? Serve static files separately?
How does the browser work when it sees the link tag? Does it send a GET request to my local server (node)?
Yes. Browser creates the full URL based on the current URL and makes an HTTP GET request like it does for any other resource.
If it does, why doesn't my server send a response back? When I go directly to the link, it sends the response perfectly fine.
All evidence suggests that the request for the page which links to the css is not being caught by any of the if-blocks in the handler. Try adding a couple of console.logs, one right inside requestHandler and the other inside the block which is supposed to handle the page request. I think you will only see one log show up in the server console.
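One way to check this without touching the server is to run the same three regex tests from requestHandler against a few URLs and see which branch each one lands in. This is a sketch: classify is a hypothetical helper and the sample URLs are made up. Any URL that ends up in the empty /posts/ branch, or in no branch at all, never gets a res.end(), so the browser keeps waiting.

```javascript
// Same three tests, in the same order, as requestHandler in the question.
function classify(url) {
  if (/\/posts\/\d+/.test(url)) return 'post';      // responds via templateEngine
  if (/\/posts\//.test(url)) return 'posts-index';  // empty branch: no res.end()
  if (/.+[^\W]$/.test(url)) return 'static';        // responds with the file
  return 'unmatched';                               // no branch: no res.end()
}

// The URLs below are made-up examples.
['/posts/12', '/posts/', '/style.css', '/', '/about/'].forEach(function (url) {
  console.log(url, '->', classify(url));
});
```

Adding a final else that sends a 404 would at least turn the hanging requests into visible errors.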

Inject local .js file into a webpage?

I'd like to inject a couple of local .js files into a webpage. I just mean client side, as in within my browser, I don't need anybody else accessing the page to be able to see it. I just need to take a .js file, and then make it so it's as if that file had been included in the page's html via a <script> tag all along.
It's okay if it takes a second after the page has loaded for the stuff in the local files to be available.
It's okay if I have to be at the computer to do this "by hand" with a console or something.
I've been trying to do this for two days, I've tried Greasemonkey, I've tried manually loading files using a JavaScript console. It amazes me that there isn't (apparently) an established way to do this, it seems like such a simple thing to want to do. I guess simple isn't the same thing as common, though.
If it helps, the reason why I want to do this is to run a chatbot on a JS-based chat client. Some of the bot's code is mixed into the pre-existing chat code -- for that, I have Fiddler intercepting requests to .../chat.js and replacing it with a local file. But I have two .js files which are "independent" of anything on the page itself. There aren't any .js files requested by the page that I can substitute them for, so I can't use Fiddler.
Since you're already using a Fiddler script, you can do something like this in the OnBeforeResponse(oSession: Session) function:
if ( oSession.oResponse.headers.ExistsAndContains("Content-Type", "html") &&
oSession.hostname.Contains("MY.TargetSite.com") ) {
oSession.oResponse.headers.Add("DEBUG1_WE_EDITED_THIS", "HERE");
// Remove any compression or chunking
oSession.utilDecodeResponse();
var oBody = System.Text.Encoding.UTF8.GetString(oSession.responseBodyBytes);
// Find the end of the HEAD script, so you can inject script block there.
var oRegEx = /(<\/head>)/gi;
// replace the head-close tag with new-script + head-close
oBody = oBody.replace(oRegEx, "<script type='text/javascript'>console.log('We injected it');</script></head>");
// Set the response body to the changed body string
oSession.utilSetResponseBody(oBody);
}
Working example for www.html5rocks.com :
if ( oSession.oResponse.headers.ExistsAndContains("Content-Type", "html") &&
oSession.hostname.Contains("html5rocks") ) { //goto html5rocks.com
oSession.oResponse.headers.Add("DEBUG1_WE_EDITED_THIS", "HERE");
oSession.utilDecodeResponse();
var oBody = System.Text.Encoding.UTF8.GetString(oSession.responseBodyBytes);
var oRegEx = /(<\/head>)/gi;
oBody = oBody.replace(oRegEx, "<script type='text/javascript'>alert('We injected it')</script></head>");
oSession.utilSetResponseBody(oBody);
}
Note: you have to turn streaming off in Fiddler: http://www.fiddler2.com/fiddler/help/streaming.asp and I assume you would need to decode HTTPS: http://www.fiddler2.com/fiddler/help/httpsdecryption.asp
I have been using Fiddler script less and less, in favor of Fiddler .NET extensions: http://fiddler2.com/fiddler/dev/IFiddlerExtension.asp
If you are using Chrome then check out dotjs.
It will do exactly what you want!
How about just using jquery's jQuery.getScript() method?
http://api.jquery.com/jQuery.getScript/
Save the normal HTML pages to the file system, add the js files by hand, and then use Fiddler to intercept those requests so you get your version of the HTML file.

Serve html file with fs.readFileSync() fails

I have this code (https://gist.github.com/2402116) :
server.get('/', function(request, response) {
var k = fs.readFileSync('./index.html','utf8');
response.send( k );
});
It tries to read this file:
https://gist.github.com/2402070
and the browser keeps loading and never finishes.
But if I remove all the js includes from the html file, it works fine.
What am I doing wrong?
Your current server implementation does not do anything but serve index.html to requests for the base url, i.e. '/'. You will need to write further code/routes to serve the requests for the js includes in your index.html, i.e. '/app.js' and the various js files in '/js/'.
Now, the routing implementation in the gist is quite crude and doesn't support many aspects of url matching. The original code is clearly just demonstrating a concept for a single page site with no resources. You will see it will quickly become burdensome to get your code working as you will effectively have to write a route for every resource request, e.g.
server.get('/app.js', function(request, response) {
var k = fs.readFileSync('./app.js','utf8');
response.send( k );
});
server.get('/js/jquery-1.7.2.js', function(request, response) {
var k = fs.readFileSync('./js/jquery-1.7.2.js','utf8');
response.send( k );
});
etc...
You are better off looking at a node.js url routing library already out there (e.g. director) or a web framework such as express which has inbuilt support for routing (and static file serving).
You need a response.end() once you are done sending data to your browser.
Actually, since you are sending all of your data at once, you can just replace response.send(k) with response.end(k), although this approach is not recommended. I highly recommend reading your file asynchronously and sending it to the client chunk by chunk.
See also: http://nodejs.org/api/http.html#http_response_end_data_encoding
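Reading asynchronously and sending chunk by chunk could be sketched with a read stream piped into the response, as below. This uses Node's plain http response rather than the server object from the gist; streamFile, contentTypeFor and the MIME table are hypothetical names introduced for illustration, and the paths in the usage comment are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Small, incomplete extension-to-type table (an assumption for this sketch).
const MIME = { '.html': 'text/html', '.js': 'text/javascript', '.css': 'text/css' };

function contentTypeFor(filePath) {
  return MIME[path.extname(filePath).toLowerCase()] || 'application/octet-stream';
}

// Stream a file to an http.ServerResponse chunk by chunk instead of loading
// the whole file into memory with readFileSync.
function streamFile(filePath, response) {
  const stream = fs.createReadStream(filePath);
  stream.once('open', function () {
    response.writeHead(200, {'Content-Type': contentTypeFor(filePath)});
    stream.pipe(response); // pipe() ends the response when the file ends
  });
  stream.once('error', function () {
    if (response.headersSent) {
      response.destroy(); // failed mid-transfer: drop the connection
    } else {
      response.writeHead(404, {'Content-Type': 'text/plain'});
      response.end('Not found');
    }
  });
}

// Usage (sketch):
//   require('http').createServer(function (request, response) {
//     if (request.url === '/') return streamFile('./index.html', response);
//     streamFile('.' + request.url, response); // no path sanitising here
//   }).listen(8080);
```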
Try calling .toString() on k and use .end instead of send:
response.end( k.toString() );
Maybe something weird happens and it tries to eval the code.
