Serving Static Files with Node - javascript

I am trying to use only Node (no additional npm packages or Express). Here is my current route handler:
var fs = require('fs');
// templateEngine is defined elsewhere in the app.
function requestHandler(req, res) {
  var localFolder = __dirname + '/views/',
      page404 = localFolder + '404.html',
      fileToServe = "";
  if (/\/posts\/\d+/.test(req.url)) {
    fileToServe = __dirname + req.url.match(/\/posts\/\d+/) + '.json';
    fs.stat(fileToServe, function(err, stats) {
      if (!err && stats) {
        res.end(templateEngine('post', fileToServe));
      } else {
        res.end(templateEngine('error', err));
      }
    });
  } else if (/\/posts\//.test(req.url)) {
    // Nothing is sent in this branch, so a matching request never gets a response.
  } else if (/.+[^\W]$/.test(req.url)) {
    fileToServe = __dirname + '/views' + req.url.match(/\/.+[^\W]$/gi);
    fs.stat(fileToServe, function(err, stats) {
      if (!err && stats) {
        res.end(fs.readFileSync(fileToServe));
      } else {
        res.end(templateEngine('error', err));
      }
    });
  }
  // A URL that matches none of the branches above also never gets a response.
}
I have two questions:
In one of my views I have a <link> tag that references a CSS file. When I go straight to the CSS file's path, it is served (this is the regex that catches it: /.+[^\W]$/.test(req.url)). However, as mentioned, one of my views built by the template engine uses that CSS file.
How does the browser work when it sees the link tag? Does it send a GET request to my local server (node)? If it does, why doesn't my server send a response back? When I go directly to the link, it sends the response perfectly fine.
Furthermore, when I try going to the page that uses the css file in the link tag, it just hangs on an empty page. When I kill the server, it says it never received a response (once again, when I go to the link directly, I get the proper file).
Do I have to re-organize my routes? Serve static files separately?

How does the browser work when it sees the link tag? Does it send a GET request to my local server (node)?
Yes. The browser builds the full URL from the current page URL and the href, then makes an HTTP GET request just as it does for any other resource.
If it does, why doesn't my server send a response back? When I go directly to the link, it sends the response perfectly fine.
All evidence suggests that the page which links to the CSS is not being caught by any of the if-blocks in your handler. Try adding a couple of console.log calls, one right at the top of requestHandler and another inside the block that is supposed to handle the page request. I think you will see only one of them show up in the server console.

Related

How to get URLs of page's requests?

I am working on a Chrome app and I need to find one of the request URLs (the request is initiated by a JS script).
After loading, the page script requests .../online_mektep/lesson/L_(page id)/index.json, and I need this page id. How can I find out the URL?
The only way I see right now is to modify the original script with a web request and grab the data before the request is made. Are there other ways?
I'm not sure I completely understand what you're trying to accomplish, but maybe you can add a listener and get the URL. Then you can split the URL and extract the route parameter you want:
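// Needs the "webRequest" permission and host permissions for the site in the manifest.
chrome.webRequest.onBeforeRequest.addListener(
  function(details) {
    console.log('onBeforeRequest', details.url);
    // example: ".../online_mektep/lesson/L_(page id)/index.json"
    const pathArray = new URL(details.url).pathname.split('/');
    // The segment index depends on the full URL, so look for the "L_" segment instead of hard-coding it.
    const lessonSegment = pathArray.find(segment => segment.startsWith('L_'));
    if (lessonSegment) {
      console.log(lessonSegment.slice(2)); // should output (page id)
    }
  },
  { urls: ['<all_urls>'] } // the filter argument is required; narrow it if you can
);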

Why a JS file is publicly available for one file but not the other even after declaring it as public? [duplicate]

I have a local website built with Node.js (using the Express framework). I'm using Express routes to serve each file in my directory, and if the requested file isn't in my directory I want to render not-found.html.
But I noticed something weird happening.
Here is the issue:
When the user enters something like "http://localhost:3000/swgw", the last middleware executes and not-found.html renders properly (with CSS styles).
When the user enters a URL matching the pattern "http://localhost:3000/products/*", not-found.html renders without CSS styles. (Note: * is not 1-6.)
public
  products
    product-1.html
    product-2.html
    product-3.html
    product-4.html
    product-5.html
    product-6.html
  style
    not-found.css
  not-found.html
server.js
server.js
```
...
app.use(express.static(path.join(__dirname, 'public')));
app.get("/products/:id", (req, res, next) => {
  // since I have six products with ids from 1 to 6.
  if (req.params.id <= 6 && req.params.id >= 1) {
    res.setHeader('content-type', 'text/html');
    return res.sendFile(path.resolve(`public/products/product-${req.params.id}.html`));
  }
  else {
    next();
  }
});
app.get('*', function(req, res){
  res.status(404);
  res.sendFile(path.resolve('public/not-found.html'));
});
```
not-found.html
...
<link rel="stylesheet" href="./style/not-found.css" >
...
Change it to
<link rel="stylesheet" href="/style/not-found.css" >.
You want a path that is relative to the public directory that express.static() has as its root.
But could you please explain why, with href="./style/not-found.css", the CSS loads correctly when the user enters "localhost:3000/5" but not on "localhost:3000/products/5"?
When a link in your HTML page does not start with http:// or with /, it is considered a path-relative link, and the browser combines the path of the page with the URL in the link to build a full URL before sending the request to the server. So, when you have this:
href="./style/not-found.css"
and the page URL is this:
http://localhost:3000/products/something
The browser will end up requesting:
http://localhost:3000/products/style/not-found.css
And your server won't know what to do with that. On the other hand, when you change the <link> tag to this:
href="/style/not-found.css"
Then your URL starts with a /, so the only things the browser adds to it are the scheme, host and port, and the browser will request:
http://localhost:3000/style/not-found.css
which will work.
And when the page URL is:
http://localhost:3000/5
then the directory part of that path is just /, so when you combine / with ./style/not-found.css, the browser ends up requesting
http://localhost:3000/style/not-found.css
and it works because the page sits at the root path. So, it doesn't work for pages that are not at the top level. This is why your static resource URLs should always be path-absolute (start with a /), so they don't depend on the path of the hosting page.
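Node's WHATWG URL class resolves a reference against a base URL with the same rules, so it is a convenient way to check what a given href turns into. This sketch assumes the page has no <base> element, in which case the browser uses the page URL as the base; resolveHref is just an illustrative wrapper:

```javascript
// new URL(reference, base) applies the same relative-reference rules a browser
// uses when it turns an href into the URL it actually requests.
function resolveHref(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Path-relative href: resolved against the page's "directory".
console.log(resolveHref('./style/not-found.css', 'http://localhost:3000/products/5'));
// http://localhost:3000/products/style/not-found.css

console.log(resolveHref('./style/not-found.css', 'http://localhost:3000/5'));
// http://localhost:3000/style/not-found.css

// Path-absolute href: only the scheme, host and port come from the page.
console.log(resolveHref('/style/not-found.css', 'http://localhost:3000/products/5'));
// http://localhost:3000/style/not-found.css
```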

Node.js: requesting a page and allowing the page to build before scraping

I've seen some answers to this that refer the asker to other libraries (like phantom.js), but I'm wondering whether it is at all possible to do this in just node.js.
Consider my code below. It requests a webpage using request, then uses cheerio to explore the DOM and scrape the page for data. It works flawlessly, and if everything had gone as planned, I believe it would have output a file just as I imagined it.
The problem is that the page I am requesting builds the table I'm looking at asynchronously using either AJAX or JSONP; I'm not entirely sure how .jsp pages work.
So here I am trying to find a way to "wait" for this data to load before I scrape the data for my new file.
var cheerio = require('cheerio'),
    request = require('request'),
    fs = require('fs');
// Go to the page in question
request({
  method: 'GET',
  url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
  if (err) return console.error(err);
  // Tell Cheerio to load the HTML
  var $ = cheerio.load(body);
  // Create an empty object to write to the file later
  var toSort = {};
  // Iterate over the DOM and fill the toSort object
  $('#emb table td.list_right').each(function() {
    var row = $(this).parent();
    toSort[$(this).text()] = {
      [$("#lastdate").text()]: $(row).find(".idx1").html(),
      [$("#currdate").text()]: $(row).find(".idx2").html()
    };
  });
  // Write/overwrite a new file
  var stream = fs.createWriteStream("/tmp/shipping.txt");
  var toWrite = "";
  stream.once('open', function() {
    toWrite += "{\r\n";
    for (var i in toSort) {
      toWrite += "\t" + i + ": { \r\n";
      for (var j in toSort[i]) {
        toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
      }
      toWrite += "\t" + "}, \r\n";
    }
    toWrite += "}";
    stream.write(toWrite);
    stream.end();
  });
});
The expected result is a text file with the information formatted like a JSON object.
It should contain several entries like this one:
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
  "2016-09-29": 26.7,
  "2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load asynchronously (the dates and values are loaded asynchronously), I get a messed-up object.
I actually tried just setting a setTimeout in various places in the code. The script will only be touched by developers who can afford to rerun it if it fails a few times. So while not ideal, even a setTimeout (up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of the page "as is" when I receive it, and I'm not actually looking at a live page that I can wait on to load its dynamic content.
I've considered intercepting the packets as they come in, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem here is that you are making a request against this URL:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns the HTML, which contains the JS and CSS imports. That is the end of it for your program: you just have the HTML text. A browser, on the other hand, parses the HTML document, understands the JavaScript it references and executes it, and that script is what loads the data. Your program does none of that. You need to find or write a scraper that is able to run JavaScript. I just found this similar question on Stack Overflow:
Web-scraping JavaScript page with Python
The answer there suggests https://github.com/niklasb/dryscrape, a tool that appears to be able to run JavaScript. It is written in Python, though.
You are trying to scrape the original page, which doesn't include the data you need.
When the page is loaded, the browser evaluates the JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, as PhantomJS does.
The other option (which you seem to be interested in) is to inspect the page's network activity and figure out which additional requests you need to make to get the data.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter that prevents caching.
jc is the name of the JS wrapper function that the response invokes with the result (https://en.wikipedia.org/wiki/JSONP).
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing the two additional requests, you will be able to combine them into the same data structure you see in the browser.
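A minimal sketch of the second approach in plain Node, assuming the endpoints return strict JSON inside the jc callback (not verified here; a JavaScript object literal would not survive JSON.parse) and that they accept any callback name. buildJsonpUrl, parseJsonp and fetchJsonp are hypothetical helper names, and fetchJsonp needs Node 18+ for the global fetch():

```javascript
// Build a JSONP request URL. "jc" names the callback and "_" is a cache-buster,
// mirroring the query parameters seen in the captured requests.
function buildJsonpUrl(endpoint, params, callbackName) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  url.searchParams.set('jc', callbackName);
  url.searchParams.set('_', String(Date.now()));
  return url.href;
}

// Strip the "callbackName(...)" wrapper and parse what is inside.
// Assumes the payload is strict JSON; a JS object literal would need another parser.
function parseJsonp(body, callbackName) {
  const text = body.trim();
  const prefix = callbackName + '(';
  const end = text.lastIndexOf(')');
  if (!text.startsWith(prefix) || end < prefix.length) {
    throw new Error('Response is not wrapped in ' + callbackName + '(...)');
  }
  return JSON.parse(text.slice(prefix.length, end));
}

// Hypothetical helper; assumes the server accepts any callback name.
async function fetchJsonp(endpoint, params) {
  const callbackName = 'jsonp' + Date.now();
  const response = await fetch(buildJsonpUrl(endpoint, params, callbackName));
  return parseJsonp(await response.text(), callbackName);
}
```

For example, fetchJsonp('http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast', { SpecifiedDate: '' }) would request the first URL above with a fresh jc and _ value. Parsing the wrapper text directly avoids running the remote response through eval.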

How to update server's content without restarting it? (node.js)

So I have a simple node.js server which serves only dynamic content:
// app.js
var app = require('express')();
app.get('/message/:poster', function(request, response) {
  response.writeHead(200, {'content-type': 'text/html'});
  // some database queries
  response.end("" +
    "<!DOCTYPE html>" +
    "<html>" +
    "<head>" +
    "<title>messages of " + request.params.poster + "</title>" +
    "<script src='http://example.com/script.js'></script>" +
    "<link rel='stylesheet' href='http://example.com/style.css'>" +
    "</head>" +
    "<body>" +
    "" // and so on
  );
});
app.listen(2345);
Now, suppose I want to update my HTML.
And suppose further that I don't want to restart the server.
Is there a way of achieving this?
I tried exporting the part to send to an external file like:
// lib.js
module.exports.message = function(request, response) {
  response.writeHead(200, {'content-type': 'text/html'});
  // some database queries
  response.end("" +
    "<!DOCTYPE html>" +
    "<html>" +
    "<head>" +
    "<title>messages of " + request.params.poster + "</title>" +
    "<script src='http://example.com/script.js'></script>" +
    "<link rel='stylesheet' href='http://example.com/style.css'>" +
    "</head>" +
    "<body>" +
    "" // and so on
  );
};
and
// app.js
var app = require('express')();
app.get('/message/:poster', require('./lib.js').message)
app.listen(2345);
And it works, but if I update lib.js the response doesn't change. It seems to be keeping a copy of that function.
Then I tried
// app.js
var app = require('express')();
app.get('/message/:poster', function(request, response) {
  require('./lib.js').message(request, response);
});
app.listen(2345);
But this doesn't update either. It seems the function gets cached and reused every time (once I start the server). Surely there must be a way to either revalidate the function on each call (checking whether the file containing it has changed) and update the cache if so, or to refresh the function every n amount of time, or, even better, since we're in Node, to have an event listener for changes to the files containing the function, so that when a file changes the event fires and the cached function gets updated.
So how do we get one of the above behaviors? Or something else? I know restarting a server may take only 100ms, but it would interrupt all the currently active websockets, which is not an option.
NOTE: I don't want to use any templating languages like jade, ejs, etc.
When you require a module, its module.exports is cached for all future calls to require. You can programmatically empty the cache: http://nodejs.org/docs/latest/api/globals.html#globals_require_cache. If you also want to do it when the file changes, you can use fs.watch: http://nodejs.org/api/fs.html#fs_fs_watch_filename_options_listener.
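A sketch of the require.cache approach combined with fs.watch. requireFresh and watchModule are illustrative names, and relative paths are resolved against process.cwd():

```javascript
const fs = require('fs');
const path = require('path');

// Evict a module from require.cache, then load it again from disk.
// Only this module is evicted; modules it requires stay cached.
function requireFresh(modulePath) {
  const resolved = require.resolve(path.resolve(modulePath));
  delete require.cache[resolved];
  return require(resolved);
}

// Keep a "current" copy of a module and reload it whenever fs.watch reports a change.
function watchModule(modulePath) {
  const resolved = require.resolve(path.resolve(modulePath));
  let current = requireFresh(resolved);
  const watcher = fs.watch(resolved, () => {
    try {
      current = requireFresh(resolved);
    } catch (err) {
      // A half-saved file can fail to load; keep the previous version.
      console.error('Reload failed, keeping previous version:', err.message);
    }
  });
  return { get: () => current, close: () => watcher.close() };
}
```

In app.js this could look like const lib = watchModule(path.join(__dirname, 'lib.js')); followed by app.get('/message/:poster', (req, res) => lib.get().message(req, res)); Because the route looks up lib.get() on every request, it picks up the reloaded function without touching open websockets. fs.watch behaves differently across platforms, and a single save can fire more than one event or replace the file with one the watcher no longer follows, so it is worth testing with your editor and OS.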
If you don't want to lose any requests in production, just gracefully restart your app using PM2, Forever, etc.
If you want your changes applied automatically in development, use Nodemon.
I don't know of any other reason not to restart the app.
Exporting the content to another module seems like a good way to tackle this requirement. The problem is that modules are instantiated only once and cached for later requires, which is why updating lib.js doesn't change the content.
What you are looking for is a way to hot-load Node.js modules, so that you get a fresh instance of the module every time it changes. You can use node-hotswap:
require('hotswap');
and in any module whose changes you want to track:
module.change_code = 1;
Couldn't you just use the Node.js file read/write API?
http://nodejs.org/api/fs.html
Request -> read file -> send content to client.
At least for static content such as HTML.
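A sketch of that read-per-request idea applied to the message page, assuming the HTML lives in a file and the poster name goes wherever a {{poster}} marker appears. The marker and the function names are hypothetical; this is plain string replacement, not a templating language:

```javascript
const fs = require('fs');

// Escape the few characters that matter when inserting text into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Re-read the HTML file on every request, so edits on disk show up immediately.
// "{{poster}}" is a hypothetical placeholder, not a feature of any library.
function renderMessagePage(templatePath, poster, callback) {
  fs.readFile(templatePath, 'utf8', (err, html) => {
    if (err) return callback(err);
    callback(null, html.split('{{poster}}').join(escapeHtml(poster)));
  });
}
```

In the route this would be called as renderMessagePage(path.join(__dirname, 'message.html'), request.params.poster, callback), with the callback sending the result via response.end(html) or a 500 on error. Editing message.html takes effect on the next request with no restart, at the cost of one file read per request.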

Serve html file with fs.readFileSync() fails

I have this code (https://gist.github.com/2402116):
server.get('/', function(request, response) {
  var k = fs.readFileSync('./index.html', 'utf8');
  response.send(k);
});
It tries to read this file:
https://gist.github.com/2402070
and the browser keeps loading and never finishes.
But if I remove all the JS includes from the HTML file, it works fine.
What am I doing wrong?
Your current server implementation does not do anything but serve index.html to requests for the base url, i.e. '/'. You will need to write further code/routes to serve the requests for the js includes in your index.html, i.e. '/app.js' and the various js files in '/js/'.
Now, the routing implementation in the gist is quite crude and doesn't support many aspects of url matching. The original code is clearly just demonstrating a concept for a single page site with no resources. You will see it will quickly become burdensome to get your code working as you will effectively have to write a route for every resource request, e.g.
server.get('/app.js', function(request, response) {
  var k = fs.readFileSync('./app.js', 'utf8');
  response.send(k);
});
server.get('/js/jquery-1.7.2.js', function(request, response) {
  var k = fs.readFileSync('./js/jquery-1.7.2.js', 'utf8');
  response.send(k);
});
etc...
You are better off looking at a node.js url routing library already out there (e.g. director) or a web framework such as express which has inbuilt support for routing (and static file serving).
You need to call response.end() once you are done sending data to the browser.
Actually, since you are sending all of your data at once, you can just replace response.send(k) with response.end(k), although this approach is not recommended. I highly recommend reading your file asynchronously and sending it to the client chunk by chunk.
See also: http://nodejs.org/api/http.html#http_response_end_data_encoding
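A sketch of the asynchronous, chunk-by-chunk approach using fs.createReadStream and pipe(), which ends the response once the file has been read. streamFile and the TYPES map are illustrative names; the same function could serve index.html, app.js and the files under /js/:

```javascript
const fs = require('fs');
const path = require('path');

// Minimal MIME map (assumption: only these types matter here).
const TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.css': 'text/css; charset=utf-8',
};

// Stream a file to an http.ServerResponse instead of reading it all into memory.
function streamFile(filePath, res) {
  const stream = fs.createReadStream(filePath);
  stream.on('open', () => {
    res.writeHead(200, { 'Content-Type': TYPES[path.extname(filePath)] || 'application/octet-stream' });
    stream.pipe(res); // pipe() ends res when the file has been fully read
  });
  stream.on('error', (err) => {
    // Headers already sent: the only option left is to abort the response.
    if (res.headersSent) return res.destroy(err);
    res.writeHead(404, { 'Content-Type': 'text/plain; charset=utf-8' });
    res.end('Not found');
  });
}
```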
Try calling .toString() on k, and use .end instead of .send:
response.end( k.toString() );
Maybe something weird happens and it tries to eval the code.
