Detecting successful read stream open - javascript

I'm implementing a caching layer for a static-serving middleware in Express.js, which works as follows — when a request comes in, the middleware first tries to serve the file from the filesystem, and if it isn't there, the file is fetched from upstream, stored on disk, and served.
The problem is that I don't know how to properly detect the “cache hit” event.
var staticMiddleware = function(req, res, next) {
  // try to read the file from the fs
  var filename = urlToFilename(req.url);
  var stream = fs.createReadStream(filename);
  // cache miss - file not found
  stream.on('error', function() {
    console.log('miss ' + req.url);
    // get the file from upstream, store it in the fs and serve it as the response
    stream = fetchFromUpstream(req.url);
    stream.pipe(fs.createWriteStream(filename));
    stream.pipe(res);
  });
  // cache hit - file is being read
  I_DONT_KNOW_WHAT_TO_PUT_HERE(function() {
    console.log('hit ' + req.url);
    stream.pipe(res);
  });
}
So, basically, how can I detect a successful file read? If I listen for the 'data' event, I guess I'll miss the first chunk of data. If I just pipe() the stream to the response, the response stream gets closed on error, so I can't serve it with the fetched data, and this approach really lacks flexibility. I wonder if there is a way to listen for an event like fdcreated or opened or similar, or a way to push back the data I've received in a 'data' event so it will be re-emitted with the next 'data' event.

The createReadStream method returns a ReadableStream, which also emits an 'open' event. You can add a handler for the 'open' event so you know the underlying resource is valid before piping:
stream.on('open', function() {
  console.log('hit ' + req.url);
  stream.pipe(res);
});
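
Putting the two handlers together, a minimal sketch of the full middleware might look like this (urlToFilename and fetchFromUpstream are the question's own helpers, assumed to behave as described):

var fs = require('fs');

var staticMiddleware = function(req, res, next) {
  var filename = urlToFilename(req.url);
  var stream = fs.createReadStream(filename);

  // cache miss: the file could not be opened, so fetch from upstream,
  // persist it to disk, and serve the upstream data as the response
  stream.on('error', function() {
    console.log('miss ' + req.url);
    var upstream = fetchFromUpstream(req.url);
    upstream.pipe(fs.createWriteStream(filename));
    upstream.pipe(res);
  });

  // cache hit: 'open' fires once the file descriptor is ready,
  // so it is safe to start piping the file to the response
  stream.on('open', function() {
    console.log('hit ' + req.url);
    stream.pipe(res);
  });
};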

Related

Why is request.on data firing with a delay on NodeJS?

There is a simple web server that accepts data. Sample code below.
The idea is to track in real time how much data has reached the server and immediately inform the client about it. If you send a small amount of data, everything works well, but if you send more than X in size, the 'data' event on the server fires with a huge delay. I can see that data has already been transferring for 5 seconds, but the 'data' event has not been triggered.
The 'data' event seems to fire only once the data has been completely uploaded to the server, which is why it works fine with small payloads (~2..20 MB) but not with big ones (50..200 MB).
Or maybe it is due to some kind of buffering..?
Do you have any suggestions why the 'data' event is triggered with a delay and how to fix it?
const express = require('express');

const app = express();
const port = 3000;

// PUBLIC API
// upload file
app.post('/upload', function (request, response) {
  request.on('data', chunk => {
    // message appears with delay
    console.log('upload on data', chunk.length);
    // send message to the client about chunk.length
  });
  response.send({
    message: `Got a POST request ${request.headers['content-length']}`
  });
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
TLDR:
The delay you are experiencing is probably the Queueing phase of the browser's resource scheduling.
The Test
I did some tests with express and found that it uses the http module to handle requests/responses, so I used a raw http server listener to test this scenario, and it shows the same behavior.
Backend code
This code, based on the Node anatomy-of-an-HTTP-transaction sample, creates an http server and logs the time in three situations:
When a request is received
When the first 'data' event fires
When the 'end' event fires
const http = require('http');

var firstByte = null;

var server = http.createServer((request, response) => {
  const { headers, method, url } = request;
  let body = [];
  request.on('error', (err) => {
    // ignore request errors for this test
  }).on('data', (chunk) => {
    if (!firstByte) {
      firstByte = Date.now();
      console.log('received first byte at: ' + Date.now());
    }
  }).on('end', () => {
    console.log('end receive data at: ' + Date.now());
    // body = Buffer.concat(body).toString();
    // At this point, we have the headers, method, url and body, and can now
    // do whatever we need to in order to respond to this request.
    if (url === '/') {
      response.statusCode = 200;
      response.setHeader('Content-Type', 'text/html');
      response.write('<h1>Hello World</h1>');
    }
    firstByte = null;
    response.end();
  });
  console.log('received a request at: ' + Date.now());
});

server.listen(8083);
Frontend code (snippet run from devtools)
This code fires an upload to /upload with some array data. I originally filled the array with random bytes, but then removed that and saw it had no effect on my timing log, so the upload content for now is just an array of 0's.
console.log('building data');
var view = new Uint32Array(new Array(5 * 1024 * 1024));
console.log('start sending at: ' + Date.now());
fetch("/upload", {
  body: view,
  method: "post"
}).then(async response => {
  const text = await response.text();
  console.log('got response: ' + text);
});
Running the backend code and then the frontend code, I get the following logs.
Log capture (screenshots omitted): the backend and frontend logs, and the time differences between backend and frontend.
Results
Looking at the screenshots, I get two differences between the logs:
The first, and most important, is the difference between the frontend fetch start and the backend request being received. I got 1613ms, which is "close" to the Resource Scheduling time (1430ms) in the network timing tab. I think more things happen between the frontend fetch call and the node backend event, so I can't compare the times directly:
log.backendReceivedRequest - log.frontEndStart
1613
The second is the time spent receiving the data on the backend, for which I got 578ms, close to the Request sent time (585ms) in the network timing tab:
log.backendReceivedAllData - log.backendReceivedFirstData
578
I also changed the frontend code to send different sizes of data, and the network timing tab still matched the logs.
The thing that remains unknown to me is... why is Google Chrome queueing my fetch, since I'm not running any other requests and not using the bandwidth of the server/host? I read the conditions for Queueing but did not find the reason; maybe it is allocating the resources on disk, but I'm not sure: https://developer.chrome.com/docs/devtools/network/reference/#timing-explanation
References:
https://nodejs.org/es/docs/guides/anatomy-of-an-http-transaction/
https://developer.chrome.com/docs/devtools/network/reference/#timing-explanation
I found the problem. It was in the nginx config. Nginx was set up as a reverse proxy. By default, proxy request buffering is enabled, so nginx first reads the whole request body and only then forwards it to Node.js; that's why I saw the delay.
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering
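
For anyone hitting the same thing, a minimal sketch of the nginx fix, assuming a typical proxy location (the path and upstream address are illustrative):

# disable request body buffering so chunks reach Node.js as they arrive
location /upload {
    proxy_request_buffering off;
    proxy_pass http://127.0.0.1:3000;
}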

May I have a res.json and res.sendFile together in the same api endpoint in express?

I want to upload the content of an Excel file to the server in order to get its data and do some stuff...
I came up with the following code; however, it seems it is not working properly, as the following error displays in the console: Error: Can't set headers after they are sent.
The file is getting uploaded to the folder and the JSON message is being displayed... However, I do not know if I am going to face any issues in the future...
Actually, I just need the Excel data; there is no need for the file to be stored... Maybe you could give me a workaround, guys...
const express = require('express');
const multer = require('multer');
const path = require('path');
const XLSX = require('xlsx');

const router = express.Router();

const storage = multer.diskStorage({
  destination(req, file, cb) {
    cb(null, 'uploads/');
  },
  filename(req, file, cb) {
    cb(
      null,
      `${file.fieldname}-${Date.now()}${path
        .extname(file.originalname)
        .toLowerCase()}`
    );
  },
});

const excelFilter = (req, file, cb) => {
  if (
    file.mimetype.includes('excel') ||
    file.mimetype.includes('spreadsheetml')
  ) {
    cb(null, true);
  } else {
    cb('Please upload only excel file.', false);
  }
};

const upload = multer({
  storage,
  fileFilter: excelFilter,
});

router.post('/', upload.single('file'), (req, res) => {
  var workbook = XLSX.readFile(req.file.path);
  var sheet_name_list = workbook.SheetNames;
  var xlData = XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name_list[0]]);
  // this line triggers "Can't set headers after they are sent"
  res.json(xlData).sendFile(`/${req.file.path}`, { root: path.resolve() });
});
May I have a res.json and res.sendFile together in the same api endpoint in express?
No, you cannot. Each of those methods sends a complete http response (including calling res.end(), which terminates the http request), and you can only send one http response to each incoming request. The particular error you're getting has to do with res.sendFile() trying to configure the response it's getting ready to send and finding that the http response object has already been used to send a response and can't be used again.
Ordinarily, if you wanted to send two different pieces of data, you would combine them into a single Javascript object and call res.json() on the object that contains both pieces of data.
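
For example, a minimal sketch of that approach for the route above (the response field names are illustrative):

router.post('/', upload.single('file'), (req, res) => {
  const workbook = XLSX.readFile(req.file.path);
  const sheetName = workbook.SheetNames[0];
  const xlData = XLSX.utils.sheet_to_json(workbook.Sheets[sheetName]);
  // one response, one object: the parsed rows plus any file metadata
  res.json({
    data: xlData,
    file: req.file.filename,
  });
});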
But, sending a binary file is not something you can easily put in a JSON package. You could construct a multipart response where one part was the JSON and one part was the file. You could JSON encode binary data (though that's inefficient). I presume there are probably some modules that would help you do that, but for most clients, that isn't what they are really expecting or equipped to handle.
The only way to get to a proper solution is for us to understand what client/server workflow you're trying to implement here and why you're trying to send back the same file that was just uploaded. There would normally be no reason to do that, since the client already has that data (they just uploaded it).

Why Does Browser Keep Loading When Using createReadStream() Node.js

I'm new to Node.js. I understand that the createReadStream() function is better for performance than readFile(), because createReadStream() reads and writes data in chunks while readFile() first reads the whole content into memory. Thus, if the file is large, readFile() might take longer before data can be processed further. So I chose to create my server using the createReadStream() function, as follows.
var http = require('http');
var url = require('url');
var fs = require('fs');

// Create a server with fs.createReadStream(), better performance and less memory usage.
http.createServer(function (request, response) {
  // Parse the request containing the file name
  var pathname = url.parse(request.url).pathname;
  // Create a readable stream.
  var readerStream = fs.createReadStream(pathname.substr(1));
  // Set the encoding to be UTF8.
  readerStream.setEncoding('UTF8');
  // Handle stream events --> data, end and error
  readerStream.on('data', function(chunk) {
    // Page found
    // HTTP Status: 200 : OK
    // Content Type: text/html
    response.writeHead(200, {'Content-type': 'text/html'});
    // Write the content of the file to the response body.
    response.write(chunk);
    console.log('Page is being streamed...');
  });
  readerStream.on('end', function() {
    console.log('Page is streamed and emitted successfully.');
  });
  readerStream.on('error', function(err) {
    // HTTP Status: 404 : NOT FOUND
    // Content Type: text/html
    response.writeHead(404, {'Content-type': 'text/html'});
    console.log('Page streaming error: ' + err);
  });
  console.log('Code ends!');
}).listen(8081);

// Console will print the message
console.log('Server running at http://127.0.0.1:8081/');
My .html or .txt file contains three short lines of text. After starting my server I visit my web page by going to http://127.0.0.1:8081/index.html. Everything works fine, and the content of index.html is shown in the browser.
But in the browser tab, the loader icon keeps spinning, as if the page were still loading, for about a minute.
Is that normal with a Node.js server? Does the icon just keep spinning at no cost to the server? Or am I missing something, and the icon is not supposed to keep spinning?
It doesn't look like you are ending your response. The browser probably thinks the request isn't finished and thus continues to "load".
If you look at the Network tab in the developer console you might see the request hasn't finished.
You should be calling response.end():
This method signals to the server that all of the response headers and body have been sent; that server should consider this message complete. The method, response.end(), MUST be called on each response.
I believe you should be calling response.end() in both the readerStream.on('end' and readerStream.on('error' callbacks after you write the head. This will tell the browser the request is finished and it can stop the loading action.
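
Applied to the question's code, a minimal sketch of the two corrected handlers:

readerStream.on('end', function() {
  // all chunks have been written; finish the response so the
  // browser knows the request is complete
  response.end();
  console.log('Page is streamed and emitted successfully.');
});

readerStream.on('error', function(err) {
  response.writeHead(404, {'Content-type': 'text/html'});
  // end the response even on error, or the request hangs
  response.end();
  console.log('Page streaming error: ' + err);
});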

Node.js Pipe a PDF API Response

So my scenario is: a user clicks a button on a web app, which triggers a server-side POST request to an internal (i.e. non-public) API sitting on another server in the same network; this returns a PDF to my server, which then proxies (pipes) it back to the user.
I want to just proxy the PDF body content directly to the client without creating a tmp file.
I have this code, which works using the npm request module, but it does not feel right:
var pdfRequest = request(requestOptions);

pdfRequest.on('error', function (err) {
  utils.sendErrorResponse(500, 'PROBLEM PIPING PDF DOWNLOAD: ' + err, res);
});

pdfRequest.on('response', function (resp) {
  if (resp.statusCode === 200) {
    pdfRequest.pipe(res);
  } else {
    utils.sendErrorResponse(500, 'PROBLEM PIPING PDF DOWNLOAD: RAW RESP: ' + JSON.stringify(resp), res);
  }
});
Is this the correct way to pipe the PDF response?
Notes:
I need to check the status code to conditionally handle errors; the payload for the POST is contained in requestOptions (I know this part is all correct).
I would like to keep using the request module
I definitely do not want to be creating any temp files
If possible, I would also like to modify the Content-Disposition header to set a custom filename; I know how to do this without using pipes
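
For what it's worth, a minimal sketch of that combination — checking the status code and setting a custom filename before piping (the filename is illustrative, and utils.sendErrorResponse is the question's own helper):

pdfRequest.on('response', function (resp) {
  if (resp.statusCode === 200) {
    // headers must be set on the outgoing response before piping starts
    res.setHeader('Content-Type', 'application/pdf');
    res.setHeader('Content-Disposition', 'attachment; filename="report.pdf"');
    pdfRequest.pipe(res);
  } else {
    utils.sendErrorResponse(500, 'PROBLEM PIPING PDF DOWNLOAD: STATUS ' + resp.statusCode, res);
  }
});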

express node server return a value to async client call

Sorry, I tend to be a bad writer when I have not fully woken up, so let me revise.
I am using expressjs with passportjs (local strategy) to manage my server and using connect-busboy to manage file uploading. I do not think passport will play a role in this.
Here is the server code for managing file uploads:
app.post('/upload', isLoggedIn, (req, res) => {
  if (req.busboy) {
    req.pipe(req.busboy);
    req.busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
      if (mimetype.match(/^image\//)) {
        var root = path.join(__dirname, "../public/images/");
        if (fs.existsSync(path.join(root, filename))) {
          var name = getUnique(path.join(root, filename));
        } else {
          var name = filename;
        }
        var ws = fs.createWriteStream(path.join(root, name), { flags: "a" });
        file.pipe(ws);
      }
    });
  }
});
As for my client page, it is used to change a JSON object which gets re-uploaded to the server as a configuration tool. When I upload a new image asynchronously, I need to get the filename back to update this JSON object while working on it. For uploading from the client's end I am using dropzonejs, which did not require any configuration on my part to work.
So, in summary: I upload a number of images asynchronously via dropzone, busboy and fs on my server save the files, and I would like the filename returned to my javascript so I can modify the existing JSON object.
Edit solution:
Thanks to Elliot Blackburn for pointing me in the right direction.
By calling:
ws.on('close', () => {
res.send({filename: name});
});
after file.pipe(ws), the response is sent back to the client once the write stream closes. On the client side, modify dropzone to handle the response like so:
dropzone.on('success', (file, res) => {
console.log(res);
});
Just send it in the normal http response. It'll depend on what library you're using, but most will allow you to handle a normal req, res, next express call. From that you can access the file object and return anything you want.
Something like:
res.send({filename: name}); // name is the filename var set earlier in the code.
Once you've finished editing the file and such, you can put the name into that returned object, and your client will receive it as the response, which it can act upon.
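
Putting it together, a minimal sketch of the full route with the response wired to the write stream's 'close' event (isLoggedIn and getUnique are the question's own helpers):

app.post('/upload', isLoggedIn, (req, res) => {
  if (req.busboy) {
    req.pipe(req.busboy);
    req.busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
      if (mimetype.match(/^image\//)) {
        var root = path.join(__dirname, "../public/images/");
        var name = fs.existsSync(path.join(root, filename))
          ? getUnique(path.join(root, filename))
          : filename;
        var ws = fs.createWriteStream(path.join(root, name), { flags: "a" });
        // respond only after the file has been fully flushed to disk
        ws.on('close', () => {
          res.send({ filename: name });
        });
        file.pipe(ws);
      }
    });
  }
});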
