This question extends that of What is Node.js' Connect, Express and "middleware"?
I'm going the route of learning Javascript -> Node.js -> Connect -> Express -> ... in order to learn about using a modern web development stack. I have a background in low-layer networking, so getting up and going with Node.js' net and http modules was easy. The general pattern of using a server to route requests to different handlers seemed natural and intuitive.
Moving to Connect, I'm afraid I don't understand the paradigm and the general flow of data through this "middleware". For example, if I create some middleware for use with Connect, à la:
// example.js
module.exports = function (opts) {
  // ...
  return function (req, res, next) {
    // ...
    next();
  };
};
and "use" it in Connect via
var example = require('./example');
// ...
var server = connect.createServer();
// ...
server.use(example(some_parameter));
I don't know when my middleware gets called. Additionally, if I'm use()'ing other middleware, is the order in which the middleware is called guaranteed? Furthermore, I'm under the assumption that the function next() is used to call the next middleware (again, how do I establish an ordering?); however, no parameters (req, res, next) are passed to it. Are these parameters passed implicitly somehow?
I'm guessing that the collection of middleware modules used are strung together, starting with the http callback -> hence a bunch of functionality added in the middle of the initial request callback and the server ending a response.
I'm trying to understand the middleware paradigm, and the flow of information/execution.
Any help is greatly appreciated. Thank you for reading
Middleware is called as a chain of functions, in the order in which it was defined, for the routes it matches (if applicable).
The req and res objects travel through the whole chain, so you can reuse, improve or modify the data in them along the way.
There are two general use cases for middleware: generic and specific.
Generic middleware is what you have defined in the example above: with app.use, it applies to every single request. Each middleware has to call next() if it wants to proceed to the next middleware.
When you use app.get('/path', function(... the function you pass is middleware as well, just defined inline. So it is more or less entirely based on middleware, and there is no endware :D
The chain order is based on definition order, so it is important to define middleware synchronously, or asynchronously in a way that keeps the order reliable. Otherwise a different order can break the logic when middleware in the chain depends on each other.
Some middleware can break the chain with return next(new Error());. This is useful, for example, for validation or authentication middleware.
Another useful pattern is middleware that processes and parses request data, such as cookies; a good example is app.use(express.bodyParser());.
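To make the flow concrete, here is a minimal sketch of a Connect-style dispatcher. It is not Connect's actual source; createApp, use and handle are hypothetical names chosen for illustration, and errors simply end the chain here (real Connect and Express route them to error-handling middleware). It shows that middleware runs in registration order, and that next() needs no req/res arguments because it is a closure that already holds them.

```javascript
// Minimal sketch of a Connect-style middleware chain (not Connect's real code).
// createApp, use and handle are hypothetical names for illustration only.
function createApp() {
  const stack = []; // middleware in registration order

  return {
    use(fn) {
      stack.push(fn);
      return this;
    },
    // Called once per request, e.g. from an http server's request callback.
    handle(req, res, done) {
      let index = 0;
      // next closes over req and res, so middleware calls it without passing them.
      function next(err) {
        if (err) return done(err); // in this sketch an error ends the chain
        const fn = stack[index++];
        if (!fn) return done(); // end of chain
        fn(req, res, next);
      }
      next();
    },
  };
}

const app = createApp();
const calls = [];

app.use((req, res, next) => { calls.push('first'); req.user = 'alice'; next(); });
app.use((req, res, next) => { calls.push('second:' + req.user); next(); });
app.use((req, res, next) => { calls.push('third'); next(new Error('stop')); });
app.use((req, res, next) => { calls.push('never reached'); next(); });

let finalError = null;
app.handle({}, {}, (err) => { finalError = err || null; });

console.log(calls);              // [ 'first', 'second:alice', 'third' ]
console.log(finalError.message); // stop
```

The second middleware sees the req.user set by the first, and the fourth never runs because the third passed an error to next().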
Related
I joined a small team of devs for a start up. We have not even launched yet. I have been handed a backend service written in node/express. I have not worked with this tech beyond small pet projects. I was looking into implementing a style guide just to keep code consistent, with the goal of implementing this across other backend services as well.
That brought me to the Airbnb style guide. This part jumped out at me.
Never mutate parameters
// bad
function f1(obj) {
obj.key = 1;
}
// good
function f2(obj) {
const key = Object.prototype.hasOwnProperty.call(obj, 'key') ? obj.key : 1;
}
In express there are typically controllers that get defined like so:
async function someController(req, res, next) {
// I've seen similar code to this
req.someNewProp = "Some new value."
res.status(200).json({"someJSONKey":"someJSONVal"});
}
Middleware typically gets defined like this:
// Route
router.get('/endpoint', function1, function2)
async function function1(req, res, next) {
// I've seen similar code to this
req.someNewProp = "Some new value."
// Pass req and res to function2
next();
}
I notice that the req object, as it gets passed around gets modified a lot. Data gets added to this object in middleware and other functions as it is passed along before the response is returned. The original dev that authored the code referred to it as "keeping things in request scope." But that seems to directly contradict a major point in the style guide and made me wonder if this is bad practice.
So the question now is, is there a "better" or more widely accepted way to keep track of things in the context of the request that is not mutating the original request object? What are some approaches of doing this?
Express provides a namespace for applications to store request/response processing variables: add them as properties of res.locals. This seems a better choice than attaching non-standard properties to the request or response objects themselves.
In similar fashion, global application variables can be stored as properties of app.locals.
Unfortunately there doesn't seem to be a locals property defined for router instances. I have placed a reference to global router instance options in res.locals as the first middleware step in a route I wrote, but that was my choice.
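A minimal sketch of the res.locals approach follows. The middleware names (loadUser, showProfile) and the hand-built req/res objects are assumptions made so the example runs without a server; in a real app Express supplies res.locals on each response object.

```javascript
// Sketch: pass per-request data through res.locals instead of new req properties.
// loadUser and showProfile are hypothetical middleware; the req/res objects are
// hand-built stand-ins so the example runs without Express installed.
function loadUser(req, res, next) {
  res.locals.user = { id: req.params.id, name: 'Ada' }; // request-scoped data
  next();
}

function showProfile(req, res) {
  res.json({ greeting: 'Hello ' + res.locals.user.name });
}

// Stand-ins for what Express would provide on each request.
const req = { params: { id: '42' } };
const res = {
  locals: {},
  body: null,
  json(payload) { this.body = payload; },
};

loadUser(req, res, () => showProfile(req, res));

console.log(res.body);         // { greeting: 'Hello Ada' }
console.log(Object.keys(req)); // [ 'params' ]
```

In an Express app the same two functions could be wired up as router.get('/users/:id', loadUser, showProfile); the request object itself gains no new properties.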
Some request properties do need to change during processing, such as req.path, so mutation is not something to avoid at all costs. Express provides req.originalUrl by deliberate design, so you can recalculate path components any time you need to.
You may find Express gets more interesting with use - I've only recently learned of and passed an error object argument to the next function. As for the Airbnb guide quote in the post: underwhelming in a word! The "good" and "bad" code quoted in the post don't do the same thing.
It's not a bad practice; this is the idea behind middleware in Express. Put simply, middleware functions can modify the request and response objects, or even decide whether the request continues or is terminated. However, you have to be careful not to overwrite a pre-existing property, or you can get some strange behavior. Also, if the information you are going to store on the request is big, you can consider other strategies, for instance storing it in an in-memory database such as Redis.
After reviewing the express docs I found this bit in the middleware section:
Middleware functions can perform the following tasks:
Execute any code.
Make changes to the request and the response objects.
End the request-response cycle.
Call the next middleware function in the stack.
Middleware Docs
So it's probably safe to say that if the docs explicitly say that we can modify req and response objects in middleware, it is probably not bad practice.
I'm writing a Node.js server script that shares a text list between multiple clients asynchronously.
The clients can read, add or update items of this shared list.
static getitems(){
if (list== undefined) list = JSON.parse(fs.readFileSync("./list.json"));
return list;
}
static additem(newitem){
var key = Object.keys(newitem)[0];
list[key] = newitem[key];
fs.writeFileSync("./list.json", JSON.stringify(list));
}
Clients can modify and get the list data using the following Express APIs:
app.get("/getlist", (req, res) => {
  res.send(TempMan.getTemplates());
});
app.post("/addlist", (req, res) => {
  additem(req.body.newitem)
  res.status(204).end()
});
I have a long background in C#, C++ and other desktop programming languages. Although I have read that JavaScript doesn't run into race conditions, I am worried that resource sharing is going to be a problem. I first thought of semaphores, shared locks or other multithreading solutions from other languages, but then read that JavaScript doesn't need such methods.
Does such a Node.js implementation run into resource-sharing problems, such as simultaneous attempts to read and write the file? How can I solve this? Do I need some kind of transaction functions in JavaScript?
Generally speaking, a Node.js program can run into the resource-sharing problem you describe; it is usually called a "race condition". It is not caused by two threads or processes but by an intrinsic property of the language: async execution. Suppose there are two async functions. If the first has started but not finished, and it has an await inside, the second async function can start. This can cause race conditions if both access the same resource in their code blocks.
I have made a slide to introduce this issue: https://slides.com/grimmer/intro_js_ts_task_queuelib_d4c/fullscreen#/0/12.
Going back to your example: your code as written will not have race conditions, because getitems and additem only make synchronous calls (fs.readFileSync, fs.writeFileSync). Each handler runs to completion before the event loop can dispatch the next request. That changes if you replace fs.writeFileSync with an async function and await it inside the routing callback. Express does not wait for one async routing callback to finish before it starts the next one; it calls the handler and returns to the event loop, so a second request can start while the first is suspended at an await.
For example:
app.post('/testing1', async (req, res) => {
  // Do something here
});
app.post('/testing2', async (req, res) => {
  // Do something here
});
With two requests in flight, this behaves like the code below in the implementation of Express (note that nothing is awaited),
async function expressCore() {
  first_routing_call_back()
  second_routing_call_back()
}
The same is true of other server frameworks: https://www.apollographql.com/ and https://nestjs.com/ both allow two async routing methods to be executed concurrently,
and you need to find a way to avoid race conditions if this is a concern: either use transactions if you use a DB, or use one of the npm synchronization libraries, which are lightweight and suitable for a program running as a single Node.js instance, e.g. https://www.npmjs.com/package/d4c-queue, which I wrote. Multiple Node.js instances are multiple processes and can have race conditions between them; there a DB transaction is the more suitable solution.
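As a rough illustration of both the problem and one lightweight fix, the sketch below shows a read-modify-write that loses an update when two calls interleave at an await, and a promise-chain queue that serializes them. Everything here is hypothetical: the "file" is an in-memory string, the delay stands in for async I/O, and createQueue is a hand-rolled helper, not the API of d4c-queue or any other library.

```javascript
// Sketch: a lost update caused by interleaving at an await, and a promise-chain
// queue that serializes the critical section. All names here are hypothetical.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function createStore() {
  let file = '{}'; // pretend this is list.json on disk
  return {
    async read() { await delay(5); return JSON.parse(file); },
    async write(list) { await delay(5); file = JSON.stringify(list); },
    contents() { return JSON.parse(file); },
  };
}

// Read-modify-write with awaits in between: another call can run in the gaps.
async function addItem(store, key, value) {
  const list = await store.read();
  list[key] = value;
  await store.write(list);
}

// Each task starts only after the previous one has settled.
function createQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(task);
    tail = result.catch(() => {}); // keep the chain alive after a failure
    return result;
  };
}

async function unsafeDemo() {
  const store = createStore();
  await Promise.all([addItem(store, 'a', 1), addItem(store, 'b', 2)]);
  return store.contents();
}

async function safeDemo() {
  const store = createStore();
  const enqueue = createQueue();
  await Promise.all([
    enqueue(() => addItem(store, 'a', 1)),
    enqueue(() => addItem(store, 'b', 2)),
  ]);
  return store.contents();
}

unsafeDemo().then((list) => console.log('unsafe:', list)); // one of the keys is lost
safeDemo().then((list) => console.log('safe:', list));     // both keys survive
```

In the unsafe run both calls read the empty list before either writes, so the later write overwrites the earlier one. A single in-process queue like this only helps within one Node.js process.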
I've been reading through the docs but still don't quite understand why we store express() inside an app variable.
I know we can't just call methods using express().get and .post because I tried and failed, but why?
How come it doesn't work like if we would call a function from the module.exports of any file we require?
I'm just really confused lol.
express expects you to create an instance object of it and use that. A short way of answering is to say "because that's what the makers of express expect from their users."
Across your script the expectation from the developers is that your .get and .post methods are called against a common instance of express. In this way, we can say that the call to express() initializes the instance and returns an object, which you store in app.
Edit in response to your comment:
express is a function that creates a new object based off a class
express() initializes the app object and I have not yet encountered a situation where I need to know specifically how. I have no idea if it's a function or a class. This is "encapsulation", the concept in OOP where there is a clear boundary between what you, the user of a module need to know in order to use it, and what the developer of the module needs to know to keep it working.
...dependent on the method used(ex: .get), and then uses that instance to allow us to make a route that returns things such as the req and res parameters in the callback?
The initialized object implements methods, callbacks, et al (like .get as you describe.)
All of which is in the express module?
All of which is the conventional pattern for working with the express API.
What really happens when your code calls var express = require('express') is that it imports a Factory Method called createApplication (source code here).
Meanwhile, when you call express().get and express().post, you expect each call to return the same instance of the Express app object, but it does not. Your code would work if Express used the Singleton pattern under the hood (so that every call to express() returned the same instance). The Factory Method design pattern, by contrast, always creates a new instance.
As a result, every route you add directly with express().get or express().post ends up on a different application instance. So it works as advertised, just not as you expected.
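The difference can be sketched without Express at all. The createApplication below is a hypothetical stand-in, not Express's implementation; the only property it shares with the real one is that every call returns a fresh object with its own route table.

```javascript
// Sketch of the factory behaviour described above. This createApplication is a
// hypothetical stand-in: each call returns a new object with its own routes.
function createApplication() {
  const routes = [];
  return {
    get(path, handler) { routes.push({ method: 'GET', path, handler }); return this; },
    post(path, handler) { routes.push({ method: 'POST', path, handler }); return this; },
    routeCount() { return routes.length; },
  };
}

const express = createApplication; // plays the role of require('express')

// Two calls, two unrelated instances: each route lands on its own throwaway app.
const first = express().get('/a', () => {});
const second = express().post('/b', () => {});
console.log(first === second);                         // false
console.log(first.routeCount(), second.routeCount());  // 1 1

// Keeping one instance in a variable puts every route on the same app.
const app = express();
app.get('/a', () => {});
app.post('/b', () => {});
console.log(app.routeCount());                         // 2
```

Storing the result of express() in app is what lets all routes, and the eventual listen call, refer to the same instance.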
From Express's error handling docs:
Define error-handling middleware functions in the same way as other middleware functions, except error-handling functions have four arguments instead of three: (err, req, res, next). For example:
app.use(function (err, req, res, next) {
console.error(err.stack)
res.status(500).send('Something broke!')
})
It seems that Express's .use(middleware) inspects the middleware function's length to see how many arguments it takes, and if it takes four, treats it differently from other middleware functions, passing it an error as the first argument.
This is incompatible with modern linting setups like XO or Airbnb, as it forces you to define an extra trailing parameter that doesn't get consumed (i.e. next in the above example). Also, unused trailing parameters might get removed by some automated code transformations, which is just something I don't want to worry about. In my opinion, Function.prototype.length is best kept for metaprogramming/introspection as opposed to affecting real production behaviour.
Is there any other, more explicit way to define an error handler in Express, that doesn't force you to imply intended functionality through the number of parameters your function takes?
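To show why the concern is real, here is a small sketch of an arity check based on Function.prototype.length. isErrorHandler is a hypothetical helper that mimics the kind of check described above; it is not part of Express's API.

```javascript
// Sketch: how an arity check on Function.prototype.length tells handlers apart.
// isErrorHandler is a hypothetical helper, not part of Express's API.
function isErrorHandler(fn) {
  return fn.length === 4;
}

const normal = function (req, res, next) {};
const errorHandler = function (err, req, res, next) {};

// Dropping the unused trailing parameter silently changes the classification.
const trimmed = function (err, req, res) {};

// Parameters with default values and rest parameters are not counted by .length.
const withDefault = function (err, req, res, next = () => {}) {};
const withRest = function (err, ...rest) {};

console.log(normal.length, isErrorHandler(normal));             // 3 false
console.log(errorHandler.length, isErrorHandler(errorHandler)); // 4 true
console.log(trimmed.length, isErrorHandler(trimmed));           // 3 false
console.log(withDefault.length, isErrorHandler(withDefault));   // 3 false
console.log(withRest.length, isErrorHandler(withRest));         // 1 false
```

One possible workaround on the linting side, assuming ESLint's no-unused-vars rule is in use, is to name the parameter _next and set argsIgnorePattern to "^_", which keeps the function's length at four.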
I am starting to learn Node.js and trying to understand its architecture combined with the micro-framework Express.
I see that Express uses Connect as a middleware. Connect augments the request and response objects with all kinds of stuff in a chain of functions, and it provides an API so you can add custom middleware. I guess this augmenting is a way to keep things simple and flexible in the handlers/controllers, instead of having a variable number of parameters and parameter types. Here is an example of a simple GET handler:
app.get('/', function (req, res) {
res.render('index', { title: 'Hey', message: 'Hello there!'});
})
In tutorials from Node.js experts I have seen things like augmenting the request object with a MongoDB collection. In a blog post by Azat Mardan I have seen this code:
var db = mongoskin.db('mongodb://#localhost:27017/test', {safe:true})
app.param('collectionName', function(req, res, next, collectionName){
req.collection = db.collection(collectionName)
return next()
})
The approach above uses the 'collectionName' parameter in the route as a condition that controls the augmentation of the request. However, I have seen uglier code where the database middleware is attached on EVERY request that goes through Node.js, without this conditional approach.
Looking at standard software principles like the single responsibility principle, separation of concerns and testability, why is it a good idea to extend the request with a MongoDB collection object and dozens of other objects? Aren't the request and response objects bloated with functionality this way, with unpredictable state and behavior? Where does this pattern come from, and what are the pros, cons and alternatives?
This is fine. IMHO the very purpose of the request object is as a container to pass things down the stack for other handlers to use. It is far cleaner than looking for some agreed-upon-named global holder.
You could argue that it should be mostly empty, and then have the "official" request and response functionality on some property of the request/response objects, so it is cleaner, but I think the benefits are minimal.
Matter of fact, just about every middleware I have seen, including looking at the express source code and ones I have authored, uses request for exactly this sort of "container to pass properties and functionalities down the handler stack".
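On the testability point, a handler that reads from the request "container" can be exercised with plain stand-in objects. The sketch below assumes hypothetical names (fakeDb, attachCollection, listItems) and involves neither MongoDB nor Express; attachCollection has the same shape as the app.param callback in the question.

```javascript
// Sketch: a param-style middleware attaches a collection to req, and the handler
// that consumes it is tested with stand-in objects. All names are hypothetical.
const fakeDb = {
  collection(name) {
    return { name, find: () => [{ _id: 1, from: name }] };
  },
};

// Same shape as the app.param callback in the question.
function attachCollection(db) {
  return function (req, res, next, collectionName) {
    req.collection = db.collection(collectionName);
    return next();
  };
}

// The handler only depends on req.collection, not on how it got there.
function listItems(req, res) {
  res.send(req.collection.find());
}

const req = {};
const res = { sent: null, send(body) { this.sent = body; } };

attachCollection(fakeDb)(req, res, () => listItems(req, res), 'users');

console.log(res.sent); // [ { _id: 1, from: 'users' } ]
```

Because the database is passed in to attachCollection rather than looked up globally, swapping in a fake for tests needs no changes to the handler.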