I am questioning whether it is required to validate fields like req.ip or req.path server-side.
It boils down to the question: Is it possible for the client to set something like .set('Remote-Addr', <Malicious JavaScript>) and have it successfully propagated to my Node or Express middleware?
Thanks for helping!
There is no way to validate the source IP, particularly when proxies are involved. In the proxy case, a chain of IP addresses is supposed to be put in HTTP headers, but that can certainly be faked, so what Express thinks is the original IP cannot be trusted. It is likely accurate, but not guaranteed to be accurate.
req.path is entirely local and does not involve any client headers and is not subject to any client spoofing. It just comes from the actual HTTP request URL that arrives at your server. The only way it wouldn't be the same as the actual request URL is if you were using a mount point for routers in which case the mount point part of the path will have been removed by express. Or perhaps if your own middleware attempted to mess with it.
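To illustrate why the forwarded chain cannot be trusted blindly, here is a minimal sketch in plain JavaScript. The function name clientIp and the trustedProxyCount parameter are made up for this example and are not Express APIs. It assumes each proxy appends the address it received the request from to X-Forwarded-For, so only the rightmost entries, added by proxies you operate yourself, are reliable. Express's "trust proxy" setting addresses the same concern.

```javascript
// Hypothetical helper (not part of Express): choose the client IP from the
// socket address, the raw X-Forwarded-For header value, and the number of
// proxies you operate yourself.
function clientIp(socketAddress, forwardedFor, trustedProxyCount) {
  // Build the chain as seen from the server: header entries first,
  // the directly connected peer last.
  const chain = (forwardedFor || '')
    .split(',')
    .map((entry) => entry.trim())
    .filter(Boolean);
  chain.push(socketAddress);

  // Walk past the trusted proxies from the right. Anything further left
  // was supplied by a party you do not control and may be forged.
  const index = Math.max(chain.length - 1 - trustedProxyCount, 0);
  return chain[index];
}

// A client behind one trusted proxy tries to inject a fake first entry.
const ip = clientIp('10.0.0.1', 'forged-value, 203.0.113.7', 1);
console.log(ip);
```

Whatever value ends up being used is still just a string that originated outside your server, so it should be escaped like any other user input before it is written into HTML or logs.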
Related
I've been trying to implement the WebSocket protocol from scratch in Node.js, and in doing so I have a question that has so far gone unanswered. What exactly are subprotocols in regard to WebSockets? The second parameter of the WebSocket constructor is where you specify "subprotocols":
let socket = new WebSocket("ws://localhost:3000",["http",...]);
Can anybody give me a clear answer as to what purpose they have?
Websockets just define a mechanism to exchange arbitrary messages. What those messages mean, what kind of messages a client can expect at any particular point in time or what messages they are allowed to send is entirely up to the implementing application. So you need an agreement between the server and client about these things. You might say… you need a protocol specification. The subprotocol parameter simply lets clients formally exchange this information. You can just make up any name for any protocol you want. The server can simply check that the client appears to adhere to that protocol during the handshake. You can also use it to request different kinds of protocols from the server, or use it for versioning (e.g. when you introduce my-protocol-v2, but still need to support clients only understanding my-protocol-v1).
Explained on MDN here
Think of a subprotocol as a custom XML schema or doctype declaration.
You're still using XML and its syntax, but you're additionally
restricted by a structure you agreed on. WebSocket subprotocols are
just like that. They do not introduce anything fancy, they just
establish structure. Like a doctype or schema, both parties must agree
on the subprotocol; unlike a doctype or schema, the subprotocol is
implemented on the server and cannot be externally referred to by the
client.
Subprotocols are explained in sections 1.9, 4.2, 11.3.4, and 11.5 of the spec.
A client has to ask for a specific subprotocol. To do so, it will send
something like this as part of the original handshake:
GET /chat HTTP/1.1
...
Sec-WebSocket-Protocol: soap, wamp
or, equivalently:
...
Sec-WebSocket-Protocol: soap
Sec-WebSocket-Protocol: wamp
Now the server must pick one of the protocols that the client
suggested and it supports. If there is more than one, send the first
one the client sent. Imagine our server can use both soap and wamp.
Then, in the response handshake, it sends:
Sec-WebSocket-Protocol: soap
The server can't send more than one Sec-WebSocket-Protocol header. If
the server doesn't want to use any subprotocol, it shouldn't send any
Sec-WebSocket-Protocol header. Sending a blank header is incorrect.
The client may close the connection if it doesn't get the subprotocol
it wants.
If you want your server to obey certain subprotocols, then naturally
you'll need extra code on the server. Let's imagine we're using a
subprotocol json. In this subprotocol, all data is passed as JSON. If
the client solicits this protocol and the server wants to use it, the
server needs to have a JSON parser. Practically speaking, this will be
part of a library, but the server needs to pass the data around.
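The selection rule described above (pick the first protocol the client offered that the server also supports, otherwise send no header) could be sketched on the server side like this. The function name selectSubprotocol is hypothetical and does not come from any WebSocket library. It accepts either a single comma-separated header value or an array of values, to cover both request forms shown earlier.

```javascript
// Hypothetical helper: choose a subprotocol from the client's
// Sec-WebSocket-Protocol request header value(s).
function selectSubprotocol(requested, supported) {
  // Accept one string ("soap, wamp") or several header values (["soap", "wamp"]).
  const values = Array.isArray(requested) ? requested : [requested || ''];
  const offered = values
    .flatMap((value) => value.split(','))
    .map((name) => name.trim())
    .filter(Boolean);

  // Keep the client's order of preference: the first supported name wins.
  const match = offered.find((name) => supported.includes(name));

  // null means "omit the Sec-WebSocket-Protocol response header entirely".
  return match === undefined ? null : match;
}

const chosen = selectSubprotocol('soap, wamp', ['wamp', 'soap']);
console.log(chosen);
```

A handshake handler would add a Sec-WebSocket-Protocol response header only when the result is not null, since sending a blank header is incorrect.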
Some sample code, copied from https://hpbn.co/websocket/#subprotocol-negotiation, to make it clear.
The client can advertise which protocols it supports to the server as
part of its initial connection handshake:
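// Offer two subprotocols, in order of preference.
var ws = new WebSocket('wss://example.com/socket',
                       ['appProtocol', 'appProtocol-v2']);

ws.onopen = function () {
  // ws.protocol holds the subprotocol the server selected.
  if (ws.protocol == 'appProtocol-v2') {
    ...
  } else {
    ...
  }
};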
Currently working on an app where I authenticate and set the auth properties of a user in a middleware (to be used in the route). These auth properties are stored in res.locals.user.
I was wondering if the api caller could access and log the res.locals variable after the request is sent back. From what I've read it doesn't seem like the case but I want to be sure.
I've seen answers refer to the request/response lifecycle so any resource on that would also be greatly appreciated.
Thank you
res.locals.user is local to your server and local to the code processing that specific request. The sender of an incoming http request has no access to that data.
If you want to share anything in that with the sender of an incoming http request, then you can write your code to include something from there in the response you are sending back as the http response or if you are using an html template engine, then you can code the template to include anything you want from res.locals.user.
Server-side variables are only available on the server unless your code specifically sends them back to the client. And things in the req or res object are only available to that specific request while it is being processed. Once your code has finished processing the request, those specific req and res objects become eligible for garbage collection, and they are not reachable from other requests.
I'm messing around with the Darksky API and under one of the query parameters it states:
extend=hourly optional
When present, return hour-by-hour data for the next 168 hours, instead
of the next 48. When using this option, we strongly recommend enabling
HTTP compression.
I'm using Express as a node proxy which hits the Darksky api (i.e. localhost:3000/api/forecast/LATITUDE, LONGITUDE).
What does "HTTP compression" mean and how would I go about enabling it?
Here, compression means gzip compression on the Express server. You can use the compression middleware to add gzip compression to your server easily.
Read more about how to install that middleware here:
https://github.com/expressjs/compression
An example implementation should look like this:
var compression = require('compression')
var express = require('express')
var app = express()
// compress all responses
app.use(compression())
// add all routes
To quote from https://darksky.net/dev/docs
The Forecast Data API supports HTTP compression. We heartily recommend using it, as it will make responses much smaller over the wire. To enable it, simply add an Accept-Encoding: gzip header to your request. (Most HTTP client libraries wrap this functionality for you, please consult your library’s documentation for details.)
I'm not familiar with the Dark Sky API but I would imagine it returns a large amount of highly redundant data, which is ideal for compression. HTTP requests have a compression mechanism built in via Accept-Encoding, as mentioned above.
In your case that data will be travelling across the wire twice, once from Dark Sky to your server and then again from your server to your end user. You could compress just one of these two transmissions or both, it's up to you but it's likely you'd want both unless the end user is on the same local network as your server.
There are various SO questions about making compressed requests, such as:
node.js - easy http requests with gzip/deflate compression
The key decision for you is whether you want to decompress and recompress the data in your proxy or just stream it through. If you don't need a decompressed copy of the data in the server then it would be more efficient to skip the extra steps. You'd need to be careful to ensure all the headers are set correctly but if you just pass on the relevant headers that you receive (in both directions) it should be relatively simple to pipe through the response from Dark Sky.
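As a rough illustration of why redundant forecast-style JSON benefits from compression, here is a small sketch using Node's built-in zlib module. The payload shape and field values are invented for the example and are not taken from the Dark Sky API. The exact sizes depend on the data, so none are quoted here.

```javascript
const zlib = require('zlib');

// Made-up hourly records standing in for a 168-hour forecast payload.
const hourly = Array.from({ length: 168 }, (_, i) => ({
  time: 1500000000 + i * 3600,
  summary: 'Partly Cloudy',
  icon: 'partly-cloudy-day',
  temperature: 60 + (i % 24),
}));

const raw = Buffer.from(JSON.stringify({ hourly: { data: hourly } }));

// gzip is the encoding a client asks for with "Accept-Encoding: gzip".
const compressed = zlib.gzipSync(raw);

// Decompressing returns the original bytes unchanged.
const restored = zlib.gunzipSync(compressed);

console.log('raw bytes:', raw.length, 'gzipped bytes:', compressed.length);
```

If the proxy only streams the compressed body through, the gunzipSync step is the work it gets to skip.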
In express.js we often attach objects to the req object in middleware, e.g. req.myObject. What prevents a user sending an http request that includes req.myObject already set to some value? For example, I could use req.myObject as part of authentication. Could a user set req.myObject = true when sending a request when it should really be false? Potentially an issue if req.myObject is set on some routes but not others but middleware that checks req.myObject is re-used across routes.
req is an object created by Express when a request is received. It's not something passed directly from the client to the server; in fact, it isn't even available to the client.
A client can only relay information to the server in some limited ways - the GET query, POST form data, or route paths, which are attached to the req object by Express as req.query, req.body, and req.params respectively, plus HTTP headers, which appear in req.headers.
Anything else attached to the req object is out of scope of the client, at least directly.
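A simplified model of this, in plain JavaScript rather than Express itself, is sketched below. The function buildReq is a hypothetical stand-in for what a framework does when a request arrives. It assumes a JSON body. The point is only that client-supplied values land in dedicated containers and never become top-level properties of the request object.

```javascript
// Hypothetical stand-in for a framework building its request object.
// Client input is placed under query, headers and body only.
function buildReq(method, url, headers, bodyText) {
  const parsed = new URL(url, 'http://localhost');
  return {
    method,
    path: parsed.pathname,
    query: Object.fromEntries(parsed.searchParams),
    headers,
    body: bodyText ? JSON.parse(bodyText) : {},
  };
}

// The client tries every channel it has to "set" myObject.
const req = buildReq(
  'POST',
  '/login?myObject=true',
  { myobject: 'true' },
  '{"myObject": true}'
);

console.log(req.myObject);       // still unset by the client
console.log(req.query.myObject); // the client's value, as a string
console.log(req.body.myObject);  // the client's value, from the JSON body
```

The real risk is in your own code: middleware that copies fields from req.body or req.query onto req wholesale would reintroduce the problem.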
Related question: Node.js request object documentation?
I'm trying to determine the best practice in a REST API for determining whether the client can access a particular resource. Two quick example scenarios:
A phone directory lookup service. Client looks up a phone number by accessing eg.
GET http://host/directoryEntries/numbers/12345
... where 12345 is the phone number to try and find in the directory. If it exists, it would return information like the name and address of the person whose phone number it is.
A video format shifting service. Client submits a video in one format to eg.
POST http://host/videos/
... and receives a 'video GUID' which has been generated by the server for this video. Client then checks eg.
GET http://host/videos/[GUID]/flv
... to get the video, converted into the FLV format, if the converted version exists.
You'll notice that in both cases above, I didn't mention what should happen if the resource being checked for doesn't exist. That's my question here. I've read in various other places that the proper RESTful way for the client to check whether the resource exists here is to call HEAD (or maybe GET) on the resource, and if the resource doesn't exist, it should expect a 404 response. This would be fine, except that a 404 response is widely considered an 'error'; the HTTP/1.1 spec states that the 4xx class of status code is intended for cases in which the client 'seems to have erred'. But wait; in these examples, the client has surely not erred. It expects that it may get back a 404 (or others; maybe a 403 if it's not authorized to access this resource), and it has made no mistake whatsoever in requesting the resource. The 404 isn't intended to indicate an 'error condition', it is merely information - 'this does not exist'.
And browsers behave, as the HTTP spec suggests, as if the 404 response is a genuine error. Both Google Chrome and Firebug's console spew out a big red "404 Not Found" error message into the Javascript console each time a 404 is received by an XHR request, regardless of whether it was handled by an error handler or not, and there is no way to disable it. This isn't a problem for the user, as they don't see the console, but as a developer I don't want to see a bunch of 404 (or 403, etc.) errors in my JS console when I know perfectly well that they aren't errors, but information being handled by my Javascript code. It's line noise. In the second example I gave, it's line noise to the extreme, because the client is likely to be polling the server for that /flv as it may take a while to compile and the client wants to display 'not compiled yet' until it gets a non-404. There may be a 404 error appearing in the JS console every second or two.
So, is this the best or most proper way we have with REST to check for the existence of a resource? How do we get around the line noise in the JS console? It may well be suggested that, in my second example, a different URI could be queried to check the status of the compilation, like:
GET http://host/videos/[GUID]/compileStatus
... however, this seems to violate the REST principle a little, to me; you're not using HTTP to its fullest and paying attention to the HTTP headers, but instead creating your own protocol whereby you return information in the body telling you what you want to know, and always return an HTTP 200 to shut the browser up. This was a major criticism of SOAP - it tries to 'get around' HTTP rather than use it to its fullest. By this principle, why does one ever need to return a 404 status code? You could always return a 200 - of course, the 200 is indicating that a resource's status information is available, and the status information tells you what you really wanted to know - the resource was not found. Surely the RESTful way should be to return a 404 status code.
This mechanism seems even more contrived if we apply it to the first of my above examples; the client would perhaps query:
GET http://host/directoryEntries/numberStatuses/12345
... and of course receive a 200; the number 12345's status information exists, and tells you... that the number is not found in the directory. This would mean that ANY number queried would be '200 OK', even though it may not exist - does this seem like a good REST interface?
Am I missing something? Is there a better way to determine whether a resource exists RESTfully, or should HTTP perhaps be updated to indicate that non-2xx status codes should not necessarily be considered 'errors', and are just information? Should browsers be able to be configured so that they don't always output non-2xx status responses as 'errors' in the JS console?
PS. If you read this far, thanks. ;-)
It is perfectly okay to use 404 to indicate that a resource is not found. Some quotes from the book "RESTful Web Services" (a very good book about REST, by the way):
404 indicates that the server can’t map the client’s URI to a
resource. [...] A web service may use a 404 response as a signal to
the client that the URI is “free”; the client can then create a new
resource by sending a PUT request to that URI. Remember that a 404 may
be a lie to cover up a 403 or 401. It might be that the resource
exists, but the server doesn’t want to let the client know about it.
Use 404 when the service can't find the requested resource; do not overuse it to indicate errors that have nothing to do with the existence of the resource. Also, the client may "query" the service to find out whether a URI is free or not.
Performing long-running operations like encoding of video files
HTTP has a synchronous request-response model. The client opens an
Internet socket to the server, makes its request, and keeps the socket
open until the server has sent the response. [...]
The problem is not all operations can be completed in the time we
expect an HTTP request to take. Some operations take hours or days. An
HTTP request would surely be timed out after that kind of inactivity.
Even if it didn’t, who wants to keep a socket open for days just
waiting for a server to respond? Is there no way to expose such
operations asynchronously through HTTP?
There is, but it requires that the operation be split into two or more
synchronous requests. The first request spawns the operation, and
subsequent requests let the client learn about the status of the
operation. The secret is the status code 202 (“Accepted”).
So you could do POST /videos to create a video encoding task. The service will accept the task, answer with 202 and provide a link to a resource describing the state of the task.
202 Accepted
Location: http://tasks.example.com/video/task45543
The client may query this URI to see the status of the task. Once the task is complete, a representation of the resource will become available.
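The 202 flow described above could look roughly like the following in-memory sketch. All names here (submitVideo, getTask, completeTask, the /tasks/ path) are invented for illustration, and the functions return plain objects instead of writing to a real HTTP response.

```javascript
// In-memory task store; a real service would persist this somewhere.
const tasks = new Map();
let nextId = 1;

// POST /videos: accept the job and point the client at a task resource.
function submitVideo() {
  const id = String(nextId++);
  tasks.set(id, { state: 'pending' });
  return { status: 202, headers: { Location: `/tasks/${id}` } };
}

// GET /tasks/:id: the task exists as soon as the job is accepted, so polling
// returns 200 with the current state. 404 is reserved for unknown ids.
function getTask(id) {
  const task = tasks.get(id);
  if (!task) return { status: 404 };
  return { status: 200, body: task };
}

// Called by the (hypothetical) encoder once the conversion has finished.
function completeTask(id, videoUrl) {
  tasks.set(id, { state: 'done', video: videoUrl });
}

const accepted = submitVideo();
const taskId = accepted.headers.Location.split('/').pop();
const beforeDone = getTask(taskId);
completeTask(taskId, '/videos/example-guid/flv');
const afterDone = getTask(taskId);
console.log(accepted.status, beforeDone.body.state, afterDone.body.state);
```

With this design a polling client sees a 200 every time, so the browser console stays quiet while the conversion is still running.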
I think you have changed the semantics of the request.
With a RESTful architecture, you are requesting a resource. Therefore, requesting a resource that does not exist or cannot be found is considered an error.
I use:
404 if GET http://host/directoryEntries/numbers/12345 does not exist.
400 if the request is actually a bad request (400 Bad Request).
Perhaps, in your case you could think about searching instead.
Searches are done with query parameters on a collection of resources.
What you want is:
GET http://host/directoryEntries/numbers?id=1234
Which would return 200 and an empty list if none exist or a list of matches.
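The difference between looking up one resource and searching a collection can be sketched as two small functions. The directory data and the names getNumber and searchNumbers are made up for the example. They return plain objects in place of real HTTP responses.

```javascript
// Made-up directory data for the example.
const directory = [{ number: '12345', name: 'Example Person' }];

// GET /directoryEntries/numbers/:number
// A single named resource: 404 when it does not exist.
function getNumber(number) {
  const entry = directory.find((e) => e.number === number);
  return entry ? { status: 200, body: entry } : { status: 404 };
}

// GET /directoryEntries/numbers?id=...
// A search over the collection: always 200, possibly with an empty list.
function searchNumbers(query) {
  const matches = directory.filter((e) => e.number === query.id);
  return { status: 200, body: matches };
}

const missingLookup = getNumber('1234');
const missingSearch = searchNumbers({ id: '1234' });
console.log(missingLookup.status, missingSearch.status, missingSearch.body.length);
```

The collection itself always exists, which is why an empty result is a successful response and not a missing resource.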
IMO the client has indeed erred in requesting a non-existent resource. In both your examples the service can be designed in a different way so an error can be avoided on the client side. For example, in the video conversion service as the GUID has already been assigned, the message body at videos/id can contain a flag indicating whether the conversion was done or not.
Similarly, in the phone directory example, you are searching for a resource and this can be handled through something like /numbers/?search_number=12345 etc. so that the server returns a list of matching resources which you can then query further.
Browsers are designed for working with the HTTP spec and showing an error is a genuine response (pretty helpful too). However, you need to think about your Javascript code as a separate entity from the browser. So you have your Javascript REST client which knows what the service is like and the browser which is sort of dumb with regards to your service.
Also, REST is independent of protocols in theory. HTTP happens to be the most common protocol where REST is used. Another example I can think of is Android content providers whose design is RESTful but not dependent on HTTP.
I've only ever seen GET/HEAD requests return 404 (Not Found) when a resource doesn't exist. I think if you are just trying to get the status of a resource, a HEAD request would be fine, as it shouldn't return the body of the resource. This way you can differentiate between requests where you are trying to retrieve the resource and requests where you are trying to check for its existence.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
Edit: I remember reading about an alternative solution by adding a header to the original request that indicated how the server should handle 404 errors. Something along the lines of responding with 200, but an empty body.