Why serve 1x1 pixel GIF (web bugs) data at all? - javascript

Many analytics and tracking tools request a 1x1 GIF image (a web bug, invisible to the user) for cross-domain event storing/processing.
Why serve this GIF image at all? Wouldn't it be more efficient to simply return an error code such as 503 Service Temporarily Unavailable, or an empty file?
Update: To be clearer, I'm asking why serve GIF image data at all when all the information required has already been sent in the request headers. The GIF image itself does not return any useful information.

Doug's answer is pretty comprehensive; I thought I'd add an additional note (at the OP's request, expanding on my comment).
Doug's answer explains why 1x1 pixel beacons are used for the purpose they are used for; I thought I'd outline a potential alternative approach: respond with HTTP status code 204, No Content, and send no image body.
204 No Content
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
Basically, the server receives the request and decides not to send a body (in this case, not to send an image). But it replies with a status code to inform the agent that this was a conscious decision; basically, it's just a shorter way to respond affirmatively.
From Google's Page Speed documentation:
One popular way of recording page views in an asynchronous fashion is to include a JavaScript snippet at the bottom of the target page (or as an onload event handler), that notifies a logging server when a user loads the page. The most common way of doing this is to construct a request to the server for a "beacon", and encode all the data of interest as parameters in the URL for the beacon resource. To keep the HTTP response very small, a transparent 1x1-pixel image is a good candidate for a beacon request. A slightly more optimal beacon would use an HTTP 204 response ("no content") which is marginally smaller than a 1x1 GIF.
I've never tried it, but in theory it should serve the same purpose without requiring the gif itself to be transmitted, saving you 35 bytes, in the case of Google Analytics. (In the scheme of things, unless you're Google Analytics serving many trillions of hits per day, 35 bytes is really nothing.)
You can test it with this code:
// Setting src on an Image object fires the GET request immediately;
// the 204 response has no body, so nothing is rendered.
var i = new Image();
i.src = "http://httpstat.us/204";

First, I disagree with the two previous answers--neither engages the question.
The one-pixel image solves an intrinsic problem for web-based analytics apps (like Google Analytics) working within the HTTP protocol--how to transfer (web-metrics) data from the client to the server.
The simplest of the methods described by the Protocol (at least the simplest method that does not require a request body) is the GET request. According to this Protocol method, clients initiate requests to servers for resources; servers process those requests and return appropriate responses.
For a web-based analytics app like GA, this uni-directional scheme is bad news, because it doesn't appear to allow a server to retrieve data from a client on demand--again, all a server can do is supply resources, not request them.
So what's the solution to the problem of getting data from the client back to the server? Within the HTTP context there are Protocol methods other than GET (e.g., POST), but that's a limited option for many reasons (as evidenced by its infrequent and specialized use, such as submitting form data).
If you look at a GET request from a browser, you'll see it is comprised of a Request URL and Request Headers (e.g., the Referer and User-Agent headers), the latter containing information about the client--e.g., browser type and version, browser language, operating system, etc.
Again, this is part of the Request that the client sends to the server. So the idea that motivates the one-pixel GIF is for the client to send the web-metrics data to the server, wrapped inside a Request Header.
But then how do you get the client to request a resource so it can be "tricked" into sending the metrics data? And how do you get the client to send the actual data the server wants?
Google Analytics is a good example: the ga.js file (the large file whose download to the client is triggered by a small script in the web page) includes a few lines of code that direct the client to request a particular resource from a particular server (the GA server) and to send certain data wrapped in the Request Headers.
But since the purpose of this request is not to actually get a resource but to send data to the server, the resource should be as small as possible and should not be visible when rendered in the web page--hence the 1x1 pixel transparent GIF. The size is the smallest possible, and the format (GIF) is the smallest among the image formats.
More precisely, all GA data--every single item--is assembled and packed into the Request URL's query string (everything after the '?'). But in order for that data to go from the client (where it is created) to the GA server (where it is logged and aggregated), there must be an HTTP request. So ga.js (the Google Analytics script downloaded by the client, unless it's cached, as a result of a function called when the page loads) directs the client to assemble all of the analytics data--e.g., cookies, location bar, request headers, etc.--concatenate it into a single string, and append it as a query string to a URL (http://www.google-analytics.com/__utm.gif?), and that becomes the Request URL.
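A minimal sketch of this general pattern in JavaScript (the endpoint and parameter names below are illustrative placeholders, not Google's actual ga.js code):

// Assemble metrics into a query string and fire a beacon GET request.
// The endpoint and parameter names are hypothetical, for illustration only.
function sendMetricsBeacon(data) {
  var pairs = [];
  for (var key in data) {
    pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(data[key]));
  }
  // Setting src triggers the HTTP GET; the query string carries the payload.
  var img = new Image(1, 1);
  img.src = "http://example.com/beacon.gif?" + pairs.join("&");
}

sendMetricsBeacon({
  sr: screen.width + "x" + screen.height, // screen resolution
  ul: navigator.language,                 // browser language
  dl: location.href                       // current page URL
});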
It's easy to prove this using any web browser that allows you to view the HTTP requests for the web page displayed in your browser (e.g., Safari's Web Inspector, Firebug for Firefox/Chrome, etc.).
For instance, I typed a valid URL for a corporate home page into my browser's location bar, which returned that home page and displayed it in my browser (I could have chosen any web site/page that uses one of the major analytics apps: GA, Omniture, Coremetrics, etc.).
The browser I used was Safari, so I clicked Develop in the menu bar, then Show Web Inspector. On the top row of the Web Inspector, click Resources, find and click the utm.gif resource from the list of resources shown in the left-hand column, then click the Headers tab. That will show you something like this:
Request URL:http://www.google-analytics.com/__utm.gif?
utmwv=1&utmn=1520570865&
utmcs=UTF-8&
utmsr=1280x800&
utmsc=24-bit&
utmul=enus&
utmje=1&
utmfl=10.3%20r181&
Request Method:GET
Status Code:200 OK
Request Headers
User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/533.21.1
(KHTML, like Gecko) Version/5.0.5 Safari/533.21.1
Response Headers
Cache-Control:private, no-cache, no-cache=Set-Cookie, proxy-revalidate
Content-Length:35
Content-Type:image/gif
Date:Wed, 06 Jul 2011 21:31:28 GMT
The key points to notice are:
The Request was in fact a request for the __utm.gif, as evidenced by the first line above: Request URL: http://www.google-analytics.com/__utm.gif.
The Google Analytics parameters are clearly visible in the query string appended to the Request URL: e.g., utmsr is GA's variable name for the client's screen resolution, which for me shows a value of 1280x800; utmfl is the variable name for the Flash version, which has a value of 10.3; etc.
The Response Header called Content-Type (sent by the server back to the client) also confirms that the resource requested and returned was a 1x1 pixel GIF: Content-Type: image/gif
This general scheme for transferring data between a client and a server has been around forever; there could very well be a better way of doing this, but it's the only way I know of (that satisfies the constraints imposed by a hosted analytics service).

Some browsers may display an error icon if the resource could not load. It also makes debugging/monitoring the service a little more complicated: you have to make sure that your monitoring tools treat the error as a good result.
OTOH, you don't gain anything. The error message returned by the server/framework is typically bigger than the 1x1 image, which means you increase your network traffic for basically nothing.

Because such a GIF has a known presentation in a browser--it's a single pixel, period. Anything else presents a risk of visually interfering with the actual content of the page.
HTTP errors could appear as oversized frames of error text or even as a pop-up window. Some browsers may also complain if they receive empty replies.
In addition, in-page images are one of the very few data types allowed by default in all browsers. Anything else may require explicit user action to be downloaded.

This is to answer the OP's question - "why to serve GIF image data..."
Some users will put a simple img tag to call your event logging service -
<img src="http://www.example.com/logger?event_id=1234">
In this case, if you don't serve an image, the browser will show a placeholder icon that looks ugly and gives the impression that your service is broken!
What I do is look for the Accept header field. When your script is called via an img tag like this, you will see something like the following in the request headers -
Accept: image/gif, image/*
Accept-Encoding:gzip,deflate
...
When there is "image/"* string in the Accept header field, I supply the image, otherwise I just reply with 204.

Well, the major reason is to attach the cookie to it, so that if users go from one site to another we still have the same element to attach the cookie to.

@Maciej Perliński is basically correct, but I feel a detailed answer will be beneficial.
Why a 1x1 GIF and not a 204 No Content status code?
204 No Content enables the server to omit all response headers (Content-Type, Content-Length, Content-Encoding, Cache-Control, etc.) and return an empty response body of 0 bytes (saving a lot of unneeded bandwidth).
Browsers know to respect 204 No Content responses, and not to expect/wait for response headers and a response body.
If the server needs to set any response header (e.g., Cache-Control or a cookie), it cannot use 204 No Content, because browsers will ignore any response header by design (according to the HTTP protocol spec).
Why a 1x1 GIF and not a Content-Length: 0 header with a 200 OK status code?
Probably a mix of several issues; just to name a few:
legacy browser compatibility
MIME-type checks in browsers; 0 bytes is not a valid image
200 OK with 0 bytes might not be fully supported by intermediate proxy servers and VPNs

You don't have to serve an image if you are using the Beacon API (https://w3c.github.io/beacon/) implementation method.
An error code would work if you have access to your server's log files. The purpose of serving the image is to obtain more data about the user than you normally would get from a log file.
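A minimal sketch of the Beacon API approach (the /log endpoint and the payload fields are placeholders):

// sendBeacon queues a small POST that the browser delivers reliably,
// even while the page is unloading; no image response is involved at all.
var payload = JSON.stringify({ event: "page_view", url: location.href });
navigator.sendBeacon("/log", payload);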

Related

Capture jQuery $.ajax error (or Browser console error) in Javascript [duplicate]

This question is related to Cross-Origin Resource Sharing (CORS, http://www.w3.org/TR/cors/).
If there is an error when making a CORS request, Chrome (and AFAIK other browsers as well) logs an error to the error console. An example message may look like this:
XMLHttpRequest cannot load http://domain2.example. Origin http://domain1.example is not allowed by Access-Control-Allow-Origin.
I'm wondering if there's a way to programmatically get this error message. I've tried wrapping my xhr.send() call in try/catch; I've also tried adding an onerror() event handler. Neither receives the error message.
See:
http://www.w3.org/TR/cors/#handling-a-response-to-a-cross-origin-request
...as well as notes in XHR Level 2 about CORS:
http://www.w3.org/TR/XMLHttpRequest2/
The information is intentionally filtered.
Edit, many months later: A followup comment here asked for the "why"; the anchor in the first link was missing a few characters, which made it hard to see what part of the document I was referring to.
It's a security thing--an attempt to avoid exposing information in HTTP headers which might be sensitive. The W3C link about CORS says:
User agents must filter out all response headers other than those that are a simple response header or of which the field name is an ASCII case-insensitive match for one of the values of the Access-Control-Expose-Headers headers (if any), before exposing response headers to APIs defined in CORS API specifications.
That passage includes links for "simple response header", which lists Cache-Control, Content-Language, Content-Type, Expires, Last-Modified and Pragma. So those get passed. The "Access-Control-Expose-Headers headers" part lets the remote server expose other headers too by listing them in there. See the W3C documentation for more information.
Remember you have one origin - let's say that's the web page you've loaded in your browser, running some bit of JavaScript - and the script is making a request to another origin, which isn't ordinarily allowed because malware can do nasty things that way. So, the browser, running the script and performing the HTTP requests on its behalf, acts as gatekeeper.
The browser looks at the response from that "other origin" server and, if it doesn't seem to be "taking part" in CORS - the required headers are missing or malformed - then we're in a position of no trust. We can't be sure that the script running locally is acting in good faith, since it seems to be trying to contact servers that aren't expecting to be contacted in this way. The browser certainly shouldn't "leak" any sensitive information from that remote server by just passing its entire response to the script without filtering - that would basically be allowing a cross-origin request, of sorts. An information disclosure vulnerability would arise.
This can make debugging difficult, but it's a security vs usability tradeoff where, since the "user" is a developer in this context, security is given significant priority.
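To see the filtering in action, here is a minimal sketch (the URL is a placeholder); the error handler fires, but none of the detail printed to the browser console is exposed to the script:

// The browser logs the detailed CORS message to its console, but the
// script only learns that *an* error occurred, not why.
var xhr = new XMLHttpRequest();
xhr.open("GET", "http://domain2.example/resource");
xhr.onerror = function () {
  console.log(xhr.status);       // 0
  console.log(xhr.responseText); // "" -- the error text has been filtered out
};
xhr.send();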

Externally load Json with jquery.getJSON

I don't know if this is a duplicate post or not; sorry if it is. I'm using jquery.getJSON to load a JSON file on my server, which works just fine. However, if I try to load a JSON file on a different server, it doesn't work. I know I don't have any code here (because there's not much point), but I just want to know if I'm using it wrong or if it isn't supposed to load external files. I'm using the iOS Safari browser, if that affects anything.
EDIT: I've looked at the console (I don't know what the error really means; it's just red with an x by the URL it's trying to get the JSON from) and it looks like it's not actually receiving the data. Plus, do remember I'm on iOS, not desktop, so I couldn't look at the console in the "Develop" tab :P
EDIT 2: Great! I think I got it working! http://skitty.xyz/getJSON/
You're most likely encountering a path issue; the purpose of $.getJSON is to acquire data via an HTTP GET request, so yes, it is intended to work remotely. To diagnose your issue, make certain you can access the JSON file in your browser first: http://domain.com/my_data.json. If that works, use that as the URL you pass into $.getJSON:
$.getJSON( 'http://domain.com/my_data.json', function(data) {
// do something with your data
});
http://api.jquery.com/jquery.getjson/
jquery.getJSON uses Ajax, which is all about external resources. Here are a couple of things to check for if it's not working on an external resource:
1: Is the path you specified correct? The usage is jquery.getJSON(path, callback). The path should be something you can just drop in your browser and see. If an incorrect path is your problem, you'll see a 404 in the console.
2: Is the resource http and your site https? Non-secure resources on secure pages get blocked by browser security features. You'd see an error to this effect in the console.
3: Is CORS (Cross-Origin Resource Sharing) enabled for your site on the external resource? Servers sometimes use a whitelist of IPs and domains to determine which origins are allowed to make requests of them. You'd also see an error to this effect in the console.
There are probably some other things to look for, but this is where I'd start.
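Since $.getJSON's success callback never fires on failure, attaching a fail handler is the quickest way to see which of the above applies (a sketch; the URL is a placeholder):

// The jqXHR object exposes the failure details that the plain
// $.getJSON(url, callback) form silently swallows.
$.getJSON("http://other-domain.example/my_data.json")
  .done(function (data) {
    console.log("got data:", data);
  })
  .fail(function (jqxhr, textStatus, error) {
    // e.g. "Not Found" for a bad path, or status 0 for a blocked CORS request.
    console.log("failed:", textStatus, error, "status:", jqxhr.status);
  });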
Also, by all means, use the debugging features of Safari to LQQK at the actual HTTP data streams that are passing back and forth in response to what you're doing. (You might need to enable a preference to see the "Develop" menu, which will take you to "Show Web Inspector" and its Network tab.)
This approach will instantly answer many questions that a JavaScript-centered approach will not so readily tell you. (And of course, you can look at the JavaScript console too... and at the same time.) "The actual data streams, please." Safari will tell you "exactly what bytes" your app actually sent to the server, and "exactly what bytes" the server sent in return. "Priceless!™"
Are you saying you are using a jQuery Ajax request to load some JSON data from a server?
Check that the "not working" server has the same endpoint as your server.
Check that the URL you want to get data from is correct.
Check whether the console logged any errors.
Also, to quote from http://api.jquery.com/jquery.getjson/:
"Additional Notes:
Due to browser security restrictions, most "Ajax" requests are subject to the same origin policy; the request can not successfully retrieve data from a different domain, subdomain, port, or protocol.
Script and JSONP requests are not subject to the same origin policy restrictions."

I need a more detailed understanding of precisely how cookies work

I can build a full-stack app using Ruby on Rails, JavaScript, React, HTML and CSS. Yet I feel I don't completely understand how cookies actually work and what they are, precisely. Below I write what I think they are, and ask that someone confirm or correct what is written.
An HTTP request contains an HTTP method, a path, the HTTP protocol version, headers, and a body.
An HTTP response contains the HTTP protocol version, a status code, a status message, headers, and a body.
Both are simply text (which means that they are simply sequences of encoded characters), but when this text is parsed it contains useful structure. Is there one single structure that an HTTP request is usually parsed into (an array, a hash)? What about an HTTP response?
Cookies represent some content associated with a specific header in an HTTP request, specifically the "Cookie" header.
When building an HTTP response, the server sets the 'Set-Cookie' header. This header needs the following information: a name for the cookie, a path, and the actual content of the cookie. The path is a description of the range of URLs for which this cookie should be sent from client to server.
Does the browser keep a list of cookies (i.e., a list of elements that are each text of some sort), and does it only send the right ones to the right sites (say, a Google cookie to google.com)?
Let's say I visit site A and then site B and authenticate on both. Session management just adds a specific element to the cookies (perhaps a hash named Session inside another hash that corresponds to the totality of the cookie stored in Cookie), correct? How do sites alter my cookies? Do they append new information, or do they ask my browser to append information?
A cookie is a string (with a specific format) that your browser stores. It can be set by a server when it sends an HTTP response, via the 'Set-Cookie' header. Each HTTP request your browser sends that matches the cookie's path will contain that cookie in the 'Cookie' header.
The server cannot tell the browser to append data to a cookie. It can only read the current cookie value, add the new information to it, and then set it again.
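A minimal sketch of the round trip, assuming a Node.js server (the cookie name and value are placeholders):

// The first response sets the cookie; the browser then includes it in
// the Cookie header of every later request that matches the path.
var http = require("http");

http.createServer(function (req, res) {
  // On repeat visits this will contain e.g. "visits=1".
  console.log("client sent:", req.headers["cookie"] || "(no cookies yet)");

  // To "update" a cookie, the server re-sets the whole value.
  res.writeHead(200, {
    "Content-Type": "text/plain",
    "Set-Cookie": "visits=1; Path=/" // the string the browser stores
  });
  res.end("hello");
}).listen(8080);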

HTML SSE request body

When using the EventSource API in JavaScript, is there any way to send a request body along with the HTTP request that initiates the polling?
I need to send a large blob of JSON to the server with the SSE request so that the server can calculate which events to send to the client. It seems daft to use WebSockets when I don't need them, or to do weird things with cookies or multiple requests.
I worry I'll run into length limits on query strings if I bundle the data into one, which seems likely.
Thanks in advance!
The initial SSE request is a fairly ordinary HTTP GET request, so:
Given that SSE is only supported by modern browsers, the maximum URL length should not be assumed to be the old 255 bytes "for old browsers". Most modern browsers allow longer URLs, with IE providing the lowest cap of ~2K. (Granted, EventSource is not supported on IE anyway, but there's an XHR polyfill...) However, if by "large blob" you mean several kilobytes, the URL is not reliable. Proxies could also potentially cause problems.
See:
What is the maximum length of a URL in different browsers?,
Is there any limitation on Url's length in Android's WebView.loadUrl method?,
http://www.benzado.com/blog/post/28/iphone-openurl-limit
You could also store the information in one or more cookies, which will be sent along with the GET request. This could include a cookie you set on the request for the page that uses SSE, or a cookie you set in JavaScript (prior to creating your EventSource object). The max size for a cookie is specified as at least 4096 bytes (which is the whole cookie, so somewhat less for your actual data portion), with at least 20 cookies per hostname supported. Empirical testing appears to bear this out: http://browsercookielimits.x64.me/ Worst case, you could chunk the information across multiple cookies.
Larger than that, and I think you need an initial request that uploads the JSON and sends back an ID that is referenced by the SSE request, as sketched below.
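A sketch of that two-step pattern (the /sessions and /events endpoints, the id field, and the largeBlob variable are all hypothetical):

// Step 1: upload the large JSON blob, receiving a short session ID back.
// Step 2: open the SSE stream, passing only the ID in the URL.
fetch("/sessions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(largeBlob) // the data the server needs up front
})
  .then(function (res) { return res.json(); })
  .then(function (session) {
    var source = new EventSource("/events?id=" + encodeURIComponent(session.id));
    source.onmessage = function (e) {
      console.log("event:", e.data);
    };
  });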
It is technically possible, but (strongly) discouraged, to send a body with a GET request. See HTTP GET with request body. The EventSource constructor only takes a URL and so does not directly support this.
As dandavis pointed out, you can compress your JSON.

What's the RESTful way to check whether the client can access a resource?

I'm trying to determine the best practice in a REST API for determining whether the client can access a particular resource. Two quick example scenarios:
A phone directory lookup service. Client looks up a phone number by accessing eg.
GET http://host/directoryEntries/numbers/12345
... where 12345 is the phone number to try and find in the directory. If it exists, it would return information like the name and address of the person whose phone number it is.
A video format shifting service. Client submits a video in one format to eg.
POST http://host/videos/
... and receives a 'video GUID' which has been generated by the server for this video. Client then checks eg.
GET http://host/videos/[GUID]/flv
... to get the video, converted into the FLV format, if the converted version exists.
You'll notice that in both cases above, I didn't mention what should happen if the resource being checked for doesn't exist. That's my question here. I've read in various other places that the proper RESTful way for the client to check whether the resource exists here is to call HEAD (or maybe GET) on the resource, and if the resource doesn't exist, it should expect a 404 response. This would be fine, except that a 404 response is widely considered an 'error'; the HTTP/1.1 spec states that the 4xx class of status code is intended for cases in which the client 'seems to have erred'. But wait; in these examples, the client has surely not erred. It expects that it may get back a 404 (or others; maybe a 403 if it's not authorized to access this resource), and it has made no mistake whatsoever in requesting the resource. The 404 isn't intended to indicate an 'error condition', it is merely information - 'this does not exist'.
And browsers behave, as the HTTP spec suggests, as if the 404 response is a genuine error. Both Google Chrome and Firebug's console spew out a big red "404 Not Found" error message into the Javascript console each time a 404 is received by an XHR request, regardless of whether it was handled by an error handler or not, and there is no way to disable it. This isn't a problem for the user, as they don't see the console, but as a developer I don't want to see a bunch of 404 (or 403, etc.) errors in my JS console when I know perfectly well that they aren't errors, but information being handled by my Javascript code. It's line noise. In the second example I gave, it's line noise to the extreme, because the client is likely to be polling the server for that /flv as it may take a while to compile and the client wants to display 'not compiled yet' until it gets a non-404. There may be a 404 error appearing in the JS console every second or two.
So, is this the best or most proper way we have with REST to check for the existence of a resource? How do we get around the line noise in the JS console? It may well be suggested that, in my second example, a different URI could be queried to check the status of the compilation, like:
GET http://host/videos/[GUID]/compileStatus
... however, this seems to violate the REST principle a little, to me; you're not using HTTP to the full and paying attention to the HTTP headers, but instead creating your own protocol whereby you return information in the body telling you what you want to know, and always return an HTTP 200 to shut the browser up. This was a major criticism of SOAP--it tries to 'get around' HTTP rather than use it to the full. By this principle, why would one ever need to return a 404 status code? You could always return a 200--of course, the 200 indicates that the resource's status information is available, and the status information tells you what you really wanted to know: the resource was not found. Surely the RESTful way should be to return a 404 status code.
This mechanism seems even more contrived if we apply it to the first of my above examples; the client would perhaps query:
GET http://host/directoryEntries/numberStatuses/12345
... and of course receive a 200; the number 12345's status information exists, and tells you... that the number is not found in the directory. This would mean that ANY number queried would be '200 OK', even though it may not exist - does this seem like a good REST interface?
Am I missing something? Is there a better way to determine whether a resource exists RESTfully, or should HTTP perhaps be updated to indicate that non-2xx status codes should not necessarily be considered 'errors', and are just information? Should browsers be able to be configured so that they don't always output non-2xx status responses as 'errors' in the JS console?
PS. If you read this far, thanks. ;-)
It is perfectly okay to use 404 to indicate that the resource is not found. Some quotes from the book "RESTful Web Services" (a very good book about REST, by the way):
404 indicates that the server can’t map the client’s URI to a resource. [...] A web service may use a 404 response as a signal to the client that the URI is “free”; the client can then create a new resource by sending a PUT request to that URI. Remember that a 404 may be a lie to cover up a 403 or 401. It might be that the resource exists, but the server doesn’t want to let the client know about it.
Use 404 when the service can't find the requested resource; do not overuse it to indicate errors which are not actually about the existence of the resource. Also, the client may "query" the service to learn whether this URI is free or not.
Performing long-running operations like encoding of video files
HTTP has a synchronous request-response model. The client opens an Internet socket to the server, makes its request, and keeps the socket open until the server has sent the response. [...]
The problem is not all operations can be completed in the time we expect an HTTP request to take. Some operations take hours or days. An HTTP request would surely be timed out after that kind of inactivity. Even if it didn’t, who wants to keep a socket open for days just waiting for a server to respond? Is there no way to expose such operations asynchronously through HTTP?
There is, but it requires that the operation be split into two or more synchronous requests. The first request spawns the operation, and subsequent requests let the client learn about the status of the operation. The secret is the status code 202 (“Accepted”).
So you could do POST /videos to create a video-encoding task. The service will accept the task, answer with 202, and provide a link to a resource describing the state of the task.
202 Accepted
Location: http://tasks.example.com/video/task45543
The client may query this URI to see the status of the task. Once the task is complete, a representation of the resource will become available.
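A client-side sketch of that flow, following the book's example (the endpoints, the videoData variable, and the task's state/resultUrl fields are illustrative assumptions):

// POST the encoding job, then poll the task URI from the Location
// header until the server reports that the task has finished.
function pollTask(taskUrl) {
  fetch(taskUrl)
    .then(function (res) { return res.json(); })
    .then(function (task) {
      if (task.state === "done") {
        console.log("result available at:", task.resultUrl);
      } else {
        setTimeout(function () { pollTask(taskUrl); }, 2000);
      }
    });
}

fetch("/videos", { method: "POST", body: videoData })
  .then(function (res) {
    if (res.status === 202) {
      pollTask(res.headers.get("Location"));
    }
  });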
I think you have changed the semantics of the request.
With a RESTful architecture, you are requesting a resource. Therefore, requesting a resource that does not exist or is not found is considered an error.
I use:
404 if GET http://host/directoryEntries/numbers/12345 does not exist.
400 is for an actual bad request (400 Bad Request).
Perhaps, in your case, you could think about searching instead.
Searches are done with query parameters on a collection of resources.
What you want is
GET http://host/directoryEntries/numbers?id=12345
which would return 200 and an empty list if none exist, or a list of matches.
IMO the client has indeed erred in requesting a non-existent resource. In both your examples the service can be designed in a different way, so that an error can be avoided on the client side. For example, in the video-conversion service, as the GUID has already been assigned, the message body at videos/id can contain a flag indicating whether the conversion is done or not.
Similarly, in the phone-directory example, you are searching for a resource, and this can be handled through something like /numbers/?search_number=12345 etc., so that the server returns a list of matching resources which you can then query further.
Browsers are designed to work with the HTTP spec, and showing an error is a genuine response (pretty helpful, too). However, you need to think about your JavaScript code as a separate entity from the browser. So you have your JavaScript REST client, which knows what the service is like, and the browser, which is sort of dumb with regard to your service.
Also, REST is independent of protocols in theory. HTTP happens to be the most common protocol where REST is used. Another example I can think of is Android content providers whose design is RESTful but not dependent on HTTP.
I've only ever seen GET/HEAD requests return 404 (Not Found) when a resource doesn't exist. I think if you are just trying to get the status of a resource, a HEAD request would be fine, as it shouldn't return the body of the resource. This way you can differentiate between requests where you are trying to retrieve the resource and requests where you are merely checking for its existence.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
Edit: I remember reading about an alternative solution: adding a header to the original request that indicated how the server should handle 404 errors--something along the lines of responding with 200, but with an empty body.
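A minimal sketch of the HEAD-based existence check described above (using the question's example URL):

// HEAD returns the same status line and headers as GET but no body,
// making it a cheap way to ask "does this resource exist?".
fetch("http://host/directoryEntries/numbers/12345", { method: "HEAD" })
  .then(function (res) {
    if (res.status === 200) {
      console.log("number exists in the directory");
    } else if (res.status === 404) {
      console.log("number not found");
    }
  });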
