Fetch raw gzip-encoded web page into Uint8Array - javascript

I'm using fetch to retrieve a URL. This is in code that is acting as a proxy and if it gets a response with content-encoding: gzip, I want to get the raw data so I can send it back to the consumer without decoding it.
But I see no way to do this. Response.blob(), Response.arrayBuffer(), Response.body.getReader(), and so on all seem to decode the content.
So, for example, in my test case, I end up with a 15 KB array of bytes instead of the actual response body, which was only 4 KB.
Is there a way to get the raw contents of the response without it being decoded?

The browser automatically decodes gzip-encoded HTTP responses in its low-level networking layer before surfacing the response data to the programmatic JavaScript layer, so the raw compressed bytes are not exposed to fetch(). Given that the compressed bytes were already transmitted over the network and are available in the user's browser, that data has already been "sent to the consumer".
The Compression Streams API can efficiently re-create a gzip-encoded ArrayBuffer, Blob, etc.:
const [ab, abGzip] = await Promise.all((await fetch(url)).body.tee().map(
  (stream, i) => new Response(i === 0
    ? stream
    : stream.pipeThrough(new CompressionStream('gzip'))
  ).arrayBuffer()
))
Full example
https://batman.dev/static/74634392/
A gzip-encoded SVG image is provided to users as a downloadable file link. The original SVG image is retrieved using fetch() from a gzip-encoded response.
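A minimal sketch of the recompression approach above, wrapped in two helpers so it can be reused and checked. `gzipBytes` and `gunzipBytes` are hypothetical names, not part of the answer or of any API. It assumes a runtime that exposes `CompressionStream`, `DecompressionStream`, `Blob`, and `Response` globally (current browsers, Node 18+). Note that the recompressed bytes are not guaranteed to be byte-identical to what the origin server sent.

```javascript
// Hypothetical helper: gzip a Uint8Array with the Compression Streams API.
async function gzipBytes(bytes) {
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new CompressionStream('gzip'));
  // Response is used here only as a convenient way to drain the stream.
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Hypothetical helper: the inverse, useful for checking the round trip.
async function gunzipBytes(bytes) {
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```

In a proxy, the decoded body from `fetch()` could be passed through `gzipBytes` before being forwarded with a `Content-Encoding: gzip` header.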

Related

Confusion about HTTP transferring non-text (binary) data

I was looking at this MDN tutorial https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages
where it says
HTTP messages are composed of textual information encoded in ASCII.
I thought it means that HTTP can only transfer textual info aka strings, assuming the HTTP message here refers to header + body in responses.
But later I found out that HTTP response body can have multiple MIME types outside of text, such as image, video, application/json etc. Doesn't that mean HTTP can also transfer non-textual information, which contradicts what that MDN page says about HTTP messages?
I am aware of encoding methods like utf-8 and base64, I guess you can use Base64 Encoding for the binary data so that it is transformed into text — and then can be sent with an application/json content type as another property of the JSON payload. But when you choose not to do encoding, instead using correct content-type you can just transfer the binary data? I am still trying to figure this out.
Also, I have some experience consuming REST APIs from the front end. My impression is that you typically don't transfer binary data (e.g. images, files, audio) with RESTful APIs. They often serve JSON or XML as the response. Why is that? Is it because REST APIs are not suitable for transferring binary data directly? What are some common practices for transferring image or audio files to the front end?
The line you quoted is talking about the start line, status line, and headers, which use only ASCII.
The body of a request or response is an arbitrary sequence of bytes. It's mainly interpreted by the application, not by the HTTP layer. It doesn't need to be in any particular encoding. The header has a Content-Length field, and the client simply reads that many bytes after the header (there's also chunked encoding, which breaks the content up into chunks, but each one starts with its byte length written in hexadecimal, and the client simply concatenates them).
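To make the chunked case concrete, here is a rough sketch of how a client could reassemble a chunked body. `decodeChunked` is a hypothetical helper written for illustration; it assumes the whole body is already in memory and ignores chunk extensions and trailers. Real HTTP clients do this for you.

```javascript
// Decode an HTTP/1.1 chunked body held in a Uint8Array.
// Each chunk is "<hex size>\r\n<data>\r\n"; a zero-size chunk ends the body.
function decodeChunked(bytes) {
  const parts = [];
  let pos = 0;
  while (pos < bytes.length) {
    // Find the CRLF that terminates the chunk-size line.
    let lineEnd = pos;
    while (lineEnd + 1 < bytes.length &&
           !(bytes[lineEnd] === 13 && bytes[lineEnd + 1] === 10)) {
      lineEnd++;
    }
    if (lineEnd + 1 >= bytes.length) throw new Error('truncated chunk-size line');
    const sizeLine = String.fromCharCode(...bytes.subarray(pos, lineEnd));
    // Anything after ';' is a chunk extension, which this sketch ignores.
    const size = parseInt(sizeLine.split(';')[0].trim(), 16);
    if (Number.isNaN(size)) throw new Error('invalid chunk size');
    if (size === 0) break; // last chunk
    const start = lineEnd + 2;
    if (start + size > bytes.length) throw new Error('truncated chunk data');
    // The data is copied as raw bytes; it is never interpreted as text.
    parts.push(bytes.subarray(start, start + size));
    pos = start + size + 2; // skip the CRLF that follows the data
  }
  // Concatenate the chunks into one contiguous body.
  const total = parts.reduce((n, p) => n + p.length, 0);
  const body = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    body.set(p, offset);
    offset += p.length;
  }
  return body;
}
```

Because the size prefix says exactly how many bytes follow, the data itself may contain any byte values, including CR and LF.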
In addition, HTTP has Content-Encoding and Transfer-Encoding headers that specify how the data is encoded. The available encodings include a number of compression formats, such as gzip, that produce binary data.
While it's possible to use textual encodings such as base64, this is not usually done in HTTP because it increases the size of the message and it's not necessary.
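The size increase from base64 is easy to quantify: every 3 input bytes become 4 ASCII characters, so the payload grows by roughly a third. A small sketch (the helper names `base64Length` and `toBase64` are made up for this illustration; `btoa` is assumed to be available globally, as it is in browsers and recent Node versions):

```javascript
// Length of the padded base64 encoding of `byteCount` bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Encode a Uint8Array as base64 using the built-in btoa().
function toBase64(bytes) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

const payload = new Uint8Array(3000);         // 3000 raw bytes
const encoded = toBase64(payload);            // 4000 base64 characters
console.log(payload.length, encoded.length);  // prints 3000 4000
```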

Sending pdf as binary in POST request (from Javascript)

I want:
To send a file (I have my eye on .docx and text files, but let's use a .pdf as an example) as binary in the body of a POST request from a browser (Javascript).
Main problem
I can do this just fine in POSTMAN. You can select "binary" as the body type, and voilà! Your file is configured to be the body. But I don't know how to mimic that behavior in Javascript.
My question is: On the client-side, in Javascript, how can I get a file into my POST request as binary?
Specifically, how can I get the file into the same format that POSTMAN uses when you select Body -> Binary in a POST request?
For context:
I have been using this guide to get everything configured how I want in AWS. It ends with making requests in POSTMAN. But adding a file as binary in POSTMAN is one thing - doing it from a browser in Javascript is another, and the main question that I have.
I am sending this through API Gateway to a Lambda function. I have API Gateway configured to handle application/pdf as binary, and the Lambda function to decode it once it arrives.
So I think I want to hand it in as a binary blob, not base64. But not sure exactly how.
JavaScript
postBinary() {
  var settings = {
    "url": "https:<my-aws-api>.amazonaws.com/v1/upload",
    "method": "POST",
    "timeout": 0,
    "headers": {
      "Content-Type": "application/pdf"
    },
    "data": <my pdf file here as binary>
  };
  $.ajax(settings).done(function (response) {
    console.log(response);
  });
},
API Gateway:
Integration has 'When there are no templates defined (recommended)' set to 'Content-Type':'application/pdf'. The API's Binary Media Types have 'application/pdf' set. I know I have CORS set correctly - I can pass strings through the POST request and get success messages back, but I would like to handle files here, not just a simple string. I also want to avoid requiring the client side to parse out data on their end.
My Lambda function will take the file and then parse information out of it, then send it back.
Lambda function:
import json
import base64
import boto3
BUCKET_NAME = 'my-bucket'
def lambda_handler(event, context):
    file_content = base64.b64decode(event['content'])
    parsed_data = some_function(file_content)  # parse information from file
    return {
        'statusCode': 200,
        'body': {
            'file_path': file_path
        }
    }
In the end, we want a user experience of: choose a file, send to API, get back parsed data. Simple.
Note: I know there are lots of good reasons to put files in s3 instead of going through Lambda first. But our files are small, and we are not concerned about taking considerable compute time/power in Lambda. Further, we want to avoid sending to s3 right away because we would like the user to only have to make one call to the API: send file in POST request, get results. If we send to s3 first, the user has to send multiple requests: request pre-signed URL, send file to s3, request results from parsing.
I am mostly concerned with the fact that this is possible in POSTMAN and it must be possible via browser/Javascript as well.
Thanks, everyone!
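One possible way to get the POSTMAN "Body -> Binary" behavior from a browser is to pass the File (or any Blob) directly as the request body, so the raw bytes are sent without base64 or multipart wrapping. The sketch below is not a confirmed solution for this API Gateway setup; `buildBinaryUpload` is a hypothetical helper, and the upload URL and the file input element are assumptions supplied by the caller.

```javascript
// Hypothetical helper: build a POST request whose body is the raw bytes
// of a File or Blob, with the Content-Type the API expects.
function buildBinaryUpload(url, fileOrBlob, contentType = 'application/pdf') {
  return new Request(url, {
    method: 'POST',
    headers: { 'Content-Type': contentType },
    body: fileOrBlob, // sent as-is: no base64, no multipart/form-data
  });
}

// Browser usage (assumes an <input type="file" id="file"> on the page
// and an uploadUrl variable defined elsewhere):
//   const file = document.querySelector('#file').files[0];
//   const response = await fetch(buildBinaryUpload(uploadUrl, file));
//   console.log(await response.text());
```

With the jQuery call from the question, the analogous settings would be `data: file` together with `processData: false`, so that jQuery does not try to serialize the file into a query string.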

Insert Gzip response into json object

I'm building a simple proxy with PHP on the server side and JavaScript on the client side (browser).
The PHP server receives POST requests with JSON data, something like this:
{headersArray: ["Get /someurl...","Cookie: some cookie"],url...}
Then, after extracting the headers and the URL, the PHP code uses curl to fetch the resource. At this point I would like to construct a response object on the client via the "Response" constructor, so I also need to transmit the headers back to the client. I thought about constructing another JSON object that contains the headers and the response body, but a lot of servers use gzip encoding. Can I insert the gzip-encoded response body as a JSON property and safely transmit it back to the client? Would I need to decode it in the client browser? Does it add a lot of overhead? Any better ideas?
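One option, sketched here under assumptions, is to have the server base64-encode the body, since a JSON string cannot safely carry arbitrary binary bytes, and to decode it on the client before building the Response. The envelope shape `{ status, headers, bodyBase64 }` and the helper name `responseFromEnvelope` are invented for this example; they are not part of the question's code.

```javascript
// Hypothetical helper: rebuild a Response from a JSON envelope of the form
// { status: 200, headers: { ... }, bodyBase64: "..." } (assumed shape).
function responseFromEnvelope(envelope) {
  // atob() yields one character per byte; map each back to its byte value.
  const binary = atob(envelope.bodyBase64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new Response(bytes, {
    status: envelope.status,
    headers: envelope.headers,
  });
}
```

If the body inside the envelope is still gzip-compressed, a Response constructed this way will hand back the compressed bytes as they are, so the client would have to decompress them itself, for example by piping the body through `new DecompressionStream('gzip')`. Base64 adds roughly one third to the body size.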

Why is my base64 encoded image creating a http request?

I have encoded all the images in my css to base64 encoded data to reduce the number of http requests in the website. However, it appears that there is still an http request for the data encoded images as you can see below.
I tried checking for a solution on the web but everywhere it says that there should be no http request for images which are encoded to base64. What am I doing wrong ?
It is not another request; however, it will still show up in your assets.

Unable to fetch thumbnails using Dropbox core API - 401

I am able to successfully retrieve metadata using the Dropbox API using the following URL:
https://api.dropbox.com/1/metadata/dropbox/mse
When I try to retrieve a thumbnail for a listed asset, I get a 401:
https://api-content.dropbox.com/1/thumbnails/dropbox/mse/modem_status.png?size=l
In both cases I am providing the access token in the header:
Authorization: Bearer 4JSL1tGWoVEAAAAAAAAAAUxNYpLbiYw-D8l3vqTKRKNBuGnezhps8j.....
I can't see what is missing here.
It turns out that an image is being returned. I was using POSTMAN to test the URL, and two URLs show up in the log: the first returns a 200 with an image buffer containing JPEG data, and a subsequent URL fails with a 401 (the second GET does not contain the Authorization header). However, I am at a loss as to how to make use of JPEG data returned from an XHR request. How am I supposed to create an <img> element this way? Isn't there something equivalent to /media which returns a public URL that is valid for 4 hours?
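As for turning JPEG bytes from an authenticated request into an <img>, one common approach is to wrap the bytes in a Blob and point the image at a blob: URL. This is only a sketch: `imageUrlFromBytes` is a hypothetical helper, and `thumbnailUrl` and `accessToken` in the usage comment are assumed to be defined by the caller.

```javascript
// Hypothetical helper: wrap raw image bytes in a Blob and return a
// blob: URL that can be assigned to img.src.
function imageUrlFromBytes(bytes, mimeType = 'image/jpeg') {
  const blob = new Blob([bytes], { type: mimeType });
  return URL.createObjectURL(blob);
}

// Browser usage (thumbnailUrl and accessToken are assumptions):
//   const res = await fetch(thumbnailUrl, {
//     headers: { Authorization: `Bearer ${accessToken}` },
//   });
//   const img = document.createElement('img');
//   img.src = imageUrlFromBytes(await res.arrayBuffer());
//   img.onload = () => URL.revokeObjectURL(img.src); // free the blob afterwards
//   document.body.appendChild(img);
```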
