Base64 decode from Buffer to Buffer efficiently in node (node.js) - javascript

I currently have a python and C version of wsproxy (WebSockets to plain TCP socket proxy) in noVNC. I would like to create a version of wsproxy using node.js. A key factor (and the reason I'm not just using existing node WebSocket code) is that until the WebSocket standard has binary encoding, all traffic between wsproxy and the browser/client must be encoded (and base64 decode/encode is fast and easy in the browser).
Node's Buffer type has base64 support, but only from a Buffer to a string and vice versa. How can I base64 encode/decode between two Buffers without having to convert to a string first?
Constraints:
Direct Buffer to Buffer (unless you can show Buffer -> string -> Buffer is just as fast).
Since node has built-in base64 support, I would like to use that rather than external modules.
In-place encode/decode within a single Buffer is acceptable.
Here is a discussion of base64 support in node, but from what I can see it doesn't answer my question.

You should be able to do this using streams, but first read through this blog post about UTF-8 decoding, because you will likely run into similar issues. I'm not suggesting that you do UTF-8 encode/decode if you don't need it, but that you look at how this code handles a single character whose bytes are split across a chunk boundary.
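For reference, here is a minimal sketch of the Buffer -> string -> Buffer round trip the question mentions, using only Node's built-in base64 support; benchmark it against your traffic before ruling it out. (On the 0.x versions discussed here, new Buffer(...) is the equivalent of Buffer.from.)

// Decode: a Buffer holding base64 text -> a Buffer of raw bytes.
// 'ascii' is safe for the intermediate string: base64 text is 7-bit only.
function base64Decode(encoded) {
  return Buffer.from(encoded.toString('ascii'), 'base64');
}

// Encode: a Buffer of raw bytes -> a Buffer holding base64 text.
function base64Encode(raw) {
  return Buffer.from(raw.toString('base64'), 'ascii');
}

// Usage:
var encoded = base64Encode(Buffer.from('hello')); // <Buffer 61 47 56 ...> ("aGVsbG8=")
var decoded = base64Decode(encoded);              // <Buffer 68 65 6c 6c 6f>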

Related

deserialize protostuff byte array with javascript

I used protostuff to transform a JSON input I have into a byte array. The Java code is:
import com.dyuproject.protostuff.LinkedBuffer;
import com.dyuproject.protostuff.ProtostuffIOUtil;
import com.dyuproject.protostuff.Schema;
import com.dyuproject.protostuff.runtime.RuntimeSchema;

LinkedBuffer buffer = LinkedBuffer.allocate(1024);
Schema<String> orderSchema = RuntimeSchema.getSchema(String.class);
for (String p : poligonsStr) {
    buffer.clear();
    byteslist.add(ProtostuffIOUtil.toByteArray(p, orderSchema, buffer));
}
The problem is that I don't know which algorithm is used, or how I can decode the result with the JavaScript client (Node.js). I also saw there is a very good algorithm called Smile implemented for protostuff in the com.dyuproject.protostuff project, but I would like to know how to get a schema with that library; I haven't managed that yet.
I would like to know which is best to use: ProtostuffIOUtil or SmileIOUtil?
And how do I use it? And how do I decode the result with JavaScript?
Protostuff's binary encoding is different from protobuf's, and as far as I know there is currently no JavaScript library that can decode protostuff-encoded data.
Smile is not supported by web browsers out of the box, but there are libraries that can decode it.
As for me, there are two optimal ways to encode data on the server using the Protostuff library and decode it with JavaScript on the client side:
Use protobuf encoding; it is a good choice if the size of the encoded data matters. On the server side, use ProtobufIOUtil to serialize your data to the protobuf binary format. On the client side, you can use https://github.com/dcodeIO/ProtoBuf.js/ to decode the binary data from the server.
Use JSON encoding; it is the native format for JavaScript and will usually be parsed faster than binary protobuf-encoded data. On the server side, use JsonIOUtil (from the protostuff-json module) to serialize your data to JSON text. On the client side, it is supported out of the box.
Here is an example of how to serialize your POJO into protobuf binary using Protostuff: HelloService.java
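For the first option, a minimal client-side decode sketch using the protobufjs package; "order.proto" and "mypackage.Order" are hypothetical placeholders for your own schema.

// Load the schema, look up the message type, and decode the server's bytes.
var protobuf = require('protobufjs');

protobuf.load('order.proto', function (err, root) {
  if (err) throw err;
  var Order = root.lookupType('mypackage.Order');
  // binaryDataFromServer is assumed to be a Buffer/Uint8Array received
  // from the Java process.
  var message = Order.decode(binaryDataFromServer);
  console.log(Order.toObject(message));
});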

Should I use Buffer and UTF-8 in Node.js for sending data across the wire?

I have a system with two processes, one in Java and one in Node.js. The Node.js process is a web front end. It ingests data and sends it to a queue for processing, where it is consumed by the Java process. The data is a string of user data collected from browser code; I create the string in Node.js using JSON.stringify(data) and push it to Kinesis, the AWS version of Kafka. The Java process receives the raw bytes, creates a String object, and then parses the JSON.
My question is this: I am not sure what decoding I should instruct Java to use on the raw bytes. Right now it "just works" with the default decoding, but I feel this is a bad idea, since the default decoding could be platform-dependent. Should I use Buffer on the Node.js side and encode the string as UTF-8 before I push it to the queue? That way I could explicitly set UTF-8 as the charset on the Java side. Is this a best practice? Any advice much appreciated.
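A minimal sketch of that approach on the Node side, assuming the aws-sdk Kinesis client (the stream name, partition key, and payload are placeholders); the key point is the explicit 'utf8' when turning the JSON string into bytes:

var AWS = require('aws-sdk');
var kinesis = new AWS.Kinesis();

var data = { user: 'example' };                           // hypothetical payload
var payload = Buffer.from(JSON.stringify(data), 'utf8');  // explicit UTF-8 bytes

kinesis.putRecord({
  StreamName: 'my-stream',      // hypothetical stream name
  PartitionKey: 'user-data',
  Data: payload
}, function (err) { if (err) console.error(err); });

// On the Java side, decode with the matching charset:
//   String json = new String(bytes, StandardCharsets.UTF_8);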

Replacing Base64 - Is http/https communication 8 bit clean?

Here is an overview of what 8-bit clean means.
In the context of web applications, why are images saved as Base64? There is a 33% overhead associated with being 8-bit clean.
If the transmission method is safe, there is no need for this.
But basically, my images are saved in Base64 on the server and transferred to the client, which, as we all know, can read Base64.
Here is the client-side version of Base64, from an SO post:
How can you encode a string to Base64 in JavaScript?
Is HTTP/HTTPS 8-bit clean?
Reference
http://www.princeton.edu/~achaney/tmve/wiki100k/docs/8-bit_clean.html
http://en.wikipedia.org/wiki/8-bit_clean
You are asking two different things.
Q: Is HTTP 8-bit clean?
A: Yes, HTTP is 8-bit clean.
Q: In the context of web applications, why are images saved as Base64?
A: Images are not usually saved in Base64; in fact, they almost never are. They are usually saved, transmitted, or streamed in a compressed binary format (PNG, JPG, or similar).
Base64 is used to embed images inside the HTML.
So, you have an image, logo.png. You include it statically in your page as <img src='logo.png'>. The image is transmitted through HTTP in binary, with no encoding on either the browser or server side. This is the most common case.
Alternatively, you might decide to embed the contents of the image inside the HTML. This has some advantages: the browser will not need to make a second trip to the server to fetch the image, because it has already received it in the same HTTP GET response as the HTML file. But it also has disadvantages: because HTML files are text, and certain character values have special meaning in HTML (not in HTTP), you cannot just embed the binary values inside the HTML text. You have to encode them to avoid such collisions. The most usual encoding method is Base64, which avoids all the collisions with only a 33% overhead.
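A minimal sketch of producing such an embedded tag in Node, assuming a local logo.png next to the script:

var fs = require('fs');

// Read the raw binary and re-encode it as base64 text safe to place in HTML.
var b64 = fs.readFileSync('logo.png').toString('base64');
var img = '<img src="data:image/png;base64,' + b64 + '">';
// img can now be inlined into the HTML; the browser decodes it locally.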
RFC 2616's abstract states:
A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
HTTP always starts with a text-only header, and in this header the content type is specified.
As long as sender and receiver agree on this content type, anything is possible.
HTTP relies on a reliable (recognize the wordplay) transport layer such as TCP. HTTPS only adds security to the transport layer (or between the transport layer and HTTP; I'm not sure about this).
So yep, HTTP(S) is 8-bit clean.
In addition to PA's answer and your question "But why use an encoding method that adds 33% overhead, when you don't need it?": because that's part of a different concept!
HTTP transfers data of any kind, and the HTTP content may be an HTML file with an embedded picture. But after receiving that HTML file, a browser or some other renderer has to interpret the HTML content. And that follows different standards, which require arbitrary data to be encoded. HTML is not 8-bit clean; in fact, it is not even 7-bit clean, as there are many restrictions on the characters used and their order of appearance.
In the context of web applications, why are images saved as Base64?
There is a 33% overhead associated with being 8 bit clean.
Base64 is used to allow 8-bit binary data to be represented as printable text within the ASCII definition. ASCII is only 7 bits, not 8; the upper 128 character values depend on the encoding in use (Latin-1, UTF-8, etc.), which means the encoded data could be mangled if the client/receiver end assumed a different encoding than the source.
As there aren't enough printable characters within ASCII to represent all 256 byte values (which are absolute and don't depend on the encoding), you need to "water out" the bits; Base64 packs only 6 bits into each output byte, so that every value lands on a printable character.
This is the 33% overhead you see: byte values outside the printable range must be shifted to values that are printable within the ASCII table, and Base64 does exactly that (you could also use quoted-printable, which was common in the past, e.g. with Usenet, email, etc.).
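A quick demonstration of where the 33% comes from: Base64 maps every 3 input bytes onto 4 output characters, so the output is 4/3 the size of the input.

var raw = Buffer.from([0xff, 0x00, 0x7f]); // 3 arbitrary byte values
console.log(raw.toString('base64'));       // "/wB/" -- 4 printable chars for 3 bytes
console.log(Buffer.byteLength('/wB/'));    // 4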
I'm thinking about writing another encoding type to remove the overhead.
Good luck :-)
Related to the query "Is HTTP 8-bit clean?":
The HTTP protocol is not, in its entirety, an 8-bit clean protocol.
The HTTP entity body is 8-bit clean, since there is a provision to declare the content type, allowing content negotiation between the interacting entities, as pointed out by everyone in this thread.
However, the request line, the headers, and the status line are not 8-bit clean.
In order to send any binary information as part of the request line (query parameters or path segments) or as part of a header, one must use one of the binary-to-text encodings to preserve the binary values.
For instance, when sending a signature as part of query parameters or headers (as in the signed URL technique employed by CDNs), the signature, being binary information, has to be encoded to preserve its binary value.
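For illustration, a minimal sketch of such a signed URL in Node; the key, path, and parameter names are hypothetical. The 'base64url' digest encoding (RFC 4648 §5) is available in newer Node versions and swaps '+' and '/' for '-' and '_', keeping the signature URL-safe.

var crypto = require('crypto');

var path = '/assets/video.mp4?expires=1700000000';
var sig = crypto.createHmac('sha256', 'secret-key') // hypothetical signing key
                .update(path)
                .digest('base64url');               // URL-safe base64 variant

var signedUrl = 'https://cdn.example.com' + path + '&sig=' + sig;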

Wrapping a bidirectional stream in NodeJS

I would like to wrap a socket into another object which:
transforms output - e.g. turning strings into Base64
transforms input - e.g. turning Base64 into strings
(Note: my use case is not Base64 but is isomorphic to that and would significantly complicate the question.)
It is trivial to do this in the two directions separately, e.g. pipe the socket into a Base64 decoder, and write into a Base64 encoder which pipes into the socket.
I would like to generate a single new object from a socket, which could be written to and read from (via data events), yet perform the required transformations for both directions.
The solution needs to support Node 0.8.X and 0.10.X.
This seems to be one approach:
https://github.com/ajlopez/ObjectStream/blob/master/lib/objectstream.js
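Another approach, sketched below, is to subclass the streams2 Duplex class (Node 0.10; on 0.8 you would need the readable-stream shim). This is a minimal sketch under those assumptions, not a drop-in implementation.

var stream = require('stream');
var util = require('util');

// Wraps a socket: writes are base64-encoded onto the socket, and socket
// data is base64-decoded and emitted as 'data' events on the wrapper.
function Base64Socket(socket) {
  stream.Duplex.call(this);
  this.socket = socket;
  var self = this;
  socket.on('data', function (chunk) {
    // NOTE: a real implementation must buffer partial 4-character base64
    // groups that straddle chunk boundaries, as the accepted answer warns.
    self.push(new Buffer(chunk.toString('ascii'), 'base64'));
  });
  socket.on('end', function () { self.push(null); });
}
util.inherits(Base64Socket, stream.Duplex);

Base64Socket.prototype._write = function (chunk, encoding, callback) {
  this.socket.write(chunk.toString('base64'), 'ascii', callback);
};

Base64Socket.prototype._read = function () { /* pushes happen in 'data' */ };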

Loading EUC-JP and other Japanese text encodings in Node.JS

I'm trying to scrape some Japanese websites for a personal project. Sites with text in UTF-8 work perfectly fine, as you'd expect, but I can't get any text out of sites specifying other international encodings, specifically EUC-JP. Node also seems to be interpreting the text and performing modifications rather than passing it on raw: I've tried setting the response to be interpreted as both ascii and binary, and then set my terminal application to EUC-JP, but after doing a console.log(), neither results in the actual text.
I've had a scan through the Node documentation, and it seems to only support two main text encodings (apart from binary and base64).
I'm using the inbuilt http client, and specifying the encoding through the response.setEncoding method, e.g. response.setEncoding('utf8');
How are other people working with international text in Node, especially in situations where the original data is not in UTF-8? Are binary buffers the only way?
While I've done a bit of research, I'm not hugely knowledgeable when it comes to character encoding, so simple answers would be appreciated. Thanks!
There is a module that adds iconv bindings to node.js. If you grab the response as a binary Buffer, you can use Iconv.convert to convert it from EUC-JP to UTF-8 (take a look at the README for an example).
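A minimal sketch of that approach with the node-iconv module (the URL is a placeholder): collect the response as raw Buffers, then convert EUC-JP to UTF-8.

var http = require('http');
var Iconv = require('iconv').Iconv;

http.get('http://example.jp/', function (res) {
  var chunks = [];
  // No res.setEncoding() here -- keep the chunks as raw Buffers.
  res.on('data', function (c) { chunks.push(c); });
  res.on('end', function () {
    var iconv = new Iconv('EUC-JP', 'UTF-8');
    console.log(iconv.convert(Buffer.concat(chunks)).toString('utf8'));
  });
});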
