I get base64 data as the response to a POST request.
It's decoded the following way (based on the documentation of the REST API):
let buf = Buffer.from(base64, "base64");
buf = buf.slice(4);
let data = gunzipSync(buf).toString()
console.log(data) // -> {"Code":200,"Value":"8e286fdb-aad2-43c6-87b1-1c6c0d21808a","Route":""}
console.log(data.length) // -> 140 -> Seems weird? Shouldn't it be 70?
Problem:
console.log(JSON.parse(data)) -> SyntaxError: Unexpected token in JSON at position 1
I tried deleting all whitespace characters via replace(/\s/g,''), tried decoding with toString("utf8"), etc.
Nothing helps. The only clue I have is the unexpected length described above.
Your buffer is UTF-16 encoded and contains \0 bytes, like {·"·C·o·d·e·"·:·… (with · representing \0), which is why it is double the expected length. The \0 bytes don't print when you output the string with console.log(), so the output seems to be correct.
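Decode the unzipped buffer as UTF-16 before JSON-parsing it.
// same steps as in the question: drop the 4-byte prefix, then gunzip
var buffer = gunzipSync(Buffer.from(base64, "base64").slice(4));
var str = buffer.toString('utf16le');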
console.log(str) // -> {"Code":200,"Value":"8e286fdb-aad2-43c6-87b1-1c6c0d21808a","Route":""}
console.log(str.length) // -> 70
console.log(JSON.parse(str)) // -> { Code: 200, Value: '8e286fdb-aad2-43c6-87b1-1c6c0d21808a', Route: '' }
In general, never work with buffers as if they were strings. Buffers are always encoded in some way; that is their fundamental, defining difference from strings. You must decode them before outputting their contents as text.
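To make the length difference concrete, here is a minimal Node.js sketch. It builds a sample payload itself, assuming (as in the question) a 4-byte prefix followed by gzip-compressed UTF-16LE JSON; the prefix contents and the helper name decodePayload are made up for illustration.

```javascript
const { gzipSync, gunzipSync } = require("zlib");

// Sample payload shaped like the one in the question (assumption:
// 4 prefix bytes, then gzip-compressed UTF-16LE JSON).
const json = '{"Code":200,"Value":"8e286fdb-aad2-43c6-87b1-1c6c0d21808a","Route":""}';
const base64 = Buffer.concat([
  Buffer.alloc(4),                          // placeholder prefix bytes
  gzipSync(Buffer.from(json, "utf16le")),   // compressed UTF-16LE text
]).toString("base64");

// Hypothetical helper: strip the prefix, gunzip, return the raw bytes.
function decodePayload(b64) {
  return gunzipSync(Buffer.from(b64, "base64").subarray(4));
}

const bytes = decodePayload(base64);
const wrong = bytes.toString();            // default "utf8": keeps every \0
const right = bytes.toString("utf16le");   // matches the actual encoding

console.log(wrong.length, right.length);   // 140 70
console.log(JSON.parse(right).Code);       // 200
```

Parsing the string decoded with the default encoding fails because the character at position 1 is \0, which matches the error in the question.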
I have this hex encoded cbor string that I need decoded as a string: 821a0485c1e3ae581c368bedeac13b8f1fbc30cdaadb987635ff95e88960b1ea216e3f96faa14f74656e7461696c656761637931323201581c3dc002874772549f7758066ae025b92a4dab66c57d187dd78821b673a34a6c746c6d6b7973363535014a6c746c6d6b7973383333014b6c746c6d6b79733238393901581c482fb00dc32186a4c587dca2df3c7cf2bc455332ab581d51967306e1a1444d4f4149190743581c49da502b625d310ad3a742c1e747c0027e83426f8717180b21c871b1a14959524941433136373401581c4b92de6b0398970dcafe4aee5329e591819ca11aa3dc163a981c7f99a54f4d654d654f66546865446179363534014f4d654d654f6654686544617937343001504d654d654f665468654461793132343001504d654d654f665468654461793133373801504d654d654f665468654461793234343901581c561696ab9e70db98f8ff5c12f0fdbd837bd1b95d84c748b04ede8fbaa14441504f43190226581c5993061274861159508aef767fc0ccd8b8a9b836171a989ad543fa1fa74f44446f53546573744d696e74373432015044446f53546573744d696e7431323132015044446f53546573744d696e7431333238015044446f53546573744d696e7432333130015044446f53546573744d696e7432393239015044446f53546573744d696e7433303037015044446f53546573744d696e743330313701581c5cf33cfea1b37c289060f55fa09c1fb3b9cb6972e40d9ed2f94a5adaa14f484d5072696f4d6f6e73746572313901581c9b542bc33521163a7ff3a05d1df1bcc0c0ec6a0638337e4b2870f6eaa153496e74726f76657274736d636172643034303201581caaf1f848b36940b0703f43f8d406b815132efe64fccb34bc30f993a0a14f466c7566667957686974656c69737401581cc76975c66380ad2bcdf6b465b3b0df34bdd76112046979b9a834364fa1534f4e434841494e20534b554c4c20233038353301581cd79181749db228d10c98501a7e1728585780bcf133b7b3df953a9017a24e496e74726f766572747332313135014e496e74726f76657274733436383901581cdd589bbcfa48c9a133a22e205da33a5d07ef79dac1f8d5d8067b1004a158187768657265734472696c6c646f54686557616c646f39353301581cf8ed2f8e4fdc992644710e45be7fdc4a556b7517ee2956414b5c2b25a14d434e4654536c6162733032373601
I found this website: https://cbor.me/ which is able to decode it as a string, as seen in the picture.
And I have been trying to figure out how it works so I could recreate it in my own code, but have been unsuccessful.
I have tried this, which gives me a string like I wanted but it doesn't decode the words as seen in the picture:
buf = new Uint8Array(itxt.match(/.{1,2}/g).map(byte => parseInt(byte, 16)))
data = await cbor.diagnose(buf)
console.log(data)
So I have come here to ask if any of you might know the code to do it. thanks
console.log(data.replace(/h'(.*?)'/g, function(m, p) {
var s = Buffer.from(p, "hex").toString();
if (/\P{L}/u.test(s)) return m;
else return `'${s}'`;
}));
replaces the hex strings with UTF-8 strings, unless the UTF-8 string would contain a non-letter (regular expression \P{L}).
An alternative to using the Buffer class is the code you already gave in your question:
var s = p.match(/.{1,2}/g)
.map(byte => String.fromCharCode(parseInt(byte, 16))).join("");
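One thing to keep in mind: \P{L} also matches digits, so a name such as tentailegacy122 would stay in hex with the check above. A possible variant, sketched below, decodes a hex string whenever all of its bytes are printable ASCII. The function name hexToTextIfPrintable and the sample diagnostic string are made up for illustration; the real output of cbor.diagnose() may be formatted differently.

```javascript
// Replace h'...' hex literals by quoted text when every byte is printable
// ASCII (0x20..0x7e); anything else (e.g. binary hashes) is left as hex.
function hexToTextIfPrintable(diag) {
  return diag.replace(/h'([0-9a-fA-F]*)'/g, (match, hex) => {
    const bytes = Buffer.from(hex, "hex");
    const printable =
      bytes.length > 0 && bytes.every((b) => b >= 0x20 && b <= 0x7e);
    return printable ? `'${bytes.toString("latin1")}'` : match;
  });
}

// Hypothetical excerpt in diagnostic-like notation.
const sample =
  "{h'4d4f4149': 1859, h'74656e7461696c6567616379313232': 1, h'368bedea': 1}";

console.log(hexToTextIfPrintable(sample));
// -> {'MOAI': 1859, 'tentailegacy122': 1, h'368bedea': 1}
```

The last entry stays in hex because 0x8b is not a printable ASCII byte.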
When I load my data from my model (ASP.NET model) and the string is in another language (Russian, for example), I get the Unicode hex codes of the characters instead. How can I convert them to a normal string?
The problem doesn't come from the JavaScript encoding; it comes from loading the value from the model. I tried a lot of functions from the forum but none worked!
Here is an example of a value I should get:
Петър
And here is what I'm actually getting:
&#x41F;&#x435;&#x442;&#x44A;&#x440;
const sequence = '&#x41F;&#x435;&#x442;&#x44A;&#x440;'
const charCode = sequence.split(/[;\s]+/g)
// remove empty
.filter((v) => v)
// &#x41F -> 0x41F -> number
.map((v) => Number.parseInt(v.replace(/&#/, '0')))
console.log(
String.fromCharCode(...charCode)
)
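A slightly more general sketch, assuming the input may mix plain text with hexadecimal and decimal numeric character references. The function name decodeNumericEntities is made up for illustration, and named entities such as &amp; are deliberately left alone.

```javascript
// Decode numeric character references (&#x41F; or &#1055;) in place,
// leaving all other text, including named entities, untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0].toLowerCase() === "x";
    const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // String.fromCodePoint throws above U+10FFFF, so keep such input as-is.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("&#x41F;&#x435;&#x442;&#x44A;&#x440;")); // -> Петър
console.log(decodeNumericEntities("Name: &#1055;&#x435;&#x442;&#x44A;&#x440;")); // -> Name: Петър
```

String.fromCodePoint is used instead of String.fromCharCode so that code points above U+FFFF are also handled.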
Encoding my URL works perfectly with base-64 encoding. So does decoding but not with the string literal variable.
This works:
document.write(atob("hi"));
This does not:
var tempvar = "hello";
document.write(atob(tempvar));
What am I doing wrong? Nothing is displayed. But if I quote "tempvar", then it of course works but is not the same thing since "tempvar" is a string, not a variable.
Your Question
What am I doing wrong?
The string being passed to atob() is a string literal of length 5 (and not technically a base-64 encoded string). The browser console should reveal an exception in the error log (see explanation in The cause below).
The cause
Per the MDN documentation of atob():
Throws
Throws a DOMException if the length of passed-in string is not a multiple of 4. 1
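The length of the string literal "hello" (i.e. 5) is not a multiple of 4. Thus the exception is thrown instead of returning the decoded version of the string literal. In practice, browsers are more lenient than this wording suggests: the input is rejected when its length, ignoring whitespace, leaves a remainder of 1 after division by 4, or when it contains characters outside the base-64 alphabet. That is why "hi" (length 2) is accepted while "hello" (length 5) is not.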
A Solution
One solution is to use a string that has actually been encoded (e.g. with btoa()) or that at least has a length of four (e.g. using String.prototype.substring()). See the snippet below for an example.
var tempvar = "hello";
window.addEventListener("DOMContentLoaded", function(readyEvent) {
var container = document.getElementById("container");
//encode the string
var encoded = btoa(tempvar);
container.innerHTML = encoded;
var container2 = document.getElementById("container2");
//decode the encoded string
container2.innerHTML = atob(encoded);
var container3 = document.getElementById("container3");
//decode the first 4 characters of the string
container3.innerHTML = atob(tempvar.substring(0, 4));
});
<div> btoa(tempvar): <span id="container"></span></div>
<div> atob(encoded): <span id="container2"></span></div>
<div> atob(tempvar.substring(0, 4)): <span id="container3"></span></div>
1 https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/atob
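If the input may or may not be base-64, it can be checked before calling atob(). The sketch below follows my reading of the "forgiving-base64" rules in the HTML Standard; the function name looksLikeBase64 is made up for illustration, and it assumes an environment where atob() and btoa() are globals (browsers, recent Node.js).

```javascript
// Returns true if atob() should accept the input, following the
// forgiving-base64 rules: strip ASCII whitespace, allow one or two
// trailing "=" only when the length is a multiple of 4, reject a length
// that leaves remainder 1, and reject characters outside the alphabet.
function looksLikeBase64(input) {
  const stripped = input.replace(/[\t\n\f\r ]/g, "");
  const body =
    stripped.length % 4 === 0 ? stripped.replace(/={1,2}$/, "") : stripped;
  return body.length % 4 !== 1 && /^[A-Za-z0-9+/]*$/.test(body);
}

const tempvar = "hello";
const encoded = btoa(tempvar);            // "aGVsbG8="

console.log(looksLikeBase64(tempvar));    // false (length 5, remainder 1)
console.log(looksLikeBase64("hi"));       // true  (length 2)
console.log(looksLikeBase64(encoded));    // true
console.log(atob(encoded));               // hello
```

Passing this check only means the string is decodable, not that it was ever meant to be base-64: "hi" decodes, but to a meaningless byte.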
It's because atob() can't decode the string "hello". Try an actual string that can be decoded from base64. Here is an example:
var tempvar = "aHR0cDovL3N0YWNrb3ZlcmZsb3cuY29tL3F1ZXN0aW9ucy80MzEyOTEzNi9kZWNvZGluZy1ub3Qtd29ya2luZy13aXRoLWJhc2U2NA==";
document.write(atob(tempvar));
If you want to encode, use the btoa function instead:
var tempvar = "hello";
document.write(btoa(tempvar));
You can use this website to test decoding and encoding base64, https://www.base64encode.org/
It's because you are trying to decode a string that is not base64 encoded.
That it works on "hi" seems to be just a coincidence.
atob = decode
btoa = encode
You're using the wrong function. You should use btoa() to encode.
When you do atob('hi'), you're actually decoding 'hi', which happens to be valid base-64.
How do I convert the below string:
var string = "Bouchard+P%E8re+et+Fils"
using javascript into UTF-8, so that %E8 would become %C3%A8?
Reason is this character seems to be tripping up decodeURIComponent
You can test it out by dropping the string into http://meyerweb.com/eric/tools/dencoder/ and seeing the console error that says Uncaught URIError: URI malformed
I'm looking specifically for something that can decode an entire html document, that claims to be windows-1252 encoded which is where I assume this %E8 character is coming from, into UTF-8.
Thanks!
First create a map of Windows-1252. You can find references to the encoding using your search engine of choice.
For the sake of this example, I'm going to include only the one character in your sample data.
Then find all the percentage signs followed by two hexadecimal characters, convert them to numbers, and convert them using the map (to get raw data), then convert them again using encodeURIComponent (to get the encoded data).
var string = "Bouchard+P%E8re+et+Fils"
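// Map from Windows-1252 byte value to character (only one entry shown)
var w1252chars = [];
w1252chars[232] = "è"
var percent_encoded = /(%[a-fA-F0-9]{2})/g;
function filter(match, group) {
var number = parseInt(group.substr(1), 16);
var character = w1252chars[number];
// Leave escapes that are not in the map untouched
if (character === undefined) return match;
return encodeURIComponent(character);
}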
string = string.replace(percent_encoded, filter);
alert(string);
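Instead of building the map by hand, the same idea can be sketched with the built-in TextDecoder, assuming the runtime supports the "windows-1252" label (browsers do; Node.js needs full ICU data, which is what the default builds ship with, as far as I know). The function name reencodePercentEscapes is made up for illustration.

```javascript
const decoder = new TextDecoder("windows-1252");

// Re-encode every run of %XX escapes: interpret the bytes as Windows-1252,
// then percent-encode the resulting text as UTF-8.
function reencodePercentEscapes(input) {
  return input.replace(/(?:%[0-9a-fA-F]{2})+/g, (run) => {
    const bytes = Uint8Array.from(run.slice(1).split("%"), (hex) =>
      parseInt(hex, 16)
    );
    return encodeURIComponent(decoder.decode(bytes));
  });
}

const fixed = reencodePercentEscapes("Bouchard+P%E8re+et+Fils");
console.log(fixed);                     // Bouchard+P%C3%A8re+et+Fils
console.log(decodeURIComponent(fixed)); // Bouchard+Père+et+Fils
```

The "+" characters are left as they are here; whether they should become spaces depends on how the string was originally encoded.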
If I receive a UTF-8 string via a socket (or for that matter via any external source) I would like to get it as a properly parsed string object. The following code shows what I mean
var str='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';
// Find CRLF
var i=str.indexOf('\r\n');
// Parse size up until CRLF
var x=parseInt(str.slice(0, i));
// Read size bytes
var s=str.substr(i+2, x)
console.log(s);
This code should print
Just a demo string äè
but as the UTF-8 data is not properly parsed it only parses it up to the first Unicode character
Just a demo string ä
Would anyone have an idea how to convert this properly?
It seems you could use decodeURIComponent(escape(str)). Note that escape() is deprecated, so treat this as a quick fix:
var badstr='21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed';
var str=decodeURIComponent(escape(badstr));
// Find CRLF
var i=str.indexOf('\r\n');
// Parse size up until CRLF
var x=parseInt(str.slice(0, i));
// Read size bytes
var s=str.substr(i+2, x)
console.log(s);
BTW, this kind of issue occurs when you mix UTF-8 and other types of encoding. You should check that as well.
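In Node.js the same conversion can be sketched without escape(), assuming the received string holds one byte per character (a so-called binary string), as in the question. The function name binaryStringToUtf8 is made up for illustration.

```javascript
// Reinterpret a one-byte-per-character string as UTF-8 text.
function binaryStringToUtf8(binary) {
  return Buffer.from(binary, "latin1").toString("utf8");
}

const raw = "21\r\nJust a demo string \xC3\xA4\xC3\xA8-should not be anymore parsed";
const text = binaryStringToUtf8(raw);

// Find CRLF, parse the size in front of it, then read that many characters.
const crlf = text.indexOf("\r\n");
const size = parseInt(text.slice(0, crlf), 10);
const payload = text.slice(crlf + 2, crlf + 2 + size);

console.log(payload); // Just a demo string äè
```

If the data arrives as a Buffer in the first place, it is simpler to call toString("utf8") on it directly and never go through a binary string.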
You should use utf8.js which is available on npm.
var utf8 = require('utf8');
var encoded = '21\r\nJust a demo string \xC3\xA4\xC3\xA8-foo bar baz';
var decoded = utf8.decode(encoded);
console.log(decoded);