decode from unicode hex to string - javascript

When I load my data from my model (asp.net model) if the string is in another language (Russian for example), I'm getting the unicode hex code of the chars. How can I convert them to a normal string?
The problem isn't from the javascript encoding, it's from loading it from the model. I tried using a lot of functions from the forum but none worked!
Here is an example of a value I should get:
Петър
And here is what I'm actually getting:
&#x41F;&#x435;&#x442;&#x44A;&#x440;

const sequence = '&#x41F;&#x435;&#x442;&#x44A;&#x440;'
const charCodes = sequence
  .split(/[;\s]+/)
  // remove empty entries
  .filter((v) => v)
  // '&#x41F' -> '0x41F' -> 1055
  .map((v) => Number.parseInt(v.replace('&#', '0'), 16))
console.log(String.fromCharCode(...charCodes)) // Петър
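If the entities are mixed with ordinary text, splitting on ; no longer works. A minimal sketch of a replace-based variant, assuming the input only contains numeric references (hex like &#x41F; or decimal like &#1055;); the function name decodeNumericEntities is made up for this example:

```javascript
// Decode numeric character references such as &#x41F; (hex) or &#1055; (decimal).
// Named entities like &amp; are not handled here.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    // Leave values outside the Unicode range untouched.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities('Name: &#x41F;&#x435;&#x442;&#x44A;&#x440;'));
// Name: Петър
```

String.fromCodePoint is used instead of String.fromCharCode so that code points above U+FFFF also come out as one character.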

Related

decode hex encoded cbor string as a string in javascript

I have this hex encoded cbor string that I need decoded as a string: 821a0485c1e3ae581c368bedeac13b8f1fbc30cdaadb987635ff95e88960b1ea216e3f96faa14f74656e7461696c656761637931323201581c3dc002874772549f7758066ae025b92a4dab66c57d187dd78821b673a34a6c746c6d6b7973363535014a6c746c6d6b7973383333014b6c746c6d6b79733238393901581c482fb00dc32186a4c587dca2df3c7cf2bc455332ab581d51967306e1a1444d4f4149190743581c49da502b625d310ad3a742c1e747c0027e83426f8717180b21c871b1a14959524941433136373401581c4b92de6b0398970dcafe4aee5329e591819ca11aa3dc163a981c7f99a54f4d654d654f66546865446179363534014f4d654d654f6654686544617937343001504d654d654f665468654461793132343001504d654d654f665468654461793133373801504d654d654f665468654461793234343901581c561696ab9e70db98f8ff5c12f0fdbd837bd1b95d84c748b04ede8fbaa14441504f43190226581c5993061274861159508aef767fc0ccd8b8a9b836171a989ad543fa1fa74f44446f53546573744d696e74373432015044446f53546573744d696e7431323132015044446f53546573744d696e7431333238015044446f53546573744d696e7432333130015044446f53546573744d696e7432393239015044446f53546573744d696e7433303037015044446f53546573744d696e743330313701581c5cf33cfea1b37c289060f55fa09c1fb3b9cb6972e40d9ed2f94a5adaa14f484d5072696f4d6f6e73746572313901581c9b542bc33521163a7ff3a05d1df1bcc0c0ec6a0638337e4b2870f6eaa153496e74726f76657274736d636172643034303201581caaf1f848b36940b0703f43f8d406b815132efe64fccb34bc30f993a0a14f466c7566667957686974656c69737401581cc76975c66380ad2bcdf6b465b3b0df34bdd76112046979b9a834364fa1534f4e434841494e20534b554c4c20233038353301581cd79181749db228d10c98501a7e1728585780bcf133b7b3df953a9017a24e496e74726f766572747332313135014e496e74726f76657274733436383901581cdd589bbcfa48c9a133a22e205da33a5d07ef79dac1f8d5d8067b1004a158187768657265734472696c6c646f54686557616c646f39353301581cf8ed2f8e4fdc992644710e45be7fdc4a556b7517ee2956414b5c2b25a14d434e4654536c6162733032373601
I found this website, https://cbor.me/, which is able to decode it as a string, as seen in the picture.
And I have been trying to figure out how it works so I could recreate it in my own code, but have been unsuccessful.
I have tried this, which gives me a string like I wanted but it doesn't decode the words as seen in the picture:
buf = new Uint8Array(itxt.match(/.{1,2}/g).map(byte => parseInt(byte, 16)))
data = await cbor.diagnose(buf)
console.log(data)
So I have come here to ask if any of you might know the code to do it. Thanks.
console.log(data.replace(/h'(.*?)'/g, function(m, p) {
var s = Buffer.from(p, "hex").toString();
if (/\P{L}/u.test(s)) return m;
else return `'${s}'`;
}));
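This replaces the hex strings with UTF-8 strings, unless the decoded string would contain a non-letter (regular expression \P{L}).
An alternative to using the Buffer class is the code you already gave in your question. Note that it maps each byte to one character, so it only gives the same result as the Buffer version for single-byte (ASCII) text: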
var s = p.match(/.{1,2}/g)
.map(byte => String.fromCharCode(parseInt(byte, 16))).join("");
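For byte strings that may contain multi-byte UTF-8 sequences, a Buffer-free option is TextDecoder. This is a small sketch, not part of the original answer; hexToUtf8 is a hypothetical helper name, and the sample hex values are taken from the CBOR data in the question:

```javascript
// Convert a hex string to bytes, then decode the bytes as UTF-8.
function hexToUtf8(hex) {
  const pairs = hex.match(/.{2}/g) ?? [];
  const bytes = Uint8Array.from(pairs, (pair) => parseInt(pair, 16));
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(hexToUtf8('4d4f4149')); // MOAI
console.log(hexToUtf8('74656e7461696c6567616379313232')); // tentailegacy122
```

It can be dropped into the replace callback in place of Buffer.from(p, "hex").toString().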

Parsing comma separated key value pair

I want to turn
realm="https://api.digitalocean.com/v2/registry/auth",service="registry.digitalocean.com",scope="registry:catalog:*"
To
{
realm: "https://api.digitalocean.com/v2/registry/auth",
service: "registry.digitalocean.com",
scope: "registry:catalog:*"
}
I don't know what this format is called, but I think there should be an easier way to parse it. Is there an existing library or a simpler approach?
Currently, this is what I do, but I feel it is not reliable.
auth = `realm="https://api.digitalocean.com/v2/registry/auth",service="registry.digitalocean.com",scope="registry:catalog:*"`;
const convertToJsonString = '{"' + auth.replace(/="/g, `":"`).replace(/",/g, `","`) + '}';
JSON.parse(convertToJsonString);
What I need is to parse the WWW-Authenticate header according to the following specs:
https://www.rfc-editor.org/rfc/rfc2617#section-3.2.1
https://datatracker.ietf.org/doc/html/rfc7235#section-4.1
I am surprised that something this common has no existing parsing library, or maybe I just don't know where to look.
Assuming all entries are well-formed and any ", sequence inside a value is escaped with a backslash (\",), the following incantation does the trick (s is the header string):
Object.fromEntries(
s
.replace(/(?<!\\)",/g, '"\n') // replace separators by newlines
.split("\n") // split by newlines
.map((r) => /^(.+?)=(.+?)$/.exec(r)) // split by first equals
.map(([, k, v]) => [k, JSON.parse(v)]), // parse value as JSON string
)
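If lookbehind or the JSON.parse step feels fragile, the pairs can also be matched directly. The following is only a sketch under the assumption that the input is a list of key=value pairs whose values are either quoted strings with backslash escapes or bare tokens; it is not a full RFC 7235 parser, and parseAuthParams is a name invented here:

```javascript
// Collect key="value" (or key=token) pairs into a plain object.
function parseAuthParams(header) {
  const params = {};
  // group 1: key, group 2: quoted value (without quotes), group 3: bare token
  const pairPattern = /([^\s,=]+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s,]+))/g;
  for (const [, key, quoted, bare] of header.matchAll(pairPattern)) {
    // Undo backslash escapes inside quoted values.
    params[key] = quoted !== undefined ? quoted.replace(/\\(.)/g, '$1') : bare;
  }
  return params;
}

const auth = 'realm="https://api.digitalocean.com/v2/registry/auth",service="registry.digitalocean.com",scope="registry:catalog:*"';
console.log(parseAuthParams(auth));
```

Because the quoted value is consumed as a unit, commas and equals signs inside the quotes do not split the entry.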

Javascript JSON parse problem with decoded POST request data

I get base64 data as an answer to a POST request.
It's decoded the following way (based on the documentation of the REST API):
let buf = Buffer.from(base64, "base64");
buf = buf.slice(4);
let data = gunzipSync(buf).toString()
console.log(data) // -> {"Code":200,"Value":"8e286fdb-aad2-43c6-87b1-1c6c0d21808a","Route":""}
console.log(data.length) // -> 140 -> Seems weird? Shouldn't it be 70?
Problem:
console.log(JSON.parse(data)) -> SyntaxError: Unexpected token in JSON at position 1
I tried to delete all whitespace characters via replace(/\s/g,''), tried decoding with toString("utf8"), etc.
Nothing helps. The only thing that could help is the weird wrong length described above.
Your buffer is UTF-16 encoded and contains \0 bytes, like {·"·C·o·d·e·"·:·… (with · representing \0), which is why it is double the expected length. The \0 bytes don't print when you output the decoded string with console.log(), which is why the output seems to be correct.
Decode the buffer before JSON-parsing it.
var buffer = gunzipSync(Buffer.from(base64, "base64").slice(4));
var str = buffer.toString('utf16le');
console.log(str) // -> {"Code":200,"Value":"8e286fdb-aad2-43c6-87b1-1c6c0d21808a","Route":""}
console.log(str.length) // -> 70
console.log(JSON.parse(str)) // -> { Code: 200, Value: '8e286fdb-aad2-43c6-87b1-1c6c0d21808a', Route: '' }
In general, never work with buffers as if they were strings. Buffers are always encoded in some way, that is their fundamental, defining difference from strings. You must decode them before outputting their contents as text.
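The effect is easy to reproduce without the real API. A small sketch, using a made-up JSON payload encoded as UTF-16LE and the tryParse helper defined only for this example:

```javascript
// Simulate a payload that was written as UTF-16LE (assumed sample JSON).
const json = '{"Code":200,"Route":""}';
const payload = Buffer.from(json, 'utf16le');

// Decoding with the default (UTF-8) keeps every \0 byte as a character.
const asUtf8 = payload.toString();
// Decoding with the matching encoding gives back the original text.
const asUtf16 = payload.toString('utf16le');

console.log(asUtf8.length === 2 * json.length); // true
console.log(asUtf16 === json); // true

// Return the parsed value, or null when the text is not valid JSON.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

console.log(tryParse(asUtf8)); // null
console.log(tryParse(asUtf16)); // { Code: 200, Route: '' }
```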

Number to UTF-8 char

In Node.js I'm able to print to the console a Japanese character like this:
console.log('\u3041');
If I have 3041 as a number, for instance because it is randomly generated, how do I print the corresponding UTF-8 sigil?
const charNumber = 3041;
// of course this doesn't work, but I need something like that:
console.log(`\u${charNumber}`);
You can use the hex representation with String.fromCharCode by replacing \u with 0x. The string "0x3041" is converted to the number 12353 when it is passed in:
const charNumber = 3041;
console.log(String.fromCharCode(`0x${charNumber}`));
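A slightly more explicit variant parses the digits as hexadecimal itself. This is a sketch with hypothetical helper names (charFromHex, randomHiragana); the hiragana range U+3041 to U+3096 is an assumption about what "randomly generated" might mean in the question:

```javascript
// Turn hex digits such as '3041' (or the number 3041) into the matching character.
function charFromHex(hexDigits) {
  return String.fromCodePoint(parseInt(String(hexDigits), 16));
}
console.log(charFromHex(3041)); // ぁ

// Pick a random code point in an assumed range (hiragana letters).
function randomHiragana() {
  const first = 0x3041;
  const last = 0x3096;
  const codePoint = first + Math.floor(Math.random() * (last - first + 1));
  return String.fromCodePoint(codePoint);
}
console.log(randomHiragana());
```

Generating the code point as a real number avoids the ambiguity of the original value: the decimal number 3041 is 0xBE1, which is a different character from U+3041.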

Javascript convert windows-1252 encoding to UTF-8

How do I convert the below string:
var string = "Bouchard+P%E8re+et+Fils"
using javascript into UTF-8, so that %E8 would become %C3%A8?
Reason is this character seems to be tripping up decodeURIComponent
You can test it out by dropping the string into http://meyerweb.com/eric/tools/dencoder/ and seeing the console error that says Uncaught URIError: URI malformed
I'm looking specifically for something that can decode an entire html document, that claims to be windows-1252 encoded which is where I assume this %E8 character is coming from, into UTF-8.
Thanks!
First create a map of Windows-1252. You can find references to the encoding using your search engine of choice.
For the sake of this example, I'm going to include only the character in your sample data.
Then find all the percentage signs followed by two hexadecimal characters, convert them to numbers, and convert them using the map (to get raw data), then convert them again using encodeURIComponent (to get the encoded data).
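var string = "Bouchard+P%E8re+et+Fils"
// Partial Windows-1252 map: byte value -> character
var w1252chars = [];
w1252chars[232] = "è"
var percent_encoded = /(%[a-fA-F0-9]{2})/g;
function filter(match, group) {
  var number = parseInt(group.slice(1), 16);
  var character = w1252chars[number];
  // Leave escapes that are not in the map untouched
  if (character === undefined) return match;
  return encodeURIComponent(character);
}
string = string.replace(percent_encoded, filter);
alert(string); // Bouchard+P%C3%A8re+et+Fils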
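Instead of building the map by hand, the runtime's own decoder can be used. This sketch assumes TextDecoder supports the 'windows-1252' label, which is the case in browsers and in Node.js builds with full ICU data; reencodeWindows1252 is a name chosen for this example:

```javascript
// Rewrite Windows-1252 percent-escapes as UTF-8 percent-escapes.
function reencodeWindows1252(input) {
  const decoder = new TextDecoder('windows-1252');
  return input.replace(/%([0-9a-f]{2})/gi, (match, hex) => {
    const byte = parseInt(hex, 16);
    // ASCII bytes mean the same thing in both encodings; keep the escape as is.
    if (byte < 0x80) return match;
    return encodeURIComponent(decoder.decode(Uint8Array.of(byte)));
  });
}

const fixed = reencodeWindows1252('Bouchard+P%E8re+et+Fils');
console.log(fixed); // Bouchard+P%C3%A8re+et+Fils
console.log(decodeURIComponent(fixed.replace(/\+/g, ' '))); // Bouchard Père et Fils
```

For a whole HTML document that arrives as bytes, the same decoder can be applied to the byte array directly, without going through percent-escapes.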
