Parsing JSON with escaped unicode characters displays incorrectly - javascript

I have downloaded JSON data from Instagram that I'm parsing in NodeJS and storing in MongoDB. I'm having an issue where escaped unicode characters are not displaying the correct emoji symbols when displayed on the client side.
For instance, here's a property from one of the JSON files I'm parsing and storing:
"title": "#mujenspirits is in the house!NEW York City \u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e \nImperial Vintner Liquor Store"
The above example should display like this:
#mujenspirits is in the house!NEW York City 🗽🍎
Imperial Vintner Liquor Store
But instead looks like this:
#mujenspirits is in the house!NEW York City 🗽ðŸŽ
Imperial Vintner Liquor Store
I found another SO question where someone had a similar problem and their solution works for me in the console using a simple string, but when used with JSON.parse still gives the same incorrect display. This is what I'm using now to parse the JSON files.
export default function parseJsonFile(filepath: string) {
const value = fs.readFileSync(filepath)
const converted = new Uint8Array(
new Uint8Array(Array.prototype.map.call(value, (c) => c.charCodeAt(0)))
)
return JSON.parse(new TextDecoder().decode(converted))
}
For posterity, I found an additional SO question similar to mine. There wasn't a solution, however, one of the comments said:
The JSON files were generated incorrectly. The strings represent Unicode code points as escape codes, but are UTF-8 data decoded as Latin1
The commenter suggested encoding the loaded JSON to latin1 then decoding to utf8, but this didn't work for me either.
import buffer from 'buffer'
const value = fs.readFileSync(filepath)
const buffered = buffer.transcode(value, 'latin1', 'utf8')
return JSON.parse(buffered.toString())
I know pretty much nothing about character encoding, so at this point I'm shooting in the dark searching for a solution.

An easy solution is to decode the string with the utf8 package:
npm install utf8
Now as an example of use, look at this code that uses nodejs and express:
import express from "express";
import utf8 from "utf8";
const app = express();
app.get("/", (req, res) => {
const text = "\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e it is a test";
const textDecode = utf8.decode(text);
console.log(textDecode);
res.send(textDecode);
});
const port = process.env.PORT || 5000;
app.listen(port, () => {
console.log(`Server on port ${port}`);
});
The result is that in localhost:5000 you will see the emojis without problem. You can apply this idea to your project, to treat the json with emojis.
And here is an example from the client side:
const element= document.getElementById("text")
const txt = "\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e it is a test"
const text= utf8.decode(txt)
console.log(text)
element.innerHTML= text
<script src="https://cdnjs.cloudflare.com/ajax/libs/utf8/2.1.1/utf8.min.js" integrity="sha512-PACCEofNpYYWg8lplUjhaMMq06f4g6Hodz0DlADi+WeZljRxYY7NJAn46O5lBZz/rkDWivph/2WEgJQEVWrJ6Q==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<p id="text"></p>

You can try converting the unicode escape sequences to bytes before parsing the JSON; probably, the utf8.js library can help you with that.
Alternatively, the solution you found should work but only after unserializing the JSON (it will turn each unicode escape sequence into one character). So, you need to traverse the object and apply the solution to each string
For example:
function parseJsonFile(filepath) {
const value = fs.readFileSync(filepath);
return decodeUTF8(JSON.parse(value));
}
function decodeUTF8(data) {
if (typeof data === "string") {
const utf8 = new Uint8Array(
Array.prototype.map.call(data, (c) => c.charCodeAt(0))
);
return new TextDecoder("utf-8").decode(utf8);
}
if (Array.isArray(data)) {
return data.map(decodeUTF8);
}
if (data !== null && typeof data === "object") { // typeof null is also "object"
const obj = {};
Object.entries(data).forEach(([key, value]) => {
obj[key] = decodeUTF8(value);
});
return obj;
}
return data;
}
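One caveat with converting every string this way: a string that was already decoded correctly, or that contains characters above U+00FF, would be corrupted by the byte conversion. The following is a minimal sketch of a more defensive variant, not part of the original answers. The helper name `fixMojibake` is hypothetical, and it assumes a runtime where `TextDecoder` is a global and supports the `fatal` option (recent Node.js versions and browsers).

```javascript
// Hypothetical helper (not from the original answers): re-decode a string
// whose characters are really UTF-8 bytes that were read as Latin-1.
function fixMojibake(str) {
  // Leave the string alone if any code unit cannot be a single byte.
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return str;
  }
  const bytes = Uint8Array.from(str, (c) => c.charCodeAt(0));
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently inserting U+FFFD replacement characters.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return str; // not valid UTF-8, so assume the string was already correct
  }
}

// JSON.parse turns each \u00XX escape into one character in the 0x00-0xFF range.
const broken = JSON.parse(
  '"NEW York City \\u00f0\\u009f\\u0097\\u00bd\\u00f0\\u009f\\u008d\\u008e"'
);
console.log(fixMojibake(broken));
```

Calling `fixMojibake` in place of the string branch of `decodeUTF8` would keep ordinary text such as "café" untouched, because a lone 0xE9 byte is not valid UTF-8.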

Related

How to apply regular expression for Javascript

I am trying to get message log from Azure application Insight like this
az monitor app-insights query --app [app id] --analytics-query [condition like specific message id]
Then I got a message like this
"message": [
"Receiving message: {"type":"CTL","traceId":"f0d11b3dbf27b8fc57ac0e40c4ed9e48","spanId":"a5508acb0926fb1a","id":{"global":"GLkELDUjcRpP4srUt9yngY","caller":null,"local":"GLkELDUisjnGrSK5wKybht"},"eventVersion":"format version","timeStamp":"2021-10-01T14:55:59.8168722+07:00","eventMetadata":{"deleteTimeStamp":null,"ttlSeconds":null,"isFcra":null,"isDppa":true,"isCCPA":true,"globalProductId":null,"globalSubProductId":null,"mbsiProductId":null},"eventBody":{"sys":"otel","msg":"Testing Centralized Event Publisher with App1 (using logback)","app":{"name":"otel","service":"postHouse","status":"status name","method":"POST","protocol":"HTTP","resp_time_ms":"250","status_code":"4"},}}"
] }
So I would like to apply a regular expression to this message to extract only the part from {"type..... to "status_code":"4"},}} and also convert it to JSON.
I have code like this in my .js file
Then('extract json from {string}', function(message){
message = getVal(message, this);
const getmess = message.match(/{(.*)}/g);
const messJson = JSON.parse(getmess);
console.log(messJson);
})
But it doesn't work for me
SyntaxError: Unexpected token \ in JSON at position 1
How can I apply this in my code on Javascript? Thank you so much for your help
Try this. Keep in mind that this regex is tied to the wrapper syntax of the program output shown above. If the wrapper structure of the output changes, the regex might no longer work.
// Text from app
const STDOUT = `
"message": [ "Receiving message: {"type":"CTL","traceId":"f0d11b3dbf27b8fc57ac0e40c4ed9e48","spanId":"a5508acb0926fb1a","id":{"global":"GLkELDUjcRpP4srUt9yngY","caller":null,"local":"GLkELDUisjnGrSK5wKybht"},"eventVersion":"format version","timeStamp":"2021-10-01T14:55:59.8168722+07:00","eventMetadata":{"deleteTimeStamp":null,"ttlSeconds":null,"isFcra":null,"isDppa":true,"isCCPA":true,"globalProductId":null,"globalSubProductId":null,"mbsiProductId":null},"eventBody":{"sys":"otel","msg":"Testing Centralized Event Publisher with App1 (using logback)","app":{"name":"otel","service":"postHouse","status":"status name","method":"POST","protocol":"HTTP","resp_time_ms":"250","status_code":"4"},}}"
] }
`;
// Match JSON part string
let JSONstr = /.*\[\s*\"Receiving message:\s*(.*?)\s*\"\s*]\s*}\s*$/.exec(STDOUT)[1];
// Remove trailing comma(s)
JSONstr = JSONstr.replace(/^(.*\")([^\"]+)$/, (s, m1, m2) => `${m1}${m2.replace(/\,/, "")}`);
// Convert to object
const JSONobj = JSON.parse(JSONstr);
// Result
console.log(JSONobj);
Try this one:
/.*?({"type":.*?,"status_code":"\d+"\})/
When used in Javascript, the part covered by the parentheses counts as Group 1, i.e.,:
const messJson = JSON.parse(message.match(/.*?({"type":.*?,"status_code":"\d+"\})/)[1]);
Reference here: https://regexr.com/66mf2
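Note that a capture ending right after "status_code":"4"} would leave the enclosing objects of the sample message unclosed, and the sample also contains a trailing comma, so JSON.parse may still reject the match. As a rough sketch of an alternative that avoids a message-specific regex, the hypothetical helper `extractJson` below slices from the first `{` to the last `}` and strips trailing commas. It assumes the input is the single "Receiving message: ..." string without the outer "message": [ ... ] } wrapper, and that no string value in the payload contains `,}` or `,]`.

```javascript
// Hypothetical helper: pull the JSON payload out of one log message string.
function extractJson(logLine) {
  const start = logLine.indexOf("{");
  const end = logLine.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in log line");
  }
  const raw = logLine.slice(start, end + 1);
  // Drop trailing commas such as the one in `},}}`, which JSON.parse rejects.
  // Assumption: no string value in the payload contains ",}" or ",]".
  const cleaned = raw.replace(/,\s*([}\]])/g, "$1");
  return JSON.parse(cleaned);
}

// Shortened stand-in for the message from the question.
const line =
  'Receiving message: {"type":"CTL","eventBody":{"sys":"otel","app":{"status_code":"4"},}}';
const parsed = extractJson(line);
console.log(parsed.eventBody.app.status_code);
```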

Why I get Malformed UTF-8 data error on crypto-js?

I try to encrypt and decrypt this string using crypto-js:
const str = `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1aWQiOiI1ZDg5MjMxMjc5OTkxYjJhNGMwMjdjMGIiLCJoc2giOiIkMmEkMTMkWk53Y0cubjdRZFIybDA3S1RHd2RoLlN0QksudW5GSFVGLkZnZ0tQTGlUV2pOVEFqVy9SMm0iLCJncmFudCI6ImFjY2VzcyIsImlhdCI6MTU2OTI2ODUwMiwiZXhwIjoxNjAwODI2MTAyfQ.PQcCoF9d25bBqr1U4IhJbylpnKTYiad3NjCh_LvMfLE~3~null~undefined~434ce0149ce42606d8746bd9`;
But I got an error:
Error: Malformed UTF-8 data
What am I doing wrong? How do I fix that?
The full code is also on StackBlitz:
import crypto from 'crypto-js';
const str = `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1aWQiOiI1ZDg5MjMxMjc5OTkxYjJhNGMwMjdjMGIiLCJoc2giOiIkMmEkMTMkWk53Y0cubjdRZFIybDA3S1RHd2RoLlN0QksudW5GSFVGLkZnZ0tQTGlUV2pOVEFqVy9SMm0iLCJncmFudCI6ImFjY2VzcyIsImlhdCI6MTU2OTI2ODUwMiwiZXhwIjoxNjAwODI2MTAyfQ.PQcCoF9d25bBqr1U4IhJbylpnKTYiad3NjCh_LvMfLE~9~null~undefined~434ce0149ce42606d8746bd9`;
const cryptoInfo = crypto.AES.encrypt(str, 'secret').toString();
console.log({ cryptoInfo });
const info2 = crypto.AES.decrypt(str, 'secret').toString(crypto.enc.Utf8);
console.log({ info2 });
Not sure why, but you have to wrap your string in an object and use JSON.stringify to make it work.
Here:
import crypto from 'crypto-js';
const str = `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1aWQiOiI1ZDg5MjMxMjc5OTkxYjJhNGMwMjdjMGIiLCJoc2giOiIkMmEkMTMkWk53Y0cubjdRZFIybDA3S1RHd2RoLlN0QksudW5GSFVGLkZnZ0tQTGlUV2pOVEFqVy9SMm0iLCJncmFudCI6ImFjY2VzcyIsImlhdCI6MTU2OTI2ODUwMiwiZXhwIjoxNjAwODI2MTAyfQ.PQcCoF9d25bBqr1U4IhJbylpnKTYiad3NjCh_LvMfLE~9~null~undefined~434ce0149ce42606d8746bd9`;
const cryptoInfo = crypto.AES.encrypt(JSON.stringify({ str }), 'secret').toString();
console.log({ cryptoInfo });
const info2 = crypto.AES.decrypt(cryptoInfo, 'secret').toString(crypto.enc.Utf8);
console.log({ info2 });
const info3 = JSON.parse(info2);
console.log({ str: info3.str });
I encrypt a name and pass it as a URL parameter.
I was surprised that the decrypt code did not work.
It was because of the "+" character generated in the encrypted parameter. After using "encodeURIComponent" and "decodeURIComponent", it worked.
<script>
jQuery("#myBtn").click(function(){
var clientname= jQuery("#myInput").val();
var encrypted = CryptoJS.AES.encrypt(clientname, "secret key 123");
//my URL to call with encrypted client name
jQuery("#output").append('<small id="myurl">https://www.xxxxx.com/?id='+encodeURIComponent(encrypted)+"</small>");
});
</script>
var urlParams = new URLSearchParams(window.location.search);
var crypted_param = decodeURIComponent(urlParams.get('id'));
if(crypted_param && crypted_param != null && crypted_param != "" && crypted_param != "null"){
var decrypted = CryptoJS.AES.decrypt(crypted_param, "secret key 123");
jQuery('#output1').val(decrypted.toString(CryptoJS.enc.Utf8));
}
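To illustrate why the "+" matters, here is a small sketch using only the standard URL API. The Base64 value and the example.com address are made-up placeholders, not real crypto-js output.

```javascript
// Made-up Base64 text containing "+" and "/", standing in for a ciphertext.
const encrypted = "U2FsdGVkX1+abc/def+ghi=";

// Without encoding, each "+" in the query string is read back as a space.
const rawUrl = new URL("https://www.example.com/?id=" + encrypted);
const damaged = rawUrl.searchParams.get("id");
console.log(damaged); // U2FsdGVkX1 abc/def ghi=

// With encodeURIComponent the value survives the round trip.
const safeUrl = new URL(
  "https://www.example.com/?id=" + encodeURIComponent(encrypted)
);
const restored = safeUrl.searchParams.get("id");
console.log(restored === encrypted); // true
```

URLSearchParams.get() already percent-decodes the value, so an additional decodeURIComponent call on its result should not be needed.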
You forgot to pass the encrypted text as the parameter to the decrypt function.
In the decrypt call you are passing the original string, i.e. 'str', which causes the problem in the code above. Here is the corrected code:
import crypto from "crypto-js";
const str = `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1aWQiOiI1ZDg5MjMxMjc5OTkxYjJhNGMwMjdjMGIiLCJoc2giOiIkMmEkMTMkWk53Y0cubjdRZFIybDA3S1RHd2RoLlN0QksudW5GSFVGLkZnZ0tQTGlUV2pOVEFqVy9SMm0iLCJncmFudCI6ImFjY2VzcyIsImlhdCI6MTU2OTI2ODUwMiwiZXhwIjoxNjAwODI2MTAyfQ.PQcCoF9d25bBqr1U4IhJbylpnKTYiad3NjCh_LvMfLE~9~null~undefined~434ce0149ce42606d8746bd9`;
const cryptoInfo = crypto.AES.encrypt(JSON.stringify(str), "secret");
console.log({cryptoInfo});
const info2 = crypto.AES.decrypt(cryptoInfo.toString(), 'secret').toString(crypto.enc.Utf8);
console.log({ info2 });
Regardless of the suggestions above, check your encryption key and secret. The key used for decryption must match the key that was used for encryption.
I was experiencing the same issue. It seems the encrypted value is Base64 and needs to be parsed from Base64 before decrypting.
Example:
const utf8 = CryptoJS.enc.Base64.parse(value);
const decrypted = CryptoJS.DES.decrypt({ ciphertext: utf8 }, keyWords, { iv: ivWords });
I found the solution Here
This might sound slightly funny, but this is how a senior colleague resolved the problem for me.
We have two different portals, call them the XYZ portal and the ABC portal (I was facing this issue in the XYZ portal).
ABC is the portal where we log in before being redirected to the XYZ portal.
So locally I opened both the XYZ portal and the ABC portal, and the issue was resolved.
(Earlier I had opened only the XYZ portal, which is why I was facing the issue.) :D
I've resolved my problem cleaning up the local storage.

How to convert a string into its real binary representation (UTF-8 or whatever is currently used)?

I want to experiment with UTF-8 and Unicode, for that I want to build a small Website which helps me to understand the encoding better.
First I want the ability to enter some text and then get the actual binary encoding of the string. For that I'm looking for an equivalent to ".GetBytes" from C# or Java. I do not want the resolved CharCodes!
Here is a C# function I would like to reproduce in JavaScript:
string ToBinary(string input)
{
//this is the part I am looking for in JavaScript
var utf8Bytes = Encoding.UTF8.GetBytes(input);
var bytesFormatedToBin = utf8Bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0'));
return string.Join(' ', bytesFormatedToBin);
}
Here are some sample results:
"abc" => "01100001 01100010 01100011"
"@©®" => "01000000 11000010 10101001 11000010 10101110"
"😀😄" => "11110000 10011111 10011000 10000000 11110000 10011111 10011000 10000100"
Is there a way to achieve this in JavaScript?
Thanks.
Marc
Edit: Fixed truncated sample result.
String.prototype.charCodeAt(...) returns UTF-16 code units, so it only matches the UTF-8 bytes when the string contains ASCII characters only. Use the standard TextEncoder, which always encodes to UTF-8, to deal with other characters:
const te = new TextEncoder()
function toBinaryRepr(str) {
return Array.from(te.encode(str))
.map(i => i
.toString(2)
.padStart(8, '0'))
.join(' ')
}
// '01100001 01100010 01100011'
toBinaryRepr('abc')
// '01000000 11000010 10101001 11000010 10101110'
toBinaryRepr('@©®')
// '11110000 10011111 10011000 10000000 11110000 10011111 10011000 10000100'
toBinaryRepr('😀😄')
Warning: TextEncoder is not a global constructor in older versions of Node.js - if you get some errors saying TextEncoder is not defined, try importing it by:
const { TextEncoder } = require('util')
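For experimenting in the other direction, a possible counterpart is sketched below. The function name `fromBinaryRepr` is made up for this example, and it assumes `TextDecoder` is available as a global (as in browsers and recent Node.js versions).

```javascript
// Hypothetical inverse of toBinaryRepr: turn space-separated 8-bit groups
// back into a string by decoding the bytes as UTF-8.
function fromBinaryRepr(bits) {
  const bytes = bits
    .trim()
    .split(/\s+/)
    .map((group) => parseInt(group, 2));
  return new TextDecoder().decode(new Uint8Array(bytes));
}

console.log(fromBinaryRepr('01100001 01100010 01100011'));
console.log(
  fromBinaryRepr(
    '11110000 10011111 10011000 10000000 11110000 10011111 10011000 10000100'
  )
);
```

Without the `fatal` option, TextDecoder replaces invalid byte sequences with U+FFFD, which can itself be handy when exploring what counts as well-formed UTF-8.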

Decompress gzip and zlib string in javascript

I want to read the compressed layer data from a TMX file. Does anyone know a library for decompressing gzip and zlib strings in JavaScript? I tried zlib but it doesn't work for me. For example, the layer data in the TMX file is:
<data encoding="base64" compression="zlib">
eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==
</data>
My javascript code is
var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
var inflate = new Zlib.Inflate(compressData);
var output = inflate.decompress();
It fails with the error message "unsupported compression method". But when I decompress the same data with an online tool such as http://i-tools.org/gzip, it returns the correct string.
Pako is a full and modern Zlib port.
Here is a very simple example and you can work from there.
Get pako.js and you can decompress byteArray like so:
<html>
<head>
<title>Gunzipping binary gzipped string</title>
<script type="text/javascript" src="pako.js"></script>
<script type="text/javascript">
// Get datastream as Array, for example:
var charData = [31,139,8,0,0,0,0,0,0,3,5,193,219,13,0,16,16,4,192,86,214,151,102,52,33,110,35,66,108,226,60,218,55,147,164,238,24,173,19,143,241,18,85,27,58,203,57,46,29,25,198,34,163,193,247,106,179,134,15,50,167,173,148,48,0,0,0];
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako magic
var data = pako.inflate(binData);
// Convert gunzipped byteArray back to ascii string:
var strData = String.fromCharCode.apply(null, new Uint16Array(data));
// Output to console
console.log(strData);
</script>
</head>
<body>
Open up the developer console.
</body>
</html>
Running example: http://jsfiddle.net/9yH7M/
Alternatively you can base64 encode the array before you send it over as the Array takes up a lot of overhead when sending as JSON or XML. Decode likewise:
// Get some base64 encoded binary data from the server. Imagine we got this:
var b64Data = 'H4sIAAAAAAAAAwXB2w0AEBAEwFbWl2Y0IW4jQmziPNo3k6TuGK0Tj/ESVRs6yzkuHRnGIqPB92qzhg8yp62UMAAAAA==';
// Decode base64 (convert ascii to binary)
var strData = atob(b64Data);
// Convert binary string to character-number array
var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako magic
var data = pako.inflate(binData);
// Convert gunzipped byteArray back to ascii string:
var strData = String.fromCharCode.apply(null, new Uint16Array(data));
// Output to console
console.log(strData);
Running example: http://jsfiddle.net/9yH7M/1/
To go more advanced, here is the pako API documentation.
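I was able to solve my problem with zlib by converting the binary string returned by atob into an array of byte values before inflating. I fixed my code as below: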
var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
var compressData = compressData.split('').map(function(e) {
return e.charCodeAt(0);
});
var inflate = new Zlib.Inflate(compressData);
var output = inflate.decompress();
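If the data is processed in Node.js rather than in the browser, the built-in zlib module may be enough and no third-party library is required. The sketch below is an assumption about that setup, not part of the original answers; the helper name `inflateBase64` and the sample text are made up.

```javascript
const zlib = require("zlib");

// Hypothetical helper: decode Base64, then inflate zlib-wrapped data.
function inflateBase64(base64Data) {
  const compressed = Buffer.from(base64Data, "base64");
  return zlib.inflateSync(compressed); // returns a Buffer of raw bytes
}

// Round trip with made-up sample data to show the call order.
const sample = zlib.deflateSync(Buffer.from("hello tmx")).toString("base64");
const roundTrip = inflateBase64(sample).toString("utf8");
console.log(roundTrip);
```

For gzip-compressed input, zlib.gunzipSync takes the place of zlib.inflateSync, and zlib.unzipSync detects either format from the header.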
For anyone using Ruby on Rails who wants to send compressed, encoded data to the browser and then decompress it there with JavaScript, I've combined both excellent answers above into the following solution. Here's the Rails server code in my application controller, which compresses and encodes a string before sending it to the browser via an @variable in a .html.erb file:
require 'zlib'
require 'base64'
def compressor (some_string)
Base64.encode64(Zlib::Deflate.deflate(some_string))
end
Here's the Javascript function, which uses pako.min.js:
function uncompress(input_field){
const base64data = document.getElementById(input_field).innerText;
// Decode Base64 into a binary string, then into an array of byte values
const compressData = atob(base64data).split('').map(function(e) {
return e.charCodeAt(0);
});
const binData = new Uint8Array(compressData);
const data = pako.inflate(binData);
// Only safe when the decompressed output is single-byte (ASCII) text
return String.fromCharCode.apply(null, new Uint16Array(data));
}
Here's a javascript call to that uncompress function, which wants to unencode and uncompress data stored inside a hidden HTML field:
my_answer = uncompress('my_hidden_field');
Here's the entry in the Rails application.js file to call pako.min.js, which is in the /vendor/assets/javascripts directory:
//= require pako.min
And I got the pako.min.js file from here:
https://github.com/nodeca/pako/tree/master/dist
All works at my end, anyway! :-)
I was sending data from a Python script and trying to decode it in JS. Here's what I had to do:
Python
import base64
import json
import urllib.parse
import zlib
...
data_object = {
'_id': '_id',
...
}
compressed_details = base64.b64encode(zlib.compress(bytes(json.dumps(data_object), 'utf-8'))).decode("ascii")
urlsafe_object = urllib.parse.quote(str(compressed_details))#.replace('%', '\%') # you likely don't need this last part
final_URL = f'https://my.domain.com?data_object={urlsafe_object}'
...
JS
// npm install this
import pako from 'pako';
...
const urlParams = new URLSearchParams(window.location.search);
const data_object = urlParams.get('data_object');
if (data_object) {
const compressedData = Uint8Array.from(window.atob(data_object), (c) => c.charCodeAt(0));
originalObject = JSON.parse(pako.inflate(compressedData, { to: 'string' }));
};
...

How can I get the sha1 hash of a string in node.js?

I'm trying to create a websocket server written in node.js
To get the server to work I need to get the SHA1 hash of a string.
What I have to do is explained in Section 5.2.2 page 35 of the docs.
NOTE: As an example, if the value of the "Sec-WebSocket-Key"
header in the client's handshake were "dGhlIHNhbXBsZSBub25jZQ==", the server would append the string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to form the
string "dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11". The server would then take the SHA-1 hash of this string, giving the value 0xb3 0x7a 0x4f 0x2c 0xc0 0x62 0x4f 0x16 0x90 0xf6 0x46 0x06 0xcf 0x38 0x59 0x45 0xb2 0xbe 0xc4 0xea. This value is then base64-encoded, to give the value "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", which would be returned
in the "Sec-WebSocket-Accept" header.
See the crypto.createHash() function and the associated hash.update() and hash.digest() functions:
var crypto = require('crypto')
var shasum = crypto.createHash('sha1')
shasum.update('foo')
shasum.digest('hex') // => "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
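Applied to the handshake described in the question, the same calls can be combined as in the sketch below. The function name `computeAcceptKey` is made up for illustration; the GUID and the sample key are the ones quoted from the specification above.

```javascript
const crypto = require("crypto");

// GUID that the WebSocket protocol appends to the client's key.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Hypothetical helper: derive the Sec-WebSocket-Accept header value
// from the client's Sec-WebSocket-Key header value.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64"); // the header wants Base64, not hex
}

const acceptKey = computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ==");
console.log(acceptKey); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```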
Obligatory: SHA1 is broken, you can compute SHA1 collisions for 45,000 USD (and even less since this answer was written). You should use sha256:
var getSHA256ofJSON = function(input){
return crypto.createHash('sha256').update(JSON.stringify(input)).digest('hex')
}
To answer your question and make a SHA1 hash:
const INSECURE_ALGORITHM = 'sha1'
var getInsecureSHA1ofJSON = function(input){
return crypto.createHash(INSECURE_ALGORITHM).update(JSON.stringify(input)).digest('hex')
}
Then:
getSHA256ofJSON('whatever')
or
getSHA256ofJSON(['whatever'])
or
getSHA256ofJSON({'this':'too'})
Official node docs on crypto.createHash()
Tip to avoid a mismatched hash:
In my experience, Node.js hashes the UTF-8 representation of the string by default. Other languages (like Python, PHP or Perl...) hash the byte string.
Passing "binary" (an alias for "latin1") as the input encoding makes Node.js treat each character as one byte:
const crypto = require("crypto");
function sha1(data) {
return crypto.createHash("sha1").update(data, "binary").digest("hex");
}
sha1("Your text ;)");
You can try with : "\xac", "\xd1", "\xb9", "\xe2", "\xbb", "\x93", etc...
Other languages (Python, PHP, ...):
sha1("\xac") //39527c59247a39d18ad48b9947ea738396a3bc47
Nodejs:
sha1 = crypto.createHash("sha1").update("\xac", "binary").digest("hex") //39527c59247a39d18ad48b9947ea738396a3bc47
//without:
sha1 = crypto.createHash("sha1").update("\xac").digest("hex") //f50eb35d94f1d75480496e54f4b4a472a9148752
You can use:
const sha1 = require('sha1');
const crypt = sha1('Text');
console.log(crypt);
For install:
sudo npm install -g sha1
npm install sha1 --save
Please read and strongly consider my advice in the comments of your post. That being said, if you still have a good reason to do this, check out this list of crypto modules for Node. It has modules for dealing with both sha1 and base64.
Answer using the new browser compatible, zero dependency SubtleCrypto API added in Node v15
const crypto = this.crypto || require('crypto').webcrypto;
const sha1sum = async (message) => {
const encoder = new TextEncoder()
const data = encoder.encode(message)
const hashBuffer = await crypto.subtle.digest('SHA-1', data)
const hashArray = Array.from(new Uint8Array(hashBuffer)); // convert buffer to byte array
const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join(''); // convert bytes to hex string
return hashHex;
}
sha1sum('foo')
.then(digestHex => console.log(digestHex))
// "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
Node Sandbox: https://runkit.com/hesygolu/61564dbee2ec8600082a884d
Sources:
https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest#converting_a_digest_to_a_hex_string
https://nodejs.org/api/webcrypto.html#webcrypto_class_subtlecrypto
