How to apply regular expression for Javascript

I am trying to get a message log from Azure Application Insights like this:
az monitor app-insights query --app [app id] --analytics-query [condition like specific message id]
Then I got a message like this:
"message": [
"Receiving message: {"type":"CTL","traceId":"f0d11b3dbf27b8fc57ac0e40c4ed9e48","spanId":"a5508acb0926fb1a","id":{"global":"GLkELDUjcRpP4srUt9yngY","caller":null,"local":"GLkELDUisjnGrSK5wKybht"},"eventVersion":"format version","timeStamp":"2021-10-01T14:55:59.8168722+07:00","eventMetadata":{"deleteTimeStamp":null,"ttlSeconds":null,"isFcra":null,"isDppa":true,"isCCPA":true,"globalProductId":null,"globalSubProductId":null,"mbsiProductId":null},"eventBody":{"sys":"otel","msg":"Testing Centralized Event Publisher with App1 (using logback)","app":{"name":"otel","service":"postHouse","status":"status name","method":"POST","protocol":"HTTP","resp_time_ms":"250","status_code":"4"},}}"
] }
I would like to apply a regular expression to this message to extract only the part from {"type..... to "status_code":"4"},}} and then convert it to JSON.
I have code like this in my .js file:
Then('extract json from {string}', function(message){
message = getVal(message, this);
const getmess = message.match(/{(.*)}/g);
const messJson = JSON.parse(getmess);
console.log(messJson);
})
But it doesn't work for me:
SyntaxError: Unexpected token \ in JSON at position 1
How can I do this in my JavaScript code? Thank you so much for your help.

Try this. Keep in mind that this regex is bound to the output syntax shown above. If the wrapper structure of the output changes, the regex might no longer work.
// Text from app
const STDOUT = `
"message": [ "Receiving message: {"type":"CTL","traceId":"f0d11b3dbf27b8fc57ac0e40c4ed9e48","spanId":"a5508acb0926fb1a","id":{"global":"GLkELDUjcRpP4srUt9yngY","caller":null,"local":"GLkELDUisjnGrSK5wKybht"},"eventVersion":"format version","timeStamp":"2021-10-01T14:55:59.8168722+07:00","eventMetadata":{"deleteTimeStamp":null,"ttlSeconds":null,"isFcra":null,"isDppa":true,"isCCPA":true,"globalProductId":null,"globalSubProductId":null,"mbsiProductId":null},"eventBody":{"sys":"otel","msg":"Testing Centralized Event Publisher with App1 (using logback)","app":{"name":"otel","service":"postHouse","status":"status name","method":"POST","protocol":"HTTP","resp_time_ms":"250","status_code":"4"},}}"
] }
`;
// Match JSON part string
let JSONstr = /.*\[\s*\"Receiving message:\s*(.*?)\s*\"\s*]\s*}\s*$/.exec(STDOUT)[1];
// Remove trailing comma(s)
JSONstr = JSONstr.replace(/^(.*\")([^\"]+)$/, (s, m1, m2) => `${m1}${m2.replace(/\,/, "")}`);
// Convert to object
const JSONobj = JSON.parse(JSONstr);
// Result
console.log(JSONobj);

Try this one:
/.*?({"type":.*?,"status_code":"\d+"\})/
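When used in JavaScript, the part covered by the parentheses is capture group 1. For the sample above, the group ends at the brace that closes "app", so the two outer closing braces have to be appended before parsing:
const messJson = JSON.parse(message.match(/.*?({"type":.*?,"status_code":"\d+"\})/)[1] + "}}");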
Reference here: https://regexr.com/66mf2
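Both regexes above depend on the exact field names in the log line. As an alternative, here is a minimal sketch that scans for the first balanced {...} block and removes trailing commas before parsing. The helper names extractJsonBlock and stripTrailingCommas and the shortened sample string are made up for illustration, and the comma cleanup assumes that no string value contains a comma directly followed by a closing brace or bracket.

```javascript
// Hypothetical helper: returns the first balanced {...} block in `text`,
// or null if there is none. Braces inside string literals are ignored.
function extractJsonBlock(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}" && --depth === 0) {
      return text.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

// Hypothetical helper: drops commas that directly precede } or ].
function stripTrailingCommas(json) {
  return json.replace(/,\s*([}\]])/g, "$1");
}

// Shortened stand-in for the log line from the question.
const sample =
  'Receiving message: {"type":"CTL","eventBody":{"msg":"Testing (using logback)","app":{"status_code":"4"},}}"';

const parsed = JSON.parse(stripTrailingCommas(extractJsonBlock(sample)));
console.log(parsed.eventBody.app.status_code);
```

Counting braces keeps the extraction independent of which keys appear first or last in the payload, at the cost of a few more lines than a single regex.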

Related

Parsing JSON with escaped unicode characters displays incorrectly

I have downloaded JSON data from Instagram that I'm parsing in NodeJS and storing in MongoDB. I'm having an issue where escaped unicode characters are not displaying the correct emoji symbols when displayed on the client side.
For instance, here's a property from one of the JSON files I'm parsing and storing:
"title": "#mujenspirits is in the house!NEW York City \u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e \nImperial Vintner Liquor Store"
The above example should display like this:
#mujenspirits is in the house!NEW York City 🗽🍎
Imperial Vintner Liquor Store
But instead looks like this:
#mujenspirits is in the house!NEW York City ðŸ—½ðŸŽ
Imperial Vintner Liquor Store
I found another SO question where someone had a similar problem and their solution works for me in the console using a simple string, but when used with JSON.parse still gives the same incorrect display. This is what I'm using now to parse the JSON files.
export default function parseJsonFile(filepath: string) {
const value = fs.readFileSync(filepath)
const converted = new Uint8Array(
new Uint8Array(Array.prototype.map.call(value, (c) => c.charCodeAt(0)))
)
return JSON.parse(new TextDecoder().decode(converted))
}
For posterity, I found an additional SO question similar to mine. There wasn't a solution, however, one of the comments said:
The JSON files were generated incorrectly. The strings represent Unicode code points as escape codes, but are UTF-8 data decoded as Latin1
The commenter suggested encoding the loaded JSON to latin1 then decoding to utf8, but this didn't work for me either.
import buffer from 'buffer'
const value = fs.readFileSync(filepath)
const buffered = buffer.transcode(value, 'latin1', 'utf8')
return JSON.parse(buffered.toString())
I know pretty much nothing about character encoding, so at this point I'm shooting in the dark searching for a solution.

An easy solution is to decode the string with the utf8 package:
npm install utf8
As an example of use, look at this code that uses Node.js and Express:
import express from "express";
import utf8 from "utf8";
const app = express();
app.get("/", (req, res) => {
  const text = "\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e it is a test";
  // utf8.decode treats each character as one byte and decodes the bytes as UTF-8
  const textDecode = utf8.decode(text);
  console.log(textDecode);
  res.send(textDecode);
});
const port = process.env.PORT || 5000;
app.listen(port, () => {
  console.log(`Server on port ${port}`);
});
The result is that at localhost:5000 you will see the emojis without problems. You can apply the same idea in your project to handle the JSON with emojis.
And here is an example from the client side:
const element= document.getElementById("text")
const txt = "\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e it is a test"
const text= utf8.decode(txt)
console.log(text)
element.innerHTML= text
<script src="https://cdnjs.cloudflare.com/ajax/libs/utf8/2.1.1/utf8.min.js" integrity="sha512-PACCEofNpYYWg8lplUjhaMMq06f4g6Hodz0DlADi+WeZljRxYY7NJAn46O5lBZz/rkDWivph/2WEgJQEVWrJ6Q==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<p id="text"></p>

You can try converting the Unicode escape sequences to bytes before parsing the JSON; the utf8.js library can probably help you with that.
Alternatively, the solution you found should work, but only after deserializing the JSON (parsing turns each Unicode escape sequence into one character). So you need to traverse the parsed object and apply the solution to each string.
For example:
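const fs = require("fs");

function parseJsonFile(filepath) {
  // Read as text; JSON.parse turns each \u00XX escape into one character.
  const value = fs.readFileSync(filepath, "utf8");
  return decodeUTF8(JSON.parse(value));
}

// Recursively re-decode every string in the parsed data as UTF-8.
function decodeUTF8(data) {
  if (typeof data === "string") {
    // Treat each character code as one byte, then decode the bytes as UTF-8.
    const utf8 = new Uint8Array(
      Array.prototype.map.call(data, (c) => c.charCodeAt(0))
    );
    return new TextDecoder("utf-8").decode(utf8);
  }
  if (Array.isArray(data)) {
    return data.map(decodeUTF8);
  }
  // typeof null is "object", so exclude it explicitly.
  if (data !== null && typeof data === "object") {
    const obj = {};
    Object.entries(data).forEach(([key, value]) => {
      obj[key] = decodeUTF8(value);
    });
    return obj;
  }
  return data;
}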
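In Node.js the same "Latin-1 bytes back to UTF-8" step can also be sketched with the built-in Buffer instead of TextDecoder. The names fixMojibake and fixDeep and the sample JSON below are hypothetical, and the sketch assumes every string in the file is mis-decoded in this way; a string that already contains characters above U+00FF would be damaged by it.

```javascript
// Re-interprets a string whose characters are really UTF-8 bytes
// (each in the range U+0000..U+00FF) as proper UTF-8 text.
function fixMojibake(str) {
  return Buffer.from(str, "latin1").toString("utf8");
}

// Walks any parsed JSON value and repairs every string value inside it.
function fixDeep(value) {
  if (typeof value === "string") return fixMojibake(value);
  if (Array.isArray(value)) return value.map(fixDeep);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, fixDeep(v)])
    );
  }
  return value; // numbers, booleans, null
}

// Sample JSON text with the same escape sequences as in the question.
const raw =
  '{"title": "NEW York City \\u00f0\\u009f\\u0097\\u00bd\\u00f0\\u009f\\u008d\\u008e"}';

const fixed = fixDeep(JSON.parse(raw));
console.log(fixed.title);
```

The repair has to happen after JSON.parse, because only then do the \u00XX escapes exist as single characters that can be mapped back to bytes.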

Translate Special Characters with npm latinize is not working dynamically

I am using latinize to transliterate German special characters to English. The module works only when I pass a string literal in single or double quotes, but not when I pass a value stored in a variable.
import latinize from 'latinize';
And inside render, I log this and it works fine:
console.log('render', latinize('VfL Osnabrück'))
It also works when I pass a variable:
let tag_name = 'VfL Osnabrück';
console.log('render', latinize(tag_name))
But it does not work when I get tag_name from my API. The complete code is below:
let tag_parsing = sub_category_id.split('%20');
let tag_string = '';
for (let i = 0; i < tag_parsing.length; i++) {
tag_parsing[i];
// tag_parsing[0] == Vlf
// tag_parsing[1] == Osnabrück
console.log('latinized text', tag_parsing[i]);
tag_string += ' ' + tag_parsing[i]
}
OUTPUT
output of latinized text ==> Osnabr%C3%BCck
output of latinized text inside quotes ==> Osnabruck
I also tried .toString(), but that did not work.

I think there may be something off with how you are attempting to process the query string from the URL.
Here's a snippet of the logic I used to process your query string in a forked codesandbox. I used a functional component for ease, but the same logic can be used in a class-based component.
// get the search string
const { search } = useLocation();
const [latinizedValue, setLatinizedValue] = React.useState("");
React.useEffect(() => {
console.log({ search });
// create search params object
const newParams = new URLSearchParams(search);
const key = newParams.get("key");
const value = newParams.get("value")?.trim();
console.log("Param", key, `"${value}"`);
console.log("latinize param:", `"${latinize(value)}"`);
setLatinizedValue(latinize(value));
}, [search]);
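The output shown in the question (Osnabr%C3%BCck) suggests that the value is still percent-encoded when it reaches latinize, which is why URLSearchParams helps: it decodes the value first. Here is a minimal sketch of the same idea with decodeURIComponent. The stripDiacritics function is only a stand-in for latinize so that the snippet runs without the package, and the subCategoryId value is a hypothetical raw URL segment.

```javascript
// Stand-in for latinize(): removes combining marks after NFD normalization.
// (Assumption for illustration only; the real package covers more characters.)
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

const subCategoryId = "VfL%20Osnabr%C3%BCck"; // hypothetical raw URL segment

// Splitting on "%20" leaves "%C3%BC" untouched, so there is no "ü" to replace.
const naive = subCategoryId.split("%20").join(" ");

// Decoding first turns "%20" into a space and "%C3%BC" into "ü".
const decoded = decodeURIComponent(subCategoryId);

console.log(stripDiacritics(naive));   // VfL Osnabr%C3%BCck
console.log(stripDiacritics(decoded)); // VfL Osnabruck
```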

The font-family I was using did not contain the German special characters. Once I switched to a font-family that supports them, everything went smoothly and latinize also works fine.

ctx in ANTLR4 javascript visitor

Using ANTLR4 v4.8
I am writing a transpiler and exploring the use of ANTLR (JavaScript target with visitor).
Grammar -> lex/parse is fine, and I now have a parse tree.
Grammar
grammar Mygrammar;
/*
* parser rules
*/
progm : stmt+;
stmt
: progdecl
| print
;
progdecl : PROGDECLKW ID '..';
print : WRITEKW STRLIT '..';
/*
* lexer rules
*/
PROGDECLKW : 'DECLAREPROGRAM';
WRITEKW : 'PRINT';
// Literal
STRLIT : '\'' .*? '\'' ;
// Identifier
ID : [a-zA-Z0-9]+;
// skip
LINE_COMMENT : '*' .*? '\n' -> skip;
TERMINATOR : [\r\n]+ -> skip;
WS : [ \t\n\r]+ -> skip;
hw.mg
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
index.js
...
const myVisitor = require('./src/myVisitor').myVisitor;
const input = './src_sample/hw.mg';
const chars = new antlr4.FileStream(input);
...
parser.buildParseTrees = true;
const myVisit = new myVisitor();
myVisit.visitPrint(parser.print());
Using the visitor didn't seem straightforward, and this SO post helps to an extent.
On the use of context: is there a good way to track ctx when I hit each node?
Using myVisit.visit(tree) as the starting context is fine. When I start visiting each node with a non-root context,
myVisit.visitPrint(parser.print()) throws an error.
Error:
PrintContext {
parentCtx: null,
invokingState: -1,
ruleIndex: 3,
children: null,
start: CommonToken {
source: [ [MygrammarLexer], [FileStream] ],
type: -1,
channel: 0,
start: 217,
together with exception: InputMismatchException [Error]
I believe it is because children is null instead of being populated.
Which, in turn, is due to
line 9:0 mismatched input '<EOF>' expecting {'DECLAREPROGRAM', 'PRINT'}
Question:
Is the above the only way to pass the context, or am I doing this wrong?
If the usage is correct, then I am inclined to report this as a bug.
edit 17.3 - added grammar and source

When you invoke parser.print() but feed it the input:
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
it will not work. For print(), the parser expects input like PRINT 'Hello World!'.. and nothing else. For the entire input, you will have to invoke progm() instead. Also, it is wise to "anchor" your starting rule with the EOF token, which forces ANTLR to consume the entire input:
progm : stmt+ EOF;
If you want to parse and visit an entire parse tree (using progm()), but are only interested in the print node/context, then it is better to use a listener instead of a visitor. See this page for how to use a listener: https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md
EDIT
Here's how a listener works (a Python demo since I don't have the JS set up properly):
import antlr4
from playground.MygrammarLexer import MygrammarLexer
from playground.MygrammarParser import MygrammarParser
from playground.MygrammarListener import MygrammarListener


class PrintPreprocessor(MygrammarListener):
    def enterPrint_(self, ctx: MygrammarParser.Print_Context):
        print("Entered print: `{}`".format(ctx.getText()))


if __name__ == '__main__':
    source = """
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
"""
    lexer = MygrammarLexer(antlr4.InputStream(source))
    parser = MygrammarParser(antlr4.CommonTokenStream(lexer))
    antlr4.ParseTreeWalker().walk(PrintPreprocessor(), parser.progm())
When running the code above, the following will be printed:
Entered print: `PRINT'Hello World!'..`
So, in short: this listener accepts the entire parse tree of your input, but only "listens" when we enter the print parser rule.
Note that I renamed print to print_ because print is protected in the Python target.

Problem in printing angle brackets using the xml-builder node module

I am creating an XML file using the "xml-builder" node module. But when I tried to write angle brackets ("<" or ">"), I got characters like "&lt;" and "&gt;". The code is as follows:
let builder = require('xmlbuilder', { encoding: 'utf-8' });
let name = "ABC";
let xml = builder.create('Slides');
xml.ele('props',"Hello").up();
xml.ele('name',"<Hello> "+name+" </Hello>").up();
xml.end({ pretty: true });
console.log(xml.toString())
The output is as follows:
<Slides>
<props>Hello</props>
<name>&lt;Hello&gt; ABC &lt;/Hello&gt;</name>
</Slides>
What should I do to get < or > printed instead of &lt; or &gt;?

There is an npm module, decode-html, that handles the same use case as yours.
var decode = require('decode-html');
console.log(decode('&lt;div class=&quot;hidden&quot;&gt;NON&amp;SENSE&apos;s&lt;/div&gt;'));
// -> '<div class="hidden">NON&SENSE\'s</div>'

The problem is that you are attempting to create a child element in an incorrect way, by passing some XML in the value field of xml.ele. The module is correctly escaping your angle brackets.
What you need to do is create another element named Hello and append it to the name element. You can do this either by chaining your .ele calls or by using their return values.
Here is the correct code:
let builder = require('xmlbuilder', { encoding: 'utf-8' });
let name = "ABC";
let xml = builder.create('Slides');
xml.ele('props',"Hello");
xml.ele('name')
.ele("Hello", name);
xml.end({ pretty: true });
console.log(xml.toString())
Output:
<Slides>
<props>Hello</props>
<name>
<Hello>ABC</Hello>
</name>
</Slides>

Unexpected token o in JSON at position 1

I keep getting this error in this block of code below:
function openWebsocket(url) {
var ws;
ws = $websocket(url);
ws.onOpen(function(event) {
console.log(' Websocket connection established:', event);
});
ws.onMessage(function(message) {
var userObj = UserFactory.getUserObject();
var settings = userObj.alert_settings;
// The JSON parsing...
var parsedMsg = JSON.parse(message.data);
var alert = JSON.parse(parsedMsg);
var date = new Date(parseFloat(alert.start_epoch+'000'));
alert.hour = date.getHours() +':'+date.getMinutes();
alert.percent_change = Math.round(alert.percent_change);
var shouldPush = main_alert_filter(settings, alert);
updateFeed(alerts, shouldPush, alert);
});
}
I've looked at both Parsing JSON giving "unexpected token o" error and I keep getting "Uncaught SyntaxError: Unexpected token o".
However, neither answer helped, because when I first run JSON.parse(message.data) I get a string back, not an object. So I have to run JSON.parse again to finally get a real object.
This is what message.data looks like:
"
"{\"term\": \"\\\"nike\\\"\", \"percent_change\": 125, \"hour\": \"10:9\", \"term_id\": 2890413, \"start_epoch\": 1420474140, \"term_trend_id\": 793950, \"end_epoch\": 1420477740, \"formatted_date_difference\": \"January 5, 2015\", \"tickers\": [\"NKE\", \"$PUM\", \"ADDYY\", \"LULU\", \"UA\", \"HIBB\"], \"twitter_preview\": \"\", \"type\": \"spike\", \"approved\": 1, \"search_preview\": [\"\"]}"
"
Now after the first parsing parsedMsg is a string that looks like this:
{"term": "minimum wage +increase", "percent_change": 729, "hour": "9:14", "term_id": 2522115, "start_epoch": 1447168440, "term_trend_id": 657898, "end_epoch": 1447175700, "formatted_date_difference": "November 10, 2015", "tickers": ["$JAB", "$SLCY", "AAL", "AAPL", "ABCD", "ABTL", "ADDYY", "ADM", "AEO", "AFCO", "AHC"......
Finally I need an actual object, so I have to run JSON.parse again to get this:
Object {term: "minimum wage +increase", percent_change: 729, hour: "9:14", term_id: 2522115, start_epoch: 1447168440…}
Another thing to note, I never get that error when I'm stepping through in Chrome. It only happens when I don't have the breakpoint set. Could this be a race condition type issue? Like it tries to JSON.parse something that isn't ready to be parsed?
UPDATE
Ok so sometimes the JSON is invalid apparently and sometimes not, so far I'm doing good without errors with the following snippet, thoughts?
if (typeof alert === 'object') {
// do nothing...
} else {
var alert = JSON.parse(alert);
}
Most of the time the alert result of JSON.parse(message.data) is a string so I need the other check to double parse it.

Why would you parse your JSON a second time? It has already been parsed in the first attempt.
Have a look at the snippet:
var obj = "{\"term\": \"minimum wage +increase\", \"percent_change\": 729, \"hour\": \"9:14\", \"term_id\": 2522115, \"start_epoch\": 1447168440, \"term_trend_id\": 657898, \"end_epoch\": 1447175700, \"formatted_date_difference\": \"November 10, 2015\", \"tickers\": [\"$JAB\", \"$SLCY\", \"AAL\", \"AAPL\", \"ABCD\", \"ABTL\", \"ADDYY\"]}";
$(function(){
var data = JSON.parse(obj);
alert(typeof data);
console.log(data.tickers[0] +" -> an item in `tickers` array");
console.log(data.tickers);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

The JSON string you showed for message.data is not well-formed JSON. It might be because the server is sending you a multi-part message during or after establishing the connection.
I suggest you print the message object received in the onMessage function and check whether each one is a complete, valid JSON string.

It looks like your message.data is incomplete.
Take a look at the docs of the library you are using. Maybe you should collect the data until it ends? Maybe there is some onEnd method?
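For context on the error text itself: JSON.parse converts a non-string argument to a string first, so passing an object makes it parse "[object Object]", and the "o" at position 1 is what triggers "Unexpected token o". A minimal sketch of a guard for payloads that arrive sometimes singly and sometimes doubly encoded follows. The helper name parsePossiblyNested and its maxDepth parameter are made up for illustration, and it assumes the final value is an object or array; a payload whose final value is a plain non-JSON string would still throw on the extra parse.

```javascript
// Hypothetical helper: keeps calling JSON.parse while the value is a string,
// up to maxDepth times. Values that are already objects are returned as-is.
function parsePossiblyNested(payload, maxDepth = 3) {
  let value = payload;
  for (let i = 0; i < maxDepth && typeof value === "string"; i++) {
    value = JSON.parse(value);
  }
  return value;
}

// Build a singly and a doubly encoded payload from the same object.
const once = JSON.stringify({ term: "nike", percent_change: 125 });
const twice = JSON.stringify(once);

console.log(parsePossiblyNested(once).term);             // nike
console.log(parsePossiblyNested(twice).term);            // nike
console.log(parsePossiblyNested({ term: "nike" }).term); // nike
```

Checking the type before each parse avoids ever handing an object to JSON.parse, which is the case that produces the error in the title.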
