Escaping square brackets with node expressionparser - javascript

I am using node expressionparser in combination with jsonpath-plus library to parse json paths and read the data out of a json object based on the term returned. Here is what my parser code looks like:
export function initialiseParser(workerObject: WorkerObjectType): ExpressionParser {
  return init(formula, (term: string) => {
    if (term === '$NULL') {
      return '$NULL';
    } else if (term.startsWith('$VALUE:')) {
      const [, key] = term.split('$VALUE:');
      return getJsonData({ path: key, json: workerObject }) ?? '$NULL';
    } else {
      return '$NULL';
    }
  });
}
I am trying to parse the below expression:
$VALUE:Worker_Data.Employment_Data.Worker_Job_Data[?(#.attributes['wd:Primary_Job'] === '1')].Position_Data.Business_Site_Summary_Data.Name
When running this through the parser, I can see the term evaluates to .Position_Data.Business_Site_Summary_Data.Name. The result I would like to achieve is to get out the full expression: Worker_Data.Employment_Data.Worker_Job_Data[?(#.attributes['wd:Primary_Job'] === '1')].Position_Data.Business_Site_Summary_Data.Name.
My gut feeling is that the formula language recognizes the '[]' and parses it as something else, but after looking through the docs I am unclear as to what.
I wonder if someone could shed some light on this for me? If formula uses [] for something else, how can I escape the characters when using them in the expression? I am also open to other solutions. Please let me know if more details are required.

Related

Parsing JSON with escaped unicode characters displays incorrectly

I have downloaded JSON data from Instagram that I'm parsing in NodeJS and storing in MongoDB. I'm having an issue where escaped unicode characters are not displaying the correct emoji symbols when displayed on the client side.
For instance, here's a property from one of the JSON files I'm parsing and storing:
"title": "#mujenspirits is in the house!NEW York City \u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e \nImperial Vintner Liquor Store"
The above example should display like this:
#mujenspirits is in the house!NEW York City 🗽🍎
Imperial Vintner Liquor Store
But instead looks like this:
#mujenspirits is in the house!NEW York City 🗽ðŸŽ
Imperial Vintner Liquor Store
I found another SO question where someone had a similar problem and their solution works for me in the console using a simple string, but when used with JSON.parse still gives the same incorrect display. This is what I'm using now to parse the JSON files.
export default function parseJsonFile(filepath: string) {
  const value = fs.readFileSync(filepath)
  const converted = new Uint8Array(
    new Uint8Array(Array.prototype.map.call(value, (c) => c.charCodeAt(0)))
  )
  return JSON.parse(new TextDecoder().decode(converted))
}
For posterity, I found an additional SO question similar to mine. There wasn't a solution; however, one of the comments said:
The JSON files were generated incorrectly. The strings represent Unicode code points as escape codes, but are UTF-8 data decoded as Latin1.
The commenter suggested encoding the loaded JSON to latin1 then decoding to utf8, but this didn't work for me either.
import buffer from 'buffer'
const value = fs.readFileSync(filepath)
const buffered = buffer.transcode(value, 'latin1', 'utf8')
return JSON.parse(buffered.toString())
I know pretty much nothing about character encoding, so at this point I'm shooting in the dark searching for a solution.
An easy solution is to decode the string with the utf8 package:
npm install utf8
Now as an example of use, look at this code that uses nodejs and express:
import express from "express";
import utf8 from "utf8";
const app = express();
app.get("/", (req, res) => {
  // Each escape is one byte of a UTF-8 sequence stored as a single character.
  const text = "\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e it is a test";
  const textDecode = utf8.decode(text);
  console.log(textDecode);
  res.send(textDecode);
});
const port = process.env.PORT || 5000;
app.listen(port, () => {
  console.log(`Server on port ${port}`);
});
The result is that in localhost:5000 you will see the emojis without problem. You can apply this idea to your project, to treat the json with emojis.
And here is an example from the client side:
const element= document.getElementById("text")
const txt = "\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e it is a test"
const text= utf8.decode(txt)
console.log(text)
element.innerHTML= text
<script src="https://cdnjs.cloudflare.com/ajax/libs/utf8/2.1.1/utf8.min.js" integrity="sha512-PACCEofNpYYWg8lplUjhaMMq06f4g6Hodz0DlADi+WeZljRxYY7NJAn46O5lBZz/rkDWivph/2WEgJQEVWrJ6Q==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<p id="text"></p>
You can try converting the unicode escape sequences to bytes before parsing the JSON; probably, the utf8.js library can help you with that.
Alternatively, the solution you found should work, but only after deserializing the JSON (parsing turns each unicode escape sequence into one character). So you need to traverse the object and apply the solution to each string.
For example:
import fs from "fs";

function parseJsonFile(filepath) {
  const value = fs.readFileSync(filepath, "utf8");
  return decodeUTF8(JSON.parse(value));
}
function decodeUTF8(data) {
  if (typeof data === "string") {
    // Treat each character code as one byte, then decode the bytes as UTF-8.
    const utf8 = new Uint8Array(
      Array.prototype.map.call(data, (c) => c.charCodeAt(0))
    );
    return new TextDecoder("utf-8").decode(utf8);
  }
  if (Array.isArray(data)) {
    return data.map(decodeUTF8);
  }
  // typeof null is "object", so exclude it before calling Object.entries.
  if (data !== null && typeof data === "object") {
    const obj = {};
    Object.entries(data).forEach(([key, value]) => {
      obj[key] = decodeUTF8(value);
    });
    return obj;
  }
  return data;
}
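In Node specifically, the same per-string repair can probably be written with Buffer instead of building a Uint8Array by hand. The sketch below is only an illustration: fixMojibake and fixStrings are hypothetical names, and it assumes every string in the file is UTF-8 data that was decoded as Latin-1, as the quoted comment suggests. A string that was stored correctly but happens to contain only Latin-1 characters (such as "é") would be damaged by it.

```javascript
// Hypothetical helper: reinterpret a string whose characters are really UTF-8 bytes.
function fixMojibake(text) {
  // Leave the string alone if any character cannot be a single byte.
  if (!/^[\u0000-\u00ff]*$/.test(text)) return text;
  // "latin1" maps each character code 0-255 to one byte; then decode as UTF-8.
  return Buffer.from(text, "latin1").toString("utf8");
}

// Walk the parsed JSON and repair every string value.
function fixStrings(data) {
  if (typeof data === "string") return fixMojibake(data);
  if (Array.isArray(data)) return data.map(fixStrings);
  if (data !== null && typeof data === "object") {
    return Object.fromEntries(
      Object.entries(data).map(([key, value]) => [key, fixStrings(value)])
    );
  }
  return data;
}

// The escapes are decoded by JSON.parse first, then repaired.
const parsed = JSON.parse('{"title": "NEW York City \\u00f0\\u009f\\u0097\\u00bd"}');
console.log(fixStrings(parsed).title);
```

The order matters: the repair has to run after JSON.parse, because only then have the \u00XX escapes become individual characters.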

Regex Validation for userInput in React

I am trying to implement a search using regex. To validate whether the value entered in the search box is a valid regex, I am using the source code from the regex-validate library (regex-regex).
If the entered value is a valid regex, I parse it into a regular expression using the source code from the Regex-Parse library (Regex Parser) and then filter/search using the parsed regex. Here is a code snippet:
import { useState } from "react";
import "./styles.css";
import { re, RegexParser } from "./regexValidation";
export default function App() {
  const [val, setVal] = useState("");
  const [validRegex, setValidRegex] = useState(false);
  const validateRegex = (val: string) => {
    setVal(val);
    if (val === "") {
      setValidRegex(false);
      return;
    }
    // check whether the entered value (val) is a valid regex in string form
    if (re.test(val)) {
      setValidRegex(false);
      // parse the entered value (val) into a regular expression
      const convertToRegex = RegexParser(val);
      // filtering logic to filter based on the convertToRegex variable
    } else {
      setValidRegex(true);
    }
  };
  const handleChange = (e: any) => {
    validateRegex(e.target.value);
  };
  return (
    <div className="App">
      <input value={val} onChange={handleChange}></input>
      <h1>{validRegex ? "inValidRegex" : "ValidRegex"}</h1>
    </div>
  );
}
CodeSandBox link for the regex search RegexValidationSearch
I am facing an issue: when the user enters '/?/' or '/*/', re.test(val) returns true, implying that it is a valid regex, but when it is parsed on the line const convertToRegex = RegexParser(val), it throws an error (RegexError).
Is there any way to fix this so that re.test(val) returns false when the user enters an invalid regular expression (in string format), so that there is no need to parse it into a regular expression?
This looks like it might be an incompatibility between the two libraries you are using (i.e., they have different ideas of what a valid regex is).
Easiest way to fix this (and honestly the safest too, since you're dealing with user input) is to wrap your entire parsing logic with a try/catch like this:
// check whether the entered value (val) is a valid regex in string form
if (re.test(val)) {
  let convertToRegex;
  try {
    convertToRegex = RegexParser(val);
    setValidRegex(true); // only set this AFTER a successful parse.
    // also note I've swapped the true / false value here.
  } catch (e) {
    setValidRegex(false);
  }
  if (convertToRegex) {
    // TODO: implement filtering logic based on convertToRegex variable
  }
} else {
  // NOTE: it didn't parse correctly, shouldn't this be false?
  setValidRegex(false); // I've changed it for you
}
Also, I think(?) you've made a couple of errors in how you're handling setValidRegex, which I've corrected in the code. Don't be optimistic and say the user input is a valid regex when you haven't actually confirmed (by successfully calling RegexParser) that it is!
With this approach there's an argument for deleting the re.test(val) and the other library entirely since you can get what you want by simply try/catch-ing. Up to you to decide if this is a decent choice for your codebase.
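As a rough illustration of the try/catch-only route, here is a sketch that uses the built-in RegExp constructor rather than either of the two libraries from the question. The helper name tryParseRegex is hypothetical, and the sketch assumes the user types either a bare pattern or the /pattern/flags form.

```javascript
// Hypothetical helper: returns a RegExp for valid input, or null for invalid input.
function tryParseRegex(input) {
  // Accept "/pattern/flags"; anything else is treated as a bare pattern.
  const match = /^\/(.*)\/([a-z]*)$/s.exec(input);
  const source = match ? match[1] : input;
  const flags = match ? match[2] : "";
  try {
    // The constructor throws a SyntaxError for invalid patterns or flags.
    return new RegExp(source, flags);
  } catch (e) {
    return null;
  }
}

const items = ["apple", "Banana", "cherry"];
const regex = tryParseRegex("/an/i");
// Only filter when the input actually compiled.
const filtered = regex ? items.filter((item) => regex.test(item)) : items;
console.log(filtered);
```

In the component, the result can drive the state directly, for example by calling setValidRegex(regex !== null), so validity is only ever reported after a real parse.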

JavaScript - multiline template literals when source code has tabs in front

I am trying to create a JS multiline template literal like this:
function _on_message_arrived(_m) {
	// Feedback.
	console.log(
		`FUNCTION: "_on_message_arrived()":
		String (glyphs): ${_m.payloadString}
		String (hex): ${_m.payloadBytes}`
	);
}
But because the 2nd and 3rd lines of the literal are indented with tabs, these tabs are also printed in the browser's console.
How can I format the JS source code so that the printed output has no leading tabs, while still indenting the source with tabs and using a template literal?
You will have to resort to wrecking your code's indentation, unfortunately:
function _on_message_arrived(_m) {
	// Feedback.
	console.log(
`FUNCTION: "_on_message_arrived()":
String (glyphs): ${_m.payloadString}
String (hex): ${_m.payloadBytes}`
	);
}
Alternatively, strip the leading whitespace and re-insert tabs from placeholders, as a poor man's templating system:
const data = {
	payloadString: '26.9',
	payloadBytes: '50,54,46,57'
};
function _on_message_arrived(_m) {
	// Turn a placeholder such as <2> into that many tab characters.
	const padStuff = (match, count) => '\t'.repeat(Number(count));
	// Feedback.
	console.log(
		`FUNCTION: "_on_message_arrived()":
		<1>String (glyphs): ${_m.payloadString}
		<2>String (hex): ${_m.payloadBytes}`
			.replace(/^[ \t]+/gm, '') // strip source indentation only, not spaces inside the text
			.replace(/<(\d+)>/g, padStuff)
	);
}
_on_message_arrived(data);
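Another way to keep the source indented is a small tagged template that removes leading whitespace from every line. The sketch below is a minimal version under stated assumptions: the tag name dedent is hypothetical (it is not a built-in), and it strips all leading spaces and tabs on each line, including any that come from an interpolated value placed at the start of a line.

```javascript
// Hypothetical tag function: joins the template parts, then strips
// leading spaces and tabs from every line of the result.
function dedent(strings, ...values) {
  const text = strings.reduce(
    (out, part, i) => out + part + (i < values.length ? String(values[i]) : ""),
    ""
  );
  return text.replace(/^[ \t]+/gm, "");
}

function _on_message_arrived(_m) {
  // The literal stays indented with the surrounding code.
  return dedent`FUNCTION: "_on_message_arrived()":
    String (glyphs): ${_m.payloadString}
    String (hex): ${_m.payloadBytes}`;
}

console.log(_on_message_arrived({ payloadString: "26.9", payloadBytes: "50,54,46,57" }));
```

If some lines need to keep a relative indent, a fuller implementation would remove only the common leading whitespace instead of all of it.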

How to replace string values with numeric inside a json object used in google visualization api column chart

I have this JSON string that I use for a Google Chart visualization. It needs to stay in this exact format, and I need to replace every value of "v" that is a number with its numeric value (the value without the quotes). I should probably use some JavaScript replace function, but I couldn't find a way to move around the JSON object. Here is an example JSON string that I need to modify:
{"cols":[
{"id":"r","label":"Reason","type":"string"},
{"id":"m","label":"Minutes","type":"number"}
],
"rows":[
{"c":[
{"v":"Flour - Blower","f":"Flour - Blower"},
{"v":"7","f":"7"}]},
{"c":[
{"v":"Whole Line - d","f":"Whole Line - d"},
{"v":"4","f":"4"}]},
{"c":[
{"v":"Flour - Pipework","f":"Flour - Pipework"},
{"v":"3","f":"3"}]},
{"c":[
{"v":"Horseshoe - Belt","f":"Horseshoe - Belt"},
{"v":"1","f":"1"}]}
],
"p":null
}
I should probably do something like:
var jsonStr = ...;
for (i in jsonStr.rows) {
for(j in jsonStr[i].c)
{
if (parseInt(jsonStr[i].c[j].v) != 'NaN') {
jsonStr.rows[i].c[j].v = parseInt(jsonStr.rows[i].c[j].v);
}
}
Since JSON is effectively a string, why not put the entire string through a global string.replace:
jsonStr = JSON.stringify(jsonStr);
jsonStr = jsonStr.replace(/"v":"(\d+)"/g, '"v":$1');
Well, the loop is close. It's not working because you can't check whether a string contains a number by comparing the result with the string 'NaN': parseInt returns the number NaN, and NaN != 'NaN' is always true.
This is because even NaN === NaN, famously, returns false.
I'd suggest that you use the isNaN function (which converts its argument with Number, not parseInt). Note also that the inner loop has to read jsonStr.rows[i].c, not jsonStr[i].c. So, something like this ought to work:
for (var i in jsonStr.rows) {
  for (var j in jsonStr.rows[i].c) {
    if (!isNaN(jsonStr.rows[i].c[j].v)) {
      jsonStr.rows[i].c[j].v = parseInt(jsonStr.rows[i].c[j].v, 10);
    }
  }
}
A function that returns the original string if it is not numeric, and a number otherwise:
function convertNumberToInteger(val) {
  if (isNaN(val)) {
    return val;
  } else {
    return parseInt(val, 10);
  }
}
Usage:
convertNumberToInteger("sasdfasdf");
Output: "sasdfasdf"
convertNumberToInteger("3");
Output: 3
And if you really want to parse it you can do a forEach on the JSON object
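A forEach-based traversal might look like the sketch below. The function name coerceNumericValues is hypothetical, and it assumes the JSON has already been parsed into an object with the rows[].c[].v shape shown in the question. Only "v" is converted; the formatted "f" strings are left alone.

```javascript
// Hypothetical helper: converts numeric "v" strings to numbers in place.
function coerceNumericValues(table) {
  table.rows.forEach((row) => {
    row.c.forEach((cell) => {
      if (typeof cell.v !== "string" || cell.v.trim() === "") return;
      const asNumber = Number(cell.v);
      // Non-numeric strings such as "Flour - Blower" become NaN and are skipped.
      if (!Number.isNaN(asNumber)) {
        cell.v = asNumber;
      }
    });
  });
  return table;
}

const table = JSON.parse(
  '{"rows":[{"c":[{"v":"Flour - Blower","f":"Flour - Blower"},{"v":"7","f":"7"}]}],"p":null}'
);
console.log(JSON.stringify(coerceNumericValues(table)));
```

Unlike parseInt, Number also keeps decimals ("2.5" becomes 2.5), which may or may not be what the chart needs.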

jQuery / JavaScript Parsing strings the proper way

Recently, I've been attempting to emulate a small language in jQuery and JavaScript, yet I've come across what I believe is an issue. I think that I may be parsing everything completely wrong.
In the code:
#name Testing
#inputs
#outputs
#persist
#trigger
print("Test")
The current way I am separating and parsing the string is by splitting all of the code into lines, and then reading through this lines array using searches and splits. For example, I would find the name using something like:
if (typeof lines[line] !== 'undefined') {
  if (lines[line].search('#name') == 0) {
    name = lines[line].split(' ')[1];
  }
}
But I think that I may be largely wrong on how I am handling parsing.
While reading through examples on how other people are handling parsing of code blocks like this, it appeared that people parsed the entire block, instead of splitting it into lines as I do. I suppose the question of the matter is, what is the proper and conventional way of parsing things like this, and how do you suggest I use it to parse something such as this?
In simple cases like this, regular expressions are your tool of choice:
matches = code.match(/#name\s+(\w+)/)
name = matches[1]
To parse "real" programming languages regexps are not powerful enough, you'll need a parser, either hand-written or automatically generated with a tool like PEG.
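For the directive block in the question, the regular-expression route could be extended to collect every #directive in one pass. This is only a sketch: parseDirectives is a hypothetical name, and it assumes each directive sits at the start of its own line, with the rest of the line as its value.

```javascript
// Hypothetical helper: collects "#key value" lines into an object.
function parseDirectives(code) {
  const directives = {};
  // With the m flag, ^ and $ match at every line boundary.
  for (const [, key, value] of code.matchAll(/^#(\w+)[ \t]*(.*)$/gm)) {
    directives[key] = value.trim();
  }
  return directives;
}

const code = [
  "#name Testing",
  "#inputs",
  "#outputs",
  "#persist",
  "#trigger",
  'print("Test")',
].join("\n");

console.log(parseDirectives(code));
```

Lines that do not start with # (such as the print call) are ignored here and would be handed to whatever parses the body of the language.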
A general approach to parsing that I often like to take is the following:
Loop through the complete block of text, character by character.
If you find a character that signals the start of a unit, call a specialized subfunction to parse the next characters.
Within each subfunction, call additional subfunctions if you find certain characters.
Return from each subfunction when you find a character that signals the end of the unit.
Here is a small example:
var text = "#func(arg1,arg2)"
function parse(text) {
  var i, max_i, ch, funcRes;
  for (i = 0, max_i = text.length; i < max_i; i++) {
    ch = text.charAt(i);
    if (ch === "#") {
      funcRes = parseFunction(text, i + 1);
      i = funcRes.index;
    }
  }
  console.log(funcRes);
}
function parseFunction(text, i) {
  var max_i, ch, name, argsRes;
  name = [];
  for (max_i = text.length; i < max_i; i++) {
    ch = text.charAt(i);
    if (ch === "(") {
      argsRes = parseArguments(text, i + 1);
      return {
        name: name.join(""),
        args: argsRes.arr,
        index: argsRes.index
      };
    }
    name.push(ch);
  }
}
function parseArguments(text, i) {
  var max_i, ch, args, arg;
  arg = [];
  args = [];
  for (max_i = text.length; i < max_i; i++) {
    ch = text.charAt(i);
    if (ch === ",") {
      args.push(arg.join(""));
      arg = [];
      continue;
    } else if (ch === ")") {
      args.push(arg.join(""));
      return {
        arr: args,
        index: i
      };
    }
    arg.push(ch);
  }
}
This example only parses function expressions that follow the syntax "#functionName(argumentName1, argumentName2, ...)". The general idea is to visit every character exactly once without having to save state flags like "hasSeenAtCharacter" or "hasSeenOpeningParentheses", which can get pretty messy when you parse large structures.
Please note that this is a very simplified example that omits all error handling, but I hope the general idea comes across. Note also that I'm not saying you should use this approach all the time. It's a very general approach that can be used in many scenarios, but it can also be combined with regular expressions, for instance, if that makes more sense for some part of your text than parsing each individual character.
And one last remark: you can save yourself the trouble if you put the specialized parsing function inside the main parsing function, so that all functions have access to the same variable i.
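That last remark can be sketched as follows: the subfunctions are nested inside the main function and share one cursor variable, so no index has to be passed in or returned. This is a reworking of the example above under the same simplified syntax, not a complete parser; the thrown errors are an addition for illustration.

```javascript
function parse(text) {
  let i = 0; // shared cursor, visible to every nested function

  function parseArguments() {
    const args = [];
    let arg = "";
    for (; i < text.length; i++) {
      const ch = text.charAt(i);
      if (ch === ",") {
        args.push(arg);
        arg = "";
      } else if (ch === ")") {
        args.push(arg);
        return args; // the cursor stays on ")"
      } else {
        arg += ch;
      }
    }
    throw new Error("Unterminated argument list");
  }

  function parseFunction() {
    let name = "";
    for (; i < text.length; i++) {
      const ch = text.charAt(i);
      if (ch === "(") {
        i++; // step past "("
        return { name, args: parseArguments() };
      }
      name += ch;
    }
    throw new Error("Missing '(' after function name");
  }

  const results = [];
  for (; i < text.length; i++) {
    if (text.charAt(i) === "#") {
      i++; // step past "#"
      results.push(parseFunction());
    }
  }
  return results;
}

console.log(parse("#func(arg1,arg2)"));
```

Because each loop continues from the shared cursor, every character is still visited exactly once.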
