Merging two JSONata expressions - javascript

I am using JSONata for performing JSON to JSON transformation.
For reasons specific to my use case, I want to merge two JSONata expressions.
As an example:
Parent Expression:
var script = `
{
"data":
{
"name" : data.payload.Name.(FirstName & ' ' & LastName),
"alias": data.payload.Name.(Salutation & ' ' & FirstName),
"active": data.payload.Status = 'New' ? true : false,
"signature": "Have good day ," & data.payload.Name.FirstName & "!"
}
}
`;
I also have a few simple assignment-style JSONata expressions, such as:
Expression 1 :
{
"source" : source
}
Expression 2 :
{
"data": {
"email" : data.payload.Email
}
}
I would like to add the two expressions above to the expression defined in script.
After adding them, I should get:
var script = `
{
"source": source,
"data":
{
"name" : data.payload.Name.(FirstName & ' ' & LastName),
"alias": data.payload.Name.(Salutation & ' ' & FirstName),
"active": data.payload.Status = 'New' ? true : false,
"signature": "Have good day ," & data.payload.Name.FirstName & "!",
"email": data.payload.Email
}
}
`;
How do I do this using JavaScript/JSONata?
Background and constraints:
Child expressions (Expression 1 and 2 in the example), which are to be added to the parent expression, will always be simple assignments like "a" : x.y.z or "b" : x.
A child expression's key may already be present in the parent expression. In that case, the child replaces the existing assignment.
I also want to delete some JSON paths from the parent expression (if they are present, of course), for example the path data.Email.
What I have done:
I tried to convert the JSONata script to JSON by putting the values in double quotes and encoding them with the escape() function.
Once I have JSON, I look for the path mentioned in the child expression (like data.Email):
If the path exists: replace its value.
If the path does not exist: create the path and assign the value.
If the path is supposed to be deleted: simply delete it.
Once I have processed the JSON,
I convert it back to a JSONata script by removing the quotes with a bunch of regexes and then decoding with unescape().
The problems with this approach are:
It is not reliable (regex matching and replacement is not foolproof).
I am not sure whether every JSONata expression (that does not declare any functions) can always be converted to valid JSON.

I think that your best bet might be to translate your expressions to the JSONata AST and then merge them into a new AST.
Here's a super simple example:
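const jsonata = require("jsonata");

// expr1 and expr2 are the two JSONata expression strings to combine
const ast1 = jsonata(expr1).ast();
const ast2 = jsonata(expr2).ast();
if (ast1.type !== "unary" || ast2.type !== "unary") throw new Error("Only support unary expressions");
const combinedAst = {
"type": "unary",
"value": "{",
// lhs is an array of [key, value] pairs; spread both arrays so all pairs sit at the same level
"lhs": [...ast1.lhs, ...ast2.lhs]
};
// TODO: Serialize the AST or inject it into jsonata()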
The problem is what to do with your new AST. In my case I also wrote a custom serializer to turn the AST back into a JSONata string, and evaluate that.
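The simple concatenation above does not cover two of the constraints in the question: replacing a key that already exists (or merging nested objects such as "data"), and deleting a path. Here is a rough sketch of how that could look on the AST. The helpers isObjectNode, mergeObjectAst and removePath are my own names, not part of JSONata, and the code assumes every key is a plain string literal node like in the ASTs shown in this answer. The sample ASTs are built by hand so the snippet runs without the library.

```javascript
// Hypothetical helper: true when a node is a JSONata object constructor `{ ... }`.
const isObjectNode = (node) =>
  Boolean(node) && node.type === "unary" && node.value === "{";

// Merge the key/value pairs of `child` into `parent`, returning a new AST.
// Assumes every key is a string literal node, as in the ASTs in this answer.
function mergeObjectAst(parent, child) {
  if (!isObjectNode(parent) || !isObjectNode(child)) {
    throw new Error("Only object constructor expressions are supported");
  }
  const pairs = [...parent.lhs];
  for (const [childKey, childValue] of child.lhs) {
    const index = pairs.findIndex(([key]) => key.value === childKey.value);
    if (index === -1) {
      pairs.push([childKey, childValue]); // new key: append
    } else if (isObjectNode(pairs[index][1]) && isObjectNode(childValue)) {
      // both sides are objects: merge their contents recursively
      pairs[index] = [pairs[index][0], mergeObjectAst(pairs[index][1], childValue)];
    } else {
      pairs[index] = [pairs[index][0], childValue]; // existing key: replace
    }
  }
  return { type: "unary", value: "{", lhs: pairs };
}

// Remove the assignment at `keyPath` (an array of key names), if it exists.
function removePath(ast, keyPath) {
  if (!isObjectNode(ast) || keyPath.length === 0) return ast;
  const [head, ...rest] = keyPath;
  const pairs = [];
  for (const [key, value] of ast.lhs) {
    if (key.value !== head) pairs.push([key, value]);
    else if (rest.length > 0) pairs.push([key, removePath(value, rest)]);
    // otherwise this is the pair to delete, so it is skipped
  }
  return { ...ast, lhs: pairs };
}

// Hand-built ASTs in the shape shown in this answer (positions omitted).
const str = (value) => ({ type: "string", value });
const pathNode = (...names) => ({
  type: "path",
  steps: names.map((value) => ({ type: "name", value })),
});
const obj = (...lhs) => ({ type: "unary", value: "{", lhs });

const parentAst = obj([str("data"), obj([str("email"), pathNode("old", "Email")])]);
const child1 = obj([str("source"), pathNode("source")]);
const child2 = obj([str("data"), obj([str("email"), pathNode("data", "payload", "Email")])]);

const merged = mergeObjectAst(mergeObjectAst(parentAst, child1), child2);
const pruned = removePath(merged, ["data", "email"]);
console.log(JSON.stringify(merged, null, 2));
```

Both functions return new objects instead of mutating their input, so the parent AST can be reused for several merges.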
ASTs in use
AST for Expression 1
{
"type": "unary",
"value": "{",
"position": 1,
"lhs": [
[
{
"value": "source",
"type": "string",
"position": 13
},
{
"type": "path",
"steps": [
{
"value": "source",
"type": "name",
"position": 22
}
]
}
]
]
}
AST for Expression 2
{
"type": "unary",
"value": "{",
"position": 1,
"lhs": [
[
{
"value": "data",
"type": "string",
"position": 10
},
{
"type": "unary",
"value": "{",
"position": 13,
"lhs": [
[
{
"value": "email",
"type": "string",
"position": 26
},
{
"type": "path",
"steps": [
{
"value": "data",
"type": "name",
"position": 33
},
{
"value": "payload",
"type": "name",
"position": 41
},
{
"value": "Email",
"type": "name",
"position": 47
}
]
}
]
]
}
]
]
}
Combined AST
{
"type": "unary",
"value": "{",
"position": 1,
"lhs": [
[
{
"value": "source",
"type": "string",
"position": 12
},
{
"type": "path",
"steps": [
{
"value": "source",
"type": "name",
"position": 20
}
]
}
],
[
{
"value": "data",
"type": "string",
"position": 30
},
{
"type": "unary",
"value": "{",
"position": 33,
"lhs": [
[
{
"value": "email",
"type": "string",
"position": 46
},
{
"type": "path",
"steps": [
{
"value": "data",
"type": "name",
"position": 53
},
{
"value": "payload",
"type": "name",
"position": 61
},
{
"value": "Email",
"type": "name",
"position": 67
}
]
}
]
]
}
]
]
}
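For the serialization step, a minimal sketch could look like the following. It only understands the node types that appear in the ASTs above (object constructors, string keys, paths and names), and it assumes field names are simple enough not to need backtick quoting. The function name serialize is my own, not a JSONata API, and a real serializer would have to cover many more node types.

```javascript
// Turn a small subset of the JSONata AST back into expression text.
// Only the node types that appear in the ASTs above are handled.
function serialize(node) {
  switch (node.type) {
    case "unary":
      if (node.value !== "{") {
        throw new Error(`Unsupported unary operator: ${node.value}`);
      }
      return (
        "{" +
        node.lhs
          .map(([key, value]) => `${serialize(key)}: ${serialize(value)}`)
          .join(", ") +
        "}"
      );
    case "string":
      // JSON-style quoting for string literals
      return JSON.stringify(node.value);
    case "path":
      return node.steps.map(serialize).join(".");
    case "name":
      // Assumption: simple field names that need no backtick quoting
      return node.value;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// The combined AST from above, with the position fields left out.
const combinedAst = {
  type: "unary",
  value: "{",
  lhs: [
    [
      { type: "string", value: "source" },
      { type: "path", steps: [{ type: "name", value: "source" }] },
    ],
    [
      { type: "string", value: "data" },
      {
        type: "unary",
        value: "{",
        lhs: [
          [
            { type: "string", value: "email" },
            {
              type: "path",
              steps: [
                { type: "name", value: "data" },
                { type: "name", value: "payload" },
                { type: "name", value: "Email" },
              ],
            },
          ],
        ],
      },
    ],
  ],
};

const text = serialize(combinedAst);
console.log(text); // {"source": source, "data": {"email": data.payload.Email}}
```

The resulting string can then be passed to jsonata() again like any other expression.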

Related

How do you delete an empty array from javascript?

For reference, I have zero JavaScript or other coding knowledge. I typically just hook up applications via iPaaS tools that don't require any coding. However, I found out I need to inject some JavaScript into the application in order to avoid an error message.
I have the below JSON record.
I need to get rid of the empty array (sorry... if it is not an array but an object? Like I said, no javascript knowledge).
In the below code essentially what I want is to completely delete this line, because there is nothing inside the brackets and it is causing errors:
"lineitemsdata": []
Full JSON record below for reference:
{
"id": "5399286500",
"properties": {
"state": "AB",
"website": null,
"zip": "T3B5Y9"
},
"createdAt": "2021-02-18T22:13:06.111Z",
"updatedAt": "2021-05-17T14:35:09.540Z",
"archived": false,
"associations": {
"deals": {
"results": [
{
"id": "5230410841",
"type": "company_to_deal"
}
]
}
},
"dealdata": [
{
"id": "5230410841",
"properties": {
"hs_lastmodifieddate": "2021-05-13T14:00:33.101Z",
"hs_object_id": "5230410841",
"hubspot_owner_id": "52200226"
},
"associations": {
"line items": {
"results": [
{
"id": "1468189759",
"type": "deal_to_line_item"
},
{
"id": "1468189760",
"type": "deal_to_line_item",
"lineitemsdata": []
}
]
}
}
}
],
"DealOwner": [
{
"id": "52200226",
"email": "email#email.com",
"firstName": "Bob"
}
],
"NetSuiteCustomerID": 1745
}
An item inside an object is called a property. If you (for some reason) have to include the property but don't want it to have any value, you can set its value to null or undefined.
I suspect I'm going to get criticized for this, but here is a quick and dirty way of removing this specific problem through string replacement. The 'right' way would be to break down your JSON into separate objects until you get to the level where the offending object lives, remove it, then rebuild it all back. For what it's worth, here's an alternative to that:
let json = {
"id": "5399286500",
"properties": {
"state": "AB",
"website": null,
"zip": "T3B5Y9"
},
"createdAt": "2021-02-18T22:13:06.111Z",
"updatedAt": "2021-05-17T14:35:09.540Z",
"archived": false,
"associations": {
"deals": {
"results": [{
"id": "5230410841",
"type": "company_to_deal"
}]
}
},
"dealdata": [{
"id": "5230410841",
"properties": {
"hs_lastmodifieddate": "2021-05-13T14:00:33.101Z",
"hs_object_id": "5230410841",
"hubspot_owner_id": "52200226"
},
"associations": {
"line items": {
"results": [{
"id": "1468189759",
"type": "deal_to_line_item"
},
{
"id": "1468189760",
"type": "deal_to_line_item",
"lineitemsdata": []
}
]
}
}
}],
"DealOwner": [{
"id": "52200226",
"email": "email#email.com",
"firstName": "Bob"
}],
"NetSuiteCustomerID": 1745
}
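// Serialize the record so the empty property can be removed as text
json = JSON.stringify(json);
let strstart = json.indexOf('"lineitemsdata":[]');
// Blank out the comma before the property (assumes it is not the first property in its object)
let commapos = json.lastIndexOf(',', strstart);
json = json.slice(0, commapos) + " " + json.slice(commapos + 1);
json = json.replace('"lineitemsdata":[]', '');
json = JSON.parse(json);
console.log(json);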
You can use this to strip any empty lineitemsdata arrays from your JSON,
assuming the reference to your record is json:
for (const dealIdx in json.dealdata) {
for (const resultIdx in json.dealdata[dealIdx].associations["line items"].results) {
const lineItemsData = json.dealdata[dealIdx].associations["line items"].results[resultIdx].lineitemsdata;
// Only delete the property when it exists and is empty
if (lineItemsData != undefined && lineItemsData.length === 0) {
delete json.dealdata[dealIdx].associations["line items"].results[resultIdx].lineitemsdata;
}
}
}
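If the empty array can show up at other nesting levels too, a recursive walk avoids hard-coding the path. This is only a sketch: removeEmptyArrays is a made-up helper name, and the record below is a trimmed copy of the one from the question.

```javascript
// Recursively delete properties named `key` whose value is an empty array.
// Mutates `value` in place and returns it.
function removeEmptyArrays(value, key) {
  if (Array.isArray(value)) {
    value.forEach((item) => removeEmptyArrays(item, key));
  } else if (value !== null && typeof value === "object") {
    for (const [name, child] of Object.entries(value)) {
      if (name === key && Array.isArray(child) && child.length === 0) {
        delete value[name];
      } else {
        removeEmptyArrays(child, key);
      }
    }
  }
  return value;
}

// Trimmed copy of the record from the question.
const record = {
  id: "5399286500",
  dealdata: [
    {
      id: "5230410841",
      associations: {
        "line items": {
          results: [
            { id: "1468189759", type: "deal_to_line_item" },
            { id: "1468189760", type: "deal_to_line_item", lineitemsdata: [] },
          ],
        },
      },
    },
  ],
};

removeEmptyArrays(record, "lineitemsdata");
console.log(JSON.stringify(record, null, 2));
```

Non-empty lineitemsdata arrays are left alone, and their contents are still searched.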

How to Restructure/Translate Json objects to satisfy any schema?

I have a (long) json schema and I already validate the json objects in my code based on that schema.
My problem/use-case is to translate/restructure an incoming json object to a new json object that satisfies my schema.
I realized however, that it isn't that easy to do, especially if you look for an "adaptive" solution.
Adaptive meaning that the code doesn't care about changes of the schema or the incoming object.
So my first approach was to create a class that contains a mapping of the incoming object to a valid object.
This means however that I have to write this map by hand and update it if I change the schema.
So is there a solution where I can insert any schema, any non-valid (based on schema) json object and get a valid object out?
Or at least a way to generate a valid JSON object template, into which the data of the incoming invalid JSON object can be inserted?
This way I could at least be independent from schema changes.
PS: I tried to solve this problem in an abstract way, but if needed I can provide schema and the invalid object.
PPS: My environment is node.js.
Edit: E.g.:
Schema
{
...//other Schema stuff (using draft-07)
"tokens": {
"$id": "/token",
"title": "Token",
"description": "A representation of a word in a Message with all the nlp Analysis results and infos",
"type": "array",
"minItems": 1,
"items": {
"$ref": "#/definitions/tokens/definitions/token"
},
"token": {
"$id": "#token",
"type": "object",
"required": [
"id",
"text"
],
"properties": {
"id": {
"description": "global (level:conversation) unique identifier to identify a token (cumulative)",
"type": "number",
"exclusiveMinimum": 0,
"uniqueItems": true
},
"isText": {
"description": "Describes if the token is a real semantic word or a whitespace or similar",
"type": "boolean"
},
"text": {
"description": "the actual content of the token",
"type": "string",
"pattern": "^(.+)$"
},
"word_index": {
"description": "Index of a word in a text",
"type": "number",
"exclusiveMinimum": 0,
"uniqueItems": true
},
"punctuation": {
"description": "The punctuation that needs to be rendered before/after that token. TODO: finish it",
"type": "string"
},
"characterOffsetBegin": {
"type": "integer",
"description": "The inclusive character index in the sentence where this token begins"
},
"characterOffsetEnd": {
"type": "integer",
"description": "The exclusive character index in the sentence where this token ends"
},
"lemma": {
"type": "string",
"pattern": "^(.+)$"
},
"pos": {
"type": "string",
"description": "part of speech tag",
"pattern": "^(.+)$"
},
}
},
"text_label": {
//This is where it gets difficult
"description": "Label of the text for the frontend. ",
"type": "object",
"required": [
"name",
"begin",
"end"
],
"additionalProperties": false,
"properties": {
"name": {
"description": "Name of the label",
"type": "string"
},
"begin": {
"description": "Token index where the label starts",
"type": "number"
},
"end": {
"description": "Token indes where the label ends",
"type": "number"
}
}
}
}
}
Object to translate:
{
"text": "Hello, my name is Somebody. ",
"sentences": [
{
"index": 0,
"tokens": [
{
"index": 1,
"word": "Hello",
"originalText": "Hello",
"characterOffsetBegin": 0,
"characterOffsetEnd": 5,
"before": "",
"after": "",
"pos": "UH",
"lemma": "hello"
},
{
"index": 2,
"word": ",",
"originalText": ",",
"characterOffsetBegin": 5,
"characterOffsetEnd": 6,
"before": "",
"after": " ",
"pos": ",",
"lemma": ","
},
{
"index": 3,
"word": "my",
"originalText": "my",
"characterOffsetBegin": 7,
"characterOffsetEnd": 9,
"before": " ",
"after": " ",
"pos": "PRP$",
"lemma": "my"
},
{
"index": 4,
"word": "name",
"originalText": "name",
"characterOffsetBegin": 10,
"characterOffsetEnd": 14,
"before": " ",
"after": " ",
"pos": "NN",
"lemma": "name"
},
{
"index": 5,
"word": "is",
"originalText": "is",
"characterOffsetBegin": 15,
"characterOffsetEnd": 17,
"before": " ",
"after": " ",
"pos": "VBZ",
"lemma": "be"
},
{
"index": 6,
"word": "Somebody",
"originalText": "Somebody",
"characterOffsetBegin": 18,
"characterOffsetEnd": 26,
"before": " ",
"after": "",
"pos": "NN",
"lemma": "somebody"
},
{
"index": 7,
"word": ".",
"originalText": ".",
"characterOffsetBegin": 26,
"characterOffsetEnd": 27,
"before": "",
"after": "",
"pos": ".",
"lemma": "."
}
],
"basicDependencies": []
}
]
}
Note that this is a CoreNLP analysis result, which can change depending on the annotators used.
Edit2: adding example of a correct object:
{ ...other parts of the object
tokens:[
{ id:1, text: 'Hello', isText: true, word_index: 0, characterOffsetBegin:0, characterOffsetEnd: 5, lemma: 'hello', pos:'UH', ...},
{...},
{...}
],
text_label:{
name: 'joy', begin: 0, end: 3
}
}
or another text_label:
text_label:{
name: 'PERSON', begin: 4, end: 5
}
Edit: Difference:
As you can see, some changes are subtle, like index->word_index; some are additional information, like id; but some need a bit more, like the text_label.
corenlp. It could also happen that I need to unify two objects, which is the case for the 'anger' text_label but isn't for the PERSON text_label.
So I'm looking for some kind of JSON object/schema manager where translations and validations become easy and don't need hard-coded mappings, or where generating those mappings is easy.
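For the "generate a template from the schema" part, here is a rough sketch of what such a generator might look like. templateFromSchema is a hypothetical helper, not something from a library. It only looks at type, properties and required, does not resolve $ref, and the placeholder values it produces will not necessarily satisfy constraints such as pattern or exclusiveMinimum, so the result is a skeleton to fill in rather than a guaranteed-valid object.

```javascript
// Build a skeleton object from a (sub)schema, using only `type`,
// `properties` and `required`. `$ref` is not resolved in this sketch.
function templateFromSchema(schema) {
  switch (schema.type) {
    case "object": {
      const result = {};
      const properties = schema.properties || {};
      for (const name of schema.required || []) {
        if (properties[name]) {
          result[name] = templateFromSchema(properties[name]);
        }
      }
      return result;
    }
    case "array":
      return [];
    case "string":
      return "";
    case "number":
    case "integer":
      return 0;
    case "boolean":
      return false;
    default:
      return null; // unknown or missing type
  }
}

// The text_label part of the schema from the question (descriptions omitted).
const textLabelSchema = {
  type: "object",
  required: ["name", "begin", "end"],
  additionalProperties: false,
  properties: {
    name: { type: "string" },
    begin: { type: "number" },
    end: { type: "number" },
  },
};

const template = templateFromSchema(textLabelSchema);
console.log(template); // { name: '', begin: 0, end: 0 }
```

Because the skeleton is derived from the schema at runtime, adding or removing a required property in the schema changes the template without touching this code. The data mapping itself (for example index to word_index) still has to come from somewhere else.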

JSON schema definitions or oneOf in array of objects?

I need to express an array of different objects in a schema. The array, called contents, can contain any number of elements, but they must be one of two types of object: One type of object represents a piece of text, the other type of object represents an image.
So far, I've not been able to find a way to enforce the validation correctly. It seems (nested) required inside a oneOf doesn't work, so I tried using definitions but that doesn't seem to fit.
I've tried a few online validators, but they seem happy for me to add illegal values in the item-2 value object. It's the value property that seems to be the biggest problem. Unfortunately, due to legacy issues, I'm stuck with this being an object in an array.
Is it possible to validate and enforce correct type/requirements for this object?
(This is the data, not a schema. Unfortunately, we also used the keyword type when we designed the original JSON layout!)
{
"uuid":"780aa509-6b40-4cfe-9620-74a9659bfd59",
"contents":
[
{
"name":"item-1",
"label":"My editable text Label",
"value":"This text is editable",
"type":"text"
},
{
"name":"item-2",
"label":"My editable image label",
"index":0,
"type":"image",
"value":
[
{
"name":"1542293213356.png",
"rect":[0,0,286,286]
}
]
}
],
"version":"2.0"
}
Well, I think this is it. Three things to watch: the array has to use items rather than contains (contains only requires at least one matching element, which is why editing the values wasn't always invalidating the object), each definition has to pin type to its own value so that exactly one branch of oneOf can match, and the image value is an array in the data, not an object.
{
"$schema": "http://json-schema.org/draft-06/schema#",
"type": "object",
"properties": {
"uuid": { "type": "string" },
"version": { "type": "string" },
"contents": {
"$ref": "#/definitions/contents"
}
},
"required": ["uuid", "version"],
"definitions": {
"image": {
"type": "object",
"properties": {
"name": { "type": "string" },
"label": { "type": "string" },
"type": { "const": "image" },
"value": { "type": "array" }
},
"required": ["name", "label", "type", "value"]
},
"text": {
"type": "object",
"properties": {
"name": { "type": "string" },
"label": { "type": "string" },
"type": { "const": "text" },
"value": { "type": "string" }
},
"required": ["name", "label", "type", "value"]
},
"contents": {
"type": "array",
"items": {
"oneOf": [
{ "$ref": "#/definitions/image" },
{ "$ref": "#/definitions/text" }
]
}
}
}
}
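To sanity-check what the schema is supposed to accept without relying on an online validator, the same rule can be written by hand: every element of contents must be exactly one of a text item or an image item. This is only an illustration in plain JavaScript, not a replacement for a schema validator. The function names are made up, and the shape of the image value entries (name plus rect) is taken from the sample data.

```javascript
const isString = (value) => typeof value === "string";
const isObject = (value) =>
  value !== null && typeof value === "object" && !Array.isArray(value);

// A text item has type "text" and a string value.
const isTextItem = (item) => item.type === "text" && isString(item.value);

// An image item has type "image" and an array of { name, rect } entries.
const isImageItem = (item) =>
  item.type === "image" &&
  Array.isArray(item.value) &&
  item.value.every(
    (entry) => isObject(entry) && isString(entry.name) && Array.isArray(entry.rect)
  );

// Returns a list of error messages; an empty list means the array is valid.
function validateContents(contents) {
  const errors = [];
  contents.forEach((item, index) => {
    const hasBase = isObject(item) && isString(item.name) && isString(item.label);
    const matches = hasBase
      ? [isTextItem(item), isImageItem(item)].filter(Boolean).length
      : 0;
    if (matches !== 1) {
      errors.push(`contents[${index}] is not exactly one of: text item, image item`);
    }
  });
  return errors;
}

// The contents array from the question.
const contents = [
  {
    name: "item-1",
    label: "My editable text Label",
    value: "This text is editable",
    type: "text",
  },
  {
    name: "item-2",
    label: "My editable image label",
    index: 0,
    type: "image",
    value: [{ name: "1542293213356.png", rect: [0, 0, 286, 286] }],
  },
];

console.log(validateContents(contents)); // []
```

Changing the value of item-2 to a string, or its type to something else, makes the function report an error for that index.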

What is the best way to replace text in json?

So I have a bunch of JSON data that contains a few fields. For example:
[{
"id": "XXX",
"version": 1,
"head": {
"text": "Main title",
"sub": {
"value": "next"
},
"place": "secondary"
},
"body": [{
"id": "XXX1",
"info": "three little birds",
"extended": {
"spl": {
"text": "song",
"type": {
"value": "a"
}
}
}
},
{
"id": "XXX2",
"info": [
"how are you?"
],
"extended": {
"spl": {
"text": "just",
"non-type": {
"value": "abc"
}
}
}
}
]
}]
What I'm trying to do is apply a kind of conversion table (from a different JSON file):
if a field has the value 'a', replace it with 'some other text..', etc.
I have a service for the JSON pipeline, so I guess this is the right place to do the replacement.
So for this example, I have the JSON above, and in my conversion table I have the following terms:
next: forward,
song: music,
a: option1,
just: from
etc...
What you are looking for can be achieved with templates. Replace the variable sections with specific markers that you can find and replace using external tools such as Perl or sed.
For example, you could have a template.json with something like this:
...
"type": {
"value": "##VALUE##"
}
...
Then, when you need the actual JSON, you could pass this through an intermediate script that replaces these markers with actual data.
cat template.json | sed -e 's/##VALUE##/my_value/' > target.json
Alternatively, with Perl:
cat template.json | perl -pe 's:\#\#VALUE\#\#:my_value:' > target.json
The best way is to parse it, replace the text in the object, and then stringify it.
The next best way is to use a regular expression.
In this example, I catch exceptions if path cannot be indexed, and use ['type'] instead of .type so it will scale to indexing 'non-type' if you wish.
const data = `[{
"id": "XXX",
"version": 1,
"head": {
"text": "Main title",
"sub": {
"value": "next"
},
"place": "secondary"
},
"body": [{
"id": "XXX1",
"info": "three little birds",
"extended": {
"spl": {
"text": "song",
"type": {
"value": "a"
}
}
}
},
{
"id": "XXX2",
"info": [
"how are you?"
],
"extended": {
"spl": {
"text": "just",
"non-type": {
"value": "abc"
}
}
}
}
]
}]
`
const o = JSON.parse(data)
o[0].body.forEach(b => {
try {
if (b.extended.spl['type'].value === 'a') {
b.extended.spl['type'].value = 'CHANGED'
}
} catch (e) {}
})
const newData = JSON.stringify(o, null, 2)
console.log(newData)
A string replace approach will work if you know and can rely on your source conforming, for example if the only "value" is the one inside "type":
const data = `[{
"id": "XXX",
"version": 1,
"head": {
"text": "Main title",
"sub": {
"value": "next"
},
"place": "secondary"
},
"body": [{
"id": "XXX1",
"info": "three little birds",
"extended": {
"spl": {
"text": "song",
"type": {
"value": "a"
}
}
}
},
{
"id": "XXX2",
"info": [
"how are you?"
],
"extended": {
"spl": {
"text": "just",
"non-type": {
"value": "abc"
}
}
}
}
]
}]
`
const newData = data.replace(/"value": "a"/g, '"value": "NEWVALUE"')
console.log(newData)
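To apply the whole conversion table from the question rather than a single hard-coded value, the parse, replace, stringify approach can be generalized with a recursive walk. This is a sketch under a couple of assumptions: only string values that match a table entry exactly are replaced, keys are left untouched, and replaceValues is a made-up helper name.

```javascript
// Conversion table from the question.
const conversions = new Map([
  ["next", "forward"],
  ["song", "music"],
  ["a", "option1"],
  ["just", "from"],
]);

// Return a copy of `value` in which every string that exactly matches
// a table key is replaced by the corresponding table value.
function replaceValues(value, table) {
  if (typeof value === "string") {
    return table.has(value) ? table.get(value) : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => replaceValues(item, table));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, child]) => [key, replaceValues(child, table)])
    );
  }
  return value; // numbers, booleans, null
}

// Trimmed copy of the data from the question.
const input = JSON.parse(`[{
  "id": "XXX",
  "head": { "text": "Main title", "sub": { "value": "next" } },
  "body": [
    { "id": "XXX1", "extended": { "spl": { "text": "song", "type": { "value": "a" } } } },
    { "id": "XXX2", "info": ["how are you?"],
      "extended": { "spl": { "text": "just", "non-type": { "value": "abc" } } } }
  ]
}]`);

const converted = replaceValues(input, conversions);
console.log(JSON.stringify(converted, null, 2));
```

Because the match is on the whole string, "abc" and "how are you?" are not touched even though they contain the letter a.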

Count Occurrences of Intrinsics Using Esprima

I am working on an application that allows user-submitted web-pages with embedded JavaScript code.
I would like to learn more about how my users are writing their JavaScript code, by building a map of common JavaScript intrinsics (built in objects) which shows the occurrence of each in user generated code.
Assume I have a JS file from a user's webpage:
Main.js
for (let i = 0; i < 10; i++) { myArrs.push(new Array()); }
let myObj = new Object();
I would like a script that can generate the following output:
Array: 10
Object: 1
I have attempted to do this using both regex and string traversal, but neither accounts for the case where an item is used inside a loop. Simple string traversal would yield:
Array: 1
Object: 1
Which is not correct.
Esprima seems to provide a solution to this, since it is a JS parser and can perform syntactic analysis. I have attempted to use esprima.parseScript(input, config, delegate) to generate a tree and then traverse it, but the output still does not take iterations into account.
Here is the output from my attempt at parsing this information using Esprima:
{
"type": "Program",
"body": [
{
"type": "ForStatement",
"init": {
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "i"
},
"init": {
"type": "Literal",
"value": 0,
"raw": "0"
}
}
],
"kind": "let"
},
"test": {
"type": "BinaryExpression",
"operator": "<",
"left": {
"type": "Identifier",
"name": "i"
},
"right": {
"type": "Literal",
"value": 10,
"raw": "10"
}
},
"update": {
"type": "UpdateExpression",
"operator": "++",
"argument": {
"type": "Identifier",
"name": "i"
},
"prefix": false
},
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ExpressionStatement",
"expression": {
"type": "CallExpression",
"callee": {
"type": "MemberExpression",
"computed": false,
"object": {
"type": "Identifier",
"name": "myArrs"
},
"property": {
"type": "Identifier",
"name": "push"
}
},
"arguments": [
{
"type": "NewExpression",
"callee": {
"type": "Identifier",
"name": "Array"
},
"arguments": []
}
]
}
}
]
}
},
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "myObj"
},
"init": {
"type": "NewExpression",
"callee": {
"type": "Identifier",
"name": "Object"
},
"arguments": []
}
}
],
"kind": "let"
}
],
"sourceType": "script"
}
I was not able to find this answer on SO already - it seems like this is a useful problem to solve but requires a bit of knowledge in lexical analysis tools such as Esprima.
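For what it's worth, a parser only sees the program's structure, so in general it cannot know how many times a loop body will run. One possible heuristic is to walk the tree and multiply counts inside loops whose bounds are numeric literals. The sketch below does that for loops of the exact form for (let i = A; i < B; i++). It works on the plain AST object (a condensed copy of the Esprima output above), so it does not call Esprima itself. countNew and literalTripCount are made-up names, and the heuristic ignores things like the loop variable being changed inside the body.

```javascript
// Trip count for loops of the exact form `for (let i = A; i < B; i++)` with
// numeric literals A and B; returns null for anything else.
function literalTripCount(node) {
  const { init, test, update } = node;
  if (!init || init.type !== "VariableDeclaration" || init.declarations.length !== 1) return null;
  const decl = init.declarations[0];
  if (!decl.init || decl.init.type !== "Literal" || typeof decl.init.value !== "number") return null;
  if (!test || test.type !== "BinaryExpression" || test.operator !== "<") return null;
  if (test.left.type !== "Identifier" || test.left.name !== decl.id.name) return null;
  if (test.right.type !== "Literal" || typeof test.right.value !== "number") return null;
  if (!update || update.type !== "UpdateExpression" || update.operator !== "++") return null;
  if (update.argument.type !== "Identifier" || update.argument.name !== decl.id.name) return null;
  return Math.max(0, Math.ceil(test.right.value - decl.init.value));
}

// Walk the tree and count `new X()` expressions, weighted by enclosing loops.
function countNew(node, multiplier = 1, counts = new Map()) {
  if (Array.isArray(node)) {
    node.forEach((child) => countNew(child, multiplier, counts));
    return counts;
  }
  if (node === null || typeof node !== "object") return counts;
  if (node.type === "NewExpression" && node.callee.type === "Identifier") {
    counts.set(node.callee.name, (counts.get(node.callee.name) || 0) + multiplier);
  }
  if (node.type === "ForStatement") {
    const trips = literalTripCount(node);
    // Simplification: init, test and update are counted once per loop;
    // loops with unknown bounds are treated as running once.
    countNew([node.init, node.test, node.update], multiplier, counts);
    countNew(node.body, multiplier * (trips === null ? 1 : trips), counts);
    return counts;
  }
  Object.values(node).forEach((child) => countNew(child, multiplier, counts));
  return counts;
}

// Condensed copy of the Esprima output for Main.js.
const id = (name) => ({ type: "Identifier", name });
const lit = (value) => ({ type: "Literal", value, raw: String(value) });
const newExpr = (name) => ({ type: "NewExpression", callee: id(name), arguments: [] });
const letDecl = (name, init) => ({
  type: "VariableDeclaration",
  kind: "let",
  declarations: [{ type: "VariableDeclarator", id: id(name), init }],
});
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "ForStatement",
      init: letDecl("i", lit(0)),
      test: { type: "BinaryExpression", operator: "<", left: id("i"), right: lit(10) },
      update: { type: "UpdateExpression", operator: "++", argument: id("i"), prefix: false },
      body: {
        type: "BlockStatement",
        body: [{
          type: "ExpressionStatement",
          expression: {
            type: "CallExpression",
            callee: { type: "MemberExpression", computed: false, object: id("myArrs"), property: id("push") },
            arguments: [newExpr("Array")],
          },
        }],
      },
    },
    letDecl("myObj", newExpr("Object")),
  ],
};

const counts = countNew(ast);
for (const [name, count] of counts) console.log(`${name}: ${count}`);
```

For the Main.js example this gives Array: 10 and Object: 1. Anything driven by runtime values (user input, array lengths, while loops) would need instrumentation of the running code instead.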
