Javascript: Escape " from json string - javascript

I've got a bit of a problem. I'm currently working on converting an old system to a newer version. In the old version, data requests were managed by Java applets, which had no problems with the "-char in the data.
Now, though, I'm fetching the data from a database and converting it to a JSON-string using XSLT, and then - with the prototype-function .evalJSON() - making the string into an object.
The XSL then structures the data like this (example):
{rowsets: [ { rows: [ { "ID":"xxx","OtherProperty":"yyy" } ] } ] }
Which in itself is OK.
Now, when there's some data in the database containing "-characters, evalJSON() fails, because they break the otherwise well-formed JSON string, like this:
{rowsets: [ { rows: [ { "ID":"xxx","OtherProperty":"yyy "more" zzz" } ] } ] }
Now, what I want to do is escape the 'unwanted' "-chars somehow, without having to write some kind of stored procedure to do it server-side for me.
I've tried to wrap my head around a RegEx, but I'm not very experienced in that area, and therefore I'm having a really hard time figuring it out.
If it's any help, the character sequences that are sure to be legal are:
[":"] and [","]
and the sequences that are likely to appear, and should be escaped, are:
[\s"], ["\s], [",], [".] (\s indicates a whitespace)
All kinds of help are appreciated, even if it's some SQL that makes it all a lot easier :)
Thanks in advance!

If you're in XSLT land then you're reinventing the wheel. Google up "badgerfish" and see here for a fairly solid implementation. You may of course have other problems getting in the way, but first things first.

I ended up taking a shortcut, using a string-replace template in my XSL to replace the "-characters with \" before returning the JSON to my javascript function, thus not needing to escape anything client-side.
(Used this: http://geekswithblogs.net/Erik/archive/2008/04/01/120915.aspx)
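If escaping client-side ever becomes necessary after all, a rough JavaScript sketch based on the question's assumption (quotes next to JSON structure characters are real delimiters, everything else is data) might look like the following; escapeInnerQuotes is just an illustrative name, and the approach will misfire if the data itself contains sequences such as "," or ":".

function escapeInnerQuotes(raw) {
    // Temporarily mark quotes that sit next to JSON structure characters,
    // escape every remaining quote, then restore the marked delimiters.
    var MARK = '\u0000';
    return raw
        .replace(/([{\[,:]\s*)"/g, '$1' + MARK)   // quote preceded by structure
        .replace(/"(\s*[:,\]}])/g, MARK + '$1')   // quote followed by structure
        .replace(/"/g, '\\"')                     // escape whatever is left
        .replace(new RegExp(MARK, 'g'), '"');     // restore real delimiters
}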

Related

JSON.parse() fails when a string contains the "null" substring... for example with italian words like "annullo" or "annullare"

Good morning everyone and thank you in advance for any suggestions. I have written a small web application to perform simple searches in a stamps database using PHP and JavaScript.
The server sends the whole database to the browser as JSON, and the queries are done client-side with JavaScript code.
The JSON has this structure:
{"ck":0,"db":[["string11","string12","string13"],["string21","string22","string23"], etc... } .
Until now the system has worked perfectly and over 1500 stamps could be shown.
Suddenly it stopped working and, in the browser's Javascript console, this error message appeared:
VM672:1 Uncaught SyntaxError: Expected ',' or ']' after array element in JSON at position 97506 at JSON.parse (<anonymous>) ...etc...
After a series of tests, by process of exclusion I came to discover that it was the word "annullo" in the last added record that generated the error.
I guess it could be the substring "null" causing problems, but I have no idea how to escape it.
A really strange thing is that, whilst JSON.parse() fails, the browser's JavaScript console, as well as other JSON validation tools, recognises the server's response as valid JSON.
Thanks for any help!
As @Andrea Soffiantini pointed out in a comment, this was the actual problem:
Client-side I had this command a few lines before parsing: data = data.replaceAll("null","\"\""); , where data is the JSON as a string, as received from the server.
This replaces the word "annullo" with "an""o", which is invalid JSON syntax.
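A safer pattern, if null values in the data really do need to become empty strings, is to parse first and then normalise the parsed structure instead of rewriting the raw JSON text. A minimal sketch, assuming the {"ck":...,"db":[[...],...]} shape shown in the question:

var parsed = JSON.parse(data);
// Replace real null values after parsing, so string content such as
// "annullo" is never touched.
parsed.db = parsed.db.map(function (row) {
    return row.map(function (value) {
        return value === null ? "" : value;
    });
});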
I think I understand the problem. All strings are enclosed in quotes.
{example: "example"}
In this case they must have put "annullo" in the data but forgot to insert the quotes. If it were null, not a string, it would look like this, without quotes:
json = {example: null}
However, as annullo is not a valid word in JavaScript and is not quoted as a string, it gives an error.
The correct thing to do is to fix the error at the source, transforming it to null; however, I believe you can also use this regular expression. But run tests to ensure that there is no undue substitution elsewhere.
json.replace(/.+["']: ?annull(o|are)[\n,}]$/g, matched => {
    return matched.replace(/annull(o|are)/, 'null')
})

Why is google natural language returning an incorrect beginOffset for analyzed string?

I am using the google-cloud/language API to make an #annotate call and analyze entities and sentiments from a CSV of comments which I have taken from various online resources.
To begin with, the string I am trying to analyze includes comment IDs, so I reformat this:
youtubez22htrtb1ymtdlka404t1aokg2kirffb53u3pya0,i just bot a Nostromo... ( ._.)
youtubez22oet0bruejcdf0gacdp431wxg3vb2zxoiov1da,Good Job Baby! MSI Propeller Blade Technology!
youtubez22ri11akra4tfku3acdp432h1qyzap3yy4ziifc,"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"
youtubez23ttpsyolztc1ep004t1aokg5zuyqxfqykgyjqs,"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's. Nice Alienware thing logo thing, btw"
youtubez12zjp5rupbcttvmy220ghf4ctqnerqwa04,"You know, If you actually made this. People would actually buy it."
So that it doesn't include any comment IDs:
I just bot a Nostromo... ( ._.)
Good Job Baby! MSI Propeller Blade Technology!\n"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"
"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's. Nice Alienware thing logo thing, btw"
"You know, If you actually made this. People would actually buy it."
After sending a request for google-cloud/language to #annotate the text, I receive a response which includes sentiments and magnitudes for various substrings. Each substring is also given a beginOffset value, which relates to the substring's index in the original string (the string in the request).
{ content: 'i just bot a Nostromo... ( ._.)\nGood Job Baby!',
beginOffset: 0 }
{ content: 'MSI Propeller Blade Technology!\n"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"\n"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's.',
beginOffset: 50 }
{ content: 'Nice Alienware thing logo thing, btw"\n"You know, If you actually made this.',
beginOffset: 462 }
My aim is then to locate the original comment in the original string, which should be simple enough. Something like (originalString[beginOffset]).....
This value is incorrect!
I am assuming that they do not include certain characters, but I have tried a multitude of regexes and nothing seems to work perfectly. Does anyone have any idea what might be causing the issue?
I know this is an old question but the problem seems to persist even today. I have recently encountered the same issue and resolved it by interpreting Google's offsets as "byte offsets" rather than string offsets in the chosen encoding. Works great. I hope it helps someone.
The following is some C# code, but anybody should be able to interpret it and recode it in their own favorite language. If we assume that text is actually the sentiment text being analyzed, then the following code transforms Google's offsets into correct offsets.
int TransformOffset(string text, int offset)
{
    return Encoding.UTF8.GetString(
            Encoding.UTF8.GetBytes(text),
            0,
            offset)
        .Length;
}
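For a JavaScript caller the same idea could look roughly like this; a sketch assuming Node.js (where Buffer gives access to the UTF-8 bytes), with transformOffset being an illustrative name:

function transformOffset(text, byteOffset) {
    // Interpret beginOffset as a byte offset into the UTF-8 encoding of the
    // text and convert it back into a JavaScript string index.
    var bytes = Buffer.from(text, 'utf8');
    return bytes.slice(0, byteOffset).toString('utf8').length;
}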
This has got something to do with encoding. Play around with one of the encodings or simply use one of the example approaches provided in their github repo:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/language/api/analyze.py
Key code block:
def get_native_encoding_type():
    """Returns the encoding type that matches Python's native strings."""
    if sys.maxunicode == 65535:
        return 'UTF16'
    else:
        return 'UTF32'
This worked for me. It was messing up characters like ' (that is \u2019 in unicode).
You should set the EncodingType on the request.
Example using Java client library and working with UTF-8 encoded texts:
Document doc = Document.newBuilder().setContent(dreamText).setType(Type.PLAIN_TEXT).build();
AnalyzeEntitiesRequest request = AnalyzeEntitiesRequest.newBuilder().setEncodingType(EncodingType.UTF8).setDocument(doc).build();
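Since the question uses the Node.js client, a rough equivalent there might look like the sketch below. UTF16 is chosen because JavaScript string indices count UTF-16 code units; the exact request shape should be checked against the client version in use.

const language = require('@google-cloud/language');

async function analyzeWithOffsets(text) {
    const client = new language.LanguageServiceClient();
    const [result] = await client.analyzeEntities({
        document: { content: text, type: 'PLAIN_TEXT' },
        // UTF16 makes beginOffset line up with JavaScript string indices.
        encodingType: 'UTF16',
    });
    return result;
}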

Find all string variables in javascript files

I am facing the problem that I have to translate a larger HTML and JavaScript project into several languages. The HTML content was no problem, but the numerous JavaScript files are problematic, since I was a bit lazy during the development process. For instance, if I needed a message text, I just added it at the position in question.
My approach now is to use the built-in file search (Eclipse) for every occurrence of " and ', which I go through line by line. This would be extremely time-consuming and errors are unavoidable.
Here are some examples that occur in the files:
var d = "Datum: " + d.getDate()+"."+(d.getMonth()+1)+"."+d.getFullYear();
showYesNoDialog("heading text","Are you sure?",function(){});
Sometimes I am mixing " and ', and sometimes a string goes over several lines:
var list="";
list+='<li data-role="list-divider">';
list+='Text To Translate';
list+='</li>';
Things I don't want to get are jQuery selectors, e.g.:
$("input[name^=checkbox]").each(function () {});
Do you see any time-saving method to get all of the strings that I would like to translate?
Regex? A Java interpreter? Grep?
I know this is a bit of an unusual question.
So any suggestion would be great.
Thanks in advance!
It is better to use some kind of lexical scanner that converts the code into tokens and then walk over the list of tokens (or the syntax tree). There are a number of such tools (I even created one of them myself - here you can find some examples: https://github.com/AlexAtNet/spelljs/blob/master/test/scan.js).
With it you can scan the JS file and just iterate over the tokens:
var scan = require('./..../scan.js');

scan(text).forEach(function (item) {
    if (item.str) {
        console.log(item);
    }
});
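As an alternative to a home-grown scanner, a widely used tokenizer such as acorn can do the same job. A minimal sketch, assuming Node.js and npm install acorn, that prints every string literal with its line number and leaves filtering out selector-like strings as a manual follow-up step:

const acorn = require('acorn');
const fs = require('fs');

const code = fs.readFileSync(process.argv[2], 'utf8');

// Walk the token stream and report every string literal with its line number.
for (const token of acorn.tokenizer(code, { ecmaVersion: 2020, locations: true })) {
    if (token.type.label === 'string') {
        console.log(token.loc.start.line + ': ' + JSON.stringify(token.value));
    }
}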

Using PEG Parser for BBCode Parsing: pegjs or ... what?

I have a bbcode -> html converter that responds to the change event in a textarea. Currently, this is done using a series of regular expressions, and there are a number of pathological cases. I've always wanted to sharpen the pencil on this grammar, but didn't want to get into yak shaving. But... recently I became aware of pegjs, which seems a pretty complete implementation of PEG parser generation. I have most of the grammar specified, but am now left wondering whether this is an appropriate use of a full-blown parser.
My specific questions are:
As my application relies on translating what I can to HTML and leaving the rest as raw text, does implementing bbcode using a parser that can fail on a syntax error make sense? For example: [url=/foo/bar]click me![/url] would certainly be expected to succeed once the closing bracket on the close tag is entered. But what would the user see in the meantime? With regex, I can just ignore non-matching stuff and treat it as normal text for preview purposes. With a formal grammar, I don't know whether this is possible because I am relying on creating the HTML from a parse tree and what fails a parse is ... what?
I am unclear where the transformations should be done. In a formal lex/yacc-based parser, I would have header files and symbols that denoted the node type. In pegjs, I get nested arrays with the node text. I can emit the translated code as an action of the pegjs generated parser, but it seems like a code smell to combine a parser and an emitter. However, if I call PEG.parse.parse(), I get back something like this:
[
  [
    "[",
    "img",
    "",
    [ "/", "f", "o", "o", "/", "b", "a", "r" ],
    "",
    "]"
  ],
  [
    "[/",
    "img",
    "]"
  ]
]
given a grammar like:
document
  = (open_tag / close_tag / new_line / text)*

open_tag
  = ("[" tag_name "="? tag_data? tag_attributes? "]")

close_tag
  = ("[/" tag_name "]")

text
  = non_tag+

non_tag
  = [^\n\[\]]

new_line
  = ("\r\n" / "\n")
I'm abbreviating the grammar, of course, but you get the idea. So, if you notice, there is no contextual information in the array of arrays that tells me what kind of node I have, and I'm left to do the string comparisons again even though the parser has already done this. I expect it's possible to define callbacks and use actions to run them during a parse, but there is scant information available on the Web about how one might do that.
Am I barking up the wrong tree? Should I fall back to regex scanning and forget about parsing?
Thanks
First question (grammar for incomplete texts):
You can add
incomplete_tag = ("[" tag_name "="? tag_data? tag_attributes?)
// the closing bracket is omitted ---^
after open_tag and change document to include an incomplete tag at the end. The trick is that you provide the parser with all needed productions to always parse, but the valid ones come first. You then can ignore incomplete_tag during the live preview.
Second question (how to include actions):
You write so-called actions after expressions. An action is JavaScript code enclosed in braces and is allowed after any pegjs expression, i.e. even in the middle of a production!
In practice, actions like { return result.join("") } are almost always necessary because pegjs splits matches into single characters. Complicated nested arrays can also be returned. Therefore I usually write helper functions in the pegjs initializer at the head of the grammar to keep the actions small. If you choose the function names carefully, the actions are self-documenting.
For an example see PEG for Python style indentation. Disclaimer: this is an answer of mine.
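To make that concrete, here is a small sketch of how an initializer plus an action could turn the flat character arrays from the question into labelled nodes; the helper name node is illustrative, and tag_name is assumed to yield an array of characters as in the original grammar:

{
  // pegjs initializer: helpers defined here are visible in every action
  function node(type, props) {
    props.type = type;
    return props;
  }
}

open_tag
  = "[" name:tag_name "]" {
      // join the character array produced by tag_name into a string
      return node("open_tag", { name: name.join("") });
    }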
Regarding your first question, I have to say that a live preview is going to be difficult. The problems you pointed out, namely that the parser won't understand that the input is "work in progress", are correct. Peg.js tells you at which point the error is, so maybe you could take that info, go back a few words and parse again, or if an end tag is missing, try adding it at the end.
The second part of your question is easier, but your grammar won't look so nice afterwards. Basically what you do is put callbacks on every rule, so for example:
text
  = text:non_tag+ {
      // we captured the text in an array and can manipulate it now
      return text.join("");
    }
At the moment you have to write these callbacks inline in your grammar. I'm doing a lot of this stuff at work right now, so I might make a pull request to peg.js to fix that. But I'm not sure when I'll find the time to do this.
Try something like this replacement rule. You're on the right track; you just have to tell it to assemble the results.
text
  = result:non_tag+ { return result.join(''); }

JavaScript multiline strings and templating?

I have been wondering if there is a way to define multiline strings in JavaScript like you can do in languages like PHP:
var str = "here
goes
another
line";
Apparently this breaks the parser. I found that placing a backslash \ in front of the line feed solves the problem:
var str = "here\
goes\
another\
line";
Or I could just close and reopen the string quotes again and again.
The reason why I am asking is that I am making JavaScript-based UI widgets that utilize HTML templates written in JavaScript. It is painful to type HTML in strings, especially if you need to open and close quotes all the time. What would be a good way to define HTML templates within JavaScript?
I am considering using separate HTML files and a compilation system to make everything easier, but the library is distributed among other developers so that HTML templates have to be easy to include for the developers.
No, that's basically what you have to do for multiline strings.
But why define the templates in JavaScript anyway? Why not just put them into a file and have an Ajax call load them into a variable when you need them?
For instance (using jQuery):
$.get('/path/to/template.html', function(data) {
    alert(data); // will alert the template code
});
@slebetman, thanks for the detailed example.
Quick comment on the substitute_strings function.
I had to revise
str.replace(n,substitutions[n]);
to be
str = str.replace(n,substitutions[n]);
to get it to work. (jQuery version 1.5? - it is pure JavaScript though.)
Also, when I had the situation below in my template:
$CONTENT$ repeated twice $CONTENT$ like this
I had to do additional processing to get it to work.
str = str.replace(new RegExp(n, 'g'), substitutions[n]);
And I had to refrain from using $ (a regex special char) as the delimiter and used # instead.
Thought I would share my findings.
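Putting those findings together, a revised substitute_strings could look like this sketch: reassign the result of replace(), use a global regex so repeated placeholders are all substituted, and pick placeholder delimiters (such as #) that are not regex metacharacters.

function substitute_strings (str, substitutions) {
    for (var n in substitutions) {
        // placeholder names must not contain regex special characters
        str = str.replace(new RegExp(n, 'g'), substitutions[n]);
    }
    return str;
}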
There are several templating systems in JavaScript. However, my personal favorite is one I developed myself using Ajax to fetch XML templates. The templates are XML files, which makes it easy to embed HTML cleanly, and a template looks something like this:
<title>This is optional</title>
<body><![CDATA[
HTML content goes here, the CDATA block prevents XML errors
when using non-xhtml html.
<div id="more">
$CONTENT$ may be substituted using replace() before being
inserted into $DOCUMENT$.
</div>
]]></body>
<script><![CDATA[
/* javascript code to be evaled after template
* is inserted into document. This is to get around
* the fact that this templating system does not
* have its own turing complete programming language.
* Here's an example use:
*/
if ($HIDE_MORE$) {
document.getElementById('more').display = 'none';
}
]]></script>
And the JavaScript code to process the template goes something like this:
function insertTemplate (url_to_template, insertion_point, substitutions) {
    // Ajax call depends on the library you're using, this is my own style:
    ajax(url_to_template, function (request) {
        var xml = request.responseXML;
        var title = xml.getElementsByTagName('title');
        if (title) {
            insertion_point.innerHTML += substitute_strings(title[0], substitutions);
        }
        var body = xml.getElementsByTagName('body');
        if (body) {
            insertion_point.innerHTML += substitute_strings(body[0], substitutions);
        }
        var script = xml.getElementsByTagName('script');
        if (script) {
            eval(substitute_strings(script[0], substitutions));
        }
    });
}

function substitute_strings (str, substitutions) {
    for (var n in substitutions) {
        str.replace(n, substitutions[n]);
    }
    return str;
}
The way to call the template would be:
insertTemplate('http://path.to.my.template', myDiv, {
    '$CONTENT$' : "The template's content",
    '$DOCUMENT$' : "the document",
    '$HIDE_MORE$' : 0
});
The $ sign for substituted strings is merely a convention; you may use % or # or whatever delimiter you prefer. It's just there to make the part to be substituted unambiguous.
One big advantage of doing substitutions on the JavaScript side instead of processing the template server-side is that the templates can be plain static files. The advantage of that (other than not having to write server-side code) is that you can then set the caching policy for the template to be very aggressive, so that the browser only needs to fetch the template the first time you load it. Subsequent use of the template would come from cache and would be very fast.
Also, this is a very simple example of the implementation to illustrate the mechanism. It's not what I'm using. You can modify this further to do things like multiple substitution, better handling of the script block, handling multiple content blocks by using a for loop instead of just the first element returned, properly handling HTML entities, etc.
The reason I really like this is that the HTML is simply HTML in a plain text file. This avoids quoting hell and the horrible string concatenation performance issues that you'll usually find if you directly embed HTML strings in JavaScript.
I think I found a solution I like.
I will store templates in files and fetch them using Ajax. This is for the development stage only. For the production stage, the developer has to run a compiler once that compiles all templates together with the source files. It also compiles the JavaScript and CSS to be more compact and combines them into a single file.
The biggest problem now is how to educate other developers to do that. I need to build it so that it is easy to do and easy to understand why and what they are doing.
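A minimal sketch of such a build step, assuming a hypothetical templates/ directory and output file name, could simply inline every template into one JavaScript file:

var fs = require('fs');
var path = require('path');

var dir = 'templates';   // hypothetical source directory
var out = {};

fs.readdirSync(dir).forEach(function (file) {
    // key each template by its file name without the extension
    out[path.basename(file, path.extname(file))] =
        fs.readFileSync(path.join(dir, file), 'utf8');
});

fs.writeFileSync('templates.js',
    'var TEMPLATES = ' + JSON.stringify(out, null, 2) + ';\n');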
You could also use \n to generate newlines. The HTML would, however, be on a single line and difficult to edit. But if you generate the JS using PHP or something, it might be an alternative.
