How to parse a JavaScript code/array with c# - javascript

I'm making a web request for a JavaScript file. The response is a JavaScript snippet that I'm trying to parse in C#.
The snippet looks like this:
sDt[1647110]=['SVK U19 A','D43A71','Jupie Podlavice Badin(U19)','TJ Straza(U19)','','',' / '
,'','',114745,114746,1,'',0,0,0,1012,1,'','',''];sDt[1647108]=['SVK U19 A','D43A71','Kysucke Nove Mesto(U19)',
'MFK Lokomotiva Zvolen(U19)','','',' / ','','',114741,114742,1,'',0,0,0,1012,1,'','',''];
sDt[1647109]=['SVK U19 A', /* A lot of more of that kind followed by */ ;WLID[1623901]=1;
WLID[1623902]=1;WLID[1623903]=1;WLID[1637686]=1;
WLID[1637692]=1;WLID[1637687]=1;WLID[1637688]=1;WLID[1637685]= /* ending with */
var ORD = [1647110,1647108,1647109,1647133,1645669,1647122,1626152,1647251,1646643,
1647130,1646685,1 ... ];
Obviously this isn't a pure JSON array. Now I wonder how to parse it most efficiently. I started by doing it by hand, using String.Split and so on, but that is slow and unfortunately not very stable.
The part after each sDt[Identifier]= is an array that I could parse with Json.Net, but I also need the identifier. Everything else, like WLID or var ORD, I can ignore.
Does anyone have an idea how to do this efficiently?
Thanks in advance

Without any other information, you have to go through the whole response token by token. There is no way around that.
Why don't you just send JSON?
To parse it, I would do the following:
Go through the whole response.
If you come across a '[', first check that you are not inside a string (for example, set a flag when you reach an opening quote, which is ' in your sample, and clear it at the matching closing quote).
If you are not inside a string, the tokens that follow are either the identifier or the content. You can easily check which.
If it is a number, it is your identifier, which runs until you reach "]" (again, provided you are not inside a string).
Otherwise it is the content, which you can now parse with Json.Net: remember the indices of the first "[" and the following "]" and pass the substring between them to Json.Net.
If you come across a ";" and you are not inside a string, make sure that you skip the WLID and ORD parts.
The whole operation takes O(n * m), with n = number of tokens and m = length of the longest content string.
If you parse the content yourself instead of letting Json.Net do it, you could of course narrow that down to O(n).
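The steps above can be sketched as a small scanner. This is only an illustration, written in JavaScript to match the other snippets on this page; the same loop carries over to C#. The names extractSdt and parseArrayBody are made up for this example, and the sketch assumes that the arrays are not nested and contain only single-quoted strings and plain numbers, as in the sample.

```javascript
// Parse the text between '[' and ']' into an array of strings and numbers.
// Assumption: elements are single-quoted strings or plain numbers.
function parseArrayBody(body) {
  const values = [];
  let i = 0;
  while (i < body.length) {
    const ch = body[i];
    if (ch === "'") {
      let text = "";
      i++;
      while (i < body.length && body[i] !== "'") {
        // A backslash makes the next character literal (\' becomes ').
        if (body[i] === "\\" && i + 1 < body.length) i++;
        text += body[i];
        i++;
      }
      i++; // step over the closing quote
      values.push(text);
    } else if (/[\d.+-]/.test(ch)) {
      let j = i;
      while (j < body.length && /[\d.eE+-]/.test(body[j])) j++;
      values.push(Number(body.slice(i, j)));
      i = j;
    } else {
      i++; // commas, spaces and line breaks between elements
    }
  }
  return values;
}

// Collect every sDt[<id>]=[...] assignment; WLID and ORD are never matched.
function extractSdt(source) {
  const entries = [];
  const marker = /sDt\[(\d+)\]=\[/g;
  let match;
  while ((match = marker.exec(source)) !== null) {
    const start = marker.lastIndex - 1; // index of the opening '['
    let inString = false;
    let end = -1;
    for (let i = start + 1; i < source.length; i++) {
      const ch = source[i];
      if (inString) {
        if (ch === "\\") i++; // skip the escaped character
        else if (ch === "'") inString = false;
      } else if (ch === "'") {
        inString = true;
      } else if (ch === "]") {
        end = i; // first ']' that is not inside a string
        break;
      }
    }
    if (end === -1) break; // truncated input: stop scanning
    entries.push({
      id: Number(match[1]),
      values: parseArrayBody(source.slice(start + 1, end)),
    });
    marker.lastIndex = end + 1; // continue after this array
  }
  return entries;
}
```

Each entry keeps the identifier next to its parsed values, so the WLID and ORD parts need no special handling: they simply never match the sDt pattern.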

Related

How can I replace some calls to JavaScript's eval() with Ext.decode()?

We are trying to get rid of all of our eval() calls in our JavaScript. Unfortunately, I am not much of a JavaScript programmer, and I need some help.
Many of our eval() calls operate on strings, outputs from a web service, that are very JSON-like, for example, we might eval the following string:
ClassMetaData['Asset$Flex'] = {
fields: {
}
,label: 'Flex Fields'
};
I've seen various suggestions on the Internet suggesting Ext.decode(). The documentation for it says - "Decodes (parses) a JSON string to an object. If the JSON is invalid, this function throws a SyntaxError unless the safe option is set." The string that I am supplying as an argument isn't legitimate JSON as I understand it (the field names aren't quoted), but Ext.decode() nearly works for me anyway. If I decode the above string, I get an error (why?) - "Uncaught SyntaxError: Unexpected token ;". However, if I remove the trailing semi-colon, and decode, everything seems to be fine.
I am using the following code to determine whether the decode call and the eval call do the same thing:
var evaled = eval(inputString);
var decoded = Ext.decode(inputString.replace(";", "")); // remove trailing ";", if any
console.log("Equal? - " + (JSON.stringify(decoded) == JSON.stringify(evaled)));
Unfortunately, this is not a very good solution. For example, some of the input strings to eval are fairly complex. They may have all sorts of embedded characters - semicolons, HTML character encodings, etc. Decode may complain about some other syntax problem, besides semicolons at the end, and I haven't found a good way to determine where the problem is that decode objects to. (It doesn't say "illegal character in position 67", for example.)
My questions:
Could we, with a small amount of work, create a generic solution
using decode?
Is there an easy way to convert our JSON-like input
into true JSON?
Is there a better way of comparing the results of
eval and decode than JSON.stringify(decoded) == JSON.stringify(evaled)?
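A note on the trailing-semicolon step in the comparison code: when replace() is given a string pattern, it replaces only the first occurrence, wherever it is, so replace(";", "") can remove a semicolon from the middle of the input. A minimal sketch that only strips a semicolon at the very end (the helper name stripTrailingSemicolon is made up for this example):

```javascript
// Remove one trailing semicolon plus any whitespace around it, nothing else.
function stripTrailingSemicolon(text) {
  return text.replace(/\s*;\s*$/, "");
}

var input = "ClassMetaData['Asset$Flex'] = { label: 'a;b' };\n";
console.log(stripTrailingSemicolon(input));
// The semicolon inside 'a;b' is left alone; only the final one is removed.
```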

How to split a string with space delimiter in google apps script

I want to get the first space delimited string out of a longer string with multiple words in it. Lots of examples, all of which do something like .split(" "). Google Apps script tells me that split can't take null as a parameter. I tried setting a variable to " ", even tried String.fromCharCode(32), but GAS tells me that's null.
Search, indexOf have the same issue -- GAS tells me it can't do those with a null argument. How do I tell GAS that a single space is not a null?
Yes, this was a case of the string being null in some cases. I was using the file method getDescription. After a lot of Logger output I found the problem, and I handled those cases with this check:
var desctext = file.getDescription();
if ( desctext == null) { desctext = " ; ; ; "};
It's been many years since I've done anything with regex, and it was confusing then. Any suggestions on a good site for learning it?
I also changed my delimiter to a semicolon, even though it makes the user input more tedious.
The background is that I'm writing a script to work with a photo inventory where I want to keep the important information in the photo's EXIF. Not sure if Google is actually storing it there, but in any case the only fields available aside from the date and filename is the description. So I'm having the users type in coded filenames and descriptions with semicolon delimiters to indicate things like category, item name, status and value.
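A minimal sketch of how such a description could be split safely, assuming the field order category; item name; status; value. The function name parseDescription and the field names are made up for this example. It treats a null description as empty, so split is never called on null:

```javascript
function parseDescription(description) {
  // getDescription() can return null, so fall back to an empty string.
  var parts = (description || "").split(";");
  // Return the trimmed field at the given position, or "" if it is missing.
  function field(index) {
    return (parts[index] || "").trim();
  }
  return {
    category: field(0),
    itemName: field(1),
    status: field(2),
    value: field(3)
  };
}

var info = parseDescription("tools; hammer ; in use; 12");
console.log(info.category + " / " + info.itemName);
```

With this fallback the " ; ; ; " placeholder is no longer needed, because missing fields simply come back as empty strings.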

How to encode and decode all special characters in javascript or jquery?

I want to be able to encode and decode all the following characters using javascript or jquery...
~!@#$%^&*()_+|}{:"?><,./';[]\=-`
I tried to encode them using this...
var cT = encodeURI(oM); // oM holds the special characters
cT = cT.replace(/[!"#$%&'()*+,.\/:;<=>?@[\\\]^`{|}~]/g, "\\\\$&");
Which does encode them, or escape them rather, but then I am trying to do the reverse with this...
decodeURIComponent(data.convo.replace(/\+/g, ' '));
But, it's not coming out in any way desired.
I've built a chat plugin for jquery, but the script crashes if someone enters a special character. I want the special characters to get encoded, then when they get pulled out of the data base, they should be decoded. I tried using urldecode in PHP before the data is returned to the ajax request but it's coming out horribly wrong.
I would think that there exists some function to encode and decode all special characters.
Oh, one caveat for this is that I'm wrapping each message with html elements, so I think the decoding needs to be done server side, before the message is wrapped, or be able to know when to ignore valid html tags and decode the other characters that are just what the user wanted to type.
Am I encoding/escaping them wrong to begin with?
Is that why the results are horrible?
This is pretty simple in javascript
// Note that the " and the \ inside the string are escaped, so both survive as literal characters
var exampleInput = "Hello there h4x0r ~!@#$%^&*()_+|}{:\"?><,./';[]\\=-`";
var encodedInput = encodeURI(exampleInput);
var decodedInput = decodeURI(encodedInput);
console.log(exampleInput);
console.log(encodedInput);
console.log(decodedInput);
Just encode and decode the input. If something else is breaking in your script, it means the input is being processed somewhere without being properly escaped. It's hard to give a more precise answer because, as you can see, URI encoding and decoding by itself does not crash anything. Only improper processing of the content would cause issues.
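One detail worth knowing here: encodeURI leaves the reserved URI characters (; / ? : @ & = + $ , #) untouched, while encodeURIComponent escapes them as well. If the value is sent as a single request parameter, a sketch along these lines may be closer to what is wanted (the variable names are only illustrative):

```javascript
var input = "~!@#$%^&*()_+|}{:\"?><,./';[]\\=-`";

// encodeURIComponent also escapes &, =, +, # and the other reserved characters.
var encoded = encodeURIComponent(input);
var decoded = decodeURIComponent(encoded);

console.log(encoded);
console.log(decoded === input);
```

The round trip gives back the original string, so nothing has to be stripped or replaced by hand before decoding.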
When you output the content in HTML, you should encode the HTML entities.
See the thread "Encode html entities in javascript" if you need to encode content for safe display inside HTML.
Additional references on how HTML entities work: W3 Schools - HTML Entities and W3 Schools - HTML Symbols.
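As a rough illustration of the HTML side, here is a minimal sketch that escapes the five characters with special meaning in HTML text and attribute values. The function name escapeHtml is made up for this example; it is not a built-in:

```javascript
function escapeHtml(text) {
  var entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;"
  };
  // Replace each special character with its entity in a single pass.
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}

console.log(escapeHtml("<b>\"a\" & 'b'</b>"));
// prints: &lt;b&gt;&quot;a&quot; &amp; &#39;b&#39;&lt;/b&gt;
```

Escaping the user's text first and wrapping it in the message markup afterwards keeps the wrapper tags intact while the typed characters show up literally.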

Finding text strings in JavaScript

I have a large valid JavaScript file (utf-8), from which I need to extract all text strings automatically.
For simplicity, the file doesn't contain any comment blocks in it, only valid ES6 JavaScript code.
Once I find an occurrence of ' or " or `, I'm supposed to scan for the end of the text block, and this is where I got stuck, given all the possible variations, like "'", '"', "\'", '\"', '", `\``, etc.
Is there a known and/or reusable algorithm for detecting the end of a valid ES6 JavaScript text block?
UPDATE-1: My JavaScript file isn't just large, I also have to process it as a stream, in chunks, so Regex is absolutely not usable. I didn't want to complicate my question, mentioning joint chunks of code, I will figure that out myself, If I have an algorithm that can work for a single piece of code that's in memory.
UPDATE-2: I got this working initially, thanks to the advice given here, but then I got stuck again because of regular expressions.
Examples of Regular Expressions that break any of the text detection techniques suggested so far:
/'/
/"/
/\`/
Having studied the matter closer, by reading this: How does JavaScript detect regular expressions?, I'm afraid that detecting regular expressions in JavaScript is a whole new ball game, worth a separate question, or else it gets too complicated. But I appreciate very much if somebody can point me in the right direction with this issue...
UPDATE-3: After much research I found, with regret, that I cannot come up with an algorithm that would work in my case, because the presence of regular expressions makes the task far more complicated than I initially thought. According to the following: When parsing Javascript, what determines the meaning of a slash?, determining the beginning and end of regular expressions in JavaScript is one of the most complex and convoluted tasks. Without it, we cannot tell whether the symbols ', " and ` open a text block or sit inside a regular expression.
The only way to parse JavaScript is with a JavaScript parser. Even if you were able to use regular expressions, at the end of the day they are not powerful enough to do what you are trying to do here.
You could either use one of several existing parsers, which are very easy to use, or write your own, simplified to focus on the string extraction problem. I can hardly imagine that you want to write your own parser, even a simplified one. You will spend much more time writing and maintaining it than you might think.
For instance, an existing parser will handle something like the following without breaking a sweat.
`foo${"bar"+`baz`}`
The obvious candidates for parsers to use are esprima and babel.
By the way, what are you planning to do with these strings once you extract them?
If you only need an approximate answer, or if you want to get the string literals exactly as they appear in the source code, then a regular expression can do the job.
Given the string literal "\n", do you expect a single-character string containing a newline or the two characters backslash and n?
In the former case you need to interpret escape sequences exactly like a JavaScript interpreter does. What you need is a lexer for JavaScript, and many people have already programmed this piece of code.
In the latter case the regular expression has to recognize escape sequences like \x40 and \u2026, so even in that case you should copy the code from an existing JavaScript lexer.
See https://github.com/douglascrockford/JSLint/blob/master/jslint.js, function tokenize.
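For the "literals exactly as they appear in the source" case, a small hand-written scanner is enough to show the idea of skipping escape sequences. This is only a sketch, not a real lexer: the function name findStringLiterals is made up, and it deliberately ignores regular expression literals, comments and ${...} nesting inside template strings, which is exactly where the approach breaks down.

```javascript
// Return every quoted literal, quotes included, exactly as written in the source.
function findStringLiterals(source) {
  var literals = [];
  var i = 0;
  while (i < source.length) {
    var quote = source[i];
    if (quote === "'" || quote === '"' || quote === "`") {
      var j = i + 1;
      while (j < source.length && source[j] !== quote) {
        if (source[j] === "\\") j++; // skip the character after a backslash
        j++;
      }
      if (j >= source.length) break; // unterminated literal: stop
      literals.push(source.slice(i, j + 1));
      i = j + 1;
    } else {
      i++;
    }
  }
  return literals;
}

console.log(findStringLiterals(String.raw`a = "'"; b = '"'; c = "\'"; d = '\"';`));
```

Because a backslash always consumes the following character, an escaped quote never ends the literal, which covers the "\'" and '\"' variations from the question.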
Try the code below. It is a naive scan that ignores escape sequences, regular expressions and ${...} expressions:
var txt = "var z,b \n;z=10;\n b='321`1123`321321';\n c='321`321`312`3123`';";
function fetchStrings(txt) {
  var result = [];
  for (var i = 0; i < txt.length; i++) {
    // Characters that can open a string
    if (txt[i] == "'" || txt[i] == '"' || txt[i] == "`") {
      // Find the next identical quote; stop if the string is never closed
      var end = txt.indexOf(txt[i], i + 1);
      if (end === -1) break;
      // Collect the text between the two quotes
      result.push(txt.slice(i + 1, end));
      // Jump to the closing quote; the loop's i++ then moves past it
      i = end;
    }
  }
  return result;
}
console.log(fetchStrings(txt));

Javascript eval fails to evaluate json when escape characters are involved

I am using rhino javascript engine to evaluate the json. The Json structure is as following :
{"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\"Id\":\"1\",\"Name\":\"Daugherty\",\"ContactName\":\"May C\",\"ContactEmail\":\"may.c#gamil.com\",\"Total\":25,\"Phone\":\"111-111-1111\",\"Type\":\"Daily\",\"Notes\":[{\"Comments\":\"One\",\"Date\":\"2014-11-27T00:00:00.000\"},{\"Comments\":\"Two\",\"Date\":\"2014-11-28T00:00:00.000\"}],\"ImportComplete\":true,\"RunComplete\":true,\"CompleteDate\":\"2014-07-31T00:00:00.000\",\"Amount\":2400.00,\"ProcessingComplete\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"}
Since I am using Rhino js engine I can't use JSON.parse and JSON.stringify.
As you can see, the JSON has embedded JSON. I am getting it from a .NET Web API, which adds the escape character '\'. I am trying to replace that escape character in JavaScript, but without success.
Is there any way in JavaScript to replace that escape character and use eval() to evaluate the JSON?
Here's the code that I am trying
var json = '{"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\"Id\":\"1\",\"Name\":\"Daugherty\",\"ContactName\":\"May C\",\"ContactEmail\":\"may.c#gamil.com\",\"Total\":25,\"Phone\":\"111-111-1111\",\"Type\":\"Daily\",\"Notes\":[{\"Comments\":\"One\",\"Date\":\"2014-11-27T00:00:00.000\"},{\"Comments\":\"Two\",\"Date\":\"2014-11-28T00:00:00.000\"}],\"ImportComplete\":true,\"RunComplete\":true,\"CompleteDate\":\"2014-07-31T00:00:00.000\",\"Amount\":2400.00,\"ProcessingComplete\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"}';
var find = '\"';
var regex = new RegExp(find,'g');
var inj = json.replace(regex,'"');
var pl = eval('(' + inj +')');
confusing backslashes
The problem comes from how escape characters behave when you are more than one level of "string" deep. A single backslash is fine for one level, i.e.:
"It is no coincidence that in no known language does the " +
"phrase \"As pretty as an Airport\" appear.";
If you take that and then wrap it in outer quotes:
'"It is no coincidence that in no known language does the "' +
'"phrase \"As pretty as an Airport\" appear."';
The backslashes (if supported as escape characters by the system parsing the string) work for the outer-most wrapping quote, not any of the inner quotes/strings as they were before. This means once the js engine has parsed the string, internally the string will be.
'"It is no coincidence that in no known language does the phrase "As pretty as an Airport" appear."';
Which makes it impossible to tell the difference between the " and the \" from the original string. In order to get around this, you need to escape the backslashes in the original string, before you wrap it. This has the result of one level of escaping being used by the JavaScript engine, but still leaving another level remaining within the string. e.g.
'"It is no coincidence that in no known language does the "' +
'"phrase \\"As pretty as an Airport\\" appear."';
Now when the string is parsed, internally it will be:
'"It is no coincidence that in no known language does the phrase \"As pretty as an Airport\" appear."';
Ignore the fact that my random Douglas Adams quotes are split across more than one line (using +). I've only done that for ease of reading within a fixed-width area. I've kept it parsable by JavaScript, in case people copy and paste and expect things to work.
So in order to fix your issue, your JSON source (before placing in the JavaScript code) will have to look like this:
var json = '{"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\\"Id\\":\\"1\\",\\"Name\\":\\"Daugherty\\",\\"ContactName\\":\\"May C\\",\\"ContactEmail\\":\\"may.c#gamil.com\\",\\"Total\\":25,\\"Phone\\":\\"111-111-1111\\",\\"Type\\":\\"Daily\\",\\"Notes\\":[{\\"Comments\\":\\"One\\",\\"Date\\":\\"2014-11-27T00:00:00.000\\"},{\\"Comments\\":\\"Two\\",\\"Date\\":\\"2014-11-28T00:00:00.000\\"}],\\"ImportComplete\\":true,\\"RunComplete\\":true,\\"CompleteDate\\":\\"2014-07-31T00:00:00.000\\",\\"Amount\\":2400.00,\\"ProcessingComplete\\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"}';
You should find the above will eval directly, without any replacements.
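The difference between the two escape levels is easy to see on a shortened version of the data. The snippet below is a cut-down illustration (only the TextValue and Id fields are kept, and the variable names are made up); it uses eval because that is what the question uses:

```javascript
// One backslash: the JavaScript parser consumes it, so the inner quotes end up bare.
var single = '{"TextValue":"{\"Id\":\"1\"}"}';
// Two backslashes: one survives inside the string and escapes the inner quotes.
var doubled = '{"TextValue":"{\\"Id\\":\\"1\\"}"}';

console.log(single);  // prints: {"TextValue":"{"Id":"1"}"}
console.log(doubled); // prints: {"TextValue":"{\"Id\":\"1\"}"}

var outer = eval('(' + doubled + ')');          // the outer object
var inner = eval('(' + outer.TextValue + ')');  // the embedded JSON
console.log(inner.Id); // prints: 1
```

Running eval on the single-backslash variant throws a SyntaxError, because the string value ends at the first bare inner quote.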
To achieve the above programmatically, you will have to see what the .NET system you are using offers for escaping backslashes. I mainly work with PHP or Python on the server side. In those languages you could use the following.
The $s and s strings below have been cropped for brevity.
<?php
$s = '{"DataName":"111","Id":"222"...';
$s = str_replace("\\", "\\\\", $s);
echo "var json = '$s';";
or ...
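#!/usr/bin/env python
# The raw string keeps the backslashes exactly as received
s = r'{"DataName":"111","Id":"222"...'
# Double every backslash so that one level survives JavaScript parsing
s = s.replace("\\", "\\\\")
print("var json = '" + s + "';")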
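For comparison, the same transformation can be sketched in JavaScript, for instance if the server side happened to run on Node. The helper name toSingleQuotedLiteral is made up for this example. Unlike the PHP and Python snippets, it also escapes single quotes, since an unescaped ' in the data would end the surrounding literal early; line breaks in the data are not handled here.

```javascript
// Wrap text in single quotes so that a JavaScript parser reads back the same text.
function toSingleQuotedLiteral(text) {
  return "'" + text.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

var s = '{"TextValue":"{\\"Id\\":\\"1\\"}"}'; // the data, holding one backslash per inner quote
var line = "var json = " + toSingleQuotedLiteral(s) + ";";
console.log(line);
// prints: var json = '{"TextValue":"{\\"Id\\":\\"1\\"}"}';
```

The backslashes are doubled before the quotes are escaped, so the backslashes added for the quotes are not doubled again.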
another solution
It all depends on how you are requesting the content you are wrapping in the string in JavaScript. If you have the ability to write out your js from the server side (most likely with .NET), as I have shown above with PHP or Python, you don't need to wrap the content in a string at all. You can instead output the content without the surrounding single quotes. JavaScript will then parse it as an object literal:
var jso = {"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\"Id\":\"1\",\"Name\":\"Daugherty\",\"ContactName\":\"May C\",\"ContactEmail\":\"may.c#gamil.com\",\"Total\":25,\"Phone\":\"111-111-1111\",\"Type\":\"Daily\",\"Notes\":[{\"Comments\":\"One\",\"Date\":\"2014-11-27T00:00:00.000\"},{\"Comments\":\"Two\",\"Date\":\"2014-11-28T00:00:00.000\"}],\"ImportComplete\":true,\"RunComplete\":true,\"CompleteDate\":\"2014-07-31T00:00:00.000\",\"Amount\":2400.00,\"ProcessingComplete\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"};
This works because JSON is, for practical purposes, a stricter form of JavaScript object literal syntax, so the quote/escape level you already have will work fine.
The only downside to the above solution is that you have to implicitly trust the source of this data, and the data will always have to be well formed. If it is not, you could introduce parse errors or unwanted js into your code, which could be avoided with an eval/JSON.parse system.
