How does Crockford's JSON Parser work? - javascript

I have stared for a long time at the code found here. It's Douglas Crockford's JSON-parsing function (a so-called recursive descent parser). Can anyone elaborate on the mechanics of this parser? I really can't get my head around it.

Logically, you can start with the actual parse function, which starts at line 311 (I omitted the reviver part for clarity).
function (source, reviver) {
var result;
text = source;
at = 0;
ch = ' ';
result = value();
white();
if (ch) {
error("Syntax error");
}
return result;
}
This initializes the variables shared by all the parsing functions: text is set to the source text, at (the current position) to 0, and ch (the current character) to a space. It then parses a value by calling the function value.
Each kind of object to be parsed is encapsulated in its own function (in the example above, value). There are several of them: number, string, white, and so on. Each one works in basically the same way. We'll first look at white as a basic example:
white = function () {
// Skip whitespace.
while (ch && ch <= ' ') {
next();
}
}
Note that ch always contains the current character. This variable is only updated by next, which reads in the next one. You can see this in white, where each whitespace character is consumed by a call to next. Thus, after calling this function, the first non-whitespace character will be in the variable ch.
Let's look at a more complex example, value:
value = function () {
// Parse a JSON value. It could be an object, an array, a string, a number,
// or a word.
white();
switch (ch) {
case '{':
return object();
case '[':
return array();
case '"':
return string();
case '-':
return number();
default:
return ch >= '0' && ch <= '9' ? number() : word();
}
};
It first skips whitespace by calling white. Note that ch now contains the current character to be parsed. If it is a '{', we know that a JSON object is coming next and call the corresponding function object. If instead it is a '[', we expect a JSON array, and so on.
All the other functions are built the same way: inspect the current character, decide what has to come next, and then parse that object.
An object may itself contain other values, so you'll find an indirect recursive call to value inside object. By recursively calling these functions, the whole JSON structure is parsed from the source string.
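To make the mechanics concrete, here is a much reduced sketch of the same technique. It is not Crockford's code: parseTiny is a hypothetical name, and it only understands nested arrays and non-negative integers. The variables text, at and ch and the helpers next, white and value play the same roles as described above.

```javascript
// Reduced sketch of a recursive descent parser (arrays and integers only).
function parseTiny(source) {
    var text = source, at = 0, ch = ' ';

    function next() {
        // Load the next character into ch; '' signals the end of the input.
        ch = text.charAt(at);
        at += 1;
        return ch;
    }

    function white() {
        // Skip whitespace.
        while (ch && ch <= ' ') { next(); }
    }

    function number() {
        var digits = '';
        while (ch >= '0' && ch <= '9') {
            digits += ch;
            next();
        }
        return +digits;
    }

    function array() {
        var result = [];
        next();                          // consume '['
        white();
        if (ch === ']') { next(); return result; }
        while (ch) {
            result.push(value());        // indirect recursion: array -> value -> array
            white();
            if (ch === ']') { next(); return result; }
            if (ch !== ',') { break; }
            next();                      // consume ','
        }
        throw new SyntaxError('Bad array');
    }

    function value() {
        // Inspect the current character and decide what has to come next.
        white();
        if (ch === '[') { return array(); }
        if (ch >= '0' && ch <= '9') { return number(); }
        throw new SyntaxError('Unexpected "' + ch + '"');
    }

    var result = value();
    white();
    if (ch) { throw new SyntaxError('Syntax error'); }
    return result;
}

console.log(JSON.stringify(parseTiny(' [1, [2, 3], []] '))); // [1,[2,3],[]]
```

Each function leaves ch on the first character it did not consume, which is what lets the caller carry on without any backtracking.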

Related

Regex to define the number of appearances substituted [duplicate]

I'd like to know how to replace a capture group with its uppercase in JavaScript. Here's a simplified version of what I've tried so far that's not working:
> a="foobar"
'foobar'
> a.replace( /(f)/, "$1".toUpperCase() )
'foobar'
> a.replace( /(f)/, String.prototype.toUpperCase.apply("$1") )
'foobar'
Would you explain what's wrong with this code?
You can pass a function to replace.
var r = a.replace(/(f)/, function(v) { return v.toUpperCase(); });
Explanation
a.replace( /(f)/, "$1".toUpperCase())
In this example you pass a string to the replace function. Since you are using the special replacement syntax ($N inserts the Nth capture), you are simply substituting the same value back in. The toUpperCase is deceiving because you are only upper-casing the replacement string (which is pointless, because the $ and 1 characters have no upper case, so the result is still "$1").
a.replace( /(f)/, String.prototype.toUpperCase.apply("$1"))
Believe it or not the semantics of this expression are exactly the same.
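A quick way to convince yourself is to evaluate the second argument on its own, before replace ever sees it. The variable names arg1 and arg2 below are only for illustration:

```javascript
var a = "foobar";

// Both expressions are evaluated BEFORE replace() is called,
// so replace() only ever receives the plain string "$1".
var arg1 = "$1".toUpperCase();
var arg2 = String.prototype.toUpperCase.apply("$1");

console.log(arg1 === "$1", arg2 === "$1"); // true true
console.log(a.replace(/(f)/, arg1));       // foobar
```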
I know I'm late to the party but here is a shorter method that is more along the lines of your initial attempts.
a.replace('f', String.call.bind(a.toUpperCase));
So where did you go wrong and what is this new voodoo?
Problem 1
As stated before, you were attempting to pass the result of a called method as the second parameter of String.prototype.replace(), when instead you ought to be passing a reference to a function.
Solution 1
That's easy enough to solve. Simply removing the parameters and parentheses will give us a reference rather than executing the function.
a.replace('f', String.prototype.toUpperCase.apply)
Problem 2
If you attempt to run the code now you will get an error stating that undefined is not a function and therefore cannot be called. This is because String.prototype.toUpperCase.apply is actually a reference to Function.prototype.apply() via JavaScript's prototypal inheritance. So what we are actually doing looks more like this:
a.replace('f', Function.prototype.apply)
Which is obviously not what we have intended. How does it know to run Function.prototype.apply() on String.prototype.toUpperCase()?
Solution 2
Using Function.prototype.bind() we can create a copy of Function.prototype.apply with its context specifically set to String.prototype.toUpperCase. We now have the following:
a.replace('f', Function.prototype.apply.bind(String.prototype.toUpperCase))
Problem 3
The last issue is that String.prototype.replace() will pass several arguments to its replacement function. However, Function.prototype.apply() expects the second parameter to be an array but instead gets either a string or number (depending on if you use capture groups or not). This would cause an invalid argument list error.
Solution 3
Luckily, we can simply substitute in Function.prototype.call() (which accepts any number of arguments, none of which have type restrictions) for Function.prototype.apply(). We have now arrived at working code!
a.replace(/f/, Function.prototype.call.bind(String.prototype.toUpperCase))
Shedding bytes!
Nobody wants to type prototype a bunch of times. Instead we'll leverage the fact that we have objects that reference the same methods via inheritance. The String constructor, being a function, inherits from Function's prototype. This means that we can substitute in String.call for Function.prototype.call (actually we can use Date.call to save even more bytes but that's less semantic).
We can also leverage our variable a: since its prototype chain includes a reference to String.prototype.toUpperCase, we can swap that out for a.toUpperCase. Combining the three solutions above with these byte-saving measures gives us the code at the top of this post.
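If the bound call still looks like magic, it may help to pull it out into a variable and compare it with a spelled-out function. The names upper and upperExplicit are made up for this sketch:

```javascript
var a = "foobar";

// call() bound to toUpperCase: invoking upper(x, ...rest) runs
// String.prototype.toUpperCase.call(x, ...rest), which is x.toUpperCase().
var upper = Function.prototype.call.bind(String.prototype.toUpperCase);

// Spelled-out equivalent, for comparison only.
function upperExplicit(match) {
    return match.toUpperCase();
}

console.log(upper("f", 0, "foobar"));                  // F
console.log(a.replace(/f/, upper));                    // Foobar
console.log(a.replace(/f/, upperExplicit));            // Foobar
console.log(String.call === Function.prototype.call);  // true
```

The extra arguments that replace passes (offset and the whole string) are handed on to toUpperCase, which simply ignores them.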
Why don't we just look up the definition?
If we write:
a.replace(/(f)/, x => x.toUpperCase())
we might as well just say:
a.replace('f','F')
Worse, I suspect nobody realises that their examples have been working only because they were capturing the whole regex with parentheses. If you look at the definition, the first parameter passed to the replacer function is actually the whole matched pattern and not the pattern you captured with parentheses:
function replacer(match, p1, p2, p3, offset, string)
If you want to use the arrow function notation:
a.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase())
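To see the difference between match and p1, you can record what the replacer actually receives. This is only a sketch; the input string and the seen array are invented for the demonstration:

```javascript
var seen = [];

// The return value replaces the WHOLE match, not just the captured group.
var whole = "xxxyyyzzz".replace(/xxx(yyy)zzz/, function (match, p1, offset, string) {
    seen.push(match, p1, offset, string);
    return p1.toUpperCase();
});

// To keep the surrounding text, capture it as well and put it back.
var kept = "xxxyyyzzz".replace(/(xxx)(yyy)(zzz)/, function (match, pre, mid, post) {
    return pre + mid.toUpperCase() + post;
});

console.log(seen);  // [ 'xxxyyyzzz', 'yyy', 0, 'xxxyyyzzz' ]
console.log(whole); // YYY
console.log(kept);  // xxxYYYzzz
```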
This is an old post, but it is worth extending @ChaosPandion's answer to other use cases with a more restrictive RegEx, e.g. to ensure that the (f) capturing group is surrounded by a specific format, as in /z(f)oo/:
> a="foobazfoobar"
'foobazfoobar'
> a.replace(/z(f)oo/, function($0,$1) {return $0.replace($1, $1.toUpperCase());})
'foobazFoobar'
// Improve the RegEx so `(f)` will only get replaced when it begins with a dot or new line, etc.
I just want to highlight that the function's two parameters make it possible to find a specific format and replace only the capturing group within it.
SOLUTION
a.replace(/(f)/,(m,g)=>g.toUpperCase())
To replace all occurrences of the group, use the /(f)/g regexp. The problem in your code: String.prototype.toUpperCase.apply("$1") and "$1".toUpperCase() both give "$1" (try it in the console yourself), so nothing changes; in fact you are calling a.replace( /(f)/, "$1") twice (which also changes nothing).
let a= "foobar";
let b= a.replace(/(f)/,(m,g)=>g.toUpperCase());
let c= a.replace(/(o)/g,(m,g)=>g.toUpperCase());
console.log("/(f)/ ", b);
console.log("/(o)/g", c);
Given a dictionary (here a Map) of properties and values, and using .bind() as described in the other answers:
const regex = /([A-Za-z0-9]+)/;
const dictionary = new Map([["hello", 123]]);
let str = "hello";
str = str.replace(regex, dictionary.get.bind(dictionary));
console.log(str);
Using a plain JavaScript object with a method that returns the value of the matched property, or the original string if no match is found:
const regex = /([A-Za-z0-9]+)/;
const dictionary = {
"hello": 123,
[Symbol("dictionary")](prop) {
return this[prop] || prop
}
};
let str = "hello";
str = str.replace(regex, dictionary[Object.getOwnPropertySymbols(dictionary)[0]].bind(dictionary));
console.log(str);
To convert a string from CamelCase to bash_case (e.g. for filenames), use a callback with a ternary operator.
The group captured with () in the regexp given as the first (left) replace argument is passed to the second (right) argument, which is a callback function.
x is the whole match and y is the first captured group; they are identical here because the group spans the entire pattern. index (the third parameter) gives the offset of the match within the original string.
Therefore a ternary operator can be used to avoid placing _ before the first occurrence.
let str = 'MyStringName';
str = str.replace(/([^a-z0-9])/g, (x,y,index) => {
return index != 0 ? '_' + x.toLowerCase() : x.toLowerCase();
});
console.log(str);

SyntaxError when extending Number object

I am trying to extend the Number object with this code:
Number.prototype.isNumber = function(i){
if(arguments.length === 1){
return !isNaN(parseFloat(i)) && isFinite(i);
} else {
return !isNaN(parseFloat(this)) && isFinite(this);
}
}
try {
var x = 8.isNumber();
} catch(err) {
console.log(err);
}
I get SyntaxError: identifier starts immediately after numeric literal
also when I try the following:
Number.isNumber(8)
I get Number.isNumber is not a function!!
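The JavaScript parser reads the 8. in 8.isNumber as a number literal with a trailing decimal point, so the identifier isNumber follows it immediately, which is a syntax error.
To access a Number method on a numeric literal, you'll have to surround the number with parentheses so the JavaScript interpreter knows you're trying to access a property of the number.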
Number.prototype.isNumber = function(i) {
if (arguments.length === 1) {
return !isNaN(parseFloat(i)) && isFinite(i);
}
return !isNaN(parseFloat(this)) && isFinite(this);
}
try {
var x = (8).isNumber();
console.log(x);
} catch(err) {
console.log(err);
}
I couldn't help but provide an additional answer, although you have already accepted one.
The first thing you need to know is that there is a fundamental difference between the Number object and the Number prototype (see here).
As it stands, you are extending the Number prototype, not the object itself! Your isNumber implementation actually has the same effect as the following:
Number.prototype.isNumber = function(){return isFinite(this)}
Why? Because in order to execute this prototype method, the parser first needs to know the type of the literal you are invoking the function on. That's why you either need to turn your number literal into an expression by wrapping it in parentheses: (8).isNumber() or by using an even weirder notation 8..isNumber() (the first . is the decimal point, the second the property accessor). At this point, the javascript engine already evaluated it as a Number and thus can execute the isNumber() method.
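The different spellings can be compared side by side. The sketch below uses the built-in toFixed rather than isNumber so that it runs without the prototype extension; the variable names are arbitrary:

```javascript
// Each line calls Number.prototype.toFixed on the literal 8.
var viaParens  = (8).toFixed(1);  // parentheses end the literal
var viaTwoDots = 8..toFixed(1);   // first dot = decimal point, second = property accessor
var viaSpace   = 8 .toFixed(1);   // whitespace ends the literal
var viaDecimal = 8.0.toFixed(1);  // the literal already contains its decimal point

// `8.toFixed(1)` fails at parse time, so it is wrapped in new Function here
// to observe the SyntaxError without breaking the rest of the script.
var errorName = null;
try {
    new Function("return 8.toFixed(1);");
} catch (e) {
    errorName = e.name;
}

console.log(viaParens, viaTwoDots, viaSpace, viaDecimal); // 8.0 8.0 8.0 8.0
console.log(errorName);                                   // SyntaxError
```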
On the other hand, although at first glimpse your code looks like it could handle the following case correctly (since you are doing a parseFloat): "8".isNumber() will always throw an exception, because here we have a string literal, and the String prototype does not have a corresponding method. This means you will never be able to detect numbers that are actually string literals in the first place.
What you instead should do, is directly extend the Number object so you can actually do a proper check without having to deal with errors:
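Number.isFiniteNumber = function(i){
// The global isNaN() and isFinite() coerce strings to numbers;
// Number.isNaN() and Number.isFinite() do not, and would return false for "3.141".
return !isNaN(parseFloat(i)) && isFinite(i);
}
Number.isFiniteNumber(8); // returns true
Number.isFiniteNumber("3.141"); // returns true
Number.isFiniteNumber(".2e-34"); // returns true
Number.isFiniteNumber(Infinity); // returns false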
// just for informational purposes
typeof Infinity === "number" // is true
Bonus material:
Extending native objects is potentially dangerous.
Number.isNaN() probably does not do what you think it does.
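The difference is easy to check: the global functions convert their argument to a number first, while the Number.* versions never coerce. A short comparison (the results object is just a container for this sketch):

```javascript
var results = {
    globalIsNaN:       isNaN("abc"),              // true:  "abc" is coerced to NaN first
    numberIsNaN:       Number.isNaN("abc"),       // false: a string is not the value NaN
    numberIsNaNOfNaN:  Number.isNaN(NaN),         // true
    globalIsFinite:    isFinite("3.141"),         // true:  coerced to 3.141
    numberIsFinite:    Number.isFinite("3.141"),  // false: not of type number
    numberIsFiniteNum: Number.isFinite(3.141)     // true
};

console.log(results);
```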

Confused on how to approach this (writing a function, returning a string)

So I have to write a function plusLettuce that accepts one parameter as an argument and the function has to return a string that has my argument and the phrase "plus lettuce". So I'm guessing if I type in plusLettuce("Onions"); into my console I should get "Onions plus lettuce" as my output.
This is what I have so far. I wrote my function with a parameter and I'm confused about what to do next. (I'm a total noob, sorry.) Do I make a variable word? I'm just stuck on what my next step has to be. Please help.
var plusLettuce = function(word) {
var word =
}
You can use the addition operator + to concatenate strings, and the return statement to return the result of the function call:
var plusLettuce = function(word) {
return word + " plus lettuce";
};
plusLettuce("Onions"); // "Onions plus lettuce"
JS uses + for string concatenation.
You're also overwriting your word (which is already there, in your function), when you declare a new var word.
So
function plusLettuce (phrase) {
// I don't say `var phrase`, because it already exists
var phrasePlusLettuce = phrase + " plus lettuce"; // note the space at the start
return phrasePlusLettuce;
}
When you give a function a parameter, it automatically becomes a local variable for that function. Meaning that you can immediately use it as a variable too.
var plusLettuce = function(word) { // I take the var word from here...
return word + ' plus lettuce'; // ...and then use it here.
};
console.log(plusLettuce('Onions')); // This is where I assign the var word.
So what's happening here is that I'm telling the plusLettuce function to return whatever the caller passed as the argument, followed by ' plus lettuce'. Then I call it inside console.log().
In programming this is called string concatenation. What you've been asked to do is concatenate a static string with a dynamic one.
function plusLettuce (phrase){
var staticWord = ' plus lettuce';
return phrase + staticWord
}
console.log(plusLettuce('Onions'))
Remember that a parameter is a variable accessible only inside the function. The static part, which is always the same, can be assigned to a variable to keep the code clean. The dynamic part, which is the parameter, can be different each time, depending on what is passed to the function when it is called.

Converting a loop into a recursive function

I wrote a function yesterday to count the number of "a" characters in a string. My teacher told me to refactor the code into a recursive function and I don't really know how to do so.
I would like some feedback on the subject, and by the way I'm an absolute beginner in JavaScript.
function numberOfA(n){
var numberA =0;
for (i=0; i<=n.length; i++){
if(n.charAt(i)== "a" ){
numberA++;}
}
return numberA;
}
I call the function with the following piece of code:
var n = prompt("type a word");
var output = numberOfA(n);
alert (output);
Thanks in advance !
The goal of recursion is to make a function which calls itself.
You might have mutual-recursion -- function A calls function B, calls function A... but that's certainly not needed here, and is better suited for when you know that you need to do two distinct things (one per function) and know that you need to do them in a leapfrog pattern.
Where recursion comes into play is when you're thinking about loops.
Normally, when you're doing things with loops, you might end up having two or three loops inside of one another.
Instead of worrying about managing loops, recursion is a way of thinking about what happens in a single-iteration of a loop, and writing ONLY the code needed to do that.
A really simple example of singular recursion might be to log all elements of an array to the console.
This is not a practical example -- it's a trivial example which has most of the pieces you need to make practical examples.
var array = [ "one", "two", "three", "four" ];
function listNextItem (array, index) {
var item = array[index];
if (!item) { return; }
console.log(item);
listNextItem(array, index + 1);
}
listNextItem(array, 0);
I've created a very simple function which looks like the inside of your innermost loop.
It sets an item variable, based on array[index].
If it doesn't exist, we're done, and we can return out of the function, so we don't try to go on forever (this is very important in recursion).
If it does exist, we log the item's value.
Then we call the exact same function, and pass it the exact-same array, but we pass it the value of index + 1.
Did this change anybody's life, or make loops obsolete?
Not really.
But it's the first step to getting recursion.
The next step is getting a return from recursion.
function recursiveAddOne (current, max) {
if (current === max) { return current; }
return 1 + recursiveAddOne(current + 1, max);
}
var total = recursiveAddOne(0, 3); // === 3 + 1 + 1 + 1
total; // 6
Normally in my return statement, I'd be sending the answer back to the variable in the outside world.
I'm still doing that, but here I'm adding a call to the same function, as part of my return.
What does that do?
Well, the outside function can't return a value until the inside function returns.
The inside function can't return a value until ITS inside function returns...
...and it goes all the way down until my termination-condition is met.
That condition returns a value to its outer function. That outer function returns that added value to ITS outer function... ...all the way up to where the outermost function gets handed the value of all of the other functions put together, and then returns THAT to the outside world.
It's like giving each Russian Matryoshka ("babushka") doll a piece of work.
You start with the biggest one, and go all the way inside to the tiniest one.
The tiniest one does its work first, and hands it back to the next one, which does its work and hands that back... ...all the way back until you're outside again.
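One way to watch the dolls open and close is to record every entry and every return. The following is a sketch of recursiveAddOne with tracing added; the depth parameter and the trace array are additions for illustration only:

```javascript
var trace = [];

function tracedAddOne(current, max, depth) {
    depth = depth || 0;
    var indent = "  ".repeat(depth);
    trace.push(indent + "enter current=" + current);
    // Same logic as recursiveAddOne: stop at max, otherwise add 1 to the inner result.
    var result = current === max
        ? current
        : 1 + tracedAddOne(current + 1, max, depth + 1);
    trace.push(indent + "return " + result);
    return result;
}

var total = tracedAddOne(0, 3);
console.log(trace.join("\n"));
console.log(total); // 6
```

The trace shows four "enter" lines going inward, then four "return" lines coming back out with 3, 4, 5 and finally 6.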
Well, the basic concept of recursion is solving a problem with a smaller version of itself.
You have a function, numberOfA, which gives you the number of as in a string (or maybe a substring).
So let's say you have the string "javascript". The first a is at index 1.
It's logical to say that the number of as in your string is equal to 1 plus the number of as in the entire substring after the first a.
So what you do is add 1 to the number of as in the substring vascript.
So here's a sketch in code:
function numA(str) {
var first = str.indexOf("a");
if (first === -1) { return 0; } // base case: no a left, so the count is 0
// one a found, plus however many are in the rest of the string
return 1 + numA(str.substring(first + 1));
}
function numberOfA(n, count){
if(!n.length) {
return count;
}
if(n.charAt(0)== "a") {
++count;
}
return numberOfA(n.substr(1), count);
}
var numberA = numberOfA('asdfafeaa', 0);
Try this:
function numberOfA(n) {
return n == "" ? 0 : (n.charAt(0) == "a" ? 1 : 0) + numberOfA(n.substring(1))
}
Here's how it works:
If n is the empty string, return 0 and finish the recursion. This is the base case of the recursion.
Else if the character at the first position in the string is an "a" add one, if not add zero and either way advance the recursion by removing the first character from the string. This is the recursive step of the recursion.
As you can see, every recursive solution must have at least a base case and a recursive step.
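The same base case and recursive step can also be written with an index instead of cutting the string down on every call. This is just one possible variant; countChar and its parameters are hypothetical names:

```javascript
function countChar(str, target, index) {
    index = index || 0;
    // Base case: we have walked past the last character.
    if (index >= str.length) { return 0; }
    // Recursive step: count this character, then recurse on the next index.
    var here = str.charAt(index) === target ? 1 : 0;
    return here + countChar(str, target, index + 1);
}

console.log(countChar("javascript", "a")); // 2
console.log(countChar("asdfafeaa", "a"));  // 4
console.log(countChar("", "a"));           // 0
```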
<!DOCTYPE html><html lang="en"><body><script>
var foo = function foo() {
console.log(arguments.callee); // logs foo()
// callee could be used to invoke recursively the foo function (e.g. arguments.callee())
}();
</script></body></html>
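arguments.callee refers to the function that is currently executing, so it can be used to call that function recursively. Note that accessing it throws a TypeError in strict mode.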
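A named function expression gives you the same self-reference and also works in strict mode. The factorial below is only an illustration and is not taken from the question:

```javascript
// The name `fact` is only visible inside the function body.
var factorial = function fact(n) {
    return n <= 1 ? 1 : n * fact(n - 1);
};

// In strict mode, merely reading arguments.callee throws a TypeError.
var strictErrorName = null;
(function () {
    "use strict";
    try {
        arguments.callee;
    } catch (e) {
        strictErrorName = e.name;
    }
}());

console.log(factorial(5));    // 120
console.log(strictErrorName); // TypeError
```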

Crockford's deentityify method - p.41 of The Good Parts

In a fit of self-improvement, I'm reading (and rereading) TGP by Señor Crockford. I cannot, however, understand the middlemost part of his deentityify method.
...
return this.replace(...,
function (a, b) {
var r = ...
}
);
I think I understand that:
this.replace is passed two arguments, the regex as the search value and the function to generate the replacement value;
the b is used to access the properties in the entity object;
the return ? r : a; bit determines whether to return the text as is or the value of the appropriate property in entity.
What I don't get at all is how the a & b are provided as arguments into function (a, b). What is calling this function? (I know the whole thing is self-executing, but that doesn't really clear it up for me. I guess I'm asking how is this function being called?)
If someone was interested in giving a blow by blow analysis akin to this, I'd really appreciate it, and I suspect others might too.
Here's the code for convenience:
String.method('deentityify', function ( ) {
var entity = {
quot: '"',
lt: '<',
gt: '>'
};
return function () {
return this.replace(
/&([^&;]+);/g,
function (a, b) {
var r = entity[b];
return typeof r === 'string' ? r : a;
}
);
};
}());
a isn't the numerical offset, it's the matched substring.
b (in this case) is the first grouping, i.e., the match minus the surrounding & and ;.
The method checks that the entity exists and that it's a string. If so, that's the replacement value; otherwise the match is replaced by itself (a), so the text, including the & and ;, is left unchanged.
The replace function can take a function as the second parameter.
This function is then called for every match, with a signature that depends on the number of groups in the regular expression being searched for. If the regexp does not contain any capturing groups, a will be the matched substring, b the numerical offset in the whole string. For more details, refer to the MDN documentation.
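To watch replace calling the function, you can log the arguments of each call. The sketch below rewrites the method as a standalone function (so it does not depend on the book's String.method helper) and adds a calls array purely for inspection:

```javascript
var entity = { quot: '"', lt: '<', gt: '>' };
var calls = [];

function deentityify(text) {
    // replace() itself calls the function, once per match:
    // a = whole match, b = first capture group, offset = position of the match.
    return text.replace(/&([^&;]+);/g, function (a, b, offset) {
        calls.push([a, b, offset]);
        var r = entity[b];
        return typeof r === 'string' ? r : a;
    });
}

var out = deentityify('&lt;p&gt; &nbsp; &quot;hi&quot;');
console.log(out);      // <p> &nbsp; "hi"
console.log(calls[0]); // [ '&lt;', 'lt', 0 ]
console.log(calls[2]); // [ '&nbsp;', 'nbsp', 10 ]
```

The unknown entity &nbsp; is passed through untouched because entity.nbsp is undefined, so the function returns a.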
