Crockford's deentityify method - p.41 of The Good Parts - javascript

In a fit of self-improvement, I'm reading (and rereading) TGP by Señor Crockford. I cannot, however, understand the middlemost part of his deentityify method.
...
return this.replace(...,
function (a, b) {
var r = ...
}
);
I think I understand that:
this.replace is passed two arguments, the regex as the search value and the function to generate the replacement value;
the b is used to access the properties in the entity object;
the return ? r : a; bit determines whether to return the text as is or the value of the appropriate property in entity.
What I don't get at all is how the a & b are provided as arguments into function (a, b). What is calling this function? (I know the whole thing is self-executing, but that doesn't really clear it up for me. I guess I'm asking how is this function being called?)
If someone was interested in giving a blow by blow analysis akin to this, I'd really appreciate it, and I suspect others might too.
Here's the code for convenience:
String.method('deentityify', function ( ) {
var entity = {
quot: '"',
lt: '<',
gt: '>'
};
return function () {
return this.replace(
/&([^&;]+);/g,
function (a, b) {
var r = entity[b];
return typeof r === 'string' ? r : a;
}
);
};
}());

a isn't the numerical offset, it's the matched substring.
b (in this case) is the first grouping, i.e., the match minus the surrounding & and ;.
The method checks that the entity exists and that it is a string. If it is, that string is the replacement value; otherwise the callback returns a, the full matched text including the & and ;, so the match is left unchanged.
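To check that behaviour, here is a minimal sketch of the same logic as a plain function. The standalone `deentityify(str)` form is an assumption made here so the code runs without Crockford's `String.method` helper; the regex and callback are the ones from the question.

```javascript
// Standalone variant (assumption: a plain function instead of a String method).
const deentityify = (function () {
  const entity = { quot: '"', lt: '<', gt: '>' };
  return function (str) {
    return str.replace(/&([^&;]+);/g, function (a, b) {
      const r = entity[b];                  // b is the name between & and ;
      return typeof r === 'string' ? r : a; // unknown name: give back the whole match
    });
  };
}());

console.log(deentityify('&lt;&quot;&gt;')); // '<">'
console.log(deentityify('&nbsp; &amp;'));   // '&nbsp; &amp;' (unknown entities stay as they were)
```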

The replace function can take a function as the second parameter.
This function is then called for every match, with a signature that depends on the number of groups in the regular expression being searched for. If the regexp does not contain any capturing groups, a will be the matched substring, b the numerical offset in the whole string. For more details, refer to the MDN documentation.
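To see concretely what replace hands to the callback, here is a small sketch that records the arguments for each match. The helper name `logReplacerArgs` and the sample string are made up for illustration.

```javascript
// Hypothetical helper: collects the arguments replace() passes to its callback.
function logReplacerArgs(str, regex) {
  const calls = [];
  str.replace(regex, function (...args) {
    calls.push(args);
    return args[0]; // give the match back unchanged
  });
  return calls;
}

// One capturing group: (match, group1, offset, wholeString)
const withGroup = logReplacerArgs('a &lt; b', /&([^&;]+);/g);
console.log(withGroup[0]); // [ '&lt;', 'lt', 2, 'a &lt; b' ]

// No capturing group: (match, offset, wholeString)
const noGroup = logReplacerArgs('a &lt; b', /&[^&;]+;/g);
console.log(noGroup[0]); // [ '&lt;', 2, 'a &lt; b' ]
```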

Related

Regex to define the number of appearances substituted [duplicate]

I'd like to know how to replace a capture group with its uppercase in JavaScript. Here's a simplified version of what I've tried so far that's not working:
> a="foobar"
'foobar'
> a.replace( /(f)/, "$1".toUpperCase() )
'foobar'
> a.replace( /(f)/, String.prototype.toUpperCase.apply("$1") )
'foobar'
Would you explain what's wrong with this code?
You can pass a function to replace.
var r = a.replace(/(f)/, function(v) { return v.toUpperCase(); });
Explanation
a.replace( /(f)/, "$1".toUpperCase())
In this example you pass a string to the replace function. Because "$1".toUpperCase() is evaluated before replace is ever called, and the characters $ and 1 have no upper-case form, the argument is still "$1". With the special replacement syntax ($N inserts the Nth capture), that simply puts the captured text back unchanged.
a.replace( /(f)/, String.prototype.toUpperCase.apply("$1"))
Believe it or not the semantics of this expression are exactly the same.
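A short sketch of the evaluation order may make this concrete. The variable name `replacement` is introduced here only for illustration.

```javascript
const a = 'foobar';

// The argument expression is evaluated first, so replace() only ever sees '$1'.
const replacement = '$1'.toUpperCase();
console.log(replacement);                   // '$1'
console.log(a.replace(/(f)/, replacement)); // 'foobar'

// apply() with '$1' as `this` is the same call spelled differently.
console.log(String.prototype.toUpperCase.apply('$1') === replacement); // true

// A function, by contrast, is called per match, once the match is known.
console.log(a.replace(/(f)/, (match) => match.toUpperCase())); // 'Foobar'
```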
I know I'm late to the party but here is a shorter method that is more along the lines of your initial attempts.
a.replace('f', String.call.bind(a.toUpperCase));
So where did you go wrong and what is this new voodoo?
Problem 1
As stated before, you were attempting to pass the results of a called method as the second parameter of String.prototype.replace(), when instead you ought to be passing a reference to a function
Solution 1
That's easy enough to solve. Simply removing the parameters and parentheses will give us a reference rather than executing the function.
a.replace('f', String.prototype.toUpperCase.apply)
Problem 2
If you attempt to run the code now you will get an error stating that undefined is not a function and therefore cannot be called. This is because String.prototype.toUpperCase.apply is actually a reference to Function.prototype.apply() via JavaScript's prototypal inheritance, and the reference carries no record of which function it was looked up on. So what we are actually doing looks more like this:
a.replace('f', Function.prototype.apply)
Which is obviously not what we have intended. How does it know to run Function.prototype.apply() on String.prototype.toUpperCase()?
Solution 2
Using Function.prototype.bind() we can create a copy of Function.prototype.apply with its context specifically set to String.prototype.toUpperCase. We now have the following:
a.replace('f', Function.prototype.apply.bind(String.prototype.toUpperCase))
Problem 3
The last issue is that String.prototype.replace() will pass several arguments to its replacement function. However, Function.prototype.apply() expects the second parameter to be an array but instead gets either a string or number (depending on if you use capture groups or not). This would cause an invalid argument list error.
Solution 3
Luckily, we can simply substitute in Function.prototype.call() (which accepts any number of arguments, none of which have type restrictions) for Function.prototype.apply(). We have now arrived at working code!
a.replace(/f/, Function.prototype.call.bind(String.prototype.toUpperCase))
Shedding bytes!
Nobody wants to type prototype a bunch of times. Instead we'll leverage the fact that we have objects that reference the same methods via inheritance. The String constructor, being a function, inherits from Function's prototype. This means that we can substitute in String.call for Function.prototype.call (actually we can use Date.call to save even more bytes but that's less semantic).
We can also leverage our variable 'a': since its prototype chain includes a reference to String.prototype.toUpperCase, we can swap that out for a.toUpperCase. The combination of the 3 solutions above and these byte-saving measures is how we get the code at the top of this post.
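As a sanity check on the steps above, this sketch names the bound function (`upper` is a name chosen here, not part of the answer) and confirms that the shortened spelling refers to the very same functions.

```javascript
const a = 'foobar';

// call.bind(fn) yields a function g where g(x, ...rest) runs fn.call(x, ...rest).
const upper = Function.prototype.call.bind(String.prototype.toUpperCase);

console.log(upper('f'));              // 'F'
console.log(upper('f', 0, 'foobar')); // 'F' (toUpperCase ignores the extra arguments)
console.log(a.replace(/f/, upper));   // 'Foobar'

// The byte-saving versions are the same function objects reached through inheritance.
console.log(String.call === Function.prototype.call);        // true
console.log(a.toUpperCase === String.prototype.toUpperCase); // true
console.log(a.replace('f', String.call.bind(a.toUpperCase))); // 'Foobar'
```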
Why don't we just look up the definition?
If we write:
a.replace(/(f)/, x => x.toUpperCase())
we might as well just say:
a.replace('f','F')
Worse, I suspect nobody realises that their examples have been working only because they were capturing the whole regex with parentheses. If you look at the definition, the first parameter passed to the replacer function is actually the whole matched pattern and not the pattern you captured with parentheses:
function replacer(match, p1, p2, p3, offset, string)
If you want to use the arrow function notation:
a.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase())
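One consequence worth spelling out: the callback's return value replaces the whole match, not just the captured group. The following sketch, using a made-up input string, shows the difference and one way to keep the surrounding text.

```javascript
const s = 'xxxyyyzzz';

// match is the whole matched text; p1 is only the parenthesised part.
const seen = [];
const onlyGroup = s.replace(/xxx(yyy)zzz/, (match, p1) => {
  seen.push({ match, p1 });
  return p1.toUpperCase();
});
console.log(seen[0]);   // { match: 'xxxyyyzzz', p1: 'yyy' }
console.log(onlyGroup); // 'YYY' (the whole match was replaced)

// To keep the surrounding text, capture it too and reassemble the pieces.
const kept = s.replace(/(xxx)(yyy)(zzz)/, (match, pre, mid, post) => pre + mid.toUpperCase() + post);
console.log(kept);      // 'xxxYYYzzz'
```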
This is an old post, but it is worth extending @ChaosPandion's answer to other use cases with a more restrictive RegEx, e.g. ensuring that the (f) capturing group is only matched inside a specific surrounding format such as /z(f)oo/:
> a="foobazfoobar"
'foobazfoobar'
> a.replace(/z(f)oo/, function($0,$1) {return $0.replace($1, $1.toUpperCase());})
'foobazFoobar'
// Improve the RegEx so `(f)` will only get replaced when it begins with a dot or new line, etc.
I just want to highlight that the function's two parameters make it possible to find a specific format and replace only a capturing group within it.
SOLUTION
a.replace(/(f)/,(m,g)=>g.toUpperCase())
To replace all occurrences of the group, use the /(f)/g regexp. The problem in your code: String.prototype.toUpperCase.apply("$1") and "$1".toUpperCase() both give "$1" (try it in the console yourself), so nothing changes; in effect you call a.replace( /(f)/, "$1") twice (which also changes nothing).
let a= "foobar";
let b= a.replace(/(f)/,(m,g)=>g.toUpperCase());
let c= a.replace(/(o)/g,(m,g)=>g.toUpperCase());
console.log("/(f)/ ", b);
console.log("/(o)/g", c);
Given a dictionary of properties and values (in this case a Map), you can use .bind(), as described in the other answers, to pass its lookup method as the replacer:
const regex = /([A-Za-z0-9]+)/;
const dictionary = new Map([["hello", 123]]);
let str = "hello";
str = str.replace(regex, dictionary.get.bind(dictionary));
console.log(str);
Using a plain JavaScript object instead, with a method defined on it that returns the matched property's value, or the original string if no match is found:
const regex = /([A-Za-z0-9]+)/;
const dictionary = {
"hello": 123,
[Symbol("dictionary")](prop) {
return this[prop] || prop
}
};
let str = "hello";
str = str.replace(regex, dictionary[Object.getOwnPropertySymbols(dictionary)[0]].bind(dictionary));
console.log(str);
To convert a string from CamelCase to bash_case (i.e. for filenames), use a callback with a ternary operator.
The group captured with () in the regexp (the first, left argument of replace) is passed to the callback function given as the second (right) argument.
x and y both hold the captured string: x is the whole match and y is the first capture group, and here the group spans the entire pattern, so the two are identical. index (the third parameter) is the offset of the match in the original string.
Therefore a ternary operator can be used to avoid placing _ before the first occurrence.
let str = 'MyStringName';
str = str.replace(/([^a-z0-9])/g, (x,y,index) => {
return index != 0 ? '_' + x.toLowerCase() : x.toLowerCase();
});
console.log(str);
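To see which value lands in which parameter, this sketch records the first three callback arguments for the same input. The names `calls`, `group1` and `offset` are chosen here for illustration.

```javascript
const calls = [];
const result = 'MyStringName'.replace(/([^a-z0-9])/g, (match, group1, offset) => {
  calls.push([match, group1, offset]);
  return (offset !== 0 ? '_' : '') + match.toLowerCase();
});

console.log(calls);  // [ [ 'M', 'M', 0 ], [ 'S', 'S', 2 ], [ 'N', 'N', 8 ] ]
console.log(result); // 'my_string_name'
```

The first two entries are equal only because the parentheses wrap the entire pattern; dropping the group would shift the offset into the second slot.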

Working of javascript inline functions

I am not able to grasp how function(match, p1, p2) is working.
What is use of match parameter? The code breaks if I don't write match parameter.
function incrementString(input) {
if (isNaN(parseInt(input[input.length - 1]))) return input + '1';
return input.replace(/(0*)([0-9]+$)/, function(match, p1, p2) {
var up = parseInt(p2) + 1;
return up.toString().length > p2.length ? p1.slice(0, -1) + up : p1 + up;
});
}
P.S: I am new entirely using Js for development. However, I have been working on JSF and Java since past few years.
From MDN:
str.replace(regexp|substr, newSubStr|function[, flags])
In that case, we can see that two arguments are passed to replace, a regular expression literal and a function expression. So that's:
str.replace(regexp, function)
and MDN tells us what they are:
function (replacement): A function to be invoked to create the new substring (to put in place of the substring received from parameter 1). The arguments supplied to this function are described in the "Specifying a function as a parameter" section below.
and
The arguments to the function are as follows:
etc. etc. I won't quote the entire table.
If you leave the match argument out of the parameter list, then the values assigned to p1 and p2 will be the first and second argument instead of the second and third. Those won't be the values you need.
It would be like taking this code:
function call_with_one_two_three(f) {
f(1,2,3);
}
call_with_one_two_three(function (one, two, three) {
alert(two + three);
});
And deciding that since you weren't using one you didn't need it:
function call_with_one_two_three(f) {
f(1,2,3);
}
call_with_one_two_three(function (two, three) {
alert(two + three);
});
That gives you two + three as 3 (that is, 1 + 2) instead of 5.
In short: The position of arguments matters (and the name doesn't).
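Applied to the function from the question, a common convention is to keep the unused first parameter and give it a throwaway name. The sketch below does that (the name `_match` and the test inputs are assumptions) and also shows what a two-parameter callback actually receives.

```javascript
function incrementString(input) {
  if (isNaN(parseInt(input[input.length - 1], 10))) return input + '1';
  // `_match` is unused, but it must stay so that p1 and p2 land in slots 2 and 3.
  return input.replace(/(0*)([0-9]+$)/, function (_match, p1, p2) {
    const up = parseInt(p2, 10) + 1;
    return up.toString().length > p2.length ? p1.slice(0, -1) + up : p1 + up;
  });
}

console.log(incrementString('foo'));     // 'foo1'
console.log(incrementString('foo0042')); // 'foo0043'
console.log(incrementString('foo099'));  // 'foo100'

// Without the first parameter, p1 receives the whole match and p2 the first group.
const broken = 'foo0042'.replace(/(0*)([0-9]+$)/, function (p1, p2) {
  return p1 + '|' + p2;
});
console.log(broken); // 'foo0042|00'
```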

Javascript Variable Reference

Something I don't understand, which I'm sure someone with any simple knowledge of Javascript will get;
How does the 'm' variable referenced in this replace function actually refer to the input from the str - I don't understand how it takes the str as m?
str = str.replace("whatevers",function(m){ return m.toUpperCase(); })
Many thanks in advance. Tyler.
Each function defines how any functions passed in are used. The documentation for String.prototype.replace() explains how it's used in the section on specifying a function as a parameter.
Somewhere in the implementation of replace, that function you're passing in is called with several arguments. The full example is:
function replacer(match, p1, p2, p3, offset, string) {
return "replacement_text";
}
In the context of string replacing, if you pass in a function as the second parameter the way you're doing, the first argument of that function (in your case 'm') will be whatever matched the first argument of replace (in this case "whatevers"). Once replace finds a match, it calls your function with the matched text assigned to 'm'; your function then calls toUpperCase on 'm', and its return value is used as the replacement.
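To make "somewhere in the implementation" less abstract, here is a deliberately simplified stand-in for replace with a string pattern. `simpleReplace` is a hypothetical function written for this illustration, not how any engine actually implements it; it only shows where the call to your function happens.

```javascript
// Hypothetical, simplified stand-in for replace() with a string pattern.
function simpleReplace(str, search, replacer) {
  const offset = str.indexOf(search);
  if (offset === -1) return str; // no match: the callback never runs
  // This is the call site: the implementation decides which arguments to pass.
  const replacement = String(replacer(search, offset, str));
  return str.slice(0, offset) + replacement + str.slice(offset + search.length);
}

const out = simpleReplace('say whatevers now', 'whatevers', function (m) {
  return m.toUpperCase();
});
console.log(out); // 'say WHATEVERS now'

// The built-in gives the same result for this input.
console.log(out === 'say whatevers now'.replace('whatevers', (m) => m.toUpperCase())); // true
```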

JavaScript: alert object name as a string

I'm trying to alert any JavaScript object as a string, in a function. This means if the parameter given to the function is window.document, the actual object, it should alert "window.document" (without quotes) as a literal string.
The following calls...
example(window);
example(window.document);
example(document.getElementById('something'));
...calling this function...
function example(o) {/* A little help here please? */}
...should output the following strings...
window
window.document
document.getElementById('something')
I've attempted to do this with combinations of toString() and eval() among some more miscellaneous shots in the dark without success.
There is no need for insane backwards compatibility; newer ECMAScript / JavaScript features and functions are fine. Feel free to ask for clarification, though the goal should be pretty straightforward.
This is not possible to do in a self contained script.
If using a preprocessor would be an option, then you could write one which converts example(whatever) into example('whatever'). Other than that I'm afraid you're out of luck.
The first problem is that objects don't have names.
The second problem is that from your examples, you're not really wanting to print the (nonexistent) name of an object, you want to print the expression that evaluated into a reference to an object. That's what you're trying to do in this example:
example(document.getElementById('something'));
For that to print document.getElementById('something'), JavaScript would have had to keep the actual text of that expression somewhere that it would make available to you. But it doesn't do that. It merely evaluates the parsed and compiled expression without reference to the original text of the expression.
If you were willing to quote the argument to example(), then of course it would be trivial:
example( "document.getElementById('something')" );
Obviously in this case you could either print the string directly, or eval() it to get the result of the expression.
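A minimal sketch of that quoted-argument variant follows. It uses `Math.max(1, 2)` as a stand-in expression so that it also runs outside a browser, and it uses indirect eval so the text is evaluated in global scope; both choices are assumptions made for this illustration.

```javascript
// Takes the expression as text, so the text is available to print.
function example(expression) {
  console.log(expression);      // the literal source text
  return (0, eval)(expression); // indirect eval: evaluates in global scope
}

const value = example('Math.max(1, 2)'); // prints: Math.max(1, 2)
console.log(value);                      // 2
```

The usual cautions about eval apply: this is only reasonable when the string is written by you, never when it comes from user input.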
OTOH, if you want to try a real hack, here's a trick you could use in some very limited circumstances:
function example( value ) {
var code = arguments.callee.caller.toString();
var match = code.match( /example\s*\(\s*(.*)\s*\)/ );
console.log( match && match[1] );
}
function test() {
var a = (1);
example( document.getElementById('body') );
var b = (2);
}
test();
This will print what you wanted:
document.getElementById('body')
(The assignments to a and b in the test() function are just there to verify that the regular expression isn't picking up too much code.)
But this will fail if there's more than one call to example() in the calling function, or if that call is split across more than one line. Also, arguments.callee.caller has been deprecated for some time but is still supported by most browsers as long as you're not in strict mode. I suppose this hack could be useful for some kind of debugging purposes though.
Don't know why you need this, but you can try walking the object tree recursively and compare its nodes with your argument:
function objectName(x) {
function search(x, context, path) {
if(x === context)
return path;
if(typeof context != "object" || seen.indexOf(context) >= 0)
return;
seen.push(context);
for(var p in context) {
var q = search(x, context[p], path + "." + p);
if(q)
return q;
}
}
var seen = [];
return search(x, window, "window");
}
Example:
console.log(objectName(document.body))
prints for me
window.document.activeElement

How does Crockford's JSON Parser work?

I have stared for a long time at the code found here. It's Douglas Crockford's JSON-parsing function (called a recursive descent parser). Can anyone elaborate on the mechanics of this parser? I really can't get my head around it.
Logically, you may start with the actual parse function, which starts at line 311 (I omitted the reviver part for clarity).
function (source, reviver) {
var result;
text = source;
at = 0;
ch = ' ';
result = value();
white();
if (ch) {
error("Syntax error");
}
return result;
}
This initializes the variables shared by all the parsing functions: text is set to the source text, the position at to 0, and the current character ch to a space. It then parses a value by calling the function value.
Each kind of thing to be parsed is handled by its own function (value in the example above). There are several of them: number, string, white, and so on. Each one works in basically the same way. We'll look at white first as a basic example:
white = function () {
// Skip whitespace.
while (ch && ch <= ' ') {
next();
}
}
Note that ch always contains the current character. This variable is only updated by next, which reads in the next one. This can be seen within white, where each whitespace character is consumed by a call to next. Thus, after calling this function, the first non-whitespace character will be in the variable ch.
Let's look for a more complex example value:
value = function () {
// Parse a JSON value. It could be an object, an array, a string, a number,
// or a word.
white();
switch (ch) {
case '{':
return object();
case '[':
return array();
case '"':
return string();
case '-':
return number();
default:
return ch >= '0' && ch <= '9' ? number() : word();
}
};
It first skips whitespace by calling white. Note that ch now contains the current character to be parsed. If it is a '{', we know that a JSON object is coming next and call the corresponding function object. If instead it is a '[', we expect a JSON array, and so on.
All the other functions are built the same way: inspect the current character, decide what has to come next, and then parse it.
An object may itself contain other values, so you'll find an indirect recursive call to the function value inside object again. By recursively calling these parsing functions, the nested values are parsed from the source string.
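The same structure can be sketched on a much smaller grammar. The toy parser below handles only non-negative integers and nested arrays; `parseTiny` and its error messages are invented for this illustration and are not Crockford's code, but the roles of at, ch, next, white and value mirror the description above.

```javascript
// Toy recursive descent parser for a tiny subset of JSON:
// non-negative integers and (nested) arrays only.
function parseTiny(source) {
  let at = 0;   // index of the next character to read
  let ch = ' '; // current character

  function next() {
    ch = source.charAt(at); // '' once the input is exhausted
    at += 1;
    return ch;
  }

  function white() {
    while (ch && ch <= ' ') next(); // skip whitespace
  }

  function number() {
    let digits = '';
    while (ch >= '0' && ch <= '9') {
      digits += ch;
      next();
    }
    return Number(digits);
  }

  function array() {
    const items = [];
    next(); // skip '['
    white();
    if (ch === ']') { next(); return items; }
    while (ch) {
      items.push(value()); // indirect recursion: array -> value -> array
      white();
      if (ch === ']') { next(); return items; }
      if (ch !== ',') break;
      next(); // skip ','
    }
    throw new SyntaxError('Bad array near position ' + at);
  }

  function value() {
    white();
    if (ch === '[') return array();
    if (ch >= '0' && ch <= '9') return number();
    throw new SyntaxError('Unexpected character near position ' + at);
  }

  const result = value();
  white();
  if (ch) throw new SyntaxError('Trailing characters');
  return result;
}

console.log(parseTiny('[1, [2, 3], 4]')); // [ 1, [ 2, 3 ], 4 ]
```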
