What are JavaScript's builtin strings?

What are JavaScript's builtin strings? - javascript

this question is difficult to summarize in a question title
UPDATE
I created a JSFiddle that builds an obfuscated string out of your input based on the letters extracted from this question: You can access it here, or would a gist be easier?
I recently came across a fun bit of obfuscated JavaScript in this profile that looks like this:
javascript:[[]+1/!1][1^1][1>>1]+({}+[])[1<<1^11>>1]+([]+!!-
[])[1<<1]+[/~/+{}][+!1][-~1<<1]+([]+/-/[(!!1+[])[1>>1]+(!!1
+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1]+([,][
~1]+[])[1-~1]+[[]+{}][!1.1%1][11111.1%11.1*111e11|!1]+(/1/+
1/[1<1][1%1])[1^11]+[[],[]+{}][1][+1]+(/<</[1]+[])[1/1.1&1]
Sorry to ruin the surprise but when this is evaluated it returns this:
"I love you" in Chrome
"I lone you" In Firefox
"I lo[e you" in IE10
The way this works when broken out, is to generate a series of messages and to pull letters out of them like so (using the "I" as an example):
[]+1/!1
returns
"Infinity"
then
[[]+1/!1]
creates this array:
["Infinity"]
then
[[]+1/!1][1^1]
Takes the first (1^1 == 0) element of that array
"Infinity"
finally
[[]+1/!1][1^1][1>>1]
Takes the first (1>>1 == 0) char of that string
"I"
Other strings that are generated include:
({}+[]) -> "[object Object]" (where the space comes from)
([]+!!-[]) -> "false" (used for it's "l")
[/~/+{}][+!1] -> "/~/[object Object]" (this is used for an "o")
(/<</[1]+[]) -> "undefined"
I was interested in finding a replacement for the "n" and "[" and came up with this:
String.fromCharCode(('1'.charCodeAt(0)<<1)+(10<<1))
Which I feel in in the spirit of using 1's and 0's, but violates one of the more elegant aspects of the original code which is the appearance of having nothing to do with strings at all. Does anyone else have an idea of how to generate a "v" that is in keeping with the original obfuscated code?
Here is some extra information that was found after many talented JavaScript programers took a deeper look at this
Firefox returns "I lone you" Because of this line:
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1]+
[1^11<<1] trims a specific character from this:
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])
Which evaluates to this:
"function test() {
[native code]
}"
Which looks like we might have our "V"!!!
Chrome returns "I love you", because the same code returns this:
"function test() { [native code] }"
Before the question is closed for questionable connection with "a real programming issue", I thought I'd add a summarized solution that builds on #Supr's, #Cory's and #alpha123's, behold:
alert([[]+1/!1][1^1][1>>1]+({}+[])[1<<1^11>>1]+(
[]+!!-[])[1<<1]+[/~/+{}][+!1][-~1<<1]+[([]+/-/[(
!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(
!!1+[])[1^1]])[1+(1^(11+1+1)<<1)],([]+/-/[(!!1+[
])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[
])[1^1]])[1^11<<1],([]+/-/[(!!1+[])[1>>1]+(!!1+[
])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^(11
+1+1)<<1]][((([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<
1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1<<1<<1<<1
)+1<<1]==({}+[])[1^1])*1)+((([]+/-/[(!!1+[])[1>>
1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1
]])[(1^11<<1)-1]==({}+[])[1^1])<<1)]+([,][~1]+[]
)[1-~1]+[[]+{}][!1.1%1][11111.1%11.1*111e11|!1]+
(/1/+1/[1<1][1%1])[1^11]+[[],[]+{}][1][+1]+(/<</
[1]+[])[1/1.1&1])
Given the complexity of the code and the message it produces it's almost like the JavaScript engine is telling how special you make it feel :)

First of all, I would like to thank Jason and all the contributors for playing with that funny snippet. I have written that piece of code just for fun in order to send it to my wife on February 14 :) Having only Chrome installed on the laptop I had no options to check how it works in Firefox and IE. Moreover, I haven't really expected that toString() representation of build-in methods might look differently in other browsers.
Now, moving to the real problem, let's precisely have a look at the code. Yes, "v" was the real "problem" here. I found no other ways of getting this letter except parsing [native code] string, which can be taken from any built-in method. Since I limited myself with having no strings and no numbers except 1 used, I needed to exploit some method that has only available characters in its name.
Available characters can be obtained from existing keywords and string representations, i.e. from the start we had NaN, null, undefined, Infinity, true, false, and "[object Object]". Some of them can be easily converted to strings, e.g. 1/!1+[] gives "Infinity".
I have analyzed different build-in methods for arrays [], objects {}, regular expressions /(?:)/, numbers 1.1, strings "1", and discovered one beautiful method of RegExp object called test(). Its name can be assembled from all available characters, e.g. "t" and "e" from true, and "s" from false. I have created a string "test" and addressed this method using square brackets notation for regex literal /-/, correctly identified in this line:
/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]]
As was already discussed, this piece of code is evaluated in Chrome as:
function test() { [native code] }
in Firefox as:
function test() {
[native code]
}
and in IE as:
function test() { [native code] }
(in the latter pay special attention to the space before function keyword)
So, as you clearly see, my code was getting the 24th character from the presented string, which in Chrome was "v" (as was planned), but unfortunately in Firefox and IE -- "n" and "[" respectively.
In order to make the same output in all the browsers, I have used different approach than illustrated in the other answers. Now the modified version looks like that:
javascript:[[]+1/!1][1^1][1>>1]+({}+[])[1<<1^11>>1]+([]+!!-
[])[1<<1]+[/~/+{}][+!1][-~1<<1]+/\[[^1]+\]/[([]+![])[1<<1<<
1]+(/|/[(1+{})[1+11>>>1]+[[]+{}][+!1][1]+([]+1/[])[1<<1>>1]
+([1<1]+[])[1+11>>>1+1]+[[!!1]+1][+[]][1-1]+([]+!!/!/)[1|1]
+(/1/[1]+[])[!1%1]+(-{}+{})[-1+1e1-1]+(1+[!!1])[1]+([]+1+{}
)[1<<1]+[!!/!!/+[]][+[]][1&1]]+/=/)[1e1+(1<<1|1)+(([]+/-/[(
!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1
]])[1^1]==+!1)]+(!![]+{})[1|1<<1]+[1+{}+1][!1+!1][(11>>1)+1
]](([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+
(!!1+[])[1^1]]))[1&.1][11>>>1]+([,][~1]+[])[1-~1]+[[]+{}][!
1.1%1][11111.1%11.1*111e11|!1]+(/1/+1/[1<1][1%1])[1^11]+[[]
,[]+{}][1<<1>>>1][1||1]+(/[<+>]/[1&1|1]+[1.1])[1/11.1&1.11]
However, in order to intrigue the readers I won't provide a solution for that. I honestly believe that you will easily understand how it works... and some can even surprise their beloved in cross-browser way ;)
P.S. Yet Another Obfuscator
Inspired by Jason's idea to create a universal obfuscating tool, I have written one more. You can find it at JSBin: http://jsbin.com/amecoq/2. It can obfuscate any text that contains numbers [0-9], small latin letters [a-z], and spaces. The string length is limited mostly with your RAM (at least the body of my answer was successfully obfuscated). The output is supported by Chrome, Firefox, and IE.
Hint: the tool uses different obfuscation approach than was presented above.

Why isn't the native code bit from the question being used? This one gives a 'v' in both Chrome and Firefox:
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1]>([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^(11+1+1)<<1]?([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1]:([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^(11+1+1)<<1]
Edit to support IE and do it without the ternary operator:
This one works in Chrome, IE, and FF. Builds an array and uses == to determine browser.
[([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1+(1^(11+1+1)<<1)],([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1],([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^(11+1+1)<<1]][((([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1<<1<<1<<1)+1<<1]==({}+[])[1^1])*1)+((([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1^11<<1)-1]==({}+[])[1^1])<<1)]
Readable:
[
//ie
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1+(1^(11+1+1)<<1)],
//ch
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1],
//ff
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^(11+1+1)<<1]
]
[
//ch?
((([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1<<1<<1<<1)+1<<1]==({}+[])[1^1])*1)+
//ff?
((([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1^11<<1)-1]==({}+[])[1^1])<<1)
]

This is about as close as I could get, unfortunately it violates the convention of the original obfuscation by making a call to unescape():
unescape((/%/+[])[1]+(/1/[1]+[])[1%1]+(+!1)+(+!1)+(1e1+(11*(1-~1)<<1)))
Teardown:
(/%/+[])[1] => "%"
(/1/[1]+[])[1%1] => "u"
(+!1) => "0"
(+!1) => "0"
(1e1+(11*(1-~1)<<1)) => "76"
===========================
unescape("%u0076") => "v"
Other ideas:
Somehow get to unescape("\x76")
Somehow convert 118 without calling String.fromCharCode()
Get the text from an exception with the word "Invalid" in it
Updates:
I started playing code golf and have been making it shorter, replacing parts with more 1s, etc.

Here's the part that generates the n/v:
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[1^11<<1]
In Firefox, ([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]]) evaluates to
"function test() {
[native code]
}"
while in Chrome it is
"function test() { [native code] }"
1^11<<1 equals 23. So due to Firefox's extra whitespace, this isn't quite enough to get to the 'v', and is instead 'n'.
And this is why you shouldn't rely on Function#toString behavior. ;)
EDIT:
Finally I found a reasonably obfuscated cross-browser version:
[[]+1/!1][1^1][1>>1]+({}+[])[1<<1^11>>1]+([]+!!-[])[1<<1]+[/~/+{}][+!1][-~1<<1]+([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1^11<<1)+(parseInt("010")<10?(1+1+1+1):0)]+([,][~1]+[])[1-~1]+[[]+{}][!1.1%1][11111.1%11.1*111e11|!1]+(/1/+1/[1<1][1%1])[1^11]+[[],[]+{}][1][+1]+(/<</[1]+[])[1/1.1&1]
This replaces the n/v section with:
([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]])[(1^11<<1)+(parseInt("010")<10?(1+1+1+1):0)]
which exploits differences in parseInt (apparently Firefox parses numbers starting with 0 as octal, while Chrome doesn't) to add 4 in Firefox's case, thus getting 'v' from 'native' in both cases (I can't find another 'v' :P).
The parseInt looks a little out of place, but that's the best I can do for now.

For the general use-case, if character casing isn't a big concern, I might be inclined to cheat a little.
Create function "c" which turns a number 0 .. 25 into a character.
c=function(v){for(var i in window){for(var ci in i){if(parseInt(i[ci],(10+11+11)+(1<<1)+(1<<1))==(v+10)){return i[ci]}}}};
For performance reasons, pre-cache the letters, if you want.
l=[];for(var i=0; i<(11+11)+(1<<1)+(1<<1);i++){l[i]=c(i);}
In the Chrome console, the resulting array looks like this:
> l;
["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "K", "l", "m", "n", "o", "p", "q", "r", "S", "t", "u", "v", "w", "x", "y", "Z"]
So ... your v might be l[10+10+1].
Alternatively, a general solution like this:
p=(function(){10%11}+[])[1+11+(1<<1)]; // "%"
u=(function(){club=1}+[])[1+11+(1<<1)]; // "u"
vc=p+u+(0+[])+(0+[])+((111>>1)+11+10+[]); // "%u0076"
unescape(vc);
Or, for this specific problem, maybe just:
(function(){v=1}+[])[10+(1<<1)]; // "v"

This gives a v in Chrome:
Object.getOwnPropertyNames(Object)[17][3];
And this does it in Firefox:
Object.getOwnPropertyNames(Object)[9][3]
They both pull it out of Object.prototype.preventExtensions(), so you could probably find a cross-browser way to reference that method. (It's the only 17-character name in Object.Prototype for starters.)
Feel free to build a more-obfuscated version of this and take all the credit for yourself, I'm out of time ;)

In chrome, the expression ([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]]) evaluates to "function test() { [native code] }", the [1^11<<1] evaluates to 23 (bitwise operators cause the variable to be truncated to 32 bits)

Related

Node.js SyntaxError: Invalid or unexpected token (but its weird)

I have never seen this error in my life
?
And I just found it so odd...
My code here is seen like this:
const no = require('noparenth');
(at index.js:1)
and I have literally never seen this in my life...
the library in question is one that just allows you to invoke a function without parenthesis. I made it so it would make it easier to code + make it faster.
the code is as seen here:
Function.prototype.valueOf = function() {
this.call(this);
return 0;
};
Thats it...
actual error:
C:\Users\summit\testing\index.js:1
��c
SyntaxError: Invalid or unexpected token

One possible way to get that type of characters is to save a file as UTF-16 LE and load it as if it were UTF-8.
If you use PowerShell, it's easy to produce a file in UTF-16 LE encoding, for example if you run echo "" > index.js. If you open the resulting file in VSCode, you can see at the bottom right that it's UTF-16 LE.
Note that the resulting file is not really empty, and you can verify that by inputting require('fs').readFileSync('index.js') in the Node.js REPL.
You'll get <Buffer ff fe 0d 00 0a 00> which, when interpreted in UTF-16 LE, consists of byte order mark (U+FEFF), carriage return (U+000D) and new line character (U+000A).
Even if you open that file with a text editor and replace everything with your text, the byte order mark will still be there (as long as you still save in UTF-16 LE). For example, if you save const x = 0 the buffer will become like ff fe 63 00 ..., notice that the ff fe still is there.
When a program attempts to load it as UTF-8, it will get <invalid character> <invalid character> c <null character> and so on.
What you're seeing in the output is exactly from <invalid character> <invalid character> c (Unicode: U+FFFD U+FFFD U+0063), obviously not a valid token in Node.js.
As for your actual function, it will only be invoked when you have type coercions (like +myFunction or myFunction + "") or if you directly call it like myFunction.valueOf().
And I expect it to make it (slightly) slower, because it calls a user-defined function rather than just native code.
And apart from the bad idea of modifying prototypes of objects you don't own, there is limited usefulness (if any) to this.
The this.call(this) means that you can't add more arguments to the function call.
Also, it's like myFunction.call(myFunction), I'm not sure why you would want to do that. And when used on a method of an object, it's like myObject.myMethod.call(myObject.myMethod) (with this equal to the function itself) rather than myObject.myMethod.call(myObject)
And it will break things that depend on the behavior of myFunction + "" etc.
It seems not that easy to find an example, nevertheless I found this example :
isNative(Math.sin) was originally supposed to return true, but after modifying the prototype the way shown above, it became 3

JSON.stringify and "\u2028\u2029" check?

Sometimes I see in a view source page ( html view source) this code:
if (JSON.stringify(["\u2028\u2029"]) === '["\u2028\u2029"]') JSON.stringify = function (a) {
var b = /\u2028/g,
c = /\u2029/g;
return function (d, e, f) {
var g = a.call(this, d, e, f);
if (g) {
if (-1 < g.indexOf('\u2028')) g = g.replace(b, '\\u2028');
if (-1 < g.indexOf('\u2029')) g = g.replace(c, '\\u2029');
}
return g;
};
}(JSON.stringify);
What is the problem with JSON.stringify(["\u2028\u2029"]) that it needs to be checked ?
Additional info :
JSON.stringify(["\u2028\u2029"]) value is "["  "]"
'["\u2028\u2029"]' value is also "["  "]"

I thought it might be a security feature. FileFormat.info for 2028 and 2029 have a banner stating
Do not use this character in domain names. Browsers are blacklisting it because of the potential for phishing.
But it turns out that the line and paragraph separators \u2028 and \u2029 respectively are treated as a new line in ES5 JavaScript.
From http://www.thespanner.co.uk/2011/07/25/the-json-specification-is-now-wrong/
\u2028 and \u2029 characters that can break entire JSON feeds since the string will contain a new line and the JavaScript parser will bail out
So you are seeing a patch for JSON.stringify. Also see Node.js JavaScript-stringify
Edit: Yes, modern browsers' built-in JSON object should take care of this correctly. I can't find any links to the actual source to back this up though. The Chromium code search doesn't mention any bugs that would warrant adding this workaround manually. It looks like Firefox 3.5 was the first version to have native JSON support, not entirely bug-free though. IE8 supports it too. So it is likely a now unnecessary patch, assuming browsers have implemented the specification correctly.

After reading both answers , here is the Simple visual explanation :
doing this
alert(JSON.stringify({"a":"sddd\u2028sssss"})) // can cause problems
will alert :
While changing the trouble maker to something else ( for example from \u to \1u)
will alert :
Now , let's invoke the function from my original Q ,
Lets try this alert(JSON.stringify({"a":"sddd\u2028sssss"})) again :
result :
and now , everybody's happy.

\u2028 and \u2029 are invisible Unicode line and paragraph separator characters. Natively JSON.stringify method converts these codes to their symbolic representation (as JavaScript automatically does in the strings), resulting in "["  "]". The code you have provided does not let JSON to convert the codes to symbols and preserves their \uXXXX representation in the output string, i.e. returning "["\u2028\u2029"]".

How to stop Firebug from truncating strings in the console?

While debugging, I frequently dump strings and arrays to the console. But in some cases, Firebug chops up string values, making it hard to be sure of the result.
For example, this code in the console:
console.log ( [
"123456789A123456789B123456789C123456789D123456789E123456789F123456789G",
"123456789A123456789B123456789C123456789D123456789E123456789F123456789G"
] );
Yields:
[ "123456789A123456789B123...89E123456789F123456789G",
"123456789A123456789B123...89E123456789F123456789G"
]
(Bad!)
A single string is okay. This:
console.log ("123456789A123456789B123456789C123456789D123456789E123456789F123456789G");
Yields:
123456789A123456789B123456789C123456789D123456789E123456789F123456789G
as expected.
But, arrays and objects get shortened.
How do I stop this behavior? Is this a bug? (My Google-Fu has failed, so far.)

Okay, after pawing through the list of Firebug Preferences (there's 204 of those, right now, and not in an apparent order), I found stringCropLength.
It defaults to 50, which makes sense, since the test strings were truncated to 123456789A123456789B123...89E123456789F123456789G, which is 49 characters long.
Opening about:config and setting extensions.firebug.stringCropLength to 0, stopped the strings from being truncated!
Note that according to Issue 5898: Introduce different string cropping preferences, this preference might affect a few things (for now). But, so far, I've seen no ill effects from having this set to no "cropping".

Use console.dir instead of console.log - the output has a + near it which lets you expand the string.

Use console.dir("....") function instead of console.log("...")
Also u might be intersted to look at firebug preferences

Using PEG Parser for BBCode Parsing: pegjs or ... what?

I have a bbcode -> html converter that responds to the change event in a textarea. Currently, this is done using a series of regular expressions, and there are a number of pathological cases. I've always wanted to sharpen the pencil on this grammar, but didn't want to get into yak shaving. But... recently I became aware of pegjs, which seems a pretty complete implementation of PEG parser generation. I have most of the grammar specified, but am now left wondering whether this is an appropriate use of a full-blown parser.
My specific questions are:
As my application relies on translating what I can to HTML and leaving the rest as raw text, does implementing bbcode using a parser that can fail on a syntax error make sense? For example: [url=/foo/bar]click me![/url] would certainly be expected to succeed once the closing bracket on the close tag is entered. But what would the user see in the meantime? With regex, I can just ignore non-matching stuff and treat it as normal text for preview purposes. With a formal grammar, I don't know whether this is possible because I am relying on creating the HTML from a parse tree and what fails a parse is ... what?
I am unclear where the transformations should be done. In a formal lex/yacc-based parser, I would have header files and symbols that denoted the node type. In pegjs, I get nested arrays with the node text. I can emit the translated code as an action of the pegjs generated parser, but it seems like a code smell to combine a parser and an emitter. However, if I call PEG.parse.parse(), I get back something like this:
[
[
"[",
"img",
"",
[
"/",
"f",
"o",
"o",
"/",
"b",
"a",
"r"
],
"",
"]"
],
[
"[/",
"img",
"]"
]
]
given a grammar like:
document
= (open_tag / close_tag / new_line / text)*
open_tag
= ("[" tag_name "="? tag_data? tag_attributes? "]")
close_tag
= ("[/" tag_name "]")
text
= non_tag+
non_tag
= [\n\[\]]
new_line
= ("\r\n" / "\n")
I'm abbreviating the grammar, of course, but you get the idea. So, if you notice, there is no contextual information in the array of arrays that tells me what kind of a node I have and I'm left to do the string comparisons again even thought the parser has already done this. I expect it's possible to define callbacks and use actions to run them during a parse, but there is scant information available on the Web about how one might do that.
Am I barking up the wrong tree? Should I fall back to regex scanning and forget about parsing?
Thanks

First question (grammar for incomplete texts):
You can add
incomplete_tag = ("[" tag_name "="? tag_data? tag_attributes?)
// the closing bracket is omitted ---^
after open_tag and change document to include an incomplete tag at the end. The trick is that you provide the parser with all needed productions to always parse, but the valid ones come first. You then can ignore incomplete_tag during the live preview.
Second question (how to include actions):
You write socalled actions after expressions. An action is Javascript code enclosed by braces and are allowed after a pegjs expression, i. e. also in the middle of a production!
In practice actions like { return result.join("") } are almost always necessary because pegjs splits into single characters. Also complicated nested arrays can be returned. Therefore I usually write helper functions in the pegjs initializer at the head of the grammar to keep actions small. If you choose the function names carefully the action is self-documenting.
For an examle see PEG for Python style indentation. Disclaimer: this is an answer of mine.

Regarding your first question I have tosay that a live preview is going to be difficult. The problems you pointed out regarding that the parser won't understand that the input is "work in progress" are correct. Peg.js tells you at which point the error is, so maybe you could take that info and go a few words back and parse again or if an end tag is missing try adding it at the end.
The second part of your question is easier but your grammar won't look so nice afterwards. Basically what you do is put callbacks on every rule, so for example
text
= text:non_tag+ {
// we captured the text in an array and can manipulate it now
return text.join("");
}
At the moment you have to write these callbacks inline in your grammar. I'm doing a lot of this stuff at work right now, so I might make a pullrequest to peg.js to fix that. But I'm not sure when I find the time to do this.

Try something like this replacement rule. You're on the right track; you just have to tell it to assemble the results.
text
= result:non_tag+ { return result.join(''); }

Coding convention in Javascript: use of spaces between parentheses

According to JSHint, a Javascript programmer should not add a space after the first parenthesis and before the last one.
I have seen a lot of good Javascript libraries that add spaces, like this:
( foo === bar ) // bad according to JSHint
instead of this way:
(foo === bar) // good according to JSHint
Frankly, I prefer the first way (more spaces) because it makes the code more readable. Is there a strong reason to prefer the second way, which is recommended by JSHint?

This is my personal preference with reasons as to why.
I will discuss the following items in the accepted answer but in reverse order.
note-one not picking on Alnitak, these comments are common to us all...
note-two Code examples are not written as code blocks, because syntax highlighting deters from the actual question of whitespace only.
I've always done it that way.
Not only is this never a good reason to defend a practice in programming, but it also is never a good reason to defend ANY idea opposing change.
JS file download size matters [although minification does of course fix that]
Size will always matter for Any file(s) that are to be sent over-the-wire, which is why we have minification to remove unnecessary whitespace. Since JS files can now be reduced, the debate over whitespace in production code is moot.
moot: of little or no practical value or meaning; purely academic.
moot definition
Now we move on to the core issue of this question. The following ideas are mine only, and I understand that debate may ensue. I do not profess that this practice is correct, merely that it is currently correct for me. I am willing to discuss alternatives to this idea if it is sufficiently shown to be a poor choice.
It's perfectly readable and follows the vast majority of formatting conventions in Javascript's ancestor languages
There are two parts to this statement: "It's perfectly readable,"; "and follows the vast majority of formatting conventions in Javascript's ancestor languages"
The second item can be dismissed as to the same idea of I've always done it that way.
So let's just focus on the first part of the statement It's perfectly readable,"
First, let's make a few statements regarding code.
Programming languages are not for computers to read, but for humans to read.
In the English language, we read left to right, top to bottom.
Following established practices in English grammar will result in more easily read code by a larger percentage of programmers that code in English.
NOTE: I am establishing my case for the English language only, but may apply generally to many Latin-based languages.
Let's reduce the first statement by removing the adverb perfectly as it assumes that there can be no improvement. Let's instead work on what's left: "It's readable". In fact, we could go all JS on it and create a variable: "isReadable" as a boolean.
THE QUESTION
The question provides two alternatives:
( foo === bar )
(foo === bar)
Lacking any context, we could fault on the side of English grammar and go with the second option, which removes the whitespace. However, in both cases "isReadable" would easily be true.
So let's take this a step further and remove all whitespace...
(foo===bar)
Could we still claim isReadable to be true? This is where a boolean value might not apply so generally. Let's move isReadable to an Float where 0 is unreadable and 1 is perfectly readable.
In the previous three examples, we could assume that we would get a collection of values ranging from 0 - 1 for each of the individual examples, from each person we asked: "On a scale of 0 - 1, how would you rate the readability of this text?"
Now let's add some JS context to the examples...
if ( foo === bar ) { } ;
if(foo === bar){};
if(foo===bar){};
Again, here is our question: "On a scale of 0 - 1, how would you rate the readability of this text?"
I will make the assumption here that there is a balance to whitespace: too little whitespace and isReadable approaches 0; too much whitespace and isReadable approaches 0.
example: "Howareyou?" and "How are you ?"
If we continued to ask this question after many JS examples, we may discover an average limit to acceptable whitespace, which may be close to the grammar rules in the English language.
But first, let's move on to another example of parentheses in JS: the function!
function isReadable(one, two, three){};
function examineString(string){};
The two function examples follow the current standard of no whitespace between () except after commas. The next argument below is not concerned with how whitespace is used when declaring a function like the examples above, but instead the most important part of the readability of code: where the code is invoked!
Ask this question regarding each of the examples below...
"On a scale of 0 - 1, how would you rate the readability of this text?"
examineString(isReadable(string));
examineString( isReadable( string ));
The second example makes use of my own rule
whitespace in-between parentheses between words, but not between opening or closing punctuation.
i.e. not like this examineString( isReadable( string ) ) ;
but like this examineString( isReadable( string ));
or this examineString( isReadable({ string: string, thing: thing });
If we were to use English grammar rules, then we would space before the "(" and our code would be...
examineString (isReadable (string));
I am not in favor of this practice as it breaks apart the function invocation away from the function, which it should be part of.
examineString(); // yes; examineString (): // no;
Since we are not exactly mirroring proper English grammar, but English grammar does say that a break is needed, then perhaps adding whitespace in-between parentheses might get us closer to 1 with isReadable?
I'll leave it up to you all, but remember the basic question:
"Does this change make it more readable, or less?"
Here are some more examples in support of my case.
Assume functions and variables have already been declared...
input.$setViewValue(setToUpperLimit(inputValue));
Is this how we write a proper English sentence?
input.$setViewValue( setToUpperLimit( inputValue ));
closer to 1?
config.urls['pay-me-now'].initialize(filterSomeValues).then(magic);
or
config.urls[ 'pay-me-now' ].initialize( fitlerSomeValues ).then( magic );
(spaces just like we do with operators)
Could you imagine no whitespace around operators?
var hello='someting';
if(type===undefined){};
var string="I"+"can\'t"+"read"+"this";
What I do...
I space between (), {}, and []; as in the following examples
function hello( one, two, three ){
return one;
}
hello( one );
hello({ key: value, thing1: thing2 });
var array = [ 1, 2, 3, 4 ];
array.slice( 0, 1 );
chain[ 'things' ].together( andKeepThemReadable, withPunctuation, andWhitespace ).but( notTooMuch );

There are few if any technical reasons to prefer one over the other - the reasons are almost entirely subjective.
In my case I would use the second format, simply because:
It's perfectly readable, and follows the vast majority of formatting conventions in Javascript's ancestor languages
JS file download size matters [although minification does of course fix that]
I've always done it that way.

Quoting Code Conventions for the JavaScript Programming Language:
All binary operators except . (period) and ( (left parenthesis) and [ (left bracket) should be separated from their operands by a space.
and:
There should be no space between the name of a function and the ( (left parenthesis) of its parameter list.

I prefer the second format. However there are also coding style standards out there that insist on the first. Given the fact that javascript is often transmitted as source (e.g. any client-side code), one could see a slightly stronger case with it than with other languages, but only marginally so.
I find the second more readable, you find the first more readable, and since we aren't working on the same code we should each stick as we like. Were you and I to collaborate then it would probably be better that we picked one rather than mixed them (less readable than either), but while there have been holy wars on such matters since long before javascript was around (in other languages with similar syntax such as C), both have their merits.

I use the second (no space) style most of the time, but sometimes I put spaces if there are nested brackets - especially nested square brackets which for some reason I find harder to read than nested curved brackets (parentheses). Or to put that another way, I'll start any given expression without spaces, but if I find it hard to read I insert a few spaces to compare, and leave 'em in if they helped.
Regarding JS Hint, I wouldn't worry- this particular recommendation is more a matter of opinion. You're not likely to introduce bugs because of this one.

I used JSHint to lint this code snippet and it didn't give such an advice:
if( window )
{
var me = 'me';
}

I personally use no spaces between the arguments in parentheses and the parentheses themselves for one reason: I use keyboard navigation and keyboard shortcuts. When I navigate around the code, I expect the cursor to jump to the next variable name, symbol etc, but adding spaces messes things up for me.
It's just personal preference as it all gets converted to the same bytecode/binary at the end of the day!

Standards are important and we should follow them, but not blindly.
To me, this question is about that syntax styling should be all about readability.
this.someMethod(toString(value),max(value1,value2),myStream(fileName));
this.someMethod( toString( value ), max( value1, value2 ), myStream( fileName ) );
The second line is clearly more readable to me.
In the end, it may come down to personal preference, but I would ask those who prefer the 1st line if they really make their choice because "they are used it" or because they truly believe it's more readable.
If it's something you are used to, then a short time investment into a minor discomfort for a long term benefit might be worth the switch.

We Keep Coding

JavaScript is the programming language of the Web.

What are JavaScript's builtin strings? - javascript

In chrome, the expression ([]+/-/[(!!1+[])[1>>1]+(!!1+[])[1<<1^1]+(!1+[])[1|1<<1]+(!!1+[])[1^1]]) evaluates to "function test() { [native code] }", the [1^11<<1] evaluates to 23 (bitwise operators cause the variable to be truncated to 32 bits)

Related

Node.js SyntaxError: Invalid or unexpected token (but its weird)

JSON.stringify and "\u2028\u2029" check?

How to stop Firebug from truncating strings in the console?

Using PEG Parser for BBCode Parsing: pegjs or ... what?

Coding convention in Javascript: use of spaces between parentheses

Categories

Resources