Coding convention in Javascript: use of spaces between parentheses - javascript

According to JSHint, a Javascript programmer should not add a space after the first parenthesis and before the last one.
I have seen a lot of good Javascript libraries that add spaces, like this:
( foo === bar ) // bad according to JSHint
instead of this way:
(foo === bar) // good according to JSHint
Frankly, I prefer the first way (more spaces) because it makes the code more readable. Is there a strong reason to prefer the second way, which is recommended by JSHint?

This is my personal preference with reasons as to why.
I will discuss the following items in the accepted answer but in reverse order.
note-one not picking on Alnitak, these comments are common to us all...
note-two Code examples are not written as code blocks, because syntax highlighting detracts from the actual question of whitespace only.
I've always done it that way.
Not only is this never a good reason to defend a practice in programming, but it also is never a good reason to defend ANY idea opposing change.
JS file download size matters [although minification does of course fix that]
Size will always matter for Any file(s) that are to be sent over-the-wire, which is why we have minification to remove unnecessary whitespace. Since JS files can now be reduced, the debate over whitespace in production code is moot.
moot: of little or no practical value or meaning; purely academic.
moot definition
Now we move on to the core issue of this question. The following ideas are mine only, and I understand that debate may ensue. I do not profess that this practice is correct, merely that it is currently correct for me. I am willing to discuss alternatives to this idea if it is sufficiently shown to be a poor choice.
It's perfectly readable and follows the vast majority of formatting conventions in Javascript's ancestor languages
There are two parts to this statement: "It's perfectly readable,"; "and follows the vast majority of formatting conventions in Javascript's ancestor languages"
The second item can be dismissed as the same idea as "I've always done it that way."
So let's just focus on the first part of the statement: "It's perfectly readable,"
First, let's make a few statements regarding code.
Programming languages are not for computers to read, but for humans to read.
In the English language, we read left to right, top to bottom.
Following established practices in English grammar will result in more easily read code by a larger percentage of programmers that code in English.
NOTE: I am establishing my case for the English language only, but it may apply generally to many Latin-based languages.
Let's reduce the first statement by removing the adverb perfectly as it assumes that there can be no improvement. Let's instead work on what's left: "It's readable". In fact, we could go all JS on it and create a variable: "isReadable" as a boolean.
THE QUESTION
The question provides two alternatives:
( foo === bar )
(foo === bar)
Lacking any context, we could err on the side of English grammar and go with the second option, which removes the whitespace. However, in both cases "isReadable" would easily be true.
So let's take this a step further and remove all whitespace...
(foo===bar)
Could we still claim isReadable to be true? This is where a boolean value might not apply so generally. Let's move isReadable to a float where 0 is unreadable and 1 is perfectly readable.
In the previous three examples, we could assume that we would get a collection of values ranging from 0 - 1 for each of the individual examples, from each person we asked: "On a scale of 0 - 1, how would you rate the readability of this text?"
Now let's add some JS context to the examples...
if ( foo === bar ) { } ;
if(foo === bar){};
if(foo===bar){};
Again, here is our question: "On a scale of 0 - 1, how would you rate the readability of this text?"
I will make the assumption here that there is a balance to whitespace: too little whitespace and isReadable approaches 0; too much whitespace and isReadable approaches 0.
example: "Howareyou?" and "How are you ?"
If we continued to ask this question after many JS examples, we may discover an average limit to acceptable whitespace, which may be close to the grammar rules in the English language.
But first, let's move on to another example of parentheses in JS: the function!
function isReadable(one, two, three){};
function examineString(string){};
The two function examples follow the current standard of no whitespace between () except after commas. The next argument below is not concerned with how whitespace is used when declaring a function like the examples above, but instead the most important part of the readability of code: where the code is invoked!
Ask this question regarding each of the examples below...
"On a scale of 0 - 1, how would you rate the readability of this text?"
examineString(isReadable(string));
examineString( isReadable( string ));
The second example makes use of my own rule
whitespace in-between parentheses between words, but not between opening or closing punctuation.
i.e. not like this examineString( isReadable( string ) ) ;
but like this examineString( isReadable( string ));
or this examineString( isReadable({ string: string, thing: thing }));
If we were to use English grammar rules, then we would space before the "(" and our code would be...
examineString (isReadable (string));
I am not in favor of this practice as it breaks apart the function invocation away from the function, which it should be part of.
examineString(); // yes; examineString (); // no;
Since we are not exactly mirroring proper English grammar, but English grammar does say that a break is needed, then perhaps adding whitespace in-between parentheses might get us closer to 1 with isReadable?
I'll leave it up to you all, but remember the basic question:
"Does this change make it more readable, or less?"
Here are some more examples in support of my case.
Assume functions and variables have already been declared...
input.$setViewValue(setToUpperLimit(inputValue));
Is this how we write a proper English sentence?
input.$setViewValue( setToUpperLimit( inputValue ));
closer to 1?
config.urls['pay-me-now'].initialize(filterSomeValues).then(magic);
or
config.urls[ 'pay-me-now' ].initialize( filterSomeValues ).then( magic );
(spaces just like we do with operators)
Could you imagine no whitespace around operators?
var hello='someting';
if(type===undefined){};
var string="I"+"can\'t"+"read"+"this";
What I do...
I space between (), {}, and []; as in the following examples
function hello( one, two, three ){
return one;
}
hello( one );
hello({ key: value, thing1: thing2 });
var array = [ 1, 2, 3, 4 ];
array.slice( 0, 1 );
chain[ 'things' ].together( andKeepThemReadable, withPunctuation, andWhitespace ).but( notTooMuch );

There are few if any technical reasons to prefer one over the other - the reasons are almost entirely subjective.
In my case I would use the second format, simply because:
It's perfectly readable, and follows the vast majority of formatting conventions in Javascript's ancestor languages
JS file download size matters [although minification does of course fix that]
I've always done it that way.

Quoting Code Conventions for the JavaScript Programming Language:
All binary operators except . (period) and ( (left parenthesis) and [ (left bracket) should be separated from their operands by a space.
and:
There should be no space between the name of a function and the ( (left parenthesis) of its parameter list.
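As a small illustration of the two quoted rules, here is a sketch in which the function and variable names are made up for the example and are not taken from the conventions document:

```javascript
// Hypothetical names, used only to illustrate the two quoted rules.

// No space between the function name and the ( of its parameter list.
function scaleSum(left, right, factor) {
    // Binary operators such as + and * are separated from their operands by a space.
    return (left + right) * factor;
}

var values = [1, 2, 3];

// . ( and [ are the exceptions: no space is put before them.
var total = scaleSum(values[0], values[1], values[2]);

console.log(total);
```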

I prefer the second format. However there are also coding style standards out there that insist on the first. Given the fact that javascript is often transmitted as source (e.g. any client-side code), one could see a slightly stronger case with it than with other languages, but only marginally so.
I find the second more readable, you find the first more readable, and since we aren't working on the same code we should each stick with what we like. Were you and I to collaborate then it would probably be better that we picked one rather than mixed them (less readable than either), but while there have been holy wars on such matters since long before javascript was around (in other languages with similar syntax such as C), both have their merits.

I use the second (no space) style most of the time, but sometimes I put spaces if there are nested brackets - especially nested square brackets which for some reason I find harder to read than nested curved brackets (parentheses). Or to put that another way, I'll start any given expression without spaces, but if I find it hard to read I insert a few spaces to compare, and leave 'em in if they helped.
Regarding JS Hint, I wouldn't worry- this particular recommendation is more a matter of opinion. You're not likely to introduce bugs because of this one.

I used JSHint to lint this code snippet and it didn't give any such advice:
if( window )
{
var me = 'me';
}

I personally use no spaces between the arguments in parentheses and the parentheses themselves for one reason: I use keyboard navigation and keyboard shortcuts. When I navigate around the code, I expect the cursor to jump to the next variable name, symbol etc, but adding spaces messes things up for me.
It's just personal preference as it all gets converted to the same bytecode/binary at the end of the day!

Standards are important and we should follow them, but not blindly.
To me, this question comes down to the idea that syntax styling should be all about readability.
this.someMethod(toString(value),max(value1,value2),myStream(fileName));
this.someMethod( toString( value ), max( value1, value2 ), myStream( fileName ) );
The second line is clearly more readable to me.
In the end, it may come down to personal preference, but I would ask those who prefer the 1st line if they really make their choice because "they are used to it" or because they truly believe it's more readable.
If it's something you are used to, then a short time investment into a minor discomfort for a long term benefit might be worth the switch.

Related

Methods for de-obfuscating javascript that uses string concatenation for property names

I am trying to puzzle out a way to de-obfuscate javascript that looks like this:
https://jsfiddle.net/douglasg14b/4951br9f/2/
var testString = 'Test | String'
var wf6 = {
fq4: 'su',
k8d: 'bs',
l8z: 'tri',
cy1: 'ng',
t5j: 'te',
ol: 'stS',
x3q: 'tri',
l9x: 'ng',
gh: 'xO'
};
//Obfuscated
let test1 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](4,11);
//Normal
let test2 = testString.substring(4,11);
let test3;
//More complex obfuscation
(function moreComplex(){
let h = "i",
w = "nde",
T0 = "f",
hj = '|',
a = eval(wf6.t5j + wf6.ol + wf6.x3q + wf6.l9x).length;
//Obfuscated
test3 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](testString[h + w + wf6.gh + T0](hj), a);
//Normal
let test4 = testString.substring(testString.indexOf('|'), testString.length);
})();
$('.span1').text(test1);
$('.span2').text(test3);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span class="span1"></span><br>
<span class="span2"></span>
This is a small example; the file I'm working with is ~60k lines long and is full of this kind of obfuscation. Everywhere a string can be used as a property name, this kind of obfuscation is used.
The way I can think of doing this, is to evaluate all the string concatenations so they are turned into a readable equivalent. Though, I am not sure how to go about this and ignore all the other working code that exists between all the concatenations.
Thoughts?
Bonus question: Is there a commonly used name for this kind of obfuscation that might make searches a bit easier?
Edit: Added a more complex example.
You have the basic idea right: you have to partially-evaluate the program and precompute all the constant computations. In your case, the constant computations of main interest are the concatenation steps over values which don't change.
To do this, you need a program transformation system (PTS). This is a tool that will read/parse source code for a specified language and build an abstract syntax tree, allow you to specify transformations and analyses over the AST, run those, and then spit out the modified AST as source code again.
In your case, you obviously want a PTS that is wired to know JavaScript out of the box (rare) or is willing to accept a description of JavaScript and then read JavaScript (more typical) with the hope that you can build or get a JavaScript description easily. [I build a PTS that has JavaScript descriptions available, see my bio].
With that in hand, you need to:
code an analyzer that inspects each variable found in an expression to see if that expression is constant (e.g., "wf6"). To demonstrate it is constant, you will have to find the variable definition, and check that all the values used in the variable definition are themselves constants. If there is more than one variable definition, you might have to check that all definitions produce the same value. You need to check for side-effects on the variable (e.g., there are no function calls "foo(...,wf6,...)" which would allow the variable's value to be modified). You need to worry about whether an eval command to accomplish such a side effect exists [this is virtually impossible to do, so you often have to just ignore evals and assume they do not do such things]. Many PTSes will have a way to allow you to build such analyzers; some are easier than others.
For every constant valued variable, substitute the value of that variable in the code
For every constant-valued sub-expression after such substitutions, "fold" (calculate) the result of that expression and substitute that value for that subexpression and repeat until no more folding is possible. Obviously you want to do this for at least all "+" operators. [OP just modified his example; he'll want to do it for "eval" operators too when all its operands are constant].
You may have to iterate this process, as folding an expression may make it obvious that a variable now has a constant value
The above process is called "constant propagation" in the compiler literature and is a feature of many compilers.
In your case, you could restrict the constant folding to just string concatenations. However, once you have adequate machinery to do constant value propagation, doing all or most operators on constants isn't that hard. You may need this to undo other obfuscations involving constants since that seems to be the obfuscation style used on the code you are working on.
You'll need a special rule that transforms
var['string'](args)
into
var.string(args)
as a final step.
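The substitute, fold, and rewrite steps can be sketched on the question's own example. This is only a regex-based approximation, not the AST-based transformation system described above: it assumes the table (here called constantTable, a name chosen for the example) is never modified, which a real tool would have to prove, and all function names are hypothetical.

```javascript
// A regex-based sketch of substitute-then-fold. A real tool would work on an
// AST and prove that the table is constant; here that is simply assumed.

// Constant table copied (in part) from the question's example.
const constantTable = { fq4: 'su', k8d: 'bs', l8z: 'tri', cy1: 'ng' };

// Step 1: replace every `tableName.key` with the string literal it stands for.
function substituteConstants(source, tableName, table) {
    const memberAccess = new RegExp('\\b' + tableName + '\\.(\\w+)', 'g');
    return source.replace(memberAccess, (match, key) =>
        Object.prototype.hasOwnProperty.call(table, key)
            ? JSON.stringify(table[key])
            : match);
}

// Step 2: fold "a" + "b" into "ab", repeating until nothing changes.
function foldConcatenations(source) {
    const literalPair = /"((?:[^"\\]|\\.)*)"\s*\+\s*"((?:[^"\\]|\\.)*)"/;
    let previous;
    do {
        previous = source;
        source = source.replace(literalPair, '"$1$2"');
    } while (source !== previous);
    return source;
}

// Step 3: turn x["name"] into x.name when "name" is a plain identifier
// and the bracket follows an expression (so array literals are left alone).
function bracketToDot(source) {
    return source.replace(/([\w$\])])\[\s*"([A-Za-z_$][\w$]*)"\s*\]/g, '$1.$2');
}

function deobfuscate(source, tableName, table) {
    return bracketToDot(foldConcatenations(substituteConstants(source, tableName, table)));
}

const obfuscated = 'testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](4,11)';
console.log(deobfuscate(obfuscated, 'wf6', constantTable));
// → testString.substring(4,11)
```

Working on raw text like this will misfire on strings and comments that merely look like code, which is one reason the answer recommends a tool that operates on the syntax tree.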
You have another complication: that is knowing that you have all the JavaScript relevant to producing constant-valued variables. A single web page may have many included chunks of JavaScript; you will need all of them to demonstrate there are no side effects on a variable. I assume in your case you are sure you have it all.
With respect to producing known-constant values, you may have to worry about a tricky case: an expression that produces constant values from non-constant operands. Imagine the obfuscated expression was:
x=random(); // produce a value between 0 and 1
one=x+(1-x); // not constant by constant propagation, but constant by algebraic relations
teststring['st'[one]+'vu'[one+1]+'bz'[one]+...](4,11)
You can see it always computes 'substring' as a property. You can add a transformation rule that understands the trick used to compute "one", e.g., a rule for each algebraic trick used to compute known constants. Unfortunately for you, there's an infinite number of algebra theorems one can use to manufacture constants; how many are really used in your example bit of code? [Welcome to the problem of reverse engineering with a smart adversary].
Nope, none of this is "easy". Presumably that's why the obfuscation method used was chosen.

Javascript else on a new line

When creating an if block, I was wondering if there was any reason beyond personal preference to use the standard bracket formatting vs the second one I listed.
I've run code in the second format without any obvious issues (no ASI or unexpected errors), just looking for some clarification or insight on if there could be any possible issues in the future if I permanently switch to this style.
// Standard formatting
if (true) {
} else {
}
// Other formatting
if (true) {
}
else {
}
Spaces and tabs are not considered to be significant in Javascript in most cases. (I believe all, but I can't find a source for that)
You can technically put all of your code on one line (as most minification algorithms do), but that won't be very readable. In your own code, it comes down to solely personal preference, it will not cause any errors or cause the code to run slower if there are spaces (though more spaces will take longer to load if the JS is not minified).
Best practice is to keep your code style consistent throughout your projects.
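To make the "in most cases" concrete, here is a small sketch (function names are made up for the example). Putting else on its own line changes nothing, whereas a line break directly after return is one of the few places where automatic semicolon insertion does change the meaning:

```javascript
// Hypothetical functions for illustration only.

function elseOnSameLine(flag) {
    if (flag) {
        return 'yes';
    } else {
        return 'no';
    }
}

function elseOnNewLine(flag) {
    if (flag) {
        return 'yes';
    }
    else {
        return 'no';
    }
}

// A line break right after `return` ends the statement, so the string on the
// next line is never returned.
function returnThenNewline() {
    return
        'value';
}

console.log(elseOnSameLine(false) === elseOnNewLine(false)); // → true
console.log(returnThenNewline());                            // → undefined
```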
None; they are equivalent. If you wanted, you could put the code on one line as well and it would work. Usually people have personal preferences, and some companies require you to write code in a specific way in order to standardize it, so that anyone taking over your project knows what to expect.
You don't even need the brackets; it will just work:
http://jsfiddle.net/4ywahnof/1/
(function () {
var t = 1;
if (t == 1) alert("hi");
else alert("no hi");
})();

Using PEG Parser for BBCode Parsing: pegjs or ... what?

I have a bbcode -> html converter that responds to the change event in a textarea. Currently, this is done using a series of regular expressions, and there are a number of pathological cases. I've always wanted to sharpen the pencil on this grammar, but didn't want to get into yak shaving. But... recently I became aware of pegjs, which seems a pretty complete implementation of PEG parser generation. I have most of the grammar specified, but am now left wondering whether this is an appropriate use of a full-blown parser.
My specific questions are:
As my application relies on translating what I can to HTML and leaving the rest as raw text, does implementing bbcode using a parser that can fail on a syntax error make sense? For example: [url=/foo/bar]click me![/url] would certainly be expected to succeed once the closing bracket on the close tag is entered. But what would the user see in the meantime? With regex, I can just ignore non-matching stuff and treat it as normal text for preview purposes. With a formal grammar, I don't know whether this is possible because I am relying on creating the HTML from a parse tree and what fails a parse is ... what?
I am unclear where the transformations should be done. In a formal lex/yacc-based parser, I would have header files and symbols that denoted the node type. In pegjs, I get nested arrays with the node text. I can emit the translated code as an action of the pegjs generated parser, but it seems like a code smell to combine a parser and an emitter. However, if I call PEG.parse.parse(), I get back something like this:
[
[
"[",
"img",
"",
[
"/",
"f",
"o",
"o",
"/",
"b",
"a",
"r"
],
"",
"]"
],
[
"[/",
"img",
"]"
]
]
given a grammar like:
document
= (open_tag / close_tag / new_line / text)*
open_tag
= ("[" tag_name "="? tag_data? tag_attributes? "]")
close_tag
= ("[/" tag_name "]")
text
= non_tag+
non_tag
= [^\n\[\]]
new_line
= ("\r\n" / "\n")
I'm abbreviating the grammar, of course, but you get the idea. So, if you notice, there is no contextual information in the array of arrays that tells me what kind of a node I have and I'm left to do the string comparisons again even though the parser has already done this. I expect it's possible to define callbacks and use actions to run them during a parse, but there is scant information available on the Web about how one might do that.
Am I barking up the wrong tree? Should I fall back to regex scanning and forget about parsing?
Thanks
First question (grammar for incomplete texts):
You can add
incomplete_tag = ("[" tag_name "="? tag_data? tag_attributes?)
// the closing bracket is omitted ---^
after open_tag and change document to include an incomplete tag at the end. The trick is that you provide the parser with all needed productions to always parse, but the valid ones come first. You then can ignore incomplete_tag during the live preview.
Second question (how to include actions):
You write so-called actions after expressions. An action is JavaScript code enclosed in braces and is allowed after any pegjs expression, i.e. also in the middle of a production!
In practice actions like { return result.join("") } are almost always necessary because pegjs splits into single characters. Also complicated nested arrays can be returned. Therefore I usually write helper functions in the pegjs initializer at the head of the grammar to keep actions small. If you choose the function names carefully the action is self-documenting.
For an example see PEG for Python style indentation. Disclaimer: this is an answer of mine.
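As a rough sketch of the helper-function idea, the plain JavaScript below shows the kind of functions one might put in the initializer. The names flatten, node, and describe are hypothetical, and the parsed array is written by hand to stand in for what actions calling these helpers could return; it does not run pegjs itself.

```javascript
// Hypothetical helpers of the kind one might place in the grammar's initializer.

// Flatten the nested arrays of single characters that the parser returns.
function flatten(parts) {
    if (parts === null || parts === undefined) { return ''; }
    return Array.isArray(parts) ? parts.map(flatten).join('') : String(parts);
}

// Build a node that records what kind of thing was matched.
function node(type, fields) {
    return Object.assign({ type: type }, fields);
}

// Hand-written stand-in for what actions on open_tag / close_tag could return
// for "[img=/foo/bar][/img]" if each action called node(...) and flatten(...).
var parsed = [
    node('open_tag', { name: 'img', data: flatten(['/', 'f', 'o', 'o', '/', 'b', 'a', 'r']) }),
    node('close_tag', { name: 'img' })
];

// An emitter can now switch on node.type instead of re-comparing strings.
function describe(n) {
    return n.type === 'open_tag'
        ? 'open ' + n.name + ' (' + n.data + ')'
        : 'close ' + n.name;
}

console.log(parsed.map(describe).join(', '));
// → open img (/foo/bar), close img
```

Returning typed nodes from the actions keeps the HTML emitter separate from the grammar, which addresses the concern about mixing parser and emitter.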
Regarding your first question, I have to say that a live preview is going to be difficult. The problems you pointed out, namely that the parser won't understand that the input is a work in progress, are real. Peg.js tells you at which point the error is, so maybe you could take that info and go a few words back and parse again, or, if an end tag is missing, try adding it at the end.
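A simpler fallback than re-parsing is to catch the failure and show the raw text until the input parses again. The sketch below assumes only that the parser throws on invalid input; toyParse is a stand-in written for this example, not a pegjs-generated parser, and renderPreview and escapeHtml are hypothetical names.

```javascript
// `parse` is any function that returns HTML or throws on invalid input;
// a generated parser's parse method is assumed to behave that way.

function escapeHtml(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Try the real parser first; on failure fall back to the escaped raw text.
function renderPreview(parse, input) {
    try {
        return parse(input);
    } catch (error) {
        return escapeHtml(input);
    }
}

// Stand-in parser for demonstration only: accepts [b]...[/b], rejects anything else.
function toyParse(input) {
    var match = /^\[b\]([^\[\]]*)\[\/b\]$/.exec(input);
    if (!match) { throw new SyntaxError('incomplete or invalid BBCode'); }
    return '<b>' + escapeHtml(match[1]) + '</b>';
}

console.log(renderPreview(toyParse, '[b]bold[/b]')); // → <b>bold</b>
console.log(renderPreview(toyParse, '[b]bold[/b'));  // → [b]bold[/b
```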
The second part of your question is easier but your grammar won't look so nice afterwards. Basically what you do is put callbacks on every rule, so for example
text
= text:non_tag+ {
// we captured the text in an array and can manipulate it now
return text.join("");
}
At the moment you have to write these callbacks inline in your grammar. I'm doing a lot of this stuff at work right now, so I might make a pull request to peg.js to fix that. But I'm not sure when I will find the time to do this.
Try something like this replacement rule. You're on the right track; you just have to tell it to assemble the results.
text
= result:non_tag+ { return result.join(''); }

Emacs problematic JavaScript indentation

I'm following Douglas Crockford's code conventions, but I can't get the correct indentation in JS mode in Emacs. I tried to customize the indent options of the mode, and tried other modes like js3, but nothing seems to work.
When I have parentheses and have to break the expression, Emacs indents like this:
this.offices.each(this.addOfficesToMap,
                  this);
The convention that I'm following says that I should indent just 4 spaces when an expression is broken up, so the indentation should look like:
this.offices.each(this.addOfficesToMap,
    this);
Any idea of how I can change the indentation on broken up expressions?
The behaviour you want to change is hard-coded into a function called js--proper-indentation. An inelegant fix to your problem would be to replace the function in your .emacs:
(require 'cl)
(eval-after-load "js" '(defun js--proper-indentation (parse-status)
"Return the proper indentation for the current line."
(save-excursion
(back-to-indentation)
(cond ((nth 4 parse-status)
(js--get-c-offset 'c (nth 8 parse-status)))
((nth 8 parse-status) 0) ; inside string
((js--ctrl-statement-indentation))
((eq (char-after) ?#) 0)
((save-excursion (js--beginning-of-macro)) 4)
((nth 1 parse-status)
;; A single closing paren/bracket should be indented at the
;; same level as the opening statement. Same goes for
;; "case" and "default".
(let ((same-indent-p (looking-at
"[]})]\\|\\_<case\\_>\\|\\_<default\\_>"))
(continued-expr-p (js--continued-expression-p)))
(goto-char (nth 1 parse-status)) ; go to the opening char
(if (looking-at "[({[]\\s-*\\(/[/*]\\|$\\)")
(progn ; nothing following the opening paren/bracket
(skip-syntax-backward " ")
(when (eq (char-before) ?\)) (backward-list))
(back-to-indentation)
(cond (same-indent-p
(current-column))
(continued-expr-p
(+ (current-column) (* 2 js-indent-level)
js-expr-indent-offset))
(t
(+ (current-column) js-indent-level
(case (char-after (nth 1 parse-status))
(?\( js-paren-indent-offset)
(?\[ js-square-indent-offset)
(?\{ js-curly-indent-offset))))))
;; If there is something following the opening
;; paren/bracket, everything else should be indented at
;; the same level.
;; Modified code here:
(unless same-indent-p
(move-beginning-of-line 1)
(forward-char 4))
;; End modified code
(current-column))))
((js--continued-expression-p)
(+ js-indent-level js-expr-indent-offset))
(t 0)))) )
I have modified three lines of code towards the bottom of the function. If you want your indentation to be 8 chars instead of 4, change the (forward-char 4) line accordingly.
Note that js--proper-indentation (and the js library) requires the cl.el library, but that using eval-after-load mucks this up. So you need to explicitly require cl in your .emacs for this to work.
Note that this 'solution' hard codes a 4 space indentation only for the situation you indicate, and does not handle nested code at all. But knowing the point in the code that deals with your situation should at least point you towards the bit that needs work for a more sophisticated solution.
You can try https://github.com/mooz/js2-mode which is a fork of js2-mode with some improvements, such as better indentation. Another option is to read this article: http://mihai.bazon.net/projects/editing-javascript-with-emacs-js2-mode but honestly it's a better idea to replace the old js2-mode, as the fork has several improvements: https://github.com/mooz/js2-mode/wiki/Changes-from-the-original-mode
Hope this helps.
You can file a feature request on js3-mode at https://github.com/thomblake/js3-mode/issues
Do you have a link to a style guide?
BTW, while the indentation conventions vary from language to language, and the preferences can even vary between users (such as in the above case), there is a fair bit of overlap and there are often ways to write your code such that there is little disagreement.
E.g. your above code could be written:
this.offices.each(
    this.addOfficesToMap,
    this
);
or
this.offices.each
    (this.addOfficesToMap,
     this);
and most indentation styles would largely agree on how to indent it.

Testing with multiple regexps at the same time (for use in syntactic analysis)

I am writing a simple syntax highlighter in JavaScript, and I need to find a way to test with multiple regular expressions at the same time.
The idea is to find out which comes first, so I can determine the new set of expressions to look for.
The expressions could be something like:
/<%#/, /<%--/, /<!--/ and /<[a-z:-]/
First I tried a strategy where I combined the expressions in groups like:
/(<%#)|(<%--)|(<!--)|(<[a-z:-])/
That way I could find out which matched group was not undefined. But this approach breaks down when some of the subexpressions contain groups or backreferences.
So my question is this:
Does anyone know a good and reasonable way to look for matches with multiple regular expressions in a string?
Is there any particular reason why you can't tokenize the input and then test the beginning of each token to see what type it is for the purposes of highlighting? I think you're overthinking this one. A simple cascade of if-elseifs will cover this just fine:
if (token.startsWith("<%#")) {
// paint it red
}
else if (token.startsWith("<%--")) {
// paint it green
}
else if (token.startsWith("<!--")) {
// paint it blue
}
else if (token.matches("^<[a-z:-]")) {
// paint it black
}
The above is pseudocode and needs to be magically translated into JavaScript. I leave this as an exercise for the reader.
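One possible JavaScript translation of that cascade is sketched below. The function name classifyToken is hypothetical, the colours are the placeholders from the pseudocode, and the input is assumed to be already tokenized:

```javascript
// Hypothetical classifier; returns the placeholder colour for a token,
// or null when the token does not start with a recognised opener.
function classifyToken(token) {
    if (token.startsWith('<%#')) {
        return 'red';
    } else if (token.startsWith('<%--')) {
        return 'green';
    } else if (token.startsWith('<!--')) {
        return 'blue';
    } else if (/^<[a-z:-]/.test(token)) {
        return 'black';
    }
    return null;
}

console.log(classifyToken('<%-- note --%>')); // → green
console.log(classifyToken('<div>'));          // → black
```

Because each test is a separate check, groups or backreferences inside any one pattern cannot interfere with the others.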
ANTLR is an excellent grammar development system. There's a project to build a JavaScript back-end for it at http://code.google.com/p/antlr-javascript/
I agree with Welbog's answer to your regex question, but you can probably learn a lot about implementing JavaScript grammars by looking at the ANTLR generated ones.
