Python and JavaScript both allow developers to use or to omit semicolons. However, I've often seen it suggested (in books and blogs) that I should not use semicolons in Python, while I should always use them in JavaScript.
Is there a technical difference between how the languages use semicolons or is this just a cultural difference?
Semicolons in Python are totally optional (unless you want to have multiple statements in a single line, of course). I personally think Python code with semicolons at the end of every statement looks very ugly.
Now in JavaScript, if you don't write a semicolon, one is automatically inserted [1] at the end of the line. And this can cause problems. Consider:
function add(a, b) {
    return
        a + b
}
You'd think this returns a + b, but JavaScript just outsmarted you and sees this as:
function add(a, b) {
    return;
    a + b;
}
which returns undefined instead.
[1] See page 27, item 7.9 (Automatic Semicolon Insertion) of the ECMAScript Language Specification for more details and caveats.
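A minimal sketch of the behaviour described above, for anyone who wants to run it. The function names addBroken and addFixed are made up for illustration; the only thing that matters is the line break after return.

```javascript
// Hypothetical names; the line break after `return` is the whole point.
function addBroken(a, b) {
  return        // a semicolon is inserted here, so the function returns undefined
    a + b       // parsed as a separate, unreachable statement
}

function addFixed(a, b) {
  return (      // the open parenthesis keeps the expression attached to `return`
    a + b
  )
}

console.log(addBroken(1, 2)) // undefined
console.log(addFixed(1, 2))  // 3
```

Keeping the first token of the returned expression on the same line as return avoids the problem regardless of whether you write semicolons.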
This had me confused for the longest time. I thought it was just a cultural difference, and that everyone complaining about semicolon insertion being the worst feature in the language was an idiot. The oft-repeated example from NullUserException's answer didn't sway me because, disregarding indentation, Python behaves the same as JavaScript in that case.
Then one day, I wrote something vaguely like this:
alert(2)
(x = $("#foo")).detach()
I expected it to be interpreted like this:
alert(2);
(x = $("#foo")).detach();
It was actually interpreted like this:
alert(2)(x = $("#foo")).detach();
I now use semicolons.
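Here is a rough, self-contained reproduction of that surprise without jQuery or alert. The log function and the string "foo" are stand-ins I made up; the shape of the two lines is the same as above.

```javascript
// `log` is a hypothetical stand-in for alert; it records its argument and returns undefined.
const calls = [];
function log(value) {
  calls.push(value);
}

let x;
let error = null;
try {
  // No semicolon after log(2), and the next line starts with "(",
  // so both lines are parsed as one expression: log(2)(x = "foo").toUpperCase()
  log(2)
  (x = "foo").toUpperCase()
} catch (e) {
  error = e; // calling the undefined result of log(2) throws a TypeError
}

console.log(calls);                      // [ 2 ]
console.log(error instanceof TypeError); // true
```

No semicolon is inserted because the joined text is still grammatically valid, which is exactly why the bug only shows up at runtime.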
JavaScript will only [1] treat a newline as a semicolon in these cases:
It's a syntax error not to.
The newline is between the throw or return keyword and an expression.
The newline is between the continue or break keyword and an identifier.
The newline is between a variable and a postfix ++ or -- operator.
This leaves cases like this where the behaviour is not what you'd expect. Some people [2] have adopted conventions that only use semicolons where necessary. I prefer to follow the standard convention of always using them, now that I know it's not pointless.
[1] I've omitted a few minor details; consult ECMA-262 5e Section 7.9 for the exact description.
[2] Twitter Bootstrap is one high-profile example.
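As a small illustration of the postfix ++ rule in the list above (a sketch with made-up variable names, not taken from the spec text): a newline before ++ stops it from attaching to the preceding variable, so it becomes a prefix operator on the next line instead.

```javascript
let a = 1;
let b = 2;

// Parsed as `a; ++b;` because a line break is not allowed
// between an operand and a postfix ++.
a
++
b

console.log(a, b); // 1 3
```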
Aside from the syntactical issues, it is partly cultural. In Python culture any extraneous characters are an anathema, and those that are not white-space or alphanumeric, doubly so.
So things like leading $ signs, semi-colons, and curly braces, are not liked. What you do in your code though, is up to you, but to really understand a language it is not enough just to learn the syntax.
JavaScript is designed to "look like C", so semicolons are part of the culture. Python syntax is different enough to not make programmers feel uncomfortable if the semicolons are "missing".
The answer why you don't see them in Python code is: no one needs them, and the code looks cleaner without them.
Generally speaking, semicolons are just a tradition. Many new languages have just dropped them for good (take Python, Ruby, Scala, Go, Groovy, and Io for example). Programmers don't need them, and neither do compilers. If a language lets you not type an extra character you never needed, you will want to take advantage of that, won't you?
It's just that JavaScript's attempt to drop them wasn't very successful, and many prefer the convention to always use them, because that makes code less ambiguous.
It is mostly that Python looks nothing like Java, and JavaScript does, which leads people to treat it that way. It is very simple to not get into trouble using semicolons with JavaScript (Semicolons in JavaScript are optional), and anything else is FUD.
Both are dynamically typed languages, which is meant to improve readability.
Python Enhancement Proposal 8, or PEP 8, is a style guide for Python code. In 2001, Guido van Rossum, Barry Warsaw, and Nick Coghlan created PEP 8 to help Python programmers write consistent and readable code. Reference.
So in JavaScript we have the ECMAScript specification that describes how, if a statement is not explicitly terminated with a semicolon, sometimes a semicolon will be automatically inserted by the JavaScript engine (called “automatic semicolon insertion” (ASI)). Reference.
See this article from Google talking about JavaScript too.
Languages such as C++ will not work if semicolons are forgotten but other languages such as JavaScript will automatically include them for you.
I know from this article Do you recommend using semicolons after every statement in JavaScript?, that it is recommended to use semicolons and there are scenarios that can create unwanted ambiguities (such as dangling else in C++ when braces aren't used).
At some point in time there must have been a decision to make them optional (e.g. when the creators of JavaScript made the conscious choice to make it optional).
I would like to know why this decision was made and how it is beneficial to users of these languages.
Background: I am a novice coder and have only recently begun learning JavaScript.
EDIT: To the comments that are saying it is bad in JavaScript, I know. I'm asking why it is allowed to happen in the first place, if most people consider it bad practice.
Regarding JavaScript, Douglas Crockford explains the origins of the idea in this video. (It's a great talk and it's really worth your time to watch it if you intend to continue pursuing JavaScript.)
This is a direct quote from the talk:
Semicolon insertion was something intended to make the C syntax easier for beginners.
As far as how it's beneficial to users of the language, Crockford explains in detail a few reasons why it's not beneficial, but rather how it introduces very serious ambiguities and gotchas into the syntax. One of the most notable cases is when attempting to return an object literal using a braces-on-the-left coding style (source from the video):
return
{
ok: false
};
Which actually returns undefined, because semicolon insertion adds one after return, and the remaining intended object literal gets parsed as a code block, equivalent to this:
return;
{
ok: false;
}
Trying to make a language easier for beginners can be a great source of well-intentioned blunders.
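The object-literal case above is easy to check for yourself. This is only a sketch; the function names statusBroken and statusFixed are invented for the comparison.

```javascript
// Brace on its own line: a semicolon is inserted after `return`,
// and the braces are parsed as a block containing the label `ok`.
function statusBroken() {
  return
  {
    ok: false
  };
}

// Brace on the same line as `return`: an object literal is returned.
function statusFixed() {
  return {
    ok: false
  };
}

console.log(statusBroken()); // undefined
console.log(statusFixed());  // { ok: false }
```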
The author of the JavaScript language, Brendan Eich, has a blog post on this subject called The infernal semicolon on the topic of Automatic Semicolon Insertion (ASI).
Relevant quotes:
ASI is (formally speaking) a syntactic error correction procedure.
I wish I had made newlines more significant in JS back in those ten days in May, 1995. Then instead of ASI, we would be cursing the need to use infix operators at the ends of continued lines, or perhaps \ or brute-force parentheses, to force continuation onto a successive line. But that ship sailed almost 17 years ago.
My two cents: be careful not to use ASI as if it gave JS significant newlines.
Long ago, in the distant, dusty past, things like this were done primarily to make up for the fact that compile/link/run cycles were measured in hours at a minimum, and often ran more than a day. It could be (okay: was) extremely frustrating to wait hours for a result, only to find that the compiler had stopped at line 3 (or whatever) because of some silly typo.
To try to combat that, some compilers of the time tried to second-guess your intended meaning, so if a typo was minor enough (for some definition of "minor enough") it would assume it knew what you really intended, and continue compiling (and potentially even executing) despite an error.
Those who fail to study history are doomed to repeat it. A few who are just too arrogant to learn from history repeat it as well. There's probably room for considerable debate about the exact sort of character defect that would lead a language designer to make this mistake at the present time. There is much less room (none at all, really) for argument about whether it is a mistake though: it clearly is, and an inexcusable one at that.
In JavaScript, the semicolon is a statement separator, but so are newlines, so you don't need semicolons if you have one statement per line.
Other languages, like C++, only have ; as a separator, and whitespace such as newlines does nothing. There are pros and cons.
In C++ it means the syntax is consistent.
If you write
int x=0;
x++;
If you then compress it to one line, it's the same general syntax:
int x = 0; x++;
In JavaScript, if you write
var x=0
x++
then if you compressed it to one line,
var x=0 x++
would be a problem;
you'd need to write var x=0; x++
So the big question is whether whitespace is significant or not. Ideally a language would consistently use one mechanism, but JavaScript mixes the two, which leaves some ambiguity about when to use ;
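To see the difference between the newline and single-line versions without trusting anyone's word, you can ask the engine whether each snippet parses. This is a sketch; parses is a hypothetical helper built on the standard Function constructor, which compiles a string as a function body and throws a SyntaxError if it is not valid.

```javascript
// Hypothetical helper: returns true if `source` is a valid function body.
function parses(source) {
  try {
    new Function(source); // compiles the source but does not run it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

console.log(parses("var x = 0\nx++"));  // true: the newline lets a semicolon be inserted
console.log(parses("var x = 0 x++"));   // false: no newline, so nothing is inserted
console.log(parses("var x = 0; x++"));  // true: explicit separator
```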
I'm trying to understand how JS is actually parsed. But my searches either return someone's very vaguely documented project of a "parser/generator" (I don't even know what that means), or how to parse JS using a JS engine using the magical "parse" method. I don't want to scan through a bunch of code and try all my life to understand it (although I can, it would take too long).
I want to know how an arbitrary string of JS code is actually turned into objects, functions, variables etc. I also want to know the procedures and techniques that turn that string into stuff that gets stored, referenced, and executed.
Are there any documentation/references for this?
Parsers probably work in all sorts of ways, but fundamentally they first go through a stage of tokenisation, then give the result to the compiler, which turns it into a program if it can. For example, given:
function foo(a) {
    alert(a);
}
the parser will remove any leading whitespace to the first character, the letter "f". It will collect characters until it gets something that doesn't belong, the whitespace, that indicates the end of the token. It starts again with the "f" of "foo" until it gets to the "(", so it now has the tokens "function" and "foo". It knows "(" is a token on its own, so that's 3 tokens. It then gets the "a" followed by ")" which are two more tokens to make 5, and so on.
The only need for whitespace is between tokens that are otherwise ambiguous (e.g. there must be either whitespace or another token between "function" and "foo").
Once tokenisation is complete, it goes to the compiler, which sees "function" as an identifier, and interprets it as the keyword "function". It then gets "foo", an identifier that the language grammar tells it is the function name. Then the "(" indicates an opening grouping operator and hence the start of a formal parameter list, and so on.
Compilers may deal with tokens one at a time, or may grab them in chunks, or do all sorts of weird things to make them run faster.
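To make the tokenisation step a bit more concrete, here is a toy tokenizer. It is only a sketch under heavy assumptions: it recognises identifiers, integers and a handful of punctuation characters, silently skips everything else, and is nothing like what a real engine does. The name tokenize is made up.

```javascript
// Toy tokenizer: identifiers/keywords, integer literals, and a few punctuators.
// Whitespace (and anything unrecognised) is simply skipped.
function tokenize(source) {
  const tokenPattern = /[A-Za-z_$][A-Za-z0-9_$]*|\d+|[(){};,]/g;
  return source.match(tokenPattern) || [];
}

const tokens = tokenize("function foo(a) {\n    alert(a);\n}");
console.log(tokens);
// [ 'function', 'foo', '(', 'a', ')', '{', 'alert', '(', 'a', ')', ';', '}' ]
```

The first five tokens are the same ones counted in the walkthrough above: function, foo, (, a and ).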
You can also read How do C/C++ parsers work?, which gives a few more clues. Or just use Google.
While it doesn't correspond closely to the way real JS engines work, you might be interested in reading Douglas Crockford's article on Top Down Operator Precedence, which includes code for a small working lexer and parser written in the Javascript subset it parses. It's very readable and concise code (with good accompanying explanations) which at least gives you an outline of how a real implementation might work.
A more common technique than Crockford's "Top Down Operator Precedence" is recursive descent parsing, which is used in Narcissus, a complete implementation of JS in JS.
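For a feel of what recursive descent means, here is a tiny sketch that parses and evaluates arithmetic with +, * and parentheses. It is my own toy grammar, not code from Narcissus or from Crockford's article, and it handles nothing beyond non-negative integers. Each grammar rule becomes one function, and the functions call each other recursively.

```javascript
// Toy grammar (an assumption for this example):
//   sum     := product ("+" product)*
//   product := primary ("*" primary)*
//   primary := NUMBER | "(" sum ")"
function parseExpression(source) {
  const tokens = source.match(/\d+|[+*()]/g) || [];
  let pos = 0;

  function primary() {
    const tok = tokens[pos++];
    if (tok === "(") {
      const value = sum();
      if (tokens[pos++] !== ")") throw new SyntaxError("expected )");
      return value;
    }
    if (/^\d+$/.test(tok)) return Number(tok);
    throw new SyntaxError("unexpected token: " + tok);
  }

  function product() {
    let value = primary();
    while (tokens[pos] === "*") { pos++; value *= primary(); }
    return value;
  }

  function sum() {
    let value = product();
    while (tokens[pos] === "+") { pos++; value += product(); }
    return value;
  }

  const result = sum();
  if (pos < tokens.length) throw new SyntaxError("unexpected token: " + tokens[pos]);
  return result;
}

console.log(parseExpression("1 + 2 * (3 + 4)")); // 15
```

A real JavaScript parser would build a syntax tree instead of computing a number, but the structure of one function per grammar rule is the same idea.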
Maybe Esprima will help you understand how JS is parsed according to its grammar. It's available online.
I'm trying to get a JavaScript regex that matches x opening braces, then x closing braces, while allowing them to be nested in-between each other.
For example, it would match:
"{ a { q } }"
but not
"{ a { q } { }"
or
"{ } } { } {"
That being said, I have no idea how to do it with regexes, or if it's even possible.
The short answer to this is no. Regular expressions describe regular languages, which are strictly less expressive than context-free grammars, and matching arbitrarily deep nesting needs at least a context-free grammar, so it cannot be done with a true regex. You can, however, look for specific (non-arbitrary) nesting patterns.
http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx
The recursion problem here is, at its heart, the same reason you can't correctly parse HTML with regex. Like XML, the construct you've described is a context-free grammar; note its close similarity with the first example from the Wikipedia article.
I've heard there are engines out there that extend regex to offer support for arbitrarily nested elements, but this would make them something other than true regex. Anyway, I don't know of any such libraries for JavaScript. I think what you want is some kind of string-manipulation-based parser.
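A string-manipulation-based check can be very small. The following is a sketch of one way to do it, assuming only { and } matter and everything else is ignored; the name isBalanced is made up. It keeps a depth counter, which is exactly the "counting" that a true regex cannot do.

```javascript
// Returns true if every "{" has a matching later "}" and nothing closes too early.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth < 0) return false; // a closing brace with nothing open
    }
  }
  return depth === 0; // anything still open means unbalanced
}

console.log(isBalanced("{ a { q } }"));   // true
console.log(isBalanced("{ a { q } { }")); // false
console.log(isBalanced("{ } } { } {"));   // false
```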
AFAIK, you can’t really do this with regular expressions only.
However, Javascript’s String.replace method does have a nice feature that could allow you some level of recursion. If you pass a function as the second parameter, that function will be called for each match encountered. You could then perform the same replace on that match, passing along the same function, which would be called for each match inside that match, etc.
I’m too tired right now to write up an example that fits what you’re asking for — or even if it’s actually possible, so I’ll leave it at this possible hint, and further working out as an exercise to the reader.
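For what it's worth, here is one possible way to take the replace idea further. Note that it is a variation, not the callback approach described above: instead of recursing through a callback, it repeatedly strips innermost {...} pairs (pairs with no braces inside) until nothing changes, then checks whether any brace is left over. The function name is made up.

```javascript
// Repeatedly remove innermost "{...}" pairs; leftover braces mean the input was unbalanced.
function isBalancedByReplace(text) {
  let previous;
  let current = text;
  do {
    previous = current;
    current = current.replace(/\{[^{}]*\}/g, ""); // strip pairs that contain no braces
  } while (current !== previous);
  return !/[{}]/.test(current);
}

console.log(isBalancedByReplace("{ a { q } }"));   // true
console.log(isBalancedByReplace("{ a { q } { }")); // false
console.log(isBalancedByReplace("{ } } { } {"));   // false
```

The loop around the regex is doing the work that the regex alone cannot, so this does not contradict the "regular expressions only" caveat.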
That is not possible to do with real regular expressions, and even with full-blown PCRE the "counting problem" that you're describing is an example of something that you just can't do.
An old textbook I had in school said, "regular expressions can't count." That's not true of modern "supercharged" regular expression implementations with the "{n,m}" qualifiers, but note that the values in curly braces there are constants.
To do that, you need a more complicated automaton. Context-free grammars can represent languages like the one you describe, as can parsing expression grammars.
Yes, it's probably possible with Regexes. No, it isn't possible in Javascript Regexes. Yes, it's probably possible in .NET Regexes for example (Balancing Groups http://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.71).aspx ). No, I don't know how to do them. They give me migraine (and I'm not kidding here). They are quite extreme voodoo.
First of all, I'd like to say that I'm not trying to start a discussion on what is the best coding style.
Rather, I was wondering what is actually the global standard when it comes to styling your code. I've seen different websites and mainly open source organisations which have their own guideline page, which for example says that you should put } else { on the same line.
Are there some (un)written rules concerning code style which apply to all JavaScript being written? Is there a common preference for specific coding styles? Or is this really on a per-organisation basis?
These are widely accepted*:
Variable names contain only characters a-zA-Z_ (and sometimes $0-9)
Indent by 4 spaces or a tab character (Never mix!)
Constructor functions begin with an uppercase letter
Terminate every statement with a semicolon
Egyptian bracing
Always use blocks after if, else, etc., even for a single statement
One space after a comma, no space before
Assignment/comparison operators are surrounded by spaces
Avoid lines containing multiple statements
Use ' as a string delimiter
From my experience, most conventions are subject to heated discussions.
So, no, there is no general rule. Some people even try to avoid semicolons completely.
* or are they? ;)
There isn't one standard. Are there any guidelines out there that you can follow if you want to keep your code consistent? How about google's coding style? http://google-styleguide.googlecode.com/svn/trunk/javascriptguide.xml
We use that as basic guidelines at our company
Douglas Crockford's JavaScript: The Good Parts is widely used as a basis for coding guidelines.
His JSLint tool can be used to check whether code meets his recommendations.
Standard is the new standard.
I've been using it in all my projects.
In many situations, JavaScript parsers will insert semicolons for you if you leave them out. My question is, do you leave them out?
If you're unfamiliar with the rules, there's a description of semicolon insertion on the Mozilla site. Here's the key point:
If the first through the nth tokens of a JavaScript program form are grammatically valid but the first through the n+1st tokens are not and there is a line break between the nth tokens and the n+1st tokens, then the parser tries to parse the program again after inserting a virtual semicolon token between the nth and the n+1st tokens.
That description may be incomplete, because it doesn't explain @Dreas's example. Anybody have a link to the complete rules, or see why the example gets a semicolon? (I tried it in JScript.NET.)
This stackoverflow question is related, but only talks about a specific scenario.
Yes, you should use semicolons after every statement in JavaScript.
An ambiguous case that breaks in the absence of a semicolon:
// define a function
var fn = function () {
    //...
} // semicolon missing at this line
// then execute some code inside a closure
(function () {
    //...
})();
This will be interpreted as:
var fn = function () {
    //...
}(function () {
    //...
})();
We end up passing the second function as an argument to the first function and then trying to call the result of the first function call as a function. That final call fails with a "... is not a function" error at runtime.
Yes, you should always use semicolons. Why? Because if you end up using a JavaScript compressor that puts all your code on one line, any statement that relied on a newline instead of a semicolon will break.
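Here is a runnable version of that case, as a sketch. The seen array and the arg parameter are additions of mine so that you can observe what actually gets called.

```javascript
const seen = [];
let error = null;

try {
  var fn = function (arg) {
    seen.push(typeof arg); // records what the first function was called with
  } // semicolon missing here
  (function () {
    seen.push("closure ran");
  })();
} catch (e) {
  error = e;
}

console.log(seen);                       // [ 'function' ]
console.log(error instanceof TypeError); // true
console.log(fn);                         // undefined
```

The first function runs with the second one as its argument, the second one never runs at all, and fn is never assigned because the error is thrown before the assignment completes.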
Try http://www.jslint.com/; it will hurt your feelings, but show you many ways to write better JavaScript (and one of the ways is to always use semicolons).
What everyone seems to miss is that the semi-colons in JavaScript are not statement terminators but statement separators. It's a subtle difference, but it is important to the way the parser is programmed. Treat them like what they are and you will find leaving them out will feel much more natural.
I've programmed in other languages where the semi-colon is a statement separator and also optional as the parser does 'semi-colon insertion' on newlines where it does not break the grammar. So I was not unfamiliar with it when I found it in JavaScript.
I don't like noise in a language (which is one reason I'm bad at Perl) and semi-colons are noise in JavaScript. So I omit them.
I'd say consistency is more important than saving a few bytes. I always include semicolons.
On the other hand, I'd like to point out there are many places where the semicolon is not syntactically required, even if a compressor is nuking all available whitespace, e.g. at the end of a block.
if (a) { b() }
JavaScript automatically inserts semicolons whilst interpreting your code, so if you put the value of the return statement below the line, it won't be returned:
Your Code:
return
5
JavaScript Interpretation:
return;
5;
Thus, nothing is returned, because of JavaScript's auto semicolon insertion
I think this is similar to what the last podcast discussed. The "Be liberal in what you accept" means that extra work had to be put into the JavaScript parser to fix cases where semicolons were left out. Now we have a boatload of pages out there floating around with bad syntax that might break one day in the future when some browser decides to be a little more stringent on what it accepts. This type of rule should also apply to HTML and CSS. You can write broken HTML and CSS, but don't be surprised when you get weird and hard-to-debug behaviors when some browser doesn't properly interpret your incorrect code.
The article Semicolons in JavaScript are optional makes some really good points about not using semicolons in JavaScript. It deals with all the points that have been brought up by the answers to this question.
This is the very best explanation of automatic semicolon insertion that I've found anywhere. It will clear away all your uncertainty and doubt.
I use semicolon, since it is my habit.
Now I understand why I can't have string split into two lines... it puts semicolon at the end of each line.
No, only use semicolons when they're required.