JavaScript/HTML in UPPER or lower case? - javascript

Is it better, for the sake of compatibility or JIT-compiling performance, to use UPPER CASE or lower case in JS/HTML? For example:
<DIV> my content </DIV>
<div> my content </div>
ALERT(DOCUMENT.LOCATION);
alert(document.location);
This is not a newbie question, I know lowercase is the de-facto standard. But since I've seen some uppercase JS+HTML, I was wondering which would be better to write in. (like SQL is fully uppercase?)

I don't think it'd make a difference, speed-wise.
XHTML: Lowercase tags is what the W3C specified.
JavaScript: It probably wouldn't work, because I have never seen anyone's code use all caps in JS.
SQL is fully uppercase to differentiate the actions, functions, etc from the actual data. You can use lowercase, but it becomes less readable (to some, me included).
IMO, wading through a heap of uppercase tags is less readable than lowercase tags. I'd say user agents don't care what case the tags are. Here's a little bit of history: when I made a website in 1999, uppercase tags were the standard.
You can still find some dodgy non updated websites out there that still write
'Use <B></B> to make text bold'

It is incorrect (in xhtml, at least) to use <DIV>...</DIV>; it is a <div>...</div>.
Likewise, I'd use lower-case in the javascript (for alert(document.location);), as that is their names ;-p

I can't imagine it making any difference compatibility- or performance-wise. I think some people find UPPERCASE easier to recognize as markup or code rather than content.
You can do some benchmarks, if you must.
(XHTML specifies lowercase as the standard, so if your goal is satisfying validators, then go with that)

JavaScript is (using Fx3.0) case sensitive.
var myURL = document.URL; // sets myURL to the current URL
var myURL2 = DOCUMENT.URL; // ReferenceError: "DOCUMENT" is not defined
HTML allows mixed-case tags; XHTML demands lower-case tags and attributes.
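The case sensitivity described above can also be checked without a browser. This is a minimal sketch that uses Math and a string method as stand-ins for document (an assumption, so it runs outside a page); the variable names are illustrative only.

```javascript
// Global names are case sensitive: `Math` is defined, `MATH` is not.
// `typeof` is used because it does not throw on an undeclared name.
const mathDefined = typeof Math !== 'undefined';
const upperMathDefined = typeof MATH !== 'undefined';

// Property names are case sensitive too.
const lowerMethod = typeof 'abc'.toUpperCase;
const upperMethod = typeof 'abc'.TOUPPERCASE;

console.log(mathDefined, upperMathDefined, lowerMethod, upperMethod);
```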

It certainly makes a difference with javascript since it is case sensitive.
The accepted community standard for HTML is lowercase, even though browsers don't care.
So be nice to the ones who have to read your code later!

I'd definitely go with lowercase wherever possible. I tend to like camel casing multi-word variable names, but even that can be eschewed in favor of underscores.

Related

The reason to use "-" instead of "_" in the HTML class and id attributes [duplicate]

In the past I've always used underscores for defining class and id attributes in HTML. Over the last few years I changed over to dashes, mostly to align myself with the trend in the community, not necessarily because it made sense to me.
I've always thought dashes have more drawbacks, and I don't see the benefits:
Code completion & Editing
Most editors treat dashes as word separators, so I can't tab through to the symbol I want. Say the class is "featured-product", I have to auto-complete "featured", enter a hyphen, and complete "product".
With underscores "featured_product" is treated as one word, so it can be filled in one step.
The same applies to navigating through the document. Jumping by words or double-clicking on class names is broken by hyphens.
(More generally, I think of classes and ids as tokens, so it doesn't make sense to me that a token should be so easily splittable on hyphens.)
Ambiguity with arithmetic operator
Using dashes breaks object-property access to form elements in JavaScript. This is only possible with underscores:
form.first_name.value='Stormageddon';
(Admittedly I don't access form elements this way myself, but when deciding on dashes vs underscores as a universal rule, consider that someone might.)
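To illustrate the property-access point, here is a small sketch that uses a plain object as a hypothetical stand-in for a form (real form elements are not needed to show the parsing difference).

```javascript
// Hypothetical stand-in for a form with one underscored and one dashed field name.
const form = {
    first_name: { value: '' },
    'first-name': { value: '' }
};

// Dot notation works for the underscored name.
form.first_name.value = 'Stormageddon';

// The dashed name needs bracket notation, because an expression such as
// form.first-name.value is read as the subtraction form.first - name.value.
form['first-name'].value = 'Stormageddon';

console.log(form.first_name.value, form['first-name'].value);
```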
Languages like Sass (especially throughout the Compass framework) have settled on dashes as a standard, even for variable names. They used underscores in the beginning too. The fact that these two are parsed differently strikes me as odd:
$list-item-10
$list-item - 10
Inconsistency with variable naming across languages
Back in the day, I used to write underscored_names for variables in PHP, ruby, HTML/CSS, and JavaScript. This was convenient and consistent, but again in order to "fit in" I now use:
dash-case in HTML/CSS
camelCase in JavaScript
underscore_case in PHP and ruby
This doesn't really bother me too much, but I wonder why these became so misaligned, seemingly on purpose. At least with underscores it was possible to maintain consistency:
var featured_product = $('#featured_product'); // instead of
var featuredProduct = $('#featured-product');
The differences create situations where we have to translate strings unnecessarily, along with the potential for bugs.
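The kind of string translation meant here might look like the following sketch. The helper names kebabToCamel and camelToKebab are made up for illustration, and the sketch assumes simple ASCII names.

```javascript
// Convert a dash-separated name (HTML/CSS style) to camelCase (JavaScript style).
function kebabToCamel(name) {
    return name.replace(/-([a-z0-9])/g, function (match, ch) {
        return ch.toUpperCase();
    });
}

// Convert a camelCase name back to a dash-separated one.
function camelToKebab(name) {
    return name.replace(/[A-Z]/g, function (ch) {
        return '-' + ch.toLowerCase();
    });
}

const asVariable = kebabToCamel('featured-product');
const asSelector = camelToKebab('featuredProduct');
console.log(asVariable, asSelector);
```

Every such conversion is one more place where a typo or an unexpected character can slip in, which is the potential for bugs the question refers to.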
So I ask: Why did the community almost universally settle on dashes, and are there any reasons that outweigh underscores?
There is a related question from back around the time this started, but I'm of the opinion that it's not (or shouldn't have been) just a matter of taste. I'd like to understand why we all settled on this convention if it really was just a matter of taste.
Code completion
Whether dash is interpreted as punctuation or as an opaque identifier depends on the editor of choice, I guess. However, as a personal preference, I favor being able to tab between each word in a CSS file and would find it annoying if they were separated with underscore and there were no stops.
Also, using hyphens allows you to take advantage of the |= attribute selector, which selects any element whose attribute value is exactly the given text, or begins with that text immediately followed by a dash:
span[class|="em"] { font-style: italic; }
This would make the following HTML elements have italic font-style:
<span class="em">I'm italic</span>
<span class="em-strong">I'm italic too</span>
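The matching rule of the |= selector can be sketched as a plain function. The name matchesDashMatch is hypothetical; the function only mirrors the rule (the whole attribute value equals the text, or starts with the text followed by a dash) and is not how browsers implement it.

```javascript
// Returns true when [attr|="prefix"] would match an attribute with this value.
function matchesDashMatch(attrValue, prefix) {
    return attrValue === prefix || attrValue.startsWith(prefix + '-');
}

const matches = {
    em: matchesDashMatch('em', 'em'),
    emStrong: matchesDashMatch('em-strong', 'em'),
    emphasis: matchesDashMatch('emphasis', 'em'),
    strongEm: matchesDashMatch('strong em', 'em')
};
console.log(matches);
```

Note that the rule applies to the whole attribute value, so a class attribute of "strong em" is not matched even though it contains the em class.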
Ambiguity with arithmetic operator
I'd say that access to HTML elements via dot notation in JavaScript is a bug rather than a feature. It's a terrible construct from the early days of terrible JavaScript implementations and isn't really a great practice. For most of the stuff you do with JavaScript these days, you'd want to use CSS Selectors for fetching elements from the DOM anyway, which makes the whole dot notation rather useless. Which one would you prefer?
var firstName = $('#first-name');
var firstName = document.querySelector('#first-name');
var firstName = document.forms[0].first_name;
I find the two first options much more preferable, especially since '#first-name' can be replaced with a JavaScript variable and built dynamically. I also find them more pleasant on the eyes.
The fact that Sass enables arithmetic in its extensions to CSS doesn't really apply to CSS itself, but I do understand (and embrace) the fact that Sass follows the language style of CSS (except for the $ prefix of variables, which of course should have been #). If Sass documents are to look and feel like CSS documents, they need to follow the same style as CSS, which uses dash as a delimiter. In CSS3, arithmetic is limited to the calc function, which goes to show that in CSS itself, this isn't an issue.
Inconsistency with variable naming across languages
All languages, be they markup languages, programming languages, styling languages or scripting languages, have their own style. You will find this even within sub-languages of language groups like XML, where e.g. XSLT uses lower-case with hyphen delimiters and XML Schema uses camel-casing.
In general, you will find that adopting the style that feels and looks most "native" to the language you're writing in is better than trying to shoe-horn your own style into every different language. Since you can't avoid having to use native libraries and language constructs, your style will be "polluted" by the native style whether you like it or not, so it's pretty much futile to even try.
My advice is to not find a favorite style across languages, but instead make yourself at home within each language and learn to love all of its quirks. One of CSS' quirks is that keywords and identifiers are written in lowercase and separated by hyphens. Personally, I find this very visually appealing and think it fits in with the all-lowercase (although no-hyphen) HTML.
Perhaps a key reason why the HTML/CSS community aligned itself with dashes instead of underscores is due to historical deficiencies in specs and browser implementations.
From a Mozilla doc published March 2001 (https://developer.mozilla.org/en-US/docs/Underscores_in_class_and_ID_Names):
The CSS1 specification, published in its final form in 1996, did not
allow for the use of underscores in class and ID names unless they
were "escaped." An escaped underscore would look something like this:
p.urgent\_note {color: maroon;}
This was not well supported by browsers at the time, however, and the
practice has never caught on. CSS2, published in 1998, also forbade
the use of underscores in class and ID names. However, errata to the
specification published in early 2001 made underscores legal for the
first time. This unfortunately complicated an already complex
landscape.
I generally like underscores but the backslash just makes it ugly beyond hope, not to mention the scarce support at the time. I can understand why developers avoided it like the plague. Of course, we don't need the backslash nowadays, but the dash-etiquette has already been firmly established.
I don't think anyone can answer this definitively, but here are my educated guesses:
Underscores require hitting the Shift key, and are therefore harder to type.
CSS selectors which are part of the official CSS specifications use dashes (such as pseudo-classes like :first-child and pseudo-elements :first-line), not underscores. Same thing for properties, e.g. text-decoration, background-color, etc. Programmers are creatures of habit. It makes sense that they would follow the standard's style if there's no good reason not to.
This one is further out on the ledge, but... Whether it's myth or fact, there is a longstanding idea that Google treats words separated by underscores as a single word, and words separated by dashes as separate words. (Matt Cutts on Underscores vs. Dashes.) For this reason, I know that my preference now for creating page URLs is to use-words-with-dashes, and for me at least, this has bled into my naming conventions for other things, like CSS selectors.
There are many reasons, but one of the most important is maintaining consistency.
I think this article explains it comprehensively.
CSS is a hyphen-delimited syntax. By this I mean we write things like font-size, line-height, border-bottom etc.
So:
You just shouldn’t mix syntaxes: it’s inconsistent.
There's been a clear uptick in hyphen-separated, whole-word segments of URLs over recent years. This is encouraged by SEO best practices. Google explicitly "recommend that you use hyphens (-) instead of underscores (_) in your URLs": http://www.google.com/support/webmasters/bin/answer.py?answer=76329.
As noted, different conventions have prevailed at different times in different contexts, but they typically are not a formal part of any protocol or framework.
My hypothesis, then, is that Google's position anchors this pattern within one key context (SEO), and the trend to use this pattern in class, id, and attribute names is simply the herd moving slowly in this general direction.
I think it depends on the programmer. Some people like to use dashes, others use underscores.
I personally use underscores (_) because I use it in other places too. Such as:
- JavaScript variables (var my_name);
- My controller actions (public function view_detail)
Another reason I use underscores is that in most IDEs two words separated by an underscore are treated as one word (and can be selected with a double-click).
Consider a refactoring where only btn should be renamed to bt:
case: btn_pink
search for btn as a whole word
result: btn
case: btn-pink
search for btn as a whole word
result: btn | btn-pink
case: btn-pink
search for btn with a regexp
\bbtn\b(?!-) (too hard to type)
result: btn
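The three search cases can be reproduced with JavaScript regular expressions. This is a sketch only; it assumes the editor's "whole word" search behaves like \b word boundaries, where the underscore counts as a word character and the hyphen does not.

```javascript
// "Whole word" search for btn.
const wholeWord = /\bbtn\b/;
// Whole word search that also rejects a following hyphen.
const wholeWordNoHyphen = /\bbtn\b(?!-)/;

const hits = {
    underscoreName: wholeWord.test('btn_pink'),
    hyphenName: wholeWord.test('btn-pink'),
    hyphenNameStrict: wholeWordNoHyphen.test('btn-pink'),
    plainStrict: wholeWordNoHyphen.test('btn')
};
console.log(hits);
```

The stricter pattern only guards against a hyphen after the word; a name such as pink-btn would need an additional check before it.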

Pascal Case (or Upper Camel Case) but allows name like HTMLInputElement?

I'm trying to find a definition of a "casing standard" that allows two capitalized letters to follow directly after each other. An example of this is HTMLInputElement that seemingly defies the rules. I prefer this, even though it breaks the strict naming rules, which would require it to be HtmlInputElement instead. Does anyone know if there's an official name for this casing subset because I don't think it adheres to either of these standards... but perhaps this is also one of the differences between Pascal Case and Upper Camel Case and it hasn't been defined well enough in the existing definitions I've found online?
I'm not sure if that casing standard exists, I think it's just an exception in camel case when there is an abbreviated term like HTML.
Here's another version of the question:
Acronyms in CamelCase
Apparently Microsoft guidelines state (according to this article):
When using acronyms, use Pascal case or camel case for acronyms more
than two characters long. For example, use HtmlButton or htmlButton.
However, you should capitalize acronyms that consist of only two
characters, such as System.IO instead of System.Io.
Do not use abbreviations in identifiers or parameter names. If you
must use abbreviations, use camel case for abbreviations that consist
of more than two characters, even if this contradicts the standard
abbreviation of the word.
Although this is much debated, I'm not sure there is a perfectly correct answer to your question. A lot of this is based on opinion and interpretation.
Since it appears that there is no name for this and I've given it a while, I'm going to officially name it "Acronym Case" :-)

Preventing DOM XSS

We recently on-boarded someone else's code which has since been tested, and failed, for DOM XSS attacks.
Basically the url fragments are being passed directly into jQuery selectors and enabling JavaScript to be injected, like so:
"http://website.com/#%3Cimg%20src=x%20onerror=alert%28/XSSed/%29%3E)"
$(".selector [thing="+window.location.hash.substr(1)+"]");
The problem is that this is occurring throughout their scripts and would need a lot of regression testing to fix e.g. if we escape the data if statements won't return true any more as the data won't match.
The JavaScript file in question is concatenated at build time from many smaller files so this becomes even more difficult to fix.
Is there a way to prevent these DOM XSS attacks with some global code without having to go through and debug each instance.
I proposed that we add a little regular expression at the top of the script to detect common chars used in XSS attacks and to simply kill the script if it returns true.
var xss = window.location.href.match(/(javascript|src|onerror|%|<|>)/g);
if(xss != null) return;
This appears to work but I'm not 100% happy with the solution. Does anyone have a better solution or any useful insight they can offer?
If you stick to the regular expression solution, which is far from ideal but may be the best choice given your constraints:
Rather than defining a regular expression matching malicious hashes (/(javascript|src|onerror|%|<|>)/g), I would define a regular expression matching sound hashes (e.g. /^[\w_-]*$/).
It will avoid false-positive errors (e.g. src_records), make it clear what is authorized and what isn't, and block more complex injection mechanisms.
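A minimal sketch of that allowlist approach follows. The function name isSafeFragment is hypothetical, and the pattern is the one suggested above with the redundant underscore dropped (\w already includes it). How the calling script reacts to a rejected fragment is left open.

```javascript
// Accept only letters, digits, underscores and hyphens in the fragment.
const SAFE_FRAGMENT = /^[\w-]*$/;

// Takes a value such as window.location.hash and reports whether it is safe
// to splice into a selector string.
function isSafeFragment(hash) {
    const value = hash.charAt(0) === '#' ? hash.slice(1) : hash;
    return SAFE_FRAGMENT.test(value);
}

const checks = {
    plain: isSafeFragment('#src_records'),
    encodedAttack: isSafeFragment('#%3Cimg%20src=x%20onerror=alert%28/XSSed/%29%3E'),
    rawAttack: isSafeFragment('#<img src=x onerror=alert(1)>')
};
console.log(checks);
```

Unlike the blocklist in the question, a harmless fragment that happens to contain src is accepted here, while both attack strings are rejected.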
Your issue is caused by the fact that jQuery may treat its input string as HTML, not only as a selector.
Use native document.querySelector() instead of jQuery.
If support for IE7 and below is important to you, you can try the Sizzle selector engine, which (unlike jQuery, and like native querySelector()) most likely does not interpret its input string as anything other than a selector.

need a JavaScript Regex that requires upper or lowercase letters

I have a regex that right now only allows lowercase letters, I need one that requires either lowercase or uppercase letters:
/(?=.*[a-z])/
You Can’t Get There from Here
I have a regex that right now only allows lowercase letters, I need one that requires either lowercase or uppercase letters: /(?=.*[a-z])/
Unfortunately, it is utterly impossible to do this correctly using Javascript! Read this flavor comparison’s ECMA column for all of what Javascript cannot do.
Theory vs Practice
The proper pattern for lowercase is the standard Unicode derived binary property \p{Lowercase}, and the proper pattern for uppercase is similarly \p{Uppercase}. These are normative properties that sometimes include non-letters in them under certain exotic circumstances.
Using just General Category properties, you can have \p{Ll} for Lowercase_Letter, \p{Lu} for Uppercase_Letter, and \p{Lt} for Titlecase_Letter. (Remember, there are three cases in Unicode, not two.) There is a standard alias \p{LC} which means [\p{Lu}\p{Lt}\p{Ll}].
If you want a letter that is not a lowercase letter, you could use (?=\P{Ll})\pL. Written in longhand that’s (?=\P{Lowercase_Letter})\p{Letter}. Again, these miss some of the Other_Lowercase code points that \p{Lowercase} recognizes. I must again stress that the Lowercase property is a superset of the Lowercase_Letter property.
Remember the previous paragraph, swapping in upper everywhere I have written lower, and you get the same thing for the capitals.
Possible Platforms
Because access to these essential properties is the minimal level of critical functionality necessary for Unicode regular expressions, some versions of Javascript implement them in just the way I have written them above. However, the standard for Javascript still does not require them, so you cannot in general count on them. This means that it is impossible to do this correctly under all implementations of Javascript.
Languages in which it is possible to do what you want done minimally include:
C♯ and Java (both only General Categories)
Ruby if and only if v1.9 or better (only binary properties, including General Categories)
PHP and PCRE (only General Category and Script properties plus a couple extras)
ICU’s C++ library and Perl, which both support all Unicode properties
Of those listed above, only the last line’s (ICU and Perl) strictly and completely meet all Level 1 compliance requirements (plus some of Levels 2 and 3) for the proper handling of Unicode in regexes. However, all of those I’ve listed in the previous paragraph’s bullets can easily handle most, and quite probably all, of what you need.
Javascript is not amongst those, however. Your version might, though, if you are very lucky and never have to run on a standard-only Javascript platform.
Summary
So very sadly, you cannot really use Javascript regexes for Unicode work unless you have a non-standard extension. Some people do, but most do not. If you do not, you may have to use a different platform until the relevant ECMA standard catches up with the 21st century (Unicode 3.1 came out a decade ago!!).
If anyone knows of a Javascript library that implements the Level 1 requirements of UTS#18 on Unicode Regular Expressions including both RL1.2 “Properties” and RL1.2a “Annex C: Compatibility Properties”, please chime in.
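The answer above predates it, but engines that implement ECMAScript 2018 or later accept Unicode property escapes in regexes that carry the u flag. A minimal sketch, assuming such a runtime (older engines reject these patterns with a syntax error); only the General Category forms are shown:

```javascript
// General Category escapes; they require the `u` flag.
const hasLower = /\p{Ll}/u;
const hasUpper = /\p{Lu}/u;
const hasTitle = /\p{Lt}/u;

const unicodeChecks = {
    lowerAccented: hasLower.test('\u00e9'),   // é, a lowercase letter
    upperGreek: hasUpper.test('\u03a9'),      // Ω, an uppercase letter
    titlecase: hasTitle.test('\u01c5'),       // ǅ, a titlecase letter
    titleIsUpper: hasUpper.test('\u01c5'),
    digitsLower: hasLower.test('123')
};
console.log(unicodeChecks);
```

The titlecase letter is neither \p{Lu} nor \p{Ll}, which is the "three cases, not two" point made above.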
Not sure if you mean mixed-case, or strictly lowercase plus strictly uppercase.
Here's the mixed-case version:
/^[a-zA-Z]+$/
And the strictly one-or-the-other version:
/^([a-z]+|[A-Z]+)$/
Try /(?=.*[a-z])/i
Note the i at the end, this makes the expression case insensitive.
Or add an uppercase range to your regex:
/(?=.*[a-zA-Z])/
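For comparison, here is how the patterns suggested in these answers behave on a few ASCII inputs. This is only a sketch; the sample strings are made up.

```javascript
// Lookahead from the question with the case-insensitive flag added.
const anyLetterInsensitive = /(?=.*[a-z])/i;
// Same lookahead with an explicit uppercase range.
const anyLetterRange = /(?=.*[a-zA-Z])/;
// Whole string must be letters, case may be mixed.
const mixedCaseOnly = /^[a-zA-Z]+$/;
// Whole string must be all lowercase or all uppercase.
const singleCaseOnly = /^([a-z]+|[A-Z]+)$/;

const regexChecks = {
    insensitiveUpper: anyLetterInsensitive.test('ABC'),
    rangeDigits: anyLetterRange.test('1234'),
    mixedOnMixed: mixedCaseOnly.test('aBc'),
    singleOnMixed: singleCaseOnly.test('aBc'),
    singleOnUpper: singleCaseOnly.test('ABC')
};
console.log(regexChecks);
```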

What is the 'standard' concerning style guidelines in JavaScript?

First of all, I'd like to say that I'm not trying to start a discussion on what is the best coding style.
Rather, I was wondering what is actually the global standard when it comes to styling your code. I've seen different websites and mainly open source organisations which have their own guideline page, which for example says that you should put } else { on the same line.
Are there some (un)written rules concerning code style which apply to all JavaScript being written? Is there a common preference for specific coding styles? Or is this really on a per-organisation basis?
These are widely accepted*:
Variable names contain only characters a-zA-Z_ (and sometimes $0-9)
Indent by 4 spaces or a tab character (Never mix!)
Constructor functions begin with an uppercase letter
Terminate every statement with a semicolon
Egyptian bracing
Always use blocks after if, else, etc., even for a single statement
One space after a comma, no space before
Assignment/comparison operators are surrounded by spaces
Avoid lines containing multiple statements
Use ' as a string delimiter
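Taken together, the conventions in this list might look like the sketch below. The Greeter constructor and its method are invented for illustration; the snippet only demonstrates the style and does not claim it as a standard.

```javascript
// Constructor functions begin with an uppercase letter.
function Greeter(name) {
    this.name = name;
}

// Four-space indent, Egyptian bracing, blocks after if/else,
// spaces around operators, single-quoted strings, semicolons.
Greeter.prototype.greet = function (other) {
    if (other) {
        return 'Hello, ' + other + ', from ' + this.name;
    } else {
        return 'Hello from ' + this.name;
    }
};

var greeter = new Greeter('Ada');
var message = greeter.greet('Bob');
console.log(message);
```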
From my experience, most conventions are subject to heated discussions.
So, no, there is no general rule. Some people even try to avoid semicolons completely.
* or are they? ;)
There isn't one standard. Are there any guidelines out there that you can follow if you want to keep your code consistent? How about Google's coding style? http://google-styleguide.googlecode.com/svn/trunk/javascriptguide.xml
We use that as basic guidelines at our company
Douglas Crockford's JavaScript: The Good Parts is widely used as a basis for coding guidelines.
His JSLint tool can be used to check whether code meets his recommendations.
Standard is the new standard.
I've been using it in all my projects.
