The question 'How does Google Closure Compiler handle quotes (string literals)?' could also be re-phrased like:
Why does Closure replace/swap single quotes with double quotes?
How does Closure decide what quote formatting/style to use?
(How) can I change this (default) behavior?
Note 1: this question is not about why Closure (or some other minifiers) have chosen to prefer double quotes (as is asked here and here).
Note 2: this question is not about single vs. double quote discussions; however, some understanding of what GCC does to our code (and why) is rather useful!
It is often stated (or asked why) that Google Closure Compiler (GCC) replaces single quotes with double quotes (even when the compilation_level is set to WHITESPACE_ONLY!!!):
example xmp_1.js:
alert('Hello world!'); // output: alert("Hello world!");
However... this is only half of 'the truth', because:
example xmp_2.js:
alert('Hello "world"!'); // output: alert('Hello "world"!');
// *NOT*: alert("Hello \"world\"!");
GCC is essentially a translator from 'your raw JavaScript' to 'smaller (and more efficient) JavaScript', so it does not 'blindly' replace single quotes with double quotes, but tries to choose an 'optimal quote character' (after all, one of its primary goals is to minify the script).
From the source-code (CompilerOptions.java) and this issue-report one can learn that:
If the string contains more single quotes than double quotes then the
compiler will wrap the string using double quotes and vice versa.
If the string contains no quotes or an equal number of single and
double quotes, then the compiler will default to using double quotes.
Like this example xmp_3.js:
alert('Hello "w\'orld"!'); // output: alert('Hello "w\'orld"!');
alert('Hello "w\'o\'rld"!'); // output: alert("Hello \"w'o'rld\"!");
Note how the above xmp_3 results in a 'mixed' output that uses both ' and " as outer quotation: the optimal choice followed by the default (when it didn't matter).
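To make the rule concrete, here is a minimal JavaScript sketch of the quote choice described in the quoted source comment. It is not Closure's actual implementation (the real code is Java and also escapes newlines and other special characters, which this sketch ignores); `quoteStringLiteral` and its `preferSingle` flag are hypothetical names, with `preferSingle` standing in for a single-quote default.

```javascript
// Sketch of the quote-selection rule (NOT Closure's real code).
// quoteStringLiteral / preferSingle are hypothetical names.
function quoteStringLiteral(value, preferSingle = false) {
  const singles = (value.match(/'/g) || []).length;
  const doubles = (value.match(/"/g) || []).length;
  let quote;
  if (singles > doubles) quote = '"';      // fewer escapes with double quotes
  else if (doubles > singles) quote = "'"; // fewer escapes with single quotes
  else quote = preferSingle ? "'" : '"';   // tie or no quotes: use the default
  const escaped = value
    .replace(/\\/g, "\\\\")                          // escape backslashes first
    .replace(new RegExp(quote, "g"), "\\" + quote);  // then the chosen quote
  return quote + escaped + quote;
}

console.log(quoteStringLiteral('Hello world!'));            // "Hello world!"
console.log(quoteStringLiteral('Hello "world"!'));          // 'Hello "world"!'
console.log(quoteStringLiteral(`Hello "w'o'rld"!`));        // "Hello \"w'o'rld\"!"
console.log(quoteStringLiteral(`Hello "w'o'rld"!`, true));  // 'Hello "w\'o\'rld"!'
```

The outputs line up with xmp_1 to xmp_3: the compiler only falls back to its default when the counts tie, which is exactly the case a single-quote option would change.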
How to change/override the default double quotes to single quotes?
As it turns out, there are some legitimate real-world cases where defaulting to single quotes would have been better. As explained in issue 836 (from Oct 8, 2012), referenced above:
The FT web app (app.ft.com) and the Economist app for Playbook deliver
JavaScript updates to the client along with other resources by
transmitting them as part of a JSON-encoded object. JSON uses double
quotes natively, so all the double quotes in the compiled JavaScript
need to be escaped. This inflates the size of the FT web app's JS by
about 20kB when transmitting a large update.
The reporter of the issue came with a gift: a patch that added the option prefer_single_quotes to change the default quote-character from double quote to single quote.
This issue was taken seriously enough that project member Santos considered changing the default from double quotes to single quotes ('and see if anybody complains')... TWICE (even after the reporter/patch contributor stated that he had implemented it as an option so that it wouldn't have any backward-compatibility consequences, since 'someone might be relying on strings being output with double quotes for some bizarre reason').
However, about one week later the patch was accepted (r2258), another week later reworked (r2257) and on Oct 30, 2012 Santos reported back that the option could now be enabled with:
--formatting=SINGLE_QUOTES
(So it is a third value for the formatting key, besides PRETTY_PRINT and PRINT_INPUT_DELIMITER.)
(Note: the current source code still contains numerous references to 'prefer_single_quotes' as well.)
Usage:
If you (download and) use the (local java) application:
java -jar compiler.jar --js xmp_1.js --formatting SINGLE_QUOTES
and you will see that: alert('Hello world!'); now compiles to alert('Hello world!');
However, at the time of writing, the Compiler Service API and UI (which most probably uses the API) located at http://closure-compiler.appspot.com do not accept this third formatting option, SINGLE_QUOTES (new, although it has existed for a year), and throw an error:
17: Unknown formatting option single_quotes.
After digging (again) through the source, it seems (I'm not a Java expert) that this is because jscomp/webservice/common/Protocol.java only accepts the older PRETTY_PRINT and PRINT_INPUT_DELIMITER:
/**
 * All the possible values for the FORMATTING key.
 */
public static enum FormattingKey implements ProtocolEnum {
  PRETTY_PRINT("pretty_print"),
  PRINT_INPUT_DELIMITER("print_input_delimiter"),
  ;
  // ... (rest of the enum omitted)
}
I will update this answer should this option become available in the API and/or UI.
Hope this helps and saves someone some time, since the only documentation and references Google can find about SINGLE_QUOTES are currently this one issue 836 and some comments in the source. Now it has some explanation on SO (where I'd expect it).
Related
I have an application which uses a JavaScript-based rules engine. I need a way to convert regular straight quotes into curly (or smart) quotes. It’d be easy to just do a string.replace for ["], but that would insert only one kind of curly quote.
The best way I could think of was to replace the first occurrence of a quote with a left curly quote, the next one with a right curly quote, and so on, alternating between left and right.
Is there a way to accomplish this using Javascript?
You could replace all quotes that precede a word character with the left quote, and all that follow a word character with the right quote.
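str = str.replace(/^"|"(?=\w)/g, "“");  // opening: at the start of the string or before a word character
str = str.replace(/(?<=\w)"|"$/g, "”"); // closing: after a word character or at the end of the string
                                        // (requires look-behind support; otherwise, see below)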
As pointed out in the comments below, this doesn't take punctuation into account, but easily can:
/(?<=[\w,.?!\)])"|"$/g
[Edit:] For regex engines that don't support look-behind (JavaScript only added it in ES2018), as long as you replace all the front-facing ones first, you have two options:
str = str.replace(/"/g, "”"); // Replace the rest with right curly quotes
// or...
str = str.replace(/\b"/g, "”"); // Replace any quotes after a word
// boundary with right curly quotes
(I've left the original solution above in case this is helpful to someone using a language that does support look-behind)
You might want to look at what Pandoc does—apparently with the --smart option, it handles quotes properly in all cases (including e.g. ’tis and ’twere).
I recently wrote a Javascript typography prettification engine that does, among other things, quote replacement; I wound up using basically the algorithm suggested by Renesis, but there’s currently a failing test up waiting for a smarter solution.
If you’re interested in cribbing my code (and/or submitting a patch based on work you’ve done), check it out: jsPrettify. jsprettify.prettifyStr does what you’re looking for. If you don’t want to deal with the Closure dependency, there’s an older version that runs on its own—it even works in Rhino.
'foo "foo bar" "bar"'.replace(/"([-a-zA-Z0-9 ]+)"/g, function(wholeMatch, m1){
return "“" + m1 + "”";
});
The following replaces quotes in alternating pairs (although this example leaves any unpaired quote unchanged):
str = str.replace(/"([^"]*)"/g, "“$1”");
This works well as long as the text you're texturizing doesn't already misuse the double quote. In English, double quotes are not nested inside each other (a quotation within a quotation uses single quotes).
I don't think something like that in general is easy at all, because you'd have to interpret exactly what each double-quote character in your content means. That said, what I'd do is collect all the text nodes I was interested in, and then go through and keep track of the "on/off" (or "odd/even"; whatever) nature of each double quote instance. Then you can know which replacement entity to use.
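The on/off bookkeeping described above can be sketched as follows, assuming the text nodes have already been collected into an array of strings (in a browser, `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)` is one way to gather them). `curlQuotesAcross` is a hypothetical helper; the point is that the open/closed state carries over from one string to the next, so a quotation split across nodes still gets a matching pair.

```javascript
// Hypothetical helper: alternates left/right curly quotes across a list of
// strings (e.g. the text of consecutive text nodes), keeping one shared state.
function curlQuotesAcross(texts) {
  let open = false; // false: the next straight quote opens; true: it closes
  return texts.map((text) =>
    text.replace(/"/g, () => {
      open = !open;
      return open ? "\u201C" : "\u201D"; // “ when opening, ” when closing
    })
  );
}

console.log(curlQuotesAcross(['He said "hel', 'lo" and "bye"']));
```

As the answer says, this only works if every straight quote in the content really is a quotation mark; an inch mark or a stray quote flips the state for the rest of the text.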
I didn't find the logic I wanted here, so here's what I ended up going with.
value = value.replace(/(^|\s)(")/g, "$1“"); // replace quotes that start a line or follow spaces
value = value.replace(/"/g, "”"); // replace rest of quotes with the back smart quote
I have a small textarea in which I need to replace straight quotes with curly (smart) quotes. I'm just executing this logic on keyup. I tried to make it behave like Microsoft Word.
Posting for posterity.
As suggested by @Steven Dee, I went to Pandoc.
I try to use a mature, tested tool whenever I can instead of baking my own regex. Hand-built regexes can be overly greedy or not greedy enough, and they may not be sensitive to word boundaries, commas, etc. Pandoc accounts for most of this and more.
From the command line (the --smart parameter turns on smart quotes; in pandoc 2.0 and later this flag was replaced by the smart extension, enabled by appending +smart to the input format):
pandoc --smart --standalone -o output.html input.html
...and I know a command-line script may or may not fit the OP's requirement of using JavaScript. (Related: How to execute shell command in Javascript)
I'm trying to use a PowerShell regex to pull version data from the AssemblyInfo.cs file. The regex below is my best attempt, but it only pulls the string [assembly: AssemblyVersion(". I've put this regex into a couple of web regex testers and it LOOKS like it's doing what I want; however, this is my first crack at using a PowerShell regex, so I could be looking at it wrong.
$s = '[assembly: AssemblyVersion("1.0.0.0")]'
$prog = [regex]::match($s, '([^"]+)"').Groups[1].Value
You also need to include the opening double quote; otherwise the group starts capturing at the beginning of the string and stops at the first " it reaches.
$prog = [regex]::match($s, '"([^"]+)"').Groups[1].Value
# note the added opening " before the capture group
Try this regex "([^"]+)"
Regex101 Demo
Regular expressions can get hard to read, so best practice is to make them as simple as they can be while still solving all possible cases you might see. You are trying to retrieve the only numerical sequence in the entire string, so we should look for that and bypass using groups.
$s = '[assembly: AssemblyVersion("1.0.0.0")]'
$prog = [regex]::match($s, '[\d\.]+').Value
$prog
1.0.0.0
For the generic solution of data between double quotes, the other answers are great. If I were parsing AssemblyInfo.cs for the version string however, I would be more explicit.
$versionString = [regex]::match($s, 'AssemblyVersion\("(\d+\.\d+\.\d+\.\d+)"').Groups[1].Value
$version = [version]$versionString
$versionString
1.0.0.0
$version
Major Minor Build Revision
----- ----- ----- --------
1 0 0 0
Update/Edit:
Related to parsing the version (again, if this is not a generic question about parsing text between double quotes): I would not actually have a version in the format M.m.b.r in my file, because I have always found Major.minor to be enough, and using a format like 1.2.* gives you some extra information without any effort.
See Compile date and time and Can I generate the compile date in my C# code to determine the expiry for a demo version?.
When using a * for the third and fourth part of the assembly version, then these two parts are set automatically at compile time to the following values:
third part is the number of days since 2000-01-01
fourth part is the number of seconds since midnight divided by two (although some MSDN pages say it is a random number)
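The two rules above are plain arithmetic. This JavaScript sketch (hypothetical `autoBuildAndRevision`) mirrors the documented formula only, not the compiler's own code, and assumes both values come from local time. It counts calendar days rather than dividing raw milliseconds, so daylight-saving shifts don't skew the day count.

```javascript
// Hypothetical helper mirroring the documented 1.2.* rules (assumes local time).
function autoBuildAndRevision(date) {
  const dayMs = 24 * 60 * 60 * 1000;
  // Whole calendar days since 2000-01-01 (UTC midnights avoid DST offsets).
  const build = Math.round(
    (Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()) -
      Date.UTC(2000, 0, 1)) / dayMs
  );
  // Seconds since local midnight, divided by two and truncated.
  const secondsSinceMidnight =
    date.getHours() * 3600 + date.getMinutes() * 60 + date.getSeconds();
  const revision = Math.floor(secondsSinceMidnight / 2);
  return { build, revision };
}

console.log(autoBuildAndRevision(new Date(2000, 0, 11, 0, 0, 10))); // { build: 10, revision: 5 }
```

Running it backwards (days to date, revision times two to time of day) is how you recover an approximate build timestamp from such a version number.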
Something to think about, I guess, in the larger picture of versioning: requiring 1.2.*, allowing 1.2 or 1.2.3, or only accepting 1.2.3.4, etc.
We have written a translation extraction tool that extracts strings we've marked for translation in TypeScript. The JavaScript tool reads our TypeScript files and uses a regex like:
fileContent.match(/this\.translate\((.*?)\);/);
(simplified for readability, this works fine)
The translation method takes 3 parameters: 1. The string to be translated, 2. any variables that might be interpolated, 3. description. The last 2 are optional.
Examples of the implementation:
this.translate('text to translate');
this.translate('long text' +
'over multiple lines');
this.translate(`text to translate with backticks for interpolation`);
this.translate(`some test with a ${variable}`, [variable]);
this.translate(`some test with a ${variable}`, [variable], 'Description');
We need to extract these 3 parameters from text in JavaScript and are having trouble parsing it. We currently use a regex to find the opening string character (', " or `) and then try to match the corresponding closing character, but that is hard to do.
I'm currently trying to use eval (the script runs on the command line, not in the browser), like this:
function getParameters(text, variables, description){
return {text: text, variables: variables, description: description}
}
toEval = string.replace('this.translate', 'getParameters');
eval(toEval);
This works perfectly if there are no variables, but when we pass in variables it complains that the variable is not defined.
Can anyone suggest a good/better way to deal with this text extraction?
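For comparison with the regex approach in the question, here is a minimal hand-written scanner for the part described as hard: once it sees an opening quote (', " or `), it walks forward to the closing quote of the same kind, stepping over backslash escapes, and follows + concatenations. This is a sketch under stated assumptions, not a real parser: `extractTranslateStrings` is hypothetical, it returns only the first argument as raw source text (escapes unprocessed), and it assumes template literals contain no nested backticks inside `${...}`. Because nothing is executed, the undefined `variable` that breaks the eval approach never matters.

```javascript
// Hypothetical helper: first argument of every this.translate(...) call,
// as raw source text. Assumes it is one or more quoted literals joined by "+".
function extractTranslateStrings(source) {
  const marker = "this.translate(";
  const results = [];
  let pos = source.indexOf(marker);
  while (pos !== -1) {
    let i = pos + marker.length;
    let text = "";
    let found = false;
    for (;;) {
      i = skipWhitespace(source, i);
      const quote = source[i];
      if (quote !== "'" && quote !== '"' && quote !== "`") break;
      // Scan to the closing quote of the same kind, skipping escaped chars.
      let j = i + 1;
      while (j < source.length && source[j] !== quote) {
        j += source[j] === "\\" ? 2 : 1;
      }
      text += source.slice(i + 1, j);
      found = true;
      i = skipWhitespace(source, j + 1);
      if (source[i] !== "+") break; // no "+": the first argument is complete
      i += 1;
    }
    if (found) results.push(text);
    pos = source.indexOf(marker, i);
  }
  return results;
}

function skipWhitespace(s, i) {
  while (i < s.length && /\s/.test(s[i])) i += 1;
  return i;
}
```

Extracting the optional second and third arguments would need bracket matching for `[variable]` as well, which is where a real parser starts to pay off.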
Instead of regex, you can use either babel or webpack to properly parse Javascript (or typescript) and extract all the information.
I have a webpack plugin that works on static strings only, but it should give a good starting point:
https://github.com/grassator/webpack-extract-translation-keys
According to Crockford's json.org, a JSON object is made up of members, which are made up of pairs.
Every pair is made of a string and a value, with a string being defined as:
A string is a sequence of zero or more
Unicode characters, wrapped in double
quotes, using backslash escapes. A
character is represented as a single
character string. A string is very
much like a C or Java string.
But in practice most programmers don't even know that a JSON key should be surrounded by double quotes, because most browsers don't require the use of double quotes.
Does it make any sense to bother surrounding your JSON keys in double quotes?
Valid Example:
{
"keyName" : 34
}
As opposed to the invalid:
{
keyName : 34
}
The real reason JSON keys must be quoted lies in the semantics of identifiers in ECMAScript 3.
Reserved words cannot be used as property names in Object Literals without quotes, for example:
({function: 0}) // SyntaxError
({if: 0}) // SyntaxError
({true: 0}) // SyntaxError
// etc...
While if you use quotes the property names are valid:
({"function": 0}) // Ok
({"if": 0}) // Ok
({"true": 0}) // Ok
Crockford himself explains it in this talk: they wanted to keep the JSON standard simple and didn't want to carry all those semantic restrictions into it:
....
That was when we discovered the
unquoted name problem. It turns out
ECMA Script 3 has a whack reserved
word policy. Reserved words must be
quoted in the key position, which is
really a nuisance. When I got around
to formulizing this into a standard, I
didn't want to have to put all of the
reserved words in the standard,
because it would look really stupid.
At the time, I was trying to convince
people: yeah, you can write
applications in JavaScript, it's
actually going to work and it's a good
language. I didn't want to say, then,
at the same time: and look at this
really stupid thing they did! So I
decided, instead, let's just quote the
keys.
That way, we don't have to tell
anybody about how whack it is.
That's why, to this day, keys are quoted in
JSON.
...
The ECMAScript 5th Edition standard fixes this: in an ES5 implementation, even reserved words can be used without quotes, both in object literals and in member access (obj.function is OK in ES5).
For the record, software vendors are implementing this standard these days; you can see which browsers include this feature in this compatibility table (see "Reserved words as property names").
Yes, it's invalid JSON and will be rejected in many cases; for example, jQuery 1.4+ has a check that makes unquoted JSON silently fail. Why not be compliant?
Let's take another example:
{ myKey: "value" }
{ my-Key: "value" }
{ my-Key[]: "value" }
...all of these would be valid with quotes, why not be consistent and use them in all cases, eliminating the possibility of a problem?
One more common example in the web developer world: There are thousands of examples of invalid HTML that renders in most browsers...does that make it any less painful to debug or maintain? Not at all, quite the opposite.
Also, @Matthew makes the best point of all in the comments below: this already fails. Unquoted keys throw a syntax error with JSON.parse() in all major browsers (and in any other implementation that follows the spec); you can test it here.
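A quick way to see this difference in any modern JavaScript engine: an object literal accepts unquoted keys (including reserved words, since ES5), while JSON.parse follows the JSON grammar and rejects them with a SyntaxError. `parsesAsJson` is just an illustrative wrapper.

```javascript
// Unquoted keys are fine in a JavaScript object literal (ES5+ even allows
// reserved words such as "function")...
const literal = { keyName: 34, function: 0 };

// ...but JSON.parse requires double-quoted keys.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(literal.keyName, literal.function);  // 34 0
console.log(parsesAsJson('{ "keyName": 34 }'));   // true
console.log(parsesAsJson('{ keyName: 34 }'));     // false
console.log(parsesAsJson("{ 'keyName': 34 }"));   // false: single quotes aren't JSON either
```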
If I understand the standard correctly, what JSON calls "objects" are actually much closer to maps ("dictionaries") than to actual objects in the usual sense. The current standard easily accommodates an extension allowing keys of any type, making
{
"1" : 31.0,
1 : 17,
1n : "valueForBigInt1"
}
a valid "object/map" of 3 different elements.
If not for this reason, I believe the designers would have made quotes around keys optional for all cases (maybe except keywords).
YAML, which is in fact a superset of JSON, supports what you want to do. Although it's a superset, it lets you keep things as simple as you want.
YAML is a breath of fresh air, and it may be worth your time to have a look at it. The best place to start is here: http://en.wikipedia.org/wiki/YAML
There are libraries for every language under the sun, including JS, e.g. https://github.com/nodeca/js-yaml
So I have 1000 lines of JavaScript. I need to turn it into a Java String so that I can output it (via System.out.println or whatever).
I'm looking for an online tool to escape all the quotes... something geared toward my specific need would be nice as I don't want other special characters changed. Lines like:
var rgx = /(\d+)(\d{3})/;
need to stay intact.
The situation mandates the JavaScript be put into a String so please no workarounds.
Here's a link which features Crockford's implementation of the quote() function. Use it to build your own JavaScript converter.
Edit: I also slightly modified the function to output an ASCII-safe string by default.
Edit2: Just a suggestion: It might be smarter to keep the JavaScript in an external file and read it at runtime instead of hardcoding it...
Edit3: And here's a fully-featured solution - just copy to a .html file and replace the dummy script:
<script src="quote.js"></script>
<script>
// this is the JavaScript to be converted:
var foo = 'bar';
var spam = 'eggs';
function fancyFunction() {
return 'cool';
}
</script>
<pre><script>
document.writeln(quote(
document.getElementsByTagName('script')[1].firstChild.nodeValue, true));
</script></pre>
You can compress the file using one of the available tools to achieve this effect:
YUI Compressor Online
Dean Edwards' Packer
Douglas Crockford's JSMIN
You can use the jsmin tool to compress the Javascript to a single line (hopefully), but it doesn't escape the quotes. This can be done with search/replace in an editor or the server side scripting language used.
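If you'd rather script that search/replace than do it by hand, here is a minimal JavaScript sketch (hypothetical `toJavaStringLiteral`). One caveat to the "don't change other characters" requirement: in Java source, a backslash followed by d is an illegal escape, so a line like var rgx = /(\d+)(\d{3})/; has to be written with doubled backslashes (\\d) inside the Java literal; at run time the Java String then contains the original \d. The sketch therefore doubles backslashes, escapes double quotes, and emits each source line as its own quoted piece joined with +.

```javascript
// Hypothetical helper: JavaScript source -> Java string-literal expression.
// Only backslashes and double quotes are escaped; line breaks become \n.
function toJavaStringLiteral(source) {
  const lines = source.split(/\r?\n/);
  return lines
    .map((line, idx) => {
      const escaped = line
        .replace(/\\/g, "\\\\") // \ becomes \\ (must run before the next step)
        .replace(/"/g, '\\"');  // " becomes \"
      // Keep a \n escape at the end of every line except the last.
      const newline = idx < lines.length - 1 ? "\\n" : "";
      return '"' + escaped + newline + '"';
    })
    .join(" +\n");
}

console.log(toJavaStringLiteral('var rgx = /(\\d+)(\\d{3})/;\nalert("hi");'));
// "var rgx = /(\\d+)(\\d{3})/;\n" +
// "alert(\"hi\");"
```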
So everything I tried ended up breaking the javascript. I finally got it to work by doing the following:
Using Notepad++:
Hit Shift + Tab a bunch of times to unindent every line
Do View -> Show End Of Line
Highlight the LF char and do a Replace All to replace with empty string
Repeat for the CR char
Highlight a " (quote character) and do a Replace All with \" (escaped quote)... just typing the quote character into the Replace prompt only grabbed some of the quotes for some reason.
Now you have one enormously long line. I ended up having to break that one string into lines of about 2000 characters each; the crazy long line was killing IE and/or breaking Java's String limit.