Why are charAt() and charCodeAt() called safe? - javascript

I was learning about javascript string methods here.
Under section Extracting String Characters, it said:
There are 2 safe methods for extracting string characters:
charAt(position)
charCodeAt(position)
The questions here are:
Why are these methods called safe?
What are these methods protecting from?

There are two ways to access a character from a string.
// Bracket Notation
"Test String1"[6]
// Real Implementation
"Test String1".charAt(6)
It is a bad idea to use brackets, for these reasons (Source):
This notation does not work in IE7. The first code snippet will return undefined in IE7. If you happen to use the bracket notation for strings all over your code and you want to migrate to .charAt(pos), this is a real pain: Brackets are used all over your code and there's no easy way to detect if that's for a string or an array/object.
You can't set the character using this notation. As there is no warning of any kind, this is really confusing and frustrating. If you were using the .charAt(pos) function, you would not have been tempted to do it.
Also, it can produce unexpected results in edge cases
console.log('hello' [NaN]) // undefined
console.log('hello'.charAt(NaN)) // 'h'
console.log('hello' [true]) //undefined
console.log('hello'.charAt(true)) // 'e'
Basically, it's a short-cut notation that is not fully implemented across all browsers.
Note, you are not able to write characters using either method. However, that functionality is a bit easier to understand with the .charAt() function which, in most languages, is a read-only function.
So, for compatibility reasons, .charAt() is considered safe.
Source
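Beyond the NaN and true cases above, the two notations also differ for an out-of-range index. The snippet below is only a small sketch of that difference; compareAccess is a hypothetical helper name that does not appear in the answers.

```javascript
// Hypothetical helper: read the same position with all three approaches.
function compareAccess(str, index) {
  return {
    bracket: str[index],               // undefined when out of range
    charAt: str.charAt(index),         // '' (empty string) when out of range
    charCodeAt: str.charCodeAt(index), // NaN when out of range
  };
}

console.log(compareAccess('hello', 1));  // bracket 'e', charAt 'e', charCodeAt 101
console.log(compareAccess('hello', 10)); // bracket undefined, charAt '', charCodeAt NaN
```

So charAt() always returns a string, while bracket access can return undefined, which matters if the result is concatenated or has string methods called on it.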
Speed Test: http://jsperf.com/string-charat-vs-bracket-notation
Testing in Chrome 47.0.2526.80 on Mac OS X 10.10.4
Test | Code | Ops/sec | Result
String charAt | testCharAt("cat", 1); | 117,553,733 ±1.25% | fastest
String bracket notation | testBracketNotation("cat", 1); | 118,251,955 ±1.56% | fastest


An example of how to use RGI_Emoji in a Regex

I'm currently looking at regexes and emojis, and I'd like to use Unicode property escapes to simplify the task.
In https://unicode.org/reports/tr18/#Full_Properties, it lists a number of emoji properties such as Emoji and Emoji_Presentation etc.
Creating a regex using these patterns works:
const re = /\p{Emoji}/gu
The same page also lists RGI_Emoji, which is
The set of all emoji (characters and sequences) covered by ED-20, ED-21, ED-22, ED-23, ED-24, and ED-25.
or basic emojis, modifiers, etc, which seems to cover all use cases that I'm looking at.
However, creating a regex using this:
const re = /\p{RGI_Emoji}/gu
Gives a SyntaxError:
Uncaught SyntaxError: invalid property name in regular expression
The unicode page does mention that
Properties marked with * are properties of strings, not just single code points.
which RGI_Emoji is marked as. My knowledge of unicode isn't amazing, so I'm not sure of the proper way to use this.
Is it possible to use RGI_Emoji in a regex, and if so, what's the correct way to use it?
RGI_Emoji is not available in JavaScript yet.
It is mentioned on top of the Full Properties table that,
Properties marked with * are properties of strings, not just single code points.
Support for the following sequence properties is being proposed in proposal-regexp-unicode-sequence-properties. The proposal is at stage 2, i.e. not part of the ECMAScript specification yet, and hence not available.
RGI_Emoji
Basic_Emoji
Emoji_Keycap_Sequence
RGI_Emoji_Modifier_Sequence
RGI_Emoji_Flag_Sequence
RGI_Emoji_Tag_Sequence
RGI_Emoji_ZWJ_Sequence
To further confirm, check the available \p{UnicodeBinaryPropertyName} values in the latest ECMAScript specification. Only the following emoji-related character properties are available:
Emoji
Emoji_Component
EComp
Emoji_Modifier
EMod
Emoji_Modifier_Base
EBase
Emoji_Presentation
You'll have to form a regular expression with Unicode ranges covering the ED-20, ED-21, ED-22, ED-23, ED-24, and ED-25 Unicode sets, as suggested by @JosefZ in a comment.
This discussion may help: JavaScript regular expression for Unicode emoji
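As a rough illustration of the code point properties listed above, the sketch below uses \p{Emoji} and \p{Emoji_Presentation} with the u flag. It also probes for the later `v` (unicodeSets) flag from ES2024, which postdates this answer and accepts properties of strings such as RGI_Emoji in engines that implement it. matchesRgiEmoji is a hypothetical helper name, and it returns null when the engine rejects the flag.

```javascript
// Code point properties are available with the u flag.
const emoji = /\p{Emoji}/u;
const emojiPresentation = /\p{Emoji_Presentation}/u;

console.log(emoji.test('😀'));             // true
console.log(emoji.test('7'));              // true: digits carry the Emoji property
console.log(emojiPresentation.test('7'));  // false
console.log(emojiPresentation.test('😀')); // true

// Hypothetical helper: RGI_Emoji is a property of strings, so it needs the
// v flag. Building the RegExp from a string keeps older engines from failing
// at parse time; they throw a SyntaxError here instead, which is caught.
function matchesRgiEmoji(text) {
  try {
    return new RegExp('^\\p{RGI_Emoji}$', 'v').test(text);
  } catch (err) {
    return null; // engine without v flag support
  }
}

console.log(matchesRgiEmoji('👍🏽')); // true where the v flag is supported, otherwise null
```

Note that \p{Emoji} alone is a poor emoji detector, since it also matches digits, # and *.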
The emoji properties were only added to UTS #18 relatively recently (mid 2020), and this involved a significant change in Unicode's properties model in that it involved formally defining for the first time properties of strings. RGI_Emoji is a binary-valued property of strings of characters. A potential issue for use of string properties in regex is that the set corresponding to a string property is potentially a vast number of strings. To avoid potential problems in existing implementations, UTS #18 allows for use of the syntax \m{Property_Name} for string properties. See https://www.unicode.org/reports/tr18/#Resolving_Character_Ranges_with_Strings for more information.
It's possible that the implementation you're using has not been fully updated for Rev. 21 of UTS #18, with support for all new properties, or that it requires you to use the \m syntax for string properties.
The online Unicode UnicodeSet utility does support enumerating string results of a regex using the RGI_Emoji property:
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BRGI_Emoji%7D&g=&i=

How can I replace some calls to JavaScript's eval() with Ext.decode()?

We are trying to get rid of all of our eval() calls in our JavaScript. Unfortunately, I am not much of a JavaScript programmer, and I need some help.
Many of our eval() calls operate on strings, outputs from a web service, that are very JSON-like, for example, we might eval the following string:
ClassMetaData['Asset$Flex'] = {
fields: {
}
,label: 'Flex Fields'
};
I've seen various suggestions on the Internet suggesting Ext.decode(). The documentation for it says - "Decodes (parses) a JSON string to an object. If the JSON is invalid, this function throws a SyntaxError unless the safe option is set." The string that I am supplying as an argument isn't legitimate JSON as I understand it (the field names aren't quoted), but Ext.decode() nearly works for me anyway. If I decode the above string, I get an error (why?) - "Uncaught SyntaxError: Unexpected token ;". However, if I remove the trailing semi-colon, and decode, everything seems to be fine.
I am using the following code to determine whether the decode call and the eval call do the same thing:
var evaled = eval(inputString);
var decoded = Ext.decode(inputString.replace(/;\s*$/, "")); // remove trailing ";", if any
console.log("Equal? - " + (JSON.stringify(decoded) == JSON.stringify(evaled)));
Unfortunately, this is not a very good solution. For example, some of the input strings to eval are fairly complex. They may have all sorts of embedded characters - semicolons, HTML character encodings, etc. Decode may complain about some other syntax problem, besides semicolons at the end, and I haven't found a good way to determine where the problem is that decode objects to. (It doesn't say "illegal character in position 67", for example.)
My questions:
Could we, with a small amount of work, create a generic solution using decode?
Is there an easy way to convert our JSON-like input into true JSON?
Is there a better way of comparing the results of eval and decode than JSON.stringify(decoded) == JSON.stringify(evaled)?

Math.pow alternative "**" ES7 polyfill for IE11

I'm trying to evaluate an expression, given as a string, that uses ** for exponentiation, e.g. eval("(22**3)/12*6+3/2"). The problem is that Internet Explorer 11 does not recognize this operator and throws a syntax error. Which polyfill should I use to overcome this? Right now I'm using Modernizr 2.6.2.
example equation would be,
((1*2)*((3*(4*5)*(1+3)**(4*5))/((1+3)**(4*5)-1)-1)/6)/7
((1*2)*((3*(4*5)*(1+3)**(4*5))/((1+3)**(4*5)-1)-1)/6)/7*58+2*5
(4*5+4-5.5*5.21+14*36**2+69/0.258+2)/(12+65)
If it is not possible to do this, what are the possible alternatives?
You cannot polyfill operators - only library members (prototypes, constructors, properties).
As your operation is confined to an eval call, you could attempt to write your own expression parser, but that would be a lot of work.
(As an aside, you shouldn't be using eval anyway, for very good reasons that I won't get into in this posting).
Another (hack-ish) option is to use a regular expression to identify trivial cases of x**y and convert them to Math.pow:
function detectAndFixTrivialPow( expressionString ) {
var pattern = /(\w+)\*\*(\w+)/i;
var fixed = expressionString.replace( pattern, 'Math.pow($1,$2)' );
return fixed;
}
eval( detectAndFixTrivialPow( "foo**bar" ) );
You can use a regular expression to replace the occurrences of ** with Math.pow() invocations:
let expression = "(22**3)/12*6+3/2"
let processed = expression.replace(/(\w+)\*\*(\w+)/g, 'Math.pow($1,$2)');
console.log(processed);
console.log(eval(processed));
Things might get complicated if you start using nested or chained power expressions though.
I think you need to do some preprocessing of the input. Here is how I would approach this:
Find "**" in string.
Check what is on the left and right.
Extract "full expressions" from left and right - if there is just a number - take it as is, and if there is a bracket - find the matching one and take whatever is inside as an expression.
Replace the 2 expressions with Math.pow(left, right)
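The steps above can be sketched roughly as follows. This is only one possible implementation under some assumptions: the function names (powToMathPow, findOpen, findClose) are made up for illustration, operands are either plain tokens (numbers, identifiers, dotted names) or parenthesized groups, and the rightmost ** is rewritten first so that chains such as 2**3**2 stay right-associative.

```javascript
// Characters allowed in a plain operand such as 22, 5.5, x or Math.pow
const OPERAND_CHAR = /[\w.]/;

// Index of the "(" matching the ")" at closeIdx
function findOpen(s, closeIdx) {
  let depth = 0;
  for (let i = closeIdx; i >= 0; i--) {
    if (s[i] === ')') depth++;
    else if (s[i] === '(' && --depth === 0) return i;
  }
  throw new SyntaxError('Unbalanced parentheses');
}

// Index of the ")" matching the "(" at openIdx
function findClose(s, openIdx) {
  let depth = 0;
  for (let i = openIdx; i < s.length; i++) {
    if (s[i] === '(') depth++;
    else if (s[i] === ')' && --depth === 0) return i;
  }
  throw new SyntaxError('Unbalanced parentheses');
}

function powToMathPow(expression) {
  let s = expression.replace(/\s+/g, '');
  let op;
  // Rewrite the rightmost ** each time until none are left
  while ((op = s.lastIndexOf('**')) !== -1) {
    // Left operand: a bracketed group and/or a plain token
    let start = op;
    if (s[start - 1] === ')') start = findOpen(s, start - 1);
    while (start > 0 && OPERAND_CHAR.test(s[start - 1])) start--;
    // Right operand: optional sign, plain token, optional bracketed group
    let end = op + 2;
    if (s[end] === '-' || s[end] === '+') end++;
    while (end < s.length && OPERAND_CHAR.test(s[end])) end++;
    if (s[end] === '(') end = findClose(s, end) + 1;
    const left = s.slice(start, op);
    const right = s.slice(op + 2, end);
    if (!left || !right) throw new SyntaxError('Missing operand for **');
    s = s.slice(0, start) + 'Math.pow(' + left + ',' + right + ')' + s.slice(end);
  }
  return s;
}

console.log(powToMathPow('(22**3)/12*6+3/2')); // (Math.pow(22,3))/12*6+3/2
console.log(powToMathPow('(1+3)**(4*5)-1'));   // Math.pow((1+3),(4*5))-1
console.log(powToMathPow('2**3**2'));          // Math.pow(2,Math.pow(3,2))
```

The resulting string contains only syntax that IE11 understands, so it can be passed to whatever evaluation step is already in place.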
You can use Babel online to convert javascript for IE 11.

Why are JSON attribute names quoted? [duplicate]

According to Crockford's json.org, a JSON object is made up of members, which are made up of pairs.
Every pair is made of a string and a value, with a string being defined as:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.
But in practice most programmers don't even know that a JSON key should be surrounded by double quotes, because most browsers don't require the use of double quotes.
Does it make any sense to bother surrounding your JSON in double quotes?
Valid Example:
{
"keyName" : 34
}
As opposed to the invalid:
{
keyName : 34
}
The real reason why JSON keys must be quoted lies in the semantics of identifiers in ECMAScript 3.
Reserved words cannot be used as property names in Object Literals without quotes, for example:
({function: 0}) // SyntaxError
({if: 0}) // SyntaxError
({true: 0}) // SyntaxError
// etc...
While if you use quotes the property names are valid:
({"function": 0}) // Ok
({"if": 0}) // Ok
({"true": 0}) // Ok
Crockford himself explains it in this talk: they wanted to keep the JSON standard simple, and they did not want to have all those semantic restrictions in it:
....
That was when we discovered the unquoted name problem. It turns out ECMA Script 3 has a whack reserved word policy. Reserved words must be quoted in the key position, which is really a nuisance. When I got around to formulizing this into a standard, I didn't want to have to put all of the reserved words in the standard, because it would look really stupid.
At the time, I was trying to convince people: yeah, you can write applications in JavaScript, it's actually going to work and it's a good language. I didn't want to say, then, at the same time: and look at this really stupid thing they did! So I decided, instead, let's just quote the keys.
That way, we don't have to tell anybody about how whack it is.
That's why, to this day, keys are quoted in JSON.
...
The ECMAScript 5th Edition Standard fixes this: in an ES5 implementation, even reserved words can be used without quotes, both in object literals and in member access (obj.function is OK in ES5).
Just for the record, this standard is being implemented these days by software vendors, you can see what browsers include this feature on this compatibility table (see Reserved words as property names)
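As a quick check of the ES5 behaviour described above, the following sketch should run without a SyntaxError in any ES5-or-later engine. The variable name flags is arbitrary.

```javascript
// Reserved words as unquoted property names: valid since ES5
const flags = { function: 0, if: 1, true: 2 };

console.log(flags.function + flags.if + flags.true); // 3

// JSON still requires the quotes, and JSON.stringify always emits them
console.log(JSON.stringify(flags)); // {"function":0,"if":1,"true":2}
```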
Yes, it's invalid JSON and will be rejected otherwise in many cases, for example jQuery 1.4+ has a check that makes unquoted JSON silently fail. Why not be compliant?
Let's take another example:
{ myKey: "value" }
{ my-Key: "value" }
{ my-Key[]: "value" }
...all of these would be valid with quotes, why not be consistent and use them in all cases, eliminating the possibility of a problem?
One more common example in the web developer world: There are thousands of examples of invalid HTML that renders in most browsers...does that make it any less painful to debug or maintain? Not at all, quite the opposite.
Also, @Matthew makes the best point of all in the comments below: this already fails. Unquoted keys will throw a syntax error with JSON.parse() in all major browsers (and any others that implement it correctly); you can test it here.
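The JSON.parse() behaviour is easy to confirm. In this small sketch, tryParse is a hypothetical wrapper that reports whether parsing succeeded instead of letting the exception propagate.

```javascript
// Hypothetical wrapper around JSON.parse that captures the failure
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

console.log(tryParse('{ "keyName": 34 }')); // ok: true, value: { keyName: 34 }
console.log(tryParse('{ keyName: 34 }'));   // ok: false, error: 'SyntaxError'
```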
If I understand the standard correctly, what JSON calls "objects" are actually much closer to maps ("dictionaries") than to actual objects in the usual sense. The current standard easily accommodates an extension allowing keys of any type, making
{
"1" : 31.0,
1 : 17,
1n : "valueForBigInt1"
}
a valid "object/map" of 3 different elements.
If not for this reason, I believe the designers would have made quotes around keys optional for all cases (maybe except keywords).
YAML, which is in fact a superset of JSON, supports what you want to do. Although it's a superset, it lets you keep it as simple as you want.
YAML is a breath of fresh air and it may be worth your time to have a look at it. Best place to start is here: http://en.wikipedia.org/wiki/YAML
There are libs for every language under the sun, including JS, eg https://github.com/nodeca/js-yaml

Substring arguments best practice

The JavaScript String object has two substring functions substring and substr.
substring takes two parameters beginIndex and endIndex.
substr also takes two parameters beginIndex and length.
It's trivial to convert any range between the two variants, but I wonder if there's any significance to how the two would normally be used (in day-to-day programming). I tend to favor the index/length variant but I have no good explanation as to why.
I guess it depends on what kind of programming you do, but if you have strong opinion on the matter, I'd like to hear it.
When is a (absolute, relative) range more suited than an (absolute, absolute) and vice versa?
Update:
This is not a JavaScript question per se (JavaScript just happens to implement both variants [which I think is stupid]), but what practical implication does the relative vs. absolute range have? I'm looking for a solid argument for why we prefer one over the other. To broaden the debate a bit, how would you prefer to design your data structures for use with either approach?
I prefer the startIndex, endIndex variant (substring) because String.substring() operates the same way in Java and I feel it makes me more efficient to stick to the same concepts in whatever language I use most often (when possible).
If I were doing more C# work, I might use the other variant more because that is how String.Substring() works in C#.
To answer your comment about JavaScript having both, it looks like substr() was added to browsers after substring() (reference - it seems that although substr() was part of JavaScript 1.0, most browser vendors didn't implement it until later). This suggests to me that even the implementers of the early language recognized the duplication of functionality. I'd suggest substring() came first in an attempt to leverage the JavaScript trademark. Regardless, it seems that they recognized this duplication in ECMA-262 and took some small steps toward removing it:
substring(): ECMA Version: ECMA-262
substr(): ECMA Version: None, although ECMA-262 ed. 3 has a non-normative section suggesting uniform semantics for substr
Personally I wouldn't mind a substring() where the second parameter can be negative, which would return the characters between the first parameter and the length of the string minus the second parameter. Of course you can already achieve that more explicitly and I imagine the design would be confusing to many developers:
String s1 = "The quick brown fox jumps over the lazy dog";
String s2 = s1.substring(20, -13); // "jumps over"
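For what it's worth, in JavaScript this behaviour already exists under a different name: String.prototype.slice() treats a negative end index as an offset from the end of the string. A minimal sketch using the same sentence:

```javascript
const s1 = 'The quick brown fox jumps over the lazy dog';

// A negative end index counts back from the end of the string
const s2 = s1.slice(20, -13);
console.log(s2); // jumps over

// substring() does not do this: it clamps negative arguments to 0
console.log(s1.substring(20, -13)); // The quick brown fox (plus a trailing space)
```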
When is a (absolute, relative) range more suited than an (absolute, absolute) and vice versa?
The former, when you know how much, the latter when you know where.
I presume substring is implemented in terms of substr:
substring( b, e ) {
return substr( b, e - b );
}
or substr in terms of substring:
substr( b, l) {
return substring( b, b + l );
}
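Written as runnable JavaScript, the two conversions look roughly like this. The function names are made up for illustration, and the equivalence is only claimed for ordinary in-range arguments (0 <= b <= e <= length), since the two built-ins treat odd values differently.

```javascript
// substring(b, e) expressed through substr(b, length)
function substringViaSubstr(str, b, e) {
  return str.substr(b, e - b);
}

// substr(b, l) expressed through substring(b, e)
function substrViaSubstring(str, b, l) {
  return str.substring(b, b + l);
}

const text = 'The quick brown fox';
console.log(substringViaSubstr(text, 4, 9)); // quick
console.log(substrViaSubstring(text, 4, 5)); // quick
```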
I slightly prefer the startIndex, endIndex variant, since then to get the last bit of a string I can do:
string foo = bar.substring(5, bar.length());
instead of:
string foo = bar.substr(5, bar.length() - 5);
It depends on the case, but I more often find I know exactly how many characters I want to take out, and prefer the start with length parameterization. But I could easily see a case where I've searched a long string for two tokens and now have their indexes, while it's trivial math to use either case, in this case I might prefer the start and end indexes.
Also, from a document writer's perspective, having two parameters of the same basic meaning is probably easier to write about and an easier mnemonic.
Each of these functions does neat saves when given strange values, such as an end smaller than a start, a negative length, a negative start, or a length or end beyond the string's end.
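A few of those "saves" as they behave in JavaScript, collected in a small sketch (the results object is just a convenient way to print them together):

```javascript
const s = 'hello';

const results = {
  endBeforeStart: s.substring(3, 1), // 'el': the arguments are swapped
  negativeStart: s.substring(-2, 2), // 'he': negative values are treated as 0
  endPastString: s.substring(1, 99), // 'ello': clamped to the string length
  substrNegStart: s.substr(-3, 2),   // 'll': negative start counts from the end
  substrNegLength: s.substr(1, -1),  // '': negative length is treated as 0
  substrPastEnd: s.substr(3, 99),    // 'lo': length is clamped to what is left
};

console.log(results);
```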
For JavaScript the best practice is to use substring over substr because it's supported in more (albeit usually older) browsers. If they'd gone with BasicScript instead would there have been a MID() and a MIDDLE() function? Who doesn't love BASIC syntax?
