Javascript Regular Expression "Single Space Character" - javascript

I am learning JavaScript and I am analyzing existing code.
My JS reference book says to use "\s" to search for a single space.
But I have come across this code:
obj.match(/Kobe Bryant/);
Instead of using \s, it uses an actual space.
Why doesn't this generate an error?

The character class \s does not contain just the space character; it also matches other Unicode white space characters. In current JavaScript engines, \s is equivalent to this character class:
[\t\n\v\f\r \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]

No. It is perfectly legal to include a literal space in a regex.
However, it's not equivalent - \s will include any whitespace character, including tabs, non-breaking spaces, half-width spaces and other characters, whereas a literal space will only match the regular space character.

\s matches any whitespace character, including tabs etc. Sure you can use a literal space also without problems. Just like you can use [0-9] instead of \d to denote any digit. However, keep in mind that [0-9] is equivalent to \d whereas the literal space is a subset of \s.

In addition to normal spaces, \s matches other kinds of white space characters, including tabs and, in JavaScript, newline characters. That said, matching with a normal space is certainly valid, especially in your case where it seems you want to match a name, whose parts are normally separated by a normal space.
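A quick sketch of the difference the answers above describe. The sample strings are made up for illustration; only the two patterns come from the question.

```javascript
// Literal space vs. \s. The sample strings are illustrative only.
const literalSpace = /Kobe Bryant/;
const anyWhitespace = /Kobe\sBryant/;

const samples = {
  space: "Kobe Bryant",
  tab: "Kobe\tBryant",
  newline: "Kobe\nBryant",
  nbsp: "Kobe\u00a0Bryant",
};

// Prints, per sample: label, literal-space result, \s result.
// Only the "space" sample matches the literal-space pattern;
// all four samples match the \s pattern.
for (const [label, text] of Object.entries(samples)) {
  console.log(label, literalSpace.test(text), anyWhitespace.test(text));
}
```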

Related

Ignore newlines in a regex that doesn't care about order

I have a regex here at scriptular.com
/(?=.*net)(?=.*income)(?=.*total)(?=.*depreciation)/i
How do I make the regex successfully match the string?
Without the newline characters in the string, the regex would succeed. I could remove them... but I'd rather not.
1.) The dot matches any character except a newline, so .* won't skip over newlines when the desired words appear on lines after the first one. Many regex flavors have a dotall (single line) s flag that makes the dot match newlines too. JavaScript only gained that flag in ES2018, so it is not available in older engines.
The workaround is to use a character class that matches any character instead of the dot, such as [\s\S] (any whitespace character \s together with any non-whitespace character \S), [\w\W] (any word character together with any non-word character), or even [^] (not nothing).
2.) Anchor the lookaheads to the ^ start of string, since there is no need to retry the lookaheads at every position in the string. This will drastically improve performance.
3.) Use lazy matching so that each lookahead is satisfied with the first match of its word.
/^(?=[\s\S]*?net)(?=[\s\S]*?income)(?=[\s\S]*?total)(?=[\s\S]*?depreciation)/i
See demo at regex101 (dunno why this doesn't work in your demo tool)
Additionally, you can use \b word boundaries around the words to make sure that, for example, net won't be matched in brunet, network... so the regex becomes ^(?=[\s\S]*?\bnet\b)...
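A small sketch comparing the original pattern with the anchored [\s\S] version plus word boundaries. The report string is a hypothetical input, not the one from the question.

```javascript
// Hypothetical multi-line input: the four words sit on different lines.
const report = "Total assets\nNet income: 10\nDepreciation: 2";

// Original pattern: .* cannot cross a newline.
const dotVersion = /(?=.*net)(?=.*income)(?=.*total)(?=.*depreciation)/i;

// Anchored, lazy, any-character version with \b word boundaries.
const anyCharVersion =
  /^(?=[\s\S]*?\bnet\b)(?=[\s\S]*?\bincome\b)(?=[\s\S]*?\btotal\b)(?=[\s\S]*?\bdepreciation\b)/i;

console.log(dotVersion.test(report));     // false
console.log(anyCharVersion.test(report)); // true

// "brunet" contains "net" but not as a whole word, so \b rejects it.
console.log(anyCharVersion.test("brunet income total depreciation")); // false
```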

JavaScript regular expression for word boundaries, tolerating in-word hyphens and apostrophes

I'm looking for a Regular Expression for JavaScript that will identify word boundaries in English, while accepting hyphens and apostrophes that appear inside words, but excluding those that appear alone or at the beginning or end of a word.
For example, for the sentence ...
  She said - 'That'll be all, Two-Fry.'
... I want the characters shown in grey below to be detected:
  Shesaid- 'That'llbeall,Two-Fry.'
If I use the regex /[^A-Za-z'-]/g, then "loose" hyphens and apostrophes are not detected.
  Shesaid-'That'llbeall,Two-Fry.'
How can I alter my regex so that it detects apostrophes and hyphens that don't have a word character on both sides?
You can test my regex here: https://regex101.com/r/bR8sV1/2
Note: the text I will be working on may contain other writing scripts, like руский and ไทอ so it will not be feasible to simply include all the characters that are not part of any English word.
You can organize your word-boundary characters into two groups.
Characters that cannot be alone.
Characters that can be alone.
A regex that works with your example would be:
[\s.,'-]{2,}|[\s.]
Regex101 Demo
Now all that's left is to keep adding all non-word characters into those two groups until it fits all of your needs. So you might start adding symbols and more punctuation to those character classes.
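As a rough check, the regex above can be used with split() on the example sentence. The filter(Boolean) step is an addition here, used only to drop the empty string left after the trailing punctuation.

```javascript
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Runs of two or more boundary characters, or a lone space / full stop.
const boundary = /[\s.,'-]{2,}|[\s.]/;

// filter(Boolean) removes the empty string produced by the final ".'" match.
const words = sentence.split(boundary).filter(Boolean);

console.log(words);
// [ 'She', 'said', "That'll", 'be', 'all', 'Two-Fry' ]
```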
You could write something like that:
(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*\s(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*
Or the compact version:
(\s|[!-/:-@\[-`\{-~])*\s(\s|[!-/:-@\[-`\{-~])*
The RegExp requires one \s (whitespace character) and selects all spaces and non-alphanumeric chars before and after it.
https://regex101.com/r/bR8sV1/4
\s matches all spaces
!-/ every char from ! to /
:-@ every char from : to @
\[-` every char from [ to `
\{-~ every char from { to ~
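A sketch of the compact version applied to the example sentence. It assumes the four ASCII punctuation ranges !-/, :-@, [-` and {-~ (JavaScript rejects a range whose end comes before its start, so the second range is written :-@ here), and it uses non-capturing groups so that split() does not insert the captured text into the result.

```javascript
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Whitespace plus the ASCII punctuation ranges !-/  :-@  [-`  {-~
// At least one whitespace character is required in every match.
const boundary = /(?:\s|[!-\/:-@\[-`{-~])*\s(?:\s|[!-\/:-@\[-`{-~])*/g;

console.log(sentence.match(boundary));
// [ ' ', " - '", ' ', ' ', ', ' ]

console.log(sentence.split(boundary));
// [ 'She', 'said', "That'll", 'be', 'all', "Two-Fry.'" ]
```

The trailing .' stays attached to the last word because it contains no whitespace, which is a limitation of requiring one \s per match.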

What's the difference between these regexes

I'm reading Ionic's source code. I came across this regex, and I'm pretty baffled by it.
([\s\S]+?)
Ok, it's grouping on every char that is either a white space, or non white space???
Why didn't they just do
(.+?)
Am I missing something?
The . matches any symbol but a newline. To make it match a newline, most languages have a modifier (dotall, singleline). JavaScript had no such modifier until ES2018 added the s flag, so code that must run on older engines cannot rely on it.
Thus, a work-around is to use a [\s\S] character class that will match any character, including a newline, because \s will match all whitespace and \S will match all non-whitespace characters. Similarly, one could use [\d\D] or [\w\W].
Also, there is a [^] pattern to match the same thing in JS, but since it is JavaScript-specific, the regexes containing this pattern are not portable between regex flavors.
The +? lazy quantifier matches 1 or more symbols conforming to the preceding subpattern, but as few as possible. Thus, it will match just 1 symbol if used like this, at the end of the pattern.
In many implementations of Regexp "." doesn't match new lines. So they use "[\s\S]" as a little hack =)
A . matches everything but the newline character. This is actually a well known/documented problem with JavaScript. \s (whitespace match) alongside its negation \S (non-whitespace match) provides a dotall match including the newline. Thus [\s\S] is generally used instead of . whenever the match has to span lines.
The RegEx they used includes more characters (essentially everything).
\s matches any whitespace character.
\S matches any character that is not whitespace.
As Casimir notes:
. matches any character except newline (\n)
. matches any char except carriage return \r and new line \n
The shortest way to write [\s\S] (white space and non white space) is [^] (not nothing)
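To see the difference, here is a minimal sketch. The input string and the <p> wrapper are invented for illustration, and the last pattern assumes an engine that supports the ES2018 s (dotAll) flag.

```javascript
// Illustrative input containing a newline.
const text = "<p>first line\nsecond line</p>";

const dot = /<p>(.+?)<\/p>/;          // . stops at the newline
const anyChar = /<p>([\s\S]+?)<\/p>/; // [\s\S] matches every character
const dotAll = new RegExp("<p>(.+?)</p>", "s"); // assumes ES2018+ support

console.log(dot.exec(text));        // null
console.log(anyChar.exec(text)[1]); // "first line\nsecond line"
console.log(dotAll.exec(text)[1]);  // same text as the line above

// A lazy quantifier at the very end of a pattern takes a single character.
console.log("abc".match(/[\s\S]+?/)[0]); // "a"
```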

Regex: make "enter" and extra spaces equal to single space?

I'm trying to normalize a user's input to single spaces because I will use the input to compare against the strings in the results. The problem is that the compare function only works with single spaces, so when a user accidentally adds some extra spaces or presses enter, it will not work.
Here's my regex and code, plus a fiddle: http://jsfiddle.net/purmou/eSE9Y/
html
<textarea id="input"></textarea>
<button>Submit</button>
javascript
$("button").click(function(){
  $('#input').val($('#input').val().replace(/[\t ]+/g,' '));
});
I already solved the extra spaces part; my only problem is the "enter" part.
Use \s+ instead to match all whitespace characters including spaces, tabs, newlines etc.
$('#input').val($('#input').val().replace(/\s+/g,' '));
Demo: http://jsfiddle.net/eSE9Y/1/
\s stands for "whitespace character". (...) which characters this actually includes, depends on the regex flavor. (...) \s matches a space, a tab, a line break, or a form feed. Most flavors also include the vertical tab (...). In flavors that support Unicode, \s normally includes all characters from the Unicode "separator" category. Java and PCRE are exceptions once again. But JavaScript does match all Unicode whitespace with \s.
Reference: http://www.regular-expressions.info/shorthand.html
A newline character can be represented with \n in the RegEx, so just change your RegEx like this:
/[\t\n ]+/g
Instead, you can simply replace all whitespace characters with a single space, with this regular expression
/\s+/g
Quoting from MDN RegularExpressions page, about \s
Matches a single white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].
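The same replacement can be tried without jQuery. In this sketch the helper name normalizeSpaces and the trim() step are assumptions added for illustration; only the /\s+/g replacement comes from the answers above.

```javascript
// Hypothetical helper: the name and the trim() call are not from the original code.
function normalizeSpaces(input) {
  // Collapse every run of whitespace (spaces, tabs, newlines) to one space.
  return input.replace(/\s+/g, " ").trim();
}

console.log(normalizeSpaces("  hello \t world\n\nagain  "));
// "hello world again"

// The original pattern leaves the newline untouched:
console.log(JSON.stringify("hello\nworld".replace(/[\t ]+/g, " ")));
// "hello\nworld"
```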

Javascript validation

For text validation I am using character classes like [a-zA-z] for letters and [0-9] for numbers. If I need to add special symbols, I add them with slashes, like [a-zA-z/-/].
When I include a lot of symbols it gets difficult, and my JavaScript is getting extremely big. Is there an easy way to do it?
Regards
A.Collins
You can take a look at this cheat sheet. For instance, [0-9] can be reduced to \d.
For the general case of "a lot of characters" — no.
\w for alphanumerics and underscores
\d for digits
\s for whitespace
You can mix them, resulting in stuff like, for example, [\d.] (for matching numbers & dots).
In a character class, x-y means "all characters between x and y". If you just have one additional character, in your case / then you don't need to use the x-y format, you can just drop the character in:
[a-zA-Z/]
That's not the correct way to escape characters. \ is the correct escape character to be used:
[a-zA-Z\/]
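You can use shorthand character classes: [a-zA-Z0-9_] can be replaced by \w (note that \w includes the underscore). Inside a character class, the characters that need to be escaped are \, ] and - (unless the - is first or last). ^ should be escaped too when it's the first character in the character class.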
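A few illustrative validators built from the points above. The allowed symbol sets and the variable names are assumptions chosen for the example, not requirements from the question.

```javascript
// Illustrative validators; the allowed symbols are assumptions.
const lettersSlashHyphen = /^[a-zA-Z\/-]+$/; // hyphen placed last, so it is literal
const digitsAndDots = /^[\d.]+$/;            // . needs no escape inside a class
const wordChars = /^\w+$/;                   // same as [A-Za-z0-9_]

console.log(lettersSlashHyphen.test("and/or-else")); // true
console.log(lettersSlashHyphen.test("a_b"));         // false
console.log(digitsAndDots.test("3.14"));             // true
console.log(wordChars.test("snake_case1"));          // true

// Pitfall: the range A-z also covers [ \ ] ^ _ ` so the underscore slips through.
console.log(/^[a-zA-z]+$/.test("a_b")); // true
```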
