Why have two '\' in Regex? [duplicate] - javascript

function trim(str) {
var trimer = new RegExp("(^[\\s\\t\\xa0\\u3000]+)|([\\u3000\\xa0\\s\\t]+\x24)", "g");
return String(str).replace(trimer, "");
}
Why are there two '\' before 's' and 't'?
And what does "[\s\t\xa0\u3000]" mean?

You're using a string literal.
In a string literal, the \ character is used to escape other characters, for example \n (a newline) or \" (a double quote), so a backslash itself must be written as \\. When you want your string to contain \s, you must write \\s in the string literal.
Thankfully, JavaScript provides a better solution: regular expression literals.
var trimer = /(^[\s\t\xa0\u3000]+)|([\u3000\xa0\s\t]+\x24)/g
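As a minimal sketch of the point above, the following compares a regex built from a string with the equivalent regex literal. The names escapedS, fromString and fromLiteral are illustrative and do not come from the question.

```javascript
// "\\s" in a string literal is two characters: a backslash and an "s".
const escapedS = "\\s";
console.log(escapedS.length); // 2

// Built from a string: every regex backslash must be doubled.
const fromString = new RegExp("^[\\s\\xa0\\u3000]+", "g");

// Built from a regex literal: each backslash is written once.
const fromLiteral = /^[\s\xa0\u3000]+/g;

// Both describe the same pattern.
console.log(fromString.source === fromLiteral.source); // true
```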

Why are there two '\' before 's' and 't'?
In a regex, \ is an escape that tells the regex engine a special character follows. Because you are writing the regex inside a string literal, you need to escape that \ with another \.
And what does "[\s\t\xa0\u3000]" mean?
It matches any one of the following characters:
\s any whitespace character.
\t tab character.
\xa0 non-breaking space.
\u3000 wide (ideographic) space.
This function is inefficient because every call converts a string to a regex and then compiles that regex. It would be more efficient to use a regex literal instead of a string and to create the regex once, outside the function, like the following:
var trimRegex = /(^[\s\t\xa0\u3000]+)|([\u3000\xa0\s\t]+$)/g;
function trim(str) {
return String(str).replace(trimRegex, "");
}
Furthermore, \s already matches any whitespace, including tabs, the wide space, and the non-breaking space, so you could simplify the regex to the following:
var trimRegex = /(^\s+)|(\s+$)/g;
Browsers now implement String.prototype.trim, so you can use that and add a polyfill for older browsers.
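A minimal sketch of the simplified approach, assuming a hypothetical fallback helper named trimCompat (not part of any library), and checking it against the built-in trim:

```javascript
// Compiled once, outside the function. \s already covers tab, U+00A0 and U+3000.
const TRIM_RE = /^\s+|\s+$/g;

// Hypothetical fallback for environments without String.prototype.trim.
function trimCompat(str) {
  return String(str).replace(TRIM_RE, "");
}

const padded = "\u3000\xa0 \t hello \t\xa0\u3000";
console.log(trimCompat(padded) === "hello");       // true
console.log(trimCompat(padded) === padded.trim()); // true
```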

Related

In JavaScript, the string '\m' is fully equal to the string 'm'. Why? [duplicate]

console.log('\d' === 'd'); // true
The character 'd' is not a special character, so why does JavaScript strip the escape notation?
In my view, it would be better to keep the escape notation.
When I want to match exactly the string '\d' using a regular expression, it's just impossible!
Taking the following code as an example.
console.log(RE.test('\d')); // it should log true
console.log(RE.test('d')); // it should log false
Unfortunately, I just cannot come up with a regular expression pattern that does this.
There is no reason to escape d in a string, so JavaScript ignores the backslash. If you need \d, you have to escape the escape character: \\d.
See also Why do linters pick on useless escape character?
\d has a special meaning in regular expressions (a digit character), but also in strings (escaped 'd' character, which is exactly like 'd').
Any \ creates an escape sequence in a string. Some are useful ('\n' is a newline) and some are arguably useless ('\d' === 'd').
If you want the regex \d, you could:
1 - use a regex literal instead: /\d/
2 - escape the \ in the string: '\\d', so that JavaScript correctly reads the string as the two characters \ and d.
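The following sketch illustrates the two answers above. The variable names are illustrative, and String.raw is shown only as one more way to write the same two-character string.

```javascript
// '\d' is an unnecessary escape: the string holds just the one character d.
console.log('\d' === 'd'); // true
console.log('\d'.length);  // 1

// '\\d' holds two characters: a backslash followed by d.
const backslashD = '\\d';
console.log(backslashD.length); // 2

// A regex literal that matches exactly "backslash followed by d".
const literalBackslashD = /^\\d$/;
console.log(literalBackslashD.test(backslashD)); // true
console.log(literalBackslashD.test('d'));        // false

// String.raw keeps the backslash without doubling it.
console.log(String.raw`\d` === backslashD); // true
```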

Switching \n with \\n in javascript [duplicate]

So I have a string:
var s = "foo\nbar\nbob";
I want the string to become:
"foo\\nbar\\nbob"
How can I replace every \n with a \\n?
I've tried using some for loops, but I can't figure it out.
A simple .replace would work - search for \n, and replace with \\n:
var s = "foo\nbar\nbob";
console.log(
s.replace(/\n/g, '\\\n')
// '\\' is one literal backslash; the '\n' that follows it is still a newline character
);
Note that this results in "a single backslash character, followed by a literal newline character" - there will not be two backslashes in a row in the actual string. It might be a bit less confusing to use String.raw, which will interpret every character in the template literal literally:
var s = "foo\nbar\nbob";
console.log(
s.replace(/\n/g, String.raw`\
`) // template literal contains one backslash, followed by one newline
);
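If the goal is instead the two-character sequence "backslash followed by n" (an assumption about the asker's intent, based on the JSON context of the linked duplicate), a sketch along these lines replaces each newline with those two characters:

```javascript
const s = "foo\nbar\nbob";

// Replace each newline character with the two characters \ and n.
const escaped = s.replace(/\n/g, "\\n");
console.log(escaped);        // foo\nbar\nbob (printed on one line)
console.log(escaped.length); // 13 (s.length is 11)

// JSON.stringify performs the same escaping and adds surrounding quotes.
console.log(JSON.stringify(s) === '"' + escaped + '"'); // true
```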

Remove excessive blank lines [duplicate]

Here is an attempt to remove any excessive blank lines in a string.
I'm trying to understand why the second approach doesn't work for lines that contain whitespace.
var string = `
foo
bar (there are whitespaced lines between bar and baz. I replaced them with dots)
....................
.......................
...........
baz
`;
// It works
string = string.replace(/^(\s*\n){2,}/gm, '\n');
// Why doesn't this work?
var EOL = string.match(/\r\n/gm) ? '\r\n' : '\n';
var regExp = new RegExp('^(\s*' + EOL + '){2,}', 'gm');
string = string.replace(regExp, EOL);
alert(string);
Your \s needs to be changed to \\s. Inside a string, \s is the same as s.
In strings (enclosed in quotes), the backslash has a special meaning. For example, \n is the newline character. There are a few others that you may or may not have heard of, e.g. \b, \t, \v. It would be a bad language design choice to make only the defined ones special and treat the undefined \s as an actual backslash followed by an s, because that would be inconsistent, a source of errors, and not future-proof. That's why, when you want a backslash in a string, you escape it as \\.
In your first example, you use / characters to delimit the regular expression. This is not considered a string bound by the above rules.
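A minimal sketch of the difference, assuming LF-only input so that EOL is fixed to "\n" (the question's CRLF detection is left out here, and the sample input is made up):

```javascript
const EOL = "\n"; // assumption: LF line endings only

// "\s" in a string is just "s", so this pattern is ^(s*\n){2,}.
const broken = new RegExp("^(\s*" + EOL + "){2,}", "gm");

// "\\s" in a string is \s, so this pattern is ^(\s*\n){2,}.
const fixed = new RegExp("^(\\s*" + EOL + "){2,}", "gm");

const input = "foo\nbar\n   \n\t\n\nbaz\n";

// The broken pattern finds nothing to replace.
console.log(input.replace(broken, EOL) === input); // true

// The fixed pattern collapses the whitespace-only lines into one blank line.
console.log(JSON.stringify(input.replace(fixed, EOL))); // "foo\nbar\n\nbaz\n"
```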

JavaScript RegExp: Why does a double backslash ( \\ ) cause an error? [duplicate]

I found this out by accident and have no idea what the reason is.
// Results in "Syntax error in regular Expression".
var re = RegExp('\\');
I know that the constructor function expects a string as its parameter, and that the backslash is used within strings to escape characters with special meaning. I know that I have to write \\d to get \d.
So the right-hand backslash should be interpreted as a normal character.
Instead it throws an error. Why?
Can anyone explain this to me?
\ is the escape character in strings, so to get \d in the string you need to write \\d.
\ is also the escape character in regular expressions, so a literal \ in a regex has to be written as \\.
That means two levels of escaping apply to a regex built from a string. The string literal '\\' produces a single \, which the regex engine sees as an escape with nothing after it, and that is an error.
To work around this you need to escape twice: '\\\\' produces the regex \\, which matches a single \.
The string literal '\\' creates a string containing nothing but a single backslash character, because within string literals the backslash is an escape character.
A single backslash character is not a valid regular expression.
If you want a regex that matches a single backslash then that needs to be escaped within the regex, so you need to do either:
re = /\\/;
// or
re = new RegExp('\\\\');
I believe the reason you are getting this error is that the effective regex which you are feeding into the JavaScript engine is a single backslash \.
The reason for this is that, in the string literal, the first backslash escapes the second one. So the regex engine receives a lone backslash with nothing after it to escape, which doesn't make any sense.
The backslash \ is the escape character for regular expressions. Therefore a double backslash in the regex would indeed mean a single, literal backslash. \ (backslash) followed by any of [\^$.|?*+(){} escapes the special character to suppress its special meaning.
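The answers above can be checked with a short sketch. tryCompile is a hypothetical helper written for this example. It returns either the compiled regex or the error that was thrown.

```javascript
// Hypothetical helper: compile a pattern string, returning the error on failure.
function tryCompile(pattern) {
  try {
    return new RegExp(pattern);
  } catch (err) {
    return err;
  }
}

// The string '\\' is one backslash, which is not a valid regex.
const bad = tryCompile('\\');
console.log(bad instanceof SyntaxError); // true

// The string '\\\\' is two backslashes: the regex \\, matching one backslash.
const good = tryCompile('\\\\');
console.log(good.source === /\\/.source); // true
console.log(good.test('a\\b'));           // true
console.log(good.test('ab'));             // false
```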

Javascript regex to remove punctuation [duplicate]

I'm having trouble with my regex. I'm sure something is not escaping properly.
function regex(str) {
str = str.replace(/(~|`|!|#|#|$|%|^|&|*|\(|\)|{|}|\[|\]|;|:|\"|'|<|,|\.|>|\?|\/|\\|\||-|_|+|=)/g,"")
document.getElementById("innerhtml").innerHTML = str;
}
<div id="innerhtml"></div>
<p><input type="button" value="Click Me" onclick="regex('test # . / | ) this');">
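* and + need to be escaped; unescaped, they are quantifiers with nothing to repeat, which is a syntax error. $ and ^ should be escaped too, otherwise they act as anchors and the literal characters are never removed.
function regex (str) {
return str.replace(/(~|`|!|#|#|\$|%|\^|&|\*|\(|\)|{|}|\[|\]|;|:|\"|'|<|,|\.|>|\?|\/|\\|\||-|_|\+|=)/g,"")
}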
var testStr = 'test # . / | ) this'
document.write('<strong>before: </strong>' + testStr)
document.write('<br><strong>after: </strong>' + regex(testStr))
The accepted answer on the proposed duplicate question doesn't cover all the punctuation characters in the ASCII range. (The comment on the accepted answer does, though.)
A better way to write this regex is to put the characters into a character class.
/[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|_+=-]/g
In a character class, to match the literal characters:
^ does not need escaping, unless it is at the beginning of the character class.
- should be placed at the beginning of the character class (after the ^ in a negated character class) or at the end of a character class.
] has to be escaped to be specified as a literal character. [ does not need to be escaped (but I escape it anyway, as a habit, since some languages require [ to be escaped inside a character class).
$, *, +, ?, (, ), {, }, |, . lose their special meaning inside a character class.
In RegExp literal, / has to be escaped.
In RegExp, since \ is the escape character, if you want to specify a literal \, you need to escape it \\.
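A minimal sketch of the character-class approach, wrapped in a hypothetical stripPunctuation helper. It assumes @ belongs in the set of ASCII punctuation to remove.

```javascript
// Character class of ASCII punctuation. Only ], / and \ are escaped out of
// necessity ([ is escaped by habit); the hyphen sits at the end so it is literal.
const PUNCTUATION = /[~`!@#$%^&*(){}\[\];:"'<,.>?\/\\|_+=-]/g;

// Hypothetical helper: remove every punctuation character from a string.
function stripPunctuation(str) {
  return String(str).replace(PUNCTUATION, "");
}

console.log(stripPunctuation("a-b_c+d=e^f$g")); // abcdefg
console.log(stripPunctuation("[x]{y}(z)\\"));   // xyz
```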
