Regular Expression in JS: \\. does not match \n - javascript

I am getting a string containing newlines (\n), tabs (\t) and lowercase letters [a-z]. I can split it by matching /\n|\t/. AFAIK the dot represents the wildcard.
Therefore I was wondering, why /\n|\t/ doesn't match the same things as /\\./
var text = 'test1 \ntest2';
text.split(/\n/) //['test1 ', 'test2']
text.split(/\./) //['test1 \ntest2']
text.split(/\\./) //['test1 \ntest2']
Shouldn't the \\. match the \n (newline)?

Let me try and answer all the points:
AFAIK the dot represents the wildcard.
No, in regex, we do not use the term "wildcard". It is a special regex (meta)character. A dot in JavaScript regex matches any character except a line terminator (such as \n or \r), unless the s (dotAll) flag is set.
I was wondering, why /\n|\t/ doesn't match the same things as /\\./
Because /\n|\t/ matches 1 symbol, either a newline or tab, while the regex /\\./ matches a literal \ and a character other than a newline.
The \n and \t are escape sequences. That means the \ is not a literal backslash; together with the following symbol it forms a single character that cannot be written otherwise. Indeed, how can we write a line break on paper with a pen? No way!
See more about JavaScript character escape sequences here.
Now,
text.split(/\n/) //['test1 ', 'test2']
True, your input string contains a line break, thus, you get two elements in the resulting array
text.split(/\./) //['test1 \ntest2']
No match was found because \. matches a literal dot. A dot that is escaped (that has a literal \ before it) in the regex stops being a special regex metacharacter, and just matches its literal representation. Your string has no dot, thus, no matches.
text.split(/\\./) //['test1 \ntest2']
Again, no match is found, as /\\./ looks for a literal \ followed by any character but a newline.
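The three cases above can be checked side by side. This is only a sketch: the second string, withBackslash, is an assumption added for contrast (it is not in the question) and holds a real backslash followed by the letter n.

```javascript
// `withNewline` mirrors the question's input; `withBackslash` is a
// hypothetical contrast case containing a literal backslash plus "n".
const withNewline = 'test1 \ntest2';
const withBackslash = 'test1 \\ntest2';

// /\n/ matches the line break itself.
const splitOnNewline = withNewline.split(/\n/);
// /\\./ needs a literal backslash, and this string has none.
const splitNoBackslash = withNewline.split(/\\./);
// Here the backslash and the "n" after it are consumed as the separator.
const splitOnBackslashPair = withBackslash.split(/\\./);
// A plain dot does not match a newline.
const dotMatchesNewline = /./.test('\n');

console.log(splitOnNewline);       // [ 'test1 ', 'test2' ]
console.log(splitNoBackslash);     // one element: the whole input
console.log(splitOnBackslashPair); // [ 'test1 ', 'test2' ]
console.log(dotMatchesNewline);    // false
```

Note the trailing space kept in 'test1 ', since only the separator itself is removed.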
A hint: use your expressions at regex101.com, it will tell you what your regex can match on the right.
Here, with regex, you have a literal notation (/.../). In literal notation, \ is considered a literal, thus, you do not have to escape it twice. If you used a constructor notation (i.e. RegExp(....)), you would have to use double escaping. E.g.
var re = /\\./; // is equal to
var re = new RegExp("\\\\.");
See more about constructor and literal notations at MDN RegExp help page.
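One way to convince yourself that the two notations describe the same pattern is to compare their source property. A minimal sketch, with variable names chosen only for illustration:

```javascript
// Literal notation: one level of escaping (the regex level).
const fromLiteral = /\\./;
// Constructor notation: the string literal is unescaped first,
// so every regex backslash has to be written twice.
const patternText = "\\\\.";
const fromConstructor = new RegExp(patternText);

console.log(patternText.length);                            // 3: backslash, backslash, dot
console.log(fromLiteral.source === fromConstructor.source); // true
console.log(fromConstructor.test("a\\b"));                  // true: backslash followed by "b"
```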

\n gets evaluated to a newline character, so /\n/ matches the line break itself rather than a backslash followed by n. If you do a quick console.log('\n'); you can see the output of that.

Related

why does '\\\[' equals '\\[' ? How does backslash work in string?

As the title
console.log('\\\[' === '\\[');
returns true.
Can anyone explain in detail what's the difference?
A backslash before most characters will only be parsed as an unnecessary escape character - the backslash will be ignored. This is what's happening in the second part of the first string. Before a certain few characters though, such as another backslash in \\, or \n, it will be parsed as an escape sequence. \\ is the escape sequence for a single literal backslash:
console.log('\\');
and is only one character.
A backslash before a [ will resolve to just the [, though:
console.log('\[');
So:
'\\\[' - A literal backslash, followed by an (unnecessarily escaped) [
'\\[' - A literal backslash, followed by a plain [
See MDN for a list of escape sequences.
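The breakdown above can be verified by looking at the individual characters of each string. A small sketch, with hypothetical variable names:

```javascript
const threeBackslashes = '\\\[';  // \\ (one backslash) + \[ (unnecessary escape)
const twoBackslashes = '\\[';     // \\ (one backslash) + [

console.log(threeBackslashes === twoBackslashes); // true
console.log(threeBackslashes.length);             // 2
console.log(Array.from(threeBackslashes));        // two elements: a backslash and "["
```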
In strings, the backslash (\) is a special character used to encode other special characters, including the backslash.
'\\[' is a JavaScript string literal that contains a backslash (\\) and an open square bracket ([). In the compiled program the string is \[.
'\\\[' is a JavaScript string literal that contains a correctly encoded backslash (\\) followed by the combination of characters \[ that looks like an escape sequence but doesn't mean anything. Because this combination is not defined and \ by itself does not mean anything, the JavaScript interpreter ignores the backslash and corrects the string; it becomes identical to the first one (\[).
The behaviour is documented:
For characters not listed in the table, a preceding backslash is ignored, but this usage is deprecated and should be avoided.
The backslash is a special character. It tells the JavaScript parser to take the symbol after \ as is. This is called escaping (sometimes rendered as "screening" or "shielding").
That is why we can write something like console.log("Double \"quotes\" inside another one."); and get Double "quotes" inside another one. without any error, although this is rarely the style you want to use.
"\\\[" separates into 2 parts: \\ and \[. First returns \ and the second returns [. Finally it is \[.
"\\[" separates into 2 parts: \\ and [. First returns \ and the second returns [. Finally it is \[.

Regex match with '\' slash and replace with '\\'?

I am converting a normal string into LaTeX format, so I wrote code to match each single backslash (\) and replace it with a double backslash (\\). For why I need it, refer to this link. I tried the code below:
function test(){
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex_form = tex.replace("/[\\\/\\\\\.\\\\]/g", "\\");
document.getElementById('demo').innerHTML=tex_form;//nothing get
}
test();
<p id="demo"></p>
I am not getting any output, but the pattern does match at this link.
I want to replace each \ with \\.
There are these issues:
The string literal has no backslashes;
The regular expression is not a regular expression;
The class in the intended regular expression cannot match sequences, only single characters;
The replacement would not add backslashes, only replace with them.
Here you find the details on each point:
1. How to Encode Backslashes in String Literals
Your tex variable has no backslashes. This is because a backslash in a string literal is not taken as a literal backslash, but as an escape for interpreting the character that follows it.
When you have "$$\left...", then the \l means "literal l", and so the content of your variable will be:
$$left...
As an l does not need to be escaped, the backslash is completely unnecessary, and these two assignments result in the same string value:
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex="$$left[ x=left({{11}over{2}}+{{sqrt{3271}}over{2,3^{{{3}over{2} $$";
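To bring the point home, this will also represent the same value. Only characters without an escape meaning of their own can be prefixed this way; \f, \t, \v, \r, \x and digits are left alone because they would start real escape sequences:
var tex="\$\$\l\eft\[\ x\=\l\eft\(\{\{11\}\ov\er\{2\}\}\+\{\{\s\qrt\{3271\}\}\ov\er\{2\,3\^\{\{\{3\}\ov\er\{2\}\ \$\$";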
If you really want to have literal backslashes in your content (which I understand you do, as this is about LaTeX), then you need to escape each of those backslashes... with a backslash:
var tex="$$\\left[ x=\\left({{11}\\over{2}}+{{\\sqrt{3271}}\\over{2\\,3^{{{3}\\over{2} $$";
Now the content of your tex variable will be this string:
$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$
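A shortened version of the formula is enough to see the difference. The strings below are abbreviated for illustration and are not the full expression from the question:

```javascript
// Single backslashes before l and o are unnecessary escapes and vanish.
const unescaped = "$$\left[ x=\left({{11}\over{2}} $$";
// Doubled backslashes each produce one real backslash in the value.
const escaped = "$$\\left[ x=\\left({{11}\\over{2}} $$";

console.log(unescaped);                  // $$left[ x=left({{11}over{2}} $$
console.log(unescaped.includes("\\"));   // false
console.log(escaped);                    // $$\left[ x=\left({{11}\over{2}} $$
console.log(escaped.split("\\").length - 1); // 3 backslashes in the data
```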
2. How to Code Regular Expression Literals
You are passing a string literal to the first argument of replace, while you really intend to pass a regular expression literal. You should leave out the quotes for that to happen. The / are the delimiters of a regular expression literal, not quotes:
/[\\\/\\\\\.\\\\]/g
This should not be wrapped in quotes. JavaScript understands the / delimiters as denoting a regular expression literal, including the optional modifiers at the end (like g here).
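The difference is easy to observe with a simpler pattern. In this sketch (the sample text and pattern are made up for illustration), the quoted version is searched for as plain text and never found:

```javascript
const sample = "a.b.c";

// A string argument is matched literally: replace() looks for the
// six characters /\./g in the text, which do not occur.
const withString = sample.replace("/\\./g", "-");
// A regular expression literal is interpreted as a pattern.
const withRegex = sample.replace(/\./g, "-");

console.log(withString); // a.b.c
console.log(withRegex);  // a-b-c
```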
3. Classes are sets of single characters
This regular expression has unnecessary characters. The class [...] should list all individual characters you want to match. Currently you have these characters (after resolving the escapes):
\
/
\
\
.
\
\
It is overkill to have the backslash represented 5 times. Also, in JavaScript the forward slash and dot do not need to be escaped when occurring in a class. So the above regular expression is equivalent to this one:
/[\\/.]/g
Maybe this is, or is not, what you intended to match. To match several sequences of characters, you could use the | operator. This is just an example:
/\\\\|\\\/|\\\./g
... but I don't think you need this.
4. How to actually prefix with backslashes
It seems strange to me that you would want to replace a dot or forward slash with a backslash. Probably you want to prefix those with a backslash. In that case make a capture group (with parentheses) and refer to it with $1 in this replace:
tex.replace(/([\\/.])/g, "\\$1");
Note again, that in the replacement string there is only one literal backslash, as the first one is an escape (see point 1 above).
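As a rough check of both variants, here is a sketch on a short made-up input. The second call, which doubles only the backslashes, is an assumption about what the question ultimately wants:

```javascript
// The data is: a\b/c.d  (one real backslash, written as \\ in the literal)
const input = "a\\b/c.d";

// Prefix every backslash, slash and dot with a backslash.
const prefixed = input.replace(/([\\/.])/g, "\\$1");
// Double only the backslashes.
const doubled = input.replace(/\\/g, "\\\\");

console.log(prefixed); // a\\b\/c\.d
console.log(doubled);  // a\\b/c.d
```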
why I need it
As the question you link to says, the \ character has special meaning inside a JavaScript string literal. It represents an escape sequence.
Not getting any output, but the pattern matches at this link
The escape sequence is processed when the string literal is parsed by the JavaScript compiler.
By the time you apply your regular expression to them, they have been consumed. The backslash characters only exist in your source code, not in your data.
If you want to put a backslash character in your string, then you need to write the escape sequence for it (the \\) in the source code. You can't add them back in with JavaScript afterwards.
Not sure if I understood the problem, but try this code:
var tex_form = tex.replace(/(\\)/g, "\\\\");
Note that the pattern must be a regular expression literal, not a quoted string, and that ( ) is used here instead of [ ].

unable to parse - in Regular expression in Javascript

I am a bit new to the regular expressions in Javascript.
I am trying to write a function called parseRegExpression()
which parses the attributes passed and generates a key/value pairs
It works fine with the input:
"iconType:plus;iconPosition:bottom;"
But it is not able to parse the input:
"type:'date';locale:'en-US';"
Basically the - sign is being ignored. The code is at:
http://jsfiddle.net/visibleinvisibly/ZSS5G/
The Regular Expression key value pair is as below
/[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*;|[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*$/gi;
There are a few problems:
A | inside a character class means a literal | character, not an alternation.
A . inside a character class means a literal . character, so there's no need to escape it.
A - as the first or last character inside a character class means a literal - character, otherwise it means a character range.
There's no need to use [a-zA-Z] when you use the case-insensitive modifier (i); [a-z] is enough.
The only difference between your alternatives is the last bit; this can be simplified significantly by just limiting your alternation to that part which is different.
This should be equivalent to your original pattern:
/[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi
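A quick sketch of the simplified pattern applied to the input that failed in the question (the variable names are illustrative only):

```javascript
const pairPattern = /[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi;

const failingInput = "type:'date';locale:'en-US';";
const matches = failingInput.match(pairPattern);

// The hyphen in 'en-US' is now part of the value.
console.log(matches); // [ "type:'date';", "locale:'en-US';" ]
```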
You can avoid the regex:
var test1 = "iconType:plus;iconPosition:bottom;";
var test2 = "type:'date';locale:'en-US';";
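function toto(str) {
  var result = [];
  var temp = str.split(';');
  // The input ends with ';', so the last element is empty and is skipped
  for (var i = 0; i < temp.length - 1; i++) {
    // Cut at the first ':' only, so a value may itself contain ':'
    var pos = temp[i].indexOf(':');
    result[i] = [temp[i].slice(0, pos), temp[i].slice(pos + 1)];
  }
  return result;
}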
console.log(toto(test1));
console.log(toto(test2));
Inside a character set atom [...] the pipe char | is just a regular char and doesn't mean "or".
A character set atom lists characters or ranges you want to accept (or exclude if the character set starts with ^) and "or" is implicit.
You can use a backslash in a character set if you need to include/exclude a close bracket ], the ^ sign, the dash - that is used for ranges, the backslash \ itself, an unprintable character or if you want to use a non-ASCII unicode char specifying the code instead of literally.
Regular expression syntax, however, also lets you avoid backslash-escaping in a character set atom by placing the character in a position where it cannot have the special meaning, for example a dash - as first or last in the set (it cannot mean a range there).
Note also that if the values may be quoted strings that include backslash escapes, the regular expression is more complex, for example
'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"
matches a single-quoted or double-quoted string including backslash escaping, the meaning being:
A single quote '
Zero or more of either:
Any char except the single quote ' or the backslash \
A pair composed of a backslash \ followed by any char
A single quote '
or the same with double quotes " instead.
Note that the groups have been delimited with (?:...) instead of plain (...) to avoid capture
It doesn't match hyphens because it is interpreting |-| as a range that starts at | and ends at |. (I would have expected that to be treated as a syntax error, but there you have it. It works the same in every regex flavor I've tried, too.)
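The range behaviour can be seen in isolation with two tiny character classes. These are reduced examples written for illustration, not the question's full pattern:

```javascript
// |-| is a range from "|" to "|", so the class contains only the pipe.
const pipeRange = /[|-|]/;
// With the hyphen last, it is a literal hyphen.
const hyphenLast = /[a|-]/;

console.log(pipeRange.test("-"));  // false
console.log(pipeRange.test("|"));  // true
console.log(hyphenLast.test("-")); // true
```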
Have a look at this regex:
/(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig
As suggested by the other responders, I collapsed it to one alternative, removed the unneeded pipes, and escaped the hyphen by moving it to the end. I also anchored it at the beginning as well as the end. Or anchored it as well as I can, anyway. I used a lookahead to match the trailing semicolon so it will still be there when the next match starts. It's far from foolproof, but it should work okay as long as the input is well formed.
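Because this version captures the key and the value in groups, the pairs can be pulled out directly. A sketch using String.prototype.matchAll (available since ES2020); the names are chosen for illustration:

```javascript
const kvPattern = /(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig;
const attributes = "type:'date';locale:'en-US';";

// Each match object carries the key in group 1 and the value in group 2.
const pairs = Array.from(attributes.matchAll(kvPattern), m => [m[1], m[2]]);

console.log(pairs); // [ [ 'type', "'date'" ], [ 'locale', "'en-US'" ] ]
```

The quotes stay part of the value here; stripping them would be a separate step.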
Replace the regular expressions in your code as follows:
regExpKeyValuePair = /[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*;|[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*$/gi;
regExpKey = /[-a-z]*/gi;
regExpValue = /[-a-z:_\/.0-9]*/gi;
You don't need to escape . inside [].
There is no need to put | between elements inside [].
Because you are using the /i flag, [A-Z] is not needed.
- should be at the beginning or at the end.

Regular expression issue in javascript

I'm trying to get all the power calculations out of a string using regular expressions. I tried the following code:
var regex = new RegExp('[0-9]+[^]{1}[0-9]+');
regex.exec('1^2');
This works and returns 1^2, but when I try the following string:
regex.exec('1+1^2');
it returns 1+1
This is because [^xyz] means "not x, y, or z." ^ is the "not" operator in character classes ([...]). To fix this, simply escape it (one backslash to escape the ^, and another to escape the first backslash since it's in a string and it's a special character):
var regex = new RegExp('[0-9]+[\\^]{1}[0-9]+');
Also, you don't need to use character classes and the {1} if you only have one character; just do this:
var regex = new RegExp('[0-9]+\\^[0-9]+');
Finally, one more improvement - you can use literal regular expression syntax (/.../) so you don't need two backslashes:
var regex = /[0-9]+\^[0-9]+/;
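[^] in regex terms is a character class ([]) that's been inverted (^), e.g. [^abc] is "any character that is NOT a, b, or c". With nothing listed after the ^, JavaScript treats [^] as "any character at all", which is why it matched the +. You need to escape the caret: [\^].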
As well, {1} is redundant. Any character class or individual character in a regex has an implied {1} on it, so /a{1}b{1}c{1}/ is just a very verbose way of saying /abc/.
As well, a single-char character class is also redundant. /[a]/ is exactly the same as /a/.
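The original and the corrected pattern can be compared on the failing input. The last line is an assumption about the stated goal of getting all power calculations, and uses the g flag on a made-up longer input:

```javascript
const original = new RegExp('[0-9]+[^]{1}[0-9]+');
const corrected = /[0-9]+\^[0-9]+/;

// [^] accepts any character, so "+" is taken as the middle character.
const originalResult = original.exec('1+1^2')[0];
// \^ only accepts a literal caret.
const correctedResult = corrected.exec('1+1^2')[0];
// With the g flag, match() returns every power expression.
const allPowers = '1+1^2+3^4'.match(/[0-9]+\^[0-9]+/g);

console.log(originalResult);  // 1+1
console.log(correctedResult); // 1^2
console.log(allPowers);       // [ '1^2', '3^4' ]
```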

How can I use backslashes (\) in a string?

I tried many ways to get a single backslash from an executed (I don't mean an input from html).
I can get special characters as tab, new line and many others then escape them to \\t or \\n or \\(someother character) but I cannot get a single backslash when a non-special character is next to it.
I don't want something like:
str = "\apple"; // I want this, to return:
console.log(str); // \apple
and if I try to get character at 0 then I get a instead of \.
(See ES2015 update at the end of the answer.)
You've tagged your question both string and regex.
In JavaScript, the backslash has special meaning both in string literals and in regular expressions. If you want an actual backslash in the string or regex, you have to write two: \\.
The following string starts with one backslash; the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash in the string:
var str = "\\I have one backslash";
The following regular expression will match a single backslash (not two); again, the first one you see in the literal is an escape character starting an escape sequence. The \\ escape sequence tells the parser to put a single backslash character in the regular expression pattern:
var rex = /\\/;
If you're using a string to create a regular expression (rather than using a regular expression literal as I did above), note that you're dealing with two levels: The string level, and the regular expression level. So to create a regular expression using a string that matches a single backslash, you end up using four:
// Matches *one* backslash
var rex = new RegExp("\\\\");
That's because first, you're writing a string literal, but you want to actually put backslashes in the resulting string, so you do that with \\ for each one backslash you want. But your regex also requires two \\ for every one real backslash you want, and so it needs to see two backslashes in the string. Hence, a total of four. This is one of the reasons I avoid using new RegExp(string) whenever I can; I get confused easily. :-)
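Counting characters at each level makes the two layers visible. A small sketch, with names chosen for illustration:

```javascript
// Four backslashes in the source produce a two-character string...
const patternString = "\\\\";
// ...which the regex engine reads as "one literal backslash".
const fromString = new RegExp(patternString);
const fromLiteral = /\\/;

const subject = "a\\b"; // three characters: a, backslash, b

console.log(patternString.length);                     // 2
console.log(fromString.source === fromLiteral.source); // true
console.log(subject.length);                           // 3
console.log(fromString.test(subject));                 // true
```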
ES2015 and ES2018 update
Fast-forward to 2015, and as Dolphin_Wood points out the new ES2015 standard gives us template literals, tag functions, and the String.raw function:
// Yes, this unlikely-looking syntax is actually valid ES2015
let str = String.raw`\apple`;
str ends up having the characters \, a, p, p, l, and e in it. Just be careful there are no ${ in your template literal, since ${ starts a substitution in a template literal. E.g.:
let foo = "bar";
let str = String.raw`\apple${foo}`;
...ends up being \applebar.
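To compare String.raw with the doubled-backslash form from earlier in this answer, a brief sketch (variable names are illustrative):

```javascript
const rawApple = String.raw`\apple`;
const escapedApple = "\\apple";

console.log(rawApple === escapedApple); // true
console.log(rawApple.length);           // 6
console.log(rawApple.charAt(0));        // \

// Substitutions are still evaluated inside a raw template literal.
const foo = "bar";
const rawWithSubstitution = String.raw`\apple${foo}`;
console.log(rawWithSubstitution);       // \applebar
```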
Try String.raw method:
str = String.raw`\apple` // "\apple"
Reference here: String.raw()
\ is an escape character, when followed by a non-special character it doesn't become a literal \. Instead, you have to double it \\.
console.log("\apple"); //-> "apple"
console.log("\\apple"); //-> "\apple"
With ordinary string literals there is no way to get the original, raw source text back, and no way to write such a literal without escape sequences.
Please try the code below; it works for me and the output contains the backslash (this snippet is Java, but the \\ escape works the same way in a JavaScript string literal):
String sss="dfsdf\\dfds";
System.out.println(sss);
