Unable to parse - in a regular expression in JavaScript

I am a bit new to regular expressions in JavaScript.
I am trying to write a function called parseRegExpression()
which parses the attributes passed and generates key/value pairs.
It works fine with the input:
"iconType:plus;iconPosition:bottom;"
But it is not able to parse the input:
"type:'date';locale:'en-US';"
Basically the - sign is being ignored. The code is at:
http://jsfiddle.net/visibleinvisibly/ZSS5G/
The regular expression for the key/value pairs is:
/[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*;|[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*$/gi;

There are a few problems:
A | inside a character class means a literal | character, not an alternation.
A . inside a character class means a literal . character, so there's no need to escape it.
A - as the first or last character inside a character class means a literal - character, otherwise it means a character range.
There's no need to use [a-zA-Z] when you use the case-insensitive modifier (i); [a-z] is enough.
The only difference between your two alternatives is the last bit, so the pattern can be simplified significantly by limiting the alternation to just the part that differs.
This should match what you intended (unlike your original value class, it also accepts the hyphen):
/[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi
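A quick check of the simplified pattern against the two inputs from the question. This is a sketch using String.prototype.match, which returns every full match when the regex has the g flag; the name simplePairRe is just illustrative.

```javascript
// Simplified pattern from the answer: key, colon, value, then ';' or end of string
const simplePairRe = /[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi;

const plain = "iconType:plus;iconPosition:bottom;".match(simplePairRe);
const dashed = "type:'date';locale:'en-US';".match(simplePairRe);

console.log(plain);  // two matches: iconType:plus; and iconPosition:bottom;
console.log(dashed); // two matches: type:'date'; and locale:'en-US'; (hyphen included)
```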

You can avoid the regex:
var test1 = "iconType:plus;iconPosition:bottom;";
var test2 = "type:'date';locale:'en-US';";
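function toto(str) {
  var result = [];
  var temp = str.split(';');
  for (var i = 0; i < temp.length; i++) {
    var pos = temp[i].indexOf(':');
    if (pos < 0) continue; // skip the empty piece after the trailing ';' (and malformed pieces)
    // split at the first ':' only, keeping both key and value (split(':', 1) kept only the key)
    result.push([temp[i].slice(0, pos), temp[i].slice(pos + 1)]);
  }
  return result;
}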
console.log(toto(test1));
console.log(toto(test2));

Inside a character set atom [...] the pipe char | is just a regular char and doesn't mean "or".
A character set atom lists characters or ranges you want to accept (or exclude if the character set starts with ^) and "or" is implicit.
You can use a backslash in a character set if you need to include or exclude a closing bracket ], the ^ sign, the dash - that is used for ranges, or the backslash \ itself. A backslash also lets you write an unprintable character, or a non-ASCII Unicode character by its code instead of literally.
Regular expression syntax, however, also lets you avoid backslash-escaping in a character set atom by placing the character in a position where it cannot have its special meaning; for example, a dash - placed first or last in the set (it cannot mean a range there).
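A few one-line probes of the rules above, using RegExp.prototype.test. The patterns are illustrative and not taken from the question; each entry in checks should come out true.

```javascript
// Small probes of the character-class rules (illustrative patterns)
const checks = {
  pipeIsLiteral: /[a|b]/.test("|"),        // '|' is just another member of the set
  rangeExcludesDash: !/[a-z]/.test("-"),   // a-z is a range; '-' itself is not in it
  leadingDashLiteral: /[-a-z]/.test("-"),  // first position: cannot start a range
  trailingDashLiteral: /[a-z-]/.test("-"), // last position: cannot end a range
  escapedDashLiteral: /[a\-z]/.test("-") && !/[a\-z]/.test("m"), // set is only a, -, z
  caretLiteralLater: /[a^]/.test("^"),     // '^' only negates in first position
  bracketEscaped: /[\]]/.test("]"),        // escaped ']' does not close the set
};

console.log(checks); // every value is true
```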
Note also that if you need to match quoted strings as values, including backslash escaping, the regular expression is more complex; for example
'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"
matches a single-quoted or double-quoted string, including backslash escapes. It means:
A single quote '
Zero or more of either:
  any char except the single quote ' or the backslash \
  a pair composed of a backslash \ followed by any char
A single quote '
or the same with double quotes " instead.
Note that the groups have been delimited with (?:...) instead of plain (...) to avoid capturing.
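A minimal sketch of how the quoted-string pattern could slot into the key/value parser. The helper name parseQuotedPairs and the fallback for unquoted values ([^;]*) are assumptions, not part of the answer; values are returned raw, with their quotes and escapes still in place.

```javascript
// Value = single-quoted string | double-quoted string | bare run of non-';' chars
const quotedPairRe = /([a-z-]+)\s*:\s*('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|[^;]*)\s*(?:;|$)/gi;

// Hypothetical helper (not from the question)
function parseQuotedPairs(str) {
  const pairs = {};
  for (const m of str.matchAll(quotedPairRe)) {
    pairs[m[1]] = m[2]; // m[1] = key, m[2] = raw value
  }
  return pairs;
}

console.log(parseQuotedPairs("type:'date';locale:'en-US';"));
// type -> 'date', locale -> 'en-US'
console.log(parseQuotedPairs(String.raw`title:'it\'s; fine';x:1`));
// title -> 'it\'s; fine' (the ';' inside the quotes does not end the value), x -> 1
```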

It doesn't match hyphens because it is interpreting |-| as a range that starts at | and ends at |. (I would have expected that to be treated as a syntax error, but there you have it. It works the same in every regex flavor I've tried, too.)
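To see the range interpretation in action, the sketch below tests the question's value class (copied verbatim) against a hyphen and a pipe, next to a version with the pipes removed and the hyphen moved to the end.

```javascript
// The value class from the question, copied verbatim
const questionValue = /[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]/;
// Same class without the pipes and with the hyphen moved to the end
const fixedValue = /[a-z'":_\/.0-9-]/i;

console.log(questionValue.test("-")); // false: "|-|" is the range '|'..'|', so no hyphen
console.log(questionValue.test("|")); // true: the pipes are ordinary members of the set
console.log(fixedValue.test("-"));    // true: a trailing '-' is literal
```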
Have a look at this regex:
/(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig
As suggested by the other responders, I collapsed it to one alternative, removed the unneeded pipes, and made the hyphen literal by moving it to the end. I also anchored it at the beginning as well as the end, or anchored it as well as I can, anyway. I used a lookahead to match the trailing semicolon so it is still there when the next match starts. It's far from foolproof, but it should work okay as long as the input is well formed.
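A sketch of how the capture groups could be consumed, assuming an environment with String.prototype.matchAll (ES2020). The helper name parsePairs is hypothetical; the asker's own parseRegExpression() is only in the linked fiddle.

```javascript
// The answer's regex, verbatim: group 1 = key, group 2 = value.
// (?=;|$) leaves the ';' unconsumed so that (?:^|;) can start the next match on it.
const anchoredPairRe = /(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/gi;

// Hypothetical helper
function parsePairs(str) {
  const result = {};
  for (const m of str.matchAll(anchoredPairRe)) {
    result[m[1]] = m[2];
  }
  return result;
}

console.log(parsePairs("iconType:plus;iconPosition:bottom;")); // iconType -> plus, iconPosition -> bottom
console.log(parsePairs("type:'date';locale:'en-US';"));       // type -> 'date', locale -> 'en-US' (quotes kept)
```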

Replace the regular expressions in your code as follows:
regExpKeyValuePair = /[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*;|[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*$/gi;
regExpKey = /[-a-z]*/gi;
regExpValue = /[-a-z:_\/.0-9]*/gi;
You don't need to escape . inside [].
There's no need to put | between the elements of [].
Because you are using the /i flag, [A-Z] is not needed.
A literal - should be at the beginning or at the end of the class.

Related

Odd RegEx request for Javascript

I'm having trouble with a certain RegEx replacement string for later use in Javascript.
We have quite a bit of text that was stored in a rather odd format that we aren't allowed to fix.
But we do need to find all the "network path" strings inside it, following these rules:
A. The matches always start with 2 backslashes.
B. The match should stop as soon as it hits the first occurrence of any one of these:
A < character
A space
A line feed
A carriage return
A & character
A literal "\r" or "\n" string (but only if occurring at end of line)
We "almost" have it working with /\\\\[^ &<\s]*/gi as shown in this RegEx Tester page:
https://regex101.com/r/T4cDOL/5
Even if we get it working, the RegEx has to be "escape escaped" even further before we put it in
our Javascript code, but that's also not working as expected.
From your example, it seems you literally have a backslash followed by an n and a backslash followed by an r (as opposed to a newline or carriage return), which means you can't only use a negated character class (since you need to handle a sequence of two characters). I'd use a positive lookahead to know where to stop, so I can use an alternation for that part.
You haven't said what parts of those strings should match, so I've had to guess a bit, but here's my best guess (with useful input from Niet the Dark Absol):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;
That says:
Match starting with \\
Take everything prior to the lookahead (non-greedy)
Lookahead: An alternation of:
A space, &, <, carriage return (\r, character 13), or a newline (\n, character 10); or
A backslash followed by r or n if that's either at the end of a line or followed by a space (so we get the \nancy but not the \n after it).
Updated regex101
You might want to have more characters than just a space after the \r/\n. If so, make it a character class (and/or use \s for "whitespace" if that applies):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$|[ others]))/gmi;
// −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−^^^^^^^^^
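To try the first regex outside regex101, here is a sketch on a made-up sample (the question's real text is not shown, so text is an assumption). It also shows the string form new RegExp would need, where every regex-level backslash is doubled once more; a regex literal avoids that second layer of escaping.

```javascript
// Made-up sample; String.raw keeps every backslash exactly as typed.
// "\nancy" is a literal backslash + "nancy"; the final "\n" is a literal backslash + "n" at end of input.
const text = String.raw`path \\srv\share\doc.txt<br>next \\srv2\data\nancy\n`;

// The answer's regex as a literal
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// The same pattern built from a string: each regex backslash is written twice
const rexFromString = new RegExp("\\\\\\\\.*?(?=[ &<\\r\\n]|\\\\[rn](?:$| ))", "gmi");

console.log(text.match(rex)); // two matches: \\srv\share\doc.txt and \\srv2\data\nancy
console.log(rex.source === rexFromString.source); // true
```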

Regex match with '\' slash and replace with '\\'?

I was converting a normal string into LaTeX format, so I created code that matches the LaTeX and replaces each single backslash \ with a double backslash \\. For why I need it, refer to this link. I tried the code below:
function test(){
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex_form = tex.replace("/[\\\/\\\\\.\\\\]/g", "\\");
document.getElementById('demo').innerHTML=tex_form;//nothing get
}
test();
<p id="demo"></p>
I'm not getting any output data, but the pattern matches in this link.
I want to replace \ with \\.
There are these issues:
The string literal has no backslashes;
The regular expression is not a regular expression;
The class in the intended regular expression cannot match sequences, only single characters;
The replacement would not add backslashes, only replace with them.
Here you find the details on each point:
1. How to Encode Backslashes in String Literals
Your tex variable has no backslashes. This is because a backslash in a string literal is not taken as a literal backslash, but as an escape for interpreting the character that follows it.
When you have "$$\left...", then the \l means "literal l", and so the content of your variable will be:
$$left...
As an l does not need to be escaped, the backslash is completely unnecessary, and these two assignments result in the same string value:
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex="$$left[ x=left({{11}over{2}}+{{sqrt{3271}}over{2,3^{{{3}over{2} $$";
To bring the point home: escaping a character that has no escape meaning of its own is a no-op, so "\l\e\o" is the same value as "leo". (Be careful, though: \f, \t, \v, \r, \n, \b, \x, \u and digits do start real escape sequences, so blindly escaping every character of your string would not give the same value.)
If you really want to have literal backslashes in your content (which I understand you do, as this is about LaTeX), then you need to escape each of those backslashes... with a backslash:
var tex="$$\\left[ x=\\left({{11}\\over{2}}+{{\\sqrt{3271}}\\over{2\\,3^{{{3}\\over{2} $$";
Now the content of your tex variable will be this string:
$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$
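The point can be checked directly; a small sketch comparing three spellings:

```javascript
// "\l" is not a recognized escape sequence, so the backslash is silently dropped
const dropped = "$$\left" === "$$left";
// Doubling the backslash in the source puts one real backslash into the value
const kept = "\\left";
// String.raw (ES2015) is another way to keep backslashes exactly as typed
const raw = String.raw`\left`;

console.log(dropped);      // true
console.log(kept.length);  // 5: \ l e f t
console.log(raw === kept); // true
```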
2. How to Code Regular Expression Literals
You are passing a string literal to the first argument of replace, while you really intend to pass a regular expression literal. You should leave out the quotes for that to happen. The / are the delimiters of a regular expression literal, not quotes:
/[\\\/\\\\\.\\\\]/g
This should not be wrapped in quotes. JavaScript understands the / delimiters as denoting a regular expression literal, including the optional modifiers at the end (like g here).
3. Classes are sets of single characters
This regular expression has unnecessary characters. The class [...] should list all individual characters you want to match. Currently you have these characters (after resolving the escapes):
\
/
\
\
.
\
\
It is overkill to have the backslash represented 5 times. Also, in JavaScript the forward slash and dot do not need to be escaped when occurring in a class. So the above regular expression is equivalent to this one:
/[\\/.]/g
Maybe this is, or is not, what you intended to match. To match several sequences of characters, you could use the | operator. This is just an example:
/\\\\|\\\/|\\\./g
... but I don't think you need this.
4. How to actually prefix with backslashes
It seems strange to me that you would want to replace a point or forward slash with a backslash. Probably you want to prefix those with a backslash. In that case make a capture group (with parentheses) and refer to it with $1 in this replace:
tex.replace(/([\\/.])/g, "\\$1");
Note again, that in the replacement string there is only one literal backslash, as the first one is an escape (see point 1 above).
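Putting points 1 to 4 together, a sketch that returns the string instead of writing to the page (so it also runs outside a browser). doubleBackslashes is a hypothetical name, and it assumes the goal is literally doubling every backslash, as the question states.

```javascript
// Hypothetical helper: double every backslash
function doubleBackslashes(s) {
  // /\\/g matches one backslash; the replacement "\\\\" is two backslashes.
  // Only '$' is special in a replacement string, so no further escaping is needed.
  return s.replace(/\\/g, "\\\\");
}

// The LaTeX source written with escaped backslashes, as in point 1
const tex = "$$\\left[ x=\\left({{11}\\over{2}}+{{\\sqrt{3271}}\\over{2\\,3^{{{3}\\over{2} $$";
console.log(doubleBackslashes(tex)); // $$\\left[ x=\\left(... with every backslash doubled

// Point 4's variant: prefix each backslash, slash or dot with a backslash via $1
console.log("a.b/c".replace(/([\\/.])/g, "\\$1")); // a\.b\/c
```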
why the i need it
As the question you link to says, the \ character has special meaning inside a JavaScript string literal. It represents an escape sequence.
Not getting any output data.But the match in this link
The escape sequence is processed when the string literal is parsed by the JavaScript compiler.
By the time you apply your regular expression to them, they have been consumed. The backslash characters only exist in your source code, not in your data.
If you want to put a backslash character in your string, then you need to write the escape sequence for it (the \\) in the source code. You can't add them back in with JavaScript afterwards.
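Not sure if I understood the problem, but try this code:
var tex_form = tex.replace(/(\\)/g, "\\\\");
The pattern has to be a regex literal, not a quoted string, to get a match; the group ( ) captures the single backslash and the replacement "\\\\" writes two in its place.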

javascript replace() not replacing text containing literal \r\n strings

This bit of code trims out hidden characters like carriage returns and line feeds, replacing them with nothing, and works just fine:
value = value.replace(/[\r\n]*/g, "");
But when the text actually contains the literal characters \r\n, what do I do to trim them without affecting the r's and n's in my content? I've tried this code:
value = value.replace(/[\\r\\n]+/g, "");
on this bit of text:
{"client":{"werdfasreasfsd":"asdfRasdfas\r\nMCwwDQYJKoZIhvcNAQEBBQADGw......
I end up with this:
{"cliet":{"wedfaseasfsd":"asdfRasdfasMCwwDQYJKoZIhvcNAQEBBQADGw......
Side note: it leaves the uppercase R and N alone because I didn't include the /i flag at the end, and that's OK in this case.
What do I do to just remove \r\n text found in the string?
If you want to match literal \r and literal \n then you should use the following:
value = value.replace(/(?:\\[rn])+/g, "");
You might think that matching literal \r and \n with [\\r\\n] is the right way to do it and it is a bit confusing but it won't work and here is why:
Remember that in character classes, each single character represents a single letter or symbol, it doesn't represent a sequence of characters, it is just a set of characters.
So the character class [\\r\\n] actually matches the literal characters \, r and n as separate letters and not as sequences.
Edit: If you want to replace all carriage returns \r, newlines \n and also literal \r and \n, then you could use:
value = value.replace(/(?:\\[rn]|[\r\n]+)+/g, "");
About (?:): it denotes a non-capturing group. By default, when you put something into a plain group (), it gets captured into a numbered variable that you can use elsewhere inside the regular expression itself, or later in the matches array.
(?:) prevents capturing the value and causes less overhead than (); for more info see this article.
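A sketch contrasting the patterns on sample strings; the samples are made up, loosely modelled on the JSON in the question.

```javascript
const literal = "asdfRasdfas\\r\\nMCww"; // holds the characters \ r \ n, not CR/LF
const real = "line1\r\nline2";           // holds a real CR and LF

// Wrong: a class is a set of single characters, so every \, r and n is removed
const wrong = "client\\r\\n".replace(/[\\r\\n]+/g, "");
// Right: match the two-character sequences \r and \n
const right = literal.replace(/(?:\\[rn])+/g, "");
// The combined pattern: literal sequences and real CR/LF characters
const both = (literal + real).replace(/(?:\\[rn]|[\r\n]+)+/g, "");

console.log(wrong); // cliet
console.log(right); // asdfRasdfasMCww
console.log(both);  // asdfRasdfasMCwwline1line2
```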
To just remove them, this seems to work for me:
value = value.replace(/[\r\n]/g, "");
You don't need the * after the character set: the g flag already makes replace() act on every occurrence, and * would only add empty matches.
Note, this will remove all \r or \n chars whether they are in this exact sequence or not.
Working demo of this option: http://jsfiddle.net/jfriend00/57GtJ/
If you want to remove these characters only when they occur in this exact sequence (e.g. only when a \r is directly followed by a \n), you could use this:
value = value.replace(/\r\n/g, "");
Working demo of this option: http://jsfiddle.net/jfriend00/Ta3sn/
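A sketch showing the difference between the two options on a string that mixes a lone CR, a CRLF pair and a lone LF (the sample is made up):

```javascript
const mixed = "a\rb\r\nc\nd"; // a lone CR, a CRLF pair, and a lone LF

// Every CR and every LF is removed, wherever it appears
const allGone = mixed.replace(/[\r\n]/g, "");
// Only a CR directly followed by an LF is removed; the lone \r and \n survive
const pairsGone = mixed.replace(/\r\n/g, "");

console.log(JSON.stringify(allGone));   // "abcd"
console.log(JSON.stringify(pairsGone)); // "a\rbc\nd"
```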
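If you have text with a lot of \r\n (literal or real) and want to keep each of them as an HTML line break, try this one (the \r\n pairs are listed first so each pair becomes a single <br>):
value.replace(/\\r\\n|\\[rn]|\r\n|[\r\n]/g, "<br>");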
http://jsfiddle.net/57GtJ/63/

Javascript regex invalid range in character class

I'm using a regex pattern that I got from regexlib to validate relative urls. On their site you can test the pattern to make sure it fits your needs. Everything works great on their site, but as soon as I use the pattern in mine I get the error message:
Invalid range in character class
I know that this error usually means that a hyphen is mistakenly being used to represent a range and is not properly escaped. But in this case since it works on their site I'm confused why it's not working on mine.
var urlRegex = new RegExp('^(?:(?:\.\./)|/)?(?:\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)?(?:/\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)*(?:\?[^#]+)?(?:#[a-z0-9]\w*)?$', 'g');
NOTE:
If you're going to test the regex from their site (using the link above) be sure to change the Regex Engine dropdown to Client-side Engine and the Engine dropdown to Javascript.
Either put - at the end or beginning of the character class, or use two backslashes to do a regex escape within a string.
Since you are using a string, you need two backslashes for each special character you escape.
NOTE
Check out this answer on SO which explains when to use single or double backslashes to escape special characters
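A sketch of why the string form fails: the string literal consumes the single backslashes, so the regex engine receives an unescaped range. The class [;\-+] below is a trimmed-down piece of the question's class, chosen because ;-+ is exactly where the range goes out of order.

```javascript
// In a string literal, "\-" and "\w" lose their backslash before RegExp ever sees them
console.log("\-" === "-", "\w" === "w"); // true true

// So "[;\-+]" reaches the regex engine as [;-+]: a range from ';' (0x3B) down to '+' (0x2B)
let error = null;
try {
  new RegExp("[;\-+]");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true

// Doubling the backslash keeps the regex-level escape: the engine sees [;\-+]
const ok = new RegExp("[;\\-+]");
console.log(ok.test("-"), ok.test(",")); // true false
```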
There is no reason to use RegExp constructor here. Just use RegExp literal:
var urlRegex = /^(?:(?:\.\.\/)|\/)?(?:\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)?(?:\/\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)*(?:\?[^#]+)?(?:#[a-z0-9]\w*)?$/g;
Inside a RegExp literal, you just write the regex naturally, except for /, which now needs escaping because / is used as the delimiter of the RegExp literal.
In a character class, ^ has special meaning at the beginning of the class, - has special meaning between two characters, and \ has special meaning: it escapes other characters (mainly ^, -, [, ] and \) and also introduces shorthand character classes (\d, \s, \w, ...). [ and ] are used as delimiters for the character class, so they also have special meaning. (Actually, in JavaScript only ] has special meaning, and you can write [ without escaping inside a character class.) Apart from the 5 characters listed above, no other character (unless it is part of an escape sequence with \) has any special meaning.
You can reduce the number of escaping backslashes with the information above. For ^, unless it is the only character in the character class, you can move it away from the beginning of the class. For -, you can put it at the end of the class.
var urlRegex = /^(?:(?:\.\.\/)|\/)?(?:\w(?:[\w`~!$=;+.^()|{}\[\]-]|(?:%\d\d))*\w?)?(?:\/\w(?:[\w`~!$=;+.^()|{}\[\]-]|(?:%\d\d))*\w?)*(?:\?[^#]+)?(?:#[a-z0-9]\w*)?$/g;
What was changed:
[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]
[\w`~!$=;+.^()|{}\[\]-]
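A sketch that checks the two classes really accept the same characters, probing each character the original lists plus a few that should be rejected:

```javascript
const escaped = /[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]/;
const simplified = /[\w`~!$=;+.^()|{}\[\]-]/;

// Every character the original class lists, plus a few outsiders (# % / space)
const probe = "aZ9_`~!$=;-+.^()|{}[]#%/ ";
const same = [...probe].every((c) => escaped.test(c) === simplified.test(c));

console.log(same); // true
```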

Writing a Javascript regex that includes special reserved characters

I'm writing a function that takes a prospective filename and validates it in order to ensure that no system disallowed characters are in the filename. These are the disallowed characters: / \ | * ? " < >
I could obviously just use string.indexOf() to search for each special char one by one, but that's a lot longer than it would be to just use string.search() using a regular expression to find any of those characters in the filename.
The problem is that most of these characters are considered to be part of describing a regular expression, so I'm unsure how to include those characters as actually being part of the regex itself. For example, the / character in a Javascript regex tells Javascript that it is the beginning or end of the regex. How would one write a JS regex that functionally behaves like so: filename.search(\ OR / OR | OR * OR ? OR " OR < OR >)
Put your stuff in a character class like so:
[/\\|*?"<>]
You're gonna have to escape the backslash, but the other characters lose their special meaning. Also, RegExp's test() method is more appropriate than String.search in this case.
filenameIsInvalid = /[/\\|*?"<>]/.test(filename);
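A sketch wrapping the answer's class in a small validator; isInvalidFilename is a hypothetical name and the sample filenames are made up.

```javascript
// Hypothetical helper; the character class is the answer's
function isInvalidFilename(filename) {
  // '/' needs no escape inside a class in a regex literal; '\' must be written '\\'
  return /[/\\|*?"<>]/.test(filename);
}

console.log(isInvalidFilename("report-2024.txt")); // false
console.log(isInvalidFilename("a*b.txt"));         // true
console.log(isInvalidFilename("dir\\file"));       // true (the string holds one backslash)
```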
Include a backslash before the special characters [\^$.|?*+(){}, for instance, like \$
You can also search for a character by its ASCII/ANSI code: use \xFF, where FF is two hexadecimal digits. Here is a hex table reference: http://www.asciitable.com/ Here is a regex reference: http://www.regular-expressions.info/reference.html
The correct syntax of the regex is:
/^[^\/\\|\*\?"<>]+$/
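[^...] matches any single character except the ones listed. Anchored with ^ and $ and repeated with +, the pattern only matches a filename made entirely of allowed characters, so match() returns null as soon as the filename contains a disallowed character. To validate, check whether the result is null.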
Demo: jsFiddle.
Demo #2: Comparing against null.
The first string is valid; the second is invalid, hence null.
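A sketch of the same whitelist check, testing the result of match() against null (the sample filenames are made up):

```javascript
// One or more characters, none of them disallowed, from start to end
const validName = /^[^\/\\|\*\?"<>]+$/;

console.log("notes.txt".match(validName) !== null); // true: valid
console.log("what?.txt".match(validName));          // null: contains '?'
console.log("".match(validName));                   // null: '+' also rejects empty names
```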
But obviously, you need to escape regex characters that are used in the matching: to escape a character that has special meaning in a regex, put a backslash before it, e.g. \*, \/, \$, \?.
You'll need to escape the special characters. In javascript this is done by using the \ (backslash) character.
I'd recommend however using something like xregexp which will handle the escaping for you if you wish to match a string literal (something that is lacking in javascript's native regex support).
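If pulling in a library is not an option, a common hand-written alternative (similar to the escapeRegExp helper in MDN's regular expressions guide) escapes every character that is special outside a character class before the text is passed to new RegExp. This is a sketch, not xregexp's API; it does not escape - or /, which is fine outside a class and with the RegExp constructor.

```javascript
// Escapes every character that has special meaning outside a character class.
// "$&" in the replacement string stands for the matched character.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "price: $5.00 (approx.)";
const re = new RegExp(escapeRegExp(needle));

console.log(re.test("total price: $5.00 (approx.) incl. tax")); // true
console.log(re.test("price: $5X00 (approx.)"));                 // false: '.' is now literal
```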
