Javascript regex parsing - javascript

I'm looking to parse some formatting out of a field using javascript. My rule is catching some extra things which I need to fix. The regex is:
/[\((\)\s)-]/g
This regex properly cleans up (123) 456-7890. The problem is that it also removes all spaces, rather than just spaces following a closing parenthesis. I'm no expert in regex, but it was my understanding that (\)\s) would only remove the closing parenthesis and space combo. What would the correct regex look like? It needs to remove all parentheses and dashes, and only remove spaces that immediately follow a closing parenthesis.
The replace call I am using, and the outcomes I would like, are as follows:
var str = mystring.replace(/[\((\)\s)-]/g, '');
(123) 456-7890 should become 1234567890, which is working.
leave me alone should stay leave me alone; the issue is that it is becoming leavemealone.

This will do the job:
var str = mystring.replace(/\)\s*|\(\s*|-/g, '');
Explanation of the regex:
\)\s* : Closing parenthesis followed by any amount of whitespace
| : OR
\(\s* : Opening parenthesis followed by any amount of whitespace
| : OR
- : Hyphen
Since parentheses are regex metacharacters used for grouping, they need to be escaped when you want to match them literally.
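As a quick check of the pattern above against the two inputs from the question, here is a minimal sketch. The helper name stripPhoneFormatting is made up for illustration and is not part of the original answer:

```javascript
// Hypothetical helper wrapping the replace call from the answer above.
function stripPhoneFormatting(input) {
  // Remove ")" or "(" together with any whitespace after it, and every "-".
  return input.replace(/\)\s*|\(\s*|-/g, '');
}

const phone = stripPhoneFormatting('(123) 456-7890');
const sentence = stripPhoneFormatting('leave me alone');
console.log(phone);    // 1234567890
console.log(sentence); // leave me alone
```

Spaces that do not follow a parenthesis are left alone, which is what keeps the second input intact.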

Placing everything in brackets ([]) creates a class of characters to match anywhere in the input. Taking your requirements literally ("remove all parentheses, dashes and spaces immediately following a closing parenthesis"):
"(123) 456-789 0".replace(/\)[\(\)\s-]+/g, ")")
Output:
"(123)456-789 0"
This matches (essentially) the same character class, but specifies that these characters immediately follow a closing parenthesis.

You could use a lookbehind to ensure that there is a parenthesis (or something else) preceding the space:
(?<=\))\s
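A minimal sketch of how the lookbehind could be combined with a character class for the parentheses and dashes. Lookbehind is an ES2018 feature, so this assumes a reasonably modern JavaScript engine, and the combined pattern is an illustration rather than part of the original answer:

```javascript
// (?<=\))\s : a whitespace character directly preceded by ")"
// [()-]     : any parenthesis or dash
const cleanup = /(?<=\))\s|[()-]/g;

const cleanedPhone = '(123) 456-7890'.replace(cleanup, '');
const cleanedText = 'leave me alone'.replace(cleanup, '');
console.log(cleanedPhone); // 1234567890
console.log(cleanedText);  // leave me alone
```

The lookbehind is evaluated against the original string, so the ")" is still visible to it even though the same pass removes it.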
------------ OLD ANSWER ----------
If you want to remove all parentheses, dashes and spaces, you would go with something like this:
/[\s-\(\)]+/g
[something] - would look for anything that is in the brackets (letters s, o, m, e, t, h, i, n, g).
\s = white space
( = opening parenthesis
) = closing parenthesis
+ = one or more occurrences of what precedes it (here: parentheses, whitespace and dashes)

Related

Odd RegEx request for Javascript

I'm having trouble with a certain RegEx replacement string for later use in Javascript.
We have quite a bit of text that was stored in a rather odd format that we aren't allowed to fix.
But we do need to find all the "network path" strings inside it, following these rules:
A. The matches always start with 2 backslashes.
B. The matching characters should stop as soon as it hits a first occurrence of any 1 of these:
A < character
A space
A line feed
A carriage return
A & character
A literal "\r" or "\n" string (but only if occurring at end of line)
We "almost" have it working with /\\\\[^ &<\s]*/gi as shown in this RegEx Tester page:
https://regex101.com/r/T4cDOL/5
Even if we get it working, the RegEx has to be even further "escape escaped" before we put it in
our Javascript code, but that's also not working as expected.
From your example, it seems you literally have a backslash followed by an n and a backslash followed by an r (as opposed to a newline or carriage return), which means you can't only use a negated character class (since you need to handle a sequence of two characters). I'd use a positive lookahead to know where to stop, so I can use an alternation for that part.
You haven't said what parts of those strings should match, so I've had to guess a bit, but here's my best guess (with useful input from Niet the Dark Absol):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;
That says:
Match starting with \\
Take everything prior to the lookahead (non-greedy)
Lookahead: An alternation of:
A space, &, <, carriage return (\r, character 13), or a newline (\n, character 10); or
A backslash followed by r or n if that's either at the end of a line or followed by a space (so we get the \nancy but not the \n after it).
Updated regex101
You might want to have more characters than just a space after the \r/\n. If so, make it a character class (and/or use \s for "whitespace" if that applies):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$|[ others]))/gmi;
// −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−^^^^^^^^^
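To see how the first pattern behaves, here is a small sketch run against made-up sample text (the server and share names are invented for illustration, not taken from the question's data):

```javascript
// The pattern from the answer above.
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// Made-up sample text. String.raw keeps every backslash literal, so the
// second line really ends with a backslash followed by the letter n.
const sample = [
  String.raw`Open \\server\share\file.txt<br>`,
  String.raw`Path: \\host\nancy\docs\n`,
].join('\n');

const paths = sample.match(rex);
console.log(paths);
```

The first match stops at the < character and the second stops before the trailing literal \n, while the \n inside \nancy is kept because it is followed by neither a space nor the end of the line.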

how to replace or remove all text strings outside of parenthesis in javascript

I am very new to Regex and I am trying to remove all text outside the parentheses and only keep everything inside the parentheses.
For example 1,
Hello,this_isLuxy.(example)
to this:
(example)
Example 2:remove everything after the period
luxySO_i.example
to this:
luxySO_i
Using JS + Regex? Thanks so much!
For this simple string, you can use indexOf and substring functions:
var openParenthesisIndex = str.indexOf('(');
var closedParenthesisIndex = str.indexOf(')', openParenthesisIndex);
var result = str.substring(openParenthesisIndex, closedParenthesisIndex + 1);
Ok, if you want to use regex, then it's going to be a bit complicated. Anyways, here you go:
var str = "Hello,this_(isL)uxy.(example) asd (todo)";
var result = str.replace(/[^()](?=([^()]*\([^()]*\))*[^()]*$)/g, '');
console.log(result); // "(isL)(example)(todo)"
In short, this removes any character other than ( or ) that is followed, up to the end of the string, only by zero or more balanced pairs of parentheses and text outside them. It will fail for nested or unbalanced parentheses, though.
To keep only things inside parenthesis you can use
s.replace(/.*?(\([^)]*\)).*?/g, "$1")
meaning is:
.*? any sequence of any char (but the shortest possible sequence)
\( an open parenthesis
[^)]* zero or more chars that are NOT a closed parenthesis
\) a close parenthesis
.*? any sequence of any char (but the shortest possible)
the three middle elements are what is kept using grouping (...) and $1.
To remove everything after the first period the expression is simply:
s.replace(/\..*/, "")
meaning:
\. the dot character (. is special and would otherwise mean "any char")
.* any sequence of any characters (i.e. everything until the end of the string)
replacing it with the empty string
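As an alternative sketch for the first task, the parenthesized groups can be collected with match and joined, instead of deleting the text around them. This is a different technique from the replace calls above, and the helper names are hypothetical; the second helper simply wraps the period pattern shown above:

```javascript
// Collect every "(...)" group instead of deleting what surrounds it.
function keepParenthesized(str) {
  const groups = str.match(/\([^()]*\)/g) || []; // match() returns null when nothing matches
  return groups.join('');
}

// Drop everything from the first period onwards.
function dropAfterPeriod(str) {
  return str.replace(/\..*/, '');
}

const kept = keepParenthesized('Hello,this_(isL)uxy.(example) asd (todo)');
const trimmed = dropAfterPeriod('luxySO_i.example');
console.log(kept);    // (isL)(example)(todo)
console.log(trimmed); // luxySO_i
```

Like the regex answers above, this sketch does not handle nested parentheses.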

Regex-Groups in Javascript

I have a problem using a Javascript-Regexp.
This is a very simplified regexp, which demonstrates my Problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that don't contain forbidden characters (anything that is not a word character).
The only exception is the symbol +.
A match is allowed to start with this symbol if a digit [0-9] follows it.
And a + must not appear within words (44+44 is not a valid match, but +4ad is)
In order to allow the + only at the beginning, I said that there must be a whitespace preceding. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are 2 Issues with that regexp:
If I use it in my JS-Code, the space is always part of the match
If +42 appears at the beginning of a line, it won't be matched
My Questions:
How should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
How should the regex look like?
You've got multiple choices to reach your goal:
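It's fine as you have it. You might allow the start of the string in place of the whitespace as well, though. Just read the capturing groups (match[1], match[2]) from each match, which will not include the whitespace. Note that input.match with the g flag returns only the whole matches, so use regexp.exec in a loop to get at the groups.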
If you weren't using JavaScript, a lookbehind could help. Unfortunately, JavaScript did not support it when this was written (lookbehind was only added in ES2018).
Require a non-word-boundary before the +, which would make every \w character before the + prevent the match:
/\B\+\d\w+|\w+/
Why does this regex add the space to the matches?
Because the regex does match the whitespace: the \s is part of the overall match. It sits outside the capturing group (\+\d\w*), though, so it is not included in the captured groups.
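A minimal sketch of the first and third suggestions, run on the input from the question. The variable names are made up, and the (?:^|\s) prefix is one assumed way of "allowing the string beginning":

```javascript
const input = '+5ad6 +5ad6 sd asd+as +we';

// First suggestion: allow the start of the string as well as whitespace,
// then read the capturing groups with exec() so the whitespace is left out.
const grouped = /(?:^|\s)(\+\d\w*)|(\w+)/g;
const fromGroups = [];
let m;
while ((m = grouped.exec(input)) !== null) {
  fromGroups.push(m[1] !== undefined ? m[1] : m[2]);
}

// Third suggestion: \B rejects a "+" that directly follows a word character.
const fromBoundary = input.match(/\B\+\d\w+|\w+/g);

console.log(fromGroups);   // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
console.log(fromBoundary); // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
```

With both patterns, asd+as still produces asd and as as two separate tokens; only the + itself is rejected.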

unable to parse - in Regular expression in Javascript

I am a bit new to the regular expressions in Javascript.
I am trying to write a function called parseRegExpression()
which parses the attributes passed and generates a key/value pairs
It works fine with the input:
"iconType:plus;iconPosition:bottom;"
But it is not able to parse the input:
"type:'date';locale:'en-US';"
Basically the - sign is being ignored. The code is at:
http://jsfiddle.net/visibleinvisibly/ZSS5G/
The Regular Expression key value pair is as below
/[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*;|[a-z|A-Z|-]*\s*:\s*[a-z|A-Z|'|"|:|-|_|\/|\.|0-9]*\s*$/gi;
There are a few problems:
A | inside a character class means a literal | character, not an alternation.
A . inside a character class means a literal . character, so there's no need to escape it.
A - as the first or last character inside a character class means a literal - character, otherwise it means a character range.
There's no need to use [a-zA-Z] when you use the case-insensitive modifier (i); [a-z] is enough.
The only difference between your alternatives is the last bit; this can be simplified significantly by limiting the alternation to the part that differs.
This should be equivalent to your original pattern:
/[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi
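A short sketch applying the simplified pattern to both inputs from the question (the variable names are made up for illustration):

```javascript
const pairPattern = /[a-z-]*\s*:\s*[a-z0-9'":_\/.-]*\s*(?:;|$)/gi;

const iconPairs = 'iconType:plus;iconPosition:bottom;'.match(pairPattern);
const localePairs = "type:'date';locale:'en-US';".match(pairPattern);
console.log(iconPairs);   // [ 'iconType:plus;', 'iconPosition:bottom;' ]
console.log(localePairs); // [ "type:'date';", "locale:'en-US';" ]
```

Because the hyphen is now the last character in the class, en-US is matched in full.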
You can avoid the regex:
var test1 = "iconType:plus;iconPosition:bottom;";
var test2 = "type:'date';locale:'en-US';";
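function toto(str) {
  var result = [];
  var temp = str.split(';');
  // The input ends with ';', so the last piece is empty and is skipped.
  for (var i = 0; i < temp.length - 1; i++) {
    // split(':', 1) would keep only the key, so split without a limit
    // to get [key, value]. A value containing ':' would be split further.
    result[i] = temp[i].split(':');
  }
  return result;
}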
console.log(toto(test1));
console.log(toto(test2));
Inside a character set atom [...] the pipe char | is just a regular char and doesn't mean "or".
A character set atom lists characters or ranges you want to accept (or exclude if the character set starts with ^) and "or" is implicit.
You can use a backslash in a character set if you need to include/exclude a close bracket ], the ^ sign, the dash - that is used for ranges, the backslash \ itself, an unprintable character or if you want to use a non-ASCII unicode char specifying the code instead of literally.
Regular expression syntax however also lets you to avoid backslash-escaping in a character set atom by placing the character in a position where it cannot have the special meaning... for example a dash - as first or last in the set (it cannot mean a range there).
Note also that if you need to be able to match as values quoted strings, including backslash escaping, the regular expression is more complex, for example
'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"
matches a single-quoted or double-quoted string including backslash escaping, the meaning being:
A single quote '
Zero or more of either:
Any char except the single quote ' or the backslash \
A pair composed of a backslash \ followed by any char
A single quote '
or the same with double quotes " instead.
Note that the groups have been delimited with (?:...) instead of plain (...) to avoid capture
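To illustrate the quoted-string pattern, here is a small sketch on a made-up input containing escaped quotes:

```javascript
const quoted = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"/g;

// String.raw keeps the backslashes, so the text contains \' and \" literally.
const text = String.raw`say 'it\'s' and "a \"b\""`;
const strings = text.match(quoted);
console.log(strings);
```

The two matches are the complete single-quoted and double-quoted strings, with the backslash-escaped quotes kept inside them rather than ending the match early.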
It doesn't match hyphens because it is interpreting |-| as a range that starts at | and ends at |. (I would have expected that to be treated as a syntax error, but there you have it. It works the same in every regex flavor I've tried, too.)
Have a look at this regex:
/(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig
As suggested by the other responders, I collapsed it to one alternative, removed the unneeded pipes, and escaped the hyphen by moving it to the end. I also anchored it at the beginning as well as the end. Or anchored it as well as I can, anyway. I used a lookahead to match the trailing semicolon so it will still be there when the next match starts. It's far from foolproof, but it should work okay as long as the input is well formed.
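A minimal sketch of how the two capturing groups in this regex could be turned into key/value pairs. The parsePairs helper is hypothetical and assumes well-formed input, as the answer notes:

```javascript
const pairRegex = /(?:^|;)([a-z-]*)\s*:\s*([a-z'":_\/.0-9-]*)\s*(?=;|$)/ig;

// Hypothetical helper: collect the two capturing groups into an object.
function parsePairs(str) {
  const result = {};
  pairRegex.lastIndex = 0; // the g flag makes exec() stateful, so reset it
  let match;
  while ((match = pairRegex.exec(str)) !== null) {
    result[match[1]] = match[2];
  }
  return result;
}

const parsed = parsePairs("type:'date';locale:'en-US';");
console.log(parsed); // { type: "'date'", locale: "'en-US'" }
```

The quotes remain part of the values here; stripping them would be a separate step.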
Replace the regular expressions in your code as follows:
regExpKeyValuePair = /[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*;|[-a-z]*\s*:\s*[-a-z'":_\/.0-9]*\s*$/gi;
regExpKey = /[-a-z]*/gi;
regExpValue = /[-a-z:_\/.0-9]*/gi;
You don't need to escape . inside [].
There is no need to put | between the elements inside [].
Because you are using the i flag, [A-Z] is not needed.
- should be at the beginning or at the end of the class.

regex for content between and including the parentheses

Can someone help me with a regex that will catch the following:
has to be at the end of the string
remove all characters between ( and ) including the parentheses
It's going to be done in javascript.
here's what i have so far -
var title = $(this).find('title').text().replace(/\w+\s+\(.*?\)/, "");
It seems to be catching some chars outside of the parentheses though.
This deals with matching between parens, and only at the end of the string: \([^(]*\)\s*$. If the parens might be nested, you need a parser, not a regular expression.
Where's the $? You need a dollar at the end and possibly catch optional whitespace.
var title = $(this).find('title').text().replace(/\s*\([^\)]*?\)\s*$/, "");
If brackets can also be angle brackets, then this can match those too:
var title = $(this).find('title').text().replace(/\s*(\([^\)]*?\)|\<[^\>]*?\>)\s*$/, "");
var title = $(this).find('title').text().replace(/\([^()]*\)\s*$/, "");
should work.
To remove < and > you don't really need regexes, but of course you can do a mystr.replace(/[<>]+/g, "");
This will match a (, any number of characters except parentheses (thereby ensuring that only the last parentheses will match) and a ), and then the end of the string.
Currently, it allows whitespace between the parentheses and the end of the string (and will remove it, too). If that's not desired, remove the \s* bit from the regex.
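A brief sketch of the /\([^()]*\)\s*$/ pattern on a few made-up titles, using a hypothetical helper instead of the jQuery call from the question:

```javascript
// Hypothetical helper wrapping the replace call.
function stripTrailingParens(title) {
  return title.replace(/\([^()]*\)\s*$/, '');
}

const atEnd = stripTrailingParens('Report (draft) ');
const inMiddle = stripTrailingParens('Intro (part 1) and more');
const twoGroups = stripTrailingParens('Notes (2019) (final)');
console.log(JSON.stringify(atEnd));     // "Report "
console.log(JSON.stringify(inMiddle));  // "Intro (part 1) and more"
console.log(JSON.stringify(twoGroups)); // "Notes (2019) "
```

Only the last group is removed, and only when nothing but whitespace follows it. The space before the removed group stays; a leading \s* in the pattern would remove it as well.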
