Nice way to do this regex substitution

I'm writing a JavaScript function that takes a regex and some elements, and matches the regex against each element's name attribute.
Let's say I'm passed this regex:
/cmw_step_attributes\]\[\d*\]/
and a string that is structured like this
"foo[bar][]chicken[123][cmw_step_attributes][456][name]"
where all the numbers could vary, or be missing. I want to match the regex against the string in order to swap out the 456 for another number (which will vary), e.g. 789. So I want to end up with
"foo[bar][]chicken[123][cmw_step_attributes][789][name]"
The regex will match the string, but I can't swap out the whole match for 789, as that would wipe out the "[cmw_step_attributes][" part. There must be a clean and simple way to do this, but I can't get my head around it. Any ideas?
Thanks, Max

Capture the first part and put it back into the string.
str = str.replace(/(cmw_step_attributes\]\[)\d*/, '$1789');
// '$1789' is read as $1 followed by the digits 789, because there is no group 17
// note I removed the closing ] from the end: \d* is greedy, so all the digits are selected
// alternatively:
str = str.replace(/cmw_step_attributes\]\[\d*\]/, 'cmw_step_attributes][789]');

Either literally rewrite the part that must remain the same in the replacement string, or put it inside capturing parentheses and reference it in the replacement.
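Because the new number varies, a replacer function is a little more robust than building a string such as '$1' + newIndex, which only works because the engine reads $1789 as group 1 followed by the literal digits 789. A minimal sketch, assuming the attribute format from the question; swapStepIndex is a hypothetical helper name:

```javascript
// Hypothetical helper: swap the index after "[cmw_step_attributes][" for newIndex.
// Group 1 captures the fixed prefix so it can be written back unchanged.
// \d* also matches an empty index, e.g. "[cmw_step_attributes][]".
function swapStepIndex(name, newIndex) {
  return name.replace(
    /(cmw_step_attributes\]\[)\d*/,
    (match, prefix) => prefix + String(newIndex)
  );
}

const sampleName = "foo[bar][]chicken[123][cmw_step_attributes][456][name]";
console.log(swapStepIndex(sampleName, 789));
// foo[bar][]chicken[123][cmw_step_attributes][789][name]
```

Where lookbehind is available (ES2018), /(?<=cmw_step_attributes\]\[)\d*/ lets you replace the digits directly without writing the prefix back.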

See answer on: Regular Expression to match outer brackets.
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

Have you tried:
var str = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
str.replace(/cmw_step_attributes\]\[\d*?\]/gi, 'cmw_step_attributes][XXX]');

Related

Get the Opposite of a Regular Expression [duplicate]

Is it possible to write a regex that returns the converse of a desired result? Regexes are usually inclusive - finding matches. I want to be able to transform a regex into its opposite - asserting that there are no matches. Is this possible? If so, how?
http://zijab.blogspot.com/2008/09/finding-opposite-of-regular-expression.html states that you should bracket your regex with
/^((?!^ MYREGEX ).)*$/
but this doesn't seem to work. If I have the regex
/[a|b]./
the string "abc" returns false with both my regex and the converse suggested by zijab:
/^((?!^[a|b].).)*$/
Is it possible to write a regex's converse, or am I thinking incorrectly?

Couldn't you just check to see if there are no matches? I don't know what language you are using, but how about something like this (JavaScript-style)?
if (!'Some String'.match(someRegularExpression)) {
  // do something...
}
If you can only change the regex, then the one you got from your link should work:
/^((?!REGULAR_EXPRESSION_HERE).)*$/

The reason your inverted regex isn't working is the '^' inside the negative lookahead:
/^((?!^[ab].).)*$/
^ # WRONG
Maybe it's different in vim, but in every regex flavor I'm familiar with, the caret matches the beginning of the string (or the beginning of a line in multiline mode). But I think that was just a typo in the blog entry.
You also need to take into account the semantics of the regex tool you're using. For example, in Perl, this is true:
"abc" =~ /[ab]./
But in Java, this isn't:
"abc".matches("[ab].")
That's because the regex passed to the matches() method is implicitly anchored at both ends (i.e., /^[ab].$/).
Taking the more common Perl semantics, /[ab]./ means the target string contains a sequence consisting of an 'a' or 'b' followed by one more (non-line-terminator) character. In other words, at SOME position the condition is TRUE. The inverse of that statement is: at EVERY position the condition is FALSE. That means that before you consume each character, you perform a negative lookahead to confirm that the character isn't the beginning of a matching sequence:
(?![ab].).
And you have to examine every character, so the regex has to be anchored at both ends:
/^(?:(?![ab].).)*$/
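JavaScript's test() follows the Perl-style, unanchored semantics described above, so the idea can be checked directly. A small sketch comparing /[ab]./ with the anchored lookahead version on single-line inputs (. does not match line terminators, so inputs containing newlines would need the s flag on both patterns):

```javascript
const hasPair = /[ab]./;            // an 'a' or 'b' followed by one more character, anywhere
const noPair = /^(?:(?![ab].).)*$/; // at no position does such a pair start

for (const s of ["abc", "xab", "xyz", "a", ""]) {
  // The two results should always be opposites.
  console.log(JSON.stringify(s), hasPair.test(s), noPair.test(s));
}
```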
That's the general idea, but I don't think it's possible to invert every regex--not when the original regexes can include positive and negative lookarounds, reluctant and possessive quantifiers, and who-knows-what.

You can invert the character set by writing a ^ at the start ([^…]). So the opposite expression of [ab] (match either a or b) is [^ab] (match neither a nor b).
But the more complex your expression gets, the more complex the complementary expression becomes. An example:
You want to match the literal foo. An expression that matches anything except a string containing foo would have to match either
any string that’s shorter than foo (^.{0,2}$), or
any three-character string that’s not foo (^([^f]..|f[^o].|fo[^o])$), or
any longer string that does not contain foo.
All together, this works (it tracks whether the text read so far ends in nothing special, in f, or in fo):
^(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?$
But note: this only applies to foo.
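A hand-built complement like this is easy to get subtly wrong, so it is worth checking against a plain negated test. The sketch below compares a state-based "no foo" pattern with !s.includes("foo") for every string over the letters f, o and x up to six characters long; the alphabet and length are arbitrary choices for the check:

```javascript
// Complement of "contains foo", built from three states:
// nothing pending, "f" pending, "fo" pending.
const noFoo = /^(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?$/;

// Enumerate every string over `alphabet` with length 0..maxLen.
function* allStrings(alphabet, maxLen) {
  let level = [""];
  for (let len = 0; len <= maxLen; len++) {
    yield* level;
    level = level.flatMap((s) => [...alphabet].map((c) => s + c));
  }
}

let mismatches = 0;
for (const s of allStrings("fox", 6)) {
  if (noFoo.test(s) !== !s.includes("foo")) mismatches++;
}
console.log(mismatches); // 0
```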

You can also do this in Python by using re.split with your regular expression, which returns all the parts of the string that don't match the regex.
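JavaScript has the same tool: String.prototype.split accepts a regex and returns the pieces between its matches. A minimal sketch (note that if the regex contains capturing groups, the captured text is spliced into the result array as well):

```javascript
// split() returns the text between matches of /[ab]./, i.e. everything the
// regex did not match. Empty pieces appear where a match touches the start or
// end of the string (or another match), so they are filtered out here.
const parts = "abc xab yy".split(/[ab]./).filter((p) => p !== "");
console.log(parts); // ["c x", " yy"]
```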

In Perl, you can anti-match with $string !~ /regex/;

With grep, you can use --invert-match or -v.

Java regexes have an interesting way of doing this: you create a possessive optional match for the string you want, and then match data after it. If the possessive match fails, it's optional, so it doesn't matter; if it succeeds, it cannot give anything back, so the second expression needs some extra data to match, and fails without it.
It looks counter-intuitive, but it works.
E.g. (foo)?+.+ matches bar, foox and xfoo, but won't match foo (or an empty string).
It might be possible in other dialects that support possessive quantifiers (JavaScript has none), but I couldn't get it to work myself; other engines seem more willing to backtrack if the second match fails.
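JavaScript has no possessive quantifiers, but its lookaheads are atomic: once a lookahead has succeeded, the engine never backtracks into it. A commonly used emulation of (foo)?+ is therefore (?=(foo)?)\1. A sketch, anchored with ^ and $ to mimic Java's matches():

```javascript
// (?=(foo)?) captures "foo" if it is there, and is never re-entered to try
// the empty alternative. \1 then consumes exactly what was captured
// (a backreference to an unmatched group matches the empty string).
const notExactlyFoo = /^(?=(foo)?)\1.+$/;

for (const s of ["bar", "foox", "xfoo", "foo", ""]) {
  console.log(JSON.stringify(s), notExactlyFoo.test(s));
}
// "bar" true, "foox" true, "xfoo" true, "foo" false, "" false
```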

Negate random regular expression

Is there a way to negate any regular expression? I'm using regular expressions to validate input on a form, and I'm now trying to create a button that sanitizes my input. Is there a way to use the regular expression used for validation to also strip the invalid characters?
I'm using this regex for validation of illegal characters
<input data-val-regex-pattern="[^|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*" type="text" />
When clicking on a button next to it, I'm calling this function:
$('#button').click(function () {
var inputElement = $(this).prev();
var regex = new RegExp(inputElement.attr('data-val-regex-pattern'), 'g');
var value = inputElement.val();
inputElement.val(value.replace(regex, ''));
});
At the moment the javascript is doing the exact opposite of what I'm trying to accomplish. I need to find a way to 'reverse' the regex.
Edit: I'm trying to reverse the regex in the JavaScript function. The regex in the data-val-regex-pattern attribute is doing its job for validation.

To find the invalid characters, just remove the ^ from your regex. A caret at the start of the brackets negates everything that is inside them.
data-val-regex-pattern="[|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*"
This will return the undesired characters so you can replace them.
Also, as you want to remove a lot of non-word characters, you could try a simpler regex. If you want to keep only word characters and whitespace, you could match everything else with something like this:
data-val-regex-pattern="[^\w\s]*"
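Applied to the question's handler, the idea is to remove whatever the non-negated class matches. A sketch; stripInvalid and keepWordsAndSpaces are hypothetical names, and the click handler would call something like inputElement.val(stripInvalid(value)):

```javascript
// The question's character class without the leading ^, with + instead of *
// so the regex never produces empty matches.
function stripInvalid(value) {
  return value.replace(/[|<>:\?'\*\[\]\=%\$\+,;~&\{\}]+/g, "");
}

// Keep only word characters and whitespace. Without the u flag, \w is
// ASCII-only ([A-Za-z0-9_]), so accented letters would be removed too.
function keepWordsAndSpaces(value) {
  return value.replace(/[^\w\s]+/g, "");
}

console.log(stripInvalid("a<b>c;d{e}"));          // abcde
console.log(keepWordsAndSpaces("hello, world!")); // hello world
```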

Your regex is this:
[^|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*
That means it matches any run of characters that are not invalid.
You then replace those runs with the empty string, so you are left with only the bad characters.
Try this instead, without the negation (the caret is moved away from the start, so it is just a literal ^):
[|^<>:\?'\*\[\]\=%\$\+,;~&\{\}]*

The following answer is to the general question of negating a regular expression. In your specific case you just need to negate a character group, or more precisely remove the negation of a character group - which is detailed in other answers.
Regular languages (the sets of strings matched in their entirety by some RE) are in fact closed under complement: for every RE there is another RE that matches exactly those strings the original does not. It is, however, not trivial to construct, which perhaps explains why RE implementations often do not offer a negation operator.
However, the JavaScript regexp language has extensions that make it more expressive than regular languages; in particular, there is the construct of negative lookahead.
If R1 is a regexp then
^(?!.*(R1))
matches precisely the strings that do not contain a match for R1.
And
^(?!(?:R1)$)
matches precisely the strings where the whole string is not a match for R1 (the non-capturing group keeps an alternation inside R1 from escaping the $).
I.e., negation.
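A sketch of both lookahead constructions in JavaScript, building them with the RegExp constructor from a pattern source string. R1 = "ab|cd" is an arbitrary example; the s flag lets . cross newlines so "contains" also covers multi-line input:

```javascript
// (?:...) keeps an alternation inside R1 from leaking out of the lookahead.
const notContaining = (r1) => new RegExp(`^(?!.*(?:${r1}))`, "s");
const notWhole = (r1) => new RegExp(`^(?!(?:${r1})$)`, "s");

const r1 = "ab|cd";
console.log(notContaining(r1).test("xxcdxx")); // false: "cd" occurs
console.log(notContaining(r1).test("xxyy"));   // true
console.log(notWhole(r1).test("ab"));          // false: the whole string matches R1
console.log(notWhole(r1).test("abab"));        // true
```

Without the (?:...) wrapper, ^(?!ab|cd$) would reject "abab" as well, because the ab branch is no longer tied to the end of the string.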
For rewriting any substring not matching a given regexp, the above is insufficient. One would have to do something like
((?!R1).)*
which would catch any substring not containing a sub-substring that matches R1. But considering the edge cases shows that this does not quite do what we are after. For example, ((?!ab).)* matches "b" in "ab", because "ab" is not a substring of "b".
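One can cheat and write the regexp like this:
(.*?)(R1|$)
and rewrite each match to T1$2 (the quantifier must be lazy; a greedy .* would swallow the whole string and then match $),
where T1 is the target string you want to rewrite to.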
This should rewrite any portion of the string not matching R1 to T1. However I would be very careful about any edge cases for this. So much so that it might be better to write the regexp from scratch rather than trying a general approach.
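A sketch of the cheat in JavaScript, under the assumption that R1 cannot match the empty string. rewriteOutside is a hypothetical helper; it uses a replacer function so that the empty segments between adjacent matches (and the empty match at the very end of the string) are not turned into T1, which is one of the edge cases to watch for:

```javascript
// Hypothetical helper: replace every stretch of text that is NOT a match of
// r1Source with `replacement`, keeping the matches themselves.
// (.*?) is lazy, so each segment stops at the next match of R1 or at the end;
// the s flag lets "." cross newlines. Group 2 is always the outer (R1|$) group,
// even if R1 contains capturing groups of its own.
function rewriteOutside(text, r1Source, replacement) {
  const re = new RegExp(`(.*?)(${r1Source}|$)`, "gs");
  return text.replace(re, (whole, segment, hit) =>
    (segment === "" ? "" : replacement) + hit
  );
}

console.log(rewriteOutside("xxabyy", "ab", "T")); // TabT
console.log(rewriteOutside("abab", "ab", "T"));   // abab
```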

Alternation operator inside square brackets does not work

I'm creating a JavaScript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.

Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.

Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines... to match a literal dot, you have to escape it with a backslash (like \.)
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
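Putting those fixes together in JavaScript (escaped dot, a group instead of a character class, no {1}). A sketch; the sample URLs are made up for illustration:

```javascript
const baiduQuery = /baidu\.com.*[\/?].*(?:wd|word|qw)=/;

console.log(baiduQuery.test("http://www.baidu.com/s?wd=regex"));   // true
console.log(baiduQuery.test("http://www.baidu.com/s?word=regex")); // true
console.log(baiduQuery.test("hbaidu-com/?wd=regex"));              // false: the dot is now literal
```

Because of the .* in front, (?:wd|word|qw)= would still match inside a longer parameter name such as keyword=; requiring [?&] immediately before the group is one way to tighten that.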

Help with a regex

I've got the following sequence I'm attempting to detect: #hl=b&xhr=a, where b and a can be anything.
I've got the following, but it doesn't appear to be working: (#hl=.+&xhr=). Does anyone know why?
I'm using javascript and values a and b are letters of the alphabet.

Use (#hl=.+&xhr=.+); you missed the second .+. Depending on your regex engine, you should also check its escaping rules: in some, the parentheses or the + have to be escaped. If you just want to match the whole string, the parentheses aren't needed anyway.

You'll need to be more specific to get a better answer:
what programming language are you using RegEx in?
what values can a and b have? Anything implies that newlines are included, which . doesn't match
do you want to get the values of a and b?
Now that that's all been said, let's move on to a regex with some assumptions:
/#hl=(.+)&xhr=(.+)/
This will match a string #hl=a&xhr=b and capture the a and b values from it. It is greedy, so if there are further key-value pairs in the pseudo-URL (I assume it's a URL-encoded string in the hash), they will be included in b:
#hl=a&xhr=b&foo=bar
Here the second capture group will contain b&foo=bar.
The regex also assumes #hl= comes before &xhr=.
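A quick JavaScript check of the greedy behaviour, using exec() to read the capture groups:

```javascript
const greedy = /#hl=(.+)&xhr=(.+)/;
const greedyMatch = greedy.exec("#hl=a&xhr=b&foo=bar");
// (.+) grabs as much as it can, so the trailing pair ends up in group 2.
console.log(greedyMatch[1], greedyMatch[2]); // a b&foo=bar
```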

Assuming #, & and = are special characters, how about this regular expression:
#hl=([^#&=]+)&xhr=([^#&=]+)
Are you sure your key/value pairs (?) are always in this order without anything in between?
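With negated character classes, the second group stops at the next &. If the order of the pairs can vary, parsing the string with the standard URLSearchParams API avoids the problem entirely; a sketch of both (URLSearchParams only strips a leading ?, so the # is sliced off first):

```javascript
const strict = /#hl=([^#&=]+)&xhr=([^#&=]+)/;
const strictMatch = strict.exec("#hl=a&xhr=b&foo=bar");
console.log(strictMatch[1], strictMatch[2]); // a b

// Order-independent alternative: parse the pairs instead of matching them.
const params = new URLSearchParams("#hl=a&xhr=b&foo=bar".slice(1));
console.log(params.get("hl"), params.get("xhr")); // a b
```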

Struggling with regex to match only two of a character, not three

I need to match all occurrences of // in a string in a JavaScript regex.
It can't match /// or /
So far I have (.*[^\/])\/{2}([^\/].*)
which is basically "something that isn't /, followed by // followed by something that isn't /"
The approach seems to work apart from when the string I want to match starts with //
This doesn't work:
//example
This does
stuff // example
How do I solve this problem?
Edit: A bit more context - I am trying to replace // with !, so I am then using:
result = result.replace(myRegex, "$1 ! $2");

Replace two slashes that either begin the string or do not follow a slash,
and are followed by anything not a slash or the end of the string.
s=s.replace(/(^|[^/])\/{2}(?!\/)/g,'$1!');

It looks like it wouldn't work for example// either.
The problem is because you're matching // preceded and followed by at least one non-slash character. This can be solved by anchoring the regex, and then you can make the preceding/following text optional:
^(.*[^\/])?\/{2}([^\/].*)?$

Use negative lookahead/lookbehind assertions:
(.*)(?<!/)//(?!/)(.*)
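In JavaScript the lookarounds can drive a global replace directly, which also handles adjacent occurrences such as a//b//c, because the neighbouring characters are only checked, not consumed. A sketch; replaceDoubleSlashes is a hypothetical name, and lookbehind is an ES2018 feature that very old engines (e.g. Safari before 16.4) reject:

```javascript
// Replace every "//" that is neither preceded nor followed by another "/".
// Inside a regex literal the slashes have to be escaped.
function replaceDoubleSlashes(text, replacement) {
  return text.replace(/(?<!\/)\/\/(?!\/)/g, replacement);
}

console.log(replaceDoubleSlashes("//example", "!"));        // !example
console.log(replaceDoubleSlashes("stuff // example", "!")); // stuff ! example
console.log(replaceDoubleSlashes("a//b///c//", "!"));       // a!b///c!
```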

Use this:
/([^/]*)(\/{2})([^/]*)/g
e.g.
alert("///exam//ple".replace(/([^/]*)(\/{2})([^/]*)/g, "$1$3"));
EDIT: Updated the expression as per the comment.
/[/]{2}/
e.g:
alert("//example".replace(/[/]{2}/, ""));

This doesn't answer the OP's question about regex, but some of the original comments suggested using .replaceAll, not everyone who reads this question in the future will want regex, and people might mistakenly assume regex is the only option. Since these details don't fit in a comment, here's a poor man's non-regex approach:
Temporarily replace the three contiguous characters with something that would never naturally occur (really important when dealing with user-entered values).
Replace the remaining two contiguous characters using .replaceAll().
Restore the original three contiguous characters.
For instance, let's say you wanted to remove all instances of ".." without affecting occurrences of "...".
var cleansedText = $(this).text().toString()
.replaceAll("...", "☰☸☧")
.replaceAll("..", "")
.replaceAll("☰☸☧", "...")
;
$(this).text(cleansedText);
Perhaps not as fast as regex for longer strings, but works great for short ones.
