Help with a regex - javascript

I'm trying to detect the following sequence: #hl=b&xhr=a, where b and a can each be anything.
I've got the following, but it doesn't appear to be working: (#hl=.+&xhr=). Does anyone know why?
I'm using javascript and values a and b are letters of the alphabet.

You missed the second .+, so the pattern should be (#hl=.+&xhr=.+). Depending on your regex engine, you should also check its escaping rules: in some engines the parentheses or the + have to be escaped. If you just want to match the whole string, the parentheses are not needed anyway.

You'll need to be more specific to get a better answer:
what programming language are you using RegEx in?
what values can a and b have? Anything implies that newlines are included, which . doesn't match
do you want to get the values of a and b?
Now that that's all been said, let's move on to a regex with some assumptions:
/#hl=(.+)&xhr=(.+)/
This will match a string #hl=a&xhr=b and capture the a and b values from the string. It is greedy, so if there are further key-value pairs in the pseudo-URL (I assume it's a URL-encoded string in the hash fragment), they will be included in b. For example, given
#hl=a&xhr=b&foo=bar
the second group will match b&foo=bar.
The regex also assumes #hl= comes before &xhr=.

Assuming #, & and = are special characters, how about this regular expression:
#hl=([^#&=]+)&xhr=([^#&=]+)
Are you sure your key/value pairs (?) are always in this order without anything in between?
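
A minimal sketch of the two patterns from the answers above, assuming the fragment looks like #hl=b&xhr=a as in the question and that hl always comes directly before xhr. The helper name parseFragment is made up for illustration.

```javascript
// Hypothetical helper: extracts the hl and xhr values from a fragment
// such as "#hl=b&xhr=a". Returns null when the fragment does not match.
function parseFragment(fragment) {
  // [^#&=]+ stops at the next delimiter, so trailing pairs are not swallowed.
  const match = /#hl=([^#&=]+)&xhr=([^#&=]+)/.exec(fragment);
  if (match === null) {
    return null;
  }
  return { hl: match[1], xhr: match[2] };
}

// The greedy variant for comparison: the second .+ runs to the end of the string.
const greedy = /#hl=(.+)&xhr=(.+)/.exec("#hl=a&xhr=b&foo=bar");

console.log(parseFragment("#hl=a&xhr=b&foo=bar")); // { hl: 'a', xhr: 'b' }
console.log(greedy[2]); // b&foo=bar
```

If the pairs can appear in any order, a single regex gets awkward; parsing the fragment into key/value pairs first is probably the simpler route.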

Related

Negate random regular expression

Is there a way to negate any regular expression? I'm using regular expressions to validate input on a form. I'm now trying to create a button that sanitizes my input. Is there a way so I can use the regular expression used for the validating also for stripping the invalid characters?
I'm using this regex for validation of illegal characters
<input data-val-regex-pattern="[^|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*" type="text" />
When clicking on a button next to it, I'm calling this function:
$('#button').click(function () {
var inputElement = $(this).prev();
var regex = new RegExp(inputElement.attr('data-val-regex-pattern'), 'g');
var value = inputElement.val();
inputElement.val(value.replace(regex, ''));
});
At the moment the javascript is doing the exact opposite of what I'm trying to accomplish. I need to find a way to 'reverse' the regex.
Edit: I'm trying to reverse the regex in the javascript function. The regex in the data-val-regex-pattern-attribute is doing his job for validation.

To find the invalid characters, just take the ^ off your regex. A caret at the start of a character class negates everything inside the brackets.
data-val-regex-pattern="[|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*"
This will return the undesired characters so you can replace them.
Also, since you want to strip a lot of non-word characters, you could try a simpler regex. If you want to keep only word characters and whitespace, you could match everything else with something like this:
data-val-regex-pattern="[^\w\s]*"

Your regex is as follows:
[^|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*
That means it matches any run of characters that are not invalid.
You then replace those runs with the empty string, so only the bad characters are left.
Try this instead, without the negation (the caret is moved away from the start of the class, so it is treated as a literal character):
[|^<>:\?'\*\[\]\=%\$\+,;~&\{\}]*
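
As a rough sketch of how the button handler could derive the stripping regex from the validation pattern instead of keeping two copies: the helper toStripPattern is hypothetical and only handles patterns of the exact form [^...]*, which is an assumption based on the pattern in the question. jQuery is left out so the snippet runs on its own.

```javascript
// The validation pattern from the question. Backslashes are doubled here only
// because this is a JavaScript string literal; the attribute value has single ones.
const validationPattern = "[^|<>:\\?'\\*\\[\\]\\=%\\$\\+,;~&\\{\\}]*";

// Hypothetical helper: turns "[^...]*" into "[...]+" so that the result matches
// the illegal characters themselves. Throws for any other pattern shape.
function toStripPattern(pattern) {
  const m = /^\[\^(.*)\]\*$/.exec(pattern);
  if (m === null) {
    throw new Error("expected a pattern of the form [^...]*");
  }
  return "[" + m[1] + "]+";
}

const stripRegex = new RegExp(toStripPattern(validationPattern), "g");
console.log("a<b>c:d".replace(stripRegex, "")); // abcd
```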

The following answer is to the general question of negating a regular expression. In your specific case you just need to negate a character group, or more precisely remove the negation of a character group - which is detailed in other answers.
Regular languages (those consisting of all strings matched entirely by some RE) are in fact closed under complement: there is another RE that matches exactly those strings the original RE does not. It is, however, not trivial to construct, which perhaps explains why RE implementations often do not offer a negation operator.
However, the JavaScript regexp language has extensions that make it more expressive than regular languages; in particular, there is the negative lookahead construct.
If R1 is a regexp then
^(?!.*(R1))
matches precisely the strings that do not contain a match for R1.
And
^(?!R1$)
matches precisely the strings where the whole string is not a match for R1.
I.e., negation.
For rewriting any substring not matching a given regexp, the above is insufficient. One would have to do something like
((?!R1).)*
which would catch any substring not containing a sub-substring that matches R1. But consideration of the edge cases shows that this does not quite do what we are after. For example, ((?!ab).)* matches "b" in "ab", because "ab" is not a substring of "b".
One can cheat and write the regexp like this:
(.*)(R1|$)
And rewrite to T1$2
Where T1 is the target string you want to rewrite to.
This should rewrite any portion of the string not matching R1 to T1. However I would be very careful about any edge cases for this. So much so that it might be better to write the regexp from scratch rather than trying a general approach.
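
To make the two lookahead forms concrete, here is a small sketch. The pattern ab+ stands in for R1 and is chosen purely for illustration; the variable names are made up. R1 is wrapped in a group so that an alternation inside it would not change the meaning, and note that . does not match line breaks by default, so the first form only looks within the first line.

```javascript
// R1 is an arbitrary example pattern, picked here for illustration.
const R1 = "ab+";

// Matches strings that contain no match for R1 anywhere (on the first line).
const containsNoMatch = new RegExp("^(?!.*(" + R1 + "))");

// Matches strings that are not, as a whole, a match for R1.
const isNotWholeMatch = new RegExp("^(?!(?:" + R1 + ")$)");

console.log(containsNoMatch.test("xyz"));   // true
console.log(containsNoMatch.test("xabbz")); // false
console.log(isNotWholeMatch.test("abb"));   // false
console.log(isNotWholeMatch.test("abbz"));  // true
```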

Nice way to do this regex substitution

I'm writing a JavaScript function which takes a regex and some elements, and matches the regex against each element's name attribute.
Let's say I'm passed this regex
/cmw_step_attributes\]\[\d*\]/
and a string that is structured like this
"foo[bar][]chicken[123][cmw_step_attributes][456][name]"
where all the numbers could vary, or be missing. I want to match the regex against the string in order to swap out the 456 for another number (which will vary), e.g. 789. So, I want to end up with
"foo[bar][]chicken[123][cmw_step_attributes][789][name]"
The regex will match the string, but I can't swap out the whole match for 789, as that would wipe out the "[cmw_step_attributes][" bit. There must be a clean and simple way to do this, but I can't get my head round it. Any ideas?
thanks, max

Capture the first part and put it back into the string.
.replace(/(cmw_step_attributes\]\[)\d*/, '$1789');
// note I removed the closing ] from the end - quantifiers are greedy so all numbers are selected
// alternatively:
.replace(/cmw_step_attributes\]\[\d*\]/, 'cmw_step_attributes][789]')

Either literally rewrite the part that must remain the same in the replacement string, or place it inside capturing parentheses and reference it in the replacement.
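
Since the new number varies, a replacement function may be easier to read than a replacement string in which a group reference like $1 is followed directly by digits. A minimal sketch, with a made-up helper name setStepIndex:

```javascript
// Hypothetical helper: replaces the number that follows "[cmw_step_attributes]["
// with newIndex. Also works when the number is missing, because \d* can be empty.
function setStepIndex(name, newIndex) {
  // Group 1 holds the part that must survive; \d* is the part to swap out.
  return name.replace(/(cmw_step_attributes\]\[)\d*/, function (match, prefix) {
    return prefix + newIndex;
  });
}

const input = "foo[bar][]chicken[123][cmw_step_attributes][456][name]";
console.log(setStepIndex(input, 789));
// foo[bar][]chicken[123][cmw_step_attributes][789][name]
```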

See answer on: Regular Expression to match outer brackets.
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

Have you tried:
var str = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
// replace() returns a new string, so assign the result
str = str.replace(/cmw_step_attributes\]\[\d*?\]/gi, 'cmw_step_attributes][XXX]');

Trouble with word-boundary (\b)

I have an array of keywords, and I want to know whether at least one of the keywords is found within some string that has been submitted. I further want to be absolutely sure that it is the keyword that has been matched, and not something that is very similar to the word.
Say, for example, that our keywords are [English, Eng, En] because we are looking for some variation of English.
Now, say that the input from a user is i h8 eng class, or something equally provocative and illiterate; then eng should be matched. It should also fail to match a word like england, or some odd thing like chen, even though it's got the en bit.
So, in my infinite lack of wisdom I believed I could do something along the lines of this in order to match one of my array items with the input:
.match(RegExp('\b('+array.join('|')+')\b','i'))
With the thinking that the regular expression would look for matches from the array, now presented like (English|Eng|En) and then look to see whether there were zero-width word bounds on either side.

You need to double the backslashes.
When you create a regex with the RegExp() constructor, you're passing in a string. JavaScript string constant syntax also treats the backslash as a meta-character, for quoting quotes etc. Thus, the backslashes will be effectively stripped out before the RegExp() code even runs!
By doubling them, the step of parsing the string will leave one backslash behind. Then the RegExp() parser will see the single backslash before the "b" and do the right thing.

You need to double the backslashes in a JavaScript string or you'll encode a Backspace character:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))

You need to double-escape \b, because it has a special meaning in strings:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))

\b is an escape sequence inside string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You do not need to escape \b when used inside a regular expression literal:
/\b(english|eng|en)\b/i
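
Putting the answers together, a sketch of the keyword check might look like this. The helper name containsKeyword is made up, and escaping the keywords before joining them is an extra precaution that the question did not ask for.

```javascript
// Hypothetical helper: true if text contains one of the keywords as a whole word.
function containsKeyword(text, keywords) {
  // Escape regex metacharacters so every keyword is taken literally.
  const escaped = keywords.map(function (k) {
    return k.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  // "\\b" in a string literal becomes \b in the pattern; "\b" would be a backspace.
  const regex = new RegExp("\\b(" + escaped.join("|") + ")\\b", "i");
  return regex.test(text);
}

const keywords = ["English", "Eng", "En"];
console.log(containsKeyword("i h8 eng class", keywords)); // true
console.log(containsKeyword("england", keywords));        // false
console.log(containsKeyword("chen", keywords));           // false
```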

Result of javascript regular expression not understood

When I eval (in javascript) [I meant, used string.match()]:
<!--:es-->Text number 1<!--:--><!--:en-->text 2<!--:-->
using
/<!--:es-->(.|\n)*?<!--:-->/
I get as match:
Text number 1,1
I mean, it adds a comma and repeats the last character. Does anybody know why this happens?
PS. The text could contain line breaks; that is why I used (.|\n).
Thanks a lot.

The result of a regular expression match is an array.
The zero-th element of the array is the whole match : "Text number 1"
The first element of the array is the contents of the first group, in this case "1" since the * is outside the parentheses.
When the array is converted to a string, you get the contents with commas in between.
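
A short sketch that reproduces this with the string and regex from the question. Note that the whole match also includes the comment markers, which are easy to miss if the result is displayed as HTML.

```javascript
const text = "<!--:es-->Text number 1<!--:--><!--:en-->text 2<!--:-->";
const result = text.match(/<!--:es-->(.|\n)*?<!--:-->/);

// Element 0 is the whole match, element 1 is the last thing the group captured.
console.log(result[0]); // <!--:es-->Text number 1<!--:-->
console.log(result[1]); // 1

// Converting the array to a string joins the elements with commas.
console.log(String(result)); // <!--:es-->Text number 1<!--:-->,1
```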

When I eval (in javascript)
Don't. Use RegExp
Eval() evaluates any ECMAScript, you don't want to do this if you don't have 100% control over the input.

Some research has shown me that the . can't match newlines in javascript.
I'd rewrite your regex this way:
/<!--:es-->[\s\S]*?<!--:-->/
This will avoid the problem you saw, as it excludes the capture group.
And ghoppe is right: use RegExp.
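
For what it's worth, here is a sketch of the [\s\S] variant on text that spans two lines. The sample string is made up, and one capture group is put back deliberately, wrapped around the whole [\s\S]*? part, so that the inner text can be read from element 1.

```javascript
// Made-up sample with a line break inside the Spanish block.
const multiline = "<!--:es-->Line one\nLine two<!--:--><!--:en-->text 2<!--:-->";

// [\s\S] matches any character including line breaks; *? keeps the match lazy.
const found = multiline.match(/<!--:es-->([\s\S]*?)<!--:-->/);

console.log(found[1]); // prints "Line one" and "Line two" on separate lines
```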

JS+Regexp: Match anything except if it's between [[ ]]

I've got a <textarea> that will basically be a list of names, so I set up a function to replace the spaces between the names with a new line.
Now I need to specify that two or more names wrapped in [[ ]] are in fact part of the same element.
For example:
John Lucas [[Laurie Vega]] [[Daniel Deer]] Robert
Should turn to
John
Lucas
[[Laurie Vega]]
[[Daniel Deer]]
Robert
So now my regexp $("textarea").val().toString().replace(/ /g, '\n'); is broken, as it will add a new line before Vega and Deer.
I need to replace anything that's not in between [ and ]. I just made the opposite and tried to negate it, but it doesn't seem to work:
// Works
$("textarea").val().toString().match(/\[([^\]]*)\]/g);
// Am I using the ! operator wrong?
$("textarea").val().toString().match(/!\[([^\]]*)\]/g);
I'm a little lost. I tried matching and then replacing, but that way I won't be able to recover my original string. So I have to match anything outside double brackets and replace the space.

If there is a chance that your names contain non-alphabetic characters ("Jim-bo O'Leary"?), you may prefer to match anything that is not a '[' or a space, using /[^[ ]/.
You can then join the matched strings to get the new line effect.
$("textarea").val().toString().match(/([^\[ ]+|\[\[[^\]]*\]\])/g).join("\n");

The exclamation point has no particular meaning in a regex.
What you're looking for is either (that means the | operator) a sequence of letters
[A-Za-z]+
or two brackets, followed by some non-closing-brackets, followed by two closing brackets
\[\[[^\]]+\]\]
So
$("textarea").val().toString().match(/[A-Za-z]+|\[\[[^\]]+\]\]/g)
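
A self-contained sketch of the match-and-join approach, without jQuery so it can be run directly. The helper name splitNames is made up; in the page, the input would come from $("textarea").val().

```javascript
// Hypothetical helper: puts each name on its own line, keeping [[...]] groups intact.
function splitNames(value) {
  // Either a double-bracketed group, or a run of characters that are
  // neither a space nor "[".
  const parts = value.match(/\[\[[^\]]*\]\]|[^\[ ]+/g);
  return parts === null ? "" : parts.join("\n");
}

console.log(splitNames("John Lucas [[Laurie Vega]] [[Daniel Deer]] Robert"));
```

Because match() returns null when nothing matches, the helper checks for that before calling join().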
