JS+Regexp: Match anything except if it's between [[ ]] - javascript

I've got a <textarea> that will basically be a list of names, so I set up a function to replace the spaces between the names with a newline.
Now I need to specify that two or more space-separated names are in fact part of the same element.
E.g.:
John Lucas [[Laurie Vega]] [[Daniel Deer]] Robert
Should turn to
John
Lucas
[[Laurie Vega]]
[[Daniel Deer]]
Robert
So now my regexp $("textarea").val().toString().replace(/ /g, '\n'); is broken, as it will add a new line before Vega and Deer.
I need to replace only the spaces that are not between [[ and ]]. I wrote a pattern for the opposite (matching the bracketed parts) and tried to negate it, but it doesn't seem to work:
// Works
$("textarea").val().toString().match(/\[([^\]]*)\]/g);
// Am I using the ! operator wrong?
$("textarea").val().toString().match(/!\[([^\]]*)\]/g);
I'm a little lost. I tried matching and then replacing, but that way I won't be able to recover my original string. So I have to match everything outside the double brackets and replace the spaces there.

If there is a chance that your names contain non-alphabetic characters ("Jim-bo O'Leary"?), you may prefer to match anything that is not a '[' or a space, using /[^[ ]/.
You can then join the matched strings to get the new line effect.
$("textarea").val().toString().match(/([^\[ ]+|\[\[[^\]]*\]\])/g).join("\n");
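The same alternation can be wrapped in a plain function so it can be tried without jQuery. This is a minimal sketch: the name splitNames is made up, and in the page you would pass it $("textarea").val().

```javascript
// Hypothetical helper: split a space-separated list of names into one
// name per line, keeping [[...]] groups together (assumes no nesting).
function splitNames(text) {
  // Either a run of characters that are neither "[" nor a space,
  // or a complete [[...]] group, including any spaces inside it.
  const tokens = text.match(/[^\[ ]+|\[\[[^\]]*\]\]/g);
  // match() returns null when nothing matches, so fall back to [].
  return (tokens || []).join("\n");
}

console.log(splitNames("John Lucas [[Laurie Vega]] [[Daniel Deer]] Robert"));
// John
// Lucas
// [[Laurie Vega]]
// [[Daniel Deer]]
// Robert
```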

The exclamation point has no particular meaning in a regex.
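A quick check of this in JavaScript, using a made-up sample string:

```javascript
const sample = "a [b] ![c]";

// Outside lookaround syntax ((?!...), (?<!...)), "!" is an ordinary
// character, so this only finds "[...]" groups preceded by a literal "!".
console.log(sample.match(/!\[([^\]]*)\]/g)); // [ '![c]' ]

// In a negative lookahead, (?!...) asserts that the text that follows
// does NOT match, without consuming any characters.
console.log(/^(?!\[)/.test("[b]")); // false
console.log(/^(?!\[)/.test("b"));   // true
```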
What you're looking for is either (that means the | operator) a sequence of letters
[A-Za-z]+
or two brackets, followed by some non-closing-brackets, followed by two closing brackets
\[\[[^\]]+\]\]
So
$("textarea").val().toString().match(/[A-Za-z]+|\[\[[^\]]+\]\]/g)
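Another sketch, closer to the question's original idea of replacing only the spaces outside [[ ]]: String.prototype.replace with a callback leaves the rest of the string untouched, so punctuation such as "O'Leary" survives. The helper name spacesToNewlines is hypothetical, and it assumes brackets are never nested.

```javascript
// Hypothetical helper: turn every run of spaces into a newline,
// except spaces inside a [[...]] group (assumes no nested brackets).
function spacesToNewlines(text) {
  return text.replace(/(\[\[[^\]]*\]\])| +/g, (match, bracketed) =>
    // If the first alternative matched, keep the bracketed group as-is;
    // otherwise the match is a run of spaces, so emit a newline.
    bracketed !== undefined ? bracketed : "\n"
  );
}

console.log(spacesToNewlines("Jim-bo O'Leary [[Laurie Vega]] Robert"));
// Jim-bo
// O'Leary
// [[Laurie Vega]]
// Robert
```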

Related

Matching variants and misspellings of a word using RegEx in MS Word

I am trying to capture variants of a word using Microsoft Word's Find and Replace function. Here is a searchable snippet:
There are going to be 3 instances of the word successful for the purpose of Regex matching. Here is the second sucesfull and here is another succesfull , both spelt incorrectly.
This is the expression I use in Find and Replace with "Use Wildcards" selected (I have also tried replacing the braces with brackets, with no joy):
<([Ss]uc[1,]es[1,]ful[1,])>
[Ss]uc{1,}es{1,}ful{1,}
Replace the [ ] with { } and it should work fine. The curly braces specify how many times you want a character to repeat. Square brackets are used to specify the acceptable characters.
So this regular expression will match the following:
succcccesssfulll
sucesful
successful
Successsssfull
and so on.
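To sanity-check the pattern outside Word, here it is in JavaScript's regex syntax against the question's snippet. This only demonstrates standard regex behaviour (where c{1,} and c+ mean the same thing); it is not a claim about every detail of Word's wildcard mode, and the \b anchors are my stand-in for Word's < and >.

```javascript
// JavaScript regex syntax, not Word's wildcard syntax. In standard regex,
// c{1,} and c+ both mean "one or more c"; \b marks a word boundary.
const withBraces = /\b[Ss]uc{1,}es{1,}ful{1,}\b/g;
const withPlus = /\b[Ss]uc+es+ful+\b/g;

const snippet =
  "There are going to be 3 instances of the word successful for the purpose " +
  "of Regex matching. Here is the second sucesfull and here is another " +
  "succesfull , both spelt incorrectly.";

console.log(snippet.match(withBraces)); // [ 'successful', 'sucesfull', 'succesfull' ]
console.log(snippet.match(withPlus));   // [ 'successful', 'sucesfull', 'succesfull' ]
```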
I think this is cleaner and easier to type.
[Ss]uc+es+ful+
"+" means one or more occurrences of the preceding character.
The search string you want would be:
<[sS]uc#es#ful#>
This searches for a word (the < and > symbols) starting with either s or S and including one or more (the # symbol) of c, s, and l.

Nice way to do this regex substitution

I'm writing a JavaScript function that takes a regex and some elements, and matches the regex against each element's name attribute.
Let's say I'm passed this regex
/cmw_step_attributes\]\[\d*\]/
and a string that is structured like this
"foo[bar][]chicken[123][cmw_step_attributes][456][name]"
where all the numbers could vary, or be missing. I want to match the regex against the string in order to swap out the 456 for another number (which will vary), e.g. 789. So I want to end up with
"foo[bar][]chicken[123][cmw_step_attributes][789][name]"
The regex will match the string, but I can't replace the whole match with 789, as that would wipe out the "[cmw_step_attributes][" bit. There must be a clean and simple way to do this, but I can't get my head round it. Any ideas?
thanks, max
Capture the first part and put it back into the string.
.replace(/(cmw_step_attributes\]\[)\d*/, '$1789');
// note I removed the closing ] from the end - quantifiers are greedy so all numbers are selected
// alternatively:
.replace(/cmw_step_attributes\]\[\d*\]/, 'cmw_step_attributes][789]')
Either literally rewrite the part that must remain the same in the replacement string, or place it inside a capturing group and reference it in the replacement.
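When the new number comes from a variable, a replacement string like '$1789' relies on the engine reading it as $1 followed by 789 rather than as a nonexistent group $17. A replacement callback sidesteps that. This is a minimal sketch; the helper name swapStepIndex is hypothetical.

```javascript
// Hypothetical helper: replace the index that follows
// "[cmw_step_attributes][" with newIndex, keeping the prefix intact.
function swapStepIndex(name, newIndex) {
  return name.replace(
    /(cmw_step_attributes\]\[)\d*/,
    // The callback receives the whole match and then each capture group,
    // so the prefix is put back explicitly, followed by the new number.
    (match, prefix) => prefix + newIndex
  );
}

console.log(swapStepIndex("foo[bar][]chicken[123][cmw_step_attributes][456][name]", 789));
// foo[bar][]chicken[123][cmw_step_attributes][789][name]
```

Because \d* can match zero digits, the same call also fills in an empty [] after cmw_step_attributes.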
See answer on: Regular Expression to match outer brackets.
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.
Have you tried:
var str = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
str = str.replace(/cmw_step_attributes\]\[\d*?\]/gi, 'cmw_step_attributes][XXX]');

Alternation operator inside square brackets does not work

I'm creating a JavaScript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
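To see the difference in JavaScript, here is a small check with made-up query strings:

```javascript
// Inside [...] every character is just a member of a set, so this class
// matches ONE of: w, d, |, o, r, q (duplicates are ignored).
const charClass = /[wd|word|qw]=/;
// (?:...) groups alternatives, so this matches wd, word or qw as a whole.
const group = /(?:wd|word|qw)=/;

console.log(charClass.test("?id=1"));    // true: "d=" matches the class
console.log(group.test("?id=1"));        // false
console.log(group.test("?word=hello"));  // true
```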
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines. To match a literal dot, you have to escape it with a backslash (\.).
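A quick comparison of the unescaped and escaped dot, using the question's pattern and a made-up URL. The leading .* is dropped from the escaped version because test() already searches anywhere in the string; this is only a sketch, not a full URL validator.

```javascript
// Unescaped: "." matches any character except a line terminator,
// so "baidu-com" is accepted.
const loose = /.*baidu.com.*[\/?].*(wd|word|qw)=/;
// Escaped: "\." only matches a literal dot.
const strict = /baidu\.com.*[\/?].*(wd|word|qw)=/;

console.log(loose.test("hbaidu-com/?wd=x"));                  // true
console.log(strict.test("hbaidu-com/?wd=x"));                 // false
console.log(strict.test("http://www.baidu.com/s?wd=regex"));  // true
```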
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to accomplish; perhaps regex is not the best solution, or only part of the best solution?

Trouble with word-boundary (\b)

I have an array of keywords, and I want to know whether at least one of the keywords is found within some string that has been submitted. I further want to be absolutely sure that it is the keyword that has been matched, and not something that is very similar to the word.
Say, for example, that our keywords are [English, Eng, En] because we are looking for some variation of English.
Now, say that the input from a user is i h8 eng class, or something equally provocative and illiterate - then the eng should be matched. It should also fail to match a word like england or some odd thing chen, even though it's got the en bit.
So, in my infinite lack of wisdom I believed I could do something along the lines of this in order to match one of my array items with the input:
.match(RegExp('\b('+array.join('|')+')\b','i'))
With the thinking that the regular expression would look for matches from the array, now presented like (English|Eng|En) and then look to see whether there were zero-width word bounds on either side.
You need to double the backslashes.
When you create a regex with the RegExp() constructor, you're passing in a string. JavaScript string literal syntax also treats the backslash as a metacharacter (for escaping quotes, etc.), so the backslashes are consumed when the string is parsed, before the RegExp() code even runs; '\b' in particular turns into a backspace character.
By doubling them, the step of parsing the string will leave one backslash behind. Then the RegExp() parser will see the single backslash before the "b" and do the right thing.
You need to double the backslashes in a JavaScript string or you'll encode a Backspace character:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You need to double-escape \b, because it has a special meaning in strings:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
\b is an escape sequence inside string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You do not need to escape \b when used inside a regular expression literal:
/\b(english|eng|en)\b/i
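Putting the fix together: a minimal sketch of the keyword check, using the example inputs from the question. The helper names containsKeyword and escapeRegExp are hypothetical; escaping the keywords is an extra precaution I'm assuming you want, in case a keyword ever contains regex metacharacters such as "." or "+".

```javascript
// Escape regex metacharacters so each keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if any keyword appears as a whole word.
function containsKeyword(input, keywords) {
  // "\\b" in the string literal becomes \b in the regex pattern.
  const pattern = new RegExp(
    "\\b(" + keywords.map(escapeRegExp).join("|") + ")\\b",
    "i"
  );
  return pattern.test(input);
}

const keywords = ["English", "Eng", "En"];
console.log(containsKeyword("i h8 eng class", keywords));       // true
console.log(containsKeyword("england", keywords));              // false
console.log(containsKeyword("some odd thing chen", keywords));  // false

// With single backslashes, '\b' in a string is a backspace (U+0008):
console.log("\b".charCodeAt(0)); // 8
```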

Help with a regex

I've got the following sequence I'm attempting to detect: #hl=b&xhr=a, where b is equal to anything and a is equal to anything.
I've got the following, but it doesn't appear to be working: (#hl=.+&xhr=) Does anyone know why?
I'm using javascript and values a and b are letters of the alphabet.
(#hl=.+&xhr=.+): you missed the second .+. Depending on your regex engine, you should also check its escaping rules; often the parentheses or the + have to be escaped. If you just want to match a whole string, the parentheses are not needed anyway, btw.
You'll need to be more specific to get a better answer:
what programming language are you using RegEx in?
what values can a and b have? Anything implies that newlines are included, which . doesn't match
do you want to get the values of a and b?
Now that all that's been said, let's move on to a regex with some assumptions:
/#hl=(.+)&xhr=(.+)/
This will match a string such as #hl=a&xhr=b and capture the a and b values. It is greedy, so if there are further key-value pairs in the pseudo-URL (I assume it's a URL-encoded string in the hash fragment), they will be captured in b:
#hl=a&xhr=b&foo=bar
Here the second group will capture b&foo=bar.
The regex also assumes #hl= comes before &xhr=.
Assuming #, & and = are special characters, how about this regular expression:
#hl=([^#&=]+)&xhr=([^#&=]+)
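A quick comparison of the greedy pattern and the negated-class pattern on the example string with an extra foo=bar pair, written with hl as in the question:

```javascript
const hash = "#hl=a&xhr=b&foo=bar";

// Greedy ".+" in the second group runs to the end of the string.
const greedy = hash.match(/#hl=(.+)&xhr=(.+)/);
console.log(greedy[1], greedy[2]); // a b&foo=bar

// A negated class stops each value at the next "#", "&" or "=".
const strict = hash.match(/#hl=([^#&=]+)&xhr=([^#&=]+)/);
console.log(strict[1], strict[2]); // a b
```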
Are you sure your key/value pairs (?) are always in this order without anything in between?
