Alternation operator inside square brackets does not work - javascript

I'm creating a javascript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.

Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes a character set, which matches a single character from the set; () denotes a group, inside which | acts as alternation.

Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
needs a few changes: replace [wd|word|qw] with (wd|word|qw) and drop the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because an unescaped dot (.) matches any character except a newline. To match a literal dot, escape it with a backslash (\.).
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
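As a minimal sketch of the difference described above (the sample URLs are made up for illustration, and the pattern is only one possible way to tighten the original), the character class and the group behave like this:

```javascript
// [wd|word|qw] is a character class: it matches ONE character
// out of the set w, d, |, o, r, q.
const charClass = /[wd|word|qw]=/;

// (?:wd|word|qw) is a group: it matches one of the three whole strings.
// The dot in "baidu.com" is escaped so it only matches a literal dot.
const queryParam = /baidu\.com.*[\/?].*(?:wd|word|qw)=/;

const sampleUrl = 'http://www.baidu.com/s?word=regex'; // hypothetical URL

console.log(charClass.test('|='));                       // true: "|" is in the class
console.log(queryParam.test(sampleUrl));                 // true
console.log(queryParam.test('http://baiduXcom/s?wd=1')); // false: no literal dot
```

Note that this sketch still matches longer parameter names that merely end in wd, word or qw; whether that matters depends on the URLs you expect.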

Related

In Regex How to find and replace characters when they are not within other alphabetic characters?

I'm formatting a datetime string in javascript by Regex, so I want:
Find and replace all d characters when d is not within other alphabetic characters. like this:
Find and replace all dd characters when dd is not within other alphabetic characters. like this:
I tested the pattern /\bd\b/mg, but its result is not always what I want.
How should I exclude unwanted cases in the following command:
str = str.replace(/\bd\b/mg, number);
The regular expression you posted treats _ as a word character, so \b does not match between _ and d, and such a d is not replaced as expected.
To handle _ as well, either before or after the d to be replaced, you can use expressions similar to these:
To replace d:
/(\b|_)(d)(\b|_)/mg
To replace dd:
/(\b|_)(dd)(\b|_)/mg
Or to replace both in the same way (if it's acceptable):
/(\b|_)(d|dd)(\b|_)/mg
Comments under an answer in another Stack Overflow thread also suggested using a library that can format dates, instead of implementing it yourself.
UPDATE: As someone mentioned, another issue is that including _ in the regular expression makes it part of the match, so it is removed by the replacement. However, you can call replace and refer to the capturing groups, like this:
str.replace(/(\b|_)(d)(\b|_)/mg, "$1" + number + "$3")
I've updated earlier expressions posted in this answer to work with this method.
Please note that I'm not aware of all the cases you want to consider, so if you have any problems using the suggested solution, or it does not work as expected in your case, please let me know so I can try to help.
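A small sketch of the capturing-group approach above, written with a replacer function so that a numeric replacement can never be misread as part of a "$n" reference. The helper name replaceDayToken and the sample format strings are assumptions for illustration only:

```javascript
// Hypothetical helper: replaces a standalone "d" token with the given day.
// Group 1 and group 3 capture the boundary (or underscore) on each side,
// so they can be put back around the replacement.
function replaceDayToken(format, day) {
  return format.replace(
    /(\b|_)(d)(\b|_)/g,
    (match, before, token, after) => before + day + after
  );
}

console.log(replaceDayToken('d/MM/yyyy', 7)); // "7/MM/yyyy"
console.log(replaceDayToken('yyyy_d_MM', 7)); // "yyyy_7_MM"
console.log(replaceDayToken('add', 7));       // "add" (d is inside a word)
```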
You could use a lookahead and, where the regex engine supports it, a lookbehind as well.
Example lookahead that checks that the following character is not a letter:
(?=[^a-zA-Z])
JavaScript engines that predate ES2018 do not support lookbehind, so there you will need a capturing group and a reference to it.
For JS, capture the part in the outermost parentheses and then refer to it by group number ($1, $2... in a JavaScript replacement string; \1, \2... is the in-pattern backreference syntax):
[^a-zA-Z](d(?=[^a-zA-Z]))
Engines with lookbehind support can use:
(?<=[^a-zA-Z])d(?=[^a-zA-Z])
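As a hedged variation on the pattern above, assuming a JavaScript engine with ES2018 lookbehind support: this sketch swaps the positive lookarounds for negative ones, so a d at the very start or end of the string is also matched. The sample strings are made up for illustration:

```javascript
// Negative lookbehind / lookahead: "d" must not be preceded or followed
// by a letter. Unlike (?<=[^a-zA-Z]) and (?=[^a-zA-Z]), these also
// succeed at the start and end of the string.
const standaloneD = /(?<![a-zA-Z])d(?![a-zA-Z])/g;

console.log('d-MM-yyyy'.replace(standaloneD, '7')); // "7-MM-yyyy"
console.log('yyyy_d'.replace(standaloneD, '7'));    // "yyyy_7"
console.log('day d'.replace(standaloneD, '7'));     // "day 7"
```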

Regex get last 2 characters pipe not working

I'm creating a regex expression to get the variables passed to a JavaScript constructor.
The input is always going to follow along these lines:
app.use(express.static('public'));
And the regex I plan to use to strip out the unnecessary parts is:
(^app.use\()|(..$)
The first part of the regex gets everything up to the first parenthesis, and then it's supposed to pipe it to another expression which gets the last 2 characters of the string.
My issue is that it seems to be ignoring the second regex. I tried a few other expressions in the second part and they worked, but this one isn't.
What am I doing wrong?
Regex example on Regex101: https://regex101.com/r/jV9eH6/3
UPDATE:
This is not a duplicate of How to replace all occurrences of a string in JavaScript?
My question is about a specific issue with a regex, not about replacing one string with another in JavaScript.
You need to use the multiline modifier m. Without it, the anchors ^ and $ match only at the start and end of the whole input; with it, they also match at the start and end of each line.
/(^app.use\()|(..$)/gm
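A short sketch of the effect of the m flag on a two-line input. The second line (bodyParser.json()) is a made-up example, and the dot in app.use is escaped here, which the original pattern does not do:

```javascript
// Two lines of input; the second line is hypothetical.
const source = "app.use(express.static('public'));\napp.use(bodyParser.json());";

// Without m: ^ matches only at index 0 and $ only at the very end.
const withoutM = source.replace(/(^app\.use\()|(..$)/g, '');

// With m: ^ and $ also match at the start and end of every line.
const withM = source.replace(/(^app\.use\()|(..$)/gm, '');

console.log(withoutM); // express.static('public'));\napp.use(bodyParser.json()
console.log(withM);    // express.static('public')\nbodyParser.json()
```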

JavaScript regular expression match amount

I'm trying to write a regular expression to match amounts. In my case, what I need is that either the amount should be a positive integer or if the decimal is used, it must be followed by one or two integers. So basically, the following are valid amounts:
34000
345.5
876.45
What I wrote was this: /[0-9]+(\.[0-9]{1,2}){0,1}/
My thinking was that by using parentheses like so: (\.[0-9]{1,2}), I would be able to bundle the whole "decimal plus one or two integers" part. But it isn't happening. Among other problems, this regex is allowing stuff like 245. and 345.567 to slip through. :(
Help, please!
Your regular expression is good, but you need to match the beginning and end of the string. Otherwise, your regex can match only a portion of the string and still (correctly) return a match. To match the beginning of the string, use ^, for the end, use $.
Update: as Avinash has noted, you can replace {0,1} with ?. JS also supports \d for digits, so the regex can be simplified further.
Finally, since you are only testing against the regex, you can use a non-capturing group, (?:...) instead of (...), which avoids storing a capture you never use and can be slightly faster.
Original:
/[0-9]+(\.[0-9]{1,2}){0,1}/.test('345.567')
Fixed, and faster ;)
/^\d+(?:\.\d{1,2})?$/.test('345.567')
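To check the anchored version against the cases from the question, a quick sketch (the inputs '.5' and 'abc' are extra samples added here, not from the question):

```javascript
// Anchored pattern: digits, optionally followed by a dot and 1-2 digits.
const amount = /^\d+(?:\.\d{1,2})?$/;

const valid = ['34000', '345.5', '876.45'];
const invalid = ['245.', '345.567', '.5', 'abc'];

console.log(valid.every((s) => amount.test(s)));  // true
console.log(invalid.some((s) => amount.test(s))); // false
```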

Nice way to do this regex substitution

I'm writing a JavaScript function which takes a regex and some elements, and matches the regex against each element's name attribute.
Let's say I'm passed this regex
/cmw_step_attributes\]\[\d*\]/
and a string that is structured like this
"foo[bar][]chicken[123][cmw_step_attributes][456][name]"
where all the numbers could vary, or be missing. I want to match the regex against the string in order to swap out the 456 for another number (which will vary), e.g. 789. So, I want to end up with
"foo[bar][]chicken[123][cmw_step_attributes][789][name]"
The regex will match the string, but I can't swap out the whole match for 789, as that would wipe out the "[cmw_step_attributes][" bit. There must be a clean and simple way to do this but I can't get my head round it. Any ideas?
thanks, max
Capture the first part and put it back into the string.
.replace(/(cmw_step_attributes\]\[)\d*/, '$1789');
// note I removed the closing ] from the end - quantifiers are greedy so all numbers are selected
// alternatively:
.replace(/cmw_step_attributes\]\[\d*\]/, 'cmw_step_attributes][789]')
Either literally rewrite the part that must remain the same in the replacement string, or place it inside capturing parentheses and reference it in the replacement.
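When the new number is a variable, a replacer function is one way to reuse the captured prefix without building a "$1" + number string, which is easy to misread when a digit directly follows the group reference. A minimal sketch, where newId is an assumed variable name:

```javascript
const fieldName = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
const newId = 789; // hypothetical replacement value

// The prefix is captured and handed back to the function as `prefix`;
// \d* also matches zero digits, so a missing number is filled in too.
const updated = fieldName.replace(
  /(cmw_step_attributes\]\[)\d*/,
  (match, prefix) => prefix + newId
);

console.log(updated);
// foo[bar][]chicken[123][cmw_step_attributes][789][name]
```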
See answer on: Regular Expression to match outer brackets.
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.
Have you tried:
var str = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
str = str.replace(/cmw_step_attributes\]\[\d*?\]/gi, 'cmw_step_attributes][XXX]'); // replace() returns a new string, so keep the result

Trouble with word-boundary (\b)

I have an array of keywords, and I want to know whether at least one of the keywords is found within some string that has been submitted. I further want to be absolutely sure that it is the keyword that has been matched, and not something that is very similar to the word.
Say, for example, that our keywords are [English, Eng, En] because we are looking for some variation of English.
Now, say that the input from a user is i h8 eng class, or something equally provocative and illiterate - then the eng should be matched. It should also fail to match a word like england or some odd thing chen, even though it's got the en bit.
So, in my infinite lack of wisdom I believed I could do something along the lines of this in order to match one of my array items with the input:
.match(RegExp('\b('+array.join('|')+')\b','i'))
With the thinking that the regular expression would look for matches from the array, now presented like (English|Eng|En) and then look to see whether there were zero-width word bounds on either side.
You need to double the backslashes.
When you create a regex with the RegExp() constructor, you're passing in a string. JavaScript string literal syntax also treats the backslash as a meta-character, for quoting quotes etc. Thus, each \b is consumed by the string parser (it becomes a backspace character) before the RegExp() code even runs!
By doubling them, the step of parsing the string will leave one backslash behind. Then the RegExp() parser will see the single backslash before the "b" and do the right thing.
You need to double the backslashes in a JavaScript string or you'll encode a Backspace character:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You need to double-escape \b, because it has a special meaning in strings:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
\b is an escape sequence inside string literals (see table 2.1 on this page). You should escape it by adding one extra backslash:
.match(RegExp('\\b('+array.join('|')+')\\b','i'))
You do not need to escape \b when used inside a regular expression literal:
/\b(english|eng|en)\b/i
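Putting the answers above together, a small sketch using the keywords and sample inputs from the question (the variable names are assumptions):

```javascript
const keywords = ['English', 'Eng', 'En'];

// In a string literal, '\b' is a single backspace character (U+0008),
// which is why the undoubled version never acts as a word boundary.
console.log('\b' === '\u0008'); // true

// Doubling the backslash leaves one backslash for the regex parser.
const pattern = new RegExp('\\b(' + keywords.join('|') + ')\\b', 'i');

console.log(pattern.test('i h8 eng class')); // true
console.log(pattern.test('england'));        // false
console.log(pattern.test('chen'));           // false
```

If the keywords can contain regex metacharacters, they would need escaping before being joined; that is outside this sketch.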
