I would like to use new RegExp() in JS so that a search for Macys also matches Macy's. Can someone show me how to achieve this, please? This is for a search feature, and I would like to return results whichever spelling of the Macy's brand the user types.
/macy'?s/gmi
macy matches the characters macy literally (case insensitive)
'? matches the character ' literally
Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
s matches the character s literally (case insensitive)
g modifier: global.
Demo:
https://regex101.com/r/tV6yG1/1
PS: I'm using the stack android app and I cannot format the code as I'd like, but you get the idea of what's needed.
As @torazaburo pointed out, /Macy'?s/ is the regex you want. If you want it to be case-insensitive, add the i flag at the end of the regex.
/Macy'?s/i.test('Macys'); // true
/Macy'?s/i.test("Macy's"); // true
/Macy'?s/i.test("macys"); // true
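Since the question asks about new RegExp(), here is a minimal sketch of building the same pattern from a brand name at runtime. The helper name brandPattern is hypothetical, and the sketch assumes the only spelling difference you care about is the apostrophe.

```javascript
// Hypothetical helper: turns a brand name into a case-insensitive RegExp
// in which every apostrophe is optional.
function brandPattern(name) {
  // Escape regex metacharacters so the name is matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Make each apostrophe optional: "Macy's" becomes "Macy'?s".
  return new RegExp(escaped.replace(/'/g, "'?"), "i");
}

const macys = brandPattern("Macy's");
const results = ["Macys", "Macy's", "MACYS", "Macy"].map((s) => macys.test(s));
console.log(results);
```

The g flag is left out on purpose: a global regex keeps state in lastIndex between test() calls, which can give surprising results when the same object is reused.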
I'm looking to validate a chess FEN string and I'm working on the Regex for it. I'm looking to implement only very simple validation. Here are the rules I'm looking to match with my regex:
Exactly 7 "/" characters
Start and end of the string cannot be "/"
In between the slashes it must be either a number from 1-8 or the letters PNBRQK uppercase or lowercase
Example of a match
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
Examples of non-match
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/
/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/
rnbqkbnr/pppppppp/8/8/8/10/PPPPPPPP/RNBQKBNR
rnbqkbnr/Z/8/8/8/8/PPPPPPPP/RNBQKBNR
Currently, I have been able to implement exactly 7 "/" anywhere in the string with the following regex:
/^(?:[^\/]*\/){7}[^\/]*$/gm
I'm unsure how to implement the rest as RegEx is not my strong suit.
This should do the trick: (passes all your tests)
/^(?:(?:[PNBRQK]+|[1-8])\/){7}(?:[PNBRQK]+|[1-8])$/gim
All you needed was to use positive matching for the characters you're after instead of "not slash". The key addition is the non-capturing group with one or more PNBRQK or a digit from 1-8. The same group is repeated at the end of the expression.
Oh, and I added the i flag for case insensitive matching.
/^([1-8PNBRQK]+\/){7}[1-8PNBRQK]+$/gim
/gim = global, case insensitive, and multiline.
I got the above working on https://regexr.com/ - one of my favorite places for working out regex problems (but I know there are many other good resources online).
Hope this helps.
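As a quick check, the character-class version can be run against the examples from the question. This is only a sketch of the simple validation asked for, not a full FEN validator; the names fenBoard and isSimpleFenBoard are made up for illustration, and the g and m flags are dropped because a single string is tested.

```javascript
// Simple board-layout check: 8 groups of [1-8PNBRQK] separated by 7 slashes.
const fenBoard = /^(?:[1-8PNBRQK]+\/){7}[1-8PNBRQK]+$/i;

function isSimpleFenBoard(text) {
  return fenBoard.test(text);
}

const fenSamples = [
  "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR",  // valid
  "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/", // trailing slash
  "/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/",         // leading slash
  "rnbqkbnr/pppppppp/8/8/8/10/PPPPPPPP/RNBQKBNR", // 0 is not allowed
  "rnbqkbnr/Z/8/8/8/8/PPPPPPPP/RNBQKBNR",         // Z is not a piece
];
const fenResults = fenSamples.map(isSimpleFenBoard);
console.log(fenResults);
```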
Firstly we have the following string:
aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd
We want our regex to find whitespace and any special characters like {}, but if { is followed by exactly ignoreme}, then exclude it.
This is where we are right now:
(?!{ignoreme})[\s\[\]{}()<>\\'"|^`]
The problem is that our regex finds the } after ignoreme
Here is the link https://regex101.com/r/bU1oG0/2
Any help is appreciated,
Thanks
The } is matched because your (?!{ignoreme}) lookahead only prevents a match at a { that is followed by ignoreme}; a } never starts a {ignoreme} character sequence, so it is still matched. Also, JS engines that predate ES2018 do not support a lookbehind like (?<!{ignoreme)}.
This is a kind of issue that can be handled with a regex that matches what you do not need, and matches and captures what you need:
/{ignoreme}|([\s[\]{}()<>\\'"|^`])/g
See the regex demo
Now, {ignoreme} is matched (and you do not have to use this value), while ([\s[\]{}()<>\\'"|^`]) is captured into Group 1, and that is the value you need to use.
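A minimal sketch of how the "match what you don't need, capture what you do" idea can be consumed in JS, assuming an engine with String.prototype.matchAll. The variable names are made up for illustration.

```javascript
const pattern = /{ignoreme}|([\s[\]{}()<>\\'"|^`])/g;
const input = "aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd";

// Keep only the matches where Group 1 participated; a bare {ignoreme}
// match leaves Group 1 undefined and is skipped.
const found = [];
for (const m of input.matchAll(pattern)) {
  if (m[1] !== undefined) {
    found.push({ char: m[1], index: m.index });
  }
}
console.log(found);
```

Note that the } after the second ignoreme is still reported, because that ignoreme} is not preceded by a {.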
I'm trying to create a function with a regex that can decide if my string value is correct or not. It should be true, if the string begins with lower or uppercase alphabetical characters or underscore. If it begins with any others, the function must return false.
My test input is something like this: ".dasfh"
The expressions I tried, [_a-zA-Z]... and [:alpha:]..., both returned true.
I tried a bit easier task also:
"Hadfg" where the expression is [a-z]...: returns true
BUT
"hadfg" where the expression is [A-Z]...: returns false
Could anybody help me to understand this behaviour?
You're trying to match the first character in the string to be something in particular, this means you have to tell regex that it has to be the first character in the string.
The regex engine just tries to find any match in the entire string.
All you're telling it with [a-z] is "find me a lowercase character anywhere in the string". This means that:
"Hadfg" will equal true because it can find a, d, f or g as a match.
"HADFG" will equal false because there are no lowercase letters.
The same will happen for "hADFG" when matched with [A-Z], for instance: it will be able to find an A, D, F or G as a match, whereas "hadfg" will return false because there is no uppercase character.
What you are looking for here is ^ in your regex. It is an anchor that means "start of the string" (or "start of a line" when the multiline flag is set).
So when you apply this to your regex it will look like this: /^[a-z]/.
The regex on the previous line basically says "starting from the beginning of the string, is the first character a lowercase a-z?"
Try it out and you'll see.
For your solution you'd need /^[_a-zA-Z]/ to check if the first character is an _, a-z or A-Z character.
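Wrapped in a function, that check might look like the sketch below. The function name hasValidStart is made up for illustration.

```javascript
// Returns true only when the first character is a letter or an underscore.
function hasValidStart(value) {
  return /^[_a-zA-Z]/.test(value);
}

const startChecks = [".dasfh", "Hadfg", "hadfg", "_name", "1abc", ""].map(hasValidStart);
console.log(startChecks);
```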
For reference, you can find cheatsheets within these tools (and test your regexes with them, of course!):
Regexr - My personal favorite (uses your browser's JS regex engine)
Rubular - A Ruby regex tester
Regex101 - A Python / PCRE / PHP / JavaScript regex tester
And for a reference or tutorial (I'd recommend reading it from start to finish if you want to understand regexps and how they work), there's regular-expressions.info.
Regex is never easy and be careful with what you do with it, it's a powerful but sometimes ugly beast to deal with :)
PS
I see you tagged your question as email-validation, so I'll add a little bonus regex that checks the minimum an email address needs in order to be valid. I use this one personally:
.+@.+\..{2,}
which when broken up, looks like this:
.+ - one or more of any character
@ - followed by a literal @ character
.+ - one or more of any character
\. - followed by a literal . character
.{2,} - two or more of any character
Optionally you could replace {2,} with a + to make it one or more but this would allow a TLD with 1 character.
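A small sketch of that pattern in use, reading the separator as a literal @ character. The function name looksLikeEmail and the sample addresses are made up for illustration.

```javascript
// Loose check: something, "@", something, ".", then at least two characters.
const looseEmail = /.+@.+\..{2,}/;

function looksLikeEmail(text) {
  return looseEmail.test(text);
}

const emailChecks = [
  "user@example.com",   // passes
  "a@b.co",             // passes: two-character TLD
  "a@b.c",              // fails: only one character after the dot
  "no-at-sign.example", // fails: no @
].map(looksLikeEmail);
console.log(emailChecks);
```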
To see a RFC email-regex at work check this link.
When I look at that regex I basically just want to cry in a corner somewhere. There are definitely things you cannot do in an email address that my regex doesn't catch, but at least it makes sure the input looks like something e-mailable. If a new user decides to fill in nonsense, that's not my problem anymore, and I wouldn't want to force them to change one character just because the huge regex doesn't agree with it either.
I want to match all valid prefixes of substitute followed by other characters, so that
sub/abc/def matches the sub part.
substitute/abc/def matches the substitute part.
subt/abc/def either doesn't match or only matches the sub part, not the t.
My current Regex is /^s(u(b(s(t(i(t(u(te?)?)?)?)?)?)?)?)?/, which works, however this seems a bit verbose.
Is there any better (as in, less verbose) way to do this?
This does the same as the regex in your question.
^s(?:ubstitute|ubstitut|ubstitu|ubstit|ubsti|ubst|ubs|ub|u)?
The above regex will always try to match the longest possible word. It first checks for substitute; if that is found it matches, otherwise it moves on to the next alternative, i.e. substitut, and so on down to u.
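If the verbosity is the main concern, the pattern can also be generated instead of typed. Below is a minimal sketch that builds the nested optional-group form from a word; the helper name prefixPattern is hypothetical, and it assumes the word contains no regex metacharacters.

```javascript
// Builds ^s(?:u(?:b(?:...)?)?)? for the given word, so that any prefix
// of the word (at least its first letter) matches at the start of a string.
function prefixPattern(word) {
  let tail = "";
  // Wrap from the last letter inwards: each letter is optional together
  // with everything that follows it.
  for (let i = word.length - 1; i >= 1; i--) {
    tail = "(?:" + word[i] + tail + ")?";
  }
  return new RegExp("^" + word[0] + tail);
}

const sub = prefixPattern("substitute");
const prefixMatches = ["sub/abc/def", "substitute/abc/def", "subt/abc/def"].map(
  (s) => sub.exec(s)[0]
);
console.log(prefixMatches);
```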
You could use a two-step regex:
Find the first word of the subject by using this simple pattern: ^(\w+)
Use the word extracted in step 1 as your regex pattern, e.g. ^subs, and test it against the word substitute
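The two steps could be sketched as follows. The function name matchKeywordPrefix is made up for illustration, and the keyword is passed in as a parameter as an assumption about how you would want to reuse it.

```javascript
// Step 1: pull the leading word out of the subject.
// Step 2: use that word as a pattern anchored at the start of the keyword.
function matchKeywordPrefix(subject, keyword) {
  const first = /^(\w+)/.exec(subject);
  if (first === null) {
    return null;
  }
  const word = first[1];
  // \w+ only yields letters, digits and underscores, so the word is safe
  // to drop into a RegExp without escaping.
  return new RegExp("^" + word).test(keyword) ? word : null;
}

const twoStep = ["sub/abc/def", "substitute/abc/def", "subt/abc/def"].map(
  (s) => matchKeywordPrefix(s, "substitute")
);
console.log(twoStep);
```

Unlike the single-regex versions, this returns no match at all for subt/abc/def rather than matching only the sub part, which the question lists as an acceptable outcome.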
I have been trying to match just the user id or vanity part of the URI for Google+ accounts. I am using GAS (Google Apps Script), into which I've loaded XRegExp to help match Unicode characters.
So far I have this: ((https?://)?(plus\.)?google\.com/)?(.*/)?([a-zA-Z0-9._]*)($|\?.*) but, as the regex tests (external site) show, it still doesn't match just the right parts.
I've tried using \p{L} inside of [a-zA-Z0-9._] but no luck with that. Also, I end up with an extra forward slash at the end of the profile name when it does match.
UPDATE #1: I am trying to fix some G+ URLs in a spreadsheet copied from a Google Form. The links are not all the same, and the simplest profile link is "https://plus.google.com/" + user id OR vanity name.
UPDATE #2: So far I have ([+]\w+|[0-9]{21})(?:\/)?(?:\w+)?$ which uses @demrks's simplified version of @guest271314's response. However, two problems:
1) Google Vanity URLs can have unicode in them. Example: https://plus.google.com/u/0/+JoseManuelGarcía_ertatto which fails. I have tried to use \p{L} but can't seem to get it right.
2) GAS doesn't seem to like it even though the regex test works on this site. =(
UPDATE #3: It seems GAS just hates using \w so I've had to expand it. So I have this so far:
/([+][A-Za-z0-9-_]+|[0-9]{21})(?:\/)?(?:[A-Za-z0-9-_]+)?$/
This matches even with "/about" or "/posts" at end of the URL. However still doesn't match UNICODE. =( I am still working on that.
UPDATE #4: So this seems to work:
/([+][\\w-_\\p{L}]+|[\\d]{21})(?:\/)?(?:[\\w-_]+)?$/
Looks like I needed to use double backslashes inside the character classes. So this seems to work so far. Not sure if there is a shorter way to write this, however.
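One likely reason for the double backslashes is that the pattern is handed over as a string, where a single backslash is consumed by the string literal before the regex engine sees it. The sketch below shows that effect with the built-in RegExp constructor, and also a native \p{L} class with the u flag. It assumes an ES2018-capable engine and does not use XRegExp, so it may not reflect how GAS behaved at the time.

```javascript
// In a string literal "\d" is just "d"; the backslash must be doubled
// for the regex engine to receive \d.
const fromString = new RegExp("\\d{21}");
const unescaped = new RegExp("\d{21}"); // pattern is actually d{21}

const id = "104645458102703754878";
console.log(fromString.test(id), unescaped.test(id));

// Native Unicode letter class (requires the u flag).
const vanity = /\+[-\w\p{L}]+/u;
const vanityMatch = "https://plus.google.com/u/0/+JoseManuelGarcía_ertatto".match(vanity)[0];
console.log(vanityMatch);
```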
Edit, updated
Try (v4)
document.URL.match(/\++\w+.*|\d+\d|\/+\w+$/).toString()
.replace(/\/+|posts|about|photos|videos|plusones|reviews/g, "")
e.g.,
var urls = ["https://plus.google.com/+google/posts"
, "https://plus.google.com/+google/about"
, "https://plus.google.com/+google/photos"
, "https://plus.google.com/+google/videos"
, "https://plus.google.com/+google/plusones"
, "https://plus.google.com/+google/reviews"
, "https://plus.google.com/communities/104645458102703754878"
, "https://plus.google.com/u/0/LONGIDHERE"
, "https://plus.google.com/u/0/+JoseManuelGarcía_ertatto"];
var _urls = [];
urls.forEach(function(item) {
_urls.push(item.match(/\++\w+.*|\d+\d|\/+\w+$/).toString()
.replace(/\/+|posts|about|photos|videos|plusones|reviews/g, ""));
});
_urls.forEach(function(id) {
var _id = document.createElement("div");
_id.innerHTML = id;
document.body.appendChild(_id)
});
jsfiddle http://jsfiddle.net/guest271314/o4kvftwh/
This solution should match both IDs and usernames (with unicode characters):
/\+[^/]+|\d{21}/
http://regexr.com/39ds0
Explanation: As an alternative to \w (which doesn't match unicode characters) I used a negated character class, [^/], which matches anything but "/".
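A quick sketch of this pattern applied to a few of the URL shapes mentioned in the question. The helper name extractProfile is made up for illustration, and the slash inside the class is escaped only to keep the regex literal unambiguous.

```javascript
// Either "+" followed by everything up to the next slash, or a 21-digit id.
const profile = /\+[^\/]+|\d{21}/;

function extractProfile(url) {
  const m = url.match(profile);
  return m === null ? null : m[0];
}

const extracted = [
  "https://plus.google.com/+google/posts",
  "https://plus.google.com/u/0/+JoseManuelGarcía_ertatto",
  "https://plus.google.com/104645458102703754878/about",
].map(extractProfile);
console.log(extracted);
```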
Following a possible solution:
(?:\+)(\w+)|(?:\/)(\w+)$
Explanation:
1st Alternative: (?:\+)(\w+)
(?:\+) Non-capturing group: \+ matches the character + literally.
(\w+) Capturing group: \w+ matches any word character [a-zA-Z0-9_]. Quantifier: between one and unlimited times.
2nd Alternative: (?:\/)(\w+)$
(?:\/) Non-capturing group: \/ matches the character / literally.
(\w+) Capturing group: \w+ matches any word character [a-zA-Z0-9_]. Quantifier: between one and unlimited times.
$ asserts position at the end of the string.
Hope it's useful!