Javascript RegExp Matching weirdness


I have a RegExp:
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
and some text "Champion"
somehow, this is coming back as a match, am I crazy?
0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2
the loop is here:
// construct the pattern dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up being /.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
var result;
while ((result = pat.exec(someText)) !== null) {
  // do stuff!
}
There has to be something wrong with my RegExp, right?
EDIT:
The .? thing was just a quick and dirty attempt to say that I'd like to match one of those words AND/OR one of those words with a single char on either side. ex:
\sNCAA\s
NCAA
NCAA\s
\sNCAA
GOAL:
I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.
I think that I just need to rework how I am building my RegExp.

Well, first of all, you're specifying case-insensitivity, and secondly, the letter I is one of your matchable strings.
Champion produces the match pio with the captured group i, because that substring matches /.?(I).?/gi.
It doesn't, however, match /.?Champions,.?/gi, because Champion lacks the trailing s and comma.
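To see where pio comes from, here is a minimal sketch that runs only the I alternative against the sample text. The variable names are made up for illustration.

```javascript
// Only the "I" alternative, with the same optional characters around it.
var onlyI = /.?(I).?/i;
var m = onlyI.exec("Champion");

console.log(m[0]);    // whole match
console.log(m[1]);    // captured group
console.log(m.index); // index where the match starts
```

This logs pio, i and 4, the same values as the output shown in the question.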

Add start (^) and end ($) anchors to the regexp.
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
Without the anchors, the regexp's match can start and end anywhere in the string, which is why
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
can match pio and i: because it's actually matching around the (case-insensitive) I. If you leave the anchors off, but remove the ...|I|..., the regex won't match 'Champion':
> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null

Champion matches /.?I.?/i.
Your own output notes that it's matching the substring "pio".
Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
I know you said to ignore the .?, but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
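For the hit-highlighting goal, one possible approach is to drop the .? parts and wrap the alternatives in word boundaries instead. The sketch below assumes the search terms are available as an array and that the trailing comma on Champions is not meant to be part of the term; escapeRegExp, buildTermPattern and findHits are hypothetical helpers, not something from the question.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a whole-word, case-insensitive pattern from search terms.
function buildTermPattern(terms) {
  var alternatives = terms.map(escapeRegExp).join("|");
  return new RegExp("\\b(?:" + alternatives + ")\\b", "gi");
}

// Assumption: the terms come in as an array, without trailing punctuation.
var terms = ["NCAA", "Division", "I", "Basketball", "Champions", "1939-2011"];
var pat = buildTermPattern(terms);

// Hypothetical helper: collect every hit in one text node's content.
function findHits(text) {
  pat.lastIndex = 0; // a global regex keeps state between exec() calls
  var hits = [];
  var result;
  while ((result = pat.exec(text)) !== null) {
    hits.push(result[0]);
  }
  return hits;
}

console.log(findHits("Champion"));
console.log(findHits("NCAA Division I Basketball Champions, 1939-2011"));
```

With the word boundaries in place, the i inside Champion is no longer a hit, while a standalone I still is.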

Related

I need some help for a specific regex in javascript

I'm trying to write a correct regex in my JavaScript code, but I'm a bit confused. My goal is to find any occurrence of "rotate" in a string. This should be simple, but I'm lost because my "rotate" can have multiple endings! Here are some examples of what I want to find with the regex:
rotate5
rotate180
rotate-1
rotate-270
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
Can someone help me please?
EDIT: What I tried so far (probably missing some of them):
/\wrotate.*/
/rotate.\w*/
/rotate.\d/
/\Srotate*/
I'm not fully understanding the regex mechanic yet.

Try this regex as a start. It will return all occurrences of a "rotate" string where a number (positive or negative) follows the "rotate".
/(rotate)([-]?[0-9]*)/g
Here is sample code
var aString = ["rotate5", "rotate180", "rotate-1", "some text rotate-270 rotate-1 more text rotate180"];
for (var x = 0; x < aString.length; x++) {
  var match;
  var regex = /(rotate)([-]?[0-9]*)/g;
  while ((match = regex.exec(aString[x])) !== null) {
    console.log(match);
  }
}
In this example,
match[0] gives the whole match (e.g. rotate5)
match[1] gives the text "rotate"
match[2] gives the numerical text immediately after the word "rotate"
If there are multiple rotate strings in the input, this will return them all.
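Since the regex is meant for a search-and-replace function, here is a minimal sketch of that step. It assumes a rotate token is the word rotate followed by an optional minus sign and at least one digit, which is slightly stricter than the pattern above (that one also matches a bare rotate, for example inside rotated). The replacement text spin and the helper renameRotate are made up for illustration.

```javascript
// Assumption: a token is "rotate", an optional minus sign, then one or more digits.
var rotatePattern = /\brotate(-?\d+)\b/g;

// Hypothetical helper: rewrite each rotateN token as spinN, keeping the number.
function renameRotate(text) {
  return text.replace(rotatePattern, "spin$1");
}

console.log(renameRotate("some text rotate-270 rotate-1 more text rotate180"));
console.log(renameRotate("rotated by rotate5"));
```

The first call logs some text spin-270 spin-1 more text spin180, and the second leaves rotated untouched while rewriting rotate5.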

If you just need to know whether the word is in the string, /rotate/ alone will do.
If you also need to capture what comes before or after it, @mseifert's regex is a good fit.
If you just want to replace the word rotate with another one, you can use the string method String.replace, like this: var str = "i am rotating with rotate-90"; str.replace('rotate', 'turning');
Why don't your regexes work?
/\wrotate.*/
requires a word character [a-zA-Z0-9_] immediately before rotate, followed by any number of other characters, so it cannot match when rotate starts the string or follows a space.
/rotate.\w*/
means rotate must be followed by exactly one character of any kind and then zero or more word characters.
...............

Using your description:
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
This regex should do the work:
const regex = /(^rotate|rotate$|\ {1}rotate\ {1})/gm;
You can learn more about regular expressions with these sites:
http://www.regular-expressions.info
regex101.com, where you can also try this pattern against your examples.

split on words except when phrase contains that word

I am trying to split where clauses, I want to split text on AND|OR|NOT except when NOT is in the 'phrase' NOT IN or NOT LIKE or IS NOT NULL.
1st example:
DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL
I expect 3 segments, splitting on the AND's, but not on the NOT in this instance.
2nd ex:
(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0
I want to split on this NOT & AND, also expecting 3 segments in this instance
I'm trying this look ahead from another almost similar example but it is not splitting at all. I have next to zero regex experience. Not sure what I'm missing or if it's even possible.
Thanks in advance.
var ignoreRegex = /(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)/g
var filterArray = filterBy.split(new RegExp(ignoreRegex));

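Try with:
\b(AND|OR|NOT(?!\s+(?:NULL|IN|LIKE|BETWEEN)\b))\b
The keywords inside the lookahead are grouped so that \s+ applies to each of them, not only to NULL.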
About your regex:
(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)
[NOT IN] is a character class: [...] matches any single one of the characters you put inside it (N, O, T, a space or I), not the whole word or phrase.
([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL]) can therefore match only one character, because it uses no quantifiers, so it doesn't work the way you expect at all.
As a whole, the regex matches an empty position that is followed somewhere by AND, OR or NOT, but the negative lookahead rejects any position that is followed by a single character from those classes standing between word boundaries (a space between two words is enough), so it will probably not match anything.
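A minimal sketch of the split itself, under a few assumptions: keywords are upper case, they never occur inside quoted string values, and the phrases to protect are NOT NULL, NOT IN, NOT LIKE and NOT BETWEEN. The group is non-capturing so that split() does not put the separators into the result; splitWhere is a hypothetical helper name.

```javascript
// Split on AND, OR, or a NOT that is not followed by NULL, IN, LIKE or BETWEEN.
var separator = /\b(?:AND|OR|NOT(?!\s+(?:NULL|IN|LIKE|BETWEEN)\b))\b/;

// Hypothetical helper: split a WHERE clause and trim each segment.
function splitWhere(filterBy) {
  return filterBy.split(separator).map(function (segment) {
    return segment.trim();
  });
}

console.log(splitWhere("DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL"));
console.log(splitWhere("(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0"));
```

Both calls return three segments. In the second example the first segment is just the opening parenthesis that precedes NOT, so a real parser would be needed to keep the parentheses balanced.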

regex not being called repeatedly for multiple matches (isn't global)

I have this regex /#[a-zA-Z0-9_]+$/g to do a global look up of all user names that are mentioned.
Here is some sample code.
var userRegex = /#[a-zA-Z0-9_]+$/g;
var text = "This is some sample text #Stuff #Stuff2 #Stuff3";
text.replace(userRegex, function(match, text, urlId) {
console.log(match);
});
So basically that console.log only gets called once, in this case it'll just show #Stuff3. I'm not sure why it isn't searching globally. If someone can help fix up that regex for me, that'd be awesome!

$ means "Assert the position at the end of the string (or before a line break at the end of the string, if any)". But you don't seem to want that.
So remove the $ and use /#[a-zA-Z0-9_]+/g instead.
var userRegex = /#[a-zA-Z0-9_]+/g,
text = "This is some sample text #Stuff #Stuff2 #Stuff3";
text.match(userRegex); // [ "#Stuff", "#Stuff2", "#Stuff3" ]

It isn't doing a global search throughout the entire context simply because of the end of string $ anchor (which only asserts at the end of string position). You can use the following here:
var results = text.match(/#\w+/g) //=> [ '#Stuff', '#Stuff2', '#Stuff3' ]
Note: \w is shorthand for matching any word character.

Adding to @Oriol's answer: you can add a word boundary to be more specific.
#([a-zA-Z0-9_]+)\b
The \b causes the username to match only if it is followed by a non-word character or the end of the string.
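If you want to keep the replace() callback from the question, a minimal sketch without the $ anchor could look like this. The names array is an assumption added only to show that the callback now runs once per mention.

```javascript
var text = "This is some sample text #Stuff #Stuff2 #Stuff3";
var names = [];

// The callback receives the whole match first, then each captured group.
text.replace(/#(\w+)/g, function (match, name) {
  names.push(name);
  return match; // leave the text itself unchanged
});

console.log(names);
```

This logs the three names Stuff, Stuff2 and Stuff3.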

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the below format, which I've saved as a string. How can I write a Regular Expression in javascript to trim the string to only the characters after the last slash in the URL.
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'

Don't write a regex! This is trivial to do with string functions instead:
var final = id.slice(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.slice(-16);
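A small sketch of the string-function approach wrapped in a function. It additionally assumes that the URL may end in one or more slashes, which the question does not mention; lastPathSegment is a hypothetical helper name.

```javascript
// Hypothetical helper: return everything after the last slash,
// ignoring any slashes at the very end of the string.
function lastPathSegment(url) {
  var trimmed = url.replace(/\/+$/, "");
  return trimmed.slice(trimmed.lastIndexOf("/") + 1);
}

var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
console.log(lastPathSegment(id));
console.log(lastPathSegment(id + '/'));
```

Both calls log nabb80191e23b7d9.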

A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ anchor the match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);

Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()

This one is easy to understand: (?!.*/).+
Let me explain.
First, consider everything up to and including the last slash. That's the part we don't want.
.*/ matches everything up to the last slash.
Then we wrap it in a negative lookahead, (?!...), to say "a match may not start where this can still be found ahead".
(?!.*/) is the negative lookahead.
Now we can take whatever follows the part we don't want with
.+
In a JavaScript regex literal you need to escape the /, so it becomes:
(?!.*\/).+
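A minimal sketch of the lookahead pattern applied to the id from the question; the variable name tail is made up for illustration.

```javascript
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// The match can only start at a position with no slash anywhere ahead of it,
// which is the first character after the last slash.
var tail = id.match(/(?!.*\/).+/)[0];

console.log(tail);
```

This logs nabb80191e23b7d9. Note that match() returns null when nothing matches (for example, when the string ends in a slash), so a real caller would check for that before indexing.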

this regexp: [^\/]+$ - works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"

This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9

Don't know JS, using others examples (and a guess) -
id = id.match(/[^\/]*$/); // [0] optional ?

Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"

JavaScript RegEx Match Failing

I am having issues matching a string using regex in javascript. I am trying to get everything up to the word "at". I am using the following and while it doesn't return any errors, it also doesn't do anything either.
var str = "Team A at Team B";
var matches = str.match(/(.*?)(?=at|$)/);
I tried multiple regex patterns before coming across this SO post, Regex to capture everything before first optional string, but it doesn't seem to return what I want.

Remove the ? at your first capturing group, and |$ from your second, and add ^ to mark beginning of string:
str.match(/^(.*)(?=at)/)
Alternatively (I personally find below easier to read, but your call):
str.substr(0, str.search(/\bat\b/))
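If both sides of the "at" are needed, a sketch along the same lines could capture them in one pass. It assumes the separator is the word at surrounded by whitespace and uses a lazy first group so that the first such at wins; splitMatchup and the left/right property names are hypothetical.

```javascript
// Hypothetical helper: split "X at Y" into the text before and after the word "at".
function splitMatchup(str) {
  var m = /^(.*?)\s+at\s+(.*)$/.exec(str);
  return m ? { left: m[1], right: m[2] } : null;
}

console.log(splitMatchup("Team A at Team B"));
console.log(splitMatchup("Cats at Bats"));
console.log(splitMatchup("no separator"));
```

Requiring whitespace around at keeps the at inside Cats, Bats or separator from being treated as the separator; the last call returns null.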
