how do I get this regex to mimic a look-behind? - javascript

/(([$]*)([A-Z]{1,3})([$]*)([0-9]{1,5}))/gi
This is for pulling cell refs out of spreadsheet formulas and checking to see if the formula contains an absolute ref. The problem is that it's matching an invalid cell, the last one here:
a1
$a1
$A$5
A5*4
A20+45
A34/A$23
A1*6
A1*A45
$AAA11
AAA33
AA33:A33
$AAAAA44 // <-- not a valid cell!
It's matching the AAA44 in $AAAAA44, but it shouldn't. All the rest of the capture groups etc are working correctly -- each of those rows but the last one are correctly grabbing 1 or more cell refs. A negative lookahead seems like the right way to go, but after mucking with it for a good long while I must admit to being stuck.

If you can't anchor the whole pattern with ^...$, you may still be able to use \b (word boundary) matching:
/foo\bbar/.test('foobar'); // false
/foo\b\d/.test('foo1'); // false
/foo\b.\d/.test('foo+1'); // true
So your RegExp would look like (I left in your capture groups)
var re = /(?:\b|^)((\$?)([a-z]{1,3})(\$?)(\d{1,5}))(?:\b|$)/i;
re.test('$AAAAA44'); // false
re.test('$AAA44'); // true
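To pull every reference out of a formula rather than test a single string, the same pattern can be run globally. The sketch below is only an illustration: the helper names `findCellRefs` and `findCellRefsLookbehind` are hypothetical, the `g` flag is added so `matchAll` can iterate, and the second variant assumes an engine with lookbehind support (ES2018 or later), which is a different technique from the `\b` approach above.

```javascript
// Hypothetical helper: collect every cell reference found by the \b-based pattern.
// Only the `g` flag is added; the pattern itself is the one from the answer.
function findCellRefs(formula) {
  const re = /(?:\b|^)((\$?)([a-z]{1,3})(\$?)(\d{1,5}))(?:\b|$)/gi;
  return Array.from(formula.matchAll(re), (m) => m[1]);
}

// Alternative sketch, assuming lookbehind is available (ES2018+).
// The reference must not be preceded by a letter, digit or $,
// and must not be followed by a letter or digit.
function findCellRefsLookbehind(formula) {
  const re = /(?<![a-z\d$])(\$?)([a-z]{1,3})(\$?)(\d{1,5})(?![a-z\d])/gi;
  return Array.from(formula.matchAll(re), (m) => m[0]);
}

console.log(findCellRefs('A34/A$23'));           // [ 'A34', 'A$23' ]
console.log(findCellRefs('$AAAAA44'));           // []
console.log(findCellRefs('1+$A$5'));             // [ 'A$5' ]
console.log(findCellRefsLookbehind('1+$A$5'));   // [ '$A$5' ]
```

Note the third result: when a leading $ follows another non-word character such as +, there is no word boundary in front of it, so the \b-based pattern starts the match at the letter and the leading $ is lost. If the absolute-reference check depends on that $, the lookbehind variant may be the safer choice.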

Related

remove Niqqud from string in javascript

I have the exact problem described here:
removing Hebrew "niqqud" using r
Have been struggling to remove niqqud ( diacritical signs used to represent vowels or distinguish between alternative pronunciations of letters of the Hebrew alphabet). I have for instance this variable: sample1 <- "הֻסְמַק"
And I cannot find an effective way to remove the signs below the letters.
But in my case I have to do this in JavaScript.
Based on the UTF-8 values table described here, I have tried this regex without success.
Just a slight problem with your regex. Try the following:
const input = "הֻסְמַק";
console.log(input)
console.log(input.replace(/[\u0591-\u05C7]/g, ''));
/*
$ node index.js
הֻסְמַק
הסמק
*/
nj_’s answer is great.
Just to add a bit (because I don’t have enough reputation points to comment directly) -
[\u0591-\u05C7] may be too broad a brush. See the relevant table here: https://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet#Compact_table
Rows 059x and 05Ax are for t'amim (accents/cantillation marks).
Niqqud per se is in rows 05Bx and 05Cx.
And as Avraham commented, you can run into an issue if two words are joined by a maqaf (05BE): removing it leaves you with run-on words.
If you want to remove only t'amim but keep niqqud, use /[\u0591-\u05AF]/g. If you want to avoid the issue raised by Avraham, you have two options: either keep the maqaf, or replace it with a dash:
//keep the original makafim
const input = "כִּי־טוֹב"
console.log(input)
console.log(input.replace(/[\u05B0-\u05BD\u05BF-\u05C7]/g, ""));
//replace makafim with dashes
console.log(input.replace(/\u05BE/g,"-").replace(/[\u05B0-\u05C7]/g,""))
/*
$ node index.js
כִּי־טֽוֹב
כי־טוב
כי-טוב
*/
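The options discussed above can be folded into one small function. This is only a sketch: `stripHebrewMarks` and its option names are hypothetical, and the ranges are the ones quoted in this answer (U+0591 to U+05AF for t'amim, U+05B0 to U+05C7 minus the maqaf for niqqud). The sample string is written with escapes so the code points are visible.

```javascript
// Hypothetical helper: remove Hebrew marks by category.
//   cantillation: strip t'amim (U+0591-U+05AF)
//   niqqud:       strip U+05B0-U+05C7 except the maqaf (U+05BE)
//   maqaf:        'keep' leaves U+05BE alone, 'dash' replaces it with '-'
function stripHebrewMarks(text, { cantillation = true, niqqud = true, maqaf = 'keep' } = {}) {
  let out = text;
  if (cantillation) out = out.replace(/[\u0591-\u05AF]/g, '');
  if (maqaf === 'dash') out = out.replace(/\u05BE/g, '-');
  if (niqqud) out = out.replace(/[\u05B0-\u05BD\u05BF-\u05C7]/g, '');
  return out;
}

// Two words joined by a maqaf, with vowel points (written as escapes).
const sample = '\u05DB\u05B4\u05BC\u05D9\u05BE\u05D8\u05D5\u05B9\u05D1';
console.log(stripHebrewMarks(sample));                    // letters plus maqaf
console.log(stripHebrewMarks(sample, { maqaf: 'dash' })); // letters plus '-'
```

One thing to keep in mind: rows 05Bx and 05Cx also hold a few punctuation characters besides the maqaf (paseq U+05C0, sof pasuq U+05C3, nun hafukha U+05C6), so the niqqud range here removes those too. Narrow the character class if they need to survive.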

Regex- match 3 or 6 of type

I'm writing an application that requires color manipulation, and I want to know when the user has entered a valid hex value. This includes both '#ffffff' and '#fff', but not the ones in between, like 4 or 5 Fs. My question is, can I write a regex that determines if a character is present a set amount of times or another exact amount of times?
What I tried was changing this regular expression:
/#(\d|\w){3}{6}/
to this:
/#(\d|\w){3|6}/
Obviously this didn't work. I realize I could write:
/(#(\d|\w){3})|(#(\d|\w){6})/
However I'm hoping for something that looks better.
The shortest I could come up with:
/#([\da-f]{3}){1,2}/i
I.e. # followed by one or two groups of three hexadecimal digits.
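For validating a whole input value, the pattern also needs anchors; without them a four-character value still "matches" on its first three digits. A minimal sketch, where the helper name `isHexColor` is made up for illustration:

```javascript
// Hypothetical validator: the same {3}{1,2} idea, anchored to the whole string.
function isHexColor(value) {
  return /^#([\da-f]{3}){1,2}$/i.test(value);
}

console.log(isHexColor('#fff'));     // true
console.log(isHexColor('#ffffff'));  // true
console.log(isHexColor('#ffff'));    // false
console.log(isHexColor('#fffff'));   // false

// Unanchored, the pattern finds '#fff' inside the longer value:
console.log(/#([\da-f]{3}){1,2}/i.test('#ffff')); // true
```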
You can use this regex:
/#[a-f\d]{3}(?:[a-f\d]{3})?\b/i
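This will allow #<3 hex-digits> or #<6 hex-digits> inputs. The \b at the end is a word boundary; it stops a match from ending in the middle of a longer run of hex digits, so 4- or 5-digit values are rejected.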
I had to find a pattern for this myself today, but I also needed to include the extra digits for transparency (i.e. #FFF5 / #FFFFFF55), which made things a little more complicated, as the number of valid combinations goes up.
In case it's of any use, here's what I came up with:
var inputs = [
"#12", // Invalid
"#123", // Valid
"#1234", // Valid
"#12345", // Invalid
"#123456", // Valid
"#1234567", // Invalid
"#12345678", // Valid
"#123456789" // Invalid
];
var regex = /(^\#(([\da-f]){3}){1,2}$)|(^\#(([\da-f]){4}){1,2}$)/i;
inputs.forEach((itm, ind, arr) => console.log(itm, (regex.test(itm) ? "valid" : "-")));
Which should return:
#12 -
#123 valid
#1234 valid
#12345 -
#123456 valid
#1234567 -
#12345678 valid
#123456789 -

split on words except when phrase contains that word

I am trying to split where clauses, I want to split text on AND|OR|NOT except when NOT is in the 'phrase' NOT IN or NOT LIKE or IS NOT NULL.
1st example:
DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL
I expect 3 segments, splitting on the AND's, but not on the NOT in this instance.
2nd ex:
(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0
I want to split on this NOT & AND, also expecting 3 segments in this instance
I'm trying this look ahead from another almost similar example but it is not splitting at all. I have next to zero regex experience. Not sure what I'm missing or if it's even possible.
Thanks in advance.
var ignoreRegex = /(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)/g
var filterArray = filterBy.split(new RegExp(ignoreRegex));
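Try with:
\b(AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)\b))\b
The alternatives inside the lookahead are grouped so that \s+ applies to each of them; written as (?!\s+NULL|IN|LIKE), only NULL would be checked after the whitespace, and NOT IN or NOT LIKE would still be split.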
About your regex:
(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)
[NOT IN] - this is a character class [...]. It matches any single character listed inside it (N, O, T, a space, I), not the whole word or phrase.
([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL]) - this whole part can therefore match only one character, because it doesn't use any quantifiers or intervals. It doesn't work as you expect at all.
So the whole regex can match only at a position that is followed somewhere later by AND, OR or NOT, and only if the rest of the line contains no single character from those classes standing between word boundaries (a space between two words already counts). So it will probably not match anything.
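As a sketch of the split itself: the pattern below groups the keywords inside the negative lookahead so that \s+ applies to each of them, uses a non-capturing group so that `split` does not return the separators, and swallows the whitespace around each keyword. The helper name `splitWhere` is hypothetical, and BETWEEN is included only because the question's own attempt lists it.

```javascript
// Hypothetical helper: split a WHERE clause on AND / OR / NOT, but leave
// NOT alone when it is followed by NULL, IN, LIKE or BETWEEN.
function splitWhere(clause) {
  const separator = /\s*\b(?:AND|OR|NOT(?!\s+(?:NULL|IN|LIKE|BETWEEN)\b))\b\s*/i;
  return clause.split(separator);
}

console.log(splitWhere('DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL'));
// [ 'DEVLDATE IS NOT NULL', 'STATUS = D', 'PICKUPDATE IS NULL' ]

console.log(splitWhere("(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0"));
// [ '(', "(STATUS IN ('A','X') ))", 'LINEHAUL = 0' ]
```

Both examples give three segments. A regex split like this knows nothing about parentheses or quoted strings, so a keyword inside a string literal would still be treated as a separator.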

Regex only resulting in last occurrence

My regex /(.)(?:(.)(?!.*\2))+\1/g is meant to find two identical characters with no repeated characters between them. For example, "aba" or "abcadea" are valid, whereas "abcba" is not valid because the b is present twice between the two a's. Essentially the a's act as borders and no character should be repeated within them.
The issue I am having is that it's not correctly identifying all occurrences where this happens. Take this example:
var s = "abababab";
s.match(/(.)(?:(.)(?!.*\2))+\1/g)
["bab"] //aba is also a valid occurrence
var s = "aba"
s.match(/(.)(?:(.)(?!.*\2))+\1/g)
["aba"] //it works on the string by itself
Another issue, which I believe is related, is that it's only finding the shortest match, so for example:
var s = "abcadefa";
s.match(/(.)(?:(.)(?!.*\2))+\1/g)
["abca"] //should also result abcadefa as a valid string
I cannot find where the bug is in my regex query. Any assistance would be great!

Javascript RegExp Matching weirdness

I have a RegExp:
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
and some text "Champion"
somehow, this is coming back as a match, am I crazy?
0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2
the loop is here:
// construct the pattern dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up being
var result;
while( result = pat.exec(someText) ) {
// do stuff!
}
There has to be something wrong with my RegExp, right?
EDIT:
The .? thing was just a quick and dirty attempt to say that I'd like to match one of those words AND/OR one of those words with a single char on either side. ex:
\sNCAA\s
NCAA
NCAA\s
\sNCAA
GOAL:
I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.
I think that I just need to rework how I am building my RegExp.
Well, first of all you're specifying case-insensitivity, and secondly, you are matching the letter I as one of your matchable strings.
Champion gives you pio (the whole match) and i (the capture group), because pio matches /.?I.?/gi.
It doesn't, however, match /.?Champions,.?/gi, because of the trailing comma.
Add start (^) and end ($) anchors to the regexp.
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
Without the anchors, the regexp's match can start and end anywhere in the string, which is why
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
can match pio and i: because it's actually matching around the (case-insensitive) I. If you leave the anchors off, but remove the ...|I|..., the regex won't match 'Champion':
> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null
Champion matches /.?I.?/i.
Your own output notes that it's matching the substring "pio".
Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
I know you said to ignore the .?, but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
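For the hit-highlighting goal described in the question, anchoring to the whole input would stop terms from being found inside a longer text node. One possible alternative, offered only as a sketch: escape each term and require that the match is not glued to a neighbouring word character. The helper names are hypothetical, and the lookbehind assumes an ES2018+ engine.

```javascript
// Escape regex metacharacters so a search term is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one case-insensitive pattern for all terms, matching a
// term only when no word character sits directly before or after it.
function buildTermPattern(terms) {
  const alternation = terms.map(escapeRegExp).join('|');
  return new RegExp('(?<!\\w)(?:' + alternation + ')(?!\\w)', 'gi');
}

const terms = ['NCAA', 'Division', 'I', 'Basketball', 'Champions,', '1939-2011'];
const pattern = buildTermPattern(terms);

console.log('Champion'.match(pattern)); // null
console.log('NCAA Division I Basketball Champions, 1939-2011'.match(pattern));
// [ 'NCAA', 'Division', 'I', 'Basketball', 'Champions,', '1939-2011' ]
```

Lookarounds are used instead of \b because a term such as "Champions," ends in a non-word character, and \b would fail between the comma and a following space.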
