Regex - conditional match for hyphened appendices - javascript

I'm dealing with 8 character jobnames that must follow convention, but I want to allow additional characters if appended with a hyphen.
I have come up with this:
\w{2}YYY\w{3}(?(-).*|\b)
Which matches correctly:
XXYYY001 >> match
XXYYY001-TEST >> match
XXYYY001123 >> no match
This seems cumbersome however, so I just wanna know the most efficient expression.
EDIT: Thanks Wiktor, your answer worked.
And to take it one step further: If I wanted to use a variable for YYY?

Like this.
Explanation:
^ matches the beginning of the string
\w{2}YYY\w{3} is the part you wrote; it matches the main pattern
(\-.*) matches a hyphen followed by anything (including nothing; see test #4)
? means the preceding group can occur zero or one times
$ matches the end of the string
const pattern = /^\w{2}YYY\w{3}(\-.*)?$/;
const strings = [
'XXYYY001',
'XXYYY001XXXTEST',
'XXYYY001-TEST',
'XXYYY003-',
'FARFXXYYY003',
'FARFXXYYY003-TEST'
];
strings.forEach(string => {
let conforms = pattern.test(string);
console.log(string,conforms);
});
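For the follow-up about using a variable in place of YYY, one option is to build the pattern with the RegExp constructor. This is a minimal sketch, assuming the variable holds literal text; escapeRegExp and buildJobPattern are hypothetical helper names, not part of the answer above:

```javascript
// Hypothetical helper: escape regex metacharacters so the variable is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the jobname pattern around a variable middle part.
function buildJobPattern(middle) {
  // Backslashes are doubled because the pattern is written as a string.
  return new RegExp('^\\w{2}' + escapeRegExp(middle) + '\\w{3}(-.*)?$');
}

const jobPattern = buildJobPattern('YYY');
['XXYYY001', 'XXYYY001-TEST', 'XXYYY001123'].forEach(name => {
  console.log(name, jobPattern.test(name));
});
```

Escaping matters only if the variable could contain characters such as . or +; for plain letters the constructor call alone is enough.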

Related

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
You can split on positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
Instead of split, use match to create the array. Match 1) digits in parentheses, followed by a run of one or more digits, uppercase letters, or hyphens, and then 2) wrap that whole expression in a group.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));
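Since the order of the application identifiers isn't guaranteed, the same matching idea can be extended to capture the identifier and its value separately and look them up by key. This is a sketch, assuming values never contain a "(" character; parseUdi is a hypothetical name:

```javascript
// Hypothetical helper: map each application identifier to its value.
function parseUdi(text) {
  const fields = new Map();
  // Group 1: digits inside the parentheses; group 2: everything up to the next "(".
  for (const [, ai, value] of text.matchAll(/\((\d+)\)([^(]*)/g)) {
    fields.set(ai, value);
  }
  return fields;
}

const udi = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
const fields = parseUdi(udi);
console.log(fields.get('17'));
console.log(fields.get('240'));
```

String.prototype.matchAll requires a regex with the g flag and an ES2020-capable runtime.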

split on words except when phrase contains that word

I am trying to split where clauses, I want to split text on AND|OR|NOT except when NOT is in the 'phrase' NOT IN or NOT LIKE or IS NOT NULL.
1st example:
DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL
I expect 3 segments, splitting on the AND's, but not on the NOT in this instance.
2nd ex:
(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0
I want to split on this NOT & AND, also expecting 3 segments in this instance
I'm trying this look ahead from another almost similar example but it is not splitting at all. I have next to zero regex experience. Not sure what I'm missing or if it's even possible.
Thanks in advance.
var ignoreRegex = /(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)/g
var filterArray = filterBy.split(new RegExp(ignoreRegex));
Try with:
\b(AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)\b))\b
About your regex:
(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)
[NOT IN] is a character class [...]; it matches a single character from the set you put in it (N, O, T, a space, or I), not the whole word or phrase.
([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL]) can therefore match only one character, because it doesn't use any quantifiers or intervals, so it doesn't work as you expect at all.
So the whole regex would match a position that is followed somewhere by AND, OR or NOT, but only if the rest of the line contains no standalone character from those character classes (which include the space), so it will probably not match anything.
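A minimal sketch of the split, assuming NOT should be ignored only when followed by NULL, IN or LIKE, and that the keywords never appear inside quoted string values (those would be split too). The names separator and splitWhere are hypothetical:

```javascript
// NOT is a separator only when it is not followed by NULL, IN or LIKE.
// The group is non-capturing so split() does not return the keywords themselves.
const separator = /\s*\b(?:AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)\b))\b\s*/;

function splitWhere(clause) {
  return clause.split(separator);
}

const first = splitWhere('DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL');
const second = splitWhere("(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0");
console.log(first);
console.log(second);
```

Both inputs from the question come back as three segments; in the second one the first segment is just the opening parenthesis that precedes NOT.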

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the below format, which I've saved as a string. How can I write a regular expression in JavaScript to trim the string to only the characters after the last slash in the URL?
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'
Don't write a regex! This is trivial to do with string functions instead:
var final = id.substr(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.substr(-16);
A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);
Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()
This is easy to understand: (?!.*/).+
Let me explain:
First, let's match everything that has a slash at the end, ok?
That's the part we don't want.
.*/ matches everything up to the last slash
Then we wrap it in a negative lookahead, (?!...), to say "I don't want this, discard it":
(?!.*/) is the negative lookahead
Now we can happily take whatever comes after the part we don't want with this:
.+
You may need to escape the / so it becomes:
(?!.*\/).+
this regexp: [^\/]+$ - works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"
This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9
Don't know JS, using others' examples (and a guess):
id = id.match(/[^\/]*$/)[0]; // [0] is needed because match returns an array
Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"
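For comparison, here is a sketch of the string-function approach that also tolerates one trailing slash, as the regex with \/? does. It uses slice rather than substr, which is a legacy method; lastSegment is a hypothetical name:

```javascript
// Hypothetical helper: return the text after the last slash of a URL.
function lastSegment(url) {
  // Drop one trailing slash, if present, so ".../abc/" still yields "abc".
  const trimmed = url.endsWith('/') ? url.slice(0, -1) : url;
  return trimmed.slice(trimmed.lastIndexOf('/') + 1);
}

const contactId = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
console.log(lastSegment(contactId));
console.log(lastSegment(contactId + '/'));
```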

How to make this simple regexp?

I need to validate a string that starts and ends with an alphanumeric character, is between 5 and 20 characters long, and may have one space or none between the characters. I tried /^[a-z\s?A-Z0-9]{5,20}$/ but this is not working.
EDIT
test test -should pass
testtest -should pass
test test test -should not pass
You can't do this with traditional regex without writing a ridiculously long expression, so you need to use a look-ahead:
/^(?=(\w| ){5,20}$)\w+ ?\w+$/
This says: make sure there are between 5 and 20 characters in the match, then match /\w+ ?\w+/
Note I used \w for simplification. It is the same as your character class above except it also accepts underscores. If you don't want to match them you have to do:
/^(?=[a-zA-Z0-9 ]{5,20}$)[a-zA-Z0-9]+ ?[a-zA-Z0-9]+$/
You can't put a ? inside of [...]. [...] is used to specify a set of characters precisely, you can't maybe (?) have a character inside a set of characters. The occurrence of any specific characters is already optional, the ? is meaningless.
If you allow any number of spaces inside your match, just remove the question mark. If you want to allow a single space but no more, then regular expressions alone can't do that for you, you'd need something like
if (/^[a-z\sA-Z0-9]{5,20}$/.test(myString) && (myString.match(/\s/g) || []).length <= 1)
You couldn't do this with a single traditional regex without it being dozens of lines long; regexes are meant for matching simpler patterns than this.
If you only want to use regexes, you could use two instead of one. The first matches the general pattern, the second ensures that at most one whitespace character is found.
if (myString.match(/^[a-z\sA-Z0-9]{5,20}$/) && myString.match(/^[^\s]*\s?[^\s]*$/)) {
Example Usage
var inputs = ["test test", "testtest", "test test test"];
for (var index in inputs) {
var myString = inputs[index];
if (myString.match(/^[a-z\sA-Z0-9]{5,20}$/) && myString.match(/^[^\s]*\s?[^\s]*$/)) {
console.log(myString + " matches.")
} else {
console.log(myString + " does not match.")
}
}
This produces the output specified in your question.
Meh, so here's the ridiculously long traditional regex for the same:
(?i)[a-z0-9]+( [a-z0-9]+)?{5,12}
JS version (without the nested quantifier):
/^([a-z0-9]( [a-z0-9])?){5,12}$/i
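Combining the lookahead idea from the answers above with the limits stated in the question (5 to 20 characters), a single-pattern check could look like the sketch below. It assumes "a space" means one literal space character and that at most one is allowed; namePattern is a hypothetical name:

```javascript
// The lookahead checks the total length (5 to 20 characters);
// the body allows alphanumerics with one optional interior space.
const namePattern = /^(?=.{5,20}$)[a-z0-9]+(?: [a-z0-9]+)?$/i;

['test test', 'testtest', 'test test test'].forEach(input => {
  console.log(input, namePattern.test(input));
});
```

Because the body requires an alphanumeric on each side of the optional space, leading and trailing spaces are rejected without extra checks.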

Javascript RegExp quantifier issue

I have some JavaScript that uses replace with regular expressions to modify content on a page. I'm having a problem with a specific regex quantifier, though. All the documentation I've read (and I know it works in regex in other languages, too) says that JavaScript supports the {N}, {N,} and {N,N} quantifiers. That is, you can specify a particular number of matches you want, or a range of matches. E.g. (zz){5,} matches at least 10 z's in a row, and z{5,10} would match any number of z's from 5 to 10, no more and no less.
The problem is, I can match an exact number (e.g. z{5}) but not a range. The nearest I can figure is that it has something to do with the comma in the regex string, but I don't understand why and can't get around this. I have tried escaping the comma and even using the Unicode hexadecimal escape for comma (\u002C), but to no avail.
To clear up any possible misunderstandings, and to address some of the questions asked in the comments, here is some additional information (also found in the comments): I have tried creating the array in all possible ways, including var = [/z{5,}/gi,/a{4,5}/gi];, var = [new RegExp('z{5,}', 'gi'), new RegExp('a{4,5}', 'gi')];, as well as var[0] = new RegExp('z{5,}', 'gi');, var[1] = /z{5,}/gi;, etc. The array is used in a for-loop as somevar.replace(regex[i], subst[i]);.
Perhaps I'm misunderstanding the question, but it seems like the Javascript implementation of the {n} operators is pretty good:
"foobar".match(/o{2,4}/); // => matches 'oo'
"fooobar".match(/o{2,4}/); // => matches 'ooo'
"foooobar".match(/o{2,4}/); // => matches 'oooo'
"fooooooobar".match(/o{2,4}/); // => matches 'oooo'
"fooooooobar".match(/o{2,4}?/); // => lazy, matches 'oo'
"foooobar".match(/(oo){2}/); // => matches 'oooo', and captures 'oo'
"fobar".match(/[^o](o{2,3})[^o]/); // => no match
"foobar".match(/[^o](o{2,3})[^o]/); // => matches 'foob' and captures 'oo'
"fooobar".match(/[^o](o{2,3})[^o]/); // => matches 'fooob' and captures 'ooo'
"foooobar".match(/[^o](o{2,3})[^o]/); // => no match
It works for me.
var regex = [/z{5,}/gi,/a{4,5}/gi];
var subst = ['ZZZZZ','AAAAA'];
var somevar = 'zzzzz aaaaa aaaaaaa zzzzzzzzzz aaazzzaaaaaa';
print(somevar);
for (var i=0; i<2; i++) {
somevar = somevar.replace(regex[i], subst[i]);
}
print(somevar);
output:
zzzzz aaaaa aaaaaaa zzzzzzzzzz aaazzzaaaaaa
ZZZZZ AAAAA AAAAAaa ZZZZZ aaazzzAAAAAa
The constructor version works, too:
var regex = [new RegExp('z{5,}','gi'),new RegExp('a{4,5}','gi')];
See it in action on ideone.com.
I think I've figured it out. I was building the array various ways to get it to work, but what I think made the difference was using single-quotes around the regex string, instead of leaving it open like [/z{5,}/,/t{7,9}/gi]. So when I did ['/z{5,}/','/t{7,9}/gi'] that seems to have fixed it. Even though, like in Alan's example, it does sometimes work fine without them. Just not in my case I guess.
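One way to check which form is actually being applied is to run the three variants side by side. This is a small sketch with made-up sample text; note that a pattern wrapped in quotes is passed to replace as a plain string, which is searched for as literal text rather than interpreted as a regex:

```javascript
const sample = 'zzzzz zzzzzzzz aaaa';

// Regex literal and RegExp constructor are equivalent ways to write the pattern.
const viaLiteral = sample.replace(/z{5,}/gi, 'Z');
const viaConstructor = sample.replace(new RegExp('z{5,}', 'gi'), 'Z');

// A quoted string is matched literally, so nothing in the sample is replaced here.
const viaString = sample.replace('/z{5,}/gi', 'Z');

console.log(viaLiteral);
console.log(viaConstructor);
console.log(viaString);
```

If the quoted form appears to fix a page, it may be worth checking whether the replacement is running at all for that entry.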
