Regex only resulting in last occurrence - javascript

My regex /(.)(?:(.)(?!.*\2))+\1/g is meant to find two identical characters with no repeated characters between them. For example, "aba" and "abcadea" are valid, whereas "abcba" is not, because the b appears twice between the two a's. Essentially the a's act as borders, and no character should be repeated within them.
The issue is that it's not identifying all occurrences. Take this example:
var s = "abababab";
s.match(/(.)(?:(.)(?!.*\2))+\1/g)
["bab"] //aba is also a valid occurrence
var s = "aba"
s.match(/(.)(?:(.)(?!.*\2))+\1/g)
["aba"] //it works on the string by itself
Another issue, which I believe is related, is that it's only finding the shortest match. For example:
var s = "abcadefa";
s.match(/(.)(?:(.)(?!.*\2))+\1/g)
["abca"] //should also result abcadefa as a valid string
I cannot find where the bug is in my regex query. Any assistance would be great!
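One reading of the behaviour, offered as a sketch rather than a definitive diagnosis: the lookahead (?!.*\2) scans to the end of the whole string, not just up to the closing border, and match() with the g flag never returns overlapping matches. The loop-based helper below sidesteps both. The name findBorderedRuns is made up for illustration, and it assumes "valid" means the first and last characters are equal and the characters between them are all distinct.

```javascript
// Hypothetical helper: returns every substring (overlaps included) whose first
// and last characters are equal and whose inner characters are all distinct.
function findBorderedRuns(s) {
  const found = [];
  for (let start = 0; start < s.length; start++) {
    const seen = new Set(); // inner characters seen so far for this start
    for (let end = start + 1; end < s.length; end++) {
      // s[end] closes a run if it equals the border and the interior is non-empty
      if (s[end] === s[start] && seen.size > 0) {
        found.push(s.slice(start, end + 1));
      }
      // s[end] becomes an inner character for any longer candidate
      if (seen.has(s[end])) break; // a repeat rules out every longer run
      seen.add(s[end]);
    }
  }
  return found;
}

console.log(findBorderedRuns("abababab")); // overlapping "aba" and "bab" runs
console.log(findBorderedRuns("abcadefa")); // includes "abca" and "abcadefa"
```

This runs in quadratic time in the worst case, which is usually acceptable for short strings.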

split on words except when phrase contains that word

I am trying to split where clauses, I want to split text on AND|OR|NOT except when NOT is in the 'phrase' NOT IN or NOT LIKE or IS NOT NULL.
1st example:
DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL
I expect 3 segments, splitting on the AND's, but not on the NOT in this instance.
2nd ex:
(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0
I want to split on this NOT & AND, also expecting 3 segments in this instance
I'm trying this look ahead from another almost similar example but it is not splitting at all. I have next to zero regex experience. Not sure what I'm missing or if it's even possible.
Thanks in advance.
var ignoreRegex = /(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)/g
var filterArray = filterBy.split(new RegExp(ignoreRegex));
Try with:
\b(AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)))\b
About your regex:
(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)
[NOT IN] - this is a character class [...]. It matches any single one of the characters you put inside it (N, O, T, a space, I), not the whole word or phrase.
([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL]) - this whole group can match only a single character, because it uses no quantifiers, so it does not work the way you expect.
So the whole regex matches a position that is followed somewhere by AND, OR or NOT, but only if the rest of the line contains no standalone character from those character classes. A single space between two words already counts as such a character, so in practice it will probably not match anything.
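In a lookahead written as NOT(?!\s+NULL|IN|LIKE), the \s+ belongs only to the first alternative, so NOT IN and NOT LIKE are not actually excluded. Below is a minimal sketch of the split with the keywords grouped so that \s+ applies to all of them. The helper name splitWhere is hypothetical, BETWEEN is added on the assumption that NOT BETWEEN should be protected as in the question's own pattern, and a non-capturing group is used because split() would otherwise include the captured keywords in the result.

```javascript
// Hypothetical helper: splits a WHERE clause on AND, OR and NOT, but leaves
// NOT alone when it starts NOT NULL, NOT IN, NOT LIKE or NOT BETWEEN.
function splitWhere(clause) {
  const separator = /\s*\b(?:AND|OR|NOT(?!\s+(?:NULL|IN|LIKE|BETWEEN)\b))\b\s*/;
  return clause.split(separator);
}

console.log(splitWhere("DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL"));
console.log(splitWhere("(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0"));
```

Note that in the second example the first segment is just the opening parenthesis that precedes NOT; whether that is useful depends on what you do with the segments afterwards.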

Get id from url

I have the following example url: #/reports/12/expense/11.
I need to get the id just after reports, i.e. 12. What I am asking is the most suitable way to do this. I can search for reports in the url and take the content just after it, but if at some point I decide to change the url, I will have to change my algorithm.
What do you think is the best way here? Some code examples would also be very helpful.
It's hard to write code that is future-proof since it's hard to predict the crazy things we might do in the future!
However, if we assume that the id will always be the first run of consecutive digits in the URL, then you could simply look for that:
function getReportId(url) {
    var match = url.match(/\d+/);
    return (match) ? Number(match[0]) : null;
}
getReportId('#/reports/12/expense/11'); // => 12
getReportId('/some/new/url/report/12'); // => 12
You should use a regular expression to find the number inside the string. Passing the regular expression to the string's .match() method returns an array describing the match. In this case, the item you're interested in is at index 1 (the first capture group), assuming that the number always comes right after reports/:
var text = "#/reports/12/expense/11";
var id = text.match(/reports\/(\d+)/);
alert(id[1]);
\d+ here means that you're looking for one or more consecutive digits.
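A small sketch of the same capture-group approach with a guard for URLs that do not contain reports/ at all, since match() returns null in that case and reading index 1 would throw. The function name reportIdFrom is illustrative only.

```javascript
// Hypothetical helper: returns the number after "reports/" or null if absent.
function reportIdFrom(url) {
  const match = url.match(/reports\/(\d+)/);
  return match ? Number(match[1]) : null;
}

console.log(reportIdFrom("#/reports/12/expense/11")); // 12
console.log(reportIdFrom("#/expense/11"));            // null
```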
var text = "#/reports/12/expense/11";
var id = text.match("#/[a-zA-Z]*/([0-9]*)/[a-zA-Z]*/")
console.log(id[1])
Regex explanation:
#/ matches the characters #/ literally
[a-zA-Z]* - matches a word
/ matches the character / literally
1st Capturing group - ([0-9]*) - this matches a number.
[a-zA-Z]* - matches a word
/ matches the character / literally
Regular expressions can be tricky (and expensive), so if you can efficiently do the same thing without them, you usually should. Looking at your URL format, you would probably want to put at least a few constraints on it, otherwise the problem becomes very complex. For instance, you probably want to assume that the value always appears directly after its key, so in your sample reports=12 and expense=11, but reports and expense could be switched (e.g. expense/11/reports/12) and you would get the same result.
I would just use string split:
var parts = url.split("/");
for (var i = 0; i < parts.length; i++) {
    if (parts[i] === "reports") {
        this.reportValue = parts[i + 1];
        i += 1; // skip the value; the loop's i++ then moves on to the next key
    } else if (parts[i] === "expense") {
        this.expenseValue = parts[i + 1];
        i += 1;
    }
}
This way your key/value parts can appear anywhere in the array.
Note: you will also want to check that i+1 is within the bounds of the parts array. That would just make this sample code ugly, and it is easy to add. Depending on what values you expect (or don't expect), you might also want to check that the values are numbers using isNaN.
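The bounds and number checks mentioned in the note could be sketched roughly as follows. The function name parseUrlValues and its keys parameter are hypothetical, and the sketch assumes every value of interest is a number that directly follows its key.

```javascript
// Hypothetical helper: reads "/key/number" pairs from a hash-style URL.
function parseUrlValues(url, keys) {
  const parts = url.split("/");
  const values = {};
  // Stop one element early so that parts[i + 1] always exists.
  for (let i = 0; i < parts.length - 1; i++) {
    if (keys.includes(parts[i])) {
      const value = Number(parts[i + 1]);
      // Reject empty segments and anything that is not a number.
      if (parts[i + 1] !== "" && !Number.isNaN(value)) {
        values[parts[i]] = value;
        i += 1; // skip the value we just consumed
      }
    }
  }
  return values;
}

console.log(parseUrlValues("#/reports/12/expense/11", ["reports", "expense"]));
console.log(parseUrlValues("#/expense/11/reports/12", ["reports", "expense"]));
```

Both calls yield the same pairs regardless of the order in which the keys appear in the URL.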

capture with regex in javascript

I have a string like "ListUI_col_order[01234567][5]". I'd like to capture the two numeric sequences from the string. The last part between the square brackets may contain up to 2 digits, while the first numeric sequence always contains 8 digits (and the numbers change dynamically, of course). I'm doing this in JavaScript, and the code for the first part is simple: I get the only 8-digit sequence from the string:
var str = $(this).attr('id');
var unique = str.match(/([0-9]){8}/g);
Getting the second part is a bit complicated to me. I cannot simply use:
var column = str.match(/[0-9]{1,2}/g)
Because this will match '01', '23', '45', '67', '5' in our example. I am able to get the information I need as column[4], because the first part always contains 8 digits, but I'd like a nicer way to retrieve the last number.
So I define the context and tell the regex that I'm looking for a 1- or 2-digit number that has square brackets directly before and after it:
var column = str.match(/\[[0-9]{1,2}\]/g)
// this will return [5]. which is nearly what I want
So to get only the numeric data, I use parentheses to capture only the numbers:
var column = str.match(/\[([0-9]){1,2}\]/g)
// this will result in:
// column[0] = '[5]'
// column[1] = [5]
So my question is: how do I match the '[5]' but capture only the '5'? I have only the [0-9] between the parentheses, but this still captures the square brackets as well.
You can get both numbers in one go:
var m = str.match(/\[(\d{8})\]\[(\d{1,2})\]$/)
For your example, this makes ["[01234567][5]", "01234567", "5"]
To get both matches as numbers, you can then do
if (m) return m.slice(1).map(Number)
which builds [1234567, 5]
When this was written, JavaScript did not support the lookbehind needed to do this (lookbehind assertions were only added to the language in ES2018). In other languages such as PHP, it is as simple as /(?<=\[)\d{1,2}(?=\])/. Without lookbehind, I am not aware of any way to do this in JavaScript other than using a capturing subpattern, as you are here, and reading that index from the result array.
Side-note, it's usually better to put the quantifier inside the capturing group - otherwise you're repeating the group itself, not its contents!
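To illustrate the side-note, here is a short sketch comparing the two placements of the quantifier on a two-digit column. The sample id is invented for the demonstration, and the g flag is left off because match() with g returns only the whole matches and drops the capture groups.

```javascript
const sampleId = "ListUI_col_order[01234567][42]"; // made-up id with a two-digit column

// Quantifier outside the group: the group is repeated and keeps only its last repetition.
const outside = sampleId.match(/\[([0-9]){1,2}\]/);
// Quantifier inside the group: the group captures the whole one- or two-digit run.
const inside = sampleId.match(/\[([0-9]{1,2})\]/);

console.log(outside[0], outside[1]); // [42] 2
console.log(inside[0], inside[1]);   // [42] 42
```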

Javascript regex, determining what group was matched on

I have the following regex in javascript for matching similar to book[n], book[1,2,3,4,5,...,n], book[author="Kristian"] and book[id=n] (n is an arbitrary number):
var opRegex = /\[[0-9]+\]|\[[0-9]+,.*\]|\[[a-zA-Z]+="*.+"*\]/gi;
I can use this in the following way:
// If there is no match in any of the groups hasOp will be null
hasOp = opRegex.exec('books[0]');
/*
Result: ["[0]", index: 5, input: "books[0]"]
*/
As shown above I not only get the value but also the [ and ]. I can avoid this by using groups. So I changed the regex to:
var opRegex = /\[([0-9]+)\]|\[([0-9]+,.*)\]|\[([a-zA-Z]+=".+")\]/gi;
Running the same as above the results will instead be:
["[0]", "0", undefined, undefined, index: 5, input: "books[0]"]
Above I get the groups at index 1, 2 and 3 in the array. For this example the match is in the first group, but if the match is in the second regex group, it will be at index 2 of the array.
Can I change my first regex to get the value without the brackets or do I go with the grouped approach and a while loop to get the first defined value?
Anything else I'm missing? Is it greedy?
Let me know if you need more information and I'll be happy to provide it.
I have a few suggestions. First, especially since you are looking for literal brackets, avoid the regex brackets when you can (replace [0-9] with \d, for example). Also, you were allowing multiple quotes with the *, so I changed it to "?. But most importantly, I moved the match for the brackets outside the alternation, since they should be in every alternate match. That way, you have the same group no matter which part matches.
/\[(\d+(,\d+)*|[a-zA-Z]+="?[^\]]+"?)\]/gi
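A brief sketch of how this single-group pattern behaves on the kinds of input listed in the question. The wrapper function bracketContent is hypothetical, and the g flag is dropped here because exec() on a global regex is stateful through lastIndex.

```javascript
// Hypothetical wrapper: returns whatever sits between the brackets, or null.
function bracketContent(selector) {
  const opRegex = /\[(\d+(,\d+)*|[a-zA-Z]+="?[^\]]+"?)\]/i;
  const match = opRegex.exec(selector);
  return match ? match[1] : null; // group 1 holds the content for every alternative
}

console.log(bracketContent("books[0]"));                // 0
console.log(bracketContent("book[1,2,3,4,5]"));         // 1,2,3,4,5
console.log(bracketContent('book[author="Kristian"]')); // author="Kristian"
```

Because the brackets sit outside the alternation, the caller never has to look for the first defined group.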

Javascript RegExp Matching weirdness

I have a RegExp:
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
and some text "Champion"
somehow, this is coming back as a match, am I crazy?
0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2
the loop is here:
// construct the pattern, dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up being
var result;
while( result = pat.exec(someText) ) {
    // do stuff!
}
There has to be something wrong with my RegExp, right?
EDIT:
The .? thing was just a quick and dirty attempt to say that I'd like to match one of those words AND/OR one of those words with a single char on either side. ex:
\sNCAA\s
NCAA
NCAA\s
\sNCAA
GOAL:
I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.
I think that I just need to rework how I am building my RegExp.
Well, first of all you're specifying case-insensitivity, and secondly, the letter I is one of your matchable strings.
Champion would match pio and i, because they both match /.?I.?/gi
It however doesn't match /.?Champions,.?/gi because of the trailing comma.
Add start (^) and end ($) anchors to the regexp.
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
Without the anchors, the regexp's match can start and end anywhere in the string, which is why
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
can match pio and i: because it's actually matching around the (case-insensitive) I. If you leave the anchors off, but remove the ...|I|..., the regex won't match 'Champion':
> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null
Champion matches /.?I.?/i.
Your own output notes that it's matching the substring "pio".
Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
I know you said to ignore the .?, but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
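For the stated goal of highlighting search terms, one possible approach (a sketch under assumptions, not the only way) is to drop the .? and instead require that a term is not glued to a letter or digit on either side. The helper names escapeRegExp and findTerms are made up for this example, and the definition of a "word character" as A-Z, a-z and 0-9 is an assumption.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the search terms found in text as whole words.
function findTerms(text, terms) {
  const alternatives = terms
    .slice()
    .sort((a, b) => b.length - a.length) // try longer terms first
    .map(escapeRegExp)
    .join("|");
  // Group 1: start of input or a non-alphanumeric character before the term.
  // Group 2: the term itself. The lookahead rejects a letter or digit after it.
  const pattern = new RegExp("(^|[^A-Za-z0-9])(" + alternatives + ")(?![A-Za-z0-9])", "gi");
  const hits = [];
  let result;
  while ((result = pattern.exec(text)) !== null) {
    hits.push(result[2]);
  }
  return hits;
}

const searchTerms = ["NCAA", "Division", "I", "Basketball", "Champions,", "1939-2011"];
console.log(findTerms("Champion", searchTerms)); // []
console.log(findTerms("NCAA Division I Basketball Champions, 1939-2011", searchTerms));
```

With this pattern the i inside Champion is no longer a hit, because it has letters on both sides.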
