Regex to get every character that appears between equals sign and text - javascript

I would like to get all the text/numbers that appear after the equals sign such that this input
"Input: m = 2, n = 3, indices = [[0,1],[1,1]]"
Would return this output:
[2,3, [[0,1],[1,1]] ]
This is what I have tried:
eachEx.match(/= (.+)/)[1]
However this returns:
2, n = 3, indices = [[0,1],[1,1]]
I have thought of splitting the string and iterating through each element, passing it through the match I have. However, the problem is that I would lose the ability to know whether the element in question was meant to be a string, an integer, or an array. I need to know this information.

I won't be surprised if you end up needing to write a simple parser for this, rather than a single regex. But the specific example given can be done with a single regex. It's just that I suspect when you throw more examples at it, it'll become too complicated.
If what makes the , a delimiter after the 2 and the 3, but not inside the final value, is that the final value is wrapped in [___], then you can use an alternation to adjust which characters are allowed in the capture:
/= (\[[^=]+\]|[^=,]+)/
That says: if the text starts with [ and ends with ], capture everything between them as long as it contains no =. Otherwise, capture a run of characters that are neither = nor ,.
Then, to get all the matches, add a g flag and use matchAll, then post-process the iterable you get from it to extract the capture groups:
const eachEx = "Input: m = 2, n = 3, indices = [[0,1],[1,1]]";
const match = eachEx.matchAll(/= (\[[^=]+\]|[^=,]+)/g);
console.log(Array.from(match, ([, capture]) => capture));
As an example of a string that would be parsed incorrectly by that, consider "a = [3, [2, ], b = 3", which gives us the array [ "[3, [2, ]", "3" ] when probably it should be an error:
const eachEx = "Input: a = [3, [2, ], b = 3";
const match = eachEx.matchAll(/= (\[[^=]+\]|[^=,]+)/g);
console.log(Array.from(match, ([, capture]) => capture));
Hence the warning above that you may need to write a simple parser instead.
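On the question's other requirement (knowing whether each value was meant to be a number, a string, or an array), one option is to pass each capture through JSON.parse and fall back to the raw text when parsing fails. This is only a sketch and assumes every value is written in JSON-compatible syntax; parseValue is a hypothetical helper name, not something from the question.

```javascript
const eachEx = "Input: m = 2, n = 3, indices = [[0,1],[1,1]]";
const captures = Array.from(
  eachEx.matchAll(/= (\[[^=]+\]|[^=,]+)/g),
  ([, capture]) => capture
);

// Hypothetical helper: assumes values use JSON-compatible syntax.
function parseValue(raw) {
  const text = raw.trim();
  try {
    return JSON.parse(text); // numbers, arrays, quoted strings, true/false/null
  } catch {
    return text; // anything JSON cannot parse stays a plain string
  }
}

const values = captures.map(parseValue);
console.log(values); // [ 2, 3, [ [ 0, 1 ], [ 1, 1 ] ] ]
```

A malformed capture such as "[3, [2, ]" is not valid JSON, so it would come back as a string rather than an array, which gives you a place to detect the error.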

Regex for Strings without Consecutive Letters

Given an array of words, I want to find all letters that never appear doubled (e.g., as ee, aa, ZZ, TT) in any of the words.
I have tried a variety of approaches and am hitting a roadblock in my understanding. This should be with vanilla JavaScript ES6 (no libraries or imports).
Here is a short sample word list I'm using to test:
const sampleArr = [
"BORROW", "BRANCH", "CYST", "DEIFIED", "DIPLOMATIC",
"GEESE", "HAIRCUT", "HYMN", "LEVEL", "MOSQUITO",
"MURDRUM", "NON", "POP", "POWER", "GOD", "THY"
]
And here is the code I came up with. It only returns the matches, but I need the inverse: the letters that do not match.
So, if there are no occurrences of "AA" for instance, the code should add "A" to the return array.
I've tried negative lookahead regex, but couldn't get it to work right.
Here is the code I currently have that gives no errors, but doesn't work right:
wordlist = sampleArr
let joinedWordList = wordlist.join('')
console.log(joinedWordList)
let pattern = /([A-Z])\1+/g
doubleLettersFound = joinedWordList.match(pattern)
let singleDoubleLettersFound = doubleLettersFound.filter(el => el.split('')[0]
// console.log(el)
)
console.log(singleDoubleLettersFound)
This is the result I'm receiving:
[ 'RR', 'MM', 'DD', 'EE', 'PP' ]
Also, if it helps, here is an earlier regex (in context) I was trying:
// Join word as string then process; For each letter, if consecutives found anywhere,
// go to next letter; If no consecutives found, add letter to out.lettersNoConsec
haystack = arr.join('')
out.lettersNonConsec = abc.split('').filter(ltr => haystack.match(RegExp(`(${ltr})\\1`)))
// console.log(letters
The pattern with a back reference is definitely a good idea to identify letters that repeat consecutively, but:
As some letters might not occur at all in any of the strings, you cannot rely only on the strings themselves; you need to iterate over all the letters of the alphabet, which is what you seemed to try in the second attempt.
If you join all words together, you should leave a space or some other punctuation between words, as otherwise you may get false matches: the last letter of a word might be the same as the first letter of the next word.
If you change the regex to use a look-ahead for the second character, it will match only the first character of a repeated sequence of a letter. That single letter is easier to work with.
Here is a possible solution:
const sampleArr = [
"BORROW", "BRANCH", "CYST", "DEIFIED", "DIPLOMATIC",
"GEESE", "HAIRCUT", "HYMN", "LEVEL", "MOSQUITO",
"MURDRUM", "NON", "POP", "POWER", "GOD", "THY"
];
const allwords = sampleArr.join(" ").toUpperCase();
const paired = new Set(allwords.match(/([A-Z])(?=\1)/g));
const result = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
.filter(ch => !paired.has(ch));
console.log(...result);
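To illustrate the point above about separators, here is a small sketch with a shortened word list (picked for illustration, taken from the sample) showing how joining without a separator produces a false match at a word boundary:

```javascript
// Small word list chosen so that a false match appears at a word boundary.
const words = ["POP", "POWER", "GEESE"];
const doubled = /([A-Z])(?=\1)/g;

// Joined with no separator, "POP" + "POWER" produces a spurious "PP".
const withoutSeparator = words.join("").match(doubled);
// Joined with a space, only the genuine "EE" in "GEESE" is reported.
const withSeparator = words.join(" ").match(doubled);

console.log(withoutSeparator); // [ 'P', 'E' ]
console.log(withSeparator);    // [ 'E' ]
```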

RegExp object won't execute more than once, why?

I stored a RegExp object in a variable and used it for mapping an array of strings into an array of objects (parsed e-mail recipients), but it doesn't work, as if a RegExp object couldn't run its .exec() method more than once.
However, if I use a regular expression literal instead of the stored object, it works as intended.
I cannot understand the reason behind this behavior. Is it expected, or could it be a bug?
The code:
const pattern = /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+#[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi;
const input = "John Doe jdoe#acme.com; Ronald Roe <rroe#acme.com>";
const splitValues = input.split(/[\r\n,;]+/).map(s => s.trim()).filter(s => !!s);
const matchGroups1 = splitValues.map(s => pattern.exec(s));
console.log('Using pattern RegExp object:', JSON.stringify(matchGroups1, null, 2));
const matchGroups2 = splitValues.map(s => /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+#[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi.exec(s));
console.log('Using literal regular expression:', JSON.stringify(matchGroups2, null, 2));
The output:
[LOG]: "Using pattern RegExp object:", "[
[
"John Doe jdoe#acme.com",
"John Doe",
"jdoe#acme.com"
],
null
]"
[LOG]: "Using literal regular expression:", "[
[
"John Doe jdoe#acme.com",
"John Doe",
"jdoe#acme.com"
],
[
"Ronald Roe <rroe#acme.com>",
"Ronald Roe",
"rroe#acme.com"
]
]"
The difference lies in the /g flag that you've passed to both regexes. From MDN:
RegExp.prototype.exec() method with the g flag returns each match and its position iteratively.
const str = 'fee fi fo fum';
const re = /\w+\s/g;
console.log(re.exec(str)); // ["fee ", index: 0, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fi ", index: 4, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fo ", index: 7, input: "fee fi fo fum"]
console.log(re.exec(str)); // null
So /g on a regex turns the regex object itself into a funny sort of mutable state-tracker. When you call exec on a /g regex, you're matching and also updating that regex's lastIndex property, which remembers where it left off for next time. The intention is that if you match against the same string again, you won't get the same match twice, which lets you walk through all matches with a while loop, much as you would write a global regex match in Perl.
But since you're matching on two different strings, it causes problems. Let's look at a simplified example.
const re = /a/g;
re.exec("ab"); // Fine, we match against "a"
re.exec("ba"); // We start looking at the second character, so we match the "a" there.
re.exec("ab"); // We start looking at the third character, so we get *no* match.
Whereas in the case where you produce the regex every time, you never see this statefulness, since the regex object is made anew each time.
So the summary is: Don't use /g if you're planning to reuse the regex against multiple strings.
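As a quick check of that summary, here is the simplified example again without the g flag. It is only a sketch; the input strings are the same ones used above.

```javascript
// Same pattern as the simplified example, but without the g flag.
const re = /a/;
const inputs = ["ab", "ba", "ab"];

// A non-global, non-sticky regex ignores lastIndex, so every call starts at index 0.
const indices = inputs.map((s) => {
  const m = re.exec(s);
  return m ? m.index : null;
});

console.log(indices);      // [ 0, 1, 0 ]
console.log(re.lastIndex); // 0
```

All three calls now match, because no state carries over from one string to the next.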
See "Why does Javascript's regex.exec() not always return the same value?". The issue is that exec is stateful: in other words, it starts the next search after the index of the last one. You can avoid the issue by including pattern.lastIndex = 0; in the map; or else by using a literal as you suggest; or, probably better, by removing the global (/g) flag on the regular expression, as per the other answer here.
const pattern = /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+#[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi;
const input = "John Doe jdoe#acme.com; Ronald Roe <rroe#acme.com>";
const splitValues = input.split(/[\r\n,;]+/).map(s => s.trim()).filter(s => !!s);
const matchGroups1 = splitValues.map(s => {pattern.lastIndex = 0; return pattern.exec(s)});
console.log('Using pattern RegExp object:', JSON.stringify(matchGroups1, null, 2));
const matchGroups2 = splitValues.map(s => /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+#[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi.exec(s));
console.log('Using literal regular expression:', JSON.stringify(matchGroups2, null, 2));

Finding The location of items in a number array inside a string JavaScript

I am working on a Chat bot for discord that has an addition calculator.... I am currently trying to find the .indexOf the first time a number appears in the string... For ex: !add 1 + 1 would be the command to add 1 and 1... I have an array that I use that contains all single numbers ex:
const singleNumbers = [
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
];
when I get the string back I am using
for (const num of singleNumbers){
const num1 = msg.content.indexOf(num,);
const add = '+';
const locationOfAdd = msg.content.indexOf(add,);
const num2 = msg.content.indexOf(num,locationOfAdd);
const add1 = msg.content.slice(num1,) * 1;
const add2 = msg.content.slice(num2,) * 1;
msg.reply(add1 + add2);
}
When I run this... It for some reason will only use the first number of the Array so the numbers I use in !add 1 + 1 have to start with 0... so !add 01 + 01 which in math is fine... but for simplicity how do I make it be able to start with any number in the array rather than the first... If you don't know discord.js,
msg.content
Is the string returned so if I type in chat...
Hey Guys what's goin on?
it would return as a String ("Hey Guys what's goin on?")...
To sum this all up... I am wondering how to make my const num that I declared in my for of loop check for all the numbers in its array rather than just the first in my case 0.
If you just want to find the index of the first digit, use str.search(/[0-9]/) or str.match(/[0-9]/).index.
You can also use a regex to extract the numbers, and reduce to add them:
The pattern matches: the string starts with !add, then at least one space, at least one digit as a capture group, optional spaces around a + sign, and another series of digits as a capture group.
match returns an array of [full match, digit group 1, digit group 2]. I slice off the full match, then use reduce to add each number to a running sum (+x converts the string to a number; you could also use Number(x)).
const sum = msg.content.match(/^!add\s+(\d+)\s*\+\s*(\d+)/).slice(1).reduce((sum,x)=>+x+sum,0)
Note: \d is the same as [0-9]
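One caveat with the one-liner: match returns null when the message is not an !add command, so calling .slice on the result would throw. Below is a hedged sketch of a null-safe variant written as a plain function. parseAddCommand is a hypothetical name, and its content parameter stands in for msg.content.

```javascript
// Hypothetical helper: returns the sum, or null when the text is not an "!add a + b" command.
function parseAddCommand(content) {
  const match = content.match(/^!add\s+(\d+)\s*\+\s*(\d+)/);
  if (match === null) return null; // no match: nothing to add
  const [, left, right] = match;   // skip the full match, keep the two digit groups
  return Number(left) + Number(right);
}

console.log(parseAddCommand("!add 1 + 1")); // 2
console.log(parseAddCommand("!add 12+30")); // 42
console.log(parseAddCommand("Hey Guys"));   // null
console.log("!add 1 + 1".search(/[0-9]/));  // 5 (index of the first digit)
```

In the bot you would then reply only when the result is not null.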
JavaScript.info RegExp tutorial

Regex to match all combinations of a given string

I am trying to make a regex that matches all the combinations of a given string. For example, if the string is "1234", answers would include:
"1"
"123"
"4321"
"4312"
Nonexamples would include:
"11"
"11234"
"44132"
If it matters, the programming language I am using is javascript.
Thank you for any help.
You may use these lookahead-based assertions in your regex:
^(?!(?:[^1]*1){2})(?!(?:[^2]*2){2})(?!(?:[^3]*3){2})(?!(?:[^4]*4){2})[1234]+$
Here we have 4 lookahead assertions:
(?!(?:[^1]*1){2}): Assert that we don't have more than one instance of 1
(?!(?:[^2]*2){2}): Assert that we don't have more than one instance of 2
(?!(?:[^3]*3){2}): Assert that we don't have more than one instance of 3
(?!(?:[^4]*4){2}): Assert that we don't have more than one instance of 4
We use [1234]+ to match any string with these 4 characters.
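If the source string can change, the same pattern can be generated instead of written by hand. The sketch below is one way to do that; buildNoRepeatRegex is a hypothetical helper, and it assumes the characters have no special meaning inside a regex (digits and letters are fine), since it does no escaping.

```javascript
// Hypothetical helper; assumes `chars` contains only characters with no special regex meaning.
function buildNoRepeatRegex(chars) {
  const unique = [...new Set(chars)];
  // One negative lookahead per character: reject input that contains that character twice.
  const lookaheads = unique.map((ch) => `(?!(?:[^${ch}]*${ch}){2})`).join("");
  return new RegExp(`^${lookaheads}[${unique.join("")}]+$`);
}

const regex = buildNoRepeatRegex("1234");
const examples = ["1", "123", "4321", "4312"];
const nonexamples = ["11", "11234", "44132"];

console.log(examples.map((s) => regex.test(s)));    // [ true, true, true, true ]
console.log(nonexamples.map((s) => regex.test(s))); // [ false, false, false ]
```

For "1234" this builds exactly the expression shown above, and it agrees with the examples and nonexamples from the question.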
A combination of group captures using character classes and negative look-ahead assertions using back-references would do the trick.
Let's begin by simply matching any combination of 1, 2, 3, and 4 using a character class, [1-4], and allowing any length from 1 to 4 characters, {1,4}.
const regex = /^[1-4]{1,4}$/;
// Create set of inputs from 0 to 4322
const inputs = Array.from(new Array(4323), (v, i) => i.toString());
// Output only values that match criteria
console.log(inputs.filter((input) => regex.test(input)));
When that code is run, it's easy to see that although only numbers consisting of some combination of 1, 2, 3, and 4 are matched, it also is matching numbers with repeating combinations (e.g. 11, 22, 33, 112, etc). Obviously, this was not what was desired.
Preventing repeated characters requires a reference to the previously matched characters and then excluding them from any characters matched afterwards. Negative look-aheads, (?!...), combined with back-references, \1 through \9, can accomplish this.
Building on the previous example with a subset of the inputs (limiting to a max length of two characters for the moment) would now incorporate a group match surrounding the first character, ([1-4]), followed by a negative look-ahead with a back-reference to the first capture, (?!\1), and finally a second optional character class.
const regex = /^([1-4])(?!\1)[1-4]?$/;
// Create set of inputs from 0 to 44
const inputs = Array.from(new Array(45), (v, i) => i.toString());
// Output only values that match criteria
console.log(inputs.filter((input) => regex.test(input)));
This matches the desired characters with no repetition!
Expanding this pattern to include back-references for each of the previously matched characters up to the desired max length of 4 yields the following expression.
const regex = /^([1-4])((?!\1)[1-4])?((?!\1|\2)[1-4])?((?!\1|\2|\3)[1-4])?$/;
// Create set of inputs from 0 to 4322
const inputs = Array.from(new Array(4323), (v, i) => i.toString());
// Output only values that match criteria
console.log(inputs.filter((input) => regex.test(input)));
Hope this helps!
You don't need to use regex for this. The snippet below does the following:
Loop over possible combinations (a => s) (1, 123, 4321, etc.)
Copy the current combination so as not to overwrite it (s2 = s)
Loop over the characters of test string (x => ch) (1234 => 1, 2, 3, 4)
Replace common characters in the combination string shared with the test string (s2.replace)
For example in the combination 1, the 1 will be replaced when the loop gets to the character 1 in 1234 resulting in an empty string
If the combination string's length reaches 0 (s2.length == 0) write the result to the console and break out of the loop (no point in continuing to attempt to replace on an empty string)
const x = "1234"
const a = ["1","123","4321","4312","11","11234","44132"]
a.forEach(function(s) {
var s2 = s
for(var ch of x) {
s2 = s2.replace(ch, '')
if(s2.length == 0) {
console.log(s);
break;
}
}
})
Results:
1
123
4321
4312

Is there a way to get which group is the one that matched in a regex. Prefer Javascript but Python OK too.

Is there a way to get which group matches a regex?
for example (using js but could be re-expressed in Python etc):
`
myRegex = /(this)|(that)|(other)/ig //describes 3 group captures
// The groups number (this==0, that==1, other==2) --> I'd like to get the capturing group number returned in the matches.
myString = "this is awesome cause that is the other thing this needs"
//result desired
getMyMatchesByCaptureIndex(myRegex,myString) // function I'd like help with
//returns a result like this...
[
//[indexOfGroup, indexOfBeginningOfMatch, lengthOfMatch]...
[0, 0, 4], //a group 0 match at position 0 length of 4
[1, 22, 4], //a group 1 match at position 22 length of 4
[2, 34, 5], //a group 2 match at position 34 length of 5
[0, 46, 4] //a group 0 match at position 46 length of 4
]
`
Prefer Javascript or Python if possible and allow more than 9 groups. My simplified example could include more complex regex constructs inside the parenthesis but I used strings just for simplicity of asking. (e.g. instead of (this) could be (t[0-9][A-F]...) etc) so I'd really like this to be a regex or parser centric answer if possible!
Thanks much!
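We can use RegExp.exec and RegExp.lastIndex to get all the information in one go. The regex must have the g flag; without it, exec never advances and the loop does not terminate:
function matchesByCaptureIndex(regex, text) {
  var match = null, result = [];
  // With the g flag, each exec call resumes from regex.lastIndex.
  while (match = regex.exec(text)) {
    // Index of the first capture group whose text equals the whole match.
    var matchedGroup = match.slice(1).indexOf(match[0]);
    var matchLength = match[0].length;
    // lastIndex points just past the match, so subtract the length to get the start.
    var startIndex = regex.lastIndex - matchLength;
    result.push([matchedGroup, startIndex, matchLength]);
  }
  return result;
}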
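A variant of the same idea, sketched here with String.prototype.matchAll, picks the first group that actually participated in the match instead of comparing group text with the full match. That may matter if the alternatives contain text outside their capture groups. matchesByGroup is a hypothetical name, and the regex still needs the g flag (matchAll throws a TypeError without it).

```javascript
// Hypothetical variant: reports [groupIndex, startIndex, matchLength] for every match.
function matchesByGroup(regex, text) {
  return Array.from(text.matchAll(regex), (m) => [
    m.slice(1).findIndex((group) => group !== undefined), // first group that participated
    m.index,                                              // start of the match
    m[0].length,                                          // length of the match
  ]);
}

const myRegex = /(this)|(that)|(other)/gi;
const myString = "this is awesome cause that is the other thing this needs";
const result = matchesByGroup(myRegex, myString);

console.log(result);
// [ [ 0, 0, 4 ], [ 1, 22, 4 ], [ 2, 34, 5 ], [ 0, 46, 4 ] ]
```

Because the group index comes from the match array, this is not limited to nine groups.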
