JavaScript regular expression: match 2 substrings in any order

After reading how to write a regexp in JavaScript, I'm still pretty confused about how to write this one...
I want to match every string containing at least one occurrence of each of 2 substrings, in any order.
Say sub1 = "foo" and sub2 = "bar"
foo => doesn't match
bar => doesn't match
foobar => matches
barfoo => matches
foohellobar => matches
Could somebody help me with this?
Additionally, I'd like to exclude another substring. The pattern should match the strings containing the 2 substrings as before, but not those containing a sub3, regardless of where it appears relative to the other two.
Thanks a lot

You can use indexOf:
str.indexOf(sub1) > -1 && str.indexOf(sub2) > -1
Or includes in ES6:
str.includes(sub1) && str.includes(sub2)
Or if you have an array of substrings:
[sub1, sub2/*, ...*/].every(sub => str.includes(sub));
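The same approach extends to the exclusion asked about in the question. A minimal sketch, assuming a hypothetical helper name containsAllButNone and that the required and excluded substrings are passed as plain arrays:

```javascript
// Hypothetical helper: true when str contains every required substring
// and none of the excluded ones, regardless of their order.
function containsAllButNone(str, required, excluded = []) {
  return required.every((sub) => str.includes(sub)) &&
    !excluded.some((sub) => str.includes(sub));
}

console.log(containsAllButNone("foohellobar", ["foo", "bar"], ["zoo"])); // true
console.log(containsAllButNone("foobarzoo", ["foo", "bar"], ["zoo"]));   // false
console.log(containsAllButNone("foo", ["foo", "bar"], ["zoo"]));         // false
```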

This will work:
/.*foo.*bar|.*bar.*foo/g
.* matches 0 or more characters (where . matches any character except a line terminator and * stands for 0 or more)
| is the regex alternation (or) operator
Generated code from regex101:
var re = /.*foo.*bar|.*bar.*foo/g;
var str = 'foobar';
var m;
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
// View your result using the m-variable.
// eg m[0] etc.
}
That being said, it is better to use Oriol's answer with indexOf() or includes().

I would not use a complicated regex, but would instead just use the logical operator &&.
var param = 'foobar';
alert(param.match(/foo/) && param.match(/bar/) && !param.match(/zoo/));
param = 'foobarzoo';
alert(param.match(/foo/) && param.match(/bar/) && !param.match(/zoo/));
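If a single regular expression is still required, lookaheads can express the same three conditions. This is only a sketch and not part of the answer above: escapeRegExp and buildMatcher are hypothetical helper names, and [\s\S] is used instead of . so that line breaks in the input do not affect the result.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one regex with a positive lookahead per required substring
// and a negative lookahead per excluded substring, all anchored at the start.
function buildMatcher(required, excluded = []) {
  const must = required.map((s) => `(?=[\\s\\S]*${escapeRegExp(s)})`).join("");
  const mustNot = excluded.map((s) => `(?![\\s\\S]*${escapeRegExp(s)})`).join("");
  return new RegExp(`^${must}${mustNot}`);
}

const matcher = buildMatcher(["foo", "bar"], ["zoo"]);
console.log(matcher.test("barfoo"));    // true
console.log(matcher.test("foo"));       // false
console.log(matcher.test("zoofoobar")); // false
```

Because every lookahead starts at position 0, the order of the substrings in the input does not matter.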

Related

Regex expression to get numbers without parentheses ()

I'm trying to create a regex that selects the numbers (with or without their commas, which I can trim later) that are not followed by parentheses. The numbers inside the parentheses should not be selected either.
It will be used with JavaScript's String.match method.
Example strings
9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
What I have so far:
/((^\d+[^\(])|(,\d+,)|(,*\d+$))/gm
I tried this in regex101, underlining the numbers I would like to match and putting an x on the ones that should not match.
You could start with a substitution that replaces all the unwanted parts with an empty string:
/\d*\(.*?\),?/gm
This leaves you with
5,10
10,2,5,
10,7,2,4
which makes the matching pretty straightforward:
/(\d+)/gm
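The two steps can be combined in a small function. This is a sketch under the assumption that the input looks like the example strings; bareNumbers is a hypothetical name.

```javascript
// Hypothetical helper combining the substitution and the final match.
function bareNumbers(text) {
  // Step 1: drop every "number(...)" group together with its trailing comma.
  const stripped = text.replace(/\d*\(.*?\),?/gm, "");
  // Step 2: collect what is left; match returns null when nothing matches.
  return stripped.match(/\d+/g) || [];
}

const sample = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
console.log(bareNumbers(sample));
// ["5", "10", "10", "2", "5", "10", "7", "2", "4"]
```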
If you want it as a single match expression you could use a negative lookbehind:
/(?<!\([\d,]*)(\d+)(?:,|$)/gm
Here's the same matching expression as runnable JavaScript (skeleton code borrowed from Wiktor's answer):
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/(?<!\([\d,]*)(\d+)(?:,|$)/gm), x=>x[1])
console.log(matches);
Here, I'd recommend the so-called "best regex trick ever": just match what you do not need (negative contexts) and then match and capture what you need, and grab the captured items only.
If you want to match integers that are not part of a \d+\([^()]*\) match (a number followed by a parenthesized substring), match that pattern as the first alternative, capture a plain \d+ (one or more digits) as the second alternative, and then keep only the Group 1 values from the matches:
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/\d+\([^()]*\)|(\d+)/g), x=> x[1] ?? "").filter(Boolean)
console.log(matches);
Details:
text.matchAll(/\d+\([^()]*\)|(\d+)/g) - matches one or more digits (\d+) + ( (with \() + zero or more chars other than ( and ) (with [^()]*) + ) (with \)), or (|) one or more digits captured into Group 1 ((\d+))
Array.from(..., x=> x[1] ?? "") - gets Group 1 value, or, if not assigned, just adds an empty string
.filter(Boolean) - removes empty strings.
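To see why only Group 1 needs to be inspected, it can help to print what each match consumed. The following sketch uses a hypothetical traceMatches helper on the first example line:

```javascript
// Hypothetical helper: list every match and whether Group 1 took part in it.
function traceMatches(text) {
  return Array.from(text.matchAll(/\d+\([^()]*\)|(\d+)/g), (m) => ({
    matched: m[0],             // the full text consumed by this match
    kept: m[1] !== undefined,  // true only when the second alternative matched
  }));
}

console.log(traceMatches("9(296,178),5,3(123),10"));
// [ { matched: "9(296,178)", kept: false },
//   { matched: "5", kept: true },
//   { matched: "3(123)", kept: false },
//   { matched: "10", kept: true } ]
```

The parenthesized groups are consumed by the first alternative, so their digits are never offered to the capturing one.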
Using several replacement regexes
var textA = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`
console.log('A', textA)
var textB = textA.replace(/\(.*?\),?/g, ';')
console.log('B', textB)
var textC = textB.replace(/^\d+|\d+$|\d*;\d*/gm, '')
console.log('C', textC)
var textD = textC.replace(/,+/g, ' ').trim() // trim() takes no arguments and strips surrounding whitespace
console.log('D', textD)
With a loop
Here is a solution which splits the lines on comma and loops over the pieces:
var inside = false;
var result = [];
`9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`.split("\n").map(line => {
let pieceArray = line.split(",")
pieceArray.forEach((piece, k) => {
if (piece.includes('(')) {
inside = true
} else if (piece.includes(')')) {
inside = false
} else if (!inside && k > 0 && k < pieceArray.length-1 && !pieceArray[k-1].includes(')')) {
result.push(piece)
}
})
})
console.log(result)
It does print the expected result: ["5", "7"]

Match all instances of character except the first one, without lookbehind

I’m struggling with this simple regex that is not working correctly in Safari:
(?<=\?.*)\?
It should match each ? except the first one.
I know that lookbehind does not work in Safari yet, so I need a workaround. Any suggestions?
You can use an alternation. The first alternative captures everything up to and including the first question mark in a group; use that group again in the replacement to leave it unmodified.
The second alternative matches a question mark to be replaced.
const regex = /^([^?]*\?)|\?/g;
const s = "test ? test ? test ?? test /";
console.log(s.replace(regex, (m, g1) => g1 ? g1 : "[REPLACE]"));
There are always alternatives to lookbehinds.
In this case, all you need to do is replace all instances of a character (sequence), except the first.
The .replace method accepts a function as the second argument.
That function receives the full match, each capture group match (if any), the offset of the match, and a few other things as parameters.
.indexOf can report the first offset of a match.
Alternatively, .search can also report the first offset of a match, but works with regexes.
The two offsets can be compared inside the function:
const yourString = "Hello? World? What? Who?",
yourReplacement = "!",
pattern = /\?/g,
patternString = "?",
firstMatchOffsetIndexOf = yourString.indexOf(patternString),
firstMatchOffsetSearch = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetIndexOf){
return yourReplacement;
}
return match;
}));
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetSearch){
return yourReplacement;
}
return match;
}));
This works for character sequences, too:
const yourString = "Hello. Hello. Hello. Hello.",
yourReplacement = "Hi",
pattern = /Hello/g,
firstOffset = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstOffset){
return yourReplacement;
}
return match;
}));
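As a variation on the offset comparison, a flag can record whether the first match has already been seen, which avoids the separate indexOf or search call. This is a sketch under the assumption that pattern carries the g flag; replaceAllButFirst is a hypothetical name.

```javascript
// Hypothetical helper: keep the first match, replace every later one.
// The pattern is assumed to be a global regex (g flag).
function replaceAllButFirst(input, pattern, replacement) {
  let seenFirst = false;
  return input.replace(pattern, (match) => {
    if (!seenFirst) {
      seenFirst = true;
      return match; // leave the first occurrence untouched
    }
    return replacement;
  });
}

console.log(replaceAllButFirst("Hello? World? What? Who?", /\?/g, "!"));
// "Hello? World! What! Who!"
console.log(replaceAllButFirst("Hello. Hello. Hello. Hello.", /Hello/g, "Hi"));
// "Hello. Hi. Hi. Hi."
```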
Split and join:
var s = "one ? two ? three ? four"
var l = s.split("?") // Split with ?
var first = l.shift() // Get first item and remove from l
console.log(first + "?" + l.join("<REPLACED>")) // Build the results
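One caveat of the split-and-join snippet is that it appends a separator even when the input contains none. A guarded sketch, with replaceAfterFirst as a hypothetical name:

```javascript
// Hypothetical helper: replace every occurrence of sep except the first one.
function replaceAfterFirst(s, sep, replacement) {
  const parts = s.split(sep);
  if (parts.length < 2) {
    return s; // sep does not occur, nothing to do
  }
  const first = parts.shift();
  return first + sep + parts.join(replacement);
}

console.log(replaceAfterFirst("one ? two ? three ? four", "?", "<REPLACED>"));
// "one ? two <REPLACED> three <REPLACED> four"
console.log(replaceAfterFirst("no separator here", "?", "<REPLACED>"));
// "no separator here"
```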

How to find all appearances of text after a specific word?

I get a string like:
str = "Test/hello/filename/12345678/first
Hddhkhd
Hdhal
filename/1212abcd/second"
I want to get an array of all the strings that come after "filename/", and I know that what follows the "/" is an 8-character word that I want to get.
In this case, I want to get an array that will be:
strArr = ["12345678", "1212abcd"]
How do I solve this problem?
A regex that captures the 8 characters that immediately follow a literal "filename/" (the sample input uses a single slash):
/filename\/(.{8})/g
Try using this regex first:
filename\/\w{8}
and then extract from each result with this regex:
\w{8}$
First you will get:
filename/12345678
filename/1212abcd
Second, you will get:
12345678
1212abcd
You might also use a capturing group that matches exactly 8 characters other than a forward slash or a newline, right after matching filename/
\bfilename\/([^\/\n]{8})
If you want to match 8 or more times you could use {8,} instead or if you want to match 1 or more times you could use a +.
If you don't want to match whitespace characters you could change the \n to \s
const regex = /filename\/([^\/\n]{8})/g;
const str = `Test/hello/filename/12345678/first
Hddhkhd
Hdhal
filename/1212abcd/second`;
let m;
while ((m = regex.exec(str)) !== null) {
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
console.log(m[1]);
}
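Where String.prototype.matchAll is available, the exec loop can be shortened to an expression that returns the array directly. A sketch using the same pattern; extractAfterFilename is a hypothetical name.

```javascript
// Hypothetical helper: collect Group 1 of every match into an array.
function extractAfterFilename(str) {
  return Array.from(str.matchAll(/\bfilename\/([^\/\n]{8})/g), (m) => m[1]);
}

const input = `Test/hello/filename/12345678/first
Hddhkhd
Hdhal
filename/1212abcd/second`;
console.log(extractAfterFilename(input)); // ["12345678", "1212abcd"]
```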
You can use the following code. It will match all characters after the filename/ until it encounters another /. After you get the matches in an array you can map it out and replace all the filename/ with '':
let a = /filename\/[^\/]+/g;
let b = 'Test/hello/filename/12345678/first Hddhkhd Hdhal filename/1212abcd/second';
let c = b.match(a).map(x=>x.replace('filename/',''));
console.log(c);
With a lookbehind and a lookahead, each match contains only the part between filename/ and the next /:
var arr = "Test/hello/filename/12345678/first Hddhkhd Hdhal filename/1212abcd/second".match(/(?<=filename\/)(.*?)(?=\/)/g);
console.log(arr)
OR
For browsers that do not support lookbehind, use Array#map after the regex match:
var arr = "Test/hello/filename/12345678/first Hddhkhd Hdhal filename/1212abcd/second".match(/filename\/(.*?)\//g).map(i=> i.split('/')[1]);
console.log(arr)

Word counter in javascript

I'm working on a lab assignment for a web applications class and am stuck on implementing a word counter for a basic HTML webpage. The setup of the tests and HTML are already done for us. I simply need to write a function called countWords that takes a string and returns the number of words. It works differently from your traditional word counter though. A word is defined as anything A-Z. Everything else is considered not part of a word. So, if the string is just "234##$^" then the word count is 0. So, I'm not just counting white space like most word counters. All the answers I've found on StackOverflow to similar questions try to just count white space and don't work for my situation. Hence why I made a new question.
My idea was to have a return statement that matches any grouping of a-z using a regular expression and return the length. Then, have a conditional to check for the empty string or string with no letters a-z.
function countWords(s) {
if(s === "" || s === "%$#^23#") {
return 0
}
return s.match(/[^a-z]/gi).length
}
Right now the if statement is just matching the two test cases so that I can pass my tests. I'm not sure how to go about writing another match regular expression to check for no letters in the string or the empty string. Any help is appreciated! Been stuck for a while.
const str1 = '%$#^23#';
const str2 = 'String with ___ special characters and #$&# white spaces !!!';
const str3 = 'Special &$%# characters --> and %$#^5# connected,words but our <++##||++> function,still_works!';
const wordCount = (str) => str.replace(/[\W_\d]/g,' ').split(' ').filter(Boolean).length;
console.log(wordCount(str1)); // 0
console.log(wordCount(str2)); // 7
console.log(wordCount(str3)); // 10
Use a regex to replace all special characters, underscores, and digits with a space
--> replace(/[\W_\d]/g,' ')
convert the string into an array
--> .split(' ')
use filter to remove all empty string(s) in the array
--> .filter(Boolean)
then, get the word count with "length"
--> .length
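The steps above can be written as a named function, and the same count can be obtained by matching the letter runs directly. Both functions below are sketches with hypothetical names; the second one relies on match returning null when nothing matches, hence the fallback to an empty array.

```javascript
// The four steps above in one function.
function countWordsBySplit(str) {
  return str.replace(/[\W_\d]/g, " ").split(" ").filter(Boolean).length;
}

// Alternative: count runs of letters directly.
function countWordsByMatch(str) {
  return (str.match(/[a-z]+/gi) || []).length;
}

console.log(countWordsBySplit("%$#^23#")); // 0
console.log(countWordsByMatch("%$#^23#")); // 0
console.log(countWordsByMatch("String with ___ special characters and #$&# white spaces !!!")); // 7
```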
You first need to filter the string, remove all the special characters and numbers:
var filtered_test = my_text.replace(/[^a-zA-Z ]/g, '');
then do a normal split and count:
var words = filtered_test.split(" ").filter(Boolean); // drop the empty strings left by consecutive spaces
console.log(words.length); //prints out the count of words
You can use a functional replace method to chunk all of the "words" into an array, then simply return the array length. This has the added benefit of providing a 0 count:
explanatory version:
function countWords(str, words = []) {
str.replace(/[A-Z]+/gi, (m) => words.push(m));
return words.length;
}
minimal version:
let countWords = (str, words = []) =>
( str.replace(/[A-Z]+/gi, (m) => words.push(m)), words.length );
console.log( "##asdfadf###asfadf: " + countWords("##asdfadf###asfadf") )
console.log("##13424#$#$#$%: " + countWords("##13424#$#$#$%"));
How about this regular expression: /.*?[a-z]+.*?(\s|$)/gi
Use return s.match(/.*?[a-z]+.*?(\s|$)/gi).length
Anything with at least 1 letter in it is counted. Then the phrase O##ne two $#!+ ##%Three four^&&$ five would count as 5 words.
Edit: match returns null when there are 0 matches, so to pass those test cases use (s.match(/.*?[a-z]+.*?(\s|$)/gi) || []).length

How to get first 2 words?

Let data.title be ABC XYZ PQRS - www.aaa.tld.
Output needs to be like this ABC+XYZ
I've tried this:
var t = data.title.split(' ').join('+');
t = t.replace(/(([^\s]+\s\s*){1})(.*)/,"Unknown");
$("#log").text(t);
Here is one way to do it without a regex. It only grabs the first two words, and the words must be separated by a single space.
First we split into an array, then we slice that array from index 0 up to (but not including) index 2, and finally we join the pieces with a '+':
var x = 'ABC XYZ PQRS';
var y = x.split(' ').slice(0,2).join('+');
// y = "ABC+XYZ"
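The same split, slice, and join can be generalised to the first n words and made tolerant of repeated or leading whitespace. A minimal sketch; firstWords and its joiner parameter are hypothetical names.

```javascript
// Hypothetical helper: join the first n whitespace-separated words.
function firstWords(title, n, joiner = "+") {
  return title.trim().split(/\s+/).slice(0, n).join(joiner);
}

console.log(firstWords("ABC XYZ PQRS - www.aaa.tld", 2)); // "ABC+XYZ"
console.log(firstWords("  ABC   XYZ PQRS", 2));           // "ABC+XYZ"
```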
Try using .match() with the RegExp /([\w+]+)/g, then concatenate the first match, a + character, and the second match:
var matches = "ABC XYZ PQRS - www.aaa.tld".match(/([\w+]+)/g);
console.log(matches[0] + "+" + matches[1])
This is my general function for first n words. Haven't tested it extensively but it is fast even on long strings because it doesn't use a global regex or split every word. You can fine tune the regex for dealing with punctuation. I'm considering a hyphen as a delimiter but you can move that to the word portion instead if you prefer.
function regFirstWords(s, n) {
// (?:...) is a non-capturing group for each subsequent separator + word. Change {0,n-1} to {n-1} if you want to require n words instead of allowing fewer
var a = s.match(new RegExp('[\\w\\.]+' + '(?:[\\s-]*[\\w\\.]+){0,' + (n - 1) + '}'));
return (a === undefined || a === null) ? '' : a[0];
}
To satisfy the OP's request to join the words with '+':
regFirstWords('ABC XYZ PQRS - www.aaa.tld',2).replace(/\s/g,'+')
