Lazy match front part in JavaScript regex

I have a string
Steve Jobs steve.jobs#example.com somethingElse
I want to match " steve.jobs#example.com somethingElse" (with a space in front).
This is my regular expression (JavaScript):
\s.+?#.+
But it matches " Jobs steve.jobs#example.com somethingElse" instead (also with a space in front).
I know I can use ? to make the part after it lazy, but how do I lazy match the front part?

A . matches any character, including whitespace.
Normally e-mail addresses don't contain whitespace
(although it is technically allowed between two double quotes).
So you could change the regex so that it looks for non-whitespace characters \S before and after the #.
It can be greedy, because \S+ cannot run past a space.
Match a whitespace character, then 1 or more non-whitespace characters, a #, and 1 or more non-whitespace characters, optionally followed by whitespace and something else:
\s(\S+#\S+)(?:\s+(\S+))?
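A quick way to check the pattern is to run it against the question's string. This is a small sketch; printing the full match and the two capture groups separately is only for illustration.

```javascript
// Run the suggested pattern against the string from the question.
const pattern = /\s(\S+#\S+)(?:\s+(\S+))?/;
const input = "Steve Jobs steve.jobs#example.com somethingElse";

const m = input.match(pattern);
console.log(JSON.stringify(m[0])); // " steve.jobs#example.com somethingElse" (leading space included)
console.log(m[1]); // steve.jobs#example.com (group 1: the address)
console.log(m[2]); // somethingElse (group 2: the optional trailing part)
```

The match cannot start before "Jobs": \S+ stops at the space after "Jobs", and no # follows it, so the first position that works is the space right before the address.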

You can also use trim, split and pop
var output = "Steve Jobs steve.jobs#example.com ".trim().split(" ").pop();
Regex solution
You can use trim and match
var output = "Steve Jobs steve.jobs#example.com ".trim().match( /[\w.]+#[\w.]+/g )
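Regex: /[\w.]+#[\w.]+$/gi (with the $ anchor it only matches when the address is at the end of the trimmed string)
Edit: without the $ anchor, the pattern also finds the address when something follows it: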
var output = "Steve Jobs steve.jobs#example.com somethingelse ".trim().match( /[\w.]+#[\w.]+/g )
Demo
var regex = /[\w.]+#[\w.]+/g;
var input1 = "Steve Jobs steve.jobs#example.com ";
var input2 = "Steve Jobs steve.jobs#example.com somethingelse ";
var fn = (str) => str.trim().match(regex);
console.log( fn(input1) );
console.log( fn(input2) );

A simplified set of characters allowed in an email address is:
0-9 | a-z (and A-Z) | . - _
And it must have a # symbol too.
So our regex must start with one or more of the allowed characters:
[a-zA-Z0-9._-]
It must continue with the # symbol:
[a-zA-Z0-9._-]+#
Then it ends with the domain part, made of letters, digits and dots, such as example.com:
[a-zA-Z0-9._-]+#[a-zA-Z0-9.]+
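A sketch of the pattern in use on the question's string. It uses A-Z rather than A-z, because the range A-z (character codes 65 to 122) also covers the punctuation characters [ \ ] ^ _ and the backtick. Note that this approach only extracts the address, not the trailing somethingElse.

```javascript
// The answer's character classes, with A-z corrected to A-Z and "-" moved to the end.
const emailLike = /[a-zA-Z0-9._-]+#[a-zA-Z0-9.]+/;
const input = "Steve Jobs steve.jobs#example.com somethingElse";
console.log(input.match(emailLike)[0]); // steve.jobs#example.com

// Why A-z is a bug: the range also includes characters between Z and a.
console.log(/^[A-z]$/.test("^")); // true
console.log(/^[A-Z]$/.test("^")); // false
```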

Related

Regex to extract search terms is not working as expected

I have the test string
ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY
and want the output as
["ti: harry Potter OR kw:", "kw: magic AND sprint:", "sprint: title OR ti:", "ti: HARRY"]
but the output I am getting is
["ti: harry Potter OR kw:", "kw: magic AND sprint:", "nt: title OR ti:", "ti: HARRY"]
It only takes the 2 characters before the colon, so sprint: becomes nt:.
The regex I am using is
const match = /[a-z0-9]{2}:.*?($|[a-z0-9]{2}:)/g;
and I extract each match and put it in an array.
I tried replacing it with /[a-z0-9]+:.*?($|[a-z0-9]+:)/g, but when I increase the index and add the strings to parsed, it behaves strangely (this is included in the code as well).
I also tried changing the {2} to n, and that does not work as expected either.
const parsed = [];
const match = /[a-z0-9]{2}:.*?($|[a-z0-9]{2}:)/g;
const message = "ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY";
let next = match.exec(message);
while (next) {
  parsed.push(next[0]);
  match.lastIndex = next.index + 1;
  next = match.exec(message);
  console.log("next again", next);
}
console.log("parsed", parsed);
https://codesandbox.io/s/regex-forked-6op514?file=/src/index.js

For the desired matches, you might use a pattern that also matches the AND or OR before the next term, and get the match from capture group 1, which is m[1] in the example code.
\b(?=([a-z0-9]+:.*?(?: (?:AND|OR) [a-z0-9]+:|$)))
In parts, the pattern matches:
\b A word boundary to prevent a partial match
(?= Positive lookahead to assert what is on the right is
( Capture group 1
[a-z0-9]+: Match 1+ chars a-z0-9 followed by :
.*? Match any char except a newline, as few times as possible
(?: Non capture group
(?:AND|OR) [a-z0-9]+: Together with the space in front of it, match a space, either AND or OR, another space, then 1+ chars a-z0-9 followed by :
| Or
$ Assert the end of the string
) Close non capture group
) Close group 1
) Close the lookahead
const regex = /\b(?=([a-z0-9]+:.*?(?: (?:AND|OR) [a-z0-9]+:|$)))/gm;
const str = `ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY`;
const result = Array.from(str.matchAll(regex), m => m[1]);
console.log(result);
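The key point is that the whole pattern sits inside a lookahead, so every match is zero-width and consumes no text. That is what lets the captured terms overlap (kw: ends one term and starts the next) without adjusting lastIndex by hand, as the question's loop does. A small sketch that prints each match's position makes this visible:

```javascript
const regex = /\b(?=([a-z0-9]+:.*?(?: (?:AND|OR) [a-z0-9]+:|$)))/gm;
const str = "ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY";
const positions = [];
for (const m of str.matchAll(regex)) {
  // m[0] is always "" because the lookahead consumes no characters;
  // m[1] holds the captured term, which may overlap the next one.
  console.log(m.index, JSON.stringify(m[0]), m[1]);
  positions.push(m.index);
}
console.log(positions); // [ 0, 20, 34, 51 ]
```

After an empty match, matchAll() advances the search by one character on its own, so the loop cannot get stuck at the same position.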

Regex get string between ()

I have the text
var text = '(hello) world this is (hi) text';
I want to write a regex function so that
parseText(text) // returns ['hello', 'hi']
I tried this, but it does not work:
'(hello) world this is (hi) text'.match('((.*?))')
Thanks for your help

You can try:
/\([^\)]+\)/g
\(: escaped ( character
[^\)]+: one or more characters (including symbols) other than )
\): escaped ) character
g flag: find all matches
const regex = /\([^\)]+\)/g;
const str = `(hello) world this is (hi) text`;
console.log(
str.match(regex) // returns an array of strings, e.g. ['(hello)', '(hi)']
.map(i => i.slice(1, -1)) // remove first and last char
);
TIPS:
About point #2, you can change it to [^\)]* to match zero or more characters (this also matches empty parentheses).
If you only need word characters, you can use \w+ or \w* instead.
If you need only words you can use /\(\b\w+\b\)/g
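The same idea can be wrapped in the parseText function from the question. This sketch uses a capture group with matchAll() instead of slicing the results, and [^)]* instead of [^)]+, so empty parentheses yield an empty string. Unlike match(), matchAll() never returns null, so a string without parentheses gives an empty array.

```javascript
// Sketch of the parseText() function the question asks for.
function parseText(text) {
  // \( and \) match the literal parentheses; ([^)]*) captures what is between them.
  const regex = /\(([^)]*)\)/g;
  // matchAll() requires the g flag; m[1] is the captured inner text.
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

console.log(parseText("(hello) world this is (hi) text")); // [ 'hello', 'hi' ]
console.log(parseText("no parentheses here")); // []
```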

Apart from using groups or postprocessing of the match results, you can use single regex match using lookahead / lookbehind:
var text = " (hello) world this is (hi) text"
var output = text.match(/(?<=\().*?(?=\))/g)
console.log(output)
output:
[ 'hello', 'hi' ]
Explanation:
(?<=...) ... positive lookbehind: the match is preceded by ..., but the ... is not included in the match (lookbehind requires an ES2018 or later engine)
(?<=\() ... positive lookbehind for ( character
.* ... zero or more times of any character
.*? ... nongreedy version of .*
(?=...) ... positive lookahead, the match is followed by ... but the ... is not included in the match
(?=\)) ... positive lookahead for ) character
/.../g ... g is the global flag: match finds all occurrences, not only the first
Do not forget to escape "special characters", e.g. parentheses

'(hello) world this is (hi) text'.match(/\([\w]*\)/g)
This returns [ "(hello)", "(hi)" ], and you can then run another function to remove the extra parentheses:
const text = '(hello) world this is (hi) text';
const list = text.match(/\([\w]*\)/g);
const parsed = list.map(item => item.replace(/\(|\)/g, ''));
console.log(parsed);

Javascript regex to always find a match from the end?

I have a string which looks like below
str = "hey there = pola"
I need to check whether there is an equals sign (=) and get the first word to the left of it. This is what I do:
str.match(/\w+(?= *=)/)[0]
This gives me the desired result.
But say I have a string like this
str = "hey there= pola so = boba"
Now I have two = signs. But the above regex will only give me the result for the first = sign.
Is there a regex that always finds the last = in the string, i.e. the first one when searching from the end?

You can assert that what is on the right is an equals sign, followed by any characters except an equals sign until the end of the string:
\w+(?= *=[^=]*$)
In parts:
\w+ Match 1+ word characters
(?= Positive lookahead
*= Preceded by a space in the pattern ( *=): match 0+ spaces followed by =
[^=]* Match 0+ occurrences of any char except = (use [^=\r\n]* to not cross line breaks)
$ End of string
) Close lookahead
const regex = /\w+(?= *=[^=]*$)/;
const str = `hey there= pola so = boba`;
console.log(str.match(regex)[0]);
Without using a lookahead, you could use a capturing group:
^.*\b(\w+) *=[^=]*$
const regex = /^.*\b(\w+) *=[^=]*$/m;
const str = `hey there= pola so = boba`;
console.log(str.match(regex)[1]);
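Note that match() returns null when the string contains no = at all, so str.match(regex)[0] would throw a TypeError. A small null-safe wrapper around the lookahead pattern is sketched below; lastWordBeforeEquals is a hypothetical helper name, not part of the original answer.

```javascript
// Hypothetical helper: the word directly before the last "=", or null.
function lastWordBeforeEquals(str) {
  const m = str.match(/\w+(?= *=[^=]*$)/);
  return m ? m[0] : null; // match() returns null when nothing matches
}

console.log(lastWordBeforeEquals("hey there = pola")); // there
console.log(lastWordBeforeEquals("hey there= pola so = boba")); // so
console.log(lastWordBeforeEquals("no equals sign")); // null
```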

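I'm not much of a regex expert, but for your requirement I think split and pop should work:
let str = "hey there= pola so = boba";
let parts = str.split('=');
let endres = parts.pop(); // " boba": the text after the last "="
let word = parts.pop().trim().split(' ').pop(); // "so": the last word before the last "=" (assumes the string contains an "=")
Hope this helps.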

Allow only certain character in string. Javascript

I have no idea why this simple code is not working. I want to check a string against an allowed pattern.
The string should ONLY contain a-z, A-Z, 0-9, _ (underscore), . (dot) and - (hyphen).
Here is the code:
var profileIDPattern = /[a-zA-Z0-9_.-]./;
var str = 'Heman%t';
console.log('hemant',profileIDPattern.test(str));
The code logs true for the strings below, although they do NOT match the pattern:
'Heman%t' -> true
'#Hemant$' -> true
I don't know what the problem is.

Try changing it to this RegExp (/^[a-zA-Z0-9_.-]*$/):
var profileIDPattern = /^[a-zA-Z0-9_.-]*$/;
var str1 = 'Hemant-._67%'
var str2 = 'Hemant-._67';
console.log('hemant1',profileIDPattern.test(str1));
console.log('hemant2',profileIDPattern.test(str2));

Issue: [a-zA-Z0-9_.-] matches a single character from inside [], and the . after it matches any character, so without anchors the test succeeds as soon as one listed character followed by any other character appears anywhere in the string (in 'Heman%t', "He" is enough).
Use the ^ and $ anchors to mark the start and end of the string, and remove the trailing .:
^[a-zA-Z0-9_.-]+ : the string starts with one or more of the characters inside []
[a-zA-Z0-9_.-]+$ : and consists only of them until $, the end of the string
var profileIDPattern = /^[a-zA-Z0-9_.-]+$/;
console.log('hemant', profileIDPattern.test('Heman%t')); // false: % is not allowed
console.log('hemant-._', profileIDPattern.test('hemant-._')); // true: valid
console.log('empty', profileIDPattern.test('')); // false: + requires at least one character
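The difference between the question's pattern and the two anchored versions is easiest to see side by side. A sketch that tests the question's sample strings plus the empty string against all three:

```javascript
const unanchored = /[a-zA-Z0-9_.-]./; // from the question: no anchors
const anchoredStar = /^[a-zA-Z0-9_.-]*$/; // first answer: also accepts ""
const anchoredPlus = /^[a-zA-Z0-9_.-]+$/; // second answer: needs 1+ chars

for (const s of ["Heman%t", "#Hemant$", "Hemant-._67", ""]) {
  console.log(JSON.stringify(s), unanchored.test(s), anchoredStar.test(s), anchoredPlus.test(s));
}
// "Heman%t" true false false
// "#Hemant$" true false false
// "Hemant-._67" true true true
// "" false true false
```

Which anchored version to use depends on whether an empty profile ID should be accepted.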

Multiple nested matches in JavaScript Regular Expression

I'm trying to write a regular expression to match GS1 barcodes ( https://en.wikipedia.org/wiki/GS1-128 ) that contain 2 or more patterns, each consisting of an identifier followed by a certain number of characters of data.
I need something that matches this barcode because it contains 2 of the identifier and data patterns:
human readable with the identifiers in parens: (01)12345678901234(17)501200
actual data: 011234567890123417501200
but it should not match this barcode, which contains only one pattern:
human readable: (01)12345678901234
actual data: 0112345678901234
It seems like the following should work:
var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})){2,}/g;
var str = "011234567890123417501200";
console.log(str.replace(regex, "$4"));
// matches 501200
console.log(str.replace(regex, "$1"));
// no match? why?
For some strange reason, as soon as I remove the {2,} it works, but I need the {2,} so that it only returns matches if there is more than one match.
// Remove {2,} and it will return the first match
var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))/g;
var str = "011234567890123417501200";
console.log(str.replace(regex, "$4"));
// matches 501200
console.log(str.replace(regex, "$1"));
// matches 12345678901234
// but then the problem is it would also match single identifiers such as
var str2 = "0112345678901234";
console.log(str2.replace(regex, "$1"));
How do I make this work so it will only match and pull the data if there is more than 1 set of match groups?
Thanks!

Your regex is logically and syntactically correct for Perl-Compatible Regular Expressions (PCRE). The issue you are facing is how JavaScript handles capture groups inside a repeated group: at the start of each repetition, the groups inside it are reset, so once you add the {2,} quantifier, only the groups that matched in the last repetition keep a value. This is why the regex works fine once you take out the {2,}.
What I would recommend is removing the {2,} quantifier and then checking the number of matches programmatically. I know it's not ideal for big fans of regex, but c'est la vie.
Please see the snippet below:
var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))/g;
var str = "011234567890123417501200";
// Check to see if we have at least 2 matches.
var m = str.match(regex) || []; // match() returns null when nothing matches
console.log("Matches list: " + JSON.stringify(m));
if (m.length < 2) {
  console.log("We only received " + m.length + " matches.");
} else {
  console.log("We received " + m.length + " matches.");
  console.log("We have achieved the minimum!");
}
// If we exec the regex, what would we get?
console.log("** Method 1 **");
var n;
while ((n = regex.exec(str))) {
  console.log(JSON.stringify(n));
}
// That's not going to work. Let's try using a second regex.
console.log("** Method 2 **");
var regex2 = /^(\d{2})(\d{6,})$/;
var arr = [];
var obj = {};
for (var i = 0, len = m.length; i < len; i++) {
  arr = m[i].match(regex2);
  obj[arr[1]] = arr[2];
}
console.log(JSON.stringify(obj));
// EOF
I hope this helps.

The reason is that a capture group only gives the last match of that particular group. Imagine your sequence contained two barcodes with the same identifier 01: it is clear that $1 cannot refer to both at the same time. The capture group only retains the second occurrence.
A straightforward, though not very elegant, way is to drop the {2,} and instead repeat the whole regular expression pattern to match the second barcode sequence. I think you also need the ^ (start of string anchor) to be sure the match is at the start of the string; otherwise you might pick up an identifier halfway through an invalid sequence. After the repeated pattern you should also add .* if you want to ignore anything that follows the second sequence, so that it does not end up in the result of replace.
Finally, as you don't know which identifier will be found for the first and second match, you need to reproduce $1$2$3$4 in your replace, knowing that only one of those four will be a non-empty string. Same for the second match: $5$6$7$8.
Here is the improved code applied to your example string:
var regex = /^(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6}))(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})).*/;
var str = "011234567890123417501200";
console.log(str.replace(regex, "$1$2$3$4")); // 12345678901234
console.log(str.replace(regex, "$5$6$7$8")); // 501200
If you need to also match the barcodes that follow the second, then you cannot escape from writing a loop. You cannot do that with just a regular expression based replace.
With a loop
If a loop is allowed, then you can use the RegExp exec method. I would then suggest adding a kind of "catch all" alternative to your regular expression, which matches one character if none of the identifiers match. If the loop detects such a "catch all" match, it exits:
var str = "011234567890123417501200";
var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})|(.))/g;
// groups: 1 = data after 01, 2 = data after 10, 3 = data after 11, 4 = data after 17, 5 = catch-all (= failure)
var result = [], grp;
while ((grp = regex.exec(str)) && !grp[5]) result.push(grp.slice(1).join(''));
// Consider it a failure when not at least 2 matched.
if (result.length < 2) result = [];
console.log(result);
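The loop above discards which identifier each piece of data belonged to. A sketch that keeps it, assuming the same four identifiers and the same catch-all idea; parseGs1 and the { ai, data } result shape are hypothetical, not part of the original answer.

```javascript
// Hypothetical helper: returns [{ ai, data }, ...] or [] if fewer than 2 fields.
function parseGs1(str) {
  // A fresh regex per call, so lastIndex always starts at 0.
  const regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})|(.))/g;
  const ais = ["01", "10", "11", "17"]; // identifiers in the order of groups 1-4
  const fields = [];
  let m;
  // Stop at the end of the string or when the catch-all (group 5) matched.
  while ((m = regex.exec(str)) && m[5] === undefined) {
    const g = [1, 2, 3, 4].find((i) => m[i] !== undefined);
    fields.push({ ai: ais[g - 1], data: m[g] });
  }
  // Treat fewer than two fields as "no match", as the question requires.
  return fields.length >= 2 ? fields : [];
}

console.log(JSON.stringify(parseGs1("011234567890123417501200")));
// [{"ai":"01","data":"12345678901234"},{"ai":"17","data":"501200"}]
console.log(JSON.stringify(parseGs1("0112345678901234"))); // []
```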

update
1st example
Example with $1, $2, $3 and $4, where $1 -> abc, $2 -> def, $3 -> ghi, $4 -> jkl:
// $1 $2 $3 $4
var regex = /(abc)|(def)|(ghi)|(jkl)/g;
var str = "abcdefghijkl";
// test
console.log(str.replace(regex, "$1 1st "));
console.log(str.replace(regex, "$2 2nd "));
console.log(str.replace(regex, "$3 3rd "));
console.log(str.replace(regex, "$4 4th "));
2nd example
Here the results get mixed up:
// $1 $2 $3 $4
var regex = /((abc)|(def)|(ghi)|(jkl)){2,}/g;
var str = "abcdefghijkl";
// test
console.log(str.replace(regex, "$1 1st "));
console.log(str.replace(regex, "$2 2nd "));
console.log(str.replace(regex, "$3 3rd "));
console.log(str.replace(regex, "$4 4th "));
As you can see, the first line now shows jkl, the text of the fourth alternative, and the other lines are empty, instead of abc appearing in the first line.
The outer brackets () form a capture group of their own, so they become $1 and the inner groups shift to $2 to $5. Because of {2,}, the groups are reset on every repetition, so $1 only holds the last repetition (jkl), and of the inner groups only $5, which took part in that last repetition, keeps a value. The same happens with your regex: only the group that matched in the last repetition keeps its value.
As Damian said:
By adding the quantifier, JavaScript will be sure to return only the last match.
so $4 is the last match.
end update
Here is a small test with the regex from the question:
var regex = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})){2,}/g;
var str = "011234567890123417501200";
// test
console.log(str.replace(regex, "$1 1st "));
console.log(str.replace(regex, "$2 2nd "));
console.log(str.replace(regex, "$3 3rd "));
console.log(str.replace(regex, "$4 4th "));
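The reset behaviour can be observed directly by printing all groups of a single exec() call instead of running one replace() per group. A minimal sketch using the abc/def/ghi/jkl example and the regex from the question:

```javascript
// ECMAScript resets the groups inside a quantified group on every repetition,
// so after {2,} only the groups used in the last repetition keep a value.
const ex = /((abc)|(def)|(ghi)|(jkl)){2,}/.exec("abcdefghijkl");
console.log(ex.slice(1)); // [ 'jkl', undefined, undefined, undefined, 'jkl' ]

const gs1 = /(?:01(\d{14})|10([^\x1D]{6,20})|11(\d{6})|17(\d{6})){2,}/.exec(
  "011234567890123417501200"
);
console.log(gs1.slice(1)); // [ undefined, undefined, undefined, '501200' ]
```

In the second case identifier 01 matched in the first repetition, but its group was cleared when the second repetition (identifier 17) started.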

We Keep Coding

JavaScript is the programming language of the Web.
