Match text surrounded by underscore - javascript

I need a regex to match:
_Sample welcome text_ or Sample _welcome_ _text_
but not Sample_welcome_text
i.e. there can be (a space or nothing) before the opening underscore and (a space or nothing) after the closing underscore.
I have tried using this:
/_(?:(?! ))(.*?)[^ ]_/gmi
It works, but unfortunately it also matches Sample_welcome_text

You could use an alternation: match a string that either starts with optional whitespace chars followed by an underscore, or ends with an underscore followed by optional whitespace chars.
Note that \s can also match newlines. You could match mere spaces instead if that is required, or [^\S\n]* to exclude newlines.
^\s*_.*|.*_\s*$
const regex = /^\s*_.*|.*_\s*$/;
[
"Sample welcome text_",
"Sample _welcome_ _text_",
"Sample_welcome_text"
].forEach(s =>
console.log(`${s} --> ${regex.test(s)}`)
)
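To illustrate the note about \s and newlines, here is a minimal sketch comparing \s* with [^\S\n]* in the first alternative. The m flag, the input string, and the names withWhitespace and withoutNewlines are assumptions made for this illustration; they are not part of the pattern above.

```javascript
// Assumed variants of the first alternative, with the m flag so ^ also matches at line starts.
const withWhitespace = /^\s*_.*/m;
const withoutNewlines = /^[^\S\n]*_.*/m;

// Illustrative input: an empty first line, then an indented line starting with an underscore.
const input = "\n  _text";

const a = withWhitespace.exec(input);
const b = withoutNewlines.exec(input);

// \s* swallows the newline, so the match starts on the empty first line.
console.log(JSON.stringify(a[0]), a.index);
// [^\S\n]* stops at the newline, so the match stays on the second line.
console.log(JSON.stringify(b[0]), b.index);
```

If matches must never span lines, the [^\S\n]* form is probably the safer choice.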

You could use lookbehind and lookahead assertions to find text surrounded by underscores, where the opening underscore is preceded by a space or the start of the string, and the closing underscore is followed by a space or the end of the string.
/(?<=[ ]+|^)_(.*?)_(?=[ ]+|$)/gmi
Demo: https://regex101.com/r/t41Fkm/1
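A minimal sketch of how this pattern could be used from JavaScript, assuming an engine that supports lookbehind. The helper name underscored is hypothetical; the test strings come from the question.

```javascript
// The pattern from this answer; capture group 1 holds the text between the underscores.
const pattern = /(?<=[ ]+|^)_(.*?)_(?=[ ]+|$)/gmi;

// Hypothetical helper: return every underscore-delimited piece of text in s.
function underscored(s) {
  return Array.from(s.matchAll(pattern), m => m[1]);
}

[
  "_Sample welcome text_",
  "Sample _welcome_ _text_",
  "Sample_welcome_text"
].forEach(s => console.log(s, "=>", underscored(s)));
```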

You can use a positive lookbehind and lookahead for either whitespace or start/end of string, and reference the word in capture group 1: (.*?)
const regex = /(?<=\s|^)_(.*?)_(?=\s|$)/gs;
[
"Sample welcome text_",
"Sample _welcome_ _text_",
"Sample_welcome_text"
].forEach(str => {
let matches = [...str.matchAll(regex)].map(m => m[1]);
console.log(str, '=>', matches);
});
If you are concerned about Safari not supporting lookbehind, you can turn the lookbehind into a capture group, and reference capture group 2 instead:
const regex = /(\s|^)_(.*?)_(?=\s|$)/gs;
[
"Sample welcome text_",
"Sample _welcome_ _text_",
"Sample_welcome_text"
].forEach(str => {
let matches = [...str.matchAll(regex)].map(m => m[2]);
console.log(str, '=>', matches);
});
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Related

Regex match hashtag with exceptions

I have the current expression:
/(?<![http://|https://|#])#([\d\w]+[^\d\s<]+[^\s<>]+)/g
However, it does not run on Safari. I'm trying to handle the following cases:
#tag => match
#123 => no match
#32bit => match
##tag => no match
http://google.com/#/test => no match
tag##tag => no match
tag#tag => no match
<p>#tag</p> => match only #tag
#tag. => match only #tag
tag## => no match
tag# => no match
this is a match #tag => only #tag
I wonder how I can make a character before the match result in a negative match. E.g. # and /.
Is there any alternative to negative look behind that is compatible with Safari?
Thanks in advance.
You might use a negated character class and a capture group, and make sure that there are not only digits.
Note that \w also matches \d
(?:^|[^\w#/])(#(?!\d+\b)\w+)\b
Explanation
(?: Non capture group
^ Assert the start of the string
| Or
[^\w#/] Match a single non word char other than # or /
) Close non capture group
( Capture group 1
# Match literally
(?!\d+\b) Negative lookahead, assert not only digits to the right followed by a word boundary
\w+ Match 1+ word characters
) Close group 1
\b A word boundary to prevent a partial word match
let regex = /(?:^|[^\w#/])(#(?!\d+\b)\w+)\b/;
[
"#tag",
"#123",
"#32bit",
"##tag",
"http://google.com/#/test",
"tag##tag",
"tag#tag",
"<p>#tag</p>",
"#tag.",
"tag##",
"tag#"
].forEach(s => {
const m = s.match(regex)
if (m) {
console.log(`${s} --> ${m[1]}`)
}
})
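The snippet above reports only the first hashtag per string. A minimal sketch for collecting every tag in a longer text, assuming the g flag is added to the same pattern; the helper name extractTags and the sample sentence are hypothetical.

```javascript
// Same pattern as above, with the g flag so matchAll can iterate over all matches.
const tagPattern = /(?:^|[^\w#/])(#(?!\d+\b)\w+)\b/g;

// Hypothetical helper: return all hashtags (capture group 1) found in text.
function extractTags(text) {
  return Array.from(text.matchAll(tagPattern), m => m[1]);
}

console.log(extractTags("this is a match #tag and #32bit but not #123 or tag#tag"));
console.log(extractTags("<p>#tag</p>"));
```

Because the character before the tag is consumed by the non-capture group rather than asserted, this approach needs no lookbehind.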
Using the matches in a replacement:
let regex = /((?:^|[^\w#/]))(#(?!\d+\b)\w+)\b/;
[
"#tag",
"#123",
"#32bit",
"##tag",
"http://google.com/#/test",
"tag##tag",
"tag#tag",
"<p>#tag</p>",
"#tag.",
"tag##",
"tag#",
"this is a match #tag"
].forEach(s => {
const m = s.match(regex)
if (m) {
console.log(s.replace(regex, "$1<span>$2</span>"))
}
})
If you use the following pattern the second matching group contains what you want.
^(<\w*>)?(#\w+[a-zA-Z])
This satisfies your test cases. Not sure though whether you want this or not.
It doesn't work on #123, but I forgot that case and am now too lazy to add it as a screenshot.
/^(?:[^#]*[^#\w])?(#[\w]*[a-zA-Z][\w]*).*$/g
If you only want to allow <tags> before the "#", you can instead use #kendle's solution for the first non-capture group (before the actual group starting with #).
(?:<\w*>)?
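A quick way to check the anchored pattern from this answer against the question's cases. The helper name firstTag is hypothetical, and the g flag is dropped here on the assumption that only one match per string is wanted, since the pattern is anchored with ^ and $.

```javascript
// The anchored pattern from this answer; capture group 1 holds the hashtag.
const anchored = /^(?:[^#]*[^#\w])?(#[\w]*[a-zA-Z][\w]*).*$/;

// Hypothetical helper: return the hashtag, or null when the string has no valid one.
function firstTag(s) {
  const m = s.match(anchored);
  return m ? m[1] : null;
}

[
  "#tag",
  "#123",
  "#32bit",
  "##tag",
  "tag#tag",
  "<p>#tag</p>",
  "this is a match #tag"
].forEach(s => console.log(`${s} -> ${firstTag(s)}`));
```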
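You can also achieve this without a capture group, using the regex below. Note that it relies on a negative lookbehind, which is what the question is trying to avoid for Safari: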
/(?<![#\w])#{1}(?!\d+\b)\w+/g
const regex = /(?<![#\w])#{1}(?!\d+\b)\w+/g;
const stringToTest = [
"#tag",
"#123",
"#32bit",
"##tag",
"http://google.com/#/test",
"tag##tag",
"tag#tag",
"<p>#tag</p>",
"#tag.",
"tag##",
"tag#",
"this is a match #tag",
];
stringToTest.forEach(str => {
const match = str.match(regex);
if (match) {
console.log(`${str} -> ${match[0]}`);
} else {
console.log(`${str} -> ${match}`);
}
});
Good luck !

Javascript regex to find two characters between two delimitators

EDITED
I need to find two characters between '[' ']' and '/' '/' using Javascript.
I am using this regex:
([^.][/[string]]|\/string\/)|(\[(string))|(\/(string))| ((string)\])|((string)\/)
It matches two characters, but it also matches a single character.
How can I match exactly two characters?
I also want to match the two characters anywhere inside the string, not only when the whole string is an exact match.
Eg.
User input: dz
It must find only matches that contain "dz", e.g. "dzone" but not "dazone". Currently I get matches for both "dzone" and "dazone".
Demo: https://regex101.com/r/FEs6ib/1
You could optionally repeat any char except the delimiters between the delimiters themselves, and capture what you want to keep in a group.
If you want multiple matches for /dzone/dzone/, you could assert the last delimiter to the right instead of matching it.
The matches are in group 1 or group 2, so check which of the two groups participated in the match.
\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])
The pattern matches:
\/ Match /
[^\/]*(dz)[^\/]* Capture dz in group 1 between optional chars other than /
(?=\/) Positive lookahead, assert / to the right
| Or
\[ Match [
[^\][]*(dz)[^\][]* Capture dz in group 2 between optional chars other than [ and ]
(?=]) Positive lookahead, assert ] to the right
This will match 1 occurrence of dz in the word. If you want to match the whole word, the capture group can be broadened to before and after the negated character class like:
\/([^\/]*dz[^\/]*)(?=\/)|\[([^\][]*dz[^\][]*)(?=])
const regex = /\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])/g;
[
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s =>
console.log(
`${s} --> ${Array.from(s.matchAll(regex), m => m[2] ? m[2] : m[1])}`
)
);
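Since the question mentions user input, here is a minimal sketch that builds the whole-word variant of the pattern for an arbitrary search term. The helpers escapeRegExp, buildDelimitedRegex and findDelimitedWords are hypothetical names, and the sketch assumes the term itself contains no delimiter characters.

```javascript
// Hypothetical helper: escape regex metacharacters (and "/") in user input.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Build the whole-word pattern for a given term:
// group 1 captures a /.../ word, group 2 captures a [...] word.
function buildDelimitedRegex(term) {
  const t = escapeRegExp(term);
  return new RegExp(
    `\\/([^\\/]*${t}[^\\/]*)(?=\\/)|\\[([^\\][]*${t}[^\\][]*)(?=\\])`,
    "g"
  );
}

// Return every delimited word in s that contains term.
function findDelimitedWords(term, s) {
  return Array.from(s.matchAll(buildDelimitedRegex(term)), m => m[1] ?? m[2]);
}

console.log(findDelimitedWords("dz", "/dzone/dzone/"));
console.log(findDelimitedWords("dz", "[dazone]"));
```

Escaping the term keeps characters such as . or * in the user input from being interpreted as regex syntax.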
If supported, you might also match all occurrences of dz between the delimiters using lookarounds with an infinite quantifier:
(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])
const regex = /(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])/g;
[
"[adzadzone]",
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(`${s} --> ${s.match(regex)}`);
}
});

Regex add space in string if the word is longer than 4 characters and have numbers

I am trying to create a regex with two conditions:
the word is longer than 4 characters
and the word contains numbers.
If both hold, I need to add a space.
For example, iph12 should return iph12, but iphone12 should return iphone 12.
I wrote this regex:
.replace(/\d+/gi, ' $& ').trim()
but this always returns a string like iphone 12. I then tried:
.replace(/(?=[A-Z]+\d|\d+[A-Z])[A-Z\d]{,4}/i, ' $& ').trim()
but without second argument in {,4} it's not working. So is this possible?
You can use
text.replace(/\b([a-zA-Z]{4,})(\d+)\b/g, '$1 $2')
Details:
\b - word boundary
([a-zA-Z]{4,}) - Group 1: four or more ASCII letters
(\d+) - Group 2: one or more digits
\b - word boundary
See the JavaScript demo:
const texts = ['iphone12', 'iph12'];
const regex = /\b([a-zA-Z]{4,})(\d+)\b/g;
for (const text of texts) {
console.log(text, '=>', text.replace(regex, '$1 $2'));
}
Output:
iphone12 => iphone 12
iph12 => iph12
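As a side note on the {,4} attempt in the question: in a JavaScript regex without the u flag, a brace expression with no lower bound is not treated as a quantifier but as literal text. A minimal sketch, with variable names chosen only for this illustration:

```javascript
// Without the u flag, {,4} is not a quantifier: it matches the literal text "{,4}".
const literalBraces = /^a{,4}$/;
// With an explicit lower bound it is a quantifier: zero to four "a" characters.
const bounded = /^a{0,4}$/;

console.log(literalBraces.test("aaa"));   // false
console.log(literalBraces.test("a{,4}")); // true
console.log(bounded.test("aaa"));         // true
console.log(bounded.test("aaaaa"));       // false
```

This is why the accepted pattern spells out the lower bound, as in {4,}.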

Regex get string between ()

I have the text
var text = '(hello) world this is (hi) text'
I want to write a regex function so I can get
parseText(text) // returns ['hello', 'hi']
I tried this, but it does not work:
'(hello) world this is (hi) text'.match('((.*?))')
Thanks for your help
You can try:
/\([^\)]+\)/g
\(: an escaped opening parenthesis
[^\)]+: one or more characters (including symbols) up to the ) character
\): an escaped closing parenthesis
g flag: find all matches
const regex = /\([^\)]+\)/g;
const str = `(hello) world this is (hi) text`;
console.log(
str.match(regex) // this returns an string array
.map(i => i.slice(1, -1)) // remove first and last char
);
TIPS:
About point #2, you can change it to [^\)]* to match zero or more characters.
If you need only word characters, you can use \w+ or \w*.
If you need only whole words, you can use /\(\b\w+\b\)/g
You can find several options in this post.
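The parseText function from the question could also be written with a capture group instead of slicing each match afterwards. This is a minimal sketch, assuming matchAll is available in the target environment:

```javascript
// Capture group 1 holds the text between the parentheses.
function parseText(text) {
  return Array.from(text.matchAll(/\(([^)]+)\)/g), m => m[1]);
}

console.log(parseText("(hello) world this is (hi) text"));
```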
Apart from using groups or postprocessing of the match results, you can use single regex match using lookahead / lookbehind:
var text = " (hello) world this is (hi) text"
var output = text.match(/(?<=\().*?(?=\))/g)
console.log(output)
output:
[ 'hello', 'hi' ]
Explanation:
(?<=...) ... positive lookbehind. The match is preceded by ..., but the ... is not included in the match
(?<=\() ... positive lookbehind for ( character
.* ... zero or more times of any character
.*? ... nongreedy version of .*
(?=...) ... positive lookahead, the match is followed by ... but the ... is not included in the match
(?=\)) ... positive lookahead for ) character
/.../g ... g is global flag, match finds all, not only the first, occurrence
Do not forget to escape "special characters", e.g. parentheses:
'(hello) world this is (hi) text'.match(/\([\w]*\)/g)
This returns [ "(hello)", "(hi)" ] and you can run another parse function to remove that extra parenthesis.
const text = '(hello) world this is (hi) text';
const list = text.match(/\([\w]*\)/g);
const parsed = list.map(item => item.replace(/\(|\)/g, ''));
console.log(parsed);

Replace match and ignore certain characters

I want to replace {r-group1} with "REPLACED" but leave the , where it is.
So, the string
var string = "{r-group1, }foo bar"
should output: "REPLACED, foo bar"
Using a negative lookahead, I tried adding a preceding (?![,]) group to leave the comma alone:
var replaced = string.replace(/^(?:(?![,]){r-group1\})+$/, 'REPLACED');
But it returns the same string. There are no matches to replace.
The same goes for a preceding comma:
var string = "foo bar{r-, group1}"
This should output: "foo bar, REPLACED"
You could do the replacement without a lookahead. You could match the curly braces and the content that comes before and after it except a comma using a negated character class [^,}]+ and capture the comma with optional whitespace chars in a capturing group.
In the replacement use the capturing groups $1REPLACED$2
Credits to #Nick for the updated pattern.
{r-(,?\s*)[^,}]+(,?\s*)}
const regex = /{r-(,?\s*)[^,}]+(,?\s*)}/g;
const str = `{r-group1, }foo bar`;
const subst = `$1REPLACED$2`;
const result = str.replace(regex, subst);
console.log(result);
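The snippet above covers only the trailing-comma case. A minimal sketch that runs the same replacement on both strings from the question; the helper name replaceGroup is hypothetical, and the braces are escaped here as a precaution.

```javascript
// Same pattern as above, with the literal braces escaped.
const pattern = /\{r-(,?\s*)[^,}]+(,?\s*)\}/g;

// Hypothetical helper: replace the group name but keep any comma and whitespace around it.
function replaceGroup(input) {
  return input.replace(pattern, "$1REPLACED$2");
}

console.log(replaceGroup("{r-group1, }foo bar"));
console.log(replaceGroup("foo bar{r-, group1}"));
```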
