Regex match hashtag with exceptions - javascript

I have the current expression:
/(?<![http://|https://|#])#([\d\w]+[^\d\s<]+[^\s<>]+)/g
However, it doesn't work on Safari. I'm trying to handle the following cases:
#tag => match
#123 => no match
#32bit => match
##tag => no match
http://google.com/#/test => no match
tag##tag => no match
tag#tag => no match
<p>#tag</p> => match only #tag
#tag. => match only #tag
tag## => no match
tag# => no match
this is a match #tag => only #tag
I wonder how I can make certain characters right before the # prevent a match, e.g. # and /.
Is there any alternative to negative look behind that is compatible with Safari?
Thanks in advance.

You might use a negated character class and a capture group, and make sure that the tag does not consist of digits only.
Note that \w also matches \d, so [\d\w] is the same as \w.
(?:^|[^\w#/])(#(?!\d+\b)\w+)\b
Explanation
(?: Non capture group
^ Assert the start of the string
| Or
[^\w#/] Match a single non word char other than # or /
) Close non capture group
( Capture group 1
# Match literally
(?!\d+\b) Negative lookahead, assert that what follows is not only digits up to a word boundary
\w+ Match 1+ word characters
) Close group 1
\b A word boundary to prevent a partial word match
let regex = /(?:^|[^\w#/])(#(?!\d+\b)\w+)\b/;
[
"#tag",
"#123",
"#32bit",
"##tag",
"http://google.com/#/test",
"tag##tag",
"tag#tag",
"<p>#tag</p>",
"#tag.",
"tag##",
"tag#"
].forEach(s => {
const m = s.match(regex)
if (m) {
console.log(`${s} --> ${m[1]}`)
}
})
Using the matches in a replacement:
let regex = /((?:^|[^\w#/]))(#(?!\d+\b)\w+)\b/;
[
"#tag",
"#123",
"#32bit",
"##tag",
"http://google.com/#/test",
"tag##tag",
"tag#tag",
"<p>#tag</p>",
"#tag.",
"tag##",
"tag#",
"this is a match #tag"
].forEach(s => {
const m = s.match(regex)
if (m) {
console.log(s.replace(regex, "$1<span>$2</span>"))
}
})
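Because the pattern above has no g flag, replace() only wraps the first hashtag in a string. A minimal sketch, assuming you want every hashtag in a longer text wrapped, adds the g flag (the helper name wrapHashtags is hypothetical):

```javascript
// Same pattern as the replacement example, plus the g flag so that
// every hashtag in the text is wrapped, not just the first one.
const hashtagRegex = /((?:^|[^\w#/]))(#(?!\d+\b)\w+)\b/g;

// Hypothetical helper name, used for illustration only
function wrapHashtags(text) {
  // $1 restores the consumed preceding character (empty at the start of the string)
  return text.replace(hashtagRegex, "$1<span>$2</span>");
}

console.log(wrapHashtags("#tag and #32bit, but not ##tag, #123 or tag#tag"));
// <span>#tag</span> and <span>#32bit</span>, but not ##tag, #123 or tag#tag
```

Since the tag itself never consumes the character after it, two tags separated by a single character (such as "#a,#b") are both wrapped.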

If you use the following pattern, the second capture group contains what you want.
^(<\w*>)?(#\w+[a-zA-Z])

This satisfies your test cases, though I'm not sure whether it is exactly what you want.
It correctly does not match #123; I just forgot to include that case in my screenshot and haven't updated it.
/^(?:[^#]*[^#\w])?(#[\w]*[a-zA-Z][\w]*).*$/g
If you only want to allow <tags> before the "#", you can instead use @kendle's solution for the first non-capturing group (the part before the actual group starting with #):
(?:<\w*>)?
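The pattern above is only checked against a screenshot, so here is a small runnable check over the question's test cases. It is a sketch: the g flag is dropped because String.prototype.match with g returns whole matches instead of capture groups, and the helper name firstHashtag is hypothetical. Because of ^ and .*$, this pattern only ever finds the first hashtag in a string.

```javascript
// Pattern from the answer above, without the g flag so exec() exposes group 1
const regex = /^(?:[^#]*[^#\w])?(#[\w]*[a-zA-Z][\w]*).*$/;

// Hypothetical helper: returns the first hashtag, or null when nothing matches
function firstHashtag(s) {
  const m = regex.exec(s);
  return m ? m[1] : null;
}

[
  "#tag", "#123", "#32bit", "##tag", "http://google.com/#/test",
  "tag##tag", "tag#tag", "<p>#tag</p>", "#tag.", "tag##", "tag#",
  "this is a match #tag"
].forEach(s => console.log(`${s} --> ${firstHashtag(s)}`));
```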

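You can also achieve this without a capture group by using a negative lookbehind. Note that Safari only supports lookbehind from version 16.4 onwards, so this pattern still fails on older Safari versions:
/(?<![#\w])#(?!\d+\b)\w+/g
const regex = /(?<![#\w])#(?!\d+\b)\w+/g;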
const stringToTest = [
"#tag",
"#123",
"#32bit",
"##tag",
"http://google.com/#/test",
"tag##tag",
"tag#tag",
"<p>#tag</p>",
"#tag.",
"tag##",
"tag#",
"this is a match #tag",
];
stringToTest.forEach(str => {
const match = str.match(regex);
if (match) {
console.log(`${str} -> ${match[0]}`);
} else {
console.log(`${str} -> ${match}`);
}
});
Good luck!
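If you want the lookbehind version where it is available and a fallback elsewhere, one option is to feature-detect lookbehind at runtime. This is a sketch, not part of the answer above: the helper names are hypothetical, / is also excluded before the # to mirror the first answer, and the lookbehind pattern is built with the RegExp constructor, because a regex literal containing lookbehind is a syntax error that stops the whole script from loading on engines without support.

```javascript
// Returns true if the engine understands lookbehind assertions.
// new RegExp() throws a catchable SyntaxError instead of breaking parsing.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: all hashtags in `text`, with or without lookbehind
function findHashtags(text, useLookbehind = supportsLookbehind()) {
  if (useLookbehind) {
    // String form of /(?<![#\w/])#(?!\d+\b)\w+/g
    const re = new RegExp("(?<![#\\w/])#(?!\\d+\\b)\\w+", "g");
    return text.match(re) || [];
  }
  // Fallback: consume the preceding character and read the tag from group 1
  const re = /(?:^|[^\w#/])(#(?!\d+\b)\w+)/g;
  const tags = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    tags.push(m[1]);
  }
  return tags;
}

console.log(findHashtags("#tag #32bit ##no #123 x#y <p>#html</p>"));
```

Both branches exclude the same preceding characters, so they return the same tags; only the mechanism differs.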

Related

Match text surrounded by underscore

I need a regex to match:
_Sample welcome text_ or Sample _welcome_ _text_
but not Sample_welcome_text
i.e. there can be (space or nothing) before the opening underscore and (space or nothing) after the closing underscore.
I have tried using this:
/_(?:(?! ))(.*?)[^ ]_/gmi
It works, but unfortunately it also matches Sample_welcome_text.
You could use an alternation to either start with optional whitespace chars followed by an underscore, or the other way around.
Note that \s can also match newlines. You could match mere spaces instead if that is required, or [^\S\n]* to exclude newlines.
^\s*_.*|.*_\s*$
const regex = /^\s*_.*|.*_\s*$/;
[
"Sample welcome text_",
"Sample _welcome_ _text_",
"Sample_welcome_text"
].forEach(s =>
console.log(`${s} --> ${regex.test(s)}`)
)
You could use lookbehind and lookahead assertions to search for text that is surrounded by underscores, where there can be a space or the start of the string before the opening underscore, and a space or the end of the string after the closing underscore.
/(?<=[ ]+|^)_(.*?)_(?=[ ]+|$)/gmi
Demo: https://regex101.com/r/t41Fkm/1
You can use a positive lookbehind and lookahead for either whitespace or start/end of string, and reference the word in capture group 1: (.*?)
const regex = /(?<=\s|^)_(.*?)_(?=\s|$)/gs;
[
"Sample welcome text_",
"Sample _welcome_ _text_",
"Sample_welcome_text"
].forEach(str => {
let matches = [...str.matchAll(regex)].map(m => m[1]);
console.log(str, '=>', matches);
});
If you are concerned about Safari not supporting lookbehind, you can turn the lookbehind into a capture group and reference capture group 2 instead:
const regex = /(\s|^)_(.*?)_(?=\s|$)/gs;
[
"Sample welcome text_",
"Sample _welcome_ _text_",
"Sample_welcome_text"
].forEach(str => {
let matches = [...str.matchAll(regex)].map(m => m[2]);
console.log(str, '=>', matches);
});
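The same lookbehind-free pattern also works with replace(), for example to turn underscore-delimited text into HTML emphasis. A minimal sketch, assuming <em> is the markup you want (the helper name underscoreToEm is hypothetical); the s flag is left out here as a design choice so a match cannot span lines:

```javascript
// Capture-group version from above: group 1 keeps the whitespace (or the empty
// start of the string) before the opening underscore, so it is written back as $1.
const underscoreRegex = /(\s|^)_(.*?)_(?=\s|$)/g;

// Hypothetical helper name, for illustration
function underscoreToEm(str) {
  return str.replace(underscoreRegex, "$1<em>$2</em>");
}

console.log(underscoreToEm("Sample _welcome_ _text_")); // Sample <em>welcome</em> <em>text</em>
console.log(underscoreToEm("Sample_welcome_text"));     // Sample_welcome_text
```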
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Javascript regex to find two characters between two delimitators

I need to find two characters between '[' ']' and '/' '/' using Javascript.
I am using this regex:
([^.][/[string]]|\/string\/)|(\[(string))|(\/(string))| ((string)\])|((string)\/)
That matches the two characters, but it also matches a single character.
The question is: how can I match just the two characters?
Also, I want to get exactly the two characters inside the string, not only an exact match.
E.g.:
User input: dz
It must only find matches that contain "dz" exactly, e.g. "dzone" but not "dazone". Currently I am getting matches for both strings, "dzone" and "dazone".
Demo: https://regex101.com/r/FEs6ib/1
Between the delimiters, you could optionally repeat any char except the delimiters themselves, and capture what you want to keep in a group.
If you want multiple matches for /dzone/dzone/, you could assert the last delimiter to the right instead of matching it.
The matches are in either group 1 or group 2, so check which of the two exists.
\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])
The pattern matches:
\/ Match /
[^\/]*(dz)[^\/]* Capture dz in group 1 between optional chars other than /
(?=\/) Positive lookahead, assert / to the right
| Or
\[ Match [
[^\][]*(dz)[^\][]* Capture dz in group 2 between optional chars other than [ and ]
(?=]) Positive lookahead, assert ] to the right
This matches one occurrence of dz per delimited word. If you want to match the whole word, widen the capture groups to include the negated character classes before and after dz:
\/([^\/]*dz[^\/]*)(?=\/)|\[([^\][]*dz[^\][]*)(?=])
const regex = /\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])/g;
[
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s =>
console.log(
`${s} --> ${Array.from(s.matchAll(regex), m => m[2] ? m[2] : m[1])}`
)
);
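Since the search term comes from user input, you would normally build the pattern dynamically and escape regex metacharacters first. A sketch under that assumption: escapeRegExp is the common escaping idiom (not a built-in), and findBetweenDelimiters is a hypothetical name.

```javascript
// Escape regex metacharacters so the user's input is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds the first pattern of this answer for an arbitrary search term
function findBetweenDelimiters(text, term) {
  const t = escapeRegExp(term);
  const regex = new RegExp(
    `\\/[^\\/]*(${t})[^\\/]*(?=\\/)|\\[[^\\][]*(${t})[^\\][]*(?=])`,
    "g"
  );
  // Group 1 is set for /.../ matches, group 2 for [...] matches
  return Array.from(text.matchAll(regex), m => (m[2] !== undefined ? m[2] : m[1]));
}

console.log(findBetweenDelimiters("/dzone/dzone/", "dz")); // [ 'dz', 'dz' ]
console.log(findBetweenDelimiters("[dazone]", "dz"));      // []
```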
If supported, you might also match all occurrences of dz between the delimiters using lookarounds with an infinite quantifier:
(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])
const regex = /(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])/g;
[
"[adzadzone]",
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(`${s} --> ${m}`);
}
});
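Where lookbehind is not available (older Safari, for instance), a two-step sketch can find all occurrences without it: first match each delimited segment, then search inside it with indexOf. The helper name findAllInDelimited and the choice to return start indices are assumptions made for illustration.

```javascript
// Step 1: match each /.../ or [...] segment; the closing delimiter is only
// asserted, so /dzone/dzone/ yields two segments.
const segmentRegex = /\/([^\/]*)(?=\/)|\[([^\][]*)(?=])/g;

// Hypothetical helper: start index (in `text`) of every `term` inside a segment
function findAllInDelimited(text, term) {
  const positions = [];
  for (const m of text.matchAll(segmentRegex)) {
    const inner = m[1] !== undefined ? m[1] : m[2];
    // Step 2: plain substring search inside the segment
    let idx = inner.indexOf(term);
    while (idx !== -1) {
      // m.index points at the opening delimiter, which is one character long
      positions.push(m.index + 1 + idx);
      idx = inner.indexOf(term, idx + term.length);
    }
  }
  return positions;
}

console.log(findAllInDelimited("[adzadzone]", "dz"));   // [ 2, 5 ]
console.log(findAllInDelimited("/dzone/dzone/", "dz")); // [ 1, 7 ]
```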

Regex to get substring between first and last occurrence

Assume there is the string
just/the/path/to/file.txt
I need to get the part between the first and the last slash: the/path/to
I came up with this regex: /^(.*?).([^\/]*)$/, but this gives me everything in front of the last slash.
Don't use [^/]*, since that won't match anything that contains a slash. Just use .* to match anything:
/(.*?)\/(.*)\/(.*)/
Group 1 = just, Group 2 = the/path/to and Group 3 = file.txt.
The regex should be \/(.*)\/. You can check it with the demo below:
const regex = /\/(.*)\//;
const str = `just/the/path/to/file.txt`;
let m;
if ((m = regex.exec(str)) !== null) {
console.log(m[1]);
}
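For comparison, and not taken from the answers here, the same slice can be taken without a regex by using indexOf and lastIndexOf. A minimal sketch (the helper name betweenFirstAndLastSlash is hypothetical) that behaves like \/(.*)\/ on single-line strings:

```javascript
// Regex-free alternative: slice between the first and the last "/"
function betweenFirstAndLastSlash(str) {
  const first = str.indexOf("/");
  const last = str.lastIndexOf("/");
  // Fewer than two slashes: there is nothing between a first and a last one
  if (first === -1 || first === last) return null;
  return str.slice(first + 1, last);
}

console.log(betweenFirstAndLastSlash("just/the/path/to/file.txt")); // the/path/to
```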
This regex will do the trick:
const str = "/the/path/to/the/peace";
console.log(str.replace(/[^\/]*\/(.*)\/[^\/]*/, "$1"));
[^\/]*\/(.*)\/[^\/]*
If you are only interested in matching consecutive parts separated by a single / (and no //):
^[^/]*\/((?:[^\/]+\/)*[^\/]+)\/[^\/]*$
^ Start of string
[^/]*\/ Negated character class, optionally match any char except / and then match the first /
( Capture group 1
(?:[^\/]+\/)* Optionally repeat matching 1+ chars other than / followed by a /
[^\/]+ Match 1+ times any char except /
) Close group 1
\/[^\/]* Match the last / followed by optionally matching any char except /
$ End of string
const regex = /^[^/]*\/((?:[^\/]+\/)*[^\/]+)\/[^\/]*$/;
[
"just/the/path/to/file.txt",
"just/the/path",
"/just/",
"just/the/path/to/",
"just/the//path/test",
"just//",
].forEach(str => {
const m = str.match(regex);
if (m) {
console.log(m[1])
}
});

Getting Multiple Matches with RegExp in JavaScript

I have a string like this:
`DateTime.now().setZone("America Blorp");`
This is my RegEx:
string.match(/DateTime\.(.*)[^)][(;]/)
How can I modify my RegEx so that I can get matches like this:
DateTime.now and DateTime.now.setZone.
I have tried to group matches like this
string.match(/DateTime\.(.*)([^)]*)([(;]*)/)
But I don't get the expected output. Can anyone please help me with this?
P.S. I can only use the match function, not matchAll.
const string = `DateTime.now().setZone("America Blorp");`
console.log(
string.match(/DateTime\.(.*)[^)][(;]/)
)
You could match the format using 2 capture groups and concatenate the groups.
\b(DateTime\.now)\(\)(\.[^.()]+)\([^()]*\);
The pattern matches:
\b A word boundary to prevent a partial match
(DateTime\.now) Capture group 1, match DateTime.now
\(\) Match ()
(\.[^.()]+) Capture group 2, match a . followed by 1+ chars other than ., ( or )
\([^()]*\); Match from ( till ) and ;
const regex = /\b(DateTime\.now)\(\)(\.[^.()]+)\([^()]*\);/;
const str = `DateTime.now().setZone("America Blorp");`;
const match = str.match(regex);
if (match) {
console.log(match[1] + match[2]);
}
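If you need the intermediate results as well (DateTime.now and DateTime.now.setZone) and can only use match(), one sketch is to match the whole call chain first, strip the argument lists, and then use match() with the g flag, which returns every match as an array. It assumes the arguments contain no nested parentheses; callChainPrefixes is a hypothetical name.

```javascript
// Hypothetical helper: every prefix of a DateTime call chain, e.g.
// DateTime.now().setZone("...") -> ["DateTime.now", "DateTime.now.setZone"]
function callChainPrefixes(code) {
  // DateTime followed by one or more .name(args) calls (args without nested parens)
  const chain = code.match(/\bDateTime(?:\.\w+\([^()]*\))+/);
  if (!chain) return [];
  // Remove the argument lists first, so dots inside arguments are ignored;
  // with the g flag, match() then returns every ".name" (no matchAll needed)
  const members = chain[0].replace(/\([^()]*\)/g, "").match(/\.\w+/g);
  const prefixes = [];
  let current = "DateTime";
  for (const member of members) {
    current += member;
    prefixes.push(current);
  }
  return prefixes;
}

console.log(callChainPrefixes('DateTime.now().setZone("America Blorp");'));
```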

Filter version number from string in javascript?

I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of "v" or "v." followed by a digit, followed by digits, letters or dots until the end of the string, a whitespace, or a dot (.) with no digit after it.
As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
That will match:
^ Assert the start of the string
.*? Match any character 0+ times, non-greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If a letter such as the A in V.3.4A can only appear in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
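Calling s.match(pattern)[1] directly throws a TypeError when a string contains no version. A small sketch that guards against that and compares the two patterns from this answer (extractVersion is a hypothetical name):

```javascript
// Pattern from the answer: a letter may follow any numeric part
const anyPart = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
// Variant from the answer: a letter may only follow the last numeric part
const lastPartOnly = /^.*?v\.?(\d+(?:\.\d+)*[a-z]?)/i;

// Hypothetical helper: the captured version, or null when there is none
function extractVersion(s, pattern) {
  const m = s.match(pattern);
  return m ? m[1] : null;
}

console.log(extractVersion("v1.2a.3", anyPart));         // 1.2a.3
console.log(extractVersion("v1.2a.3", lastPartOnly));    // 1.2a
console.log(extractVersion("no version here", anyPart)); // null
```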
Details:
v - character "v"
(?:\.)? - matches an optional "."
Version capturing group
[0-9a-z\.]* - Matches alphanumeric and "." character
[0-9a-z] - ensures that the version number doesn't end with "."
You can use RegExp.exec() method to extract matches from string one by one.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i;
let str = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let versions = [];
for (const s of str) {
// Without the g flag, exec() always searches from the start of s,
// so no lastIndex bookkeeping or retry is needed
const v = regex.exec(s);
if (v !== null)
versions.push(v[1]); // first capturing group, i.e. the version number
}
console.log(versions);
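With the g flag, exec() is stateful: lastIndex carries over between calls, even when a different string is passed, which is why reusing one global regex across several strings can miss matches. A minimal demonstration, using a simplified stand-in pattern rather than the one above:

```javascript
// Simplified stand-in pattern with the g flag
const re = /v(\d+)/g;

const first = re.exec("v1 and more text");
console.log(first[1], re.lastIndex); // 1 2

// The search starts at lastIndex 2, which is already past the end of "v2"
const second = re.exec("v2");
console.log(second, re.lastIndex); // null 0

// A failed exec() resets lastIndex to 0, so the same call now succeeds
const third = re.exec("v2");
console.log(third[1]); // 2
```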
var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive
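JavaScript regexes have no free-spacing (x) flag, so the commented layout above cannot be written as a literal. One way to keep the comments, sketched here as an option rather than part of the answer, is to assemble the same pattern from commented string pieces:

```javascript
// Backslashes are doubled because each piece is a string literal
const versionRegex = new RegExp(
  [
    "v",                 // letter v
    "\\.?",              // optional dot
    "(",                 // start group 1, it will contain the version number
    "[0-9a-z]+",         // 1 or more alphanumeric
    "(?:\\.[0-9a-z]+)*", // a dot and 1 or more alphanumeric, 0 or more times
    ")",                 // end group 1
  ].join(""),
  "i" // case insensitive
);

console.log(versionRegex.source); // v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)
console.log("Title V.3.4A. filters 3.4A".match(versionRegex)[1]); // 3.4A
```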
