How to use regex to replace part of string? - javascript

How do I search a string in JavaScript for all instances of [SM_g]randomword[SM_h]. and replace them all with [SM_g]randomword.[SM_h]?
I'd also like to do this with commas. For example, I need to be able to turn [SM_g]randomword[SM_h], into [SM_g]randomword,[SM_h].
Here is my code so far:
const periodRegex = /\[SM_g](.*?)\[SM_h](?:[.])/g;
string_afterregex = string_initial.replace(periodRegex, /*I don't know what goes in here*/)

Capture the parts in groups, then reorder them with backreferences in the replacement string:
const periodRegex = /(\[SM_g].*?)(\[SM_h])([.,])/g;
var string_initial = '[SM_g]randomword[SM_h].'
var string_afterregex = string_initial.replace(periodRegex, "$1$3$2");
console.log(string_afterregex);
var string_initial = '[SM_g]randomword[SM_h],'
console.log(string_initial.replace(periodRegex, "$1$3$2"))
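If the replacement logic grows, a replacer function can stand in for the "$1$3$2" string. This is a minimal sketch, assuming the same tags and punctuation as above; the helper name movePunctuationInside is made up for illustration:

```javascript
// Hypothetical helper: moves a "." or "," that follows [SM_h] to just before it.
function movePunctuationInside(text) {
  // Group 1: opening tag plus the word, group 2: closing tag, group 3: punctuation.
  var pattern = /(\[SM_g\].*?)(\[SM_h\])([.,])/g;
  return text.replace(pattern, function (match, open, close, punct) {
    return open + punct + close;
  });
}

console.log(movePunctuationInside("[SM_g]one[SM_h]. and [SM_g]two[SM_h], end"));
// [SM_g]one.[SM_h] and [SM_g]two,[SM_h] end
```

The function form does the same reordering, but leaves room for extra logic per match (for example, skipping some words).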

You don't need multiple regexes to achieve what you want if you use a lookahead:
const re = /\[SM_g](?=(((?:(?!\[SM_h]).)*)\[SM_h])([.,]))\1./g;
const str = '[SM_g]$2$3[SM_h]';
console.log("[SM_g]WORD[SM_h], [SM_g]WORD[SM_h]. [SM_g]WORD[SM_h]".replace(re, str));
Breakdown:
\[SM_g] Match [SM_g] literally
(?= Start of a positive lookahead
( Start of capturing group #1
( Start of CG #2
(?: Start of NCG #1, tempered dot
(?!\[SM_h]). Match next character without skipping over a [SM_h]
)* End of NCG #1, repeat as much as possible
) End of CG #2
\[SM_h] Match [SM_h] literally
) End of CG #1
( Start of CG #3
[.,] Match a comma or a period
) End of CG #3
) End of positive lookahead
\1. Match what's matched by CG #1, then one more character (the comma or period)

Related

Regex negative lookahead excluding full block

I am trying to put together a regex that extracts the surface area from the strings below, excluding the values that are preceded by Japanese characters.
"110.94m2・129.24m2"; --> 110.94m2 and 129.24m2
"81.95m2(24.78坪)、うち2階車庫8.9m2" --> 81.95m2
"80.93m2(登記)" --> 80.93m2
"93.42m2・93.85m2(登記)" --> 93.42m2 and 93.85m2
"81.82m2(実測)" --> 81.82m2
"81.82m2(実測)、うち1階車庫7.82m2" --> 81.82m2
"90.11m2(実測)、うち1階車庫8.07m2" --> 90.11m2
So far I have put together the following regex, but it does not work in every case.
(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])([0-9\.]*m2)
For example, the following string yields 81.95m2 and .9m2. I would need only 81.95m2.
"81.95m2(24.78坪)、うち2階車庫8.9m2"
Would you know how to treat the following block of the negative look ahead as an exclusion?
Thank you
You need to cancel any match if preceded with a digit or digit + period.
Add (?<!\d)(?<!\d\.) after or before the first lookbehind:
(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])(?<!\d)(?<!\d\.)(\d+(?:\.\d+)?m2)
The (?<!\d) is a negative lookbehind that fails the match if there is a digit immediately to the left of the current location and (?<!\d\.) fails when there is a digit and a dot right before.
The \d+(?:\.\d+)? is a more precise pattern to match numbers like 30 or 30.5678: 1 or more digits followed with an optional sequence of . and 1+ digits.
NOTE that this regex only works in JS environments that support lookbehind (ES2018+, e.g. Chrome, Node). In older environments, you may capture an optional Japanese char into Group 1 and the number into Group 2, then check if Group 1 matched: if yes, discard the match, else grab Group 2.
The regex is
/([\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])?(\d+(?:\.\d+)?m2)/g
See usage example below.
JS ES2018+ demo:
const lst = ["110.94m2・129.24m2", "81.95m2(24.78坪)、うち2階車庫8.9m2", "80.93m2(登記)", "93.42m2・93.85m2(登記)", "81.82m2(実測)" , "81.82m2(実測)、うち1階車庫7.82m2", "90.11m2(実測)、うち1階車庫8.07m2"];
const regex = /(?<![\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])(?<!\d)(?<!\d\.)(\d+(?:\.\d+)?m2)/g;
lst.forEach( s =>
console.log( s, '=>', s.match(regex) )
);
console.log("Another approach:");
lst.forEach( s =>
console.log(s, '=>', s.match(/(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2/gu))
)
JS legacy ES versions:
var lst = ["110.94m2・129.24m2", "81.95m2(24.78坪)、うち2階車庫8.9m2", "80.93m2(登記)", "93.42m2・93.85m2(登記)", "81.82m2(実測)" , "81.82m2(実測)、うち1階車庫7.82m2", "90.11m2(実測)、うち1階車庫8.07m2"];
var regex = /([\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF])?(\d+(?:\.\d+)?m2)/g;
for (var i=0; i<lst.length; i++) {
var m, res =[];
while (m = regex.exec(lst[i])) {
if (m[1] === undefined) {
res.push(m[2]);
}
}
console.log( lst[i], '=>', res );
}
Variations
If you plan to match a float/int number with m2 after it that is only preceded with whitespace or at the start of the string use
(?<!\S)\d+(?:\.\d+)?m2
If you plan to match it when not preceded with any letter use
pcre java - (?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2 (also works in JS ES2018+ environments: /(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2/gu)
python - (?<!\d\.)(?<![^\W_])\d+(?:\.\d+)?m2
Note you may add \b word boundary after 2 to make sure there is a non-word char after it or end of string.
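As a rough illustration of the \b note, here is the "not preceded by any letter" variation with a trailing \b added. It assumes an ES2018+ environment (lookbehind and \p{L} support); the function name extractSurfaces is hypothetical:

```javascript
// Hypothetical helper: returns every "<number>m2" that is not preceded by a
// letter, a digit, or a digit plus dot, and is not followed by a word character.
function extractSurfaces(text) {
  var re = /(?<![\p{L}\d]|\d\.)\d+(?:\.\d+)?m2\b/gu;
  return text.match(re) || []; // match() returns null when nothing is found
}

console.log(extractSurfaces("12m2 and 15m2x"));
// [ '12m2' ]  ("15m2x" is rejected because of the trailing \b)
console.log(extractSurfaces("81.95m2(24.78坪)、うち2階車庫8.9m2"));
// [ '81.95m2' ]
```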

How use regexp match all repeat substring in javascript?

How can I use a regexp to match all repeated substrings in JavaScript?
For example:
Get [ "cd","cd","cdcd","cdcdcd", "cdcdcdcd" ] from "abccdddcdcdcdcd123"
The + quantifier alone does not work:
"abccdddcdcdcdcd123".match(/(cd)+/g)
Array [ "cd", "cdcdcdcd" ]
This can be done with a positive lookahead (?=...). This type of matching doesn't move the cursor forward, so you can match the same content multiple times.
var re = /cd(?=((cd)*))/g;
var str = "abccdddcdcdcdcd123";
var m;
while (m = re.exec(str)) {
console.log(m[0]+m[1]);
}
m[0] (the overall match) holds the first cd, and the capture group inside the positive lookahead (m[1]) holds all the cd repetitions that follow it. Concatenating the two gives the desired result.
See https://www.regular-expressions.info/refadv.html
Matches at a position where the pattern inside the lookahead can be matched. Matches only the position. It does not consume any characters or expand the match. In a pattern like one(?=two)three, both two and three have to match at the position where the match of one ends.
I guess you could also do it like this.
Put the capture group inside a lookahead assertion.
Most engines bump the current regex position if it didn't change since the
last match. JS's exec() does not, so you have to do it manually by incrementing lastIndex.
Readable regex
(?=
( # (1 start)
(?: cd )+
) # (1 end)
)
var re = /(?=((?:cd)+))/g;
var str = "abccdddcdcdcdcd123";
var m;
while (m = re.exec(str)) {
console.log( m[1] );
++re.lastIndex;
}
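In environments that have String.prototype.matchAll (ES2020+), the manual lastIndex bump should not be needed, because matchAll steps past empty matches on its own. A small sketch of the same lookahead idea; the function name overlappingRuns is made up here:

```javascript
// Hypothetical helper: collects the run of "cd" repetitions starting at every
// position where one begins. The lookahead keeps each match zero-length.
function overlappingRuns(str) {
  return Array.from(str.matchAll(/(?=((?:cd)+))/g), function (m) {
    return m[1];
  });
}

console.log(overlappingRuns("abccdddcdcdcdcd123"));
// [ 'cd', 'cdcdcdcd', 'cdcdcd', 'cdcd', 'cd' ]
```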
I think the common solution to an overlapping match problem like this should be as following:
/(?=((cd)+))cd/g
The lookahead captures one or more repetitions of the inner pattern into group 1 without consuming anything; the trailing cd then moves the cursor ahead two characters at a time. (We could also move by two dots ..).
Code sample:
var re = /(?=((cd)+))cd/g;
var str = "abccdddcdcdcdcd123";
var m; //var arr = new Array();
while (m = re.exec(str)) {
//arr.push(m[1]);
console.log(m[1]);
}
We get the result from group 1 via m[1].
Use .push(m[1]); to add it to an array.

getting values from a string using regular expression

Could anyone help me with this regular expression issue?
expr = /\(\(([^)]+)\)\)/;
input = ((111111111111))
the one I would need to be working is = ((111111111111),(222222222),(333333333333333))
That expression works fine to get 111111 from the first input, but not when the groups 2222... and 3333... are also present. The input varies: it could be ((111111111111)), the one above, or something different, but it always follows the same parenthesis pattern.
Is there any reg expression to extract the values for both cases to an array?
The result I would like to come to is:
[0] = "111111"
[1] = "222222"
[2] = "333333"
Thanks
If you are trying to validate the format while extracting the desired parts, you could use the sticky y flag. With this flag, the first match must start at the beginning of the input, and each following match must start exactly where the previous one ended. This approach needs one input string at a time.
Regex:
/^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/yg
Breakdown:
^\(\( Match beginning of input and immediately ((
( Start of capturing group #1
[^)]+ Match anything but )
)\) End of CG #1, match ) immediately
| Or
(?!^) Next patterns shouldn't start at beginning
(?: Start of non-capturing group
,\(([^)]+)\) Match a comma-separated group (capture value in CG #2, same pattern as above)
| Or
\)$ Match ) and end of input
) End of group
JS code:
var str = '((111111111111),(222222222),(333333333333333))';
console.log(
str.replace(/^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/yg, '$1$2\n')
.split(/\n/).filter(Boolean)
);
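To make the validation part explicit, the same sticky regex can drive an exec() loop that returns null when the input does not follow the expected format. This is only a sketch; parseGroups is a hypothetical name, and the "return null on bad input" convention is an assumption, not part of the answer above:

```javascript
// Hypothetical helper: returns the values, or null if the format is invalid.
function parseGroups(input) {
  // Same pattern as above, with only the sticky flag (a fresh regex per call).
  var re = /^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/y;
  var values = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    if (m[0] === ")") {
      return values; // the final ")" matched at end of input: format is valid
    }
    values.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return null; // matching stopped before the closing ")" was reached
}

console.log(parseGroups("((111111111111),(222222222),(333333333333333))"));
// [ '111111111111', '222222222', '333333333333333' ]
console.log(parseGroups("((111),222)"));
// null
```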
You can remove the parentheses, split the result on , and then use substring to take the required number of characters from each piece.
input.replace(/\(/g, '').replace(/\)/g, '')
This will replace all the ( and ) and return a string like
111111111111,222222222,333333333333333
Now splitting this string on , produces the array you want:
var input = "((111111111111),(222222222),(333333333333333))";
var numbers = input.replace(/\(/g, '').replace(/\)/g, '');
numbers.split(",").forEach(o => console.log(o.substring(0,6)));
If the level of nesting is fixed, you can just leave out the outer () from the pattern, and add the left parenthesis to the [^)] character class:
var expr = /\(([^()]+)\)/g;
var input = '((111111111111),(222222222),(333333333333333))';
var match = null;
while(match = expr.exec(input)) {
console.log(match[1]);
}
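To get an array rather than console output, the same pattern can be combined with matchAll (ES2020+). The sketch below also truncates each value to six characters, which is an assumption based on the result shown in the question; firstSixOfEachGroup is a made-up name:

```javascript
// Hypothetical helper: collects the content of each innermost (...) group,
// cut to the first six characters (assumed from the question's sample output).
function firstSixOfEachGroup(input) {
  return Array.from(input.matchAll(/\(([^()]+)\)/g), function (m) {
    return m[1].slice(0, 6);
  });
}

console.log(firstSixOfEachGroup("((111111111111),(222222222),(333333333333333))"));
// [ '111111', '222222', '333333' ]
```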

Get the string between the last 2 / in regex in javascript

How can I get the strings between last 2 slashes in regex in javascript?
for example:
stackoverflow.com/questions/ask/index.html => "ask"
http://regexr.com/foo.html?q=bar => "regexr.com"
https://www.w3schools.com/icons/default.asp => "icons"
You can use /\/([^/]+)\/[^/]*$/; [^/]*$ matches everything after the last slash, \/([^/]+)\/ matches the last two slashes, then you can capture what is in between and extract it:
var samples = ["stackoverflow.com/questions/ask/index.html",
"http://regexr.com/foo.html?q=bar",
"https://www.w3schools.com/icons/default.asp"]
console.log(
samples.map(s => s.match(/\/([^/]+)\/[^/]*$/)[1])
)
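Note that match() returns null when a string has fewer than two slashes, so indexing [1] directly would throw. A guarded variant might look like this (betweenLastTwoSlashes is a hypothetical name, and returning null for non-matching input is an assumption):

```javascript
// Hypothetical helper: returns the segment between the last two slashes,
// or null when the string does not contain such a segment.
function betweenLastTwoSlashes(url) {
  var m = url.match(/\/([^/]+)\/[^/]*$/);
  return m ? m[1] : null;
}

console.log(betweenLastTwoSlashes("stackoverflow.com/questions/ask/index.html")); // ask
console.log(betweenLastTwoSlashes("http://regexr.com/foo.html?q=bar"));           // regexr.com
console.log(betweenLastTwoSlashes("no-slashes-here"));                            // null
```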
You can solve this by using split().
let a = 'stackoverflow.com/questions/ask/index.html';
let b = 'http://regexr.com/foo.html?q=bar';
let c = 'https://www.w3schools.com/icons/default.asp';
a = a.split('/')
b = b.split('/')
c = c.split('/')
Then index into the result of split():
console.log(a[a.length-2])
console.log(b[b.length-2])
console.log(c[c.length-2])
I personally do not recommend using regex here, because it is hard to maintain.
I believe that will do:
[^\/]+(?=\/[^\/]*$)
[^\/]+ matches a run of characters other than /. The lookahead (?=\/[^\/]*$) that follows it requires this run to sit right before the last / in the string.
var urls = [
"stackoverflow.com/questions/ask/index.html",
"http://regexr.com/foo.html?q=bar",
"https://www.w3schools.com/icons/default.asp"
];
urls.forEach(url => console.log(url.match(/[^\/]+(?=\/[^\/]*$)/)[0]));
You can use (?=[^/]*\/[^/]*$)(.*?)(?=\/[^/]*$). You can test it here: https://www.regexpal.com/
The format of the regex is: (positive lookahead for second last slash)(.*?)(positive lookahead for last slash).
The (.*?) is a lazy match for what's between the slashes.
references:
Replace second to last "/" character in URL with a '#'
RegEx that will match the last occurrence of dot in a string

Regex match the prefix and suffix of strings ending with numbers

Using a regex I would like to find out the prefix and suffix of string like these:
T12231 should match ['T', '12231']
Acw2123 should match ['Acw', '2123']
121Ab should match ['121Ab', null]
1213 should match [null, '1213']
Matching only the numbers at the end of the string is easily done with this regex /([0-9]+)$/g.
Matching everything from the beginning of the string up to this point I did not manage to do. The closest I got was for the 1st group to match everything but the last number with /^(.*)([0-9]+)$/g.
You can make the first capture group lazy with .*? so it matches as little as possible, which lets the second capture group match as many trailing digits as possible:
var s = ["T12231", "Acw2123", "121Ab", "1213"];
console.log(
s.map(x => x.replace(/^(.*?)([0-9]*)$/, "$1 $2"))
);
Push the split result into an array:
var s = ["T12231", "Acw2123", "121Ab", "1213"];
var arr = [];
s.forEach(x => x.replace(/^(.*?)([0-9]*)$/, (string, $1, $2) => arr.push([$1, $2])));
console.log(arr);
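The regex above yields empty strings for a missing prefix or suffix, while the question asks for null. One possible way to bridge that, sketched with a hypothetical helper named splitTrailingDigits and assuming single-line input:

```javascript
// Hypothetical helper: splits a string into [prefix, trailingDigits],
// using null where a part is empty.
function splitTrailingDigits(s) {
  var m = s.match(/^(.*?)([0-9]*)$/);
  if (!m) {
    return [null, null]; // assumption: input with line breaks is not handled
  }
  return [m[1] || null, m[2] || null];
}

console.log(["T12231", "Acw2123", "121Ab", "1213"].map(splitTrailingDigits));
// [ [ 'T', '12231' ], [ 'Acw', '2123' ], [ '121Ab', null ], [ null, '1213' ] ]
```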
You are almost right. Try using this:
var re = /^(.*?)(\d+.*)$/g;
var groups = re.exec(your_string)
Satisfies all cases
^(?=\d|.*\d$)((?:(?!\d+$).)*)(\d*)$
https://regex101.com/r/BWwsIA/1
^ # BOS
(?= \d | .* \d $ ) # Must begin or end with digit
( # (1 start)
(?: # Cluster begin
(?! \d+ $ ) # Not digits then end
. # Any char
)* # Cluster end, 0 to many times
) # (1 end)
( \d* ) # (2)
$ # EOS
