Replace capture group of dynamic size - javascript

I want to replace one capture group of a regex match with asterisks, one asterisk per character. Which group depends on the regex, for example:
Case 1
http://example.com/path1/path2?abcd => http://example.com/path1/**********
Regex 1: /^(https?:\/\/.+\/path1\/?)(.+)/, but I want each character in group 2 to be replaced individually with *
or
Case 2
person#example.com => ******#example.com
Regex 2: /^(.+)(#.+)$/, and similarly I want all characters in the first capture group to be replaced individually with *
I have tried to use capture groups, but then I'm left with a single asterisk (e.g. *#example.com):
let email = `person#example.com`;
let emailRegex = /^(.+)(#.+)$/;
console.log(email.replace(emailRegex, '*$2')); // *#example.com
let url = `http://example.com/path1/path2?abcd`;
let urlRegex = /^(https?:\/\/.+\/path1\/?)(.+)/;
console.log(url.replace(urlRegex, '$1*')); // http://example.com/path1/*

You may use
let email = `person#example.com`;
let regex = /[^#]/gy;
console.log(email.replace(regex, '*'));
// OR
console.log(email.replace(/(.*)#/, function ($0,$1) {
return '*'.repeat($1.length) + "#";
}));
and
let url = `http://example.com/path1/path2?abcd`;
let regex = /^(https?:\/\/.+\/path1\/?)(.*)/gy;
console.log(url.replace(regex, (_,$1,$2) => `${$1}${'*'.repeat($2.length)}` ));
// OR
console.log(url.replace(regex, function (_,$1,$2) {
return $1 + ('*'.repeat($2.length));
}));
In case of .replace(/[^#]/gy, '*'), each char other than # from the start of the string is replaced with * (so, up to the first #).
In case of .replace(/(.*)#/, function ($0,$1) { return '*'.repeat($1.length) + "#"; }), all chars up to the last # are captured into Group 1, and the match is replaced with as many asterisks as Group 1 has characters, followed by the # char (the # must be added back in the replacement because the regex consumes it).
The .replace(regex, (_,$1,$2) => `${$1}${'*'.repeat($2.length)}` ) follows the same logic as the case described above: you capture the part you need to replace, pass it into the anonymous callback method and manipulate its value using a bit of code.
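To make the callback idea reusable, here is a small sketch of a generic helper. The name maskGroup is hypothetical (it is not part of the answer above), and it assumes the regex has no named groups and that its groups, joined in order, cover the whole match, which holds for both regexes in the question.

```javascript
// Hypothetical helper: masks capture group `maskIndex` (1-based) with one "*"
// per character and keeps the other groups verbatim.
// Assumptions: no named groups, and the groups concatenated in order cover the
// whole match (true for both regexes in the question).
function maskGroup(input, regex, maskIndex) {
  return input.replace(regex, (...args) => {
    // Replacer arguments: match, group1..groupN, offset, whole string.
    const groups = args.slice(1, -2);
    return groups
      .map((g = '', i) => (i + 1 === maskIndex ? '*'.repeat(g.length) : g))
      .join('');
  });
}

console.log(maskGroup('http://example.com/path1/path2?abcd', /^(https?:\/\/.+\/path1\/?)(.*)/, 2));
console.log(maskGroup('person#example.com', /^(.+)(#.+)$/, 1));
```

The first call prints the URL with path2?abcd masked, the second prints ******#example.com.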

You can use the sticky flag y (but Internet Explorer doesn't support it):
s = s.replace(/(^https?:\/\/.*?\/path1\/?|(?!^))./gy, '$1*')
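But the simplest approach, supported by every engine, is to use a function as the replacement parameter (String.prototype.repeat, used below, is ES2015, so Internet Explorer needs a polyfill for it):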
s = s.replace(/^(https?:\/\/.+\/path1\/?)(.*)/, function (_, m1, m2) {
return m1 + '*'.repeat(m2.length);
});
For the second case, you can simply check whether there's a # somewhere after the current position:
s = s.replace(/.(?=.*#)/g, '*');
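To see why the sticky-flag pattern works, it can help to inspect each match separately. This sketch assumes an engine with String.prototype.matchAll (ES2020). It shows that only the first match carries the prefix in group 1, while every later match has an empty group 1 and consumes a single character, so '$1*' turns it into one asterisk. The lookahead version for the email case is included for comparison.

```javascript
const url = 'http://example.com/path1/path2?abcd';
// Same pattern as above. With the y flag each match must start exactly where
// the previous one ended, so the chain stops at the first position where
// neither alternative matches (here: the end of the string).
const sticky = /(^https?:\/\/.*?\/path1\/?|(?!^))./gy;

// First match: group 1 holds the prefix and "." eats one more char.
// Later matches: group 1 is the empty string and "." eats a single char.
const steps = [...url.matchAll(sticky)].map(m => ({ index: m.index, group1: m[1], matched: m[0] }));
console.log(steps.length, steps[0].group1, steps[1]);

console.log(url.replace(sticky, '$1*'));

// Email case: every char that still has a "#" somewhere after it.
const email = 'person#example.com';
console.log(email.replace(/.(?=.*#)/g, '*'));
```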

Related

Regex expression to get numbers without parentheses ()

I'm trying to create a regex that selects the numbers that are not followed by parentheses (numbers with their commas are fine if that's easier; I can trim the commas later). The numbers inside the parentheses should not be selected either.
It will be used with JavaScript's String.match method.
Example strings
9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
What I have so far:
/((^\d+[^\(])|(,\d+,)|(,*\d+$))/gm
I tried this in regex101 and underlined the numbers I would like to match and put an x on the ones that should not match.
You could start with a substitution that removes all the unwanted parts, replacing every match of this pattern with an empty string:
/\d*\(.*?\),?/gm
This leaves you with
5,10
10,2,5,
10,7,2,4
which makes the matching pretty straightforward:
/(\d+)/gm
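The two steps can be combined into a short runnable sketch (the variable names are illustrative):

```javascript
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;

// Step 1: delete every "number(...)" group together with an optional trailing comma.
const stripped = text.replace(/\d*\(.*?\),?/gm, '');

// Step 2: every remaining run of digits is a wanted number.
const numbers = stripped.match(/\d+/g);

console.log(stripped);
console.log(numbers);
```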
If you want it as a single match expression you could use a negative lookbehind:
/(?<!\([\d,]*)(\d+)(?:,|$)/gm
Here's the same matching expression as runnable JavaScript (skeleton code borrowed from Wiktor's answer):
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/(?<!\([\d,]*)(\d+)(?:,|$)/gm), x=>x[1])
console.log(matches);
Here, I'd recommend the so-called "best regex trick ever": just match what you do not need (negative contexts) and then match and capture what you need, and grab the captured items only.
If you want to match integer numbers that are not part of a \d+\([^()]*\) match (a number followed by a parenthetical substring), you can either match that pattern or match and capture \d+ (one or more digits), and then simply grab the Group 1 values from the matches:
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/\d+\([^()]*\)|(\d+)/g), x=> x[1] ?? "").filter(Boolean)
console.log(matches);
Details:
text.matchAll(/\d+\([^()]*\)|(\d+)/g) - matches one or more digits (\d+), a ( (\(), zero or more chars other than ( and ) ([^()]*) and a closing ) (\)), or (|) one or more digits captured into Group 1 ((\d+))
Array.from(..., x=> x[1] ?? "") - gets Group 1 value, or, if not assigned, just adds an empty string
.filter(Boolean) - removes empty strings.
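If the numbers are needed per line rather than as one flat list, the same alternation can be applied line by line. This is a sketch that assumes lines are separated by \n; it uses filter(Boolean) directly to drop the undefined Group 1 values instead of mapping them to empty strings first.

```javascript
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;

// Same alternation as above: the left branch swallows "number(...)" groups,
// the right branch captures a stand-alone number in Group 1.
const pattern = /\d+\([^()]*\)|(\d+)/g;

const perLine = text
  .split('\n')
  .map(line => Array.from(line.matchAll(pattern), m => m[1]).filter(Boolean));

console.log(perLine);
```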
Using several replacement regexes
var textA = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`
console.log('A', textA)
var textB = textA.replace(/\(.*?\),?/g, ';')
console.log('B', textB)
var textC = textB.replace(/^\d+|\d+$|\d*;\d*/gm, '')
console.log('C', textC)
var textD = textC.replace(/,+/g, ' ').trim() // trim() takes no arguments; the commas were already replaced
console.log('D', textD)
With a loop
Here is a solution which splits the lines on comma and loops over the pieces:
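var inside = false; // true while we are between a "(" and a ")" that span several pieces
var result = [];
`9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`.split("\n").forEach(line => {
let pieceArray = line.split(",")
pieceArray.forEach((piece, k) => {
if (piece.includes('(')) {
// a piece like "3(123)" opens and closes its group in one go
inside = !piece.includes(')')
} else if (piece.includes(')')) {
inside = false
} else if (!inside && k > 0 && k < pieceArray.length-1 && !pieceArray[k-1].includes(')')) {
// keep numbers that are neither first nor last on the line and do not directly follow a group
result.push(piece)
}
})
})
console.log(result)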
It does print the expected result: ["5", "7"]

Replacing a URL with another URL built from a value taken from it

I have a markdown text file with links like these:
[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)
or
[Text2](https://docs.google.com/document/d/unique-doc-id-here)
I want to replace the whole href with another one: take unique-doc-id-here and pass it to a function that returns a new href, so that my URLs end up looking something like this:
[Text](https://new-url-here.com/fragment-unique-id)
or
[Text2](https://new-url-here.com/fragment-unique-id)
I think my problem is selecting unique-doc-id-here, and that I need a regex for that.
So the solution could be looking like this:
text.replace(/https:\/\/docs.google.com\/document\/d\/(.*?)*/gm, (x) =>
this.getNewHref(x)
);
However, the regex does not look quite right, because it does not match all the cases. Any ideas how to fix it?
Here is an input text example:
# Title
Text text text.
Text 1 text 1 text 1, abc.
More text
Bullet points
- [abc]
- [bla]
- [cba]
## Title 2
More text:
- A
- B
- C
- D
Text text text text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.
**BOLD.**
## Title
Text2 text1 text3 text
[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.
More text here
[bla](https://docs.google.com/document/d/6an7_b4Mb0OdxNZdfD3KedfvFtdf2OeGzG40ztfDhi5o9uU/edit)
I've tried this regex \w+:\/\/.*?(?=\s), but it also selects the trailing ) symbol.
I've applied a solution proposed by The fourth bird:
function getNewHref(id: string) {
const data = getText();
const element = data.find((x: any) => x.id === id);
if(element?.url) {
return element.url;
} else {
return 'unknown-url'
}
}
data = data.replace(
/\[[^\][]*]\(https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/gm,
(x, g1) => getNewHref(g1)
);
The problem is that the replace function replaces the whole match, so what was [...](...) becomes ./new-url or unknown-url, but it needs to be [original text](new result).
You can make the pattern more specific: capture the [...]( part in group 1 so it can be put back in the replacement, and capture the document id in group 2.
(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)
The pattern in parts matches:
(\[[^\][]*]\() Capture group 1, match from [...]( using a negated character class
https?:\/\/docs\.google\.com\/document\/d\/ Match the leading part of the url
( Capture group 2
[^\s\\\/)]+ Match 1+ chars other than a whitespace char, \, / or )
) Close group 2
[^\s)]* Match optional chars other than a whitespace char or )
\) Match )
For example, a happy case scenario where all the keys to be replaced exist (note that you can omit the /m flag as there are no anchors in the pattern)
const text = "[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)";
const regex = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;
function getNewHref(id) {
const replacements = {
"unique-doc-id-here": `https://docs.google.com/document/d/${id}`
}
return replacements[id];
}
const replacedText = text.replace(regex, (x, g1, g2) => g1 + getNewHref(g2) + ")"); // re-add the ) consumed by the pattern
console.log(replacedText);
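The same idea extends to a whole document with several links, including ids that have no replacement. In this sketch, newHrefs and lookupNewHref are hypothetical stand-ins for the asker's getText()/getNewHref data, and the new URL is made up; unknown ids fall back to 'unknown-url' as in the question.

```javascript
// Hypothetical id -> new URL table standing in for the asker's getText() data.
const newHrefs = new Map([
  ['2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr', 'https://new-url-here.com/fragment-1'],
]);

// Unknown ids fall back to 'unknown-url', mirroring getNewHref in the question.
function lookupNewHref(id) {
  return newHrefs.get(id) ?? 'unknown-url';
}

// Group 1: "[link text](", group 2: the document id.
const docLinkRegex = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;

function replaceDocLinks(markdown) {
  // The pattern consumes the closing ")", so it is added back here.
  return markdown.replace(docLinkRegex, (_match, g1, g2) => `${g1}${lookupNewHref(g2)})`);
}

const input = `Text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.
[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.`;

console.log(replaceDocLinks(input));
```

Because the link text is captured in group 1 and written back, both links keep their original [url1] and [url2] labels.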
You can achieve this by extracting the href link from a string with a RegEx and then splitting it on forward slashes.
Try this (descriptive comments have been added to the snippet below):
const text = '<a href="https://docs.google.com/document/d/unique-doc-id-here/edit">Text</a>';
// Get the href link using regex
const link = text.match(/"([^"]*)"/)[1];
// Split the string and get the array of link based on the forward slash.
const linkArr = link.split('/')
// get the unique ID from an array.
const uniqueID = linkArr[linkArr.indexOf('d') + 1]
console.log(uniqueID);
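If the links are absolute URLs, a variation on the splitting approach is to let the built-in URL parser strip the query string and #fragment first. This sketch assumes a runtime with the global URL class (modern browsers, Node 10+); docIdFromLink is a hypothetical name.

```javascript
// Hypothetical helper; assumes `link` is an absolute URL like the ones in the question.
function docIdFromLink(link) {
  // URL#pathname excludes the query string and #fragment, e.g.
  // "/document/d/<id>/edit" splits into ["", "document", "d", "<id>", "edit"].
  const segments = new URL(link).pathname.split('/');
  const i = segments.indexOf('d');
  // The id is the segment right after "d"; undefined if there is no "d" segment.
  return i !== -1 ? segments[i + 1] : undefined;
}

console.log(docIdFromLink('https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c'));
```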

Match all instances of character except the first one, without lookbehind

I’m struggling with this simple regex that is not working correctly in Safari:
(?<=\?.*)\?
It should match every ? except the first one.
I know that lookbehind is not working on Safari yet, but I need to find some workaround for it. Any suggestions?
You can use an alternation: in the first branch, capture everything up to and including the first question mark, and use that group again in the replacement to leave it unmodified.
In the second branch, match a question mark to be replaced.
const regex = /^([^?]*\?)|\?/g;
const s = "test ? test ? test ?? test /";
console.log(s.replace(regex, (m, g1) => g1 ? g1 : "[REPLACE]"));
There are always alternatives to lookbehinds.
In this case, all you need to do is replace all instances of a character (sequence), except the first.
The .replace method accepts a function as the second argument.
That function receives the full match, each capture group match (if any), the offset of the match, and a few other things as parameters.
.indexOf can report the first offset of a plain substring.
Alternatively, .search also reports the first offset of a match, but works with regexes.
The two offsets can be compared inside the function:
const yourString = "Hello? World? What? Who?",
yourReplacement = "!",
pattern = /\?/g,
patternString = "?",
firstMatchOffsetIndexOf = yourString.indexOf(patternString),
firstMatchOffsetSearch = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetIndexOf){
return yourReplacement;
}
return match;
}));
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetSearch){
return yourReplacement;
}
return match;
}));
This works for character sequences, too:
const yourString = "Hello. Hello. Hello. Hello.",
yourReplacement = "Hi",
pattern = /Hello/g,
firstOffset = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstOffset){
return yourReplacement;
}
return match;
}));
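A variation on the same callback technique tracks a counter instead of comparing offsets, so no separate indexOf/search call is needed. replaceAllButFirst is a hypothetical helper name, and the pattern must carry the g flag.

```javascript
// Hypothetical helper. `pattern` needs the g flag so that replace() visits
// every match; the callback leaves the first match alone and replaces the rest.
function replaceAllButFirst(input, pattern, replacement) {
  let seen = 0;
  return input.replace(pattern, match => (seen++ === 0 ? match : replacement));
}

console.log(replaceAllButFirst('Hello? World? What? Who?', /\?/g, '!'));
console.log(replaceAllButFirst('Hello. Hello. Hello. Hello.', /Hello/g, 'Hi'));
```

Since the string returned from the callback is used literally, a replacement containing $ patterns is not expanded.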
Alternatively, split on ? and join the remaining pieces with the replacement:
var s = "one ? two ? three ? four"
var l = s.split("?") // Split with ?
var first = l.shift() // Get first item and remove from l
console.log(first + "?" + l.join("<REPLACED>")) // Build the results
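One caveat with the split/join version: if the string contains no ? at all, it still appends a ?. A guarded sketch (replaceAfterFirst is a hypothetical name) returns the input unchanged in that case:

```javascript
// Hypothetical helper: keeps the first `sep` and joins the remaining pieces with `replacement`.
function replaceAfterFirst(s, sep, replacement) {
  const parts = s.split(sep);
  if (parts.length === 1) return s; // separator absent: nothing to replace
  const first = parts.shift();
  return first + sep + parts.join(replacement);
}

console.log(replaceAfterFirst('one ? two ? three ? four', '?', '<REPLACED>'));
console.log(replaceAfterFirst('no question marks here', '?', '<REPLACED>'));
```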

Extract a part of a regex name

Examples of filenames
FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_fr-fr-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_de-de-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
My regex is FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
The only part I need is the translation code, which is 'en-gb', 'fr-fr' or 'de-de'.
How do I extract just that part of the filename?
I modified the regex a little so that the date and sequence-number parts accept both digits and letters (the example names use the placeholders YYYYMMDD and SequenceNumber).
Explanation
To capture a group, wrap that part of the regex in (); the matched text is then available as a numbered group.
For named capturing, use (?<name_of_group>...) and then access the group by name.
Here goes the matching process.
[a-z]{2} matches 2 chars from a-z
[a-zA-Z0-9] match any char of a-z or A-Z or 0-9
g means global flag i.e. match all.
i means ignore case.
var r = /FDIP_([a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi;
let t = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
let dd = r.exec(t);
console.log(dd[1]);
This is an example of named group capturing.
Note that the group name in the regex matches the name used in the object destructuring.
const { groups: { language } } = /FDIP_(?<language>[a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi.exec('FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt');
console.log(language);
To solve your problem, you should:
Fix your regex:
FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
// to
FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt
Get the value of the first group with the regex's exec method:
const fileNames = [
'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt',
'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt',
'FDIP_de-de-nn_Text_v1_20180808_12345678901234.txt']
const cultureNames = fileNames.map(name => {
const matched = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt/.exec(name)
return matched && matched[1]
})
console.log(cultureNames)
Change FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
to
let pattern = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[\w]{8}_[\w]{14}.txt/;
var str = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
console.log(str.match(pattern)[1]);
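All of the patterns above leave the dot in .txt unescaped, so it matches any character, and none of them is anchored. A stricter sketch, assuming real filenames contain digits where the examples show YYYYMMDD and SequenceNumber, escapes the dot, anchors the pattern and uses a named group (localeOf is a hypothetical name):

```javascript
// ^...$ anchors the whole name; \.txt only accepts a literal dot.
const fileNamePattern = /^FDIP_(?<locale>[a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_\d{8}_\d{14}\.txt$/;

// Hypothetical helper: returns e.g. "en-gb", or null when the name does not fit.
function localeOf(fileName) {
  const m = fileNamePattern.exec(fileName);
  return m ? m.groups.locale : null;
}

console.log(localeOf('FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt'));
console.log(localeOf('FDIP_en-gb-nn_Text_v1_20190101_12345678901234Xtxt'));
```

The pattern has no g flag, so exec keeps no lastIndex state between calls.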

RegEx To capture URL between angle brackets (or pipe)

Consider the following URLs:
<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>
I'm trying to figure out a RegEx that would capture the URL after < up until | OR >
I've tried URL.match(/<([^>|\|]+)/g) but the result always includes the leading <
Desired output is simply: http://www.google.com
The RegEx is correct, but with the g flag String#match returns only the complete matches (which include the <), not the capture groups. You need to extract the first captured group.
Use RegExp#exec to get the URLs.
var str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;
var regex = /<([^>|\|]+)/g;
var urls = [];
let match;
while ((match = regex.exec(str)) !== null) {
urls.push(match[1]); // Get first captured group, and push in array
}
console.log(urls);
document.body.innerHTML = '<pre>' + JSON.stringify(urls, 0, 4) + '</pre>';
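On engines that support String.prototype.matchAll (ES2020), the exec loop can be replaced with a single expression. This sketch also drops the duplicate | from the character class, since [^>|] already excludes both characters:

```javascript
const str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;

// matchAll yields one match object per hit; m[1] is the captured URL.
// [^>|] stops the capture at either ">" or "|".
const urls = Array.from(str.matchAll(/<([^>|]+)/g), m => m[1]);
console.log(urls);
```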
You can also use String#match as follows (note that this also returns the text after each |, such as www.google.com):
str.match(/[^<>|\s]+/g)
var str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;
var urls = str.match(/[^<>|\s]+/g);
console.log(urls);
document.body.innerHTML = '<pre>' + JSON.stringify(urls, 0, 4) + '</pre>';
This kind of pattern doesn't require complex regular expressions! You can use a simple pattern and string operations:
var url = "<http://www.google.com|www.google.com>";
var parts = url.replace(/^<|>$/g, "").split("|"); // g is needed to strip both < and >
For the solution using regex, try the following:
var url = "<http://www.google.com|www.google.com>";
var match = /(?:<|\|)([^>|]+)/g.exec(url);
You can then access the value of the first capturing group like this:
var url = match[1];
By calling exec on the same regular expression several times, you can find multiple matches (the multiple URLs you're looking for).
Explanation of the regular expression:
(?:<|\|) is a non-capturing group ((?: ... )) that looks for either a < or a | symbol at the beginning. (In your case, every URL will either have a < or a | on the left side of it!)
([^>|]+) is a capturing group (( ... )) that captures a sequence of characters that are not > or |. You don't need to escape the | within a character class; it only has special meaning outside of it.
str.match(/(http\:.*)(?=\|)/)[0]
var strs = ["<http://www.google.com>",
"<http://www.google.com|www.google.com>",
"<http://google.com|google.com>"];
strs.forEach(function(str) {
// if `str` contains `|` character,
// match characters that are followed by `|`
if (/\|/.test(str)) {
console.log(str.match(/(http\:.*)(?=\|)/)[0])
}
// else match characters that are not `<`, `>`
else {
console.log(str.match(/[^<>]+/)[0])
}
})
var url1 = '<http://www.google.com|www.example.com>',
url2 = '<http://www.yahoo.com>';
console.log(url1.replace(/<|>/g, '').split('|'));
console.log(url2.replace(/<|>/g, '').split('|'));
