I'm trying to create a regex that selects the numbers (or the numbers with their commas, if that's easier; I can trim the commas later) that are not followed by parentheses. The numbers inside the parentheses should not be selected either.
It will be used with JavaScript's String.match method.
Example strings:
9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
What I have so far:
/((^\d+[^\(])|(,\d+,)|(,*\d+$))/gm
I tried this in regex101, underlining the numbers I would like to match and marking with an x the ones that should not match.
You could start with a substitution to remove all the unwanted parts:
/\d*\(.*?\),?//gm
This leaves you with
5,10
10,2,5,
10,7,2,4
which makes the matching pretty straightforward:
/(\d+)/gm
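Put together in JavaScript, the two steps might look like the sketch below. It assumes the input is one multi-line string and returns every remaining number as a string; the /m flag is dropped because neither pattern uses anchors.

```javascript
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;

// Step 1: drop every "number(...)" group together with its trailing comma, if any.
// `.` does not match newlines, so the lazy `.*?` never runs past the end of a line.
const cleaned = text.replace(/\d*\(.*?\),?/g, "");

// Step 2: whatever digits remain are the numbers we want.
// match() returns null when nothing matches, hence the fallback.
const numbers = cleaned.match(/\d+/g) ?? [];

console.log(numbers); // ["5", "10", "10", "2", "5", "10", "7", "2", "4"]
```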
If you want it as a single match expression you could use a negative lookbehind:
/(?<!\([\d,]*)(\d+)(?:,|$)/gm
Here's the same matching expression as runnable JavaScript (skeleton code borrowed from Wiktor's answer):
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/(?<!\([\d,]*)(\d+)(?:,|$)/gm), x=>x[1])
console.log(matches);
Here, I'd recommend the so-called "best regex trick ever": just match what you do not need (negative contexts) and then match and capture what you need, and grab the captured items only.
If you want to match integer numbers that are not part of a \d+\([^()]*\) match (a number followed by a parenthetical substring), you can match that pattern or else match and capture \d+ (one or more digits), and then simply grab the Group 1 values from the matches:
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/\d+\([^()]*\)|(\d+)/g), x=> x[1] ?? "").filter(Boolean)
console.log(matches);
Details:
text.matchAll(/\d+\([^()]*\)|(\d+)/g) - matches one or more digits (\d+) + ( (with \() + zero or more chars other than ( and ) (with [^()]*) + ) (with \)), or (|) one or more digits captured into Group 1 ((\d+))
Array.from(..., x=> x[1] ?? "") - gets Group 1 value, or, if not assigned, just adds an empty string
.filter(Boolean) - removes empty strings.
Using several replacement regexes
var textA = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`
console.log('A', textA)
var textB = textA.replace(/\(.*?\),?/g, ';')
console.log('B', textB)
var textC = textB.replace(/^\d+|\d+$|\d*;\d*/gm, '')
console.log('C', textC)
var textD = textC.replace(/,+/g, ' ').trim() // trim() takes no arguments; it strips surrounding whitespace
console.log('D', textD)
With a loop
Here is a solution which splits the lines on comma and loops over the pieces:
var inside = false;
var result = [];
`9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`.split("\n").forEach(line => {
let pieceArray = line.split(",")
pieceArray.forEach((piece, k) => {
if (piece.includes('(')) {
inside = true
} else if (piece.includes(')')) {
inside = false
} else if (!inside && k > 0 && k < pieceArray.length-1 && !pieceArray[k-1].includes(')')) {
result.push(piece)
}
})
})
console.log(result)
It does print the expected result: ["5", "7"]
I have a markdown text file with links like this:
[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)
or
[Text2](https://docs.google.com/document/d/unique-doc-id-here")
I want to replace the whole href with another one: take the unique-doc-id-here, pass it to a function that returns a new href, so that my URLs end up looking something like this:
[Text](https://new-url-here.com/fragment-unique-id)
or
[Text2](https://new-url-here.com/fragment-unique-id)
I think my problem is selecting the unique-doc-id-here, and I think I need a regex for that.
So the solution could look like this:
text.replace(/https:\/\/docs.google.com\/document\/d\/(.*?)*/gm, (x) =>
this.getNewHref(x)
);
However, the regex does not seem quite right, because it does not match all the cases. Any ideas how to fix it?
Here is an input text example:
# Title
Text text text.
Text 1 text 1 text 1, abc.
More text
Bullet points
- [abc]
- [bla]
- [cba]
## Title 2
More text:
- A
- B
- C
- D
Text text text text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.
**BOLD.**
## Title
Text2 text1 text3 text
[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.
More text here
[bla](https://docs.google.com/document/d/6an7_b4Mb0OdxNZdfD3KedfvFtdf2OeGzG40ztfDhi5o9uU/edit)
I've tried this regex, \w+:\/\/.*?(?=\s), but it also selects the trailing ) symbol.
I've applied the solution proposed by @The fourth bird:
function getNewHref(id: string) {
const data = getText();
const element = data.find((x: any) => x.id === id);
if(element?.url) {
return element.url;
} else {
return 'unknown-url'
}
}
data = data.replace(
/\[[^\][]*]\(https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/gm,
(x, g1) => getNewHref(g1)
);
The problem is that the replace function replaces the whole thing, so what was [...](...) becomes ./new-url or unknown-url, but it needs to be [original text](new result).
You can make the pattern more specific, and then use the group 1 value.
(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)
The pattern in parts matches:
(\[[^\][]*]\() Capture group 1, match from [...]( using a negated character class
https?:\/\/docs\.google\.com\/document\/d\/ Match the leading part of the url
( Capture group 2
[^\s\\\/)]+ Match 1+ chars other than a whitespace char, \, / or )
) Close group 2
[^\s)]* Match optional chars other than a whitespace char or )
\) Match )
For example, a happy-case scenario where all the keys to be replaced exist (note that you can omit the /m flag, as there are no anchors in the pattern):
const text = "[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)";
const regex = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;
function getNewHref(id) {
const replacements = {
"unique-doc-id-here": `https://docs.google.com/document/d/${id}`
}
return replacements[id];
}
const replacedText = text.replace(regex, (x, g1, g2) => g1 + getNewHref(g2) + ")"); // the pattern consumes the closing ), so add it back per match
console.log(replacedText);
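The snippet above only covers a single link. A slightly fuller sketch runs the same regex over several links taken from the question's sample text. `newHrefById` and the `new-url-here.com` targets are hypothetical stand-ins for whatever `getText()` returns. Unknown IDs are left untouched here, which is a design choice; the question's `getNewHref` returns 'unknown-url' instead.

```javascript
// Hypothetical lookup table standing in for the data returned by getText().
const newHrefById = new Map([
  ["2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr", "https://new-url-here.com/fragment-1"],
  ["4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb", "https://new-url-here.com/fragment-2"],
]);

const regex = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;

function replaceDocLinks(markdown) {
  // g1 = "[link text](", g2 = the document id
  return markdown.replace(regex, (match, g1, g2) => {
    const href = newHrefById.get(g2);
    // Keep the original link when the id is unknown; otherwise rebuild [text](href).
    return href === undefined ? match : `${g1}${href})`;
  });
}

const input = [
  "Text text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.",
  "[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.",
  "[bla](https://docs.google.com/document/d/6an7_b4Mb0OdxNZdfD3KedfvFtdf2OeGzG40ztfDhi5o9uU/edit)",
].join("\n");

console.log(replaceDocLinks(input));
```

Because the link text is captured in group 1 and written back, only the URL part of each markdown link changes.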
You can achieve this by extracting the href link from the string with a regex and then splitting it on forward slashes.
Try this (descriptive comments have been added in the code snippet below):
const text = '<a href="https://docs.google.com/document/d/unique-doc-id-here/edit">Text</a>';
// Get the href link using regex
const link = text.match(/"([^"]*)"/)[1];
// Split the string and get the array of link based on the forward slash.
const linkArr = link.split('/')
// get the unique ID from an array.
const uniqueID = linkArr[linkArr.indexOf('d') + 1]
console.log(uniqueID);
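The same split-on-slash idea also works directly on the markdown links from the question. Here is a minimal sketch, assuming each link target is a well-formed absolute URL. Going through the standard `URL` class's `pathname` keeps a trailing `#bookmark=...` fragment or a query string out of the path segments.

```javascript
const markdown =
  "[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.";

// Grab the link target between "](" and the closing ")".
const href = markdown.match(/\]\(([^)\s]+)\)/)[1];

// pathname excludes the "#..." fragment and "?..." query string,
// so the segments are ["", "document", "d", <id>, "edit"].
const segments = new URL(href).pathname.split("/");
const uniqueID = segments[segments.indexOf("d") + 1];

console.log(uniqueID); // 4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb
```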
I’m struggling with this simple regex that is not working correctly in Safari:
(?<=\?.*)\?
It should match every ?, except the first one.
I know that lookbehind is not working on Safari yet, but I need to find some workaround for it. Any suggestions?
You can use an alternation: capture everything up to and including the first question mark, and use that group again in the replacement to leave it unmodified.
In the second part of the alternation, match a question mark to be replaced.
const regex = /^([^?]*\?)|\?/g;
const s = "test ? test ? test ?? test /";
console.log(s.replace(regex, (m, g1) => g1 ? g1 : "[REPLACE]"));
There are always alternatives to lookbehinds.
In this case, all you need to do is replace all instances of a character (sequence), except the first.
The .replace method accepts a function as the second argument.
That function receives the full match, each capture group match (if any), the offset of the match, and a few other things as parameters.
.indexOf can report the first offset of a match.
Alternatively, .search can also report the first offset of a match, but works with regexes.
The two offsets can be compared inside the function:
const yourString = "Hello? World? What? Who?",
yourReplacement = "!",
pattern = /\?/g,
patternString = "?",
firstMatchOffsetIndexOf = yourString.indexOf(patternString),
firstMatchOffsetSearch = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetIndexOf){
return yourReplacement;
}
return match;
}));
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstMatchOffsetSearch){
return yourReplacement;
}
return match;
}));
This works for character sequences, too:
const yourString = "Hello. Hello. Hello. Hello.",
yourReplacement = "Hi",
pattern = /Hello/g,
firstOffset = yourString.search(pattern);
console.log(yourString.replace(pattern, (match, offset) => {
if(offset !== firstOffset){
return yourReplacement;
}
return match;
}));
Split on ? and join the rest with the replacement:
var s = "one ? two ? three ? four"
var l = s.split("?") // Split with ?
var first = l.shift() // Get first item and remove from l
console.log(first + "?" + l.join("<REPLACED>")) // Build the results
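One caveat with split and join: if the string contains no ? at all, `first + "?"` appends a question mark that was never there. Here is a small sketch that guards against that case. The function name `replaceAllButFirst` is only illustrative.

```javascript
// Replace every occurrence of `sep` except the first one.
function replaceAllButFirst(str, sep, replacement) {
  const parts = str.split(sep);
  // No separator found: nothing to keep or replace.
  if (parts.length === 1) return str;
  const first = parts.shift();
  // Re-attach the first separator unchanged, then replace all the others.
  return first + sep + parts.join(replacement);
}

console.log(replaceAllButFirst("one ? two ? three ? four", "?", "<REPLACED>"));
// one ? two <REPLACED> three <REPLACED> four
console.log(replaceAllButFirst("no question marks", "?", "<REPLACED>"));
// no question marks
```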
Examples of filenames
FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_fr-fr-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_de-de-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
My regex is FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
The only part I need is the translation code, i.e. 'en-gb', 'fr-fr', 'de-de'.
How do I extract just that part of the filename?
I modified the regex a little to match both the numbers and the text.
Explanation
To capture a group, wrap that part of the regex in (); it is then captured as a group.
For named capturing, use (?<name_of_group>...), and then you can access the group by name.
Here is how the matching works:
[a-z]{2} matches 2 chars from a-z
[a-zA-Z0-9] matches any char in a-z, A-Z or 0-9
g is the global flag, i.e. match all.
i means ignore case.
var r = /FDIP_([a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi;
let t = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
let dd = r.exec(t);
console.log(dd[1]);
This is an example of named group capturing.
Note that the group name in the regex matches the name used in the object destructuring.
const { groups: { language } } = /FDIP_(?<language>[a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi.exec('FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt');
console.log(language);
To solve your problem, you should:
Fix your regex:
FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
// to
FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt
Get the value of the first group with the regex's exec method:
const fileNames = [
'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt',
'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt',
'FDIP_de-de-nn_Text_v1_20180808_12345678901234.txt']
const cultureNames = fileNames.map(name => {
const matched = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt/.exec(name)
return matched && matched[1]
})
console.log(cultureNames)
Change FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
to
let pattern = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[\w]{8}_[\w]{14}.txt/;
var str = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
console.log(str.match(pattern)[1]);
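In all of the patterns above, the dot in `.txt` is unescaped, so it matches any character. Without anchors, the pattern can also match inside a longer string. Here is a sketch of a stricter variant, assuming the date and sequence-number parts may contain letters or digits, as in the placeholder names:

```javascript
// ^ and $ anchor the whole filename; \. matches a literal dot only.
const pattern = /^FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[A-Za-z0-9]{8}_[A-Za-z0-9]{14}\.txt$/;

// Returns the translation code (group 1), or null if the name does not fit.
function translationCode(fileName) {
  const m = pattern.exec(fileName);
  return m ? m[1] : null;
}

console.log(translationCode("FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt")); // en-gb
console.log(translationCode("FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt")); // fr-fr
// "_txt" instead of ".txt" is rejected (an unescaped dot would accept it).
console.log(translationCode("FDIP_de-de-nn_Text_v1_20180808_12345678901234_txt")); // null
```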
Consider the following URLs:
<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>
I'm trying to figure out a regex that captures the URL after < up until | or >.
I've tried URL.match(/<([^>|\|]+)/g), but it always captures the leading < as well.
Desired output is simply: http://www.google.com
The regex is correct, but String#match with the g flag returns only the full matches, not the capture groups. You need to extract the first captured group.
Use RegExp#exec to get the URLs.
var str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;
var regex = /<([^>|\|]+)/g;
var urls = [];
var match; // declared explicitly; an implicit global throws in strict mode
while ((match = regex.exec(str)) !== null) {
urls.push(match[1]); // Get first captured group, and push in array
}
console.log(urls);
document.body.innerHTML = '<pre>' + JSON.stringify(urls, 0, 4) + '</pre>';
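On engines that support String#matchAll (ES2020), the loop can be replaced by collecting group 1 from each match. Here is a sketch using the same pattern, with the redundant second | dropped from the character class:

```javascript
const str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;

// matchAll requires the g flag and yields one match object per hit;
// m[1] is the text between "<" and the first "|" or ">".
const urls = Array.from(str.matchAll(/<([^>|]+)/g), m => m[1]);

console.log(urls); // ["http://www.google.com", "http://www.google.com", "http://google.com"]
```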
You can also use String#match as follows:
str.match(/[^<>|\s]+/g)
var str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;
var urls = str.match(/[^<>|\s]+/g);
console.log(urls);
document.body.innerHTML = '<pre>' + JSON.stringify(urls, 0, 4) + '</pre>';
This kind of pattern doesn't require complex regular expressions! You can use a simple pattern and string operations:
var url = "<http://www.google.com|www.google.com>";
var parts = url.replace(/^<|>$/g, "").split("|"); // g is needed to strip both the leading < and the trailing >
For the solution using regex, try the following:
var url = "<http://www.google.com|www.google.com>";
var match = /(?:<|\|)([^>|]+)/g.exec(url);
You can then access the value of the first capturing group like this:
var url = match[1];
By calling exec on the same regular expression several times, you can find multiple matches (the multiple URLs you're looking for).
Explanation of the regular expression:
(?:<|\|) is a non-capturing group ((?: ... )) that looks for either a < or a | symbol at the beginning. (In your case, every URL will either have a < or a | on the left side of it!)
([^>|]+) is a capturing group (( ... )) that captures a sequence of characters that are not > or |. You don't need to escape the | within a character class; it only has special meaning outside of one.
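Calling exec repeatedly only advances through the string if the same RegExp object is reused, because the current position is kept in its lastIndex property. A regex literal written inside the loop condition creates a fresh object on every evaluation and would start from the beginning each time. Here is a sketch on the three sample URLs. Note that with this pattern, the text after | is captured as well:

```javascript
const text = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;

// Store the regex once: exec() resumes from regex.lastIndex on each call.
const regex = /(?:<|\|)([^>|]+)/g;
const found = [];
let match;
while ((match = regex.exec(text)) !== null) {
  found.push(match[1]); // group 1: the text after "<" or "|"
}

console.log(found);
// ["http://www.google.com", "http://www.google.com", "www.google.com", "http://google.com", "google.com"]
```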
str.match(/(http\:.*)(?=\|)/)[0]
var strs = ["<http://www.google.com>",
"<http://www.google.com|www.google.com>",
"<http://google.com|google.com>"];
strs.forEach(function(str) {
// if `str` contains `|` character,
// match characters that are followed by `|`
if (/\|/.test(str)) {
console.log(str.match(/(http\:.*)(?=\|)/)[0])
}
// else match characters that are not `<`, `>`
else {
console.log(str.match(/[^<>]+/)[0])
}
})
var url1 = '<http://www.google.com|www.example.com>',
url2 = '<http://www.yahoo.com>';
console.log(url1.replace(/<|>/g, '').split('|'));
console.log(url2.replace(/<|>/g, '').split('|'));