Extract a part of a regex name - javascript

Examples of filenames
FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_fr-fr-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_de-de-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
REGEX is FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
The only part I need is the translation code, which is 'en-gb', 'fr-fr' or 'de-de'.
How do I extract just that part of the filename?

I modified the regex a little so that it matches both the numbers and the placeholder text.
Explanation
To capture a group, wrap that part of the regex in (); its match is then available by index.
To name the group, use (?<name_of_group>...); you can then access it by name.
Here is how the matching works.
[a-z]{2} matches 2 characters from a-z
[a-zA-Z0-9] matches any single character in a-z, A-Z or 0-9
g is the global flag, i.e. match all occurrences.
i means ignore case.
var r = /FDIP_([a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}\.txt/gi;
let t = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
let dd = r.exec(t);
console.log(dd[1]);
This is an example of named group capturing.
Note that the group name in the regex matches the name used in the object destructuring.
const { groups: { language } } = /FDIP_(?<language>[a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi.exec('FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt');
console.log(language);

To solve your problem, you should:
Fix your regex (the country code in your filenames is lowercase, and the group should end before the -nn suffix):
FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
// to
FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt
Then get the value of the first group by using the regex.exec function:
const fileNames = [
'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt',
'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt',
'FDIP_de-de-nn_Text_v1_20180808_12345678901234.txt']
const cultureNames = fileNames.map(name => {
const matched = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt/.exec(name)
return matched && matched[1]
})
console.log(cultureNames)

Change FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
to
let pattern = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[\w]{8}_[\w]{14}.txt/;
var str = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
console.log(str.match(pattern)[1]);
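The patterns above leave the dot in .txt unescaped and are not anchored, so they would also accept names with extra characters around the match. A minimal sketch of a stricter variant, assuming real filenames carry an 8-digit date and a 14-digit sequence number as in the question's original regex; the helper name extractLocale and the group name locale are hypothetical:

```javascript
// Hypothetical helper: returns the locale part (e.g. "en-gb") or null.
// Assumption: the date is 8 digits and the sequence number is 14 digits.
const FILE_PATTERN =
  /^FDIP_(?<locale>[a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_\d{8}_\d{14}\.txt$/;

function extractLocale(fileName) {
  const match = FILE_PATTERN.exec(fileName);
  // exec returns null when the name does not fit the pattern.
  return match ? match.groups.locale : null;
}

console.log(extractLocale('FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt')); // en-gb
console.log(extractLocale('FDIP_en-gb-nn_Text_v1_20190101_12345678901234_txt')); // null
```

Returning null instead of indexing straight into the result avoids a TypeError on names that do not match.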

Related

js Remove a part from a parameter which doesnt fit a pattern

const regex = /[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}/gm;
let m;
while ((m = regex.exec(tweet.text)) !== null) {
let newClass = tweet.text.replace(/[^1-9a-zA-Z]{3}-[^1-9a-zA-Z]{3}-[^1-9a-zA-Z]{3}/g, '');
console.log(`Found match: ${newClass}`);
};
When tweet.text = "123.qwe.456 test" I still get the same output, but I want to remove anything that doesn't fit the pattern
/[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}/
Any ideas?
You can use capture groups to extract exactly what gets matched in your string and then replace your original variable with this value. Something like
const regex = /([1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3})/
let match = tweet.text.match(regex)
tweet.text = match[1]
Instead of using replace, you can extract the match directly:
\b[1-9a-zA-Z]{3}([-.])[1-9a-zA-Z]{3}\1[1-9a-zA-Z]{3}\b
Explanation
\b A word boundary
[1-9a-zA-Z]{3} Match 3 times any of the listed (Note that 1-9 does not match a 0)
([-.]) Capture in group 1 either an - or .
[1-9a-zA-Z]{3} Match 3 times any of the listed
\1 Back reference to group 1, match the same as captured in group 1
[1-9a-zA-Z]{3} Match 3 times any of the listed
\b A word boundary
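A short sketch of the pattern explained above, run against the sample string from the question. The variable text stands in for tweet.text, and the empty-string fallback is an assumption about what should happen when nothing matches:

```javascript
// Sample input from the question; `text` stands in for tweet.text.
const text = '123.qwe.456 test';
const codePattern = /\b[1-9a-zA-Z]{3}([-.])[1-9a-zA-Z]{3}\1[1-9a-zA-Z]{3}\b/;

const found = text.match(codePattern);
// Assumption: fall back to an empty string when nothing fits the pattern.
const cleaned = found ? found[0] : '';
console.log(cleaned); // 123.qwe.456

// Mixed separators do not match, because \1 must repeat the first separator.
console.log(codePattern.test('123-qwe.456')); // false
```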
I figured out the solution: log the match itself instead of calling replace.
const regex = /[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}-[1-9a-zA-Z]{3}/gm;
let m;
while ((m = regex.exec(tweet.text)) !== null) {
console.log(`Found match: ${m[0]}`);
}

regular expression replacement in JavaScript with some part remaining intact

I need to parse a string that comes like this:
-38419-indices-foo-7119-attributes-10073-bar
Where there are numbers followed by one or more words all joined by dashes. I need to get this:
[
0 => '38419-indices-foo',
1 => '7119-attributes',
2 => '10073-bar',
]
I had thought of attempting to replace only the dash before a number with a : and then using .split(':') - how would I do this? I don't want to replace the other dashes.
Imo, the pattern is straight-forward:
\d+\D+
To even get rid of the trailing -, you could go for
(\d+\D+)(?:-|$)
Or
\d+(?:(?!-\d|$).)+
You can see it here:
var myString = "-38419-indices-foo-7119-attributes-10073-bar";
var myRegexp = /(\d+\D+)(?:-|$)/g;
var result = [];
var match = myRegexp.exec(myString);
while (match != null) {
// matched text: match[0]
// match start: match.index
// capturing group n: match[n]
result.push(match[1]);
match = myRegexp.exec(myString);
}
console.log(result);
// alternative 2
let alternative_results = myString.match(/\d+(?:(?!-\d|$).)+/g);
console.log(alternative_results);
Or a demo on regex101.com.
Logic
lazy matching using quantifier .*?
Regex
.*?((\d+)\D*)(?!-)
https://regex101.com/r/WeTzF0/1
Test string
-38419-indices-foo-7119-attributes-10073-bar-333333-dfdfdfdf-dfdfdfdf-dfdfdfdfdfdf-123232323-dfsdfsfsdfdf
Further steps
You need to split from the matches and insert into your desired array.
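The idea from the question, treating only the dash in front of a number as a separator, can also be sketched with split and a lookahead, without replacing anything first. This assumes that no word in the input starts with a digit:

```javascript
const input = '-38419-indices-foo-7119-attributes-10073-bar';

// Split on every dash that is directly followed by a digit. The lookahead
// (?=\d) checks for the digit without making it part of the separator.
const parts = input
  .split(/-(?=\d)/)
  .filter(Boolean); // drop the empty piece produced by the leading dash

console.log(parts);
// [ '38419-indices-foo', '7119-attributes', '10073-bar' ]
```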

javascript extract hashtags from strings

I have a string received from backend, and I need to extract hashtags. The tags are written in one of these two forms
type 1. #World is a #good #place to #live.
type 2. #World#place#live.
I managed to extract from first type by : str.replace(/#(\S*)/g
How can I change the second format into space-separated tags, like format one?
Basically, I want format two to be converted from
#World#place#live.
to
#World #place #live.
You can use String.match, with regex #\w+:
var str = `
type 1. #World is a #good #place to #live.
type 2. #World#place#live.`
var matches = str.match(/#\w+/g)
console.log(matches)
\w+ matches one or more word characters [a-zA-Z0-9_], so you might want to tweak that.
Once you have the matches in an array, you can rearrange them to your liking.
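One possible tweak, if tags may contain letters outside ASCII: a sketch that swaps \w for Unicode property escapes. It assumes an engine with ES2018 support for \p{...} and the u flag; the sample string is made up for illustration:

```javascript
// \p{L} matches any Unicode letter, \p{N} any Unicode number.
// The u flag is required for property escapes.
const hashtagPattern = /#[\p{L}\p{N}_]+/gu;

// Made-up sample: "#café#naïve and #World."
const sample = '#caf\u00e9#na\u00efve and #World.';
const tags = sample.match(hashtagPattern);
console.log(tags);
```

With plain \w the first two tags would be cut off at the accented letters.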
The pattern #(\S*) will match a # followed by 0+ times a non whitespace character in a captured group. That would match a single # as well. The string #World#place#live. contains no whitespace character so the whole string will be matched.
You could match them instead by using a negated character class. Match #, followed by a negated character class that matches not a # or a whitespace character.
#[^#\s]+
const strings = [
"#World is a #good #place to #live.",
"#World#place#live."
];
let pattern = /#[^#\s]+/g;
strings.forEach(s => {
console.log(s.match(pattern));
});
How about using the regex /#([\w]+\b)/gm and joining the matches with a space, as below, to extract the hashtags from your string? Alternatively, you can use str.replace(/\b#[^\s#]+/g, " $&"), as suggested by Wiktor in the comments.
function findHashTags(str) {
var regex = /#([\w]+\b)/gm;
var matches = [];
var match;
while ((match = regex.exec(str))) {
matches.push(match[0]);
}
return matches;
}
let str1 = "#World is a #good #place to #live."
let str2 = "#World#place#live";
let res1 = findHashTags(str1);
let res2 = findHashTags(str2);
console.log(res1.join(' '));
console.log(res2.join(' '));
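The replace variant mentioned above converts format two in place instead of collecting the tags. A minimal sketch; the function name spaceOutHashtags is hypothetical:

```javascript
// Hypothetical helper: inserts a space before every hashtag that is glued
// to the preceding word.
function spaceOutHashtags(str) {
  // \b only holds before a # that directly follows a word character,
  // so tags at the start or after whitespace are left untouched.
  return str.replace(/\b#[^\s#]+/g, ' $&');
}

console.log(spaceOutHashtags('#World#place#live.'));
// #World #place #live.
console.log(spaceOutHashtags('#World is a #good #place to #live.'));
// #World is a #good #place to #live. (unchanged)
```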

getting values from a string using regular expression

Could anyone help me with this regular expression issue?
expr = /\(\(([^)]+)\)\)/;
input = ((111111111111))
the one I would need to be working is = ((111111111111),(222222222),(333333333333333))
That expression works fine to get 111111 from the first input, but not when the groups 2222... and 3333... are also present. The input varies: it could be ((111111111111)), the longer one above, or something different, but it always follows the same parenthesis pattern.
Is there any reg expression to extract the values for both cases to an array?
The result I would like to come to is:
[0] = "111111"
[1] = "222222"
[2] = "333333"
Thanks
If you want to validate the format while extracting the desired parts, you could use the sticky y flag. With this flag, each match must start exactly at lastIndex: the first at the beginning of the input, and every following one where the previous match ended. This approach handles one input string at a time.
Regex:
/^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/yg
Breakdown:
^\(\( Match the beginning of input, immediately followed by ((
( Start of capturing group #1
[^)]+ Match anything but )
)\) End of CG #1, match ) immediately
| Or
(?!^) Next patterns shouldn't start at beginning
(?: Start of non-capturing group
,\(([^)]+)\) Match a comma-separated group (capture the value in CG #2, same pattern as above)
| Or
\)$ Match ) and end of input
) End of group
JS code:
var str = '((111111111111),(222222222),(333333333333333))';
console.log(
str.replace(/^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/yg, '$1$2\n')
.split(/\n/).filter(Boolean)
);
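To see what the sticky flag does on its own, here is a small sketch with a deliberately simple pattern. The pattern and input are made up for illustration and are not part of the answer's regex:

```javascript
const digits = /\d+/y;
const text = '12ab34';

console.log(digits.exec(text)[0]); // 12 (the match starts at lastIndex 0)
console.log(digits.lastIndex);     // 2
console.log(digits.exec(text));    // null, because index 2 holds 'a'
console.log(digits.lastIndex);     // 0 (reset after the failed match)

// A global, non-sticky regex would instead skip ahead to '34'.
const globalDigits = /\d+/g;
globalDigits.lastIndex = 2;
console.log(globalDigits.exec(text)[0]); // 34
```

This is why the sticky approach also validates the input: any unexpected character between groups stops the matching instead of being skipped.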
You can replace the brackets with '', split the result on , and then use substring to take the required number of characters from each piece.
input.replace(/\(/g, '').replace(/\)/g, '')
This will replace all the ( and ) and return a string like
111111111111,222222222,333333333333333
Now splitting this string with , will result into an array to what you want
var input = "((111111111111),(222222222),(333333333333333))";
var numbers = input.replace(/\(/g, '').replace(/\)/g, '')
numbers.split(",").forEach(o => console.log(o.substring(0, 6)))
If the level of nesting is fixed, you can just leave out the outer () from the pattern, and add the left parentheses to the [^)] group:
var expr = /\(([^()]+)\)/g;
var input = '((111111111111),(222222222),(333333333333333))';
var match = null;
while(match = expr.exec(input)) {
console.log(match[1]);
}
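If an array is wanted rather than console output, the same pattern can be combined with matchAll. A sketch, assuming an engine that supports String.prototype.matchAll (ES2020); the function name extractValues is hypothetical:

```javascript
// Hypothetical helper: collects the contents of every innermost (...) pair.
function extractValues(input) {
  // matchAll requires the g flag; m[1] is the first capturing group.
  return Array.from(input.matchAll(/\(([^()]+)\)/g), m => m[1]);
}

console.log(extractValues('((111111111111))'));
// [ '111111111111' ]
console.log(extractValues('((111111111111),(222222222),(333333333333333))'));
// [ '111111111111', '222222222', '333333333333333' ]
```

The values are returned in full; if only the first six characters are needed, a slice(0, 6) can be applied to each entry.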

Get the string between the last 2 / in regex in javascript

How can I get the strings between last 2 slashes in regex in javascript?
for example:
stackoverflow.com/questions/ask/index.html => "ask"
http://regexr.com/foo.html?q=bar => "regexr.com"
https://www.w3schools.com/icons/default.asp => "icons"
You can use /\/([^/]+)\/[^/]*$/; [^/]*$ matches everything after the last slash, \/([^/]+)\/ matches the last two slashes, then you can capture what is in between and extract it:
var samples = ["stackoverflow.com/questions/ask/index.html",
"http://regexr.com/foo.html?q=bar",
"https://www.w3schools.com/icons/default.asp"]
console.log(
samples.map(s => s.match(/\/([^/]+)\/[^/]*$/)[1])
)
You can solve this by using split().
let a = 'stackoverflow.com/questions/ask/index.html';
let b = 'http://regexr.com/foo.html?q=bar';
let c = 'https://www.w3schools.com/icons/default.asp';
a = a.split('/')
b = b.split('/')
c = c.split('/')
Then index into the result of split():
console.log(a[a.length-2])
console.log(b[b.length-2])
console.log(c[c.length-2])
I personally do not recommend using regex here, because it is hard to maintain.
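The split() approach can be wrapped in a small helper that also copes with strings containing no slash at all. A sketch; the function name secondToLastSegment and the null return value are assumptions, not part of the answer:

```javascript
// Hypothetical helper: returns the piece between the last two slashes,
// or null when the string contains no slash.
function secondToLastSegment(path) {
  const parts = path.split('/');
  return parts.length >= 2 ? parts[parts.length - 2] : null;
}

console.log(secondToLastSegment('stackoverflow.com/questions/ask/index.html')); // ask
console.log(secondToLastSegment('http://regexr.com/foo.html?q=bar'));           // regexr.com
console.log(secondToLastSegment('index.html'));                                 // null
```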
I believe that will do:
[^\/]+(?=\/[^\/]*$)
[^\/]+ matches a run of characters other than /. The lookahead (?=\/[^\/]*$) that follows requires this run to be immediately followed by the last / in the string.
var urls = [
"stackoverflow.com/questions/ask/index.html",
"http://regexr.com/foo.html?q=bar",
"https://www.w3schools.com/icons/default.asp"
];
urls.forEach(url => console.log(url.match(/[^\/]+(?=\/[^\/]*$)/)[0]));
You can use (?=[^/]*\/[^/]*$)(.*?)(?=\/[^/]*$). You can test it here: https://www.regexpal.com/
The format of the regex is: (positive lookahead for second last slash)(.*?)(positive lookahead for last slash).
The (.*?) is a lazy match for what's between the slashes.
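The double-lookahead pattern can be tried against the sample URLs from the question like this. A sketch; the variable names are illustrative:

```javascript
// First lookahead: exactly one slash remains between here and the end,
// so the position is just past the second-to-last slash.
// Second lookahead: the lazy group must stop right before the last slash.
const betweenLastSlashes = /(?=[^/]*\/[^/]*$)(.*?)(?=\/[^/]*$)/;

const sampleUrls = [
  'stackoverflow.com/questions/ask/index.html',
  'http://regexr.com/foo.html?q=bar',
  'https://www.w3schools.com/icons/default.asp'
];

const segments = sampleUrls.map(url => url.match(betweenLastSlashes)[1]);
console.log(segments);
// [ 'ask', 'regexr.com', 'icons' ]
```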
references:
Replace second to last "/" character in URL with a '#'
RegEx that will match the last occurrence of dot in a string
