RegEx To capture URL between angle brackets (or pipe) - javascript

Consider the following URLs:
<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>
I'm trying to figure out a RegEx that would capture the URL after < up until | OR >
I've tried URL.match(/<([^>|\|]+)/g) but the result always includes the leading <
Desired output is simply: http://www.google.com

The RegEx is correct. String#match will return the complete match set. You need to extract the first captured group.
Use RegExp#exec to get the URLs.
var str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;
var regex = /<([^>|\|]+)/g;
var urls = [];
var match; // declared explicitly to avoid creating an implicit global
while ((match = regex.exec(str)) !== null) {
urls.push(match[1]); // Get first captured group, and push in array
}
console.log(urls);
document.body.innerHTML = '<pre>' + JSON.stringify(urls, 0, 4) + '</pre>';
You can also use String#match as follows:
str.match(/[^<>|\s]+/g)
var str = `<http://www.google.com>
<http://www.google.com|www.google.com>
<http://google.com|google.com>`;
var urls = str.match(/[^<>|\s]+/g);
console.log(urls);
document.body.innerHTML = '<pre>' + JSON.stringify(urls, 0, 4) + '</pre>';
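In environments that support String#matchAll (ES2020), the exec loop can be written more compactly. This is only a sketch under that assumption; the variable names are illustrative, and the pattern is the asker's, with the redundant \| inside the character class dropped.

```javascript
// Sample input taken from the question.
var str = '<http://www.google.com>\n' +
          '<http://www.google.com|www.google.com>\n' +
          '<http://google.com|google.com>';

// matchAll requires the g flag and yields one match object per hit;
// index 1 of each match object is the first captured group.
var urls = Array.from(str.matchAll(/<([^>|]+)/g), function (m) {
  return m[1];
});

console.log(urls);
// ["http://www.google.com", "http://www.google.com", "http://google.com"]
```

Unlike String#match with the g flag, matchAll keeps the captured groups, so no manual lastIndex loop is needed.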

This kind of pattern doesn't require complex regular expressions! You can use a simple pattern and string operations:
var url = "<http://www.google.com|www.google.com>";
var parts = url.replace(/^<|>$/g, "").split("|"); // g flag needed so both < and > are stripped
For the solution using regex, try the following:
var url = "<http://www.google.com|www.google.com>";
var match = /(?:<|\|)([^>|]+)/g.exec(url);
You can then access the value of the first capturing group like this:
var url = match[1];
By calling exec on the same regular expression several times, you can find multiple matches (the multiple URLs you're looking for).
Explanation of the regular expression:
(?:<|\|) is a non-capturing group ((?: ... )) that looks for either a < or a | symbol at the beginning. (In your case, every URL will either have a < or a | on the left side of it!)
([^>|]+) is a capturing group (( ... )) that captures a sequence of characters that are not > or |. You don't need to escape the | within a character class; it only has special meaning outside of it.
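To illustrate the point about calling exec repeatedly, here is a minimal sketch using the same expression and sample URL. The names input, re, found and m are only illustrative.

```javascript
var input = "<http://www.google.com|www.google.com>";
var re = /(?:<|\|)([^>|]+)/g;
var found = [];
var m;

// With the g flag, each exec call resumes from re.lastIndex,
// so the loop walks through every match and stops on null.
while ((m = re.exec(input)) !== null) {
  found.push(m[1]); // first capturing group only
}

console.log(found);
// ["http://www.google.com", "www.google.com"]
```

Note that this pattern also picks up the display text after the |, so if only the URL is wanted, take found[0].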

str.match(/(http\:.*)(?=\|)/)[0]
var strs = ["<http://www.google.com>",
"<http://www.google.com|www.google.com>",
"<http://google.com|google.com>"];
strs.forEach(function(str) {
// if `str` contains `|` character,
// match characters that are followed by `|`
if (/\|/.test(str)) {
console.log(str.match(/(http\:.*)(?=\|)/)[0])
}
// else match characters that are not `<`, `>`
else {
console.log(str.match(/[^<>]+/)[0])
}
})

JS Fiddle
var url1 = '<http://www.google.com|www.example.com>',
url2 = '<http://www.yahoo.com>';
console.log(url1.replace(/<|>/g, '').split('|'));
console.log(url2.replace(/<|>/g, '').split('|'));

Related

regular expression replacement in JavaScript with some part remaining intact

I need to parse a string that comes like this:
-38419-indices-foo-7119-attributes-10073-bar
Where there are numbers followed by one or more words all joined by dashes. I need to get this:
[
0 => '38419-indices-foo',
1 => '7119-attributes',
2 => '10073-bar',
]
I had thought of attempting to replace only the dash before a number with a : and then using .split(':') - how would I do this? I don't want to replace the other dashes.
Imo, the pattern is straight-forward:
\d+\D+
To even get rid of the trailing -, you could go for
(\d+\D+)(?:-|$)
Or
\d+(?:(?!-\d|$).)+
You can see it here:
var myString = "-38419-indices-foo-7119-attributes-10073-bar";
var myRegexp = /(\d+\D+)(?:-|$)/g;
var result = [];
var match = myRegexp.exec(myString);
while (match != null) {
// matched text: match[0]
// match start: match.index
// capturing group n: match[n]
result.push(match[1]);
match = myRegexp.exec(myString);
}
console.log(result);
// alternative 2
let alternative_results = myString.match(/\d+(?:(?!-\d|$).)+/g);
console.log(alternative_results);
Or a demo on regex101.com.
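The approach sketched in the question (touch only the dash that precedes a number, then split) can also be done directly with a lookahead. This is a rough sketch, assuming the words themselves never start with a digit; the variable names are illustrative.

```javascript
var myString = "-38419-indices-foo-7119-attributes-10073-bar";

// Split only on a dash that is immediately followed by a digit.
// The lookahead (?=\d) does not consume the digit, so it stays in the part.
var parts = myString.split(/-(?=\d)/).filter(Boolean); // drop the leading empty string

console.log(parts);
// ["38419-indices-foo", "7119-attributes", "10073-bar"]

// Equivalent to the replace-then-split idea from the question:
var viaReplace = myString.replace(/-(?=\d)/g, ":").split(":").filter(Boolean);
console.log(viaReplace);
```

The filter(Boolean) step is there because the string starts with a dash, which would otherwise produce an empty first element.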
Logic
lazy matching using quantifier .*?
Regex
.*?((\d+)\D*)(?!-)
https://regex101.com/r/WeTzF0/1
Test string
-38419-indices-foo-7119-attributes-10073-bar-333333-dfdfdfdf-dfdfdfdf-dfdfdfdfdfdf-123232323-dfsdfsfsdfdf
Matches
Further steps
You need to split from the matches and insert into your desired array.

Regex match quotes inside bracket regex

I'm working on a regex that must match only the text inside quotes but not in a comment. My matches must be only the strings in bold:
<"love";>
>/*"love"*/<
<>'love'<>
"lo
more love
ve"
I'm stuck on this:
/(?:((\"|\')(.|\n)*?(\"|\')))(?=(?:\/\**\*\/))/gm
The first one (?:((\"|\')(.|\n)*?(\"|\'))) matches all the strings
The second one (?=(?:\/\**\*\/)) doesn't match text inside quotes inside /* "mystring" */
but my logic is clearly wrong
Any suggestion?
Thanks
Maybe you just need to use a negative lookahead to check for the comment end */?
But first, I'd split the string into separate lines
var arrayOfLines = input_str.split(/\r?\n/);
or, without empty lines:
var arrayOfLines = input_str.match(/[^\r\n]+/g);
and then use this regex:
["']([^'"]+)["'](?!.*\*\/)
Sample code:
var rebuilt_string = '';
var re = /["']([^'"]+)["'](?!.*\*\/)/g;
var subst = '<b>$1</b>';
for (var i = 0; i < arrayOfLines.length; i++)
{
rebuilt_string = rebuilt_string + arrayOfLines[i].replace(re, subst) + "\r\n";
}
The way to avoid commented parts is to match them before. The global pattern looks like this:
/(capture parts to avoid)|target/
Then use a callback function for the replacement (when the capture group exists, return the match without change; otherwise, replace the match with what you want).
Example:
var result = text.replace(/(\/\*[^*]*(?:\*+(?!\/)[^*]*)*\*\/)|"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'/g,
function (m, g1) {
if (g1) return g1;
return '<b>' + m + '</b>';
});
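The same "match what you want to skip first" idea also works for collecting the strings instead of replacing them. A minimal sketch, reusing the pattern above on a made-up sample string (sample, strings and m are illustrative names):

```javascript
// Hypothetical input: one quoted string in code, one inside a comment, one single-quoted.
var sample = 'say "hi" /* "skip" */ and \'bye\'';

// Group 1 captures a whole /* ... */ comment; the other alternatives match quoted strings.
var re = /(\/\*[^*]*(?:\*+(?!\/)[^*]*)*\*\/)|"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'/g;

var strings = [];
var m;
while ((m = re.exec(sample)) !== null) {
  // If group 1 participated, the match is a comment: ignore it.
  if (m[1] === undefined) {
    strings.push(m[0]);
  }
}

console.log(strings);
// ['"hi"', "'bye'"]
```

Because the comment alternative comes first and consumes the whole comment, the quotes inside it are never seen by the string alternatives.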

Retrieving several capturing groups recursively with RegExp

I have a string with this format:
#someID#tn#company#somethingNew#classing#somethingElse#With
There might be unlimited #-separated words, but definitely the whole string begins with #
I have written the following regexp, though it matches it, but I cannot get each #-separated word, and what I get is the last recursion and the first (as well as the whole string). How can I get an array of every word in an element separately?
(?:^\#\w*)(?:(\#\w*)+) //I know I have ruled out second capturing group with ?: , though doesn't make much difference.
And here is my Javascript code:
var reg = /(?:^\#\w*)(?:(\#\w*)+)/g;
var x = null;
while(x = reg.exec("#someID#tn#company#somethingNew#classing#somethingElse#With"))
{
console.log(x);
}
And here is the result (Firebug, console):
["#someID#tn#company#somet...sing#somethingElse#With", "#With"]
0
"#someID#tn#company#somet...sing#somethingElse#With"
1
"#With"
index
0
input
"#someID#tn#company#somet...sing#somethingElse#With"
EDIT :
I want an output like this with regular expression if possible:
["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
NOTE that I want a RegExp solution. I know about String.split() and String operations.
You can use:
var s = '#someID#tn#company#somethingNew#classing#somethingElse#With'
if (s.substr(0, 1) == "#")
tok = s.substr(1).split('#');
//=> ["someID", "tn", "company", "somethingNew", "classing", "somethingElse", "With"]
You could try this regex also,
((?:#|#)\w+)
DEMO
Explanation:
() Capturing groups. Anything inside this capturing group would be captured.
(?:) It just matches the strings but won't capture anything.
#|# Literal # or # symbol.
\w+ Followed by one or more word characters.
OR
> "#someID#tn#company#somethingNew#classing#somethingElse#With".split(/\b(?=#|#)/g);
[ '#someID',
'#tn',
'#company',
'#somethingNew',
'#classing',
'#somethingElse',
'#With' ]
It will be easier without regExp:
var str = "#someID#tn#company#somethingNew#classing#somethingElse#With";
var strSplit = str.split("#");
for(var i = 1; i < strSplit.length; i++) {
strSplit[i] = "#" + strSplit[i];
}
console.log(strSplit);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
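For a regex-only version: a repeated capturing group such as (\#\w*)+ only keeps its last iteration, which is why the original attempt returns just "#With". A short sketch that matches each token on its own with the g flag instead (input and tokens are illustrative names):

```javascript
var input = "#someID#tn#company#somethingNew#classing#somethingElse#With";

// With the g flag and no need for groups, String#match returns every full match.
var tokens = input.match(/#\w+/g) || []; // fall back to [] when nothing matches

console.log(tokens);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
```

This assumes each word consists of \w characters only (letters, digits, underscore); adjust the class if the IDs can contain other characters.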

Regex to grab strings between square brackets

I have the following string: pass[1][2011-08-21][total_passes]
How would I extract the items between the square brackets into an array? I tried
match(/\[(.*?)\]/);
var s = 'pass[1][2011-08-21][total_passes]';
var result = s.match(/\[(.*?)\]/);
console.log(result);
but this only returns [1].
Not sure how to do this.. Thanks in advance.
You are almost there, you just need a global match (note the /g flag):
match(/\[(.*?)\]/g);
Example: http://jsfiddle.net/kobi/Rbdj4/
If you want something that only captures the group (from MDN):
var s = "pass[1][2011-08-21][total_passes]";
var matches = [];
var pattern = /\[(.*?)\]/g;
var match;
while ((match = pattern.exec(s)) != null)
{
matches.push(match[1]);
}
Example: http://jsfiddle.net/kobi/6a7XN/
Another option (which I usually prefer), is abusing the replace callback:
var matches = [];
s.replace(/\[(.*?)\]/g, function(g0,g1){matches.push(g1);})
Example: http://jsfiddle.net/kobi/6CEzP/
var s = 'pass[1][2011-08-21][total_passes]';
r = s.match(/\[([^\]]*)\]/g);
r ; //# => [ '[1]', '[2011-08-21]', '[total_passes]' ]
example proving the edge case of unbalanced [];
var s = 'pass[1]]][2011-08-21][total_passes]';
r = s.match(/\[([^\]]*)\]/g);
r; //# => [ '[1]', '[2011-08-21]', '[total_passes]' ]
Add the global flag to your regex, and iterate over the returned array.
match(/\[(.*?)\]/g)
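Keep in mind that with the g flag String#match returns the full matches, brackets included, and drops the captured groups. A small sketch of iterating the returned array to strip the brackets (variable names are illustrative):

```javascript
var s = 'pass[1][2011-08-21][total_passes]';

// Full matches, e.g. "[1]"; fall back to [] if there is no match at all.
var withBrackets = s.match(/\[(.*?)\]/g) || [];

// Remove the first and last character of each match to get the inner text.
var inner = withBrackets.map(function (item) {
  return item.slice(1, -1);
});

console.log(inner);
// ["1", "2011-08-21", "total_passes"]
```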
I'm not sure if you can get this directly into an array. But the following code should work to find all occurrences and then process them:
var string = "pass[1][2011-08-21][total_passes]";
var regex = /\[([^\]]*)\]/g;
var match;
while ((match = regex.exec(string)) !== null) {
alert(match[1]);
}
Please note: I really think you need the character class [^\]] here. Otherwise, in my test, the expression would match the whole string because ] is also matched by .*.
'pass[1][2011-08-21][total_passes]'.match(/\[.+?\]/g); // ["[1]","[2011-08-21]","[total_passes]"]
Explanation
\[ # match the opening [
Note: the \ before [ makes it a literal character, NOT the start of a character class.
.+? # Accept one or more character but NOT greedy
\] # match the closing ]; the \ again makes it a literal character
/g # do NOT stop after the first match. Do it for the whole input string.
You can play with other combinations of the regular expression
https://regex101.com/r/IYDkNi/1
[C#]
string str1 = " pass[1][2011-08-21][total_passes]";
string matching = @"\[(.*?)\]";
Regex reg = new Regex(matching);
MatchCollection matches = reg.Matches(str1);
You can then use foreach to iterate over the matched strings.

Javascript regex - split string

Struggling with a regex requirement. I need to split a string into an array wherever it finds a forward slash. But not if the forward slash is preceded by an escape.
Eg, if I have this string:
hello/world
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = world
And if I have this string:
hello/wo\/rld
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = wo/rld
Any ideas?
I wouldn't use split() for this job. It's much easier to match the path components themselves, rather than the delimiters. For example:
var subject = 'hello/wo\\/rld';
var regex = /(?:[^\/\\]+|\\.)+/g;
var matched = null;
while (matched = regex.exec(subject)) {
print(matched[0]);
}
output:
hello
wo\/rld
test it at ideone.com
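The matches above still contain the escaping backslash, while the question asks for wo/rld. A minimal follow-up sketch that collects the components and then strips the escape character (parts is an illustrative name; console.log is used instead of print):

```javascript
var subject = 'hello/wo\\/rld'; // the actual string is hello/wo\/rld

// Match runs of "anything but / or \" or "a backslash followed by any character".
var parts = (subject.match(/(?:[^\/\\]+|\\.)+/g) || []).map(function (part) {
  // Drop each escaping backslash and keep the character it protected.
  return part.replace(/\\(.)/g, '$1');
});

console.log(parts);
// ["hello", "wo/rld"]
```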
The following is a little long-winded but will work, and avoids the problem with IE's broken split implementation by not using a regular expression.
function splitPath(str) {
var rawParts = str.split("/"), parts = [];
for (var i = 0, len = rawParts.length, part; i < len; ++i) {
part = "";
while (rawParts[i].slice(-1) == "\\") {
part += rawParts[i++].slice(0, -1) + "/";
}
parts.push(part + rawParts[i]);
}
return parts;
}
var str = "hello/world\\/foo/bar";
alert( splitPath(str).join(",") );
Here's a way adapted from the techniques in this blog post:
var str = "Testing/one\\/two\\/three";
var result = str.replace(/(\\)?\//g, function($0, $1){
return $1 ? '/' : '[****]';
}).split('[****]');
Live example
Given:
Testing/one\/two\/three
The result is:
[0]: Testing
[1]: one/two/three
That first uses the simple "fake" lookbehind to replace / with [****] and to replace \/ with /, then splits on the [****] value. (Obviously, replace [****] with anything that won't be in the string.)
/*
If you are getting your string from an ajax response or a data base query,
that is, the string has not been interpreted by javascript,
you can match character sequences that either have no slash or have escaped slashes.
If you are defining the string in a script, escape the escapes and strip them after the match.
*/
var s='hello/wor\\/ld';
s=s.match(/(([^\/]*(\\\/)+)([^\/]*)+|([^\/]+))/g) || [s];
alert(s.join('\n'))
s.join('\n').replace(/\\/g,'')
/* returned value: (String)
hello
wor/ld
*/
Here's an example at rubular.com
For short code, you can use reverse to simulate negative lookbehind
function reverse(s){
return s.split('').reverse().join('');
}
var parts = reverse(myString).split(/[/](?!\\(?:\\\\)*(?:[^\\]|$))/g).reverse();
for (var i = parts.length; --i >= 0;) { parts[i] = reverse(parts[i]); }
but to be efficient, it's probably better to split on /[/]/ and then walk the array and rejoin elements that have an escape at the end.
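If the target engine supports lookbehind assertions (ES2018, so not older browsers), the reverse trick may not be needed at all. A rough sketch under that assumption; note that, unlike the pattern above, it does not handle an escaped backslash directly before a slash (\\/):

```javascript
var str = 'hello/wo\\/rld'; // the actual string is hello/wo\/rld

// Split on a slash that is NOT immediately preceded by a backslash.
var parts = str.split(/(?<!\\)\//).map(function (part) {
  // Turn the escaped \/ back into a plain slash.
  return part.replace(/\\\//g, '/');
});

console.log(parts);
// ["hello", "wo/rld"]
```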
Something like this may take care of it for you.
var str = "/hello/wo\\/rld/";
var split = str.replace(/^\/|\\?\/|\/$/g, function(match) {
if (match.indexOf('\\') == -1) {
return '\x00';
}
return match;
}).split('\x00');
alert(split);
