extract all text inside braces into array of strings - javascript

I have a big string from which I would like to extract all parts that are inside round braces.
Say I have a string like
"this (one) that (one two) is (three )"
I need to write a function that would return an array
["one", "one two", "three "]
I tried to write a regex from some advice found here and failed, since I seem to only get the first element and not a proper array filled with all of them: http://jsfiddle.net/gfQzK/
var match = s.match(/\(([^)]+)\)/);
alert(match[1]);
Could someone point me in the right direction? My solution does not have to be a regular expression.

You need a global regex. See if this helps:
var matches = [];
str.replace(/\(([^)]+)\)/g, function(_,m){ matches.push(m) });
console.log(matches); //= ["one", "one two", "three "]
match won't do here: with a global regex it returns only the whole matches, not the capture groups. replace with a callback can be used to loop over every match instead.

You are almost there. You just need to change a few things.
First, add the global attribute to your regex. Now your regex should look like:
/\(([^)]+)\)/g
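Then match.length gives you the number of matches. Note that with the global flag, match returns the whole matches, parentheses included, at match[0], match[1], match[2] ..., so you still need to strip the surrounding parentheses from each one.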

You need to use the global flag and repeatedly exec the regex until you have collected all your results in an array (the multiline flag only changes how ^ and $ behave, so it is harmless but not required here):
var s='Russia ignored (demands) by the White House to intercept the N.S.A. leaker and return him to the United States, showing the two countries (still) have a (penchant) for that old rivalry from the Soviet era.';
var re = /\(([^)]+)\)/gm, arr = [], res = [];
while ((arr = re.exec(s)) !== null) {
res.push(arr[1]);
}
alert(res);
For reference, check out the MDN article on exec.
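On engines that support String.prototype.matchAll (ES2020 and later), the same result can probably be collected without a manual loop. The following is a minimal sketch under that assumption; extractParenthesized is a hypothetical helper name, not something taken from the answers above.

```javascript
// Hypothetical helper; assumes an ES2020+ engine with String.prototype.matchAll.
function extractParenthesized(text) {
  // matchAll requires the g flag and yields one match array per occurrence,
  // each with its capture groups intact (m[0] = whole match, m[1] = group 1).
  return Array.from(text.matchAll(/\(([^)]+)\)/g), (m) => m[1]);
}

const parts = extractParenthesized("this (one) that (one two) is (three )");
console.log(parts); // ["one", "one two", "three "]
```

If older browsers matter, the exec loop shown above remains the safer choice.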

Related

Regexp group not excluding dots

Let's say I have the following string: div.classOneA.classOneB#idOne
Trying to write a regexp which extracts the classes (classOneA, classOneB) from it. I was able to do this but with Lookbehind assertion only.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to achieve this without the lookbehind approach, and I do not really understand why my solution is not working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
I thought the grouping would solve my problem, but every matched item contains the dot as well.
There isn't a good way in Javascript to both match multiple times (/g option) and pick up capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
output.push(matches[1]);
}
This is because with the g modifier you get all matching substrings but not their capture groups (that is, as if the (...) pairs worked just like (?:...) ones).
See for yourself. Without the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you obtain all matches, but only the whole match (item 0) of each.
There are many solutions:
Use LookBehind assertions as you pointed out yourself.
Fix each result later adding .map(x=>x.replace(/^\./, ""))
Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
Use .replace() + callback instead of .match() in order to be able to access capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (if there aren't other possible inputs that could eventually break it) because it is much more efficient (actual regular expressions are used only once to trim the '#idOne' part).
If you want to keep using your regex, you can simply map over the results and replace the . with an empty string:
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)
If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you must use lookbehind.
This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertions were not available in JavaScript until recently.
I'm not an expert on using regex, particularly in JavaScript, but after some research on MDN I've figured out why your attempt wasn't working and how to fix it.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
result.push(grp[1]);
}
console.log(result)
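The loop works because a regex with the g flag is stateful: each successful exec call advances the regex's lastIndex property, and a failed call resets it to 0. The sketch below only illustrates that behaviour; the variable names (classPattern, selector, steps) are made up for the example.

```javascript
const classPattern = /\.([^.#]+)/g;
const selector = "div.classOneA.classOneB#idOne";

const steps = [];
let m;
while ((m = classPattern.exec(selector)) !== null) {
  // Record the captured group and where the next search will start.
  steps.push({ group: m[1], lastIndex: classPattern.lastIndex });
}

console.log(steps);
// [ { group: "classOneA", lastIndex: 13 }, { group: "classOneB", lastIndex: 23 } ]
console.log(classPattern.lastIndex); // 0, reset after the final failed exec
```

This is also why the regex should be created once, outside the loop: a fresh regex literal inside the while condition would start from index 0 every time and never terminate.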

How to store only the nth substring into a variable in Javascript

var a="how are you?";
In the above example I want to store the second word "are" into another variable in a single step.
I don't want to use something like below
var bigArray = a.split(" ");
var secondText = bigArray[1];
as we may need to store the entire paragraph into a big array and consume a lot of memory without any use.
I would like to know if there is some function which works as below
var secondText=specialFunction(a," ",1);
so that we will get the second substring when the paragraph is split by " "
Well, I would spend my time worrying about more important things than the size of some arrays.
Anyway, you could try using a regexp:
var secondText = (a.match(/ (\w+)/) || []) [1];
This reads as "find a space, then capture the following word".
The || [] part is meant to deal with the situation where there is no match (for example, no second word). In that case, the result will be [][1] which is undefined.
This finds only the second word. What about the more general case? We cannot split the string on spaces, because that would create an array, and the OP doesn't want that due to memory concerns. So we will instead build a dynamic regexp. To find the nth word, we want to skip over the first n-1 spaces. Or, to be more precise, we want to skip over the first word, some spaces, then the second word, then some more spaces, etc. So, with n here standing for the number of words to skip, the regexp is
/(?:\w+ ){n}(\w+)/
  ^^              NO CAPTURING GROUP
    ^^^^          WORD FOLLOWED BY SPACE
         ^^^      N TIMES
            ^^^^^ CAPTURE FOLLOWING WORD
The ?: is to avoid this being treated as a capturing group. We build the regexp using
function make_nth_word_regexp(n) {
n--;
return new RegExp("(?:\\w+ ){" + n + "}(\\w+)");
}
Now look for your nth word:
var fifth_word = str.match(make_nth_word_regexp(5)) [1];
> "Hey there you".match(make_nth_word_regexp(3))[1]
< "you"
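An alternative to regex is just to use substring() together with indexOf(). Something like
var a = "how are you";
var start = a.indexOf(" ") + 1;     // first character of the second word
var end = a.indexOf(" ", start);    // space after the second word, or -1
alert(a.substring(start, end === -1 ? a.length : end));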
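The question asked for something like specialFunction(a, " ", 1). A minimal sketch of such a helper, built only on indexOf and substring so that no intermediate array is created, might look like this. The name nthPiece and its zero-based index parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the piece at zero-based position `index`
// when `text` is divided by `separator`, or undefined if there are too few pieces.
function nthPiece(text, separator, index) {
  let start = 0;
  // Skip over `index` separators without building an array.
  for (let i = 0; i < index; i++) {
    const sepAt = text.indexOf(separator, start);
    if (sepAt === -1) return undefined; // fewer pieces than requested
    start = sepAt + separator.length;
  }
  const end = text.indexOf(separator, start);
  return text.substring(start, end === -1 ? text.length : end);
}

const secondWord = nthPiece("how are you?", " ", 1);
console.log(secondWord); // "are"
```

Unlike the \w+ based regexp, this treats punctuation as part of a word, so nthPiece("how are you?", " ", 2) gives "you?".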

Split using regex creates 2 empty elements in array

I need to split a string into 2 pieces using regex, so I used the following code:
var str = "This is a test";
var list = str.split(/(test)/);
Required output:
list = ["This is a ", "test"]
Instead of 2 this gives me 3 elements in the array (last one is empty). I understand that regex finds nothing after the 2nd match so it adds an empty (3rd) element. Is there any way that I can modify my code so I get exactly 2 elements thus avoiding the last empty element?
Note: the above code is a simplified version for which we can use other options besides regex but I would have to use regex.
var str = "This is a test";
var list = str.split(/(test)/,2);
list: ["This is a ", "test"]
Perhaps overkill if you can guarantee that you are only expecting an array of length two but given the nature of the question a more robust solution may be to use Array.filter to remove all empty strings from the array - including entries in the middle of the array which would arise from several delimiters appearing next to each other in your input string.
var list = str.split(/(test)/).filter(
function(v){ return v!=null && v!='' }
);
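To illustrate the point about delimiters appearing next to each other, here is a small sketch with a made-up input string (sample is not from the question). It shows the empty strings that split produces and how filter removes them.

```javascript
// Made-up input with two adjacent delimiters and one at the end.
const sample = "testtest and test";

// The capture group keeps each delimiter in the output; empty strings
// appear before, between, and after adjacent or trailing matches.
const rawPieces = sample.split(/(test)/);
console.log(rawPieces); // ["", "test", "", "test", " and ", "test", ""]

const nonEmpty = rawPieces.filter((piece) => piece !== "");
console.log(nonEmpty); // ["test", "test", " and ", "test"]
```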
You can try with checking if last element is empty or not:
var last = list.pop();
last.length || list.push(last);
or:
list[list.length-1].length || list.pop();
or even shorter:
list.slice(-1)[0].length || list.pop();
To handle an empty first element (when test is at the start of the string, as @Kobi suggested) use:
list[0].length || list.shift();
This is giving me the results you want:
var str = "This is a test";
var list = str.split(/(?=test)/g);
(?= starts a lookahead; it matches the position just before test without consuming the word, so test stays in the array after splitting.
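One thing worth checking before relying on the lookahead split: because the delimiter is never consumed, it stays glued to whatever text follows it. A short sketch with a made-up input that contains the word twice:

```javascript
// Made-up input; the split happens at each position that is followed by "test".
const twoTests = "a test b test";
const lookaheadPieces = twoTests.split(/(?=test)/);
console.log(lookaheadPieces); // ["a ", "test b ", "test"]
```

For the question's input, where test is the last thing in the string, this gives exactly the two required elements.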

javascript, regex parse string content in curly brackets

I am new to regex. I am trying to parse all contents inside curly brackets in a string. I looked up this post as a reference and did exactly as one of the answers suggests; however, the result is unexpected.
Here is what I did
var abc = "test/abcd{string1}test{string2}test" //any string
var regex = /{(.+?)}/
regex.exec(abc) // i got ["{string1}", "string1"]
//where i am expecting ["string1", "string2"]
I think I am missing something. What am I doing wrong?
Update
I was able to get it with /g for a global search
var regex = /{(.*?)}/g
abc.match(regex) //gives ["{string1}", "{string2}"]
How can I get the string without the brackets?
"test/abcd{string1}test{string2}test".match(/[^{}]+(?=\})/g)
produces
["string1", "string2"]
It assumes that every } has a corresponding { before it and {...} sections do not nest. It will also not capture the content of empty {} sections.
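The assumptions listed above can be checked with a couple of made-up inputs (strayBrace and emptyBraces are illustrative names, not part of the answer):

```javascript
const lookaheadPattern = /[^{}]+(?=\})/g;

// Well-formed input from the question.
const wellFormed = "test/abcd{string1}test{string2}test".match(lookaheadPattern);
console.log(wellFormed); // ["string1", "string2"]

// A } without a matching { still produces a result ("a").
const strayBrace = "a}b{c}".match(lookaheadPattern);
console.log(strayBrace); // ["a", "c"]

// Empty {} sections are skipped entirely.
const emptyBraces = "x{}y{z}".match(lookaheadPattern);
console.log(emptyBraces); // ["z"]
```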
var abc = "test/abcd{string1}test{string2}test" //any string
var regex = /{(.+?)}/g
var matches;
while(matches = regex.exec(abc))
console.log(matches);
Try this:
var abc = "test/abcd{string1}test{string2}test" //any string
var regex = /{(.+?)}/g //g flag so the regex is global
abc.match(regex) //find every match
a good place to read about Regex in javascript is here, and a nice place to test is here
good luck!
Nothing is wrong, but you'll need to look at the capturing group (the second element in the array) to get the content you want (you can ignore the first). To get all occurrences, it's not enough to run exec once; you'll need to loop over the results using match.
Edit: never mind that, AFAIK you can't access capturing groups with match on a global regex. A simpler solution would be using a positive lookahead, as Mike Samuel suggested.
This result:
["{string1}", "string1"]
is showing you that for the first match, the entire regex matched "{string1}" and the first capturing parentheses matched "string1".
If you want to get all matches and see all capturing parens of each match, you can use the "g" flag and loop through, calling exec() multiple times like this:
var abc = "test/abcd{string1}test{string2}test"; //any string
var regex = /{(.+?)}/g;
var match, results = [];
while (match = regex.exec(abc)) {
results.push(match[1]); // save first captured parens sub-match into results array
}
// results == ["string1", "string2"]
You can see it work here: http://jsfiddle.net/jfriend00/sapfm/
Try this for a file:
const fs = require('fs');
fs.readFile('logs.txt', function(err, data) {
if(err) throw err;
const paragraph = "'" + data + "'";
const regex = /\d+\<;>\S+\<;>(\d+)\<;/g;
const found = paragraph.match(regex);
console.log(found);
})

Javascript Regexp - Match Characters after a certain phrase

I was wondering how to use a regexp to match a phrase that comes after a certain match. Like:
var phrase = "yesthisismyphrase=thisiswhatIwantmatched";
var match = /phrase=.*/;
That will match from the phrase= to the end of the string, but is it possible to get everything after the phrase= without having to modify a string?
You use capture groups (denoted by parentheses).
When you execute the regex via the match or exec function, they return an array consisting of the whole match followed by the substrings captured by the capture groups. You can then access what got captured via that array. E.g.:
var phrase = "yesthisismyphrase=thisiswhatIwantmatched";
var myRegexp = /phrase=(.*)/;
var match = myRegexp.exec(phrase);
alert(match[1]);
or
var arr = phrase.match(/phrase=(.*)/);
if (arr != null) { // Did it match?
alert(arr[1]);
}
phrase.match(/phrase=(.*)/)[1]
returns
"thisiswhatIwantmatched"
The brackets specify a so-called capture group. Contents of capture groups get put into the resulting array, starting from 1 (0 is the whole match).
It is not so hard. Just assume your context is:
const context = "https://example.com/pa/GIx89GdmkABJEAAA+AAAA";
And we want the part after pa/, so use this code:
const pattern = context.match(/pa\/(.*)/)[1];
The first item of the match array includes pa/, while the second item (the capture group) is the text without pa/; use whichever you need.
Let's try this; I hope it works:
var p = /\b([\w|\W]+)\1+(\=)([\w|\W]+)\1+\b/;
console.log(p.test('case1 or AA=AA ilkjoi'));
console.log(p.test('case2 or AA=AB'));
console.log(p.test('case3 or 12=14'));
If you want to get the value after the regex, excluding the test phrase, use this:
/(?:phrase=)(.*)/
the result will be
0: "phrase=thisiswhatIwantmatched" //full match
1: "thisiswhatIwantmatched" //matching group
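If the goal is for the whole match itself to be only the text after phrase=, a lookbehind assertion is another option on engines that support it (ES2018 and later). This is a sketch under that assumption, with a null check because match returns null when nothing matches; the variable names are made up.

```javascript
const phraseText = "yesthisismyphrase=thisiswhatIwantmatched";

// (?<=phrase=) asserts that "phrase=" precedes the match without including it.
const afterPhrase = phraseText.match(/(?<=phrase=).*/);
const value = afterPhrase === null ? undefined : afterPhrase[0];
console.log(value); // "thisiswhatIwantmatched"

// No "phrase=" in the input: match returns null, so value is undefined.
const missing = "nothing to see here".match(/(?<=phrase=).*/);
console.log(missing); // null
```

The capture-group versions above work on older engines as well, so they remain the more portable choice.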
