Retrieving several capturing groups recursively with RegExp - javascript

I have a string with this format:
#someID#tn#company#somethingNew#classing#somethingElse#With
There may be any number of #-separated words, but the whole string always begins with #.
I have written the following regexp. It matches the string, but I cannot get each #-separated word; what I get is the last repetition and the first (as well as the whole string). How can I get an array with each word as a separate element?
(?:^\#\w*)(?:(\#\w*)+) //I know I have ruled out second capturing group with ?: , though doesn't make much difference.
And here is my Javascript code:
var reg = /(?:^\#\w*)(?:(\#\w*)+)/g;
var x = null;
while(x = reg.exec("#someID#tn#company#somethingNew#classing#somethingElse#With"))
{
console.log(x);
}
And here is the result (Firebug, console):
["#someID#tn#company#somet...sing#somethingElse#With", "#With"]
0
"#someID#tn#company#somet...sing#somethingElse#With"
1
"#With"
index
0
input
"#someID#tn#company#somet...sing#somethingElse#With"
EDIT :
I want an output like this, with a regular expression if possible:
["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
NOTE that I want a RegExp solution. I know about String.split() and String operations.

You can use:
var s = '#someID#tn#company#somethingNew#classing#somethingElse#With'
var tok = [];
if (s.substr(0, 1) == "#")
    tok = s.substr(1).split('#');
//=> ["someID", "tn", "company", "somethingNew", "classing", "somethingElse", "With"]

You could try this regex also,
((?:#|#)\w+)
Explanation:
() Capturing groups. Anything inside this capturing group would be captured.
(?:) It just matches the strings but won't capture anything.
#|# Literal # or # symbol.
\w+ Followed by one or more word characters.
OR
> "#someID#tn#company#somethingNew#classing#somethingElse#With".split(/\b(?=#|#)/g);
[ '#someID',
'#tn',
'#company',
'#somethingNew',
'#classing',
'#somethingElse',
'#With' ]
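As a rough sketch of why the original attempt only returned "#With": a repeated capturing group such as (\#\w*)+ keeps only its last iteration, so the usual workaround is to match one "#word" at a time and let the g flag collect all of them. The variable names below are just for illustration.

```javascript
// Input string taken from the question; variable names are illustrative.
const input = "#someID#tn#company#somethingNew#classing#somethingElse#With";

// A repeated group such as (#\w*)+ only remembers its final iteration,
// so match a single "#word" and let the g flag return every occurrence.
// match() returns null when nothing matches, hence the "|| []" fallback.
const words = input.match(/#\w+/g) || [];

console.log(words);
```

With the g flag, match() returns the full matches rather than capture groups, which is exactly the array the question asks for.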

It will be easier without regExp:
var str = "#someID#tn#company#somethingNew#classing#somethingElse#With";
var strSplit = str.split("#");
for(var i = 1; i < strSplit.length; i++) {
strSplit[i] = "#" + strSplit[i];
}
console.log(strSplit);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]

Related

I need help getting the first n characters of a string up to when a number character starts

I'm working with a string where I need to extract the first n characters, up to where the numbers begin. What would be the best way to do this, given that the string sometimes starts with a number? From 7EUSA8889er898 I would need to extract 7EUSA. From other strings such as SWFX74849948, I would need to extract SWFX.
I'm not sure how to do this with regex; my limited knowledge is blocking me at this point:
^(\w{4}) just gets me the first four characters, but I don't really have a stopping point, as the string could be somelongstring292894830982, which would require me to get somelongstring.
Using \w will match a word character, which includes letters, digits and the underscore.
You could match an optional digit [0-9]? at the start of the string (^) and then match one or more characters from A-Za-z:
^[0-9]?[A-Za-z]+
const regex = /^[0-9]?[A-Za-z]+/;
[
"7EUSA8889er898",
"somelongstring292894830982",
"SWFX74849948"
].forEach(s => console.log(s.match(regex)[0]));
You can use this regex:
(^\d+?[a-zA-Z]+)|(^\d+|[a-zA-Z]+)
I tried it with these examples and it worked well:
1- somelongstring292894830982 -> somelongstring
2- 7sdfsdf5456 -> 7sdfsdf
3- 875werwer54556 -> 875werwer
If you want a function where the RegExp is parameterized by n, this would be:
function getStr(str,n) {
var pattern = "\\d?\\w{0,"+n+"}";
var reg = new RegExp(pattern);
var result = reg.exec(str);
if(result[0]) return result[0].substr(0,n);
}
There are answers to this but here is another way to do it.
var string1 = '7EUSA8889er898';
var string2 = 'SWFX74849948';
var Extract = function (args) {
var C = args.split(''); // Split string in array
var NI = []; // Store indexes of all numbers
// Loop through list -> if char is a number add its index
C.map(function (I) { return /^\d+$/.test(I) === true ? NI.push(C.indexOf(I)) : ''; });
// Get the items between the first and second occurrence of a number
return C.slice(NI[0] === 0 ? NI[0] + 1 : 0, NI[1]).join('');
};
console.log(Extract(string1));
console.log(Extract(string2));
Output
EUSA
SWFX7
Since it's hard to tell what you are trying to match, I'd go with a general regex
^\d?\D+(?=\d)
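To see how this general pattern behaves, here is a small sketch using the sample strings from the question. The helper name leadingPrefix is made up for this example. Note that \D matches any non-digit (including spaces and punctuation), and that the lookahead means a string with no digit after the prefix does not match at all.

```javascript
// Pattern from the answer above: an optional leading digit, then one or more
// non-digits, and a digit must follow (the lookahead does not consume it).
const prefixPattern = /^\d?\D+(?=\d)/;

// Hypothetical helper: returns the prefix, or null when the pattern does not match.
function leadingPrefix(text) {
  const match = prefixPattern.exec(text);
  return match ? match[0] : null;
}

console.log(leadingPrefix("7EUSA8889er898"));             // "7EUSA"
console.log(leadingPrefix("SWFX74849948"));               // "SWFX"
console.log(leadingPrefix("somelongstring292894830982")); // "somelongstring"
console.log(leadingPrefix("nodigits"));                   // null, no digit follows
```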

RegExp capturing group in capturing group

I want to capture the "1" and "2" in "http://test.com/1/2". Here is my regexp /(?:\/([0-9]+))/g.
The problem is that I only get ["/1", "/2"]. According to http://regex101.com/r/uC2bW5 I have to get "1" and "1".
I'm running my RegExp in JS.
You have a couple of options:
Use a while loop over RegExp.prototype.exec:
var regex = /(?:\/([0-9]+))/g,
    string = "http://test.com/1/2",
    matches = [],
    match; // declared so the loop does not create an implicit global
while (match = regex.exec(string)) {
    matches.push(match[1]);
}
Use replace as suggested by elclanrs:
var regex = /(?:\/([0-9]+))/g,
string = "http://test.com/1/2",
matches = [];
string.replace(regex, function() {
matches.push(arguments[1]);
});
In JavaScript, the match array always has an element at index 0 that contains the WHOLE pattern match. So in your case, index 0 is /1 for the first match and /2 for the second.
If you want your DEFINED first capture group (the one that does not include the /), you'll find it in the match array at index 1.
Index 0 cannot be removed and has nothing to do with the outer group you marked as non-capturing with ?:
Imagine JavaScript wraps your whole regex in an additional set of parentheses.
For example, the string Hello World and the regex /Hell(o) World/ will result in:
[0 => Hello World, 1 => o]
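A short sketch of the difference between index 0 and index 1, using String.prototype.matchAll (ES2020, so this assumes a reasonably recent engine). The variable names are illustrative only.

```javascript
const url = "http://test.com/1/2";

// matchAll requires the g flag and yields one match array per match.
const allMatches = Array.from(url.matchAll(/\/([0-9]+)/g));

// Index 0 of each match array is the whole match, index 1 is the first capture group.
const fullMatches = allMatches.map(function (m) { return m[0]; });
const captured = allMatches.map(function (m) { return m[1]; });

console.log(fullMatches); // ["/1", "/2"]
console.log(captured);    // ["1", "2"]
```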

How to get the characters preceded by "add_"

I have the strings "add_dinner", "add_meeting", "add_fuel_surcharge" and I want to get the characters that are preceded by "add_" (dinner, meeting, fuel_surcharge).
[^a][^d]{2}[^_]\w+
I have tried this one, but it only works for "add_dinner"
[^add_]\w+
This one works for "add_fuel_surcharge", but takes "inner" from "add_dinner"
Help me to understand please.
Use capturing groups:
/^add_(\w+)$/
Check the returned array to see the result.
Since JavaScript engines without ES2018 support don't have lookbehind assertions, a capturing group is the portable way to do this:
var myregexp = /add_(\w+)/;
var match = myregexp.exec(subject);
if (match != null) {
result = match[1];
}
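For comparison, engines that implement ES2018 do accept lookbehind assertions, so the capturing group can be avoided there. This is only a sketch under that assumption; it will throw a SyntaxError in older engines.

```javascript
// Assumes an engine with ES2018 lookbehind support.
const names = ["add_dinner", "add_meeting", "add_fuel_surcharge"];

const suffixes = names.map(function (name) {
  // (?<=^add_) asserts that "add_" at the start of the string precedes the match
  // without making it part of the match itself.
  const match = name.match(/(?<=^add_)\w+/);
  return match ? match[0] : null;
});

console.log(suffixes); // ["dinner", "meeting", "fuel_surcharge"]
```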
[^add_] is a character class that matches a single character except a, d or _. When applied to add_dinner, the first character it matches is i, and \w+ then matches nner.
The [^...] construct matches any single character except the ones listed. So [^add_] matches any single character other than "a", "d" or "_".
If you want to retrieve the bit after the _ you can do this:
/add_(\w+)/
Where the parentheses "capture" the part of the expression inside. So to get the actual text from a string:
var s = "add_meeting";
var result = s.match(/add_(\w+)/)[1];
This assumes the string will match such that you can directly get the second element in the returned array that will be the "meeting" part that matched (\w+).
If there's a possibility that you'll be testing a string that won't match you need to test that the result of match() is not null.
(Or, possibly easier to understand: result = "add_meeting".split("_")[1];)
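A minimal sketch of the null check mentioned above, wrapped in a helper whose name (afterAddPrefix) is invented for this example. It also shows why the split("_") shortcut only suits names with a single underscore.

```javascript
// Hypothetical helper: returns the text after "add_", or null when the
// string does not have that shape (instead of throwing on match()[1]).
function afterAddPrefix(text) {
  const match = /^add_(\w+)$/.exec(text);
  return match === null ? null : match[1];
}

console.log(afterAddPrefix("add_meeting"));        // "meeting"
console.log(afterAddPrefix("add_fuel_surcharge")); // "fuel_surcharge"
console.log(afterAddPrefix("remove_meeting"));     // null

// The split shortcut stops at the second underscore:
console.log("add_fuel_surcharge".split("_")[1]);   // "fuel"
```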
You can also extract the part after the first _ with a plain JavaScript for loop:
var str = ['add_dinner', 'add_meeting', 'add_fuel_surcharge'];
var filterString = [];
for(var i = 0; i < str.length; i ++){
if(str[i].indexOf("_")>-1){
filterString.push(str[i].substring(str[i].indexOf("_") + 1, str[i].length));
}
}
alert(filterString.join(", "));

Javascript split function not correct worked with specific regex

I have a problem. I have a string, "\,str\,i,ing", and I need to split it at each comma that is not preceded by a backslash. For my string the result should be ["\,str\,i", "ing"]. I'm using the following regex:
myString.split("[^\],", 2)
but it doesn't work.
Well, this is a ridiculous way to work around the lack of lookbehind, but it seems to get the correct result.
"\\,str\\,i,ing".split('').reverse().join('').split(/,(?=[^\\])/).map(function(a){
return a.split('').reverse().join('');
}).reverse();
//=> ["\,str\,i", "ing"]
Not sure about your expected output, but you are passing a string, not a regex. Use:
var arr = "\\,str\\,i,ing".split(/[^\\],/, 2);
console.log(arr);
To split using regex, wrap your regex in /..../
This is not easily possible in JS engines that lack lookbehind support. Even if you used a real regex, it would eat the last character before the comma:
> "xyz\\,xyz,xyz".split(/[^\\],/, 2)
["xyz\\,xy", "xyz"]
If you don't want the z to be eaten, I'd suggest:
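var str = "....";
return str.split(",").reduce(function(res, part) {
    var l = res.length;
    // Merge into the previous part when this is not the first part and either
    // the previous part ended with a backslash (escaped comma) or the limit of 2 is reached
    if (l && (res[l-1].substr(-1) == "\\" || l >= 2))
        res[l-1] += "," + part;
    else
        res.push(part);
    return res; // reduce needs the accumulator returned on every step
}, []);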
Reading between the lines, it looks like you want to split a string by , characters that are not preceded by \ characters.
It would be really great if every JavaScript engine had a regular expression lookbehind (and negative lookbehind) pattern, but engines that predate ES2018 do not. What they do have is a lookahead ((?=)) and negative lookahead ((?!)) pattern. Make sure to review the documentation.
You can use these as a lookbehind if you reverse the string:
var str,
reverseStr,
arr,
reverseArr;
//don't forget to escape your backslashes
str = '\\,str\\,i,ing';
//reverse your string
reverseStr = str.split('').reverse().join('');
//split the array on `,`s that aren't followed by `\`
reverseArr = reverseStr.split(/,(?!\\)/);
//reverse the reversed array, and reverse each string in the array
arr = reverseArr.reverse().map(function (val) {
return val.split('').reverse().join('');
});
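For engines that do implement ES2018 lookbehind, the reversal trick is not needed. The following is a sketch under that assumption, with illustrative variable names. It does not try to handle an escaped backslash directly before a comma.

```javascript
// Backslashes are doubled in the literal so the string really contains "\,".
const escaped = "\\,str\\,i,ing";

// (?<!\\) is a negative lookbehind: the comma must not be preceded by a backslash.
// Assumes ES2018 lookbehind support; older engines throw a SyntaxError here.
const parts = escaped.split(/(?<!\\),/);

console.log(parts); // two elements: "\,str\,i" and "ing"
```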
You picked a tough character to match: a backslash preceding a comma is apt to disappear while you pass it around in a string, since '\,'==','...
var s= 'My dog, the one with two \\, blue \\,eyes, is asleep.';
var a= [], M, rx=/(\\?),/g;
while((M= rx.exec(s))!= null){
if(M[1]) continue;
a.push(s.substring(0, rx.lastIndex-1));
s= s.substring(rx.lastIndex);
rx.lastIndex= 0;
};
a.push(s);
/* returned value: (Array)
My dog
the one with two \, blue \,eyes
is asleep.
*/
Find something which will not be present in your original string, say "###". Replace "\\," with it. Split the resulting string by ",". Replace "###" back with "\\,".
Something like this:
<script type="text/javascript">
var s1 = "\\,str\\,i,ing";
var s2 = s1.replace(/\\,/g,"###");
console.log(s2);
var s3 = s2.split(",");
for (var i=0;i<s3.length;i++)
{
s3[i] = s3[i].replace(/###/g,"\\,");
}
console.log(s3);
</script>

How to split a string by a difference in character as delimiter?

What I'd like to achieve is splitting a string like this, i.e. the delimiters are the indexes where the character before that index is different from the character after that index:
"AAABBCCCCDEEE" -> ["AAA", "BB", "CCCC", "D", "EEE"]
I've been trying to make up a concise solution, but I ended up with this rather verbose code: http://jsfiddle.net/b39aM/1/.
var arr = [], // output
text = "AAABBCCCCDEEE", // input
current;
for(var i = 0; i < text.length; i++) {
var char = text[i];
if(char !== current) { // new letter
arr.push(char); // create new array element
current = char; // update current
} else { // current letter continued
arr[arr.length - 1] += char; // append letter to last element
}
}
It's naive and I don't like it:
I'm manually iterating over each character, and I'm appending to the array character by character
It's a little too long for the simple thing I want to achieve
I was thinking of using a regexp but I'm not sure what the regexp should be. Is it possible to define a regexp that means "one character and a different character following"?
Or more generally, is there a more elegant solution for achieving this splitting method?
Yes, you can use a regular expression:
"AAABBCCCCDEEE".match(/(.)\1*/g)
Here . will match any character and \1* will match any following characters that are the same as the formerly matched one. And with a global match you’ll get all matching sequences.
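Two edge cases are worth a quick sketch: match() returns null for an empty string, and . does not match line breaks unless the s flag (ES2018) is set. The helper name splitRuns is invented for this example.

```javascript
// Hypothetical helper: splits text into runs of identical characters.
// The s flag lets "." match line breaks too; "|| []" covers the null
// that match() returns when there is nothing to match.
function splitRuns(text) {
  return text.match(/(.)\1*/gs) || [];
}

console.log(splitRuns("AAABBCCCCDEEE")); // ["AAA", "BB", "CCCC", "D", "EEE"]
console.log(splitRuns(""));              // []
```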
