I have the following text on my page:
pageTracker._addItem("2040504","JACQXSPINKASS-TX4-8","Jacq Socks","","9.00000","1.0");
pageTracker._addItem("2040504","FTWCLSNOCOLOURONE SIZE","Footwear Cleaner","","8.00000","1.0");
I would like to extract just the parameters inside the brackets on each line, using JavaScript's match() function. I have the following regex, but it's not quite right:
/\b_addItem[^);]+/g
This matches the _addItem( part as well. How can I tweak this to only get the stuff inside the brackets?
Ideally it should match any string that begins with pageTracker._addItem(" but not include that part in the match up to the closing bracket.
I am going to be doing the matching in JavaScript, which I don't think supports lookbehinds, if I'm right.
Use a look behind to assert, but not capture, the preceding text:
/(?<=pageTracker\._addItem\()[^);]+/g
Note that I added ( to the look behind to not capture that either.
Now that you've added the JavaScript tag: lookbehinds are not supported in older JavaScript engines (they were only added in ES2018), so there you must capture your target in a group:
/pageTracker\._addItem\(([^);]+)/g
Your target will be in group 1.
You can split it into two regular expression calls to avoid look-behind:
var str = 'pageTracker._addItem("2040504","JACQXSPINKASS-TX4-8","Jacq Socks","","9.00000","1.0");\npageTracker._addItem("2040504","FTWCLSNOCOLOURONE SIZE","Footwear Cleaner","","8.00000","1.0");'
var m,output=[];
var re = /^pageTracker\._addItem\("(.*)"\)/gm; // escape the dot so it only matches a literal "."
while (m = re.exec(str)) {
  output.push(m[1].split('","'));
}
Output is then a 2D array:
[
["2040504","JACQXSPINKASS-TX4-8","Jacq Socks","","9.00000","1.0"],
["2040504","FTWCLSNOCOLOURONE SIZE","Footwear Cleaner","","8.00000","1.0"]
]
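As a variation on the above: each argument in the sample is a double-quoted literal, so the captured list can be wrapped in square brackets and handed to JSON.parse instead of being split on '","'. This is only a sketch. It assumes every argument is a valid JSON string (double quotes, no ")" or ";" inside a value), and parseAddItemCalls is a name made up for illustration.

```javascript
// Hypothetical helper: collect the argument list of every pageTracker._addItem(...) call.
// Assumes every argument is a double-quoted string that is also valid JSON.
function parseAddItemCalls(text) {
  var re = /pageTracker\._addItem\(([^);]+)/g;
  var calls = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    // m[1] is the text between the parentheses, e.g. "a","b","c"
    calls.push(JSON.parse('[' + m[1] + ']'));
  }
  return calls;
}

var sample =
  'pageTracker._addItem("2040504","JACQXSPINKASS-TX4-8","Jacq Socks","","9.00000","1.0");\n' +
  'pageTracker._addItem("2040504","FTWCLSNOCOLOURONE SIZE","Footwear Cleaner","","8.00000","1.0");';

var parsed = parseAddItemCalls(sample);
console.log(parsed);
```

Like the split version, this gives one array of strings per call.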
You could also do it as follows:
var strings = ['pageTracker._addItem("2040504","JACQXSPINKASS-TX4-8","Jacq Socks","","9.00000","1.0");',
'pageTracker._addItem("2040504","FTWCLSNOCOLOURONE SIZE","Footwear Cleaner","","8.00000","1.0");'
],
args = strings.map(s => s.match(/".*?"/g)
.map(s => s.replace(/"/g,'')));
console.log(args);
Let's say I have the following string: div.classOneA.classOneB#idOne
I am trying to write a regexp that extracts the classes (classOneA, classOneB) from it. I was able to do this, but only with a lookbehind assertion.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to achieve this without the lookbehind approach, and I do not really understand why my solution is not working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
I thought the capture group would solve my problem, but every matched item still contains the dot.
There isn't a good way in Javascript to both match multiple times (/g option) and pick up capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
output.push(matches[1]);
}
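In newer engines (ES2020 and later), String.prototype.matchAll covers both needs: it requires the g flag and yields one full match array per match, capture groups included. A short sketch, assuming your target environment supports it:

```javascript
const input = 'div.classOneA.classOneB#idOne';

// matchAll returns an iterator of match arrays; index 1 is the first capture group.
const classNames = Array.from(input.matchAll(/\.([^.#]+)/g), (m) => m[1]);

console.log(classNames);
```

If you have to support older browsers, the exec loop above remains the safer choice.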
This is because with the g modifier you get all matching substrings but not their capture groups (that is, as if the (...) pairs worked just like (?:...) ones).
See for yourself. Without the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you get every match, but only the whole match (item 0) for each one.
There are many solutions:
1. Use lookbehind assertions, as you pointed out yourself.
2. Fix each result afterwards by adding .map(x=>x.replace(/^\./, ""))
3. Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
4. Use .replace() + callback instead of .match() in order to be able to access the capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (if there aren't other possible inputs that could eventually break it) because it is much more efficient (actual regular expressions are used only once to trim the '#idOne' part).
If you want to extend your regex, you can simply map over the results and replace the . with an empty string:
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)
If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you must use lookbehind.
This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertions were only added to JavaScript recently (ES2018), so they are not available in every engine.
I'm not an expert on using regex, particularly in JavaScript, but after some research on MDN I've figured out why your attempt wasn't working, and how to fix it.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
result.push(grp[1]);
}
console.log(result)
My string:
AA,$,DESCRIPTION(Sink, clinical),$
Wanted matches:
AA
$
DESCRIPTION(Sink, clinical)
$
My regex so far:
\+d|[\w$:0-9`<>=&;?\|\!\#\+\%\-\s\*\(\)\.ÅÄÖåäö]+
This gives
AA
$
DESCRIPTION(Sink
clinical)
I want to keep the text between ( and ) together in a single match.
https://regex101.com/r/MqFUmk/3
Here's my attempt at the regex
\+d|[\w$:0-9`<>=&;?\|\!\#\+\%\-\s\*\.ÅÄÖåäö]+(\(.+\))?
I removed the parentheses from within the [ ] characters, and allowed capture elsewhere. It seems to satisfy the regex101 link you posted.
Depending on how arbitrary your input is, this regex might not be suitable for more complex strings.
Alternatively, here's an answer which could be more robust than mine, but may only work in Ruby.
((?>[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)
That one seems to work for me?
([^,\(\)]*(?:\([^\(\)]*\))?[^,\(\)]*)(?:,|$)
https://regex101.com/r/hLyJm5/2
Hope this helps!
Personally, I would first replace all commas within parentheses () with a character that will never occur (in my case I used # since I don't see it within your inclusions) and then I would split them by commas to keep it sweet and simple.
myStr = "AA,$,DESCRIPTION(Sink, clinical),$"; //Initial string
myStr = myStr.replace(/(\([^,]+),([^\)]+\))/g, "$1#$2"); //Replace , within parentheses with #
myArr = myStr.split(',').map(function(s) { return s.replace('#', ','); }); //Split string on ,
//myArr -> ["AA","$","DESCRIPTION(Sink, clinical)","$"]
Optionally, if you're using ES6, you can change that last line to:
myArr = myStr.split(',').map(s => s.replace('#', ',')); //Yay Arrow Functions!
Note: If you have nested parentheses, this answer will need a modification
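For the nested case, one possible modification is to drop the regex and scan the string while tracking parenthesis depth. Note that this is a different technique from the replace-then-split approach above, sketched under the assumption that the parentheses are balanced. splitTopLevel is a name invented for illustration.

```javascript
// Hypothetical helper: split on commas that are not inside parentheses.
// Tracks nesting depth, so nested parentheses stay in one piece.
function splitTopLevel(str) {
  var parts = [];
  var current = '';
  var depth = 0;
  for (var i = 0; i < str.length; i++) {
    var ch = str[i];
    if (ch === '(') {
      depth++;
    } else if (ch === ')' && depth > 0) {
      depth--;
    }
    if (ch === ',' && depth === 0) {
      parts.push(current); // top-level comma: close the current piece
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current); // the last piece has no trailing comma
  return parts;
}

var flat = splitTopLevel('AA,$,DESCRIPTION(Sink, clinical),$');
var nested = splitTopLevel('A(b, c(d, e)),F');
console.log(flat);
console.log(nested);
```

No placeholder character is needed here, so the input may contain # or any other symbol.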
Finally, here is an approximation of what you need:
\w+(?:\(.*\))|\w+|\$
https://regex101.com/r/MqFUmk/4
I have a url like http://www.somedotcom.com/all/~childrens-day/pr?sid=all.
I want to extract childrens-day. How to get that? Right now I am doing it like this
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
url.match('~.+\/');
But what I am getting is ["~childrens-day/"].
Is there a short and sweet way (there must be one) to get the above text without the ~ and the /, i.e. just childrens-day?
Thanks
You could use a negated character class and a capture group ( ) and refer to capture group #1. The caret (^) inside of a character class [ ] is considered the negation operator.
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
var result = url.match(/~([^~]+)\//);
console.log(result[1]); // "childrens-day"
Note: If you have many URLs inside a string, you may want to add the ? quantifier for a non-greedy match.
var result = url.match(/~([^~]+?)\//);
Like so:
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
var matches = url.match(/~(.+?)\//);
console.log(matches[1]);
Working example: http://regex101.com/r/xU4nZ6
Note that your regular expression wasn't actually properly delimited either, not sure how you got the result you did.
Use non-capturing groups with a captured group then access the [1] element of the matches array:
(?:~)(.+)(?:/)
Keep in mind that you will need to escape your / if using it also as your RegEx delimiter.
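For instance, with / as the regex delimiter the pattern above would be written as follows (a minimal sketch using the URL from the question):

```javascript
var url = 'http://www.somedotcom.com/all/~childrens-day/pr?sid=all';

// (?:~) and (?:\/) match but are not captured, so the wanted text is group 1.
var matches = url.match(/(?:~)(.+)(?:\/)/);

console.log(matches[1]);
```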
Yes, there is.
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
url.match('~(.+)\/')[1];
Just wrap the part you need in a parenthesized group. No other changes to your code are needed.
References: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
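You could just do a string replace on the text you already matched. Note that replace() returns a new string rather than changing the original, so keep the result:
var result = url.match('~.+\/')[0].replace('~', '').replace('/', '');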
http://www.w3schools.com/jsref/jsref_replace.asp
I have strings in my program that are like so:
var myStrings = [
"[asdf] thisIsTheText",
"[qwerty] andSomeMoreText",
"noBracketsSometimes",
"[12345]someText"
];
I want to capture the strings "thisIsTheText", "andSomeMoreText", "noBracketsSometimes", "someText". The pattern of inputs will always be the same, square brackets with something in them (or maybe not) followed by some spaces (again, maybe not), and then the actual text I want.
How can I do this?
Thanks
One approach:
var actualTextYouWant = originalString.replace(/^\[[^\]]+\]\s*/, '');
This will return a copy of originalString with the initial [...] and whitespace removed.
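Applied to the whole array from the question, that could look like this (a small sketch; cleaned is just an illustrative variable name):

```javascript
var myStrings = [
  '[asdf] thisIsTheText',
  '[qwerty] andSomeMoreText',
  'noBracketsSometimes',
  '[12345]someText'
];

// Strip an optional leading [...] plus any whitespace that follows it.
var cleaned = myStrings.map(function (s) {
  return s.replace(/^\[[^\]]+\]\s*/, '');
});

console.log(cleaned);
```

Strings without a leading bracket are returned unchanged, because the pattern is anchored to the start and simply fails to match.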
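This should get you started (the text you want ends up in capture group 1; the ] characters are escaped because JavaScript treats [^] as "any character"):
/(?:\[[^\]]*\])?\s*(\w+)/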
I have a string, which I want to extract the value out. The string is something like this:
cdata = "![CDATA[cu1hcmod6rbg3eenmk9p80c484ma9B]]";
And I want cu1hcmod6rbg3eenmk9p80c484ma9B. In other words, I want anything inside the ![CDATA[*]].
I tried to use the following javascript snippet:
cdata = "![CDATA[cu1hcmod6rbg3eenmk9p80c484ma9B]]";
rePattern = new RegExp("![?:\\s+]]","m");
arrMatch = rePattern.exec( cdata );
result = arrMatch[0];
But the code is not working. I'm pretty sure the problem is the way I specify the matching pattern. Any idea how to fix it?
Your pattern should be something like...
/^!\[CDATA\[(.+?)\]\]$/
Which is...
Match literal starting ![CDATA[.
Lazy match everything up until the closing ] and save it in capturing group $1 (thanks Phrogz for his excellent suggestion).
Match extra ]].
Your string should be available as arrMatch[1].
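Put together with the string from the question, a minimal sketch could look like this (the null check and the value variable are additions for illustration):

```javascript
var cdata = '![CDATA[cu1hcmod6rbg3eenmk9p80c484ma9B]]';
var arrMatch = /^!\[CDATA\[(.+?)\]\]$/.exec(cdata);

// exec returns null when the string does not have the expected shape.
var value = arrMatch ? arrMatch[1] : null;

console.log(value);
```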
Try this:
var cdata = "![CDATA[cu1hcmod6rbg3eenmk9p80c484ma9B]]";
var regPattern = /(.*CDATA\[)(.*)(\]\].*)/gm;
alert(cdata.replace(regPattern, "$2"));