Single regular expression for two different strings - javascript

I need to write a single regular expression that returns the color value and size values from the below two strings.
[{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"TANGERINE TANGO","priority":0,"altValue1":"ORANGE","altValue2":null}},{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}}]
[{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}},{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"PEACOCK BLUE","priority":0,"altValue1":"GREEN","altValue2":null}}]
Currently I have two different regexps for them respectively.
1) COLOR(?:.*?)value":"([^"]+)(?:.*?)SIZE(?:.*?)value":"([^"]+)"
2) SIZE(?:.*?)value":"([^"]+)(?:.*?)COLOR(?:.*?)value":"([^"]+)"
Is there a way I can achieve this using a single regex?

Use JSON.parse, it is safer and is more appropriate with JSON strings:
var strings = ['[{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"TANGERINE TANGO","priority":0,"altValue1":"ORANGE","altValue2":null}},{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}}]', '[{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}},{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"PEACOCK BLUE","priority":0,"altValue1":"GREEN","altValue2":null}}]'];
var cnt = 0;
strings.forEach(function(str) {
var array = JSON.parse(str);
cnt += 1;
document.getElementById("r").innerHTML += "<b>Match " + cnt + "</b><br/>";
array.forEach(function(object) {
document.getElementById("r").innerHTML += object.optionValue.value + "<br/>";
});
});
<div id="r"/>
You can declare an array and push the results you get into the array for later use.
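As a minimal sketch of that suggestion, the values can be collected into an object keyed by each entry's code, so the order of the entries no longer matters. The helper name extractOptions is made up for illustration; the field names are taken from the sample strings in the question.

```javascript
// Hypothetical helper: parse one JSON string and index the option values by "code".
function extractOptions(str) {
  const result = {};
  for (const entry of JSON.parse(str)) {
    result[entry.code] = entry.optionValue.value;
  }
  return result;
}

const sample = '[{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}},{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"PEACOCK BLUE","priority":0,"altValue1":"GREEN","altValue2":null}}]';
const options = extractOptions(sample);
console.log(options.COLOR, options.SIZE); // PEACOCK BLUE MEDIUM
```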

Most compact way to do this in a single regex:
(COLOR|SIZE).*?value":"([^"]+).*?(?!\1)(?:COLOR|SIZE).*?value":"([^"]+)"
I might agree with @stribizhev and @Sirko about parsing the JSON, but I can see that if you just need to get a quick-and-dirty job done, then a regex is sometimes useful.
Explanation:
You can use alternation, capturing, lookahead assertions, and backreferencing.
First, let's simplify your regex by removing unnecessary non-capturing groups, (?:...):
COLOR.*?value":"([^"]+).*?SIZE.*?value":"([^"]+)"
SIZE.*?value":"([^"]+).*?COLOR.*?value":"([^"]+)"
Now, here's what gets you halfway (similar to @MarcosPerezGude's suggestion):
(?:COLOR|SIZE).*?value":"([^"]+).*?(?:COLOR|SIZE).*?value":"([^"]+)"
^^^     ^^^^^^                     ^^^^^^^^^    ^
But the problem with this is it accepts COLOR COLOR and SIZE SIZE. Here's how to get around that:
(COLOR|SIZE).*?value":"([^"]+).*?(?!\1)(?:COLOR|SIZE).*?value":"([^"]+)"
^                                ^^^^^^
Let me explain this. The \1 is a backreference to whatever's captured in the first capturing group. Which in our case is now COLOR or SIZE because we've removed the non-capturing-ness. The (?!\1) is a negative lookahead assertion that says, "As long as what comes next isn't \1..." Therefore if the captured string was COLOR, the second half must be SIZE, or vice versa.
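Here is a small sketch of how the captured groups could be consumed. Because group 1 records which code came first, groups 2 and 3 can be mapped back to color and size regardless of order. The helper name colorAndSize is hypothetical; the regex and sample strings are the ones from this thread.

```javascript
const re = /(COLOR|SIZE).*?value":"([^"]+).*?(?!\1)(?:COLOR|SIZE).*?value":"([^"]+)"/;

// Hypothetical helper: returns { color, size }, or null when the string does not match.
function colorAndSize(str) {
  const m = re.exec(str);
  if (m === null) return null;
  // Group 1 tells which code came first, so groups 2 and 3 are assigned accordingly.
  return m[1] === "COLOR"
    ? { color: m[2], size: m[3] }
    : { color: m[3], size: m[2] };
}

const colorFirst = '[{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"TANGERINE TANGO","priority":0,"altValue1":"ORANGE","altValue2":null}},{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}}]';
const sizeFirst = '[{"id":"3","name":"Size","code":"SIZE","optionValue":{"value":"MEDIUM","priority":4,"altValue1":null,"altValue2":null}},{"id":"2","name":"Color","code":"COLOR","optionValue":{"value":"PEACOCK BLUE","priority":0,"altValue1":"GREEN","altValue2":null}}]';

console.log(colorAndSize(colorFirst)); // { color: 'TANGERINE TANGO', size: 'MEDIUM' }
console.log(colorAndSize(sizeFirst));  // { color: 'PEACOCK BLUE', size: 'MEDIUM' }
```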

You can try with the OR operator
((?:size|color)(?:.*?))
Good luck!

You should be able to have it select between two alternatives like this:
COLOR(?:.*?)value":"([^"]+)(?:.*?)SIZE(?:.*?)value":"([^"]+)|SIZE(?:.*?)value":"([^"]+)(?:.*?)COLOR(?:.*?)value":"([^"]+)
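With this alternation the values land in different groups depending on which branch matched: groups 1 and 2 for COLOR-first input, groups 3 and 4 for SIZE-first input. The sketch below shows one way to read them back. The helper name and the abbreviated sample strings are assumptions made for illustration.

```javascript
const either = /COLOR(?:.*?)value":"([^"]+)(?:.*?)SIZE(?:.*?)value":"([^"]+)|SIZE(?:.*?)value":"([^"]+)(?:.*?)COLOR(?:.*?)value":"([^"]+)/;

// Hypothetical helper: groups from the branch that did not match are undefined.
function readEither(str) {
  const m = either.exec(str);
  if (m === null) return null;
  return {
    color: m[1] !== undefined ? m[1] : m[4],
    size: m[2] !== undefined ? m[2] : m[3],
  };
}

// Abbreviated versions of the strings from the question.
const shortColorFirst = '[{"code":"COLOR","optionValue":{"value":"TANGERINE TANGO"}},{"code":"SIZE","optionValue":{"value":"MEDIUM"}}]';
const shortSizeFirst = '[{"code":"SIZE","optionValue":{"value":"MEDIUM"}},{"code":"COLOR","optionValue":{"value":"PEACOCK BLUE"}}]';

console.log(readEither(shortColorFirst)); // { color: 'TANGERINE TANGO', size: 'MEDIUM' }
console.log(readEither(shortSizeFirst));  // { color: 'PEACOCK BLUE', size: 'MEDIUM' }
```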

Related

How to split a string by one delimiter but having a particular format as described below

I have a string as:
const str = 'My [Link format](https://google.com) demo'
I want the word array to be like:
['My', '[Link format](https://google.com)', 'demo']
What to do in javascript?
I was trying using split() and str.match(). Nothing worked yet.
This is a simple split on a space as a delimiter, but we use negative lookaheads to skip spaces that sit inside a pair of square brackets [] or round brackets ().
const str = 'My [Link format](https://google.com) demo'
console.log(str.split(/\s+(?![^\[]*\])(?![^\(]*\))/));
We also allow for spaces in the URL portion; they are unlikely, but they could still occur.
Try it here: https://jsfiddle.net/m4q6e9x7/
["My", "[Link format](https://google.com)", "demo"]
In the fiddle I've tried to show the two separate negative lookaheads, one for each type of bracket (I've put a space inside the round brackets to prove the concept):
const str = 'My [Link format](http s://google.com) demo'
// ignore spaces between []
console.log(str.split(/\s+(?![^\[]*\])/));
["My", "[Link format](http", "s://google.com)", "demo"]
// ignore spaces between ()
console.log(str.split(/\s+(?![^\(]*\))/));
["My", "[Link", "format](http s://google.com)", "demo"]
So we can easily combine the two criteria because we need both of them to not match.
Because [] and () need to be escaped, it might be easier to see the regex if we modify and test for spaces between braces {}
const str = 'My {Link format}(https://google.com) demo'
console.log(str.split(/\s+(?![^{]*})/));
["My", "{Link format}(https://google.com)", "demo"]
Both solutions assume that the string is well formed (basically no space between ']' and '(', no ']' characters inside [...], and similar intuitions). You didn't really provide information about what the input string can be other than your concrete example, so the solutions work well in this and very similar cases. The second is very easily modified as needed; the first is easily extended to check whether the string is in fact malformed.
Solution using Regular Expressions
Below code finds everything before first '[', everything in '[...](...)' pattern (note: first ... must not contain ']', and second – ')', but I assume this would make for an incorrect input in the first place), and everything after that.
So
let regex = /(.*)(\[.*\]\(.*\))(.*)/
let res = str.match(regex).splice(1,3)
gives res as
['My ', '[Link format](https://google.com)', ' demo']
From there, you can trim every entry in this array ('My ' => 'My') for example using a trim function like so:
res.map((val) => val.trim());
Look here for explanation of what the array obtained from .match() method represents, but generally except index 0 it contains capture groups, meaning the parts of string corresponding to parts of regex surrounded by parentheses.
If you are not familiar with regular expressions (regexes) in JS, or at all, you will easily find many online resources on the topic. After grasping the basics, regex101 is a nice tool to experiment with regexes and explore their capabilities. When using it, you should probably choose the ECMAScript (JavaScript) flavor from the menu on the left.
Equivalent solution without regex
An equivalent solution is to find the first '[' manually, as well as where the '[...](...)' pattern ends. Then slice the parts (before '[', the pattern, and after the pattern) from the string, and probably trim them. So just loop over the characters of the string in search of '[' and then ']', '(', ')'. Note that in this case you can easily and granularly decide what to do if the string has an unexpected or incorrect form.
TODO: I will probably sketch some code when I have time for it
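One possible sketch of the non-regex approach described above, using indexOf instead of a hand-written character loop. The function name splitAroundLink and the fallback for malformed input are assumptions, not part of the original answer, and only the first link in the string is handled.

```javascript
// Hypothetical helper: splits a string around the first "[...](...)" link without regex.
function splitAroundLink(str) {
  const open = str.indexOf("[");
  const mid = open === -1 ? -1 : str.indexOf("](", open);
  const close = mid === -1 ? -1 : str.indexOf(")", mid);
  if (close === -1) {
    // No well-formed link found: fall back to splitting on spaces.
    return str.split(" ").filter(Boolean);
  }
  const before = str.slice(0, open).trim();
  const link = str.slice(open, close + 1);
  const after = str.slice(close + 1).trim();
  // Drop empty parts, e.g. when the link starts or ends the string.
  return [before, link, after].filter(Boolean);
}

console.log(splitAroundLink("My [Link format](https://google.com) demo"));
// [ 'My', '[Link format](https://google.com)', 'demo' ]
```

The fallback branch is the place to decide what should happen with unexpected input, for example throwing an error instead.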
Regex is your friend!
const regexMdLinks = /!?\[([^\]]*)\]\(([^\)]+)\)/gm
// Example md file contents
const str = `My [Link format](https://google.com) demo My [Link format2](https://google.com/2) demo2`
let regex_splitted = str.split(regexMdLinks);
let arr = [];
// split() with two capture groups yields: text, link text, url, text, ...
for (let i = 0; i < regex_splitted.length; i += 3) {
  // Plain text: split on spaces and drop empty strings
  arr.push(...regex_splitted[i].split(" ").filter(w => w));
  // Rebuild the link from its captured text and url, if one follows
  if (i + 2 < regex_splitted.length) {
    arr.push("[" + regex_splitted[i + 1] + "](" + regex_splitted[i + 2] + ")");
  }
}
console.log(arr)

Regexp group not excluding dots

Let's say I have the following string: div.classOneA.classOneB#idOne
Trying to write a regexp which extracts the classes (classOneA, classOneB) from it. I was able to do this but with Lookbehind assertion only.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to achieve this without the lookbehind approach, and I do not really understand why my solution isn't working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
Thought that the grouping will solve my problem but all matching item contains the dot as well.
There isn't a good way to have .match() in JavaScript both match multiple times (/g option) and pick up capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
output.push(matches[1]);
}
This is because with the g modifier you get all matching substrings but not their capture groups (that is, as if the (...) pairs worked just like (?:...) ones).
See for yourself. Without the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you obtain all matches but only the whole match (0 item) per each.
There are many solutions:
Use LookBehind assertions as you pointed out yourself.
Fix each result later adding .map(x=>x.replace(/^\./, ""))
Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
Use .replace() + callback instead of .match() in order to be able to access capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (if there aren't other possible inputs that could eventually break it) because it is much more efficient (actual regular expressions are used only once to trim the '#idOne' part).
If you want to keep your regex, you can simply map over the results and replace the . with an empty string:
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)
If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you must use lookbehind.
This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertions were only added to JavaScript recently (ES2018), so older engines do not support them.
I'm not an expert on using regex, particularly in JavaScript, but after some research on MDN I've figured out why your attempt wasn't working and how to fix it.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
result.push(grp[1]);
}
console.log(result)
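In engines that support ES2020, String.prototype.matchAll offers another way to get at the capture groups of every match without a manual exec loop. This is a minimal sketch on the same input; note that matchAll requires the g flag.

```javascript
const input = "div.classOneA.classOneB#idOne";

// matchAll returns an iterator of match arrays; index 1 holds the first capture group.
const classes = Array.from(input.matchAll(/\.([^.#]+)/g), (m) => m[1]);

console.log(classes); // [ 'classOneA', 'classOneB' ]
```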

Regex .exec into array

I want to capture some values in a string, THEN return them to the page. Here is an example of the code. As I understand, the .exec should store the values it matches into the array correct? This should return Savage, Betsy. Can someone enlighten me on to what's wrong?
var regex = /\b(Betsy)(Savage)\b/i;
var string = "My friend is Betsy Ann Savage";
var arrayMatch = null;
while(arrayMatch = regex.exec(string)){
document.getElementById("text").innerHTML = arrayMatch[1] + ", " + arrayMatch[0];
}
You don't get any matches like this. You could add .* between (Betsy) and (Savage)...
It sounds like you think \b(Betsy)(Savage)\b will match either Betsy or Savage, but that isn't the case. It looks for one string in which both parts are joined together; you might as well try to match \b(BetsySavage)\b. Yes, you do have two groups in separate parentheses, but they sit directly next to each other, so the regex engine looks for both right next to each other. I think what you really want is |, which represents an OR, as in \b(Betsy|Savage)\b.
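To get both names out of one match, as the first answer hints, something can be allowed between the two groups. The sketch below assumes that is the intent. It also replaces the while loop with a single exec call, because exec on a regex without the g flag always starts from the beginning of the string, so a successful match would make that loop run forever. The DOM update is left out so the snippet runs anywhere.

```javascript
const regex = /\b(Betsy)\b.*\b(Savage)\b/i;
const text = "My friend is Betsy Ann Savage";

// match[0] is the whole match; match[1] and match[2] are the two capture groups.
const match = regex.exec(text);
const formatted = match ? match[2] + ", " + match[1] : "";

console.log(formatted); // Savage, Betsy
```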

Javascript regex expression to replace multiple strings?

I've a string done like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove all the initial domain, the multiple underscore and the percentage stuff?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact output would be "My happy dog is cool!" so I would like to get rid of the initial address and remove the underscores and percentage and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | just serves as an or operator, and (%\d{2,}) matches a % sign followed by two or more digit characters.
The parentheses surrounding each part of the regex around the |, serve to identify matching groups, which are used to identify what parts should be replaced by the ' ' (single-space) string in the second of the arguments passed to replace().
References:
lastIndexOf().
replace().
substring().
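A variation on the approach above, offered as a sketch rather than a drop-in replacement: treating each run of underscores and percent escapes as a single unit avoids the double space that "_%28" would otherwise produce, and allowing hex digits covers escapes such as %2F. The function name cleanTitle is made up for illustration.

```javascript
// Hypothetical helper: keep the last path segment and turn each run of
// underscores and percent escapes into one space.
function cleanTitle(url) {
  const lastSegment = url.substring(url.lastIndexOf("/") + 1);
  return lastSegment.replace(/(?:_|%[0-9A-Fa-f]{2})+/g, " ").trim();
}

console.log(cleanTitle("http://something.org/dom/My_happy_dog_%28is%29cool!"));
// My happy dog is cool!
```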
You can use unescape to decode the percentages:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
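Since unescape is deprecated, decodeURIComponent is the usual alternative. The sketch below assumes the last path segment is validly percent-encoded, because decodeURIComponent throws a URIError on malformed sequences. Note that decoding keeps the parentheses that %28 and %29 stand for.

```javascript
const url = "http://something.org/dom/My_happy_dog_%28is%29cool!";

// Decode only the last path segment, then turn underscores into spaces.
const decoded = decodeURIComponent(url.substring(url.lastIndexOf("/") + 1))
  .replace(/_/g, " ");

console.log(decoded); // My happy dog (is)cool!
```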
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
ok, if you want to replace all that stuff I think that you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");

Javascript Regex: replacing the last dot for a comma

I have the following code:
var x = "100.007"
x = String(parseFloat(x).toFixed(2));
return x
=> 100.01
This works awesomely just how I want it to work. I just want a tiny addition, which is something like:
var x = "100,007"
x.replace(",", ".")
x.replace
x = String(parseFloat(x).toFixed(2));
x.replace(".", ",")
return x
=> 100,01
However, this code will replace the first occurrence of the ",", where I want to catch the last one. Any help would be appreciated.
You can do it with a regular expression:
x = x.replace(/,([^,]*)$/, ".$1");
That regular expression matches a comma followed by any amount of text not including a comma, up to the end of the string. The replacement string is just a period followed by whatever it was that came after the original last comma. Other commas preceding it in the string won't be affected.
Now, if you're really converting numbers formatted in "European style" (for lack of a better term), you're also going to need to worry about the "." characters in places where a "U.S. style" number would have commas. I think you would probably just want to get rid of them:
x = x.replace(/\./g, '');
When you use the ".replace()" function on a string, you should understand that it returns the modified string. It does not modify the original string, however, so a statement like:
x.replace(/something/, "something else");
has no effect on the value of "x".
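Putting those points together, here is a rough sketch of the whole round trip, with each replace result passed on rather than discarded. The function name roundEuropean is hypothetical, and the sketch assumes "." is only ever used as a thousands separator in the input.

```javascript
// Hypothetical helper: "1.234,567" -> "1234,57"
function roundEuropean(input) {
  const normalized = input
    .replace(/\./g, "")           // drop thousands separators
    .replace(/,([^,]*)$/, ".$1"); // the last comma becomes the decimal point
  // toFixed returns a string containing a single ".", which is swapped back to ",".
  return parseFloat(normalized).toFixed(2).replace(".", ",");
}

console.log(roundEuropean("100,007"));   // 100,01
console.log(roundEuropean("1.234,567")); // 1234,57
```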
You can use a regexp. You want to replace the last ',', so the basic idea is to replace the ',' for which there's no ',' after.
x.replace(/,([^,]*)$/, ".$1");
Will return what you want :-).
You could do it using the lastIndexOf() function to find the last occurrence of the , and replace it.
The alternative is to use a regular expression with the end of line marker:
myOldString.replace(/,([^,]*)$/, ".$1");
You can use lastIndexOf to find the last occurrence of ,. Then you can use slice to put the parts before and after the , together with a . in between.
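A minimal sketch of the lastIndexOf and slice approach; the function name replaceLastComma is made up for illustration.

```javascript
// Hypothetical helper: replaces only the last "," in the string with ".".
function replaceLastComma(s) {
  const i = s.lastIndexOf(",");
  if (i === -1) return s; // no comma: return the string unchanged
  return s.slice(0, i) + "." + s.slice(i + 1);
}

console.log(replaceLastComma("100,007"));  // 100.007
console.log(replaceLastComma("1,234,56")); // 1,234.56
```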
You don't need to worry about whether or not it's the last ".", because there is only one. JavaScript doesn't store numbers internally with comma or dot-delimited sets.
