JS Regex to match character outside nested braces [duplicate] - javascript

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 5 years ago.
I need to write a regex to target commas that exist only outside a pair of brackets or braces.
Currently I have:
var regex = /,(?![^{]*})(?![^[]*])/g
When the target string is:
var str = '"a":[{"b":2,"c":["d"]}],"b":2' // OK: only second comma matches
the pattern correctly matches only the second comma.
When the target string is:
var str = '"a":[{"b":2,"c":{"d":9}}],"b":2' // OK: only second comma matches
the pattern also correctly matches only the second comma.
However, when the target string includes an array and an object, the negative lookahead fails and the regex matches both commas.
var str = '"a":[{"b":2,"c":[{"d":9}]}],"b":2' // BAD: both commas match

This regex works for the example above:
,(?!([^{]*{[^{]*})?[^{]*})(?!([^[]*\[[^[]*])?[^[]*])
But it's not a generic regex.
For each additional level of nested brackets you need to expand the regex. For example, to also handle "a":[{"b":2,"c":[{"d":[{"e":9}]}]}],"b":2 you will need:
,(?!([^{]*{([^{]*{[^{]*})?[^{]*})?[^{]*})(?!([^[]*\[([^[]*\[[^[]*])?[^[]*])?[^[]*])
This does not scale, because the pattern has to grow with the nesting depth, but it's the only approach that works with regexes alone.
Just for curiosity.
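For arbitrary nesting depth, a non-regex alternative is to scan the string once and keep a depth counter. The following is only a minimal sketch: `splitTopLevel` is a hypothetical helper name, and it assumes the brackets are balanced and that no bracket or brace appears inside a quoted string.

```javascript
// Hypothetical helper: split `input` on commas that sit outside every
// [...] and {...} pair. Assumes balanced brackets and no brackets
// inside quoted strings.
function splitTopLevel(input) {
  const parts = [];
  let depth = 0; // number of currently open [ or {
  let start = 0; // index where the current part begins
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "[" || ch === "{") {
      depth++;
    } else if (ch === "]" || ch === "}") {
      depth--;
    } else if (ch === "," && depth === 0) {
      // Only a comma at depth 0 is a separator.
      parts.push(input.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(input.slice(start));
  return parts;
}

// Splits only at the comma before "b":2, giving two parts.
console.log(splitTopLevel('"a":[{"b":2,"c":[{"d":9}]}],"b":2'));
```

Because the counter tracks depth rather than a fixed pattern, the same function handles deeper nesting without any changes.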

Related

Split a string containing consecutive commas plus comma wrapped in quotation marks [duplicate]

This question already has answers here:
Splitting on comma outside quotes
(5 answers)
Closed 3 years ago.
I'm trying to split a string that contains consecutive commas as well as a comma wrapped in quotation marks but can't quite get the result that I want. Here is an example of a string I have:
var str = '10,Apple,"Sweet Gala apple, from Australia",,,,,,in stock,3.99'
where the third element has a comma inside quotation marks, followed by multiple commas.
I want to split the string by commas under two conditions: 1) don't split on a comma that's wrapped inside quotation marks; 2) consecutive commas should produce empty strings between them.
When I use the regex below:
str.match(/(".*?"|[^,]+)/g)
The result is the array below, which meets the first condition but fails to insert empty strings between the consecutive commas:
["10","Apple",'"Sweet Gala apple, from Australia"',"in stock","3.99"]
I want it to look like:
["10","Apple",'"Sweet Gala apple, from Australia"','','','','','',"in stock","3.99"]
What do I need to do to meet the above two conditions?
The main problem here is that you want zero-length matches only under certain conditions, but if the pattern permits an empty match, the engine will try for one at every position, even right after a non-empty match (after matching 10, it will also match the empty string between 10 and the following ,). A plain global match alone can't tell that case apart from the ,,,,,, situation.
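A small illustration of that behaviour, using a shortened made-up input (`line` is not from the question) and plain field patterns without the quote handling:

```javascript
// Illustrative input: two fields with one empty field between them.
const line = "10,,20";

// With +, empty fields are skipped entirely.
const viaPlus = line.match(/[^,]+/g);
console.log(viaPlus); // ["10", "20"]

// With *, an empty match is produced at every comma and at the end of
// the string, not only between two consecutive commas.
const viaStar = line.match(/[^,]*/g);
console.log(viaStar); // ["10", "", "", "20", ""]
```

The second result contains three empty strings even though the input has only one empty field, which is why the empty matches can't simply be kept as they are.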
I'd use split rather than match: split on a comma, with a negative lookahead for a run of non-" characters followed by ", (a closing quote and a comma), to ensure that the matched comma is not inside a quoted "..." sequence:
var str = '10,Apple,"Sweet Gala apple, from Australia",,,,,,in stock,3.99';
const result = str.split(/,(?![^"]*",)/);
console.log(result);
If a quoted field may come at the very end of the string, then alternate the , at the end of the negative lookahead with $:
var str = '10,Apple,"Sweet Gala apple, from Australia",,,,,,in stock,3.99,"foo, bar"';
const result = str.split(/,(?![^"]*"(?:,|$))/);
console.log(result);
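Another answer suggested changing your regex to use * instead of +. Note that, for the reason given above, this version also returns an extra empty string after every non-empty field: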
str.match(/(".*?"|[^,]*)/g)

Javascript Replace all commas not in double quotes [duplicate]

This question already has answers here:
Regex to match all instances not inside quotes
(4 answers)
Closed 3 years ago.
I would like to replace all commas in a comma-delimited string with a pipe ('|') except for those that are found in double quotes. I would prefer to use the JavaScript "replace" function if possible.
My regex knowledge is limited at best. I am able to replace all commas with pipes, but that does not give me the desired result for parsing through the data. I also found a regex on here that removed all commas except those in quotations, but does not implement a pipe or some other delimiter.
(?!\B"[^"]*),(?![^"]*"\B)
Here is an example of what I'm trying to accomplish:
string1 = 1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,
and I would like it to look like:
string1 = 1234|Cake||"Smith,John"||"Status: Acknowledge,Accept"||Red||
One option is to use a replace callback that matches either a quoted string or a comma, and replaces it with the quoted string itself or a pipe, respectively:
str = `1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,`;
res = str.replace(/(".*?")|,/g, (...m) => m[1] || '|');
console.log(res)
Another option (and IMO a better one in the long run) is to use a dedicated parser to work with CSV data. CSV is actually trickier than it looks.
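To illustrate the kind of state a parser has to track, here is a minimal hand-rolled sketch. It is not a full CSV parser: `replaceUnquotedCommas` is a hypothetical helper, and it assumes a single line with properly closed double quotes.

```javascript
// Hypothetical helper, not a full CSV parser: swaps commas for `delimiter`
// unless they fall inside a double-quoted field.
function replaceUnquotedCommas(line, delimiter = "|") {
  let inQuotes = false;
  let out = "";
  for (const ch of line) {
    if (ch === '"') {
      // An escaped "" toggles the flag twice, so the state is preserved.
      inQuotes = !inQuotes;
      out += ch;
    } else if (ch === "," && !inQuotes) {
      out += delimiter;
    } else {
      out += ch;
    }
  }
  return out;
}

const csvLine = '1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,';
console.log(replaceUnquotedCommas(csvLine));
// 1234|Cake||"Smith,John"||"Status: Acknowledge,Accept"||Red||
```

Fields that span several lines or use other quoting rules are outside what this sketch covers, which is where a real CSV library earns its keep.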
We can simply capture our desired commas using alternation with a simple expression such as:
(".+?")|(,)

javascript regex find words contains (at) in text [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 4 years ago.
I've got a bunch of strings to search, and I need to find every word that contains the characters "(at)" and gather them in an array.
Sometimes "(at)" is used as a replacement for the "@" sign. So let's say my goal is to find something like this: "account(at)example.com".
I tried this code:
let gathering = myString.match(/(^|\.\s+)((at)[^.]*\.)/g);
but it does not work. How can I do it?
I found a regex for finding email addresses in text:
/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi
I'm thinking of something similar, but unfortunately I can't just replace @ with (at) here.
var longString = "abc(at).com xyzat.com";
var regex = RegExp("[(]at[)]");
var wordList = longString.split(" ").filter((elem) => {
  return regex.test(elem);
});
This way you get an array of all the words in the provided string that contain "(at)".
You could use \S+ to match one or more non-whitespace characters, and escape the parentheses as \( and \):
\S+\(at\)\S+\.\w{2,}
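As a quick usage sketch, the pattern can be combined with the g flag and match to collect every such word. The sample text and the variable names are made up for illustration.

```javascript
// The pattern from the answer above, with the g flag to find all occurrences.
const atWord = /\S+\(at\)\S+\.\w{2,}/g;

// Made-up sample text containing two obfuscated addresses.
const sampleText = "Write to account(at)example.com or admin(at)test.org for help";

// match returns null when nothing is found, so fall back to an empty array.
const foundWords = sampleText.match(atWord) || [];
console.log(foundWords); // ["account(at)example.com", "admin(at)test.org"]
```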

Regex allows spaces [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
For the following regex expression:
var regex = new RegExp("^(www\\.)?[0-9A-Za-z-\\.#:%_\+~#=]+(\\.[a-zA-Z]{2,})+(/.*)?(\\?.*)?");
I don't understand why the string "www.goo gle.com" passes the regex test. When I did this:
var regex = new RegExp("^(www\\.)?[0-9A-Za-z-\\.#:%_\+~#=]+(\\.[a-zA-Z]{2,})+(/.*)?(\\?.*)?$");
i.e. added $ at the end of the regex string, the above string no longer passes, which is what I want.
I tried finding a "simulator" online to help me figure out how the regex is matching but couldn't find much help.
www.goo gle.com passes the test because www is matched by [0-9A-Za-z-\\.#:%_\+~#=]+ and .goo is matched by (\\.[a-zA-Z]{2,})+. The leading (www\\.)? and the last two groups are optional, so the regex is satisfied even if they match nothing, and without $ there is no need to match the rest of the string ( gle.com).
By adding $, the regex no longer matches, since the space is not matched by any of the subexpressions.
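One way to see this without an online tool is to print what the regex actually matched. The sketch below uses the same pattern as the question (with the + escaped as \\+ in the string literal, which is equivalent inside the character class); the variable names are illustrative.

```javascript
// Same pattern as in the question; only the trailing $ differs.
const source = "^(www\\.)?[0-9A-Za-z-\\.#:%_\\+~#=]+(\\.[a-zA-Z]{2,})+(/.*)?(\\?.*)?";
const unanchored = new RegExp(source);
const anchored = new RegExp(source + "$");

const candidate = "www.goo gle.com";

// Without $, a match on a prefix of the string is enough for test() to pass.
const prefixMatch = unanchored.exec(candidate);
console.log(prefixMatch[0]); // "www.goo"

// With $, the match must reach the end of the string, and the space blocks it.
console.log(anchored.test(candidate)); // false
```

Inspecting `exec(...)[0]` like this is often the quickest check when a pattern accepts something unexpected.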

Regular expression seems to treat one string as multiple substrings [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
Why does the following code return "ZZZCamelCase"? Doesn't this regex check whether the string starts and ends with lowercase a-z? As I understand it, the str variable should satisfy that condition, so the console output should be "ZZZZZZZZZZZZ", but it somehow breaks str apart and tests a substring against the regex. Why? And how can I tell the program to treat "testCamelCase" as one string?
var str = "testCamelCase";
console.log(str.replace(/^[a-z]+/, 'ZZZ')); // ZZZCamelCase
Here you are matching one or more lowercase letters at the start of the string. That's going to be 'test' in your string, because after that comes an uppercase 'C'. So only 'test' gets replaced by 'ZZZ':
console.log(str.replace(/^[a-z]+/, 'ZZZ')); // ZZZCamelCase
Use
str.replace(/[a-z]/ig, 'Z')
to replace every letter, which gives one 'Z' per character of the string.
You are forgetting that regex is case sensitive. This means that [a-z] doesn't match the whole string, while [a-zA-Z] does. So does \w, which also matches the digits 0-9 and the underscore.
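To treat the string as a whole, the pattern needs an end anchor as well as a case-insensitive (or extended) character class. A short sketch of the difference, with `camel` as an illustrative variable name:

```javascript
const camel = "testCamelCase";

// Anchored only at the start: matches the leading lowercase run "test".
console.log(camel.replace(/^[a-z]+/, "ZZZ")); // "ZZZCamelCase"

// Anchored at both ends, lowercase only: the uppercase letters prevent a match.
console.log(/^[a-z]+$/.test(camel)); // false

// Anchored at both ends, case-insensitive: the whole string matches as one unit.
console.log(/^[a-z]+$/i.test(camel)); // true
console.log(camel.replace(/^[a-z]+$/i, "ZZZ")); // "ZZZ"
```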
