RegExp in Javascript to find all parenthesis constructions - javascript

I have expressions like this: 27+3/(12-5)+9-(2*(12-10)+(7-6))
I need to collect all the parenthesized groups into an array like this:
[(12-5),(2*(12-10)+(7-6)),(12-10),(7-6)]
Or an array of a similar shape. Is there an easy way to write a RegExp for this case? Something like:
const myExprStr = '27+3/(12-5)+9-(2*(12-10)+(7-6))';
const neededParenthesisArray = [...[], ...myExprStr.matchAll([some magic regexp])];
So the question is: can someone share the needed RegExp, or are there docs that explain how to build it?

This assumes, as you said, that there will not be much nesting. You can see how easily things blow up. The reason is the theoretical limit of regular expressions: a regular language is easier to parse, but it is not meant to count, and matching nested parentheses requires counting. If you have only three levels, we can use a trick, but if you want to go deeper, better use an appropriate parser.
Capturing all single parentheses:
\(([^()\n]*)\)
double:
\(([^()\n]*\([^()\n]*\)[^()\n]*)+\)
triple:
\(([^()\n]*\(([^()\n]*\([^()\n]*\)[^()\n]*)+\))+[^()\n]*\)
https://regex101.com/r/n8SVYH/1
https://regex101.com/r/n8SVYH/2
https://regex101.com/r/n8SVYH/3
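For arbitrary nesting depth, the counting that a regex cannot do can be handled by a small stack-based scan. The following is only a minimal sketch and not part of the answer above; the function name findParenGroups is made up for illustration. It returns the groups in the order in which their opening parenthesis appears, which matches the array asked for in the question.

```javascript
// Hypothetical helper: collect every balanced "(...)" group, at any depth.
function findParenGroups(expr) {
  const open = [];    // indices into `groups` for parentheses still open
  const groups = [];  // one entry per "(", in order of appearance

  for (let i = 0; i < expr.length; i++) {
    if (expr[i] === '(') {
      open.push(groups.length);
      groups.push({ start: i, text: null });
    } else if (expr[i] === ')') {
      if (open.length === 0) {
        throw new Error(`Unmatched ")" at index ${i}`);
      }
      const group = groups[open.pop()];
      group.text = expr.slice(group.start, i + 1);
    }
  }

  if (open.length > 0) {
    throw new Error('Unmatched "(" in expression');
  }
  return groups.map((group) => group.text);
}

const myExprStr = '27+3/(12-5)+9-(2*(12-10)+(7-6))';
const found = findParenGroups(myExprStr);
console.log(found);
// [ '(12-5)', '(2*(12-10)+(7-6))', '(12-10)', '(7-6)' ]
```

The stack is what does the counting here: each ")" closes the most recently opened "(", so the depth never has to be hard-coded into a pattern.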

Related

Performance about replace() or substr() in Javascript

I was wondering about Javascript performance about using string.replace() or string.substr(). Let me explain what I'm doing.
I've a string like
str = "a.aa.a.aa."
I just have to "pop" last element in str where I always know what type of character it is (e.g, it's a dot here).
It's so simple, I can follow a lot of ways, like
str = str.substr(0, str.length-1) // same as using slice()
or
str = str.replace(/\.$/, '')
Which methods would you use? Why? Is there some lack in performance using this or that method? Length of the string is negligible.
(this is my first post, so if I'm doing something wrong please, notify me!)
For performance tests in JavaScript use jsPerf.com
I created a test case for your question here, which shows that substr is a lot faster (at least in Firefox).
If you just want the last character in the string, then use the subscript, not some replacement:
str[str.length-1]
Do you have to do this thousands of times in a loop? If not (and "Length of string is negligible"), any way will do.
That said, I'd prefer the first option, since it makes the intention of trimming the last character more clear than the second one (oh, and it's faster, in case you do need to run this a zillion times. Since in the regex case, you need to not only build a new string but also compile a RegExp and run it against the input.)
When you have this kind of doubt, either pick what you like the best (style-speaking, as running this only once doesn't matter much), or use http://jsperf.com.
For this very example, see here why substr is better :-).
The substr way should always be faster than any kind of RegExp. But the performance difference should be minor.
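To make the options concrete, here is a small sketch that puts them side by side on the string from the question. It is not a benchmark and says nothing about speed. It uses slice rather than substr, since the two behave the same here and substr is generally treated as a legacy method; the variable names are made up for illustration.

```javascript
const str = 'a.aa.a.aa.';

// Drop the last character unconditionally.
const viaSlice = str.slice(0, -1);

// Drop a trailing dot only if there is one.
const viaRegex = str.replace(/\.$/, '');

// Same guard without a regex.
const viaCheck = str.endsWith('.') ? str.slice(0, -1) : str;

console.log(viaSlice, viaRegex, viaCheck);
// a.aa.a.aa a.aa.a.aa a.aa.a.aa
```

The practical difference is in the edge case: when the string does not end in a dot, slice still removes a character, while the regex and the endsWith check leave the string unchanged.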

Javascript String Parsing

I am trying to parse a string in this format
[something](something something) [something](something something)
and I want to break on every space that is not between a pair of parentheses.
I tried using JS string.split with this regex, /[^\(].*\s+.*[^\)]/g, but it doesn't work. Any suggestions appreciated :-)
EDIT: I don't want to post this as an answer, because I want to leave it open to comments but I finally found a solution.
var a = "the>[the](the the) the>[the](the the) the"
var regex = /\s+(?!\w+[\)])/
var b = a.split(regex)
alert(b.join("+++"))
Is your input always this consistent? If it is, it could be as simple as splitting your string on ') ['
If it isn't, is it possible to just take what is between [ and )? Or is there some kind of nesting that is going on?
You are using the wrong tool for the job.
As was alluded to in this famous post, regular expressions cannot parse non-regular languages, and the "balanced parentheses" problem cannot be described by a regular language.
Have you tried writing a parser instead?
EDIT:
It seems that you've finally clarified that nesting is not a requirement. In that case, I'd suggest gnur's solution.
This regex will do exactly what you asked, and nothing more:
'[x](x x) [x](x x)'.split(/ +(?![^\(]*\))/);
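As a quick check, the same lookahead can be applied to the sample string from the question's edit. This is a sketch that assumes parentheses are balanced and never nested; the wrapper name splitOutsideParens is made up for illustration.

```javascript
// Hypothetical wrapper: split on runs of spaces that are not inside "(...)".
// The lookahead rejects a space when a ")" follows before any "(" does,
// which means the space sits inside an open pair of parentheses.
function splitOutsideParens(text) {
  return text.split(/ +(?![^(]*\))/);
}

const parts = splitOutsideParens('the>[the](the the) the>[the](the the) the');
console.log(parts);
// [ 'the>[the](the the)', 'the>[the](the the)', 'the' ]
```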

Fastest way to remove hyphens from a string

I have IDs that look like: 185-51-671 but they can also have letters at the end, 175-1-7b
All I want to do is remove the hyphens, as a pre-processing step. Show me some cool ways to do this in javascript? I figure there are probably quite a few questions like this one, but I'm interested to see what optimizations people will come up with for "just hyphens"
Thanks!
edit: I am using jQuery, so I guess .replace(a,b) does the trick (replacing a with b)
numberNoHyphens = number.replace("-","");
any other alternatives?
edit #2:
So, just in case anyone is wondering, the correct answer was
numberNoHyphens = number.replace(/-/g,"");
and you need the "g" which is the pattern switch or "global flag" because
numberNoHyphens = number.replace(/-/,"");
will only match and replace the first hyphen
You need to include the global flag:
var str="185-51-671";
var newStr = str.replace(/-/g, "");
This is not faster, but
str.split('-').join('');
should also work.
I set up a jsperf test if anyone wants to add and compare their methods, but it's unlikely anything will be faster than the replace method.
http://jsperf.com/remove-hyphens-from-string
var str='185-51-671';
str=str.replace(/-/g,'');
This gets much easier with String.prototype.replaceAll(). Check the browser support for the built-in method.
const str = '185-51-671';
console.log(str.replaceAll('-', ''));
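The variants mentioned in this thread can be compared side by side. This is a small sketch on one of the IDs from the question; the variable names are made up for illustration, and replaceAll needs a runtime that implements ES2021.

```javascript
const id = '175-1-7b';

// A string pattern (or a regex without the g flag) replaces only the first hyphen.
const firstOnly = id.replace('-', '');

// These three remove every hyphen.
const viaRegex = id.replace(/-/g, '');
const viaSplitJoin = id.split('-').join('');
const viaReplaceAll = id.replaceAll('-', '');

console.log(firstOnly);                              // 1751-7b
console.log(viaRegex, viaSplitJoin, viaReplaceAll);  // 17517b 17517b 17517b
```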
Some of these answers, prior to edits, did not remove all of the hyphens. You would need to use .replaceAll("-","")
In R's tidyverse (the stringr package), there are multiple functions that could suit your needs. str_remove replaces the first match of the given character with an empty string (""), effectively removing it; str_remove_all does the same for every match, which is what you need to drop all of the hyphens (check the documentation here). Example of its usage:
str_remove_all(x, '-')

Javascript Regular Expressions Lookbehind Failing

I am hoping that this will have a pretty quick and simple answer. I am using regular-expressions.info to help me get the right regular expression to turn URL-encoded, ISO-8859-1 pound sign ("%A3"), into a URL-encoded UTF-8 pound sign ("%C2%A3").
In other words I just want to swap %A3 with %C2%A3, when the %A3 is not already prefixed with %C2.
So I would have thought the following would work:
Regular Expression: (?!(\%C2))\%A3
Replace With: %C2%A3
But it doesn't and I can't figure out why!
I assume my syntax is just slightly wrong, but I can't figure it out! Any ideas?
FYI - I know that the following will work (and have used this as a workaround in the meantime), but really want to understand why the former doesn't work.
Regular Expression: ([^\%C2])\%A3
Replace With: $1%C2%A3
TIA!
Why not just replace ((%C2)?%A3) with %C2%A3, making the prefix an optional part of the match? It means that you're "replacing" text with itself even when it's already right, but I don't foresee a performance issue.
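Unfortunately, the (?!...) syntax is negative lookahead, not lookbehind. It checks the text starting at the current position, so (?!%C2)%A3 only asserts that "%A3" does not begin with "%C2", which is always true. Negative lookbehind is written (?<!...); JavaScript only gained it in ES2018, so older engines do not support it.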
What you could do is go forward with the replacement anyway, and end up with %C2%C2%A3 strings, but these could easily be converted in a second pass to the desired %C2%A3.
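The approaches discussed so far can be sketched side by side. The input string below is invented for illustration. The lookbehind variant relies on the (?<!...) syntax from ES2018 and will throw a syntax error in older engines, while the other two work in any engine.

```javascript
// Made-up sample: one bare %A3 and one that is already prefixed with %C2.
const input = 'price=%A3100&ok=%C2%A3200';

// Optional prefix: match %A3 with or without a leading %C2, rewrite both.
const viaOptionalPrefix = input.replace(/(?:%C2)?%A3/g, '%C2%A3');

// Negative lookbehind (ES2018): match %A3 only when %C2 does not precede it.
const viaLookbehind = input.replace(/(?<!%C2)%A3/g, '%C2%A3');

// Two passes: prefix every %A3, then collapse the doubled prefix.
const viaTwoPasses = input
  .replace(/%A3/g, '%C2%A3')
  .replace(/%C2%C2%A3/g, '%C2%A3');

console.log(viaOptionalPrefix);
// price=%C2%A3100&ok=%C2%A3200
```

All three produce the same result for this input.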
You could replace
(^.?.?|(?!%C2)...)%A3
with
$1%C2%A3
I would suggest you use the functional form of Javascript String.replace (see the section "Specifying a function as a parameter"). This lets you put arbitrary logic, including state if necessary, into a regexp-matching session. For your case, I'd use a simpler regexp that matches a superset of what you want, then in the function call you can test whether it meets your exact criteria, and if it doesn't then just return the matched string as is.
The only problem with this approach is that if you have overlapping potential matches, you have the possibility of missing the second match, since there's no way to return a value to tell the replace() method that it isn't really a match after all.
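A minimal sketch of that functional form for the pound-sign case, assuming the superset pattern is "%A3 with an optional %C2 in front". The function name fixPoundSign and the sample input are made up for illustration.

```javascript
// Hypothetical helper: add the %C2 prefix only where it is missing.
function fixPoundSign(encoded) {
  // The regex matches a superset (%A3 with or without %C2 in front);
  // the callback decides what each match becomes.
  return encoded.replace(/(%C2)?%A3/g, (match, prefix) =>
    prefix ? match : '%C2%A3' // already prefixed: return the match as is
  );
}

console.log(fixPoundSign('a%A3b%C2%A3c'));
// a%C2%A3b%C2%A3c
```

Because the optional prefix is consumed as part of the match, the overlap problem described above does not arise for this particular pattern.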

Processing Javascript RegEx submatches

I am trying to write some JavaScript RegEx to replace user-entered tags with real HTML tags, so [b] will become <b> and so forth. The RegEx I am using looks like this:
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;
with the following JavaScript
s.replace(exptags,"<$1>$2</$1>");
This works fine for single (non-nested) tags, for example:
[b]hello[/b] [u]world[/u]
but if the tags are nested inside each other it will only match the outer tags, for example
[b]foo [u]to the[/u] bar[/b]
This will only match the b tags. How can I fix this? Should I just loop until the starting string is the same as the outcome? I have a feeling that the ((.){1,}?) pattern is wrong as well.
Thanks
The easiest solution would be to replace all the tags, whether they are closed or not, and let .innerHTML work out whether they are matched. It will be much more resilient that way.
var tagreg = /\[(\/?)(b|u|i|s|center|code)]/ig
div.innerHTML="[b][i]helloworld[/b]".replace(tagreg, "<$1$2>") //no closing i
//div.innerHTML=="<b><i>helloworld</i></b>"
AFAIK you can't express recursion with regular expressions.
You can however do that with .NET's System.Text.RegularExpressions using balanced matching. See more here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
If you're using .NET you can probably implement what you need with a callback.
If not, you may have to roll your own little javascript parser.
Then again, if you can afford to hit the server you can use the full parser. :)
What do you need this for, anyway? If it is for anything other than a preview I highly recommend doing the processing server-side.
You could just repeatedly apply the regexp until it no longer matches. That would do odd things like "[b][b]foo[/b][/b]" => "<b>[b]foo</b>[/b]" => "<b><b>foo</b></b>", but as far as I can see the end result will still be a sensible string with matching (though not necessarily properly nested) tags.
Or if you want to do it 'right', just write a simple recursive descent parser. Though people might expect "[b]foo[u]bar[/b]baz[/u]" to work, which is tricky to recognise with a parser.
The reason the nested block doesn't get replaced is because the match, for [b], places the position after [/b]. Thus, everything that ((.){1,}?) matches is then ignored.
It is possible to write a recursive parser in server-side -- Perl uses qr// and Ruby probably has something similar.
That said, you don't necessarily need true recursion. You can use a relatively simple loop to handle the string equivalently:
var s = '[b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]';
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;
while (s.match(exptags)) {
  s = s.replace(exptags, "<$1>$2</$1>");
}
document.writeln('<div>' + s + '</div>'); // after
In this case, it'll make 2 passes:
0: [b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]
1: <b>hello</b> <u>world</u> <b>foo [u]to the[/u] bar</b>
2: <b>hello</b> <u>world</u> <b>foo <u>to the</u> bar</b>
Also, a few suggestions for cleaning up the RegEx:
var exptags = /\[(b|u|i|s|center|code)\](.+?)\[\/(\1)\]/ig;
{1} is assumed when no other count specifiers exist
{1,} can be shortened to +
Agree with Richard Szalay, but his regex didn't get quoted right:
var exptags = /\[(b|u|i|s|center|code)](.*)\[\/\1]/ig;
is cleaner. Note that I also changed .+? to .*. There are two problems with .+?:
you won't match [u][/u], since there isn't at least one character between them (+)
a non-greedy match won't deal as nicely with the same tag nested inside itself (?)
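The quantifier choice is easy to try out. The sketch below is an illustration rather than part of any answer here: it runs three variants of the pattern over an empty tag pair and over two sibling tags. The names lazyPlus, greedyStar, lazyStar and convertOnce are made up.

```javascript
const lazyPlus = /\[(b|u|i|s|center|code)\](.+?)\[\/\1\]/gi;
const greedyStar = /\[(b|u|i|s|center|code)\](.*)\[\/\1\]/gi;
const lazyStar = /\[(b|u|i|s|center|code)\](.*?)\[\/\1\]/gi;

// One replacement pass with the given pattern.
const convertOnce = (s, re) => s.replace(re, '<$1>$2</$1>');

// Empty tag pair: + needs at least one character between the tags.
console.log(convertOnce('[u][/u]', lazyPlus));   // [u][/u]
console.log(convertOnce('[u][/u]', greedyStar)); // <u></u>

// Two sibling tags: the greedy .* runs to the last closing tag.
const siblings = '[b]a[/b] [b]c[/b]';
console.log(convertOnce(siblings, greedyStar));  // <b>a[/b] [b]c</b>
console.log(convertOnce(siblings, lazyStar));    // <b>a</b> <b>c</b>
```

So the greedy form fixes the empty-tag case but swallows sibling tags of the same kind; .*? is one possible middle ground, still combined with the loop for nested tags.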
Yes, you will have to loop. Alternatively, since your tags look so much like HTML ones, you could replace [b] with <b> and [/b] with </b> separately. (.){1,}? is the same as (.*?) - that is, any symbols, least possible sequence length.
Updated: Thanks to MrP, (.){1,}? is (.)+?, my bad.
How about:
tagreg=/\[(.?)?(b|u|i|s|center|code)\]/gi;
"[b][i]helloworld[/i][/b]".replace(tagreg, "<$1$2>");
"[b]helloworld[/b]".replace(tagreg, "<$1$2>");
For me the above produces:
<b><i>helloworld</i></b>
<b>helloworld</b>
This appears to do what you want, and has the advantage of needing only a single pass.
Disclaimer: I don't code often in JS, so if I made any mistakes please feel free to point them out :-)
You are right about the inner pattern being troublesome.
((.){1,}?)
That is doing a captured match at least once and then the whole thing is captured. Every character inside your tag will be captured as a group.
You are also capturing your closing element name when you don't need it and are using {1} when that is implied. Below is a cleaned-up version:
/\[(b|u|i|s|center|code)](.+?)\[\/\1]/ig
Not sure about the other problem.
