Count parentheses with regular expression - javascript

My string is: (as(dh(kshd)kj)ad)... ()()
How is it possible to count the parentheses with a regular expression? I would like to select the string which begins at the first opening bracket and ends before the ...
Applying that to the above example, that means I would like to get this string: (as(dh(kshd)kj)ad)
I tried to write it, but this doesn't work:
var str = "(as(dh(kshd)kj)ad)... ()()";
document.write(str.match(/(.*)/m));

As I said in the comments, contrary to popular belief (don't believe everything people say) matching nested brackets is possible with regex.
The downside of using it is that you can only do it up to a fixed level of nesting. And for every additional level you wish to support, your regex will be bigger and bigger.
But don't take my word for it. Let me show you. The regex \([^()]*\) matches one level. For up to two levels see the regex here. To match your case, you'd need:
\(([^()]*|\(([^()]*|\([^()]*\))*\))*\)
It would match the leading (as(dh(kshd)kj)ad) part of: (as(dh(kshd)kj)ad)... ()()
Check the DEMO HERE and see what I mean by fixed level of nesting.
And so on. To keep adding levels, all you have to do is change the last [^()]* part to ([^()]*|\([^()]*\))* (check three levels here). As I said, it will get bigger and bigger.
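The "wrap the last part once more" rule can also be applied programmatically instead of by hand. A minimal sketch, assuming a hypothetical helper nestedParenRegex (not part of the original answer) that builds the pattern for a given maximum depth, using non-capturing groups:

```javascript
// Hypothetical helper: builds a regex that matches parentheses nested up to `depth` levels.
function nestedParenRegex(depth) {
  // Innermost level: any run of non-parenthesis characters.
  let inner = "[^()]*";
  for (let level = 1; level < depth; level++) {
    // Each extra level allows plain text or one parenthesized group, repeated.
    inner = "(?:[^()]*|\\(" + inner + "\\))*";
  }
  return new RegExp("\\(" + inner + "\\)");
}

const str = "(as(dh(kshd)kj)ad)... ()()";
console.log(str.match(nestedParenRegex(3))[0]); // "(as(dh(kshd)kj)ad)"
console.log(str.match(nestedParenRegex(2))[0]); // "(dh(kshd)kj)", the outer level is one too deep
```

The second call shows the fixed-depth limitation: with only two levels supported, the match silently starts at an inner bracket instead of the first one.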

See Tim's answer for why this won't work, but here's a function that'll do what you're after instead.
function getFirstBracket(str){
var pos = str.indexOf("("),
bracket = 0;
if(pos===-1) return false;
for(var x=pos; x<str.length; x++){
var char = str.substr(x, 1);
bracket = bracket + (char=="(" ? 1 : (char==")" ? -1 : 0));
if(bracket==0) return str.substr(pos, (x+1)-pos);
}
return false;
}
getFirstBracket("(as(dh(kshd)kj)ad)... ()(");

There is a possibility and your approach was quite good:
Match will give you an array if you had some hits, if so you can look up the array length.
var str = "(as(dh(kshd)kj)ad)... ()()",
match = str.match(new RegExp('.*?(?:\\(|\\)).*?', 'g')),
count = match ? match.length : 0;
This regular expression will get all parts of your text that include round brackets. See http://gskinner.com/RegExr/ for a nice online regex tester.
Now you can use count for all brackets.
match will deliver an array that looks like:
["(", "as(", "dh(", "kshd)", "kj)", "ad)", "... (", ")", "(", ")"]
Now you can start sorting your results:
var newStr = '', open = 0, close = 0;
for (var n = 0, m = match.length; n < m; n++) {
if (match[n].indexOf('(') !== -1) {
open++;
newStr += match[n];
} else {
if (open > close) newStr += match[n];
close++;
}
if (open === close) break;
}
... and newStr will be (as(dh(kshd)kj)ad)
This is probably not the nicest code but it will make it easier to understand what you're doing.
With this approach there is no limit of nesting levels.

This is not possible with a JavaScript regex. Generally, regular expressions can't handle arbitrary nesting because that can no longer be described by a regular language.
Several modern regex flavors do have extensions that allow for recursive matching (like PHP, Perl or .NET), but JavaScript is not among them.

No. Regular expressions express regular languages. Finite automata (FA) are the machines which recognise regular languages. An FA is, as its name implies, finite in memory. With finite memory, the FA cannot remember an arbitrary number of parentheses, which is what you would need in order to do what you want.
I suggest you use an algorithm involving a counter in order to solve your problem.
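As a rough illustration of the counter idea, here is a minimal sketch. The function name scanParens and its return shape are assumptions made for this example; it walks the string once and keeps the current nesting depth in a plain integer:

```javascript
// Hypothetical helper: one pass over the text, tracking nesting depth in a counter.
function scanParens(text) {
  let depth = 0;
  let maxDepth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth++;
      if (depth > maxDepth) maxDepth = depth;
    } else if (ch === ")") {
      depth--;
      // A closing bracket without a matching opening one.
      if (depth < 0) return { balanced: false, maxDepth: maxDepth };
    }
  }
  return { balanced: depth === 0, maxDepth: maxDepth };
}

console.log(scanParens("(as(dh(kshd)kj)ad)... ()()")); // { balanced: true, maxDepth: 3 }
console.log(scanParens("(()"));                         // { balanced: false, maxDepth: 2 }
```

Unlike the automaton, the counter can grow without bound, which is exactly the extra memory arbitrary nesting needs.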

try this jsfiddle
var str = "(as(dh(kshd)kj)ad)... ()()";
document.write(str.match(/\((.*?)\.\.\./m)[1] );

Related

3 While Loops into a Single Loop?

I have to remove the commas, periods, and hyphens from an HTML text value. I do not want to write all 3 of these while loops, instead I only want one loop (any) to do all of this.
I already tried a while with multiple && and if else nested inside, but I would always only get the commas removed.
while(beg.indexOf(',') > -1)
{
beg = beg.replace(',','');
document.twocities.begins.value= beg;
}
while(beg.indexOf('-') > -1)
{
beg = beg.replace('-','');
document.twocities.begins.value= beg;
}
while(beg.indexOf('.') > -1)
{
beg= beg.replace('.','');
document.twocities.begins.value= beg;
}
You can do all this without loops by using regex.
Here is an example of removing all those characters using a single regex:
let str = "abc,d-e.fg,hij,1-2,34.56.7890"
str = str.replace(/[,.-]/g, "")
console.log(str)
No loops are necessary for this in the first place.
You can replace characters in a string with String.replace() and you can determine which characters and patterns to replace using regular expressions.
let sampleString = "This, is. a - test - - of, the, code. ";
console.log(sampleString.replace(/[,.-]/g, "")); // dash last, so it is a literal and not the range "," to "."
A single call to the replace function and using a regular expression suffice:
document.twocities.begins.value = beg = beg.replace(/[,.-]/g, "");
Regular expressions are a pattern matching language. The pattern employed here basically says "every occurrence of one of the characters ., , or -". Note that the slashes / delimit the pattern while the suffix consists of flags controlling the matching process; in this case it is g (global), telling the engine to replace every occurrence (as opposed to only the first one without the flag).
This site provides lots of info about regular expressions, their use in programming and implementations in different programming environments.
There are several online sites to test actual regular expression and what they match (including explanations), eg. Regex 101.
Even more details ... ;): You may use the .replace function with a string as the first argument (as you did in your code sample). However, only the first occurrence of the string searched for will be replaced - thus you would have to resort to loops. Specs of the .replace function (and of JS in general) can be found here.
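The difference between a string pattern and a global regex can be seen side by side. A small sketch with a made-up input; the replaceAll line assumes an ES2021-capable engine:

```javascript
const input = "abc,d-e.fg,hij";

// A string pattern replaces only the first occurrence.
const firstOnly = input.replace(",", "");
console.log(firstOnly); // "abcd-e.fg,hij"

// replaceAll (ES2021) replaces every occurrence of one literal string.
const allCommas = input.replaceAll(",", "");
console.log(allCommas); // "abcd-e.fghij"

// A regex with the g flag replaces every occurrence of any listed character.
const allThree = input.replace(/[,.-]/g, "");
console.log(allThree); // "abcdefghij"
```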
Use regex like below.
let example = "This- is a,,., string.,";
console.log(example.replace(/[-.,]+/g, ""));

JS counting blank spaces

I have to find blank spaces in a string, this includes enter, tabs and spaces using Javascript. I have this code to find spaces
function countThis() {
var string = document.getElementById("textt").value;
var spaceCount = (string.split(" ").length - 1);
document.getElementById("countRedundants").value = spaceCount;
}
This works fine, and gives me the total number of spaces.
The problem is, I want it to count only once if spaces/enters/tabs are next to each other. I can't solve this and would appreciate some help or a pointer in the right direction.
Thanks, Gustav
You can use regular expressions in your split:
var spaceCount = (string.split(/\s+/gi).length - 1);
Use regex in order to achieve this.
For instance, you could check how many matches of one or more tabs, spaces or newlines exist, and use their count.
The regex rule is [\t\s\n]+, meaning that one or more consecutive tabs, spaces or newlines match the rule (\s alone already covers tabs and newlines, so \s+ is equivalent).
For JavaScript:
var test = "Test Test Test\nTest\nTest\n\n";
var spacesCount = test.split(/[\t\s\n]+/g).length - 1;
console.log(spacesCount);
Regex is a pretty efficient way of doing this. Alternatively, you would have to manually iterate via the object, and attempt to match the cases where one or multiple spaces, tabs, or newlines exist.
Consider that, what you are attempting to do, is used inside a compiler in order to recognize specific character sequences as specific elements, called tokens. This practice is called Lexical Analysis, or tokenization. Since regex exists, there is no need to perform this check manually, except if you want to do something very advanced or specific.
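Instead of splitting and subtracting one, the runs of whitespace can also be counted directly with match. A minimal sketch, assuming a hypothetical helper name countWhitespaceRuns:

```javascript
// Hypothetical helper: counts runs of consecutive whitespace (spaces, tabs, newlines).
function countWhitespaceRuns(text) {
  const runs = text.match(/\s+/g); // null when the text contains no whitespace
  return runs ? runs.length : 0;
}

console.log(countWhitespaceRuns("Test Test  Test\nTest\n\n")); // 4
console.log(countWhitespaceRuns("NoSpacesHere"));              // 0
```

The null check matters because match returns null, not an empty array, when nothing matches.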
Here is an ugly solution without using any regex, performance wise it's optimal, but it could be made more pythonic.
def countThis(s):
    count = 0
    i = 0
    while i < len(s):
        while i < len(s) and not s[i].isspace():
            i += 1
        if i < len(s):
            count += 1
            i += 1
        while i < len(s) and s[i].isspace():
            i += 1
    return count
print(countThis("str"))
print(countThis(" str toto"))
print(countThis("Hello, world!"))
Stéphane Ammar's solution is probably the easiest on the eyes, but if you want something more performant:
function countGaps(str) {
let gaps = 0;
const isWhitespace = ch => ' \t\n\r\v'.indexOf(ch) > -1;
for (let i = 0; i < str.length; i++)
if (isWhitespace(str[i]) && !isWhitespace(str[i - 1]))
++gaps;
return gaps;
}

JavaScript: How to find and retrieve numbers from a string

I'm using RPG Maker MV which is a game creator that uses JavaScript to create plugins. I have a plugin in JavaScript already, however I'm trying to edit a part of the plugin so that it basically checks if a certain string exists in a character in the game and if it does, then sets specific variables to numbers within that string.
for (var i = 0; i < page.list.length; i++) {
if (page.list[i].code == 108 && page.list[i].parameters[0].contains("<post:" + (n) + "," + (n) + ">")) {
var post = page.list[i].parameters[0];
var array = post.split(',');
this._origMovement.x = Number(array[1]);
this._origMovement.y = Number(array[1]);
break;
};
};
So I know the first 2 lines work and contains works when I only put a specific string. However I can't figure out how to check for 2 numbers that are separated by a comma and wrapped in '<>' tags, without knowing what the numbers would be.
Then it needs to extract those numbers and assign one to this._origMovement.x and the other to this._origMovement.y.
Any help would be greatly appreciated.
This is one of those rare cases where I'd use a regular expression. If you haven't come across regular expressions before I suggest reading an introduction to them, such as this one: https://regexone.com/
In your case, you probably want something like this:
var myRegex = /<post:(\d+),(\d+)>/;
var matches = myParameter.match(myRegex);
this._origMovement.x = Number(matches[1]); //the first number (capture groups are strings)
this._origMovement.y = Number(matches[2]); //the second number
The myRegex variable is a regular expression that looks for the pattern you describe, and has 2 capture groups which look for a string of one or more digits (\d+ means "one or more digits"). The result of the .match() call gives you an array containing the entire match and the results of the capture groups.
If you want to allow for decimal numbers, you'll need to use a different capture group that allows for a decimal point, such as ([\d\.]+), which means "a sequence of one or more digits and decimal points", or, more sophisticated, (\d+\.?\d*), which is "a sequence of one or more digits, followed by an optional decimal point, followed by zero or more digits".
There are lots of good tutorials around to help you write good regular expressions, and sites that will help you live-test your expressions to make sure they work correctly. They're a powerful tool, but be careful not to over-use them!
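Putting the pieces together, a minimal sketch of the extraction with a guard for the no-match case. The helper name parsePostTag and the returned object shape are assumptions for illustration, not part of the plugin; it uses the decimal-friendly group described above:

```javascript
// Hypothetical helper: extracts the two numbers from a "<post:x,y>" tag, or null if absent.
function parsePostTag(text) {
  const match = text.match(/<post:(\d+\.?\d*),(\d+\.?\d*)>/);
  if (!match) return null; // match() returns null when the tag is not present
  // match[0] is the whole tag; match[1] and match[2] are the captured strings.
  return { x: Number(match[1]), y: Number(match[2]) };
}

console.log(parsePostTag("note <post:12,34> here")); // { x: 12, y: 34 }
console.log(parsePostTag("<post:1.5,2>"));           // { x: 1.5, y: 2 }
console.log(parsePostTag("no tag"));                 // null
```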
Got it to work. For anyone who may ever be interested, the code is below.
for (var i = 0; i < page.list.length; i++) {
if (page.list[i].code == 108 && page.list[i].parameters[0].contains("<post:")) {
var myRegex = /<post:(\d+),(\d+)>/;
var matches = page.list[i].parameters[0].match(myRegex);
this._origMovement.x = matches[1]; //the first number
this._origMovement.y = matches[2]; //the second number
break;
}
};

Regular expression, specify a number of loops

This regular expression looks for words with 3 or less characters so that a non-breaking space can be placed in before them.
smallwords = /(\s|^)(([a-zA-Z-_(]{1,2}('|’)*[a-zA-Z-_,;]{0,1}?\s)+)/gi, // words with 3 or less characters
Is there a way, to make the expression only apply itself to 2 words in a row?
Example
Currently, the string:
Singapore, the USA and Vietnam.
will be turned into:
Singapore, the USA and Vietnam.
if the expression only applied to 2 words in a row it would show
Singapore, the USA and Vietnam.
here's the full script:
ragadjust = function (s, method) {
if (document.querySelectorAll) {
var eles = document.querySelectorAll(s),
elescount = eles.length,
smallwords = /(\s|^)(([a-zA-Z-_(]{1,2}('|’)*[a-zA-Z-_,;]{0,1}?\s)+)/gi; // words with 3 or less characters
while (elescount-- > 0) {
var ele = eles[elescount],
elehtml = ele.innerHTML;
if (method == 'small-words' || method == 'all')
// replace small words
elehtml = elehtml.replace(smallwords, function(contents, p1, p2) {
return p1 + p2.replace(/\s/g, '&nbsp;');
});
ele.innerHTML = elehtml;
}
}
};
This is from RagAdjust
I know that this is not what you are asking for, but I figured a code review wouldn't hurt:
I think the word boundary \b is better, in this case, than \s|^.
You have the A-Z and a-z characters in your match, yet you also use the i (case-insensitive) flag.
{0,1}? is redundant - either use the ? to make it optional, or use {0,1} to make it match zero or one times.
If you are going to have a dash in your character set, put it at the end so that you don't have an ambiguous regex; for example, [a-z_-] is much better than [a-z-_].
If you don't need to capture a value, use the non-capturing parenthesis (?:).
So, here's your cleaned up regex:
/\b((?:[a-z_(-]{1,2}(?:'|’)*[a-z_,;-]?\s)+)/gi
I'm pretty sure the '|’ bit is some sort of typo when you pasted this in from your editor. Not sure what it is supposed to be.
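The point about dash placement is easy to check in isolation. A small sketch with made-up character classes (not the ones from the question), showing a dash forming a range in the middle and a literal at the end:

```javascript
// "+-9" in the middle of a class is a range from "+" (0x2B) to "9" (0x39).
const ranged = /[+-9]/;
// With the dash last, the class lists three literal characters: "+", "9" and "-".
const literal = /[+9-]/;

console.log(ranged.test(","));  // true: "," (0x2C) falls inside the range
console.log(ranged.test("-"));  // true, but only because "-" (0x2D) is inside the range too
console.log(literal.test(",")); // false
console.log(literal.test("-")); // true
```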
This doesn't quite solve the issue the way you suggested but it does reduce the number of non breaking spaces that end up in the string. But it might give you some insight. Because you have the trailing g on both regex replacements, you're doing global replace. If you instead loop it with some max number of fixes, things work out a little differently.
Try changing the max number of replacements. I think the other thing that happens here (in my modified code) is that after you make one replacement, the spaces and small words are gone because you jammed in a nbsp which may or may not solve the issue you're trying to get around.
Here's my replacement function (simplified from your original). The basic mod is to remove the g from the regex's and add the loop. You should check out the codepen to see the full deal
var new_ragadjust = function (contents) {
MAX_NUMBER_OF_REPLACEMENTS = 5;
smallwords = /(\s|^)(([a-zA-Z-_(]{1,2}('|’)*[a-zA-Z-_,;]{0,1}?\s)+)/i; // words with 3 or less characters
var ii = 0;
var c = contents;
for (;ii < MAX_NUMBER_OF_REPLACEMENTS; ++ii) {
c = c.replace(smallwords, function(contents, p1, p2) {
return p1 + p2.replace(/\s/, '&nbsp;');
});
}
return c;
};
Codepen
http://cdpn.io/DKLtc
Also, to see the difference, you need to inspect elements to actually see where the nbsps end up (as you probably already knew).

Regular expression that remove second occurrence of a character in a string

I'm trying to write a JavaScript function that removes any second occurrence of a character using the regular expression. Here is my function
var removeSecondOccurrence = function(string) {
return string.replace(/(.*)\1/gi, '');
}
It's only removing consecutive occurrence. I'd like it to remove even non consecutive one. for example papirana should become pairn.
Please help
A non-regexp solution:
"papirana".split("").filter(function(x, n, self) { return self.indexOf(x) == n }).join("")
Regexp code is complicated, because JS didn't support lookbehinds at the time of writing (they were added in ES2018):
str = "papirana";
re = /(.)(.*?)\1/;
while(str.match(re)) str = str.replace(re, "$1$2")
or a variation of the first method:
"papirana".replace(/./g, function(a, n, str) { return str.indexOf(a) == n ? a : "" })
Using a zero-width lookahead assertion you can do something similar
"papirana".replace(/(.)(?=.*\1)/g, "")
returns
"pirna"
The letters are of course the same, just in a different order.
Passing the reverse of the string and using the reverse of the result you can get what you're asking for.
This is how you would do it with a loop:
var removeSecondOccurrence = function(string) {
  var results = "";
  for (var i = 0; i < string.length; i++)
    if (results.indexOf(string.charAt(i)) === -1) // character not seen yet
      results += string.charAt(i);
  return results;
}
Basically: for each character in the input, if you haven't seen that character already, add it to the results. Clear and readable, at least.
What Michelle said.
In fact, I strongly suspect it cannot be done using regular expressions. Or rather, you can if you reverse the string, remove all but the first occurrences, then reverse again, but it's a dirty trick and what Michelle suggests is way better (and probably faster).
If you're still hot on regular expressions...
"papirana".
split("").
reverse().
join("").
replace(/(.)(?=.*\1)/g, '').
split("").
reverse().
join("")
// => "pairn"
The reason why you can't find all but the first occurrence without all the flippage is twofold:
JavaScript did not have lookbehinds when this was written, only lookaheads (lookbehind was added in ES2018)
Even with lookbehind, the check needs a variable-length one, which many regexp flavours do not allow (.NET and ES2018 JavaScript do)
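On engines that implement ES2018 lookbehind, the reversing can be avoided. A minimal sketch under that assumption; the helper name removeRepeats is hypothetical:

```javascript
// Assumes an engine with ES2018 lookbehind support.
// (.) captures a character; (?<=\1.+) then checks that the same character
// already occurs somewhere earlier in the string.
function removeRepeats(text) {
  return text.replace(/(.)(?<=\1.+)/g, "");
}

console.log(removeRepeats("papirana")); // "pairn"
```

The lookbehind is variable-length (because of .+), which is why this only works in flavours that allow that.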
