Simple regex question. I have a string in the following format:
this is a [sample] string with [some] special words. [another one]
What is the regular expression to extract the words within the square brackets, i.e.
sample
some
another one
Note: In my use case, brackets cannot be nested.
You can use the following regex globally:
\[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
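As a quick illustration in JavaScript (the variable names are only for this example), the pattern can be applied globally with matchAll, reading the first capture group from each match:

```javascript
// Sample input from the question.
const text = "this is a [sample] string with [some] special words. [another one]";

// matchAll requires the g flag; group 1 holds the text without the brackets.
const words = [...text.matchAll(/\[(.*?)\]/g)].map((m) => m[1]);

console.log(words); // [ 'sample', 'some', 'another one' ]
```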
(?<=\[).+?(?=\])
Will capture content without brackets
(?<=\[) - positive lookbehind for [
.+? - non-greedy match for the content (one or more characters)
(?=\]) - positive lookahead for ]
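A small JavaScript sketch of the lookaround version, assuming an engine with lookbehind support (ES2018 or later). Because the brackets are only asserted and never consumed, each full match is already the inner text:

```javascript
const text = "this is a [sample] string with [some] special words. [another one]";

// No capture group needed: the lookbehind and lookahead keep the
// brackets out of the match itself.
const inner = text.match(/(?<=\[).+?(?=\])/g);

console.log(inner); // [ 'sample', 'some', 'another one' ]
```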
EDIT: for nested brackets the below regex should work:
(\[(?:\[??[^\[]*?\]))
This should work out ok:
\[([^]]+)\]
Can brackets be nested?
If not: \[([^]]+)\] matches one item, including square brackets. Backreference \1 will contain the matched item. If your regex flavor supports lookaround, use
(?<=\[)[^]]+(?=\])
This will only match the item inside brackets.
To match a substring between the first [ and last ], you may use
\[.*\] # Including open/close brackets
\[(.*)\] # Excluding open/close brackets (using a capturing group)
(?<=\[).*(?=\]) # Excluding open/close brackets (using lookarounds)
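To make the "first [ to last ]" behaviour concrete, here is a short JavaScript comparison of the greedy pattern with its lazy counterpart on the sample string from the question (variable names are illustrative):

```javascript
const text = "this is a [sample] string with [some] special words. [another one]";

// Greedy .* runs on to the LAST ] in the input.
const greedy = text.match(/\[(.*)\]/)[1];
// Lazy .*? stops at the first ] after the opening [.
const lazy = text.match(/\[(.*?)\]/)[1];

console.log(greedy); // sample] string with [some] special words. [another one
console.log(lazy);   // sample
```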
Use the following expressions to match strings between the closest square brackets:
Including the brackets:
\[[^][]*] - PCRE, Python re/regex, .NET, Golang, POSIX (grep, sed, bash)
\[[^\][]*] - ECMAScript (JavaScript, C++ std::regex, VBA RegExp)
\[[^\]\[]*] - Java, ICU regex
\[[^\]\[]*\] - Onigmo (Ruby, requires escaping of brackets everywhere)
Excluding the brackets:
(?<=\[)[^][]*(?=]) - PCRE, Python re/regex, .NET (C#, etc.), JGSoft Software
\[([^][]*)] - Bash, Golang - capture the contents between the square brackets with a pair of unescaped parentheses, also see below
\[([^\][]*)] - JavaScript, C++ std::regex, VBA RegExp
(?<=\[)[^\]\[]*(?=]) - Java regex, ICU (R stringr)
(?<=\[)[^\]\[]*(?=\]) - Onigmo (Ruby, requires escaping of brackets everywhere)
NOTE: * matches 0 or more characters, use + to match 1 or more to avoid empty string matches in the resulting list/array.
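A minimal JavaScript sketch of the * versus + difference, using the ECMAScript form of the capturing pattern listed above (the input string is made up for the example):

```javascript
const text = "empty [] and filled [x]";

// ECMAScript flavour: ] has to be escaped inside the character class.
const withStar = [...text.matchAll(/\[([^\][]*)]/g)].map((m) => m[1]);
const withPlus = [...text.matchAll(/\[([^\][]+)]/g)].map((m) => m[1]);

console.log(withStar); // [ '', 'x' ]  (the empty pair yields an empty string)
console.log(withPlus); // [ 'x' ]      (the empty pair is skipped)
```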
Whenever lookaround support is available, the above solutions rely on it to exclude the leading/trailing open/close bracket. Otherwise, rely on capturing groups.
If you need to match nested parentheses, you may see the solutions in the Regular expression to match balanced parentheses thread and replace the round brackets with the square ones to get the necessary functionality. You should use capturing groups to access the contents with open/close bracket excluded:
\[((?:[^][]++|(?R))*)] - PHP PCRE
\[((?>[^][]+|(?<o>)\[|(?<-o>]))*)] - .NET demo
\[(?:[^\]\[]++|(\g<0>))*\] - Onigmo (Ruby) demo
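JavaScript's built-in regex engine has no recursion or balancing groups, so none of the three patterns above carries over directly. As a rough substitute (a plain depth counter, not a regex, with a hypothetical function name), the contents of the outermost balanced brackets could be collected like this:

```javascript
// Hypothetical helper: returns the text inside each outermost [...] pair.
function outermostBracketContents(text) {
  const results = [];
  let depth = 0;   // how many [ are currently open
  let start = -1;  // index just after the outermost [
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "[") {
      if (depth === 0) start = i + 1;
      depth++;
    } else if (ch === "]" && depth > 0) {
      depth--;
      if (depth === 0) results.push(text.slice(start, i));
    }
  }
  return results; // unclosed brackets contribute nothing
}

console.log(outermostBracketContents("a [b [c] d] e [f]")); // [ 'b [c] d', 'f' ]
```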
If you do not want to include the brackets in the match, here's the regex: (?<=\[).*?(?=\])
Let's break it down
The . matches any character except for line terminators. The ?= is a positive lookahead. A positive lookahead finds a string when a certain string comes after it. The ?<= is a positive lookbehind. A positive lookbehind finds a string when a certain string precedes it. To quote this,
Look ahead positive (?=)
Find expression A where expression B follows:
A(?=B)
Look behind positive (?<=)
Find expression A where expression B precedes:
(?<=B)A
The Alternative
If your regex engine does not support lookaheads and lookbehinds, then you can use the regex \[(.*?)\] to capture the innards of the brackets in a group and then you can manipulate the group as necessary.
How does this regex work?
The parentheses capture the characters in a group. The .*? gets all of the characters between the brackets (except for line terminators, unless you have the s flag enabled) in a way that is not greedy.
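The remark about line terminators and the s flag can be checked with a short JavaScript sketch (the input string is invented for the example; the s flag needs ES2018 or later):

```javascript
const text = "[first line\nsecond line]";

// Without the s flag, . cannot cross the line break, so nothing matches.
const withoutS = text.match(/\[(.*?)\]/);
// With the s (dotAll) flag, . also matches line terminators.
const withS = text.match(/\[(.*?)\]/s);

console.log(withoutS);  // null
console.log(withS[1]);  // prints both lines of the bracket content
```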
In case you have unbalanced brackets, you can likely design an expression with recursion, similar to
\[(([^\]\[]+)|(?R))*+\]
which, of course, depends on the language or regex engine that you are using.
Other than that,
\[([^\]\[\r\n]*)\]
or,
(?<=\[)[^\]\[\r\n]*(?=\])
are good options to explore.
If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Test
const regex = /\[([^\]\[\r\n]*)\]/gm;
const str = `This is a [sample] string with [some] special words. [another one]
This is a [sample string with [some special words. [another one
This is a [sample[sample]] string with [[some][some]] special words. [[another one]]`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Source
Regular expression to match balanced parentheses
(?<=\[).*?(?=\]) works well, as per the explanation given above. Here's a Python example:
import re
text = "Pagination.go['formPagination_bottom',2,'Page',true,'1',null,'2013']"
re.search(r'(?<=\[).*?(?=\])', text).group()
"'formPagination_bottom',2,'Page',true,'1',null,'2013'"
@Tim Pietzcker's answer here
(?<=\[)[^]]+(?=\])
is almost what I've been looking for. But there is one issue: some legacy browsers fail on positive lookbehind.
So I had to work it out myself :). I managed to write this:
/([^[]+(?=]))/g
Maybe it will help someone.
console.log("this is a [sample] string with [some] special words. [another one]".match(/([^[]+(?=]))/g));
If you want to match only lowercase letters (a-z) between square brackets:
(\[[a-z]*\])
If you want lowercase and uppercase letters (a-zA-Z):
(\[[a-zA-Z]*\])
If you want lowercase letters, uppercase letters, and digits (a-zA-Z0-9):
(\[[a-zA-Z0-9]*\])
If you want everything between square brackets (text, numbers, and symbols):
(\[.*\])
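A quick JavaScript check of how the three character-class variants differ, on a made-up input. Note that these patterns keep the brackets in the match:

```javascript
const text = "[abc] [ABC] [a1] [a-b]";

const lower = text.match(/(\[[a-z]*\])/g);
const letters = text.match(/(\[[a-zA-Z]*\])/g);
const alnum = text.match(/(\[[a-zA-Z0-9]*\])/g);

console.log(lower);   // [ '[abc]' ]
console.log(letters); // [ '[abc]', '[ABC]' ]
console.log(alnum);   // [ '[abc]', '[ABC]', '[a1]' ]
// "[a-b]" is never matched because the hyphen is in none of the classes.
```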
This code will extract the content between square brackets and parentheses
(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))
(?: non capturing group
(?<=\().+?(?=\)) positive lookbehind and lookahead to extract the text between parentheses
| or
(?<=\[).+?(?=\]) positive lookbehind and lookahead to extract the text between square brackets
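For instance, in JavaScript (assuming lookbehind support, ES2018 or later, and an invented input string), the combined pattern picks up both kinds of delimiters:

```javascript
const text = "call(arg) with [index]";

// First alternative handles (...), second handles [...].
const found = text.match(/(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))/g);

console.log(found); // [ 'arg', 'index' ]
```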
In R, try:
library(stringr)
x <- 'foo[bar]baz'
str_replace(x, ".*?\\[(.*?)\\].*", "\\1")
[1] "bar"
([[][a-z \s]+[]])
The above should work, given the following explanation:
Characters within square brackets [] define a character class, which matches any one of the characters listed within the square brackets.
\s specifies a whitespace character.
+ means at least one occurrence of the element that precedes it.
I needed to match across newlines and include the brackets:
\[[\s\S]+\]
If someone wants to match and select a string containing one or more dots inside square brackets like "[fu.bar]" use the following:
(?<=\[)(\w+\.\w+.*?)(?=\])
I'm trying to strip a string of all characters that are not a letter or a number. I tried String.prototype.replace with a regular expression, but it didn't remove the expected characters:
this.colorPreset1 = this.colorPreset1.replace(/^[0-9a-zA-Z]+$/, '');
this.colorPreset1=this.colorPreset1.replace(/[^0-9a-zA-Z]/g, '');
The character group was changed to an exclusion group. [^...] will match any character not in the list. As you had it, it would only match the characters you wanted to keep.
The anchors for the string were removed - You're wanting to replace any non-alpha numeric characters, so it doesn't matter where they're located.
The global flag //g was added so it will replace all matches instead of just the first one.
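To see the three changes together, here is a small JavaScript comparison on a made-up input (the variable names are not from the question):

```javascript
const input = "#ff-00 aa!";

// Original attempt: anchored and not negated, so it only matches when the
// WHOLE string is alphanumeric. This input is returned unchanged.
const anchored = input.replace(/^[0-9a-zA-Z]+$/, "");
// Corrected version: negated class plus the g flag removes every
// non-alphanumeric character, wherever it is.
const stripped = input.replace(/[^0-9a-zA-Z]/g, "");

console.log(anchored); // #ff-00 aa!
console.log(stripped); // ff00aa
```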
By adding ^ and $ around your regular expression, you explicitly tell it to match strings starting and ending with this pattern.
So it will replace the searched pattern only if the entire content of the string matches the pattern.
If you want to match each occurrence of non-numerical or non-alphabetical characters, you will have to remove the ^ start constraint and the $ end constraint, and you will also have to change the pattern itself:
[A-Za-z0-9]
matches alphabetical or numerical characters; you want the opposite of that (to invert a character class, add a ^ at the start of the character class):
[^A-Za-z0-9]
Finally, add the g option to the regex to tell it to match each occurrence (otherwise only the first occurrence will be replaced):
/[^A-Za-z0-9]+/g
JavaScript RegEx replace will only replace the first found value. If you specify the g argument in your pattern, it denotes Global or "replace all."
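A two-line JavaScript illustration of the difference the g flag makes, using an invented input:

```javascript
const input = "a-b-c";

const firstOnly = input.replace(/-/, "");  // only the first hyphen is removed
const all = input.replace(/-/g, "");       // every hyphen is removed

console.log(firstOnly); // ab-c
console.log(all);       // abc
```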
this.colorPreset1=this.colorPreset1.replace(/[^0-9a-zA-Z]/g, '');
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
What does the regular expression /_/g mean?
What I am trying to do is match up to the first occurrence of &. Right now it matches up to the last occurrence of &.
My regular expression is
(?!^)(http[^\\]+)\&
And I'm trying to match against this text:
https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg
What I need is:
http://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052
Use the non-greedy mode like this:
/(?!^)(http[^\\]+?)&/
//               ^
In non-greedy mode (or lazy mode) the match will be as short as possible.
If you want to get rid of the &, just wrap it in a lookahead group so it won't be in the match, like this:
/(?!^)(http[^\\]+?)(?=&)/
//                  ^^ ^
Or you could optimize the regular expression, as @apsillers suggested in the comment below, like this:
/(?!^)(http[^\\&]+)/
Note: & is not a special character, so you don't need to escape it.
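The variants can be compared side by side in JavaScript. The URL below is a shortened, hypothetical stand-in for the one in the question, kept only long enough to contain several & characters:

```javascript
// Hypothetical, shortened version of the redirect URL from the question.
const url =
  "https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://example.com/article/abc---123&ct3Dga&cd3DXYZ";

// Greedy: the capture runs up to the LAST & in the string.
const greedy = url.match(/(?!^)(http[^\\]+)&/)[1];
// Lazy: the capture stops at the FIRST & after the inner http.
const lazy = url.match(/(?!^)(http[^\\]+?)&/)[1];
// Lookahead: the & is asserted but not part of the overall match.
const lookahead = url.match(/(?!^)(http[^\\]+?)(?=&)/)[0];
// Negated class: & is simply excluded from the allowed characters.
const negated = url.match(/(?!^)(http[^\\&]+)/)[1];

console.log(greedy);    // http://example.com/article/abc---123&ct3Dga
console.log(lazy);      // http://example.com/article/abc---123
console.log(lookahead); // http://example.com/article/abc---123
console.log(negated);   // http://example.com/article/abc---123
```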
I have a string in JavaScript which looks like the following:
"This {{#is}} a $|test$| string with $|#string$| delimiters {{as}} follows"
And I have a regex which is used to extract the strings between $|.*?$| and {{.*?}} as follows:
/{{(.*?)}}|\$\|(.*?)\$\|/g
Example: https://regex101.com/r/mV3uR1/1
I would like to combine the alternation so there is only one matching group, e.g.:
/{{|\$\|(.*?)}}|\$\|/g
But this seems to ignore my quantifier for 0 or 1 times (the ?) and it matches the entire string up to ... {{as.
Example: https://regex101.com/r/qZ7iI5/1
Why is that happening?
If I enhance that regex to include parentheses as follows, it does work:
/({{|\$\|)(.*?)(}}|\$\|)/g
Example: https://regex101.com/r/fT5qH0/1
But this then includes the curly braces/dollar-pipe in my matching group which is what I am trying to avoid (as I only care about the string between these delimiters so only want one matching group).
Can anybody shed some light on this please?
Let's compare:
working regex:
/({{|\$\|)(.*?)(}}|\$\|)/g
and not working regex:
/{{|\$\|(.*?)}}|\$\|/g
In the 2nd regex (.*?) has to be followed by }} and alternation is for the whole \$\|(.*?)}} sub-pattern so effectively it means match:
{{ OR \$\|(.*?)}} OR \$\|
Whereas in the first regex due to grouping alternation is correctly applied before & after (.*?).
You can use non-capturing groups as well:
/(?:{{|\$\|)(.*?)(?:}}|\$\|)/g
Now it means:
{{ OR \$\| followed by (.*?) followed by }} OR \$\|.
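Running both patterns against the string from the question in JavaScript makes the difference visible (the variable names are only for this sketch):

```javascript
const text =
  "This {{#is}} a $|test$| string with $|#string$| delimiters {{as}} follows";

// Ungrouped: the top-level alternatives are  {{  |  \$\|(.*?)}}  |  \$\|
const ungrouped = [...text.matchAll(/{{|\$\|(.*?)}}|\$\|/g)].map((m) => m[0]);

// Grouped: opening delimiter, content, closing delimiter.
// Only the content is captured, so group 1 is the single matching group.
const grouped = [...text.matchAll(/(?:{{|\$\|)(.*?)(?:}}|\$\|)/g)].map((m) => m[1]);

console.log(ungrouped);
// [ '{{', '$|test$| string with $|#string$| delimiters {{as}}' ]
console.log(grouped);
// [ '#is', 'test', '#string', 'as' ]
```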
(?:{{|\$\|)(.*?)(?:}}|\$\|)
 ^^              ^^
You can try this. See the demo. By making the other groups non-capturing, you will have only one group.
https://regex101.com/r/cT0hV4/13
But this will match {{asd$| too.
Your regex {{|\$\|(.*?)}}|\$\| matches any one of the following three alternatives:
{{
\$\|(.*?)}} # look at where this alternative starts and ends and you will understand
\$\|
That is the reason you are getting that match.
I am going through some legacy code and I came across this regular expression:
var REGEX_STRING_REGEXP = /^\/(.+)\/([a-z]*)$/;
I am slightly confused as to what this regular expression signifies.
I have so far concluded the following:
Begin with /
Then any character (numeric, alphabetic, symbols, spaces)
then a forward slash
End with alphabetic characters
Can someone advise?
You can use a tool like Regexper to visualise your regular expressions. If we pass your regular expression into Regexper, we'll be given the following visualisation:
Direct link to Regexper result.
regex: /^\/(.+)\/([a-z]*)$/
^ : anchor regex to start of the string
(.+) : 1 or more instances of any character (word characters, non-word characters, digits), excluding line terminators
([a-z]*) : 0 or more instances of any single lowercase character a-z
$ : anchor regex to end of the string
In summary, your regular expression matches strings that start with a forward slash, followed by 1 or more characters of any kind (except line terminators), then another forward slash, then 0 or more lowercase characters a-z. Lastly, since both (.+) and ([a-z]*) are surrounded by parentheses, they capture whatever they match when you use the expression to perform regular expression operations.
I would suggest going to rubular, placing the regex ^/(.+)/([a-z]*)$ in the top field and playing with example strings in the test string box to better understand what strings will fit within that regex. (/string/something for example will work with your regular expression).
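A short JavaScript sketch of what the two capture groups return. The sample inputs are made up, and reading the groups as "pattern" and "flags" is only a guess based on the variable name:

```javascript
const REGEX_STRING_REGEXP = /^\/(.+)\/([a-z]*)$/;

// Group 1 is the text between the slashes, group 2 the trailing lowercase letters.
const [, pattern, flags] = REGEX_STRING_REGEXP.exec("/ab+c/gi");
console.log(pattern, flags); // ab+c gi

// (.+) is greedy, so it runs up to the LAST slash in the input.
const nested = REGEX_STRING_REGEXP.exec("/a/b/i");
console.log(nested[1], nested[2]); // a/b i

// Without a leading slash there is no match at all.
const noLeadingSlash = REGEX_STRING_REGEXP.test("ab+c/gi");
console.log(noLeadingSlash); // false
```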