Match last string between special characters - javascript

I want to get the last string between special characters. For square brackets I've done it with \[(.*)\]$
But when I use it on something like Blah [Hi]How is this[KoTuWa], I get [Hi]How is this[KoTuWa] as the result.
How do I modify it to get only the last string, that is, KoTuWa?
Also, I would like to generalise this to other special characters, instead of only matching the string between square brackets as above.
Thanks,
Sai

I would do this:
[^[\]]+(?=][^[\]]*$)
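To try the lookahead pattern from JavaScript, String.prototype.match is enough. The brackets are only checked by the lookahead and never consumed, so the whole match m[0] is already the wanted text. This is a minimal sketch: the helper name lastBracketed is made up for illustration, and the closing bracket in the lookahead is written as \] (same meaning, but it also stays valid if you later add the u flag).

```javascript
// Hypothetical helper: return the text inside the last [...] pair, or null.
//   [^[\]]+          a run of characters that are not brackets
//   (?=\][^[\]]*$)   followed by "]" and then no more brackets until the end
function lastBracketed(str) {
  const m = str.match(/[^[\]]+(?=\][^[\]]*$)/);
  return m ? m[0] : null;
}

console.log(lastBracketed("Blah [Hi]How is this[KoTuWa]")); // KoTuWa
console.log(lastBracketed("no brackets here")); // null
```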
To extend this to other types of brackets or special characters, say I also want to match curly braces {} and double quotes ":
[^{}"[\]]+(?=["\]}][^{}"[\]]*$)
If you test this against several lines of input at once, add the multi-line flag m so that $ matches at the end of each line.
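The same shape can be built from a list of delimiters. The sketch below is hypothetical (lastDelimited and escapeForClass are illustrative names, not a library API): it takes the opening and closing characters as strings, escapes the ones that are special inside a character class, and assembles the pattern with new RegExp.

```javascript
// Escape the characters that are special inside a character class: \ ] [ ^ -
function escapeForClass(chars) {
  return chars.replace(/[\\\][^-]/g, "\\$&");
}

// Hypothetical helper: text after the last opening delimiter and before the
// last closing one, using the same shape as [^{}"[\]]+(?=["\]}][^{}"[\]]*$).
// `opens` and `closes` are strings of single delimiter characters.
function lastDelimited(str, opens, closes) {
  const all = escapeForClass(opens + closes);
  const close = escapeForClass(closes);
  const re = new RegExp(`[^${all}]+(?=[${close}][^${all}]*$)`);
  const m = str.match(re);
  return m ? m[0] : null;
}

console.log(lastDelimited('Say "hello" to {World}', '[{"', ']}"')); // World
console.log(lastDelimited("Blah [Hi]How is this[KoTuWa]", "[", "]")); // KoTuWa
```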

Here is one way to do it:
\[([^\[]*)\]$

You can require that the string between brackets does not contain brackets:
Edit: thanks to funkwurm and jcubic for pointing out an error. Here's the fixed expression:
\[([^[]+)\][^\[]*$
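The two bracket expressions above behave differently when text follows the last pair: \[([^\[]*)\]$ needs the ] to be the final character, while \[([^[]+)\][^\[]*$ allows trailing text as long as it contains no [. A quick comparison on a made-up input; in both patterns the wanted text is in capture group 1.

```javascript
const text = "Blah [Hi]How is this[KoTuWa] and more";

// \[([^\[]*)\]$ : the closing bracket must be the last character
const strict = text.match(/\[([^\[]*)\]$/);

// \[([^[]+)\][^\[]*$ : text without "[" may follow the last "]"
const relaxed = text.match(/\[([^[]+)\][^\[]*$/);

console.log(strict);     // null
console.log(relaxed[1]); // KoTuWa
```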
If you need to use separators other than brackets, you should:
replace the \[ and \] with your new separators
put your opening separator inside the negated character classes instead of [.
For example, assuming you need to use the separators <> instead of [], you'd do this:
<([^<]+)>[^<]*$
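The two replacement steps can also be done programmatically. A minimal sketch, assuming single-character separators; lastBetween and escapeRegExp are illustrative names, and the escape set is the usual list of regex metacharacters plus - so the result is safe both inside and outside a character class.

```javascript
// Escape regex metacharacters so a separator is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\-]/g, "\\$&");
}

// Hypothetical helper: apply the recipe above to any separator pair,
// e.g. "<" and ">" give the pattern <([^<]+)>[^<]*$
function lastBetween(str, open, close) {
  const o = escapeRegExp(open);
  const c = escapeRegExp(close);
  const m = str.match(new RegExp(`${o}([^${o}]+)${c}[^${o}]*$`));
  return m ? m[1] : null;
}

console.log(lastBetween("a <b> c <d> e", "<", ">")); // d
console.log(lastBetween("f(x) + g(y)", "(", ")"));   // y
```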

Related

Remove all non-ASCII characters from a string except smart quotes

I have this regex that removes all non-ASCII characters from a string, including all smart quotes:
str.replace(/[\u{0080}-\u{FFFF}]/gu,"");
But I need to keep the smart quotes.
The regex for removing smart single quotes is [\u2018\u2019\u201A\u201B\u2032\u2035], and for smart double quotes it is [\u201C\u201D\u201E\u201F\u2033\u2036].
I need a combined regex that removes all non-ASCII characters ([\u{0080}-\u{FFFF}]) except smart quotes ([\u2018\u2019\u201A\u201B\u2032\u2035] or [\u201C\u201D\u201E\u201F\u2033\u2036]).
Note that the \u{XXXX} notation only works in a regex with the u modifier. To build the regex you need, put the character class with the exceptions into a negative lookahead placed right before your more generic pattern:
/(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu
Note that some of these characters are consecutive in the Unicode table, so the pattern can be shortened with ranges:
/(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu
Alternatively, instead of matching the non-ASCII characters, match the ASCII range plus the characters you want to keep, and negate the class. Example:
str.replace(/[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,"");
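A quick check that the lookahead version and the negated-class version agree on ordinary text. The sample string is made up and written with escapes so the source stays ASCII.

```javascript
// Negative lookahead: skip the smart quotes, remove any other U+0080..U+FFFF.
const viaLookahead =
  /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu;

// Negated class: remove anything that is neither ASCII nor a listed quote.
const viaNegatedClass =
  /[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu;

const text = "\u201CH\u00E9llo\u201D it\u2019s caf\u00E9";
console.log(text.replace(viaLookahead, ""));    // “Hllo” it’s caf
console.log(text.replace(viaNegatedClass, "")); // “Hllo” it’s caf

// Code points above U+FFFF (here an emoji) are outside [\u{0080}-\u{FFFF}],
// so only the negated class removes them.
const withEmoji = "ok\u{1F600}";
console.log(withEmoji.replace(viaLookahead, "") === withEmoji); // true
console.log(withEmoji.replace(viaNegatedClass, ""));            // ok
```

So the two are not fully interchangeable: with the u flag, the negated class also strips characters above U+FFFF, while the range-based version leaves them in place.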

Regex match commas after a closing parentheses

In JavaScript, I am trying to split the following string
var s1 = "(-infinity, -3],(-3,-2),[1,infinity)";
into an array
["(-infinity, -3]","(-3,-2)","[1,infinity)"]
by using this statement
s1.split(/(?=[\]\)]),/);
To explain: I want to split the string on commas that follow a closing square bracket or parenthesis. I use the lookahead (?=[\]\)]), to do so, but it doesn't match any commas. When I change it to (?![\]\)]),, it matches every comma. What is the problem with my regex?
Your logic is backwards. (?=...) is a lookahead group, not a lookbehind. This means that s1.split(/(?=[\]\)]),/) matches only where the next character is simultaneously ] or ) and a comma, which is impossible. Conversely, (?![\]\)]), matches every comma, because a comma is never ] or ).
Try this instead:
s1.split(/,(?=[\[\(])/);
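A runnable check of the fix. As an alternative that is not part of the original answer, engines implementing ES2018 lookbehind assertions ((?<=...)) can express the original intent directly: a comma preceded by ] or ). Older engines reject that syntax, so treat the second form as optional.

```javascript
const s1 = "(-infinity, -3],(-3,-2),[1,infinity)";

// Split on a comma that is followed by an opening bracket or parenthesis.
const parts = s1.split(/,(?=[\[\(])/);
console.log(JSON.stringify(parts)); // ["(-infinity, -3]","(-3,-2)","[1,infinity)"]

// ES2018+ only: split on a comma that is preceded by a closing bracket
// or parenthesis, which is what the question was aiming for.
const partsBehind = s1.split(/(?<=[\]\)]),/);
console.log(JSON.stringify(partsBehind)); // same result
```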

Escape dot in a regex range

These two regexes behave the same way:
var str = "43gf\\..--.65";
console.log(str.replace(/[^\d.-]/g, ""));
console.log(str.replace(/[^\d\.-]/g, ""));
In the first regex I don't escape the dot (.), while in the second regex I do (\.).
What are the differences? Why is the result the same?
The dot . does not need to be escaped inside a character class [].
Because the dot is inside a character class (square brackets []).
Take a look at http://www.regular-expressions.info/reference.html; it says (in the character class section):
Any character except ^-]\ add that character to the possible matches
for the character class.
If you build the regex from a JavaScript string (new RegExp(...)), write \\. instead of \.: the string literal consumes one backslash, so "\\." reaches the regex engine as \., while "\." arrives as a plain dot.
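Both points can be checked in a few lines, using the string from the question: inside [...] the escaped and unescaped dot behave the same, and when a pattern comes from a string, escapes such as \. and \d lose their backslash unless it is doubled.

```javascript
const str = "43gf\\..--.65"; // the string value is 43gf\..--.65

// Inside a character class the dot is literal, so both forms agree.
console.log(str.replace(/[^\d.-]/g, ""));  // 43..--.65
console.log(str.replace(/[^\d\.-]/g, "")); // 43..--.65

// In a string literal, "\." is just "." and "\d" is just "d".
console.log("\." === ".");                                  // true
console.log(str.replace(new RegExp("[^\\d.-]", "g"), "")); // 43..--.65
console.log(str.replace(new RegExp("[^\d.-]", "g"), ""));  // ..--.  (the class became [^d.-])
```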
On regular-expressions.info, it is stated:
Remember that the dot is not a metacharacter inside a character class,
so we do not need to escape it with a backslash.
So escaping it there is unnecessary, although harmless.

Alternation operator inside square brackets does not work

I'm creating a JavaScript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes a character set (it matches exactly one character), while () denotes a logical grouping.
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle?????????, hbaidu-com/, or even lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the unescaped dot (.) matches any character except newlines. To match a literal dot, escape it with a backslash (\.), as in baidu\.com.
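A small check of both points, using made-up URLs: the character-class version accepts id= because d is one of the characters in [wd|word|qw], while the grouped version with baidu\.com requires one of the three names right before = and a literal baidu.com.

```javascript
const url = "https://www.baidu.com/s?id=1";

// Original: [wd|word|qw]{1} is ONE character from the set w, d, |, o, r, q,
// so "id=" matches because of the "d".
const asClass = /.*baidu.com.*[/?].*[wd|word|qw]{1}=/;

// Alternation in a non-capturing group, with the dot in baidu.com escaped.
const asGroup = /.*baidu\.com.*[/?].*(?:wd|word|qw)=/;

console.log(asClass.test(url));                            // true
console.log(asGroup.test(url));                            // false
console.log(asGroup.test("https://www.baidu.com/s?wd=1")); // true
console.log(asGroup.test("hbaidu-com/?wd=1"));             // false
```

Because of the .* in front of the group, a parameter such as kwd= would still match the grouped version; requiring [?&] right before the name is one way to tighten that if it matters.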
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to accomplish; perhaps regex is not the best solution, or only part of the best solution?

Trying to write a regex that matches only numbers,spaces,parentheses,+ and -

I'm trying to write a regular expression that will check for numbers, spaces, parentheses, + and -.
This is what I have so far:
/\d|\s|\-|\)|\(|\+/g
but I'm getting this error: unmatched ) in regular expression
Any suggestions will help.
Thanks
Use a character class:
/[\d\s()+-]/g
This matches a single character if it's a digit \d, whitespace \s, literal (, literal ), literal + or literal -. Putting - last in a character class is an easy way to make it a literal -; otherwise it may become a range definition metacharacter (e.g. [A-Z]).
Generally speaking, instead of matching one character at a time as alternates (e.g. a|e|i|o|u), it's much more readable to use a character class (e.g. [aeiou]). It's also more concise, and it naturally groups the characters together, so you can write e.g. [aeiou]+ to match a sequence of vowels.
References
regular-expressions.info/Character Class
Caveat
Beginners sometimes misuse a character class as alternation, writing [a|e|i|o|u] or, worse, [this|that]. This is wrong: a character class by itself matches exactly one character from the input, and a | inside it is just a literal pipe.
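To validate a whole string rather than find single characters, the class from this answer can be anchored with ^...$ and repeated with +. The sketch also shows the caveat in action: | inside a class is just another literal character. The sample inputs are made up.

```javascript
// Whole-string check: only digits, whitespace, parentheses, + and -.
const phoneChars = /^[\d\s()+-]+$/;

console.log(phoneChars.test("+1 (888) 111-2222")); // true
console.log(phoneChars.test("888.111.2222"));      // false: "." is not in the class

// A character class matches one character, so [a|e|i|o|u] also accepts "|".
console.log(/^[a|e|i|o|u]$/.test("|")); // true
console.log(/^[aeiou]$/.test("|"));     // false
```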
Related questions
Regex: why doesn’t [01-12] range work as expected?
/^[\d\s\(\)\-]+$/
This expression matches strings made up only of digits, parentheses, whitespace, and hyphens; add \+ to the class if you also need plus signs.
Examples:
888-111-2222
888 111 2222
8881112222
(888)111-2222
...
You need to escape your parentheses, because parentheses are special syntax in regular expressions:
instead of '(':
\(
instead of ')':
\)
Also, '+' needs escaping for the same reason:
\+
Edit: you may want to use a character class instead of the 'or' notation with '|' because it is more readable:
[\s\d()+-]
Try this:
[\d\s+()-]
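Keeping the hyphen last matters here. Written as [\d\s-+()], with the hyphen between \s and +, the class only compiles in JavaScript because of a legacy grammar rule that treats a - next to a class escape such as \s as a literal; with the u flag it is a syntax error, and other regex engines may reject it too. The helper name compiles below is made up for this check.

```javascript
// Returns true if the pattern compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

// Hyphen between \s and +: accepted only by the legacy (non-u) grammar.
console.log(compiles("[\\d\\s-+()]", ""));  // true
console.log(compiles("[\\d\\s-+()]", "u")); // false (SyntaxError)

// Hyphen last: a literal "-" in both modes.
console.log(compiles("[\\d\\s+()-]", ""));  // true
console.log(compiles("[\\d\\s+()-]", "u")); // true
```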
