Parse URL with regexp, pattern doesn't match optional string - javascript

I've got these strings I want to parse:
?forum=Jiné akce a jiné#comments
?trening=140#$|Pralinka
?novinka=87#comments
?forum=Mimo mísu#comments
?forum=Členské forum#comments
?trening=139#comments
and I want to output an array like
1. forum
2. Jiné akce a jiné
3. comments
or
1. trening
2. 140
3. Pralinka
So I wrote the following regexp:
\?([a-z]{4,})\=(.+)\#(\$\|)?([a-z]+)
But it's not working in the second case (the optional string part).

Remember that by default, regexes are case-sensitive, so [a-z] can't match Pralinka. You can fix that by using the i (case-insensitive) flag, or with:
\?([a-z]{4,})=(.+)#(?:\$\|)?([A-Za-z]+)
Notice that there is no need to escape the = or the # (we're not in free-spacing mode), and I added a non-capturing group (?:...) so that Pralinka will be in the same capturing group as comments.
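As a minimal sketch of how the corrected pattern could be used from JavaScript: the helper name parseQuery is hypothetical and only there for illustration; the pattern itself is the one given in the answer above.

```javascript
// Pattern from the answer above: the optional "$|" marker is not captured,
// and [A-Za-z] lets a capitalised fragment such as "Pralinka" match.
const queryPattern = /\?([a-z]{4,})=(.+)#(?:\$\|)?([A-Za-z]+)/;

// parseQuery is a hypothetical helper: returns [key, value, fragment],
// or null when the input does not match.
function parseQuery(input) {
  const match = queryPattern.exec(input);
  return match ? match.slice(1) : null;
}

console.log(parseQuery("?novinka=87#comments"));    // ["novinka", "87", "comments"]
console.log(parseQuery("?trening=140#$|Pralinka")); // ["trening", "140", "Pralinka"]
```

Because the "$|" part sits in a non-capturing group, the fragment is always the third element, whether or not the marker is present.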

[a-z]+ does not match Pralinka because P is an uppercase letter.

You need to add a global flag: /g.
http://regex101.com/r/vR0oM4

Related

How do I match a regex group and its lowercase version?

This is an example of what the regex would and wouldn't match:
# Matches
AAAA: aaaa
# Matches
ABCD: abcd
# Doesn't match
AAAA: abcd
# Doesn't match
AAAA: AaAa
How can I accomplish this?
I found a suggestion to use \L, but it doesn't work for matching because \L only transforms the matched text in the replacement. Besides, \L seems to be available only in PHP and not in JavaScript.
This works, but only when the case-insensitive option is set and it matches the last example:
(\w+): \1
You might be able to use a case-sensitivity switch and a lookahead, e.g.
\b(?=[A-Z]+:\s*[a-z]+)(?i)(\w+):\s*\1\b
or
\b(?=\p{Lu}+:\s*\p{Ll}+)(?i)(\p{L}+):\s*\1\b
Essentially you use 2 regexes at once.
The first (i.e. everything within (?=...)) asserts that the first word is all uppercase ([A-Z]+ or \p{Lu}+) and that the second word is all lowercase ([a-z]+ or \p{Ll}+).
Then you turn on case-insensitivity with (?i).
Then the second regex looks for 2 words that are equal (ignoring case).
The \b anchors prevent matches on input like: xxAAA: aaayy
Note: As the question mentioned VSCode, this answer uses .NET-style regex and assumes that the i modifier is initially turned off but can be toggled. However, I don't think this is possible in ECMAScript ("flags are an integral part of a regular expression. They cannot be added or removed later").
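For JavaScript, where the note above suggests the inline (?i) switch is not available, one possible workaround is to let the regex check only the shape (an uppercase word, a colon, a lowercase word) and do the case-insensitive comparison in code. This is a different technique from the answer's single regex, and the function name findUpperLowerPairs is hypothetical.

```javascript
// Shape check only: an all-uppercase word, a colon, then an all-lowercase word.
const pairPattern = /\b([A-Z]+):\s*([a-z]+)\b/g;

// Hypothetical helper: keeps a candidate only when the lowercase word
// equals the lowercased uppercase word.
function findUpperLowerPairs(text) {
  const results = [];
  for (const match of text.matchAll(pairPattern)) {
    if (match[1].toLowerCase() === match[2]) {
      results.push(match[0]);
    }
  }
  return results;
}

console.log(findUpperLowerPairs("AAAA: aaaa")); // ["AAAA: aaaa"]
console.log(findUpperLowerPairs("AAAA: abcd")); // []
console.log(findUpperLowerPairs("AAAA: AaAa")); // []
```

The sketch is limited to ASCII letters, like the [A-Z]/[a-z] variant of the answer.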

How to not match given prefix in RegEx without negative lookbehind?

Goal
The goal is matching a string in JavaScript without certain delimiters, i.e. a string between two characters (the characters can be included in the match).
For example, this string should be fully matched: $ test string $. This can appear anywhere in a string. That would be trivial, however, we want to allow escaping the syntax, e.g. The price is 5\$ to 10\$.
Summarized:
Match any string that is enclosed by two $ signs.
Do not match it if the dollar signs are escaped using \$.
Solution using negative lookbehind
A solution that achieves this goal perfectly is: (?<!\\)\$(.*?)(?<!\\)\$.
Problem
This solution uses negative lookbehind, which is not supported on Safari. How can the same matches be achieved without using negative lookbehind (i.e. on Safari)?
A solution that partially works is [^\\]\$(.*?)[^\\]\$. However, this will also match the character in front of the $ sign if it is not a \.
You might rule out what you don't want by matching it, and capture what you want to keep in group 1:
\\\$.*?\$|\$.*?\\\$|(\$.*?\$)
You may use this regex and grab your inner text using capture group #1 as you are already doing in your current regex using lookbehind:
(?:^|[^\\])\$((?:\\.|[^$])*)\$
RegEx Details:
(?:^|[^\\]): Match start position or a non-backslash character in a non-capturing group
\$: Match starting $
(: Start capturing group
(?:\\.|[^$])*: Match any escaped character or a non-$ character. Repeat this group 0 or more times
): End capturing group
\$: Match closing $
PS: This regex will give the same matches as your current regex: (?<!\\)\$(.*?)(?<!\\)\$
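A short sketch of how the lookbehind-free pattern from the details above might be applied in JavaScript, reading the inner text from capture group 1. The helper name extractDollarSpans is hypothetical.

```javascript
// Lookbehind-free pattern from the answer: the character before the opening $
// is consumed by a non-capturing group, the inner text lands in group 1.
const dollarPattern = /(?:^|[^\\])\$((?:\\.|[^$])*)\$/g;

// Hypothetical helper: returns the inner text of every unescaped $...$ span.
function extractDollarSpans(text) {
  return Array.from(text.matchAll(dollarPattern), (match) => match[1]);
}

console.log(extractDollarSpans("$ test string $"));             // [" test string "]
console.log(extractDollarSpans("The price is 5\\$ to 10\\$.")); // []
```

Note that the full match (index 0) may include the character in front of the opening $, which is why the sketch reads group 1 instead.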

How to modify this hashtag regex to check if the second character is a-z or A-Z?

I'm building on a regular expression I found that works well for my use case. The purpose is to check for what I consider valid hashtags (I know there's a ton of hashtag regex posts on SO but this question is specific).
Here's the regex I'm using
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g
The only problem I'm having is I can't figure out how to check if the second character is a-z (the first character would be the hashtag). I only want the first character after the hashtag to be a-z or A-Z. No numbers or non-alphanumeric.
Any help much appreciated, I'm very novice when it comes to regular expressions.
As I mentioned in the comments, you can replace [a-zA-Z0-9_]{1,20} with [a-zA-Z][a-zA-Z0-9_]{0,19} so that the first character is guaranteed to be a letter and then followed by 0 to 19 word characters (alphanumeric or underscore).
However, there are other unnecessary parts in your pattern. It appears that all you need is something like this:
/(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g
Breakdown of (?:^|\B):
(?: # Start of a non-capturing group (don't use a capturing group unless needed).
^ # Beginning of the string/line.
| # Alternation (OR).
\B # The opposite of `\b`. In other words, it makes sure that
# the `#` is not preceded by a word character.
) # End of the non-capturing group.
Note: You may also replace [a-zA-Z0-9_] with \w.
References:
Word Boundaries.
Difference between \b and \B in regex.
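To illustrate the simplified pattern from this answer, here is a small sketch that pulls hashtags out of a string. The function name extractHashtags and the sample text are made up for the example.

```javascript
// Simplified pattern from the answer: "#", one letter, then up to 19 word characters.
const hashtagPattern = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;

// Hypothetical helper: returns every hashtag found, or an empty array.
function extractHashtags(text) {
  return text.match(hashtagPattern) || [];
}

// "#1abc" is rejected (starts with a digit) and "x#no" is rejected
// (the # is preceded by a word character).
console.log(extractHashtags("#hello #1abc #a_b1 x#no")); // ["#hello", "#a_b1"]
```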
The below should work.
(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)
If you only want to accept hashtags of two or more characters, change {0,19} to {1,19}.
In your pattern you use (?![0-9_]+\b), which only asserts that what follows the # is not a run of digits and underscores ending at a word boundary. It does not require a letter, so the first character can still be a digit or an underscore as long as a letter follows later (e.g. #1a).
If you want to keep the [a-zA-Z0-9_]{1,20} part, use a positive lookahead (?=[a-zA-Z]) instead to assert that what is directly to the right is an upper- or lowercase a-z:
(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b

Regex match word after negated set

I'm currently trying to match the following cases with Regex.
Current regex
\.\/[^/]\satoms\s\/[^/]+\/index\.js
Cases
// Should match
./atoms/someComponent/index.js
./molecules/someComponent/index.js
./organisms/someComponent/index.js
// Should not match
./atomsdsd/someComponent/index.js
./atosdfms/someComponent/index.js
./atomssss/someComponent/index.js
However none of the cases are matching, what am I doing wrong?
Hope this will help you out. You have added some additional characters which cause your regex to fail.
Regex: \.\/(atoms|molecules|organisms)\/[^\/]+\/index\.js
1. \.\/ This will match ./
2. (atoms|molecules|organisms) This will match either atoms or molecules or organisms
3. \/[^\/]+\/ This will match /, then one or more characters up to the next /, and then that /
4. index\.js This will match index.js
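The four steps above can be checked against the cases from the question with a short sketch. The helper name isComponentIndex is hypothetical.

```javascript
// Pattern from the answer above: "./", one of three folder names, a single
// path segment, then "/index.js".
const componentIndexPattern = /\.\/(atoms|molecules|organisms)\/[^\/]+\/index\.js/;

// Hypothetical helper: true when the path points at a component index file.
function isComponentIndex(path) {
  return componentIndexPattern.test(path);
}

console.log(isComponentIndex("./atoms/someComponent/index.js"));     // true
console.log(isComponentIndex("./organisms/someComponent/index.js")); // true
console.log(isComponentIndex("./atomsdsd/someComponent/index.js"));  // false
```

The pattern is not anchored, so it also matches when the path is embedded in a longer string; adding ^ and $ would be one way to require an exact match.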
Why not just this simpler pattern?
\.\/(atoms|molecules|organisms)\/.*?index\.js
Try the following:
\.\/(atoms|molecules|organisms)\/[a-zA-Z]*\/index\.js
Forward slashes (and other special characters) should be escaped with a back slash \.
\.\/(atoms|molecules|organisms)\/ matches './atoms/', './molecules/' or './organisms/' strictly. Without the parentheses the alternation would apply to the whole pattern and match partial strings. The | is an alternation operator that matches either everything to its left or everything to its right.
[a-zA-Z]* will match a string of any length made of letters in either case: a-z accounts for lowercase and A-Z for uppercase. * indicates zero or more characters (use + to require at least one). Depending on what characters may appear in someComponent, you may need to account for digits using [a-zA-Z\d]*.
\/index\.js will match '/index.js'

Regular Expression matching space but not any non-word character Javascript

I want to write a regular expression in JavaScript that allows any letter or space but no other non-word character:
david johan // pass
david johan mark // pass
david## johan // doesn't pass
I have used this
/^(([a-zA-Z]{3,30})+[ ]+([a-zA-Z]{3,30})+)+$/
but it doesn't work.
Any suggestions?
Hard to tell exactly what you're after, but I think this will do it:
/^([a-z0-9]|\s)*$/i
The ^ means the match must begin at the start of the string, and the $ means it must run to the end, so the whole string has to consist of the characters in the parentheses. The * means 0 or more of the preceding expression, and the bit inside the parens means any letter in the range a-z, any digit 0-9, or (|) any whitespace character such as a space, tab or newline (\s).
That should match any letter or number and it has the case insensitive flag (i) on it too, it also accounts for white space.
If it was okay to include _ then you could have used /^(\w|\s)*$/
You can use this regular expression:
^[a-zA-Z ]*$
or in Javascript:
var re = new RegExp('^[a-zA-Z ]*$');
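A quick sketch of this anchored character-class pattern run against the inputs from the question. The helper name isLettersAndSpaces is hypothetical.

```javascript
// Anchored pattern from the answer: only ASCII letters and spaces, start to end.
const lettersAndSpaces = /^[a-zA-Z ]*$/;

// Hypothetical helper: true when the input contains nothing but letters and spaces.
function isLettersAndSpaces(input) {
  return lettersAndSpaces.test(input);
}

console.log(isLettersAndSpaces("david johan"));      // true
console.log(isLettersAndSpaces("david johan mark")); // true
console.log(isLettersAndSpaces("david## johan"));    // false
```

Because of the *, the empty string also passes; replacing * with + would require at least one character.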
Try this pattern: \w+\s+\w+
Demo
console.log(/\w+\s+\w+/g.test("david johan"));            // true
console.log(/\w+\s+\w+/g.test("david johan mark "));      // true
console.log(/\w+\s+\w+/g.test("david## johan"));          // false
console.log(/\w+\s+\w+/g.test("david ## johan"));         // false
// true: the pattern is unanchored, so it matches the substring "this should"
console.log(/\w+\s+\w+/g.test("this should not match!"));
