Regex for any character except quote after comma - javascript

I want to match every word separated by comma, but it must not include a quote like ' or ".
I was using this regex:
^[a-zA-Z0-9][\!\[\#\\\:\;a-zA-Z0-9`_\s,]+[a-zA-Z0-9]$
However, it only matches a character and number and not a symbol.
The output should be:
example,example //true
exaplle,examp#3 //true, with symbol or number
example, //false, because there is no word after comma
,example //false, because there is no word before comma
##example&$123,&example& //true, with all character and symbol except quote

You can match 1+ times what is present in the character class, then repeat 1+ times a non-capturing group (?:...) that contains a comma followed by the same character class.
^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$
Note that you don't have to escape \!, \#, \: and \; in the character class, and that \s might also possibly match a newline.
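To check the pattern above against the examples from the question, here is a minimal sketch. The names listPattern, samples and listResults are only illustrative, and the last sample (with a quote) is an extra case added here, not one from the question.

```javascript
// Pattern copied from the answer above; variable names are illustrative only.
const listPattern = /^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$/;

const samples = [
  'example,example',          // two plain words
  'exaplle,examp#3',          // symbol and digit
  'example,',                 // nothing after the comma
  ',example',                 // nothing before the comma
  '##example&$123,&example&', // symbols on both sides
  "it's,quoted",              // extra case: contains a quote
];

// test() returns true only when the whole string fits the pattern,
// because the pattern is anchored with ^ and $.
const listResults = samples.map((s) => listPattern.test(s));
console.log(listResults); // [ true, true, false, false, true, false ]
```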

I'm assuming you want the whole string to match perfectly with your conditions and return true then and then only.
These are the conditions-
Each word should be separated by a comma, said comma should have 2 valid words on each side
Words can contain anything except the 2 kinds of quotes (' and ") and whitespace characters (spaces and newlines).
The regex you would use is this- ^(?:[^,'"\s]+,[^,'"\s]+)+$, with the global flag (g) on.
Check out the demo here
Edit: As per request of being able to match only a single word.
This is the regex you would use for that- ^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$
This will match words separated by a , as well as match just a single word.
The conditions for what qualifies as a word remain the same as aforementioned.
Quick explanation:-
^[^,'"\s]+,[^,'"\s]+$
This part matches 2 words separated by a comma, [^,'"\s]+ denotes a word
Wrapping that whole thing in ^(?:[^,'"\s]+,[^,'"\s]+)+$ simply makes it repeat, so it'll match N number of words separated by a comma, not just 2
Then adding another alternative using | and wrapping the whole thing in a group (non-capturing), we get ^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$
This simply just adds the alternative [^,'"\s]+ - which matches a singular word.
Check out the updated demo here
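A quick way to try these patterns is sketched below. One thing worth knowing: because each repetition of the group consumes "word,word", a list only matches if every middle word can be split between two repetitions, so a single-character middle word such as the b in a,b,c fails. The flatPattern variant is a hypothetical alternative, not part of the answer above; it matches one word followed by zero or more ",word" parts. The g flag is left off here, since the patterns are anchored and test() on a global regex keeps state in lastIndex between calls.

```javascript
// Patterns from the answer above.
const pairPattern = /^(?:[^,'"\s]+,[^,'"\s]+)+$/;
const pairOrSingle = /^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$/;

// Hypothetical alternative (an assumption, not from the answer):
// one word, then any number of ",word" parts.
const flatPattern = /^[^,'"\s]+(?:,[^,'"\s]+)*$/;

const checks = {
  twoWords: pairPattern.test('example,example'),   // true
  trailingComma: pairPattern.test('example,'),     // false
  singleWordPair: pairPattern.test('example'),     // false
  singleWordAlt: pairOrSingle.test('example'),     // true
  threeLongWords: pairPattern.test('ab,cd,ef'),    // true ("cd" is split as c|d)
  threeShortWords: pairPattern.test('a,b,c'),      // false (the caveat described above)
  flatShortWords: flatPattern.test('a,b,c'),       // true
  flatLeadingComma: flatPattern.test(',example'),  // false
  flatWithQuote: flatPattern.test("it's,quoted"),  // false
};
console.log(checks);
```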

Related

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of     spaces     in this  !  '
console.log(a.replace(/\s+(\W)/g, '$1'))
log shows lots of spaces in this!
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
\s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the words This and !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first RegEx /\s+(\W)/g, '$1' replaces 1 or more spaces with a single space.
What your regex says is "match one or more whitespace characters, followed by one non-word character, and replace that whole result with that non-word character". The key is that \s+ is greedy, meaning that it will try to match as many characters as possible. So in any given run of spaces it will try to match all of the spaces it can. However, your regex also requires a non-word character (\W) after the whitespace. Because in your case the next character after each final space is a word character (i.e. a letter), \s+ has to give back the last space so that \W can match it.
Therefore, given the string a   b (three spaces), and using parens to mark the \s+ and \W matches, a(  )( )b is the only way for the regex to match (\s+ matches the first two spaces and \W matches the last space). Now it's just a simple substitution. Since you wrapped the \W in parentheses, it is the first and only capturing group, so replacing the match with $1 will replace it with that final space.
As another example, running this replace against a   !b will result in the match looking like a(   )(!)b (since ! is now the non-word character that follows the spaces), so the final replaced result will be a!b.
Let's take this string 'aaa   &bbb' and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces '   &' with '&'
That same principle applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.
List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
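The list of matches above can be reproduced with matchAll. This is a small sketch: the input is built with repeat() so the run lengths (5, 5, 2 and 2 spaces, taken from the ☹ diagram above) stay visible, and the names gap, spaced, spaceMatches and collapsed are illustrative.

```javascript
// Build the input explicitly so the number of spaces is unambiguous.
const gap = ' '.repeat(5);
const spaced = ' lots of' + gap + 'spaces' + gap + 'in this' + '  ' + '!' + '  ';

// Each match is [whole match, captured \W character].
const spaceMatches = [...spaced.matchAll(/\s+(\W)/g)].map((m) => ({
  whole: m[0],
  captured: m[1],
}));

// Replacing each whole match with its capture keeps only the last character
// of every run: a single space, or the "!".
const collapsed = spaced.replace(/\s+(\W)/g, '$1');

console.log(spaceMatches.length); // 4
console.log(JSON.stringify(collapsed)); // " lots of spaces in this! "
```

Single spaces followed by a letter (as in "lots of") never match, because \s+ would take the only space and leave nothing for \W.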

How to make regex for 3 slashes?

I want to write regex for following
students/ad34567-06c1-498c-9b15-cdbac695c1f2/data/sessions
Where students, data and sessions should be exact match.
i have tried this
[students]\[a-z]\[a-z]\[a-z]
You can try this regex, although your question is not clear to me.
^students\/([\w\-\d]+)\/data\/sessions$
Check here https://regex101.com/r/xnxwCX/1
You can grab the data in between students/ and /data/sessions from the capturing group.
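For instance, reading the captured part could look like the sketch below. The variable names are made up for illustration; the pattern and the sample path are the ones from this thread.

```javascript
const sessionsPath = 'students/ad34567-06c1-498c-9b15-cdbac695c1f2/data/sessions';

// Pattern from the answer above; group 1 holds the part between the slashes.
const sessionsPattern = /^students\/([\w\-\d]+)\/data\/sessions$/;

const sessionsMatch = sessionsPath.match(sessionsPattern);
const studentId = sessionsMatch ? sessionsMatch[1] : null;
console.log(studentId); // ad34567-06c1-498c-9b15-cdbac695c1f2

// A path with a different last segment does not match at all.
const wrongPath = 'students/ad34567/data/session'.match(sessionsPattern);
console.log(wrongPath); // null
```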
In your regex [students]\\[a-z]\\[a-z]\\[a-z] you are trying to match with word students in a character class [students] which will match one of the specified characters instead of matching the whole word.
To match a forward slash in a regex literal you have to use \/, not \\. [a-z] is specified without a quantifier, so it matches exactly one character from a to z.
To match your example string you might use
^students\/[a-z0-9]+(?:-[a-z0-9]+)+\/data\/sessions$
The part [a-z0-9]+(?:-[a-z0-9]+)+ first matches one or more lowercase characters or digits with [a-z0-9]+
It is followed by a non-capturing group, repeated one or more times, that matches a hyphen followed by one or more lowercase characters or digits: (?:-[a-z0-9]+)+
You might also use [a-f0-9] if your characters only range from a to f
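To see the difference between a character class and a literal word, and to try the stricter pattern, here is a short sketch. The names classOnly, strictPattern and strictChecks are illustrative, and the two rejected paths are extra inputs made up for this example.

```javascript
// [students] is a character class: it matches ONE character out of s, t, u, d, e, n.
const classOnly = /^[students]$/;
console.log(classOnly.test('s'));        // true
console.log(classOnly.test('students')); // false

// Pattern from the answer above: hyphen-separated lowercase/digit groups.
const strictPattern = /^students\/[a-z0-9]+(?:-[a-z0-9]+)+\/data\/sessions$/;

const strictChecks = {
  sample: strictPattern.test('students/ad34567-06c1-498c-9b15-cdbac695c1f2/data/sessions'),
  noHyphen: strictPattern.test('students/abc/data/sessions'),   // needs at least one hyphen
  upperCase: strictPattern.test('students/AD-1/data/sessions'), // uppercase is not in [a-z0-9]
};
console.log(strictChecks); // { sample: true, noHyphen: false, upperCase: false }
```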

Regex: How do I remove the character BEFORE the matched string?

I am intercepting messages which contain the following characters:
*_-
However, whenever any one of these characters comes through, it will always be preceded by a \. The \ is just for formatting though and I want to remove it before sending it off to my server. I know how to easily create a regex which would remove this backslash from a single letter:
'omg\\_bbq\\_everywhere'.replace(/\\_/g, '_')
And I recognize I could just do this operation 3 times: once for each character I want to remove the preceding backslash for. But how can I create a single regex which would detect all three characters and remove the preceding backslash in all 3 cases?
You can use a character class like [*_-].
To remove only the backslash before these characters:
document.body.innerHTML =
"omg\\-bbq\\*everywhere\\-".replace(/\\([*_-])/g, '$1');
When you place a subpattern into a capturing group ((...)), you capture that subtext into a numbered buffer, and then you can reference it with a $1 backreference (1 because there is only one (...) in the pattern.)
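Outside the browser, the same replacement can be tried like this. It is just a sketch with illustrative names; note that a backslash in a JavaScript string literal has to be written as \\, and the Windows-style path is an extra input added here.

```javascript
// The actual string contents are: omg\_bbq\*everywhere\-
const rawMessage = 'omg\\_bbq\\*everywhere\\-';

// Match a backslash plus one of * _ - and put back only the captured character.
const unescaped = rawMessage.replace(/\\([*_-])/g, '$1');
console.log(unescaped); // omg_bbq*everywhere-

// A backslash before any other character is left alone.
const untouched = 'C:\\temp'.replace(/\\([*_-])/g, '$1');
console.log(untouched); // C:\temp
```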
This is a good time to use a lookahead. Specifically, you want to match the backslash and then use a positive lookahead for any of those characters.
Ignoring the code, the raw regex you want is:
\\(?=[*_-])
A literal backslash, followed by one of these characters: *_-
So now you are matching only the backslash. The lookahead is a zero-length assertion, so it doesn't consume anything, but sets a requirement that "for this to be a valid match, it needs to be followed by [*_-]"
Atomic groups: http://www.regular-expressions.info/atomic.html
Lookaround statements: http://www.regular-expressions.info/lookaround.html
Positive and negative lookahead and lookbehind matches are available.
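In JavaScript, the lookahead version could be used like the sketch below (variable names are illustrative). Since the lookahead consumes nothing, each match is just the backslash, and replacing with an empty string is enough.

```javascript
// The actual string contents are: omg\_bbq\*everywhere\-
const escapedMessage = 'omg\\_bbq\\*everywhere\\-';

// Each match is only the backslash; the character after it is checked, not consumed.
const backslashMatches = [...escapedMessage.matchAll(/\\(?=[*_-])/g)].map((m) => m[0]);
console.log(backslashMatches.length); // 3

const cleaned = escapedMessage.replace(/\\(?=[*_-])/g, '');
console.log(cleaned); // omg_bbq*everywhere-
```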

/(\S)\1(\1)+/g matching all occurrences of three equal non-whitespace characters following each other

It's given: /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
I don't understand why there is () around (\S) and 2nd (\1), but not around 1st (\1). Can anyone help in explaining how above regex works?
src: http://www.javascriptkit.com/javatutors/redev2.shtml
Thnx in advance.
The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters". But the + will make this match 3 or more equal, consecutive non-whitespace characters.
The g on the end means "apply this regex over the entire input string, or globally".
The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:
/(\S)\1\1+/g
Also, as @AlexD pointed out, the description should say that it matches at least three characters. If you replaced that regex with BONK in the string fooxxxxxxbar:
'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')
..you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
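Running the three variants side by side makes the difference visible. This is only a sketch: the names are illustrative and 'aabbb' is an extra input added to show what the second group captures.

```javascript
// With the +, the whole run of six x's is one match.
const atLeastThree = 'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK');
console.log(atLeastThree); // fooBONKbar

// Without the +, each match is exactly three characters, so there are two matches.
const exactlyThree = 'fooxxxxxxbar'.replace(/(\S)\1\1/g, 'BONK');
console.log(exactlyThree); // fooBONKBONKbar

// The original pattern: group 1 is the first character, group 2 is the
// last repetition of \1. "aa" is skipped because it is only two characters.
const tripleMatch = 'aabbb'.match(/(\S)\1(\1)+/);
console.log(tripleMatch[0], tripleMatch[1], tripleMatch[2], tripleMatch.index); // bbb b b 2
```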
I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries--"the position between a word and a space"--is wrong, too. A word boundary isn't defined by any particular character, like a space--in fact, it can just as well be the absence of any character that creates one. The string:
Word
...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.
Also, a regex doesn't know from words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position that's between one of those characters and any other character (or no other character, as I explained earlier).
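One way to see where the boundaries fall is to insert a marker at every position that \b or \B matches. A small sketch, with illustrative names and made-up marker characters:

```javascript
// \b matches at the start and end of "Word": positions with a word
// character on exactly one side.
const boundaries = 'Word'.replace(/\b/g, '|');
console.log(boundaries); // |Word|

// \B matches the remaining positions, between two word characters.
const nonBoundaries = 'Word'.replace(/\B/g, '-');
console.log(nonBoundaries); // W-o-r-d

// A space is not required, but it does create boundaries on both sides.
const withSpace = 'a b'.replace(/\b/g, '|');
console.log(withSpace); // |a| |b|

// Inside a character class, \b means backspace (U+0008), so [^\b] is
// "any character except backspace", which is not the same thing as \B.
console.log(/[^\b]/.test('a'));  // true
console.log(/[^\b]/.test('\b')); // false
```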
If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.

What are the meanings of these regular expressions in JavaScript?

1) ^[^\s].{1,20}$
2) ^[-/##&$*\w\s]+$
3) ^([\w]{3})$
Are there any links for more information?
^[^\s].{1,20}$
Matches any non-white-space character followed by between 1 and 20 characters. [^\s] could be replaced with \S.
^[-/##&$*\w\s]+$
Matches 1 or more occurrences of any of these characters: -/##&$*, plus any word character (A-Za-z0-9_) plus any white-space character.
^([\w]{3})$
Matches three word characters (A-Za-z0-9_). This regular expression forms a group (with (...)), which is quite pointless because the group will always equal the aggregate match. Note that the [...] is redundant -- might as well just use \w without wrapping it in a character class.
More info: "Regular Expression Basic Syntax Reference"
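1) Matches a string that starts with a non-space character followed by 1 to 20 more characters.
2) Matches strings made up only of the signs -/##&$*, word characters and whitespace; at least one character must be present.
3) Matches exactly three word characters.
Here is an excellent source on regex: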
http://www.regular-expressions.info/
1) Matches any string that starts with a non-whitespace character that's followed by at least one and up to 20 other characters before the end of the string.
2) Matches any string that contains one or more "word" characters (letters etc), whitespace characters, or any of "-/##&$*"
3) Matches a string with exactly 3 "word" characters
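The three descriptions can be checked with a few inputs. This is a sketch only: the sample strings and variable names are made up for illustration, and the / inside the second character class is written as \/ so it is safe in a regex literal.

```javascript
const firstPattern = /^[^\s].{1,20}$/;      // 1 non-space char + 1 to 20 more chars
const secondPattern = /^[-\/##&$*\w\s]+$/;  // only the listed signs, word chars, whitespace
const thirdPattern = /^([\w]{3})$/;         // exactly three word characters

const meaningChecks = {
  tooShort: firstPattern.test('a'),                 // false: needs at least 2 characters
  twoChars: firstPattern.test('ab'),                // true
  leadingSpace: firstPattern.test(' ab'),           // false: first char is whitespace
  maxLength: firstPattern.test('a'.repeat(21)),     // true: 1 + 20
  tooLong: firstPattern.test('a'.repeat(22)),       // false
  allowedSigns: secondPattern.test('a-b c'),        // true
  otherSign: secondPattern.test('a!b'),             // false: ! is not in the class
  emptyString: secondPattern.test(''),              // false: + needs at least one char
  threeWordChars: thirdPattern.test('a_1'),         // true
  fourWordChars: thirdPattern.test('abcd'),         // false
};
console.log(meaningChecks);

// The group in the third pattern always equals the whole match.
const thirdMatch = 'abc'.match(thirdPattern);
console.log(thirdMatch[0] === thirdMatch[1]); // true
```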
