Regex to match a JSON String - javascript

I am building a JSON validator from scratch, but I am stuck on the string part. I was hoping to build a regex that matches the string grammar shown on JSON.org.
My regex so far is:
/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4}))*\"$/
It matches the empty string and a backslash followed by an escape character, but I'm not sure how to handle the Unicode part.
Is there a regex that matches any Unicode character except " or \ or a control character? And will it match a newline or a horizontal tab?
I ask the last question because the regex matches the string "\t", but not " " (shown as spaces here, but meant to be a tab). If necessary I can extend the regex to cover it, which is not a problem, but my guess is that the horizontal tab is a Unicode character.
Thanks to Jaeger Kor, I now have the following regex:
/^\"((?=\\)\\(\"|\/|\\|b|f|n|r|t|u[0-9a-f]{4})|[^\\"]*)*\"$/
It appears to be correct, but is there any way to check for control characters, or is this unnecessary given that they are listed among the non-printable characters on regular-expressions.info? The input to validate is always text from a textarea.
Update: the regex is as follows, in case anyone needs it:
/^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/
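To see how the final pattern treats escapes, raw tabs and other control characters, it can be run against a few sample inputs. This is only a minimal sketch: isJsonString and the sample values are illustrative names that do not appear in the original post.

```javascript
// The pattern from the update above, unchanged.
const jsonString = /^("(((?=\\)\\(["\\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"\\\0-\x1F\x7F]+)*")$/;

// Hypothetical helper: true when `text` is exactly one JSON string literal.
function isJsonString(text) {
  return jsonString.test(text);
}

console.log(isJsonString('""'));            // true: empty string
console.log(isJsonString('"a\\tb"'));       // true: escaped tab (backslash followed by t)
console.log(isJsonString('"\\u00e9"'));     // true: \u followed by four hex digits
console.log(isJsonString('"a\tb"'));        // false: raw tab is a control character
console.log(isJsonString('"line\nbreak"')); // false: raw newline is a control character
console.log(isJsonString('"\\x41"'));       // false: \x is not a JSON escape
```

So a tab or newline typed into the textarea is rejected unless it is written as \t or \n. Note that the character class also rejects a raw DEL (\x7F), which the JSON grammar itself permits; drop \x7F from the class if that matters.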

For your exact question, create a character class:
# Matches any character that isn't a \ or "
/[^\\"]/
Then add * at the end to match zero or more of them, or + to match one or more:
/[^\\"]*/
or
/[^\\"]+/
There is also the following pattern, found at https://regex101.com/ under the Library tab when searching for "json":
/(?(DEFINE)
# Note that everything is atomic, JSON does not need backtracking if it's valid
# and this prevents catastrophic backtracking
(?<json>(?>\s*(?&object)\s*|\s*(?&array)\s*))
(?<object>(?>\{\s*(?>(?&pair)(?>\s*,\s*(?&pair))*)?\s*\}))
(?<pair>(?>(?&STRING)\s*:\s*(?&value)))
(?<array>(?>\[\s*(?>(?&value)(?>\s*,\s*(?&value))*)?\s*\]))
(?<value>(?>true|false|null|(?&STRING)|(?&NUMBER)|(?&object)|(?&array)))
(?<STRING>(?>"(?>\\(?>["\\\/bfnrt]|u[a-fA-F0-9]{4})|[^"\\\0-\x1F\x7F]+)*"))
(?<NUMBER>(?>-?(?>0|[1-9][0-9]*)(?>\.[0-9]+)?(?>[eE][+-]?[0-9]+)?))
)
\A(?&json)\z/x
This should match any valid JSON; you can also test it at the website above.
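The pattern above relies on PCRE features (the (?(DEFINE)...) block, (?&name) subroutine calls, atomic groups (?>...), \A and \z, and the x flag) that JavaScript's RegExp does not support, so it cannot be pasted into a JavaScript regex literal. If the goal is only to check that a whole document is valid JSON, one alternative is to let the built-in JSON.parse do the work. This is a different technique from the regex, shown as a minimal sketch; isValidJson is a hypothetical helper name.

```javascript
// Hypothetical helper: true when `text` parses as JSON, false otherwise.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    // JSON.parse throws a SyntaxError on malformed input.
    return false;
  }
}

console.log(isValidJson('{"a":[1,2,{"b":null}]}')); // true
console.log(isValidJson('{"a":1,}'));               // false: trailing comma
console.log(isValidJson("{'a':1}"));                // false: single quotes
console.log(isValidJson('"a\tb"'));                 // false: raw tab inside a string
```

Unlike the <json> rule in the pattern, JSON.parse also accepts a bare top-level value such as a string or number.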

Use this; it also works with JSON arrays such as [{...},{...}]:
((\[[^\}]{3,})?\{s*[^\}\{]{3,}?:.*\}([^\{]+\])?)
Demo:
https://regex101.com/r/aHAnJL/1

Related

Javascript - String.search() return true if string contains any characters NOT matching regex

I'm trying to search a string for a character that does NOT match this regex:
password.search(/[`!###$%^&*A-Za-z0-9]/i);
Basically, any character that doesn't match this regex isn't allowed, and I want to know whether the user has entered any such character, for example '\' or any other character I might not think of.
I'm pretty sure there's a question similar to this out somewhere, but despite trying to look for it I surprisingly couldn't find it. If this is a duplicate question please link me.
According to this answer, you could use a negative lookahead, (?!...):
console.log("valid$\\".search(/(?![`!###$%^&*A-Za-z0-9])/i));
console.log("256)128".search(/(?![`!###$%^&*A-Za-z0-9])/i));
If you want to exclude a set of characters (some punctuation characters, for example), use the ^ operator at the beginning of a character set in the regex.
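The negated character set mentioned above can be sketched as follows. One reason to prefer it: a bare negative lookahead also succeeds at the very end of the input, so search() returns the string's length instead of -1 for a fully valid password. The allowed set is copied from the question as written; firstDisallowedIndex is a hypothetical helper name.

```javascript
// Matches the first character that is NOT in the allowed set.
const disallowed = /[^`!###$%^&*A-Za-z0-9]/;

// Hypothetical helper: index of the first disallowed character, or -1 if none.
function firstDisallowedIndex(password) {
  return password.search(disallowed);
}

console.log(firstDisallowedIndex("valid$\\")); // 6: the backslash
console.log(firstDisallowedIndex("256)128"));  // 3: the closing parenthesis
console.log(firstDisallowedIndex("valid$"));   // -1: nothing disallowed

// For comparison, the lookahead form matches at the end of a valid string:
console.log("valid$".search(/(?![`!###$%^&*A-Za-z0-9])/)); // 6
```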

Simple regex pattern for email

I've been trying to work this out for almost an hour now, and I can't see myself getting much further with it without any help or explanation. I've used regex before, but only ones that are very simple or had already been made.
This time, I'm trying to work out how to write a regex that achieves the following:
Email address must contain one @ character and at least one dot (.) at least one position after the @ character.
So far, this is all I've been able to work out, and it still matches email addresses that, for example, have more than one @ symbol.
.*?@?[^@]*\.+.*
It would be helpful if you could show me how to construct a regular expression that checks for a single @ and at least one full stop one or more characters after the @. If you could break down the regex and explain what each bit does, that would be really helpful.
I want to keep it simple for now, so it doesn't have to be a full-on super-accurate email validation expression.
With the help of ClasG's comment, I now have a fairly straightforward and suitable regex for my problem. For the sake of anyone learning regex who might come across this question in the future, I'll break the expression down below.
Expression: ^[^@]+@[^@]+\.[^@]+$
^ Matches the beginning of the string (or line if multiline)
[^@] Match any character that is not in this set (i.e. not "@")
+ Match one or more of this
@ Match "@" character
[^@] Match any character that is not in this set
+ Match one or more
\. Match "." (full stop) character (backslash escapes the full stop)
[^@] Match any character that is not in this set
+ Match one or more
$ Matches the end of the string (or line if multiline)
And in plain language:
Start at beginning of string or line
Include all characters except @ until the @ sign
Include the @ sign
Include all characters except @ after the @ sign until the full stop
Include all characters except @ after the full stop
Stop at the end of the string or line
Email address must contain one @ character
No, they don't. An email address with no '@' character is perfectly valid. An email address with multiple '@' characters before an IP address is perfectly valid (as long as all but 1 are outside the ADDR_SPEC or are quoted/escaped within the mailbox name).
I suspect you're not trying to validate an email address but rather an ADDR_SPEC. The answer linked by Máté Safranka describes how to validate an ADDR_SPEC (not an email address). Unless you expect to be validating records which don't have a valid internet MX record, and more than one '@' is more likely to be a typo than a valid address....
/[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+@([a-z0-9\-]+\.){1,}([a-z]{2,16})/
^[^\W_]+\w*(?:[.-]\w*)*[^\W_]+@[^\W_]+(?:[.-]?\w*[^\W_]+)*(?:\.[^\W_]{2,})$
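The simple expression from the breakdown earlier in this thread can be exercised against a few inputs. This is a minimal sketch, assuming the separator is the standard at sign (@); looksLikeSimpleEmail is a hypothetical helper name, and the check is deliberately loose rather than a full address validator.

```javascript
// One or more non-@ characters, a single @, then non-@ characters
// that contain at least one dot with something on both sides of it.
const simpleEmail = /^[^@]+@[^@]+\.[^@]+$/;

// Hypothetical helper wrapping the pattern.
function looksLikeSimpleEmail(address) {
  return simpleEmail.test(address);
}

console.log(looksLikeSimpleEmail("user@example.com"));  // true
console.log(looksLikeSimpleEmail("user@@example.com")); // false: two @ signs
console.log(looksLikeSimpleEmail("a@b@c.com"));         // false: two @ signs
console.log(looksLikeSimpleEmail("userexample.com"));   // false: no @ sign
console.log(looksLikeSimpleEmail("user@examplecom"));   // false: no dot after the @
console.log(looksLikeSimpleEmail("user@.com"));         // false: dot directly after the @
```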

Regex to include white space and then verify a character

I have a pattern to search for that I found around the web, but I'm new to regex and unable to verify it.
I'm looking for, for example:
[verify if any whitespace] [any of this char ':' '|' ';'] [verify if any whitespace] [[String a-zA-Z0-9-]+]
Suppose the test strings are:
" : hello129 " or ":hello129" or ";hello129" or "|hello129" or " | hello129"
My attempts:
\s[:;|]\s[a-zA-Z0-9_.+-]+
(\w+\s\w+):(\w+\s\w+)[a-zA-Z0-9_.+-]+
Please suggest possible solutions for this pattern using regular expressions.
Thank you in advance :)
Whitespace is represented with \s. The other groups are easily written in brackets.
Whitespace could be one or more characters, so the + modifier will be necessary. If the whitespace is optional, as in test strings such as ":hello129", use the * modifier instead. If only a single whitespace character were allowed, we would leave the modifier out.
The string in the end is one or more characters long and needs the + as well.
The result is a regular expression like this:
\s+[:;|]\s+[a-zA-Z0-9-]+
Here is an example including tests on the great RegEx testing site regex101.com.
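The difference between the + and * modifiers shows up on the question's own test strings. The following is a minimal sketch; the names strict, lenient and samples are illustrative, and the capturing group around the name is an addition for demonstration.

```javascript
// Whitespace required on both sides of the separator (the pattern above).
const strict = /\s+[:;|]\s+[a-zA-Z0-9-]+/;
// Whitespace optional; the trailing name is captured in group 1.
const lenient = /\s*[:;|]\s*([a-zA-Z0-9-]+)/;

const samples = [" : hello129 ", ":hello129", ";hello129", "|hello129", " | hello129"];

for (const sample of samples) {
  const match = sample.match(lenient);
  // Prints the sample, whether the strict pattern matches, and the captured name.
  console.log(JSON.stringify(sample), strict.test(sample), match ? match[1] : null);
}
```

Only " : hello129 " and " | hello129" satisfy the strict pattern, while the lenient one matches all five samples and captures hello129 each time.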

Regex-Groups in Javascript

I have a problem with a JavaScript RegExp.
This is a very simplified regexp that demonstrates my problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that don't contain forbidden characters (anything that is not a word character).
The only exception is the symbol +.
A match is allowed to start with this symbol if a digit [0-9] follows it.
And a + must not appear within words (44+44 is not a valid match, but +4ad is).
In order to allow the + only at the beginning, I required a preceding whitespace character. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are 2 issues with that regexp:
If I use it in my JS code, the space is always part of the match
If +42 appears at the beginning of a line, it won't be matched
My Questions:
How should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
How should the regex look like?
You've got multiple choices to reach your goal:
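It's fine as you have it. You might allow the beginning of the string in place of the whitespace as well, though. Just read the capturing groups (groups 1 and 2 of each match) rather than the whole match; they will not include the whitespace. Note that input.match() with the g flag returns only whole matches, so the groups have to be read with regexp.exec() in a loop.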
If you didn't use JavaScript, a lookbehind could help. Unfortunately it's not supported.
Require a non-word-boundary before the +, which would make every \w character before the + prevent the match:
/\B\+\d\w+|\w+/
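The lookbehind option is no longer out of reach: lookbehind assertions were added to JavaScript in ES2018, so runtimes that implement that edition accept them. A minimal sketch, assuming such a runtime and reusing the input from the question; the choice of (?<!\S) is one possible way to express "start of string or after whitespace", not something taken from the answer.

```javascript
const input = "+5ad6 +5ad6 sd asd+as +we";

// (?<!\S) succeeds only when the previous character is not a non-whitespace
// character, i.e. at the start of the string or right after whitespace.
const pattern = /(?<!\S)\+\d\w*|\w+/g;

const tokens = input.match(pattern);
console.log(tokens); // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
```

The whitespace is never consumed, so it cannot end up in a token, and a +5ad6 at the very start of the input is matched as well.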
Why does this regex add the space to the matches?
Because the regex does match the whitespace: the \s sits inside a non-capturing group, which is still part of the overall match. Only the capturing group (\+\d\w*) leaves the whitespace out.
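The difference between the whole match and the captured groups can be seen by iterating over the matches. This is a minimal sketch, assuming a runtime with String.prototype.matchAll (ES2020); the (?:^|\s) prefix, which also admits the start of the string, and the variable names are illustrative.

```javascript
const input = "+5ad6 +5ad6 sd asd+as +we";
// The non-capturing prefix accepts the start of the string or one whitespace character.
const pattern = /(?:^|\s)(\+\d\w*)|(\w+)/g;

const wholeMatches = [];
const tokens = [];
for (const match of input.matchAll(pattern)) {
  wholeMatches.push(match[0]); // may include the leading whitespace
  // Exactly one of the two groups participates in each match.
  tokens.push(match[1] !== undefined ? match[1] : match[2]);
}

console.log(wholeMatches); // [ '+5ad6', ' +5ad6', 'sd', 'asd', 'as', 'we' ]
console.log(tokens);       // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
```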

RegExp to match hashtag at the beginning of the string or after a space

I have looked through previous questions and answers, however they do not solve the following:
https://stackoverflow.com/questions/ask#notHashTag
The closest I got is this: (^#|(?:\s)#)(\w+), which finds the hashtag in only half of the necessary cases and also includes the leading space in the returned text. Here are all the cases that need to be matched:
#hashtag
a #hashtag
a #hashtag world
cool.#hashtag
##hashtag, but only until the comma and starting at second hash
#hashtag#hashtag two separate matches
And these should be skipped:
https://stackoverflow.com/questions/ask#notHashTag
Word#notHashTag
#ab is too short to be a hashtag, 3 characters minimum
This should work for everything except #hashtag#duplicates; because JS doesn't support lookbehind, it's probably not possible to match that case by itself.
\B#\w{3,}
\B is designed to match only between two word characters or two non-word characters. Since # is a non-word character, this forces the match to be preceded by a space or punctuation, or the beginning of the string.
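The behaviour described above can be checked against the cases listed in the question. A minimal sketch; findHashtags is a hypothetical helper name, and the g flag is added so that every hashtag in a line is returned.

```javascript
const hashtag = /\B#\w{3,}/g;

// Hypothetical helper: all hashtags found in `text`, or an empty array.
function findHashtags(text) {
  return text.match(hashtag) || [];
}

console.log(findHashtags("a #hashtag world"));       // [ '#hashtag' ]
console.log(findHashtags("cool.#hashtag"));          // [ '#hashtag' ]
console.log(findHashtags("##hashtag, but only"));    // [ '#hashtag' ]
console.log(findHashtags("#hashtag#hashtag"));       // [ '#hashtag' ] (second one is missed)
console.log(findHashtags("Word#notHashTag"));        // []
console.log(findHashtags("#ab is too short"));       // []
console.log(findHashtags("https://stackoverflow.com/questions/ask#notHashTag")); // []
```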
Try this regex:
(?:^|[\s.])(#+\w{3,})(#+\w{3,})?
Online Demo: http://regex101.com/r/kG1nD5
