Validate a search expression with quotes and * - javascript

I'm trying to write a regular expression to validate some user-entered text. I'd like to use the regular expression with an ngPattern directive, so I should avoid using the g flag.
Essentially, there are a number of "simple" rules.
There must be one or more words.
Single quotes (') are not allowed.
Double quotes (") are allowed but must be paired, i.e. open and closing.
Paired double quotes must wrap one or more words.
No white space is allowed between a double quote and the word it adjacently wraps.
An asterisk (*) is not allowed unless it immediately precedes a closing double quote and follows a word, without whitespace.
Here are some examples.
example match
'' false
' ' false
' foo' true
'foo' true
'foo bar' true
'foo bar*' false
'"foo' false
'"foo"' true
'" foo"' false
'"foo "' false
'"foo bar"' true
'"foo *"' false
'"foo*"' true
'foo*"' false
'"foo*" "bar*"' true
'foo "bar*"' true
'"foo* bar"' false
'"foo*" bar' true
I've created unit tests here
I'm struggling to get anywhere close,
I've got an expression like this
/(")(?:(?=(\\?))\2.)*?\1/
that will match text between paired double quotes. Something like this,
/^.*\*"$/
will match text that ends with '*"',
as you can see, I've got a long way to go, please help.
Is it possible that a regular expression is the wrong way to do this?

^(?=.*\b)(?=[^"]*("[^"]*"[^"]*)*$)(?![^"]*("[^"]*"[^"]*)*" *")(?!.*\*[^"])(?!.*[ "]\*)(?![^"]*("[^"]*"[^"]*)*[^"]*\*")(?![^"]*("[^"]*"[^"]*)*" \w)(?![^"]*("[^"]*"[^"]*)*"[^"]*\w ")'[^']*'$
See it in action
Good luck using this in your production codebase
Ok, so dafuq...
An important idea that we are going to reuse is how to reach a position, before which you know there were an even number of "s. Namely:
[^"]*("[^"]*"[^"]*)*
Unfortunately, we can't reuse patterns in JavaScript regexes, so we will have to repeat it wherever we need it. Namely:
Double quotes (") are allowed but must be paired, i.e. open and closing.
^(?=__even_quotes_pattern__$)
Basically, we say that from the start (^), when we iterate until the end ($), we match the said pattern, i.e. an even number of ".
No white space is allowed between a double quote and the word it adjacently wraps.
We will split this in two parts - doesn't happen on the left, doesn't happen on the right:
^(?!__even_quotes_pattern__" \w)
^(?!__even_quotes_pattern__"[^"]*\w ")
Paired double quotes must wrap one or more words.
^(?!__even_quotes_pattern__" *")
(there are no paired quotes that wrap only spaces)
The rest of them are easier:
There must be one or more words.
^(?=.*\b)
(at some point there is a word boundary (\b))
Single quotes (') are not allowed.
(or from the interpretation in the comments, not allowed except for the ones that wrap the string)
^'[^']*'$
An asterisk (*) is not allowed unless it immediately precedes a closing double quote and follows a word, without whitespace.
We will split this into three parts:
(1) Must precede a ":
(?!.*\*[^"])
(2) Must not follow a " or a space:
(?!.*[ "]\*)
(3) Must not precede a non-closing (i.e. opening) ":
(?!__even_quotes_pattern__[^"]*\*")
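Since a JavaScript regex literal cannot refer back to a named sub-pattern, one way to avoid retyping the even-quotes fragment is to assemble the pattern source in code. The following is only a sketch of that idea: the names EVEN, rules and searchPattern are made up for illustration, the repeated group is written as non-capturing, and it keeps this answer's interpretation that the input is wrapped in single quotes.

```javascript
// Hypothetical helper names; the fragments are the ones explained above.
const EVEN = String.raw`[^"]*(?:"[^"]*"[^"]*)*`; // reaches a position preceded by an even number of "

const rules = [
  String.raw`(?=.*\b)`,              // there is at least one word
  String.raw`(?=${EVEN}$)`,          // double quotes are paired
  String.raw`(?!${EVEN}" *")`,       // no pair wraps only spaces
  String.raw`(?!.*\*[^"])`,          // * must be followed by "
  String.raw`(?!.*[ "]\*)`,          // * must not follow a space or "
  String.raw`(?!${EVEN}[^"]*\*")`,   // * must not precede an opening "
  String.raw`(?!${EVEN}" \w)`,       // no space right after an opening "
  String.raw`(?!${EVEN}"[^"]*\w ")`, // no space right before a closing "
];

// Anchor at the start, apply every lookahead, then consume the wrapped string.
const searchPattern = new RegExp('^' + rules.join('') + String.raw`'[^']*'$`);

console.log(searchPattern.test(`'"foo*" "bar*"'`)); // true
console.log(searchPattern.test(`'"foo* bar"'`));    // false
```

Keeping each rule on its own commented line makes it easier to add or drop a rule later than editing the single long literal.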

clean and simple function:
function myParser(string) {
    string = string.trim(); // reuse the parameter instead of redeclaring it
    var wordsArray = string.split(' '),
        regExp = /((?=")^["][a-zA-Z]+[*]{0,1}["]$|^[a-zA-Z]+$)/,
        len = wordsArray.length,
        i = 0;
    // '"foo bar"' situation
    if (string.match(/^["][a-zA-Z]+[\s]?[a-zA-Z]+[*]{0,1}["]$/)) {
        return true;
    }
    // every space-separated token must be a bare word or a quoted word
    for (i; i < len; i++) {
        var result = wordsArray[i].match(regExp);
        if (result === null) {
            return false;
        }
    }
    return true;
}
https://jsfiddle.net/cy9ozmdm/ to check results.
If you need explanations, write a comment and I will describe the logic in detail.
(An idea for you: run both variants (the pure regExp and the function) on 10,000 test cases and see which one works faster (and doesn't fail :)).)

Related

regex with replace() for letters only

I have a string that outputs
20153 Risk
What I am trying to achieve is getting only the letters. I have managed to get only the numbers using this regular expression:
const cf_regex_number = cf_input.replace(/\D/g, '');
This returns only 20153. But as soon as I tried to get only the letters, it returned the whole string instead of Risk. I have done my research, and the regular expression to get only letters is supposedly /^[a-zA-Z]*$/
This is the line of code I tried in order to get only letters:
const cf_regex_character = cf_input.replace(/^[a-zA-Z]*$/,'')
But instead of returning Risk, it returns 20153 Risk, which is the whole string.
/[^a-z]+/i
The [ brackets ] signify a range of characters; specifically, a to z in this case.
Actually the i flag means insensitive to case, so that includes A to Z also.
The caret ^ inverts the pattern; it means, anything not in the specified range.
And the + means continue adding characters to the match as long as they still match that inverted range.
Then stop matching.
In effect this matches everything up to the space in 20153 Risk.
Then you replace this match with the empty string '' and what you've got left is Risk.
const string = '20153 Risk';
const result = string.replace(/[^a-z]+/i, '');
console.log(result);
Your first pattern is locating every non-digit and replacing it with nothing.
On the other hand, your second pattern is locating just the first occurrence of a pattern, and the pattern is looking for start of string, followed by letters, followed by end of string. There is no such sequence: if you start from the start of the string, there are exactly zero letters, and then you are left very far from the expected end of the string. Even if that worked, you would be deleting letters, not non-letters.
This pattern is parallel to your first one (delete any occurrence of a non-letter):
const cf_regex_character = cf_input.replace(/[^a-zA-Z]/g,'')
but possibly a better way to go is to extract the desired substring, instead of deleting everything that it is not:
const letters = cf_input.match(/[a-z]+/i)[0];
const numbers = cf_input.match(/\d+/)[0];
(This is if you know there is such a substring; if you are unsure it would be better to code a bit more defensively.)
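As a minimal sketch of the more defensive variant mentioned above: String.prototype.match returns null when nothing matches, so indexing [0] directly would throw. The function name extractParts and the choice of null as the fallback value are assumptions made for this illustration.

```javascript
// Hypothetical helper: returns the first run of letters and the first run of
// digits, or null for whichever one is missing.
function extractParts(input) {
  const letters = input.match(/[a-z]+/i);
  const digits = input.match(/\d+/);
  return {
    letters: letters ? letters[0] : null,
    numbers: digits ? digits[0] : null,
  };
}

console.log(extractParts('20153 Risk')); // { letters: 'Risk', numbers: '20153' }
console.log(extractParts('20153'));      // { letters: null, numbers: '20153' }
```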
cf_input="20153 Risk"
const cf_regex_character = cf_input.replace(/\d+\s/,'')
console.log(cf_regex_character)
str="20153 Risk"
reg=/[a-z]+/gi
res=str.match(reg)
console.log(res[0])

Regex excluding matches wrapped in specific bbcode tags

I'm trying to replace double quotes with curly quotes, except when the text is wrapped in certain tags, like [quote] and [code].
Sample input
[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>"Why no goodbye?" replied [b]Bob[/b]. "It's always Hello!"</p>
Expected output
[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>“Why no goodbye?” replied [b]Bob[/b]. “It's always Hello!”</p>
I figured how to elegantly achieve what I want in PHP by using (*SKIP)(*F), however my code will be run in javascript, and the javascript solution is less than ideal.
Right now I'm splitting the string at those tags, running the replace, then putting the string together:
var o = 3;
a = a
    .split(/(\[(?<first>(?:icode|quote|code))[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(?:\k<first>)\])/i)
    .map(function(x, i) {
        if (i == o - 1 && x) {
            x = '';
        } else if (i == o && x) {
            x = x.replace(/(?![^<]*>|[^\[]*\])"([^"]*?)"/gi, '“$1”');
            o = o + 3;
        }
        return x;
    }).join('');
Javascript Regex Breakdown
Inside split():
(\[(?<first>icode|quote|code)[^\]]*?\](?:.)*?\[\/(\k<first>)\]) - captures the pattern inside parentheses:
\[(?<first>quote|code|icode)[^\]]*?\] - a [quote], [code], or [icode] opening tag, with or without parameters like =html, eg [code=html]
(?:[\s]*?.)*? - any 0+ (as few as possible) occurrences of any char (.), preceded or not by whitespace, so it doesn't break if the opening tag is followed by a line break
[\s]*? - 0+ whitespaces
\[\/(\k<first>)\] - [/quote], [/code], or [/icode] closing tags. Matches the text captured in the (?<first>) group. E.g. if it's a quote opening tag, it'll be a quote closing tag
Inside replace():
(?![^<]*>|[^\[]*\])"([^"]*?)" - captures text inside double quotes:
(?![^<]*>|[^\[]*\]) - negative lookahead, looks for characters (that aren't < or [) followed by either > or ] and discards them, so it won't match anything inside bbcode and html tags. Eg: [spoiler="Name"] or <span style="color: #24c4f9">. Note that matches wrapped in tags are left untouched.
" - literal opening double quotes character.
([^"]*?) - any 0+ character, except double quotes.
" - literal closing double quotes character.
SPLIT() REGEX DEMO: https://regex101.com/r/Ugy3GG/1
That's awful, because the replace is executed multiple times.
Meanwhile, the same result can be achieved with a single PHP regex. The regex I wrote was based on Match regex pattern that isn't within a bbcode tag.
(\[(?<first>quote|code|icode)[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(\k<first>)\])(*SKIP)(*F)|(?![^<]*>|[^\[]*\])"([^"]*?)"
PHP Regex Breakdown
(\[(?<first>quote|code|icode)[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(\k<first>)\])(*SKIP)(*F) - matches the pattern inside capturing parentheses just like javascript split() above, then (*SKIP)(*F) make the regex engine omit the matched text.
| - or
(?![^<]*>|[^\[]*\])"([^"]*?)" - captures text inside double quotes in the same way javascript replace() does
PHP DEMO: https://regex101.com/r/fB0lyI/1
The beauty of this regex is that it only needs to be run once. No splitting and joining of strings. Is there a way to implement it in javascript?
Because JS lacks backtracking verbs you will need to consume those bracketed chunks but later replace them as is. By obtaining the second side of the alternation from your own regex the final regex would be:
\[(quote|i?code)[^\]]*\][\s\S]*?\[\/\1\]|(?![^<]*>|[^\[]*\])"([^"]*)"
But the tricky part is using a callback function with replace() method:
str.replace(regex, function($0, $1, $2) {
return $1 ? $0 : '“' + $2 + '”';
})
Above ternary operator returns $0 (whole match) if first capturing group exists otherwise it encloses second capturing group value in curly quotes and returns it.
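Put together, the approach above could look like the following sketch. The g and i flags and the names skipOrQuote and curlyQuotes are assumptions for illustration; the pattern and the callback logic are the ones shown above.

```javascript
// First alternative: a whole [quote]/[code]/[icode] block (group 1 = tag name).
// Second alternative: a double-quoted run outside tags (group 2 = its content).
const skipOrQuote = /\[(quote|i?code)[^\]]*\][\s\S]*?\[\/\1\]|(?![^<]*>|[^\[]*\])"([^"]*)"/gi;

function curlyQuotes(str) {
  return str.replace(skipOrQuote, (whole, tag, quoted) =>
    // If group 1 took part in the match, put the block back untouched.
    tag ? whole : '“' + quoted + '”');
}

console.log(curlyQuotes('[code]"raw"[/code] and "text"'));
// [code]"raw"[/code] and “text”
```

Because the excluded blocks are consumed by the same pass, the replacement runs once over the string instead of once per split chunk.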
Note: this may fail in different cases.
See live demo here
Nested markup is hard to parse with regular expressions, and with JS's RegExp in particular. Complex regular expressions are also hard to read, maintain, and debug. If your needs are simple, such as a tag content replacement with some banned tags excluded, consider a simple code-based alternative to run-on RegExps:
function curly(str) {
    var excludes = {
            quote: 1,
            code: 1,
            icode: 1
        },
        xpath = [];
    return str.split(/(\[[^\]]+\])/) // breakup by tag markup
        .map(x => { // for each tag and content:
            if (x[0] === "[") { // tag markup:
                if (x[1] === "/") { // close tag
                    xpath.pop(); // remove from current path
                } else { // open tag
                    xpath.push(x.slice(1).split(/\W/)[0]); // add to current path
                } //end if open/close tag
            } else { // tag content
                if (xpath.every(tag => !excludes[tag])) x = x.replace(/"/g, function repr() {
                    return (repr.z = !repr.z) ? "“" : "”"; // flip flop return value (naive)
                });
            } //end if markup or content?
            return x;
        }) // end term map
        .join("");
} /* end curly() */
var input = `[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>"Why no goodbye?" replied [b]Bob[/b]. "It's always Hello!"</p>`;
var wants = `[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>“Why no goodbye?” replied [b]Bob[/b]. “It's always Hello!”</p>`;
curly(input) == wants; // true
To my eyes, even though it is a bit longer, code allows documentation, indentation, and explicit naming, which make these sorts of semi-complicated logical operations easier to understand.
If your needs are more complex, use a true BBCode parser for JavaScript and map/filter/reduce its model as needed.

Regex example to match pseudo element's content property

I am trying to parse the pseudo selector content in javascript.
Html content can be
content: counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz);
To parse this content i am using below regex (logic of matching parenthesis copied from internet )
counter\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)
This selects every counter(...) call, including nested parentheses, but counter cannot have nested parentheses (as far as I know; correct me if I am wrong). I am using the same regex to select the other content as well.
Attr : attr\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)
Quotes: openQuote\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)
String: anything inside double/single quotes (my current regex, ".*", is not working)
I have the following questions:
1. A regex to match a single level of parentheses (no nested parentheses are possible in the pseudo selector content property)
2. A single regex that will match the counter, attribute, url and string content in the given order (order is important because I want to replace them later with evaluated values)
Please let me know if any more information is required from my side.
Thanks
Your first regex does indeed match nested parentheses (but not escaped parentheses). Is that desirable?
Without nesting or escaping, these become much simpler.
Here's a variant of your first regex that ignores nesting possibilities:
counter\([^)]*\)
It matches a literal counter( and then zero or more non-close-parentheses, then finally a close parenthesis. (Full explanations of your first regex and my simpler version at regex101.)
I believe that answers your first question, though if you're literally looking for a "regex to match [a] single parenthesis," that's just [()], which will match either an open or a close parenthesis character. You could alternatively explicitly match \( or \) if you know which one you want to match.
Matching quotes (without regard to nesting or escaped quotes) is similarly easy:
"[^"]*"
This matches a literal double quote character ("), then zero or more non-doublequote characters, then another literal double quote character.
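To illustrate why the question's ".*" attempt misbehaves while the negated character class works, here is a small sketch run against the sample value from the question (the variable names are made up for the example). The dot is greedy and happily runs across intermediate quotes, whereas [^"]* has to stop at the next one.

```javascript
const sampleValue = `counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz)`;

// Greedy dot: one match stretching from the first " to the last " in the value.
const greedyMatch = sampleValue.match(/".*"/)[0];

// Negated class: each double-quoted string is matched separately.
const quotedStrings = sampleValue.match(/"[^"]*"/g);

console.log(greedyMatch);   // " " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi"
console.log(quotedStrings); // [ '" "', '""', '"hi"' ]
```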
Your second request was for a "single regex that will match the counter, attribute , url and string content in the given order (order is important because i want to replace them later with evaluated values)."
I'm not sure how you intend to get the CSS content property's value, given that it typically lives on an ::after or ::before pseudo-element, which is not part of the DOM tree, but here's some dummy code populating it so we can manipulate it:
var css = `content: counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz); color:red;`;
// harvest last `content` property (this is tricked by `content: "content: blah"`)
var content = css.match(/.*\bcontent:\s*([^;"']*(?:"[^"]*"[^;"']*|'[^']*'[^;"']*)*)/);
if (content) {
    var part_re = /(?:"([^"]*)"|'([^']*)'|(?:counter|attr|url)\(([^)]*)\))/g;
    var part; // declared so the loop does not create an implicit global
    while ( part = part_re.exec(content[1]) ) { // parse on just the value
        if (part[0].match(/^"/)) { /* do stuff to part[1] */ }
        else if (part[0].match(/^'/)) { /* do stuff to part[2] */ }
        else if (part[0].match(/^counter/)) { /* do stuff to part[3] */ }
        else if (part[0].match(/^attr/)) { /* do stuff to part[3] */ }
        else if (part[0].match(/^url/)) { /* do stuff to part[3] */ }
        // silently skips other values, like `open-quote` or `counters(name, string)`
    }
}
The first regex (assigned to content) extracts the last content property from the CSS (last because it'll override previous instances, though note the fact that this'll stupidly extract content: blah from content: "content: blah"). After finding the last instance of a word break and then content:, it absorbs any whitespace and then matches the rest of the line until a semicolon, double quote, or single quote. A non-capture group allows for any content between double quotes or single quotes, much in the same way we matched quotes near the top of this answer. (Full explanation of this CSS content regex at regex101.)
The second regex (assigned to part_re) is in a while loop so we can work on each individual value in the content property in order. It matches double-quoted strings or single-quoted strings or certain named values (counter or attr or url). See the conditionals and comments for where the values' data are stored. Full explanation of this value parsing regex at regex101 (see "Match Information" in the middle of the right column to see how I'm storing the values' data).
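If it helps to see the ordered result as data, here is one possible sketch that collects the parts into an array so they can be replaced with evaluated values later. The function name parseContentValue and the { type, value } shape are assumptions for illustration, and the function name is captured in its own group here instead of being re-tested with match().

```javascript
// Hypothetical helper: splits a content value into ordered parts.
function parseContentValue(value) {
  const partRe = /"([^"]*)"|'([^']*)'|(counter|attr|url)\(([^)]*)\)/g;
  const parts = [];
  for (const m of value.matchAll(partRe)) {
    if (m[1] !== undefined) parts.push({ type: 'string', value: m[1] });      // "..."
    else if (m[2] !== undefined) parts.push({ type: 'string', value: m[2] }); // '...'
    else parts.push({ type: m[3], value: m[4] });                             // counter/attr/url
  }
  return parts;
}

const contentParts = parseContentValue(
  `counter(item)" " attr(data) "" counter(item1,decimal) url('test.jpeg') "hi" attr(xyz)`
);
console.log(contentParts.map(p => p.type).join(' '));
// counter string attr string counter url string attr
```

As in the loop above, values such as open-quote or counters(name, string) are silently skipped, and the url argument keeps its own quotes.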

Matching special characters and letters in regex

I am trying to validate a string, that should contain letters numbers and special characters &-._ only. For that I tried with a regular expression.
var pattern = /[a-zA-Z0-9&_\.-]/
var qry = 'abc&*';
if(qry.match(pattern)) {
alert('valid');
}
else{
alert('invalid');
}
While using the above code, the string abc&* is valid. But my requirement is to show this invalid. ie Whenever a character other than a letter, a number or special characters &-._ comes, the string should evaluate as invalid. How can I do that with a regex?
Add them to the allowed characters, but you'll need to escape some of them, such as -]/\
var pattern = /^[a-zA-Z0-9!@#$%^&*()_+\-=\[\]{};':"\\|,.<>\/?]*$/
That way you can remove any individual character you want to disallow.
Also, you want to include the start and end of string placemarkers ^ and $
Update:
As elclanrs understood (and the rest of us didn't, initially), the only special characters needing to be allowed in the pattern are &-._
/^[\w&.\-]+$/
[\w] is the same as [a-zA-Z0-9_]
Though the dash doesn't need escaping when it's at the start or end of the list, I prefer to do it in case other characters are added. Additionally, the + means you need at least one of the listed characters. If zero is ok (ie an empty value), then replace it with a * instead:
/^[\w&.\-]*$/
Well, why not just add them to your existing character class?
var pattern = /[a-zA-Z0-9&._-]/
If you need to check whether a string consists of nothing but those characters you have to anchor the expression as well:
var pattern = /^[a-zA-Z0-9&._-]+$/
The added ^ and $ match the beginning and end of the string respectively.
Testing for letters, numbers or underscore can be done with \w which shortens your expression:
var pattern = /^[\w&.-]+$/
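A quick sketch with the string from the question shows the difference the anchors make (the variable names are only for illustration): the unanchored class is satisfied by any single allowed character, while the anchored versions require every character to be allowed.

```javascript
const unanchored = /[a-zA-Z0-9&._-]/;
const anchored = /^[a-zA-Z0-9&._-]+$/;
const shorthand = /^[\w&.-]+$/;

console.log(unanchored.test('abc&*')); // true  ("a" alone is enough)
console.log(anchored.test('abc&*'));   // false ("*" is not in the class)
console.log(anchored.test('abc&'));    // true
console.log(shorthand.test('abc&*'));  // false (same behaviour with \w)
```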
As mentioned in the comment from Nathan, if you're not using the results from .match() (it returns an array with what has been matched), it's better to use RegExp.test() which returns a simple boolean:
if (pattern.test(qry)) {
// qry is non-empty and only contains letters, numbers or special characters.
}
Update 2
In case I have misread the question, the below will check if all three separate conditions are met.
if (/[a-zA-Z]/.test(qry) && /[0-9]/.test(qry) && /[&._-]/.test(qry)) {
// qry contains at least one letter, one number and one special character
}
Try this regex:
/^[\w&.-]+$/
Also you can use test.
if ( pattern.test( qry ) ) {
// valid
}
let pattern = /^(?=.*[0-9])(?=.*[!@#$%^&*])(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9!@#$%^&*]{6,16}$/;
// test() returns true if the password contains an uppercase letter, a lowercase letter, a number and a special character (and is 6 to 16 characters long), otherwise false
let reee = pattern.test("helLo123#"); // true, as it contains all of the above
I tried a bunch of these but none of them worked for all of my tests. So I found this:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])(?!.*\s).{8,15}$
from this source: https://www.w3resource.com/javascript/form/password-validation.php
Try this RegEx: it matches letters plus the special characters we commonly use in paragraphs
JavaScript: /^[a-zA-Z]+(([\'\,\.\-_ \/)(:][a-zA-Z_ ])?[a-zA-Z_ .]*)*$/.test(str)
.test(str) returns a boolean: true if the string matched, false if it did not
C#: ^[a-zA-Z]+(([\'\,\.\-_ \/)(:][a-zA-Z_ ])?[a-zA-Z_ .]*)*$
Here you can match with special char:
function containsSpecialChars(str) {
const specialChars = /[`!@#$%^&*()_+\-=\[\]{};':"\\|,.<>\/?~]/;
return specialChars.test(str);
}
console.log(containsSpecialChars('hello!')); // 👉️ true
console.log(containsSpecialChars('abc')); // 👉️ false
console.log(containsSpecialChars('one two')); // 👉️ false

How to make this simple regexp?

I need a string that starts and ends with an alphanumeric character, is between 5 and 20 characters long, and has either one space or none between characters. I tried /^[a-z\s?A-Z0-9]{5,20}$/ but this is not working.
EDIT
test test -should pass
testtest -should pass
test test test -should not pass
You can't do this with traditional regex without writing a ridiculously long expression, so you need to use a look-ahead:
/^(?=(\w| ){5,20}$)\w+ ?\w+$/
This says, make sure there are between 5 and 20 characters in the match, then match /\w+ ?\w+/
Note I used \w for simplification. It is the same as your character class above except it also accepts underscores. If you don't want to match them you have to do:
/^(?=[a-zA-Z0-9 ]{5,20}$)[a-zA-Z0-9]+ ?[a-zA-Z0-9]+$/
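As a quick check of the lookahead approach against the three inputs from the question, here is a small sketch. It assumes the 5 to 20 character range stated in the question, and the names oneSpacePattern and sampleInputs are made up for the example.

```javascript
// Lookahead enforces the overall length; the body allows at most one space,
// and only between two alphanumeric runs.
const oneSpacePattern = /^(?=[a-zA-Z0-9 ]{5,20}$)[a-zA-Z0-9]+ ?[a-zA-Z0-9]+$/;

const sampleInputs = ['test test', 'testtest', 'test test test'];
for (const text of sampleInputs) {
  console.log(text + ' -> ' + oneSpacePattern.test(text));
}
// test test -> true
// testtest -> true
// test test test -> false
```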
You can't put a ? inside of [...]. [...] is used to specify a set of characters precisely, you can't maybe (?) have a character inside a set of characters. The occurrence of any specific characters is already optional, the ? is meaningless.
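A short sketch of what that means in practice for the pattern from the question (the variable name is only illustrative): inside the brackets the ? is just a literal question mark, so it neither limits the number of spaces nor has any quantifier effect.

```javascript
const originalAttempt = /^[a-z\s?A-Z0-9]{5,20}$/;

console.log(originalAttempt.test('test?test'));      // true: a literal "?" is accepted
console.log(originalAttempt.test('test test test')); // true: any number of spaces is accepted
```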
If you allow any number of spaces inside your match, just remove the question mark. If you want to allow a single space but no more, then regular expressions alone can't do that for you, you'd need something like
if (/^[a-z\sA-Z0-9]{5,20}$/.test(myString) && (myString.match(/\s/g) || []).length <= 1)
You couldn't do this with a single traditional regex without it being dozens of lines long; regexes are meant for matching simpler patterns than this.
If you only want to use regexes, you could use two instead of one. The first matches the general pattern; the second ensures that at most one whitespace character is found.
if (myString.match(/^[a-z\sA-Z0-9]{5,20}$/) && myString.match(/^[^\s]*\s?[^\s]*$/)) {
Example Usage
var inputs = ["test test", "testtest", "test test test"];
for (var index = 0; index < inputs.length; index++) {
    var myString = inputs[index];
    // both checks must succeed: allowed characters and length, then at most one whitespace
    if (myString.match(/^[a-z\sA-Z0-9]{5,20}$/) && myString.match(/^[^\s]*\s?[^\s]*$/)) {
        console.log(myString + " matches.");
    } else {
        console.log(myString + " does not match.");
    }
}
This produces the output specified in your question.
Meh, so here's the ridiculously long traditional regex for the same:
(?i)[a-z0-9]+( [a-z0-9]+)?{5,12}
JS version (without the nested quantifier):
/^([a-z0-9]( [a-z0-9])?){5,12}$/i
