Use Whitelist RegEx in Javascript to validate a string

I'm trying to prevent an action based on whether a string passes the whiteList Regex in Javascript:
const whiteList = /[#A-Za-z0-9.,-]/g // Regex from external source. It will be difficult to modify.
const str= 'ds%d';
console.log(str.replace(whiteList, '').length === 0) // Expected false - WORKS
// How can I make this statement return false ?
console.log(whiteList.test(str)) //Expected false Actual true
I tried using the replace check above to validate the string against the whitelist. It works, but I believe there could be a better way of solving this problem.

You get true from test because there is a character in the string that matches the expression. The expression has no anchors, so it's not that it requires all characters in the string to match, just one. To require all characters to match, you'd need a "start of input" assertion (^) at the beginning, an "end of input" assertion ($) at the end, and either a "zero or more" (*) or "one or more" (+) quantifier on the character class (depending on whether an empty string should pass).
If you're getting the expression from elsewhere, you can add those to it after the fact:
const whiteList = /[#A-Za-z0-9.,-]/g // Regex from external source. It will be difficult to modify.
const str= 'ds%d';
console.log(str.replace(whiteList, '').length === 0);
const improvedList = new RegExp("^" + whiteList.source + "+$");
console.log(improvedList.test(str)); // Now shows false
That does make the assumption that the original regex has the problem described. You might check first, but it would be easy to construct regular expressions that seemed like they needed modifying but didn't.
Alternatively, just use the replace check you have, since it works as well. It's not that much more expensive.
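The anchoring approach can be wrapped in a small reusable function. This is only a sketch: the name matchesWhitelist is hypothetical, and it assumes the external regex is a single character class as in the question. It drops the g and y flags, because test() on a global or sticky regex keeps state in lastIndex between calls.

```javascript
// Hypothetical helper (not from the original answer): builds an anchored copy
// of the whitelist so that every character of the input must match.
function matchesWhitelist(whiteList, str) {
  // Keep flags such as "i" but drop "g" and "y", which make test() stateful.
  const flags = whiteList.flags.replace(/[gy]/g, "");
  // The non-capturing group keeps the quantifier attached to the whole source.
  const anchored = new RegExp("^(?:" + whiteList.source + ")+$", flags);
  return anchored.test(str);
}

const externalWhiteList = /[#A-Za-z0-9.,-]/g;
console.log(matchesWhitelist(externalWhiteList, "ds%d"));    // false
console.log(matchesWhitelist(externalWhiteList, "ds.d,#1")); // true
console.log(matchesWhitelist(externalWhiteList, ""));        // false, "+" needs at least one character
```

Swap the + for * in the helper if an empty string should pass.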

Related

Using Regex within a custom JS variable to extract part of a string when subsequent characters can be different

I have the following function that I am using to extract part of the URL of a page that is of interest to me:
function() {
var page = {{Page Path}};
return page.match(/[REGEX]/);
}
I need to return the 'match' part of the string from the following URL's:
/abc/def/match123
/abc/def/matchxyz
/abc/def/match000
I am struggling to do this when the preceding and subsequent characters can differ. Only three suffixes can follow the string I want to match (xyz|123|000), but the preceding part can be anything, although it always ends with the final / in the URL.
This regex captures the "match" part in the first group ($1):
([^\/]+)(123|xyz|000)$
To break it down: this matches one or more characters that are not /, immediately followed by one of the accepted suffixes. The $ requires the match to be at the end of the string; it could probably be removed, depending on your needs.
Example usage:
'/abc/def/iAmTheMatchxyz'.match(/([^\/]+)(123|xyz|000)$/)[1] === 'iAmTheMatch';
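For use inside a function like the one in the question, a null check is worth adding, since the regex returns no match when the path does not end in one of the three suffixes. A minimal sketch, with extractSegment as a hypothetical name and the path passed in as a plain argument rather than a tag manager variable:

```javascript
// Hypothetical helper: returns the part of the last path segment that
// precedes one of the known suffixes, or null when no suffix is present.
function extractSegment(path) {
  const m = /([^\/]+)(123|xyz|000)$/.exec(path);
  return m ? m[1] : null;
}

console.log(extractSegment("/abc/def/match123")); // "match"
console.log(extractSegment("/abc/def/matchxyz")); // "match"
console.log(extractSegment("/abc/def/match000")); // "match"
console.log(extractSegment("/abc/def/other"));    // null
```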

Regex for checking presence and absence of keywords in User Agent String

I'm using a Regex in Javascript to sniff the User Agent string. Here is some pseudo code below:
is_not_tablet_bool = /android.+mobile/i.test(navigator.userAgent.toLowerCase());
switch(true)
{
case (is_not_tablet_bool):
return true;
break;
}
I'm trying to craft a regex that will do something close to the opposite of the above i.e.
ensure that 'android' keyword is present in string, and at the same time
ensure that 'mobile' keyword is absent from string
Thanks for the anticipated assistance.
Negative lookahead
Regular expressions have a construct called negative lookahead, which asserts that the text following the current position does not match a given pattern, without consuming any of that text.
Your regular expression should be written this way:
/android(?!.*mobile)/i
This will match any string that contains the word android not followed anywhere later by the word mobile, ignoring case. This also means that you can remove the toLowerCase call.
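A quick way to see the lookahead's behaviour is to run it against a few strings. The user agent strings below are shortened, made-up examples used purely for illustration:

```javascript
const androidNotFollowedByMobile = /android(?!.*mobile)/i;

// Made-up, shortened user agent strings (assumptions, not real UA values).
const phoneUa = "Mozilla/5.0 (Linux; Android 10) Mobile Safari";
const tabletUa = "Mozilla/5.0 (Linux; Android 10) Safari";
const mobileFirst = "Mobile build for Android";

console.log(androidNotFollowedByMobile.test(phoneUa));     // false
console.log(androidNotFollowedByMobile.test(tabletUa));    // true
console.log(androidNotFollowedByMobile.test(mobileFirst)); // true, "mobile" comes before "android"
```

The last case shows the limitation: the lookahead only inspects text after android.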
Addition: Negative lookbehind
If you need to match only those strings that contain the word android but lack mobile entirely (either before or after), a combination of negative lookahead and negative lookbehind will do.
/(?<!mobile.*)android(?!.*mobile)/i
But the problem is that lookbehind assertions were only added to JavaScript in ES2018, so older engines don't support them. If you need to support those, you have to employ a different trick to detect that situation.
There are several possibilities, of which the following seem the most interesting (and the last one the most useful):
Replace a matching negative string that will fail afterwards:
var nav = navigator.userAgent.replace(/mobile.*android/i, "fail" );
return /android(?!.*mobile)/i.test(nav);
Use two lookaheads. One on normal string and the other on reversed one while also having a reversed regular expression:
var nav = navigator.userAgent;
var after = /android(?!.*mobile)/i.test(nav);
var before = /diordna(?!.*elibom)/i.test(nav.split("").reverse().join(""));
return before && after;
Simplicity is key. Two simple regular expressions would do the trick just fine as well:
var nav = navigator.userAgent;
return /android/i.test(nav) && !/mobile/i.test(nav);
Note: I'm not sure whether your code is actual code, because if it is, I would strongly recommend you reconsider the use of the switch(true) statement and simply replace it with return is_not_tablet_bool;
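If a single expression is preferred over two tests, one further option (not proposed in the answer above, so treat it as a swapped-in technique) is a negative lookahead anchored at the start of the string, which rejects mobile anywhere in the input. A minimal sketch with made-up user agent strings:

```javascript
// Anchored negative lookahead: from the start of the string, assert that
// "mobile" appears nowhere, then require "android" somewhere.
const androidWithoutMobile = /^(?!.*mobile).*android/i;

// Made-up, shortened user agent strings (assumptions for illustration).
console.log(androidWithoutMobile.test("Linux; Android 10 Safari"));        // true
console.log(androidWithoutMobile.test("Linux; Android 10 Mobile Safari")); // false
console.log(androidWithoutMobile.test("Mobile build for Android"));        // false
console.log(androidWithoutMobile.test("iPad; CPU OS 14 Safari"));          // false
```

This relies on the input being a single line, since . does not match line terminators by default.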
I don't think using regular expressions makes this task any easier. I would simply check for the inclusion and exclusion of those strings using the String.indexOf(...) method:
function isNotAndroidMobile(userAgentString) {
var ua = userAgentString.toLowerCase();
return (ua.indexOf('android')>=0) && (ua.indexOf('mobile')===-1);
}

JavaScript RegEx Match Failing

I am having issues matching a string using regex in javascript. I am trying to get everything up to the word "at". I am using the following and while it doesn't return any errors, it also doesn't do anything either.
var str = "Team A at Team B";
var matches = str.match(/(.*?)(?=at|$)/);
I tried multiple regex patterns before coming across this SO post, Regex to capture everything before first optional string, but it doesn't seem to return what I want.
Remove the ? at your first capturing group, and |$ from your second, and add ^ to mark beginning of string:
str.match(/^(.*)(?=at)/)
Alternatively (I personally find below easier to read, but your call):
str.substr(0, str.search(/\bat\b/))
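The search-based variant can be sketched as a small function that also handles input without the word "at" (search returns -1 in that case). The name beforeAt and the choice to return the whole string when nothing is found are assumptions made for this example:

```javascript
// Hypothetical helper: returns everything before the standalone word "at",
// or the whole string when "at" does not occur as a separate word.
function beforeAt(str) {
  const index = str.search(/\bat\b/);
  return index === -1 ? str : str.slice(0, index).trim();
}

console.log(beforeAt("Team A at Team B")); // "Team A"
console.log(beforeAt("Batman at Robin"));  // "Batman"
console.log(beforeAt("Team A"));           // "Team A"
```

The \b word boundaries keep the "at" inside words such as "Batman" from being treated as the separator.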

JavaScript Regex to match a URL in a field of text

How can I set up my regex to test whether a URL is contained in a block of text in javascript? I can't quite figure out the pattern to use to accomplish this.
var urlpattern = new RegExp( "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?" );
var txtfield = $('#msg').val() /*this is a textarea*/
if ( urlpattern.test(txtfield) ){
//do something about it
}
EDIT:
So the Pattern I have now works in regex testers for what I need it to do but chrome throws an error
"Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,#?^=%&:/~+#]*[w-#?^=%&/~+#])?/: Range out of order in character class"
for the following code:
var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?' );
Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, one other method for taking away their special meaning is putting them at the beginning or the end of the class definition.
In addition, \+ and \# in a character class are indeed interpreted as + and # respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
I would recommend the following regex for your purposes:
(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
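This can be specified in JavaScript either by passing it into the RegExp constructor as you did in your example (note that each backslash must be doubled inside the string literal, otherwise \w and \. reach the regex engine as plain w and .):
var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,#?^=%&:/~+#-]*[\\w#?^=%&/~+#-])?")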
or by directly specifying a regex literal, using the // quoting method:
var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/
The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and is at certain times more readable. Both work.
I tested your original and this modification using Chrome both on JSFiddle and on RegexLib.com, using the Client-Side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!
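The same check can be reproduced with a short script. This is a sketch only: containsUrl is a hypothetical name, and the sample strings are made up for the test.

```javascript
// The regex literal form recommended above.
const urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/;

// Hypothetical helper: true when the text contains something URL-shaped.
function containsUrl(text) {
  return urlPattern.test(text);
}

console.log(containsUrl("see http://example.com/page?x=1 for details")); // true
console.log(containsUrl("see ttp://example.com for details"));           // false
console.log(containsUrl("no links here"));                               // false
```

The pattern has no g flag, so repeated test() calls do not carry state between inputs.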
Edit
As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:
(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
#------changed----here-------------^
<End Edit>
Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!
Complete Multi URL Pattern.
UPDATED: Nov. 2020, April & June 2021 (Thanks commenters)
Matches all URI or URL in a string!
Also extracts the protocol, domain, path, query and hash:
([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)
https://regex101.com/r/jO8bC4/56
Example JS code with output - every URL is turned into a 5-part array of its 'parts' (protocol, host, path, query, and hash)
var re = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/mig;
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
console.log(m);
}
Will give you the following:
["https://www.facebook.com",
"https://",
"www.facebook.com",
"",
"",
""
]
["https://github.com/justsml?tab=activity#top",
"https://",
"github.com",
"/justsml",
"tab=activity",
"top"
]
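On engines that support String.prototype.matchAll (ES2020), the same extraction can be written without managing lastIndex by hand. A sketch, assuming named objects are more convenient than positional arrays; extractUrlParts and sampleText are hypothetical names:

```javascript
const urlRe = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/gi;

// Hypothetical helper: returns one object per URL found in the text.
function extractUrlParts(text) {
  // matchAll requires the g flag and works on a copy of the regex.
  return Array.from(text.matchAll(urlRe), ([url, protocol, host, path, query, hash]) => ({
    url, protocol, host, path, query, hash,
  }));
}

const sampleText = "Bob: Hey there, have you checked https://www.facebook.com ?\n" +
  "(ignore) https://github.com/justsml?tab=activity#top (ignore this too)";

const found = extractUrlParts(sampleText);
console.log(found.length);   // 2
console.log(found[1].host);  // "github.com"
console.log(found[1].query); // "tab=activity"
console.log(found[1].hash);  // "top"
```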
You have to escape the backslash when you are using new RegExp.
Also you can put the dash - at the end of character class to avoid escaping it.
&amp; inside a character class means & or a or m or p or ;. You only need to put & and ; there, since a, m and p are already matched by \w.
So, your regex becomes:
var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w-.,#?^=%&:/~+#-]*[\\w#?^=%&;/~+#-])?' );
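The effect of the missing backslashes is easy to observe by comparing the source property of each regex. A small sketch with a shortened pattern; String.raw (ES2015) is shown as one more way to avoid doubling, which is an addition to the answer above:

```javascript
const fromLiteral = /\w+\.\w+/;
const fromEscapedString = new RegExp("\\w+\\.\\w+");
// In a plain string literal "\w" is just "w" and "\." is just ".".
const fromUnescapedString = new RegExp("\w+\.\w+");
// String.raw keeps backslashes as written, so no doubling is needed.
const fromRawString = new RegExp(String.raw`\w+\.\w+`);

console.log(fromLiteral.source);         // \w+\.\w+
console.log(fromEscapedString.source);   // \w+\.\w+
console.log(fromUnescapedString.source); // w+.w+
console.log(fromRawString.source);       // \w+\.\w+

console.log(fromEscapedString.test("example.com"));   // true
console.log(fromUnescapedString.test("example.com")); // false, it looks for literal "w" characters
```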
try (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
I've cleaned up your regex:
var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\\-_]+(\\.[a-z0-9\\-_]+)+([a-z0-9\\-\\.,#\\?^=%&;:/~\\+#]*[a-z0-9\\-#\\?^=%&;/~\\+#])?', 'i');
Tested and works just fine ;)
Try this general regex for many URL format
/(([A-Za-z]{3,9}):\/\/)?([-;:&=\+\$,\w]+#{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%/\.\w]+)?\/?([&?][-\+=&;%#\.\w]+)?(#[\w]+)?)?/g
The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.
Try this; it worked for me:
/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/
It is simple and easy to understand.

Split string in JavaScript using a regular expression

I'm trying to write a regex for use in javascript.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script);
I would like match to contain two elements:
match[0] == "'areaog_og_group_og_consumedservice'";
match[1] == "'\x26roleOrd\x3d1'";
This regex matches correctly when testing it at gskinner.com/RegExr/ but it does not work in my Javascript. This issue can be replicated by testing it here: http://www.regextester.com/.
I need the solution to work with Internet Explorer 6 and above.
Can any regex gurus help?
Judging by your regex, it looks like you're trying to match a single-quoted string that may contain escaped quotes. The correct form of that regex is:
'[^'\\]*(?:\\.[^'\\]*)*'
(If you don't need to allow for escaped quotes, /'[^']*'/ is all you need.) You also have to set the g flag if you want to get both strings. Here's the regex in its regex-literal form:
/'[^'\\]*(?:\\.[^'\\]*)*'/g
If you use the RegExp constructor instead of a regex literal, you have to double-escape the backslashes: once for the string literal and once for the regex. You also have to pass the flags (g, i, m) as a separate parameter:
var rgx = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");
while (result = rgx.exec(script))
print(result[0]);
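When only the matched strings are needed, String.prototype.match with the g flag returns them all in one call. A sketch using a made-up input that contains an escaped quote; var is used because the question mentions old Internet Explorer versions:

```javascript
// Matches a single-quoted string that may contain backslash-escaped characters.
var quoted = /'[^'\\]*(?:\\.[^'\\]*)*'/g;

// Made-up sample; its runtime value is: loadArea('area_one', 'it\'s here');
var sample = "loadArea('area_one', 'it\\'s here');";

var strings = sample.match(quoted);
console.log(strings.length); // 2
console.log(strings[0]);     // 'area_one'
console.log(strings[1]);     // 'it\'s here'
```

Note that match returns null rather than an empty array when nothing is found, so real code would need to check for that.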
The regex you're looking for is .*?('[^']*')\s*,\s*('[^']*'). The catch here is that, as usual, match[0] is the entire matched text (this is very normal) so it's not particularly useful to you. match[1] and match[2] are the two matches you're looking for.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var parameters = /.*?('[^']*')\s*,\s*('[^']*')/.exec(script);
alert("you've done: loadArea("+parameters[1]+", "+parameters[2]+");");
The only issue I have with this is that it's somewhat inflexible. You might want to spend a little time to match function calls with 2 or 3 parameters?
EDIT
In response to your request, here is the regex to match 1,2,3,...,n parameters. If you notice, I used a non-capturing group (the (?: ) part) to find many instances of the comma followed by the second parameter.
/.*?('[^']*')(?:\s*,\s*('[^']*'))*/
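One caveat worth checking before relying on this pattern: in JavaScript a capture group inside a repeated group keeps only the value from its last iteration, so the intermediate parameters are matched but not returned. A sketch with a made-up three-argument call, followed by a global match as one way to collect every argument:

```javascript
// Made-up call with three single-quoted arguments.
var call = "loadArea('a', 'b', 'c');";

var m = /.*?('[^']*')(?:\s*,\s*('[^']*'))*/.exec(call);
console.log(m[1]); // 'a'
console.log(m[2]); // 'c'  (the 'b' iteration was overwritten)

// Collecting every quoted argument with a global match instead.
var allArgs = call.match(/'[^']*'/g);
console.log(allArgs.length); // 3
console.log(allArgs[1]);     // 'b'
```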
Maybe this:
'([^']*)'\s*,\s*'([^']*)'
