Regex for url validation (unterminated parentheticals error) - javascript

I have the following expression to validate a URL, but it gives me a syntax error in the browser. I am no expert in regular expressions, so I am not sure what I am looking for. I would also like it to test for both http:// and https:// URLs.
"url":{
"regex":"/^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$/",
"alertText":"URL must start with http://"}
Edit:
To clarify, I am looking for help with both the regex and the syntax issues, please. I have tried about 20 different variations based on all the answers but still no luck. Just to clarify, I do not need to validate the entire URL. I just need to validate that it starts with http:// or https://, but it must not fail validation if left empty. I can get the http part working with this
/^https?:///
no need to escape the / even. But it fails if the input field is empty, when I try:
/^(https?://)?/
I get an error saying "unterminated parenthetical /^(https?://)/".
Just to confuse matters more, here is one that I added yesterday to validate a date or no entry, and it looks like the same sort of format to me.
/^([0-9]{1,2}\-\[0-9]{1,2}\-\[0-9]{4})?$/

For what it's worth, the syntax error is the unescaped forward slash here: /\S*
Edit: oh wow, I'm tired. All of the forward slashes are unescaped. You can escape them with a backslash: \/
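To illustrate the escaping point, here is a minimal sketch (the variable names are made up for the example). In a regex literal the forward slash is the delimiter, so each one inside the pattern has to be written as \/. If the pattern is instead handed to the RegExp constructor as a string, the slashes can stay as they are.

```javascript
// Inside a regex literal "/" ends the pattern, so every "/" in the pattern is escaped.
var literalRe = /^https?:\/\//;

// With the RegExp constructor the pattern is a plain string, so "/" needs no escaping.
// (Backslashes, on the other hand, would have to be doubled inside the string.)
var constructedRe = new RegExp("^https?://");

console.log(literalRe.test("http://example.com"));      // true
console.log(constructedRe.test("https://example.com")); // true
console.log(literalRe.test("ftp://example.com"));       // false
```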

Here's the spec on URIs, of which URLs are a subset, or here's the spec on URLs if you're sure that's all you care about. A full implementation of either would be nearly impossible with only a single regular expression.
If you truly want to validate a URL, one that you know will be HTTP or HTTPS, send it an HTTP HEAD request and check the response code.
Alternatively, if you're going to play loose with the spec, decide how loose you're willing to be with the input, and if it's better to exclude valid URLs or permit false ones.

If you want to test for a URL or empty input, you might want to do two passes:
1. Test for an empty string.
2. Test for a valid URL.
I would do something like the following (assuming urlString is my input).
// get rid of whitespace, in case user hit spacebar/tab
// also removes leading/trailing spaces.
urlString = urlString.replace(/[\s]*/g,'');
// test if zero length string, if not, test the url.
if( urlString.length > 0 ){ // test the URL
    var re = new RegExp( your_expression_goes_here );
    var result = re.exec(urlString);
    if( result != null ) {
        // we have a hit!!! this is a URL.
    } else {
        // this is a bad string.
    }
} else {
    // user entered no text, let's move on.
}
So, the preceding should work and allow you to test for either empty string or a url. As to the regular expression you're using "/(http|https):///", I believe it's a bit flawed. Yes, it will catch "http://" or "https://", but it will also key in on a string like "htthttp://" which is clearly not what you want.
Your other sample "/^(http|https):///" is better in that it will match from the beginning of the string and will tell you if the string begins like a URL.
Now, I think jrob above was on the right track with his second string in regards to testing the full URL. I think I found the same sample he used at this page. I've modified the expression as per below and tested it using an online regex tester, can't post the link as I'm a new user :D.
It seems to catch all manner of valid URLs and produces an error if the input string is in any way an invalid URL, at least for the invalid URLs I can think of. It also catches http/https protocols only, which I think is your base requirement.
^(?:http(?:s?)\:\/\/|~/|/)?(?:\w+:\w+#)?(?:(?:[-\w]+\.)+([a-zA-Z]{2,9}))(?::[\d]{1,5})?(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$
Hope this helps.
Updated code (twice).
I still strongly suggest you test for empty string first as per my earlier example, and you only test for the valid values if the string is non zero. I have tried to combine the two tests into one, but have been unable to do so so far (maybe someone else can still figure it out).
The following tests work for me, here's a URL sample as you required:
//var re = /^(?:http(?:s?)\:\/\/)/;
// the following expression will test for http(s):// and empty string
var re = /^(?:http(?:s?)\:\/\/)*$/;
// use the precompiled expression above, or the following
// two lines:
//var reTxt = "^(?:http(?:s?)\:\/\/)";
//var re = new RegExp(reTxt);
alert(
"result:" + re.test("http://") +
"\nresult:" + re.test("https://") +
"\nresult:" + re.test("") +
"\nresult:" + re.test("https:") +
"\nresult:" + re.test("xhttp://") +
"\nresult:" + re.test("ftp://") +
"\nresult:" + re.test("http:/") +
"\nresult:" + re.test("http://somepage.com") +
"\nresult:" + re.test("httphttp://") +
"\nresult:" + re.test(" http://") +
"\nresult:" + re.test("Random text")
);
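For what it's worth, one possible way to fold the empty-string case into the same pattern is an alternation right after the start anchor: either the string ends immediately, or it begins with http:// or https://. This is only a sketch of that idea; the name startsWithHttpOrEmpty and the sample list are assumptions for the example, and it checks the prefix only, not the rest of the URL.

```javascript
// Hypothetical combined pattern: "^" followed by either "$" (empty input)
// or the literal prefix http:// or https://.
var startsWithHttpOrEmpty = /^(?:$|https?:\/\/)/;

var samples = ["", "http://", "https://somepage.com", "https:", "xhttp://",
               "ftp://", "http:/", " http://", "Random text"];

samples.forEach(function (s) {
    console.log(JSON.stringify(s) + " -> " + startsWithHttpOrEmpty.test(s));
});
```

Only the first three samples (the empty string, "http://" and "https://somepage.com") pass; everything else is rejected.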
And here's a test for dates:
var re2 = /^[0-9]{1,2}\-[0-9]{1,2}\-[0-9]{4}$/;
// use the precompiled expression above, or the following
// two lines:
//var reDateTxt = "^[0-9]{1,2}\\-[0-9]{1,2}\\-[0-9]{4}$";
//var re2 = new RegExp(reDateTxt);
alert(
"result:" + re2.test("02-02-2009") +
"\nresult:" + re2.test("022-02-2009") +
"\nresult:" + re2.test("02-032-2009") +
"\nresult:" + re2.test("02-02-23009") +
"\nresult:" + re2.test(" 02-02-2009") +
"\nresult:" + re2.test("02-0a2-2009") +
"\nresult:" + re2.test("02-02-2009") +
"\nresult:" + re2.test("Random text")
);
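If the date field should also accept no entry, one option is to wrap the whole date in an optional non-capturing group. The sketch below assumes the same dd-mm-yyyy shape as above; optionalDate is a name chosen for the example, and the pattern checks only the shape of the input, not whether the date exists on the calendar.

```javascript
// The entire date is optional, so the empty string matches as well.
// A hyphen outside a character class needs no backslash.
var optionalDate = /^(?:[0-9]{1,2}-[0-9]{1,2}-[0-9]{4})?$/;

console.log(optionalDate.test(""));            // true
console.log(optionalDate.test("02-02-2009"));  // true
console.log(optionalDate.test("022-02-2009")); // false
console.log(optionalDate.test(" 02-02-2009")); // false
```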


+ Sign removal javascript

Hi, I don't know why the + sign is removed or how to prevent it from being removed.
Sample code is presented:
var customer_number = $('cust_num');
var l_sParams = 'number=' + customer_number.value;
alert(l_sParams);
var l_sURL = '/caller/send_sms';
new Ajax.Request(l_sURL, {
    parameters: l_sParams,
    method: 'POST',
    onComplete: function (a_oRequest) {
    }.bind(this)
});
The alert displays, for example: +1907727500
But if I print it in Python, it is printed without the + sign, like this:
_to_customer = self.request.post['number']
result: 1907727500 (without + )
Thank you
+ in a query parameter is the escape code for a space. You receive ' 1907727500', with the space.
Use %2B instead, or better still, have JavaScript quote your values properly:
var l_sParams = 'number=' + encodeURIComponent(customer_number.value);
Strings containing a plus sign (or other special characters) should be URL-encoded, since + represents a space in form-encoded data. Use encodeURIComponent() to do that; encodeURI() leaves + untouched.
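A quick check of the two built-in encoders on the sample number shows the difference. The decodeFormValue helper below is hypothetical; it only imitates what a server typically does when decoding form data (turn + into a space, then percent-decode), so the round trip can be seen without a server.

```javascript
var phoneNumber = "+1907727500";

// encodeURI keeps reserved characters such as "+" as they are.
console.log(encodeURI(phoneNumber));          // +1907727500
// encodeURIComponent percent-encodes "+" as %2B.
console.log(encodeURIComponent(phoneNumber)); // %2B1907727500

// Hypothetical stand-in for server-side form decoding.
function decodeFormValue(value) {
    return decodeURIComponent(value.replace(/\+/g, " "));
}

console.log(JSON.stringify(decodeFormValue(encodeURI(phoneNumber))));          // " 1907727500"
console.log(JSON.stringify(decodeFormValue(encodeURIComponent(phoneNumber)))); // "+1907727500"
```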

selector concatenation peculiarity

I came across behavior that I cannot explain. I want to get an element (id='address.zipCode') with a simple selector:
$('#' + prefix + 'zipCode')
and it doesn't work. In this case, prefix == 'address\\.'. Chrome console debugging results in:
> prefix
"address\\."
> $('#' + prefix + 'zipCode')
[]
The most interesting part is that:
$('#' + "address\\." + 'zipCode')
[<input id=​"address.zipCode" name=​"address.zipCode" class=​"zipCodeMask" type=​"text" value>​]
Any ideas what's wrong with that?
Working backwards from the behavior of the Chrome REPL (which displays the final value of the string, i.e. sans escaping characters), you actually have two backslashes in your final string. In other words, you have probably assigned prefix like so:
var prefix = "address\\\\.";
What you actually need is only one backslash, which means you should type in two backslashes in the string literal (one for escaping):
var prefix = "address\\.";
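The difference between the two literals is easy to see by counting characters, without involving jQuery at all. In this small sketch (variable names are illustrative only) each pair of backslashes in the source code produces a single backslash in the resulting string.

```javascript
// Two backslashes in the literal -> one backslash in the string: address\.
var oneBackslash = "address\\.";
// Four backslashes in the literal -> two backslashes in the string: address\\.
var twoBackslashes = "address\\\\.";

console.log(oneBackslash.length);   // 9
console.log(twoBackslashes.length); // 10

// The selector text that would actually reach the selector engine:
console.log("#" + oneBackslash + "zipCode");   // #address\.zipCode
console.log("#" + twoBackslashes + "zipCode"); // #address\\.zipCode
```

In the second selector the escaped character is the backslash itself rather than the dot, so the dot would most likely be read as the start of a class selector, which would explain the empty result.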

Most efficient way to grab XML tag from file with JavaScript and Regex

I'm doing some more advanced automation on iOS devices and simulators for an enterprise application. The automation is written in browserless Javascript. One of the methods works on the device but not on the simulator, so I need to code a workaround. For the curious, it's UIATarget.localTarget().frontMostApp().preferencesValueForKey(key).
What we need to do is read a path to a server (which varies) from a plist file on disk. As a workaround on the simulator, I've used the following lines to locate the plist file containing the preferences:
// Get the alias of the user who's logged in
var result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/whoami", [], 5).stdout;
// Remove the extra newline at the end of the alias we got
result = result.replace('\n',"");
// Find the location of the plist containing the server info
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/find", ["/Users/"+result+"/Library/Application Support/iPhone Simulator", "-name", "redacted.plist"], 100);
// For some reason we need a delay here
UIATarget.localTarget().delay(.5);
// Results are returned in a single string separated by newline characters, so we can split it into an array
// This array contains all of the folders which have the plist file under the Simulator directory
var plistLocations = result.stdout.split("\n");
...
// For this example, let's just assume we want slot 0 here to save time
var plistBinaryLocation = plistLocations[0];
var plistXMLLocation = plistBinaryLocation + ".xml";
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/plutil", ["-convert","xml1", plistBinaryLocation,"-o", plistXMLLocation], 100);
From here, I think the best way to get the contents is to cat or grep the file, since we can't read the file directly from disk. However, I'm having trouble getting the syntax down. Here's an edited snippet of the plist file I'm reading:
<key>server_url</key>
<string>http://pathToServer</string>
There are a bunch of key/string pairs in the file, where the server_url key is unique. Ideally I'd do something like a lookbehind, but because JavaScript doesn't appear to support it, I figured I'd just get the pair from the file and whittle it down a bit later.
I can search for the key with this:
// This line works
var expression = new RegExp(escapeRegExp("<key>server_url</key>"));
if(result.stdout.match(expression))
{
UIALogger.logMessage("FOUND IT!!!");
}
else
{
UIALogger.logMessage("NOPE :(");
}
Where the escapeRegExp method looks like this:
function escapeRegExp(str)
{
    var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
    UIALogger.logMessage("NEW STRING: " + result);
    return result;
}
Also, this line returns a value (but gets the wrong line):
var expression = new RegExp(escapeRegExp("<string>(.*?)</string>"));
However, when you put the two together, it (the Regex syntax) works on the terminal but doesn't work in code:
var expression = new RegExp(escapeRegExp("<key>server_url</key>[\s]*<string>(.*?)</string>"));
What am I missing? I also tried grep and egrep without any luck.
There are two problems preventing the regex from working in your JavaScript code.
First, you are escaping the whole regex expression string, which means that your capturing (.*?) and your whitespace-ignoring [\s]* will also be escaped and won't be evaluated the way you're expecting. You need to escape the XML parts and add in the regex parts without escaping them.
Second, the whitespace-ignoring part, [\s]*, is falling prey to JavaScript's normal string escaping rules: the "\s" is turning into "s" in the output. You need to escape that backslash by writing "\\s" so that it stays as "\s" in the string that you pass to construct the regular expression.
I've built a working script that I've verified in the UI Automation engine itself. It should extract and print out the expected URL:
var testString = "" +
"<plistExample>\n" +
" <key>dont-find-me</key>\n" +
" <string>bad value</string>\n" +
" <key>server_url</key>\n" +
" <string>http://server_url</string>\n" +
"</plistExample>";
function escapeRegExp(str)
{
    var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
    UIALogger.logMessage("NEW STRING: " + result);
    return result;
}
var strExp = escapeRegExp("<key>server_url</key>") + "[\\s]*" + escapeRegExp("<string>") + "(.*)" + escapeRegExp("</string>");
UIALogger.logMessage("Expression escaping only the xml parts:" + strExp);
var exp = new RegExp(strExp);
var match = testString.match(exp);
UIALogger.logMessage("Match: " + match[1]);
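I should point out, though, that nothing in the XML parts actually needs escaping here: < and > are not regex metacharacters, and forward slashes only need escaping inside a regex literal, not in a string passed to new RegExp(). That means that you don't need your escapeRegExp() function and can write the expression you want like this (the \/ is harmless but optional):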
var exp = new RegExp("<key>server_url<\/key>[\\s]*<string>(.*)<\/string>");
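The string-escaping problem can be reproduced in any JavaScript engine, without UI Automation. The sketch below reuses the two-line plist snippet from the question as sample data; the names plistSample, brokenExp and workingExp are made up for the example.

```javascript
// In a string literal "\s" is not a recognised escape, so the backslash is dropped.
console.log("[\s]*" === "[s]*"); // true
// Doubling the backslash keeps it: the string is the 5 characters [\s]*
console.log("[\\s]*".length);    // 5

var plistSample = "<key>server_url</key>\n    <string>http://pathToServer</string>";

// Single backslash: the pattern ends up looking for zero or more letters "s".
var brokenExp = new RegExp("<key>server_url</key>[\s]*<string>(.*?)</string>");
// Double backslash: the pattern really contains \s and skips the whitespace.
var workingExp = new RegExp("<key>server_url</key>[\\s]*<string>(.*?)</string>");

console.log(brokenExp.test(plistSample)); // false
var found = plistSample.match(workingExp);
console.log(found ? found[1] : null);     // http://pathToServer
```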

Regex for matching multiple forward slashes in URL

I need a regular expression for replacing multiple forward slashes in a URL with a single forward slash, excluding the ones following the colon
e.g. http://link.com//whatever/// would become http://link.com/whatever/
I think this should work: /[^:](\/+)/ or /[^:](\/\/+)/ if you want only multiples.
It won't match a leading //, but it looks like you're not looking for that.
To replace:
"http://test//a/b//d".replace(/([^:]\/)\/+/g, "$1") // --> http://test/a/b/d
You have already accepted an answer, but to show a bit more of what you can do with matching and controlling the matches, this might help you in the future:
var url = 'http://link.com//whatever///';
var set = url.match(/([^:]\/{2,3})/g); // Match (NOT ":") followed by (2 OR 3 "/")
for (var str in set) {
    // Modify the data you have
    var replace_with = set[str].substr(0, 1) + '/';
    // Replace the match
    url = url.replace(set[str], replace_with);
}
console.log(url);
Will output:
http://link.com/whatever/
Duplicates won't matter in your situation. If you have this string:
var url = 'http://link.com//om/om/om/om/om///';
Your set array will contain multiple m//. A bit redundant, as the loop will see that variable a few times. The nice thing is that String.replace() replaces nothing if it finds nothing, so no harm done.
What you could do is strip out the duplicates from set first, but that would almost require the same amount of resources as just letting the for-loop go over them.
Good luck!
result = subject.replace(/(?<!http:)\/*\//g, "/");
or (for http, https, ftp and ftps)
result = subject.replace(/(?<!(?:ht|f)tps?:)\/*\//g, "/");
The original accepted answer does a sufficient job at replacing, but not for matching. And the currently accepted answer matches the character before duplicate slashes, also not good for matching.
Using a negative lookbehind to exclude the protocol from the match, (?<!:), and a curly-bracket quantifier to match two or more slashes, \/{2,}, does the job for both matching and replacing.
(?<!:)\/{2,}
let str = 'https://test.example.com:8080//this/is//an/exmaple///';
document.write('Original: ' + str + '<br><br>');
document.write('Matches: ' + str.match(/(?<!:)\/{2,}/g) + '<br><br>');
document.write('Replaced: ' + str.replace(/(?<!:)\/{2,}/g, '/'));
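For running outside a browser page, the same idea can be wrapped in small functions and logged to the console. This is a sketch only: collapseSlashes and collapseSlashesNoLookbehind are names invented here, the first relies on lookbehind support in the JavaScript engine, and the second uses the capture-group pattern from the earlier answer as a fallback.

```javascript
// Lookbehind version: collapse any run of 2+ slashes not directly preceded by ":".
function collapseSlashes(url) {
    return url.replace(/(?<!:)\/{2,}/g, "/");
}

// Fallback without lookbehind: keep the character before the run plus one slash.
function collapseSlashesNoLookbehind(url) {
    return url.replace(/([^:]\/)\/+/g, "$1");
}

console.log(collapseSlashes("http://link.com//whatever///"));
// http://link.com/whatever/
console.log(collapseSlashesNoLookbehind("http://link.com//whatever///"));
// http://link.com/whatever/
```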

JSLint - Bad Escaping for a Regex with variables

I've seen some posts about JSLint "bad escapement" warnings, but I just wanted to see if I'm doing this regex correctly. (Note: I'm a dabbler programmer.)
I have a function (below) that attempts to parse out a variable from its name in a long message. The regex is working well, but should I change something in response to the JSLint warning?
A very simplified version of msg could look like this essentially:
VariableName1 = Value1
VariableName2 = Value2
VariableName3 = Value3
The actual msg has different unstructured data above and below. I had to use a strange Regex since even though a more simple one worked on all the testing websites, it didn't work within the server application we are using, so this is the only way I could get it to work. The regular expression incorporates a variable.
Here is the parsing function I'm using:
function parseValue(msg, strValueName) {
    var myRegexp = new RegExp(strValueName + ' = ([A-Z3][a-zA-Z\. 3]+)[\\n\\r]+', 'gm');
    log('parseValue', 'myRegexp = ' + myRegexp.toString());
    var match = myRegexp.exec(msg);
    log('parseValue', 'returning match = ' + match[1] );
    return match[1];
}
There is probably something much simpler that a 'real' programmer can come up with pretty easily. Any help would be appreciated.
Thanks.
The problem that JSLint didn't like was the '.' character in the character class as pointed out by 'Explosion Pills'.
When I removed the '.' all was good.
Thanks.
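For anyone who wants to keep matching a literal dot in the value, a possible variant is to drop only the backslash: inside a character class the dot is already literal, and in a string literal "\." is just "." anyway, which is probably what JSLint was objecting to. The sketch below makes a few assumptions beyond the original function: the log() calls are left out, the 'gm' flags are omitted because the pattern has no anchors and exec() runs once, and a null check is added for the no-match case. The sample message is made up.

```javascript
function parseValue(msg, strValueName) {
    // "." needs no backslash inside [...]; only \n and \r need doubled backslashes.
    var myRegexp = new RegExp(strValueName + " = ([A-Z3][a-zA-Z. 3]+)[\\n\\r]+");
    var match = myRegexp.exec(msg);
    // exec() returns null when nothing matches, so guard before indexing.
    return match ? match[1] : null;
}

var sampleMsg = "header text\nVariableName1 = Value One\nVariableName2 = Value Two\nfooter\n";
console.log(parseValue(sampleMsg, "VariableName2")); // Value Two
console.log(parseValue(sampleMsg, "Missing"));       // null
```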
