Encode URL in JavaScript - javascript

How do you safely encode a URL using JavaScript such that it can be put into a GET string?
var myUrl = "http://example.com/index.html?param=1&anotherParam=2";
var myOtherUrl = "http://example.com/index.html?url=" + myUrl;
I assume that you need to encode the myUrl variable on that second line?

Check out the built-in functions encodeURIComponent(str) and encodeURI(str).
In your case, this should work:
var myOtherUrl =
"http://example.com/index.html?url=" + encodeURIComponent(myUrl);

You have three options:
escape() will not encode: @*/+
encodeURI() will not encode: ~!@#$&*()=:/,;?+'
encodeURIComponent() will not encode: ~!*()'
(All three also leave letters, digits, -, _ and . unencoded.)
But in your case, if you want to pass a URL into a GET parameter of another page, you should use escape or encodeURIComponent, but not encodeURI.
See Stack Overflow question Best practice: escape, or encodeURI / encodeURIComponent for further discussion.

Stick with encodeURIComponent(). The function encodeURI() does not bother to encode many characters that have semantic importance in URLs (e.g. "#", "?", and "&"). escape() is deprecated, and does not bother to encode "+" characters, which will be interpreted as encoded spaces on the server (and, as pointed out by others here, does not properly URL-encode non-ASCII characters).
There is a nice explanation of the difference between encodeURI() and encodeURIComponent() elsewhere. If you want to encode something so that it can safely be included as a component of a URI (e.g. as a query string parameter), you want to use encodeURIComponent().
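To make the difference concrete, here is a small sketch that runs both built-ins on the URL from the question. It only uses standard functions; the variable names are illustrative.

```javascript
// The URL from the question, to be nested inside another URL's query string.
const myUrl = "http://example.com/index.html?param=1&anotherParam=2";

// encodeURI() keeps URL syntax characters (: / ? & =) as they are,
// so the nested "&anotherParam=2" would look like a second outer parameter.
const viaEncodeURI = encodeURI(myUrl);

// encodeURIComponent() escapes those characters, so the whole nested URL
// stays inside the single "url" parameter.
const viaComponent = encodeURIComponent(myUrl);
const myOtherUrl = "http://example.com/index.html?url=" + viaComponent;

console.log(viaEncodeURI); // http://example.com/index.html?param=1&anotherParam=2
console.log(myOtherUrl);
// http://example.com/index.html?url=http%3A%2F%2Fexample.com%2Findex.html%3Fparam%3D1%26anotherParam%3D2
```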

The best answer is to use encodeURIComponent on values in the query string (and nowhere else).
However, I find that many APIs want spaces encoded as "+" instead of "%20", so I've had to use the following:
const escapedValue = encodeURIComponent(value).replace(/%20/g, '+');
const url = 'http://example.com?lang=en&key=' + escapedValue;
escape is implemented differently in different browsers, and encodeURI doesn't encode many characters (like # and even /). It's made to be used on a full URI/URL without breaking it, which isn't super helpful or secure.
And as @Jochem points out below, you may want to use encodeURIComponent() on each folder name, but for whatever reason these APIs don't seem to want + in folder names, so plain old encodeURIComponent works great.
Example:
const escapedValue = encodeURIComponent(value).replace(/%20/g, '+');
const escapedFolder = encodeURIComponent('My Folder'); // no replace
const url = `http://example.com/${escapedFolder}/?myKey=${escapedValue}`;
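Passing a string pattern to replace() only swaps the first match, which matters as soon as a value contains more than one space. A minimal comparison (the sample value is made up):

```javascript
const value = "1+1 = 2 and more";

// A string pattern only replaces the FIRST match.
const firstOnly = encodeURIComponent(value).replace("%20", "+");

// A regex with the g flag replaces every encoded space.
const allSpaces = encodeURIComponent(value).replace(/%20/g, "+");

console.log(firstOnly); // 1%2B1+%3D%202%20and%20more
console.log(allSpaces); // 1%2B1+%3D+2+and+more
```

Note that the literal "+" in the value was already turned into %2B by encodeURIComponent(), so it cannot be confused with an encoded space. In newer engines, replaceAll("%20", "+") does the same as the /g regex.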

I would suggest using the qs npm package:
qs.stringify({a:"1=2", b:"Test 1"}); // gets a=1%3D2&b=Test%201 (qs follows RFC 3986 by default; pass { format: 'RFC1738' } to get b=Test+1)
It is easier to use with a JavaScript object and it gives you the proper URL encoding for all parameters.
If you are using jQuery, I would go for the $.param method. It URL encodes an object, mapping fields to values, which is easier to read than calling an escape method on each value.
$.param({a:"1=2", b:"Test 1"}) // Gets a=1%3D2&b=Test+1

Modern solution (2021)
Since the other answers were written, the URLSearchParams API has been introduced. It can be used like this:
const queryParams = { param1: 'value1', param2: 'value2' }
const queryString = new URLSearchParams(queryParams).toString()
// 'param1=value1&param2=value2'
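It also percent-encodes any character that is not safe in a query string. Note that it follows the application/x-www-form-urlencoded rules, so spaces become + rather than %20.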
For your specific example, you would use it like this:
const myUrl = "http://example.com/index.html?param=1&anotherParam=2";
const myOtherUrl = new URL("http://example.com/index.html");
myOtherUrl.search = new URLSearchParams({url: myUrl});
console.log(myOtherUrl.toString());
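As a quick check of how URLSearchParams serializes, here is the object from the qs / $.param examples run through it. Only standard APIs are used:

```javascript
// URLSearchParams uses the application/x-www-form-urlencoded rules:
// "=" is percent-encoded and spaces become "+".
const query = new URLSearchParams({ a: "1=2", b: "Test 1" }).toString();
console.log(query); // a=1%3D2&b=Test+1

// Parsing decodes the values again ("+" is read back as a space).
const parsed = new URLSearchParams(query);
console.log(parsed.get("a")); // 1=2
console.log(parsed.get("b")); // Test 1
```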

encodeURIComponent() is the way to go.
var myOtherUrl = "http://example.com/index.html?url=" + encodeURIComponent(myUrl);
But you should keep in mind that there are small differences from PHP's urlencode() and, as @CMS mentioned, it will not encode every character. The folks at http://phpjs.org/functions/urlencode/ made a JavaScript equivalent of PHP's urlencode():
function urlencode(str) {
str = (str + '').toString();
// Tilde should be allowed unescaped in future versions of PHP (as reflected below), but if you want to reflect current
// PHP behavior, you would need to add ".replace(/~/g, '%7E');" to the following.
// The /g flags make each replace() handle every occurrence, not just the first one.
return encodeURIComponent(str)
.replace(/!/g, '%21')
.replace(/'/g, '%27')
.replace(/\(/g, '%28')
.replace(/\)/g, '%29')
.replace(/\*/g, '%2A')
.replace(/%20/g, '+');
}

I think that now, in 2022, to be really safe you should always consider constructing your URLs using the URL() interface. It'll do most of the job for you. So, coming to your code:
const baseURL = 'http://example.com/index.html';
const myUrl = new URL(baseURL);
myUrl.searchParams.append('param', '1');
myUrl.searchParams.append('anotherParam', '2');
const myOtherUrl = new URL(baseURL);
myOtherUrl.searchParams.append('url', myUrl.href);
console.log(myUrl.href);
// Outputs: http://example.com/index.html?param=1&anotherParam=2
console.log(myOtherUrl.href);
// Outputs: http://example.com/index.html?url=http%3A%2F%2Fexample.com%2Findex.html%3Fparam%3D1%26anotherParam%3D2
console.log(myOtherUrl.searchParams.get('url'));
// Outputs: http://example.com/index.html?param=1&anotherParam=2
Or...
const params = new URLSearchParams(myOtherUrl.search);
console.log(params.get('url'));
// Outputs: http://example.com/index.html?param=1&anotherParam=2
Because the URL and URLSearchParams APIs do the encoding and decoding for you, this approach avoids the usual mistakes of encoding by hand.
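As a further sketch of the URL() approach, searchParams also takes care of non-ASCII text, spaces, and ampersands inside a value. The base URL below is a placeholder:

```javascript
const u = new URL("http://example.com/search");
// set() replaces any existing "q" values; append() would add another one.
u.searchParams.set("q", "café & bar");

console.log(u.href); // http://example.com/search?q=caf%C3%A9+%26+bar
console.log(u.searchParams.get("q")); // café & bar
```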

To encode a URL, as has been said before, you have two functions:
encodeURI()
and
encodeURIComponent()
The reason both exist is that the first preserves the URL with the risk of leaving too many things unescaped, while the second encodes everything needed.
With the first, you could copy the newly escaped URL into the address bar (for example) and it would work. However, your unescaped '&'s would interfere with field delimiters, the '='s would interfere with field names and values, and the '+'s would look like spaces. But for simple data, when you want to preserve the URL nature of what you are escaping, this works.
The second is everything you need to do to make sure nothing in your string interferes with a URL. It leaves various unimportant characters unescaped so that the URL remains as human-readable as possible without interference. A URL encoded this way will no longer work as a URL without unescaping it.
So if you can take the time, you always want to use encodeURIComponent(): before adding name/value pairs to the query string, encode both the name and the value with this function.
I'm having a tough time coming up with reasons to use encodeURI(); I'll leave that to the smarter people.
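Encoding both the name and the value of every pair can be wrapped in a small helper. This is a minimal sketch; buildQuery is a hypothetical name, not a built-in:

```javascript
// Hypothetical helper: encode every name and value, then join the pairs.
function buildQuery(params) {
  return Object.entries(params)
    .map(([name, value]) => encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join("&");
}

const query = buildQuery({ "a&b": "x=y", q: "hello world" });
console.log(query); // a%26b=x%3Dy&q=hello%20world
console.log("http://example.com/index.html?" + query);
```

Unlike URLSearchParams, this keeps spaces as %20, which is what encodeURIComponent() produces.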

What is URL encoding:
A URL should be encoded when there are special characters located inside the URL. For example:
console.log(encodeURIComponent('?notEncoded=&+'));
// %3FnotEncoded%3D%26%2B
We can observe in this example that all characters except the string notEncoded are encoded with % signs. URL encoding is also known as percent-encoding because it escapes all special characters with a %. After the % sign, each byte of the character is written as two hexadecimal digits (for example, ? becomes %3F and + becomes %2B).
Why do we need URL encoding:
Certain characters have a special value in a URL string. For example, the ? character denotes the beginning of a query string. In order to successfully locate a resource on the web, it is necessary to distinguish between when a character is meant as a part of string or part of the URL structure.
How can we achieve URL encoding in JavaScript:
JavaScript offers a bunch of built-in utility functions which we can use to easily encode URLs. These are two convenient options:
encodeURIComponent(): Takes a component of a URI as an argument and returns the encoded URI string.
encodeURI(): Takes a URI as an argument and returns the encoded URI string.
Example and caveats:
Be careful not to pass the whole URL (including the scheme, e.g., https://) into encodeURIComponent(). This can actually turn it into a non-functional URL. For example:
// for a whole URI don't use encodeURIComponent; it will transform
// the / characters and the URL won't function properly
console.log(encodeURIComponent("http://www.random.com/specials&char.html"));
// http%3A%2F%2Fwww.random.com%2Fspecials%26char.html
// instead use encodeURI for whole URLs
console.log(encodeURI("http://www.random.com/specials&char.html"));
// http://www.random.com/specials&char.html
We can observe that if we put the whole URL in encodeURIComponent, the forward slashes (/) are also converted to escape sequences. This will cause the URL to no longer function properly.
Therefore (as the name implies) use:
encodeURIComponent on a certain part of a URL which you want to encode.
encodeURI on a whole URL which you want to encode.

To prevent double encoding, it's a good idea to decode the URL before encoding it (if you are dealing with user-entered URLs, for example, which might already be encoded).
Let’s say we have abc%20xyz 123 as input (one space is already encoded):
encodeURI("abc%20xyz 123") // Wrong: "abc%2520xyz%20123"
encodeURI(decodeURI("abc%20xyz 123")) // Correct: "abc%20xyz%20123"
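One catch with decode-then-encode is that decodeURI() throws a URIError on input such as "100% sure", where the % is not followed by two hex digits. A sketch of a guarded version (encodeOnce is a hypothetical name) that falls back to treating such input as raw text. It is still a heuristic: decodeURI() deliberately leaves escapes of reserved characters such as %26 in place, so those still get double-encoded.

```javascript
// Hypothetical helper: decode first so existing escapes are not double-encoded.
function encodeOnce(str) {
  let decoded;
  try {
    decoded = decodeURI(str);
  } catch (e) {
    if (!(e instanceof URIError)) throw e;
    // Malformed escape such as "100% sure": treat the input as raw text.
    decoded = str;
  }
  return encodeURI(decoded);
}

console.log(encodeOnce("abc%20xyz 123")); // abc%20xyz%20123
console.log(encodeOnce("100% sure"));     // 100%25%20sure
```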

A similar kind of thing I tried with normal JavaScript:
function fixedEncodeURIComponent(str){
return encodeURIComponent(str).replace(/[!'()]/g, escape).replace(/\*/g, "%2A");
}

You should not use encodeURIComponent() directly.
Take a look at RFC3986: Uniform Resource Identifier (URI): Generic Syntax
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI.
Some of these reserved characters from the URI definition in RFC 3986 (!, ', (, ), and *) ARE NOT escaped by encodeURIComponent().
MDN Web Docs: encodeURIComponent()
To be more stringent in adhering to RFC 3986 (which reserves !, ', (, ), and *), even though these characters have no formalized URI delimiting uses, the following can be safely used:
Use the MDN Web Docs function...
function fixedEncodeURIComponent(str) {
return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
// Uppercase hex digits, as RFC 3986 recommends for percent-encodings
return '%' + c.charCodeAt(0).toString(16).toUpperCase();
});
}

Performance
Today (2020.06.12) I performed a speed test for chosen solutions on macOS v10.13.6 (High Sierra) in Chrome 83.0, Safari 13.1, and Firefox 77.0. These results may be useful when encoding very large numbers of URLs.
Conclusions
encodeURI (B) seems to be the fastest, but it is not recommended for encoding URL parameters
escape (A) is a fast cross-browser solution
solution F recommended by MDN is medium fast
solution D is slowest
Details
For solutions A, B, C, D, E, and F, I performed two tests:
for a short URL (50 characters)
for a long URL (1M characters)
function A(url) {
return escape(url);
}
function B(url) {
return encodeURI(url);
}
function C(url) {
return encodeURIComponent(url);
}
function D(url) {
return new URLSearchParams({url}).toString();
}
function E(url){
return encodeURIComponent(url).replace(/[!'()]/g, escape).replace(/\*/g, "%2A");
}
function F(url) {
return encodeURIComponent(url).replace(/[!'()*]/g, function(c) {
return '%' + c.charCodeAt(0).toString(16);
});
}
// ----------
// TEST
// ----------
var myUrl = "http://example.com/index.html?param=1&anotherParam=2";
[A,B,C,D,E,F]
.forEach(f=> console.log(`${f.name} ?url=${f(myUrl).replace(/^url=/,'')}`));
This snippet only presents the code of the chosen solutions.
[Figure: example results for Chrome]

Nothing worked for me. All I was seeing was the HTML of the login page coming back to the client side with status code 200. (The server first sent a 302, but the Ajax request followed the redirect and loaded the login page inside another Ajax request, when it was supposed to be a full-page redirect rather than the plain text of the login page.)
In the login controller, I added this line:
Response.Headers["land"] = "login";
And in the global Ajax handler, I did this:
$(function () {
var $document = $(document);
$document.ajaxSuccess(function (e, response, request) {
var land = response.getResponseHeader('land');
var redrUrl = '/login?ReturnUrl=' + encodeURIComponent(window.location);
if(land) {
if (land.toString() === 'login') {
window.location = redrUrl;
}
}
});
});
Now I don't have any issue, and it works like a charm.

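Here is a live demo of the btoa() and atob() JavaScript built-in functions. Note that these perform Base64 encoding and decoding, which is not the same as URL encoding with encodeURIComponent() and decodeURIComponent():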
<!DOCTYPE html>
<html>
<head>
<style>
textarea{
width: 30%;
height: 100px;
}
</style>
<script>
// Encode string to Base64
function encode()
{
var txt = document.getElementById("txt1").value;
var result = btoa(txt);
document.getElementById("txt2").value = result;
}
// Decode Base64 back to original string
function decode()
{
var txt = document.getElementById("txt3").value;
var result = atob(txt);
document.getElementById("txt4").value = result;
}
</script>
</head>
<body>
<div>
<textarea id="txt1">Some text to decode
</textarea>
</div>
<div>
<input type="button" id="btnencode" value="Encode" onClick="encode()"/>
</div>
<div>
<textarea id="txt2">
</textarea>
</div>
<br/>
<div>
<textarea id="txt3">U29tZSB0ZXh0IHRvIGRlY29kZQ==
</textarea>
</div>
<div>
<input type="button" id="btndecode" value="Decode" onClick="decode()"/>
</div>
<div>
<textarea id="txt4">
</textarea>
</div>
</body>
</html>

Encode URL String
var url = $(location).attr('href'); // Get the current URL
// Or
var url = 'folder/index.html?param=#23dd&noob=yes'; // Or specify one
var encodedUrl = encodeURIComponent(url);
console.log(encodedUrl);
// Outputs folder%2Findex.html%3Fparam%3D%2323dd%26noob%3Dyes
For more information, see jQuery Encode/Decode URL String.

Use fixedEncodeURIComponent function to strictly comply with RFC 3986:
function fixedEncodeURIComponent(str) {
return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
// Uppercase hex digits, as RFC 3986 recommends for percent-encodings
return '%' + c.charCodeAt(0).toString(16).toUpperCase();
});
}

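You can use the OWASP ESAPI library and encode your URL using the function below. The function ensures that the '/'s are preserved while the rest of the text is encoded. The code is written in Java (ESAPI's main library is a Java library):
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.errors.EncodingException;
public static String encodeUrl(String url) throws EncodingException
{
String[] arr = url.split("/");
String encodedUrl = "";
for (int i = 0; i < arr.length; i++)
{
encodedUrl = encodedUrl + ESAPI.encoder().encodeForHTML(ESAPI.encoder().encodeForURL(arr[i]));
if (i < arr.length - 1) encodedUrl = encodedUrl + "/";
}
// Return the encoded result, not the original input
return encodedUrl;
}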
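The same idea, encoding each path segment while keeping the "/" separators, can be sketched in plain JavaScript with no library. encodePathSegments is a hypothetical name, and this only does URL encoding (no HTML encoding step):

```javascript
// Hypothetical helper: encode each path segment, keep the "/" separators.
function encodePathSegments(path) {
  return path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}

console.log(encodePathSegments("my folder/a&b.html")); // my%20folder/a%26b.html
console.log("http://example.com/" + encodePathSegments("my folder/a&b.html"));
```

Pass it only the path part: a full URL would have the colon in "http:" encoded to %3A.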

Don't forget the /g flag to replace all encoded spaces (%20):
var myOtherUrl = "http://example.com/index.html?url=" + encodeURIComponent(myUrl).replace(/%20/g,'+');

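I always use this to encode stuff for URLs. It encodes every single character, even ones that don't have to be encoded. Each character is converted to UTF-8 first, and every byte is written as two hex digits, so non-ASCII text and low control characters still produce valid %XX sequences.
function urlEncode(text) {
let encoded = '';
// TextEncoder gives the UTF-8 bytes; encode each byte as %XX
for (const byte of new TextEncoder().encode(text)) {
encoded += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
}
return encoded;
}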

let name = `bbb`;
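const params = `name=${name}`;
var myOtherUrl = `http://example.com/index.html?url=${encodeURIComponent(params)}`;
console.log(myOtherUrl);
ES6 template literals (backticks) make building URLs more readable, but they don't encode anything by themselves; encodeURIComponent() still does the encoding.
Try this: https://bbbootstrap.com/code/encode-url-javascript-26885283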
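Template literals can do the encoding for you if you use a tagged template. This is a sketch; encodeParams is a hypothetical tag, not a built-in. It runs encodeURIComponent() on every interpolated value while leaving the literal parts of the template alone:

```javascript
// Hypothetical tag: encodes every ${...} value, leaves the literal text as-is.
function encodeParams(strings, ...values) {
  return strings.reduce(
    (out, literal, i) =>
      out + literal + (i < values.length ? encodeURIComponent(values[i]) : ""),
    ""
  );
}

const name = "bbb & co";
const myOtherUrl = encodeParams`http://example.com/index.html?name=${name}`;
console.log(myOtherUrl); // http://example.com/index.html?name=bbb%20%26%20co
```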

Related

Is there a more succinct way to get the last number in my url?

So I currently pass two variables in the URL for use on another page. I get the last variable (i.e. #12345) with location.hash. Then, from the other part of the URL (john%20jacob%202), all I need is the '2'. I've got it working, but I feel there must be a cleaner and more succinct way to handle this. The (john%20jacob%202) part will change all the time and have different string lengths.
url: http://localhost/index.html?john%20jacob%202?#12345
<script>
var hashUrl = location.hash.replace("?","");
// function here to use this data
var fullUrl = window.location.href;
var urlSplit = fullUrl.split('?');
var justName = urlSplit[1];
var nameSplit = justName.split('%20');
var justNumber = nameSplit[2];
// function here to use this data
</script>
A really quick one-liner could be something like:
let url = 'http://localhost/index.html?john%20jacob%202?#12345';
url.split('?')[1].split('').pop();
// returns '2'
How about something like
decodeURI(window.location.search).replace(/\D/g, '')
Since your window.location.search is URI encoded we start by decoding it. Then replace everything that is not a number with nothing. For your particular URL it will return 2
Edit for clarity:
Your example location http://localhost/index.html?john%20jacob%202?#12345 consists of several parts, but the interesting one here is the part after the ? and before the #.
In Javascript this interesting part, the query string (or search), is available through window.location.search. For your specific location window.location.search will return ?john%20jacob%202?.
The %20 is a URI encoded space. To decode (ie. remove) all the URI encodings I first run the search string through the decodeURI function. Then I replace everything that is not a number in that string with an empty string using a regular expression.
The regular expression /\D/ matches any character that is not a number, and the g is a modifier specifying that I want to match everything (not just stop after the first match), resulting in 2.
If you know you always want what comes after the hash (#), you could replace everything up to and including the "#":
url.replace(/^.+#/, '');
Alternatively, this regex will match the last number in your URL:
url.match(/(?<=\D)\d+$/);
// (positive lookbehind for any non-digit) one or more digits until the end of the string
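The URL API can also split the location into its parts before any regex runs. A sketch, assuming the URL shape from the question; lastNumbers is a hypothetical name, and in the browser you would pass window.location.href:

```javascript
// Hypothetical helper: pull both numbers out of a URL such as
// http://localhost/index.html?john%20jacob%202?#12345
function lastNumbers(href) {
  const u = new URL(href);
  // u.hash is "#12345"; drop the leading "#".
  const hashNumber = u.hash.slice(1);
  // u.search is "?john%20jacob%202?"; drop the "?" and decode the %20s.
  const query = decodeURIComponent(u.search.slice(1));
  // Last run of digits, optionally followed by non-digits, at the end.
  const match = query.match(/(\d+)\D*$/);
  return { hashNumber, lastNumber: match ? match[1] : null };
}

console.log(lastNumbers("http://localhost/index.html?john%20jacob%202?#12345"));
```

decodeURIComponent() throws on a malformed escape, so wrap the call in try/catch if the query can come from untrusted input.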

Using Regex to parse a URI

I'm currently using Modernizr to determine what link to serve users based on their device of choice. So if they're using a mobile device I want to return a URI; if not, I just return a traditional URL.
URI: spotify:album:1jcYwZsN7JEve9xsq9BuUX
URL: https://open.spotify.com/album/1jcYwZsN7JEve9xsq9BuUX
Right now I'm using slice() to retrieve the last 22 characters of the URI. Though it works I'd like to parse the string via regex in the event that the URI exceeds the aforementioned character amount. What would be the best way to get the string of characters after the second colon of the URI?
$(".spotify").attr("href", function(index, value) {
if (Modernizr.touch) {
return value
} else {
return "https://open.spotify.com/album/" + value.slice(-22);
}
});
I would like something like this using split.
var url = 'spotify:album:1jcYwZsN7JEve9xsq9BuUX'.split(':');
var part = url[url.length-1];
// alert(part);
return "https://open.spotify.com/album/" + part;
Regex is appropriate for this task because it is quite simple. Here's a regex that supports any number of colons and will still work:
/[\w\:]*\:(\w+)/
How it works
[\w\:]* Will get all word characters (Letters, numbers, underscore) and colons
\: Tells the previous part to stop at a colon. Regex is greedy by default, which means it will stop at the last colon
(\w+) Will select all word characters and store it in a group so we can access it
Use this like:
var string = 'spotify:album:1jcYwZsN7JEve9xsq9BuUX',
parseduri = string.match(/[\w\:]*\:(\w+)/)[1];
parseduri is the result
And then you can finally combine this:
var url = 'https://open.spotify.com/album/'+parseduri;
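For comparison, the split() idea from the question can be made robust to ID length and to other Spotify URI types. This is a sketch: spotifyUriToUrl is a hypothetical name, and it assumes the web URL uses the same type segment as the URI, which the question only shows for albums.

```javascript
// Hypothetical helper: turn "spotify:<type>:<id>" into an open.spotify.com link.
function spotifyUriToUrl(uri) {
  const parts = uri.split(":");
  if (parts.length < 3 || parts[0] !== "spotify") {
    return null; // not in the expected spotify:<type>:<id> form
  }
  const type = parts[1];              // e.g. "album"
  const id = parts[parts.length - 1]; // last segment, whatever its length
  return "https://open.spotify.com/" + encodeURIComponent(type) + "/" + encodeURIComponent(id);
}

console.log(spotifyUriToUrl("spotify:album:1jcYwZsN7JEve9xsq9BuUX"));
// https://open.spotify.com/album/1jcYwZsN7JEve9xsq9BuUX
```

Inside the jQuery callback this would become return Modernizr.touch ? value : spotifyUriToUrl(value);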

Using encodeURI() vs. escape() for utf-8 strings in JavaScript

I am handling utf-8 strings in JavaScript and need to escape them.
Both escape() / unescape() and encodeURI() / decodeURI() work in my browser.
escape()
> var hello = "안녕하세요"
> var hello_escaped = escape(hello)
> hello_escaped
"%uC548%uB155%uD558%uC138%uC694"
> var hello_unescaped = unescape(hello_escaped)
> hello_unescaped
"안녕하세요"
encodeURI()
> var hello = "안녕하세요"
> var hello_encoded = encodeURI(hello)
> hello_encoded
"%EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94"
> var hello_decoded = decodeURI(hello_encoded)
> hello_decoded
"안녕하세요"
However, Mozilla says that escape() is deprecated.
Although encodeURI() and decodeURI() work with the above utf-8 string, the docs (as well as the function names themselves) tell me that these methods are for URIs; I do not see utf-8 strings mentioned anywhere.
Simply put, is it okay to use encodeURI() and decodeURI() for utf-8 strings?
Hi!
When it comes to escape and unescape, I live by two rules:
Avoid them when you easily can.
Otherwise, use them.
Avoiding them when you easily can:
As mentioned in the question, both escape and unescape have been deprecated. In general, one should avoid using deprecated functions.
So, if encodeURIComponent or encodeURI does the trick for you, you should use that instead of escape.
Using them when you can't easily avoid them:
Browsers will, as far as possible, strive to achieve backwards compatibility. All major browsers have already implemented escape and unescape; why would they un-implement them?
Browsers would have to redefine escape and unescape if the new specification required them to do so. But wait! The people who write specifications are quite smart. They, too, are interested in not breaking backwards compatibility!
I realize that the above argument is weak. But trust me, ... when it comes to browsers, deprecated stuff works. This even includes deprecated HTML tags like <xmp> and <center>.
Using escape and unescape:
So naturally, the next question is, when would one use escape or unescape?
Recently, while working on CloudBrave, I had to deal with utf8, latin1 and inter-conversions.
After reading a bunch of blog posts, I realized how simple this was:
var utf8_to_latin1 = function (s) {
return unescape(encodeURIComponent(s));
};
var latin1_to_utf8 = function (s) {
return decodeURIComponent(escape(s));
};
Without escape and unescape, these inter-conversions are rather involved. By not avoiding escape and unescape, life becomes simpler.
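In environments that provide TextEncoder and TextDecoder (modern browsers and Node.js), the same two conversions can be written without the deprecated functions. This is a sketch of an alternative, not the answer's own code; the function names are hypothetical, and utf8ToBinaryString corresponds to utf8_to_latin1 above:

```javascript
// Equivalent of unescape(encodeURIComponent(s)): one character per UTF-8 byte.
function utf8ToBinaryString(s) {
  let out = "";
  for (const byte of new TextEncoder().encode(s)) {
    out += String.fromCharCode(byte);
  }
  return out;
}

// Equivalent of decodeURIComponent(escape(s)). With fatal: true, invalid
// UTF-8 throws a TypeError instead of being replaced with U+FFFD.
function binaryStringToUtf8(s) {
  const bytes = Uint8Array.from(s, (c) => c.charCodeAt(0));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const hello = "안녕하세요";
const binary = utf8ToBinaryString(hello);
console.log(binary.length); // 15 (5 characters x 3 UTF-8 bytes each)
console.log(binaryStringToUtf8(binary) === hello); // true
```

One difference: encodeURIComponent() throws on a lone surrogate, while TextEncoder silently replaces it with U+FFFD.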
Hope this helps.
It is never okay to use encodeURI() or encodeURIComponent(). Let's try it out:
console.log(encodeURIComponent('@#*'));
Input: @#*. Output: %40%23*. Wait, so, what exactly happened to the * character? Why wasn't that converted? Imagine this: You ask a user what file to delete and their response is *. Server-side, you convert that using encodeURIComponent() and then run rm *. Well, got news for you: using encodeURIComponent() means you just deleted all files.
Use fixedEncodeURI(), when trying to encode a complete URL (i.e., all of example.com?arg=val), as defined and further explained at the MDN encodeURI() Documentation...
function fixedEncodeURI(str) {
return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}
Or, you may need to use fixedEncodeURIComponent(), when trying to encode part of a URL (i.e., the arg or the val in example.com?arg=val), as defined and further explained at the MDN encodeURIComponent() Documentation...
function fixedEncodeURIComponent(str) {
return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
// Uppercase hex digits, as RFC 3986 recommends for percent-encodings
return '%' + c.charCodeAt(0).toString(16).toUpperCase();
});
}
If you are unable to distinguish them based on the above description, I always like to simplify it with:
fixedEncodeURI(): will not encode +@?=:#;,$& to their percent-encoded equivalents (as & and + are common URL operators)
fixedEncodeURIComponent(): will encode +@?=:#;,$& to their percent-encoded equivalents.
Mozilla says that escape() is deprecated.
Yes, you should avoid both escape() and unescape()
Simply put, is it okay to use encodeURI() and decodeURI() for utf-8 strings?
Yes, but depending on the form of your input and the required form of your output you may need some extra work.
From your question I assume you have a JavaScript string and you want to convert encoding to UTF-8 and finally store the string in some escaped form.
First of all, it's important to note that JavaScript strings are sequences of UTF-16 code units (often loosely called UCS-2), which is different from UTF-8.
See: https://mathiasbynens.be/notes/javascript-encoding
encodeURIComponent() is good for the job, as it turns the JavaScript string into UTF-8 and escapes it as a sequence of %nn substrings, where each nn is the two hex digits of one byte.
However encodeURIComponent() does not escape letters, digits and few other characters in the ASCII range. But this is easy to fix.
For example, if you want to turn a JavaScript string into an array of numbers representing the bytes of the original string UTF-8 encoded you may use this function:
//
// Convert JavaScript UCS2 string to array of bytes representing the string UTF8 encoded
//
function StringUTF8AsBytesArrayFromString( s )
{
var i,
n,
u;
u = [];
s = encodeURIComponent( s );
n = s.length;
for( i = 0; i < n; i++ )
{
if( s.charAt( i ) == '%' )
{
u.push( parseInt( s.substring( i + 1, i + 3 ), 16 ) );
i += 2;
}
else
{
u.push( s.charCodeAt( i ) );
}
}
return u;
}
If you want to turn the string in its hexadecimal representation:
//
// Convert JavaScript UCS2 string to hex string representing the bytes of the string UTF8 encoded
//
function StringUTF8AsHexFromString( s )
{
var u,
i,
n,
s;
u = StringUTF8AsBytesArrayFromString( s );
n = u.length;
s = '';
for( i = 0; i < n; i++ )
{
s += ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );
}
return s;
}
If you change the line in the for loop into
s += '%' + ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );
(adding the % sign before each pair of hex digits)
The resulting escaped string (UTF-8 encoded) may be turned back into a JavaScript UCS-2 string with decodeURIComponent()
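Where TextEncoder and TextDecoder are available, the byte-array conversion and its reverse reduce to a couple of lines. A sketch with hypothetical function names:

```javascript
// Modern counterpart of StringUTF8AsBytesArrayFromString().
function utf8BytesFromString(s) {
  return Array.from(new TextEncoder().encode(s));
}

// Reverse direction: an array of UTF-8 byte values back to a JavaScript string.
function stringFromUtf8Bytes(bytes) {
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

const bytes = utf8BytesFromString("안");
console.log(bytes); // 0xEC 0x95 0x88, matching %EC%95%88 in the question
console.log(stringFromUtf8Bytes(bytes)); // 안
```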

Most efficient way to grab XML tag from file with JavaScript and Regex

I'm doing some more advanced automation on iOS devices and simulators for an enterprise application. The automation is written in browserless Javascript. One of the methods works on the device but not on the simulator, so I need to code a workaround. For the curious, it's UIATarget.localTarget().frontMostApp().preferencesValueForKey(key).
What we need to do is read a path to a server (which varies) from a plist file on disk. As a workaround on the simulator, I've used the following lines to locate the plist file containing the preferences:
// Get the alias of the user who's logged in
var result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/whoami", [], 5).stdout;
// Remove the extra newline at the end of the alias we got
result = result.replace('\n',"");
// Find the location of the plist containing the server info
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/find", ["/Users/"+result+"/Library/Application Support/iPhone Simulator", "-name", "redacted.plist"], 100);
// For some reason we need a delay here
UIATarget.localTarget().delay(.5);
// Results are returned in a single string separated by newline characters, so we can split it into an array
// This array contains all of the folders which have the plist file under the Simulator directory
var plistLocations = result.stdout.split("\n");
...
// For this example, let's just assume we want slot 0 here to save time
var plistBinaryLocation = plistLocations[0];
var plistXMLLocation = plistBinaryLocation + ".xml";
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/plutil", ["-convert","xml1", plistBinaryLocation,"-o", plistXMLLocation], 100);
From here, I think the best way to get the contents is to cat or grep the file, since we can't read the file directly from disk. However, I'm having trouble getting the syntax down. Here's an edited snippet of the plist file I'm reading:
<key>server_url</key>
<string>http://pathToServer</string>
There are a bunch of key/string pairs in the file, where the server_url key is unique. Ideally I'd do something like a lookbehind, but because JavaScript doesn't appear to support it, I figured I'd just get the pair from the file and whittle it down a bit later.
I can search for the key with this:
// This line works
var expression = new RegExp(escapeRegExp("<key>server_url</key>"));
if(result.stdout.match(expression))
{
UIALogger.logMessage("FOUND IT!!!");
}
else
{
UIALogger.logMessage("NOPE :(");
}
Where the escapeRegExp method looks like this:
function escapeRegExp(str)
{
var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
UIALogger.logMessage("NEW STRING: " + result);
return result;
}
Also, this line returns a value (but gets the wrong line):
var expression = new RegExp(escapeRegExp("<string>(.*?)</string>"));
However, when you put the two together, it (the Regex syntax) works on the terminal but doesn't work in code:
var expression = new RegExp(escapeRegExp("<key>server_url</key>[\s]*<string>(.*?)</string>"));
What am I missing? I also tried grep and egrep without any luck.
There are two problems affecting you here getting the regex to work in your JavaScript code.
First, you are escaping the whole regex expression string, which means that your capturing (.*?) and your whitespace ignoring [\s]* will also be escaped and won't be evaluated the way you're expecting. You need to escape the XML parts and add in the regex parts without escaping them.
Second, the whitespace-matching part, [\s]*, is falling prey to JavaScript's normal string escaping rules: the "\s" is turning into "s" in the output. You need to escape that backslash as "\\s" so that it stays as "\s" in the string that you pass to construct the regular expression.
I've built a working script that I've verified in the UI Automation engine itself. It should extract and print out the expected URL:
var testString = "" +
"<plistExample>\n" +
" <key>dont-find-me</key>\n" +
" <string>bad value</string>\n" +
" <key>server_url</key>\n" +
" <string>http://server_url</string>\n" +
"</plistExample>";
function escapeRegExp(str)
{
var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
UIALogger.logMessage("NEW STRING: " + result);
return result;
}
var strExp = escapeRegExp("<key>server_url</key>") + "[\\s]*" + escapeRegExp("<string>") + "(.*)" + escapeRegExp("</string>");
UIALogger.logMessage("Expression escaping only the xml parts:" + strExp);
var exp = new RegExp(strExp);
var match = testString.match(exp);
UIALogger.logMessage("Match: " + match[1]);
I should point out, though, that none of the characters in these XML tags are regex metacharacters, so you don't need your escapeRegExp() function at all. Forward slashes only need escaping inside a regex literal (in a string passed to new RegExp(), "<\/key>" is just "</key>"), so you can write the expression you want like this:
var exp = /<key>server_url<\/key>\s*<string>(.*)<\/string>/;
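If you end up needing more than one value from the plist, a global regex with an exec() loop can collect every key/string pair at once. This is a sketch: plistStringPairs is a hypothetical helper, it is written in ES5 style on the assumption that the UI Automation engine may be older, and it does not decode XML entities such as &amp;amp;.

```javascript
// Hypothetical helper: collect every <key>...</key><string>...</string> pair.
// Keys followed by non-string values (e.g. <true/>) are simply skipped.
function plistStringPairs(xml) {
  var re = /<key>([^<]*)<\/key>\s*<string>([^<]*)<\/string>/g;
  var pairs = {};
  var m;
  while ((m = re.exec(xml)) !== null) {
    pairs[m[1]] = m[2]; // key text -> string text
  }
  return pairs;
}

var plistXml =
  "<key>dont-find-me</key>\n" +
  "<string>bad value</string>\n" +
  "<key>server_url</key>\n" +
  "<string>http://server_url</string>\n";
var serverUrl = plistStringPairs(plistXml)["server_url"];
```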

How can I extract a URL from url("http://www.example.com")?

I need to get the URL of an element's background image with jQuery:
var foo = $('#id').css('background-image');
This results in something like url("http://www.example.com/image.gif"). How can I get just the "http://www.example.com/image.gif" part from that? typeof foo says it's a string, but the url() part makes me think that JavaScript and/or jQuery has a special URL type and that I should be able to get the location with foo.toString(). That doesn't work though.
Note that different browser implementations may return the string in a different format. For instance, one browser may return double-quotes while another browser may return the value without quotes. This makes it awkward to parse, especially when you consider that quotes are valid as URL characters.
I would say the best approach is a good old check and slice():
var imageUrlString = $('#id').css('background-image'),
quote = imageUrlString.charAt(4),
result;
if (quote == "'" || quote == '"')
result = imageUrlString.slice(5, -2);
else
result = imageUrlString.slice(4, -1);
Assuming the browser returns a valid string, this wouldn't fail. If there is no background image, jQuery returns "none" rather than a url(...) value, and the code above then yields an empty string, since "none".slice(4, -1) is empty.
You might want to consider regular expressions in this case:
var urlStr = 'url("http://www.foo.com/")';
var url = urlStr.replace(/^url\(['"]?([^'"]*)['"]?\);?$/, '$1');
This particular regex allows you to use formats like url(http://foo.bar/) or url("http://foo.bar/"), with single quotes instead of double quotes, or possibly with a semicolon at the end.
You could split the string at each " and get the second element:
var foo = $('#id').css('background-image').split('"')[1];
Note: This doesn't work if your URL contains quotation marks.
If it's always the same, I'd just take the substring of the URL without the prefix.
For instance, if it's always:
url("<URL>")
url("<otherURL>")
It always runs from index 5 of the string up to length - 2 (that is, str.slice(5, -2)).
Not the best by any means, but probably faster than a regex if you're not worried about other string formats.
There is no special URL type - it's a string representing a CSS url value. You can get the URL back out with a regex:
var foo = $('#id').css('background-image');
var url = foo.match(/url\(['"](.*)['"]\)/)[1];
(that regex isn't foolproof, but it should work against whatever jQuery returns)
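To cover double quotes, single quotes, no quotes, and several comma-separated backgrounds in one go, a global regex with a back-reference to the opening quote can be used. A sketch with a hypothetical helper name; it does not handle escaped quotes inside the URL:

```javascript
// Hypothetical helper: return every url(...) found in a CSS value.
// Handles double quotes, single quotes, or no quotes; "none" gives [].
function extractCssUrls(cssValue) {
  const re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  const urls = [];
  let m;
  while ((m = re.exec(cssValue)) !== null) {
    urls.push(m[2]);
  }
  return urls;
}

console.log(extractCssUrls('url("http://www.example.com/image.gif")'));
console.log(extractCssUrls("url(http://foo.bar/), url('b.png')"));
```

With jQuery this would be used as extractCssUrls($('#id').css('background-image'))[0].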
