Using Regex to parse a URI - javascript

I'm currently using Modernizr to determine which link to serve users based on their device. If they're using a mobile device I want to return a URI; otherwise I just want to return a traditional URL.
URI: spotify:album:1jcYwZsN7JEve9xsq9BuUX
URL: https://open.spotify.com/album/1jcYwZsN7JEve9xsq9BuUX
Right now I'm using slice() to retrieve the last 22 characters of the URI. Though it works I'd like to parse the string via regex in the event that the URI exceeds the aforementioned character amount. What would be the best way to get the string of characters after the second colon of the URI?
$(".spotify").attr("href", function(index, value) {
if (Modernizr.touch) {
return value
} else {
return "https://open.spotify.com/album/" + value.slice(-22);
}
});

I would do something like this using split.
var url = 'spotify:album:1jcYwZsN7JEve9xsq9BuUX'.split(':');
var part = url[url.length-1];
// alert(part);
return "https://open.spotify.com/album/" + part;

Regex is appropriate for this task because the pattern is quite simple. Here's a RegEx that keeps working no matter how many colons the URI contains:
/[\w\:]*\:(\w+)/
How it works
[\w\:]* Will get all word characters (Letters, numbers, underscore) and colons
\: Will match a literal colon. Because the preceding [\w\:]* is greedy by default, it consumes as much as it can, so this ends up being the last colon in the string
(\w+) Will select all word characters and store it in a group so we can access it
Use this like:
var string = 'spotify:album:1jcYwZsN7JEve9xsq9BuUX',
parseduri = string.match(/[\w\:]*\:(\w+)/)[1];
parseduri is the result
And then you can finally combine this:
var url = 'https://open.spotify.com/album/'+parseduri;
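The steps above can be wrapped in a small helper that also handles input without a colon. This is only a sketch: the function name spotifyAlbumUrl is made up for illustration, and it assumes every URI points at an album, as in the question.

```javascript
// Hypothetical helper: builds the album URL from a Spotify URI.
// Assumes the ID is the run of word characters after the last colon.
function spotifyAlbumUrl(uri) {
  const match = uri.match(/:(\w+)$/);
  if (match === null) {
    return null; // no trailing ":<id>" segment, so nothing to build
  }
  return "https://open.spotify.com/album/" + match[1];
}

console.log(spotifyAlbumUrl("spotify:album:1jcYwZsN7JEve9xsq9BuUX"));
```

Returning null instead of throwing lets the caller fall back to the original href when the value is not a URI.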

Related

How do I parse url after searching?

I'm trying to parse a specific part of a URL after running a search, using
any language (ideally JavaScript, but I'm open to Python).
How do I get a specific part of url and save/store?
For example,
On songkick.com,
The way to get artist_id is checking a specific part of the url after searching artist name
in the search bar of the website.
in the case below,
the artist id is 301329.
https://www.songkick.com/artists/301329-rac
I strongly believe there is a way to parse this part using either python or js
given that I have a CSV file with artist names in one column. Instead of searching for all the artists one by one, I'm looking for an algorithm that iterates over my CSV column, searches for each name, parses the resulting URL, and saves the ID.
It would be very grateful even if I could only get a hint that I could start with.
Thank you so much always.
It can be done using regular expressions.
Here's an example of a JavaScript implementation
const url = "https://www.songkick.com/artists/301329-rac";
const regex = /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/;
const match = url.match(regex);
if (match) {
console.log('Artist ID: ' + match[1]);
} else {
console.log('No Artist ID found!');
}
This regular expression /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/ matches https://www.songkick.com/artists/ followed by a group of digits (which is captured), a dash, and then one or more characters of any kind.
The match() method retrieves the result of matching a string against a
regular expression.
Thus it returns an array with the overall match at index 0 and the captured (\d+) group at index 1 (match[1] in our case). If nothing matches, it returns null.
If you're not sure of the protocol (http vs https) you can add a ? in the regex right after https. That makes the s in https optional. So the regex would become /https?:\/\/www\.songkick\.com\/artists\/(\d+)-.+/.
Let me know if you need more explanation.
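To apply the same idea to many URLs (for example, one per row of the CSV), the match can be wrapped in a function and mapped over an array. This is a rough sketch: extractArtistId is a made-up name, the array stands in for URLs you have already collected, and the pattern is loosened to look only at the /artists/ part of the path.

```javascript
// Hypothetical helper: returns the numeric artist ID as a string, or null.
function extractArtistId(url) {
  const match = url.match(/\/artists\/(\d+)-/);
  return match ? match[1] : null;
}

// Stand-in data; in practice these would come from your own search step.
const urls = [
  "https://www.songkick.com/artists/301329-rac",
  "https://www.songkick.com/", // no artist ID in this one
];

const ids = urls.map(extractArtistId);
console.log(ids);
```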
First, you can use RegEx simply.
In python
import re
url = 'https://www.songkick.com/artists/301329-rac'
# Raw string so that \d and \w reach the regex engine unchanged
pattern = r'/artists/(\d+)-\w'
match = re.search(pattern, url)
if match:
    artist_id = match.group(1)
I hope this will help you.

Alternative to lookbehind in js

I am trying to use javascript to do text replacements for variables with the following format: #variable (yes, I know it is bad practice, but sadly it's data from an external system so I cannot change it).
The problem is that I need to ensure that it also works if there are mail addresses in the text.
Therefore it needs to match #variable but not test#example.com. In another language I would simply use something like the following, but js does not support lookbehind.
text.replace(/(?<!\w)#[\w]+/g, replacement);
'#var' matches #var
'#var bar' matches #var
'bar#var' does not match
'bar2#var' does not match
Any javascript way of doing this using regex?
Here is an example of the expected result using negative lookbehind
https://regex101.com/r/orCEGE/1
It's not entirely clear what exactly you want to replace, but here's a fairly generic method:
const text = "#A foo#bar#baz #var#asdf.#Z";
const result = text.replace(/#(\w+)/g,
(m0, m1, pos, str) => {
if (pos > 0 && /\w/.test(str.charAt(pos-1))) {
return m0;
}
return "{replacement for " + m1 + "}";
}
);
console.log(result);
The replacement function gets not just the matched parts of the string, but also the position where the match occurred. This match position can be used to make further decisions (e.g. whether the matched string should be returned unchanged (as in return m0;)).
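Another lookbehind-free option is to capture whatever precedes the # and put it back in the replacement. The sketch below assumes that approach. The name replaceHashVariables and the curly-brace output format are invented for illustration.

```javascript
// Hypothetical helper: replaces #name only when it is at the start of the
// string or preceded by a non-word character.
function replaceHashVariables(text, replacer) {
  // Group 1 is the start of the string or one non-word character,
  // group 2 is the variable name.
  return text.replace(/(^|\W)#(\w+)/g, (whole, prefix, name) => {
    return prefix + replacer(name); // keep the prefix, swap the variable
  });
}

const wrap = (name) => "{" + name + "}";
console.log(replaceHashVariables("#var and test#example.com", wrap));
console.log(replaceHashVariables("#A foo#bar#baz #var#asdf.#Z", wrap));
```

Because the prefix character is consumed by the match, it has to be returned as part of the replacement, otherwise the space or punctuation before each variable would be lost.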

Is there a more succinct way to get the last number in my url?

So I currently pass two variables into the url for use on another page. I get the last variable (ie #12345) with location.hash. Then from the other part of the url (john%20jacob%202) all I need is the '2'. I've got it working but feel there must be a cleaner and succinct way to handle this. The (john%20jacob%202) will change all the time to have different string lengths.
url: http://localhost/index.html?john%20jacob%202?#12345
<script>
var hashUrl = location.hash.replace("?","");
// function here to use this data
var fullUrl = window.location.href;
var urlSplit = fullUrl.split('?');
var justName = urlSplit[1];
var nameSplit = justName.split('%20');
var justNumber = nameSplit[2];
// function here to use this data
</script>
A really quick one-liner could be something like:
let url = 'http://localhost/index.html?john%20jacob%202?#12345';
url.split('?')[1].split('').pop();
// returns '2'
How about something like
decodeURI(window.location.search).replace(/\D/g, '')
Since your window.location.search is URI encoded we start by decoding it. Then replace everything that is not a number with nothing. For your particular URL it will return 2
Edit for clarity:
Your example location http://localhost/index.html?john%20jacob%202?#12345 consists of several parts, but the interesting one here is the part after the ? and before the #.
In Javascript this interesting part, the query string (or search), is available through window.location.search. For your specific location window.location.search will return ?john%20jacob%202?.
The %20 is a URI encoded space. To decode (ie. remove) all the URI encodings I first run the search string through the decodeURI function. Then I replace everything that is not a number in that string with an empty string using a regular expression.
The regular expression /\D/ matches any character that is not a number, and the g is a modifier specifying that I want to match everything (not just stop after the first match), resulting in 2.
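If the number can have more than one digit, or the name itself may contain digits, a slightly more defensive variant is to take only the last run of digits. The sketch below assumes that reading of the question and uses the standard URL class to split the address. The function name parsePageUrl is invented for illustration.

```javascript
// Hypothetical helper: pulls the last number out of the query string and
// returns the hash without its leading "#".
function parsePageUrl(href) {
  const url = new URL(href);
  // Decode first so that "%20" does not contribute the digits "20".
  const decoded = decodeURIComponent(url.search);
  const numbers = decoded.match(/\d+/g); // every run of digits, or null
  return {
    lastNumber: numbers ? numbers[numbers.length - 1] : null,
    hash: url.hash.replace(/^#/, ""),
  };
}

const parsed = parsePageUrl("http://localhost/index.html?john%20jacob%202?#12345");
console.log(parsed.lastNumber, parsed.hash);
```

Note that decodeURIComponent throws on malformed percent-encoding, so input from untrusted sources may need a try/catch around it.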
If you know you are always after the part following the hash, you could remove everything up to and including the "#":
url.replace(/^.+#/, '');
Alternatively, this regex will match the last numbers in your URL:
url.match(/(?<=\D)\d+$/);
// (positive lookbehind for any non-digit) one or more digits until the end of the string

remove all but a specific portion of a string in javascript

I am writing a little app for Sharepoint. I am trying to extract some text from the middle of a field that is returned:
var ows_MetaInfo="1;#Subject:SW|NameOfADocument
vti_parservers:SR|23.0.0.6421
ContentTypeID:SW|0x0101001DB26Cf25E4F31488B7333256A77D2CA
vti_cachedtitle:SR|NameOfADocument
vti_title:SR|ATitleOfADocument
_Author:SW:|TheNameOfOurCompany
_Category:SW|
ContentType:SW|Document
vti_author::SR|mrwienerdog
_Comments:SW|This is very much the string I need extracted
vti_categories:VW|
vtiapprovallevel:SR|
vti_modifiedby:SR|mrwienerdog
vti_assignedto:SR|
Keywords:SW|Project Name
ContentType _Comments"
So......All I want returned is "This is very much the string I need extracted"
Do I need a regex and a string replace? How would you write the regex?
Yes, you can use a regular expression for this (this is the sort of thing they are good for). Assuming you always want the string after the pipe (|) on the line starting with "_Comments:SW|", here's how you can extract it:
var matchresult = ows_MetaInfo.match(/^_Comments:SW\|(.*)$/m);
var comment = (matchresult==null) ? "" : matchresult[1];
Note that the .match() method of the String object returns an array. The first element (index 0) is the entire match; here that is the whole line, because we anchored the pattern with ^ and $. Adding the "m" after the regex makes it a multiline regex, which lets ^ and $ match the start and end of any line within the multi-line input. The rest of the array holds the submatches captured with parentheses. Above we've captured the part of the line that you want, so it will be present in the second item of the array (index 1).
If there is no match ("_Comments:SW|" doesn't appear in ows_MetaInfo), then .match() will return null, which is why we test it before pulling out the comment.
If you need to adjust the regex for other scenarios, have a look at the Regex docs on Mozilla Dev Network: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
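The same multiline match can be generalized to look up any field by name. This is only a sketch under a few assumptions: the helper names escapeRegExp and getMetaField are invented, each field is assumed to sit on its own line in the form name:TYPE|value, and the sample string is a shortened stand-in for the real ows_MetaInfo.

```javascript
// Escapes characters that have a special meaning inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the value after the pipe for the given
// field name, or null when the field is not present.
function getMetaField(metaInfo, fieldName) {
  const pattern = new RegExp("^" + escapeRegExp(fieldName) + ":\\w+\\|(.*)$", "m");
  const match = metaInfo.match(pattern);
  return match ? match[1] : null;
}

// Shortened stand-in for the metadata in the question.
const sample = [
  "vti_title:SR|ATitleOfADocument",
  "_Comments:SW|This is very much the string I need extracted",
  "vti_categories:VW|",
].join("\n");

console.log(getMetaField(sample, "_Comments"));
```

Irregular lines such as "_Author:SW:|..." would not match this pattern and would need their own handling.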
You can use this code:
var match = ows_MetaInfo.match(/_Comments:SW\|([^\n]+)/);
if (match)
document.writeln(match[1]);
I'm far from competent with RegEx, so here is my RegEx-less solution. See comments for further detail.
var extractedText = ExtractText(ows_MetaInfo);
function ExtractText(arg) {
  // Use the pipe delimiter to turn the string into an array
  var aryValues = arg.split("|");
  // The comment text ends up in the same array element as the start of the
  // next field, so find the element that contains "vti_categories:VW"
  for (var i = 0; i < aryValues.length; i++) {
    if (aryValues[i].search("vti_categories:VW") != -1)
      // Strip the next field's name, leaving only the comment text
      return aryValues[i].replace("vti_categories:VW", "");
  }
  return null;
}
Here's a working fiddle to demonstrate.

How can I extract a URL from url("http://www.example.com")?

I need to get the URL of an element's background image with jQuery:
var foo = $('#id').css('background-image');
This results in something like url("http://www.example.com/image.gif"). How can I get just the "http://www.example.com/image.gif" part from that? typeof foo says it's a string, but the url() part makes me think that JavaScript and/or jQuery has a special URL type and that I should be able to get the location with foo.toString(). That doesn't work though.
Note that different browser implementations may return the string in a different format. For instance, one browser may return double-quotes while another browser may return the value without quotes. This makes it awkward to parse, especially when you consider that quotes are valid as URL characters.
I would say the best approach is a good old check and slice():
var imageUrlString = $('#id').css('background-image'),
quote = imageUrlString.charAt(4),
result;
if (quote == "'" || quote == '"')
result = imageUrlString.slice(5, -2);
else
result = imageUrlString.slice(4, -1);
Assuming the browser returns a valid string, this wouldn't fail. Even if an empty string were returned (ie, there is no background image), the result is an empty string.
You might want to consider regular expressions in this case:
var urlStr = 'url("http://www.foo.com/")';
var url = urlStr.replace(/^url\(['"]?([^'"]*)['"]?\);?$/, '$1');
This particular regex allows you to use formats like url(http://foo.bar/) or url("http://foo.bar/"), with single quotes instead of double quotes, or possibly with a semicolon at the end.
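A variant that reports failure instead of returning the input unchanged might look like the sketch below. It assumes a single url(...) value (not a list of multiple backgrounds), and the function name extractCssUrl is made up for illustration.

```javascript
// Hypothetical helper: returns the URL inside a CSS url(...) value,
// or null when the value is not a single url(...) (for example "none").
function extractCssUrl(value) {
  // Group 1 is the optional opening quote; \1 requires the same quote
  // (or nothing) to close it. Group 2 is the URL itself.
  const match = value.match(/^url\(\s*(['"]?)(.*?)\1\s*\)$/);
  return match ? match[2] : null;
}

console.log(extractCssUrl('url("http://www.example.com/image.gif")'));
console.log(extractCssUrl("url(http://www.example.com/image.gif)"));
console.log(extractCssUrl("none"));
```

Using a backreference for the closing quote means a quote character of the other kind inside the URL does not cut the match short.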
You could split the string at each " and get the second element:
var foo = $('#id').css('background-image').split('"')[1];
Note: This doesn't work if your URL contains quotation marks.
If it's always the same, I'd just take the substring of the URL without the prefix.
For instance, if it's always:
url("<URL>")
url("<otherURL>")
The URL then always runs from index 5 of the string up to length - 2.
Not the best approach by any means, but probably faster than a regex if you're not worried about other string formats.
There is no special URL type - it's a string representing a CSS url value. You can get the URL back out with a regex:
var foo = $('#id').css('background-image');
var url = foo.match(/url\(['"](.*)['"]\)/)[1];
(that regex isn't foolproof, but it should work against whatever jQuery returns)
