I want to get a url from a string but I am unfamiliar with the use of regex or any such methods.
For example, I have 3 strings:
"I've navigated to www.facebook.com";
"I've navigated to www.facebook.com and to www.google.com";
"I've navigated to https://www.facebook.com";
In my case, I should get "www.facebook.com" as the URL extracted from the first string.
All I want is to get the first URL inside the string so I can make a link preview using an API I found. But I am struggling to get the URL using JavaScript or jQuery. The string will come from a textbox, and I want to get the URL on keyup.
You can use this function to extract the first URL found in the given string:
function getFirstUrl(string) {
  var pattern = /(https?:\/\/)?(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)/;
  var match = string.match(pattern);
  // match is null when the string contains no URL
  return match ? match[0] : null;
}
var string = "I've navigated to www.facebook.com and to www.google.com";
var url = getFirstUrl(string);
// url === 'www.facebook.com'
I got the regex pattern from this answer and modified it to make the https:// part optional.
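As a rough sketch of how this could feed a link preview, the snippet below runs the same pattern over the three strings from the question. The names firstUrlOrNull and withScheme are made up for illustration, and the idea that the preview API wants an absolute URL with a scheme is an assumption, not something stated in the question.

```javascript
// Same pattern as in the answer above (slashes escaped inside the character class).
const URL_PATTERN = /(https?:\/\/)?(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&\/=]*)/;

// Hypothetical helper: returns the first URL-like substring, or null if there is none.
function firstUrlOrNull(text) {
  const match = text.match(URL_PATTERN);
  return match ? match[0] : null;
}

// Hypothetical helper: prepends a scheme when it is missing
// (assumption: the preview API expects an absolute URL).
function withScheme(url) {
  return /^https?:\/\//.test(url) ? url : 'http://' + url;
}

const samples = [
  "I've navigated to www.facebook.com",
  "I've navigated to www.facebook.com and to www.google.com",
  "I've navigated to https://www.facebook.com",
  "no links here",
];

const results = samples.map(function (text) {
  const found = firstUrlOrNull(text);
  return found === null ? null : withScheme(found);
});
console.log(results);
```

On the page itself you would call firstUrlOrNull with the textbox value from inside your keyup handler and skip the preview request whenever it returns null.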
I'm trying to parse a specific part of a URL after a search, using
any language (ideally JavaScript, but open to Python).
How do I get a specific part of a URL and save/store it?
For example,
on songkick.com,
the way to get the artist_id is to check a specific part of the URL after searching for the artist name
in the search bar of the website.
In the case below,
the artist ID is 301329.
https://www.songkick.com/artists/301329-rac
I strongly believe there is a way to parse this part using either Python or JS,
given that I have a CSV file with artist names in one column. Instead of searching for all the artists one by one, I am looking for an algorithm that iterates over my CSV column, searches for each name, parses the resulting URL, and saves/stores the ID.
I would be very grateful for even a hint to get started.
Thank you so much always.
It can be done using regular expressions.
Here's an example of a JavaScript implementation
const url = "https://www.songkick.com/artists/301329-rac";
const regex = /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/;
const match = url.match(regex);
if (match) {
  console.log('Artist ID: ' + match[1]);
} else {
  console.log('No Artist ID found!');
}
This regular expression /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/ looks for https://www.songkick.com/artists/ followed by one or more digits (captured by the group), a dash, and then at least one more character.
The match() method retrieves the result of matching a string against a
regular expression.
Thus it returns an array with the full matched text at index 0 and the captured (\d+) group at index 1 (match[1] in our case), or null if there is no match.
If you're not sure of the protocol (http vs https) you can add a ? in the regex right after https. That makes the s in https optional. So the regex would become /https?:\/\/www\.songkick\.com\/artists\/(\d+)-.+/.
Let me know if you need more explanation.
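If you end up with a whole list of result URLs (for instance one per artist name from your CSV), a variation is to let the built-in URL class split off the path first and only apply the regex to that. This is just a sketch: artistIdFromUrl and the sample list are made-up names, and it assumes the ID is always the number right after /artists/.

```javascript
// Hypothetical helper: returns the artist ID as a string, or null if the URL
// is malformed or does not have the /artists/<digits> shape.
function artistIdFromUrl(href) {
  let pathname;
  try {
    pathname = new URL(href).pathname; // throws on strings that are not absolute URLs
  } catch (err) {
    return null;
  }
  // Digits right after /artists/, followed by a dash or the end of the path.
  const match = pathname.match(/^\/artists\/(\d+)(?:-|$)/);
  return match ? match[1] : null;
}

// Sample inputs for illustration only.
const sampleUrls = [
  'https://www.songkick.com/artists/301329-rac',
  'http://www.songkick.com/artists/301329-rac?utm=x',
  'https://www.songkick.com/venues/12345',
  'not a url',
];

const artistIds = sampleUrls.map(artistIdFromUrl);
console.log(artistIds);
```

Because the protocol and the query string are handled by URL, the regex only has to describe the path.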
First, you can simply use a regex.
In Python:
import re
url = 'https://www.songkick.com/artists/301329-rac'
# raw string, so \d and \w reach the regex engine unchanged
pattern = r'/artists/(\d+)-\w'
match = re.search(pattern, url)
if match:
    artist_id = match.group(1)  # '301329'
I hope this will help you.
So I currently pass two variables into the URL for use on another page. I get the last variable (i.e. #12345) with location.hash. Then from the other part of the URL (john%20jacob%202) all I need is the '2'. I've got it working, but I feel there must be a cleaner and more succinct way to handle this. The (john%20jacob%202) part will change all the time and have different string lengths.
url: http://localhost/index.html?john%20jacob%202?#12345
<script>
var hashUrl = location.hash.replace("?","");
// function here to use this data
var fullUrl = window.location.href;
var urlSplit = fullUrl.split('?');
var justName = urlSplit[1];
var nameSplit = justName.split('%20');
var justNumber = nameSplit[2];
// function here to use this data
</script>
A really quick one-liner could be something like:
let url = 'http://localhost/index.html?john%20jacob%202?#12345';
url.split('?')[1].split('').pop();
// returns '2' (the last character of the name segment, so this only works for a single-digit number)
How about something like
decodeURI(window.location.search).replace(/\D/g, '')
Since your window.location.search is URI encoded, we start by decoding it. Then we replace everything that is not a digit with nothing. For your particular URL it will return 2.
Edit for clarity:
Your example location http://localhost/index.html?john%20jacob%202?#12345 consists of several parts, but the interesting one here is the part after the ? and before the #.
In JavaScript this interesting part, the query string (or search), is available through window.location.search. For your specific location window.location.search will return ?john%20jacob%202?.
The %20 is a URI encoded space. To decode (i.e. remove) all the URI encodings I first run the search string through the decodeURI function. Without that step the digits in %20 would end up in the result. Then I replace everything that is not a digit in that string with an empty string using a regular expression.
The regular expression /\D/ matches any character that is not a digit, and the g is a modifier specifying that I want to match everything (not just stop after the first match), resulting in 2.
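One thing to watch with the replace approach is that it glues together every digit in the query string, so a name that itself contains a digit would change the result. If that can happen, a variation is to capture only the last run of digits. The sketch below takes the search string as a parameter so it can be tried outside the browser; trailingNumber is a made-up name.

```javascript
// Hypothetical helper: returns the last run of digits in the decoded
// query string as a number, or null if there are no digits at all.
function trailingNumber(search) {
  const decoded = decodeURIComponent(search); // turns %20 back into a space
  // One or more digits, followed only by non-digits up to the end of the string.
  const match = decoded.match(/(\d+)\D*$/);
  return match ? Number(match[1]) : null;
}

console.log(trailingNumber('?john%20jacob%202?'));
console.log(trailingNumber('?john2%20jacob%2013?'));
console.log(trailingNumber('?john%20jacob'));
```

In the page you would pass window.location.search as the argument.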
If you know you are always after the hash value, you could remove everything up to and including the "#":
url.replace(/^.+#/, '');
Alternatively, this regex will match the last run of digits in your URL:
url.match(/(?<=\D)\d+$/);
// (positive lookbehind for any non-digit) one or more digits up to the end of the string
I am making a web app that has multiple 'pages', but it will all be loaded client side. Since it is all technically on the same page, I will be using parameters after # to track the current page state while preventing postbacks. My problem is that I can't seem to select all the parameters with a single regex. The regex in split works when I use an online testing tool but does not work when I use it on my web page.
//Test data for url
//https://test.ca?hi&hey=3&test=oh+hi+mark#edit&e=1
var split = /([^&#=]+)=?([^&#]*)/g;
var url = window.location.href;
var match = split.exec(url);
//this outputs match with a length of three
//[0] = 'https://test.ca?hi'
//[1] = 'https://test.ca?hi'
//[2] = ''
I thought this should be a solved problem, but I can't seem to find an answer, which I guess leads to another question: am I going about this the completely wrong way?
You are using the regex wrong. You just print the whole match, while you need to access the captured groups while iterating through all the matches inside 1 string.
Here is an example snippet:
var re = /([^&#=]+)=?([^&#]*)/g;
var str = 'https://test.ca?hi&hey=3&test=oh+hi+mark#edit&e=1';
var match;
while ((match = re.exec(str)) !== null) {
  document.write(match[1] + "<br/>" + match[2] + "<br/><br/>");
}
Note that the first match is the "main" part of the URL. Subsequent matches are param-value pairs.
Try using window.location.hash instead. It will return the hash value (in your example url it would be #edit&e=1) and you can use string operations to do whatever you need to with that.
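If the part after # always looks like a query string, one option for the "string operations" is the built-in URLSearchParams class. This is only a sketch under that assumption; parseHashParams is a made-up name and the hash is passed in as a parameter so the snippet also runs outside the browser.

```javascript
// Hypothetical helper: turns '#edit&e=1' into { edit: '', e: '1' }.
function parseHashParams(hash) {
  const withoutHash = hash.replace(/^#/, ''); // drop the leading '#', if any
  const params = new URLSearchParams(withoutHash);
  return Object.fromEntries(params); // later duplicates of a key overwrite earlier ones
}

const state = parseHashParams('#edit&e=1');
console.log(state);

// URLSearchParams also decodes '+' and percent escapes in values.
const decoded = parseHashParams('#test=oh+hi+mark');
console.log(decoded.test);
```

Keys without a value (like edit above) come back as empty strings, which you can treat as flags.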
I have a list of urls like this:
http://www.mylocal.com
http://v1.mylocal.com
http://v2.mylocal.com
http://www.mylocal2.com
http://www.mylocal3.com
And I want to write some JS such that if I define the search string as "*.mylocal.com", it returns www.mylocal.com, v1.mylocal.com and v2.mylocal.com. And if the search string is "www.mylocal.com", it returns only www.mylocal.com.
How should I write it?
The following regex will match what you want when given a host string. It has to be built after host is known and rebuilt whenever host changes, so it is wrapped in a small function here. Note that the dots in host are not escaped, so each one matches any character:
function hostRegex(host) {
  return new RegExp('^https?://([^.]*' + host + ')');
}
So, for example:
var reg = hostRegex('.mylocal.com');
reg.exec('http://www.mylocal.com'); // ["http://www.mylocal.com", "www.mylocal.com"]
reg.exec('http://v1.mylocal.com/path'); // ["http://v1.mylocal.com", "v1.mylocal.com"]
reg.exec('https://v3.mylocal.com'); // ["https://v3.mylocal.com", "v3.mylocal.com"]
reg = hostRegex('www.mylocal.com');
reg.exec('http://www.mylocal.com'); // ["http://www.mylocal.com", "www.mylocal.com"]
reg.exec('http://v1.mylocal.com/path'); // null
reg.exec('https://v3.mylocal.com'); // null
You could also refer to the following post for a full URI regex:
Regular expression validation for URL in ASP.net.
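If you want to accept the "*.mylocal.com" search string from the question literally, a possible variation is to escape the search string and translate each * yourself. This is a sketch, not a full URL validator: hostMatcher and filterHosts are made-up names, and it assumes * should stand for exactly one host label (no dots).

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a regex from a host pattern such as '*.mylocal.com'.
function hostMatcher(pattern) {
  // Escape the literal pieces and put "one label" wherever a '*' was.
  const source = pattern.split('*').map(escapeRegExp).join('[^.]+');
  // The lookahead makes sure the host ends here (path, port, query, hash or end of string).
  return new RegExp('^https?://(' + source + ')(?=[/:?#]|$)');
}

// Hypothetical helper: returns the hosts of all URLs that match the pattern.
function filterHosts(urls, pattern) {
  const re = hostMatcher(pattern);
  return urls
    .map(function (url) { return re.exec(url); })
    .filter(Boolean)
    .map(function (match) { return match[1]; });
}

const urls = [
  'http://www.mylocal.com',
  'http://v1.mylocal.com',
  'http://v2.mylocal.com',
  'http://www.mylocal2.com',
  'http://www.mylocal3.com',
];

console.log(filterHosts(urls, '*.mylocal.com'));
console.log(filterHosts(urls, 'www.mylocal.com'));
```

Escaping the dots means a pattern like www.mylocal.com can no longer match a host such as wwwxmylocal.com by accident.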
If you want to search on each part of the URL, then do just that.
Split the URL into 3 search strings, then match each one against the corresponding part of your split search term. This way you can control matching at the beginning and end of each term and, if you like, order the remaining terms appropriately.
With the script I'm making, jQuery gets variables from a URL parameter. The value it gets is itself a URL, so if it's something like
http://localhost/index.html?url=http://www.example.com/index.php?something=some
it reads:
url = http://www.example.com/index.php?something
If it's like
http://localhost/index.html?url=http://www.example.com/index.php?something%3Dsome
it reads:
url = http://www.example.com/index.php?something%3Dsome
which would register as a valid URL. My question is: how can I search for the = sign in the url variable and replace it with the hex code %3D using jQuery or JavaScript?
Use the (built-in) encodeURIComponent() function:
url = 'http://localhost/index.html?url=' +
encodeURIComponent('http://www.example.com/index.php?something=some');
Are you looking for encodeURIComponent and decodeURIComponent?
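To see the round trip, here is a small sketch that encodes the inner URL and then reads it back with the built-in URL class. The variable names are just for illustration; your own parameter parsing may differ, but any parser that decodes percent escapes should give the same result.

```javascript
const target = 'http://www.example.com/index.php?something=some';

// Encode the whole inner URL so its '?', '=' and '/' cannot be confused
// with the separators of the outer URL.
const pageUrl = 'http://localhost/index.html?url=' + encodeURIComponent(target);
console.log(pageUrl);

// Reading it back: searchParams.get() decodes the percent escapes again.
const recovered = new URL(pageUrl).searchParams.get('url');
console.log(recovered === target);
```

decodeURIComponent(encodeURIComponent(target)) gives back the original string as well, if you are parsing the parameters by hand.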