I have a list of urls like this:
http://www.mylocal.com
http://v1.mylocal.com
http://v2.mylocal.com
http://www.mylocal2.com
http://www.mylocal3.com
And I want to write some JS so that if I define the search string as "*.mylocal.com", it returns www.mylocal.com, v1.mylocal.com and v2.mylocal.com. And if the search string is "www.mylocal.com", it returns only www.mylocal.com.
How should I write it?
The following function builds a regex that will match what you want when given a host string:
function hostRegex(host) {
  // Build the regex only once host is known, and escape its dots so they match literally.
  return new RegExp('^https?://([^.]*' + host.replace(/\./g, '\\.') + ')');
}
So, for example:
var reg = hostRegex('.mylocal.com');
reg.exec('http://www.mylocal.com'); // ["http://www.mylocal.com", "www.mylocal.com"]
reg.exec('http://v1.mylocal.com/path'); // ["http://v1.mylocal.com", "v1.mylocal.com"]
reg.exec('https://v3.mylocal.com'); // ["https://v3.mylocal.com", "v3.mylocal.com"]
reg = hostRegex('www.mylocal.com');
reg.exec('http://www.mylocal.com'); // ["http://www.mylocal.com", "www.mylocal.com"]
reg.exec('http://v1.mylocal.com/path'); // null
reg.exec('https://v3.mylocal.com'); // null
You could also refer to the following post for a full URI regex:
Regular expression validation for URL in ASP.net.
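As a complementary sketch, the wildcard search string from the question can be translated into a regex and tested against each URL's host name. The helper names wildcardToRegExp and filterHosts are hypothetical, and the sketch assumes that "*" stands for exactly one dot-free host label.

```javascript
// Hypothetical helper: turn a pattern such as "*.mylocal.com" into a RegExp.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except "*"
    .replace(/\*/g, '[^.]+');              // assumption: "*" is one dot-free label
  return new RegExp('^' + escaped + '$', 'i');
}

// Hypothetical helper: return the host names of the URLs that match the pattern.
function filterHosts(urls, pattern) {
  const re = wildcardToRegExp(pattern);
  return urls
    .map((u) => new URL(u).hostname)
    .filter((h) => re.test(h));
}

const urls = [
  'http://www.mylocal.com',
  'http://v1.mylocal.com',
  'http://v2.mylocal.com',
  'http://www.mylocal2.com',
  'http://www.mylocal3.com',
];

console.log(filterHosts(urls, '*.mylocal.com'));
// -> ["www.mylocal.com", "v1.mylocal.com", "v2.mylocal.com"]
console.log(filterHosts(urls, 'www.mylocal.com'));
// -> ["www.mylocal.com"]
```

Matching against the parsed hostname rather than the raw string means paths and query strings cannot affect the result.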
If you want to search on each part of the URL, then do just that.
Split the URL into its 3 parts (for example www, mylocal and com) and split the search string the same way, then match each part against the corresponding search term. This way you control matching at the beginning and end of each term and, if you like, can order the rest of the terms appropriately.
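A minimal sketch of that part-by-part comparison, assuming the parts are the dot-separated labels of the host name and that "*" in the search string accepts any single label. The function name hostMatches is hypothetical.

```javascript
// Hypothetical helper: compare a host name and a search pattern label by label.
function hostMatches(host, pattern) {
  const hostParts = host.toLowerCase().split('.');
  const patternParts = pattern.toLowerCase().split('.');
  // Assumption: both must have the same number of labels.
  if (hostParts.length !== patternParts.length) return false;
  // Each pattern label must be "*" or equal to the host label at that position.
  return patternParts.every((p, i) => p === '*' || p === hostParts[i]);
}

console.log(hostMatches('v1.mylocal.com', '*.mylocal.com'));   // -> true
console.log(hostMatches('www.mylocal2.com', '*.mylocal.com')); // -> false
console.log(hostMatches('v1.mylocal.com', 'www.mylocal.com')); // -> false
```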
I'm trying to parse a specific part of a URL after a search, using
any language (ideally JavaScript, but I'm open to Python).
How do I get a specific part of a URL and save/store it?
For example,
on songkick.com,
the way to get the artist_id is to check a specific part of the URL after searching for the artist name
in the search bar of the website.
In the case below,
the artist id is 301329.
https://www.songkick.com/artists/301329-rac
I strongly believe there is a way to parse this part using either Python or JS,
given that I have a CSV file with artist names in one column. Instead of searching for all the artists one by one, I'm wondering about an algorithm that iterates over my CSV column, searches for each name, parses the URL, and saves/stores the ID.
I would be very grateful for even a hint to start with.
Thank you so much.
It can be done using regular expressions.
Here's an example of a JavaScript implementation
const url = "https://www.songkick.com/artists/301329-rac";
const regex = /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/;
const match = url.match(regex);
if (match) {
console.log('Artist ID: ' + match[1]);
} else {
console.log('No Artist ID found!');
}
This regular expression /https:\/\/www\.songkick\.com\/artists\/(\d+)-.+/ means that we're trying to match something that starts with https://www.songkick.com/artists/, followed by a group of digits, a dash, and then one or more characters of any kind.
The match() method retrieves the result of matching a string against a
regular expression.
Thus it returns an array with the whole matched string at index 0 and the captured (\d+) group at index 1 (match[1] in our case).
If you're not sure of the protocol (http vs https) you can add a ? in the regex right after https. That makes the s in https optional. So the regex would become /https?:\/\/www\.songkick\.com\/artists\/(\d+)-.+/.
Let me know if you need more explanation.
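To apply the same idea to many URLs, here is a minimal sketch that parses each one with the URL object and collects the IDs. The function name artistIdFromUrl and the second and third sample URLs are hypothetical, and how the artist URLs are obtained from the CSV names is left out.

```javascript
// Hypothetical helper: return the numeric artist id from an artist page URL, or null.
function artistIdFromUrl(url) {
  const { pathname } = new URL(url);
  // The id is the run of digits right after "/artists/".
  const m = pathname.match(/^\/artists\/(\d+)(?:-|$)/);
  return m ? m[1] : null;
}

const artistUrls = [
  'https://www.songkick.com/artists/301329-rac',
  'https://www.songkick.com/artists/123-made-up-example', // hypothetical URL
  'https://www.songkick.com/search?query=rac',            // hypothetical, no id in it
];

// Store the results keyed by URL.
const idsByUrl = {};
for (const u of artistUrls) {
  idsByUrl[u] = artistIdFromUrl(u);
}
console.log(idsByUrl);
```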
First, you can simply use a regex.
In Python:
import re
url = 'https://www.songkick.com/artists/301329-rac'
pattern = r'/artists/(\d+)-\w'  # raw string so \d and \w reach the regex engine unchanged
match = re.search(pattern, url)
if match:
    artist_id = match.group(1)  # '301329'
I hope this will help you.
I have a Javascript array of string that contains urls like:
http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF
http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA
http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD
http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD
I want to extract the "first" query parameter values from these URLs.
I mean I need the values DSPN47ZTE1BGMR, RTR22414242144, KOSDFASEWQESAS, POLUJYUSD41234, 54SADFD14242RD.
Because I am not good at regex, I couldn't find a way to extract these values from the array. Any help will be appreciated.
Instead of using regex, why not just create a URL object out of the string and extract the parameters natively?
let url = new URL("http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD");
console.log(url.searchParams.get("first")); // -> "54SADFD14242RD"
If you don't know the name of the first parameter, you can still search the query string manually after parsing the string with the URL constructor.
let url = new URL("http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD");
console.log(url.search.match(/^\?[^=&]*=([^&]*)/)[1]); // -> "54SADFD14242RD"
The index [1] selects the first capture group, which here is the value of the first parameter (index zero is the whole matched string). Note that .match returns null when nothing matches, so the code above throws an error if there are no parameters in the URL.
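Since the question starts from an array of strings, here is a short sketch that applies searchParams.get to every element. The variable names are illustrative only.

```javascript
const urls = [
  'http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD',
  'http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF',
  'http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA',
  'http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD',
  'http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD',
];

// searchParams.get returns null when the parameter is absent.
const firstValues = urls.map((u) => new URL(u).searchParams.get('first'));

console.log(firstValues);
// -> ["DSPN47ZTE1BGMR", "RTR22414242144", "KOSDFASEWQESAS", "POLUJYUSD41234", "54SADFD14242RD"]
```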
Does it have to use regex? Would something like the following work:
var x = 'http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD';
x.split('?first=')[1].split('&second')[0];
Try this regex:
first=([^&]*)
Capture the contents of Group 1
Explanation:
first= - matches first=
([^&]*) - matches 0+ occurrences of any character that is not a & and stores the result in Group 1
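A minimal sketch of using this pattern on the array from the question. The leading [?&] is an addition (an assumption, not part of the pattern above) so that a parameter such as notfirst= is not picked up by accident.

```javascript
const urls = [
  'http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD',
  'http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF',
  'http://www.example.com.tr/?second=ONLYSECOND', // hypothetical URL without "first"
];

// Group 1 holds everything after "first=" up to the next "&".
const re = /[?&]first=([^&]*)/;

const values = urls.map((u) => {
  const m = re.exec(u);
  return m ? m[1] : null; // null when the parameter is missing
});

console.log(values); // -> ["DSPN47ZTE1BGMR", "RTR22414242144", null]
```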
You can use
(?<=\?first=)[^&]+
(?<=\?first=) - positive lookbehind to match ?first=
[^&]+ - matches one or more characters up to the next & (a lazy +? here would stop after a single character)
Without a positive lookbehind, you can do it like this:
let str = `http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF
http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA
http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD
http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD`
let op = str.match(/\?first=([^&]+)/g).map(e=> e.split('=')[1])
console.log(op)
I currently pass two variables in the URL for use on another page. I get the last variable (i.e. #12345) with location.hash. Then from the other part of the URL (john%20jacob%202) all I need is the '2'. I've got it working but feel there must be a cleaner, more succinct way to handle this. The (john%20jacob%202) part will change all the time and have different string lengths.
url: http://localhost/index.html?john%20jacob%202?#12345
<script>
var hashUrl = location.hash.replace("?","");
// function here to use this data
var fullUrl = window.location.href;
var urlSplit = fullUrl.split('?');
var justName = urlSplit[1];
var nameSplit = justName.split('%20');
var justNumber = nameSplit[2];
// function here to use this data
</script>
A really quick one-liner could be something like:
let url = 'http://localhost/index.html?john%20jacob%202?#12345';
url.split('?')[1].split('').pop();
// returns '2' (only the last character, so this assumes a single-digit number)
How about something like
decodeURI(window.location.search).replace(/\D/g, '')
Since your window.location.search is URI encoded we start by decoding it. Then replace everything that is not a number with nothing. For your particular URL it will return 2
Edit for clarity:
Your example location http://localhost/index.html?john%20jacob%202?#12345 consists of several parts, but the interesting one here is the part after the ? and before the #.
In Javascript this interesting part, the query string (or search), is available through window.location.search. For your specific location window.location.search will return ?john%20jacob%202?.
The %20 is a URI encoded space. To decode (ie. remove) all the URI encodings I first run the search string through the decodeURI function. Then I replace everything that is not a number in that string with an empty string using a regular expression.
The regular expression /\D/ matches any character that is not a number, and the g is a modifier specifying that I want to match everything (not just stop after the first match), resulting in 2.
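If the name itself may contain digits, stripping every non-digit would merge them with the number. A sketch that takes only the last run of digits in the decoded query string, and the hash alongside it, might look like this. The function name parseNameUrl is hypothetical, and the URL is passed in as a string so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: pull the trailing number and the hash out of the URL.
function parseNameUrl(href) {
  const u = new URL(href);
  // u.search is "?john%20jacob%202?" for the example; decode the %20 escapes.
  const decoded = decodeURIComponent(u.search);
  // Last run of digits, followed only by non-digits up to the end of the query.
  const m = decoded.match(/(\d+)\D*$/);
  return {
    number: m ? m[1] : null,
    hash: u.hash.replace('#', ''), // "12345" for the example
  };
}

console.log(parseNameUrl('http://localhost/index.html?john%20jacob%202?#12345'));
// -> { number: "2", hash: "12345" }
```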
If you know you always want the part after the hash, you could strip everything up to and including the "#":
url.replace(/^.+#/, '');
Alternatively, this regex will match the last numbers in your URL:
url.match(/(?<=\D)\d+$/);
// (positive lookbehind for any non-digit) then one or more digits until the end of the string
I'm stuck on a regex problem, as I am very new to regex.
I have the URL below:
var url = "https://website.com/something-here/page.html?p=null#confirmation?order=123";
My expected result is:
/something-here/page.html #confirmation
The separator could be a space or a comma, or the two parts could simply be joined (/something-here/page.html#confirmation).
I can do this using two regex below:
var a= url.match(/som([^#]+).html/)[0];
var b= url.match(/#([^#]+).tion/)[0];
console.log(a,b);
But I would like to have it done as a single regex with the same result.
You can use RegExp's group system to your advantage. Here's a snippet:
var matches = url.match(/(\/som[^#]+\.html).*?(#[^#]+.tion)/);
console.log(matches[1] + " " + matches[2]); // prints /something-here/page.html #confirmation
I combined your two RegExp conditions into one, enclosing each in parentheses to create two groups. I also added the leading \/ so that the first group includes the slash shown in your expected result.
That way, you can get the specified group and add a space in between.
Aside from the fact that your example URL is malformed (it contains two "?" separators) and therefore not very suitable to work with, I have a proposition:
Why not use the URL object and its properties?
url = new URL("https://website.com/something-here/page.html?p=null#confirmation?order=123");
and grab exactly the properties you need with explicit syntax, as in:
url.pathname; // -> "/something-here/page.html"
url.hash; // -> "#confirmation?order=123"
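A short sketch of getting from those two properties to the result the question asks for, assuming that everything from the "?" inside the fragment onward should be dropped:

```javascript
const url = new URL('https://website.com/something-here/page.html?p=null#confirmation?order=123');

// url.hash is "#confirmation?order=123"; keep only the part before the "?".
const fragment = url.hash.split('?')[0];

const result = url.pathname + ' ' + fragment;
console.log(result); // -> "/something-here/page.html #confirmation"
```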
But in case you explicitly need a RegExp variant, here is one:
var url = "https://website.com/something-here/page.html?p=null#confirmation?order=123";
var match = url.match(/\/som.*?html|\#.*?tion/g);
console.log(match.join(" "));
Put each of your conditions in its own group "( )".
str = 'http://*.foo.com/bar/' is a valid string.
How do I write a regex to validate this in JavaScript?
`http://xyz.foo.com/bar/` ✓ valid
`http://xyz.foo.com/bar/abc/` ✗ invalid
`http://xyz.foo.com/` ✗ invalid
Try playing around at RegExr. It has a lot of good information and will give you a JavaScript regex at the bottom of the page when you are done.
Try this:
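var url = 'http://xyz.foo.com/bar/'; // your url
url.match(/^http:\/\/[a-zA-Z_0-9]+\.foo\.com\/bar\/$/);
The slashes must be escaped inside a regex literal. The ^ and $ match the start and end of the string, so the $ at the end makes sure that there is no text after /bar/.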
Can you give some more details?
e.g. /^http:\/\/[a-zA-Z]+\.foo\.com\/bar\/$/
will do what you're looking for if the URL is the entire string (because of the ^ and $ anchors). It will match xyz but not xyz1, which is easy to fix if you want to include digits.
Play around with it and let me know if you have more questions.
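As a quick check, here is a sketch that runs an anchored pattern of this kind against the three examples from the question. The character class [a-zA-Z0-9-] for the subdomain is an assumption about what "*" should accept.

```javascript
// Assumption: the wildcard label may contain letters, digits and hyphens.
const pattern = /^http:\/\/[a-zA-Z0-9-]+\.foo\.com\/bar\/$/;

const samples = [
  'http://xyz.foo.com/bar/',
  'http://xyz.foo.com/bar/abc/',
  'http://xyz.foo.com/',
];

const results = samples.map((s) => pattern.test(s));
console.log(results); // -> [true, false, false]
```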
You can assign some of the location object's properties to your own variables once the page loads (since location is free to change afterward):
var hostURL = location.host; // should be '*.foo.com'
var pathURL = location.pathname; // should be '/bar/'
Then create a RegExp object:
var regex = '.*\\.foo\\.com/bar/$'; // backslashes must be doubled inside a string literal
var testURL = new RegExp(regex);
And test the URL:
if (testURL.test(hostURL + pathURL)) {
//do something
}
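An alternative sketch that validates an arbitrary string instead of the current page, by parsing it with the URL constructor and checking each component separately. The function name isValidFooBarUrl is hypothetical, and the sketch assumes that exactly one subdomain label, no query string and no fragment are allowed.

```javascript
// Hypothetical helper: true only for http://<one label>.foo.com/bar/
function isValidFooBarUrl(str) {
  let u;
  try {
    u = new URL(str); // throws a TypeError for strings that are not absolute URLs
  } catch (e) {
    return false;
  }
  return (
    u.protocol === 'http:' &&
    /^[^.]+\.foo\.com$/.test(u.hostname) && // exactly one label before .foo.com
    u.pathname === '/bar/' &&
    u.search === '' &&
    u.hash === ''
  );
}

console.log(isValidFooBarUrl('http://xyz.foo.com/bar/'));     // -> true
console.log(isValidFooBarUrl('http://xyz.foo.com/bar/abc/')); // -> false
console.log(isValidFooBarUrl('http://xyz.foo.com/'));         // -> false
```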
This regex oughta do it for you.
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov)\b
Assign the above to a variable and check whether the pattern matches the URL of your choice using JavaScript's test() method. Adjust it to suit your needs.