I am trying to create a regex that returns true only when the string contains nothing but an anchor, like so: Link.
Currently I have the following regex, which almost works: /(?=.*^<a)(?=.*<\/a>).*/g
This works for the following scenarios:
Link - Matches successfully
words before Link - No matches - success
Link words after - finds match - Not successful :(
I think I'm pretty close, I just need to know how to not find a match if there are any characters after the </a>.
Add a $ after the last >
(?=.*^<a)(?=.*<\/a>$).*
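For a quick check, the anchored pattern can be run against the three cases from the question. This is only a sketch: the sample strings (including the href="#" attribute) and the helper name isOnlyAnchor are made up for illustration, and the g flag is dropped because test() on a g-flagged regex keeps state in lastIndex between calls.

```javascript
// True when the string starts with "<a" and ends with "</a>".
// Note: it does not verify that there is exactly one anchor in between.
const onlyAnchor = /(?=.*^<a)(?=.*<\/a>$).*/;

// Hypothetical helper name, not part of the original answer.
function isOnlyAnchor(text) {
  return onlyAnchor.test(text);
}

console.log(isOnlyAnchor('<a href="#">Link</a>'));              // true
console.log(isOnlyAnchor('words before <a href="#">Link</a>')); // false
console.log(isOnlyAnchor('<a href="#">Link</a> words after'));  // false
```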
Alternatively, this regex matches links and discards everything else, regardless of whether there is one link or several on the same line.
<a.*?\/a>
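As a rough illustration of how that pattern behaves with several links on one line, here is a small sketch. The function name extractLinks and the sample text are assumptions for the example only. Keep in mind that "<a" also matches the start of other tags such as <abbr>, so this is not a real HTML parser.

```javascript
// The lazy .*? stops at the first "/a>" after each "<a", so adjacent links stay separate.
const linkPattern = /<a.*?\/a>/g;

// Hypothetical helper: returns every link found in the text, or an empty array.
function extractLinks(text) {
  // match() with a g-flagged regex returns all matches, or null when there are none.
  return text.match(linkPattern) || [];
}

const found = extractLinks('see <a href="#one">One</a> and <a href="#two">Two</a> here');
console.log(found.length); // 2
console.log(found[0]);     // <a href="#one">One</a>
```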
Related
I need a regex to verify if the textarea has one of the following matches:
[img]https://example.com/image.jpg[/img]
[img=https://example.com/image.jpg]
This is what I've been trying so far, but it doesn't work, sadly...
/\[img(?=|\])(https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*))(?\])(?\[\/img\])/gi
Thank you.
You can use this: ^\[img(?:\].+\[\/img\]|=.+\])$
Note: If you want to verify that the link is a valid URL, replace both .+ with a URL-matching pattern.
Explanation
^\[img - This part is common to both strings; it matches the [img at the start of the line.
(?:\].+\[\/img\]|=.+\])$ - This matches one of two alternatives, depending on the character that follows [img.
First alternative (next character is ]) - In this case \].+\[\/img\] is matched. It consumes everything (.+) between the opening [img] and closing [/img] tags, then matches the closing tag itself.
Second alternative (next character is =) - In this case =.+\] is matched. This grabs everything after img= up to the final ] (.+ is greedy, so any earlier ] is consumed as well).
Finally, $ matches the end of the line.
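To see the two alternatives in action, something like the following could be used. The function name isImgTag is hypothetical, and the textarea value is assumed to be a single line that has already been read into a string.

```javascript
// Same pattern as in the answer, written as a JavaScript regex literal.
const imgTag = /^\[img(?:\].+\[\/img\]|=.+\])$/;

// Hypothetical helper: true when the whole string is one of the two [img] forms.
function isImgTag(text) {
  return imgTag.test(text);
}

console.log(isImgTag('[img]https://example.com/image.jpg[/img]')); // true
console.log(isImgTag('[img=https://example.com/image.jpg]'));      // true
console.log(isImgTag('[img]https://example.com/image.jpg'));       // false (no closing tag)
console.log(isImgTag('text [img=https://example.com/image.jpg]')); // false (leading text)
```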
I'm trying to match a URL's path (window.location.pathname) but exclude anything further down the path.
I want to match the following:
/admin/sites/{2-6 digit number}{/ exclude the rest}
Examples
/admin/sites/123 - true
/admin/sites/1 - false
/admin/sites/123/foo - false
I've got as far as the following regex but can't seem to figure out the rest.
/admin\/sites\/[0-9]/.test(window.location.pathname)
/^\/admin\/sites\/\d{2,6}$/
The $ anchors the expression to the end of the string, so the path must end with the digits.
I also included the ^ so that it must start with /admin.
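Run against the examples from the question, the anchored pattern behaves like this. The sketch takes the pathname as a plain string argument (instead of reading window.location.pathname) so it can be tried outside a browser; the function name isSitePath is made up.

```javascript
// Both anchors are required: ^ pins the start, $ rejects anything after the digits.
const sitePath = /^\/admin\/sites\/\d{2,6}$/;

// Hypothetical helper; in a page you would pass window.location.pathname.
function isSitePath(pathname) {
  return sitePath.test(pathname);
}

console.log(isSitePath('/admin/sites/123'));     // true
console.log(isSitePath('/admin/sites/1'));       // false (only one digit)
console.log(isSitePath('/admin/sites/123/foo')); // false (extra path segment)
console.log(isSitePath('/admin/sites/1234567')); // false (seven digits)
```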
If you want to match up to the / after the digits, you need the following regex:
^\/admin\/sites\/[0-9]{2,6}(?=\/)
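The lookahead version behaves differently from the fully anchored one: it returns the prefix up to the digits, but only when a / actually follows them. A small sketch (the helper name sitePrefixOf is an assumption):

```javascript
// (?=\/) requires a slash after the digits but does not include it in the match.
const sitePrefix = /^\/admin\/sites\/[0-9]{2,6}(?=\/)/;

// Hypothetical helper: returns the matched prefix, or null when nothing matches.
function sitePrefixOf(pathname) {
  const m = pathname.match(sitePrefix);
  return m ? m[0] : null;
}

console.log(sitePrefixOf('/admin/sites/123/foo')); // /admin/sites/123
console.log(sitePrefixOf('/admin/sites/123'));     // null (no slash after the digits)
```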
I am matching up stored URLs to the current URL and having a little bit of an issue - the regex works fine when being matched against the URL itself, but for some reason all sub-directories match too (when I want a direct match only of course).
Say the user stores www.facebook.com; this should match both http://www.facebook.com and https://www.facebook.com, and it does.
The problem is it is also matching sub-directories such as https://www.facebook.com/events/upcoming etc.
The regex for example:
/.+:\/\/www\.facebook\.com/
Matches the following:
https://www.facebook.com/events/upcoming
When it should just be matching
http://www.facebook.com/
https://www.facebook.com/
How can I fix this seemingly broken regex?
If you're being really specific about what you want to match, why not reflect that in your RegExp?
/^https?:\/\/(?:(?:www|m)\.)?facebook\.com\/?$/
http or https
www., m. or no subdomain
facebook.com
Edit: included an optional trailing slash.
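A quick way to convince yourself that this stricter pattern rejects sub-directories is to run it over a few URLs. The helper name isFacebookRoot and the sample list are illustrative only.

```javascript
// http or https, optional www. or m. subdomain, optional trailing slash, nothing else.
const facebookRoot = /^https?:\/\/(?:(?:www|m)\.)?facebook\.com\/?$/;

// Hypothetical helper for the example.
function isFacebookRoot(url) {
  return facebookRoot.test(url);
}

console.log(isFacebookRoot('http://www.facebook.com'));                  // true
console.log(isFacebookRoot('https://www.facebook.com/'));                // true
console.log(isFacebookRoot('https://m.facebook.com'));                   // true
console.log(isFacebookRoot('https://www.facebook.com/events/upcoming')); // false
```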
Put an end marker $, like:
/.+:\/\/www\.facebook\.com\/$/
but really should have a start marker ^ too, like:
/^https?:\/\/www\.facebook\.com\/$/
Also, if you're matching the current domain, you may as well just match location.host rather than location.href.
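One way to follow the host-based suggestion without a regex at all is to let the URL parser do the splitting. This is a sketch under a few assumptions: the stored value is a bare host such as www.facebook.com, "direct match" means the root path only, and the function name isStoredSiteRoot is invented for the example. In a page, location.host and location.pathname give the same two pieces for the current URL.

```javascript
// Compare the parsed host and path instead of pattern-matching the whole href.
// Query strings and fragments are ignored here; tighten this if they matter.
function isStoredSiteRoot(href, storedHost) {
  const url = new URL(href);
  const isHttp = url.protocol === 'http:' || url.protocol === 'https:';
  return isHttp && url.host === storedHost && url.pathname === '/';
}

console.log(isStoredSiteRoot('http://www.facebook.com', 'www.facebook.com'));                  // true
console.log(isStoredSiteRoot('https://www.facebook.com/', 'www.facebook.com'));                // true
console.log(isStoredSiteRoot('https://www.facebook.com/events/upcoming', 'www.facebook.com')); // false
```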
Try adding a $ at the end of your regex. It's the symbol for end of string.
I have been trying to match just the user ID or vanity part of the URI for Google+ accounts. I am using GAS (Google Apps Script), into which I've loaded XRegExp to help match Unicode characters.
So far I have this: ((https?://)?(plus\.)?google\.com/)?(.*/)?([a-zA-Z0-9._]*)($|\?.*) but, as the regex tests (external site) show, it still doesn't match just the right parts.
I've tried using \p{L} inside of [a-zA-Z0-9._] but no luck with that. Also, I end up with an extra forward slash at the end of the profile name when it does match.
UPDATE #1: I am trying to fix some G+ URLs in a spreadsheet copied from a Google Form. The links are not all the same, and the simplest profile link is "https://plus.google.com/" + user ID OR vanity name.
UPDATE #2: So far I have ([+]\w+|[0-9]{21})(?:\/)?(?:\w+)?$ which uses @demrks's simplified version of @guest271314's response. However, there are two problems:
1) Google vanity URLs can have Unicode in them. Example: https://plus.google.com/u/0/+JoseManuelGarcía_ertatto, which fails. I have tried to use \p{L} but can't seem to get it right.
2) GAS doesn't seem to like it, even though the regex tests work on this site. =(
UPDATE #3: It seems GAS just hates using \w so I've had to expand it. So I have this so far:
/([+][A-Za-z0-9-_]+|[0-9]{21})(?:\/)?(?:[A-Za-z0-9-_]+)?$/
This matches even with "/about" or "/posts" at the end of the URL. However, it still doesn't match Unicode. =( I am still working on that.
UPDATE #4: So this seems to work:
/([+][\\w-_\\p{L}]+|[\\d]{21})(?:\/)?(?:[\\w-_]+)?$/
Looks like I needed to use double backslashes inside the character classes. So this seems to work so far. Not sure if there is a shorter way to write this, however.
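The double backslashes are what you would expect whenever a pattern is passed as a string: the string parser consumes one level of escaping before the regex engine sees it. The sketch below shows this with the built-in RegExp and the u flag rather than XRegExp, which is an assumption about the runtime (it needs Unicode property escape support, and GAS with XRegExp may behave differently). The pattern is also simplified to the capturing part only.

```javascript
// In a string, "\\d" is the two characters \d; a regex literal needs only one backslash.
const fromString = new RegExp("([+][\\w\\p{L}-]+|\\d{21})", "u");
const fromLiteral = /([+][\w\p{L}-]+|\d{21})/u;

// Both spellings compile to the same pattern source.
console.log(fromString.source === fromLiteral.source); // true

// \p{L} covers letters such as í that \w alone does not match.
const vanity = 'https://plus.google.com/u/0/+JoseManuelGarcía_ertatto'.match(fromString);
console.log(vanity[1]); // +JoseManuelGarcía_ertatto
```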
Edit, updated
Try (v4)
document.URL.match(/\++\w+.*|\d+\d|\/+\w+$/).toString()
.replace(/\/+|posts|about|photos|videos|plusones|reviews/g, "")
e.g.,
var urls = [
  "https://plus.google.com/+google/posts",
  "https://plus.google.com/+google/about",
  "https://plus.google.com/+google/photos",
  "https://plus.google.com/+google/videos",
  "https://plus.google.com/+google/plusones",
  "https://plus.google.com/+google/reviews",
  "https://plus.google.com/communities/104645458102703754878",
  "https://plus.google.com/u/0/LONGIDHERE",
  "https://plus.google.com/u/0/+JoseManuelGarcía_ertatto"
];
var _urls = [];
// Extract the id or vanity part, then strip slashes and known sub-page names.
urls.forEach(function(item) {
  _urls.push(item.match(/\++\w+.*|\d+\d|\/+\w+$/).toString()
    .replace(/\/+|posts|about|photos|videos|plusones|reviews/g, ""));
});
// Show each extracted value in its own div.
_urls.forEach(function(id) {
  var _id = document.createElement("div");
  _id.innerHTML = id;
  document.body.appendChild(_id);
});
jsfiddle http://jsfiddle.net/guest271314/o4kvftwh/
This solution should match both IDs and usernames (with unicode characters):
/\+[^/]+|\d{21}/
http://regexr.com/39ds0
Explanation: As an alternative to \w (which doesn't match Unicode characters) I used a negated character class, [^/], which matches anything but "/".
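For illustration, here is how that pattern could be applied to a few profile URLs. The helper name profileId and the 21-digit ID are placeholders, not real accounts.

```javascript
// Either "+" followed by everything up to the next slash, or a run of 21 digits.
const profilePart = /\+[^/]+|\d{21}/;

// Hypothetical helper: returns the vanity name or numeric ID, or null.
function profileId(url) {
  const m = url.match(profilePart);
  return m ? m[0] : null;
}

console.log(profileId('https://plus.google.com/+google/posts'));                     // +google
console.log(profileId('https://plus.google.com/u/0/+JoseManuelGarcía_ertatto'));     // +JoseManuelGarcía_ertatto
console.log(profileId('https://plus.google.com/u/0/123456789012345678901/about'));   // 123456789012345678901
```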
The following is a possible solution:
(?:\+)(\w+)|(?:\/)(\w+)$
Explanation:
1st alternative: (?:\+)(\w+)
(?:\+) is a non-capturing group: \+ matches the character + literally. (\w+) is a capturing group: \w+ matches any word character [a-zA-Z0-9_] between one and unlimited times.
2nd alternative: (?:\/)(\w+)$
(?:\/) is a non-capturing group: \/ matches the character / literally. (\w+) is a capturing group: \w+ matches any word character [a-zA-Z0-9_] between one and unlimited times. $ asserts the position at the end of the string.
Hope it's useful!
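Because each alternative has its own capturing group, the value ends up in group 1 or group 2 depending on which side matched. A minimal sketch of reading it back (capturedId is a made-up name, and the numeric ID is a placeholder):

```javascript
const idPattern = /(?:\+)(\w+)|(?:\/)(\w+)$/;

// Hypothetical helper: returns whichever capturing group took part in the match.
function capturedId(url) {
  const m = url.match(idPattern);
  if (!m) return null;
  // Group 1 is set for "+name", group 2 for a trailing "/segment".
  return m[1] !== undefined ? m[1] : m[2];
}

console.log(capturedId('https://plus.google.com/+google/posts'));                // google
console.log(capturedId('https://plus.google.com/u/0/123456789012345678901'));    // 123456789012345678901
```

Note that the + itself is outside the capturing group, so it is not part of the returned value.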
So I have a tweet URL, for example https://twitter.com/ESPNFC/status/423771542627966976.
On my website this URL automatically gets parsed to
<a href="https://twitter.com/ESPNFC/status/423771542627966976">https://twitter.com/ESPNFC/status/423771542627966976</a>
I need to match this pattern and also get the username and tweet ID.
I did it this way:
/<a href="(http|https):\/\/twitter.com\/([^\/]*)\/status\/([^\/]*)">.+<\/a>/g
Everything works when I have one tweet per line, but if there are two or more tweets on one line, the regex matches them all at once and groups them as one match, and I need them separated.
Example:
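<a href="https://twitter.com/ESPNFC/status/423771542627966976">https://twitter.com/ESPNFC/status/423771542627966976</a>
<a href="https://twitter.com/ESPNFC/status/423771542627966976">https://twitter.com/ESPNFC/status/423771542627966976</a>
returns 2 matches, but
<a href="https://twitter.com/ESPNFC/status/423771542627966976">https://twitter.com/ESPNFC/status/423771542627966976</a><a href="https://twitter.com/ESPNFC/status/423771542627966976">https://twitter.com/ESPNFC/status/423771542627966976</a>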
returns 1 match that includes both URLs. How can I separate them or, for example, have everything after the closing tag treated as a new line?
It's best to avoid parsing HTML with regex when possible. Having said that, the problem with your expression is the greedy .+, which matches as much as possible. Instead you could use .+? to make it lazy (match as few characters as possible), or you could restrict what . matches, for example by using [^\s<>]+ instead of .+.
Also, you probably want to change those [^\/]* to something like [^\/"\s]* so that they cannot run past the end of the href attribute.
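Putting both suggestions together, a sketch might look like the following. A few things here are assumptions rather than part of the original: (http|https) is shortened to https? so that the username and tweet ID land in groups 1 and 2, the dots in twitter.com are escaped, and the names tweetLink and extractTweets are invented for the example.

```javascript
// [^\s<>]+ for the link text cannot cross a tag boundary, so two links on one line stay separate.
const tweetLink = /<a href="https?:\/\/twitter\.com\/([^\/"\s]*)\/status\/([^\/"\s]*)">[^\s<>]+<\/a>/g;

// Hypothetical helper: returns one { username, tweetId } object per tweet link.
function extractTweets(html) {
  return Array.from(html.matchAll(tweetLink), (m) => ({ username: m[1], tweetId: m[2] }));
}

const oneLink = '<a href="https://twitter.com/ESPNFC/status/423771542627966976">https://twitter.com/ESPNFC/status/423771542627966976</a>';
const tweets = extractTweets(oneLink + oneLink); // two links on the same line

console.log(tweets.length);      // 2
console.log(tweets[0].username); // ESPNFC
console.log(tweets[0].tweetId);  // 423771542627966976
```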