Stuck on a regular expression

I seem to have backed myself up into a corner with this. I'm sure that the answer is going to make me want to smack a brick against my head, but I'm not all that good with regex just yet. So, here goes.
I need to modify this regex so that it fails if it finds any occurrences of pound signs (#).
My current regex is this:
/^[A-Za-z\.\-\_\s]{1,80}$/i
I tried a number of variations, such as:
/[^#]^[A-Za-z\.\-\_\s]{1,80}$/i
/^[[^#]A-Za-z\.\-\_\s]{1,80}$/i
/^[A-Za-z\.\-\_\s^#]{1,80}$/i
None of them work. Can anyone offer any advice, please?

Your original regex should work, because # isn't in the list of characters you specified for the class. You don't need to add anything, it already fails if there's a # in there.
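A quick check of that claim, using the asker's regex unchanged (the sample strings are made up). It also shows why the third attempt misbehaves: a ^ that is not the first character inside [...] is just a literal, so that class adds both ^ and # to the allowed set.

```javascript
// The asker's original pattern: letters, ".", "-", "_" and whitespace, 1 to 80 characters.
// (The escapes inside the class are unnecessary but harmless without the u flag.)
const allowed = /^[A-Za-z\.\-\_\s]{1,80}$/i;

console.log(allowed.test("Jane Doe-Smith")); // true
console.log(allowed.test("Jane # Doe")); // false: "#" is not in the class
console.log(allowed.test("")); // false: at least one character is required

// Third attempt from the question: "^" only negates when it comes first in the
// class, so here "^" and "#" are simply added as allowed characters.
const attempt3 = /^[A-Za-z\.\-\_\s^#]{1,80}$/i;
console.log(attempt3.test("Jane # Doe")); // true, the opposite of what was wanted
```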

Just use two regexes:
/^[A-Za-z\.\-\_\s]{1,80}$/i
Then filter your input so that you keep only what does NOT match this regex:
/#/
It's far easier to match the patterns you want to filter out (and then discard the strings that match) than it is to try to construct an "inverse" regex. And there's no reason you should try to fit everything into one regex.
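A minimal sketch of the two-regex approach, with a made-up list of inputs. With this particular character class the /#/ check is redundant, since # is already excluded; the pattern pays off when the "allowed" regex is broader than the set you actually want to reject.

```javascript
// Made-up inputs for illustration.
const inputs = ["Jane Doe", "Jane # Doe", "J. R. R. Tolkien", "bad!chars"];

const namePattern = /^[A-Za-z\.\-\_\s]{1,80}$/i; // what the value may contain
const hashPattern = /#/; // what must be filtered out

// Keep values that match the first regex and do NOT match the second.
const accepted = inputs.filter((s) => namePattern.test(s) && !hashPattern.test(s));
console.log(accepted); // [ 'Jane Doe', 'J. R. R. Tolkien' ]
```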

Related

RegExp word boundary with special characters (.) javascript

I want to find "U.S.A." (without the quotes) if it is in a string as a whole word.
So good strings should be:
U.S.A.
u.s.a.
U.S.A. is a great country
Is this U.S.A. ?
Bad strings are:
U.S.A.mnop
U.S.A
I tried using
/\bU\.S\.A\.\b/i
But strangely, the one that works is this (though it fails for other countries, so it's not useful):
/\bU\.S\.A\.\B/i
This seems to be the opposite of my understanding of the documentation. I have searched, and there are lots of similar problems, but none of them helped me understand the issue. I think that the last "." is being consumed by \b and hence not working, but I am still confused.
Can someone please help with an explanation and the right search string? It should also do a proper whole-word search for other strings without special characters.

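You can check for the word boundary first (as you were doing), but the tricky part is at the end, where you can't use a word boundary because of the final ".". \b is zero-width (it consumes nothing) and only matches between a word character and a non-word character, so after the final "." it matches only if a word character follows. That is why \bU\.S\.A\.\b matches "U.S.A.mnop", while \B works for your good strings. Instead, you can check for a whitespace character or the end of the string: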
/\b(u\.s\.a\.)(?:\s|$)/gi
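A quick check of the answer's pattern against the good and bad strings from the question. The g flag is dropped here because test() on a g regex remembers lastIndex between calls, which can make later checks fail. The escapeRegExp and wholeTokenRegex helpers are hypothetical, a sketch of the "other strings" requirement: they assume each token starts with a word character (so the leading \b applies) and use a lookahead, (?=\s|$), so the trailing whitespace is checked but not included in the match.

```javascript
// The answer's pattern, without g so that repeated test() calls are independent.
const usa = /\b(u\.s\.a\.)(?:\s|$)/i;

const good = ["U.S.A.", "u.s.a.", "U.S.A. is a great country", "Is this U.S.A. ?"];
const bad = ["U.S.A.mnop", "U.S.A"];

console.log(good.map((s) => usa.test(s))); // [ true, true, true, true ]
console.log(bad.map((s) => usa.test(s))); // [ false, false ]

// Hypothetical helper: escape every regex metacharacter in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: whole-token match for tokens that start with a word character.
function wholeTokenRegex(token) {
  return new RegExp("\\b" + escapeRegExp(token) + "(?=\\s|$)", "i");
}

const uk = wholeTokenRegex("U.K.");
console.log(uk.test("Made in the U.K."), uk.test("U.K.x")); // true false

// The lookahead leaves the following space out of the match.
console.log("U.S.A. is great".match(wholeTokenRegex("U.S.A."))[0]); // U.S.A.
```

As in the answer, only whitespace or the end of the input counts as a terminator, so "U.S.A.," (followed by a comma) is rejected.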

javascript regex match anything not inside a specific attribute

Let's say I want to match URLs that are not inside a specific set of attributes in HTML tags.
<span cstm1="url1" cstm2="url2" data-x="url3">url4</span>
I want to match url3 and url4 only, so I tried something like:
/(?!(?:cstm1|cstm2)=["']?)(url_regex)/g
The problem is that the negative lookahead assertion needs something before it, and I cannot simply exclude matches inside quotes, because a quoted value (like url3) is still valid. So I don't have anything reasonable to put in front of this negative lookahead assertion.
If I could use a negative lookbehind assertion it would be really easy, but I'm using JavaScript, which doesn't support it, so I'm kind of stuck and looking for help on how to achieve this.
I'm looking for a regex-only solution.
EDIT:
The URL regex I used to find URLs:
((?:(?:https?):\/\/)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|\[(?:(?:[0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,7}:|(?:[0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,5}(?::[0-9a-fA-F]{1,4}){1,2}|(?:[0-9a-fA-F]{1,4}:){1,4}(?::[0-9a-fA-F]{1,4}){1,3}|(?:[0-9a-fA-F]{1,4}:){1,3}(?::[0-9a-fA-F]{1,4}){1,4}|(?:[0-9a-fA-F]{1,4}:){1,2}(?::[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:(?:(?::[0-9a-fA-F]{1,4}){1,6})|:(?:(?::[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(?::[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(?:ffff(?::0{1,4}){0,1}:){0,1}(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])|(?:[0-9a-fA-F]{1,4}:){1,4}:(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9]))\]|localhost|(?:xn--[a-z0-9\-]{1,59}|(?:(?:[a-z\u00a1-\uffff0-9]+-?){0,62}[a-z\u00a1-\uffff0-9]{1,63}))(?:\.(?:xn--[a-z0-9\-]{1,59}|(?:[a-z\u00a1-\uffff0-9]+-?){0,62}[a-z\u00a1-\uffff0-9]{1,63}))*(?:\.(?:xn--[a-z0-9\-]{1,59}|(?:[a-z\u00a1-\uffff]{2,63}))))(?::\d{2,5})?(?:\/[^"'()<>\s]*)?)

In the absence of lookbehind, you can use a capture group to extract your results:
/(?:cstm1|cstm2)=(['"]?)\d+\1|(\b\d+\b)/ig
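Use capture group 2 for your matches: the first alternative consumes the cstm1/cstm2 values, so they never reach group 2. Here \d+ stands in for your URL regex.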
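A sketch of how to read group 2 in JavaScript with String.prototype.matchAll, on a made-up version of the question's markup where, as in the answer, plain numbers stand in for URLs. For completeness it also shows the lookbehind version the asker wanted: lookbehind assertions were added in ES2018, so current engines accept it, although older engines reject it.

```javascript
// Made-up markup; plain numbers stand in for the URL regex, as in the answer.
const html = '<span cstm1="1" cstm2="2" data-x="3">4</span>';

// Alternative 1 consumes the unwanted attribute values; alternative 2 captures the rest.
const re = /(?:cstm1|cstm2)=(['"]?)\d+\1|(\b\d+\b)/gi;

const viaGroups = [];
for (const m of html.matchAll(re)) {
  // m[2] is undefined whenever the first alternative matched.
  if (m[2] !== undefined) viaGroups.push(m[2]);
}
console.log(viaGroups); // [ '3', '4' ]

// ES2018+ only: a negative lookbehind (variable-length is allowed in JavaScript).
const viaLookbehind = html.match(/(?<!(?:cstm1|cstm2)=["']?)\b\d+\b/g);
console.log(viaLookbehind); // [ '3', '4' ]
```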

regex for background-image:url('URL');

I'm trying to make a regex to find background-image:url('URL'); where the URL is an external link to an image.
Been trying for something like this:
/\s*?[ \t\n]background-image:url('https?:\/\/(?:[a-z\-]+\.)+[a-z]{2,6}(?:\/[^\/#?]+)+\.(?:jpe?g|gif|png)$');/i
But I couldn't get it to work.
I am using this with JavaScript/jQuery.

Does this get what you want?
/\s*?[ \t\n]background-image:url\('.+?'\);/i

If you know that only the URL in the middle will change, I think you can simplify it to this. I probably went overboard with the \ escapes, but better safe than sorry.
/background\-image\:url\(\'.*?\'\)\;/
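A small comparison on a made-up declaration: the extra escapes are harmless in an ordinary regex literal, because without the u flag JavaScript treats \-, \:, \' and \; as the plain characters. They are not harmless everywhere, though: with the u flag such escapes are a SyntaxError, and only the \( and \) escapes are actually needed.

```javascript
// The answer's pattern, with its extra escapes.
const heavy = /background\-image\:url\(\'.*?\'\)\;/;
// The same pattern with only the necessary escapes (the parentheses).
const minimal = /background-image:url\('.*?'\);/;

const decl = "background-image:url('https://example.com/a.png');";
console.log(heavy.test(decl), minimal.test(decl)); // true true

// Under the u (unicode) flag, identity escapes such as \- or \: are rejected.
let unicodeError = null;
try {
  new RegExp(heavy.source, "u");
} catch (e) {
  unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError); // true
```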

Epascarello hit the nail on the head. Is this a source you control? Or at least a predictable website? Can you give several different examples of input and your expected results?
Will this always be inline in double quotes, and therefore will your URL always be in single quotes? Some old websites use double quotes in their CSS files or header CSS.
Do you want to capture the whole thing? Or are you just trying to extract the resulting URL?
SirCapsAlot brings up a good question: are you just looking for background image URLs in general? They can also be set through the background shorthand property, or even in JavaScript with .style.backgroundImage = "url(image.jpg)".
And do you definitely only want the ones that include http(s)?
With the limited requirements you gave, this is the best Regex:
background-image\s*:\s*url\('(https?://[^']+)
Comment here if you have answers to my questions that may alter your requirements, and thus my answer.
Breakdown:
background-image\s*:\s*url //Find the literal text, allowing whitespace around the colon
\(' //Find the literal opening parenthesis and quote
( //Begin Capture Group 1
https?:// //Match http:// or https:// (the ? makes the s optional)
[^']+ //Match everything up to, but not including, the next single quote
) //End Capture Group 1, which holds the URL
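A sketch that runs the answer's regex over some made-up CSS and collects group 1. Inside a JavaScript regex literal the two slashes in https?:// must be written as \/\/, and the g flag lets matchAll return every declaration.

```javascript
// The answer's pattern as a regex literal: "/" is escaped as "\/",
// and the g flag is required by matchAll.
const bgUrl = /background-image\s*:\s*url\('(https?:\/\/[^']+)/gi;

// Made-up CSS for illustration; .b uses a relative URL and should be skipped.
const css = `
  .a { background-image: url('https://example.com/img/a.png'); }
  .b { background-image:url('/local/b.png'); }
  .c { background-image : url('http://example.org/c.jpg'); }
`;

// Group 1 holds the URL; the closing quote is never part of the match.
const urls = [...css.matchAll(bgUrl)].map((m) => m[1]);
console.log(urls); // [ 'https://example.com/img/a.png', 'http://example.org/c.jpg' ]
```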
A co-worker pointed out that I might have been downvoted for not capturing the closing tick. Note: capturing the closing tick would be a wasted step, since [^']+ already stops right before it, so it is not necessary for this regex to work.
He also pointed out that somebody might have downvoted me for requiring http or https in the URL portion. But the question specifically asked for external URLs, not internal ones, so this is a valid requirement and gets closer to what was asked.
Sooo... not sure why this got a downvote.

Regex will not match

This is my string:
<link href="/post?page=4&tags=example" rel="last" title="Last Page">
From there I am trying to obtain the 4 out of that page parameter, using this regular expression:
link href="/post?page=(.*?)&tags=(.*?)" rel="last"
I will then collect the 4 out of the first group, the tags parameter has a wildcard because the contents can change. However, I don't seem to be getting a match with this, can anyone help?
And I know I shouldn't be using regex to parse HTML, but this is just a small thing and it would be a waste to import a huge module for this.

Assuming you are using a /regex literal/, you will need to escape the / in that path as \/.
It also depends on how you are getting this string. Is it really typed that way, or is it part of an innerHTML that you are then reading out again? If that's the case, then the innerHTML won't be what you expect it to be, because the browser will "normalise" it: for example, the & in the href attribute is serialised as &amp;, so the literal & in your regex won't match.
If it is an innerHTML, then it'd be far easier to get the tag, then get the tag's href attribute, then regex that.
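A sketch of that approach. In a browser the string would come from something like document.querySelector('link[rel="last"]').getAttribute("href"); here it is hard-coded so the snippet runs anywhere. The second half swaps the regex for the standard URL/URLSearchParams API, offered as an alternative rather than what the answer proposed; the base URL is a placeholder needed only because the href is relative.

```javascript
// Hard-coded stand-in for the value that
// document.querySelector('link[rel="last"]').getAttribute("href") would return.
const href = "/post?page=4&tags=example";

// Regex on the href alone: no tag, quotes or attribute order to worry about.
const pageFromRegex = href.match(/[?&]page=([^&]*)/)?.[1];
console.log(pageFromRegex); // 4

// Alternative without a regex: the standard URL API. The base URL is a
// placeholder, needed only because href is relative.
const pageFromUrl = new URL(href, "https://example.invalid").searchParams.get("page");
console.log(pageFromUrl); // 4
```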

link href="/post\?page=(.*?)&tags=(.*?)" rel="last"
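You forgot the backslash before the ?. Unescaped, ? makes the preceding t optional instead of matching a literal question mark, so "/post?page" in your string can never match.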
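A sketch comparing the question's pattern with the corrected one on the question's string. Both are written as regex literals, so the / in the path is escaped as \/ in each; the only difference is the backslash before ?.

```javascript
const tagString = '<link href="/post?page=4&tags=example" rel="last" title="Last Page">';

// As in the question: the unescaped ? makes "t" optional instead of matching "?".
const asked = /link href="\/post?page=(.*?)&tags=(.*?)" rel="last"/;
console.log(asked.test(tagString)); // false

// With the ? escaped it matches; group 1 holds the page number.
const fixed = /link href="\/post\?page=(.*?)&tags=(.*?)" rel="last"/;
console.log(tagString.match(fixed)[1]); // 4
```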

I think it might be better to change your capture groups to something a little different that catches everything up to the terminating character (note that the ? still needs escaping):
link href="/post\?page=([^&]+)&tags=([^"]+)" rel="last"
Putting the negation character ^ first in a character class tells the regex engine "match any character EXCEPT the ones listed here". This makes it very easy to capture everything up until it hits a terminating character, such as the ampersand and the double quote. Depending on the engine, this may also slightly improve performance compared with a lazy .*?, because the negated class does not have to test the rest of the pattern after every character.
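The same idea as a JavaScript regex literal, applied to the question's string. The ? is escaped so the pattern can match at all, and / is escaped because this is a literal.

```javascript
const linkTag = '<link href="/post?page=4&tags=example" rel="last" title="Last Page">';

// Negated classes: [^&]+ stops at the ampersand, [^"]+ stops at the closing quote.
const negated = /link href="\/post\?page=([^&]+)&tags=([^"]+)" rel="last"/;
const [, page, tags] = linkTag.match(negated);
console.log(page, tags); // 4 example
```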

If the page parameter always comes first, try /\?page=(\d+)/ (written as PCRE, but it works unchanged in JavaScript). Capture group 1 will contain the page number.
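A quick check on the question's string, plus a made-up href where page is not the first parameter, to show the assumption this pattern relies on.

```javascript
const linkHtml = '<link href="/post?page=4&tags=example" rel="last" title="Last Page">';
const pageRe = /\?page=(\d+)/;

console.log(linkHtml.match(pageRe)[1]); // 4

// The "page comes first" assumption matters: here page follows "&", not "?".
console.log("/post?tags=example&page=4".match(pageRe)); // null
```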
