RegEx "ignores" quantifier? - javascript

Basically I have the following string: http://www.-woejfewiofjewow
which is NOT allowed to be matched.
My Regex: http://(www\.[^-])?[^-].*
(I used regexr.com to check it..)
The thing is, it doesn't use the first part of the regex (www\.[^-])? but the second part: [^-].*
I don't really know how to solve this problem, is there any possibility?
I am trying to search valid URLs (well in this case without .com) with the following format: http://www.test http://test
Hyphens at the beginning are not allowed (but http://www.test-test is allowed)
I am trying to find a solution without lookaheads

I think you actually need a negative lookahead assertion.
\bhttp:\/\/(?!www\.-)[^-].*
(?!www\.-) is a negative lookahead which asserts that the double forward slash // must not be followed by www.-
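To see how the lookahead behaves, the pattern can be run against a handful of inputs. This is only a sketch: the sample URLs (including http://-test) and the variable names are made up for illustration, and the pattern is the one from the answer, unchanged.

```javascript
// Pattern from the answer; no g flag, so test() keeps no state between calls.
const urlPattern = /\bhttp:\/\/(?!www\.-)[^-].*/;

// Hypothetical sample inputs, chosen to cover the cases in the question.
const samples = [
  "http://www.test",
  "http://test",
  "http://www.test-test",
  "http://www.-woejfewiofjewow",
  "http://-test",
];

const results = samples.map((url) => urlPattern.test(url));
console.log(results); // [ true, true, true, false, false ]
```

The fourth URL is rejected by the lookahead, the fifth by [^-], and a hyphen later in the host name is still accepted.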

If you are trying to validate URLs, this regex would match a URL a bit better:
^http:\/\/(?:www\.)?[a-zA-Z0-9]+\.[a-z]{2,3}$
these urls are allowed:
http://www.woejfewiofjewow.net
http://www.woejfewiofjewow.ly
this is not allowed:
http://www.woejfewiofjewow.neta
http://www.woejfewiofjewow.n
or even this
http://www.-woejfewiofjewow.net
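The allowed and rejected URLs listed above can be checked with a short script. The sketch assumes an anchored variant of the pattern (^ and $ around it, and the character class written as [a-zA-Z0-9]); without anchors, test() also accepts a string that merely starts with a valid URL, such as the .neta example. The variable names are hypothetical.

```javascript
// Anchored variant: the whole string must be a URL with a 2-3 letter TLD.
const anchored = /^http:\/\/(?:www\.)?[a-zA-Z0-9]+\.[a-z]{2,3}$/;
// Same pattern without anchors, kept only to show the difference.
const unanchored = /http:\/\/(?:www\.)?[a-zA-Z0-9]+\.[a-z]{2,3}/;

const checks = {
  net: anchored.test("http://www.woejfewiofjewow.net"),      // true
  ly: anchored.test("http://www.woejfewiofjewow.ly"),        // true
  neta: anchored.test("http://www.woejfewiofjewow.neta"),    // false
  n: anchored.test("http://www.woejfewiofjewow.n"),          // false
  hyphen: anchored.test("http://www.-woejfewiofjewow.net"),  // false
  netaUnanchored: unanchored.test("http://www.woejfewiofjewow.neta"), // true
};
console.log(checks);
```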

Related

JS Regexp to exclude forward slash after .com in url

I have this URL for e.g https://www.example.com/filters/test.jpg and in JS, I want to retrieve this part: filters/test.jpg.
I am using match() but the element of the first position of match is /filters/test.jpg.
This is my regexp:/(?!com)\/((\w+)\/(.*))/
What am I missing to remove the forward slash / from the match array?
If your interest is in regex itself rather than just the result, how about this expression?
(?<=.+\.com\/).+
This uses a positive lookbehind and will give you everything after any amount of text ending in ".com/". Note the backslash escapes on the period and the forward slash. If you want more specificity, you can do the same thing with the word group and second slash in your original regex:
(?<=\.com\/)((\w+)\/(.*))
UPDATE: As requested, a note on negative vs. positive lookahead/lookbehind: a positive lookahead tells the engine to "look for X, but match only if followed by Y." A negative lookahead means "look for X, but match only if not followed by Y." In your case, you want a lookbehind because that will "look for X, but match only if preceded by Y." The negative lookahead you were trying, (?!com), is checked at the position of the slash itself, where the next character is / and never "com", so it always succeeds and has no effect. For more information, see https://javascript.info/regexp-lookahead-lookbehind
If your goal is just to get the result, I think using the URL object in javascript (as in the comment) is actually better than regex because it's more tuned to the specific problem. See https://dev.to/attacomsian/introduction-to-javascript-url-object-27hn.
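A minimal sketch of the URL-object approach, using the address from the question. URL is a global in browsers and in Node.js; the variable names are made up here.

```javascript
// Parse the address instead of pattern-matching it.
const parsed = new URL("https://www.example.com/filters/test.jpg");

// pathname always starts with "/", so drop the first character.
const relativePath = parsed.pathname.slice(1);

console.log(relativePath); // "filters/test.jpg"
```

Because the parser separates host, path, and query, this keeps working when the domain does not end in .com.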
For newer JS engines (with lookbehind support): /(?<=\/)(\w+)\/.*/
For older JS engines: /\b(?!(?:com|net|org|uk)\/)(\w+)\/.*/
The best way, though, is to keep the match array and read capture group 1 from /\/((\w+)\/.*)/
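As a rough illustration of the capture-group and lookbehind variants on the URL from the question (variable names are hypothetical, and the lookbehind line assumes an engine with ES2018 support):

```javascript
const address = "https://www.example.com/filters/test.jpg";

// Capture-group variant: index 0 is the whole match (leading slash included),
// index 1 is the first group, which starts after the slash.
const viaGroup = address.match(/\/((\w+)\/.*)/);
console.log(viaGroup[0]); // "/filters/test.jpg"
console.log(viaGroup[1]); // "filters/test.jpg"

// Lookbehind variant: the slash is asserted but not part of the match.
const viaLookbehind = address.match(/(?<=\/)(\w+)\/.*/);
console.log(viaLookbehind[0]); // "filters/test.jpg"
```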

Regular Expression to find a pattern and replace just part of it

I want to know how can I use RegEx to find a pattern and replace just a part of it in JavaScript.
Let's say, for example, I want to replace some patterns like this -foo but just if it has a - after it, like -foo- but replace just the -foo.
Can someone please explain in details the RegEx construction to achieve it?
I did not find a detailed explanation of it here, just codes with a minimum explanation.
You need to use a positive look-ahead (?=-) that will check the existence of - after -foo but will not consume it:
var s = "-foo- -foo";
alert(s.replace(/-foo(?=-)/g, 'REPLACED'));
You can read more about look-aheads (and look-behinds, though JS regex engines did not support them before ES2018) at regular-expressions.info.
The main idea is that the text is checked for presence or absence of some patterns defined in the look-around, and based on that either allow or fail the match. They can actually be used efficiently together with anchors, but this is not the case here.
Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions... lookaround actually matches characters, but then gives up the match, returning only the result: match or no match... They do not consume characters in the string, but only assert whether a match is possible or not.
As the first poster said, you need to make use of a lookahead (?=) to check for an additional character(s). In this situation, the character you need to look for is -, therefore your pattern would make use of a lookahead followed by -, i.e. (?=-).
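The "does not consume" point is easiest to see when two candidates share a hyphen. The sketch below uses a made-up input, "-foo-foo-", and compares the lookahead with a pattern that matches the trailing hyphen and puts it back in the replacement.

```javascript
const input = "-foo-foo-";

// Lookahead: the trailing "-" is checked but left in place,
// so it can serve as the leading "-" of the next match.
const withLookahead = input.replace(/-foo(?=-)/g, "X");
console.log(withLookahead); // "XX-"

// Consuming the trailing "-": the second "-foo" has lost its leading "-"
// to the first match, so it is skipped.
const withConsume = input.replace(/-foo-/g, "X-");
console.log(withConsume); // "X-foo-"
```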

Exact string negation in javascript regexpressions

This is more a question to satisfy my curiosity than a real need for help, but I will appreciate your help equally as it is driving me nuts.
I am trying to negate an exact string using Javascript regular expressions, the idea is to exclude URL that include the string "www". For instance this list:
http://www.example.org/
http://status.example.org/index.php?datacenter=1
https://status.example.org/index.php?datacenter=2
https://www.example.org/Insights
http://www.example.org/Careers/Job_Opportunities
http://www.example.org/Insights/Press-Releases
For that I can successfully use the following regex:
/^http(|s):..[^w]/g
This works correctly, but while I can do a positive match I cannot do something like:
/[^www]/g or /[^http]/g
To exclude lines that include the exact string www or http. I have tried the infamous "negative lookahead" like this:
/*(?: (?!www).*)/g
But this doesn't work either, or I cannot test it online; it doesn't work in Notepad++ either.
If I were using Perl, Grep, Awk or Textwrangler I would have simply done:
!www OR !http
And this would have done the job.
So, my question is obviously: What would be the correct way to do such thing in Javascript? Does this depend on the regex parser (as I seem to understand?).
Thanks for any answer ;)
You need to add a negative lookahead at the start.
^(?!.*\bwww\.)https?:\/\/.*
(?!.*\bwww\.) is a negative lookahead which asserts that the string we are going to match won't contain www. (with the trailing dot). \b means word boundary, which matches between a word character and a non-word character. Without \b, www\. in the regex would also match the www. in foowww.
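Applied to the list from the question, the pattern can be used with Array.prototype.filter. This is a sketch with made-up variable names; only the regex comes from the answer.

```javascript
const links = [
  "http://www.example.org/",
  "http://status.example.org/index.php?datacenter=1",
  "https://status.example.org/index.php?datacenter=2",
  "https://www.example.org/Insights",
  "http://www.example.org/Careers/Job_Opportunities",
  "http://www.example.org/Insights/Press-Releases",
];

// The lookahead runs once at the start and scans the whole line for "www.".
const noWww = /^(?!.*\bwww\.)https?:\/\/.*/;

const kept = links.filter((link) => noWww.test(link));
console.log(kept);
// [ "http://status.example.org/index.php?datacenter=1",
//   "https://status.example.org/index.php?datacenter=2" ]
```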
To negate 'www' at every position in the input string:
var a = [
'http://www.example.org/',
'http://status.example.org/index.php?datacenter=1',
'https://status.example.org/index.php?datacenter=2',
'https://www.example.org/Insights',
'http://www.example.org/Careers/Job_Opportunities',
'http://www.example.org/Insights/Press-Releases'
];
a.filter(function(x){ return /^((?!www).)*$/.test(x); });
So at every position, check that 'www' doesn't match, and then match any character (.).

Javascript regex: is there anyway to write a regex which gives true if backreference is NOT matched

So here is my problem: I'm checking an input of 2 years with a hyphen. Like:
2001-2015
To test this, I use the simple regex
/^([0-9]{4})-([0-9]{4})$/
I know groups aren't needed, and (19|20)[0-9]{2} is a closer match to the basic year expression, but bear with me.
Now, if my requirement was to match the two years only if they are the same, I could have used a backreference like:
/^([0-9]{4})-\1$/
which matches 2000-2000 but not 2000-2014
My actual requirement is exactly the opposite. I want it to match if the years are different but not if they're the same. That is, 2000-2014 should match. 2000-2000 should not.
And negating the boolean result is, I find, not an option. I need this for a huge regex which is supposed to match a whole lot of different date formats. This is just a part of it.
Is there any way to achieve this?
You can use a negative lookahead to achieve this:
^([0-9]{4})-(?!\1)[0-9]{4}$
This is almost the same pattern, except it inserts a condition check using the backreference.
(?!\1) will fail if \1 matches at its position.
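A quick check of this pattern in JavaScript, using the year pairs from the question plus one malformed input added here for illustration (the variable names are hypothetical):

```javascript
// (?!\1) sits right after the hyphen: it fails if the text that follows
// starts with the same four digits captured by group 1.
const differentYears = /^([0-9]{4})-(?!\1)[0-9]{4}$/;

const yearChecks = [
  differentYears.test("2000-2014"),  // true: years differ
  differentYears.test("2001-2015"),  // true: years differ
  differentYears.test("2000-2000"),  // false: same year
  differentYears.test("2000-20141"), // false: second part has five digits
];
console.log(yearChecks); // [ true, true, false, false ]
```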
You can use negative lookahead:
\b(\d{4})-(?!\1)\d{4}\b
Use Negative Lookahead.
Like this :
^([0-9]{4})-(?!\1)[0-9]{4}$
It does work on your example.
Explanation : (?!\1) Assert that it is impossible to match the regex \1. Then you just put your 4 digits requirement.

Regex Positive Lookbehind on url segment

I am parsing a number from a URL string. The URL looks like:
https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false
I would like to match the number between 'users/' and '&'. In this case '11468859'. So I am using a positive lookahead and lookbehind to accomplish this.
This is what I have so far:
(?<=users/)([0-9]*?)(?=\&)
This doesn't match anything. My lookbehind is wrong. So if I omit the lookbehind I can match on users/11468859
([0-9]*?)(?=\&) matches >> 'users/11468859'
How do I correctly create a positive lookbehind to match on users/?
Thanks!
Putting aside your lookbehind question for a moment, this regex works:
users/([0-9]+)
The id is in capture group one.
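In JavaScript, reading capture group one could look like the sketch below. The URL is the one from the question; the variable names and the null fallback are assumptions made for the example.

```javascript
const playerUrl =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// match() returns null when nothing matches, so guard before indexing.
const found = playerUrl.match(/users\/([0-9]+)/);
const userId = found ? found[1] : null;

console.log(userId); // "11468859"
```

Note that the captured id is a string; convert it with Number(userId) if a numeric value is needed.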
In debuggex your lookbehind works fine but not in JavaScript:
(?<=users/)([0-9]*?)(?=\&)
(You could also get away with just
(?<=users/)([0-9]*)
since [0-9]* is greedy.)
However, as you're using JavaScript, I recommend the regex at the top of my answer.
If you're certain that the desired segment will be a series of digits immediately after users/, you don't need the lookahead. Also, I would recommend escaping any sort of slash: \/
(?<=users\/)([0-9]*?)
Also, you don't need to tell the regex not to be greedy unless you know it will run into other numbers; without the lookahead, the lazy [0-9]*? is already satisfied by the empty string. I would also tell the regex that there must be at least one digit so it won't match if the digits are missing:
thus
([0-9]*?)
becomes
(\d+)
There are a couple of approaches available in most languages. To match text that follows a known prefix, use the positive lookbehind format (?<=STUFF). To match numbers try \d+ or [0-9]+. Each of the following lines works. The second adds a positive lookahead so that letters in an id are included, but it will fail if the ampersand is moved.
(?<=users.)\d+
(?<=users.).*?(?=&)
(?<=users.)[0-9]+
For more information: http://myregextester.com/index.php#highlighttab
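For engines that do support lookbehind (ES2018 and later, which is an assumption about the runtime), the first two patterns can be tried directly. The sketch writes the separator as \/ instead of the wildcard dot, and the variable names are made up.

```javascript
const link =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// Lookbehind only: digits that come right after "users/".
const digitsOnly = link.match(/(?<=users\/)\d+/);
console.log(digitsOnly[0]); // "11468859"

// Lookbehind plus lookahead: anything (lazily) up to the next "&".
const upToAmpersand = link.match(/(?<=users\/).*?(?=&)/);
console.log(upToAmpersand[0]); // "11468859"
```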
How do I correctly create a positive lookbehind to match on users/?
You don't, because JavaScript (before ES2018) does not support lookbehinds:
From javascript regex - look behind alternative?:
Javascript doesn't have regex lookbehind.
http://regexadvice.com/forums/thread/58678.aspx:
The JavaScript regex engine does not support look-behinds
As an alternative, you can capture the number like this:
users\/(.*?)\&
And just access the first capturing group. Explanation and demonstration: http://regex101.com/r/aZ3bL0
Try:
var string = "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";
var regex = /users.([\d]*)/;
// exec returns the match array; the id is in capture group 1
var arr = regex.exec(string);
var result = arr[1];
