Exact string negation in javascript regexpressions


This is more a question to satisfy my curiosity than a real need for help, but I would appreciate your help all the same, as it is driving me nuts.
I am trying to negate an exact string using JavaScript regular expressions; the idea is to exclude URLs that include the string "www". For instance, take this list:
http://www.example.org/
http://status.example.org/index.php?datacenter=1
https://status.example.org/index.php?datacenter=2
https://www.example.org/Insights
http://www.example.org/Careers/Job_Opportunities
http://www.example.org/Insights/Press-Releases
For that I can successfully use the following regex:
/^http(|s):..[^w]/g
This works correctly, but while I can do a positive match I cannot do something like:
/[^www]/g or /[^http]/g
to exclude lines that include the exact string www or http. I have tried the infamous "negative lookahead" like this:
/*(?: (?!www).*)/g
But this doesn't work either, or I cannot test it properly online; it doesn't work in Notepad++ either.
If I were using Perl, Grep, Awk or Textwrangler I would have simply done:
!www OR !http
And this would have done the job.
So, my question is obviously: what would be the correct way to do such a thing in JavaScript? Does this depend on the regex parser (as I seem to understand)?
Thanks for any answer ;)

You need to add a negative lookahead at the start.
^(?!.*\bwww\.)https?:\/\/.*
(?!.*\bwww\.) is a negative lookahead asserting that the string we are going to match does not contain www. anywhere. \b is a word boundary, which matches between a word character and a non-word character. Without \b, the www\. in the regex would also match the www. in foowww.
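As a quick sanity check, the pattern can be run against the URLs from the question. This is only a sketch; the names `urls`, `noWww` and `kept` are made up for illustration.

```javascript
// URLs taken from the question.
const urls = [
  'http://www.example.org/',
  'http://status.example.org/index.php?datacenter=1',
  'https://status.example.org/index.php?datacenter=2',
  'https://www.example.org/Insights',
  'http://www.example.org/Careers/Job_Opportunities',
  'http://www.example.org/Insights/Press-Releases'
];

// The lookahead at the start rejects any string containing "www." at a word boundary.
const noWww = /^(?!.*\bwww\.)https?:\/\/.*/;

// No g flag, so test() keeps no state between calls.
const kept = urls.filter((u) => noWww.test(u));
console.log(kept); // only the two status.example.org URLs remain
```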

To negate 'www' at every position in the input string:
var a = [
'http://www.example.org/',
'http://status.example.org/index.php?datacenter=1',
'https://status.example.org/index.php?datacenter=2',
'https://www.example.org/Insights',
'http://www.example.org/Careers/Job_Opportunities',
'http://www.example.org/Insights/Press-Releases'
];
a.filter(function(x){ return /^((?!www).)*$/.test(x); });
So at every position, check that 'www' doesn't start there, and only then consume one character (.).

Related

JS Regexp to exclude forward slash after .com in url

I have this URL, for example https://www.example.com/filters/test.jpg, and in JS I want to retrieve this part: filters/test.jpg.
I am using match(), but the first element of the match array is /filters/test.jpg.
This is my regexp: /(?!com)\/((\w+)\/(.*))/
What am I missing to remove the forward slash / from the match array?

If your interest is in regex itself rather than just the result, how about this expression?
(?<=.+\.com\/).+
This uses a positive lookbehind and will give you everything after any amount of text ending in ".com/". Note the backslash escapes for the period and the forward slash. If you want more specificity, you can do the same thing with the word group and second slash from your original regex:
(?<=\.com\/)((\w+)\/(.*))
UPDATE: As requested, a note on lookahead versus lookbehind. A positive lookahead means "match X, but only if it is followed by Y"; a negative lookahead means "match X, but only if it is not followed by Y". In your case you want a lookbehind, which means "match X, but only if it is preceded by Y". What you were trying, (?!com), is a negative lookahead: it only checks that the text at that position does not start with com, so it does nothing to keep the slash out of the match. For more information, see https://javascript.info/regexp-lookahead-lookbehind
If your goal is just to get the result, I think using the URL object in javascript (as in the comment) is actually better than regex because it's more tuned to the specific problem. See https://dev.to/attacomsian/introduction-to-javascript-url-object-27hn.
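A minimal sketch of both routes, using the example URL from the question. The variable names are made up for illustration, and the lookbehind line assumes an engine that supports lookbehind.

```javascript
const href = 'https://www.example.com/filters/test.jpg';

// URL object: pathname always starts with "/", so drop the first character.
const path = new URL(href).pathname.slice(1);

// Regex route: everything after ".com/" via a positive lookbehind.
const m = href.match(/(?<=\.com\/).+/);
const viaLookbehind = m ? m[0] : null;

console.log(path);          // "filters/test.jpg"
console.log(viaLookbehind); // "filters/test.jpg"
```

The URL object also keeps working for hosts that do not end in .com, which the lookbehind version does not.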

If you are coding for newer JS engines (with lookbehind support): /(?<=\/)(\w+)\/.*/
If you are coding for older JS engines: /\b(?!(?:com|net|org|uk)\/)(\w+)\/.*/
The best way, though, is to use a capture group and read it from the match array: /\/((\w+)\/.*)/
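Reading the capture group out of the match array might look like this. It is a sketch against the example URL from the question; `href` and `result` are illustrative names.

```javascript
const href = 'https://www.example.com/filters/test.jpg';

// Group 1 is everything after the slash; group 2 is the first path segment.
const match = href.match(/\/((\w+)\/.*)/);

// match[0] still contains the leading slash, match[1] does not.
const result = match ? match[1] : null;
console.log(result); // "filters/test.jpg"
```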

Get the Opposite of a Regular Expression [duplicate]

Is it possible to write a regex that returns the converse of a desired result? Regexes are usually inclusive - finding matches. I want to be able to transform a regex into its opposite - asserting that there are no matches. Is this possible? If so, how?
http://zijab.blogspot.com/2008/09/finding-opposite-of-regular-expression.html states that you should bracket your regex with
/^((?!^ MYREGEX ).)*$/
but this doesn't seem to work. If I have the regex
/[a|b]./
the string "abc" returns false with both my regex and the converse suggested by zijab:
/^((?!^[a|b].).)*$/
Is it possible to write a regex's converse, or am I thinking incorrectly?

Couldn't you just check to see if there are no matches? I don't know what language you are using, but how about this pseudocode?
if (!'Some String'.match(someRegularExpression))
// do something...
If you can only change the regex, then the one you got from your link should work:
/^((?!REGULAR_EXPRESSION_HERE).)*$/
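In JavaScript, both options could look like the sketch below. The sample strings and variable names are made up for illustration; the pattern is the /[ab]./ from the question.

```javascript
const pattern = /[ab]./;
const inputs = ['abc', 'xyz', 'xa'];

// Option 1: negate the result of the match in code.
const viaNegation = inputs.filter((s) => !pattern.test(s));

// Option 2: a regex that itself matches only when the pattern occurs nowhere.
const inverse = /^((?![ab].).)*$/;
const viaRegex = inputs.filter((s) => inverse.test(s));

console.log(viaNegation); // ["xyz", "xa"]
console.log(viaRegex);    // ["xyz", "xa"]
```

'xa' survives in both cases because the a is not followed by another character.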

The reason your inverted regex isn't working is because of the '^' inside the negative lookahead:
/^((?!^[ab].).)*$/
      ^ # WRONG
Maybe it's different in vim, but in every regex flavor I'm familiar with, the caret matches the beginning of the string (or the beginning of a line in multiline mode). But I think that was just a typo in the blog entry.
You also need to take into account the semantics of the regex tool you're using. For example, in Perl, this is true:
"abc" =~ /[ab]./
But in Java, this isn't:
"abc".matches("[ab].")
That's because the regex passed to the matches() method is implicitly anchored at both ends (i.e., /^[ab].$/).
Taking the more common, Perl semantics, /[ab]./ means the target string contains a sequence consisting of an 'a' or 'b' followed by at least one (non-line separator) character. In other words, at ANY point, the condition is TRUE. The inverse of that statement is, at EVERY point the condition is FALSE. That means, before you consume each character, you perform a negative lookahead to confirm that the character isn't the beginning of a matching sequence:
(?![ab].).
And you have to examine every character, so the regex has to be anchored at both ends:
/^(?:(?![ab].).)*$/
That's the general idea, but I don't think it's possible to invert every regex--not when the original regexes can include positive and negative lookarounds, reluctant and possessive quantifiers, and who-knows-what.
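The wrapping step can be sketched as a small JavaScript helper. `invert` is a hypothetical name, not a built-in, and it carries the same caveat: it is only reasonable for simple patterns, and because . does not match line terminators it assumes single-line input.

```javascript
// Hypothetical helper: wrap a pattern so it matches only strings in which
// the original pattern does not match at any position.
function invert(re) {
  // Drop g and y so that test() on the result keeps no state between calls.
  const flags = re.flags.replace(/[gy]/g, '');
  return new RegExp('^(?:(?!' + re.source + ').)*$', flags);
}

const noAb = invert(/[ab]./);
console.log(noAb.test('abc')); // false, because "ab" matches [ab].
console.log(noAb.test('xyz')); // true

const noWww = invert(/www/);
console.log(noWww.test('http://status.example.org/')); // true
console.log(noWww.test('http://www.example.org/'));    // false
```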

You can invert the character set by writing a ^ at the start ([^…]). So the opposite expression of [ab] (match either a or b) is [^ab] (match neither a nor b).
But the more complex your expression gets, the more complex the complementary expression becomes too. An example:
You want to match the literal foo. An expression that matches anything except a string containing foo would have to match either
any string that’s shorter than foo (^.{0,2}$), or
any three-character string that’s not foo (^([^f]..|f[^o].|fo[^o])$), or
any longer string that does not contain foo.
Put together, something like this comes close, although it is not exhaustive (it wrongly rejects strings such as "oo", since an o is only allowed directly after an f):
^[^fo]*(f+($|[^o]|o($|[^fo]*)))*$
But note: This does only apply to foo.
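A hand-built complement like this is easy to get subtly wrong, so it helps to compare it against a plain substring check. The sketch below does that in JavaScript for a few sample strings (the samples and the helper name `disagreements` are made up) and runs the lookahead form alongside for comparison.

```javascript
// The hand-built pattern from above and the lookahead-based equivalent.
const handBuilt = /^[^fo]*(f+($|[^o]|o($|[^fo]*)))*$/;
const lookahead = /^(?:(?!foo).)*$/;

const samples = ['', 'bar', 'fo', 'fof', 'foo', 'xfoo', 'oo'];

// A correct complement matches exactly the strings that do NOT contain "foo".
function disagreements(re) {
  return samples.filter((s) => re.test(s) !== !s.includes('foo'));
}

console.log(disagreements(lookahead)); // []
console.log(disagreements(handBuilt)); // ["oo"]
```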

You can also do this (in Python) by using re.split: splitting on your regular expression returns all the parts that don't match it. See "how to find the converse of a regex".

In perl you can anti-match with $string !~ /regex/;.

With grep, you can use --invert-match or -v.

Java regexes have an interesting way of doing this: you create a possessive optional match (?+) for the string you want to exclude, and then match data after it. If the optional match fails, it doesn't matter because it's optional; if it succeeds, the second expression still needs some extra data to match, and because a possessive quantifier never gives back what it matched, the whole match fails.
It looks counter-intuitive, but it works.
E.g. (foo)?+.+ matches bar, foox and xfoo but won't match foo (or an empty string).
It might be possible in other dialects, but I couldn't get it to work myself (they seem more willing to backtrack if the second match fails).
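JavaScript has no possessive quantifiers such as ?+, but the same effect can be approximated with a lookahead plus a backreference, since a lookahead is not re-entered once it has succeeded. The sketch below is an emulation under that assumption, not the Java pattern itself, and it is anchored with ^ and $ to mimic Java's matches().

```javascript
// (?=((?:foo)?)) captures "foo" if present (otherwise the empty string) without
// consuming it; \1 then consumes exactly that capture and cannot give it back.
const notJustFoo = /^(?=((?:foo)?))\1.+$/;

console.log(notJustFoo.test('bar'));  // true
console.log(notJustFoo.test('foox')); // true
console.log(notJustFoo.test('xfoo')); // true
console.log(notJustFoo.test('foo'));  // false
console.log(notJustFoo.test(''));     // false
```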

Regex finding file names

Can anyone help me with the REGEX to match
../_assets/applications/cleaning/*logo.png
"*" being the file name which can also follow an underscore or dash so
../_assets/applications/cleaning/main_logo.png
OR
../_assets/applications/cleaning/main-logo.png
this is as far as I got
\assets\/applications\/cleaning\/

An asterisk in a regex is a quantifier allowing zero or more of the previous character or group, so your expression would allow zero or more forward slashes. You can use a . with a * to allow zero or more of any character (excluding newlines). So something like:
\/cleaning\/(.+?logo\.png)$
should find all the images you want, then:
/logos/$1
should replace them as you wanted.
Demo: https://regex101.com/r/dmAjjv/1/
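In JavaScript, the match and replacement might be applied as in the sketch below. The third path is an invented non-matching example.

```javascript
// Captures the file name (anything ending in "logo.png") after /cleaning/.
const logo = /\/cleaning\/(.+?logo\.png)$/;

const paths = [
  '../_assets/applications/cleaning/main_logo.png',
  '../_assets/applications/cleaning/main-logo.png',
  '../_assets/applications/cleaning/readme.txt' // invented, should be left alone
];

// $1 in the replacement string refers to the captured file name.
const moved = paths.map((p) => p.replace(logo, '/logos/$1'));
console.log(moved);
```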

RegEx "ignores" quantifier?

Basically I have the following string: http://www.-woejfewiofjewow
which is NOT allowed to be matched
My Regex: http://(www\.[^-])?[^-].*
(I used regexr.com to check it..)
The thing is, it doesn't use the first part of the regex (www\.[^-])? but the second part: [^-].*
I don't really know how to solve this problem, is there any possibility?
I am trying to search valid URLs (well in this case without .com) with the following format: http://www.test http://test
Hyphens at the beginning are not allowed (but http://www.test-test is allowed)
I am trying to find a solution without lookaheads

I think you actually need a negative lookahead assertion.
\bhttp:\/\/(?!www\.-)[^-].*
(?!www\.-) is a negative lookahead which asserts that the double forward slash // must not be followed by www.-
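A short check of that pattern against the strings from the question (plus one invented case, http://-test) could look like this:

```javascript
const validUrl = /\bhttp:\/\/(?!www\.-)[^-].*/;

// Must be rejected: a hyphen directly after "www." or directly after "//".
console.log(validUrl.test('http://www.-woejfewiofjewow')); // false
console.log(validUrl.test('http://-test'));                // false

// Must be accepted: a hyphen later in the host name is fine.
console.log(validUrl.test('http://www.test'));      // true
console.log(validUrl.test('http://test'));          // true
console.log(validUrl.test('http://www.test-test')); // true
```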

If you are trying to validate URLs, this regex (anchored at both ends, so that the whole string has to match) would match a URL a bit better:
^http:\/\/(?:www\.)?(?:[a-zA-Z0-9]+)\.(?:[a-z]){2,3}$
these urls are allowed:
http://www.woejfewiofjewow.net
http://www.woejfewiofjewow.ly
this is not allowed:
http://www.woejfewiofjewow.neta
http://www.woejfewiofjewow.n
or even this
http://www.-woejfewiofjewow.net
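The sketch below checks those examples in JavaScript. It uses the pattern in anchored form (^ and $, with the letter range written as A-Z); without the anchors, test() could succeed on a mere prefix of the string.

```javascript
const simpleUrl = /^http:\/\/(?:www\.)?(?:[a-zA-Z0-9]+)\.(?:[a-z]){2,3}$/;

const allowed = [
  'http://www.woejfewiofjewow.net',
  'http://www.woejfewiofjewow.ly'
];
const notAllowed = [
  'http://www.woejfewiofjewow.neta',
  'http://www.woejfewiofjewow.n',
  'http://www.-woejfewiofjewow.net'
];

console.log(allowed.every((u) => simpleUrl.test(u)));     // true
console.log(notAllowed.some((u) => simpleUrl.test(u)));   // false
```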

Find consecutive "//" in regex in JavaScript

I gave it a college try, but I'm stumped. I'm trying to find consecutive slashes within a string. The rest of the regex works great, but the last part I can't quite get.
Here's what I have:
val.match( /^[\/]|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
and finding this thread, I decided to update my code to no avail:
RegEx to find two or more consecutive chars
val.match( /^[\/]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
What do I need to get this thing going?
So, I need this regex to look for many characters. That would explain the first code sample that I provided:
val.match( /^[\/]|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
What I need it to also do, is look in the string for a double whack. Yes, I'm well aware of indexOf and other string manipulation techniques, but I labeled it regex because it needs to be. Let me know if you need more info...

Uh, why aren't you just doing
/\/{2,}/g
? Your regexes in the OP seem way more complicated...
\/ matches a literal forward slash (the backslash only escapes it)
{2,} tells the engine to match it twice or more
/g makes the pattern global so you can find all occurrences of the pattern in your strings.
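For illustration, here is how that pattern behaves with match() on a few made-up strings:

```javascript
const doubleSlash = /\/{2,}/g;

// With the g flag, match() returns every run of two or more slashes.
const runs = 'a///b//c'.match(doubleSlash);
console.log(runs); // ["///", "//"]

// A single slash is not enough, so match() returns null.
const none = 'text/text'.match(doubleSlash);
console.log(none); // null
```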

[\/]+ should match one or more /s.

/(.)\1+/
would find any place where a single character occurs 2 or more times in a row. The (.) matches a single character and captures it into group 1; the backreference \1 then requires that same character to follow immediately, 1 or more times.
For slashes, you can simplify it down to
/\/{2,}/
/\/\/+/
but then you're into leaning toothpick territory.
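In JavaScript, a backreference inside the pattern is written \1 ($1 is only meaningful in replacement strings). A small sketch with made-up inputs:

```javascript
// Any character followed by one or more copies of itself.
const repeated = /(.)\1+/;

const inWord = 'hello'.match(repeated);
console.log(inWord[0]); // "ll"

const inPath = 'text//text'.match(repeated);
console.log(inPath[0]); // "//"

console.log(repeated.test('abc')); // false, no character repeats
```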

Why not use indexOf? That would be simpler.

Here's the answer.
val.match( /^[\/|_]|[~"#%&*:<>?\\{|}]|[\/]{2,}|[\/|.]$/ )
Not sure why the other version doesn't work, but maybe someone could shed some light onto the matter.
Tests:
_text - Failed leading underscore
/text - Failed leading whack
text~moreText - Failed contains invalid character: ~"#%&*:<>?\{|}
text//text - Failed double whack
text/ - Failed trailing whack
text. - Failed trailing period
Not sure why the code below wasn't working, but moved the double whack test and it works now:
val.match( /^[\/|_]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
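The test list above can be run as a small script. This sketch tries both orderings of the alternation; as far as whether test() finds a match at all, the order of the alternatives should not matter, and both orderings give the same result on these inputs. The name `text/more` is an invented passing case.

```javascript
// The two orderings from above; only the position of [\/]{2,} differs.
const orderA = /^[\/|_]|[~"#%&*:<>?\\{|}]|[\/]{2,}|[\/|.]$/;
const orderB = /^[\/|_]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/;

const shouldFail = ['_text', '/text', 'text~moreText', 'text//text', 'text/', 'text.'];
const shouldPass = ['text', 'text/more'];

for (const re of [orderA, orderB]) {
  // A match means the value contains something invalid.
  console.log(shouldFail.every((v) => re.test(v)));  // true
  console.log(shouldPass.some((v) => re.test(v)));   // false
}
```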
