Regular expression negative match - javascript

I can't seem to figure out how to compose a regular expression (used in Javascript) that does the following:
Match all strings where the characters after the 4th character do not contain "GP".
Some example strings:
EDAR - match!
EDARGP - no match
EDARDTGPRI - no match
ECMRNL - match
I'd love some help here...

Use zero-width assertions:
if (subject.match(/^.{4}(?!.*GP)/)) {
// Successful match
}
Explanation:
^ # Assert position at the beginning of the string
. # Match any single character that is not a line break character
{4} # Exactly 4 times
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GP # Match the characters “GP” literally
) # End of the negative lookahead
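As a quick sanity check, here is a minimal sketch that runs the pattern from this answer over the example strings in the question. The helper name noGpAfterFourth is hypothetical, introduced only for illustration; it should report match, no match, no match, match.

```javascript
// Hypothetical helper: true when the string has at least 4 characters
// and "GP" does not occur anywhere after the 4th character.
function noGpAfterFourth(s) {
  return /^.{4}(?!.*GP)/.test(s);
}

const examples = ["EDAR", "EDARGP", "EDARDTGPRI", "ECMRNL"];
for (const s of examples) {
  console.log(s, noGpAfterFourth(s) ? "match" : "no match");
}
```

Strings shorter than four characters never match, because ^.{4} must consume four characters before the lookahead is checked.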

You can use what's called a negative lookahead assertion here. It looks ahead in the string from the current position and succeeds only if the contained pattern is /not/ found. Here is an example regular expression:
/^.{4}(?!.*GP)/
This matches only if, after the first four characters, the string GP is not found.
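A small sketch illustrating that the lookahead is zero-width, assuming String.prototype.match is called without the g flag: the matched text is only the four characters consumed by ^.{4}, and a failed lookahead makes match() return null.

```javascript
const pattern = /^.{4}(?!.*GP)/;

// The lookahead inspects the rest of the string but consumes nothing,
// so the matched text is only the first four characters.
const ok = "ECMRNL".match(pattern);
console.log(ok[0]); // ECMR

// "GP" appears after the 4th character, so the lookahead fails.
console.log("EDARDTGPRI".match(pattern)); // null
```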

You could do something like this:
var str = "EDARDTGPRI";
var test = !/GP/.test(str.slice(4));
test will be true for matches and false for non-matches.
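If a regex feels like overkill, here is a sketch of the same check using String.prototype.includes. The function name lacksGpAfterFourth is hypothetical. One behavioral difference worth knowing: for strings shorter than four characters, slice(4) returns an empty string, so this version returns true where /^.{4}(?!.*GP)/ would not match.

```javascript
// Hypothetical helper: look for "GP" only in the part after the first four characters.
function lacksGpAfterFourth(str) {
  return !str.slice(4).includes("GP");
}

for (const s of ["EDAR", "EDARGP", "EDARDTGPRI", "ECMRNL", "ED"]) {
  console.log(s, lacksGpAfterFourth(s));
}
```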

How do I combine these three regex into one?

I'm trying to make a regular expression that only accepts:
Min 100 and at most 1000 characters; the characters ", ', < and > aren't allowed; two full stops one after another aren't allowed.
This is what I have for now:
^.{100,1000}$ → for 100 to 1000 characters
^[^"'<>]*$ → for the characters that aren't allowed
^([^._]|[.](?=[^.]|$)|_(?=[^_]|$))*$ → doesn't allow 2 consecutive dots
How do I combine these regexes into one? ._.
This part [^._] means "no dot or underscore", and this part [.](?=[^.]|$)|_(?=[^_]|$) matches either a . or a _ that is followed by a character other than itself or by the end of the string.
You could write the pattern using a single negative lookahead assertion that excludes both .. and __:
^(?!.*([._])\1)[^"'<>\n]{100,1000}$
Explanation
^ Start of string
(?! Negative lookahead, assert that what follows is not
.*([._])\1 any characters, then either . or _ captured in group 1, immediately followed by the same captured character (so no occurrence of .. or __)
) Close lookahead
[^"'<>\n]{100,1000} Match 100 to 1000 characters, none of which is ", ', <, > or a newline
$ End of string
Regex demo (with the quantifier set to {10,100} for the demo)
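A rough way to exercise the combined pattern in JavaScript is to build the test strings programmatically instead of typing 100+ characters by hand; the sample strings below are made up for illustration. Only the first one should pass.

```javascript
// Combined pattern from the answer: 100-1000 characters, none of " ' < > or a newline,
// and no ".." or "__" anywhere.
const pattern = /^(?!.*([._])\1)[^"'<>\n]{100,1000}$/;

const valid = "a.b_c ".repeat(20); // 120 chars, only single dots/underscores
const tooShort = "a".repeat(99);
const tooLong = "a".repeat(1001);
const doubleDot = "a".repeat(50) + ".." + "a".repeat(50);
const doubleUnderscore = "a".repeat(50) + "__" + "a".repeat(50);
const quote = "a".repeat(50) + '"' + "a".repeat(50);

const cases = { valid, tooShort, tooLong, doubleDot, doubleUnderscore, quote };
for (const [name, s] of Object.entries(cases)) {
  console.log(name, pattern.test(s));
}
```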

Regex related confusion

I would like to extract mkghj.bmg and pp.kp from the following string using a regex used in javascript
avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp
Everything enclosed within ,,,, needs to be ignored. There can be multiple ,,,, markers, but they always occur an even number of times in the string (they may also not occur at all).
Also, avb#gh.lk contains a # sign, so it needs to be ignored.
I guess the rule I am looking for is this: if there is a dot (.), look ahead and look behind:
If the dot is enclosed inside ,,,, then ignore it
If the dot has a # before it with no space between the dot and the #, ignore it
In all other cases, capture the unbroken run of characters (up to a space) on either side of the dot
I came up with this regex, but it is not helpful
[^\, ]+([^# \,]+\w+)[^\, ]+
Generally speaking that is (mind the capturing group):
not_this | neither_this_nor_this | (but_this_interesting_stuff)
For your specific example, this could be
,,,,.*?,,,,|\S+#\S+|(\S+)
You need to check whether group 1 exists; see a demo on regex101.com.
In JS this would be:
var myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
var myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
var match = myRegexp.exec(myString);
while (match !== null) {
  // Only matches of the last alternative set group 1
  if (match[1] !== undefined) {
    console.log(match[1]);
  }
  match = myRegexp.exec(myString);
}
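On engines that support String.prototype.matchAll (ES2020 and later), the same loop can be written without calling exec() by hand. This is a sketch of an alternative, not the answerer's code; it should log mkghj.bmg, pp.kp.

```javascript
const myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
// Same "not_this | neither_this_nor_this | (but_this)" pattern; matchAll requires the g flag.
const myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;

// Keep only the matches in which capturing group 1 participated.
const wanted = [...myString.matchAll(myRegexp)]
  .filter((m) => m[1] !== undefined)
  .map((m) => m[1]);

console.log(wanted.join(", ")); // mkghj.bmg, pp.kp
```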
Regex
^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)
Description
^ asserts position at the start of the string
[^ ]+ matches one or more characters that are not a space, as many as possible (greedy)
(a single space) matches a space character literally
1st Capturing Group ([^ ]+): one or more characters that are not a space, as many as possible (greedy)
(a single space) matches a space character literally
,,,, matches the characters ,,,, literally (case sensitive)
.* matches any character (except for line terminators), zero or more times, as many as possible (greedy)
,,,, matches the characters ,,,, literally (case sensitive)
\s+ matches one or more whitespace characters (such as [\r\n\t\f\v ]), as many as possible (greedy)
2nd Capturing Group (.*): any character (except for line terminators), zero or more times (greedy)
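To read the two capturing groups in JavaScript, something like the following sketch should work. Note that the pattern assumes the string has exactly this layout (a skipped first token, one wanted token, a ,,,, ... ,,,, block, then the rest), so it does not generalize to other orderings.

```javascript
const str = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
const m = str.match(/^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)/);

// m[1]: the token after the first word; m[2]: everything after the last ",,,," and its whitespace.
console.log(m[1]); // mkghj.bmg
console.log(m[2]); // pp.kp
```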
You can replace everything between the ,,,, markers with '', then split the remainder on spaces and filter out empty tokens and tokens that contain a # sign.
let str = `avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp`;
let op = str.replace(/,,,,.*?,,,,/g, '').split(' ').filter(e => e && !e.includes('#'));
console.log(op);

Regular expression capture with optional trailing underscore and number

I'm trying to find a regular expression that will match the base string without the optional trailing number (_123). e.g.:
lorem_ipsum_test1_123 -> capture lorem_ipsum_test1
lorem_ipsum_test2 -> capture lorem_ipsum_test2
I tried using the following expression, but it only works when there is a trailing _number:
/(.+)(?>_[0-9]+)/
Similarly, adding the ? (zero or one) quantifier only works when there is no trailing _number; otherwise, the trailing _number just becomes part of the first capture:
/(.+)(?>_[0-9]+)?/
Any suggestions?
You may use the following expression:
^(?:[^_]+_)+(?!\d+$)[^_]+
^ Anchor beginning of string.
(?:[^_]+_)+ Repeated non capturing group. Negated character set for anything other than a _, followed by a _.
(?!\d+$) Negative lookahead for digits at the end of the string.
[^_]+ Negated character set for anything other than a _.
Regex demo here.
Please note that the \n in the character sets in the Regex demo is only there for demonstration purposes and should be removed when using the pattern in JavaScript.
JavaScript demo:
const myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/;
console.log(myRegexp.exec("lorem_ipsum_test1_123")[0]); // lorem_ipsum_test1
console.log(myRegexp.exec("lorem_ipsum_test2")[0]); // lorem_ipsum_test2
You might match any character and use a negative lookahead that asserts that what follows is not an underscore, one or more digits and the end of the string:
^(?:(?!_\d+$).)*
Explanation
^ Assert start of the string
(?: Non capturing group
(?! Negative lookahead to assert what is on the right side is not
_\d+$ Match an underscore, one or more digits and assert the end of the string
) Close negative lookahead
. Match any character (except a line terminator)
)* Close non capturing group and repeat zero or more times
Regex demo
const strings = [
"lorem_ipsum_test1_123",
"lorem_ipsum_test2"
];
let pattern = /^(?:(?!_\d+$).)*/;
strings.forEach((s) => {
console.log(s + " ==> " + s.match(pattern)[0]);
});
You are asking for
/^(.*?)(?:_\d+)?$/
See the regex demo. The key points are that the first dot pattern must be non-greedy, the _\d+ must be wrapped in an optional non-capturing group, and the whole pattern (especially the end) must be anchored.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible due to the non-greedy ("lazy") quantifier *?
(?:_\d+)? - an optional non-capturing group matching 1 or 0 occurrences of _ and then 1+ digits
$ - end of string.
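A sketch of using this pattern from JavaScript, reading capturing group 1 from the result of String.prototype.match:

```javascript
// Lazy group 1 plus an optional "_digits" suffix, anchored at both ends.
const pattern = /^(.*?)(?:_\d+)?$/;

for (const s of ["lorem_ipsum_test1_123", "lorem_ipsum_test2"]) {
  console.log(s, "->", s.match(pattern)[1]);
}
```

Because the regex is anchored and group 1 is lazy, the engine stops growing group 1 at the first point where the rest of the string is either empty or exactly _ followed by digits.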
However, it seems easier to simply remove the suffix with a replacement:
s = s.replace(/_\d+$/, '')
If the string ends with _ and 1+ digits, the substring will get removed, else, the string will not change.
See this regex demo.
Try to check if the string contains the trailing number. If it does you get only the other part. Otherwise you get the whole string.
var str = "lorem_ipsum_test1_123";
if (/_[0-9]+$/.test(str)) {
  // Anchor the lookahead so only the final _digits suffix is excluded
  console.log(str.match(/^(.+)(?=_[0-9]+$)/)[1]);
} else {
  console.log(str);
}
Or, much more concisely:
str = str.replace(/_[0-9]+$/g, "")

Match Specific File Path Regex

So far I have this regex: $fileregex = /([a-z]:\\\\([^\\\\^\\.])*)|(\/[^\/.])/i; but I am very confused about what to do next.
I want to match strings in this format
c:\\something\\else\\something
c:\\something\\else\\something.whatever
/etc/whatever/something/here
/etc/here.txt
/
c:\\
But I don't want to match, for example
c:\oneslash\text.txt
\etc\hi
I am really stuck on my regex, especially on repeating the optional path, since one could request just the root. Can anyone help me out with the regex?
This one should work:
preg_match_all('%[A-Za-z]:\\\\\\\\(.*?\\\\\\\\)*.*|/(.*?/)*.*%m', $input, $regs, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($regs[0]); $i++) {
# Matched text = $regs[0][$i];
}
Description of the Regex:
Match either the regular expression below (attempting the next alternative only if this one fails)
[A-Za-z] Match a single character present in the list below
A character in the range between “A” and “Z”
A character in the range between “a” and “z”
: Match the character “:” literally
\\\\ Match the character “\” literally
\\\\ Match the character “\” literally
( Match the regular expression below and capture its match into backreference number 1
. Match any single character that is not a line break character
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\\\\ Match the character “\” literally
\\\\ Match the character “\” literally
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
| Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
/ Match the character “/” literally
( Match the regular expression below and capture its match into backreference number 2
. Match any single character that is not a line break character
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
/ Match the character “/” literally
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
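For readers working in JavaScript rather than PHP, here is a rough transliteration of this pattern (an assumption, not part of the answer). In the PHP single-quoted string, every run of eight backslashes reaches the regex engine as four, which match two literal backslashes, so the JavaScript regex literal uses \\\\ in the same places. The sample inputs are the ones from the question.

```javascript
// Assumed JavaScript equivalent of the PHP pattern; each \\\\ matches two literal backslashes.
const filePath = /[A-Za-z]:\\\\(.*?\\\\)*.*|\/(.*?\/)*.*/m;

// String literals double every backslash: "c:\\\\x" is the text c:\\x.
const candidates = [
  "c:\\\\something\\\\else\\\\something",
  "c:\\\\something\\\\else\\\\something.whatever",
  "/etc/whatever/something/here",
  "/etc/here.txt",
  "/",
  "c:\\\\",
  "c:\\oneslash\\text.txt", // single backslashes: should be rejected
  "\\etc\\hi",              // should be rejected
];

for (const s of candidates) {
  console.log(s, filePath.test(s));
}
```

Like the PHP version, the pattern is not anchored, so the second alternative accepts any input that contains a / somewhere.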
Have you tried this?
/^([a-zA-Z]\:|\\\\[^\/\\:*?"<>|]+\\[^\/\\:*?"<>|]+)(\\[^\/\\:*?"<>|]+)+(\.[^\/\\:*?"<>|]+)$/
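This regular expression matches Windows file paths, both on local drives (e.g. c:\dir\file.txt) and on network shares (e.g. \\server\share\file.txt). It expects single backslashes as separators, and the file extension is required.
Further reference: The Regular Expression Library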
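A quick sketch for checking which inputs this pattern accepts; the sample paths are made up for illustration. Note that inputs written with doubled backslashes, as in the question, and Unix-style paths are rejected.

```javascript
const winPath = /^([a-zA-Z]\:|\\\\[^\/\\:*?"<>|]+\\[^\/\\:*?"<>|]+)(\\[^\/\\:*?"<>|]+)+(\.[^\/\\:*?"<>|]+)$/;

// String literals double each backslash: "c:\\dir\\file.txt" is the text c:\dir\file.txt.
const samples = [
  "c:\\dir\\file.txt",                   // local drive path
  "\\\\server\\share\\file.txt",         // network (UNC) path
  "c:\\dir\\file",                       // no extension
  "c:\\\\something\\\\else\\\\file.txt", // doubled backslashes
  "/etc/here.txt",                       // Unix-style path
];

for (const s of samples) {
  console.log(s, winPath.test(s));
}
```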

Need help with a regular expression in Javascript

The box should allow:
Uppercase and lowercase letters (case insensitive)
The digits 0 through 9
The characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
The character "." provided that it is not the first or last character
Try
^(?!\.)(?!.*\.$)[\w.!#$%&'*+\/=?^`{|}~-]*$
Explanation:
^ # Anchor the match at the start of the string
(?!\.) # Assert that the first character isn't a dot
(?!.*\.$) # Assert that the last character isn't a dot
[\w.!#$%&'*+\/=?^`{|}~-]* # Match any number of allowed characters
$ # Anchor the match at the end of the string
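A minimal sketch for trying the pattern on a few made-up inputs. As written, the * quantifier means it also accepts the empty string, and nothing in it rejects consecutive dots.

```javascript
const allowed = /^(?!\.)(?!.*\.$)[\w.!#$%&'*+\/=?^`{|}~-]*$/;

// Leading/trailing dots and commas are rejected; "a..b" and "" are accepted.
for (const s of ["john.doe", "o'brien+tag", ".john", "john.", "a..b", "a,b", ""]) {
  console.log(JSON.stringify(s), allowed.test(s));
}
```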
Try something like this:
// the '.' is not included in this:
var temp = "\\w!#$%&'*+/=?^`{|}~-";
var regex = new RegExp("^["+ temp + "]([." + temp + "]*[" + temp + "])?$");
// ^
// |
// +---- the '.' included here
Looking at your comments, it's clear you don't know exactly what a character class does. You don't need to separate the characters with commas. The character class:
[0-9,a-z]
matches a single (ASCII) digit, a lowercase letter, OR a comma. Note that \w is a "shorthand class" that equals [a-zA-Z0-9_].
More information on character classes can be found here:
http://www.regular-expressions.info/charclass.html
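A tiny illustration of the point above, using RegExp.prototype.test on single characters:

```javascript
// [0-9,a-z] matches ONE character: a digit, a lowercase letter, or a literal comma.
const cls = /^[0-9,a-z]$/;
console.log(cls.test("7"), cls.test("q"), cls.test(","), cls.test("Q")); // true true true false

// \w is shorthand for [a-zA-Z0-9_], so it accepts "_" but not "-".
console.log(/^\w$/.test("_"), /^\w$/.test("-")); // true false
```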
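You can do something like this (note that it requires at least two characters, and the - is escaped so that +-/ is not read as a range that would include the dot):
^[a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~][a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~.]*[a-zA-Z0-9!#$%&'*+\-/=?^_`{|}~]$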
Here's how I would do it:
/^[\w!#$%&'*+\/=?^`{|}~-]+(?:\.[\w!#$%&'*+\/=?^`{|}~-]+)*$/
The first part is required to match at least one non-dot character, but everything else is optional, allowing it to match a string with only one (non-dot) character. Whenever a dot is encountered, at least one non-dot character must follow, so it won't match a string that begins or ends with a dot.
It also won't match a string with two or more consecutive dots in it. You didn't specify that, but it's usually one of the requirements when people ask for patterns like this. If you want to permit consecutive dots, just change the \. to \.+.
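A sketch comparing this pattern with the \.+ variant mentioned in the last paragraph, on a few made-up inputs; the only difference should show up on "a..b".

```javascript
const strict = /^[\w!#$%&'*+\/=?^`{|}~-]+(?:\.[\w!#$%&'*+\/=?^`{|}~-]+)*$/;
// Variant with \.+ instead of \. so that runs of dots are permitted.
const loose = /^[\w!#$%&'*+\/=?^`{|}~-]+(?:\.+[\w!#$%&'*+\/=?^`{|}~-]+)*$/;

for (const s of ["a", "john.doe", ".john", "john.", "a..b", ""]) {
  console.log(JSON.stringify(s), strict.test(s), loose.test(s));
}
```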
