How to extract an optional query parameter using regex in Javascript - javascript

I'd like to construct a regex that will check for a "path" and a "foo" parameter (non-negative integer). "foo" is optional. It should:
MATCH
path?foo=67 # path found, foo = 67
path?foo=67&bar=hello # path found, foo = 67
path?bar=bye&foo=1&baz=12 # path found, foo = 1
path?bar=123 # path found, foo = ''
path # path found, foo = ''
DO NOT MATCH
path?foo=37signals # foo is not integer
path?foo=-8 # foo cannot be negative
something?foo=1 # path not found
Also, I'd like to get the value of foo, without performing an additional match.
What would be the simplest regex to achieve this?

The Answer
Screw your hard work, I just want the answer! Okay, here you go...
var regex = /^path(?:(?=\?)(?:[?&]foo=(\d*)(?=[&#]|$)|(?![?&]foo=)[^#])+)?(?=#|$)/,
URIs = [
'path', // valid!
'pathbreak', // invalid path
'path?foo=123', // valid!
'path?foo=-123', // negative
'invalid?foo=1', // invalid path
'path?foo=123&bar=abc', // valid!
'path?bar=abc&foo=123', // valid!
'path?bar=foo', // valid!
'path?foo', // valid!
'path#anchor', // valid!
'path#foo=bar', // valid!
'path?foo=123#bar', // valid!
'path?foo=123abc', // not an integer
];
for(var i = 0; i < URIs.length; i++) {
var URI = URIs[i],
match = regex.exec(URI);
if(match) {
var foo = match[1] ? match[1] : 'null';
console.log(URI + ' matched, foo = ' + foo);
} else {
console.log(URI + ' is invalid...');
}
}
Research
Your bounty request asks for "credible and/or official sources", so I'll quote the RFC on query strings.
The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.
This seems pretty vague on purpose: a query string starts with the first ? and is terminated by a # (start of anchor) or the end of the URI (or string/line in our case). They go on to mention that most data sets are in key=value pairs, which seems to be what you expect to be parsing (so let's assume that is the case).
However, as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters.
With all this in mind, let's assume a few things about your URIs:
Your examples start with the path, so the path will be from the beginning of the string until a ? (query string), # (anchor), or the end of the string.
The query string is the iffy part, since RFC doesn't really define a "norm". A browser tends to expect a query string to be generated from a form submission and be a list of key=value pairs appended by & characters. Keeping this mentality:
A key cannot be null, will be preceded by a ? or &, and cannot contain a =, & or #.
A value is optional, will be preceded by key=, and cannot contain a & or #.
Anything after a # character is the anchor.
Let's Begin!
Let's start by mapping out our basic URI structure. You have a path, which is characters starting at the string and up until a ?, #, or the end of the string. You have an optional query string, which starts at a ? and goes until a # or the end of the string. And you have an optional anchor, which starts at a # and goes until the end of the string.
^
([^?#]+)
(?:
\?
([^#]+)
)?
(?:
#
(.*)
)?
$
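As a rough sketch, the skeleton above can be collapsed into a single JavaScript literal and read back by group index. The helper name splitUri is made up for this illustration and is not part of the original answer.

```javascript
// Collapsed form of the skeleton above: path, optional query, optional anchor.
var uriParts = /^([^?#]+)(?:\?([^#]+))?(?:#(.*))?$/;

// splitUri is a hypothetical helper used only for this demo.
function splitUri(uri) {
  var m = uriParts.exec(uri);
  if (!m) return null;
  // Groups that did not take part in the match come back as undefined.
  return { path: m[1], query: m[2], anchor: m[3] };
}

console.log(splitUri('path?foo=1&bar=2#top'));
console.log(splitUri('path#anchor'));
```

The query and anchor groups are both optional, so a bare path still matches and simply leaves those two properties undefined.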
Let's do some clean up before digging into the query string. You can easily require the path to equal a certain value by replacing the first capture group. Whatever you replace it with (path), will have to be followed by an optional query string, an optional anchor, and the end of the string (no more, no less). Since you don't need to parse the anchor, the capturing group can be replaced by ending the match at either a # or the end of the string (which is the end of the query parameter).
^path
(?:
\?
([^#]+)
)?
(?=#|$)
Stop Messing Around
Okay, I've been doing a lot of setup without really worrying about your specific example. The next example will match a specific path (path) and optionally match a query string while capturing the value of a foo parameter. This means you could stop here and validate afterwards: the URI is acceptable only if the first capture group is null or a non-negative integer. But that wasn't your question, was it? This got a lot more complicated, so I'm going to explain the expression inline:
^ (?# match beginning of the string)
path (?# match path literally)
(?: (?# begin optional non-capturing group)
(?=\?) (?# lookahead for a literal ?)
(?: (?# begin repeating non-capturing group)
[?&] (?# keys are preceded by ? or &)
foo (?# match key literally)
(?: (?# begin optional non-capturing group)
= (?# values are preceded by =)
([^&#]*) (?# values are 0+ length and do not contain & or #)
) (?# end optional non-capturing group)
| (?# OR)
[^#] (?# query strings are non-# characters)
)+ (?# end repeating non-capturing group)
)? (?# end optional non-capturing group)
(?=#|$) (?# lookahead for a literal # or end of the string)
Some key takeaways here:
Javascript doesn't support lookbehinds, meaning you can't look behind for a ? or & before the key foo, meaning you actually have to match one of those characters, meaning the start of your query string (which looks for a ?) has to be a lookahead so that you don't actually match the ?. This also means that your query string will always be at least one character (the ?), so you want to repeat the query string [^#] 1+ times.
The query string now repeats one character at a time in a non-capturing group..unless it sees the key foo, in which case it captures the optional value and continues repeating.
Since this non-capture query string group repeats all the way until the anchor or end of the URI, a second foo value (path?foo=123&foo=bar) would overwrite the initial captured value..meaning you wouldn't 100% be able to rely on the above solution.
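To see the last takeaway in practice, here is a small sketch that collapses the annotated expression above onto one line (comments stripped, nothing else changed) and runs it over a few inputs. The variable name looseFoo is only for illustration.

```javascript
// The annotated expression above, collapsed onto one line.
var looseFoo = /^path(?:(?=\?)(?:[?&]foo(?:=([^&#]*))|[^#])+)?(?=#|$)/;

['path', 'path?foo=123', 'path?foo=abc', 'path?foo=123&foo=bar'].forEach(function (uri) {
  var m = looseFoo.exec(uri);
  // m[1] holds whatever the capture group saw last, valid integer or not.
  console.log(uri + ' -> ' + (m ? m[1] : 'no match'));
});
```

At this stage nothing restricts the value to digits, and with a repeated key the capture reflects the last foo rather than the first.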
Final Solution?
Okay, now that I've captured the foo value, it's time to kill the match on values that are not non-negative integers.
^ (?# match beginning of the string)
path (?# match path literally)
(?: (?# begin optional non-capturing group)
(?=\?) (?# lookahead for a literal ?)
(?: (?# begin repeating non-capturing group)
[?&] (?# keys are preceded by ? or &)
foo (?# match key literally)
= (?# values are preceded by =)
(\d*) (?# value must be a non-negative integer)
(?= (?# begin lookahead)
[&#] (?# literally match & or #)
| (?# OR)
$ (?# match end of the string)
) (?# end lookahead)
| (?# OR)
(?! (?# begin negative lookahead)
[?&] (?# literally match ? or &)
foo= (?# literally match foo=)
) (?# end negative lookahead)
[^#] (?# query strings are non-# characters)
)+ (?# end repeating non-capturing group)
)? (?# end optional non-capturing group)
(?=#|$) (?# lookahead for a literal # or end of the string)
Let's take a closer look at some of the juju that went into that expression:
After finding foo=\d*, we use a lookahead to ensure that it is followed by a &, #, or the end of the string (the end of a query string value).
However..if there is more to foo=\d*, the regex would be kicked back by the alternator to a generic [^#] match right at the [?&] before foo. This isn't good, because it will continue to match! So before you look for a generic query string ([^#]), you must make sure you are not looking at a foo (that must be handled by the first alternation). This is where the negative lookahead (?![?&]foo=) comes in handy.
This will work with multiple foo keys, since they will all have to equal non-negative integers. This lets foo be optional (or equal null) as well.
Disclaimer: Most Regex101 demos use PHP for better syntax highlighting and include \n in negative character classes since there are multiple lines of examples.
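One thing worth checking before relying on match[1] in JavaScript specifically: the ECMAScript specification resets the capture groups inside a quantified group at the start of every repetition. Because this expression keeps repeating one character at a time after the foo pair, a foo that is followed by other parameters should come back as undefined in JavaScript even though the URI is accepted. The sketch below uses the one-line regex from the top of this answer; treat it as something to run in your target engine rather than as a definitive statement about every regex flavour.

```javascript
var regex = /^path(?:(?=\?)(?:[?&]foo=(\d*)(?=[&#]|$)|(?![?&]foo=)[^#])+)?(?=#|$)/;

// foo is the last thing the repeating group matches, so the capture survives.
console.log(regex.exec('path?bar=abc&foo=123')[1]); // '123'

// The group repeats over "&bar=abc" afterwards, which clears the capture.
console.log(regex.exec('path?foo=123&bar=abc')[1]); // undefined

// Validation itself is unaffected: a non-integer value still rejects the URI.
console.log(regex.exec('path?foo=123abc')); // null
```

If the value is needed regardless of parameter order, one of the other answers' patterns, where foo is captured outside any repeated group, may be a better fit.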

Nice question! Seems fairly simple at first...but there are a lot of gotchas. Would advise checking any claimed solution will handle the following:
ADDITIONAL MATCH TESTS
path? # path found, foo = ''
path#foo # path found, foo = ''
path#bar # path found, foo = ''
path?foo= # path found, foo = ''
path?bar=1&foo= # path found, foo = ''
path?foo=&bar=1 # path found, foo = ''
path?foo=1#bar # path found, foo = 1
path?foo=1&foo=2 # path found, foo = 2
path?foofoo=1 # path found, foo = ''
path?bar=123&foofoo=1 # path found, foo = ''
ADDITIONAL DO NOT MATCH TESTS
pathbar? # path not found
pathbar?foo=1 # path not found
pathbar?bar=123&foo=1 # path not found
path?foo=a&foofoo=1 # not an integer
path?foofoo=1&foo=a # not an integer
The simplest regex I could come up with that works for all these additional cases is:
path(?=(\?|$|#))(\?(.+&)?foo=(\d*)(&|#|$)|((?![?&]foo=).)*$)
However, would advise adding ?: to the unused capturing groups so they are ignored and you can easily get the foo value from Group 1 - see Debuggex Demo
path(?=(?:\?|$|#))(?:\?(?:.+&)?foo=(\d*)(?:&|#|$)|(?:(?![?&]foo=).)*$)
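A quick way to try the non-capturing version against some of the extra test cases is sketched below. The describe helper is hypothetical and only formats the result; an absent foo is reported as an empty string to line up with the expectations listed above.

```javascript
var re = /path(?=(?:\?|$|#))(?:\?(?:.+&)?foo=(\d*)(?:&|#|$)|(?:(?![?&]foo=).)*$)/;

// describe is a hypothetical helper for this demo only.
function describe(uri) {
  var m = re.exec(uri);
  if (!m) return 'no match';
  // Group 1 is undefined when foo is absent; report that as an empty string.
  return 'foo = ' + JSON.stringify(m[1] === undefined ? '' : m[1]);
}

[
  'path?foo=1&foo=2',
  'path?foofoo=1',
  'path?foo=',
  'path#foo',
  'pathbar?foo=1',
  'path?foo=a&foofoo=1'
].forEach(function (uri) {
  console.log(uri + ' -> ' + describe(uri));
});
```

Note that the pattern has no leading ^, so it would also accept a longer string that merely contains path at a later position; add ^ if the path must start the string.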

^path\b(?!.*[?&]foo=(?!\d+(?=&|#|$)))(?:.*[?&]foo=(\d+)(?=&|#|$))?
Basically I just broke it down into three parts
^path\b # starts with path
(?!.*[?&]foo=(?!\d+(?=&|#|$))) # not followed by foo with an invalid value
(?:.*[?&]foo=(\d+)(?=&|#|$))? # possibly followed by foo with a valid value
see validation here http://regexr.com/39i7g
Caveats:
will match path#bar=1&foo=27
will not match path?foo=
The OP didn't mention these requirements and since he wants a simple regex (oxymoron?) I did not attempt to solve them.
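For reference, a short sketch that exercises this pattern, including the two caveats above; the variable name re is arbitrary.

```javascript
var re = /^path\b(?!.*[?&]foo=(?!\d+(?=&|#|$)))(?:.*[?&]foo=(\d+)(?=&|#|$))?/;

console.log(re.exec('path?bar=bye&foo=1&baz=12')[1]); // '1'
console.log(re.test('path?foo=37signals'));           // false

// The two caveats listed above:
console.log(re.exec('path#bar=1&foo=27')[1]);         // '27' (foo is read from the fragment)
console.log(re.test('path?foo='));                    // false (an empty value is rejected)
```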

path.+?(?:foo=(\d+))(?![a-zA-Z\d])|path((?!foo).)*$
You can try this. See demo.
http://regex101.com/r/jT3pG3/10

You can try the following regex:
path(?:.*?foo=(\d+)\b|()(?!.*foo))
regex101 demo
There are two possible matches after path:
.*?foo=(\d+)\b i.e. foo followed by digits.
OR
()(?!.*foo) an empty string if there is no foo ahead.
Add some word boundaries (\b) if you don't want the regex to interpret other words (e.g. another parameter named barfoobar) around the foos.
path(?:.*?\bfoo=(\d+)\b|()(?!.*\bfoo\b))

You can check for the existence of the 3rd matched group. If it is not there, the foo value would be null; otherwise, it is the group itself:
/^(path)(?:$|\?(?:(?=.*\b(foo=)(\d+)\b.*$)|(?!foo=).*?))/gm
An example on regex101: http://regex101.com/r/oP6lU7/1

Working with the JavaScript regex engine, despite everything it lacks compared with PCRE, is somehow enjoyable!
I made this RegEx, simple and understandable:
^(?=path\?).*foo=(\d*)(?:&|$)|path$
Explanations
^(?=path\?) # A positive lookahead to ensure we have "path?" at the very beginning
.*foo=(\d*)(?:&|$) # Look for foo= with zero or more digits, followed by a "&" character or the end of the string
| # OR
path$ # Just "path" itself
Runnable snippet:
var re = /^(?=path\?).*foo=(\d*)(?:&|$)|path$/gm;
var str = 'path?foo=67\npath?foo=67&bar=hello\npath?bar=bye&foo=1&baz=12\npath\npathtest\npath?foo=37signals\npath?foo=-8\nsomething?foo=1';
var m, n = [];
while ((m = re.exec(str)) != null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
n.push(m[0]);
}
alert( JSON.stringify(n) );
Or a Live demo for more details

path(?:\?(?:[^&]*&)*foo=([0-9]+)(?:[&#]|$))?
This is as short as most, and reads more straightforwardly, since things that appear once in the string appear once in the RE.
We match:
the initial path
a question mark, (or skip to end)
some blocks terminated by ampersands
our parameter assignment
a closing confirmation, either starting the next syntactic element, or ending the line
Unfortunately it matches foo to None rather than '' when the foo parameter is omitted, but in Python (my language of choice) that is considered more appropriate. You could complain if you wanted, or simply use (value or '') afterwards.
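Since the question is about JavaScript, here is a minimal sketch of the same idea there, where an unmatched group is undefined rather than None. The getFoo helper is an assumption for illustration.

```javascript
var re = /path(?:\?(?:[^&]*&)*foo=([0-9]+)(?:[&#]|$))?/;

// getFoo is a hypothetical helper: '' when foo was not captured, null when "path" is missing.
function getFoo(uri) {
  var m = re.exec(uri);
  if (!m) return null;
  return m[1] || '';
}

console.log(getFoo('path?bar=bye&foo=1&baz=12')); // '1'
console.log(getFoo('path'));                      // ''
console.log(getFoo('something?foo=1'));           // null
```

Because the whole query group is optional and the pattern is unanchored, an input such as path?foo=-8 still matches the bare path and yields '', so this pattern on its own does not reject invalid foo values.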

Based on the OP's data here is my attempt pattern
^(path)\b(?:[^f]+|f(?!oo=))(?!\bfoo=(?!\d+\b))(?:\bfoo=(\d+)\b)?
if path is found: sub-pattern #1 will contain "path"
if foo is valid: sub-pattern #2 will contain the foo value, if any
Demo
^(path)\b "path"
(?:[^f]+|f(?!oo=)) followed by anything but "foo="
(?!\bfoo=(?!\d+\b)) if "foo=" is found it must not see anything but \d+\b
(?:\bfoo=(\d+)\b)? if valid "foo=" is found, capture "foo" value

t = 'path?foo=67&bar=hello';
console.log(t.match(/\b(foo|path)\=\d+\b/))
regex /\b(foo|path)\=\d+\b/

Related

JavaScript regex replace last pattern in string?

I have a string which looks like
var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42)
.ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)
I want to put a tag around the last .bam() or .ramBam().
str.replace(/(\.(ram)?bam\(.*?\))$/i, '<span class="focus">$1</span>');
And I hope to get:
new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42).ramBam(8.1, 0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>
But somehow I keep fighting with the non-greedy quantifier: it wraps everything after new Bammer in the span tags. I also tried putting a question mark before the $ to make the group non-greedy.
I was hoping to do this easy, and with the bam or ramBam I thought that regex would be the easiest solution but I think I'm wrong.
Where do I go wrong?
You can use the following regex:
(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$
Demo
(?!.*\)\.) # do not match ').' later in the string
( # begin capture group 1
\. # match '.'
(?:bam|ramBam) # match 'bam' or 'ramBam' in non-cap group
\(.*\) # match '(', 0+ chars, ')'
) # end capture group 1
$ # match end of line
For the example given in the question the negative lookahead (?!.*\)\.) moves an internal pointer to just before the substring:
.bam(8.1, (slot_height-thick)/2)
as that is the first location where there is no substring ). later in the string.
If there were no end-of-line anchor $ and the string ended:
...0).bam(8.1, (slot_height-thick)/2)abc
then the substitution would still be made, resulting in a string that ends:
...0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>abc
Including the end-of-line anchor prevents the substitution if the string does not end with the contents of the intended capture group.
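Applied in JavaScript, the pattern above could be used as in the sketch below. It assumes the chain is written on a single line, since the . in the lookahead does not cross line breaks; the variable names are arbitrary.

```javascript
var str = 'new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42)' +
          '.ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';

// The lookahead rejects every start position that still has ")." somewhere ahead of it.
var lastCall = /(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$/;

var out = str.replace(lastCall, '<span class="focus">$1</span>');
console.log(out);
```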
Regex to use:
/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/
\. Matches period.
(?:ram)? Optionally matches ram in a non-capturing group.
[bB]am Matches bam or Bam.
\( Matches (.
[^)]* Matches 0 or more characters as long as they are not a ).
\) Matches a ). Items 2. through 6. are placed in Capture Group 1.
(?!.*\.(ram)?[bB]am\() This is a negative lookahead assertion stating that the rest of the string contains no further instance of .bam(, .Bam(, .rambam( or .ramBam( and therefore this is the last instance.
See Regex Demo
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, 0).bam(0, -42).ramBam(8.1, 0).bam(8.1, slot_height)';
console.log(str.replace(/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/, '<span class="focus">.$1</span>'));
Update
The JavaScript regular expression engine is not powerful enough to handle nested parentheses. The only way I know of solving this is if we can make the assumption that after the final call to bam or ramBam there are no more extraneous right parentheses in the string. Then where I had been scanning the parenthesized expression with \([^)]*\), which would fail to pick up the final parenthesis, we must now use \(.*\) to scan everything until the final parenthesis. At least I know no other way. But that also means that the negative lookahead I had been using to determine the final instance of bam or ramBam needs a slight adjustment. I need to make sure that I have the final instance of bam or ramBam before I start doing any greedy matches:
(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))
See Regex Demo
\. Matches ..
(?:bam|ramBam) Matches bam or ramBam.
(?!.*\.(bam|ramBam)\() Asserts that Item 1. was the final instance
\( Matches (.
(.*) Greedily matches everything until ...
\) the final ).
) Items 1. through 6. are placed in Capture Group 1.
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42) .ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';
console.log(str.replace(/(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))/, '<span class="focus">$1</span>'));
The non-greedy flag isn't quite right here, as that will just make the regex select the minimal number of characters to fit the pattern. I'd suggest that you do something with a negative lookahead like this:
str.replace(/(\.(?:ram)?[Bb]am\([^)]*\)(?!.*(ram)?[Bb]am))/i, '<span class="focus">$1</span>');
Note that this will only replace the last function name (bam OR ramBam), but not both. You'd need to take a slightly different approach to be able to replace both of them.

Regex windows path validator

I've tried to find a windows file path validation for Javascript, but none seemed to fulfill the requirements I wanted, so I decided to build it myself.
The requirements are the following:
the path should not be empty
may begin with x:\, x:\\, \, // and followed by a filename (no file
extension required)
filenames cannot include the following special characters: <>:"|?*
filenames cannot end with dot or space
Here is the regex I came up with:
/^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i
But there are some issues:
it validates also filenames that include the special characters
mentioned in the rules
it doesn't include the last rule (cannot end with: . or space)
var reg = new RegExp(/^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i);
var startList = [
'C://test',
'C://te?st.html',
'C:/test',
'C://test.html',
'C://test/hello.html',
'C:/test/hello.html',
'//test',
'/test',
'//test.html',
'//10.1.1.107',
'//10.1.1.107/test.html',
'//10.1.1.107/test/hello.html',
'//10.1.1.107/test/hello',
'//test/hello.txt',
'/test/html',
'/tes?t/html',
'/test.html',
'test.html',
'//',
'/',
'\\\\',
'\\',
'/t!esrtr',
'C:/hel**o'
];
startList.forEach(item => {
document.write(reg.test(item) + ' >>> ' + item);
document.write("<br>");
});
Unfortunately, JavaScript flavour of regex does not support lookbehinds,
but fortunately it does support lookaheads, and this is the key factor
how to construct the regex.
Let's start from some observations:
1. After a dot, slash, backslash or a space there cannot occur another
dot, slash or backslash. The set of "forbidden" chars includes also
\n, because none of these chars can be the last char of the file name
or its segment (between dots or (back-)slashes).
2. Other chars, allowed in the path are the chars which you mentioned
(other than ...), but the "exclusion list" must include also a dot,
slash, backslash, space and \n (the chars mentioned in point 1).
3. After the "initial part" (C:\) there can be multiple instances of
char mentioned in point 1 or 2.
Taking these points into account, I built the regex from 3 parts:
1. "Starting" part, matching the drive letter, a colon and up to 2
slashes (forward or backward).
2. The first alternative - either a dot, slash, backslash or a space,
with negative lookahead - a list of "forbidden" chars after each of
the above chars (see point 1).
3. The second alternative - chars mentioned in point 2.
Both the above alternatives can occur multiple times (+ quantifier).
So the regex is as follows:
^ - Start of the string.
(?:[a-z]:)? - Drive letter and a colon, optional.
[\/\\]{0,2} - Either a backslash or a slash, between 0 and 2 times.
(?: - Start of the non-capturing group, needed due to the +
quantifier after it.
[.\/\\ ] - The first alternative.
(?![.\/\\\n]) - Negative lookahead - "forbidden" chars.
| - Or.
[^<>:"|?*.\/\\ \n] - The second alternative.
)+ - End of the non-capturing group, may occur multiple times.
$ - End of the string.
If you attempt to match each path separately, use only i option.
But if you have multiple paths in separate rows, and match them
globally in one go, add also g and m options.
For a working example see https://regex101.com/r/4JY31I/1
Note: I suppose that ! should also be treated as a forbidden
character. If you agree, add it to the second alternative, e.g. after *.
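Put together as a JavaScript literal, the pieces above might look like the sketch below, run against a handful of the paths from the question. Each path is tested on its own, so only the i flag is used; the name winPath is arbitrary.

```javascript
// The parts listed above, joined into one literal.
var winPath = /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;

[
  'C:/test/hello.html',
  '//10.1.1.107/test.html',
  'C://te?st.html',
  'C:/hel**o',
  'C:/test//hello'
].forEach(function (p) {
  console.log(winPath.test(p) + ' >>> ' + p);
});
```

The first two paths pass; the other three fail because of a forbidden character or a doubled slash inside the path.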
This may work for you: ^(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:"\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$
You have a demo here
Explained:
^
(?!.*[\\\/]\s+) # Disallow files beginning with spaces
(?!(?:.*\s|.*\.|\W+)$) # Disallow bars and finish with dot/space
(?:[a-zA-Z]:)? # Drive letter (optional)
(?:
(?:[^<>:"\|\?\*\n])+ # Word (non-allowed characters repeated one or more)
(?:\/\/|\/|\\\\|\\)? # Bars (// or / or \\ or \); Optional
)+ # Repeated one or more
$
Since this post seems to be (one of) the top result(s) in a search for a RegEx Windows path validation pattern, and given the caveats / weaknesses of the above proposed solutions, I'll include the solution that I use for validating Windows paths (and which, I believe, addresses all of the points raised previously in that use-case).
I could not come up with a single viable REGEX, with or without look-aheads and look behinds that would do the job, but I could do it with two, without any look-aheads, or -behinds!
Note, though, that successive relative paths (i.e. "..\..\folder\file.exe") will not pass this pattern (though using "..\" or ".\" at the beginning of the string will). Periods and spaces before and after slashes, or at the end of the line are failed, as well as any character not permitted according to Microsoft's short-filename specification:
https://learn.microsoft.com/en-us/windows/win32/msi/filename
First Pattern:
^ (?# <- Start at the beginning of the line #)
(?# validate the opening drive or path delimiter, if present -> #)
(?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
| (?# or "\", "..\", ".\", "\\" -> #)
(?:[\/\\]{1,2}|\.{1,2}[\/\\])
)?
(?# validate the form and content of the body -> #)
(?:[^\x00-\x1A|*?\v\r\n\f+\/,;"'`\\:<>=[\]]+[\/\\]?)+
$ (?# <- End at the end of the line. #)
This will generally validate the path structure and character validity, but it also allows problematic things like double-periods, double-backslashes, and both periods and backslashes that are preceded-, and/or followed-by spaces or periods. Paths that end with spaces and/or periods are also permitted.
To address these problems I perform a second test with another (similar) pattern:
^ (?# <- Start at the beginning of the line #)
(?# validate the opening drive or path delimiter, if present -> #)
(?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
| (?# or "\", "..\", ".\", "\\" -> #)
(?:[\/\\]{1,2}|\.{1,2}[\/\\])
)?
(?# ensure that undesired patterns aren't present in the string -> #)
(?:([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*
[^\x00-\x1A|*?\s+,;"'`:<.>=[\]]) (?# <- Ensure that the last character is valid #)
$ (?# <- End at the end of the line. #)
This validates that, within the path body, no multiple-periods, multiple-slashes, period-slashes, space-slashes, slash-spaces or slash-periods occur, and that the path doesn't end with an invalid character. Annoyingly, I have to re-validate the <root> group because it's the one place where some of these combinations are allowed (i.e. ".\", "\\", and "..\") and I don't want those to invalidate the pattern.
Here is an implementation of my test (in C#):
/// <summary>Performs pattern testing on a string to see if it's in a form recognizable as an absolute path.</summary>
/// <param name="test">The string to test.</param>
/// <param name="testExists">If TRUE, this also verifies that the specified path exists.</param>
/// <returns>TRUE if the contents of the passed string are valid, and, if requested, the path exists.</returns>
public bool ValidatePath( string test, bool testExists = false )
{
bool result = !string.IsNullOrWhiteSpace(test);
string
drivePattern = /* language=regex */
@"^(([A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|([\/\\]{1,2}|\.{1,2}[\/\\]))?",
pattern = drivePattern + /* language=regex */
@"([^\x00-\x1A|*?\t\v\f\r\n+\/,;""'`\\:<>=[\]]+[\/\\]?)+$";
result &= Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture );
pattern = drivePattern + /* language=regex */
@"(([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;""'`:<.>=[\]])$";
result &= Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture );
return result && (!testExists || Directory.Exists( test ));
}
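Since the original question asked for JavaScript, here is a rough port of the same two-pass idea. It is a sketch under a few assumptions: the function name looksLikeWindowsPath is made up, the existence check (Directory.Exists) is dropped because it has no browser equivalent, and the three patterns are transcribed from the annotated versions above with the root alternatives flattened into one group.

```javascript
// Optional root: "C:", "C:\", "C:..\", "C:.\", "\", "\\", "..\" or ".\".
var root = /^(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?|[\/\\]{1,2}|\.{1,2}[\/\\])?/;
// Pass 1: only permitted characters, in segments optionally ended by a slash.
var body = /(?:[^\x00-\x1A|*?\v\r\n\f+\/,;"'`\\:<>=[\]]+[\/\\]?)+$/;
// Pass 2: no doubled or space/period-adjacent separators, and a valid last character.
var shape = /(?:(?:[^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;"'`:<.>=[\]])$/;

var pass1 = new RegExp(root.source + body.source);
var pass2 = new RegExp(root.source + shape.source);

// looksLikeWindowsPath is a hypothetical name; both passes must succeed.
function looksLikeWindowsPath(test) {
  return typeof test === 'string' && test.trim() !== '' &&
         pass1.test(test) && pass2.test(test);
}

console.log(looksLikeWindowsPath('C:\\folder\\file.exe'));  // true
console.log(looksLikeWindowsPath('C:\\folder\\file.'));     // false (ends with a period)
console.log(looksLikeWindowsPath('C:\\folder \\file.exe')); // false (space before a slash)
```

As in the C# version, the drive letter is matched case-sensitively; add the i flag to both RegExp constructors if lowercase drive letters should pass.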

Regular Expression to only get a specific line

I am attempting to only extract a specific line without any other characters after. For example:
permit ip any any
permit oped any any eq 10.52.5.15
permit top any any (sdfg)
permit sdo any host 10.51.86.17 eq sdg
I would like to match only the first line permit ip any any and not the others. A thing to take note is that the second word ip can be any word.
Meaning, I find only permit (anyword) any any and if there was a character after the second any, do not match.
I tried to do \bpermit.\w+.(?:any.any).([$&+,:;=?@#|'<>.^*()%!-\w].+) but that finds the other lines except the permit ip any any. I did attempt to do a reverse lookup, but to no success.
Use the $ end of line anchor after the final "any" and the m multiline regexp flag.
/^permit \w+ any any$/gm
https://regex101.com/r/FfOp5k/2
If you are using Java based regex, you can include the multiline flag in the expression. This syntax is not supported by JavaScript regex.
(?m)^permit \w+ any any$
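A minimal JavaScript sketch of the anchored, multiline match, using the four sample lines from the question joined into one string:

```javascript
var lines = [
  'permit ip any any',
  'permit oped any any eq 10.52.5.15',
  'permit top any any (sdfg)',
  'permit sdo any host 10.51.86.17 eq sdg'
].join('\n');

// With the m flag, ^ and $ anchor at every line break, not just at the string ends.
var exact = lines.match(/^permit \w+ any any$/gm);
console.log(exact); // [ 'permit ip any any' ]
```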
I tried to do \bpermit.\w+.(?:any.any).([$&+,:;=?@#|'<>.^*()%!-\w].+) but that finds the other lines except the permit ip any any. I did attempt to do a reverse lookup, but to no success.
Let's take apart your regex to see what it says:
\b # starting on a word boundary (between a word character and a non-word character)
permit # look for the literal characters "permit" in that order
. # followed by any character
\w+ # followed by word characters (letters, numbers, underscores)
. # followed by any character
(?: # followed by a non-capturing group that contains
any # the literal characters 'any'
. # any character
any # the literal characters 'any'
)
. # followed by any character <-- ERROR HERE!
( # followed by a capturing group
[$&+,:;=?@#|'<>.^*()%!-\w] # any one of these many characters or word characters
.+ # then any character, one or more times
)
The behavior you describe...
but that finds the other lines except the permit ip any any.
matches what you've specified. Specifically, the regex above requires that there be characters after the 'any any'. Because permit \w+ any any does not have any characters after the any any part, the regex fails at the <-- ERROR HERE! mark in my breakdown above.
If that last part must be captured (using a capturing group) but it may not exist, you can make that entire last part optional using the ? character.
This would look like:
permit \w+ any any(?: (.+))?
for a breakdown of:
permit # the word permit
[ ] # a literal space
\w+ # one or more word characters
[ ] # a literal space
any # the word any
[ ] # another literal space
any # another any; all of this is required.
(?: # a non-capturing group to start the "optional" part
[ ] # a literal space after the any
(.+) # everything else, including spaces, and capture it in a group
)? # end non-capturing group, but make it optional
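A small sketch of how the optional group behaves in JavaScript; the trailing helper is hypothetical and just normalises the three possible outcomes.

```javascript
var re = /permit \w+ any any(?: (.+))?/;

// trailing is a hypothetical helper: null for no match, '' when nothing follows "any any".
function trailing(line) {
  var m = re.exec(line);
  if (!m) return null;
  return m[1] === undefined ? '' : m[1];
}

console.log(trailing('permit ip any any'));                      // ''
console.log(trailing('permit oped any any eq 10.52.5.15'));      // 'eq 10.52.5.15'
console.log(trailing('permit sdo any host 10.51.86.17 eq sdg')); // null
```

Checking the result for '' then identifies the lines that have nothing after the second any.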

Regex remove string in url

I have an url like https://randomsitename-dd555959b114a0.mydomain.com and want to remove the -dd555959b114a0 part of the url.
So randomsitename is a random name and the domain is static domain name.
Is this possible to remove the part with jquery or javascript?
Look at this code that is using regex
var url = "https://randomsitename-dd555959b114a0.mydomain.com";
var res = url.replace(/ *\-[^.]*\. */g, ".");
http://jsfiddle.net/VYw9Y/
It's usually best to code for all possible cases and since hyphens are allowed within any part of domain names, you'll more than likely want to use a more specific RegExp such as:
^ # start of string
( # start first capture group
[a-z]+ # one or more letters
) # end first capture group
:// # literal separator
( # start second capture group
[^.-]+ # one or more chars except dot or hyphen
) # end second capture group
(?: # start optional non-capture group
- # literal hyphen
[^.]+ # one or more chars except dot
)? # end optional non-capture group
( # start third capture group
.+ # one or more chars
) # end third capture group
$ # end of string
Or without comments:
^([a-z]+)://([^.-]+)(?:-[^.]+)?(.+)$
(Remember to escape slashes if you use the literal form for RegExps rather than creating them as objects, i.e. /literal\/form/ vs. new RegExp('object/form'))
Used in a string replacement, the second argument should then be: $1://$2$3
Previous answers will fail for URLs like http://foo.bar-baz.com or http://foo-bar.baz-blarg.com.
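A short sketch of the replacement in JavaScript, taking the second capture group as [^.-]+ (one or more characters, as in the commented breakdown) and escaping the slashes for the literal form. The helper name stripSuffix is an assumption.

```javascript
var strict = /^([a-z]+):\/\/([^.-]+)(?:-[^.]+)?(.+)$/;

// stripSuffix is a hypothetical helper: drops a "-xxxx" suffix from the first host label only.
function stripSuffix(u) {
  return u.replace(strict, '$1://$2$3');
}

console.log(stripSuffix('https://randomsitename-dd555959b114a0.mydomain.com'));
// https://randomsitename.mydomain.com
console.log(stripSuffix('http://foo.bar-baz.com'));
// http://foo.bar-baz.com (left alone: the hyphen is not in the first label)

// For comparison, the looser global pattern also strips hyphenated parts further right:
console.log('http://foo.bar-baz.com'.replace(/ *\-[^.]*\. */g, '.'));
// http://foo.bar.com
```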
You could try this regex,
(.*)(-[^\.]*)(.*$)
Your code should be,
var url = "https://randomsitename-dd555959b114a0.mydomain.com";
var res = url.replace(/(.*)(-[^\.]*)(.*$)/, "$1$3");
//=>https://randomsitename.mydomain.com
Explanation:
(.*) matches any character 0 or more times, and the result is stored in group 1 because the pattern is enclosed in parentheses. Since .* is greedy, group 1 extends up to the last - from which the rest of the pattern can still match.
(-[^\.]*) From that - up to (but not including) the next literal . is stored in group 2.
(.*$) From the literal dot up to the last character is stored in group 3.
$1$3 at the replacement part prints only the stored group1 and 3.
OR
(.*)(?:-[^\.]*)(.*$)
If you use this regex, in the replacement part you need to put only $1 and $2.
DEMO

Regular expression negative match

I can't seem to figure out how to compose a regular expression (used in Javascript) that does the following:
Match all strings where the characters after the 4th character do not contain "GP".
Some example strings:
EDAR - match!
EDARGP - no match
EDARDTGPRI - no match
ECMRNL - match
I'd love some help here...
Use zero-width assertions:
if (subject.match(/^.{4}(?!.*GP)/)) {
// Successful match
}
Explanation:
"
^ # Assert position at the beginning of the string
. # Match any single character that is not a line break character
{4} # Exactly 4 times
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GP # Match the characters “GP” literally
)
"
You can use what's called a negative lookahead assertion here. It looks into the string ahead of the location and matches only if the pattern contained is /not/ found. Here is an example regular expression:
/^.{4}(?!.*GP)/
This matches only if, after the first four characters, the string GP is not found.
You could do something like this:
var str = "EDARDTGPRI";
var test = !(/GP/.test(str.substr(4)));
test will be true for matches and false for non-matches.
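Both approaches can be checked against the sample strings from the question with a sketch like the one below. The function names are made up, and slice(4) is used in place of substr(4), which behaves the same here.

```javascript
// Lookahead version: four characters must exist, and "GP" must not appear after them.
function noGpAfterFourth(s) {
  return /^.{4}(?!.*GP)/.test(s);
}

// Slice version: look for "GP" only in the part after the fourth character.
function noGpAfterFourthSlice(s) {
  return !/GP/.test(s.slice(4));
}

['EDAR', 'EDARGP', 'EDARDTGPRI', 'ECMRNL'].forEach(function (s) {
  console.log(s + ' -> ' + noGpAfterFourth(s) + ' / ' + noGpAfterFourthSlice(s));
});
```

The two differ for inputs shorter than four characters: the lookahead version rejects them because .{4} cannot match, while the slice version accepts them.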
