How can I match a filename that is exactly (capitals included) in the following format/pattern:
yymmdd_Name1_Data_Prices,
yymmdd_Name1_Data_Contact,
yymmdd_Name1_Data_Address.
I have files that need to be uploaded, and the filenames are saved in a database. I want to match the given filename against the pattern from the database, but I am unsure how to do that.
You could use the following regular expression.
\b\d{6}(?:_[A-Z][a-z]+){3}\b
Demo
JavaScript's regex engine performs the following operations.
\b # match word break
\d{6} # match 6 digits
(?: # begin non-capture group
_[A-Z][a-z]+ # match '_', one upper-case letter, 1+ lower-case letters
) # end non-capture group
{3} # execute non-capture group 3 times
\b # match word break
Matching the first 6 characters, which correspond to a date, could be made more precise than simply matching 6 digits. For example, assuming the year is 2000-2020, one could replace \d{6} with
(?:[01]\d|20)(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|30|31)
but that still would not ensure the date is valid (it accepts 31 February, for example).
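A minimal JavaScript sketch of the pattern above, assuming the whole filename must match (so ^ and $ replace \b) and that the 2000-2020 year range applies. The helper name isValidUploadName and the sample names are hypothetical, and the calendar check through Date is an addition that is not part of the answer. Note that [A-Z][a-z]+ does not accept digits, so the sketch assumes "Name1" stands for a purely alphabetic name.

```javascript
// yymmdd with yy in 00-20, followed by exactly three _Word segments.
const NAME_RE =
  /^(?:[01]\d|20)(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|30|31)(?:_[A-Z][a-z]+){3}$/;

// Hypothetical helper: checks the shape with the regex, then checks that
// the date really exists on the calendar (rejects 31 February, etc.).
function isValidUploadName(name) {
  if (!NAME_RE.test(name)) return false;
  const year = 2000 + Number(name.slice(0, 2));
  const month = Number(name.slice(2, 4));
  const day = Number(name.slice(4, 6));
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date rolls an impossible day over into the next month, so compare back.
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(isValidUploadName("200115_Acme_Data_Prices")); // true
console.log(isValidUploadName("200231_Acme_Data_Prices")); // false: 31 February
console.log(isValidUploadName("200115_acme_Data_Prices")); // false: lower-case first letter
```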
I've tried to find a Windows file path validation for JavaScript, but none seemed to fulfill the requirements I wanted, so I decided to build it myself.
The requirements are the following:
the path should not be empty
may begin with x:\, x:\\, \ or //, followed by a filename (no file extension required)
filenames cannot include the following special characters: <>:"|?*
filenames cannot end with dot or space
Here is the regex I came up with:
/^([a-z]:((\|/|\\|//))|(\\|//))[^<>:"|?*]+/i
But there are some issues:
it also validates filenames that include the special characters mentioned in the rules
it doesn't implement the last rule (cannot end with . or space)
var reg = new RegExp(/^([a-z]:((\\|\/|\\\\|\/\/))|(\\\\|\/\/))[^<>:"|?*]+/i);
var startList = [
'C://test',
'C://te?st.html',
'C:/test',
'C://test.html',
'C://test/hello.html',
'C:/test/hello.html',
'//test',
'/test',
'//test.html',
'//10.1.1.107',
'//10.1.1.107/test.html',
'//10.1.1.107/test/hello.html',
'//10.1.1.107/test/hello',
'//test/hello.txt',
'/test/html',
'/tes?t/html',
'/test.html',
'test.html',
'//',
'/',
'\\\\',
'\\',
'/t!esrtr',
'C:/hel**o'
];
startList.forEach(item => {
document.write(reg.test(item) + ' >>> ' + item);
document.write("<br>");
});
JavaScript engines older than ES2018 do not support lookbehinds,
but lookaheads have always been supported, and they are the key
to constructing the regex.
Let's start from some observations:
1. After a dot, slash, backslash or space there cannot occur another
   dot, slash or backslash. The set of "forbidden" chars also includes
   \n, because none of these chars can be the last char of the file name
   or its segment (between dots or (back-)slashes).
2. Other chars allowed in the path are the chars which you mentioned
   (other than ...), but the "exclusion list" must also include a dot,
   slash, backslash, space and \n (the chars mentioned in point 1).
3. After the "initial part" (C:\) there can be multiple instances of
   chars mentioned in point 1 or 2.
Taking these points into account, I built the regex from 3 parts:
1. "Starting" part, matching the drive letter, a colon and up to 2
   slashes (forward or backward).
2. The first alternative - either a dot, slash, backslash or a space,
   with negative lookahead - a list of "forbidden" chars after each of
   the above chars (see point 1).
3. The second alternative - chars mentioned in point 2.
Both the above alternatives can occur multiple times (+ quantifier).
So the regex is as follows:
^ - Start of the string.
(?:[a-z]:)? - Drive letter and a colon, optional.
[\/\\]{0,2} - Either a backslash or a slash, between 0 and 2 times.
(?: - Start of the non-capturing group, needed due to the +
quantifier after it.
[.\/\\ ] - The first alternative.
(?![.\/\\\n]) - Negative lookahead - "forbidden" chars.
| - Or.
[^<>:"|?*.\/\\ \n] - The second alternative.
)+ - End of the non-capturing group, may occur multiple times.
$ - End of the string.
If you match each path separately, use only the i option.
But if you have multiple paths in separate rows and match them
globally in one go, also add the g and m options.
For a working example see https://regex101.com/r/4JY31I/1
Note: I suppose that ! should also be treated as a forbidden
character. If you agree, add it to the second alternative, e.g. after *.
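As a rough illustration, the parts listed above can be assembled into a single JavaScript literal and run against a few paths, one string at a time (so only the i flag is used). The constant name PATH_RE and the sample list are assumptions made for this sketch, and the expected results in the comments come from tracing the pattern by hand.

```javascript
// Start, optional drive, up to two slashes, then one or more of:
//   - a dot, slash, backslash or space NOT followed by a forbidden char, or
//   - any char outside the exclusion list.
const PATH_RE =
  /^(?:[a-z]:)?[\/\\]{0,2}(?:[.\/\\ ](?![.\/\\\n])|[^<>:"|?*.\/\\ \n])+$/i;

const samples = [
  "C:/test/hello.html", // true
  "//10.1.1.107/test",  // true
  "C://te?st.html",     // false: '?' is forbidden
  "C:/hel**o",          // false: '*' is forbidden
  "C:/test//hello",     // false: a slash directly followed by a slash
];

for (const p of samples) {
  console.log(PATH_RE.test(p), p);
}
```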
This may work for you: ^(?!.*[\\\/]\s+)(?!(?:.*\s|.*\.|\W+)$)(?:[a-zA-Z]:)?(?:(?:[^<>:"\|\?\*\n])+(?:\/\/|\/|\\\\|\\)?)+$
You have a demo here
Explained:
^
(?!.*[\\\/]\s+) # Disallow names that begin with whitespace (right after a slash)
(?!(?:.*\s|.*\.|\W+)$) # Disallow paths that consist only of non-word characters, or that end with a dot/space
(?:[a-zA-Z]:)? # Drive letter (optional)
(?:
(?:[^<>:"\|\?\*\n])+ # Name segment (one or more characters other than the disallowed ones)
(?:\/\/|\/|\\\\|\\)? # Separator (// or / or \\ or \); optional
)+ # Repeated one or more
$
Since this post seems to be (one of) the top result(s) in a search for a RegEx Windows path validation pattern, and given the caveats / weaknesses of the above proposed solutions, I'll include the solution that I use for validating Windows paths (and which, I believe, addresses all of the points raised previously in that use-case).
I could not come up with a single viable REGEX, with or without look-aheads and look behinds that would do the job, but I could do it with two, without any look-aheads, or -behinds!
Note, though, that successive relative paths (i.e. "..\..\folder\file.exe") will not pass this pattern (though using "..\" or ".\" at the beginning of the string will). Periods and spaces before and after slashes, or at the end of the line are failed, as well as any character not permitted according to Microsoft's short-filename specification:
https://learn.microsoft.com/en-us/windows/win32/msi/filename
First Pattern:
^ (?# <- Start at the beginning of the line #)
(?# validate the opening drive or path delimiter, if present -> #)
(?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
| (?# or "\", "..\", ".\", "\\" -> #)
(?:[\/\\]{1,2}|\.{1,2}[\/\\])
)?
(?# validate the form and content of the body -> #)
(?:[^\x00-\x1A|*?\v\r\n\f+\/,;"'`\\:<>=[\]]+[\/\\]?)+
$ (?# <- End at the end of the line. #)
This will generally validate the path structure and character validity, but it also allows problematic things like double periods, double backslashes, and periods or backslashes that are preceded and/or followed by spaces or periods. Paths that end with spaces and/or periods are also permitted.
To address these problems I perform a second test with another (similar) pattern:
^ (?# <- Start at the beginning of the line #)
(?# validate the opening drive or path delimiter, if present -> #)
(?: (?# "C:", "C:\", "C:..\", "C:.\" -> #)
(?:[A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)
| (?# or "\", "..\", ".\", "\\" -> #)
(?:[\/\\]{1,2}|\.{1,2}[\/\\])
)?
(?# ensure that undesired patterns aren't present in the string -> #)
(?:([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*
[^\x00-\x1A|*?\s+,;"'`:<.>=[\]]) (?# <- Ensure that the last character is valid #)
$ (?# <- End at the end of the line. #)
This validates that, within the path body, no multiple-periods, multiple-slashes, period-slashes, space-slashes, slash-spaces or slash-periods occur, and that the path doesn't end with an invalid character. Annoyingly, I have to re-validate the <root> group because it's the one place where some of these combinations are allowed (i.e. ".\", "\\", and "..\") and I don't want those to invalidate the pattern.
Here is an implementation of my test (in C#):
// Requires: using System.IO; using System.Text.RegularExpressions;

/// <summary>Performs pattern testing on a string to see if it's in a form recognizable as an absolute path.</summary>
/// <param name="test">The string to test.</param>
/// <param name="testExists">If TRUE, this also verifies that the specified path exists.</param>
/// <returns>TRUE if the contents of the passed string are valid, and, if requested, the path exists.</returns>
public bool ValidatePath( string test, bool testExists = false )
{
    // Return early: Regex.IsMatch throws on null input.
    if (string.IsNullOrWhiteSpace(test)) return false;

    string
        drivePattern = /* language=regex */
            @"^(([A-Z]:(?:\.{1,2}[\/\\]|[\/\\])?)|([\/\\]{1,2}|\.{1,2}[\/\\]))?",
        pattern = drivePattern + /* language=regex */
            @"([^\x00-\x1A|*?\t\v\f\r\n+\/,;""'`\\:<>=[\]]+[\/\\]?)+$";

    // First test: overall structure and character validity.
    bool result = Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture );

    // Second test: no doubled or misplaced periods, slashes or spaces.
    pattern = drivePattern + /* language=regex */
        @"(([^\/\\. ]|[^\/. \\][\/. \\][^\/. \\]|[\/\\]$)*[^\x00-\x1A|*?\s+,;""'`:<.>=[\]])$";
    result &= Regex.IsMatch( test, pattern, RegexOptions.ExplicitCapture );

    return result && (!testExists || Directory.Exists( test ));
}
I am attempting to only extract a specific line without any other characters after. For example:
permit ip any any
permit oped any any eq 10.52.5.15
permit top any any (sdfg)
permit sdo any host 10.51.86.17 eq sdg
I would like to match only the first line permit ip any any and not the others. A thing to take note is that the second word ip can be any word.
Meaning, I find only permit (anyword) any any and if there was a character after the second any, do not match.
I tried to do \bpermit.\w+.(?:any.any).([$&+,:;=?@#|'<>.^*()%!-\w].+) but that finds the other lines except the permit ip any any. I did attempt to do a reverse lookup, but to no success.
Use the $ end of line anchor after the final "any" and the m multiline regexp flag.
/^permit \w+ any any$/gm
https://regex101.com/r/FfOp5k/2
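A short JavaScript sketch of this approach, run against the four sample lines from the question joined into one multi-line string. The variable names are illustrative only.

```javascript
const config = [
  "permit ip any any",
  "permit oped any any eq 10.52.5.15",
  "permit top any any (sdfg)",
  "permit sdo any host 10.51.86.17 eq sdg",
].join("\n");

// With the m flag, ^ and $ anchor at every line break, so only lines that
// END right after the second "any" are returned.
const matches = config.match(/^permit \w+ any any$/gm);

console.log(matches); // [ 'permit ip any any' ]
```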
If you are using Java based regex, you can include the multiline flag in the expression. This syntax is not supported by JavaScript regex.
(?m)^permit \w+ any any$
I tried to do \bpermit.\w+.(?:any.any).([$&+,:;=?@#|'<>.^*()%!-\w].+) but that finds the other lines except the permit ip any any. I did attempt to do a reverse lookup, but to no success.
Let's take apart your regex to see what it says:
\b # starting on a word boundary (between a word character and a non-word character, or the start/end of the string)
permit # look for the literal characters "permit" in that order
. # followed by any character
\w+ # followed by word characters (letters, numbers, underscores)
. # followed by any character
(?: # followed by a non-capturing group that contains
any # the literal characters 'any'
. # any character
any # the literal characters 'any'
)
. # followed by any character <-- ERROR HERE!
( # followed by a capturing group
[$&+,:;=?@#|'<>.^*()%!-\w] # any one of these many characters or a word character
.+ # then any character, one or more times
)
The behavior you describe...
but that finds the other lines except the permit ip any any.
matches what you've specified. Specifically, the regex above requires that there be characters after the 'any any'. Because permit \w+ any any does not have any characters after the any any part, the regex fails at the <-- ERROR HERE! mark in my breakdown above.
If that last part must be captured (using a capturing group) but it may not exist, you can make that entire last part optional using the ? character.
This would look like:
permit \w+ any any(?: (.+))?
which breaks down as:
permit # the word permit
[ ] # a literal space
\w+ # one or more word characters
[ ] # a literal space
any # the word any
[ ] # another literal space
any # another any; all of this is required.
(?: # a non-capturing group to start the "optional" part
[ ] # a literal space after the any
(.+) # everything else, including spaces, and capture it in a group
)? # end non-capturing group, but make it optional
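A minimal JavaScript sketch of the optional-group variant. It assumes each line is tested on its own and adds ^ and $ anchors, which are not in the pattern above, so that the whole line must fit. The helper name trailingPart is hypothetical.

```javascript
// "permit <word> any any", optionally followed by a space and more text.
const RULE_RE = /^permit \w+ any any(?: (.+))?$/;

// Hypothetical helper: returns null when the line is not a
// "permit <word> any any" rule, "" when nothing follows the second "any",
// and the captured remainder otherwise.
function trailingPart(line) {
  const m = RULE_RE.exec(line);
  if (m === null) return null;
  // An optional group that did not participate is reported as undefined.
  return m[1] === undefined ? "" : m[1];
}

console.log(trailingPart("permit ip any any"));                      // ""
console.log(trailingPart("permit oped any any eq 10.52.5.15"));      // "eq 10.52.5.15"
console.log(trailingPart("permit sdo any host 10.51.86.17 eq sdg")); // null
```

Checking the result for an empty string then answers the original question: only lines with nothing after the second "any" return "".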
I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:
(\w+([-'])(\w+)?[']?(\w+))
and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optionals, though perhaps there's another more efficient way?
Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
Use this pattern:
(?=\S*['-])([a-zA-Z'-]+)
Demo
(?= # Look-Ahead
\S # <not a whitespace character>
* # (zero or more)(greedy)
['-] # Character in ['-] Character Class
) # End of Look-Ahead
( # Capturing Group (1)
[a-zA-Z'-] # Character in [a-zA-Z'-] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
[\w'-]+ would match pretty much any occurrence of words with (or without) hyphens and apostrophes, but also in cases where those characters are adjacent.
(?:\w|['-]\w)+ should match cases where the characters can't be adjacent.
If you need to be sure that the word contains hyphens and/or apostrophes and that those characters aren't adjacent maybe try \w*(?:['-](?!['-])\w*)+. But that would also match ' and - alone.
debuggex.com is a great resource for visualizing these sorts of things
\b\w*[-']\w*\b should do the trick
The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.
This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).
I believe the best way to represent this in a RegExp would be:
/\b[a-z]+(?:['-]?[a-z]+)*\b/
which is described as:
\b # word boundary
[a-z]+ # one or more letters
(?: # start non-capturing group
['-]? # zero or one apostrophe or hyphen
[a-z]+ # one or more letters
)* # end of non-capturing group, zero or more times
\b # word boundary
which will match any word that begins and ends with a letter and can contain zero or more groups of either an apostrophe or a hyphen followed by one or more letters. Add the i flag to accept capitals.
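A small JavaScript sketch of this pattern applied to some of the question's examples. The g and i flags are added here as an assumption, so that every word in the text is returned and capitals are accepted. The variable names are illustrative.

```javascript
const WORD_RE = /\b[a-z]+(?:['-]?[a-z]+)*\b/gi;

// Words with single inner apostrophes or hyphens are kept whole.
const whole = "Mary's High-school Qu'est-ce".match(WORD_RE);
console.log(whole); // [ "Mary's", 'High-school', "Qu'est-ce" ]

// Adjacent hyphens are not accepted inside a word, so the text splits there.
const split = "some--thing".match(WORD_RE);
console.log(split); // [ 'some', 'thing' ]

// Leading or trailing apostrophes are left out of the match.
const trimmed = "'tis Chambers'".match(WORD_RE);
console.log(trimmed); // [ 'tis', 'Chambers' ]
```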
How about: \'?\w+([-']\w+)*\'?
demo
I suppose these words shouldn't be matched:
something- or -something: start or end with -
some--thing or some'-thing: - not followed by a character
some'': two apostrophes
This worked for me:
([a-zA-Z]+'?-?[a-zA-Z]+(-?[a-zA-Z])?)|[a-zA-Z]
Use
([\w]+[']*[\w]*)|([']*[\w]+)
It will properly parse
"You've and we i've it' '98"
(supports ' in any place in the word but single ' is ignored).
If needed \w could be replaced with [a-zA-Z] etc.
I need to match the below type of strings using a regex pattern in javascript.
E.g. /this/<one or more than one word with hyphen>/<one or more than one word with hyphen>/<one or more than one word with hyphen>/<one or more than one word with hyphen>
So this single pattern should match both these strings:
1. /this/is/single-word
2. /this/is-more-than/single/word-patterns/to-match
Only the slash (/) and the 'this' string in the beginning are consistent and contains only alphabets.
You can use:
\/this\/[a-zA-Z ]+\/[a-zA-Z ]+\/[a-zA-Z ]+
Working Demo
I think you want something like this maybe?
(\/this\/(\w+\s?){1,}\/\w+\/(\w+\s?)+)
break down:
\/ # divider
this # keyword
\/ # divider
( # begin section
\w+ # one or more word characters
\s? # possibly followed by a space
) # end section
{1,} # match previous section at least 1 time, more if possible.
\/ # divider
\w+ # one or more word characters
\/ # divider
( # begin section
\w+ # one or more word characters
\s? # possibly followed by a space
)+ # end section, repeated one or more times
Working example
This might be obvious, but to match each pattern as a separate result, I believe you want to place parentheses around the whole expression, like so:
(\/[a-zA-Z ]+\/[a-zA-Z ]+\/[a-zA-Z ]+\/[a-zA-Z ]+)
This makes sure that TWO results are returned, not just one big group.
Also, your question did not state that "this" would be static, as the other answers assumed... it says only the slashes are static. This should work for any text combo (no word this required).
Edit - actually looking back at your attempt, I see you used /this/ in your expression, so I assume that's why others did as well.
Demo: http://rubular.com/r/HGYp2qtmAM
Modified question samples:
/this/is/single-word
/this/is-more-than/single/word-patterns/to-match
Modified again: the sections may contain hyphens (no spaces), and there may be 3 or 4 sections beyond '/this/'.
Modified pattern: /^\/this(?:\/[a-zA-Z]+(?:-[a-zA-Z]+)*){3,4}$/
^
/this
(?:
/ [a-zA-Z]+
(?: - [a-zA-Z]+ )*
){3,4}
$
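A brief JavaScript sketch of the modified pattern. The helper thisPathRegex and its two parameters are hypothetical, added so that the section count can be varied. With the {3,4} bound as written, the first sample (/this/is/single-word) has only two sections after /this and is rejected, so a lower bound of 2 would be needed if that sample is meant to match.

```javascript
// Hypothetical helper: builds the pattern for a given range of sections.
// Each section is letters only, with optional single hyphens between parts.
function thisPathRegex(minSections, maxSections) {
  return new RegExp(
    "^/this(?:/[a-zA-Z]+(?:-[a-zA-Z]+)*){" + minSections + "," + maxSections + "}$"
  );
}

const strict = thisPathRegex(3, 4);  // the bound used in the answer
const relaxed = thisPathRegex(2, 4); // assumption: also allow two sections

console.log(strict.test("/this/is-more-than/single/word-patterns/to-match")); // true
console.log(strict.test("/this/is/single-word"));   // false: only two sections
console.log(relaxed.test("/this/is/single-word"));  // true
console.log(relaxed.test("/this/is/single--word")); // false: empty part between hyphens
```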