Match Specific File Path Regex - javascript

so far I have this regex $fileregex = /([a-z]:\\\\([^\\\\^\\.])*)|(\/[^\/.])/i; but I am very confused on what to do next.
I want to match strings in this format
c:\\something\\else\\something
c:\\something\\else\\something.whatever
/etc/whatever/something/here
/etc/here.txt
/
c:\\
But I don't want to match, for example
c:\oneslash\text.txt
\etc\hi
I am really stuck on my regex especially on repeating the optional path, as one could just request the root.Can anyone help me out with the regex?

This one should work:
preg_match_all('%[A-Za-z]:\\\\\\\\(.*?\\\\\\\\)*.*|/(.*?/)*.*%m', $input, $regs, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($regs[0]); $i++) {
# Matched text = $regs[0][$i];
}
Result:
Description of the Regex:
Match either the regular expression below (attempting the next alternative only if this one fails)
[A-Za-z] Match a single character present in the list below
A character in the range between “A” and “Z”
A character in the range between “a” and “z”
: Match the character “:” literally
\\\\ Match the character “\” literally
\\\\ Match the character “\” literally
( Match the regular expression below and capture its match into backreference number 1
. Match any single character that is not a line break character
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\\\\ Match the character “\” literally
\\\\ Match the character “\” literally
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
| Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
/ Match the character “/” literally
( Match the regular expression below and capture its match into backreference number 2
. Match any single character that is not a line break character
*? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
/ Match the character “/” literally
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)

Have you try this:
/^([a-zA-Z]\:|\\\\[^\/\\:*?"<>|]+\\[^\/\\:*?"<>|]+)(\\[^\/\\:*?"<>|]+)+(\.[^\/\\:*?"<>|]+)$/
This regular expression will match any valid file path. It checks local drives and network path. The file extension is required.
Further references : The regular expression library

Related

how does javascript regex lazy match work?

For this string
abc.com/file/some.png?v=123
how do I match .png? I use
/\..*?\?/
but it is matching .com/file/some.png?, so why is the lazy match rule not working here?
There are lots of variants to this answer. I will propose matching the first file suffix after the last / character.
That can be done with this regex
/(?!.*\/)\.\w+\?/
Explaination
(?!.*/)\.\w+\?
Options: Case insensitive; Dot doesn’t match line breaks; ^$ match at line breaks
Assert that it is impossible to match the regex below starting at this position (negative lookahead) (?!.*/)
Match any single character that is NOT a line break character (line feed, carriage return, line separator, paragraph separator) .*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “/” literally /
Match the character “.” literally \.
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) \w+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the question mark character \?
\1
Insert a backslash \
Insert the character “1” literally 1
Created with RegexBuddy

Regex related confusion

I would like to extract mkghj.bmg and pp.kp from the following string using a regex used in javascript
avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp
Everything enclosed within ,,,, needs to be ignored. There could be multiple instances of ,,,, But they will always occur even number of times (non occurrence is also a possibility) in the string.
Also, avb#gh.lk has an # sign, therefore it needs to be ignored
I guess the rule I am looking for is this - if there is a dot (.) look ahead and look behind :-
If the dot is enclosed inside ,,,, then ignore it
If the dot has an # before it with no space between the dot and #, ignore it
In all other cases, capture an unbroken set of characters (until a space is encountered) on either side of the dot
I came up with this regex, but it is not helpful
[^\, ]+([^# \,]+\w+)[^\, ]+
Generally speaking that is (mind the capturing group):
not_this | neither_this_nor_this | (but_this_interesting_stuff)
For your specific example, this could be
,,,,.*?,,,,|\S+#\S+|(\S+)
You need to check for the existance of group 1, see a demo on regex101.com.
In JS this would be:
var myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
var myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
match = myRegexp.exec(myString);
while (match != null) {
if (typeof(match[1]) != 'undefined') {
console.log(match[1]);
}
match = myRegexp.exec(myString);
}
Regex
^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)
Description
^ asserts position at start of a line
Match a single character not present in the list below [^ ]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
matches the character literally (case sensitive)
matches the character literally (case sensitive)
1st Capturing Group ([^ ]+)
Match a single character not present in the list below [^ ]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
matches the character literally (case sensitive)
,,,, matches the characters ,,,, literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
,,,, matches the characters ,,,, literally (case sensitive)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*)
.* matches any character (except for line terminators)
You can replace the string between ,,, by '' and than from remaining you can split and search for # sign and filter.
let str = `avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp`
let op = str.replace(/,,,.*?,,,,|/g,'').split(' ').filter(e=> !e.includes('#') && e );
console.log(op)

regex to coordinates WGS84?

I'm trying this regular expressión, but I can't validate correctly the end white space and the letter:
/^\d{0,2}(\-\d{0,2})?(\-\d{0,2})?(\ ?\d[W,E]?)?$/
Examples of correct values:
33-39-10 N //OK
85-50 W //OK
-85-50 E //Wrong
What's wrong?
\d{0,2} this quantifier also matches a digit zero times so that would match the leading - in the 3rd example.
In the character class [W,E] you could omit the comma and list the characters you allow to match [ENW]
If only the third group is optional you could try including the whitespace before the end of the line $
^\d{2}(-\d{2})(-\d{2})? [ENW] $
I have used this regular expression : ^(?!\-)\d{0,2}?(\-\d{0,2}).+\s(N|E|W|S)$
Using a negative lookahead, we have excluded anything that starts with a dash (-).
(?!\-) = Starting at the current position in the expression,
ensures that the given pattern will not match
\s(N|E|W|S) matches anything with a space (\s) and one of the letters using OR operator |.
You may also use \s+(N|E|W|S).
+ = Matches between one and unlimited times, as many times as
possible, giving back as needed

Checking an Array Element for Two Blank Space Characters

How would you be able to tell if an array element is made up of three words (i.e. if it has two blank space characters in it)? It might look something like "abc def ghi". I am trying to search through an array for elements of this form and will remove this while others of the format "jkl xyz" or '"jkl"' would remain in the array.
You can use search function with following regex :
str.search(/\b(\w+ \w+ \w+)\b/g);
Read the detail in Demo
You can use a regex like:
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def def") // true
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def ") // false
It means:
^ Start of string
[^\s]+ 1 or more none space characters
\s a space character
[^\s]+ 1 or more none space characters
\s a space character
[^\s]+ 1 or more none space characters
\s a space character
$ End of string
var myArray = ["abc def ghi","jkl xyz","gty slp","zxc vbn jkl"];
for (i=0;i<myArray.length;++i) {
if (/\w+ \w+ \w+/.test(myArray[i])) {
myArray.splice([i], 1);
}
};
console.log(myArray);
Outputs:
["jkl xyz", "gty slp"]
CODEPEN DEMO
RegexExplanation:
\w+ \w+ \w+
-----------
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

javascript - url regexp not proper

I am tryint to validate URL with js.
function validateURL(url) {
var urlregex = new RegExp("^(http:\/\/www.|https:\/\/www.|ftp:\/\/www.|www.|https:\/\/|http:\/\/){1}([0-9A-Za-z]+\.)");
return urlregex.test(url);
}
but but i want that google.com will also pass, but it is not passing thru this regexp.
what is wrong with regexp?
I want these urls to pass thru regexp:
http://www.google.com
http://google.com
https://www.google.com
https://google.com
google.com
www.google.com
Try this instead:
function validateURL(url) {
var urlregex = new RegExp("^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$");
return urlregex.test(url);
}
´
DEMO
http://regex101.com/r/rG8wP9
OR:
function validateURL(url) {
if (/^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/im.test(url)) {
return true;
} else {
return false;
}
}
EXPLANATION:
^ assert position at start of a line
1st Capturing group ((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)
(?:https?:\/\/)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
http matches the characters http literally (case sensitive)
s? matches the character s literally (case sensitive)
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
: matches the character : literally
\/ matches the character / literally
\/ matches the character / literally
(?:www\.)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
www matches the characters www literally (case sensitive)
\. matches the character . literally
[0-9A-Za-z]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
\. matches the character . literally
[0-9A-Za-z]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
$ assert position at end of a line
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
what is wrong with regexp?
This part of regexp:
([0-9A-Za-z]+\.)
can't be matched with "google.com" in your example. But it matches with wrong domain addresses like google. Something like this:
(?:[0-9A-Za-z]+\.)+[A-Za-z]{2,}
is more correct and flexible.
Several things not working in this regexp. For instance nothing happen after the period at the end. Also never need {1} as every expression matched once by default. Main problem is the | because the branches are not inside () so after each | the match restart completely
This is your regexp cleaned up. For instance pulling things into groups, see the | inside ().
isMatch = (/^(?:(?:https?|ftp):\/\/)?(?:www\.)?[0-9a-z]+\.[a-z]+/im.test(url))

Categories