Regex to match mostly alphanumeric paths - javascript

I tried creating a regular expression to verify path names for a filesystem API I am writing using GridFS.
My current regex ^[A-Za-z0-9\-\[\]()$#_./]*$ meets this criterion:
Allow a-z, A-Z, 0-9, -[]()$#_./
However it doesn't meet these additional criteria:
First Character has to be /
There mustn't be any occurrence of multiple / in a row.
Questions:
Can anybody help me fix my RegEx?
Are there any possible issues with using my criteria for path names? (Did I miss anything important?)

Not sure about the path criteria, but regarding the RegExp, pretty simple:
^\/(?!\/)([A-Za-z0-9\-\[\]()$#_.]|(\/(?!\/)))*$
\/(?!\/) means a slash / not followed by a slash (?!\/). I used it twice, once as the first character, and again as one of the possible matches after the first character.
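A quick way to check the lookahead version is to run it against a few sample paths. This is a minimal sketch for a plain JavaScript environment; the helper name isValidPathLookahead and the sample paths are made up for illustration.

```javascript
// The answer's pattern: a leading "/", and every later "/" must not be
// followed by another "/" (negative lookahead (?!\/)).
const PATH_RE_LOOKAHEAD = /^\/(?!\/)([A-Za-z0-9\-\[\]()$#_.]|(\/(?!\/)))*$/;

// Hypothetical helper: true if the path satisfies all of the criteria.
function isValidPathLookahead(path) {
  return PATH_RE_LOOKAHEAD.test(path);
}

// Hypothetical sample paths.
const samples = ["/", "/docs/file_1.txt", "/a/b/", "docs/file.txt", "//docs", "/a//b", "/a b"];
for (const s of samples) {
  console.log(JSON.stringify(s), isValidPathLookahead(s));
}
```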

Here's how you could address your requirements. To enforce the first character is /, simply add that after the ^.
^\/[A-Za-z0-9\-\[\]()$#_./]*$
To not allow consecutive slashes, you should remove it from your character set, and think of the set as a portion of the path. Portions would be separated by a slash. So the final regex would be:
^\/([A-Za-z0-9\-\[\]()$#_.]\/?)*$
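The segment-based pattern can be checked the same way. In this sketch the sample paths and expected results are illustrative assumptions; the comparison simply flags any case where the regex disagrees with the stated rules.

```javascript
// Each repetition is one allowed character optionally followed by a single "/".
// Since "/" is no longer in the character set, two slashes can never be adjacent.
const PATH_RE_SEGMENTS = /^\/([A-Za-z0-9\-\[\]()$#_.]\/?)*$/;

// Hypothetical sample paths with the result the criteria call for.
const cases = {
  "/": true,
  "/docs/file_1.txt": true,
  "/a/b/": true, // a trailing slash is accepted
  "docs/file.txt": false, // must start with "/"
  "//docs": false,
  "/a//b": false,
};

for (const [path, expected] of Object.entries(cases)) {
  const actual = PATH_RE_SEGMENTS.test(path);
  console.log(path, actual, actual === expected ? "ok" : "MISMATCH");
}
```

Both answers' patterns accept a trailing slash such as /a/b/ and the bare root /; whether those should be valid names is worth deciding for the API.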

Related

validate username with regex in javascript

I am a newbie to regex and would like to create a regular expression to check usernames. These are the conditions:
username must have between 4 and 20 characters
username must not contain anything but letters a-z, digits 0-9 and special characters -._
the special characters -._ must not be used successively in order to avoid confusion
the username must not contain whitespaces
Examples
any.user.13 => valid
any..user13 => invalid (two dots successively)
anyuser => valid
any => invalid (too short)
anyuserthathasasupersuperlonglongname => invalid (too many characters)
any username => invalid because of the whitespace
I've tried to create my own regex and only got to the point where I specify the allowed characters:
[a-z0-9.-_]{4,20}
Unfortunately, it still matches a string if there is whitespace in between, and it allows two of the special characters .-_ in a row.
If anybody would be able to provide me with help on this issue, I would be extremely grateful. Please keep in mind that I'm a newbie on regex and still learning it. Therefore, an explanation of your regex would be great.
Thanks in advance :)
Sometimes writing a regular expression can be almost as challenging as finding a user name. But here you were quite close to making it work. I can point out three reasons why your attempt fails.
First of all, we need to match all of the input string, not just part of it, because we don't want to ignore whitespace and other characters that appear in the input. For that, one typically uses the anchors ^ (match start) and $ (match end).
Another point is that we need to prevent two special characters from appearing next to each other. This is best done with a negative lookahead.
Finally, I can see that the tool you are using to test your regex is adding the flags gmi, which is not what we want. In particular, the i flag makes the regex case-insensitive, so it would match capital letters as well as lowercase ones. Remove that flag.
The final regex looks like this:
/^([a-z0-9]|[-._](?![-._])){4,20}$/
There is nothing really cryptic here, except maybe for the group [-._](?![-._]) which means any of -._ not followed by any of -._.
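The sketch below runs the answer's regex against the examples from the question. It also compares an anchored copy of the question's class [a-z0-9.-_], which shows one more pitfall: inside a character class, .-_ is a range from . (U+002E) to _ (U+005F), which also covers digits, uppercase letters and punctuation such as /, :, @, [ and ^. Putting - first, as in [-._], makes it a literal hyphen. The extra "ANYUSER" example is an assumption added for illustration.

```javascript
// The answer's pattern: 4 to 20 units, each either a lowercase letter or digit,
// or one of -._ that is not immediately followed by another of -._ (negative lookahead).
const USERNAME_RE = /^([a-z0-9]|[-._](?![-._])){4,20}$/;

// The question's character class, anchored here only for comparison.
// ".-_" is a range from "." to "_", so it also accepts uppercase letters.
const ORIGINAL_CLASS_RE = /^[a-z0-9.-_]{4,20}$/;

const examples = [
  "any.user.13", // valid
  "any..user13", // invalid: two dots in a row
  "anyuser", // valid
  "any", // invalid: too short
  "anyuserthathasasupersuperlonglongname", // invalid: too long
  "any username", // invalid: contains a space
  "ANYUSER", // invalid: uppercase letters are not allowed
];

for (const name of examples) {
  console.log(name, USERNAME_RE.test(name), ORIGINAL_CLASS_RE.test(name));
}
```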

Regex finding file names

Can anyone help me with the REGEX to match
../_assets/applications/cleaning/*logo.png
where * is the part of the file name before logo, which may end with an underscore or a dash, so:
../_assets/applications/cleaning/main_logo.png
OR
../_assets/applications/cleaning/main-logo.png
this is as far as I got
\assets\/applications\/cleaning\/
An asterisk in a regex is a quantifier allowing zero or more of the previous character or group, so your first expression would allow zero or more forward slashes. To allow any characters (excluding newlines), combine . with a quantifier: .* allows zero or more, while the lazy .+? used below requires at least one character before logo.png. So something like:
\/cleaning\/(.+?logo\.png)$
should find all the images you want, then:
/logos/$1
should replace them as you wanted.
Demo: https://regex101.com/r/dmAjjv/1/
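The same search and replace can be done in JavaScript with String.prototype.replace, where "$1" in the replacement string refers to capture group 1. This is a sketch; the third path is a made-up example showing that non-matching paths are left unchanged.

```javascript
// "/cleaning/" followed by a lazy run of at least one character
// that ends in "logo.png" at the end of the string (captured as group 1).
const LOGO_RE = /\/cleaning\/(.+?logo\.png)$/;

const paths = [
  "../_assets/applications/cleaning/main_logo.png",
  "../_assets/applications/cleaning/main-logo.png",
  "../_assets/applications/cleaning/photo.png", // hypothetical: no "logo.png" suffix
];

for (const p of paths) {
  // Replaces "/cleaning/<name>" with "/logos/<name>".
  console.log(p.replace(LOGO_RE, "/logos/$1"));
}
```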

Regex match word after negated set

I'm currently trying to match the following cases with Regex.
Current regex
\.\/[^/]\satoms\s\/[^/]+\/index\.js
Cases
// Should match
./atoms/someComponent/index.js
./molecules/someComponent/index.js
./organisms/someComponent/index.js
// Should not match
./atomsdsd/someComponent/index.js
./atosdfms/someComponent/index.js
./atomssss/someComponent/index.js
However none of the cases are matching, what am I doing wrong?
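Hope this will help you out. You have added some additional characters that make your regex fail: [^/]\s before atoms and \s after it require a non-slash character and whitespace that never appear in your paths, and the pattern only mentions atoms, not molecules or organisms.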
Regex: \.\/(atoms|molecules|organisms)\/[^\/]+\/index\.js
1. \.\/ This will match ./
2. (atoms|molecules|organisms) This will match either atoms or molecules or organisms
3. \/[^\/]+\/ This will match a /, then one or more characters other than /, then the next /
4. index\.js This will match index.js
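The four parts above can be checked against the cases from the question. This sketch also runs the question's original regex (with the / inside [^/] escaped for a regex literal) to show that it matches none of them.

```javascript
// The answer's pattern: "./", one of three folder names, one path segment, "/index.js".
const COMPONENT_RE = /\.\/(atoms|molecules|organisms)\/[^\/]+\/index\.js/;
// The question's pattern: [^\/]\s and \s demand characters the paths never contain.
const QUESTION_RE = /\.\/[^\/]\satoms\s\/[^\/]+\/index\.js/;

const shouldMatch = [
  "./atoms/someComponent/index.js",
  "./molecules/someComponent/index.js",
  "./organisms/someComponent/index.js",
];
const shouldNotMatch = [
  "./atomsdsd/someComponent/index.js",
  "./atosdfms/someComponent/index.js",
  "./atomssss/someComponent/index.js",
];

for (const s of [...shouldMatch, ...shouldNotMatch]) {
  console.log(s, COMPONENT_RE.test(s), QUESTION_RE.test(s));
}
```

The pattern is not anchored, so it also finds a match inside a longer string such as src/./atoms/x/index.js.map; add ^ and $ if the whole string must be exactly such a path.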
Why not just use this simpler pattern?
\.\/(atoms|molecules|organisms)\/.*?index\.js
Try the following:
\.\/(atoms|molecules|organisms)\/[a-zA-Z]*\/index\.js
Inside a /.../ regex literal, forward slashes should be escaped with a backslash \, and so should characters such as . that are special in a regex.
\.\/(atoms|molecules|organisms)\/ strictly matches './atoms/', './molecules/' or './organisms/'. The | is an alternation operator that matches either everything to its left or everything to its right; without the parentheses it would split the whole pattern into separate alternatives (\.\/atoms, molecules, organisms\/...), so partial strings would match.
[a-zA-Z]* will match a string of any length made of letters in either case: a-z accounts for lowercase and A-Z for uppercase. * means zero or more characters. Depending on what characters may be in someComponent, you may need to account for digits using [a-zA-Z\d]*.
\/index\.js will match '/index.js'
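The difference between [a-zA-Z]* and [a-zA-Z\d]* shows up as soon as a component name contains a digit. In this sketch the name button2 is a hypothetical example.

```javascript
// Component folder name may only contain letters.
const LETTERS_ONLY_RE = /\.\/(atoms|molecules|organisms)\/[a-zA-Z]*\/index\.js/;
// Variant from the same answer that also allows digits.
const LETTERS_DIGITS_RE = /\.\/(atoms|molecules|organisms)\/[a-zA-Z\d]*\/index\.js/;

const path = "./atoms/button2/index.js"; // hypothetical component name with a digit
console.log(LETTERS_ONLY_RE.test(path)); // false
console.log(LETTERS_DIGITS_RE.test(path)); // true

// Because * allows zero characters, an empty folder name is accepted as well.
console.log(LETTERS_ONLY_RE.test("./atoms//index.js")); // true
```

Using + instead of * would reject the empty folder name.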

Regex-Groups in Javascript

I have a problem using a Javascript-Regexp.
This is a very simplified regexp, which demonstrates my problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that don't contain forbidden characters (anything that is not a word character).
The only exception is the symbol +.
A match is allowed to start with this symbol if a digit [0-9] follows it.
And a + must not appear within words (44+44 is not a valid match, but +4ad is).
In order to allow the + only at the beginning, I said that there must be a whitespace preceding it. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are two issues with that regexp:
If I use it in my JS-Code, the space is always part of the match
If +42 appears at the beginning of a line, it won't be matched
My Questions:
What should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
What should the regex look like?
You've got multiple choices to reach your goal:
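It's fine as you have it. You might allow the start of the string in place of the whitespace as well, i.e. (?:^|\s). Then read the capturing groups (indexes 1 and 2 of each match) instead of the whole match; they do not include the whitespace. Note that String.prototype.match with the g flag returns only whole matches, so use RegExp.prototype.exec in a loop or String.prototype.matchAll to get at the groups.
A lookbehind could help, but JavaScript did not support it when this answer was written. Lookbehind assertions were added in ES2018 and work in current engines, so (?<=^|\s) is now an option.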
Require a non-word-boundary before the +, which would make every \w character before the + prevent the match:
/\B\+\d\w+|\w+/
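Applied to the input from the question, the non-word-boundary option and the lookbehind option (ES2018+) produce the same tokens. In this sketch the g flag is added to the answer's regex so that match() returns every token, and the lookbehind pattern is a variant written for comparison, not part of the original answer.

```javascript
const input = "+5ad6 +5ad6 sd asd+as +we";

// \B before "+" only succeeds at the start of the string or after a non-word
// character, so the "+" in "asd+as" cannot start a "+digits" token.
const NON_BOUNDARY_RE = /\B\+\d\w+|\w+/g;
console.log(input.match(NON_BOUNDARY_RE));

// The lookbehind is zero-width, so the preceding whitespace is not part of the match.
const LOOKBEHIND_RE = /(?<=^|\s)\+\d\w+|\w+/g;
console.log(input.match(LOOKBEHIND_RE));
```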
Why does this regex add the space to the matches?
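Because the regex does match the whitespace: \s sits inside the non-capturing group (?:...), so it is consumed and becomes part of the overall match. Only the capturing group (\+\d\w*) leaves it out, and match() with the g flag returns the overall matches rather than the groups.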
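Both issues from the question can be reproduced in a few lines. This sketch first runs the question's regex with match(), then a variant that accepts ^ in place of the whitespace and reads the capture groups through matchAll (ES2020); the variant is an illustration of the capture-group approach, not a canonical fix.

```javascript
const input = "+5ad6 +5ad6 sd asd+as +we";

// The question's regex: the leading "+5ad6" has no whitespace before it, so only
// "5ad6" matches, and the second token includes the space consumed by \s.
const QUESTION_RE = /(?:\s(\+\d\w*))|(\w+)/g;
console.log(input.match(QUESTION_RE)); // [ '5ad6', ' +5ad6', 'sd', 'asd', 'as', 'we' ]

// Allow the start of the string as well, then keep whichever group participated.
const GROUPED_RE = /(?:^|\s)(\+\d\w*)|(\w+)/g;
const tokens = [...input.matchAll(GROUPED_RE)].map((m) => m[1] ?? m[2]);
console.log(tokens); // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
```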

Email verification regex failing on hyphens

I'm attempting to verify email addresses using this regex: ^.*(?=.{8,})[\w.]+@[\w.]+[.][a-zA-Z0-9]+$
It's accepting emails like a-bc@def.com but rejecting emails like abc@de-f.com (I'm using the tool at http://tools.netshiftmedia.com/regexlibrary/ for testing).
Can anybody explain why?
Here is the explanation:
In your regular expression, the part that matches the domain of a-bc@def.com and abc@de-f.com is [\w.]+[.][a-zA-Z0-9]+$
It means:
There should be one or more word characters (letters, digits, and underscores) or '.' characters. See the reference for '\w'.
It is followed by a '.'.
Then it is followed by one or more characters from the set a-zA-Z0-9.
So the - in de-f.com doesn't match the [\w.]+ in rule 1.
The modified solution
You could adjust this part to [\w.-]+[.][a-zA-Z0-9]+$ so that - is accepted in the part after the @.
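The effect of the adjustment can be checked directly on the two addresses from the question. This sketch assumes @ as the separator; it also shows why a-bc@def.com passed all along: the leading ^.* can absorb "a-", so [\w.]+ before the @ only has to match "bc".

```javascript
// The question's pattern and the answer's adjustment ("-" added to the domain class).
const ORIGINAL_RE = /^.*(?=.{8,})[\w.]+@[\w.]+[.][a-zA-Z0-9]+$/;
const ADJUSTED_RE = /^.*(?=.{8,})[\w.]+@[\w.-]+[.][a-zA-Z0-9]+$/;

for (const email of ["a-bc@def.com", "abc@de-f.com"]) {
  console.log(email, ORIGINAL_RE.test(email), ADJUSTED_RE.test(email));
}
// a-bc@def.com true true
// abc@de-f.com false true
```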
Because after the @ you're looking for letters, numbers, _, or ., then a period, then alphanumerics. You don't allow for a - anywhere after the @.
You'd need to add the - to one of the character classes (except for the single literal period one, which I would have written \.) to allow hyphens.
\w is letters, numbers, and underscores.
A . inside a character class, indicated by [], is just a period, not any character.
In your first expression, you don't limit to \w, you use .*, which is 0+ occurrences of any character (which may not actually be what you want).
Use this Regex:
var emailRegex = /^[^@]+@[^@]+\.[^@\.]{2,}$/;
It will accept a-bc@def.com as well as emails like abc@de-f.com.
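This pattern is deliberately loose: it only checks the overall shape of the address. The sketch below exercises it; the last two addresses are made-up examples showing what it rejects and what it still lets through.

```javascript
// One or more non-@ characters, an @, more non-@ characters, then a final dot
// followed by at least two characters that are neither @ nor a dot.
const emailRegex = /^[^@]+@[^@]+\.[^@\.]{2,}$/;

console.log(emailRegex.test("a-bc@def.com")); // true
console.log(emailRegex.test("abc@de-f.com")); // true
console.log(emailRegex.test("abc@def")); // false: no dot in the domain
console.log(emailRegex.test("a b@de f.com")); // true: [^@] also accepts spaces
```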
You may also refer to a similar question on SO:
Why won't this accept email addresses with a hyphen after the @?
Hope this helps.
Instead, you can use a stricter regex like this one, which accepts hyphens, dots and underscores inside both the part before the @ and the domain:
^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$
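Being stricter also means rejecting some real addresses. In this sketch the last two addresses are hypothetical examples: the part before the @ must be at least two characters long, and characters such as + are not in [\w.-].

```javascript
// Starts and ends each part with a letter or digit; allows \w, "." and "-" in between.
const STRICT_RE =
  /^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$/;

console.log(STRICT_RE.test("a-bc@def.com")); // true
console.log(STRICT_RE.test("abc@de-f.com")); // true
console.log(STRICT_RE.test("a@example.com")); // false: one-character local part
console.log(STRICT_RE.test("first+tag@example.com")); // false: "+" is not allowed
```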
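The following regex works (inside a character class, | is a literal and a - between two characters forms a range, so the classes are written as [-._] and [-A-Za-z]):
([A-Za-z0-9]+[-._])*[A-Za-z0-9]+@[-A-Za-z0-9]+(\.[-A-Za-z]{2,})+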
