Regex finding file names

Can anyone help me with the REGEX to match
../_assets/applications/cleaning/*logo.png
"*" being the rest of the file name, which can be joined to "logo" with an underscore or a dash, so either
../_assets/applications/cleaning/main_logo.png
OR
../_assets/applications/cleaning/main-logo.png
this is as far as I got
\assets\/applications\/cleaning\/

An asterisk in a regex is a quantifier allowing zero or more of the previous character/group. So your first expression would allow zero or more forward slashes. You can use a . with a * to allow zero or more of any character (excluding line breaks). The pattern below uses .+? instead, which requires at least one character before logo.png and matches lazily. So something like:
\/cleaning\/(.+?logo\.png)$
should find all the images you want, then:
/logos/$1
should replace them as you wanted.
Demo: https://regex101.com/r/dmAjjv/1/
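A minimal JavaScript sketch of the answer's find-and-replace, assuming the paths arrive as plain strings. The helper name rewriteLogoPath is hypothetical, and /logos/ is simply the replacement string from the answer.

```javascript
// Pattern from the answer: "/cleaning/", at least one character, then "logo.png" at the end.
const logoPattern = /\/cleaning\/(.+?logo\.png)$/;

// Hypothetical helper: returns the rewritten path, or null if the path does not match.
function rewriteLogoPath(path) {
  if (!logoPattern.test(path)) return null;
  // "$1" re-inserts the captured file name (e.g. "main_logo.png") after "/logos/".
  return path.replace(logoPattern, "/logos/$1");
}

const assetPaths = [
  "../_assets/applications/cleaning/main_logo.png",
  "../_assets/applications/cleaning/main-logo.png",
  "../_assets/applications/cleaning/banner.png",
];
for (const p of assetPaths) {
  console.log(p, "->", rewriteLogoPath(p));
}
```

Because .+? needs at least one character, a bare cleaning/logo.png is not matched; .*? would accept that case too.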

Related

RegEx matching help: won't match on each appearance

I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it does match the sections needed, but it would also match the abc part of 4_abc, which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end don't work the way I hoped. If I use them on the example
a_d Dd_da 4_d d_4
the second term, Dd_da, is not matched, because the space before it was already consumed by the first match. Sadly, I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.

You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
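A quick JavaScript check of the \b version on the sample input from the question. With the g flag, String.prototype.match() returns every full match, and because \b is zero-width, no neighbouring space is consumed.

```javascript
const wordTokenPattern = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;
const sampleInput = "a_d Dd_da 4_d d_4";

// With the g flag, match() returns the full matches (capturing groups are not listed).
const wordTokens = sampleInput.match(wordTokenPattern);
console.log(wordTokens); // logs the three tokens a_d, Dd_da and d_4
```

The d in 4_d is skipped because there is no word boundary between _ and d, so \b cannot match before it.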

regex - how to select all double slashes except followed by colon

I need some help with RegEx; it may be basic stuff, but I cannot find the correct way to do it. Please help!
So, here's my question:
I have a list of URLs that are invalid because of double slashes, like this:
http://website.com//wp-content/folder/file.jpg. To fix it, I need to remove all double slashes except the first one, which follows the colon (http://), so the fixed URL is: http://website.com/wp-content/folder/file.jpg.
I need to do it with RegExp.
Variant 1
url.replace(/\/\//g,'/'); // => http:/website.com/wp-content/folder/file.jpg
will replace all double slashes (//), including the first one, which is not correct.
example here:
https://regex101.com/r/NhCVMz/2

You may use
url = url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2")
Note: a ^ anchor at the beginning of the pattern might be used if the strings are entire URLs.
This pattern matches and captures http:// or https:// and restores it in the resulting string with the $1 backreference. All other runs of two or more / are matched by (\/){2,}, and only one / is put back into the result via $2, since the capturing group does not include the quantifier.
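The same replace call wrapped in a small function, as a sketch; the name collapseSlashes is hypothetical. It relies on String.prototype.replace() substituting an empty string for a group that did not participate in the match.

```javascript
// Hypothetical helper wrapping the answer's replace call.
function collapseSlashes(url) {
  // Group 1 captures the scheme ("http://" or "https://") and is written back unchanged.
  // Group 2 captures a single "/" out of a run of two or more, so each run shrinks to one slash.
  // In each match only one group participates; the other contributes an empty string.
  return url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2");
}

console.log(collapseSlashes("http://website.com//wp-content/folder/file.jpg"));
// http://website.com/wp-content/folder/file.jpg
```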

Find: (^|[^:])/{2,}
Replace with: $1/
As a JavaScript regex literal (add the g flag to replace every occurrence): /(^|[^:])\/{2,}/g
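A sketch of this alternative in JavaScript. The sample URL is the one from the question with one extra doubled slash added for illustration; the g flag is needed so every run is replaced, not just the first.

```javascript
const brokenUrl = "http://website.com//wp-content//folder/file.jpg";
// (^|[^:]) captures the character before the run (or the string start), so the "//" after ":" is left alone.
const fixedUrl = brokenUrl.replace(/(^|[^:])\/{2,}/g, "$1/");
console.log(fixedUrl); // http://website.com/wp-content/folder/file.jpg
```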

Regex to match mostly alphanumeric paths

I tried creating a regular expression to validate path names for a filesystem API I'm writing using GridFS.
My current RegEx ^[A-Za-z0-9\-\[\]()$#_./]*$ fulfills this criterion:
Allow a-z, A-Z, 0-9, -[]()$#_./
However it doesn't meet these additional criteria:
The first character has to be /
There mustn't be any occurrence of multiple / in a row.
Questions:
Can anybody help me fix my RegEx?
Are there any possible issues with using my criteria for path names? (Did I miss anything important?)

Not sure about the path criteria, but regarding the RegExp, pretty simple:
^\/(?!\/)([A-Za-z0-9\-\[\]()$#_.]|(\/(?!\/)))*$
\/(?!\/) means a slash / not followed by a slash (?!\/). I used it twice, once as the first character, and again as one of the possible matches after the first character.
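A minimal sketch of using this pattern as a validator in JavaScript. The helper name isValidPath and the sample paths are made up for illustration.

```javascript
// Pattern from the answer: a leading "/" and no "/" directly followed by another "/".
const lookaheadPathPattern = /^\/(?!\/)([A-Za-z0-9\-\[\]()$#_.]|(\/(?!\/)))*$/;

// Hypothetical helper name; it only wraps RegExp.prototype.test().
function isValidPath(path) {
  return lookaheadPathPattern.test(path);
}

console.log(isValidPath("/docs/report(1).txt")); // true
console.log(isValidPath("docs/report.txt"));     // false: no leading "/"
console.log(isValidPath("/docs//report.txt"));   // false: consecutive slashes
console.log(isValidPath("/docs/report?.txt"));   // false: "?" is not in the allowed set
```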

Here's how you could address your requirements. To enforce the first character is /, simply add that after the ^.
^\/[A-Za-z0-9\-\[\]()$#_./]*$
To disallow consecutive slashes, remove / from your character set and treat the path as portions separated by a single slash: each allowed character may be followed by at most one /. So the final regex would be:
^\/([A-Za-z0-9\-\[\]()$#_.]\/?)*$
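The segment-style pattern can be checked the same way. This sketch uses made-up sample paths paired with the result the pattern should give; it also shows that / on its own and paths with a trailing slash are accepted, which the question's criteria don't rule out.

```javascript
const segmentPathPattern = /^\/([A-Za-z0-9\-\[\]()$#_.]\/?)*$/;

// Sample paths (made up) paired with the expected test() result.
const pathCases = [
  ["/", true],                   // root alone: zero repetitions of the group
  ["/a/b/", true],               // trailing slash is allowed
  ["/folder/file_1.txt", true],
  ["/a//b", false],              // each later "/" must follow an allowed character
  ["//a", false],
  ["a/b", false],                // must start with "/"
];

for (const [p, expected] of pathCases) {
  const actual = segmentPathPattern.test(p);
  console.log(p, actual, actual === expected ? "ok" : "unexpected");
}
```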

Exact string negation in javascript regexpressions

This is more a question to satisfy my curiosity than a real need for help, but I will appreciate your help equally as it is driving me nuts.
I am trying to negate an exact string using JavaScript regular expressions; the idea is to exclude URLs that include the string "www". For instance, in this list:
http://www.example.org/
http://status.example.org/index.php?datacenter=1
https://status.example.org/index.php?datacenter=2
https://www.example.org/Insights
http://www.example.org/Careers/Job_Opportunities
http://www.example.org/Insights/Press-Releases
For that, I can successfully use the following regex:
/^http(|s):..[^w]/g
This works correctly, but while I can do a positive match I cannot do something like:
/[^www]/g or /[^http]/g
to exclude lines that include the exact string www or http. I have tried the infamous "negative lookahead" like this:
/*(?: (?!www).*)/g
But this doesn't work either, or I cannot test it properly online; it doesn't work in Notepad++ either.
If I were using Perl, Grep, Awk or Textwrangler I would have simply done:
!www OR !http
And this would have done the job.
So, my question is obviously: What would be the correct way to do such thing in Javascript? Does this depend on the regex parser (as I seem to understand?).
Thanks for any answer ;)

You need to add a negative lookahead at the start.
^(?!.*\bwww\.)https?:\/\/.*
(?!.*\bwww\.) is a negative lookahead that asserts the string we are going to match does not contain www. anywhere. \b is a word boundary, which matches between a word character and a non-word character. Without \b, the www. in the lookahead would also match the www. in foowww.
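A sketch of applying this pattern to the question's list with Array.prototype.filter(). The pattern deliberately has no g flag: with g, RegExp.prototype.test() advances lastIndex between calls, which can make filter() skip matching strings.

```javascript
const exampleUrls = [
  "http://www.example.org/",
  "http://status.example.org/index.php?datacenter=1",
  "https://status.example.org/index.php?datacenter=2",
  "https://www.example.org/Insights",
  "http://www.example.org/Careers/Job_Opportunities",
  "http://www.example.org/Insights/Press-Releases",
];

// Negative lookahead from the answer; no g flag, so test() is stateless.
const noWwwPattern = /^(?!.*\bwww\.)https?:\/\/.*/;
const nonWwwUrls = exampleUrls.filter((u) => noWwwPattern.test(u));
console.log(nonWwwUrls); // keeps only the two status.example.org URLs
```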

To negate 'www' at every position in the input string:
var a = [
'http://www.example.org/',
'http://status.example.org/index.php?datacenter=1',
'https://status.example.org/index.php?datacenter=2',
'https://www.example.org/Insights',
'http://www.example.org/Careers/Job_Opportunities',
'http://www.example.org/Insights/Press-Releases'
];
a.filter(function(x){ return /^((?!www).)*$/.test(x); });
So at every position, check that 'www' doesn't match, and then match any character (.).
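The two answers differ on strings where www is not at a word boundary. This sketch contrasts them on a hypothetical URL, foowww.example.org, made up for illustration: the per-position check rejects it, while the \b version from the other answer accepts it.

```javascript
// Rejects the string if "www" starts at any position.
const anyWwwPattern = /^((?!www).)*$/;
// Rejects the string only if "www." starts at a word boundary (pattern from the other answer).
const boundaryWwwPattern = /^(?!.*\bwww\.)https?:\/\/.*/;

const trickyUrl = "http://foowww.example.org/"; // hypothetical URL, for illustration only
console.log(anyWwwPattern.test(trickyUrl));      // false
console.log(boundaryWwwPattern.test(trickyUrl)); // true

const plainWwwUrl = "http://www.example.org/";
console.log(anyWwwPattern.test(plainWwwUrl));      // false
console.log(boundaryWwwPattern.test(plainWwwUrl)); // false
```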

Regex-Groups in Javascript

I have a problem using a Javascript-Regexp.
This is a very simplified regexp, which demonstrates my problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that don't contain forbidden characters (anything that is not a word character).
The only exception is the symbol +.
A match is allowed to start with this symbol if it is followed by a digit [0-9].
And a + must not appear within words (44+44 is not a valid match, but +4ad is)
In order to allow the + only at the beginning, I said that there must be a whitespace preceding. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are two issues with that regexp:
If I use it in my JS-Code, the space is always part of the match
If +42 appears at the beginning of a line, it won't be matched
My Questions:
What should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);

What should the regex look like?
You've got multiple choices to reach your goal:
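It's fine as you have it, though you might also allow the start of the string in place of the whitespace, e.g. (?:^|\s). Just read the capturing groups (match[1] and match[2]) from each match instead of the full match; they do not include the whitespace. Note that input.match() with the g flag returns only full matches, so use exec() in a loop or matchAll() to get at the groups.
Older JavaScript engines did not support lookbehind, which would otherwise help here; lookbehind assertions have been part of the language since ECMAScript 2018, so in current engines (?<=^|\s)\+\d\w* is also an option.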
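Require a non-word boundary (\B) before the +, so that any \w character directly before the + prevents the match. Since the start of the string also counts as a non-word position, a + at the very beginning is still matched:
/\B\+\d\w*|\w+/g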
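A small check of the \B option on the question's sample input. Two details are assumptions rather than the answer's exact wording: the g flag is added so match() returns every token, and \d\w* (as in the question's own pattern) is used so a lone +5 would still be one token.

```javascript
const boundaryInput = "+5ad6 +5ad6 sd asd+as +we";
// \B before "+" fails when a word character precedes it (asd+as), but succeeds at the string start.
const boundaryTokens = boundaryInput.match(/\B\+\d\w*|\w+/g);
console.log(boundaryTokens); // +5ad6, +5ad6, sd, asd, as, we
```

Since \B is zero-width, no whitespace ends up in the returned tokens.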
Why does this regex add the space to the matches?
Because the regex does match the whitespace: it is part of the overall match, and match() with the g flag returns only overall matches. The whitespace is just not part of capturing group 1, (\+\d\w*).
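A sketch of keeping the question's structure while reading the capturing groups instead of the full match. It allows the start of the string as well as whitespace before +, and uses String.prototype.matchAll(), which requires the g flag. The variable names are illustrative.

```javascript
const groupInput = "+5ad6 +5ad6 sd asd+as +we";
// (?:^|\s) is consumed as part of the full match but is outside both capturing groups.
const groupPattern = /(?:^|\s)(\+\d\w*)|(\w+)/g;

const groupTokens = [];
for (const m of groupInput.matchAll(groupPattern)) {
  // m[0] may start with a space; exactly one of m[1] / m[2] is defined and never does.
  groupTokens.push(m[1] ?? m[2]);
}
console.log(groupTokens); // +5ad6, +5ad6, sd, asd, as, we
```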
