regex - how to select all double slashes except those preceded by a colon - javascript

I need some help with regex. This may be basic stuff, but I cannot find the correct way to do it. Please help!
So, here's my question:
I have a list of URLs that are invalid because of a double slash, like this:
http://website.com//wp-content/folder/file.jpg
To fix them, I need to remove every double slash except the first one, which is preceded by a colon (http://), so the fixed URL is: http://website.com/wp-content/folder/file.jpg
I need to do it with RegExp.
Variant 1
url.replace(/\/\//g,'/'); // => http:/website.com/wp-content/folder/file.jpg
replaces every double slash (//), including the first one, which is not correct.
Example here:
https://regex101.com/r/NhCVMz/2

You may use
url = url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2")
Note: if each string is an entire URL, you can add a ^ anchor at the beginning of the pattern (/^(https?:\/\/)|(\/){2,}/g), so that only a leading http:// or https:// is protected.
This pattern matches and captures http:// or https:// in group 1 and restores it in the result with the $1 backreference. Every other run of two or more slashes is matched by (\/){2,}, and $2 puts back only one slash, because the capturing group contains a single / and the quantifier sits outside it.
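The first answer can be tried out as below. This is a minimal sketch. The helper name fixUrl and the second (triple-slash) sample URL are illustrative assumptions and are not part of the question.

```javascript
// Hypothetical helper wrapping the answer's pattern.
// Branch 1 captures "http://" or "https://" into $1 so it is written back unchanged.
// Branch 2 matches a run of two or more slashes; $2 holds a single "/" because
// the {2,} quantifier sits outside the group, so the run collapses to one slash.
// In String.prototype.replace, a group that did not take part in the match
// is substituted as an empty string.
function fixUrl(url) {
  return url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2");
}

console.log(fixUrl("http://website.com//wp-content/folder/file.jpg"));
// http://website.com/wp-content/folder/file.jpg
console.log(fixUrl("https://website.com///wp-content//folder/file.jpg"));
// https://website.com/wp-content/folder/file.jpg
```

The pattern captures only http: and https:. So the // after any other scheme (for example ftp://) would be collapsed to a single slash.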

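Find: (^|[^:])/{2,}
Replace with: $1/
As a delimited JavaScript regex (with the g flag so every occurrence is replaced): /(^|[^:])\/{2,}/g
Group 1 captures either the start of the string or the single character before the slashes (anything except a colon), so $1 can restore it. The whole run of slashes is then replaced by one /.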
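Here is the same fix with the second answer's pattern, as a sketch. The name collapseSlashes and the ftp:// sample are hypothetical. This pattern keys on the colon rather than on http/https, so the // right after any scheme is kept.

```javascript
// Hypothetical helper using the second answer's pattern.
// (^|[^:]) captures the character just before the slashes (or the empty start
// of the string) so "$1" can put it back; \/{2,} matches the run of slashes,
// which is replaced by a single "/". The "//" directly after a colon is
// preserved, because [^:] cannot consume the colon.
function collapseSlashes(url) {
  return url.replace(/(^|[^:])\/{2,}/g, "$1/");
}

console.log(collapseSlashes("http://website.com//wp-content/folder/file.jpg"));
// http://website.com/wp-content/folder/file.jpg
console.log(collapseSlashes("ftp://example.com//pub//file.txt"));
// ftp://example.com/pub/file.txt
```

The ^ branch has one side effect. A protocol-relative URL such as //cdn.example.com/lib.js loses its leading // and becomes /cdn.example.com/lib.js.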

Related

Regex for matching BBCode Images

I need a regex to check whether the textarea contains one of the following:
[img]https://example.com/image.jpg[/img]
[img=https://example.com/image.jpg]
This is what I've been trying so far, but it doesn't work, sadly...
/\[img(?=|\])(https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*))(?\])(?\[\/img\])/gi
Thank you.
You can use this: ^\[img(?:\].+\[\/img\]|=.+\])$
Note: if you also want to verify that the link is a valid URL, replace both occurrences of .+ with a URL-matching regex.
Explanation
^\[img - This part is common to both forms. It matches [img at the start of the line.
(?:\].+\[\/img\]|=.+\])$ - This matches one of two alternatives, depending on the character right after [img.
First alternative (next character is ]) - In this case \].+\[\/img\] is matched. This matches everything (.+) between the opening [img] and the closing [/img], and then the closing tag itself.
Second alternative (next character is =) - In this case =.+\] is matched. This grabs everything after img= up to the closing ] at the end of the line.
Finally, $ matches the end of the line.
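This is a quick way to check the answer's pattern against the two forms from the question. The two negative samples are made up for illustration.

```javascript
// The answer's pattern. ^ and $ anchor to the start and end of the whole input
// (no m flag), so the textarea must contain exactly one tag and nothing else.
// The g flag is left out on purpose: with g, RegExp.prototype.test remembers
// lastIndex between calls, which can make repeated checks give wrong results.
const bbImg = /^\[img(?:\].+\[\/img\]|=.+\])$/;

const samples = [
  "[img]https://example.com/image.jpg[/img]", // true
  "[img=https://example.com/image.jpg]",      // true
  "[img]https://example.com/image.jpg",       // false: no closing tag
  "[url]https://example.com/[/url]",          // false: not an img tag
];
for (const s of samples) {
  console.log(s, bbImg.test(s));
}
```

The question's attempt used /gi. Add the i flag if uppercase [IMG] tags should also be accepted.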

Regex to match all spaces, except single spaces having words on both sides

Here is what I have so far:
1) Run the regex / /g to match all spaces.
2) Run a second regex, /\b( )\b/g, to match the spaces that need to be excluded.
Now I need both combined into one pattern that matches all spaces except the ones matched by the second regex. Any help?
Live regex for testing: https://regex101.com/r/26w2WR/1
EDIT: Although good answers are already available, I found that matching "words" with \b or \B is not always a good idea. Many printable characters, such as dots and quotes, are not word characters to the regex engine. Another problem arises when you loop through DOM nodes. You sometimes encounter inline tags like <strong>, which should count as the beginning or end of a word, but a #text node simply ends before the tag. So you may want to include the start and end of the string in the regex too. For anyone wishing to address these issues as well, I ended up with this regex:
/(\S|^)( )(?=\S|$)/g
This uses \S (non-whitespace) and includes the start/end of the string. It also uses groups so that the match can be replaced. In JS, apply it to the text node's nodeValue:
yourTextNode.nodeValue = yourTextNode.nodeValue.replace(/(\S|^)( )(?=\S|$)/g, '$1' + yourReplacement)
To match non-breaking spaces (&nbsp;, U+00A0), use (\u00A0) instead of ( ).
Hope this helps.
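Here is a sketch of the asker's final regex applied to a plain string. The sample text and the visible "_" replacement are assumptions, chosen only to make the matches easy to see.

```javascript
// The asker's pattern: a single space whose neighbours are non-whitespace
// characters or the start/end of the string.
// (\S|^) consumes the preceding character, so "$1" writes it back;
// (?=\S|$) only looks ahead, so the next word is not consumed.
const singleSpace = /(\S|^)( )(?=\S|$)/g;

// Hypothetical replacement string, chosen only to make the matches visible.
const replacement = "_";

const text = "one two  three ";
console.log(text.replace(singleSpace, "$1" + replacement));
// one_two  three_

// In the browser you would apply the same call to a Text node's nodeValue:
//   node.nodeValue = node.nodeValue.replace(singleSpace, "$1" + replacement);
```

The space before the trailing end of the string is matched because of $, and the double space is left alone because neither space has non-whitespace on both sides.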
You can use negative look-ahead:
(?!\b \b)( )
Without any look-around you can use this regex with \B and alternation:
\B +| +\B
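\B asserts a position where \b does not match.
So the pattern matches any run of spaces preceded or followed by such a position. That is every space except a single space with a word character on each side.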
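You can compare the two answers' patterns side by side with a small sketch. The helper mark and the sample strings are hypothetical. Each matched space is replaced by "_" so you can see which spaces are selected.

```javascript
// Both answers' patterns. \b and \B use JavaScript's ASCII word characters
// [A-Za-z0-9_], so punctuation such as "." counts as a non-word character.
const lookahead = /(?!\b \b)( )/g;  // a space NOT between two word boundaries
const noLookaround = /\B +| +\B/g;  // spaces preceded or followed by \B

// Replace every matched space with "_"; the callback keeps the length,
// because " +" can match several spaces in one go.
function mark(text, re) {
  return text.replace(re, (m) => "_".repeat(m.length));
}

const sample = "one two  three ";
console.log(mark(sample, lookahead));         // one two__three_
console.log(mark(sample, noLookaround));      // one two__three_
console.log(mark("end. next", lookahead));    // end._next
console.log(mark("end. next", noLookaround)); // end._next
```

The last two lines show the point from the asker's edit. A dot is not a word character, so the single space after "end." is selected by both patterns.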

Regex finding file names

Can anyone help me with a regex to match
../_assets/applications/cleaning/*logo.png
where "*" is the file name, which may be followed by an underscore or a dash, so:
../_assets/applications/cleaning/main_logo.png
OR
../_assets/applications/cleaning/main-logo.png
This is as far as I got:
\assets\/applications\/cleaning\/
An asterisk in a regex is a quantifier allowing zero or more of the previous character or group. So in your first expression (cleaning/*logo.png) it would allow zero or more forward slashes. You can use . with * to allow zero or more of any character (excluding newlines). The pattern here uses .+?, which requires at least one character and matches as few as possible. So something like:
\/cleaning\/(.+?logo\.png)$
should find all the images you want, then:
/logos/$1
should replace them as you wanted.
Demo: https://regex101.com/r/dmAjjv/1/
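Here is a sketch of the answer's find and replace in JavaScript, using the two paths from the question. The /logos/ target comes from the answer itself.

```javascript
// The answer's pattern: "/cleaning/" followed by the shortest run of at least
// one character that ends in "logo.png" at the end of the string.
const logoPath = /\/cleaning\/(.+?logo\.png)$/;

const paths = [
  "../_assets/applications/cleaning/main_logo.png",
  "../_assets/applications/cleaning/main-logo.png",
];
for (const p of paths) {
  // $1 is the captured file name; only the matched part of the path is replaced.
  console.log(p.replace(logoPath, "/logos/$1"));
}
// ../_assets/applications/logos/main_logo.png
// ../_assets/applications/logos/main-logo.png
```

.+? requires at least one character before logo.png, so a bare cleaning/logo.png is left unchanged. Use .*? if that file should be matched too.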

Use regex in Find & Replace to extract everything but a pattern/string

I want to extract the ASIN from any Amazon URL. I found the following regex:
/([a-zA-Z0-9]{10})(?:[/?]|$)
This expression works for me in Excel. However, I also have to use another tool where I can only edit my text with Find & Replace. I can use regex there, but the tool always replaces whatever my regex matches.
When I use the expression above, the tool finds exactly the string I am looking for, but then replaces it with either nothing or whatever I put in the replace field.
What should the regex look like when I can only use Find & Replace? I assume it should match anything BUT the ASIN and replace that with nothing. In the end, everything except the ASIN should be deleted.
Example input:
https://www.amazon.de/gp/product/**B00ZFWRGXC**/ref=br_asw_pdt-1?pf_rd_m=A3JWKAKI7XB7XF&pf_rd_s=desktop-6&pf_rd_r=BKAKXRSA7JM715TZ38YN&pf_rd_t=36701&pf_rd_p=f54c1f0d-d685-4847-826e-7fdd8c321011&pf_rd_i=desktop
I only want to keep the bold part (via Find & Replace).
You may use a regex based on an alternation. One branch matches and captures what you need, and the other just matches all text that does not start your sequence.
Use
/([a-zA-Z0-9]{10})|(?:(?!/[a-zA-Z0-9]{10}).)*
and replace with $1\n. For best results, make sure the ". matches newline" option (if present) is on. If it is not available, replace the . with [\s\S].
Details:
/([a-zA-Z0-9]{10}) - match a / and capture 10 alphanumeric characters
| - or
(?:(?!/[a-zA-Z0-9]{10}).)* - zero or more characters, none of which starts a sequence of a / followed by 10 alphanumeric characters.
The $1 backreference restores the contents of the capturing group (the 10 alphanumeric characters) in the result.
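The question's tool is not named, so this sketch runs the same alternation through JavaScript's String.prototype.replace. extractAsins is a hypothetical helper. In a JS regex literal the / must be escaped, and [\s\S] stands in for . (JavaScript's s flag would work too).

```javascript
// The answer's pattern as a JavaScript regex literal.
// Branch 1 captures the 10 characters after "/" into $1; branch 2 swallows
// everything that does not start such a sequence, so only $1 survives.
const asinPattern = /\/([a-zA-Z0-9]{10})|(?:(?!\/[a-zA-Z0-9]{10})[\s\S])*/g;

function extractAsins(text) {
  // "$1\n" leaves each captured ID on its own line; the other matches
  // (including an empty match at the very end) become bare newlines,
  // which are filtered out here.
  return text
    .replace(asinPattern, "$1\n")
    .split("\n")
    .filter((line) => line !== "");
}

const url =
  "https://www.amazon.de/gp/product/B00ZFWRGXC/ref=br_asw_pdt-1?pf_rd_m=A3JWKAKI7XB7XF&pf_rd_s=desktop-6&pf_rd_r=BKAKXRSA7JM715TZ38YN&pf_rd_t=36701&pf_rd_p=f54c1f0d-d685-4847-826e-7fdd8c321011&pf_rd_i=desktop";

console.log(extractAsins(url).join("\n")); // B00ZFWRGXC
```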
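If other 10-character segments are matched by mistake, you can make the pattern stricter. Either restrict the character class to uppercase letters and digits:
/([A-Z0-9]{10})|(?:(?!/[A-Z0-9]{10}).)*
or require a / after the 10 characters:
/([a-zA-Z0-9]{10})/|(?:(?!/[a-zA-Z0-9]{10}/).)*
Either variant will fix it.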

Regex for fragment url, but not whole

I need a regular expression for my JS script.
Examples of urls that should not match:
http://www.domain.com/files/pictures/3749832
C://mydocuments/files/pictures/3749832
domain.com:8080/doc/files/pictures/3749832
BUT these should match:
files/pictures/3749832
/files/pictures/3749832
My regex, files/pictures/[0-9]{7}, is not good enough :(
You'll need to escape the forward slashes when writing the pattern as a JavaScript regex literal. You'll also want to make sure it matches at the start of the string (with or without a leading /). The ^ anchor matches the start of the string.
^\/?files\/pictures\/\d{7}
Here's a regex101 for you to play around with: https://regex101.com/r/gF5cA0/1
If it must not match anything after the number (such as a subfolder), add $ to match the end of the string:
^\/?files\/pictures\/\d{7}$
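Here is a small check of the anchored pattern against the examples from the question. The extra subfolder path is a hypothetical case, added to show the effect of $.

```javascript
// The answer's anchored pattern: optional leading "/", then files/pictures/
// and exactly seven digits, with nothing before or after.
const picturePath = /^\/?files\/pictures\/\d{7}$/;

const shouldMatch = ["files/pictures/3749832", "/files/pictures/3749832"];
const shouldNotMatch = [
  "http://www.domain.com/files/pictures/3749832",
  "C://mydocuments/files/pictures/3749832",
  "domain.com:8080/doc/files/pictures/3749832",
  "files/pictures/3749832/thumbs", // hypothetical subfolder, rejected by $
];

for (const s of shouldMatch) console.log(s, picturePath.test(s));    // true
for (const s of shouldNotMatch) console.log(s, picturePath.test(s)); // false
```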
