Regex Positive lookahead get first occurrence - javascript

I have a string like this, and I want to capture the characters between .html and the last slash before it:
http://example.org/some-path/some-title-in-1978.html
That is, the part some-title-in-1978. For that I came up with this regex:
/\/.+?(?=\.html)/ but the result is not what I want. It looks like this:
//example.org/some-path/some-title-in-1978

Use the following regex pattern:
[^/]+(?=\.html)
https://regex101.com/r/wep2Im/1
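[^/]+ matches one or more characters other than a forward slash, and the lookahead (?=\.html) requires that they are immediately followed by .html. Because the match cannot contain a slash, it can only start after the last slash before .html.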
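As a quick check of the pattern above, here is a minimal sketch. The helper name titleBeforeHtml is hypothetical and only wraps the regex from the answer.

```javascript
// Hypothetical helper: returns the path segment directly before ".html",
// or null when the URL contains no such segment.
function titleBeforeHtml(url) {
  // [^/]+ cannot cross a slash, so the match starts after the last "/"
  // that precedes ".html"; the lookahead keeps ".html" out of the result.
  const match = url.match(/[^/]+(?=\.html)/);
  return match ? match[0] : null;
}

console.log(titleBeforeHtml("http://example.org/some-path/some-title-in-1978.html"));
// → "some-title-in-1978"
```

The original attempt, /\/.+?(?=\.html)/, starts at the first slash it can find, and the lazy quantifier only limits where the match ends, not where it begins.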

Related

How to not match given prefix in RegEx without negative lookbehind?

Goal
The goal is matching a string in JavaScript without certain delimiters, i.e. a string between two characters (the characters can be included in the match).
For example, this string should be fully matched: $ test string $. This can appear anywhere in a string. That would be trivial, however, we want to allow escaping the syntax, e.g. The price is 5\$ to 10\$.
Summarized:
Match any string that is enclosed by two $ signs.
Do not match it if the dollar signs are escaped using \$.
Solution using negative lookbehind
A solution that achieves this goal perfectly is: (?<!\\)\$(.*?)(?<!\\)\$.
Problem
This solution uses negative lookbehind, which is not supported on Safari. How can the same matches be achieved without using negative lookbehind (i.e. on Safari)?
A solution that partially works is (?<!\\)\$(.*?)(?<!\\)\$. However, this will also match the character in front of the $ sign if it is not a \.
You can rule out what you don't want by matching it first, and capture what you want to keep in group 1:
\\\$.*?\$|\$.*?\\\$|(\$.*?\$)
Regex demo
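A minimal sketch of how this "match what you don't want, capture what you do" pattern might be used. The function name unescapedDollarSpans is hypothetical; the key point is that only matches where group 1 participated are kept.

```javascript
// Hypothetical helper: collects the $...$ spans captured by group 1.
function unescapedDollarSpans(text) {
  // The first two alternatives consume spans that involve an escaped \$;
  // only the third alternative captures anything.
  const pattern = /\\\$.*?\$|\$.*?\\\$|(\$.*?\$)/g;
  const spans = [];
  for (const m of text.matchAll(pattern)) {
    // Group 1 is undefined when one of the discard alternatives matched.
    if (m[1] !== undefined) spans.push(m[1]);
  }
  return spans;
}

console.log(unescapedDollarSpans("a $ test string $ b"));       // → ["$ test string $"]
console.log(unescapedDollarSpans("The price is 5\\$ to 10\\$.")); // → []
```

One thing worth testing on real input: the second alternative runs lazily up to the next \$, so an unescaped pair that is followed later on the same line by an escaped \$ is consumed as a discard and not captured.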
You may use this regex and grab your inner text using capture group #1 as you are already doing in your current regex using lookbehind:
(?:^|[^\\])\$((?:\\.|[^$])*)\$
RegEx Demo
RegEx Details:
(?:^|[^\\]): Match start position or a non-backslash character in a non-capturing group
\$: Match starting $
(: Start capturing group
(?:\\.|[^$])*: Match any escaped character or a non-$ character. Repeat this group 0 or more times
): End capturing group
\$: Match closing $
PS: This regex will give same matches as your current regex: (?<!\\)\$(.*?)(?<!\\)\$
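For illustration, a small sketch of reading capture group 1 from this lookbehind-free pattern. The helper name dollarContents is an assumption, not part of the answer.

```javascript
// Hypothetical helper: returns the text between unescaped $ delimiters.
function dollarContents(text) {
  // (?:^|[^\\]) stands in for the negative lookbehind: the opening $ must be
  // at the start of the string or preceded by a non-backslash character.
  const pattern = /(?:^|[^\\])\$((?:\\.|[^$])*)\$/g;
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

console.log(dollarContents("The price is 5\\$ to 10\\$ and $ test string $"));
// → [" test string "]
```

Unlike the lookbehind version, the full match (m[0]) also includes the character in front of the opening $, which is why the sketch reads group 1 only.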

RegEx matching help: won't match on each appearance

I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it matches the sections needed, but it also matches the abc part of 4_abc, which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the negated character classes at the beginning and end do not work the way I hoped. If I use them on the example
a_d Dd_da 4_d d_4
they block matching the second item, Dd_da, because the space before it was already consumed by the first match. Sadly, I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.
You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
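A short sketch of the \b version applied to the sample input from the question. The function name findIdentifiers is hypothetical.

```javascript
// Hypothetical helper: returns every identifier-like token in the text.
function findIdentifiers(text) {
  // \b is zero-width, so the separating space is not consumed and stays
  // available as the boundary for the next match.
  return text.match(/\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g) || [];
}

console.log(findIdentifiers("a_d Dd_da 4_d d_4"));
// → ["a_d", "Dd_da", "d_4"]
```

4_d is skipped because there is no word boundary between 4, _ and d, so a match can neither start at the digit nor begin in the middle of the token.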

regex - how to select all double slashes except followed by colon

I need some help with RegEx. It may be basic stuff, but I cannot find the correct way to do it. Please help!
So, here's my question:
I have a list of URLs, that are invalid because of double slash, like this:
http://website.com//wp-content/folder/file.jpg. To fix them I need to remove all double slashes except the first one, which follows the colon (http://), so that the fixed URL is http://website.com/wp-content/folder/file.jpg.
I need to do it with RegExp.
Variant 1
url.replace(/\/\//g,'/'); // => http:/website.com/wp-content/folder/file.jpg
This replaces all double slashes (//), including the first one, which is not correct.
example here:
https://regex101.com/r/NhCVMz/2
You may use
url = url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2")
See the regex demo
Note: a ^ anchor at the beginning of the pattern might be used if the strings are entire URLs.
The first alternative matches and captures http:// or https://, and the $1 backreference restores it in the result. Any other run of two or more slashes is matched by (\/){2,}; since the quantifier sits outside the capturing group, $2 holds a single / and only one slash is put back.
Find (^|[^:])/{2,}
Replace $1/
delimited: /(^|[^:])\/{2,}/
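Both suggestions can be compared side by side in a small sketch. The function names collapseSlashes and collapseSlashesAlt are hypothetical, and the g flag on the second pattern is an assumption so that every run of slashes is replaced.

```javascript
// First suggestion: keep the scheme via $1, collapse other runs to one "/".
function collapseSlashes(url) {
  // When a group did not participate in a match, its $n reference
  // is replaced with an empty string.
  return url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2");
}

// Second suggestion: only collapse runs that are not preceded by a colon.
function collapseSlashesAlt(url) {
  return url.replace(/(^|[^:])\/{2,}/g, "$1/");
}

const broken = "http://website.com//wp-content/folder/file.jpg";
console.log(collapseSlashes(broken));    // → "http://website.com/wp-content/folder/file.jpg"
console.log(collapseSlashesAlt(broken)); // → "http://website.com/wp-content/folder/file.jpg"
```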

javascript regex: string contains this, but not that

I'm trying to put together a regex pattern that matches a string that does contain the word "front" and does NOT contain the word "square". I can accomplish each of these individually, but am having trouble putting them together.
front=YES
^((?=front).)*$
square=NO
^((?!square).)*$
However, how do I combine these into a single regex?
You can use just a single negative lookahead for this:
/^(?!.*square).*front/
RegEx Demo
RegEx Details:
^: Start
(?!.*square): Negative lookahead that fails the match if the text square appears anywhere in the input after the start position
.*front will match front anywhere in input
You could also use two lookahead assertions to express the logical AND of both conditions. The final pattern would look like this:
^(?=.*?front)(?!.*square)
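A minimal sketch that runs both suggested patterns against a few sample strings. The function names and sample inputs are made up for illustration.

```javascript
// Pattern from the first answer: reject "square" up front, then find "front".
function hasFrontNotSquare(text) {
  return /^(?!.*square).*front/.test(text);
}

// Pattern from the second answer: two lookaheads acting as a logical AND.
function hasFrontNotSquareAlt(text) {
  return /^(?=.*?front)(?!.*square)/.test(text);
}

for (const sample of ["front door", "front square", "square front", "back door"]) {
  console.log(sample, hasFrontNotSquare(sample), hasFrontNotSquareAlt(sample));
}
// Only "front door" yields true for both functions.
```

Since . does not match line breaks by default, both patterns only look at the first line of a multi-line string unless the s flag is added.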

Match index.html or empty url end with regex

I'm trying to check for a url ending of either:
http://www.site.com/html/
OR
http://www.site.com/html/index.html
So far I have this (after numerous attempts at moving the $ and the /'s around), but I can't seem to get it to work.
window.location.pathname.match(/index.html/|/^$z/))
You could try this:
window.location.pathname.match(/\/$|index\.html/)
This matches a trailing / in the pathname, or index.html anywhere in it.
The first part of the regex, \/$, escapes the forward slash, and the $ anchors the match to the end of the string. So the way I read it is "the last character is a forward slash".
The second part of the regex, index\.html, matches index.html; the period has to be escaped because an unescaped . matches any character.
Here's a regular expression cheatsheet: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
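A small sketch of the suggested check, written as a function of the pathname so it can run outside a browser. The stricter variant isIndexPathStrict is an assumption on my part, not part of the answer: it anchors index.html to the end of the path, which seems closer to "a url ending of either".

```javascript
// Pattern from the answer: trailing slash, or "index.html" anywhere.
function isIndexPath(pathname) {
  return /\/$|index\.html/.test(pathname);
}

// Hypothetical stricter variant: a slash, optionally followed by
// "index.html", must sit at the very end of the pathname.
function isIndexPathStrict(pathname) {
  return /\/(?:index\.html)?$/.test(pathname);
}

console.log(isIndexPath("/html/"));                     // → true
console.log(isIndexPath("/html/index.html"));           // → true
console.log(isIndexPath("/html/about.html"));           // → false
console.log(isIndexPath("/html/index.html.bak"));       // → true (not anchored)
console.log(isIndexPathStrict("/html/index.html.bak")); // → false
```

In a browser, either function would be called with window.location.pathname.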
