javascript regex exclude dot - javascript

Hello, I'm having difficulty writing the right regex. I'm missing something, but I don't know what.
pattern:
href=".*?\/FileBrowser\/File\?path=esoft\/[^.\s]*?"
test string:
dfhgndfhkljh;fth href="/FileBrowser/File?path=esoft/test/I4/I0000/as.jpeg" dfghfdhnjfgh e:small;"><a href="/FileBrowser/File?path=esoft/test/bb/2evo/1_folder" target="_blank"dsadsadsa
The site I use to test online is https://regex101.com/r/mU5vH6/2
The goal is to match the links (after href) separately, as shown here: https://regex101.com/r/mU5vH6/3, but if one of them contains a dot (meaning it is a file path), it should not be included.

You can use this regex:
href="[^"]*\/FileBrowser\/File\?path=esoft([^.])*?"
The previous one was matching this whole span, from the first href=" up to the closing quote of the second link:
href="/FileBrowser/File?path=esoft/test/I4/I0000/as.jpeg" dfghfdhnjfgh e:small;"><a href="/FileBrowser/File?path=esoft/test/bb/2evo/1_folder"
That's because you allowed your match to contain ": when [^.\s]*? hit the dot in the first path, .*? kept consuming characters past the first link's closing quote until it reached the second /FileBrowser/ path, which has no dot.
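To check the answer's regex in JavaScript, here is a small sketch that runs it over the question's test string with matchAll (the g flag is added here because matchAll requires it). The second regex is a variant of my own, not from the answer: [^."]* excludes both dots and quotes, and the capture group wraps the whole run so you get the folder path instead of only its last character.

```javascript
const text = 'dfhgndfhkljh;fth href="/FileBrowser/File?path=esoft/test/I4/I0000/as.jpeg" dfghfdhnjfgh e:small;"><a href="/FileBrowser/File?path=esoft/test/bb/2evo/1_folder" target="_blank"dsadsadsa';

// Regex from the answer, plus the g flag so matchAll can find every link
const answerRe = /href="[^"]*\/FileBrowser\/File\?path=esoft([^.])*?"/g;
const links = [...text.matchAll(answerRe)].map(m => m[0]);
console.log(links);
// [ 'href="/FileBrowser/File?path=esoft/test/bb/2evo/1_folder"' ]

// Variant (an assumption, not from the answer): stop at dots AND quotes,
// and capture the whole path rather than a single repeated character
const pathRe = /href="[^"]*\/FileBrowser\/File\?path=(esoft[^."]*)"/g;
const paths = [...text.matchAll(pathRe)].map(m => m[1]);
console.log(paths);
// [ 'esoft/test/bb/2evo/1_folder' ]
```

The file link (as.jpeg) is skipped by both regexes because the dot cannot be consumed and no closing quote follows before it.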

Related

Regex: Replace last segment of url

I'm trying to figure out the correct regex to replace the last segment of a URL with a modified version of that very last segment. (I know that there are similar threads out there, but none of them seemed to help...)
Example:
https://www.test.com/one/two/three/mypost/
--->
one/two/three?id=mypost
https://www.test.com/one/mypost/
--->
one?id=mypost
Now I am stuck here:
https://regex101.com/r/9GqYaU/1
I can get the last segment in capturing group 2 but how would I replace it?
I think I will have to do something like this:
const url = 'https://www.test.com/one/two/three/mypost/'
const regex = /(http[s]?:\/\/)([^\/]+\/)*(?=\/$|$)/
const path = url.replace(regex, `${myUrlWithoutTheLastSegmentAndWithoutHTTPS}?id=$2`)
return path
But I have no idea how to get the URL without the last segment. Currently I only have access to the whole string or group 1 (which is useless in this case) and group 2, but not to the string without group 2.
I would be very glad for any help here. Sometimes I just lack the knowledge of what is possible with regex and how to achieve it.
Thank you in advance.
Cheers
You could use the URL class to extract the pathname and substring to remove the first '/'.
Then, you could put the last part of the pathname in a group and use it as a reference $1 for the replacement.
const url = new URL('https://www.test.com/one/two/three/mypost/').pathname.substring(1)
console.log(url.replace(/\/([^/]*)\/$/, '?id=$1'))
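Wrapped in a function (the name toIdQuery is just for illustration), the answer's two lines can be checked against both examples from the question. Note that the regex expects a trailing slash; a URL without one is returned unchanged.

```javascript
// Hypothetical wrapper around the answer's code
function toIdQuery(href) {
  // pathname is e.g. "/one/two/three/mypost/"; substring(1) drops the leading slash
  const path = new URL(href).pathname.substring(1);
  // Capture the last segment between the final two slashes and turn it into ?id=...
  return path.replace(/\/([^/]*)\/$/, '?id=$1');
}

console.log(toIdQuery('https://www.test.com/one/two/three/mypost/')); // one/two/three?id=mypost
console.log(toIdQuery('https://www.test.com/one/mypost/'));           // one?id=mypost
```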
I came across your question yesterday and agree with going down the route of parsing the URL. Once you get there, you could even use JavaScript array methods, which I prefer to string methods, like:
pathname.split("/").filter(p => p.length).pop()
This would separate each folder, ignore any with no length (i.e. handle a trailing slash) and return the last one (mypost).
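Taking the array idea one step further, the remaining segments can be joined back together to build the whole result. This is a sketch of my own combining split/filter/pop (the function name is hypothetical); unlike the regex replacement, it also handles URLs without a trailing slash.

```javascript
// Hypothetical helper built on split/filter/pop
function toIdQueryWithArrays(href) {
  // "/one/two/three/mypost/" -> ["one", "two", "three", "mypost"]
  // (filter drops the empty strings produced by the leading and trailing slashes)
  const segments = new URL(href).pathname.split('/').filter(p => p.length);
  // pop() returns "mypost" and removes it from segments
  const last = segments.pop();
  return `${segments.join('/')}?id=${last}`;
}

console.log(toIdQueryWithArrays('https://www.test.com/one/two/three/mypost/')); // one/two/three?id=mypost
console.log(toIdQueryWithArrays('https://www.test.com/one/mypost'));            // one?id=mypost
```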
Anyway, I am also learning regex, so when I find a question like this I sometimes just try to work out the answer myself, since the best way of learning is doing. It took 24 hours 😂 I came up with this:
/(https?:\/\/).+?([a-z-]*)\/?$/gm
(https?:\/\/) you know what this does. Small correction: you don't need the square brackets. The question mark matches 0 or 1 of the preceding character, and since we're only making the s optional, s? is enough. If you wanted to match s or z you would use [sz]?.
.+? this is the cool one, and I think I will use it in future now I've found it. The question mark here has a different meaning: it makes .+ (which means one or more of any character) non-greedy (lazy). That means it matches as few characters as possible, taking one more only when the rest of the pattern fails to match. The rest of the pattern is...
([a-z-]*) any number of lowercase letters or hyphens. You may want to extend this to include digits and uppercase letters, e.g. [A-Za-z0-9-]*.
\/? Optional slash
$ anchors all of this to the end of the string (with the m flag, the end of each line).
Here is a demo
https://regex101.com/r/mQNkIS/1
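Here is the answer's regex used from JavaScript, as a sketch. I dropped the g and m flags because, with g, String.prototype.match returns only whole matches and not capture groups. The second regex is a hypothetical extension (not from the answer) that also captures the path between the host and the last segment, so a single replace produces the result the question asks for.

```javascript
// Regex from the answer (without g/m): group 2 is the last path segment
const lastSegmentRe = /(https?:\/\/).+?([a-z-]*)\/?$/;
console.log('https://www.test.com/one/two/three/mypost/'.match(lastSegmentRe)[2]); // mypost

// Hypothetical extension: group 1 = path before the last segment, group 2 = last segment
const pathAndLastRe = /^https?:\/\/[^/]+\/(.+?)\/([a-z-]*)\/?$/;
console.log('https://www.test.com/one/two/three/mypost/'.replace(pathAndLastRe, '$1?id=$2')); // one/two/three?id=mypost
console.log('https://www.test.com/one/mypost/'.replace(pathAndLastRe, '$1?id=$2'));           // one?id=mypost
```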

Javascript Replace with parameters

So I'm making a markdown editor, and I want some function like "This is *italics*".replace("*$1*","<i>$1</i>");
Any easy way to do this? (Client Side, this'll be hosted on Github Pages or something, so a random npm package probably won't help)
Edit: An equal number of people have upvoted and downvoted this. It would help if you tell me why you downvoted.
Short answer: 'This is *italics*'.replace(/\*(.+)\*/, '<i>$1</i>');
Explanation: Using a RegExp is the easiest way to go about this, specifically its capture groups.
Let's strip down /\*(.+)\*/:
The starting and ending / are defining that the thing in between is actually a RegExp
We need to check for asterisks at the start and at the end, but * is a quantifier in a RegExp, so we need to escape each one with a \ (basically saying "hey, the next character is not a special token, but a literal one")
Next we need to match any character between those asterisks (that's the .), appearing one or more times (that's the +)
Finally we need to group this and tell the RegExp that what we want to remember is the thing between the asterisks and not the whole match; that's where the parentheses come into play.
Using those parentheses, we can write $n in the replacement string (where n is the number of the capture group, in this case 1) to insert the text that group matched
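One thing to watch for in a markdown editor: .+ is greedy, so with two italic spans on one line it runs to the last asterisk. The sketch below shows the answer's regex next to a variant of my own (not from the answer) that uses [^*]+ so each match stops at the next asterisk, plus the g flag so every pair is replaced rather than only the first. A lazy .+? would work similarly.

```javascript
// Answer's regex: greedy .+ extends to the LAST asterisk on the line
const greedy = s => s.replace(/\*(.+)\*/, '<i>$1</i>');

// Variant (an assumption): [^*]+ cannot cross an asterisk; g replaces every pair
const perPair = s => s.replace(/\*([^*]+)\*/g, '<i>$1</i>');

console.log(greedy('This is *italics*')); // This is <i>italics</i>
console.log(greedy('*one* and *two*'));   // <i>one* and *two</i>
console.log(perPair('*one* and *two*'));  // <i>one</i> and <i>two</i>
```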

Match optional domain within string

I've racked my brain over this JS regex and have so far only managed to get parts of it to work or the whole thing to work in certain circumstances.
I have a string like this:
Some string<br>http://anysubdomain.particulardomain.com<br>Rest of string
The goal is to move the domain part to the end of the string, if it's there. The http part is also optional and can also be https. The domain is always particulardomain.com; the subdomain can be anything.
I've managed to get everything into capture groups when the domain with protocol is present with this regex:
(.*)(https?\:\/\/[a-z\d\-]*\.particulardomain\.com)(.*)
But any attempt at making the domain part and the protocol part within it optional has resulted in no or the wrong matches.
The end result I'm looking for is to have the three parts of the string – beginning, domain, end – in separate capture groups so I can move capture group 2 (the domain part) to the end, or, if there's no domain present, the whole string in the first capture group.
To clarify, here are some examples with the expected output/capture groups:
INPUT:
Some string<br>http://anysubdomain.particulardomain.com<br>Rest of string
OR (no protocol):
Some string<br>anysubdomain.particulardomain.com<br>Rest of string
OUTPUT:
$1: Some string<br>
$2: http://anysubdomain.particulardomain.com
$3: <br>Rest of string
INPUT:
Some string<br>Rest of string
OUTPUT:
$1: Some string<br>Rest of string
$2: empty
$3: empty
One mistake in your regex is that it contains only particular, whereas the source text contains particulardomain, but this is a detail.
Now let's move to the protocol part. You put only one ? (after s), which means that only s is optional, while both http and : are still required.
To make the whole protocol optional, you must:
enclose it with a group (either capturing or not),
make this group optional (put ? after it).
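And now maybe the most important thing: your regex starts with (.*). Note that this is the greedy version, which initially captures the whole rest of the source string, then backtracks one character at a time to allow the following part of the regex to match. Because the optional protocol group can also match an empty string, the greedy (.*) stops backtracking as soon as // can match, so http: ends up in the first group. Change it to the reluctant version (.*?) and then the optional group (https?:)? will match as expected.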
Another detail: the \ before : is not needed. It does no harm either, but following the principle "Keep It Simple...", I recommend deleting it (as I did above).
One more detail: after [a-z\d\-] (the subdomain part) you should put +, not *, as this part must not be empty.
So the whole regex can be:
(.*?)((https?:)?\/\/[a-z\d\-]+\.particulardomain\.com)(.*)
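A quick check of this regex in JavaScript, as a sketch. Note the numbering: the inner (https?:) is itself a capturing group, so the rest of the string lands in group 4, not group 3. Making the protocol group non-capturing, (?:https?:)?, would keep the rest in $3. The replacement '$1$4$2' is my own assumption of how the domain would be moved to the end. The input without // does not match at all.

```javascript
// Regex from this answer
const domainRe = /(.*?)((https?:)?\/\/[a-z\d\-]+\.particulardomain\.com)(.*)/;

const withProtocol = 'Some string<br>http://anysubdomain.particulardomain.com<br>Rest of string';
const m = withProtocol.match(domainRe);
console.log(m[1]); // Some string<br>
console.log(m[2]); // http://anysubdomain.particulardomain.com
console.log(m[3]); // http:   (the inner protocol group)
console.log(m[4]); // <br>Rest of string

// Hypothetical replacement that moves the domain to the end
console.log(withProtocol.replace(domainRe, '$1$4$2'));
// Some string<br><br>Rest of stringhttp://anysubdomain.particulardomain.com

// No "//" before the subdomain, so there is no match
console.log('Some string<br>anysubdomain.particulardomain.com<br>Rest of string'.match(domainRe)); // null
```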
And a last remark: I doubt whether you really need three capturing groups. Maybe it would be enough to keep only the content of the middle capturing group, i.e.:
(https?:)?\/\/[a-z\d\-]+\.particulardomain\.com
Found a solution. Since, as stated, the goal is to move the domain to the end of the string, if it's present, I'm just matching the domain and anything after it. If there's no domain, nothing matches and hence nothing gets replaced. The problem was the two .* both at the beginning and the end of the regex. Only the one at the end is needed.
REGEX:
([a-z\d\-:\/]+\.particulardomain\.com)(.*)
Works for the following strings:
Domain present:
Start of string 1234<br>https://subdomain.particulardomain.com<br>End of string 999
Domain without protocol:
Start of string 1234<br>subdomain.particulardomain.com<br>End of string 999
No domain:
Start of string 1234<br>End of string 999
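The post doesn't show the replacement step, so here is one way it might look, as a sketch: swapping the two groups with '$2$1' moves the domain (and its optional protocol) after everything that followed it. The replacement string and the helper name are my assumptions.

```javascript
// Regex from this solution: group 1 = domain (with optional protocol), group 2 = the rest
const moveRe = /([a-z\d\-:\/]+\.particulardomain\.com)(.*)/;

// Hypothetical helper: swap the groups so the domain ends up at the end
const moveDomainToEnd = s => s.replace(moveRe, '$2$1');

console.log(moveDomainToEnd('Start of string 1234<br>https://subdomain.particulardomain.com<br>End of string 999'));
// Start of string 1234<br><br>End of string 999https://subdomain.particulardomain.com
console.log(moveDomainToEnd('Start of string 1234<br>subdomain.particulardomain.com<br>End of string 999'));
// Start of string 1234<br><br>End of string 999subdomain.particulardomain.com
console.log(moveDomainToEnd('Start of string 1234<br>End of string 999'));
// Start of string 1234<br>End of string 999   (no match, left unchanged)
```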
Thanks everyone for helping me rethink the problem!
I see a good answer here. As you explained, you need three groups and need to move the domain to the end of the string (to be clear: the entire URL, or only the domain, e.g. particulardomain.com?).
You can do this:
// I don't know if the <br> tag matters for your problem; suppose it doesn't.
// This is your input:
let str = "Start of string 1234<br>https://subdomain.particulardomain.com<br>End of string 99";
let group = str.split("<br>");
// Use a built-in array function with the regExp you made to find the domain part;
// this way you can validate the domain separately.
let indexOfDomain = group.findIndex(part => /[a-z\d\-:\/]+\.particulardomain\.com/.test(part));
if (indexOfDomain !== -1) {
  // Move the domain part to the end of the array
  group.push(group.splice(indexOfDomain, 1)[0]);
}
KEEP IN MIND:
Your solution will not work 100% of the time. Why?
Your regExp:
([a-z\d\-:\/]+\.particulardomain\.com)(.*)
will match http, https, and also any other run of those characters that is not a protocol, and it will not work for this input (you can test it if you like and leave a comment):
Start of string 1234<br>End of string 999
The regExp from @Valdi_Bo's answer:
(.*?)((https?:)?\/\/[a-z\d\-]+\.particulardomain\.com)(.*)
fits what you described in the question.
However, this regExp doesn't fit all of your inputs; maybe he did not test it against all of them, since your question did not spell them out the way your own answer does.
In conclusion, you need to extract the domain (I don't know whether you mean the entire URL, as the question mixes up the two ideas). If you are not going to use the regex on its own, doing a split and then validating each part with the regExp will be easier.

javascript regex negation detect Url NOT containing given domain

I need to check some HTML files and extract the URLs that don't point to 2 specific websites.
After many tests I got this:
/(http|https)?:?(\/\/)\w*\.*\-*[^(mysite.com)]\w*\.?\S*/igm
That works reasonably well, but not perfectly:
for example, as you can see HERE on regexr.com, it matches
// End
but not
www.demo.com
whereas it should be the contrary; but if I add a ? after (\/\/), it becomes a useless catch-all.
Also, if a URL has a " at the beginning and at the end (which clearly happens frequently), it does not grab the starting " (correct) but does grab the ending one (wrong).
Finally, it should also not match theothermysite.net, but I don't understand well how to handle OR with negation :-(
Can anyone help, please?
Joe
Like this?
/((http|https):(\/\/)|www\.)\w*\.*\-*[^(mysite.com)(theothermysite.net)]\w*\.?[^\s\t\r\n\"]*/igm
I just added an "or www", replaced \S with its components plus \", and added another group to the negation, like you already did with mysite.com.
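A caveat on both regexes above: [^(mysite.com)] is a character class, so it matches one single character that is not any of (, m, y, s, i, t, e, ., c, o or ); it cannot exclude the string mysite.com. A different technique that can exclude whole domains is a negative lookahead. The sketch below (variable names are hypothetical; the domain list comes from the question) starts at http://, https:// or www., rejects the URL if its host contains an excluded domain bounded by word boundaries, and uses [^\s"]+ so a closing quote is not captured.

```javascript
// Domains to exclude, taken from the question
const excluded = ['mysite.com', 'theothermysite.net'];

// Escape regex metacharacters (such as the dot) so each domain is matched literally
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const domains = excluded.map(escapeRegExp).join('|');

// (?!...) fails the match if the host part ([\w.-]*, which cannot cross a "/")
// contains one of the excluded domains with a word boundary on each side
const urlRe = new RegExp(
  String.raw`(?:https?:\/\/|www\.)(?![\w.-]*\b(?:${domains})\b)[^\s"]+`,
  'gi'
);

const html = '<a href="https://www.demo.com/x">a</a> <a href="http://mysite.com/y">b</a> ' +
  'www.test.org "https://theothermysite.net/z" // End';
console.log(html.match(urlRe)); // [ 'https://www.demo.com/x', 'www.test.org' ]
```

Because of the \b boundaries, a host such as notmysite.com is not excluded, while sub.mysite.com is.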

javascript regexp to match path depth

Been struggling for the last hour to try and get this regexp to work but cannot seem to crack it.
It must be a regexp and I cannot use split etc as it is part of a bigger regexp that searches for numerous other strings using .test().
(public\/css.*[!\/]?)
public/css/somefile.css
public/css/somepath/somefile.css
public/css/somepath/anotherpath/somefile.css
Here I am trying to look for a path starting with public/css/ followed by any characters except another forward slash.
so "public/css/somefile.css" should match but the other 2 should not.
A better solution may be to somehow specify the number of levels to match after the prefix using something like
(public\/css\/{1,2}.*)
but I can't seem to figure that out either, some help with this would be appreciated.
edit
No idea why this question has been marked down twice, I have clearly stated the requirement with sample code and test cases and also attempted to solve the issue, why is it being marked down ?
You can use this regex:
/^(public\/css\/[^\/]*?)$/gm
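^ : Starts with (start of a line, with the m flag)
[^/] : Any character except /
*? : Zero or more times, lazily (with $ at the end this behaves the same as *)
$ : Ends with (end of a line, with the m flag)
g : Global flag
m : Multi-line flag, so ^ and $ also match at line breaks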
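Since the question says the regex is used with .test(), here is a sketch of the answer's pattern against the three sample paths. One pitfall worth knowing: with the g flag, test() stores lastIndex between calls, so reusing the same regex object can start searching in the middle (or at the end) of the next string and fail. Dropping g avoids that.

```javascript
const samplePaths = [
  'public/css/somefile.css',
  'public/css/somepath/somefile.css',
  'public/css/somepath/anotherpath/somefile.css',
];

// Answer's regex without the g flag: safe to reuse with test()
const oneLevel = /^(public\/css\/[^\/]*?)$/;
console.log(samplePaths.map(p => oneLevel.test(p))); // [ true, false, false ]

// Same regex with the answer's gm flags: test() remembers lastIndex
const oneLevelGM = /^(public\/css\/[^\/]*?)$/gm;
console.log(oneLevelGM.test(samplePaths[0])); // true  (lastIndex is now 23)
console.log(oneLevelGM.test(samplePaths[0])); // false (search starts at index 23)
```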
Something like this?
/public\/css\/[^\/]+$/
This will match
public/css/[Any characters except for /]$
$ matches the end of the string.
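Two notes on this version, shown as a sketch. Without a ^ anchor it also matches when public/css/ appears later in the string. And for the "number of levels" idea from the question, a quantifier on a whole segment group, (?:[^/]+/){0,n}, does what {1,2} on a single slash cannot; the depthRegex helper below is hypothetical.

```javascript
// Answer's regex: no ^ anchor, so a prefix before public/css/ is allowed
const lastLevel = /public\/css\/[^\/]+$/;
console.log(lastLevel.test('public/css/somefile.css'));          // true
console.log(lastLevel.test('public/css/somepath/somefile.css')); // false
console.log(lastLevel.test('assets/public/css/somefile.css'));   // true

// Hypothetical helper: allow up to maxDepth segments after public/css/ (maxDepth >= 1)
function depthRegex(maxDepth) {
  return new RegExp('^public/css/(?:[^/]+/){0,' + (maxDepth - 1) + '}[^/]+$');
}

const upToTwo = depthRegex(2);
console.log(upToTwo.test('public/css/somepath/somefile.css'));             // true
console.log(upToTwo.test('public/css/somepath/anotherpath/somefile.css')); // false
```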
