How can I only capture string within url() with regex? - javascript

I'll try and make this concise.
I want to only capture the string within the parenthesis of:
background-image:url()
I've tried for hours and the best I could come up with was (?<!\#\*)(?<=url\().*(?=\)).
(Try it online)
This is almost perfect except when there is a run on line like this or any minified CSS code:
background-image:url(/images/products/test#2x.png);height:0;background-image:url("/images/products/test#2x.png")
It captures everything from the first open parentheses to the last closed parenthesis including the irrelevant styles in between.
I only want to capture the string between the url()'s.

Using positive and negative lookbehind: /(?<=background-image:\s*url\().*(?=\))/ig will return everything inside the background-image: url() call, with \s* to catch spaces.

use regex
(?<=background-image:url\()(.+?)(?=\))
can found all xxx in background-image:url(xxx), while not include background-image:url()
you can check result in: https://regexr.com , like this:
Note:
about look around = look ahead and look behind, here is my summary:
more detail can refer (Chinese) tutorial: 环视断言 · 应用广泛的超强搜索:正则表达式

Below is the regex which will work for you:
(?<!\#\*)(?<=url\().*?(?=\))
Hope this helps!

Related

javascript regexp to match path depth

Been struggling for the last hour to try and get this regexp to work but cannot seem to crack it.
It must be a regexp and I cannot use split etc as it is part of a bigger regexp that searches for numerous other strings using .test().
(public\/css.*[!\/]?)
public/css/somefile.css
public/css/somepath/somefile.css
public/css/somepath/anotherpath/somefile.css
Here I am trying to look for path starting with public/css followed by any character except for another forward slash.
so "public/css/somefile.css" should match but the other 2 should not.
A better solution may be to somehow specify the number of levels to match after the prefix using something like
(public\/css\/{1,2}.*)
but I can't seem to figure that out either, some help with this would be appreciated.
edit
No idea why this question has been marked down twice, I have clearly stated the requirement with sample code and test cases and also attempted to solve the issue, why is it being marked down ?
You can use this regex:
/^(public\/css\/[^\/]*?)$/gm
^ : Starts with
[^/] : Not /
*?: Any Characters
$: Ends with
g: Global Flag
m: Multi-line Flag
Something like this?
/public\/css\/[^\/]+$/
This will match
public/css/[Any characters except for /]$
$ is matching the end of the string in regex.

Capturing optional part of URL with RegExp

While writing an API service for my site, I realized that String.split() won't do it much longer, and decided to try my luck with regular expressions. I have almost done it but I can't find the last bit. Here is what I want to do:
The URL represents a function call:
/api/SECTION/FUNCTION/[PARAMS]
This last part, including the slash, is optional. Some functions display a JSON reply without having to receive any arguments. Example: /api/sounds/getAllSoundpacks prints a list of available sound packs. Though, /api/sounds/getPack/8Bit prints the detailed information.
Here is the expression I have tried:
req.url.match(/\/(.*)\/(.*)\/?(.*)/);
What am I missing to make the last part optional - or capture it in whole?
This will capture everything after FUNCTION/ in your URL, independent of the appearance of any further / after FUNCTION/:
FUNCTION\/(.+)$
The RegExp will not match if there is no part after FUNCTION.
This regex should work by making last slash and part after optional:
/^\/[^/]*\/[^/]*(?:\/.*)?$/
This matches all of these strings:
/api/SECTION/FUNCTION/abc
/api/SECTION
/api/SECTION/
/api/SECTION/FUNCTION
Your pattern /(.*)/(.*)/?(.*) was almost correct, it's just a bit too short - it allows 2 or 3 slashes, but you want to accept anything with 3 or 4 slashes. And if you want to capture the last (optional) slash AND any text behind it as a whole, you simply need to create a group around that section and make it optional:
/.*/.*/.*(?:/.+)?
should do the trick.
Demo. (The pattern looks different because multiline mode is enabled, but it still works. It's also a little "better" because it won't match garbage like "///".)

What's wrong with this regular expression to find URLs?

I'm working on a JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've ran this through a couple regex testers and it seems to work, but in practice the string I get returned looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!
Remove the parentheses in the regex if you do not process the .org (unlikely since it is a literal). As per #Mark comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;
What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
You should escape .(DOT) in (.org) regex group or it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$

Javascript RegEx to find first line of each paragraph

This is probably a simple one but I'm very much a regex novice.
I'm looking to select the first line of every paragraph within a textarea on a page using a regular expression. After thinking I was there I have hit a problem.
Using http://gskinner.com/RegExr/ I came up with this:
/\r\r.*\r/g
but then I place that into my javascript and ran it on the page:
var headingsArr = document.getElementById("text").value.match(/\r\r.*\r/g);
and the array returns null.
Have I got the regular expression right and if so where am I going wrong when using it in my javascript!?
Thank you
This depends on what your newline characters are. I think you may better go for
/(?:\r\n|[\r\n]){2}.*(?:\r\n|[\r\n])/g
I know in Regexr only a \r is a newline. But in Windows normally \r\n is used, but under .*ix its normally only the \n.
So (?:\r\n|[\r\n]) is an alternation, it tries at first to match \r\n if this is not found it matches either \r or \n.
For the sake of future searchers, if you just need to style the first line of text, you can use the css pseudoclass ::first-line :
textarea::first-line {
background-color: yellow;
}
http://www.w3schools.com/cssref/sel_firstline.asp

Javascript Regular Expressions Lookbehind Failing

I am hoping that this will have a pretty quick and simple answer. I am using regular-expressions.info to help me get the right regular expression to turn URL-encoded, ISO-8859-1 pound sign ("%A3"), into a URL-encoded UTF-8 pound sign ("%C2%A3").
In other words I just want to swap %A3 with %C2%A3, when the %A3 is not already prefixed with %C2.
So I would have thought the following would work:
Regular Expression: (?!(\%C2))\%A3
Replace With: %C2%A3
But it doesn't and I can't figure out why!
I assume my syntax is just slightly wrong, but I can't figure it out! Any ideas?
FYI - I know that the following will work (and have used this as a workaround in the meantime), but really want to understand why the former doesn't work.
Regular Expression: ([^\%C2])\%A3
Replace With: $1%C2%A3
TIA!
Why not just replace ((%C2)?%A3) with %C2%A3, making the prefix an optional part of the match? It means that you're "replacing" text with itself even when it's already right, but I don't foresee a performance issue.
Unfortunately, the (?!) syntax is negative lookahead. To the best of my knowledge, JavaScript does not support negative lookbehind.
What you could do is go forward with the replacement anyway, and end up with %C2%C2%A3 strings, but these could easily be converted in a second pass to the desired %C2%A3.
You could replace
(^.?.?|(?!%C2)...)%A3
with
$1%C2%A3
I would suggest you use the functional form of Javascript String.replace (see the section "Specifying a function as a parameter"). This lets you put arbitrary logic, including state if necessary, into a regexp-matching session. For your case, I'd use a simpler regexp that matches a superset of what you want, then in the function call you can test whether it meets your exact criteria, and if it doesn't then just return the matched string as is.
The only problem with this approach is that if you have overlapping potential matches, you have the possibility of missing the second match, since there's no way to return a value to tell the replace() method that it isn't really a match after all.

Categories