regex matching image url with spaces - javascript

I need to match an image URL like this:
http://site.com/site.com/files/images/img (5).jpg
Something like this works fine:
.replace(/(http:\/\/([ \S]+\.(jpg|png|gif)))/ig, "<div style=\"background: url($1)\"></div>")
Except if I have something like this:
http://site.com/site.com/files/audio/audiofile.mp3 http://site.com/site.com/files/images/img (5).jpg
How do I match only the image?
Thanks!
Edit: And I'm using javascript.

Assuming images will always be in the 'images' directory, try:
http://.*/images/(.*?)\.(jpe?g|gif|png)
If you can't assume an images directory:
http://.*/(.*?)\.(jpe?g|gif|png)
Group 1 and 2 should have what you want (file name and extension).
I tested the regular expression here and here and it appears to do what you want.

Proper URLs should not have spaces in them; a space should be written as %20 (or as a plus '+' in a query string). If your URLs were written that way, matching them would be much easier.
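As a rough sketch of that point, assuming you control how the URLs are written: encodeURI turns the space into %20, after which a plain \S+ pattern can no longer run from one URL into the next. The sample strings come from the question; the variable names are only for illustration.

```javascript
// Assumption: the URLs can be encoded before they are joined into one string.
const audio = "http://site.com/site.com/files/audio/audiofile.mp3";
const image = encodeURI("http://site.com/site.com/files/images/img (5).jpg");

// With no raw spaces inside a URL, \S+ stops at the space between URLs.
const imagePattern = /http:\/\/\S+\.(?:jpg|png|gif)/gi;
const found = (audio + " " + image).match(imagePattern);

console.log(image); // the encoded image URL
console.log(found); // array holding only the image URL
```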

Why not:
/([^/]+\.(jpg|png|gif))$

Using
http:\/\/.*\/(.*)\.(jpg|png|gif)
should do the trick if all you want is the name of the image. The first group is the file name and the second group is the file extension.
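A small sketch of how the two groups come out with the input from the question. Note that the overall match spans both URLs; only the captured groups isolate the image. The variable names are illustrative.

```javascript
// Pattern from the answer: the greedy .* backtracks to the last "/" before the name.
const pattern = /http:\/\/.*\/(.*)\.(jpg|png|gif)/;
const input =
  "http://site.com/site.com/files/audio/audiofile.mp3 " +
  "http://site.com/site.com/files/images/img (5).jpg";

const result = pattern.exec(input);
const fileName = result[1];  // first group: file name
const extension = result[2]; // second group: file extension

console.log(fileName, extension);
```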

Can you assume that the urls will be space delimited, or return delimited?
As in, can you assume this input?
site.com/images/images/lol (5).jpg
site.com/images/other/radio.mp3
site.com/images/images/copter (3).jpg
If you are going to have your delimiter as part of your string to return, things get tricky. What kind of volume are you talking about here? Could you do it semi-manually at all, or does the process have to be automated?

This would be an approach:
^((\w+):)?\/\/((\w|\.)+(:\d+)?)[^:]+\.(jpe?g|gif|png)$
Matching on the colon. (:)
In this case it's only accepted for the protocol and port (optional).
This will not match:
http://site.com/site.com/files/audio/audiofile.mp3 http://site.com/site.com/files/images/img (5).jpg
This will match (the colon in the second http:// has been removed, so "/audiofile.mp3 http/" counts as a folder inside "/audio/"):
http://site.com/site.com/files/audio/audiofile.mp3 http//site.com/site.com/files/images/img (5).jpg
It's not foolproof. There are other characters that are not allowed in filenames ( * | " < > )
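A related idea, sketched here with a negative lookahead rather than the colon test above: let the match contain spaces, but never let it run into another "http://". This is a different technique from the answer's, and it assumes every URL in the text starts with http:// or https://.

```javascript
// Sketch: each consumed character must not be the start of another URL.
const imageUrl = /https?:\/\/(?:(?!https?:\/\/)[ \S])+?\.(?:jpe?g|png|gif)/gi;

const text =
  "http://site.com/site.com/files/audio/audiofile.mp3 " +
  "http://site.com/site.com/files/images/img (5).jpg";

const images = text.match(imageUrl);
console.log(images); // only the image URL, spaces included
```

The lazy quantifier stops at the first image extension it finds, so a name such as "a.gif.jpg" would be cut short at ".gif".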

Related

Replace everything after last character in URL

I have the following code which replaces the current URL using JavaScript:
window.location.replace(window.location.href.replace(/\/?$/, '#/view-0'));
However if I have a URL like:
domain.com/#/test or domain.com/#/
It will append the #/view-0 to the current hash. What I want is to replace EVERYTHING after the last part of the URL, including any query strings or hashes.
I presume my regex doesn't handle that... How can I amend it to be more aggressive?
The following syntax may help:
location.href.replace(/[?#].*$/, '#/view')
It will replace everything after (and together with) ? or # in the string with #/view.
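Note that the replace above only does something when a ? or # is already present. A minimal sketch that also covers a URL with neither, by stripping first and appending afterwards (withViewHash is a hypothetical helper name, and unlike the question's code it keeps a trailing slash):

```javascript
// Hypothetical helper: drop any query string or hash, then append the new hash.
function withViewHash(href, hash = "#/view-0") {
  return href.replace(/[?#].*$/, "") + hash;
}

console.log(withViewHash("domain.com/#/test"));
console.log(withViewHash("domain.com/page?x=1#/a"));
console.log(withViewHash("domain.com/"));
```

In a browser this could be used as window.location.replace(withViewHash(window.location.href)).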
(^[^\/]*?\/)(?:.*)
Use this. Replace with the first captured group ($1 in JavaScript), followed by your string.
See demo.
http://regex101.com/r/sA7pZ0/28

URL RegExp WITHOUT http:// or www

I'm trying to construct URL RegExp. The base expression looks like:
/^(((http(?:s)?\:\/\/)|www\.)[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$/
It looks good to me, because it matches these:
http://gmail.com
http://www.gmail.com
www.gmail.com
But I would like to modify it to match this:
gmail.com
I will appreciate any help.
Just add a ? after the protocol/www group to make that prefix optional; then it will match gmail.com as well.
Use this:
^(((http(?:s)?\:\/\/)|www\.)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$
Or, if you want to match only gmail.com and not http://gmail.com, use this:
^([a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$
Please note, this will match any string which has dots and letters in it.
IMO you would be better off using a regex like this:
^(http:\/\/|www\.)?[\w\.]+\.(com|net|co\.cc|co\.in)$
You can modify it according to your needs.
Check out a demo here and play around with the regex:
http://regex101.com/r/tS4aB3
The easiest way is to treat 'www' as just another subdomain (because that's all it is).
So:
/^(((http(?:s)?\:\/\/))?([a-zA-Z0-9\-]+\.?)+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$/
Edit: as a side note, the tld (i.e. the ".com" part) is... quite complicated these days. There are a lot of them, and they may not fit easily in 2-6 chars.
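A reduced sketch of that idea, host part only: the protocol is optional, "www" is just one more label, and the TLD has no upper length limit. The path and query handling of the full expression is deliberately left out, and hostPattern is an illustrative name.

```javascript
// Sketch: optional protocol, one or more dot-terminated labels, then a TLD.
// Hosts only; paths and query strings from the original pattern are omitted.
const hostPattern = /^(?:https?:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}$/i;

const samples = [
  "http://gmail.com",
  "http://www.gmail.com",
  "www.gmail.com",
  "gmail.com",
  "gmail",
  "http://",
];
const results = samples.map((s) => hostPattern.test(s));
console.log(results);
```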

Regexp javascript - url match with localhost

I'm trying to find a simple regexp for URL validation, but I'm not very good with regexes.
Currently I have this regexp: (/^https?:\/\/\w/).test(url)
So it accepts URLs such as http://localhost:8080.
What I want is to reject URLs that end in a long run of special characters, like: http://dodo....... or http://dododo&&&&&
Could you help me?
How about this?
/^http:\/\/\w+(\.\w+)*(:[0-9]+)?\/?(\/[.\w]*)*$/
Will match: http://domain.com:port/path or just http://domain or http://domain:port
/^http:\/\/\w+(\.\w+)*(:[0-9]+)?\/?$/
match URLs without path
Some explanations of regex blocks:
Domain: \w+(\.\w+)* to match text with dots: localhost or www.yahoo.com (it extends until the port or path section begins)
Port: (:[0-9]+)? to optionally match a number preceded by a colon: :8000 (at most one)
Path: \/?(\/[.\w]*)* to match any alphanumerics with slashes and dots: /user/images/0001.jpg (until the end of the line)
(The path is the interesting part. As written it allows lone or adjacent dots, so /. or /./ or /.../ and so on are all possible. If you want dots in the path to behave as in the domain section, with no leading, trailing, or adjacent dots, use \/?(\/\w+(\.\w+)*)* instead.)
* UPDATED *
Also, if you would like to have (it is valid) - characters in your URL (or any other), you should simply expand character class for "URL text matching", i.e. \w+ should become [\-\w]+ and so on.
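To make the three blocks easier to adjust, here is a sketch that keeps them as separate source strings and joins them with the RegExp constructor. One assumption beyond the answer: the scheme is written as https? so that both http and https pass.

```javascript
// The domain, port and path blocks described above, as separate source strings.
const domain = String.raw`\w+(\.\w+)*`;
const port = String.raw`(:[0-9]+)?`;
const path = String.raw`\/?(\/[.\w]*)*`;

// Assumption: "https?" is used here; the answer's pattern only accepts "http".
const urlPattern = new RegExp("^https?:\\/\\/" + domain + port + path + "$");

console.log(urlPattern.test("http://localhost:8080"));
console.log(urlPattern.test("http://domain.com:8000/user/images/0001.jpg"));
console.log(urlPattern.test("http://dodo......."));
console.log(urlPattern.test("http://dododo&&&&&"));
```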
If you only want to match a specific term, you can leave out the start of the URL.
For example, to match http://localhost:8080, just write:
/(localhost)/
If you want to match a specific thing, focus on the term you are searching for, not on the start and end of the string.
A regular expression is for searching for terms, unless there is a rigid rule for the whole string. :)
I hope this helps.
It depends on how complex you need the Regex to be. A simple way would be to just accept words (and the port/domain):
^https?:\/\/\w+(:[0-9]*)?(\.\w+)?$
Remember you need to use the + character to match one or more characters.
Of course, there are far better & more complicated solutions out there.
^https?:\/\/localhost:[0-9]{1,5}\/([-a-zA-Z0-9()#:%_\+.~#?&\/=]*)
match:
https://localhost:65535/file-upload-svc/files/app?query=abc#next
not match:
https://localhost:775535/file-upload-svc/files/app?query=abc#next
Explanation:
It can only be used for localhost.
It also limits the port number to five digits. A valid port must not exceed 65535, so you probably need additional logic to check the range.
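One possible shape for that additional logic, as a sketch: capture the port digits and compare them numerically. The function name is hypothetical, and the pattern is simplified to check only the prefix up to the port.

```javascript
// Hypothetical helper: localhost URL whose port is within 1..65535.
function isLocalhostUrlWithValidPort(url) {
  // Capture up to five port digits; they must be followed by "/" or the end.
  const match = /^https?:\/\/localhost:([0-9]{1,5})(?:\/|$)/.exec(url);
  if (!match) return false;
  const port = Number(match[1]);
  return port >= 1 && port <= 65535;
}

console.log(isLocalhostUrlWithValidPort("https://localhost:65535/file-upload-svc/files/app?query=abc#next"));
console.log(isLocalhostUrlWithValidPort("https://localhost:775535/file-upload-svc/files/app?query=abc#next"));
console.log(isLocalhostUrlWithValidPort("https://localhost:99999/file-upload-svc"));
```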
You can use this. This will allow localhost and live domain as well.
^https?:\/\/\w+(\.\w+)*(:[0-9]+)?(\/.*)?$
I'm pretty late to the party, but these days you should consider validating your URL with the URL class. Avoid the headache of regex and rely on the standard:
let isValid;
try {
  new URL(endpoint); // Will throw if URL is invalid
  isValid = true;
} catch (err) {
  isValid = false;
}
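The same check can be wrapped in a function. The sketch below also restricts the scheme to http/https, which is an extra assumption not made in the answer; isHttpUrl is an illustrative name.

```javascript
// Sketch: parse with the standard URL class, then allow only http(s) schemes.
function isHttpUrl(value) {
  let parsed;
  try {
    parsed = new URL(value); // throws a TypeError if the string cannot be parsed
  } catch (err) {
    return false;
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}

console.log(isHttpUrl("http://localhost:8080"));
console.log(isHttpUrl("not a url"));
console.log(isHttpUrl("ftp://example.com/file"));
```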
^https?:\/\/(localhost:([0-9]+.)+[a-zA-Z0-9]{1,6})?$
Will match the following cases :
http://localhost:3100/api
http://localhost:3100/1
http://localhost:3100/AP
http://localhost:310
Will NOT match the following cases :
http://localhost:3100/
http://localhost:
http://localhost
http://localhost:31

Getting http URL using regex when there are multiple http URLs

I want to extract different .swf files from different sites for a project. Different sites use different source methods so I can't use src= or data= in my regex.
I'm able to match the file name with /[\w-]+.swf/g , but when I try to match the full path( http(.*?).swf ) starting with http it matches another http before the path (the first one in the code). Also I can't use src= or data= etc, it must be only the link.
Basically, is there a way to limit the match to the first http found when searching backwards?
If anyone cares to take a look then here's the code: http://pastebin.com/kT20UqqJ .
And here's a good place to test regex: http://regex.larsolavtorvik.com/
Try the following one:
var regex = /http:[\.\/\w-%]+\.swf/g
You need to escape the . (otherwise it matches any character) and the / (since it is the expression delimiter).
You can see the working Example here.
If you have url encoded characters (like white space) you would have also a % in your url.
Here is an example which will work in this case: /http:[\./\w%-]+\.swf/g
Here is a tool where you can test the regex: http://regexpal.com/
And one where you can check its performance: http://regexter.com/
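A quick sketch of why the character class keeps the match from starting at an earlier http: the class contains no quote, space, or colon, so a match cannot cross from one URL into the next. The HTML snippet is made up for illustration.

```javascript
// Pattern from the answer; the "-" is last in the class so it is a literal.
const swfPattern = /http:[\./\w%-]+\.swf/g;

// Made-up markup with an unrelated http link before the .swf URL.
const html =
  '<a href="http://site.com/page">x</a>' +
  '<embed src="http://cdn.site.com/files/movie-1.swf">';

const swfUrls = html.match(swfPattern);
console.log(swfUrls);
```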

Javascript string replace of dots (.) within filenames

I'm trying to parse and amend some html (as a string) using javascript and in this html, there are references (like img src or css backgrounds) to filenames which contain full stops/periods/dots/.
e.g.
<img src="../images/filename.01.png"> <img src="../images/filename.02.png">
<div style="background:url(../images/file.name.with.more.dots.gif)">
I've tried, struggled and failed to come up with a neat regex to allow me to parse this string and spit it back out without the dots in those filenames, e.g.
<img src="../images/filename01.png"/> <img src="../images/filename02.png"/>
<div style="background:url(../images/filenamewithmoredots.gif)">
I only want to affect the image filenames, and obviously I want to leave the filetype alone.
A regex like:
/(.*)(?=(.gif|.png|.jpg|.jpeg))
allows me to match the main part of the filename and the extension separately, but it also matches across the whole of the string, not just within the one filename I want.
I have no control over the incoming html, I'm just consuming it.
Help me please overflowers, you're my only hope!
I agree that this is not a problem suitable for regular expression, much less one neat expression.
But I trust that you are not here to hear that. So, in case you want to keep the input as string...
var src, result = '<img src="../images/filename.01.png"> <img src="../images/filename.02.png"><div style="background:url(../images/file.name.with.more.dots.gif)">';
do {
src = result;
result = src.replace( /((?:url(\()|href=|src=)['"]?(?:[^'"\/]*\/)*[^'"\/]*)\.(?=[^\.'")]*\.(?:gif|png|jpe?g)['")>}\s])/g, '$1' );
} while (result != src)
Basically it keeps removing the second-to-last dot from each image URL's filename until there are none left. Here is a breakdown of the expression in case you need to modify it. Tread lightly:
( Start the main capturing group, since JS regex has no lookbehind.
(?:url(\()|href=|src=)['"]? Start of a URL. It would be safer to force url() to be properly quoted so that we could use a back reference, but unfortunately your given example is not.
(?:[^'"\/]*\/)* Folder part of the URL.
[^'"\/]* Part of the file name that comes before the second-to-last dot.
) Close the main group.
\. This is the second-to-last dot we want to get rid of.
(?= Lookahead.
[^\.'")]* Part of the file name that goes between the second-to-last dot and the last dot.
\.(?:gif|png|jpe?g) Make sure the URL ends in an image extension.
['")>}\s] Closing the URL, which can be a quote, ')', '>', '}', or whitespace. Should use a back reference here if possible. (Was ['"]?\b when first answered)
) End of lookahead.
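For comparison, a single-pass sketch that uses a replace callback instead of the loop. It is a different technique from the expression above and looser: it touches anything that looks like an image file name followed by a quote, ')', '>' or whitespace, wherever it appears in the string. stripFilenameDots is a hypothetical name.

```javascript
// Sketch: match "name.ext" and remove the dots from the name inside a callback.
function stripFilenameDots(html) {
  return html.replace(
    /([^\/'"()\s]+)\.(gif|png|jpe?g)(?=['")>\s])/gi,
    (match, name, ext) => name.replace(/\./g, "") + "." + ext
  );
}

const sample =
  '<img src="../images/filename.01.png"> <img src="../images/filename.02.png">' +
  '<div style="background:url(../images/file.name.with.more.dots.gif)">';

const cleaned = stripFilenameDots(sample);
console.log(cleaned);
```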
Consider using the DOM instead of regular expressions. One way is to create fake elements.
var fake = document.createElement('div');
fake.innerHTML = incomingHTML; // Not really part of JS standard but all the 'main' browsers support it
var background = fake.childNodes[0].style.background;
// Now use a regex if need be: /url\(\"?(.*)\"?\)/
// If img is at childNodes[1]
var url = fake.childNodes[1].src;
With jQuery this is far easier:
$(incomingHTML).find('img').each(function() { $(this).attr('src'); });
Your problem is the greedy match in .*. Maybe better try something like this
([^\/]*)(?=(.gif|.png|.jpg|.jpeg))
[^\/] is a character class that matches every character except slashes.
Another point: you need to escape the . to match it literally.
([^\/]*)(?=\.(gif|png|jpg|jpeg))
The problem is that . means "any character".
Escape it:
/(.*)(?=(\.gif|\.png|\.jpg|\.jpeg))
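A tiny sketch of the difference, using made-up file names: the unescaped dot accepts any character in that position, while the escaped dot requires a literal ".".

```javascript
// Unescaped: "." stands for any single character before "gif".
const unescaped = /.gif$/;
// Escaped: "\." only matches a literal dot.
const escaped = /\.gif$/;

console.log(unescaped.test("logo-gif")); // true: "-" satisfies the "."
console.log(escaped.test("logo-gif"));   // false: no literal dot before "gif"
console.log(escaped.test("logo.gif"));   // true
```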
