Regex to detect urls with '?' character at the end - javascript

I found many solutions, but none was useful for me.
Let's say, as an example, I want to find URLs that start with www. and end with a space or ?. In this case, I really mean it ends in a ?, not that it's necessarily a CGI-related URL.
I'm trying to use the regex
var r = /(^|[\s\?])(www\..+?(?=([\s]|\?|($))))/g;
My sample use: http://jsfiddle.net/DKNat/2/
How can I use \? in a regex so that the URL is not cut off at a ? that is preceded by a /?
http://jsfiddle.net/DKNat/11/
I can't solve the last problem: a dot at the end of the URL.
Can anybody help?

Try this in your fiddle:
var r = /(^|\??)(www\.[^\?]+)/g;
I updated your fiddle here:
http://jsfiddle.net/DKNat/3/
Update:
I see what you are trying to do now. Unfortunately, both your strings are essentially the same, apart from the /, so unless you want your regex to make the assumption that a ? anywhere after a slash denotes a CGI call, then there isn't much you can do. But you could try this:
var r = /(^|\??)(www\.[^\?]+\/[^\/]+\?[^\?]+|www\.[^\?]+)/g;
Updated fiddle:
http://jsfiddle.net/DKNat/5/
Update 2: After determining the requirements, this is the final RegExp I added to fiddle 10:
var r = /(^|[\?\s])(www\.[^\? ]+\/[^\/ ]*\?[^\? ]+|www\.[^\? ]+)/g;
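For anyone who wants to try the final pattern outside the fiddle, here is a minimal sketch. The helper name findUrls and the sample sentence are made up for illustration; only the regex itself comes from the answer above. Capture group 2 holds the URL, and group 1 holds the character (or start of string) that precedes it.

```javascript
// Final pattern from the answer above. Group 1 is the leading delimiter,
// group 2 is the URL itself.
var r = /(^|[\?\s])(www\.[^\? ]+\/[^\/ ]*\?[^\? ]+|www\.[^\? ]+)/g;

// Hypothetical helper: collect every URL (group 2) found in the text.
function findUrls(text) {
  var urls = [];
  var m;
  r.lastIndex = 0; // reset because the regex is global and reused
  while ((m = r.exec(text)) !== null) {
    urls.push(m[2]);
  }
  return urls;
}

// The first URL ends at the bare "?", the second keeps its query string
// because a "/" appears before the "?".
console.log(findUrls("see www.example.com? and www.example.com/page?id=1 here"));
```

Note that this does nothing about a trailing dot, which the question mentions as still unsolved.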


Remove everything after constant using regex

I've got XML that has additional information, BLAH, in each tag. When creating the tags, I've separated the extra info from the tag name with a constant (XMLSPLIT as constant XML_SPLITTER). I needed to do this because I'm generating my XML from a JSON object and I can't have multiple keys that are the same, but the XML output can't contain that superfluous stuff.
For example:
....
<SetXMLSPLITBLAH>
<Value>9</Value>
<SetType>
<Name>Foo</Name>
</SetType>
</SetXMLSPLITBLAH>
...
So, after generating the XML, I go through and clean it. I'm trying to do it with a regex. I figure, I want to remove anything on a line after the splitter and replace it with just the >.
let reg = new RegExp("<Set"+XML_SPLITTER+"(.*)\/g");
cleanXML = dirtyXML.replace(reg, "<Set>")
This fails to work.
I will note that I tried reg = /<Set(.*)/g; and that worked just fine... but it also captures "SetType" and any other use of a tag that starts with "
It's because ^ is a Regex special character that indicates "beginning of line". You'd need to escape it like \^ for this to work. Something like /<Set\^\^[^>]*>/g should do the trick.
Small note: The above regex assumes that the "BLAH" string in your example will never contain the > character... but if it does, then your XML is super malformed anyway.
Using .* will match > and if - for some reason - your XML file is not broken up into multiple lines (i.e. minified), you'll match more than you should. To avoid this, you can use [^>]* to match everything up to the >.
Since you've gracefully included a splitter, it'll make matching much easier and much more predictable (as you mentioned, you match SetType without a splitter).
Without a splitter, you'd have to use a regex pattern that resembles <Set(?!Type>)[^>]* or <Set(?!(?:Type|SomethingElse)>)[^>]* if you had more than just one suffix to Set that should remain. These methods use a negative lookahead to assert what follows does not match.
var str = `<SetXMLSPLITBLAH>
<Value>9</Value>
<SetType>
<Name>Foo</Name>
</SetType>
</SetXMLSPLITBLAH>`
var XML_SPLITTER = 'XMLSPLIT'
// Match "<Set" or "</Set", then the splitter, then everything up to (not including) ">"
var p = `(</?)Set${XML_SPLITTER}[^>]*`
var r = new RegExp(p,'g')
var x = str.replace(r,'$1Set')
console.log(x)
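If the splitter can contain regex metacharacters (such as the ^ mentioned in the first answer), one option is to escape it before building the RegExp. This is only a sketch: escapeRegExp and stripSplitter are hypothetical helper names, and the "^^" splitter and sample XML are assumed for illustration.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a string
// so it can be embedded literally in a RegExp built from a string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: drop the splitter and everything after it (up to ">")
// in both opening and closing Set tags.
function stripSplitter(xml, splitter) {
  var pattern = new RegExp("(</?Set)" + escapeRegExp(splitter) + "[^>]*", "g");
  return xml.replace(pattern, "$1");
}

// Assumed sample input using "^^" as the splitter.
var dirty = "<Set^^BLAH><SetType><Name>Foo</Name></SetType></Set^^BLAH>";
console.log(stripSplitter(dirty, "^^"));
// <Set><SetType><Name>Foo</Name></SetType></Set>
```

Because the pattern requires the splitter right after Set, tags like SetType are left alone without needing a negative lookahead.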

angularJS | Javascript - regex replace

I need to remove a substring that looks like this
page/number/
I think the regex goes like this: "page/[0-9]+/" (correct me if I'm wrong).
Is this the correct way?
"www.myurl/archive/page/25/?abc=xyz".replace(page/[0-9]+/,"");
Or is there something I'm missing?
EDIT:
Whoever votes -1, can you comment the reason so that I'll know for the next time I ask a question? Thanks
Or is there something I'm missing?
Delimiters. :-) You need delimiters around the regular expression so the JavaScript parser knows it's a regular expression. (And since those delimiters happen to be /, you need to escape the / inside the regex with a backslash.)
var result = "www.myurl/archive/page/25/?abc=xyz".replace(/page\/[0-9]+\//, "");
console.log(result);
Note that that will also change www.myurl/archive/blahpage/25/?abc=xyz (note blahpage rather than page). If you only want to replace /page/, we want another (escaped) / at the beginning and we want to replace the old thing with "/" rather than "":
var result = "www.myurl/archive/page/25/?abc=xyz".replace(/\/page\/[0-9]+\//, "/");
...unless this is always just prior to the ?, in which case the trailing / isn't needed and we could keep using "". Here it is assuming this will always be followed by the ?:
var result = "www.myurl/archive/page/25/?abc=xyz".replace(/\/page\/[0-9]+\//, "");
console.log(result);
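To see the difference the leading slash makes, here is a small sketch wrapping the second variant in a function. The name removePageSegment is made up for illustration, and the blahpage URL is the counterexample mentioned in the answer.

```javascript
// Hypothetical helper: remove a "/page/<number>/" segment, keeping one slash
// so the rest of the path stays intact.
function removePageSegment(url) {
  return url.replace(/\/page\/[0-9]+\//, "/");
}

console.log(removePageSegment("www.myurl/archive/page/25/?abc=xyz"));
// www.myurl/archive/?abc=xyz

// "blahpage" is left alone because the pattern requires a "/" right before "page".
console.log(removePageSegment("www.myurl/archive/blahpage/25/?abc=xyz"));
// www.myurl/archive/blahpage/25/?abc=xyz
```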

Look behind replace all occurrences

I want to replace all occurrences of .digit with 0.digit.
I'm new to regular expressions but as far as I understand I could use look behind to do this. But JS does not support that, I'd like to know if someone knows a solution.
To show the problem I wrote the following code.
str = "0.11blabla.22bla0.33bla.33"
allow = "\\.\\d*"
str.match(new RegExp(allow,"g"))
[".11", ".22", ".33", ".33"]
deny = "0\\.\\d*"
str.match(new RegExp(deny,"g"))
["0.11", "0.33"]
diffreg= new RegExp("(?!"+deny+")"+allow,"g") // translates to: /(?!0\.\d*)\.\d*/g
str.match(diffreg)
[".11", ".22", ".33", ".33"]
Obviously allow matches all decimal values whereas deny matches all values with a preceding 0. The result should of course be the set difference between the two: [".33", ".33"].
Use a group match.
> str.replace(/([^0])(\.\d)/g, "$10$2");
"0.11blabla0.22bla0.33bla0.33"
I think you are looking for this regex instead
[0]?(\.\d*)
So in your code you will have:
intersectionreg = new RegExp("[0]?("+allow+")","g")
Thanks #richard, edited
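Two things to watch with the group-match approach: [^0] also matches other digits (so "1.5" would turn into "10.5"), and it needs a character before the dot, so a string starting with ".5" is missed. A sketch that addresses both is below. The function names are made up for illustration. The second variant assumes an engine with lookbehind support (added to the language in ES2018), which was not available when the question was asked.

```javascript
// Variant 1: no lookbehind. Capture either start-of-string or a non-digit,
// then put it back in front of "0" plus the decimal part.
function addLeadingZeros(text) {
  return text.replace(/(^|[^0-9])(\.[0-9])/g, function (match, before, decimal) {
    return before + "0" + decimal;
  });
}

// Variant 2: assumes lookbehind support (ES2018+). Replace a dot that is
// not preceded by a digit and is followed by a digit.
function addLeadingZerosLookbehind(text) {
  return text.replace(/(?<!\d)\.(?=\d)/g, "0.");
}

var str = "0.11blabla.22bla0.33bla.33";
console.log(addLeadingZeros(str));           // 0.11blabla0.22bla0.33bla0.33
console.log(addLeadingZerosLookbehind(str)); // 0.11blabla0.22bla0.33bla0.33
console.log(addLeadingZeros(".5 and 1.5"));  // 0.5 and 1.5
```

Using a replacer function in the first variant also sidesteps the ambiguity of writing "$10$2" in a replacement string.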

what's wrong with this regular expression? getting the hash part of an url

I'm trying to get the first part of a hash from a URL (the part between the # and a /, a ?, or the end of the string).
So far I have come up with this:
r = /#(.*)[\?|\/|$]/
// OK
r.exec('http://localhost/item.html#hash/sub')
["#hash/", "hash"]
// OK
r.exec('http://localhost/item.html#hash?sub')
["#hash?", "hash"]
// WAT?
r.exec('http://localhost/item.html#hash')
null
I was expecting to receive "hash".
I tracked down the problem to
r2 = /#(.*)[$]/
r2.exec('http://localhost/item.html#hash')
null
Any idea what could be wrong?
r = /#(.*)[\?|\/|$]/
When $ appears in [] (a character class), it's the literal "$" character, not the end of input/line. In fact, your [\?|\/|$] part is equivalent to just [?/$|], which matches the 4 specific characters (including the pipe).
Use this instead (JSFiddle)
r = /#(.+?)(\?|\/|$)/
You aren't supposed to write [$] (within a character class) unless you want to match the $ literally and not the end of line.
/#(.*)$/
Code:
var regex = /\#(.*)$/;
regex.exec('http://localhost/item.html#hash');
Output:
["#hash", "hash"]
Your regex: /#(.*)[\?|\/|$]/
//<problem>-----^ ^-----<problem>
| operator won't work within [], but within ()
$ will be treated literally within []
.* will match as much as possible. .*? will be non-greedy
On making the above changes,
you end up with /#(.*?)(\?|\/|$)/
I use http://regexpal.com/ to test my regular expressions.
Your problem here is that your regular expression requires a /. So it doesn't work with http://localhost/item.html#hash, but it works with http://localhost/item.html#hash/
Try this one :
r = /#([^\?|\/|$]*)/
You can't use the $ end-of-string marker in a character class. You're probably better off just matching characters that aren't / or ?, like this:
/#([^\?\/]*)/
Why Regex? Do it like this (nearly no regex):
var a = document.createElement('a');
a.href = 'http://localhost/item.html#hash/foo?bar';
console.log(a.hash.split(/[\/\?]/)[0]); // #hash
Just for the sake of completeness, if you are working with Node.js:
var hash = require('url').parse('http://localhost/item.html#hash').hash;
I found this regular expression that seems to work
r = /#([^\/\?]*)/
r.exec('http://localhost/item.html#hash/sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash?sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash')
["#hash", "hash"]
Anyway, I still don't get why the original one isn't working
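To make the "why" concrete, here is a small sketch contrasting $ inside and outside a character class, followed by the negated-class approach wrapped in a function. The names insideClass, anchored and firstHashPart are made up for illustration.

```javascript
var insideClass = /#(.*)[$]/; // [$] matches a literal dollar sign character
var anchored = /#(.*)$/;      // $ outside a class asserts end of input

// No literal "$" in the URL, so the first pattern cannot match.
console.log(insideClass.exec("http://localhost/item.html#hash"));     // null
// With a literal "$" at the end it does match.
console.log(insideClass.exec("http://localhost/item.html#hash$")[1]); // hash
console.log(anchored.exec("http://localhost/item.html#hash")[1]);     // hash

// Hypothetical helper: everything after "#" up to the first "/" or "?",
// or up to the end of the string if neither appears.
function firstHashPart(url) {
  var m = /#([^\/?]*)/.exec(url);
  return m ? m[1] : null;
}

console.log(firstHashPart("http://localhost/item.html#hash/sub")); // hash
console.log(firstHashPart("http://localhost/item.html#hash?sub")); // hash
console.log(firstHashPart("http://localhost/item.html#hash"));     // hash
```

The negated class needs no explicit end-of-string alternative, because * simply stops when it runs out of input.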

Javascript - get current page URL without pagination numbers

Say I have a URL for an article:
http://domain.com/blog/articles/title-here/
And it has about 5 pages, so as you go through each page, you get:
http://domain.com/blog/articles/title-here/ OR http://domain.com/blog/articles/title-here/1
http://domain.com/blog/articles/title-here/2
http://domain.com/blog/articles/title-here/3
http://domain.com/blog/articles/title-here/4
http://domain.com/blog/articles/title-here/5
I know that the following code will get the full current URL (aka including the page #):
var u = window.location.href;
But is there a way to limit it so that the page # is NOT a part of the variable "u"?
Perhaps there's a regex or something I should add in there..? (I'm fairly new to javascript, so not sure how to apply this?)
var u = window.location.href.match(/.*[/][^\d]*/)[0]
Would that work for you?
Edit
I changed it... again :P
NOTE: Regex is a more complicated version of Joseph's and still suffers from the same bug. Will undelete when I fix it.
Joseph's answer is good, but it has a minor bug: it will drop the last part of the URL if you have an URL like:
http://domain.com/blog/articles/title-here
You can use this instead:
var u = window.location.href.match(/(.*)(\/\d*)/)[1]
How the regex works:
/ # delimiter
(.*) # greedily match as much as possible and put it in capture group 1
(\/ # match the last forward slash
\d*) # match zero or more digits
/ # delimiter
var l = window.location;
l.href.replace(l.pathname, l.pathname.split(/\/[0-9]+$/)[0]);
try it in the console at this URL
Regex would do it. But in this case, you could just turn it into an array, strip off the end item, and re-serialize it.
var a = 'http://domain.com/blog/articles/title-here/2'.split('/');
a.splice(-1, 1);
a.join('/');
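Several of the approaches above drop the last path segment even when it is not a page number. A minimal sketch that only strips a purely numeric last segment is below. The function name stripPageNumber is made up, and it assumes the URL has no query string or hash after the page number.

```javascript
// Hypothetical helper: remove a trailing "/<digits>" (with an optional final
// slash) and leave a single trailing slash. Assumes no query string or hash.
function stripPageNumber(url) {
  return url.replace(/\/\d+\/?$/, "/");
}

console.log(stripPageNumber("http://domain.com/blog/articles/title-here/2"));
// http://domain.com/blog/articles/title-here/

// URLs without a page number are returned unchanged.
console.log(stripPageNumber("http://domain.com/blog/articles/title-here/"));
console.log(stripPageNumber("http://domain.com/blog/articles/title-here"));
```

In the browser this would be called as stripPageNumber(window.location.href).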
