RegEx: Removing select parts of a URL, except variables - javascript

Trying to remove parts of a URL via RegEx.
I'm getting my content via an AJAX request, so I cannot use
$(location).attr('search').split("&")[2]
My RegEx (Regex101)
Any direct answer would be greatly appreciated, as I cannot make sense of RegEx. Better or more efficient suggestions are also very welcome.

Straightforward approach: this question already has helpful answers on how to parse a URL with and without a RegEx. Try one of those methods, then keep the parts you want and discard the ones you don't.
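A minimal sketch of the no-RegEx route, assuming the URL arrives as an absolute string in the AJAX response (so `location` is not involved). `getQueryParams` is a hypothetical helper; the sample URL is the example used in the answer below. It relies only on the standard `URL` and `URLSearchParams` APIs, which also handle percent-decoding.

```javascript
// Hypothetical helper: returns every query-string parameter of an absolute URL
// (for example one taken from an AJAX response) as a plain object.
function getQueryParams(urlString) {
  // URL parses the string; searchParams decodes the query string for us.
  const url = new URL(urlString);
  return Object.fromEntries(url.searchParams);
}

const params = getQueryParams("https://opskins.com/?loc=shop_search&app=730_2");
console.log(params.loc); // shop_search
console.log(params.app); // 730_2
```

If the response contains a relative URL instead, `new URL(path, base)` accepts a base URL as the second argument.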

I'm not sure I fully understood your question, but for your example, this is the regex I would use to match it. It may not be the most elegant, but it should work for your example.
http[s]*://opskins.com/?loc=shop_search&app=730_2
https?://[a-z]+\.com/\?[a-z]+=[a-z_]+&[a-z]+=[0-9_]+
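A sketch of the same idea with capture groups, so the variable parts can be pulled out rather than just matched. It assumes the URL always carries exactly two parameters in this order; `parseExampleUrl` is a hypothetical name. The `s?` makes the "s" in https optional and the dots are escaped so they match only a literal ".".

```javascript
// Variant of the pattern above, assuming exactly two query parameters.
// "s?" makes https optional, "\." matches a literal dot, and the parameter
// names and values are captured.
const EXAMPLE_URL = /^https?:\/\/[a-z]+\.com\/\?([a-z]+)=([a-z_]+)&([a-z]+)=([0-9_]+)$/;

function parseExampleUrl(url) {
  const m = EXAMPLE_URL.exec(url);
  if (m === null) return null; // not in the expected shape
  // m[1] and m[3] are parameter names, m[2] and m[4] their values.
  return { [m[1]]: m[2], [m[3]]: m[4] };
}

console.log(parseExampleUrl("https://opskins.com/?loc=shop_search&app=730_2").app); // 730_2
```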

Related

How to deal with hashed js and css in gatling?

I am trying to create a scenario that will work every time, but I do not know how to deal with the uniquely hashed JavaScript and CSS files. I could not find an answer in the documentation.
What I want specifically is the ability to pass a regex into my get but that is not possible since it only takes a string.
.get("/dist/precache-manifest.3efd6185a8d8559962673d45aed7ae98.js")
.headers(headers_0)
I expected there to be some way to get the URL with a regex and then use it in my get above. Is there a way to do that in a Gatling scenario?
I found a way, but it's a hack and it takes a lot of time. I am answering this because someone might want to use it; however, this could be considered a bug.
.get("").queryParam("", _ =>regex("""\/dist\/precache-manifest.[A-Za-z0-9]+.js"""))
.headers(headers_0),
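A more conventional approach, independent of the hack above, is to read the hashed file name out of a response you already have (typically the HTML page that references it) and then request that exact path. In Gatling this is usually done with a `regex(...)` check plus `saveAs(...)` on the page request, followed by a `.get(...)` that uses the saved session value. The JavaScript below is not Gatling code; it only illustrates the extraction step, and `findManifestPath` and the sample markup are hypothetical. Note the escaped dots, which the regex in the hack leaves unescaped.

```javascript
// Hypothetical helper: finds the hashed precache-manifest path in an HTML
// page body. The dots are escaped so they only match a literal ".".
function findManifestPath(html) {
  const match = html.match(/\/dist\/precache-manifest\.[A-Za-z0-9]+\.js/);
  return match === null ? null : match[0];
}

// Sample markup (hypothetical) containing the hashed file name.
const page = '<script src="/dist/precache-manifest.3efd6185a8d8559962673d45aed7ae98.js"></script>';
console.log(findManifestPath(page));
// /dist/precache-manifest.3efd6185a8d8559962673d45aed7ae98.js
```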

Javascript Regex for DD/MM only

I have looked all over Stack Overflow and tried writing and changing regexes to suit my needs, but due to my very limited understanding of them I am coming unstuck...
I need to make a Javascript regular expression to check DD/MM. I can get DD/MM/YYYY working but this is not what I need.
What I have is ^([0-2][0-9]|(3)[0-1])(\/)(((0)[0-9])|((1)[0-2]))(\/)\d{4}$. This checks for DD/MM/YYYY, but when I try to simply truncate the end I get errors. I have limited knowledge of regexes in JavaScript; links appreciated.
Edited - simplified version as mentioned in the comments below
Based on your regex, this would be what you are looking for:
^([0-2][0-9]|3[0-1])\/(0[0-9]|1[0-2])$
But as Tim pointed out in the comments, this approach is not bulletproof: for example, it accepts 00/00 and 31/02.
You can look at the regex here: https://regex101.com/r/oQ2k6v/1
regex101 is a very nice site for regexes. It explains every part of the regex.
/^(0[1-9]|[1-2][0-9]|3[0-1])\/(0[1-9]|1[0-2])$/
To avoid matching dates like 00/00, use:
^(?:0[1-9]|[12][0-9]|3[01])\/(?:0[1-9]|1[0-2])$
In a comment you said, "I will be making a function to check it if it passes the regex." In that case a simpler regex is enough:
^\d\d\/\d\d$
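A minimal sketch of such a checking function, combining the stricter regex with a days-per-month lookup so that dates like 31/04 or 30/02 are rejected (the gap Tim pointed out). Since no year is given, it assumes 29/02 should be accepted. `isValidDayMonth` and `DAYS_IN_MONTH` are hypothetical names.

```javascript
// Days in each month; February allows 29 because no year is given.
const DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

function isValidDayMonth(input) {
  // Shape check: day 01-31, a slash, month 01-12.
  const m = /^(0[1-9]|[12][0-9]|3[01])\/(0[1-9]|1[0-2])$/.exec(input);
  if (m === null) return false;
  const day = Number(m[1]);
  const month = Number(m[2]);
  // Range check the regex cannot express neatly, e.g. 31/04 or 30/02.
  return day <= DAYS_IN_MONTH[month - 1];
}

console.log(isValidDayMonth("15/08")); // true
console.log(isValidDayMonth("31/04")); // false
```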

Trying to get the "perfect URL validation regex" to work in ruby and javascript

I'm looking for the best regex to detect URLs in text. After trying many, I came across this article where the author demonstrated that his regex was the most robust among many. I'm trying to get this regex to work in Ruby and JavaScript, but both Rubular and Regexpal give me errors, and when I've tried to fix them, I've gotten no matches. Much love to anyone who can help me translate this regex into Ruby- and JavaScript-compatible versions.
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
Have you seen the source? There are Ruby and JS ports embedded: gist.github.com/dperini/729294.
Ruby:
result = subject.scan(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/)
Javascript:
result = subject.match(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g);
The “perfect URL validation regex” that works in both Ruby and JavaScript is probably:
http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
DMKE answered my original question best by linking me to some source I'd overlooked, so I accepted his answer. But after testing @diegoperini's regex, I was a bit underwhelmed. I ultimately stumbled upon the following regex on Daring Fireball:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
It is liberal, and accepts port numbers, links without http: or www., but still managed to pass my tests. Plus, it is simple and easy to read. So I would recommend this Regex for someone who wants a quick, liberal regex for URLs.
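A sketch of how the Daring Fireball pattern might be ported to JavaScript, which is the part the question asked about. JavaScript does not accept a leading `(?i)`, so it becomes the `i` flag; `/` is escaped because the pattern is written as a regex literal; and `g` returns every URL in the text. `findUrls` is a hypothetical helper, and the port is my own, not an official version.

```javascript
// Assumed JavaScript port of the Daring Fireball pattern: (?i) becomes the "i"
// flag, "/" is escaped for the regex literal, and "g" finds all matches.
const LIBERAL_URL = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;

function findUrls(text) {
  // String.prototype.match with the "g" flag returns all matches, or null.
  return text.match(LIBERAL_URL) || [];
}

console.log(findUrls("Visit http://example.com/path?x=1, or www.example.org."));
```

The trailing character class is what keeps the final comma and period out of the matches. The nested `+` quantifiers can backtrack heavily on long inputs that almost match, so it is worth timing it against your own data.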

Best way to match and catch doubly-entified character entities/references?

I'm talking about stuff like &amp;amp;, which then renders as &amp; when it actually should render as &. In an earlier question I asked how to match entities, but it seems that isn't really possible or realistic with regexes. What, then, is the best way to match double entities?
EDIT: Is this a good way to do it? .replace(/&(?=#?x?[0-9a-z]+);/i, '&');
(I'm using javascript)
I'd go with
pattern &([a-zA-Z0-9]+?;)\1
replacement &$1
to replace just double amps, or:
pattern &([#a-zA-Z0-9]+?;)
EDIT:
your pattern
/&(?=#?x?[0-9a-z]+);/i
also looks good to me.
Note: none of these is something you can trust
Possibly:
&[a-zA-Z]+;
Though it's not foolproof.
Normalize your data first. Use whatever you know about the encoding to decode it back to a form where each character or piece of data has only one possible encoding. Then match the normalized data against a normalized pattern.
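A sketch of the lookahead idea from the question, written so the lookahead covers the whole entity body including its `;`: every `&amp;` that is immediately followed by something shaped like another entity (`amp;`, `lt;`, `#39;`, `#x27;`) is collapsed back to `&`. The exact pattern and the `fixDoubleEntities` name are my assumptions, not the asker's code.

```javascript
// Collapse "&amp;" to "&" only when it is directly followed by what looks
// like the body of another entity, e.g. "amp;", "lt;", "#39;" or "#x27;".
const DOUBLE_ENTITY = /&amp;(?=#?x?[0-9a-z]+;)/gi;

function fixDoubleEntities(html) {
  return html.replace(DOUBLE_ENTITY, "&");
}

console.log(fixDoubleEntities("Tom &amp;amp; Jerry &amp;lt;3 fish &amp; chips"));
// Tom &amp; Jerry &lt;3 fish &amp; chips
```

As the answers above warn, this cannot tell a double-encoded entity from text that genuinely meant to show a literal "&lt;", so it is only safe when you know the data was encoded twice.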

Collect prices from page using JavaScript

I'm developing a bookmarklet and have run into this task: I need to collect all prices from any page.
The problem is that prices may come in multiple formats ($19.00, 15.45$, etc.), not counting different currencies and HTML markup. The good news is that I'm using jQuery.
If anybody has an idea how this can be accomplished, please share :)
Thanks in advance!
If there is no consistent markup, you're probably going to have to write some regexes for the known patterns. For example:
To capture a pattern like $19.00, you could use a regex that looks something like this:
\$[0-9]*\.?[0-9]{1,2}
Since your target data is so unstructured, I'm not sure there is a single good answer to this. You'll need to identify the patterns you are looking for and write regexes to match them.
Test your regular expressions here: http://regexpal.com/
Best of luck.
-R
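A minimal sketch of the "one regex per known pattern" approach, covering only the two formats from the question: symbol first ($19.00) and symbol last (15.45$). `findPrices` is a hypothetical helper; in the bookmarklet the text could come from something like `document.body.innerText`, but the function takes a plain string so it can be tested outside a browser. Other currencies, thousands separators, and decimal commas are deliberately not handled.

```javascript
// Two alternatives: "$" then digits, or digits then "$"; the decimal part
// (one or two digits after a literal ".") is optional in both.
const PRICE = /\$\d+(?:\.\d{1,2})?|\d+(?:\.\d{1,2})?\$/g;

function findPrices(text) {
  // With the "g" flag, match returns every price found, or null if none.
  return text.match(PRICE) || [];
}

console.log(findPrices("Was $19.00, now 15.45$ or $7"));
```

Each new format you run into becomes one more alternative in the pattern.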
