I'm developing a bookmarklet and have run into this task: I need to collect all prices from any page.
The problem is that a price may appear in multiple formats ($19.00, 15.45$, etc.), not to mention different currencies and HTML markup. The good news is that I'm using jQuery.
If anybody has an idea how this can be accomplished, please share :)
Thanks in advance!
If there is no consistent markup, you're probably going to have to write some regexes for the known patterns. For example:
To capture a pattern like $19.00 you'd use a regex that looks something like this:
\$[0-9]*\.?[0-9]{1,2}
Since your target data is so unstructured, I'm not sure there is a single good answer to this. You'll need to identify the patterns you are looking for and write the regexes to identify them.
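As a rough illustration of the "one regex per known pattern" idea, here is a minimal sketch that handles a currency symbol either before or after the amount. The helper name extractPrices, the constants, and the list of recognised symbols are assumptions made for this example, not something from the question.

```javascript
// Hypothetical helper: returns every price-like substring found in `text`.
// Assumption: only the symbols $, € and £ are recognised; extend SYMBOL as needed.
const SYMBOL = "[$€£]";
// Digits with optional thousands separators and an optional 1-2 digit decimal part.
const AMOUNT = "\\d+(?:,\\d{3})*(?:\\.\\d{1,2})?";
// The symbol may come before the amount ($19.00) or after it (15.45$).
const PRICE_RE = new RegExp(
  `${SYMBOL}\\s?${AMOUNT}|${AMOUNT}\\s?${SYMBOL}`,
  "g"
);

function extractPrices(text) {
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return text.match(PRICE_RE) || [];
}

console.log(extractPrices("Was $19.00, now 15.45$ or €1,299.50"));
```

In a bookmarklet, the input would presumably be the page's visible text, for example $("body").text(), so that HTML markup is stripped before matching.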
Test your regular expressions here: http://regexpal.com/
Best of luck.
-R
I am trying to create a scenario that will work every time, but I do not know how to deal with the uniquely hashed JavaScript and CSS file names. I could not find an answer in the documentation.
What I want specifically is the ability to pass a regex into my get, but that is not possible since it only takes a string.
.get("/dist/precache-manifest.3efd6185a8d8559962673d45aed7ae98.js")
.headers(headers_0)
I expected to be able to somehow obtain the URL with a regex and then use it in my get above. Is there a way to do that in a Gatling scenario?
I found a way, but it's a hack and it takes a lot of time. I am posting it because someone might want to use it. However, this could be considered a bug.
.get("").queryParam("", _ =>regex("""\/dist\/precache-manifest.[A-Za-z0-9]+.js"""))
.headers(headers_0),
I'm working in a system where there is no document and no jQuery, but I do have to present HTML entities in an understandable way. So the trick of putting the string in an element and then taking the .text() won't work.
I need a pure JavaScript solution. The system isn't reachable from the outside and there is no user input, so security is not really an issue.
Thanks for any help, I'm out of ideas (not that I had too many to begin with)...
Perhaps I should clarify: what I am looking for is a function (or pointers to get me going in the right direction) which is able to translate a string containing entity substrings into the corresponding characters. So it should be able to translate "blah &lt; blahblah" into "blah < blahblah".
There are no additional frameworks I can use other than pure javascript.
UPDATE:
I've got the HTML4 part working. It wasn't extremely difficult, but I have been busy with other things. Here's the fiddle: html4 entities to characters.
You could have done the same with a dictionary with just the characters already in there, but I didn't feel like making such a dictionary. The function is fairly simple but I guess it could do with some refactoring, can't really be bothered at the moment...
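For readers who want a feel for the dictionary approach mentioned above, here is a minimal sketch in pure JavaScript with no DOM. It is not the code from the fiddle; the names NAMED_ENTITIES and decodeEntities are hypothetical, and the dictionary deliberately lists only a handful of entities (a complete HTML4 table would have to be filled in).

```javascript
// Assumption: a tiny sample dictionary; the full HTML4 entity list is much longer.
const NAMED_ENTITIES = {
  lt: "<",
  gt: ">",
  amp: "&",
  quot: '"',
  apos: "'",
  nbsp: "\u00a0",
};

// Hypothetical helper: replaces named, decimal and hexadecimal entities in one pass.
function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      // Numeric reference: "#x41" is hexadecimal, "#65" is decimal.
      const isHex = body[1].toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Named reference: unknown names are returned unchanged.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities("blah &lt; blahblah"));
```

Because the replacement runs in a single pass, a double-encoded string such as "&amp;lt;" is decoded only one level, to "&lt;".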
This function exists in PHP (htmlspecialchars_decode). As such, you'll find a javascript port from PHPJS. This is based on a very established codebase, and should be better than rolling something on your own.
Edit / Add:
Flub on my part. I didn't read the entities part properly. You want the equivalent of html_entity_decode:
http://phpjs.org/functions/html_entity_decode/
Assuming you are using nodejs, cheerio is exactly what you need. I have used it myself a couple of times with great success for off-browser testing of HTML structures returned from servers.
https://github.com/cheeriojs/cheerio
The most awesome part is that it uses jQuery API.
I'm looking for the best regex to detect URLs in text. After trying many, I came across this article where the author demonstrated his regex to be the most robust among many. I'm trying to get this regex to work in Ruby and JavaScript, but both Rubular and Regexpal are giving me errors. When I've tried to fix them, I've gotten no matches. Much love to anyone who can help me translate this regex into Ruby- and JavaScript-compatible versions.
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
Have you seen the source? There are Ruby and JS ports embedded: gist.github.com/dperini/729294.
Ruby:
result = subject.scan(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/)
Javascript:
result = subject.match(/http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+/g);
The “perfect URL validation regex” to work in ruby and javascript, is probably:
http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
DMKE answered my original question best, by linking me to some source I'd overlooked, so I accepted his answer. But after testing @diegoperini's regex, I was a bit underwhelmed. I ultimately stumbled upon the following regex I found on Daring Fireball:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
It is liberal, and accepts port numbers, links without http: or www., but still managed to pass my tests. Plus, it is simple and easy to read. So I would recommend this Regex for someone who wants a quick, liberal regex for URLs.
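One caveat for JavaScript: a leading (?i) inline flag is not valid JavaScript regex syntax, so the case-insensitivity has to be expressed with the i flag instead, and the forward slashes need escaping inside a regex literal. The following is a sketch of that adaptation, assuming the Daring Fireball pattern as commonly published; the names URL_RE and findUrls are hypothetical.

```javascript
// The Daring Fireball pattern with "(?i)" replaced by the "i" flag and "/" escaped.
// The "g" flag makes String.prototype.match return every match, not just the first.
const URL_RE =
  /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;

// Hypothetical helper: returns all URL-like substrings in `text`.
function findUrls(text) {
  return text.match(URL_RE) || [];
}

console.log(findUrls("Visit http://example.com/a?b=1, or www.test.org."));
```

Trailing punctuation such as the comma and the final period is left out of the matches because the last character class excludes it. The nested quantifiers can backtrack heavily on long adversarial input, so this is probably best kept to short, trusted text.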
I'm trying to make a website that would help my students evaluate their math knowledge.
The subject involves simple algebra like 5a-(2a+(2b-3a))= ? and simple fractions.
I have two problems:
Students have to be able to input math easily. They have no experience with markup like TeX, so I can't expect them to input stuff like \frac{a}{b}.
Which easy-to-use math editor library would you recommend?
How can I evaluate their answers? How can I evaluate input like 5a+2ab = 2ab+5a? I already tried something out, but students said afterwards that they had entered 5 a or 5*a instead of 5a and the system said it was wrong...
Are there any (JavaScript) libraries that could help me accomplish this?
Thanks!
WolframAlpha has an API. It can handle such requests fairly easily: Demo
It has a very comprehensive parsing engine that'll understand most correctly formatted mathematical expressions. Although I must admit that it might be a bit overkill...
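For the narrow case of flat sums such as 5a+2ab, a lighter local check might be enough: normalise both answers to a canonical form and compare the results. The sketch below is only that, a sketch; canonicalize and sameExpression are hypothetical names, and it deliberately does not handle parentheses, fractions, or exponents, so an input like 5a-(2a+(2b-3a)) would still need a real parser or a service such as the one above.

```javascript
// Hypothetical helper: canonical form of a sum of simple terms like "5a", "-2ab", "b".
// Assumption: no parentheses, fractions, or exponents; such input throws an error.
function canonicalize(expr) {
  // Drop whitespace and explicit "*" so "5 a", "5*a" and "5a" look the same.
  const compact = expr.replace(/[\s*]+/g, "");
  // Split into signed terms: "5a-2ab" -> ["5a", "-2ab"].
  const terms = compact.match(/[+-]?[^+-]+/g) || [];
  const totals = new Map();
  for (const term of terms) {
    const m = /^([+-]?)(\d+(?:\.\d+)?)?([a-z]*)$/i.exec(term);
    if (!m) throw new Error("Unsupported term: " + term);
    const sign = m[1] === "-" ? -1 : 1;
    const coeff = m[2] === undefined ? 1 : parseFloat(m[2]);
    // Sort the variables so "ba" and "ab" are treated as the same term.
    const vars = m[3].split("").sort().join("");
    // Add up like terms, e.g. 2a + 3a -> 5a.
    totals.set(vars, (totals.get(vars) || 0) + sign * coeff);
  }
  return [...totals.entries()]
    .filter(([, c]) => c !== 0)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([vars, c]) => `${c}${vars}`)
    .join(" + ");
}

function sameExpression(a, b) {
  return canonicalize(a) === canonicalize(b);
}

console.log(sameExpression("5a+2ab", "2 b a + 5*a"));
```

The design choice here is to compare normalised strings rather than evaluate anything, which keeps the check predictable for students who type 5 a or 5*a instead of 5a.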
How can I split a word into syllables using JavaScript? Is there an API for that? Any help will be appreciated.
Now, back to being constructive, what I suggest is that you find a couple online dictionary sites and look at their APIs (I know dictionary.com has a free API) and see if you can use it to access just the word split into syllables from a lookup.
Unfortunately, from what I have read, it looks like you would really need a dictionary of words split already to check against and there aren't any standalone versions out there.
Be the first and post it somewhere! :)