We have a textarea control that holds text and hyperlinks. The links are stored as follows:
http://www.google.com [Link to a site __title__ Title of Link]
http://www.yahoo.com [http://www.yahoo.com __title__ Link with text & hyperlink the same]
In the second link, I don't want to count Yahoo twice, so I want to ignore links that are preceded by a left bracket. I know regex isn't the best tool for this, but I don't know of any other way to do it. So far I've tried this regex, but I found out that JavaScript doesn't support lookbehind:
(?<!\[)((http|https|ftp)\://(www\.)?)(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/_\?\.'~&=-]*)
Anyone know of a decent way to accomplish this?
I also just found out that I can't rely on the brackets. Users can enter any type of link, either with our tool that creates the brackets or by typing it in manually.
Count the occurrences of the character [ in the textarea contents.
This will work as long as the format you described above persists.
You just need a count? Can't you count all the links then count all the links starting with the left bracket, and subtract?
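The count-and-subtract idea can be sketched in JavaScript without lookbehind. This is a minimal sketch, assuming links start with http, https, or ftp and end at whitespace or a bracket. The function name countUnbracketedLinks and the simplified URL pattern are illustrative and are not the asker's original regex.

```javascript
// Hypothetical helper: counts URLs that are NOT immediately preceded by "[".
// The URL pattern is deliberately simplified (an assumption, not the original regex).
function countUnbracketedLinks(text) {
  // Group 1 captures an optional "[" directly before the URL.
  const pattern = /(\[)?\b(?:https?|ftp):\/\/[^\s\[\]]+/g;
  let total = 0;
  let bracketed = 0;
  for (const match of text.matchAll(pattern)) {
    total += 1;
    if (match[1] !== undefined) bracketed += 1;
  }
  // All links minus the ones that follow a left bracket.
  return total - bracketed;
}

const sample = [
  'http://www.google.com [Link to a site __title__ Title of Link]',
  'http://www.yahoo.com [http://www.yahoo.com __title__ Link with text & hyperlink the same]',
].join('\n');

console.log(countUnbracketedLinks(sample)); // 2
```

Capturing the optional bracket in a group stands in for the negative lookbehind: every link is matched once, and the group tells the two cases apart.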
I'm attempting to click the download 'button' in the following image:
As you can see in the inspector, there is extra spacing in the label, so simply doing:
this.clickLabel("Download", "a");
doesn't work.
I've tried cutting and pasting the text from the HTML, but the return character produces a parsing error.
Any ideas?
Update:
@Artjom B.'s duplicate link does offer a potential solution to the problem, but the question asked there is not the same as this one, and the answer is difficult to find otherwise.
With the help of @Artjom B. I came to use:
var x = require('casper').selectXPath;
casper.click(x("//a[contains(text(), 'Download')]"));
Essentially, the problem of having trailing characters after "Download" is overcome by searching for any link that contains "Download". When using this, be wary that it will cause problems if another link on the page also contains "Download".
Note: This is similar to the duplicate link Artjom posted in a comment on the question, but I think the problem is unique and the title is better related to that problem.
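If a second link containing "Download" is a concern, one possible alternative is to match the label exactly after whitespace has been trimmed, using the XPath 1.0 function normalize-space(). The sketch below only builds the XPath string; exactLinkXPath is a hypothetical helper name, and the CasperJS call is shown as a comment because it only runs inside a Casper script.

```javascript
// Hypothetical helper: builds an XPath that matches <a> elements whose text,
// after trimming and collapsing whitespace, equals the label exactly.
function exactLinkXPath(label) {
  // XPath 1.0 string literals have no escape syntax, so pick a quote
  // character that does not occur in the label.
  if (label.includes("'") && label.includes('"')) {
    throw new Error('Labels containing both quote types are not supported in this sketch');
  }
  const quote = label.includes("'") ? '"' : "'";
  return '//a[normalize-space(.)=' + quote + label + quote + ']';
}

console.log(exactLinkXPath('Download')); // //a[normalize-space(.)='Download']

// Inside a CasperJS script this could be used as (not run here):
//   var x = require('casper').selectXPath;
//   casper.click(x(exactLinkXPath('Download')));
```

Note that normalize-space(.) looks at the full text of the link, including any nested elements, so a link reading "Download now" would no longer match.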
I am trying to write a regex to find: background-image:url('URL'); where the URL is an external link to an image.
I've been trying something like this:
/\s*?[ \t\n]background-image:url('https?:\/\/(?:[a-z\-]+\.)+[a-z]{2,6}(?:\/[^\/#?]+)+\.(?:jpe?g|gif|png)$');/i
But couldn't get it to work.
I am using this with JavaScript/jQuery.
Does this get what you want?:
/\s*?[ \t\n]background-image:url\('.+?'\);/i
I think you can simplify it to this if you know it will only change with the URL in the middle. I probably went overboard with the \ escapes but better to be safe than sorry.
/background\-image\:url\(\'.*?\'\)\;/
Epascarello hit the nail on the head. Is this source you control? Or at least a predictable website? What are multiple different examples of input and your expected results?
Will this always be inline in double quotes, and therefore your URL will always be in single quotes? Some old websites use double-quotes in their CSS Files or header CSS.
Do you want to capture the whole thing? Or are you just trying to extract the resulting URL?
SirCapsAlot brings up a good question: are you just looking for background image URLs in general? They can also be set with the background shorthand property, or even in JavaScript with .backgroundImage="url(image.jpg)".
And you definitely only want the ones that include http(s)?
With the limited requirements you gave, this is the best Regex:
background-image\s*:\s*url\('(https?://[^']+)
Comment here if your answers to my questions change your requirements, and therefore my answer.
Breakdown:
background-image\s*:\s*url //Find the literal text to begin, allowing whitespace around the colon
\(' //Find the literal opening paren and quote
( //Begin Capture Group 1
https?:// //Require http:// or https:// (the s is optional because of the ?)
[^']+ //Match everything up to the next quote
) //End Capture Group 1
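As a rough illustration of how the pattern might be used from JavaScript, the sketch below collects Capture Group 1 from every match. The function name and the sample markup are made up for the example; in a JavaScript regex literal the forward slashes have to be escaped, and the g and i flags are an added assumption.

```javascript
// Hypothetical helper: returns every external (http/https) URL found in
// background-image:url('...') declarations within the given source text.
function extractExternalBackgroundUrls(source) {
  // Same pattern as above, with "/" escaped for a regex literal.
  const pattern = /background-image\s*:\s*url\('(https?:\/\/[^']+)/gi;
  // match[1] is Capture Group 1: the URL without the surrounding quotes.
  return Array.from(source.matchAll(pattern), (match) => match[1]);
}

// Made-up sample markup: one external image and one relative path.
const html =
  `<div style="background-image:url('http://example.com/a.png');"></div>` +
  `<div style="background-image: url('img/local.png');"></div>`;

console.log(extractExternalBackgroundUrls(html)); // [ 'http://example.com/a.png' ]
```

The relative path is skipped because the pattern requires http:// or https:// right after the opening quote.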
A co-worker pointed out that I might have been downvoted for not capturing the closing tick. Note: capturing the closing tick would be a wasted step and is not necessary for this regex to work.
He also pointed out somebody might have downvoted me for requiring http or https in the url portion. But the user's question was specifically for external URLs, not internal ones. So this is a valid requirement and gets him closer to what he asked.
Sooo... not sure why this got a downvote.
SO kept rejecting the title I wanted, so I finally settled on one that let me post. It isn't great, so feel free to edit it.
I have fields a user can fill in and in the javascript we have
'${chart.title}'
and similar. Is it sufficient to just strip out the single quote character so that they cannot break out of the string into JavaScript? Or are there other ways to close the string that started with the single quote character?
${chart.title} inserts the title a user typed in on a previous page so naturally they could type something like "Title'+callMethod()+'RestOfTitle" injecting a callMethod into my javascript.
thanks,
Dean
The best way would be to restrict the input to alphanumerical and space characters.
If you want to allow anything inside the title, you can use an escaping function.
http://xkr.us/articles/javascript/encode-compare/
Just stripping single quote characters from the string is definitely not enough. Think of newlines, for one.
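To make the escaping idea concrete, here is a minimal sketch in JavaScript. The question does not say which server-side templating language produces ${chart.title}, so this only illustrates the technique: toJsStringLiteral is a hypothetical name, and it assumes the template can emit the whole literal (quotes included) in place of '${chart.title}'.

```javascript
// Hypothetical helper: turns arbitrary user text into a JavaScript string
// literal that is safe to embed inside an inline <script> block.
function toJsStringLiteral(value) {
  // JSON.stringify adds double quotes and escapes ", \ and control
  // characters such as newlines.
  return JSON.stringify(String(value)).replace(/[<>&'\u2028\u2029]/g, (ch) =>
    // Additionally escape characters that matter in HTML or in older
    // JavaScript engines, as \uXXXX sequences.
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const attack = "Title'+callMethod()+'RestOfTitle";
console.log(toJsStringLiteral(attack));
// "Title\u0027+callMethod()+\u0027RestOfTitle"
```

The injected quote characters end up as escape sequences inside the string, so the text is treated as data rather than code.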
There are a couple of options.
First, take the very restrictive route and do both: apply white-list validation to the title input field, and always encode the text you output to the page. Validation filters out unwanted (and potentially dangerous) characters, and encoding ensures that if any of them get past the filter (or somebody changes the stored text to contain JS code after the filter was applied), the malicious script is not runnable because it is rendered as plain text.
Second, let your users input whatever they want (which is not recommended, but developers are sometimes asked to do it), but always encode the text you output to the page.
You can implement white-list validation yourself using a regular expression, or you can use one of the existing libraries.
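A white-list check with a regular expression could look roughly like the following. The allowed character set (letters, digits, spaces) and the 100-character limit are assumptions chosen for illustration, and isValidTitle is a hypothetical name. Validation like this belongs on the server as well, since client-side checks can be bypassed.

```javascript
// Assumed white-list: ASCII letters, digits and spaces, 1 to 100 characters.
const TITLE_PATTERN = /^[A-Za-z0-9 ]{1,100}$/;

// Hypothetical helper: returns true only if every character is on the white-list.
function isValidTitle(title) {
  return typeof title === 'string' && TITLE_PATTERN.test(title);
}

console.log(isValidTitle('Quarterly Sales 2024')); // true
console.log(isValidTitle("Title'+callMethod()+'RestOfTitle")); // false
```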
I created my first web app using Python, Django, Bootstrap, and Google App Engine.
The requirement I have is that when people suggest external links, the program should find each URL in the text and render it as a clickable link.
For example, when we type http://www.google.com on Stack Overflow, it is converted to a hyperlink.
I have no idea how to achieve this; any help is greatly appreciated.
https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#urlize
Look into the Python re module.
For example, taking John Gruber's URL regex pattern and matching it against his data set you could do something like...
import re

giant_regex = r'''(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))'''
# Wrap each matched URL (group 1) in an anchor tag; source_html is the submitted text
output_with_links = re.sub(giant_regex, r'<a href="\1">\1</a>', source_html)
Unfortunately, this will also capture existing links that don't need converting, so your remaining problem is finding the right regex (which I'm sure has been documented online if you look). The Python and Django part is done.
This is not a full answer, but will get you started: to do what you want in Django, you will need to (1) take the input text that the user submits, (2) parse it for url patterns, and (3) return the html with a hyperlink to display in the View.
I don't know if there is a canonical regex for this purpose, but some that seem to work well are here and in this answer.
On SO, as you notice, the parsed text is first shown in a separate display box and, once you hit "submit", is re-rendered. You can choose many ways to render the text (e.g. parsing the text on the client side with JavaScript). However, for the first stage you should probably just create a "results" page with each URL replaced with a hyperlink (<a href='url'>url</a>) to that URL.
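For the client-side option, the parse-and-replace step might be sketched as follows in JavaScript. This is only an illustration: the URL pattern is a deliberately simple assumption (http or https, ending at whitespace or a quote), and linkify and escapeHtml are hypothetical helper names. The text is HTML-escaped so that user input cannot inject markup.

```javascript
// Hypothetical helper: escapes characters that are special in HTML.
function escapeHtml(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical helper: returns HTML in which every URL in the plain-text
// input is wrapped in an anchor tag and everything else is escaped.
function linkify(text) {
  // Simplified URL pattern (an assumption, not a canonical regex).
  const urlPattern = /\bhttps?:\/\/[^\s<>"']+/g;
  let html = '';
  let last = 0;
  for (const match of text.matchAll(urlPattern)) {
    const url = match[0];
    // Escape the plain text between the previous URL and this one.
    html += escapeHtml(text.slice(last, match.index));
    html += '<a href="' + escapeHtml(url) + '">' + escapeHtml(url) + '</a>';
    last = match.index + url.length;
  }
  return html + escapeHtml(text.slice(last));
}

console.log(linkify('See http://www.google.com now'));
// See <a href="http://www.google.com">http://www.google.com</a> now
```

One known limitation of this simple pattern is that punctuation directly after a URL, such as a sentence-ending period, becomes part of the link.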
I want Greasemonkey to scan through a website and change certain words to something else. Is there a way to do it with regex or some other string processing function in JavaScript?
Thanks for your help :)
In Greasemonkey, you walk the DOM and then apply regular expressions to the text nodes to find your words. Check the Wikiproxy user script for an example that searches for words and changes stuff.
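A rough sketch of that approach is shown below, and it is not taken from the Wikiproxy script. The names replaceWords and replaceInPage and the cat/dog word list are made up for illustration. The DOM part uses the standard document.createTreeWalker API and is only run when a document is available.

```javascript
// Hypothetical helper: escapes regex metacharacters in a literal word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replaces whole words in a string according to a
// { original: replacement } map (case-sensitive).
function replaceWords(text, replacements) {
  const words = Object.keys(replacements);
  if (words.length === 0) return text;
  const pattern = new RegExp('\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b', 'g');
  return text.replace(pattern, (word) => replacements[word]);
}

// Hypothetical helper: visits every text node under root and rewrites it.
function replaceInPage(root, replacements) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    const updated = replaceWords(node.nodeValue, replacements);
    if (updated !== node.nodeValue) node.nodeValue = updated;
  }
}

// Example word list (an assumption); runs only where a DOM exists.
if (typeof document !== 'undefined') {
  replaceInPage(document.body, { cat: 'dog' });
}

console.log(replaceWords('the cat sat', { cat: 'dog' })); // the dog sat
```

Working on text nodes rather than on innerHTML leaves tags and attributes untouched. A real user script would probably also want to skip text inside script and style elements.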