I know this has been discussed here, but no solution was offered for this exact problem. Please take a look...
I'm using a function to transform plain-text URLs into clickable links. This is what I have:
<script type='text/javascript' language='javascript'>
window.onload = autolink;
function autolink(text) {
var exp = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
document.body.innerHTML = document.body.innerHTML.replace(exp,"<a href='$1'>$1</a>");
}
</script>
This makes
https://stackoverflow.com/
look like:
https://stackoverflow.com/
It works, but it also replaces the existing HTML links with nested links.
So, a valid HTML link like
StackOverflow
Becomes something messy like:
StackOverflow">StackOverflow</a>...
How can I fix the expression to ignore the content of link tags? Thanks!
I'm a newbie... I barely understand the regex code. Please be gentle :) Thanks again.
Using the jQuery JavaScript library, this would look like (demo at http://jsfiddle.net/BRPRH/4):
function autolink() {
var exp = /(\b(https?|ftp):\/\/[-A-Z0-9+\u0026@#\/%?=~_|!:,.;]*[-A-Z0-9+\u0026@#\/%=~_|])/gi,
lt = '\u003c',
gt = '\u003e';
$('*:not(a, script, style, textarea)').contents().each(function() {
if (this.nodeType == Node.TEXT_NODE) {
var textNode = $(this);
var span = $(lt + 'span/' + gt).text(this.nodeValue);
span.html(span.html().replace(exp, lt + 'a href=\'$1\'' + gt + '$1' + lt + '/a' + gt));
textNode.replaceWith(span);
}
});
}
$(autolink);
Edit: Excluded textareas, scripts, and embedded CSS. I note that this can also be done using pure DOM's splitText, which has the advantage of not adding extra span elements.
Edit 2: Eliminated all ampersands and double quotes.
Edit 3: Got rid of < and > characters as well.
This problem is beyond the power of regular expressions. You might be able to write a regex that could avoid some links, but you wouldn't be able to avoid every existing link.
The good news is that a different approach will make the job much easier. Right now you are using document.body.innerHTML to manipulate the HTML as plain text. To do it correctly that way, you would basically need to parse the HTML yourself. But you don't have to, because the browser has already parsed it for you!
The web browser allows you to access an HTML document as a series of objects. It's called the Document Object Model (DOM), and if you do some reading on it, you should be able to learn how to traverse the HTML, skipping over anything inside an A element and applying the regex you have to plain text only.
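As a rough sketch of that traversal (not the only way to do it): the helper names splitOnUrls and autolinkNode are made up for illustration, and the URL pattern is the one from the question, assuming its character class was meant to contain "@#". The string-splitting part is plain JavaScript; the DOM part only runs in a browser.

```javascript
// URL pattern from the question (assumption: the class contains "@#").
var URL_PATTERN = /\b(?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

// Split a plain-text string into alternating text and URL pieces.
function splitOnUrls(text) {
  var parts = [];
  var last = 0;
  var m;
  URL_PATTERN.lastIndex = 0; // the regex is global, so reset its state
  while ((m = URL_PATTERN.exec(text)) !== null) {
    if (m.index > last) {
      parts.push({ url: false, value: text.slice(last, m.index) });
    }
    parts.push({ url: true, value: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    parts.push({ url: false, value: text.slice(last) });
  }
  return parts;
}

// Walk the DOM below `root`, skipping elements whose content must stay as it is.
function autolinkNode(root) {
  var skip = { A: true, SCRIPT: true, STYLE: true, TEXTAREA: true };
  var doc = root.ownerDocument;
  // Copy the child list first, because it changes while we replace nodes.
  var children = Array.prototype.slice.call(root.childNodes);
  children.forEach(function (node) {
    if (node.nodeType === 1) { // element node
      if (!skip[node.nodeName]) autolinkNode(node);
    } else if (node.nodeType === 3) { // text node
      var parts = splitOnUrls(node.nodeValue);
      var hasUrl = parts.some(function (p) { return p.url; });
      if (!hasUrl) return; // nothing to link in this text node
      var frag = doc.createDocumentFragment();
      parts.forEach(function (p) {
        if (p.url) {
          var a = doc.createElement('a');
          a.href = p.value;
          a.appendChild(doc.createTextNode(p.value));
          frag.appendChild(a);
        } else {
          frag.appendChild(doc.createTextNode(p.value));
        }
      });
      root.replaceChild(frag, node);
    }
  });
}
```

In a browser this would be started with something like window.onload = function () { autolinkNode(document.body); };. Because the regex only ever sees the text of text nodes, existing links are never touched.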
I want to wrap an HTML tag with another HTML tag in a string (so not a DOM element, a plain string). I created this function, but I wonder if I could do it in one go without a forEach loop.
This is the working function:
function style(content) {
var tempStyledContent = content;
var imgMatches = tempStyledContent.match(/(<img.*?src=[\"'](.+?)[\"'].*?>)/g);
imgMatches.forEach(function (imgMatch) {
var imgTag = imgMatch;
var imgSrc = imgMatch.match(/src\s*=\s*["'](.+?)["']/)[1]; // accept single or double quotes, like the pattern above
tempStyledContent = tempStyledContent.replace(imgTag,
"<a href=\"" + imgSrc + "\" data-fancybox>" + imgTag + "</a>");
});
return tempStyledContent;
}
The parameter content is a string with HTML code in it. The function above outputs the same html as the input but with the (fancybox) a tags surrounding all the child img tags.
So an input string like
"<div><img src='example.jpg'/></div>"
will output
"<div><a href='example.jpg' data-fancybox><img src='example.jpg'/></a></div>"
Can anyone improve this? I know too little about regexes to make this better.
Manipulating HTML with regex is notoriously problematic. Changes that would be trivial in a DOM parser can be very difficult to create a robust regex for; and when regex fails, it fails silently, which makes errors easy to miss. When working in regex you also have to be careful to handle all possible variations in markup such as whitespace, attribute order, quoting style, tag closing style, attribute contents that resemble html but which you don't want modified, etc.
As discussed exhaustively in the comment thread below, given enough time and effort it's certainly possible to handle all of these things in regex; but it leads to a complex, difficult to maintain regex -- and most importantly it's difficult to be certain your regex accommodates every possible valid markup variation. DOM parsing handles all of this stuff automatically, and lets you work with the structured data directly instead of having to cope with all the possible variations in its string representation.
Therefore, if you need to make nontrivial changes to an HTML string, it's almost always best to convert your HTML into a true DOM tree, manipulate that using standard DOM methods, then (if necessary) convert it back into a string. Fortunately it doesn't take a lot of code to do so. Here's a simple vanilla JS demo:
var htmlToElement = function(html) {
var template = document.createElement('template');
template.innerHTML = html.trim();
return template.content.firstChild;
};
var elementToHtml = function(el) {
return el.outerHTML;
};
// Usage demo:
var string = "<div>This <b>is some</b> <i>html</i><img src='http://example.com'></div>";
var foo = htmlToElement(string);
// Perform your DOM manipulation as needed on foo here. This would look much simpler
// if I wasn't so stubborn about avoiding jQuery these days, but here we are anyway:
foo.querySelectorAll('img').forEach(function(img) {
var link = document.createElement('a');
link.setAttribute('data-fancybox',true);
link.setAttribute('href', img.getAttribute('src'));
img.parentNode.insertBefore(link,img);
link.appendChild(img);
});
// back to a string:
var bar = elementToHtml(foo);
console.log(bar);
OK, I'm probably going to do DOM manipulation as @DanielBeck suggested. Once Knockout has finished binding, I will use $.wrap (http://api.jquery.com/wrap/) to do my manipulation. I just hoped there was an easy way without using jQuery, so if there are other suggestions, please comment.
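For the narrower question of doing it "in one go" on a string, a single replace call with a callback can stand in for the match/forEach pair. This is only a sketch: wrapImages is a made-up name, and the pattern carries all the regex caveats described above (for example, it would also pick up an attribute such as data-src), so it is only reasonable for markup you control.

```javascript
// Wrap every <img ...> in an <a href="[its src]" data-fancybox> using one replace call.
function wrapImages(content) {
  // Group 1 captures the quote character, group 2 the src value; \1 matches the same quote.
  var imgPattern = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.+?)\1[^>]*>/gi;
  return content.replace(imgPattern, function (imgTag, quote, src) {
    return '<a href=' + quote + src + quote + ' data-fancybox>' + imgTag + '</a>';
  });
}

var wrapped = wrapImages("<div><img src='example.jpg'/></div>");
console.log(wrapped);
// <div><a href='example.jpg' data-fancybox><img src='example.jpg'/></a></div>
```

Unlike the original function, this returns the input unchanged when the string contains no img tags, because replace simply finds nothing to substitute.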
I've been going through and trying to find an answer to this question that fits my need, but either I'm too much of a noob to make other use cases work, or they're not specific enough for my case.
Basically I want to use JavaScript/jQuery to replace any and all ampersands (&) on a web page that may occur in a link's href with just the word "and". I've tried a couple of different versions of this with no luck:
var link = $("a").attr('href');
link.replace(/&/g, "and");
Thank you
Your current code reads the href of the first matched element into a string and calls replace() on that string, but it discards the result and never updates the element(s) in the DOM.
You can instead achieve what you need by providing a function to attr() which will be executed against all elements in the matched set. Try this:
$("a").attr('href', function(i, value) {
return value.replace(/&/g, "and");
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
link
link
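The reason the attempt in the question has no visible effect can be shown without any DOM at all: strings are immutable, so replace() returns a new string and leaves the original alone. The href value below is a made-up example.

```javascript
// Hypothetical href value, standing in for what $("a").attr('href') returned.
var href = 'page.php?a=1&b=2';

// This is what the question's code does: the new string is computed and thrown away.
href.replace(/&/g, 'and');
var unchanged = href; // still 'page.php?a=1&b=2'

// Keeping the return value is what makes the difference.
var fixed = href.replace(/&/g, 'and'); // 'page.php?a=1andb=2'

console.log(unchanged, fixed);
```

Even with the return value kept, it still has to be written back to the element, which is what the function passed to attr() takes care of for every matched link.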
Sometimes when replacing &, I've found that even though I replaced &, I still have amp;. There is a fix to this:
var newUrl = "@Model.UrlToRedirect".replace(/&/gi, '%').replace(/%amp;/gi, '&');
With this solution you replace & twice and it will work. In my particular problem in an MVC app, window.location.href = @Model.UrlToRedirect, the URL was already partially encoded and had a query string. I tried encoding/decoding, using Uri as the C# class, escape(), everything before coming up with this solution. The problem with using my above logic is that other things could blow up the query string later. One solution is to put a hidden field or input on the form like this:
<input type="hidden" value="@Model.UrlToRedirect" id="url-redirect" />
then in your javascript:
window.location.href = document.getElementById("url-redirect").value;
This way, JavaScript won't take the C# string and change it.
I'm sorry, I can't believe this question hasn't been solved on Stack Overflow, but I've been searching a lot and I can't find any solution.
I want to change HTML code with regular expressions in this way:
testing anchor
to
testing anchor
I only want to unlink the text, without using DOM functions. The code is in a string, not in the document, and I don't want to remove any tags other than the a ones.
If you really don't want to use DOM functions (why ?) you might do
str = str.replace(/<[^>]*>/g, '')
You can use it if you're fairly confident you don't have a more complex HTML but it will fail in many cases, for example some nested tags, or > in an attribute. You might fix some of the problems with more complex regular expressions but they aren't the right tool for this job in the general case.
If you don't want to remove other tags than a, do this :
str = str.replace(/<\/?a( [^>]*)?>/g, '')
This changes
<a>testing</a> <b>a</b>nchor<div>test</div><aaa>E</aaa>
to
testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
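Both expressions can be tried on plain strings. In this small sketch the function names stripAnchors and stripAllTags are made up; the first input is the example above, and the second is a hypothetical tag with a > inside an attribute, which shows the kind of failure mentioned earlier.

```javascript
// Remove only <a ...> and </a>, keeping every other tag.
function stripAnchors(str) {
  return str.replace(/<\/?a( [^>]*)?>/g, '');
}

// Remove every tag; this breaks when an attribute value contains ">".
function stripAllTags(str) {
  return str.replace(/<[^>]*>/g, '');
}

var kept = stripAnchors('<a>testing</a> <b>a</b>nchor<div>test</div><aaa>E</aaa>');
console.log(kept); // testing <b>a</b>nchor<div>test</div><aaa>E</aaa>

var broken = stripAllTags('<a title="a > b">x</a>');
console.log(broken); // ' b">x' (the tag was cut at the first ">", leaving debris behind)
```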
I know you only want regex, but for future viewers, here is a trivial solution using DOM methods.
var a = document.createElement("div");
a.innerHTML = 'testing anchor';
var wordsOnly = a.textContent || a.innerText;
This will not fail on complicated use cases, allows nested tags and it's perfectly clear what's happening:
Hey browser! Create an element
Put that HTML in it
Give me back just the text, that's what I want now.
NOTE:
The element we're creating will not be added to the actual DOM since we're not adding it anywhere, it'll stay invisible. Here is a fiddle to illustrate how this works.
As has been mentioned, you cannot parse HTML with regular expressions. The principal reason is that HTML elements nest and regular expressions cannot handle that.
That said, with a few restrictions which I will mention, you can do the following :
string.replace (/(\b\w+\s*)<a\s+href="([^"]*)">(.*)<\/a>/g, '$1 $3')
This requires there to be a word before the tag (spacing between the word and the tag is optional) and no attributes other than the href in the <a> tag, and it accepts anything between the <a> and the </a>.
You can create a DOM object from the string and use DOM methods to parse it, without having appended said DOM object to the document.
I'm converting a website in MODX which makes use of a java wrapper to dynamically pull in content and display it without any page reloading. The basics of the site are there, but I'm having a slight issue with generated links and I'm not sure what the best way to get around it is.
I didn't write the original javascript that the site uses, I'm just trying to refactor it slightly so modx leverages the right pieces.
Here is an example of the template I'm using to page next/previous
<div id="next"></div>
<script type="text/javascript">
$(function()
{
setNext('[[+href]]');
var page_content_height = $('#page_content').height();
}
);
</script>
Basically modx's generated links take the following format in the page:
setNext('nb/index.php?id=17&amp;page=2');
For them to work, they need to be:
setNext('nb/index.php?id=17&page=2');
The site's using jQuery; I was thinking there might be a way I can get that to convert text strings before it renders the page?
Hope someone can point me in the right direction cos I'm a bit stumped atm.
setNext(htmlDecode('nb/index.php?id=17&amp;page=2'));
function htmlDecode(input){
var e = document.createElement('div');
e.innerHTML = input;
return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
See unescape html entities in javascript
This would do the trick
var str = 'nb/index.php?id=17&amp;page=2';
str = str.replace(/&amp;/g, '&');
setNext(str);
IMO it would be better to correct it at the source rather than 'patching it up' in the browser.
Worst case, you can do it in PHP like this:
$href = 'nb/index.php?id=17&amp;page=2';
$modx->setPlaceholder('href', str_replace('&amp;', '&', $href));
However if the link was generated using MODX's makeUrl() method then it should already be formatted correctly.
I have the following
var id='123';
newDiv.innerHTML = "";
Which renders in my HTML.
The problem I have is that I wish to take the call to the method TestFunction, and use as a string parameter in my function StepTwo(string, boolean), which would ideally end up in live HTML as shown...
Notice how the TestFunction call is a string here (it is executed within StepTwo using eval).
I have tried to format my JS as follows:
newDiv.innerHTML = "";
but while this appears correct to me in my IDE, in the rendered HTML it is garbled beyond belief.
Would appreciate if anyone could point me in the right direction. Thanks!
One of the biggest capital failures on the internet is creating html in javascript by gluing strings together.
var mya = document.createElement("a");
mya.href="#";
mya.onclick = function(){
StepTwo(function(){
TestFunction('123', false );
}, true );
};
newDiv.innerHTML = "";
newDiv.appendChild(mya);
This eliminates the need for any fancy escaping stuff.
( I probably should do 'onclick' differently, but this should work, I'm trying hard not to just use jQuery code to do everything )
Here's how I would do it in jQuery:
jQuery(function($){
var container = $("#container");
var link = document.createElement("a"); /* faster than $("<a></a>"); */
$(link).attr("href", "Something ( or # )" );
$(link).click( function(){
var doStepTwo = function()
{
TestFunction('123', true );
};
StepTwo( doStepTwo, false ); /* StepTwo -> doStepTwo -> TestFunction() */
});
container.append(link);
});
There is no good excuse for gluing strings together in Javascript
All it does is ADD the overhead of parsing HTML back into DOM structures, and ADD the potential for XSS and broken HTML. Even beloved Google gets this wrong in some of their advertising scripts and has caused epic failures in many cases I have seen (and they don't want to know about it).
"I don't understand JavaScript" is the only excuse, and it's NOT a good one.
Try using &quot; instead of \"
newDiv.innerHTML = "<a href=&quot;#&quot;...
You should be using &quot; not " or \" inside an HTML string quoted with double-quotes.
NewDiv.innerHTML = "";
There's probably a better way to do this - any time you find yourself using eval() you should stand back and look for a different solution.
You claim that eval is the right thing to do here. I'm not so sure.
Have you considered this approach:
and in your StepTwo function
function StepTwo(func,args,flag){
//do what ever you do with the flag
//instead of eval use the function.apply to call the function.
func.apply(null, args); // first argument is the `this` value, second is the argument array
}
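A small, self-contained sketch of that idea follows. The bodies of TestFunction and StepTwo are invented stand-ins (the question never shows them); only the calling convention matters here: the function and its arguments are passed as values instead of as a string to eval.

```javascript
// Hypothetical stand-ins for the functions named in the question.
var calls = [];

function TestFunction(id, flag) {
  calls.push({ id: id, flag: flag }); // record the call so it can be inspected
  return id + ':' + flag;
}

function StepTwo(func, args, flag) {
  // `flag` would drive whatever StepTwo really does; here it is just passed through.
  // apply() calls func with the elements of `args` as its arguments.
  var result = func.apply(null, args);
  return { flag: flag, result: result };
}

// No string building and no quoting problems: pass the function and an argument array.
var outcome = StepTwo(TestFunction, ['123', false], true);
console.log(outcome.result); // 123:false
```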
You could create the a element and attach to the click event using DOM Methods.
A Javascript Framework (like the ubiquitous jQuery) would make this a lot easier.
Your biggest problem is using eval, it leads to so many potential problems that it's nearly always better to find an alternative solution.
Your immediate problem is that what you really have is
as the next " after the start of the onclick attribute closes it. Use &quot; as others have suggested. And don't use eval.
You need to alternate your " and '.
Maybe you don't need quotes around the 123, because of JavaScript's flexible typing. Pass it without quotes but treat it as a string within TestFunction.
Hey guys, thanks for all the answers. I find that the &quot; seems to work best.
I'll give you guys some votes up once I get more reputation!
In regards to eval(), what you see in the question is a very small snapshot of the application being developed. I understand the woes of eval, however, this is one of those one in a million situations where it's the correct choice for the situation at hand.
It would be understood better if you could see what these functions do (have given them very generic names for stackoverflow).
Thanks again!
The best way is to create the element with document.createElement, but if you're not willing to, I guess you could do or use &quot;.
In your code:
newDiv.innerHTML = "";
If it doesn't work, try changing "\'" to "\\'".
Remember that the " character is used to open and close the attribute on HTML tags. If you use it in the attribute's value, the browser will understand it as the close char.
Example:
<input type="text" value="foo"bar"> will end up being <input type="text" value="foo">.
...
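If the markup really has to be built as a string, one way to avoid closing the attribute early is to entity-escape the attribute value before gluing it in. This is a sketch only: escapeAttribute and the link text are made up for illustration, while StepTwo and TestFunction are the names from the question.

```javascript
// Escape characters that would otherwise end or corrupt a double-quoted attribute value.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must come first, or the entities below get double-escaped
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

var id = '123';

// The JavaScript that should run on click, written with ordinary quotes.
var handler = 'StepTwo("TestFunction(\'' + id + '\', false)", true)';

// The double quotes inside the handler become &quot;, so the attribute stays intact.
var html = '<a href="#" onclick="' + escapeAttribute(handler) + '">link</a>';
console.log(html);
// <a href="#" onclick="StepTwo(&quot;TestFunction('123', false)&quot;, true)">link</a>
```

The browser decodes &quot; back to " when it reads the attribute, so the handler it finally runs is the original, unescaped code.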
I know this is hella' old now, but if anyone has issues with escaped strings when using eval (and you absolutely have to use eval), I've got a way to avoid problems.
var html = '';
eval('(function(div, html){div.innerHTML = html;})')(newDiv, html);
So, what's going on here?
eval creates a function that contains two parameters, div and html and returns it.
The function is immediately run with the parameters to the right of the eval function. This is basically like an IIFE.
In this case
var myNewMethod = eval('(function(div, html){div.innerHTML = html;})');
is basically the same as:
var myNewMethod = function(div, html){div.innerHTML = html;}
and then we're just doing this:
myNewMethod(newDiv, html); //where html had the string containing markup
I would suggest not using eval. If it can't be avoided, or if you control all the inputs and there's no risk of injection then this will help in cases where string escapes are an issue.
I also tend to use Function, but it isn't any more secure.
Here's the snippet I use:
var feval = function(code) {
return (new Function(code))();
}
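A short usage sketch for that snippet, with made-up inputs. Two details are easy to miss: the code string needs its own return statement to produce a value, and a function built with new Function cannot see local variables of the place it was created in, unlike a direct eval call.

```javascript
// Same helper as above, repeated so this sketch runs on its own.
var feval = function (code) {
  return (new Function(code))();
};

// The string is used as a function body, so it needs an explicit `return`.
var sum = feval('return 1 + 2;'); // 3

// Functions created with `new Function` are not closures over the calling scope.
function scopeDemo() {
  var localOnly = 'hidden'; // not visible to the generated function
  return feval('return typeof localOnly;');
}
var seen = scopeDemo(); // 'undefined'

console.log(sum, seen);
```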