innerHTML removes attribute quotes in Internet Explorer - javascript

When you get the innerHTML of a DOM node in IE, if there are no spaces in an attribute value, IE will remove the quotes around it, as demonstrated below:
<html>
<head>
<title></title>
</head>
<body>
<div id="div1"><div id="div2"></div></div>
<script type="text/javascript">
alert(document.getElementById("div1").innerHTML);
</script>
</body>
</html>
In IE, the alert will read:
<DIV id=div2></DIV>
This is a problem, because I am passing this on to a processor that requires valid XHTML, and all attribute values must be quoted. Does anyone know of an easy way to work around this behavior in IE?

IE's innerHTML is very annoying indeed. I wrote this function for it, which may be helpful. It quotes attribute values and converts tag names to lowercase. By the way, to make it even more annoying, IE's innerHTML doesn't remove quotes from non-standard attributes.
Edit based on comments
The function now processes more characters in attribute values and optionally converts attribute values to lower case. The function looks even more ugly now ;~). If you want to add or remove characters to the equation, edit the [a-zA-Z\.\:\[\]_\(\)\&\$\%#\#\!0-9]+[?\s+|?>] part of the regular expressions.
function ieInnerHTML(obj, convertToLowerCase) {
var zz = obj.innerHTML ? String(obj.innerHTML) : obj
,z = zz.match(/(<.+[^>])/g);
if (z) {
for ( var i=0;i<z.length;(i=i+1) ){
var y
,zSaved = z[i]
,attrRE = /\=[a-zA-Z\.\:\[\]_\(\)\&\$\%#\#\!0-9\/]+[?\s+|?>]/g
;
z[i] = z[i]
.replace(/([<|<\/].+?\w+).+[^>]/,
function(a){return a.toLowerCase();
});
y = z[i].match(attrRE);
if (y){
var j = 0
,len = y.length
while(j<len){
var replaceRE =
/(\=)([a-zA-Z\.\:\[\]_\(\)\&\$\%#\#\!0-9\/]+)?([\s+|?>])/g
,replacer = function(){
var args = Array.prototype.slice.call(arguments);
return '="'+(convertToLowerCase
? args[2].toLowerCase()
: args[2])+'"'+args[3];
};
z[i] = z[i].replace(y[j],y[j].replace(replaceRE,replacer));
j+=1;
}
}
zz = zz.replace(zSaved,z[i]);
}
}
return zz;
}
Example key-value pairs that should work
data-mydata=return[somevalue] => data-mydata="return[somevalue]"
id=DEBUGGED:true => id="DEBUGGED:true" (or id="debugged:true" if you use the convertToLowerCase parameter)
someAttribute=Any.Thing.Goes => someAttribute="Any.Thing.Goes"

Ah, the joy of trying to use XHTML in a browser that doesn't support it.
I'd just accept that you are going to get HTML back from the browser and put something in front of your XML processor that can input tag soup and output XHTML — HTML Tidy for example.

I ran into this exact same problem just over a year ago, and solved it using InnerXHTML, a custom script written by someone far smarter than I am. It's basically a custom version of innerHTML that returns standard markup.

I might be a couple of years too late, but here goes. The accepted answer might do what it promises, but it's already Friday afternoon, I need something simpler, and I don't have time to go through it. So here's my version, which just quotes unquoted attribute values; it should be pretty trivial to extend.
var t = "<svg id=foobar height=\"200\" width=\"746\"><g class=rules>";
// replace() returns a new string, so assign the result back
t = t.replace(/([\w-]+)=([\w-]+)([ >])/g, function(str, $n, $v, $e, offset, s) {
return $n + '="' + $v + '"' + $e;
});
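For values the simpler patterns in this thread can miss, a slightly broader sketch follows. It assumes the input is tag soup of the kind IE's innerHTML produces, with no ">" inside attribute values; quoteBareAttributes is a made-up name for illustration, not part of any library.

```javascript
// Hypothetical helper: wrap unquoted attribute values in double quotes.
// Assumption: no ">" appears inside attribute values.
function quoteBareAttributes(html) {
  // Attribute name, "=", then a double-quoted, single-quoted, or bare value.
  var attrRE = /([\w:-]+)=("[^"]*"|'[^']*'|[^\s"'>]+)/g;
  // Only look inside opening tags, so text content such as "x=y" is left alone.
  return html.replace(/<[a-zA-Z][^<>]*>/g, function (tag) {
    return tag.replace(attrRE, function (match, name, value) {
      var first = value.charAt(0);
      // Already-quoted values are consumed whole and returned unchanged.
      if (first === '"' || first === "'") return match;
      return name + '="' + value + '"';
    });
  });
}

var fixed = quoteBareAttributes('<DIV id=div2 class=day-month-title title="a b=c">x=y</DIV>');
// fixed is '<DIV id="div2" class="day-month-title" title="a b=c">x=y</DIV>'
```

Matching quoted values in the same alternation is what keeps something like title="a b=c" from being rewritten in the middle. Tag names are not lowercased here; that would still need a separate step.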

I've tested this, and it works for most attributes, except those that are hyphenated, such as class=day-month-title. It ignores those attributes, and does not quote them.

There is a quick and dirty workaround for this issue. I used it when working with jQuery templates. Just add a trailing space to the value of the attribute.
Of course this does not make much sense with id that is used in the example, so here is another example involving jQuery, and jQuery templates:
http://jsfiddle.net/amamaenko/dW7Wh/5/
Note the trailing space in the line <input value="${x} "/> without it, the example won't work on IE.

Did you try it with jQuery?
alert($('#div1').html());

Related

Javascript replace first and last font tag

I'm trying to convert html to bbcode.
My code:
var html = "<font color=\"Green\"><font size=\"4\">test</font></font>"
html = html.replace(/\<font color="(.*?)"\>(.*?)\<\/font\>/ig, "[color=$1]$2[/color]");
Result:
[color=Green]<font size="4">test[/color]</font>
But I need to get another
[color=Green]<font size="4">test</font>[/color]
Please could you correct my mistake. Sorry for my English.
You're going to have difficulty parsing HTML with regex, as you can see from the many, many posts on this site about it. HTML is not a regular language, so it can be impossible to parse this way. That being said, if the problem is this simple, it should be possible.
Here's a solution that will work in very simple cases to get the first and last part
var html = "<font color=\"Green\"><font size=\"4\">test</font></font>"
,findTag = /<.*?>|([^<]+)/g
,part
,allParts = []
;
while((part = findTag.exec(html)) !== null) {
allParts.push(part[0]);
}
console.log(allParts)
var first = allParts[0]
,last = allParts.slice(-1)[0]
;
console.log(first, last);
You can then parse the first and last as you were doing previously and use array.join() to join everything back.
But again, this will only work in the simple case.
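One way to finish that approach is sketched here, assuming the whole string is wrapped in a single outer font element with a color attribute. The name convertOuterFont is illustrative only.

```javascript
// Hypothetical helper: convert only the outermost <font color="..."> wrapper.
function convertOuterFont(html) {
  // Split into tags and text runs, same idea as findTag above.
  var parts = html.match(/<[^>]*>|[^<]+/g) || [];
  var first = parts[0] || '';
  var last = parts[parts.length - 1] || '';
  var open = /^<font color="([^"]*)">$/i.exec(first);
  // Rewrite only when the string really starts and ends with the expected tags.
  if (parts.length >= 2 && open && /^<\/font>$/i.test(last)) {
    parts[0] = '[color=' + open[1] + ']';
    parts[parts.length - 1] = '[/color]';
  }
  return parts.join('');
}

var bb = convertOuterFont('<font color="Green"><font size="4">test</font></font>');
// bb is '[color=Green]<font size="4">test</font>[/color]'
```

Anything that does not match the expected shape is returned unchanged, which is safer than a partial rewrite.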
Use negative lookaheads:
html = html.replace(/\<font color="(.*?)"\>(.*?)\<\/font\>(?!\<\/font\>)/ig, "[color=$1]$2[/color]");
To explain what's happening here: The pattern I used is exactly yours, except for an appended (?!\<\/font\>). This so-called negative lookahead basically says: "only make this a match, if I don't encounter </font> next".
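The lookahead relies on the closing tags sitting directly next to each other. If every font tag should become BBCode, an alternative is to convert the innermost elements first and repeat until nothing changes. This is only a sketch: it assumes each font tag carries exactly one color or size attribute, and the [size=...] tag and the fontToBBCode name are assumptions, not something from the question.

```javascript
// Hypothetical helper: convert nested <font color|size> tags from the inside out.
function fontToBBCode(html) {
  // Matches a <font> element whose content contains no "<", i.e. an innermost one.
  var innermost = /<font (color|size)="([^"]*)">([^<]*)<\/font>/ig;
  var previous;
  do {
    previous = html;
    html = html.replace(innermost, function (match, attr, value, body) {
      var tag = attr.toLowerCase();
      return '[' + tag + '=' + value + ']' + body + '[/' + tag + ']';
    });
  } while (html !== previous); // stop once a pass changes nothing
  return html;
}

var nested = fontToBBCode('<font color="Green"><font size="4">test</font></font>');
// nested is '[color=Green][size=4]test[/size][/color]'
```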

Stripping text with javascript

Hi, I'm a newbie with JavaScript and I was wondering how to strip all the text except the word TB_iframeContent800. The digits at the end vary.
here is an example string
<iframe frameborder="0" style="width: 670px; height: 401px;" onload="tb_showIframe()" name="TB_iframeContent80" id="TB_iframeContent" src="http://www.gorgeoushentai.com/wp-admin/media-upload.php?post_id=33&" hspace="0">This feature requires inline frames. You have iframes disabled or your browser does not support them.</iframe>
I want to extract TB_iframeContent80 and store it in a variable. How can I do this using a regex in JavaScript? Please note that the trailing digits vary because the number always changes, so it is sometimes a 3-digit number.
var iframeName = document.getElementsByTagName("iframe")[0].name
If you've included jQuery, it could be something like this:
var iframeName = $("iframe:first").attr("name");
If jQuery is an option I think you are looking for something like this
$('iframe[name^="TB_iframeContent"]')
If you won't use the DOM (because of code analysis, etc.), just try this regex:
var code = '<iframe ... /iframe>';
var result = code.match( /name="([^"]*)"/ );
var extract = result[1];
This selects the content of the name attribute.
You could load the HTML you're parsing like this and use the DOM to get its name (it's an easier and more reliable way than using a regex):
var loadhtml = document.createElement('div');
loadhtml.innerHTML = 'yourHtml';
var theName = loadhtml.getElementsByTagName('iframe')[0].name;
If you use jQuery, you could consider attr("name") as a way to get the name.
However, if you insist on using a regex, here is one:
/< *iframe[^>]*name *= *['"]([^'"]*)/
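Since the question says the trailing digits vary, a regex that pins the TB_iframeContent prefix and captures one or more digits may be closer to what is wanted. A minimal sketch, with extractIframeName as a made-up helper name:

```javascript
// Hypothetical helper: pull the TB_iframeContentNN name out of a markup string.
function extractIframeName(markup) {
  // \d+ accepts any number of trailing digits (80, 800, ...).
  var m = /\bname\s*=\s*["'](TB_iframeContent\d+)["']/.exec(markup);
  return m ? m[1] : null; // null when no such name attribute is present
}

var iframeName = extractIframeName(
  '<iframe frameborder="0" name="TB_iframeContent80" id="TB_iframeContent" hspace="0"></iframe>'
);
// iframeName is 'TB_iframeContent80'
```

The id attribute in the example is not picked up because it has no digits and is not a name attribute.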

Regex for visible text, not HTML

If I had a string:
hey user, what are you doing?
How, with a regex, could I say: look for user, but not inside < or > characters? So the match would grab the user between the <a></a> but not the one inside the href.
I'd like this to work for any tag, so it won't matter which tags are used.
== Update ==
The reason I can't use .text() or innerText is that this is being used to highlight results, much like the native Cmd/Ctrl+F functionality in browsers, and I don't want to lose formatting. For example, if I search for strong here:
Some <strong>strong</strong> text.
If I use .text() it'll return "Some strong text", and then I'll wrap strong with a <span> which has a class for styling, but when I go back and try to insert this into the DOM it'll be missing the <strong> tags.
If you plan to replace the HTML using html() again, then you will lose all event handlers that might be bound to inner elements, along with their data (as I said in my comment).
Whenever you set the content of an element as HTML string, you are creating new elements.
It might be better to recursively apply this function to every text node only. Something like:
$.fn.highlight = function(word) {
var pattern = new RegExp(word, 'g'),
repl = '<span class="high">' + word + '</span>';
this.each(function() {
$(this).contents().each(function() {
if(this.nodeType === 3 && pattern.test(this.nodeValue)) {
$(this).replaceWith(this.nodeValue.replace(pattern, repl));
}
else if(!$(this).hasClass('high')) {
$(this).highlight(word);
}
});
});
return this;
};
It could very well be that this is not very efficient though.
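If the markup is only available as a string, the same "text only, never inside tags" idea can be sketched without the DOM by splitting the string into tags and text runs. The sample markup and the highlightVisible name below are assumptions for illustration, and the sketch assumes no ">" inside attribute values.

```javascript
// Hypothetical helper: wrap matches of `word` in a span, but only in visible text.
function highlightVisible(html, word) {
  // Escape regex metacharacters so the word is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var wordRE = new RegExp(escaped, 'g');
  // Each chunk is either a whole tag or a run of text between tags.
  return html.replace(/<[^>]*>|[^<]+/g, function (chunk) {
    if (chunk.charAt(0) === '<') return chunk; // leave tags untouched
    return chunk.replace(wordRE, '<span class="high">$&</span>');
  });
}

var marked = highlightVisible('hey <a href="/user">user</a>, what are you doing?', 'user');
// marked is 'hey <a href="/user"><span class="high">user</span></a>, what are you doing?'
```

Writing the result back with innerHTML or html() still recreates the elements, so the caveat about lost event handlers applies here too.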
To emulate Ctrl-F (which I assume is what you're doing), you can use window.find for Firefox, Chrome, and Safari and TextRange.findText for IE.
You should use a feature detect to choose which method you use:
function highlightText(str) {
if (window.find)
window.find(str);
else if (window.TextRange && window.TextRange.prototype.findText) {
var bodyRange = document.body.createTextRange();
bodyRange.findText(str);
bodyRange.select();
}
}
Then, after the text is selected, you can style the selection with CSS using the ::selection selector.
Edit: To search within a certain DOM object, you could use a roundabout method: use window.find and see whether the selection is in a certain element. (Perhaps say s = window.getSelection().anchorNode and compare s.parentNode == obj, s.parentNode.parentNode == obj, etc.). If it's not in the correct element, repeat the process. IE is a lot easier: instead of document.body.createTextRange(), you can use obj.createTextRange().
$("body > *").each(function (index, element) {
var parts = $(element).text().split("needle");
if (parts.length > 1)
$(element).html(parts.join('<span class="highlight">needle</span>'));
});
at this point it's evolving to be more and more like Felix's, so I think he's got the winner
original:
If you're doing this in javascript, you already have a handy parsed version of the web page in the DOM.
// gives "user"
alert(document.getElementById('user').innerHTML);
or with jQuery you can do lots of nice shortcuts:
alert($('#user').html()); // same as above
$("a").each(function (index, element) {
alert(element.innerHTML); // shows label text of every link in page
});
I like regexes, but because tags can be nested, you will have to use a parser. I recommend http://simplehtmldom.sourceforge.net/ it is really powerful and easy to use. If you have wellformed xhtml you can also use SimpleXML from php.
edit: Didn't see the javascript tag.
Try this:
/[(<.+>)(^<)]*user[(^>)(<.*>)]/
It means:
Before the keyword, you can have as many <...> or non-<.
Samewise after it.
EDIT:
The correct one would be:
/((<.+>)|(^<))*user((^>)|(<.*>))*/
Here is what works, I tried it on your JS Bin:
var s = 'hey user, what are you doing?';
s = s.replace(/(<[^>]*)user([^<>]*>)/g,'$1NEVER_WRITE_THAT_ANYWHERE_ELSE$2');
s = s.replace(/user/g,'Mr Smith');
s = s.replace(/NEVER_WRITE_THAT_ANYWHERE_ELSE/g,'user');
document.body.innerHTML = s;
It may be a tiny little bit complicated, but it works!
Explanation:
You replace "user" that is in the tag (which is easy to find) with a random string of your choice that you must never use again... ever. A good use would be to replace it with its hashcode (md5, sha-1, ...)
Replace every remaining occurrence of "user" with the text you want.
Replace back your unique string with "user".
This code will strip all tags from the string:
var s = 'hey user, what are you doing?';
s = s.replace(/<[^<>]+>/g,'');

Javascript - using innerHTML to output strings *WITHOUT* HTML-encoded special characters?

It appears that JavaScript auto-converts certain special characters into HTML entities when outputting content via the innerHTML property. This is a problem, since I need to be able to output < and > without converting them to &lt; and &gt;.
Can this auto-conversion be prevented, reversed, or escaped? So far, no matter what I do, < and > are always automatically encoded into HTML entities.
Example code:
function DisplayQueries() {
var IDs = ['AllOpenedINC','AllOpenedCRQ','AllClosedINC','AllClosedCRQ','SameDayINC','SameDayCRQ','NotSameDayINC','NotSameDayCRQ',
'StillOpenINC','StillOpenCRQ','OpenOldINC','OpenOldCRQ','OtherQueuesINC','OtherQueuesCRQ']
for (var i = 0; i < IDs.length; i++) {
if (eval(IDs[i]))
document.getElementById(IDs[i]).innerHTML = eval(IDs[i]);
}
}
Example query variable:
AllOpenedINC = "('Company*+' = \"test\" OR 'Summary*' = \"%test%\") AND ('Submit Date' >= \"" + theDate +
" 12:00:00 AM\" AND 'Submit Date' <= \"" + theDate + " 11:59:59 PM\")" + nameINC;
You should focus on what you want to accomplish as a result, rather than the way of doing it. Reading innerHTML gives you serialized markup, so <, > and & in text come back as &lt;, &gt; and &amp; (textContent and innerText return the plain text instead). If you read the value through innerHTML, decode the string to get < and > back.
You can use this unescapeHTML() function to get your results as you want them.
function unescapeHTML() {
return this.stripTags().replace(/&lt;/g,'<').replace(/&gt;/g,'>').replace(/&amp;/g,'&');
}
I hope this helps. I've copied it from Prototype.
I think your question is based on a false premise. Just make a very simple test:
document.getElementById("testdiv").innerHTML = '<h1><em>Hello</em></h1>';
if this works fine then the problem is not on the JS side, instead you use some other components in your system which HTML-encode your characters.
I figured out what's going on. There's no easy way to prevent innerHTML from converting special characters to HTML entities, but since the problem was surfacing when copying the content of a DIV to the clipboard (using IE-only JS, which works since this is in a government environment where everyone has to use IE), I just used the replace() function to re-convert the HTML entities back to < and >.
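The replace() step described above might look like the following. This is a sketch, not the asker's actual code; decodeBasicEntities is a made-up name and only the four most common entities are handled.

```javascript
// Hypothetical helper: turn the entities produced by innerHTML back into characters.
function decodeBasicEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    // &amp; goes last so that "&amp;lt;" decodes to "&lt;" and not to "<".
    .replace(/&amp;/g, '&');
}

var query = decodeBasicEntities("'Submit Date' &gt;= \"x\" AND 'Submit Date' &lt;= \"y\"");
// query is: 'Submit Date' >= "x" AND 'Submit Date' <= "y"
```

Reading the element's textContent (or innerText in old IE) instead of innerHTML avoids the encoding altogether when only the text is needed.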
You can use jQuery and .append().

Need Pure/jQuery Javascript Solution For Cleaning Word HTML From Text Area

I know this issue has been touched on here, but I have not found a viable solution for my situation yet, so I'd like to put the brain trust back to work and see what can be done.
I have a textarea in a form that needs to detect when something is pasted into it, and clean out any hidden HTML & quotation marks. The content of this form is getting emailed to a 3rd party system which is particularly bitchy, so sometimes even encoding it to the html entity characters isn't going to be a safe bet.
I unfortunately cannot use something like FCKEditor, TinyMCE, etc, it's gotta stay a regular textarea in this instance. I have attempted to dissect FCKEditor's paste from word function but have not had luck tracking it down.
I am however able to use the jQuery library if need be, but haven't found a jQuery plugin for this just yet.
I am specifically looking for information geared towards cleaning the information pasted in, not how to monitor the element for change of content.
Any constructive help would be greatly appreciated.
I am looking at David Archer's answer and he pretty much answers it. I have used in the past a solution similar to his:
$("textarea").change( function() {
// convert any opening and closing braces to their HTML encoded equivalent.
var strClean = $(this).val().replace(/</gi, '&lt;').replace(/>/gi, '&gt;');
// Remove any double and single quotation marks.
strClean = strClean.replace(/"/gi, '').replace(/'/gi, '');
// put the data back in.
$(this).val(strClean);
});
If you are looking for a way to completely REMOVE HTML tags
$("textarea").change( function() {
// Completely strips tags. Taken from Prototype library.
var strClean = $(this).val().replace(/<\/?[^>]+>/gi, '');
// Remove any double and single quotation marks.
strClean = strClean.replace(/"/gi, '').replace(/'/gi, '');
// put the data back in.
$(this).val(strClean);
});
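Text pasted from Word often carries conditional comments and curly quotes in addition to plain tags. The cleaning step on its own, without the jQuery event wiring, could be sketched as below. cleanPastedText is a hypothetical name, and the list of quote characters is an assumption about what the receiving system rejects.

```javascript
// Hypothetical helper: strip markup and quotation marks from pasted text.
function cleanPastedText(text) {
  return text
    // Remove HTML comments first (Word wraps extra markup in conditional comments).
    .replace(/<!--[\s\S]*?-->/g, '')
    // Strip remaining tags (same pattern as the Prototype-based snippet above).
    .replace(/<\/?[^>]+>/g, '')
    // Remove straight and curly single/double quotation marks.
    .replace(/["'\u2018\u2019\u201C\u201D]/g, '');
}

var cleaned = cleanPastedText(
  '<p class=MsoNormal>\u201CHello\u201D <b>it\u2019s</b> "x"</p><!--[if gte mso 9]><xml></xml><![endif]-->'
);
// cleaned is 'Hello its x'
```

It can be called from the change or blur handler, for example $(this).val(cleanPastedText($(this).val())).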
You could check out Word HTML Cleaner by Connor McKay. It is a pretty strong cleaner, in that it removes a lot of stuff that you might want to keep, but if that's not a problem it looks pretty decent.
What about something like this:
function cleanHTML(pastedString) {
var cleanString = "";
var insideTag = false;
for (var i = 0, len = pastedString.length; i < len; i++) {
if (pastedString.charAt(i) == "<") insideTag = true;
if (pastedString.charAt(i) == ">") {
if (pastedString.charAt(i+1) != "<") {
insideTag = false;
i++;
}
}
if (!insideTag) cleanString += pastedString.charAt(i);
}
return cleanString;
}
Then just use the event listener to call this function and pass in the pasted string.
It might be useful to use the blur event which would be triggered less often:
$("textarea").blur(function() {
// check input ($(this).val()) for validity here
});
Adapted from the jQuery docs:
$("textarea").change( function() {
// check input ($(this).val()) for validity here
});
That's for detecting the changes. The cleaning itself would probably be a regex of sorts.
Edited the above to look for a textarea, not a textbox.
