puppeteer escaping double quotes in style attribute - javascript

I'm using Google's puppeteer to read HTML, make some changes to it, and save it to a new HTML file.
Almost everything is working properly, except puppeteer is escaping double-quote characters (") as &quot; inside the style attribute.
For example:
style='font-size:11.0pt;font-family:"Arial",sans-serif;
color:#D99594'
becomes:
style="font-size:11.0pt;font-family:&quot;Arial&quot;,sans-serif;
color:#D99594"
This is affecting not only the output HTML, but some of the processing I'm doing within Puppeteer.
I believe I've ruled out encoding as an issue. Any ideas or fixes?
Thanks!

Problem
Functions like page.content(), and similar functions that return HTML, give you the browser's current HTML serialization of the DOM. That serialization can differ from the HTML you originally supplied, so what you are seeing is expected behavior.
To name some examples:
Chrome will make <div/> into <div></div>.
Chrome will use double quotes for attributes: <div id='a'></div> becomes <div id="a"></div>
Chrome will make attributes lower case: <div ID="a"></div> becomes <div id="a"></div>
Chrome will try to fix your code: <div><span></div></span> becomes <div><span></span></div>
Try it yourself
To test it yourself, you can use the following code. It puts some markup into the DOM and then reads innerHTML back to show what the DOM actually looks like. Enter any markup you want to test in the first input:
const el = document.querySelector("#domTester");
const output = document.querySelector('#output');
function showResult() {
const outerElement = document.createElement('div');
outerElement.innerHTML = el.value;
output.value = outerElement.innerHTML;
}
el.addEventListener('input', showResult);
showResult();
<p>
What you give to the browser:<br />
<input id="domTester" type="text" value="<div id='a &quot; b'/>" style="width:100%" />
</p>
<p>
What the DOM will be rendered as:<br />
<input id="output" type="text" readonly="readonly" style="width:100%" />
</p>
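To illustrate why the quotes in the style attribute come back as &quot;: when the browser serializes an attribute it wraps the value in double quotes, so any double quote inside the value has to be escaped. The sketch below mimics that step in plain JavaScript. The helper names escapeAttributeValue and serializeAttribute are made up for this example (they are not browser or Puppeteer APIs), and the escaping rules are simplified.

```javascript
// Simplified sketch of attribute serialization. Both function names are
// hypothetical helpers, not part of any browser or Puppeteer API.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, '&amp;')        // ampersands first, so later entities are not re-escaped
    .replace(/\u00A0/g, '&nbsp;')  // non-breaking spaces
    .replace(/"/g, '&quot;');      // a raw double quote would end the attribute
}

function serializeAttribute(name, value) {
  // Serialized attributes are wrapped in double quotes,
  // whatever quoting the source HTML used.
  return `${name}="${escapeAttributeValue(value)}"`;
}

const style = 'font-size:11.0pt;font-family:"Arial",sans-serif;color:#D99594';
console.log(serializeAttribute('style', style));
// prints: style="font-size:11.0pt;font-family:&quot;Arial&quot;,sans-serif;color:#D99594"
```

The DOM itself still holds the unescaped value, so reading it with element.getAttribute('style') (for example inside page.$eval) should give you the plain double quotes back.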

Related

Enter a value in a "text/html" script via another script

Maybe this is a naive question but I wonder if there is a way to enter a value in a "text/html" script via another script, something like the following example which, by the way, does not work for me.
<script id="Block" type="text/html">
<input type="text" id="InputID"/>
</script>
<script>
document.getElementById('InputID').value = 'some text';
</script>
Setting the type of a <script> element to text/html is a hack for including some HTML source code in an HTML document without it being rendered so that you can later read the script element and do something with the HTML source code.
const raw_html = document.querySelector("script[type='text/html']").textContent;
This works because normal HTML rules do not apply to a script element, < has no special meaning there (in HTML 4 terms, the script element contains CDATA).
The modern equivalent is the <template> element.
You have no <input> element, so you can't find it in the DOM and you can't set a value to it.
If you want to do that, you first need to add it to the DOM.
const raw_html = document.querySelector("script[type='text/html']").textContent;
const container = document.querySelector("#container");
container.innerHTML = raw_html;
document.getElementById('InputID').value = 'some text';
<script id="Block" type="text/html">
<input type="text" id="InputID"/>
</script>
<div id="container"></div>

Markup and Execution of HTML, CSS and JavaScript from separate textareas

I found a code for real time markup for HTML and CSS in different textareas using a jQuery function that outputs in an iframe:
HTML
<div class="container grid">
<form>
<h3>HTML</h3>
<textarea id="html" class="edit"></textarea> <!-- TEXTAREA FOR HTML -->
<h3>CSS</h3>
<textarea id="css" class="edit"></textarea> <!-- TEXTAREA FOR CSS -->
</form>
</div>
<div class="output grid">
<iframe></iframe>
</div>
JQUERY for markup
(function() {
$(".grid").height($(window).height());
var contents = $("iframe").contents(),
body = contents.find("body"),
styleTag = $("<style></style>").appendTo(contents.find("head"));
$("textarea.edit").keyup(function() {
var $this = $(this);
if ($this.attr("id") === "html") {
body.html($this.val());
} else {
// it had to be css
styleTag.text($this.val());
}
});
})();
What if I wanted another textarea for JavaScript? I'm guessing you can't execute it in real time, so I would have to include a button to run an eval().
But how?
In your HTML you can have another textarea for the javascript code.
In your textarea.edit event handler, you can add another check for javascript and use something like this -
$iframeEl.append(`<script>${textarea.val()}</script>`);
This line would also execute your script, because the browser re-parses the document whenever it encounters a DOM change, and it runs the newly inserted script tag as part of that.
In any case, we should avoid eval: it is an expensive operation and its use is generally discouraged.
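As an alternative to appending a script tag on every change, a "Run" button could rebuild the whole iframe document from the three textareas. This is a different technique from the one described above, sketched under assumptions: buildPreviewDocument is a made-up helper name, the #js textarea id is hypothetical, and the result is meant to be assigned to the iframe's srcdoc attribute.

```javascript
// Hypothetical helper: combine the three textarea values into one HTML
// document that can be assigned to the iframe's srcdoc attribute.
function buildPreviewDocument(html, css, js) {
  // A literal "</script" in the user's code would end the script element
  // early. Inside string literals, "<\/script" means the same to JavaScript.
  const safeJs = js.replace(/<\/(script)/gi, '<\\/$1');
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><style>${css}</style></head>`,
    `<body>${html}<script>${safeJs}<\/script></body>`,
    '</html>',
  ].join('\n');
}

const preview = buildPreviewDocument(
  '<p id="msg"></p>',
  'p { color: red; }',
  'document.getElementById("msg").textContent = "hi";'
);
console.log(preview);

// In the page, a click handler could then do something like this
// ("#js" is an assumed id for the new JavaScript textarea):
// $('iframe').attr('srcdoc',
//   buildPreviewDocument($('#html').val(), $('#css').val(), $('#js').val()));
```

Rebuilding the document on each run also means that scripts from earlier runs do not pile up inside the iframe.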

How to copy newline characters from HTML div to textarea in jQuery?

Consider the following HTML page fragment:
<div id='myDiv'>
Line 1.<br />
Line 2<br />
These are &ltspecial> characters & must be escaped !##><>
</div>
<input type='button' value='click' id='myButton' />
<textarea id='myTextArea'></textarea>
<script>
$(document).ready(function () {
$('#myButton').click(function () {
var text = $('#myDiv').text();
$('#myTextArea').val(text);
});
});
</script>
First, there is a div element with id myDiv. It contains some text similar to what might be retrieved from a SQL database at runtime in my production web site.
Next, there is a button and a textarea. I want the text in myDiv to appear in the textarea when the button is clicked.
However, using the code I provided, the line-breaks are stripped out. What can I do about this, taking into consideration that escaping special characters is absolutely non-negotiable?
Your code works great for me in both Firefox and Chrome: http://jsfiddle.net/jYjRc/
However, if you have a client that doesn't do what you want, replace <br>s with newline characters.
Edit: Tested in IE7 and the code breaks. So I updated the fiddle with my suggestion: http://jsfiddle.net/jYjRc/1/
Do your HTML like so:
<div id='myDiv'><pre>
Line 1.
Line 2
These are &ltspecial> characters & must be escaped !##><>
</pre></div>
And now .text() will return the text exactly as you specify it in the <pre> tag, even in IE.
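If you would rather keep the <br> markup, the replacement suggested earlier can be sketched as follows. brToNewline is a made-up helper name, and this is only one way to write the pattern:

```javascript
// Hypothetical helper: turn <br>, <br/> and <br /> (any letter case)
// into newline characters.
function brToNewline(html) {
  return html.replace(/<br\s*\/?>/gi, '\n');
}

console.log(JSON.stringify(brToNewline('Line 1.<br />Line 2<BR>Line 3')));
// prints: "Line 1.\nLine 2\nLine 3"

// In the page, the result is still HTML (entities stay encoded), so one
// option is to pass it through a temporary element before calling .val():
// var text = $('<div>').html(brToNewline($('#myDiv').html())).text();
// $('#myTextArea').val(text);
```

I have not tested the temporary-element step in old IE, so treat it as a starting point rather than a guaranteed fix.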

Javascript InnerHTML in IE7 messing with INPUT tags

When I receive the innerHTML of an element containing an input[type=text] element the speech marks around the value and id are removed in IE7 i.e.
<input type="text" id="test" value="test" />
Becomes:
<input type="text" id=test value=test />
This would be fine, other than the fact that I am using a JQuery plugin that takes a html segment and binds JSON to it. This means if I have a template:
<div id="template"><input type="text" value="${ValueToBind}" /></div>
When I retrieve this via document.getElementById("template").innerHTML I get:
<input type="text" value=${ValueToBind} />
Thus, if I am binding a string with whitespace i.e. "this is a test" the output is:
<input type="text" value=this is a test />
Obviously, this is invalid HTML and it is causing havoc with my app. What I really need to do is retrieve the HTML in the template AS IS, and not have IE try to do anything helpful like remove the " speech marks.
Cheers, Chris.
This is answered here: innerHTML removes attribute quotes in Internet Explorer
If the problem is IE7 specific a quick fix may be to add an IE conditional comment for IE7 with code that re-inserts the quote marks.
I believe this isn't something you can get around directly as the quotemark-less html is just how IE7 represents the DOM node internally.
My view on the best way to ensure you get the exactly right template is to read each attribute of each node yourself rather than the inner html and then write them out with the quote marks.
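A minimal sketch of that idea follows. serializeNode and serializeChildren are made-up names, and the code only relies on tagName, attributes, childNodes, nodeType and nodeValue, so the demo can run on plain objects standing in for DOM nodes. The attr.specified check is an assumption based on old IE reportedly listing attributes that were never set.

```javascript
// Elements that are written without a closing tag (partial list).
var VOID_TAGS = { br: true, hr: true, img: true, input: true, link: true, meta: true };

function escapeText(text) {
  return String(text).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Serialize all children of a node (a stand-in for reading innerHTML).
function serializeChildren(node) {
  var html = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    html += serializeNode(node.childNodes[i]);
  }
  return html;
}

// Serialize one node, always writing attribute values in double quotes.
function serializeNode(node) {
  if (node.nodeType === 3) return escapeText(node.nodeValue); // text node
  if (node.nodeType !== 1) return '';                         // skip comments etc.
  var tag = node.tagName.toLowerCase();
  var html = '<' + tag;
  for (var i = 0; i < node.attributes.length; i++) {
    var attr = node.attributes[i];
    if (attr.specified === false) continue; // assumption: skip attributes old IE adds itself
    html += ' ' + attr.name + '="' + escapeAttr(attr.value) + '"';
  }
  if (VOID_TAGS.hasOwnProperty(tag)) return html + ' />';
  return html + '>' + serializeChildren(node) + '</' + tag + '>';
}

// Demo with plain objects in place of the #template element.
var fakeInput = {
  nodeType: 1,
  tagName: 'INPUT',
  attributes: [{ name: 'type', value: 'text' }, { name: 'value', value: '${ValueToBind}' }],
  childNodes: []
};
var fakeTemplate = {
  nodeType: 1,
  tagName: 'DIV',
  attributes: [{ name: 'id', value: 'template' }],
  childNodes: [fakeInput]
};
console.log(serializeChildren(fakeTemplate));
// prints: <input type="text" value="${ValueToBind}" />
```

In the page you would call serializeChildren(document.getElementById('template')) instead of reading innerHTML.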
See "innerHTML removes attribute quotes in Internet Explorer" for other ideas.
Using jQuery's .html() (http://docs.jquery.com/Attributes/html) would also generally be the "jQuery way" to do this, rather than .getElementById.

how to remove tags with JavaScript regular expressions

I have a JavaScript string containing HTML like this:
<div>
<div class="a">
content1
</div>
content 2
<div class="a">
<b>content 3</b>
</div>
</div>
and I want to remove the div's of class="a" but leave their content.
In Python I would use something like:
re.compile('<div class="a">(.*?)</div>', re.DOTALL).sub(r'\1', html)
What is the equivalent using Javascript regular expressions?
Why don't you use proper DOM methods? With a little help from jQuery, that's dead simple:
var contents = $('<div><div class="a">content1</div>content 2<div class="a"><b>content 3</b></div></div>');
contents.find('.a').each(function() {
$(this).replaceWith($(this).html());
});
You can achieve it with regular expressions in JavaScript
var html = '<div> <div class="a"> content1 </div> <div class="a"> content1 </div> ... </div>';
var result = html.replace(/<div class="a">(.*?)<\/div>/g, function(a,s){return s;});
alert(result);
The String method replace takes two parameters: the first is the regular expression and the second is the replacement. Since there is not one but an unknown number of replacements, a function can be used as the replacement.
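One caveat: in JavaScript the dot does not match newlines by default, so the pattern above only works on a single-line string. A sketch closer to the Python version with re.DOTALL, run on the multi-line markup from the question, could use [\s\S] instead of the dot and '$1' in place of r'\1':

```javascript
const html = [
  '<div>',
  '<div class="a">',
  'content1',
  '</div>',
  'content 2',
  '<div class="a">',
  '<b>content 3</b>',
  '</div>',
  '</div>',
].join('\n');

// [\s\S] matches any character including newlines, which stands in for
// Python's re.DOTALL. '$1' inserts the first capture group.
const result = html.replace(/<div class="a">([\s\S]*?)<\/div>/g, '$1');
console.log(result);
```

In engines that support the ES2018 s flag, /<div class="a">(.*?)<\/div>/gs should behave the same way. Like the Python original, this breaks as soon as a div of class "a" contains another div, which is why the DOM-based answers are the safer choice.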
If you want to do this in Javascript, I'm presuming that you are running it in a web browser, and that the 'javascript string' that you refer to was extracted from the DOM in some way.
If both of these cases are true, then I'd say that it would be a good idea to use a tried and tested JavaScript library, such as jQuery (there are others out there, but I don't use them, so I can't really comment).
JQuery allows you to do on-the-fly DOM manipulations like you describe, with relative ease...
$('div.a').each(function(){$(this).replaceWith($(this).html());});
jQuery is definitely one of those tools that pays dividends: a fairly short learning curve and a whole lot of power.
