Javascript DOM nodes with special characters in the name - javascript

UPDATE: OK, this turns out to be an encoding/decoding problem. In the actual SOURCE HTML, the ID is rendered as follows:
<input type="text" id="addInput0436%2E20_0" name="quantity" value="1" />
And the Javascript to reference it is rendered as follows:
javascript:app.catalog.product.updateAvailability('availabilityContainer0436%2E20_0','0436%2E20_0','PFC','0436.20', dojo.attr(dojox.html.entities.decode('addInput0436%2E20_0'), 'value'));
However, by the time the Javascript call gets to the dojo.attr() routine, the encoded value of addInput0436%2E20_0 has helpfully been decoded to addInput0436.20_0 which of course no longer matches.
So I need to either force the JS to not decode this string, or force HTML/Tomcat to not encode the HTML ID. Ugh.
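The mismatch can be reproduced without a DOM. This is a minimal sketch that assumes the ID passes through something equivalent to decodeURIComponent() on its way to dojo.attr(). One likely culprit is that the call sits in a javascript: URL, which browsers percent-decode before running the script. That is an assumption; the thread does not confirm it.

```javascript
// The ID exactly as it appears in the page source.
const renderedId = "addInput0436%2E20_0";

// Assumption: the string is percent-decoded before dojo.attr() sees it,
// modeled here with decodeURIComponent().
const decodedId = decodeURIComponent(renderedId);

console.log(decodedId);                // addInput0436.20_0
console.log(decodedId === renderedId); // false: the lookup cannot match
```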
ORIGINAL POST:
I'm trying to identify a problem with the following line of code:
document.getElementById('addInput0436.20_0')
This should return the DOM node with that ID (which does exist in the document) but instead returns null.
I suspect that it is the special character in the node name, but I'm not sure how to fix it. Anyone run into this before?

I'm posting this answer in case it helps someone else:
We finally resolved this by re-encoding the (previously decoded) tag ID inside the Javascript routine that needed it. We were not able to find any other way around this behavior.
Bottom line: Don't use URL-encoded strings as tag ID names. Doing so seems to be asking for trouble.
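Here is a sketch of the re-encoding workaround. The thread does not show the actual code, so reencodeId() below is a hypothetical helper that assumes '.' is the only character the server encoded. The sketch also shows why the built-in encodeURIComponent() does not help: '.' is one of the characters it never encodes.

```javascript
// Hypothetical helper: assumes "." is the only character the server
// had percent-encoded in the ID, so it maps "." back to "%2E".
function reencodeId(decodedId) {
  return decodedId.replace(/\./g, "%2E");
}

const decodedId = "addInput0436.20_0";

// encodeURIComponent() leaves ".", "_" and digits untouched,
// so it cannot restore the ID that was rendered into the page.
console.log(encodeURIComponent(decodedId)); // addInput0436.20_0
console.log(reencodeId(decodedId));         // addInput0436%2E20_0
```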

Related

Hyperlink href incorrectly quoted in innerHTML?

Take this very simple example HTML:
<html>
<body>This is okay &amp; fine, but the encoding of <a href="http://example.com?a=1&b=2">this link</a> seems wrong.</body>
</html>
On examining document.body.innerHTML (e.g. in the browser's JS console, in JS itself, etc.), this is the value I see:
This is okay &amp; fine, but the encoding of <a href="http://example.com?a=1&amp;b=2">this link</a> seems wrong.
This behaviour is the same across browsers, but I can't understand it; it seems wrong.
Specifically, the link in the original document points to http://example.com?a=1&b=2, whereas if the value of innerHTML is treated as HTML then it links to http://example.com?a=1&amp;b=2, which is NOT the same. For example, if I created a new document that actually had innerHTML as its inner HTML and I clicked on the link, the browser would be sent to a materially different URL, as far as I can see.
(EDIT #3: I'm wrong about the above. Firstly, yes, those two URLs are different; but secondly, the innerHTML which I thought was wrong is right, and it correctly represents the first URL, not the second! See the end of my own answer below.)
This is different from the issue discussed in the question innerHTML gives me &amp; as & !. In my case (which is the opposite of the case in that question) the original HTML is correct, and it looks to me as if it is the innerHTML which is wrong (i.e. because it is HTML which does not represent what the original HTML represented).
(EDIT #2: I was wrong about this, too: it's not really different. But I think it is not widely known that &amp; is the correct way to represent & inside an href, not just within body text. Once you realise that, you can see that these are really the same issue.)
Can anyone explain this?
(EDIT #1+4: This only occurred to me a bit late, after writing my original question, but: "is &amp; actually correct within the href text, and & technically incorrect?" As I said when I first wrote those words, that "seems very unlikely! I've certainly never seen HTML written that way." But however 'unlikely', that is the case, and it is the main part of what I wasn't understanding!)
Also related, and it would be useful: can anyone explain how to cleanly get HTML which does correctly represent the targets of document links? You definitely can't just un-encode all HTML character references within innerHTML. As shown in the example I've used, and also as discussed in innerHTML gives me &amp; as & !, the ones in the main run of text should stay encoded, and un-encoding everything would make them wrong.
I originally thought this was not a duplicate of innerHTML gives me &amp; as & ! (as discussed above; and in a way it still isn't, if it's agreed that it's not as obvious or widely known that the same issues apply inside href as in body text). It's still definitely not a duplicate of A href in innerHTML (which somewhat unclearly asks about how to set innerHTML using JS).
Most browser tools don't show the actual HTML because it wouldn't be of much help:
HTML is often generated dynamically after page load with the help of CSS and JavaScript.
HTML is often broken, and the browser needs to repair it in order to build the in-memory representation needed for rendering and other stuff.
So the HTML you see is not the actual source. It is generated on the fly from the current state of the document, which of course includes all the fixes applied (in your case, to the invalid HTML entities).
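This makes the point about regenerated HTML concrete. The &amp; you see comes from the serializer that builds innerHTML from the DOM. Below is a simplified sketch of the HTML spec's escaping rule for attribute values, not the browser's actual code. Recent revisions of the spec also escape < and > in attributes, and this sketch leaves that out.

```javascript
// Simplified model of how the HTML serializer writes an attribute value.
// "&" is replaced first so the "&" introduced by the later replacements
// is not escaped a second time.
function serializeAttributeValue(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/"/g, "&quot;");
}

// The href as stored in the DOM after the parser has read the source.
const href = "http://example.com?a=1&b=2";

console.log(`<a href="${serializeAttributeValue(href)}">link</a>`);
// <a href="http://example.com?a=1&amp;b=2">link</a>
```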
The following example hopefully illustrates all the combinations:
const section = document.querySelector("section");
const invalid = document.createElement("p");
invalid.innerHTML = '<a href="http://example.com/?a=1&b=2">Invalid HTML (dynamic)</a>';
const valid = document.createElement("p");
valid.innerHTML = '<a href="http://example.com/?a=1&amp;b=2">Valid HTML (dynamic)</a>';
section.appendChild(valid);
section.appendChild(invalid);
// Serialized markup: every version comes back with &amp; in the href.
const paragraphs = document.querySelectorAll("p");
for (const p of paragraphs) {
console.log(p.innerHTML);
}
// Attribute values: every link reports a plain & in its href.
const links = document.querySelectorAll("a");
for (const a of links) {
console.log(a.getAttribute("href"));
}
<section>
<p><a href="http://example.com/?a=1&b=2">Invalid HTML (static)</a></p>
<p><a href="http://example.com/?a=1&amp;b=2">Valid HTML (static)</a></p>
</section>
Is &amp; actually correct within the href text, and & technically incorrect? It seems very unlikely! I've certainly never seen HTML written that way.
There's no such thing as "technically correct", especially today, when HTML is pretty well standardised. (Well, yes, there are two competing standards bodies and the specs are continuously evolving, but the basics were set up long ago.)
The & symbol starts a character entity and &b is an invalid character entity. Period.
But it works! Doesn't that mean it's technically correct?
It works because browsers are explicitly designed to deal with completely broken markup, what's known as tag soup, because it was thought that it would ease usage:
<p><strong>Hello, World!</u>
<body><br itspartytime="yeah">
<pink>It works!!!</red>
But HTML entities are just an encoding artefact. That doesn't mean that URLs are not allowed to contain literal ampersands. It just means that, in an HTML context, they need to be represented as &amp;. It's the same as when you type a backslash in a JavaScript string to escape a quote: the backslash does not become part of your data.
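The reverse direction, from markup to data, can be sketched the same way. decodeAttributeValue() is a hypothetical, deliberately tiny decoder that only knows the four references used in this thread. A real HTML parser recognises over 2,000 named references plus numeric ones, and it has extra rules for ampersands that don't start a valid reference.

```javascript
// Hypothetical mini-decoder: only the four references used in this thread.
const NAMED = { amp: "&", quot: '"', lt: "<", gt: ">" };

function decodeAttributeValue(raw) {
  return raw.replace(/&(amp|quot|lt|gt);/g, (_, name) => NAMED[name]);
}

// In the HTML source the URL is written with &amp; ...
console.log(decodeAttributeValue("http://example.com/?a=1&amp;b=2"));
// ... and the data you get back has a plain "&": http://example.com/?a=1&b=2

// Decoding is a single pass, so &amp;amp; yields the literal text "&amp;".
console.log(decodeAttributeValue("&amp;amp;")); // &amp;

// Same idea in JavaScript: the backslash escapes the quote but is not stored.
const quoted = "say \"hi\"";
console.log(quoted.length); // 8
```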
Having thought up a possible (but I thought 'unlikely') explanation - which I put in as an edit in the original question - I've realised that it is the answer:
Using & to represent & inside an href is technically incorrect, and &amp; is technically correct.
I gathered this initially from this SO answer https://stackoverflow.com/a/16168585/795690. I think it is relevant that, as that answer also says, the idea that &amp; is the correct way to represent & in an href is not as widely understood as the idea that &amp; is the correct way to represent & in body text.
Once you do understand this, it makes sense that what the browser is doing is right, and that the innerHTML value which comes back represents the link correctly.
EDIT:
Álvaro González gives a much longer answer, and it took me a while to see how everything he says applies. So I thought I'd try to explain what I didn't understand, starting from where I started, in case it helps someone else!
If you start with raw HTML with <a href="http://example.com/?a=1&b=1"> and then you inspect the DOM in the browser, or look at the value of the href attribute in JS then you see "http://example.com/?a=1&b=1" everywhere. So it looks as if nothing has changed, and nothing was wrong. What I didn't understand is that actually the browser has parsed a technically incorrect href (with invalid entities) to be able to display this to you! (Yes, LOTS of people use this 'broken' format!)
To see this first hand, load this longer HTML example into your browser:
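<html>
<body style="font-family: sans-serif">
<p>Now & then <a href="http://example.com/?a=1&b=2">http://example.com/?a=1&b=2</a></p>
<p>Now &amp; then <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a></p>
<p>Now &amp;amp; then <a href="http://example.com/?a=1&amp;amp;b=2">http://example.com/?a=1&amp;amp;b=2</a></p>
</body>
</html>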
Then, in your JavaScript console, try running this code taken from Álvaro González's answer:
const paragraphs = document.querySelectorAll("p");
for (const p of paragraphs) {
console.log(p.innerHTML);
}
const links = document.querySelectorAll("a");
for (const a of links) {
console.log(a.getAttribute("href"));
}
Also try clicking on the links to see where they go.
Once you've made sense of everything that you see there, it is no longer surprising how innerHTML works!

Why is the browser changing my single quotes to double and ruining my JSON in the data attr?

This should be way easier than it is, but it's got me stuck.
I'm putting some JSON in an input data attribute, and the quotes on the first key are closing the attribute.
Here's what I'm trying to do:
var html = `<input type="checkbox" data-values='${dataVals}' />`;
Where dataVals is a JSON string like this
'{"checked":true,"unchecked":false}'
But it's showing up in the browser like this:
<input type="checkbox" data-values="{"checked":true,"unchecked":false}">
And the browser is reading it essentially as though it's this.
data-values="{"
Which obviously isn't what I want.
I'm clearly missing something. Any thoughts?
A combination of T.J. Crowder's and Teemu's suggestions solved it.
I added a replace at the end of the JSON string to replace double quotes with &quot;:
JSON.stringify({ ... }).replace(/"/g, "&quot;")
Then I also stopped calling JSON.parse() when reading the value later, since $(this).data('values') already returns a JavaScript object (when it can):
Before: JSON.parse($(this).data('values'))
After: $(this).data('values')
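Here is a sketch of the same fix with one adjustment that is not part of the answer above: escape & before the double quotes. That way any & already in the data, or text like &quot; inside a string, survives the browser's decoding unchanged. The browser decodes the references while parsing the attribute, so $(this).data('values') or element.dataset.values sees the plain JSON again. The extra note field is hypothetical test data.

```javascript
// Escape a JSON string for use inside a double-quoted HTML attribute.
// "&" goes first so the "&" in each inserted &quot; is not escaped again.
function escapeForAttribute(text) {
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical data; the question used {"checked":true,"unchecked":false}.
const dataVals = JSON.stringify({ checked: true, unchecked: false, note: "Tom & Jerry" });
const html = `<input type="checkbox" data-values="${escapeForAttribute(dataVals)}" />`;

console.log(html);
// <input type="checkbox" data-values="{&quot;checked&quot;:true,&quot;unchecked&quot;:false,&quot;note&quot;:&quot;Tom &amp; Jerry&quot;}" />
```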

Javascript regex to replace ampersand in all links href on a page

I've been going through and trying to find an answer to this question that fits my need, but either I'm too noob to make other use cases work, or they're not specific enough for my case.
Basically I want to use javascript/jQuery to replace any and all ampersands (&) on a web page that may occur in a links href with just the word "and". I've tried a couple different versions of this with no luck
var link = $("a").attr('href');
link.replace(/&/g, "and");
Thank you
Your current code only reads the href of the first matched element, and replace() returns a new string that is then discarded, so nothing in the DOM is updated.
You can instead achieve what you need by providing a function to attr() which will be executed against all elements in the matched set. Try this:
$("a").attr('href', function(i, value) {
return value.replace(/&/g, "and");
});
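The sketch below shows why the question's code appeared to do nothing, without using jQuery. Strings in JavaScript are immutable, so replace() returns a new string rather than changing the one it was called on. The callback form of attr() works because jQuery writes each callback's return value back to the corresponding element. Here the same callback is applied to a plain array of hypothetical hrefs instead of DOM elements.

```javascript
// Strings are immutable: the result of replace() must be used or stored.
const href = "page.html?a=1&b=2"; // hypothetical href
href.replace(/&/g, "and");        // return value discarded
console.log(href);                // page.html?a=1&b=2

// The same transformation attr()'s callback applies to each link.
const replaceAmpersands = (value) => value.replace(/&/g, "and");
const hrefs = ["page.html?a=1&b=2", "other.html?x=1&y=2&z=3"];
console.log(hrefs.map(replaceAmpersands).join(", "));
// page.html?a=1andb=2, other.html?x=1andy=2andz=3
```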
Sometimes when replacing &, I've found that even though I replaced &, I still have amp;. There is a fix to this:
var newUrl = "@Model.UrlToRedirect".replace(/&/gi, '%').replace(/%amp;/gi, '&');
With this solution you replace & twice and it works. In my particular case, in an MVC app (window.location.href = @Model.UrlToRedirect), the URL was already partially encoded and had a query string. Before coming up with this solution, I tried encoding/decoding, using the C# Uri class, escape(), everything. The problem with this logic is that other things could break the query string later. One solution is to put a hidden field or input on the form like this:
<input type="hidden" value="@Model.UrlToRedirect" id="url-redirect" />
then in your JavaScript:
window.location.href = document.getElementById("url-redirect").value;
This way, JavaScript won't take the C# string and change it.
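The two chained replace() calls can be checked in isolation, outside Razor. The URLs below are hypothetical. The sketch shows both halves of the answer: a double-encoded &amp; is repaired, but a plain & is turned into %, which is the kind of breakage mentioned above.

```javascript
// The chained replacements from the answer, applied to plain strings.
const fix = (url) => url.replace(/&/gi, "%").replace(/%amp;/gi, "&");

// "&amp;" -> "%amp;" -> "&": the double-encoded ampersand is repaired.
console.log(fix("/Orders?id=1&amp;page=2")); // /Orders?id=1&page=2

// A plain "&" becomes "%" and is never turned back.
console.log(fix("/Orders?id=1&page=2"));     // /Orders?id=1%page=2
```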

Will jQuery escape my quotes properly?

I'm using $.cookie() to pull all the values from a cookie, in JSON format:
var props = $.cookie('params');
props returns:
{"distinct_id": "13f97d6600b42e-000e6293c-6b1b2e75-232800-13f97d6600cc82","utm_source": "the_source","utm_term": "myterms","utm_campaign": "campaign","utm_medium": "medium","utm_content": "content"}
I'm inserting this dynamically into a form with jQuery, and I want to ensure everything is going to POST properly, even though there are all kinds of crazy characters in there that would normally conflict with HTML (fully qualified URLs, &, ", ', maybe even > or <).
I also need to make certain it works in IE6, IE7, etc.
var input = $('<input type="hidden" name="CustomField1">');
input.appendTo($('form[data-params=track]')).val(props);
It would *appear* to be working, but I want to make 100% sure I'm doing this right as it's quite important there are no bugs for this step.
I am pretty sure val() does not need any additional escaping as you are not actually editing raw HTML. val() sets DOM value on an element.
Generally, setting attributes or properties through the DOM/jQuery should be fine. Those will be auto-escaped when serialized to innerHTML. But if you submit a page, the browser does not even have to render anything; it can just copy values directly from the DOM into the request.
Since you are setting the value of an input field, it should work fine. There is no need to escape or process any characters in the input's value.
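This gives a rough way to see why no manual escaping is needed. With the default enctype, a submitted form sends its fields as application/x-www-form-urlencoded, which is essentially what URLSearchParams produces. The value below is a shortened, hypothetical stand-in for the cookie JSON in the question, with a few awkward characters added.

```javascript
// Shortened, hypothetical version of the cookie JSON, with &, <, >, " and '.
const props = '{"utm_source":"the_source","url":"http://a.example/?x=1&y=<2>","q":"it\'s"}';

// Roughly what the browser sends for a field named CustomField1.
const body = new URLSearchParams({ CustomField1: props }).toString();
console.log(body);

// Decoding the body gives back exactly the original string: the value
// never passed through HTML markup, so no HTML escaping was involved.
console.log(new URLSearchParams(body).get("CustomField1") === props); // true
```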
You can use the JavaScript escape() function to make sure the string gets escaped properly for display in the browser.

Javascript Html Swedish characters strange behavior

I would like to ask about some strange behavior I see when using escaped character codes for some Swedish characters.
More specifically, to support a multilingual site, I have a JSON file where I have specified all required messages in Swedish, e.g. 'Avancerad sök', with the 'ö' written as an escape sequence.
Then, when the page loads for the first time, I set this value on a text input and it is displayed properly: 'Avancerad sök'. But when I click a button and set the value of this input again, I get the raw escape sequence instead of the 'ö'.
Has anyone faced a similar problem?
Thanks a lot!
Code:
q('#keyword').val(qLanguage.qAdvancedHint);
I execute this code both times. qLanguage is an object which I fill it from the json file and qAdvancedHint a specific key.
Don't know have the specific encoding is called. But tested with js's unescape method, but didn't work.
However a solution, a bad/ugly one, could be to ask jQuery to parse it for you then add it as a value property:
var text = $("<span/>").html(qLanguage.qAdvancedHint).text();
q('#keyword').val(text);
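Here are two jQuery-free alternatives. They assume the JSON file stores 'ö' as a numeric HTML character reference, since the question doesn't show the exact form. decodeNumericReferences() is a hypothetical helper and does not handle named references such as &ouml;. The second option avoids the decoding step altogether by putting the character itself, or a JSON \u escape, in the file.

```javascript
// Option 1 (hypothetical helper): decode decimal (&#246;) and
// hexadecimal (&#xF6;) numeric character references.
function decodeNumericReferences(text) {
  return text.replace(/&#(x[0-9a-f]+|\d+);/gi, (_, code) =>
    String.fromCodePoint(
      code[0] === "x" || code[0] === "X"
        ? parseInt(code.slice(1), 16)
        : parseInt(code, 10)
    )
  );
}
console.log(decodeNumericReferences("Avancerad s&#246;k")); // Avancerad sök
console.log(decodeNumericReferences("Avancerad s&#xF6;k")); // Avancerad sök

// Option 2: no HTML references in the JSON at all. JSON supports
// non-ASCII text directly or as a \u escape, which JSON.parse() decodes.
const messages = JSON.parse('{"qAdvancedHint": "Avancerad s\\u00f6k"}');
console.log(messages.qAdvancedHint); // Avancerad sök
```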
