How to get string representation of HTML element - javascript

Is there an easy way to get the string representation of a node without its content? element.outerHTML returns the string representation of the element AND its content. I'd like to get something like element.outerHTML without element.innerHTML.
I want to write a fallback for dirxml, for consoles that don't support it, using console.group and console.groupEnd.

element.cloneNode(false).outerHTML
.cloneNode(false) creates a duplicate node with identical attributes, but not the child nodes, and .outerHTML returns its serialization that matches the original node excluding the node's content.
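Since the goal is a dirxml fallback built on console.group and console.groupEnd, the shallow-clone trick can be applied recursively. This is only a minimal sketch: the helper name dirxmlFallback and its optional out parameter are hypothetical (not part of any console API), and it assumes a console that implements group, groupEnd and log.

```javascript
// Hypothetical helper: prints a DOM subtree as nested console groups.
// `out` defaults to the global console; it is a parameter only so that
// another logger with group/groupEnd/log can be swapped in.
function dirxmlFallback(node, out) {
  out = out || console;
  if (node.nodeType !== 1) {
    // Not an element (text, comment, ...): log its data unless it is blank.
    var data = node.nodeValue;
    if (data && data.trim()) {
      out.log(data.trim());
    }
    return;
  }
  // A shallow clone keeps the attributes but drops the children,
  // so its outerHTML is the element's own markup only.
  out.group(node.cloneNode(false).outerHTML);
  for (var i = 0; i < node.childNodes.length; i++) {
    dirxmlFallback(node.childNodes[i], out);
  }
  out.groupEnd();
}
```

In a browser it could be called as dirxmlFallback(document.body). For non-void elements the group label includes the closing tag as well, because that is what outerHTML produces for the empty clone.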

Related

Click on "Confirm" button by getting the Button Text inside a span [duplicate]

I have HTML like this:
<div>
<span>TV</span>
</div>
I want to find this span by its text 'TV' through the document object, the way getElementById works, with something like getElementByText. I know that it's possible through XPath/jQuery/regex.
But I need to do it through the DOM object model only, as only the DOM is available in my context.
I have seen a couple of answers:
Finding an html element ID based on a text displayed
jquery - find element that only has text and not any other html tag
how to find element after some text with javascript?
But these are not helpful to me, as I need to do it through the DOM only.
Assuming the document is well-formed enough to parse into a proper DOM object tree, you can iterate through the entire structure without using an external library. Depending on the structure, you may have to examine every node to find all matches, and this may be slow. If you have access to IDs of any sort, you may be able to reduce search scope and improve performance.
The key property you will need is the childNodes collection on every DOM node. Starting with the BODY (or some other container), you can recurse through all the child nodes.
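The recursion described above can be sketched as follows. The function name findElementsByText is hypothetical, and the sketch assumes an exact match on the trimmed data of a direct text child; it uses only childNodes, nodeType and nodeValue.

```javascript
// Hypothetical helper: returns every element under `root` that has a
// direct text child whose trimmed content equals `text`.
function findElementsByText(root, text) {
  var matches = [];
  (function walk(node) {
    var children = node.childNodes;
    for (var i = 0; i < children.length; i++) {
      var child = children[i];
      if (child.nodeType === 3) {
        // Text node: compare its data, ignoring surrounding whitespace.
        if (child.nodeValue.trim() === text && matches.indexOf(node) === -1) {
          matches.push(node);
        }
      } else if (child.nodeType === 1) {
        // Element node: descend into it.
        walk(child);
      }
    }
  })(root);
  return matches;
}
```

In a page, findElementsByText(document.body, "TV") would return an array containing the span from the question. Passing a smaller container as root narrows the search, which is the performance point made above.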
This site is pretty basic but shows dependency-free methods for accessing DOM elements. See the section called "Tools to Navigate to a Certain Element".
I noticed that you mentioned regular expressions as a means to find elements. Regexes are poor for parsing entire documents, but they can be very useful for evaluating the textual content of a single node (e.g. partial matches, pattern matches, case insensitivity). Regular expressions are part of the JavaScript language itself and have been for well over a decade.
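As a small illustration of using a regex on a single node's text rather than on the whole document, here is a sketch. The name textMatches is hypothetical, and it assumes the node exposes textContent.

```javascript
// Hypothetical helper: tests one node's text against a pattern.
// Avoid the g and y flags on `pattern`: they make test() stateful
// through lastIndex, so repeated calls could give different answers.
function textMatches(node, pattern) {
  return pattern.test(node.textContent || "");
}
```

For example, textMatches(span, /^\s*tv\s*$/i) accepts "TV" regardless of case and surrounding whitespace, and textMatches(span, /tele/i) is a partial match.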
The only thing I can think of is something like this:
// Returns the first <span> whose text is exactly `text`, or null if none matches.
function getElementByTextContent(text)
{
    var spanList = document.getElementsByTagName("span");
    for (var i = 0, len = spanList.length; i < len; i++)
    {
        // IE8 and older lack textContent; use innerText (or innerHTML for plain-text spans) there
        if (spanList[i].textContent === text)
        {
            return spanList[i];
        }
    }
    return null;
}
Of course, it assumes you are only searching for <span> elements, but this might work for you. Here's a demo as well:
http://jsfiddle.net/uATdG/

Give the difference between input.value and input.textContent. Why is one used instead of the other?

Why is input.value used instead of input.textContent? What is the difference between the two?
For example, if I want to retrieve content from an input box
<input type="number">
I have to use this code
var input = document.querySelector("input");
input.value
instead of this one
input.textContent
Just want to get a clearer understanding of each.
From MDN:
[...] textContent returns the concatenation of the textContent of every child node, excluding comments and processing instructions. This is an empty string if the node has no children.
Essentially, textContent gives you a textual representation of what a node contains. Think of it as being everything between the opening and closing tags, e.g.
console.log(document.querySelector('span').textContent);
<span> this text </span> but not this one
<input> elements however cannot have children (content model: nothing). The value that is associated with them can only be accessed via the value property.
Form controls such as input (and likewise textarea and select) have a "value". It represents the data supplied by the user or set initially by the code. The textContent property, by contrast, sets or returns the text content of the specified node and all its descendants.
textContent returns the concatenation of the textContent of every child node, excluding comments and processing instructions; if the node has no children, textContent is an empty string.

Does DOMParser always return a normalized document? (node.normalize)

Does DOMParser always return a normalized document?
Spec says:
The parseFromString(str, type) method must run these steps, depending on type: "text/html": Parse str with an HTML parser, and return the newly created Document. (...)
I could not find any information on whether the Document is normalized by default or not.
To clarify, by saying "normalized" I mean the form after an execution of a normalize method:
node.normalize()
Removes empty Text nodes and concatenates the data of remaining
contiguous Text nodes into the first of their nodes.
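One way to probe this empirically is to walk a parsed document and look for exactly the two things normalize() would fix: empty Text nodes and adjacent Text siblings. The sketch below is only that, a check under the definition quoted above; the function name isNormalized is hypothetical, and a passing result for some inputs would not prove the general case.

```javascript
// Hypothetical helper: true if the subtree contains no empty Text nodes
// and no two adjacent Text siblings, i.e. normalize() would change nothing.
function isNormalized(node) {
  var prevWasText = false;
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === 3) {
      // Text node: must be non-empty and must not follow another Text node.
      if (child.nodeValue === "" || prevWasText) {
        return false;
      }
      prevWasText = true;
    } else {
      prevWasText = false;
      if (!isNormalized(child)) {
        return false;
      }
    }
  }
  return true;
}
```

In a browser this could be run as isNormalized(new DOMParser().parseFromString(str, "text/html")) for a range of inputs str.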
Sources:
https://w3c.github.io/DOM-Parsing/#dom-domparser-parsefromstring
https://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-normalize
https://www.w3.org/TR/dom/#document
https://www.w3.org/TR/dom/#dom-node-normalize

Complete the empty XML tags in nodejs or Javascript?

I currently have a huge amount of data (files of about 500 MB each), which I parse with lodash and cheerio to extract parts of it.
The problem with the new data is that some empty tags have been replaced incorrectly.
Example:
<apple></apple>
gets replaced by
</apple>
I want to make sure that the previous formatting stays the same. Is there a regex I can use to find these new empty tags and replace them with the old, correct format?
You probably mean that <apple></apple> is replaced by <apple/> (not </apple>).
<apple></apple> and <apple/> are equivalent in XML, and no compliant XML processor will treat them differently, so you should not care which is used in your document.
If you truly meant that <apple></apple> is replaced by </apple>, then you have a likely irreparably damaged file as you won't know whether any given end tag for apple should be associated with an empty or nonempty apple element.
For example, doing a string-level replace of </apple> with <apple></apple> on
<apple>one</apple>
would result in
<apple>one<apple></apple>
which would not be well-formed.
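The failure is easy to reproduce. The sketch below uses a hypothetical naiveRepair function that does exactly the string-level replacement described above, then counts start and end tags to show they no longer balance.

```javascript
// Hypothetical "repair": blindly turns every </apple> into <apple></apple>.
function naiveRepair(xml) {
  return xml.replace(/<\/apple>/g, "<apple></apple>");
}

var input = "<apple>one</apple>";          // a legitimate non-empty element
var result = naiveRepair(input);
console.log(result);                       // <apple>one<apple></apple>

// Count start and end tags: they no longer pair up.
var opens = (result.match(/<apple>/g) || []).length;
var closes = (result.match(/<\/apple>/g) || []).length;
console.log(opens, closes);                // 2 1
```

Two start tags against one end tag means the output cannot be well-formed, which is why the end tag alone is not enough information to repair the file.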

Why does replacing innerHTML get slower as innerHTML gets larger?

I have a div that is not contentEditable. I capture keystrokes, insert the associated char into a string in memory, and then call a render() function that replaces the innerHTML of the div with the current string.
My question is, why does this loop get slower and slower as the innerHTML gets larger? All I'm doing is overwriting the innerHTML of the div with a straight string. Shouldn't this be constant time?
dojo.byId('thisFrame').innerHTML = this.value.string;
I don't understand how this is dependent on the size of the string at all. It slows down when the string's length gets over about 200 characters, and slows down drastically from there on out.
dojo.byId('thisFrame') is a DOM element. Setting the innerHTML property of a DOM element is not constant time because it causes a side effect which does not take constant time.
Specifically, assigning to myHTMLElement.innerHTML causes the browser to parse the string with its HTML parser and rewrite a chunk of the DOM.
http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0
On setting, [innerHTML] replaces the node's children with new nodes that result from parsing the given value.
Parsing HTML is at least linear in the amount of HTML, and replacing the DOM is at least linear in both the number of nodes removed and the number of nodes added.
The HTML you set using innerHTML must be parsed by the browser to produce the DOM elements that make up the browser's internal representation of the div. This takes more time for a longer string with a greater number of elements.
