innerText auto truncating long text inside a DOM element

I am trying to access the content of a td element through the innerText property (from an Excel VBA macro). It works for every case except one, where the text inside the td element is very long (more than 85982 characters).
When I inspected the text returned by innerText, I found that it appears to be truncated after a certain length. This doesn't happen in other cases, where the text is shorter.
textContent in Firefox seems to have a similar problem. I used the Firefox developer console to look for the truncated part of the text in the same DOM element, but it isn't in the extracted content (the untruncated part is there, just as with innerText).
Does anyone know how to bypass this restriction? Is there some internal limit on these properties?
Here's my VBA code that has this problem:
MyInnerText = objElement.ChildNodes(3).innerText
Here's an equivalent code run in the Firefox console which has the same problem:
var t = document.getElementsByName("chapter11")[0].parentNode.children[3].textContent;
t.match("some text that is in the part being truncated.");
In Firefox, the problem seems to go away if I inspect the element and click "Show all 3396 nodes". Once those nodes are visible, textContent no longer truncates the text.
Note that I need to extract the text from a VBA script using the Internet Explorer object.

It turns out that the problem was caused by an integer overflow. The index of the string I was looking for was larger than the maximum value of VBA's Integer type, so VBA silently set it to zero.
Why I wasn't able to find the text in Firefox is still a mystery.
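In VBA, Integer is a signed 16-bit type limited to -32,768..32,767, while Long is signed 32-bit, so a character position in a string of roughly 86,000 characters cannot be stored in an Integer. A minimal sketch of that arithmetic, written in JavaScript rather than VBA; the marker string and the helper name are illustrative only:

```javascript
// VBA's Integer is a signed 16-bit type (-32768..32767); Long is signed 32-bit.
const VBA_INTEGER_MIN = -32768;
const VBA_INTEGER_MAX = 32767;

// Returns true if n could be stored in a VBA Integer variable.
function fitsInVbaInteger(n) {
  return Number.isInteger(n) && n >= VBA_INTEGER_MIN && n <= VBA_INTEGER_MAX;
}

// Illustrative stand-in for the long cell: a marker placed well past 32767.
const cellText = "x".repeat(85000) + "marker";
const position = cellText.indexOf("marker"); // 0-based; VBA's InStr would report position + 1

console.log(position, fitsInVbaInteger(position)); // 85000 false
```

In the macro itself, declaring the variable that receives the position As Long instead of As Integer avoids the limit.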

Related

Text renders blank in Chrome, reappears when selected

The problem
I'm using innerHTML to put HTML-formatted text in a <div>. At a consistent, seemingly random point in the text, the fonts stop being rendered and display blank. When I select the invisible text, it reappears.
Description of the code
We have a single <div id="text" inner-h-t-m-l="[[markup]]">.
The initial markup doesn't contain any data apart from empty segments with IDs:
<div id="segment-1"></div><div id="segment-2"></div>... etc
In JavaScript, we loop over the IDs using querySelector (this is slow) and insert HTML into each segment's innerHTML.
The framework used is Polymer 2.
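A common alternative to the per-segment loop is to build the complete markup string first and assign it once (for example to the markup property bound above), so the DOM is updated in one assignment instead of one querySelector call and one innerHTML write per segment. A minimal sketch, assuming each segment is a hypothetical { id, html } object whose html is already trusted markup; whether this avoids the Chrome painting glitch is not established here:

```javascript
// Hypothetical segment shape: { id: "segment-1", html: "<b>Hello</b>" }.
// Builds the whole markup string up front so it can be assigned once,
// instead of looking up and updating each segment's innerHTML separately.
function buildSegmentsMarkup(segments) {
  return segments
    .map((segment) => `<div id="${segment.id}">${segment.html}</div>`)
    .join("");
}

const markup = buildSegmentsMarkup([
  { id: "segment-1", html: "First <i>segment</i>" },
  { id: "segment-2", html: "Second segment" },
]);
console.log(markup);
// <div id="segment-1">First <i>segment</i></div><div id="segment-2">Second segment</div>
```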
Additional info and video
In the Chrome Dev Tools, the invisible text is shown as present in the DOM and seems to be no different from the text that renders correctly.
The font in the video is non-standard, but the problem also occurs when using system fonts.
Here's a video to illustrate the problem.
Here's a screenshot of a Chrome profiler run:
Edit:
After a discussion in the comments, I thought I should link the actual code.
Here's the element in question.
The relevant parts are:
<div id="segmented_text_content" inner-h-t-m-l="[[markup]]"></div>
_addPrimaryText(textStrings) {...}
_addSecondaryText(textStrings) {...}
Edit 2:
I found two potential workarounds for this, but neither one works well enough.
If I run this.querySelector('#text').innerHTML = this.querySelector('#text').innerHTML with a timeout of 3 seconds, it paints the text correctly.
When adding the text, if I use the async processArray function from this comment, it renders the text correctly, albeit very slowly because it updates the layout after every insertion.
With these two points, my working theory now is that Chrome updates the layout before the innerHTML attribute is fully assigned.
I also forgot to mention this project uses Shady DOM.

Can't get word on click

I have a script that gets the clicked word with the getSelection method (on click, not from a user selection) and then shows an alert box with that word. It works well with some words, for example:
Si potrebbe comumente
It works with "potrebbe" but not with "Si". In other words, it works when the word is in the middle of the line, but not with the first or last word of the line.
This only happens after applying the CheckKnowWords function; without it, it works 100%.
The code is too big to post here, so to avoid cluttering the question I created this jsfiddle: https://jsfiddle.net/fabiobraglin/ww7uLvd1/
On Firefox:
InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable
On Chrome:
Uncaught InvalidStateError: Failed to execute 'surroundContents' on 'Range': The Range has partially selected a non-Text node.
These errors occur when calling:
range.surroundContents(newNode);

Firefox seems to have a number of issues with this code.
1) An error is thrown in some cases
According to the documentation on MDN for Range.surroundContents:
An exception will be thrown, however, if the Range splits a non-Text node with only one of its boundary points.
It also suggests another method that will work regardless. Instead of
range.surroundContents(newNode);
you can use
newNode.appendChild(range.extractContents());
range.insertNode(newNode);
This avoids the errors on words at the beginning or end of a line, such as Si or mondo.
2) The popup and title show the entire html
They should show the text instead (e.g. senza instead of <span style="border-radius: 4px;border: 1px solid #ffcccc; background: #ffcccc;">senza</span>). To fix this, you can replace innerHTML with innerText.
More issues
There are a few more issues I found, but this is where it gets tricky. MDN warns that Selection.modify is non-standard and there are no plans to standardize it, so behavior differs between browsers, for example:
In Firefox, trailing punctuation is included in the word (e.g. "forti," or "IATA)."). In Chrome, this doesn't happen. Also, words containing punctuation seem to work fine in Chrome but not in Firefox (e.g. the word L'indice is returned correctly in Chrome but comes up as indice in Firefox).
In Firefox, clicking on a word that follows a recognized word selects the recognized word instead. For example, if you click on ognuno in viaggo che ognuno di, you get che; this doesn't happen in Chrome.
The word Henley is unique in that it is in bold tags. In the phrase britannica Henley & Partners, if you click on either Henley or & in Chrome you will get britannica. In Firefox you will get Henley.
You might want to consider changing the technique for finding the word. For example, you're already going through and highlighting all words you're interested in by wrapping them in a span. Instead of using Range and Selection, you could use the click event to figure out which span you're in, then get the text within that. If your initial checking also excludes punctuation and "words" like &, then you could just select the entire inner text.
Another minor issue I noticed is that the HTML uses an unescaped & instead of &amp;. It probably isn't affecting this example, but in general you'll run into fewer problems if you properly escape HTML (mainly <, > and &; there are plenty of tools and documentation on that elsewhere).
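The escaping itself is short. A minimal sketch covering &, < and >, plus both quote characters for text placed inside attribute values; escapeHtml is an illustrative name, not a built-in function:

```javascript
// Escapes the characters that are significant in HTML text and attribute values.
// & must be replaced first so the entities produced below are not escaped twice.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml("Henley & Partners <b>"));
// Henley &amp; Partners &lt;b&gt;
```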
Here's an updated fiddle solving the first two issues, but they may be obsolete if you end up doing some refactoring to solve the other issues.

Updated the fiddle: https://jsfiddle.net/ww7uLvd1/8/. Always use split(/\s+/) rather than split(" ") when splitting text into words on whitespace.
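To illustrate the difference with the sample phrase from the question (the extra space and the newline are added for the example):

```javascript
// Sample text from the question, with a double space and a newline added.
const line = "Si  potrebbe\ncomumente";

// Splitting on a single space keeps an empty string for the repeated space
// and does not split on the newline at all.
const bySpace = line.split(" ");
console.log(bySpace); // [ 'Si', '', 'potrebbe\ncomumente' ]

// Splitting on runs of whitespace treats spaces, tabs and newlines alike.
const byWhitespace = line.split(/\s+/);
console.log(byWhitespace); // [ 'Si', 'potrebbe', 'comumente' ]

// Leading or trailing whitespace still yields an empty first/last item, so trim first.
const trimmed = " Si potrebbe ".trim().split(/\s+/);
console.log(trimmed); // [ 'Si', 'potrebbe' ]
```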

contenteditable div backspace and deleting text node problems

There are many issues with contenteditable divs when deleting HTML and/or non-editable content inside them.
I'm using an answer by the excellent Tim Down here: How to delete an HTML element inside a div with attribute contentEditable?
With Tim's code, the entire text node gets deleted. I need this to work like a textarea: deleting character by character, while making sure HTML elements can be backspaced as well.
I tried the following:
else if (node) {
    var index = node.length - 1;
    if (index >= 0)
        node.deleteData(index, 1);
    else
        this.removeChild(node);
}
But this is obviously not going to work correctly. If I am at the end of the content, things work as expected. But if I place the cursor anywhere else, it's still deleting from the end.
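The deletion index above is derived from node.length, which always points at the end of the text node, so the caret position is never consulted. A plain-JavaScript model of the intended behavior, with the caret offset passed in explicitly; backspaceAt and its return shape are hypothetical, and in the DOM the offset would come from the current selection range (for example range.startOffset) with the edit done as node.deleteData(offset - 1, 1):

```javascript
// Models a single Backspace inside one text node: remove the character just
// before the caret (caretOffset), not the last character of the node.
function backspaceAt(text, caretOffset) {
  if (caretOffset <= 0) {
    // Nothing before the caret in this node; the caller has to handle the
    // previous sibling instead (for example, removing a preceding tag element).
    return { text, caretOffset, deleted: false };
  }
  return {
    text: text.slice(0, caretOffset - 1) + text.slice(caretOffset),
    caretOffset: caretOffset - 1,
    deleted: true,
  };
}

console.log(backspaceAt("hello world", 5)); // { text: 'hell world', caretOffset: 4, deleted: true }
```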
I'm lost at this point; any help is much appreciated.
http://jsfiddle.net/mstefanko/DvhGd/1/

After breaking down how Google uses contenteditable divs for user tagging in Google Plus, I landed on a much more reasonable solution. Maybe it will help someone else.
After adding a single tag, you can already see many differences in the generated HTML from browser to browser.
In Google Chrome, a space is added with each tag. The button tag is used. And the chrome-only contenteditable="plaintext-only" is used.
When I backspace the space in chrome, a BR tag is then appended.
In Firefox the BR tag is added immediately with the first tag. No spaces are needed. And an input tag is used instead of the button tag.
The BR tag was the single greatest break-through I had while digging through this. Before adding this, there was a lot of quirky behavior with deleting tags, as well as focus issues.
In IE, more interesting changes were made. A span with contenteditable false is used for the tags here. No spaces or BR tags, but an empty text node.
That said, you don't have to copy Google exactly.
The important parts:
If you're rendering HTML, do the following...
1. Chrome should use the button tag
2. Firefox/IE should use the input tag
For range/selection you generally want to treat things like tags as a single character. You can build this into your range/selection logic, but the behavior of the input/button tags is much more consistent, and way less code.
In IE7-8, a span looks better, purely from a UI standpoint. But if you don't care how your site looks in old versions of IE, the input tag behaves correctly in IE as well as in Firefox.
3. Chrome only, use the contenteditable="plaintext-only" attribute on your editable div.
Otherwise, many odd issues occur, not only when a user pastes rich text but also when deleting HTML elements: sometimes their styles get transferred to the div.
4. If you need to set the caret position to the end of the div, set the end of the range before the BR.
For Firefox:
range.setEndBefore($(el).find('br')[0]);

textarea attribute wrap=hard does not work on pasted text, any workarounds?

I have a textarea:
<textarea id="id-textarea-readme" wrap="hard"></textarea>
It works really well until someone writes their text in Notepad and copy-pastes it in: the words are wrapped visually, but no extra line breaks are added (which is the whole purpose of "hard").
Is there any workaround to make this work? Any JS or trick to trigger the line breaks?

The wrap=hard attribute was nonstandard for a long time (it is now described in HTML5) and does not work consistently across browsers. Modify the design so that you do not need to rely on such client-side behavior. Textarea elements should be expected to yield actual user input, containing line breaks only where the user actually pressed Enter. If you need to split long lines for further processing, do it server-side.
In my tests, IE wraps even copy-pasted text. Firefox, on the other hand, introduces hard line breaks only when it wraps at whitespace, not when it wraps inside a “word”. So with cols=5, the input 0123456789 (typed or pasted) is displayed as two lines but sent as one, whereas 01234 56789 is sent as two lines. I would expect other browser incompatibilities as well.

The copy-and-paste issue probably comes from OS-specific line break (CR/LF) differences.
You could listen for the change or paste event on the textarea and normalize the text with JavaScript. That's my conceptual guess; here's a generic jQuery example:
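$("textarea").on("change", function () {
  var $this = $(this);
  // Normalize Windows (\r\n) and classic Mac (\r) line endings to \n
  $this.val($this.val().replace(/\r\n?/g, "\n"));
});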
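If wrapping has to happen on the client, the line breaks can be inserted explicitly instead of relying on wrap=hard. A minimal sketch, assuming the column count matches the textarea's cols attribute and that breaking at the last space (or mid-word when a word is longer than a line) is acceptable; hardWrap is a hypothetical helper, and real browsers do not all wrap this way (see the Firefox behavior described in the previous answer):

```javascript
// Inserts hard line breaks so that no line is longer than `cols` characters.
// Lines are broken at the last space within the limit; a word longer than
// `cols` is split mid-word.
function hardWrap(text, cols) {
  if (!(cols >= 1)) throw new RangeError("cols must be at least 1");
  const normalized = text.replace(/\r\n?/g, "\n"); // CRLF / CR -> LF
  return normalized
    .split("\n")
    .map((line) => {
      const out = [];
      let rest = line;
      while (rest.length > cols) {
        const breakAt = rest.lastIndexOf(" ", cols);
        if (breakAt > 0) {
          out.push(rest.slice(0, breakAt));
          rest = rest.slice(breakAt + 1); // drop the space at the break
        } else {
          out.push(rest.slice(0, cols));
          rest = rest.slice(cols);
        }
      }
      out.push(rest);
      return out.join("\n");
    })
    .join("\n");
}

console.log(JSON.stringify(hardWrap("0123456789", 5))); // "01234\n56789"
```

The result could be written back to the textarea in the change handler, or the same logic could run on the server as the previous answer recommends.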

Weird issue in IE when setting element's innerHTML attribute to contain a script element

I'm having a weird issue in IE when I set an element's innerHTML property to a string that contains a script element.
What happens is, when innerHTML is set like:
domEl.innerHTML = "<script type=\"text/javascript\">alert(\"hello world\");</script>"
alert(domEl.innerHTML);
The alert box doesn't show any text, as if the script element was removed completely. In addition, checking the element's childNodes collection also shows that the script element is not present as domEl.childNodes.length = 0.
However, if you add some text before the script tag like so:
domEl.innerHTML = "start text<script type=\"text/javascript\">alert(\"hello world\");</script>"
alert(domEl.innerHTML);
The script element is present when the alert box is shown.
Why is this happening and how can I fix it properly? Is this a bug in IE? It works fine in the latest versions of Chrome and Firefox. I'm using IE 8 for this.

This looks like a bug, or some odd security consideration in IE.
Try wrapping your text in XMP tags. It might work, but that depends on what you are trying to achieve.
