Copy webpage as it is rendered, regardless of underlying source

So I have this webpage that I want to copy to a Word document. It's an installation guide, and we want to use it but add comments about how we installed the program in our environment.
Simple problem. Just copy and paste, right? Wrong.
The problem is that this webpage is built from <div> elements that a couple of checkboxes show or hide according to your choices. So if I check the box for a Linux installation, all the divs relating to that installation option are shown.
Example from the source:
<div class="forWindows forAIX forLinux forZLinux forPLinux forSolaris">...</div>
<div class="forJTS"> ... </div>
<div class="forCCM"> ... </div>
This means that whenever I copy and paste part of the webpage, I get all the content, regardless of what I actually see on the screen. What I want is to copy the webpage exactly as I see it on the screen.
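The page's own script isn't shown, so this is only a hypothetical sketch of how a checkbox-driven page like this typically works: every variant of the text stays in the DOM, and the script merely sets each `<div>`'s `style.display`. That is why some browsers' copy operations still pick up text that isn't on screen. The name `applySelection` and the rule "show a div if any of its `for…` classes is selected" are assumptions, not IBM's actual logic.

```javascript
// Hypothetical sketch: show a section when any of its "for<Option>" classes
// matches a selected option; otherwise hide it with display: none.
// Hidden sections are NOT removed from the DOM, only styled away.
function applySelection(sections, selectedOptions) {
  const wanted = new Set(selectedOptions.map((opt) => `for${opt}`));
  for (const section of sections) {
    const classes = section.className.split(/\s+/).filter(Boolean);
    const matches = classes.some((cls) => wanted.has(cls));
    // "" falls back to the stylesheet's display value.
    section.style.display = matches ? "" : "none";
  }
}

// In a browser this might be called as:
// applySelection(document.querySelectorAll("div[class^='for']"), ["Linux", "JTS"]);
```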
I've tried to copy from Internet Explorer and Firefox both to MS Word and to a basic text editor with the same results.
I want the result to be text so I can edit it, so screenshots or exporting to PDF won't work.
I could save the source HTML, remove the tags that don't apply and open the local HTML file, except that it's quite a lot of work. Also, the page seems to rely heavily on server-side scripts, so I guess that may cause some issues.
Ideally I'd like to preserve the formatting as it is shown as well.
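The "remove the tags that don't apply" idea can be automated in the live page instead of a saved copy. A minimal sketch, assuming the unselected sections are hidden with `display: none` (display hides an element's whole subtree regardless of the children's own styles, so removing those elements is safe). Because the remaining markup and styles are untouched, a normal select-and-copy should then carry the visible formatting into Word. The function name and the `getStyle` parameter are hypothetical; in a browser you would pass `(el) => getComputedStyle(el)`.

```javascript
// Sketch: delete every element whose computed display is "none" so that a
// normal select-and-copy only sees what is on screen.
// This modifies the live page; reload it to undo.
function removeHiddenElements(root, getStyle) {
  const toRemove = [];
  const walk = (el) => {
    for (const child of Array.from(el.children)) {
      if (getStyle(child).display === "none") {
        toRemove.push(child); // its whole subtree goes with it
      } else {
        walk(child);
      }
    }
  };
  walk(root);
  // Remove after the walk so the tree isn't mutated while being traversed.
  for (const el of toRemove) el.remove();
  return toRemove.length;
}

// From the DevTools console, after making your choices on the page:
// removeHiddenElements(document.body, (el) => getComputedStyle(el));
```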
To reproduce the issue:
1. Go to IBM's interactive guide for installing Rational Team Concert.
2. Select any choices, but to verify steps 5 and 6 below, choose Linux as the OS.
3. Click "Get your interactions".
4. Copy and paste part of the webpage and compare the pasted version with what is seen in the browser.
5. Go to Step 3, "set up the database", in the guide. Copy all the content from "What to do next" in the previous step to the end of the heading in Step 3; about 6 lines in all.
6. Paste into a text editor. You should now see text that only relates to the z/OS and IBM i operating systems.

It seems that copy-and-paste behaviour in this case is undefined. Some browsers ignore the styling that hides content when copying (so hidden text is included), while others respect it (so hidden text is left out).
A rough summary of browsers seems to be:
IE - copies hidden text on IE8 and presumably older versions; no idea about newer ones.
FF - newer versions will not copy hidden text; older versions, it seems, will. I don't know where the cutoff is, but it seems to be somewhere between version 3 and version 14. :)
Chrome - my current version (19.0.1084.52) will copy just the visible text. Untested on any other version.

I would simply screenshot the page and use a simple graphics editing program to crop the image and add annotations.
To take a screenshot, press Print Screen (probably abbreviated as PrtScn on your keyboard). That copies the screenshot to the clipboard.
Now, in your graphics editing program or even your word processor, click Paste (or press Ctrl+V). The screenshot will appear. Crop it and add annotations as you wish.

Write a bookmarklet that concatenates #text nodes together, but skips any text node that has an ancestor with a computed display of none, or whose parent's computed visibility is hidden.
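A minimal sketch of that bookmarklet's core logic. It checks ancestors rather than only the parent for `display: none`, because display is not inherited: a child of a hidden element can still report `display: block`. Visibility is inherited, so the parent's computed value is enough there. The name `visibleText` and the `getStyle` parameter are illustrative assumptions; in a page you would pass `(el) => getComputedStyle(el)`.

```javascript
// Collect text from #text nodes that are actually rendered.
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

function visibleText(root, getStyle) {
  const parts = [];
  const walk = (node) => {
    for (const child of Array.from(node.childNodes)) {
      if (child.nodeType === TEXT_NODE) {
        // The parent's computed visibility already reflects its ancestors.
        if (getStyle(node).visibility !== "hidden") parts.push(child.nodeValue);
      } else if (child.nodeType === ELEMENT_NODE) {
        // display: none hides the whole subtree, so stop descending here.
        if (getStyle(child).display !== "none") walk(child);
      }
    }
  };
  walk(root);
  return parts.join("");
}

// In Chrome's DevTools console, which provides a copy() helper:
// copy(visibleText(document.body, (el) => getComputedStyle(el)));
```

This yields plain text only: formatting is lost and no line breaks are added between block elements, so it suits a text editor better than Word.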

Related

Text renders blank in Chrome, reappears when selected

The problem
I'm using innerHTML to put HTML-formatted text in a <div>. At a consistent, seemingly random point in the text, the fonts stop being rendered and display blank. When I select the invisible text, it reappears.
Description of the code
We have a single <div id="text" inner-h-t-m-l="[[markup]]">.
The initial markup doesn't contain any data apart from empty segments with IDs:
<div id="segment-1"></div><div id="segment-2"></div>... etc
In JavaScript, we loop over the IDs using querySelector (this is slow) and insert HTML into each segment's innerHTML.
The framework used is Polymer 2.
Additional info: video
In the Chrome Dev Tools, the invisible text is shown as present in the DOM and seems to be no different from the text that renders correctly.
The font in the video is non-standard, but the problem also occurs when using system fonts.
Here's a video to illustrate the problem.
Here's a screenshot of a Chrome profiler run:
Edit:
After a discussion in the comments, I thought I should link the actual code.
Here's the element in question.
The relevant parts are:
<div id="segmented_text_content" inner-h-t-m-l="[[markup]]"></div>
_addPrimaryText(textStrings) {...}
_addSecondaryText(textStrings) {...}
Edit 2:
I found two potential workarounds for this, but neither one works well enough.
If I run this.querySelector('#text').innerHTML = this.querySelector('#text').innerHTML with a timeout of 3 seconds, it paints the text correctly.
When adding the text, if I use the async processArray function from this comment, it renders the text correctly, albeit very slowly because it updates the layout after every insertion.
With these two points, my working theory now is that Chrome updates the layout before the innerHTML attribute is fully assigned.
I also forgot to mention this project uses Shady DOM.
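The processArray function from the linked comment isn't included here, so the following is only a hypothetical sketch of the same idea: write one segment per task and yield to the event loop between insertions, which gives the browser a chance to lay out and paint in between. The `{ id, html }` shape and the name `insertSegmentsOneByOne` are assumptions. As noted above, this approach is slow when there are many segments.

```javascript
// Resolve on the next macrotask, letting the browser render in between.
function nextTask() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical sketch: write each segment's HTML in its own task.
async function insertSegmentsOneByOne(root, segments) {
  for (const { id, html } of segments) {
    const target = root.querySelector(`#${id}`);
    if (target) target.innerHTML = html;
    await nextTask();
  }
}

// e.g. insertSegmentsOneByOne(container, [{ id: "segment-1", html: "<p>First</p>" }]);
```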

Microsoft Edge version 40/15 clipboard returns strange string for 'text/html'

I just found that in Microsoft Edge v40/15, when you copy some text from a <div contenteditable=true>, paste it into the same <div> and read the data with e.clipboardData.getData("text/html"), what I get is confusing: instead of the part I pasted, I get a block of what looks like debugging information, as in the picture below (inside <div id="display">). What I actually pasted is only <p>paragraph element</p>:
I made a jsfiddle for it, so you can try it out: https://jsfiddle.net/larryzhao/wfy60y07/. Paste something from the contenteditable div into the same div in Microsoft Edge v40/15 and the result will be shown in the div below.
I'd like to know whether Microsoft Edge v40/15 has been publicly released. Is this a bug or a feature of Microsoft Edge? I can't find anything about it on the web.

In Edge 40/15 Microsoft has added a text/html portion to the clipboard; in previous versions (including the one that is publicly available at the moment) it wasn't available at all.
The part you consider to be debugging information is something Microsoft has been adding to the clipboard for years (I'm not aware of any details, though), so one can be fairly sure it's there to stay. I guess the only way to deal with it is to adapt your code to the fact that it's there.
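For what it's worth, that extra text matches the header of the Windows "HTML Format" (CF_HTML) clipboard format: lines such as Version:, StartHTML:, EndHTML:, StartFragment:, EndFragment: and an optional SourceURL:, followed by HTML in which the copied part is wrapped in <!--StartFragment--> and <!--EndFragment--> comments. A minimal sketch of accommodating it, assuming getData("text/html") returns that format verbatim and the markers are spelled exactly as above: cut out the text between the two comments and fall back to the whole string when they're absent. The header's numeric offsets count bytes rather than JavaScript string characters, which is why this sketch uses the comment markers instead. The function name is hypothetical.

```javascript
// Extract the pasted fragment from a CF_HTML-style clipboard string.
// Returns the input unchanged when the marker comments are missing.
function extractClipboardFragment(html) {
  const START = "<!--StartFragment-->";
  const END = "<!--EndFragment-->";
  const start = html.indexOf(START);
  const end = html.lastIndexOf(END);
  if (start === -1 || end === -1 || end < start) return html;
  return html.slice(start + START.length, end);
}

// Usage inside a paste handler:
// el.addEventListener("paste", (e) => {
//   const fragment = extractClipboardFragment(e.clipboardData.getData("text/html"));
// });
```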
There's also another problem with text/html content in the Edge 40/15: https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/11877517/.
The Word document contains Unicode characters that are corrupted when pasting into Edge. The pasted content appears fine in the contenteditable element, but the data we receive in JavaScript has been translated into what looks like ASCII.
Looking at the second paragraph in the HTML clipboard contents, we expect (and have confirmed in Chrome/Firefox):
śƿęċīǟƪ characters
Instead Edge gives us:
Å›Æ¿Ä™Ä‹Ä«ÇŸÆª characters
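Judging by the two strings, Edge appears to have decoded the clipboard's UTF-8 bytes as Windows-1252: ś is the UTF-8 byte pair C5 9B, and in Windows-1252 those two bytes display as Å and ›. If that is the cause, one workaround sketch (not an Edge API; the function name is hypothetical) is to map each character back to its Windows-1252 byte and decode the bytes as UTF-8 again, returning the input untouched when that isn't possible.

```javascript
// Windows-1252 characters in the 0x80-0x9F range, mapped back to their byte.
const CP1252_BYTES = new Map([
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
]);

// Reverse "UTF-8 bytes displayed as Windows-1252" mojibake.
function undoUtf8AsCp1252(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (CP1252_BYTES.has(cp)) bytes.push(CP1252_BYTES.get(cp));
    else if (cp <= 0xff) bytes.push(cp); // other code points up to 0xFF map to the same byte
    else return text; // not representable in Windows-1252, so not this mojibake
  }
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(bytes));
  } catch {
    return text; // bytes were not valid UTF-8; leave the text alone
  }
}
```

This is a heuristic: legitimate text such as "Ã©" would also be "repaired", so it is best applied only to the text/html data from the affected Edge version.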

Firefox contenteditable image selected after drop - can't remove selection

If you drag and drop an image around inside a contenteditable area in Firefox, the image sometimes ends up stuck in a selected state, like this:
Fiddle here: http://jsfiddle.net/zupa/qg5Qh/
You may need to drag and drop it a few times; I hit this bug about 20% of the time or more.
I am using Firefox 13.0.1 on Windows 7
How to remove that selection? Any help is appreciated.
PS:
It is not available as a range via document.getSelection().getRangeAt(..)
Firefox does NOT add any HTML attributes, yet if I hit save (custom CMS) and reload the page in contenteditable mode, the selection comes back. It seems to be an annoying bug.

It does it reliably when the image is within a word that is marked by Firefox as a spelling error. For example, here's your jsFiddle with the image moved into the middle of the word "Lorem": http://jsfiddle.net/timdown/qg5Qh/1/
It seems to be something to do with the styling applied to misspelled words. Add the word "Lorem" to the browser's dictionary and the image styling goes away.
You could switch spellchecking off using the spellcheck attribute. From what I can gather, you have to do this at the <body> level in Firefox because it doesn't seem to work on single contenteditable elements as it does for textareas.
Demo: http://jsfiddle.net/timdown/qg5Qh/2/
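A minimal sketch of that workaround from script, assuming the editor lives in a regular page: set the spellcheck attribute to "false" on <body> (which is what reportedly works in Firefox) and, for good measure, on the editable element too. The helper name is hypothetical.

```javascript
// Turn spellchecking off for the document body and the editable element.
// Setting it on <body> is the part that matters for Firefox here;
// setting it on the element as well is harmless in other browsers.
function disableSpellcheck(doc, editable) {
  doc.body.setAttribute("spellcheck", "false");
  if (editable) editable.setAttribute("spellcheck", "false");
}

// In the page:
// disableSpellcheck(document, document.querySelector("[contenteditable]"));
```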

How do some WYSIWYG editors keep formatting of pasted text?

How do some WYSIWYG editors keep the formatting of pasted text? For example, I copied italic red text from a text editor into a WYSIWYG editor and it kept the text's color and style. How is this happening? For the longest time I thought JavaScript had access only to the clipboard's plain text. Is this not the case? If so, what is going on?

There's a content type negotiation between the source and target during the copy/paste operation. It happens sort of like this:
1. You copy something into the copy-and-paste buffer. The copied data is tagged with, more or less, a MIME type and who put it there.
2. When you paste, the paste target tells the copy-and-paste system that it understands a specific list of MIME types.
3. The copy-and-paste system matches the available formats against the desired formats and finds text/html in both lists.
4. Someone (probably the original source of the data) then converts the paste buffer to text/html and drops it into the editor.
That's pretty much how things worked back when I was doing X11/Motif development (hey! get off my lawn you rotten kids!) so I'd guess that everyone does it pretty much the same way.
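From a web page's point of view, the end of that negotiation shows up in a paste event: event.clipboardData.types lists the formats on offer, and getData(type) reads one of them. A sketch of the matching step, assuming the target prefers rich HTML and falls back to plain text (the function name and preference order are illustrative, not what any particular editor does):

```javascript
// Pick the first format the target understands that the source also offers.
function pickFormat(offered, preferred) {
  const available = new Set(offered);
  const match = preferred.find((type) => available.has(type));
  return match === undefined ? null : match;
}

// In a browser paste handler:
// editor.addEventListener("paste", (event) => {
//   const type = pickFormat(event.clipboardData.types, ["text/html", "text/plain"]);
//   if (type) console.log(type, event.clipboardData.getData(type));
// });
```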

JavaScript has no direct access to the clipboard in general. However, all major browsers released over the past few years have a built-in WYSIWYG editing facility, via the contenteditable attribute/property of any element (which makes just that element editable) and the designMode property of document objects (which makes the whole document editable).
While the user edits content in the page, if they trigger a paste (via keyboard shortcuts such as Ctrl + V or Shift + Insert or via the Edit or context menus), the browser automatically handles the whole pasting process without any intervention from JavaScript. Part of this process includes preserving formatting wherever possible.
However, the HTML this produces can be gruesome and varies heavily between browsers. Many WYSIWYG editors such as TinyMCE and CKEditor employ tricks to intercept the pasted content and clean it before it reaches the editor's editable area.
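The intercept-and-clean trick can be sketched roughly like this. It is not how TinyMCE or CKEditor actually do it (their filters are far more thorough): the regex-based cleanPastedHtml is a deliberately crude, hypothetical example that drops comments, <style> blocks and <meta>/<link> tags, and document.execCommand("insertHTML", ...) is a long-supported but deprecated way to insert the result.

```javascript
// Crude cleanup of pasted HTML: remove comments, <style> blocks and
// <meta>/<link> tags, which word processors often include.
// Regexes are not a real HTML parser; this is only an illustration.
function cleanPastedHtml(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/<(?:meta|link)\b[^>]*>/gi, "")
    .trim();
}

// Intercept the paste, clean the HTML, and insert it ourselves.
function attachPasteCleaner(editable) {
  editable.addEventListener("paste", (event) => {
    const html = event.clipboardData.getData("text/html");
    if (!html) return; // let the browser handle plain-text pastes
    event.preventDefault();
    document.execCommand("insertHTML", false, cleanPastedHtml(html));
  });
}
```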

What you're seeing is a rich text editor. There's some information in this Wikipedia article: http://en.wikipedia.org/wiki/Online_rich-text_editor

I think it copied the selected DOM instead

firebug and _moz_dirty

I am developing a JavaScript app that will wrap every line of text entered inside an iframe (designMode) in a P (or DIV), as IE does by default.
I'm not posting my code yet because I've only just started. The first problem is that when I type some text in Firefox, even before I press Enter or call any function, Firebug shows an inserted
<br _moz_dirty="">
under the entered text.
Why? How can I prevent it?
If you need my code, please let me know.

As the _moz_-prefix suggests, this is a Mozilla-internal property. It isn't inserted by Firebug, but rather by the core editor functionality in Gecko. You can't prevent it; ignore it or work around it.
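One way to work around it is to clean the markup when you read it out of the editor. A sketch, assuming you serialise the content via innerHTML: strip the _moz_dirty attribute and drop a single trailing <br> left at the very end. The function name is hypothetical, and note that a trailing line break the user typed on purpose would be dropped too.

```javascript
// Hypothetical cleanup when reading HTML out of a Gecko designMode editor:
// remove the internal _moz_dirty attribute, then drop one trailing <br>
// that only exists to keep the block from collapsing.
function cleanEditorHtml(html) {
  return html
    .replace(/\s+_moz_dirty(?:="[^"]*")?/gi, "")
    .replace(/<br\s*\/?>\s*$/i, "");
}

// e.g. const html = cleanEditorHtml(iframe.contentDocument.body.innerHTML);
```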

#myEditableDiv br {display:none;}
It's something Mozilla uses to prevent empty containers from collapsing, and it occasionally inserts one at seemingly random times too.
The question is, if they knew it was a dirty hack then why did they do it?

The _moz_dirty attribute is used to indicate that the node needs to be pretty-printed when the document is saved, although it shouldn't appear in web pages, only in SeaMonkey Composer and SeaMonkey and Thunderbird's HTML Message Compose.

The Gecko editor used to put it there because it needed somewhere to put the cursor. I believe this is fixed in Firefox 4.
