Prevent web page copying method

Prevent web page copying method - javascript

I was completely fascinated when I discovered BRW's (example) method to prevent web page copying. I had a quick look through the source view and couldn't see how they did it. Aside from inserting (c) symbols throughout the text, they also scramble the text, yet it is completely readable in a browser. Amazing!
Any ideas how they did it?

If you view the source, you will notice that it's a boatload of <i> and <span> elements littered through the markup (some of which are hidden by indenting them -10000 to the left). However, a simple scraper with a tiny bit of logic could easily undo that travesty.
Sure, it will prevent casual copy and paste, but is downright dumb, plus makes you pretty much ungoogleable.

They overlay/insert invisible text spans (through CSS: text-indent: -100000px) that browsers usually still copy and paste, resulting in too much copied text. You need to parse the CSS to determine what is readable (try lynx -- bad text)
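
To illustrate how little logic such a scraper needs, here is a minimal sketch. It assumes the class names of the hidden decoy elements have already been read out of the CSS by hand, that the decoys are non-nested <span> or <i> elements, and that the class attribute uses double quotes. The function name stripDecoys and the class name zz are made up for the example.

```javascript
// Remove decoy <span>/<i> elements (identified by class) and return plain text.
// Assumption: hiddenClasses was collected manually from the site's CSS.
function stripDecoys(html, hiddenClasses) {
  let cleaned = html;
  for (const cls of hiddenClasses) {
    // Escape regex metacharacters in the class name before embedding it.
    const safe = cls.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const decoy = new RegExp(
      `<(span|i)\\b[^>]*class="[^"]*\\b${safe}\\b[^"]*"[^>]*>[^<]*</\\1>`,
      'g'
    );
    cleaned = cleaned.replace(decoy, '');
  }
  // Drop the remaining tags so only the readable text is left.
  return cleaned.replace(/<[^>]+>/g, '');
}

const sample = '<p>Hel<span class="zz">XQ</span>lo <i class="zz">(c)</i>world</p>';
console.log(stripDecoys(sample, ['zz']));
```

A real page would need an actual HTML and CSS parser rather than regular expressions, but the principle is the same.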

Related

How to preserve colors when paste into Quill code-block?

When I copy code from VSCode into Quill, it keeps its nice color scheme. But when I enable code-block, it becomes just dark and boring. Is there any way to preserve the original colors?

The color scheme you see in VSCode is what's called syntax highlighting, and it's done dynamically, i.e. it is procedural and only "exists" in VSCode as it reads that code and displays it to you in some kind of visual buffer.
Typically, when you work with code snippets, markup, or Markdown, the clipboard copies and pastes pretty much just the text. Code, like the HTML you are working with here, is never what is called rich text, mainly because rich text is already a kind of code! It defines the bold and colored aspects of your text (or image URIs, links, etc.); when you see it in Word or your web browser, it has been parsed as code and turned into a presentation of formatted text (in a way, the entire history of software and application development revolves around this very idea; consider the history of WYSIWYG).
For this reason, HTML markup in your IDE that is itself only a "representation" of HTML markup would be quite a weird thing that would be complicated to handle, not to mention quite distressing ontologically in the coding world.
What you need is something that will reproduce the same procedural syntax highlighting you see in VSCode, using CSS/JS within Quill, and... you're in luck!
https://quilljs.com/docs/modules/syntax/
That seems to be what you need, and you can even configure it to the exact colors you like in VSCode with some patience. (Quill uses the highlight.js library internally, as the module description page notes, so that's why the config link points to those docs.)
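
A minimal sketch of enabling that module, assuming the Quill and highlight.js scripts plus a highlight.js theme stylesheet are already loaded on the page, and that the editor container has the id editor (a made-up id). The helper name buildQuillOptions is also just for illustration.

```javascript
// Options object for Quill with the syntax module switched on.
function buildQuillOptions() {
  return {
    theme: 'snow',
    modules: {
      syntax: true,              // highlight code blocks using the global hljs
      toolbar: [['code-block']], // toolbar button that toggles a code block
    },
  };
}

// Browser-only wiring; skipped when Quill is not loaded (for example under Node).
if (typeof Quill !== 'undefined') {
  new Quill('#editor', buildQuillOptions());
}
```

The colors then come from whichever highlight.js theme stylesheet is loaded, so matching your VSCode scheme is a matter of picking or tweaking that CSS. The pasted colors themselves are not kept; the code is re-highlighted.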

Can Tinymce give me some exact HTML content with all styles kept (really means WYSIWYG)?

It's really hard to understand how Tinymce can be considered as WYSIWYG, because I cannot get what I see (visually exactly). So it is more likely "what you see is just what you see".
Currently I use getContent() to get the HTML. But it lacks embedded style and if we show that output html in some container, the visual rendering will look different.
I've tried implementing my own solution to embed the current style (based on getComputedStyle) in each element. But that's not very efficient (many redundant styles can be included) and does not always work (for embedded video, for example; I'm not sure why the <video> is not kept by getContent(), but all <video>s disappear from the final output HTML).
The Tinymce team has done a lot of work, but I'm really not sure why they did not even think about this feature. We need the exact HTML that renders what you see in the editor. We can sanitize the HTML after that by ourselves.
Here is a demo to help you see what's so bothersome with this WYSIWYG editor:
https://jsfiddle.net/L83u5v0n/1/
Clicking on the Show HTML button shows this:
So you can clearly see it's more WYSIWYS than WYSIWYG. Is there a way to get the exact output HTML using some hidden feature of Tinymce that I don't know of? If it's based on some custom script using getComputedStyle then I really do not need it (my own solution is actually fairly good).

This is a function of demos that are set up to look good in the editor versus real world usage. The intention of the content_css configuration is to provide the CSS that will be used to render the content.
If you apply the content CSS elements to the page then "Show HTML" works perfectly.
https://jsfiddle.net/xzh8utbp/
Alternatively, delete the content_css configuration (but that won't quite work in your example because JSFiddle adds CSS to the result window).
Note that I've added mce-content-body to the view div because it turns out our codepen demo CSS leverages it. Normally that wouldn't be required, but then I don't think normal integrations use our codepen CSS.
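
As a rough sketch of that idea: load the same stylesheet on the page that displays the saved HTML as the one passed to the editor via content_css. The stylesheet path /css/content.css, the selector #editor and the helper name buildPreviewPage are all assumptions made up for this example; the mce-content-body class is only needed when the stylesheet scopes its rules to it, as the demo CSS does.

```javascript
const CONTENT_CSS = '/css/content.css'; // hypothetical stylesheet path

// Wrap getContent() output in a page that loads the editor's content CSS.
function buildPreviewPage(contentHtml, contentCssUrl) {
  return [
    '<!DOCTYPE html>',
    '<html><head>',
    `<link rel="stylesheet" href="${contentCssUrl}">`,
    '</head><body>',
    `<div class="mce-content-body">${contentHtml}</div>`,
    '</body></html>',
  ].join('\n');
}

// Browser-only wiring; skipped when TinyMCE is not loaded.
if (typeof tinymce !== 'undefined') {
  tinymce.init({ selector: '#editor', content_css: CONTENT_CSS });
  // Later, e.g. in a "Show HTML" click handler:
  // const page = buildPreviewPage(tinymce.activeEditor.getContent(), CONTENT_CSS);
}
```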

How can I prevent a certain element's text being copied to the clipboard?

Before someone suggests it: no, user-select isn't the correct answer.
Daniel O'Connor's pure CSS method is close but I can't use it for my use case because of those "Accessibility concerns".
I need a better way of doing the same thing. I can't think of how it'd be done; I don't think there's a reliable cross-browser compatible way of copying something to the clipboard with JavaScript. So I think it has to be a HTML & CSS solution.
Edit
I say "I don't think there's a reliable cross-browser compatible way of copying something to the clipboard with JavaScript" because one solution could've been to catch the copy event (if that's even supported everywhere). I just realised, though, that even if "there's a reliable cross-browser compatible way of copying something to the clipboard with JavaScript", this probably wouldn't work when a user copies the text on a mobile device. Correct me if I'm wrong.
Edit 2
I'm not trying to block people from copying the text in any way. I'm not trying to block access to the text. That's impossible and discouraged anyway. I'm just trying to make it nicer for users who are copying N number of elements' text from my app who end up with unimportant stuff in the clipboard (like timestamp elements' text, etc.)

New Answer
You want to automatically omit certain parts of the text when users select and copy it.
I know two solutions for this:
1. Put the text in different parent elements
2. Hide the unwanted elements when the user starts selecting
Solution #1 is usually used for source code with line numbers. By cleverly applying CSS, two DIVs are aligned vertically so that each line number appears to be on the left of the correct line of source code (which is actually in a different DIV).
That way, you can drag your selection in the source code without getting any of the line numbers. This works well for information that can be displayed in boxes that are vertically/horizontally aligned or, to use a different picture, if you could put things into different table cells.
Solution #2 responds to the first mouse click and applies display: none to elements with a certain class like omit-during-copy.
The advantage of this approach is that the user can see what they are copying (i.e. the unwanted information is vanishing).
The disadvantage is how/when to revert the state.
A variant of this is to use absolute positioning to make elements appear in a certain place. This becomes tedious very quickly if you have to apply the technique to things like aHIDEb - you will have to put enough empty space between a and b to display HIDE between them, then place HIDE pixel perfect in the gap. Not impossible but I'd try to rearrange the information first.
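
For completeness, here is a rough sketch of a third option, different from the two solutions above: handling the copy event that the question mentions and putting a trimmed version of the selection on the clipboard. It reuses the omit-during-copy class from Solution #2; the function name makeCopyHandler is made up. It only looks at the first selection range, and textContent does not add line breaks between block elements the way a normal copy does, so treat it as a starting point only.

```javascript
// Build a 'copy' handler that leaves out elements matching omitSelector.
// getSelection is injected so the handler can be exercised without a browser.
function makeCopyHandler(getSelection, omitSelector) {
  return function onCopy(event) {
    const selection = getSelection();
    if (!selection || selection.rangeCount === 0) return;
    // Work on a detached copy of the selected nodes, not the live page.
    const fragment = selection.getRangeAt(0).cloneContents();
    fragment.querySelectorAll(omitSelector).forEach((node) => node.remove());
    event.clipboardData.setData('text/plain', fragment.textContent);
    event.preventDefault(); // keep the browser from overwriting our data
  };
}

// Browser-only wiring; skipped when there is no document (e.g. under Node).
if (typeof document !== 'undefined') {
  document.addEventListener(
    'copy',
    makeCopyHandler(() => window.getSelection(), '.omit-during-copy')
  );
}
```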
Old Answer
There are two ways to prevent a user from copying text from your HTML page:
1. Don't display it
2. Use a password field
Let me explain: There is a small number of things that you can try with JavaScript in the page itself. If you swallow Ctrl+C, then people will use the mouse. If you swallow the mouse button in the page, they will use the menu bar. If you disable the menu bar of the browser, people will deactivate the option "Allow JavaScript to hide the menu bar".
If you use a password field, people will use the JavaScript console or the "Show Source" or they'll use tools like Tampermonkey to get rid of your pesky intrusion into their lives.
You can try to display an image with the text instead. People will then use OCR. So even if you used a Flash-based HTML viewer to replace the whole page, people could still copy the text. But it would make them pretty mad.
Personal note: I tend to visit pages which try things like that once, and briefly at that. I tend to remember such sites for a long time and with very negative emotions. So if I were you, I'd try this only to drive people away from my site.

In case someone is still looking for a way to prevent copying unimportant elements (such as line numbers), the answer is to use pseudo elements:
.code {
  white-space: pre;
  font-family: monospace;
  padding-left: 40px;
}
.line-number {
  position: absolute;
  left: 0;
  width: 38px;
  color: #888;
  text-align: right;
}
.line-number::before {
  content: attr(data-line-number);
}

<div class="code">
<span class="line-number" data-line-number="1"></span>const greet = () => {
<span class="line-number" data-line-number="2"></span> console.log("Hello World!");
<span class="line-number" data-line-number="3"></span>};
</div>

How to paste text from Word as plain text while preserving defined styles?

I want to let the user paste text into an editor (currently CKEditor). When the text is pasted, all styles and elements which are not white-listed must be removed, including images, tables etc. So 90% should be converted to plain text or be removed, while some simple styles like bold, italic or underlined should be preserved.
I didn't think it would be so complicated. But all I can find in the documentation and the samples of CKEditor is about pasting complete plain text or pasting cleaned-up content from Word without the ability to configure a white-list (and even if I remove all table-related plugins it is still possible to paste a table from MS Word).
I really, really appreciate any hint.
Thanks.

You can't without writing your own parser. Another issue is that MS Word uses Windows-1252 character encoding and most of the web uses UTF-8 encoding, so if you paste from Word and transmit this data via AJAX, it will be garbled.
While Dreamweaver has a pretty good "paste from Word" feature, it's unlikely you'll find an online equivalent. This is a huge and complex problem that would be an application in itself. Even Word's "Save as HTML" can't do a decent job of it.
Sadly, what most have to do is strip it all down to ASCII (paste into Notepad), put it in the editor and mark it back up.

You can add a listener for the 'paste' event in the editor instance: http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.editor.html#event:paste
That way you get the HTML that is going to be pasted and you can perform whatever cleanup you need (for example by inserting that HTML into a div and then working with the DOM, or by using regexps on the string).
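
A minimal sketch of the regexp variant, assuming a white-list of bold, italic and underline tags. The function name sanitizePastedHtml and the tag list are made up for the example, and the wiring assumes CKEditor 4, where the pasted HTML is available as evt.data.dataValue (the older API linked above exposes it differently). Regular expressions are not a real HTML parser, so this is only a starting point; paragraph breaks, for instance, are simply dropped.

```javascript
// Tags that survive the cleanup; everything else is stripped to its text.
const ALLOWED_TAGS = new Set(['b', 'strong', 'i', 'em', 'u']);

function sanitizePastedHtml(html) {
  return html
    // Remove comments, including Word's conditional comment blocks.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Remove leftover conditional markers such as <![if !supportLists]>.
    .replace(/<!\[[\s\S]*?\]>/g, '')
    // Remove script and style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, '')
    // Keep white-listed tags without attributes, drop all other tags.
    .replace(/<\/?([a-z][a-z0-9:]*)\b[^>]*>/gi, (tag, name) => {
      const lower = name.toLowerCase();
      if (!ALLOWED_TAGS.has(lower)) return '';
      return tag.startsWith('</') ? `</${lower}>` : `<${lower}>`;
    });
}

// Browser-only wiring for CKEditor 4 (assumption); skipped elsewhere.
if (typeof CKEDITOR !== 'undefined') {
  CKEDITOR.on('instanceReady', (ready) => {
    ready.editor.on('paste', (evt) => {
      evt.data.dataValue = sanitizePastedHtml(evt.data.dataValue);
    });
  });
}
```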

Found a solution:
1. Listening to the paste event as AlfonsoML wrote.
2. Sending the pasted content of Word to the server.
3. Parsing it with the HTML Agility Pack.
4. Sending it back to the client.
5. Inserting it within the editor.

How do some WYSIWYG editors keep formatting of pasted text?

How do some WYSIWYG editors keep formatting of pasted text? As an example, I copied italic red text from a text editor into a WYSIWYG editor and it kept the text's color and style. How is this happening? For the longest time I thought JavaScript had access to the clipboard's text only. Is this not the case? If not, what is going on?

There's a content type negotiation between the source and target during the copy/paste operation. It happens sort of like this:
1. You copy something into the copy and paste buffer. The copied data is tagged with, more or less, a MIME type and who put it there.
2. When you paste, the paste target tells the copy-and-paste system that it understands a specific list of MIME types.
3. The copy-and-paste system matches the available formats to the desired formats and finds text/html in both lists.
4. Someone (probably the original source of the data) then converts the paste buffer to text/html and drops it in the editor.
That's pretty much how things worked back when I was doing X11/Motif development (hey! get off my lawn you rotten kids!) so I'd guess that everyone does it pretty much the same way.
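
In a browser, the matching step can be sketched roughly like this, using the formats offered on the paste event's clipboardData. The function name negotiateFormat and the preference order are assumptions for the example; a real editor would insert the result into the document instead of logging it.

```javascript
// Return the first format the target accepts that the source also offers,
// or null when the two lists have nothing in common.
function negotiateFormat(offered, accepted) {
  return accepted.find((type) => offered.includes(type)) || null;
}

// Browser-only wiring; skipped when there is no document (e.g. under Node).
if (typeof document !== 'undefined') {
  document.addEventListener('paste', (event) => {
    const offered = Array.from(event.clipboardData.types);
    // Prefer rich content, fall back to plain text.
    const type = negotiateFormat(offered, ['text/html', 'text/plain']);
    if (type) console.log(type, event.clipboardData.getData(type));
  });
}
```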

JavaScript has no direct access to the clipboard in general. However, all major browsers released over the past few years have a built-in WYSIWYG editing facility, via the contenteditable attribute/property of any element (which makes just that element editable) and the designMode property of document objects (which makes the whole document editable).
While the user edits content in the page, if they trigger a paste (via keyboard shortcuts such as Ctrl + V or Shift + Insert or via the Edit or context menus), the browser automatically handles the whole pasting process without any intervention from JavaScript. Part of this process includes preserving formatting wherever possible.
However, the HTML this produces can be gruesome and varies heavily between browsers. Many WYSIWYG editors such as TinyMCE and CKEditor employ tricks to intercept the pasted content and clean it before it reaches the editor's editable area.

What you're seeing is a rich text editor. There's some information in this Wikipedia article: http://en.wikipedia.org/wiki/Online_rich-text_editor

I think it copied the selected DOM instead
