Microsoft Word Metadata on copy and paste into html (contenteditable) - javascript

I'm wondering if there is any way to ignore or strip the metadata from a paragraph or sentence that is copied from Microsoft Word (or another text-editing program) and pasted into an HTML div with contenteditable = true?
I am building an HTML app that is a text editor, but every time somebody pastes text that was already formatted in another program (Word, other HTML pages), it brings along metadata that I don't know how to strip out.

You might consider using TinyMCE for this, as it is a WYSIWYG text editor that has that feature. There is a Demo Page that has a "Paste From Word" button at the top of the editor.
If using TinyMCE is not an option, you can download its source and determine how TinyMCE does it.
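If you would rather roll something small yourself, here is a minimal sketch of the general idea: intercept the paste, read the HTML from the clipboard, and clean it before inserting it. This is not how TinyMCE does it. The helper names stripWordMarkup and attachCleanPaste are made up for illustration, and the regex patterns are assumptions about what Word typically emits, not a complete list.

```javascript
// Hypothetical helper: removes markup that Word commonly adds to clipboard HTML.
// The patterns are illustrative assumptions, not an exhaustive or official list.
function stripWordMarkup(html) {
  return html
    // HTML comments, including conditional ones like <!--[if gte mso 9]>...<![endif]-->
    .replace(/<!--[\s\S]*?-->/g, "")
    // Whole elements that only carry metadata, styling or script
    .replace(/<(style|xml|script|title)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<\/?(meta|link)\b[^>]*>/gi, "")
    // Namespaced Office tags such as <o:p> and </o:p>
    .replace(/<\/?[a-z]+:[^>]*>/gi, "")
    // class, style and lang attributes
    .replace(/\s(class|style|lang)=("[^"]*"|'[^']*'|[^\s>]+)/gi, "");
}

// Hypothetical wiring for a contenteditable element (browser only).
function attachCleanPaste(element) {
  element.addEventListener("paste", function (event) {
    var html = event.clipboardData.getData("text/html");
    if (!html) return; // plain-text paste: let the browser handle it
    event.preventDefault();
    // execCommand is deprecated but still widely supported
    document.execCommand("insertHTML", false, stripWordMarkup(html));
  });
}
```

Regex cleanup like this is crude (it can also touch text that merely looks like an attribute, and it does not remove event-handler attributes), so for anything beyond a prototype a DOM-based sanitizer is the safer choice.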

Related

How do popular JS WYSIWYG editors access the style of copy-pasted webpages?

I don't understand how WYSIWYG editors such as Froala or Redactor are able to render, as proper HTML, something that you just copied from another webpage (and that would have been rendered only as text if you pasted it somewhere else).
If you want to check out the behavior I am talking about, just
go to this page: https://www.froala.com/wysiwyg-editor
then delete all the demo content in the editor
then copy some article from Medium, Wikipedia or wherever
then paste it into the editor
You can see it is rendered as proper HTML (you can switch the editor's view to see the actual HTML behind it): bold is still bold, titles are titles, etc., as if the editor were somehow able to access the style of the copied page.
How do they access this style?
I am one of the people working on the Froala Editor. The browser does most of the work by placing the right style in the browser clipboard object. We just read the style from the browser clipboard when the HTML gets pasted into the editor.
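To make that concrete, here is a small sketch of reading what the browser put on the clipboard during a paste event. It is not Froala's code. The function name readPastedFormats is made up for illustration; clipboardData.getData with the "text/html" and "text/plain" types is the standard browser API.

```javascript
// Hypothetical helper: returns the HTML and plain-text flavors of a paste.
// getData returns an empty string when a flavor is not on the clipboard.
function readPastedFormats(event) {
  var data = event.clipboardData;
  if (!data) return { html: "", text: "" };
  return {
    html: data.getData("text/html"),
    text: data.getData("text/plain")
  };
}

// Browser usage (assumes an element with id "editor" exists):
// document.getElementById("editor").addEventListener("paste", function (e) {
//   console.log(readPastedFormats(e).html);
// });
```

When you copy from a webpage, the HTML flavor usually carries the tags and inline styles, which is why an editor can reproduce bold text and headings without ever seeing the source page.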

Detect if an HTML table has been pasted into a textarea

I was wondering if there was a way to detect (or at least make a good assumption) whether text pasted into a textarea includes content copied from an HTML table?
I'm finding users of my website are pasting tabular data (from other websites) into their comments and I'm wanting to clean up the way my website displays those comments.
I'm using PHP, but I'm not too fussed if there's a way to do this with Javascript.
And bonus points if your suggestion can keep the table formatting :)
A pure textarea can't receive formatted content. If your users copy a table, div, or any other HTML structure from other sites and paste it into a textarea, you'll have access only to the visible text of the copied content, not the HTML code. With a textarea, the only way to paste HTML code is if your user copies the code itself =).
An alternative is to use a WYSIWYG editor like Redactor or CKEditor. It can retain rich text, and you'll be able to get the HTML that your users paste there.
Or you can keep it simple and use the contenteditable attribute on another tag (like a div) and test whether there's a table using a regex, this way:
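<div id="yourDiv" contenteditable>Paste a table here!!</div>
// Read innerHTML after the paste has happened (e.g. on submit)
var yourHTML = document.getElementById("yourDiv").innerHTML;
// [\s\S] also matches newlines, which pasted tables usually contain; "i" ignores tag case
var thereIsATableHere = /<table\b[^>]*>[\s\S]*?<\/table>/i.test(yourHTML);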
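If you have to stay with a plain textarea, a guess is still possible. The sketch below assumes that browsers commonly put a tab between cells and a newline between rows when a copied table is pasted as plain text. That is a heuristic, not a guarantee, and the function name parseTabularText is made up for illustration.

```javascript
// Hypothetical heuristic: treats tab-separated lines as table rows.
// Returns an array of rows (arrays of cell strings), or null if the text
// does not look tabular (fewer than two tabbed lines, or ragged rows).
function parseTabularText(text) {
  var rows = text
    .split(/\r?\n/)
    .filter(function (line) { return line.indexOf("\t") !== -1; })
    .map(function (line) { return line.split("\t"); });
  if (rows.length < 2) return null;
  var width = rows[0].length;
  var consistent = rows.every(function (row) { return row.length === width; });
  return consistent ? rows : null;
}
```

The returned rows could then be rendered as a real table when the comment is displayed, which would keep the formatting. The same check is easy to port to PHP with explode on newlines and tabs.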

What makes editors paste data into a textarea as HTML, like in rich WYSIWYG editors?

I want to copy/paste html from websites and store them in mysql database. To do this I have checked out CKEditor which allows me to paste html, even word documents and it generates html code for it. Since all I want is to "generate" the pasted data as html, instead of using a full wysiwyg editor like CKEditor, I want to write some code (perhaps with jquery) to convert the pasted data to have html tags and formatting.
To achieve this functionality, what do these online editors do? How do they convert the clipboard data to html code? Why do I get only text when I paste html formatted text or divs or buttons to this textarea here and images and properly sized divs on wysiwyg editors?
Do the editors access the clipboard data and manipulate it? Does clipboard save formatting data in an organized manner allowing the "CKEditor" or others to manipulate it?
Can this be done with jQuery? Or do we need server-side code as well?
If you can shed some light on this subject I would appreciate it. I only want to know the method so that I can write appropriate code for it.
For reference: http://ckeditor.com/demo
Here's a crude demo which works in Chrome, IE9 and Safari: http://jsfiddle.net/SN6PQ/2/
<div contenteditable="true" id="paste-target">Paste Here</div>
$(function(){
    $("#paste-target").on("paste", function(){
        // delay, or else innerHTML won't be updated
        setTimeout(function(){
            // option 1 - for pasting text that looks like HTML (e.g. a code snippet)
            alert($("#paste-target").text());
            // option 2 - for pasting actual HTML (e.g. select a webpage and paste it)
            alert($("#paste-target").html());
        }, 100);
    });
});
Not sure if this is what you are after, but it alerts HTML on paste. Keep in mind that a content editable element may change the markup on paste.

Keeping the format of pasted text in a textarea without visible mark-up info

I had a case where I copied bold text from a web page into a textarea of another page. On paste, the text area kept the text's bold format. When I pasted the same text into a text file I could see no markups or formatting info along with the text. How is the textarea keeping the format of the text?
Thanks in advance.
EDIT:
How is the textarea or JS-driven text editor formatting the pasted text without any formatting info being passed along with the pasted text?
If formatting info is being passed internally by the browser, how is the webpage receiving that info?
Textarea is plain text, so are you sure you didn't paste it into a JS driven text editor in the browser? As for pasting it into a text file... well, this is an OS driven event. It will only work if the application allows formatting of the text (e.g. if you paste it into MS Word or OpenOffice.org Writer).
Answering the additional questions:
Again, the textarea is plain text, so there's no formatting or formatting information. The JS driven editor generates the appropriate code depending on what was pasted. This information is passed by the OS' copy&paste functionality. If it's plain text, then there will be no formatting. If the paste contains formatting codes, it will generate the appropriate markup.
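As a rough sketch of that decision, an editor could prefer the HTML flavor of the clipboard and fall back to escaped plain text when there is none. The function names escapeHtml and clipboardToMarkup are made up for illustration; the clipboardData argument is assumed to be the object from a browser paste event.

```javascript
// Escapes characters that would otherwise be parsed as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: picks the markup to insert for a paste.
function clipboardToMarkup(clipboardData) {
  var html = clipboardData.getData("text/html");
  if (html) return html; // formatting information was passed along
  var text = clipboardData.getData("text/plain");
  // No formatting available: keep only the line breaks
  return escapeHtml(text).replace(/\r?\n/g, "<br>");
}
```

A real editor would also sanitize the HTML branch before inserting it, since clipboard HTML comes from an untrusted page.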

How do Rich Text Editors in HTML documents achieve their rich text formatting?

Could you please explain how the text in the textarea gets styled in rich-text editors? I've tried to style text in the form but it doesn't change.
Is it JS that recognizes characters typed by the user? How's that done?
Edit:
After a bit of research (Google rules!), I've found some excellent information. For others who might be interested, these links were very helpful to me:
http://www.webreference.com/programming/javascript/gr/column11/
http://www.webreference.com/programming/javascript/gr/column12/
I think the answer is that what you're looking at is typically NOT an actual <textarea>. It's a <div> or <span> made to look like a text area. A regular HTML textarea doesn't have individual formatting of the text.
"Rich-text editors" will have controls that modify the contents of the span/div with regular HTML markup (<strong>, <em>, etc.) to emulate a full-blown rich text editor.
Usually, the rich text editor will provide a variable in which to specify a style sheet. That style sheet will then be loaded and applied to the editing area. (Most, if not all, rich text editors use an IFRAME to display the editor in, and obviously styles specified in the main document won't apply to it.)
It's a textarea when the HTML is sent to the user, but the editor replaces it with a div once its script can run.
This way, the code gracefully degrades: Users with unsupported web browsers or disabled JavaScript can still edit the text in the textarea while all the other users get a nice rich text editor.
Basically, the TEXTAREA contents are used as the HTML source for the IFRAME.
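Here is a minimal sketch of that progressive enhancement, using a contenteditable div rather than an IFRAME to keep it short. The function name enhanceTextarea is made up for illustration, and passing the document in as a parameter is just a choice that makes the function easy to try outside a page.

```javascript
// Hypothetical helper: hides a textarea and shows an editable div in its place.
// The textarea stays in the form, so the server still receives its value.
function enhanceTextarea(textarea, doc) {
  var editor = doc.createElement("div");
  editor.contentEditable = "true";
  // The textarea contents become the HTML source of the editor
  editor.innerHTML = textarea.value;
  textarea.style.display = "none";
  textarea.parentNode.insertBefore(editor, textarea);

  // Copy the edited HTML back so a normal form submit sends it
  function sync() {
    textarea.value = editor.innerHTML;
  }
  if (textarea.form) {
    textarea.form.addEventListener("submit", sync);
  }
  return { editor: editor, sync: sync };
}

// Browser usage (assumes a <textarea id="body"> inside a form):
// enhanceTextarea(document.getElementById("body"), document);
```

If JavaScript is disabled, none of this runs and the user simply edits the raw textarea, which is the graceful degradation described above.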
