Getting non-html text from CKeditor

In the "insert news" section of my application, I use a substring of the news content as the news summary. Users enter the news content through CKEditor, and I build the summary with the substring method to get a fixed length of that content. But CKEditor gives me text with HTML tags rather than plain text, so when I take the substring, the summary gets messed up. How do I get raw text from this control?
I read this, but I can't use the getText() method.

Try code like this:
CKEDITOR.instances.editor1.document.getBody().getText();
It works fine for me. You can test it on http://ckeditor.com/demo. It's not ideal (text in table cells is joined together without spaces), but may be enough for your needs.
EDIT (20 Dec 2017): The CKEditor 4 demo was moved to https://ckeditor.com/ckeditor-4/ and uses different editor names, so the new code to execute is:
CKEDITOR.instances.ckdemo.document.getBody().getText();
Note that this works in the "Article editor"; in the "Inline editor" you need to get the text of a different element:
CKEDITOR.instances.editor1.editable().getText();
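The two variants above can be folded into one helper. This is only a sketch, assuming a CKEditor 4 instance is passed in; getEditorText is a made-up name, not part of the CKEditor API. It prefers editable() when it is available and otherwise falls back to document.getBody().

```javascript
// Hypothetical helper (not a CKEditor API): plain text of a CKEditor 4 instance.
function getEditorText(editor) {
  // editable() returns the element that holds the content being edited.
  if (typeof editor.editable === "function") {
    var editable = editor.editable();
    if (editable) {
      return editable.getText();
    }
  }
  // Fall back to the body of the editing document (classic, iframe-based editor).
  return editor.document.getBody().getText();
}
```

In a page it would be called as getEditorText(CKEDITOR.instances.editor1), with editor1 replaced by your own instance name.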

Do it like this:
//getSnapshot() retrieves the "raw" HTML, without tabs, linebreaks etc
var html=CKEDITOR.instances.YOUR_TEXTAREA_ID.getSnapshot();
var dom=document.createElement("DIV");
dom.innerHTML=html;
var plain_text=(dom.textContent || dom.innerText);
alert(plain_text);
Voilà, grab the portion of plain_text you want.
UPDATE / EXAMPLE
add this javascript
<script type="text/javascript">
function createTextSnippet() {
//example as before, replace YOUR_TEXTAREA_ID
var html=CKEDITOR.instances.YOUR_TEXTAREA_ID.getSnapshot();
var dom=document.createElement("DIV");
dom.innerHTML=html;
var plain_text=(dom.textContent || dom.innerText);
//create a 128-char snippet and set it on the hidden form field
var snippet=plain_text.substring(0,128);
document.getElementById("hidden_snippet").value=snippet;
//return true, ok to submit the form
return true;
}
</script>
In your HTML, add createTextSnippet as the onsubmit handler of the form, e.g.
<form action="xxx" method="xxx" onsubmit="return createTextSnippet();">
Inside the form, between <form> and </form>, insert
<input type="hidden" name="hidden_snippet" id="hidden_snippet" value="" />
When the form is submitted, you can access hidden_snippet on the server side along with the rest of the form fields.
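A fixed-length cut can split a word in half, and line breaks in the extracted text end up inside the summary. The following is a rough sketch of a gentler cut, not the answer's method: makeSnippet is a hypothetical name, and it assumes the plain text has already been extracted as shown above.

```javascript
// Hypothetical helper: build a summary of at most maxLength characters from plain text.
function makeSnippet(plainText, maxLength) {
  // Collapse runs of whitespace (including line breaks) into single spaces.
  var text = plainText.replace(/\s+/g, " ").trim();
  if (text.length <= maxLength) {
    return text;
  }
  var cut = text.slice(0, maxLength);
  // If the cut landed inside a word, back up to the last space so no word is split.
  var lastSpace = cut.lastIndexOf(" ");
  if (text.charAt(maxLength) !== " " && lastSpace > 0) {
    cut = cut.slice(0, lastSpace);
  }
  return cut.trim();
}
```

Inside createTextSnippet the call would be something like makeSnippet(plain_text, 128).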

I personally use this method to keep the code compact and also remove double spaces and line feeds:
var TextGrab = CKEDITOR.instances['editor1'].getData();
TextGrab = $(TextGrab).text(); // html to text
TextGrab = TextGrab.replace(/\r?\n|\r/gm," "); // remove line breaks
TextGrab = TextGrab.replace(/\s\s+/g, " ").trim(); // remove double spaces

I used this function:
function getPlainText( strSrc ) {
var resultStr = "";
// Ignore the <p> tag if it is in very start of the text
if(strSrc.indexOf('<p>') == 0)
resultStr = strSrc.substring(3);
else
resultStr = strSrc;
// Replace <p> with two newlines
resultStr = resultStr.replace(/<p>/gi, "\r\n\r\n");
// Replace <br /> with one newline
resultStr = resultStr.replace(/<br \/>/gi, "\r\n");
resultStr = resultStr.replace(/<br>/gi, "\r\n");
//-+-+-+-+-+-+-+-+-+-+-+
// Strip off other HTML tags.
//-+-+-+-+-+-+-+-+-+-+-+
return resultStr.replace( /<[^<|>]+?>/gi,'' );
}
Function call:
var plain_text = getPlainText(FCKeditorAPI.GetInstance("FCKeditor1").GetXHTML());
I created this fiddle for testing: http://jsfiddle.net/4etVv/3/
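One thing tag stripping with a regex does not handle is character references: &amp;, &nbsp; and the like stay in the result as-is. Below is a small sketch for decoding the most common ones. decodeBasicEntities is a hypothetical name, the table is deliberately tiny, and &nbsp; is mapped to an ordinary space on the assumption that this is what a summary wants.

```javascript
// Hypothetical helper: decode a few common character references left after tag stripping.
// The table is intentionally small; this is not a complete HTML entity decoder.
var BASIC_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: " " };

function decodeBasicEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, function (whole, body) {
    if (body.charAt(0) === "#") {
      // Numeric reference: decimal (&#233;) or hexadecimal (&#xE9;).
      var isHex = body.charAt(1).toLowerCase() === "x";
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    var name = body.toLowerCase();
    // Names that are not in the table are left as they are.
    return Object.prototype.hasOwnProperty.call(BASIC_ENTITIES, name)
      ? BASIC_ENTITIES[name]
      : whole;
  });
}
```

It would be applied to the return value of getPlainText, for example decodeBasicEntities(getPlainText(html)).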

I use this method (needs jQuery):
var objEditor =CKEDITOR.instances["textarea_id"];
var msg = objEditor.getData();
var txt = jQuery(msg).text().replaceAll("\n\n","\n");
hope it helps!

Assuming that editor is your CKEditor instance (CKEDITOR.instances.editor1 from the example above, or event.editor if you are using events), you can use the following code to get the plain-text content:
editor.ui.contentsElement.getChild(0).getText()
Apparently CKEditor adds a "voice label" element to the actual editable content. Hence getChild(0).

Related

Convert HTML to plain text keeping links, bold and italic in Javascript

I am configuring an API to send an email using the content of a publication as the body of the email. The text editor used for the publication saves the text as HTML, so I needed to convert the result into plain text. Other questions have given me a solution, but I would like to keep the bold text, italics and links from the original text. So this is what I have:
Body of a test publication:
This is bold text.This is regular text.This is italic.This is a link.
Then in the script I have the following function:
function htmlToText(html){
//remove code breaks and tabs
html = html.replace(/\n/g, "");
html = html.replace(/\t/g, "");
//keep html breaks and tabs
html = html.replace(/<\/td>/g, "\t");
html = html.replace(/<\/table>/g, "\n");
html = html.replace(/<\/tr>/g, "\n");
html = html.replace(/<\/p>/g, "\n");
html = html.replace(/<\/div>/g, "\n");
html = html.replace(/<\/h[1-6]>/g, "\n");
html = html.replace(/<br>/g, "\n"); html = html.replace(/<br( )*\/>/g, "\n");
html = html.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
//parse html into text
var dom = (new DOMParser()).parseFromString('<!doctype html><body>' + html, 'text/html');
return dom.body.textContent;
}
That gives me some plain text with nice line breaks, but I was wondering if I could get the bold, italic and links.
Thanks.

I had some time on my hands and played around. This is what I came up with:
const copy=document.createElement("div");
copy.innerHTML=container.innerHTML.replace(/\n/g," ").replace(/[\t\n]+/g,"");
const tags={B:["**","**",1], // [<prefix>, <postfix>, <sequence-number> ]
I:["*","*",2],
H2:["##","\n",3],
P:["\n","\n",4],
DIV:["","\n",5],
TD:["","\t",6]};
[...copy.querySelectorAll(Object.keys(tags).join(","))]
.sort((a,b)=>tags[a.tagName][2]-tags[b.tagName][2])
.forEach(e=>{
const [a,b]=tags[e.tagName];
e.innerHTML=(e.matches("TD:first-child") ? "\n": a) + e.innerHTML + b;
});
console.log(copy.textContent.replace(/^ */mg,""));
<div id="container">
<H2>Second level heading</H2>
<div><div>
A <b>first div</b> with a
link (abc) and a
<p>paragraph having itself another link (def) in it.</p>
</div>
</div>
And here is some more <i>"lost" text</i> ...
<table>
<tr><td>one</td><td><b>two</b></td><td>three</td></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
<tr><td>d</td><td>e</td><td>f</td></tr>
</table>
</div>
Instead of using regexp to "parse" the html I chose to actually treat it in a DOM way: I create a new div element (copy) into which I insert the original .innerHTML. For particular element types I then define some pre- and postfixes that should surround the original .innerHTML. These are stored in tags and applied on the freshly created div element.
This is done by selecting all of the "special" elements (as specified by the tags-keys) and processing them in a given sequential order. Afterwards I simply return the .textContent of the modified copy element.
Plain text cannot really render bold or italics text decoration. For this reason I used modifiers in the markdown style (*:italics, **:bold)
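The prefix/postfix table above has no entry for links, which the question also wanted to keep. For environments without a DOM (for example a server-side mail script), here is a rough string-based sketch of the same idea that includes links. htmlToMarkedText is a hypothetical name, and because it relies on regular expressions it is only meant for simple, well-formed editor output.

```javascript
// Hypothetical helper: convert a small subset of HTML to Markdown-style plain text.
// Regex-based, so it only suits simple, well-formed editor output.
function htmlToMarkedText(html) {
  return html
    // Source line breaks and tabs carry no meaning in HTML.
    .replace(/[\r\n\t]+/g, " ")
    // Links become "text (url)"; [^>]* and [\s\S]*? keep each match inside one <a> element.
    .replace(/<a\b[^>]*\bhref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "$2 ($1)")
    // Bold and italic become Markdown markers (\b keeps <br> and <img> out of these).
    .replace(/<\/?(?:b|strong)\b[^>]*>/gi, "**")
    .replace(/<\/?(?:i|em)\b[^>]*>/gi, "*")
    // <br> and block-level closing tags become line breaks.
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/(?:p|div|tr|table|h[1-6])>/gi, "\n")
    // Drop every remaining tag.
    .replace(/<[^>]+>/g, "")
    // Tidy spaces around line breaks.
    .replace(/ *\n */g, "\n")
    .trim();
}
```

The DOM-based version above is the more robust choice in a browser; this variant trades robustness for not needing a document.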

How to find a particular element and replace it with another dynamically in jquery

I have been trying to add some custom features to the TinyMCE editor: a button to highlight text, where the highlighted text should then be replaced with underscores. This means that each mark element, together with its content, should be replaced, turning this:
<p> This yet another moment of trial. We <mark>keep</mark>doing it until it becomes <mark>perfect</mark>.</p>
To this:
<p> This yet another moment of trial. We <b>_____</b>doing it until it becomes <b>____</b>.</p>
I have been trying it with this function
function getContentFromEditor() {
var content = tinymce.activeEditor.getContent();
content = content.replace("<mark>", "<b>______</b>");
document.getElementById("content_display").innerHTML = content;
}
But it only changes the start tag.

This is the regex you need:
/<mark>[^<>]*<\/mark>/g
So your code should be updated like this:
content = content.replace(/<mark>[^<>]*<\/mark>/g, "<b>______</b>");
Here is a sample demo:
var text = '<p> This yet another moment of trial. We <mark>keep</mark>doing it until it becomes <mark>perfect</mark>.</p>';
text = text.replace(/<mark>[^<>]*<\/mark>/g,"<b>______</b>");
console.log(text);
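If the blank should reflect the length of the hidden word, replace also accepts a function as its second argument. A minimal sketch follows; blankOutMarks is a hypothetical name and, like the regex above, it assumes the mark elements contain plain text only.

```javascript
// Hypothetical helper: replace each <mark>...</mark> with a bold blank of the same length.
function blankOutMarks(html) {
  return html.replace(/<mark>([^<>]*)<\/mark>/g, function (whole, word) {
    // One underscore per character of the highlighted text.
    return "<b>" + "_".repeat(word.length) + "</b>";
  });
}
```

The g flag is what makes every mark element get replaced; with a plain string as the pattern, replace stops after the first match.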

After making content changes to the tinyMCE editor dynamically you have to call the save function to update it visually.
Depending on your version:
tinymce.triggerSave();
or
tinymce.activeEditor.save();

First, you need to find all the marked elements.
And then loop over all those marked elements and replace them with the target HTML you want the elements to be replaced with
var replaceWithStr = "<b>____</b>";
var markedElems = document.getElementsByTagName("mark");
var markedElemsArr = [].slice.call(markedElems);
markedElemsArr.forEach(function(elem){
//elem.innerText = replaceWithStr; // this will only replace text
//elem.innerHTML = replaceWithStr; // this will not remove the mark tag, so your underscore will still be highlighted
elem.outerHTML = replaceWithStr; // this will give the required result
})
<p> This yet another moment of trial. We <mark>keep</mark> doing it until it becomes <mark>perfect</mark>.</p>

You can try using regular expressions:
content = content.replace(/<mark>[a-zA-Z]*<\/mark>/g,"<b>______</b>");

replacing text from a paste when looping over html elements

I am trying to replace html links (and eventually other elements) with bbcode when a user does a paste from a document (like gdocs or libre office). So we are dealing with rich html already formatted (which is why it needs to copy HTML and not text).
Essentially, I want to be able to copy stuff pre-written from a document into a textarea on my website without having to manually write BBCode tags in the original document (as it's messy for proof-reading).
Thanks to the help in "Adjust regex to ignore anything else inside link HTML tags", I have gotten most of the way there, but I am stuck on replacing the found tags in the original text.
Here's what I have:
function fragmentFromString(strHTML) {
return document.createRange().createContextualFragment(strHTML);
}
$('textarea').on('paste',function(e) {
e.preventDefault();
var text = (e.originalEvent || e).clipboardData.getData('text/html') || prompt('Paste something..');
var fragment = fragmentFromString(text);
var aTags = Array.from(fragment.querySelectorAll('a'));
aTags.forEach(a => {
text = text.replace(a, "[url="+a.href+"]"+a.textContent+"[/url]");
});
window.document.execCommand('insertText', false, text);
});
You can see it loops over the found a tags and I am essentially trying to replace them from the original text with the new stuff.
Here's an example of the type of content that could be pasted (this is a single link from google docs):
<span style="font-size:14.666666666666666px;font-family:Arial;color:#1155cc;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link test</span>
Expected to be replaced with:
[url=https://www.test.com]Link test[/url]
So I want that HTML replaced, with the BBCode within the original text that's then sent to the textarea from the paste.

The aTags foreach currently does nothing. You need to create a new text node, and replace the existing anchor tag with it.
aTags.forEach(a => {
var new_text = document.createTextNode("[url=" + a.href + "]" + a.textContent + "[/url]");
a.parentNode.insertBefore(new_text, a);
a.parentNode.removeChild(a);
});
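// text is the original HTML string; read the converted text from the fragment instead
window.document.execCommand('insertText', false, fragment.textContent);
This replaces every a tag in the fragment with the given text.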
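For comparison, here is a string-only sketch of the same conversion that works directly on the pasted HTML. htmlLinksToBBCode is a hypothetical name; it assumes double-quoted href attributes and simple markup, and it keeps the href exactly as written, whereas a.href in the DOM version returns the resolved URL.

```javascript
// Hypothetical helper: turn pasted HTML into plain text with [url] BBCode for links.
function htmlLinksToBBCode(html) {
  var stripTags = function (s) {
    return s.replace(/<[^>]+>/g, "");
  };
  // Convert each anchor first, dropping any markup nested inside it.
  var converted = html.replace(
    /<a\b[^>]*\bhref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi,
    function (whole, href, inner) {
      return "[url=" + href + "]" + stripTags(inner) + "[/url]";
    }
  );
  // Then remove whatever other tags remain.
  return stripTags(converted);
}
```

In the paste handler the result would be passed to execCommand('insertText', ...) in place of the raw HTML.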

How can I Strip all regular html tags except <a></a>, <img>(attributes inside) and <br> with javascript?

When a user creates a message there is a multibox, and this multibox is connected to a design panel that lets users change fonts, color, size, etc. When the message is submitted, it is displayed with HTML tags if the user has changed the color, size, etc. of the font.
Note: I need the design panel. I know it's possible to remove it, but that is not an option here :)
It's a SharePoint standard. The only solution I have is to use JavaScript to strip these tags when the message is displayed. The user should only be able to insert links, images and line breaks.
This means that all HTML tags should be stripped except <a></a>, <img> and <br> tags.
It's also important that the attributes inside the <img> tag are not removed. It could be displayed like this:
<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">
How can I accomplish this with javascript?
I used to use the following C# code-behind, which worked perfectly but stripped all HTML tags except the <br> tag.
public string Strip(string text)
{
return Regex.Replace(text, @"<(?!br[\x20/>])[^<>]+>", string.Empty);
}
Any kind of help is appreciated a lot.

Does this do what you want? http://jsfiddle.net/smerny/r7vhd/
$("body").find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
Basically select everything except a, img, br and replace them with their content.

Smerny's answer works well, except when the HTML structure is like this:
var s = '<div><div>Link<span> Span</span><li></li></div></div>';
var $s = $(s);
$s.find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
console.log($s.html());
The live code is here: http://jsfiddle.net/btvuut55/1/
This happens when the content is wrapped in nested elements (two divs in the example above).
jQuery reaches the outermost matched div first and replaces it with its innerHTML, which still contains the span, so the span is retained.
The answer $('#container').find('*:not(br,a,img)').contents().unwrap() fails to deal with tags with empty content.
A working solution is simple: loop from the innermost element outwards:
var $elements = $s.find("*").not("a,img,br");
for (var i = $elements.length - 1; i >= 0; i--) {
var e = $elements[i];
$(e).replaceWith(e.innerHTML);
}
The working copy is: http://jsfiddle.net/btvuut55/3/

With jQuery you can find all the elements you don't want, then use unwrap to strip the tags:
$('#container').find('*:not(br,a,img)').contents().unwrap()

I think it would be better to extract the good tags. It is easier to match a few tags than to remove all the other elements and every HTML possibility. Try something like this; I tested it and it works fine:
// the following regex matches the good tags with attributes and inner content
var ptt = new RegExp("<(?:img|a|br){1}.*/?>(?:(?:.|\n)*</(?:img|a|br){1}>)?", "g");
var input = "<this string would contain the html input to clean>";
var result = "";
var match = ptt.exec(input);
while (match) {
result += match[0]; // match is an array; element 0 is the matched text
match = ptt.exec(input);
}
// result will contain the clean HTML with only the good tags
console.log(result);
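Another option is a direct JavaScript port of the C# pattern from the question, with the negative lookahead widened from br to a, img and br. This is a sketch only: stripExceptAllowed is a hypothetical name, and a regex like this is a display-time cleanup, not a sanitizer for untrusted HTML.

```javascript
// Hypothetical helper: remove every tag except <a>, </a>, <img> and <br>.
function stripExceptAllowed(html) {
  // (?!\/?(?:a|img|br)\b) rejects the allowed tags;
  // \b stops "a" from also matching <abbr>, <article> and similar names.
  return html.replace(/<(?!\/?(?:a|img|br)\b)[^<>]+>/gi, "");
}
```

Attributes on the kept tags are left untouched because those tags are never matched in the first place.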

TinyMce editor not returning tags

Hi guys, I am having a weird problem with the TinyMCE editor. What I am trying to do is select some text, click a button and add a tag at the start and at the end.
For example, if the original text is <p>hello</p>, the end text would be <myTag><p>hello</p></myTag>.
It works fine but when selecting a single line of text the existing tags are not returned. So in the previous example I would get hello only and not <p>hello</p>.
When I select multiple lines it returns the tags.
Here is what I have tried so far:
var se = ed.selection.getContent(); //Doesn't return tags on single line
var be = ed.selection.getNode().outerHTML; //Doesn't work with multiline
var ke = ed.selection.getContent({ format: 'raw' }); //Same as the first option
Any help?

You will need to employ different functions to get the content, depending on what the user selected:
var content;
var node = ed.selection.getNode();
if (node.nodeName != 'P' )
{
content = ed.selection.getContent();
}
else content = node.outerHTML; // the DOM property is outerHTML, not outerHtml

I use this, and it works well:
var textt= tinyMCE.activeEditor.selection.getContent({format : 'text'});
alert(textt);
But note: you should not select text from the start of a paragraph to the end of a paragraph, because in that case (maybe a TinyMCE bug) it can't get the content.
