Replace / restrict non-standard characters in CKEDITOR - javascript

I have a CKEDITOR instance (version 4.5.7) into which users input content. This content posts to a database field with the collation SQL_Latin1_General_CP1_CI_AS.
The problem comes when a user pastes text from Word or a similar rich-text editor. Two characters in particular get malformed when they hit the database: ” (&rdquo;, the right double quotation mark) and – (&ndash;, the en dash).
I have already set config.entities to false to prevent the characters from being converted into their HTML equivalents. Now I'm looking for a place where I can intercept the process to find/replace any offending characters. Although the javascript for this sort of thing is easy enough ( text = text.replace('”', '"') ), I'm not sure where to put it in order to make this happen. I've tried placing it in various places within the CKEDITOR.htmlParser.basicWriter function, but nothing so far has worked.
This seems like it would be a fairly common problem - is there perhaps a way to set collation on the editor so it matches the database?
Thank you for any advice.

I kept plunking away in the basicWriter function until, to my surprise, I eventually found one place that actually works. This is the process I used to solve the problem without editing ckeditor.js:
1. Download and open an uncompressed version of the ckeditor.js file.
2. Locate and copy the entire CKEDITOR.htmlParser.basicWriter function into the bottom of your config.js file. This redefines the function, overriding the real one while letting us customize it without necessarily breaking future updates.
3. In the copied function within config.js, locate the getHtml section and customize the html variable before it gets returned. Below is a template to help you locate this section:
getHtml: function( reset ) {
    var html = this._.output.join( '' );
    // this is where we can replace individual characters or make other
    // customizations; the g flag replaces every occurrence, not just the first
    html = html.replace( /”/g, '"' );
    html = html.replace( /–/g, '-' );
    if ( reset )
        this.reset();
    return html;
}
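If more characters than these two turn out to cause trouble, the replacements could be collected in one lookup table instead of one replace call per character. The following is only a sketch: the helper name normalizePunctuation and the choice of characters are assumptions, not part of CKEditor, and the function would have to be called on the html string yourself (for instance inside getHtml).

```javascript
// Hypothetical helper (not a CKEditor API): maps common Word "smart"
// punctuation to plain ASCII equivalents.
const REPLACEMENTS = {
  '\u201C': '"',   // left double quotation mark
  '\u201D': '"',   // right double quotation mark
  '\u2018': "'",   // left single quotation mark
  '\u2019': "'",   // right single quotation mark
  '\u2013': '-',   // en dash
  '\u2014': '-',   // em dash
  '\u2026': '...'  // horizontal ellipsis
};

function normalizePunctuation(text) {
  // One global pass; each matched character is looked up in the table.
  return text.replace(/[\u201C\u201D\u2018\u2019\u2013\u2014\u2026]/g, function (ch) {
    return REPLACEMENTS[ch];
  });
}
```

Characters that are not in the table pass through unchanged, so the list can be extended as new offenders show up.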

HTML entities in CSS content (convert entities to escape-string at runtime)

I know that HTML entities like &nbsp; or &ouml; or &eth; cannot be used inside CSS like this:
div.test:before {
    content:"text with html-entities like &nbsp; or &ouml; or &eth;";
}
There is a good question with good answers dealing with this problem: Adding HTML entities using CSS content
But I am reading the strings that are put into the CSS content from a server via AJAX. The JavaScript running on the user's client receives text with embedded HTML entities and creates style content from it instead of putting it as a text element into an HTML element's content. This method helps against thieves who try to steal my content via copy and paste: text that is not part of the HTML document (but part of CSS content) is really hard to copy. This method works fine. The only nasty problem is with those HTML entities.
So I need to convert HTML entities into Unicode escape sequences at runtime. I can do this either on the server with a Perl script or on the client with JavaScript, but I don't want to write a subroutine that contains a complete list of all existing named entities. There are more than 2200 named entities in HTML5, as listed here: http://www.w3.org/TR/2011/WD-html5-20110113/named-character-references.html And I don't want to change my subroutine every time this list gets changed. (Numeric entities are no problem.)
Is there any trick to perform this conversion with JavaScript? Maybe by adding, reading and removing content to the DOM? (I am using jQuery.)
I've found a solution:
var text = 'Text that contains html-entities';
var myDiv = document.createElement('div');
$(myDiv).html(text);
text = $(myDiv).text();
$('#id_of_a_style-element').html('#id_of_the_protected_div:before{content:"' + text + '"}');
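The snippet above inserts the decoded characters directly into the style rule, which breaks if the text contains a double quote or a backslash. As a possible refinement, the decoded text could be turned into CSS escape sequences first. This is a minimal sketch, assuming the text has already been decoded as shown above; the helper name toCssEscapes is hypothetical.

```javascript
// Hypothetical helper: converts already-decoded text into a form that is
// safe inside a CSS string, using CSS escapes (backslash + hex code point,
// terminated by a single space).
function toCssEscapes(text) {
  let out = '';
  for (const ch of text) {            // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    const needsEscape =
      cp < 0x20 || cp > 0x7E ||       // control characters and non-ASCII
      ch === '"' || ch === "'" || ch === '\\';
    out += needsEscape ? '\\' + cp.toString(16).toUpperCase() + ' ' : ch;
  }
  return out;
}
```

The result of toCssEscapes(text) could then be concatenated into the content:"..." rule in place of the raw text.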
Writing the question got me halfway to this answer. I hope it helps others too.

How to develop custom filters for the Imagus hover zoom extension?

After I read about Hover Zoom being evil (yikes!), two articles made me instantly switch to another one, called Imagus:
Hoverzoom’s Malware controversy, and Imagus alternative - ghacks.net
Imagus is a Hover Zoom Replacement to Enlarge Images on Mouseover - LifeHacker
Imagus seems to fit the bill by doing pretty much what Hover Zoom also could, but in addition, it seems to support custom filters (to support more sites), in addition to the huge bunch it already comes packed with.
In the options page, on Chrome, the filters section looks deliciously hackable.
However, at the same time, it seems to be written in what I would call Perl Javascript.
I consider myself well-versed in Javascript, DOM and Regex, but it's just painful to try to guess what that is doing, so I looked for documentation. It seems like there was a MyOpera blog, and now the website of the project is, for the time being, hosted on Google Docs.
The page doesn't mention anything about how to develop "filters" (or "sieves", as written in that page?)
So, how can I develop a custom filter? I'm not aware of all the possibilities (it seems to be pretty flexible), but even a simple example like just modifying URLs would be good. (turning /thumb/123.jpg into /large/123.jpg or something).
Or even just an explanation of the fields. They seem to be:
link
url
res
img
to
note <- Probably Comment
The fields can contain a JavaScript function or a regex.
link receives the address of any link you hover over.
url uses captured parentheses values from the link field to make a URL.
res receives whatever page, as text, was pointed to by url or link.
If one of them is empty, that step is skipped, e.g. with no url, res just loads from link's output.
A simple example is the xkcd filter:
link:
^(xkcd\.(?:org|com)/\d{1,5})/?$
Finds links to xkcd comics. If you're unfamiliar with regex, anything between the parentheses is saved and can be used in Imagus as "$n" to refer to the nth capture. Note that if there's a "?:" right after the opening parenthesis, the group won't get captured.
url:
$1/info.0.json
This simply appends "/info.0.json" to the address from link.
res:
:
if ($._[0] != '{') $ = null;
else $ = JSON.parse($._), $ = [$.img, [$.year, ('0'+$.month).slice(-2),
('0'+$.day).slice(-2)].join('-') + ' | ' + $.safe_title + ' - ' + $.alt + ' ' +
$.link];
return $;
This javascript function parses the JSON file and returns an array where the first element is the link and the second is the caption text displayed under the hoverzoomed image.
If you return just a link then the caption will be the alt text of the link.
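For readers who find the compact res code above hard to follow, here is the same logic written out as an ordinary function that can be run outside Imagus. It is only a sketch: the function name xkcdRes and the sample JSON values are made up for illustration, and inside Imagus the page text arrives as $._ rather than as a function argument.

```javascript
// Readable stand-alone version of the xkcd res logic.
// `body` is the text fetched from <comic>/info.0.json.
function xkcdRes(body) {
  // Anything that does not look like a JSON object is rejected.
  if (body[0] !== '{') return null;

  const info = JSON.parse(body);
  // Zero-pad month and day to two digits, e.g. "3" -> "03".
  const pad = function (n) { return ('0' + n).slice(-2); };
  const date = [info.year, pad(info.month), pad(info.day)].join('-');

  // First element: image URL. Second element: caption text.
  return [info.img, date + ' | ' + info.safe_title + ' - ' + info.alt + ' ' + info.link];
}

// Sample input with made-up values, using the fields the filter reads.
const sample = JSON.stringify({
  img: 'https://imgs.xkcd.com/comics/example.png',
  year: '2013', month: '3', day: '1',
  safe_title: 'Example', alt: 'Alt text', link: ''
});
const result = xkcdRes(sample);
```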
img is used the same way as link, but for image sources.
to is used the same way as res or url.
A simple use case is when you want to redirect from thumbnails to high-res images, like the filter for wikimapia.org.
img:
^(photos\.wikimapia\.org/p/[^_]+_(?!big))[^.]+
This finds any wikimapia image that doesn't have big in the name.
to:
$1big
Adds big to the url.
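The img/to pair can be tried out in plain JavaScript to see what a pattern does before pasting it into Imagus. The sketch below assumes that to behaves like the replacement argument of String.prototype.replace, which matches the "$1" syntax but is not confirmed by any documentation; the function name rewriteImageUrl and the sample addresses are hypothetical.

```javascript
// Rough simulation of an img + to rule (an assumption about how Imagus
// applies them, not its actual implementation).
function rewriteImageUrl(src, pattern, template) {
  // No match means the rule does not apply to this image.
  return pattern.test(src) ? src.replace(pattern, template) : null;
}

// The wikimapia img pattern from above, with "/" escaped for a regex literal.
const wikimapiaImg = /^(photos\.wikimapia\.org\/p\/[^_]+_(?!big))[^.]+/;

// Made-up thumbnail address: the size suffix after "_" is swapped for "big".
const large = rewriteImageUrl('photos.wikimapia.org/p/00/01/23/45/67_75.jpg', wikimapiaImg, '$1big');

// An address that already ends in "_big" is left alone (returns null).
const untouched = rewriteImageUrl('photos.wikimapia.org/p/00/01/23/45/67_big.jpg', wikimapiaImg, '$1big');
```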
note is just for notes.
Some filters have links to API docs here.
Now, there's no documentation for this feature yet, so I probably missed a lot, but hopefully it'll be enough.
Cheers.

Javascript Bookmarklet Unresponsive

Javascript newb here. Creating a bookmarklet to automate a simple task at work. Mostly a learning exercise. It will scan a transcript on CNN.com, for instance: (http://transcripts.cnn.com/TRANSCRIPTS/1302/28/acd.01.html). It will grab the lead stories at the top of the page, the name and title of the guests on the show, and format them so that they can be copy pasted into another document.
I've come up with a simple version that includes some jQuery that grabs the subheading and then uses a regular expression to find the names of the guests (it will also exclude everything between (begin videoclip) and (end videoclip), but I haven't gotten that far yet). It then alerts them (it will eventually print them in a pop-up window; alert is just for troubleshooting purposes).
I'm using http://benalman.com/code/test/jquery-run-code-bookmarklet/ to create the bookmarklet. My problem is that once the bookmarklet is created it is completely unresponsive. Click on it and nothing happens. I've tried minimizing the code first with no result. My guess is that cnn.com's javascript is conflicting with mine but I'm not sure how to get around that. Or do I need to include some code to load and store the text on the current page? Here's the code (I've included comments, but I took these out when I used the bookmarklet generator.) Thanks for any help!
//Grabs the subheading
var leadStories=$(".cnnTransSubHead").text();
//Scans the webpage for guest name and title. Includes a regular expression to find any
//string that starts with a capital letter, includes a comma, and ends in a colon.
var scanForGuests=/[A-Z ].+,[A-Z0-9 ].+:/g;
//Joins the array created by scanForGuests with a semicolon instead of a comma
var guests=scanForGuests.join('; ');
//Creates an alert in the proper format including stories and guests.
alert("Lead Stories: " + leadStories + ". " + guests + ". SEE TRANSCRIPT FIELD FOR FULL TRANSCRIPT.")
Go to the page. Open up developer tools (ctrl+shift+j in chrome) and paste your code in the console to see what's wrong.
The $ in var leadStories = $(".cnnTransSubHead").text(); is from jQuery and the link provided does not have jQuery loaded into the page.
On any modern browser you should be able to achieve the same results without jQuery:
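// getElementsByClassName returns an HTMLCollection, which has no .map(),
// so copy it into a real array first
var leadStories = Array.from(document.getElementsByClassName('cnnTransSubHead'))
    .map(function(el) { return el.innerText; });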
Next we have:
var scanForGuests=/[A-Z ].+,[A-Z0-9 ].+:/g;
var guests=scanForGuests.join('; ');
scanForGuests IS a regular expression; you never actually matched it against anything, so .join() is going to throw an error (a RegExp object has no join method). I'm not exactly sure what you're trying to do. Are you trying to scan the full text of the page for that regex? In that case something like this would be your best bet:
document.body.innerText.match(scanForGuests);
Keep in mind that while innerText removes HTML markup, it's far from perfect, and what shows up in it is very much at the mercy of how the page's HTML is structured. That said, on my quick test it seems to work.
Finally, for something like this you should use an immediately invoked function or you're sticking all your variables into the global context.
So putting it all together you get something like this:
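(function() {
    // HTMLCollection has no .map(), so convert it to a real array first
    var leadStories = Array.from(document.getElementsByClassName('cnnTransSubHead'))
        .map(function(el) { return el.innerText; });
    var scanForGuests = /[A-Z ].+,[A-Z0-9 ].+:/g;
    // match() returns null when nothing matches, so fall back to an empty array
    var guests = (document.body.innerText.match(scanForGuests) || []).join("; ");
    alert("Leads: " + leadStories + " Guests: " + guests);
})();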
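The matching step can also be tried on a plain string, without a browser, to check what the regex picks up. This is just a sketch: the function name findGuests and the sample transcript lines are invented for illustration.

```javascript
// Applies the guest regex to any text and always returns an array.
function findGuests(text) {
  var scanForGuests = /[A-Z ].+,[A-Z0-9 ].+:/g;
  // With the g flag, match() returns all matches, or null if there are none.
  return text.match(scanForGuests) || [];
}

// Made-up transcript lines in the "NAME, TITLE: speech" shape.
var sampleTranscript =
  'ANDERSON COOPER, CNN ANCHOR: Good evening.\n' +
  'JOHN DOE, FORMER SENATOR: Thanks for having me.';

var guests = findGuests(sampleTranscript).join('; ');
```

Because "." does not match a newline, each match stays within one line of the transcript; lines that contain a comma and a later colon for other reasons will be picked up too, so the pattern may need tightening on real pages.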

Javascript Html Swedish characters strange behavior

I would like to ask a question about a strange behavior I see when using escaped characters for some Swedish letters.
More specifically, in order to support a multilingual site, I have a JSON file where I have specified all required messages in Swedish, e.g. 'Avancerad s&ouml;k'.
Then, when the page loads the first time, I set this value on a text input and it is displayed properly: 'Avancerad sök'. But when I click a button and set the value of this input again, I get: 'Avancerad s&ouml;k'.
Has anyone faced a similar problem?
Thanks a lot!
Code:
q('#keyword').val(qLanguage.qAdvancedHint);
I execute this code both times. qLanguage is an object that I fill from the JSON file, and qAdvancedHint is a specific key.
I don't know what this specific encoding is called. I tested it with JavaScript's unescape method, but that didn't work.
However, one solution, albeit a bad/ugly one, could be to ask jQuery to parse it for you and then set the result as the value property:
var text = $("<span/>").html(qLanguage.qAdvancedHint).text();
q('#keyword').val(text);
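If pulling in the DOM for this feels too heavy, a small decoder is another option. The sketch below assumes the JSON values contain HTML character references such as &ouml; or &#246;, which is a guess about the data. It handles numeric references plus a deliberately tiny table of named ones; the function name decodeEntities and the table contents are hypothetical, and unknown names are left untouched.

```javascript
// Small, intentionally incomplete table of named references (Swedish letters).
var NAMED_ENTITIES = {
  ouml: '\u00F6', auml: '\u00E4', aring: '\u00E5',
  Ouml: '\u00D6', Auml: '\u00C4', Aring: '\u00C5',
  amp: '&'
};

function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points as they were.
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Named reference: decode only if it is in the table.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body) ? NAMED_ENTITIES[body] : match;
  });
}
```

The decoded string could then be passed to .val() in place of the raw value.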

textmate reformat with 2 spaces

I've set TextMate to use soft tabs with 2 spaces for my file. But when I try to reformat the entire document, it uses 2 hard tabs as the indents.
Regular indents work as I want them to; only the document reformat doesn't. Any way to get TextMate to be obedient?
Thanks.
The JavaScript bundle's "Reformat Document / Selection" command is passing the document's text to the js_beautify function in the bundle's beautify.php file (found on my system and probably by default at /Applications/TextMate.app/Contents/SharedSupport/Bundles/JavaScript.tmbundle/Support/lib/beautify.php). If you take a look at the function definition you'll see that there's a second parameter, $tab_size, with a default value of 4. There's a line in the bundle that reads print js_beautify($input);. Change this to print js_beautify($input, 2); and you should, I expect, get tab stops with two spaces.
To make it a bit more flexible, use the TextMate environment variable TM_TAB_SIZE, as in print js_beautify( $input, getenv('TM_TAB_SIZE' ) );, which should update how the command operates if you ever change your tab size.
Note, I've tested none of this. :) Just took a look at the bundle and tracked down what seems to be necessary.
So, I tried Chuck's suggestion and it gave me an error. I did this to "fix it". I'm sure it could be done more elegantly, but this worked for me.
Open up the same file Chuck says to open up; line 50 (or so) should look like this:
function js_beautify($js_source_text, $tab_size = 4)
Change $tab_size to 1:
function js_beautify($js_source_text, $tab_size = 1)
Now, around line 56 where it says:
$tab_string = str_repeat(' ', $tab_size);
change the space to a tab like so:
$tab_string = str_repeat("\t", $tab_size);
That worked for me.
