Disabling ranges of Unicode characters in input field - javascript

I have a textarea meant for plain text that users sometimes copy and paste special characters into. It becomes a problem when emoticons are used, because it's material we then need to include in PDF files.
For instance: ❤️
❤
Now my question is, how could I go about identifying such characters and removing them with Javascript as the form is validated? I don't want to be too restrictive, as many languages are allowed (Russian, Arabic, etc.). Only those symbols would need to be excluded.
Thank you

See http://crocodillon.com/blog/parsing-emoji-unicode-in-javascript. The problem is that emoticons live in a supplementary plane (above U+FFFF), so a JavaScript string stores each one as two UTF-16 code units. That does not allow you to use a normal single-character range; instead you need to work with "surrogate pairs", along the lines of
/\ud83d[\ude00-\ude4f]/
The link above has additional information on how to find and treat emoticon characters in other Unicode ranges.
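As a minimal sketch of how that pattern could be used during validation: the function name stripSymbols is hypothetical, and the second pattern is an assumption added here, because the ❤ from the question is U+2764 in the Dingbats block (inside the Basic Multilingual Plane) and is therefore not covered by the surrogate-pair range above.

```javascript
// Emoticons block U+1F600..U+1F64F, written as UTF-16 surrogate pairs:
// high surrogate \ud83d followed by a low surrogate in \ude00-\ude4f.
const EMOTICONS = /\ud83d[\ude00-\ude4f]/g;

// Assumption: also strip the Dingbats block U+2700..U+27BF (contains U+2764,
// the heart), plus an optional variation selector U+FE0F that may follow it.
const DINGBATS = /[\u2700-\u27bf]\ufe0f?/g;

// Hypothetical helper; call it on the textarea value when the form is validated.
function stripSymbols(text) {
  return text.replace(EMOTICONS, '').replace(DINGBATS, '');
}

// Cyrillic and Arabic text passes through untouched; only the symbols go.
console.log(stripSymbols('Привет \ud83d\ude00 مرحبا \u2764\ufe0f'));
```

Other emoji blocks would need further ranges of the same kind; the linked article lists some of them.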

Related

WYSIWYG HTML editors, characters left and MySQL storage Optimization

I want to use a WYSIWYG editor on my <textarea>. In my MySQL I limit the varchar field to 500 characters. I also put a limit of maxlength in the textarea tag.
The problem is that the editor generated HTML tags on user input (e.g. <p>, ) and those take up space as well. I prefer not using a very large comment field (prefer around 1000 chars).
I also show the user the amount of characters left.
The problem is that, with the tags included, the 500-character limit is used up really fast. I would prefer not to allow something like 5000 characters if the user just writes a short 10-character comment. Is there a better way to implement this that saves space in the database, reports the correct number of characters left, and keeps the text properly marked up?
I'm currently using the Trumbowyg editor. I thought about just ditching the editor and using a plain textarea, but I do want to allow bold text and maintain line breaks. I am searching for an optimal solution.
My project is built in ASP.NET/C# + jQuery.
I would not worry about setting a 500 or 5000 VARCHAR, as the space is variable anyhow.
Premature optimization is the root of all evil and all that :)
Do a basic count of the raw text in the textarea, and maybe later see if there are comments that generate unusual amounts of HTML tags.

Using German characters (e.g. Umlaut) in HTML

I am trying to encode German characters in HTML. I just want to use the special character codes. For the umlaut, I've tried using both the named entity and the numeric character reference for Ü, and neither renders properly. What am I doing wrong? Thanks.
This is for a Squarespace site, and I am inserting JavaScript into the footer on their Code Injection page. I am using JavaScript to write a German word on the page. The relevant part of my code looks like the below. The problem is that this simply renders "& U uml;ber" (spaces added by me because the umlaut renders properly on Stack Overflow without them) on the page rather than "Uber" with an umlaut. Thanks!
var strings = {
'About': {
'de': '&Uuml;ber'
},
Try using &Uuml; for the German character Ü.
You need to escape special characters in HTML, unless...
You address the encoding issue on a document-wide level by adding the following line of code at the beginning of the <head> section:
<meta charset="utf-8">
Then you don't need to escape special characters individually.
Further reading:
Character encodings for beginners
Declaring character encodings in HTML
Declaring character encodings in CSS
UPDATE 1 (Javascript)
Convert special characters to HTML in Javascript
How to convert characters to HTML entities using plain JavaScript
UPDATE 2 (Squarespace)
HTML Special Characters and Squarespace
special character in squarespace (text block)
If you are a Squarespace customer, they provide 24/7 customer support. Contact them directly.
This solution worked for me. Use the hex code, but drop the &# from the beginning (and the trailing semicolon) and put a backslash (\) in front instead.
So, to render the word "Über" within a JavaScript string in Squarespace, use \xdcber
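A short sketch of why this works: HTML entities are only decoded by the HTML parser, while a JavaScript string literal has its own escape syntax, which starts with a backslash. The variable names below are illustrative only.

```javascript
// \xHH is a two-digit hex escape; \uHHHH is the four-digit form.
// Both produce the single character U+00DC (Ü).
const viaHexEscape = '\xdcber';
const viaUnicodeEscape = '\u00dcber';

// An HTML entity inside a JS string is just six ordinary characters
// unless the string is later parsed as HTML.
const entityText = '&Uuml;ber';

console.log(viaHexEscape === viaUnicodeEscape); // true
console.log(viaHexEscape.length, entityText.length);
```

If the page is served as UTF-8 and declares it, writing the literal character Ü directly in the script should work as well.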

I want to strip certain characters from a textarea - PHP/Javascript [duplicate]

This question already has answers here:
How to replace Microsoft-encoded quotes in PHP
(6 answers)
Closed 9 years ago.
The Details
I have a simple textarea <textarea></textarea>
The value of this textarea is sent through ajax and stored in a database.
The value in this database is viewed on an iPad (or iPad mini or iPhone, etc)
The Problem
When someone copies text from somewhere (potentially anywhere on the internet), I want to remove any weird characters, such as “windows-1252 quotes”, from the text before storing it in a utf8_unicode_ci column in a database. The column stores the above quotes, but they are not recognized on certain devices (like the iPad).
The Question
How can I remove these characters in Javascript or PHP?
string.replace has been tried from various examples to remove these characters.
htmlentities($sample) has been tried in order to convert these characters but still no luck.
Any help would be appreciated! Thanks!
Regular expressions will do this; php's function for this is preg_replace, javascript's is simply .replace(). You can find usage snippets everywhere ;)
There are two ways to approach this using regex:
1. Define an allowed character range and strip anything that isn't in that range.
[^\w-=+()!##$%^*(] will match anything that is NOT in this character class (the ^ at the beginning of the character class denotes this). You can then take the resulting matched characters and replace them with an empty string.
Working example: http://regex101.com/r/zK2qW6
2. Define a non-allowed character range and strip anything that is in that range.
[“”] will match anything in this character range. You can then take the resulting matched characters, and again replace with an empty string. You could also use a regex unicode range here too.
Working example: http://regex101.com/r/yG4qJ4
In the end, you should choose the path which requires the smallest expression. If there's only a handful of characters to replace, use option #2. If you only want to allow a handful of characters, use option #1.
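The two options can be sketched in JavaScript as follows. The exact allowed set in option 1 and both function names are assumptions made for illustration; note that \w is ASCII-only in JavaScript, so this particular allow-list would also strip accented and non-Latin letters.

```javascript
// Option 1: allow-list. Anything NOT in the class is removed.
// Allowed here: ASCII word characters, whitespace, and some basic punctuation.
const NOT_ALLOWED = /[^\w\s.,;:!?'"()\-]/g;

function keepAllowed(text) {
  return text.replace(NOT_ALLOWED, '');
}

// Option 2: deny-list. Only the listed characters are removed:
// curly single quotes U+2018/U+2019 and curly double quotes U+201C/U+201D.
const CURLY_QUOTES = /[\u2018\u2019\u201c\u201d]/g;

function stripCurlyQuotes(text) {
  return text.replace(CURLY_QUOTES, '');
}

console.log(keepAllowed('\u201chi\u201d there!'));
console.log(stripCurlyQuotes('\u201chello\u201d'));
```

Replacing the curly quotes with their straight equivalents instead of an empty string may preserve more of the user's meaning.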

how to prevent scripts from being run

SO kept preventing me from posting the title I wanted so finally got a title that let me post though it kind of sucks so feel free to edit/change it.
I have fields a user can fill in and in the javascript we have
'${chart.title}'
and stuff like that. Is it sufficient to just strip out the single quote character such that they cannot escape it back to javascript? or are there other ways to close out the string that started with the single quote character.
${chart.title} inserts the title a user typed in on a previous page so naturally they could type something like "Title'+callMethod()+'RestOfTitle" injecting a callMethod into my javascript.
thanks,
Dean
The best way would be to restrict the input to alphanumerical and space characters.
If you want to allow anything inside the title, you can use a escaping function.
http://xkr.us/articles/javascript/encode-compare/
Just stripping single quote characters from the string is definitely not enough. Think of newlines, for one.
There are a couple of options.
First, go the very restrictive way and do both: apply so-called white-list validation to the input field for your title, and always encode the text that you output to the page. That filters out all unwanted (and potentially dangerous) characters, and if some of them pass the filter (or somebody updates the text to contain JS code after the filters were applied), the encoding step makes any malicious script non-runnable (it turns it into plain text).
Second, let your users input whatever they want (which is not recommended, but developers are sometimes asked to do it), but always encode the text that you output to the page.
You can implement white-list validation yourself using a regular expression, or you can use one of the libraries.
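A rough sketch of both measures, written in JavaScript for illustration; in the setup described in the question, the encoding would have to run wherever ${chart.title} is rendered. The names isValidTitle and toJsStringLiteral are hypothetical, and the allowed character set is an assumption.

```javascript
// White-list validation: ASCII letters, digits and spaces only (an assumption).
const TITLE_ALLOWED = /^[A-Za-z0-9 ]*$/;

function isValidTitle(title) {
  return TITLE_ALLOWED.test(title);
}

// Output encoding for a JavaScript string context. JSON.stringify escapes
// double quotes, backslashes and newlines and returns a complete quoted literal;
// the extra replacements cover characters that could end a single-quoted string
// or a surrounding <script> element.
function toJsStringLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/'/g, '\\u0027')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const attack = "Title'+callMethod()+'RestOfTitle";
console.log(isValidTitle(attack));      // false: quotes, plus signs, parentheses
console.log(toJsStringLiteral(attack)); // quotes arrive as \u0027 escapes
```

Because the encoder emits its own surrounding double quotes, the template would insert the result without wrapping it in quotes again.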

How can i limit the number of characters the user can enter into ckeditor?

I'm using the CKeditor and I need to be able to impose a maxLength restriction on it.
For instance, prevent user from entering more than 100 characters, excluding the html characters
applied by the user.
Has anyone been able to do this?
Thanks, I'd appreciate if you point me towards a resource. I found similar questions here but they were not of much help.
I doubt this is going to end up being reliable even if someone posts an approach. Consider the following:
var tags = /<[^>]*?\/?>/;
That should match most tags, but what if you get someone who does something screwy like this:
<img alt=">My Title<" />
Now your regular expression that should be ignoring tags improperly counts the contents of this image's alt attribute towards the character limit. If some back-end system requires that the text content be only 100 characters, what I'd suggest is giving the user a single text input with a maxlength of 100, and then looking for another control or library that will let them change its look and feel via CSS.
Attempting to strip out the HTML Tags and then count the remaining characters is unlikely to do anything but give you a headache, will be error prone in the best of cases, and will malfunction entirely in the worst of cases.
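The failure described above can be reproduced with a few lines. This is only a demonstration of the pitfall, using the same pattern with a g flag added so that every match is replaced; the function name naiveVisibleLength is hypothetical.

```javascript
// Same pattern as above, with the g flag so all tags are stripped.
const tagPattern = /<[^>]*?\/?>/g;

// Naive approach: strip tags, then count what is left.
function naiveVisibleLength(html) {
  return html.replace(tagPattern, '').length;
}

// Ordinary markup: the count matches the visible text "Hello world".
console.log(naiveVisibleLength('<p>Hello <b>world</b></p>')); // 11

// A ">" inside an attribute value ends the match too early, so the alt text
// "My Title" is counted even though the element shows no text at all.
const tricky = '<img alt=">My Title<" />';
console.log(tricky.replace(tagPattern, '')); // "My Title"
console.log(naiveVisibleLength(tricky));     // 8
```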
