Javascript: Remove all utf8 icons in string but preserve chinese chars [duplicate] - javascript

How do I remove emoji code using JavaScript? I thought I had taken care of it using the code below, but I still have characters like 🔴.
function removeInvalidChars() {
return this.replace(/[\uE000-\uF8FF]/g, '');
}

For me none of the answers completely removed all emojis so I had to do some work myself and this is what i got :
text.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
Also, it should take into account that if one inserting the string later to the database, replacing with empty string could expose security issue. instead replace with the replacement character U+FFFD, see : http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters

The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.
More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example 🔴 U+1F534 Large Red Circle.
You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you'd have to work with the UTF-16 surrogates in a regexp:
return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.

I solved it by using a regex with Unicode property escapes. I got it from this article, it's for Java but still very helpful - Remove Emojis from a Java String.
'Smile😀'.replace(/[^\p{L}\p{N}\p{P}\p{Z}^$\n]/gu, '');
It removes all symbols except:
\p{L} - all letters from any language
\p{N} - numbers
\p{P} - punctuation
\p{Z} - whitespace separators
^$\n - add any symbols you want to keep
This one should be more correct and it works, but for me it leaves some trash symbols in the string:
'Smile😀'.replace(/\p{Emoji}/gu, '');
Edit: added symbols from comments

I've found many suggestions around but the regex that have solved my problem is:
/(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g
A short example
function removeEmojis (string) {
var regex = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g;
return string.replace(regex, '');
}
Hope it can help you

Just an addition to #hababr answer.
If you need to get rid of complicated emojis, you have to remove also additional things like modifiers and etc:
'👨🏿‍🎤'.replace(/[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)
update:
*#0-9 - are Emoji characters with a text representation by default, per the Unicode Standard.
so, my current solution is next:
'👨🏿‍🎤'.replace(/(?![*#0-9]+)[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)

I know this post is a bit old, but I stumbled across this very problem at work and a colleague came up with an interesting idea. Basically instead of stripping emoji character only allow valid characters in. Consulting this ASCII table:
http://www.asciitable.com/
A function such as this could only keep legal characters (the range itself dependent on what you are after)
function (input) {
var result = '';
if (input.length == 0)
return input;
for (var indexOfInput = 0, lengthOfInput = input.length; indexOfInput < lengthOfInput; indexOfInput++) {
var charAtSpecificIndex = input[indexOfInput].charCodeAt(0);
if ((32 <= charAtSpecificIndex) && (charAtSpecificIndex <= 126)) {
result += input[indexOfInput];
}
}
return result;
};
This should preserve all numbers, letters and special characters of the Alphabet for a situation where you wish to preserve the English alphabet + number + special characters. Hope it helps someone :)

#bobince's solution didn't work for me. Either the Emojis stayed there or they were swapped by a different Emoji.
This solution did the trick for me:
var ranges = [
'\ud83c[\udf00-\udfff]', // U+1F300 to U+1F3FF
'\ud83d[\udc00-\ude4f]', // U+1F400 to U+1F64F
'\ud83d[\ude80-\udeff]' // U+1F680 to U+1F6FF
];
$('#mybtn').on('click', function() {
removeInvalidChars();
})
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'g'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput"/>
<input type="submit" id="mybtn" value="clear"/>
Source

After searching and trying lots of unicode regex, I suggest you try this, it can cover all of emojis:
function removeEmoji(str) {
let strCopy = str;
const emojiKeycapRegex = /[\u0023-\u0039]\ufe0f?\u20e3/g;
const emojiRegex = /\p{Extended_Pictographic}/gu;
const emojiComponentRegex = /\p{Emoji_Component}/gu;
if (emojiKeycapRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiKeycapRegex, '');
}
if (emojiRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiRegex, '');
}
if (emojiComponentRegex.test(strCopy)) {
// eslint-disable-next-line no-restricted-syntax
for (const emoji of (strCopy.match(emojiComponentRegex) || [])) {
if (/[\d|*|#]/.test(emoji)) {
continue;
}
strCopy = strCopy.replace(emoji, '');
}
}
return strCopy;
}
let a = "1️⃣aa🤹‍♂️b#️⃣🔤✅❎23#!^*bb🤹🏾🤹‍♀️🚴🏻ccc";
console.log(removeEmoji(a))
Refrence: Unicode Emoij Document

None of the answers here worked for all the unicode characters I tested (specifically characters in the miscellaneous range such as ⛽ or ☯️).
Here is one that worked for me, (heavily) inspired from this SO PHP answer:
function _removeEmojis(str) {
return str.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?/g, '');
}
(My use case is sorting in a data grid where emojis can come first in a string but users want the text ordered by the actual words.)

sandre89's answer is good but not perfect.
I spent some time on the subject and have a working solution.
var ranges = [
'[\u00A0-\u269f]',
'[\u26A0-\u329f]',
// The following characters could not be minified correctly
// if specifed with the ES6 syntax \u{1F400}
'[🀄-🧀]'
//'[\u{1F004}-\u{1F9C0}]'
];
$('#mybtn').on('click', function() {
removeInvalidChars();
});
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'ug'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput" />
<input type="submit" id="mybtn" value="clear" />
Here is my CodePen
There are some points to note, though.
Unicode characters from U+1F000 up need a special notation, so you can use sandre89's way, or opt for the \u{1F000} ES6 notation, which may or may not work with your minificator. I succeeded pasting the emojis directly in the UTF-8 encoded script.
Don't forget the u flag in the regex, or your Javascript engine may throw an error.
Beware that things may not be working due to the file encoding, character set, or minificator. In my case nothing worked until I took the script off an .isml file (Demandware) and pasted it into a .js file.
You may gain some insight by referring to Wikipedia Emoji page and How many bytes does one Unicode character take?, and by tinkering with this Online Unicode converter, as I did.

var emoji =/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g;
str.replace(emoji, "");
i add this '\uD83E[\udd00-\uddff]'
these emojis were updated when 2018 june
if u want block emojis after other update then use this
str.replace(/[^0-9a-zA-Zㄱ-힣+×÷=%♤♡☆♧)(*&^/~##!-:;,?`_|<>{}¥£€$◇■□●○•°※¤《》¡¿₩\[\]\"\' \\]/g ,"");
u can block all emojis and u can only use eng, num, hangle, and some Characters
thx :)

You can use this function to replace emojis with nothing:
function msgAfterClearEmojis(msg)
{
var new_msg = msg.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g, '').trim();
return new_msg;
}

You can check here with emoji..
😊 , 😌 , 👽
function removeEmoji() {
var y = document.getElementById('textbox_id1');
y.value = y.value.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
}
input {
padding: 5px;
}
<input type="text" id="textbox_id1" placeholder="Remove emoji..." oninput="removeEmoji()">
You can take more emojis from here: Emoji Keyboard Online

This is the iteration on #hababr's answer.
His answer removes lots of standard chars like $, +, < and so on.
This version keeps all of them (except for the \ backslash - dunno how to properly escape it).
"hey😁 hau💓 ahoy🏴‍☠️ !##$%^&*()-_=+±§;:'\|`~/?[]{},.<>".replace(/[^\p{L}\p{N}\p{P}\p{Z}{^$=+±\\'|`\\~<>}]/gu, "")
// "hey hau ahoy !##$%^&*()-_=+±§;:'|`~/?[]{},.<>"

I have this regex and it works for all emojis i found on this page
try this regex
<:[^:\s]+:\d+>|<a:[^:\s]+:\d+>|(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]|\ufe0f)

var emojiRegex = /\uD83C\uDFF4(?:\uDB40\uDC67\uDB40\uDC62(?:\uDB40\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDB40\uDC77\uDB40\uDC6C\uDB40\uDC73|\uDB40\uDC73\uDB40\uDC63\uDB40\uDC74)\uDB40\uDC7F|\u200D\u2620\uFE0F)|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC68(?:\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D)?\uD83D\uDC68|(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3]))|\uD83D\uDC69\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D(?:\uD83D[\uDC68\uDC69])|\uD83D[\uDC68\uDC69])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66|(?:\uD83D\uDC41\uFE0F\u200D\uD83D\uDDE8|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\uD83D\uDC68(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\u200D[\u2695\u2696\u2708])|(?:(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)\uFE0F|\uD83D\uDC6F|\uD83E[\uDD3C\uDDDE\uDDDF])\u200D[\u2640\u2642]|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|\u200D[\u2640\u2642])|\uD83D\uDC69\u200D[\u2695\u2696\u2708])\uFE0F|\uD83D\uDC69\u200D\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC68(?:\u200D(?:(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D[\uDC66\uDC67])|\uD83C[\uDFFB-\uDFFF])|\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08|\uD83D\uDC69\u200D\uD83D\uDC67|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66|\uD83C\uDDF6\uD83C\uDDE6|\uD83C\uDDFD\uD83C\uDDF0|\uD83C\uDDF4\uD83C\uDDF2|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])|\uD83C\uDDED(?:\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA])|\uD83C\uDDEC(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE])|\uD83C\uDDEA(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA])|\uD83C\uDDE8(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF5\uDDF7\uDDFA-\uDDFF])|\uD83C\uDDF2(?:\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF])|\uD83C\uDDF3(?:\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF])|\uD83C\uDDFC(?:\uD83C[\uDDEB\uDDF8])|\uD83C\uDDFA(?:\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF])|\uD83C\uDDF0(?:\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF])|\uD83C\uDDEF(?:\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5])|\uD83C\uDDF8(?:\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF])|\uD83C\uDDEE(?:\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9])|\uD83C\uDDFF(?:\uD83C[\uDDE6\uDDF2\uDDFC])|\uD83C\uDDEB(?:\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7])|\uD83C\uDDF5(?:\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE])|\uD83C\uDDE9(?:\uD83C[\uDDEA\uDDEC\uDDEF\uDDF0\uDDF2\uDDF4\uDDFF])|\uD83C\uDDF9(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF])|\uD83C\uDDE7(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF])|[#\*0-9]\uFE0F\u20E3|\uD83C\uDDF1(?:\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE])|\uD83C\uDDE6(?:\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF])|\uD83C\uDDF7(?:\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC])|\uD83C\uDDFB(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA])|\uD83C\uDDFE(?:\uD83C[\uDDEA\uDDF9])|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:\uD83C[\uDFFB-\uDFFF])|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u261D\u270A-\u270D]|\uD83C[\uDF85\uDFC2\uDFC7]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66\uDC67\uDC70\uDC72\uDC74-\uDC76\uDC78\uDC7C\uDC83\uDC85\uDCAA\uDD74\uDD7A\uDD90\uDD95\uDD96\uDE4C\uDE4F\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD30-\uDD36\uDDB5\uDDB6\uDDD1-\uDDD5])(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u231A\u231B\u23E9-\u23EC\u23F0\u23F3\u25FD\u25FE\u2614\u2615\u2648-\u2653\u267F\u2693\u26A1\u26AA\u26AB\u26BD\u26BE\u26C4\u26C5\u26CE\u26D4\u26EA\u26F2\u26F3\u26F5\u26FA\u26FD\u2705\u270A\u270B\u2728\u274C\u274E\u2753-\u2755\u2757\u2795-\u2797\u27B0\u27BF\u2B1B\u2B1C\u2B50\u2B55]|\uD83C[\uDC04\uDCCF\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE1A\uDE2F\uDE32-\uDE36\uDE38-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF2D-\uDF35\uDF37-\uDF7C\uDF7E-\uDF93\uDFA0-\uDFCA\uDFCF-\uDFD3\uDFE0-\uDFF0\uDFF4\uDFF8-\uDFFF]|\uD83D[\uDC00-\uDC3E\uDC40\uDC42-\uDCFC\uDCFF-\uDD3D\uDD4B-\uDD4E\uDD50-\uDD67\uDD7A\uDD95\uDD96\uDDA4\uDDFB-\uDE4F\uDE80-\uDEC5\uDECC\uDED0-\uDED2\uDEEB\uDEEC\uDEF4-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])|(?:[#\*0-9\xA9\xAE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618\u261D\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692-\u2697\u2699\u269B\u269C\u26A0\u26A1\u26AA\u26AB\u26B0\u26B1\u26BD\u26BE\u26C4\u26C5\u26C8\u26CE\u26CF\u26D1\u26D3\u26D4\u26E9\u26EA\u26F0-\u26F5\u26F7-\u26FA\u26FD\u2702\u2705\u2708-\u270D\u270F\u2712\u2714\u2716\u271D\u2721\u2728\u2733\u2734\u2744\u2747\u274C\u274E\u2753-\u2755\u2757\u2763\u2764\u2795-\u2797\u27A1\u27B0\u27BF\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55\u3030\u303D\u3297\u3299]|\uD83C[\uDC04\uDCCF\uDD70\uDD71\uDD7E\uDD7F\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE02\uDE1A\uDE2F\uDE32-\uDE3A\uDE50\uDE51\uDF00-\uDF21\uDF24-\uDF93\uDF96\uDF97\uDF99-\uDF9B\uDF9E-\uDFF0\uDFF3-\uDFF5\uDFF7-\uDFFF]|\uD83D[\uDC00-\uDCFD\uDCFF-\uDD3D\uDD49-\uDD4E\uDD50-\uDD67\uDD6F\uDD70\uDD73-\uDD7A\uDD87\uDD8A-\uDD8D\uDD90\uDD95\uDD96\uDDA4\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA-\uDE4F\uDE80-\uDEC5\uDECB-\uDED2\uDEE0-\uDEE5\uDEE9\uDEEB\uDEEC\uDEF0\uDEF3-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])\uFE0F|(?:[\u261D\u26F9\u270A-\u270D]|\uD83C[\uDF85\uDFC2-\uDFC4\uDFC7\uDFCA-\uDFCC]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66-\uDC69\uDC6E\uDC70-\uDC78\uDC7C\uDC81-\uDC83\uDC85-\uDC87\uDCAA\uDD74\uDD75\uDD7A\uDD90\uDD95\uDD96\uDE45-\uDE47\uDE4B-\uDE4F\uDEA3\uDEB4-\uDEB6\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD26\uDD30-\uDD39\uDD3D\uDD3E\uDDB5\uDDB6\uDDB8\uDDB9\uDDD1-\uDDDD])/g;
console.log(text.replace(emojiRegex,'');

<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>
function isEmoji(str) {
var ranges = [
'[\uE000-\uF8FF]',
'\uD83C[\uDC00-\uDFFF]',
'\uD83D[\uDC00-\uDFFF]',
'[\u2011-\u26FF]',
'\uD83E[\uDD10-\uDDFF]'
];
if (str.match(ranges.join('|'))) {
return true;
} else {
return false;
}
}
$(document).ready(function(){
$('input').on('input',function(){
var $th = $(this);
console.log("Value of Input"+$th.val());
emojiInput= isEmoji($th.val());
if (emojiInput==true) {
$th.val("");
}
});
});
</script>
</head>
<body>
Enter your name: <input type="text">
</body>
</html>

There is a modern solution using categories
Modern browsers support Unicode property, which allows you to match emojis based on their belonging in the Emoji Unicode category. For example, you can use Unicode property escapes like \p{Emoji} or \P{Emoji} to match/no match emoji characters. Note that 0123456789#* and other characters are interpreted as emojis using the previous Unicode category. Therefore, a better way to do this is to use the {Extended_Pictographic} Unicode category that denotes all the characters typically understood as emojis instead of the {Emoji} category.
const withEmojis = /\p{Extended_Pictographic}/u
withEmojis.test('😀😀');
//true
withEmojis.test('ab');
//false
withEmojis.test('1');
//false
or with negation
const noEmojis = /\P{Extended_Pictographic}/u
noEmojis.test('😀');
//false
noEmojis.test('1212');
//false

You can use mathiasbynens/emoji-regex package to remove or replace emojis.
You can see the latest build's content to grab the regex by visiting following url:
http://unpkg.com/emoji-regex/index.js

function removeEmoji (content) {
let conByte = new TextEncoder("utf-8").encode(content);
for (let i = 0; i < conByte.length; i++) {
if ((conByte[i] & 0xF8) == 0xF0) {
for (let j = 0; j < 4; j++) {
conByte[i+j]=0x30;
}
i += 3;
}
}
content = new TextDecoder("utf-8").decode(conByte);
return content.replaceAll("0000", "");
}

Related

Input field should not allow icons with input field text JavaScript [duplicate]

How do I remove emoji code using JavaScript? I thought I had taken care of it using the code below, but I still have characters like 🔴.
function removeInvalidChars() {
return this.replace(/[\uE000-\uF8FF]/g, '');
}
For me none of the answers completely removed all emojis so I had to do some work myself and this is what i got :
text.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
Also, it should take into account that if one inserting the string later to the database, replacing with empty string could expose security issue. instead replace with the replacement character U+FFFD, see : http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters
The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.
More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example 🔴 U+1F534 Large Red Circle.
You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you'd have to work with the UTF-16 surrogates in a regexp:
return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.
I solved it by using a regex with Unicode property escapes. I got it from this article, it's for Java but still very helpful - Remove Emojis from a Java String.
'Smile😀'.replace(/[^\p{L}\p{N}\p{P}\p{Z}^$\n]/gu, '');
It removes all symbols except:
\p{L} - all letters from any language
\p{N} - numbers
\p{P} - punctuation
\p{Z} - whitespace separators
^$\n - add any symbols you want to keep
This one should be more correct and it works, but for me it leaves some trash symbols in the string:
'Smile😀'.replace(/\p{Emoji}/gu, '');
Edit: added symbols from comments
I've found many suggestions around but the regex that have solved my problem is:
/(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g
A short example
function removeEmojis (string) {
var regex = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g;
return string.replace(regex, '');
}
Hope it can help you
Just an addition to #hababr answer.
If you need to get rid of complicated emojis, you have to remove also additional things like modifiers and etc:
'👨🏿‍🎤'.replace(/[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)
update:
*#0-9 - are Emoji characters with a text representation by default, per the Unicode Standard.
so, my current solution is next:
'👨🏿‍🎤'.replace(/(?![*#0-9]+)[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)
I know this post is a bit old, but I stumbled across this very problem at work and a colleague came up with an interesting idea. Basically instead of stripping emoji character only allow valid characters in. Consulting this ASCII table:
http://www.asciitable.com/
A function such as this could only keep legal characters (the range itself dependent on what you are after)
function (input) {
var result = '';
if (input.length == 0)
return input;
for (var indexOfInput = 0, lengthOfInput = input.length; indexOfInput < lengthOfInput; indexOfInput++) {
var charAtSpecificIndex = input[indexOfInput].charCodeAt(0);
if ((32 <= charAtSpecificIndex) && (charAtSpecificIndex <= 126)) {
result += input[indexOfInput];
}
}
return result;
};
This should preserve all numbers, letters and special characters of the Alphabet for a situation where you wish to preserve the English alphabet + number + special characters. Hope it helps someone :)
#bobince's solution didn't work for me. Either the Emojis stayed there or they were swapped by a different Emoji.
This solution did the trick for me:
var ranges = [
'\ud83c[\udf00-\udfff]', // U+1F300 to U+1F3FF
'\ud83d[\udc00-\ude4f]', // U+1F400 to U+1F64F
'\ud83d[\ude80-\udeff]' // U+1F680 to U+1F6FF
];
$('#mybtn').on('click', function() {
removeInvalidChars();
})
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'g'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput"/>
<input type="submit" id="mybtn" value="clear"/>
Source
After searching and trying lots of unicode regex, I suggest you try this, it can cover all of emojis:
function removeEmoji(str) {
let strCopy = str;
const emojiKeycapRegex = /[\u0023-\u0039]\ufe0f?\u20e3/g;
const emojiRegex = /\p{Extended_Pictographic}/gu;
const emojiComponentRegex = /\p{Emoji_Component}/gu;
if (emojiKeycapRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiKeycapRegex, '');
}
if (emojiRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiRegex, '');
}
if (emojiComponentRegex.test(strCopy)) {
// eslint-disable-next-line no-restricted-syntax
for (const emoji of (strCopy.match(emojiComponentRegex) || [])) {
if (/[\d|*|#]/.test(emoji)) {
continue;
}
strCopy = strCopy.replace(emoji, '');
}
}
return strCopy;
}
let a = "1️⃣aa🤹‍♂️b#️⃣🔤✅❎23#!^*bb🤹🏾🤹‍♀️🚴🏻ccc";
console.log(removeEmoji(a))
Refrence: Unicode Emoij Document
None of the answers here worked for all the unicode characters I tested (specifically characters in the miscellaneous range such as ⛽ or ☯️).
Here is one that worked for me, (heavily) inspired from this SO PHP answer:
function _removeEmojis(str) {
return str.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?/g, '');
}
(My use case is sorting in a data grid where emojis can come first in a string but users want the text ordered by the actual words.)
sandre89's answer is good but not perfect.
I spent some time on the subject and have a working solution.
var ranges = [
'[\u00A0-\u269f]',
'[\u26A0-\u329f]',
// The following characters could not be minified correctly
// if specifed with the ES6 syntax \u{1F400}
'[🀄-🧀]'
//'[\u{1F004}-\u{1F9C0}]'
];
$('#mybtn').on('click', function() {
removeInvalidChars();
});
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'ug'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput" />
<input type="submit" id="mybtn" value="clear" />
Here is my CodePen
There are some points to note, though.
Unicode characters from U+1F000 up need a special notation, so you can use sandre89's way, or opt for the \u{1F000} ES6 notation, which may or may not work with your minificator. I succeeded pasting the emojis directly in the UTF-8 encoded script.
Don't forget the u flag in the regex, or your Javascript engine may throw an error.
Beware that things may not be working due to the file encoding, character set, or minificator. In my case nothing worked until I took the script off an .isml file (Demandware) and pasted it into a .js file.
You may gain some insight by referring to Wikipedia Emoji page and How many bytes does one Unicode character take?, and by tinkering with this Online Unicode converter, as I did.
var emoji =/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g;
str.replace(emoji, "");
i add this '\uD83E[\udd00-\uddff]'
these emojis were updated when 2018 june
if u want block emojis after other update then use this
str.replace(/[^0-9a-zA-Zㄱ-힣+×÷=%♤♡☆♧)(*&^/~##!-:;,?`_|<>{}¥£€$◇■□●○•°※¤《》¡¿₩\[\]\"\' \\]/g ,"");
u can block all emojis and u can only use eng, num, hangle, and some Characters
thx :)
You can use this function to replace emojis with nothing:
function msgAfterClearEmojis(msg)
{
var new_msg = msg.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g, '').trim();
return new_msg;
}
You can check here with emoji..
😊 , 😌 , 👽
function removeEmoji() {
var y = document.getElementById('textbox_id1');
y.value = y.value.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
}
input {
padding: 5px;
}
<input type="text" id="textbox_id1" placeholder="Remove emoji..." oninput="removeEmoji()">
You can take more emojis from here: Emoji Keyboard Online
This is the iteration on #hababr's answer.
His answer removes lots of standard chars like $, +, < and so on.
This version keeps all of them (except for the \ backslash - dunno how to properly escape it).
"hey😁 hau💓 ahoy🏴‍☠️ !##$%^&*()-_=+±§;:'\|`~/?[]{},.<>".replace(/[^\p{L}\p{N}\p{P}\p{Z}{^$=+±\\'|`\\~<>}]/gu, "")
// "hey hau ahoy !##$%^&*()-_=+±§;:'|`~/?[]{},.<>"
I have this regex and it works for all emojis i found on this page
try this regex
<:[^:\s]+:\d+>|<a:[^:\s]+:\d+>|(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]|\ufe0f)
var emojiRegex = /\uD83C\uDFF4(?:\uDB40\uDC67\uDB40\uDC62(?:\uDB40\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDB40\uDC77\uDB40\uDC6C\uDB40\uDC73|\uDB40\uDC73\uDB40\uDC63\uDB40\uDC74)\uDB40\uDC7F|\u200D\u2620\uFE0F)|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC68(?:\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D)?\uD83D\uDC68|(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3]))|\uD83D\uDC69\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D(?:\uD83D[\uDC68\uDC69])|\uD83D[\uDC68\uDC69])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66|(?:\uD83D\uDC41\uFE0F\u200D\uD83D\uDDE8|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\uD83D\uDC68(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\u200D[\u2695\u2696\u2708])|(?:(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)\uFE0F|\uD83D\uDC6F|\uD83E[\uDD3C\uDDDE\uDDDF])\u200D[\u2640\u2642]|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|\u200D[\u2640\u2642])|\uD83D\uDC69\u200D[\u2695\u2696\u2708])\uFE0F|\uD83D\uDC69\u200D\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC68(?:\u200D(?:(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D[\uDC66\uDC67])|\uD83C[\uDFFB-\uDFFF])|\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08|\uD83D\uDC69\u200D\uD83D\uDC67|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66|\uD83C\uDDF6\uD83C\uDDE6|\uD83C\uDDFD\uD83C\uDDF0|\uD83C\uDDF4\uD83C\uDDF2|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])|\uD83C\uDDED(?:\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA])|\uD83C\uDDEC(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE])|\uD83C\uDDEA(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA])|\uD83C\uDDE8(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF5\uDDF7\uDDFA-\uDDFF])|\uD83C\uDDF2(?:\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF])|\uD83C\uDDF3(?:\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF])|\uD83C\uDDFC(?:\uD83C[\uDDEB\uDDF8])|\uD83C\uDDFA(?:\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF])|\uD83C\uDDF0(?:\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF])|\uD83C\uDDEF(?:\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5])|\uD83C\uDDF8(?:\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF])|\uD83C\uDDEE(?:\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9])|\uD83C\uDDFF(?:\uD83C[\uDDE6\uDDF2\uDDFC])|\uD83C\uDDEB(?:\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7])|\uD83C\uDDF5(?:\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE])|\uD83C\uDDE9(?:\uD83C[\uDDEA\uDDEC\uDDEF\uDDF0\uDDF2\uDDF4\uDDFF])|\uD83C\uDDF9(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF])|\uD83C\uDDE7(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF])|[#\*0-9]\uFE0F\u20E3|\uD83C\uDDF1(?:\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE])|\uD83C\uDDE6(?:\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF])|\uD83C\uDDF7(?:\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC])|\uD83C\uDDFB(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA])|\uD83C\uDDFE(?:\uD83C[\uDDEA\uDDF9])|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:\uD83C[\uDFFB-\uDFFF])|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u261D\u270A-\u270D]|\uD83C[\uDF85\uDFC2\uDFC7]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66\uDC67\uDC70\uDC72\uDC74-\uDC76\uDC78\uDC7C\uDC83\uDC85\uDCAA\uDD74\uDD7A\uDD90\uDD95\uDD96\uDE4C\uDE4F\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD30-\uDD36\uDDB5\uDDB6\uDDD1-\uDDD5])(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u231A\u231B\u23E9-\u23EC\u23F0\u23F3\u25FD\u25FE\u2614\u2615\u2648-\u2653\u267F\u2693\u26A1\u26AA\u26AB\u26BD\u26BE\u26C4\u26C5\u26CE\u26D4\u26EA\u26F2\u26F3\u26F5\u26FA\u26FD\u2705\u270A\u270B\u2728\u274C\u274E\u2753-\u2755\u2757\u2795-\u2797\u27B0\u27BF\u2B1B\u2B1C\u2B50\u2B55]|\uD83C[\uDC04\uDCCF\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE1A\uDE2F\uDE32-\uDE36\uDE38-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF2D-\uDF35\uDF37-\uDF7C\uDF7E-\uDF93\uDFA0-\uDFCA\uDFCF-\uDFD3\uDFE0-\uDFF0\uDFF4\uDFF8-\uDFFF]|\uD83D[\uDC00-\uDC3E\uDC40\uDC42-\uDCFC\uDCFF-\uDD3D\uDD4B-\uDD4E\uDD50-\uDD67\uDD7A\uDD95\uDD96\uDDA4\uDDFB-\uDE4F\uDE80-\uDEC5\uDECC\uDED0-\uDED2\uDEEB\uDEEC\uDEF4-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])|(?:[#\*0-9\xA9\xAE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618\u261D\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692-\u2697\u2699\u269B\u269C\u26A0\u26A1\u26AA\u26AB\u26B0\u26B1\u26BD\u26BE\u26C4\u26C5\u26C8\u26CE\u26CF\u26D1\u26D3\u26D4\u26E9\u26EA\u26F0-\u26F5\u26F7-\u26FA\u26FD\u2702\u2705\u2708-\u270D\u270F\u2712\u2714\u2716\u271D\u2721\u2728\u2733\u2734\u2744\u2747\u274C\u274E\u2753-\u2755\u2757\u2763\u2764\u2795-\u2797\u27A1\u27B0\u27BF\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55\u3030\u303D\u3297\u3299]|\uD83C[\uDC04\uDCCF\uDD70\uDD71\uDD7E\uDD7F\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE02\uDE1A\uDE2F\uDE32-\uDE3A\uDE50\uDE51\uDF00-\uDF21\uDF24-\uDF93\uDF96\uDF97\uDF99-\uDF9B\uDF9E-\uDFF0\uDFF3-\uDFF5\uDFF7-\uDFFF]|\uD83D[\uDC00-\uDCFD\uDCFF-\uDD3D\uDD49-\uDD4E\uDD50-\uDD67\uDD6F\uDD70\uDD73-\uDD7A\uDD87\uDD8A-\uDD8D\uDD90\uDD95\uDD96\uDDA4\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA-\uDE4F\uDE80-\uDEC5\uDECB-\uDED2\uDEE0-\uDEE5\uDEE9\uDEEB\uDEEC\uDEF0\uDEF3-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])\uFE0F|(?:[\u261D\u26F9\u270A-\u270D]|\uD83C[\uDF85\uDFC2-\uDFC4\uDFC7\uDFCA-\uDFCC]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66-\uDC69\uDC6E\uDC70-\uDC78\uDC7C\uDC81-\uDC83\uDC85-\uDC87\uDCAA\uDD74\uDD75\uDD7A\uDD90\uDD95\uDD96\uDE45-\uDE47\uDE4B-\uDE4F\uDEA3\uDEB4-\uDEB6\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD26\uDD30-\uDD39\uDD3D\uDD3E\uDDB5\uDDB6\uDDB8\uDDB9\uDDD1-\uDDDD])/g;
console.log(text.replace(emojiRegex,'');
<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>
function isEmoji(str) {
var ranges = [
'[\uE000-\uF8FF]',
'\uD83C[\uDC00-\uDFFF]',
'\uD83D[\uDC00-\uDFFF]',
'[\u2011-\u26FF]',
'\uD83E[\uDD10-\uDDFF]'
];
if (str.match(ranges.join('|'))) {
return true;
} else {
return false;
}
}
$(document).ready(function(){
$('input').on('input',function(){
var $th = $(this);
console.log("Value of Input"+$th.val());
emojiInput= isEmoji($th.val());
if (emojiInput==true) {
$th.val("");
}
});
});
</script>
</head>
<body>
Enter your name: <input type="text">
</body>
</html>
There is a modern solution using categories
Modern browsers support Unicode property, which allows you to match emojis based on their belonging in the Emoji Unicode category. For example, you can use Unicode property escapes like \p{Emoji} or \P{Emoji} to match/no match emoji characters. Note that 0123456789#* and other characters are interpreted as emojis using the previous Unicode category. Therefore, a better way to do this is to use the {Extended_Pictographic} Unicode category that denotes all the characters typically understood as emojis instead of the {Emoji} category.
const withEmojis = /\p{Extended_Pictographic}/u
withEmojis.test('😀😀');
//true
withEmojis.test('ab');
//false
withEmojis.test('1');
//false
or with negation
const noEmojis = /\P{Extended_Pictographic}/u
noEmojis.test('😀');
//false
noEmojis.test('1212');
//false
You can use mathiasbynens/emoji-regex package to remove or replace emojis.
You can see the latest build's content to grab the regex by visiting following url:
http://unpkg.com/emoji-regex/index.js
In detail, this function first uses TextEncoder to convert content into a byte array with utf-8 encoding, then loops through this array, if it finds a byte whose first five bits are 11110 (i.e. 0xF0), it means this is an emoji start, then it replaces this byte and the next three bytes with 0x30 (i.e. number 0). Finally, it uses TextDecoder to convert the modified byte array back to a string, and uses replaceAll method to remove extra 0s.
function removeEmoji (content) {
let conByte = new TextEncoder("utf-8").encode(content);
for (let i = 0; i < conByte.length; i++) {
if ((conByte[i] & 0xF8) == 0xF0) {
for (let j = 0; j < 4; j++) {
conByte[i+j]=0x30;
}
i += 3;
}
}
content = new TextDecoder("utf-8").decode(conByte);
return content.replaceAll("0000", "");
}

Non-backtracking regex hangs node on strings with newlines

I have no idea why this hangs the javascript engine but it does. Anyone else have a clue?
function isEnglish(text) {
const checker = /^(\p{Emoji}|\p{ASCII})+$/u;
return !!checker.exec(text.replace(/\\n/g, ""));
}
text = `
RT #PROMOSIGROUP: FOLL TWITTER
3K:20rb
5k:30rb
10K:50rb
Foll IG aktif WW
100F:15rb
500F:50rb
1K:100rb
Jual Akun Twitter+IG
081327927525/…`
isEnglish(text);
Ok, figured it out, the "…" character causes the regex engine to spin. Anyone know why this might be?
It looks like you're isEnglish() test is supposed to return true when the source text consists solely of:
US-ASCII characters,
Emoji (not sure why this would count as "english", but whatever), and
Punctuation
and false otherwise.
I might point out that US-ASCII covers U+0000 to U+007F: that includes the C0 Control Characters (U+0000 to U+001F), as well as [DEL] (U+007F), none of which, save whitespace, are actual characters.
But, you're making a mountain out of a molehill: it will be much faster (and clearer) to just search for the first character that's not part of your desired alphabet:
function isEnglish(s) {
return !rxIsNonEnglishAlphabet.test(s);
}
// -------------------------------------------------------
// this regular expression matches characters that are NOT
// * Whitespace
// * US-ASCII (u+0000 through U+007F)
// * Emoji
// * Punctuation
// -------------------------------------------------------
const rxIsNonEnglishAlphabet = /[^\s\p{ASCII}\p{Emoji}\p{Punctuation}]/u;
It turns out that my regex turns into a big backtracking bug even though there isn't anything obvious to me that would cause that. My function now looks much different to get it to work:
function isEnglish(text) {
const ascii = /\p{ASCII}/ug;
const emoji = /\p{Emoji}/ug;
const punct = /\p{Punctuation}/ug;
text = text.replace(ascii, "");
text = text.replace(emoji, "");
text = text.replace(punct, "");
return text.length === 0;
}

Remove special characters from input field

I've been searching everywhere but have been unable to find exactly what I am looking for.
I have an html form that is filled out with Mac addresses from our inventory so the strings inputted into the input field will look like:
A1:A2:A3:A4:A5:A6
I'm trying to write a script to remove the : character plus any spaces anywhere. That way when it is entered the output will be:
A1A2A3A4A5A6
This is what I have so far:
<input type="text" id="macaddress" onChange="removeChar();WriteLog();" />
Then in my script I have:
function removeChar() {
var a = document.getElementById("macaddress").value;
a = a.replace(/: /g, '');
document.getElementById.innerHTML = a;
}
I don't get any JavaScript errors with this but nothing happens.
I also have another script that pulls the value of the field into a work log which is the other function WriteLog().
Essentially I want to remove the : then have the new value pulled into the log by the second function.
If you want to keep only numbers and letts you can use this
a.replace(/[^a-zA-Z0-9]/g, '');
which basically replaces everything that isn't a-z or A-Z or 0-9 with an empty string.
A great tool for explaining regex and testing it is Regex101
And this line document.getElementById.innerHTML = a; should be fixed as well, you probably meant something like document.getElementById('some-elements-id').innerHTML = a;
Question spec says you want to remove : and : with space. Make the space in the regex optional:
a = a.replace(/:( )?/g, '');
But you also need to account for preceeding spaces:
a = a.replace(/( )?:( )?/g, '');
I would also trim the initial string (Just good practice)
a = a.trim().replace(/( )?:( )?/g, '');
Finally, I am not sure what this line does:
document.getElementById.innerHTML = a;, but that line will throw an error. Remove it.
to remove colons and spaces from string simply use
str = str.replace(/[:\s]/g, '');
HTML
<input type="text" id="macaddress"/>
<button onclick="removeChar()">Click me!</button>
JS
function removeChar() {
var a = document.getElementById("macaddress").value.trim();
a = a.split('');
a.forEach(function (character, index) {
if (character === ':') {
a.splice(index, 1);
}
});
a = a.join('');
document.getElementById("macaddress").value = a;
}
Your Regex searches for "colon immediately followed by space".
If you add a pipe in between them: /:| /, then it will search for all colons and/or spaces, in any order.
Demo:
function removeChar() {
var a = document.getElementById("macaddress").value;
a = a.replace(/:| /g, '');
document.getElementById("result").innerHTML = a;
}
<input type="text" id="macaddress" onChange="removeChar();" />
<div id="result"></div>

How do I extract only alphabet from a alphanumeric string

I have a string "5A" or "a6". I want to get only "A" or "a" on the result. I am using the following but it's not working.
Javascript
var answer = '5A';
answer = answer.replace(/^[0-9]+$/i);
//console.log(answer) should be 'A';
let answer = '5A';
answer = answer.replace(/[^a-z]/gi, '');
// [^a-z] matches everything but a-z
// the flag `g` means it should match multiple occasions
// the flag `i` is in case sensitive which means that `A` and `a` is treated as the same character ( and `B,b`, `C,c` etc )
Instead of a-z then you can use \p{L} and the /u modifier which will match any letter, and not just a though z, for instance:
'50Æ'.replace(/[^\p{L}]/gu, ''); // Æ
// [^\p{L}] matches everything but a unicode letter, this includes lower and uppercase letters
// the flag `g` means it should match multiple occasions
// the flag `u` will enable the support for unicode character classes.
See https://caniuse.com/mdn-javascript_builtins_regexp_unicode for support
var answer = '5A';
answer = answer.replace(/[^A-Za-z]/g, '');
g for global, no ^ or $, and '' to replace it with nothing. Leaving off the second parameter replaces it with the string 'undefined'.
I wondered if something like this this might be faster, but it and variations are much slower:
function alphaOnly(a) {
var b = '';
for (var i = 0; i < a.length; i++) {
if (a[i] >= 'A' && a[i] <= 'z') b += a[i];
}
return b;
}
http://jsperf.com/strip-non-alpha
The way you asked, you want to find the letter rather than remove the number (same thing in this example, but could be different depending on your circumstances) - if that's what you want, there's a different path you can choose:
var answer = "5A";
var match = answer.match(/[a-zA-Z]/);
answer = match ? match[0] : null;
It looks for a match on the letter, rather that removing the number. If a match is found, then match[0] will represent the first letter, otherwise match will be null.
var answer = '5A';
answer = answer.replace(/[0-9]/g, '');
You can also do it without a regular expression if you care about performance ;)
You code hade multiple issues:
string.replace takes tow parameters: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
the flag i, standing for case insensitive, doesn't make sense since you are dealing with numbers (what's an upper-case 1?!)
/^[0-9]+$/ would only match a number, nothing more. You should check this out: http://www.regexper.com/. Enter your regex (without the slashes) in the box and hit enter!
In general I would advice you to learn a bit about basic regular expressions. Here is a useful app to play with them: http://rubular.com/
JavaScript:
It will extract all Alphabets from any string..
var answer = '5A';
answer = answer.replace(/[^a-zA-Z]/g, '');
/*var answer = '5A';
answer = answer.replace(/[^a-zA-Z]/g, '');*/
$("#check").click(function(){
$("#extdata").html("Extraxted Alphabets : <i >"+$("#data").val().replace(/[^a-zA-Z]/g, '')+"</i>");
});
i{
color:green;
letter-spacing:1px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div>
<input type="text" id="data">
<button id="check">Click To Extract</button><br/>
<h5 id="extdata"></h5>
</div>
You can simplify a bit #TrevorDixon's and #Aegis's answers using \d (digit) instead of [0-9]
var answer = '5A';
answer = answer.replace(/\d/g, '');
var answer = '5A';
answer.replace(/\W/g,"").replace(/\d/g,"")

Japanese characters :Convert lowerCase to upperCase using jquery

I tried using toUppercase() method to convert japanese characters to uppercase but it return same string with out conversion.
Is there any other way to do this using jquery or javascript.
fieldValue = "ショウコ"; //japanese string.
function convertToUppercase(fieldValue)
{
convertedValue = fieldValue.toUpperCase();
return convertedValue;
}
Any help would be greatly appreciated!
There's a list of all the "small" letters (known as "youon") on Wikipedia:
ぁぃぅぇぉっゃゅょゎァィゥェォヵㇰヶㇱㇲッㇳㇴㇵㇶㇷㇷ゚ㇸㇹㇺャュョㇻㇼㇽㇾㇿヮ
You could use a switch statement to convert them to their "big" equivalents, which I'll type out here for your convenience:
あいうえおつやゆよわアイウエオカクケシスツトヌハヒフプヘホムヤユヨラリルレロワ
Note that the consonants aren't necessarily read the same way when they're made "big"; for example, 何ヶ月 is read "なんかげつ(nankagetsu)", not "なんけげつ(nankegetsu)". っ, which indicates a glottal stop on the next syllable, is read as "tsu" when it's made big.
The "big" vowels indicate that they need to be given their own syllable length. (There's a term for this too, but I'm not a linguist -- sorry!)
I'm a bit ignorant on the names of Japanese characters but I do know that Sugar.js has many methods for manipulating and transforming such characters. It has methods such as zenkaku, hankaku, hiragana, and katakana.
Here's a link to the Sugarjs' String API
Thanks for the help and guided me to right way
Finally I came up with this solution.
function convertBigKana(kanaVal){
var smallKana = Array('ァ','ィ','ゥ','ェ','ォ','ヵ','ヶ','ㇱ','ㇲ','ッ','ㇳ','ㇴ','ㇵ','ㇶ','ㇷ','ㇸ','ㇹ','ㇺ','ャ','ュ','ョ','ㇻ','ㇼ','ㇽ','ㇾ','ㇿ','ヮ');
var bigKana = Array('ア','イ','ウ','エ','オ','カ','ケ','シ','ス','ツ','ト','ヌ','ハ','ヒ','フ','ヘ','ホ','ム','ヤ','ユ','ヨ','ラ','リ','ル','レ','ロ','ワ');
var ckanaVal = '';
for (var i = 0; i < kanaVal.length; i++){
//var index = smallKana.indexOf(kanaVal.charAt(i)); //indexOf and stri[i] don't work on ie
var index = jQuery.inArray(kanaVal.charAt(i), smallKana);
if (index !== -1) {
ckanaVal+= bigKana[index];
}
else
{
ckanaVal+= kanaVal.charAt(i);
}
}
return ckanaVal;
}

Categories