Chrome counts characters wrong in textarea with maxlength attribute


Here is an example:
$(function() {
$('#test').change(function() {
$('#length').html($('#test').val().length)
})
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id=test maxlength=10></textarea>
length = <span id=length>0</span>
Fill the textarea with lines (one character per line) until the browser stops accepting input.
When you finish, leave the textarea, and the JS code will count the characters as well.
In my case I could enter only 7 characters (including whitespace) before Chrome stopped me, although the value of the maxlength attribute is 10.

Here's how to get your javascript code to match the amount of characters the browser believes is in the textarea:
http://jsfiddle.net/FjXgA/53/
$(function () {
$('#test').keyup(function () {
var x = $('#test').val();
var newLines = x.match(/(\r\n|\n|\r)/g);
var addition = 0;
if (newLines != null) {
addition = newLines.length;
}
$('#length').html(x.length + addition);
})
})
Basically you just count the total line breaks in the textbox and add 1 to the character count for each one.

Your carriage returns are considered 2 characters each when it comes to maxlength.
1\r\n
1\r\n
1\r\n
1
But it seems that the JavaScript only counts one character of each \r\n (I am not sure which one), which adds up to only 7.

It seems like the right method, based on Pointy's answer above, is to count all new lines as two characters. That will standardize it across browsers and match what will get sent when it's posted.
So we could follow the spec and replace every line break, whether it is a lone Carriage Return, a lone Line Feed, or an existing Carriage Return - Line Feed pair, with a Carriage Return - Line Feed pair.
var len = $('#test').val().replace(/\r\n|\r|\n/g, "\r\n").length;
Then use that variable to display the length of the textarea value, or limit it, and so on.
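To make the arithmetic concrete, here is a minimal sketch of the same idea as a plain function without jQuery. The name submittedLength is made up for illustration, and it assumes the goal is to count every line break as the two-character \r\n pair that gets posted:

```javascript
// Hypothetical helper: length of a textarea value once every line break
// (\r\n, lone \r, or lone \n) is normalized to the two-character \r\n pair.
function submittedLength(value) {
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

// Four single-character lines, as in the question's example.
var typed = "1\n1\n1\n1";
console.log(typed.length);           // 7
console.log(submittedLength(typed)); // 10
```

With four one-character lines, the plain length is 7 while the normalized length is 10, which would explain why the browser stops at what looks like 7 characters.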

For reasons unknown, jQuery always converts all newlines in the value of a <textarea> to a single character. That is, if the browser gives it \r\n for a newline, jQuery makes sure it's just \n in the return value of .val().
Chrome and Firefox both count the length of <textarea> tags the same way for the purposes of "maxlength".
However, the HTML spec's form submission rules insist that newlines be represented as \r\n. Thus, jQuery, WebKit, and Firefox all get this wrong.
The upshot is that "maxlength" on <textarea> tags is pretty much useless if your server-side code really has a fixed maximum size for a field value.
edit — at this point (late 2014) it looks like Chrome (38) behaves correctly. Firefox (33) however still doesn't count each hard return as 2 characters.

It looks like JavaScript is also counting the length of the newline characters.
Try using:
var x = $('#test').val();
x = x.replace(/(\r\n|\n|\r)/g,"");
$('#length').html(x.length);
I used it in your fiddle and it was working. Hope this helps.

That is because a new line is actually 2 bytes, and is therefore counted as 2 characters. JavaScript doesn't see it that way and counts only 1, making a total of 7 (including 3 new lines).

Here's a more universal solution, which overrides the jQuery 'val' function. Will be making this issue into a blog post shortly and linking here.
var originalVal = $.fn.val;
$.fn.val = function (value) {
if (typeof value == 'undefined') {
// Getter
if ($(this).is("textarea")) {
return originalVal.call(this)
.replace(/\r\n/g, '\n') // reduce all \r\n to \n
.replace(/\r/g, '\n') // reduce all \r to \n (we shouldn't really need this line. this is for paranoia!)
.replace(/\n/g, '\r\n'); // expand all \n to \r\n
// this two-step approach allows us to not accidentally catch a perfect \r\n
// and turn it into a \r\r\n, which wouldn't help anything.
}
return originalVal.call(this);
}
else {
// Setter
return originalVal.call(this, value);
}
};

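If you want to get the remaining content length of a textarea, you can use match on the string containing the line breaks.
HTML:
<textarea id="content" rows="5" cols="15" maxlength="250"></textarea>
JS:
var getContentWidthWithNextLine = function(){
// read the current text of the textarea declared above
var content = document.getElementById('content').value;
// each line break counts twice toward maxlength, so subtract it once more
return 250 - content.length - (content.match(/\n/g)||[]).length;
}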

var value = $('#textarea').val();
var numberOfLineBreaks = (value.match(/\n/g)||[]).length;
$('#textarea').attr("maxlength",500+numberOfLineBreaks);
This works perfectly in Google Chrome, but in IE you have to avoid the script! In IE the line break is counted only once, so avoid this solution in IE!

Textareas are still not fully in sync among browsers. I noticed 2 major problems: carriage returns and character encodings.
Carriage return
By default, line breaks are handled as 2 characters, \r\n (Windows style).
The problem is that Chrome and Firefox will count a line break as one character. You can also select it to observe that there is an invisible character, selected as a space.
A workaround is found here:
var length = $.trim($(this).val()).split(" ").join("").split('\n').join('').length;
Jquery word counts when user type line break
Internet Explorer, on the other hand, will count it as 2 characters.
Their representation is:
Binary: 00001101 00001010
Hex: 0D0A
They are represented in UTF-8 as 2 characters and counted for maxlength as 2 characters.
The HTML entities can be
1) Created from javascript code:
<textarea id='txa'></textarea>
document.getElementById("txa").value = String.fromCharCode(13, 10);
2) Parsed from the content of the textarea:
Ansi code:
<textarea>Line one.
Line two.</textarea>
3) Inserted from keyboard Enter key
4) Defined as the multiline content of the textbox
<textarea>Line one.
Line two.</textarea>
Character Encoding
The character encoding of an input field like a textarea is independent of the character encoding of the page. This is important if you plan to count bytes. So, if you have a meta header that defines an ANSI encoding for your page (with 1 byte per character), the content of your textbox is still UTF-8 with 2 bytes per character.
A workaround for the character encoding is provided here:
function htmlEncode(value){
// Create a in-memory div, set its inner text (which jQuery automatically encodes)
// Then grab the encoded contents back out. The div never exists on the page.
return $('<div/>').text(value).html();
}
function htmlDecode(value){
return $('<div/>').html(value).text();
}
HTML-encoding lost when attribute read from input field
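If the goal is to count bytes rather than characters, a rough sketch like the one below can help. It assumes the value ends up encoded as UTF-8 and that the standard TextEncoder API is available; the helper name utf8ByteLength is made up for illustration. Note that the byte count per character varies in UTF-8, so it is worth measuring rather than assuming a fixed size:

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log("\r\n".length, utf8ByteLength("\r\n"));     // 2 2
console.log("\u00e9".length, utf8ByteLength("\u00e9")); // 1 2  (e with acute accent)
console.log("\u20ac".length, utf8ByteLength("\u20ac")); // 1 3  (euro sign)
```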

Related

\s RegEx not capturing new line data

I am trying to clean up input and put it into a desired format. Basically, we have serial numbers that are entered several different ways: enter delimited (newline), space, comma, etc.
My problem with the code below is that, in testing, newline-delimited input isn't working. According to w3schools and 2 other sites:
The \s metacharacter is used to find a whitespace character.
A whitespace character can be:
-A space character
-A tab character
-A carriage return character
-A new line character
-A vertical tab character
-A form feed character
This should mean that I can catch basically any new line. In Netsuite, the user is entering the value as:
SN1SN2SN3
I want this to change to "SN1,SN2,SN3,". Currently the \s RegEx is not picking up the newline? Any help would be appreciated.
**For the record - while I am using Netsuite (CRM) to get the input, the rest of this code is typical javascript and regex work. This is why I am using all 3 tags - netsuite, js, and regex
function fixSerailNumberString(s_serialNum){
var cleanString = '';
var regExSpace = new RegExp('\\s',"g");
if(regExSpace.test(s_serialNum)){
var a_splitSN = s_serialNum.split(regExSpace);
for(var i = 0; i < a_splitSN.length;i++){
if(a_splitSN[i].length!=0){
cleanString = cleanString + a_splitSN[i]+((a_splitSN[i].split(',').length>1)?'':',');
}
}
return cleanString;
}
else{
alert("No cleaning needed");
return s_serialNum;
}
}
EDITS:
1-I need to handle both if it has spaces (such as "sn1, sn2, sn3" needs to become "sn1,sn2,sn3") and this newline issue. What I have above works for the spaces.
2- I am not sure if it matters, but the field is a textarea. Does that impact this?

@Cheery found why this was happening. As I said, I got the data from Netsuite and was using the API to get the data. In the UI of Netsuite this data did look like each value was on a new line; however, when doing a console.log the values were not.
Example:
UI displayed:
sn1
sn2
sn3
Console.log displayed:
sn1sn2sn3
I was assuming the UI translated into the actual value and didn't think to check what the string was.

NetSuite multi-select fields (like the Serial Numbers transaction column) usually return all selected values as a single string, as you've noted with "sn1sn2sn3"; however, each of these values is actually separated by a non-printing character \x05. Try .split(/\x05/).join(',')
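As a sketch of how the pieces could fit together, the hypothetical helper below (normalizeSerials is not a NetSuite API, just an illustrative name) splits on whitespace, commas, and the \x05 separator in one pass. It assumes serial numbers themselves never contain those characters:

```javascript
// Hypothetical helper: split on whitespace, commas, or the \x05 separator,
// drop empty pieces, and join the rest with commas.
function normalizeSerials(raw) {
  return raw.split(/[\s,\x05]+/).filter(Boolean).join(",");
}

console.log(normalizeSerials("sn1\x05sn2\x05sn3")); // sn1,sn2,sn3
console.log(normalizeSerials("sn1, sn2,\nsn3"));    // sn1,sn2,sn3
```

If the trailing comma from the question ("SN1,SN2,SN3,") is really required, it can be appended to the result afterwards.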

Remove Unicode characters within various ranges in javascript

I'm trying to remove every Unicode character in a string if it falls in any of the ranges below.
\uD800-\uDFFF
\u1D800-\u1DFFF
\u2D800-\u2DFFF
\u3D800-\u3DFFF
\u4D800-\u4DFFF
\u5D800-\u5DFFF
\u6D800-\u6DFFF
\u7D800-\u7DFFF
\u8D800-\u8DFFF
\u9D800-\u9DFFF
\uAD800-\uADFFF
\uBD800-\uBDFFF
\uCD800-\uCDFFF
\uDD800-\uDDFFF
\uED800-\uEDFFF
\uFD800-\uFDFFF
\u10D800-\u10DFFF
As an initial prototype, I tried to just remove characters within the first range by using a regex in the replace function.
var buffer = "he\udfffllo world";
var output = buffer.replace(/[\ud800-\udfff]/g, "");
d.innerText = buffer + " is replaced with " + output;
In this case, the character seems to have been replaced fine.
However, when I replace that with
var buffer = "he\udfffllo worl\u1dfffd";
var output = buffer.replace(/[\ud800-\udfff\u1d800-\u1dfff]/g, "");
d.innerText = buffer + " is replaced with " + output;
I see something unexpected. My output shows up as:
he�llo worl᷿fd is replaced with
There are two things to note here:
\u1dfff does not show up as one character - \u1dff gets converted to a character and the f at the end is treated as its own character
the result is an empty string.
Any suggestions on how I can accomplish this would be much appreciated.
EDIT
My overall goal is to filter out all characters that the encodeURIComponent function considers invalid. I ran some tests and found the list above to be the set of characters that are invalid. For instance, the code below, which first converts 1dfff to a Unicode character before passing it to encodeURIComponent, causes an exception to be raised by the latter function.
var v = String.fromCharCode(122879);
var uriComponent = encodeURIComponent(v);
I edited parts of the question after @Blender pointed out that I was using x instead of u in my code to represent Unicode characters.
EDIT 2
I investigated my technique for fetching the "invalid" Unicode ranges further, and as it turns out, if you give String.fromCharCode a number that's larger than 16 bits, it'll just look at the lowest 16 bits of the number. That explains the pattern I was seeing. So I only need to worry about the first range.

It seems you're trying to remove Unicode surrogate code units from the string. However, only U+D800 through U+DFFF are surrogate code points; the remaining values you name are not, and could be allocated to valid Unicode characters. In that case, the following will suffice (use \u rather than \x to refer to Unicode characters):
buffer.replace(/[\ud800-\udfff]/g, "");
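The sketch below illustrates both points. It first checks the 16-bit truncation described in the question, then shows a variant of the replace that removes only unpaired surrogates, so valid surrogate pairs such as emoji survive. This variant is an assumption about what may be wanted, not the answer's own method; the helper name stripLoneSurrogates is made up, and the pattern relies on lookbehind support (ES2018):

```javascript
// String.fromCharCode keeps only the low 16 bits of each argument,
// so 0x1DFFF (122879) yields the same code unit as 0xDFFF.
console.log(String.fromCharCode(0x1DFFF) === "\uDFFF"); // true

// Hypothetical helper: remove only unpaired surrogates, leaving valid
// surrogate pairs (for example emoji) intact.
function stripLoneSurrogates(text) {
  return text.replace(
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    ""
  );
}

var cleaned = stripLoneSurrogates("he\uDFFFllo \uD83D\uDE00");
console.log(encodeURIComponent(cleaned)); // hello%20%F0%9F%98%80
```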

To do a auto enter in text editor on reaching a particular character limit perline

In a JavaScript text editor, the total number of characters accepted per line is 65. I want to change the number of characters accepted per line to 35 without changing the total line width of my text editor (65).
So, if a user enters the 36th character, the cursor should go to the second line, although the character length per line is 65. It has to accept only 35 characters per line.

One can only assume you're talking about a TEXTAREA HTML tag, in which case your best bet would be to use a JavaScript regex replace to insert new lines; you can attach this to the OnChange event.
edit This JS fiddle seems to work. I used the 'keyup' event instead of 'change'. http://jsfiddle.net/sUS5s/
HTML
<textarea id="demo" rows="5" cols="65"></textarea>
JS
$("#demo").keyup(function(event) {
var txt = $(this).val();
$(this).val(txt.replace(/([^\r\n]{35})/gm, "$1\r"));
});
Simple regex replace with substitution. Matching 35 consecutive NON-newline characters and replacing them with themselves plus a newline.
This will work if you continually type forward; you need something a little more complex to let you insert into previous lines etc. without the new lines getting skewed.
EDIT Without jQuery
document.getElementById("demo").onkeyup = function() {
var txt = document.getElementById("demo").value;
document.getElementById("demo").value = txt.replace(/([^\r\n]{35})/gm, "$1\r");
};
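One thing to watch for: as far as I can tell, the pattern above also matches a line that is already exactly 35 characters long, so repeated keyup events may keep adding breaks. A possible refinement, sketched here under that assumption, is to break only when more text follows on the same line. The helper name wrapLines is made up for illustration:

```javascript
// Hypothetical helper: break after every `limit` non-newline characters,
// but only when more text follows on the same line, so re-running it on
// already wrapped text changes nothing.
function wrapLines(text, limit) {
  var pattern = new RegExp("([^\\r\\n]{" + limit + "})(?=[^\\r\\n])", "g");
  return text.replace(pattern, "$1\n");
}

var once = wrapLines("x".repeat(36), 35);
console.log(once.split("\n").map(function (line) { return line.length; })); // [ 35, 1 ]
console.log(wrapLines(once, 35) === once); // true
```

In the keyup handler this would replace the inline regex, for example by assigning wrapLines(txt, 35) back to the textarea's value. The break then appears when the 36th character is typed, which matches what the question asks for.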

New line characters in text area increases text length in C#

I have this problem in my asp.net mvc application.
In one of my models there is a field "Description". The database column for this field is set to NVarchar(300).
In my view I am creating a text area as follows.
@Html.TextAreaFor(m => m.Description, new { maxlength = "300" })
I am using "jquery.validate.unobtrusive.min.js" for client-side validation. So when the user types in the textarea and the content length goes over 300 characters, it displays the message "Please enter no more than 300 characters."
Everything works fine until the following scenario occurs.
The user enters the following data in the text area.
f
f
f
f
f
f
f
f
sdfa
(this content has 8 new lines)
According to "unobtrusive" validation this content has length 300 (counting each new line "\n" as a single character), so the validation passes and the page posts back.
In my C# code, due to encoding, the same content has length 308 (counting each new line "\r\n" as 2 characters), which in turn fails the database operation, as it only allows 300 characters.
If someone is saying that I should have StringLength attribute on this particular property, I have the following reason for not having it.
If I put this attribute the client-side validation does not happen for this particular property, it goes to the server and since the model is not valid it comes back to the page with error message.
Please advise me what could be the possible solution for this?

After taking a closer look at the solution by @Chris, I found that this would cause an endless loop for any control other than a textarea with the @maxlength attribute. Furthermore, I found that value (the value of the textarea passed into the validator) already has the leading and trailing line breaks cut off, which means the database operation still failed when it tried to save text containing those line breaks.
So here is my solution:
(function ($) {
if ($.validator) {
//get the reference to the original function into a local variable
var _getLength = $.validator.prototype.getLength;
//overwrite existing getLength of validator
$.validator.prototype.getLength = function (value, element) {
//double count line breaks for textareas only
if (element.nodeName.toLowerCase() === 'textarea') {
//Counts all the newline characters (\r = return for macs, \r\n for Windows, \n for Linux/unix)
var newLineCharacterRegexMatch = /\r?\n|\r/g;
//use [element.value] rather than [value] since I found that the value passed in does cut off leading and trailing line breaks.
if (element.value) {
//count newline characters
var regexResult = element.value.match(newLineCharacterRegexMatch);
var newLineCount = regexResult ? regexResult.length : 0;
//replace newline characters with nothing
var replacedValue = element.value.replace(newLineCharacterRegexMatch, "");
//return the length of text without newline characters + doubled newline character count
return replacedValue.length + (newLineCount * 2);
} else {
return 0;
}
}
//call the original function reference with apply
return _getLength.apply(this, arguments);
};
}
})(jQuery);
I tested this in Chrome and a few IE versions and it worked fine for me.

You can change the behavior for getLength in client validation to double count newlines by adding the following to your javascript after you've included jquery.validate.js. This will cause the server-side and client-side length methods to match letting you use the StringLength attribute (I assume your issue with StringLength was that the server and client validation methods differed).
$.validator.prototype._getLength = $.validator.prototype.getLength;
$.validator.prototype.getLength = function (value, element) {
// Double count newlines in a textarea because they'll be turned into \r\n by the server.
if (element.nodeName.toLowerCase() === 'textarea')
return value.length + value.split('\n').length - 1;
return this._getLength(value, element);
};

How to force breaking of non breakable strings?

I have an HTML page that I generate from the data contained in a database. The database sometimes contains long strings that the browser can't break because the strings don't contain breakable characters (space, point, comma, etc...).
Is there any way to fix this using html, css or even javascript?
See this link for an example of the problem.

Yes you can, just set the css property of the box to:
.some_selector {
word-wrap: break-word;
}
Edit: Some testing shows that it does work with a div or a p - a block level element - but it does not work with a table cell, nor when the div is put inside a table cell.
Tested and works in IE6, IE7, IE8, Firefox 3.5.3 and Chrome.
Works:
<div style="word-wrap: break-word">aaaaaaaaaaaaaaaaaaaaaaddddddddddddddddddddddddddddddddddddddddddaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa </div>

Based on this article and this one as well: the "Shy Hyphen" or "Soft Hyphen" can be written in HTML as: &shy; / &#173; / &#xAD; (173 dec = AD hex). They all convert to the U+00AD character.
The JavaScript textContent and nodeValue of the DOM Text Nodes are not 'entity encoded' - they just contain the actual entities. In order to write these characters you must therefore encode them yourself: \xAD is a simple way to write the same character in a JavaScript string. String.fromCharCode(173) would also work.
Based on your own VERY good answer - a jQuery Plugin version:
$.fn.replaceInText = function(oldText, newText) {
// contents() gets all child dom nodes -- each lets us operate on them
this.contents().each(function() {
if (this.nodeType == 3) { // text node found, do the replacement
if (this.textContent) {
this.textContent = this.textContent.replace(oldText, newText);
} else { // support to IE
this.nodeValue = this.nodeValue.replace(oldText, newText);
}
} else {
// other types of nodes - scan them for same replace
$(this).replaceInText(oldText, newText);
}
});
return this;
};
$(function() {
$('div').replaceInText(/\w{10}/g, "$&\xAD");
});
A side note:
I think that the place this should happen is NOT in JavaScript - it should be in the server-side code. If this is only a page used to display data, you could easily do a similar regexp replace on the text before it is sent to the browser. However, the JavaScript solution offers one advantage (or disadvantage, depending on how you want to look at it): it doesn't add any extraneous characters to the data until the script executes, which means any robots crawling your HTML output for data won't see the shy hyphens. Although the HTML spec interprets it as a "hyphenation hint" and an invisible character, it's not guaranteed across the rest of the Unicode world: (quote from Unicode standard via the second article I linked)
U+00AD soft hyphen indicates a
hyphenation point, where a line-break
is preferred when a word is to be
hyphenated. Depending on the script,
the visible rendering of this
character when a line break occurs may
differ (for example, in some scripts
it is rendered as a hyphen -, while in
others it may be invisible).
Another Note:
Found in this other SO Question - it seems that the "Zero Width Space" character, &#8203; / &#x200B; / U+200B, is another option you might want to explore. It would be \u200b as a JavaScript string.
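For reference, here is a small sketch of the zero width space variant. It assumes the escape "\u200b" for U+200B (a two-digit \x escape cannot express it), and the helper name addBreakHints is made up for illustration:

```javascript
// "\u200b" is the zero width space (U+200B); "\x20\x0b" would instead be
// a regular space followed by a vertical tab.
var ZWSP = "\u200b";

// Hypothetical helper: add a zero width space after every `size` word characters.
function addBreakHints(text, size) {
  var pattern = new RegExp("\\w{" + size + "}", "g");
  return text.replace(pattern, "$&" + ZWSP);
}

var hinted = addBreakHints("a".repeat(25), 10);
console.log(hinted.length); // 27
```

With the plugin above, the equivalent call would pass "$&\u200b" as the replacement text instead of "$&\xAD".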

As it has been pointed out numerous times, no, there is nothing you can do about it, without preprocessing the strings programmatically before displaying them.
I know there is a strategy of inserting the soft hyphen character (&shy;) where needed, but it does not seem like a popular option.
Check out this question: Soft hyphen in HTML (<wbr> vs. &shy;)

It is also possible to use word-break css property to cut every word on the element edge.
.selector_name {
word-break: break-all;
}
<p class="selector_name">some words some words some words some words</p>
you can obtain:
some word|
s some wo|<-edge of the element
rds some |
words som|
e words |

There is special character  or  that could do it.
For example:
Dzielenie wyrazów
could be display like:
1. dzie
2. le
3. nie wy
5. ra
6. zow

I'm answering my own question here...
Based on your answers, I came up with this solution (thanks to @CMS in this question for his help).
This script breaks any word that is more than 30 characters long by inserting a space at the 31st position.
Here is the fixed version: link
I have one problem left: I'd rather insert a &shy; than a space. But assigning to node.nodeValue or node.textContent causes the insertion of the literal text &shy;, not the character.
<script type="text/javascript">
$(function() {
replaceText(/\w{30}/g, "$& ", document.body);
});
function replaceText(oldText, newText, node) {
node = node || document.body; // base node
var childs = node.childNodes, i = 0;
while (node = childs[i]) {
if (node.nodeType == 3) { // text node found, do the replacement
if (node.textContent) {
node.textContent = node.textContent.replace(oldText, newText);
} else { // support to IE
node.nodeValue = node.nodeValue.replace(oldText, newText);
}
} else { // not a text node, look forward
replaceText(oldText, newText, node);
}
i++;
}
}
</script>
I'll wait a few days before I accept this answer in case someone comes up with a simpler solution.
Thanks

The issue with using &shy; and the solutions above is that an extra character is still there, and with a copy/paste action (even in plain text) it comes out.
I would use instead the tag <wbr> that is not visible and is not considered when copying.
For example, to have email addresses break in two lines (only when there is not enough space) I use this:
echo str_replace( "@","<wbr>@", $email );
That results in something like this:
name.surname
@website.com

You can use jQuery to achieve that. Let me explain a little. First you need to add the jQuery reference; there is also a plug-in which may help you: Read More Plugin - JQuery. But you need to hook into your code during the fetch phase. At this point you can handle this problem in an HttpHandler or in the Page_PreInit phase, but without any server-side code it will be hard, or perhaps there isn't any way. I don't know, but you should be able to add something to your database-fetched HTML page.

It's easier to break up the long words from a text string, before you add them to the document.
It would also be nice to avoid orphans, where you have only one or two characters on the last line.
This method will insert spaces in every unspaced run of characters longer than n,
splitting it so that there are at least min characters on the last line.
function breakwords(text, n, min){
var L= text.length;
n= n || 20;
min= min || 2;
while(L%n && L%n<min)--n;
var Rx= RegExp('(\\w{'+n+',}?)','g');
text= text.replace(Rx,'$1 ');
return text;
}
//test
var n=30, min=5;
var txt= 'abcdefghijklmnopqrstuvwxyz0123456789 abcdefghijklmnopqrstuvwxyz012345678 abcdefghijklmnopqrstuvwxyz01234567 abcdefghijklmnopqrstuvwxyz0123456';
txt=txt.replace(/(\w{30,})/g,function(w){return breakwords(w,n,min)});
alert(txt.replace(/ +/g,'\n'))
/* returned value: (String)
abcdefghijklmnopqrstuvwxyz0123
456789
abcdefghijklmnopqrstuvwxyz0123
45678
abcdefghijklmnopqrstuvwxyz012
34567
abcdefghijklmnopqrstuvwxyz01
23456
*/


