How to force breaking of non breakable strings? - javascript

I have an HTML page that I generate from the data contained in a database. The database sometimes contains long strings that the browser can't break because the strings don't contain breakable characters (space, point, comma, etc...).
Is there any way to fix this using html, css or even javascript?
See this link for an example of the problem.

Yes you can, just set the css property of the box to:
.some_selector {
word-wrap: break-word;
}
Edit: Some testing shows that it does work with a div or a p - a block level element - but it does not work with a table cell, nor when the div is put inside a table cell.
Tested and works in IE6, IE7, IE8, Firefox 3.5.3 and Chrome.
Works:
<div style="word-wrap: break-word">aaaaaaaaaaaaaaaaaaaaaaddddddddddddddddddddddddddddddddddddddddddaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa </div>

Based on this article and this one as well: the "Shy Hyphen" or "Soft Hyphen" can be written in HTML as: &shy; / &#173; / &#xAD; (173 dec = AD hex). They all convert to the U+00AD character.
The textContent and nodeValue of DOM text nodes are not 'entity encoded'; they contain the actual characters, not entities. To write this character from JavaScript you must therefore put the character itself into the string: \xAD is a simple way to write it in a JavaScript string. String.fromCharCode(173) would also work.
Based on your own VERY good answer - a jQuery Plugin version:
$.fn.replaceInText = function(oldText, newText) {
// contents() gets all child dom nodes -- each lets us operate on them
this.contents().each(function() {
if (this.nodeType == 3) { // text node found, do the replacement
if (this.textContent) {
this.textContent = this.textContent.replace(oldText, newText);
} else { // support to IE
this.nodeValue = this.nodeValue.replace(oldText, newText);
}
} else {
// other types of nodes - scan them for same replace
$(this).replaceInText(oldText, newText);
}
});
return this;
};
$(function() {
$('div').replaceInText(/\w{10}/g, "$&\xAD");
});
A side note:
I think that the place this should happen is NOT in JavaScript; it should be in the server-side code. If this is only a page used to display data, you could easily do a similar regexp replace on the text before it is sent to the browser. However, the JavaScript solution offers one advantage (or disadvantage, depending on how you want to look at it): it doesn't add any extraneous characters to the data until the script executes, which means any robots crawling your HTML output for data won't see the shy hyphens. Although the HTML spec interprets it as a "hyphenation hint" and an invisible character, that's not guaranteed across the rest of the Unicode world: (quote from the Unicode standard via the second article I linked)
U+00AD soft hyphen indicates a
hyphenation point, where a line-break
is preferred when a word is to be
hyphenated. Depending on the script,
the visible rendering of this
character when a line break occurs may
differ (for example, in some scripts
it is rendered as a hyphen -, while in
others it may be invisible).
Another Note:
Found in this other SO Question: it seems that the "Zero Width Space" character &#8203; / &#x200B; / U+200B is another option you might want to explore. It would be \u200B as a JavaScript string.
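The two invisible characters discussed above can be inserted with plain string code before the text ever reaches the DOM. What follows is only a minimal sketch: addBreakPoints and its parameters are hypothetical names, not part of any answer here, and the chunk size of 10 is an arbitrary assumption.

```javascript
// Hypothetical helper (not from the answers above): insert an invisible
// break opportunity after every `n` consecutive word characters.
var SOFT_HYPHEN = "\u00AD";      // may render as a hyphen when the line breaks there
var ZERO_WIDTH_SPACE = "\u200B"; // allows a break without showing anything

function addBreakPoints(text, n, breakChar) {
  n = n || 10;                          // assumed default chunk size
  breakChar = breakChar || SOFT_HYPHEN; // assumed default break character
  // The look-ahead (?=\w) avoids adding a break point at the very end of a word.
  var pattern = new RegExp("(\\w{" + n + "})(?=\\w)", "g");
  return text.replace(pattern, "$1" + breakChar);
}

var broken = addBreakPoints("abcdefghijklmnopqrstuvwxy", 10, ZERO_WIDTH_SPACE);
console.log(broken.length); // 27: the 25 letters plus 2 zero-width spaces
```

The result can be assigned to textContent or nodeValue directly, since it contains the real characters rather than entities.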

As it has been pointed out numerous times, no, there is nothing you can do about it, without preprocessing the strings programmatically before displaying them.
I know there is a strategy of inserting the soft hyphen character (&shy;) where needed, but it does not seem like a popular option.
Check out this question: Soft hyphen in HTML (<wbr> vs. &shy;)

It is also possible to use word-break css property to cut every word on the element edge.
.selector_name {
word-break: break-all;
}
<p class="selector_name">some words some words some words some words</p>
you can obtain:
some word|
s some wo|<-edge of the element
rds some |
words som|
e words |

There is a special character, &shy; or &#173;, that could do it.
For example:
Dzie&shy;le&shy;nie wy&shy;ra&shy;zów
could be displayed like:
1. dzie
2. le
3. nie wy
4. ra
5. zów

I'm answering my own question here...
Based on your answers, I came up with this solution (thanks to @CMS in this question for his help).
This script breaks any word that is more than 30 characters long by inserting a space at the 31st position.
Here is the fixed version: link
I have one problem left: I'd rather insert a &shy; than a space. But assigning to node.nodeValue or node.textContent inserts the literal text &shy;, not the soft hyphen character.
<script type="text/javascript">
$(function() {
replaceText(/\w{30}/g, "$& ", document.body);
});
function replaceText(oldText, newText, node) {
node = node || document.body; // base node
var childs = node.childNodes, i = 0;
while (node = childs[i]) {
if (node.nodeType == 3) { // text node found, do the replacement
if (node.textContent) {
node.textContent = node.textContent.replace(oldText, newText);
} else { // support to IE
node.nodeValue = node.nodeValue.replace(oldText, newText);
}
} else { // not a text node, recurse into its children
replaceText(oldText, newText, node);
}
i++;
}
}
</script>
I'll wait a few days before I accept this answer in case someone comes up with a simpler solution.
Thanks

The issue with using &shy; and the solutions above is that an extra character is still there, and with a copy/paste action (even in plain text) it comes out.
I would use instead the tag <wbr> that is not visible and is not considered when copying.
For example, to have email addresses break in two lines (only when there is not enough space) I use this:
echo str_replace( "@","<wbr>@", $email );
That results in something like this:
name.surname
@website.com
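The same idea can be sketched in JavaScript for pages that build their markup client-side. This assumes the break should go before the @ of an email address; breakableEmail and escapeHtml are hypothetical helper names, and the escaping step is an addition so that the result is safe to assign to innerHTML.

```javascript
// Hypothetical helper: escape text so it can be placed inside HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: allow a line break before each "@" by inserting <wbr>.
// The address is escaped first, so only our own <wbr> tag ends up as markup.
function breakableEmail(email) {
  return escapeHtml(email).replace(/@/g, "<wbr>@");
}

console.log(breakableEmail("name.surname@website.com"));
// name.surname<wbr>@website.com
```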

You can use jQuery to achieve that. Let me explain a little: first add a reference to jQuery; there is also a plug-in which may help you: Read More Plugin - JQuery. But you need to hook into your code during the fetch phase. You can handle this in an HttpHandler or in the Page_PreInit phase; without any server-side code it will be hard, and perhaps there isn't any way. I don't know, but you should be able to add something to your database-fetched HTML page.

It's easier to break up the long words from a text string, before you add them to the document.
It would also be nice to avoid orphans, where you have only one or two characters on the last line.
This method will insert spaces in every unspaced run of characters longer than n,
splitting it so that there are at least min characters on the last line.
function breakwords(text, n, min){
var L= text.length;
n= n || 20;
min= min || 2;
while(L%n && L%n<min)--n;
var Rx= RegExp('(\\w{'+n+',}?)','g');
text= text.replace(Rx,'$1 ');
return text;
}
//test
var n=30, min=5;
var txt= 'abcdefghijklmnopqrstuvwxyz0123456789 abcdefghijklmnopqrstuvwxyz012345678 abcdefghijklmnopqrstuvwxyz01234567 abcdefghijklmnopqrstuvwxyz0123456';
txt=txt.replace(/(\w{30,})/g,function(w){return breakwords(w,n,min)});
alert(txt.replace(/ +/g,'\n'))
/* returned value: (String)
abcdefghijklmnopqrstuvwxyz0123
456789
abcdefghijklmnopqrstuvwxyz0123
45678
abcdefghijklmnopqrstuvwxyz012
34567
abcdefghijklmnopqrstuvwxyz01
23456
*/

Related

Stairs pattern in javascript

This function in JavaScript doesn't work as desired. But when written in C, it works as desired.
var patt_2 = function()
{
for(i=5;i>=1;i--)
{
for(j=1;j<i;j++)
{
$("#panel8").append(" ");
}
for(k=5;k>=i;k--)
{
$("#panel8").append("*");
}
$("#panel8").append("<br/>");
}
};
Undesired output
Desired Output
You could give your #panel8 element the following style so that multiple whitespace characters are preserved instead of collapsed:
#panel8 {
white-space: pre;
}
You can read here what this does, quote:
This value prevents user agents from collapsing sequences of white space. Lines are only broken at preserved newline characters.
Basically HTML collapses whitespace characters (" ") to create one space. If you use &nbsp; you overcome this issue, but it creates uglier code in general (since you'll have to append "&nbsp;" to your string all the time).
If your element can have white-space: pre; then this is a clean and easy solution which doesn't require you to edit your script!
P.S. It's not JS in which it's not working but rather HTML or the HTML parser that collapses the whitespace.
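For cases where the stylesheet cannot be changed, the non-breaking-space alternative mentioned above can be kept out of the markup by building each row as a string first. This is only a sketch: stairLines is a hypothetical name, and it uses the U+00A0 character (the character that &nbsp; stands for), which HTML does not collapse.

```javascript
var NBSP = "\u00A0"; // the character that &nbsp; represents

// Hypothetical helper: build the rows of the stair pattern as strings.
// Row i (counting down from `height` to 1) gets i - 1 leading non-breaking
// spaces followed by height - i + 1 stars, matching the loops in the question.
function stairLines(height) {
  var lines = [];
  for (var i = height; i >= 1; i--) {
    lines.push(NBSP.repeat(i - 1) + "*".repeat(height - i + 1));
  }
  return lines;
}

console.log(stairLines(5).join("\n"));
// In the page this could be used as:
// $("#panel8").append(stairLines(5).join("<br/>"));
```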

Replace words of text area

I have made a javascript function to replace some words with other words in a text area, but it doesn't work. I have made this:
function wordCheck() {
var text = document.getElementById("eC").value;
var newText = text.replace(/hello/g, '<b>hello</b>');
document.getElementById("eC").innerText = newText;
}
When I alert the variable newText, the console says that the variable doesn't exist.
Can anyone help me?
Edit:
Now it replaces the words, but it replaces them with the literal text <b>hello</b>, whereas I want the word shown in bold. Is there a solution?
Update:
In response to your edit, about your wanting to see the word "hello" show up in bold. The short answer to that is: it can't be done. Not in a simple textarea, at least. You're probably looking for something more like an online WYSIWYG editor, or at least an RTE (Rich Text Editor). There are a couple of them out there, like tinyMCE, for example, which is a decent WYSIWYG editor. A list of RTEs and HTML editors can be found here.
First off: As others have already pointed out: a textarea element's contents is available through its value property, not the innerText. You get the contents alright, but you're trying to update it through the wrong property: use value in both cases.
If you want to replace all occurrences of a string/word/substring, you'll have to resort to using a regular expression, using the g modifier. I'd also recommend making the matching case-insensitive, to replace "hello", "Hello" and "HELLO" all the same:
var txtArea = document.querySelector('#eC');
txtArea.value = txtArea.value.replace(/(hello)/gi, '<b>$1</b>');
As you can see: I captured the match, and used it in the replacement string, to preserve the caps the user might have used.
But wait, there's more:
What if, for some reason, the input already contains <b>Hello</b>, or contains a word containing the string "hello" like "The company is called hellonearth?" Enter conditional matches (aka lookaround assertions) and word boundaries:
txtArea.value = txtArea.value.replace(/(?!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
fiddle
How it works:
(?!>): a negative look-ahead. Note that a look-ahead inspects the characters that follow the current position, so placed in front of \b(hello) it always succeeds; to require that the match isn't preceded by a > char you need a negative look-behind, (?<!>), or the more specific (?<!<b>), in engines that support look-behind
\b: a word boundary, to make sure we're not matching part of a word
(hello): match and capture the string literal, provided there is a word boundary on both sides
(?!<): another negative look-ahead; this time we don't want the match to be followed by a closing </b>, so you can replace this with the more specific (?!<\/b>)
/gi: modifiers, or flags, that affect the entire pattern: g for global (meaning this pattern will be applied to the entire string, not just a single match). The i tells the regex engine the pattern is case-insensitive, ie: h matches both the upper and lowercase character.
The replacement string <b>$1</b>: when the replacement string contains $n substrings, where n is a number, they are treated as backreferences. A regex can group matches into various parts, each group has a number, starting with 1, depending on how many groups you have. We're only grouping one part of the pattern, but suppose we wrote:
'foobar hello foobar'.replace(/(hel)(lo)/g, '<b>$1-$2</b>');
The output would be "foobar <b>hel-lo</b> foobar", because we've split the match up into 2 parts, and added a dash in the replacement string.
I think I'll leave the introduction to RegExp at that... even though we've only scratched the surface, I think it's quite clear now just how powerful regex's can be. Put some time and effort into learning more about this fantastic tool, it is well worth it.
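To make the lookaround and word-boundary discussion concrete, here is a small sketch that can be run outside the browser. It assumes a JavaScript engine with look-behind support (ES2018 or later), and boldHello is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: wrap the whole word "hello" (any capitalisation) in
// <b> tags, skipping occurrences that are already wrapped.
function boldHello(text) {
  // (?<!<b>)  negative look-behind: not directly preceded by "<b>"
  // \b...\b   word boundaries: "hellonearth" is left alone
  // (?!<\/b>) negative look-ahead: not directly followed by "</b>"
  return text.replace(/(?<!<b>)\b(hello)\b(?!<\/b>)/gi, "<b>$1</b>");
}

console.log(boldHello("Hello hellonearth <b>hello</b>"));
// <b>Hello</b> hellonearth <b>hello</b>
```

Because the match is captured and reused as $1, the user's capitalisation is preserved, and running the function a second time leaves the text unchanged.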
If <textarea>, then you need to use .value property.
document.getElementById("eC").value = newText;
And, as Barmar mentioned, replace() with a string argument replaces only the first occurrence. To replace every occurrence, you need to use a simple regex. Note that I removed the quotes. /g means global replace.
var newText = text.replace(/hello/g, '<b>hello</b>');
But if you want to really bold your text, you need to use content editable div, not text area:
<div id="eC" contenteditable></div>
So then you need to access innerHTML:
function wordCheck() {
var text = document.getElementById("eC").innerHTML;
var newText = text.replace(/hello/g, '<b>hello</b>');
newText = newText.replace(/<b><b>/g,"<b>");//These two lines are there to prevent <b><b>hello</b></b>
newText = newText.replace(/<\/b><\/b>/g,"</b>");
document.getElementById("eC").innerHTML = newText;
}

testing to see if you have empty text node in xml

Using Ajax to fetch XML data. question about blank text nodes
I read this question/ answer here
Javascript/XML - Getting the node name
and this helped my understanding a ton about how the structure is set up however I still have a question or two.. when he mentions this part:
"Text node with a carriage return and some spaces or tab"
How would you test to see if you have gotten an empty text node like this? I've tried testing to see:
if nodeValue == null
nodeValue == "null"
nodeValue == ""
nodeValue == " "
none of these appear to be working
I figured maybe the length would be 0 so I tested for .length and it returned 5 (1 return key and 4 tabs.. added an extra tab in there and tested again it returned 6)
I then googled how to remove whitespace and used these:
.replace(/\s+/g, ' ');
.replace(/^\s+|\s+$/g, '');
Neither worked and still said the .length was still 5
Reason I want to test for this is because what if I don't know each of the element node names before hand or exactly how the DOM is set up.
Or is there a better way to navigate without bothering to check if a text node is just the tabs/spaces/return key?
.replace(/^\s+|\s+$/g, ''); works unless the spaces/tabs aren't really spaces and tabs, but one of the various other Unicode space-like characters. So I'm guessing you weren't quite using it correctly.
This would be how to use it:
if (!textNode.nodeValue.replace(/^\s+|\s+$/g, '')) {
// Node is empty
}
Example: Live Copy | Live Source
var xml = parseXml("<test>\n\t\t\t</test>");
var textNode = xml.documentElement.firstChild;
display("textNode.nodeValue.length = " +
textNode.nodeValue.length);
display('Is "empty"? ' +
(!textNode.nodeValue.replace(/^\s+|\s+$/g, '')));
function display(msg) {
var p = document.createElement('p');
p.innerHTML = String(msg);
document.body.appendChild(p);
}
Output:
textNode.nodeValue.length = 4
Is "empty"? true
(Where parseXml is from this answer.)
But we really don't need to do a replace (although the overhead of doing it is trivial). A test like this would do it:
if (/^\s*$/.test(textNode.nodeValue)) {
// The node is "empty"
}
That regular expression (zero or more whitespace characters with anchors at both ends) will only match a string that's already empty or consists only of whitespace characters.
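The same test can be used to skip whitespace-only text nodes while walking a node's children, which addresses the case where the element names are not known beforehand. The sketch below is DOM-free so it can be run anywhere: isBlank and meaningfulNodes are hypothetical names, and the plain objects stand in for real nodes (only nodeType and nodeValue are used).

```javascript
// True when the string is empty or consists only of whitespace characters.
function isBlank(value) {
  return /^\s*$/.test(value);
}

// Hypothetical helper: return the nodes from an array-like list, leaving out
// text nodes (nodeType 3) that contain only whitespace.
function meaningfulNodes(nodes) {
  var kept = [];
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (node.nodeType === 3 && isBlank(node.nodeValue)) {
      continue; // indentation between elements, skip it
    }
    kept.push(node);
  }
  return kept;
}

// Stand-ins for real DOM nodes, for illustration only.
var children = [
  { nodeType: 3, nodeValue: "\n\t\t\t" },
  { nodeType: 1, nodeName: "item" },
  { nodeType: 3, nodeValue: " some text " }
];
console.log(meaningfulNodes(children).length); // 2
```

With a real document, something like element.childNodes could be passed in place of the array.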

Chrome counts characters wrong in textarea with maxlength attribute

Here is an example:
$(function() {
$('#test').change(function() {
$('#length').html($('#test').val().length)
})
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id=test maxlength=10></textarea>
length = <span id=length>0</span>
Fill the textarea with lines (one character per line) until the browser stops you.
When you finish, leave the textarea, and the JS code will count the characters too.
In my case I could enter only 7 characters (including whitespace) before Chrome stopped me, although the value of the maxlength attribute is 10:
Here's how to get your javascript code to match the amount of characters the browser believes is in the textarea:
http://jsfiddle.net/FjXgA/53/
$(function () {
$('#test').keyup(function () {
var x = $('#test').val();
var newLines = x.match(/(\r\n|\n|\r)/g);
var addition = 0;
if (newLines != null) {
addition = newLines.length;
}
$('#length').html(x.length + addition);
})
})
Basically you just count the total line breaks in the textbox and add 1 to the character count for each one.
Your carriage returns are considered 2 characters each when it comes to maxlength.
1\r\n
1\r\n
1\r\n
1
But it seems that the JavaScript only counts one of the \r\n characters (I am not sure which one), which only adds up to 7.
It seems like the right method, based on Pointy's answer above, is to count all new lines as two characters. That will standardize it across browsers and match what will get sent when it's posted.
So we could follow the spec and normalize every line break, whether it is a lone Carriage Return, a lone New Line, or an existing Carriage Return - Line Feed pair, to a Carriage Return - Line Feed pair. (Matching \r\n first keeps an existing pair from being expanded into \r\r\n.)
var len = $('#test').val().replace(/\r\n|\r|\n/g, "\r\n").length;
Then use that variable to display the length of the textarea value, or limit it, and so on.
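As a self-contained illustration of that normalization, here is a sketch that works on a plain string rather than on a textarea. submittedLength is a hypothetical name, and the assumption is that every line break should count as two characters.

```javascript
// Hypothetical helper: length of the text once every line break (\r\n, \r or
// \n) is counted as the two-character sequence \r\n.
function submittedLength(value) {
  // \r\n is listed first so an existing pair is matched as one unit.
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

console.log(submittedLength("1\n1\n1\n1")); // 10: 4 characters + 3 breaks * 2
```

In the page from the question this could be called as submittedLength($('#test').val()).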
For reasons unknown, jQuery always converts all newlines in the value of a <textarea> to a single character. That is, if the browser gives it \r\n for a newline, jQuery makes sure it's just \n in the return value of .val().
Chrome and Firefox both count the length of <textarea> tags the same way for the purposes of "maxlength".
However, the HTTP spec insists that newlines be represented as \r\n. Thus, jQuery, webkit, and Firefox all get this wrong.
The upshot is that "maxlength" on <textarea> tags is pretty much useless if your server-side code really has a fixed maximum size for a field value.
edit — at this point (late 2014) it looks like Chrome (38) behaves correctly. Firefox (33) however still doesn't count each hard return as 2 characters.
It looks like JavaScript is also counting the length of the newline characters.
Try using:
var x = $('#test').val();
x = x.replace(/(\r\n|\n|\r)/g,"");
$('#length').html(x.length);
I used it in your fiddle and it was working. Hope this helps.
That is because a new line is actually 2 bytes, and therefore 2 long. JavaScript doesn't see it that way and therefore counts only 1, making the total 7 (3 new lines).
Here's a more universal solution, which overrides the jQuery 'val' function. Will be making this issue into a blog post shortly and linking here.
var originalVal = $.fn.val;
$.fn.val = function (value) {
if (typeof value == 'undefined') {
// Getter
if ($(this).is("textarea")) {
return originalVal.call(this)
.replace(/\r\n/g, '\n') // reduce all \r\n to \n
.replace(/\r/g, '\n') // reduce all \r to \n (we shouldn't really need this line. this is for paranoia!)
.replace(/\n/g, '\r\n'); // expand all \n to \r\n
// this two-step approach allows us to not accidentally catch a perfect \r\n
// and turn it into a \r\r\n, which wouldn't help anything.
}
return originalVal.call(this);
}
else {
// Setter
return originalVal.call(this, value);
}
};
If you want to get remaining content length of text area then you can use match on the string containing the line breaks.
HTML:
<textarea id="content" rows="5" cols="15" maxlength="250"></textarea>
JS:
var getContentWidthWithNextLine = function(){
var content = document.getElementById("content").value; // current text of the textarea
return 250 - content.length + (content.match(/\n/g)||[]).length;
}
var value = $('#textarea').val();
var numberOfLineBreaks = (value.match(/\n/g)||[]).length;
$('#textarea').attr("maxlength",500+numberOfLineBreaks);
This works perfectly in Google Chrome, but avoid this script in IE: there the line break is counted only once.
Textareas are still not fully in sync among browsers. I noticed 2 major problems: Carriage returns and Character encodings
Carriage return
By default, carriage returns are handled as 2 characters, \r\n (Windows style).
The problem is that Chrome and Firefox will count it as one character. You can also select it to see that an invisible character is selected, like a space.
A workaround is found here:
var length = $.trim($(this).val()).split(" ").join("").split('\n').join('').length;
Jquery word counts when user type line break
Internet explorer on the other hand will count it as 2 characters.
Their representation is:
Binary: 00001101 00001010
Hex: 0D0A
They are represented in UTF-8 as 2 characters and counted for maxlength as 2 characters.
The HTML entities can be
1) Created from javascript code:
<textarea id='txa'></textarea>
document.getElementById("txa").value = String.fromCharCode(13, 10);
2) Parsed from the content of the textarea:
Ansi code: &#13;&#10;
<textarea>Line one.&#13;&#10;Line two.</textarea>
3) Inserted from keyboard Enter key
4) Defined as the multiline content of the textbox
<textarea>Line one.
Line two.</textarea>
Character Encoding
The character encoding of an input field like textarea is independent of the character encoding of the page. This is important if you plan to count the bytes. So, if you have a meta header to define ANSI encoding of your page (with 1 byte per character), the content of your textbox is still UTF-8 with 2 bytes per character.
A workaround for the character encoding is provided here:
function htmlEncode(value){
// Create a in-memory div, set its inner text (which jQuery automatically encodes)
// Then grab the encoded contents back out. The div never exists on the page.
return $('<div/>').text(value).html();
}
function htmlDecode(value){
return $('<div/>').html(value).text();
}
HTML-encoding lost when attribute read from input field

Selenium adding weird characters to the end of javascript in a form field

I'm trying to set up a field to prepopulate with a unique set of characters, so that I can automatically generate test accounts. Because of the way the system is set up, the name field must be unique, and must not include numerical characters.
I put together this selenium code, and it works 99% of the way, but leaves extra garbage characters at the end of the good code.
javascript{stringtime='';
nowtime=new Date().getTime().toString();
for ( var i in nowtime )
{ stringtime+=String.fromCharCode(parseInt(nowtime[i])+65 ); };
'test' + stringtime + '\0'}
Result:
testBCEBBJCBFBBAI + a bunch of characters that won't copy into here. They look like 4 zeros in a box.
Thanks in advance for the help.
Apart from the '\0' character at the end, which shows up as a ?, I think the JavaScript engine within Selenium is having trouble processing the for(var i in nowtime) loop.
Try it like this:
javascript{
stringtime= '';
nowtime=new Date().getTime().toString();
for(var i = 0; i < nowtime.length; i++){
stringtime += String.fromCharCode(parseInt(nowtime[i])+65);
}
stringtime;
}
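Outside of Selenium's javascript{...} wrapper, the digit-to-letter mapping can be sketched as an ordinary function, which makes it easy to check on its own. lettersFromDigits and uniqueTestName are hypothetical names; the sketch assumes the input consists only of the digits 0-9.

```javascript
// Hypothetical helper: map each digit 0-9 to a letter A-J (0 -> A, 9 -> J).
// Assumes `digits` contains only the characters 0-9.
function lettersFromDigits(digits) {
  var out = "";
  for (var i = 0; i < digits.length; i++) {
    out += String.fromCharCode(parseInt(digits.charAt(i), 10) + 65);
  }
  return out;
}

// Hypothetical helper: build a digit-free name from a millisecond timestamp.
function uniqueTestName(timestamp) {
  return "test" + lettersFromDigits(String(timestamp));
}

console.log(uniqueTestName(new Date().getTime()));
```

An indexed loop is used instead of for...in, so only the characters of the string are visited, and no trailing '\0' is appended.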
Those characters are ones outside standard ASCII that your font can't reproduce. The numbers in the box signify which character it is. If it's 4 zeros, it's that \0 char you are putting on at the end. I don't know the language, but it doesn't look like you need that.
Also your random number generator is a bit flawed. Have a look here:
http://www.mediacollege.com/internet/javascript/number/random.html
