testing to see if you have empty text node in xml - javascript

Using Ajax to fetch XML data. question about blank text nodes
I read this question/ answer here
Javascript/XML - Getting the node name
and this helped my understanding a ton about how the structure is set up however I still have a question or two.. when he mentions this part:
"Text node with a carriage return and some spaces or tab"
How would you test to see if you have gotten an empty text node like this? I've tried testing to see:
if nodeValue == null
nodeValue == "null"
nodeValue == ""
nodeValue == " "
none of these appear to be working
I figured maybe the length would be 0 so I tested for .length and it returned 5 (1 return key and 4 tabs.. added an extra tab in there and tested again it returned 6)
I then googled how to remove whitespace and used these:
.replace(/\s+/g, ' ');
.replace(/^\s+|\s+$/g, '');
Neither worked and still said the .length was still 5
Reason I want to test for this is because what if I don't know each of the element node names before hand or exactly how the DOM is set up.
Or is there a better way to navigate without bothering to check if a text node is just the tabs/spaces/return key?

.replace(/^\s+|\s+$/g, ''); works unless the spaces/tabs aren't really spaces and tabs, but one of the various other Unicode space-like characters. So I'm guessing you weren't quite using it correctly.
This would be how to use it:
if (!textNode.nodeValue.replace(/^\s+|\s+$/g, '')) {
// Node is empty
}
Example:
var xml = parseXml("<test>\n\t\t\t</test>");
var textNode = xml.documentElement.firstChild;
display("textNode.nodeValue.length = " +
textNode.nodeValue.length);
display('Is "empty"? ' +
(!textNode.nodeValue.replace(/^\s+|\s+$/g, '')));
function display(msg) {
var p = document.createElement('p');
p.innerHTML = String(msg);
document.body.appendChild(p);
}
Output:
textNode.nodeValue.length = 4
Is "empty"? true
(Where parseXml is from this answer.)
But we really don't need to do a replace (although the overhead of doing it is trivial). A test like this would do it:
if (/^\s*$/.test(textNode.nodeValue)) {
// The node is "empty"
}
That regular expression (zero or more whitespace characters with anchors at both ends) will only match a string that's already empty or consists only of whitespace characters.
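For the second part of the question (walking a DOM without knowing the element names in advance), one option is to skip whitespace-only text nodes while iterating. The following is only a minimal sketch: the helper names isBlankTextNode and meaningfulChildren are hypothetical and not part of any DOM API.

```javascript
// Node type value from the DOM spec: 3 means "text node".
const TEXT_NODE = 3;

// Hypothetical helper: true for a text node that is empty or whitespace-only.
function isBlankTextNode(node) {
  return node.nodeType === TEXT_NODE && /^\s*$/.test(node.nodeValue);
}

// Hypothetical helper: the child nodes of `parent`, minus blank text nodes.
// Works on any object exposing an array-like `childNodes`.
function meaningfulChildren(parent) {
  return Array.prototype.filter.call(parent.childNodes, function (child) {
    return !isBlankTextNode(child);
  });
}
```

With the parsed document from the example above, meaningfulChildren(xml.documentElement) would return only the children that carry elements or real text. If only elements matter, firstElementChild and nextElementSibling may also do the job in browsers that support them on XML documents.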

Related

str.IndexOf fails on string combination separated by "space"

I have a Javascript function which goes like this. I can see that the words are side by side separated by space in document. Any idea why this is failing?
var txt = $("tr.trofinterest:first").text().toUpperCase(); // even tried getting rid of whitespace characters
//.replace(" ", " ").replace("\r", " ").replace("\n", " ")
//and various combination of regex but the function below fails
if (txt.indexOf("TWO WORDS") >= 0) {
// do sth here //but this is returned false
console.log("Found TWO WORDS together");
}
// But the following statement returns true in all cases
if (txt.indexOf("TWO") >= 0 && txt.indexOf("WORDS") >= 0) {
console.log("words exist in sentence and i can see they are side by side separated by space")
}
console.log(txt); //this prints "This tr has text with Two Words which are interesting and the words are side by side"
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<table>
<!-- nested child tables with a -->
<tr class='trofinterest'>
<td>This tr has text with Two Words which are interesting and the words are side by side</td>
</tr>
</table>
So basically whitespace characters were messing with the string, and Firefox would not show the string exactly.
I had to first convert "spaces" to '+' using
txt = txt.toUpperCase().replace(/\s/g, '+');
// this gives a better visual representation of the problem
do {
txt = txt.replace("++", '+'); // collapse multiples into a single '+'
}
while (txt.indexOf("++") >= 0);
if(txt.indexOf("TWO+WORDS")>=0){
console.log("Hey! Hey! Ho! Ho! Not cloning your grandpa has got to go!!!")
}
Edit in response to the downvotes: I am only interested in "TWO+WORDS", not the entire string. In fact, it's better if TWO+WORDS is in the original string. Sometimes there is no need to over-engineer a solution. Anyway, my answer provides a method to figure out the problem, which is the first step in finding a solution.
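An alternative to the '+' substitution is to collapse every run of whitespace into one ordinary space before searching. This is a minimal sketch: normalizeSpaces is a hypothetical helper name, and the sample input (a non-breaking space plus a newline between the two words) is an assumption, since the question does not say which characters were actually present.

```javascript
// Hypothetical helper: collapse each run of whitespace into one plain space.
// In JavaScript, \s also matches non-breaking spaces (U+00A0), tabs and line breaks.
function normalizeSpaces(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Assumed sample: "Two" and "Words" separated by U+00A0, a newline and spaces.
const rawCellText = "This tr has text with Two\u00A0\n   Words side by side";
const normalized = normalizeSpaces(rawCellText).toUpperCase();

console.log(normalized.indexOf("TWO WORDS") >= 0); // true
```

Because the string is normalized rather than re-encoded, the original search term "TWO WORDS" can be used unchanged.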

Parsing Text with jQuery

I'm attempting to parse a text string with jQuery and to make a variable out of it. The string is below:
Publications Deadlines: armadllo
I'm trying to just get everything past "Publications Deadlines: ", so it includes whatever the name is, regardless of how long or how many words it is.
I'm getting the text via the jQuery .text() function like so:
$('.label_im_getting').text()
I feel like this may be a simple solution that I just can't put together. Traditional JS is fine as well if it's more efficient than JQ!
Try this,
Live Demo
First part
str = $.trim($('.label_im_getting').text().split(':')[0]);
Second part
str = $.trim($('.label_im_getting').text().split(':')[1]);
var string = input.split(':'); // splits into two halves at the position of ':'
string = string[1]; // take the second half
string = string.replace(/ /g, ''); // removes all the spaces
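Since the name may contain several words (or even another colon), a variant that keeps everything after the first ':' and only trims the ends may be closer to what the question asks. A minimal sketch; textAfterLabel is a hypothetical helper name, not a jQuery function.

```javascript
// Hypothetical helper: everything after the first ':' with surrounding
// whitespace trimmed; returns '' when there is no colon at all.
function textAfterLabel(text) {
  const colon = text.indexOf(':');
  return colon === -1 ? '' : text.slice(colon + 1).trim();
}

console.log(textAfterLabel("Publications Deadlines: armadllo")); // "armadllo"
```

In the page itself this would be called as textAfterLabel($('.label_im_getting').text()).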

Replace any words with modified version of themselves

I'm looking for an easy way to turn this string:
(java || javascript) && vbscript
Into this string:
(str.search('java') || str.search('javascript')) && str.search('vbscript')
ie replace each word in the string with str.search('" + word + "')
I've looked at mystring.match(/[-\w]+/g); which will pull any words out into an array (but not their position)
You can call replace:
mystring.replace(/[-\w]+/g, "str.search('$&')");
Note that this is an XSS hole, since the user input can contain 's.
Here is a version with an explicit capture group, using $1 as the backreference. See the details in the "Specifying a string as a parameter" section of the String.replace documentation.
mystring.replace(/([-\w]+)/g, "str.search('$1')");
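To check what the two replacement patterns produce, here is a small sketch run against the string from the question. The variable names viaWholeMatch and viaGroup are only illustrative; $& inserts the whole match and $1 inserts the first capture group.

```javascript
const mystring = "(java || javascript) && vbscript";

// $& stands for the entire matched substring.
const viaWholeMatch = mystring.replace(/[-\w]+/g, "str.search('$&')");

// $1 stands for the first parenthesized capture group.
const viaGroup = mystring.replace(/([-\w]+)/g, "str.search('$1')");

console.log(viaWholeMatch);
console.log(viaWholeMatch === viaGroup); // true
```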

Chrome counts characters wrong in textarea with maxlength attribute

Here is an example:
$(function() {
$('#test').change(function() {
$('#length').html($('#test').val().length)
})
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id=test maxlength=10></textarea>
length = <span id=length>0</span>
Fill the textarea with lines (one character per line) until the browser stops you.
When you finish, leave the textarea, and the JS code will count the characters too.
So in my case I could enter only 7 characters (including whitespace) before Chrome stopped me, although the value of the maxlength attribute is 10.
Here's how to get your javascript code to match the amount of characters the browser believes is in the textarea:
http://jsfiddle.net/FjXgA/53/
$(function () {
$('#test').keyup(function () {
var x = $('#test').val();
var newLines = x.match(/(\r\n|\n|\r)/g);
var addition = 0;
if (newLines != null) {
addition = newLines.length;
}
$('#length').html(x.length + addition);
})
})
Basically you just count the total line breaks in the textbox and add 1 to the character count for each one.
Your carriage returns are considered 2 characters each when it comes to maxlength.
1\r\n
1\r\n
1\r\n
1
But it seems that the JavaScript only counts one character of each \r\n (I am not sure which one), which only adds up to 7.
It seems like the right method, based on Pointy's answer above, is to count all new lines as two characters. That will standardize it across browsers and match what will get sent when it's posted.
So we could follow the spec and replace every Carriage Return not followed by a New Line, and every New Line not preceded by a Carriage Return, with a Carriage Return - Line Feed pair.
var len = $('#test').val().replace(/\r\n|\r|\n/g, "\r\n").length;
Then use that variable to display the length of the textarea value, or limit it, and so on.
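To make the arithmetic from the question concrete, here is a minimal sketch without jQuery. The helper name submittedLength is hypothetical; it counts every line break as a CR LF pair, whichever form it arrives in.

```javascript
// Hypothetical helper: length of the value once every line break
// (\r\n, lone \r, or lone \n) is counted as the two-character pair \r\n.
function submittedLength(value) {
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

// Four one-character lines, as in the question, with \n line breaks.
const fourLines = "1\n1\n1\n1";

console.log(fourLines.length);           // 7  (4 characters + 3 line breaks)
console.log(submittedLength(fourLines)); // 10 (each line break counted twice)
```

Matching \r\n first in the alternation means an existing pair is replaced by itself and never doubled.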
For reasons unknown, jQuery always converts all newlines in the value of a <textarea> to a single character. That is, if the browser gives it \r\n for a newline, jQuery makes sure it's just \n in the return value of .val().
Chrome and Firefox both count the length of <textarea> tags the same way for the purposes of "maxlength".
However, the HTTP spec insists that newlines be represented as \r\n. Thus, jQuery, webkit, and Firefox all get this wrong.
The upshot is that "maxlength" on <textarea> tags is pretty much useless if your server-side code really has a fixed maximum size for a field value.
edit — at this point (late 2014) it looks like Chrome (38) behaves correctly. Firefox (33) however still doesn't count each hard return as 2 characters.
It looks like JavaScript is also counting the new line characters in the length.
Try using:
var x = $('#test').val();
x = x.replace(/(\r\n|\n|\r)/g,"");
$('#length').html(x.length);
I used it in your fiddle and it was working. Hope this helps.
That is because a new line is actually 2 bytes, and therefore 2 long. JavaScript doesn't see it that way, so it counts only 1 for each, making the total 7 (with 3 new lines).
Here's a more universal solution, which overrides the jQuery 'val' function. Will be making this issue into a blog post shortly and linking here.
var originalVal = $.fn.val;
$.fn.val = function (value) {
if (typeof value == 'undefined') {
// Getter
if ($(this).is("textarea")) {
return originalVal.call(this)
.replace(/\r\n/g, '\n') // reduce all \r\n to \n
.replace(/\r/g, '\n') // reduce all \r to \n (we shouldn't really need this line. this is for paranoia!)
.replace(/\n/g, '\r\n'); // expand all \n to \r\n
// this two-step approach allows us to not accidentally catch a perfect \r\n
// and turn it into a \r\r\n, which wouldn't help anything.
}
return originalVal.call(this);
}
else {
// Setter
return originalVal.call(this, value);
}
};
If you want to get remaining content length of text area then you can use match on the string containing the line breaks.
HTML:
<textarea id="content" rows="5" cols="15" maxlength="250"></textarea>
JS:
var getContentWidthWithNextLine = function(){
var content = document.getElementById('content').value; // current textarea text
return 250 - content.length + (content.match(/\n/g)||[]).length;
}
var value = $('#textarea').val();
var numberOfLineBreaks = (value.match(/\n/g)||[]).length;
$('#textarea').attr("maxlength",500+numberOfLineBreaks);
This works perfectly in Google Chrome, but avoid the script in IE: there the line break is counted only once.
Textareas are still not fully in sync among browsers. I noticed 2 major problems: carriage returns and character encodings.
Carriage return
By default, line breaks are handled as 2 characters, \r\n (Windows style).
The problem is that Chrome and Firefox will count it as one character. You can also select it to observe that there is an invisible character selected as a space.
A workaround is found here (Jquery word counts when user type line break):
var length = $.trim($(this).val()).split(" ").join("").split('\n').join('').length;
Internet Explorer, on the other hand, will count it as 2 characters.
Their representation is:
Binary: 00001101 00001010
Hex: 0D0A
They are represented in UTF-8 as 2 characters and counted for maxlength as 2 characters.
The HTML entities can be:
1) Created from javascript code:
<textarea id='txa'></textarea>
document.getElementById("txa").value = String.fromCharCode(13, 10);
2) Parsed from the content of the textarea:
Ansi code:
<textarea>Line one.
Line two.</textarea>
3) Inserted from keyboard Enter key
4) Defined as the multiline content of the textbox
<textarea>Line one.
Line two.</textarea>
Character Encoding
Character encoding of an input field like textarea is independent of the character encoding of the page. This is important if you plan to count the bytes. So, if you have a meta header to define ANSI encoding of your page (with 1 byte per character), the content of your textbox is still UTF-8, where a character can take 2 or more bytes.
A workaround for the character encoding is provided here:
function htmlEncode(value){
// Create a in-memory div, set its inner text (which jQuery automatically encodes)
// Then grab the encoded contents back out. The div never exists on the page.
return $('<div/>').text(value).html();
}
function htmlDecode(value){
return $('<div/>').html(value).text();
}
HTML-encoding lost when attribute read from input field
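If the server-side limit is expressed in bytes rather than characters, the byte count can be checked directly. This is a minimal sketch that assumes the standard TextEncoder API is available (modern browsers and recent Node.js versions); utf8ByteLength is a hypothetical helper name.

```javascript
// Hypothetical helper: number of bytes `text` occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("abc"));  // 3 (ASCII: 1 byte each)
console.log(utf8ByteLength("zów"));  // 4 ("ó" takes 2 bytes)
console.log(utf8ByteLength("€"));    // 3
console.log(utf8ByteLength("\r\n")); // 2
```

The character count from .length and the byte count only agree for plain ASCII text, so a byte-based limit needs a check like this rather than maxlength alone.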

How to force breaking of non breakable strings?

I have an HTML page that I generate from the data contained in a database. The database sometimes contains long strings that the browser can't break because the strings don't contain breakable characters (space, point, comma, etc...).
Is there any way to fix this using html, css or even javascript?
See this link for an example of the problem.
Yes you can, just set the css property of the box to:
.some_selector {
word-wrap: break-word;
}
Edit: Some testing shows that it does work with a div or a p - a block level element - but it does not work with a table cell, nor when the div is put inside a table cell.
Tested and works in IE6, IE7, IE8, Firefox 3.5.3 and Chrome.
Works:
<div style="word-wrap: break-word">aaaaaaaaaaaaaaaaaaaaaaddddddddddddddddddddddddddddddddddddddddddaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa </div>
Based on this article and this one as well: the "Shy Hyphen" or "Soft Hyphen" can be written in HTML as: &shy; / &#173; / &#xAD; (173 dec = AD hex). They all convert to the U+00AD character.
The JavaScript textContent and nodeValue of the DOM Text Nodes are not 'entity encoded' - they just contain the actual characters. In order to write these characters you must therefore encode them yourself: \xAD is a simple way to write the same character in a JavaScript string. String.fromCharCode(173) would also work.
Based on your own VERY good answer - a jQuery Plugin version:
$.fn.replaceInText = function(oldText, newText) {
// contents() gets all child dom nodes -- each lets us operate on them
this.contents().each(function() {
if (this.nodeType == 3) { // text node found, do the replacement
if (this.textContent) {
this.textContent = this.textContent.replace(oldText, newText);
} else { // support to IE
this.nodeValue = this.nodeValue.replace(oldText, newText);
}
} else {
// other types of nodes - scan them for same replace
$(this).replaceInText(oldText, newText);
}
});
return this;
};
$(function() {
$('div').replaceInText(/\w{10}/g, "$&\xAD");
});
A side note:
I think that the place this should happen is NOT in JavaScript - it should be in the server-side code. If this is only a page used to display data, you could easily do a similar regexp replace on the text before it is sent to the browser. However, the JavaScript solution offers one advantage (or disadvantage, depending on how you want to look at it): it doesn't add any extraneous characters to the data until the script executes, which means any robots crawling your HTML output for data won't see the shy hyphens. Although the HTML spec interprets it as a "hyphenation hint" and an invisible character, it's not guaranteed across the rest of the Unicode world (quote from the Unicode standard via the second article I linked):
U+00AD soft hyphen indicates a
hyphenation point, where a line-break
is preferred when a word is to be
hyphenated. Depending on the script,
the visible rendering of this
character when a line break occurs may
differ (for example, in some scripts
it is rendered as a hyphen -, while in
others it may be invisible).
Another Note:
Found in this other SO Question - it seems that the "Zero Width Space" character &#8203; / &#x200B; / U+200B is another option you might want to explore. It would be \u200B as a JavaScript string.
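A minimal sketch of the zero-width-space idea on a plain string, before it is written into a text node. The helper name addBreakOpportunities and the chunk size are assumptions for illustration only.

```javascript
// U+200B ZERO WIDTH SPACE: invisible, but allows a line break at its position.
const ZWSP = "\u200B";

// Hypothetical helper: insert a zero-width space after every `n` consecutive
// word characters, but only when more word characters follow.
function addBreakOpportunities(text, n) {
  const longRun = new RegExp("\\w{" + n + "}(?=\\w)", "g");
  return text.replace(longRun, "$&" + ZWSP);
}

const broken = addBreakOpportunities("abcdefghij", 5);
console.log(broken.length);         // 11 (one invisible character added)
console.log(broken.split(ZWSP));    // [ 'abcde', 'fghij' ]
```

As with the soft hyphen, the extra character stays in the text, so it will be included if a user copies the string.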
As it has been pointed out numerous times, no, there is nothing you can do about it, without preprocessing the strings programmatically before displaying them.
I know there is a strategy of inserting the soft hyphen character (&shy;) where needed, but it does not seem like a popular option.
Check out this question: Soft hyphen in HTML (<wbr> vs. &shy;)
It is also possible to use word-break css property to cut every word on the element edge.
.selector_name {
word-break: break-all;
}
<p class="selector_name">some words some words some words some words</p>
you can obtain:
some word|
s some wo|<-edge of the element
rds some |
words som|
e words |
There is a special character, &shy; or &#173;, that could do it.
For example:
Dzie&shy;le&shy;nie wy&shy;ra&shy;zów
could be displayed like:
1. dzie
2. le
3. nie wy
4. ra
5. zów
I'm answering my own question here...
Based on your answers, I came up with this solution (thanks to @CMS in this question for his help).
This script breaks any word that is more than 30 characters long by inserting a space at the 31st position.
Here is the fixed version: link
I have one problem left: I'd rather insert a &shy; than a space. But assigning to node.nodeValue or node.textContent inserts the literal text &shy;, not the character.
<script type="text/javascript">
$(function() {
replaceText(/\w{30}/g, "$& ", document.body);
});
function replaceText(oldText, newText, node) {
node = node || document.body; // base node
var childs = node.childNodes, i = 0;
while (node = childs[i]) {
if (node.nodeType == 3) { // text node found, do the replacement
if (node.textContent) {
node.textContent = node.textContent.replace(oldText, newText);
} else { // support to IE
node.nodeValue = node.nodeValue.replace(oldText, newText);
}
} else { // not a text node, recurse into its children
replaceText(oldText, newText, node);
}
i++;
}
}
</script>
I'll wait a few days before I accept this answer in case someone comes up with a simpler solution.
Thanks
The issue with using &shy; and the solutions above is that an extra character is still there, and with a copy/paste action (even in plain text) it comes out.
I would use instead the tag <wbr> that is not visible and is not considered when copying.
For example, to have email addresses break in two lines (only when there is not enough space) I use this:
echo str_replace( "@","<wbr>@", $email );
That results in something like this:
name.surname
@website.com
You can use jQuery to achieve that. First you need to add the reference; there is a plug-in which may help you: Read More Plugin - JQuery. But you need to hook into your code during the fetch phase. You can handle this in an HttpHandler or in the Page_PreInit phase; without any server-side code it will be hard, or perhaps there isn't any way. I don't know, but you should be able to add something to your database-fetched HTML page.
It's easier to break up the long words from a text string, before you add them to the document.
It would also be nice to avoid orphans, where you have only one or two characters on the last line.
This method will insert spaces in every unspaced run of characters longer than n,
splitting it so that there are at least min characters on the last line.
function breakwords(text, n, min){
var L= text.length;
n= n || 20;
min= min || 2;
while(L%n && L%n<min)--n;
var Rx= RegExp('(\\w{'+n+',}?)','g');
text= text.replace(Rx,'$1 ');
return text;
}
//test
var n=30, min=5;
var txt= 'abcdefghijklmnopqrstuvwxyz0123456789 abcdefghijklmnopqrstuvwxyz012345678 abcdefghijklmnopqrstuvwxyz01234567 abcdefghijklmnopqrstuvwxyz0123456';
txt=txt.replace(/(\w{30,})/g,function(w){return breakwords(w,n,min)});
alert(txt.replace(/ +/g,'\n'))
/* returned value: (String)
abcdefghijklmnopqrstuvwxyz0123
456789
abcdefghijklmnopqrstuvwxyz0123
45678
abcdefghijklmnopqrstuvwxyz012
34567
abcdefghijklmnopqrstuvwxyz01
23456
*/
