I'm trying to parse minimal mark-up text by lines. Currently I have a for loop that parses letter by letter. See the code below:
Text:
<element id="myE">
This is some text that
represents accurately the way I
have written my html
file.
</element>
code:
var list = document.getElementById("myE").innerHTML;
var tallie = 0;
for (var i = 1; i < list.length; i++) {
  if (/*list[i] == " "*/ true) {
    tallie += 1; // count each character that passes the test
    console.log(list[i]);
  }
}
console.log(tallie);
As expected, the text embedded in the element renders in the DOM as though it were a continuous, properly formatted string. But what I'm finding is that the console recognizes the difference between a non-breaking space and a new line, where " " and
"
"
represent the two respectively.
Since the console appears to know the difference, it seems there should be a way to test for the difference. If you uncomment the commented-out condition, it will start testing for non-breaking spaces. I think there is another way to do this using the character encoding string (not &nbsp;, another one). It seems reasonable then to expect to be able to find a character code for a breaking space. Unfortunately I cannot find one.
Long story short, how can I achieve a true line by line parsing of an html file?
Newline characters are encoded with \n. Sometimes you will also find the combination of carriage return and new line, \r\n (see Wikipedia on Newline). These should not be confused with a non-breaking space (&nbsp; or &#160;), which is used if you want the browser to display a space without word wrapping at it, or if you want the browser not to collapse multiple spaces together.
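As a rough sketch of what line-by-line parsing could look like (the helper name splitLines and the sample string are made up for illustration, standing in for the element's innerHTML), you can split on any of the newline conventions and tell the characters apart by their character codes:

```javascript
// Hypothetical helper: split text into lines, accepting \r\n, \r, or \n.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

// Made-up sample standing in for an element's innerHTML.
const sample =
  "\nThis is some text that\nrepresents accurately the way I\r\nhave written my html\nfile.\n";

// Drop the empty entries produced by the leading and trailing newlines.
const lines = splitLines(sample).filter((line) => line.trim() !== "");
console.log(lines);

// Character codes: newline 10, carriage return 13, space 32, non-breaking space 160.
const codes = ["\n", "\r", " ", "\u00A0"].map((ch) => ch.charCodeAt(0));
console.log(codes);
```

Comparing list[i] against "\n" (or checking charCodeAt(i) === 10) inside the original loop would work the same way.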
Related
I wanted to understand what a carriage return is by writing some simple code that logs to the console. As I understand it, carriage return '\r' means
" return to the beginning of the current line without advancing
downward"
But in my code the following string is appended at the end of the line. Why is it behaving like this? I have a string "this is my string", then a carriage return, followed by another string "that". I thought "that" would be placed at the beginning of the string.
console.log("this is my string"+String.fromCharCode(13)+"that");
It prints "this is my stringthat".
Using \r in a string in JavaScript is probably going to give you different results depending on a combination of how the program is being run (in a browser or a standalone engine) and the target of the text (console, alert, a text node in an HTML element etc). It's not clear from your question whether you're running JavaScript in a browser, but (assuming you are) you're going to get different results for different browsers. Internet Explorer's console treats \r as a newline character (\n) while most other browsers will ignore it. I doubt any browser implementation of console is going to give you the behavior you've described.
Note that \r is not a string processing instruction, it's a character. Doing this:
var aString="one\r2";
is never going to result in
aString == "2ne"
or
aString == "2one"
or
aString == "one2"
or anything similar evaluating to true. aString's value will remain "one\r2" until you change it. It's up to the console or alert that is displaying the string to choose how to render \r.
There are string processing methods in JavaScript for splitting and recombining strings (see the w3schools Javascript String Reference or Mozilla's String reference) that would better suit your purposes. If you start using characters like \r or \b in other languages and/or environments you're going to encounter different behaviors based on a whole host of factors.
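A small sketch of the point above, assuming nothing beyond standard string methods: the carriage return sits in the string as an ordinary character, and any rearranging has to be done explicitly.

```javascript
const aString = "one\r2";

// The carriage return is stored as a single ordinary character (code 13).
console.log(aString.length);        // 5
console.log(aString.charCodeAt(3)); // 13

// To actually move text around, split and recombine the string yourself.
const parts = aString.split("\r");          // ["one", "2"]
const swapped = parts.reverse().join("");   // "2one"
console.log(swapped);
```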
I can see the line breaks "↵" for a string in Chrome Developer Tools
<br>↵↵<br>Event Status: confirmed↵<br>Event Description: Hog Day (Night )and Hog Day (Day)↵↵Friday...
If I double-click this and paste it into Notepad, the line breaks are preserved.
When I save the string to an object like so,
var summary = el.find("summary").text();
var volunteerEvent = {title: title, summary: summary}
and eventually display it on a page,
$('#volunteerEventDescription').empty().html(event.summary);
the line breaks are gone and it's a huge chunk of text.
How do I keep the newlines?
I see two obvious options. Which one is the right one for you depends on how much control over the formatting you want.
Use the pre tag and the new lines will be respected. pre is for preformatted text and will use non-proportional font so it may not render as you would wish. See pre on MDN for more details.
Replace the new lines with the br tag. You can do this with a regular expression: stringValue.replace(/\n/g, '<br/>'). A more robust regular expression is present on another question: jQuery convert line breaks to br (nl2br equivalent).
The nl2br function equivalent from PHP can be found in php.js: http://phpjs.org/functions/nl2br/. nl2br, as the name might subtly suggest, converts newlines to break tags.
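For reference, a minimal JavaScript take on the same idea might look like this. It is only a sketch, not the php.js implementation; the function body and the sample text are my own assumptions:

```javascript
// Hypothetical helper: convert \r\n, \r, or \n into <br/> tags.
function nl2br(text) {
  return text.replace(/\r\n|\r|\n/g, "<br/>");
}

// Made-up sample loosely based on the summary text in the question.
const summary = "Event Status: confirmed\nEvent Description: Hog Day\r\n\r\nFriday";
const html = nl2br(summary);
console.log(html);
```

The result can then be passed to .html() as in the question. Listing \r\n first in the alternation keeps a Windows-style line ending from turning into two breaks.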
I'm dynamically filling a div with text using javascript.
The div has a fixed width of 200px, and the text is automatically wrapped to fit in that div.
The text itself is in a json, and the json has no carriage return.
I would like to know if it's possible to detect the carriage returns that are automatically generated.
The reason I would like to know that is because I have more than a hundred texts, and if a line break falls after a two- or three-letter word, I need it to fall before that word instead.
So I've looked on the forum, but all I tried didn't seem to work.
test = $("#mydiv").html();
html = test.split(/\r\n|\r|\n/g);
console.log(html.length);
It always returns a length of 1, as if it didn't recognize the carriage returns automatically inserted.
Any help would be most welcome!
You can prevent line breaks after short words by replacing any space after those words with a non-breaking space. This displays like a normal space, but doesn't allow the text to be wrapped at this point. E.g.
mydiv.innerHTML = mytext.replace(/\b(\w{1,3})\s+/g, '$1&nbsp;');
The {1,3} specifies words of one to three alphanumeric characters in length. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions for details. You may wish to adjust the regular expression for your own requirements.
I don't think the effect is especially visually pleasing. Browsers don't have very sophisticated word-wrapping algorithms.
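If the text is going into textContent rather than innerHTML, the same replacement can be sketched with the actual non-breaking space character (U+00A0) instead of the entity. The function name and the sample sentence are hypothetical:

```javascript
// Hypothetical helper: glue words of one to three characters to the word
// that follows them by swapping the whitespace for U+00A0.
function glueShortWords(text) {
  return text.replace(/\b(\w{1,3})\s+/g, "$1\u00A0");
}

const glued = glueShortWords("I went to the market");
// JSON.stringify makes the otherwise invisible U+00A0 characters easier to spot.
console.log(JSON.stringify(glued));
```

Here only the spaces after "I", "to" and "the" are replaced, so the browser can still wrap after "went".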
I was wondering if there's a way to automatically control orphaned words in an HTML file, possibly by using CSS and/or Javascript (or something else, if anyone has an alternative suggestion).
By 'orphaned words', I mean singular words that appear on a new line at the end of a paragraph. For example:
"This paragraph ends with an undesirable orphaned
word."
Instead, it would be preferable to have the paragraph break as follows:
"This paragraph no longer ends with an undesirable
orphaned word."
While I know that I could manually correct this by placing an HTML non-breaking space (&nbsp;) between the final two words, I'm wondering if there's a way to automate the process, since manual adjustments like this can quickly become tedious for large blocks of text across multiple files.
Incidentally, the CSS2.1 properties orphans (and widows) only apply to entire lines of text, and even then only for the printing of HTML pages (not to mention the fact that these properties are largely unsupported by most major browsers).
Many professional page layout applications, such as Adobe InDesign, can automate the removal of orphans by automatically adding non-breaking spaces where orphans occur; is there any sort of equivalent solution for HTML?
You can avoid orphaned words by replacing the space between the last two words in a sentence with a non-breaking space (&nbsp;).
There are plugins out there that do this, for example jqWidon't or this jquery snippet.
There are also plugins for popular frameworks (such as typogrify for django and widon't for wordpress) that essentially do the same thing.
I know you wanted a javascript solution, but in case someone finds this page while looking for a solution for emails (where Javascript isn't an option), I decided to post my solution.
Use CSS white-space: nowrap. So what I do is surround the last two or three words (or wherever I want the "break" to be) in a span, add an inline CSS (remember, I deal with email, make a class as needed):
<td>
I don't <span style="white-space: nowrap;">want orphaned words.</span>
</td>
In a fluid/responsive layout, if you do it right, the last few words will break to a second line until there is room for those words to appear on one line.
Read more about the white-space property on this link: http://www.w3schools.com/cssref/pr_text_white-space.asp
EDIT: 12/19/2015 - Since this isn't supported in Outlook, I've been adding a non-breaking space between the last two words in a sentence. It's less code, and supported everywhere.
EDIT: 2/20/2018 - I've discovered that the Outlook App (iOS and Android) doesn't support the &nbsp; entity, so I've had to combine both solutions: e.g.:
<td>
I don't <span style="white-space:nowrap;">want orphaned&nbsp;words.</span>
</td>
In short, no. This is something that has driven print designers crazy for years, but HTML does not provide this level of control.
If you absolutely positively want this, and understand the speed implications, you can try the suggestion here:
detecting line-breaks with jQuery?
That is the best solution I can imagine, but that does not make it a good solution.
I see there are 3rd party plugins suggested, but it's simpler to do it yourself. If all you want to do is replace the last space character with a non-breaking space, it's almost trivial:
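const unorphanize = (str) => {
  const iLast = str.lastIndexOf(' ');  // index of the final space
  if (iLast === -1) return str;        // single word: nothing to join
  const stArr = str.split('');
  stArr[iLast] = '&nbsp;';             // swap it for a non-breaking space entity
  return stArr.join('');
}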
I suppose this may miss some edge cases, but it's worked for all my use cases. The caveat is that you can't just plug the output in where plain text would go: you have to set innerHTML = unorphanize(text) or otherwise have it parsed as HTML.
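One way around that caveat, sketched here as an assumption rather than something from the answer above, is to insert the non-breaking space character itself (U+00A0) so the result can be assigned to textContent as plain text. The name unorphanizeText is made up:

```javascript
// Hypothetical variant: replace the last run of whitespace before the final
// word with U+00A0, so the result is plain text rather than HTML.
function unorphanizeText(str) {
  return str.replace(/\s+(\S+)\s*$/, "\u00A0$1");
}

const fixed = unorphanizeText("This paragraph ends with an undesirable orphaned word.");
// JSON.stringify makes the U+00A0 character visible in the console output.
console.log(JSON.stringify(fixed));
```

A string with no whitespace is returned unchanged, and any trailing whitespace is dropped as a side effect of the pattern.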
If you want to handle it yourself, without jQuery, you can write a javascript snippet to replace the text, if you're willing to make a couple assumptions:
A sentence always ends with a period.
You always want to replace the whitespace before the last word with &nbsp;.
Assuming you have this html (which is styled to break right before "end" in my browser...monkey with the width if needed):
<div id="articleText" style="width:360px;color:black; background-color:Yellow;">
This is some text with one word on its own line at the end.
<p />
This is some text with one word on its own line at the end.
</div>
You can create this javascript and put it at the end of your page:
<script type="text/javascript">
reformatArticleText();
function reformatArticleText()
{
var div = document.getElementById("articleText");
div.innerHTML = div.innerHTML.replace(/\s(\S*)\./g, "&nbsp;$1.");
}
</script>
The regex simply finds all instances (using the g flag) of a whitespace character (\s) followed by any number of non-whitespace characters (\S*) followed by a period. It creates a back-reference to the non-whitespace characters that you can use in the replacement text.
You can use a similar regex to include other end punctuation marks.
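For instance, a variant that also treats "!" and "?" as sentence endings could be sketched like this. The function name is hypothetical, and it still assumes the output is assigned to innerHTML:

```javascript
// Hypothetical helper: replace the whitespace before the last word of each
// sentence (ending in ".", "!" or "?") with a non-breaking space entity.
function glueSentenceEnds(html) {
  return html.replace(/\s(\S*[.!?])/g, "&nbsp;$1");
}

const result = glueSentenceEnds("One two three. Four five six!");
console.log(result);
```

Like the original, this is naive about abbreviations and about punctuation inside tags or attributes, so it is best applied to simple text content.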
If third-party JavaScript is an option, one can use typogr.js, a JavaScript "typogrify" implementation. This particular filter is called, unsurprisingly, Widont.
<script src="https://cdnjs.cloudflare.com/ajax/libs/typogr/0.6.7/typogr.min.js"></script>
<script>
document.body.innerHTML = typogr.widont(document.body.innerHTML);
</script>
</body>
I am trying to populate a DOM element with ID 'myElement'. The content which I'm populating is a mix of text and HTML elements.
Assume following is the content I wish to populate in my DOM element.
var x = "<b>Success</b> is a matter of hard work &luck";
I tried using innerHTML as follows,
document.getElementById("myElement").innerHTML=x;
This resulted in chopping off of the last word in my sentence.
Apparently, the problem is due to the '&' character present in the last word. I played around with the '&' and innerHTML and following are my observations.
If the last word of the content is less than 10 characters and if it has a '&' character present in it, innerHTML chops off the sentence at '&'.
This problem does not happen in firefox.
If I use innerText the last word is intact, but then all the HTML tags which are part of the content become plain text.
I tried populating through jQuery's #html method,
$("#myElement").html(x);
This approach solves the problem in IE but not in chrome.
How can I insert HTML content with a last word containing '&' without it being chopped off in all browsers?
Update : 1. I tried html encoding the content which I am trying to insert into the DOM. When I encode the content, the html tags which are part of the content becomes plain string.
For the above mentioned content, I expect the result to be rendered as,
Success is a matter of hard work &luck
but when I encode what I actually get in the rendered page is,
<b>Success</b> is a matter of hard work &luck
You should replace your & with &amp;.
The & (ampersand) character is used within HTML to represent various special characters. For example, &quot; = ", &lt; = <, etcetera. Now, &luck clearly is not a valid HTML entity (for one, it is missing the semicolon). However, various browsers may, through a combination of error correction (supplying the semicolon) and the fact that it looks somewhat like an HTML entity (& followed by four characters), try to parse it as such.
Because &luck; is not a valid HTML entity, the original text is lost. Because of this, when using an ampersand in your HTML, always use &amp;.
Update: When this text is entered by a user, it is up to you to escape this character properly. In PHP for example, you would call htmlentities on the text before displaying it to the user. This has the added benefit of filtering out malicious user code such as <script> tags.
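On the JavaScript side, a rough equivalent could look like the sketch below. The escapeHtml helper and the sample user text are my own assumptions, not a built-in API. Only the untrusted part is escaped, so the markup you wrote yourself stays intact:

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
// The ampersand must be replaced first so later entities are not double-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Made-up user input containing an ampersand and a tag.
const userText = "hard work &luck <script>";
const markup = "<b>Success</b> is a matter of " + escapeHtml(userText);
console.log(markup);
```

The resulting string can be assigned to innerHTML: the <b> tag is rendered as markup while the user-supplied part shows up as literal text.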
The ampersand is a special character in HTML that indicates the start of a character entity reference or numeric character reference. You need to escape it like so:
var x = "<b>Success</b> is a matter of hard work &amp;luck";
Try using this instead:
var x = "<b>Success</b> is a matter of hard work &amp;luck";
By HTML encoding the ampersand, you are ensuring that there is no ambiguity in what you mean when you write "&luck".