I can see the line breaks "↵" for a string in Chrome Developer Tools
<br>↵↵<br>Event Status: confirmed↵<br>Event Description: Hog Day (Night )and Hog Day (Day)↵↵Friday...
If I double-click this and paste it into Notepad, the line breaks are preserved.
When I save the string to an object like so,
var summary = el.find("summary").text();
var volunteerEvent = {title: title, summary: summary}
and eventually display it on a page,
$('#volunteerEventDescription').empty().html(event.summary);
the line breaks are gone and it's a huge chunk of text.
How do I keep the newlines?
I see two obvious options. Which one is the right one for you depends on how much control over the formatting you want.
Use the pre tag and the new lines will be respected. pre is for preformatted text and will use non-proportional font so it may not render as you would wish. See pre on MDN for more details.
Replace the new lines with the br tag. You can do this with a regular expression: stringValue.replace(/\n/g, '<br/>'). A more robust regular expression is present on another question: jQuery convert line breaks to br (nl2br equivalent).
The nl2br function equivalent from PHP can be found in php.js: http://phpjs.org/functions/nl2br/. nl2br, as the name might subtly suggest, converts newlines to break tags.
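As a rough illustration of the replacement approach, here is a minimal sketch. The function name nl2br is only borrowed from PHP for this example and is not a built-in; it also assumes the input may contain Windows (\r\n) or old Mac (\r) line endings.

```javascript
// Hypothetical helper: convert every line break into a <br> tag.
// The alternation lists \r\n first so a Windows line ending
// becomes one <br>, not two.
function nl2br(str) {
  return String(str).replace(/\r\n|\r|\n/g, "<br>");
}

const summary = "Event Status: confirmed\nEvent Description: Hog Day\r\nFriday";
const html = nl2br(summary);
console.log(html);
```

The result could then be passed to something like .html(), provided the rest of the string is already safe to insert as markup.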
I'm trying to parse minimal mark-up text by lines. Currently I have a for loop that parses letter by letter. See the code below:
Text:
<element id="myE">
This is some text that
represents accurately the way I
have written my html
file.
</element>
code:
var list = document.getElementById("myE").innerHTML;
var tallie = 0;
for (i=1;i<list.length;i++) {
if (/*list[i] == " "*/ true) {
tallie += 1;
console.log(list[i]);
}
}
console.log(tallie);
As expected, the text embedded in the element renders in the DOM as though it were a continuous, properly formatted string. But what I'm finding is that the console recognizes the difference between a non-breaking space and a new line. where " " and
"
"
represent the two respectively.
Since the console appears to know the difference, it seems there should be a way to test for the difference. If you unlock the commented condition, it will start testing for non-breaking spaces. I think there is another way to do this using the character encoding string (not &nbsp;, another one). It seems reasonable then to expect to be able to find a character code for a breaking space. Unfortunately I cannot find one.
Long story short, how can I achieve a true line by line parsing of an html file?
Newline characters are encoded as \n. Sometimes you will also find the combination of carriage return and newline, \r\n (see Wikipedia on Newline). These should not be confused with a non-breaking space (&nbsp;), which is used when you want the browser to display a space without word wrapping at it, or when you want the browser not to collapse multiple spaces together.
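To make the distinction concrete, here is a small sketch that counts the relevant characters by character code. The function name countWhitespace and the sample string are made up for this example; the codes themselves (10 for \n, 13 for \r, 32 for a plain space, 160 for a non-breaking space) are standard.

```javascript
// Hypothetical helper: tally whitespace-like characters by char code.
function countWhitespace(text) {
  const counts = { lineFeed: 0, carriageReturn: 0, space: 0, nonBreakingSpace: 0 };
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 10) counts.lineFeed += 1;               // \n
    else if (code === 13) counts.carriageReturn += 1;    // \r
    else if (code === 32) counts.space += 1;             // ordinary space
    else if (code === 160) counts.nonBreakingSpace += 1; // \u00A0, i.e. &nbsp;
  }
  return counts;
}

const sample = "This is some text that\nrepresents\u00A0accurately\r\n";
console.log(countWhitespace(sample));
```

Testing list[i] === "\n" in the original loop would work the same way as comparing the character code to 10.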
We have report pages that append a <pre> at the bottom of pages that lists information line by line.
Example:
<pre>
Site Report Info
This is where any error will appear as a query string of numbers: 938109283091238109281092
This is where the account ID will be.
This is where the account reference pin will be.
So on...
So on...
So on...
So on...
So on...
</pre>
Using javascript or jquery, perhaps regex, how can I place all of this into one array where each line is an array element? I assume regex, since the way to determine the lines is by identifying line breaks \n ?
var lines = $('pre').text().split('\n') should do the trick.
You don't need to use jQuery to get the text, of course, but if you're doing web programming, jQuery is pretty ubiquitous.
You may also want to trim the results to get rid of extra whitespace (or not, depending on your application):
var lines = $('pre').text().split('\n').map(function(l) { return l.trim(); });
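If the report text may contain \r\n line endings or blank lines (for example the trailing newline before the closing tag), a slightly more defensive variant could look like the sketch below. The function name preTextToLines is hypothetical, and it takes the text as a plain string so it does not depend on jQuery.

```javascript
// Hypothetical helper: split text into trimmed, non-empty lines.
function preTextToLines(text) {
  return text
    .split(/\r?\n/)                                   // accept \n and \r\n
    .map(function (line) { return line.trim(); })     // drop surrounding whitespace
    .filter(function (line) { return line.length > 0; }); // drop blank lines
}

const reportText = "\nSite Report Info\r\nThis is where the account ID will be.\n  So on...  \n\n";
const reportLines = preTextToLines(reportText);
console.log(reportLines);
```

In the browser the input would come from something like $('pre').text(). Whether to drop blank lines depends on your application.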
I'm trying to make a parser for formatting JavaScript in a contextual format. First I want to be able to convert the input JavaScript into one line of JavaScript and then format the code based on my requirements. This does not remove all of the enters or white space:
txt = $.trim(txt);
txt = txt.replace("\n", "");
How can I convert the text into one line?
Use a regular expression with the "global" flag set:
txt.replace(/\n/g, "");
However, you should be careful about removing line breaks in JavaScript. You might break code that depends on automatic semicolon insertion. Why don't you use an off-the-shelf parser like Esprima?
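Here is a small sketch of the semicolon insertion problem. The sample source and the helper name compiles are invented for this example; new Function is used only to check whether a string parses as a function body.

```javascript
// Hypothetical check: does this string parse as a function body?
function compiles(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    return false;
  }
}

// Valid code that relies on line breaks instead of semicolons.
const source = "var a = 1\nvar b = 2\nreturn a + b";

// Stripping the line breaks glues the statements together.
const joined = source.replace(/\n/g, "");

console.log(compiles(source)); // original parses
console.log(compiles(joined)); // "var a = 1var b = 2return a + b" does not
```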
Use:
the \s character class, which matches any whitespace character (carriage return, line feed, tabs, spaces, ...)
the "global" g flag, so that every match is replaced and not just the first one.
var text = txt.replace(/\s+/g, ' ');
Hope it helps
If the text comes from some operating systems (Windows, for example), it may have \r\n line endings, so it is worth removing both characters.
Use the g flag as well (/\r/g), so that every \r is replaced and not just the first one.
var noNewLines = txt.replace(/\r/g, "").replace(/\n/g, "");
You have to be pretty sure there are no single-line comments and that there are no missing semi-colons.
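To illustrate the single-line comment problem, here is a short sketch. The sample source is made up, and new Function is used only to run each version as a function body.

```javascript
// Sample code with a single-line comment on the first line.
const source = "var total = 1; // start value\ntotal += 2;\nreturn total;";

// Remove the line breaks the same way as above.
const oneLine = source.replace(/\r/g, "").replace(/\n/g, "");

// Once everything is on one line, the comment swallows the rest of the code.
const before = new Function(source)();
const after = new Function(oneLine)();

console.log(before); // 3
console.log(after);  // undefined
```

The one-line version still parses, so nothing warns you that the behaviour changed.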
You can try to minify your code, using something like https://javascript-minifier.com/
however this will also change your variable names
ok i do have this following data in my div
<div id="mydiv">
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
</div>
well what i need to do there is to get the all the scripts inside the <code> blocks and the html codes text nodes without removing the html comments inside. well its a homework given by my professor and i can't modify that div block..
I need to use regular expressions for this and this is what i did
var block = $.trim($("div#mydiv").html()).replace("<!--","").replace("-->","");
var htmlRegex = new RegExp(""); //I don't know what to do here
var codeRegex = new RegExp("^<code(*n)</code>$","igm");
var code = codeRegex.exec(block);
var html = "";
it really doesn't work... please don't give the exact answer.. please teach me.. thank you
I need to have the following blocks for the variable code
alert("this is my present");
alert("here at my left hand");
alert("welcome my honey ^^");
and this is the blocks i need for variable html
what is your present
where?
oh thank you! i love you!! hehe
my question is what is the regex pattern to get the results above?
Parsing HTML with a regular expression is not something you should do.
I'm sure your professor thinks he/she was really clever and that there's no way to access the DOM API and can wave a banner around and justify some minor corner-case for using regex to parse the DOM and that sometimes it's okay.
Well, no, it isn't. If you have complex code in there, what happens? Your regex breaks, and perhaps becomes a security exploit if this is ever in production.
So, here:
http://jsfiddle.net/zfp6D/
Walk the dom, get the nodeType 8 (comment) text value out of the node.
Invoke the HTML parser (that thing that browsers use to parse HTML, rather than regex, why you wouldn't use the HTML parser to parse HTML is totally beyond me, it's like saying "Yeah, I could nail in this nail with a hammer, but I think I'm going to just stomp on the nail with my foot until it goes in").
Find all the CODE elements in the newly parsed HTML.
Log them to console, or whatever you want to do with them.
First of all, you should be aware that because HTML is not a regular language, you cannot do generic parsing using regular expressions that will work for all valid inputs (generic nesting in particular cannot be expressed with regular expressions). Many parsers do use regular expressions to match individual tokens, but other algorithms need to be built around them.
However, for a fixed input such as this, it's just a case of working through the structure you have (though it's still often easier to use different parsing methods than just regular expressions).
First, let's get all the code:
var code = '', match = [];
var regex = new RegExp("<code>(.*?)</code>", "g");
while (match = regex.exec(content)) {
code += match[1] + "\n";
}
I assume content contains the content of the div that you've already extracted. Here the "g" flag says this is for "global" matching, so we can reuse the regex to find every match. The brackets indicate a capturing group, . means any character, * means repeated 0 or more times, and ? means "non-greedy" (see what happens without it to see what it does).
Now we can do a similar thing to get all the other bits, but this time the regex is slightly more complicated:
new RegExp("(<!--|</code>)(.*?)(-->|<code>)", "g")
Here | means "or". So this matches all the bits that start with either "start comment" or "end code" and end with "end comment" or "start code". Note also that we now have 3 sets of brackets, so the part we want to extract is match[2] (the second set).
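Putting the two patterns together could look like the sketch below. One assumption to flag: . does not match line breaks, and the text in the question sits on separate lines, so this sketch swaps in [\s\S] for the second pattern and trims the result. The function name extractParts and the content string are made up for the example, with content mirroring the comment body from the question.

```javascript
// Sample input mirroring the div content from the question.
const content = [
  "<!--",
  "what is your present",
  '<code>alert("this is my present");</code>',
  "where?",
  '<code>alert("here at my left hand");</code>',
  "oh thank you! i love you!! hehe",
  '<code>alert("welcome my honey ^^");</code>',
  "-->"
].join("\n");

// Hypothetical helper: collect code blocks and the text between them.
function extractParts(text) {
  const code = [];
  const html = [];
  const codeRegex = /<code>([\s\S]*?)<\/code>/g;
  const htmlRegex = /(<!--|<\/code>)([\s\S]*?)(-->|<code>)/g;
  let match;
  while ((match = codeRegex.exec(text)) !== null) {
    code.push(match[1]);
  }
  while ((match = htmlRegex.exec(text)) !== null) {
    const piece = match[2].trim(); // middle group holds the text
    if (piece !== "") html.push(piece);
  }
  return { code: code, html: html };
}

const parts = extractParts(content);
console.log(parts.code);
console.log(parts.html);
```

This only works because the input is fixed and has no nested tags, which is the limitation described above.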
You're doing a lot of unnecessary stuff. .html() gives you the inner contents as a string. You should be able to use regEx to grab exactly what you need from there. Also, try to stick with regEx literals (e.g. /^regexstring$/). You have to escape escape characters using new RegExp which gets really messy. You generally only want to use new RegExp when you need to put a string var into a regEx.
The match function of strings accepts regEx and returns a collection of every match when you add the global flag (e.g. /^regexstring$/g <-- note the 'g'). I would do something like this:
var block = $('#mydiv').html(), //you can set multiple vars in one statement w/commas
matches = block.match(/<code>[^<]*<\/code>/g);
//[^<]* <-- 0 or more characters that aren't '<' - google 'negative character class'
matches = matches.join('_') //lazy way of avoiding a loop - join into a string with a safe character (reassign: strings are immutable)
.replace(/<\/*code>/g,'') //\/* 0 or more forward slashes
.split('_');//return the matches string back to array
//Now do what you want with matches. Eval (ew) or append in a script tag (ew).
//You have no control over the 'ew'. I just prefer data to scripts in strings
I was wondering if there's a way to automatically control orphaned words in an HTML file, possibly by using CSS and/or Javascript (or something else, if anyone has an alternative suggestion).
By 'orphaned words', I mean singular words that appear on a new line at the end of a paragraph. For example:
"This paragraph ends with an undesirable orphaned
word."
Instead, it would be preferable to have the paragraph break as follows:
"This paragraph no longer ends with an undesirable
orphaned word."
While I know that I could manually correct this by placing an HTML non-breaking space (&nbsp;) between the final two words, I'm wondering if there's a way to automate the process, since manual adjustments like this can quickly become tedious for large blocks of text across multiple files.
Incidentally, the CSS2.1 properties orphans (and widows) only apply to entire lines of text, and even then only for the printing of HTML pages (not to mention the fact that these properties are largely unsupported by most major browsers).
Many professional page layout applications, such as Adobe InDesign, can automate the removal of orphans by automatically adding non-breaking spaces where orphans occur; is there any sort of equivalent solution for HTML?
You can avoid orphaned words by replacing the space between the last two words in a sentence with a non-breaking space (&nbsp;).
There are plugins out there that do this, for example jqWidon't or this jquery snippet.
There are also plugins for popular frameworks (such as typogrify for django and widon't for wordpress) that essentially do the same thing.
I know you wanted a javascript solution, but in case someone finds this page while looking for a solution for emails (where JavaScript isn't an option), I decided to post my solution.
Use CSS white-space: nowrap. So what I do is surround the last two or three words (or wherever I want the "break" to be) in a span, add an inline CSS (remember, I deal with email, make a class as needed):
<td>
I don't <span style="white-space: nowrap;">want orphaned words.</span>
</td>
In a fluid/responsive layout, if you do it right, the last few words will break to a second line until there is room for those words to appear on one line.
Read more about the white-space property at this link: http://www.w3schools.com/cssref/pr_text_white-space.asp
EDIT: 12/19/2015 - Since this isn't supported in Outlook, I've been adding a non-breaking space between the last two words in a sentence. It's less code, and supported everywhere.
EDIT: 2/20/2018 - I've discovered that the Outlook App (iOS and Android) doesn't support the &nbsp; entity, so I've had to combine both solutions, e.g.:
<td>
I don't <span style="white-space:nowrap;">want orphaned&nbsp;words.</span>
</td>
In short, no. This is something that has driven print designers crazy for years, but HTML does not provide this level of control.
If you absolutely positively want this, and understand the speed implications, you can try the suggestion here:
detecting line-breaks with jQuery?
That is the best solution I can imagine, but that does not make it a good solution.
I see there are 3rd party plugins suggested, but it's simpler to do it yourself. If all you want to do is replace the last space character with a non-breaking space, it's almost trivial:
const unorphanize = (str) => {
let iLast = str.lastIndexOf(' ');
let stArr = str.split('');
stArr[iLast] = '&nbsp;';
return stArr.join('')
}
I suppose this may miss some unique cases, but it's worked for all my use cases. The caveat is that you can't just plug the output in where text would go; you have to set innerHTML = unorphanize(text) or otherwise parse it.
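If setting innerHTML is not an option, one possible variant is to insert the actual non-breaking space character (\u00A0) instead of the entity, so the result can be used as plain text. This is only a sketch; the name unorphanizeText is made up for the example.

```javascript
// Hypothetical variant: replace the last space with the U+00A0 character.
const unorphanizeText = (str) => {
  const iLast = str.lastIndexOf(" ");
  if (iLast === -1) return str; // no space, nothing to join
  return str.slice(0, iLast) + "\u00A0" + str.slice(iLast + 1);
};

const fixed = unorphanizeText("This paragraph ends with an undesirable orphaned word.");
console.log(fixed.charCodeAt(fixed.lastIndexOf("word.") - 1)); // 160
```

The output could then be assigned with textContent rather than innerHTML.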
If you want to handle it yourself, without jQuery, you can write a javascript snippet to replace the text, if you're willing to make a couple assumptions:
A sentence always ends with a period.
You always want to replace the whitespace before the last word with &nbsp;.
Assuming you have this html (which is styled to break right before "end" in my browser...monkey with the width if needed):
<div id="articleText" style="width:360px;color:black; background-color:Yellow;">
This is some text with one word on its own line at the end.
<p />
This is some text with one word on its own line at the end.
</div>
You can create this javascript and put it at the end of your page:
<script type="text/javascript">
reformatArticleText();
function reformatArticleText()
{
var div = document.getElementById("articleText");
div.innerHTML = div.innerHTML.replace(/\s(\S*)\./g, "&nbsp;$1.");
}
</script>
The regex finds all instances (using the g flag) of a whitespace character (\s) followed by any number of non-whitespace characters (\S*) followed by a period. It captures the non-whitespace characters in a group, so the replacement text can refer back to them as $1 and put a non-breaking space (&nbsp;) in front of them.
You can use a similar regex to include other end punctuation marks.
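For example, a version that also handles question marks and exclamation marks might look like the sketch below. The function name widontPunctuation is made up, and it assumes a sentence ends with one of . ! ?; it works on a plain string so it can be tried outside the browser.

```javascript
// Hypothetical helper: put &nbsp; before the last word of each sentence,
// where a sentence is assumed to end with ".", "!" or "?".
function widontPunctuation(text) {
  // \s      the whitespace before the last word
  // (\S*    the last word itself ...
  // [.!?])  ... including its end punctuation, captured as $1
  return text.replace(/\s(\S*[.!?])/g, "&nbsp;$1");
}

const result = widontPunctuation("Is this the end? Yes it is!");
console.log(result);
```

The same caveat applies as before: the result contains an HTML entity, so it has to be assigned through innerHTML.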
If third-party JavaScript is an option, one can use typogr.js, a JavaScript "typogrify" implementation. This particular filter is called, unsurprisingly, Widont.
<script src="https://cdnjs.cloudflare.com/ajax/libs/typogr/0.6.7/typogr.min.js"></script>
<script>
document.body.innerHTML = typogr.widont(document.body.innerHTML);
</script>
</body>