Javascript shorten string and find sentence end


I am trying to shorten a long string to an approximate number of characters and end it at a sentence end (a dot). Obviously this is not going to be 100% correct in all cases, but it's good enough. So, for example, shorten the string to 250 characters and find the nearest dot as the sentence end.
So having this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.
Would create this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien.
Things to consider I think:
If there is no dot in the string, shorten the string at a word boundary (so as not to break a word) and add an ellipsis (...) at the end, which is what this function does:
function truncateString( str, n, useWordBoundary ){
if (str.length <= n) { return str; }
var subString = str.substr(0, n-1);
return (useWordBoundary
? subString.substr(0, subString.lastIndexOf(' '))
: subString) + "...";
};
How could one incorporate dot finding into this function?

One approach is to split the string into an array of characters, loop over the array from position 250 down to position 0, and break when you find a dot. Take the index of that dot and slice the original array from the first character (index 0) up to the dot's index plus one, since slice does not include the end index. Then turn that array back into a string.
let string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.";
let arrayOfChar = string.split(""); // turn the string into an array of characters
let position = -1; // -1 indicates that no dot has been found
for (let i = 250; i >= 0; i--) { // loop from 250 down to 0
if (arrayOfChar[i] == ".") { // if that char is equal to "."
position = i; // remember where the dot is
break; // stop at the first dot found
}
}
let finalString = ""; // this is the final string
if (position > 0) { // only if we found a dot
// keep everything from index 0 up to and including the dot, then rebuild the string
finalString = arrayOfChar.slice(0, position + 1).join("");
}
else {
// position should be -1
// handle the case where no dot exists
}
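The backward scan above can also be folded into the question's truncateString by using lastIndexOf. This is only a minimal sketch: it assumes the cut may land at most n characters in and that a dot at index 0 does not count as a sentence end. The name truncateAtSentence is hypothetical.

```javascript
// Hypothetical helper: cut at the last dot within the first n characters,
// falling back to a word boundary plus ellipsis when no dot is found.
function truncateAtSentence(str, n) {
  if (str.length <= n) { return str; }
  const head = str.slice(0, n);
  const dot = head.lastIndexOf(".");
  if (dot > 0) {
    return head.slice(0, dot + 1); // keep the dot itself, no ellipsis
  }
  const space = head.lastIndexOf(" ");
  // No space either: fall back to a hard cut at n characters
  return (space > 0 ? head.slice(0, space) : head) + "...";
}

console.log(truncateAtSentence("One. Two. Three is longer.", 12)); // "One. Two."
console.log(truncateAtSentence("No sentence end here at all", 12)); // "No sentence..."
```

Searching only within the first n characters keeps the result at or under the limit, at the cost of sometimes cutting well before it.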

One option would be to use a regular expression: search for n or fewer characters, ending in a ., and if that match fails (there are no dots within the desired substring), search for n or fewer characters, followed by a word character and a word boundary:
const input = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.`;
function truncateString( str, n, useWordBoundary ){
const pattern = new RegExp(`^(?:.{1,${n}}\\.` + (
useWordBoundary
? `|.{1,${n - 1}}\\w\\b)`
: ')'
));
const match = str.match(pattern);
if (match) return match[0];
else return 'Match failed';
}
console.log(truncateString(input, 70));
// first sentence is more than 50 characters long, so this fails:
console.log(truncateString(input, 50));
// unless you enable word boundaries:
console.log(truncateString(input, 50, true));
The regex pattern looks like:
^(?:.{1,50}\.|.{1,49}\w\b)
Breaking that down:
^ - Start of string
(?: - Non-capturing group that alternates between:
.{1,50}\. - 50 or fewer characters, followed by a ., or:
.{1,49}\w\b) - 49 or fewer characters, followed by a word character and a word boundary
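Two details of this pattern are easy to miss: .{1,n}\. can match up to n + 1 characters in total, and . does not match line breaks unless the s (dotAll) flag is set. The following is a sketch of a variant that keeps the result within n characters, assuming n >= 1 and an engine that supports the s flag. The name truncateWithin and the sample text are hypothetical.

```javascript
// Hypothetical variant of the regex approach: result never exceeds n characters.
function truncateWithin(str, n) {
  // Greedy: the longest prefix of at most n characters that ends in a dot,
  // otherwise the longest prefix that ends at a word boundary.
  const pattern = new RegExp(`^(?:.{0,${n - 1}}\\.|.{0,${n - 1}}\\w\\b)`, "s");
  const match = str.match(pattern);
  return match ? match[0] : null; // null when neither alternative matches
}

const sample = "First sentence here. Second one\nspans two lines. Third.";
console.log(truncateWithin(sample, 25)); // "First sentence here."
console.log(truncateWithin(sample, 10)); // "First"
```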

Here is a pretty straightforward example that trims the string to 250 characters and then walks backward looking for the first dot. If it doesn't find one, the entire 250 characters are returned; if it does, the string is trimmed to that dot.
var maxLength = 250;
function test() {
var input = document.getElementById('test').value;
var trimmed = input.substr(0, maxLength);
var i = trimmed.length - 1; // start at the last character of the trimmed string
while (i > 0) {
if (trimmed[i] == '.') {
break;
}
i--;
}
var endResult = i > 1 ? trimmed.substr(0, i + 1) : trimmed;
endResult += endResult.length < input.length ? ' ...' : '';
document.getElementById('output').innerHTML = endResult;
}
.boxsizingBorder {
width: 100%;
-webkit-box-sizing: border-box;
-moz-box-sizing: border-box;
box-sizing: border-box;
}
<button onclick="test()">
test
</button>
<textarea id="test" class="boxsizingBorder" rows="5">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.</textarea>
<p id="output"></p>

I would suggest adding two more parameters to your function to express the extreme limits for the offset at which the string may be clipped.
So for instance, if n is 250, you could provide 200 as a minimum and maybe 270 as the ultimate maximum for the cut-off point.
Then here is how I would include the dot-break possibility:
function truncateString( str, min, n, max, useWordBoundary ){
if (str.length <= max) return str;
if (useWordBoundary) {
// Prefer to break after a dot:
var i = str.indexOf(".", n)+1; // Look forward
if (i < min || i > max) i = str.slice(0, n).lastIndexOf(".")+1; // ...or backward
if (i >= min) return str.slice(0, i); // No ellipsis necessary
// If dot-break is impossible, try word break:
i = str.indexOf(" ", n); // Look forward
if (i < min || i > max) i = str.slice(0, n).lastIndexOf(" "); // ...backward
if (i >= min) n = i; // Found an acceptable position
}
return str.substr(0, n) + " ...";
}
// Example:
var str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed in neque. Vivamus tellus. Donec magna. Donec consequat hendrerit magna. In iaculis neque eget nisi. Maecenas vehicula, leo eu commodo aliquam, sem dolor iaculis eros, vel mollis sem urna ac sapien. Integer mattis dui ut erat. Phasellus nibh magna, tempor vitae, dictum sed, vehicula sed, mauris. In enim arcu, porta vel, dictum eu, pretium a, ipsum. Donec cursus, lorem ac posuere viverra, sem tellus accumsan dolor, vel accumsan tortor est et est.";
console.log(truncateString(str, 200, 250, 270, true));
console.log(truncateString(str, 200, 250, 255, true));

Related

Removing specific word in paragraph and modifying first word after it with JS & regex?

Using JavaScript/jQuery and RegEx, I would like to remove all instances of the word 'Integer' from the paragraph below, and the first word after each deleted word should be capitalized.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam elit
massa, maximus in turpis vel, viverra iaculis nisl. Nullam pulvinar mi
eu metus posuere, a porta ligula feugiat. Integer quis nunc neque.
Etiam sollicitudin diam in dolor sagittis pellentesque. Nunc placerat
sollicitudin purus. Proin mattis, quam sit amet pellentesque blandit,
urna erat mollis sapien, et vestibulum nunc mi sed orci. Integer ligula
tellus, maximus id orci quis, euismod consequat nulla.
My attempt so far for removing desired word:
var modified = $(".paragraph").html();
modified = modified.replace(/Integer\s/g, '');
But after that I don't know how to dynamically access the next word (in the example text above, the words 'quis' and 'ligula') and capitalize it. One note: the word that needs to be deleted is always the same, but the word after it is always different.

To be sure of getting a capitalized word every time after removing Integer, use the following:
modified = modified.replace(/Integer\s+(\w)/g, function(fullMatch, capturedGroup) {
return capturedGroup.toUpperCase();
});
Note: This would even match Integer followed by Capitalised words. If you want to select only instances of Integer followed by lowercase words, then use [a-z] instead of \w in the above regex.
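For illustration, here is the [a-z] variant wrapped in a small function and run on a short sample. The function name dropWord and the sample text are assumptions, not part of the question.

```javascript
// Hypothetical wrapper: remove every "Integer" that is followed by a
// lowercase word and capitalize the first letter of that word.
function dropWord(text) {
  return text.replace(/Integer\s+([a-z])/g, (fullMatch, firstLetter) =>
    firstLetter.toUpperCase());
}

console.log(dropWord("a feugiat. Integer quis nunc. Integer ligula tellus."));
// "a feugiat. Quis nunc. Ligula tellus."
```

Because the whole match ("Integer", the whitespace, and the first letter) is replaced by the uppercased letter alone, no separate removal step is needed.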

Not a regex replace, but this one-liner can fulfil your purpose:
let str = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam elit
massa, maximus in turpis vel, viverra iaculis nisl. Nullam pulvinar mi eu metus
posuere, a porta ligula feugiat. Integer quis nunc neque. Etiam sollicitudin diam
in dolor sagittis pellentesque. Nunc placerat sollicitudin purus. Proin mattis,
quam sit amet pellentesque blandit, urna erat mollis sapien, et vestibulum nunc
mi sed orci. Integer ligula tellus, maximus id orci quis, euismod consequat nulla.`;
str.split(/Integer\ /g).map(part=>{return part.charAt(0).toUpperCase() + part.substr(1)}).join("")

There may be a way to do this with replace directly, but I would do it like this:
let textResult;
do {
textResult = /Integer\s(.)/gs.exec(modified);
if (!textResult || !textResult[1]) {
textResult = null;
continue;
}
modified = modified.replace(textResult[0], textResult[1].toUpperCase()); // replace the exact matched text, whatever whitespace \s matched
} while (!!textResult);

Split string with regex by paragraph, and if a paragraph is longer than 5 sentences, split it in half by sentence

I have some code that takes a response from an API and splits it into paragraphs by regex line breaks:
choppedString = await mainResponse.split(/\n\s*\n/);
But sometimes this returns a very long paragraph, and I can't push a Discord.JS embed field that's longer than 1024 characters.
This is where I'm stuck. I can't figure out how to split a paragraph (a .split() array elem) that is longer than 1024 characters and split it up every 5 sentences. Any help?

I don't know if this is the best/most efficient way to do this, but it works:
const mainResponse = `A short paragraph with less than 5 sentences. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
A longer paragraph over 1024 characters. A sentence ending in a question mark should still work? And another ending with an exclamation mark! A sentence ending with a new line
Sed ac tempor velit. Mauris accumsan sollicitudin enim, a blandit metus blandit at. Aenean metus nulla, faucibus et mattis ut, tincidunt ut ante. Cras feugiat mollis risus, sed luctus orci condimentum at. Etiam condimentum, lacus ut posuere malesuada, lectus elit consectetur eros, eget tincidunt purus ipsum sit amet turpis. Mauris ac eros vitae velit dictum ultrices eu ac velit. Aenean interdum, ex nec vulputate tincidunt, est dolor tristique dui, sed sagittis urna nulla ac risus. Etiam ipsum metus, finibus sit amet pulvinar at, ultrices ac libero. Aenean tristique felis sit amet semper auctor. Integer porta neque sed velit tincidunt scelerisque. Fusce nec justo quis arcu ultrices ultricies. Proin fermentum pellentesque arcu vitae imperdiet. Integer tristique commodo arcu, eu cursus ipsum lobortis eu. Aenean hendrerit posuere ex, nec elementum mi tristique eu. Suspendisse felis purus, ultricies id nisi feugiat, scelerisque malesuada risus. Curabitur sit amet velit finibus, venenatis mauris vitae, tincidunt purus. Morbi eget tortor massa. Donec ut ante luctus, fermentum est a, euismod turpis. Proin risus ex, dignissim ac dignissim eu, semper eget lectus. Cras posuere pulvinar turpis, eu auctor ante fermentum quis. Sed tincidunt eu nulla tempus tempor.`
// This splits up an array into multiple arrays of a maximum length
// stolen from https://stackoverflow.com/a/11764168/8289918
const chunk = (arr, len) => {
const chunks = []
let i = 0
while (i < arr.length) chunks.push(arr.slice(i, i += len))
return chunks
}
const choppedString = mainResponse
// Splits it into paragraphs (what you already did)
.split(/\n\s*\n/)
.flatMap(paragraph => paragraph.length > 1024
// If the paragraph is over 1024 characters, split it into arrays with a
// maximum of 5 sentences...
? chunk(paragraph.split(/(?<=[.?!\n])\s*/), 5)
// ...and then trim each of those sentences (to remove the trailing
// new line if there is any) and join them
.map(sentences => sentences.map(s => s.trim()).join(' '))
// If the paragraph is <= 1024 characters, just keep it as it is
: paragraph)
console.log(choppedString)
/(?<=[.?!\n])\s*/ explanation:
(?<=[.?!\n]): a positive lookbehind that requires the preceding character to be ., ?, !, or a new line. Because it is a lookbehind, the punctuation is required for a match but is not consumed, so it stays in the sentence.
\s*: any whitespace, if present
Note that this assumes that the 5 sentences will always be less than 1024 characters.
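If that assumption does not hold, one alternative is to pack sentences greedily by character count rather than by a fixed count of five. This is only a minimal sketch: it assumes a limit of 1024 and keeps a single sentence longer than the limit whole as its own chunk (which would still need a hard cut). The name packSentences is hypothetical.

```javascript
// Hypothetical helper: group sentences into chunks of at most `limit` characters.
function packSentences(paragraph, limit = 1024) {
  // Split after ., ? or ! when followed by whitespace; unlike the regex above,
  // a bare new line without punctuation does not end a sentence here.
  const sentences = paragraph.split(/(?<=[.?!])\s+/).map(s => s.trim()).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length <= limit || current === "") {
      current = candidate;  // still fits (or is an oversized lone sentence)
    } else {
      chunks.push(current); // close the full chunk and start a new one
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(packSentences("One two. Three four! Five six? Seven.", 20));
// [ 'One two. Three four!', 'Five six? Seven.' ]
```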

Creating an array to read the number of words in a line

Hi guys, I am doing an exercise and one of the questions asks me to "Create an algorithm that puts words divided by space with a max 100 character line length". So basically there is an array of 1000 words or so and it is supposed to put 100 words per line, and spaces don't count. It also says:
Given N words where N >= 1000
Create rows of words separated by spaces where row length >= 100
Once a word has been used it should be considered removed.
Add to a simple SPA
Now I have done this to the best of my ability, but I created my own set of words. I was wondering how to change it to use an array, and also how to remove a word once it has been used. To be honest, I am not sure what SPA means at all, so if someone can explain that, that would be fantastic.
const paragraph = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nam nibh. Nunc varius facilisis eros. Sed erat. In in velit quis arcu ornare laoreet. Curabitur adipiscing luctus massa. Integer ut purus ac augue commodo commodo. Nunc nec mi eu justo tempor consectetuer. Etiam vitae nisl. In dignissim lacus ut ante. Cras elit lectus, bibendum a, adipiscing vitae, commodo et, dui. Ut tincidunt tortor. Donec nonummy, enim in lacinia pulvinar, velit tellus scelerisque augue, ac posuere libero urna eget neque. Cras ipsum. Vestibulum pretium, lectus nec venenatis volutpat, purus lectus ultrices risus, a condimentum risus mi et quam. Pellentesque auctor fringilla neque. Duis eu massa ut lorem iaculis vestibulum. Maecenas facilisis elit sed justo. Quisque volutpat malesuada velit.",
lines = Math.round(paragraph.length / 100);
let line = 0;
for (let i = 0; lines > i; i++) {
document.body.innerHTML += paragraph.slice(line, line + 100) + '<br>';
line += 100;
}
So I got it to go through my set of words and it worked fine, but I just need to change it to use an array, which is what I am trying to figure out.
Thanks for all the help.
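One possible reading of the exercise is a greedy line fill: take words from the front of the array, removing each as it is used, and append them to the current row until the next word would push the row past 100 characters. The sketch below works under that assumption. The function name fillLines is hypothetical, and the separating spaces are counted toward the row length here, which the task statement leaves open. (SPA most likely stands for single-page application, so the rows would simply be rendered into one page.)

```javascript
// Hypothetical helper: build rows of words no longer than maxLength characters.
function fillLines(words, maxLength = 100) {
  const remaining = words.slice(); // work on a copy; used words are shifted off
  const lines = [];
  let line = "";
  while (remaining.length > 0) {
    const word = remaining.shift(); // once used, the word is removed
    if (line === "") {
      line = word;
    } else if (line.length + 1 + word.length <= maxLength) {
      line += " " + word;           // the word still fits on this row
    } else {
      lines.push(line);             // row is full, start the next one
      line = word;
    }
  }
  if (line !== "") lines.push(line);
  return lines;
}

const words = "Lorem ipsum dolor sit amet consectetuer adipiscing elit".split(" ");
console.log(fillLines(words, 20));
// [ 'Lorem ipsum dolor', 'sit amet', 'consectetuer', 'adipiscing elit' ]
```

In a browser, the rows could then be shown with something like document.body.innerHTML = fillLines(words).join('<br>'), as in the question's own code.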

Getting 3rd line of div content

I have a div whose content spans four lines. I can get the number of lines with the code shown further down, but what I need is the text of the 4th line. For example:
<div>
ut returns between paragraphsut returns between paragraphs
ut returns between paragraphsut returns between paragraphs
web ui text is going on hereut returns between paragraphsut
returns between paragraphs
</div>
In the above div, I want to get the text of the 4th line, i.e. "returns between paragraphs". Is there any way to do this?
I am getting the number of lines with the following code:
var content = elm.innerHTML;
var elmHeight = elm.offsetHeight;
var lineHeight = 15;
var lines = elmHeight / lineHeight;
The lines variable holds the number of lines in a particular div.

var line_number = 3; // zero-based index of the line you want (3 is the 4th line)
var result = $.trim( $( '#mydiv' ).text() ).split( '\n' )[line_number]; // mydiv is the id of the div
alert(result);

The following works in Firefox for me. I hope it can be adapted to work under IE as well.
<html>
<head>
<script language="javascript">
<!--
function find3rdline (element)
{
var text = element.textContent;
var begin = -1;
var end = -1;
var top = -1;
var line = -1;
for (i = 0; i < text.length; i++)
{
var id = "marker" + i;
element.innerHTML = text.substr (0, i) + "<span id='" + id + "'>X</span>" + text.substr (i, text.length - i);
var marker = document.getElementById (id);
if (marker.offsetTop != top)
{
top = marker.offsetTop;
line++;
if (line == 2) begin = i;
else if (line == 3)
{
end = i;
// break;
}
}
}
element.innerHTML = text;
if (begin == -1) return "";
else if (end == -1) return text.substr (begin, text.length - begin);
else return text.substr (begin, end - begin);
}
// -->
</script>
</head>
<body onload="alert ('Third line is: [' + find3rdline (text) + ']')">
<div id="text">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce porttitor, leo non sollicitudin blandit, metus eros dapibus massa, nec euismod nunc sapien quis nunc. Maecenas mollis, justo sed egestas semper, nulla libero blandit sem, eget gravida ante sapien sagittis turpis. Vivamus sit amet elit tortor, a eleifend mi. Aliquam erat volutpat. Vestibulum sit amet pellentesque magna. Integer eget erat nisl. Suspendisse adipiscing placerat felis quis blandit. Etiam hendrerit tincidunt gravida. Nunc condimentum tristique commodo. Aliquam eget tellus et sapien accumsan cursus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce porttitor, leo non sollicitudin blandit, metus eros dapibus massa, nec euismod nunc sapien quis nunc. Maecenas mollis, justo sed egestas semper, nulla libero blandit sem, eget gravida ante sapien sagittis turpis. Vivamus sit amet elit tortor, a eleifend mi. Aliquam erat volutpat. Vestibulum sit amet pellentesque magna. Integer eget erat nisl. Suspendisse adipiscing placerat felis quis blandit. Etiam hendrerit tincidunt gravida. Nunc condimentum tristique commodo. Aliquam eget tellus et sapien accumsan cursus.
</div>
</body>
</html>

join result that is separated by a diffrent regex match

Hello, I'm trying to do something like "how to replace dots inside quotes in a sentence with regex":
var string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. "Vestibulum interdum dolor nec sapien blandit a suscipit arcu fermentum. Nullam lacinia ipsum vitae enim consequat iaculis quis in augue. Phasellus fermentum congue blandit. Donec laoreet, ipsum et vestibulum vulputate, risus augue commodo nisi, vel hendrerit sem justo sed mauris." Phasellus ut nunc neque, id varius nunc. In enim lectus, blandit et dictum at, molestie in nunc. Vivamus eu ligula sed augue pretium tincidunt sit amet ac nisl. "Morbi eu elit diam, sed tristique nunc."';
// separate the quotes
var quotes = string.match(/"(.)+?"/g);
var test = [];
// for each quotes
for (var i = quotes.length - 1; i >= 0; i--) {
// replace all the dot inside the quote
test[i] = quotes[i].replace(/\./g, '[dot]');
};
console.log(test);
Let's say we have already made the change with the regex, but I'm stuck on how to join it back into the existing var string, as my result is separated in var test. Or is there a better way?
The output should be something like:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. "Vestibulum interdum dolor nec sapien blandit a suscipit arcu fermentum[dot]Nullam lacinia ipsum vitae enim consequat iaculis quis in augue[dot] Phasellus fermentum congue blandit[dot] Donec laoreet, ipsum et vestibulum vulputate, risus augue commodo nisi, vel hendrerit sem justo sed mauris[dot]" Phasellus ut nunc neque, id varius nunc. In enim lectus, blandit et dictum at, molestie in nunc. Vivamus eu ligula sed augue pretium tincidunt sit amet ac nisl. "Morbi eu elit diam, sed tristique nunc[dot]"
P.S. I'm not sure the title is correct.
Thanks

You could instead just split at ", do the replacement in every second array element, then join again:
var parts = string.split('"');
for (var i = 1; i < parts.length; i += 2) {
parts[i] = parts[i].replace(/\./g, '[dot]');
};
string = parts.join('"');
Since split will create an empty string at index 0 if the string starts with " this should work in all cases.
Note that in the edge case of a trailing unmatched ", every dot after that " will be replaced as well. If you do not want this, simply change the for condition to i < parts.length - 1.

Use regexp but with replace function:
string.replace(/"[^"]+"/g, function(m) {return m.replace(/\./g,"[dot]")})
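The split approach and the replace approach can be compared on a short input. This is a minimal sketch with the two answers wrapped in the hypothetical functions viaSplit and viaReplace; the sample strings are made up. Note that string.replace returns a new string, so the result has to be assigned or returned.

```javascript
// Split at quotes and rewrite every second segment
function viaSplit(str) {
  const parts = str.split('"');
  // Odd indexes are the segments that follow an opening quote
  for (let i = 1; i < parts.length; i += 2) {
    parts[i] = parts[i].replace(/\./g, '[dot]');
  }
  return parts.join('"');
}

// Match each quoted run and rewrite it in a replacement callback
function viaReplace(str) {
  return str.replace(/"[^"]+"/g, m => m.replace(/\./g, '[dot]'));
}

const sample = 'A. "B. C." D. "E."';
console.log(viaSplit(sample));   // A. "B[dot] C[dot]" D. "E[dot]"
console.log(viaReplace(sample)); // A. "B[dot] C[dot]" D. "E[dot]"

// Unmatched trailing quote: only the split version touches the tail
console.log(viaSplit('A. "B.'));   // A. "B[dot]
console.log(viaReplace('A. "B.')); // A. "B.
```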
