Javascript regex: match first 50 characters, respecting words

I'm trying to keep some nav bar lines short by matching the first 50 chars then concatenating '...', but using substr sometimes creates some awkward word chops.
So I want to figure out a way to respect words.
I could write a function to do this, but I'm just seeing if there's an easier/cleaner way.
I've used this successfully in perl:
^(.{50,50}[^ ]*)
Nice and elegant! But it doesn't work in Javascript :(
let catName = "A string that is longer than 50 chars that I want to abbreviate";
let regex = /^(.{50,50}[^ ]*)/;
let match = regex.exec(catName);
match is undefined

Use the String#match method with a regex that ends at a word boundary, so the last word is included in full.
str.match(/^.{1,50}.*?\b/)[0]
var str="I'm trying to keep some nav bar lines short by matching the first 50 chars then concatenating '...', but using substr sometimes creates some awkward word chops. So I want to figure out a way to respect words.";
console.log('With your code:', str.substr(0,50));
console.log('Using match:',str.match(/^.{1,50}.*?\b/)[0]);

Probably the most foolproof regular-expression solution is to use the replace method instead. It won't fail with strings shorter than 50 characters:
str.replace(/^(.{50}[^ ]*).*/, '$1...');
var str = 'A string that is longer than 50 chars that I want to abbreviate';
console.log( str.replace(/^(.{50}[^ ]*).*/, '$1...') );

Tinkering with Pranov's answer, I think this works and is most succinct:
// abbreviate strings longer than 50 char, respecting words
if (catName.length > 50) {
catName = catName.match(/^(.{50,50}[^ ]*)/)[0] + '...';
}
The regex in my OP did work, but it was used in a loop and was choking on strings that already had fewer than 50 chars.
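To make the length guard above reusable, here is a minimal sketch that wraps the same regex in a function. The name abbreviate and the limit parameter are made up for illustration; the ellipsis is only added when something was actually cut off.

```javascript
// Hypothetical helper: `abbreviate` and `limit` are illustrative names, not from the question.
function abbreviate(text, limit = 50) {
  // Short strings are returned unchanged, so the regex never sees them.
  if (text.length <= limit) return text;
  // Take `limit` characters, then extend to the end of the current word.
  const pattern = new RegExp("^.{" + limit + "}[^ ]*");
  const match = text.match(pattern);
  // `.` does not match line breaks, so fall back to a plain cut if nothing matched.
  const head = match ? match[0] : text.slice(0, limit);
  // If the match swallowed the whole string, nothing was removed: no ellipsis.
  return head === text ? text : head + "...";
}

console.log(abbreviate("A string that is longer than 50 chars that I want to abbreviate"));
// "A string that is longer than 50 chars that I want to..."
console.log(abbreviate("short")); // "short"
```

Note that the result can be longer than the limit, because the word that straddles position 50 is kept whole.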

You can .split() the string on \s and loop over the resulting array of words, adding each word's .length to a running total. When the total reaches 50 or more, .slice() the array up to the current index, .join() the words with a space character " ", .concat() an ellipsis, and break out of the loop.
let catName = "A string that is longer than 50 chars that I want to abbreviate";
let [stop, res] = [50, ""];
if (catName.length > stop) {
let arr = catName.split(/\s/);
for (let i = 0, n = 0; i < arr.length; i++) {
n += arr[i].length;
if (n >= stop) {
res = arr.slice(0, i).join(" ").concat("...");
break;
};
}
} else {
// the string already fits within the limit, so keep it unchanged
res = catName;
}
document.querySelector("pre").textContent = res;
<pre></pre>
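If the abbreviated text must never run past the limit, a different technique from the loop above is to cut at the last space at or before the limit. This is only a sketch; truncateAtWord and limit are hypothetical names.

```javascript
// Hypothetical helper: cuts at the last space at or before `limit`.
function truncateAtWord(text, limit = 50) {
  if (text.length <= limit) return text;
  // lastIndexOf searches backwards starting from index `limit`.
  const cut = text.lastIndexOf(" ", limit);
  // No usable space (a single very long word): fall back to a hard cut.
  const head = cut > 0 ? text.slice(0, cut) : text.slice(0, limit);
  return head + "...";
}

console.log(truncateAtWord("A string that is longer than 50 chars that I want to abbreviate"));
// "A string that is longer than 50 chars that I want..."
```

Compared with the regex answers, this drops the word that crosses the limit instead of keeping it.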

Related

Removing last two characters of a string [duplicate]

I have a string, 12345.00, and I would like it to return 12345.0.
I have looked at trim, but it looks like it only trims whitespace, and at slice, but I don't see how it would work here. Any suggestions?

You can use the substring function:
let str = "12345.00";
str = str.substring(0, str.length - 1);
console.log(str);
This is the accepted answer, but as per the conversations below, the slice syntax is much clearer:
let str = "12345.00";
str = str.slice(0, -1);
console.log(str);

You can use slice! You just have to make sure you know how to use it. Positive #s are relative to the beginning, negative numbers are relative to the end.
js>"12345.00".slice(0,-1)
12345.0

You can use the substring method of JavaScript string objects:
s = s.substring(0, s.length - 4)
It unconditionally removes the last four characters from string s.
However, if you want to conditionally remove the last four characters, only if they are exactly _bar:
var re = /_bar$/;
s = s.replace(re, ""); // replace returns a new string, so assign the result
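The conditional removal can also be done without a regex, which avoids having to escape special characters in the suffix. A minimal sketch, where removeSuffix is a made-up name:

```javascript
// Hypothetical helper: removes `suffix` only when the string really ends with it.
function removeSuffix(s, suffix) {
  // Guard against an empty suffix: slice(0, -0) is slice(0, 0), which returns "".
  if (suffix === "" || !s.endsWith(suffix)) return s;
  return s.slice(0, -suffix.length);
}

console.log(removeSuffix("foo_bar", "_bar")); // "foo"
console.log(removeSuffix("foo_baz", "_bar")); // "foo_baz"
```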

The easiest method is to use the slice method of the string, which allows negative positions (corresponding to offsets from the end of the string):
const s = "your string";
const withoutLastFourChars = s.slice(0, -4);
If you needed something more general to remove everything after (and including) the last underscore, you could do the following (so long as s is guaranteed to contain at least one underscore):
const s = "your_string";
const withoutLastChunk = s.slice(0, s.lastIndexOf("_"));
console.log(withoutLastChunk);
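When the underscore is not guaranteed, lastIndexOf returns -1 and slice(0, -1) would silently drop the last character instead. A small guarded sketch (dropLastChunk and separator are illustrative names):

```javascript
// Hypothetical helper: removes everything from the last separator onwards.
function dropLastChunk(s, separator = "_") {
  const index = s.lastIndexOf(separator);
  // -1 means the separator is absent, so return the string untouched.
  return index === -1 ? s : s.slice(0, index);
}

console.log(dropLastChunk("your_string"));  // "your"
console.log(dropLastChunk("nounderscore")); // "nounderscore"
```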

For a number like your example, I would recommend doing this over substring:
console.log(parseFloat('12345.00').toFixed(1));
Do note that this will actually round the number, though, which I would imagine is desired but maybe not:
console.log(parseFloat('12345.46').toFixed(1));
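To see the difference between trimming and rounding side by side, here is a small sketch. The function name compareTrimAndRound is made up for illustration.

```javascript
// Hypothetical helper: shows both results for the same numeric string.
function compareTrimAndRound(numericString) {
  return {
    // Plain string operation: drops the last character, no rounding.
    trimmed: numericString.slice(0, -1),
    // Numeric operation: parses, then rounds to one decimal place.
    rounded: parseFloat(numericString).toFixed(1),
  };
}

console.log(compareTrimAndRound("12345.00")); // { trimmed: '12345.0', rounded: '12345.0' }
console.log(compareTrimAndRound("12345.46")); // { trimmed: '12345.4', rounded: '12345.5' }
```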

Be aware that String.prototype.{ split, slice, substr, substring } operate on UTF-16 encoded strings
None of the previous answers are Unicode-aware.
JavaScript strings are sequences of UTF-16 code units, and code points above U+FFFF require surrogate pairs, so the older, pre-existing string methods operate on UTF-16 code units, not Unicode code points.
See: Do NOT use .split('').
const string = "ẞ🦊";
console.log(string.slice(0, -1)); // "ẞ\ud83e"
console.log(string.substr(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.substring(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.replace(/.$/, "")); // "ẞ\ud83e"
console.log(string.match(/(.*).$/)[1]); // "ẞ\ud83e"
const utf16Chars = string.split("");
utf16Chars.pop();
console.log(utf16Chars.join("")); // "ẞ\ud83e"
In addition, RegExp methods, as suggested in older answers, don’t match line breaks at the end:
const string = "Hello, world!\n";
console.log(string.replace(/.$/, "").endsWith("\n")); // true
console.log(string.match(/(.*).$/) === null); // true
Use the string iterator to iterate characters
Unicode-aware code utilizes the string’s iterator; see Array.from and ... spread.
string[Symbol.iterator] can be used (e.g. instead of string) as well.
Also see How to split Unicode string to characters in JavaScript.
Examples:
const string = "ẞ🦊";
console.log(Array.from(string).slice(0, -1).join("")); // "ẞ"
console.log([
...string
].slice(0, -1).join("")); // "ẞ"
Use the s and u flags on a RegExp
The dotAll or s flag makes . match line break characters, the unicode or u flag enables certain Unicode-related features.
Note that, when using the u flag, you must remove unnecessary identity escapes, as these are invalid in a u regex. For example, \[ is fine, since [ would start a character class without the backslash, but \: isn’t, since it’s a : with or without the backslash, so you need to remove the backslash.
Examples:
const unicodeString = "ẞ🦊",
lineBreakString = "Hello, world!\n";
console.log(lineBreakString.replace(/.$/s, "").endsWith("\n")); // false
console.log(lineBreakString.match(/(.*).$/s) === null); // false
console.log(unicodeString.replace(/.$/su, "")); // ẞ
console.log(unicodeString.match(/(.*).$/su)[1]); // ẞ
// Now `split` can be made Unicode-aware:
const unicodeCharacterArray = unicodeString.split(/(?:)/su),
lineBreakCharacterArray = lineBreakString.split(/(?:)/su);
unicodeCharacterArray.pop();
lineBreakCharacterArray.pop();
console.log(unicodeCharacterArray.join("")); // "ẞ"
console.log(lineBreakCharacterArray.join("").endsWith("\n")); // false
Note that some graphemes consist of more than one code point, e.g. 🏳️‍🌈 which consists of the sequence 🏳 (U+1F3F3), VS16 (U+FE0F), ZWJ (U+200D), 🌈 (U+1F308).
Here, even Array.from will split this into four “characters”.
Matching those is made easier with the RegExp set notation and properties of strings proposal.
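For grapheme clusters, a different technique from the RegExp proposal mentioned above is Intl.Segmenter, which is available in current browsers and recent Node.js versions. A minimal sketch, assuming the runtime ships it with full ICU data; removeLastGrapheme is a hypothetical name.

```javascript
// Hypothetical helper: removes the last user-perceived character (grapheme cluster).
function removeLastGrapheme(text) {
  // `undefined` selects the default locale; granularity "grapheme" splits by cluster.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(text), (part) => part.segment);
  return graphemes.slice(0, -1).join("");
}

// The rainbow flag is U+1F3F3, U+FE0F, U+200D, U+1F308: four code points, one grapheme.
const flag = "\u{1F3F3}\uFE0F\u200D\u{1F308}";
console.log(removeLastGrapheme("a" + flag) === "a");
```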

Using JavaScript's slice function:
let string = 'foo_bar';
string = string.slice(0, -4); // Slice off last four characters here
console.log(string);
This could be used to remove '_bar' at end of a string, of any length.

A regular expression is what you are looking for:
let str = "foo_bar";
console.log(str.replace(/_bar$/, ""));

Try this:
const myString = "Hello World!";
console.log(myString.slice(0, -1));

Performance
On 2020.05.13 I tested the chosen solutions on Chrome v81.0, Safari v13.1 and Firefox v76.0 on macOS High Sierra v10.13.6.
Conclusions
slice(0,-1) (D) is fast or the fastest solution for both short and long strings, and it is recommended as a fast cross-browser solution
solutions based on substring (C) and substr (E) are fast
solutions based on regular expressions (A, B) are slow to medium fast
solutions B, F and G are slow for long strings
solution F is the slowest for short strings; G is the slowest for long strings
Details
I performed two tests for solutions A, B, C, D, E(ext), F, G(my):
for an 8-character short string (from the OP's question)
for a 1M-character long string
The solutions are presented in the snippet below.
function A(str) {
return str.replace(/.$/, '');
}
function B(str) {
return str.match(/(.*).$/)[1];
}
function C(str) {
return str.substring(0, str.length - 1);
}
function D(str) {
return str.slice(0, -1);
}
function E(str) {
return str.substr(0, str.length - 1);
}
function F(str) {
let s= str.split("");
s.pop();
return s.join("");
}
function G(str) {
let s='';
for(let i=0; i<str.length-1; i++) s+=str[i];
return s;
}
// ---------
// TEST
// ---------
let log = (f)=>console.log(`${f.name}: ${f("12345.00")}`);
[A,B,C,D,E,F,G].map(f=>log(f));
This snippet only presents the solutions.
[Figure: example results for Chrome for the short string]

Use regex:
let aStr = "12345.00";
aStr = aStr.replace(/.$/, '');
console.log(aStr);

How about:
let myString = "12345.00";
console.log(myString.substring(0, myString.length - 1));

1. (.*), captures any character multiple times:
console.log("a string".match(/(.*).$/)[1]);
2. ., matches last character, in this case:
console.log("a string".match(/(.*).$/));
3. $, matches the end of the string:
console.log("a string".match(/(.*).{2}$/)[1]);

https://stackoverflow.com/questions/34817546/javascript-how-to-delete-last-two-characters-in-a-string
Just use trim if you don't want spaces
"11.01 °C".slice(0,-2).trim()

Here is an alternative that I don't think I've seen in the other answers, just for fun.
var strArr = "hello i'm a string".split("");
strArr.pop();
document.write(strArr.join(""));
Not as legible or simple as slice or substring but does allow you to play with the string using some nice array methods, so worth knowing.

var debris = string.split("_"); // explode the string into an array of substrings separated by "_"
debris.pop(); // pop the last element off the array (which you didn't want)
var result = debris.join("_"); // fuse the remaining items together like the sun

If you want to do generic rounding of floats, instead of just trimming the last character:
var float1 = 12345.00,
float2 = 12345.4567,
float3 = 12345.982;
var MoreMath = {
/**
* Rounds a value to the specified number of decimals
* @param float value The value to be rounded
* @param int nrDecimals The number of decimals to round value to
* @return float value rounded to nrDecimals decimals
*/
round: function (value, nrDecimals) {
// scale by a power of ten; 10 * nrDecimals would only be right for one decimal
var x = nrDecimals > 0 ? Math.pow(10, parseInt(nrDecimals, 10)) : 1;
return Math.round(value * x) / x;
}
};
console.log(MoreMath.round(float1, 1)); // 12345 (use toFixed(1) to print "12345.0")
console.log(MoreMath.round(float2, 1)); // 12345.5
console.log(MoreMath.round(float3, 1)); // 12346
EDIT: Seems like there exists a built in function for this, as Paolo points out. That solution is obviously much cleaner than mine. Use parseFloat followed by toFixed

if(str.substring(str.length - 4) == "_bar")
{
str = str.substring(0, str.length - 4);
}

Via slice(indexStart, indexEnd) method - note, this does NOT CHANGE the existing string, it creates a copy and changes the copy.
console.clear();
let str = "12345.00";
let a = str.slice(0, str.length -1)
console.log(a, "<= a");
console.log(str, "<= str is NOT changed");
Via Regular Expression method - note, this does NOT CHANGE the existing string, it creates a copy and changes the copy.
console.clear();
let regExp = /.$/g
let b = str.replace(regExp,"")
console.log(b, "<= b");
console.log(str, "<= str is NOT changed");
Via the array.splice() method: this only works on arrays, and it CHANGES the existing array (so be careful with this one). You'll need to convert the string to an array first, then back.
console.clear();
let str = "12345.00";
let strToArray = str.split("")
console.log(strToArray, "<= strToArray");
let spliceMethod = strToArray.splice(str.length-1, 1)
str = strToArray.join("")
console.log(str, "<= str is changed now");

In cases where you want to remove something that is close to the end of a string (in case of variable sized strings) you can combine slice() and substr().
I had a string with markup, dynamically built, with a list of anchor tags separated by comma. The string was something like:
var str = "<a>text 1,</a><a>text 2,</a><a>text 2.3,</a><a>text abc,</a>";
To remove the last comma I did the following:
str = str.slice(0, -5) + str.substr(-4);

You can, in fact, remove the last arr.length - 2 items of an array using arr.length = 2, which if the array length was 5, would remove the last 3 items.
Sadly, this does not work for strings, but we can use split() to split the string, and then join() to join the string after we've made any modifications.
var str = 'string'
String.prototype.removeLast = function(n) {
var string = this.split('')
string.length = string.length - n
return string.join('')
}
console.log(str.removeLast(3))

Try using toFixed:
const str = "12345.00";
console.log((+str).toFixed(1)); // "12345.0"

Try this:
<script>
var x="foo_foo_foo_bar";
for (var i=0; i<=x.length; i++) {
if (x[i]=="_" && x[i+1]=="b") {
break;
}
else {
document.write(x[i]);
}
}
</script>
You can also try the live working example on http://jsfiddle.net/informativejavascript/F7WTn/87/.

@Jason S:
You can use slice! You just have to make sure you know how to use it. Positive #s are relative to the beginning, negative numbers are relative to the end.
js>"12345.00".slice(0,-1)
12345.0
Sorry for my graphomania, but the post was tagged 'jquery' earlier. On a jQuery object, slice() is a jQuery method that operates on sets of DOM elements, not on substrings.
In other words, @Jon Erickson's answer suggests a really perfect solution.
However, your method works outside of jQuery objects, on plain JavaScript strings.
I should add, following the last discussion in the comments, that jQuery is an extension of JS that is renewed much more often than its better-known parent, ECMAScript.
There are also two other methods:
string.substring(from, to), which returns the rest of the string if the 'to' index is omitted:
string.substring(from) (negative values are treated as 0)
and substr(), which takes a start position and a 'length' that can only be positive:
string.substr(start, length)
Also, some maintainers report that the last method, string.substr(start, length), does not work, or works incorrectly, in MSIE.

Use substring to get everything to the left of _bar. But first you have to get the index of _bar in the string:
str.substring(0, str.indexOf("_bar"));
0 is the start index and the second argument is the end index (not a length).

replace non matches between delimiters

I have an input string:
12345,3244,654,ffgv,87676,988ff,87657
I'm having difficulty transforming all terms in the string that are not five-digit numbers into the constant 34567 using regular expressions. So, the output would be like this:
12345,34567,34567,34567,87676,34567,87657
For this, I looked at two options:
negated character class: Not useful because it does not execute directly on this expression ,[^\d{5}],
lookahead and lookbehind: Issue here is that it doesn't include non-matched part in the result of this expression ,(?!\d{5}) or (?<!\d{5}), for the purpose of substitution/replace.
Once the desired expression is found, it would give a result so that one can replace non-matched part using tagged regions like \1, \2.
Is there any mechanism in regular expression tools to achieve the output as mentioned in the above example?
Edit: I really appreciate those who have answered non-regex solutions, but I would be more thankful if you provide a regex-based solution.

You don't need regex for this. You can use str.split to split the string at commas first, and then for each item check that its length is exactly 5 and that it contains only digits (using str.isdigit). Lastly, combine all the items using str.join.
>>> s = '12345,3244,654,ffgv,87676,988ff,87657'
>>> ','.join(x if len(x) == 5 and x.isdigit() else '34567' for x in s.split(','))
'12345,34567,34567,34567,87676,34567,87657'
Javascript version:
function isdigit(s){
for(var i=0; i <s.length; i++){
if(!(s[i] >= '0' && s[i] <= '9')){
return false;
}
}
return true;
}
var arr = "12345,3244,654,ffgv,87676,988ff,87657".split(",");
for(var i=0; i < arr.length; i++){
// replace anything that is not exactly five digits
if(arr[i].length !== 5 || ! isdigit(arr[i])) arr[i] = '34567';
}
var output = arr.join(",");

Try the following: /\b(?!\d{5})[^,]+\b/g
It constrains the expression between word boundaries (\b),
followed by a negative look-ahead that rejects five-digit numbers ((?!\d{5})),
followed by any run of characters other than a comma ([^,]+).
const expression = /\b(?!\d{5})[^,]+\b/g;
const input = '12345,3244,654,ffgv,87676,988ff,87657';
const expectedOutput = '12345,34567,34567,34567,87676,34567,87657';
const output = input.replace(expression, '34567');
console.log(output === expectedOutput, expectedOutput, output);

This approach uses /\b(?:(\d{5})|(\w+))\b/g:
we match on boundaries (\b); the non-capturing group makes both boundaries apply to each alternative
our first capture group captures "good strings"
our looser capture group gets the leftovers (bad strings)
our replacer() function knows the difference
const str = '12345,3244,654,ffgv,87676,988ff,87657';
const STAND_IN = '34567';
const massageString = (str) => {
const pattern = /\b(?:(\d{5})|(\w+))\b/g;
const replacer = (match, goodstring, badstring) => {
if (goodstring) {
return goodstring;
} else {
return STAND_IN;
}
}
const r = str.replace(pattern,replacer);
return r;
};
console.log( massageString(str) );
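A variation on the replacer idea is to match every comma-separated term and test each one on its own, which keeps the "exactly five digits" rule in one anchored regex. This is a sketch only; normalizeTerms and standIn are made-up names.

```javascript
// Hypothetical helper: replaces every term that is not exactly five digits.
function normalizeTerms(csv, standIn = "34567") {
  // [^,]+ matches each non-empty term between commas.
  return csv.replace(/[^,]+/g, (term) => (/^\d{5}$/.test(term) ? term : standIn));
}

console.log(normalizeTerms("12345,3244,654,ffgv,87676,988ff,87657"));
// "12345,34567,34567,34567,87676,34567,87657"
```

Empty terms (two commas in a row) are left untouched by this version, since [^,]+ never matches an empty string.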

I think the following would work for value no longer than 5 alphanumeric characters:
(,(?!\d{5})\w{1,5})
if longer than 5 alphanumeric characters, then remove 5 in above expression:
(,(?!\d{5})\w{1,})
and you can replace using:
,34567
You can see a demo on regex101. Of course, there might be faster non-regex methods for specific languages as well (python, perl or JS)

Javascript split function not correct worked with specific regex

I have a problem. I have a string, "\,str\,i,ing", and I need to split it on commas that are not preceded by a backslash. For my string, the result should be ["\,str\,i", "ing"]. I'm using the following regex:
myString.split("[^\],", 2)
but it doesn't work.

Well, this is a ridiculous way to work around the lack of lookbehind, but it seems to get the correct result.
"\\,str\\,i,ing".split('').reverse().join('').split(/,(?=[^\\])/).map(function(a){
return a.split('').reverse().join('');
}).reverse();
//=> ["\,str\,i", "ing"]

Not sure about your expected output, but you are passing a string, not a regex. Use:
var arr = "\\,str\\,i,ing".split(/[^\\],/, 2);
console.log(arr);
To split using a regex, wrap it in /.../

This was not easily possible in JS at the time, because it did not yet support lookbehind (lookbehind was added in ES2018). Even if you use a real regex, it eats the last character:
> "xyz\\,xyz,xyz".split(/[^\\],/, 2)
["xyz\\,xy", "xyz"]
If you don't want the z to be eaten, I'd suggest:
var str = "....";
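return str.split(",").reduce(function(res, part) {
var l = res.length;
// merge into the previous part when there is one and either
// it ended with a backslash (escaped comma) or the limit of 2 parts is reached
if (l && (res[l-1].substr(-1) == "\\" || l >= 2))
res[l-1] += ","+part;
else
res.push(part);
return res; // reduce needs the accumulator back
}, []);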

Reading between the lines, it looks like you want to split a string by , characters that are not preceded by \ characters.
It would be really great if JavaScript had a regular expression lookbehind (and negative lookbehind) pattern, but at the time of this answer it did not (lookbehind was added in ES2018). What it does have is a lookahead ((?=)) and negative lookahead ((?!)) pattern. Make sure to review the documentation.
You can use these as a lookbehind if you reverse the string:
var str,
reverseStr,
arr,
reverseArr;
//don't forget to escape your backslashes
str = '\\,str\\,i,ing';
//reverse your string
reverseStr = str.split('').reverse().join('');
//split the array on `,`s that aren't followed by `\`
reverseArr = reverseStr.split(/,(?!\\)/);
//reverse the reversed array, and reverse each string in the array
arr = reverseArr.reverse().map(function (val) {
return val.split('').reverse().join('');
});
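In engines that implement ES2018 lookbehind (current browsers and Node.js), the reversal trick is no longer needed. A minimal sketch; splitUnescapedCommas is a hypothetical name.

```javascript
// Hypothetical helper: splits on commas that are not preceded by a backslash.
function splitUnescapedCommas(text) {
  // (?<!\\) is a negative lookbehind: "not preceded by a backslash".
  return text.split(/(?<!\\),/);
}

// Backslashes must be doubled inside a JavaScript string literal.
console.log(splitUnescapedCommas("\\,str\\,i,ing"));
// [ '\\,str\\,i', 'ing' ]
```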

You picked a tough character to match: a backslash preceding a comma is apt to disappear while you pass it around in a string, since '\,'==','...
var s= 'My dog, the one with two \\, blue \\,eyes, is asleep.';
var a= [], M, rx=/(\\?),/g;
while((M= rx.exec(s))!= null){
if(M[1]) continue;
a.push(s.substring(0, rx.lastIndex-1));
s= s.substring(rx.lastIndex);
rx.lastIndex= 0;
};
a.push(s);
/* returned value: (Array)
My dog
the one with two \, blue \,eyes
is asleep.
*/

Find something which will not be present in your original string, say "###". Replace "\\," with it. Split the resulting string by ",". Replace "###" back with "\\,".
Something like this:
<script type="text/javascript">
var s1 = "\\,str\\,i,ing";
var s2 = s1.replace(/\\,/g,"###");
console.log(s2);
var s3 = s2.split(",");
for (var i=0;i<s3.length;i++)
{
s3[i] = s3[i].replace(/###/g,"\\,");
}
console.log(s3);
</script>
See JSFiddle
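If no safe placeholder such as "###" can be guaranteed, another option that needs neither a placeholder nor lookbehind is to match the fields instead of splitting on the separators. A sketch under that assumption; splitByMatching is a made-up name.

```javascript
// Hypothetical helper: collects fields made of escaped pairs or non-comma characters.
function splitByMatching(text) {
  // \\. consumes a backslash together with the character it escapes (e.g. "\,"),
  // [^,] consumes any other single character that is not a comma.
  return text.match(/(?:\\.|[^,])+/g) || [];
}

console.log(splitByMatching("\\,str\\,i,ing"));
// [ '\\,str\\,i', 'ing' ]
```

Unlike split, this version drops empty fields, because the pattern needs at least one character per match.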

Count number of words in string using JavaScript

I am trying to count the number of words in a given string using the following code:
var t = document.getElementById('MSO_ContentTable').textContent;
if (t == undefined) {
var total = document.getElementById('MSO_ContentTable').innerText;
} else {
var total = document.getElementById('MSO_ContentTable').textContent;
}
countTotal = cword(total);
function cword(w) {
var count = 0;
var words = w.split(" ");
for (i = 0; i < words.length; i++) {
// inner loop -- do the count
if (words[i] != "") {
count += 1;
}
}
return (count);
}
In that code I am getting data from a div tag and sending it to the cword() function for counting. However, the return value is different in IE and Firefox. Is there any change required in the regular expression? One thing I have seen is that both browsers send the same string, so there is a problem inside the cword() function.

[edit 2022, based on comment] Nowadays, one would not extend the native prototype this way. A way to extend the native prototype without the danger of naming conflicts is to use an ES2015 Symbol. Here is an example of a wordcounter using that.
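A minimal sketch of what such a Symbol-keyed word counter could look like. The symbol name countWords and the \S+ word definition are my own choices for illustration, not necessarily what the answer originally linked to.

```javascript
// Hypothetical symbol; because it is unique, it cannot clash with other property names.
const countWords = Symbol("countWords");

// defineProperty keeps the new method non-enumerable by default.
Object.defineProperty(String.prototype, countWords, {
  value: function () {
    // Count runs of non-whitespace characters; match returns null when there are none.
    const words = this.match(/\S+/g);
    return words ? words.length : 0;
  },
});

console.log("this string has five words"[countWords]()); // 5
console.log(""[countWords]()); // 0
```

Only code that has access to the countWords symbol can call the method, which is what removes the naming-conflict risk.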
Old answer: you can use split and add a wordcounter to the String prototype:
if (!String.prototype.countWords) {
String.prototype.countWords = function() {
return this.length && this.split(/\s+\b/).length || 0;
};
}
console.log(`'this string has five words'.countWords() => ${
'this string has five words'.countWords()}`);
console.log(`'this string has five words ... and counting'.countWords() => ${
'this string has five words ... and counting'.countWords()}`);
console.log(`''.countWords() => ${''.countWords()}`);

I would prefer a RegEx only solution:
var str = "your long string with many words.";
var wordCount = str.match(/(\w+)/g).length;
alert(wordCount); //6
The regex is
\w+ between one and unlimited word characters
/g global - don't stop after the first match
The parentheses create a group around every match, although with the g flag match() simply returns an array of all full matches. So the length of that array should equal the word count.

This is the best solution I've found:
function wordCount(str) {
var m = str.match(/[^\s]+/g)
return m ? m.length : 0;
}
This inverts the whitespace selection, which is better than \w+ because \w only matches Latin letters, digits and _ (see http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.6)
If you're not careful with whitespace matching you'll count empty strings, strings with leading and trailing whitespace, and all whitespace strings as matches while this solution handles strings like ' ', ' a\t\t!\r\n#$%() d ' correctly (if you define 'correct' as 0 and 4).
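If punctuation-only tokens should not count as words, a different technique is to use Unicode property escapes with the u flag, which also covers non-Latin scripts. A sketch, assuming an ES2018-capable engine; countWordsUnicode is a hypothetical name.

```javascript
// Hypothetical helper: counts runs of letters, digits and underscores in any script.
function countWordsUnicode(text) {
  // \p{L} is any letter, \p{N} any number; both require the u flag.
  const words = text.match(/[\p{L}\p{N}_]+/gu);
  return words ? words.length : 0;
}

console.log(countWordsUnicode("h\u00E9llo w\u00F6rld")); // 2
console.log(countWordsUnicode(" a\t\t!\r\n#$%() d "));   // 2 (only "a" and "d")
```

The second result differs from the non-whitespace approach above, which counts "!" and "#$%()" as words and returns 4.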

You can make a clever use of the replace() method although you are not replacing anything.
var str = "the very long text you have...";
var counter = 0;
// lets loop through the string and count the words
str.replace(/\b\w+\b/g,function (a) {
// for each word found increase the counter value by 1
counter++;
})
alert(counter);
The regex can be improved, for example to exclude HTML tags.

//Count words in a string or what appears as words :-)
function countWordsString(string){
// Trim the ends and change multiple whitespace characters for one space
string = string.trim().replace(/\s+/g, ' ');
// An empty string has no words
if (string === '') return 0;
var counter = 1;
// Loop through the string: each remaining space separates two words
string.replace(/ /g, function (a) {
counter++;
});
return counter;
}
var numberWords = countWordsString("count the words in this string"); // 6

Javascript substr(); limit by word not char

I would like to limit the substr by words and not chars. I am thinking regular expression and spaces but don't know how to pull it off.
Scenario: Limit a paragraph of words to 200 words using javascript/jQuery.
var $postBody = $postBody.substr(' ',200);
This is great but splits words in half :) Thanks ahead of time!

function trim_words(theString, numWords) {
// split on runs of whitespace, keeping at most numWords words
var expString = theString.split(/\s+/, numWords);
var theNewString = expString.join(" ");
return theNewString;
}
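The split/join version collapses every run of whitespace into a single space. If the original spacing between the kept words matters, a regex can take the first words in place. This is a sketch only; limitWords and maxWords are made-up names.

```javascript
// Hypothetical helper: keeps the first `maxWords` words and the spacing between them.
function limitWords(text, maxWords) {
  // Optional leading whitespace, then up to maxWords groups of "word + trailing whitespace".
  const pattern = new RegExp("^\\s*(?:\\S+\\s*){0," + maxWords + "}");
  // The pattern can match an empty string, so match() never returns null here.
  return text.match(pattern)[0].trimEnd();
}

console.log(limitWords("one two  three four", 2)); // "one two"
console.log(limitWords("a b", 5));                 // "a b"
```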

If you're satisfied with a not-quite-accurate solution, you could simply keep a running count of the number of space characters within the text and assume that it is equal to the number of words.
Otherwise, I would use split() on the string with " " as the delimiter and then count the size of the array that split returns.

very quick and dirty
$("#textArea").val().split(/\s/).length

I suppose you need to consider punctuation and other non-word, non-whitespace characters as well. You want 200 words, not counting whitespace and non-letter characters.
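var word_count = 0;
var in_word = false;
var cut = text.length; // position at which to cut the string
for (var x=0; x < text.length; x++) {
if (/[A-Za-z]/.test(text[x])) { // assumption: "letter" means an ASCII letter
if (!in_word) word_count++;
in_word = true;
} else {
in_word = false;
}
// the 200th word has just ended: remember the position and stop
if (!in_word && word_count >= 200) { cut = x; break; }
}
text = text.slice(0, cut);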
You should also decide whether you treat digits as a word, and whether you treat single letters as a word.
