Cut string into chunks without breaking words based on max length

Cut string into chunks without breaking words based on max length - javascript

I have a string that I would like to break down into an array.
Each index needs to have a max letter, say 15 characters. Each point needs to be at a words end, with no overlap of the maximum characters (IE would stop at 28 chars before heading into next word).
I've been able to do similar things using regex in the past, but I'm trying to make this work with an online platform that does not like regex.
Example string:
Hi this is a sample string that I would like to break down into an array!
Desired result # 15 char max:
Hi this is a
sample string
that I would
like to break
down into an
array!

Considering there's no word bigger then max limit
function splitString (n,str){
let arr = str?.split(' ');
let result=[]
let subStr=arr[0]
for(let i = 1; i < arr.length; i++){
let word = arr[i]
if(subStr.length + word.length + 1 <= n){
subStr = subStr + ' ' + word
}
else{
result.push(subStr);
subStr = word
}
}
if(subStr.length){result.push(subStr)}
return result
}
console.log(splitString(15,'Hi this is a sample string that I would like to break down into an array!'))

Related

how to split string every 2 character

var alphabet = "FIN SLHOJVHEN GYKOHU";
I want to split it every 2 character so it would print
"I LOVE YOU "
I already try this but it didn't work
for (var i = 0 ; i \< alphabet.length ; i+=2 ){
alphabet.split(i)
correct me please

You can transform the string into an array, filter it and make it a string again.
let alphabet = "FIN SLHOJVHEN GYKOHU";
alphabet = [...alphabet].filter((_, i) => i%2).join("");
console.log(alphabet); //I LOVE YOU;

Also this one results in I LOVE YOU (regex101 demo).
// the string
let alphabet = 'FIN SLHOJVHEN GYKOHU';
// capture ^ start or chr, match latter chr
alphabet = alphabet.replace(/(^|.)./g, '$1');
console.log(alphabet);

Since the split function will split the given string by the delimiter you passed, it seems to me that you wish to first split the words (using empty space) contained in the encoded string and only then take only the characters at even positions to include in the decoded string.
This is a demo achieving that:
const encoded = "FIN SLHOJVHEN GYKOHU";
const words = encoded.split(' ');
let decoded = '';
words.forEach((word)=>{
for (let i=1;i<word.length;i+=2){
decoded += word[i];
}
decoded += ' ';
});
console.log(decoded);

Using a regex replacement approach we can try:
var alphabet = "FIN SLHOJVHEN GYKOHU";
var output = alphabet.replace(/[A-Z]([A-Z]|(?=\s))/g, "$1");
console.log(output);
Here is an explanation of the regex pattern:
[A-Z] match a single (odd) uppercase letter
( open capture
[A-Z] an uppercase letter
| OR
(?=\s) lookahead and find a space
)
In other words, we match an odd letter and then capture the next letter, unless that odd letter happen to be the last in the word. Then we replace with just the captured even letter, if available.

you already have different ways to do it, I'm adding one, so you'll get totally confused! hehe
This one is recursive:
we take your string alphabet
first 2 letters (every 2)
last one of this 2 characters string is stored in toPrint variable
delete the first 2 characters from alphabet
... loop till alphabet empty
Your toPrint has I Love You
Certainly not the fastest one, but nice.
let alphabet = "FIN SLHOJVHEN GYKOHU";
let toPrint = '';
do {
let temp = alphabet.slice(0, 2);
toPrint += temp[1];
alphabet = alphabet.slice(2, alphabet.length);
} while (alphabet !== '');
console.log(toPrint);

You can start your loop at 1 as you want every second character, and note that \< should be <
In the loop, i is the position of the character, so you would still have to get the character for that position and then assemble the resulting string.
var alphabet = "FIN SLHOJVHEN GYKOHU";
var result = "";
for (var i = 1; i < alphabet.length; i += 2) {
result += alphabet[i];
}
console.log(result);
If you want to take the spaces into account and only get the 2nd non whitespace character you might:
get the separate words by splitting on whitespace chars
remove the empty entries
join every second character to a string
join the parts from the initial split with a space
const alphabet = " FIN SLHOJVHEN GYKOHU ";
const result = alphabet
.split(/\s+/)
.filter(Boolean)
.map(s => s.split("").filter((s, i) => i % 2).join(""))
.join(" ");
console.log(result);
If you have a browser where a positive lookbehind for a regex is supported:
const alphabet = " FIN SLHOJVHEN GYKOHU ";
const result = alphabet
.split(/\s+/)
.filter(Boolean)
.map(s => s.match(/(?<=^(?:..)*.)./g).join(""))
.join(" ");
console.log(result);

var alphabet = "FIN SLHOJVHEN GYKOHU";
const arrayAlpha = alphabet.split('');
let stringToPrint = '';
for(let i=1; i<arrayAlpha.length; i+=2){
stringToPrint = stringToPrint + arrayAlpha[i]
}
console.log(stringToPrint)

Calculate an alphabetic score for a word

How can I generate a numeric score for a string, which I can later user to order things alphabetically?
(I'd like to add objectIds to a redis sorted set based on a name property. This needs a numeric score. My list-of-things might be too big to sort all at once, hence wanting to score each item individually)
Words earlier in an alphabetic list should have a lower score, with 'a' = 0.
My naive approach so far; (letter alphabetic position from Replace a letter with its alphabet position )
function alphaScoreString(inputString) {
let score = 0
inputString
.trim()
.toLowerCase()
.split('')
.map((letter, index) => {
const letterNumber = parseInt(letter, 36) - 10
if (letterNumber >= 0) {
score += letterNumber / (index + 1)
}
})
return score * 1000
}
This does not work, as
alphaScoreString('bb')
1500
alphaScoreString('bc')
2000
alphaScoreString('bbz')
9833.333333333334
You can see that 'bbz' has a higher score than 'bc', whereas it should be lower, as 'bbz' would come before 'bc' in an alphabetical list.

You can convert each character to its unicode (and ensure that every character is 4 digits by padding the string. e.g. "H" = 72 but is padded to 0072: Doing a word by word comparison, you can still determine the 'alphabetical order' of each string:
var instring = "Hello World";
var output = "";
for(i=0; i<instring.length;i++){
const newchar = String(instring.charCodeAt(i)).padStart(4, '0');
output = output.concat(newchar)
console.log(output);
}

Answer writen in python.
char_codex = {'a':0.01, 'b':0.02, 'c':0.03, 'd':0.04, 'e':0.05, 'f':0.06,
'g':0.07, 'h':0.08, 'i':0.09, 'j':0.10, 'k':0.11, 'l':0.12,
'm':0.13, 'n':0.14, 'o':0.15, 'p':0.16, 'q':0.17, 'r':0.18,
's':0.19, 't':0.20, 'u':0.21, 'v':0.22, 'w':0.23, 'x':0.24,
'y':0.25, 'z':0.26}
def alphabetic_score(word):
bitwiseshift = '1'
scores = [0.00] * len(word)
for index, letter in enumerate(word.lower()):
if index is 0:
scores[index] = char_codex[letter]
else:
bitwiseshift = bitwiseshift+'00'
scores[index] = char_codex[letter]/int(bitwiseshift)
return sum(scores)

Make a utf-8 string shorter with a utf-32 encoding in Javascript?

I'm trying to find a way to compress/decompress a string in Javascript. By compress I mean to make the string look shorter (less char). That's my goal.
Here's an example of how things should work:
// The string that I want to make shorter
// It will only contain [a-zA-Z0-9] chars and some ponctuations like ()[]{}.,;'"!
var string = "I like bananas !";
// The compressed string, maybe something like "䐓㐛꯱字",
// which is shorter than the original
var shortString = compress(string);
// The original string, "I like banana !"
var originalString = decompress(shortString);
Here's my first idea (maybe there's a better way to get to my goal, and if so I'm interested in it).
I know that my original string will be in utf-8. So I'm thinking of using utf-32 for the encoding, which should divide by 4 the length of the string.
But I don't know how to do these 2 functions that construct new strings with different encoding. Here's the code I have so far that doesn't work...
function compress(string) {
string = unescape(encodeURIComponent(string));
var newString = '';
for (var i = 0; i < string.length; i++) {
var char = string.charCodeAt(i);
newString += parseInt(char, 8).toString(32);
}
return newString;
}

Since you're using a set of less than 100 characters and that javascript strings are encoded in UTF-16 (which mean you have 65536 possible characters), what you can do is concatenate the character codes so as to have one "compressed" character per two basic character. This allows you to compress strings to half the length.
Like this for example:
document.getElementById('compressBtn').addEventListener('click', function() {
var stringToCompress = document.getElementById('tocompress').value;
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
if (stringToCompress === decompressedString) {
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length  + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
} else {
document.getElementById('display').innerHTML = "This string cannot be compressed"
}
})
function compress(string) {
string = unescape(encodeURIComponent(string));
var newString = '',
char, nextChar, combinedCharCode;
for (var i = 0; i < string.length; i += 2) {
char = string.charCodeAt(i);
if ((i + 1) < string.length) {
// You need to make sure that you don't have 3 digits second character else you might go over 65536.
// But in UTF-16 the 32 characters aren't in your basic character set. But it's a limitation, anything
// under charCode 32 will cause an error
nextChar = string.charCodeAt(i + 1) - 31;
// this is to pad the result, because you could have a code that is single digit, which would make
// decompression a bit harder
combinedCharCode = char + "" + nextChar.toLocaleString('en', {
minimumIntegerDigits: 2
});
// You take the concanated code string and convert it back to a number, then a character
newString += String.fromCharCode(parseInt(combinedCharCode, 10));
} else {
// Here because you won't always have pair number length
newString += string.charAt(i);
}
}
return newString;
}
function decompress(string) {
var newString = '',
char, codeStr, firstCharCode, lastCharCode;
for (var i = 0; i < string.length; i++) {
char = string.charCodeAt(i);
if (char > 132) {
codeStr = char.toString(10);
// You take the first part of the compressed char code, it's your first letter
firstCharCode = parseInt(codeStr.substring(0, codeStr.length - 2), 10);
// For the second one you need to add 31 back.
lastCharCode = parseInt(codeStr.substring(codeStr.length - 2, codeStr.length), 10) + 31;
// You put back the 2 characters you had originally
newString += String.fromCharCode(firstCharCode) + String.fromCharCode(lastCharCode);
} else {
newString += string.charAt(i);
}
}
return newString;
}
var stringToCompress = 'I like bananas!';
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length  + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
body {
padding: 10px;
}
#tocompress {
width: 200px;
}
<input id="tocompress" placeholder="enter string to compress" />
<button id="compressBtn">
Compress input
</button>
<div id="display">
</div>
Regarding the possible use of UTF-32 to further compress, I'm not sure it's possible, I might be wrong on that, but from my understanding it's not feasible. Here's why:
The approach above is basically concatenating two 1 byte values in one 2 bytes value. This is possible because javascript strings are encoded in 2 bytes (or 16 bits) (note that from what I understand the engine could decide to store differently making this compression unnecessary from a purely memory space point of view - that being said, in the end, one character is considered being 16 bits). A cleaner way to make the compression above would in fact to user the binary numbers instead of the decimal, it would make much more sense. Like this for example:
document.getElementById('compressBtn').addEventListener('click', function() {
var stringToCompress = document.getElementById('tocompress').value;
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
if (stringToCompress === decompressedString) {
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
} else {
document.getElementById('display').innerHTML = "This string cannot be compressed"
}
})
function compress(string) {
string = unescape(encodeURIComponent(string));
var newString = '',
char, nextChar, combinedCharCode;
for (var i = 0; i < string.length; i += 2) {
// convert to binary instead of keeping the decimal
char = string.charCodeAt(i).toString(2);
if ((i + 1) < string.length) {
nextChar = string.charCodeAt(i + 1).toString(2) ;
// you still need padding, see this answer https://stackoverflow.com/questions/27641812/way-to-add-leading-zeroes-to-binary-string-in-javascript
combinedCharCode = "0000000".substr(char.length) + char + "" + "0000000".substr(nextChar.length) + nextChar;
// You take the concanated code string and convert it back to a binary number, then a character
newString += String.fromCharCode(parseInt(combinedCharCode, 2));
} else {
// Here because you won't always have pair number length
newString += string.charAt(i);
}
}
return newString;
}
function decompress(string) {
var newString = '',
char, codeStr, firstCharCode, lastCharCode;
for (var i = 0; i < string.length; i++) {
char = string.charCodeAt(i);
if (char > 132) {
codeStr = char.toString(2);
// You take the first part (the first byte) of the compressed char code, it's your first letter
firstCharCode = parseInt(codeStr.substring(0, codeStr.length - 7), 2);
// then the second byte
lastCharCode = parseInt(codeStr.substring(codeStr.length - 7, codeStr.length), 2);
// You put back the 2 characters you had originally
newString += String.fromCharCode(firstCharCode) + String.fromCharCode(lastCharCode);
} else {
newString += string.charAt(i);
}
}
return newString;
}
var stringToCompress = 'I like bananas!';
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
<input id="tocompress" placeholder="enter string to compress" />
<button id="compressBtn">
Compress input
</button>
<div id="display">
</div>
So why not push the logic and use utf-32, which should be 4 bytes, meaning four 1 byte characters. One problem is that javascript has 2 bytes string. It's true that you can use pairs of 16 bits characters to represent utf-32 characters. Like this:
document.getElementById('test').innerHTML = "\uD834\uDD1E";
<div id="test"></div>
But if you test the length of the resulting string, you'll see that it's 2, even if there's only one "character". So from a javascript perspective, you're not reducing the actual string length.
The other thing is that UTF-32 has in fact 221 characters. See here: https://en.wikipedia.org/wiki/UTF-32
It is a protocol to encode Unicode code points that uses exactly 32
bits per Unicode code point (but a number of leading bits must be zero
as there are fewer than 221 Unicode code points)
So you don't really have 4 bytes, in fact you don't even have 3, which would be needed to encode 3. So UTF-32 doesn't seem to be a way to compress even more. And since javascript has native 2 bytes strings, it seems to me to be the most efficient - using that approach at least.

If your strings only contain ASCII characters [0, 127] you can "compress" the string using a custom 6 or 7-bit code page.
You can do this several ways, but I think one of the simpler methods is to define an array holding all allowed characters - a LUT, lookup-table if you like, then use its index value as the encoded value. You would of course have to manually mask and shift the encoded value into a typed array.
If your LUT looked like this:
var lut = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!(){}";
you would in this case deal with a LUT of length 71 which means we would need to use a 7-bit range or [0, 127] (if length were 64 we could've reduced the it to 6-bit [0, 63] values).
Then you would take each characters in the string and convert to index values (you would normally do all the following steps in a single operation but I have separated them for simplicity):
var lut = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!(){}";
var str = "I like bananas !";
var page = [];
Array.prototype.forEach.call(str, function(ch) {
var i = lut.indexOf(ch);
if (i < 0) throw "Invalid character - can't encode";
page.push(i);
});
console.log("Intermediate page:", page);
You can always tweak the LUT so that the most used characters are in the beginning, then support variable encoding bit-range, find max value and use that to determine what range you want to encode in. You can add an initial bit as a flag as to which range the encoding uses (for example bit 0 set if 6-bit fits, otherwise use 7-bit range).
Now that you know the indices we can start to encode the binary output itself using a 7-bit approach. Since JavaScript only support byte values, i.e. 8-bit width, we have to do all the split, shift and merge operations manually.
This means we need to keep track of remainder and position on a bit-level.
Say first index value was the following 7-bit value (full 7-bit range for readability - all in pseudo format):
&b01111111
The first step would be to shift it over to bit position 0 and keep track of a remainder:
&b01111111 << 1
Resulting in:
&b11111110
^
new bit position: 7
new remainder : 1
Then the next index value, for example:
&b01010101
would be encoded like this - first convert to 7-bit value in its own byte representation:
&b01010101 << 1 => &b10101010
Then get the reminder part first. To obtain this will shift everything right-wise using 8-bit minus the current remainder (within modulo of 8):
remainderValue = &b10101010 >>> (8 - remainder)
leaving us with the following representation:
&b00000001
(Note that we use triple >>> to shift right to avoid issues with sign.)
Next step now is to merge this value with our previous value that has already been encoded and stored into our destination byte array - for this we'll use an OR operation:
Index 0 New value Result in index 0 (index of dst. array)
&b11111110 | &b00000001 => &b11111111
then go to next index in our destination array and store the rest of the current value, then update the remainder and position.
The "leftover" of the byte is calculated like this using the original (after shifting it) 7-bit byte value:
leftover = &b10101010 << remainder => &b01010100
which we now put into the next position:
Index 0 Index 1 (destination array index, not page index)
&b11111111 01010100
^
new bit position: 14
new remainder : 2
And so on with the remaining index values. See this answer for actual code on how you can do this in JavaScript - the code in this answer doesn't deal with string encoding per-se, but it shows how you can shift byte buffers bit-wise which is essentially the same you need for this task.
To calculate the remainder step, use 8-bits minus your custom bit-range:
step = 8 - newRange (here 7) => 1
This will also be the start remainder. For each character, you'll add the step to remainder after it has been processed, but remember to use modulo 8 (byte width) when you use it for shifting:
remainder += step;
numOfBitsToShift = remainder % 8;
Bit-position uses of course the bit-range, in this case 7:
bitPosition += 7;
Then to find which indices you're dealing with you divide the bitPosition on 8, if any decimal you have to deal with two indexes (old and new), if no decimal the current position represents new index only (only shift is needed for current index value).
You can also use modulo and when modulo of remainder = step you know you that you are dealing with a single index in the destination.
To calculate the final length you would use the bit-length and length of string, then ceil the result so that all characters will fit into a 8-byte byte array which is the only array we can get in JavaScript:
dstLength = Math.ceil(7 * str.length / 8);
To decode you just reverse all the steps.
An alternative, if you use long strings or have to move forward fast, is to use an established compressor such as zlib which has a very compact header as well as good performance in JavaScript in the case of the linked solution. This will also deal with "patterns" in the string to further optimize the resulting size.
Disclaimer: as this is mostly a theoretical answer there might be some errors. Feel free to comment if any are found. Refer to linked answer for actual code example.

for full code see here: https://repl.it/NyMl/1
using the Uint8Array you can work with the bytes.
let msg = "This is some message";
let data = []
for(let i = 0; i < msg.length; ++i){
data[i] = msg.charCodeAt(i);
}
let i8 = new Uint8Array(data);
let i16 = new Uint16Array(i8.buffer);
you could also think of a compression like this: http://pieroxy.net/blog/pages/lz-string/demo.html
if you don't want to use a 3rd party library, the lz based compression should be fairly simple. see here (wikipedia)

I use the same library mentioned above, lz-string https://github.com/pieroxy/lz-string, and it creates file sizes that are smaller than most of the binary formats like Protocol Buffers.
I compress via Node.js like this:
var compressedString = LZString.compressToUTF16(str);
And I decompress client side like this:
var decompressedString = LZString.decompressFromUTF16(str);

how to count digits of a phonenumber(included 0) in javascript?

I am trying to write a javascript , And want to count digits of a var str,
in the code below var str is 6 digits (012345), but when i run this code it is showing answer 4. i tried to search on google but answer not found;
how to get correct answer and fix it ?
my code
var str = 012345;
var x = String(str);
var n = x.length;
document.getElementById("demo").innerHTML = "var str is[" + n + "] Digits";

Initialize you phone number as string using quotes.
var str = '0123213'
And use length property to get its length

If you were to actually have a mixed letter/number string from which you wanted to get the number of digits you could use a regex. match creates an array of all the matches in the string - in this case \d, a digit (g says to check the whole of the string, not give up the search when the first digit has been found.) You can then check the length of the returned array.
'01xx2s3eg345'.match(/\d/g).length; // 7

try replacing the code with:
var str = "012345";
var n = str.length;
document.getElementById("demo").innerHTML = "var str is[" + n + "] Digits";

create all possible variations of a string with inserted character

I'm trying to take the variable email and create all possible combinations with a "." in it like so:
Results
andrew
andre.w
andr.ew
andr.e.w
and.rew
and.re.w
and.r.ew
and.r.e.w
an.drew
an.dre.w
an.dr.ew
an.dr.e.w
an.d.rew
an.d.re.w
an.d.r.ew
an.d.r.e.w
a.ndrew
a.ndre.w
a.ndr.ew
a.ndr.e.w
a.nd.rew
a.nd.re.w
a.nd.r.ew
a.nd.r.e.w
a.n.drew
a.n.dre.w
a.n.dr.ew
a.n.dr.e.w
a.n.d.rew
a.n.d.re.w
a.n.d.r.ew
a.n.d.r.e.w
I'm not sure how to do about doing this exactly. I know how to use a loop to go over each character, but as far as the rest goes I'm stumped. I was looking at substr, slice and few other functions but couldn't get anything working.
Code
var email = "andrew";
for (var i = 0; i < email.length; i++) {
console.log( email[i] + "." );
}

That's easy:
var str = 'andrew';
var results = [],
bin;
for (var i = 0; i < Math.pow(2, str.length - 1); ++i) {
bin = i.toString(2).split('').reverse().join('');
results.push(str.replace(/./g, function(letter, index) {
if (bin.charAt(index) == 1) {
letter += '.';
}
return letter;
}));
}
console.log(results);
Demo: http://jsfiddle.net/9qLY6/
Short description:
For 'abc' string there are 2 positions for a dot character: between a and b; b and c. These 2 positions might be presented as a digits of a binary number. All the possible combinations in this case are:
00
01
10
11
If you treat 1 as - . there, and 0 as no . there - you can just iterate over 2^(n-1) numbers and put . if the corresponding bit is set.

If you're interested in a recursive solution like Dinesh mentioned, here's some code to get you started.
function withPeriods(str, prev) {
prev = prev || '';
if(!str || str.length == 0) {
return prev ? [prev] : [];
} else if(str.length == 1) {
return [prev + str];
} else {
var c = str.charAt(0);
var newStr = str.slice(1);
return withPeriods(newStr, prev+c).concat(withPeriods(newStr, prev+c+'.'));
}
}
The idea here is that you are working your way through the string, keeping the current result in the 'prev' variable. If the string is length 0 or 1, there's nothing left to do. Otherwise, you need consider two options: one where you take a character from 'str' and add it to 'prev', and one where you do that but also add a '.'

If you think about it, you need to either insert a dot, or not insert one, at every possible location in the string (between any two characters). A funky way to do this is to realize that if you have n characters, there are n-1 places. If you wrote the combinations of period = 1 and no period = 0, then you can write all possible solutions as a 2^n-1 binary sequence. Showing this for a four letter word "word":
000 word
001 wor.d
010 wo.rd
011 wo.r.d
100 w.ord
101 w.or.d
110 w.o.rd
111 w.o.r.d
In pseudo code (can't test JS syntax right now):
n = strlen( email );
combinations = 1 << n - 1; // left shift operation
for i = 0 to combinations - 1:
dot = 1
for j = 0 to n:
print email[j];
if dot & i:
print '.'
dot << 1;
Can you take it from here?

You might take a recursive approach to this problem. Maybe you can use the base case as a string with 2 characters.

We Keep Coding

JavaScript is the programming language of the Web.

Cut string into chunks without breaking words based on max length - javascript

Related

how to split string every 2 character

Calculate an alphabetic score for a word

Make a utf-8 string shorter with a utf-32 encoding in Javascript?

how to count digits of a phonenumber(included 0) in javascript?

create all possible variations of a string with inserted character

Categories

Resources