I'm trying to take the variable email and create all possible combinations with a "." in it like so:
Results
andrew
andre.w
andr.ew
andr.e.w
and.rew
and.re.w
and.r.ew
and.r.e.w
an.drew
an.dre.w
an.dr.ew
an.dr.e.w
an.d.rew
an.d.re.w
an.d.r.ew
an.d.r.e.w
a.ndrew
a.ndre.w
a.ndr.ew
a.ndr.e.w
a.nd.rew
a.nd.re.w
a.nd.r.ew
a.nd.r.e.w
a.n.drew
a.n.dre.w
a.n.dr.ew
a.n.dr.e.w
a.n.d.rew
a.n.d.re.w
a.n.d.r.ew
a.n.d.r.e.w
I'm not sure how to go about doing this exactly. I know how to use a loop to go over each character, but as far as the rest goes I'm stumped. I was looking at substr, slice and a few other functions but couldn't get anything working.
Code
var email = "andrew";
for (var i = 0; i < email.length; i++) {
console.log( email[i] + "." );
}
That's easy:
var str = 'andrew';
var results = [],
bin;
for (var i = 0; i < Math.pow(2, str.length - 1); ++i) {
bin = i.toString(2).split('').reverse().join('');
results.push(str.replace(/./g, function(letter, index) {
if (bin.charAt(index) == 1) {
letter += '.';
}
return letter;
}));
}
console.log(results);
Demo: http://jsfiddle.net/9qLY6/
Short description:
For the string 'abc' there are 2 positions for a dot character: between a and b, and between b and c. These 2 positions can be represented as the digits of a binary number. All the possible combinations in this case are:
00
01
10
11
If you treat 1 as "there is a . there" and 0 as "there is no . there", you can just iterate over the 2^(n-1) numbers and put a . wherever the corresponding bit is set.
If you're interested in a recursive solution like Dinesh mentioned, here's some code to get you started.
function withPeriods(str, prev) {
prev = prev || '';
if(!str || str.length == 0) {
return prev ? [prev] : [];
} else if(str.length == 1) {
return [prev + str];
} else {
var c = str.charAt(0);
var newStr = str.slice(1);
return withPeriods(newStr, prev+c).concat(withPeriods(newStr, prev+c+'.'));
}
}
The idea here is that you are working your way through the string, keeping the current result in the 'prev' variable. If the string is length 0 or 1, there's nothing left to do. Otherwise, you need to consider two options: one where you take a character from 'str' and add it to 'prev', and one where you do that but also add a '.'
If you think about it, you need to either insert a dot, or not insert one, at every possible location in the string (between any two characters). A funky way to do this is to realize that if you have n characters, there are n-1 places. If you write period = 1 and no period = 0, then you can write all possible solutions as the 2^(n-1) binary numbers that are n-1 bits long. Showing this for a four letter word "word":
000 word
001 wor.d
010 wo.rd
011 wo.r.d
100 w.ord
101 w.or.d
110 w.o.rd
111 w.o.r.d
In pseudo code (can't test JS syntax right now):
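n = strlen( email );
combinations = 1 << (n - 1); // left shift operation, gives 2^(n-1)
for i = 0 to combinations - 1:
    dot = 1
    for j = 0 to n - 1:
        print email[j];
        // only the n-1 gaps between characters can take a dot
        if j < n - 1 and (dot & i):
            print '.'
        dot = dot << 1;
    print newline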
Can you take it from here?
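For reference, the pseudo code above could be turned into JavaScript roughly like this. It is only a sketch: the function name dotVariants is made up for illustration, and the bit order is chosen so that the output matches the "word" table (leftmost bit = first gap). It relies on 32-bit shifts, so it assumes strings shorter than about 31 characters.

```javascript
// Hypothetical helper: returns every variant of `email` with dots
// inserted between characters, in the same order as the table above.
function dotVariants(email) {
  var n = email.length;
  // n characters have n - 1 gaps, so there are 2^(n-1) combinations
  var combinations = n > 0 ? 1 << (n - 1) : 0;
  var results = [];
  for (var i = 0; i < combinations; i++) {
    var out = "";
    for (var j = 0; j < n; j++) {
      out += email[j];
      // gap j (after character j) is controlled by bit n-2-j of i,
      // so the most significant bit maps to the first gap
      if (j < n - 1 && (i & (1 << (n - 2 - j)))) {
        out += ".";
      }
    }
    results.push(out);
  }
  return results;
}

console.log(dotVariants("word"));
console.log(dotVariants("andrew").length);
```

Calling dotVariants("andrew") should give the 32 strings listed in the question.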
You might take a recursive approach to this problem. Maybe you can use the base case as a string with 2 characters.
I have a large text from which I read data according to the scheme. Key words are placed in the "smallArtName" array. The scheme looks like this:
(key word) xxx (cordX|cordY)
I can't convert the string I received to a number. It seems to me that the reason is white space, visible in the terminal in the picture. I tried to use the replace method which works for sample text, but not for my value.
I'm a beginner and I could probably do it simpler, but the code I wrote works, and this is the most important thing for now.
for (i = 0; i < smallArtName.length; i++) {
var n = art.artPrintScreen.indexOf(smallArtName[i]);
if (n > -1) {
var tempString = art.artPrintScreen.substring(n, n + 100);
betweenChar = tempString.indexOf('|');
for (k = betweenChar - 10; k <= betweenChar + 10; k++) {
if (tempString[k] == '(') {
xStart = k;
}
if (tempString[k] == ')') {
yEnd = k;
}
}
cordX = tempString.slice(xStart + 1, betweenChar);
cordY = tempString.slice(betweenChar + 1, yEnd);
strTest = " t est".replace(/\s/g, '')
var cordY2 = cordY.replace(/\s/g, '')
console.log(typeof (cordY))
console.log(cordY2)
console.log(cordY2[0])
console.log(cordY2[1])
console.log(cordY2[2])
console.log(cordY2[3])
console.log(cordY2[4])
console.log(cordY2[5])
console.log(strTest)
var cordYtest = parseInt(cordY2, 10);
console.log(cordYtest)
}
}
Terminal:
-181
-
1
8
1
test
NaN
string
-154
-
1
5
4
test
NaN
string
104
1
0
4
undefined
test
NaN
Fragment of input text:
Ukryta twierdza (Mapa podziemi I) 153 (−72|−155)
Ukryta twierdza (Amfora Mgły VI) 135 (73|104)
Ukryta twierdza (Mapa podziemi IV) 131 (154|−72)
Analysing your sample input strings, I found some Unicode characters, \u202c and \u202d, that should be stripped before converting to a number. Also, the negative values are prefixed by the character −, which is a different character from the ASCII hyphen-minus -, so we need to replace it. That being said, all the parsing can be done with a single regex:
var input = "Ukryta twierdza (Mapa podziemi I) 153 (−72|−155)";
input = input.replace(/\u202d|\u202c/g, "");
input = input.replace(/−/g, "-");
var m = input.match(/.*\((.*)\)\s*(.+?)\s*\((.+)\|(.+)\)/);
console.log(m);
console.log(parseInt(m[3]));
console.log(parseInt(m[4]));
Explaining the regex:
.* - Something that will be ignored
\((.*)\) - Something enclosed in parentheses
\s*(.+?)\s* - Something possibly surrounded by spaces
\((.+)\|(.+)\) - Two parts split by a | and enclosed in parentheses
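If only the coordinates are needed, the same cleanup can be wrapped in a small helper. This is a sketch under assumptions: the name parseCoords and the returned { x, y } shape are invented for illustration, and it assumes the coordinates are plain integers inside the last "(x|y)" style group of the line.

```javascript
// Hypothetical helper: extracts the "(x|y)" coordinates from one line.
function parseCoords(line) {
  var cleaned = line
    .replace(/\u202d|\u202c/g, "") // strip invisible directional marks
    .replace(/−/g, "-");           // typographic minus to ASCII hyphen-minus
  // Two integers separated by | and enclosed in parentheses
  var m = cleaned.match(/\(\s*(-?\d+)\s*\|\s*(-?\d+)\s*\)/);
  if (!m) {
    return null; // no coordinate pair found
  }
  return { x: parseInt(m[1], 10), y: parseInt(m[2], 10) };
}

console.log(parseCoords("Ukryta twierdza (Mapa podziemi I) 153 (−72|−155)"));
console.log(parseCoords("Ukryta twierdza (Amfora Mgły VI) 135 (73|104)"));
```

Matching digits explicitly (instead of .+) means a stray invisible character makes the match fail rather than silently producing NaN.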
I'm stuck on this, can you help? JavaScript: Alien message
Allowed languages
JavaScript
Your task is to translate a message in some alien language (let's call it Alienski).
The message could be created by following simple rules and from two known languages, English and Spanish.
Each word in Alienski is constructed by subtracting the letters from English and Spanish (absolute value) and that is the resulting letter.
There are two special cases. If in each of the words the symbol is '-' (hyphen) or ' ' (space) it is mandatory for it to be kept this way.
There won't be a case with a '-' (hyphen) and a ' ' (space) at the same time.
If one of the words is with more letters than the other just add the letters from the longer word to the result.
Example:
talk
hablar
a b c d....
0 1 2 3....
t - h = | 19 - 7 | = 12 = m
a - a = | 0 - 0 | = 0 = a
l - b = | 11 - 1 | = 10 = k
k - l = | 10 - 11 | = 1 = b
empty - a = a
empty - r = r
Result:
makbar
I've been stuck on this for 3 hours. Here is my code so far:
let englishWord = 'talk'
let spanishWord = 'hablar'
let engToDigit = [];
let spnToDigit = [];
let alien = [];
for (var i = 0; i < englishWord.length; i++) {
engToDigit.push(englishWord.charCodeAt(i))
}
for (var y = 0; y < spanishWord.length; y++) {
spnToDigit.push(spanishWord.charCodeAt(y))
}
let result = engToDigit.map((a, i) => a - spnToDigit[i]);
for (let index = 0; index < result.length; index++) {
result[index] += 97;
console.log(result);
}
What it sounds like you need is to take this in small steps. First I would make a function that iterates through a string and converts each letter to its character code. Try the following order:
1. Check if the code is lowercase, then get the numeric value.
2. Make sure charCode is greater than 96 and charCode is less than 123.
3. Then turn the code into its numeric value and collect it in an array: charCode - 97
4. Else check if the code is uppercase, then get the numeric value.
5. Make sure that charCode is greater than 64 and charCode is less than 91.
6. Then turn the code into its numeric value and collect it in an array: charCode - 65
7. Else just add the value to the array.
8. Outside the above loop return the array joined into a string. When the array is joined it will be a string like "19,0,11,10,-,7,0,1,11,0,17".
9. Check if there is a space or a hyphen.
10. Then you can split the string on the result of step 9.
11. Then split each part on ",".
12. Loop through each array and subtract the values, taking the absolute value of each difference as the task requires.
13. Convert the values back by adding 65, because there is no way at this point to know if a character was upper case.
14. Then use String.fromCharCode(##) to convert each code back into a letter of the non-readable alien word.
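The steps above can also be condensed into a single pass over both words. The following is a minimal sketch, not the only way to do it: the function name toAlienski is made up, it lowercases everything (so it adds 97 rather than 65 at the end), and it assumes that a space or hyphen in either word at a given position is copied through unchanged.

```javascript
// Hypothetical helper: builds the Alienski word from two source words.
function toAlienski(english, spanish) {
  var base = "a".charCodeAt(0); // 97, so that a = 0, b = 1, ...
  var length = Math.max(english.length, spanish.length);
  var out = "";
  for (var i = 0; i < length; i++) {
    var e = english[i];
    var s = spanish[i];
    if (e === undefined) { out += s; continue; } // Spanish word is longer
    if (s === undefined) { out += e; continue; } // English word is longer
    // Assumption: separators are kept as-is
    if (e === " " || e === "-") { out += e; continue; }
    if (s === " " || s === "-") { out += s; continue; }
    // |english - spanish| on alphabet positions, mapped back to a letter
    var diff = Math.abs(
      e.toLowerCase().charCodeAt(0) - s.toLowerCase().charCodeAt(0)
    );
    out += String.fromCharCode(base + diff);
  }
  return out;
}

console.log(toAlienski("talk", "hablar")); // makbar
```

Because both codes are offset by the same 97, subtracting the raw char codes gives the same difference as subtracting the alphabet positions.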
I'm trying to find a way to compress/decompress a string in JavaScript. By compress I mean to make the string look shorter (fewer characters). That's my goal.
Here's an example of how things should work:
// The string that I want to make shorter
// It will only contain [a-zA-Z0-9] chars and some punctuation like ()[]{}.,;'"!
var string = "I like bananas !";
// The compressed string, maybe something like "䐓㐛꯱字",
// which is shorter than the original
var shortString = compress(string);
// The original string, "I like bananas !"
var originalString = decompress(shortString);
Here's my first idea (maybe there's a better way to get to my goal, and if so I'm interested in it).
I know that my original string will be in UTF-8. So I'm thinking of using UTF-32 for the encoding, which should divide the length of the string by 4.
But I don't know how to do these 2 functions that construct new strings with different encoding. Here's the code I have so far that doesn't work...
function compress(string) {
string = unescape(encodeURIComponent(string));
var newString = '';
for (var i = 0; i < string.length; i++) {
var char = string.charCodeAt(i);
newString += parseInt(char, 8).toString(32);
}
return newString;
}
Since you're using a set of fewer than 100 characters, and since JavaScript strings are encoded in UTF-16 (which means you have 65536 possible code units), what you can do is concatenate the character codes so as to have one "compressed" character per two basic characters. This allows you to compress strings to half the length.
Like this for example:
document.getElementById('compressBtn').addEventListener('click', function() {
var stringToCompress = document.getElementById('tocompress').value;
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
if (stringToCompress === decompressedString) {
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
} else {
document.getElementById('display').innerHTML = "This string cannot be compressed"
}
})
function compress(string) {
string = unescape(encodeURIComponent(string));
var newString = '',
char, nextChar, combinedCharCode;
for (var i = 0; i < string.length; i += 2) {
char = string.charCodeAt(i);
if ((i + 1) < string.length) {
// You need to make sure the second character's code fits in 2 digits, else the combined code might go over 65535.
// Subtracting 31 works because the first 32 character codes (control characters) aren't in your basic character set.
// But it's a limitation: anything under charCode 32 will cause an error
nextChar = string.charCodeAt(i + 1) - 31;
// this is to pad the result, because you could have a code that is single digit, which would make
// decompression a bit harder
combinedCharCode = char + "" + nextChar.toLocaleString('en', {
minimumIntegerDigits: 2
});
// You take the concatenated code string and convert it back to a number, then a character
newString += String.fromCharCode(parseInt(combinedCharCode, 10));
} else {
// Needed because the string won't always have an even length
newString += string.charAt(i);
}
}
return newString;
}
function decompress(string) {
var newString = '',
char, codeStr, firstCharCode, lastCharCode;
for (var i = 0; i < string.length; i++) {
char = string.charCodeAt(i);
if (char > 132) {
codeStr = char.toString(10);
// You take the first part of the compressed char code, it's your first letter
firstCharCode = parseInt(codeStr.substring(0, codeStr.length - 2), 10);
// For the second one you need to add 31 back.
lastCharCode = parseInt(codeStr.substring(codeStr.length - 2, codeStr.length), 10) + 31;
// You put back the 2 characters you had originally
newString += String.fromCharCode(firstCharCode) + String.fromCharCode(lastCharCode);
} else {
newString += string.charAt(i);
}
}
return newString;
}
var stringToCompress = 'I like bananas!';
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
body {
padding: 10px;
}
#tocompress {
width: 200px;
}
<input id="tocompress" placeholder="enter string to compress" />
<button id="compressBtn">
Compress input
</button>
<div id="display">
</div>
Regarding the possible use of UTF-32 to further compress, I'm not sure it's possible, I might be wrong on that, but from my understanding it's not feasible. Here's why:
The approach above basically concatenates two 1-byte values into one 2-byte value. This is possible because JavaScript strings are encoded in 2 bytes (or 16 bits) per character (note that, from what I understand, the engine could decide to store them differently, making this compression unnecessary from a purely memory-space point of view; that being said, in the end, one character is considered to be 16 bits). A cleaner way to do the compression above would in fact be to use binary numbers instead of decimal ones; it would make much more sense. Like this for example:
document.getElementById('compressBtn').addEventListener('click', function() {
var stringToCompress = document.getElementById('tocompress').value;
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
if (stringToCompress === decompressedString) {
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
} else {
document.getElementById('display').innerHTML = "This string cannot be compressed"
}
})
function compress(string) {
string = unescape(encodeURIComponent(string));
var newString = '',
char, nextChar, combinedCharCode;
for (var i = 0; i < string.length; i += 2) {
// convert to binary instead of keeping the decimal
char = string.charCodeAt(i).toString(2);
if ((i + 1) < string.length) {
nextChar = string.charCodeAt(i + 1).toString(2) ;
// you still need padding, see this answer https://stackoverflow.com/questions/27641812/way-to-add-leading-zeroes-to-binary-string-in-javascript
combinedCharCode = "0000000".substr(char.length) + char + "" + "0000000".substr(nextChar.length) + nextChar;
// You take the concatenated code string and convert it back to a binary number, then a character
newString += String.fromCharCode(parseInt(combinedCharCode, 2));
} else {
// Needed because the string won't always have an even length
newString += string.charAt(i);
}
}
return newString;
}
function decompress(string) {
var newString = '',
char, codeStr, firstCharCode, lastCharCode;
for (var i = 0; i < string.length; i++) {
char = string.charCodeAt(i);
if (char > 132) {
codeStr = char.toString(2);
// You take the first part (the first byte) of the compressed char code, it's your first letter
firstCharCode = parseInt(codeStr.substring(0, codeStr.length - 7), 2);
// then the second byte
lastCharCode = parseInt(codeStr.substring(codeStr.length - 7, codeStr.length), 2);
// You put back the 2 characters you had originally
newString += String.fromCharCode(firstCharCode) + String.fromCharCode(lastCharCode);
} else {
newString += string.charAt(i);
}
}
return newString;
}
var stringToCompress = 'I like bananas!';
var compressedString = compress(stringToCompress);
var decompressedString = decompress(compressedString);
document.getElementById('display').innerHTML = stringToCompress + ", length of " + stringToCompress.length + " characters compressed to " + compressedString + ", length of " + compressedString.length + " characters back to " + decompressedString;
<input id="tocompress" placeholder="enter string to compress" />
<button id="compressBtn">
Compress input
</button>
<div id="display">
</div>
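The same 7-bit pairing can also be written with bit operators instead of building binary strings. This is only a sketch with made-up function names (packPairs, unpackPairs), and it assumes every input character has a code between 1 and 127; anything else is rejected.

```javascript
// Hypothetical helper: packs two 7-bit characters into one UTF-16 code unit.
function packPairs(str) {
  var out = "";
  for (var i = 0; i < str.length; i += 2) {
    var first = str.charCodeAt(i);
    if (first < 1 || first > 127) throw new Error("Unsupported character");
    if (i + 1 < str.length) {
      var second = str.charCodeAt(i + 1);
      if (second < 1 || second > 127) throw new Error("Unsupported character");
      // first goes in the upper 7 bits, second in the lower 7 bits
      out += String.fromCharCode((first << 7) | second);
    } else {
      out += str.charAt(i); // odd length: last character stays as it is
    }
  }
  return out;
}

// Hypothetical helper: reverses packPairs.
function unpackPairs(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code > 127) {
      // a packed pair is always at least 128 because first >= 1
      out += String.fromCharCode(code >> 7) + String.fromCharCode(code & 0x7f);
    } else {
      out += str.charAt(i);
    }
  }
  return out;
}

var packed = packPairs("I like bananas!");
console.log(packed.length, unpackPairs(packed));
```

The largest packed value is 16383, which stays well below the surrogate range, so each pair should remain a single, valid code unit.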
So why not push the logic further and use UTF-32, which should be 4 bytes, meaning four 1-byte characters? One problem is that JavaScript has 2-byte strings. It's true that you can use pairs of 16-bit characters to represent UTF-32 characters. Like this:
document.getElementById('test').innerHTML = "\uD834\uDD1E";
<div id="test"></div>
But if you test the length of the resulting string, you'll see that it's 2, even if there's only one "character". So from a javascript perspective, you're not reducing the actual string length.
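A quick way to check this outside the browser (a small sketch; the variable name clef is just for illustration):

```javascript
// One visible character (a musical G clef), stored as a surrogate pair
var clef = "\uD834\uDD1E";

console.log(clef.length);             // counts UTF-16 code units
console.log(Array.from(clef).length); // counts code points
console.log(clef.codePointAt(0).toString(16));
```

The length property counts 16-bit code units, while Array.from iterates by code point, which is why the two numbers differ.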
The other thing is that UTF-32 in fact has fewer than 2^21 characters. See here: https://en.wikipedia.org/wiki/UTF-32
It is a protocol to encode Unicode code points that uses exactly 32
bits per Unicode code point (but a number of leading bits must be zero
as there are fewer than 2^21 Unicode code points)
So you don't really have 4 bytes; in fact you don't even have 3, which is what you would need to encode 3 characters. So UTF-32 doesn't seem to be a way to compress even more. And since JavaScript has native 2-byte strings, packing two characters per 2-byte unit seems to me to be the most efficient, using that approach at least.
If your strings only contain ASCII characters [0, 127] you can "compress" the string using a custom 6 or 7-bit code page.
You can do this several ways, but I think one of the simpler methods is to define an array holding all allowed characters - a LUT, lookup-table if you like, then use its index value as the encoded value. You would of course have to manually mask and shift the encoded value into a typed array.
If your LUT looked like this:
var lut = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!(){}";
you would in this case deal with a LUT of length 72, which means we would need to use a 7-bit range of [0, 127] (if the length were 64 or less we could have reduced it to 6-bit [0, 63] values).
Then you would take each characters in the string and convert to index values (you would normally do all the following steps in a single operation but I have separated them for simplicity):
var lut = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!(){}";
var str = "I like bananas !";
var page = [];
Array.prototype.forEach.call(str, function(ch) {
var i = lut.indexOf(ch);
if (i < 0) throw "Invalid character - can't encode";
page.push(i);
});
console.log("Intermediate page:", page);
You can always tweak the LUT so that the most used characters are in the beginning, then support variable encoding bit-range, find max value and use that to determine what range you want to encode in. You can add an initial bit as a flag as to which range the encoding uses (for example bit 0 set if 6-bit fits, otherwise use 7-bit range).
Now that you know the indices we can start to encode the binary output itself using a 7-bit approach. Since JavaScript typed arrays offer nothing narrower than byte values, i.e. 8-bit width, we have to do all the split, shift and merge operations manually.
This means we need to keep track of remainder and position on a bit-level.
Say first index value was the following 7-bit value (full 7-bit range for readability - all in pseudo format):
&b01111111
The first step would be to shift it over to bit position 0 and keep track of a remainder:
&b01111111 << 1
Resulting in:
&b11111110
^
new bit position: 7
new remainder : 1
Then the next index value, for example:
&b01010101
would be encoded like this - first convert to 7-bit value in its own byte representation:
&b01010101 << 1 => &b10101010
Then get the remainder part first. To obtain this, shift everything to the right by 8 bits minus the current remainder (within modulo of 8):
remainderValue = &b10101010 >>> (8 - remainder)
leaving us with the following representation:
&b00000001
(Note that we use triple >>> to shift right to avoid issues with sign.)
Next step now is to merge this value with our previous value that has already been encoded and stored into our destination byte array - for this we'll use an OR operation:
Index 0 New value Result in index 0 (index of dst. array)
&b11111110 | &b00000001 => &b11111111
then go to next index in our destination array and store the rest of the current value, then update the remainder and position.
The "leftover" of the byte is calculated like this using the original (after shifting it) 7-bit byte value:
leftover = &b10101010 << remainder => &b01010100
which we now put into the next position:
Index 0 Index 1 (destination array index, not page index)
&b11111111 01010100
^
new bit position: 14
new remainder : 2
And so on with the remaining index values. See this answer for actual code on how you can do this in JavaScript - the code in this answer doesn't deal with string encoding per-se, but it shows how you can shift byte buffers bit-wise which is essentially the same you need for this task.
To calculate the remainder step, use 8-bits minus your custom bit-range:
step = 8 - newRange (here 7) => 1
This will also be the start remainder. For each character, you'll add the step to remainder after it has been processed, but remember to use modulo 8 (byte width) when you use it for shifting:
remainder += step;
numOfBitsToShift = remainder % 8;
Bit-position uses of course the bit-range, in this case 7:
bitPosition += 7;
Then to find which indices you're dealing with, you divide the bitPosition by 8. If there is a fractional part you have to deal with two indexes (old and new); if there is none, the current position represents the new index only (only a shift is needed for the current index value).
You can also use modulo: when the modulo of the remainder equals the step, you know that you are dealing with a single index in the destination.
To calculate the final length you would use the bit-length and the length of the string, then ceil the result so that all characters will fit into an array of 8-bit bytes, which is the narrowest typed array we can get in JavaScript:
dstLength = Math.ceil(7 * str.length / 8);
To decode you just reverse all the steps.
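As a rough illustration of the 7-bit packing, here is a sketch that uses a small bit accumulator rather than the remainder bookkeeping described above; the end result (7 bits per character packed back to back into bytes) should be the same. The names pack7 and unpack7 are made up, and the decoder assumes you store the character count separately, since the last byte may contain padding bits.

```javascript
var lut = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!(){}";

// Hypothetical helper: packs each character's LUT index into 7 bits.
function pack7(str) {
  var out = new Uint8Array(Math.ceil(7 * str.length / 8));
  var acc = 0;  // pending bits not yet written
  var bits = 0; // how many bits are pending in acc
  var pos = 0;  // next index in the destination array
  for (var i = 0; i < str.length; i++) {
    var index = lut.indexOf(str[i]);
    if (index < 0) throw new Error("Invalid character - can't encode");
    acc = (acc << 7) | index;
    bits += 7;
    while (bits >= 8) {          // flush every complete byte
      bits -= 8;
      out[pos++] = (acc >>> bits) & 0xff;
    }
    acc &= (1 << bits) - 1;      // keep only the leftover bits
  }
  if (bits > 0) out[pos] = (acc << (8 - bits)) & 0xff; // pad the last byte
  return out;
}

// Hypothetical helper: reads `count` 7-bit indices back out of the bytes.
function unpack7(bytes, count) {
  var str = "", acc = 0, bits = 0, pos = 0;
  for (var i = 0; i < count; i++) {
    while (bits < 7) {           // pull in bytes until 7 bits are available
      acc = (acc << 8) | bytes[pos++];
      bits += 8;
    }
    bits -= 7;
    str += lut[(acc >>> bits) & 0x7f];
    acc &= (1 << bits) - 1;
  }
  return str;
}

var original = "I like bananas !";
var packedBytes = pack7(original);
console.log(packedBytes.length, unpack7(packedBytes, original.length));
```

For the 16-character example this gives 14 bytes, matching Math.ceil(7 * 16 / 8).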
An alternative, if you use long strings or have to move forward fast, is to use an established compressor such as zlib which has a very compact header as well as good performance in JavaScript in the case of the linked solution. This will also deal with "patterns" in the string to further optimize the resulting size.
Disclaimer: as this is mostly a theoretical answer there might be some errors. Feel free to comment if any are found. Refer to linked answer for actual code example.
for full code see here: https://repl.it/NyMl/1
using the Uint8Array you can work with the bytes.
let msg = "This is some message";
let data = []
for(let i = 0; i < msg.length; ++i){
data[i] = msg.charCodeAt(i);
}
let i8 = new Uint8Array(data);
let i16 = new Uint16Array(i8.buffer);
you could also think of a compression like this: http://pieroxy.net/blog/pages/lz-string/demo.html
if you don't want to use a 3rd party library, the lz based compression should be fairly simple. see here (wikipedia)
I use the same library mentioned above, lz-string https://github.com/pieroxy/lz-string, and it creates file sizes that are smaller than most of the binary formats like Protocol Buffers.
I compress via Node.js like this:
var compressedString = LZString.compressToUTF16(str);
And I decompress client side like this:
var decompressedString = LZString.decompressFromUTF16(str);
Here is my question:
Given a string, which is made up of space separated words, how can I split that into N strings of (roughly) even length, only breaking on spaces?
Here is what I've gathered from research:
I started by researching word-wrapping algorithms, because it seems to me that this is basically a word-wrapping problem. However, the majority of what I've found so far (and there is A LOT out there about word wrapping) assumes that the width of the line is a known input, and the number of lines is an output. I want the opposite.
I have found a (very) few questions, such as this that seem to be helpful. However, they are all focused on the problem as one of optimization - e.g. how can I split a sentence into a given number of lines, while minimizing the raggedness of the lines, or the wasted whitespace, or whatever, and do it in linear (or NlogN, or whatever) time. These questions seem mostly to be unanswered, as the optimization part of the problem is relatively "hard".
However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.
Here is what I've come up with:
I think I have a workable method for the case when N=3. I start by putting the first word on the first line, the last word on the last line, and then iteratively putting another word on the first and last lines, until my total width (measured by the length of the longest line) stops getting shorter. This usually works, but it gets tripped up if your longest words are in the middle of the line, and it doesn't seem very generalizable to more than 3 lines.
var getLongestHeaderLine = function(headerText) {
//Utility function definitions
var getLongest = function(arrayOfArrays) {
return arrayOfArrays.reduce(function(a, b) {
return a.length > b.length ? a : b;
});
};
var sumOfLengths = function(arrayOfArrays) {
return arrayOfArrays.reduce(function(a, b) {
return a + b.length + 1;
}, 0);
};
var getLongestLine = function(lines) {
return lines.reduce(function(a, b) {
return sumOfLengths(a) > sumOfLengths(b) ? a : b;
});
};
var getHeaderLength = function(lines) {
return sumOfLengths(getLongestLine(lines));
}
//first, deal with the degenerate cases
if (!headerText)
return headerText;
headerText = headerText.trim();
var headerWords = headerText.split(" ");
if (headerWords.length === 1)
return headerText;
if (headerWords.length === 2)
return getLongest(headerWords);
//If we have more than 2 words in the header,
//we need to split them into 3 lines
var firstLine = headerWords.splice(0, 1);
var lastLine = headerWords.splice(-1, 1);
var lines = [firstLine, headerWords, lastLine];
//The header length is the length of the longest
//line in the header. We will keep iterating
//until the header length stops getting shorter.
var headerLength = getHeaderLength(lines);
var lastHeaderLength = headerLength;
while (true) {
//Take the first word from the middle line,
//and add it to the first line
firstLine.push(headerWords.shift());
headerLength = getHeaderLength(lines);
if (headerLength > lastHeaderLength || headerWords.length === 0) {
//If we stopped getting shorter, undo
headerWords.unshift(firstLine.pop());
break;
}
//Take the last word from the middle line,
//and add it to the last line
lastHeaderLength = headerLength;
lastLine.unshift(headerWords.pop());
headerLength = getHeaderLength(lines);
if (headerLength > lastHeaderLength || headerWords.length === 0) {
//If we stopped getting shorter, undo
headerWords.push(lastLine.shift());
break;
}
lastHeaderLength = headerLength;
}
return getLongestLine(lines).join(" ");
};
debugger;
var header = "an apple a day keeps the doctor away";
var longestHeaderLine = getLongestHeaderLine(header);
debugger;
EDIT: I tagged javascript, because ultimately I would like a solution I can implement in that language. It's not super critical to the problem though, and I would take any solution that works.
EDIT#2: While performance is not what I'm most concerned about here, I do need to be able to perform whatever solution I come up with ~100-200 times, on strings that can be up to ~250 characters long. This would be done during a page load, so it needs to not take forever. For example, I've found that trying to offload this problem to the rendering engine by putting each string into a DIV and playing with the dimensions doesn't work, since it (seems to be) incredibly expensive to measure rendered elements.
Try this. For any reasonable N, it should do the job:
function format(srcString, lines) {
var target = "";
var arr = srcString.split(" ");
var c = 0;
var MAX = Math.ceil(srcString.length / lines);
for (var i = 0, len = arr.length; i < len; i++) {
var cur = arr[i];
if(c + cur.length > MAX) {
target += '\n' + cur;
c = cur.length;
}
else {
if(target.length > 0)
target += " ";
target += cur;
c += cur.length;
}
}
return target;
}
alert(format("this is a very very very very " +
"long and convoluted way of creating " +
"a very very very long string",7));
You may want to give this solution a try, using canvas. It will need optimization and is only a quick shot, but I think canvas might be a good idea as you can calculate real widths. You can also adjust the font to the really used one, and so on. Important to note: This won't be the most performant way of doing things. It will create a lot of canvases.
DEMO
var t = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`;
function getTextTotalWidth(text) {
var canvas = document.createElement("canvas");
var ctx = canvas.getContext("2d");
ctx.font = "12px Arial";
ctx.fillText(text,0,12);
return ctx.measureText(text).width;
}
function getLineWidth(lines, totalWidth) {
return totalWidth / lines ;
}
function getAverageLetterSize(text) {
var t = text.replace(/\s/g, "").split("");
var sum = t.map(function(d) {
return getTextTotalWidth(d);
}).reduce(function(a, b) { return a + b; });
return sum / t.length;
}
function getLines(text, numberOfLines) {
var lineWidth = getLineWidth(numberOfLines, getTextTotalWidth(text));
var letterWidth = getAverageLetterSize(text);
var t = text.split("");
return createLines(t, letterWidth, lineWidth);
}
function createLines(t, letterWidth, lineWidth) {
var i = 0;
var res = t.map(function(d) {
if (i < lineWidth || d != " ") {
i+=letterWidth;
return d;
}
i = 0;
return "<br />";
})
return res.join("");
}
var div = document.createElement("div");
div.innerHTML = getLines(t, 7);
document.body.appendChild(div);
I'm sorry this is C#. I had created my project already when you updated your post with the Javascript tag.
Since you said all you care about is roughly the same line length... I came up with this. Sorry for the simplistic approach.
private void DoIt() {
List<string> listofwords = txtbx_Input.Text.Split(' ').ToList();
int totalcharcount = 0;
int neededLineCount = int.Parse(txtbx_LineCount.Text);
foreach (string word in listofwords)
{
totalcharcount = totalcharcount + word.Count(char.IsLetter);
}
int averagecharcountneededperline = totalcharcount / neededLineCount;
List<string> output = new List<string>();
int positionsneeded = 0;
while (output.Count < neededLineCount)
{
string tempstr = string.Empty;
while (positionsneeded < listofwords.Count)
{
tempstr += " " + listofwords[positionsneeded];
if ((positionsneeded != listofwords.Count - 1) && (tempstr.Count(char.IsLetter) + listofwords[positionsneeded + 1].Count(char.IsLetter) > averagecharcountneededperline))//if (this is not the last word) and (we are going to bust the average)
{
if (output.Count + 1 == neededLineCount)//if we are writting the last line
{
//who cares about exceeding.
}
else
{
//we're going to exceed the allowed average, gotta force this loop to stop
positionsneeded++;//dont forget!
break;
}
}
positionsneeded++;//increment the needed position by one
}
output.Add(tempstr);//store the string in our list of string to output
}
//display the line on the screen
foreach (string lineoftext in output)
{
txtbx_Output.AppendText(lineoftext + Environment.NewLine);
}
}
(Adapted from here, How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?)
If we consider the word lengths as a list of numbers, we can binary search the partition.
The maximum line length lies between 0 and sum(word lengths) + (number of words - 1), where the second term accounts for the spaces. Take mid as the midpoint of the current range. We can check whether mid is achievable with N parts in O(m) time, where m is the number of words: traverse the list, adding (word_length + 1) to the current part while the running sum is less than or equal to mid. When the sum passes mid, start a new part. If the result has N or fewer parts, mid is achievable.
If mid can be achieved, try a lower range; otherwise, a higher range. The time complexity is O(m log num_chars). (You'll also have to consider how deleting a space per part, meaning where the line break would go, features into the calculation.)
JavaScript code (adapted from http://articles.leetcode.com/the-painters-partition-problem-part-ii):
function getK(arr,maxLength) {
var total = 0,
k = 1;
for (var i=0; i<arr.length; i++) {
total += arr[i] + 1;
if (total > maxLength) {
total = arr[i];
k++;
}
}
return k;
}
function partition(arr,n) {
var lo = Math.max(...arr),
hi = arr.reduce((a,b) => a + b);
while (lo < hi) {
var mid = lo + ((hi - lo) >> 1);
var k = getK(arr,mid);
if (k <= n){
hi = mid;
} else{
lo = mid + 1;
}
}
return lo;
}
var s = "this is a very very very very "
+ "long and convoluted way of creating "
+ "a very very very long string",
n = 7;
var words = s.split(/\s+/),
maxLength = partition(words.map(x => x.length), n);
console.log('max sentence length: ' + maxLength);
console.log(words.length + ' words');
console.log(n + ' lines')
console.log('')
var i = 0;
for (var j=0; j<n; j++){
var str = '';
while (true){
if (!words[i] || str.length + words[i].length > maxLength){
break
}
str += words[i++] + ' ';
}
console.log(str);
}
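On the caveat about the space at each line break: one way to handle it is to charge a space only between two words on the same line, and to let the upper bound include those spaces. The sketch below is a variation under that assumption, not the code from the linked article; linesNeeded and minMaxLineLength are names I made up, and it assumes a non-empty list of non-empty words.

```javascript
// Hypothetical helper: number of lines needed when no line may be longer
// than maxLength, counting one space only between words on the same line.
function linesNeeded(lengths, maxLength) {
  var lines = 1, current = 0;
  for (var i = 0; i < lengths.length; i++) {
    var extra = current === 0 ? lengths[i] : lengths[i] + 1;
    if (current + extra > maxLength) {
      lines++;              // the word does not fit, start a new line with it
      current = lengths[i];
    } else {
      current += extra;
    }
  }
  return lines;
}

// Binary search for the smallest maximum line length that fits in n lines.
function minMaxLineLength(lengths, n) {
  var lo = Math.max.apply(null, lengths);  // the longest word must fit on a line
  var hi = lengths.reduce(function (a, b) { return a + b; }, 0) + lengths.length - 1;
  while (lo < hi) {
    var mid = lo + ((hi - lo) >> 1);
    if (linesNeeded(lengths, mid) <= n) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return lo;
}

var lengths = 'this is a very'.split(/\s+/).map(function (w) { return w.length; });
console.log(minMaxLineLength(lengths, 2)); // 7 ("this is" / "a very")
```

With this accounting, the returned value is the real length of the longest printed line, so the printing loop can compare against it without trailing spaces getting in the way.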
Using the Java String split() method to split a string: how and where to apply this string manipulation technique.
We'll look at what the Java split() method does and how to apply it. The principles are explained simply, with programming examples covered either in a separate explanation or in the comments of the programs.
As the name implies, the Java String split() method divides the calling String into pieces and returns them as an array. The String is cut wherever the delimiter (such as "" or " ") or regular expression we supply matches, and each piece becomes one element of the array.
Syntax
String[] split(String regex)
First case: initialize a Java String variable with several words separated by spaces, call split() on it, and look at the result. This lets us print each word without the spaces.
Second case: initialize a Java String variable and split it using a substring of that same String as the delimiter.
Third case: take a String variable holding a single word and split it on one of its own characters.
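Since the question here is about JavaScript, the three cases can be sketched with JavaScript's own String.prototype.split. This is an analogue, not Java code, and the sample strings are made up for illustration. The two methods are not identical: as far as I know Java's split() drops trailing empty strings by default, while JavaScript keeps them.

```javascript
// First case: words separated by spaces.
const bySpace = 'split this sentence'.split(' ');
console.log(bySpace);      // [ 'split', 'this', 'sentence' ]

// Second case: a substring of the string used as the delimiter.
const bySubstring = 'JavaStringSplit'.split('String');
console.log(bySubstring);  // [ 'Java', 'Split' ]

// Third case: a single word split on one of its own characters.
// Two adjacent delimiters leave an empty string between them.
const byChar = 'hello'.split('l');
console.log(byChar);       // [ 'he', '', 'o' ]
```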
Other approaches to this problem are covered on YouTube and on coding websites such as Coding Ninjas.
This old question was revived by a recent answer, and I think I have a simpler technique than the answers so far:
const evenSplit = (text = '', lines = 1) => {
if (lines < 2) {return [text]}
const baseIndex = Math .round (text .length / lines)
const before = text .slice (0, baseIndex) .lastIndexOf (' ')
const after = text .slice (baseIndex) .indexOf (' ') + baseIndex
const index = after - baseIndex < baseIndex - before ? after : before
return [
text .slice (0, index),
... evenSplit (text .slice (index + (before > -1 ? 1 : 0)), lines - 1)
]
}
const text = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`
const display = (lines) => console .log (lines .join ('\n'))
display (evenSplit (text, 7))
display (evenSplit (text, 5))
display (evenSplit (text, 12))
display (evenSplit (`this should be three lines, but it has a loooooooooooooooooooooooooooooooong word`, 3))
It works by finding the first line and then recurring on the remaining text with one fewer line. The recursion bottoms out when we have a single line. To calculate the first line, we take an initial target index, which is just an equal share of the string based on its length and the number of lines. We then find the closest space to that index and split the string there.
It does no optimization, and could certainly be occasionally misled by long words, but mostly it just seems to work.
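To make the "closest space" step concrete, here it is pulled out into a helper. This is a sketch, not a drop-in part of the code above: nearestSpace is a hypothetical name, and unlike the inline version it returns -1 when the text has no space at all, and on a tie it prefers the earlier space.

```javascript
// Hypothetical helper: index of the space closest to `target`, or -1 if none.
const nearestSpace = (text, target) => {
  const before = text.lastIndexOf(' ', target - 1) // last space before target (for target > 0)
  const after = text.indexOf(' ', target)          // first space at or after target
  if (before === -1) return after
  if (after === -1) return before
  return after - target < target - before ? after : before
}

const sample = 'aaa bbbbb cc'           // spaces at index 3 and 9
console.log(nearestSpace(sample, 6))    // 3 (tie, earlier space wins)
console.log(nearestSpace(sample, 7))    // 9
console.log(nearestSpace('abcdef', 3))  // -1
```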
When changing to TypeScript, I'm not allowed to use escape(string) anymore because it's deprecated. The reason I still use it is that the alternatives, encodeURI and encodeURIComponent, give different results.
var s = "Å"
console.log(escape(s));
console.log(encodeURI(s));
console.log(encodeURIComponent(s));
I don't use this for URLs, but for a CSV export.
What are other alternatives that will give me the same result as escape(string)?
The ECMAScript spec gives this algorithm for escape:
1. Call ToString(string).
2. Compute the number of characters in Result(1).
3. Let R be the empty string.
4. Let k be 0.
5. If k equals Result(2), return R.
6. Get the character at position k within Result(1).
7. If Result(6) is one of the 69 nonblank ASCII characters ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789 @*_+-./, go to step 14.
8. Compute the 16-bit unsigned integer that is the Unicode character encoding of Result(6).
9. If Result(8) is less than 256, go to step 12.
10. Let S be a string containing six characters “%uwxyz” where wxyz are four hexadecimal digits encoding the value of Result(8).
11. Go to step 15.
12. Let S be a string containing three characters “%xy” where xy are two hexadecimal digits encoding the value of Result(8).
13. Go to step 15.
14. Let S be a string containing the single character Result(6).
15. Let R be a new string value computed by concatenating the previous value of R and S.
16. Increase k by 1.
17. Go to step 5.
which can be coded like this:
(function(global) {
var allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./';
global.escapeString = function(str) {
str = str.toString();
var len = str.length, R = '', k = 0, S, chr, ord;
while(k < len) {
chr = str[k];
if (allowed.indexOf(chr) != -1) {
S = chr;
} else {
ord = str.charCodeAt(k);
if (ord < 256) {
S = '%' + ("00" + ord.toString(16)).toUpperCase().slice(-2);
} else {
S = '%u' + ("0000" + ord.toString(16)).toUpperCase().slice(-4);
}
}
R += S;
k++;
}
return R;
};
})(typeof window == 'undefined' ? global : window);
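For comparison, here is a more compact sketch of the same algorithm as a plain function. The name legacyEscape is made up, and it assumes the unescaped set is the one documented for escape(): ASCII letters, digits and @*_+-./. It shows the three branches (unchanged, %xy, %uwxyz) next to what encodeURIComponent produces.

```javascript
// Characters that escape() leaves untouched (assumed set: letters, digits, @*_+-./).
const UNESCAPED = /^[A-Za-z0-9@*_+\-.\/]$/;

// Hypothetical stand-in for the deprecated escape().
function legacyEscape(input) {
  const str = String(input);
  let out = '';
  for (let k = 0; k < str.length; k++) {
    const chr = str.charAt(k);
    if (UNESCAPED.test(chr)) {
      out += chr;
      continue;
    }
    // Work on the 16-bit code unit: below 256 gives %XX, otherwise %uXXXX.
    const hex = str.charCodeAt(k).toString(16).toUpperCase();
    out += hex.length <= 2 ? '%' + hex.padStart(2, '0') : '%u' + hex.padStart(4, '0');
  }
  return out;
}

console.log(legacyEscape('Å'));        // %C5
console.log(legacyEscape('a b,c'));    // a%20b%2Cc
console.log(legacyEscape('€'));        // %u20AC
console.log(encodeURIComponent('Å'));  // %C3%85 (UTF-8 bytes, hence the different result)
```

The difference the question ran into comes from that last line: encodeURI and encodeURIComponent percent-encode the UTF-8 bytes of a character, while escape works on the 16-bit code unit directly.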