So I'm following a algorithms course and it asked to implement the following function:
Write a function called findLongestSubstring, which accepts a string and returns the length of the longest substring with all distinct characters.
findLongestSubstring('') // 0
findLongestSubstring('bbbbbb') // 1
findLongestSubstring('longestsubstring') // 8
And said it has to have a time complexity of o(n).
So I could not come up with a o(n) solution, but a supposed (n^2)... my naive solution:
function findLongestSubstring(str: string): number {
const array = str.split("");
let maxLength = 0;
for (let i = 0; i < array.length; i++) {
const letterArray = [array[i]];
for (let j = i + 1; j < array.length; j++) {
if (letterArray.find(e => e === array[j])) {
if (letterArray.length > maxLength) maxLength = letterArray.length;
break;
} else {
letterArray.push(array[j])
if (j === array.length - 1) {
if (letterArray.length > maxLength) maxLength = letterArray.length;
break;
}
}
}
}
return maxLength;
}
Then I went to check his solution:
function findLongestSubstring(str: string) {
let longest = 0;
let seen = {};
let start = 0;
for (let i = 0; i < str.length; i++) {
let char = str[i];
if (seen[char]) {
start = Math.max(start, seen[char]);
}
// index - beginning of substring + 1 (to include current in count)
longest = Math.max(longest, i - start + 1);
// store the index of the next char so as to not double count
seen[char] = i + 1;
}
return longest;
}
So I do agree that my solution which has a nested for loop appears to be O(n2) and his solution o(n). But when I tested both algorithms with a really big string to my surprise my algorithm was acctually faster and i have NO idea why... can someone enlight me?
The way Im testing:
test("really big string to test bigO", () => {
const alphabet = "abcdefghijklmnopqrstuvwxyz"
const randomLettersArray = Array.from({ length: 100000000 }, () => alphabet[Math.floor(Math.random() * alphabet.length)]);
const randomBigString = randomLettersArray.join("")
const start = Date.now();
findLongestSubstring(randomBigString);
const end = Date.now();
console.log(`Execution time: ${end - start} ms`);
})
Your nested loop solution looks like it could take O(n2) time, but it's actually limited by the longest possible valid substring, and that is limited by the size of the alphabet.
Your test uses a small alphabet -- only 26 characters. Your inner loop has a maximum of 26 iterations in this case, and that just might be faster than the other algorithm in some environments.
If you were to use a much larger alphabet -- say 10000 characters or so, then your algorithm would be much slower, but the other one would not slow down much at all.
Let us condider all longest subtrings (meeting the distinctness condition) that end at the successive positions:
l -> l
lo -> lo
lon -> lon
long -> long
longe -> longe
longes -> longes
longest -> longest
longests -> ts
longestsu -> tsu
longestsub -> tsub
longestsubs -> ubs
longestsubst -> ubst
longestsubstr -> usbtr
longestsubstri -> ubstri
longestsubstrin -> ubstrin
longestsubstring -> ubstring
You will notice that the index of the first character never descreases. Every time one appends a new character, the starting index either does not change, or it moves past the position of the same character in the substring.
So the following loop could do:
the substring is empty
for all ending positions in the string:
append the new character to the substring
move the starting position past the same character in the substring, if any
But we are not done yet: if the new character is found in the substring, the starting cursor will stop there; but if it is not found, we need to traverse the whole substring for nothing, and this can make the complexity quadratic.
So we need an extra device to tell us if the substring contains the new character without traversing the substring. As the alphabet has a finite size, we can do with an array of booleans which will act as counters. Initially all booleans are zero; when a character enters/exits the substring, its entry in the array is set/reset.
Hence the whole search can be performed in linear time. Of course the solution is the length of the longest string found during this process.
all booleans are initially reset
the substring is empty
for all ending positions in the string:
append the new character to the substring
if the boolean for this character is set:
move the starting position past the same character in the substring,
while resetting the booleans of all characters met
set the boolean for this character
update the longest length so far
function reverseString(str) {
let newStr = ''
for (let i = (str.length - 1); i >= 0; i--) {
newStr += str[i]
}
return newStr
}
// This algorithm is faster
function reverseString2(str) {
str = str.split('')
let left = 0
let right = str.length - 1
while (left < right) {
const tmp = str[left]
str[left] = str[right]
str[right] = tmp
left++
right--
}
return str.join('')
}
Why is reverseString2 faster than reverseString if the function does more processing, converting the string to an array and then concatenating the whole array? The advantage is that the main algorithm is O(n/2) but the rest is O(n). Why does that happen?
The results are the following:
str size: 20000000
reverseString: 4022.294ms
reverseString2: 1329.758ms
Thanks in advance.
In the first function reverseString(), what is happening is that the loop is running 'n' times. Here 'n' is meant to signify the length of the string. You can see that you are executing the loop n times to get the reversed string. So the total time taken by this function depends on the value of 'n'.
In the second function reverseString2(), the while loop is running only n/2 times. You can understand this when you look at the left and the right variable. For one execution of the loop, the left variable is increasing by 1 and the right variable is decreasing by 1. So you are doing 2 updation at once. So for a n-length string, it only executes for n/2 times.
Although there are much more statements in the second function, but they are being executed much less time than those statements in the first function. Hence the function reverseString2() is faster.
My purpose is to punch multiple strings into a single (shortest) string that will contain all the character of each string in a forward direction. The question is not specific to any language, but more into the algorithm part. (probably will implement it in a node server, so tagging nodejs/javascript).
So, to explain the problem:
Let's consider I have few strings
["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"]
The Resultant string should be something like:
"sjmachppoalidveonrk"
jack: sjmachppoalidveonrk
apple: sjmachppoalidveonrk
solid: sjmachppoalidveonrk
====================================>>>> all in the forward direction
These all are manual evaluation and the output may not 100% perfect in the example.
So, the point is all the letters of each string have to exist in the output in
FORWARD DIRECTION (here the actual problem belongs), and possibly the server will send the final strings and numbers like 27594 will be generated and passed to extract the token, in the required end. If I have to punch it in a minimal possible string it would have much easier (That case only unique chars are enough). But in this case there are some points:
Letters can be present multiple time, though I have to reuse any
letter if possible, eg: for solid and hold o > l > d can be
reused as forward direction but for apple (a > p) and spark
(p > a) we have to repeat a as in one case it appears before p
for apple, and after p for sparks so either we need to repeat
a or p. Even, we cannot do p > a > p as it will not cover both the case
because we need two p after a for apple
We directly have no option to place a single p and use the same
index twice in a time of extract, we need multiple p with no option
left as the input string contains that
I am (not) sure, that there is multiple outputs possible for a set of
strings. but the concern is it should be minimal in length,
the combination doesn't matter if its cover all the tokens in a forward direction. all (or one ) outputs of minimal possible length
need to trace.
Adding this point as an EDIT to this post. After reading the comments and knowing that it's already an existing
problem is known as shortest common supersequence problem we can
define that the resultant string will be the shortest possible
string from which we can re generate any input string by simply
removing some (0 to N) chars, this is same as all inputs can be found in a forward direction in the resultant string.
I have tried, by starting with an arbitrary string, and then made an analysis of next string and splitting all the letters, and place them accordingly, but after some times, it seems that current string letters can be placed in a better way, If the last string's (or a previous string's) letters were placed according to the current string. But again that string was analysed and placed based on something (multiple) what was processed, and placing something in the favor of something that is not processed seems difficult because to that we need to process that. Or might me maintaining a tree of all processed/unprocessed tree will help, building the building the final string? Any better way than it, it seems a brute force?
Note: I know there are a lot of other transformation possible, please try not to suggest anything else to use, we are doing a bit research on it.
I came up with a somewhat brute force method. This way finds the optimal way to combine 2 words then does it for each element in the array.
This strategy works by trying finding the best possible way to combine 2 words together. It is considered the best by having the fewest letters. Each word is fed into an ever growing "merged" word. Each time a new word is added the existing word is searched for a matching character which exists in the word to be merged. Once one is found both are split into 2 sets and attempted to be joined (using the rules at hand, no need 2 add if letter already exists ect..). The strategy generally yields good results.
The join_word method takes 2 words you wish to join, the first parameter is considered to be the word you wish to place the other into. It then searches for the best way to split into and word into 2 separate parts to merge together, it does this by looking for any shared common characters. This is where the splits_on_letter method comes in.
The splits_on_letter method takes a word and a letter which you wish to split on, then returns a 2d array of all the possible left and right sides of splitting on that character. For example splits_on_letter('boom', 'o') would return [["b","oom"],["bo","om"],["boo","m"]], this is all the combinations of how we could use the letter o as a split point.
The sort() at the beginning is to attempt to place like elements together. The order in which you merge the elements generally effects the results length. One approach I tried was to sort them based upon how many common letters they used (with their peers), however the results were varying. However in all my tests I had maybe 5 or 6 different word sets to test with, its possible with a larger, more varying word arrays you might find different results.
Output is
spmjhooarckpplivden
var words = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"];
var result = minify_words(words);
document.write(result);
function minify_words(words) {
// Theres a good sorting method somewhere which can place this in an optimal order for combining them,
// hoever after quite a few attempts i couldnt get better than just a regular sort... so just use that
words = words.sort();
/*
Joins 2 words together ensuring each word has all its letters in the result left to right
*/
function join_word(into, word) {
var best = null;
// straight brute force each word down. Try to run a split on each letter and
for(var i=0;i<word.length;i++) {
var letter = word[i];
// split our 2 words into 2 segments on that pivot letter
var intoPartsArr = splits_on_letter(into, letter);
var wordPartsArr = splits_on_letter(word, letter);
for(var p1=0;p1<intoPartsArr.length;p1++) {
for(var p2=0;p2<wordPartsArr.length;p2++) {
var intoParts = intoPartsArr[p1], wordParts = wordPartsArr[p2];
// merge left and right and push them together
var result = add_letters(intoParts[0], wordParts[0]) + add_letters(intoParts[1], wordParts[1]);
if(!best || result.length <= best.length) {
best = result;
}
}
}
}
// its possible that there is no best, just tack the words together at that point
return best || (into + word);
}
/*
Splits a word at the index of the provided letter
*/
function splits_on_letter(word, letter) {
var ix, result = [], offset = 0;;
while((ix = word.indexOf(letter, offset)) !== -1) {
result.push([word.substring(0, ix), word.substring(ix, word.length)]);
offset = ix+1;
}
result.push([word.substring(0, offset), word.substring(offset, word.length)]);
return result;
}
/*
Adds letters to the word given our set of rules. Adds them starting left to right, will only add if the letter isnt found
*/
function add_letters(word, addl) {
var rIx = 0;
for (var i = 0; i < addl.length; i++) {
var foundIndex = word.indexOf(addl[i], rIx);
if (foundIndex == -1) {
word = word.substring(0, rIx) + addl[i] + word.substring(rIx, word.length);
rIx += addl[i].length;
} else {
rIx = foundIndex + addl[i].length;
}
}
return word;
}
// For each of our words, merge them together
var joinedWords = words[0];
for (var i = 1; i < words.length; i++) {
joinedWords = join_word(joinedWords, words[i]);
}
return joinedWords;
}
A first try, not really optimized (183% shorter):
function getShort(arr){
var perfect="";
//iterate the array
arr.forEach(function(string){
//iterate over the characters in the array
string.split("").reduce(function(pos,char){
var n=perfect.indexOf(char,pos+1);//check if theres already a possible char
if(n<0){
//if its not existing, simply add it behind the current
perfect=perfect.substr(0,pos+1)+char+perfect.substr(pos+1);
return pos+1;
}
return n;//continue with that char
},-1);
})
return perfect;
}
In action
This can be improved trough simply running the upper code with some variants of the array (200% improvement):
var s=["jack",...];
var perfect=null;
for(var i=0;i<s.length;i++){
//shift
s.push(s.shift());
var result=getShort(s);
if(!perfect || result.length<perfect.length) perfect=result;
}
In action
Thats quite close to the minimum number of characters ive estimated ( 244% minimization might be possible in the best case)
Ive also wrote a function to get the minimal number of chars and one to check if a certain word fails, you can find them here
I have used the idea of Dynamic programming to first generate the shortest possible string in forward direction as stated in OP. Then I have combined the result obtained in the previous step to send as a parameter along with the next String in the list. Below is the working code in java. Hope this would help to reach the most optimal solution, in case my solution is identified to be non optimal. Please feel free to report any countercases for the below code:
public String shortestPossibleString(String a, String b){
int[][] dp = new int[a.length()+1][b.length()+1];
//form the dynamic table consisting of
//length of shortest substring till that points
for(int i=0;i<=a.length();i++){
for(int j=0;j<=b.length();j++){
if(i == 0)
dp[i][j] = j;
else if(j == 0)
dp[i][j] = i;
else if(a.charAt(i-1) == b.charAt(j-1))
dp[i][j] = 1+dp[i-1][j-1];
else
dp[i][j] = 1+Math.min(dp[i-1][j],dp[i][j-1]);
}
}
//Backtrack from here to find the shortest substring
char[] sQ = new char[dp[a.length()][b.length()]];
int s = dp[a.length()][b.length()]-1;
int i=a.length(), j=b.length();
while(i!=0 && j!=0){
// If current character in a and b are same, then
// current character is part of shortest supersequence
if(a.charAt(i-1) == b.charAt(j-1)){
sQ[s] = a.charAt(i-1);
i--;
j--;
s--;
}
else {
// If current character in a and b are different
if(dp[i-1][j] > dp[i][j-1]){
sQ[s] = b.charAt(j-1);
j--;
s--;
}
else{
sQ[s] = a.charAt(i-1);
i--;
s--;
}
}
}
// If b reaches its end, put remaining characters
// of a in the result string
while(i!=0){
sQ[s] = a.charAt(i-1);
i--;
s--;
}
// If a reaches its end, put remaining characters
// of b in the result string
while(j!=0){
sQ[s] = b.charAt(j-1);
j--;
s--;
}
return String.valueOf(sQ);
}
public void getCombinedString(String... values){
String sSQ = shortestPossibleString(values[0],values[1]);
for(int i=2;i<values.length;i++){
sSQ = shortestPossibleString(values[i],sSQ);
}
System.out.println(sSQ);
}
Driver program:
e.getCombinedString("jack", "apple", "maven", "hold",
"solid", "mark", "moon", "poor", "spark", "live");
Output:
jmapphsolivecparkonidr
Worst case time complexity of the above solution would be O(product of length of all input strings) when all strings have all characters distinct and not even a single character matches between any pair of strings.
Here is an optimal solution based on dynamic programming in JavaScript, but it can only get through solid on my computer before it runs out of memory. It differs from #CodeHunter's solution in that it keeps the entire set of optimal solutions after each added string, not just one of them. You can see that the number of optimal solutions grows exponentially; even after solid there are already 518,640 optimal solutions.
const STRINGS = ["jack", "apple", "maven", "hold", "solid", "mark", "moon", "poor", "spark", "live"]
function map(set, f) {
const result = new Set
for (const o of set) result.add(f(o))
return result
}
function addAll(set, other) {
for (const o of other) set.add(o)
return set
}
function shortest(set) { //set is assumed non-empty
let minLength
let minMatching
for (const s of set) {
if (!minLength || s.length < minLength) {
minLength = s.length
minMatching = new Set([s])
}
else if (s.length === minLength) minMatching.add(s)
}
return minMatching
}
class ZipCache {
constructor() {
this.cache = new Map
}
get(str1, str2) {
const cached1 = this.cache.get(str1)
if (!cached1) return undefined
return cached1.get(str2)
}
set(str1, str2, zipped) {
let cached1 = this.cache.get(str1)
if (!cached1) {
cached1 = new Map
this.cache.set(str1, cached1)
}
cached1.set(str2, zipped)
}
}
const zipCache = new ZipCache
function zip(str1, str2) {
const cached = zipCache.get(str1, str2)
if (cached) return cached
if (!str1) { //str1 is empty, so only choice is str2
const result = new Set([str2])
zipCache.set(str1, str2, result)
return result
}
if (!str2) { //str2 is empty, so only choice is str1
const result = new Set([str1])
zipCache.set(str1, str2, result)
return result
}
//Both strings start with same letter
//so optimal solution must start with this letter
if (str1[0] === str2[0]) {
const zipped = zip(str1.substring(1), str2.substring(1))
const result = map(zipped, s => str1[0] + s)
zipCache.set(str1, str2, result)
return result
}
//Either do str1[0] + zip(str1[1:], str2)
//or str2[0] + zip(str1, str2[1:])
const zip1 = zip(str1.substring(1), str2)
const zip2 = zip(str1, str2.substring(1))
const test1 = map(zip1, s => str1[0] + s)
const test2 = map(zip2, s => str2[0] + s)
const result = shortest(addAll(test1, test2))
zipCache.set(str1, str2, result)
return result
}
let cumulative = new Set([''])
for (const string of STRINGS) {
console.log(string)
const newCumulative = new Set
for (const test of cumulative) {
addAll(newCumulative, zip(test, string))
}
cumulative = shortest(newCumulative)
console.log(cumulative.size)
}
console.log(cumulative) //never reached
I recently came up with a naive (+ poor) solution to the British Change Problem (i.e. how many combinations of coins can generate a given total). I have a better solution now, but was still interested in solving the time and space complexity of the two solutions below.
Worst Solution
This solution recursively tries combining every number against itself and every other number, resulting in a lot of duplicate work. I believe it's O(n^n) time and not sure how to measure space complexity (but it's huge, since we're storing every result). Thoughts?
var makeChange = function(total){ // in pence
var allSets = new Set();
var coins = [1,2,5,10,20,50,100,200];
var subroutine = (arr, total) => {
if(total < 0){ return; }
if(total === 0){
allSets.add(''+arr);
} else {
// increase each coin amount by one and decrease the recursive total by one
for(var i = 0; i<coins.length; i++){
if((total - coins[i]) >= 0){
subroutine(arr.slice(0,i).concat(arr[i]+1).concat(arr.slice(i+1)), (total - coins[i]))
}
}
}
};
var zeros = new Array(coins.length).fill(0);
subroutine(zeros, total);
return allSets.size;
};
Improved Solution
This solution still has massive space complexity but I believe the time complexity has -improved- to O(n!) since we're recursing on smaller subsets of coins each time.
var makeChange = function(total){ // in pence
var allSets = new Set();
var coins = [1,2,5,10,20,50,100,200];
var subroutine = (arr, total, start) => {
if(total < 0){ return; }
if(total === 0){
console.log(''+arr);
allSets.add(''+arr);
} else {
// only solve for coins above start, since lower coins already solved
for(var i = start; i<coins.length; i++){
if((total - coins[i]) >= 0){
subroutine(arr.slice(0,i).concat(arr[i]+1).concat(arr.slice(i+1)), (total - coins[i]), i);
}
}
}
};
var zeros = new Array(coins.length).fill(0);
for(let i = 0; i<coins.length; i++){
subroutine(zeros, total, i);
}
return allSets.size;
};
Please help me to understand if my time/space complexity estimates are correct, and how to better estimate future problems like these. Thanks!
The complexity of the first algorithm is not actually an O(n^n). N is a variable which represents your input. In this case, I will refer to the variable "total" as your input, so N is based on total. For your algorithm to be O(n^n), it's recurrence tree would have to have a depth of N and a branching factor of N. Here, your depth of your recurrence is based on the smallest variable in your coins array. There is one branch of your recursion tree where you simply subtract that value off every time and recurse until total is zero. Given that that value is constant, it is safe to say your depth is n. Your branching factor for your recursion tree is also based off of your coins array, or the number of values in it. For every function call, you generate C other function calls, where C is the size of your coins array. That means your function is actually O(n^c) not O(n^n). Your time and space complexities are both based off of the size of your coins array as well as your input number.
The space complexity for your function is O(n^c * c). Every time you call your function, you also pass it an array of a size based on your input. We already showed that there are O(n^c) calls, and each call incorporates an array of size c.
Remember when analyzing the complexity of functions to take into account all inputs.
I'm trying to build a function that adds up all the numbers within a string... for example, 'dlsjf3diw62' would end up being 65.
I tried to be clever and put together a recursive function:
function NumberAddition(str) {
var numbers='1234567890';
var check=[];
str=str.split[''];
function recursive(str,check) {
if (str.length==0)
return check;
else if (numbers.indexOf(str[0])>=0)
{
for (i=0;i<str.length;i++){
if (numbers.indexOf(str[i])<0)
check.push(str.slice(0,i));
str=str.slice(i);
return recursive(str,check);
}
}
else
str.shift();
return recursive(str,check);
}
You'll see that I'm trying to get my numbers returned as an array in the array named check. Unfortunately, I have a maximum call stack size exceeded, and I'm not sure why! The recursion does have a base case!! It ends once str no longer has any contents. Why wouldn't this work? Is there something I'm missing?
-Will
You can achieve the same thing with a far easier solution, using regular expressions, as follows:
var str = 'dlsjf3diw62';
var check = str.match(/\d+/g); // this pattern matches all instances of 1 or more digits
Then, to sum the numbers, you can do this:
var checkSum = 0;
for (var i = 0; i < check.length; i++) {
checkSum += parseInt(check[i]);
}
Or, slightly more compact:
var checkSum = check.reduce(function(sum, num){ return sum + parseInt(num) }, 0);
The reason your recursion doesn't work is the case where you do enter the for loop, because you've found a digit, but the digits continue to the end of the string. If that happens, the return inside the for loop never happens, and the loop ends. After that, the .shift() does not happen, because it's in that else branch, so you return re-process the same string.
You shouldn't solve this particular problem that way, but the code makes a good example of the anti-pattern of having return statements inside if bodies followed by else. Your code would be clearer (and would work) if it looked like this:
function recursive(str, check) {
if (str.length == 0)
return check;
if (numbers.indexOf(str[0]) >= 0) {
// Find the end of the string of digits, or
// the end of the whole thing
for (var i = 0; i < str.length && numbers.indexOf(str[i]) >= 0; i++);
check.push(str.slice(0, i));
str = str.slice(i);
return recursive(str, check);
}
// A non-digit character
str.shift();
return recursive(str, check);
}
In that version, there are no else clauses, because the two if clauses always involve a return. The for loop is changed to simply find the right value of "i" for the subsequent slicing.
edit — one thing this doesn't fix is the fact that you're pushing arrays into your "check" list. That is, the substring "62" would be pushed as the array ["6", "2"]. That's not a huge problem; it's solved with the addition of a .join() in the right place.