This is from Leetcode problem: Concatenated Words.
Below is a working solution. I added what I thought to be an optimization (see code comment), but it actually slows down the code. If I remove the wrapping if statement, it runs faster.
To me, the optimization helps avoid having to:
call an expensive O(n) substring()
check inside wordsSet
making an unnecessary function call to checkConcatenation
Surely if (!badStartIndices.has(end + 1)) isn't more expensive than all the above, right? Maybe it has something to do with Javascript JIT compilation? V8? Thoughts?
Use the following test input:
// Notice how the second string ends with a 'b'!
const words = [
'a',
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab',
];
// Function in question.
var findAllConcatenatedWordsInADict = function (words) {
console.time();
// 1) put all words in a set
const wordsSet = new Set(words);
let badStartIndices;
// 2) iterate words, recursively check if it's a valid word
const concatenatedWords = [];
function checkConcatenation(word, startIdx = 0, matches = 0) {
if (badStartIndices.has(startIdx)) {
return false;
}
if (startIdx === word.length && matches >= 2) {
concatenatedWords.push(word);
return true;
}
for (let end = startIdx; end < word.length; end++) {
// I ADDED THE IF STATEMENT AS AN OPTIMIZATION. BUT CODE RUNS FASTER WITHOUT IT.
// NOTE: Code is correct with & without if statement.
if (!badStartIndices.has(end + 1)) {
const curWord = word.substring(startIdx, end + 1);
if (wordsSet.has(curWord)) {
if (checkConcatenation(word, end + 1, matches + 1)) {
return true;
}
}
}
}
badStartIndices.add(startIdx);
return false;
}
for (const word of words) {
// reset memo at beginning of each word
badStartIndices = new Set();
checkConcatenation(word);
}
console.timeEnd();
return concatenatedWords;
};
Turns out this depends entirely on the input data, not on JavaScript or V8. (And as of writing this, I don't know what data you used for benchmarking.)
With the example input from the Leetcode page you've linked, badStartIndices never does anything useful (both of the .has checks always return false); so it's fairly obvious that doing this fruitless check twice is a little slower than doing it just once. In that case, the "dynamic programming" mechanism of the solution never kicks in, so the effective behavior degenerates to brute force, which is good enough because the input data is well-behaved. (In fact, deleting badStartIndices entirely would be even faster for such a test case.)
If I construct "evil" input data that actually leads to exponential combinatorial blow-up, i.e. where the badStartIndices.has(...) checks actually have something to do, then adding the early check does have a (small) performance benefit. (And without either of the checks, the computation would take "forever" for such inputs.)
So, taking a step back, this is one more example to illustrate that benchmarking is difficult; in particular, in order to get useful results, care must be taken to select relevant/realistic input data.
If the tests are too simple, developers are likely to not build optimizations that would help (a little or a lot) in high-load situations.
If the tests are too demanding, developers are likely to waste time on overly complicated code that ends up being slower than it could be for its target use case.
And if the code must handle any input with maximum performance, then as the developer you have the extra challenge of avoiding overhead for simple inputs while still scaling well to tough inputs...
Related
I'm trying to convert an iterator to an array. The iterator is the result of calling matchAll on a very long string. The iterator (I assume) has many matches within the string. First I tried it with the spread operator:
const array = [...myLongString.matchAll(/myregex/g)];
This gave me the error: RangeError: Maximum call stack size exceeded
So I tried iterating via next():
const safeIteratorToArray = (iterator) => {
const result = [];
let item = iterator.next();
while (!item.done) {
result.push(item.value);
item = iterator.next();
}
return result;
};
But this gives me the same error, on the item = iterator.next() line. So I tried making it async in an effort to reset the call stack:
const safeIteratorToArray = async (iterator) => {
const result = [];
let item = iterator.next();
while (!item.done) {
result.push(item.value);
item = await new Promise(resolve => setTimeout(() => resolve(iterator.next())));
}
return result;
};
But I still get the same error.
If you are curious about the actual use case:
The regex I'm actually using is:
/\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: ((.|\n)+?)\n\[/g
And the contents of the text file (it's a log file) generally looks like:
[TIMESTAMP] [LOG_LEVEL] [Item ITEM_ID] Success with response: {
...put a giant json object here
}
Repeat that ad-nauseam with newlines in between each log.
(V8 developer here.)
It's not about the iterator, it's about the RegExp.
[Update]
Looks like I was misled by a typo in my test, so my earlier explanation/suggestion doesn't fix the problem. With the test corrected, it turns out that only the end of the expression (which I called "fishy" before) needs to be fixed.
The massive consumption of stack memory is caused by the fact that (.|\n) is a capture group, and is matched very frequently. One idea would be to write it as [.\n] instead, but the . metacharacter is not valid inside [...] character sets.
Hat tip to #cachius for suggesting an elegant solution: use the s flag to make . match \n characters.
As an unrelated fix, prefer checking for the closing } instead of the next opening [ at the beginning of a line, so that there's no overlap between matched ranges (which would make you miss some matches).
So, in summary, replace ((.|\n)+?)\n\[/g with (.+?)\n}/gs.
[/Update]
Here is a reproducible example. The following exhibits the stack overflow:
let lines = ["[TIMESTAMP] [DEBUG] [Item ITEM_ID] {"];
for (let i = 0; i < 1000000; i++) {
lines.push(" [0]"); // Fake JSON, close enough for RegExp purposes.
}
lines.push("}");
lines.push("[TIMESTAMP]");
let myLongString = lines.join("\n");
const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] ((.|\n)+?)\n\[/g;
myLongString.match(re);
If you replace the const re = ... line with:
const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] (.+?)\n}/gs;
then the stack overflow disappears.
(It would be possible to reduce the simplified example even further, but then the connection with your original case wouldn't be as obvious.)
[Original post below -- the mechanism I explained here is factually correct, and applying the suggested replacement indeed improves performance by 25% because it makes the RegExp simpler to match, it just isn't enough to fix the stack overflow.]
The problematic pattern is:
\[(.+?)\]
which, after all, means "a [, then any number of arbitrary characters, then a ]". While I understand that regular expressions might seem like magic, they're actually real algorithmic work under the hood, kind of like miniature programs in their own right. In particular, any time a ] is encountered in the string, the algorithm has to decide whether to count this as one of the "arbitrary characters", or as the one ] that ends this sequence. Since it can't magically know that, it has to keep both possibilities "in mind" (=on the stack), pick one, and backtrack if that turns out to be incorrect. Since this backtracking information is kept on the stack (where else?), if you put sufficiently many ] into your string, the stack will run out of space.
Luckily, the solution is simple: since what you actually mean is "a [, then any number of characters that aren't ], then a ]", you can just tell the RegExp engine that, replacing . with [^\]]:
\[([^\]]+?)\]
Note: ((.|\n)+?)\n\[ seems fishy for the same reason, but according to this test doesn't appear to be the problem, even if I further increase the input size. I'm not sure why; it might be due to how I created the test. If you see further problems with the real input, it may be worth reformulating this part as well.
[/Original post]
As the title suggests, I was trying to recursively solve a JavaScript problem. An exercise for my internet programming class was to invert any string that was entered in the function, and I saw this as a good opportunity to solve this with recursion. My code:
function reverseStr(str){
str = Array.from(str);
let fliparray = new Array(str.length).fill(0);
let char = str.slice(-1);
fliparray.push(char);
str.pop();
str.join("");
return reverseStr(str);
}
writeln(reverseStr("hello"))
The biggest problem is that your function doesn't have an end (base) case. It needs to have some way to recognize when it's supposed to stop or it will recurse forever.
The second problem is that you don't really seem to be thinking recursively. You're making some modification to the string, but then you just call reverseStr() all over again on the modified string, which is just going to start the process all over again.
The following doesn't really resemble your attempt (I don't know how to salvage your attempt), but it is a simple way to implement the reverse string algorithm recursively.
function reverseStr(str) {
// string is 0 or 1 characters. nothing to reverse
if (str.length <= 1) {
return str;
}
// return the first character appended to the end of the reverse of
// the portion after the first character
return reverseStr(str.substring(1)) + str.charAt(0);
}
console.log(reverseStr("Hello Everybody!"));
I'm working on the MIU system problem from "Gödel, Escher, Bach" chapter 2.
One of the rules states
Rule III: If III occurs in one of the strings in your collection, you may make a new string with U in place of III.
Which means that the string MIII can become MU, but for other, longer strings there may be multiple possibilities [matches in brackets]:
MIIII could yield
M[III]I >> MUI
MI[III] >> MIU
MUIIIUIIIU could yield
MU[III]UIIIU >> MUUUIIIU
MUIIIU[III]U >> MUIIIUUU
MUIIIIU could yield
MU[III]IU >> MUUIU
MUI[III]U >> MUIUU
Clearly regular expressions such as /(.*)III(.*)/ are helpful, but I can't seem to get them to generate every possible match, just the first one it happens to find.
Is there a way to generate every possible match?
(Note, I can think of ways to do this entirely manually, but I am hoping there is a better way using the built in tools, regex or otherwise)
(Edited to clarify overlapping needs.)
Here's the regex you need: /III/g - simple enough, right? Now here's how you use it:
var text = "MUIIIUIIIU", find = "III", replace "U",
regex = new RegExp(find,"g"), matches = [], match;
while(match = regex.exec(text)) {
matches.push(match);
regex.lastIndex = match.index+1;
}
That regex.lastIndex... line overrides the usual regex behaviour of not matching results that overap. Also I'm using a RegExp constructor to make this more flexible. You could even build it into a function this way.
Now you have an array of match objects, you can do this:
matches.forEach(function(m) { // older browsers need a shim or old-fashioned for loop
console.log(text.substr(0,m.index)+replace+text.substr(m.index+find.length));
});
EDIT: Here is a JSFiddle demonstrating the above code.
Sometimes regexes are overkill. In your case a simple indexOf might be fine too!
Here is, admittedly, a hack, but you can transform it into pretty, reusable code on your own:
var s = "MIIIIIUIUIIIUUIIUIIIIIU";
var results = [];
for (var i = 0; true; i += 1) {
i = s.indexOf("III", i);
if (i === -1) {
break;
}
results.push(i);
}
console.log("Match positions: " + JSON.stringify(results));
It takes care of overlaps just fine, and at least to me, the indexOf just looks simpler.
I'm using the following code to compare returned IP address (Using node-restify which is similar to express):
var checkIP = function (config, req) {
var ip = req.connection.remoteAddress.split('.'),
curIP,
b,
block = [];
for (var i=0, z=config.ips.length-1; i<=z; i++) {
curIP = config.ips[i].split('.');
b = 0;
// Compare each block
while (b<=3) {
(curIP[b]===ip[b] || curIP[b]==='*') ? block[b] = true : block[b] = false;
b++;
}
// Check all blocks
if (block[0] && block[1] && block[2] && block[3]) {
return true;
}
}
return false;
};
config.ips contains an array which (as should be obvious from the code) can be specific or wildcarded IPs.
This works, but it seems like there is a more efficient way to do this. Just curious if anyone has any suggestions on a way to simplify this or make it more efficient. My request time nearly doubled when I introduced this and I'd like to squeeze out some load time if possible.
If my intuition is correct, you might be doing a bunch of extra work right now:
For each IP expression in your config.ips array, your code is parsing and comparing:
if (block[0] && block[1] && block[2] && block[3]) {
return true;
}
^^^ Note that you have already done work to get all 4 blocks in the iterations of calculating this expression 4 times per IP: (curIP[b]===ip[b] || curIP[b]==='*'), so the ANDing above is not preventing the overhead of the work that is already happening regardless.
I have 2 ideas for you:
Since IP addresses are strings anyways, the * notation lends itself to be suitable for a Regex to do the work, instead of your splitting and comparing? So maybe as a next step you could look into implementing a Regex to do the work, instead of .split() and compare, and test the performance of that?
Or maybe figure out how to avoid that overhead associated with comparing the parts all the time, and compare the wholes when you can? And then fall back into comparing the parts only when necessity requires it.
If you want to read some C code, here's how Apache does IP blacklisting behind the scenes. Look at the function named in_domain for some inspiration.
Good luck, hope this helps!
I am currently doing a big project (by big I mean, many processes) where every millisecond I save means a lot (on the long run), so I want to make sure I am doing it the right way.
So, what is the best way to ensure you will have an array greater than 1?
a) use indexOf(), then if result is different than -1, split()
b) split (regardless if characters exist), then do stuff ONLY if the
array.length is greater than 1
c) another not listed above
Using jsPerf, it appears that omitting .indexOf() is roughly 23% more efficient that including it over 500,000 iterations (11.67 vs. 8.95 operations per second):
Without indexOf():
var str = "test";
for (var i = 0; i < 500000; i++) {
var test = str.split('.');
}
With .indexOf():
var str = "test";
for (var i = 0; i < 500000; i++) {
if (str.indexOf('.')) {
var test = str.split('.');
} else {
var test = str;
}
}
http://jsperf.com/split-and-split-indexof
EDIT
Hmm... If the following line is:
if (str.indexOf('.') > -1)
http://jsperf.com/split-and-split-indexof-with-indexof-check
Or any other comparison, it's seemingly quite a bit faster (by about 69%).
The only reason I can think this is the case is that running .split() on every variable will perform two functions on each value (find, then separate), instead of just one when necessary. Note, this last part is just a guess.
We can see that even when there is something to split the best results come from doing the indexOf test against a value. Still the improvement is worse that the cases where 100% of items don't need a split. Thus as you have more items needing to be split testing returns less benefit (as would be expected). So it really depends on the use case since the extra code takes up memory and uses resources.
http://jsperf.com/split-and-split-indexof/2
(b) is obviously more efficient than (a) because split uses the same logic as indexOf and that logic will not need to be repeated if there are indeed more than 2 elements. i cannot think of a more efficient way.