I'm trying to convert an iterator to an array. The iterator is the result of calling matchAll on a very long string, so it (I assume) yields many matches from that string. First I tried it with the spread operator:
const array = [...myLongString.matchAll(/myregex/g)];
This gave me the error: RangeError: Maximum call stack size exceeded
So I tried iterating via next():
const safeIteratorToArray = (iterator) => {
const result = [];
let item = iterator.next();
while (!item.done) {
result.push(item.value);
item = iterator.next();
}
return result;
};
But this gives me the same error, on the item = iterator.next() line. So I tried making it async in an effort to reset the call stack:
const safeIteratorToArray = async (iterator) => {
const result = [];
let item = iterator.next();
while (!item.done) {
result.push(item.value);
item = await new Promise(resolve => setTimeout(() => resolve(iterator.next())));
}
return result;
};
But I still get the same error.
If you are curious about the actual use case:
The regex I'm actually using is:
/\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: ((.|\n)+?)\n\[/g
And the contents of the text file (it's a log file) generally looks like:
[TIMESTAMP] [LOG_LEVEL] [Item ITEM_ID] Success with response: {
...put a giant json object here
}
Repeat that ad nauseam, with newlines between each log entry.
(V8 developer here.)
It's not about the iterator, it's about the RegExp.
[Update]
Looks like I was misled by a typo in my test, so my earlier explanation/suggestion doesn't fix the problem. With the test corrected, it turns out that only the end of the expression (which I called "fishy" before) needs to be fixed.
The massive consumption of stack memory is caused by the fact that (.|\n) is a capture group, and it is matched very frequently. One idea would be to write it as [.\n] instead, but inside a [...] character set the . loses its special meaning and matches only a literal dot, so that doesn't work.
Hat tip to @cachius for suggesting an elegant solution: use the s flag to make . match \n characters.
As an unrelated fix, prefer checking for the closing } instead of the next opening [ at the beginning of a line, so that there's no overlap between matched ranges (which would make you miss some matches).
So, in summary, replace ((.|\n)+?)\n\[/g with (.+?)\n}/gs.
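Applied to the full expression from the question, that would look something like this (a sketch; the JSON body ends up in the third capture group, without its closing brace):
const fixedRe = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: (.+?)\n}/gs;
const matches = [...myLongString.matchAll(fixedRe)];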
[/Update]
Here is a reproducible example. The following exhibits the stack overflow:
let lines = ["[TIMESTAMP] [DEBUG] [Item ITEM_ID] {"];
for (let i = 0; i < 1000000; i++) {
lines.push(" [0]"); // Fake JSON, close enough for RegExp purposes.
}
lines.push("}");
lines.push("[TIMESTAMP]");
let myLongString = lines.join("\n");
const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] ((.|\n)+?)\n\[/g;
myLongString.match(re);
If you replace the const re = ... line with:
const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] (.+?)\n}/gs;
then the stack overflow disappears.
(It would be possible to reduce the simplified example even further, but then the connection with your original case wouldn't be as obvious.)
[Original post below -- the mechanism I explained here is factually correct, and applying the suggested replacement indeed improves performance by 25% because it makes the RegExp simpler to match, it just isn't enough to fix the stack overflow.]
The problematic pattern is:
\[(.+?)\]
which, after all, means "a [, then any number of arbitrary characters, then a ]". While I understand that regular expressions might seem like magic, they're actually real algorithmic work under the hood, kind of like miniature programs in their own right. In particular, any time a ] is encountered in the string, the algorithm has to decide whether to count this as one of the "arbitrary characters", or as the one ] that ends this sequence. Since it can't magically know that, it has to keep both possibilities "in mind" (=on the stack), pick one, and backtrack if that turns out to be incorrect. Since this backtracking information is kept on the stack (where else?), if you put sufficiently many ] into your string, the stack will run out of space.
Luckily, the solution is simple: since what you actually mean is "a [, then any number of characters that aren't ], then a ]", you can just tell the RegExp engine that, replacing . with [^\]]:
\[([^\]]+?)\]
Note: ((.|\n)+?)\n\[ seems fishy for the same reason, but according to this test it doesn't appear to be the problem, even if I further increase the input size. I'm not sure why; it might be due to how I created the test. If you see further problems with the real input, it may be worth reformulating this part as well.
[/Original post]
This is from Leetcode problem: Concatenated Words.
Below is a working solution. I added what I thought to be an optimization (see code comment), but it actually slows down the code. If I remove the wrapping if statement, it runs faster.
To me, the optimization helps avoid having to:
call an expensive O(n) substring()
check inside wordsSet
make an unnecessary function call to checkConcatenation
Surely if (!badStartIndices.has(end + 1)) isn't more expensive than all of the above, right? Maybe it has something to do with JavaScript JIT compilation? V8? Thoughts?
Use the following test input:
// Notice how the second string ends with a 'b'!
const words = [
'a',
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab',
];
// Function in question.
var findAllConcatenatedWordsInADict = function (words) {
console.time();
// 1) put all words in a set
const wordsSet = new Set(words);
let badStartIndices;
// 2) iterate words, recursively check if it's a valid word
const concatenatedWords = [];
function checkConcatenation(word, startIdx = 0, matches = 0) {
if (badStartIndices.has(startIdx)) {
return false;
}
if (startIdx === word.length && matches >= 2) {
concatenatedWords.push(word);
return true;
}
for (let end = startIdx; end < word.length; end++) {
// I ADDED THE IF STATEMENT AS AN OPTIMIZATION. BUT CODE RUNS FASTER WITHOUT IT.
// NOTE: Code is correct with & without if statement.
if (!badStartIndices.has(end + 1)) {
const curWord = word.substring(startIdx, end + 1);
if (wordsSet.has(curWord)) {
if (checkConcatenation(word, end + 1, matches + 1)) {
return true;
}
}
}
}
badStartIndices.add(startIdx);
return false;
}
for (const word of words) {
// reset memo at beginning of each word
badStartIndices = new Set();
checkConcatenation(word);
}
console.timeEnd();
return concatenatedWords;
};
Turns out this depends entirely on the input data, not on JavaScript or V8. (And as of writing this, I don't know what data you used for benchmarking.)
With the example input from the Leetcode page you've linked, badStartIndices never does anything useful (both of the .has checks always return false); so it's fairly obvious that doing this fruitless check twice is a little slower than doing it just once. In that case, the "dynamic programming" mechanism of the solution never kicks in, so the effective behavior degenerates to brute force, which is good enough because the input data is well-behaved. (In fact, deleting badStartIndices entirely would be even faster for such a test case.)
If I construct "evil" input data that actually leads to exponential combinatorial blow-up, i.e. where the badStartIndices.has(...) checks actually have something to do, then adding the early check does have a (small) performance benefit. (And without either of the checks, the computation would take "forever" for such inputs.)
So, taking a step back, this is one more example to illustrate that benchmarking is difficult; in particular, in order to get useful results, care must be taken to select relevant/realistic input data.
If the tests are too simple, developers are likely to not build optimizations that would help (a little or a lot) in high-load situations.
If the tests are too demanding, developers are likely to waste time on overly complicated code that ends up being slower than it could be for its target use case.
And if the code must handle any input with maximum performance, then as the developer you have the extra challenge of avoiding overhead for simple inputs while still scaling well to tough inputs...
I followed this tutorial, which describes how to create a JavaScript compiler for an ANTLR4 grammar (ECMAScript.g4). As an example, it also describes how to transform code from JavaScript to Python by overriding the visit(), visitChildren(), visitTerminal(), and visitErrorNode() methods of the generated ECMAScriptVisitor.js file.
For this, the expression {x: 1} is given as the JavaScript input, and the output should be {'x': 1} to match Python's accepted format for such expressions.
Everything worked fine until I ran the program, where I got the output below:
What might be the reason for this error to appear? This is the link to the github repo where I have uploaded the part of the project I've completed so far. Here's the index.js:
const antlr4 = require('antlr4');
const ECMAScriptLexer = require('./lib/ECMAScriptLexer.js');
const ECMAScriptParser = require('./lib/ECMAScriptParser.js');
const PythonGenerator = require('./codegeneration/PythonGenerator.js');
const input = '{x: 1}';
const chars = new antlr4.InputStream(input);
const lexer = new ECMAScriptLexer.ECMAScriptLexer(chars);
lexer.strictMode = false; // do not use js strictMode
const tokens = new antlr4.CommonTokenStream(lexer);
const parser = new ECMAScriptParser.ECMAScriptParser(tokens);
const tree = parser.program();
console.log('JavaScript input:');
console.log(input);
console.log('Python output:');
const output = new PythonGenerator().start(tree);
console.log(output);
And here's PythonGenerator.js:
const ECMAScriptVisitor = require('../lib/ECMAScriptVisitor').ECMAScriptVisitor;
class Visitor extends ECMAScriptVisitor {
start(ctx) {
return this.visitExpressionSequence(ctx);
}
visitChildren(ctx) {
let code = '';
for (let i = 0; i < ctx.getChildCount(); i++) {
code += this.visit(ctx.getChild(i));
}
return code.trim();
}
visitTerminal(ctx) {
return ctx.getText();
}
visitPropertyExpressionAssignment(ctx) {
const key = this.visit(ctx.propertyName());
const value = this.visit(ctx.singleExpression());
return `'${key}': ${value}`;
}
}
module.exports = Visitor;
Thanks in advance!
I found out that by replacing the const tree = parser.program(); statement in index.js with const tree = parser.expressionSequence();, the problem is solved.
The first thing of note is that you call start with a ProgramContext object and then call visitExpressionSequence on it. In this case it doesn't matter, because you override neither visitExpressionSequence nor visitProgram, so both default to just calling visitChildren. Still, you should really only ever call visit explicitly, never any of the visitFoo methods; visit will then always make sure that the correct visitFoo method is called.
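In other words, start can simply delegate to visit (a minimal sketch):
start(ctx) {
  // let the visitor dispatch to the right visitFoo method for this node
  return this.visit(ctx);
}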
Moving on to the actual issue, you've got two problems: first, the key in your output does not have quotes around it, even though your visitPropertyExpressionAssignment method should accomplish exactly that. And secondly, there's an <EOF> at the end of the output, which I assume you do not want.
The first problem occurs because your input produces a syntax error when parsed: {x: 1} is not actually a valid JavaScript program, since a { at the beginning of a statement is seen as the start of a block, not an object literal. You'd need to put parentheses around the literal to make it valid at the beginning of a statement.
The second problem is because the program rule is terminated by the end of file, producing an EOF token. To avoid printing it, you can simply override visitProgram so it only calls visit on the statement list and not on the EOF token.
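A sketch of that override (the child accessor assumes a rule along the lines of program: sourceElements? EOF; check your ECMAScript.g4 for the exact name):
visitProgram(ctx) {
  // visit only the statements, skipping the trailing EOF token
  return this.visit(ctx.sourceElements());
}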
You've fixed the first problem by invoking the expressionSequence rule in your parser instead of program. If your goal is to parse an expression instead of a program, that's fine (though in a REPL-like context, you might instead want to try to parse the input as an expression and, if that fails, parse it as a statement). Doing so may seem like it also fixes your second problem, but that's not quite the case:
You no longer get an <EOF> in the output because you no longer match the end of file. What this means is that if the input consists of a valid expression, followed by total garbage (say const input = '{x: 1} krgsfkjhwruei';), this will not cause a syntax error, but instead happily parse the {x: 1} part and then completely ignore the garbage part without any indication of a problem. That's almost never what you want. Instead you could define a new rule in your grammar that matches an expression, followed by the end of file, like this:
expressionInput: expressionSequence EOF;
Now if the input contains garbage after a valid expression, you'd get a syntax error. However, this will re-introduce the problem of the <EOF> in the output. But again, you can fix that by overriding visitExpressionInput and only calling this.visit(ctx.expressionSequence()).
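Put together, that might look like this (accessor and method names follow ANTLR's default generated API):
// In ECMAScript.g4:
// expressionInput: expressionSequence EOF;

// In PythonGenerator.js:
visitExpressionInput(ctx) {
  // visit only the expression sequence, skipping the EOF token
  return this.visit(ctx.expressionSequence());
}

// In index.js:
const tree = parser.expressionInput();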
I have something like:
var sFunction = 'my_function("param1", "param2")';
var oMyObject = ...;
And I want to combine it so the result would be equal to:
oMyObject.my_function("param1", "param2");
Would much appreciate any tips.
Remark
As many of you suggested finding the root cause and not dealing with the problematic input, here are some pieces of information about the origins of the "problem".
The sFunction comes from the database, hardcoded in one of the columns. It is a custom function which should be called on an object retrieved based on other parameters of sFunction's database record.
So, backed up by your comments, I will try suggesting a change to the data model, in the hope that it is not too late for that. Thank you all for your help.
I am given that as an input; it may come from a DB or anywhere else. I just have to deal with it in the described way.
As Luca noted, you're probably best off solving the problem that brought you to the point of having code in a string that you feel you need to evaluate at runtime. The number of use cases for doing that is very low.
For instance, instead of
sFunction = 'my_function("param1", "param2")';
perhaps you could have
call = {
f: "my_function",
params: ["param1", "param2"]
};
Then it's:
oMyObject[call.f].apply(oMyObject, call.params);
call could even start life as JSON text you parse -- live example:
var json =
'{' +
'"f": "my_function",' +
'"params": ["param1", "param2"]' +
'}';
var call = JSON.parse(json);
var oMyObject = {
my_function: function(p1, p2) {
console.log(p1, p2);
}
};
oMyObject[call.f].apply(oMyObject, call.params);
That's markedly safer than arbitrary code execution.
You can do this with your sFunction (eval("oMyObject." + sFunction)), but consider:
It lets any arbitrary code in sFunction run.
If User A supplies the code and then you run it on User B's system, you're compromising User B's privacy. (I am not a lawyer, but you could be doing so in a way that violates a country's data protection or privacy laws.)
Now, if you're loading code from a DB and you know that the code in the DB can only be put there by trusted people (for instance, developers on your team, not end users of the system), that's fine, it's largely like running a script file. But there's almost certainly a better way to do it than delivering the code as a string and evaling it.
But if the code comes from "anywhere else", it's not fine; see bullet points above. The setup is fundamentally broken and better options are available. Take that information to your boss, and if necessary to his/her boss, and if necessary his/her boss, until you find someone who can change the requirement.
Here's a string hack that doesn't use eval(), but as I (and others) have said, this is not a good solution. The better solution would be to return the function name and any arguments as a comma-delimited string, which would at least make this kind of solution more straightforward.
var sFunction = 'my_function("param1", "param2")';
// The object would have to already have the function:
var oMyObject = {
my_function: function(x,y){
return x + y;
}
};
// Remove the last ")" and split the remainder into an array at the "("
var funcParts = sFunction.replace(")","").split("(");
// Split the second part (the arguments) into its own array
var funcArgs = funcParts[1].split(",");
// Pass the function name as a string key to the object and then pass the arguments to that
console.log(oMyObject[funcParts[0]](funcArgs[0], funcArgs[1]));
The bigger question is: what ultimately are you trying to accomplish? There is almost always a better approach than this.
To do a dynamic function call you can of course eval, as I did in the comments, but that's a terrible idea. Here is a quick-and-dirty alternative:
const dynamicCallMethod = (obj, s) => {
  try {
    // function name: everything before the first "("
    const fname = s.match(/^([$\w]+)\(/)[1];
    // quoted arguments, with the surrounding quotes stripped
    const params = s.match(/"([\w$]+)"/g).map(p => p.slice(1, -1));
    return obj[fname](...params);
  } catch (e) {
    return e;
  }
};
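For example, given an object that actually has the named function, a call might look like this:
const oMyObject = { my_function: (a, b) => a + " " + b };
console.log(dynamicCallMethod(oMyObject, 'my_function("param1", "param2")'));
// logs: param1 param2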
Note that I still think there's an easier way to do this if you describe the scenario in more detail. The above will fail for any non-ASCII characters, for instance.
As the title suggests, I was trying to solve a JavaScript problem recursively. An exercise for my internet programming class was to reverse any string passed to the function, and I saw this as a good opportunity to use recursion. My code:
function reverseStr(str){
str = Array.from(str);
let fliparray = new Array(str.length).fill(0);
let char = str.slice(-1);
fliparray.push(char);
str.pop();
str.join("");
return reverseStr(str);
}
writeln(reverseStr("hello"))
The biggest problem is that your function doesn't have an end (base) case. It needs to have some way to recognize when it's supposed to stop or it will recurse forever.
The second problem is that you don't really seem to be thinking recursively. You're making some modification to the string, but then you just call reverseStr() all over again on the modified string, which is just going to start the process all over again.
The following doesn't really resemble your attempt (I don't know how to salvage your attempt), but it is a simple way to implement the reverse string algorithm recursively.
function reverseStr(str) {
// string is 0 or 1 characters. nothing to reverse
if (str.length <= 1) {
return str;
}
// return the first character appended to the end of the reverse of
// the portion after the first character
return reverseStr(str.substring(1)) + str.charAt(0);
}
console.log(reverseStr("Hello Everybody!"));
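To see why this works, here is how the calls unwind for a short input:
// reverseStr("abc")
//   = reverseStr("bc") + "a"
//   = (reverseStr("c") + "b") + "a"
//   = ("c" + "b") + "a"
//   = "cba"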
I'm using a modified version of this code (Update: that answer has since been updated to use correct code, but this question still carries value since it contains relevant test cases and discussion of this problem) to store a single object, after stringification, across chunked keys in sync storage.
Note that sync storage has a maximum quota size per item. So, I have the maxBytesPerItem and maxValueBytes variables.
function lengthInUtf8Bytes(str) {
// by: https://stackoverflow.com/a/5515960/2675672
// Matches only the 10.. bytes that are non-initial characters in a multi-byte sequence.
var m = encodeURIComponent(str).match(/%[89ABab]/g);
return str.length + (m ? m.length : 0);
}
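// For example (illustrative values): a two-byte character like "é" encodes as
// "%C3%A9", and only the continuation byte %A9 matches the pattern above, so:
//   lengthInUtf8Bytes("abc") === 3
//   lengthInUtf8Bytes("é")   === 2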
function syncStore(key, objectToStore, callback) {
var jsonstr = JSON.stringify(objectToStore), i = 0, storageObj = {},
// (note: QUOTA_BYTES_PER_ITEM only on sync storage)
// subtract two for the quotes added by stringification
// extra -5 to err on the safe side
maxBytesPerItem = chrome.storage.sync.QUOTA_BYTES_PER_ITEM - NUMBER,
// since the key uses up some per-item quota, use
// "maxValueBytes" to see how much is left for the value
maxValueBytes, index, segment, counter;
console.log("jsonstr length is " + lengthInUtf8Bytes(jsonstr));
// split jsonstr into chunks and store them in an object indexed by `key_i`
while(jsonstr.length > 0) {
index = key + "_" + i++;
maxValueBytes = maxBytesPerItem - lengthInUtf8Bytes(index);
counter = maxValueBytes;
segment = jsonstr.substr(0, counter);
while(lengthInUtf8Bytes(segment) > maxValueBytes)
segment = jsonstr.substr(0, --counter);
storageObj[index] = segment;
jsonstr = jsonstr.substr(counter);
}
// later used by retriever function
storageObj[key] = i;
console.log((i + 1) + " keys used (= key + key_i)");
// say user saves till chunk 20 in case I
// in case II, user deletes several snippets and brings down
// total no. of "required" chunks to 15; however, the previous chunks
// (16-20) remain in memory unless they are "clear"ed.
chrome.storage.sync.clear(function(){
console.log(storageObj);
console.log(chrome.storage.sync);
chrome.storage.sync.set(storageObj, callback);
});
}
The problem is in this line:
maxBytesPerItem = chrome.storage.sync.QUOTA_BYTES_PER_ITEM - NUMBER,
The problem is that 5 is the minimum NUMBER for which there's no error. Here's the sample code you can use to test my theory:
var len = 102000,
string = [...new Array(len)].map(x => 1).join(""),
Data = {
"my_text": string
},
key = "key";
syncStore(key, Data, function(){
console.log(chrome.runtime.lastError && chrome.runtime.lastError.message);
});
Using 4 yields a MAX_QUOTA_BYTES_PER_ITEM exceeded error. You can adjust the value of len yourself (to 20000, 60000, anything < 102000, etc.) to check my theory.
Question:
Why does the current method require exactly 5 as the minimum value? I know there are two quotes added by stringification, but what about the other 3 characters? Where did they come from?
Additionally, I've noticed that with textual Data like this one, even 5 does not work. In that specific case, the minimum NUMBER required is 6.
Clarification:
The point of my question is not what are the other means to store data in sync.
The point of my question is why is the current method requiring exactly 5 (And why that textual data requires a 6.) Imho, my question is very specific and surely does not deserve a close vote.
Update: I've added new code which stores data based on measuring the length in UTF-8 bytes, but it still does not produce the desired results. I've also added code to more easily test my theory.
The problem is that Chrome applies JSON.stringify to each string chunk before storing it, which adds three \ characters to the first string (which, added to the known 2 for outer quotes, makes a full 5). This behavior is noted in the Chromium source code: Calculate the setting size based on its JSON serialization size (and the implementation does indeed compute size based on key.size() + value_as_json.size()).
That is, the value in key_0 is the string
{"my_text":"11111111...
But it is stored as
"{\"my_text\":\"11111111..."
The reason you need to account for the two outer quotes is the same reason you need to account for added slashes. Both are indicative of the output of JSON.stringify operating on a string input.
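For instance, stringifying the beginning of the first chunk a second time (the way Chrome effectively measures it) shows exactly those five extra characters:
const chunk = '{"my_text":"11111111';  // start of the first stored chunk
JSON.stringify(chunk);
// -> '"{\"my_text\":\"11111111"'
// 2 outer quotes + 3 escaping backslashes = 5 extra bytes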
You can confirm that escape-slashes are the issue by doing
var jsonstr = JSON.stringify(objectToStore).replace(/"/g,"Z")
And observing that the required NUMBER offset is 2 instead of 5, because {Zmy_textZ:Z11111... does not have extra slashes.
I haven't looked closely, but the Lorem text contains a newline and a tab (see: id faucibus diam.\), which your JSON.stringify (correctly) turns into \n\t but then Chrome's additional stringify further expands to \\n\\t, for an extra 2 bytes you do not account for. If that gets chunked with two other quotes or other escapable characters, it could cause a chunk with 4 unaccounted-for bytes.
The solution here is to account for the escaping that Chrome will do upon storage. I'd suggest applying JSON.stringify to each segment when evaluating if it's too big, so that the correct number of bytes will be consumed by the chunking algorithm. Then, once you decide on a size that will not cause problems, even after being double-stringifed, consume that many bytes from the regular string. Something like:
while(lengthInUtf8Bytes(JSON.stringify(segment)) > maxValueBytes)
...
Note that this will automatically account for the two bytes from outer quotes, so there's no need to even have a QUOTA_BYTES_PER_ITEM - NUMBER computation. In the terms you've presented it, with this approach, the NUMBER is 0.
For some reason, the technique only works when we do this:
while(lengthInUtf8Bytes(JSON.stringify(JSON.stringify(segment))) > maxValueBytes)
Here's a paste containing data that you can use to compare both this and @apsillers' original approach (and verify that only the above approach works).
Here's the code I used to test all this stuff
I am not accepting either answer yet, since neither of them provides acceptable logic as to why only the above approach works.
After carefully reading through this thread, I was finally able to understand where the extra bytes come from. apsillers actually references the part of the Chromium code that holds the answer:
key.size() + value_as_json.size()
You have to account for the size of the key as well. So the working, accurate check is:
while((lengthInUtf8Bytes(JSON.stringify(segment)) + key.length) > maxValueBytes)
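Putting the thread's findings together, the chunking loop inside syncStore might end up looking something like this (a sketch reusing the question's variable names; it measures each per-chunk key index directly against the raw quota, since index is the key each chunk is actually stored under, following the key.size() + value_as_json.size() formula cited above):
while (jsonstr.length > 0) {
  index = key + "_" + i++;
  counter = chrome.storage.sync.QUOTA_BYTES_PER_ITEM - lengthInUtf8Bytes(index);
  segment = jsonstr.substr(0, counter);
  // Chrome measures the JSON-serialized value plus the key of this item,
  // so shrink the segment until that combined size fits under the quota.
  while (lengthInUtf8Bytes(JSON.stringify(segment)) + lengthInUtf8Bytes(index) >
         chrome.storage.sync.QUOTA_BYTES_PER_ITEM) {
    segment = jsonstr.substr(0, --counter);
  }
  storageObj[index] = segment;
  jsonstr = jsonstr.substr(counter);
}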