How can I speed up my array search function? - javascript

I am working on a dictionary application written with react-native.
To filter the array from the search box, I wrote the function below. It works quite well when I test with a 2,000-word list, but when the word list grows to many thousands of words the search becomes really slow.
So, how can I improve this search function?
//Filter array when input text (Search)
let filteredWords = []
if(this.state.searchField != null)
{
filteredWords = this.state.glossaries.filter(glossary => {
return glossary.word.toLowerCase().includes(this.state.searchField.toLowerCase());
})
}

There are multiple factors that are making this code slow:
You're using filter() with a lambda. This adds a function call overhead for each item being searched.
You're calling toLowerCase() on both strings before calling includes(). This allocates two new string objects for every comparison.
You're calling includes(). For some reason the includes() method is not as well optimized in some browsers as indexOf().
for loop (-11%)
Instead of using the filter() method, I recommend creating a new Array and using a for loop to fill it.
const glossaries = this.state.glossaries;
const searchField = this.state.searchField;
const filteredWords = [];
for (let i = 0; i < glossaries.length; i++) {
if (glossaries[i].word.toLowerCase().includes(searchField.toLowerCase())) {
filteredWords.push(glossaries[i]);
}
}
toLowerCase allocations (-45%)
Memory allocation is expensive because JavaScript relies on garbage collection to free used memory. When a garbage collection is performed, the program can be paused while the collector looks for memory that is no longer in use.
You can get rid of the toLowerCase() (inside the search loop) completely by making a lowercased copy of the glossary every time the glossary is updated, which I assume is not often.
// When you build the glossary
this.state.glossaries = ...;
this.state.searchGlossaries = this.state.glossaries.map(g => g.word.toLowerCase());
You can also remove the toLowerCase() on the searchText by calling it once before the loop. After these changes, the code will look like:
const glossaries = this.state.glossaries;
const searchGlossaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];
for (let i = 0; i < glossaries.length; i++) {
if (searchGlossaries[i].includes(searchField)) {
filteredWords.push(glossaries[i]);
}
}
indexOf() instead of includes() (-13%)
I am not really sure why this is the case, but tests show that indexOf is a lot faster than includes.
const glossaries = this.state.glossaries;
const searchGlossaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];
for (let i = 0; i < glossaries.length; i++) {
if (searchGlossaries[i].indexOf(searchField) !== -1) {
filteredWords.push(glossaries[i]);
}
}
Overall the performance has improved by 70%.
I got the performance percentages from https://jsperf.com/so-question-perf
Optimize the algorithm
In the comments you said you would like an example of optimizations that can be done when the requirements are loosened to only match words that start with the search text. One way to do this is a binary search.
Let's take the code from above as starting point. We sort the glossaries before we store it in the state. For sorting case insensitively, JavaScript exposes the Intl.Collator constructor. It provides the compare(x, y) method that returns:
negative value | X is less than Y
zero | X is equal to Y
positive value | X is greater than Y
And the resulting code:
// Static in the file
const collator = new Intl.Collator(undefined, {
sensitivity: 'base'
});
function binarySearch(glossaries, searchText) {
let lo = 0;
let hi = glossaries.length - 1;
while (lo <= hi) {
let mid = (lo + hi) / 2 | 0;
let comparison = collator.compare(glossaries[mid].word, searchText);
if (comparison < 0) {
lo = mid + 1;
}
else if (comparison > 0) {
hi = mid - 1;
}
else {
return mid;
}
}
return -1;
}
// When you build the glossary
this.state.glossaries = ...;
this.state.glossaries.sort(function(x, y) {
return collator.compare(x.word, y.word);
});
// When you search
const glossaries = this.state.glossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];
let idx = binarySearch(glossaries, searchField);
if (idx != -1) {
// Find the index of the first matching word, seeing as the binary search
// will end up somewhere in the middle
while (idx > 0 && collator.compare(glossaries[idx - 1].word, searchField) == 0) {
idx--;
}
// Add each matching word to the filteredWords
while (idx < glossaries.length && collator.compare(glossaries[idx].word, searchField) == 0) {
filteredWords.push(glossaries[idx]);
idx++;
}
}
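For the looser "starts with" requirement, one possible sketch is to keep a list of lowercased keys sorted with plain string comparison and use a lower-bound binary search to find the first candidate, then scan forward while the prefix still matches. The helper names buildPrefixIndex, lowerBound and searchByPrefix are hypothetical and not part of the code above, and the sketch assumes plain toLowerCase() folding is good enough (it does not use Intl.Collator).

```javascript
// Build once whenever the glossary changes (hypothetical helper).
// Each entry keeps the lowercased key next to the original glossary object.
function buildPrefixIndex(glossaries) {
  return glossaries
    .map(g => ({ key: g.word.toLowerCase(), glossary: g }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Index of the first entry whose key is >= prefix (lower bound).
function lowerBound(entries, prefix) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid].key < prefix) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// All glossaries whose word starts with searchText, ignoring case.
function searchByPrefix(entries, searchText) {
  const prefix = searchText.toLowerCase();
  const matches = [];
  for (
    let i = lowerBound(entries, prefix);
    i < entries.length && entries[i].key.startsWith(prefix);
    i++
  ) {
    matches.push(entries[i].glossary);
  }
  return matches;
}
```

Because the keys are sorted by code unit order, every key sharing a prefix sits in one contiguous run, so the scan stops at the first key that no longer starts with the prefix.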

As the question doesn't seem to belong on CodeReview, I think there are a few things that you can do to make your code drastically faster [citation needed]:
Cache that call to this.state.searchField.toLowerCase() as you don't need to call it on every iteration.
Use regular old for loops instead of flashy-but-slow Array functions.
And here is the final result:
let filteredWords = []
if(this.state.searchField != null) {
let searchField = this.state.searchField.toLowerCase(),
theArray = this.state.glossaries; // cache this too
for(let i = 0, l = theArray.length; i < l; ++i) {
if(theArray[i].word.toLowerCase().includes(searchField)) {
filteredWords.push(theArray[i]);
}
}
}
Edit:
If you want to search for glossaries whose word starts with searchField, use indexOf === 0 instead of includes as the condition, like this:
if(theArray[i].word.toLowerCase().indexOf(searchField) === 0) {

Related

Change position in for javascript

Good morning, and sorry in advance for my English. I'm trying to use a double loop to iterate through two strings. I want the scan of ocrString to start one position later each time, so that I can slide along the string and see whether there are any matches. That is, I want to find matches even when the two strings differ in length, and without sorting them.
let ocrString = "casaidespcasa";
let pattern = "idesp";
let conteo = 0;
checkIDESP(ocrString, pattern);
function checkIDESP(ocrString, pattern) {
let ocrStringSeparado = ocrString.split("");
let patternSeparado = pattern.split("");
for (i = 0; i < ocrStringSeparado.length; i++) {
for (x = 0; x < patternSeparado.length; x++) {
console.log(ocrStringSeparado[i], pattern[x]);
if (ocrStringSeparado[i] == pattern[x]) {
conteo++;
}
}
}
if (conteo <= 3) {
console.log(conteo, "No sé si es un dni");
} else {
console.log(conteo, "es un dni");
}
}
Some way to go through the position of an array so that it first starts with 'Casaidespcasa' and then 'Asaidespcasa' etc.
This won't fully answer your question (which I don't completely understand, by the way).
Now for the last part:
"Some way to go through the position of an array so that it first starts with 'Casaidespcasa' and then 'Asaidespcasa' etc."
Perhaps this will help you solve your problem.
let ocrString = "casaidespcasa";
let ocrStringSeparado = ocrString.split("");
decreaseArr(ocrStringSeparado);
decreaseStr(ocrString);
function decreaseArr(arr) {
console.log(arr);
arr.shift();
// do something...
if (arr.length > 0) {
decreaseArr(arr);
}
}
function decreaseStr(str) {
console.log(str);
str = str.substring(1);
// do something...
if (str.length > 0) {
decreaseStr(str);
}
}
First function is with array, second with string.
Well, maybe the following would work for you?
const string = "casaidespcasa";
function comp(str, pat){
let pl=pat.length, sl=str.length, res=[];
for (let i=0; i<=sl-pl; i++){
let s=str.slice(i,i+pl); // get a portion of the original string
let n=s.split("").reduce((a,c,j)=>a+(c==pat[j] ? 1 : 0), 0); // count matches
if (n>2) res.push([i,s]); // at least 3 character must match!
}
return res;
}
// do the string comparison with an array of patterns:
["idesp","1detp","deaspdc","cosa","asaic"].forEach(p=>console.log(p+":",comp(string,p)))
The function returns an array of possible "fuzzy" matches: Each entry is an array, containing the position and the matching substring.

Remove duplicates and mirror from [x,y] coordinates

let x=[1,2,6,3,5,5,5,4,4];
let y=[3,4,3,5,2,4,4,2,6];
expected_x=[1,2,6,3,5,5,4]
expected_y=[3,4,3,5,2,4,6]
Think of x and y as coordinates.[1,3] will be first point and [4,6] will be last point.
If a point [X,Y] has duplicates, only one copy of [X,Y] should appear in the expected output (no duplicates). The same applies to mirrors: if both [X,Y] and its mirror [Y,X] are present (each point being the pair of values at the same index of x and y), only the first of them should be kept.
This is the code I have written to make a single array unique. However, I am unsure how to use it with 2 separate arrays representing x and y coordinates. Any help will be appreciated :)
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [...new Set(chars)];
console.log(uniqueChars);
Use this:
let x=[1,2,6,3,5,5,5,4,4];
let y=[3,4,3,5,2,4,4,2,6];
const coordinates = [];
let i = -1;
while ( ++i < x.length ) { // compare against length so a 0 coordinate does not end the loop
const c = {
index: i,
value: [x[i], y[i]]
}
coordinates.push(c);
}
const coordArray = coordinates.reduce((p, next) => {
if (!p.values.includes(JSON.stringify(next.value)) && !p.values.includes(JSON.stringify([...next.value].reverse()))) {
p.values.push(JSON.stringify(next.value));
p.indexes.push(next.index);
}
return p;
},{
indexes: [],
values: []
})
coordArray.values = coordArray.values.map(JSON.parse)
console.log(coordArray)
You can use a for loop and iterate both arrays together, since they have the same length (being an x,y pair) to each other.
You can also keep a "history" of duplicates and mirrors. Then all you need to do while iterating is check the history. If there is no match, append the current to the result arrays, then update the history.
let x=[1,2,6,3,5,5,5,4,4];
let y=[3,4,3,5,2,4,4,2,6];
let h=[]; // history
let rx = []; // result x
let ry = []; // result y
for (let i = 0; i < x.length && i < y.length; i++) {
// The includes() check below would be convenient, but it always returns false:
// arrays are compared by reference, and [x[i], y[i]] is a brand-new array each time.
// So we search the history manually instead.
// if (h.includes([x[i], y[i]]) || h.includes([y[i], x[i]])) {
let found = false;
for (let s = 0; s < h.length; s++) {
// check for duplicate
if (h[s][0] == x[i] && h[s][1] == y[i]) {
found = true;
break;
}
// check for mirror
if (h[s][0] == y[i] && h[s][1] == x[i]) {
found = true;
break;
}
}
if (found) {
// do nothing, its a duplicate or mirror
console.log("duplicate or mirror detected on index " + i);
}
else {
// update results
rx.push(x[i]);
ry.push(y[i]);
// update history
h.push([ x[i], y[i] ]);
}
}
console.log("rx: " + rx);
console.log("ry: " + ry);
In short, .includes() would have been nice, but it compares arrays by reference, so a newly created [x, y] array never matches an entry stored in the history, which broke my intended logic. The code above works around that with a literal search of the history, which sets the "found" boolean when a duplicate or mirror exists.
Obviously this code could likely be shortened to fewer than 10 lines, but I wanted to work on it because it was interesting, and the approach demonstrates how regular for loops can be used to solve such "iteration" problems.
Hope it helps.
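As a possible shorter variant of the history idea, a Set of string keys can replace the inner search loop. This is only a sketch: the function name dedupePoints is made up for illustration, and it assumes the coordinates are numbers, so that ordering the pair before building the key makes [X,Y] and [Y,X] share one key.

```javascript
// Remove duplicate and mirrored points from parallel x/y arrays.
// dedupePoints is a hypothetical name; coordinates are assumed to be numbers.
function dedupePoints(x, y) {
  const seen = new Set();
  const rx = [];
  const ry = [];
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    // Order-independent key: [2,4] and [4,2] both become "2,4"
    const key = x[i] <= y[i] ? `${x[i]},${y[i]}` : `${y[i]},${x[i]}`;
    if (seen.has(key)) continue; // duplicate or mirror of an earlier point
    seen.add(key);
    rx.push(x[i]);
    ry.push(y[i]);
  }
  return { x: rx, y: ry };
}

const result = dedupePoints([1,2,6,3,5,5,5,4,4], [3,4,3,5,2,4,4,2,6]);
console.log(result.x); // [1, 2, 6, 3, 5, 5, 4]
console.log(result.y); // [3, 4, 3, 5, 2, 4, 6]
```

The Set lookup avoids rescanning the history for every point, which matters more as the arrays grow.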

Word Break algorithm

I'm trying to implement the "Word Break" algorithm.
Problem:
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
Note:
The same word in the dictionary may be reused multiple times in the segmentation.
You may assume the dictionary does not contain duplicate words.
Example:
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".
My solution:
var wordBreak = function(s, wordDict) {
if(!wordDict || wordDict.length === 0)
return false;
while(wordDict.length > 0 || s.length > 0) {
const word = wordDict.shift();
const index = s.indexOf(word);
if(index === -1) {
return false;
}
s = s.substring(0, index) + s.substring(index+word.length, s.length);
}
return s.length === 0 && wordDict.length === 0 ? true : false;
};
It works for the example (input) above. However it fails for the input below.
Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
Note that you are allowed to reuse a dictionary word.
How can I keep track of the words I have already eliminated and check them at the end? For the input above, the remaining string s contains "apple", which is in the word dictionary, so the output should be true.
Thanks
A simple Javascript solution.
This loops through the wordDict array and checks whether each word exists in str. If a word does not (that is, indexOf returns -1 for it), the function returns false. If every word in the wordDict array is found in the string, it returns true at the end of the for loop.
const wordBreak =(str, wordDict)=>{
if (!wordDict || wordDict.length === 0) return false
for(let i=0; i<wordDict.length; i++){
const dictIndex = str.indexOf(wordDict[i])
if(dictIndex === -1){
return false
}
}
return true
}
This is an interesting problem I met two years ago in a different context, i.e., query tokenization. In my case, the number of words in the dictionary was in the order of several million, therefore a recursive approach looking each time for a different word of the dictionary was not practicable. Furthermore, I needed to apply dynamic programming to solve the task for strict efficiency reasons.
First of all, I suggest using the Aho-Corasick algorithm to find the words within your search string. It finds an arbitrary number of patterns in a string in time linear in the length of the string (plus the number of matches), regardless of how many patterns there are. This avoids the cost of (number of words) times (length of the string) operations, since each separate search for one word needs to scan the entire string.
Luckily, I found a javascript implementation of the algorithm here.
Using the code linked above and dynamic programming to track the words appearing in your string, I wrote the following javascript solution:
function wordBreak(s, wordDict) {
const len = s.length;
const memoization_array_words = new Array(len).fill(null);
const memoization_array_scores = new Array(len).fill(0);
const wordScores = {};
wordDict.forEach(function(word) {
wordScores[word] = 1
});
const automata = new AhoCorasick(wordDict);
const results = automata.search(s);
results.forEach(function(result) {
// result[0] contains the end position
// result[1] contains the list of words ending in that position
const end_pos = result[0];
result[1].forEach(function(word) {
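const prev_end_pos = end_pos - word.length;
// Skip words whose start cannot be reached by an earlier segmentation
if (prev_end_pos >= 0 && memoization_array_words[prev_end_pos] == null) return;
const prev_score = (prev_end_pos == -1) ? 0 : memoization_array_scores[prev_end_pos];
const score = prev_score + wordScores[word];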
if (score > memoization_array_scores[end_pos]) {
memoization_array_words[end_pos] = word;
memoization_array_scores[end_pos] = score;
}
});
});
if (memoization_array_words[len-1] == null) {
return false;
}
const solution = [];
var pos_to_keep = len - 1;
while (pos_to_keep >= 0) {
const word = memoization_array_words[pos_to_keep];
solution.push(word);
pos_to_keep -= word.length;
}
return solution.reverse()
}
where memoization_array_words and memoization_array_scores are filled left to right when we meet a word occurring after a previous one or at the beginning of the string s. The code should be self-explanatory, but if you need any explanation, please write me a comment.
As a plus, I associated a score to each word (here is 1 for simplicity) that allows you to distinguish between the different solutions. For instance, if you associate to each word an importance score, you will end up with the tokenization with the greatest score. In the code above, the tokenization with the highest number of words.
Extended version: I use some to test whether any word in wordDict begins the test string (indexOf === 0). If so, I shorten the string by the length of that word and call the function recursively with the shortened string. Otherwise the string cannot be split and I return false. This continues until a failure occurs or the length of the string reaches 0, in which case the segmentation succeeded.
Remark: The error for inputs where the word break is ambiguous, such as s = "cars" with wordDict = ["car","ca","rs"], is now fixed. To handle it, the algorithm is called recursively inside the some method, so if one path stops before the end, I go back and search for alternatives until I find one or there is no possibility left.
Remarks on array.some
A break cannot be used inside array.forEach without ugly tricks (like try...catch and throwing an error), so I could have used the classic for loop instead. But there is the array.some method, which loops like forEach but stops and returns true as soon as the callback returns true for one element.
Example:
const array = [1, 2, 3, 4, 5];
// checks whether an element is even
const even = (element) => element % 2 === 0;
console.log(array.some(even));
Here is the code of the working algorithm.
var wordBreak = function(s, wordDict) {
  if (!wordDict || wordDict.length === 0) return false;
  // An empty string means every character has been consumed
  if (s.length === 0) return true;
  // some() succeeds for the first word that is a prefix of s and whose remainder
  // can itself be segmented; moving on to the next word is the backtracking step
  return wordDict.some(word =>
    s.indexOf(word) === 0 && wordBreak(s.substring(word.length), wordDict)
  );
};
s = "leetcode"; wordDict = ["leet", "code"];
console.log(wordBreak(s, wordDict));
s = "applepenapple"; wordDict = ["apple", "pen"];
console.log(wordBreak(s, wordDict));
s= "cars"; wordDict = ["car","ca","rs"];
console.log(wordBreak(s, wordDict));
function wordBreak(dict, str){
if (!str){
return true;
}
for (const word of dict){
if (str.startsWith(word) && wordBreak(dict, str.substring(word.length))){
// only stop once the rest of the string can be segmented too
return true;
}
}
return false;
}
You could also probably optimize the loop over dict by pre-sorting the array and using binary search, but hopefully this gets the point across.
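Another possible optimization, sketched here under the assumption that the same suffix of the string may be reached through different word choices, is to memoize the result per start index so that each suffix is examined only once. The name wordBreakMemo is hypothetical; it keeps the (dict, str) argument order used above.

```javascript
// Memoized recursion over start indices (a sketch, not a tuned implementation).
function wordBreakMemo(dict, str) {
  const words = new Set(dict);
  const memo = new Map(); // start index -> can str.slice(start) be segmented?

  function canBreak(start) {
    if (start === str.length) return true;
    if (memo.has(start)) return memo.get(start);
    let ok = false;
    for (const word of words) {
      // Skip empty words so the recursion always makes progress
      if (word.length > 0 && str.startsWith(word, start) && canBreak(start + word.length)) {
        ok = true;
        break;
      }
    }
    memo.set(start, ok);
    return ok;
  }

  return canBreak(0);
}

console.log(wordBreakMemo(["leet", "code"], "leetcode"));      // true
console.log(wordBreakMemo(["apple", "pen"], "applepenapple")); // true
console.log(wordBreakMemo(["car", "ca", "rs"], "cars"));       // true
```

With the memo, the work is bounded by roughly (string length) times (number of words) prefix checks instead of growing exponentially on adversarial inputs.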
If you'd be looking for a Dynamic Programming solution, we'd use an array for recording, and then we'd loop through and keep track of the word.
This'll pass through in JavaScript:
const wordBreak = function(s, wordDict) {
const len = s.length
const dp = new Array(len + 1).fill(false)
dp[0] = true
for (let i = 1; i < len + 1; i++) {
for (let j = 0; j < i; j++) {
if (dp[j] === true && wordDict.includes(s.slice(j, i))) {
dp[i] = true
break
}
}
}
return dp[s.length]
}
In Python, we would have used a list (which is similar to an array of JavaScript) with the same size as our string:
class Solution:
    def wordBreak(self, s, words):
        dp = [False] * len(s)
        for i in range(len(s)):
            for word in words:
                k = i - len(word)
                if word == s[k + 1:i + 1] and (dp[k] or k == -1):
                    dp[i] = True
        return dp[-1]
Similarly in Java, we'd have used a boolean[]:
public final class Solution {
public static final boolean wordBreak(
String s,
List<String> words
) {
if (s == null || s.length() == 0) {
return false;
}
final int len = s.length();
boolean[] dp = new boolean[len];
for (int i = 0; i < len; i++) {
for (int j = 0; j <= i; j++) {
final String sub = s.substring(j, i + 1);
if (words.contains(sub) && (j == 0 || dp[j - 1])) {
dp[i] = true;
break;
}
}
}
return dp[len - 1];
}
}
Here is LeetCode's DP solution:
public class Solution {
public boolean wordBreak(String s, List<String> wordDict) {
Set<String> wordDictSet=new HashSet(wordDict);
boolean[] dp = new boolean[s.length() + 1];
dp[0] = true;
for (int i = 1; i <= s.length(); i++) {
for (int j = 0; j < i; j++) {
if (dp[j] && wordDictSet.contains(s.substring(j, i))) {
dp[i] = true;
break;
}
}
}
return dp[s.length()];
}
}
References
For additional details, please see the Discussion Board, where you can find plenty of well-explained accepted solutions in a variety of languages, including efficient algorithms and asymptotic time/space complexity analysis.

Simplest way of finding mode in Javascript [duplicate]

I am a beginner in JavaScript and I was trying to write code for finding the mode. My code runs, but it can find the mode only when the repeated values appear consecutively. For an array like a = [1,2,3,4,5,2], it cannot find the mode.
As I am a beginner, I do not want to write anything complex; I want to learn the simplest way. Can anyone please help me with this?
list = [1,2,3,4,5,6,7,7]
var empty = []
i = 0
max = 0
while (i<list.length){
if (list[i]==list[i+1]){
empty = list[i]
i += 1
}else{
i +=1
}
}
document.write(empty)
Your code assumes that the parameter array is pre-sorted which is a risky and limiting assumption, and only appears to work on sorted arrays (counterexample: [1,1,1,7,7] incorrectly reports 7 as the mode).
If you wish to persist with this approach, you're on the right track, but you'll need to keep track of the current/best streaks and the current/best elements, and perform a final check for the longest streak before returning the result:
var mode = a => {
a = a.slice().sort((x, y) => x - y);
var bestStreak = 1;
var bestElem = a[0];
var currentStreak = 1;
var currentElem = a[0];
for (let i = 1; i < a.length; i++) {
if (a[i-1] !== a[i]) {
if (currentStreak > bestStreak) {
bestStreak = currentStreak;
bestElem = currentElem;
}
currentStreak = 0;
currentElem = a[i];
}
currentStreak++;
}
return currentStreak > bestStreak ? currentElem : bestElem;
};
console.log(mode([1,2,3,4,5,6,7,7]));
console.log(mode([1,1,1,4,5,6,7,7]));
console.log(mode([1,2,3,3,3,6,3,7]));
console.log(mode([1,3,3,4,5,2,2,1]));
console.log(mode([]));
Having said that, sorting is a non-linear operation, so I recommend trying another approach.
The idea is to keep a count of occurrences for each item in the array using an object, then take the element with the highest count. I used reduce to perform these two operations:
const mode = a =>
Object.values(
a.reduce((count, e) => {
if (!(e in count)) {
count[e] = [0, e];
}
count[e][0]++;
return count;
}, {})
).reduce((a, v) => v[0] < a[0] ? a : v, [0, null])[1];
console.log(mode([1,2,3,4,5,6,7,7]));
console.log(mode([1,1,1,4,5,6,7,7]));
console.log(mode([1,2,3,3,3,6,3,7]));
console.log(mode([1,3,3,4,5,2,2,1]));
console.log(mode([]));
Or, the same thing, written without reduce for readability:
const mode = a => {
const count = {};
a.forEach(e => {
if (!(e in count)) {
count[e] = 0;
}
count[e]++;
});
let bestElement;
let bestCount = 0;
Object.entries(count).forEach(([k, v]) => {
if (v > bestCount) {
bestElement = k;
bestCount = v;
}
});
return bestElement;
};
console.log(mode([1,2,3,4,5,6,7,7]));
console.log(mode([1,1,1,4,5,6,7,7]));
console.log(mode([1,2,3,3,3,6,3,7]));
console.log(mode([1,3,3,4,5,2,2,1]));
console.log(mode([]));
Note that these approaches don't choose the same mode in case of ties. You may wish to add an array to keep track of all modes, or change your algorithm to pick the first or last occurring mode to suit your needs.
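One way to handle ties, sketched below, is to return every value that reaches the highest count. The function name modes is hypothetical. It uses a Map rather than a plain object, which also keeps the values in their original type (object keys would turn numbers into strings).

```javascript
// Return all values that occur most often, in order of first appearance.
function modes(a) {
  const counts = new Map();
  let best = 0;
  for (const e of a) {
    const c = (counts.get(e) || 0) + 1;
    counts.set(e, c);
    if (c > best) best = c;
  }
  const result = [];
  for (const [value, c] of counts) {
    if (c === best) result.push(value);
  }
  return result;
}

console.log(modes([1,2,3,4,5,6,7,7])); // [7]
console.log(modes([1,3,3,4,5,2,2,1])); // [1, 3, 2]
console.log(modes([]));                // []
```

Picking the first or last occurring mode then becomes a matter of taking the first or last element of the returned array.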
use a hash
list = [1,2,3,4,5,6,7,7]
counts = {}
list.forEach(function(e) {
if(counts[e] === undefined) {
counts[e] = 0
}
counts[e] += 1
})
which results in this:
{1:1,2:1,3:1,4:1,5:1,6:1,7:2}
This related question deals with finding the max and min in a hash, which is essentially what you do at the end of this.
Fast way to get the min/max values among properties of object

Find smallest substring containing a given set of letters in a larger string

Say you have the following string:
FJKAUNOJDCUTCRHBYDLXKEODVBWTYPTSHASQQFCPRMLDXIJMYPVOHBDUGSMBLMVUMMZYHULSUIZIMZTICQORLNTOVKVAMQTKHVRIFMNTSLYGHEHFAHWWATLYAPEXTHEPKJUGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNT
LDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFY
FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQ
XBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGR
AMELUTEPYILBIUOCKKUUBJROQFTXMZRLXBAMHSDTEKRRIKZUFNLGTQAEUINMBPYTWXULQNIIRXHHGQDPENXAJNWXULFBNKBRINUMTRBFWBYVNKNKDFR
I'm trying to find the smallest substring containing the letters ABCDA.
I tried a regex approach.
console.log(str.match(/[A].*?[B].*?[C].*?[D].*?[A]/gm).sort((a, b) => a.length - b.length)[0]);
This works, but it only finds substrings where the letters of ABCDA appear in that order. It won't find a substring where the letters appear in a different order, such as BCDAA.
I'm trying to change my regex to account for this. How would I do that without using | and typing out all the different cases?
You can't.
Let's consider a special case: Assume the letters you are looking for are A, A, and B. At some point in your regexp there will certainly be a B. However, the parts to the left and to the right of the B are independent of each other, so you cannot refer from one to the other. How many As are matched in the subexpression to the right of the B depends on the number of As being already matched in the left part. This is not possible with regular expressions, so you will have to unfold all the different orders, which can be many!
Another popular example that illustrates the problem is to match opening brackets with closing brackets. It's not possible to write a regular expression asserting that in a given string a sequence of opening brackets is followed by a sequence of closing brackets of the same length. The reason for this is that to count the brackets you would need a stack machine in contrast to a finite state machine but regular expressions are limited to patterns that can be matched using FSMs.
This algorithm doesn't use a regex, but found both solutions as well.
var haystack = 'FJKAUNOJDCUTCRHBYDLXKEODVBWTYPTSHASQQFCPRMLDXIJMYPVOHBDUGSMBLMVUMMZYHULSUIZIMZTICQORLNTOVKVAMQTKHVRIFMNTSLYGHEHFAHWWATLYAPEXTHEPKJUGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNTLDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFYFFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQXBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGRAMELUTEPYILBIUOCKKUUBJROQFTXMZRLXBAMHSDTEKRRIKZUFNLGTQAEUINMBPYTWXULQNIIRXHHGQDPENXAJNWXULFBNKBRINUMTRBFWBYVNKNKDFR';
var needle = 'ABCDA'; // the order of letters doesn't matter
var letters = {};
needle.split('').forEach(function(ch) {
letters[ch] = letters[ch] || 0;
letters[ch]++;
});
var shortestSubstringLength = haystack.length;
var shortestSubstrings = []; // storage for found substrings
var startingPos = 0;
var length;
var currentPos;
var notFound;
var letterKeys = Object.keys(letters); // unique letters
var lettersLeft, posStart, posEnd; // declared here to avoid implicit globals
do {
lettersLeft = JSON.parse(JSON.stringify(letters)); // copy letters count object
notFound = false;
posStart = haystack.length;
posEnd = 0;
letterKeys.forEach(function(ch) {
currentPos = startingPos;
while (!notFound && lettersLeft[ch] > 0) {
currentPos = haystack.indexOf(ch, currentPos);
if (currentPos >= 0) {
lettersLeft[ch]--;
posStart = Math.min(currentPos, posStart);
posEnd = Math.max(currentPos, posEnd);
currentPos++;
} else {
notFound = true;
}
}
});
if (!notFound) {
length = posEnd - posStart + 1;
startingPos = posStart + 1; // starting position for next iteration
}
if (!notFound && length === shortestSubstringLength) {
shortestSubstrings.push(haystack.substr(posStart, length));
}
if (!notFound && length < shortestSubstringLength) {
shortestSubstrings = [haystack.substr(posStart, length)];
shortestSubstringLength = length;
}
} while (!notFound);
console.log(shortestSubstrings);
It may not be as clear as a regex could be (well, for me regexes are never really clear :D), but you can use brute force (not so brute).
Create an index of the "valid" points of your string (those holding the letters you want) and iterate over it with a double loop, extracting substrings that contain at least 5 of those points and checking whether they are valid solutions. It may not be the most efficient way, but it is easy to implement, to understand, and probably to optimize.
var haystack="UGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNTLDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFYFFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQXBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGR";
var needle="ABCD";
var size=haystack.length;
var candidate_substring="";
var minimal_length=size;
var solutions=new Array();
var points=Array();
for(var i=0;i<size;i++){
if(needle.indexOf(haystack[i])>-1) points.push(i);
}
var limit_i= points.length-4;
var limit_k= points.length;
for (var i=0;i<limit_i;i++){
for(var k=i;k<limit_k;k++){
if(points[k]-points[i]+1<=minimal_length){
candidate_substring=haystack.substr(points[i],points[k]-points[i]+1);
if(is_valid(candidate_substring)){
solutions.push(candidate_substring);
if(candidate_substring.length < minimal_length) minimal_length=candidate_substring.length;
}
}
}
}
document.write('<p>Solution length:'+minimal_length+'<p>');
for(var i=0;i<solutions.length;i++){
if(solutions[i].length<=minimal_length) document.write('<p>Solution:'+solutions[i]+'<p>');
}
function is_valid(candidate_substring){
//verify we've got all characters
for(var j=0;j<needle.length;j++){
if(candidate_substring.indexOf(needle.charAt(j))<0) return false;
}
//...and verify we have two "A"
if(candidate_substring.indexOf("A")==candidate_substring.lastIndexOf("A")) return false;
return true;
}
Just had this problem in an interview as a coding assignment and came up with another solution, (it's not as optimal as the one above but maybe it's easier to understand).
function MinWindowSubstring(strArr) {
const N = strArr[0];
const K = strArr[1];
const letters = {};
K.split('').forEach( (character) => {
letters[character] = letters[character] ? letters[character] + 1 : 1;
});
let possibleSequencesList = [];
const letterKeys = Object.keys(letters);
for(let i=0; i< N.length; i++) {
const char = N[i];
if (letterKeys.includes(char)) {
// found a character in the string
// update all previous sequences
possibleSequencesList.forEach((seq) => {
if(!seq.sequenceComplete) {
seq[char] = seq[char]-1;
seq.lastIndex = i;
// check if sequence is complete
var sequenceComplete = true;
letterKeys.forEach( (letter) => {
if(seq[letter] > 0) {
sequenceComplete = false;
}
});
seq.sequenceComplete = sequenceComplete
}
})
// create a new sequence starting from it
const newSeq = {
startPoint: i,
lastIndex: i,
sequenceComplete: false,
...letters
}
newSeq[char] = newSeq[char]-1;
possibleSequencesList.push(newSeq);
}
}
// cleanup sequences
let sequencesList = possibleSequencesList.filter(sequence => sequence.sequenceComplete);
let output = [];
let minLength = N.length;
// find the smallest one
sequencesList.forEach( seq => {
if( (seq.lastIndex - seq.startPoint) < minLength) {
minLength = seq.lastIndex - seq.startPoint;
output = N.substring(seq.startPoint, seq.lastIndex + 1);
}
})
return output;
}
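For comparison, this problem is commonly solved with a sliding window in a single pass over the string. The sketch below is one such version, not the approach of any answer above; the name minWindow is hypothetical, and it assumes both strings contain simple characters such as the uppercase letters in the question.

```javascript
// Sliding-window search for the shortest substring of haystack that contains
// every letter of needle (with multiplicity), in any order.
function minWindow(haystack, needle) {
  if (needle.length === 0) return '';
  const need = new Map(); // letter -> how many more of it the window needs
  for (let i = 0; i < needle.length; i++) {
    need.set(needle[i], (need.get(needle[i]) || 0) + 1);
  }
  let missing = needle.length; // letters still required in the window
  let bestStart = 0;
  let bestLen = Infinity;
  let left = 0;
  for (let right = 0; right < haystack.length; right++) {
    const ch = haystack[right];
    if (need.has(ch)) {
      if (need.get(ch) > 0) missing--;
      need.set(ch, need.get(ch) - 1); // may go negative for surplus letters
    }
    // Shrink from the left while the window still covers the needle
    while (missing === 0) {
      if (right - left + 1 < bestLen) {
        bestLen = right - left + 1;
        bestStart = left;
      }
      const out = haystack[left];
      if (need.has(out)) {
        need.set(out, need.get(out) + 1);
        if (need.get(out) > 0) missing++;
      }
      left++;
    }
  }
  return bestLen === Infinity ? '' : haystack.slice(bestStart, bestStart + bestLen);
}

console.log(minWindow('XAYBCDAZ', 'ABCDA')); // "AYBCDA"
console.log(minWindow('BCDAA', 'ABCDA'));    // "BCDAA"
```

Each index enters and leaves the window at most once, so the running time is linear in the length of the haystack. When several windows share the minimal length, this version returns the first one.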
