JavaScript slow operations async - javascript

In my page, I have some logic that searches through a large array of items by a given term.
These items are a large list of data held on the client side.
For example, I search 'user' and it filters out 'user.edit', 'user.save' and so on.
When the list gets very large, it blocks the search input.
When I type 'a' it starts searching, and anything else I type only gets rendered once the filtering is complete.
I want the data to be on the client side for a couple of reasons.
Is there a good way to solve this issue?
My current suggestions are these:
1) Filter the items in batches of 2000 (or 5000, whatever makes sense).
Filter the first 2000 records and show the filtered results, then filter the next 2000, show them, and so on until all items have been iterated.
2) Using setTimeout() for each batch - but this could cause an issue because I don't know how long it will take to filter each batch.
3) Using setImmediate - "This method is used to break up long-running operations" - it could be a solution, but it is non-standard and I don't know whether it will break at some point in the future.
4) Using promises somehow - I think this will be hard because the code is synchronous (it uses indexOf, for example), so with or without promises it will still block the UI.
Can you recommend something? I'd like to avoid large libraries and web workers.
Thank you.

This does sound like a good use case for web workers. As they run off the main thread, the search would not block user interaction.
If I understand correctly, the data is already loaded and it's searching the large dataset that is causing the delay. If this is correct:
I think the general answer to your question is to use better data structures and algorithms to reduce the complexity.
Assuming the items don't need to match the term exactly, but simply "start with" it:
You could store the data in a trie, walk it down to the node for the given prefix, and return all of that node's children.
If the data is ordered, you could implement a variation of binary search to find the index range of matching elements.
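For illustration, here is a minimal sketch of the trie idea (buildTrie, collectWords and searchPrefix are names invented for this example):

function buildTrie(words) {
  var root = {};
  for (var i = 0; i < words.length; i++) {
    var node = root;
    for (var j = 0; j < words[i].length; j++) {
      var ch = words[i][j];
      node = node[ch] || (node[ch] = {});
    }
    node.$ = true; // marks the end of a word
  }
  return root;
}

function collectWords(node, prefix, out) {
  if (node.$) out.push(prefix);
  for (var ch in node) {
    if (ch !== '$') collectWords(node[ch], prefix + ch, out);
  }
  return out;
}

function searchPrefix(root, prefix) {
  var node = root;
  for (var i = 0; i < prefix.length; i++) {
    node = node[prefix[i]];
    if (!node) return []; // no item starts with this prefix
  }
  return collectWords(node, prefix, []);
}

// usage: searchPrefix(buildTrie(['user.edit', 'user.save', 'group.add']), 'user')

The lookup cost is then proportional to the prefix length plus the number of matches, instead of a scan over the whole array.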
If the issue is handling the large dataset itself, then yes, loading it progressively would be best. APIs, for example, usually return a next-page token that you use to call them again. You could do something similar: load a batch, process it and, when that completes, run the same operations on the next batch.

Points 1)-4) are all valid, but the main optimization of the search depends on your implementation. For example, if you search for strings starting with a given query, you could build a suffix tree (https://www.geeksforgeeks.org/pattern-searching-using-suffix-tree/) to reduce the complexity.
Also, if you are searching through that array every time the user types a letter, I would debounce (https://underscorejs.org/#debounce) the search function so it is executed only once the user stops typing.
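For example, a hand-rolled debounce combined with the batched filtering from option 1) might look roughly like this (allItems and renderResults are assumed to exist elsewhere; the 2000 chunk size and 250 ms delay are arbitrary choices for this sketch):

function debounce(fn, wait) {
  var timer;
  return function () {
    var args = arguments, self = this;
    clearTimeout(timer);
    timer = setTimeout(function () { fn.apply(self, args); }, wait);
  };
}

function filterInChunks(items, term, chunkSize, onDone) {
  var matches = [], index = 0;
  (function nextChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      if (items[index].indexOf(term) === 0) matches.push(items[index]); // prefix match
    }
    if (index < items.length) {
      setTimeout(nextChunk, 0); // yield to the event loop so the UI stays responsive
    } else {
      onDone(matches);
    }
  })();
}

// allItems and renderResults are assumed to be defined elsewhere in the page
var onInput = debounce(function (term) {
  filterInChunks(allItems, term, 2000, renderResults);
}, 250);

In a real page you would also want to cancel an in-progress run when a new term arrives, but the idea is the same.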

Related

What is the best way to reduce firestore document reads in chat app on page load?

I have a chat app that is charging me a large number of reads on each page load, one for each message shown. I'm trying to figure out a way to reduce that number and optimize for cost, as refreshing the page a few times costs hundreds of reads.
The Firestore pricing documentation says:
For queries other than document reads, such as a request for a list of collection IDs, you are billed for one document read. If fetching the complete set of results requires more than one request (for example, if you are using pagination), you are billed once per request.
I considered that maybe if I fetch an entire collection without a query, as shown in the docs, a cost difference might be remotely possible. I'm sure that's probably wrong, but I can't find anything specifying what the exceptions are that cost only 1 read. It also crossed my mind to create an array to hold the most recent messages in the parent document of the collection, but the security rules for updating that array seem overly complex and not practical. I also read about using the Firebase cache, but that doesn't seem useful here.
Here is code to demonstrate how I'm currently loading messages. I'm using the react-firebase-hooks library to snapshot this data with useCollectionData:
import { query, orderBy, limit } from "firebase/firestore"
import { useCollectionData } from "react-firebase-hooks/firestore"

// messagesRef is a reference to the messages collection
const q = query(messagesRef, orderBy("createdAt", "desc"), limit(100))
const [messages] = useCollectionData(q)
In researching, I found this question where I'm pretty sure the accepted answer is wrong. It did make me question the rules. Are there any strategies to reduce the number of reads for this common use case?
Pagination still incurs charges on a per-document read, right?
Yes, it does, but only when you load more pages.
I'm not trying to load the entire collection, but rather wondering if loading the collection without a query has a different cost than with.
Loading a collection without a query that limits the results means that you're reading the entire collection, and yes, the cost will be much higher. Remember that the cost of reading a collection/query in Firestore equals the number of documents that are actually returned. For example, if you have a collection of 1 million documents and your query returns 100, you pay for only 100 document reads.
I'm overall trying to figure out if there's a strategy that can improve the read cost of the example query I gave.
No. If you need to get the newest 100 messages, that's the best query you can have. The only change you can make to decrease the number of reads would be to change the value that you pass to the limit() function. And maybe it makes sense since a user might not be interested in reading 100 messages at once. Always try to display data that fits into a screen, and load any other data progressively.
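If you later want to let users load older messages on demand instead of raising the limit, a rough sketch using Firestore query cursors could look like this (loadOlderMessages is a made-up name; messagesRef is the same collection reference as in the question):

import { query, orderBy, limit, startAfter, getDocs } from "firebase/firestore"

// Loads the next page of older messages, starting after the last document of the
// previous page; each call is billed only for the documents it actually returns.
async function loadOlderMessages(messagesRef, lastVisibleDoc, pageSize = 50) {
  const q = query(
    messagesRef,
    orderBy("createdAt", "desc"),
    startAfter(lastVisibleDoc),
    limit(pageSize)
  )
  const snapshot = await getDocs(q)
  return {
    messages: snapshot.docs.map((d) => d.data()),
    lastVisible: snapshot.docs[snapshot.docs.length - 1],
  }
}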

How to get total number of pages in DynamoDB if we set a limit?

I have a list of documents that are stored in DynamoDB. There are more than 10,000 documents being stored in the db. I have set the limit to 100 when requesting. DynamoDB is doing a good job by returning me the first 100 documents and the LastEvaluatedKey to get the next 100 documents.
The problem is that I also want DynamoDB to return the total number of pages for pagination purposes. In this case, since I have 10,000 documents and a limit of 100, it should return 100 (the number of pages).
For now, I have been counting manually by looping the queries until no LastEvaluatedKey is returned and adding up how many loops were done to get the total number of pages. But I believe there is a better approach.
As the other answer has correctly explained, there is no efficient way to get total result counts for DynamoDB query or scan operations. As such there is no efficient way to get total number of pages.
However, what I wanted to call out is that modern UIs have been moving away from classic pagination towards an infinite-scroll design pattern, where the "next page" of results is loaded on demand as the list is scrolled. This can be achieved with DynamoDB. You can still show discrete pages, but you cannot show, a priori, how many results or how many pages there are. It's a current shortcoming of DynamoDB.
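As a rough sketch of that on-demand pattern with the AWS SDK for JavaScript v3 (the table name, key names and page size are invented for illustration):

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Fetches one page of documents; pass the lastKey from the previous page
// (or undefined for the first page) to continue where you left off.
async function fetchPage(lastKey) {
  const result = await ddb.send(new QueryCommand({
    TableName: "Documents",               // hypothetical table name
    KeyConditionExpression: "pk = :pk",   // hypothetical partition key
    ExpressionAttributeValues: { ":pk": "DOC" },
    Limit: 100,
    ExclusiveStartKey: lastKey,
  }));
  return { items: result.Items, lastKey: result.LastEvaluatedKey };
}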
Neither "Scan" (to read the entire table) nor "Query" (to read all the items with the same partition key) operations return an estimate on how many total results there are. In some cases (e.g., when a FilterExpression is provided), there is no efficient way for DynamoDB to do this estimation. In other cases there may be a way for DynamoDB to provide this information, but it doesn't.
If I understand you correctly, you want to Scan the entire table, without a filter. Like I said, Scan itself doesn't give you the number of items in the table. But you can find this number using DescribeTable, which returns, among other things, an "ItemCount" attribute, which is an estimate on the total number of items in the table, which may not be completely up-to-date but perhaps is good enough for your needs (if you want an estimate for some sort of progress report, that doesn't need to be 100% accurate).
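As a small sketch of that (assuming a plain DynamoDBClient from the AWS SDK v3; the function name is made up):

import { DescribeTableCommand } from "@aws-sdk/client-dynamodb";

// ItemCount is refreshed by DynamoDB roughly every six hours,
// so the result is an estimate, not an exact page count.
async function estimatePageCount(client, tableName, pageSize) {
  const { Table } = await client.send(new DescribeTableCommand({ TableName: tableName }));
  return Math.ceil(Table.ItemCount / pageSize);
}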
If you really need accurate and up-to-the-moment counts of items in a partition or an entire table, you can always try to maintain such counters as separate data. Doing this correctly is not trivial and has performance implications, but in some use cases (e.g., rare writes and a lot of reads) it may be useful.
You can maintain your own counts using DynamoDB Streams - basically you create a Lambda function that watches for items being created/deleted and then writes back to a counter item in DynamoDB that stores the current count of items.
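A minimal sketch of such a stream-triggered Lambda (the counter table and key names are made up, and it assumes the table's stream is configured to invoke this function):

import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Triggered by the table's DynamoDB Stream; adjusts a counter item
// by the net number of inserts minus removes in each batch of records.
export const handler = async (event) => {
  let delta = 0;
  for (const record of event.Records) {
    if (record.eventName === "INSERT") delta += 1;
    if (record.eventName === "REMOVE") delta -= 1;
  }
  if (delta !== 0) {
    await client.send(new UpdateItemCommand({
      TableName: "Counters",                    // hypothetical counter table
      Key: { counterId: { S: "documents" } },   // hypothetical key
      UpdateExpression: "ADD itemCount :d",
      ExpressionAttributeValues: { ":d": { N: String(delta) } },
    }));
  }
  return { processed: event.Records.length };
};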

Alfresco - Attempting to delete files, but Lucene search keeps returning same 1000 results including deleted nodes

Alfresco Version 3.3 -
I'm writing a JavaScript program in the Alfresco Repository to delete all the archived files (similar place to Windows' Recycle Bin). I've found that the Lucene search only returns 1000 nodes. So, my approach was to delete them and do the same search again to hopefully get another 1000 nodes and loop it until there were no search results. However, it returns the same 1000 results after I deleted from the first result. I've tried putting longer and longer pauses before doing the query again in case Lucene needed time to re-index after the deletes, even as long as five minutes. If I run the same script again it will successfully find 1000 existing nodes and delete them, but nothing past that.
My guess is that either there is a transaction linked to the entire JavaScript execution or that the search object caches the search and returns the same results when the same query is executed again.
Has anyone experienced this? Is there a way to get the search to work the second time in the same JavaScript execution?
Here's a snippet of trying to delete 2000 nodes:
var query = 'ASPECT:"sys:archived"';

// First batch: the search returns at most 1000 nodes
var results = search.luceneSearch('archive://SpacesStore/', query);
for (var i = 0; i < results.length; i++) {
    if (search.findNode(results[i].nodeRef) != null) {
        results[i].remove();
    }
}

// Second batch: re-run the same query, expecting the next 1000 nodes
results = search.luceneSearch('archive://SpacesStore/', query);
for (var i = 0; i < results.length; i++) {
    if (search.findNode(results[i].nodeRef) != null) {
        results[i].remove();
    }
}
You're running into how Alfresco handles transactions when working in JavaScript. It bundles all of the changes made inside a single JavaScript execution into one transaction, and until that execution exits, the changes are not committed and therefore the index is not updated.
As far as I know, the only way to batch cleanup of the archive is to use something like Java, where you have explicit control of the transaction. This can be further complicated if you are using something like SOLR, because indexing is asynchronous. Since you said you're on 3.3 you must be using Lucene, so this wouldn't be an issue, because the API won't return control to you until the index changes are committed.
Two tips for trashcan management: apply the sys:temporary aspect if you're doing deletes programmatically, so you can skip the trashcan step entirely (a small sketch of this follows below). There is also a wonderful Java-based AMP which does trashcan cleanup in the background:
https://addons.alfresco.com/addons/trashcan-cleaner
It should work for your version.
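For the first tip, a minimal sketch in the repository JavaScript API (assuming you are deleting live nodes before they ever reach the archive; the path below is just an example):

// Hypothetical example; in practice you would already hold the node to delete.
var node = companyhome.childByNamePath("Sites/example/documentLibrary/old-file.txt");

// sys:temporary makes the delete bypass the archive store,
// so the node never ends up in the trashcan at all.
node.addAspect("sys:temporary");
node.remove();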
Best regards,
Matthew

Node.js JSON.parse on object creation vs. with getter property

This is largely an 'am I doing it right / how can I do this better' kind of topic, with some concrete questions at the end. If you have other advice / remarks on the text below, even if I didn't specifically ask those questions, feel free to comment.
I have a MySQL table for users of my app that, along with a set of fixed columns, also has a text column containing a JSON config object. This is to store variable configuration data that cannot be stored in separate columns because it has different properties per user. There doesn't need to be any lookup / ordering / anything on the configuration data, so we decided this would be the best way to go.
When querying the database from my Node.JS app (running on Node 0.12.4), I assign the JSON text to an object and then use Object.defineProperty to create a getter property that parses the JSON string data when it is needed and adds it to the object.
The code looks like this:
user =
  uid: results[0].uid
  _c: results[0].user_config # JSON config data as string

Object.defineProperty user, 'config',
  get: ->
    @c = JSON.parse @_c if not @c?
    return @c
Edit: above code is Coffeescript, here's the (approximate) Javascript equivalent for those of you who don't use Coffeescript:
var user = {
  uid: results[0].uid,
  _c: results[0].user_config // JSON config data as string
};

Object.defineProperty(user, 'config', {
  get: function() {
    if (this.c === undefined) {
      this.c = JSON.parse(this._c);
    }
    return this.c;
  }
});
I implemented it this way because parsing JSON blocks the Node event loop, and the config property is only needed about half the time (this is in a middleware function for an Express server), so this way the JSON is only parsed when it is actually needed. The config data itself can range from 5 to around 50 different properties organised in a couple of nested objects; not a huge amount of data, but still more than just a few lines of JSON.
Additionally, there are three of these JSON objects (I only showed one since they're all basically the same, just with different data in them). Each one is needed in different scenarios but all of the scenarios depend on variables (some of which come from external sources) so at the point of this function it's impossible to know which ones will be necessary.
So I had a couple of questions about this approach that I hope you guys can answer.
Is there a negative performance impact when using Object.defineProperty, and if yes, is it possible that it could negate the benefit from not parsing the JSON data right away?
Am I correct in assuming that not parsing the JSON right away will actually improve performance? We're looking at a continuously high number of requests and we need to process these quickly and efficiently.
Right now the three JSON data sets come from two different tables JOINed in an SQL query. This is so I only have to do one query per request instead of up to four. Keeping in mind that there are scenarios where none of the JSON data is needed, but also scenarios where all three data sets are needed (and of course scenarios in between), could it be an improvement to only fetch the required JSON data from its table at the point when one of the data sets is actually needed? I avoided this because I feel like waiting for four separate SELECT queries to execute would take longer than waiting for one query with two JOINed tables.
Are there other ways to approach this that would improve the general performance even more? (I know, this one's a bit of a subjective question, but ideas / suggestions of things I should check out are welcome). I'm not looking to spin off parsing the JSON data into a separate thread though, because as our service runs on a cluster of virtualised single-core servers, creating a child process would only increase overall CPU usage, which at high loads would have even more negative impact on performance.
Note: when I say performance it mainly means fast and efficient throughput rates. We prefer a somewhat larger memory footprint over heavier CPU usage.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth
What do I get from that article? Too much time is spent in optimizing with dubious results instead of focusing on design and clarity.
It's true that JSON.parse blocks the event loop, but every synchronous call does - this is just code execution and is not a bad thing.
The root concern is not that it is blocking, but how long it is blocking. I remember a Strongloop instructor saying 10ms was a good rule of thumb for max execution time for a call in an app at cloud scale. >10ms is time to start optimizing - for apps at huge scale. Each app has to define that threshold.
So, how much execution time will your lazy init save? This article says it takes 1.5 s to parse a 15 MB JSON string - about 10,000 bytes/ms. 3 configs × 50 properties × 30 bytes per key-value pair = 4500 bytes - about half a millisecond.
When the time came to optimize, I would look at having your lazy init do the MySQL call. A config is needed only 50% of the time, it won't block the event loop, and an external call to a db absolutely dwarfs a JSON.parse().
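As a rough sketch of that idea, using the callback-style mysql driver that fits Node 0.12 (getConfig, the pool and the table name are hypothetical):

// Fetch and parse the config only when a request actually needs it.
// pool is assumed to be created elsewhere with mysql.createPool(...).
function getConfig(user, callback) {
  if (user.config !== undefined) {
    return callback(null, user.config); // already loaded during this request
  }
  pool.query(
    'SELECT user_config FROM user_configs WHERE uid = ?', // hypothetical table
    [user.uid],
    function (err, rows) {
      if (err) return callback(err);
      user.config = JSON.parse(rows[0].user_config);
      callback(null, user.config);
    }
  );
}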
All of this to say: What you are doing is not necessarily bad or wrong, but if the whole app is littered with these types of dubious optimizations, how does that impact feature addition and maintenance? The biggest problems I see revolve around time to market, not speed. Code complexity increases time to market.
Q1: Is there a negative performance impact when using Object.defineProperty...
Check out this site for a hint.
Q2: ...not parsing the JSON right away will actually improve performance...
IMHO: inconsequentially
Q3: Right now the three JSON data sets come from two different tables...
The majority of the db query cost is usually the out-of-process call and the network data transport (unless you have a really bad schema or config). Getting all the data in one call is the right move.
Q4: Are there other ways to approach this that would improve the general performance
Impossible to tell. The place to start is with an observed behavior, then profiler tools to identify the culprit, then code optimization.

What is the space efficiency of a directed acyclic word graph (dawg)? and is there a javascript implementation?

I have a dictionary of keywords that I want to make available for autocomplete/suggestion on the client side of a web application. The AJAX turnaround introduces too much latency, so it would be nice to store the entire word list on the client.
The list could be hundreds of thousands of words, maybe a couple of million. I did a little bit of research, and it seems that a DAWG structure would provide space and lookup efficiency, but I can't find real-world numbers.
Also, feel free to suggest other possibilities for achieving the same functionality.
I have recently implemented a DAWG for a word-game-playing program. It uses a dictionary of 2.7 million words from the Polish language. The source plain-text file is about 33 MB in size. The same word list represented as a DAWG in a binary file takes only 5 MB. The actual size may vary, as it depends on the implementation, so the number of vertices (154k) and the number of edges (411k) are the more important figures.
Still, that amount of data is far too much for JavaScript to handle, as stated above. Trying to process several MB of data will hang the JavaScript interpreter for a few minutes, effectively hanging the whole browser.
My mind cringes at the two facts "couple of million" and "JavaScript". JS is meant to shuffle little pieces of data around, not megabytes. Just imagine how long users would have to wait for your page to load!
There must be a reason why the AJAX turnaround is so slow in your case. Google serves billions of AJAX requests every day and their type-ahead is snappy (just try it on www.google.com). So there must be something broken in your setup. Find it and fix it.
Your solution sounds practical, but you still might want to look at, for example, jQuery's autocomplete implementation(s) to see how they deal with latency.
A couple of million words in memory (in JavaScript in a browser)? That sounds big regardless of what kind of structure you decide to store it in. You might consider other kinds of optimizations instead, like loading subsets of your wordlist based on the characters typed.
For example, if the user enters "a" then you'd start retrieving all the words that start with "a". Then you could optimize your wordlist by returning more common words first, so the more likely ones will match up "instantly" while less common words may load a little slower.
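A rough sketch of that idea (the URL pattern, bucket cache and renderSuggestions callback are invented for illustration):

// Cache of already-fetched buckets, keyed by the first letter typed.
var buckets = {};

function loadBucket(firstLetter, callback) {
  if (buckets[firstLetter]) {
    return callback(buckets[firstLetter]); // already loaded
  }
  // Hypothetical endpoint serving a pre-built, frequency-sorted word list
  // for each starting letter, e.g. /wordlists/a.json
  fetch('/wordlists/' + firstLetter + '.json')
    .then(function (res) { return res.json(); })
    .then(function (words) {
      buckets[firstLetter] = words;
      callback(words);
    });
}

// usage: on the first keystroke, start fetching that letter's bucket
// loadBucket('a', function (words) { renderSuggestions(words.slice(0, 10)); });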
From my understanding, DAWGs are good for storing and searching for words, but not when you need to generate lists of matches. Once you have located the prefix, you have to walk through all of its children to reconstruct the words that start with that prefix.
I agree with the others: you should consider server-side search.
