Calling ko.toJS on Breeze entity causes slow long running script

Calling ko.toJS on Breeze entity causes slow long running script - javascript

When calling ko.toJS(entity) on any Breeze entity that has navigation properties or is part of a large object graph IE7-8 complain of long running script errors and even chrome can get bogged down for over 1 second.
What gives?!?

I have had this plague me before and seen various forums or Github issues where this became an issue for devs. I wanted to share some knowledge in hopes that it helps others.
The problem
Breeze.js provides your client-side libraries or frameworks with an object graph. Calling JSON.stringify() on the entity will cause the dreaded stack overflow because some of the properties reference each other. Example -
Person -> Company -> Persons
Since a person has a navigation property of company that also has a navigation property of persons they continue to find each other.
ko.toJS() gets around this by detecting duplicates and stopping the serialization at that point.
Why is it slow?
The serialization takes longer depending on the number of entities in the object graph. Don't believe me? Load your app and call ko.toJS() then go load up a bunch more entities in the same object graph. It will take longer, guaranteed. The problem is that even if you call ko.toJS() on a single person Knockout isn't finding duplicates until it has serialized every person in the original persons company. This isn't a problem of Knockout nor Breeze - it is caused by how well interconnected everything is.
What can I do to fix it?
Using a projection query is the best solution, if you can. The problem is usually we need deeper nested levels of entities, not just the current entity. If your entity has a navigation property that has many related entities there is no easy way to perform a projection query, so we need to take it a step further in one of two manners -
Write a custom serializer
Vote for IdeaBlade (the company behind Breeze.js) to support unwrapping more efficiently
My opinion - expose a method on the entity manager to 'unwrap' that accepts the following parameters -
entityManager.unwrap(entity || [entities], [addtlTypesToUnwrap])
Where addtlTypesToUnwrap is either an array of strings or comma separated strings that define which related entity types to unwrap.
Example usage -
var thisEntity = manager.fetchEntityByKey('Person', 1, true);
var relatedProps = ['Company', 'Address']
entityManager.unwrap(thisEntity, relatedProps);
Which would unwrap Person, their address, their company, and the companies address (if it existed)

Related

RESTful API with related resources

I'm in the process of designing my first serious RESTful API, which will sit above a WCF service.
There are resources like; outlet, schedule and job. A schedule is always owned by an outlet, and a schedule will contain 0 or more jobs. A job does not have to be on a schedule.
I keep coming back to thinking that resources should be addressable in the same type of way resources are addressed on a file system. This would mean I'd have URI's like:
/outlets
/outlets/4/schedules
/outlets/4/schedules/1000/jobs
/outlets/4/schedules/1000/jobs/5123
Things start to get messy though when considering how to pull resources back under different situations though.
e.g. I want a job not on a schedule:
/outlets/4/jobs/85 (this now means we've got 2 ways to pull a job back that's on a schedule)
e.g. I want all schedules regardless of outlet:
/schedules or /outlets/ALL/schedules
There are also lots of other more complex requirements but I'm sure you get the gist.
File system's have a good, logical way of addressing resources. You can obviously create symbolic links and achieve something approximating what I describe but it'd be messy. It'll be even messier once things get even slightly more complex, such as adding the ability to get schedules by date:
/outlets/4/2016-08-29/schedules
And without using query string parameters I'm not even sure how I'd request back all jobs that are NOT on a schedule. The following feels wrong because unscheduled is not a resource:
/outlets/4/unscheduled/jobs
So, I'm coming to think that file system type addressing is only going to work for the simplest of services (our underlying system has hundreds of entity types, with some very complex relationships and a huge number of operations).
Having multiple ways of doing the same thing tends to lead to confusion and messy documentation and I want to avoid it. As a result I'm almost forced to opt for going with the lowest common denominator and choosing very simple address forms - like the 3rd one below:
/outlets/4/schedules/1000/jobs/5123
/outlets/4/jobs/5123
/jobs/5123
From these very simple address forms I would then need to expand using query string parameters to do anything more complex, e.g:
/jobs?scheduleId=1000
/jobs?outletId=4
/jobs?outletId=4&fromDate=2016-01-01&toDate=2016-01-31
This feels like it's going against the REST model though and query string parameters like this aren't predictable, so far from the "no docs needed" idea.
OK, so at the minute I'm almost on the side of the fence that is saying in order to get a clean, maintainable API I'm going to have to go with very simple resource addresses, use query string parameters extensively and have good documentation.
Anyway, this doesn't feel like the conclusion I should have arrived at. Where have I gone wrong?

Welcome to the world of REST API programming. These are the hard problems which we all face when trying to apply general principles to specific situations. There is no clear and easy answer to your questions, but here are a few additional tips that may be useful.
First, you're right that the file-system approach to addressing breaks down when you have complex relationships. You'll only want to establish that sort of addressing when there's a true hierarchy there.
For example, if all jobs were part of a single schedule, then it would make sense to look to schedules/{id}/jobs/{id} to get to a given job. If you think of it from a data-storage perspective, you could imagine there being an XML file for each schedule, and the jobs would just be elements within that file.
However, it sounds like in this particular case your data is more relational. From a data-storage perspective, you'd represent each job as a row in a database table, and establish some foreign key relationships to tie some jobs to some schedules. Your addressing scheme should reflect this by making /jobs a top-level endpoint, and using optional query string parameters to filter by schedule or outlet when it makes sense to do so.
So you're on the right track. One more thing you might want to consider is OData, which extends the basic REST principles with a standards-oriented way of representing things like filtering with query string parameters. The address syntax feels a little "out there", but it does a pretty good job of handling the situations where straight REST starts falling apart. And because it's more standardized, there are tools available to help with things like translating from a data layer into an OData endpoint, or generating client-side proxy helpers based on the metadata exposed by that endpoint.
This feels like it's going against the REST model though and query string parameters like this aren't predictable, so far from the "no docs needed" idea.
If you use OData, then its spec combines with the metadata produced by your tooling to become your documentation. For example, your metadata says that a job has a date property which represents a date. Then the OData spec provides a way to represent filter queries for a date value. From this information, consumers can reliably produce a filter query that will "just work" because you're using a framework server-side to do the hard parts. And if they don't feel like memorizing how OData URLs work, they can generate a client proxy in the language of their choice so they can generate the appropriate URL via their favorite syntax.

Node.js JSON.parse on object creation vs. with getter property

This is largely an 'am I doing it right / how can I do this better' kind of topic, with some concrete questions at the end. If you have other advice / remarks on the text below, even if I didn't specifically ask those questions, feel free to comment.
I have a MySQL table for users of my app that, along with a set of fixed columns, also has a text column containing a JSON config object. This is to store variable configuration data that cannot be stored in separate columns because it has different properties per user. There doesn't need to be any lookup / ordering / anything on the configuration data, so we decided this would be the best way to go.
When querying the database from my Node.JS app (running on Node 0.12.4), I assign the JSON text to an object and then use Object.defineProperty to create a getter property that parses the JSON string data when it is needed and adds it to the object.
The code looks like this:
user =
uid: results[0].uid
_c: results[0].user_config # JSON config data as string
Object.defineProperty user, 'config',
get: ->
#c = JSON.parse #_c if not #c?
return #c
Edit: above code is Coffeescript, here's the (approximate) Javascript equivalent for those of you who don't use Coffeescript:
var user = {
uid: results[0].uid,
_c: results[0].user_config // JSON config data as string
};
Object.defineProperty(user, 'config', {
get: function() {
if(this.c === undefined){
this.c = JSON.parse(this._c);
}
return this.c;
}
});
I implemented it this way because parsing JSON blocks the Node event loop, and the config property is only needed about half the time (this is in a middleware function for an express server) so this way the JSON would only be parsed when it is actually needed. The config data itself can range from 5 to around 50 different properties organised in a couple of nested objects, not a huge amount of data but still more than just a few lines of JSON.
Additionally, there are three of these JSON objects (I only showed one since they're all basically the same, just with different data in them). Each one is needed in different scenarios but all of the scenarios depend on variables (some of which come from external sources) so at the point of this function it's impossible to know which ones will be necessary.
So I had a couple of questions about this approach that I hope you guys can answer.
Is there a negative performance impact when using Object.defineProperty, and if yes, is it possible that it could negate the benefit from not parsing the JSON data right away?
Am I correct in assuming that not parsing the JSON right away will actually improve performance? We're looking at a continuously high number of requests and we need to process these quickly and efficiently.
Right now the three JSON data sets come from two different tables JOINed in an SQL query. This is to only have to do one query per request instead of up to four. Keeping in mind that there are scenarios where none of the JSON data is needed, but also scenarios where all three data sets are needed (and of course scenarios inbetween), could it be an improvement to only get the required JSON data from its table, at the point when one of the data sets is actually needed? I avoided this because I feel like waiting for four separate SELECT queries to be executed would take longer than waiting for one query with two JOINed tables.
Are there other ways to approach this that would improve the general performance even more? (I know, this one's a bit of a subjective question, but ideas / suggestions of things I should check out are welcome). I'm not looking to spin off parsing the JSON data into a separate thread though, because as our service runs on a cluster of virtualised single-core servers, creating a child process would only increase overall CPU usage, which at high loads would have even more negative impact on performance.
Note: when I say performance it mainly means fast and efficient throughput rates. We prefer a somewhat larger memory footprint over heavier CPU usage.

We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil
- Donald Knuth
What do I get from that article? Too much time is spent in optimizing with dubious results instead of focusing on design and clarity.
It's true that JSON.parse blocks the event loop, but every synchronous call does - this is just code execution and is not a bad thing.
The root concern is not that it is blocking, but how long it is blocking. I remember a Strongloop instructor saying 10ms was a good rule of thumb for max execution time for a call in an app at cloud scale. >10ms is time to start optimizing - for apps at huge scale. Each app has to define that threshold.
So, how much execution time will your lazy init save? This article says it takes 1.5s to parse a 15MB json string - about 10,000 B/ms. 3 configs, 50 properties each, 30 bytes/k-v pair = 4500 bytes - about half a millisecond.
When the time came to optimize, I would look at having your lazy init do the MySQL call. A config is needed only 50% of the time, it won't block the event loop, and an external call to a db absolutely dwarfs a JSON.parse().
All of this to say: What you are doing is not necessarily bad or wrong, but if the whole app is littered with these types of dubious optimizations, how does that impact feature addition and maintenance? The biggest problems I see revolve around time to market, not speed. Code complexity increases time to market.
Q1: Is there a negative performance impact when using Object.defineProperty...
Check out this site for a hint.
Q2: *...not parsing the JSON right away will actually improve performance...
IMHO: inconsequentially
Q3: Right now the three JSON data sets come from two different tables...
The majority db query cost is usually the out of process call and the network data transport (unless you have a really bad schema or config). All data in one call is the right move.
Q4: Are there other ways to approach this that would improve the general performance
Impossible to tell. The place to start is with an observed behavior, then profiler tools to identify the culprit, then code optimization.

Equivalent of SPContet.Current.ListItem in Client Object Model (ECMAScript)

I'm integrating an external application to SharePoint 2010 by developing custom ribbon tabs, groups, controls and commands that are made available to editors of a SharePoint 2010 site. The ribbon commands use the dialog framework to open dialogs with custom application pages.
In order to pass a number of query string parameters to the custom applications pages, I'm therefore looking for the equivalent of SPContext.Current.ListItem in the Client Object Model (ECMAScript).
Regarding available tokens (i.e. {ListItemId} or {SelectedItemId}) that can be used in the declarative XML, I already emitting all tokens, but unfortunately the desired tokens are not either not parsed or simply null, while in the context of a Publishing Page (i.e. http://domain/pages/page.aspx). Thus, none of the tokes that do render, are of use to establishing the context of the calling SPListItem in the application page.
Looking at the SP.ClientContext.get_current() provides a lot of information about the current SPSite, SPWeb etc. but nothing about the current SPListItem I'm currently positioned at (again, having the page rendered in the context of a Publishing Page).
What I've come up with so far is the idea of passing in the url of the current page (i.e. document.location.href) and parse that in the application page - however, it feels like I'm going in the wrong direction, and SharePoint surely should be able to provide this information.

I'm not sure this is a great answer, or even fully on-topic, but is basically something I originally intended to blog about - anyway:
It is indeed a pain that the Client OM does not seem to provide a method/property with details of the current SPListItem. However, I'd venture to say that this is a simple concept, but actually has quite wide-ranging implications in SharePoint which aren't apparent until you stop to think about it.
Consider:
Although a redirect exists, a discussion post can be surfaced on 2 or 3 different URLs (e.g. Threaded.aspx/Flat.aspx)
Similarly, a blog post can exist on a couple (Post.aspx/EditPost.aspx, maybe one other)
A list item obviously has DispForm.aspx/EditForm.aspx and (sort of) NewForm.aspx
Also for even for items with an associated SPFile (e.g. document, publishing page), consider that these URLs represent the same item:
http://mydomain/sites/someSite/someLib/Forms/DispForm.aspx?ID=x, http://mydomain/sites/someSite/someLib/Filename.aspx
Also, there could be other content types outside of this set which have a similar deal
In our case, we wanted to 'hang' data off internal and external items (e.g. likes, comments). We thought "well everything in SharePoint has a URL, so that could be a sensible way to identify an item". Big mistake, and I'm still kicking myself for falling into it. It's almost like we need some kind of 'normalizeUrl' method in the API if we wanted to use URLs in this way.
Did you ever notice the PageUrlNormalization class in Microsoft.SharePoint.Utilities? Sounds promising doesn't it? Unfortunately that appears to do something which isn't what I describe above - it doesn't work across the variations of content types etc (but does deal with extended web apps, HTTP/HTTPS etc).
To cut a long story short, we decided the best approach was to make the server emit details which allowed us to identify the current SPListItem when passed back to the server (e.g. in an AJAX request). We hide the 'canonical' list item ID in a JavaScript variable or hidden input field (whatever really), and these are evaluated when back at the server to re-obtain the list item. Not as efficient as obtaining everything from context, but for us it's OK because we only need to resolve when the user clicks something, not on every page load. By canonical, I mean:
SiteID|WebID|ListID|ListItemID
IIRC, one of the key objects has a CanonicalId property (or maybe it's internal), which may help you build such a string.
So in terms of using the window.location.href, I'd avoid that if you're in vaguely the same situation as us. Suggest considering an approach similar to the one we used, but do remember that there are some locations (e.g. certain forms) where even on the server SPContext.Current.ListItem is null, despite the fact that SPContext.Current.Web (and possibly SPContext.Current.List) are populated.
In summary - IDs are your friend, URLs are not.

Ext JS Stores and Complex Objects

I recall reading developers should think of records in a store as a row in a database where each column is a simple data type. We store complex JS objects in Ext JS stores without any apparent ramifications.
Does anybody know of pitfalls with storing JS objects in an Ext JS store?

Modern web browsers have gotten very good at managing this kind of memory usage and processing speed. Internally we had implemented the sort of record associations that extjs 4 has built-in now, and have scenarios with ~250k complex nested records stored without any real problem. I believe that the negligible impact on performance would continue to hold up for a long time, as it also is pretty good about cleaning up its own memory usage after itself. We mirrored our web server's ORM models to extjs record definitions, and regularly query against these nested stores in much of the same ways that we would hit a more traditional database. You have to be careful what you do with it, e.g., trying to render a grid of 250k records at once will not work out very well. But that is almost entirely the impact of dom rendering and not the iteration or storage of the record/store data. All of this seems to be even more true when testing with the recent extjs 4 beta releases.

Looking at the Ext JS source it seems like a Store is a wrapper around an Object which offers sorting / filter and event functionality. It's a simple key/value pair.
The complexity of the object should not cause any problems.
Now treating a Store as anything but a wrapper around an key/value pair collection is going to cause problems. Thinking it's like a table is going to cause problems. That kind of misunderstanding leads to poorly designed code.
A Store should be treated as a Bag of key/value data with helper methods to organise that bag.

I suppose there can potentially be a performance impact if your JS objects are very large, it could take some time if you end up doing a lot of serialization/deserializtion of those JS objects. If you are dealing with grids, you can mitigate these by use of pagination.
Stores are not necessarily used strictly for grid rows. They are used in many Ext objects such as comboboxes (drop down menus). Here it is used with a key/value pair. Usually this is done for a value and a displayValue relationship for the data.
If you need an even lower level object to work with, check out the Ext.util.MixedCollection object. There are lots of fun stuff in there. It's basically a hashmap of key/value pairs. I believe in the Ext source code, Stores use these objects at its core.

CouchDB: How to change view function via javascript?

I am playing around with CouchDB to test if it is "possible" [1] to store scientific data (simulated and experimental raw data + metadata). A big pro is the schema-less approach of CouchDB: we have to be very flexible with the metadata, as the set of parameters changes very often.
Up to now I have some code to feed raw data, plots (both as attachments), and hierarchical metadata (as JSON) into CouchDB documents, and have written some prototype Javascript for filtering and showing. But the filtering is done on the client side (a.k.a. browser): The map function simply returns everything.
How could I change the (or push a second) map function of a specific _design-document with simple browser-JS?
I do not think that a temporary view would yield any performance gain...
Thanks for your time and answers.
[1]: of course it is possible, but is it also useful? feasible? reasonable?
[added]
Ah, the jquery.couch.js (version 0.9.0) provides a saveDoc() function, which could update the _design document with the new map function.
But I also tried out the query function, which uses a temporary view. Okay, "do not use this in the real product, only during development"... But scientific research is steady development, right?
Temporary views are getting cached, as I noticed, and it works well for ~1000 documents per DB. A second plus: all users (think of 1 to 3, so a big user management is quit of an overkill) can work with their own temporary view.

Never ever use temporary views. They are really only there for dev and debugging purposes. For more information, see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views (specifically the bold "NOTE").
And yes, because design documents are really just documents with special powers, you can run you GET/POST/PUT/DELETE methods on them. However, you will usually need admin privileges to do this. So, if you are allowing a client side piece of software to do that, you are making your entire database public for read/write access - this may be fine for your application, but is important to remember.
Ex., if you restrict access to your database, but put the username and password in client side javascript, then anyone can see that username and password.
Cheers.

I´ve written an helper functions for jquery.couch and design docs, take a look at:
https://github.com/grischaandreew/jquery.couch.js

We Keep Coding

JavaScript is the programming language of the Web.