I'm creating a gantt-like chart (configuration really) and need to calculate total duration and validate configurable durations. The goal is that users can build the gantt chart without knowing dates, but by knowing tasks and how they (loosely) relate. In a previous step, users add tasks and select start & end steps for those tasks. Steps have a fixed order. Actual dates are not known (or relevant) but will be mapped to steps later.
Most gantt tools I've seen rely on knowing the start/end dates and don't do calculations.
How should I calculate the total duration and also validate when a duration is invalid? Obviously in some cases a total can't be calculated: if there is an unused step between activities. A simple invalid duration would occur when 2 tasks share the same start and end date but have different values. A more complicated one would occur when 2 or more activities have different start/end steps and overlap.
I'm not looking for a complete solution (it would probably be of little use with my current code anyway), but more a general algorithm or approach. I would think a recursive solution would make sense, but because I'm doing this with JS/jQuery/DOM, I'm concerned about performance of a recursive solution that has to repeatedly look up elements. Should I start calculating from the end or the beginning? should I follow each step's start/end until I go no further or re-evaluate which step to add to total duration mid-way through?
Here is a picture of the current markup:
I'll try to explain what I wound up doing.
I think to follow you have to know a bit about the requirements.
This interactive/configurable gantt/schedule is being used as a template to estimate production timelines.
There are 3 pieces:
Steps
Activities
Durations of activities, which are different depending on the type of item the schedule is applied to.
Since this is a template used for estimation, initially there are no dates - just arbitrary durations tied to activities mapped to steps. However eventually steps get mapped to dates from an imported report (or manually entered).
There are 3 pages where the user incrementally builds up the schedule:
Add/Edit Steps: Steps are rows which are created with a sort order value (inferred)
Add/Edit Activities: A matrix with Steps as columns, Activities as rows. Every intersection is a checkbox. A Start and End Step must be selected for each Activity.
Add/Edit Durations: An item type is selected and durations are added for each activity.
Classes
Step [ Name, StepOrder, ..]
Activity [ Name, StartStepID, StartStepOrder, EndStepID, EndStepOrder, ..]
ActivityDuration : Activity [ Duration, ItemType, ..]
In MVC Controller/Repository:
// start by ordering the durations by the order of the steps
var sortedActivities = durations
.OrderBy(x => x.StartStepOrder)
.ThenBy(x => x.EndStepOrder);
// create func to get the path name from the old + new activity
var getActivityPath = new Func<string, string, string>(
(prevPath, activityID) =>
{
return string.IsNullOrEmpty(prevPath)
? string.Format("{0}", activityID)
: string.Format("{0}.{1}", prevPath, activityID);
});
// create the recursive func we'll call to do all the work
Action<List<ActivityDuration>, string, long?, IEnumerable<ActivityDuration>> buildPaths = null;
buildPaths = (activities, path, startStepID, starts) =>
{
// activities will be null until we are joining gapped paths,
// so grab the activities with the provided startStepID
if (starts == null)
starts = activities.Where(x => x.StartStepID == startStepID);
// each activity represents a new branch in the path
foreach (var activity in starts)
{
var newPath = getActivityPath(path, activity.Id.ToString());
// add the new path and it's ordered collection
// of activities to the collection
if (string.IsNullOrEmpty(path))
{
paths.Add(newPath, new ActivityDuration[] { activity });
}
else
{
paths.Add(newPath, paths[path].Concat(new ActivityDuration[] { activity }));
}
// do this recursively providing the EndStepID as the new Start
buildPaths(activities, newPath, activity.EndStepID, null);
}
// if there were any new branches, remove the previous
// path from the collection
if (starts.Any() && !string.IsNullOrEmpty(path))
{
paths.Remove(path);
}
};
// since the activities are in step order, the first activity's
// StartStepID will be where all paths start.
var firstStepID = sortedActivities.FirstOrDefault().StartStepID;
// call the recursive function starting with the first step
buildPaths(sortedActivities.ToList(), null, firstStepID, null);
// handle gaps in the paths after all the first connected ones have been pathed.
// :: ie - step 1,2 & 4,5 are mapped, but not step 3
// these would be appended to the longest path with a step order < start step of the gapped activity's start step (!!!)
// :: ie - the path should be 1-2,2-4,4-5)
// because the list of paths can grow, we need to keep track of the count
// and loop until there are no more paths added
var beforeCount = paths.Count;
var afterCount = beforeCount + 1;
while (beforeCount < afterCount)
{
foreach (var path in paths.ToArray())
{
var lastActivity = path.Value.Last();
// check for activities that start after the last one in each path ..
// .. that don't start on another activity's end step (because that would be a part of a path already)
var gapped = sortedActivities
.Where(x => x.StartStepOrder > lastActivity.EndStepOrder)
.Where(thisAct =>
!sortedActivities
.Select(otherAct => otherAct.EndStepID)
.Contains(thisAct.StartStepID)
);
beforeCount = paths.Count;
// for each new gapped path, build paths as if it was specified by the previous activity
buildPaths(sortedActivities.ToList(), path.Key, null, gapped);
afterCount = paths.Count;
}
}
// build up an object that can be returned to
// JS with summations and ordering already done.
rValue = new ActivityPaths()
{
Paths = paths
.Select(x => new ActivityPath()
{
Path = x.Key,
ActivityDurations = x.Value,
TotalDuration = x.Value.Sum(y => y.Duration)
})
.OrderByDescending(x => x.TotalDuration)
};
There are admittedly some shortcomings of this design, but the use cases allow for it. Specifically:
- An activity can't directly have more than one dependent step - or in other words - 2 steps can't have the same step order.
- If 2 paths have the same total duration, only one will show as the critical path.
Since the dates which are mapped to steps are ultimately used to calculate back/forward to the end of a path from a given point of time, this is OK. Once all dates are provided, a more accurate critical path can be calculated if needed.
The entire set of paths is returned so that some validation can be implemented in the javascript. The first path will be the critical 'one', and this path gets highlighted in the UI, with the total duration of the critical path shown as well:
Related
I'm working on some kind of 1:1 chat system, the environment is Node.JS
For each country, there is a country room (lobby), for each socket client there is a js class/object is being created and each object is in a list with their unique user id.
This unique id is preserved even users logged in from different browser tabs etc..
Each object stored in collections like: "connections" (all of them), "operators"(only operators), "{countryISO}_clients" (users) and the reference key is their unique id.
In some circumstances, I need to access these connections by their socket ids.
At this point, I can think of 2 resolutions.
Using a for each loop to find the desired object
Creating another collection, this time instead of unique id use socket id (or something else.)
Which one makes sense? Because in JS since this collection will be a reference list instead of a copy, it feels like it makes sense (and beautiful looking) but I can't be sure. Which one is expensive in memory/performance terms?
I can't make thorough tests since I don't know how to create dummy (simultaneous) socket connections.
Expected connected socket client count: 300 - 1000 (depends on the time of the day)
e.g. user:
"qk32d2k":{
"uid":"qk32d2k",
"name":"Josh",
"socket":"{socket.io's socket reference}",
"role":"user",
"rooms":["room1"],
"socketids":["sid1"]
"country":"us",
...
info:() => { return gatherSomeData(); },
update:(obj) => { return updateSomeData(obj); },
send:(data)=>{ /*send data to this user*/ }
}
e.g. Countries collection:
{
us:{
"qk32d2k":{"object above."}
"l33t71":{"another user object."}
},
ca:{
"asd231":{"other user object."}
}
}
Pick Simple Design First that Optimizes for Most Common Access
There is no ideal answer here in the absolute. CPUs are wicked fast these days, so if I were you I'd start out with one simple mechanism of storing the sockets that you can access both ways you want, even if one way is kind of a brute force search. Pick the data structure that optimizes the access mechanism that you expect to be either most common or most sensitive to performance.
So, if you are going to be looking up by userID the most, then I'd probably store the sockets in a Map object with the userID as the key. That will give you fast, optimized access to get the socket for a given userID.
For finding a socket by some other property of the socket, you will just iterate the Map item by item until you find the desired match on some other socket property. I'd probably use a for/of loop because it's both fast and easy to bail out of the loop when you've found your match (something you can't do on a Map or Array object with .forEach()). You can obviously make yourself a little utility function or method that will do the brute force lookup for you and that will allow you to modify the implementation later without changing much calling code.
Measure and Add Further Optimization Later (if data shows you need to)
Then, once you get up to scale (or simulated scale in pre-production test), you take a look at the performance of your system. If you have loads of room to spare, you're done - no need to look further. If you have some operations that are slower than desired or higher than desired CPU usage, then you profile your system and find out where the time is going. It's most likely that your performance bottlenecks will be elsewhere in your system and you can then concentrate on those aspects of the system. If, in your profiling, you find that the linear lookup to find the desired socket is causing some of your slow-down, then you can make a second parallel lookup Map with the socketID as the key in order to optimize that type of lookup.
But, I would not recommend doing this until you've actually shown that it is an issue. Premature optimization before you have actual metrics that prove it's worth optimizing something just add complexity to a program without any proof that it is required or even anywhere close to a meaningful bottleneck in your system. Our intuition about what the bottlenecks are is often way, way off. For that reasons, I tend to pick an intelligent first design that is relatively simple to implement, maintain and use and then, only when we have real usage data by which we can measure actual performance metrics would I spend more time optimizing it or tweaking it or making it more complicated in order to make it faster.
Encapsulate the Implementation in Class
If you encapsulate all operations here in a class:
Adding a socket to the data structure.
Removing a socket from the data structure.
Looking up by userID
Looking up by socketID
Any other access to the data structure
Then, all calling code will access this data structure via the class and you can tweak the implementation some time in the future (to optimize based on data) without having to modify any of the calling code. This type of encapsulation can be very useful if you suspect future modifications or change of modifications to the way the data is stored or accessed.
If You're Still Worried, Design a Quick Bench Measurement
I created a quick snippet that tests how long a brute force lookup is in a 1000 element Map object (when you want to find it by something other than what the key is) and compared it to an indexed lookup.
On my computer, a brute force lookup (non-indexed lookup) takes about 0.002549 ms per lookup (that's an average time when doing 1,000,000 lookups. For comparison an indexed lookup on the same Map takes about 0.000017 ms. So you save about 0.002532 ms per lookup. So, this is fractions of a millisecond.
function addCommas(str) {
var parts = (str + "").split("."),
main = parts[0],
len = main.length,
output = "",
i = len - 1;
while(i >= 0) {
output = main.charAt(i) + output;
if ((len - i) % 3 === 0 && i > 0) {
output = "," + output;
}
--i;
}
// put decimal part back
if (parts.length > 1) {
output += "." + parts[1];
}
return output;
}
let m = new Map();
// populate the Map with objects that have a property that
// you have to do a brute force lookup on
function rand(min, max) {
return Math.floor((Math.random() * (max - min)) + min)
}
// keep all randoms here just so we can randomly get one
// to try to find (wouldn't normally do this)
// just for testing purposes
let allRandoms = [];
for (let i = 0; i < 1000; i++) {
let r = rand(1, 1000000);
m.set(i, {id: r});
allRandoms.push(r);
}
// create a set of test lookups
// we do this ahead of time so it's not part of the timed
// section so we're only timing the actual brute force lookup
let numRuns = 1000000;
let lookupTests = [];
for (let i = 0; i < numRuns; i++) {
lookupTests.push(allRandoms[rand(0, allRandoms.length)]);
}
let indexTests = [];
for (let i = 0; i < numRuns; i++) {
indexTests.push(rand(0, allRandoms.length));
}
// function to brute force search the map to find one of the random items
function findObj(targetVal) {
for (let [key, val] of m) {
if (val.id === targetVal) {
return val;
}
}
return null;
}
let startTime = Date.now();
for (let i = 0; i < lookupTests.length; i++) {
// get an id from the allRandoms to search for
let found = findObj(lookupTests[i]);
if (!found) {
console.log("!!didn't find brute force target")
}
}
let delta = Date.now() - startTime;
//console.log(`Total run time for ${addCommas(numRuns)} lookups: ${delta} ms`);
//console.log(`Avg run time per lookup: ${delta/numRuns} ms`);
// Now, see how fast the same number of indexed lookups are
let startTime2 = Date.now();
for (let i = 0; i < indexTests.length; i++) {
let found = m.get(indexTests[i]);
if (!found) {
console.log("!!didn't find indexed target")
}
}
let delta2 = Date.now() - startTime2;
//console.log(`Total run time for ${addCommas(numRuns)} lookups: ${delta2} ms`);
//console.log(`Avg run time per lookup: ${delta2/numRuns} ms`);
let results = `
Total run time for ${addCommas(numRuns)} brute force lookups: ${delta} ms<br>
Avg run time per brute force lookup: ${delta/numRuns} ms<br>
<hr>
Total run time for ${addCommas(numRuns)} indexed lookups: ${delta2} ms<br>
Avg run time per indexed lookup: ${delta2/numRuns} ms<br>
<hr>
Net savings of an indexed lookup is ${(delta - delta2)/numRuns} ms per lookup
`;
document.body.innerHTML = results;
I'm working on a data visualization that has an odd little bug:
It's a little tricky to see, but essentially, when I click on a point in the line chart, that point corresponds to a specific issue of a magazine. The choropleth updates to reflect geodata for that issue, but, critically, the geodata is for a sampled period that corresponds to the issue. Essentially, the choropleth will look the same for any issue between January-June or July-December of a given year.
As you can see, I have a key called Sampled Issue Date (for Geodata), and the value should be the date of the issue for which the geodata is based on (basically, they would get geographical distribution for one specific issue and call it representative of ALL data in a six month period.) Yet, when I initially click on an issue, I'm always getting the last sampled date in my data. All of the geodata is correct, and, annoyingly, all subsequent clicks display the correct information. So it's only that first click (after refreshing the page OR clearing an issue) that I have a problem.
Honestly, my code is a nightmare right now because I'm focused on debugging, but you can see my reducer for the remove function on GitHub which is also copy/pasted below:
// Reducer function for raw geodata
function geoReducerAdd(p, v) {
// console.log(p.sampled_issue_date, v.sampled_issue_date, state.periodEnding, state.periodStart)
++p.count
p.sampled_mail_subscriptions += v.sampled_mail_subscriptions
p.sampled_single_copy_sales += v.sampled_single_copy_sales
p.sampled_total_sales += v.sampled_total_sales
p.state_population = v.state_population // only valid for population viz
p.sampled_issue_date = v.sampled_issue_date
return p
}
function geoReducerRemove(p, v) {
const currDate = new Date(v.sampled_issue_date)
// if(currDate.getFullYear() === 1921) {
// console.log(currDate)
// }
currDate <= state.periodEnding && currDate >= state.periodStart ? console.log(v.sampled_issue_date, p.sampled_issue_date) : null
const dateToRender = currDate <= state.periodEnding && currDate >= state.periodStart ? v.sampled_issue_date : p.sampled_issue_date
--p.count
p.sampled_mail_subscriptions -= v.sampled_mail_subscriptions
p.sampled_single_copy_sales -= v.sampled_single_copy_sales
p.sampled_total_sales -= v.sampled_total_sales
p.state_population = v.state_population // only valid for population viz
p.sampled_issue_date = dateToRender
return p
}
// generic georeducer
function geoReducerDefault() {
return {
count: 0,
sampled_mail_subscriptions: 0,
sampled_single_copy_sales: 0,
sampled_total_sales: 0,
state_population: 0,
sampled_issue_date: ""
}
}
The problem could be somewhere else, but I don't think it's a crossfilter issue (I'm not running into the "two groups from the same dimension" problem for sure) and adding additional logic to the add reducer makes things even less predictable (understandably - I don't ever really need to render the sample date for all values anyway.) The point of this is that I'm completely lost about where the flaw in my logic is, and I'd love some help!
EDIT: Note that the reducers are for the reduce method on a dc.js dimension, not the native javascript reducer! :D
Two crossfilters! Always fun to see that... but it can be tricky because nothing in dc.js directly supports that, except for the chart registry. You're on your own for filtering between different chart groups, and it can be tricky to map between data sets with different time resolutions and so on.
The problem
As I understand your app, when a date is selected in the line chart, the choropleth and accompanying text should have exactly one row from the geodata dataset selected per state.
The essential problem is that Crossfilter is not great at telling you which rows are in any given bin. So even though there's just one row selected, you don't know what it is!
This is the same problem that makes minimum, maximum, and median reductions surprisingly complicated. You often end up building new data structures to capture what crossfilter throws away in the name of efficiency.
A general solution
I'll go with a general solution that's more that you need, but can be helpful in similar situations. The only alternative that I know is to go completely outside crossfilter and look in the original dataset. That's fine too, and maybe more efficient. But it can be buggy and it's nice to work within the system.
So let's keep track of which dates we've seen per bin. When we start out, every bin will have all the dates. Once a date is selected, there will be only one date (but not exactly the one that was selected, because of your two-crossfilter setup).
Instead of the sampled_issue_date stuff, we'll keep track of an object called date_counts now:
// Reducer function for raw geodata
function geoReducerAdd(p, v) {
// ...
const canonDate = new Date(v.sampled_issue_date).getTime()
p.date_counts[canonDate] = (p.date_counts[canonDate] || 0) + 1
return p
}
function geoReducerRemove(p, v) {
// ...
const canonDate = new Date(v.sampled_issue_date).getTime()
if(!--p.date_counts[canonDate])
delete p.date_counts[canonDate]
return p
}
// generic georeducer
function geoReducerDefault() {
return {
// ...
date_counts: {}
}
}
What does it do?
Line by line
const canonDate = new Date(v.sampled_issue_date).getTime()
Maybe this is paranoid, but this canonicalizes the input dates by converting them to the number of milliseconds since 1970. I'm sure you'd be safe using the string dates directly, but who knows there could be a space or a zero or something.
You can't index an object with a date object, you have to convert it to an integer.
p.date_counts[canonDate] = (p.date_counts[canonDate] || 0) + 1
When we add a row, we'll check if we currently have a count for the row's date. If so, we'll use the count we have. Otherwise we'll default to zero. Then we'll add one.
if(!--p.date_counts[canonDate])
delete p.date_counts[canonDate]
When we remove a row, we know that we have a count for the date for that row (because crossfilter won't tell us it's removing the row unless it was added earlier). So we can go ahead and decrement the count. Then if it hits zero we can remove the entry.
Like I said, it's overkill. In your case, the count will only go to 1 and then drop to 0. But it's not much more expensive to this rather than just keep
Rendering the side panel
When we render the side panel, there should only be one date left in date_counts for that selected item.
console.assert(Object.keys(date_counts).length === 1) // only one entry
console.assert(Object.entries(date_counts)[0][1] === 1) // with count 1
document.getElementById('geo-issue-date').textContent = new Date(+Object.keys(date_counts)[0]).format('mmm dd, yyyy')
Usability notes
From a usability perspective, I would recommend not to filter(null) on mouseleave, or if you really want to, then put it on a timeout which gets cancelled when you see a mouseenter. One should be able to "scrub" over the line chart and see the changes over time in the choropleth without accidentally switching back to the unfiltered colors.
I also noticed (and filed) an issue because I noticed that dots to the right of the mouse pointer are shown, making them difficult to click. The reason is that the dots are overlapping, so only a little sliver of a crescent is hoverable. At least with my trackpad, the click causes the pointer to travel leftward. (I can see the date go back a week in the tooltip and then return.) It's not as much of a problem when you're zoomed in.
Partition Refinement Algorithm
I have this algorithm from another Stack Exchange question:
Let
S = {x_1,x_2,x_3,x_4,x_5}
be the state space and
R = {
(x_1,a,x_2),
(x_1,b,x_3),
(x_2,a,x_2),
(x_3,a,x_4),
(x_3,b,x_5),
(x_4,a,x_4), // loop
(x_5,a,x_5), // loop (a-transition)
(x_5,b,x_5) // loop (b-transition)
}
be the transition relation
Then we start with the partition
Pi_1 = {{x_1,x_2,x_3,x_4,x_5}}
where all the states are lumped together.
Now, x_2 and x_4 can both always do an a-transition to a state, but no b-transitions, whereas the remaining states can do both a- and b-transitions, so we split the state space as
Pi_2 = {{x_1,x_3,x_5},{x_2,x_4}}.
Next, x_5 can do an a-transition into the class {x_1,x_3,x_5},
but x_1 and x_3 can not, since their a-transitions go into the class {x_2,x_4}. Hence these should again be split, so we get
Pi_3 = {{x_1,x_3},{x_5},{x_2,x_4}}.
Now it should come as no surprise that x_3 can do a b-transition into the class {x_5}, but x_1 can not, so they must also be split, so we get
Pi_4 = {{x_1},{x_3},{x_5},{x_2,x_4}},
and if you do one more step, you will see that Pi_4 = Pi_5, so this is the result.
Implementation
I do not know how to implement this algorithm in JavaScript.
// blocks in initial partition (only consist of 1 block)
const initialPartition = { block1: getStates() };
// new partition
const partition = {};
// loop through each block in the partition
Object.values(initialPartition).forEach(function (block) {
// now split the block into subgroups based on the relations.
// loop through each node in block to see to which nodes it has relations
block.forEach(function (node) {
// recursively get edges (and their targets) coming out of the node
const successors = node.successors();
// ...
}
});
I guess I should create a function that for each node can say which transitions it can make. If I have such function, I can loop through each node in each block, find the transitions, and create a key using something like
const key = getTransitions(node).sort().join();
and use this key to group the nodes into subblocks, making it possible to do something like
// blocks in initial partition (only consist of 1 block)
const initialPartition = { block1: getStates() };
// new partition
const partition = {};
// keep running until terminating
while (true) {
// loop through each block in the partition
Object.values(initialPartition).forEach(function (block) {
// now split the block into subgroups based on the relations.
// loop through each node in block to see to which nodes it has relations
block.forEach(function (node) {
// get possible transitions
const key = getTransitions(node).sort().join();
// group by key
partition[key].push(node);
}
});
}
but I need to remember which nodes were already separated into blocks, so the subblocks keep becoming smaller (i.e. if I have {{x_1,x_3,x_5},{x_2,x_4}}, I should remember that these blocks can only become smaller, and never 'interchange').
Can someone give an idea on how to implement the algorithm? Just in pseudocode or something, so I can implement how to get the nodes' successors, incomers, labels (e.g. a-transition or b-transition), etc. This is a bisimulation algorithm, and the algorithm is implemented in Haskell in the slides here, but I do not understand Haskell.
I'm trying to predict some data using a neural network in javascript. For that I found convnetjs that seems easy to use.
In the example, they use one thing that they call MagicNet, so you don't need to know about NN to work with it. This is the example of use:
// toy data: two data points, one of class 0 and other of class 1
var train_data = [new convnetjs.Vol([1.3, 0.5]), new convnetjs.Vol([0.1, 0.7])];
var train_labels = [0, 1];
// create a magic net
var magicNet = new convnetjs.MagicNet(train_data, train_labels);
magicNet.onFinishBatch(finishedBatch); // set a callback a finished evaluation of a batch of networks
// start training MagicNet. Every call trains all candidates in current batch on one example
setInterval(function(){ magicNet.step() }, 0});
// once at least one batch of candidates is evaluated on all folds we can do prediction!
function finishedBatch() {
// prediction example. xout is Vol of scores
// there is also predict_soft(), which returns the full score volume for all labels
var some_test_vol = new convnetjs.Vol([0.1, 0.2]);
var predicted_label = magicNet.predict(some_test_vol);
}
What I don't understand is this:
They create train data like [new convnetjs.Vol([1.3, 0.5]), new convnetjs.Vol([0.1, 0.7])] and then use 2 labels. Those labels, are one for each position of array or for each element of subarray in those positions??
Here is a visual example:
It's like [new 0, new 1] or like [new convnetjs.Vol([0, 1]), new convnetjs.Vol([0, 1])]?
The sample new convnetjs.Vol([1.3, 0.5]) has label 0.
The sample new convnetjs.Vol([0.1, 0.7]) has label 1.
In general, in machine learning, you'd usually have samples which can be quite high-dimensional (here they are only two-dimensional), but you'd have a single label per sample which tells you which "class" it belongs in. What the classes actually mean depends on the problem you are trying to solve; for instance, they could be the digits that are represented by images of hand-written digits.
When I have a large dataset in my viewModel and I use foreach to loop over an Array of Objects to render each Object as a row within a table, KnockoutJS will block the main thread until it can render, which sometimes takes minutes (!).
Here is a jsFiddle example using a dataset containing 2000 Objects, containing a url and a code. Real data will have longer URLs in some cases and 4 other columns (only 2 in this example.) I also added some simple styles because adding styles also seems to slow things down a bit during the process.
Warning: your browser might break
http://jsfiddle.net/DESC3/7/
I suggest that if you have such large datasets you try an alternative solution. For example slickGrid renders large datasets in a much more efficient way, by only generating HTML elements for the data that is actually visible. We've used this for large datasets, and it performs well.
How about something like this. Say, you've got viewModel.items = ko.observableArray() that you'd like to render.
Have a separate non-observable array of all your data: var itemsToRender = functionThatReturnsLargeArray().
Put some portion of your data from itemsToRender into your observable array. Say, 50 elements only.
Keep adding elements into observable array in portions inside a setTimeout callback.
NOTE1: You can add some time-tracking into setTimeout callback and increase/reduce the number of items that you add at each iteration. Your goal is to keep each callback time below 50-100 milliseconds so your application still feels responsive.
var batchSize = 50; // default number of items rendered per iteration
var batchOffset = 0;
function render(items, itemsToRender, done) {
setTimeout(function () {
var startTime = new Date().getTime();
items.pushAll(itemsToRender.slice(batchOffset, batchSize));
batchOffset += batchSize;
// at this point Knockout rendered next batchSize items from itemsToRender
var endTime = new Date().getTime();
// update batchSize for next iteration
batchSize = batchSize * 50 / (endTime - startTime); // 50 milliseconds
batchSize = Math.min(itemsToRender.length, batchOffset + batchSize);
if (batchSize > 0) render() else done(); // callback if you need one
}, 0);
}
/* I haven't actually tested the code */
Another batch size updating strategy could be based on target FPS. Say you'd like to achieve 60 fps update rate and thus 60 calls to setTimeout per 1000 milliseconds. That would take somewhat longer to process the whole collection. You can also use requestAnimationFrame instead of setTimeout and see how that would work out.
EDIT: Build-in throttling was added into Knockout JS 1.3 (currently it's in beta but seems pretty stable).
NOTE2: If some other data on the view depends on viewModel.items you can still map it down to original array itemsToRender. Say, for example, that you'd like to show the number of items in collection. If you use viewModel.items().length you'll end up with changing size value in the UI while more items get renderred. To avoid that you can first define your size binding as a dependentObservable based on itemsToRender, not viewModel.items. After you've done rendering all items you can re-map it onto viewModel.items if you see fit.