Partition Refinement Algorithm
I have this algorithm from another Stack Exchange question:
Let
S = {x_1,x_2,x_3,x_4,x_5}
be the state space and
R = {
(x_1,a,x_2),
(x_1,b,x_3),
(x_2,a,x_2),
(x_3,a,x_4),
(x_3,b,x_5),
(x_4,a,x_4), // loop
(x_5,a,x_5), // loop (a-transition)
(x_5,b,x_5) // loop (b-transition)
}
be the transition relation
Then we start with the partition
Pi_1 = {{x_1,x_2,x_3,x_4,x_5}}
where all the states are lumped together.
Now, x_2 and x_4 can both always do an a-transition to a state, but no b-transitions, whereas the remaining states can do both a- and b-transitions, so we split the state space as
Pi_2 = {{x_1,x_3,x_5},{x_2,x_4}}.
Next, x_5 can do an a-transition into the class {x_1,x_3,x_5},
but x_1 and x_3 can not, since their a-transitions go into the class {x_2,x_4}. Hence these should again be split, so we get
Pi_3 = {{x_1,x_3},{x_5},{x_2,x_4}}.
Now it should come as no surprise that x_3 can do a b-transition into the class {x_5}, but x_1 can not, so they must also be split, so we get
Pi_4 = {{x_1},{x_3},{x_5},{x_2,x_4}},
and if you do one more step, you will see that Pi_4 = Pi_5, so this is the result.
Implementation
I do not know how to implement this algorithm in JavaScript.
// blocks in initial partition (only consist of 1 block)
const initialPartition = { block1: getStates() };
// new partition
const partition = {};
// loop through each block in the partition
Object.values(initialPartition).forEach(function (block) {
// now split the block into subgroups based on the relations.
// loop through each node in block to see to which nodes it has relations
block.forEach(function (node) {
// recursively get edges (and their targets) coming out of the node
const successors = node.successors();
// ...
});
});
I guess I should create a function that for each node can say which transitions it can make. If I have such function, I can loop through each node in each block, find the transitions, and create a key using something like
const key = getTransitions(node).sort().join();
and use this key to group the nodes into subblocks, making it possible to do something like
// blocks in initial partition (only consist of 1 block)
const initialPartition = { block1: getStates() };
// new partition
const partition = {};
// keep running until terminating
while (true) {
// loop through each block in the partition
Object.values(initialPartition).forEach(function (block) {
// now split the block into subgroups based on the relations.
// loop through each node in block to see to which nodes it has relations
block.forEach(function (node) {
// get possible transitions
const key = getTransitions(node).sort().join();
// group by key
(partition[key] = partition[key] || []).push(node);
});
});
}
but I need to remember which nodes were already separated into blocks, so the subblocks keep becoming smaller (i.e. if I have {{x_1,x_3,x_5},{x_2,x_4}}, I should remember that these blocks can only become smaller, and never 'interchange').
Can someone give an idea on how to implement the algorithm? Just in pseudocode or something, so I can implement how to get the nodes' successors, incomers, labels (e.g. a-transition or b-transition), etc. This is a bisimulation algorithm, and the algorithm is implemented in Haskell in the slides here, but I do not understand Haskell.
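One possible way to sketch the refinement loop described above in JavaScript. This is only a sketch under assumptions: the `states` and `transitions` arrays, the `refine` and `toBlocks` helpers, and the string "signature" encoding are all hypothetical names and choices made for illustration, not something from the linked slides. Each state gets a signature made of its current block plus the set of (label, target block) pairs it can reach. Because the current block is part of the signature, blocks can only be split, never merged or "interchanged". The loop stops when a pass produces no new block.

```javascript
// Hypothetical encoding of the example: [source, label, target] triples.
const states = ["x1", "x2", "x3", "x4", "x5"];
const transitions = [
  ["x1", "a", "x2"], ["x1", "b", "x3"],
  ["x2", "a", "x2"],
  ["x3", "a", "x4"], ["x3", "b", "x5"],
  ["x4", "a", "x4"],
  ["x5", "a", "x5"], ["x5", "b", "x5"],
];

function refine(states, transitions) {
  // blockOf maps each state to the index of its current block.
  let blockOf = new Map(states.map((s) => [s, 0]));
  let blockCount = 1;
  while (true) {
    // Signature = current block + sorted set of "label>targetBlock" moves.
    const signatureOf = (s) => {
      const moves = new Set(
        transitions
          .filter(([src]) => src === s)
          .map(([, label, dst]) => `${label}>${blockOf.get(dst)}`)
      );
      return `${blockOf.get(s)}|${[...moves].sort().join(",")}`;
    };
    const ids = new Map(); // signature -> new block index
    const next = new Map(); // state -> new block index
    for (const s of states) {
      const sig = signatureOf(s);
      if (!ids.has(sig)) ids.set(sig, ids.size);
      next.set(s, ids.get(sig));
    }
    // Blocks only ever split, so an unchanged count means a stable partition.
    if (ids.size === blockCount) return next;
    blockOf = next;
    blockCount = ids.size;
  }
}

// Turn the state -> block map into an array of blocks for display.
function toBlocks(blockOf) {
  const blocks = [];
  for (const [state, index] of blockOf) {
    (blocks[index] = blocks[index] || []).push(state);
  }
  return blocks;
}

const result = toBlocks(refine(states, transitions));
console.log(result);
```

On the example system this yields the blocks {x1}, {x2, x4}, {x3}, {x5}, which is the same partition as Pi_4 above. The sketch rescans all transitions for every state on every pass, so it is meant to show the idea rather than to be fast.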
Related
I'm working on some kind of 1:1 chat system, the environment is Node.JS
For each country there is a country room (lobby). For each socket client a JS class/object is created, and each object is kept in a list keyed by its unique user id.
This unique id is preserved even when users log in from different browser tabs, etc.
Each object is stored in collections like "connections" (all of them), "operators" (only operators), and "{countryISO}_clients" (users), and the reference key is its unique id.
In some circumstances, I need to access these connections by their socket ids.
At this point, I can think of two solutions:
Using a for-each loop to find the desired object.
Creating another collection, this time keyed by socket id (or something else) instead of unique id.
Which one makes sense? Since in JS the second collection would hold references to the same objects rather than copies, it feels like it makes sense (and looks nicer), but I can't be sure. Which one is more expensive in terms of memory/performance?
I can't make thorough tests since I don't know how to create dummy (simultaneous) socket connections.
Expected connected socket client count: 300 - 1000 (depends on the time of the day)
e.g. user:
"qk32d2k":{
  "uid":"qk32d2k",
  "name":"Josh",
  "socket":"{socket.io's socket reference}",
  "role":"user",
  "rooms":["room1"],
  "socketids":["sid1"],
  "country":"us",
  ...
  info:() => { return gatherSomeData(); },
  update:(obj) => { return updateSomeData(obj); },
  send:(data)=>{ /*send data to this user*/ }
}
e.g. Countries collection:
{
  us:{
    "qk32d2k":{"object above."},
    "l33t71":{"another user object."}
  },
  ca:{
    "asd231":{"other user object."}
  }
}
Pick Simple Design First that Optimizes for Most Common Access
There is no ideal answer here in the absolute. CPUs are wicked fast these days, so if I were you I'd start out with one simple mechanism of storing the sockets that you can access both ways you want, even if one way is kind of a brute force search. Pick the data structure that optimizes the access mechanism that you expect to be either most common or most sensitive to performance.
So, if you are going to be looking up by userID the most, then I'd probably store the sockets in a Map object with the userID as the key. That will give you fast, optimized access to get the socket for a given userID.
For finding a socket by some other property of the socket, you will just iterate the Map item by item until you find the desired match on some other socket property. I'd probably use a for/of loop because it's both fast and easy to bail out of the loop when you've found your match (something you can't do on a Map or Array object with .forEach()). You can obviously make yourself a little utility function or method that will do the brute force lookup for you and that will allow you to modify the implementation later without changing much calling code.
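As a rough illustration of that idea, here is a minimal sketch: a Map keyed by userID, plus a small utility that brute-force searches it by socket id with for/of. The `socketsByUserId` and `findBySocketId` names and the trimmed-down user objects are assumptions made up for this example; the real objects would carry the socket reference and the other fields from the question.

```javascript
// Primary storage: fast lookup by userID.
const socketsByUserId = new Map();
socketsByUserId.set("qk32d2k", { uid: "qk32d2k", socketids: ["sid1"] });
socketsByUserId.set("l33t71", { uid: "l33t71", socketids: ["sid2", "sid3"] });

// Brute-force lookup by some other property (here: one of the socket ids).
// for/of lets us return as soon as a match is found.
function findBySocketId(map, socketId) {
  for (const user of map.values()) {
    if (user.socketids.includes(socketId)) {
      return user;
    }
  }
  return null; // no user owns that socket id
}

const byUid = socketsByUserId.get("qk32d2k"); // indexed lookup
const bySid = findBySocketId(socketsByUserId, "sid3"); // linear scan
console.log(byUid.uid, bySid.uid);
```

Because callers only go through `findBySocketId`, its body could later be swapped for an indexed lookup without touching them.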
Measure and Add Further Optimization Later (if data shows you need to)
Then, once you get up to scale (or simulated scale in pre-production test), you take a look at the performance of your system. If you have loads of room to spare, you're done - no need to look further. If you have some operations that are slower than desired or higher than desired CPU usage, then you profile your system and find out where the time is going. It's most likely that your performance bottlenecks will be elsewhere in your system and you can then concentrate on those aspects of the system. If, in your profiling, you find that the linear lookup to find the desired socket is causing some of your slow-down, then you can make a second parallel lookup Map with the socketID as the key in order to optimize that type of lookup.
But, I would not recommend doing this until you've actually shown that it is an issue. Premature optimization, before you have actual metrics that prove something is worth optimizing, just adds complexity to a program without any proof that it is required or even anywhere close to a meaningful bottleneck in your system. Our intuition about what the bottlenecks are is often way, way off. For that reason, I tend to pick an intelligent first design that is relatively simple to implement, maintain and use, and then, only when we have real usage data by which we can measure actual performance metrics, would I spend more time optimizing it, tweaking it, or making it more complicated in order to make it faster.
Encapsulate the Implementation in Class
If you encapsulate all operations here in a class:
Adding a socket to the data structure.
Removing a socket from the data structure.
Looking up by userID
Looking up by socketID
Any other access to the data structure
Then, all calling code will access this data structure via the class and you can tweak the implementation some time in the future (to optimize based on data) without having to modify any of the calling code. This type of encapsulation can be very useful if you expect future changes to the way the data is stored or accessed.
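A minimal sketch of what such a class might look like. The `ConnectionRegistry` name, its method names, and the simplified user objects are hypothetical. This version already keeps a second Map keyed by socket id, but since that Map is private to the class, you could just as well start with a linear scan inside `getBySocketId` and add the index later.

```javascript
class ConnectionRegistry {
  constructor() {
    this.byUserId = new Map(); // uid -> user object
    this.bySocketId = new Map(); // socket id -> same user object (a reference, not a copy)
  }

  // Register a socket for a user, creating the user entry on first use.
  addSocket(uid, socketId) {
    let user = this.byUserId.get(uid);
    if (!user) {
      user = { uid, socketids: [] };
      this.byUserId.set(uid, user);
    }
    user.socketids.push(socketId);
    this.bySocketId.set(socketId, user);
    return user;
  }

  // Remove one socket; drop the user once the last socket is gone.
  removeSocket(socketId) {
    const user = this.bySocketId.get(socketId);
    if (!user) return false;
    this.bySocketId.delete(socketId);
    user.socketids = user.socketids.filter((id) => id !== socketId);
    if (user.socketids.length === 0) this.byUserId.delete(user.uid);
    return true;
  }

  getByUserId(uid) {
    return this.byUserId.get(uid) || null;
  }

  getBySocketId(socketId) {
    return this.bySocketId.get(socketId) || null;
  }
}

const registry = new ConnectionRegistry();
registry.addSocket("qk32d2k", "sid1");
registry.addSocket("qk32d2k", "sid2"); // same user, second browser tab
```

Both Maps point at the same user object, so the extra memory is roughly one Map entry per socket rather than a second copy of each user.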
If You're Still Worried, Design a Quick Bench Measurement
I created a quick snippet that tests how long a brute force lookup is in a 1000 element Map object (when you want to find it by something other than what the key is) and compared it to an indexed lookup.
On my computer, a brute force lookup (non-indexed lookup) takes about 0.002549 ms per lookup (that's an average time when doing 1,000,000 lookups). For comparison, an indexed lookup on the same Map takes about 0.000017 ms. So you save about 0.002532 ms per lookup, which is a fraction of a millisecond.
function addCommas(str) {
var parts = (str + "").split("."),
main = parts[0],
len = main.length,
output = "",
i = len - 1;
while(i >= 0) {
output = main.charAt(i) + output;
if ((len - i) % 3 === 0 && i > 0) {
output = "," + output;
}
--i;
}
// put decimal part back
if (parts.length > 1) {
output += "." + parts[1];
}
return output;
}
let m = new Map();
// populate the Map with objects that have a property that
// you have to do a brute force lookup on
function rand(min, max) {
return Math.floor((Math.random() * (max - min)) + min)
}
// keep all randoms here just so we can randomly get one
// to try to find (wouldn't normally do this)
// just for testing purposes
let allRandoms = [];
for (let i = 0; i < 1000; i++) {
let r = rand(1, 1000000);
m.set(i, {id: r});
allRandoms.push(r);
}
// create a set of test lookups
// we do this ahead of time so it's not part of the timed
// section so we're only timing the actual brute force lookup
let numRuns = 1000000;
let lookupTests = [];
for (let i = 0; i < numRuns; i++) {
lookupTests.push(allRandoms[rand(0, allRandoms.length)]);
}
let indexTests = [];
for (let i = 0; i < numRuns; i++) {
indexTests.push(rand(0, allRandoms.length));
}
// function to brute force search the map to find one of the random items
function findObj(targetVal) {
for (let [key, val] of m) {
if (val.id === targetVal) {
return val;
}
}
return null;
}
let startTime = Date.now();
for (let i = 0; i < lookupTests.length; i++) {
// get an id from the allRandoms to search for
let found = findObj(lookupTests[i]);
if (!found) {
console.log("!!didn't find brute force target")
}
}
let delta = Date.now() - startTime;
//console.log(`Total run time for ${addCommas(numRuns)} lookups: ${delta} ms`);
//console.log(`Avg run time per lookup: ${delta/numRuns} ms`);
// Now, see how fast the same number of indexed lookups are
let startTime2 = Date.now();
for (let i = 0; i < indexTests.length; i++) {
let found = m.get(indexTests[i]);
if (!found) {
console.log("!!didn't find indexed target")
}
}
let delta2 = Date.now() - startTime2;
//console.log(`Total run time for ${addCommas(numRuns)} lookups: ${delta2} ms`);
//console.log(`Avg run time per lookup: ${delta2/numRuns} ms`);
let results = `
Total run time for ${addCommas(numRuns)} brute force lookups: ${delta} ms<br>
Avg run time per brute force lookup: ${delta/numRuns} ms<br>
<hr>
Total run time for ${addCommas(numRuns)} indexed lookups: ${delta2} ms<br>
Avg run time per indexed lookup: ${delta2/numRuns} ms<br>
<hr>
Net savings of an indexed lookup is ${(delta - delta2)/numRuns} ms per lookup
`;
document.body.innerHTML = results;
Looking to learn how to implement a hash table in a decent way in JavaScript.
I would like for it to be able to:
Efficiently resolve collisions,
Be space efficient, and
Be unbounded in size (at least in principle, like v8 objects are, up to the size of the system memory).
From my research and help from SO, there are many ways to resolve collisions in hash tables. The way v8 does it is Quadratic probing:
hash-table.h
The wikipedia algorithm implementing quadratic probing in JavaScript looks something like this:
var i = 0
var SIZE = 10000
var key = getKey(arbitraryString)
var hash = key % SIZE
if (hashtable[hash]) {
while (i < SIZE) {
i++
hash = (key + i * i) % SIZE
if (!hashtable[hash]) break
if (i == SIZE) throw new Error('Hashtable full.')
}
hashtable[hash] = key
} else {
hashtable[hash] = key
}
The elements that are missing from the wikipedia entry are:
How to compute the hash getKey(arbitraryString). Hoping to learn how v8 does this (not necessarily an exact replica, just along the same lines). Not being proficient in C it looks like the key is an object, and the hash is a 32 bit integer. Not sure if the lookup-cache.h is important.
How to make it dynamic so the SIZE constraint can be removed.
Where to store the final hash, and how to compute it more than once.
V8 allows you to specify your own "Shape" object to use in the hash table:
// The hash table class is parameterized with a Shape.
// Shape must be a class with the following interface:
// class ExampleShape {
// public:
// // Tells whether key matches other.
// static bool IsMatch(Key key, Object* other);
// // Returns the hash value for key.
// static uint32_t Hash(Isolate* isolate, Key key);
// // Returns the hash value for object.
// static uint32_t HashForObject(Isolate* isolate, Object* object);
// // Convert key to an object.
// static inline Handle<Object> AsHandle(Isolate* isolate, Key key);
// // The prefix size indicates number of elements in the beginning
// // of the backing storage.
// static const int kPrefixSize = ..;
// // The Element size indicates number of elements per entry.
// static const int kEntrySize = ..;
// // Indicates whether IsMatch can deal with other being the_hole (a
// // deleted entry).
// static const bool kNeedsHoleCheck = ..;
// };
But not sure what the key is and how they convert that key to the hash so keys are evenly distributed and the hash function isn't just a hello-world example.
The question is, how to implement a quick hash table like V8 that can efficiently resolve collisions and is unbounded in size. It doesn't have to be exactly like V8 but have the features outlined above.
In terms of space efficiency, a naive approach would do var array = new Array(10000), which would eat up a bunch of memory until it was filled out. Not sure how v8 handles it, but if you do var x = {} a bunch of times, it doesn't allocate a bunch of memory for unused keys, somehow it is dynamic.
I'm stuck here essentially:
var m = require('node-murmurhash')
function HashTable() {
this.array = new Array(10000)
}
HashTable.prototype.key = function(value){
// not sure if the key is actually this, or
// the final result computed from the .set function,
// and if so, how to store that.
return m(value)
}
HashTable.prototype.set = function(value){
var key = this.key(value)
var array = this.array
// not sure how to get rid of this constraint.
var SIZE = 10000
var hash = key % SIZE
var i = 0
if (array[hash]) {
while (i < SIZE) {
i++
hash = (key + i * i) % SIZE
if (!array[hash]) break
if (i == SIZE) throw new Error('Hashtable full.')
}
array[hash] = value
} else {
array[hash] = value
}
}
HashTable.prototype.get = function(index){
return this.array[index]
}
This is a very broad question, and I'm not sure what exactly you want an answer to. ("How to implement ...?" sounds like you just want someone to do your work for you. Please be more specific.)
How to compute the hash
Any hash function will do. I've pointed out V8's implementation in the other question you've asked; but you really have a lot of freedom here.
Not sure if the lookup-cache.h is important.
Nope, it's unrelated.
How to make it dynamic so the SIZE constraint can be removed.
Store the table's current size as a variable, keep track of the number of elements in your hash table, and grow the table when the percentage of used slots exceeds a given threshold (you have a space-time tradeoff there: lower load factors like 50% give fewer collisions but use more memory, higher factors like 80% use less memory but hit more slow cases). I'd start with a capacity that's an estimate of "minimum number of entries you'll likely need", and grow in steps of 2x (e.g. 32 -> 64 -> 128 -> etc.).
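To make that concrete, here is a minimal sketch of a table that tracks its size and element count and doubles when a load-factor threshold is exceeded. Everything in it is an assumption chosen for illustration, not V8's implementation: the `GrowableHashTable` class, the FNV-1a string hash in `hashString`, the 0.75 threshold, and the triangular-number probe sequence (a form of quadratic probing that visits every slot when the capacity is a power of two). It only handles string keys and does not support deletion.

```javascript
// FNV-1a (32-bit), used here only as an example hash for strings.
function hashString(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class GrowableHashTable {
  constructor(capacity = 32, maxLoad = 0.75) {
    this.capacity = capacity; // must be a power of two
    this.maxLoad = maxLoad;
    this.count = 0; // number of stored entries
    this.slots = new Array(capacity).fill(undefined);
  }

  // Index of the slot holding `key`, or of the first empty slot on its probe path.
  // Terminates because the load factor keeps at least one slot empty.
  findSlot(key) {
    const mask = this.capacity - 1;
    let index = hashString(key) & mask;
    for (let i = 1; ; i++) {
      const entry = this.slots[index];
      if (entry === undefined || entry.key === key) return index;
      index = (index + i) & mask; // offsets 1, 3, 6, 10, ... from the home slot
    }
  }

  set(key, value) {
    if ((this.count + 1) / this.capacity > this.maxLoad) this.grow();
    const index = this.findSlot(key);
    if (this.slots[index] === undefined) this.count++;
    this.slots[index] = { key, value };
  }

  get(key) {
    const entry = this.slots[this.findSlot(key)];
    return entry === undefined ? undefined : entry.value;
  }

  // Double the capacity and re-insert every entry (indices depend on capacity).
  grow() {
    const old = this.slots;
    this.capacity *= 2;
    this.slots = new Array(this.capacity).fill(undefined);
    this.count = 0;
    for (const entry of old) {
      if (entry !== undefined) this.set(entry.key, entry.value);
    }
  }
}

const table = new GrowableHashTable(4);
for (let i = 0; i < 1000; i++) table.set("key" + i, i);
```

The table starts small and only allocates more slots as entries arrive, which addresses the "new Array(10000)" concern from the question at the cost of occasional rehashing.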
Where to store the final hash,
That one's difficult: in JavaScript, you can't store additional properties on strings (or primitives in general). You could use a Map (or object) on the side, but if you're going to do that anyway, then you might as well use that as the hash table, and not bother implementing your own thing on top.
and how to compute it more than once.
That one's easy: invoke your hashing function again ;-)
I just want a function getUniqueString(string)
How about this:
var table = new Map();
var max = 0;
function getUniqueString(string) {
var unique = table.get(string);
if (unique === undefined) {
unique = (++max).toString();
table.set(string, unique);
}
return unique;
}
For nicer encapsulation, you could define an object that has table and max as properties.
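For instance, a sketch of that encapsulation could look like the following, where the `StringInterner` class name is made up for this example and the logic is the same as in the function above:

```javascript
class StringInterner {
  constructor() {
    this.table = new Map(); // original string -> unique string
    this.max = 0; // last id handed out
  }

  getUniqueString(string) {
    let unique = this.table.get(string);
    if (unique === undefined) {
      unique = (++this.max).toString();
      this.table.set(string, unique);
    }
    return unique;
  }
}

const interner = new StringInterner();
const first = interner.getUniqueString("foo");
const second = interner.getUniqueString("bar");
const again = interner.getUniqueString("foo"); // same id as `first`
```

Each instance has its own table and counter, so separate parts of a program can hand out ids independently.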
I'm looking to write a small parser for some kind of files, and one of the things I have to accomplish is to find if a line is inside another one, defining this with indentation (spaces or tabs).
Example:
This is the main line
    This is a nested or child line
I'm trying to establish this by reading the first character position in the line and comparing it with the previous one with something like this:
var str = ' hello';
str.indexOf(str.match(/\S|$/).shift());
I'm sure this is not the best way and it looks horrible. I also have other issues to address, like checking whether the indentation is made of spaces (2 or 4) or tabs, and passing/maintaining the state of the previous line (object).
Also, lines can be nested infinitely, and of course I'm looking for a nice and performant algorithm (or idea) or pattern rather than a simple check, which I think is relatively easy to do but error prone. I'm sure this has already been solved by people who work with parsers and compilers.
Edit:
str.search(/\S/);
@Oriol's proposal looks much better
This is generally the kind of thing you write a parser for, rather than purely relying on regex. If the nesting determines the depth, then you have two things to solve: 1) find the depth for an arbitrary line, and 2) iterate through the set of lines and track, for each line, which preceding line has a lower depth value.
The first is trivial if you are familiar with the RegExp functions in Javascript:
function getDepth(line) {
  // find leading white space
  var ws = line.match(/^(\s+)/);
  // no leading white space?
  if (ws === null) return 0;
  // leading white space -> count the white space symbols.
  // obviously this goes wrong for mixed spaces and tabs, and that's on you.
  return ws[0].length;
}
The second part is less trivial, and so you have several options. You could iterate through all the lines and track the list of line numbers, pushing onto the list as you go deeper and popping from the list as you go back up, or you can build a simple tree structure (which is generally far better because it lets you expand its functionality much more easily) using standard tree building approaches.
function buildTree(lines, depths) {
  if (!depths) {
    // compute the depth of every line once, up front
    depths = lines.map(getDepth);
  }
  // the root is a placeholder node that sits above every real line
  var root = new Node('', -1);
  for (var pos = 0, end = lines.length; pos < end; pos++) {
    var line = lines[pos];
    var depth = depths[pos];
    root.insert(line, depth);
  }
  return root;
}
With a simple Node object, of course
var Node = function(text, depth) {
this.children = [];
this.line = text.replace(/^\s+/,'');
this.depth = depth;
}
Node.prototype = {
insert: function(text, depth) {
// this is where you become responsible: we need to insert
// a new node inside of this one if the depths indicate that
// is what should happen, but you're on the hook for determining
// what you want to have happen if the indentation was weird, like
// a line at depth 12 after a line at depth 2, or vice versa.
}
}
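For reference, one common way to fill in that missing piece is to keep a stack of "currently open" ancestors instead of a recursive insert. The sketch below is an assumption-laden illustration, not the only reasonable policy: `parseIndented` and `getIndent` are made-up names, blank lines are skipped, depth is simply the count of leading whitespace characters, and a line becomes a child of the nearest preceding line with a strictly smaller depth (so a jump from depth 2 to depth 12 just nests one level).

```javascript
// Position of the first non-whitespace character (or line length if none).
function getIndent(line) {
  return line.search(/\S|$/);
}

function parseIndented(text) {
  const root = { line: null, depth: -1, children: [] };
  const stack = [root]; // stack[stack.length - 1] is the innermost open node

  for (const raw of text.split("\n")) {
    if (raw.trim() === "") continue; // ignore blank lines
    const node = { line: raw.trim(), depth: getIndent(raw), children: [] };

    // Close every open node that is not shallower than the new line.
    // The root has depth -1, so it is never popped.
    while (stack[stack.length - 1].depth >= node.depth) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

const tree = parseIndented(
  ["main", "  child one", "    grandchild", "  child two", "second main"].join("\n")
);
```

Here `tree.children` holds "main" and "second main", and "main" has the two child lines under it. Each line is pushed and popped at most once, so the whole pass is linear in the number of lines.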
I've been following this snake example and decided to modify it to generate new apples only in empty (i.e. non-snake) cells. However, that's introduced a cyclic dependency between Observables, since generating new apples now depends not only on the last position but on the whole snake:
// stream of last `length` positions -- snake cells
var currentSnake = currentPosition.slidingWindowBy(length);
// stream of apple positions
var apples = appleStream(currentSnake);
// length of snake
var length = apples.scan(1, function(l) { return l + 1; });
Is there a nice way to resolve the cycle?
I can imagine how this would work in a messy state machine but not with clean FRP.
The closest I can think of is coalescing apples and length into one stream and making that stream generate its own "currentSnake" from currentPosition.
applesAndLength --> currentPosition
^ ^
| /
currentSnake
I haven't thought about the implementation much, though.
Once it has been constructed, Bacon can usually handle a cyclic dependency between Observables. It is constructing them that's a bit tricky.
In a language like JavaScript, to create a structure with a cycle in it (e.g. a doubly-linked list), you need a mutable variable. For regular objects you use a regular variable or field to do that, e.g.
var tail = { prev: null, next: null };
var head = { prev: null, next: tail };
tail.prev = head; // mutating 'tail' here!
In Bacon, we operate on Observables instead of variables and objects, so we need some kind of a mutable observable to reach the same ends. Thankfully, Bacon.Bus is just the kind of observable we need:
var apples = new Bacon.Bus(); // plugged in later
var length = apples.scan(1, function(l) { return l + 1; });
var currentSnake = currentPosition.slidingWindowBy(length);
apples.plug(appleStream(currentSnake)); // mutating 'apples' here!
In my experience, it is preferable to cut the cycles at EventStreams instead of Properties, because initial values tend to get lost otherwise; thus the reordering of apples and length.
I'm creating a gantt-like chart (configuration really) and need to calculate total duration and validate configurable durations. The goal is that users can build the gantt chart without knowing dates, but by knowing tasks and how they (loosely) relate. In a previous step, users add tasks and select start & end steps for those tasks. Steps have a fixed order. Actual dates are not known (or relevant) but will be mapped to steps later.
Most gantt tools I've seen rely on knowing the start/end dates and don't do calculations.
How should I calculate the total duration and also validate when a duration is invalid? Obviously in some cases a total can't be calculated: if there is an unused step between activities. A simple invalid duration would occur when 2 tasks share the same start and end date but have different values. A more complicated one would occur when 2 or more activities have different start/end steps and overlap.
I'm not looking for a complete solution (it would probably be of little use with my current code anyway), but more a general algorithm or approach. I would think a recursive solution would make sense, but because I'm doing this with JS/jQuery/DOM, I'm concerned about the performance of a recursive solution that has to repeatedly look up elements. Should I start calculating from the end or the beginning? Should I follow each step's start/end until I can go no further, or re-evaluate which step to add to the total duration midway through?
Here is a picture of the current markup:
I'll try to explain what I wound up doing.
I think to follow you have to know a bit about the requirements.
This interactive/configurable gantt/schedule is being used as a template to estimate production timelines.
There are 3 pieces:
Steps
Activities
Durations of activities, which are different depending on the type of item the schedule is applied to.
Since this is a template used for estimation, initially there are no dates - just arbitrary durations tied to activities mapped to steps. However eventually steps get mapped to dates from an imported report (or manually entered).
There are 3 pages where the user incrementally builds up the schedule:
Add/Edit Steps: Steps are rows which are created with a sort order value (inferred)
Add/Edit Activities: A matrix with Steps as columns, Activities as rows. Every intersection is a checkbox. A Start and End Step must be selected for each Activity.
Add/Edit Durations: An item type is selected and durations are added for each activity.
Classes
Step [ Name, StepOrder, ..]
Activity [ Name, StartStepID, StartStepOrder, EndStepID, EndStepOrder, ..]
ActivityDuration : Activity [ Duration, ItemType, ..]
In MVC Controller/Repository:
// start by ordering the durations by the order of the steps
var sortedActivities = durations
.OrderBy(x => x.StartStepOrder)
.ThenBy(x => x.EndStepOrder);
// create func to get the path name from the old + new activity
var getActivityPath = new Func<string, string, string>(
(prevPath, activityID) =>
{
return string.IsNullOrEmpty(prevPath)
? string.Format("{0}", activityID)
: string.Format("{0}.{1}", prevPath, activityID);
});
// create the recursive func we'll call to do all the work
Action<List<ActivityDuration>, string, long?, IEnumerable<ActivityDuration>> buildPaths = null;
buildPaths = (activities, path, startStepID, starts) =>
{
// starts will be null unless we are joining gapped paths,
// so grab the activities with the provided startStepID
if (starts == null)
starts = activities.Where(x => x.StartStepID == startStepID);
// each activity represents a new branch in the path
foreach (var activity in starts)
{
var newPath = getActivityPath(path, activity.Id.ToString());
// add the new path and its ordered collection
// of activities to the collection
if (string.IsNullOrEmpty(path))
{
paths.Add(newPath, new ActivityDuration[] { activity });
}
else
{
paths.Add(newPath, paths[path].Concat(new ActivityDuration[] { activity }));
}
// do this recursively providing the EndStepID as the new Start
buildPaths(activities, newPath, activity.EndStepID, null);
}
// if there were any new branches, remove the previous
// path from the collection
if (starts.Any() && !string.IsNullOrEmpty(path))
{
paths.Remove(path);
}
};
// since the activities are in step order, the first activity's
// StartStepID will be where all paths start.
var firstStepID = sortedActivities.FirstOrDefault().StartStepID;
// call the recursive function starting with the first step
buildPaths(sortedActivities.ToList(), null, firstStepID, null);
// handle gaps in the paths after all the first connected ones have been pathed.
// :: ie - step 1,2 & 4,5 are mapped, but not step 3
// these would be appended to the longest path with a step order < start step of the gapped activity's start step (!!!)
// :: ie - the path should be 1-2,2-4,4-5)
// because the list of paths can grow, we need to keep track of the count
// and loop until there are no more paths added
var beforeCount = paths.Count;
var afterCount = beforeCount + 1;
while (beforeCount < afterCount)
{
foreach (var path in paths.ToArray())
{
var lastActivity = path.Value.Last();
// check for activities that start after the last one in each path ..
// .. that don't start on another activity's end step (because that would be a part of a path already)
var gapped = sortedActivities
.Where(x => x.StartStepOrder > lastActivity.EndStepOrder)
.Where(thisAct =>
!sortedActivities
.Select(otherAct => otherAct.EndStepID)
.Contains(thisAct.StartStepID)
);
beforeCount = paths.Count;
// for each new gapped path, build paths as if it was specified by the previous activity
buildPaths(sortedActivities.ToList(), path.Key, null, gapped);
afterCount = paths.Count;
}
}
// build up an object that can be returned to
// JS with summations and ordering already done.
rValue = new ActivityPaths()
{
Paths = paths
.Select(x => new ActivityPath()
{
Path = x.Key,
ActivityDurations = x.Value,
TotalDuration = x.Value.Sum(y => y.Duration)
})
.OrderByDescending(x => x.TotalDuration)
};
There are admittedly some shortcomings of this design, but the use cases allow for it. Specifically:
- An activity can't directly have more than one dependent step - or in other words - 2 steps can't have the same step order.
- If 2 paths have the same total duration, only one will show as the critical path.
Since the dates which are mapped to steps are ultimately used to calculate back/forward to the end of a path from a given point of time, this is OK. Once all dates are provided, a more accurate critical path can be calculated if needed.
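If only the longest (critical) chain is needed on the JavaScript side, a simpler dynamic-programming formulation can stand in for the recursive path enumeration. To be clear, this is a different technique from the C# code above, sketched under assumptions: the `criticalPath` function and the `name`, `startStep`, `endStep`, and `duration` fields are hypothetical, step values are the numeric step orders, every activity ends on a later step than it starts, and an activity may follow any chain that ends at or before its start step (which also bridges gaps such as an unused step 3).

```javascript
function criticalPath(activities) {
  // Process activities in step order so every possible predecessor comes first.
  const sorted = [...activities].sort(
    (a, b) => a.startStep - b.startStep || a.endStep - b.endStep
  );

  const chains = []; // best chain ending with each processed activity
  let best = { endStep: null, total: 0, path: [] };

  for (const act of sorted) {
    // Longest chain that finishes at or before this activity's start step.
    let prev = { total: 0, path: [] };
    for (const chain of chains) {
      if (chain.endStep <= act.startStep && chain.total > prev.total) {
        prev = chain;
      }
    }
    const chain = {
      endStep: act.endStep,
      total: prev.total + act.duration,
      path: [...prev.path, act.name],
    };
    chains.push(chain);
    if (chain.total > best.total) best = chain;
  }
  return best;
}

const critical = criticalPath([
  { name: "A", startStep: 1, endStep: 2, duration: 3 },
  { name: "B", startStep: 2, endStep: 4, duration: 5 },
  { name: "C", startStep: 1, endStep: 4, duration: 6 },
  { name: "D", startStep: 4, endStep: 5, duration: 2 },
  { name: "E", startStep: 6, endStep: 7, duration: 1 }, // starts after a gap
]);
console.log(critical.path.join(" -> "), critical.total);
```

Like the design above, this reports only one chain when several tie for the longest total, and it runs in quadratic time in the number of activities, which should be fine for a template-sized schedule.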
The entire set of paths is returned so that some validation can be implemented in the javascript. The first path will be the critical 'one', and this path gets highlighted in the UI, with the total duration of the critical path shown as well: