I wrote a reduce function for Iterables and now I want to derive a generic map that can map over arbitrary Iterables. However, I have encountered an issue: since Iterables abstract the data source, map cannot determine its type (e.g. Array, String, Map etc.). I need this type to invoke the corresponding identity element/concat function. Three solutions come to mind:
pass the identity element/concat function explicitly: const map = f => id => concat => xs (this is verbose and would leak internal API, though)
only map Iterables that implement the monoid interface (that would be cool, but would it mean introducing new types?)
rely on the prototype or constructor identity of ArrayIterator, StringIterator, etc.
I tried the latter, but isPrototypeOf/instanceof always yield false no matter what I do, for instance:
Array.prototype.values.prototype.isPrototypeOf([].values()); // false
Array.prototype.isPrototypeOf([].values()); // false
My questions:
Where are the prototypes of ArrayIterator/StringIterator/...?
Is there a better approach that solves the given issue?
Edit: [][Symbol.iterator]() and ("")[Symbol.iterator]() seem to share the same prototype:
Object.getPrototypeOf(Object.getPrototypeOf([][Symbol.iterator]())) ===
Object.getPrototypeOf(Object.getPrototypeOf(("")[Symbol.iterator]()))
A distinction by prototypes seems not to be possible.
Edit: Here is my code:
const values = o => keys(o).values();
const next = iter => iter.next();
const foldl = f => acc => iter => {
let loop = (acc, {value, done}) => done
? acc
: loop(f(acc) (value), next(iter));
return loop(acc, next(iter));
}
// static `map` version only for `Array`s - not what I desire
const map = f => foldl(acc => x => [...acc, f(x)]) ([]);
console.log( map(x => x + x) ([1,2,3].values()) ); // A
console.log( map(x => x + x) (("abc")[Symbol.iterator]()) ); // B
Line A yields the desired result. Line B, however, yields an Array instead of a String, and x + x only works because the + operator happens to apply to both Strings and Numbers.
Edit: There seems to be confusion about why I am doing this: I want to use the iterable/iterator protocol to abstract iteration details away, so that my fold/unfold and derived map/filter etc. functions are generic. The problem is that you can't do this without also having a protocol for identity/concat. And my little "hack" of relying on prototype identity didn't work out.
@redneb made a good point in his response, and I agree with him that not every iterable is also a "mappable". However, keeping that in mind, I still think it is meaningful, at least in Javascript, to utilize the protocol in this way, until maybe future versions offer a mappable or collection protocol for such usage.
I have not used the iterable protocol before, but it seems to me that it is essentially an interface designed to let you iterate over container objects using a for loop. The problem is that you are trying to use that interface for something that it was not designed for. For that you would need a separate interface. It is conceivable that an object might be "iterable" but not "mappable". For example, imagine that in an application we are working with binary trees and we implement the iterable interface for them by traversing them say in BFS order, just because that order makes sense for this particular application. How would a generic map work for this particular iterable? It would need to return a tree of the "same shape", but this particular iterable implementation does not provide enough information to reconstruct the tree.
So the solution to this is to define a new interface (call it Mappable, Functor, or whatever you like), but it has to be a distinct interface. Then you can implement that interface for the types where it makes sense, such as arrays.
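To make the tree example concrete, here is a minimal sketch. The Tree class and its map method are hypothetical names used only for illustration; the point is that the iterator flattens the structure, while a separate map method can preserve it.

```javascript
// Hypothetical binary tree used only for illustration.
class Tree {
  constructor(value, left = null, right = null) {
    this.value = value;
    this.left = left;
    this.right = right;
  }

  // Iterable interface: yields values in breadth-first order.
  *[Symbol.iterator]() {
    const queue = [this];
    while (queue.length > 0) {
      const node = queue.shift();
      yield node.value;
      if (node.left) queue.push(node.left);
      if (node.right) queue.push(node.right);
    }
  }

  // Separate "mappable" interface: rebuilds a tree of the same shape.
  map(f) {
    return new Tree(
      f(this.value),
      this.left && this.left.map(f),
      this.right && this.right.map(f)
    );
  }
}

const tree = new Tree(1, new Tree(2, new Tree(4)), new Tree(3));

// Going through the iterator loses the shape and yields a flat sequence.
console.log([...tree].map(x => x * 10)); // [10, 20, 30, 40]

// The dedicated method keeps the shape.
const mapped = tree.map(x => x * 10);
console.log(mapped.left.left.value); // 40
```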
Pass the identity element/concat function explicitly const map = f => id => concat => xs
Yes, this is almost always necessary if the xs parameter doesn't expose the functionality to construct new values. In Scala, every collection type features a builder for this; unfortunately, there is nothing in the ECMAScript standard that matches this.
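A rough sketch of what passing the builder explicitly could look like. The names arrayBuilder, stringBuilder and mapWith are made up for this example and are not part of any standard.

```javascript
// Hypothetical builder records: `empty` produces the identity element,
// `append` adds one mapped item to the accumulated result.
const arrayBuilder = { empty: () => [], append: (acc, x) => [...acc, x] };
const stringBuilder = { empty: () => "", append: (acc, x) => acc + x };

// Maps over any iterable and constructs the result with the supplied builder.
const mapWith = builder => f => iterable => {
  let acc = builder.empty();
  for (const x of iterable) acc = builder.append(acc, f(x));
  return acc;
};

console.log(mapWith(arrayBuilder)(x => x + x)([1, 2, 3].values())); // [2, 4, 6]
console.log(mapWith(stringBuilder)(x => x.toUpperCase())("abc")); // "ABC"
```

The caller has to know which builder fits, which is exactly the verbosity the question complains about, but nothing has to be guessed from the iterator.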
only map Iterables that implement the monoid interface
Well, yes, that might be one way to go. You don't even need to introduce "new types"; a standard for this already exists in the Fantasyland specification. The downsides, however, are:
most builtin types (String, Map, Set) don't implement the monoid interface despite being iterable
not all "mappables" are even monoids!
On the other hand, not all iterables are necessarily mappable. Trying to write a map over arbitrary iterables without falling back to an Array result is doomed to fail.
So rather just look for the Functor or Traversable interfaces, and use them where they exist. They might internally be built on an iterator, but that should not concern you. The only thing you might want to do is to provide a generic helper for creating such iterator-based mapping methods, so that you can e.g. decorate Map or String with it. That helper might as well take a builder object as a parameter.
rely on the prototype or constructor identity of ArrayIterator, StringIterator, etc.
That won't work; for example, typed arrays use the same kind of iterator as normal arrays. Since the iterator does not have a way to access the iterated object, you cannot distinguish them. But you really shouldn't anyway: as soon as you're dealing with the iterator itself, you should at most map to another iterator, not to the type of iterable that created the iterator.
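Mapping an iterator to another iterator can be sketched with a generator. mapLazy is a name chosen for this example; the caller decides what to build from the result.

```javascript
// Lazily maps any iterable or iterator to a new iterator.
function* mapLazy(f, iterable) {
  for (const x of iterable) yield f(x);
}

const doubled = mapLazy(x => x + x, "abc"[Symbol.iterator]());
console.log(doubled.next().value); // "aa"

// The remaining items are only computed when the iterator is consumed,
// and the caller picks the result type explicitly.
console.log([...doubled].join("")); // "bbcc"
console.log(Array.from(mapLazy(x => x * 2, [1, 2, 3].values()))); // [2, 4, 6]
```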
Where are the prototypes of ArrayIterator/StringIterator/...?
There are no global variables for them, but you can access them by using Object.getPrototypeOf after creating an instance.
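For illustration, this is how those prototypes can be obtained and compared. The variable names are arbitrary, since the standard exposes no global bindings for them.

```javascript
const ArrayIteratorPrototype = Object.getPrototypeOf([][Symbol.iterator]());
const StringIteratorPrototype = Object.getPrototypeOf(""[Symbol.iterator]());

console.log(ArrayIteratorPrototype.isPrototypeOf([1, 2].values())); // true
console.log(StringIteratorPrototype.isPrototypeOf([1, 2].values())); // false

// Typed arrays hand out the same kind of iterator as plain arrays,
// so this check cannot tell the two apart.
const typedIterator = (new Uint8Array(2))[Symbol.iterator]();
console.log(ArrayIteratorPrototype.isPrototypeOf(typedIterator)); // true

// One level up, the built-in iterators share a common prototype,
// which is why comparing at that level says nothing about the source.
console.log(
  Object.getPrototypeOf(ArrayIteratorPrototype) ===
    Object.getPrototypeOf(StringIteratorPrototype)
); // true
```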
You could compare the object strings, though this is not foolproof, as there have been known bugs in certain environments and in ES6 the user can modify these strings.
console.log(Object.prototype.toString.call(""[Symbol.iterator]()));
console.log(Object.prototype.toString.call([][Symbol.iterator]()));
Update: You could get more reliable results by testing whether an iterator's next method can be called on a fresh iterator of a known type; with a mismatched iterator type the call throws. This does require a fully ES6 spec-compliant environment. Something like this:
var sValues = String.prototype[Symbol.iterator];
var testString = 'abc';
function isStringIterator(value) {
if (value === null || typeof value !== 'object') {
return false;
}
try {
return value.next.call(sValues.call(testString)).value === 'a';
} catch (ignore) {}
return false;
}
var aValues = Array.prototype.values;
var testArray = ['a', 'b', 'c'];
function isArrayIterator(value) {
if (value === null || typeof value !== 'object') {
return false;
}
try {
return value.next.call(aValues.call(testArray)).value === 'a';
} catch (ignore) {}
return false;
}
var mapValues = Map.prototype.values;
var testMap = new Map([
[1, 'MapSentinel']
]);
function isMapIterator(value) {
if (value === null || typeof value !== 'object') {
return false;
}
try {
return value.next.call(mapValues.call(testMap)).value === 'MapSentinel';
} catch (ignore) {}
return false;
}
var setValues = Set.prototype.values;
var testSet = new Set(['SetSentinel']);
function isSetIterator(value) {
if (value === null || typeof value !== 'object') {
return false;
}
try {
return value.next.call(setValues.call(testSet)).value === 'SetSentinel';
} catch (ignore) {}
return false;
}
var string = '';
var array = [];
var map = new Map();
var set = new Set();
console.log('string');
console.log(isStringIterator(string[Symbol.iterator]()));
console.log(isArrayIterator(string[Symbol.iterator]()));
console.log(isMapIterator(string[Symbol.iterator]()));
console.log(isSetIterator(string[Symbol.iterator]()));
console.log('array');
console.log(isStringIterator(array[Symbol.iterator]()));
console.log(isArrayIterator(array[Symbol.iterator]()));
console.log(isMapIterator(array[Symbol.iterator]()));
console.log(isSetIterator(array[Symbol.iterator]()));
console.log('map');
console.log(isStringIterator(map[Symbol.iterator]()));
console.log(isArrayIterator(map[Symbol.iterator]()));
console.log(isMapIterator(map[Symbol.iterator]()));
console.log(isSetIterator(map[Symbol.iterator]()));
console.log('set');
console.log(isStringIterator(set[Symbol.iterator]()));
console.log(isArrayIterator(set[Symbol.iterator]()));
console.log(isMapIterator(set[Symbol.iterator]()));
console.log(isSetIterator(set[Symbol.iterator]()));
<script src="https://cdnjs.cloudflare.com/ajax/libs/es6-shim/0.35.1/es6-shim.js"></script>
Note: included ES6-shim because Chrome does not currently support Array#values
I know this question was posted quite a while back, but take a look at
https://www.npmjs.com/package/fluent-iterable
It supports iterable maps along with ~50 other methods.
Using the iter-ops library, you can apply any processing logic while iterating only once:
import {pipe, map, concat} from 'iter-ops';
// some arbitrary iterables:
const iterable1 = [1, 2, 3];
const iterable2 = 'hello'; // strings are also iterable
const i1 = pipe(
iterable1,
map(a => a * 2)
);
console.log([...i1]); //=> 2, 4, 6
const i2 = pipe(
iterable1,
map(a => a * 3),
concat(iterable2)
);
console.log([...i2]); //=> 3, 6, 9, 'h', 'e', 'l', 'l', 'o'
There's a plethora of operators in the library that you can use with iterables.
There's no clean way to do this for an arbitrary iterable. It is possible to create a map for built-in iterables and refer to it.
const iteratorProtoMap = [String, Array, Map, Set]
.map(ctor => [
Object.getPrototypeOf((new ctor)[Symbol.iterator]()),
ctor]
)
.reduce((map, entry) => map.set(...entry), new Map);
function getCtorFromIterator(iterator) {
return iteratorProtoMap.get(Object.getPrototypeOf(iterator));
}
Since custom iterables are possible, an API for registering them can be added as well.
To provide a common pattern for concatenating/constructing the desired iterable, the map can store a callback instead of a constructor.
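A possible shape for such a registry, as a sketch only. registerBuilder and mapByIteratorType are hypothetical names, and each callback receives the already-mapped items as an array.

```javascript
// Registry from iterator prototype to a callback that builds the result.
const builders = new Map();

// `sample` is any instance of the iterable; `build` receives an array of mapped items.
const registerBuilder = (sample, build) =>
  builders.set(Object.getPrototypeOf(sample[Symbol.iterator]()), build);

registerBuilder([], items => items);
registerBuilder("", items => items.join(""));
registerBuilder(new Set(), items => new Set(items));

const mapByIteratorType = f => iterator => {
  const build = builders.get(Object.getPrototypeOf(iterator));
  if (!build) throw new TypeError("No builder registered for this iterator");
  return build(Array.from(iterator, x => f(x)));
};

console.log(mapByIteratorType(x => x * 2)([1, 2, 3].values())); // [2, 4, 6]
console.log(mapByIteratorType(x => x + x)("abc"[Symbol.iterator]())); // "aabbcc"
```

Typed arrays share the array iterator prototype, so with this registry they would come back as plain arrays.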
Related
I'm trying to add Flow type information to a small library of mine.
The library defines some functions that are generic over Object, Array, Set, Map and other types.
Here is a small example to give an idea:
function set( obj, key, value ) {
if( isMap(obj) ) { obj.set(key, value); }
else if( isSet(obj) ) { obj.add(value); }
else { obj[key] = value; }
}
function instantiateSameType( obj ) {
if( isArray(obj) ) { return []; }
else if( isMap(obj) ) { return new Map(); }
else if( isSet(obj) ) { return new Set(); }
else { return {}; }
}
function forEach( obj, fn ) {
if( obj.forEach ) obj.forEach( ( value, key )=>fn(value, key, obj) );
else Object.entries(obj).forEach( ([key, value])=>fn(value, key, obj) );
}
function map( obj, fn ) {
const result = instantiateSameType( obj );
forEach(obj, (value, key)=>{
set( result, key, fn(value, key, obj) );
});
return result;
}
How can I define types for map?
I'd want to avoid giving a specialized version for each of the 4 types I listed in the example, as map is generic over them.
I feel the need to define higher-order interfaces, and implement them for existing types, but can't find much about any of this...
Any hints or ideas?
Update 2017-11-28: fp-ts is the successor to flow-static-land. fp-ts is a newer library by the same author. It supports both Flow and Typescript.
There is a library, flow-static-land, that does something quite similar to what you are attempting. You could probably learn some interesting things by looking at that code and reading the accompanying blog posts by @gcanti. I'll expand on the strategy in flow-static-land; but keep in mind that you can implement your iteration functions without higher-kinded types if you are OK with a closed set of iterable types.
As @ftor mentions, if you want polymorphic functions that can work on an open set of collection types then you want higher-kinded types (HKTs). Higher-kinded types are types that take type parameters, but with one or more of those parameters left unspecified. For example arrays in Flow take a type parameter to specify the type of elements in the array (Array<V>), and the same goes for maps (Map<K, V>). Sometimes you want to be able to refer to a parameterized type without specifying all of its type parameters. For example map should be able to operate on all arrays or maps regardless of their type parameters:
function map<K, A, B, M: Array<_> | Map<K, _>>(M<A>, fn: A => B): M<B>
In this case M is a variable representing a higher-kinded type. We can pass M around as a first-class type, and fill in its type parameter with different types at different times. Flow does not natively support HKTs, so the syntax above does not work. But it is possible to fake HKTs with some type alias indirection, which is what flow-static-land does. There are details in the blog post, Higher kinded types with Flow.
To get a fully-polymorphic version of map, flow-static-land emulates Haskell type classes (which rely on HKTs). map is the defining feature of a type class called Functor; flow-static-land has this definition for Functor (from Functor.js):
export interface Functor<F> {
map<A, B>(f: (a: A) => B, fa: HKT<F, A>): HKT<F, B>
}
The HKT type is flow-static-land's workaround for implementing higher-kinded types. The actual higher-kinded type is F, which you can think of as standing in for Array or Map or any type that could implement map. Expressions like HKT<F, A> can be thought of as F<A> where the higher-kinded type F has been applied to the type parameter A. (I'm doing some hand waving here - F is actually a type-level tag. But the simplified view works to some extent.)
You can create an implementation of Functor for any type. But there is a catch: you need to define your type in terms of HKT so that it can be used as a higher-kinded type. In flow-static-land in the module Arr.js we see this higher-kinded version of the array type:
class IsArr {} // type-level tag, not used at runtime
export type ArrV<A> = Array<A>; // used internally
export type Arr<A> = HKT<IsArr, A>; // the HKT-compatible array type
If you do not want to use Arr<A> in place of Array<A> everywhere in your code then you need to convert using inj: (a: Array<A>) => Arr<A> and prj: (fa: Arr<A>) => Array<A>. inj and prj are type-level transformers - at runtime both of those functions just return their input, so they are likely to be inlined by the JIT. There is no runtime difference between Arr<A> and Array<A>.
A Functor implementation for Arr looks like this:
const arrFunctor: Functor<IsArr> = {
  map<A, B>(f: (a: A) => B, fa: Arr<A>): Arr<B> {
    const plainArray = prj(fa) // unwrap to a plain Array<A>
    const mapped = plainArray.map(f)
    return inj(mapped) // wrap the result back into Arr<B>
  }
}
In fact the entire Arr.js module is an Arr implementation for Functor, Foldable, Traversable, and other useful type classes. Using that implementation with polymorphic code looks like this:
import * as Arr from 'flow-static-land/lib/Arr'
import { type Foldable } from 'flow-static-land/lib/Foldable'
import { type Functor } from 'flow-static-land/lib/Functor'
import { type HKT } from 'flow-static-land/lib/HKT'
type Order = { items: string[], total: number }
// this code is polymorphic in that it is agnostic of the collection kind
// that is given
function computeTotal<F> (
f: Foldable<F> & Functor<F>,
orders: HKT<F, Order>
): number {
const totals = f.map(order => order.total, orders)
return f.reduce((sum, total) => sum + total, 0, totals)
}
// calling the code with an `Arr<Order>` collection
const orders = Arr.inj([{ items: ['foo', 'bar'], total: 23.6 }])
const t = computeTotal(Arr, orders)
computeTotal needs to apply map and reduce to its input. Instead of constraining the input to a given collection type, computeTotal uses its first argument to constrain its input to types that implement both Foldable and Functor: f: Foldable<F> & Functor<F>. At the type-level the argument f acts as a "witness" to prove that the given collection type implements both map and reduce. At runtime f provides references to the specific implementations of map and reduce to be used. At the entry point to the polymorphic code (where computeTotal is called with a statically-known collection type) the Foldable & Functor implementation is given as the argument Arr. Because Javascript is not designed for type classes the choice of Arr must be given explicitly; but Flow will at least throw an error if you try to use an implementation that is incompatible with the collection type that is used.
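Stripped of the types, the runtime side of this "witness" idea is just dictionary passing. The sketch below uses hand-written arrayInstance and mapInstance objects (hypothetical, not taken from flow-static-land) with the same argument order as the calls above.

```javascript
// Hypothetical "instances": plain objects bundling map and reduce for one collection kind.
const arrayInstance = {
  map: (f, xs) => xs.map(x => f(x)),
  reduce: (f, init, xs) => xs.reduce((acc, x) => f(acc, x), init),
};

// Maps over the values of a Map, keeping the keys.
const mapInstance = {
  map: (f, m) => new Map([...m].map(([k, v]) => [k, f(v)])),
  reduce: (f, init, m) => [...m.values()].reduce((acc, v) => f(acc, v), init),
};

// The polymorphic code only talks to the instance it is given.
const computeTotal = (instance, orders) => {
  const totals = instance.map(order => order.total, orders);
  return instance.reduce((sum, total) => sum + total, 0, totals);
};

const order1 = { items: ["foo", "bar"], total: 20 };
const order2 = { items: ["baz"], total: 3 };

console.log(computeTotal(arrayInstance, [order1, order2])); // 23
console.log(computeTotal(mapInstance, new Map([["a", order1], ["b", order2]]))); // 23
```

Without the Flow types nothing stops you from passing a mismatched instance, which is the part the HKT encoding is there to catch.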
To round this out here is an example of a polymorphic function, allItems, that accepts a collection, and returns a collection of the same kind. allItems is agnostic of the specific type of collection that it operates on:
import { type Monad } from 'flow-static-land/lib/Monad'
import { type Monoid, concatAll } from 'flow-static-land/lib/Monoid'
import { type Pointed } from 'flow-static-land/lib/Pointed'
// accepts any collection type that implements `Monad` & `Monoid`, returns
// a collection of the same kind but containing `string` values instead of
// `Order` values
function allItems<F> (f: Monad<F> & Monoid<*>, orders: HKT<F, Order>): HKT<F, string> {
return f.chain(order => fromArray(f, order.items), orders)
}
function fromArray<F, A> (f: Pointed<F> & Monoid<*>, xs: A[]): HKT<F, A> {
return concatAll(f, xs.map(f.of))
}
// called with an `Arr<Order>` collection
const is = allItems(Arr, orders)
chain is flow-static-land's version of flatMap. For every element in a collection, chain runs a callback that must produce a collection of the same kind (but it could hold a different value type). That produces effectively a collection of collections. chain then flattens that to a single level for you. So chain is basically a combination of map and flatten.
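For plain arrays the idea can be sketched in a few lines. This is an array-only illustration, not flow-static-land's implementation.

```javascript
// Array-only sketch: chain = map followed by flattening one level.
const flatten = xss => xss.reduce((acc, xs) => acc.concat(xs), []);
const chain = (f, xs) => flatten(xs.map(x => f(x)));

const orders = [
  { items: ["foo", "bar"], total: 23.6 },
  { items: ["baz"], total: 5 },
];

// map alone produces a collection of collections...
console.log(orders.map(order => order.items)); // [["foo", "bar"], ["baz"]]

// ...while chain flattens the result to a single level.
console.log(chain(order => order.items, orders)); // ["foo", "bar", "baz"]
```

For arrays specifically, the built-in Array.prototype.flatMap behaves the same way.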
I included fromArray because the callback given to chain must return the same kind of collection that allItems accepts and returns - returning an Array from the chain callback will not work. I used a Pointed constraint in fromArray to get the of function, which puts a single value into a collection of the appropriate kind. Pointed does not appear in the constraints of allItems because allItems has a Monad constraint, and every Monad implementation is also an implementation of Pointed, Chain, Functor, and some others.
I am personally a fan of flow-static-land. The functional style and use of HKTs result in code with better type safety than one could get with object-oriented style duck typing. But there are drawbacks. Error messages from Flow can become very verbose when using intersection types like Foldable<F> & Functor<F>. And the code style requires extra training - it will seem super weird to programmers who are not well acquainted with Haskell.
I wanted to follow up with another answer that matches up with the question that you actually asked. Flow can do just what you want. But it does get a bit painful implementing functions that operate on all four of those collection types because in the case of Map the type for keys is fully generic, but for Array the key type must be number, and due to the way objects are implemented in Javascript the key type for Object is always effectively string. (Set does not have keys, but that does not matter too much because you do not need to use keys to set values in a Set.) The safest way to work around the Array and Object special cases would be to provide an overloaded type signature for every function. But it turns out to be quite difficult to tell Flow that key might be the fully-generic type K or string or number depending on the type of obj. The most practical option is to make each function fully generic in the key type. But you have to remember that these functions will fail if you try to use arrays or plain objects with the wrong key type, and you will not get a type error in those cases.
Let's start with a type for the set of collection types that you are working with:
type MyIterable<K, V> = Map<K, V> | Set<V> | Array<V> | Pojo<V>
type Pojo<V> = { [key: string]: V } // plain object
The collection types must all be listed at this point. If you want to work with an open set of collection types instead then see my other answer. And note that my other answer avoids the type-safety holes in the solution here.
There is a handy trick with Flow: you can put the keyword %checks in the type signature of a function that returns a boolean, and Flow will be able to use invocations of that function at type-checking time for type refinements. But the body of the function must use constructions that Flow knows how to use for type refinements because Flow does not actually run the function at type-checking time. For example:
function isMap ( obj: any ): boolean %checks {
return obj instanceof Map
}
function isSet ( obj: any ): boolean %checks {
return obj instanceof Set
}
function isArray ( obj: any ): boolean %checks {
return obj instanceof Array
}
I mentioned you would need a couple of type casts. One instance is in set: Flow knows that when assigning to an array index, the index variable should be a number, and it also knows that K might not be number. The same goes for assigning to plain object properties, since the Pojo type alias specifies string keys. So in the code branch for those cases you need to type-cast key to any, which effectively disables type checking for that use of key.
function set<K, V>( obj: MyIterable<K, V>, key: K, value: V ) {
if( isMap(obj) ) { obj.set(key, value); }
else if( isSet(obj) ) { obj.add(value); }
else { obj[(key:any)] = value; }
}
Your instantiateSameType function just needs a type signature. An important point to keep in mind is that you use instantiateSameType to construct the result of map, and the type of values in the collection can change between the input and output when using map. So it is important to use two different type variables for the value type in the input and output of instantiateSameType as well. You might also allow instantiateSameType to change the key type; but that is not required to make map work correctly.
function instantiateSameType<K, A, B>( obj: MyIterable<K, A> ): MyIterable<K, B> {
if( isArray(obj) ) { return []; }
else if( isMap(obj) ) { return new Map(); }
else if( isSet(obj) ) { return new Set(); }
else { return {}; }
}
That means that the output of instantiateSameType can hold values of any type. It might be the same type as the values in the input collection, or it might not.
In your implementation of forEach you check for the presence of obj.forEach as a type refinement. This is confusing to Flow because one of the types that make up MyIterable is a plain Javascript object, which might hold any string key. Flow cannot assume that obj.forEach will be falsy. So you need to use a different check. Re-using the isArray, etc. predicates works well:
function forEach<K, V, M: MyIterable<K, V>>( obj: M, fn: (value: V, key: K, obj: M) => any ) {
if( isArray(obj) || isMap(obj) || isSet(obj) ) {
obj.forEach((value, key) => fn(value, (key:any), obj));
} else {
for (const key of Object.keys(obj)) {
fn(obj[key], (key:any), obj)
}
}
}
There are two more issues to point out: Flow's library definition for Object.entries looks like this (from core.js):
declare class Object {
/* ... */
static entries(object: any): Array<[string, mixed]>;
/* ... */
}
Flow assumes that the type of values returned by Object.entries will be mixed, but that type should be V. The fix for this is to get values via object property access in a loop.
The type of the key argument to the given callback should be K, but Flow knows that in the array case that type will actually be number, and in the plain object case it will be string. A couple more type casts are necessary to fix those cases.
Finally, map:
function map<K, A, B, M: MyIterable<K, A>>(
obj: M, fn: (value: A, key: K, obj: M) => B
): MyIterable<K, B> {
const result = instantiateSameType( obj );
forEach(obj, (value, key)=>{
set( result, key, fn(value, key, obj) );
});
return result;
}
Some things that I want to point out here: the input collection has a type variable A while the output collection has the variable B. This is because map might change the type of values. And I set up a type variable M for the type of the input collection; that is to inform Flow that the type of the callback argument obj is the same as the type of the input collection. That allows you to use functions in your callback that are particular to the specific collection type that you provided when invoking map.
ES6 (Harmony) introduces a new Set object. The identity algorithm used by Set is similar to the === operator and so is not well suited to comparing objects:
var set = new Set();
set.add({a:1});
set.add({a:1});
console.log([...set.values()]); // Array [ Object, Object ]
How can I customize equality for Set objects in order to do deep object comparison? Is there anything like Java's equals(Object)?
Update 3/2022
There is currently a proposal to add Records and Tuples (basically immutable Objects and Arrays) to Javascript. That proposal offers direct comparison of Records and Tuples using === or !==, comparing values rather than object references. Relevant to this answer, both Set and Map objects would use the value of the Record or Tuple in key comparisons/lookups, which would solve what is being asked for here.
Since the Records and Tuples are immutable (can't be modified) and because they are easily compared by value (by their contents, not just their object reference), it allows Maps and Sets to use object contents as keys and the proposed spec explicitly names this feature for Sets and Maps.
This original question asked for customizability of a Set comparison in order to support deep object comparison. This doesn't propose customizability of the Set comparison, but it directly supports deep object comparison if you use the new Record or a Tuple instead of an Object or an Array and thus would solve the original problem here.
Note, this proposal advanced to Stage 2 in mid-2021. It has been moving forward recently, but is certainly not done.
Mozilla work on this new proposal can be tracked here.
Original Answer
The ES6 Set object does not have any compare methods or custom compare extensibility.
The .has(), .add() and .delete() methods work only off it being the same actual object or same value for a primitive and don't have a means to plug into or replace just that logic.
You could presumably derive your own object from a Set and replace .has(), .add() and .delete() methods with something that did a deep object comparison first to find if the item is already in the Set, but the performance would likely not be good since the underlying Set object would not be helping at all. You'd probably have to just do a brute force iteration through all existing objects to find a match using your own custom compare before calling the original .add().
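A rough sketch of that brute-force approach. DeepSet and deepEqual are names invented for this example, and the equality helper only handles plain objects, arrays and primitives without cycles.

```javascript
// Minimal structural equality (assumption: plain objects, arrays, primitives; no cycles).
const deepEqual = (a, b) => {
  if (a === b || (a !== a && b !== b)) return true; // same value, or both NaN
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length &&
    keysA.every(k => Object.prototype.hasOwnProperty.call(b, k) && deepEqual(a[k], b[k]));
};

class DeepSet extends Set {
  // Linear scan for a stored element that is structurally equal to `item`.
  lookup(item) {
    for (const stored of this) if (deepEqual(stored, item)) return [stored];
    return [];
  }
  has(item) { return this.lookup(item).length > 0; }
  add(item) { return this.has(item) ? this : super.add(item); }
  delete(item) {
    const hit = this.lookup(item);
    return hit.length > 0 && super.delete(hit[0]);
  }
}

const deepSet = new DeepSet();
deepSet.add({ a: 1 });
deepSet.add({ a: 1 });
console.log(deepSet.size); // 1
console.log(deepSet.has({ a: 1 })); // true
```

Every has, add and delete is O(n) here, which is the performance problem described above.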
Here's some info from this article and discussion of ES6 features:
5.2 Why can’t I configure how maps and sets compare keys and values?
Question: It would be nice if there were a way to configure what map
keys and what set elements are considered equal. Why isn’t there?
Answer: That feature has been postponed, as it is difficult to
implement properly and efficiently. One option is to hand callbacks to
collections that specify equality.
Another option, available in Java, is to specify equality via a method
that object implement (equals() in Java). However, this approach is
problematic for mutable objects: In general, if an object changes, its
“location” inside a collection has to change, as well. But that’s not
what happens in Java. JavaScript will probably go the safer route of
only enabling comparison by value for special immutable objects
(so-called value objects). Comparison by value means that two values
are considered equal if their contents are equal. Primitive values are
compared by value in JavaScript.
As mentioned in jfriend00's answer, customizing the equality relation is probably not possible.
The following code outlines a computationally efficient (but memory-expensive) workaround:
class GeneralSet {
constructor() {
this.map = new Map();
this[Symbol.iterator] = this.values;
}
add(item) {
this.map.set(item.toIdString(), item);
}
values() {
return this.map.values();
}
delete(item) {
return this.map.delete(item.toIdString());
}
// ...
}
Each inserted element has to implement a toIdString() method that returns a string. Two objects are considered equal if and only if their toIdString methods return the same value.
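A usage sketch of this idea, with a compact variant of the class so that the example runs on its own. Point, has and size are additions made for illustration only.

```javascript
// Compact variant of the outline above, extended with `has` and `size` for the demo.
class GeneralSet {
  constructor() { this.map = new Map(); }
  add(item) { this.map.set(item.toIdString(), item); return this; }
  has(item) { return this.map.has(item.toIdString()); }
  delete(item) { return this.map.delete(item.toIdString()); }
  values() { return this.map.values(); }
  [Symbol.iterator]() { return this.values(); }
  get size() { return this.map.size; }
}

// Hypothetical element type whose identity is its coordinates.
class Point {
  constructor(x, y) { this.x = x; this.y = y; }
  toIdString() { return `${this.x},${this.y}`; }
}

const points = new GeneralSet();
points.add(new Point(1, 2)).add(new Point(1, 2)).add(new Point(3, 4));

console.log(points.size); // 2
console.log(points.has(new Point(3, 4))); // true
console.log([...points].map(p => p.toIdString())); // ["1,2", "3,4"]
```

One difference from a native Set: adding an "equal" element replaces the stored object, because Map.prototype.set overwrites the value for an existing key.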
As the top answer mentions, customizing equality is problematic for mutable objects. The good news is (and I'm surprised no one has mentioned this yet) there's a very popular library called immutable-js that provides a rich set of immutable types which provide the deep value equality semantics you're looking for.
Here's your example using immutable-js:
const { Map, Set } = require('immutable');
var set = new Set();
set = set.add(Map({a:1}));
set = set.add(Map({a:1}));
console.log([...set.values()]); // [Map {"a" => 1}]
Maybe you can try to use JSON.stringify() to do deep object comparison.
for example :
const arr = [
{name:'a', value:10},
{name:'a', value:20},
{name:'a', value:20},
{name:'b', value:30},
{name:'b', value:40},
{name:'b', value:40}
];
const names = new Set();
const result = arr.filter(item => !names.has(JSON.stringify(item)) ? names.add(JSON.stringify(item)) : false);
console.log(result);
To add to the answers here, I went ahead and implemented a Map wrapper that takes a custom hash function, a custom equality function, and stores distinct values that have equivalent (custom) hashes in buckets.
Predictably, it turned out to be slower than czerny's string concatenation method.
Full source here: https://github.com/makoConstruct/ValueMap
Comparing them directly does not seem possible, but JSON.stringify works if the keys are sorted first. As I pointed out in a comment:
JSON.stringify({a:1, b:2}) !== JSON.stringify({b:2, a:1});
But we can work around that with a custom stringify method. First we write the method
Custom Stringify
Object.prototype.stringifySorted = function(){
let oldObj = this;
let obj = (oldObj.length || oldObj.length === 0) ? [] : {};
for (let key of Object.keys(this).sort((a, b) => a.localeCompare(b))) {
let type = typeof (oldObj[key])
if (type === 'object' && oldObj[key] !== null) { // typeof null is also 'object'
obj[key] = oldObj[key].stringifySorted();
} else {
obj[key] = oldObj[key];
}
}
return JSON.stringify(obj);
}
The Set
Now we use a Set. But we use a Set of Strings instead of objects
let set = new Set()
set.add({a:1, b:2}.stringifySorted());
set.has({b:2, a:1}.stringifySorted());
// returns true
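If you would rather not extend Object.prototype, the same effect can be sketched with a replacer function passed to JSON.stringify. stableStringify is a name made up for this example.

```javascript
// Serializes with object keys sorted at every nesting level,
// without touching Object.prototype.
const stableStringify = value =>
  JSON.stringify(value, (key, val) => {
    // Leave primitives, null and arrays as they are.
    if (val === null || typeof val !== "object" || Array.isArray(val)) return val;
    // Copy plain objects with their keys in sorted order.
    const sorted = {};
    for (const k of Object.keys(val).sort()) sorted[k] = val[k];
    return sorted;
  });

const seen = new Set();
seen.add(stableStringify({ a: 1, b: { d: 4, c: 3 } }));

console.log(seen.has(stableStringify({ b: { c: 3, d: 4 }, a: 1 }))); // true
console.log(stableStringify({ b: null, a: [2, 1] })); // {"a":[2,1],"b":null}
```

Array order is preserved on purpose, so [1, 2] and [2, 1] still count as different values.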
Get all the values
After we created the set and added the values, we can get all values by
let iterator = set.values();
let val = iterator.next();
// check val.done before logging, so the final { done: true } step is not printed
while (!val.done) {
  console.log(val.value);
  val = iterator.next();
}
Here's a link with all in one file
http://tpcg.io/FnJg2i
For Typescript users the answers by others (especially czerny) can be generalized to a nice type-safe and reusable base class:
/**
* Map that stringifies the key objects in order to leverage
* the javascript native Map and preserve key uniqueness.
*/
abstract class StringifyingMap<K, V> {
private map = new Map<string, V>();
private keyMap = new Map<string, K>();
has(key: K): boolean {
let keyString = this.stringifyKey(key);
return this.map.has(keyString);
}
get(key: K): V | undefined {
let keyString = this.stringifyKey(key);
return this.map.get(keyString);
}
set(key: K, value: V): StringifyingMap<K, V> {
let keyString = this.stringifyKey(key);
this.map.set(keyString, value);
this.keyMap.set(keyString, key);
return this;
}
/**
* Puts new key/value if key is absent.
* @param key key
* @param defaultValue default value factory
*/
putIfAbsent(key: K, defaultValue: () => V): boolean {
if (!this.has(key)) {
let value = defaultValue();
this.set(key, value);
return true;
}
return false;
}
keys(): IterableIterator<K> {
return this.keyMap.values();
}
keyList(): K[] {
return [...this.keys()];
}
delete(key: K): boolean {
let keyString = this.stringifyKey(key);
let flag = this.map.delete(keyString);
this.keyMap.delete(keyString);
return flag;
}
clear(): void {
this.map.clear();
this.keyMap.clear();
}
size(): number {
return this.map.size;
}
/**
* Turns the `key` object to a primitive `string` for the underlying `Map`
* @param key key to be stringified
*/
protected abstract stringifyKey(key: K): string;
}
Example implementation is then this simple: just override the stringifyKey method. In my case I stringify some uri property.
class MyMap extends StringifyingMap<MyKey, MyValue> {
protected stringifyKey(key: MyKey): string {
return key.uri.toString();
}
}
Example usage is then as if this was a regular Map<K, V>.
const key1 = new MyKey(1);
const value1 = new MyValue(1);
const value2 = new MyValue(2);
const myMap = new MyMap();
myMap.set(key1, value1);
myMap.set(key1, value2); // native Map would put another key/value pair
myMap.size(); // returns 1, not 2
A good stringification method for the special but frequent case of a TypedArray as a Set/Map key is:
const key = String.fromCharCode(...new Uint16Array(myArray.buffer));
It generates the shortest possible unique string that can be easily converted back. However, the result is not always a well-formed UTF-16 string, because it may contain unpaired low or high surrogates, so it is not suitable for display. Set and Map seem to ignore surrogate validity.
As measured in Firefox and Chrome, the spread operator performs slowly. If your myArray has fixed size, it executes faster when you write:
const a = new Uint16Array(myArray.buffer); // here: myArray = Uint32Array(2) = 8 bytes
const key = String.fromCharCode(a[0],a[1],a[2],a[3]); // 8 bytes too
Probably the most valuable advantage of this method of key-building is that it works for Float32Array and Float64Array without any rounding side effects. Note that +0 and -0 then produce different keys. Infinities of the same sign produce the same key, as do silent NaNs. Signaling NaNs differ depending on their signal (never seen in vanilla JavaScript).
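To illustrate the round trip, here is a minimal sketch. The helper names typedArrayToKey and keyToFloat64Array are made up for this example, and it assumes the array's byte length is even. It builds the key in a loop instead of using the spread operator and reads it back with charCodeAt, because iterating a string by code points would merge surrogate pairs.

```javascript
// Hypothetical helpers for illustration, not part of any library.
function typedArrayToKey(arr) {
  // View the same bytes as 16-bit code units (byte length must be even).
  const units = new Uint16Array(arr.buffer, arr.byteOffset, arr.byteLength / 2);
  let key = "";
  for (let i = 0; i < units.length; i++) {
    key += String.fromCharCode(units[i]);
  }
  return key;
}

function keyToFloat64Array(key) {
  // Read code units one by one; do not iterate the string by code points.
  const units = new Uint16Array(key.length);
  for (let i = 0; i < key.length; i++) {
    units[i] = key.charCodeAt(i);
  }
  return new Float64Array(units.buffer);
}

const seen = new Set();
seen.add(typedArrayToKey(Float64Array.of(1.5, 0)));
console.log(seen.has(typedArrayToKey(Float64Array.of(1.5, 0))));  // true
console.log(seen.has(typedArrayToKey(Float64Array.of(1.5, -0)))); // false
```

The second lookup fails because +0 and -0 have different bit patterns, which matches the note above.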
As others have said, there is no native method that can do this so far.
But if you would like to deduplicate an array with your own custom comparator, you can do it with the reduce method.
function distinct(array, equal) {
// No need to convert it to a Set object since it may give you a wrong signal that the set can work with your objects.
return array.reduce((p, c) => {
p.findIndex((element) => equal(element, c)) > -1 || p.push(c);
return p;
}, []);
}
// You can call this method like below,
const users = distinct(
[
{id: 1, name: "kevin"},
{id: 2, name: "sean"},
{id: 1, name: "jerry"}
],
(a, b) => a.id === b.id
);
...
As others have said, there is no way to do it with the current version of Set.
My suggestion is to do it using a combination of arrays and maps.
The code snippet below creates a map of unique keys based on a key you define and then transforms that map of unique items into an array.
const array =
[
{ "name": "Joe", "age": 17 },
{ "name": "Bob", "age": 17 },
{ "name": "Carl", "age": 35 }
]
const key = 'age';
const arrayUniqueByKey = [...new Map(array.map(item =>
[item[key], item])).values()];
console.log(arrayUniqueByKey);
/*OUTPUT
[
{ "name": "Bob", "age": 17 },
{ "name": "Carl", "age": 35 }
]
*/
// Note: this will pick the last duplicated item in the list.
For anyone who found this question on Google (like me) wanting to get a value from a Map using an object as the key:
Warning: this answer will not work with all objects, because JSON.stringify depends on property order and cannot serialize every value.
var map = new Map<string,string>();
map.set(JSON.stringify({"A":2} /*string of object as key*/), "Worked");
console.log(map.get(JSON.stringify({"A":2}))||"Not worked");
Output:
Worked
I have been studying JavaScript algorithms and Big O for interviews. I was told that knowing the runtimes of built-in methods, such as Object.prototype.hasOwnProperty and Array.prototype.map, is important.
What is a simple way to view the source code for these functions in node.js? I have a local copy of node.js, and I tried to search for these methods in my text editor, but it's not as straightforward as I thought.
Object.prototype.hasOwnProperty()
From a JavaScript interview point of view, I would think you just need to fully understand what obj.hasOwnProperty() does at the JavaScript level, not how it's implemented inside of V8.
To do that, you should fully understand this little snippet:
function MyConstructor() {
this.methodB = function() {}
}
MyConstructor.prototype = {
methodA: function() {}
};
var o = new MyConstructor();
console.log(o.hasOwnProperty("methodA")); // false
console.log(o.hasOwnProperty("methodB")); // true
o.methodA = function() {}; // assign "own" property, overrides prototype
console.log(o.hasOwnProperty("methodA")); // true
This is because .hasOwnProperty() looks only on the object itself and not on the prototype chain. So properties which are only on the prototype chain or do not exist at all will return false and properties which are directly on the object will return true.
Array.prototype.map()
A polyfill in Javascript for Array.prototype.map() is here on MDN which will show you exactly how it works. You can, of course, do the same type of search I did above in the Github repository to find the .map() implementation too if you want.
Array.prototype.map() is pretty simple really. Iterate over an array, calling a function for each item in the array. Each return value of that function will be used to construct a new array that will be returned from the call to .map(). So, conceptually, it's used to "map" one array to another by calling some transform function on each element of the original array.
In the simplest incarnation, you add 1 to each element of an array:
var origArray = [1,2,3];
var newArray = origArray.map(function(item, index, array) {
return item + 1;
});
console.log(newArray); // [2,3,4]
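For interview purposes, the behaviour described above can be approximated by the following simplified sketch. It is not the specification algorithm and not V8's code (simpleMap is a name made up here). It only shows that the callback runs once per index, so the cost is linear in the array length, assuming the callback itself takes constant time.

```javascript
// Simplified illustration only; the real Array.prototype.map also handles
// thisArg, array-like receivers and species constructors.
function simpleMap(array, callback) {
  const result = new Array(array.length);
  for (let i = 0; i < array.length; i++) {
    // Skip holes in sparse arrays, as the built-in does.
    if (i in array) {
      result[i] = callback(array[i], i, array);
    }
  }
  return result;
}

console.log(simpleMap([1, 2, 3], function (item) {
  return item + 1;
})); // [2, 3, 4]
```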
Actual V8 source code:
If you really want to see how it is implemented inside of V8, here are code snippets and links to the relevant actual code files. As you can see, most of it is in C++ and to understand it, you have to understand how objects are structured in memory and what C++ methods they have internally in V8. This is very V8-specific, not general Javascript knowledge.
I've included links to the relevant source files too so if you want to see other context in those files, you can click on the links to see that.
In v8.h:
V8_DEPRECATED("Use maybe version", bool HasOwnProperty(Local<String> key));
V8_WARN_UNUSED_RESULT Maybe<bool> HasOwnProperty(Local<Context> context, Local<Name> key);
In api.cc:
Maybe<bool> v8::Object::HasOwnProperty(Local<Context> context,
Local<Name> key) {
PREPARE_FOR_EXECUTION_PRIMITIVE(context, "v8::Object::HasOwnProperty()",
bool);
auto self = Utils::OpenHandle(this);
auto key_val = Utils::OpenHandle(*key);
auto result = i::JSReceiver::HasOwnProperty(self, key_val);
has_pending_exception = result.IsNothing();
RETURN_ON_FAILED_EXECUTION_PRIMITIVE(bool);
return result;
}
bool v8::Object::HasOwnProperty(Local<String> key) {
auto context = ContextFromHeapObject(Utils::OpenHandle(this));
return HasOwnProperty(context, key).FromMaybe(false);
}
In v8natives.js:
// ES6 7.3.11
function ObjectHasOwnProperty(value) {
var name = TO_NAME(value);
var object = TO_OBJECT(this);
return %HasOwnProperty(object, name);
}
In objects-inl.h:
Maybe<bool> JSReceiver::HasOwnProperty(Handle<JSReceiver> object,
Handle<Name> name) {
if (object->IsJSObject()) { // Shortcut
LookupIterator it = LookupIterator::PropertyOrElement(
object->GetIsolate(), object, name, LookupIterator::HIDDEN);
return HasProperty(&it);
}
Maybe<PropertyAttributes> attributes =
JSReceiver::GetOwnPropertyAttributes(object, name);
MAYBE_RETURN(attributes, Nothing<bool>());
return Just(attributes.FromJust() != ABSENT);
}
In runtime-object.cc:
static Object* HasOwnPropertyImplementation(Isolate* isolate,
Handle<JSObject> object,
Handle<Name> key) {
Maybe<bool> maybe = JSReceiver::HasOwnProperty(object, key);
if (!maybe.IsJust()) return isolate->heap()->exception();
if (maybe.FromJust()) return isolate->heap()->true_value();
// Handle hidden prototypes. If there's a hidden prototype above this thing
// then we have to check it for properties, because they are supposed to
// look like they are on this object.
if (object->map()->has_hidden_prototype()) {
PrototypeIterator iter(isolate, object);
DCHECK(!iter.IsAtEnd());
// TODO(verwaest): The recursion is not necessary for keys that are array
// indices. Removing this.
// Casting to JSObject is fine because JSProxies are never used as
// hidden prototypes.
return HasOwnPropertyImplementation(
isolate, PrototypeIterator::GetCurrent<JSObject>(iter), key);
}
RETURN_FAILURE_IF_SCHEDULED_EXCEPTION(isolate);
return isolate->heap()->false_value();
}
RUNTIME_FUNCTION(Runtime_HasOwnProperty) {
HandleScope scope(isolate);
DCHECK(args.length() == 2);
CONVERT_ARG_HANDLE_CHECKED(Object, object, 0)
CONVERT_ARG_HANDLE_CHECKED(Name, key, 1);
uint32_t index;
const bool key_is_array_index = key->AsArrayIndex(&index);
// Only JS objects can have properties.
if (object->IsJSObject()) {
Handle<JSObject> js_obj = Handle<JSObject>::cast(object);
// Fast case: either the key is a real named property or it is not
// an array index and there are no interceptors or hidden
// prototypes.
// TODO(jkummerow): Make JSReceiver::HasOwnProperty fast enough to
// handle all cases directly (without this custom fast path).
Maybe<bool> maybe = Nothing<bool>();
if (key_is_array_index) {
LookupIterator it(js_obj->GetIsolate(), js_obj, index,
LookupIterator::HIDDEN);
maybe = JSReceiver::HasProperty(&it);
} else {
maybe = JSObject::HasRealNamedProperty(js_obj, key);
}
if (!maybe.IsJust()) return isolate->heap()->exception();
DCHECK(!isolate->has_pending_exception());
if (maybe.FromJust()) {
return isolate->heap()->true_value();
}
Map* map = js_obj->map();
if (!key_is_array_index && !map->has_named_interceptor() &&
!map->has_hidden_prototype()) {
return isolate->heap()->false_value();
}
// Slow case.
return HasOwnPropertyImplementation(isolate, Handle<JSObject>(js_obj),
Handle<Name>(key));
} else if (object->IsString() && key_is_array_index) {
// Well, there is one exception: Handle [] on strings.
Handle<String> string = Handle<String>::cast(object);
if (index < static_cast<uint32_t>(string->length())) {
return isolate->heap()->true_value();
}
} else if (object->IsJSProxy()) {
Maybe<bool> result =
JSReceiver::HasOwnProperty(Handle<JSProxy>::cast(object), key);
if (!result.IsJust()) return isolate->heap()->exception();
return isolate->heap()->ToBoolean(result.FromJust());
}
return isolate->heap()->false_value();
}
This is the node.js Github repository. If you know what to search for and have enough patience to wade through all the search hits, you can generally find anything you need. The unfortunate thing about searching on Github is I have not found any way to remove all the test sub-directories from the search so you end up with 95% of the search hits in the test code, not in the actual implementation code. But, with enough persistence, you can eventually find what you need.
ES6 (Harmony) introduces a new Set object. The identity algorithm used by Set is similar to the === operator and so is not well suited to comparing objects:
var set = new Set();
set.add({a:1});
set.add({a:1});
console.log([...set.values()]); // Array [ Object, Object ]
How to customize equality for Set objects in order to do deep object comparison? Is there anything like Java equals(Object)?
Update 3/2022
There is currently a proposal to add Records and Tuples (basically immutable Objects and Arrays) to JavaScript. The proposal offers direct comparison of Records and Tuples using === or !==, comparing values rather than object references. Relevant to this answer, both Set and Map objects would use the value of the Record or Tuple in key comparisons and lookups, which would solve what is being asked for here.
Since the Records and Tuples are immutable (can't be modified) and because they are easily compared by value (by their contents, not just their object reference), it allows Maps and Sets to use object contents as keys and the proposed spec explicitly names this feature for Sets and Maps.
This original question asked for customizability of a Set comparison in order to support deep object comparison. This doesn't propose customizability of the Set comparison, but it directly supports deep object comparison if you use the new Record or a Tuple instead of an Object or an Array and thus would solve the original problem here.
Note, this proposal advanced to Stage 2 in mid-2021. It has been moving forward recently, but is certainly not done.
Mozilla work on this new proposal can be tracked here.
Original Answer
The ES6 Set object does not have any compare methods or custom compare extensibility.
The .has(), .add() and .delete() methods work only off it being the same actual object or same value for a primitive and don't have a means to plug into or replace just that logic.
You could presumably derive your own object from a Set and replace .has(), .add() and .delete() methods with something that did a deep object comparison first to find if the item is already in the Set, but the performance would likely not be good since the underlying Set object would not be helping at all. You'd probably have to just do a brute force iteration through all existing objects to find a match using your own custom compare before calling the original .add().
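A rough sketch of that idea follows. DeepSet and its equals callback are names invented for this example, and it assumes the elements are objects. Every operation may scan all existing elements, so lookups are O(n) instead of the usual near-constant time of a native Set.

```javascript
// Hypothetical subclass: equality is decided by a user-supplied function.
class DeepSet extends Set {
  constructor(equals) {
    super();
    this.equals = equals;
  }
  // Brute-force search for an element the comparator treats as equal.
  find(item) {
    for (const existing of this) {
      if (this.equals(existing, item)) return existing;
    }
    return undefined;
  }
  has(item) {
    return super.has(item) || this.find(item) !== undefined;
  }
  add(item) {
    return this.has(item) ? this : super.add(item);
  }
  delete(item) {
    if (super.delete(item)) return true;
    const existing = this.find(item);
    return existing !== undefined && super.delete(existing);
  }
}

// Example comparator: two objects are equal when their "a" properties match.
const deepSet = new DeepSet((x, y) => x.a === y.a);
deepSet.add({a: 1});
deepSet.add({a: 1});
console.log(deepSet.size); // 1
```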
Here's some info from this article and discussion of ES6 features:
5.2 Why can’t I configure how maps and sets compare keys and values?
Question: It would be nice if there were a way to configure what map
keys and what set elements are considered equal. Why isn’t there?
Answer: That feature has been postponed, as it is difficult to
implement properly and efficiently. One option is to hand callbacks to
collections that specify equality.
Another option, available in Java, is to specify equality via a method
that object implement (equals() in Java). However, this approach is
problematic for mutable objects: In general, if an object changes, its
“location” inside a collection has to change, as well. But that’s not
what happens in Java. JavaScript will probably go the safer route of
only enabling comparison by value for special immutable objects
(so-called value objects). Comparison by value means that two values
are considered equal if their contents are equal. Primitive values are
compared by value in JavaScript.
As mentioned in jfriend00's answer, customizing the equality relation is probably not possible.
The following code presents an outline of a computationally efficient (but memory-expensive) workaround:
class GeneralSet {
constructor() {
this.map = new Map();
this[Symbol.iterator] = this.values;
}
add(item) {
this.map.set(item.toIdString(), item);
}
values() {
return this.map.values();
}
delete(item) {
return this.map.delete(item.toIdString());
}
// ...
}
Each inserted element has to implement a toIdString() method that returns a string. Two objects are considered equal if and only if their toIdString methods return the same value.
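As a usage sketch, the following assumes a hypothetical Point class that implements toIdString(), and it adds has and size members that the outline above leaves out:

```javascript
class GeneralSet {
  constructor() {
    this.map = new Map();
  }
  add(item) {
    this.map.set(item.toIdString(), item);
    return this;
  }
  has(item) {
    return this.map.has(item.toIdString());
  }
  delete(item) {
    return this.map.delete(item.toIdString());
  }
  get size() {
    return this.map.size;
  }
  [Symbol.iterator]() {
    return this.map.values();
  }
}

// Hypothetical element type, for illustration only.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  toIdString() {
    return `${this.x},${this.y}`;
  }
}

const points = new GeneralSet();
points.add(new Point(1, 2)).add(new Point(1, 2));
console.log(points.size);                 // 1
console.log(points.has(new Point(1, 2))); // true
```

Note that adding an equal element replaces the stored one, because Map.prototype.set overwrites the value for an existing key.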
As the top answer mentions, customizing equality is problematic for mutable objects. The good news is (and I'm surprised no one has mentioned this yet) there's a very popular library called immutable-js that provides a rich set of immutable types which provide the deep value equality semantics you're looking for.
Here's your example using immutable-js:
const { Map, Set } = require('immutable');
var set = new Set();
set = set.add(Map({a:1}));
set = set.add(Map({a:1}));
console.log([...set.values()]); // [Map {"a" => 1}]
Maybe you can use JSON.stringify() to do the deep object comparison.
For example:
const arr = [
{name:'a', value:10},
{name:'a', value:20},
{name:'a', value:20},
{name:'b', value:30},
{name:'b', value:40},
{name:'b', value:40}
];
const names = new Set();
const result = arr.filter(item => !names.has(JSON.stringify(item)) ? names.add(JSON.stringify(item)) : false);
console.log(result);
To add to the answers here, I went ahead and implemented a Map wrapper that takes a custom hash function, a custom equality function, and stores distinct values that have equivalent (custom) hashes in buckets.
Predictably, it turned out to be slower than czerny's string concatenation method.
Full source here: https://github.com/makoConstruct/ValueMap
First, let me define what is short-cut fusion for those of you who don't know. Consider the following array transformation in JavaScript:
var a = [1,2,3,4,5].map(square).map(increment);
console.log(a);
function square(x) {
return x * x;
}
function increment(x) {
return x + 1;
}
Here we have an array, [1,2,3,4,5], whose elements are first squared, [1,4,9,16,25], and then incremented [2,5,10,17,26]. Hence, although we don't need the intermediate array [1,4,9,16,25], we still create it.
Short-cut fusion is an optimization technique which can get rid of intermediate data structures by merging some function calls into one. For example, short-cut fusion can be applied to the above code to produce:
var a = [1,2,3,4,5].map(compose(square, increment));
console.log(a);
function square(x) {
return x * x;
}
function increment(x) {
return x + 1;
}
function compose(g, f) {
return function (x) {
return f(g(x));
};
}
As you can see, the two separate map calls have been fused into a single map call by composing the square and increment functions. Hence the intermediate array is not created.
Now, I understand that libraries like Immutable.js and Lazy.js emulate lazy evaluation in JavaScript. Lazy evaluation means that results are only computed when required.
For example, consider the above code. Although we square and increment each element of the array, we may not need all the results.
Suppose we only want the first 3 results. Using Immutable.js or Lazy.js we can get the first 3 results, [2,5,10], without calculating the last 2 results, [17,26], because they are not needed.
However, lazy evaluation just delays the calculation of results until required. It does not remove intermediate data structures by fusing functions.
To make this point clear, consider the following code which emulates lazy evaluation:
var List = defclass({
constructor: function (head, tail) {
if (typeof head !== "function" || head.length > 0)
Object.defineProperty(this, "head", { value: head });
else Object.defineProperty(this, "head", { get: head });
if (typeof tail !== "function" || tail.length > 0)
Object.defineProperty(this, "tail", { value: tail });
else Object.defineProperty(this, "tail", { get: tail });
},
map: function (f) {
var l = this;
if (l === nil) return nil;
return cons(function () {
return f(l.head);
}, function () {
return l.tail.map(f);
});
},
take: function (n) {
var l = this;
if (l === nil || n === 0) return nil;
return cons(function () {
return l.head;
}, function () {
return l.tail.take(n - 1);
});
},
mapSeq: function (f) {
var l = this;
if (l === nil) return nil;
return cons(f(l.head), l.tail.mapSeq(f));
}
});
var nil = Object.create(List.prototype);
list([1,2,3,4,5])
.map(trace(square))
.map(trace(increment))
.take(3)
.mapSeq(log);
function cons(head, tail) {
return new List(head, tail);
}
function list(a) {
return toList(a, a.length, 0);
}
function toList(a, length, i) {
if (i >= length) return nil;
return cons(a[i], function () {
return toList(a, length, i + 1);
});
}
function square(x) {
return x * x;
}
function increment(x) {
return x + 1;
}
function log(a) {
console.log(a);
}
function trace(f) {
return function () {
var result = f.apply(this, arguments);
console.log(f.name, JSON.stringify([...arguments]), result);
return result;
};
}
function defclass(prototype) {
var constructor = prototype.constructor;
constructor.prototype = prototype;
return constructor;
}
As you can see, the function calls are interleaved and only the first three elements of the array are processed, proving that the results are indeed computed lazily:
square [1] 1
increment [1] 2
2
square [2] 4
increment [4] 5
5
square [3] 9
increment [9] 10
10
If lazy evaluation is not used then the result would be:
square [1] 1
square [2] 4
square [3] 9
square [4] 16
square [5] 25
increment [1] 2
increment [4] 5
increment [9] 10
increment [16] 17
increment [25] 26
2
5
10
However, if you look at the source code, each of the functions list, map, take and mapSeq returns an intermediate List data structure. No short-cut fusion is performed.
This brings me to my main question: do libraries like Immutable.js and Lazy.js perform short-cut fusion?
The reason I ask is that, according to the documentation, they “apparently” do. However, I am skeptical that they actually perform short-cut fusion.
For example, this is taken from the README.md file of Immutable.js:
Immutable also provides a lazy Seq, allowing efficient chaining of collection methods like map and filter without creating intermediate representations. Create some Seq with Range and Repeat.
So the developers of Immutable.js claim that their Seq data structure allows efficient chaining of collection methods like map and filter without creating intermediate representations (i.e. they perform short-cut fusion).
However, I don't see them doing so in their code anywhere. Perhaps I can't find it because they are using ES6 and my eyes aren't all too familiar with ES6 syntax.
Furthermore, in their documentation for Lazy Seq they mention:
Seq describes a lazy operation, allowing them to efficiently chain use of all the Iterable methods (such as map and filter).
Seq is immutable — Once a Seq is created, it cannot be changed, appended to, rearranged or otherwise modified. Instead, any mutative method called on a Seq will return a new Seq.
Seq is lazy — Seq does as little work as necessary to respond to any method call.
So it is established that Seq is indeed lazy. However, there are no examples to show that intermediate representations are indeed not created (which they claim to be doing).
Moving on to Lazy.js we have the same situation. Thankfully, Daniel Tao wrote a blog post on how Lazy.js works, in which he mentions that at its heart Lazy.js simply does function composition. He gives the following example:
Lazy.range(1, 1000)
.map(square)
.filter(multipleOf3)
.take(10)
.each(log);
function square(x) {
return x * x;
}
function multipleOf3(x) {
return x % 3 === 0;
}
function log(a) {
console.log(a);
}
<script src="https://rawgit.com/dtao/lazy.js/master/lazy.min.js"></script>
Here the map, filter and take functions produce intermediate MappedSequence, FilteredSequence and TakeSequence objects. These Sequence objects are essentially iterators, which eliminate the need of intermediate arrays.
However, from what I understand, there is still no short-cut fusion taking place. The intermediate array structures are simply replaced with intermediate Sequence structures which are not fused.
I could be wrong, but I believe that expressions like Lazy(array).map(f).map(g) produce two separate MappedSequence objects in which the first MappedSequence object feeds its values to the second one, instead of the second one replacing the first one by doing the job of both (via function composition).
TLDR: Do Immutable.js and Lazy.js indeed perform short-cut fusion? As far as I know they get rid of intermediate arrays by emulating lazy evaluation via sequence objects (i.e. iterators). However, I believe that these iterators are chained: one iterator feeding its values lazily to the next. They are not merged into a single iterator. Hence they do not “eliminate intermediate representations”. They only transform arrays into constant-space sequence objects.
I'm the author of Immutable.js (and a fan of Lazy.js).
Do Lazy.js and Immutable.js's Seq use short-cut fusion? No, not exactly. But they do remove the intermediate representation of operation results.
Short-cut fusion is a code compilation/transpilation technique. Your example is a good one:
var a = [1,2,3,4,5].map(square).map(increment);
Transpiled:
var a = [1,2,3,4,5].map(compose(square, increment));
Lazy.js and Immutable.js are not transpilers and will not re-write code. They are runtime libraries. So instead of short-cut fusion (a compiler technique) they use iterable composition (a runtime technique).
You answer this in your TLDR:
As far as I know they get rid of intermediate arrays by emulating lazy
evaluation via sequence objects (i.e. iterators). However, I believe
that these iterators are chained: one iterator feeding its values
lazily to the next. They are not merged into a single iterator. Hence
they do not "eliminate intermediate representations". They only
transform arrays into constant space sequence objects.
That is exactly right.
Let's unpack:
Arrays store intermediate results when chaining:
var a = [1,2,3,4,5];
var b = a.map(square); // b: [1,4,9,16,25] created in O(n)
var c = b.map(increment); // c: [2,5,10,17,26] created in O(n)
Short-cut fusion transpilation creates intermediate functions:
var a = [1,2,3,4,5];
var f = compose(square, increment); // f: Function created in O(1)
var c = a.map(f); // c: [2,5,10,17,26] created in O(n)
Iterable composition creates intermediate iterables:
var a = [1,2,3,4,5];
var i = lazyMap(a, square); // i: Iterable created in O(1)
var j = lazyMap(i, increment); // j: Iterable created in O(1)
var c = Array.from(j); // c: [2,5,10,17,26] created in O(n)
Note that using iterable composition, we have not created a store of intermediate results. When these libraries say they do not create intermediate representations, this example is exactly what they mean. No data structure is created holding the values [1,4,9,16,25].
However, of course some intermediate representation is made. Each "lazy" operation must return something. They return an iterable. Creating these is extremely cheap and not related to the size of the data being operated on. Note that in short-cut fusion transpilation, an intermediate representation is also made. The result of compose is a new function. Functional composition (hand-written or the result of a short-cut fusion compiler) is very related to Iterable composition.
The goal of removing intermediate representations is performance, especially regarding memory. Iterable composition is a powerful way to implement this and does not require the overhead of parsing and rewriting code that an optimizing compiler incurs, which would be out of place in a runtime library.
Appendix:
This is what a simple implementation of lazyMap might look like:
function lazyMap(iterable, mapper) {
return {
[Symbol.iterator]: function() {
var iterator = iterable[Symbol.iterator]();
return {
next: function() {
var step = iterator.next();
return step.done ? step : { done: false, value: mapper(step.value) }
}
};
}
};
}
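The same idea can also be written with a generator function. This is an assumption of this sketch, not a description of how either library is implemented. Recording the calls shows that composed iterables do no work until they are consumed, and then pull one value at a time through the whole chain:

```javascript
// Generator-based variant of lazyMap, for illustration only.
function* lazyMap(iterable, mapper) {
  for (const value of iterable) {
    yield mapper(value);
  }
}

const calls = [];
function square(x) {
  calls.push("square " + x);
  return x * x;
}
function increment(x) {
  calls.push("increment " + x);
  return x + 1;
}

const j = lazyMap(lazyMap([1, 2, 3], square), increment);
console.log(calls.length);  // 0, nothing has run yet
console.log(Array.from(j)); // [2, 5, 10]
console.log(calls);
// ["square 1", "increment 1", "square 2", "increment 4", "square 3", "increment 9"]
```

The interleaved call order is the evidence that no array of squared values is ever stored between the two steps.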