In my application, I have a very large array of objects on the front-end, and each of these objects has some kind of long ID under the key "object_id". I'm using UnderscoreJS for all my manipulations of this list. This is a prototype for an application that will eventually be handling most of this effort on the back-end.
Merging the list is a big part of the application's requirements. The list I work with initially will have many distinct objects with identical object_ids. Initially I was merging them all in one go with a groupBy and a map-reduce, but now the requirements have changed and I'm merging them one at a time (about a second apart, to simulate a stream of input) into an initially empty array.
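For context, the old one-shot version looked roughly like this (a sketch from memory, where source is the mock data array and merge() is the same pairwise merge function used below):

// group all objects sharing an object_id, then fold each group into one object
var merged = _.map(_.groupBy(source, "object_id"), function (group) {
    return _.reduce(group, merge);
});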
My naive implementation was something like this:
function mergeObject(newObject, objects) {
    var obj_id = newObject["object_id"]; // id of the object to merge, let's say
    var tempObject = null;
    var isSameId = function (obj) {
        return obj_id == obj["object_id"];
    };
    var objectToMerge = _.find(objects, isSameId); // first traversal
    if (objectToMerge) {
        tempObject = merge(objectToMerge, newObject);
        objects = _.reject(objects, isSameId); // second traversal, same predicate
    } else {
        tempObject = newObject;
    }
    objects.push(tempObject);
    return objects;
}
This is ridiculously more efficient than before, when I was re-merging from the mock data "source" array every time a new object was supposed to be pushed, so it's down from what I think was at least O(N^2) to O(N); but N here is so large (for JavaScript, anyway!) that I'd like to optimize further. Currently the worst case, where the object_id is not already present, traverses the entire list twice. What I'd like is a find-and-replace: an operation that returns a new version of the list, but with the merged object in place of the old one.
I could do a map where the iterator returns a new, merged object iff the object_id matches, but that lacks the short-circuit evaluation that _.find has, which is the difference between hitting the worst-case runtime occasionally and hitting it every time, and it doesn't easily account for pushing the object when there is no match.
I'd also like to avoid mutating the original array in place. I know objects.push(tempObject) does that very thing, but for data-binding reasons I'm ignoring that and returning the mutated list as though it were new.
It's also unavoidable that I'll have to check the array to see whether the new object was merged or appended. Using closures I could keep track of a flag for whether the merge happened, but I'm trying to be as idiomatically LISPy as possible for my own sanity. Also, past a certain point most objects will be merged, so extra runtime overhead for adding new items isn't a huge problem, as long as it is only incurred when it has to happen.
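Something like this sketch is what I'm picturing, using Underscore's _.findIndex (available as of Underscore 1.8) and the same merge() as above:

function mergeInto(newObject, objects) {
    // single traversal: find the index of the matching object, if any
    var idx = _.findIndex(objects, function (obj) {
        return obj["object_id"] === newObject["object_id"];
    });
    var result = objects.slice(); // shallow copy so the original array isn't mutated
    if (idx === -1) {
        result.push(newObject);                      // no match: append
    } else {
        result[idx] = merge(result[idx], newObject); // match: replace in place
    }
    return result;
}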
Related
I have the following code which feels a bit redundant as I'm iterating over the same array many times, any suggestions to improve while keeping it readable would be appreciated.
The array format is a standard one, [{}, {}, {}], where I am checking the keys in each object against certain conditions, e.g. every object contains a given value, or only some of them do, etc.
const getValue = arrayOfObjects => {
  const hasA = arrayOfObjects.some(
    object => object.abc === 'val1'
  );
  const hasB = arrayOfObjects.every(
    object => object.abc === 'val2'
  );
  // the above 2 iterations are repeated about 4 more times for different checks
  // then there are a few versions of the below assignment depending on the above variables
  const hasC = hasA || hasB;
  // finally the function returns one of the values
  if (hasA) {
    return 'val10';
  } else if (hasB) {
    return 'val11';
  } else if (hasC) {
    return 'val12';
  }
};
This sounds like a theoretical question: you're wondering whether using a few Array.prototype methods like some and every on the same array over and over has downsides, or whether there is a more readable way to do it.
It depends.
If the size of the array n is generally pretty small, in a practical sense, it doesn't matter. Choose what is more readable. Big O complexity comes more into play on a practical level if you're dealing with a lot of array items.
Boolean vars derived from some and every can be very readable in my opinion.
If you are in a situation where the array could be quite large, you could consider trying to do it in one step. Array.prototype.reduce would be a good tool for this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce
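For example, several of those flags could be accumulated in a single pass (a sketch, reusing the hypothetical abc checks from the question):

// one traversal, accumulating a some-style flag and an every-style flag together
const getFlags = arrayOfObjects =>
  arrayOfObjects.reduce(
    (flags, object) => ({
      hasA: flags.hasA || object.abc === 'val1', // some: OR, starts false
      hasB: flags.hasB && object.abc === 'val2', // every: AND, starts true
    }),
    { hasA: false, hasB: true }
  );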
In general when you're making use of every and some (which I personally love, as they're very readable), you should stop and think about the necessity of the checks you're making.
To simplify the examples you had, let's say you have a bunch of fruits and you're checking whether:
There is at least one orange.
All of the fruits are apples.
It's clear that
If (1) is true, then (2) must be false
And the other way around:
If (2) is true, then (1) must be false
So you can spare testing one of them when the other is true, using else if, for example. Once (1) comes back true, (2) is guaranteed to be false (and vice versa), so the second check can be skipped entirely.
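In code, that might look like this (with a hypothetical fruits array):

const fruits = ['apple', 'orange', 'apple']; // hypothetical data

if (fruits.some(fruit => fruit === 'orange')) {
    // (1) is true, so (2) is guaranteed false and never evaluated
    console.log('at least one orange');
} else if (fruits.every(fruit => fruit === 'apple')) {
    console.log('all apples');
}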
Let's say we have this object: var a = {x: 3}.
Now, if we have an array arr = [a] that holds a reference to this object, what arr[0] actually stores is a reference to that object, not the actual object data.
I have many objects (20k+) like a that I want to keep track of, probably creating an array similar to arr each second. Since I want the memory allocation to be as efficient as possible, can I somehow tell the engine that my array will only contain references to objects like a? I thought of using something like a TypedArray, but I don't know what type the reference to a is; I guess I can't just use new Uint32Array() and actually store a at each index.
In languages like C++ you could have an array of pointers, and you always know the size of a pointer (e.g. 8 bytes on 64-bit machines).
Is there any way to efficiently store references/pointers to objects in an Array or Object?
I think this answers my own question: I can create an initial array, add the elements that I want to it, and then whenever I need a new array, just copy this one and update the elements. This way it doesn't matter what data type the references are, as the engine can directly allocate exactly the memory needed for the array.
Example (newRef is a hypothetical helper, standing in for wherever the fresh references come from):

var initialArray = [];
// push initial references into this array

// whenever I need a new array do:
var newArray = initialArray.slice();

// update the references in newArray
for (var i = 0; i < newArray.length; i++) {
    newArray[i] = newRef(i); // hypothetical source of the new references
}
This way the newArray will be the correct size when created.
Later edit: Although this works in theory, it actually makes performance worse, probably because creating the newArray now has to copy memory and do other extra work instead of just allocating some.
I'm trying to get my head around how to use immutables in JavaScript/TypeScript without taking all day about it. I'm not quite ready to take the dive into Immutable.js, because it seems to leave you high and dry as far as type safety goes.
So let's take an example where I have an Array where the elements are all of Type MyType. In my Class, I have a method that searches the Array and returns a copy of a matching element so we don't edit the original. Say now that at a later time, I need to look and see if the object is in the Array, but what I have is the copy, not the original.
What is the standard method of handling this? Any method I can think of to determine whether I already have this item is going to take some form of looping through the collection and visiting each element and then doing a clunky equality match, whether that's turning both of them to strings or using a third-party library.
I'd like to use Immutables, but I keep running into situations like this that make them look pretty unattractive. What am I missing?
I suspect that my solution is not "...the standard method of handling this." However, I think it at least is a way of doing what I think you're asking.
You write that you have a method that "...returns a copy of a matching element so we don't edit the original". Could you change that method so that it instead returns both the original and a copy?
As an example, the strategy below involves retrieving both an original element from the array (which can later be used to search by reference) as well as a clone (which can be manipulated as needed without affecting the original). There is still the cost of cloning the original during retrieval, but at least you don't have to do such conversions for every element in the array when you later search the array. Moreover, it even allows you to differentiate between array elements that are identical-by-value, something that would be impossible if you only originally retrieved a copy of an element. The code below demonstrates this by making every array element identical-by-value (but, by definition of what objects are, different-by-reference).
I don't know if this violates other immutability best practices by, e.g., keeping copies of references to elements (which, I suppose, leaves the code open to future violations of immutability even if none occur now, though you could deep-freeze the original to prevent future mutations). However, it at least allows you to keep everything technically immutable while still being able to search by reference. Thus you can mutate your clone as much as you want and still always hold an associated reference copy of the original.
const retrieveDerivative = (array, elmtNum) => {
  const orig = array[elmtNum];
  const clone = JSON.parse(JSON.stringify(orig)); // deep copy via serialization
  return {orig, clone};
};

const getIndexOfElmt = (array, derivativeOfElement) => {
  return array.indexOf(derivativeOfElement.orig); // search by reference
};
const obj1 = {a: {b: 1}}; // Object #s are irrelevant.
const obj3 = {a: {b: 1}}; // Note that all objects are identical
const obj5 = {a: {b: 1}}; // by value and thus can only be
const obj8 = {a: {b: 1}}; // differentiated by reference.
const myArr = [obj3, obj5, obj1, obj8];
const derivedFromSomeElmt = retrieveDerivative(myArr, 2);
const indexOfSomeElmt = getIndexOfElmt(myArr, derivedFromSomeElmt);
console.log(indexOfSomeElmt);
The situation you've described is one where a mutable data structure has obvious advantages, but if you otherwise benefit from using immutables, there are better approaches.
While keeping it immutable means that your new updated object is completely new, that cuts both ways: you may have a new object, but you also still have access to the original object! You can do a lot of neat things with this, e.g. chain your objects so you have an undo-history, and can go back in time to roll back changes.
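A minimal sketch of that undo idea (the names here are mine, not from the question):

// Each update returns a new array, so past versions can simply be retained.
let history = [[]]; // starts with one empty state

function commit(newState) {
    history = [...history, newState]; // keep every version
    return newState;
}

function undo() {
    if (history.length > 1) history = history.slice(0, -1);
    return history[history.length - 1]; // previous state, still intact
}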
So don't use some hacky looking-up-of-properties in the array. The problem in your example is that you're building a new object at the wrong time: don't have the function return a copy of the object. Have it return the original object, and call your update using the original object as the index.
let myThings = [new MyType(), new MyType(), new MyType()];

// We update by taking the thing and replacing it with a new one.
// I'll keep the array immutable too.
function replaceThing(oldThing, newThing) {
    const oldIndex = myThings.indexOf(oldThing); // reference lookup, no deep equality needed
    myThings = myThings.slice();
    myThings[oldIndex] = newThing;
    return myThings;
}

// Then when I want to update it:
// keep it immutable by copying onto a clone rather than mutating
const redThing = myThings.find(({ red }) => red);
if (redThing) {
    // In this example, MyType has a 'clone' method
    replaceThing(redThing, Object.assign(redThing.clone(), {
        newProperty: 'a new value in my immutable!',
    }));
}
All that said, classes make this a whole lot more complex. It's much easier to keep simple objects immutable, since you can simply spread the old object into the new one, e.g. { ...redThing, newProperty: 'a new value' }. Once your objects are more than one level deep, you may find Immutable.js far more useful, since you can mergeDeep.
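To illustrate the difference (a sketch; the nested shape is hypothetical):

import { fromJS } from 'immutable';

const plain = { a: { b: { c: 1 } }, d: 2 };

// Plain objects: every level along the path must be spread by hand
const updatedPlain = { ...plain, a: { ...plain.a, b: { ...plain.a.b, c: 3 } } };

// Immutable.js: mergeDeep handles the nesting for you
const updatedImmutable = fromJS(plain).mergeDeep({ a: { b: { c: 3 } } });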
I was wondering if anyone knows how memory is handled with JS arrays when an array starts at a high index.
For example, if you have:
array[5000] = 1;
As the first value in the array, everything before index 5000 simply does not exist. Will the amount of memory assigned to the array cater for the 4999 unassigned positions before it, or will memory only be assigned to the value at [5000]?
I'm trying to cut down on the amount of memory used for my script so this led to me wondering about this question :)
When you assign a value at key 5000, the rest of the array is not populated:

var array = [];     // create the array
array[5000] = 1;
'1' in array;       // false: that key does not exist
Object.keys(array); // ['5000'] (it's the only key)
If you want to blow up your browser's memory, allocate a large ArrayBuffer instead:
var array = new ArrayBuffer(6e9); // 6 Gigs
Both can be verified easily in Chrome: open the console and the Task Manager (Shift+Esc), and paste the code. window.a = []; window.a[6e9] = 1; doesn't result in a significant memory increase (note that new Array(6e9) would simply throw a RangeError, since 6e9 exceeds the maximum array length of 2^32 - 1), while window.a = new ArrayBuffer(6e9); crashes the page.
PS. 6e9 === 6000000000
JavaScript is interpreted and run by the browser, so it depends on how the browser implements this behavior. In principle, once you do array[5000] = 1, you have an array of length 5001 whose elements are all undefined except the last.
Though if I were the one implementing the logic for running such a script, undefined would be the default value of anything not assigned, meaning I could probably get away with a map holding a single entry mapping key 5000 to value 1. Any access to any other index would automatically return undefined, without doing unnecessary work.
A quick test bears this out: the alert appears immediately.
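The original linked demo is gone; a minimal reconstruction of such a test might look like this:

var array = [];
array[5000] = 1;     // if the first 5000 slots were eagerly allocated, this would take time
alert(array.length); // alerts 5001 right away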
JS arrays are actually not arrays as you know them from other programming languages like C, C++, etc. They are instead objects with an array-like way of accessing them. This means that when you write array[5000] = 1; you actually define the property 5000 on the array object.
If you use a string as the key, you can access it as a property as well, which demonstrates this behavior; array.5000 is invalid only because identifiers can't start with a number.
array['key'] = 1;
alert( array.key ); // Gives you 1
This means that arrays will probably be implemented much like objects, although each implementation is free to optimize, thus giving you the behavior you expect from objects, where you can define object.a and object.z without defining the whole alphabet.
I have a custom object which contains other items (ie arrays, strings, other types of objects).
I am not sure how to traverse the object to iterate and list all of the object types, keys, and values of the nested items.
Second to this issue I don't know how many levels of nesting there are (as the object is generated dynamically from the back-end and passed to me as one object).
Any ideas (and should I use plain JavaScript, jQuery, or both to do this most efficiently)?
Thanks, I'll give the code a go. I am retrieving a result set from a web service which returns a different set of columns (of differing datatypes) and rows each time. I don't know the names of the columns, which is why I am trying to get at the data however I can.
Depending on the datatype I will perform a different action (sum the amount, format it etc).
JSON-serialized objects form a hierarchy without any reference cycles, so they should be fairly straightforward to traverse, something like:
function visit(JSONobj, f)
{
    for (var key in JSONobj)
    {
        var value = JSONobj[key];
        f(key, value);             // visit this key/value pair
        if (value instanceof Object)
            visit(value, f);       // recurse into nested objects and arrays
    }
}
where f is a function that does something with keys and values. (Of course you could also write a function that does the work directly.)
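For example, a hypothetical callback that logs each key with its value's type:

visit({ name: "row1", cols: [1, "two", { nested: true }] }, function (key, value) {
    console.log(key + " (" + typeof value + "): " + value);
});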
What exactly are you trying to find within the object?