I'm writing a function on a Node.js server which reads a CSV file. I need to read all lines and execute several promised operations (MySQL queries) for each one (update or insert, then set a specified column which flags the item as "modified in this execution"), and once this finishes, change another column on the rows that were not updated or inserted, to flag those items as "deleted".
At first, the problem I had was that this CSV has millions of lines (literally) and a hundred columns, so I ran out of memory quite easily, and since the number of lines can grow or shrink, I cannot know how many lines I will have to process each time I receive the file.
I wrote a simple program that lets me split this CSV into several others, each with a manageable number of lines, so my server can work on each of them without dying. This produces an unknown number of files for each new file that is processed, so now I have a different problem.
I want to read all of those CSVs, perform those operations, and, once they are finished, execute the final operation which will change the rows that were not updated/inserted. The only issue is that I cannot read them simultaneously; I have to do it sequentially, no matter how many there are (as said, after splitting the main CSV, I may have 1 million lines divided into 3 files, or 2 million into 6 files).
At first I thought about using a forEach loop, but the problem is that forEach doesn't respect the promises, so it will launch all of them at once, the server will run out of memory loading all those CSVs, and then die. Honestly, spinning on a while(boolean) in each iteration of the forEach to wait for the resolve of each promisified function seems pretty... smelly to me, plus I feel like that solution would stop the server from working properly, so I'm looking for a different solution.
Let me give you a quick explanation of what I want:
const arrayReader = (arrayOfCSVs) => {
  initialFunction();
  functions(arrayOfCSVs[0])
    .then((result) => {
      functions(arrayOfCSVs[1])
        .then((result2) => {
          functions(arrayOfCSVs[2])
            // (on and on and on...)
            .then((resultX) => {
              executeFinalFunction();
            });
        });
    });
};
You can use Array.reduce to chain each new promise onto the previous one, with no need for manual waiting.
const arrayReader = (arrayOfCSVs) => {
  initialFunction();
  return arrayOfCSVs.reduce((prom, csv) => {
    return prom.then(() => functions(csv));
  }, Promise.resolve()).then(resultX => executeFinalFunction());
};
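If your Node version supports async/await, the same sequential flow can also be written as a plain loop. This is just a sketch under the same assumptions as above, namely that initialFunction, functions and executeFinalFunction exist and that functions(csv) returns a promise:

const arrayReader = async (arrayOfCSVs) => {
  initialFunction();
  for (const csv of arrayOfCSVs) {
    // Each file is fully processed before the next one starts.
    await functions(csv);
  }
  return executeFinalFunction();
};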
I am trying to sum up the results I get from a series of async calls:
let sum = 0;
for (const id of ids) {
  const res = await getRes(id);
  sum += res;
}
Is this a valid way to do it? Any better solution?
The code that you have written seems to be correct.
In order to validate it you can write some unit tests. I can't say how difficult that would be, because I don't know what getRes is doing and what external dependencies (if any) you have to mock. Generally speaking, you should get into the habit of unit testing your code: one of the benefits it brings to the table is a way to validate your implementation.
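For illustration, here is a minimal sketch of such a test. It assumes the summing logic is extracted into a function that takes getRes as a parameter so it can be stubbed; sumRes and fakeGetRes are hypothetical names:

const assert = require('assert');

// The summing loop from the question, with the async dependency injected
// so a test can replace it with a deterministic stub.
async function sumRes(ids, getRes) {
  let sum = 0;
  for (const id of ids) {
    sum += await getRes(id);
  }
  return sum;
}

(async () => {
  const fakeGetRes = async id => id * 10; // stub standing in for the real call
  assert.strictEqual(await sumRes([1, 2, 3], fakeGetRes), 60);
  console.log('test passed');
})();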
You can also consider getting the results in parallel; this can actually speed things up. Again, I don't know what getRes is doing, but I suppose it performs some sort of IO (e.g. a database query). Given the single-threaded nature of JavaScript, when you have a bunch of independent asynchronous operations you can always try to perform them in parallel and collect their results, so that you can aggregate them later. Please note the emphasis on the word independent: in order to perform a bunch of operations in parallel, they need to be independent (if they are not, the correctness of your code will be invalidated).
Since you are performing a sum of numbers, it is safe to compute the addends in parallel.
This is the simplest possible parallel solution:
async function sum(ids) {
  const getResTasks = ids.map(id => getRes(id));
  const addends = await Promise.all(getResTasks);
  return addends.reduce((memo, item) => memo + item, 0);
}
Pay attention: this is a naive implementation. I don't know how many items the ids array can contain. If that number can be huge (thousands of items or more), code like the previous one could put a heavy load on the external dependency used to get the sum's addends, so some precautions to limit the degree of parallelism should be taken.
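One simple precaution, sketched below, is to process the ids in fixed-size batches, so that at most batchSize calls are in flight at any time (the default of 100 is an arbitrary choice for illustration):

async function sum(ids, batchSize = 100) {
  let total = 0;
  for (let i = 0; i < ids.length; i += batchSize) {
    // Wait for the current batch to finish before starting the next one.
    const batch = ids.slice(i, i + batchSize);
    const addends = await Promise.all(batch.map(id => getRes(id)));
    total += addends.reduce((memo, item) => memo + item, 0);
  }
  return total;
}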
I have two intervals that need access to the same data.
So in one interval I want to push() an element to an array and then in the other interval I want to get the first element from the array and then remove it.
for example:
let array = [];
let count = 0;

setInterval(() => {
  array.push(count);
  count++;
}, 1000);

setInterval(() => {
  let data = array[0];
  array.shift();
  console.log("received data: " + data);
}, 1000);
the output of this is:
received data: 0
received data: 1
received data: 2
received data: 3
....
Does this also work with more complex functions and bigger arrays?
Could this cause any weird behaviour? Maybe it could shift and push at the same time and mess up the array?
Is this a good way to do that? Are there better ways?
EDIT:
The reason I want to do this is that I want to download data from many different links. Inside my script I call a download(link) function, but this results in the script trying to download a lot of links at the same time. So I want to create a buffer, so that the script only downloads from 100 links at a time.
Inside the script I want to call download(link) wherever I want and then let an interval take care of downloading only 100 links at a time: it removes 100 links from the buffer and downloads them, while the script pushes new links to the same array.
My main concern is that while I am doing a shift() the array will reorganize itself somehow. Might JS try to make a push() in the middle of this reorganization phase? Or will JS not perform any other operations on the array until shift() has completed?
Your general idea of pushing links to an array asynchronously and then removing them from the array in a separate asynchronous task is fine, it'll work.
My main concern is that while I am doing a shift() the array will reorganize itself somehow. Might JS try to make a push() in the middle of this reorganization phase? Or will JS not perform any other operations on the array until shift() has completed?
JavaScript is single-threaded, so this isn't something to worry about: if one interval triggers a function that does stuff, that function's synchronous actions (like manipulating an array) will run to completion before any other interval callback can run.
The issue of shared mutable state is a problem in many other languages, but not in JavaScript, at least in most cases.
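As an illustration of the buffering idea from the question's edit, here is a minimal sketch that caps the number of concurrent downloads without needing intervals at all. It assumes download(link) returns a promise; enqueue, drain and the limit of 100 are otherwise just illustrative names and values:

const queue = [];
let active = 0;
const MAX_CONCURRENT = 100;

function enqueue(link) {
  queue.push(link);
  drain();
}

function drain() {
  // Single-threaded JS: this shift() can never interleave mid-operation
  // with the push() above.
  while (active < MAX_CONCURRENT && queue.length > 0) {
    const link = queue.shift();
    active++;
    download(link)
      .catch(err => console.error(err))
      .finally(() => {
        active--;
        drain(); // start the next queued download, if any
      });
  }
}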
I have created a small application for keeping track of how much time I spend in different courses (as a teacher) using Angular 5 and putting my data in Firestore via AngularFire2. Most things have worked out nicely, but for the final part of the application I am having serious problems. I am quite a newbie at Angular/Firebase and most importantly (perhaps) with reactive programming.
The data that is stored in Firestore is quite simple:
(Image: the data stored in Firestore.) It consists of six fields: information about the course, when the course was held, and, most importantly, what I did (called Element) and for how long (called Duration).
Now, what I would like to do is to make a report that summarizes the durations for each thing I did (Element). In essence, what I would like to do is the equivalent of a GROUP BY in SQL. As I understand it, Firebase does not have an operation for grouping.
My approach so far, which has failed, has been to create an array into which I put the data and make calculations on that. It is here that I fail.
My first attempt is based on creating an observable, as most tutorial examples for retrieving data do. WorkItem is a simple class that has the same fields as the document in Firebase.
this.lectureDoc = this.afs.collection(`users/${this.afAuth.auth.currentUser.uid}/regTime`,
  ref => ref.where('course', '==', this.selectedCourse.courseName).orderBy('element'));

this.lectureItems = this.lectureDoc.snapshotChanges().pipe(
  map(courses => courses.map(a => {
    const data = a.payload.doc.data() as WorkItem;
    const id = a.payload.doc.id;
    return { id, ...data };
  }))
);

this.lectureItems.subscribe(item => {
  item.forEach(i => {
    this.allElements.push(i);
  });
});

console.error(this.allElements);
console.error(this.allElements[0]);
The last two lines show the problem: the first of them will log the complete array, while the second will log 'undefined'.
I understand that reactive programming makes asynchronous calls and that I therefore cannot know for sure when the data has been filled in. However, I do not understand why I can see the content of the array in the first of the last two lines, but not in the second.
My second attempt is based on getting the data from the documents of the collection itself and, to prevent empty arrays, using .then(); but again the last two lines produce exactly the same output as before.
this.lectureDoc.ref.get().then(item => {
  item.forEach(i => {
    this.lectures.push(i.data() as WorkItem);
  });
});

console.error(this.lectures);
console.error(this.lectures[0]);
(Image: the output from the running program.)
So, to recap: what I would like to do is to collect all the data in an array of WorkItem, then calculate how much time has been spent on the different tasks, and then display that in a list on the web page. I have not gotten to the display part yet, and I suspect I will have trouble again with the array not being populated when it is bound to the list. I have a hard time understanding reactive programming...
Any help would be greatly appreciated!
//Tobias
This is pretty common behaviour of the console as far as I'm aware, at least in Chrome. When you log an array that an async operation hasn't populated yet, the console will still show the filled array, because it evaluates the contents lazily, once the async operation has completed.
But when you log an element of the array, that element isn't there yet at the time you log it, so the log shows undefined.
Simply do the logging inside the forEach and you will see the data being pushed properly.
Another example of the same behaviour: Console.log behavior
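Beyond the logging, this also means the GROUP BY-style aggregation from the question belongs inside the subscription, where the data is guaranteed to have arrived. A minimal sketch, assuming the WorkItem properties are named element and duration (the exact property names are an assumption):

this.lectureItems.subscribe(items => {
  // Aggregate durations per element: the GROUP BY equivalent,
  // computed only once the snapshot data has actually arrived.
  const totals = {};
  items.forEach(i => {
    totals[i.element] = (totals[i.element] || 0) + i.duration;
  });
  console.log(totals); // fully populated here, and safe to bind to the view
});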
I am writing a scheduling program that returns JSON data about courses. I just started Node.js a week ago so I'm not sure if I'm thinking right.
I am trying to find a better way to write this code and avoid callback hell. I have already written the getJSON method.
/*getJSON(course-name, callback(JSONretrieved)): gets JSON info about name and
takes in a callback to manipulate retrieved JSON data*/
Now I want to get multiple course names from a course array and check for time conflicts between them. I will then add all viable combinations to an answers array. My current idea is:
/* courseArray: array of classes to be compared
   answers: array of all nonconflicting classes */
var courseArray = ['MATH-123', 'CHEM-123'];
var answers = [];

getJSON(courseArray[0], function(class1data) {
  getJSON(courseArray[1], function(class2data) {
    if (noConflict(class1data, class2data)) answers.push(merge(class1data, class2data));
  });
});
Finally, to access the answer array we wrap the entire code from above:
function getAnswers(cb) {
  /* courseArray: array of classes to be compared
     answers: array of all nonconflicting classes */
  var courseArray = ['MATH-123', 'CHEM-123'];
  var answers = [];
  getJSON(courseArray[0], function(class1data) {
    getJSON(courseArray[1], function(class2data) {
      // check for time conflicts between class1data and class2data
      if (noConflict(class1data, class2data)) answers.push(merge(class1data, class2data));
    });
  });
  cb(answers);
}
and we call the function
getAnswers(function(ans) {
  // do processing of answers
  console.log(ans);
});
My main question is if there is any way to make this code shorter, more readable, or less callback hecky.
You can use a promise library to make things easier for yourself. The way you're doing it can quickly get out of hand if the user selects more than a handful of courses to compare.
With something like async, you can make parallel calls to getJSON and your conflict code will run inside a single callback once all of the getJSON calls have returned. Your code will be much more readable and maintainable for large arrays of courses.
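For illustration, here is the same idea with plain promises instead of the async library, reusing the names from the question; getJSONAsync is a hypothetical wrapper that turns the callback-style getJSON into a promise:

// Hypothetical wrapper: resolves with the JSON passed to getJSON's callback.
const getJSONAsync = name => new Promise(resolve => getJSON(name, resolve));

function getAnswers(cb) {
  var courseArray = ['MATH-123', 'CHEM-123'];
  Promise.all(courseArray.map(getJSONAsync))
    .then(([class1data, class2data]) => {
      var answers = [];
      if (noConflict(class1data, class2data)) {
        answers.push(merge(class1data, class2data));
      }
      cb(answers); // runs only after both lookups have resolved
    });
}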
I am using async.js to work on an array that holds roughly 12'000 values. The problem I'm currently facing is that I am not 100% certain what I can take for granted from async.js. First let me show you an abstract of the iterator function I'm using (CoffeeScript):
baseFace = 0
dataBuffer = new DataBuffer() # my own class that wraps DataView on an Uint8Array

async.each(@blocks,
  (block, callback) =>
    numFaces = writeToBuffer(dataBuffer, block, baseFace)
    baseFace += numFaces
    callback(null)
  ,
  () =>
    console.log 'done'
)
So essentially the order in which the blocks are written to the buffer is unimportant, but there are two things I need to be assured of:
The entire writeToBuffer call has to be done atomically; it is not reentrant, since it writes to the end of the given buffer, and the block has to be written as one unit (multiple calls to DataView.setXXXX).
The baseFace variable, too, must be accessed atomically inside the function. It cannot happen that writeToBuffer is called for another element before the face count of the previous call to writeToBuffer has been added to baseFace.
Basically you could say my iterator cannot be evaluated multiple times at the same time, not even with interleaving (like writeToBuffer, writeToBuffer, baseFace += numFaces, baseFace += numFaces). Coming from C++/C# etc., I always fear that something goes wrong when methods access data like in the above example.
Thank you for any insight or tips about that topic, Cromon