Google Script to remove duplicates exceeds processing time


I have a Google Sheet that has 10k+ rows of data. While it should be rare, there could be instances of duplicate data being entered into the tab, and I have written a script to search for and remove those duplicates. For a while, this script has been running nicely and doing exactly what I expected it to do. But now that the tab has grown to over 10k rows, the script is exceeding the 6 minute time limit.
I've based this function on this tutorial.
// remove duplicates on Ship Details Complete
function duplicateShipDetailsComplete() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sourceSheet = ss.getSheetByName("Shipment Details Complete");
var sourceRange = sourceSheet.getRange(2, 1, sourceSheet.getLastRow(), 16)
var sourceData = sourceRange.getValues();
var keepData = new Array();
var deleteCount = 0;
for(i in sourceData) { // look for duplicates
var row = sourceData[i];
var duplicate = false; // initialize as not a duplicate
for(j in keepData) { // compare the current row in data to the rows in newData
if(row[2] == keepData[j][2] // duplicate Partner Invoice?
&& row[4] == keepData[j][4] // duplicate vPO?
&& row[5] == keepData[j][5] // duplicate SKU?
&& row[7] == keepData[j][7]) { // duplicate qty?
duplicate = true; // only if ALL criteria are duplicate, set row as a duplicate
}
}
if(!duplicate) { // If the row is NOT a duplicate
keepData.push(row); // add to newData
} else {
deleteCount++; // keep track of duplicates being deleted
}
}
sourceRange.clear();
sourceSheet.getRange(2, 1, keepData.length, keepData[0].length).setValues(keepData); // paste the keepData into the Working sheet
return deleteCount;
}
I've thought about breaking it up into pieces; process 1/3 of the data in each of 3 different calls. But this is actually being called from a different function that emails the returned deleteCount value, if it's greater than 0.
// Nightly email after checking for duplicates in Ship Details Complete
function sendEmailShipDetails() {
var deleted = duplicateShipDetailsComplete();
var update = parseFloat(SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Dept/Class").getRange(1,6).getValue()).toFixed(2);
if(deleted > 0) {
MailApp.sendEmail(
"me@myoffice.com",
"Shipment Details Cleaned Up",
deleted + " Shipment Detail line(s) were deleted from Complete as duplicates.\n" +
"Updated Value: " + update + "\n"
);
}
}
Even without that email function, it's exceeding the limit when I call duplicateShipDetailsComplete() directly. I suppose I could write three different functions (first 1/3, second 1/3, third 1/3) and update a cell somewhere with the results for each, and then call the email function separately to get that value. I'd feel a little better about that if I could write 1 function and pass parameters to it, but this is all coming from a Time Based Trigger, and you can't pass parms from those. But before I started to do that, I thought I'd check to see if someone had other suggestions on how I could make the existing code more efficient. Or, see if someone had totally different ideas on how I can do this.
Thanks
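One option for making the existing code more efficient is to drop the inner loop, which compares every row against every row already kept, and instead record each row's key in a lookup object during a single pass. This is only a minimal sketch: dedupeRows and its parameters are hypothetical names, and it assumes the four compared cells can safely be compared as strings (the original uses ==, which behaves differently for things like Date objects).

```javascript
// Hypothetical helper: keeps the first row for each combination of key cells.
// keyColumns holds 0-based column indexes (2, 4, 5 and 7 in the script above).
function dedupeRows(rows, keyColumns) {
  var seen = {};        // composite key -> true
  var keepData = [];
  var deleteCount = 0;
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    // Join the key cells with a separator that is unlikely to occur in the data.
    var key = keyColumns.map(function (c) { return String(row[c]); }).join('\u0000');
    if (Object.prototype.hasOwnProperty.call(seen, key)) {
      deleteCount++;    // key seen before: treat the row as a duplicate
    } else {
      seen[key] = true;
      keepData.push(row);
    }
  }
  return { keepData: keepData, deleteCount: deleteCount };
}

// Possible use inside duplicateShipDetailsComplete() (untested against a real sheet):
//   var result = dedupeRows(sourceData, [2, 4, 5, 7]);
//   sourceRange.clear();
//   sourceSheet.getRange(2, 1, result.keepData.length, 16).setValues(result.keepData);
//   return result.deleteCount;
```

Each row is then looked up once instead of being compared against thousands of kept rows, so the work grows roughly in line with the number of rows.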

Related

Skipping cell in loop, and tweaking a function to move range

I have a google sheet for my business that creates a prep list for events.
After tackling an automatic clearing issue for over 25 hours (seriously), I got some help from your fantastic community and modified the code to work; see below (if Fx is empty, clear the cell to the left):
function cgar() {
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('auto');
var values = sheet.getRange("F11:F20").getValues();
var ranges = values.reduce((ar, [f], i) => {
if (f == "") ar.push("E" + (i + 11));
return ar;
}, []);
sheet.getRangeList(ranges).clearContent();
}
Is there a way for me to skip a specific cell, either based on location or value = "Garnish"?
If I change the code to take the whole range of "F5:F24" instead of the individual groups "F11:F20" it clears the header rows of the merged cell E10:F10 & E21:F21.
2.
var syr1 = ("D6")
var syr2 = ("D7")
var syr1isblank = SpreadsheetApp.getActiveSheet().getRange(syr1).isBlank()
var syr2isblank = SpreadsheetApp.getActiveSheet().getRange(syr2).isBlank()
if (syr1isblank == true) {
SpreadsheetApp.getActiveSheet().getRange("C16:D23").moveTo(sheet.getRange(syr1))
}
else if (syr2isblank == true && syr1isblank == false ) {
SpreadsheetApp.getActiveSheet().getRange("C16:D23").moveTo(sheet.getRange(syr2))
}
I have been using if and else if to move the group ranges that are below the cleared ones to the newly created space above, but it's clunky:
Is there a way that I can also modify this to scan for empty range, I think the process would be like this:
first empty cell spotted
Mark the cell to the left to a variable?
Scan the range, for example E11:F20, for lastRow after it has been cleared by the function above
Move the range to the variable marked cell above.
Image of the sheet in question: https://i.stack.imgur.com/ND2DC.png
I would greatly appreciate any assistance, thank you.

How to create a function to count items from a set and store counts in an array parallel to one containing related items?

I am having trouble completing one of the last assignments in my semester-long high school-level programming class. I have been assigned to create a JavaScript program which counts the number of times different ZIP codes appear in a set and outputs parallel arrays containing the zip codes and their counts. I am having difficulty getting the values to output. I believe that the respective zips and counts aren't being entered into their arrays at all.
I'm not looking for an original solution to the problem. I'd just like someone to tell me why my code isn't working, and possibly what I can change in my code specifically to fix it.
Usually I would never ask for help like this. I actually took the class last semester and now that I'm at the end of the year I have the option of completing it to earn college credit. I have never been the best at working with functions, and that remains true now. In the code below are all the moving parts I'm allowed to work with. I know it looks messy and rudimentary, but it's all I know. I'd appreciate it if any answers use only the sorts of things I used in my code. Another note, I am required to use functions for 'all identifiable processes', but I'm pretty sure my instructor only cares about the final product, so I'm not sure that the functions really matter, even if they could help.
var records = openZipCodeStudyRecordSet(),
uniqueZips = [],
zipCounts = [],
output = "";
function project62Part1() {
table = document.getElementById("outputTable");
function countZips(zip) {
var currentZip,
count;
while (records.readNextRecord()) {
currentZip = records.getSampleZipCode();
if (zip === currentZip) {
count++;
}
}
return count;
}
function processZip(zip) {
var currentZip;
while (records.readNextRecord()) {
currentZip = records.getSampleZipCode();
for (i = 0; i < uniqueZips.length; i++) {
if (uniqueZips[i] === "") {
uniqueZips.push(currentZip);
zipCounts[i] = countZips(currentZip);
break;
}
if (zip !== uniqueZip[i]) {
uniqueZips.push(currentZip);
zipCounts[i] = countZips(currentZip);
}
}
}
}
function createOutput(string) {
for (i = 0; i < uniqueZips.length; i++) {
string += "<tr><td>" + uniqueZips[i] + "</td><td>" + zipCounts[i] +
"</td></tr>";
}
return string;
}
processZip();
output = createOutput(output);
table.innerHTML = "<tr><td>Zip Code</td><td>Count</td></tr>" + output;
}
The output is supposed to be additional rows of zips and counts added to a table that is already set up on the page. There are no important technical errors in the code.
This is to be accomplished through the function processZip, which is meant to add respective zip and count into table rows. However, it appears as though the zip and count arrays it's getting info from haven't had anything put into them by the other functions. I don't know if it is because of an error in calling the functions, or what's in the functions themselves.
The HTML page this is connected to calls the function project62Part1().

That code is kind of all over the place but here's the logic you ideally want to follow:
Loop over each record in your table (outer loop) to get the zip code.
Declare an 'isFound' variable and set it to false
For each iteration of the outer loop, loop over your entire array of zip codes (inner loop).
3a. If you get a match, set isFound to true, increment your zipcode counter += 1 on the same index (since they're parallel arrays)
3b. If, at the end of your inner loop, isFound is still false, add the zipcode to your array of zip codes, and add a new array element to your zip code counters setting it to 1.
Since your zip code array and your zip code counter are parallel arrays to each other, when isFound is false, you are creating entries in both arrays, keeping them parallel to each other.
If, on 3a isFound is true, you are on the index of the zip code array that the zip code belongs to, so it should be the same index for your counter array.
In your current process zip function, the first condition will never be true, because starting out, your array size is 0 and after you start populating that array, you will never have an empty string (unless, of course, the zip code itself was an empty string)
The second if statement you have that checks if zip !== uniqueZip[i] - you are only checking that current value of uniqueZips and ignoring every other value in the array, so you will almost always have the second condition as true
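The steps above could be sketched roughly like this. It uses a plain array of zip codes in place of the record set, which is an assumption made so the example runs on its own; countZipCodes is a hypothetical name, and in the real assignment the outer loop would be the while (records.readNextRecord()) loop.

```javascript
// zips stands in for the values returned by records.getSampleZipCode().
function countZipCodes(zips) {
  var uniqueZips = [];
  var zipCounts = [];
  for (var r = 0; r < zips.length; r++) {            // outer loop: each record
    var currentZip = zips[r];
    var isFound = false;
    for (var i = 0; i < uniqueZips.length; i++) {    // inner loop: zips seen so far
      if (uniqueZips[i] === currentZip) {
        isFound = true;
        zipCounts[i] += 1;                           // same index in the parallel array
        break;
      }
    }
    if (!isFound) {
      // New zip: add one entry to both arrays so they stay parallel.
      uniqueZips.push(currentZip);
      zipCounts.push(1);
    }
  }
  return { uniqueZips: uniqueZips, zipCounts: zipCounts };
}
```

Because both arrays only ever grow together, index i in zipCounts always belongs to the zip at index i in uniqueZips.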

I've been playing with the newer JavaScript language and syntax and your item was a good candidate for me to try out.
I did approach the code a little differently such as making the use of a Set for the unique values. Saves on code by not having to check and see if the value exists because the Set will never allow duplicate values in.
var uniqueZips = new Set();
const zipcodes = [21060, 22422, 25541, 43211, 21060, 22422, 22422, 43211, 43211, 43211];
function project62Part1() {
function processZipCodes() {
for(let index in zipcodes){
// We add every value because a SET will only allow you to add it once.
uniqueZips.add(zipcodes[index]);
}
}
// Structure our zipcode data information
function organizeZipCodeData() {
let response = {data:[]};
uniqueZips.forEach(function(zip) {
response.data.push( { 'zipcode':zip, 'appears': countZipAppearances(zip) })
});
return response;
}
function countZipAppearances(zip) {
// Default to zero even though you never expect an undefined
let count = 0;
zipcodes.forEach(function(zval) {
if (zip === zval) {
count++;
}
});
return count;
}
function showZipcodeInformation(data){
for (var index in data) {
if (data.hasOwnProperty(index)) {
var entry = [data[index]][0];
console.log(entry.zipcode, entry.appears);
}
}
}
// UI CONTENT: Construct the UI view from the data
function generateHtmlView(data){
let htmlview = "<table><tr><td>Zip Code</td><td>Count</td></tr>";
for (var index in data) {
if (data.hasOwnProperty(index)) {
var entry = [data[index]][0];
htmlview+="<tr><td>"+entry.zipcode+"</td><td>"+entry.appears+"</td></tr>";
}
}
htmlview += "</table>";
console.log(htmlview);
return htmlview;
}
// //////////////////////////////////////////////////////
// Call to gather the zipcodes
processZipCodes();
// Call to organize the zipcode data
let output = organizeZipCodeData();
// See what we have in the organized data
showZipcodeInformation(output.data);
// See what we have in the html content
generateHtmlView(output.data);
}
// Initiate the process
project62Part1();

Google Script for a Sheet - Maximum Execution time Exceeded

I'm writing a script that's going to look through a monthly report and create sheets for each store for a company we do work for and copy data for each to the new sheets. Currently the issue I'm running into is that we have two days of data and 171 lines is taking my script 369.261 seconds to run and it is failing to finish.
function menuItem1() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet1 = ss.getSheetByName("All Stores");
var data = sheet1.getDataRange().getValues();
var CurStore;
var stores = [];
var target_sheet;
var last_row;
var source_range
var target_range;
var first_row = sheet1.getRange("A" + 1 +":I" + 1);
//assign first store number into initial index of array
CurStore = data[1][6].toString();
//add 0 to the string so that all store numbers are four digits.
while (CurStore.length < 4) {CurStore = "0" + CurStore;}
stores[0] = CurStore;
// traverse through every row and add all unique store numbers to the array
for (var row = 2; row <= data.length; row++) {
CurStore = data[row-1][6].toString();
while (CurStore.length < 4) {
CurStore = "0" + CurStore;
}
if (stores.indexOf(CurStore) == -1) {
stores.push(CurStore.toString());
}
}
// sort the store numbers into numerical order
stores.sort();
// traverse through the stores array, creating a sheet for each store, set the master sheet as the active so we can copy values, insert first row (this is for column labels), traverse though every row and when the unique store is found,
// we take the whole row and paste it onto it's newly created sheet
// at the end push a notification to the user letting them know the report is finished.
for (var i = stores.length -1; i >= 0; i--) {
ss.insertSheet(stores[i].toString());
ss.setActiveSheet(sheet1);
target_sheet = ss.getSheetByName(stores[i].toString());
last_row = target_sheet.getLastRow();
target_range = target_sheet.getRange("A"+(last_row+1)+":G"+(last_row+1));
first_row.copyTo(target_range);
for (var row = 2; row <= data.length; row++) {
CurStore = data[row-1][6].toString();
while (CurStore.length < 4) {
CurStore = "0" + CurStore;
}
if (stores[i] == CurStore) {
source_range = sheet1.getRange("A" + row +":I" + row);
last_row = target_sheet.getLastRow();
target_range = target_sheet.getRange("A"+(last_row+1)+":G"+(last_row+1));
source_range.copyTo(target_range);
}
}
for (var j = 1; j <= 9; j++) {
target_sheet.autoResizeColumn(j);
}
}
Browser.msgBox("The report has been finished.");
}
Any help would be greatly appreciated, as I'm still relatively new at using this and I'm sure there are plenty of ways to speed this up. If not, I'll end up finding a way to break down the function to divide up the execution. I can also provide some sample data if need be.
Thanks in advance.

The problem is calling SpreadsheetApp-related methods like getRange() in each iteration. As stated here:
Using JavaScript operations within your script is considerably faster
than calling other services. Anything you can accomplish within Google
Apps Script itself will be much faster than making calls that need to
fetch data from Google's servers or an external server, such as
requests to Spreadsheets, Docs, Sites, Translate, UrlFetch, and so on.
Your scripts will run faster if you can find ways to minimize the
calls the scripts make to those services.
I ran into the same situation and, instead of doing something like for(i=0;i<data.length;i++), I ended up dividing the data.length into 3 separate functions and ran them manually each time one of them ended.
Same as you, I had a complex report to automate and this was the only solution.
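Another way to act on the quoted advice, as an alternative to splitting the run, might be to group the rows in memory first and then write each store's rows with one call per sheet. The sketch below is only an illustration under assumptions: padStore and groupRowsByStore are hypothetical names, and it relies on the data array already read with getValues() in the question.

```javascript
// Hypothetical helper: pads a store number to four digits, as the question does.
function padStore(value) {
  var s = String(value);
  while (s.length < 4) { s = '0' + s; }
  return s;
}

// Groups the data rows by store number in one pass over the array.
// Row 0 is treated as the header and copied to the front of every group.
function groupRowsByStore(data, storeColumn) {
  var groups = Object.create(null);   // store number -> array of rows
  for (var r = 1; r < data.length; r++) {
    var store = padStore(data[r][storeColumn]);
    if (!groups[store]) {
      groups[store] = [data[0]];
    }
    groups[store].push(data[r]);
  }
  return groups;
}

// Possible use in menuItem1() (untested against a real sheet):
//   var groups = groupRowsByStore(data, 6);
//   Object.keys(groups).sort().forEach(function (store) {
//     var rows = groups[store];
//     ss.insertSheet(store).getRange(1, 1, rows.length, rows[0].length).setValues(rows);
//   });
```

Note that setValues() writes values only, so formatting that copyTo() would have carried over is not preserved by this approach.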

indexeddb partial key search get next

Does the indexeddb CursorWithValue store what the next or prev record will be BEFORE I call cursor.continue()? Can I look at the IDBCursorWithValue object and then store the pointer to the next record?
Is it possible to get the first record via a partial key, then get the next record ONLY when the user clicks for the next record without buffering a collection of the records in an array?
I understand I can use cursor.continue() to get all the matching records and store in an array. I also understand that being asynchronous, if I just take the first matching record, and terminate my onsuccess function that call to the db is terminated and I'm fairly sure that I then lose the ability to link to the next record.
The following works and I can get one or all matching records of the partial key. With the \uffff I basically get matching alpha and all greater records.
storeGet = indexTITLE.openCursor(IDBKeyRange.bound(x.value, x.value, '\uffff'), 'next');
This is all new to me, perhaps I'm looking at this all wrong. Any advice is appreciated. I've been reading every thread on here and github that I can, hoping someone else already was doing this with indexeddb.

Let me try and restate the problem:
You've iterated a cursor part-way through a range. Now you want to stop and wait for user input before continuing. But the transaction will close, so you can't just continue on the click. What do you do instead?
First off: great question! This is tricky. You have a handful of different options.
In the simplest case, you have a unique index (or an object store) so there are no duplicate keys.
var currentKey = undefined;
// assumes you open a transaction and pass in the index to query
function getNextRecord(index, callback) {
var range;
if (currentKey === undefined) {
range = null; // unbounded
} else {
range = IDBKeyRange.lowerBound(currentKey, true); // exclusive
}
var request = index.openCursor(range);
request.onsuccess = function(e) {
var cursor = request.result;
if (!cursor) {
// no records found/hit end of range
callback();
return;
}
// yay, found a record. remember the key for next time
currentKey = cursor.key;
callback(cursor.value);
};
}
If you have a non-unique index it is more tricky since you need to store the index key and primary key, and there's no way to open the cursor right at that position. (See the feature request: https://github.com/w3c/IndexedDB/issues/14) So you need to advance the cursor just past the previously seen key/primaryKey position:
var currentKey = undefined, primaryKey = undefined;
// assumes you open a transaction and pass in the index to query
function getNextRecord(index, callback) {
var range;
if (currentKey === undefined) {
range = null; // unbounded
} else {
range = IDBKeyRange.lowerBound(currentKey); // inclusive, so remaining duplicates of currentKey are revisited
}
var request = index.openCursor(range);
request.onsuccess = function(e) {
var cursor = request.result;
if (!cursor) {
// no records found/hit end of range
callback();
return;
}
// skip the comparison on the first call, when there is no previous position
if (currentKey !== undefined &&
indexedDB.cmp(cursor.key, currentKey) === 0 &&
indexedDB.cmp(cursor.primaryKey, primaryKey) <= 0) {
// walk over duplicates until we are past where we were last time
cursor.continue();
return;
}
// yay, found a record. remember the keys for next time
currentKey = cursor.key;
primaryKey = cursor.primaryKey;
callback(cursor.value);
};
}
I'm assuming there's not an upper bound, e.g. we want all records in the index. You can replace the initialization of range as appropriate.
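For the partial-key search in the question, the range initialization might be swapped for something like the following. This is a sketch for the unique-index case only; prefixBounds is a hypothetical helper that just computes the arguments, so the actual IDBKeyRange call is left as a comment.

```javascript
// Hypothetical helper: describes the range "keys starting with prefix",
// resuming after currentKey when one has already been shown.
function prefixBounds(prefix, currentKey) {
  var resuming = currentKey !== undefined;
  return {
    lower: resuming ? currentKey : prefix,
    upper: prefix + '\uffff',
    lowerOpen: resuming,   // exclusive when resuming, so the last key is not repeated
    upperOpen: false
  };
}

// In the browser, inside getNextRecord():
//   var b = prefixBounds(x.value, currentKey);
//   range = IDBKeyRange.bound(b.lower, b.upper, b.lowerOpen, b.upperOpen);
```

One edge case to watch: if the last key shown is exactly the upper bound, the bounds collapse to an empty open range and IDBKeyRange.bound() throws, so that case would need to be treated as the end of the results.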

Google Apps Script double deleting rows in spreadsheet

function onEdit() {
var openRequests = SpreadsheetApp.getActive().getSheetByName('Open Requests');
var lastRowOpen = openRequests.getLastRow();
var closedRequests = SpreadsheetApp.getActive().getSheetByName('Closed Requests');
var lastRowClose = closedRequests.getLastRow();
var closed = openRequests.getRange(2,8,lastRowOpen,1).getValues();
for (var i = 0; i < lastRowOpen; i++)
{
if (closed[i][0].toString() == 'Yes')
{
var line = i+2;
if (closedRequests.getLastRow() == 1)
{
openRequests.getRange(line,1,1,9).copyTo(closedRequests.getRange(2,1,1,9));
closedRequests.getRange(2,9,1,1).setValue(new Date());
openRequests.deleteRow(line);
}
else
{
openRequests.getRange(line,1,1,9).copyTo(closedRequests.getRange(lastRowClose+1,1,1,9));
closedRequests.getRange(lastRowClose+1,9,1,1).setValue(new Date());
openRequests.deleteRow(line);
}
}
}
}
I have set up a trigger to run onEdit. What it does is check a column called Closed to see if it says Yes. The Closed column has a data validation drop down menu with the value Yes in it.
So when I click on the drop down menu and select Yes, it should copy the whole row to another sheet called Closed Requests then delete that row from the spreadsheet called Open Requests.
The issue I am having is that about 50% of the time, it deletes the row I select Yes on but ALSO deletes the row below it. (When this happens, the second deleted row only sometimes shows up in Closed Requests; the other times the whole row just disappears forever unless I undo.)
From what I can tell, the deleteRow() function deletes the whole row and shifts all rows below it up a row to fill in the blank. So the row below the one meant to be deleted gets shifted up to the same row and also gets deleted. I don't know why the function is getting called twice though.
I tried adding some delays but it does not seem to be working.

function onEdit(e) {
var eRange = e.source.getActiveRange();
var openRequests = SpreadsheetApp.getActive().getSheetByName('Open Requests');
var closedRequests = SpreadsheetApp.getActive().getSheetByName('Closed Requests');
var nextRowClose = (closedRequests.getLastRow()?closedRequests.getLastRow()+1:2);
if(eRange.getSheet().getName()=="Open Requests" && eRange.getColumn()==8 && eRange.getValue()=="Yes") {
openRequests.getRange(eRange.getRow(), 1, 1, 9)
.copyTo(closedRequests.getRange(nextRowClose, 1));
closedRequests.getRange(nextRowClose, 9).setValue(new Date());
openRequests.deleteRow(eRange.getRow());
}
}

Could try iterating backwards as it was mentioned to me. Throwing in a SpreadsheetApp.flush() after the delete may help too.
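A rough sketch of the backwards idea, assuming the closed array read with getValues() in the question; rowsToClose is a hypothetical helper that only works out which sheet rows to handle, bottom to top, so that deleting one row never shifts a row that is still waiting to be processed.

```javascript
// closed is the 2-D array from getValues() on the Closed column.
// firstRow is the sheet row that closed[0] came from (2 in the question).
function rowsToClose(closed, firstRow) {
  var rows = [];
  for (var i = closed.length - 1; i >= 0; i--) {   // walk from the bottom up
    if (String(closed[i][0]) === 'Yes') {
      rows.push(i + firstRow);
    }
  }
  return rows;
}

// Possible use in onEdit() (untested against a real sheet):
//   rowsToClose(closed, 2).forEach(function (line) {
//     // copy the row to Closed Requests here, then:
//     openRequests.deleteRow(line);
//   });
```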

@Jack, I have a similar use case to you. My code is the backwards one that BryanP discusses. My code is more or less here: "Batch removal of task items where status = 'Done'". It is because I remove them in a batch that I use the backwards method whereby the removal of a row with a higher row number will not disturb the row number of any rows with a lower row number.
But you are not removing rows in batch mode, so maybe backwards shouldn't make a difference (perhaps unless two users use the sheet and delete at the same time?)
So I thought I'd try your code. I shoehorned your code into the onedit() function that is already present on my spreadsheet (which is used to colour rows red after a period of inactivity, and to put in a timestamp once the task is actually attended).
Then to test I used a copy of one of our spreadsheets which already had 50 rows/tasks in it. I manually filled in the required cells in a row and selected Done from the cell with the dropdown (I changed your code to expect "Done" rather than "Yes"). I repeated this for 20 rows.
The Result: Your code succeeded as you had expected it to every one of the 20 times ... no double deletes, always copying data across. It worked for me without introducing delays or SpreadsheetApp.flush().
I don't have a solid suggestion, I am afraid. In passing I mention the known fault where the spreadsheet has not properly refreshed itself, so does not show the deleted rows; this can be checked for by manually refreshing the spreadsheet when this fault appears. (However, the indications of this fault do not seem to logically fit with your report about the double copying over of two sequential rows.)

Thread lock? Sounds like a thread lock problem. Try:
function onEdit() {
// ****** add lock code
var lock = LockService.getScriptLock(); // getPublicLock() is deprecated; getScriptLock() is its replacement
var hasMutex = lock.tryLock(100);
if(hasMutex==false) {
return;
}
// *** end
var openRequests = SpreadsheetApp.getActive().getSheetByName('Open Requests');
var lastRowOpen = openRequests.getLastRow();
var closedRequests = SpreadsheetApp.getActive().getSheetByName('Closed Requests');
var lastRowClose = closedRequests.getLastRow();
var closed = openRequests.getRange(2,8,lastRowOpen,1).getValues();
for (var i = 0; i < lastRowOpen; i++)
{
if (closed[i][0].toString() == 'Yes')
{
var line = i+2;
if (closedRequests.getLastRow() == 1)
{
openRequests.getRange(line,1,1,9).copyTo(closedRequests.getRange(2,1,1,9));
closedRequests.getRange(2,9,1,1).setValue(new Date());
openRequests.deleteRow(line);
}
else
{
openRequests.getRange(line,1,1,9).copyTo(closedRequests.getRange(lastRowClose+1,1,1,9));
closedRequests.getRange(lastRowClose+1,9,1,1).setValue(new Date());
openRequests.deleteRow(line);
}
}
}
// ****** add lock code
lock.releaseLock();
// *** end
}
Questions:
1) How many people were using the spreadsheet at the time?
2) How often does it happen?
