Data Scraping With ImportHTML in Apps Script & Google Sheets - javascript

Goal: I am trying to pull data from a website and use it to create a big table. I can tell that I'm very close to getting this to work, but I've reached a roadblock.
Background:
I have a Google Sheet with three sheets. (1) "tickers" lists every ticker in the S&P 500 in cells A1:A500. (2) "actionField" is a blank sheet used while the script runs. (3) "resultField" will hold the results. The website I am pulling from is http://www.reuters.com/finance/stocks/companyOfficers?symbol=V, though I want the script to work (with minor modifications) for any data accessible through importHtml.
Script:
The script I currently have is as follows:
function populateData() {
  var googleSheet = SpreadsheetApp.getActive();
  // Reading Section
  var sheet = googleSheet.getSheetByName('tickers');
  var tickerArray = sheet.getDataRange().getValues();
  var arrayLength = tickerArray.length;
  var blankSyntaxA = 'ImportHtml("http://www.reuters.com/finance/stocks/companyOfficers?symbol=';
  var blankSyntaxB = '", "table", 1)';
  // Writing Section
  for (var i = 0; i < arrayLength; i++)
  {
    var sheet = googleSheet.getSheetByName('actionField');
    var liveSyntax = blankSyntaxA+tickerArray[i][0]+blankSyntaxB;
    sheet.getRange('A1').setFormula(liveSyntax);
    Utilities.sleep(5000);
    var importedData = sheet.getDataRange().getValues();
    var sheet = googleSheet.getSheetByName('resultField');
    sheet.appendRow(importedData);
  }
}
This successfully grabs each ticker from the tickers sheet, calls importHtml, copies the data, and appends SOMETHING to the right sheet. It loops through and does this for each item in the ticker list.
However, the data being appended is as follows:
[Ljava.lang.Object;@42782e7c
[Ljava.lang.Object;@2de9f184
[Ljava.lang.Object;@4b86a4d0
That displays across many columns, for as many rows as there are iterations in the loop.
How do I successfully append the data?
(And any advice on improving this script?)

The appendRow method is not suitable here. Because it appends only one row, its argument must be a 1D array of values.
What you get from getValues is always a 2D array of values, like [[a,b], [c,d]]. Even if the range is a single row, getValues returns [[a,b]], and a single-cell range returns [[a]]. It's never a 1D array; only getValue (singular) returns a bare value.
If just one row is needed, use, e.g., appendRow(importedData[0]).
Otherwise, insert the required number of rows and assign the 2D array of values to them.
var sheet = googleSheet.getSheetByName('resultField');
var lastRow = sheet.getLastRow();
sheet.insertRowsAfter(lastRow, importedData.length);
sheet.getRange(lastRow + 1, 1, importedData.length, importedData[0].length)
.setValues(importedData);
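On the "any advice on improving this script?" part: writing once per ticker works, but a common Apps Script pattern is to collect everything in memory and write it with a single setValues call. The sketch below is plain JavaScript, so it can be tried outside Apps Script. It combines per-ticker grids into one rectangular array. The name combineGrids and the { label, values } input shape are hypothetical. The padding is there because setValues expects every row to have as many columns as the target range.

```javascript
// Hypothetical helper: merge several 2D grids (e.g. one per ticker, each
// obtained in Apps Script via sheet.getDataRange().getValues()) into one
// rectangular grid that can be written with a single setValues() call.
// `grids` is an array of { label, values } objects, where `values` is 2D.
function combineGrids(grids, prefixLabel) {
  const rows = [];
  grids.forEach(({ label, values }) => {
    values.forEach(row => {
      // Optionally prepend the ticker so each row stays identifiable.
      rows.push(prefixLabel ? [label, ...row] : row.slice());
    });
  });
  // setValues() needs every row to be the same width, so pad short rows with "".
  const width = rows.reduce((max, row) => Math.max(max, row.length), 0);
  return rows.map(row => row.concat(new Array(width - row.length).fill('')));
}

// Example with made-up data (not real Reuters output):
const combined = combineGrids([
  { label: 'V', values: [['Name', 'Age'], ['A. Person', 50]] },
  { label: 'MA', values: [['Name']] },
], true);
// combined → [['V', 'Name', 'Age'], ['V', 'A. Person', 50], ['MA', 'Name', '']]
console.log(combined);
```

In Apps Script, the combined grid could then be written with the insertRowsAfter / getRange(...).setValues(...) pattern shown above, using combined.length and combined[0].length. Check first that combined is not empty.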

Related

Google Scripts how to delete extra rows from an array

I am working on a Google Sheets template with some roster maintenance built in. When the roster is updated on the main "roster" tab, I would like all the other tabs in the sheet to check student ID numbers against the updated roster tab. In the code, the example sheet is the "anet" sheet. I am using indexOf and a for loop to check each value in the "anet" sheet against the IDs in the "roster" sheet. If an ID number has been removed from the "roster" sheet, I would like that row to be deleted in the "anet" sheet.
When I run the script right now, some of the rows are deleted, but not all of them. The list of IDs begins in A3 on the "roster" tab, and the other list begins in A15 on the "anet" tab. Can someone help me understand why it deletes some of the rows whose indexOf returns -1, but not all of the rows I need deleted?
function withdrawnStudent (){
  let lastRowTyler = roster.getLastRow();
  let tylerData = roster.getRange(3,1,lastRowTyler,1).getValues();
  let tylerArray = tylerData.map(function(r){ return r[0]});
  let anetLastRow = anet.getLastRow();
  let anetLastColumn = anet.getLastColumn();
  let anetData = anet.getRange(15,1,anetLastRow,anetLastColumn).getValues();
  let anetIDArray = anetData.map(function(r){ return r[0]});
  for (let index = 14; index < 200; index++){
    if(tylerArray.indexOf(anetIDArray[index][0]) === -1){
      anet.deleteRow(index +14);
      Logger.log(tylerArray.indexOf(anetIDArray[index][0]));
    }
  }
}
Here is a link to an example spreadsheet. The "roster" tab lists 4th grade student IDs. In the "anet" tab, all rows with a number should be deleted because these are 5th grade IDs. However, only some of the rows are getting deleted, not all of them.
https://docs.google.com/spreadsheets/d/1vDse6X6gs3bkgnlBfgo-vzERkAMud3rUDC6j8fEkcrk/edit#gid=447751616
So, set up a trigger that fires your script when the document changes. The script loops through all the IDs in the first sheet and saves them to an array. Then it loops through the IDs in the second sheet and deletes any row whose ID is not in that array. The loop must run backward. If we delete a row and keep moving down, every row below it shifts up by one, so the row that moves into the deleted row's position is never checked.
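To see why the direction matters, here is a plain JavaScript simulation. An array stands in for the column of IDs, and Array.prototype.splice stands in for deleteRow. The function names and sample IDs are made up for illustration.

```javascript
// Simulation only: `ids` stands in for the ID column of the "anet" sheet,
// `keep` for the set of IDs on the roster, and splice() for deleteRow().
function deleteTopDown(ids, keep) {
  const rows = ids.slice();
  for (let i = 0; i < rows.length; i++) {
    // After splice, the next row moves up into index i, but i++ skips it.
    if (!keep.has(rows[i])) rows.splice(i, 1);
  }
  return rows;
}

function deleteBottomUp(ids, keep) {
  const rows = ids.slice();
  for (let i = rows.length - 1; i >= 0; i--) {
    // Deleting row i only shifts rows below it, which were already checked.
    if (!keep.has(rows[i])) rows.splice(i, 1);
  }
  return rows;
}

const roster = new Set([101]);
const anetIds = [501, 502, 101, 503];
console.log(deleteTopDown(anetIds, roster));  // [ 502, 101 ]  (502 was skipped)
console.log(deleteBottomUp(anetIds, roster)); // [ 101 ]
```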
Here's what I was able to come up with:
function withdrawStudent() {
//Get Student IDs From Roster Spreadsheet
var rosterSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Roster");
var dataRangeOnRosterSheet = rosterSheet.getDataRange();
//Returns a nested array of the values in column 1, from row 3 down to the last row
//flat() turns it into a one-dimensional array
var studentIDs = rosterSheet.getRange(3, 1, dataRangeOnRosterSheet.getLastRow() - 2, 1).getValues().flat();
Logger.log(JSON.stringify(studentIDs)); //If you want to see what the data looks like
//Now loop through each student ID in the second sheet, and if it doesn't exist in our first array then delete the row
var ANetSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("ANet");
var dataRangeOnANetSheet = ANetSheet.getDataRange();
var lastRow = dataRangeOnANetSheet.getLastRow();
var firstRow = 15;
//Run the for loop bottom-up, because deleting a row shifts the rows below it upward
for (var i = lastRow; i >= firstRow; i--) {
var currentStudentID = ANetSheet.getRange(i, 1, 1, 1).getValue(); //Get Student ID of current row
//If the currentStudentID is not found in our list of student IDs, remove it
if (!studentIDs.includes(currentStudentID)) {
//Remove the row
ANetSheet.deleteRow(i);
}
}
}
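The answer reads each ID with its own getRange(...).getValue() call, which is simple but can be slow on long sheets. A possible variation is to read the whole ID column once (e.g. with getValues().flat()) and compute the sheet rows to delete, already sorted bottom-up. It is sketched below as a plain JavaScript function with a hypothetical name.

```javascript
// Hypothetical helper: given the roster IDs and the IDs read from the
// "anet" sheet (starting at sheet row `firstRow`), return the 1-based sheet
// row numbers whose ID is not on the roster, highest row first, so they can
// be passed to deleteRow() one by one without shifting rows still to delete.
// Blank cells ("") are treated like any other ID that is not on the roster.
function rowsToDelete(rosterIds, anetIds, firstRow) {
  const roster = new Set(rosterIds);
  const rows = [];
  anetIds.forEach((id, offset) => {
    if (!roster.has(id)) rows.push(firstRow + offset);
  });
  return rows.reverse(); // bottom-up order
}

// Example: IDs in rows 15-18; only 101 and 102 are on the roster.
const toDelete = rowsToDelete([101, 102], [101, 501, 102, 502], 15);
// toDelete → [18, 16]
// In Apps Script (sheet variable name assumed): toDelete.forEach(r => ANetSheet.deleteRow(r));
```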
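To run the function every time a user edits the spreadsheet, set up an installable trigger. In the Apps Script editor, open Triggers and click Add Trigger. Choose withdrawStudent as the function, "From spreadsheet" as the event source, and "On edit" as the event type.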
Disclaimer: I made a copy of your document so I could test my code and make sure it works, but I'm deleting it now. Hope you are fine with that!

Google Sheets Apps Script Copy Row of Data If date same as tab name

I found some code that almost does what I need and have tried playing around with it to get it to work, but no luck. I get an export with a date in the last column of every row.
I simply want to copy each row to the tab whose name matches the date in its last column.
function MoveDate_FourthDEC() {
var ss=SpreadsheetApp.getActive();
var sh1=ss.getSheetByName("Import");
var sh2=ss.getSheetByName("4/12/2020");
var rg1=sh1.getRange(2,1,sh1.getLastRow(),32);//starting at column2
var data=rg1.getValues();
for(var i=0;i<data.length;i++) {
// 13 = collected should be in this column which is column N
if(data[i][31]=="4/12/2020") {
sh2.appendRow(data[i]);
}}}
Explanation:
Your goal is to copy each row to the sheet whose name matches the date in column AF.
To begin with, you can use forEach() to iterate over every sheet. For each sheet, check whether its name matches a date in column AF. If it does, filter only the rows that contain this date in column AF and store them in a temporary array:
let temp_data = data.filter(r=>r[31]==sh.getName());
Then you can efficiently copy and paste all the relevant rows to the matching sheet:
sh.getRange(sh.getLastRow()+1,1,temp_data.length,temp_data[0].length).setValues(temp_data);
Side notes:
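Column AF contains dates, so getValues would return Date objects. A Date never equals a string such as "4/12/2020", because the comparison uses the Date's long string form. Comparing the display values (the text shown in the cell) avoids this, which is why I am using getDisplayValues instead of getValues.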
Since data starts from the second row, you need to deduct one row from the last row with content, to get the correct range:
getRange(2,1,sh1.getLastRow()-1,32)
I am using includes to check if the sheet name matches the last column. In order to use includes you need to flatten the 2D array that is returned by the getDisplayValues function.
Solution:
function MoveDate_FourthDEC() {
const ss = SpreadsheetApp.getActive();
const sh1 = ss.getSheetByName("Import");
const shs = ss.getSheets();
const dts = sh1.getRange('AF2:AF'+sh1.getLastRow()).getDisplayValues().flat();
const data=sh1.getRange(2,1,sh1.getLastRow()-1,32).getDisplayValues();
shs.forEach(sh=>{
if (dts.includes(sh.getName())){
let temp_data = data.filter(r=>r[31]==sh.getName());
sh.getRange(sh.getLastRow()+1,1,temp_data.length,temp_data[0].length).setValues(temp_data);
}});
}
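The solution above filters data once per matching sheet. An alternative is to group the rows by their date string in a single pass. Each key can then be looked up with ss.getSheetByName(key) and written with one setValues call. This is sketched below in plain JavaScript with a hypothetical helper name. Column index 31 (column AF) comes from the question; the sample rows are made up.

```javascript
// Hypothetical helper: group 2D rows by the (display) value in column `col`
// (0-based; 31 would be column AF). Returns a Map from that value to its
// rows, keeping the order in which the rows appear.
function groupRowsByColumn(data, col) {
  const groups = new Map();
  data.forEach(row => {
    const key = row[col];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  });
  return groups;
}

// Example with a short made-up grid where the date sits in column index 2.
const sample = [
  ['a', 1, '4/12/2020'],
  ['b', 2, '5/12/2020'],
  ['c', 3, '4/12/2020'],
];
const byDate = groupRowsByColumn(sample, 2);
// byDate.get('4/12/2020') → [['a', 1, '4/12/2020'], ['c', 3, '4/12/2020']]
// In Apps Script, getSheetByName(name) returns null when no tab has that
// name, so check the result before writing each group.
```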

Filter function will not delete my empty rows - Google App Script

I want to import rows from one google sheet to the other, however source sheet imports a number of empty rows. Now I use a filter function to get rid of these rows but they will not disappear, can anyone tell me why?
var a = SpreadsheetApp.openByUrl("url").getSheetByName("Admin Use Only").getRange(4,1,6,21).getValues();
var b = SpreadsheetApp.getActive().getSheetByName('Credit_Detail');
b.getRange(b.getLastRow() +1, 1, a.length,21).setValues(a);
//filter function below:
var otarget=b.getRange(2,1,b.getLastRow()-1, 26).getValues();
var data=otarget.filter(function(r){
return !r.every(function(cell){
return cell === "";});
});
Logger.log(data);
b.getRange("A2:Z").clearContent();
b.getRange(3,1,data.length,data[0].length).setValues(data);
Here's how I would do it. First, create a variable to store the source array. Then run a for loop that scans the first column for empty cells, something like: for (var i = 0; i < data.length; i++) { if (data[i][0] != '') { XXXX } }
In place of XXXX, you can either push the row into a new array, which can then be written to the target sheet in one call, or use appendRow to transfer non-blank rows to the target sheet one by one.
Note: building a new array of non-empty rows and writing it once would speed up execution if you are dealing with large data (thousands of rows).
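The approach described above can be sketched as plain JavaScript functions; the names are hypothetical. The first function keeps rows whose first cell is non-blank, as suggested above. The second works like the question's filter and drops a row only when every cell is blank. Both also treat whitespace-only cells as blank. That is an assumption added here: a cell containing a space is not === "", so it would survive the question's filter.

```javascript
// Treat "" and whitespace-only strings as blank (assumption, see above).
function isBlank(cell) {
  return String(cell).trim() === '';
}

// Rule from the answer: keep a row only if its first column is non-blank.
function keepRowsWithFirstCell(data) {
  return data.filter(row => !isBlank(row[0]));
}

// Rule from the question: drop a row only if every cell in it is blank.
function dropFullyBlankRows(data) {
  return data.filter(row => !row.every(isBlank));
}

const sheetData = [
  ['id1', 'x'],
  ['', ''],
  [' ', ''],     // whitespace only
  ['', 'note'],  // blank first cell, but not an empty row
];
const nonBlank = keepRowsWithFirstCell(sheetData); // [['id1', 'x']]
const notEmpty = dropFullyBlankRows(sheetData);    // [['id1', 'x'], ['', 'note']]
// Only write back when something is left: for an empty result, rows[0] is
// undefined and rows[0].length would throw.
```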

Google Apps Script - how to reference object in loop in function

I am trying to create a small invoicing organization system using Google Sheets/Drive. I have one sheet I call "tasks", from which I plan to control everything. Some of my columns include "Client", "Project", "Requirements", "Details", "subcontractor"... As I acquire new tasks/clients, I'd find and append information about the task ("Project", "Requirements") to other sheets. If none exist, I'd create the folders and sheets and append the necessary information from the "tasks" sheet to them. Some of the sheets will be sent to subcontractors, depending on whether their tasks were updated or new ones were assigned to them in the original "tasks" sheet.
Within the sheets I send to subcontractors, there will be fields for them to fill out (rate, ETA, ...). Once these are filled, I will send that info to a third sheet to apply some margins and extra fees. Then I'll send it back to the original "tasks" sheet, where it will fill the appropriate cells. Once all of the necessary information in a row is filled, it will be prepared and organized into an invoice for the client specified in the "Client" column.
Anyway, I've been trying to learn JavaScript to implement all of this. Since I plan to create folders and sheets and append information based on the values entered in the rows and columns of the "tasks" sheet, I've placed a for loop in an onEdit function that does the following:
function onEdit(e) {
  var ss = e.range.getSheet().getParent();
  var sheet = e.range.getSheet();
  var row = e.range.getRow();
  var columns = [1, 2, 3, 4, 5, 6, 7, 8, 9];
  //assign titles as 'keys' to array
  var titles = [];
  //assign values of edited row to array
  var values = [];
  //create an object to associate the title to the new edited values
  var task = {};
  for (var i in columns) {
    titles.push(sheet.getRange(1, columns[i]).getValue()); //push titles
    values.push(sheet.getRange(row, columns[i]).getValue()); //push values of updated row
    task[titles[i]] = values[i]; //add the values to their property names in task object
  }
}
This works, and I can reference task["Client"], but I'd like to put this loop in a function so that I can reuse it. I suppose I could do without it. However, the array "columns" only represents the columns I will fill in on the "first round", when I'm sending information out. As the tasks progress, I will enter new information in columns 10-15, then 16-20. I'd like to run the for loop for those columns without having to write separate loops. To do this, I've created the GetInfo function below:
function GetInfo(row,column){
for(var i in column){
titles.push(sheet.getRange(1, column[i]).getValue()); //push titles
values.push(sheet.getRange(row, column[i]).getValue()); //push values of updated row
this.task[titles[i]] = values[i]; //add the values to their task
}
}
What I am trying to accomplish is similar to what is outlined here. However, a for (var ... in ...) loop is not used in the examples, and I think I'm missing something. To use the function for the first array of columns, I've done this:
var list = new GetInfo(row,columns);
I'd like to reference the task as follows:
list.task["client"]
or, in general, someVariable.task["name"], but the above doesn't work. When I toast list.task["Client"] or try to append it to another cell, nothing happens; it's blank. What am I doing wrong? How do I accomplish this correctly? What should I do?
Any help or guidance would be greatly appreciated. Please.
(other toasts are working, and the respective cell is not blank, without the function the for var in works)
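As written, GetInfo uses titles, values and sheet, which are declared inside onEdit and so are not visible inside GetInfo. It also never initializes this.task. The loop would therefore throw before anything is stored. Errors in a simple onEdit trigger are not shown in the sheet, which may be why nothing appears.

One way to make the loop reusable is to pass the header row and the edited row in, and create the object inside the function. This is sketched below in plain JavaScript with hypothetical names. In Apps Script, the two arrays could come from sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0] and the same call for the edited row. That also replaces one getValue call per cell with one call per row.

```javascript
// Hypothetical reusable version of the loop: `headers` is row 1 of the
// sheet, `rowValues` is the edited row (both 1D arrays), and `columns`
// holds 1-based column numbers such as [1, 2, 3] or [10, 11, 12].
function buildTask(headers, rowValues, columns) {
  const task = {};
  // for...of visits the column numbers themselves; for...in would visit
  // the array's indexes as strings ("0", "1", ...).
  for (const col of columns) {
    task[headers[col - 1]] = rowValues[col - 1];
  }
  return task;
}

// Constructor form, matching the question's `new GetInfo(...).task` usage.
function GetInfo(headers, rowValues, columns) {
  this.task = buildTask(headers, rowValues, columns); // initialized before use
}

const headers = ['Client', 'Project', 'Requirements', 'Details'];
const row = ['Acme', 'Website', 'Responsive', 'Phase 1'];
const list = new GetInfo(headers, row, [1, 2]);
// list.task → { Client: 'Acme', Project: 'Website' }
// Keys are case-sensitive: list.task["Client"] works, list.task["client"] is undefined.
```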

Pulling Trello card members into google sheet

I'm sure this is insanely simple but I've been bashing my head against it on and off for a week now and getting nowhere.
I'm pulling details of Trello cards into a google sheet with the ultimate aim of sending an email to the members of any Trello card from a given list that's exceeded a certain number of days since the last activity.
I'm able to get the cards and relevant fields from Trello using
//get all cards from board
var url = "https://api.trello.com/1/boards/<board id>/cards?filter=open&fields=name,idList,url,dateLastActivity,idMembers,idLabels&key=<key>&token=<token>";
var get = UrlFetchApp.fetch(url);
and then to parse the return
//parse JSON
var dataSet = JSON.parse(get);
The problem I've encountered is that, where a card has more than one member, the IDs are returned as a comma-separated list within a sub-array, like this:
{idMembers=[member_id, member_id, member_id], idLabels=[label_id], name=card_name, dateLastActivity=date, id=card_id, idList=list_id, url=https://trello.com/c/XXxxxXXxx/blahblahblahblahblahblah}
When I loop through the parsed JSON and push the output to a new array, all the member IDs are still together for each card. But when they're passed to the spreadsheet, only the first member makes the trip.
Code below
var rows = [];
for (var i = 0; i < dataSet.length; i++) {
var data = dataSet[i];
rows.push([data.id,
data.name,
data.idList,
data.dateLastActivity,
data.idMembers,
data.idLabels,
data.url,
]);
}
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheetByName("SheetName");
var last_row = sheet.getLastRow();
var last_column = sheet.getLastColumn();
sheet.getRange(2,1,last_row,last_column).clear();
dataRange = sheet.getRange(2, 1, rows.length, 7);
dataRange.setValues(rows);
I've tried looping through the array again to separate out the member IDs:
for(j = 0; j < data.idMembers.length; j++) {
but that hasn't helped, and I'm not sure it's even along the right lines. I'm not even sure the problem is in the code rather than in Google Sheets' handling of comma-separated values.
In the spreadsheet, I don't much care (at the moment) which layout I get: repeated cards with a single member ID on each, an additional column for each member ID, or all the member IDs together in a single cell. I just want all the member IDs so I can hassle them to deal with their abandoned cards clogging up my lists.
Any and all suggestions gratefully received
Try this code, using join:
rows.push([data.id,
data.name,
data.idList,
data.dateLastActivity,
data.idMembers.join(),
data.idLabels,
data.url,
]);
That worked for me; it gives a list of members separated by commas:
5467844a...,52cd1e6d27...,55bf2a090...
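The plain JavaScript sketch below shows two of the layouts the question mentions, using the card fields requested in the question's URL. Joining is applied to idLabels too, since it is also an array and would presumably be cut down the same way. The sample card data and function names are made up.

```javascript
// Hypothetical helpers operating on the parsed Trello response (an array of
// card objects with the fields requested in the question's URL).

// One row per card, with member and label IDs joined into a single cell.
function cardsToRows(cards) {
  return cards.map(card => [
    card.id,
    card.name,
    card.idList,
    card.dateLastActivity,
    card.idMembers.join(','), // same result as join() with no argument
    card.idLabels.join(','),
    card.url,
  ]);
}

// One row per (card, member) pair, convenient for emailing each member later.
// Cards with no members produce no rows in this layout.
function cardsToMemberRows(cards) {
  const rows = [];
  cards.forEach(card => {
    card.idMembers.forEach(memberId => {
      rows.push([card.id, card.name, memberId, card.dateLastActivity]);
    });
  });
  return rows;
}

const cards = [
  { id: 'c1', name: 'Card 1', idList: 'l1', dateLastActivity: '2020-01-01',
    idMembers: ['m1', 'm2'], idLabels: ['lab1'], url: 'https://trello.com/c/x' },
  { id: 'c2', name: 'Card 2', idList: 'l1', dateLastActivity: '2020-02-01',
    idMembers: [], idLabels: [], url: 'https://trello.com/c/y' },
];
const perCard = cardsToRows(cards);         // perCard[0][4] → 'm1,m2'
const perMember = cardsToMemberRows(cards); // 2 rows, both for card c1
```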
