Hi, I have 4 Google Sheets using IMPORTXML (split across files because of Google's 50-import-per-spreadsheet limit), each of which sorts and feeds data from a web page into another 'summary' sheet. I just need a script for the 4 sheets that refreshes the IMPORTXML formulas every minute or so.
Either that, or a way to refresh the IMPORTXML when the specified information on the target (source) web page changes.
Also, as this would be used from a mobile device some of the time, would the 4 sheets have to be kept open?
I eventually came across the following (sorry, I can't find the original post for it now); it seems to do the trick:
function getData() {
  // Append a random query string to the URL so Sheets treats the
  // IMPORTXML formula as new and re-fetches the source page.
  var queryString = Math.random();
  var cellFunction1 = '=IMPORTXML("' + SpreadsheetApp.getActiveSheet().getRange('D1').getValue() + '?' + queryString + '","' + SpreadsheetApp.getActiveSheet().getRange('E1').getValue() + '")';
  SpreadsheetApp.getActiveSheet().getRange('D2').setValue(cellFunction1);
}
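To run it on a schedule, a time-driven trigger should do the job; installable triggers run on Google's servers, so the sheets wouldn't need to stay open on the device. A minimal sketch (getData is the function above; createRefreshTrigger is a name I made up):

function createRefreshTrigger() {
  // Run getData() roughly every minute, server-side, even when
  // no one has the spreadsheet open.
  ScriptApp.newTrigger('getData')
      .timeBased()
      .everyMinutes(1)
      .create();
}

Run createRefreshTrigger() once from the script editor; each of the 4 sheets would need its own copy of the script (or its own trigger).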
Related
tl;dr - After exporting a Google Doc as an HTML file and pasting the HTML into a GMail draft, the draft does not contain the formatting from the original Google Doc (other than hyperlinks).
Code snippet:
//copies the doc to HTML format
var htmlExport = "https://docs.google.com/feeds/download/documents/export/Export?id=" + docID + "&exportFormat=html";
var param = {
  method: "get",
  headers: {"Authorization": "Bearer " + ScriptApp.getOAuthToken()},
  muteHttpExceptions: true,
};
var htmlExportText = UrlFetchApp.fetch(htmlExport, param).getContentText();

//the variables below (contactEmail & emailSubject) are both taken from a spreadsheet
//copies recent draft body to new email, then updates body of new email to include HTML export
var draftEmailBody = GmailApp.getMessageById(draftEmailID).getBody();
var draftToSend = GmailApp.createDraft(contactEmail, emailSubject, '', {htmlBody: htmlExportText + draftEmailBody}).getMessageId();
Long version:
I am building a mail merge that pulls contact info from a GSheet and uses a GDoc as the template for the body. The GDoc has several bits of formatting in it (bold, italics, superscript) that, when exported as HTML using the script above, appear in the GMail draft devoid of formatting (for some reason the hyperlinks survive). Oddly enough, it even keeps the images from the doc!
The GMail draft pulled into the body (draftEmailBody) does, however, keep all its formatting. I can only assume this means I'm doing something wrong by using getContentText, but I don't know how else to go about it.
(This is completely separate and I should probably just make another question for this, but I'm here so...)
Separately, I wanted to have the script edit specific fields within the GDoc template, but I have run into 2 issues.
Problem 1 - I have found no way to replace specific text within a GMail draft.
Workaround 1 - I have the script edit the text in a GDoc instead, using replaceText. This, however, leads to:
Problem 2 - Using replaceText in a GDoc requires you to call saveAndClose before the script can recognize the change. For some reason I can never get my script to open the GDoc again, despite including openById in various places of the script!
Workaround 2 - I create a copy of the doc for each contact, replace the text within that copy, then trash all of the copies on completion so there's no clutter. Quite clunky and slow, but it gets the job done (see the sketch below).
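For anyone curious, the shape of that copy-and-trash workaround is roughly this (a sketch only; the {{name}} placeholder and the function name are made up, while makeCopy, replaceText, saveAndClose and setTrashed are the standard Drive/Document service calls):

function buildDocForContact(templateId, contactName) {
  // Copy the template so the original stays untouched.
  var copyFile = DriveApp.getFileById(templateId).makeCopy();
  var copyDoc = DocumentApp.openById(copyFile.getId());
  copyDoc.getBody().replaceText('{{name}}', contactName); // hypothetical placeholder
  copyDoc.saveAndClose(); // required before the change is visible to later reads
  return copyFile; // export it, then call copyFile.setTrashed(true) to clean up
}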
While it's not the prettiest solution, I found something that helps:
Google Scripts: Generating Email from Docs Loses Formatting
In my Google Sheet I have some data, and I have published the sheet in .csv format.
Please find my sheet here; below are the cell values:
C1 = tdsyltt = 'ఈరోజు ( బుధవారం ) క్విజ్ సిలబస్' ;
C2 = tdsyl = 'హబక్కూకు 1 & యాకోబు 2, 3' ;
C4 = document.getElementById("tdsyltt).innerHTML = tdsyltt ;
C5 = document.getElementById("tdsyl").innerHTML = tdsyl ;
Using my published URL, I have developed a web app.
code.gs
const doGet = _ => ContentService.createTextOutput(
  UrlFetchApp.fetch("https://docs.google.com/spreadsheets/d/e/2PACX-1vReY-tDEwKYjTiSjsfAN42qjFUwMv_OD3_64bFdGrgL-2p3otc13elLcCq3pkb5xqhTA-bW3QXobpqh/pub?gid=1861615717&single=true&range=c1:c5&output=csv").getContentText()
).setMimeType(ContentService.MimeType.JAVASCRIPT);
In the web app output the 1st line is OK, but the 2nd, 3rd and 4th lines come out with two extra quote characters at the start and end of each line.
Here is My Web App
How to fix this ..?
Modification points:
In this case, how about directly retrieving the values from the Spreadsheet using the Spreadsheet service instead of UrlFetchApp.fetch("https://docs.google.com/spreadsheets/d/e/2PACX-1vReY-tDEwKYjTiSjsfAN42qjFUwMv_OD3_64bFdGrgL-2p3otc13elLcCq3pkb5xqhTA-bW3QXobpqh/pub?gid=1861615717&single=true&range=c1:c5&output=csv").getContentText()? I think the reason for your current issue is the export of the Spreadsheet as CSV data: CSV wraps fields that contain commas or quotes in extra quote characters. When the Spreadsheet service is used, the values can be retrieved as-is.
In your previous question, I said "In this answer, your Spreadsheet is used. Of course, you can directly set the script in Web Apps." (Ref) I thought that the same approach can be used for this question.
In your sample Spreadsheet, document.getElementById("tdsyltt).innerHTML = tdsyltt ; needs to be document.getElementById("tdsyltt").innerHTML = tdsyltt ; (a closing quote is missing). Please be careful about this.
When these points are reflected in your script, how about the following modification?
Modified script:
const doGet = _ => {
  const spreadsheetId = "###"; // Please set your Spreadsheet ID.
  const ss = SpreadsheetApp.openById(spreadsheetId);
  const sheet = ss.getSheetByName("SYLLABUSC");
  const html = sheet.getRange("C1:C6").getDisplayValues().filter(([c]) => c).join("\n");
  // console.log(html); // When you directly run this function with the script editor, you can see the created value in the log.
  return ContentService.createTextOutput(html).setMimeType(ContentService.MimeType.JAVASCRIPT);
}
In this sample script, I used "C1:C6" of the "SYLLABUSC" sheet from your Spreadsheet. Please modify this for your actual situation.
Note:
When you modify the Google Apps Script of Web Apps, please redeploy it as a new version. By this, the modified script is reflected in Web Apps. Please be careful about this.
You can see the details of this in my report "Redeploying Web Apps without Changing URL of Web Apps for new IDE (Author: me)".
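Since the output is served with the JAVASCRIPT mime type, it can presumably be consumed by the page like any other script. A hypothetical loader (the /exec URL is a placeholder for your actual deployment):

// Hypothetical client-side loader for the published Web App output.
// Replace the src with your actual Web Apps URL ending in /exec.
const s = document.createElement('script');
s.src = 'https://script.google.com/macros/s/###/exec'; // placeholder
document.body.appendChild(s);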
Another way to solve this is to change the spreadsheet publishing format from CSV to TSV; TSV doesn't wrap comma-containing fields in quotes, so the extra quote characters disappear.
Code.gs
const doGet = _ => ContentService.createTextOutput(
  UrlFetchApp.fetch("https://docs.google.com/spreadsheets/d/e/2PACX-1vReY-tDEwKYjTiSjsfAN42qjFUwMv_OD3_64bFdGrgL-2p3otc13elLcCq3pkb5xqhTA-bW3QXobpqh/pub?gid=1861615717&single=true&range=c1:c5&output=tsv").getContentText()
).setMimeType(ContentService.MimeType.JAVASCRIPT);
I use this script to scrape data from a website every 15 minutes. I want to make the script automatically remove the IMPORTXML formula and keep the value only, but I can't achieve it yet.
function fetchData() {
  var wrkBk = SpreadsheetApp.getActiveSpreadsheet();
  var wrkSht = wrkBk.getSheetByName("Sheet1");
  var url = "https://coinmarketcap.com/currencies";
  for (var i = 2; i <= 6; i++) {
    var coin = wrkSht.getRange('A' + i).getValue();
    var formula = "=IMPORTXML(" + String.fromCharCode(34) + url + "/" + coin + String.fromCharCode(34) + "," + String.fromCharCode(34) + "//span[@class='cmc-details-panel-price__price']" + String.fromCharCode(34) + ")";
    wrkSht.getRange('C' + i).activate();
    wrkSht.getActiveRangeList().clear({contentsOnly: true, skipFilteredRows: true});
    wrkSht.getRange('C' + i).setFormula(formula);
    Utilities.sleep(1000);
  }
}
And I tried putting these snippets before Utilities.sleep(1000);, but still without success.
First try
var range = wrkSht.getRange('C'+i);
range.copyTo(range, {contentsOnly: true});
Second try
var range = wrkSht.getCurrentCell();
range.copyTo(range, {contentsOnly: true});
This is my Google Spreadsheet
https://docs.google.com/spreadsheets/d/1vykBSNJQ9xO23jA1ZT8fQAjfmtUQOQTqzQXFfCqz8oQ/edit?usp=sharing
Hope someone can help me. Thank you!
By default, Google Apps Script doesn't apply the changes made by the code until the execution ends. Use SpreadsheetApp.flush() to force the changes to be applied before doing the copy/paste-as-values-only operation.
Instead of
var range = wrkSht.getCurrentCell();
Use
SpreadsheetApp.flush(); // Forces the previous changes to be applied (adds the formula)
Utilities.sleep(30000); // Required to wait for the spreadsheet to be recalculated (IMPORTXML imports the data)
var range = wrkSht.getDataRange(); // In case you want to paste the whole sheet as values
Instead of sleep you could use a loop to poll the spreadsheet until it has been recalculated, as in the sketch below.
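A minimal sketch of that polling idea (it assumes the formula lives in C2 and that an in-flight IMPORTXML renders as "Loading..." in the cell):

var cell = wrkSht.getRange('C2'); // assumed location of the IMPORTXML formula
for (var tries = 0; tries < 30; tries++) { // give up after ~30 s
  SpreadsheetApp.flush(); // make sure we read the latest state
  if (cell.getDisplayValue() !== 'Loading...') break; // formula has resolved
  Utilities.sleep(1000);
}
cell.copyTo(cell, {contentsOnly: true}); // now paste as value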
NOTE: Whenever possible we should avoid using Google Apps Script classes and methods inside loops because they are (extremely?) slow, and the execution-time limit is small for free accounts (6 mins) and not so big for G Suite accounts (30 mins). The official docs explain this and we have several questions about it here.
Resources
Best Practices | Google Apps Script
I recently got into web scraping and have tried scraping various pages. For now, I am trying to scrape the following site - http://www.pizzahut.com.cn/StoreList
So far I've used selenium to get the longitude and latitude scraped. However, my code right now only extracts the first page. I know that dynamically loaded pages execute JavaScript to fetch new content, but I had a hard time finding the right solution. I was wondering if there's a way to access the other 49 or so pages, because when I click next page the URL does not change, so I cannot just iterate over a different URL each time.
Following is my code so far:
import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.pizzahut.com.cn/StoreList')
soup = BeautifulSoup(page.text, 'html.parser')

for row in soup.find_all('div', class_='re_RNew'):
    name = row.find('p', class_='re_NameNew').string
    info = row.find('input').get('value')
    location = info.split('|')
    location_data = location[0].split(',')
    longitude = location_data[0]
    latitude = location_data[1]
    print(longitude, latitude)
Thank you so much for helping out. Much appreciated
Steps to get the data:
Open the developer tools in your browser (for Google Chrome it's Ctrl+Shift+I) and go to the XHR tab, which is located inside the Network tab.
With that open, click the next-page button; a request to StoreList/Index appears.
Click on that request. In the General block you'll see the 2 things we need: the request URL (http://www.pizzahut.com.cn/StoreList/Index) and the request method (POST).
Scrolling down, the Form Data tab shows the 3 variables: pageIndex, pageSize and keyword.
Here, you can see that changing the value of pageIndex will give all the pages required.
Now that we've got all the required data, we can send a POST request to the URL http://www.pizzahut.com.cn/StoreList/Index using the above data.
Code:
I'll show you the code to scrape the first 2 pages; you can scrape any number of pages by changing the range().
import requests
from bs4 import BeautifulSoup

for page_no in range(1, 3):
    data = {
        'pageIndex': page_no,
        'pageSize': 10,
        'keyword': '输入餐厅地址或餐厅名称'  # "enter restaurant address or name" - the search box placeholder
    }
    page = requests.post('http://www.pizzahut.com.cn/StoreList/Index', data=data)
    soup = BeautifulSoup(page.text, 'html.parser')
    print('PAGE', page_no)
    for row in soup.find_all('div', class_='re_RNew'):
        name = row.find('p', class_='re_NameNew').string
        info = row.find('input').get('value')
        location = info.split('|')
        location_data = location[0].split(',')
        longitude = location_data[0]
        latitude = location_data[1]
        print(longitude, latitude)
Output:
PAGE 1
31.085877 121.399176
31.271117 121.587577
31.098122 121.413396
31.331458 121.440183
31.094581 121.503654
31.270737000 121.481178000
31.138214 121.386943
30.915685 121.482079
31.279029 121.529255
31.168283 121.283322
PAGE 2
31.388674 121.35918
31.231706 121.472644
31.094857 121.219961
31.228564 121.516609
31.235717 121.478692
31.288498 121.521882
31.155139 121.428885
31.235249 121.474639
30.728829 121.341429
31.260372 121.343066
Note: You can change the results per page by changing the value of pageSize (currently it's 10).
I have filled a Google spreadsheet with around 500 URLs and XPaths. After discovering that IMPORTXML has some drawbacks (it gets perpetual loading errors, even when there are only 10 or so functions running), I am looking for another way to populate the sheet. My first attempt was an iterative script that simply wrote an IMPORTXML formula into a working cell and then wrote in the value for each URL. I thought that by having just one IMPORTXML running at a time it would work fine, but it still gets perpetual loading errors.
Sample sheet:
https://docs.google.com/spreadsheets/d/1QgW4LVkB_oraO9gdS5DsnNta3GVlqsH0_uC1QP0iE7w/edit?usp=sharing
(Note: the sample sheet actually works OK with the iterative IMPORTXML script and still returns some errors, but I think there must be some limit on historical IMPORTXML calls, not just the ones currently on the sheet, because my main sheet now has real problems handling just a few.)
Is there a simple script that will work? I have tried variations using UrlFetchApp, Xml.evaluate and XmlService, but with my limited knowledge I can't get them to work.
Any guidance much appreciated.
Thanks!
Here's a working method - I tested it for you:
Add this function above the function you currently have in your Apps Script.
function importprice(url) {
  var html, content = '';
  var response = UrlFetchApp.fetch(url);
  if (response) {
    html = response.getContentText();
    if (html) {
      // Pull the text out of the product_price span with a regex.
      content = html.match(/<span id="product_price" itemprop="price">(.*)<\/span>/gi)[0]
                    .match(/<span id="product_price" itemprop="price">(.*)<\/span>/i)[1];
    }
  }
  return content;
}
and then replace your importxml function that currently looks like this:
var cellFunction1 = '=IMPORTXML("' + sheet.getRange(row,4).getValue() + '?' + queryString + '","' + sheet.getRange(row,5).getValue() + '")';
with this:
var cellFunction1 = importprice(sheet.getRange(row,4).getValue());
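The calling loop can then store plain values with setValue(), which leaves no live formula behind. A sketch only (it assumes, as in the snippets above, that the URLs sit in column 4; the output column and the function name are made up):

function fillPrices() {
  var sheet = SpreadsheetApp.getActiveSheet();
  for (var row = 2; row <= sheet.getLastRow(); row++) {
    var url = sheet.getRange(row, 4).getValue(); // URL assumed in column D
    if (!url) continue; // skip blank rows
    // importprice() returns a plain string, so setValue() stores a
    // static value instead of a live IMPORTXML formula.
    sheet.getRange(row, 6).setValue(importprice(url)); // output column is an assumption
  }
}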