XmlService in Apps Script throws an error on the '&' symbol

I have the function below, which tries to convert a webpage to XML so I can start extracting data from its tables.
function getWebpageContent() {
var url = "https://training.gov.au/Training/Details/BSBCRT501";
var xml = UrlFetchApp.fetch(url).getContentText();
var document = XmlService.parse(xml);
Logger.log(document);
}
I'm receiving this error:
Exception: Error on line 170: The entity name must immediately follow the '&' in the entity reference.
getWebpageContent @ Code.gs:6
When I search that webpage for the "&" symbol (assuming XmlService is mistaking the ampersand for the start of some HTML code and throwing an error), I can only find one hidden one, and I'm not sure how to get around it.
Is there any way to avoid that error and get the webpage content as XML in Apps Script?

From your following reply:
The output I want from this page (https://training.gov.au/Training/Details/BSBCRT501) is each entry in the 'Elements and Performance Criteria' table. I want to save it as an array and then reformat it into my spreadsheet. I might just use IMPORTXML in a spreadsheet formula instead.
In this case, how about the following formula?
Sample formula:
=IMPORTXML("https://training.gov.au/Training/Details/BSBCRT501","//table[2]//tr")
Reference:
IMPORTXML
Added:
From your following reply:
That's a great answer, thanks, and I think it's the method I'll use. Unfortunately it doesn't line break at each point (2.1, 2.2, etc.), but it's still good. I don't think I can accept it as the answer, though, since it doesn't solve the specific Apps Script problem, but thanks a lot for this.
I added a sample script that uses Google Apps Script. Could you please confirm it?
Sample script:
Before you use this script, please enable the Sheets API in Advanced Google services. When you run this script, the table is pasted into the active sheet.
function myFunction() {
const url = "https://training.gov.au/Training/Details/BSBCRT501";
const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
if (res.getResponseCode() != 200) throw new Error(res.getContentText());
// Collect every <TABLE>...</TABLE> block; [\s\S] also matches newlines.
const tables = [...res.getContentText().matchAll(/<TABLE[\s\S]+?<\/TABLE>/g)];
// Index 1 is the second table, matching table[2] in the IMPORTXML formula.
if (tables.length > 1) {
const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
const sheet = spreadsheet.getActiveSheet();
// pasteData with html: true lets Sheets split the HTML table into cells.
const resource = {requests: [{pasteData: {html: true, data: tables[1][0], coordinate: {sheetId: sheet.getSheetId()}}}]};
Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
}
}
References:
Class UrlFetchApp
Method: spreadsheets.batchUpdate
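Since the goal was to get the table as an array, here is a hedged sketch that turns one matched <TABLE>...</TABLE> string (for example the second match from the script above) into a two-dimensional array of cell texts, keeping <br> breaks as newlines inside a cell. tableToRows is a hypothetical helper; the regex approach assumes simple markup (no nested tables, no ">" inside attribute values) and decodes only a few common entities.

```javascript
// Hypothetical helper: convert one "<table>...</table>" HTML string into a
// 2D array of cell texts. Regex-based, so it assumes simple markup.
function tableToRows(tableHtml) {
  const rows = [];
  for (const rowMatch of tableHtml.matchAll(/<tr\b[^>]*>([\s\S]*?)<\/tr>/gi)) {
    const cells = [];
    for (const cellMatch of rowMatch[1].matchAll(/<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi)) {
      const text = cellMatch[1]
        .replace(/\s+/g, " ")            // collapse source whitespace and newlines
        .replace(/<br\s*\/?>/gi, "\n")   // keep explicit <br> line breaks
        .replace(/<[^>]+>/g, "")         // drop any remaining tags
        .replace(/&nbsp;/g, " ")
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&amp;/g, "&")          // decode &amp; last
        .replace(/ *\n */g, "\n")        // trim spaces around line breaks
        .trim();
      cells.push(text);
    }
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}
```

In Apps Script the result could be written with sheet.getRange(...).setValues(rows), but setValues requires every row to have the same length, so pad shorter rows with empty strings first.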

This answer explains the underlying cause and also gives the solution:
The entity name must immediately follow the '&' in the entity reference
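In short, XML reserves & for entity references such as &amp;, so a bare & (common in HTML, for example in link query strings like ?a=1&b=2) is not well-formed XML. A minimal sketch, assuming you only want to get past this particular error: escape every & that does not already begin a character or entity reference before calling XmlService.parse. escapeBareAmpersands is a hypothetical helper. HTML named entities such as &nbsp; are not predefined in XML and would still be rejected, and most real HTML pages (unclosed <br>, <meta> or <input> tags) still will not parse as XML afterwards.

```javascript
// Hypothetical helper: replace every "&" that does not already start a
// reference such as "&amp;", "&#38;" or "&#x26;" with "&amp;".
function escapeBareAmpersands(text) {
  return text.replace(/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g, "&amp;");
}

// In Apps Script (sketch): XmlService.parse(escapeBareAmpersands(xml));
console.log(escapeBareAmpersands("a.php?x=1&y=2 &amp; more"));
// a.php?x=1&amp;y=2 &amp; more
```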

Related

The Specified GSHEET_JSON_URL does not contain JSON

I have a working Google Sheet JSON search engine here:
https://codepen.io/Teeke/pen/gOwgvXQ?editors=0011
It reads this Google sheet:
https://docs.google.com/spreadsheets/d/1c2aJmDdLkbjW0ErfUB0sfiQ-pt6zdW5KWZgREWs0zvM/edit#gid=0
I made an exact copy of the spreadsheet, and didn't change anything:
https://docs.google.com/spreadsheets/d/1WUzfGeyOMVVOb8tbYLqHTrAk7Ii2p6l58EXG-kcgOcY/edit#gid=0
The CodePen that uses the new spreadsheet (https://codepen.io/Teeke/pen/zYKNRpZ?editors=0011) raises the following error:
The specified GSHEET_JSON_URL does not contain JSON
$(function() {
var GSHEET_JSON_URL =
'https://spreadsheets.google.com/feeds/list/1WUzfGeyOMVVOb8tbYLqHTrAk7Ii2p6l58EXG-kcgOcY/1/public/values?alt=json';
// ... rest of the search code ...
});
The URLs look exactly the same.
Try opening both Google Sheets in an incognito window: your first sheet is published to the web, while your second sheet is not.
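If you want the CodePen to show what actually came back instead of a generic message, a sketch like the following parses the response and reports its beginning when it is not JSON. Presumably an unpublished sheet answers with an HTML page rather than a JSON feed, which would be consistent with the error above. parseSheetFeed is a hypothetical helper, not part of the CodePen code.

```javascript
// Hypothetical helper: parse the feed text, or fail with a preview of what
// the server actually returned (e.g. an HTML page instead of JSON).
function parseSheetFeed(responseText) {
  try {
    return JSON.parse(responseText);
  } catch (err) {
    const preview = responseText.slice(0, 80).replace(/\s+/g, " ");
    throw new Error("Response is not JSON; it starts with: " + preview);
  }
}

// Usage in the browser (sketch):
// fetch(GSHEET_JSON_URL).then((r) => r.text()).then(parseSheetFeed).then(...);
```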

Protecting a URL with a Query Parameter when Using Google Apps Script

I'm having a really hard time sending an automated email (with Google Apps Script) that includes a URL containing a query parameter.
Expected Behavior
Google Apps Script (specifically, the Gmail service) sends an email, and part of the email body contains a URL with a query parameter. The URL will look something like this:
http://my.app/products?id=Bz9n7PJLg8hufTj11gMF
Observed Behavior
The Gmail service seems to be stripping out the = from my URL. So, the body of the email ends up looking like this:
...
http://my.app/products?idBz9n7PJLg8hufTj11gMF
...
Obviously, that link won't work.
I've checked other questions here on SO, and I've tried the base64 encoding tools from the GAS Utilities service, as well as JavaScript's encodeURI() method. No luck so far.
Email-sending Code
//////// GENERATING MESSAGE FROM ID ////////////
// Gets message from ID
var id = Gmail.Users.Drafts.get('me', 'r-1006091711303067868').message.id
var message = GmailApp.getMessageById(id)
var template = message.getRawContent()
// Replaces template variables with custom ones for the user using RegExes
let listingUrl = 'http://my.app/products?id=xyz'
let creatorEmail = 'hello@gmail.com'
let creatorUsername = 'Sam'
template = template.replace(/templates@my.app/g, creatorEmail)
template = template.replace(/firstName/g, creatorUsername)
//** Below is the string that gets modified and broken **//
template = template.replace(/listingUrl/g, listingUrl)
// Creates the new message
var message = Gmail.newMessage()
var encodedMsg = Utilities.base64EncodeWebSafe(template)
message.raw = encodedMsg
// Sends it
Gmail.Users.Messages.send(message, "me", Utilities.newBlob(template, "message/rfc822"))
Regex-based Solution
With the help of Tanaike and Rafa Guillermo, the solution that ended up working for me was to replace = with = by using a little .replace() like this:
listingUrl = listingUrl.replace(/=/, '=')
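One plausible explanation, which the thread does not state, so treat it as an assumption: getRawContent() returns the raw MIME source, and if the body part uses Content-Transfer-Encoding: quoted-printable, then = is that encoding's escape character (RFC 2045). A literal = must then appear as =3D, and =Bz is not a valid escape. The hypothetical helpers below escape = for a quoted-printable body and decode it again to show the round trip. Note also that /=/ without the g flag replaces only the first =, which matters for URLs with several query parameters.

```javascript
// Hypothetical helpers, assuming the template's body is quoted-printable
// encoded (RFC 2045). In that encoding "=" introduces an escape such as "=3D".

// Escape every literal "=" so it survives quoted-printable decoding.
function qpEscapeEquals(text) {
  return text.replace(/=/g, "=3D");
}

// Minimal decoder for illustration: removes soft line breaks ("=" at end of
// line) and decodes "=XX" hex escapes one byte at a time (ASCII only).
function qpDecode(text) {
  return text
    .replace(/=\r?\n/g, "")
    .replace(/=([0-9A-Fa-f]{2})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)));
}

const listingUrl = "http://my.app/products?id=Bz9n7PJLg8hufTj11gMF";
const encoded = qpEscapeEquals(listingUrl);
console.log(encoded);           // http://my.app/products?id=3DBz9n7PJLg8hufTj11gMF
console.log(qpDecode(encoded)); // http://my.app/products?id=Bz9n7PJLg8hufTj11gMF
```

If the body part is base64-encoded instead, this escaping would corrupt the text, so check the part's Content-Transfer-Encoding header in the raw template first.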

Copy data from a dynamic website using scrapy

I started writing a scraper to collect data on cars from a site. It turned out that the data structure can change: sellers do not fill in all the fields, so the set of fields varies between listings, and in the resulting CSV file the values end up in different columns.
Example pages:
https://www.olx.ua/obyavlenie/prodam-voikswagen-touran-2011-goda-IDBzxYq.html#87fcf09cbd
https://www.olx.ua/obyavlenie/fiat-500-1-4-IDBjdOc.html#87fcf09cbd
One approach was to check the field name with an XPath condition such as text() = "Category name", but I'm not sure how to write each result to the correct cell.
I also used Chrome's built-in developer tools, and with the command document.getElementsByClassName('margintop5')[0].innerText
I extracted the whole contents of the table, but the result is not structured.
Would getting the output in JSON format solve my problem?
In addition, while studying the page source, I came across a JavaScript block in which all the necessary data is already structured, but I do not know how to extract it.
<script type="text/javascript">
var GPT = GPT || {};
GPT.targeting = {"cat_l0":"transport","cat_l1":"legkovye-avtomobili","cat_l2":"volkswagen","cat_l0_id":"1532","cat_l1_id":"108","cat_l2_id":"1109","ad_title":"volkswagen-jetta","ad_img":"https:\/\/img01-olxua.akamaized.net\/img-olxua\/676103437_1_644x461_volkswagen-jetta-kiev.jpg","offer_seek":"offer","private_business":"private","region":"ko","subregion":"kiev","city":"kiev","model":["jetta"],"modification":[],"motor_year":[2006],"car_body":["sedan"],"color":["6"],"fuel_type":["543"],"motor_engine_size":["1751-2000"],"transmission_type":["546"],"motor_mileage":["175001-200000"],"condition":["first-owner"],"car_option":["air_con","climate-control","cruise-control","electric_windows","heated-seats","leather-interior","light-sensor","luke","on-board-computer","park_assist","power-steering","rain-sensor"],"multimedia":["acoustics","aux","cd"],"safety":["abs","airbag","central-locking","esp","immobilizer","servorul"],"other":["glass-tinting"],"cleared_customs":["no"],"price":["3001-5000"],"ad_price":"4500","currency":"USD","safedealads":"","premium_ad":"0","imported":"0","importer_code":"","ad_type_view":"normal","dfp_user_id":"e3db0bed-c3c9-98e5-2476-1492de8f5969-ver2","segment":[],"dfp_segment_test":"76","dfp_segment_test_v2":"46","dfp_segment_test_v3":"46","dfp_segment_test_v4":"32","adx":["bda2p24","bda1p24","bdl2p24","bdl1p24"],"comp":["o12"],"lister_lifecycle":"0","last_pv_imps":"2","user-ad-fq":"2","ses_pv_seq":"1","user-ad-dens":"2","listingview_test":"1","env":"production","url_action":"ad","lang":"ru","con_inf":"transportxxlegkovye-avtomobilixx46"};
How can I get the data from the pages using python and scrapy?
You can do it by extracting the JS code from the <script> block, using a regex to get only the JS object with the data and then loading it using the json module:
import json

# Inside the spider's parse(self, response) callback:
query = 'script:contains("GPT.targeting = ")::text'
js_code = response.css(query).re_first(r'targeting = ({.*});')
data = json.loads(js_code)
This way, data is a Python dict containing the data from the JS object.
More about the re_first method here: https://doc.scrapy.org/en/latest/topics/selectors.html#using-selectors-with-regular-expressions
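For readers doing the same thing in plain JavaScript (for example in Apps Script) rather than Scrapy, the same regex-plus-JSON approach can be sketched as below. It also addresses the original column problem: writing each row in a fixed column order means a missing field leaves an empty cell instead of shifting later values. extractTargeting and toCsvRow are hypothetical helpers, and joining array values with "|" is an arbitrary choice.

```javascript
// Hypothetical helper: pull the GPT.targeting object out of the page HTML.
function extractTargeting(html) {
  // Capture "{...}" after "GPT.targeting = " up to the last "};" on that line.
  const match = html.match(/GPT\.targeting = (\{.*\});/);
  return match ? JSON.parse(match[1]) : null;
}

// Hypothetical helper: one CSV line with a fixed column order.
function toCsvRow(data, columns) {
  return columns
    .map((key) => {
      const value = data[key];
      if (value === undefined || value === null) return ""; // missing field -> empty cell
      const text = Array.isArray(value) ? value.join("|") : String(value);
      // Quote the field if it contains a comma, quote or newline (RFC 4180 style).
      return /[",\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    })
    .join(",");
}
```

Using one shared column list for every listing keeps each value under the same header, whichever fields the seller filled in.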

HTML Function displaying as plain text

I have been working on this for quite some time, and have basically been teaching myself HTML, so I apologize if the code is sloppy or if this is a simple fix. Here is what I am attempting to do, and the problem I am running into:
Take Google Form responses, generate an email based on those responses, and dynamically email a certain person in my organization based on the location response (this part is done and working; I'm just adding it for context). Then create a survey response that sends information back to the original responder, sent from the administrator the form was sent to. This is the JavaScript I have, which works when it is run in the Google Apps Script project:
function getid() {
var spreadsheet = SpreadsheetApp.openByUrl('https://docs.google.com/a/raytownschools.org/spreadsheets/d/1YWHu_yKn5bqq63x1A4e4-vBUtZANj-xjeF07IBpHP64/edit?usp=sharing');
SpreadsheetApp.setActiveSheet(spreadsheet.getSheets()[0]);
var sheet = spreadsheet.getActiveSheet();
var lastRow = sheet.getLastRow();
}
When I attempt to run that in my HTML code and insert it into the element, it simply inserts the code as raw text. The HTML isn't running the function or returning the data it should (and does return when run outside the HTML code as a JS app).
I can post the full HTML code if that would be helpful. Hopefully someone on here can help me out.
What you have there is a JavaScript function. HTML has no functions; it is a markup language.
You must add that function inside Javascript tags like this:
<script type="text/javascript">
function getid() {
var spreadsheet = SpreadsheetApp.openByUrl('https://docs.google.com/a/raytownschools.org/spreadsheets/d/1YWHu_yKn5bqq63x1A4e4-vBUtZANj-xjeF07IBpHP64/edit?usp=sharing');
SpreadsheetApp.setActiveSheet(spreadsheet.getSheets()[0]);
var sheet = spreadsheet.getActiveSheet();
var lastRow = sheet.getLastRow();
return lastRow; // without a return value, alert() would show "undefined"
}
alert( getid() );
</script>
Take a look here for how to use JavaScript.
Edit
It seems the code you're trying to execute is Google Apps Script. I think you must execute it inside the Google Apps Script editor, because Google doesn't make this API available to regular websites. Here is a running example with your code.
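If the HTML page is served by HtmlService from the same Apps Script project, its client-side JavaScript can call the server-side function through google.script.run and receive the return value in a success handler. A minimal sketch, assuming getid() is changed to end with return lastRow; and that the page contains an element with the hypothetical id last-row:

```javascript
// Client-side sketch for an HTML file served by HtmlService from the same
// Apps Script project. Assumes the server-side getid() ends with
// "return lastRow;" and that the page has an element with the given id.
function showLastRow(elementId) {
  google.script.run
    .withSuccessHandler(function (lastRow) {
      // Called in the browser with getid()'s return value
      document.getElementById(elementId).textContent = lastRow;
    })
    .withFailureHandler(function (error) {
      document.getElementById(elementId).textContent = "Error: " + error.message;
    })
    .getid();
}

// e.g. in the page: <div id="last-row"></div> and then showLastRow("last-row");
```

The call is asynchronous, so the value only becomes available inside the success handler, not as a return value of showLastRow.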

Accessing headers of a google spreadsheet using HTTP GET and Google App Script

I'm trying to return the header row of a Google spreadsheet using doGet() in a Google Apps Script that's running as a web app. I'm using an HTML form to send the GET request to the web app, and everything works except that I don't know how to return the headers to my JavaScript. Here is my code:
HTML:
<form id="getForm" method="get" action="My URL for WebApp">
<label for="sheetGetID">SheetID</label>
<input type="text" name="sheetGetID" id="sheetGetID" value="">
<button class="ui-btn" onclick='submitGET()'>Submit</button>
</form>
Javascript:
function submitGET() {
var headers = $("getForm").submit();
alert(headers);
}
Google App Script:
function doGet(e) {
//Trying To: Get headers from sheetID and then return to app, then have correct labels for the inputs, then use POST to post.
var ss = SpreadsheetApp.openById(ScriptProperties.getProperty('active'));
var sheet = ss.getSheetByName(e.parameter["sheetGetID"]);
//Return the first 3 cells, A1:C1,
var headers = sheet.getRange(1,1,1,sheet.getLastColumn()).getValues()[0];
return ContentService.createTextOutput(JSON.stringify(headers))
.setMimeType(ContentService.MimeType.JSON);
}
I'm getting a JSON object back, but only as text output. How can I get the returned JSON and store it in the headers variable?
One approach is to return HTML from doGet.
Build another HTML page and load it with HtmlService.createTemplateFromFile('newPag.html').evaluate().
Inside that page, use the scriptlet tags (<? ... ?>) to put your server-side code that manipulates the JSON object. This gives you a better look and feel and more maintainable code.
I got this working a while ago but forgot to post the answer, so here it is in case anyone else needs it.
You need to output it as a JSON object, like the API demo does. You also need to append "?prefix=?" to the URL when you make the $.getJSON() call. The prefix=? part tells jQuery to make a JSONP request: jQuery replaces the final ? with a generated callback name, and the script wraps its JSON output in a call to that function.
If anyone has trouble with this, just comment and I'll post all the code I used.
On the client side (I'm using jQuery Mobile and I'm not sure how to do it without it), you would do something like:
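function submitGET() {
var sheetID = $("#sheetGetID").val();
// "?prefix=?" makes jQuery send a JSONP request; the server wraps its reply in the callback
$.getJSON("https://script.google.com/macros/s/YOUR_KEY_GOES_HERE/exec?prefix=?",
{ sheetGetID: sheetID },
function(results) {
// results is the comma-joined header string produced by doGet
var fields = results.split(",");
//Do something with fields
}
);
}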
Where #sheetGetID is the textbox where the user can enter the sheet id for headers.
Note the ?prefix=? appended to the URL. That part is necessary, because it is what makes jQuery treat the request as JSONP. The URL is your deployed web app.
On the Google Apps Script (server) side, you'd have something like:
function doGet(request) {
var ss = SpreadsheetApp.openById(PropertiesService.getScriptProperties().getProperty('active'));
var sheet = ss.getSheetByName(request.parameter["sheetGetID"]);
// Read the whole header row (row 1, every used column)
var headers = sheet.getRange(1,1,1,sheet.getLastColumn()).getValues()[0];
var result = headers.join();
// Wrap the JSON in the callback name jQuery sent as "prefix" (JSONP)
var content = request.parameter.prefix + '(' + JSON.stringify(result) + ')';
return ContentService.createTextOutput(content)
.setMimeType(ContentService.MimeType.JAVASCRIPT);
}
If you have any questions about how the spreadsheet part works, there's plenty of documentation on Google's APIs. doGet() is called when you use $.getJSON(), and what the Apps Script returns needs to be the JSON wrapped in that callback. Most of this is covered in Google's documentation; some of it I found by watching Google Developers Live on YouTube. If you are trying to do more, I highly recommend checking those sources out.
If you have any more questions about what's being called or parameters you can find it easily enough on Google.
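When jQuery sees prefix=? in the URL, it replaces the ? with a generated function name and loads the URL as a script, so doGet must return that name wrapped around the JSON. A hedged sketch of that wrapping as a standalone function: buildJsonpBody is hypothetical, and the identifier check is a defensive addition that is not part of the answer above.

```javascript
// Hypothetical helper: wrap a value as a JSONP response body, e.g.
//   jQuery1234_5678("Name,Email,Phone")
// The callback name comes from the request, so only plain (dotted)
// identifiers are accepted to avoid injecting arbitrary script.
function buildJsonpBody(callbackName, value) {
  if (typeof callbackName !== "string" ||
      !/^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/.test(callbackName)) {
    throw new Error("Invalid callback name: " + callbackName);
  }
  return callbackName + "(" + JSON.stringify(value) + ")";
}

// In doGet (sketch):
// var body = buildJsonpBody(request.parameter.prefix, headers.join());
// return ContentService.createTextOutput(body)
//     .setMimeType(ContentService.MimeType.JAVASCRIPT);
```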
