I want to load a web page that's generated by JS (e.g., AngularJS or similar) then scrape it using (only) Google Apps Script. How can I accomplish that?
I'm looking for something like:
const response = UrlFetchApp.fetch( urlToExternalJsPage );
const content = response.getContentText();
// scrape content
Only, maybe, replace UrlFetchApp with some call to a library or something? Perhaps a Puppeteer library for GAS, the Cheerio library for GAS, or something else?
How can I load an externally loaded JS page and read the HTML from that page after it's generated in order to scrape it?
Idea 1
I came across this article: The Best Way to Load Javascript that supplies the following code.
function loadScript(url, callback){
var script = document.createElement("script")
script.type = "text/javascript";
if (script.readyState){ //IE
script.onreadystatechange = function(){
if (script.readyState == "loaded" || script.readyState == "complete"){
script.onreadystatechange = null;
callback();
}
};
} else { //Others
script.onload = function(){
callback();
};
}
script.src = url;
document.getElementsByTagName("head")[0].appendChild(script);
}
The actual code on your page ends up looking like this:
<script type="text/javascript" src="http://your.cdn.com/first.js"></script>
<script type="text/javascript">
loadScript("http://your.cdn.com/second.js", function(){
//initialization code
});
</script>
The problem with this approach is that I'm trying to stay strictly server side. I'm not trying to post any HTML pages and/or serve them.
Idea 2
I came across this article that appears to describe a Puppeteer library for GAS. I translated it from Japanese using Google Translate. The problem is that it requires Google Cloud Platform, which I want to avoid. I also want to avoid setting up any billing and stay strictly inside Google Apps Script.
Idea 3
Perhaps there is a way to use the browser that comes with the UI service. Specifically, the sidebar?
On this page, I found the following example of importing web pages into an HTML service page using an IFRAME.
Code.gs
function doGet() {
var template = HtmlService.createTemplateFromFile('top');
return template.evaluate();
}
top.html
<!DOCTYPE html>
<html>
<body>
<div>
Click Me!
</div>
</body>
</html>
Related
I have a JavaScript file that points to another website as the source:
<script src="https://example.com/requiredcode.js" async></script>
This script element is going to be sent to my clients and they are going to place it manually on their websites. And I would prefer that it at least looks like the code is going through my website domain.
I want it to look like this:
<script src="https://mywebsite.com/script.js" async></script>
This is simply an aesthetic issue.
I don't care if clients go into the code itself and see that it relies on 3rd parties. I just don't want the main <script> element to look like it's relying on a 3rd party itself.
UPDATE:
I thought of hosting the file locally on my website. But the problem is that the 3rd party file is sometimes manually updated by the 3rd party.
You could have your HTML link to a .js file on your website that then loads in the third party library like this:
HTML:
<script src="https://mywebsite.com/script.js" async></script>
JS:
function headScript(url_check, fn) {
if (typeof url_check != 'object') {
url_check = [url_check]
}
var url = url_check[0]
var checkFor = url_check[1]
if (!window[checkFor]) {
var script = document.createElement('script')
document.head.appendChild(script)
script.onload = fn
return (script.src = url)
}
fn && fn()
}
headScript('https://example.com/requiredcode.js', function() {
// run when file is loaded
})
I want to create my own asset loader. To load external scripts, I used following javascript
function loadScript(src, callback) {
var script = document.createElement("script");
script.onload = function() {
document.head.appendChild(script);
callback();
};
script.src = src;
}
But I don't think this is the most elegant solution. This snippet would make my HTML ugly, since it appends a script element to the head for every one of my dependencies.
So my question: is it possible to access my externally loaded code without using the following line?
document.head.appendChild(script);
Can I execute my script with pure JS, like
script.execute();
Or even better, is there a way to access the data stored in my external JS file? The variable "bar", for example?
var foo = script.get("bar")
Could I even execute a function of the external file?
script.function(params)
It would be great to hear of your ideas and experiences!
Darth Moon
Edit: I forgot to exclude Ajax. I know I could load the code via Ajax and execute it with eval(), but that isn't a good idea when testing code locally, since you need a server (like XAMPP's Apache) to send Ajax requests to your local files.
You could try something like the following.
Internal script:
function appendScript(src) {
var script = document.createElement('script');
script.src = src;
document.head.appendChild(script);
}
External script:
function loadedScript() {
// run something
}
loadedScript()
The external code will just run loadedScript() once the file is loaded.
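For the asset-loader question, here is a minimal sketch of a loader that reports completion through a callback. The `doc` parameter and the error-first callback are assumptions added for illustration (they let the function be exercised without a browser); neither comes from the posts above. Note that the element is appended before waiting for `onload`: a dynamically created script is only fetched once it is inserted into the document, so appending it inside `onload`, as in the question, means the handler never fires.

```javascript
// Sketch only: `doc` and the error-first callback are illustrative assumptions.
function loadScript(src, callback, doc) {
  doc = doc || document;
  var script = doc.createElement("script");
  script.onload = function () {
    callback(null, script);
  };
  script.onerror = function () {
    callback(new Error("Failed to load " + src));
  };
  script.src = src;
  // Fetching starts only after the element is inserted into the document.
  doc.head.appendChild(script);
  return script;
}
```

Inside the callback, anything the external file declared at top level with `var` or `function` is available as a global, so a variable `bar` would be readable as `window.bar` and a function could be called directly. There is no `script.get()` or `script.execute()` on the element itself.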
I am using Google Apps Script on Google Sites. I have created a navigation menu and embedded it into the page. I want to get the pageURL() from Apps Script and retrieve it in my JavaScript page. I tried using a scriptlet to get the value, but it doesn't execute. Here is what I have so far. How can I access values from Apps Script and use them in my JavaScript function?
google script (.gs)
function getPageName(){
var site = SitesApp.getSite("site.com", "sitename");
var page = site.getChildren()[0];
var pageName = page.getUrl().split("/").splice(-1)[0];
return pageName;
}
javascript file
var pageName = <?!= getPageName()?>; // doesnt execute, need to get page url
if(pageName == linkName){
// add class here.
}
Since google loads the apps script as an iframe, I tried doing window.location.href, but it doesn't work either. The page name ends up being the name of the google app instead.
An alternative to using scriptlets is to use google.script.run (Client-side API)
It's pretty easy to use. In your case, it should be like this
code.gs
function getPageName(){
var site = SitesApp.getSite("site.com", "sitename");
var page = site.getChildren()[0];
var pageName = page.getUrl().split("/").splice(-1)[0];
return pageName;
}
Javascript File:
function onSuccess(receivedPageName)
{
if(receivedPageName == linkName)
{
// add class here.
}
}//onSuccess
google.script.run.withSuccessHandler(onSuccess).getPageName();
The function passed to withSuccessHandler(function) runs if the server-side function returns successfully; the function passed to withFailureHandler(function) runs if the server-side function throws an exception.
Give it a try :)
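As a rough sketch of wiring up both handlers at once: the helper name `requestPageName` and its `runner` parameter are made up for illustration, with `runner` standing in for `google.script.run`. Only `withSuccessHandler`, `withFailureHandler`, and the `getPageName` server function come from the answer above.

```javascript
// Sketch only: `runner` stands in for google.script.run so the wiring is explicit.
function requestPageName(runner, onName, onError) {
  runner
    .withSuccessHandler(onName)   // called with the value getPageName() returns
    .withFailureHandler(onError)  // called with an Error if getPageName() throws
    .getPageName();
}
```

In the HtmlService page this would be called as `requestPageName(google.script.run, onSuccess, function (err) { console.error(err.message); })`. The call is asynchronous, so any code that needs the page name has to live inside the success handler.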
My website needs to use the Google Earth plugin for just a bit longer (I know, the API is deprecated, but I'm stuck with it for several more months). I load it by including google.com/jsapi, then calling google.load like so:
...
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("earth", "1", {"other_params": "sensor=false"});
google.setOnLoadCallback(function () {
// call some JavaScript to begin initializing the GE plugin
});
</script>
</body>
</html>
This works well from multiple computers and with multiple browsers inside our company's firewall. It works well from my home computer and from my colleagues' home computers. However, when my customer tries to load it, she gets an error message that google is not defined on the line that begins google.load(.
Of course, global variable google is defined at the start of file www.google.com/jsapi, so presumably that file isn't loading. I initially assumed that her corporate firewall was blocking that file, but when I asked her to paste "https://www.google.com/jsapi" into her browser's address bar, she said that immediately loaded up a page of JavaScript.
The entire output to the browser console is:
Invalid URI. Load of media resource failed. main.html
ReferenceError: google is not defined main.html:484
And I believe the Invalid URI business is just because we don't have a favicon.ico file.
She is running Firefox 35.0.1, though she says the same error occurred with IE (she didn't mention the version of IE).
Short of asking her to install Firebug, which I don't think is going to be feasible, how can I troubleshoot this issue?
I'm really not sure about this assumption, but:
Could it be that your first script loads asynchronously? Then this problem would occur on slow connections (like your customer's). I know you are not using the async attribute, but maybe the source itself triggers asynchronous loading.
The best thing to do here is to make sure the Google code you're using is the synchronous kind, then redeploy.
Also https://bugsnag.com/ can be a really interesting tool for you. Just implement the js and you can track every error your customer gets.
Redeploy your code as follows,
<script type="text/javascript">
try {
google.load("earth", "1", {"other_params": "sensor=false"});
google.setOnLoadCallback(function () {
// call some JavaScript to begin initializing the GE plugin
});
} catch (e) {
$.post('http://<your-remote-debug-script-or-service>', {message: e.message, stack: e.stack})
}
</script>
Then, when your customer encounters the error, the full details will be sent directly to your server and you can troubleshoot as necessary.
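One caveat with reporting a caught error: an Error's `message` and `stack` are not enumerable, so serializing the error object directly (for example with `JSON.stringify`) produces an empty object. A minimal sketch of copying the useful fields into a plain object first is below; `serializeError` and `captureFailure` are hypothetical helper names, not part of jQuery or any Google API.

```javascript
// Sketch only: both helper names are illustrative assumptions.
// Copy the interesting fields of a thrown value into a plain object.
function serializeError(err) {
  var isObject = err !== null && typeof err === "object";
  return {
    name: isObject && err.name ? String(err.name) : "Error",
    message: isObject && err.message ? String(err.message) : String(err),
    stack: isObject && err.stack ? String(err.stack) : ""
  };
}

// Run `fn`; return null on success or a JSON report body if it throws.
function captureFailure(fn) {
  try {
    fn();
    return null;
  } catch (e) {
    return JSON.stringify(serializeError(e));
  }
}
```

The string returned by `captureFailure` can be used as the body of the POST to the debug endpoint. Including `navigator.userAgent` in the report would also settle which browser version the customer is running.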
It could be something as simple as the client's browser blocking JavaScript from being executed. Maybe it is specifically blocking your domain or something crazy like that.
Can you try an external script that loads the google jsapi, then put your code in the callback to ensure it is loaded?
<script type="text/javascript">
function loadScript(url, callback){
var script = document.createElement("script")
script.type = "text/javascript";
if (script.readyState){ //IE
script.onreadystatechange = function(){
if (script.readyState == "loaded" ||
script.readyState == "complete"){
script.onreadystatechange = null;
callback();
}
};
} else { //Others
script.onload = function(){
callback();
};
}
script.src = url;
document.getElementsByTagName("head")[0].appendChild(script);
}
loadScript("https://www.google.com/jsapi", function(){
google.load("earth", "1", {"other_params": "sensor=false"});
google.setOnLoadCallback(function () {
// call some JavaScript to begin initializing the GE plugin
});
});
</script>
(Modified from http://www.nczonline.net/blog/2009/07/28/the-best-way-to-load-external-javascript/)
You may also want to look at jsapi Auto-Loading to minimize what is loaded, but it may get tricky with an older library. https://developers.google.com/loader/
The following are the first lines of code in a <script> tag just above the closing body tag in my document (it specifies that a locally-served copy of jQuery is run in the event that Google's CDN fails):
if(!window.jQuery){
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = '/js/jquery.js';
var scriptHook = document.getElementsByTagName('script')[0];
scriptHook.parentNode.insertBefore(script, scriptHook);
}
jQuery(document).ready(function($){
// page behaviors
});
It does execute successfully, in the sense that if my computer is not connected to the Internet (this is a locally-served page), the local copy of jQuery is inserted. However, the document.ready() section below does not execute. I'm guessing this is because it is invoked before the fallback copy of jQuery takes effect. What's the proper practice for somehow "delaying" its execution so that either copy of jQuery will work properly?
Consider using an existing script loader such as yepnope. There's an example of exactly what you're trying to do on the home page.
You need to be sure that the script you are appending to the dom has finished loading before calling jQuery. You can do this with the technique described here:
if(!window.jQuery){
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = '/js/jquery.js';
script.onreadystatechange= function () {
if (this.readyState == 'complete') jQueryLoaded();
}
script.onload = jQueryLoaded;
var scriptHook = document.getElementsByTagName('script')[0];
scriptHook.parentNode.insertBefore(script, scriptHook);
}
function jQueryLoaded() { };
You can also fetch the jQuery contents as an Ajax request, create a script tag with those as the body of the script and append it. That would also work.
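The wait-until-loaded idea can be generalized to a list of sources tried in order (CDN first, local copy second). This is only a sketch: `loadFirstAvailable` is a made-up name, and `loadOne` is assumed to be any function of the form `loadOne(url, callback)` that calls `callback(err)` when the script has loaded or failed, such as a wrapper around the `onload`/`onerror` handlers shown above.

```javascript
// Sketch only: `loadOne(url, cb)` is an assumed loader that calls cb(err).
function loadFirstAvailable(urls, loadOne, done) {
  var index = 0;
  function tryNext() {
    if (index >= urls.length) {
      done(new Error("No source could be loaded"));
      return;
    }
    var url = urls[index++];
    loadOne(url, function (err) {
      if (err) {
        tryNext();          // this source failed, fall through to the next one
      } else {
        done(null, url);    // report which source actually loaded
      }
    });
  }
  tryNext();
}
```

The page behaviors then go inside `done`, so they run only after one of the copies of jQuery is actually present.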
Try this:
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.2.min.js"><\/script>')</script>
<script>
jQuery(document).ready(function($){
// page behaviors
});
</script>
This way the script tag will be loaded synchronously.
The question of "how do I cope with my CDN failing and load a file hosted on my server?" seems to have come up a few times lately.
The question I'd ask is whether adding yet more JS is the way to achieve resilience, and what level of resilience the JS approaches really add. For example, if the CDN is down there will be a quick failure, but how well do these solutions cope if the CDN is merely slow to respond?
An alternative way to approach this is treat it as an infrastructure problem...
Run a CDN on a domain or subdomain you own. Set up automated monitoring of its availability, and when it fails, switch the DNS over to a backup server (anycast may provide an alternative solution too).
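On the slow-CDN concern: a client-side fallback only helps in that case if the load is given a deadline. A rough sketch follows, assuming a loader `loadOne(url, callback)` that calls `callback(err)` on completion. The name `loadWithTimeout` and the injectable `timers` object are illustrative assumptions (the latter exists so the logic can be tested without real timers).

```javascript
// Sketch only: `loadOne` and `timers` are assumed, illustrative parameters.
function loadWithTimeout(loadOne, url, timeoutMs, done, timers) {
  timers = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  var settled = false;
  var timerId = timers.set(function () {
    if (settled) return;
    settled = true;
    done(new Error("Timed out after " + timeoutMs + " ms: " + url));
  }, timeoutMs);
  loadOne(url, function (err) {
    if (settled) return;   // the deadline already passed, ignore the late result
    settled = true;
    timers.clear(timerId);
    done(err || null);
  });
}
```

Keep in mind that a script that misses the deadline may still arrive and execute later, so the fallback copy and the CDN copy need to be safe to load twice. That is part of why the infrastructure approach can be the cleaner fix.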
A PHP solution would be something like this:
$google_jquery = 'https://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js';
// Requires allow_url_fopen (and OpenSSL for https); @ suppresses the warning if the CDN is unreachable
$fp = @fopen($google_jquery, 'r');
if (!$fp)
{
echo '<script type="text/javascript" src="js/jquery.js"></script>';
}
else
{
fclose($fp);
echo '<script src="'.$google_jquery.'"></script>';
}