I need to get the name of a company into a Google spreadsheet.
The GOOGLEFINANCE function doesn't include the name of the company in its attributes, so I'm trying to create a custom function for that.
So, for IBM, for example, I can fetch the URL:
https://www.google.com/finance?q=ibm
And using Javascript, I'm trying to get the text of the name using:
document.getElementsByClassName('appbar-snippet-primary')[0].getElementsByTagName("span")[0].innerHTML
Which is returning:
undefined
If you are trying to do that inside Apps Script, it won't be possible: the version of JavaScript in Apps Script does not include the document object, so you can't do it like that.
If you return the response to the client (where JavaScript does have the document object) in order to look up the element that way, you first have to add the fetched HTML to that document.
A possible solution would be to treat the result of the urlfetch as a string and then search it for the information you require.
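As a minimal sketch of that string-based approach: the helper below searches raw HTML with a regular expression instead of the DOM. The class name is taken from the question; the function names `extractCompanyName` and `COMPANYNAME` are hypothetical, and the page markup is assumed to still look the way the question describes, which may no longer be true.

```javascript
// Pull the company name out of raw HTML using string matching only (no DOM).
// Assumption: the name sits in the first <span> inside the element whose
// class list contains "appbar-snippet-primary" (class name from the question).
function extractCompanyName(html) {
  var pattern = /class="[^"]*appbar-snippet-primary[^"]*"[^>]*>\s*<span[^>]*>([^<]*)<\/span>/;
  var match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Hypothetical custom function for a sheet cell, e.g. =COMPANYNAME("ibm").
// UrlFetchApp exists only inside Apps Script, so this part runs there only.
function COMPANYNAME(ticker) {
  var url = 'https://www.google.com/finance?q=' + encodeURIComponent(ticker);
  var html = UrlFetchApp.fetch(url).getContentText();
  return extractCompanyName(html);
}
```

Keeping the parsing in its own function means it can be checked against a saved copy of the page before wiring it to the fetch.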
I'm trying to scrape data from this URL, https://drive.getbigger.io/#/stores, but I can't find the XPath of the text I want to export, which is the producer's offer.
First I tried the IMPORTXML function in Google Sheets:
=IMPORTXML(A1;"/html/body/flt-ruler-host/div[23]/p")
and it gave me an #N/A error: "the imported content is empty".
So I tried to scrape the website with add-ons and ParseHub, and every time I got a .csv file that didn't contain the data I want to export.
I also can't find the right XPath for the data I would like to scrape: when I use the inspection tool, the data isn't in the <body> part.
The XPath I use in my IMPORTXML function points to some code I found in the <body> part that is close to the text I'd like to extract (the producer's offer).
It seems that the data I am looking for is tied to some JavaScript code in the <head> part. Also, when I hover over the page with the selection tool in order to scrape the data, it selects the whole page, maybe because there is a "scroll <div>".
So I wonder whether the website uses some kind of protection against scraping.
Please tell me:
Can I find the right XPath in order to scrape with the IMPORTXML function?
Should I extract the data with a Python script?
If the website blocks my attempts, how could I do this?
You won't be able to scrape anything with IMPORTXML formula since the website uses dynamic rendering (javascript).
So yes, Python+Selenium (or other combinations) could do the job. The website won't block you if you follow some rules (switch user-agent, add pauses between requests).
You would probably need these XPath :
Product description :
//p[1][string-length(text())>5][parent::flt-dom-canvas]
Product price :
//p[3][contains(text(),"€") and not (contains(text(),","))][parent::flt-dom-canvas]
However, I think the most elegant way to get the data is probably to use the API the website relies upon. With GoogleSheets and a custom ImportJSON script, you can obtain something like this (result for "fromage" as query) :
It won't work out of the box: you'll have to modify some parts of the script, since as written it won't load a JSON that is requested with POST and needs headers in the request. In a nutshell, you need to construct the payload part, add headers to the request ("Bearer XXXXX"), and add a parameter to a function to retrieve the results.
All this depends on your objective and your expected output.
EDIT : For references (constructing the payload, adding parameters) you can read :
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#fetchurl,-params
Also look at the network tab of your browser's developer tools to find the URL of the API and the correct parameters to send.
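A rough sketch of what "construct the payload and add headers" could look like in Apps Script. The option names (`method`, `contentType`, `headers`, `payload`, `muteHttpExceptions`) are the documented `UrlFetchApp.fetch` parameters; the payload shape `{ query: ... }`, the function names, and the `apiUrl` and `token` arguments are assumptions, to be replaced with what the network tab actually shows.

```javascript
// Build the second argument for UrlFetchApp.fetch(url, params).
// Assumption: the API expects a JSON body with a "query" field; check the
// real request in the browser's network tab and adjust the payload.
function buildRequestOptions(token, query) {
  return {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + token },
    payload: JSON.stringify({ query: query }),
    muteHttpExceptions: true // return error responses instead of throwing
  };
}

// Hypothetical wrapper; UrlFetchApp exists only inside Apps Script.
function fetchOffers(apiUrl, token, query) {
  var response = UrlFetchApp.fetch(apiUrl, buildRequestOptions(token, query));
  return JSON.parse(response.getContentText());
}
```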
I'm trying to create a new page in Alfresco, and the tutorials tell me that I have to create three files: new-page.get.js, new-page.html.ftl and new-page.get.xml, like Aikau - http://docs.alfresco.com/5.0/concepts/dev-extensions-share-page-creation.html
But the JavaScript is different. For example, I try to get the current URL with window.location.search, or to call console.log or alert. But in all three cases I get "undefined", as in "window is undefined".
Why is this JavaScript different? What type of JavaScript is it? Where can I find tutorials on programming this kind of JavaScript?
I want to use window.location.search to get the current URL, but if I don't have this command, what can I use instead?
Normally, the Alfresco way wouldn't be to get the raw URL. Instead, you should be using the built-in argument processing.
Since Alfresco itself is open source, we can look at Alfresco for some examples! So, starting with the groups get webscript, we see a URL pattern defined as:
<url>/api/groups?shortNameFilter={shortNameFilter?}&zone={zone?}&maxItems={maxItems?}&skipCount={skipCount?}&sortBy={sortBy?}</url>
With that, we see a whole bunch of pre-defined parameters on the URL.
Next, we look at the javascript controller behind that webscript, and we see things like:
var shortNameFilter = args["shortNameFilter"];
var zone = args["zone"];
Those URL parameters are then parsed into your webscript in the args variable, available for you to fetch as a hash.
No need to do any raw URL munging yourself, if you define your webscript correctly the framework does it all for you!
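To illustrate, here is a small sketch of reading those optional arguments with fallbacks. The argument names come from the URL pattern above; the helper name `readGroupArgs` and the default values are assumptions made for illustration and are not taken from the Alfresco source.

```javascript
// Read the optional URL arguments from an args hash, with fallbacks.
// In a real web script controller, `args` is supplied by the framework.
function readGroupArgs(args) {
  var maxItems = parseInt(args["maxItems"], 10);
  var skipCount = parseInt(args["skipCount"], 10);
  return {
    shortNameFilter: args["shortNameFilter"] || null,
    zone: args["zone"] || null,
    maxItems: isNaN(maxItems) ? -1 : maxItems,   // assumed default: no limit
    skipCount: isNaN(skipCount) ? 0 : skipCount  // assumed default: start at 0
  };
}
```

Inside a controller you would call `readGroupArgs(args)` and copy whatever the template needs onto `model`.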
The JavaScript isn't different, the language itself is still the same.
window, console and alert are just APIs supplied by browsers. They aren't a native part of JavaScript.
The documentation you linked to should be your starting point for figuring out what APIs are available.
You can get the server URL in the Javascript web script (on the backend) by
var path = url.getServer()
For example, http://localhost:8080 will be returned.
Here is the list of the available methods - you can concatenate them to get a direct URL:
I want to use the Wikipedia API to select a famous person's name from the People category from my Javascript application. Basically, I would like to send the name or partial name and get results that contains the Wikipedia URL, title, an excerpt of the content and if possible the main picture.
I have been trying two ways, but I cannot make it work as I want.
First I tried search, but I can't find a way to make it return the URL. Would sectiontitle be good as a unique identifier? Can snippet be plain text somehow? I can't find how to filter by category.
Second, I have tried with opensearch, but the JSON response does not contain images, while the XML response does:
JSON: http://en.wikipedia.org/w/api.php?action=opensearch&search=mariano&namespace=0&format=json
XML: http://en.wikipedia.org/w/api.php?action=opensearch&search=mariano&namespace=0&format=xml
It is not possible to filter by category. Also, some results include a link to the disambiguation page, when I would prefer to get the list of possible matches rather than such a link.
How could I search by title and get full title, url, small description and a picture link?
Opensearch is for input field autocompletion; it's based on an external spec and not very flexible. You should use the search API as a generator for some other API such as info which can return more details (example).
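A sketch of what using search as a generator might look like. The query parameters (`generator=search`, `gsrsearch`, `prop=info|extracts|pageimages`, `inprop=url`, and so on) are standard MediaWiki API parameters; the function names, the limit, and the thumbnail size are arbitrary choices for illustration, and this does not address filtering by category.

```javascript
// Build a query URL that uses the search API as a generator and asks for
// URL (info), plain-text intro (extracts) and thumbnail (pageimages).
function buildSearchUrl(term, limit) {
  var params = new URLSearchParams({
    action: 'query',
    format: 'json',
    origin: '*',              // allows anonymous cross-origin requests
    generator: 'search',
    gsrsearch: term,
    gsrnamespace: '0',
    gsrlimit: String(limit || 5),
    prop: 'info|extracts|pageimages',
    inprop: 'url',
    exintro: '1',
    explaintext: '1',
    exlimit: 'max',
    piprop: 'thumbnail',
    pithumbsize: '200'
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// Flatten the response's query.pages map into simple result objects.
function summarizePages(response) {
  var pages = (response.query && response.query.pages) || {};
  return Object.keys(pages).map(function (id) {
    var p = pages[id];
    return {
      title: p.title,
      url: p.fullurl,
      excerpt: p.extract || '',
      image: p.thumbnail ? p.thumbnail.source : null
    };
  });
}
```

In a browser you would pass `buildSearchUrl('mariano')` to `fetch`, parse the JSON, and hand it to `summarizePages`.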
If I have a function written in Google Spreadsheets script editor that retrieves the data in the spreadsheet in JSON format, how can I access that function outside of the script editor in my own code? I want to access that JSON and manipulate it in my own code. Is there a way to do that using the Spreadsheets API? I format it in a specific way inside script editor so I can't just use the json-in-script provided. In the call (http://spreadsheets.google.com/feeds/feed/key/worksheet/public/basic?alt=json-in-script&callback=myFunc) there's a callback function for myFunc. Can I use the function I defined in the script editor to replace myFunc?
Following your comment, which gives some details on your use case: there is a Google Apps Script feature specifically designed to give another script access to functions you wrote. It is called libraries and is fully described in the documentation.
EDIT, following second comment:
Calling a GS function from a JavaScript (or any other language) script that is not a Google Script (GS) is not possible if you intend to use it as a function...
but
what you can do - depending on the data this function must handle - is deploy a script as a web app running as a service and call this service from your external app using the equivalent of a urlFetch (that's the service doing that in GS).
The service will have an url to which you can add parameters and it will return a result that you can use in your local app.
Of course this workflow has a few limitations and might quickly become complex but in many cases it is fully workable.
Note that the URL you will have to use is the "versioned" one ending with /exec (not sure this word is correct, but I mean the published URL that corresponds to a version of your script, not the "/dev" one that can be used to test a script in GS).
You'll find details about that in the documentation and in many other resources, including SO. The URL is typically something like this:
https://script.google.com/macros/s/AKfycbyw-2WtmF7wsd__________azjImbMWm5YrxB8/exec?someParameter=someValue&otherParam=otherVal // etc...
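On the script side, a minimal sketch of such a service could look like the following. `doGet`, `e.parameter` and `ContentService` are the standard Apps Script web app pieces; `buildResult` is a hypothetical stand-in for whatever your own function computes from the spreadsheet.

```javascript
// Hypothetical stand-in for your own logic: echoes the URL parameters back.
function buildResult(params) {
  return { received: params, count: Object.keys(params).length };
}

// Entry point that Apps Script calls for GET requests to the /exec URL.
// e.parameter holds the query-string values (someParameter, otherParam, ...).
function doGet(e) {
  var body = JSON.stringify(buildResult(e.parameter));
  return ContentService.createTextOutput(body)
      .setMimeType(ContentService.MimeType.JSON);
}
```

The external app then requests the /exec URL with its parameters and parses the JSON text that comes back.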
I am trying to build this application that when provided a .txt file filled with isbn numbers will visit the isbn.nu page for that isbn number by simply appending the isbn to the url www.isbn.nu/your isbn number.
After pulling up the page, I want to scan it for information about the book, and store that in an excel file.
I was thinking about creating a file stream of the url in Java, but I am not really sure how to extract the information from the html page. Storing the information will be done using the JExcel Java package.
My best guess would be using javascript to extract the information, but I don't know how to call the javascript from my java program.
Is my idea plausible? If not, what do you suggest I do?
My goal: retrieve information from an HTML page and store it in an Excel file for each ISBN in a text file. There can be any number of ISBNs in a text file.
This isn't homework btw, I am simply doing this for an organization that donates books to Sudan. Currently they have 5 people cataloging these books manually and I am one of them.
Jsoup is a useful tool for parsing a web page and getting data from it. You can do it in Java and it's pretty easy.
You can parse the text file, build the URL as a string, pass it to Jsoup, then use Jsoup to parse out the information using the HTML tags on the page. Then you can store it however you want. You really don't need to use JavaScript at all if you're more comfortable with Java.
Example for reading a page and parsing it with Jsoup:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
Use a div in which you load your link (example here how to do that http://api.jquery.com/load/).
After that, when the load is complete, you can check the names of the divs or spans used in the web page and get their content with val (http://api.jquery.com/val/) or text (http://api.jquery.com/text/).
Here is text from the main page of www.isbn.nu:
Please note that isbn.nu is designed for manual searching by individuals. It is not intended as an information resource for automated retrieval, nor as a research tool for companies. isbn.nu reserves the right to deny access based on excessive requests.
Why not just use the free Google Books API, which returns book details in XML format? There are many classes available in Java to parse XML feeds, and that would make your life much easier.
See http://code.google.com/apis/books/ for more info.
Here are the steps needed:
Create CURL request (you can use multiple curl requests)
Get body data
Parse data
Make excel file
You can read HTML information using this guide.
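The surrounding steps (read the ISBN list, build each URL, write rows out) can be sketched as plain functions. This is only an outline in JavaScript under stated assumptions: the function names are hypothetical, the fetch-and-parse step is left out, and CSV is used as a simple format that Excel can open rather than a native Excel file.

```javascript
// Turn the text file's contents into a clean list of ISBN-10/ISBN-13 strings.
function parseIsbnList(text) {
  return text.split(/\r?\n/)
    .map(function (line) { return line.replace(/[\s-]/g, ''); })
    .filter(function (line) { return /^(\d{9}[\dXx]|\d{13})$/.test(line); });
}

// Build the lookup URL by appending the ISBN, as described in the question.
function buildIsbnUrl(isbn) {
  return 'http://www.isbn.nu/' + isbn;
}

// Serialize rows as CSV, quoting fields that contain commas, quotes or newlines.
function toCsv(rows) {
  return rows.map(function (row) {
    return row.map(function (value) {
      var s = String(value);
      return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
    }).join(',');
  }).join('\n');
}
```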
A simple solution might be to use a Google Docs spreadsheet function like ImportXML(URL,path-expression).
More information and examples here:
http://www.seerinteractive.com/blog/importxml-cookbook/
http://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
http://blog.ouseful.info/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/