Simple question: every chart created with Google Charts, when saved as an image, is named "chart.png". Is there a way to force another name? That way if I export a bunch of QR codes at once I can give them meaningful names.
I couldn't find anything in the documentation or here about simply renaming the .png
The short answer is no. Because the last part of the URL for creating QR codes with Google Charts (which is deprecated and will be phased out on 2015-04-20) is chart, that is also the name most browsers (recent Chrome, IE, FF) use as the file name. Google also does not seem to provide any way to alter this URL ending.
The long answer is that it would be possible for you to circumvent this, but not trivial.
Forwarder with forged URL (source)
One option would be to create a PHP page (or similar) of your own that just presents the content of Google's QR code, and where you control the last part of the URL. For example:
http://example.com/forwarder.php/mychart.png?chs=150x150&cht=qr&chl=Hello+world
You could then use this URL as the src of your image. This would also require some specific server settings to allow the above URL format without returning a 404 error.
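If you'd rather not use PHP, the same idea can be sketched in Node.js (the answer only requires "a PHP page or similar"); this is a rough sketch, where the port is arbitrary and chart.googleapis.com is the Google Chart API host:

const http = require('http');
const https = require('https');

http.createServer(function (req, res) {
  // e.g. request /mychart.png?chs=150x150&cht=qr&chl=Hello+world
  const query = req.url.split('?')[1] || '';
  https.get('https://chart.googleapis.com/chart?' + query, function (upstream) {
    res.writeHead(200, { 'Content-Type': 'image/png' });
    upstream.pipe(res); // stream Google's PNG back under a URL whose last part you control
  });
}).listen(8080);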
Other service
Depending on your requirements for chart variations, your devotion to making this work, and what service you'd like to use, I'd suggest looking for some other way (site/framework) of generating your QR codes that does give you control over this parameter.
Obscure possibilities
Using HTTP headers (e.g. Content-Disposition; not sure if it can be applied to multiple images)
Using the download attribute on an anchor (if seeing the image isn't necessary)
I'm trying to scrape data from this URL: https://drive.getbigger.io/#/stores. However, I couldn't find the XPath of the text I want to export, which is the producers' offers.
First I tried the IMPORTXML function in Google Sheets:
=IMPORTXML(A1;"/html/body/flt-ruler-host/div[23]/p")
and it gave me an #N/A error: "the imported content is empty".
So I tried to scrape the website with add-ons and ParseHub, but every time I got a .csv file in which I can't find the data I want to export.
Also, I can't find the right XPath for the data I would like to scrape: when I use the inspection tool, the data isn't in the <body> part.
The XPath I use in my IMPORTXML function comes from some code I found in the <body> part that is close to the text I'd like to extract (the producers' offers).
It seems that the XPath I am looking for is tied to some JavaScript code in the <head> part. Also, when I hover over the page with the selection tool in order to scrape the data, it selects the whole page, maybe because there is a "scroll <div>".
So I wonder if the website uses some kind of protection against scraping, or something else.
Please tell me:
Could I find the right XPath in order to scrape with the IMPORTXML function?
Should I extract the data with a Python script?
If the website blocks my attempts, how could I get around that?
You won't be able to scrape anything with the IMPORTXML formula, since the website uses dynamic rendering (JavaScript).
So yes, Python+Selenium (or other combinations) could do the job. The website won't block you if you follow some rules (switch the user agent, add pauses between requests).
You would probably need these XPaths:
Product description:
//p[1][string-length(text())>5][parent::flt-dom-canvas]
Product price:
//p[3][contains(text(),"€") and not (contains(text(),","))][parent::flt-dom-canvas]
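As a rough illustration of the Selenium route (here with Node's selenium-webdriver rather than Python, but the idea is the same; the 15-second wait is an arbitrary choice), the XPaths above could be used like this:

const { Builder, By, until } = require('selenium-webdriver');

(async function () {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://drive.getbigger.io/#/stores');
    // wait until the JavaScript-rendered paragraphs exist before querying them
    const descXPath = '//p[1][string-length(text())>5][parent::flt-dom-canvas]';
    await driver.wait(until.elementsLocated(By.xpath(descXPath)), 15000);
    const descriptions = await driver.findElements(By.xpath(descXPath));
    for (const el of descriptions) {
      console.log(await el.getText());
    }
  } finally {
    await driver.quit();
  }
})();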
However, I think the most elegant way to get the data is probably to use the API the website relies upon. With Google Sheets and a custom ImportJSON script, you can obtain something like this (result for "fromage" as the query):
It won't work out of the box: you'll have to modify some parts of the script, since the data is a JSON fetched with a POST request that needs headers. In a nutshell, you need to construct the payload, add headers to the request ("Bearer XXXXX"), and add a parameter to a function so it retrieves the results.
All this depends on your objective and your expected output.
EDIT: For reference (constructing the payload, adding parameters) you can read:
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#fetchurl,-params
Also look at the network tab of your browser's developer tools to find the URL of the API and the correct parameters to send.
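Put together, the fetch part could look roughly like this in Apps Script (the endpoint URL, payload fields and token below are placeholders you would read from the network tab):

function fetchOffers() {
  var url = 'https://example.com/api/search';           // hypothetical API endpoint
  var options = {
    method: 'post',
    contentType: 'application/json',
    headers: { 'Authorization': 'Bearer XXXXX' },       // token copied from the network tab
    payload: JSON.stringify({ query: 'fromage' })       // constructed payload
  };
  var response = UrlFetchApp.fetch(url, options);
  return JSON.parse(response.getContentText());         // hand this to your ImportJSON-style parser
}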
I'm trying to extract a URL from an HTML snippet in string format.
I've been using regex to retrieve the part between href=" and ". However, I noticed that in some cases href links to pages within the website without containing the root URL. For example, a snippet can be like:
<div class="textcontent" id="desc">
<br>
<a rel="nofollow" href="/confirm/url/aHR0cHLy9yYZy50bw%3D%3D/" class="ajaxLink">link</a><br>
Instead of the more usual form, where the href contains the full absolute URL, for example:
<a href="https://www.google.com/">Google</a>
Where I can just use this regex to narrow down my results:
/href\n*=\n*".*?"/
I looked around Stack Overflow and saw a few posts about this (extracting URLs from HTML/text), and saw a mention of using an external library like JSoup. This is for a Chrome extension, so I'm hoping to keep it lightweight (if that might be an issue). (JSoup is a Java library, not JS.)
Are there any good solutions to this "partial URL" problem? Would it be best to just check and prepend the root to the URL if it's missing, or would using an external library like JSoup be more advisable?
Following the direction you took by using a regex, the best approach could be to parse the extracted URL in order to detect one of the following three kinds of URL possibilities:
Protocol://FQDN/Document
/DOCUMENT/
DOCUMENT/
The first case points to an absolute document, the second points to an absolute document but omitting the protocol and the FQDN, and the third points to a relative document.
For the second and third cases you need to know the omitted information in order to build a complete URL. Assuming you know the URL of the page the original HTML snippet came from, the problem is to detect which of these cases you are facing for each href. If you don't know the original URL, you lack the information needed to complete the href.
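If you do know the page's URL, you don't necessarily need an external library for the resolution step: in a Chrome extension the built-in URL constructor already handles all three cases. A minimal sketch (the base URL in the example is made up):

function resolveHref(rawHref, baseUrl) {
  // Absolute, root-relative ("/doc/") and path-relative ("doc/") hrefs
  // are all resolved against baseUrl by the URL constructor.
  return new URL(rawHref, baseUrl).href;
}

// Example:
resolveHref('/confirm/url/aHR0cHLy9yYZy50bw%3D%3D/', 'https://example.com/page');
// -> "https://example.com/confirm/url/aHR0cHLy9yYZy50bw%3D%3D/"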
I'm trying to extract an SVG I created with D3 from a webpage. Because I'm using d3.csv to read in my data for the image, I'm using a local web server. I've been experimenting with Andrew Reagan's phantom-crowbar.js code (https://github.com/andyreagan/phantom-crowbar), and while that works great for extracting SVG from http:// pages and file:/// pages, when I try to extract from my page at http://localhost:8000 I receive the following message:
TypeError: null is not an object (evaluating 'svg.setAttribute')
phantomjs://webpage.evaluate():32
phantomjs://webpage.evaluate():55
Evaluated our code
"Evaluated our code" is the message you usually receive when the SVG has been successfully extracted but the output file is empty.
I'm new to JavaScript, PhantomJS and working in the browser with D3 so any help would be much appreciated. I really have no idea why the local server page should behave differently.
I ended up asking Andrew Reagan and he got back to me very quickly. The issue was that I was setting the SVG "id" attribute within a function in my script: since I was using d3.csv() to read in a CSV and generate my image, everything was wrapped within this function. I set the "id" attribute outside of it using
d3.select("svg").attr("id", mysvg)
and now phantom-crowbar.js works no problem.
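For anyone hitting the same thing, the structure of the fix looks roughly like this (the names are placeholders, and this assumes the callback-style d3.csv of D3 v3/v4):

var svg = d3.select("body").append("svg");
svg.attr("id", "mysvg");               // set synchronously, outside the csv callback

d3.csv("data.csv", function (data) {
  // build the chart inside the callback as before;
  // the id no longer depends on this asynchronous callback having run
});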
@Stephen I need to generate and save multiple D3 images, so I needed a way to extract the SVG programmatically, and Andrew's phantom-crowbar.js is the best approach I've tried, in case other people are looking to do something similar. The SVG Crowbar bookmarklet is great if you only need to do this occasionally.
So what I want to mimic is the link share feature Facebook provides. You simply enter the URL and then FB automatically fetches an image, the title, and a short description from the target website. How would one program this in JavaScript with Node.js and whatever other JavaScript libraries may be required? I found an example using PHP's fopen function, but I'd rather not include PHP in this project.
Is what I'm asking an example of web scraping? Do I just need to retrieve the data from inside the meta tags of the target website, and then also get the image tags using CSS selectors?
If someone can point me in the right direction, that'd be greatly appreciated. Thanks!
Look at THIS post. It discusses scraping with Node.js.
HERE you have lots of previous info on scraping with JavaScript and jQuery.
That said, Facebook doesn't actually guess what the title, description and preview are; they (at least most of the time) get that info from meta tags present on sites that want to be more accessible to FB users.
Maybe you could make use of that existing metadata to pull titles, descriptions and image previews. The docs on the available metadata are HERE.
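As a rough sketch of that idea in Node (cheerio is just one possible HTML parser here; any DOM library would do), you could read the Open Graph meta tags Facebook itself looks at:

const cheerio = require('cheerio');

function extractPreview(html) {
  const $ = cheerio.load(html);
  const og = function (name) {
    return $('meta[property="' + name + '"]').attr('content');
  };
  return {
    title: og('og:title') || $('title').text(),   // fall back to the <title> tag
    description: og('og:description'),
    image: og('og:image')
  };
}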
Yes, web scraping is required, and that's the easy part. The hard part is the generic algorithm to find headings and relevant text and images.
How to scrape
You can use jsdom to download the page and create a DOM structure on your server, then scrape that using jQuery on your server. You can find a good tutorial at blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs, as suggested by @generalhenry above.
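A bare-bones sketch with a current jsdom API (the tutorial above uses the older jsdom.env style, but the idea is the same; the URL is a placeholder):

const { JSDOM } = require('jsdom');

JSDOM.fromURL('https://example.com/some-article').then(function (dom) {
  const document = dom.window.document;
  console.log(document.querySelector('title').textContent);
  // query the rest of the page here, or load jQuery against dom.window
});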
What to scrape
I guess a good way to find the heading would be:
var h;
for (var i = 1; i <= 6; i++) {   // check h1 first, then h2, and so on
  if ($('h' + i).length) {
    h = $('h' + i).first();
    break;
  }
}
Now h will hold the first heading element found, or stay undefined if there are none. The alternative could be to simply use the page's title tag. :)
As for the images: list all, or the first few, images on the page that are reasonably large, i.e. filter out sprites used for buttons, arrows, etc.
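One crude heuristic for that, using only the width/height attributes when they are present (so it is far from complete), could be:

var candidates = $('img').filter(function () {
  var w = parseInt($(this).attr('width'), 10) || 0;
  var h = parseInt($(this).attr('height'), 10) || 0;
  return w >= 100 && h >= 100;   // skip sprite-sized images; 100px is an arbitrary cutoff
});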
And while fetching the remote data, make sure that the ProcessExternalResources flag is off. This will ensure that script tags for ads do not pollute the fetched page.
And yes, the relevant text would be in the tags following h.
I'm currently working on a Safari extension to create a printable form based upon information provided within a website. A custom CSS stylesheet wouldn't be ideal; instead, I was hoping that it would be possible to do the following...
If I were to have the following DIV on a page called name.html:
<div id="name">John</div>
Is there a way of getting the contents of #name and passing it into a text field on another page called form.html? Ideally avoiding server-side scripts?
To retrieve the element's text (as in ALL the text, subnodes included):
var value = document.getElementById('name').textContent;
Then to assign the text to the input field on another page:
document.getElementById('myField').value = value;
Of course that doesn't work across pages. If you don't want to use server-side code for this, one simple way of doing it would be to pass the value in a query string, redirect to your form page, and read the variable back from the query parameters. That sounds simpler than it actually is, as you'd need a function to add a query parameter, another one to read a query parameter, and to make sure that everything is encoded and decoded properly.
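In current browsers, URLSearchParams takes care of most of that encoding/decoding work; a minimal sketch, reusing the #name and #myField ids from above:

// On name.html:
var value = document.getElementById('name').textContent;
window.location.href = 'form.html?name=' + encodeURIComponent(value);

// On form.html:
var params = new URLSearchParams(window.location.search);
document.getElementById('myField').value = params.get('name') || '';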
Another - bad - alternative could be to use cookies via JavaScript.
Another - better but not yet widespread - alternative could be to use the Web Storage API (see localStorage and/or sessionStorage). This will require a modern browser supporting these APIs (for instance Google Chrome, IE9, Firefox 4, etc.).
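With that API the whole exchange becomes a couple of lines (sessionStorage works the same way if you don't want the value to persist; both pages must be served from the same origin):

// On name.html:
localStorage.setItem('name', document.getElementById('name').textContent);

// On form.html:
document.getElementById('myField').value = localStorage.getItem('name') || '';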
The embedded links will provide the missing parts.