Download web page with a single click - javascript

I need to download hundreds of web pages manually. Unfortunately I can't automate the process, since I don't know the URLs of the pages in advance. What I do now is open the page in the browser, right-click, select Save As, choose a directory on my computer, and save the web page as an HTML file.
It is just too much work, since I need to repeat this hundreds of times. It would be great if a floating button popped up when I open a page; when I click the button, the page would be saved to a default folder on my computer. I'll probably implement this as a Chrome extension.
I guess this can be easily realized with simple JavaScript code. Unfortunately I have little experience with JavaScript, so it may take me quite a while, whereas it's a breeze for experienced web developers. Could anyone suggest the core lines of code I need to achieve this?

This task isn't well suited to JavaScript, since it can't easily save files to your hard drive, and you will likely run into cross-domain issues as well. I would suggest an easy server-side language like Go, PHP, or Python; you could set up a script to do this for you quite easily.
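For example, a minimal sketch of that in Python, assuming you collect the URLs into a text file as you browse (the file names here are placeholders):

import os
import urllib.request

SAVE_DIR = "saved_pages"  # placeholder output folder
os.makedirs(SAVE_DIR, exist_ok=True)

# one URL per line, collected as you browse
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    with urllib.request.urlopen(url) as resp:
        html = resp.read()
    path = os.path.join(SAVE_DIR, "page_%d.html" % i)
    with open(path, "wb") as out:
        out.write(html)
    print("saved", url, "->", path)

Note this saves only the raw HTML, not the images and CSS that the browser's Save As would also fetch.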

Related

Is there any way to edit a local file through a web page without running server software?

I'm working on a web page that's basically just a big table of links. I use JavaScript to read from a text file, parse it, and create a table based on that.
I'd like to have a button on the page that adds a new row of links and appends them to the text file (or another file type if it's better).
I know you can accomplish this with PHP, Node.js, and others, but all the methods I've found require server software to be running. Is there any way around this? For example, is there a way to use JavaScript to call a Python script, or any other way?
The page is just for personal use, so I'd like to avoid running server software just to use it if possible. I know you can set it to download a text file, and you can save it in the same location, but I'd also like to avoid that.
From the research I've done, it doesn't seem possible, but I just thought I'd ask before I give up. Thanks in advance.
With JavaScript in the browser you can only read local files, not write them.
It would be a huge security vulnerability if scripts in browsers could write files to your machine.

Rendering a local HTML file without a browser or networking (lightweight)

I have had a lot of trouble trying to find information or possible examples of this being done.
I would like to render HTML in a window, take the JS from the HTML, and get its output into Python code.
The HTML is local and there will never be an internet connection for it to run off. Every time I search for possible answers, everything seems to come back to using some small lightweight browser, which in my case isn't an option.
For some more detail, I am running Selenium WebDriver (Python) and Iceweasel (Raspberry Pi B+) to get the value of an element from an HTML page. So using a different browser isn't possible, as the lightweight ones are not compatible with Selenium. Using Selenium and Iceweasel takes in excess of 2 minutes to fully load, which for what I need is far too long.
I had a look into Awesomium, but I think it lacks compatibility with the Raspberry Pi.
My other thought was to use OpenGL to render the HTML, but I found no clearly explained examples.
I'm currently looking into LibRocket, Berkelium, and QWebView, but again I don't think any of them offer what I need with the compatibility I need.
EDIT:
Basically I want a canvas capable of rendering HTML to a screen using X11. The HTML will contain buttons, and I want those buttons to perform actions inside a Python script.
The way I see it, a browser is basically a toolbar, a canvas, and a lot of networking. I want to strip away as much of that as possible and be left with just the canvas.
First, go to the directory that contains the local web page. Then run python -m SimpleHTTPServer 8000. This serves the page at http://localhost:8000, where you can "render the html" in a window. Then view the source and paste the JavaScript into a Python file. Alternatively, if you would like to automate piping the JavaScript into an output file, you can use Beautiful Soup to select the JavaScript and write it to any file you want, then manipulate it in Python however you want.
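If you take the Beautiful Soup route, a rough sketch of the extraction step looks like this (the file names are placeholders):

from bs4 import BeautifulSoup

with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# collect the contents of every inline <script> tag;
# tags with only a src attribute have no inline body and are skipped
with open("extracted.js", "w", encoding="utf-8") as out:
    for script in soup.find_all("script"):
        if script.string:
            out.write(script.string + "\n")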

Ruby on Rails code for taking a screenshot of different sections of a page

I am creating a Ruby on Rails app. A specific page in my app is divided into several sections by <div> tags. Each <div> includes a combination of text (using different fonts), symbols, and mathematical formulas. I use MathJax and a few other JavaScript libraries to display them correctly, and everything works great on my computer. However, JavaScript is not enabled in everyone's browser, and some of the JavaScript might not load correctly in other people's browsers.
One solution I was thinking of is this: after all the JavaScript has finished processing and the page is displayed correctly on my computer (the server), I use some code to generate a snapshot of each <div> as a PNG and send them to the server (for example, I click a <button> tag on the page to activate this code once I'm happy that the display is correct). Then I'll save these images in the database and serve them, so the page will look the same on everyone's computer regardless of whether JavaScript is enabled, what browser they're using, etc. Is anyone aware of a code or command that I can use?
Please note that currently the JavaScript processes the HTML content and produces the correct display only after the page is loaded. Also, I don't want a snapshot of the whole page; I want a snapshot of each <div> separately.
Thanks a lot.
Well, this is a client-side problem; here is a JavaScript library that will work for you: http://experiments.hertzen.com/jsfeedback/
You've got a bit of a problem there. JavaScript is not executed until the page has finished loading, i.e. all of the information has already been sent to the client. You're not executing JavaScript at the server level, so you wouldn't be able to do that kind of processing there at all. If users have JavaScript disabled, your code will never be executed.
You could generate the images using ImageMagick or something similar; I know PHP has bindings for it. There are a couple of extremely messy solutions, like rendering the page in a browser on the server side with something like Selenium, but I definitely wouldn't recommend doing that. Overall, it depends on the platform you're developing on, but most major languages have support for generating images without requiring client-side JavaScript.
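That said, if the Selenium route is the only one available, a minimal sketch in Python looks like this (the URL and the div.formula selector are placeholders, and the fixed sleep is a crude stand-in for properly waiting on MathJax to finish typesetting):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
try:
    driver.get("http://localhost:3000/my_page")  # placeholder URL
    time.sleep(5)  # crude wait for MathJax to finish typesetting
    # screenshot each section separately, not the whole page
    for i, div in enumerate(driver.find_elements(By.CSS_SELECTOR, "div.formula")):
        div.screenshot("div_%d.png" % i)
finally:
    driver.quit()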

Include a text or XML file via the script tag?

For a little context: I'm working on a site for a client, and it has to run completely offline. It's just a set of HTML/CSS/JS files that you run locally. The computers it will be used on are office computers that are quite locked down, so I can't even use Java. Luckily the project isn't overly complicated, and I've accomplished most of my goals on this limited platform. The issue I'm having is that I want to create some easy-to-change files to load the data from. Right now all the data loads through script tags that point to JS files that can be edited manually; I've tried to make the JavaScript as simple and straightforward as I can, but it still doesn't look very friendly to someone who hasn't programmed before.
What I would like to do is include an XML file or text file in the HTML using a script tag or something similar and then use JS to read the contents, but every time I try this it doesn't actually load the file. Here are a few things I've tried:
<script type="text/xml" src="myxml.xml"></script>
<script type="text/plain" src="myxml.xml"></script>
I've tried using XMLHttpRequest, but most of these attempts end in the same result: can't do a cross-site request. Even though I'm using a relative URL ("myxml.xml") and the files are in the same folder, Chrome is still convinced this is an XSS attempt. So I'm starting to run out of ideas. Can anyone think of a clever way to achieve this?
If your goal is just to run your web app, even offline, and you do not care about cross-browser compatibility, you can consider converting your application to a Packaged App.
It will work only in the Google Chrome browser, but by setting the right permissions you should not have problems with cross-site requests. At that point, you could download the XML content through a normal XMLHttpRequest.

Using Celerity to download a file

I'm using Celerity in JRuby to automate the download of some .csv files from certain websites. For one of the websites (LinkShare), I've gotten very close but cannot figure out the last step.
The website pushes the file download using JavaScript and the 'hidden iframe' method: during regular browsing, when you click the download button, it calls JavaScript that creates a hidden iframe containing the download content, and the browser picks that up and prompts the user to save the file.
Obviously it doesn't work quite the same way in Celerity. I can see the new iframe in jirb after I've clicked the link, but I can't call any methods on it; I get errors like:
NoMethodError: undefined method `getDocumentElement' for #<Java::ComGargoylesoftwareHtmlunit::TextPage:0x184e6efc>
Does anybody have enough experience with Celerity/HtmlUnit/JavaScript/JRuby to point me in the right direction? I just want to retrieve the download content (the .csv file).
Alternatively, does anybody know of a (headless) browser automation tool that would be better suited to the task, if one exists?
The first thing I'd do is check that you're navigating to the frame. A frame (even an iframe) is treated as a completely separate window, and you'll have to navigate there first. Check the Celerity::Frames class.
Failing that, you may want to try a library that controls a real browser rather than emulating one. Libraries that emulate a browser (such as HtmlUnit and Mechanize) have their limits, and you may have found one. For this, I'd recommend watir/firewatir.
Mechanize may work for you; it's meant to more closely resemble a normal person's usage of a browser while remaining headless.
http://mechanize.rubyforge.org/
As ehsanul said, Mechanize might be a good starting point. You'll need to figure out the URL being accessed to retrieve the file. Also, look for a cookie or session ID identifying your session to the host; Mechanize should capture and resend that, as that's part of what it does.
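The answers above name Ruby's Mechanize, but the same cookie-carrying idea sketched in Python with the requests library looks roughly like this (every URL and form field below is a placeholder you would have to discover from the site's JavaScript or network traffic):

import requests

session = requests.Session()  # keeps cookies across requests

# hit the login page first so the host sets its session cookie
session.post("https://example.com/login",
             data={"user": "me", "password": "secret"})

# the hidden-iframe JavaScript ultimately requests a concrete URL;
# fetch that directly, with the session cookies attached automatically
resp = session.get("https://example.com/reports/export.csv")
with open("report.csv", "wb") as f:
    f.write(resp.content)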
