I need to convert a website page, with all of its external stylesheets, to a single HTML file with inline CSS (not for email use, but to include styled portions of that page in another website's page). So far I've found CssToInlineStyles, which does most of the job very well, but I still have to merge all the stylesheets into one file and then pass it to that class.
ScrapBook for Firefox seems fine; the only drawback is that there is no control over rewriting the URLs of resources (images in the HTML or in the CSS). It makes a whole snapshot of the web page available offline, with no overrides.
I'm trying to copy a whole HTML page, but the CSS, image, and JavaScript files are external. If there were only a few of them I could copy them manually, but what if there are many? The links in the HTML page refer to those files as local. Is there a way I can copy all of the files exactly as they are referenced in the HTML page? Is there a tool for that? I can't do it in the Chrome console.
You can save a website in the MHTML format (short for MIME Encapsulation of Aggregate HTML Documents), which bundles an HTML document together with its assets, such as styles and images, in one single file.
Some browsers support that format natively (e.g. Chrome's “Save complete website”); for other clients you'll need to install a plugin.
See: https://en.wikipedia.org/wiki/MHTML
Can I use one script file in two different HTML files (HTML1 and HTML2)? How can I specify which document to get the element from when I use the DOM? Or will it automatically identify which document it is by the element ID?
Yes.
Take JavaScript CDNs as an example: you potentially have one JavaScript file used by millions of websites, let alone web pages.
The script itself doesn't care or know what page has loaded it. It is loaded by the browser and if you are doing DOM manipulation it works with whatever DOM has been loaded into the browser via the HTML in the loaded page.
It is up to you, the developer, to make sure that you have the appropriate hooks in your HTML for your JavaScript to use.
I'm working on an application that needs to download the source of a web page from a link, along with all of its linked files: images, CSS, JavaScript.
Afterwards, I will need to open this HTML in a WebView in offline mode; that's why I need to download everything from the page.
I'd download the images using Jsoup, but I have no idea how to link them into the downloaded HTML.
Could you give me some examples, or starting points on where to look?
Thanks in advance
Essentially, what you'll need to do (and what my app mentioned below does) is go over all the references/links to additional assets / images / scripts and so on, download them, and then change the HTML document to point to the local downloaded copies. Something like this, with Jsoup (see the sketch after these steps):
Find all the img elements on the page,
Get the location / URL of the image file from the src attribute of the img elements (with .attr("abs:src")),
Download all of those images to a local directory
Change each image element's src attribute value to point to the location of the downloaded image file, relative to where the main HTML file will be stored, e.g. with .attr("src", "assets/imagefilename.png").
Do this for all other assets required by the page, e.g. CSS, scripts, HTML5 video, and others. I also ran some regexes over the CSS (both linked and inline) to extract, download, and rewrite things like background-image references (see the CSS sketch further below). Web pages also have other linked things, like favicons or RSS feeds, which you might want too.
Save your Jsoup document (with the modified URLs pointing to your downloaded versions of the assets) by calling .toString() on it and writing the result to a file.
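Here is a minimal sketch of those steps, assuming Jsoup is on the classpath. It only handles img elements; the page URL, output directories, and file-naming scheme are placeholders of my own, and a real saver would repeat the loop for stylesheets, scripts, and so on:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class PageSaver {
        public static void main(String[] args) throws Exception {
            String pageUrl = "https://example.com/";       // placeholder URL
            Path outDir = Paths.get("saved-page");
            Path assetDir = outDir.resolve("assets");
            Files.createDirectories(assetDir);

            Document doc = Jsoup.connect(pageUrl).get();

            int counter = 0;
            for (Element img : doc.select("img[src]")) {
                String absUrl = img.attr("abs:src");       // resolves relative URLs
                if (absUrl.isEmpty()) continue;

                // Derive a local file name; strip any query string / fragment first.
                String clean = absUrl.split("[?#]")[0];
                int dot = clean.lastIndexOf('.');
                String ext = (dot > -1 && clean.length() - dot <= 5) ? clean.substring(dot) : "";
                String fileName = "img" + (counter++) + ext;

                // Download the image into the assets directory.
                try (InputStream in = new URL(absUrl).openStream()) {
                    Files.copy(in, assetDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
                }

                // Rewrite the src attribute to point at the local copy.
                img.attr("src", "assets/" + fileName);
            }

            // Serialize the modified document next to the downloaded assets.
            Files.write(outDir.resolve("index.html"),
                    doc.toString().getBytes(StandardCharsets.UTF_8));
        }
    }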
You can then open the local HTML file in a WebView and, assuming you have done everything right, it will display with all images and assets, even offline.
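For the CSS rewriting step mentioned above, here's a rough illustration of that kind of regex rewriting (not the actual code from my app; the url(...) pattern and the assets/ naming are simplified assumptions, and downloading the referenced files is omitted):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CssRewriter {
        // Matches url(...) values in CSS, with or without quotes.
        private static final Pattern CSS_URL =
                Pattern.compile("url\\(\\s*['\"]?([^'\")]+)['\"]?\\s*\\)");

        // Rewrites every url(...) reference to a local assets/ path.
        public static String rewrite(String css) {
            Matcher m = CSS_URL.matcher(css);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String remote = m.group(1);
                String local = "assets/" + remote.substring(remote.lastIndexOf('/') + 1);
                m.appendReplacement(out, Matcher.quoteReplacement("url(" + local + ")"));
            }
            m.appendTail(out);
            return out.toString();
        }

        public static void main(String[] args) {
            String css = "body { background: url('https://example.com/img/bg.png'); }";
            System.out.println(rewrite(css));
            // prints: body { background: url(assets/bg.png); }
        }
    }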
I actually wrote an Android app which does exactly this: it saves a complete HTML page and all of the CSS / images / other assets to a local file / directory, using Jsoup.
See https://github.com/JonasCz/SaveForOffline/ for the source, specifically SaveService.java for the actual HTML page saving / downloading code.
Beware that it's GPL licensed, so you have to comply with the GPL license if you use (parts of) it.
Also beware that it does a lot of things and is quite messy as a result (there are no comments or documentation either...), but it may help you.
You can do it with Jsoup, but IMO it's a lot of work. On the other hand, you can consider Crawler4j.
There is a tutorial on their website; have a look at the example for crawling images.
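A minimal setup sketch along the lines of their image-crawling example (class and method names per the crawler4j 4.x API as I recall it; treat this as a starting point and check their tutorial for the authoritative version):

    import edu.uci.ics.crawler4j.crawler.CrawlConfig;
    import edu.uci.ics.crawler4j.crawler.CrawlController;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.fetcher.PageFetcher;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class ImageCrawler extends WebCrawler {
        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            // Naive filter for the sketch: HTML pages plus common image types.
            String href = url.getURL().toLowerCase();
            return href.endsWith(".png") || href.endsWith(".jpg")
                    || href.endsWith(".html") || href.endsWith("/");
        }

        @Override
        public void visit(Page page) {
            // page.getContentData() holds the raw response bytes (e.g. the
            // image payload), which you could write to disk here.
            System.out.println("Fetched " + page.getWebURL().getURL()
                    + " (" + page.getContentData().length + " bytes)");
        }

        public static void main(String[] args) throws Exception {
            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder("crawl-data");
            config.setIncludeBinaryContentInCrawling(true); // required to fetch images
            PageFetcher fetcher = new PageFetcher(config);
            RobotstxtServer robots = new RobotstxtServer(new RobotstxtConfig(), fetcher);
            CrawlController controller = new CrawlController(config, fetcher, robots);
            controller.addSeed("https://example.com/");     // placeholder seed URL
            controller.start(ImageCrawler.class, 1);
        }
    }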
Technique #1:
A single index.php file which includes header.php, navigation.php, footer.php, and a content file chosen by a URL variable.
Problems #1:
You can't add individual CSS files specific to your content pages, because they have to be added in the main index.php file and may conflict with other content pages.
If you have JavaScript needed only for specific content pages, you MUST load all potentially used JS files in the index.php file. This means you unnecessarily load JS files on content pages where they aren't needed.
Technique #2:
A template.php file for each major page of the website, which includes header.php, navigation.php, and footer.php. Content is not included via a separate file; rather, the template file itself serves as the content file.
Problems #2:
Any changes made to the template have to be duplicated across every other major page manually.
I started using technique #1 until I ran into major JavaScript issues. I am now considering moving to technique #2 and just dealing with template changes as necessary.
What technique do you use and how do you solve the CSS/JS include issue?
I tend to have one CSS file for the whole website, or at most two CSS files when a single one would be very long. In that case, I define in the first CSS file the general layout of the website and the common structure shared by its pages, and in the second CSS file the layout specific to a page or to an object.
I need a way to take a web page that's already loaded in the browser and save the page's full DOM (as an HTML string), such that if I were to load the HTML offline as a single file, it would preserve the effects of all CSS and whatever scripts had run prior to saving it. Keeping the images would be a bonus, but even having them missing, with a placeholder so that the layout is preserved, is fine.
The catch is I can't reload or requery any of the resource files (JS/CSS). Fonts are not important.
This means the resulting HTML can't refer to external files. Is this even possible using just JavaScript?
EDIT:
1) This needs to be a programmatic solution using JavaScript, not a browser UI solution.
You can store the entire HTML, along with inline CSS, as a variable in JavaScript (which you write). Maybe you can write some JS which uses HTML5 local storage to store the external JS/CSS resources and use them later when loading the page offline.