HTML DOM to Download Image from <img> URI - javascript

I have created a list of all the page URIs I'd like to download an image from, for a vehicle service manual.
The images are delivered via a PHP script, as can be seen here: http://www.atfinley.com/service/index.php?cat=g2&page=32
This is probably meant to deter behavior like my own; however, every single Acura Legend owner shouldn't have to depend on a single host for their vehicle's manual.
I'd like to design a bot in JS/Java that can visit every URL I've stored in this text document, https://pastebin.com/yXdMJipq, and automate the download of the PNG available at each resource.
I'll eventually be creating a pdf of the manual, and publishing it for open and free use.
If anyone has ideas for libraries I could use, or ways to delve into the solution, please let me know. I am most fluent in Java.
I'm thinking a solution might be to fetch the HTML document at each URL and download the image referenced by the src attribute of its <img> element.

I wrote something similar but unfortunately can't find it anymore. I do remember using the jsoup Java library, which comes in pretty handy.
It includes an HTTP client, and you can run CSS selectors on the document, just like with jQuery.
This is the example from their front page:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
Creating PDFs is quite tricky, but I use Apache PDFBox for such things.

I know you asked for a JavaScript solution but I believe PHP (which you also added as a tag) is more suitable for the task. Here are some guidelines to get you started:
1. Move all the URLs into an array and create a foreach loop that will iterate over it.
2. Inside the loop, use the PHP Simple HTML DOM Parser to retrieve the image URL attribute for each page.
3. Still inside the loop, use the image URL in a cURL request to grab the file and save it into your custom folder. You can find the code required for this part here.
If this process proves to be too long and you get a PHP runtime error, consider storing the URLs generated by step 2 in a file, then using that file to generate a new array and running step 3 on it as a separate process.
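Since the asker is most fluent in Java, here is a rough Java sketch of what step 2 boils down to: find the first <img> in a page's HTML and resolve its src against the page URL. It uses only the standard library, and the regex is a deliberate shortcut that stands in for a real HTML parser such as jsoup or Simple HTML DOM. The class and method names, and the example.com URL, are made up for illustration.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcSketch {
    // Matches the quoted src attribute of the first <img> tag (single or double quotes).
    // A regex is only a rough stand-in for a real HTML parser.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the absolute URI of the first image on the page, if there is one.
    // Throws IllegalArgumentException if the src value is not a valid URI.
    static Optional<URI> firstImageUri(String pageUrl, String html) {
        Matcher m = IMG_SRC.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // Resolve a relative src against the URL the page was fetched from.
        return Optional.of(URI.create(pageUrl).resolve(m.group(2)));
    }

    public static void main(String[] args) {
        // Hypothetical page content, not taken from the real site.
        String html = "<html><body><img alt=\"page\" src=\"img.php?id=32\"></body></html>";
        System.out.println(
                firstImageUri("http://example.com/service/index.php?cat=g2&page=32", html));
    }
}
```

Fetching each page and saving the image (steps 1 and 3) would wrap around this in a loop over the URL list.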

Finished solution for grabbing the image URLs:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
public class Acura {
    public static void main(String[] args) throws IOException {
        Scanner read;
        Writer write;
        try {
            // One page URL per line
            File list = new File("F:/result.txt");
            read = new Scanner(list);
            write = new FileWriter("F:/imgurls.txt");
            double s = 0;
            while (read.hasNextLine()) {
                try {
                    s++;
                    String url = read.nextLine();
                    Document doc = Jsoup.connect(url).get();
                    Element img = doc.select("img").first();
                    if (img == null) {
                        // No <img> on this page; skip it rather than hit a NullPointerException
                        continue;
                    }
                    String imgUrl = img.absUrl("src");
                    write.write(imgUrl + "\n");
                    // 2690 is the hard-coded total; multiply by 100 so the output is a real percentage
                    System.out.println((s / 2690) * 100 + "%");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            read.close();
            write.close();
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
    }
}
Generates a nice long list of image urls in a text document.
Could have done it in a non-sequential manner, but was heavily intoxicated when I did this. However I did add a progress bar for my own peace of mind :)

        // Body of main(); additionally needs:
        // import java.io.FileOutputStream;
        // import org.jsoup.Connection.Response;
        Scanner read;
        Writer write;
        try {
            File list = new File("F:/imgurls.txt");
            read = new Scanner(list);
            double s = 0;
            while (read.hasNextLine()) {
                try {
                    s++;
                    String url = read.nextLine();
                    // ignoreContentType(true) lets Jsoup fetch non-HTML content such as images
                    Response imageResponse = Jsoup.connect(url).ignoreContentType(true).execute();
                    FileOutputStream writer = new FileOutputStream(new java.io.File("F:/Acura/" + (int) s + ".png"));
                    writer.write(imageResponse.bodyAsBytes());
                    writer.close();
                    // 2690 is the hard-coded total; multiply by 100 so the output is a real percentage
                    System.out.println((s / 2690) * 100 + "%");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            read.close();
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
    }
This worked for generating the PNGs.
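For anyone adapting this, here is a small standard-library alternative for the download step. It is a sketch, not what was actually run above. It streams each image straight to disk instead of holding it in memory via bodyAsBytes(), which also sidesteps Jsoup's response body size cap (see Connection.maxBodySize) if an image ever comes out truncated. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ImageDownloadSketch {
    // Streams whatever 'source' points at into 'target'; returns the number of bytes written.
    static long download(URI source, Path target) throws IOException {
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Progress as a percentage of the total number of items.
    static double percentDone(int done, int total) {
        return 100.0 * done / total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ImageDownloadSketch <image-url> <output-file>
        long bytes = download(URI.create(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + bytes + " bytes");
    }
}
```

The try-with-resources block closes the stream even when the copy fails partway, which the manual close() calls above do not guarantee.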

Related

How do i add sound to jgrasp code?

I found this online and manipulated it to the best of my ability but I cannot get it to work.
import java.io.*;
import sun.audio.*;
/**
 * A simple Java sound file example (i.e., Java code to play a sound file).
 * AudioStream and AudioPlayer code comes from a javaworld.com example.
 * @author alvin alexander, devdaily.com.
 */
public class SoundTest
{
    public static void main(String[] args)
    throws IOException
    {
        // open the sound file as a Java input stream
        String gongFile = "C:/Users/jd186856/Desktop/SoundTest/IMALEMON.au";
        InputStream in = new FileInputStream(gongFile);
        // create an audiostream from the inputstream
        AudioStream audioStream = new AudioStream(in);
        // play the audio clip with the audioplayer class
        AudioPlayer.player.start(audioStream);
    }
}
Here are the error codes:
Exception in thread "main" java.io.IOException: could not create audio stream from input stream
    at sun.audio.AudioStream.<init>(AudioStream.java:80)
    at SoundTest.main(SoundTest.java:23)
Thanks for any help in advance!!
I'm not able to find references to the Java class AudioStream except in a listing from 1997!
The current practice is to use an AudioInputStream. This can be found in the API for Java 7 & 8.
The Java Sound Tutorials are a difficult read but cover the current practices. See the section on "playing back audio". When importing a sound, I always use the URL, and avoid creating an InputStream as an intermediate step. The code which converts InputStreams to AudioInputStreams may apply "Mark/Reset" tests to the InputStream which can fail. Making an AudioInputStream directly from a URL avoids this test.
In fact, if AudioStream allows you to use a URL as a source, changing it to use the URL of your source file might fix your code, but I wouldn't count on it working on every Java system out there, given that many sun libraries have been deprecated. (I don't know for sure if this one has been deprecated or not, but not finding it in the current API is telling.)
Another dodge that sometimes works is to wrap the InputStream in a BufferedInputStream, as this class implements the mark and reset methods that sometimes cause errors when attempting to use an InputStream.
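A minimal sketch of both suggestions, using javax.sound.sampled. It assumes the file is in a format the installed audio providers can read (typically WAV, AU, or AIFF). The class and method names are made up for illustration, and the sleep-based wait is just the simplest way to keep the program alive while the Clip plays.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.UnsupportedAudioFileException;

class SoundSketch {
    // Preferred route: build the AudioInputStream directly from a URL.
    static AudioInputStream openFromUrl(URL url)
            throws UnsupportedAudioFileException, IOException {
        return AudioSystem.getAudioInputStream(url);
    }

    // Fallback: BufferedInputStream supports mark/reset, which the format probing relies on.
    static AudioInputStream openBuffered(InputStream raw)
            throws UnsupportedAudioFileException, IOException {
        return AudioSystem.getAudioInputStream(new BufferedInputStream(raw));
    }

    // Plays the stream through a Clip and waits roughly until playback ends.
    static void play(AudioInputStream audio)
            throws LineUnavailableException, IOException, InterruptedException {
        Clip clip = AudioSystem.getClip();
        clip.open(audio);
        clip.start();
        // Clip plays on a background thread, so keep this thread alive meanwhile.
        Thread.sleep(clip.getMicrosecondLength() / 1000);
        clip.close();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SoundSketch <path-to-sound-file>
        URL url = new File(args[0]).toURI().toURL();
        try (AudioInputStream audio = openFromUrl(url)) {
            play(audio);
        }
    }
}
```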

How can I take a screenshot of a whole web page?

I want to take a screenshot of a JSP page in the browser. I have googled a lot, and everyone points to the java.awt.Robot functionality. It is great, but what I need is a screenshot of the full web page, including the part inside the scrollable area of the browser window. Moreover, I want only the web page content, not the status bar, tabs, and menus of the browser. I used the following code:
public class ScreenCapture {
    public void TakeCapture()
    {
        try
        {
            // Capture the visible screen only (not the scrolled-out part of the page)
            Robot robot = new Robot();
            String format = "jpg";
            String fileName = "D:\\PDFTest\\PartialScreenshot." + format;
            Dimension screenSize = Toolkit.getDefaultToolkit().getScreenSize();
            Rectangle captureRect = new Rectangle(0, 0, screenSize.width , screenSize.height);
            BufferedImage screenFullImage = robot.createScreenCapture(captureRect);
            ImageIO.write(screenFullImage, format, new File(fileName));
            // Embed the saved image in a PDF
            Document document = new Document();
            String input = "D:\\PDFTest\\PartialScreenshot.jpg";
            String output = "D:\\PDFTest\\PartialScreenshot.pdf";
            try {
                FileOutputStream fos = new FileOutputStream(output);
                PdfWriter writer = PdfWriter.getInstance(document, fos);
                writer.open();
                document.open();
                document.add(Image.getInstance(input));
                document.close();
                writer.close();
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
        catch (AWTException | IOException ex) {
            System.err.println(ex);
        }
    }
    public String getTakeCapture() {
        // NOTE: as posted, this method calls itself and would recurse until a StackOverflowError
        return getTakeCapture();
    }
}
Is there a way to take a screenshot of the full JSP page as viewed in the browser (including the content inside the scrollable area)? I then have to convert this screenshot into a PDF. Please don't suggest converting directly to PDF with Flying Saucer, as it's not working in my case.
This is not possible in pure Java.
However, you can add the html2canvas library to your JSP page. You can then use JavaScript to submit the canvas image to your servlet and process it as you please.
See the following question and answer, which deal with a similar problem: How to upload a screenshot using html2canvas?
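On the Java side, one plausible shape for the "process it as you please" step is decoding what the page sends. This sketch assumes the page posts the string returned by canvas.toDataURL() (a data:image/png;base64,... URL) to the servlet. The class and method names are made up, and reading the request parameter is left out. Only the decoding is shown.

```java
import java.util.Base64;

class DataUrlSketch {
    // Turns a base64 data URL, e.g. "data:image/png;base64,iVBOR...", into raw image bytes.
    static byte[] decodeDataUrl(String dataUrl) {
        int comma = dataUrl.indexOf(',');
        if (!dataUrl.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("Not a data URL");
        }
        // Everything between "data:" and the comma is the media type plus encoding flag.
        String header = dataUrl.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("Expected a base64-encoded payload");
        }
        return Base64.getDecoder().decode(dataUrl.substring(comma + 1));
    }

    public static void main(String[] args) {
        // "AQID" is base64 for the three bytes 1, 2, 3.
        byte[] bytes = decodeDataUrl("data:image/png;base64,AQID");
        System.out.println(bytes.length + " bytes decoded");
    }
}
```

The resulting byte array can be written to a file or handed to a PDF library in place of the Robot screenshot.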

Load resources (javascript) in a web application framework (wicket) correctly?

I am using Apache Wicket as my webapp framework and I have the following structure:
What I need:
1) The javascript file Rules.js should be loaded and readable for my Custom Java Class Process.java. Because in my Process.java I have something like this:
private String readFile(String filename) throws IOException {
File file = new File("PATH");
//reads the file and returns a string
...
}
So the first question would be: What path should I use instead of "PATH" when my resource Rules.js is in the resource folder or any other package folder?
2) Process.java not only reads the file, it also manipulates my Rules.js file, for example with JSON. So the following line should be added to the Rules.js file:
var object = {JSON STRING};
I know how to do this already.
3) After the manipulation, the Rules.js file should of course be updated automatically.
So the second question would be: What else should I add to my application, and where, so that Rules.js is available to all the classes that need it and is always up to date during the session?
I have tried a lot of things already, but I am missing something here that I can't figure out...
You can create your own IResource, or even use the ContextRelativeResource if you want to keep your rules.js file in the webapp root.
In your WebApplication you have to register the resource as a shared resource:
@Override
public void init() {
    super.init();
    getSharedResources().add(RULES_RESOURCE_KEY, new ContextRelativeResource(RULES_CONTEXT_FILE));
}
public static ResourceReference getRulesResourceReference() {
    return Application.get().getSharedResources().get(Application.class, RULES_RESOURCE_KEY, null, null, null, true);
}
Here is an example method that modifies the rules.js file:
public static void changeRules() {
    try {
        // Get a context URL
        URL rules = WebApplication.get()
            .getServletContext()
            .getResource(RULES_CONTEXT_FILE);
        // Update the file
        OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(new File(rules.toURI())));
        out.write("var rules=\"" + Long.toHexString(System.currentTimeMillis()) + "\" ");
        out.close();
    } catch (Exception e) {
        throw new WicketRuntimeException(e);
    }
}
In your HomePage class, or wherever you need to update rules.js:
public HomePage(final PageParameters parameters) {
    super(parameters);
    add(new Link("changerules") {
        @Override
        public void onClick() {
            WicketApplication.changeRules();
        }
    });
}
On the page that has your form, add a JavaScriptHeaderItem:
@Override
public void renderHead(IHeaderResponse response) {
    response.render(
        JavaScriptHeaderItem.forReference(WicketApplication.getRulesResourceReference())
    );
}
The HTML to run this example if you want to test it:
<html>
<body onload="alert(rules);">
<a wicket:id="changerules" href="#">Change rules files</a>
</body>
</html>
And don't forget to create an initial version of rules.js in your webapp folder.
I think it's not a good idea to update files in the webapp folder, because the container can replace them with the initial version at redeploy time. So it's better to use your own IResource and put the file in an external folder. Check this post if you need more details.

ASP.NET Custom Server Control with lots of Embedded JavaScript and other Static Files

I am trying to create a Custom Server Control with lots of JavaScript and other static files embedded.
My problem is how to bundle and register them easily. I have the approach below, but I don't think it is a good idea to register every single JavaScript file one by one.
This is the code which I have put inside my AssemblyInfo.cs file :
[assembly: WebResource("CustomControl.Scripts.Default.js", "text/javascript")]
The following code is for my custom control to register the .js:
protected override void OnPreRender(EventArgs e) {
    base.OnPreRender(e);
    string resourceName = "CustomControl.Scripts.Default.js";
    ClientScriptManager cs = this.Page.ClientScript;
    cs.RegisterClientScriptResource(typeof(CustomControl.MyControl), resourceName);
}
Also, it would be great to be able to reach the file from the web application at a path like this:
CustomControl/scripts/default.js
In Visual Studio, you can embed resources in the assembly and then programmatically retrieve them like so:
// TODO: Get the correct assembly, this is just an example.
var assembly = GetType().Assembly;
var resourceNames = assembly.GetManifestResourceNames();
foreach (var resourceName in resourceNames)
{
    using (var stream = assembly.GetManifestResourceStream(resourceName))
    {
        using (var reader = new StreamReader(stream))
        {
            // This will contain the contents of the embedded resource
            string resource = reader.ReadToEnd();
        }
    }
}
You of course need to adjust the above code to your requirements, but the basics should be the same.

Reading Google Gears blobs with JavaScript

Does anybody know how to read Google Gears blob objects within the browser? I'm using GWT on top of Gears, but I'm open to any kind of solution. The application needs to work fully offline, so I can't post the files and process them server-side. My files are simple text files that I want to upload and parse in offline mode.
I wrote a very simple class to do this. You can check it out here:
http://procbits.com/2009/07/29/read-file-contents-blobs-in-gwt-and-gears/
It's very simple to use. Either call the method "readAllText" or you can read it line by line. Here is an example reading line by line:
try {
    Desktop dt = Factory.getInstance().createDesktop();
    dt.openFiles(new OpenFilesHandler(){
        public void onOpenFiles(OpenFilesEvent event) {
            File[] files = event.getFiles();
            File file = files[0];
            Blob data = file.getBlob();
            BlobReader br = new BlobReader(data);
            while (!br.endOfBlob())
                Window.alert(br.readLine());
        }
    }, true);
} catch (Exception ex){
    Window.alert(ex.toString());
}
I hope this helps!
Have you looked at the Google Gears API documentation (for JavaScript)?
