How to scrape images from a site using javascript?


I am attempting to write a script in javascript to scrape images from a site and save them to my computer.
I have managed to make the script isolate the image tag that contains the image I want using jQuery. So I have a jquery selection:
<img src="sourceofimage.com/path/img">
My question is how can I now save this image to my computer?
I tried searching but all the results I got were about doing things like making a download button or other user facing tasks. To be clear, I will be the only one running this script and it will be run by pasting it into the console.
I only want a way to programmatically download the image and set its filename once jQuery has isolated it. Is this possible?
Edit: Can somebody kindly explain why this is receiving so many downvotes?

This would apply to every single image on your page which is the direct child of an anchor, but you could use:
$('a > img').each(function(){
    var $this = $(this);
    // point the parent anchor at the image source
    $this.parent('a').attr('href', $this.attr('src'));
});
But it would do the job.
The only catch is that users with JS disabled will see an anchor with an empty href. The following achieves the same end result, with the added benefit of cleaner HTML and graceful degradation:
<img src="folio/1.jpg" class="downloadable" />
$('img.downloadable').each(function(){
    var $this = $(this);
    // wrap each image in a download link that points at its own source
    $this.wrap('<a href="' + $this.attr('src') + '" download />');
});
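Since the question asks for a console script rather than a clickable link, here is a minimal sketch of one way to trigger the save programmatically. It assumes a modern browser and that the image host allows the request (same origin or CORS enabled). The names filenameFromUrl and saveImage are made up for illustration, and the browser still decides where the file lands (normally the downloads folder).

```javascript
// Derive a file name from an image URL; falls back to a default name.
function filenameFromUrl(url, fallback) {
  const path = url.split(/[?#]/)[0];
  const name = path.substring(path.lastIndexOf('/') + 1);
  return name || fallback;
}

// Browser-only: fetch the image as a Blob and click a temporary
// <a download> link so the browser saves it under the chosen name.
async function saveImage(url, filename) {
  const response = await fetch(url);
  if (!response.ok) throw new Error('HTTP ' + response.status);
  const blob = await response.blob();
  const objectUrl = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = objectUrl;
  a.download = filename || filenameFromUrl(url, 'image');
  document.body.appendChild(a);
  a.click();
  a.remove();
  URL.revokeObjectURL(objectUrl);
}
```

With the jQuery selection from the question, a call might look like saveImage($('img').first().prop('src'), 'myimage.jpg'). If the fetch is blocked by CORS, this approach will not work from the console.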

Try the fs library
var fs = require('fs'); // Node.js only; not available in a browser
fs.writeFile('logo.png', imagedata, 'binary', function(err){
    if (err) throw err;
    console.log('File saved.');
});

Within a web browser it basically can't be done: you can't write directly to the file system (it may be possible with browser extensions, but I haven't looked at this in a while).
Using node there's nothing stopping you doing something like:
Use http to retrieve the HTML
Use jQuery to parse the html - something like $(html).find('img');
Generate a http request to each image to download them
Save it to disk using fs
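The steps above could be sketched roughly as follows in Node. This is only an illustration: a simple regex stands in for the jQuery parsing step (it will miss unusual markup), and extractImageUrls and downloadImage are hypothetical helper names, not part of any library.

```javascript
// Node.js sketch (not for the browser).
const fs = require('fs');
const http = require('http');
const https = require('https');

// Collect the src of every <img> tag and resolve it against the page URL.
// Assumption: a regex is good enough here; a real HTML parser is more robust.
function extractImageUrls(html, pageUrl) {
  const urls = [];
  const imgTag = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = imgTag.exec(html)) !== null) {
    urls.push(new URL(match[1], pageUrl).href);
  }
  return urls;
}

// Request one image and stream it to disk under the given file name.
function downloadImage(url, filename, done) {
  const client = url.startsWith('https:') ? https : http;
  client.get(url, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body
      return done(new Error('Unexpected status ' + res.statusCode));
    }
    const file = fs.createWriteStream(filename);
    res.pipe(file);
    file.on('finish', function () { file.close(done); });
    file.on('error', done);
  }).on('error', done);
}
```

Once the page HTML has been fetched, you would loop over extractImageUrls(html, pageUrl) and call downloadImage for each entry with whatever file name you want.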

Related

Get real path instead of 'fakepath' in file upload

I face the following problem:
When a user uploads a file with the HTML file input, I want to receive the file path itself, but I only get C:/fakepath/filename.txt, for example.
I understand that browsers hide the exact path of the file for security reasons. So I was wondering whether it is possible at all, with some hack, some way in .NET, or an additional jQuery/JS plugin, to get the full path of the file.
Why?
We don't want to upload the file itself to our server filesystem or to the database. We just want to store the local path in the database, so that when the same user opens the site, he can click on that path and his local file system opens.
Any suggestions or recommendations for this approach?
If this is really not possible like
How to resolve the C:\fakepath?
How To Get Real Path Of A File Using Jquery
we would need to come up with a different idea, I guess. But since some of those answers are really old, I thought maybe there is a solution by now. Thanks, everyone.

My goal was to make the uploaded file name visible to the end user and then send it via the PHP mail() function, so all I did to resolve this was:
in your js file
Old function:
var fileuploadinit = function(){
$('#career_resume').change(function(){
var pathwithfilename = $('#career_resume').val();
$('.uploadedfile').html("Uploaded File Name :" + pathwithfilename).css({
'display':'block'
});
});
};
Corrected function:
var fileuploadinit = function(){
$('#career_resume').change(function(){
var pathwithfilename = $('#career_resume').val();
var filename = pathwithfilename.substring(12);
$('.uploadedfile').html("Uploaded File Name :" + filename).css({
'display':'block'
});
});
};
$(document).ready(function () {
fileuploadinit();
});
Old result:
Uploaded File Name :C:\fakepath\Coverpage.pdf
New result:
Uploaded File Name :Coverpage.pdf
Hope it helps :)
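The substring(12) call above relies on the prefix always being exactly "C:\fakepath\". A slightly more defensive sketch, assuming you only need the file name, is to split on path separators. The helper name baseName is made up for this example.

```javascript
// Return only the last path segment, whichever separator the browser used.
function baseName(fakePath) {
  return fakePath.split(/[\\/]/).pop();
}
```

In browsers that support the File API, the name is also available directly as input.files[0].name (for example $('#career_resume')[0].files[0].name), which avoids parsing the fake path at all. Neither approach reveals the real local path.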

You can't do it.
And if you find a way, it's a big security vulnerability that the browser manufacturer will fix once it is discovered.

You'll need your own code running outside browser-box to do this, since browsers are designed NOT to allow this.
I mean something ugly like ActiveX, Flash, a COM object, a custom browser extension, or some other fancy security breach that can open its own OpenFileDialog and insert that value into your input field.

How to Download multiple files from javascript

I am trying to use window.location.href in a loop to download multiple files.
I have a table in which I can select files; I then loop over the selected rows and
try to navigate to each file path to download the files.
Only the last file ever downloads.
I think it's because the location href only takes effect after my JavaScript finishes, not as the code runs.
When I put a breakpoint on the window.location.href line, it still only downloads the last file, and only once I let the code run through.
Is there a better way to initiate multiple downloads from a JavaScript loop?
$("#btnDownload").click(function () {
var table = $('#DocuTable').DataTable();
var rows_selected = table.rows('.selected').data();
$.each(rows_selected, function (i, v) {
window.location.href = v.FilePath;
});
});

Some browsers (at least Google Chrome) support the following:
$("<a download/>").attr("href", "https://code.jquery.com/jquery-3.1.0.min.js").get(0).click();
$("<a download/>").attr("href", "https://code.jquery.com/jquery-3.1.0.min.js").get(0).click();
$("<a download/>").attr("href", "https://code.jquery.com/jquery-3.1.0.min.js").get(0).click();
JSFiddle: https://jsfiddle.net/padk08zc/

I would make use of iframes and a script to force the download of the files as Joe Enos and cmizzi have suggested.
The answer here will help with JavaScript for opening multiple iframes for each file:
Download multiple files with a single action
The answers for popular languages will help with forcing downloads if the URL is actually something that can be served correctly over the web:
PHP: How to force file download with PHP
.Net: Force download of a file on web server - ASP .NET C#
NodeJS: Download a file from NodeJS Server using Express
Ruby: Force browser to download file instead of opening it
Ensure you change the links to point to your download script and also make sure you add the appropriate security checks. You wouldn't want to allow anyone to abuse your script.

This looks like an old post, but I stumbled on it while trying to solve a similar issue, so here is a solution that might help. I was able to download the files, but not in the same tab. You can replace the event handler with the download function below. Here urls is an array of presigned S3 URLs.
The entire code looks like below:
download(urls: any) {
var self = this;
var url = urls.pop();
setTimeout(function(){
var a = document.createElement('a');
a.setAttribute('href', url);
document.body.appendChild(a);
a.setAttribute('download', '');
a.setAttribute('target', '_blank');
a.click();
// a.remove();
}, 1000)
}
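As written, the function above handles one URL per call. A rough sketch of how the same idea could cover a whole array, spacing the clicks out so the browser does not drop them, might look like this. The names planDownloads, triggerDownload and downloadAll are hypothetical, and some browsers will still ask the user to allow multiple downloads.

```javascript
// Pair each URL with the delay (in ms) after which it should be triggered.
function planDownloads(urls, gapMs) {
  return urls.map(function (url, i) {
    return { url: url, delay: i * gapMs };
  });
}

// Browser-only: click a temporary <a download> link for one URL.
function triggerDownload(url) {
  var a = document.createElement('a');
  a.href = url;
  a.setAttribute('download', '');
  document.body.appendChild(a);
  a.click();
  a.remove();
}

// Start every download, spaced gapMs apart.
function downloadAll(urls, gapMs) {
  planDownloads(urls, gapMs).forEach(function (item) {
    setTimeout(function () { triggerDownload(item.url); }, item.delay);
  });
}
```

Calling downloadAll(urls, 1000) would mirror the one-second delay used above. Note that the download attribute is ignored for cross-origin URLs unless the server sends a Content-Disposition header.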

HTML 5 anchor tag download incomplete file?

I am using angular and ASP.NET Web API to allow users to download files that are generated on the server.
HTML Markup for download link:
<img src="/content/images/table_excel.png">
<a ng-click="exportToExcel(report.Id)">Excel Model</a>
<a id="report_{{report.Id}}" target="_self"></a>
The last anchor tag is there to serve as a place holder for an automatic click event. The visible anchor calls the exportToExcel method to initiate the call to the server and begin creating the file.
$scope.exportToExcel = function(reportId) {
reportService.excelExport(reportId, function (result) {
var url = "/files/report_" + reportId + "/" + result.data.Model.fileName;
var dLink = document.getElementById("report_" + reportId);
dLink.href = url;
dLink.setAttribute('download', result.data.Model.fileName);
dLink.click();
});
}
The Web API code creates an Excel file. The file, on the server is about 279k, but when it is downloaded on the client it is only 7k. My first thought was that the automatic click might be happening before the file is completely written. So, I added a 10 second $timeout around the click event as a test. It failed with the same result.
This seems to only be happening on our remote QA server. On my local development server I always get the entire file back. I am at a loss as to why this might be happening. We have similar functionality where files are constructed from a database blob and saved to the local disk for download. The same method is employed for the client side download and that seems to work fine. I am wondering if anyone else has run into a similar issue.
Update
After the comment by SilentTremmor, we think it may actually be IIS or some sort of server issue. Originally we didn't think it could be, but after some digging it looks possible. It seems the instance of the client code is only allowing 7k of data to be downloaded. No matter what we try to download, the result is always the same.

It turns out the API application was writing the file to a different instance of our application. The client code had no idea and was trying to download a file that did not exist. So, when the download link was creating the file it was empty, thus the small file size.

Is it possible to retrieve text files from HTML app directory without HTTP request or <input>?

I'm working on an HTML/javascript app intended to be run locally.
When dealing with img tags, it is possible to set the src attribute to a file name with a relative path and thereby quickly and easily load an image from the app's directory. I would like to use a similar method to retrieve a text file from the app's directory.
I have used TideSDK, but it is less lightweight. And I am aware of HTTP requests, but if I remember correctly only Firefox has taken kindly to my use of this for local file access (although accessing local images with src does not appear to be an issue). I am also aware of the FileReader object; however, my interface requires that I load a file based on the file name and not based on a file-browser selection as with <input type="file">.
Is there some way of accomplishing this type of file access, or am I stuck with the methods mentioned above?

The browser will not permit you to access files like that but you can make javascript files instead of text files like this:
text1.js:
document.write('This is the text I want to show in here.'); //this is the content of the javascript file
Now call it anywhere you like:
<script type="text/javascript" src="text1.js"></script>
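A variation on the same trick, sketched under the assumption that you want the text available as data rather than written straight into the page: have each .js file store its text under a name, then look it up from your main script. APP_TEXTS and getText are names invented for this example.

```javascript
// text1.js (hypothetical file): register the text under a name
// instead of writing it into the document.
var APP_TEXTS = APP_TEXTS || {};
APP_TEXTS.text1 = 'This is the text I want to show in here.';

// Main script: look the text up by name; returns null if it was never loaded.
function getText(name) {
  return Object.prototype.hasOwnProperty.call(APP_TEXTS, name)
    ? APP_TEXTS[name]
    : null;
}
```

The file is still loaded with a script tag as shown above, so this works from the local file system without an HTTP request.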

There are too many security issues (restrictions) within browsers making many local web-apps impossible to implement so my solution to a similar problem was to move out of browsers and into node-webkit which combines Chromium + Node.js + your scripts, into an executable with full disk I/O.
http://nwjs.io/

[edit] I'm sorry I thought you wanted to do this with TideSDK, I'll let my answer in case you want to give another try to TideSDK [/edit]
I'm not sure if it's what you're looking for but I will try to explain my case.
I have an application which allows the user to save the state of his progress. To do this, I let him select a folder, enter a filename, and write the file. When the user opens the app, he can open the saved file and get his progress back. So I assume this is similar to what you are looking for.
In my case, I use the native File Select to allow the user to select a specific save (I'm using CoffeeScript) :
Ti.UI.currentWindow.openFileChooserDialog(_fileSelected, {
title: 'Select a file'
path: Ti.Filesystem.getDocumentsDirectory().nativePath()
multiple: false
})
(related doc http://tidesdk.multipart.net/docs/user-dev/generated/#!/api/Ti.UI.UserWindow-method-openFileChooserDialog)
When this step is done I will open the selected file :
if !filePath?
fileToLoad = Ti.Filesystem.getFile(scope.fileSelected.nativePath())
else
fileToLoad = Ti.Filesystem.getFile(filePath)
data = Ti.JSON.parse(fileToLoad.read())
(related doc http://tidesdk.multipart.net/docs/user-dev/generated/#!/api/Ti.Filesystem)
Please note that those snippets are copied from my project and will not work without the rest of my code, but I think they are enough to illustrate how I open a file and read its content.
In this case I'm using Ti.JSON.parse because these files only contain JavaScript objects, but in your case you can just get the content. The openFileChooserDialog isn't mandatory: if you already know the file name, or get it some other way, you can use Ti.Filesystem in your own way.

How can I locally save an .html file generated by javascript (running on a local .html page)?

So I've been researching this for a couple days and haven't come up with anything conclusive. I'm trying to create a (very) rudimentary liveblogging setup because I don't want to pay for something like CoverItLive. My process is: Local HTML file > Cloud storage (Dropbox/Drive/etc) > iframe on content page. All that works, and with some CSS even looks pretty nice despite the less-than-awesome approach. But here's the thing: the liveblog itself is made up of an HTML table, and I have to manually copy/paste the code for a new row, fill in the timestamp, write the new message, and save the document (which then syncs with the cloud and shows up in the iframe). To simplify the process I've made another HTML file which I intend to run locally and use to add entries to the table automatically. At the moment it's just a bunch of input boxes and some javascript to automate the timestamp and write the table row from the input data.
Code, as it stands now: http://jsfiddle.net/LukeLC/999bH/
What I'm looking to do from here is find a way to somehow export the generated table data to another .html file on my hard drive. So far I've managed to get this code...
if(document.documentElement && document.documentElement.innerHTML){
var a=document.getElementById("tblive").innerHTML;
a=a.replace(/</g,'&lt;');
var w=window.open();
w.document.open();
w.document.write('<pre><tblive>\n'+a+'\n</tblive></pre>');
w.document.close();
}
}
...to open just the generated table code in a new window, and sure, I can save the source from there, but the whole point is to eliminate steps like that from the process.
How can I tell the page to save the generated code to a separate .html file when I click on the 'submit' button? Again, all of this happens locally, not on a server.
I'm not very good with javascript--and maybe a different language will be necessary--but any help is much appreciated.

I suppose you could do something like this:
var myHTMLDoc = "<html><head><title>mydoc</title></head><body>This is a test page</body></html>";
var uri = "data:application/octet-stream;base64,"+btoa(myHTMLDoc);
document.location = uri;
BTW, btoa might not be cross-browser; I think modern browsers all have it, but older versions of IE don't. AFAIK base64 isn't even needed. You might be able to get away with
var uri = "data:application/octet-stream,"+myHTMLDoc;
The drawback is that you can't set the filename when it gets saved.
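One possible way around the filename limitation, sketched here under the assumption that the browser supports the download attribute on links, is to put the data URI on a temporary anchor instead of navigating to it. The helper names buildDataUri and saveHtml are invented for the example.

```javascript
// Build a data URI for an HTML string; encodeURIComponent avoids needing btoa.
function buildDataUri(html) {
  return 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
}

// Browser-only: offer the HTML as a download under the given file name.
function saveHtml(html, filename) {
  var a = document.createElement('a');
  a.href = buildDataUri(html);
  a.download = filename;
  document.body.appendChild(a);
  a.click();
  a.remove();
}
```

For instance, saveHtml(myHTMLDoc, 'liveblog.html') would suggest that name to the browser. The file still goes wherever the browser saves downloads; the script cannot pick the folder.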

You can't do this with JavaScript alone, but you can use an HTML5 link to open the save dialogue:
<a href="pageToDownload.html" download>Download</a>
You could add some smarts to automate it on the processed page after the POST.
fiddle : http://jsfiddle.net/ghQ9M/

Simple answer: you can't.
JavaScript is restricted from performing such operations for security reasons.
The best way to accomplish this would be to call a server page that writes the new file on the server: from JavaScript, send a POST request to the server page, passing the data you want written to the new file.
If you want the user to save the page to their own file system, that is a different problem. The best approach there would be to ask the user to save the page; that page could be the new window you are already opening with window.open().
Here is a demonstration:
//assuming you know jquery or are willing to use it :)
var html = $("#tblive").html().replace(/</g, '&lt;');
//generating your download button
$.post('generate_page.php', { content: html })
.done(function( data ) {
var filename = data;
//inject some html to allow user to navigate to the new page (example)
$('#tblive').parent().append(
'<a href="' + filename + '">Check your Dynamic Page!</a>');
// you data here, is the response from the server so you can return
// your new dynamic page file name here.
// and maybe to some window.location="new page";
});
On the server side, something like this:
<?php
if($_REQUEST["content"]){
$pagename = uniqid("page_", true) . '.html';
file_put_contents($pagename, $_REQUEST["content"]);
echo $pagename;
}
?>
Some notes, I haven't tested the example, but it works in theory.
I assume that with this the effort to implement it should be minimal, assuming this solves your problem.

A server based solution:
You'll need to set up a server (or your PC) to serve your HTML page with headers that tell your browser to download the page instead of processing the HTML markup. If you want to do this on your local machine, you can use software such as WAMP (or MAMP for Mac or LAMP for Linux) that is basically a web server in a .exe. It's a lot of hassle but it'll work.

We Keep Coding

JavaScript is the programming language of the Web.
