Download a 'data:' image/file using puppeteer and node.js - javascript

I'm trying to download an image using node.js and puppeteer but I'm running into some issues. I'm using a webscraper to gather the links of the images from the site and then using the https/http package to download the image.
This works for the images using http and https sources but some images have links that look like this (the whole link is very long so I cut the rest):
........
I'm not sure how to handle these links or how to download the image. Any help would be appreciated.

The part of a data: URL after the comma is the image itself, base64-encoded, so you decode it with the Node.js Buffer rather than fetching anything.
// the "data:image/png;base64," prefix has to be removed first
const data = 'iVBORw0KGgoAAAANSUhEUgAAAw8AAADGCAYAAACU07w3AAAZuUlEQVR4Ae3df4yU930n8Pcslu1I1PU17okdO1cLrTD+g8rNcvRyti6247K5NG5S5HOl5hA2uZ7du6RJEGYPTFy1Nv4RUJy0cWVkeQ9ErqqriHNrR8niZuVIbntBS886rBZWCGHVsNEFRQ5BloPCzGn2B+yzZMLyaP';
// Buffer.from(..., 'base64') decodes; new Buffer(...) is deprecated and toString('base64') would encode again
const buffer = Buffer.from(data, 'base64');
// buffer now holds the raw image bytes, ready to be written to a file; there is no URL left to fetch

These are base64-encoded images (mostly used for icons and small images).
You can ignore them.
if(url.startsWith('data:')){
//base 64 image
} else{
// an image url
}
If you really want to handle the base64 images, here is a workaround.
import fs from 'fs';
import { parseDataURI } from 'dauria';
import mimeTypes from 'mime-types';
// `file` is the complete data: URL string
const fileContent = parseDataURI(file);
// you probably need an extension for that image.
const ext = mimeTypes.extension(fileContent.MIME) || 'bin';
fs.writeFile('a random file' + '.' + ext, fileContent.buffer, function (err) {
  if (err) console.error(err); // report a failed write
});
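For reference, the same decoding can be sketched with nothing but Node's built-in modules. The helper names `decodeDataUrl` and `saveDataUrl` are hypothetical, and the extension guess from the MIME subtype is deliberately naive; treat this as an illustration rather than a complete data: URL parser.

```javascript
const fs = require('fs');

// Hypothetical helper: split a data: URL into its MIME type and decoded bytes.
function decodeDataUrl(dataUrl) {
  if (!dataUrl.startsWith('data:')) {
    throw new Error('not a data: URL');
  }
  const comma = dataUrl.indexOf(',');
  if (comma === -1) {
    throw new Error('malformed data: URL (no comma)');
  }
  const header = dataUrl.slice(5, comma); // e.g. "image/png;base64"
  const payload = dataUrl.slice(comma + 1);
  const isBase64 = /;base64$/i.test(header);
  const mime = header.split(';')[0] || 'text/plain';
  const bytes = isBase64
    ? Buffer.from(payload, 'base64')
    : Buffer.from(decodeURIComponent(payload), 'utf8'); // assumes percent-encoded UTF-8 text
  return { mime, bytes };
}

// Hypothetical helper: write the decoded bytes to "<baseName>.<ext>".
function saveDataUrl(dataUrl, baseName) {
  const { mime, bytes } = decodeDataUrl(dataUrl);
  // Naive extension guess from the MIME subtype, e.g. image/png -> png.
  const ext = (mime.split('/')[1] || 'bin').replace(/\+.*$/, '');
  const target = `${baseName}.${ext}`;
  fs.writeFileSync(target, bytes);
  return target;
}
```

In the scraper, a link that starts with `data:` would go through `saveDataUrl(link, 'image-1')`, while http and https links keep using the existing download code.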

Related

How to save to disk a file of unknown content type, when receiving it as a binary stream in a string. (All my attempts corrupt the file)

I receive a string from an API that is a binary stream and a filename, here is an example (Shortened for brevity).
{ filename: '002-000017.pdf' body: '%PDF-1.7\n\n4 0 obj\n(Identity)\nendobj\n5 0 obj\n(Adobe)\nendobj\n8 0 obj\n<<\n/Filter /FlateDecode\n/Length 181864\n/Length1 542744\n/Type /Stream\n>>\nstream\nx��}\t`�������}dw����~��\t9 #H6'\t�9 e.t.c ' }
Other files simply have \u001c\u0004N�v�$$\u0010$�\u0000
In the above example it is a PDF file, but I don't know in advance what content type the file will be; it could be a .docx, a .txt, or a file with no extension at all. The name will not always have an extension either. I take this file, convert it to a Blob, attach it to an anchor, and click the anchor.
I have tried specifying no type, 'application/unknown' and 'application/octet-stream'. I even tried hardcoding it to 'application/pdf' for the above sample, just to see if that would work. In every scenario the PDF is blank (or unreadable with other file types; a .docx, for example, crashes Word when opened).
The exception is .txt: any .txt file works perfectly.
I have confirmed manually that the string is convertible and readable; I did this with PowerShell without issue. So I'm doing something wrong in my client-side JavaScript code. What am I missing here?
Code:
const { filename } = response
const blob = new Blob([response.body])
const blobURL = window.URL.createObjectURL(blob);
// Using React Ref to handle the anchor.
downloadAnchor!.current!.href = blobURL;
downloadAnchor!.current!.download = filename; // use the variable, not the literal string 'filename'
downloadAnchor!.current!.click();
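One possible explanation, assuming the binary body was decoded as UTF-8 text somewhere before it reached this code (the "�" characters in the sample suggest this), is that the invalid bytes were already replaced and cannot be recovered. The sketch below only demonstrates that effect on four made-up bytes; it is not taken from the API in question.

```javascript
// Bytes that are not valid UTF-8, as is typical inside a compressed PDF stream.
const original = Uint8Array.of(0x78, 0x9c, 0xff, 0xfe);

// Decoding them as text replaces every invalid byte with U+FFFD (shown as "�").
const asText = new TextDecoder('utf-8').decode(original);

// Re-encoding the text as UTF-8, which is what new Blob([string]) does,
// cannot restore the original bytes.
const roundTripped = new TextEncoder().encode(asText);

console.log(asText.includes('\uFFFD'));            // true
console.log(original.length, roundTripped.length); // 4 10
```

Plain ASCII text survives this round trip unchanged, which would match the observation that only .txt files work. If this is the cause, the bytes have to stay binary end to end, for example by receiving the body as an ArrayBuffer, a Blob, or a base64 string, before the Blob is built.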

Tampermonkey To open multiple javascript in href in new tab [duplicate]

Over the years I have saved lots of photos on Snapchat that I would now like to retrieve. They do not make it easy to export, but luckily you can go online and request all of your data.
I can see the download link for each of my photos, and if I click Download in the local HTML file the download starts.
Here's the tricky part: I have around 15,000 downloads to do, and clicking each one manually would take ages. I've tried extracting all of the links from the download buttons, which gives me lots of URLs, but if you paste one of them into the browser you get "Error: HTTP method GET is not supported by this URL".
I've tried a multitude of Chrome extensions and none of them show the actual download, just the HTML which is on the left-hand side.
The download button is a clickable link (an a element with an href) that just starts the download in the tab.
I'm trying to figure out the best way to bulk download each of these individual files.
I looked at their code while downloading my own memories. They use a custom JavaScript function to download your data (a POST request with IDs in the body).
You can replicate this request, but you can also just use their method.
Open your console and use downloadMemories(<url>)
Or if you don't have the urls you can retrieve them yourself:
var links = document.getElementsByTagName("table")[0].getElementsByTagName("a");
eval(links[0].href);
UPDATE
I made a script for this:
https://github.com/ToTheMax/Snapchat-All-Memories-Downloader
Using the .json file you can download them one by one with python:
import requests
req = requests.post(url, allow_redirects=True)  # the POST returns the real file URL as text
response = req.text
file = requests.get(response)
Then get the correct extension and the date:
day = date.split(" ")[0]
time = date.split(" ")[1].replace(':', '-')
filename = f'memories/{day}_{time}.mp4' if type == 'VIDEO' else f'memories/{day}_{time}.jpg'
And then write it to file:
with open(filename, 'wb') as f:
    f.write(file.content)
I've made a bot to download all memories.
You can download it here
It doesn't require any additional installation, just place the memories_history.json file in the same directory and run it. It skips the files that have already been downloaded.
Short answer
Download a desktop application that automates this process.
Visit downloadmysnapchatmemories.com to download the app. You can watch this tutorial guiding you through the entire process.
In short, the app reads the memories_history.json file provided by Snapchat and downloads each of the memories to your computer.
App source code
Long answer (How the app described above works)
We can iterate over each of the memories within the memories_history.json file found in your data download from Snapchat.
For each memory, we make a POST request to the URL stored as the memories Download Link. The response will be a URL to the file itself.
Then, we can make a GET request to the returned URL to retrieve the file.
Example
Here is a simplified example of fetching and downloading a single memory using NodeJS:
Let's say we have the following memory stored in fakeMemory.json:
{
"Date": "2022-01-26 12:00:00 UTC",
"Media Type": "Image",
"Download Link": "https://app.snapchat.com/..."
}
We can do the following:
// import required libraries
const fetch = require('node-fetch'); // Needed for making fetch requests (v2, which supports require)
const fs = require('fs'); // Needed for writing to filesystem
// await is not allowed at the top level of a CommonJS script, so wrap the steps in an async function
async function downloadMemory() {
  const memory = JSON.parse(fs.readFileSync('fakeMemory.json'));
  const response = await fetch(memory['Download Link'], { method: 'POST' });
  const url = await response.text(); // returns URL to file
  // We can now use the `url` to download the file.
  const download = await fetch(url, { method: 'GET' });
  const fileName = 'memory.jpg'; // file name we want this saved as
  const fileData = download.body; // contents of the file
  // Write the contents of the file to this computer using Node's file system
  const fileStream = fs.createWriteStream(fileName);
  fileData.pipe(fileStream);
  fileStream.on('finish', () => {
    console.log('memory successfully downloaded as memory.jpg');
  });
}
downloadMemory().catch(console.error);
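The single-memory example extends naturally to the whole file. The sketch below is an illustration under assumptions, not the app's actual code: `memoryFileName` and `downloadAll` are hypothetical names, the file naming mirrors the day_time scheme of the Python answer, and the default `fetchImpl` relies on the global `fetch` that ships with Node 18 and later.

```javascript
const fs = require('fs');

// Build a file name such as "2022-01-26_12-00-00.jpg" from one memory entry.
function memoryFileName(memory) {
  const [day, time] = memory['Date'].split(' ');
  const ext = memory['Media Type'].toUpperCase() === 'VIDEO' ? 'mp4' : 'jpg';
  return `${day}_${time.replace(/:/g, '-')}.${ext}`;
}

// Download every memory in sequence, skipping files that already exist.
// `fetchImpl` defaults to the global fetch of Node 18+ (an assumption).
async function downloadAll(memories, outDir, fetchImpl = fetch) {
  fs.mkdirSync(outDir, { recursive: true });
  for (const memory of memories) {
    const target = `${outDir}/${memoryFileName(memory)}`;
    if (fs.existsSync(target)) continue;
    // Step 1: POST to the download link; the response body is the real file URL.
    const linkResponse = await fetchImpl(memory['Download Link'], { method: 'POST' });
    const fileUrl = await linkResponse.text();
    // Step 2: GET the file itself and write its bytes to disk.
    const fileResponse = await fetchImpl(fileUrl);
    fs.writeFileSync(target, Buffer.from(await fileResponse.arrayBuffer()));
  }
}
```

It would be called with the array of memory entries read from memories_history.json (the exact key that holds the array should be checked in your own export) and an output directory such as 'memories'. Downloading one file at a time keeps the load on the server low; two memories saved in the same second would collide under this naming scheme, so a counter suffix may be needed.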

Convert HEIC to JPG , using php or JS

Has anyone tried to convert a HEIC to JPG?
I looked at the official repository, but I didn't understand how it works.
All the examples in the repository work. But when I try to process my own photo, taken on an iPhone, the script refuses to process it.
I've had some luck recently with the conversion using libheif. So I made this library which should greatly simplify the whole process
https://github.com/alexcorvi/heic2any
The only caveat is that the resulting PNG/JPG doesn't retain any of the meta-data that were in the original HEIC.
I managed to convert heic to jpg with the help of heic2any js library (https://github.com/alexcorvi/heic2any/blob/master/docs/getting-started.md)
I converted the picture on the client side, then assigned the result back to the file input, also on the client side.
The server sees it as if it had originally been uploaded as a JPG.
function convertHeicToJpg(input)
{
    var fileName = $(input).val();
    // compare case-insensitively, since the extension may be upper case (e.g. IMG_0001.HEIC)
    var fileNameExt = fileName.slice(fileName.lastIndexOf('.') + 1).toLowerCase();
    if (fileNameExt == "heic") {
        var blob = $(input)[0].files[0]; //ev.target.files[0];
        heic2any({
            blob: blob,
            toType: "image/jpeg", // "image/jpg" is not a registered MIME type
        })
            .then(function (resultBlob) {
                var url = URL.createObjectURL(resultBlob);
                $(input).parent().find(".upload-file").css("background-image", "url("+url+")"); //previewing the uploaded picture
                //adding converted picture to the original <input type="file">
                let fileInputElement = $(input)[0];
                let container = new DataTransfer();
                let file = new File([resultBlob], "heic"+".jpg",{type:"image/jpeg", lastModified:new Date().getTime()});
                container.items.add(file);
                fileInputElement.files = container.files;
                console.log("added");
            })
            .catch(function (x) {
                console.log(x.code);
                console.log(x.message);
            });
    }
}
$("#input").change(function() {
    convertHeicToJpg(this);
});
What I am doing is converting the heic picture to jpg, then previewing it.
After that I add it to the original input. Server side will consider it as an uploaded jpg.
Some delay can appear while converting, therefore I placed a loader gif while uploading.

encodeURI file download - crashing browser

I created a web application to clean up CSV/TSV data. The app allows me to upload a CSV file, read it, fix data, and then download a new CSV file with the correct data. One challenge I have run into is downloading files with more than ~ 2500 lines. The browser crashes with the following error message:
"Aw, Snap! Something went wrong while displaying this webpage..."
To work around this I have changed the programming to download multiple CSV files not exceeding 2500 lines until all the data is downloaded. I would then put together the downloaded CSV files into a final file. That's not the solution I am looking for. Working with files of well over 100,000 lines, I need to download all contents in 1 file, and not 40. I also need a front-end solution.
Following is the code for downloading the CSV file. I am creating a hidden link, encoding the contents of data array (each element has 1000 lines) and creating the path for the hidden link. I then trigger a click on the link to start the download.
var startDownload = function (data){
var hiddenElement = document.createElement('a');
var path = 'data:attachment/tsv,';
for (i=0;i<data.length;i++){
path += encodeURI(data[i]);
}
hiddenElement.href = path;
hiddenElement.target = '_blank';
hiddenElement.download = 'result.tsv';
hiddenElement.click();
}
In my case the above process works for ~ 2500 lines at a time. If I attempt to download bigger files, the browser crashes. What am I doing wrong, and how can I download bigger files without crashing the browser? The file that is crashing the browser has (12,000 rows by 48 columns)
p.s. I am doing all of this in Google Chrome, which allows for file upload. So the solution should work in Chrome.
I've experienced this problem before and the solution I found was to use Blobs to download the CSV. Essentially, you turn the csv data into a Blob, then use the URL API to create a URL to use in the link, eg:
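// `data` is the question's array of chunks, which Blob accepts directly (wrap a single string as [data])
var blob = new Blob(data, { type: 'text/csv' });
var hiddenElement = document.createElement('a');
hiddenElement.href = window.URL.createObjectURL(blob);
hiddenElement.download = 'result.tsv';
hiddenElement.click();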
Blobs aren't supported in IE9, but if you just need Chrome support you should be fine.
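A slightly fuller sketch of the Blob approach follows. The function names `chunksToBlob` and `downloadBlob` are hypothetical, and the zero-delay revocation of the object URL is a common convention rather than a guarantee. The point is that the chunks are never concatenated into one huge URI-encoded string, which is the part that grows with the row count in the question's code.

```javascript
// Build one Blob from many text chunks without concatenating or URI-encoding them.
function chunksToBlob(chunks, mimeType) {
  return new Blob(chunks, { type: mimeType });
}

// Browser-only: trigger a download of the Blob through a temporary object URL.
function downloadBlob(blob, fileName) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = fileName;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

With the question's data this would be `downloadBlob(chunksToBlob(data, 'text/tab-separated-values'), 'result.tsv')`.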
I faced the same problem. I used the code below and it works fine; you can try it too.
if (window.navigator.msSaveBlob) {
    window.navigator.msSaveBlob(new Blob([base64toBlob($.base64.encode(excelFile), 'text/csv')]),'data.csv');
} else {
    var link = document.createElement('a');
    link.download = 'data.csv';
    // In older versions of Chrome you can use webkitURL in place of URL
    link.href = window.URL.createObjectURL(new Blob([base64toBlob($.base64.encode(excelFile), 'text/csv')]));
    link.click();
}

Image not showing using chrome filesystem toURL

I have the following code to write an image into the filesystem, and read it back for display. Prior to trying out the filesystem API, I loaded the whole base64 image into the src attribute and the image displayed fine. Problem is the images can be large so if you add a few 5MB images, you run out of memory. So I thought I'd just write them to the tmp storage and only pass the URL into the src attribute.
Trouble is, nothing gets displayed.
Initially I thought it might be something wrong with the URL, but then I went into the filesystem directory, found the image it was referring to and physically replaced it with the real binary image and renamed it to the same as the replaced image. This worked fine and the image is displayed correctly, so the URL looks good.
The only conclusion I can come to is that the writing of the image is somehow wrong - particularly the point where the blob is created. I've looked through the blob API and can't see anything that I may have missed, however I'm obviously doing something wrong because it seems to be working for everyone else.
As an aside, I also tried to store the image in IndexedDB and use the createObjectURL to display the image - again, although the URL looks correct, nothing is displayed on the screen. Hence the attempt at the filesystem API. The blob creation is identical in both cases, with the same data.
The source data is a base64 encoded string as I mentioned. Yes, I did also try to store the raw base64 data in the blob (with and without the prefix) and that didn't work either.
Other info - chrome version 28, on linux Ubuntu
// strip the base64 prefix ...
var regex = /^data.+;base64,/;
if (regex.test(imgobj)) { //its base64
imgobj = imgobj.replace(regex,"");
//imgobj = B64.decode(imgobj);
imgobj = window.atob(imgobj);
} else {
console.log("it's already :", typeof imgobj);
}
// store the object into the tmp space
window.requestFileSystem(window.TEMPORARY, 10*1024*1024, function(fs) {
// check if the file already exists
fs.root.getFile(imagename, {create: false}, function(fileEntry) {
console.log("File exists: ", fileEntry);
callback(fileEntry.toURL(), fileEntry.name);
//
}, function (e) { //file doesn't exist
fs.root.getFile(imagename, {create: true}, function (fe) {
console.log("file is: ", fe);
fe.createWriter(function(fw){
fw.onwriteend = function(e) {
console.log("write complete: ", e);
console.log("size of file: ", e.total)
callback(fe.toURL(), fe.name);
};
fw.onerror = function(e) {
console.log("Write failed: ", e.toString());
};
var data = new Blob([imgobj], {type: "image/png"});
fw.write(data);
}, fsErrorHandler);
}, fsErrorHandler);
});
// now create a file
}, fsErrorHandler);
Output from the callback is:
<img class="imgx" src="filesystem:file:///temporary/closed-padlock.png" width="270px" height="270px" id="img1" data-imgname="closed-padlock.png">
I'm at a bit of a standstill unless someone can provide some guidance...
UPDATE
I ran a test to encode and decode the base64 image with both the B64encoder/decoder and atob/btoa -
console.log(imgobj); // this is the original base64 file from the canvas.toDataURL function
/* B64 is broken*/
B64imgobjdecode = B64.decode(imgobj);
B64imgobjencode = B64.encode(B64imgobjdecode);
console.log(B64imgobjencode);
/* atob and btoa decodes and encodes correctly*/
atobimgobj = window.atob(imgobj);
btoaimgobj = window.btoa(atobimgobj);
console.log(btoaimgobj);
The results show that the btoa/atob functions work correctly but the B64 does not - probably because the original encoding didn't use the B64.encode function...
The resulting file in filesystem TEMPORARY, I ran through an online base64 encoder for comparison and the results are totally different. So the question is - while in the filesystem temp storage, is the image supposed to be an exact image, or is it padded with 'something' which only the filesystem API understands? Remember I put the original PNG in the file system directory and the image displayed correctly, which tends to indicate that the meta-data about the image (eg. the filename) is held elsewhere...
Can someone who has a working implementation of this confirm if the images are stored as images in the filesystem, or are padded with additional meta-data?
So, to answer my own question: the core problem was in the base64 encoding/decoding. I have since changed the code to fetch the images via Ajax with a responseType of arraybuffer or blob, and things have started working.
To answer the last part of the question: yes, a file in the filesystem tmp storage is supposed to be an exact binary copy. I verified this in Chrome and PhoneGap.
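The self-answer does not show the final code. If the base64 string has to be kept as the source, one way the original approach could be repaired is sketched below, assuming the corruption came from passing the binary string returned by atob straight into the Blob constructor, which stores strings as UTF-8. The helper names `base64ToBytes` and `base64ToBlob` are hypothetical.

```javascript
// Decode a base64 string (with or without a "data:...;base64," prefix) into raw bytes.
function base64ToBytes(input) {
  const base64 = input.replace(/^data:[^,]*;base64,/, '');
  const binary = atob(base64); // one character per byte, code points 0..255
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Wrap the bytes, not the binary string, in the Blob so nothing is re-encoded as UTF-8.
function base64ToBlob(input, mimeType) {
  return new Blob([base64ToBytes(input)], { type: mimeType });
}
```

In the question's code, `new Blob([imgobj], {type: "image/png"})` would then become `base64ToBlob(imgobj, "image/png")`, called on the original base64 string instead of the atob output.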
