Extract images from PDF file with JavaScript


I want to write JavaScript code to extract all image files from a PDF file, perhaps getting them as JPG or some other image format. There is already some JavaScript code for reading a PDF file, for example in the PDF viewer pdf-js.
window.addEventListener('change', function webViewerChange(evt) {
var files = evt.target.files;
if (!files || files.length === 0)
return;
// Read the local file into a Uint8Array.
var fileReader = new FileReader();
fileReader.onload = function webViewerChangeFileReaderOnload(evt) {
var buffer = evt.target.result;
var uint8Array = new Uint8Array(buffer);
PDFView.open(uint8Array, 0);
};
var file = files[0];
fileReader.readAsArrayBuffer(file);
PDFView.setTitleUsingUrl(file.name);
........
Can this code be used to extract images from a PDF file?

If you open a page with pdf.js, for example
PDFJS.getDocument({url: <pdf file>}).then(function (doc) {
doc.getPage(1).then(function (page) {
window.page = page;
})
})
you can then use getOperatorList to search for paintJpegXObject objects and grab the resources.
window.objs = []
page.getOperatorList().then(function (ops) {
for (var i=0; i < ops.fnArray.length; i++) {
if (ops.fnArray[i] == PDFJS.OPS.paintJpegXObject) {
window.objs.push(ops.argsArray[i][0])
}
}
})
Now window.objs will hold the names of the resources from that page that you need to fetch.
console.log(window.objs.map(function (a) { return page.objs.get(a) }))
should print to the console a bunch of <img /> objects with data-uri src= attributes. These can be directly inserted into the page, or you can do more scripting to get at the raw data.
It only works for embedded JPEG objects, but it's a start!
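The loop above can be wrapped in a small helper. This is only a sketch: collectImageNames and its parameters are made-up names, and the op codes are passed in by the caller because which image operators exist (paintJpegXObject, paintImageXObject and so on) depends on the pdf.js version in use.

```javascript
// Hypothetical helper (not part of pdf.js): returns the resource names of all
// image paint operations in an operator list shaped like { fnArray, argsArray }.
function collectImageNames(ops, imageOpCodes) {
  var wanted = new Set(imageOpCodes);
  var names = [];
  for (var i = 0; i < ops.fnArray.length; i++) {
    if (wanted.has(ops.fnArray[i])) {
      // For image operators the first argument is the resource name.
      names.push(ops.argsArray[i][0]);
    }
  }
  return names;
}

// Possible use in the browser, assuming the same old PDFJS global as above:
// page.getOperatorList().then(function (ops) {
//   var names = collectImageNames(ops, [PDFJS.OPS.paintJpegXObject]);
// });
```

Passing the op codes in keeps the helper independent of any particular pdf.js release.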

Here is a link to a working example that gets images from a PDF and adds an alpha channel to the Uint8ClampedArray so that the image can be displayed. It displays the images in a canvas.
Example in codepen: https://codepen.io/allandiego/pen/RwVGbyj
Getting a data URL from the canvas so that the image can be displayed in an img tag:
const canvas = document.createElement('canvas');
canvas.width = imageWidth;
canvas.height = imageHeight;
const ctx = canvas.getContext('2d');
ctx!.putImageData(imageData, 0, 0);
const dataURL = canvas.toDataURL();
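The alpha-channel step mentioned above can be sketched as follows, assuming the decoded image arrives as tightly packed RGB bytes (3 bytes per pixel). rgbToRgba is a made-up name, and the exact pixel layout pdf.js hands back depends on the image and the library version, so treat the input format as an assumption.

```javascript
// Hypothetical helper: expands packed RGB bytes to RGBA with full opacity,
// which is the layout that ImageData and putImageData expect.
function rgbToRgba(rgb, width, height) {
  var pixelCount = width * height;
  if (rgb.length !== pixelCount * 3) {
    throw new Error('Expected ' + pixelCount * 3 + ' RGB bytes, got ' + rgb.length);
  }
  var rgba = new Uint8ClampedArray(pixelCount * 4);
  for (var i = 0, j = 0; i < rgb.length; i += 3, j += 4) {
    rgba[j] = rgb[i];         // red
    rgba[j + 1] = rgb[i + 1]; // green
    rgba[j + 2] = rgb[i + 2]; // blue
    rgba[j + 3] = 255;        // alpha: fully opaque
  }
  return rgba;
}

// In the browser the result can be wrapped for the canvas code above:
// var imageData = new ImageData(rgbToRgba(data, w, h), w, h);
```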

Related

Converting base64 into a functional JPEG in Typescript/angular

I am running cropperjs on a static image in the browser (retrieved from a Node.js server in JPEG format). It returns a preview, in a different image element, as base64. I'm trying to take that data and save the modified image back to the server in the original JPEG format.
I've tried a few different things, but this is the latest:
saveCroppedImage(){
var split = this.imageDestination.split(','); // parsing out data:image/png;base64,
var croppedImage = split[1]; // assigning the base64 to a variable
var blob = new Blob([croppedImage],{type: 'image/jpeg'}); //changing the base64->Blob
var file = new File([blob],'cropped.jpeg'); //theoretically changing the blob->jpeg
this.newCroppedImage = file;
}
I then upload the file to the server and it is corrupted.

Ok, so I figured it out with some help of the interweb:
This is the info I had to include with the file creation, and it did have to stay a png to work.
public file = (theBlob: Blob): File =>
{return new File([theBlob],'croppedImage.png',{lastModified:new
Date().getTime(), type:'image/png'})
}
saveCroppedImage() {
var split = this.imageDestination.split(',');
console.log(this.imageDestination)
var croppedImage = split[1]
this.blob = blobUtil.base64StringToBlob(croppedImage);
this.newCroppedImage =this.file(this.blob);
this.onFileChange();
}
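The first attempt produced a corrupted file because new Blob([base64String]) stores the base64 text itself rather than the decoded bytes. The decoding that blobUtil.base64StringToBlob takes care of can also be sketched by hand with atob. The function names below are made up for illustration, and the sketch assumes a well-formed base64 data URL.

```javascript
// Hypothetical helper: splits a data URL and decodes its base64 payload.
function dataUrlToBytes(dataUrl) {
  var comma = dataUrl.indexOf(',');
  var header = dataUrl.slice(0, comma);        // e.g. "data:image/png;base64"
  var mime = header.slice(5).split(';')[0];    // e.g. "image/png"
  var binary = atob(dataUrl.slice(comma + 1)); // base64 -> binary string
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mime: mime, bytes: bytes };
}

// Hypothetical wrapper: builds a File from the decoded bytes, keeping the
// MIME type that the data URL declares.
function dataUrlToFile(dataUrl, fileName) {
  var decoded = dataUrlToBytes(dataUrl);
  return new File([decoded.bytes], fileName, { type: decoded.mime });
}
```

Note that this keeps whatever format the data URL already has; it does not convert PNG data to JPEG.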

canvas.toDataURL blocked by Brave Shield cross site device-recognition

I am currently working on a webapp which will be used to upload files after a resize. When I am working on localhost:3000, the resize process works like a charm, but when I am working on 192.168.0.5:3000 for example (or another domain), canvas.toDataURL returns an empty string instead of the base64 and the resize never ends.
This issue only occurs when the brave shields are enabled, and the reason behind the blocked element is a cross site device-recognition.
What can I do so that my canvas can resize an image I am uploading to the browser and give me its data URL and Blob when I am not using localhost?
The idea is to get a file from an input, resize it to 16 MP via a canvas, and get the blob of this canvas to send to the backend.
I have a PictureCompress function which reads the file with new FileReader() in order to get the base64 of the file. After that I create a new Image() and set its src to my reader's event.target.result (the base64). In onload I create the canvas to resize the image via ctx.drawImage, get the new base64 via canvas.toDataURL, and then get the blob.
This process works perfectly in my webapp when running it locally, but when I change the URL to access my machine's IP (I expect it to be the same with a domain name), it no longer works.
For example, there is a live version of this code on CodeBox and the issue seems to occur there.
EDIT :
This issue only occurs when the url has a port (192.168.0.5:3000 is not working, but 192.168.0.5 is working). Why ?
On firefox, the above fix does not work : Blocked http://192.168.0.13/ from extracting canvas data because no user input was detected.. Is it because it's not https ? To be continued ...
function Uploader(props) {
function PictureCompress(file, callback) {
const fileName = file.name;
const reader = new FileReader();
reader.readAsDataURL(file);
reader.onerror = error => console.warn(error);
reader.onload = event => {
const img = new Image();
img.name = fileName;
img.onerror = error => console.warn(error);
img.onload = e => {
const imageOriginalWidth = e.target.width;
const imageOriginalHeight = e.target.height;
const hvRatio = imageOriginalWidth / imageOriginalHeight;
const vhRatio = imageOriginalHeight / imageOriginalWidth;
const imageHvRatio = 16000000 * hvRatio;
const imageVhRatio = 16000000 * vhRatio;
const newWidth = Math.sqrt(imageHvRatio);
const newHeight = Math.sqrt(imageVhRatio);
const canvas = document.createElement("canvas");
canvas.width = newWidth;
canvas.height = newHeight;
const ctx = canvas.getContext("2d");
ctx.drawImage(img, 0, 0, newWidth, newHeight);
const base64 = canvas.toDataURL(file.type);
console.log(base64, newWidth, newHeight, file.type);
canvas.toBlob(
blob => {
console.log(blob);
callback(blob, base64);
},
file.type,
0.85
);
};
img.src = event.target.result;
};
}
function recursiveUpload(index, files) {
if (index >= files.length) {
return;
}
const file = files[index];
PictureCompress(file, resizedFile => {
resizedFile.name = file.name;
resizedFile.lastModified = file.lastModified;
//MY API CALL, NOT THE ISSUE
});
}
return (
<input type={"file"} onChange={event => recursiveUpload(0, event.target.files)} />
);
}
Thank you for your help !
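For reference, the size calculation in PictureCompress can be isolated into a pure function. This is a sketch and not a fix for the Brave Shields behaviour: fitToPixelBudget is a made-up name, it rounds down to whole pixels because canvas dimensions are integers, and unlike the code above it never upscales an image that is already within the budget (an assumption about what is wanted).

```javascript
// Hypothetical helper: scales (width, height) down so that the pixel count
// does not exceed maxPixels, preserving the aspect ratio.
function fitToPixelBudget(width, height, maxPixels) {
  if (width * height <= maxPixels) {
    // Already small enough: keep the original size instead of upscaling.
    return { width: width, height: height };
  }
  var scale = Math.sqrt(maxPixels / (width * height));
  return {
    width: Math.floor(width * scale),
    height: Math.floor(height * scale)
  };
}

// Possible use inside img.onload:
// var size = fitToPixelBudget(img.naturalWidth, img.naturalHeight, 16000000);
// canvas.width = size.width;
// canvas.height = size.height;
```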

Download PDF From blob returned by AJAX call [duplicate]

I have a difficult question for you, which I have been struggling with for some time now.
I'm looking for a way to save a file to the user's computer without using local storage, because local storage has a 5 MB limit. I want the "Save to file" dialog, but the data I want to save is only available in JavaScript, and I would like to avoid sending the data back to the server only to have it sent down again.
The use case is that the service I'm working on saves compressed and encrypted chunks of the user's data, so the server has no knowledge of what is in those chunks. Sending the data back to the server would cause four times the traffic, and the server would receive the unencrypted data, which would render the whole encryption useless.
I found a JavaScript API for saving data to the user's computer with the "Save to file" dialog, but work on it has been discontinued and it isn't fully supported: http://www.w3.org/TR/file-writer-api/
So since I have no window.saveAs, what is the way to save data from a Blob object without sending everything to the server?
It would be great if I could get a hint about what to search for.
I know that this works, because MEGA is doing it, but I want my own solution :)

Your best option is to use a blob url (which is a special url that points to an object in the browser's memory) :
var myBlob = ...;
var blobUrl = URL.createObjectURL(myBlob);
Now you have the choice to simply redirect to this url (window.location.replace(blobUrl)), or to create a link to it. The second solution allows you to specify a default file name :
var link = document.createElement("a"); // Or maybe get it from the current document
link.href = blobUrl;
link.download = "aDefaultFileName.txt";
link.innerHTML = "Click here to download the file";
document.body.appendChild(link); // Or append it wherever you want
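If the download should start without the user clicking the link, the same idea can be sketched as a helper that clicks the link programmatically and cleans up afterwards. downloadBlob is a made-up name, and the one-second delay before revoking the URL is an arbitrary assumption intended to give the browser time to start the download.

```javascript
// Hypothetical helper: triggers a download of `blob` under `fileName`
// using a temporary <a download> element and a blob URL.
function downloadBlob(blob, fileName) {
  var blobUrl = URL.createObjectURL(blob);
  var link = document.createElement('a');
  link.href = blobUrl;
  link.download = fileName;
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Release the blob URL a little later, once the download has begun.
  setTimeout(function () {
    URL.revokeObjectURL(blobUrl);
  }, 1000);
  return blobUrl;
}
```

Revoking the URL matters for large blobs, because the browser keeps the data in memory for as long as the blob URL is alive.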

FileSaver.js implements saveAs for certain browsers that don't have it
https://github.com/eligrey/FileSaver.js
Tested with FileSaver.js 1.3.8 on Chromium 75 and Firefox 68, neither of which has saveAs.
The working principle seems to be to just create an <a> element and click it with JavaScript. Oh, the horrors of the web.
Here is a demo that saves a blob generated with canvas.toBlob to your download folder with the chosen name mypng.png:
var canvas = document.getElementById("my-canvas");
var ctx = canvas.getContext("2d");
var pixel_size = 1;
function draw() {
console.log("draw");
for (var x = 0; x < canvas.width; x += pixel_size) {
for (var y = 0; y < canvas.height; y += pixel_size) {
var b = 0.5;
ctx.fillStyle =
"rgba(" +
(x / canvas.width) * 255 + "," +
(y / canvas.height) * 255 + "," +
b * 255 +
",255)"
;
ctx.fillRect(x, y, pixel_size, pixel_size);
}
}
canvas.toBlob(function(blob) {
saveAs(blob, 'mypng.png');
});
}
window.requestAnimationFrame(draw);
<canvas id="my-canvas" width="512" height="512" style="border:1px solid black;"></canvas>
<script src="https://cdnjs.cloudflare.com/ajax/libs/FileSaver.js/1.3.8/FileSaver.min.js"></script>
Here is an animated version that downloads multiple images: Convert HTML5 Canvas Sequence to a Video File
See also:
how to save canvas as png image?
JavaScript: Create and save file

Here is the direct way.
canvas.toBlob(function(blob){
console.log(blob instanceof Blob) // true: you now have a Blob here
var blobUrl = URL.createObjectURL(blob);
var link = document.createElement("a"); // Or maybe get it from the current document
link.href = blobUrl;
link.download = "image.jpg";
link.innerHTML = "Click here to download the file";
document.body.appendChild(link); // Or append it wherever you want
link.click() // click the link just created; querySelector('a') would click the first anchor on the page
}, 'image/jpeg', 1); // JPEG at 100% quality
I spent a while coming up with this solution; comment if this helps.
Thanks to Sebastien C's answer.

The fs-web npm package provides more file utilities:
npm i fs-web
Usage
import * as fs from 'fs-web';
async processFetch(url, file_path = 'cache-web') {
const fileName = `${file_path}/${url.split('/').reverse()[0]}`;
let cache_blob: Blob;
await fs.readString(fileName).then((blob) => {
cache_blob = blob;
}).catch(() => { });
if (!!cache_blob) {
this.prepareBlob(cache_blob);
console.log('FROM CACHE');
} else {
await fetch(url, {
headers: {},
}).then((response: any) => {
return response.blob();
}).then((blob: Blob) => {
fs.writeFile(fileName, blob).then(() => {
return fs.readString(fileName);
});
this.prepareBlob(blob);
});
}
}

From a file picker or input type=file file chooser, save the filename to local storage:
HTML:
<audio id="player1">Your browser does not support the audio element</audio>
JavaScript:
function picksinglefile() {
var fop = new Windows.Storage.Pickers.FileOpenPicker();
fop.suggestedStartLocation = Windows.Storage.Pickers.PickerLocationId.musicLibrary;
fop.fileTypeFilter.replaceAll([".mp3", ".wav"]);
fop.pickSingleFileAsync().then(function (file) {
if (file) {
// save the file name to local storage
localStorage.setItem("alarmname$", file.name.toString());
} else {
alert("Operation Cancelled");
}
});
}
Then later in your code, when you want to play the file you selected, use the following, which gets the file from the music library using only its name. (In the UWP package manifest, set your 'Capabilities' to include 'Music Library'.)
var l = Windows.Storage.KnownFolders.musicLibrary;
var f = localStorage.getItem("alarmname$").toString(); // retrieve file by name
l.getFileAsync(f).then(function (file) {
// storagefile file is available, create URL from it
var s = window.URL.createObjectURL(file);
var x = document.getElementById("player1");
x.setAttribute("src", s);
x.play();
});

How to correctly set new Image in the following situation?

I'm trying to read the width and height of files that I get after clicking the Browse button:
for (let i = 0; i < this.uploadingPanoramas.length; i++) {
const img = new Image() // eslint-disable-line
console.log('file', this.uploadingPanoramas[i].file)
img.src = this.uploadingPanoramas[i].file
console.log('img', img)
img.onload(() => {
console.log('width', img.width)
})
}
console.log('img') logs: <img src="[object File]">
So img.onload doesn't actually work, apparently because I'm not getting the image.
What's the correct way of doing this?
EDIT:
this.uploadingPanoramas is an array of objects that contain the files:
[{
file: File,
progress: 0
}, {
file: File,
progress: 0
}]

Since this.uploadingPanoramas[i].file is of type File, you can use FileReader to generate a data URL with the readAsDataURL() method.
The FileReader object lets web applications asynchronously read the contents of files (or raw data buffers) stored on the user's computer, using File or Blob objects to specify the file or data to read.
var reader = new FileReader();
reader.onload = function (e) {
img.src = e.target.result;
}
reader.readAsDataURL(this.uploadingPanoramas[i].file);
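Note also that onload is a property to assign a handler to, not a function to call, so img.onload(() => ...) in the question never registers anything. A minimal sketch that puts the pieces together is shown below. It uses a blob URL instead of FileReader, which is a swapped-in alternative to the approach above, and readImageSize is a made-up name.

```javascript
// Hypothetical helper: resolves with the pixel dimensions of an image File.
function readImageSize(file) {
  return new Promise(function (resolve, reject) {
    var url = URL.createObjectURL(file);
    var img = new Image();
    img.onload = function () {
      URL.revokeObjectURL(url); // the blob URL is no longer needed
      resolve({ width: img.naturalWidth, height: img.naturalHeight });
    };
    img.onerror = function () {
      URL.revokeObjectURL(url);
      reject(new Error('Could not decode image'));
    };
    img.src = url; // set src last, after the handlers are attached
  });
}

// Possible use with the array from the question:
// readImageSize(this.uploadingPanoramas[i].file).then(function (size) {
//   console.log('width', size.width, 'height', size.height);
// });
```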

how to open a local PDF in PDFJS using file input?

I would like to know if there is a way to select a pdf file using input type="file" and open it using PDFJS

You should be able to use a FileReader to get the contents of a file object as a typed array, which pdfjs accepts (https://mozilla.github.io/pdf.js/examples/)
//Step 1: Get the file from the input element
inputElement.onchange = function(event) {
var file = event.target.files[0];
//Step 2: Read the file using file reader
var fileReader = new FileReader();
fileReader.onload = function() {
//Step 4:turn array buffer into typed array
var typedarray = new Uint8Array(this.result);
//Step 5:pdfjs should be able to read this
const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
// The document is loaded here...
});
};
//Step 3:Read the file as ArrayBuffer
fileReader.readAsArrayBuffer(file);
}
Edit: The pdfjs API has changed since I wrote the first version of this answer in 2015. Updated to reflect the new API as of 2021 (thanks to @Chiel for the updated answer).

If getDocument().then is not a function:
I reckon I have managed to solve the new problem with the new API. As mentioned in this GitHub issue, the object returned by getDocument now exposes a promise property.
In short, this:
PDFJS.getDocument(typedarray).then(function(pdf) {
// The document is loaded here...
});
became this:
const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
// The document is loaded here...
});
Adapting the older answer to the new API, as the bounty asks, gives the following result:
//Step 1: Get the file from the input element
inputElement.onchange = function(event) {
//It is important that you use the file and not the filepath (The file path won't work because of security issues)
var file = event.target.files[0];
var fileReader = new FileReader();
fileReader.onload = function() {
var typedarray = new Uint8Array(this.result);
//replaced the old function with the new api
const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
// The document is loaded here...
});
};
//Step 3:Read the file as ArrayBuffer
fileReader.readAsArrayBuffer(file);
}
I have created an example below, using the official release of the library, to show that it works.
/* Official release of the pdfjs worker */
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.5.207/pdf.worker.js';
document.getElementById('file').onchange = function(event) {
var file = event.target.files[0];
var fileReader = new FileReader();
fileReader.onload = function() {
var typedarray = new Uint8Array(this.result);
console.log(typedarray);
const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
// The document is loaded here...
//The code below is just for demonstration purposes, showing that it works with the modern API
pdf.getPage(1).then(function(page) {
console.log('Page loaded');
var scale = 1.5;
var viewport = page.getViewport({
scale: scale
});
var canvas = document.getElementById('pdfCanvas');
var context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
// Render PDF page into canvas context
var renderContext = {
canvasContext: context,
viewport: viewport
};
var renderTask = page.render(renderContext);
renderTask.promise.then(function() {
console.log('Page rendered');
});
});
//end of example code
});
}
fileReader.readAsArrayBuffer(file);
}
<html>
<head>
<!-- The official release -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.5.207/pdf.js"> </script>
</head>
<body>
<input type="file" id="file">
<h2>Rendered pdf:</h2>
<canvas id="pdfCanvas" width="300" height="300"></canvas>
</body>
</html>
Hope this helps! If not, please comment.
Note:
This might not work in jsFiddle.

I adopted your code and it worked! Then I was browsing for more tips here and there, then I learned there is an even more convenient method.
You can get the URL of client-loaded file with
URL.createObjectURL()
It reduces nesting by one level and you don't need to read the file, convert it to array, etc.
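A minimal sketch of that approach, assuming the pdf.js build in use accepts a URL string in getDocument (as in the loadingTask examples above). openPdfFromFile is a made-up name, and the library object is passed in as a parameter purely for illustration.

```javascript
// Hypothetical helper: opens a user-selected File with pdf.js via a blob URL.
// `pdfjs` is the pdf.js library object, for example the pdfjsLib global.
function openPdfFromFile(file, pdfjs) {
  var url = URL.createObjectURL(file);
  return pdfjs.getDocument(url).promise.then(function (pdf) {
    // Return the URL as well so the caller can revoke it when finished.
    return { pdf: pdf, url: url };
  });
}

// Possible use in the browser:
// inputElement.onchange = function (event) {
//   openPdfFromFile(event.target.files[0], pdfjsLib).then(function (result) {
//     // The document is loaded here as result.pdf.
//     // Call URL.revokeObjectURL(result.url) once the document is no longer needed.
//   });
// };
```

The URL is not revoked inside the helper because pdf.js may still read from it while pages are being loaded.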
