Merging PDFs in Node

Hi, I'm trying to merge n PDFs into one, but I cannot get it to work.
I'm using the Buffer module to concatenate the PDFs, but only the last PDF shows up in the final file.
Is this even possible to do in Node?
var pdf1 = fs.readFileSync('./test1.pdf');
var pdf2 = fs.readFileSync('./test2.pdf');
fs.writeFile("./final_pdf.pdf", Buffer.concat([pdf1, pdf2]), function(err) {
  if (err) {
    return console.log(err);
  }
  console.log("The file was saved!");
});
There are some libraries out there, but they all depend on either other software or other programming languages.

What do you expect to get when you do Buffer.concat([pdf1, pdf2])? Just by concatenating two PDF files you won't get a single one containing all the pages. PDF is a complex format (basically one for vector graphics). If you concatenated two JPEG files, you wouldn't expect to get one big image containing both pictures, would you?
You'll need to use an external library; https://github.com/wubzz/pdf-merge might work, for instance.
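If you would rather avoid native dependencies entirely, here is a minimal sketch using pdf-lib (a pure-JavaScript PDF library, not mentioned in the original answer) that copies every page of each input into a fresh document:
const fs = require('fs');
const { PDFDocument } = require('pdf-lib');
async function mergePdfs(inputPaths, outputPath) {
  // start with an empty document and import the pages of each input
  const merged = await PDFDocument.create();
  for (const p of inputPaths) {
    const src = await PDFDocument.load(fs.readFileSync(p));
    const pages = await merged.copyPages(src, src.getPageIndices());
    pages.forEach(page => merged.addPage(page));
  }
  fs.writeFileSync(outputPath, await merged.save());
}
mergePdfs(['./test1.pdf', './test2.pdf'], './final_pdf.pdf').catch(console.error);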

HummusJS is another PDF manipulation library, but without a dependency on PDFtk. See this answer for an example of combining PDFs in Buffers.
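A file-based sketch of what merging with HummusJS might look like (assuming its createWriter / appendPDFPagesFromPDF API; verify against the project's docs):
const hummus = require('hummus');
// create a writer for the output file and append all pages of each input
const writer = hummus.createWriter('./final_pdf.pdf');
writer.appendPDFPagesFromPDF('./test1.pdf');
writer.appendPDFPagesFromPDF('./test2.pdf');
writer.end();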

Aspose.PDF Cloud SDK for Node.js can merge/combine PDF documents without depending on any third-party API or tool. However, it currently works with cloud storage (Aspose internal storage, Amazon S3, Dropbox, Google Drive, Google Cloud Storage, Windows Azure Storage, FTP storage). In the near future we will provide support for merging files from the request body (stream).
const { PdfApi } = require("asposepdfcloud");
const { MergeDocuments } = require("asposepdfcloud/src/models/mergeDocuments");
const fs = require('fs');

const pdfApi = new PdfApi("xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx", "xxxxxxxxxxxxxxxxxxxxx");
const file1 = "dummy.pdf";
const file2 = "02_pages.pdf";
const localTestDataFolder = "C:\\Temp";
const names = [file1, file2];
const resultName = "MergingResult.pdf";

const mergeDocuments = new MergeDocuments();
mergeDocuments.list = [];
names.forEach((file) => {
  mergeDocuments.list.push(file);
});

// Upload first file
pdfApi.uploadFile(file1, fs.readFileSync(localTestDataFolder + "\\" + file1)).then((result) => {
  console.log("Uploaded File");
}).catch(function(err) {
  // Deal with an error
  console.log(err);
});

// Upload second file
pdfApi.uploadFile(file2, fs.readFileSync(localTestDataFolder + "\\" + file2)).then((result) => {
  console.log("Uploaded File");
}).catch(function(err) {
  console.log(err);
});

// Merge PDF documents
pdfApi.putMergeDocuments(resultName, mergeDocuments, null, null).then((result) => {
  console.log(result.body.code);
}).catch(function(err) {
  console.log(err);
});

// Download the merged file from cloud storage and save it locally
const outputPath = "C:/Temp/" + resultName;
pdfApi.downloadFile(resultName).then((result) => {
  fs.writeFileSync(outputPath, result.body);
  console.log("File Downloaded");
}).catch(function(err) {
  console.log(err);
});
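Note that as written, the uploads, the merge, and the download all start immediately and run concurrently, so the merge can fire before the uploads have finished. A sketch of how the same SDK calls could be chained so each step waits for the previous one (same variables as above):
Promise.all([
  pdfApi.uploadFile(file1, fs.readFileSync(localTestDataFolder + "\\" + file1)),
  pdfApi.uploadFile(file2, fs.readFileSync(localTestDataFolder + "\\" + file2)),
]).then(() => {
  return pdfApi.putMergeDocuments(resultName, mergeDocuments, null, null);
}).then(() => {
  return pdfApi.downloadFile(resultName);
}).then((result) => {
  fs.writeFileSync(outputPath, result.body);
  console.log("File Downloaded");
}).catch(console.log);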

Related

folders and files are not visible after uploading files through multer

I am working on a small project. Discussing it step by step:
At first I am uploading zip files through multer
extracting those files (how can I call the extract function after the upload via multer completes?)
after extracting I am trying to filter those files
after filtering I want to move some files to another directory
In my main index.js I have:
A simple route to upload files, which is working:
// MAIN API ENDPOINT
app.post("/api/zip-upload", upload, async (req, res, next) => {
  console.log("Files - ", req.files);
});
Continuous checking for whether there is any zip file that needs unzipping, but the problem is that after uploading it does not show any files or directories:
// UNZIP FILES
const dir = `${__dirname}/uploads`;
const files = fs.readdirSync("./uploads");
const filesUnzip = async () => {
  try {
    if (fs.existsSync(dir)) {
      console.log("files - ", files);
      for (const file of files) {
        console.log("file - ", file);
        try {
          const extracted = await extract("./uploads/" + file, { dir: __dirname + "/uploads/" });
          console.log("Extracted - ", extracted);
          // const directories = await fs.statSync(dir + '/' + file).isDirectory();
        } catch (bufErr) {
          console.log(bufErr.syscall);
        }
      }
      // const directories = await files.filter(function (file) { return fs.statSync(dir + '/' + file).isDirectory(); });
      // console.log(directories);
    }
  } catch (err) {
    console.log(err);
  }
  return;
}
setInterval(() => {
  filesUnzip();
}, 2000);
Moving files to a static directory, but here is the same problem: no directory is found:
const getAllDirs = async () => {
  // FIND ALL DIRECTORIES
  if (fs.existsSync(dir)) {
    const directories = await files.filter(function (file) { return fs.statSync(dir + '/' + file).isDirectory(); });
    console.log("Directories - ", directories);
    if (directories.length > 0) {
      for (let d of directories) {
        const subdirFiles = fs.readdirSync("./uploads/" + d);
        for (let s of subdirFiles) {
          if (s.toString().match(/\.xml$/gm) || s.toString().match(/\.xml$/gm) !== null) {
            console.log("-- ", d + "/" + s);
            const move = await fs.rename("uploads/" + d + "/" + s, __dirname + "/static/" + s, (err) => { console.log(err) });
            console.log("Move - ", move);
          }
        }
      }
    }
  }
}
setInterval(getAllDirs, 3000);
There are so many issues with your code, I don't know where to begin:
Why are you using fs.xxxSync() methods if all your functions are async? Using xxxSync() methods is highly discouraged because it blocks the server (i.e. parallel requests can't/won't be accepted while a sync read is in progress). The fs module provides a promise API ...
Your "continuous checking" for new files always checks the same (probably empty) files array, because it seems you execute files = fs.readdirSync("./uploads"); only once (probably at server start, but I can't tell for sure because there isn't any context for that snippet).
You shouldn't be polling that "uploads" directory at all. Because writing a file (if done properly) is an asynchronous process, you may end up reading incomplete files. Instead you should trigger the unzipping from your endpoint handler. Once it is hit, req.files contains the files that have been uploaded, so you can simply use that array to start any further processing instead of frequently polling a directory.
At some points you are using the callback version of the fs API (for instance fs.rename()). You cannot await a function that expects a callback. Again, use the promise API of fs.
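To illustrate that last point, a minimal contrast (a sketch; oldPath/newPath are placeholder variables, and the await form must run inside an async function):
// callback style: fs.rename returns undefined, so there is nothing to await
require('fs').rename(oldPath, newPath, (err) => { if (err) console.log(err); });
// promise style: fs.promises.rename returns a Promise that can be awaited
await require('fs').promises.rename(oldPath, newPath);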
EDIT
So I'm trying to address your issues. Maybe I can't solve all of them because of missing information, but you should get the general idea.
First of all, you should use the promise API of the fs module. And for path manipulation you should use the built-in path module, which takes care of some OS-specific issues.
const fs = require('fs').promises;
const path = require('path');
Your API endpoint isn't currently returning anything. I suppose you stripped away some code, but still. Furthermore, you should trigger your file handling from here, so you don't have to do directory polling, which is
error prone,
a waste of resources, and
blocking the server if done synchronously (as you currently do).
app.post("/api/zip-upload", upload, async (req, res, next) => {
console.log("FIles - ", req.files);
//if you want to return the result only after the files have been
//processed use await
await handleFiles(req.files);
//if you want to return to the client immediately and process files
//skip the await
//handleFiles(req.files);
res.sendStatus(200);
});
Handling the files seems to consist of two different steps:
unzipping the uploaded zip files
copying some of the extracted files into another directory
const source = path.join(".", "uploads");
const target = path.join(__dirname, "uploads");
const statics = path.join(__dirname, "statics");

const handleFiles = async (files) => {
  // a random folder which will be unique for this upload request
  const tmpfolder = path.join(target, `tmp_${new Date().getTime()}`);
  // create this folder
  await fs.mkdir(tmpfolder, { recursive: true });
  // extract all uploaded files to the folder;
  // this waits for a list of promises and resolves once all of them resolved
  await Promise.all(files.map(f => extract(path.join(source, f), { dir: tmpfolder })));
  await copyFiles(tmpfolder);
  // you probably should delete the uploaded zip files and the temp folder
  // after they have been handled
  await Promise.all(files.map(f => fs.unlink(path.join(source, f))));
  await fs.rmdir(tmpfolder, { recursive: true });
}
const copyFiles = async (tmpfolder) => {
  // get all file/directory names in the tmpfolder
  const allfiles = await fs.readdir(tmpfolder);
  // get their stats
  const stats = await Promise.all(allfiles.map(f => fs.stat(path.join(tmpfolder, f))));
  // keep directories only
  const dirs = allfiles.filter((_, i) => stats[i].isDirectory());
  for (let d of dirs) {
    // read all filenames in the subdirectory
    const files = await fs.readdir(path.join(tmpfolder, d));
    // filter by extension .xml
    const xml = files.filter(x => path.extname(x) === ".xml");
    // move all xml files
    await Promise.all(xml.map(f => fs.rename(path.join(tmpfolder, d, f), path.join(statics, f))));
  }
}
That should do the trick. Of course you may notice there is no error handling with this code. You should add that.
And I'm not 100% sure about your paths. You should consider the following:
./uploads refers to a directory uploads in the current working directory (wherever that may be)
${__dirname}/uploads refers to a directory uploads in the same directory as the currently executing script file. Not sure if that is the directory you want ...
./uploads and ${__dirname}/uploads may point to the same folder or to completely different folders. There is no way of knowing without additional context.
Furthermore, in your code you extract the ZIP files from ./uploads to ${__dirname}/uploads and then later try to copy XML files from ./uploads/xxx to ${__dirname}/statics, but there won't be any directory xxx in ./uploads because you extracted the ZIP file to a (probably) completely different folder.
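The difference between the two path styles is easy to see (a small sketch; the first line's output depends on where node was started):
const path = require('path');
console.log(path.resolve('./uploads'));       // resolved against process.cwd(), i.e. where node was launched
console.log(path.join(__dirname, 'uploads')); // fixed: relative to the directory of the current script file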

How do I pass a file/blob from JavaScript to emscripten/WebAssembly (C++)?

I'm writing a WebExtension that uses C++ code compiled with emscripten. The WebExtension downloads files which I want to process inside the C++ code. I'm aware of the File System API and I think I have read most of it, but I can't get it to work: making a downloaded file accessible in emscripten.
This is the relevant JavaScript part of my WebExtension:
// Download a file
let blob = await fetch('https://stackoverflow.com/favicon.ico').then(response => {
  if (!response.ok) {
    return null;
  }
  return response.blob();
});

// Setup FS API
FS.mkdir('/working');
FS.mount(IDBFS, {}, '/working');
FS.syncfs(true, function(err) {
  if (err) {
    console.error('Error: ' + err);
  }
});

// Store the file "somehow"
let filename = 'favicon.ico';
// ???

// Call WebAssembly/C++ to process the file
Module.processFile(filename);
The directory is created, as can be seen when inspecting the Web Storage of the browser. If I understand the File System API correctly, I have to "somehow" write my data to a file inside /working. Then I should be able to call a function of my C++ code (from JavaScript) and open that file as if there were a directory called 'working' at the root, containing the file. The call to the C++ function works (I can print the provided filename).
But how do I add the file (currently a blob) to that directory?
C++ code:
#include "emscripten/bind.h"
using namespace emscripten;
std::string processFile(std::string filename)
{
// open and process the file
}
EMSCRIPTEN_BINDINGS(my_module)
{
function("processFile", &processFile);
}
It turned out that I had been mixing some things up while trying different methods, and I was also misinterpreting my debugging tools. So the easiest way to accomplish this task (without using IDBFS) is:
JS:
// Download a file
let blob = await fetch('https://stackoverflow.com/favicon.ico').then(response => {
  if (!response.ok) {
    return null;
  }
  return response.blob();
});

// Convert blob to Uint8Array (more abstract: ArrayBufferView)
let data = new Uint8Array(await blob.arrayBuffer());

// Store the file
let filename = 'favicon.ico';
let stream = FS.open(filename, 'w+');
FS.write(stream, data, 0, data.length, 0);
FS.close(stream);

// Call WebAssembly/C++ to process the file
console.log(Module.processFile(filename));
C++:
#include "emscripten/bind.h"
#include <fstream>
using namespace emscripten;
std::string processFile(std::string filename)
{
std::fstream fs;
fs.open (filename, std::fstream::in | std::fstream::binary);
if (fs) {
fs.close();
return "File '" + filename + "' exists!";
} else {
return "File '" + filename + "' does NOT exist!";
}
}
EMSCRIPTEN_BINDINGS(my_module)
{
function("processFile", &processFile);
}
If you want to do it with IDBFS, you can do it like this:
// Download a file
let blob = await fetch('https://stackoverflow.com/favicon.ico').then(response => {
  if (!response.ok) {
    return null;
  }
  return response.blob();
});

// Convert blob to Uint8Array (more abstract: ArrayBufferView)
let data = new Uint8Array(await blob.arrayBuffer());

// Setup FS API
FS.mkdir('/persist');
FS.mount(IDBFS, {}, '/persist');

// Load persistent files (sync from IDBFS to MEMFS), will do nothing on first run
FS.syncfs(true, function(err) {
  if (err) {
    console.error('Error: ' + err);
  }
});
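// Caution (not in the original answer): FS.syncfs is asynchronous, so strictly
// speaking the FS calls below should run inside its callback to avoid racing
// the initial sync.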
FS.chdir('/persist');

// Store the file
let filename = 'favicon.ico';
let stream = FS.open(filename, 'w+');
FS.write(stream, data, 0, data.length, 0);
FS.close(stream);

// Persist the changes (sync from MEMFS to IDBFS)
FS.syncfs(false, function(err) {
  if (err) {
    console.error('Error: ' + err);
  }
});

// NOW you will be able to see the file in your browser's IndexedDB section of the web storage inspector!

// Call WebAssembly/C++ to process the file
console.log(Module.processFile(filename));
Notes:
When using FS.chdir() in the JS world to change the directory, this also changes the working directory in the C++ world. Keep that in mind when working with relative paths.
When working with IDBFS instead of MEMFS, you are actually still working with MEMFS and merely have the opportunity to sync data from or to IDBFS on demand; all your work is still done in MEMFS. I would consider IDBFS an add-on to MEMFS. This isn't stated that explicitly in the docs.

uploading multiple files to s3 with sailsjs

I am not using the multer package because I am not using Express, so I am not sure how multer would work with Sails.js.
Anyway, I am trying to upload multiple files to S3. At first I worked with a for loop, which did not work because the for loop is synchronous and the file upload is asynchronous.
Then I read that doing it recursively would work, so I tried that, but somehow it still doesn't.
The files are uploaded, but the size isn't right for all of them.
The size might be bigger or smaller than the original, and when I download a file, say a doc file, I either get an error saying it's not an MS Word file or the content is scrambled. If it's a PDF, it says it failed to open the PDF file.
If I try with only one file it works sometimes, but not always.
Did I do something wrong in the code below?
s3_upload_multi: async function(req) {
  try {
    let fieldName = req._fileparser.upstreams[0].fieldName;
    let files = req.file(fieldName)._files;
    let return_obj = [];

    const upload_rec = files => {
      if (files.length <= 0) return return_obj;
      const f = files.pop();
      const fileUpload = f.stream;
      const s3 = new AWS.S3();
      // https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
      s3.putObject({ // uses s3 sdk
        Bucket: sails.config.aws.bucket,
        Key: 'blahblahblahblahblah',
        Body: fileUpload._readableState.buffer.head.data, // buffer from file
        ACL: 'public-read',
      }, function (err, data) {
        if (err) reject(err);
        return_obj.push(data);
        console.log(return_obj, 'return_obj');
      });
      return upload_rec(files);
    };
    upload_rec(files);
  } catch (e) {
    console.log(e, 'inside UploadService');
    return false;
  }
}
Thanks in advance for any advice and suggestions
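One likely culprit, as an observation rather than an answer from the thread: Body: fileUpload._readableState.buffer.head.data only grabs the first chunk currently sitting in the stream's internal buffer, not the whole file, which would explain the truncated or corrupted uploads. A minimal sketch of a stream-friendly alternative, assuming the AWS SDK v2 and that each file object exposes a readable stream and a filename as in the code above (to run inside an async function such as s3_upload_multi):
const s3 = new AWS.S3();
// s3.upload() accepts a readable stream as Body and handles the buffering
// internally, unlike putObject(), which wants the complete body up front
const uploadOne = (f) => s3.upload({
  Bucket: sails.config.aws.bucket,
  Key: f.filename, // hypothetical key; pick your own naming scheme
  Body: f.stream,
  ACL: 'public-read',
}).promise();
// upload all files in parallel and collect the results
const results = await Promise.all(files.map(uploadOne));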

ipfs-mini cat API's output buffer seems corrupted for the hash pointing to an image file

I am a newbie to both JavaScript and IPFS, and I am trying an experiment to fetch an image buffer from the IPFS hash "QmdD8FL7N3kFnWDcPSVeD9zcq6zCJSUD9rRSdFp9tyxg1n" using the ipfs-mini node module.
Below is my code:
const IPFS = require('ipfs-mini');
const FileReader = require('filereader');
const fs = require('fs');
var multer = require('multer');

const ipfs = initialize();
const hash = 'QmdD8FL7N3kFnWDcPSVeD9zcq6zCJSUD9rRSdFp9tyxg1n';

app.post('/upload', function(req, res) {
  upload(req, res, function(err) {
    console.log(req.file.originalname);
    ipfs.cat(hash, function(err, data) {
      if (err) console.log("could not get the image from ipfs for hash " + hash);
      else {
        var wrt = data.toString('base64');
        console.log('size: ' + wrt.length);
        fs.writeFile('tryipfsimage.gif', wrt, (err) => {
          if (err) console.log('can not write file');
          else {
            //console.log(data);
            ipfs.stat(hash, (err, data) => {
              // console.log(hexdump(wrt));
            });
            console.log("files written successfully");
          }
        });
      }
    });
  });
});

function initialize() {
  console.log('Initializing the ipfs object');
  return new IPFS({
    host: 'ipfs.infura.io',
    protocol: 'https'
  });
}
I can view the image properly in the browser using the link "https://ipfs.io/ipfs/QmdD8FL7N3kFnWDcPSVeD9zcq6zCJSUD9rRSdFp9tyxg1n", but if I open the file 'tryipfsimage.gif', into which I dump the buffer returned by the cat API in the program above, the image content seems corrupted. I am not sure what mistake I am making in the code; it would be great if someone pointed it out.
From the ipfs docs: https://github.com/ipfs/interface-ipfs-core/blob/master/SPEC/FILES.md#javascript---ipfsfilescatipfspath-callback
data in the callback is actually a Buffer, so by toString('base64')'ing it you are writing actual base64 text into the .gif file; there is no need to do this. You can pass the Buffer directly to the fs.writeFile API:
fs.writeFile('tryipfsimage.gif', data, ...
For larger files I would recommend using the ipfs catReadableStream, where you can do something more like:
const stream = ipfs.catReadableStream('QmdD8FL7N3kFnWDcPSVeD9zcq6zCJSUD9rRSdFp9tyxg1n');
// don't forget to add error handlers to the streams and whatnot
const fileStream = fs.createWriteStream('tryipfsimage.gif');
stream.pipe(fileStream);
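If you want those error handlers with less ceremony, Node's built-in stream.pipeline (available since Node 10) wires them up for you; a small sketch, not from the original answer:
const { pipeline } = require('stream');
pipeline(
  ipfs.catReadableStream('QmdD8FL7N3kFnWDcPSVeD9zcq6zCJSUD9rRSdFp9tyxg1n'),
  fs.createWriteStream('tryipfsimage.gif'),
  (err) => {
    if (err) console.error('pipeline failed:', err);
    else console.log('file written');
  }
);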

Extract Sketch file using JavaScript

The purpose is to extract the contents of a .sketch file.
I have a file with the name myfile.sketch. On renaming the file extension to myfile.zip and extracting it in Finder, I am able to view the files within. I tried doing the same on the server using Node.js by renaming the file extension to .zip, but I wasn't able to extract the files; instead I got some ZIP files within the extracted files.
var oldPath = __dirname + '/uploads/myfile.sketch',
    newPath = __dirname + '/uploads/myfile.zip';

fs.rename(oldPath, newPath, function (err) {
  console.log('rename callback ', err);
});
Is it possible to extract a non-ZIP file using frameworks like JSzip?
As a .sketch file is essentially a ZIP file, the extension does not matter. Any tool that is capable of unpacking a ZIP file will work.
You can verify this with the file command:
$ file myfile.sketch
myfile.sketch: Zip archive data, at least v1.0 to extract
As you are working on the server already, there is nothing stopping you from just using the OS's command line tools like unzip.
Like this:
const util = require('util');
const exec = util.promisify(require('child_process').exec);

async function unzip() {
  const filename = 'myfile.sketch';
  const { stdout, stderr } = await exec('unzip ' + filename);
  console.log('stdout:', stdout);
  console.log('stderr:', stderr);
}

unzip();
Doing it with JSZip is straightforward as well:
var fs = require('fs');
var JSZip = require('jszip');

new JSZip.external.Promise(function (resolve, reject) {
  fs.readFile('myfile.sketch', function(err, data) {
    if (err) {
      reject(err);
    } else {
      resolve(data);
    }
  });
}).then(function (data) {
  return JSZip.loadAsync(data);
}).then(function (zip) {
  // e.g. list the entries contained in the archive
  zip.forEach(function (relativePath, entry) {
    console.log(relativePath);
  });
});
