Using pdf.js to render a JBIG2 image in a browser


I'm trying to render a JBIG2 image in a browser. This script, which is part of pdf.js, appears to do this:
https://github.com/mozilla/pdf.js/blob/master/src/core/jbig2.js
However, it has no usage instructions, because it is normally executed as a dependency of pdf.js (for complete PDF rendering, which I don't want or need).
Can anyone figure out how I would use this script to render a JBIG2 image on a web page?

As nobody has helped you out with this, let me at least share my progress on the problem:
<script src="arithmetic_decoder.js"></script>
<script src="util.js"></script>
<script src="jbig2.js"></script>
<script>
var jbig2 = new Jbig2Image();
var httpRequest = new XMLHttpRequest();
httpRequest.responseType = 'arraybuffer';
httpRequest.onreadystatechange = function() {
    if (httpRequest.readyState === 4) {
        if (httpRequest.status === 200) {
            // Hand the whole file to the decoder as a single chunk
            var data = jbig2.parseChunks([{
                data: new Uint8Array(httpRequest.response),
                start: 0,
                end: httpRequest.response.byteLength
            }]);
            // Invert every byte of the decoded result
            for (var i = 0; i < data.length; i++)
                data[i] ^= 0xFF;
            console.log(data);
        } else {
            alert('There was a problem with the request.');
        }
    }
};
httpRequest.open('GET', "sample.jbig2");
httpRequest.send();
</script>
This makes the relevant dependencies clear and shows how I believe the parseChunks function should be called. I am sure about passing a Uint8Array built from the XMLHttpRequest's arraybuffer, but not about whether the data should first be sliced into separate chunks. The array returned to data looks like some sort of pixel array, but without any width or height information I am not sure how to continue. Additionally, the sample .jbig2 file you provided gives a corruption error in STDU Viewer (the only free app I could find for viewing .jbig2 files), so I couldn't check whether the image is mostly white, as the resulting data seems to suggest. Drawing the result by hand didn't seem like a good idea either without a width or height. If you do want to draw it, the way to go is a canvas element (ideally, construct a pixel data array and then use putImageData).
Now, let me outline a way for you to figure out the rest of the solution. What would probably work best is forking pdf.js, adding logging, generating a PDF that contains just a single JBIG2 image, and then observing how exactly the above array gets drawn to a canvas (and how and where the dimensions are determined).
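Once the dimensions are known, the canvas step mentioned above could look roughly like the following. This is only a sketch under stated assumptions: it assumes the decoded array is a packed bitmap with one bit per pixel, most significant bit first, each row padded to a whole byte, and that a set bit means white after the inversion. The names bitmapToRgba and drawBitmap are made up for illustration and are not part of pdf.js.

```javascript
// Assumption: `bits` is a packed 1-bit-per-pixel bitmap, most significant
// bit first, with every row padded to a whole number of bytes.
// Assumption: a set bit is drawn white, a cleared bit black.
function bitmapToRgba(bits, width, height) {
  var rowBytes = (width + 7) >> 3;
  var rgba = new Uint8ClampedArray(width * height * 4);
  for (var y = 0; y < height; y++) {
    for (var x = 0; x < width; x++) {
      var byte = bits[y * rowBytes + (x >> 3)];
      var bit = (byte >> (7 - (x & 7))) & 1;
      var value = bit ? 255 : 0;
      var o = (y * width + x) * 4;
      rgba[o] = rgba[o + 1] = rgba[o + 2] = value; // grey level
      rgba[o + 3] = 255;                           // fully opaque
    }
  }
  return rgba;
}

// Hypothetical helper: copies the converted pixels onto a <canvas> element.
function drawBitmap(canvas, bits, width, height) {
  canvas.width = width;
  canvas.height = height;
  var ctx = canvas.getContext('2d');
  var imageData = ctx.createImageData(width, height);
  imageData.data.set(bitmapToRgba(bits, width, height));
  ctx.putImageData(imageData, 0, 0);
}
```

If the decoder turns out to return one byte per pixel instead, only the bit-extraction lines in bitmapToRgba would need to change.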

The JBIG2 used in the PDF spec is a subset of the full JBIG2 specification (the so-called embedded profile). For example, in a PDF, a JBIG2 stream can reference only a single shared symbol dictionary. The full spec does not have this restriction, and it also defines a file format that brings all the pieces together (all of which is missing in pdf.js).
In summary, what you are looking for is technically possible (with some effort), but it is not simple.

Could you process it server side?
There's a good post on Stack Overflow about using Java or other tools: "Print PDF that contains JBIG2 images".

Related

How to capture broken image urls with javascript then pass the urls to php?

I have a site that displays a lot of external images and thumbnails etc, easily up to 100 on a single page. I crawl and index the urls to the images and save them in mysql and display them with this code inside simple loops from queries.
<img src="<?php echo $row['img_url']; ?>" onerror="this.onerror=null;this.src='http://example.com/image.jpg';" width="150" height="150">
I use that particular code to replace any broken image urls with a default image.
My question is: is it possible to use JavaScript's onerror, or something else, to capture the URL of each broken image so that I can pass it back to PHP and automatically delete it from my database?
I am not very good with JavaScript, and after searching I can't seem to find anything similar to what I am looking for. I mostly just find lots of posts on how to replace the broken image.
I am open to any ideas. The original image URLs come from $row['img_url'], as you can see in the code, but I know I need JavaScript or something similar to catch the errors and then get the URLs passed back to PHP, so that I can automate the deletion instead of just replacing broken images with a default image as my current code does.

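You can use file_exists to check, like so (note that file_exists only works with local paths and stream wrappers that support stat(); PHP's http:// wrapper does not, so this helps only if the images are reachable through the filesystem):
if (file_exists($row['img_url'])) {
    echo '<img src="'.$row['img_url'].'" onerror="this.onerror=null;this.src=\'http://example.com/image.jpg\';" width="150" height="150">';
}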

You are almost there. Simply replace this:
this.src='http://example.com/image.jpg'
with:
this.src='http://example.com/image-missing.php?url={$row['primaryKey']}&w=150&h=150'
Now, "image-missing.php" receives a reference to the missing image (here, the row's primary key) and the intended size of the replacement. What it needs to do is clean the database (after checking that the call is legitimate (1) and that the referenced row still exists (2)), and output a replacement image in the proper size.
(1) otherwise you have just handed the possibility of deleting your whole image database to the first guy with time on his hands.
(2) someone else might have loaded the same page, and called the same script one millisecond ago.

You should try something like this (the example uses vanilla JS, no jQuery or any third-party library):
var allImages = document.querySelectorAll('img');
for (var i = 0; i < allImages.length; i++) {
    allImages[i].addEventListener('error', function(e) {
        // Ask the server to delete the broken image URL
        var url = "<DELETION_URL>";
        var xhr = new XMLHttpRequest();
        xhr.open("DELETE", url + '?resource=' + encodeURIComponent(e.target.src), true);
        xhr.send();
    });
}
<DELETION_URL> should point to a PHP script that accepts DELETE (or at least POST) requests and reads the resource parameter. It then removes the image, provided that the resource is truly missing.
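A variation on the same idea, as a rough sketch: error events on images do not bubble, but they can be observed in the capture phase on document, so one listener can cover every image, including ones that fail before a per-image listener is attached. The endpoint /report-broken-image.php and the helper name createBrokenImageReporter are hypothetical, and the server must still verify that the image is really missing before deleting anything.

```javascript
// Returns a function that forwards each distinct broken URL to `send` once.
function createBrokenImageReporter(send) {
  var seen = Object.create(null);
  return function report(src) {
    if (!src || seen[src]) return false;
    seen[src] = true;
    send(src);
    return true;
  };
}

// Browser-only wiring; skipped where there is no DOM.
if (typeof document !== 'undefined') {
  var reportBrokenImage = createBrokenImageReporter(function (src) {
    var xhr = new XMLHttpRequest();
    // Hypothetical endpoint: replace with your own deletion script.
    xhr.open('POST', '/report-broken-image.php', true);
    xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xhr.send('resource=' + encodeURIComponent(src));
  });
  // Capture phase (third argument true), because "error" does not bubble.
  document.addEventListener('error', function (e) {
    if (e.target && e.target.tagName === 'IMG') {
      reportBrokenImage(e.target.src);
    }
  }, true);
}
```

For this to catch early failures, the script has to run before the images start loading, for example from the head of the page.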

Extracting EXIF data (specifically dateTime and GPSLatitude and GPSLongitude) with JavaScript

I have a program where a camera is set up to constantly take pictures (about every 10 seconds or so) and the picture is sent to a folder on my server and then another program refreshes that folder constantly so that I always just have the most recent picture in that particular folder.
An HTML document exists that also constantly refreshes, and references that picture location to get and display the newest image.
What I'm trying to do is extract the EXIF data (which I've verified exists by saving the image from the active webpage and looking at its properties). I want to display the DateCreated (I believe this is DateTime) and the latitude and longitude (I believe these are GPSLatitude and GPSLongitude).
I came across this library, exif-js, which seems like the go-to for most people trying to do this same thing in JavaScript. My code looks the same as the code at the bottom of the README.md file, except I changed out my img id="...." and variable names, (see below). It seems like it should work, but it's not producing any data. My empty span element just stays empty.
Is there an issue with the short time span that the page has before refreshing?
Thanks for any help!
Here's what my code currently looks like (just trying to get the DateTime info). I have also tried the GPSLatitude and GPSLongitude tags.
<!-- Library to extract EXIF data -->
<script src="vendors/exif-js/exif-js"></script>
<script type="text/javascript">
window.onload=getExif;
function getExif()
{
var img1 = document.getElementById("img1");
EXIF.getData(img1, function() {
var time = EXIF.getTag(this, "DateTime");
var img1Time = document.getElementById("img1Time");
img1Time.innerHTML = `${time}`;
});
var img2 = document.getElementById("img2");
EXIF.getData(img2, function() {
var allMetaData = EXIF.getALLTags(this);
var allMetaDataSpan = document.getElementById("Img2Time");
allMetaDataSpan.innerHTML = JSON.stringify(allMetaData, null, "\t");
});
}
</script>

Open your exif.js file, go to line 930, and change the code there to:
EXIF.getData = function(img, callback) {
if ((self.Image && img instanceof self.Image
|| self.HTMLImageElement && img instanceof self.HTMLImageElement)
&& !img.complete)
return false;
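The check above makes getData bail out when the image has not finished loading. An alternative that leaves the library untouched is to delay the call until the image is ready. The helper below is only a sketch, and whenImageReady is a made-up name; EXIF.getData and EXIF.getTag are the calls already used in the question.

```javascript
// Calls `callback(img)` right away if the image has finished loading,
// otherwise once its "load" event fires.
function whenImageReady(img, callback) {
  if (img.complete) {
    callback(img);
    return;
  }
  img.addEventListener('load', function onLoad() {
    img.removeEventListener('load', onLoad);
    callback(img);
  });
}

// Usage in the page from the question (browser only):
// whenImageReady(document.getElementById("img1"), function (img) {
//   EXIF.getData(img, function () {
//     document.getElementById("img1Time").textContent =
//       EXIF.getTag(this, "DateTime");
//   });
// });
```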

I know this may be already solved but I'd like to offer an alternative solution, for the people stumbling upon this question.
I'm the developer of a new library, exifr, that you might want to try. It's a maintained, actively developed library with a focus on performance, and it works in both Node.js and the browser.
async function getExif() {
let output = await exifr.parse(imgBuffer)
console.log('latitude', output.latitude) // converted by the library
console.log('longitude', output.longitude) // converted by the library
console.log('GPSLatitude', output.GPSLatitude) // raw value
console.log('GPSLongitude', output.GPSLongitude) // raw value
console.log('GPSDateStamp', output.GPSDateStamp)
console.log('DateTimeOriginal', output.DateTimeOriginal)
console.log('DateTimeDigitized', output.DateTimeDigitized)
console.log('ModifyDate', output.ModifyDate)
}
You can also try out the library's playground and experiment with images and their output, or check out the repository and docs.

Getting image dimensions with Angular vs Node.js

I am confused about the best way to discover the image dimensions, or the naturalWidth of images, given the url to the image, most often found in the src attribute of an <img> tag.
My goal is to take as input a URL to a news article and use machine learning to find the top 5 biggest pictures (.jpg, .png, etc. files) in the document. The problem with doing this on the front end is that I don't know of a way to use AJAX to HTTP GET the HTML from some random page on some random server, because of CORS-related issues.
However, using Node.js, or some server technology, I can make requests to get the HTML from other servers (as one would expect) but I don't know a way of getting the image sizes without downloading the images first. The problem is that, I want the downloaded images on the front-end, not the back-end, and therefore downloading images with Node.js is wasted effort, if it's just to check the image dimensions.
Has anyone experienced this exact problem before? Not sure how to proceed. As I said, my goals are to download images on the front-end, and keep the ones that are bigger than say 300px in width.

Both ways are OK; which is better depends greatly on what you need to achieve in terms of performance.
To me, the simplest way for you would be on the client side, where you only need a few lines of JavaScript:
var img = new Image();
img.onload = function() {
console.log(this.width + 'x' + this.height);
}
img.src = 'http://www.google.com/intl/en_ALL/images/logo.gif';
It is also possible on the server side, but you will need to install GraphicsMagick or ImageMagick. I'd go with GraphicsMagick as it is faster.
Once you have installed both the program and its module (npm install gm), you would do something like this to get the width and height:
var gm = require('gm');
// obtain the size of an image
gm('test.jpg')
.size(function (err, size) {
if (!err) {
console.log(size.width + 'x' + size.height);
}
});
This other module also looks good. I haven't used it, but it looks promising: https://github.com/netroy/image-size
To get the img URLs from the HTML string
You can load your HTML string using a simple HTTP request and then use a regexp capture group to extract the URLs. If you want to match globally (the g flag), i.e. more than once, while using capture groups, you need to call exec in a loop, because match ignores capture groups when matching globally.
This way you'll have all the sources in an array.
For example:
var m;
var urls = [];
var rex = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;
// this is your html string
var str = '<img src="http://example.com/one.jpg" />\n <img src="http://example.com/two.jpg" />';
while ( m = rex.exec( str ) ) {
urls.push( m[1] );
}
console.log( urls );
// [ "http://example.com/one.jpg", "http://example.com/two.jpg" ]
Hope it helps.
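Putting the client-side pieces together for the stated goal (keep images wider than about 300px), a sketch could look like this. The function names are made up for illustration, and the optional ImageCtor parameter exists only so the code can be exercised outside a browser; in a page you would simply omit it.

```javascript
// Resolves with {url, width, height}, or null if the image fails to load.
function measureImage(url, ImageCtor) {
  ImageCtor = ImageCtor || Image; // browser Image unless one is supplied
  return new Promise(function (resolve) {
    var img = new ImageCtor();
    img.onload = function () {
      resolve({ url: url, width: img.naturalWidth, height: img.naturalHeight });
    };
    img.onerror = function () { resolve(null); };
    img.src = url;
  });
}

// Drops failed loads and anything narrower than minWidth.
function keepWideImages(sizes, minWidth) {
  return sizes.filter(function (s) {
    return s !== null && s.width >= minWidth;
  });
}

// Measures every URL in parallel and keeps the wide ones.
function findWideImages(urls, minWidth, ImageCtor) {
  return Promise.all(urls.map(function (u) {
    return measureImage(u, ImageCtor);
  })).then(function (sizes) {
    return keepWideImages(sizes, minWidth);
  });
}
```

In the browser this would be called as findWideImages(urls, 300).then(...), with urls being the array produced by the regexp loop above.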

Detecting if the user drops the same file twice on a browser window

I want to allow users to drag images from their desktop onto a browser window and then upload those images to a server. I want to upload each file only once, even if it is dropped on the window several times. For security reasons, the information from File object that is accessible to JavaScript is limited. According to msdn.microsoft.com, only the following properties can be read:
name
lastModifiedDate
(Safari also exposes size and type).
The user can drop two images with the same name and last modified date from different folders onto the browser window. There is a very small but finite chance that these two images are in fact different.
I've created a script that reads in the raw dataURL of each image file, and compares it to files that were previously dropped on the window. One advantage of this is that it can detect identical files with different names.
This works, but it seems overkill. It also requires a huge amount of data to be stored. I could improve this (and add to the overkill) by making a hash of the dataURL, and storing that instead.
I'm hoping that there may be a more elegant way of achieving my goal. What can you suggest?
<!DOCTYPE html>
<html>
<head>
<title>Detect duplicate drops</title>
<style>
html, body {
width: 100%;
height: 100%;
margin: 0;
background: #000;
}
</style>
<script>
var body
var imageData = []
document.addEventListener('DOMContentLoaded', function ready() {
body = document.getElementsByTagName("body")[0]
body.addEventListener("dragover", swallowEvent, false)
body.addEventListener("drop", treatDrop, false)
}, false)
function swallowEvent(event) {
// Prevent browser from loading the dropped image in an empty page
event.preventDefault()
event.stopPropagation()
}
function treatDrop(event) {
swallowEvent(event)
for (var ii=0, file; file = event.dataTransfer.files[ii]; ii++) {
importImage(file)
}
}
function importImage(file) {
var reader = new FileReader()
reader.onload = function fileImported(event) {
var dataURL = event.target.result
var index = imageData.indexOf(dataURL)
var img, message
if (index < 0) {
index = imageData.length
console.log(dataURL)
imageData.push(dataURL, file.name)
message = "Image "+file.name+" imported"
} else {
message = "Image "+file.name+" imported as "+imageData[index+1]
}
img = document.createElement("img")
img.src = imageData[index] // copy or reference?
body.appendChild(img)
console.log(message)
}
reader.readAsDataURL(file)
}
</script>
</head>
<body>
</body>
</html>

Here is a suggestion (that I haven't seen being mentioned in your question):
Create a Blob URL for each file object in the FileList object, to be stored in the browser's URL store, and save the resulting URL string.
Then pass that URL string to a web worker (a separate thread), which uses a FileReader to read each file (accessed via the Blob URL string) in chunks, re-using one fixed-size buffer (almost like a circular buffer), to calculate the file's hash. There are simple, fast hashes such as crc32 whose state can be carried over from one chunk to the next, and they can often be combined with a vertical and horizontal checksum in the same loop (also carried over between chunks).
You might speed up the process by reading 32-bit (unsigned) values instead of 8-bit values through an appropriate typed-array view (a quarter of the loop iterations, so potentially up to 4 times faster). System endianness is not important, so don't waste resources on it!
Upon completion, the web worker passes the file's hash back to the main thread, which then simply performs your matrix comparison of [[fname, fsize, blobUrl, fhash] /* , etc */].
Pro
The re-used fixed buffer significantly brings down your memory usage (to any level you specify), the webworker brings up performance by using the extra thread (which doesn't block your main browser's thread).
Con
You'd still need a server-side fallback for browsers with JavaScript disabled (you might add a hidden field to the form and set its value using JavaScript as a JavaScript-enabled check, to lower server-side load). However, even then you'd still need the server-side fallback to safeguard against malicious input.
Usefulness
So, no net gain? Well, if there is a reasonable chance that the user might upload duplicate files (or just uses them in a web-based app), then you have saved the bandwidth that would otherwise be wasted just to perform the check. That is quite an (ecological/financial) win in my book.
Extra
Hashes are prone to collision, period. To lower the (realistic) chance of collision you'd select a more advanced hash-algo (most are easily carry-able in chunked mode). Obvious trade-off for more advanced hashes is larger code-size and lower speed (higher CPU usage).
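To make the "carry-able hash" idea concrete, here is a sketch of a standard CRC-32 whose running state is passed from chunk to chunk. The function names are made up for illustration. For brevity, hashBlob reads each slice with Blob.slice and Blob.arrayBuffer rather than a FileReader with one re-used buffer, so it shows the chunking and carrying but not the fixed-buffer optimisation, and it would still need to be moved into a worker.

```javascript
// Lookup table for the standard reflected CRC-32 polynomial 0xEDB88320.
var CRC_TABLE = (function () {
  var table = new Uint32Array(256);
  for (var n = 0; n < 256; n++) {
    var c = n;
    for (var k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c;
  }
  return table;
})();

function crc32Start() { return 0xFFFFFFFF; }

// Feeds one chunk of bytes into the running state and returns the new state.
function crc32Update(crc, bytes) {
  for (var i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return crc >>> 0;
}

function crc32Finish(crc) { return (crc ^ 0xFFFFFFFF) >>> 0; }

// Hashes a File or Blob chunk by chunk, never holding more than one chunk.
async function hashBlob(blob, chunkSize) {
  var crc = crc32Start();
  for (var offset = 0; offset < blob.size; offset += chunkSize) {
    var buffer = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    crc = crc32Update(crc, new Uint8Array(buffer));
  }
  return crc32Finish(crc);
}
```

The resulting number can be stored next to the file name and size in place of the full dataURL, with the caveat about collisions described above.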

Use FileAPI to download big generated data file

The JavaScript process generates a lot of data (200-300MB). I would like to save this data for further analysis but the best I found so far is saving using this example http://jsfiddle.net/c2U2T/ which is not an option for me, because it looks like it requires all the data being available before starting the downloading. But what I need is something like
var saver = new Saver();
saver.save(); // The Save As ... dialog appears
saver.onaccepted = function () { // user accepted saving
for (var i = 0; i < 1000000; i++) {
saver.write(Math.random());
}
};
Of course, instead of the Math.random() will be some meaningful construction.

@dader - I would build upon dader's example.
Use the HTML5 FileSystem API, but instead of writing each and every line to the file (more IO than it is worth), batch some of the lines in memory in a JavaScript object/array/string, and only write them to the file when they reach a certain threshold. You are thus appending to a local file as the process chugs along (which makes it easy to pause/restart/stop, etc.).
Of note is the following, which is an example of how you can spawn the dialog to request the amount of storage that you would need (it sounds large). Tested in Chrome:
navigator.persistentStorage.queryUsageAndQuota(
function (usage, quota) {
var availableSpace = quota - usage;
var requestingQuota = args.size + usage;
if (availableSpace >= args.size) {
window.requestFileSystem(PERSISTENT, availableSpace, persistentStorageGranted, persistentStorageDenied);
} else {
navigator.persistentStorage.requestQuota(
requestingQuota, function (grantedQuota) {
window.requestFileSystem(PERSISTENT, grantedQuota - usage, persistentStorageGranted, persistentStorageDenied);
}, errorCb
);
}
}, errorCb);
When you are done you can use Javascript to open a new window with the url of that blob object that you saved which you can retrieve via: fileEntry.toURL()
OR - when it is done crunching you can just display that URL in an html link and then they could right click on it and do whatever Save Link As that they want.
But this is something that is new and cool that you can do entirely in the browser without needing to involve a server in any way at all. Side note, 200-300MB of data generated by a Javascript Process sounds absolutely huge... that would be a concern for whether you are storing the "right" data...
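The batching suggested in this answer can be sketched independently of any storage API. BatchingWriter is a made-up name, and the sink callback stands in for whatever actually persists a batch (for example, a function that appends a Blob through a FileWriter). Because such writes are asynchronous, a real sink would also need to queue batches rather than start overlapping writes.

```javascript
// Collects lines in memory and hands them to `sink` in larger batches.
function BatchingWriter(flushThreshold, sink) {
  this.flushThreshold = flushThreshold; // characters to buffer before flushing
  this.sink = sink;                     // function (text) that persists a batch
  this.pending = [];
  this.pendingLength = 0;
}

BatchingWriter.prototype.write = function (line) {
  var text = String(line) + '\n';
  this.pending.push(text);
  this.pendingLength += text.length;
  if (this.pendingLength >= this.flushThreshold) {
    this.flush();
  }
};

// Writes out whatever is buffered; call once more when generation ends.
BatchingWriter.prototype.flush = function () {
  if (this.pending.length === 0) return;
  this.sink(this.pending.join(''));
  this.pending = [];
  this.pendingLength = 0;
};
```

With the loop from the question, that would be writer.write(Math.random()) inside the loop and a final writer.flush() after it.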

What you are actually trying to do is a kind of streaming, and FileAPI is not suited for that task. Instead, I can suggest two options:
The first is to use XHR (i.e. Ajax), splitting your data into several chunks that are sent sequentially to the server, each chunk in its own request along with an id (to identify the stream) and a position index (to identify the chunk's position). I wouldn't recommend that, since it adds work to break up and reassemble the data, and since there's a better solution.
The second way is to use the WebSocket API. It allows you to send data sequentially to the server as it is generated, following a usual stream API. I think this is definitely what you need.
This page may be a good place to start at : http://binaryjs.com/
That's all folks !
EDIT considering your comment :
I'm not sure I perfectly get your point, but what about HTML5's FileSystem API?
There are a couple of examples here: http://www.html5rocks.com/en/tutorials/file/filesystem/ including this sample, which appends data to an existing file. You can also create a new file, etc.:
function onInitFs(fs) {
fs.root.getFile('log.txt', {create: false}, function(fileEntry) {
// Create a FileWriter object for our FileEntry (log.txt).
fileEntry.createWriter(function(fileWriter) {
fileWriter.seek(fileWriter.length); // Start write position at EOF.
// Create a new Blob and write it to log.txt.
var blob = new Blob(['Hello World'], {type: 'text/plain'});
fileWriter.write(blob);
}, errorHandler);
}, errorHandler);
}
EDIT 2 :
What you're trying to do is not possible using JavaScript, as said on SO here. The author nonetheless suggests using a Java applet to achieve the needed behaviour.
In a nutshell, the HTML5 FileSystem API only provides a sandboxed filesystem, i.e. one located in some hidden directory of the browser. So if you want to access the true filesystem, using Java would be just fine considering your use case. I guess there is an interface between Java and JavaScript here.
But if you want to make your data only available from the browser ( constrained by same origin policy ), use FileSystem API.
