PhantomJS render of individual frame not working properly - javascript

I am using PhantomJS to create a local copy of a website. I have a function that traverses the site's frame structure and grabs each frame's contents as it goes, storing them in a global array. This part is working fine at the moment. The problem is:
At each step I am attempting to convert the frame to a Base64 encoded image using
var temp = require('webpage').create();
temp.content = currpage.frameContent; //set the temp page to be the current frame
var b64 = temp.renderBase64('png');
If I simply export currpage.frameContent to a file, I can inspect its contents, and when I open the file in a browser it does indeed display what it is supposed to (ads, for the most part).
However, the b64 variable has no value, and no errors appear when the program runs.
I should also note that b64 is not always empty: depending on the site I am scraping, I sometimes do get a proper rendering of the frame.

After a while the issue was uncovered. Although we are switching directions with the project, I will post the fix for my problem.
To get temp to render, all I had to do was set temp's viewport size.
temp.viewportSize = {
width: 480,
height: 800
};
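Putting the pieces together, a minimal sketch of the fix might look like the following. The helper name renderFrameToBase64 and its parameters are made up for illustration, the 480x800 default is simply the size used above, and closing the temporary page afterwards is an assumption about what is wanted rather than part of the original fix.

```javascript
// Sketch only: render one frame's HTML to a Base64 PNG with PhantomJS.
// webpageModule is expected to be PhantomJS's require('webpage').
function renderFrameToBase64(webpageModule, frameContent, size) {
  var temp = webpageModule.create();
  // Set the viewport before rendering; without it the render came back empty.
  temp.viewportSize = size || { width: 480, height: 800 };
  temp.content = frameContent; // set the temp page to be the current frame
  var b64 = temp.renderBase64('PNG');
  temp.close(); // assumption: the temporary page is no longer needed
  return b64;
}
```

Inside the traversal this would be called as renderFrameToBase64(require('webpage'), currpage.frameContent).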

Related

Display pdf file over 2MB using embed element

I am making an application that brings up a preview of PDF files. Embedding with an embed element works well for small PDF files but fails for larger PDF files because of the size limits for data urls. I'm looking for a way to use the browser's native PDF viewer to view PDF files but without using data urls.
My code currently looks something like the following:
<script>
function addToCard(input) {
if (input.files.length <= 0) return;
let fileReader = new FileReader();
fileReader.onload = async function () {
pdfCard.src = fileReader.result;
};
fileReader.readAsDataURL(input.files[0]);
}
</script>
<input type=file oninput="addToCard(this)" />
<embed id=pdfCard style="width:100%;height:100%" />
Example. The original PDF is here.
You could use URL.createObjectURL() on the PDF. It also creates a URL representing the object; however, the difference between an object URL and a data URL is that, while a data URL contains the object itself, an object URL is a reference to the object, which is stored in memory. This means that object URLs are significantly shorter than data URLs and take less time to create.
There are two drawbacks to this approach that may prevent you from using it. The first is that an object URL will only work on the page on which it was created. Attempting to use an object URL on a different page will not work. If you need to access this URL anywhere other than the page it was created on, this approach will not work.
The second is that object URLs keep the object for which they were created stored in memory. You have to revoke the object URL when you are done using it with the URL.revokeObjectURL() method, otherwise it will cause a memory leak. This means that you might have to add some extra code that revokes the object URL once the PDF is loaded. This example may be helpful.
The implementation might look something like this:
function addToCard(input) {
if (input.files.length <= 0) return;
pdfCard.src = URL.createObjectURL(input.files[0])
// gonna have to call revokeObjectURL eventually...
}
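One way to handle the revocation mentioned above is to revoke the previous object URL whenever a new file is chosen. This is only a sketch: createPdfPreviewer is a hypothetical helper, and it assumes that keeping at most one URL alive per embed element is acceptable for the preview.

```javascript
// Sketch only: keep at most one object URL alive per <embed> element.
function createPdfPreviewer(embedEl) {
  var currentUrl = null;
  return function show(file) {
    // Release the previously previewed file before creating a new URL.
    if (currentUrl) URL.revokeObjectURL(currentUrl);
    currentUrl = URL.createObjectURL(file);
    embedEl.src = currentUrl;
    return currentUrl;
  };
}

// Usage in the page (pdfCard is the <embed> from the question):
// var showPdf = createPdfPreviewer(document.getElementById('pdfCard'));
// function addToCard(input) {
//   if (input.files.length > 0) showPdf(input.files[0]);
// }
```

The last URL still lives until the page unloads, which is when the browser cleans up remaining object URLs anyway.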

Extracting EXIF data (specifically dateTime and GPSLatitude and GPSLongitude) with JavaScript

I have a program where a camera is set up to constantly take pictures (about every 10 seconds or so) and the picture is sent to a folder on my server and then another program refreshes that folder constantly so that I always just have the most recent picture in that particular folder.
An HTML document exists that also constantly refreshes, and references that picture location to get and display the newest image.
What I'm trying to do is extract the EXIF data (which I've verified exists when I save the image from the active webpage and look at its properties). I want to display the DateCreated (I believe this is DateTime) and the Latitude and Longitude (I believe these are GPSLatitude and GPSLongitude).
I came across this library, exif-js, which seems like the go-to for most people trying to do this same thing in JavaScript. My code looks the same as the code at the bottom of the README.md file, except I changed out my img id="...." and variable names, (see below). It seems like it should work, but it's not producing any data. My empty span element just stays empty.
Is there an issue with the short time span that the page has before refreshing?
Thanks for any help!
Here's what my code currently looks like (just trying to get the DateTime info). I have also tried the GPSLatitude and GPSLongitude tags.
<!-- Library to extract EXIF data -->
<script src="vendors/exif-js/exif-js"></script>
<script type="text/javascript">
window.onload=getExif;
function getExif()
{
var img1 = document.getElementById("img1");
EXIF.getData(img1, function() {
var time = EXIF.getTag(this, "DateTime");
var img1Time = document.getElementById("img1Time");
img1Time.innerHTML = `${time}`;
});
var img2 = document.getElementById("img2");
EXIF.getData(img2, function() {
var allMetaData = EXIF.getAllTags(this);
var allMetaDataSpan = document.getElementById("Img2Time");
allMetaDataSpan.innerHTML = JSON.stringify(allMetaData, null, "\t");
});
}
</script>
Go into your exif.js file, go to line 930, and change it to:
EXIF.getData = function(img, callback) {
if ((self.Image && img instanceof self.Image
|| self.HTMLImageElement && img instanceof self.HTMLImageElement)
&& !img.complete)
return false;
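If the underlying cause is that the image has not finished loading when EXIF.getData is called (which is what the !img.complete check above suggests), an alternative that avoids editing the library is to wait for the image first. A rough sketch, where whenImageLoaded is a hypothetical helper and not part of exif-js:

```javascript
// Sketch only: resolve once an <img> has finished loading.
function whenImageLoaded(img) {
  return new Promise(function (resolve, reject) {
    if (img.complete) {
      resolve(img); // already loaded (or nothing left to wait for)
      return;
    }
    img.addEventListener('load', function () { resolve(img); }, { once: true });
    img.addEventListener('error', function () {
      reject(new Error('image failed to load'));
    }, { once: true });
  });
}

// Usage in the page, with the ids from the question:
// whenImageLoaded(document.getElementById('img1')).then(function (img) {
//   EXIF.getData(img, function () {
//     document.getElementById('img1Time').innerHTML = EXIF.getTag(this, 'DateTime');
//   });
// });
```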
I know this may be already solved but I'd like to offer an alternative solution, for the people stumbling upon this question.
I'm a developer of a new library, exifr, that you might want to try. It is a maintained, actively developed library with a focus on performance, and it works in both Node.js and the browser.
async function getExif() {
let output = await exifr.parse(imgBuffer)
console.log('latitude', output.latitude) // converted by the library
console.log('longitude', output.longitude) // converted by the library
console.log('GPSLatitude', output.GPSLatitude) // raw value
console.log('GPSLongitude', output.GPSLongitude) // raw value
console.log('GPSDateStamp', output.GPSDateStamp)
console.log('DateTimeOriginal', output.DateTimeOriginal)
console.log('DateTimeDigitized', output.DateTimeDigitized)
console.log('ModifyDate', output.ModifyDate)
}
You can also try out the library's playground and experiment with images and their output, or check out the repository and docs.

Detecting if the user drops the same file twice on a browser window

I want to allow users to drag images from their desktop onto a browser window and then upload those images to a server. I want to upload each file only once, even if it is dropped on the window several times. For security reasons, the information from File object that is accessible to JavaScript is limited. According to msdn.microsoft.com, only the following properties can be read:
name
lastModifiedDate
(Safari also exposes size and type).
The user can drop two images with the same name and last modified date from different folders onto the browser window. There is a very small but finite chance that these two images are in fact different.
I've created a script that reads in the raw dataURL of each image file, and compares it to files that were previously dropped on the window. One advantage of this is that it can detect identical files with different names.
This works, but it seems overkill. It also requires a huge amount of data to be stored. I could improve this (and add to the overkill) by making a hash of the dataURL, and storing that instead.
I'm hoping that there may be a more elegant way of achieving my goal. What can you suggest?
<!DOCTYPE html>
<html>
<head>
<title>Detect duplicate drops</title>
<style>
html, body {
width: 100%;
height: 100%;
margin: 0;
background: #000;
}
</style>
<script>
var body
var imageData = []
document.addEventListener('DOMContentLoaded', function ready() {
body = document.getElementsByTagName("body")[0]
body.addEventListener("dragover", swallowEvent, false)
body.addEventListener("drop", treatDrop, false)
}, false)
function swallowEvent(event) {
// Prevent browser from loading the dropped image in an empty page
event.preventDefault()
event.stopPropagation()
}
function treatDrop(event) {
swallowEvent(event)
for (var ii=0, file; file = event.dataTransfer.files[ii]; ii++) {
importImage(file)
}
}
function importImage(file) {
var reader = new FileReader()
reader.onload = function fileImported(event) {
var dataURL = event.target.result
var index = imageData.indexOf(dataURL)
var img, message
if (index < 0) {
index = imageData.length
console.log(dataURL)
imageData.push(dataURL, file.name)
message = "Image "+file.name+" imported"
} else {
message = "Image "+file.name+" imported as "+imageData[index+1]
}
img = document.createElement("img")
img.src = imageData[index] // copy or reference?
body.appendChild(img)
console.log(message)
}
reader.readAsDataURL(file)
}
</script>
</head>
<body>
</body>
</html>
Here is a suggestion (that I haven't seen being mentioned in your question):
Create a Blob URL for each File object in the FileList, to be stored in the browser's URL store, and save the URL strings.
Then pass each URL string to a web worker (a separate thread), which uses a FileReader to read the file (accessed via the Blob URL string) in chunked sections, re-using one fixed-size buffer (almost like a circular buffer), to calculate the file's hash. There are simple, fast hashes such as crc32 whose state can be carried over from chunk to chunk, and they can often be combined with a vertical and horizontal checksum in the same loop (also carry-able over chunks).
You might speed up the process by reading 32-bit (unsigned) values instead of 8-bit values using an appropriate 'bufferview', which cuts the number of reads to a quarter. System endianness is not important here, so don't waste resources on it!
Upon completion the web worker passes the file's hash back to the main thread/app, which then simply performs your matrix comparison of [[fname, fsize, blobUrl, fhash] /* , etc */].
Pro
The re-used fixed buffer significantly brings down your memory usage (to any level you specify), and the web worker improves performance by using an extra thread (which doesn't block the browser's main thread).
Con
You'd still need a server-side fallback for browsers with JavaScript disabled (you might add a hidden field to the form and set its value using JavaScript as a JavaScript-enabled check, so as to lower server-side load). However, even then, you'd still need a server-side check to safeguard against malicious input.
Usefulness
So, no net gain? Well, if there is a reasonable chance that the user uploads duplicate files (or just uses them in a web-based app), then you have saved the bandwidth that would otherwise be wasted just to perform the check. That is quite an (ecological/financial) win in my book.
Extra
Hashes are prone to collision, period. To lower the (realistic) chance of collision you'd select a more advanced hash algorithm (most are easily carry-able in chunked mode). The obvious trade-off for more advanced hashes is larger code size and lower speed (higher CPU usage).
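To make the "carry-able" idea concrete, here is a rough sketch of a CRC-32 whose running state is carried from one chunk to the next. It is not a full implementation of the suggestion above: it uses Blob.slice() and Blob.arrayBuffer() rather than a FileReader with one re-used buffer, it reads bytes rather than 32-bit values, and the function names and the 1 MiB default chunk size are made up for illustration. The same functions could be run inside a web worker.

```javascript
// Lookup table for the standard (reflected, polynomial 0xEDB88320) CRC-32.
var CRC_TABLE = (function () {
  var table = new Uint32Array(256);
  for (var n = 0; n < 256; n++) {
    var c = n;
    for (var k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Feed one chunk into a running CRC state and return the new state.
function crc32Update(crc, bytes) {
  for (var i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return crc;
}

// Hash a File/Blob chunk by chunk so only one chunk is in memory at a time.
async function crc32OfBlob(blob, chunkSize) {
  chunkSize = chunkSize || 1024 * 1024; // assumption: 1 MiB chunks
  var crc = 0xFFFFFFFF; // initial state
  for (var offset = 0; offset < blob.size; offset += chunkSize) {
    var buf = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    crc = crc32Update(crc, new Uint8Array(buf));
  }
  return (crc ^ 0xFFFFFFFF) >>> 0; // final unsigned 32-bit value
}
```

Two dropped files can then be treated as likely duplicates when their sizes and hashes both match, with the collision caveat above still applying.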

Javascript dynamically show local file

I have a local text file that is constantly being changed by other programs. I want to write an HTML and JavaScript based web page that shows the file's content dynamically. I searched on Google and found that most solutions require getting the text file via an HTML element. I wonder if there is a way to get the file via a fixed path (let's say it is a string holding the file's path) in JavaScript. I am using the JavaScript FileReader. Any help will be appreciated.
This is not possible using javascript running inside the browser. You will not be able to do anything outside the browser.
EDIT:
You could, though, run a Node.js server on localhost that performs the file operations you need. You could build an API so that the HTML page you load in the browser calls your server script to do the file operations.
Do you understand what I mean?
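As a rough illustration of that idea, a small Node.js server could return the file's current contents on every request. This is a sketch under assumptions: createFileServer is a made-up helper, the file name and port in the usage comment are placeholders, and the permissive CORS header is only there so a page opened from another origin (for example from disk) can fetch from it.

```javascript
var http = require('http');
var fs = require('fs');

// Sketch only: serve the current contents of one fixed file.
function createFileServer(filePath) {
  return http.createServer(function (req, res) {
    fs.readFile(filePath, 'utf8', function (err, text) {
      if (err) {
        res.writeHead(500, { 'Content-Type': 'text/plain; charset=utf-8' });
        res.end('Could not read file');
        return;
      }
      res.writeHead(200, {
        'Content-Type': 'text/plain; charset=utf-8',
        'Cache-Control': 'no-store',        // always return fresh content
        'Access-Control-Allow-Origin': '*'  // assumption: local use only
      });
      res.end(text);
    });
  });
}

// Usage (placeholder file name and port):
// createFileServer('message.text').listen(3000);
```

The page can then poll the server, for example with fetch('http://localhost:3000/') on a timer, and write the response text into an element.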
How much information does the text file hold? Depending on your scenario, it might be worth looking into JavaScript localStorage (see the W3Schools page on local storage). Would that help your situation?
What you can do is allow the user to choose the file of interest using a file input. Once done, the browser will have access to the file, even though the JS won't have access to the file's full path.
Once the user has chosen the file, you can reload it and refresh the view pretty-much as often as you please.
Here's a short demo, using a file input (<input type='file'/>) and an iframe. You can pick pretty much anything the browser will normally display, though there are limits on the size of the file that will work - due to the limit of the length of a URL - the file's data is turned into a data-url and that url is set as the source of the iframe.
As a demo, pick a file and then load it. Now, open the file in another program and change it. Finally, press the load button once again - the new content now fills the iframe. You can trigger the loading of the file by a timer or any other event in the page. As far as I'm aware, you cannot re-load it when it changes, since there's no notification from the OS - you have to use a button, timer, element event or whatever. Basically, you have to poll for changes.
<!DOCTYPE html>
<html>
<head>
<script>
function byId(e){return document.getElementById(e);}
window.addEventListener('load', onDocLoaded, false);
function onDocLoaded()
{
// Load the chosen file on demand, whenever the button is clicked.
byId('loadBtn').addEventListener('click', onLoadBtnClick, false);
}
// fileVar is an object as returned by <input type='file'>
// tgtElem is an <iframe> or <img> element - can be on/off screen (doesn't need to be added to the DOM)
function loadFromFile(fileVar, tgtElem)
{
var fileReader = new FileReader();
fileReader.onload = onFileLoaded;
fileReader.readAsBinaryString(fileVar);
function onFileLoaded(fileLoadedEvent)
{
var result,data;
data = fileLoadedEvent.target.result;
result = "data:";
result += fileVar.type;
result += ";base64,";
result += btoa(data);
tgtElem.src = result;
}
}
function onLoadBtnClick(evt)
{
var fileInput = byId('mFileInput');
if (fileInput.files.length != 0)
{
var tgtElem = byId('tgt');
var curFile = fileInput.files[0];
loadFromFile(curFile, tgtElem);
}
}
</script>
<style>
</style>
</head>
<body>
<button id='loadBtn'>Load</button><input id='mFileInput' type='file'/><br>
<iframe id='tgt'></iframe>
</body>
</html>
You can use Node.js to watch for file changes with fs.watchFile, if you just want to watch the file and its contents. You can run the following code with node, but it only logs the changed file's contents to your terminal.
var fs = require('fs');
fs.watchFile('message.text', function (curr, prev) { // fires when the file's stats change
  fs.readFile('message.text', function (err, data) { // re-read the file
    if (err) { console.error(err); return; } // e.g. the file was removed
    console.log(data.toString()); // log the file content
  });
});

How long does a Blob persist?

I'm trying to write a fail-safe program that uses the canvas to draw very large images (60 MB is probably the upper range, while 10 MB is the lower range). I have discovered long ago that calling the canvas's synchronous function toDataURL usually causes the page to crash in the browser, so I have adapted the program to use the toBlob method using a filler for cross-browser compatibility. My question is this: How long do Blob URLs using the URL.createObjectURL(blob) method last?
I would like to know if there's a way to cache the Blob URL that will allow it to last beyond the browser session in case somebody wants to render part of the image at one point, close the browser, and come back and finish it later by reading the Blob URL into the canvas again and resuming from the point at which it left off. I noticed that this optional autoRevoke argument may be what I'm looking for, but I'd like a confirmation that what I'm trying to do is actually possible. No code example is needed in your answer unless it involves a different solution, all I need is a yes or no on if it's possible to make a Blob URL last beyond sessions using this method or otherwise. (This would also be handy if for some reason the page crashes and it acts like a "restore session" option too.)
I was thinking of something like this:
function saveCache() {
var canvas = $("#canvas")[0];
canvas.toBlob(function (blob) {
/*if I understand correctly, this prevents it from unloading
automatically after one asynchronous callback*/
var blobURL = URL.createObjectURL(blob, {autoRevoke: false});
localStorage.setItem("cache", blobURL);
});
}
//assume that this might be a new browser session
function loadCache() {
var url = localStorage.getItem("cache");
if(typeof url=="string") {
var img = new Image();
img.onload = function () {
$("#canvas")[0].getContext("2d").drawImage(img, 0, 0);
//clear cache since it takes up a LOT of unused memory
URL.revokeObjectURL(url);
//remove reference to deleted cache
localStorage.removeItem("cache");
init(true); //cache successfully loaded, resume where it left off
};
img.onprogress = function (e) {
//update progress bar
};
img.onerror = loadFailed; //notify user of failure
img.src = url;
} else {
init(false); //nothing was cached, so start normally
}
}
Note that I am not certain this will work the way I intend, so any confirmation would be awesome.
EDIT just realized that sessionStorage is not the same thing as localStorage :P
Blob URL can last across sessions? Not the way you want it to.
The URL is a reference represented as a string, which you can save in localStorage just like any string. The location that URL points to is what you really want, and that won't persist across sessions.
When using URL.createObjectURL() in conjunction with the autoRevoke argument, the URL will persist until you call URL.revokeObjectURL() or "till the unloading document cleanup steps are executed." (steps outlined here: http://www.w3.org/TR/html51/browsers.html#unloading-document-cleanup-steps)
My guess is that those steps are being executed when the browser session expires, which is why the target of your blobURL can't be accessed in subsequent sessions.
Some other discourse on this: How to save the window.URL.createObjectURL() result for future use?
The above leads to a recommendation to use the FileSystem API to save the blob representation of your canvas element. When requesting the file system the first time, you'll need to request PERSISTENT storage, and the user will have to agree to let you store data on their machine permanently.
http://www.html5rocks.com/en/tutorials/file/filesystem/ has a good primer on everything you'll need.
