I'm attempting to use CefSharp (Offscreen) to get image information in a webpage. I'd like to avoid downloading the content twice (I'm aware I can pull the src string from an image tag, and then download the image again). Right now, this is my code:
using (var browser = new ChromiumWebBrowser("http://example.com"))
{
//All this does is wait for the entire frame to be loaded.
await LoadPageAsync(browser);
var res1 = await browser.EvaluateScriptAsync("document.getElementsByTagName('img')[0].src");
//res1.Result = the source of the image (A string)
var res2 = await browser.EvaluateScriptAsync("document.getElementsByTagName('img')[0]");
//This causes an APPCRASH on CefSharp.BrowserSubprocess.exe
}
The way I figure it, CefSharp is already downloading these images to render the webpage. I'd like to avoid making a second request for these images and instead pull them directly from the client. Is this possible? What are the limitations of the JavascriptResponse object, and why is it causing an APPCRASH here?
Some thoughts: I thought about base64 encoding the image and then pulling it out this way, but this would require me to generate a canvas and fill that canvas every time for each image I want, generate a base64 string, bring it to c# as a string, and then decode it back to an image. I don't know how efficient that would be, but I'm hoping there could be a better solution.
This is how I solved it:
result = await browser.EvaluateScriptAsync(@"
;(function() {
    var getDataFromImg = function(img) {
        var canvas = document.createElement('canvas');
        // Size the canvas to the image; otherwise it stays at the default 300x150.
        canvas.width = img.naturalWidth;
        canvas.height = img.naturalHeight;
        var context = canvas.getContext('2d');
        context.drawImage(img, 0, 0);
        var dataURL = canvas.toDataURL('image/png');
        return dataURL.replace(/^data:image\/(png|jpg);base64,/, '');
    }
    var images = document.querySelectorAll('.image');
    var finalArray = [];
    for (var i = 0; i < images.length; i++)
    {
        // I just filled in an array. Depending on what you're grabbing, you may want to fill
        // this with objects instead, with text to identify each image.
        finalArray.push(getDataFromImg(images[i]));
    }
    return finalArray;
})()");
//Helper function for below
private static string FixBase64ForImage(string image)
{
var sbText = new StringBuilder(image, image.Length);
sbText.Replace("\r\n", string.Empty);
sbText.Replace(" ", string.Empty);
return sbText.ToString();
}
//In C#, convert the base64 data to a memory stream, and then load the image from that.
var bitmapData = Convert.FromBase64String(FixBase64ForImage(image));
var streamBitmap = new MemoryStream(bitmapData);
var sourceImage = (Bitmap) Image.FromStream(streamBitmap);
Try executing the JavaScript from "How to get base64 encoded data from html image".
CefSharp should have the FileReader API.
Then you can have the EvaluateScriptAsync call return the base64-encoded image data.
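A minimal sketch of that idea, under a few assumptions: the image can be re-read with `fetch` (often served from the browser cache, though that is not guaranteed), and the base64 text is what gets handed back to the host. The names `bytesToBase64` and `fetchImageAsBase64` are hypothetical, and the sketch uses `fetch` plus `btoa` rather than `FileReader`, which would serve equally well. Whether `EvaluateScriptAsync` waits for a returned Promise depends on the CefSharp version, so treat that as an assumption too.

```javascript
// Convert raw bytes to a base64 string in chunks, so large images do not
// exceed the argument limit of String.fromCharCode.apply.
function bytesToBase64(bytes) {
  var CHUNK = 0x8000;
  var binary = '';
  for (var i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Hypothetical helper (browser context): re-read an <img> by its src and
// resolve to base64 text. Cross-origin images may be blocked by CORS.
function fetchImageAsBase64(img) {
  return fetch(img.src)
    .then(function (response) { return response.arrayBuffer(); })
    .then(function (buffer) { return bytesToBase64(new Uint8Array(buffer)); });
}
```

Unlike the canvas route, this returns the original encoded file bytes rather than a re-encoded PNG.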
I have a python server that sends a frame of RGB data to my JS script via a websocket. The frame is encoded using simplejpeg python package.
The code looks like this:
jpeg_frame = simplejpeg.encode_jpeg(image=color_frame, quality=85,colorspace='RGB', colorsubsampling='444', fastdct=True)
jpeg_frame is passed to the websocket and sent to the JS script.
On the JS side however, I would like to decompress the image and have it in the form of a Uint8Array so that I can work with the data. The image does not have to be viewed.
The data received is in the form of an ArrayBuffer.
This is what I have so far.
socketio.on('colorFrame',(data)=>{
var mime = 'image/jpeg';
var a = new Uint8Array(data);
var nb = a.length;
if (nb < 4)
return null;
var binary = "";
for (var i = 0; i < nb; i++)
binary += String.fromCharCode(a[i]);
var base64 = window.btoa(binary);
var image = new Image();
image.src = 'data:' + mime + ';base64,' + base64;
var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');
ctx.drawImage(image,0,0)
image.onload = function(){
var image_data = ctx.getImageData(0,0, 960, 540);
console.log(image_data);
};
});
So far I have not been able to figure out how to decompress the image. I don't mind the inaccuracy of the lossy compression; I just want the image back at its original resolution and to be able to convert it to a Uint8Array.
What is the simplest way to get JPEG decoding working in a JS script?
There is an excellent article "Decoding a PNG Image in JavaScript" that answers the literal question. It's a non-trivial answer. (Note: I am not the author of the article)
However, based on your code, it looks like you might not care about actually decoding the image yourself. If you just want to display it, take a look at this Stack Overflow question.
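If the goal is the pixel data rather than display, one hedged alternative is to let the browser decode the JPEG and then read the pixels back from a canvas. The sketch below assumes a browser environment with `createImageBitmap`; the function names `rgbaToRgb` and `decodeJpegToRgb` are hypothetical. Note that the canvas is sized and drawn only after decoding has finished, whereas the snippet in the question draws before the image has loaded.

```javascript
// Drop the alpha channel from RGBA pixel data, giving packed RGB bytes.
function rgbaToRgb(rgba) {
  var pixels = rgba.length / 4;
  var rgb = new Uint8Array(pixels * 3);
  for (var i = 0; i < pixels; i++) {
    rgb[i * 3] = rgba[i * 4];
    rgb[i * 3 + 1] = rgba[i * 4 + 1];
    rgb[i * 3 + 2] = rgba[i * 4 + 2];
  }
  return rgb;
}

// Browser-only sketch: decode the JPEG bytes, then read the pixels.
async function decodeJpegToRgb(arrayBuffer) {
  var blob = new Blob([arrayBuffer], { type: 'image/jpeg' });
  var bitmap = await createImageBitmap(blob);
  var canvas = document.createElement('canvas');
  canvas.width = bitmap.width;   // size the canvas before drawing
  canvas.height = bitmap.height;
  var ctx = canvas.getContext('2d');
  ctx.drawImage(bitmap, 0, 0);
  var imageData = ctx.getImageData(0, 0, bitmap.width, bitmap.height);
  return {
    width: bitmap.width,
    height: bitmap.height,
    data: rgbaToRgb(imageData.data)
  };
}
```

Inside the `colorFrame` handler this would be called as `decodeJpegToRgb(data).then(function (frame) { ... })`, with `frame.data` being the Uint8Array.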
I have a custom Node.JS addon that transfers a jpg capture to my application which is working great - if I write the buffer contents to disk, it's a proper jpg image, as expected.
var wstream = fs.createWriteStream(filename);
wstream.write(getImageResult.imagebuffer);
wstream.end();
I'm struggling to display that image as an img.src rather than saving it to disk, something like
var image = document.createElement('img');
var b64encoded = btoa(String.fromCharCode.apply(null, getImageResult.imagebuffer));
image.src = 'data:image/jpeg;base64,' + b64encoded;
The data in b64encoded after the conversion IS correct, as I tried it on http://codebeautify.org/base64-to-image-converter and the correct image does show up. I must be missing something stupid. Any help would be amazing!
Thanks for the help!
Is that what you want?
// Buffer for the jpg data
var buf = getImageResult.imagebuffer;
// Create an HTML img tag
var imageElem = document.createElement('img');
// Just use the toString() method of your Buffer instance
// to get the data as a base64 string
imageElem.src = 'data:image/jpeg;base64,' + buf.toString('base64');
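The same idea wrapped as a small helper, as a sketch: it assumes a Node.js `Buffer` is available (as in Electron with Node integration), and the name `bufferToDataUri` is hypothetical.

```javascript
// Build a data URI from a Node.js Buffer (or any Uint8Array).
function bufferToDataUri(buf, mimeType) {
  var base64 = Buffer.from(buf).toString('base64');
  return 'data:' + mimeType + ';base64,' + base64;
}

// The first three bytes of a JPEG file encode to "/9j/".
var sample = bufferToDataUri(Buffer.from([0xff, 0xd8, 0xff]), 'image/jpeg');
console.log(sample); // data:image/jpeg;base64,/9j/
```

Encoding through `Buffer` also avoids `String.fromCharCode.apply(null, ...)`, which can fail on large buffers because every byte is passed as a separate function argument.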
facepalm...There was an extra leading space in the text...
var getImageResult = addon.getlatestimage();
var b64encoded = btoa(String.fromCharCode.apply(null, getImageResult.imagebuffer));
var datajpg = "data:image/jpg;base64," + b64encoded;
document.getElementById("myimage").src = datajpg;
Works perfectly.
You need to add img to the DOM.
Also, if you are generating the buffer in the main process, you need to pass it on to the renderer process of Electron.
I'm fighting with PDF.js. All I want is to avoid passing the PDF file in a GET parameter. Seriously, who does this today?
My problem is that I'm trying to load a PDF file into pdf.js using viewer.html and viewer.js. The file is served as a base64-encoded string. For testing, I'm embedding the base64 data in the HTML so that I have direct access to it from JavaScript.
These are the files I'm loading:
build/pdf.js
build/pdf.worker.js
web/viewer.js
No loading errors
var BASE64_MARKER = ';base64,';
var pdfAsArray = convertDataURIToBinary("data:application/pdf;base64, " + document.getElementById('pdfData').value);
pdfjsLib.getDocument(pdfAsArray).then(function (pdf) {
//var url = URL.createObjectURL(blob);
console.log(pdfjsLib);
pdf.getPage(1).then(function(page) {
// you can now use *page* here
var scale = 1.5;
var viewport = page.getViewport({ scale: scale, });
// Prepare canvas using PDF page dimensions.
var canvas = document.getElementById('viewer');
var context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
// Render PDF page into canvas context.
var renderContext = {
canvasContext: context,
viewport: viewport,
};
page.render(renderContext);
});
//pdfjsLib.load(pdf);
})
function convertDataURIToBinary(dataURI) {
var base64Index = dataURI.indexOf(BASE64_MARKER) + BASE64_MARKER.length;
var base64 = dataURI.substring(base64Index);
var raw = window.atob(base64);
var rawLength = raw.length;
var array = new Uint8Array(new ArrayBuffer(rawLength));
for (var i = 0; i < rawLength; i++) {
array[i] = raw.charCodeAt(i);
}
return array;
}
The console.log output here is:
{build: "d7afb74a", version: "2.2.228", getDocument: ƒ, LoopbackPort: ƒ, PDFDataRangeTransport: ƒ, …}
The PDF arrives correctly, and console.log shows what I expect. I can see it has 2 pages and more than 1 MB of data, so I think the code and the PDF are okay.
Now, I don't want to use a canvas. (I only found tutorials where people work with a canvas, not with viewer.html.) I'm not going to work with iframes, canvases, or objects. I just want viewer.js to load MY PDF and not any other one (example.pdf).
I want pdf.js to load the PDF I'm producing with PHP, and it should simply appear on load. PHP delivers the PDF base64-encoded.
I saw this entry in the pdf.js documentation: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#file
You can use raw binary data to open a PDF document: use Uint8Array instead of URL in the PDFViewerApplication.open call. If you have base64 encoded data, please decode it first -- not all browsers have atob or data URI scheme support. (The base64 conversion operation uses more memory, so we recommend delivering raw PDF data as typed array in first place.)
What a nice tip. But nobody says where you can access PDFViewerApplication. If I do this:
pdfjsLib.PDFViewerApplication.open(pdfAsArray);
I get an error like 'open is not a function' (I tried load() too).
Sorry for my bad English; I hope you understand my problem and can help me.
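The FAQ passage quoted above can be sketched roughly as follows. This is a hedged illustration rather than a confirmed fix: it assumes that web/viewer.js exposes the viewer as a global `window.PDFViewerApplication` (it is not a property of `pdfjsLib`), that the viewer has finished initializing, and that `open` accepts a typed array in the release in use (some releases expect an object such as `{ data: bytes }` instead). The names `base64ToBytes` and `openBase64InViewer` are hypothetical.

```javascript
// Decode a base64 string (without the "data:...;base64," prefix) into bytes.
function base64ToBytes(base64) {
  var raw = atob(base64.trim());
  var bytes = new Uint8Array(raw.length);
  for (var i = 0; i < raw.length; i++) {
    bytes[i] = raw.charCodeAt(i);
  }
  return bytes;
}

// Assumption: the stock viewer has been loaded and exposes a global
// PDFViewerApplication. Browser-only; not called here.
function openBase64InViewer(base64) {
  var app = window.PDFViewerApplication; // not pdfjsLib.PDFViewerApplication
  return app.open(base64ToBytes(base64));
}
```

With the markup from the question, the call would be `openBase64InViewer(document.getElementById('pdfData').value)`.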
I need to generate snapshots for SEO.
I am using Puppeteer (headless Chrome) for this purpose.
On the main page I have a canvas, which I start drawing on once the component has mounted (my main site is built with React).
The issue is that when I get the HTML from Puppeteer, the drawing on the canvas is not there.
In the Puppeteer code I wait until the content is loaded.
html = await page.content()
How can I make Puppeteer wait until the canvas has been painted?
page.content will only return the HTML representation of the DOM. To get the actual image of a canvas inside the DOM, you can use the function toDataURL. This will return the displayed image as a data URL string containing base64-encoded data.
Code sample
const dataUrl = await page.evaluate(() => {
const canvas = document.querySelector("#canvas-selector");
return canvas.toDataURL();
});
// dataUrl looks like this: "..."
const base64String = dataUrl.slice(dataUrl.indexOf(',') + 1); // get everything after the comma
const imgBuffer = Buffer.from(base64String, 'base64'); // decode the base64 text into raw bytes
fs.writeFileSync('image.png', imgBuffer);
The evaluate call will return the base64 encoded buffer of the image. You need to first remove the "data:...," from that and then you can put that into a buffer. The buffer can then be saved (or handled in any other way).
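A hedged sketch that combines the steps above with a wait for the drawing to finish. It assumes the page itself sets a flag, here the hypothetical `window.__canvasPainted`, once the canvas has been drawn; nothing in Puppeteer detects that automatically. The names `dataUrlToBuffer` and `captureCanvas` are also hypothetical, and the helper assumes a base64-encoded data URL.

```javascript
// Split a base64 data URL into its MIME type and decoded bytes.
function dataUrlToBuffer(dataUrl) {
  var comma = dataUrl.indexOf(',');
  if (dataUrl.slice(0, 5) !== 'data:' || comma === -1) {
    throw new Error('not a data URL');
  }
  var header = dataUrl.slice(5, comma); // e.g. "image/png;base64"
  return {
    mimeType: header.split(';')[0],
    buffer: Buffer.from(dataUrl.slice(comma + 1), 'base64')
  };
}

// Hypothetical: waits for a flag the application must set itself.
async function captureCanvas(page, selector) {
  await page.waitForFunction(function () {
    return window.__canvasPainted === true;
  });
  var dataUrl = await page.evaluate(function (sel) {
    return document.querySelector(sel).toDataURL();
  }, selector);
  return dataUrlToBuffer(dataUrl);
}
```

In the React component, the flag would be set right after the last drawing call, for example `window.__canvasPainted = true;`.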
I am just getting started with pdf.js and I am trying to load a pdf file from the raw pdf data. I have seen the code:
PDFJS.getPdf('cwpdf.pdf', function getPdfHelloWorld(data) {
...
});
But I am wondering if there is any way to load a pdf from the raw pdf data instead of from the filename. Is this possible?
I put together some complete code and was able to find the problem with the solution below:
var int8View = new Uint8Array(...); //populate int8View with the raw pdf data
PDFJS.getDocument(int8View).then(function(pdf) {
});
When using this solution I ran into the problem other users have seen (@MurWade and @user94154): the "stream must have data" error message. It looks like the problem is in the following line:
var int8View = new Uint8Array(...);
The array containing the data does not get properly created, since the data is not in the expected format. Therefore, this line works for some cases, but it might not work in the general case.
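One way to check whether the data is in the expected format before handing it to pdf.js is to look at the first bytes. The sketch below is an illustration under assumptions: the raw data arrives as a binary string (one character per byte), and the helper names `binaryStringToBytes` and `looksLikePdf` are hypothetical. A PDF file normally starts with the ASCII marker `%PDF-`, so data that is still base64-encoded or otherwise mangled fails the check.

```javascript
// Convert a binary string (one character per byte) into a Uint8Array.
function binaryStringToBytes(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Sanity check: does the data start with the ASCII marker "%PDF-"?
function looksLikePdf(bytes) {
  var marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (!bytes || bytes.length < marker.length) {
    return false;
  }
  for (var i = 0; i < marker.length; i++) {
    if (bytes[i] !== marker[i]) {
      return false;
    }
  }
  return true;
}
```

If `looksLikePdf` returns false, the bytes were most likely never decoded (for example, still base64 text) or the array is empty.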
I've put together a complete solution that seems to work better. It loads a PDF file and converts it to a raw PDF stream. This is there just for testing purposes; in a real-world example, the PDF stream will probably be received in a different fashion. You can examine the stream in a debugger, and it will show as plain text. Below is the key line of code that makes this sample work. Instead of converting the raw PDF stream to an array, pass it as data.
var docInitParams = { data: pdfraw };
Then proceed with loading the data. Below is the complete working sample of how to load a standard raw PDF stream and display it. I used the PDF.js hello world sample as a starting point. Please let me know in the comments if any clarification is necessary.
'use strict';
PDFJS.getDocument('helloworld.pdf').then(function(pdf) {
pdf.getData().then(function(arrayBuffer) {
var pdfraw = String.fromCharCode.apply(null, arrayBuffer);
var docInitParams = {
data: pdfraw
};
PDFJS.getDocument(docInitParams).then(function(pdfFromRaw) {
pdfFromRaw.getPage(1).then(function(page) {
var scale = 1.5;
var viewport = page.getViewport(scale);
var canvas = document.getElementById('the-canvas');
var context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
var renderContext = {
canvasContext: context,
viewport: viewport
};
page.render(renderContext);
});
});
});
});
Well, since no one else has answered, I will post my findings. I figured out that yes, it is possible to load a PDF file from the raw data. This can be done by passing a Uint8Array populated with the data in place of the URL where the PDF file is stored.
Example code to do this is below:
var int8View = new Uint8Array(...); //populate int8View with the raw pdf data
PDFJS.getDocument(int8View).then(function(pdf) {
});