PDF.js renders text inconsistently on canvas - javascript

I have a fairly large AngularJS app. I've integrated the PDF.js simpleviewer.html code from the examples folder into my app (didn't modify it). When I use PDF.js to render a single page within the container (not iframed), I get very inconsistent text rendering:
(Screenshots: the original rendering, inconsistent boldness, inconsistent fonts.)
When I use the same PDF file with the example code outside of my app, it always renders correctly. It's only when I integrate simpleviewer.html inside my app (not modifying a single line) that this issue appears.
Can you suggest what the root cause might be? Which areas should I check for conflicts?

I figured out that if I have multiple instances of the viewer on the same page, they all try to render PDFs at the same time, and that concurrency is what causes the bad text rendering. If there is just one instance, the text renders fine.
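For anyone hitting the same thing, a rough sketch of how the renders can be serialized (assuming a recent PDF.js build where render(...).promise resolves on completion; the function and variable names are just illustrative):

```javascript
// One shared promise chain so only one render task runs at a time,
// however many viewer instances are on the page.
let renderQueue = Promise.resolve();

function queueRender(page, canvas) {
  renderQueue = renderQueue.then(() => {
    const viewport = page.getViewport({ scale: 1 });
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    // Returning the render task's promise means the next queued render
    // cannot start until this one has finished drawing.
    return page.render({
      canvasContext: canvas.getContext('2d'),
      viewport: viewport
    }).promise;
  });
  return renderQueue;
}
```

Each viewer instance then calls queueRender(page, itsCanvas) instead of calling page.render(...) directly.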

Related

How to get element height without DOM?

I am building an SSR Angular app, and I have some logic that handles the placement of different templates, but in order to complete the logic I need the height of each template without using the DOM.
The project, in short
I am building an app that styles HTML to look like a PDF and then uses that HTML to create a PDF with Aspose.
Why SSR?
SSR is needed so that platforms other than a browser can get the HTML that is then rendered into a PDF. I.e., I have an API that can call the SSR app, get the full HTML, and print it into a PDF.
Why Angular?
The entire project is built on Angular, and I need to be able to reuse the component that renders the PDF-look-alike HTML for direct editing within the HTML.
Now that you are up to speed, back to the problem at hand
The HTML consists of multiple templates whose heights change according to the data added. In order to push sections down (or up) to the corresponding page, I need the height of each template (after data is added) to know whether the content exceeds the page height.
When navigating to the SSR app via a browser, everything renders fine because I can use the DOM and query each template to get the height of the element. But when accessing the app via an API or Postman, I get the HTML back, but the logic that handles the section placement is broken because it doesn't have the heights. So I need to get/calculate the height of each template WITHOUT the use of the DOM.
What I have found so far
I am pretty new at SSR, but anything that manipulates the DOM, or that only a browser engine provides, doesn't seem to be a viable way forward: @ViewChild, ngAfterViewInit, setTimeout, and other DOM APIs or functionality cannot be used in this case. I need to be able to prerender each template by the ngOnInit lifecycle step at the latest.
Using libraries like Mustache or Handlebars renders the HTML fine from the data that I give them, but I ONLY get the HTML, no dimensions at all.
I have also tried using createElement('div') and adding the element Mustache or Handlebars creates into the newly created div. The element is added to the div's child nodes fine, but no height gets calculated.
Maybe I am missing something using one of these libraries?
Is it even possible?
So far I am getting the impression that it is not possible to get an element's dimensions from code without touching the DOM?
Headless Chrome?
Is running Headless Chrome the only way to create a sort of virtual DOM that I can manipulate from code?
Sorry for all the text and the minimal amount of code, but this question is more about what my approach should be and whether I am heading in the wrong direction.
Thanks in advance!
The issue you describe is that the actual height of a DOM element is calculable only after the element has been inserted into the DOM.
For example, if you compile your Handlebars template and insert the result into a new div created with createElement('div'), the actual height will still be 0.
You will need to append your div to an existing DOM element in order to calculate its height.
A workaround I often use is to load the component whose height you need at the bottom of the page (or in a location the user is not currently looking at), calculate its height, and then remove it.
The whole process is almost instantaneous and gives you the actual height of the element.
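A minimal sketch of that workaround (the function name and the A4-ish width are just illustrative):

```javascript
function measureHeight(html) {
  const probe = document.createElement('div');
  probe.style.position = 'absolute';
  probe.style.visibility = 'hidden'; // gets layout, but is never visible
  probe.style.width = '210mm';       // fix the width so text wraps like the real page
  probe.innerHTML = html;
  document.body.appendChild(probe);  // height is only computed once it is in the DOM
  const height = probe.offsetHeight;
  document.body.removeChild(probe);
  return height;
}
```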
The fix for this was to go with Headless Chrome, at least in my situation.
Without a DOM you are not able to compute the height, and if you do not have a browser engine to render it for you, you'll need to spin up an engine that can, i.e. Headless Chrome.
Using Puppeteer was the easiest way for me.
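A rough sketch of that route (the URL and selector are placeholders for your SSR app):

```javascript
const puppeteer = require('puppeteer');

async function getTemplateHeight(url, selector) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);                    // load the SSR-rendered page
  const element = await page.$(selector);  // grab the template element
  const box = await element.boundingBox(); // { x, y, width, height } in CSS pixels
  await browser.close();
  return box.height;
}
```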

How to create a PDF export of part of the current page on the client side

I am trying to understand whether I could use PhantomJS (or any similar JS library) in my web application (in a JavaScript function called when the "ExportToPdf" button is hit) to create a PDF export of a part of the user's current page, for example the contents of a div, but with its current styles.
And I need it to be done on the client side.
I see in my searches that PhantomJS is used as a tool (a console app executable), but could I utilize it the way I need?
Maybe in combination with jsPDF or some similar JS library...
Or if you have any other solution to suggest... I have already tried jsPDF alone, and it does not pick up any styles at all (all Bootstrap class elements are missing). I also tried html2canvas + jsPDF to get an image of the desired div contents and put it on a PDF doc, but this does not seem to capture the full height of the div (only the part that is viewable in the browser gets exported).
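No answer was recorded for this question, but for context, the html2canvas + jsPDF combination it mentions looks roughly like the sketch below. Passing the element's scrollHeight through html2canvas's height/windowHeight options is one commonly suggested way to capture content below the fold; treat the option names and the classic jsPDF global as assumptions to verify against your library versions:

```javascript
function exportDivToPdf(el) {
  html2canvas(el, {
    height: el.scrollHeight,      // capture the whole element, not just the viewport
    windowHeight: el.scrollHeight
  }).then(canvas => {
    // Size the PDF page to the canvas so the snapshot fits exactly.
    const pdf = new jsPDF('p', 'pt', [canvas.width, canvas.height]);
    pdf.addImage(canvas.toDataURL('image/png'), 'PNG', 0, 0, canvas.width, canvas.height);
    pdf.save('export.pdf');
  });
}
```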

Include PDF as an image in HTML

When using LaTeX one can include a PDF as an image (this is usually done, e.g., with scientific papers, in which one can include a graph in PDF, so that it can be shown properly at different scales).
By using some tools like remark and MathJax one can create web pages with some LaTeX insertion.
Now, suppose I am interested in including a PDF as an image, as I usually do with plain LaTeX files.
I have tried to include my PDF using the <img> tag, and everything seemed to be working, until I realized that this only works in Safari (since Safari treats PDFs as images too). Consequently, it does not work in other browsers, such as Chrome or Firefox.
So, I tried to include the image with an <embed> tag, as shown here. However, what I obtain is a mini-PDF viewer inside the browser, with a grey frame all around the image I am including. I would instead like to include just the image, with no frames.
Is there a way of reproducing this behavior?
Thank you in advance.
I don't know what type of framework or CMS you use for your homepage, but I guess you have to use something like ImageMagick to render your PDF files, as is done in WordPress and other CMS systems.
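If server-side conversion is not an option, PDF.js (from the first question above) can also rasterize a page client-side onto a bare <canvas>, which shows just the page with no viewer frame around it. A rough sketch, with placeholder file and element names:

```javascript
pdfjsLib.getDocument('figure.pdf').promise.then(async pdf => {
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale: 2 }); // render at 2x for sharpness
  const canvas = document.getElementById('pdf-as-image');
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;
});
```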

Is it possible to convert a dynamic HTML page with a lot of javascript to a page without javascript?

I have a page with a lot of JavaScript. However, the page, once rendered, remains static: there are no moving things or special effects, etc. It should be possible to render the same HTML without any JavaScript at all, using only plain HTML and CSS. This is exactly what I want: a no-JavaScript version of this particular page. Naturally, I do not expect any dynamic behavior, so I am OK if buttons are dead, for example. I just want them rendered.
Now, I do not want an image. It needs to be HTML with CSS, and the CSS may be embedded in the HTML, which is fine too.
How can I do it?
EDIT
I am sorry, but I must not have been clear. My website relies on JavaScript and will not work without it. I do not want to check if it works without; I know it will not, and I really do not care about that. This is not what I am asking. I am asking about a specific page, which I want to grab as pure HTML + CSS. The fact that its dynamic nature is lost is of no importance.
EDIT2
There is a suggestion to grab the HTML from the DOM inspector. This is the first thing I did: in the Chrome developer tools I copied the root html element as HTML and saved it to a file. Of course, this does not work, because it still references the CSS files on the web. I guess I should have mentioned that I want it to work from the file system.
Next I tried saving the page as complete, with all its environment, using the browser's Save menu (the details are browser dependent). This saves the page and all the related files as a self-contained set, which can be opened from the file system. But the HTML has to be manually cleaned of all the JavaScript, which is tedious and error prone.
EDIT3
I seem to keep forgetting things. Images should be preserved, of course.
I have to do a similar task on a semi-regular basis. As yet I haven't found an automated method, but here's my workflow:
Open the page in Google Chrome (I imagine Firefox also has the relevant tools);
"Save Page As" (complete page), rename the html page to something nicer, delete any .js scripts which got downloaded, move everything into a single folder;
On the original page, open the Elements tab (DOM inspector), find and delete any tags which I know cause problems (Facebook "like" buttons for example); I also try to delete script tags at this stage because it's easier (a console snippet that automates this is sketched below). Then copy as HTML (right-click the <html> tag) and paste this into the downloaded HTML file, replacing its contents (remember to keep the DOCTYPE, which doesn't get copied);
Search all HTML files for any remaining script sections and delete them (also delete any noscript content), and search for " on" (with a leading space) to find and remove inline handlers (onload, onclick, etc.);
Search for images (src=, url()), find common patterns in the image filenames, and use regular expressions to rewrite them globally so they point at the downloaded copies; for example, replace src="/images/myimage.png" with src="myimage.png", since everything now sits in one folder. This needs to be applied to all HTML and CSS files. Also make sure the CSS files have the correct path (href). While doing this I usually replace all link hrefs with #;
Finally open the converted page in a browser (actually I tend to do this early on so that I can see if any change I make causes it to break), use the Console tab to check for 404 errors (images that didn't get downloaded or had a different name) and the Network tab to check if anything is still being loaded from the online version;
For any files which didn't get downloaded I go back to the original page and use the Resources tab to find them and download manually;
(Optional) Cull any content which isn't needed (tracker images/iframes, unused CSS, etc).
It's a big job. I'd love a tool which automated all that, but so far I haven't found one. The pages I download are quite badly made (shops) which have a lot of unusual code, so that's why there are so many steps. You might not need to follow every step.
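As a partial shortcut for the script-deletion steps above, something along these lines can be run in the DevTools console before copying the HTML; it is a sketch, not a complete tool:

```javascript
// Remove <script> and <noscript> elements entirely.
document.querySelectorAll('script, noscript').forEach(el => el.remove());

// Strip inline event handler attributes (onload, onclick, ...) from every element.
document.querySelectorAll('*').forEach(el => {
  Array.from(el.attributes)
    .filter(attr => attr.name.startsWith('on'))
    .forEach(attr => el.removeAttribute(attr.name));
});
```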

Downloaded aspx website does not display well

I have downloaded an aspx webpage and saved it as HTML. I open it in IE and Chrome, and it takes time to load, plus some parts are missing. All the text is there, but the onmouseover is not working properly and some CSS is not displaying correctly. Was the content not downloaded completely? I.e., is it missing some JavaScript, CSS, or something else?
I have done what you describe on many occasions for the purposes of putting together a prototype of new functionality in an existing application.
You will likely need to do a couple of things:
Ensure the paths to your JS and CSS resources are right (removing the unnecessary JS files, if any).
Also, you will likely need to update the paths in your CSS to any image resources used in your page.
