DOM Parser Chrome extension memory leak


The problem
I have developed an extension that intercepts web requests, gets the HTML of the page each request originated from, and processes it. I used DOMParser to parse the HTML and realised that it is causing massive memory leaks, which eventually cause the Chrome extension to crash.
This is the code that causes the issues.
https://gist.github.com/uche1/20929b6ece7d647250828c63e4a2ffd4
What I've tried
Dev Tools Recorded Performance
I recorded the Chrome extension while it was intercepting requests and noticed that each call to DOMParser.parseFromString created more nodes and documents, which were never destroyed.
Dev tools screenshot
https://i.imgur.com/pMY50kR.png
Task Manager Memory Footprint
I looked at Chrome's task manager and saw that the extension had a huge memory footprint that did not decrease over time (I expected garbage collection to kick in after a while). When the memory footprint gets too large, the extension crashes.
Task manager memory footprint screenshot
https://i.imgur.com/c8fLWCy.png
Heap snapshots
I took heap snapshots before and after, and the issue seems to originate from HTMLDocument objects that are allocated but never garbage collected.
Snapshot (before)
https://i.imgur.com/Rg2CRi6.png
Snapshot (after)
https://i.imgur.com/UQgLuT1.png
Expected outcome
I would like to understand why the DOMParser is causing such memory issues, why it isn't being cleaned up by the garbage collector, and what to do to resolve it.
Thanks

I have resolved the problem. It seems the DOMParser class, for some reason, kept references to the HTML documents it parsed in memory and didn't release them. Because my extension runs in the background, the problem is exacerbated.
The solution was to parse the HTML another way, using a template element:
let parseHtml = (html) => {
    let template = document.createElement('template');
    template.innerHTML = html;
    return template;
}
This helped resolve the issue.
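A minimal sketch of how the template approach could stand in for the DOMParser call from the question, here reading the page title. The name `extractTitle` and the optional `doc` parameter are assumptions added for illustration (the parameter only makes the function usable outside a browser). Note that parsing into a template drops the `<html>`, `<head>` and `<body>` wrappers, so the elements of interest are queried directly on `template.content`.

```javascript
// Hypothetical helper, not from the original post.
function extractTitle(html, doc = globalThis.document) {
  const template = doc.createElement('template');
  // The markup is parsed into template.content, an inert DocumentFragment.
  template.innerHTML = html;
  const titleEl = template.content.querySelector('title');
  // Return null instead of throwing when the page has no <title>.
  return titleEl ? titleEl.textContent : null;
}
```

In the listener from the question, this would be called as `extractTitle(html)` in place of the `parseFromString` and `querySelector` lines.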

You are basically replicating the entire DOM in memory and then never releasing the memory.
We get away with this in a client side app because when we navigate away, the memory used by the scripts on that page is recovered.
In a background script, that doesn't happen and is now your responsibility.
So set both parser and document to null when you are done using them.
chrome.webRequest.onCompleted.addListener(async request => {
    if (request.tabId !== -1) {
        let html = await getHtmlForTab(request.tabId);
        let parser = new DOMParser();
        let document = parser.parseFromString(html, "text/html");
        let title = document.querySelector("title").textContent;
        console.log(title);
        parser = null; // <----- DO THIS
        document = null; // <----- DO THIS
    }
}, requestFilter);

I cannot point to a confirmed bug report in Chromium, but we were also hit by the memory leak. If you are developing an extension, DOMParser will leak in background scripts on Chromium-based browsers, but not on Firefox.
We could not get any of the workarounds mentioned here to solve the leak, so we ended up replacing the native DOMParser with the linkedom library, which provides a drop-in replacement and works in the browser (not only in Node.js). It solves the leaks, so you might consider it, but there are some aspects that you need to be aware of:
It will not leak, but its initial memory footprint is higher than that of the native parser.
Performance is most likely slower (but I have not benchmarked it).
The DOM generated by its HTML parser might differ slightly from what Firefox or Chrome would produce. The effect is most visible in broken HTML, where the browsers attempt to error-correct it.
We tried jsdom first, which aims to be more compatible with the major browsers at the cost of a more complex codebase. Unfortunately, we found it difficult to make jsdom work in the browser (although it works well on Node.js).

Related

Can a website detect if any chrome extension is reading the DOM

I want to know from within javascript in a webpage if any chrome extension installed in Chrome is reading the DOM. I want to be able to do this even if the extension does not insert any HTML or script into the DOM. Is there some way I can do this? Basically I want to prevent scraping of the website content.
I see this option
http://blog.kotowicz.net/2012/02/intro-to-chrome-addons-hacking.html
but this requires that I know the ids of the extension. I want to be able to do this more natively across all extensions that are installed.
Is there some event I can get based on when a DOM element is ready or some DOM function is called?
Thanks in advance for your help.

There are lots of methods for reading the DOM, so catching everything would be difficult. However, one approach is to wrap individual DOM querying methods. For example:
const origFn = HTMLDocument.prototype.querySelector;
HTMLDocument.prototype.querySelector = function(...args) {
    const error = new Error();
    console.log(`querySelector was called. Args:`, args, ', Stack trace', error.stack);
    // Delegate to the saved original method (the name must match origFn above).
    return origFn.apply(this, args);
}
This will do a console.log whenever document.querySelector is called. I'm not 100% sure if the stack trace will reference Chrome extension JavaScript, but it's worth a shot. Also, it's not guaranteed that all prototypes will be overwriteable like this.
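The same idea can be generalised to several methods with a small helper. This is only a sketch: `wrapMethod` and its `onCall` callback are hypothetical names introduced here, not part of any browser API.

```javascript
// Hypothetical helper: replaces proto[name] with a wrapper that reports
// each call (method name, arguments, stack trace) before delegating.
function wrapMethod(proto, name, onCall) {
  const original = proto[name];
  if (typeof original !== 'function') return false; // nothing to wrap
  proto[name] = function (...args) {
    onCall(name, args, new Error().stack);
    return original.apply(this, args);
  };
  return true;
}

// In a page, this could be applied to several query methods at once:
// ['querySelector', 'querySelectorAll', 'getElementById']
//   .forEach((n) => wrapMethod(Document.prototype, n, console.log));
```

Keep in mind that Chrome content scripts run in an isolated world with their own JavaScript environment, so a patch installed by the page is unlikely to be triggered by DOM reads made from an extension's content script.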

iPhone crash for feeding base64 to webview

I am using UIImagePickerController to choose an image or video. Upon selection, I convert the resource into a base64 string and send that to a WKWebView.
NSData(contentsOfURL:(info[UIImagePickerControllerMediaURL] as? NSURL)!)?.base64EncodedDataWithOptions(NSDataBase64EncodingOptions.Encoding64CharacterLineLength)
I am sending the base64 string to the web view using evaluateJavaScript.
Here is the JS function
function showResource(base64,type){
    if (type == "image"){
        document.getElementById("div1").innerHTML="<img src='data:image/jpeg;base64,"+base64+"' width='100' height='100' alt='No Image'/>";
    }
    else{
        document.getElementById("div1").innerHTML="<video width='320' height='240' controls><source src='data:video/x-m4v;base64,"+base64+"'></video>";
    }
}
I am facing the following problems:
sometimes the web page in the web view becomes blank
sometimes the application crashes
sometimes the device goes into multiple reboots
I couldn't find any memory leak in the native code; I tried Instruments.
Try loading a 25-second video: the first time it won't crash, and Instruments shows no memory leak.
Try loading the same video again: the results are the same.
Try the same a third time: this time you'll see the device has gone offline.
Or, if we try to load a video that's 60 seconds long, it crashes the first time.
I am not sure what's causing the problem.
As the page is turning blank, I thought it could be a JavaScript memory leak.
But I don't think the above function can cause a memory leak: we're assigning the new base64 to the same variable, so the older base64 should have been released from memory, and moreover JavaScript is a garbage-collected language.
So this is contradictory.
Alternatively, the base64 conversion might take so much memory that iOS crashes the app or, in the worst case, the iPhone itself; but if that is the case, why does the HTML page sometimes turn blank?
So this is also contradictory.
Any help is appreciated !
Update:
So far, my research suggests that the problem is a memory leak in JavaScript.

If Xcode prints "pointer being freed was not allocated * set a breakpoint in malloc_error_break to debug", try turning off the Safari web viewer.

Browser crashes on heavy single page application

We have a big single page application, that started to crash from time to time. We were trying to debug it for a while now, but unfortunately, still no results. We used traditional debugging tools, but they were not very useful - perhaps not used correctly.
The app seems to crash most often on Safari, it doesn't crash that often in Chrome, but it still does, so I can't rule out a problem with browser(s).
I have managed to get this crash report, which you can find at the end of this question, unfortunately I don't know what to look for in it. I know it's huge and I'm just throwing it at you saying "here, find a bug", but could you possibly have a look at it and give me some hint what might be wrong or what should I focus on in the report?
Here is the crash report http://pastebin.com/bNxpuS6T
Thanks

What I can see from the crash report and the source code is that your JavaScript code was trying to destroy some DOM objects while still iterating over them, which is the reason for the crash.
I guess you may want to check if any timer associated with the idle tabs is still active.
DETAILS:
The WebKit crashed at
1 com.apple.WebCore 0x00007fff83cace2d WebCore::ScriptExecutionContext::willDestroyActiveDOMObject(WebCore::ActiveDOMObject*) + 45
where the source code is:
void ScriptExecutionContext::willDestroyActiveDOMObject(ActiveDOMObject* object)
{
    ASSERT(object);
    if (m_iteratingActiveDOMObjects)
        CRASH();
    m_activeDOMObjects.remove(object);
}
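To make the failure mode concrete, the guard in the WebKit snippet can be mirrored in plain JavaScript. This is an illustrative analogy only, not WebKit code; the class `ActiveObjectRegistry` and its members are hypothetical names chosen for this sketch.

```javascript
// Illustrative analogy of the guard above; not WebKit code.
class ActiveObjectRegistry {
  constructor() {
    this.objects = new Set();
    this.iterating = false; // plays the role of m_iteratingActiveDOMObjects
  }

  add(obj) {
    this.objects.add(obj);
  }

  forEach(callback) {
    this.iterating = true;
    try {
      for (const obj of this.objects) callback(obj);
    } finally {
      this.iterating = false;
    }
  }

  remove(obj) {
    // Removing an object while the set is being walked is treated as fatal.
    if (this.iterating) {
      throw new Error('object destroyed while registry is being iterated');
    }
    this.objects.delete(obj);
  }
}
```

A callback passed to `forEach` that calls `remove` triggers the error, which corresponds to a timer or event handler destroying an active DOM object while the engine is still walking its list.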

Is it possible to make an object created in console persistent throughout the session?

Goal: making a standalone modular JavaScript debugging utility (includes custom DOM and event manipulation methods) to be used in the console (preferably Chrome) on any random sites of interest (with no backend access).
Usage: initially include module script directly via copy-paste to console or by creating a new script element pointing at myhomepage.com/shortandeasytoremember.js and call methods on the namespace from there on.
Problem: how to best make it persistent throughout the session on that webpage (so that I wouldn't need to reinclude it after every refresh) ?
Note: any additional browser compatibility is not required - as long as it works in the latest Chrome, it's all fine by me (but any effort in the compatibility department is always much appreciated for the sake of others). IF YOU READ THIS IN A FAR FUTURE and by then there exists a better solution than what is written down below, please take a moment to contribute with your superior knowledge.
What I currently have is an event listener on window.unload to save any state data to localStorage and a string to make it easier to reinclude the module after page reload using eval(localStorage.getItem('loadMyNS'));.
(function(ns, undefined){
    'use strict';
    //util methods on ns and few monkey patches for debugging ...
    var store = 'if(!window.MyNS){' +
        'var xyz9=document.createElement("script");' +
        'xyz9.src="http://myhomepage.com/shortandeasytoremember.js";' +
        'document.head.appendChild(xyz9);}';
    localStorage.setItem('loadMyNS', store);
    ns.save = function () {
        // and use localStorage for some more data
        // to be used by other methods after page reload
    };
    window.addEventListener('unload', ns.save, false);
}(window.MyNS = window.MyNS || {}));
(browsers with no localStorage or addEventListener may benefit from this article)
I've also considered using the same scheme with window.name instead of localStorage (as long as this still seems legit) just because writing eval(window.name) would take less typing ^^.
The trouble (one of them) I have with the "eval-script-tag-inclusion" is on the sites which block external non-https script sources. An ideal solution would be a globally accessible module which would live with state and methods included (and no initialization required after refresh) at least until I close the window (or overwrite the ref ofc).
If that is currently absolutely not possible, a lesser solution yet still superior to my current code would suffice.
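The body of `ns.save` is left empty in the snippet above. One way the save-and-restore part could look is sketched below; `createPersistentState` and the key `'MyNS.state'` are hypothetical names, and `storage` is assumed to be any object with `getItem`/`setItem`, such as `localStorage`. It does not solve the re-inclusion problem, only the state part.

```javascript
// Hypothetical helper: keeps a small state object and persists it as JSON.
function createPersistentState(storage, key) {
  let state = {};
  try {
    const parsed = JSON.parse(storage.getItem(key));
    // Only accept a stored object; anything else falls back to {}.
    if (parsed && typeof parsed === 'object') state = parsed;
  } catch (e) {
    state = {}; // missing or corrupt stored data is ignored
  }
  return {
    get: (name) => state[name],
    set: (name, value) => { state[name] = value; },
    save: () => storage.setItem(key, JSON.stringify(state)),
  };
}

// In the page, this could be wired up as:
// const s = createPersistentState(localStorage, 'MyNS.state');
// window.addEventListener('unload', s.save, false);
```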

Script stack space exhausted firefox

I am working with a large XML response from a web service. When I try to fetch it using a URL, after some time Firebug displays the error "script stack space quota is exhausted".
How can I resolve that?

It sounds like there is some recursion going on when processing the xml, that is essentially causing a stack overflow (by any name).
Thoughts:
work with less data
if you are processing the data manually, try to use less recursion, perhaps with a manual tail call or a queue/stack-based approach
consider json - then you can offload to the script host to rehydrate the object without any extra processing
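As a sketch of the stack-based option, a recursive walk over a tree can be rewritten with an explicit stack, so the depth of the data no longer consumes call-stack space. The node shape used here (objects with an optional `children` array) and the name `countNodes` are assumptions for illustration.

```javascript
// Counts the nodes of a tree without recursion by keeping
// the pending nodes on an explicit stack (a plain array).
function countNodes(root) {
  let count = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    count += 1;
    for (const child of node.children || []) {
      stack.push(child);
    }
  }
  return count;
}
```

The same loop structure applies to any per-node processing: replace `count += 1` with the work that the recursive function did for each node.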

Have you tried disabling Firebug?

As of Firefox 3, the available stack space has dropped from 4 MB to roughly 640 KB (I'm passing on word of mouth here).
Do you happen to be running FF3?
https://bugzilla.mozilla.org/show_bug.cgi?id=420874

I had a similar problem, maybe the same.
This can happen if you try to parse a huge chunk of html with jQuery $(html).
In my tests this only happened on Firefox 3.6.16 on Windows.
Firefox 4.0.1 on Ubuntu behaved much better. That probably has nothing to do with the OS; the script engine in 4.x is just much better.
Solution:
Instead of
var $divRoot = $(html);
I did
var $temp = $('<div style="display:none;">'); // .appendTo($('body')); // (*)
$temp.html(html); // using the client's html parsing
var $divRoot = $('> div', $temp); // or .children() or whatever
// $temp.remove(); // (*)
(*)
I remember that in some cases you need to add the temp node to the body, before jquery can apply any selectors. However, in this case it seemed to work just fine without that.
There was absolutely no difference on FF 4.x, but it did avoid the stack space overflow error on FF 3.x.
