I am trying to extract content from web pages using my Firefox/Chrome/Safari extension. Capturing works fine, but when I capture full web pages it takes a long time and the UI gets blocked. I want to move the capture/DOM-parsing code to a different thread (a Web Worker), but Web Workers do not have access to the DOM. Is there a way I can work around this?
I am using the following code to inject the script into the web page:
function executeScript(script, messageKey, callback) {
    var wm = Components.classes["@mozilla.org/appshell/window-mediator;1"]
                       .getService(Components.interfaces.nsIWindowMediator);
    var mainWindow = wm.getMostRecentWindow("navigator:browser");
    mainWindow.gBrowser.selectedBrowser.messageManager.loadFrameScript(script, true);
    mainWindow.gBrowser.selectedBrowser.messageManager.addMessageListener(messageKey, callback);
}

executeScript("chrome://extension/content/contentscript.js", "onSelectionReceived", onSelection);
All the DOM processing happens inside this script, 'contentscript.js'.
If the work you are trying to perform needs to interact with the DOM, happens to take a long time, and you can't refactor it to avoid the DOM, then there is a way to do it without Web Workers
(because, as you discovered, Web Workers do not have access to the DOM).
Consider using array processing. The basic idea is to split the work you need to do into chunks and, after each chunk of work is completed, periodically give control back to the UI thread using a timer.
Here is a basic example of Array Processing:
function saveDocument(id){
    var tasks = [openDocument, writeText, closeDocument, updateUI];

    setTimeout(function(){
        // execute the next task
        var task = tasks.shift();
        task(id);

        // determine if there's more
        if (tasks.length > 0) {
            setTimeout(arguments.callee, 25);
        }
    }, 25);
}
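Applied to the original full-page-capture problem, the same pattern can be used to walk the DOM in slices. A rough sketch, where processNode, captureNode and finishCapture are hypothetical stand-ins for whatever per-node work the extension actually does:

// Chunk the DOM walk so control returns to the UI thread between slices.
function processInChunks(nodes, processNode, onDone) {
    var index = 0;
    var CHUNK_SIZE = 200; // tune so each slice stays well under one frame

    function step() {
        var end = Math.min(index + CHUNK_SIZE, nodes.length);
        for (; index < end; index++) {
            processNode(nodes[index]); // per-node capture logic (hypothetical)
        }
        if (index < nodes.length) {
            setTimeout(step, 25); // yield to the UI thread between chunks
        } else if (onDone) {
            onDone();
        }
    }

    setTimeout(step, 25);
}

// e.g. processInChunks(document.querySelectorAll('*'), captureNode, finishCapture);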
Premise: I need to apply changes through JavaScript because the free platform my site is based on does not allow changes to some parts of the template, so I can't insert the scripts in the source through the tag, but I can insert them in the script administration panel provided by the platform.
The scripts placed in the panel are normally loaded before the DOM tree is built, so I have to wrap them in $(function) to force them to run after the DOM tree has been built. But this way the function is called after the page has already been drawn, so the user first sees the old page and then the new page, creating a "single buffering" effect.
To summarize:
1 - the browser loads the page and displays it
2 - the browser loads the images and displays them
3 - the browser executes the script and displays the changes
Quite often point two takes about a second or more, so the old page is displayed for that time.
Can I force JavaScript to run code between point one and point two, so that point three becomes point two and point two becomes point three?
I tried:
$(function) {}
$(window).onload() {}
$(document).ready() {}
document.addEventListener('DOMContentLoaded', function() {});
None of these works the way I want.
Thank you in advance for your help.
Solution found:
// poll once every millisecond until the element exists
var timer = setInterval(step, 1);

function step() {
    // verify if the element is loaded yet
    if (document.getElementById('*element-id*')) {
        // add code here

        // stop polling
        window.clearInterval(timer);
    }
}
This works very well for me.
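If the target browsers support it, a MutationObserver can replace the 1 ms interval and react as soon as the element appears. A minimal sketch of that alternative (the element id is still a placeholder):

var observer = new MutationObserver(function () {
    if (document.getElementById('*element-id*')) {
        // add code here

        observer.disconnect(); // stop watching once the element exists
    }
});
observer.observe(document.documentElement, { childList: true, subtree: true });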
I am building AngularJS applications which have a common header with links to each of the applications:
App1
App2
Each application runs on its own subdomain, and when the user clicks a link in the header the page redirects to that application.
I have to track user actions on the links, e.g. onClick events, with Omniture (but the problem applies to Google Analytics as well). I add an onClick event that calls a function to send an event to Omniture, e.g.:
App1
trackLink() is a function of an AngularJS service; here is a brief implementation:
trackLink: function (eVar8Code) {
    s = this.getSVariable(s);
    s.eVar8 = eVar8Code;
    s.prop28 = s.eVar8;
    this.sendOmnitureMessage(s, send, false);
    return s;
},
The function executes asynchronously and returns right away; then the link's standard behaviour kicks in and the page is redirected to the URL defined in the "href" attribute. The new page loads very quickly (around 70 ms), but the AJAX request to Omniture has not been executed yet: it's all async.
I believe that using events for the links is the wrong approach and that one should rather use query parameters, e.g.:
App1
but it's hard to convince some people.
What is good practice for tracking events on links?
Change your function to include a short timeout (probably you'd also let it return false to suppress the default link behaviour, and redirect via the location object).
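A minimal sketch of that suggestion (trackAndRedirect, the 200 ms delay and the URL in the usage comment are placeholders, not the actual service code):

function trackAndRedirect(link, eVar8Code) {
    trackLink(eVar8Code); // fire the asynchronous tracking call
    setTimeout(function () {
        window.location.href = link.href; // redirect after a short grace period
    }, 200);
    return false; // suppress the default link behaviour
}
// usage in markup: <a href="https://app1.example.com" onclick="return trackAndRedirect(this, 'app1')">App1</a>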
Google Analytics has hit callbacks which are executed after the call to Google has been sent; you might want to check whether Adobe Analytics has something similar (this can be used to redirect after the tracking call has been made).
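For the Google Analytics side, analytics.js exposes hitCallback for exactly this. A rough sketch (the category/action names are made up):

function trackThenRedirect(link) {
    ga('send', 'event', 'header', 'click', link.href, {
        hitCallback: function () {
            window.location.href = link.href; // redirect once the hit has been sent
        }
    });
    // safety net in case the callback never fires (e.g. GA is blocked)
    setTimeout(function () { window.location.href = link.href; }, 500);
    return false;
}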
Whether event tracking and query parameters are interchangeable depends on your use case (they certainly measure different things). However, event tracking is a well-accepted way to do link tracking.
As @Eike Pierstorff suggested, I used the capabilities of the Adobe Analytics native library to set a delay (200 ms), which gives the call to Adobe Analytics a much better chance to succeed:
in HTML:
App1
in the AngularJS service:
sendOmnitureMessageWithDelay: function (s, element, eVar8Code) {
    var s = s_gi(s_account); // jshint ignore:line
    s.useForcedLinkTracking = true;
    s.forcedLinkTrackingTimeout = 200; // max number of milliseconds to wait for tracking to finish
    s.linkTrackVars = 'eVar8,prop28';
    s.eVar8 = eVar8Code;
    s.prop28 = eVar8Code;

    var target = element;
    if (!target) {
        target = true;
    }

    s.tl(target, 'o', s.eVar8, null, 'navigate');
    this.cleanOmnitureVars();
}
Here, element is the HTML element shown above.
It works pretty well in 99% of cases but has issues on slow and old devices, where the page loads before the call to Adobe has been made. It appears that there is no good solution to this problem, and there is no guarantee that events will always be recorded in Adobe Analytics (or Google Analytics).
I have a long-polling application written in JS that fetches XML files to update a web page. It fetches every 5 seconds, about 8 KB of data each time. I have had this web page open for about a week straight (although the computer goes to sleep in the evening).
When first opened, Chrome starts at about 33K of my PC's memory. After I left it open for a week, constantly updating while the PC was awake, it was consuming 384K for just one tab. This is a common way my application will be run (leaving the web page open for very long periods of time).
I feel like I am hindering Chrome's GC or am not doing some specific memory management (or maybe there is even a memory leak), but I don't really know how a memory leak would even be achievable in JS.
My app's paradigm is very typical, following this endless sequence:
function getXml(file){
    return $.get(file);
}

function parseXml(Xml){
    return {
        someTag : $(Xml).find('someTag').attr('val'),
        someOtherTag : $(Xml).find('someOtherTag').attr('val')
    };
}

function polling(modules){
    var defer = $.Deferred();

    function module1(){
        getXml('myFile.xml').done(function(xmlData){
            var data = parseXml(xmlData);
            modules.module1.update(data);
        }).fail(function(){
            alert('error getting XML');
        }).always(function(){
            module2();
        });
    }

    function module2(){
        getXml('myFile.xml').done(function(xmlData){
            var data = parseXml(xmlData);
            modules.module2.update(data);
        }).fail(function(){
            alert('error getting XML');
        }).always(function(){
            defer.resolve(modules);
        });
    }

    module1(); // kick off the first request
    return defer.promise(modules);
}

$(document).on('ready', function(){
    var myModules = {
        module1 : new Module(),
        module2 : new ModuleOtherModule()
    };

    // Begin polling
    var update = null;
    polling(myModules).done(function(modules){
        update = setInterval(polling.bind(this, modules), 5000);
    });
});
That's the gist of it... Is there some manual memory management I should be doing for apps built like this? Do I need to manage my variables or memory better? Or is this just a typical symptom of having a web browser (Chrome/Firefox) open for 1-2 weeks?
Thanks
Your code seems OK, but you didn't post what happens in the "update" method inside "modules". Why do I say that? Because that method could be what is leaking in your app.
I recommend two things:
Dig into the update method and look at how you are updating the DOM (be careful if there are a lot of nodes). Check whether the content you are updating has associated event listeners, because if you assign an event listener to a node and then remove the DOM node, the listener is still kept in memory until the JavaScript garbage collector can reclaim it (see the sketch after these two points).
Read this article. It's the best way to find your memory leak: https://developer.chrome.com/devtools/docs/javascript-memory-profiling
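As an illustration of the listener point above, here is a hedged sketch (the element ids and markup are made up, not taken from the question):

// Risky pattern: a fresh node and a fresh handler on every poll.
// If anything keeps a reference to the old node (a closure, a detached
// variable, a library cache), the node and its handler cannot be collected.
function updateRisky(container, html) {
    var node = document.createElement('div');
    node.innerHTML = html;
    node.addEventListener('click', function () {
        console.log('row clicked');
    });
    container.innerHTML = '';
    container.appendChild(node);
}

// Safer pattern: delegate one handler to a long-lived parent once,
// then freely replace the children on every poll.
var container = document.getElementById('module1'); // hypothetical container id
container.addEventListener('click', function (event) {
    if (event.target.matches('.row')) {
        console.log('row clicked', event.target.textContent);
    }
});
function update(html) {
    container.innerHTML = html;
}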
Hey guys, just testing our pages out using the grunt-phantomcss plugin (it's essentially a wrapper for PhantomJS & CasperJS).
We have some stuff on our sites that comes in dynamically (random profile images for users and random advertisements), so technically the page looks different each time we load it, meaning the build fails. We would like to be able to jump in and, using good ol' DOM API techniques, 'grey out'/hide these images so that Casper/Phantom doesn't see them and the build passes.
We've already looked at pageSettings.loadImages = false, and although that technically works, it also removes every image, meaning that even our non-ad, non-profile images get filtered out.
Here's a very basic sample test script (doesn't work):
casper.start( 'http://our.url.here.com' )
    .then(function(){
        this.evaluate(function(){
            var profs = document.querySelectorAll('.profile');
            profs.forEach(function( val, i ){
                val.style.opacity = 0;
            });
            return;
        });
        phantomcss.screenshot( '.profiles-box', 'profiles' );
    });
Would love to know how other people have solved this because I am sure this isn't a strange use-case (as so many people have dynamic ads on their sites).
Your script might actually work. The problem is that profs is a NodeList. It doesn't have a forEach function. Use this:
var profs = document.querySelectorAll('.profile');
Array.prototype.forEach.call(profs, function( val, i ){
    val.style.opacity = 0;
});
It is always a good idea to register for the page.error and remote.message events to catch such errors.
Another idea would be to use the resource.requested event handler to abort all the resources that you don't want loaded. It uses the underlying onResourceRequested PhantomJS callback.
casper.on("resource.requested", function(requestData, networkRequest){
    if (requestData.url.indexOf("mydomain") === -1) {
        // abort all resources that are not on my domain
        networkRequest.abort();
    }
});
If your page handles unloaded resources well, then this should be a viable option.
Is there any way to know how much of the page the browser has loaded, either by using JavaScript or native browser functions?
Based on the page status I want to build a progress bar.
I'm not 100% sure this will work, but.. here is the theory:
First of all, don't delay your JavaScript until the page has loaded; that is, don't wrap it in window.onload, document.ready, etc.
At the top of the page initialise a JavaScript variable called loaded (or similar) and set it to 0.
var loaded = 0;
Throughout the page, increment loaded at the points you consider to correspond to particular percentages.
For example, at the point in the code where you think half the page will have been loaded, set loaded = 50;.
And so on.
As I say, this is just a concept.
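A minimal sketch of that concept, assuming a hypothetical #progress-bar element sitting at the top of the body:

// Inline, at the very top of the page:
var loaded = 0;
function reportProgress(pct) {
    loaded = pct;
    var bar = document.getElementById('progress-bar'); // hypothetical element
    if (bar) { bar.style.width = pct + '%'; }
}

// Then, as inline scripts sprinkled through the document at your chosen milestones:
// ...after roughly half of the markup...
reportProgress(50);
// ...just before the end of the body...
reportProgress(100);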
Code:
function request() {
    showLoading(); // calls a function which displays the loading message
    myRequest = GetXmlHttpObject();
    url = 'path/to/script.php';
    myRequest.open('GET', url, true);
    myRequest.onreadystatechange = function() {
        if (myRequest.readyState == 4 && myRequest.status == 200) {
            clearLoading(); // calls a function which removes the loading message
            // show the response by, e.g. populating a div
            // with the response received from the server
        }
    };
    myRequest.send(null);
}
At the beginning of the request I call showLoading(), which uses JavaScript to dynamically add the equivalent of your preLoaderDiv. Then, when the response is received, I call clearLoading(), which dynamically removes it.
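For completeness, a hedged sketch of what showLoading()/clearLoading() might look like (the element id and text are made up):

function showLoading() {
    var div = document.createElement('div');
    div.id = 'pre-loader'; // hypothetical id
    div.textContent = 'Loading...';
    document.body.appendChild(div);
}

function clearLoading() {
    var div = document.getElementById('pre-loader');
    if (div) {
        div.parentNode.removeChild(div);
    }
}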
You'll have to determine it yourself, and to do that you'll need a way to measure it. One possibility is to place dummy elements along the page, know their total, and at each point count how many are already present. But that will only tell you how much of the DOM has been obtained, and that can be a very small part of the load time; more often than not, the browser is idle waiting for scripts and images.
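A rough sketch of that marker-element idea (the class name and the total are assumptions):

// Markers emitted throughout the HTML, e.g. <i class="load-marker"></i>
var TOTAL_MARKERS = 20; // however many markers you put in the page

var poll = setInterval(function () {
    var seen = document.getElementsByClassName('load-marker').length;
    var pct = Math.round(seen / TOTAL_MARKERS * 100);
    // update a progress bar element here; document.title used as a stand-in
    document.title = pct + '% parsed';
    if (seen >= TOTAL_MARKERS) {
        clearInterval(poll);
    }
}, 100);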