Load and execution sequence of a web page? - javascript

I have done some web-based projects, but I never thought much about the load and execution sequence of an ordinary web page. Now I need to know the details. It's hard to find answers on Google or SO, so I created this question.
A sample page is like this:
<html>
<head>
<script src="jquery.js" type="text/javascript"></script>
<script src="abc.js" type="text/javascript">
</script>
<link rel="stylesheet" type="text/css" href="abc.css">
<style>h2{font-weight:bold;}</style>
<script>
$(document).ready(function(){
$("#img").attr("src", "kkk.png");
});
</script>
</head>
<body>
<img id="img" src="abc.jpg" style="width:400px;height:300px;"/>
<script src="kkk.js" type="text/javascript"></script>
</body>
</html>
So here are my questions:
How does this page load?
What is the sequence of the loading?
When is the JS code executed? (inline and external)
When is the CSS executed (applied)?
When does $(document).ready get executed?
Will abc.jpg be downloaded? Or does it just download kkk.png?
I have the following understanding:
The browser loads the html (DOM) at first.
The browser starts to load the external resources from top to bottom, line by line.
If a <script> is met, the loading will be blocked and wait until the JS file is loaded and executed and then continue.
Other resources (CSS/images) are loaded in parallel and executed if needed (like CSS).
Or is it like this:
The browser parses the html (DOM) and gets the external resources in an array or stack-like structure. After the html is loaded, the browser starts to load the external resources in the structure in parallel and execute, until all resources are loaded. Then the DOM will be changed corresponding to the user's behaviors depending on the JS.
Can anyone give a detailed explanation about what happens when you've got the response of a html page? Does this vary in different browsers? Any reference about this question?
Thanks.
EDIT:
I did an experiment in Firefox with Firebug. And it shows as the following image:

Edit: It's 2022. If you are interested in detailed coverage on the load and execution of a web page and how the browser works, you should check out https://browser.engineering/ (open sourced at https://github.com/browserengineering/book)
According to your sample,
<html>
<head>
<script src="jquery.js" type="text/javascript"></script>
<script src="abc.js" type="text/javascript">
</script>
<link rel="stylesheet" type="text/css" href="abc.css">
<style>h2{font-weight:bold;}</style>
<script>
$(document).ready(function(){
$("#img").attr("src", "kkk.png");
});
</script>
</head>
<body>
<img id="img" src="abc.jpg" style="width:400px;height:300px;"/>
<script src="kkk.js" type="text/javascript"></script>
</body>
</html>
the execution flow is roughly as follows:
The HTML document gets downloaded
The parsing of the HTML document starts
HTML Parsing reaches <script src="jquery.js" ...
jquery.js is downloaded, parsed and run
HTML parsing reaches <script src="abc.js" ...
abc.js is downloaded, parsed and run
HTML parsing reaches <link href="abc.css" ...
abc.css is downloaded and parsed
HTML parsing reaches <style>...</style>
Internal CSS rules are parsed and defined
HTML parsing reaches <script>...</script>
Internal Javascript is parsed and run
HTML Parsing reaches <img src="abc.jpg" ...
abc.jpg is downloaded and displayed
HTML Parsing reaches <script src="kkk.js" ...
kkk.js is downloaded, parsed and run
Parsing of HTML document ends
Note that downloads may be asynchronous and non-blocking, depending on the browser's behaviour. For example, Firefox has a setting that limits the number of simultaneous requests per domain.
Also depending on whether the component has already been cached or not, the component may not be requested again in a near-future request. If the component has been cached, the component will be loaded from the cache instead of the actual URL.
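To see which components came from the cache on a given load, the Resource Timing API can help. The following is a minimal sketch, not a definitive test: it assumes that a `transferSize` of 0 together with a non-zero `decodedBodySize` suggests a cache hit, and the function name `describeResources` is made up for illustration. Cross-origin resources may report zeros for both fields, so those are labelled "unknown".

```javascript
// Heuristic classification of PerformanceResourceTiming-like entries.
// Assumption: transferSize === 0 with a non-empty body suggests a cache hit.
function describeResources(entries) {
  return entries.map(function (entry) {
    var source;
    if (entry.transferSize > 0) {
      source = "network";
    } else if (entry.decodedBodySize > 0) {
      source = "cache (likely)";
    } else {
      source = "unknown"; // e.g. cross-origin without timing details
    }
    return { name: entry.name, source: source };
  });
}

// In a browser console:
// console.table(describeResources(performance.getEntriesByType("resource")));
```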
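When parsing ends and the DOM is ready, the DOMContentLoaded event fires. This is when the $(document).ready() callback runs, so $("#img").attr("src","kkk.png"); is executed. So:
The DOM is ready and the ready callback is invoked.
Javascript execution hits $("#img").attr("src", "kkk.png");
kkk.png is downloaded and loads into #img
Note that $(document).ready() fires once the DOM has been fully parsed; it does not wait for images and other sub-resources. The window onload event fires later, when all page components have loaded. Because the request for abc.jpg is normally issued as soon as the parser reaches the <img> tag, both abc.jpg and kkk.png end up being requested. Read more about it: http://docs.jquery.com/Tutorials:Introducing_$(document).ready()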
Edit - This portion elaborates more on the parallel or not part:
By default, and from my current understanding, a browser usually processes each page with 3 components: the HTML parser, Javascript/DOM, and CSS.
The HTML parser is responsible for parsing and interpreting the markup language and thus must be able to make calls to the other 2 components.
For example when the parser comes across this line:
<a href="#" onclick="alert('test');return false;" style="font-weight:bold">a hypertext link</a>
The parser will make 3 calls, two to Javascript and one to CSS. Firstly, the parser will create this element and register it in the DOM namespace, together with all the attributes related to this element. Secondly, the parser will call to bind the onclick event to this particular element. Lastly, it will make another call to the CSS thread to apply the CSS style to this particular element.
The execution is top down and single threaded. Javascript may look multi-threaded, but in fact it is single threaded. This is why parsing of the main HTML page is suspended while an external javascript file is loaded.
However, CSS files can be downloaded simultaneously because CSS rules are always being applied, meaning that elements are always repainted with the freshest CSS rules defined, which makes CSS non-blocking.
An element is only available in the DOM after it has been parsed. Thus a script that works with a specific element is always placed after that element, or within the window onload event.
A script like this will fail (with jQuery):
<script type="text/javascript">/* <![CDATA[ */
alert($("#mydiv").html());
/* ]]> */</script>
<div id="mydiv">Hello World</div>
This is because when the script runs, the #mydiv element does not exist yet. Instead this would work:
<div id="mydiv">Hello World</div>
<script type="text/javascript">/* <![CDATA[ */
alert($("#mydiv").html());
/* ]]> */</script>
OR
<script type="text/javascript">/* <![CDATA[ */
$(document).ready(function(){
alert($("#mydiv").html());
});
/* ]]> */</script>
<div id="mydiv">Hello World</div>
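For pages that do not use jQuery, roughly the same effect can be had with the standard `readyState` property and the `DOMContentLoaded` event. This is a minimal sketch under stated assumptions: `whenReady` is a hypothetical helper name, and the document is passed in as a parameter only so the helper can be tried outside a browser.

```javascript
// Run `callback` once the document has been parsed.
// `doc` is a parameter so the helper can be exercised with a stand-in object.
function whenReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Parsing is still in progress: wait for DOMContentLoaded.
    doc.addEventListener("DOMContentLoaded", function () {
      callback();
    }, { once: true });
  } else {
    // "interactive" or "complete": the DOM is already available.
    callback();
  }
}

// Browser usage (the #mydiv id comes from the example above):
// whenReady(document, function () {
//   alert(document.getElementById("mydiv").innerHTML);
// });
```

With this helper the script can sit before or after the element, because the callback only runs once the element has been parsed.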

1) HTML is downloaded.
2) HTML is parsed progressively. When a request for an asset is reached the browser will attempt to download the asset. A default configuration for most HTTP servers and most browsers is to process only two requests in parallel. IE can be reconfigured to download an unlimited number of assets in parallel. Steve Souders has been able to download over 100 requests in parallel on IE. The exception is that script requests block parallel asset requests in IE. This is why it is highly suggested to put all JavaScript in external JavaScript files and put the request just prior to the closing body tag in the HTML.
3) Once the HTML is parsed the DOM is rendered. CSS is rendered in parallel to the rendering of the DOM in nearly all user agents. As a result it is strongly recommended to put all CSS code into external CSS files that are requested as high as possible in the <head></head> section of the document. Otherwise the page is rendered up to the occurrence of the CSS request position in the DOM and then rendering starts over from the top.
4) Only after the DOM is completely rendered and requests for all assets in the page are either resolved or time out does JavaScript execute from the onload event. IE7, and I am not sure about IE8, does not time out assets quickly if an HTTP response is not received from the asset request. This means an asset requested by JavaScript inline to the page, that is JavaScript written into HTML tags that is not contained in a function, can prevent the execution of the onload event for hours. This problem can be triggered if such inline code exists in the page and fails to execute due to a namespace collision that causes a code crash.
Of the above steps the one that is most CPU intensive is the parsing of the DOM/CSS. If you want your page to be processed faster then write efficient CSS by eliminating redundant instructions and consolidating CSS instructions into the fewest possible element references. Reducing the number of nodes in your DOM tree will also produce faster rendering.
Keep in mind that each asset you request from your HTML or even from your CSS/JavaScript assets is requested with a separate HTTP header. This consumes bandwidth and requires processing per request. If you want to make your page load as fast as possible then reduce the number of HTTP requests and reduce the size of your HTML. You are not doing your user experience any favors by averaging page weight at 180k from HTML alone. Many developers subscribe to some fallacy that a user makes up their mind about the quality of content on the page in 6 nanoseconds and then purges the DNS query from his server and burns his computer if displeased, so instead they provide the most beautiful possible page at 250k of HTML. Keep your HTML short and sweet so that a user can load your pages faster. Nothing improves the user experience like a fast and responsive web page.

Open your page in Firefox and get the HTTPFox addon. It will tell you all that you need.
Found this on archivist.incutio:
http://archivist.incutio.com/viewlist/css-discuss/76444
When you first request a page, your browser sends a GET request to the server, which returns the HTML to the browser. The browser then starts parsing the page (possibly before all of it has been returned).
When it finds a reference to an external entity such as a CSS file, an image file, a script file, a Flash file, or anything else external to the page (either on the same server/domain or not), it prepares to make a further GET request for that resource.
However the HTTP standard specifies that the browser should not make more than two concurrent requests to the same domain. So it puts each request to a particular domain in a queue, and as each entity is returned it starts the next one in the queue for that domain.
The time it takes for an entity to be returned depends on its size, the load the server is currently experiencing, and the activity of every single machine between the machine running the browser and the server. The list of these machines can in principle be different for every request, to the extent that one image might travel from the USA to me in the UK over the Atlantic, while another from the same server comes out via the Pacific, Asia and Europe, which takes longer. So you might get a sequence like the following, where a page has (in this order) references to three script files, and five image files, all of differing sizes:
GET script1 and script2; queue request for script3 and images1-5.
script2 arrives (it's smaller than script1): GET script3, queue images1-5.
script1 arrives; GET image1, queue images2-5.
image1 arrives, GET image2, queue images3-5.
script3 fails to arrive due to a network problem - GET script3 again (automatic retry).
image2 arrives, script3 still not here; GET image3, queue images4-5.
image3 arrives; GET image4, queue image5, script3 still on the way.
image4 arrives, GET image5;
image5 arrives.
script3 arrives.
In short: any old order, depending on what the server is doing, what the rest of the Internet is doing, and whether or not anything has errors and has to be re-fetched. This may seem like a weird way of doing things, but it would quite literally be impossible for the Internet (not just the WWW) to work with any degree of reliability if it wasn't done this way.
Also, the browser's internal queue might not fetch entities in the order they appear in the page - it's not required to by any standard.
(Oh, and don't forget caching, both in the browser and in caching proxies used by ISPs to ease the load on the network.)
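The queueing described in the quote can be illustrated with a toy model. This is only a sketch: the durations are invented numbers in arbitrary time units, `simulateQueue` is a hypothetical name, and retries, priorities and real network behaviour are ignored. It shows how a per-domain limit of two requests in flight makes resources arrive in an order that differs from the order in the page.

```javascript
// Toy model: each request has a name and a duration (arbitrary time units).
// At most `limit` requests are in flight; a queued one starts when a slot frees.
function simulateQueue(requests, limit) {
  var queue = requests.slice();
  var inFlight = []; // items of the form { name, finishAt }
  var arrivals = [];
  var now = 0;
  while (queue.length > 0 || inFlight.length > 0) {
    // Fill free slots in document order.
    while (inFlight.length < limit && queue.length > 0) {
      var next = queue.shift();
      inFlight.push({ name: next.name, finishAt: now + next.duration });
    }
    // Advance time to the earliest completion.
    inFlight.sort(function (a, b) { return a.finishAt - b.finishAt; });
    var done = inFlight.shift();
    now = done.finishAt;
    arrivals.push(done.name);
  }
  return arrivals;
}

var order = simulateQueue([
  { name: "script1", duration: 5 },
  { name: "script2", duration: 2 },
  { name: "script3", duration: 9 },
  { name: "image1", duration: 1 }
], 2);
console.log(order.join(", ")); // script2, script1, image1, script3
```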

If you're asking this because you want to speed up your web site, check out Yahoo's page on Best Practices for Speeding Up Your Web Site.

AFAIK, the browser (at least Firefox) requests every resource as soon as it parses it. If it encounters an img tag it will request that image as soon as the img tag has been parsed. And that can be even before it has received the totality of the HTML document... that is it could still be downloading the HTML document when that happens.
For Firefox, there are browser queues that apply, depending on how they are set in about:config. For example it will not attempt to download more than 8 files at once from the same server... the additional requests will be queued. I think there are per-domain limits, per-proxy limits, and other stuff, which are documented on the Mozilla website and can be set in about:config. I read somewhere that IE has no such limits.
The jQuery ready event is fired as soon as the main HTML document has been downloaded and its DOM parsed. Then the load event is fired once all linked resources (CSS, images, etc.) have been downloaded and parsed as well. It is made clear in the jQuery documentation.
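The difference between the two events can be observed directly with the standard `DOMContentLoaded` and `load` events, which is roughly what jQuery's ready builds on. The following is a small sketch; `recordLifecycle` is a hypothetical name, and the document and window are passed in as parameters so the code can also be run against stand-in objects.

```javascript
// Record the order in which the two events fire.
// `doc` and `win` are parameters so the sketch can run against stand-ins.
function recordLifecycle(doc, win, log) {
  doc.addEventListener("DOMContentLoaded", function () {
    log.push("DOMContentLoaded"); // DOM parsed; images may still be loading
  });
  win.addEventListener("load", function () {
    log.push("load"); // linked resources have finished loading
  });
  return log;
}

// Browser usage, placed in a script in the <head>:
// var events = recordLifecycle(document, window, []);
// window.addEventListener("load", function () { console.log(events); });
```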
If you want to control the order in which all that is loaded, I believe the most reliable way to do it is through JavaScript.

Dynatrace AJAX Edition shows you the exact sequence of page loading, parsing and execution.

The chosen answer does not seem to apply to modern browsers, at least not to Firefox 52. What I observed is that requests for resources such as css and javascript are issued before the HTML parser reaches the element. For example:
<html>
<head>
<!-- prints the date before parsing and blocks HTML parsing -->
<script>
console.log("start: " + (new Date()).toISOString());
for(var i=0; i<1000000000; i++) {};
</script>
<script src="jquery.js" type="text/javascript"></script>
<script src="abc.js" type="text/javascript"></script>
<link rel="stylesheet" type="text/css" href="abc.css">
<style>h2{font-weight:bold;}</style>
<script>
$(document).ready(function(){
$("#img").attr("src", "kkk.png");
});
</script>
</head>
<body>
<img id="img" src="abc.jpg" style="width:400px;height:300px;"/>
<script src="kkk.js" type="text/javascript"></script>
</body>
</html>
What I found was that the start times of the requests for the css and javascript resources were not blocked. It looks like Firefox scans the HTML and identifies key resources (img resources are not included) before it starts to parse the HTML.
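An observation like this can be checked with the Resource Timing API by comparing when each request started with when the blocking script finished. This is a sketch only; `requestStartOrder` is a hypothetical helper, and the exact numbers will vary by browser and network.

```javascript
// Sort PerformanceResourceTiming-like entries by the time the request was issued.
function requestStartOrder(entries) {
  return entries
    .slice()
    .sort(function (a, b) { return a.startTime - b.startTime; })
    .map(function (entry) {
      return { name: entry.name, startTime: Math.round(entry.startTime) };
    });
}

// Browser usage, run after the page has loaded:
// console.table(requestStartOrder(performance.getEntriesByType("resource")));
// Compare the start times with the moment the blocking loop ended,
// for example by logging performance.now() right after the loop.
```

If the requests for jquery.js and abc.css start before the blocking loop ends, they were issued ahead of the parser reaching those elements.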

Related

What happens when a script source is loaded multiple times?

Let's say I load a library multiple times:
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
I check the Network tab in the devtool and see that it's only loaded once. This makes sense, but I wonder how exactly the engine handles this?
It detects duplication and only executes the first line?
It executes all the lines, but detects that the same file URL has been visited, so it doesn't download again?
It executes all the lines, but detects that the same file name has been downloaded, so it doesn't download again?
It executes all the lines, but detects that the same file content has been downloaded, so it doesn't download again?
I read this article, Preventing JavaScript Files from Loading Multiple Times – Michael Kennedy on Technology, and it seems that it will be loaded multiple times by default. Am I correct?
All of the major browsers like Firefox, Safari, IE and Opera will cache a Javascript file the first time it is used, and then on subsequent script tags the browser will use the cached copy if it's available and if it hasn't expired. Having said that, caching behavior can usually be altered through browser configuration, so you cannot rely on any kind of caching behavior.
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
<script src="https://unpkg.com/neovis.js#2.0.0-alpha.9"></script>
Regarding the execution of the files, even if the files are indeed cached, you may have problems since the same code will be executed five times. At the very least, this will cause the browser to take more time than necessary and it may produce errors, since most JavaScript code isn't written to be executed multiple times. For example, it may attach the same event handlers multiple times.
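One common way for a script to protect itself against repeated execution is a global guard flag. The sketch below is illustrative only: `initOnce` and the `__exampleLibLoaded` flag are hypothetical names, not something neovis.js provides, and the counter stands in for side effects such as attaching event handlers.

```javascript
// Run `init` at most once per global object, keyed by a flag name.
function initOnce(globalObject, flagName, init) {
  if (globalObject[flagName]) {
    return false; // an earlier copy of the script already ran
  }
  globalObject[flagName] = true;
  init();
  return true;
}

// Simulate the same script body being executed three times.
var handlerCount = 0;
for (var i = 0; i < 3; i++) {
  initOnce(globalThis, "__exampleLibLoaded", function () {
    handlerCount += 1; // stands in for attaching an event handler
  });
}
console.log(handlerCount); // 1
```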
For your shown code the script https://unpkg.com/neovis.js#2.0.0-alpha.9 will be executed 5 times.
Whether it will be downloaded multiple times depends on what is sent as a response for the https://unpkg.com/neovis.js#2.0.0-alpha.9 request and on the browser settings.
The server can send ETag or Modification Date headers in combination with caching policy information headers that tell the browser what to be done if the URL is requested another time.
The headers can say that a resource will be valid for a certain time so the browser does not need to do any further requests for that period of time.
It however could also say that the browser has to validate the freshness each time using the last received ETag or Modification Date.
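The decision described above can be sketched as a small function. This is a deliberately simplified model, not how any particular browser implements its cache: the `cached` record with `ageSeconds`, `cacheControl`, `etag` and `lastModified` fields is an assumption made for illustration, and only the max-age and no-cache directives are considered.

```javascript
// Simplified model of a cache lookup for a previously fetched URL.
// `cached` is a hypothetical record: { ageSeconds, cacheControl, etag, lastModified }.
function cacheDecision(cached) {
  if (!cached) {
    return { action: "fetch" }; // nothing stored: plain request
  }
  var directives = (cached.cacheControl || "").toLowerCase();
  var maxAgeMatch = /max-age=(\d+)/.exec(directives);
  var maxAge = maxAgeMatch ? Number(maxAgeMatch[1]) : 0;
  var alwaysValidate = /\bno-cache\b/.test(directives);

  if (!alwaysValidate && cached.ageSeconds < maxAge) {
    return { action: "use-cache" }; // still fresh: no request at all
  }
  if (cached.etag) {
    // Conditional request; a 304 response means the cached copy is reused.
    return { action: "revalidate", headers: { "If-None-Match": cached.etag } };
  }
  if (cached.lastModified) {
    return { action: "revalidate", headers: { "If-Modified-Since": cached.lastModified } };
  }
  return { action: "fetch" };
}
```

In this model, five script tags pointing at the same fresh URL would lead to one download and five executions.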

Does failing to load a remote javascript file stop javascript execution in any browsers?

I've got a stand alone script file that I want to load from a 3rd party server:
<script type="text/javascript" src="//some_server/logger.js"></script>
There's a small chance the remote script won't be there sometimes (404), and I want to be sure that including this script doesn't affect how my app operates, since my app doesn't require the script to be loaded to work (it's an analytics tracker of sorts)
Can I include this script safely without it blocking or causing javascript errors in my app that stops other javascript from running?
I was thinking of adding the async and defer attributes to make the script load lazily. Is this enough? My app needs to work on IE8 and above.
Here's what I'm thinking right now:
<script async defer type="text/javascript" src="//some_server/logger.js"></script>
<script type="text/javascript">
console.log("I want this to always execute, no matter if the above script 404's or not!");
</script>
Can I include this script safely without it blocking or causing
javascript errors in my app that stop other javascript from running?
YES you can
A 404 does not halt execution of javascript in any way, only errors do.
As long as the server responds with a 404, and doesn't hang, the script not loading won't cause any noticeable delay.
This can be tested in different browsers by logging the time it takes to check a 404 or broken link.
The very fact that the browser logs the time shows that such scripts do not halt execution of javascript. The thread always continues on to the next script tag unless the parser encounters an error in a script. If the URL isn't found, no browser will throw an error; it just goes on as soon as the URL fails to resolve.
<script>console.time('Test');</script>
<script type="text/javascript" src="http://www.broken.url/file.js"></script>
<script>console.timeEnd('Test');</script>
Testing in IE, Chrome, Firefox and Opera shows that all browsers use less than 0.0002 seconds to resolve a broken link. The time it takes to resolve a 404 depends on how fast the server responds, but for Google's servers it seems to consistently be less than 0.2 seconds in all browsers before the 404 status code is returned and the browser keeps executing the next scripts.
Even adding up 20 scripts that all return a 404 generally takes less than half a second for the server to resolve and move on.
In other words, you can safely add any script that has a broken link or returns a 404. It won't break anything and it won't hang the browser; it only takes a few milliseconds for a modern browser to determine that the script can't be loaded and move on.
What you can't do is include scripts that actually load and contain fatal errors. An uncaught error stops the rest of that script, and any later script that depends on what it should have defined will fail too.
Define all functions you use (which are in //some_server/logger.js) as empty functions before loading the script and you'll have no exceptions even if you use them without the script being loaded.
<script type="text/javascript">
functionInLogger = function() {
};
functionInLogger2 = function() {
};
...
</script>
<script type="text/javascript" src="//some_server/logger.js"></script>
<script type="text/javascript">
functionInLogger();
functionInLogger2();
console.log("This will always work");
</script>
And when the script is loaded, it'll override the empty functions.
I could not find any popular browser that will stop execution upon a 404. The W3C specification only states this:
When the user agent is required to execute a script block, it must run the following steps:
...
If the load resulted in an error (for example a DNS error, or an HTTP 404 error)
Executing the script block must just consist of firing a simple event named error at the element.
You can place that script on the bottom of the page (after all your important script), to make sure this will not block rendering of your page.
Or you can also load it after document ready, this method will not give extra load time when the script are not found. Example:
$(document).ready(function() {
$('head').append('<script type="text/javascript" src="//some_server/logger.js"><\/script>');
});
or use $.getScript method
$(document).ready(function() {
$.getScript('//some_server/logger.js', function(data, textStatus) {
/*optional stuff to do after getScript */
});
});
* The examples above assume that you are using jQuery.
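Without jQuery, a similar result can be sketched with a dynamically created script element and its load and error events. This is a minimal sketch under stated assumptions: `loadOptionalScript` is a hypothetical helper, the document is passed in so the code can be tried with a stand-in, and very old browsers such as IE8 may need a different mechanism because they may not fire these events on script elements.

```javascript
// Insert a script element and report success or failure through `done`.
// `doc` is a parameter so the sketch can be tried with a stand-in document.
function loadOptionalScript(doc, src, done) {
  var el = doc.createElement("script");
  el.async = true;
  el.onload = function () { done(null); };
  el.onerror = function () { done(new Error("Failed to load " + src)); };
  el.src = src;
  doc.head.appendChild(el);
  return el;
}

// Browser usage:
// loadOptionalScript(document, "//some_server/logger.js", function (err) {
//   if (err) { console.log("logger unavailable, continuing without it"); }
// });
```

Because the failure only reaches the callback, the rest of the page's scripts keep running whether or not the logger is available.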
I think you should use something like RequireJS.
From Wiki:
It allow developers to define dependencies that must load before a module is executed, so the module doesn't try to use outside code that isn't yet available.

does a web browser hold onto a web server connection while inline javascript is executing?

For example if I wrote the following code containing a link to a 3rd party javascript file which took 1 second to load:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<!-- note: no async attribute! -->
<script src="//thirdparty.com/some/slow/loading/script.js">
</script>
</body>
</html>
Would the http connection to my web server be kept open until the end of the document?
Update:
I'm not talking in the context of Connection: Keep-alive, this would obviously retain a connection after the page has loaded. I am referring to the fact that the browser may not have fully read the contents of the document from the server at the point it executes the in-line javascript, so would it still retain its connection to keep reading the rest of the file, or would this have been read but not yet added to the DOM?
No, the connection is not kept open until the document is completely parsed.
The document will continue to load while it's being parsed, and while the external script is requested, loaded, parsed and executed. The browser doesn't pause in the reading of the document just because it doesn't need any more data to parse right now, or because it's loading something else. It will still continue to load the document in the background.
Open connections are a more expensive resource than memory, so it's better for the browser to read all data into memory as fast as possible, instead of keeping connections open to read from them as data is needed.
The browser would first start loading your document. As soon as it has parsed the <script> tag it would make a connection to the third-party server and start loading the JS from there. The connection to your server will be closed when the loading of the main page is finished. However, the browser shows the loading icon as long as something (e.g. the JS) is loading, and it will also call window.onload after everything is loaded.

how should my site handle occasionally missing javascript files gracefully?

Say I've got this script tag on my site (borrowed from SO).
<script type="text/javascript" async=""
src="http://edge.quantserve.com/quant.js"></script>
If edge.quantserve.com goes down or stops responding without returning a 404, won't SO have to wait for the timeout before the rest of the page loads? I'm thinking Chaos Monkey shows up and blasts a server that my site is depending on, a server that isn't part of a CDN and has a poor failover.
What's the industry standard way to handle this issue? I couldn't find a dupe on SO, maybe I'm searching for the wrong terms.
Update: I should have looked a bit more closely at the SO code, there's this at the bottom:
<script type="text/javascript">var _gaq=_gaq||[];_gaq.push(['_setAccount','UA-5620270-1']);
_gaq.push(['_setCustomVar', 2, 'accountid', '14882',2]);
_gaq.push(['_trackPageview']);
var _qevents = _qevents || [];
(function(){
var s=document.getElementsByTagName('script')[0];
var ga=document.createElement('script');
ga.type='text/javascript';
ga.async=true;
ga.src='http://www.google-analytics.com/ga.js';
s.parentNode.insertBefore(ga,s);
var sc=document.createElement('script');
sc.type='text/javascript';
sc.async=true;
sc.src='http://edge.quantserve.com/quant.js';
s.parentNode.insertBefore(sc,s);
})();
</script>
OK, so if the quant.js file fails to load, it's creating a script tag with ga.async=true;. Maybe that's the trick.
Possible answer: https://stackoverflow.com/a/1834129/30946
Generally, it's tricky to do it well and cross-browser.
Some proposals:
Move the script to the very bottom of the HTML page (so that almost everything is displayed before you request that script)
Move it to the bottom and wrap it in <script>document.write("<scr"+"ipt src='http://example.org/script.js'></scr"+"ipt>")</script> or the way you added after update (document.createElement('script'))
A last option is to load it via XHR (but this works only for same-domain requests, or cross-domain only if CORS is enabled on the third-party server); you can then use the timeout property of the XHR (for IE and Fx12+), and in the other browsers, use setTimeout and check the XHR's readyState. It's kind of convoluted and very non-cross-browser for now, so option 2 looks the best.
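The createElement approach can be combined with a timeout so the page stops waiting for an unresponsive server. The following is a sketch only: `loadScriptWithTimeout` is a hypothetical helper, `doc` and `setTimer` are injectable stand-ins for `document` and `setTimeout`, and the timeout only stops the waiting; it does not cancel the underlying request.

```javascript
// Load a script but stop waiting after `timeoutMs`.
// `doc` and `setTimer` are injectable stand-ins for document and setTimeout.
function loadScriptWithTimeout(doc, src, timeoutMs, done, setTimer) {
  var schedule = setTimer || setTimeout;
  var finished = false;
  function finish(status) {
    if (finished) { return; } // report only the first outcome
    finished = true;
    done(status);
  }
  var el = doc.createElement("script");
  el.async = true;
  el.onload = function () { finish("loaded"); };
  el.onerror = function () { finish("error"); };
  el.src = src;
  doc.head.appendChild(el);
  schedule(function () { finish("timeout"); }, timeoutMs);
}

// Browser usage:
// loadScriptWithTimeout(document, "http://edge.quantserve.com/quant.js", 3000,
//   function (status) { console.log("quant.js:", status); });
```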
Make a copy of the file on your server and use it as a fallback. It will load your copy only if the one from the remote server has failed to load:
<script src="http://edge.quantserve.com/quant.js"></script>
<script>window.quant || document.write('<script src="js/quant.js"><\/script>')</script>
To answer your question about the browser having to wait for the script to load before the rest of the page loads, the answer to that would typically be no. Typical browsers will have multiple threads processing the download of the page and linked content (CSS, images, js). So the rest of the page should be loaded, though the user's browser indicator will still show the page trying to load until the final request is fulfilled or timed out.
Depending on the nature of the resource you are trying to load, this will obviously affect your page differently. Typically, if you are worried about this, you can host all your files on a common CDN (or your website if it is not that highly trafficked), that way at least if one thing fails, chances are everything is failing and you have a bigger issue to contend with :)

Certain .js pages not loading in time

I have certain pages on my site that load a good number of JavaScript files. In the code below, edit_onload() is in script1.js. Typically all the scripts load fine and edit_onload fires successfully. On occasion it seems like script1.js isn't loading in time because edit_onload() errors with object expected. If you refresh the page everything will load fine.
My question is, shouldn't the <script> tag below wait for all of the .js files to load and then execute edit_onload()?
<script LANGUAGE="javascript" DEFER="true" for="window" event="onload">
<xsl:comment>
<![CDATA[
edit_onload();
]]>
</xsl:comment>
</script>
<script language="javascript" src="/_scripts/script1.js" defer="true"></script>
<script language="javascript" src="/_scripts/script2.js" defer="true"></script>
<script language="javascript" src="/_scripts/script3.js" defer="true"></script>
I think the way deferred scripts are processed is browser specific. For example, browsers can handle inline and external scripts in different queues. In general, the 'defer' attribute is just a recommendation from the site developer to the user agent, and the user agent will not necessarily abide by it.
Also, you should try using 'defer="defer"'. Can you move the call into script1.js itself? Alternatively, you can periodically check for the existence of a specific element placed at the end of the content being loaded, and run the method only after that element is found.
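The periodic check could look something like the sketch below. It is only an illustration: `pollUntil` is a hypothetical helper, the interval and retry count are arbitrary defaults, and the usage example checks for the function itself instead of a marker element, which is a variation on the idea above.

```javascript
// Poll until `isReady()` returns true, then call `onReady`; give up after `maxTries`.
function pollUntil(isReady, onReady, options) {
  var opts = options || {};
  var interval = opts.interval || 50;          // milliseconds between checks
  var triesLeft = opts.maxTries || 100;
  var schedule = opts.setTimer || setTimeout;  // injectable for testing
  function attempt() {
    if (isReady()) {
      onReady();
      return;
    }
    triesLeft -= 1;
    if (triesLeft > 0) {
      schedule(attempt, interval);
    }
  }
  attempt();
}

// Browser usage (edit_onload is the function from script1.js in the question):
// pollUntil(
//   function () { return typeof window.edit_onload === "function"; },
//   function () { window.edit_onload(); }
// );
```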
BTW, you could possibly gain more control over script loading if you use a dynamic script loading technique (the link is just one example).
