The web browser is not an easy platform to program against. I realized this in Dec 2008 when we decided to change the architecture of one of our products at Directi to a pure JavaScript client talking to a REST API, and even more so in Apr 2009 when we decided to build our desktop product on WebKit. However, despite all the issues, the decision to bet the client on web technologies has paid off handsomely, and since then our investment in web technologies on the client has increased manifold. Over the last two years, multiple teams at Directi have discovered the pains and joys of programming the browser, and almost every day we learn something interesting about how browsers behave. As we kept learning, I kept notes, and I thought it would be a good idea to annotate them and share them more widely. This is the first in a series of posts, in which I intend to cover (not necessarily in this order or with this breakup):

  • Architecture
  • HTTP
  • Security Model
  • Content and Rendering
  • JavaScript
  • Apps

Having said that, I have a few posts lying around where I have never gotten around to writing part 2. Let’s hope it does not happen in this case :)

1. Rendering Process

The job of a browser is to fetch and display a web-page. At a high level, most modern browsers carry out the following steps to render an HTML page:

(Source: https://developer.mozilla.org/en/Introduction_to_Layout_in_Mozilla)

  • Load the HTML
  • Parse it
  • Apply styles
  • Build frames
  • Layout the frames (flow)
  • Paint the frames
  1. Load: The browser fetches the page from the specified location. Typically this is done through an HTTP client, but an HTML page may also be loaded from the filesystem. Either way, the loader fetches the HTML page from its location. The very important concept of the browser cache comes into play here – but more on this later. The way the HTML page gets loaded is different from the way its resources get loaded. In WebKit there are two different pipelines – one for loading the page and another for loading resources:

     (source: http://webkit.org/blog/1188/how-webkit-loads-a-web-page/)

  2. Parse: As the stream comes through from the loader, an HTML parser starts building the DOM (also called a “Content Tree”) – each node here is an HTML element. A lot of HTML on the net is broken, and each browser has had to implement its own quirks to parse it, leading to subtle incompatibilities. HTML 5, however, specifies the parsing algorithm. As this gets adopted, the cross-browser incompatibilities caused by parsing should go away. While parsing, the engine may come across resources (JS, CSS, images, fonts, etc.). When that happens, the particular resource is queued for loading and parsing continues. Again, there is more to this, which we’ll tackle later.
  3. Compute Styles: The browser provides a default stylesheet. Often the HTML page also specifies a set of styles. These styles need to be applied to the Content Tree. For this purpose, a “Rendering Tree” is built – this essentially consists of the elements that are to be rendered. For example, an element with display set to none would not appear in this tree (nor would its descendants). Nor would elements like HEAD and SCRIPT. Nodes in the Rendering Tree carry style information: CSS box model, z-order and opacity are all specified here.
  4. Construct Frames: Most renderable elements follow the CSS box model: they have height, width, border, spacing, padding, margin and position. For these objects, a rectangular box – called a Frame – is created. Not all objects have a frame – for example the SVG image above does not have a frame. It is put inside an iframe, which has a frame. A frame has all the information on how the object itself is going to be rendered. What is not yet known, however, is how the element is going to be placed with respect to other elements.
  5. Compute Flow: Flow computation, or layout computation, is about how elements are placed with respect to each other and is mostly controlled by the CSS Visual Formatting Model. This is typically a recursive process from the root of the tree to the leaves. It is also typically a lazy process – it is done on a need basis. When the layout engine determines that an element needs to be laid out (for example a newly added node), it marks it as such by setting a dirty bit. The actual layout is done only when some method is called which requires the new information.

    Most browsers do flow calculation at a higher resolution than what any display would have. This is to support zooming – when the user zooms in or out, the objects can be drawn correctly on the screen without requiring any extra steps other than mapping the coordinates to real pixels.

  6. Paint: Once the engine knows exactly where the objects need to be drawn, it has to actually render them on the screen. This process – called Painting – is described in agonizing detail in Appendix E of the CSS 2.1 Spec. It is basically a tree walk from the root of the Rendering Tree, where each node is asked to paint itself. The actual rendering is abstracted out through a graphics engine, which is responsible for actually turning on the pixels and for things like hardware acceleration.
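
To make the style and flow steps above a little more concrete, here is a toy model in Python. It is only a sketch and not how any real engine is written: the `Node` and `RenderNode` classes and all function names are made up for illustration, style resolution is reduced to a pre-computed `display` value, and the dirty bit is kept per node without the propagation to ancestors that a real layout engine needs.

```python
from dataclasses import dataclass, field

# Elements that never produce render objects in this toy model.
NON_RENDERED_TAGS = {"head", "script"}


@dataclass
class Node:
    """A content-tree (DOM) node with an already-resolved 'display' value."""
    tag: str
    display: str = "block"
    children: list = field(default_factory=list)


@dataclass
class RenderNode:
    """A render-tree node; 'dirty' marks that its layout is stale."""
    tag: str
    children: list = field(default_factory=list)
    dirty: bool = True
    layout_runs: int = 0


def build_render_tree(node):
    """Return the render subtree for node, or None if it is not rendered."""
    if node.tag in NON_RENDERED_TAGS or node.display == "none":
        return None  # descendants are skipped as well
    rendered = (build_render_tree(child) for child in node.children)
    return RenderNode(node.tag, [r for r in rendered if r is not None])


def layout_if_needed(render_node):
    """Lay out only dirty nodes, walking from the root towards the leaves."""
    if render_node.dirty:
        render_node.layout_runs += 1  # stand-in for real geometry work
        render_node.dirty = False
    for child in render_node.children:
        layout_if_needed(child)


def flat_tags(render_node):
    """List the tags in the render tree in document order."""
    tags = [render_node.tag]
    for child in render_node.children:
        tags.extend(flat_tags(child))
    return tags


dom = Node("html", children=[
    Node("head", children=[Node("title")]),
    Node("body", children=[
        Node("p"),
        Node("div", display="none", children=[Node("span")]),
        Node("script"),
    ]),
])

tree = build_render_tree(dom)
print(flat_tags(tree))  # ['html', 'body', 'p']

layout_if_needed(tree)  # first pass: every node is dirty
layout_if_needed(tree)  # second pass: nothing is dirty, nothing is re-run
paragraph = tree.children[0].children[0]
paragraph.dirty = True  # e.g. its content changed
layout_if_needed(tree)  # only the paragraph is laid out again
print(tree.layout_runs, paragraph.layout_runs)  # 1 2
```

The hidden div, its span, HEAD and SCRIPT never make it into the render tree, and the second and third layout passes do work only where a dirty bit is set.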

2. Rendering Modes

The actual execution of the rendering process described above can change completely based upon the rendering mode the browser decides to use for a particular page. Browsers have different rendering modes because of the history of the web, and understanding rendering modes is very important to understanding how browsers behave. However, I will not go into them here since http://hsivonen.iki.fi/doctype/ does an excellent job of capturing all the details. If you are just interested in the background, read http://en.wikipedia.org/wiki/Quirks_mode.

3. Dynamic Pages

Pages can change because of JavaScript or because of user interaction which triggers parts of the rendering process:

(Source: https://developer.mozilla.org/en/Introduction_to_Layout_in_Mozilla)

  • If DOM elements are added or removed, the typical response of the browser is to follow the rendering process described earlier in almost serial order
  • If the Style attribute on an element is changed, the style for the element needs to be recomputed, the page re-flown and re-painted
    • Browsers may optimize this by batching style re-computes by queuing them
    • However, scripts often read back changes that they have just made which requires the re-styling queue to be flushed
    • For better performance, make style changes as a batch and then read them in a batch so that the queue is flushed less frequently
  • Some style changes are cheaper:
    • Changing size / location would not require style re-compute but only re-flowing and re-painting
    • Color change does not require re-flowing, but only re-painting
    • Scrolling also does not require re-computation, but only re-painting – this is typically done incrementally and may not even require full repainting (but things like fixed background images would necessitate full repainting). So moving elements by scrolling programmatically can be faster than moving elements by modifying their style attribute
  • Re-Flow – because of position or size changes – is typically recursive (root to leaves)
    • Some attribute changes in a child can trigger changes in the entire ancestry all the way up to the root. Example: Height changes
    • Some attribute changes in a parent can trigger changes in all the descendants right down to leaves. Example: Width changes
    • Browsers can detect that only a section of the tree may change and do re-flow only on that sub-tree
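
The advice about batching writes and reads can be illustrated with a small simulation. This is a toy model in Python, not browser code: `ToyStyleEngine` and its methods are hypothetical names, and the "flush" here just copies queued values, standing in for the style re-compute and re-flow that a real browser would perform.

```python
class ToyStyleEngine:
    """Queues style writes and flushes them only when a value is read."""

    def __init__(self):
        self.pending = []      # queued (element, property, value) writes
        self.computed = {}     # (element, property) -> value after a flush
        self.flush_count = 0   # how many times the queue had to be flushed

    def write(self, element, prop, value):
        self.pending.append((element, prop, value))

    def read(self, element, prop):
        if self.pending:       # a read forces queued changes to be applied
            self._flush()
        return self.computed.get((element, prop))

    def _flush(self):
        for element, prop, value in self.pending:
            self.computed[(element, prop)] = value
        self.pending.clear()
        self.flush_count += 1


def interleaved(engine, elements):
    """Write, then immediately read back, one element at a time."""
    for el in elements:
        engine.write(el, "width", "100px")
        engine.read(el, "width")
    return engine.flush_count


def batched(engine, elements):
    """Do all the writes first, then all the reads."""
    for el in elements:
        engine.write(el, "width", "100px")
    for el in elements:
        engine.read(el, "width")
    return engine.flush_count


boxes = ["box%d" % i for i in range(5)]
print(interleaved(ToyStyleEngine(), boxes))  # 5
print(batched(ToyStyleEngine(), boxes))      # 1
```

With five elements, the interleaved version flushes the queue five times while the batched version flushes it once.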

4. Resource Loading

As the parser goes over the HTML and builds the Content Tree, it may see an element referring to an external resource (image, CSS, JS, font, etc.), which needs to be loaded. This loading happens as follows:

Order of Loading

While typically resources get loaded in the order of appearance in the document, browsers optimize this by prioritizing stylesheets and JavaScript files ahead of images. It is also recommended that stylesheets be put at the top. This is because:

  • Stylesheets are required to build the Rendering Tree, but have no impact on Content Tree, so HTML parsing and JS execution can continue while CSS is downloaded and loaded.
  • A script could ask for style information even as the stylesheet is being downloaded and the rendering tree is being built. If this happens, the script may get wrong or incomplete answers. So you want styles to be loaded before JS starts executing
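
As a rough sketch of the prioritization described above, the following Python snippet reorders a list of discovered resources so that stylesheets and scripts come first while everything else keeps its document order. The two-level priority and the `load_order` function are assumptions made for illustration; real browsers use more elaborate heuristics.

```python
# Resource kinds that this toy scheduler moves to the front of the queue.
HIGH_PRIORITY_KINDS = {"stylesheet", "script"}


def load_order(resources):
    """Return URLs with stylesheets and scripts first, else document order."""
    # sorted() is stable, so resources with equal priority keep their order.
    ordered = sorted(
        resources,
        key=lambda resource: 0 if resource[0] in HIGH_PRIORITY_KINDS else 1,
    )
    return [url for _, url in ordered]


# (kind, url) pairs in the order the parser discovered them; made-up names.
discovered = [
    ("image", "logo.png"),
    ("stylesheet", "site.css"),
    ("image", "hero.jpg"),
    ("script", "app.js"),
]
print(load_order(discovered))
# ['site.css', 'app.js', 'logo.png', 'hero.jpg']
```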

Parallel Loading

Modern browsers maintain multiple persistent connections to a server. This allows parallel loading, which is a good thing because it reduces the overall latency of the page getting delivered to the end-user. However, out of consideration for the load on web servers, the HTTP 1.1 RFC (RFC 2616) recommends that “Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy”.

Note that there is a trade-off here between the overhead of keeping many sockets open and the overhead of opening new sockets, and its impact on latency. With the number of external resources fetched by a page going up, it makes sense to optimize for reducing the number of times a new connection has to be set up, to reduce latency and improve the user experience. Indeed, most browsers these days allow more than 2 simultaneous connections per host. Steve Souders summarizes the current situation nicely in his Roundup on Parallel Connections.
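
To get a feel for why the per-host connection limit matters, here is a back-of-the-envelope simulation in Python. It is a deliberately crude model: the function name and the numbers are made up, every resource is assumed to take a fixed time on whichever connection frees up first, and effects such as shared bandwidth, connection setup cost and server load are ignored.

```python
import heapq


def total_load_time(durations, max_connections):
    """Finish time when each resource takes the next connection to free up."""
    # Each heap entry is the time at which one connection becomes free.
    free_at = [0.0] * max_connections
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available connection
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


durations = [100] * 12  # twelve resources, 100 ms each (made-up numbers)
for limit in (1, 2, 6):
    print(limit, total_load_time(durations, limit))
# 1 1200.0
# 2 600.0
# 6 200.0
```

Under these assumptions, going from the RFC's 2 connections to 6 cuts the total time from 600 ms to 200 ms, which is the kind of gain that led browsers to raise their limits.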

Blocking Loads

Since a script can call document.write(), parsing can’t proceed until the script is fully loaded and executed (including any inline script in the script block) and any document.write() output has been inserted into the document. This means that a script load blocks parsing, which in turn blocks further loading, preventing the parallelism mentioned above from being exploited. Modern browsers do help a bit. For example, in WebKit, when the main parser gets blocked because of a script load, it starts a side parser that figures out other resources to load in the rest of the HTML. However, that is WebKit – for other browsers, there are a couple of ways out:

  1. Put script blocks at the end – that way they do not pause any further parsing
  2. Use a hack to download scripts asynchronously – Souders sums these up in his post on Loading Scripts Without Blocking
  3. HTML 5 specifies the async attribute on the script tag which tells the browser that the script does not require synchronous execution and the parser can continue. WebKit recently started supporting this attribute and Firefox has supported this since 3.6.
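
The side parser idea mentioned above can be sketched with Python's standard `html.parser` module. This is only an illustration of the concept and not WebKit's implementation: the `ResourceScanner` class and `scan_ahead` function are hypothetical, and the scanner looks only at script, img and stylesheet link tags.

```python
from html.parser import HTMLParser


class ResourceScanner(HTMLParser):
    """Collects URLs of external resources without executing anything."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif (tag == "link" and attributes.get("rel") == "stylesheet"
              and attributes.get("href")):
            self.urls.append(attributes["href"])


def scan_ahead(remaining_html):
    """Find resources in the not-yet-parsed part of the page."""
    scanner = ResourceScanner()
    scanner.feed(remaining_html)
    return scanner.urls


# The part of the page that the blocked main parser has not reached yet.
rest_of_page = """
<link rel="stylesheet" href="theme.css">
<img src="photo.jpg">
<script src="widgets.js"></script>
<script>document.write("<p>inline</p>")</script>
"""
print(scan_ahead(rest_of_page))
# ['theme.css', 'photo.jpg', 'widgets.js']
```

The scanner never runs any script, so it cannot know what document.write() will insert. It only lets the loader start fetching resources early while the main parser waits.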

5. Physical Architecture

Web browsers started off with a single-process, single-thread model. This was acceptable since web pages were just documents that had to be rendered. However, the web has evolved from being document-centric to being application-centric – a lot of sites these days are applications with a lot of active code, a far cry from the static content browsers were designed to render. This gives rise to problems of stability, performance and security. To address these, most browsers have moved (or are in the process of moving) to a multi-process architecture. There are three drivers behind this trend:

  • Performance: Multiple processes exploit multiple cores
  • Security: The browser can spin up a new process in a lower privilege mode, reducing / removing the impact of malicious code
  • Stability: A badly behaved page / script / plugin does not impact others since it is isolated in a process.

Firefox

Firefox uses a single-process model in which a single UI thread is shared by all windows. The reason for that apparently is to allow blocking cross-DOM calls between different pages of the same origin. More details at http://www.mail-archive.com/[email protected]/msg03580.html. Network calls and web workers are handled on different threads.

To provide better isolation and reliability, Firefox will move to a multi-process model with its Electrolysis project. However, this seems to be for plugins alone, and pages would continue to be served from a single process.

IE

The first browser to ship with multiple process support was IE 7, with each browser window running in its own process:

(source: http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx)

IE 8 improved upon this model by putting each tab in its own process, but moving the frame and the broker into a common process for improving startup time. Microsoft calls this architecture Loosely Coupled Internet Explorer (LCIE):

(source: http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx)

The actual model is however more sophisticated than what the diagram above suggests since IE 8 tries to balance the benefits of more processes with the extra overhead, without compromising on security. The actual process model is:

  • Protected Mode processes: Irrespective of memory overhead, sites with different levels of configured security open in different processes. This approach, called Protected Mode, is based on Mandatory Integrity Control (MIC)
  • Context-based tab-processes: The decision on whether to create a new tab-process or not is made depending upon the amount of memory available
  • Max tab-processes: There is a specific maximum number of tab processes that can be created for a single isolated session at a specific MIC level

More details: http://blogs.msdn.com/b/askie/archive/2009/03/09/opening-a-new-tab-may-launch-a-new-process-with-internet-explorer-8-0.aspx

Chrome

Chrome follows an approach similar to that of IE 8 – the host process for a tab is called a Renderer and the broker process is called Browser:

(source: http://dev.chromium.org/developers/design-documents/multi-process-architecture)

Chrome supports four process models:

  • Process per site-instance: Different visits to a site are in separate processes. Provides the highest level of isolation but also creates more overhead.
  • Process per site: Different sites are isolated from each other, but visits to the same site run in the same process. Reduces overall memory overhead, but if you have several pages from a site open, the size of a single Renderer would be quite large, perhaps slowing it down.
  • Process per tab: While the previous models consider source of origin, the process per tab model is based on the choice a user makes. One process is used for rendering one tab, and if in the same tab you switch to a different site, the process would continue.
  • Single process: This is the simplest model, with no isolation.
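
One way to compare the four models is to ask which pages end up sharing a renderer. The Python sketch below does that with a made-up `process_key` function: pages with equal keys share a process. The model names are just labels for this example (not Chrome switches), and the tab, site and instance values are invented.

```python
def process_key(model, tab_id, site, instance_id):
    """Return a key; pages with equal keys share one renderer process."""
    if model == "process-per-site-instance":
        return (site, instance_id)
    if model == "process-per-site":
        return (site,)
    if model == "process-per-tab":
        return (tab_id,)
    if model == "single-process":
        return ()
    raise ValueError("unknown process model: %s" % model)


def count_processes(model, pages):
    """Count the distinct renderer processes needed for the given pages."""
    return len({process_key(model, *page) for page in pages})


# (tab_id, site, instance_id) for four page visits; all values are made up.
# Tab 3 first shows news.example and is then navigated to mail.example.
pages = [
    (1, "mail.example", "a"),
    (2, "mail.example", "b"),
    (3, "news.example", "c"),
    (3, "mail.example", "d"),
]

for model in ("process-per-site-instance", "process-per-site",
              "process-per-tab", "single-process"):
    print(model, count_processes(model, pages))
# process-per-site-instance 4
# process-per-site 2
# process-per-tab 3
# single-process 1
```

The counts line up with the trade-off described above: more isolation means more processes, and therefore more overhead.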

In both Chrome and IE, a frame runs in the same process as its parent page. Also, separate processes may prevent legal interactions between two pages from the same origin. Chrome’s solution to this is to not permit a cross-process call even if it is legal. IE instead proxies these specific calls and converts them behind the scenes into some sort of IPC. This may also be supported by Chrome at some later stage.

  • Moha297

    awesome…. so much info on page performances, browser changes on performances. I still need to give another read to this article to absorb the info….

  • http://antony-stuff.blogspot.com/ Antony

    amazing stuff!!! You’re doing awesome posts!

  • http://antony-stuff.blogspot.com/ Antony

    Why did you decide to build the desktop application with WebKit? Could you explain your decision? There are other technologies: C/C++, .NET. :)

    • Anonymous

      Hey Antony, sorry about the late reply – been busy with stuff. Reasons were:

      1) We wanted to go X-OS, so .Net was out

      2) C++ is of course possible, but a pain to work with, especially given how different the various windowing systems are across Windows, Mac and Linux. Basically you would need to come up with a UI port specific to each OS

      3) The other option was AIR, but unfortunately access to native APIs is not there in the current AIR platform though Adobe has been talking about it. We did spike it out though, and besides the native access issue, also found that the design experience on AIR is not designed for a classic graphic / web-designer. You need to express a lot of UI detail in code. That was not cool.

      4) Silverlight was the other option, however, natively hosted SL requires writing your own X-OS host, since Microsoft does not provide an AIR equivalent. And writing a native SL host would perhaps never work x-platform since it requires a lot of COM plumbing

      5) We want to provide a very similar experience on the web. So code-sharing between web and desktop was high on the agenda.

      6) We have strong skills on web technologies across dev + design teams, so this was more familiar terrain

      Hope that explains it …

  • Guest

    Thanks a lot for sharing this!

  • http://www.facebook.com/people/Ashwin-Seshadri/100001761581495 Ashwin Seshadri

    Great job. Can’t wait for the rest of the series

  • rodrigo

    Can’t wait for the rest of the series

  • priya

    Nice article. It would be great if you could explain about rendering engines like webkit in your subsequent article.

  • Geepgeek

    How can you call this as your work ? Please don’t take credit for some one else’s efforts. This piece is lifted straight out of:
    http://taligarsiel.com/
    http://taligarsiel.com/Projects/howbrowserswork1.htm

  • Geepgeek

    I take it back. The link I quoted has your references. Apologies.

  • as

    Very useful overview for people starting out fresh in web programming.

  • Pavan

    Really very informative, and a nice article for those who want to know what is under the browser's hood. I am a beginner trying to learn about Webtool kit, and I came here while googling :) Just bookmarking this page. Could you please tell me some good books on WebKit and cross-platform development, browser internals, and high-traffic, scalable website development?