Inside a Browser: A Beginner's Guide

I used to think browsers were just windows. Open Chrome, type a URL, website appears. Simple.

Then in my second year, my professor asked us to explain what happens between pressing Enter and seeing the page. I said something like: "The browser fetches the HTML and shows it." He nodded slowly in that way professors do when they're being polite about a terrible answer.

The real answer took me weeks to piece together. And once I did, I genuinely started looking at browsers differently: not as windows, but as some of the most sophisticated software most of us run every day. In the second or two it takes a page to load, your browser resolves a name, opens connections, parses several languages, and lays out and paints the result.

Let me walk you through it. We'll go layer by layer, no specification pages, no jargon avalanches. Just how it actually works.

What a Browser Actually Is

Most people describe a browser as "software that opens websites." That's like calling a car "a seat that moves." Technically accurate. Completely misses the point.

A browser is a platform: a complex application that fetches resources from the internet, interprets several languages at once (HTML, CSS, and JavaScript), constructs a visual representation of those resources, and renders it on your screen, all while handling user interactions in real time.

It speaks HTTP to fetch files. It understands HTML, CSS, and JavaScript. It has its own layout engine, rendering pipeline, JavaScript runtime, and networking stack. It manages memory, security sandboxes, and multiple processes running in parallel.

All of that fires up every time you type a URL.

The Main Parts, Before We Dive In

Before going deep, here's the high-level map. Think of a browser as a team where each member has one job:

User Interface (UI): Everything you see and interact with that isn't the webpage itself. The address bar, the tabs, the back button, the bookmarks bar.

Browser Engine: The coordinator. Takes input from the UI, passes it to the rendering engine, manages communication between components.

Rendering Engine: The core worker. Takes HTML, CSS, and other resources and figures out what to draw on screen.

Networking Layer: Handles all HTTP/HTTPS requests — fetching HTML files, CSS files, images, JavaScript.

JavaScript Engine: Parses and executes JavaScript. Chrome uses V8. Firefox uses SpiderMonkey.

Data Storage: Cookies, localStorage, IndexedDB, cache — the browser's memory for persistent data.

These components hand work to each other in a pipeline. Understanding that pipeline is what this blog is about.

The User Interface: The Part You Know

The UI is everything you interact with before the webpage loads — address bar, back/forward buttons, reload, bookmarks, tabs.

The UI and webpage content are intentionally separated. The browser's address bar is not part of the webpage. This matters for security — a website can't fake your browser chrome to spoof URLs (at least not through normal means).

When you type a URL and press Enter, the UI hands that URL to the browser engine. That's when the real work starts.

Browser Engine vs Rendering Engine

These two get confused constantly, even in technical writing. Here's the simple version:

The browser engine is the glue. It sits between the UI and the rendering engine, coordinating communication. When you press back, it's the browser engine that tells the rendering engine to load the previous page. It manages navigation, history, and coordinates between components.

The rendering engine is the builder. It takes raw HTML and CSS and converts them into something visual. This is where your webpage actually gets constructed and drawn.

Different browsers use different rendering engines:

Chrome and Edge use Blink
Firefox uses Gecko
Safari uses WebKit (Blink was actually forked from WebKit)

You don't need to memorize which engine does what internally. The important thing is understanding the flow that all of them follow. That flow is remarkably similar across every browser.

Networking: Fetching the Raw Material

You pressed Enter. The networking layer takes over.

DNS resolution happens first: the domain gets converted to an IP address (exactly what we covered in the DNS resolution blog: root → TLD → authoritative nameserver). Then the browser opens a TCP connection to that IP, adds a TLS handshake on top if the URL is https, and sends an HTTP request. In its simplest HTTP/1.1 form, the request looks like this:

GET /index.html HTTP/1.1
Host: google.com
Accept: text/html

The server responds with the HTML file as a stream of bytes. The browser starts working on it immediately, even before the full file arrives.
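
To make the request text above less magical, here's a minimal sketch of how a URL could be turned into those bytes. The function name build_get_request is my own invention, and the sketch assumes plain HTTP/1.1 with an ASCII hostname and no port handling. Real browsers send many more headers.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Build a minimal HTTP/1.1 GET request for a URL (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # An empty path means "the root document".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
        "Connection: close",
        "",  # the blank line marks the end of the headers
        "",
    ]
    # HTTP/1.1 separates header lines with CRLF, not a bare newline.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("http://google.com/index.html")
print(request.decode("ascii"))
```

Nothing here touches the network. It only shows that the request is plain text with a strict line format, which the networking layer then writes to the open connection.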

While parsing HTML, the browser discovers more resources — a CSS file in <link>, JavaScript in <script>, images in <img>. Each triggers another network request. Modern browsers have a preload scanner that reads ahead in the HTML while the main parser is busy, fetching CSS and JS files early so requests overlap instead of queue up.
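
The idea behind the preload scanner can be sketched with Python's built-in html.parser. This is only a rough imitation: the class ResourceScanner and the function scan_for_resources are hypothetical names, and a real scanner handles far more cases (srcset, preload hints, media attributes).

```python
from html.parser import HTMLParser


class ResourceScanner(HTMLParser):
    """Collects subresource URLs without building any tree."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet":
            kind, url = "style", attributes.get("href")
        elif tag == "script":
            kind, url = "script", attributes.get("src")
        elif tag == "img":
            kind, url = "image", attributes.get("src")
        else:
            return
        if url:  # inline scripts have no src, so there is nothing to fetch
            self.found.append((kind, url))


def scan_for_resources(html):
    """Return (kind, url) pairs in the order they appear in the markup."""
    scanner = ResourceScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found


page = """
<head>
  <link rel="stylesheet" href="main.css">
  <script src="app.js"></script>
</head>
<body><img src="logo.png"></body>
"""
print(scan_for_resources(page))
```

The point is that this pass is cheap. It only looks for URLs, so it can run ahead of the main parser and queue all three downloads at once.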

HTML Parsing: Building the DOM

The raw HTML the browser receives is just text. A string of characters. The browser needs to make it meaningful.

This process is called parsing. Let me explain parsing with something simpler first.

Consider this math expression: 3 + 4 × 2

You don't evaluate it left to right. You know multiplication comes before addition. You understand the structure — the rules — and evaluate accordingly: 3 + 8 = 11.

Parsing is the same idea. Take a flat sequence of characters, understand the rules of the language, and extract the structure.
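
Here's that math example as a tiny parser, in case seeing it in code helps. It's a sketch that only understands whole numbers, + and ×. The function names are mine, and there's no error handling for malformed input.

```python
import re


def tokenize(text):
    """Split the flat character sequence into numbers and operators."""
    return re.findall(r"\d+|[+×*]", text)


def parse(tokens):
    """Grammar: expression := term ('+' term)* ; term := number ('×' number)*"""
    position = 0

    def parse_term():
        nonlocal position
        node = int(tokens[position])
        position += 1
        # Multiplication binds tighter, so it is grouped inside a term.
        while position < len(tokens) and tokens[position] in ("×", "*"):
            position += 1
            node = ("×", node, int(tokens[position]))
            position += 1
        return node

    node = parse_term()
    while position < len(tokens) and tokens[position] == "+":
        position += 1
        node = ("+", node, parse_term())
    return node


def evaluate(node):
    """Walk the tree bottom-up to compute the result."""
    if isinstance(node, int):
        return node
    operator, left, right = node
    if operator == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


tree = parse(tokenize("3 + 4 × 2"))
print(tree)            # ('+', 3, ('×', 4, 2))
print(evaluate(tree))  # 11
```

Notice that the tree already encodes the precedence rule: the multiplication sits deeper, so it gets evaluated first. HTML parsing does the same kind of thing, with tags instead of operators.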

For HTML, the parser reads character by character, identifies tags, attributes, and text content, and converts them into a structured tree. Take this HTML:

<html>
  <body>
    <h1>Hello World</h1>
    <p>This is a paragraph.</p>
  </body>
</html>

The parser turns this into a tree structure:

Document
└── html
    └── body
        ├── h1
        │   └── "Hello World"
        └── p
            └── "This is a paragraph."

This tree is called the DOM — Document Object Model. Every element in your HTML becomes a node in this tree. Every piece of text becomes a node. Even HTML comments become nodes.
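
To see how a flat string becomes that tree, here's a toy DOM builder on top of Python's html.parser. Node, DomBuilder, and outline are names I made up for this sketch. It ignores attributes, drops whitespace-only text, and assumes no void elements like <br>, so treat it as an illustration rather than how a browser engine does it.

```python
from html.parser import HTMLParser


class Node:
    """One node in a toy DOM tree (hypothetical class, not a browser API)."""

    def __init__(self, name, text=None):
        self.name = name      # tag name, "#text", "#comment", or "Document"
        self.text = text      # content for text and comment nodes
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a Node tree by tracking the stack of currently open elements."""

    def __init__(self):
        super().__init__()
        self.document = Node("Document")
        self.open_elements = [self.document]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.open_elements[-1].children.append(node)
        self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Close the most recent open element with this name, if there is one.
        for index in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[index].name == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        # Simplification: a real DOM keeps whitespace-only text nodes too.
        if data.strip():
            self.open_elements[-1].children.append(Node("#text", data.strip()))

    def handle_comment(self, data):
        self.open_elements[-1].children.append(Node("#comment", data))


def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    label = f'"{node.text}"' if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = DomBuilder()
builder.feed("<html><body><h1>Hello World</h1><p>This is a paragraph.</p></body></html>")
builder.close()
print("\n".join(outline(builder.document)))
```

The key data structure is the stack of open elements. A start tag pushes, an end tag pops, and whatever is on top of the stack is the parent of the next node.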

The DOM is not your HTML file. It's a live, in-memory tree representation that the browser builds from your HTML. This is an important distinction. When JavaScript manipulates the DOM with document.querySelector() or element.innerHTML, it's operating on this tree in memory — not modifying your actual HTML file.

One thing I found interesting: HTML parsing is intentionally forgiving. If you write broken HTML, the parser doesn't crash. It makes its best guess and continues. That's why badly-written HTML often still renders — the browser is quietly fixing your mistakes.
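
Here's a small sketch of what "quietly fixing your mistakes" can look like. The recovery rules below are my own simplification (ignore stray end tags, auto-close anything left open), not the actual algorithm from the HTML specification, which is considerably more involved. The names Repairer and repair are hypothetical, and attributes are dropped to keep it short.

```python
from html.parser import HTMLParser


class Repairer(HTMLParser):
    """Re-serializes markup so that every opened tag ends up closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.output = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        self.output.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with nothing to close: ignore it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.output.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.output.append(data)

    def finish(self):
        self.close()
        # End of input: close whatever the author forgot to close.
        while self.open_tags:
            self.output.append(f"</{self.open_tags.pop()}>")
        return "".join(self.output)


def repair(html):
    repairer = Repairer()
    repairer.feed(html)
    return repairer.finish()


print(repair("<p>Hello <b>world</p>"))  # <p>Hello <b>world</b></p>
print(repair("<div><p>text"))           # <div><p>text</p></div>
```

No exception is ever raised for bad markup. The parser always produces some tree, which is the behavior you see in a real browser.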

CSS Parsing: Building the CSSOM

While HTML is being parsed (or sometimes after, depending on where your <link> tag is), the browser fetches and parses CSS files.

CSS parsing works similarly to HTML parsing but produces a different tree — the CSSOM, which stands for CSS Object Model.

Take this CSS:

body {
  font-size: 16px;
}

h1 {
  color: blue;
  font-size: 32px;
}

p {
  color: gray;
}

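The browser parses these rules into the CSSOM, a tree of stylesheets, rules, and declarations. When it later works out an element's final style, two mechanisms apply. The cascade, which is the "Cascading" in Cascading Style Sheets, decides which declaration wins when several rules set the same property, based on origin, specificity, and source order. Inheritance then passes certain properties, such as color and font-size, from a parent element down to its children unless a rule overrides them.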
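
Here's a rough sketch of both halves: turning the stylesheet text into structured rules, and then resolving one element's style from them. It assumes only simple tag selectors, treats later rules as winning (no specificity), and hard-codes color and font-size as the inherited properties. The names parse_css and computed_style are mine.

```python
import re

# Assumption for this sketch: only these two properties are inherited.
INHERITED = {"color", "font-size"}


def parse_css(css):
    """Return a list of (selector, {property: value}) pairs in source order."""
    rules = []
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                declarations[name.strip()] = value.strip()
        rules.append((selector.strip(), declarations))
    return rules


def computed_style(tag, parent_style, rules):
    """Start from inherited values, then apply matching rules in order."""
    style = {name: value for name, value in parent_style.items() if name in INHERITED}
    for selector, declarations in rules:
        if selector == tag:
            style.update(declarations)  # a later matching rule overrides
    return style


stylesheet = """
body { font-size: 16px; }
h1 { color: blue; font-size: 32px; }
p { color: gray; }
"""
rules = parse_css(stylesheet)
body_style = computed_style("body", {}, rules)
print(computed_style("h1", body_style, rules))  # {'font-size': '32px', 'color': 'blue'}
print(computed_style("p", body_style, rules))   # {'font-size': '16px', 'color': 'gray'}
```

The <p> never mentions font-size, yet it ends up with 16px from <body>. The <h1> starts with the same inherited value and then its own rule replaces it.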

Here's something important that tripped me up early: CSS blocks rendering. By default, the browser will not render the page until it has fetched and parsed every stylesheet that applies to it. It needs the complete CSSOM before it can draw anything, because a rule in any stylesheet could override a style set by another one.

This is why putting <link rel="stylesheet"> in the <head> matters. It gives the browser a chance to fetch CSS early. And it's why large, slow-loading CSS files make your page feel slow — the entire render is waiting for that CSS to arrive and be parsed.

Render Tree: DOM Meets CSSOM

Now you have two trees: the DOM (structure) and the CSSOM (styles). The browser combines them into a third structure called the Render Tree.

The Render Tree contains only the elements that take part in rendering. It merges each DOM node with its computed styles from the CSSOM.

This is where some elements get excluded. <head> elements such as <title>, <meta>, and <link> are in the DOM but not the Render Tree, because they don't render visually. Elements with display: none are also excluded: they're in the DOM, but they don't occupy space on screen. Elements with visibility: hidden are a different case: they're invisible but still occupy space, so they stay in the Render Tree.

Think of it this way:

DOM = the full structure, including invisible elements
CSSOM = the style rules
Render Tree = visible elements with styles attached

The Render Tree is what the browser actually uses to figure out what to draw.
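
A minimal sketch of that filtering step, using plain dictionaries as stand-ins for DOM nodes. The node shape ("tag", "style", "children") and the function names are assumptions I made for the example. Real engines use much richer structures.

```python
# Tags that exist in the DOM but never produce anything on screen.
NON_VISUAL_TAGS = {"head", "title", "meta", "link", "script", "style"}


def build_render_tree(node):
    """Copy the DOM, skipping nodes that will not be rendered."""
    if node["tag"] in NON_VISUAL_TAGS:
        return None
    style = node.get("style", {})
    if style.get("display") == "none":
        return None  # the whole subtree is dropped along with the element
    children = [build_render_tree(child) for child in node.get("children", [])]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [child for child in children if child is not None],
    }


def tags(node):
    """Flatten a tree into a list of tag names, for easy printing."""
    result = [node["tag"]]
    for child in node["children"]:
        result.extend(tags(child))
    return result


dom = {"tag": "html", "children": [
    {"tag": "head", "children": [{"tag": "title", "children": []}]},
    {"tag": "body", "style": {"font-size": "16px"}, "children": [
        {"tag": "h1", "style": {"color": "blue"}, "children": []},
        {"tag": "p", "style": {"display": "none"}, "children": []},
        {"tag": "p", "style": {"color": "gray"}, "children": []},
    ]},
]}

print(tags(build_render_tree(dom)))  # ['html', 'body', 'h1', 'p']
```

The DOM has two <p> elements, but only one survives into the render tree, and <head> disappears entirely.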

Layout (Reflow): Where Does Everything Go?

Having a Render Tree tells the browser what to draw. But not where.

That's the Layout stage (also called Reflow). The browser walks the Render Tree and calculates the exact position and size of every element in pixels. This <h1> is 960px wide, starts at x=20, y=80. This <p> is 920px wide, starts at x=20, y=140. Percentage widths become pixels. Flexbox and grid calculations happen here. Text wrapping is figured out.
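
To give a feel for the arithmetic, here's a deliberately tiny layout pass. It only stacks block elements vertically inside a padded viewport and turns percentage widths into pixels. The Box class, the layout_blocks function, and the fixed heights are all assumptions for this sketch. Real layout also has to deal with text wrapping, flexbox, grid, and much more.

```python
from dataclasses import dataclass


@dataclass
class Box:
    tag: str
    x: int
    y: int
    width: int
    height: int


def layout_blocks(blocks, viewport_width, padding=20):
    """blocks: list of (tag, width_percent, height_px), in document order."""
    content_width = viewport_width - 2 * padding
    boxes = []
    y = padding
    for tag, width_percent, height in blocks:
        # Percentages are resolved against the available content width.
        width = content_width * width_percent // 100
        boxes.append(Box(tag, padding, y, width, height))
        y += height  # the next block starts below this one
    return boxes


blocks = [("h1", 100, 60), ("p", 50, 40)]
for box in layout_blocks(blocks, viewport_width=1000):
    print(box)
# Resizing the window changes the input, so every box is recomputed.
for box in layout_blocks(blocks, viewport_width=800):
    print(box)
```

Note how the position of the <p> depends on the height of the <h1> above it. That dependency between elements is why a small change can force the browser to redo layout for large parts of the page.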

Layout is expensive. It runs again (hence "reflow") any time something changes element sizes or positions: a DOM element added, a CSS class toggled that alters dimensions, an image that loads and shifts the page. Minimizing reflows is one of the most fundamental frontend performance optimisations.

Painting: Filling in the Pixels

After layout, the browser knows where everything goes. Now it fills in the visuals — colors, text, shadows, borders, images. That's the Paint stage.

Modern browsers paint in layers. Animated or transformed elements get their own layers. These layers are then composited — combined by the GPU in the correct order — and displayed on screen.

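This is why CSS transform animations are smoother than changing top/left. Changing top or left sends the browser back through layout and paint on every frame, while a transform on an element that has its own layer can usually be handled in the compositing stage alone, typically on the GPU. Understanding this is the difference between 60fps animations and janky ones.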
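
Here's a toy model of layers and compositing, with characters standing in for pixels. Everything in it (paint_layer, composite, the dictionary layout of a layer) is made up for illustration, and it runs on the CPU, so it says nothing about real GPU behavior. What it does show is the separation of work: each layer is painted once, and moving a layer only changes where it is placed during compositing.

```python
def paint_layer(width, height, fill):
    """'Paint' a layer once. Here a layer is just rows of one character."""
    return [fill * width for _ in range(height)]


def composite(layers, width, height):
    """Stack layers onto a canvas in z-order, placing each at its offset."""
    canvas = [[" "] * width for _ in range(height)]
    for layer in sorted(layers, key=lambda item: item["z"]):
        for row_index, row in enumerate(layer["pixels"]):
            for col_index, pixel in enumerate(row):
                x = layer["x"] + col_index
                y = layer["y"] + row_index
                if 0 <= x < width and 0 <= y < height:
                    canvas[y][x] = pixel
    return ["".join(row) for row in canvas]


background = {"pixels": paint_layer(6, 2, "."), "x": 0, "y": 0, "z": 0}
box = {"pixels": paint_layer(2, 1, "#"), "x": 1, "y": 0, "z": 1}

frame_1 = composite([background, box], 6, 2)
box["x"] = 3  # like a transform: move the layer, do not repaint it
frame_2 = composite([background, box], 6, 2)

print(frame_1)  # ['.##...', '......']
print(frame_2)  # ['...##.', '......']
```

Between the two frames, paint_layer is never called again. Only the offset changed, which is the cheap path that transform animations get to take.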

The Full Flow, Start to Finish

Let me bring it together. Type a URL, press Enter. Here's what happens:

URL entered
  → DNS resolution (domain → IP address)
    → TCP connection opened
      → HTTP request sent
        → HTML received and parsed
          → DOM built
        → CSS fetched and parsed
          → CSSOM built
        → DOM + CSSOM → Render Tree
          → Layout (calculate positions)
            → Paint (fill pixels)
              → Composite (combine layers)
                → Webpage visible ✓

Every request. Every page. Every time. On a fast connection with a simple page, this entire pipeline can run in under a second.

Where JavaScript Fits In

JavaScript complicates the timeline — in an interesting way.

When the HTML parser encounters a plain <script> tag (one without async or defer), it stops. It pauses HTML parsing, fetches the JavaScript file, executes it, then resumes parsing. This is why you've heard "put your scripts at the bottom of <body>": a plain script in <head> holds up parsing, and therefore rendering of everything after it, until the script downloads and runs.

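JavaScript can modify the DOM. document.createElement(), element.remove(), element.style.color = 'red': all of these change the Render Tree. Changes that affect geometry, such as adding or removing elements or changing sizes, trigger a new Layout followed by Paint, which is a reflow. Changes that only affect appearance, such as a color, skip Layout and trigger only a repaint. Doing either excessively is one of the main causes of janky, slow websites.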
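
Why does the order of DOM reads and writes matter so much? Here's a small simulation. FakeDocument is a hypothetical stand-in, not a browser API. It assumes a simple model in which any write marks layout as stale, and any read of a size must first redo layout if it is stale.

```python
class FakeDocument:
    """Counts how often layout has to run (a toy model, not a real DOM)."""

    def __init__(self):
        self.widths = {}
        self.layout_dirty = False
        self.layout_count = 0

    def _ensure_layout(self):
        if self.layout_dirty:
            self.layout_count += 1
            self.layout_dirty = False

    def set_width(self, element, width):
        self.widths[element] = width
        self.layout_dirty = True  # geometry changed, old layout is stale

    def get_width(self, element):
        self._ensure_layout()     # a read needs up-to-date geometry
        return self.widths.get(element, 0)

    def render_frame(self):
        self._ensure_layout()     # layout runs at most once per frame here


def interleaved(document, elements):
    """Read, write, read, write: each read after a write forces a layout."""
    for element in elements:
        document.set_width(element, document.get_width(element) + 10)
    document.render_frame()


def batched(document, elements):
    """All reads first, then all writes: one layout for the whole batch."""
    current = [document.get_width(element) for element in elements]
    for element, width in zip(elements, current):
        document.set_width(element, width + 10)
    document.render_frame()


slow, fast = FakeDocument(), FakeDocument()
interleaved(slow, ["a", "b", "c"])
batched(fast, ["a", "b", "c"])
print(slow.layout_count, fast.layout_count)  # 3 1
```

Both versions end with identical widths. The only difference is how many times layout ran, and with hundreds of elements that difference is what people mean by layout thrashing.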

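The async and defer attributes on script tags exist precisely to give you control over when JavaScript runs relative to HTML parsing. Both let the script download in parallel with parsing. An async script runs as soon as it arrives, in no guaranteed order, while a defer script waits until parsing has finished and runs in document order. Either way, you avoid blocking the render pipeline unnecessarily.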
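
A toy timeline makes the difference easier to see. This simulation is my own simplification: time is counted in abstract units, parsing one chunk of HTML takes one unit, script execution takes zero time, and at most one script appears at any position. The function name simulate and the "blocking" label for a plain script are assumptions for the example.

```python
def simulate(parse_units, scripts):
    """scripts: list of (name, mode, position, download_time).

    Returns (time_parsing_finished, {script_name: time_it_executed}).
    """
    by_position = {position: (name, mode, download)
                   for name, mode, position, download in scripts}
    clock = 0
    executed = {}
    deferred = []
    for chunk in range(parse_units):
        if chunk in by_position:
            name, mode, download = by_position[chunk]
            if mode == "blocking":
                clock += download             # the parser waits for the file
                executed[name] = clock
            elif mode == "async":
                executed[name] = clock + download   # runs on arrival
            elif mode == "defer":
                deferred.append((name, clock + download))
            else:
                raise ValueError(f"unknown mode: {mode}")
        clock += 1                            # parse one chunk of HTML
    parse_end = clock
    run_at = parse_end
    for name, ready_at in deferred:           # document order, after parsing
        run_at = max(run_at, ready_at)
        executed[name] = run_at
    return parse_end, executed


for mode in ("blocking", "async", "defer"):
    print(mode, simulate(10, [("app.js", mode, 2, 5)]))
# blocking (15, {'app.js': 7})
# async (10, {'app.js': 7})
# defer (10, {'app.js': 10})
```

With the plain script, parsing finishes at 15 instead of 10 because the parser sat idle during the download. With async or defer, the download overlaps with parsing, and the only question left is when the script gets to run.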

What This Means for You as a Developer

Understanding browser internals doesn't mean you need to memorize every specification. But it does change how you write code.

Knowing that CSS blocks rendering explains why you should keep stylesheets lean and load them early. Knowing that DOM manipulation triggers reflows explains why batching DOM changes is faster than making them one at a time. Knowing that JavaScript blocks HTML parsing explains defer and async. Knowing that painting happens in layers explains why CSS transforms are more performant than changing top and left for animations.

Every performance tip you'll ever read about frontend development — lazy loading, critical CSS, script deferral, avoiding layout thrashing — traces back to one or more of these pipeline stages.

You don't have to be a browser engineer to benefit from knowing this. You just have to understand the flow.

Browsers are among the most widely deployed, most used, and least understood pieces of software in the world. Billions of people use them every day without ever thinking about DOM trees or render pipelines. But you're building things for browsers, and that's a good reason to understand what's happening on the other side of your code.
