Falsehoods Programmers Believe About HTML

By Artyom Bologov

Web is beautiful. Web is ugly. Web is astonishing. Part of this appeal is HTML, with its historical quirks. Many a programmer believes many things about HTML, and some of those beliefs are not necessarily true. So let's explore some falsehoods programmers believe about HTML.

Language & Parsing

HTML is just XML. All tags have matching closing tags.
Some tags (like <li> or <p>) have implicit closing tags:
<li> List item without closing tag
<li> Another list item right after it
Example of implicit closing tags in <li> tag
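To see the gap concretely, here's a minimal Python sketch using only the standard library's xml.etree.ElementTree (the parses_as_xml helper is made up for this illustration). An XML parser refuses a list like the one above, because it demands every <li> be closed explicitly, while an HTML parser happily infers the missing end tags.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if `markup` is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Valid HTML: the parser infers each missing </li>.
html_list = "<ul><li>List item without closing tag<li>Another list item right after it</ul>"
# The same list with every end tag spelled out, as XML demands.
xml_list = "<ul><li>List item</li><li>Another list item</li></ul>"

print(parses_as_xml(html_list))  # False
print(parses_as_xml(xml_list))   # True
```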
HTML is almost XML. All tags have closing tags, even if implicit.
<img> and <input> are self-closing:
<!-- Notice the / here! -->
<input type="text"/>
Self-closing <input> tag
Okay, okay, HTML is not XML. But all elements either have closing tags or self-close.
<br> and <hr> don't even need a self-closing slash.

Actually, the self-closing slash is optional in HTML: on void elements like <br> and <input> it has no effect at all, and the standard advises using it only with caution. So the difference is less pronounced.
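Here's a rough Python sketch of what this means for code that generates HTML. The render and start_tag helpers are hypothetical; the VOID_ELEMENTS set is the list of void elements from the HTML Living Standard. Void elements get neither an end tag nor a slash, and every attribute value is quoted, because in HTML an unquoted value followed by a slash swallows it (type=text/ parses as the value "text/").

```python
import html

# Void elements as listed in the HTML Living Standard: they never take an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def start_tag(name, attrs):
    """Build a start tag, always quoting (and escaping) attribute values."""
    parts = [name] + [f'{key}="{html.escape(value)}"' for key, value in attrs.items()]
    return "<" + " ".join(parts) + ">"


def render(name, attrs=None, children=""):
    """Serialize one element; void elements get no end tag, no slash, and no children."""
    attrs = attrs or {}
    if name in VOID_ELEMENTS:
        return start_tag(name, attrs)
    return start_tag(name, attrs) + children + f"</{name}>"


print(render("p", children="one" + render("br") + "two"))  # <p>one<br>two</p>
print(render("input", {"type": "text"}))                    # <input type="text">
```

Quoting every value unconditionally is a deliberate simplification: it costs a couple of bytes and removes a whole class of parsing surprises.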

Standard

HTML is defined by the standard.
It's defined by browser vendors and the WHATWG (= browser vendors).
The standard does not change once it's published.
The standard is a "Living Standard", and you can see the (very recent) date of its last change on the Living Standard page.
The standard is self-contained (relating to HTML only).
HTML is tied into a whole group of standards, including DOM and JavaScript. In fact, many HTML features are specified as Web IDL interfaces (like HTMLInputElement) that surface as JavaScript classes.
There is only one (two? three?) doctype for HTML documents.
Oh my sweet summer child...
<!DOCTYPE html>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<!DOCTYPE math SYSTEM "http://www.w3.org/Math/DTD/mathml1/mathml.dtd">
Many (not all) HTML doctypes
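All these doctypes share one anatomy: a root element name, an optional public identifier, and an optional system identifier (usually a DTD URL). A minimal Python sketch that pulls them apart, assuming the simplified regex below is good enough for well-formed examples like the ones above. It's not a full doctype tokenizer, and parse_doctype is a made-up helper.

```python
import re

# Simplified doctype anatomy: root name, optional PUBLIC id, optional system id.
# Not a full tokenizer; good enough for well-formed, double-quoted doctypes.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\w+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)")?'
    r'(?:\s+(?:SYSTEM\s+)?"(?P<system>[^"]*)")?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(text):
    """Split a doctype into (root, public_id, system_id); missing parts are None."""
    match = DOCTYPE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized doctype: {text!r}")
    return match.group("root"), match.group("public"), match.group("system")


for doctype in (
    "<!DOCTYPE html>",
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
    '<!DOCTYPE math SYSTEM "http://www.w3.org/Math/DTD/mathml1/mathml.dtd">',
):
    print(parse_doctype(doctype))
```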

Practices

No one uses XHTML.
ePub, a widespread ebook format, uses XHTML for content markup. It sucks, but it's common practice.
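What this means in practice: an ebook reading system can treat an XHTML chapter as plain XML, namespaces and all, and a single unclosed tag makes it fail to parse. A minimal Python sketch with xml.etree.ElementTree; the CHAPTER markup and the chapter_title helper are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# XHTML elements live in this namespace; ElementTree spells tags as "{ns}name".
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical chapter file: every element is closed, even the void <br/>.
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>It was a dark and stormy night.<br/>Again.</p></body>
</html>"""


def chapter_title(xhtml: str) -> Optional[str]:
    """Return the <title> text; raises ET.ParseError if the markup isn't well-formed XML."""
    root = ET.fromstring(xhtml)
    title = root.find("x:head/x:title", NS)
    return None if title is None else title.text


print(chapter_title(CHAPTER))  # Chapter 1

try:
    # The HTML-style <br> (no slash, no end tag) is fine in HTML but fatal here.
    chapter_title(CHAPTER.replace("<br/>", "<br>"))
except ET.ParseError as err:
    print("not well-formed:", err)
```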

Runtime

Modifying DOM is slow.
React propaganda is probably to blame for this illusion. The DOM is one of the most heavily optimized data structures out there: whatever you put in it, it'll sustain. React will not.
Browsers are just messy HTML parsers.
Browsers are JS evaluators. Browsers are layout engines. Browsers are computer graphics toolkits (WebGL and fonts). Browsers are OSes (they have file system interfaces, audio output, and many other APIs).
SEO is hard and you need frameworks for it.
Not really, if you write simple semantic HTML: it's easy to parse and index, especially compared to JS-generated markup.
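A toy illustration of why: the outline of a static semantic page falls out of Python's standard html.parser in a few lines, with no JavaScript engine involved. OutlineParser and outline are hypothetical names, and real crawlers do far more than this; it's a sketch of the idea, not how any search engine actually works.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Toy indexer: collect <title> and heading text from static HTML."""

    WANTED = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []     # (tag, text) pairs in document order
        self._current = None  # the wanted tag we're inside, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._current, self._chunks = tag, []

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.outline.append((tag, "".join(self._chunks).strip()))
            self._current = None


def outline(markup):
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.outline


page = """<title>Falsehoods Programmers Believe About HTML</title>
<h1>Falsehoods Programmers Believe About HTML</h1>
<h2>Language &amp; Parsing</h2><p>HTML is just XML...</p>
<h2>Standard</h2>"""
print(outline(page))
```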
WebAssembly will deprecate HTML and JS.
These are different niches. WebAssembly has no direct DOM access and has to go through JS to touch the page, so you can't really make accessible websites with WebAssembly alone. If you want universal pages that open everywhere, you have to stick with HTML and friends.
HTML is not Turing-complete.
It is, given CSS and user input.
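For the curious: the usual demonstration encodes Rule 110, an elementary cellular automaton proven Turing-complete, in HTML and CSS, with user input advancing it one generation at a time. Here's a minimal Python sketch of the update rule such a page has to express in CSS selectors; the step function and the zero-padded edges are assumptions of this sketch, not details of any particular demo.

```python
RULE = 110  # 0b01101110: bit n is the next state for neighborhood pattern n


def step(cells):
    """Return the next Rule 110 generation; cells beyond both edges are treated as 0."""
    padded = [0, *cells, 0]
    next_row = []
    for i in range(1, len(padded) - 1):
        # Read (left, self, right) as a 3-bit number from 0 to 7.
        pattern = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        next_row.append((RULE >> pattern) & 1)
    return next_row


row = [0, 0, 0, 0, 0, 0, 0, 1]
for _ in range(4):
    print("".join("#" if cell else "." for cell in row))
    row = step(row)
```

In the HTML and CSS version, each click plays the role of one call to step: the browser only re-evaluates selectors, and the user supplies the loop.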

Did I Forget Any?

In case you haven't found your favorite falsehood, feel free to suggest more! This post will likely be on Reddit and Hacker News, so use the comments there. Or use the contacts from the About page!
