Markup Madness, Part One: HTML/XHTML Deathmatch

Thursday August 23rd 2007, 6:18 pm
Filed under: Firefox, World Wide Web
Posted By: Matt

Once upon a time I was a markup head, having gotten into the SGML scene just as it was careening towards the mainstream on the back of XML. We always used to make fun of the HTML people and how impure their markup was, although in reality we were insanely jealous of how successful they were. It didn’t occur to me until this February that by hitching my wagon to Mozilla I had long since gone over to the Dark Side, and that the HTML folks who used to be Them were now Us. One IRC discussion from that month (which I have edited for readability) was particularly edifying, and not just because it illustrates what happens when you try to simultaneously debate ten really smart people on a topic that they have all mastered better than you. The topic, in case it isn’t obvious, is whether there is any hope of a web consisting of shiny valid XHTML instead of oozing HTML tag swill.

For those who don’t want to wade through the whole chat log, I would sum it up concisely as follows: Microsoft sucks. Somewhat less concisely, the point is that Internet Explorer doesn’t support XHTML, so even people who want to deliver their webpages as compliant XML actually have to send them using the text/html MIME type. This causes the HTML parser to be used, defeating the whole purpose of the exercise, especially the fact that the XML parser (if it were used, which it isn’t) would complain if there were errors in the markup. Not using the XML parser means that the web continues to fill up with documents that computers can’t easily process, preventing the appearance of tons of amazingly cool new applications. Thanks, Microsoft. Did I mention you suck?
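To make the strict-versus-forgiving difference concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the standard library's xml.etree.ElementTree stands in for a browser's XML parser, html.parser stands in for an HTML parser (it is a simple tokenizer, not a browser-grade tag soup parser), and the BROKEN sample and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <p> and <br> are never closed, which is
# routine in HTML but a well-formedness error in XML.
BROKEN = "<html><body><p>Unclosed paragraph<br></body></html>"


def xml_accepts(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup: str) -> list:
    """Return the start tags found, without complaining about errors."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("XML parser accepts it:", xml_accepts(BROKEN))
    print("HTML parser saw tags:", html_tags(BROKEN))
```

The XML side refuses the document outright, while the HTML side quietly carries on, which is exactly the behaviour that lets errors pile up unnoticed.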

Nor can you simply deliver your markup to IE as HTML and to other browsers as XML because of fundamental (albeit subtle) differences between the two languages. There is hope, however, in the form of WHAT WG and (X)HTML 5. Their biggest innovation, from my perspective, is to stop treating HTML as an SGML-based language. (Buy me a beer and I will keep telling you different ways in which SGML is a crazy freak show of an international standard until I pass out or you stop buying me beers.) This means that they can look at how different browsers (none of which include an SGML-compliant parser because that would be like putting a Chernobyl-era fission reactor into every new Prius that comes off the assembly line) actually handle all of HTML’s ambiguous undocumented edge cases, choose the best compromises (read: whatever IE does since Microsoft is the least likely to ever change) and document them.

The best part is that we’ll finally be able to create a single set of webpages and deliver them as HTML or XHTML just by changing the MIME type. This strikes me as ample reason for optimism, despite the gloomy ending of the aforementioned IRC log.
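The "just by changing the MIME type" part could be sketched on the server side roughly like this. The function name choose_mime is hypothetical, the Accept header parsing is deliberately simplified, and the idea that a browser without XHTML support simply does not list application/xhtml+xml in its Accept header is my assumption, not something taken from the chat log.

```python
XHTML_MIME = "application/xhtml+xml"
HTML_MIME = "text/html"


def choose_mime(accept_header: str) -> str:
    """Pick the MIME type for a response from the request's Accept header.

    Returns application/xhtml+xml only if the client lists it explicitly
    with a non-zero quality value; otherwise falls back to text/html.
    """
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if fields[0].lower() != XHTML_MIME:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "no"
        if quality > 0:
            return XHTML_MIME
    return HTML_MIME


if __name__ == "__main__":
    # Example header values, invented for illustration.
    print(choose_mime("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_mime("*/*"))
```

Note that a bare `*/*` falls back to text/html here on purpose: a wildcard says nothing about whether the client can actually parse XHTML.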

Update: As Boris Zbarsky was quick to point out to me, the WHAT WG’s work doesn’t solve the problems with HTML-to-XML conversion he raised in the chat log. But as explained here it does make it easier for people who are so motivated to use XML tools and data internally and convert to the HTML serialization for the stragglers (including the hilariously labeled “browser that currently holds the majority market share”) who can’t yet handle XHTML.
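As a rough sketch of that "XML inside, HTML serialization for the stragglers" workflow, Python's standard xml.etree.ElementTree can parse a well-formed fragment and write it back out using HTML rules. The sample markup and the to_html and to_xml helper names are invented for this example, and the fragment deliberately carries no XHTML namespace to keep the output readable; a real pipeline would have to deal with namespaces and with the conversion problems Boris raised.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed source, kept as XML internally.
SOURCE = '<div><p>line<br/>next</p><script src="a.js"/></div>'


def to_html(root: ET.Element) -> str:
    """Serialize using HTML rules: void elements such as <br> get no
    end tag, and other empty elements get an explicit end tag."""
    return ET.tostring(root, encoding="unicode", method="html")


def to_xml(root: ET.Element) -> str:
    """Serialize using XML rules: empty elements are self-closed."""
    return ET.tostring(root, encoding="unicode", method="xml")


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)  # raises ParseError if not well-formed
    print(to_html(root))
    print(to_xml(root))
```

The self-closed script element is the interesting case: an HTML parser would not treat it as closed, so the HTML serialization has to spell out the end tag.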


6 Comments

  1. Considering that Microsoft is definitely not adopting anything coming out of the WHAT WG (instead, having members on the W3C HTML WG), work by the WHAT WG isn’t going to solve any problems if you care about IE implementation.

    Comment by Al Billings — 8/23/2007 @ 10:26 pm

  2. Chernobyl-era *fusion* reactor? Chernobyl was a fission plant ;)

    Comment by Jesse Ruderman — 8/23/2007 @ 10:47 pm

  3. If you want to try out an HTML 5 parser, there’s an online version of html5lib (a Python implementation of the HTML 5 parsing spec) available.

    I’m curious to know what the other supposed advantages of an XML-based web would be. Extensibility is one (but we’re hoping to solve that in HTML somehow) and I guess sanity in situations involving legacy code is, in principle, another. However I think getting from a world of permissive parsers to one of strict parsers is not feasible. Indeed I don’t think that one could even start with strict parsers and keep them strict; competition between browsers would always drive them to take more and more liberties with strict error handling.

    Comment by jgraham — 8/23/2007 @ 10:56 pm

  4. Jesse - My brain was thinking fission, but apparently my fingers had other ideas.

    Comment by Matt — 8/24/2007 @ 12:17 am

  5. XHTML is nice in the backend, because it allows you to use the normal XML toolchain on the stuff you produce. And it’s good you can shove it down the browsers’ throats, thus avoiding conversion.

    However, I don’t see why I would want a browser not to display a document I’ve created because of a markup error, which is what is implied by the desire to go for strict XML parsing. Sure, it would make a lot of things simpler, but I think the drawbacks would outweigh the benefits by far.

    Comment by Juri Pakaste — 8/24/2007 @ 8:14 am

  6. Juri, if you’re talking about an XML toolchain, then I don’t see how a browser could be served a document with a markup error (rather: a well-formedness error). It would either not give trouble at all, or already give parsing errors in the toolchain. If it would send unacceptable content to the browser, then that’s a sign that there must be something wrong with the XML toolchain, which I’d rather find out than leave undetected.

    Comment by Laurens Holst — 10/10/2007 @ 11:24 pm

This work is licensed under a Creative Commons License