Paris Hilton, Marsellus Wallace and the Extensible Web

Sunday March 20th 2005, 1:32 pm
Filed under: Semantic Web, World Wide Web
Posted By: Matt

Well, reaction to my “Why AutoLink is Evil” post was a bit more heated than I expected. I’m certainly glad that I didn’t go with my original title: “AutoLink: Spawn of Satan?”

Seriously, there seems to be some kind of backlash effect going on here where anyone who criticizes tools like AutoLink and Greasemonkey, for whatever reason, is seen to be taking a totalitarian, anti-user stance. In my case nothing could be further from the truth. I don’t think that we should be up in arms because these tools are letting all those pesky users muck around with our pristine web. On the contrary, as users we should be demanding much more powerful, reliable and extensible ways of mucking with the web.

Any description of a web technology as “evil” has got to be largely tongue-in-cheek. This should be obvious so I’m not going to retreat from that characterization. But let me try to clarify what I said with an analogy.

AutoLink is like Paris Hilton: it’s only interesting when it’s being naughty. AutoLink is exciting because it re-raises all of those questions about who should have the right to modify web content, and these clearly strike a nerve in a lot of people. But the main issue with AutoLink isn’t whether it does or does not violate the sanctity of other people’s webpages, it’s that it’s so bloody useless. I can see a small amount of marginal utility in linking addresses to maps, much less so with ISBN numbers, and as far as package tracking and VIN numbers are concerned: come on, is this someone’s idea of a joke?

I think that these categories of information must have been chosen by Google because they’re easy to identify using simple pattern matching. The kinds of stuff that I often do want to look up on webpages, things like obscure vocabulary, acronyms, foreign words and company names, are much harder to pick out in this way. If Google works its statistical processing magic and comes up with something that, a large proportion of the time, selects content from a page that I might plausibly want to look up somewhere, then I’ll revise my AutoLink is Evil assessment. In the meantime, I stick by my assertion that the right way to do this is for the content author to mark up linkable stuff unambiguously using XML. In other words, we should move towards a web where, instead of having constantly to make link/no-link decisions, you can designate text as maybe-link, in which case different graphical interfaces can decide how and when to present these links and what they should point at.
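To make the point about simple pattern matching concrete, here is a minimal Python sketch. The patterns and function names are assumptions made for illustration, not anything taken from AutoLink itself: an unhyphenated ISBN-10 is ten characters with a checksum, and a VIN is seventeen characters drawn from a restricted alphabet, so both fall out of a regular expression. An acronym or company name has no such shape to latch on to.

```python
import re

# Hypothetical patterns for illustration only; these are not AutoLink's rules.
ISBN10 = re.compile(r"\b\d{9}[\dX]\b")            # unhyphenated ISBN-10
VIN = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")      # 17 chars, no I, O or Q


def isbn10_ok(candidate):
    """Return True if the ISBN-10 check digit is consistent."""
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(candidate)
    )
    return total % 11 == 0


def find_candidates(text):
    """Return (kind, matched_text) pairs that pattern matching can pick out."""
    found = []
    for m in ISBN10.finditer(text):
        if isbn10_ok(m.group()):
            found.append(("isbn", m.group()))
    for m in VIN.finditer(text):
        found.append(("vin", m.group()))
    return found


sample = "See ISBN 0306406152, VIN 1HGCM82633A004352, and the acronym SGML."
print(find_candidates(sample))
```

Run on the sample sentence, the sketch picks out the ISBN and the VIN and silently skips "SGML", which is exactly the kind of thing a reader is more likely to want to look up.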

Greasemonkey is like Marsellus Wallace: it’s evil, but cool. It’s evil when people start thinking that this is a good way to bring the customizable web to a large number of desktops. It’s cool because for tech-savvy users, the extreme sports enthusiasts of the internet world, it provides a quick-and-dirty way to play around with web content. The intriguing thing about Greasemonkey is that it could easily become a good guy since the technique it uses is equally applicable to XML content.

As far as the “semantic web is vaporware” argument goes, we need to remember that a lot of technologies take off just about the time that people start to write them off as vacuous. This is because our expectations of how fast radical change is going to happen tend to be wildly optimistic. Add to this the fact that the powers-that-be dropped the ball in many ways when putting together the specs that were supposed to be the underpinnings of XML on the web, and it’s hardly surprising that it hasn’t happened yet.

Nonetheless, there’s plenty of reason for optimism. One of the biggest barriers to adopting XML on the web has been the lack of tools for visualizing that XML. No one wants to be the first one to deploy content that can only be viewed inside a sea of icky angle brackets. RSS is a revolutionary force in this respect. Suddenly thousands (millions?) of people are consuming highly structured XML data on a daily basis. It’s hard to see how we could shoehorn things like <streetAddress> or <companyName> tags into existing HTML pages, but it’s much easier to imagine this sort of thing being added to RSS feeds and processed usefully by news aggregators.
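As a rough sketch of what that might look like, the Python below parses an RSS item carrying namespaced <streetAddress> and <companyName> elements and lists them as maybe-link candidates. The namespace URI, the sample feed and the function name are all hypothetical, made up for this example; no such RSS extension is assumed to exist.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for "maybe-link" elements; not a real RSS extension.
MAYBE_LINK_NS = "http://example.org/ns/maybelink"

FEED = """<rss version="2.0" xmlns:ml="http://example.org/ns/maybelink">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Office move</title>
      <ml:companyName>Example Corp</ml:companyName>
      <ml:streetAddress>1 Example Street, Springfield</ml:streetAddress>
    </item>
  </channel>
</rss>"""


def linkable_items(feed_xml):
    """Return (kind, text) pairs for every maybe-link element in the feed."""
    root = ET.fromstring(feed_xml)
    prefix = "{" + MAYBE_LINK_NS + "}"
    results = []
    for item in root.iter("item"):
        for child in item:
            # ElementTree expands namespaced tags to "{uri}localname".
            if child.tag.startswith(prefix):
                kind = child.tag[len(prefix):]
                results.append((kind, (child.text or "").strip()))
    return results


for kind, text in linkable_items(FEED):
    print(kind, "->", text)
```

An aggregator built along these lines could decide for itself whether a streetAddress becomes a map link and where a companyName should point, which is the maybe-link idea: the author marks what is linkable and the client chooses the link.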

In the meantime, if you want to make your very own insipid reality show or get medieval on my ass, I guess I can live with that.


3 Comments

  1. Interesting analogy; I’m not sure I disagree with you.

    However, the biggest problem with the Greasemonkey-is-evil meme is that there’s no alternative. I’ve already got 15 user scripts installed that measurably increase my productivity, and this, in a nutshell, is why Greasemonkey is here to stay.

    The thing that’s interesting about your argument is that to me the logical conclusion to draw from the brittleness of Greasemonkey is that, despite the widespread adoption of valid HTML, adoption of semantically valid HTML is still far off. It seems that a push towards the use of semantically valid HTML — perhaps combined with the judicious development and use of microformats — would allow Greasemonkey (etc) to be more robust.

    Comment by Jacob Kaplan-Moss — 3/21/2005 @ 1:51 pm

  2. Jacob,

    Absolute, total agreement. After all this ranting and raving about how cool an XML-based web would be, I spent the past 24 hours trying to think how it could actually become a reality without requiring a “big bang” migration where every authoring tool, browser, RSS reader, etc. on earth is updated simultaneously, since that clearly ain’t gonna happen.

    My conclusion was exactly what you suggest. The semantics need to be cleanly embedded into HTML pages in such a way that they continue to render in existing browsers. I wasn’t familiar with microformats, but it’s very encouraging to see that others are thinking along these same lines, and producing code to boot.

    What I want to do now is put together a case study of how AutoLink-style features could be implemented using this kind of approach, so that link identification happens at authoring time (either through explicit tagging or heuristics) and is delivered to the client using microformats or something similar.

    Comment by Matt — 3/21/2005 @ 5:13 pm

  3. Matt,
    I agree that million dollar markup on the general web would be great and useful.

    I’m not sure it’s forthcoming. If we can demonstrate value in integrations and user-centered features, I think it might be an incentive to create more formal mechanisms. But the expectation that creating formal markups (even microformats) will supplant the need for user scripts is just wrong.

    Jacob’s suggestion that microformats might be a way forward is certainly more likely than a specially-created XML vocabulary. One of the appealing things about Greasemonkey (to me) is that it allows for unintended and unanticipated features (as far as the publisher is concerned). I think Greasemonkey could be an engine for discovering microformat demands. However, you’re never going to avoid the conflict where a publisher’s interests are not in line with a user’s, and on this front, I will be surprised if there can be any useful formalization.

    Comment by Jeremy Dunck — 3/23/2005 @ 8:01 pm
