The Fall and Rise of the Semantic Web

Monday June 19th 2006, 5:17 pm
Filed under: Semantic Web, Online Identity
Posted By: Matt

One day I’ll write a business book describing the key maxims that I’ve picked up over the course of my career and adopted as guiding principles. Probably right after I learn Japanese, publish my first novel, break par over 18 holes, improvise a six-part fugue on a theme by Klaus Nomi and win the world backgammon championship. In the meantime, I can always pontificate about these pearls of wisdom here on Peer Pressure.

One of my favorites is the tendency of technology hype to peak far too early, causing folks to write off promising trends years before their time. By the time the hype starts to crystallize into reality, no one is paying attention since they’re already concentrating on the latest flavor du jour.

A particularly poignant example is the semantic web. This was all the rage back in the late 90’s, but I dare say most tech watchers have forgotten all about it as they work themselves into an AJAX-fueled, contextual advertising-funded frenzy. Meanwhile, clear evidence of its imminent emergence is starting to appear, as described for example in a recent blog post by Tim Bray.

The main problem with initial efforts to add structure and semantics to the web is that they relied on a big bang shift in the way web content is created, with no incremental path to adoption. The inevitable result is a classic tech catch 22: no one wants to create content that can’t be consumed, and no one wants to invest in tools to consume content that doesn’t yet exist. Perhaps the biggest driver of the future semantic web will thus be RSS, especially to the extent that this can be abused as a blanket term that also encompasses the far more flexible (and far less yucky) Atom. By bringing structured content to the masses in a way that’s immediately useful, RSS opens the door for a parallel web based on XML, with all the exciting possibilities this implies for more intelligent web applications.

Microformats are another important step. The idea of dual-purpose content that can be processed by human brains while we wait for computers to make them irrelevant neatly solves the chicken-and-egg adoption dilemma.

Naturally we’re not there yet. Some sort of persistent client-hosted identity, for example, is a prerequisite for a true semantic web. And that still seems tantalizingly out of reach. But gadgetry like Technorati’s microformat search and Ray Ozzie’s brilliant Live Clipboard are clear signs that the tide is turning.



Searching Questions

Wednesday May 10th 2006, 6:22 pm
Filed under: Semantic Web, World Wide Web
Posted By: Matt

During my flight to the States a couple of weeks ago I finally found time to read John Battelle’s The Search. As expected, the tale of Google’s vertiginous rise is described in fascinating detail. Where the book exceeded my expectations was in John’s assessment of the current and future state of search technology. Like our resident VC blogger Mark, John points out that existing search engines only scrape the surface of what could theoretically be achieved. The holy grail is a system that understands the intent of the user’s search and responds accordingly, rather than doing simple keyword matching. This is an unbelievably complex task as it involves boiling down the meaning of each webpage into some formal semantic representation, using a similar formalism for the user’s query and then matching the two up in an efficient manner. It may take decades to achieve this, but some aspects of this type of approach will doubtless find their way into mainstream search software in the not too distant future.

John also develops a vision for the future of media and advertising which I believe is spot-on. I criticized Bob Cringely for lacking sufficient imagination in his portrayal of a prospective media/advertising consumption scenario. Specifically, he makes it sound like the ad-defaced broadcast paradigm will perdure, but the ads will be chosen based on your web surfing habits (or something like that). In John’s vision, people will view the shows they want, when they want, and they will be able to ask explicitly to view relevant ads (lured, perhaps, by discount offers and the like).

This jibes completely with my own expectation of how the media landscape is evolving. What’s more, it’s already happening. Highly recommended.



Through Thick and Thin?

Thursday November 10th 2005, 8:12 pm
Filed under: Semantic Web, World Wide Web, P2P, Online Identity
Posted By: Matt

My first experiences with computing were on a VT100 terminal connected via a 300 baud modem to a DEC VAX at the local university. So I’ve probably followed the whole thick client/thin client seesaw for as long as most, fatting it up on a Windows machine with (gasp!) no connectivity in the early 90’s, then migrating more and more of my applications onto the network in the late 90’s and early 00’s (pronounced “naughties”, I believe).

The debate hasn’t gotten any clearer over the past couple of decades. David Berlind makes a compelling case for paying “less or nothing at all for someone else to worry about guaranteed headaches such as software upgrades, data backup and recovery, and system maintenance.” Jonathan Schwartz explains (albeit somewhat self-servingly) why it would be better to store all our data on a Sun grid while running our applications locally. And Joshua Porter provides a great illustration of why some data is better stored locally, even if the apps are remote. So what’s the deal… do I need to go Atkins on my PC or not?

The problem here is that there are two conflicting trends. On the one hand, bandwidth and storage are getting closer and closer to free, which favors remote storage and execution. On the other hand, processing power and, ehm, storage are getting closer and closer to free, so it’s just as cheap to have a supercomputer on your desk as a dumb terminal.

I think that, at the end of the day, the answer doesn’t lie so much in finding a definitive Right Place for data and code to reside. Instead, it has to do with gaining the flexibility to make sure that stuff can migrate to the optimal location depending on the precise factors at play, possibly in real time. For code, this means a combination of technologies like AJAX, Firefox extensions and Flash (and their future incarnations). For data, it means richer data schemas to replace the genericity of RSS and OPML so that we know what’s what, chunking data down so it can be easily transferred and replicated, and (most important of all) unique identifiers so we can keep track of what is where.

Oh dear, have I become an RDF convert?



Unnatural Resources

Wednesday September 21st 2005, 8:56 pm
Filed under: AllPeers, Software Development, Semantic Web, Firefox, P2P
Posted By: Matt

While the initial spark of inspiration behind AllPeers centered around the use of peer-to-peer network topologies, I actually think that the biggest innovation is in the way that we structure and manage data. I’ve been active in the XML space for almost a decade, since I first heard the term at an SGML conference in Chicago. Yet despite a fantastic technical specification, tremendous buzz and widespread usage of XML as a serialization format, the standard has yet to live up to the vision of its creator, Jon Bosak. Jon’s idea was that XML would “give Java something to do”, providing structured data instead of mucky HTML that smart client-side apps could munch on. Java may not be the white-hot property that it once was, but the general idea is still as brilliant as it was back in 1997.

The reason for this failure is that XML itself doesn’t provide enough of the complete solution. RDF is in many ways a step in the right direction, filling in the gaps so that global interoperability of structured data can be achieved. But RDF has suffered from excessive complexity and a lack of effective tools for building fully functional applications. The missing pieces, in my view, include:

  • Mechanisms for update of replicated resources.
  • Efficient client-side storage and retrieval.
  • Frameworks for generating attractive user interfaces.
  • APIs that any sane developer would actually want to use.
  • An approachable system for designing and distributing schemas for describing RDF data structures.

Now I’m not going to sit here and claim that we’re going to singlehandedly make the semantic web a reality. But I do think that AllPeers could have a real impact. In reality, P2P is as much about pushing processing onto the client as it is about making efficient use of network resources. This is essential if an application is to make use of structured data, and it’s especially relevant because we’re hosted inside Firefox. Firefox has a built-in RDF infrastructure, but it takes a lot of effort to do anything useful with the data. You could say that our goal is to make it as easy and intuitive to browse RDF data as it is to surf HTML pages.

AllPeers includes a very efficient storage module for storing and retrieving RDF resources in an SQL database (SQLite, to be specific), so you can process the large volumes of data that are characteristic of real-world applications. We track resource distribution and update replicated resources automatically when the original version changes. We use Relax NG schemas to describe resource formats and associated metadata, so you can enforce validity and organize metadata about these formats in a simple, structured way rather than in messy, fragile code.
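
To make the "RDF in an SQL database" idea a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the AllPeers schema, which this post doesn't describe; the single `triples` table, the index and the `urn:example:` identifiers are all assumptions chosen for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a SQLite database holding one row per RDF triple.

    The table layout is a hypothetical one, not the actual AllPeers design.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL,"
        " PRIMARY KEY (subject, predicate, object))"
    )
    # Secondary index for lookups such as "everything with this title".
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_po ON triples (predicate, object)"
    )
    return conn


def add_triple(conn, subject, predicate, obj):
    """Store one statement; re-adding an identical statement is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )


def describe(conn, subject):
    """Return every (predicate, object) pair known about a resource."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples"
        " WHERE subject = ? ORDER BY predicate, object",
        (subject,),
    )
    return rows.fetchall()


store = open_store()
add_triple(store, "urn:example:photo1", "dc:title", "Prague at dusk")
add_triple(store, "urn:example:photo1", "dc:creator", "Matt")
add_triple(store, "urn:example:photo1", "dc:creator", "Matt")  # duplicate, ignored
print(describe(store, "urn:example:photo1"))
```

The appeal of this kind of layout is that the database does the heavy lifting (indexing, paging, transactions), so the volume of resources stops being the application's problem.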

Firefox already has a decent framework for creating user interfaces based on RDF, and it’s slated to get a whole lot better. So about the only thing that’s still missing is a less verbose API for non-deities to program to. I have some ideas about this as well. Just imagine an E4X API that hooks automatically into the abstract XML serialization of an RDF resource. That probably sounds like gobbledygook, but the net net is that it could make RDF programming accessible to a much broader segment of the developer population.

Update: I just noticed that XML.com is running a rather eccentric article on “the difference between XML and RDF”. Enlightenment is a mere click away… or is it?




Building a Better Browser

Sunday September 18th 2005, 7:06 pm
Filed under: AllPeers, Software Development, Semantic Web, World Wide Web, Social Software
Posted By: Matt

Bart from Flock was kind enough to comment on my last post, and now that we are engaged in very Web 2.0ish social software-driven discourse I decided I should play nice and not title this post “What the Flock?”, despite the almost irresistible temptation to do so.

I’m kidding, people.

Bart: I definitely agree with you that we should try to connect in person. We seem to be attacking the same problem, albeit from totally different angles, and it would be great to compare notes. I hate to be such a petulant geek, but I’m still unconvinced by the notion of packaging Flock as a browser, rather than an extension. I don’t question your sincerity with respect to not forking. But consider this: what if you build a browser with an awesome WYSIWYG editor and I build one with, say, facilities to manage and share media files? The immediate implication is that users have to choose between one or the other. I submit that this is exactly the problem that the Firefox extension mechanism is designed to solve.

There are other disadvantages to eschewing the extension route. For one thing, it’s a lot easier to convince users to install an extension, in my experience, since the perceived “cost of entry” is much lower. Your editor is great. I was really impressed by how painlessly it found my blog, downloaded my existing posts and provided me with a slick way to modify them and create new ones. I’m ready to bet a cholesterol-laden dinner in the Prague greasy spoon of your choice, however, that a) this functionality could be provided as an extension and b) someone is going to do so if you don’t. There’s a real risk of being outcompeted if this happens. I don’t see enough benefits in the soup-to-nuts browser approach to outweigh these considerations.

Anyway, my seething envy over all the advance publicity that Flock is getting has inspired me to say a bit more about what we’re actually doing here at AllPeers. Starting next week, I’ll be posting regularly about our vision and how we are going about achieving it. This is doubtless long overdue since the first Firefox-based version of AllPeers is scheduled for next month.



This Little Piggy Went to Market

Friday May 27th 2005, 5:23 pm
Filed under: Semantic Web, Firefox
Posted By: Matt

I read about Piggy Bank the other day and was intrigued because it appears to incorporate many of the ideas that I have had about how to transform the web into something more structured without “boiling the ocean”. In a nutshell, Piggy Bank is a Firefox extension that lets you consume RDF data sources, screenscrape HTML so that it looks like RDF and store items so obtained in a data store for browsing and sharing. For much, much more information read their very interesting whitepaper.

I tried to install it just now, but unfortunately I couldn’t get it to work. The documentation says that I should see a “data coin” in my status bar that lets me access RDF data sources and screenscrapers, but this icon didn’t show up when it was supposed to. Perhaps something to do with my configuration (billions of extensions installed), but this kind of stuff has to Just Work so I didn’t pursue it further than that.

Just from the info in their whitepaper, there are two things that I really like. One is the idea of using reusable HTML screenscrapers to create structured data. If tools that consume structured data are to become widespread, we need to seed the ecosystem somehow, and this strikes me as a great way to do so. The other is the ability to add tags to tags. I read Clay Shirky’s “Ontology is Overrated” paper, and his thesis that you can apply Google-style techniques to tags without requiring more formal taxonomies is actually pretty darn convincing. But I still think that something is missing and that being able to mark up the tags themselves would plug a lot of holes and make this approach an order of magnitude more powerful.
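
To show what "tags on tags" could buy you, here is a toy sketch. It has nothing to do with how Piggy Bank actually stores its data; the `TagGraph` class and the sample names are invented for illustration. The point is only that once a tag can itself be tagged, a search for a broad tag can reach items that were only ever labeled with a narrower one.

```python
from collections import defaultdict


class TagGraph:
    """Items carry tags, and a tag can itself be tagged like any other item."""

    def __init__(self):
        # Maps a tag to the set of things (items or other tags) carrying it.
        self._tagged_with = defaultdict(set)

    def tag(self, thing, *tags):
        """Attach one or more tags to an item or to another tag."""
        for t in tags:
            self._tagged_with[t].add(thing)

    def find(self, tag):
        """Everything carrying `tag`, directly or via a tag that carries it.

        Intermediate tags are included in the result; cycles are tolerated.
        """
        found, queue, seen = set(), [tag], {tag}
        while queue:
            current = queue.pop()
            for thing in self._tagged_with.get(current, ()):
                found.add(thing)
                # If `thing` is itself used as a tag, follow it as well.
                if thing in self._tagged_with and thing not in seen:
                    seen.add(thing)
                    queue.append(thing)
        return found


graph = TagGraph()
graph.tag("post-about-extensions", "firefox")
graph.tag("firefox", "browser")  # marking up the tag itself
print(sorted(graph.find("browser")))
```

Nobody had to file the post under "browser" for it to turn up there, which is the kind of hole-plugging I have in mind.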

Now the bad news: it’s written in Java and uses an RDF datastore as a backend. I love Java as a programming language but I remain to be convinced that it is a good choice for a real-world consumer application. And if you want to store a lot of data in a scalable manner, you simply need to use an SQL database. And this comes from a guy who worked for years as a developer for an object-oriented database company. Perhaps most serious is the lack of a compelling reason for adoption. On his blog, Stefano Mazzocchi, one of the project founders, speculates that Piggy Bank might be a killer app. In my opinion, however, a platform is never a killer app.

The future looks bright, but it ain’t here yet.



Tag Me Up, Tag Me Down

Tuesday May 03rd 2005, 11:43 am
Filed under: Semantic Web, Social Software
Posted By: Matt

Clay Shirky with a characteristically insightful article, responding to Tim Bray’s question “do we need tags?” on the new You’re It! blog (devoted entirely to tagging). Definitely worth the read. I particularly liked Clay’s point about adding “people” and “time” as dimensions in the search matrix. This is a point that I missed in my earlier post on this topic. I pointed out that the Technorati Tag search for “firefox” yields far better quality results than the equivalent Google search. Much more striking are the results on the corresponding del.icio.us page. No mess or bother at all, just highly relevant links, with the most topical items near the top.

I should mention, by the by, that after all my blathering about Technorati Tags, I recently unsubscribed from their RSS feeds. Nice idea, but still far too much noise.



Do Tags Matter?

Monday April 11th 2005, 8:42 pm
Filed under: Semantic Web
Posted By: Matt

Tim Bray is still wondering about tags. In reference to Technorati Tags, he asks:

Are tags useful? Are there any questions you want to ask, or jobs you want to do, where tags are part of the solution, and clearly work better than old-fashioned search? I really want to believe that tagging is big, a game-changer, but the longer I go on asking this question and not getting an answer, the more nervous I get.

Well we can’t have that, so let me take a stab at it, keeping in mind that what Tim is really asking is whether tag-based searches have the potential to give better results than keyword-based queries.

Do tags matter? Yes. It’s true that the utility of the Technorati Tags feature is still very limited, as I’ve discussed in the past. It takes a certain leap of faith to see where this is leading. At the same time, there is a clear advantage to the tag approach, which is that it involves a conscious effort on the part of content authors to help you determine whether their material is of interest to you. Any time you can set up a synergistic bond between human provider and human consumer, you’re going to get better results than you can through purely automated processing like massive full-text indexes. Blog authors want more readers and readers want a remedy for information overload; since both of these itches can be scratched by tagging, the necessary synergies are clearly present.

An example: as an enthusiastic developer of Firefox extensions (well, extension really, but I do plan to create many more), I’m interested in posts about Firefox. But forget about searching for the word “firefox” using a full-text engine. To me it looks like at most 10% of the hits are actually about Firefox as opposed to irrelevant posts that use the term tangentially. Now compare this to the corresponding tag-based search. Still not breathtaking, but a whole heck of a lot better.

And let’s not forget that full-text engines have been mature technology for decades, while tag-based systems are in their infancy. There’s any number of things that Technorati could (and doubtless will) do to make their results better. For starters, they could filter based on language. All those Japanese posts (and there are lots of ‘em) are just pretty little pictures dancing on the page for us ignorant western types. Hardly edifying. And, as I’ve mentioned, it would be a lot easier to wade through the RSS feeds if they’d list the blog name and Cosmos statistics for each entry.

In fact, what I’d really like to see is a system like the one used for popular tags on del.icio.us. Instead of using the absolute number of bookmarks to determine what gets on the much-coveted popular page, they use the rate of growth (i.e. the first derivative). So instead of seeing the Slashdots and Boing Boings and other stuff-I-already-knew-about entrenched at the top of the list, you see what’s hot right now. The only problem is that the list is skewed to what the del.icio.us crowd is digging. If Technorati could duplicate this for the whole blogosphere, choosing a reasonable cut-off threshold so I only get links to a manageable number of posts, they’d have a whopping hit on their hands.
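
Ranking by rate of growth rather than by absolute count is simple enough to sketch in a few lines. This is only my guess at the general shape of the technique, not del.icio.us's actual algorithm; the two-snapshot comparison, the `threshold` and `limit` parameters and the sample numbers are all assumptions.

```python
def hottest(previous, current, threshold=5, limit=10):
    """Rank links by bookmarks gained between two snapshots.

    `previous` and `current` map a URL to its total bookmark count.
    Links that gained fewer than `threshold` bookmarks are dropped, and at
    most `limit` results are returned so the list stays manageable.
    """
    growth = {
        url: count - previous.get(url, 0)
        for url, count in current.items()
    }
    rising = [(url, gain) for url, gain in growth.items() if gain >= threshold]
    # Largest gain first; ties broken alphabetically for stable output.
    rising.sort(key=lambda pair: (-pair[1], pair[0]))
    return rising[:limit]


yesterday = {"slashdot.org": 9000, "example.org/new-post": 3}
today = {"slashdot.org": 9004, "example.org/new-post": 60, "example.net/quiet": 2}
print(hottest(yesterday, today))
```

The entrenched site with thousands of bookmarks drops off the list because it barely moved, while the newcomer that jumped from 3 to 60 makes the cut.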



Categorical Denial

Thursday April 07th 2005, 6:25 pm
Filed under: Semantic Web, Social Software
Posted By: Matt

I’m getting into the Technorati Tag feeds. Everyone should try this! It’s nice that they recognize my blog categories and file them under the right tags. But the mechanism they offer for adding ad hoc tags leaves something to be desired. You have to add a visible link to your blog entry:

Now maybe it’s just my famously limited graphic design skills, but I can’t figure out a way to add these links that doesn’t look goofy. I finally cracked because I wanted my last post about Apple to show up under the “apple” tag, and I stole Chris Lott’s approach. For whatever reason I like the way it looks on his blog much more than on mine. I even tried hiding the link using a CSS style tied to the [rel="tag"] attribute value, but naturally that didn’t look right in my RSS feed since whatever news reader is being used isn’t going to have access to my stylesheet. RSS should support client-side skinning, dammit!

I guess what I’ll end up doing is the long-postponed reorganization of my blog categories, which I created through a more or less random stream-of-consciousness brain dump when I first set up the blog. Now I’ve realized that what I want is far more categories, structured in a hierarchy and tied into tag-oriented sites like Technorati and del.icio.us. I’m not sure exactly how I’m going to achieve that, but it should be a fun project.

Update: Since I appear to have a supernatural influence over Technorati’s development team, I should mention another beef I have with their RSS output. On the webpage for a tag, you can see what blog each link is from and (way cool) how many links that site has in the Technorati Cosmos. Talk about useful data for filtering posts. So how come this info doesn’t show up in the feed? I hate to bitch since the service is already on the cutting edge. But I will anyway.



Technorati Takes Notice

Tuesday April 05th 2005, 11:33 am
Filed under: Semantic Web, World Wide Web, Social Software
Posted By: Matt

So yesterday I posted my impressions of Technorati Tag’s RSS output, and notably the fact that it’s darn near useless if you can’t see the description text of the posts in addition to the title. Today I got into the office and checked my Technorati feeds, and whaddya know? Now they have descriptions! Omigod, is Big Brother watching me?

Anyhow, surely coincidence but bravo Technorati nonetheless. The service is already much more useful now.



Dissecting Technorati Tags

Monday April 04th 2005, 5:56 pm
Filed under: Semantic Web, World Wide Web, Social Software
Posted By: Matt

I took a look at Technorati Tags when it was first released, but sorely missed the ability to subscribe to a tag-specific RSS feed. Since then, I’ve checked back a few times but didn’t see any way to do this. Then I stumbled on this post by Niall Kennedy. Apparently it is possible to get an RSS output (and has been since February), just not in the way I expected.

I gave it a whirl, and the good news is that it actually works. Unfortunately, they don’t let you generate a feed directly from the tag page. Instead, you need to go through their API, which means you need to first get an API key. Then you use the TagQuery interface to get the RSS feed, keeping in mind that the default output is XML, so you have to ask for the RSS format explicitly. So my query for the posts about P2P looks like this:

http://api.technorati.com/tag?key=<my API key>&tag=p2p&format=rss

Sure enough, the day after adding this to my Bloglines account, a handful of Grokster-related posts were there waiting for me.
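
If you want feeds for more than one tag, the query above is easy to assemble in code. The sketch below only builds the URL; it uses the endpoint and the three parameters (`key`, `tag`, `format`) exactly as they appear in my query, and makes no attempt to call the service. The function name and the `MY_API_KEY` placeholder are made up.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the query above.
API_ENDPOINT = "http://api.technorati.com/tag"


def tag_feed_url(api_key, tag):
    """Build the feed URL for one tag, asking explicitly for RSS output."""
    params = {"key": api_key, "tag": tag, "format": "rss"}
    return API_ENDPOINT + "?" + urlencode(params)


# "MY_API_KEY" is a placeholder, not a real key.
print(tag_feed_url("MY_API_KEY", "p2p"))
```

Because urlencode escapes the values, a tag containing spaces or punctuation still yields a well-formed URL.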

The bad news is that the service isn’t really ready for primetime. First of all, you only get the title of each post, without any of the actual text. This might be an oversight because the corresponding page on Technorati does have a snippet of text for each post. This is a showstopper since it makes it impossible to scan the items and pick out the interesting ones without having to visit each site. How am I supposed to know from the title whether “Open Thread and Look Around the Net” is something I want to read? (Just checked and it isn’t.)

Another problem is the need for an API key. This isn’t such a big deal, but it does make the whole process less convenient. Also, aggregation sites like Bloglines can’t keep track of subscription stats since everyone’s feed has a different URL. This is too bad since enquiring minds like mine want to know how many other people are subscribed to the P2P tag, as opposed to some other arbitrary tag (hey, at least I didn’t say “long tail”!).

Finally, there simply aren’t enough posts with tag information right now. Obviously this isn’t Technorati’s fault, but I do think that there is the potential for a virtuous circle here. Once you can go onto Bloglines and see that a gazillion people are subscribed to various tags, the motivation to tag up your own posts will soar. Hopefully the next version of Technorati Tags will be useful enough to kickstart the process.



Open-Ended Links

Monday April 04th 2005, 1:21 pm
Filed under: Semantic Web, World Wide Web
Posted By: Matt

Julien Couvreur picks up the linking debate with an excellent take on “open-ended links”; i.e. links that are bound by the client instead of the server. The idea is that in many cases hyperlinks would be more useful if the reader could decide what should be a link and where it should point. I’ve been doing a lot of thinking and writing about this lately, and I hope to have a new essay on this topic up soon.

In the meantime, let me just say this: I’m still very queasy about the idea of an open-ended linking architecture based solely on client-side manipulation of HTML pages. Unlike Julien, I’m not so worried about the incentives for content authors to add more structured linking information to their documents. At the end of the day, we’re all trying to get our stuff read, and if we can add value (as perceived by our readers) that’s motivation enough in itself. Yes, this will empower users to do things like choose whose affiliate IDs they want to use when clicking on an Amazon link, but so what? If there’s someone out there whose livelihood depends on Amazon clickthroughs, well, get a job, dude.

So I think that the open-ended links should be labeled as such on the server side. Like it or not, this is going to be HTML embedded “microformats” at first, something like:

<span class="openlink:streetAddress" openlink:locale="cs-CZ">Uruguayska 5, Prague</span>

Of course, we’ll need to bootstrap all our existing content to support these new links, since we can’t expect every content author on earth to go out tomorrow and mark up all their pages with new information. This is where Greasemonkey comes in. Since it steps in before the page has been flagged as loaded, any number of a variety of techniques could be used to add links like the above example into the page. This wouldn’t be as reliable or robust as letting the content author craft the links by hand, but at least it would get the ball rolling.
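
For what it's worth, consuming markup like the span above doesn't take much. Here is a rough sketch of the client side using Python's standard html.parser module: it collects every span whose class starts with `openlink:`, along with its `openlink:locale` attribute and text. The class and attribute names come from my made-up example above, not from any real microformat, and the collector deliberately ignores nested spans.

```python
from html.parser import HTMLParser


class OpenLinkCollector(HTMLParser):
    """Collect spans marked up with the hypothetical `openlink:` convention."""

    def __init__(self):
        super().__init__()
        self.links = []       # (kind, locale, text) tuples
        self._current = None  # [kind, locale, text parts] while inside a span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        if tag == "span" and css_class.startswith("openlink:"):
            kind = css_class.split(":", 1)[1]
            self._current = [kind, attrs.get("openlink:locale"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        # Nested spans are not handled: the first closing tag ends the link.
        if tag == "span" and self._current is not None:
            kind, locale, parts = self._current
            self.links.append((kind, locale, "".join(parts).strip()))
            self._current = None


page = (
    '<p>Visit us at <span class="openlink:streetAddress" '
    'openlink:locale="cs-CZ">Uruguayska 5, Prague</span>.</p>'
)
collector = OpenLinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```

What happens next (which map site, which dictionary, whose affiliate ID) is then entirely up to the reader's software, which is the whole point of binding the link on the client.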



Paris Hilton, Marsellus Wallace and the Extensible Web

Sunday March 20th 2005, 1:32 pm
Filed under: Semantic Web, World Wide Web
Posted By: Matt

Well, reaction to my “Why AutoLink is Evil” post was a bit more heated than I expected. I’m certainly glad that I didn’t go with my original title: “AutoLink: Spawn of Satan?”

Seriously, there seems to be some kind of backlash effect going on here where anyone who criticizes tools like AutoLink and Greasemonkey, for whatever reason, is seen to be taking a totalitarian, anti-user stance. In my case nothing could be further from the truth. I don’t think that we should be up in arms because these tools are letting all those pesky users muck around with our pristine web. On the contrary, as users we should be demanding much more powerful, reliable and extensible ways of mucking with the web.

Any description of a web technology as “evil” has got to be largely tongue-in-cheek. This should be obvious so I’m not going to retreat from that characterization. But let me try to clarify what I said with an analogy.

AutoLink is like Paris Hilton: it’s only interesting when it’s being naughty. AutoLink is exciting because it re-raises all of those questions about who should have the right to modify web content, and these clearly strike a nerve in a lot of people. But the main issue with AutoLink isn’t whether it does or does not violate the sanctity of other people’s webpages, it’s that it’s so bloody useless. I can see a small amount of marginal utility in linking addresses to maps, much less so with ISBN numbers, and as far as package tracking and VIN numbers are concerned: come on, is this someone’s idea of a joke?

I think that these categories of information must have been chosen by Google because they’re easy to identify using simple pattern matching. The kinds of stuff that I often do want to look up on webpages, things like obscure vocabulary, acronyms, foreign words and company names, are much harder to pick out in this way. If Google works its statistical processing magic and comes up with something that, a large proportion of the time, selects content from a page that I might plausibly want to look up somewhere, then I’ll revise my AutoLink is Evil assessment. In the meantime, I stick by my assertion that the right way to do this is for the content author to mark up linkable stuff unambiguously using XML. In other words, we should move towards a web where, instead of having constantly to make link/no-link decisions, you can designate text as maybe-link, in which case different graphical interfaces can decide how and when to present these links and what they should point at.
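
To see just how easy the easy categories are, consider ISBNs. The sketch below is my own illustration, not anything Google has published about AutoLink: a regular expression picks out ten-character candidates and the standard ISBN-10 check digit weeds out false positives. Try writing the equivalent for "obscure vocabulary" and you'll see the asymmetry.

```python
import re

# Ten digits (the last may be X), optionally separated by hyphens or spaces.
ISBN10_CANDIDATE = re.compile(r"\b(?:\d[- ]?){9}[\dXx]\b")


def is_valid_isbn10(digits):
    """Apply the ISBN-10 checksum: weights 10 down to 1, total divisible by 11."""
    total = 0
    for position, char in enumerate(digits):
        value = 10 if char in "Xx" else int(char)
        total += (10 - position) * value
    return total % 11 == 0


def find_isbn10(text):
    """Return the valid ISBN-10s found in `text`, with separators removed."""
    hits = []
    for match in ISBN10_CANDIDATE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if is_valid_isbn10(digits):
            hits.append(digits)
    return hits


sample = "Compare ISBN 0-306-40615-2 with the mistyped 0-306-40615-3."
print(find_isbn10(sample))
```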

Greasemonkey is like Marsellus Wallace: it’s evil, but cool. It’s evil when people start thinking that this is a good way to bring the customizable web to a large number of desktops. It’s cool because for tech-savvy users, the extreme sports enthusiasts of the internet world, it provides a quick-and-dirty way to play around with web content. The intriguing thing about Greasemonkey is that it could easily become a good guy since the technique it uses is equally applicable to XML content.

As far as the “semantic web is vaporware” argument goes, we need to remember that a lot of technologies take off just about the time that people start to write them off as vacuous. This is because our expectations of how fast radical change is going to happen tend to be wildly optimistic. Add to this the fact that the powers-that-be dropped the ball in many ways when putting together the specs that were supposed to be the underpinnings of XML on the web, and it’s hardly surprising that it hasn’t happened yet.



Why AutoLink is Evil

Saturday March 19th 2005, 2:07 pm
Filed under: Semantic Web, World Wide Web
Posted By: Matt

I’m not sure whether it’s entirely normal to have an AutoLink epiphany, but I just had one. My take on the whole AutoLink-is-evil debate has until now centered on control issues. AutoLink gives a substantial amount of control to users, making it seem less like SmartTags and more like RSS. Since RSS is a Good Thing, anything that gives users more control must also be a Good Thing. Ergo, AutoLink is not evil.

Then it started to dawn on me that maybe control wasn’t the primary consideration here. As people become hooked on a more customized web experience, web developers are responding by treating HTML as a building block rather than a presentation format. We used to call this screen scraping, although the term doesn’t seem to get as much play nowadays. “There’s data in them thar HTML pages,” cry the greasemonkeys, and if they can just get at it they can repurpose it in cool and useful ways like adding helpful hyperlinks or removing ads.

There are a lot of problems with this. Fact is, HTML is a presentation format and as such is not designed for reuse. There’s no contract between the HTML author and the greasemonkey scripter. The whole point of my “Greasemonkeys and Obfuscators” post was that there is an infinite number of ways to represent the same raw data using HTML. So a minor change to a website can easily break every client-side script designed to manipulate it, intentionally or not. If you’re lucky, your popular greasemonkey script will just stop working. Much worse, it might start to behave erratically, breaking the websites it was designed to enhance.
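
A tiny example makes the fragility obvious. Both snippets below present the same price, and both the markup and the scraper are invented for illustration. The scraper is tied to one layout, so a purely cosmetic redesign silently breaks it.

```python
import re

# Two renderings of the same fact; only the markup differs.
page_v1 = '<td class="price">$19.99</td>'
page_v2 = '<span id="cost"><b>$19.99</b></span>'


def scrape_price(html):
    """A typical quick-and-dirty scraper, tied to one particular layout."""
    match = re.search(r'<td class="price">\$([\d.]+)</td>', html)
    return float(match.group(1)) if match else None


print(scrape_price(page_v1))  # works against the layout it was written for
print(scrape_price(page_v2))  # finds nothing after the redesign
```

Returning nothing is the lucky outcome. A looser pattern might just as easily latch onto the wrong number and carry on as if nothing had happened.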

This was all anticipated by Jon Bosak in his essay “XML, Java and the future of the Web”. And that was in March 1997, guys. Understandably for a Sun employee, Jon focused on Java as the tool for client-side manipulation. But the principles remain the same whatever language you’re using. Reliable data repurposing requires a format with a lot more formal structure than HTML. This insight led to the creation of XML, which is undeniably a Good Thing because it provides a technically elegant way to empower users to do new and exciting things with web content.

This is where the comparison between AutoLink and RSS falls down. RSS is the first killer app for XML on the web. While XML was quickly embraced as a format for data interchange (notably in B2B e-commerce), its impact on the web (which, after all, it was originally conceived for) has been relatively minimal. RSS is the preeminent example of XML being delivered by websites, and people love it. This is because RSS really is a building block, so it is far more conducive to the growth of a robust ecosystem for creating, consuming and managing it.
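
Compare how little a consumer of RSS has to guess. The sketch below parses a small hand-written RSS 2.0 document with Python's standard xml.etree.ElementTree. The feed content is invented, but the element names (`channel`, `item`, `title`, `link`, `description`) are the ones the format defines, so the same code should keep working however the publisher restyles the website.

```python
import xml.etree.ElementTree as ET

# A made-up feed; the second item deliberately has no description.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <description>Short summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return title, link and description for every item in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Optional in practice, so fall back to an empty string.
            "description": item.findtext("description", default=""),
        })
    return items


for entry in read_items(FEED):
    print(entry["title"], entry["link"])
```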

My conclusion: AutoLink is evil. All those greasemonkey scripts are evil. They may provide fleeting satisfaction, but anything close to widespread adoption is going to create a big mess. In one post Robert Scoble asks:

I wonder what happens if there are two conflicting GreaseMonkey scripts, for instance. How does it decide which link to use?

Exactly. Software developers encounter this problem all the time when they implement a quick-and-dirty solution instead of taking the time to get the design right. Oftentimes your hack works fine until it bumps into another hack somewhere else in the program and — poof! — the whole thing blows up.



I Forgot…

Friday March 18th 2005, 10:41 am
Filed under: Semantic Web, World Wide Web, Social Software
Posted By: Matt

4. Community-driven categorization efforts work better than centralized ones. As a little test, I searched for Suck.com in the Open Directory. It’s there, but categorized under Computers -> Internet -> Cyberspace -> Culture. That doesn’t strike me as particularly accurate, and it certainly isn’t intuitive enough to have helped me to find the site based on the criteria that I remembered. Wikipedia’s categorization is much more useful in this instance. In Yahoo’s directory I couldn’t even find a definite reference to the site (although some of its content was referenced).


 

This work is licensed under a Creative Commons License