Excess Moderation

Tuesday January 04th 2005, 6:41 pm
Filed under: World Wide Web, Software Industry, Social Software, P2P
Posted By: Matt

As anyone who has ever been involved in a software startup knows, it ain’t all about writing great software, sticking it up onto the web and listening to the sweet tinkling of cash piling up in your bank account. Unless you’re a breatharian, you happen to have cofounded Netscape or your father is Li Ka-shing, you need to find a way to finance your development team until revenues start to come in. This means succumbing to a fate that for most programmers is only slightly better than death: pitching to investors.

I’ve done my share of this over the past few years, and I have the scars to prove it. Certainly compared to the elegant intellectual immersion that is programming, it can be an exercise in excruciating frustration to try to explain to a technical simpleton who can barely turn on his computer why your brilliant innovation is going to change the world. But this caricature is unfair for two reasons. First of all, while most potential investors aren’t hardened coders, some are… and in the software world the others tend to be bright individuals with significant technical knowledge and business smarts (otherwise how did they get all that money in the first place?). And secondly, those who are completely clueless about software often force us to confront issues that seemed obvious to us, until we tried to explain them to someone without the same background and world view (i.e. 99% of the population).

In the case of AllPeers, a question we get asked often is “why P2P”? Investors are understandably concerned that we are planning to sprinkle some magic P2P pixie dust on a bunch of existing application categories without a clear idea of why this is actually an improvement. This is very far from the truth, but having the question framed this way by people who aren’t necessarily going to go gaga over any and every new, sexy technology is a great way to ensure that you not only know what you’re doing, but you can articulate it clearly and convincingly.

All of this sprang into my mind in the context of the latest hullabaloo about Wikipedia, the (apparently) controversial collaborative encyclopedia project. I’ve written about Wikipedia before, including an article about the moderation woes that are at the heart of the current debate. I also wrote about how it would benefit from the infinite scalability inherent in a P2P architecture.

Another consideration that neatly unifies these two themes is the role of P2P in strengthening generic web infrastructure. Allow me to explain.

The web has turned out to be a fantastic software development platform thanks to a few key characteristics that represent a radical departure from how we used to make software. The most important is the use of a thin client that renders its user interface based on markup delivered to it by the server. This means that most web applications can run anywhere, both in the sense of “on any platform” (be it a Cray supercomputer or a handheld PDA) and “in any place” (so you can read your email on your laptop over a satellite phone while circumnavigating the globe in a one-man dinghy).

This approach also has disadvantages, however, the most obvious of which is the increased latency experienced when using web-based applications (in other words, the web is slow). Less widely recognized is the difficulty of leveraging features created in the context of a specific web application. This brings us to the issue of moderation. In my last post on this topic, I talked about the excellent system used by Slashdot. So why not just transplant the relevant Slashdot code into Wikipedia and be done with it? Well, for one thing the two sites are very different in their structure and goals (a newsletter vs. an encyclopedia). But even this is not the main stumbling block. Rather, it is the difficulty of integrating one application’s source code into another when the latter might be written in a different programming language and use completely different internal organization, data structures and so forth.

How can we use P2P to fix this? The answer harkens back to the original purpose of XML, as outlined in Jon Bosak’s revolutionary paper XML, Java and the future of the Web (published in 1997). Bosak argues that because HTML is too dumb to allow much automated processing of web content, we need a smarter, more structured alternative called XML. I won’t even attempt to explain what the heck that means. Read his article, or buy me a few gins-and-tonic and I’ll tell you more about the history of markup languages than you could ever possibly want to know. The long and short of it is that XML describes stuff in a much more formal way that makes it easier for computers to figure out what to do with a given chunk of data.

On the surface, there’s no obvious reason why the Slashdot moderation system couldn’t be applied more or less directly to Wikipedia. People could be given the opportunity to rate articles (or modifications to articles) and then filter what they see based on these ratings. So if you see an article with the highest rating, you can be pretty confident that the information it contains is accurate. The problem is that the two websites probably aren’t modular enough to do this without completely rewriting Wikipedia, something I’m sure its creators are not anxious to do (apart from anything else, it is written in PHP whereas Slashdot is written in Perl). Moreover, the Slashdot moderation system would have to be a lot more robust, with various options that can be tweaked and tuned for different use cases. In a way this is a classic chicken-and-egg problem: the Slashdot moderation system isn’t generic enough because it can’t be easily leveraged, and vice versa.
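
To make the rate-and-filter idea a bit more concrete, here is a minimal sketch in Python. It is not Slashdot's code and not anything Wikipedia runs; the Revision type and the moderate and visible helpers are made up for illustration. The only thing borrowed from Slashdot is the score range of -1 to 5 and the notion of a reader-chosen threshold.

```python
from dataclasses import dataclass
from typing import List

# Slashdot-style score range; everything else in this sketch is hypothetical.
MIN_SCORE, MAX_SCORE = -1, 5


@dataclass(frozen=True)
class Revision:
    """An article revision (hypothetical type) carrying a moderation score."""
    text: str
    score: int


def moderate(revision: Revision, delta: int) -> Revision:
    """Return a copy of the revision with its score nudged and clamped to the range."""
    clamped = max(MIN_SCORE, min(MAX_SCORE, revision.score + delta))
    return Revision(revision.text, clamped)


def visible(revisions: List[Revision], threshold: int) -> List[Revision]:
    """Keep only the revisions scored at or above the reader's threshold."""
    return [r for r in revisions if r.score >= threshold]
```

A reader browsing at threshold 2 would call visible(revisions, 2) and never see the revisions that moderators have pushed down to -1.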

Making everything XML goes a long way towards resolving this problem. Imagine that all Slashdot articles are managed as XML-based resources, and ditto for Wikipedia articles. Now all I need is an engine that knows how to attach additional rating information to these resources. It doesn’t have to be written in the same programming language as the application that created the resources in the first place since the data structures (XML-based resources) become the lingua franca that let the various modules intercommunicate.
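
Here is roughly what such an engine could look like, sketched in Python with the standard library's xml.etree.ElementTree. The element names, the sample resources and the rating namespace URI are all assumptions of mine; neither Slashdot nor Wikipedia publishes its content in this form. The point is only that the engine touches nothing but the XML, so it does not care which application produced it.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical namespace for rating annotations; not part of any real schema.
RATING_NS = "http://example.org/ns/rating"
RATING_TAG = "{" + RATING_NS + "}rating"

# Invented sample resources standing in for content from two different sites.
SLASHDOT_COMMENT = '<comment id="c1"><body>First post</body></comment>'
WIKIPEDIA_ARTICLE = '<article id="a1"><title>Markup language</title></article>'


def attach_rating(resource_xml: str, rater: str, score: int) -> str:
    """Return the resource with one extra namespaced <rating> child appended."""
    root = ET.fromstring(resource_xml)
    rating = ET.SubElement(root, RATING_TAG)
    rating.set("rater", rater)
    rating.text = str(score)
    return ET.tostring(root, encoding="unicode")


def read_scores(resource_xml: str) -> List[int]:
    """Collect every rating attached directly to the resource's root element."""
    root = ET.fromstring(resource_xml)
    return [int(r.text) for r in root.findall(RATING_TAG)]
```

Putting the ratings in their own namespace means the original application can keep reading its own elements and simply ignore the extra ones.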

One problem with this is that this kind of extensibility can result in unforeseen load on the server side. Wikipedia’s already pretty slow. Now imagine that we tack on a moderation system of the type I describe. And what the hell, let’s throw in some Amazon-style recommendations, Googlish “find similar entries” and blog-like comments. The predictable result is that the server is going to blow up under the strain. Another serious weakness is that extensions can only take place on the server side, so the entire burden of experimentation, improvement and tweaking falls on the system’s developers.

This is where P2P comes in. Imagine now that our moderation system is a P2P-style module that runs as a plugin inside the web browser of each user. Now all the strain of creating, retrieving and collating the ratings can be offloaded onto the client machines. The same goes for recommendations and other features. What in the aggregate would represent an untenable load can be quite manageable when spread across potentially millions of users’ machines. And anyone with a cool idea can write a plugin that adds new features to an existing website. Granted, the task of implementing this functionality in a P2P manner is hardly trivial, but who cares when we only have to do it once, get it right and then get full benefit from it in every new application that we write?
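
As a rough illustration of the collating step, the toy below has each peer keep its own ratings and hand out a compact summary, which any other peer can merge locally without a central server doing the arithmetic. There is no real networking here: the Peer class, the summary format and the collate function are hypothetical stand-ins, and a real P2P module would also have to deal with discovery, trust and duplicate votes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Resource id -> (sum of scores, number of ratings). Hypothetical wire format.
Summary = Dict[str, Tuple[int, int]]


class Peer:
    """Stands in for one user's browser plugin; holds only that user's ratings."""

    def __init__(self) -> None:
        self._ratings = defaultdict(list)

    def rate(self, resource_id: str, score: int) -> None:
        self._ratings[resource_id].append(score)

    def summary(self) -> Summary:
        """Compact totals that this peer would share with other peers."""
        return {rid: (sum(s), len(s)) for rid, s in self._ratings.items()}


def collate(summaries: Iterable[Summary]) -> Dict[str, float]:
    """Merge summaries from many peers into an average score per resource."""
    totals = defaultdict(lambda: [0, 0])
    for summary in summaries:
        for rid, (total, count) in summary.items():
            totals[rid][0] += total
            totals[rid][1] += count
    return {rid: total / count for rid, (total, count) in totals.items()}
```

Each client only ever does the work for the summaries it happens to receive, which is the sense in which the load gets spread across users' machines.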

