How two of the world's largest websites use Linux for high availability

Opinion and Analysis

Chances are, if you’re reading iTWire, you’re also no stranger to Digg, a very popular Web 2.0 technology news website.

Digg combines social bookmarking, blogging and RSS syndication with non-hierarchical editorial control. Digg’s users submit stories for review and it is the users themselves who choose which stories go on the homepage, as determined by popular vote.

Digg works because it has a large user base, some 22 million individuals, who actively promote good stories to the homepage and vote down weaker stories or those of only limited interest.

Digg isn’t quite up to Wikipedia’s level of general Internet awareness, but it is still one of the most heavily trafficked websites on the Internet. Alexa ranks it at 276.

Digg handles billions of page requests a month, which necessitates a solid and reliable infrastructure. In fact, Digg is well known for bringing less-capable websites to their knees: a site that makes the Digg front page can receive an abnormal spike of thousands upon thousands of new visitors. Yet Digg’s own ordinary daily traffic far exceeds anything it sends to others, so it must cope with loads that other websites can only dream of.

Like Wikipedia, Digg opted for a Linux solution – which continues, despite an advertising deal with Microsoft.

Digg’s infrastructure is so massive that the Systems Engineering Lead would have difficulty giving you an exact count of the servers. There are web servers, and database servers, and even six specialised database servers just to implement the recommendation engine.

Debian Linux is used across the board, along with a mixture of free open source software and some custom-written specialised apps.

A request to Digg’s site first hits a load balancer. This is a host of servers which balance incoming requests and cached data, and which monitor each other so that they can swiftly take on all the load of any server that fails, without users even noticing.
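To make the failover idea concrete, here is a minimal sketch of round-robin balancing that skips servers marked as failed. It is not Digg's actual load balancer: the class name, the server names and the simple up/down flag are all assumptions made for illustration, and a real balancer would run its own health checks.

```python
class RoundRobinBalancer:
    """Hands out backends in turn, skipping any marked as down (hypothetical sketch)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        # In practice a health check would call this; here it is called by hand.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer(["app1", "app2", "app3"])  # hypothetical names
first_round = [balancer.pick() for _ in range(3)]
balancer.mark_down("app2")  # simulate a failed server
after_failure = [balancer.pick() for _ in range(4)]
```

Once `app2` is marked down, its share of requests is absorbed by the remaining servers, which is the behaviour the article describes from the user's point of view.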

After the load balancer, web requests are handed to application servers, which run a combination of Apache, PHP, Memcached and Gearman. These serve up web pages and marshal database connections as required.
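The article does not say how Digg uses Memcached, but a common pattern for a cache sitting in front of a database is "cache-aside": check the cache first and fall back to the database only on a miss. The sketch below assumes that pattern. `DictCache` is an in-memory stand-in for a Memcached client, and `fetch_story`, `load_story_from_db` and the `story:` key format are hypothetical names.

```python
class DictCache:
    """In-memory stand-in for a Memcached client, offering only get and set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value


def fetch_story(story_id, cache, load_from_db):
    """Return a story, consulting the cache before the database."""
    key = f"story:{story_id}"
    story = cache.get(key)
    if story is None:
        story = load_from_db(story_id)  # slow path: hit the database
        cache.set(key, story)           # remember the result for next time
    return story


db_calls = []

def load_story_from_db(story_id):
    # Hypothetical database lookup; records each call so misses can be counted.
    db_calls.append(story_id)
    return {"id": story_id, "title": f"Story {story_id}"}


cache = DictCache()
first = fetch_story(42, cache, load_story_from_db)
second = fetch_story(42, cache, load_story_from_db)
```

The second call is served from the cache, so the database is consulted only once for the two requests.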

The databases are all MySQL and are spread across four masters with a large number of slaves. All database writes go to the masters and all reads go to the slaves.
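The read/write split can be sketched as a small router that inspects each statement and picks a server. This is an illustration under stated assumptions rather than Digg's code: it models a single master with its slaves (the article does not describe how data is divided among the four masters), and the class name, server names and list of read-only verbs are hypothetical.

```python
import random

# Statements treated as reads in this sketch; everything else goes to the master.
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


class ReadWriteRouter:
    """Sends writes to the master and spreads reads across the slaves."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self.slaves = list(slaves)

    def route(self, sql):
        words = sql.split()
        if not words:
            raise ValueError("empty statement")
        if words[0].upper() in READ_VERBS:
            return random.choice(self.slaves)  # any slave can serve a read
        return self.master                     # writes must go to the master


router = ReadWriteRouter("master-1", ["slave-1", "slave-2", "slave-3"])
read_target = router.route("SELECT title FROM stories WHERE id = 42")
write_target = router.route("UPDATE stories SET diggs = diggs + 1 WHERE id = 42")
```

One trade-off of this design is that a slave may briefly lag behind its master, so a read issued immediately after a write can return slightly stale data.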

This setup works successfully for Digg and for its many millions of visitors. It provides massive uptime, enormous scalability and unbelievable reliability.

Most of us can only dream of having a site so popular that such high availability becomes a concern. Even so, we can all learn from the genuine lessons that sites like Wikipedia and Digg can teach us, based on their vast real-world experience with data of such volume.

And one such lesson is that Linux simply works and can be counted on. If you want performance and reliability, think Linux.