Taiwan's National Centre for High-Performance Computing is one place where a lot of good work allied to Linux goes on - but very little is heard about it.
Perhaps that's because the scientists who do the work are good at their work but not terribly good at pushing what they do.
Unfortunately, in this world of ours, the most mediocre are always the most ambitious. It works to the disadvantage of researchers for whom the work is mostly its own best reward.
The NCHC has at least two projects which deserve some attention - DRBL/Clonezilla and Crawlzilla.
The former is really two projects in one - diskless remote boot in Linux and Clonezilla are clubbed together. Heading it is Steven Shiau, a nuclear engineering graduate who then chose plasma simulation as his research topic.
Right now, Shiau is researching high-performance computing. "NCHC is a non-profit organisation and about 90 percent of the budget comes from the Taiwan government," Shiau told iTWire.
DRBL provides a diskless or systemless environment for client machines. It can work with Debian, Ubuntu, Mandriva (and, one assumes, the recent Mandriva fork Mageia), Red Hat, Fedora, CentOS, Scientific Linux, and SUSE.
Using distributed hardware resources, DRBL makes it possible for clients to make full use of local hardware. It uses PXE/etherboot, NFS and NIS to provide services to its clients, so GNU/Linux does not need to be installed on the client's hard drive.
A DRBL server can be set up and the clients can boot from it; there is no interaction with the client's hard drive so any operating system already present is left undisturbed.
A standard PC can be used to change a group of client PCs into a GNU/Linux network; all one has to do is to download the DRBL package and run the associated scripts. The process takes about half an hour.
If one wishes to use the hard drive in a client, it can be set up to be used as either swap or data space. All these settings can be made in the centralised boot server and doing so will save a lot of time.
"The DRBL project started in 2003," Shiau said. "We registered it on SourceForge on February 5. The Clonezilla project started in 2004 and was registered on SourceForge on July 27 the same year.
"The DRBL and Clonezilla projects are open to outside contributors. We put them on SourceForge since the projects started, and we have some contributors."
The DRBL and Clonezilla developers include Thomas Tsai, Jazz Wang, Steven Shiau (the project lead) and Ceasar Sun. DRBL won the trophy in the Public Sector Applications category at the French Trophees du Libre (International Free Software Contest) in December 2007.
Clonezilla, as the name implies, does the same job as the proprietary application Norton Ghost and the open source package Partition Image. It supports both unicasting and multicasting and takes much less time than any other similar package.
Clonezilla is based on DRBL, Partition Image, ntfsclone and UDPcast and can be used for bare metal backup and recovery. Clonezilla Live is suited to backing up and restoring a single machine, while Clonezilla Server Edition (SE) can clone more than 40 computers simultaneously.
Only the used blocks on the hard drive are saved and restored. At the NCHC, Clonezilla SE was used to clone 41 computers at one go; it took 10 minutes to clone a 5.6GB system image to all 41 using multicasting.
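As a rough back-of-the-envelope check on those figures, the short Python sketch below works out what they imply. It assumes decimal gigabytes (1GB = 1000MB), that the full 5.6GB crossed the network, and that a unicast run would serve one client at a time at the same rate. None of these assumptions comes from the NCHC, and the function names are purely illustrative.

```python
# Figures quoted for the NCHC multicast run.
IMAGE_GB = 5.6
CLIENTS = 41
MINUTES = 10


def stream_rate_mb_per_s(image_gb, minutes):
    """Average rate of the single multicast stream, in MB/s (1GB = 1000MB)."""
    return image_gb * 1000 / (minutes * 60)


def delivered_gb(image_gb, clients):
    """Total data written across all clients, in GB."""
    return image_gb * clients


def sequential_unicast_minutes(minutes, clients):
    """Time needed if each client were served in turn at the same rate."""
    return minutes * clients


if __name__ == "__main__":
    rate = stream_rate_mb_per_s(IMAGE_GB, MINUTES)
    total = delivered_gb(IMAGE_GB, CLIENTS)
    unicast = sequential_unicast_minutes(MINUTES, CLIENTS)
    print(f"multicast stream rate: {rate:.2f} MB/s")
    print(f"data written across clients: {total:.1f} GB")
    print(f"one-at-a-time unicast estimate: {unicast} minutes")
```

Under these assumptions the single multicast stream averages a little over 9MB/s, while serving the same 41 clients one after another at that rate would take 410 minutes, which is the saving multicasting buys.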
Clonezilla supports ext2, ext3, ext4, reiserfs, reiser4, xfs, jfs, hfs+, FAT and NTFS and can be used to clone GNU/Linux, Windows and Mac OS systems. Unsupported filesystems can be handled too, via a sector-to-sector copy by dd in Clonezilla. LVM2 under GNU/Linux is supported; LVM version 1.0 is not.
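To illustrate what a sector-to-sector copy means, here is a minimal Python sketch of a dd-style raw copy. It is not Clonezilla's code: the function name raw_copy and the file names are hypothetical, and the demonstration works on ordinary temporary files rather than real disks. The point is that the data is read and written in fixed-size blocks with no knowledge of the filesystem, which is why the approach works for any filesystem but copies unused space as well.

```python
import os
import tempfile


def raw_copy(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path in fixed-size blocks, like a basic dd run.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # an empty read means the end of the source
                break
            dst.write(block)
            copied += len(block)
    return copied


if __name__ == "__main__":
    # Demonstrate on a throwaway file standing in for a disk image.
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "source.img")
        dst = os.path.join(workdir, "copy.img")
        with open(src, "wb") as f:
            f.write(os.urandom(3 * 1024 * 1024 + 123))
        n = raw_copy(src, dst)
        with open(src, "rb") as a, open(dst, "rb") as b:
            assert a.read() == b.read()
        print(f"copied {n} bytes")
```

Pointing such a copy at real block devices would need administrator rights and a destination at least as large as the source, so it is best tried on image files first.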
Another small package, DRBL-winroll, also developed by the NCHC, can be used to automatically change the hostname, group and SID of a cloned Windows machine.
Crawlzilla project lead Wei-Ju Chen with developers Wen-Chieh Kuo and Shun-Fa.
The other project of note, Crawlzilla, is a cluster-based search engine deployment toolkit. It is headed by Wei-Ju Chen and helps users build search engines for specific websites which cannot be indexed by Google or Yahoo!.
Crawlzilla is based on projects like Nutch, Hadoop and Tomcat; key features include cluster scripts for deployment, a text user interface for cluster system management, a web user interface for managing crawler URLs and index pools, and Chinese lexical support.
Crawlzilla is under active development and the source code is available at SourceForge.