Before we begin, let's be clear about two things: firstly, my views do not necessarily reflect those of ITWire or its editors. Secondly, this story is not making any value judgment on OpenXML vs ODF. Instead, I'm saying, "ok, OpenXML exists. Here's a bunch of stuff you can do with it." Perhaps some might say you could achieve the same things if only Microsoft adopted the existing ISO standard instead - and that's cool; just because I say "here's something you can do in OpenXML" you shouldn't interpret that as meaning "OpenXML is the greatest thing in the world."
Now, credit where credit is due, this story didn’t come to me from my own keen thought processes. Rather, Microsoft expressed disappointment at my views and said “a better story” would have been the positive benefits OpenXML can bring to interoperability between different operating systems, namely Windows and Linux.
That's a fair point; Microsoft have published the specifications for the XML-based file formats used throughout their latest Office suite applications. Never before have alternate applications had the same opportunity to offer 100% compatibility. In the past, rival suites which purported to open such documents could not guarantee complete success: this is no longer the case and major vendors are taking it up. The Novell edition of OpenOffice supports OpenXML. The Gnumeric open source spreadsheet supports OpenXML. And Corel have announced their support of OpenXML.
What’s more, an XML-based specification has a massive advantage over a binary file format – by virtue of using XML, a program’s support for OpenXML does not have to be full-blown. It’s now possible to produce simple utilities which operate over Microsoft Office documents, performing their own work but safely ignoring everything else, with little risk of damaging the document. Such a utility might be a command-line global search-and-replace, for example, which swiftly processes a batch of documents in one sweep. Or perhaps a utility which reduces the colour depth (and thus dramatically shrinks the file size) of all embedded images within a set of specified documents. In these hypothetical cases the developer would not have to understand or implement the entire OpenXML specification. (Although to find the relevant information, they would still have to wade through the 6,000-page description!)
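To make the search-and-replace idea concrete, here is a minimal sketch in Python using only the standard library. It is my own illustration, not one of the published samples: the function name `replace_in_docx` is hypothetical, and it assumes the main text lives in `word/document.xml` and is stored as UTF-8. It also assumes the phrase sits inside a single text run; Word sometimes splits a phrase across several runs, in which case this naive approach will miss it.

```python
import zipfile
from xml.sax.saxutils import escape


def replace_in_docx(src_path, dst_path, old, new, part="word/document.xml"):
    """Copy a .docx, replacing `old` with `new` inside one XML part.

    Every other part (styles, images, relationships) is copied unchanged.
    Returns the number of replacements made.
    """
    # The text inside the XML is escaped, so escape the search terms too.
    old_xml, new_xml = escape(old), escape(new)
    count = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part:
                text = data.decode("utf-8")  # assumption: part is UTF-8
                count = text.count(old_xml)
                data = text.replace(old_xml, new_xml).encode("utf-8")
            dst.writestr(item, data)
    return count
```

Looping over a directory of files and calling `replace_in_docx(path, path + ".new", "Old Name", "New Name")` for each would give the batch behaviour described above. Note that the utility never needs to understand styles, images or anything else in the package.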
This is the approach crafty developers worldwide have been taking. OpenXMLDeveloper.org is building a repository of such code samples, with RSS subscription available. Here are some of the more interesting ones.
One developer demonstrates how a PHP-powered web application can dynamically create invoices in Microsoft Excel format.
This code sample is compelling; firstly, the originating server can be a Linux box or indeed any other machine that is capable of hosting PHP. There need not be any proprietary libraries or run-times installed. In fact, previously, generating Excel documents on-the-fly was a tedious process even for Windows developers; a .NET web site had to rely on “primary interop assemblies” (PIAs) to communicate with non-.NET Office APIs.
Secondly, it does have to be said, if you’re sending an Excel document it’s a safer bet that the recipient has Excel or an Excel-compatible spreadsheet application than software for any other format. True, at this time Office 2007 doesn’t have a major foothold but this will undoubtedly grow.
Now, perhaps dynamic creation of invoices isn’t really your cup of tea. What makes this bit of code really terrific, though, is that it only serves to demonstrate a larger piece of work by the author – namely, a reusable open source library of Excel 2007 reading/writing routines called PHPExcel.
The features implemented thus far are quite impressive: a spreadsheet can be represented in memory, with worksheets, data and formulas. Protection is supported, as is formatting, from the expected font changes to more complex items like gradient fills. Images may be added and various styles set, along with printing options, and the result can be saved to several file formats.
Having a library like this means that all the complex work involved in reading and writing the OpenXML format is taken care of. The main app becomes merely a dead simple sequence of calls like so, allowing the developer to focus on the problem at hand:
// Build an in-memory workbook, then hand it to the Excel 2007 (OpenXML) writer.
$objPHPExcel = new PHPExcel();
$objWriter = new PHPExcel_Writer_Excel2007($objPHPExcel);
Producing Excel documents in PHP has never been easier.
Creating Word documents in pure Java
Continuing the trend, another team of developers have devised Java code which generates valid OpenXML word processing documents without any use of the Office client applications, or any Microsoft APIs or libraries, and indeed, without even requiring a Microsoft operating system.
The intention of this code is to assist developers who work in Java on Linux or Macintosh or any other non-Microsoft environment, and also developers building server-side applications who wish to produce Office-compatible documents to present data and reports.
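To show how little is actually needed, here is a rough sketch of the same idea in Python rather than Java, again using only the standard library. It is not the Java team's code: the function name `write_docx` is my own, and the package contains just the three parts I believe a word processing document minimally needs (the content types list, the root relationships and the main document part). Real documents normally carry styles, properties and more.

```python
import zipfile
from xml.sax.saxutils import escape

XML_HEADER = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the MIME-style content type of each part in the package.
CONTENT_TYPES = XML_HEADER + (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points a consumer from the package root to the main document part.
ROOT_RELS = XML_HEADER + (
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def write_docx(path, paragraphs):
    """Write a bare-bones .docx with one plain paragraph per input string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(p)
        for p in paragraphs
    )
    document = XML_HEADER + (
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>'
        % (W_NS, body)
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", ROOT_RELS)
        z.writestr("word/document.xml", document)
```

Calling `write_docx("report.docx", ["Quarterly report", "Generated on a Linux server."])` should give a file that Word or a compatible application can open, with no Microsoft software involved in producing it.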
Actually, to be precise, OpenXML covers a set of XML document standards; SpreadsheetML is the subset relating to spreadsheets, which PHPExcel is striving to implement, and this Java code implements the WordprocessingML side of OpenXML.
More coders have jumped in with Java snippets. Another sample shows creating a document, adjusting its properties and thumbnail, adding text and converting to HTML output. Several more can be found.
The pinnacle of them all, however, is OpenXML4J – an open-source library for Java developers that provides classes for OpenXML development. It’s in pure Java, meaning it’s usable anywhere you have a standard Java compiler, library and runtime. Just like PHPExcel, this library can be used by developers to manage all the mechanics of OpenXML document construction and manipulation, making working with Word, Excel and PowerPoint documents a breeze on any platform.
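Since all the flavours share the same zip packaging, a program can tell them apart by reading the `[Content_Types].xml` part found at the root of the package. The following Python sketch is my own illustration (the name `package_kind` is hypothetical) and only recognises the three standard main-part content types; templates and macro-enabled files use other content types and would come back as `None`.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
PREFIX = "application/vnd.openxmlformats-officedocument."

# Content type of the main part for each OpenXML flavour.
MAIN_PART_TYPES = {
    PREFIX + "wordprocessingml.document.main+xml": "WordprocessingML",
    PREFIX + "spreadsheetml.sheet.main+xml": "SpreadsheetML",
    PREFIX + "presentationml.presentation.main+xml": "PresentationML",
}


def package_kind(path):
    """Return the OpenXML flavour of a package, or None if unrecognised."""
    with zipfile.ZipFile(path) as z:
        try:
            root = ET.fromstring(z.read("[Content_Types].xml"))
        except KeyError:
            return None  # not an OpenXML package at all
    for override in root.iter(CT_NS + "Override"):
        kind = MAIN_PART_TYPES.get(override.get("ContentType"))
        if kind:
            return kind
    return None
```

A batch utility could use a check like this to decide which files to process and which to skip.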
ODF to OpenXML and back again
Another project gaining traction is the ODF to OpenXML translator package. This title is potentially misleading; the project doesn’t just convert ODF (Open Document Format) documents to OpenXML but also allows conversion the other way. The project is actually among the top 25 on SourceForge.
The development goals for this team are to make plugins that provide interoperability between applications based on ODF and OpenXML. A core deliverable is the development of add-ins for Microsoft Office which permit both opening and saving of ODF files. Unfortunately, no such add-ins appear to be underway or planned for OpenOffice but a secondary deliverable is a series of command-line translator utilities to perform batch conversions in either direction. These utilities can also be run on servers, invoked by server-side applications.
The conversion process is essentially based on performing XSL transformations between the two distinct XML formats, along with necessary pre- and post-processing to manage the zip file packaging and some other housekeeping.
This project is open source, but is being developed by several commercial providers including an international software company who have in the past produced an OpenOffice converter for Word 2003.
The applications can be used to allow Microsoft Office to work with ODF documents created by and intended for use by ODF-compliant applications on Linux or other platforms. However, disappointingly, the applications themselves are compiled for 32-bit Windows environments only, and hence only run on that platform.
With technology like that described here, handling OpenXML within Linux is a snap. We’ve mentioned possibilities for small utilities, and we’ve presented code to produce invoices on the fly. For something more substantial, consider some real-world possibilities.
Take banking: a commercial bank’s website could let customers check their current balance and then, with a simple click, download and open a spreadsheet generated on the fly by the server. This spreadsheet may include all the user’s account data. Customers can then work with this data to simulate loans or other operations, sum the interest paid during a financial year, and so on.
Similarly, an energy company might let customers check their electricity consumption and download a dynamically-generated spreadsheet with formulas and customer data, which can be merged with data from other sources for ad-hoc analysis.
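As a rough idea of what the server side of such a scenario involves, here is a Python sketch (standard library only) that writes a tiny spreadsheet of consumption figures with a SUM formula underneath. Everything in it is my own illustration: the function name `write_usage_xlsx`, the sheet name and the data layout are hypothetical, and a production system would more likely use a library such as PHPExcel or OpenXML4J. The sketch stores text as inline strings to avoid a shared strings part, and leaves the formula without a cached value on the assumption that the spreadsheet application calculates it on opening.

```python
import zipfile
from xml.sax.saxutils import escape

HDR = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
SSML = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
CT = "application/vnd.openxmlformats-officedocument.spreadsheetml."


def write_usage_xlsx(path, readings):
    """Write (label, kWh) rows in columns A and B, then a Total row."""
    if not readings:
        raise ValueError("need at least one reading")
    rows = []
    for n, (label, kwh) in enumerate(readings, start=1):
        rows.append(
            '<row r="{n}"><c r="A{n}" t="inlineStr"><is><t>{label}</t></is></c>'
            '<c r="B{n}"><v>{kwh}</v></c></row>'
            .format(n=n, label=escape(label), kwh=float(kwh))
        )
    last = len(readings)
    # The final row holds a formula summing the values above it.
    rows.append(
        '<row r="{t}"><c r="A{t}" t="inlineStr"><is><t>Total</t></is></c>'
        '<c r="B{t}"><f>SUM(B1:B{last})</f></c></row>'
        .format(t=last + 1, last=last)
    )
    parts = {
        "[Content_Types].xml": HDR + (
            '<Types xmlns="http://schemas.openxmlformats.org/package/2006/'
            'content-types"><Default Extension="rels" ContentType='
            '"application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            '<Override PartName="/xl/workbook.xml" '
            'ContentType="' + CT + 'sheet.main+xml"/>'
            '<Override PartName="/xl/worksheets/sheet1.xml" '
            'ContentType="' + CT + 'worksheet+xml"/></Types>'),
        "_rels/.rels": HDR + (
            '<Relationships xmlns="' + PKG_REL + '"><Relationship Id="rId1" '
            'Type="' + DOC_REL + '/officeDocument" Target="xl/workbook.xml"/>'
            '</Relationships>'),
        "xl/workbook.xml": HDR + (
            '<workbook xmlns="' + SSML + '" xmlns:r="' + DOC_REL + '"><sheets>'
            '<sheet name="Usage" sheetId="1" r:id="rId1"/></sheets></workbook>'),
        "xl/_rels/workbook.xml.rels": HDR + (
            '<Relationships xmlns="' + PKG_REL + '"><Relationship Id="rId1" '
            'Type="' + DOC_REL + '/worksheet" Target="worksheets/sheet1.xml"/>'
            '</Relationships>'),
        "xl/worksheets/sheet1.xml": HDR + (
            '<worksheet xmlns="' + SSML + '"><sheetData>' + "".join(rows) +
            '</sheetData></worksheet>'),
    }
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        for name, xml in parts.items():
            z.writestr(name, xml)
```

A web handler could call `write_usage_xlsx(buffer, [("January", 310), ("February", 287.5)])` with an in-memory buffer and stream the result to the customer, who is then free to add their own columns and formulas.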
For knowledge workers, an OpenXML app might generate presentations on demand from several slide decks stored on a web server. Presentations can be quickly compiled, adding or removing or shuffling slides as required.
The OpenXML4J project presents other scenarios along these lines.
Now, sure, Microsoft ASP.NET developers can do all these things with the .NET framework and Microsoft Office already. And Linux developers could do all this with other document formats. But, pragmatically, OpenXML opens up the popular and widespread Microsoft Office application to developers and applications worldwide, without imposing any constraints on the server or desktop technology used. A verdant world of interoperability between diverse operating systems is opened up, giving the user a rich user interface and experience.
The Ecma TC45 committee have produced copious amounts of paperwork so you probably wouldn’t want to start explorations into OpenXML with them; instead, OpenXML Explained, the first OpenXML book – an easily-digested 128-page publication – can be purchased or, better still, downloaded free as a PDF from OpenXMLDeveloper.org.
For any person who is keen to produce their own code that works with OpenXML documents – whether consuming or constructing – this is an excellent resource and reference and tutorial all in one.
(Now, did we say everyone should adopt OpenXML? No, we said "if you want to work with OpenXML documents, here's some code snippets." - have fun :)