Open source survey: many questions remain

Used in the right context, statistics can illuminate and point the way. These days, however, given the amount of spin about, they are more often used to confuse and blur an issue.

That's why I am always suspicious about surveys. Over the years I've been writing about free and open source software, I've seen numerous surveys from the proprietary software crowd - which always tend to paint a positive picture of proprietary software. Curious.

And then there are surveys from proponents of free and open source software - which, mysteriously, end up favouring the FOSS community and its software.

It's not surprising that the great Benjamin Disraeli reputedly referred to statistics as the third kind of lies - after lies and damned lies.

Hence I decided to have a close look at the Australian Open Source Industry and Community Report 2008, conducted by Waugh Partners and officially released on April 1. If anything, such scrutiny would only add to the credibility of such a report – that was my reasoning. The data for the report was collected through two surveys conducted between October and December 2007.

The report was backed by IBM, Fujitsu and National ICT Australia. The industry results seem perfectly kosher as the sample size appears eminently reasonable. The doubts I have pertain to the community statistics.

Jeff Waugh and Pia Waugh presented a sneak peek at the community side of the results during the national Australian Linux Conference. The presentation was on February 1, the final day of the conference, and was one of five talks scheduled at different venues in the 2.30pm slot, one of the last of the day.

Shortly after the presentation began, a member of the audience asked about the sample size which was used to determine the results. Pia Waugh, who was doing the main presentation, had begun by citing only percentages. To this question, Jeff Waugh responded: "We will get to that, we will get to that." The figure — 327 — was only given towards the end of the presentation. This figure determines whether the survey findings about the community stand or fall.

(A video of the presentation is here; you can see the question I refer to about nine and a half minutes into the presentation.)

It made me wonder why the figure was not given out right away. (Rupert Murdoch's Sky News runs polls every day and presents the results as "42 percent voted yes, and 58 percent voted no," but it never gives the number of respondents - which makes the polls meaningless.)

A number of graphs presented at the time had no indication of what some of the axes represented. I figured that since the presentation must have been done in a hurry to catch the conference deadline, it would be unfair to report about it at that stage. If a report had been written then, it would probably have made the effort look very amateurish.

Despite this, several news outlets wrote up the survey in glorious detail. I waited for the final product. In the run-up to the release, there was more publicity, including one piece in iTWire on March 10, by my editor Stan Beer. However this piece steered clear of accepting the survey results as gospel, citing all findings as claims made by the company. Several other outlets were not half as careful.

My colleague, GNU/Linux guru David M. Williams, who wrote a detailed piece about the survey for iTWire on April 14 which contained a fair degree of fine analysis, touched on the one aspect of the survey which troubled me.

He wrote: "Compared to a market like the US these numbers are very small. However, the Waughs estimate the respondents are equal to roughly 10% of the individuals and 25% of the companies in the Australian open source community. If they are correct in this then the sample size can be said to be statistically significant. The figure for companies is likely to be correct, but the assumption for individual numbers has been based on the membership of community organisations, mailing list subscriptions, user group attendance figures and other information. I would have liked some more information on this; for instance, were attempts made to cross-reference the various sources and thereby identify each individual who is on a mailing list and is a member of a user group? Indeed, are user groups already counted as being community organisations (namely, organisations that make up the open source community)?"

This made me realise that I was not the only person thinking in this direction. And so I set out to ask a few questions.

The final report says: "We worked closely with psychometricians and statisticians provided by NICTA, our primary research partner, to ensure the end-to-end quality of the research. While our sponsors and supporters provided feedback at numerous points throughout the project lifecycle, this report is the result of independent analysis by Waugh Partners. It is based on data collected through a pair of online surveys held between October and December 2007."

An official spokesman for NICTA told iTWire: "NICTA was a sponsoring partner and was not responsible for data collection or statistical analysis. NICTA did not analyse the data in depth, but we did review it and were satisfied that the sample was representative and that the data provided was of good quality and integrity."

There appears to be some difference between the two versions but I'm sure the intelligent reader will be able to pick it up.

The use of the word "census" to describe this survey is misleading. A census covers every individual in a target population; this was a survey, as it covered only a small part of that population.

I sent a list of questions to Waugh Partners on April 25. These are given below, each followed by my comments:

1. Size of the community: you mention that your understanding of the size of the community is based on membership of community organisations, mailing list subscriptions around Australia and group attendance figures and other information about the community. Can you provide the names of organisations that were taken into account? What criteria were employed to include or exclude organisations (if any were excluded, that is)?

This was the major point which I could not get past. The industry figures seemed fine to me but that sample size for the community — 327 — was something that stuck in my craw.

2. How many members were there in each organisation? Was the fact that people are often members of multiple organisations taken into account?

Multiple membership is common in FOSS community organisations, hence this query.

3. You mention that the data was collected through two online surveys. What measures were taken to ensure the security of these surveys? (You are aware, no doubt, that online surveys are notoriously susceptible to manipulation by technically competent people.)

This question deals with something that often leads to online publications pulling surveys offline.

4. What measures were taken to eradicate duplicate responses to the survey - the case of one person, at times out of patriotism to the cause of open source, filling in data twice? (Once again, I don't have to point out that IP spoofing is a trivial thing for technically competent folk to effect.)

5. You mention that you "worked closely with psychometricians and statisticians provided by NICTA, our primary research partner, to ensure the end-to-end quality of the research." Did the people from NICTA analyse the data, collate it and then leave the conclusions to you? Or, if someone in your organisation did everything, is the person a trained statistician?

I think this question was answered in part by the NICTA response.

6. Publicity for the survey: You mentioned: "The Census was directly promoted through a national roadshow which traveled to every capital city, on several mailing lists including Linux Australia, Open Source Industry Australia and user groups around the country, and through direct contact with Open Source community members and companies. Indirect promotion included blogging, media coverage, and notification to members of the Australia Computer Society, AIIA, OzZope and numerous other organisations."

Which media were these? Was there any advertising in mainstream media or technology supplements of mainstream media? Were press releases sent to all major technology news outlets in Australia?

Given that major companies like IBM and Fujitsu were involved as sponsors, I thought that they would be concerned about adequate publicity. Hence the question.

7. You mention that you "... received 315 complete and legitimate responses, with 66 incomplete. Twelve of those incomplete responses were deemed legitimate and complete enough to include in the final results." What criteria were used to determine the legitimacy of these 12 responses as opposed to the other 54?

8. You mention that the community survey was aimed at "individuals who contribute to Open Source projects and communities in any capacity, not just software development". What other contributions were taken into account? For example, would a man who sends a PC to East Timor via the Computerbank project (which fits it out with Debian and sends it to Dili) be counted as an individual who contributes to an open source project?

9. Mention is made of a number of contributors in the "Acknowledgements" section of the report. Were any of these individuals paid or in any other way provided recompense for their contributions?

I asked one of the people mentioned about this directly and the response is given below.

10. And finally, the use of Adobe InDesign to produce the report does sound a bit off-key considering the subject under consideration. Was there no way to do this using an open source solution?

This question had also been raised by Williams; it often bemuses me that open source advocates are the last to use the fruits of their own community's labour.

On April 29, Jeff Waugh replied thus: "We've already done an article about the survey with your editor, who is also aware that we have no interest in working with you under any masthead."

iTWire had already done two articles on the survey; he seemed to be unaware of the second piece. He also seemed unaware that there are many categories when it comes to writing – straight news, analysis, comment, editorials, investigative pieces, etc. It's important to note that none of the questions was dismissed out of hand.

As one of those listed as contributing to the survey, technology writer Sarah Stokely, had written a couple of pieces publicising the report, I wrote to her asking why there had been no disclosure about her involvement in the project when these articles were written and also querying whether she had been paid for her work on the census report.

She replied: "I wrote one story about Pia and Jeff Waugh's presentation on the Census at Linux.conf.au for IT News in February. Their talk was the first time that I'd heard of the Census. I was not involved with it and there was nothing to disclose. I thought that the Census was a cool and worthwhile project, and I volunteered to help by doing some writing for them. As it turned out, they didn't need any help on the writing front but I ended up doing a few hours of editing for them in late March shortly before the report was printed, which is why I received a credit in the report. They offered to pay me but I decided not to accept payment to ensure that it remained a volunteer activity for me."

Good on her for not taking the money.

My conclusion? There is a limit to openness, even when it comes to matters connected with open source. I asked a series of serious questions and the response, to put it frankly, was disappointing.

Until these queries are answered, I would not put my money on the results of this survey – the community figures, that is. Nor should anybody else. Companies associated with open source should be prepared to undergo the same grilling that others do – indeed, since open source often takes the high moral ground, they should be willing to be more open. Sadly, a Taliban-like attitude seems to prevail.







Sam Varghese


Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.

