George Danezis, professor of security and privacy engineering at University College London, offered the response after being asked about the de-identification of datasets, in connection with the leak of personally identifiable details in health data released by the government.
Researchers at the University of Melbourne were able to re-identify individuals in the data, and after the government was made aware of this, the released dataset was taken offline.
iTWire initially asked Mustafa al-Bassam, a former member of Anonymous and now a security researcher and doctoral candidate, for an opinion. Al-Bassam directed the queries to Prof Danezis.
"In general it is very hard, to the point of being practically impossible, to take a rich dataset of patient or other records, and produce an 'anonymised' dataset that could be used instead of the first, to mine any information," Prof Danezis said/
"The key reason for this is that all information remaining about the record could be used as an identifier - even the most mundane one. If someone for example knows that you went to the doctor on three dates, they can use that information to re-identify your record if that — non-sensitive information — is still available.
"This problem — namely the fact that all information about a person may be used to re-identify a record (not just names, addresses, etc) — has been studied." In this context, he referred to a paper titled "Myths and fallacies of 'personally identifiable information'" by Arvind Narayanan and Vitaly Shmatikov.
However, Prof Danezis said, what researchers did know how to do safely was to provide private query interfaces to personal information databases.
"Given a statistical query over some records, we can estimate how much to perturb the resulting answers, to ensure that they leak very little information about any person. Those techniques are called 'differential privacy', and are the golden standard computer security and privacy experts would recommend at this point to ensure personal data does not leak."
Prof Danezis added that it should be noted that these techniques did not result in a "safe" dataset, but instead only answered the questions posed by the query. He said differential privacy had been discussed by Cynthia Dwork in a paper titled "A firm foundation for private data analysis".