Personal data de-identification practically impossible: expert

The Australian government may well have to reconsider its plan to release data about its citizens for use by businesses or scientists, with a top expert in security and privacy engineering telling iTWire that de-identification of datasets so that they cannot be traced back to the original is very hard, to the point of being practically impossible.

George Danezis, professor of security and privacy engineering at University College London, offered the response after being asked about de-identification of datasets in connection with the leak of personally identifiable details in health data that was released by the government.

Researchers at Melbourne University were able to trace the data back, and after the government was made aware of this, the released dataset was taken offline.

iTWire initially asked Mustafa al-Bassam, a former member of Anonymous, and now a security researcher and doctoral scholar, for an opinion. Bassam directed the queries to Prof Danezis.

"In general it is very hard, to the point of being practically impossible, to take a rich dataset of patient or other records, and produce an 'anonymised' dataset that could be used instead of the first, to mine any information," Prof Danezis said.

"The key reason for this is that all information remaining about the record could be used as an identifier - even the most mundane one. If someone for example knows that you went to the doctor on three dates, they can use that information to re-identify your record if that — non-sensitive information — is still available.

"This problem — namely the fact that all information about a person may be used to re-identify a record (not just names, addresses, etc) — has been studied." In this context, he referred to a paper titled "Myths and fallacies of 'personally identifiable information'" by Arvind Narayanan and Vitaly Shmatikov. 
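The doctor-visit example can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the record IDs, the visit dates and the `matching_records` helper are invented for illustration and have nothing to do with the dataset the government released. It only shows how a few "mundane" facts can narrow a de-identified table down to a single record.

```python
from datetime import date

# Hypothetical "de-identified" records: names are gone, visit dates remain.
records = {
    "rec-001": {date(2016, 1, 4), date(2016, 2, 11), date(2016, 3, 20)},
    "rec-002": {date(2016, 1, 4), date(2016, 2, 15), date(2016, 5, 2)},
    "rec-003": {date(2016, 2, 11), date(2016, 3, 20), date(2016, 6, 9)},
}


def matching_records(records, known_dates):
    """Return the IDs of records whose visit dates include every known date."""
    known = set(known_dates)
    return sorted(rid for rid, visits in records.items() if known <= visits)


# Side knowledge an acquaintance might have about one person.
one_date = [date(2016, 1, 4)]
three_dates = [date(2016, 1, 4), date(2016, 2, 11), date(2016, 3, 20)]

print(matching_records(records, one_date))     # several candidates remain
print(matching_records(records, three_dates))  # narrowed to one record
```

In this toy table one known date leaves two candidate records, while three known dates leave exactly one, even though no name or address was ever stored.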

However, Prof Danezis said, what could be done safely was to provide private query interfaces into databases of personal information.

"Given a statistical query over some records, we can estimate how much to perturb the resulting answers, to ensure that they leak very little information about any person. Those techniques are called 'differential privacy', and are the golden standard computer security and privacy experts would recommend at this point to ensure personal data does not leak."

Prof Danezis added that it should be noted that these did not result in a "safe" dataset, but instead only answered the questions posed by the query. He said differential privacy had been discussed by Cynthia Dwork in a paper titled "A firm foundation for private data analysis".
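As a rough illustration of a private query interface, the sketch below answers a counting query with the Laplace mechanism, one standard way of achieving differential privacy. It is not taken from the article or from any particular library: the `private_count` function, the sample records and the choice of `epsilon` are assumptions made for the example. A count changes by at most 1 when one person's record is added or removed, so the noise scale is set to 1 / epsilon.

```python
import random


def private_count(records, predicate, epsilon, rng):
    """Answer "how many records satisfy predicate?" with Laplace noise.

    Assumes a counting query (sensitivity 1), so the noise scale is 1/epsilon.
    Smaller epsilon means more noise and less leakage about any one person.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two independent exponential samples with rate epsilon
    # follows a Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical patient records held behind the query interface.
patients = [
    {"age": 34, "diagnosis": "asthma"},
    {"age": 51, "diagnosis": "diabetes"},
    {"age": 47, "diagnosis": "asthma"},
    {"age": 29, "diagnosis": "flu"},
]

rng = random.Random(2019)  # seeded only to make the demo repeatable
answer = private_count(patients, lambda r: r["diagnosis"] == "asthma", 0.5, rng)
print(f"noisy count of asthma patients: {answer:.2f}")
```

The caller only ever sees the perturbed answer, never the underlying rows, which matches the point that the technique answers queries rather than producing a "safe" dataset. A real deployment would also need to track the total privacy budget spent across repeated queries, which this sketch leaves out.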







Sam Varghese


Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.

