Personal data de-identification practically impossible: expert

The Australian government may well have to reconsider its plan to release data about its citizens for use by businesses or scientists, with a top expert in security and privacy engineering telling iTWire that de-identification of datasets so that they cannot be traced back to the original is very hard, to the point of being practically impossible.

George Danezis, professor of security and privacy engineering at University College London, offered the response after being asked about de-identification of datasets in connection with the leak of personally identifiable details in health data that was released by the government.

Researchers at Melbourne University were able to trace the data back, and after the government was made aware of this, the released dataset was taken offline.

iTWire initially asked Mustafa al-Bassam, a former member of Anonymous, and now a security researcher and doctoral scholar, for an opinion. Al-Bassam directed the queries to Prof Danezis.

"In general it is very hard, to the point of being practically impossible, to take a rich dataset of patient or other records, and produce an 'anonymised' dataset that could be used instead of the first, to mine any information," Prof Danezis said.

"The key reason for this is that all information remaining about the record could be used as an identifier - even the most mundane one. If someone, for example, knows that you went to the doctor on three dates, they can use that information to re-identify your record if that non-sensitive information is still available.

"This problem — namely the fact that all information about a person may be used to re-identify a record (not just names, addresses, etc) — has been studied." In this context, he referred to a paper titled "Myths and fallacies of 'personally identifiable information'" by Arvind Narayanan and Vitaly Shmatikov. 

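The three-dates example can be made concrete with a small sketch. The records, IDs and the `matching_records` helper below are invented for illustration and are not taken from the released health data; the sketch only shows how a few mundane facts can narrow a "de-identified" dataset down to a single record.

```python
# Toy "de-identified" records: names removed, visit dates kept.
# All IDs and dates are invented for illustration.
records = {
    "rec-001": {"2016-01-04", "2016-02-11", "2016-03-30"},
    "rec-002": {"2016-01-04", "2016-02-15", "2016-05-02"},
    "rec-003": {"2016-01-19", "2016-02-11", "2016-03-30"},
    "rec-004": {"2016-01-04", "2016-02-11", "2016-06-21"},
}

def matching_records(records, known_dates):
    """Return the IDs of records whose visit dates include every known date."""
    known = set(known_dates)
    return sorted(rid for rid, dates in records.items() if known <= dates)

# Knowing one visit date leaves several candidate records.
one = matching_records(records, ["2016-01-04"])
# Knowing three visit dates leaves exactly one.
three = matching_records(records, ["2016-01-04", "2016-02-11", "2016-03-30"])

print(one)    # ['rec-001', 'rec-002', 'rec-004']
print(three)  # ['rec-001']
```

In this toy data, one known date leaves three candidates, while three known dates single out one record, even though no name or address appears anywhere.
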
However, Prof Danezis said, one thing that could be done safely was to provide private query interfaces into personal information databases.

"Given a statistical query over some records, we can estimate how much to perturb the resulting answers, to ensure that they leak very little information about any person. Those techniques are called 'differential privacy', and are the golden standard computer security and privacy experts would recommend at this point to ensure personal data does not leak."

Prof Danezis added that it should be noted that these techniques did not result in a "safe" dataset, but instead only answered the questions posed by the query. He said differential privacy had been discussed by Cynthia Dwork in a paper titled "A firm foundation for private data analysis".
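
As a rough illustration of what perturbing a query answer can look like, the sketch below applies the Laplace mechanism, one common differential privacy technique, to a counting query. It is a minimal sketch under stated assumptions: the patient records, the `private_count` and `laplace_noise` helpers and the value of epsilon are all hypothetical, and nothing here describes a system used by the government or proposed by Prof Danezis.

```python
import random

# Invented records for illustration only; not drawn from any real dataset.
PATIENTS = [
    {"age": 34, "visits": 3},
    {"age": 61, "visits": 7},
    {"age": 47, "visits": 1},
    {"age": 29, "visits": 4},
    {"age": 72, "visits": 9},
    {"age": 55, "visits": 2},
]

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Answer "how many records satisfy predicate?" with added Laplace noise.

    Adding or removing one person's record changes a count by at most 1,
    so the noise scale is 1 / epsilon (smaller epsilon means more noise).
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
EPSILON = 0.5           # arbitrary value chosen for illustration
answer = private_count(PATIENTS, lambda r: r["visits"] >= 3, EPSILON, rng)
print(f"noisy count of patients with 3+ visits: {answer:.2f}")
```

The caller only ever sees the noisy answer, never the records themselves, which matches the point that the result is an answer to a query rather than a "safe" dataset. A real interface would also need to limit how many queries are answered, since averaging many noisy answers to the same question gradually reveals the true value.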

Sam Varghese

A professional journalist with decades of experience, Sam for nine years used DOS and then Windows, which led him to start experimenting with GNU/Linux in 1998. Since then he has written widely about the use of both free and open source software, and the people behind the code. His personal blog is titled Irregular Expression.