Data protection: Can medical research be anonymized?

Interview with Prof. Klaus Pommerening, Spokesperson of the Data Protection Committee, TMF - Technology and Method Platform for Networked Medical Research

Electronic health records, telemedicine, cloud computing and big data: questions about data protection come up everywhere in digitized health care. But what does the situation look like in basic medical research, far away from clinical application? Can patient data and personal rights be protected in research when several centers and numerous researchers participate in studies?


Photo: man with glasses and half-bald head

Prof. Klaus Pommerening; ©privat

In this interview, Prof. Klaus Pommerening talks about guidelines on data protection concepts in research, establishing a data protection concept and the (non-)anonymization of patient data.

Prof. Pommerening, how is data protection in medical research actually regulated? Are there laws, standards or norms?

Prof. Klaus Pommerening: The legislature here in Germany only requires us to adhere to the general data protection laws. Of course, doctor-patient confidentiality also applies as far as treatment data is concerned. In addition, there is a statement issued by the German Ethics Council on the work of biobanks, which at some point was also applied to research outside of biobanks. However, this statement is not legally binding.

In terms of standards, the TMF issued guidelines on data protection for medical research projects in early 2014. They contain best practice recommendations you can implement yourself. The guidelines stand in the tradition of a previous project called "Generic Data Protection Concepts for Medical Research Networks".

How about internationally?

Pommerening: As far as I can tell from several international projects we participated in, there are no comparable guidelines in the respective countries. Of course, there are expert opinions that point in this direction, but there are no best practice documents. However, data protection laws are largely consistent across Europe, so the situation is essentially similar in the other European countries.

Photo: Tiny cyclopses stare at hard drive

The general organizational environment and the cooperation of all involved parties ultimately determine what the data protection concept has to look like; © 3dkombinat

Whom can you look to as a role model for implementing data protection, for instance?

Pommerening: You mean a concrete research network? The TMF has, of course, advised many networks that now follow our guidelines or those of the precursor project. However, I cannot name any of them, since the networks would have to authorize this first. But you can check out several examples on the TMF webpage after you register.

Is this type of guideline sufficient for all research projects or do you need to make your concept contingent on the research project?

Pommerening: It depends on the project, of course. Our guideline has a modular set-up, however. Users can essentially select the portions they need for their project and build their concept from them. Networks that follow the guideline are usually very quick to create a concept that is also accepted by the governmental data protection authorities.

What expenditures and costs are involved with such a concept?

Pommerening: This is hard to quantify. That said, the considerations you need to take into account in setting up a research network are actually the same considerations that ultimately lead to a data protection concept. You first need to be clear on the general organizational environment and the cooperation of all involved parties. This also applies at the IT level. The more thought-out the project is, the better the chances of success for the data protection concept become.

You also need to factor in certain additional measures when it comes to costs. These include investments in IT security, for example, and the establishment of pseudonymization and anonymization services that protect the patient's data and identity.

Photo: People with covered faces

All personally identifiable information for a patient is deleted in medical research. Additionally, all data that allows the indirect identification of a person needs to be edited; © dolgachov

How does anonymization of patient data actually work?

Pommerening: Generally, you delete all personally identifiable information for a patient from the medical information system, which includes the name, address and case number. In short, anything that points directly to the person concerned. Yet this is just the first step, since you can still identify a patient from the remaining data.
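
As an editorial illustration of this first step, the following minimal Python sketch removes the direct identifiers named above and then counts how many records can still be singled out by the remaining attributes. The records, the field names and the choice of birth year, postal code and sex as indirectly identifying attributes are assumptions made up for this example; they do not come from the interview or from the TMF guidelines.

```python
from collections import Counter

# Hypothetical records; all field names and values are invented for illustration.
RECORDS = [
    {"name": "A. Example", "address": "1 Sample St", "case_number": "C-001",
     "birth_year": 1948, "postal_code": "55131", "sex": "f", "diagnosis": "I21"},
    {"name": "B. Example", "address": "2 Sample St", "case_number": "C-002",
     "birth_year": 1948, "postal_code": "55131", "sex": "f", "diagnosis": "E11"},
    {"name": "C. Example", "address": "3 Sample St", "case_number": "C-003",
     "birth_year": 1972, "postal_code": "55116", "sex": "m", "diagnosis": "J45"},
    {"name": "D. Example", "address": "4 Sample St", "case_number": "C-004",
     "birth_year": 1990, "postal_code": "55128", "sex": "f", "diagnosis": "I10"},
]

# Fields that point directly to a person (the ones named in the interview).
DIRECT_IDENTIFIERS = {"name", "address", "case_number"}

# Assumed set of attributes that may identify a person in combination.
QUASI_IDENTIFIERS = ("birth_year", "postal_code", "sex")


def strip_direct_identifiers(record):
    """Return a copy of the record without directly identifying fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def unique_combinations(records, keys):
    """Return the attribute combinations that occur in exactly one record."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [combo for combo, n in counts.items() if n == 1]


stripped = [strip_direct_identifiers(r) for r in RECORDS]
singled_out = unique_combinations(stripped, QUASI_IDENTIFIERS)
print(f"{len(singled_out)} of {len(stripped)} records are still unique")
```

In this toy data set, two of the four records remain unique after the direct identifiers are gone, which is the situation the answer describes.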

You could identify someone based on a combination of certain characteristics. Effective anonymization tools check how these characteristics can be coarsened, summarized and categorized. They do everything that makes the re-identification of a patient difficult. However, complete anonymization that rules out re-identification, as the data protection laws envision, is hardly feasible in medicine: the data could then no longer be used or analyzed. It is only possible for very general statistics, not for detailed studies.

This is ultimately also the TMF's position: usable data can never be anonymized to the point where you could make it "publicly available". You always have to implement use restrictions, regardless of how you actually plan to enforce them.
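
The coarsening described here can be sketched in a few lines of Python. This is an editorial illustration, not the method of any particular tool: the records, the field names and the generalization rules (birth year to decade, postal code to its first two digits) are assumptions. The check used is the size of the smallest group of records sharing the same attribute combination, which is the quantity the k-anonymity model is built around; the interview does not say which criterion real tools apply.

```python
from collections import Counter

# Hypothetical records with direct identifiers already removed.
RECORDS = [
    {"birth_year": 1948, "postal_code": "55131", "sex": "f", "diagnosis": "I21"},
    {"birth_year": 1943, "postal_code": "55116", "sex": "f", "diagnosis": "E11"},
    {"birth_year": 1972, "postal_code": "55128", "sex": "m", "diagnosis": "J45"},
    {"birth_year": 1975, "postal_code": "55122", "sex": "m", "diagnosis": "I10"},
]


def coarsen(record):
    """Generalize attributes: birth year to decade, postal code to region."""
    decade = record["birth_year"] // 10 * 10
    return {
        "birth_decade": f"{decade}s",
        "postal_region": record["postal_code"][:2] + "***",
        "sex": record["sex"],
        "diagnosis": record["diagnosis"],
    }


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same key values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


k_before = smallest_group(RECORDS, ("birth_year", "postal_code", "sex"))
coarse = [coarsen(r) for r in RECORDS]
k_after = smallest_group(coarse, ("birth_decade", "postal_region", "sex"))

print(f"smallest group before coarsening: {k_before}")
print(f"smallest group after coarsening:  {k_after}")
```

The trade-off is visible even in this toy example: each record now shares its attribute combination with at least one other record, but exact birth years and postal codes are no longer available for analysis.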

Photo: Binary code overlays a picture of a woman

Data protection is becoming more and more difficult since more and more data is created; © editorialz

Where is data protection headed in your opinion? What trends do you see?

Pommerening: More and more data is being generated and stored. Firstly, people are more willing to disclose their information, for instance in social networks. Of course, it is difficult for us to control this. Secondly, trackers and wearable devices, ambient assisted living tools and other assistance systems also provide data without any action on our part. In assisted living, for instance, people and their health status are monitored. This makes sense in itself, but detailed data is being recorded throughout the entire day.

A third problem is that personalized medicine creates and uses genomic data, which carries a high risk of patients being re-identified.

At the very least, these three trends render data protection ever more complex. This also makes the issue of the impossibility of data anonymization more and more important and urgent, but also harder to solve.

Photo: Timo Roth; Copyright: B. Frommann

The interview was conducted by Timo Roth and translated from German by Elena O'Meara.