The following guidance will help you develop your understanding of anonymisation and pseudonymisation, and highlights practical techniques to accomplish both.
Content
- Key Concepts
- Motivated Intruder Test
- Data Type Specific Considerations
- Common Approaches to Anonymise Datasets
- ‘Whose Hands’ approach to anonymisation
- More resources
Key Concepts
- Anonymous data / information – the UK GDPR describes this as “…information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” While data protection legislation does not apply to anonymised data, the act of turning personal data into anonymised data is itself likely to be considered data processing and should be documented.
- Anonymisation – this is the act of rendering personal data anonymous so that it cannot be tracked back to an individual. This action can become more complex if you have large datasets that contain a wide range of data.
- Pseudonymisation – this is the replacement of directly identifying values with pseudonyms. The pseudonyms still act as unique identifiers, so pseudonymised data is still classed as personal data under data protection legislation.
- De-identification – often used to describe the removal of all uniquely personal characteristics from data so that they can no longer be linked to a specific individual. However, use of the term is discouraged because it is often applied interchangeably to both pseudonymised and anonymised data. In contrast, re-identification is the process of identifying individuals from pseudonymised or anonymised data; doing so can be a criminal offence under Section 171 of the Data Protection Act 2018.
- Data Value – it should be assumed that the greater the perceived value of the data to a motivated intruder, the greater the capabilities, tools and resources the intruder is likely to have at their disposal. Considering value also helps in understanding an intruder’s possible motives: for example, whether the data represents potential financial or political reward, or whether the motive is malicious intent or simple curiosity.
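To illustrate the pseudonymisation concept defined above, the following is a minimal sketch, assuming a keyed hash (HMAC-SHA256) is an acceptable way to derive pseudonyms. The function name `pseudonymise`, the field names and the 16-character truncation are hypothetical choices for illustration, not a prescribed method. Note that the output remains personal data: anyone holding the key can regenerate the link.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability only; the length is an illustrative assumption.
    return digest[:16]


# The key is the "additional information" that allows re-linking, so it
# should be stored separately from the pseudonymised dataset.
key = secrets.token_bytes(32)

# Hypothetical records; 9434765919 is a well-known test NHS Number.
records = [
    {"nhs_number": "9434765919", "condition": "asthma"},
    {"nhs_number": "9434765919", "condition": "eczema"},
]

pseudonymised = [
    {"pseudonym": pseudonymise(r["nhs_number"], key), "condition": r["condition"]}
    for r in records
]
```

Both records receive the same pseudonym, so they can still be linked to one another. This is why pseudonymised data continues to single out an individual even though the direct identifier has been removed.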
Whether information is identifiable depends on the context of the data, how it was collected, and any other information that is held (or likely to be held) by the organisation holding the data. As such, every processing activity must be assessed on its individual merits, with an understanding of how the dataset as a whole can or will interact with other datasets held and with any external datasets that could be made available (for example, data provided by the NHS, NHS Trusts and/or Biobank).
Obvious identifiers include name, address, postcode, date of birth and NHS Number. Combinations of less obvious data items can also make information identifiable and make anonymisation difficult to accomplish, for example where a person has a rare condition or a unique combination of data points or traits.
Annex 3 in the [URL] gives some practical examples of anonymisation procedures that could be used to minimise the identifiability of data. To support any such process, the ICO recommends undertaking a Motivated Intruder Test to assess the effectiveness of the anonymisation.
Motivated Intruder Test
A method widely used to review anonymisation processes is the Motivated Intruder Test. This is an activity which considers all the practical steps and all the means that are reasonably likely to be used by someone who is motivated to identify the people whose personal data the anonymous information is derived from. The test is used to assess the identifiability risk of (proposed) anonymous information.
The approach assumes that the ‘motivated intruder’:
- is reasonably competent,
- has access to resources such as the internet, libraries, and all public documents, and
- would employ investigative techniques such as making enquiries of people who may have additional knowledge of the identity of the data subject, or advertising for anyone with information to come forward.
The ‘motivated intruder’ is not assumed to have:
- specialist knowledge such as computer hacking skills,
- access to specialist equipment, or
- intent towards criminality such as burglary, to gain access to data that is kept securely.
Techniques often considered:
- use of AI.
- online searches for key identifiers (postcodes, birth dates).
- local and national press.
- trawling social media for associations that aid re-identification.
See full details via the [URL]. To support personnel in undertaking the Motivated Intruder Test within the University, we have created an Anonymisation Form Template [DOCX], which can be uploaded alongside the relevant Data Activity Risk-assessment Tool (DART) [URL] entry. The Anonymisation Form Template can also be accessed via Templates [URL].
Data Type Specific Considerations
Genomic Data
The [URL] states: "However, the definition of personal data also includes identification by reference to “one or more factors specific to the genetic identity of that natural person”, even without their name or other identifier. So, in practice, genetic analysis which includes enough genetic markers to be unique to an individual is personal data and special category genetic data, even if you have removed other names or identifiers."
It seems likely that some types of genetic data cannot be traced back to an individual. If the university wishes to classify any type of genetic data as non-personal data, then documenting the reasoning that expert individuals followed to reach this position is important evidence of good data stewardship.
A resource developed within the University provides specific advice on genomic data types.
Face data
The widespread use of online platforms that show names alongside facial images, combined with advances in facial recognition, means that face data would not satisfy the motivated intruder test. Care must also be taken with any imaging data that could be used to reconstruct a face, such as sequential MRI scans of faces.
Voice data
As with face data, voice data is becoming increasingly identifiable as video with audio becomes more widely available online.
Geographic
Fine-grained geographic data can be used to identify individuals’ places of work, homes or facilities they are known to attend, and may therefore directly identify an individual. Even coarse geographic data can be combined with other available data to break the anonymity of a dataset.
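One common way to reduce geographic precision is to keep only the outward part of a UK postcode or to round coordinates. The sketch below is illustrative only: the function names and the choice of two decimal places are assumptions, and, as noted above, coarsened locations can still contribute to identification when combined with other fields.

```python
def outward_code(postcode: str) -> str:
    """Keep only the outward part of a UK postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    # The inward code is the final three characters of a full postcode.
    # Values of four characters or fewer are assumed to be outward codes already.
    return compact[:-3] if len(compact) > 4 else compact


def round_coordinate(value: float, decimals: int = 2) -> float:
    """Reduce the precision of a latitude or longitude value."""
    # Two decimal places of latitude is on the order of one kilometre.
    return round(value, decimals)


area = outward_code("SW1A 1AA")
coarse_latitude = round_coordinate(51.501009)
```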
Free Text / Survey Data
Free text and survey data can be difficult to anonymise, whether you are attempting to collect it in an anonymised form or anonymising it after collection. The context of the wording itself may be identifying, respondents may include directly or indirectly identifiable data, and review can be resource intensive depending on the volume and subject matter being collected. It is therefore crucial to give specific guidance to those providing such information, and a review process after collection is vital to avoid personal data being accidentally included in a supposedly ‘anonymised’ dataset.
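A post-collection review can be partly supported by automated pattern checks. The sketch below, with hypothetical function names and deliberately simplified patterns, flags email addresses, UK-style postcodes and 10-digit numbers that pass the NHS Number check-digit (modulus 11) test. It is an aid to manual review rather than a replacement for it: it will not detect names, contextual clues or indirect identifiers.

```python
import re

# Simplified, illustrative patterns; real data will need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.IGNORECASE)
NHS_CANDIDATE_RE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")


def is_valid_nhs_number(candidate: str) -> bool:
    """Apply the modulus 11 check-digit test used for NHS Numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weight the first nine digits by 10 down to 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == digits[9]


def flag_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for a reviewer to inspect."""
    findings = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    findings += [("postcode", m.group()) for m in POSTCODE_RE.finditer(text)]
    findings += [
        ("nhs_number", m.group())
        for m in NHS_CANDIDATE_RE.finditer(text)
        if is_valid_nhs_number(m.group())
    ]
    return findings


# Hypothetical survey response; 943 476 5919 is a well-known test NHS Number.
response = "My GP at SW1A 1AA has my number 943 476 5919, email me at jo@example.org"
findings = flag_identifiers(response)
```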
Other Data Types which may reveal personal information or identify an individual
- Omics – proteomics.
- AI / machine learning.
- Data generated from processing biological samples.
- Movement data / gait analysis.
- Other freely available open datasets.
- Combining datasets.
Common Approaches to Anonymise Datasets
- Minimisation of accuracy.
- Minimisation of fields.
- Masking.
- Imaging – obscuring identifiable features.
- Voice data – transforming / masking.
- Aggregation of data.
- Uniqueness tests.
- Removal of linking fields.
- Avoidance of small numbers.
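Several of the approaches listed above can be sketched in a few lines. The example below is a minimal illustration, assuming hypothetical field names and thresholds (`k=2` for the uniqueness test and `5` for small-number suppression are placeholders, not values set by this guidance). It shows minimisation of accuracy (age bands and outward postcodes), minimisation of fields, a simple uniqueness test over the remaining indirect identifiers, and suppression of small counts in aggregated output.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Minimisation of accuracy: replace an exact age with a band."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise(record: dict) -> dict:
    """Keep only the fields needed, each at reduced precision."""
    return {
        "age_band": age_band(record["age"]),
        "area": record["postcode"].split()[0],  # outward code only
        "condition": record["condition"],
    }


def unique_combinations(records: list, fields: list, k: int = 2) -> dict:
    """Uniqueness test: value combinations shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


def suppress_small_counts(counts: dict, threshold: int = 5) -> dict:
    """Avoidance of small numbers: mask counts below the threshold."""
    return {key: (n if n >= threshold else f"<{threshold}") for key, n in counts.items()}


# Hypothetical records for illustration only.
records = [
    {"age": 34, "postcode": "LS1 4AB", "condition": "asthma"},
    {"age": 37, "postcode": "LS1 9XY", "condition": "asthma"},
    {"age": 82, "postcode": "LS6 2CD", "condition": "rare condition"},
]

generalised = [generalise(r) for r in records]
risky = unique_combinations(generalised, ["age_band", "area"])
published = suppress_small_counts({"asthma": 12, "rare condition": 1})
```

Here `risky` contains the single combination that appears only once after generalisation, which would need further generalisation or removal before release. Passing such a check does not by itself show that a dataset is anonymous; it is one input to the wider assessment described in this guidance.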
‘Whose Hands’ approach to anonymisation
It is often suggested that when pseudonymised data is shared with a third party that does NOT receive the associated key or any additional information that could re-identify a data subject / individual in the dataset, the dataset can be deemed ‘anonymised’ in that party’s hands and therefore outside the scope of data protection legislation. However, that is not strictly correct, as the ‘Whose Hands’ approach only applies when disclosing information to an organisation that is NOT acting with you as a data processor or as a joint controller.
Data Processors
The UK regulator for data protection, the ICO, confirms that “… a processor only processes personal data on your [the controller’s] behalf. This means the status of the data in your hands [the data controller’s hands] is what matters”. Therefore, regardless of any measures put in place or actions undertaken by the data controller, the data is always deemed to be personal data in the hands of both the controller and the data processor acting on its behalf. This is because the data processor receives, accesses and otherwise processes the personal data not for its own purposes but “on behalf of the controller”. Whether the data would be personal data in the hands of the processor alone is therefore irrelevant, and all requirements under the legislation remain, such as the inclusion of data protection clauses and / or the International Data Transfer Agreement (IDTA) or EU Standard Contractual Clauses.
This will be the case regardless of whether or not the processor would itself have the means or ability to identify the data subject / individual to whom the processed data relates, because the processor will always be processing personal data on behalf of its controller and legislation requires the controller to only use a processor that provides “sufficient guarantees to implement appropriate technical and organisational measures in such a manner that processing will meet the requirements of data protection legislation and ensure the protection of the rights of the data subject”.
Joint controllers
As with data processors, where data is held by one party as a data controller in an identifiable format (regardless of any actions / methodology to ‘anonymise’ it), the nature of the joint-controller relationship, in which two or more organisations work collaboratively towards jointly set goals, means that the data must always be treated as personal data when shared or processed between the parties. Furthermore, under data protection legislation the joint-controller relationship must be documented, as there is legally mandatory information that must be recorded and explained to the data subjects / individuals whose data is being processed.
Separate Data Controllers
Where two or more organisations are acting independently of one another, it MAY be possible to share data that was originally personal data in an anonymised format, subject to the process of anonymisation (along with any additional measures such as contractual requirements) being adequate to the complexity of the original dataset. Within the University, the process(es) for anonymisation can be recorded in the Anonymisation Form, uploaded as supporting documentation to the associated DART entry for the original dataset, and reviewed by the relevant team as part of the overall DART entry.
Summary
In situations where access to personal data is sought by an entirely independent party (separate data controllers), it may be possible to assess whether the data in question would be personal data in the hands of the requester without any analysis of the respective data protection roles of the parties. This is because, in such a scenario, it is likely to be apparent that there is no joint controllership or controller-to-processor relationship in play. However, where numerous entities or individuals are co-operating on a joint project involving the processing of data that is personal data in the hands of at least one of them, it will be necessary first to determine the parties’ respective data protection roles in relation to that data in order to ensure compliance with data protection obligations.
Where an assessment of data protection roles has been carried out, it may not be necessary to carry out any further assessment as to whether or not the data to be disclosed will be personal data (as where data is shared by a controller with its processor or joint controllers all such parties will be processing personal data). As set out in our guidance above, only where the would-be recipient is neither a processor for, nor joint controller with, the party proposing to share the data (e.g. the data controller), will it be necessary to determine whether the data the requesting party would receive will be personal data.
For more information on the ‘Whose Hands’ approach, please see [URL].
More resources
- [URL]
- [URL]
- [PDF]
- [URL]
- [URL]
- [URL]