Explanatory notes on Big Data

Big data offers new opportunities for social and scientific research and a modified form of value creation for businesses. However, big data can also pose a threat to privacy, for example if the processed data is not or is insufficiently anonymised. Where data related to specific persons is involved, the right to privacy and the protection of personal data must be upheld.


The issue of "big data" has recently come under the spotlight, not least due to revelations about the enormous volumes of data that various intelligence services have gathered, stored and analysed. Shopping, electricity consumption, and the daily use of telecommunications and online services, electronic devices, credit and debit cards etc. generate enormous volumes of data. Added to this is the data that the authorities publish as a result of their statutory duties or as part of open-data projects. The accumulated information is evaluated by a variety of parties and is also put to commercial use. From an economic point of view, big data has enormous potential. According to estimates, the volume of data stored around the world will increase at least fortyfold by 2020.

What is "big data"?

The term "big data" refers to a large volume of data from various sources that is collected and stored at high processing speeds and made available for assessment and analysis for an indeterminate period of time and for undefined purposes. The intensive processing procedures have become possible because, due to technological developments, the costs and the time required for the storage and evaluation of massive volumes of data have drastically decreased. Data can be stored without any difficulty for long periods and can be used again for any purpose in the future. Newly developed methods and technologies make it possible and simple to analyse and match vast volumes of data. Algorithms are applied to a large database with the aim of identifying new patterns, similarities, links or discrepancies.

Big data can basically be defined on the basis of four features, known as the four "Vs": volume, velocity, variety and value.

Big data consists of large volumes of data that are processed at high velocity. A third feature is the different properties or variety of the data. Big data offers new opportunities to combine data from various sources that have not previously been compared. For example, data from internal customer databases can be matched with external data from social networks, search engines, official gazettes or data collections from official open data portals. The fourth feature is the added value that the analysis of data is intended to bring.

Opportunities and risks of big data

Big data has also been referred to as "the new oil" or a "goldmine" because it brings new opportunities for social or scientific research and offers a modified form of value creation to commercial enterprises, in that unstructured and heterogeneous information can be exploited through matching and evaluation. Typical applications include rapid automated market research that can react immediately to changes, the detection of misconduct in financial transactions, detailed web analyses aimed at increasing and optimising online marketing measures, comprehensive medical diagnostics, and dragnet investigations and profiling by intelligence services or the police.

Big data can, however, represent a serious threat to privacy if information from certain areas of people's lives is collected and evaluated in a systematic and structured way. An insurance company could, for example, refuse to cover someone because an analysis of their health data suggests there is a serious possibility of a future illness. Intelligence services could likewise use big data algorithms to predict probable security risks, which may lead them to subject private individuals to permanent surveillance by a variety of methods.

Big data - a data protection problem?

Data protection legislation regulates how personal data is handled. Personal data is all the information relating to an identified or identifiable person (Art. 3 lit. a DPA). It is often argued that in most cases big data is obtained by simply collecting technical or anonymised data and that therefore the data protection provisions do not apply.

The difficulty in regarding big data as "technical data" or "anonymised" is that data may be de-anonymised when several data collections are merged. In many cases, removing certain clear identifiers is not enough to prevent re-identification. Care must also be taken with so-called quasi-identifiers - combinations of particulars such as date of birth, sex and postcode. US researchers have found that four fifths of the American population can be identified solely on the basis of these three criteria. Attribution after the fact becomes considerably more difficult, however, if quasi-identifiers are processed in a generalised form, i.e. if instead of the precise details of a person's age (for example 44), the range of "40-49 years" is used. Generalising several data fields in this way is one route to what is known as "k-anonymity": a data collection is k-anonymous if every combination of quasi-identifier values it contains is shared by at least k records. The higher the value of "k", the more records share the same combination of data values, thus making anonymisation more effective. If the chosen level of anonymisation is too low, the requirements of data protection law continue to apply to the data processing and the original holder of the data can be held to account.
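The effect of generalisation on "k" can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of these notes: the sample records are invented, and the generalisation rules (ten-year age bands, postcodes truncated to their first two digits) and the function names are chosen purely for illustration. It is not a complete anonymisation procedure, only a way of counting how many records share the same quasi-identifier values.

```python
from collections import Counter

# Hypothetical sample records (age, sex, postcode), invented for illustration.
RECORDS = [
    (44, "F", "3003"),
    (47, "F", "3007"),
    (41, "F", "3011"),
    (45, "M", "3003"),
    (49, "M", "3008"),
    (52, "M", "8001"),
    (58, "M", "8004"),
]


def generalise(record):
    """Coarsen the quasi-identifiers of one record.

    Assumed rules: the exact age becomes a ten-year band (44 -> "40-49")
    and the postcode keeps only its first two digits ("3003" -> "30**").
    """
    age, sex, postcode = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", sex, postcode[:2] + "**")


def k_anonymity(records):
    """Return k, the size of the smallest group of records that share
    exactly the same combination of quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(records)
    return min(groups.values())


k_raw = k_anonymity(RECORDS)
k_generalised = k_anonymity([generalise(r) for r in RECORDS])

print("k with exact values:      ", k_raw)          # 1: every record is unique
print("k with generalised values:", k_generalised)  # 2: smallest group has two records
```

With exact values every record in this sample is unique (k = 1), so each person could be singled out. After generalisation the smallest group contains two records (k = 2). Whether a given value of k is sufficient depends on the context and on what other data collections could be matched against the data.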

A further difficulty lies in predicting technological developments: though data may be regarded as "anonymous" today, thanks to rapid technological progress and additional data sources, tomorrow it may be possible to attribute it to a specific person without much difficulty, thus potentially causing a serious breach of privacy. This makes it essential to consider data protection issues when developing new technologies. Data protection must be part of the overall concept from the very outset ("privacy by design"), to avoid the arduous and expensive process of having to remedy data protection problems retrospectively.

There are additional serious data protection issues relating to big data:

  • Technological capabilities pose a challenge to the requirement of transparency imposed by data protection law: everyone is entitled to know who is processing what data about them and for what purpose. In the case of big data, it is very hard to keep track of data processing and the matching of data from various sources on any specific person, or indeed to know if any data is being processed at all. This means that big data users face a particular challenge in relation to transparency and in providing information to the persons concerned.

  • Processing personal big data requires the consent of the persons concerned. This means that the purpose of the big data procedure must be clear to those concerned as early as the data procurement stage. This, however, contradicts the basic concept of big data, namely that data should be collected and stored with a view to serving some as yet undetermined purpose. If only a vague, general description of the purpose of the data processing is provided, any consent given to the planned data processing is not legally valid.

  • A further difficulty is the requirement of data accuracy: algorithms are applied to big data to analyse large data collections independently and automatically, for example to establish links. This analysis creates new information related to specific persons that cannot be judged to be correct or incorrect, but simply amounts to probabilities or interpretations.


Big data offers new opportunities for social and scientific research and a modified form of value creation for businesses. However, big data can also pose a threat to privacy, for example if the processed data is not or is insufficiently anonymised. Where data related to specific persons is involved, the right to privacy and the protection of personal data must be upheld. The priority must be to ensure that the technology and procedures used to store and process big data are compatible with data protection. Consideration must be given to data protection issues in the conceptual phase, and data security must be guaranteed. In addition, big data must be made subject to strict requirements of transparency and clear procedures. Personal big data is at odds with the basic principles of the Data Protection Act, in particular the requirements of a clear purpose and of economy of data. Serious challenges are being posed to the current conception of data protection, as big data is now being exploited and some of the key provisions of the Data Protection Act (DPA) are therefore being called into question. A fundamental review of the DPA is needed to find out how the key principles of purpose, consent and transparency can be complied with when using big data.