Most of us can keep a secret. It is the people we tell it to who cannot.
I have been working on data privacy for close to a decade now. In my interactions with various groups of people in the IT industry, I have been surprised by how many of them think that ‘privacy’ and ‘security’ are essentially the same concept, and that maintaining privacy is simply a matter of ‘masking a couple of fields’.
Once you begin to understand user requirements on the ground, the picture looks very different. Let me give you an unusual example that helps distinguish privacy from security.
Privacy vis-à-vis Security
In the state of Maharashtra, India, is a village called Shingnapur. It is a pilgrimage site known for ‘Shani’, the Hindu deity associated with the planet Saturn, who is believed to punish evil-doers. This belief is so powerful that the houses in the village have no doors, only doorways. The villagers do not keep their valuables under lock and key, and yet no theft is reported in the village. Encouraged by this, the United Commercial (UCO) Bank opened a ‘lockless’ branch in the village, the first of its kind in the country! So much for security! While the houses in the village have no doors, they do have curtains for personal privacy. The residents of Shingnapur and their quaint houses teach us some important lessons about security and privacy. Security implies protection from danger or threat; privacy is about keeping information secret. The two are fundamentally different notions and should not be confused with each other.
Privacy and Security in the Data World
How does this difference between security and privacy show up in the data world? Here, data security boils down to keeping data available to authorized users while denying access to unauthorized ones. The security paradigm therefore assumes a clear separation between users and attackers. In fact, if you are not a user, you are treated as an attacker and dealt with accordingly.
Things get trickier in the privacy paradigm. As long as the data is used for the purpose for which it was originally shared, all is fine. But if that same user were to use it for an unintended purpose, they would turn into an attacker. In the privacy paradigm, then, users can be attackers too, depending on how they intend to use the data.
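To make the distinction concrete, here is a minimal Python sketch, assuming a toy policy in which each user is either authorized or not, and the data carries the set of purposes for which it was shared. The user names, purposes, and the functions `security_check` and `privacy_check` are hypothetical illustrations, not part of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    purpose: str


# Hypothetical policy data for illustration only.
AUTHORIZED_USERS = {"alice", "bob"}            # who may access the data at all
CONSENTED_PURPOSES = {"treatment", "billing"}  # why the data was originally shared


def security_check(req: AccessRequest) -> bool:
    """Security paradigm: only asks *who* is requesting the data."""
    return req.user in AUTHORIZED_USERS


def privacy_check(req: AccessRequest) -> bool:
    """Privacy paradigm: an authorized user is also judged by *why* they ask."""
    return security_check(req) and req.purpose in CONSENTED_PURPOSES


req = AccessRequest(user="alice", purpose="marketing")
print(security_check(req))  # True: alice is an authorized user
print(privacy_check(req))   # False: marketing is not a purpose the data was shared for
```

The same authorized user passes the security check but fails the privacy check, because the declared intent does not match the purpose for which the data was shared. In practice, purpose is much harder to verify than identity; this sketch simply takes the declared purpose at face value.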
The Data Privacy Problem
Take, for instance, a dataset containing patient details. This information could include medical details, payment details (such as credit card information), and personal information (such as age and address). Let us assume that an authorized person who needs the patients’ medical details should not see some of the other information, yet must still have access to enough selected information to identify the individuals in the database. What data should be shared so that the work can be carried out effectively while individual privacy concerns are also met?
For example, a nurse may need the name and the medical details, but not the bank or payment details; the accounts department may not need all the medical details, but may need the payment details; the IT company that tests or builds the application may need to view all fields, but may not need the real data. This implies that data can almost never be shared as is. This, in a nutshell, is the data privacy problem.
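The nurse, accounts, and tester example can be sketched as role-based views over a single record. This is a minimal Python illustration, assuming a hypothetical patient record, a hypothetical role-to-field policy (`ROLE_FIELDS`), and salted hashing as a stand-in for the "all fields, but not the real data" requirement; none of these names come from the original work.

```python
import hashlib

# Hypothetical patient record; field names and values are illustrative only.
PATIENT = {
    "name": "Jane Doe",
    "age": 42,
    "address": "12 Example Road",
    "diagnosis": "Type 2 diabetes",
    "card_number": "4111111111111111",
}

# Hypothetical role -> visible-field policy mirroring the example in the text.
ROLE_FIELDS = {
    "nurse": {"name", "diagnosis"},        # name and medical details, no payment data
    "accounts": {"name", "card_number"},   # payment details, no medical data
}


def pseudonymize(value: object, salt: str = "demo-salt") -> str:
    """Replace a value with a deterministic token (truncated salted SHA-256)."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:10]


def view_for(role: str, record: dict) -> dict:
    """Return the slice of `record` that `role` is allowed to see."""
    if role == "tester":
        # Testers see every field, but never the real values.
        return {field: pseudonymize(value) for field, value in record.items()}
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {field: value for field, value in record.items() if field in allowed}


print(view_for("nurse", PATIENT))     # {'name': 'Jane Doe', 'diagnosis': 'Type 2 diabetes'}
print(view_for("accounts", PATIENT))  # {'name': 'Jane Doe', 'card_number': '4111111111111111'}
```

Hashing is only a simplification here: low-entropy fields such as age can be recovered by brute force, and test systems usually need format-preserving synthetic values (for instance, valid-looking card numbers) rather than hex tokens. This is exactly why "masking a couple of fields" rarely settles the problem.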
My team has been working with Stanford University’s team at the National Science Foundation’s TRUST Center on questions of privacy and security. We work with various data formats, including documents, speech, and databases. Our work aims to find the right balance between the two extremes of fully disclosed and completely withheld data, one that preserves both privacy and utility: giving users access to good data while stopping them from becoming attackers.
At the TCS Innovation Forum in Cincinnati on May 21, I will speak on data privacy in the context of the Internet of Things (IoT). In the IoT, data evolves and passes through multiple administrative domains involving multiple stakeholders, which makes controlling data flow and usage nearly impossible. Moreover, when events are reviewed in their spatio-temporal context, the juxtaposition of multiple, seemingly unrelated data points can itself become personal information. This makes it trickier to define what counts as Personally Identifiable Information (PII) – the cornerstone of almost all privacy regulations worldwide. Privacy schemes therefore need a holistic approach that involves not only advanced cryptography and protocols but also the behavior and context of the entities involved.
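How can seemingly unrelated data points become personal information? Here is a toy Python sketch, assuming two hypothetical datasets: an "anonymous" IoT log of device IDs seen at places and hours, and public check-ins that carry names. Neither dataset pairs a name with a device, and all data and function names are made up for illustration; joining them on (location, hour) is one simple form of linkage, not the method from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: (device_id, location, hour) from an "anonymous" sensor log.
device_pings = [
    ("dev-17", "gym", 7),
    ("dev-17", "clinic", 10),
    ("dev-42", "gym", 7),
    ("dev-42", "office", 10),
]
# Hypothetical toy data: (person, location, hour), e.g. from public posts.
public_checkins = [
    ("Alice", "gym", 7),
    ("Alice", "clinic", 10),
    ("Bob", "gym", 7),
    ("Bob", "office", 10),
]


def link(pings, checkins):
    """Map device_id -> person when exactly one person was present at
    every (location, hour) event recorded for that device."""
    events_by_device = defaultdict(set)
    for device, loc, hour in pings:
        events_by_device[device].add((loc, hour))

    events_by_person = defaultdict(set)
    for person, loc, hour in checkins:
        events_by_person[person].add((loc, hour))

    links = {}
    for device, events in events_by_device.items():
        candidates = [p for p, seen in events_by_person.items() if events <= seen]
        if len(candidates) == 1:  # a unique match re-identifies the device owner
            links[device] = candidates[0]
    return links


print(link(device_pings, public_checkins))  # {'dev-17': 'Alice', 'dev-42': 'Bob'}
```

Any single event on its own (the gym at 7) matches both people and reveals nothing. It is the combination of two ordinary events in space and time that singles each device out, and in this toy case it also ties a named person to a clinic visit.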
How does your organization or research view the problem of data privacy?
This blog post is based on a published paper. The paper can be accessed here: