Research Article

A Privacy Measure for Data Disclosure to Publish Micro Data using (N,T) - Closeness

by  A. Sunitha, K. Venkata Subba Reddy, B. Vijayakumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 51 - Issue 6
Published: August 2012
DOI: 10.5120/8047-1379

A. Sunitha, K. Venkata Subba Reddy, and B. Vijayakumar. A Privacy Measure for Data Disclosure to Publish Micro Data using (N,T) - Closeness. International Journal of Computer Applications 51, 6 (August 2012), 22-28. DOI=10.5120/8047-1379

@article{ 10.5120/8047-1379,
  author    = { A. Sunitha and K. Venkata Subba Reddy and B. Vijayakumar },
  title     = { A Privacy Measure for Data Disclosure to Publish Micro Data using (N,T) - Closeness },
  journal   = { International Journal of Computer Applications },
  year      = { 2012 },
  volume    = { 51 },
  number    = { 6 },
  pages     = { 22-28 },
  doi       = { 10.5120/8047-1379 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2012
%A A. Sunitha
%A K. Venkata Subba Reddy
%A B. Vijayakumar
%T A Privacy Measure for Data Disclosure to Publish Micro Data using (N,T) - Closeness
%J International Journal of Computer Applications
%V 51
%N 6
%P 22-28
%R 10.5120/8047-1379
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Closeness is described as a privacy measure, and its advantages are illustrated through examples and experiments on a real dataset. In this paper, closeness is verified by supplying different values for n and t. Government agencies and other organizations often need to publish microdata, e.g., medical data or census data, for research and other purposes. Typically, such data are stored in a table, and each record (row) corresponds to one individual. A common anonymization approach for publishing microdata is generalization, which replaces quasi-identifier values with values that are less specific but semantically consistent. As a result, more records come to share the same set of quasi-identifier values; an equivalence class of an anonymized table is defined as a set of records that have identical values for the quasi-identifiers.

To effectively limit disclosure, the disclosure risk of an anonymized table must be measured. To this end, k-anonymity is introduced as the property that each record is indistinguishable from at least k-1 other records with respect to the quasi-identifiers; that is, each equivalence class must contain at least k records. While k-anonymity protects against identity disclosure, it is insufficient to prevent attribute disclosure. To address this limitation, a new notion of privacy called l-diversity is introduced, which requires that the distribution of a sensitive attribute in each equivalence class have at least l "well-represented" values. One problem with l-diversity is that it is limited in its assumption of adversarial knowledge, an assumption that generalizes the specific background-knowledge and homogeneity attacks used to motivate l-diversity. In short, the k-anonymity requirement for publishing microdata is that each equivalence class contain at least k records, but k-anonymity cannot prevent attribute disclosure.
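The k-anonymity requirement above can be checked mechanically: group records by their quasi-identifier values and verify that every equivalence class contains at least k members. A minimal sketch, assuming a table of Python dictionaries (the column names and generalized values below are illustrative, not taken from the paper):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every equivalence class (records sharing the same
    quasi-identifier values) contains at least k records."""
    classes = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(size >= k for size in classes.values())

# Toy generalized table: ZIP code and age are quasi-identifiers,
# disease is the sensitive attribute.
table = [
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "cancer"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
]
print(is_k_anonymous(table, ["zip", "age"], 2))  # True: both classes have 2 records
print(is_k_anonymous(table, ["zip", "age"], 3))  # False
```

This toy table is 2-anonymous but still vulnerable to attribute disclosure: the second equivalence class is uniform in its sensitive value, which is exactly the gap l-diversity and closeness address.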
The notion of l-diversity has been proposed to address this: l-diversity requires that each equivalence class have at least l well-represented values for each sensitive attribute. However, l-diversity has a number of limitations; in particular, it is neither necessary nor sufficient to prevent attribute disclosure. Due to these limitations, a new notion of privacy called "closeness" is proposed. First, the base model, t-closeness, is presented, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table. Then a more flexible privacy model, called (n, t)-closeness, is proposed. The rationale for using
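For a categorical sensitive attribute with equal ground distances between values, the Earth Mover's Distance that t-closeness uses to compare distributions reduces to the variational distance, 0.5 × Σ|p_i − q_i|. A hedged sketch of the check under that simplification (the toy classes and thresholds are illustrative; the full model in the paper also covers hierarchical and numerical ground distances):

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def variational_distance(p, q):
    """EMD between two categorical distributions when all ground
    distances are equal: 0.5 * sum(|p_i - q_i|)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(equivalence_classes, sensitive, t):
    """Check that each equivalence class's sensitive-attribute
    distribution is within distance t of the overall distribution."""
    all_records = [r for ec in equivalence_classes for r in ec]
    overall = distribution([r[sensitive] for r in all_records])
    return all(
        variational_distance(distribution([r[sensitive] for r in ec]), overall) <= t
        for ec in equivalence_classes
    )

# Two equivalence classes; the overall disease distribution is 75% flu, 25% cancer.
classes = [
    [{"disease": "flu"}, {"disease": "flu"}],     # local: 100% flu   -> distance 0.25
    [{"disease": "flu"}, {"disease": "cancer"}],  # local: 50% / 50%  -> distance 0.25
]
print(satisfies_t_closeness(classes, "disease", 0.3))  # True
print(satisfies_t_closeness(classes, "disease", 0.2))  # False
```

The (n, t)-closeness refinement relaxes this by comparing each class against the distribution of a large enough (size at least n) natural superset of the class rather than the whole table.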

References
  • N. Li, T. Li, and S. Venkatasubramanian, "Closeness: A New Privacy Measure for Data Publishing," Proc. Int'l Conf. Data Eng. (ICDE), pp. 943-956, 2010.
  • D. Lambert, "Measures of Disclosure Risk and Harm," J. Official Statistics, vol. 9, pp. 313-331, 1993.
  • L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
  • A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "l-Diversity: Privacy Beyond k-Anonymity," Proc. Int'l Conf. Data Eng. (ICDE), p. 24, 2006.
  • N. Li, T. Li, and S. Venkatasubramanian, "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity," Proc. Int'l Conf. Data Eng. (ICDE), pp. 106-115, 2007.
  • K. LeFevre, D. DeWitt, and R. Ramakrishnan, "Mondrian Multidimensional k-Anonymity," Proc. Int'l Conf. Data Eng. (ICDE), p. 25, 2006.
  • T. Li and N. Li, "Towards Optimal k-Anonymization," Data and Knowledge Eng., vol. 65, pp. 22-39, 2008.
  • R. J. Bayardo and R. Agrawal, "Data Privacy through Optimal k-Anonymization," Proc. Int'l Conf. Data Eng. (ICDE), pp. 217-228, 2005.
  • B. C. M. Fung, K. Wang, and P. S. Yu, "Top-Down Specialization for Information and Privacy Preservation," Proc. Int'l Conf. Data Eng. (ICDE), pp. 205-216, 2005.
  • T. Li, N. Li, and J. Zhang, "Modeling and Integrating Background Knowledge in Data Anonymization," Proc. Int'l Conf. Data Eng. (ICDE), 2009.
  • M. E. Nergiz, M. Atzori, and C. Clifton, "Hiding the Presence of Individuals from Shared Databases," Proc. ACM SIGMOD, pp. 665-676, 2007.
  • H. Park and K. Shim, "Approximate Algorithms for K-Anonymity," Proc. ACM SIGMOD, pp. 67-78, 2007.
  • V. Rastogi, S. Hong, and D. Suciu, "The Boundary between Privacy and Utility in Data Publishing," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 531-542, 2007.
  • R. C.-W. Wong, J. Li, A. W.-C. Fu, and K. Wang, "(α, k)-Anonymity: An Enhanced k-Anonymity Model for Privacy Preserving Data Publishing," Proc. ACM SIGKDD, pp. 754-759, 2006.
  • X. Xiao and Y. Tao, "Anatomy: Simple and Effective Privacy Preservation," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 139-150, 2006.
  • X. Xiao and Y. Tao, "Personalized Privacy Preservation," Proc. ACM SIGMOD, pp. 229-240, 2006.
  • X. Xiao and Y. Tao, "m-Invariance: Towards Privacy Preserving Republication of Dynamic Datasets," Proc. ACM SIGMOD, pp. 689-700, 2007.
  • V. S. Iyengar, "Transforming Data to Satisfy Privacy Constraints," Proc. ACM SIGKDD, pp. 279-288, 2002.
  • D. Kifer and J. Gehrke, "Injecting Utility into Anonymized Datasets," Proc. ACM SIGMOD, pp. 217-228, 2006.
  • N. Koudas, D. Srivastava, T. Yu, and Q. Zhang, "Aggregate Query Answering on Anonymized Tables," Proc. Int'l Conf. Data Eng. (ICDE), pp. 116-125, 2007.
Index Terms
Computer Science
Information Sciences
Keywords

Privacy measure, k-anonymity, l-diversity, data anonymization, (n,t)-closeness
