Research Article

SOFIM: FREQUENT ITEMSET MINING IN OPTIMIZED HDFS WITH SECURE DE-DUPLICATION

by Bosco Nirmala Priya, Parathasarathi Murugesan, C. Kaleeswari, Achsah Susan Mathew, J. Vimala Roselin, Balakiran S.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 7
Published: May 2025
10.5120/ijca2025924960

Bosco Nirmala Priya, Parathasarathi Murugesan, C. Kaleeswari, Achsah Susan Mathew, J. Vimala Roselin, Balakiran S. SOFIM: FREQUENT ITEMSET MINING IN OPTIMIZED HDFS WITH SECURE DE-DUPLICATION. International Journal of Computer Applications. 187, 7 (May 2025), 26-35. DOI=10.5120/ijca2025924960

Abstract

Frequent itemset mining has become a critical data mining technique across a variety of research domains. Frequent patterns are those that appear often in a dataset, and numerous methods for finding all frequent itemsets in a database have been proposed. This paper proposes a novel hybrid method aimed at online applications. Big Data systems store huge volumes of data from various industrial applications, and valuable information must be retrieved from that stored data through an optimized server. The proposed SOFIM (Server Optimized Frequent Itemset Mining) technique finds frequent itemsets based on positive reviews and improves the storage server's performance. It does so by analyzing the sentiment of product reviews, avoiding redundant reviews through a duplication check, and optimizing server performance by partially replicating the review data across multiple servers. Together, the combined hybrid model SOFIM provides a better solution for finding frequent itemsets.
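
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the pipeline it describes: a duplication check on reviews, a sentiment filter that keeps positive reviews, and frequent itemset counting over what remains. Every name here (`deduplicate`, `is_positive`, `frequent_itemsets`), the keyword-based sentiment rule, the SHA-256 fingerprint, and the sample reviews are assumptions made for illustration and are not taken from the paper. Server-side replication is not modeled.

```python
import hashlib
from collections import Counter
from itertools import combinations


def deduplicate(reviews):
    """Drop reviews whose normalized text hashes to an already-seen digest."""
    seen, unique = set(), []
    for text, items in reviews:
        # Normalize case and whitespace before hashing (assumed fingerprint scheme).
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((text, items))
    return unique


def is_positive(text, positive_words=("good", "great", "excellent", "love")):
    """Toy sentiment rule (assumption): any positive keyword marks the review positive."""
    words = set(text.lower().split())
    return any(word in words for word in positive_words)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets up to max_size by brute force; keep those meeting min_support."""
    counts = Counter()
    for items in transactions:
        unique_items = sorted(set(items))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique_items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical sample data: (review text, items mentioned in the review).
reviews = [
    ("Great phone and case", ["phone", "case"]),
    ("great phone  and case", ["phone", "case"]),  # duplicate after normalization
    ("Love the phone with this charger", ["phone", "charger"]),
    ("Bad charger", ["charger"]),
    ("Excellent phone, good case", ["phone", "case"]),
]

unique_reviews = deduplicate(reviews)
positive = [items for text, items in unique_reviews if is_positive(text)]
result = frequent_itemsets(positive, min_support=2)
```

The brute-force counting is chosen for brevity; a real deployment would more likely use an Apriori- or FP-growth-style miner, and the paper may use a different method entirely.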

Index Terms
Computer Science
Information Sciences
Keywords

Frequent itemset mining, big data, SOFIM, de-duplication, replication
