Research Article

Kafka-based Architecture in Building Data Lakes for Real-time Data Streams

by Kiran Peddireddy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Issue 9
Published: May 2023
Authors: Kiran Peddireddy
DOI: 10.5120/ijca2023922740

Kiran Peddireddy. Kafka-based Architecture in Building Data Lakes for Real-time Data Streams. International Journal of Computer Applications. 185, 9 (May 2023), 1-3. DOI=10.5120/ijca2023922740

@article{10.5120/ijca2023922740,
  author    = {Kiran Peddireddy},
  title     = {Kafka-based Architecture in Building Data Lakes for Real-time Data Streams},
  journal   = {International Journal of Computer Applications},
  year      = {2023},
  volume    = {185},
  number    = {9},
  pages     = {1-3},
  doi       = {10.5120/ijca2023922740},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2023
%A Kiran Peddireddy
%T Kafka-based Architecture in Building Data Lakes for Real-time Data Streams
%J International Journal of Computer Applications
%V 185
%N 9
%P 1-3
%R 10.5120/ijca2023922740
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper investigates how Kafka can be used to construct data lakes for real-time data processing. Kafka has gained widespread popularity as a data ingestion and processing tool that offers scalability, fault tolerance, and flexibility. The paper analyzes the benefits of using Kafka in a data lake architecture and the steps involved in integrating it. It also presents a case study of a major financial institution that used Kafka to establish a data lake. The paper emphasizes the significance of Kafka in modern data processing and its value in building data lakes for real-time data processing.

Index Terms
Computer Science
Information Sciences
Keywords

Kafka, KSQL, Data Lake
