Abstract
Smart cities increasingly leverage advanced technologies such as the Internet of Things (IoT), Artificial Intelligence (AI), and Big Data Analytics to optimize urban management and improve citizens' quality of life. However, managing vast, diverse datasets from numerous sources in real time presents several challenges. This research proposes a modular framework that integrates distributed data processing engines with container-based workflow orchestration to address scalability, latency, adaptability, and fault tolerance in smart city data analytics. The framework uses cloud-native technologies, including Apache Spark and Kubernetes, to manage resources efficiently and ensure high availability. The experimental setup tested the framework's ability to handle dynamic data loads, demonstrating scalability through real-time resource allocation and low-latency processing. The framework's adaptability was evident in its seamless integration with varied data sources, such as environmental sensors and traffic management systems, which require different processing methods. Its modularity also provided fault tolerance, enabling continued operation even when individual components failed, a crucial property for mission-critical smart city applications. Compared with traditional monolithic systems, the proposed framework offered greater flexibility, scalability, and performance, with significant improvements in handling real-time data streams. Despite these advantages, challenges remain, particularly in integrating heterogeneous data formats and optimizing real-time processing for high-priority applications. The research highlights the importance of scalable data analytics and efficient workflow orchestration for future smart city platforms, providing a foundation for the development of more resilient, adaptable, and efficient cloud-native infrastructures.