Abstract
Distributed software systems face significant data quality challenges due to their complex, decentralized architectures. Because multiple nodes process and store data independently, maintaining consistency and accuracy across the entire network is difficult, and issues such as data inconsistency, latency, and data fragmentation are common. To address these challenges, this study proposes an integrated data quality governance strategy that combines real-time monitoring with automated anomaly detection using machine learning models. The strategy aims to improve data consistency, strengthen anomaly detection, and reduce the need for manual intervention, ultimately improving overall data governance in distributed systems. Real-time monitoring identifies data issues as they occur, while machine learning models such as autoencoders and Isolation Forests automate anomaly detection: autoencoders flag records with high reconstruction error, and Isolation Forests flag points that are easily isolated from the rest of the data. The study evaluates the proposed strategy in real-world distributed system scenarios, comparing its effectiveness against traditional approaches such as periodic audits and manual validation. Results show that the integrated approach yields faster anomaly detection, fewer data inconsistencies, and improved overall system performance. Advanced machine learning techniques combined with real-time analytics significantly enhance the system's ability to maintain high data quality standards across multiple distributed nodes. The strategy has wide-ranging implications for industries that rely on distributed systems, such as finance, healthcare, and IoT, where data integrity is essential for operational success.
Future research can focus on integrating more advanced machine learning techniques and optimizing the real-time monitoring framework to handle larger and more complex systems.
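As a minimal sketch of the Isolation Forest component mentioned in the abstract (the autoencoder path works analogously, thresholding reconstruction error instead of an isolation score), the following uses scikit-learn on synthetic per-node metrics. The data, model parameters, and contamination rate are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated metric readings from distributed nodes (e.g. latency, error rate):
# 200 normal points plus 3 injected anomalies far outside the normal cluster.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

# Isolation Forest isolates each point with random axis-aligned splits;
# points that are isolated in few splits receive label -1 (anomaly).
# contamination=0.02 is an assumed expected anomaly fraction.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)

anomaly_idx = np.where(labels == -1)[0]
print("flagged indices:", anomaly_idx.tolist())
```

In a deployed pipeline, the same `fit_predict` step would run on streaming metric windows, with flagged indices routed to the governance layer for reconciliation instead of manual review.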