Research Article

AI-Driven System for Real Time Integration Failure Prediction & Policy Governed Mitigation

by  Bhanu Pratap Singh, Anil Mandloi, Amit Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 63
Published: December 2025
Authors: Bhanu Pratap Singh, Anil Mandloi, Amit Gupta
10.5120/ijca2025926041

Bhanu Pratap Singh, Anil Mandloi, Amit Gupta. AI-Driven System for Real Time Integration Failure Prediction & Policy Governed Mitigation. International Journal of Computer Applications. 187, 63 (December 2025), 34-43. DOI=10.5120/ijca2025926041

@article{10.5120/ijca2025926041,
  author    = {Bhanu Pratap Singh and Anil Mandloi and Amit Gupta},
  title     = {AI-Driven System for Real Time Integration Failure Prediction \& Policy Governed Mitigation},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {63},
  pages     = {34--43},
  doi       = {10.5120/ijca2025926041},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2025
%A Bhanu Pratap Singh
%A Anil Mandloi
%A Amit Gupta
%T AI-Driven System for Real Time Integration Failure Prediction & Policy Governed Mitigation
%J International Journal of Computer Applications
%V 187
%N 63
%P 34-43
%R 10.5120/ijca2025926041
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Modern enterprise computing systems rely on complex integrations across microservices, APIs, and legacy platforms. Integration failures cause message loss, disrupt business processes, and force complex manual recovery. That recovery is not limited to retrying the failed transaction at the integration layer; it sometimes requires manipulating data in source systems to retrigger the whole transaction. Conventional rule-based monitoring struggles with the volume, velocity, and variability of integration traffic. This paper introduces IntelliFix, an AI-powered framework for real-time failure detection, root cause isolation, and automated mitigation in integration pipelines. IntelliFix is a cognitively autonomous framework that recasts integration pipelines as temporal heterogeneous graphs and failure mitigation as a constrained Markov Decision Process (MDP). Its core is a dual-stage hybrid architecture:
  • A Temporal Graph Attention Network (TGAT) with payload-aware edge embeddings derived from a domain-adapted BERT encoder, used for subgraph anomaly forecasting with a lead time measured in minutes.
  • A Proximal Policy Optimization (PPO) agent augmented with safety-critical action masking and counterfactual regret minimization, used to reduce Mean Time To Recovery (MTTR).
This paper also introduces Diff2Vec, a differential schema embedding technique that captures structural drift in JSON/XML payloads using Siamese contrastive learning.
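
The abstract does not say how the safety-critical action masking is implemented. As a minimal sketch of the general idea only, and not the authors' method, a policy's action distribution can be restricted to the actions that a governance policy currently permits, so that a forbidden mitigation gets probability zero. The function name, the action names, the logits, and the mask below are all hypothetical and chosen purely for illustration.

```python
import math


def masked_action_probs(logits, allowed):
    """Softmax over policy logits, restricted to the allowed actions.

    logits  -- raw per-action scores, e.g. from a policy network
    allowed -- booleans, True where governance policy permits the action
    """
    if len(logits) != len(allowed):
        raise ValueError("logits and allowed must have the same length")
    if not any(allowed):
        raise ValueError("at least one action must be allowed")
    # Shift by the largest allowed logit for numerical stability.
    peak = max(score for score, ok in zip(logits, allowed) if ok)
    # Forbidden actions get weight 0, so they can never be sampled.
    weights = [
        math.exp(score - peak) if ok else 0.0
        for score, ok in zip(logits, allowed)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical mitigation actions (illustrative names, not from the paper).
ACTIONS = ["retry_message", "reroute_to_fallback", "replay_from_source", "no_op"]
logits = [1.2, 0.3, 2.5, -0.5]
# Assumption: policy currently forbids touching the source system.
allowed = [True, True, False, True]

probs = masked_action_probs(logits, allowed)
for name, p in zip(ACTIONS, probs):
    print(f"{name}: {p:.3f}")
```

In this sketch the highest-scoring action, `replay_from_source`, is masked out, so the probability mass is redistributed over the three permitted actions. Masking before normalization, rather than rejecting an action after it has been sampled, keeps the result a valid probability distribution over permitted actions.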

References
  • Gartner, "Integration Failures Cost Report," 2024.
  • J. Smith et al., "Runbooks at Scale," USENIX SREcon, 2019.
  • Datadog, "State of Observability," 2023.
  • M. Du et al., "DeepLog: Anomaly Detection in Logs," KDD 2019.
  • S. Taylor et al., "Prophet: Time Series Forecasting," Facebook Research, 2017.
  • W. Hamilton et al., "GraphSage: Inductive Learning on Graphs," NeurIPS 2017.
  • H. Mao et al., "AutoRL: Resource Management with RL," ICML 2021.
  • ChaosMesh, "Chaos Engineering Toolkit," CNCF 2022.
  • H. Guo et al., "LogBERT: Log Anomaly Detection," IEEE TKDE 2021.
  • A. Li et al., "DGL for Service Graphs," OSDI 2022.
  • W. Park et al., "RL for Scheduling," NeurIPS 2020.
  • M. Achiam et al., "Constrained Policy Optimization," ICML 2017.
  • J. Choi et al., "Lyapunov Barriers," CoRL 2021.
  • M. Alshiekh et al., "Safe RL via Shielding," AAAI 2018.
  • V. Mnih et al., "Asynchronous Methods for RL," Nature 2016.
  • R. Ying et al., "GNNExplainer," NeurIPS 2019.
  • D. Luo et al., "PGExplainer," NeurIPS 2020.
  • Q. Liu et al., "Meta-GNN," KDD 2021.
  • F. Zhou et al., "Graph MAML," ICLR 2020.
  • J. Schulman et al., "Proximal Policy Optimization Algorithms," arXiv:1707.06347, 2017.
  • H. Zhou et al., "Design of an integrated model with temporal graph attention and ... ," Nature Scientific Reports, 2025.
Index Terms
Computer Science
Information Sciences
Keywords

System Integration, Integration failure, AI Engineering, Machine learning, Integration Reliability, Predictive Maintenance, Deep Reinforcement Learning, Graph Neural Networks, Federated Learning, Schema Evolution
