Communications Magazine, IEEE, 52, 5, 157-164, 2014, IEEE
It is essential for distributed, data-intensive applications to monitor the performance of the underlying network, storage, and computational resources. Increasingly, distributed applications need performance information from multiple aggregates, and tools need to make real-time steering decisions based on performance feedback. As scale and complexity grow, the volume and velocity of monitoring data increase, posing scalability challenges. In this work, we have developed a persistent query agent (PQA) that provides real-time application and network performance feedback to clients/applications, thereby enabling dynamic adaptations. The PQA enables federated performance monitoring by interacting with multiple aggregates and performance monitoring sources. Using a publish-subscribe framework, it sends triggers asynchronously to applications/clients when relevant performance events occur. The applications/clients register their events of interest using declarative queries and are notified by the PQA. The PQA leverages a complex event processing (CEP) framework for managing and executing the queries, which are expressed in a standard SQL-like query language. Instead of saving all monitoring data for future analysis, the PQA observes performance event streams in real time and runs continuous queries over streams of monitoring events. We present the design and architecture of the PQA and describe some relevant use cases.
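
The register-then-notify pattern described above can be illustrated with a minimal sketch. This is not the PQA implementation: it replaces the CEP engine and its SQL-like query language with a hand-written sliding-window average, and it invokes callbacks synchronously rather than asynchronously. All names here (`Event`, `ContinuousQuery`, `QueryAgent`, the metric `throughput_mbps`, the source `site-a`) are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Deque, List, Tuple


@dataclass
class Event:
    """One monitoring measurement (hypothetical schema)."""
    source: str
    metric: str
    value: float
    timestamp: float  # seconds


@dataclass
class ContinuousQuery:
    """Fires the callback when the windowed average of a metric drops below a threshold."""
    metric: str
    window_seconds: float
    threshold: float
    callback: Callable[[str, float], None]
    window: Deque[Event] = field(default_factory=deque)

    def on_event(self, event: Event) -> None:
        if event.metric != self.metric:
            return
        self.window.append(event)
        # Evict events that have fallen out of the time window; nothing else is stored.
        cutoff = event.timestamp - self.window_seconds
        while self.window and self.window[0].timestamp < cutoff:
            self.window.popleft()
        average = mean(e.value for e in self.window)
        if average < self.threshold:
            self.callback(self.metric, average)


class QueryAgent:
    """Keeps registered queries and evaluates each one against every incoming event."""

    def __init__(self) -> None:
        self.queries: List[ContinuousQuery] = []

    def register(self, query: ContinuousQuery) -> None:
        self.queries.append(query)

    def publish(self, event: Event) -> None:
        for query in self.queries:
            query.on_event(event)


# A client registers interest in low average throughput over a 10-second window.
notifications: List[Tuple[str, float]] = []
agent = QueryAgent()
agent.register(
    ContinuousQuery(
        metric="throughput_mbps",
        window_seconds=10.0,
        threshold=100.0,
        callback=lambda metric, avg: notifications.append((metric, avg)),
    )
)

# Events arrive as a stream; the query is re-evaluated on each arrival.
for timestamp, value in [(0.0, 150.0), (5.0, 120.0), (12.0, 40.0), (14.0, 60.0)]:
    agent.publish(Event("site-a", "throughput_mbps", value, timestamp))
```

In this sketch only the events inside the current window are retained, which mirrors the idea of evaluating queries over the stream instead of persisting all monitoring data. A real deployment would delegate windowing and query evaluation to a CEP engine and deliver notifications through a messaging layer.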