2022 IEEE International Conference on Communications (ICC)
Most recent network failure diagnosis systems have focused on data center networks, where complex measurement systems can be deployed to derive routing information and ensure network coverage, enabling accurate and fast fault localization. In this paper, we target wide-area networks that support data-intensive distributed applications. We first present a new multi-output prediction model that directly maps application-level observations to failures of individual system components. In practice, this application-centric approach may face a missing-data challenge: some input (feature) data to the inference model may be unavailable because measurements in the wide-area network are incomplete or lost. We show that the presented prediction model naturally supports multivariate imputation to recover the missing data. We evaluate multiple imputation algorithms and show that imputation can significantly improve prediction performance in a large-scale network. To the best of our knowledge, this is the first study of the missing-data problem and of applying imputation techniques to network failure localization.
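The abstract does not name the imputation algorithms that are evaluated. As a minimal illustration of what multivariate imputation means in this setting, the following sketch fills each missing feature using the values that the k most similar observation rows have for that feature, where similarity is computed only over the features present in both rows. This is k-nearest-neighbour imputation, one common multivariate method, and is not necessarily one used in the paper. The row/column layout (rows as observation windows, columns as application-level measurements), the function names `knn_impute` and `_distance`, the column-mean fallback, and the default `k=2` are all illustrative assumptions.

```python
import math
from typing import List, Optional

# Hypothetical layout: each row is one observation window, each column one
# application-level measurement; None marks a lost or incomplete measurement.
Row = List[Optional[float]]


def _distance(a: Row, b: Row, skip: int) -> float:
    """Root-mean-square difference over features observed in both rows,
    ignoring the feature being imputed. Returns inf when there is no overlap."""
    diffs = [
        (x - y) ** 2
        for j, (x, y) in enumerate(zip(a, b))
        if j != skip and x is not None and y is not None
    ]
    if not diffs:
        return math.inf
    return math.sqrt(sum(diffs) / len(diffs))


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing entry with the mean of that feature over the k
    nearest rows in which the feature was observed (illustrative sketch)."""
    n_cols = len(data[0])
    # Column means serve as a fallback when no other row shares an observed feature.
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)

    completed = []
    for i, row in enumerate(data):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Candidate neighbours: other rows where feature j was measured.
            candidates = [
                (_distance(row, other, j), other[j])
                for idx, other in enumerate(data)
                if idx != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                new_row.append(sum(v for _, v in nearest) / len(nearest))
            else:
                new_row.append(col_means[j])
        completed.append(new_row)
    return completed


if __name__ == "__main__":
    measurements = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],
        [5.0, 6.0, 7.0],
        [1.2, 2.2, 3.2],
    ]
    # The missing entry becomes the mean of rows 0 and 3, about 3.1.
    print(knn_impute(measurements, k=2))
```

The completed matrix could then be passed to a multi-output failure-localization model in place of the incomplete input. Nearest-neighbour imputation is used here only because it is short and exploits correlations across features; regression-based iterative imputers are a common alternative.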