Project Description
PosEiDon aims to advance the understanding of how simulation and machine learning (ML) methodologies can be harnessed and amplified to improve the US Department of Energy’s (DOE) computational and data science. PosEiDon will provide an integrated platform that helps facility operators and scientists improve the end-to-end science workflow by (1) predicting the performance of complex workflows; (2) detecting and classifying infrastructure and workflow anomalies and “explaining” their sources; and (3) suggesting performance optimizations.
Partners: University of Southern California (Lead), Lawrence Berkeley National Laboratory, Argonne National Laboratory, RENCI
Funding: US Department of Energy
Project Publications
- (POSTER) Network Testbed for Experimenting With Decentralized Federated Learning
- Advancing Anomaly Detection in Computational Workflows with Active Learning
- Anomaly Detection in Scientific Workflows using End-to-End Execution Gantt Charts and Convolutional Neural Networks
- Elephants Sharing the Highway: Studying TCP Fairness in Large Transfers over High Throughput Links
- End-to-end online performance data capture and analysis for scientific workflows
- Experimenting TCP Performance with Fabric
- Flow-Bench: A Dataset for Computational Workflow Anomaly Detection
- Graph Neural Network for Anomalies Detection in Scientific Workflows
- Graph Neural Networks for Detecting Anomalies in Scientific Workflows
- Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning
- Workflow Anomaly Detection with Graph Neural Networks

