Collaborating Authors: aif
AIF: Asynchronous Inference Framework for Cost-Effective Pre-Ranking

Kou, Zhi, Sheng, Xiang-Rong, Han, Shuguang, Zhao, Zhishan, Cheng, Yueyao, Zhu, Han, Xu, Jian, Zheng, Bo

arXiv.org Artificial Intelligence

In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation are triggered only after receiving candidates from the upstream retrieval stage. This design introduces inherent bottlenecks, including redundant computations of identical users/items and increased latency due to strictly sequential operations, which jointly constrain the model's capacity and system efficiency. To address these limitations, we propose the Asynchronous Inference Framework (AIF), a cost-effective computational architecture that decouples interaction-independent components, those operating within a single user or item, from real-time prediction. AIF reorganizes the model inference process by performing user-side computations in parallel with the retrieval stage and conducting item-side computations in a nearline manner. This means that interaction-independent components are calculated just once and completed before the real-time prediction phase of the pre-ranking stage. As a result, AIF enhances computational efficiency and reduces latency, freeing up resources to significantly improve the feature set and model architecture of interaction-independent components. Moreover, we delve into model design within the AIF framework, employing approximated methods for interaction-dependent components in online real-time predictions. By co-designing both the framework and the model, our solution achieves notable performance gains without significantly increasing computational and latency costs. This has enabled the successful deployment of AIF in the Taobao display advertising system.
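The asynchronous split described above can be sketched in miniature: interaction-independent user-side and item-side representations are computed once (ahead of, or alongside, retrieval), so the real-time pre-ranking call only runs a cheap interaction-dependent step. All names, shapes, and the dot-product scorer below are invented for illustration; they are not AIF's actual model components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed, interaction-independent representations:
# user-side vectors computed in parallel with the retrieval stage,
# item-side vectors refreshed in a nearline job.
user_cache = {u: rng.standard_normal(8) for u in ["u1", "u2"]}
item_cache = {i: rng.standard_normal(8) for i in ["i1", "i2", "i3"]}

def prerank(user_id, candidate_ids):
    """Real-time phase: only the interaction-dependent part runs here.
    A dot product stands in for the approximated interaction component."""
    u = user_cache[user_id]  # already computed before this call
    scores = {i: float(u @ item_cache[i]) for i in candidate_ids}
    return sorted(scores, key=scores.get, reverse=True)

ranking = prerank("u1", ["i1", "i2", "i3"])
```

Because the caches are filled before the request arrives, the per-request cost is independent of how expensive the user- and item-side towers are, which is the resource headroom the abstract refers to.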


Statistical post-processing yields accurate probabilistic forecasts from Artificial Intelligence weather models

Trotta, Belinda, Johnson, Robert, de Burgh-Day, Catherine, Hudson, Debra, Abellan, Esteban, Canvin, James, Kelly, Andrew, Mentiplay, Daniel, Owen, Benjamin, Whelan, Jennifer

arXiv.org Artificial Intelligence

Bureau of Meteorology, Australia

ABSTRACT: Artificial Intelligence (AI) weather models are now reaching operational-grade performance for some variables, but like traditional Numerical Weather Prediction (NWP) models, they exhibit systematic biases and reliability issues. We test the application of the Bureau of Meteorology's existing statistical post-processing system, IMPROVER, to ECMWF's deterministic Artificial Intelligence Forecasting System (AIFS), and compare results against post-processed outputs from the ECMWF HRES and ENS models. Without any modification to processing workflows, post-processing yields comparable accuracy improvements for AIFS as for traditional NWP forecasts, in both expected value and probabilistic outputs. We show that blending AIFS with NWP models improves overall forecast skill, even when AIFS alone is not the most accurate component. These findings show that statistical post-processing methods developed for NWP are directly applicable to AI models, enabling national meteorological centres to incorporate AI forecasts into existing workflows in a low-risk, incremental fashion.

Notice: This work has been accepted by Artificial Intelligence for the Earth Systems. The AMS does not guarantee that the copy provided here is an accurate copy of the Version of Record (VoR).
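The blending idea can be illustrated with a toy weighted average of probability forecasts from two sources, scored with the Brier score. The probabilities, weight, and outcomes below are all made up; IMPROVER's actual blending is considerably more sophisticated (e.g. weights tuned per variable and lead time).

```python
import numpy as np

# Hypothetical exceedance probabilities, e.g. P(rain), from two models.
p_aifs = np.array([0.2, 0.7, 0.9])   # AI model (invented values)
p_nwp = np.array([0.4, 0.6, 0.8])    # NWP ensemble (invented values)
w = 0.5                              # illustrative blend weight

p_blend = w * p_aifs + (1 - w) * p_nwp

def brier(p, obs):
    """Mean squared error of probability forecasts vs 0/1 outcomes."""
    return float(np.mean((p - obs) ** 2))

obs = np.array([0.0, 1.0, 1.0])      # hypothetical observed outcomes
```

With these toy numbers the blend scores better than the NWP forecast alone, which mirrors (in spirit only) the abstract's finding that blending can lift overall skill.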


AI Factories: It's time to rethink the Cloud-HPC divide

Lopez, Pedro Garcia, Pons, Daniel Barcelona, Copik, Marcin, Hoefler, Torsten, Quiñones, Eduardo, Malawski, Maciej, Pietzuch, Peter, Marti, Alberto, Timoudas, Thomas Ohlson, Slominski, Aleksander

arXiv.org Artificial Intelligence

The strategic importance of artificial intelligence is driving a global push toward Sovereign AI initiatives. National governments are increasingly developing dedicated infrastructures, called AI Factories (AIF), to achieve technological autonomy and secure the resources necessary to sustain robust local digital ecosystems. In Europe, the EuroHPC Joint Undertaking is investing hundreds of millions of euros into several AI Factories, built atop existing high-performance computing (HPC) supercomputers. However, while HPC systems excel in raw performance, they are not inherently designed for usability, accessibility, or serving as public-facing platforms for AI services such as inference or agentic applications. In contrast, AI practitioners are accustomed to cloud-native technologies like Kubernetes and object storage, tools that are often difficult to integrate within traditional HPC environments. This article advocates for a dual-stack approach within supercomputers: integrating both HPC and cloud-native technologies. Our goal is to bridge the divide between HPC and cloud computing by combining high performance and hardware acceleration with ease of use and service-oriented front-ends. This convergence allows each paradigm to amplify the other. To this end, we will study the cloud challenges of HPC (Serverless HPC) and the HPC challenges of cloud technologies (High-performance Cloud).




A few major issues we believe are key for the reviewers' evaluation of the main contribution of the paper

Neural Information Processing Systems

We thank the reviewers and AC for their thoughtful comments and thorough review. We will include detailed comparisons in the camera-ready version of the paper. Reviewer #1 urges us to describe our calculation of the Eqs.; we will describe this calculation in detail in the appendix. We agree with the reviewer's statement that the entropy of the average is not the same as the average of the entropies, and we will make this conceptual point explicit in the camera-ready version.


Active Inference for Energy Control and Planning in Smart Buildings and Communities

Nazemi, Seyyed Danial, Jafari, Mohsen A., Matta, Andrea

arXiv.org Artificial Intelligence

Active Inference (AIF) is emerging as a powerful framework for decision-making under uncertainty, yet its potential in engineering applications remains largely unexplored. In this work, we propose a novel dual-layer AIF architecture that addresses both building-level and community-level energy management. By leveraging the free energy principle, each layer adapts to evolving conditions and handles partial observability without extensive sensor information, while respecting data privacy. We validate the continuous AIF model against both a perfect optimization baseline and a reinforcement learning-based approach. We also test the community AIF framework under extreme pricing scenarios. The results highlight the model's robustness in handling abrupt changes. This study is the first to show how a distributed AIF framework performs in an engineering setting. It also highlights new opportunities for privacy-preserving and uncertainty-aware control strategies in engineering applications.
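A minimal flavour of the active-inference decision rule: pick the action whose predicted outcome distribution is closest, in KL divergence, to the preferred outcomes. This is only the "risk" term of expected free energy; a real AIF controller also scores ambiguity and updates beliefs under the free energy principle. All distributions and action names below are invented.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions (no zeros)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Toy building-control example: prefer "comfortable" over "uncomfortable".
preferred = [0.9, 0.1]
predicted = {                      # outcome distributions per action (made up)
    "heat_on": [0.8, 0.2],
    "heat_off": [0.3, 0.7],
}

def choose(predicted, preferred):
    """Select the action minimizing risk (KL to preferred outcomes)."""
    risk = {a: kl(p, preferred) for a, p in predicted.items()}
    return min(risk, key=risk.get)
```

Here `choose` selects "heat_on", since its predicted outcomes sit closest to the preferred distribution.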


Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems

Lapkovskis, Alfreds, Sedlak, Boris, Magnússon, Sindri, Dustdar, Schahram, Donta, Praveen Kumar

arXiv.org Artificial Intelligence

Ensuring Service Level Objectives (SLOs) in large-scale architectures, such as Distributed Computing Continuum Systems (DCCS), is challenging due to their heterogeneous nature and varying service requirements across different devices and applications. Additionally, unpredictable workloads and resource limitations lead to fluctuating performance and violated SLOs. To improve SLO compliance in DCCS, one possibility is to apply machine learning; however, the design choices are often left to the developer. To that end, we provide a benchmark of Active Inference -- an emerging method from neuroscience -- against three established reinforcement learning algorithms (Deep Q-Network, Advantage Actor-Critic, and Proximal Policy Optimization). We consider a realistic DCCS use case: an edge device running a video conferencing application alongside a WebSocket server streaming videos. Using one of the respective algorithms, we continuously monitor key performance metrics, such as latency and bandwidth usage, to dynamically adjust parameters -- including the number of streams, frame rate, and resolution -- to optimize service quality and user experience. To test the algorithms' adaptability to constant system changes, we simulate dynamically changing SLOs and both instant and gradual data-shift scenarios, such as network bandwidth limitations and fluctuating device thermal states. Although the evaluated algorithms all showed advantages and limitations, our findings demonstrate that Active Inference is a promising approach for ensuring SLO compliance in DCCS, offering lower memory usage, stable CPU utilization, and fast convergence.
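The monitor-and-adapt loop described above can be sketched with a hand-written rule standing in for the learned policies (Active Inference or RL). The metric names, SLO thresholds, and parameter steps are all invented for illustration; they are not the paper's benchmark configuration.

```python
# Hypothetical SLOs for the streaming workload.
SLO = {"max_latency_ms": 100, "max_bandwidth_mbps": 50}

def adapt(metrics, params):
    """Degrade stream quality when an SLO is violated; recover otherwise."""
    p = dict(params)
    violated = (metrics["latency_ms"] > SLO["max_latency_ms"]
                or metrics["bandwidth_mbps"] > SLO["max_bandwidth_mbps"])
    if violated:
        p["resolution"] = max(240, p["resolution"] // 2)   # step down
        p["fps"] = max(10, p["fps"] - 5)
    else:
        p["resolution"] = min(1080, p["resolution"] * 2)   # step back up
        p["fps"] = min(30, p["fps"] + 5)
    return p

params = {"resolution": 720, "fps": 30}
params = adapt({"latency_ms": 140, "bandwidth_mbps": 42}, params)
```

A learned controller replaces the if/else with a policy trained on the same metrics, which is what the benchmark compares across Active Inference, DQN, A2C, and PPO.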