disaggregation
SDP Relaxation with Randomized Rounding for Energy Disaggregation
Kiarash Shaloudegi, András György, Csaba Szepesvari, Wilsun Xu
We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find an approximate solution to the resulting quadratic integer program. Here we take a more principled approach, better suited to integer programming problems, and find an approximate optimum by combining convex semidefinite relaxations with randomized rounding, as well as a scalable ADMM method that exploits the special structure of the resulting semidefinite program. Simulation results on both synthetic and real-world datasets demonstrate the superiority of our method.
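The rounding idea in the abstract above can be illustrated with a minimal sketch. It assumes a relaxed (fractional) solution is already available as one probability vector per appliance; the semidefinite program and the ADMM solver are not reproduced here, and the abstract does not specify the paper's actual rounding scheme. The function name `round_relaxed`, the power levels, and the fractional values are all hypothetical, and the problem is reduced to a single time step.

```python
import random


def round_relaxed(y, levels, relaxed, n_samples=200, seed=0):
    """Generic randomized rounding for a one-time-step disaggregation problem.

    y        observed total power (assumed input)
    levels   levels[i][k] is the power drawn by appliance i in state k
    relaxed  relaxed[i][k] is a fractional weight for state k of appliance i,
             assumed to come from some convex relaxation (hypothetical here)
    """
    rng = random.Random(seed)
    best_states, best_err = None, float("inf")
    for _ in range(n_samples):
        # Sample one discrete state per appliance according to the fractional solution.
        states = [rng.choices(range(len(p)), weights=p)[0] for p in relaxed]
        total = sum(lv[s] for lv, s in zip(levels, states))
        err = (y - total) ** 2
        # Keep the integer assignment with the smallest squared residual.
        if err < best_err:
            best_states, best_err = states, err
    return best_states, best_err


# Two hypothetical appliances: a lamp (off/on) and a heater (off/low/high).
levels = [[0.0, 100.0], [0.0, 50.0, 1500.0]]
relaxed = [[0.3, 0.7], [0.1, 0.2, 0.7]]
states, err = round_relaxed(1600.0, levels, relaxed)
print(states, err)
```

Drawing many samples and keeping the best one is a common way to turn a fractional solution into a feasible integer assignment; the quality of the result depends on how informative the relaxed solution is.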
If you google ``fully adapted particle filters'' you will find a lot more material. The authors have considered four different and all relevant application examples. The experimental section shows that the iFDM seems to work and that it can provide interesting results. The only comparison provided is against the FFBS-type algorithm, which we know will perform worse due to its construction. I know that it is a lot of work to implement other solutions to the problem, but doing so would probably provide an even better understanding of the performance of the model, and it would be interesting to see the performance of existing solutions to these problems. For example, for the multitarget tracking example, the simplest solution to this problem would probably be to use an extended Kalman filter together with nearest neighbour data association. Since your targets are very well separated I would expect this solution to perform quite well. It would be interesting to compare your performance against this simple standard solution. I have not worked with the cocktail party problem and the multiuser detection problems, but for the power disaggregation problem there are interesting solutions available, see for example the following NIPS paper (which is gaining some influence): Kolter, J. Z.; Batra, S.; and Ng, A. Y. Energy disaggregation via discriminative sparse coding.
Nexus: Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving
Shi, Xiaoxiang, Cai, Colin, Du, Junjia, Jia, Zhihao
Monolithic serving with chunked prefill improves GPU utilization by batching prefill and decode together, but suffers from fine-grained phase interference. Engine-level prefill-decode (PD) disaggregation avoids interference but incurs higher hardware and coordination overhead. Prior intra-GPU disaggregation approaches multiplex prefill and decode within a single GPU, using SLO-based tuning guided by heuristics from offline profiling or reactive feedback loops. However, these methods respond reactively to performance issues rather than anticipating them, limiting adaptability under dynamic workloads. We ask: can we achieve proactive intra-GPU disaggregation that adapts effectively to dynamic workloads? The key challenge lies in managing the conflicting resource demands of prefill and decode under varying conditions. We first show that GPU resources exhibit diminishing returns -- beyond a saturation point, more allocation yields minimal latency benefit. Second, we observe that memory bandwidth contention becomes a critical bottleneck. These insights motivate a design that dynamically partitions GPU resources across prefill and decode phases, while jointly considering compute capacity, memory footprint, and bandwidth contention. Evaluated on diverse LLMs and workloads, our system Nexus achieves up to 2.2x higher throughput, 20x lower TTFT, and 2.5x lower TBT than vLLM; outperforms SGLang by up to 2x; and matches or exceeds disaggregated vLLM.
Frontier: Simulating the Next Generation of LLM Inference Systems
Feng, Yicheng, Tan, Xin, Sew, Kin Hang, Jiang, Yimin, Zhu, Yibo, Xu, Hong
Large Language Model (LLM) inference is growing increasingly complex with the rise of Mixture-of-Experts (MoE) models and disaggregated architectures that decouple components like prefill/decode (PD) or attention/FFN (AF) for heterogeneous scaling. Existing simulators, architected for co-located, dense models, are unable to capture the intricate system dynamics of these emerging paradigms. We present Frontier, a high-fidelity simulator designed from the ground up for this new landscape. Frontier introduces a unified framework to model both co-located and disaggregated systems, providing native support for MoE inference with expert parallelism (EP). It enables the simulation of complex workflows like cross-cluster expert routing and advanced pipelining strategies for latency hiding. To ensure fidelity and usability, Frontier incorporates refined operator models for improved accuracy. Frontier empowers the community to design and optimize the future of LLM inference at scale.
Beyond the Buzz: A Pragmatic Take on Inference Disaggregation
Mitra, Tiyasa, Borkar, Ritika, Bhatia, Nidhi, Matas, Ramon, Raj, Shivam, Mudigere, Dheevatsa, Zhao, Ritchie, Golub, Maximilian, Dutta, Arpan, Madduri, Sailaja, Jani, Dharmesh, Pharris, Brian, Rouhani, Bita Darvish
As inference scales to multi-node deployments, disaggregation - splitting inference into distinct phases - offers a promising path to improving the throughput-interactivity Pareto frontier. Despite growing enthusiasm and a surge of open-source efforts, practical deployment of disaggregated serving remains limited due to the complexity of the optimization search space and system-level coordination. In this paper, we present the first systematic study of disaggregated inference at scale, evaluating hundreds of thousands of design points across diverse workloads and hardware configurations. We find that disaggregation is most effective for prefill-heavy traffic patterns and larger models. Our results highlight the critical role of dynamic rate matching and elastic scaling in achieving Pareto-optimal performance. Our findings offer actionable insights for efficient disaggregated deployments to navigate the trade-off between system throughput and interactivity.
Season-Independent PV Disaggregation Using Multi-Scale Net Load Temporal Feature Extraction and Weather Factor Fusion
Chen, Xiaolu, Huang, Chenghao, Zhang, Yanru, Wang, Hao
With the advancement of energy Internet and energy system integration, the increasing adoption of distributed photovoltaic (PV) systems presents new challenges for smart monitoring and measurement by utility companies, particularly in separating PV generation from net electricity load. This paper proposes a PV disaggregation method that integrates Hierarchical Interpolation (HI) and multi-head self-attention mechanisms. By using HI to extract net load features and multi-head self-attention to capture the complex dependencies between weather factors, the method achieves precise PV generation predictions. Simulation experiments demonstrate the effectiveness of the proposed method on real-world data, supporting improved monitoring and management of distributed energy systems. With the increasing adoption of distributed solar photovoltaic (PV) systems, an expanding number of residential prosumers, who both produce and consume electricity, are generating electricity through their PV installations.
tempdisagg: A Python Framework for Temporal Disaggregation of Time Series Data
tempdisagg is a modern, extensible, and production-ready Python framework for temporal disaggregation of time series data. It transforms low-frequency aggregates into consistent, high-frequency estimates using a wide array of econometric techniques (including Chow-Lin, Denton, Litterman, Fernandez, and uniform interpolation) as well as enhanced variants with automated estimation of key parameters such as the autocorrelation coefficient rho. The package introduces features beyond classical methods, including robust ensemble modeling via non-negative least squares optimization, post-estimation correction of negative values under multiple aggregation rules, and optional regression-based imputation of missing values through a dedicated Retropolarizer module. Architecturally, it follows a modular design inspired by scikit-learn, offering a clean API for validation, modeling, visualization, and result interpretation.
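To make the idea of a "consistent" high-frequency estimate concrete, here is a minimal plain-Python sketch of the simplest technique named above, uniform disaggregation. It deliberately does not use the tempdisagg API, which is not shown in this abstract; the function names `disaggregate_uniform` and `aggregate` and the sample figures are hypothetical.

```python
def disaggregate_uniform(low_freq, n_sub, rule="sum"):
    """Spread each low-frequency value evenly over n_sub high-frequency periods.

    rule="sum":     the n_sub values add up to the low-frequency value.
    rule="average": the n_sub values average to the low-frequency value.
    """
    if n_sub <= 0:
        raise ValueError("n_sub must be positive")
    if rule not in ("sum", "average"):
        raise ValueError("rule must be 'sum' or 'average'")
    high_freq = []
    for value in low_freq:
        share = value / n_sub if rule == "sum" else value
        high_freq.extend([share] * n_sub)
    return high_freq


def aggregate(high_freq, n_sub, rule="sum"):
    """Collapse a high-frequency series back to low frequency (consistency check)."""
    blocks = [high_freq[i:i + n_sub] for i in range(0, len(high_freq), n_sub)]
    if rule == "sum":
        return [sum(block) for block in blocks]
    return [sum(block) / len(block) for block in blocks]


# Two hypothetical annual totals split into quarters.
annual = [120.0, 132.0]
quarterly = disaggregate_uniform(annual, 4)
print(quarterly)
print(aggregate(quarterly, 4))
```

Re-aggregating the quarterly series recovers the annual totals, which is the consistency property that regression-based methods such as Chow-Lin also enforce while additionally using a high-frequency indicator series.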
Signal Aggregate Constraints in Additive Factorial HMMs, with Application to Energy Disaggregation
Mingjun Zhong, Nigel Goddard, Charles Sutton
Blind source separation problems are difficult because they are inherently unidentifiable, yet the entire goal is to identify meaningful sources. We introduce a way of incorporating domain knowledge into this problem, called signal aggregate constraints (SACs). SACs encourage the total signal for each of the unknown sources to be close to a specified value. This is based on the observation that the total signal often varies widely across the unknown sources, and we often have a good idea of what total values to expect. We incorporate SACs into an additive factorial hidden Markov model (AFHMM) to formulate the energy disaggregation problems where only one mixture signal is assumed to be observed. A convex quadratic program for approximate inference is employed for recovering those source signals. On a real-world energy disaggregation data set, we show that the use of SACs dramatically improves the original AFHMM, and significantly improves over a recent state-of-the-art approach.
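As a rough illustration of what a signal aggregate constraint might look like, the sketch below scores a set of candidate source signals by how far each source's total is from its expected total. The quadratic form, the per-source weights, and the name `sac_penalty` are assumptions made for illustration; the abstract only states that SACs encourage each total to be close to a specified value.

```python
def sac_penalty(signals, expected_totals, weights=None):
    """Sum of weighted squared gaps between each source's total and its expected total.

    signals          signals[i][t] is the estimated signal of source i at time t
    expected_totals  expected_totals[i] is the prior guess for the total of source i
    weights          optional per-source penalty weights (default 1.0 each)
    """
    if len(signals) != len(expected_totals):
        raise ValueError("one expected total is needed per source")
    if weights is None:
        weights = [1.0] * len(signals)
    penalty = 0.0
    for series, target, weight in zip(signals, expected_totals, weights):
        gap = sum(series) - target
        penalty += weight * gap ** 2
    return penalty


# Two hypothetical appliances over three time steps.
signals = [[1.0, 2.0, 3.0], [0.0, 0.0, 10.0]]
print(sac_penalty(signals, expected_totals=[6.0, 8.0]))
```

A term of this kind would be added to the data-fit objective of the factorial model, so that solutions whose per-appliance totals drift far from the expected values are discouraged.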