disaggregation
Efficient Long-context Language Model Training by Core Attention Disaggregation
Zhuang, Yonghao, Chen, Junda, Pang, Bo, Gu, Yi, Zhu, Yibo, Jiang, Yimin, Stoica, Ion, Xing, Eric, Zhang, Hao
We present core attention disaggregation (CAD), a technique that improves long-context large language model training by decoupling the core attention computation, softmax(QK^T)V, from the rest of the model and executing it on a separate pool of devices. In existing systems, core attention is colocated with other layers; at long context lengths, its quadratic compute growth compared to the near-linear growth of other components causes load imbalance and stragglers across data and pipeline parallel groups. CAD is enabled by two observations. First, core attention is stateless: it has no trainable parameters and only minimal transient data, so balancing reduces to scheduling compute-bound tasks. Second, it is composable: modern attention kernels retain high efficiency when processing fused batches of token-level shards with arbitrary lengths. CAD partitions core attention into token-level tasks and dispatches them to dedicated attention servers, which dynamically rebatch tasks to equalize compute without sacrificing kernel efficiency. We implement CAD in a system called DistCA, which uses a ping-pong execution scheme to fully overlap communication with computation and in-place execution on attention servers to reduce memory use. On 512 H200 GPUs and context lengths up to 512k tokens, DistCA improves end-to-end training throughput by up to 1.35x, eliminates data and pipeline parallel stragglers, and achieves near-perfect compute and memory balance.
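The abstract's two observations (statelessness and composability) reduce balancing to scheduling compute-bound tasks across a server pool. A toy sketch of such a scheduler, using a quadratic cost model and a greedy least-loaded assignment (our illustration with hypothetical names, not DistCA's actual rebatching logic):

```python
def attention_cost(seq_len: int) -> int:
    """Approximate core-attention cost of one token-level shard: O(L^2)."""
    return seq_len * seq_len

def rebatch(shard_lens, num_servers):
    """Greedy longest-processing-time assignment: send each shard to the
    currently least-loaded attention server, so per-server compute is
    near-equal without splitting any shard."""
    loads = [0] * num_servers
    batches = [[] for _ in range(num_servers)]
    for length in sorted(shard_lens, reverse=True):
        i = loads.index(min(loads))  # least-loaded server so far
        batches[i].append(length)
        loads[i] += attention_cost(length)
    return batches, loads
```

Because modern attention kernels stay efficient on fused batches of arbitrary-length shards, each server can run its assigned batch as a single fused kernel call.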
Export Reviews, Discussions, Author Feedback and Meta-Reviews
If you google "fully adapted particle filters" you will find a lot more material. The authors have considered four different and all relevant application examples. The experimental section shows that the iFDM seems to work and that it can provide interesting results. The only comparison provided is against the FFBS-type algorithm, which we know will perform worse by construction. I know that it is a lot of work to implement other solutions to the problem, but doing so would probably provide an even better understanding of the model's performance, and it would be interesting to see how existing solutions to these problems fare. For example, for the multitarget tracking example, the simplest solution would probably be to use an extended Kalman filter together with nearest-neighbour data association. Since your targets are very well separated, I would expect this solution to perform quite well. It would be interesting to compare your performance against this simple standard solution. I have not worked with the cocktail party problem or the multiuser detection problem, but for the power disaggregation problem there are interesting solutions available; see for example the following NIPS paper (which is gaining some influence): Kolter, J. Z.; Batra, S.; and Ng, A. Y. Energy disaggregation via discriminative sparse coding. NIPS 2010.
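The baseline the reviewer suggests pairs a per-target extended Kalman filter with nearest-neighbour data association. The association step alone can be sketched as follows (our illustration; function name and greedy strategy are ours, and a real tracker would gate distances and handle missed detections):

```python
def nearest_neighbour_associate(predictions, measurements):
    """Greedily assign each predicted target position the closest unused
    measurement. Points are (x, y) tuples; returns a list of
    (target_index, measurement_index) pairs."""
    unused = set(range(len(measurements)))
    pairs = []
    for t, (px, py) in enumerate(predictions):
        if not unused:
            break  # more targets than measurements
        j = min(unused, key=lambda k: (measurements[k][0] - px) ** 2
                                      + (measurements[k][1] - py) ** 2)
        pairs.append((t, j))
        unused.discard(j)
    return pairs
```

With well-separated targets, as in the paper's experiments, this greedy assignment rarely differs from an optimal one.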
Nexus: Proactive Intra-GPU Disaggregation of Prefill and Decode in LLM Serving
Shi, Xiaoxiang, Cai, Colin, Du, Junjia, Jia, Zhihao
Monolithic serving with chunked prefill improves GPU utilization by batching prefill and decode together, but suffers from fine-grained phase interference. Engine-level prefill-decode (PD) disaggregation avoids interference but incurs higher hardware and coordination overhead. Prior intra-GPU disaggregation approaches multiplex prefill and decode within a single GPU, using SLO-based tuning guided by heuristics from offline profiling or reactive feedback loops. However, these methods respond reactively to performance issues rather than anticipating them, limiting adaptability under dynamic workloads. We ask: can we achieve proactive intra-GPU disaggregation that adapts effectively to dynamic workloads? The key challenge lies in managing the conflicting resource demands of prefill and decode under varying conditions. We first show that GPU resources exhibit diminishing returns: beyond a saturation point, more allocation yields minimal latency benefit. Second, we observe that memory bandwidth contention becomes a critical bottleneck. These insights motivate a design that dynamically partitions GPU resources across prefill and decode phases, while jointly considering compute capacity, memory footprint, and bandwidth contention. Evaluated on diverse LLMs and workloads, our system Nexus achieves up to 2.2x higher throughput, 20x lower time to first token (TTFT), and 2.5x lower time between tokens (TBT) than vLLM; outperforms SGLang by up to 2x; and matches or exceeds disaggregated vLLM.
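The "diminishing returns" observation suggests allocating each phase only up to its saturation point. A minimal sketch of finding that point from a measured latency curve (our illustration, not Nexus's actual partitioning policy; the name and the threshold rule are assumptions):

```python
def saturation_point(latency_by_alloc, eps=0.02):
    """latency_by_alloc: measured phase latencies at increasing resource
    allocations (e.g. growing SM shares). Returns the index of the first
    allocation step whose next increment improves latency by less than a
    relative fraction `eps`."""
    for i in range(len(latency_by_alloc) - 1):
        cur, nxt = latency_by_alloc[i], latency_by_alloc[i + 1]
        if (cur - nxt) / cur < eps:
            return i
    return len(latency_by_alloc) - 1
```

Resources freed beyond each phase's saturation point can then be handed to the other phase, subject to the memory-bandwidth contention the abstract highlights.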
Frontier: Simulating the Next Generation of LLM Inference Systems
Feng, Yicheng, Tan, Xin, Sew, Kin Hang, Jiang, Yimin, Zhu, Yibo, Xu, Hong
Large Language Model (LLM) inference is growing increasingly complex with the rise of Mixture-of-Experts (MoE) models and disaggregated architectures that decouple components like prefill/decode (PD) or attention/FFN (AF) for heterogeneous scaling. Existing simulators, architected for co-located, dense models, are unable to capture the intricate system dynamics of these emerging paradigms. We present Frontier, a high-fidelity simulator designed from the ground up for this new landscape. Frontier introduces a unified framework to model both co-located and disaggregated systems, providing native support for MoE inference with expert parallelism (EP). It enables the simulation of complex workflows like cross-cluster expert routing and advanced pipelining strategies for latency hiding. To ensure fidelity and usability, Frontier incorporates refined operator models for improved accuracy. Frontier empowers the community to design and optimize the future of LLM inference at scale.
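At the core of any such simulator sits a per-operator latency model. A minimal roofline-style version, in which an operator is either compute-bound or memory-bandwidth-bound (our illustration of the general idea; Frontier's refined operator models are more detailed than this):

```python
def op_latency(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline estimate: an operator's latency (seconds) is the larger of
    its compute time and its memory-traffic time on the target device."""
    return max(flops / peak_flops, bytes_moved / peak_bw)
```

Summing such estimates along a model's operator graph, with communication terms for EP and disaggregated links, yields the kind of end-to-end prediction a simulator validates against real runs.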
NILMFormer: Non-Intrusive Load Monitoring that Accounts for Non-Stationarity
Petralia, Adrien, Charpentier, Philippe, Kadhi, Youssef, Palpanas, Themis
Millions of smart meters have been deployed worldwide, collecting the total power consumed by individual households. Based on these data, electricity suppliers offer their clients energy monitoring solutions to provide feedback on the consumption of their individual appliances. Historically, such estimates have relied on statistical methods that use coarse-grained total monthly consumption and static customer data, such as appliance ownership. Non-Intrusive Load Monitoring (NILM) is the problem of disaggregating a household's collected total power consumption to retrieve the consumed power for individual appliances. Current state-of-the-art (SotA) solutions for NILM are based on deep learning (DL) and operate on subsequences of an entire household consumption reading. However, the non-stationary nature of real-world smart meter data leads to a drift in the data distribution within each segmented window, which significantly affects model performance. This paper introduces NILMFormer, a Transformer-based architecture that incorporates a new subsequence stationarization/de-stationarization scheme to mitigate the distribution drift and that uses a novel positional encoding that relies only on the subsequence's timestamp information. Experiments with 4 real-world datasets show that NILMFormer significantly outperforms the SotA approaches. Our solution has been deployed as the backbone algorithm for the consumption monitoring service of EDF (Électricité de France), delivering detailed insights to millions of customers about their individual appliances' power consumption. This paper appeared in KDD 2025.
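The stationarization idea can be illustrated with a simple per-window normalize/denormalize wrapper: remove each subsequence's mean and scale before the model sees it, then restore them on the output. This is only a sketch of the setting; NILMFormer's actual scheme learns how to reinject the removed statistics rather than merely restoring them:

```python
from statistics import fmean, pstdev

def stationarize(window, eps=1e-8):
    """Z-normalize one subsequence so the model sees a stationarized input;
    return the removed statistics for later reinjection."""
    mu = fmean(window)
    sigma = pstdev(window) + eps  # eps guards against constant windows
    return [(x - mu) / sigma for x in window], mu, sigma

def destationarize(pred, mu, sigma):
    """Map the model's output back to the original power scale."""
    return [y * sigma + mu for y in pred]
```

Because the drift statistics differ per window, computing them per subsequence (rather than globally) is what counters the distribution drift the abstract describes.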
Beyond the Buzz: A Pragmatic Take on Inference Disaggregation
Mitra, Tiyasa, Borkar, Ritika, Bhatia, Nidhi, Matas, Ramon, Raj, Shivam, Mudigere, Dheevatsa, Zhao, Ritchie, Golub, Maximilian, Dutta, Arpan, Madduri, Sailaja, Jani, Dharmesh, Pharris, Brian, Rouhani, Bita Darvish
As inference scales to multi-node deployments, disaggregation - splitting inference into distinct phases - offers a promising path to improving the throughput-interactivity Pareto frontier. Despite growing enthusiasm and a surge of open-source efforts, practical deployment of disaggregated serving remains limited due to the complexity of the optimization search space and system-level coordination. In this paper, we present the first systematic study of disaggregated inference at scale, evaluating hundreds of thousands of design points across diverse workloads and hardware configurations. We find that disaggregation is most effective for prefill-heavy traffic patterns and larger models. Our results highlight the critical role of dynamic rate matching and elastic scaling in achieving Pareto-optimal performance. Our findings offer actionable insights for efficient disaggregated deployments to navigate the trade-off between system throughput and interactivity.
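The rate-matching idea can be made concrete with a static back-of-envelope split: size the prefill and decode pools in proportion to their per-request service times so neither pool starves the other. This is our toy illustration of the concept only; the paper's point is that *dynamic* rate matching and elastic scaling are needed in practice:

```python
def pool_split(total_gpus, prefill_time_s, decode_time_s):
    """Return (prefill_gpus, decode_gpus) so that, at steady state, both
    pools complete requests at roughly the same rate."""
    frac = prefill_time_s / (prefill_time_s + decode_time_s)
    prefill = max(1, round(total_gpus * frac))
    return prefill, total_gpus - prefill
```

Under prefill-heavy traffic, `frac` grows and the split shifts GPUs toward prefill, which matches the abstract's finding that disaggregation pays off most for prefill-heavy patterns and larger models.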
Season-Independent PV Disaggregation Using Multi-Scale Net Load Temporal Feature Extraction and Weather Factor Fusion
Chen, Xiaolu, Huang, Chenghao, Zhang, Yanru, Wang, Hao
With the advancement of the energy Internet and energy system integration, the increasing adoption of distributed photovoltaic (PV) systems presents new challenges for smart monitoring and measurement at utility companies, particularly in separating PV generation from the net electricity load. This paper proposes a PV disaggregation method that integrates Hierarchical Interpolation (HI) and multi-head self-attention mechanisms. By using HI to extract net load features and multi-head self-attention to capture the complex dependencies between weather factors, the method achieves precise PV generation predictions. Simulation experiments on real-world data demonstrate the effectiveness of the proposed method, supporting improved monitoring and management of distributed energy systems.