 fcf


A Universal Load Balancing Principle and Its Application to Large Language Model Serving

arXiv.org Machine Learning

Load balancing (the allocation of work across parallel resources to reduce delay, energy and cost) is a pervasive challenge in science and engineering, from large-scale simulation and data processing to cloud and manufacturing operations. Motivated by the emerging bottleneck in large language model (LLM) serving, we study a particularly stringent regime of load balancing that arises in barrier-synchronized, stateful systems: work cannot be freely migrated and progress is gated by the slowest participant at each step, so heterogeneity and temporal drift in workloads create persistent stragglers and substantial idle time. LLM serving under data-parallel decoding provides a prominent modern instance: in production traces, barrier-induced idle can exceed 40% of compute time per decode step. Here we develop a universal load-balancing principle, which admits a step-wise finite-horizon integer-optimization formulation and yields worst-case guarantees: across LLM decode models and a broader class of non-decreasing workload drift processes, it reduces long-run imbalance by a factor that grows with batch size and system scale. Extensive experiments corroborate the theory, showing substantial improvements in throughput and latency together with reductions in energy consumption. These results provide a general, theoretically grounded framework for load balancing, with immediate implications for sustainable LLM serving and broad relevance to other synchronization-gated resource-allocation problems.
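To make the notion of barrier-induced idle concrete, the following is a minimal sketch, not the paper's formulation. It assumes each worker's per-step work time is known and that every worker waits for the slowest one before the next step begins; the function name `barrier_idle_fraction` and the sample loads are hypothetical.

```python
def barrier_idle_fraction(step_loads):
    """Fraction of compute time spent idle in one barrier-synchronized step.

    step_loads: per-worker work times for the step (assumed known here).
    The step lasts as long as the slowest worker, so every other worker
    idles for the difference.
    """
    slowest = max(step_loads)
    if slowest == 0:
        return 0.0  # nothing to do, so nothing is wasted
    total_capacity = slowest * len(step_loads)  # time all workers are held
    return 1.0 - sum(step_loads) / total_capacity


if __name__ == "__main__":
    # Hypothetical loads: one straggler at 10 units holds three lighter workers.
    print(barrier_idle_fraction([6.0, 10.0, 4.0, 4.0]))
```

With perfectly equal loads the fraction is zero, which is the state a balancing policy tries to approach.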


Frenet-Cartesian Model Representations for Automotive Obstacle Avoidance within Nonlinear MPC

arXiv.org Artificial Intelligence

In recent years, nonlinear model predictive control (NMPC) has been extensively used for solving automotive motion control and planning tasks. To formulate the NMPC problem, different coordinate systems can be used, each with different advantages. We propose and compare formulations of the NMPC-related optimization problem that involve both a Cartesian and a Frenet coordinate frame (CCF/FCF) in a single nonlinear program (NLP). We specify costs and collision avoidance constraints in the more advantageous coordinate frame, derive appropriate formulations and compare different obstacle constraints. With this approach, we exploit the simpler formulation of opponent vehicle constraints in the CCF, as well as road-aligned costs and constraints related to the FCF. Comparisons to other approaches in a simulation framework highlight the advantages of the proposed approaches.


Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System

arXiv.org Machine Learning

The increasing interest in user privacy is leading to new privacy-preserving machine learning paradigms. In the Federated Learning paradigm, a master machine learning model is distributed to user clients, and the clients use their locally stored data and model for both inference and calculating model updates. The model updates are sent back and aggregated on the server to update the master model, which is then redistributed to the clients. In this paradigm, the user data never leaves the client, greatly enhancing the user's privacy, in contrast to the traditional paradigm of collecting, storing and processing user data on a backend server beyond the user's control. In this paper we introduce, as far as we are aware, the first federated implementation of a Collaborative Filter. The federated updates to the model are based on a stochastic gradient approach. As a classical case study in machine learning, we explore a personalized recommendation system based on users' implicit feedback and demonstrate the method's applicability to both the MovieLens and an in-house dataset. Empirical validation confirms a collaborative filter can be federated without a loss of accuracy compared to a standard implementation, hence enhancing the user's privacy in a widely used recommender application while maintaining recommender performance.
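The distribute, update locally, aggregate cycle described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: it assumes a plain matrix-factorization model with squared error, that each client keeps its own user factor and feedback private, and that only item-factor gradients are sent to the server. The names `client_update` and `server_aggregate` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def client_update(item_factors, user_factor, interactions, lr=0.05):
    """Runs on the client. Raw interactions never leave this function.

    interactions: list of (item_index, feedback) pairs stored locally.
    Returns the updated private user factor and a dict of item-factor
    gradients, which is the only thing sent back to the server.
    """
    u = list(user_factor)
    grads = {}
    for item, feedback in interactions:
        v = item_factors[item]
        err = feedback - dot(u, v)
        # Gradient of 0.5 * err**2 with respect to the item factor.
        g = [-err * ui for ui in u]
        prev = grads.get(item, [0.0] * len(v))
        grads[item] = [p + gi for p, gi in zip(prev, g)]
        # Local stochastic gradient step on the private user factor.
        u = [ui + lr * err * vi for ui, vi in zip(u, v)]
    return u, grads


def server_aggregate(item_factors, client_grads, lr=0.05):
    """Runs on the server. Averages client gradients into the master model."""
    new_factors = [list(v) for v in item_factors]
    n = len(client_grads)
    for grads in client_grads:
        for item, g in grads.items():
            new_factors[item] = [x - lr * gi / n
                                 for x, gi in zip(new_factors[item], g)]
    return new_factors  # redistributed to clients for the next round
```

In this sketch one round consists of calling `client_update` on every client with the current item factors, then passing the collected gradient dicts to `server_aggregate`.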


Enforcing Liveness in Autonomous Traffic Management

AAAI Conferences

Looking ahead to the time when autonomous cars will be common, Dresner and Stone proposed a multiagent systems-based intersection control protocol called Autonomous Intersection Management (AIM). They showed that by leveraging the capacities of autonomous vehicles it is possible to dramatically reduce the time wasted in traffic, and therefore also fuel consumption and air pollution. The proposed protocol, however, handles reservation requests one at a time and does not prioritize reservations according to their relative priorities and waiting times, causing potentially large inequalities in granting reservations. For example, at an intersection between a main street and an alley, vehicles from the alley can take an excessively long time to get reservations to enter the intersection, causing a waste of time and fuel. The same is true in a network of intersections, in which gridlock may occur and cause traffic congestion. In this paper, we introduce the batch processing of reservations in AIM to enforce liveness properties in intersections and analyze the conditions under which no vehicle will get stuck in traffic. Our experimental results show that our prioritizing schemes outperform previous intersection control protocols in unbalanced traffic.