Goto

Collaborating Authors

 routing







Market share maximizing strategies of CAV fleet operators may cause chaos in our cities

Jamróz, Grzegorz, Kucharski, Rafał, Watling, David

arXiv.org Artificial Intelligence

We study the dynamics and equilibria of a new kind of routing games, where players - drivers of future autonomous vehicles - may switch between individual (HDV) and collective (CAV) routing. In individual routing, just like today, drivers select routes minimizing expected travel costs, whereas in collective routing an operator centrally assigns vehicles to routes. The utility is then the average experienced travel time discounted with individually perceived attractiveness of automated driving. The market share maximising strategy amounts to offering utility greater than for individual routing to as many drivers as possible. Our theoretical contribution consists in developing a rigorous mathematical framework of individualized collective routing and studying algorithms which fleets of CAVs may use for their market-share optimization. We also define bi-level CAV - HDV equilibria and derive conditions which link the potential marketing behaviour of CAVs to the behavioural profile of the human population. Practically, we find that the fleet operator may often be able to equilibrate at full market share by simply mimicking the choices HDVs would make. In more realistic heterogenous human population settings, however, we discover that the market-share maximizing fleet controller should use highly variable mixed strategies as a means to attract or retain customers. The reason is that in mixed routing the powerful group player can control which vehicles are routed via congested and uncongested alternatives. The congestion pattern generated by CAVs is, however, not known to HDVs before departure and so HDVs cannot select faster routes and face huge uncertainty whichever alternative they choose. Consequently, mixed market-share maximising fleet strategies resulting in unpredictable day-to-day driving conditions may, alarmingly, become pervasive in our future cities.


A Multi-Agent, Policy-Gradient approach to Network Routing

Tao, Nigel, Baxter, Jonathan, Weaver, Lex

arXiv.org Artificial Intelligence

Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.


IslandRun: Privacy-Aware Multi-Objective Orchestration for Distributed AI Inference

Malepati, Bala Siva Sai Akhil

arXiv.org Artificial Intelligence

Modern AI inference faces an irreducible tension: no single computational resource simultaneously maximizes performance, preserves privacy, minimizes cost, and maintains trust. Existing orchestration frameworks optimize single dimensions (Kubernetes prioritizes latency, federated learning preserves privacy, edge computing reduces network distance), creating solutions that struggle under real-world heterogeneity. We present IslandRun, a multi-objective orchestration system that treats computational resources as autonomous "islands" spanning personal devices, private edge servers, and public cloud. Our key insights: (1) request-level heterogeneity demands policy-constrained multi-objective optimization, (2) data locality enables routing compute to data rather than data to compute, and (3) typed placeholder sanitization preserves context semantics across trust boundaries. IslandRun introduces agent-based routing, tiered island groups with differential trust, and reversible anonymization. This establishes a new paradigm for privacy-aware, decentralized inference orchestration across heterogeneous personal computing ecosystems.


PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration

Zhan, Junfei, Shen, Haoxun, Lin, Zheng, He, Tengjiao

arXiv.org Artificial Intelligence

Large Language Models (LLMs) demonstrate impressive capabilities in natural language understanding and generation, but incur high communication overhead and privacy risks in cloud deployments, while facing compute and memory constraints when confined to edge devices. Cloud-edge inference has emerged as a promising paradigm for improving privacy in LLM services by retaining sensitive computations on local devices. However, existing cloud-edge inference approaches apply uniform privacy protection without considering input sensitivity, resulting in unnecessary perturbation and degraded utility even for non-sensitive tokens. To address this limitation, we propose Privacy-aware Routing for Inference with Semantic Modulation (PRISM), a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module, also on the edge, selects an execution mode -cloud, edge, or collaboration; (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt, which is then refined by the edge-side small language model (SLM) using local context. Our results show that PRISM consistently achieves superior privacy-utility trade-offs in various scenarios, reducing energy consumption and latency to 40-50% of baseline methods such as Uniform and Selective LDP, while maintaining high output quality under strong privacy constraints. These findings are validated through comprehensive evaluations involving realistic prompts, actual energy measurements, and heterogeneous cloud-edge model deployments.


Subjective Depth and Timescale Transformers: Learning Where and When to Compute

Wieser, Frederico, Benfeghoul, Martin, Ammar, Haitham Bou, Wang, Jun, Fountas, Zafeirios

arXiv.org Artificial Intelligence

The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), two distinct architectures that leverage Bayesian surprise signals to dynamically route computation, learning where and when to compute within decoder-only TFs. SDT augments a decoder-only stack with alternating Decision and Dynamic layers: a Decision layer computes a full block 'posterior' and a lightweight 'prior,' while a Dynamic layer employs fixed-capacity Top-K routing based on Bayesian surprise (Expected and Unexpected Change), maintaining a static compute graph. STT extends this conditional computation to the temporal domain: a transition network predicts residual updates, forming a temporal 'change hypothesis' that informs a router to dynamically execute or bypass TF blocks for each token, managing KV-cache contributions. Both architectures exhibit the predicted shift from novelty to prediction driven gating over training, suggesting alignment with surprise based principles. While operating at reduced capacity, they offer preliminary insights into the compute-accuracy trade-offs of conditional computation. The proposed architectures establish a flexible framework for efficiency, reducing self-attention computation by 75% and KV-cache requirements by 50% within each compute skipping layer, setting a pathway for more efficient models.