AITopics | routing

Collaborating Authors

routing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

Markovic-Voronov, Jelena, Behdin, Kayhan, Xu, Yuanda, Zhou, Zhengze, Wang, Zhipeng, Mazumder, Rahul

arXiv.org Machine LearningMar-31-2026

We study the problem of routing queries to large language models (LLMs) under cost, GPU resources, and concurrency constraints. Prior per-query routing methods often fail to control batch-level cost, especially under non-uniform or adversarial batching. To address this, we propose a batch-level, resource-aware routing framework that jointly optimizes model assignment for each batch while respecting cost and model capacity limits. We further introduce a robust variant that accounts for uncertainty in predicted LLM performance, along with an offline instance allocation procedure that balances quality and throughput across multiple models. Experiments on two multi-task LLM benchmarks show that robustness improves accuracy by 1-14% over non-robust counterparts (depending on the performance estimator), batch-level routing outperforms per-query methods by up to 24% under adversarial batching, and optimized instance allocation yields additional gains of up to 3% compared to a non-optimized allocation, all while strictly controlling cost and GPU resource constraints.

dataset1, large language model, machine learning, (20 more...)

arXiv.org Machine Learning

2603.26796

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

Unchosen Experts Can Contribute Too: Unleashing MoE Models ' Power by Self-Contrast

Neural Information Processing SystemsFeb-18-2026, 18:03:36 GMT

Mixture-of-Experts (MoE) has emerged as a prominent architecture for scaling model size while maintaining computational efficiency.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > France (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

665bb142d4b9f55660cb89bb56a66fe1-Paper-Conference.pdf

Neural Information Processing SystemsFeb-15-2026, 12:43:26 GMT

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

STAR-Caps: Capsule Networks with Straight-Through Attentive Routing

Karim Ahmed, Lorenzo Torresani

Neural Information Processing SystemsFeb-14-2026, 04:59:58 GMT

Neural Information Processing Systems http://nips.cc/

capsule, capsule network, mechanism, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Nevada (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

4c5e2bcbf21bdf40d75fddad0bd43dc9-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 19:17:24 GMT

arxiv, international conference, neural network, (11 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)

Genre: Research Report (0.93)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Market share maximizing strategies of CAV fleet operators may cause chaos in our cities

Jamróz, Grzegorz, Kucharski, Rafał, Watling, David

arXiv.org Artificial IntelligenceDec-4-2025

We study the dynamics and equilibria of a new kind of routing games, where players - drivers of future autonomous vehicles - may switch between individual (HDV) and collective (CAV) routing. In individual routing, just like today, drivers select routes minimizing expected travel costs, whereas in collective routing an operator centrally assigns vehicles to routes. The utility is then the average experienced travel time discounted with individually perceived attractiveness of automated driving. The market share maximising strategy amounts to offering utility greater than for individual routing to as many drivers as possible. Our theoretical contribution consists in developing a rigorous mathematical framework of individualized collective routing and studying algorithms which fleets of CAVs may use for their market-share optimization. We also define bi-level CAV - HDV equilibria and derive conditions which link the potential marketing behaviour of CAVs to the behavioural profile of the human population. Practically, we find that the fleet operator may often be able to equilibrate at full market share by simply mimicking the choices HDVs would make. In more realistic heterogenous human population settings, however, we discover that the market-share maximizing fleet controller should use highly variable mixed strategies as a means to attract or retain customers. The reason is that in mixed routing the powerful group player can control which vehicles are routed via congested and uncongested alternatives. The congestion pattern generated by CAVs is, however, not known to HDVs before departure and so HDVs cannot select faster routes and face huge uncertainty whichever alternative they choose. Consequently, mixed market-share maximising fleet strategies resulting in unpredictable day-to-day driving conditions may, alarmingly, become pervasive in our future cities.

artificial intelligence, machine learning, travel time, (19 more...)

arXiv.org Artificial Intelligence

2512.03524

Country: Europe (0.92)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.87)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)

Add feedback

A Multi-Agent, Policy-Gradient approach to Network Routing

Tao, Nigel, Baxter, Jonathan, Weaver, Lex

arXiv.org Artificial IntelligenceDec-4-2025

Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2512.03211

Country: North America > United States (0.68)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

IslandRun: Privacy-Aware Multi-Objective Orchestration for Distributed AI Inference

Malepati, Bala Siva Sai Akhil

arXiv.org Artificial IntelligenceDec-2-2025

Modern AI inference faces an irreducible tension: no single computational resource simultaneously maximizes performance, preserves privacy, minimizes cost, and maintains trust. Existing orchestration frameworks optimize single dimensions (Kubernetes prioritizes latency, federated learning preserves privacy, edge computing reduces network distance), creating solutions that struggle under real-world heterogeneity. We present IslandRun, a multi-objective orchestration system that treats computational resources as autonomous "islands" spanning personal devices, private edge servers, and public cloud. Our key insights: (1) request-level heterogeneity demands policy-constrained multi-objective optimization, (2) data locality enables routing compute to data rather than data to compute, and (3) typed placeholder sanitization preserves context semantics across trust boundaries. IslandRun introduces agent-based routing, tiered island groups with differential trust, and reversible anonymization. This establishes a new paradigm for privacy-aware, decentralized inference orchestration across heterogeneous personal computing ecosystems.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.00595

Country: Asia (0.28)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Add feedback

Subjective Depth and Timescale Transformers: Learning Where and When to Compute

Wieser, Frederico, Benfeghoul, Martin, Ammar, Haitham Bou, Wang, Jun, Fountas, Zafeirios

arXiv.org Artificial IntelligenceNov-27-2025

The rigid, uniform allocation of computation in standard Transformer (TF) architectures can limit their efficiency and scalability, particularly for large-scale models and long sequences. Addressing this, we introduce Subjective Depth Transformers (SDT) and Subjective Timescale Transformers (STT), two distinct architectures that leverage Bayesian surprise signals to dynamically route computation, learning where and when to compute within decoder-only TFs. SDT augments a decoder-only stack with alternating Decision and Dynamic layers: a Decision layer computes a full block 'posterior' and a lightweight 'prior,' while a Dynamic layer employs fixed-capacity Top-K routing based on Bayesian surprise (Expected and Unexpected Change), maintaining a static compute graph. STT extends this conditional computation to the temporal domain: a transition network predicts residual updates, forming a temporal 'change hypothesis' that informs a router to dynamically execute or bypass TF blocks for each token, managing KV-cache contributions. Both architectures exhibit the predicted shift from novelty to prediction driven gating over training, suggesting alignment with surprise based principles. While operating at reduced capacity, they offer preliminary insights into the compute-accuracy trade-offs of conditional computation. The proposed architectures establish a flexible framework for efficiency, reducing self-attention computation by 75% and KV-cache requirements by 50% within each compute skipping layer, setting a pathway for more efficient models.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.21408

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Filters

Collaborating Authors

routing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

Unchosen Experts Can Contribute Too: Unleashing MoE Models ' Power by Self-Contrast

665bb142d4b9f55660cb89bb56a66fe1-Paper-Conference.pdf

STAR-Caps: Capsule Networks with Straight-Through Attentive Routing

df4f371f1f89ec8ba5014b3310578048-Paper-Conference.pdf

4c5e2bcbf21bdf40d75fddad0bd43dc9-Paper-Conference.pdf

Market share maximizing strategies of CAV fleet operators may cause chaos in our cities

A Multi-Agent, Policy-Gradient approach to Network Routing

IslandRun: Privacy-Aware Multi-Objective Orchestration for Distributed AI Inference

Subjective Depth and Timescale Transformers: Learning Where and When to Compute