AITopics

Country: North America > United States (0.34)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing SystemsFeb-9-2026, 13:56:14 GMT

Non-stationaryBanditswithKnapsacks

We employ both non-stationarity measures to derive upper andlowerbounds fortheproblem.

constraint, data mining, machine learning, (20 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.30)

Neural Information Processing SystemsDec-24-2025, 19:06:36 GMT

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Video-language models suffer from forgetting old/learned knowledge when trained with streaming data. In this work, we thus propose a continual video-language modeling (CVLM) setting, where models are supposed to be sequentially trained on five widely-used video-text datasets with different data distributions. Although most of existing continual learning methods have achieved great success by exploiting extra information (e.g., memory data of past tasks) or dynamically extended networks, they cause enormous resource consumption when transferred to our CVLM setting. To overcome the challenges (i.e., catastrophic forgetting and heavy resource consumption) in CVLM, we propose a novel cross-modal MoCo-based model with bidirectional momentum update (BMU), termed BMU-MoCo. Concretely, our BMU-MoCo has two core designs: (1) Different from the conventional MoCo, we apply the momentum update to not only momentum encoders but also encoders (i.e., bidirectional) at each training step, which enables the model to review the learned knowledge retained in the momentum encoders.

bidirectional momentum update, bmu-moco, continual video-language modeling, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

arXiv.org Artificial IntelligenceDec-3-2025

LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Zhang, Yuanhe, Wang, Weiliu, Zhou, Zhenhong, Wang, Kun, Zhang, Jie, Sun, Li, Liu, Yang, Su, Sen

Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in reasoning, planning, and tool usage. The recently proposed Model Context Protocol (MCP) has emerged as a unifying framework for integrating external tools into agent systems, enabling a thriving open ecosystem of community-built functionalities. However, the openness and composability that make MCP appealing also introduce a critical yet overlooked security assumption -- implicit trust in third-party tool providers. In this work, we identify and formalize a new class of attacks that exploit this trust boundary without violating explicit permissions. We term this new attack vector implicit toxicity, where malicious behaviors occur entirely within the allowed privilege scope. We propose LeechHijack, a Latent Embedded Exploit for Computation Hijacking, in which an adversarial MCP tool covertly expropriates the agent's computational resources for unauthorized workloads. LeechHijack operates through a two-stage mechanism: an implantation stage that embeds a benign-looking backdoor in a tool, and an exploitation stage where the backdoor activates upon predefined triggers to establish a command-and-control channel. Through this channel, the attacker injects additional tasks that the agent executes as if they were part of its normal workflow, effectively parasitizing the user's compute budget. We implement LeechHijack across four major LLM families. Experiments show that LeechHijack achieves an average success rate of 77.25%, with a resource overhead of 18.62% compared to the baseline. This study highlights the urgent need for computational provenance and resource attestation mechanisms to safeguard the emerging MCP ecosystem.

large language model, machine learning, natural language, (19 more...)

2512.02321

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsNov-21-2025, 15:36:36 GMT

Linear Contextual Bandits with Knapsacks

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the sum of these vectors doesn't exceed the budget in each dimension. The objective is once again to maximize the total reward. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual), bandits with knapsacks (BwK), and the online stochastic packing problem (OSPP).

knapsack, linear contextual bandit, name change, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)

Köbschall, Kirsten, Buschjäger, Sebastian, Fischer, Raphael, Hartung, Lisa, Kramer, Stefan

Lift What You Can: Green Online Learning with Heterogeneous Ensembles

arXiv.org Artificial IntelligenceOct-30-2025

Ensemble methods for stream mining necessitate managing multiple models and updating them as data distributions evolve. Considering the calls for more sustainability, established methods are however not sufficiently considerate of ensemble members' computational expenses and instead overly focus on predictive capabilities. To address these challenges and enable green online learning, we propose heterogeneous online ensembles (HEROS). For every training step, HEROS chooses a subset of models from a pool of models initialized with diverse hyperparameter choices under resource constraints to train. We introduce a Markov decision process to theoretically capture the trade-offs between predictive performance and sustainability constraints. Based on this framework, we present different policies for choosing which models to train on incoming data. Most notably, we propose the novel $ζ$-policy, which focuses on training near-optimal models at reduced costs. Using a stochastic model, we theoretically prove that our $ζ$-policy achieves near optimal performance while using fewer resources compared to the best performing policy. In our experiments across 11 benchmark datasets, we find empiric evidence that our $ζ$-policy is a strong contribution to the state-of-the-art, demonstrating highly accurate performance, in some cases even outperforming competitors, and simultaneously being much more resource-friendly.

artificial intelligence, data mining, machine learning, (15 more...)

2509.18962

Country:

North America > United States (0.68)
Europe > Germany (0.68)
South America > Brazil (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Setting > Online (1.00)
Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.54)
(2 more...)

Wei, Qinshuang, Srivastava, Vaibhav, Gupta, Vijay

Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences

arXiv.org Artificial IntelligenceOct-21-2025

While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.

artificial intelligence, assignment, task assignment, (15 more...)

2510.16221

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Chakraborty, Proggya, Asrar, Aaquib, Sengupta, Jayasree, Bit, Sipra Das

Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices

arXiv.org Artificial IntelligenceOct-13-2025

5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while maximizing Network Service Provider (NSP) profits. However, existing Deep Reinforcement Learning (DRL) frameworks focus primarily on profit optimization without explicitly accounting for service delay, potentially leading to QoS violations for latency-sensitive slices. Moreover, commonly used epsilon-greedy exploration of DRL often results in unstable convergence and suboptimal policy learning. To address these gaps, we propose DePSAC -- a Delay and Profit-aware Slice Admission Control scheme. Our DRL-based approach incorporates a delay-aware reward function, where penalties due to service delay incentivize the prioritization of latency-critical slices such as URLLC. Additionally, we employ Boltzmann exploration to achieve smoother and faster convergence. We implement and evaluate DePSAC on a simulated 5G core network substrate with realistic Network Slice Request (NSLR) arrival patterns. Experimental results demonstrate that our method outperforms the DSARA baseline in terms of overall profit, reduced URLLC slice delays, improved acceptance rates, and improved resource consumption. These findings validate the effectiveness of the proposed DePSAC in achieving better QoS-profit trade-offs for practical 5G network slicing scenarios.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2510.08769

Genre: Research Report > New Finding (0.34)

Industry: Telecommunications (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Alcorn, Benjamin, Hammad, Eman

Situational Awareness for Safe and Robust Multi-Agent Interactions Under Uncertainty

arXiv.org Artificial IntelligenceSep-30-2025

Multi-agent systems are prevalent in a wide range of domains including power systems, vehicular networks, and robotics. Two important problems to solve in these types of systems are how the intentions of non-coordinating agents can be determined to predict future behavior and how the agents can achieve their objectives under resource constraints without significantly sacrificing performance. To study this, we develop a model where an autonomous agent observes the environment within a safety radius of observation, determines the state of a surrounding agent of interest (within the observation radius), estimates future actions to be taken, and acts in an optimal way. In the absence of observations, agents are able to utilize an estimation algorithm to predict the future actions of other agents based on historical trajectory. The use of the proposed estimation algorithm introduces uncertainty, which is managed via risk analysis. The proposed approach in this study is validated using two different learning-based decision making frameworks: reinforcement learning and game theoretic algorithms.

agent, algorithm, artificial intelligence, (15 more...)