Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this gap is that the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic, as small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: first, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learnt policies more resilient to sensor noise and adversarial attacks and thereby improving the suitability of deep reinforcement learning for real-world applications.
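The quantity at the heart of this approach, the maximal Lyapunov exponent (MLE), measures the exponential rate at which nearby state trajectories diverge under the closed-loop dynamics. As a minimal sketch of how it can be estimated empirically (a Benettin-style finite-difference estimate, not the paper's Dreamer V3 regulariser; `env_step` and `policy` are hypothetical stand-ins):

```python
import numpy as np

def estimate_mle(env_step, policy, s0, eps=1e-6, horizon=200):
    """Benettin-style finite-difference estimate of the maximal
    Lyapunov exponent of the closed-loop dynamics.

    env_step(s, a) -> next state and policy(s) -> action are
    hypothetical stand-ins for a deterministic environment and a
    learnt deterministic policy.
    """
    s_a = np.asarray(s0, dtype=float)
    # Displace the initial state by a random perturbation of norm eps.
    d0 = np.random.randn(*s_a.shape)
    s_b = s_a + eps * d0 / np.linalg.norm(d0)

    log_growth = 0.0
    for _ in range(horizon):
        s_a = env_step(s_a, policy(s_a))
        s_b = env_step(s_b, policy(s_b))
        dist = max(np.linalg.norm(s_b - s_a), 1e-30)
        log_growth += np.log(dist / eps)
        # Rescale the separation back to eps so divergence is always
        # measured in the linear regime.
        s_b = s_a + eps * (s_b - s_a) / dist
    return log_growth / horizon  # > 0 indicates chaotic dynamics
```

A positive estimate signals chaotic dynamics; a regulariser in this spirit would penalise the exponent during training so that small perturbations decay rather than grow.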
Common Concerns 2: Novelty
We thank all reviewers for their valuable comments; we address the concerns raised below. Reviewers noted that the idea of using imitation learning to make approximate decisions is not new, and asked for a wall-clock time cost comparison of the different methods. We will include this comparison in the final version.
Multi-Objective Recommendation via Multivariate Policy Learning
Jeunen, Olivier, Mandav, Jatin, Potapov, Ivan, Agarwal, Nakul, Vaid, Sourabh, Shi, Wenzhe, Ustimenko, Aleksei
Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform. We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance on designing stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.
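As a minimal sketch of the two ingredients named above, scalarisation of per-objective rewards and a pessimistic normal-approximation lower bound on an importance-weighted estimate of the North Star reward (all names are illustrative, and the paper's policy-dependent coverage correction is deliberately omitted):

```python
import numpy as np

def scalarise(rewards, weights):
    """Ranking score as a weighted average of per-objective rewards.

    rewards: (n_items, n_objectives); weights: (n_objectives,).
    """
    return rewards @ weights

def pessimistic_value(north_star, pi_new, pi_log, alpha=1.645):
    """Normal-approximation lower bound on an inverse-propensity
    estimate of the North Star reward (illustrative only; this is the
    bound the paper notes can under-cover without correction).

    north_star: rewards observed under the logging policy;
    pi_new / pi_log: densities of the new and logging policies at the
    logged actions, whose ratio gives the importance weights.
    """
    w = pi_new / pi_log
    est = w * north_star                  # per-sample IPS estimates
    mean = est.mean()
    stderr = est.std(ddof=1) / np.sqrt(len(est))
    return mean - alpha * stderr          # pessimistic lower bound
```

Maximising the lower bound rather than the point estimate biases learning toward weight policies whose estimated gains are statistically reliable.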
Improving Learnt Local MAPF Policies with Heuristic Search
Veerapaneni, Rishi, Wang, Qian, Ren, Kevin, Jakobsson, Arthur, Li, Jiaoyang, Likhachev, Maxim
Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents, but are usually centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing, as these could enable decentralized systems that scale well while maintaining good solution quality. Current ML approaches to MAPF have only begun to scratch the surface of this potential. However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full-horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies' success rates and scalability. To the best of our knowledge, this is the first time ML-based MAPF approaches have scaled to high-congestion scenarios (e.g. 20% agent density).
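One plausible, model-agnostic way to picture this combination (a greedy sketch, not the authors' exact methods): at each timestep, commit agents in order of policy confidence to their highest-probability action that does not conflict with already-committed agents. The grid moves and the `probs` interface below are assumptions.

```python
import numpy as np

# Moves on a 4-connected grid, plus "wait".
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]

def resolve_step(positions, probs):
    """One timestep of collision resolution guided by a learnt local
    policy. positions: list of (row, col) tuples per agent; probs:
    (n_agents, 5) array of action probabilities from the ML policy
    (a hypothetical interface). Returns the new positions.
    """
    new_positions = list(positions)
    committed = []                                # agents already assigned
    for agent in np.argsort(-probs.max(axis=1)):  # most confident first
        r, c = positions[agent]
        for a in np.argsort(-probs[agent]):       # best action first
            target = (r + MOVES[a][0], c + MOVES[a][1])
            # Reject vertex conflicts (same cell) and edge conflicts
            # (head-on swaps) against already-committed agents.
            vertex = any(new_positions[j] == target for j in committed)
            swap = any(new_positions[j] == (r, c) and positions[j] == target
                       for j in committed)
            if not vertex and not swap:
                new_positions[agent] = target
                committed.append(agent)
                break
        # If every action conflicts, the agent stays put; a full solver
        # would backtrack or replan here instead.
    return new_positions
```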
A Framework for Learning from Demonstration with Minimal Human Effort
Rigter, Marc, Lacerda, Bruno, Hawes, Nick
We consider robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control. In this setting we address reinforcement learning and learning from demonstration, where there is a cost associated with human time. This cost represents the human time required to teleoperate the robot, or to recover the robot from failures. For each episode, the agent must choose between requesting human teleoperation or using one of its autonomous controllers. In our approach, we learn to predict the success probability of each controller, given the initial state of an episode. This prediction is used in a contextual multi-armed bandit algorithm to choose the controller for the episode. A controller is learnt online from demonstrations and reinforcement learning, so that autonomous performance improves and the system becomes less reliant on the teleoperator with more experience. We show that our approach to controller selection reduces the human cost of performing two simulated tasks and a single real-world task.
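A minimal sketch of this selection rule, assuming hypothetical learnt success-probability predictors and illustrative costs (the UCB-style exploration bonus is one plausible choice, not necessarily the paper's bandit algorithm):

```python
import numpy as np

def choose_controller(state, predictors, counts, t,
                      teleop_cost=1.0, recovery_cost=2.0, beta=0.5):
    """Select who controls the next episode.

    predictors[k](state) -> estimated success probability of autonomous
    controller k (hypothetical learnt models); counts[k] counts its past
    selections; t is the episode index. Costs are in units of human time.
    """
    costs = []
    for k, predict in enumerate(predictors):
        p_success = predict(state)
        # Optimism bonus encourages trying rarely-used controllers.
        bonus = beta * np.sqrt(np.log(t + 1) / (counts[k] + 1))
        # Expected human time: failures must be recovered by the human.
        costs.append((1.0 - p_success) * recovery_cost - bonus)
    best = int(np.argmin(costs))
    # Request teleoperation only if every autonomous option looks worse.
    return ("human", None) if costs[best] > teleop_cost else ("auto", best)
```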
An Autonomous Performance Testing Framework using Self-Adaptive Fuzzy Reinforcement Learning
Moghadam, Mahshid Helali, Saadatmand, Mehrdad, Borg, Markus, Bohlin, Markus, Lisper, Björn
Test automation can reduce cost and human effort. If the optimal policy, i.e. the course of actions taken for the intended objective in a testing process, could be learnt by the testing system (e.g., a smart tester agent), then it could be reused in similar situations, leading to higher efficiency, i.e., less computational time. Automating stress testing to find performance breaking points remains a challenge for complex software systems. Common approaches are mainly based on source code or system model analysis, or on use-case-based techniques. However, source code or system models might not be available at testing time. In this paper, we propose a self-adaptive fuzzy reinforcement-learning-based performance (stress) testing framework (SaFReL) that enables the tester agent to learn the optimal policy for generating stress test cases that lead to the performance breaking point, without access to a performance model of the system under test. SaFReL learns the optimal policy through an initial learning phase, then reuses it during a transfer learning phase, while keeping the learning running in the long term. Through multiple experiments in a simulated environment, we demonstrate that our approach generates stress test cases for different programs efficiently and adaptively, without access to performance models.
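Very roughly, the learning loop can be pictured as a tabular Q-learning agent whose actions adjust resource conditions and whose reward grows as the measured response time approaches the breaking point. The `env` interface and the plain state table below are illustrative placeholders; SaFReL itself uses a fuzzy state representation rather than a lookup table.

```python
import random
from collections import defaultdict

def stress_test(env, actions, episodes=100, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning toward a performance breaking point.

    env.reset() -> state and env.step(a) -> (state, reward, done) are
    hypothetical: reward increases as the measured response time nears
    the performance requirement, and done flags the breaking point.
    """
    q = defaultdict(float)                    # (state, action) -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy trade-off between exploring and exploiting.
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s2, r, done = env.step(a)
            # One-step Q-learning update toward the bootstrapped target.
            best_next = max(q[(s2, act)] for act in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```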