AITopics | Alessandro Lazaric

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

Daniele Calandriello, Alessandro Lazaric, Michal Valko

Neural Information Processing Systems, May-28-2025, 00:18:14 GMT

Online kernel learning (OKL) is a flexible framework for prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only do first-order methods accumulate O(√T) more loss than the optimal function, but the curse of kernelization also results in an O(t) per-step complexity.
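
The O(t) per-step cost is easiest to see in a plain first-order baseline. The sketch below is kernelized online gradient descent with the squared loss and a Gaussian kernel. It is the kind of method the abstract criticizes, not the paper's second-order algorithm. The class name, step size, and bandwidth are illustrative assumptions.

```python
import math
from typing import List


def gaussian_kernel(x: List[float], y: List[float], bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class KernelOGD:
    """Hypothetical first-order online gradient descent in an RKHS (squared loss).

    The predictor is f_t(x) = sum_i alpha_i * k(x_i, x). Each update appends
    one support point, so the prediction at step t costs O(t) kernel evaluations.
    """

    def __init__(self, step_size: float = 0.5, bandwidth: float = 1.0) -> None:
        self.step_size = step_size
        self.bandwidth = bandwidth
        self.support: List[List[float]] = []
        self.alphas: List[float] = []
        self.kernel_evals = 0  # running count, to expose the per-step cost

    def predict(self, x: List[float]) -> float:
        total = 0.0
        for xi, ai in zip(self.support, self.alphas):
            total += ai * gaussian_kernel(xi, x, self.bandwidth)
            self.kernel_evals += 1
        return total

    def update(self, x: List[float], y: float) -> float:
        """Predict, suffer the loss 0.5 * (f(x) - y)^2, then take one gradient step."""
        y_hat = self.predict(x)
        # The functional gradient of the loss is (y_hat - y) * k(x, .), so the
        # step adds x to the dictionary with coefficient -step_size * (y_hat - y).
        self.support.append(list(x))
        self.alphas.append(-self.step_size * (y_hat - y))
        return 0.5 * (y_hat - y) ** 2
```

After T updates the sketch has performed 0 + 1 + ... + (T - 1) = T(T - 1)/2 kernel evaluations. This quadratic total is the "curse of kernelization" that embedding-based methods aim to avoid.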

artificial intelligence, machine learning, pro-n-kon

Neural Information Processing Systems

Country: Europe > France (0.28)

Genre: Instructional Material > Online (0.61)

Industry:

Education > Educational Setting (0.46)
Retail > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Fighting Boredom in Recommender Systems with Linear Reinforcement Learning

Romain Warlop, Alessandro Lazaric, Jérémie Mary

Neural Information Processing Systems, May-26-2025, 04:58:06 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Country:

North America > United States (0.29)
Europe > France (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)


Fighting Boredom in Recommender Systems with Linear Reinforcement Learning

Romain Warlop, Alessandro Lazaric, Jérémie Mary

Neural Information Processing Systems, May-23-2025, 21:18:55 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Country:

North America > United States (0.29)
Europe > France (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Regret Bounds for Learning State Representations in Reinforcement Learning

Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

Neural Information Processing Systems, Mar-26-2025, 08:42:49 GMT

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation.
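
The two objects in this setting can be pinned down concretely. A state representation is a map from histories to discrete states, and regret after T steps compares T times the optimal average reward ρ* with the reward actually collected. In the minimal sketch below, the two example representations and the history encoding are hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A history is the sequence of (observation, action, reward) triples seen so far.
History = Sequence[Tuple[int, int, float]]
# A state representation maps a history to a discrete state index.
Representation = Callable[[History], int]


def last_observation(history: History) -> int:
    """Hypothetical representation: the state is the most recent observation."""
    return history[-1][0] if history else 0


def last_two_observations(history: History) -> int:
    """Hypothetical representation: encode the last two binary observations."""
    prev = history[-2][0] if len(history) >= 2 else 0
    curr = history[-1][0] if history else 0
    return 2 * prev + curr


def cumulative_regret(optimal_avg_reward: float, rewards: List[float]) -> float:
    """Regret after T steps: T * rho_star minus the total reward collected."""
    return len(rewards) * optimal_avg_reward - sum(rewards)


# Hypothetical candidate set offered to the agent; the agent does not know
# which of them (at least one, by assumption) induces an MDP.
CANDIDATES: List[Representation] = [last_observation, last_two_observations]
```

The same history can map to different states under different candidates. An agent must therefore control regret while it is still unsure which representation is Markov.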

machine learning, markov model, reinforcement learning

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.39)


Limiting Extrapolation in Linear Approximate Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

Neural Information Processing Systems, Mar-26-2025, 05:30:20 GMT

We study linear approximate value iteration (LAVI) with a generative model. While linear models may accurately represent the optimal value function using a few parameters, several empirical and theoretical studies show the combination of least-squares projection with the Bellman operator may be expansive, thus leading LAVI to amplify errors over iterations and eventually diverge. We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of anchor states. Our algorithm tries to balance the generalization and compactness of linear methods with the small amplification of errors typical of interpolation methods. We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points. These findings are confirmed in preliminary simulations in a number of simple problems where a traditional least-squares LAVI method diverges.
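
The convex-combination condition is what prevents error amplification. Suppose φ(s) = Σ_k λ_k φ(s_k) with λ_k ≥ 0 and Σ_k λ_k = 1. A linear Q then satisfies Q(s) = Σ_k λ_k Q(s_k), and the error at s is a weighted average of the anchor errors. The minimal sketch below only illustrates this one-step property; it is not the paper's algorithm. The helper names, the anchor features, and θ are assumptions.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def is_convex_weights(weights: Sequence[float], tol: float = 1e-9) -> bool:
    """True if the weights are nonnegative and sum to one (up to tol)."""
    return all(w >= -tol for w in weights) and abs(sum(weights) - 1.0) <= tol


def combine_features(
    weights: Sequence[float], anchor_features: Sequence[Sequence[float]]
) -> List[float]:
    """phi(s) = sum_k weights[k] * phi(anchor_k)."""
    dim = len(anchor_features[0])
    return [sum(w * f[i] for w, f in zip(weights, anchor_features)) for i in range(dim)]


def interpolate(weights: Sequence[float], anchor_values: Sequence[float]) -> float:
    """Value at s as the same convex combination of the anchor values.

    For a linear Q(s) = theta . phi(s), this equals theta . combine_features(...).
    """
    if not is_convex_weights(weights):
        raise ValueError("weights must be nonnegative and sum to one")
    return sum(w * q for w, q in zip(weights, anchor_values))
```

Because the interpolated error equals Σ_k λ_k (Q̂(s_k) - Q(s_k)), its magnitude is at most max_k |Q̂(s_k) - Q(s_k)|. A least-squares projection carries no such guarantee, since its implicit weights may be negative or sum to more than one in absolute value.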

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: North America (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

Neural Information Processing Systems, Jan-27-2025, 13:41:50 GMT

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs). While it has been analysed in infinite-horizon discounted and finite-horizon problems, we focus on designing and analysing the exploration bonus in the more challenging infinite-horizon undiscounted setting.
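
As a generic illustration of the idea, the sketch below adds a count-based bonus to the empirical rewards. It then plans in the undiscounted (average-reward) setting with relative value iteration. The bonus form scale * sqrt(log t / N(s, a)) and the choice of planner are assumptions made for the example. They are not the bonus or algorithm analysed in the paper.

```python
import math
from typing import List, Sequence, Tuple


def exploration_bonus(count: int, t: int, scale: float = 1.0) -> float:
    """Assumed count-based bonus scale * sqrt(log t / N(s, a)), with N clipped at 1."""
    return scale * math.sqrt(math.log(max(t, 2)) / max(count, 1))


def optimistic_rewards(
    mean_rewards: Sequence[Sequence[float]],
    counts: Sequence[Sequence[int]],
    t: int,
    scale: float = 1.0,
) -> List[List[float]]:
    """Empirical mean reward plus bonus for every (state, action) pair."""
    return [
        [r + exploration_bonus(n, t, scale) for r, n in zip(rs, ns)]
        for rs, ns in zip(mean_rewards, counts)
    ]


def relative_value_iteration(
    rewards: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[Sequence[float]]],
    max_iters: int = 10_000,
    tol: float = 1e-9,
) -> Tuple[float, List[float]]:
    """Average-reward relative value iteration.

    rewards[s][a] is the reward, transitions[s][a][s2] the probability of s -> s2.
    Returns (gain, bias) with the bias pinned to 0 at state 0. Convergence is
    not guaranteed for periodic MDPs.
    """
    n = len(rewards)
    h = [0.0] * n
    gain = 0.0
    for _ in range(max_iters):
        # One Bellman backup: best action value under the current bias h.
        v = [
            max(
                rewards[s][a] + sum(p * h[s2] for s2, p in enumerate(transitions[s][a]))
                for a in range(len(rewards[s]))
            )
            for s in range(n)
        ]
        diff = [v[s] - h[s] for s in range(n)]
        gain = 0.5 * (max(diff) + min(diff))  # midpoint of the lower/upper gain bounds
        h = [v[s] - v[0] for s in range(n)]  # renormalize so that h[0] == 0
        if max(diff) - min(diff) < tol:
            break
    return gain, h
```

In a full loop, one would call `relative_value_iteration(optimistic_rewards(...), estimated_transitions)` and act greedily. Rarely tried actions receive larger bonuses, so the planner is drawn toward them until their counts grow.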

data mining, machine learning, reinforcement learning

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)


Regret Bounds for Learning State Representations in Reinforcement Learning

Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

Neural Information Processing Systems, Jan-25-2025, 23:58:54 GMT

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation.

machine learning, markov model, reinforcement learning

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.39)


Limiting Extrapolation in Linear Approximate Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

Neural Information Processing Systems, Jan-25-2025, 16:36:30 GMT

We study linear approximate value iteration (LAVI) with a generative model. While linear models may accurately represent the optimal value function using a few parameters, several empirical and theoretical studies show the combination of least-squares projection with the Bellman operator may be expansive, thus leading LAVI to amplify errors over iterations and eventually diverge. We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of anchor states. Our algorithm tries to balance the generalization and compactness of linear methods with the small amplification of errors typical of interpolation methods. We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points. These findings are confirmed in preliminary simulations in a number of simple problems where a traditional least-squares LAVI method diverges.

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: North America (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

Nicolas Carion, Nicolas Usunier, Gabriel Synnaeve, Alessandro Lazaric

Neural Information Processing Systems, Jan-23-2025, 02:27:20 GMT

Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model. We propose different combinations of inference procedures and scoring models able to represent coordination patterns of increasing complexity. The resulting assignment policy can be efficiently learned on small problem instances and readily reused in problems with more agents and tasks (i.e., zero-shot generalization).
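
The inference step can be illustrated in its simplest form. A scoring model rates each (agent, task) pair, and a centralized solver picks the one-to-one assignment with the highest total score. In the minimal sketch below, the two pairwise features (negative distance, as a stand-in for locality, and a constant) and the hand-set weights are hypothetical. In the paper the scoring model is learned, and its inference procedures and scoring models are richer. Brute-force search over permutations is used only for clarity. It is exponential in the number of agents, and a real system would use a proper assignment solver. `math.dist` requires Python 3.8 or later.

```python
import itertools
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def pair_score(weights: Sequence[float], agent: Point, task: Point) -> float:
    """Hypothetical linear scoring model on two hand-made pairwise features."""
    features = (-math.dist(agent, task), 1.0)  # nearer pairs score higher if weights[0] > 0
    return sum(w * f for w, f in zip(weights, features))


def assign(
    agents: Sequence[Point], tasks: Sequence[Point], weights: Sequence[float]
) -> Tuple[int, ...]:
    """Exact inference by brute force: give each agent a distinct task so that the
    total pairwise score is maximal. Returns the task index chosen for each agent."""
    if len(agents) > len(tasks):
        raise ValueError("this sketch needs at least as many tasks as agents")
    best: Optional[Tuple[int, ...]] = None
    best_value = -math.inf
    for perm in itertools.permutations(range(len(tasks)), len(agents)):
        value = sum(pair_score(weights, agents[i], tasks[j]) for i, j in enumerate(perm))
        if value > best_value:
            best, best_value = perm, value
    assert best is not None
    return best
```

The scoring function depends only on pairwise features, not on the number of agents or tasks. The same weights can therefore be reused on larger instances, which mirrors the zero-shot transfer the abstract describes.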

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.51)


Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

Neural Information Processing Systems, Oct-6-2024, 10:56:40 GMT

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it.
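
Once a finite set of candidate models is available, a bandit can exploit it by discarding models that contradict the observed rewards. Here each model is a vector of arm means. The sketch below assumes the candidate models are already given; the method-of-moments step that estimates them from prior tasks is not shown. It plays optimistically among the models that remain consistent with the data. This is a simplified illustration, not the paper's algorithm. The Hoeffding-style confidence radius and all function names are assumptions.

```python
import math
from typing import Callable, List, Sequence


def is_consistent(
    model: Sequence[float], counts: Sequence[int], sums: Sequence[float], log_term: float
) -> bool:
    """A model survives if every pulled arm's empirical mean is within its confidence radius."""
    for a, (n, s) in enumerate(zip(counts, sums)):
        if n == 0:
            continue
        if abs(s / n - model[a]) > math.sqrt(log_term / (2 * n)):
            return False
    return True


def model_based_bandit(
    models: Sequence[Sequence[float]],
    pull: Callable[[int], float],
    horizon: int,
    delta: float = 0.05,
) -> List[int]:
    n_arms = len(models[0])
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    log_term = math.log(len(models) * horizon / delta)  # assumed union-bound log factor
    played: List[int] = []
    for _ in range(horizon):
        active = [m for m in models if is_consistent(m, counts, sums, log_term)]
        if not active:  # fall back if the true task lies outside the model set
            active = list(models)
        optimistic = max(active, key=max)  # model promising the highest best-arm mean
        arm = max(range(n_arms), key=lambda a: optimistic[a])
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        played.append(arm)
    return played
```

With two mirrored models, the sketch keeps pulling the arm favoured by the wrong model only until that model's prediction falls outside the confidence interval. After that it commits to the other model's best arm. This is the kind of saving that transfer from prior tasks is meant to provide.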

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Country: