Maximum-Entropy Exploration with Future State-Action Visitation Measures
Bolland, Adrien, Lambrechts, Gaspard, Ernst, Damien
Maximum entropy reinforcement learning motivates agents to explore states and actions to maximize the entropy of some distribution, typically by providing additional intrinsic rewards proportional to that entropy function. In this paper, we study intrinsic rewards proportional to the entropy of the discounted distribution of state-action features visited during future time steps. This approach is motivated by two results. First, we show that the expected sum of these intrinsic rewards is a lower bound on the entropy of the discounted distribution of state-action features visited in trajectories starting from the initial states, which we relate to an alternative maximum entropy objective. Second, we show that the distribution used in the intrinsic reward definition is the fixed point of a contraction operator and can therefore be estimated off-policy. Experiments highlight that the new objective leads to improved visitation of features within individual trajectories, in exchange for slightly reduced visitation of features in expectation over different trajectories, as suggested by the lower bound. It also leads to improved convergence speed for learning exploration-only agents. Control performance remains similar across most methods on the considered benchmarks.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Portugal > Braga > Braga (0.04)
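The intrinsic-reward idea in the abstract above can be illustrated with a minimal sketch. Here a discrete feature index stands in for the paper's state-action features, and the empirical discounted visitation distribution over a single trajectory replaces the off-policy fixed-point estimate described in the paper; the function names and this simplification are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def discounted_visitation(features, gamma=0.99):
    """Empirical discounted distribution over discrete feature indices
    visited along a single trajectory (illustrative simplification)."""
    weights = np.zeros(max(features) + 1)
    for t, f in enumerate(features):
        weights[f] += gamma ** t
    return weights / weights.sum()

def entropy_bonus(features, gamma=0.99):
    """Intrinsic reward proportional to the entropy of the
    discounted visitation distribution over future features."""
    p = discounted_visitation(features, gamma)
    p = p[p > 0]  # entropy is computed over the support only
    return -np.sum(p * np.log(p))
```

A trajectory that visits many distinct features receives a larger bonus than one that revisits the same feature, which is the exploration pressure the paper's objective formalizes.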
Locally Linear Continual Learning for Time Series based on VC-Theoretical Generalization Bounds
Ferreira, Yan V. G., Lima, Igor B., Mapa, Pedro H. G. S., Campos, Felipe V., Braga, Antonio P.
Most machine learning methods assume fixed probability distributions, limiting their applicability in nonstationary real-world scenarios. While continual learning methods address this issue, current approaches often rely on black-box models or require extensive user intervention for interpretability. We propose SyMPLER (Systems Modeling through Piecewise Linear Evolving Regression), an explainable model for time series forecasting in nonstationary environments based on dynamic piecewise-linear approximations. Unlike other locally linear models, SyMPLER uses generalization bounds from Statistical Learning Theory to automatically determine when to add new local models based on prediction errors, eliminating the need for explicit clustering of the data. Experiments show that SyMPLER can achieve performance comparable to both black-box and existing explainable models while maintaining a human-interpretable structure that reveals insights about the system's behavior. In this sense, our approach reconciles accuracy and interpretability, offering a transparent and adaptive solution for forecasting nonstationary time series.
- Europe > Portugal > Braga > Braga (0.41)
- South America > Brazil > Minas Gerais > Belo Horizonte (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
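The "add a new local model when prediction error exceeds a bound" mechanism described in the SyMPLER abstract can be sketched minimally as follows. The fixed error threshold below is a stand-in assumption for the paper's VC-theoretical generalization bound, and the class and parameter names are hypothetical.

```python
class PiecewiseLinearForecaster:
    """Minimal sketch of an evolving piecewise-linear forecaster:
    a fixed error threshold replaces the paper's VC-based bound."""

    def __init__(self, threshold=1.0, lr=0.1):
        self.threshold = threshold
        self.lr = lr
        self.models = []  # list of (w, b) local linear models

    def predict(self, x):
        if not self.models:
            return 0.0
        # for brevity, always use the most recently added local model
        w, b = self.models[-1]
        return w * x + b

    def update(self, x, y):
        err = abs(y - self.predict(x))
        if not self.models or err > self.threshold:
            # error exceeds the bound: spawn a new local linear model
            self.models.append((0.0, y))
        else:
            # otherwise, take an online gradient step on the active model
            w, b = self.models[-1]
            e = y - (w * x + b)
            self.models[-1] = (w + self.lr * e * x, b + self.lr * e)
        return err
```

Because local models are only spawned when the error bound is violated, the resulting set of linear pieces stays small and each piece remains individually inspectable, which is the interpretability property the abstract emphasizes.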
Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments
We propose a Bayesian decision-making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. Most existing adaptive controllers for MDPs with unknown dynamics are based on the reinforcement learning framework and rely on large data sets acquired by sustained direct interaction with the system or via a simulator. This is not feasible in many applications due to ethical, economic, and physical constraints. The proposed framework addresses the data poverty issue by decomposing the problem into an offline planning stage that does not rely on sustained direct interaction with the system or simulator and an online execution stage. In the offline stage, parallel Gaussian process temporal difference (GPTD) learning techniques are employed for near-optimal Bayesian approximation of the expected discounted reward over a sample drawn from the prior distribution of unknown parameters. In the online stage, the action with the maximum expected return with respect to the posterior distribution of the parameters is selected. This is achieved by approximating the posterior distribution using a Markov Chain Monte Carlo (MCMC) algorithm, followed by constructing multiple Gaussian processes over the parameter space for efficient prediction of the means of the expected return at the MCMC samples. The effectiveness of the proposed framework is demonstrated using a simple dynamical system model with continuous state and action spaces, as well as a more complex model for a metastatic melanoma gene regulatory network observed through noisy synthetic gene expression data.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.81)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.59)
- North America > United States > New Jersey (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Africa > Mali (0.04)
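The offline/online decomposition in the abstract above can be illustrated with a toy sketch. The expected-return table is synthetic (in the paper it is built offline with GPTD learning), and the Gaussian likelihood reweighting below is a simple importance-sampling stand-in for the paper's MCMC posterior approximation; all names and numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline stage (sketch): draw parameter samples from the prior and
# assume expected returns per action have been precomputed for each.
prior_samples = rng.normal(0.0, 1.0, size=200)   # unknown parameter theta
actions = np.array([-1.0, 0.0, 1.0])
# hypothetical expected return: higher when the action is close to theta
returns = -np.abs(actions[None, :] - prior_samples[:, None])

def select_action(observation_mean, observation_std=0.5):
    """Online stage (sketch): reweight the prior samples by a Gaussian
    likelihood of the observed data and pick the action with the
    highest posterior-expected return."""
    log_w = -0.5 * ((prior_samples - observation_mean) / observation_std) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                      # normalized posterior weights
    posterior_returns = w @ returns   # posterior-expected return per action
    return actions[np.argmax(posterior_returns)]
```

The expensive return approximation happens entirely offline; online execution reduces to a reweighting and an argmax, which is what makes the approach viable when further interaction with the system is not possible.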
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Russia (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Europe > North Sea (0.04)
- Atlantic Ocean > North Atlantic Ocean > North Sea (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (0.67)
- North America > Canada > Ontario > Toronto (0.14)
- Africa > Mali (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.67)