Energy
Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP, can estimate the returns for finite state controllers, allows experience to be gathered from arbitrary sequences of policies, and estimates the return for any new policy. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its bias and variance. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pairwise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm showing an order of magnitude reduction in the number of trials required.
Knowledge Discovery System For Fiber Reinforced Polymer Matrix Composite Laminate
In this paper Knowledge Discovery System (KDS) is proposed and implemented for the extraction of knowledge-mean stiffness of a polymer composite material in which when fibers are placed at different orientations. Cosine amplitude method is implemented for retrieving compatible polymer matrix and reinforcement fiber which is coming under predicted fiber class, from the polymer and reinforcement database respectively, based on the design requirements. Fuzzy classification rules to classify fibers into short, medium and long fiber classes are derived based on the fiber length and the computed or derive critical length of fiber. Longitudinal and Transverse module of Polymer Matrix Composite consisting of seven layers with different fiber volume fractions and different fibers orientations at 0,15,30,45,60,75 and 90 degrees are analyzed through Rule-of Mixture material design model. The analysis results are represented in different graphical steps and have been measured with statistical parameters. This data mining application implemented here has focused the mechanical problems of material design and analysis. Therefore, this system is an expert decision support system for optimizing the materials performance for designing light-weight and strong, and cost effective polymer composite materials.
Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
Lopes, Manuel, Lang, Tobias, Toussaint, Marc, Oudeyer, Pierre-yves
Formal exploration approaches in model-based reinforcement learning estimate the accuracy of the currently learned model without consideration of the empirical prediction error. For example, PAC-MDP approaches such as Rmax base their model certainty on the amount of collected data, while Bayesian approaches assume a prior over the transition dynamics. We propose extensions to such approaches which drive exploration solely based on empirical estimates of the learner's accuracy and learning progress. We provide a ``sanity check'' theoretical analysis, discussing the behavior of our extensions in the standard stationary finite state-action case. We then provide experimental studies demonstrating the robustness of these exploration measures in cases of non-stationary environments or where original approaches are misled by wrong domain assumptions.
Variational Inference for Crowdsourcing
Liu, Qiang, Peng, Jian, Ihler, Alexander T.
Crowdsourcing has become a popular paradigm for labeling large datasets. However, ithas given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem bytransforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority votingand a recent algorithm by Karger et al. [1], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution onthe workers' reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions.
Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data
Glazer, Assaf, Lindenbaum, Michael, Markovitch, Shaul
We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods.
Analog readout for optical reservoir computers
Smerieri, Anteo, Duport, François, Paquot, Yvon, Schrauwen, Benjamin, Haelterman, Marc, Massar, Serge
Reservoir computing is a new, powerful and flexible machine learning technique that is easily implemented in hardware. Recently, by using a time-multiplexed architecture, hardware reservoir computers have reached performance comparable to digital implementations. Operating speeds allowing for real time information operation have been reached using optoelectronic systems. At present the main performance bottleneck is the readout layer which uses slow, digital postprocessing. We have designed an analog readout suitable for time-multiplexed optoelectronic reservoir computers, capable of working in real time. The readout has been built and tested experimentally on a standard benchmark task. Its performance is better than non-reservoir methods, with ample room for further improvement. The present work thereby overcomes one of the major limitations for the future development of hardware reservoir computers.
On Multilabel Classification and Ranking with Partial Feedback
Gentile, Claudio, Orabona, Francesco
We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show $O(T^{1/2}\log T)$ regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance.
Spectral learning of linear dynamics from generalised-linear observations with application to neural population data
Buesing, Lars, Macke, Jakob H., Sahani, Maneesh
Latent linear dynamical systems with generalised-linear observation models arise in a variety of applications, for example when modelling the spiking activity of populations of neurons. Here, we show how spectral learning methods for linear systems with Gaussian observations (usually called subspace identification in this context) can be extended to estimate the parameters of dynamical system models observed through non-Gaussian noise models. We use this approach to obtain estimates of parameters for a dynamical model of neural population data, where the observed spike-counts are Poisson-distributed with log-rates determined by the latent dynamical process, possibly driven by external inputs. We show that the extended system identification algorithm is consistent and accurately recovers the correct parameters on large simulated data sets with much smaller computational cost than approximate expectation-maximisation (EM) due to the non-iterative nature of subspace identification. Even on smaller data sets, it provides an effective initialization for EM, leading to more robust performance and faster convergence. These benefits are shown to extend to real neural data.
Learning optimal spike-based representations
Bourdoukan, Ralph, Barrett, David, Deneve, Sophie, Machens, Christian K.
How do neural networks learn to represent information? Here, we address this question by assuming that neural networks seek to generate an optimal population representation for a fixed linear decoder. We define a loss function for the quality of the population read-out and derive the dynamical equations for both neurons and synapses from the requirement to minimize this loss. The dynamical equations yield a network of integrate-and-fire neurons undergoing Hebbian plasticity. We show that, through learning, initially regular and highly correlated spike trains evolve towards Poisson-distributed and independent spike trains with much lower firing rates. The learning rule drives the network into an asynchronous, balanced regime where all inputs to the network are represented optimally for the given decoder. We show that the network dynamics and synaptic plasticity jointly balance the excitation and inhibition received by each unit as tightly as possible and, in doing so, minimize the prediction error between the inputs and the decoded outputs. In turn, spikes are only signalled whenever this prediction error exceeds a certain value, thereby implementing a predictive coding scheme. Our work suggests that several of the features reported in cortical networks, such as the high trial-to-trial variability, the balance between excitation and inhibition, and spike-timing dependent plasticity, are simply signatures of an efficient, spike-based code.
Risk-Aversion in Multi-armed Bandits
Sani, Amir, Lazaric, Alessandro, Munos, Rémi
In stochastic multi--armed bandits the objective is to solve the exploration--exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk--aversion where the objective is to compete against the arm with the best risk--return trade--off. This setting proves to be intrinsically more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, we investigate their theoretical guarantees, and we report preliminary empirical results.