boltzmann
Entropy, concentration, and learning: a statistical mechanics primer
Artificial intelligence models trained through loss minimization have demonstrated significant success, grounded in principles from fields like information theory and statistical physics. This work explores these established connections through the lens of statistical mechanics, starting from first-principles sample concentration behaviors that underpin AI and machine learning. Our development of statistical mechanics for modeling highlights the key role of exponential families, and quantities of statistics, physics, and information theory.
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards
Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Siecker, Luc, Verbeeke, Nina, Gajane, Pratik
Decision-making problems of sequential nature, where decisions made in the past may have an impact on the future, are used to model many practically important applications. In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays. Motivated by such scenarios, we propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards. To formalize how feedback about a decision is partitioned across several time steps, we introduce $\beta$-spread property. We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem. Moreover, we provide an algorithm called TP-UCB-FR-G and prove an upper bound on its performance measure. In some scenarios, our upper bound improves upon the state of the art. We provide experimental results validating the proposed algorithm and our theoretical results.
Noise-injected analog Ising machines enable ultrafast statistical sampling and machine learning
Böhm, Fabian, Alonso-Urquijo, Diego, Verschaffelt, Guy, Van der Sande, Guy
Ising machines are a promising non-von-Neumann computational concept for neural network training and combinatorial optimization. However, while various neural networks can be implemented with Ising machines, their inability to perform fast statistical sampling makes them inefficient for training neural networks compared to digital computers. Here, we introduce a universal concept to achieve ultrafast statistical sampling with analog Ising machines by injecting noise. With an opto-electronic Ising machine, we experimentally demonstrate that this can be used for accurate sampling of Boltzmann distributions and for unsupervised training of neural networks, with equal accuracy as software-based training. Through simulations, we find that Ising machines can perform statistical sampling orders-of-magnitudes faster than software-based methods. This enables the use of Ising machines beyond combinatorial optimization and makes them into efficient tools for machine learning and other applications.
Shannon's Information Theory
I never read original papers of the greatest scientists, but I got so intrigued by the information theory that I gave Claude Shannon's seminal paper a read. In this single paper, Shannon introduced this new fundamental theory. He raised the right questions, which no one else even thought of asking. This would have been enough to make this contribution earthshaking. But amazingly enough, Shannon also provided most of the right answers with class and elegance. In comparison, it took decades for a dozen of top physicists to define the basics of quantum theory. Meanwhile, Shannon constructed something equivalent, all by himself, in a single paper. Shannon's theory has since transformed the world like no other ever had, from information technologies to telecommunications, from theoretical physics to economical globalization, from everyday life to philosophy. I don't think Shannon has had the credits he deserves.
LESS is More: Rethinking Probabilistic Models of Human Behavior
Bobu, Andreea, Scobee, Dexter R. R., Fisac, Jaime F., Sastry, S. Shankar, Dragan, Anca D.
Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots lie in econometrics, and in modeling decisions among different discrete options, each with its own utility or reward. In contrast, human trajectories lie in a continuous space, with continuous-valued features that influence the reward function. We propose that it is time to rethink the Boltzmann model, and design it from the ground up to operate over such trajectory spaces. We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards. Rather than each trajectory affecting the decision independently, similar trajectories now affect the decision together. We start by showing that our model better explains human behavior in a user study. We then analyze the implications this has for robot inference, first in toy environments where we have ground truth and find more accurate inference, and finally for a 7DOF robot arm learning from user demonstrations.
bRe6z2X85pj7n5g%3D%3D&utm_content=buffer2745c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Unlike task-specific algorithms, Deep Learning is a part of Machine Learning family based on learning data representations. With massive amounts of computational power, machines can now recognize objects and translate speech in real time, enabling a smart Artificial intelligence in systems. The concept of a software simulating the neocortex's large array of neurons in an artificial neural network is decades old, and it has led to as many disappointments as breakthroughs. But because of improvements in mathematical formulas and increasingly powerful computers, today researchers & data scientists can model many more layers of virtual neurons than ever before. Languishing through the 1970's, early neural networks could simulate only a very limited number of neurons at once, so they could not recognize patterns of great complexity.
Boltzmann Chains and Hidden Markov Models
Saul, Lawrence K., Jordan, Michael I.
Statistical models of discrete time series have a wide range of applications, most notably to problems in speech recognition (Juang & Rabiner, 1991) and molecular biology (Baldi, Chauvin, Hunkapiller, & McClure, 1992). A common problem in these fields is to find a probabilistic model, and a set of model parameters, that 436 Lawrence K. Saul, Michael I. Jordan
Boltzmann Chains and Hidden Markov Models
Saul, Lawrence K., Jordan, Michael I.
Statistical models of discrete time series have a wide range of applications, most notably to problems in speech recognition (Juang & Rabiner, 1991) and molecular biology (Baldi, Chauvin, Hunkapiller, & McClure, 1992). A common problem in these fields is to find a probabilistic model, and a set of model parameters, that 436 Lawrence K. Saul, Michael I. Jordan
Boltzmann Chains and Hidden Markov Models
Saul, Lawrence K., Jordan, Michael I.
Statistical models of discrete time series have a wide range of applications, most notably to problems in speech recognition (Juang & Rabiner, 1991) and molecular biology (Baldi, Chauvin, Hunkapiller, & McClure, 1992). A common problem in these fields is to find a probabilistic model, and a set of model parameters, that 436 LawrenceK. Saul, Michael I. Jordan account for sequences of observed data. Hidden Markov models (HMMs) have been particularly successful at modeling discrete time series. One reason for this is the powerful learning rule (Baum) 1972») a special case of the Expectation-Maximization (EM) procedure for maximum likelihood estimation (Dempster) Laird) & Rubin) 1977).
Discovering High Order Features with Mean Field Modules
Galland, Conrad C., Hinton, Geoffrey E.
A new form of the deterministic Boltzmann machine (DBM) learning procedure is presented which can efficiently train network modules to discriminate between input vectors according to some criterion. The new technique directly utilizes the free energy of these "mean field modules" to represent the probability that the criterion is met, the free energy being readily manipulated by the learning procedure. Although conventional deterministic Boltzmann learning fails to extract the higher order feature of shift at a network bottleneck, combining the new mean field modules with the mutual information objective function rapidly produces modules that perfectly extract this important higher order feature without direct external supervision. 1 INTRODUCTION The Boltzmann machine learning procedure (Hinton and Sejnowski, 1986) can be made much more efficient by using a mean field approximation in which stochastic binary units are replaced by deterministic real-valued units (Peterson and Anderson, 1987). Deterministic Boltzmann learning can be used for "multicompletion" tasks in which the subsets of the units that are treated as input or output are varied from trial to trial (Peterson and Hartman, 1988). In this respect it resembles other learning procedures that also involve settling to a stable state (Pineda, 1987). Using the multicompletion paradigm, it should be possible to force a network to explicitly extract important higher order features of an ensemble of training vectors by forcing the network to pass the information required for correct completions through a narrow bottleneck. In back-propagation networks with two or three hidden layers, the use of bottlenecks sometimes allows the learning to explictly discover important.