Energy
Generalized Belief Propagation
Yedidia, Jonathan S., Freeman, William T., Weiss, Yair
Belief propagation (BP) was only supposed to work for treelike networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics.This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate freeenergy approximations, of which Bethe's approximation is the simplest.
Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
Caruana, Rich, Lawrence, Steve, Giles, C. Lee
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity.
On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems
Our al exploits the special structure of the sum of squared error measure in Equation (1); hence, the other objective functions are outside the scope of this paper. The gradient vector and Hessian matrix are given by g g(9) JT rand H H(9) JT J S, where J is the m x n Jacobian matrix of r, and S denotes the matrix of second-derivative terms. If S is simply omitted based on the "small residual" assumption, then the Hessian matrix reduces to the Gauss-Newton model Hessian: i.e., JT J. Furthermore, a family of quasi-Newton methods can be applied to approximate term S alone, leading to the augmented Gauss-Newton model Hessian (see, for example, Mizutani [2] and references therein).
Generalized Belief Propagation
Yedidia, Jonathan S., Freeman, William T., Weiss, Yair
For general networks with loops, the situation is much less clear. On the one hand, a number of researchers have empirically demonstrated good performance for BP algorithms applied to networks with loops. One dramatic case is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to BP on a loopy network [2, 6]. For some problems in computer vision involving networks with loops, BP has also shown to be accurate and to converge very quickly [2, 1, 7]. On the other hand, for other networks with loops, BP may give poor results or fail to converge [7]. For a general graph, little has been understood about what approximation BP represents, and how it might be improved. This paper's goal is to provide that understanding and introduce a set of new algorithms resulting from that understanding. We show that BP is the first in a progression of local message-passing algorithms, each giving equivalent results to a corresponding approximation from statistical physics known as the "Kikuchi" approximation to the Gibbs free energy. These algorithms have the attractive property of being user-adjustable: by paying some additional computational cost, one can obtain considerable improvement in the accuracy of one's approximation, and can sometimes obtain a convergent message-passing algorithm when ordinary BP does not converge.
Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
Caruana, Rich, Lawrence, Steve, Giles, C. Lee
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity.
On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems
Our al exploits the special structure of the sum of squared error measure in Equation (1); hence, the other objective functions are outside the scope of this paper. The gradient vector and Hessian matrix are given by g g(9) JT rand H H(9) JT J S, where J is the m x n Jacobian matrix of r, and S denotes the matrix of second-derivative terms. If S is simply omitted based on the "small residual" assumption, then the Hessian matrix reduces to the Gauss-Newton model Hessian: i.e., JT J. Furthermore, a family of quasi-Newton methods can be applied to approximate term S alone, leading to the augmented Gauss-Newton model Hessian (see, for example, Mizutani [2] and references therein).
A New Direction in AI: Toward a Computational Theory of Perceptions
Fast-forward (FF) was the most successful automatic planner in the Fifth International Conference on Artificial Intelligence Planning and Scheduling (AIPS '00) planning systems competition. Like the well-known hsp system, FF relies on forward search in the state space, guided by a heuristic that estimates goal distances by ignoring delete lists. It differs from HSP in a number of important details. This article describes the algorithmic techniques used in FF in comparison to hsp and evaluates their benefits in terms of run-time and solution-length behavior. Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and any computations. Familiar examples are parking a car, driving in city traffic, playing golf, cooking a meal, and summarizing a story. In performing such tasks, humans use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects. Reflecting the bounded ability of the human brain to resolve detail, perceptions are intrinsically imprecise. In more concrete terms, perceptions are f-granular, meaning that (1) the boundaries of perceived classes are unsharp and (2) the values of attributes are granulated, with a granule being a clump of values (points, objects) drawn together by indistinguishability, similarity, proximity, and function. For example, the granules of age might be labeled very young, young, middle aged, old, very old, and so on. F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis based on predicate logic or probability theory. The computational theory of perceptions (CTP), which is outlined in this article, adds to the armamentarium of AI a capability to compute and reason with perception-based information. The point of departure in CTP is the assumption that perceptions are described by propositions drawn from a natural language; for example, it is unlikely that there will be a significant increase in the price of oil in the near future. In CTP, a proposition, p, is viewed as an answer to a question, and the meaning of p is represented as a generalized constraint. To compute with perceptions, their descriptors are translated into what is called the generalized constraint language (GCL). Then, goal-directed constraint propagation is utilized to answer a given query. A concept that plays a key role in CTP is that of precisiated natural language (PNL). The computational theory of perceptions suggests a new direction in AI -- a direction that might enhance the ability of AI to deal with realworld problems in which decision-relevant information is a mixture of measurements and perceptions. What is not widely recognized is that many important problems in AI fall into this category.
A Call for Knowledge-Based Planning
Wilkins, David E., desJardins, Marie
We are interested in solving real-world planning problems and, to that end, argue for the use of domain knowledge in planning. We believe that the field must develop methods capable of using rich knowledge models to make planning tools useful for complex problems. We discuss the suitability of current planning paradigms for solving these problems. In particular, we compare knowledge rich approaches such as hierarchical task network planning to minimal-knowledge methods such as STRIPS-based planners and disjunctive planners. We argue that the former methods have advantages such as scalability, expressiveness, continuous plan modification during execution, and the ability to interact with humans. However, these planners also have limitations, such as requiring complete domain models and failing to model uncertainty, that often make them inadequate for real-world problems. In this article, we define the terms knowledge-based and primitive-action planning and argue for the use of knowledge-based planning as a paradigm for solving real-world problems. We next summarize some of the characteristics of real-world problems that we are interested in addressing. Several current real-world planning applications are described, focusing on the ways in which knowledge is brought to bear on the planning problem. We describe some existing knowledge-based approaches and then discuss additional capabilities, beyond those available in existing systems, that are needed. Finally, we draw an analogy from the current focus of the planning community on disjunctive planners to the experiences of the machine learning community over the past decade.