Genre
Early stopping and non-parametric regression: An optimal data-dependent stopping rule
Raskutti, Garvesh, Wainwright, Martin J., Yu, Bin
The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient-descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, and we prove upper bounds on the squared error of the resulting function estimate, measured in either the $L^2(P)$ and $L^2(P_n)$ norm. These upper bounds lead to minimax-optimal rates for various kernel classes, including Sobolev smoothness classes and other forms of reproducing kernel Hilbert spaces. We show through simulation that our stopping rule compares favorably to two other stopping rules, one based on hold-out data and the other based on Stein's unbiased risk estimate. We also establish a tight connection between our early stopping strategy and the solution path of a kernel ridge regression estimator.
Outlying Property Detection with Numerical Attributes
Angiulli, Fabrizio, Fassetti, Fabio, Palopoli, Luigi, Manco, Giuseppe
The outlying property detection problem is the problem of discovering the properties distinguishing a given object, known in advance to be an outlier in a database, from the other database objects. In this paper, we analyze the problem within a context where numerical attributes are taken into account, which represents a relevant case left open in the literature. We introduce a measure to quantify the degree the outlierness of an object, which is associated with the relative likelihood of the value, compared to the to the relative likelihood of other objects in the database. As a major contribution, we present an efficient algorithm to compute the outlierness relative to significant subsets of the data. The latter subsets are characterized in a "rule-based" fashion, and hence the basis for the underlying explanation of the outlierness.
Sharing Rewards in Cooperative Connectivity Games
Bachrach, Y., Porat, E., Rosenschein, J. S.
We consider how selfish agents are likely to share revenues derived from maintaining connectivity between important network servers. We model a network where a failure of one node may disrupt communication between other nodes as a cooperative game called the vertex Connectivity Game (CG). In this game, each agent owns a vertex, and controls all the edges going to and from that vertex. A coalition of agents wins if it fully connects a certain subset of vertices in the graph, called the primary vertices. Power indices measure an agent's ability to affect the outcome of the game. We show that in our domain, such indices can be used to both determine the fair share of the revenues an agent is entitled to, and identify significant possible points of failure affecting the reliability of communication in the network. We show that in general graphs, calculating the Shapley and Banzhaf power indices is #P-complete, but suggest a polynomial algorithm for calculating them in trees. We also investigate finding stable payoff divisions of the revenues in CGs, captured by the game theoretic solution of the core, and its relaxations, the epsilon-core and least core. We show a polynomial algorithm for computing the core of a CG, but show that testing whether an imputation is in the epsilon-core is coNP-complete. Finally, we show that for trees, it is possible to test for epsilon-core imputations in polynomial time.
The Arcade Learning Environment: An Evaluation Platform for General Agents
Bellemare, M. G., Naddaf, Y., Veness, J., Bowling, M.
In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.
Constrained fractional set programs and their application in local clustering and community detection
Bรผhler, Thomas, Rangapuram, Syama Sundar, Setzer, Simon, Hein, Matthias
The (constrained) minimization of a ratio of set functions is a problem frequently occurring in clustering and community detection. As these optimization problems are typically NP-hard, one uses convex or spectral relaxations in practice. While these relaxations can be solved globally optimally, they are often too loose and thus lead to results far away from the optimum. In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem. This result leads to a flexible framework for solving constrained problems in network analysis. While a globally optimal solution for the resulting non-convex problem cannot be guaranteed, we outperform the loose convex or spectral relaxations by a large margin on constrained local clustering problems.
Hyperparameter Optimization and Boosting for Classifying Facial Expressions: How good can a "Null" Model be?
Bergstra, James, Cox, David D.
One of the goals of the ICML workshop on representation and learning is to establish benchmark scores for a new data set of labeled facial expressions. This paper presents the performance of a "Null model" consisting of convolutions with random weights, PCA, pooling, normalization, and a linear readout. Our approach focused on hyperparameter optimization rather than novel model components. On the Facial Expression Recognition Challenge held by the Kaggle website, our hyperparameter optimization approach achieved a score of 60% accuracy on the test data. This paper also introduces a new ensemble construction variant that combines hyperparameter optimization with the construction of ensembles. This algorithm constructed an ensemble of four models that scored 65.5% accuracy. These scores rank 12th and 5th respectively among the 56 challenge participants. It is worth noting that our approach was developed prior to the release of the data set, and applied without modification; our strong competition performance suggests that the TPE hyperparameter optimization algorithm and domain expertise encoded in our Null model can generalize to new image classification data sets.
Classifying Single-Trial EEG during Motor Imagery with a Small Training Set
Before the operation of a motor imagery based brain-computer interface (BCI) adopting machine learning techniques, a cumbersome training procedure is unavoidable. The development of a practical BCI posed the challenge of classifying single-trial EEG with a small training set. In this letter, we addressed this problem by employing a series of signal processing and machine learning approaches to alleviate overfitting and obtained test accuracy similar to training accuracy on the datasets from BCI Competition III and our own experiments.
Sparse Recovery of Streaming Signals Using L1-Homotopy
Asif, M. Salman, Romberg, Justin
Most of the existing methods for sparse signal recovery assume a static system: the unknown signal is a finite-length vector for which a fixed set of linear measurements and a sparse representation basis are available and an L1-norm minimization program is solved for the reconstruction. However, the same representation and reconstruction framework is not readily applicable in a streaming system: the unknown signal changes over time, and it is measured and reconstructed sequentially over small time intervals. In this paper, we discuss two such streaming systems and a homotopy-based algorithm for quickly solving the associated L1-norm minimization programs: 1) Recovery of a smooth, time-varying signal for which, instead of using block transforms, we use lapped orthogonal transforms for sparse representation. 2) Recovery of a sparse, time-varying signal that follows a linear dynamic model. For both the systems, we iteratively process measurements over a sliding interval and estimate sparse coefficients by solving a weighted L1-norm minimization program. Instead of solving a new L1 program from scratch at every iteration, we use an available signal estimate as a starting point in a homotopy formulation. Starting with a warm-start vector, our homotopy algorithm updates the solution in a small number of computationally inexpensive steps as the system changes. The homotopy algorithm presented in this paper is highly versatile as it can update the solution for the L1 problem in a number of dynamical settings. We demonstrate with numerical experiments that our proposed streaming recovery framework outperforms the methods that represent and reconstruct a signal as independent, disjoint blocks, in terms of quality of reconstruction, and that our proposed homotopy-based updating scheme outperforms current state-of-the-art solvers in terms of the computation time and complexity.
Constructive Setting of the Density Ratio Estimation Problem and its Rigorous Solution
Vapnik, Vladimir, Braga, Igor, Izmailov, Rauf
We introduce a general constructive setting of the density ratio estimation problem as a solution of a (multidimensional) integral equation. In this equation, not only its right hand side is known approximately, but also the integral operator is defined approximately. We show that this ill-posed problem has a rigorous solution and obtain the solution in a closed form. The key element of this solution is the novel V-matrix, which captures the geometry of the observed samples. We compare our method with three well-known previously proposed ones. Our experimental results demonstrate the good potential of the new approach.
h-approximation: History-Based Approximation of Possible World Semantics as ASP
Eppe, Manfred, Bhatt, Mehul, Dylla, Frank
We propose an approximation of the Possible Worlds Semantics (PWS) for action planning. A corresponding planning system is implemented by a transformation of the action specification to an Answer-Set Program. A novelty is support for postdiction wrt. (a) the plan existence problem in our framework can be solved in NP, as compared to $\Sigma_2^P$ for non-approximated PWS of Baral(2000); and (b) the planner generates optimal plans wrt. a minimal number of actions in $\Delta_2^P$. We demo the planning system with standard problems, and illustrate its integration in a larger software framework for robot control in a smart home.