Goto

Collaborating Authors

 Asia


Stochastic Dual Coordinate Ascent with Alternating Direction Multiplier Method

arXiv.org Machine Learning

We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems. Our method is based on Alternating Direction Multiplier Method (ADMM) to deal with complex regularization functions such as structured regularizations. Although the original ADMM is a batch method, the proposed method offers a stochastic update rule where each iteration requires only one or few sample observations. Moreover, our method can naturally afford mini-batch update and it gives speed up of convergence. We show that, under mild assumptions, our method converges exponentially. The numerical experiments show that our method actually performs efficiently.


Thompson Sampling for Complex Bandit Problems

arXiv.org Machine Learning

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of the basic arms' rewards, and the feedback observed may not necessarily be the reward per-arm. For instance, when the complex actions are subsets of the arms, we may only observe the maximum reward over the chosen subset. Thus, feedback across complex actions may be coupled due to the nature of the reward function. We prove a frequentist regret bound for Thompson sampling in a very general setting involving parameter, action and observation spaces and a likelihood function over them. The bound holds for discretely-supported priors over the parameter space and without additional structural properties such as closed-form posteriors, conjugate prior structure or independence across arms. The regret bound scales logarithmically with time but, more importantly, with an improved constant that non-trivially captures the coupling across complex actions due to the structure of the rewards. As applications, we derive improved regret bounds for classes of complex bandit problems involving selecting subsets of arms, including the first nontrivial regret bounds for nonlinear MAX reward feedback from subsets.


Multitask Diffusion Adaptation over Networks

arXiv.org Artificial Intelligence

Adaptive networks are suitable for decentralized inference tasks, e.g., to monitor complex natural phenomena. Recent research works have intensively studied distributed optimization problems in the case where the nodes have to estimate a single optimum parameter vector collaboratively. However, there are many important applications that are multitask-oriented in the sense that there are multiple optimum parameter vectors to be inferred simultaneously, in a collaborative manner, over the area covered by the network. In this paper, we employ diffusion strategies to develop distributed algorithms that address multitask problems by minimizing an appropriate mean-square error criterion with $\ell_2$-regularization. The stability and convergence of the algorithm in the mean and in the mean-square sense is analyzed. Simulations are conducted to verify the theoretical findings, and to illustrate how the distributed strategy can be used in several useful applications related to spectral sensing, target localization, and hyperspectral data unmixing.


Multivariate Generalized Gaussian Process Models

arXiv.org Machine Learning

We propose a family of multivariate Gaussian process models for correlated outputs, based on assuming that the likelihood function takes the generic form of the multivariate exponential family distribution (EFD). We denote this model as a multivariate generalized Gaussian process model, and derive Taylor and Laplace algorithms for approximate inference on the generic model. By instantiating the EFD with specific parameter functions, we obtain two novel GP models (and corresponding inference algorithms) for correlated outputs: 1) a Von-Mises GP for angle regression; and 2) a Dirichlet GP for regressing on the multinomial simplex.


A Global Model for Concept-to-Text Generation

Journal of Artificial Intelligence Research

Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection ("what to say") and surface realization ("how to say") in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We recast generation as the task of finding the best derivation tree for a set of database records and describe an algorithm for decoding in this framework that allows to intersect the grammar with additional information capturing fluency and syntactic well-formedness constraints. Experimental evaluation on several domains achieves results competitive with state-of-the-art systems that use domain specific constraints, explicit feature engineering or labeled data.


Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models

arXiv.org Machine Learning

The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation making it attractive for "big-data" scenarios involving a network of large distributed databases.


Algorithm Runtime Prediction: Methods & Evaluation

arXiv.org Artificial Intelligence

Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problem-specific instance features. Such models have important applications to algorithm analysis, portfolio-based algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of existing models, new families of models, and -- perhaps most importantly -- a much more thorough treatment of algorithm parameters as model inputs. We also comprehensively describe new and existing features for predicting algorithm runtime for propositional satisfiability (SAT), travelling salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to a wide range of runtime modelling techniques from the literature. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.


Mean Field Bayes Backpropagation: scalable training of multilayer neural networks with binary weights

arXiv.org Machine Learning

Significant success has been reported recently using deep neural networks for classification. Such large networks can be computationally intensive, even after training is over. Implementing these trained networks in hardware chips with a limited precision of synaptic weights may improve their speed and energy efficiency by several orders of magnitude, thus enabling their integration into small and low-power electronic devices. With this motivation, we develop a computationally efficient learning algorithm for multilayer neural networks with binary weights, assuming all the hidden neurons have a fan-out of one. This algorithm, derived within a Bayesian probabilistic online setting, is shown to work well for both synthetic and real-world problems, performing comparably to algorithms with real-valued weights, while retaining computational tractability.


An Online Mechanism for Multi-Unit Demand and its Application to Plug-in Hybrid Electric Vehicle Charging

Journal of Artificial Intelligence Research

We develop an online mechanism for the allocation of an expiring resource to a dynamic agent population. Each agent has a non-increasing marginal valuation function for the resource, and an upper limit on the number of units that can be allocated in any period. We propose two versions on a truthful allocation mechanism. Each modifies the decisions of a greedy online assignment algorithm by sometimes cancelling an allocation of resources. One version makes this modification immediately upon an allocation decision while a second waits until the point at which an agent departs the market. Adopting a prior-free framework, we show that the second approach has better worst-case allocative efficiency and is more scalable. On the other hand, the first approach (with immediate cancellation) may be easier in practice because it does not need to reclaim units previously allocated. We consider an application to recharging plug-in hybrid electric vehicles (PHEVs). Using data from a real-world trial of PHEVs in the UK, we demonstrate higher system performance than a fixed price system, performance comparable with a standard, but non-truthful scheduling heuristic, and the ability to support 50% more vehicles at the same fuel cost than a simple randomized policy.


Universalities of Reproducing Kernels Revisited

arXiv.org Machine Learning

Kernel methods have been widely applied to machine learning and other questions of approximating an unknown function from its finite sample data. To ensure arbitrary accuracy of such approximation, various denseness conditions are imposed on the selected kernel. This note contributes to the study of universal, characteristic, and $C_0$-universal kernels. We first give simple and direct description of the difference and relation among these three kinds of universalities of kernels. We then focus on translation-invariant and weighted polynomial kernels. A simple and shorter proof of the known characterization of characteristic translation-invariant kernels will be presented. The main purpose of the note is to give a delicate discussion on the universalities of weighted polynomial kernels.