Goto

Collaborating Authors

 Education


Towards Pareto Descent Directions in Sampling Experts for Multiple Tasks in an On-Line Learning Paradigm

AAAI Conferences

In many real-life design problems, there is a requirement to simultaneously balance multiple tasks or objectives in the system that are conflicting in nature, where minimizing one objective causes another to increase in value, thereby resulting in trade-offs between the objectives. For example, in embedded multi-core mobile devices and very large scale data centers, there is a continuous problem of simultaneously balancing interfering goals of maximal power savings and minimal performance delay with varying trade-off values for different application workloads executing on them. Typically, the optimal trade-offs for the executing workloads, lie on a difficult to determine optimal Pareto front. The nature of the problem requires learning over the lifetime of the mobile device or server with continuous evaluation and prediction of the trade-off settings on the system that balances the interfering objectives optimally. Towards this, we propose an on-line learning method, where the weights of experts for addressing the objectives are updated based on a convex combination of their relative performance in addressing all objectives simultaneously. An additional importance vector that assigns relative importance to each objective at every round is used, and is sampled from a convex cone pointed at the origin Our preliminary results show that the convex combination of the importance vector and the gradient of the potential functions of the learner's regret with respect to each objective ensure that in the next round, the drift (instantaneous regret vector), is the Pareto descent direction that enables better convergence to the optimal Pareto front.


Multi-Engine Machine Translation as a Lifelong Machine Learning Problem

AAAI Conferences

We describe an approach for multi-engine machine translation that uses machine learning methods to train one or several classifiers for a given set of candidate translations. Contrary to existing approaches in quality estimation which only consider a single translation at a time, we explicitly model pairwise comparison with our feature vectors. We discuss several challenges our method is facing and discuss how lifelong machine learning could be applied to resolve these. We also show how the proposed architecture can be extended to allow human feedback to be included into the training process, improving the system's selection process over time.


A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions

arXiv.org Machine Learning

We propose an algorithmic framework for convex minimization problems of a composite function with two terms: a self-concordant function and a possibly nonsmooth regularization term. Our method is a new proximal Newton algorithm that features a local quadratic convergence rate. As a specific instance of our framework, we consider the sparse inverse covariance matrix estimation in graph learning problems. Via a careful dual formulation and a novel analytic step-size selection procedure, our approach for graph learning avoids Cholesky decompositions and matrix inversions in its iteration making it attractive for parallel and distributed implementations.


Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

arXiv.org Machine Learning

We study the problem of learning Markov decision processes with finite state and action spaces when the transition probability distributions and loss functions are chosen adversarially and are allowed to change with time. We introduce an algorithm whose regret with respect to any policy in a comparison class grows as the square root of the number of rounds of the game, provided the transition probabilities satisfy a uniform mixing condition. Our approach is efficient as long as the comparison class is polynomial and we can compute expectations over sample paths for each policy. Designing an efficient algorithm with small regret for the general case remains an open problem.


Variational Inference in Nonconjugate Models

arXiv.org Machine Learning

Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest---like the correlated topic model and Bayesian logistic regression---are nonconjuate. In these models, mean-field methods cannot be directly applied and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models, Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms with a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world datasets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.


Fairness in Academic Course Timetabling

arXiv.org Artificial Intelligence

We consider the problem of creating fair course timetables in the setting of a university. Our motivation is to improve the overall satisfaction of individuals concerned (students, teachers, etc.) by providing a fair timetable to them. The central idea is that undesirable arrangements in the course timetable, i.e., violations of soft constraints, should be distributed in a fair way among the individuals. We propose two formulations for the fair course timetabling problem that are based on max-min fairness and Jain's fairness index, respectively. Furthermore, we present and experimentally evaluate an optimization algorithm based on simulated annealing for solving max-min fair course timetabling problems. The new contribution is concerned with measuring the energy difference between two timetables, i.e., how much worse a timetable is compared to another timetable with respect to max-min fairness. We introduce three different energy difference measures and evaluate their impact on the overall algorithm performance. The second proposed problem formulation focuses on the tradeoff between fairness and the total amount of soft constraint violations. Our experimental evaluation shows that the known best solutions to the ITC2007 curriculum-based course timetabling instances are quite fair with respect to Jain's fairness index. However, the experiments also show that the fairness can be improved further for only a rather small increase in the total amount of soft constraint violations.


Monte-Carlo utility estimates for Bayesian reinforcement learning

arXiv.org Machine Learning

Bayesian reinforcement learning [1], [2] is the decisiontheoretic approach [3] to solving the reinforcement learning problem. Unfonrtunately, calculating posterior distributions can be computationally expensive. Morever, the Bayesoptimal decision can be intractable [4], [5], [1], and even calculating an optimal solution in a restricted class can be difficult [6]. This paper proposes a set of algorithms that take actions by estimating bounds on the Bayes-optimal utility through sampling. They include a direct Monte-Carlo approach, as well as gradient-based approaches. We demonstrate the effectiveness of the proposed algorithms experimentally. A. Setting In the reinforcement learning problem, an agent is acting in some unknown Markovian environment µ M, according to some policy π Π. The agent's policy is a procedure for selecting actions, with the action at time t being a


Clustering on Multi-Layer Graphs via Subspace Analysis on Grassmann Manifolds

arXiv.org Machine Learning

Relationships between entities in datasets are often of multiple nature, like geographical distance, social relationships, or common interests among people in a social network, for example. This information can naturally be modeled by a set of weighted and undirected graphs that form a global multilayer graph, where the common vertex set represents the entities and the edges on different layers capture the similarities of the entities in term of the different modalities. In this paper, we address the problem of analyzing multi-layer graphs and propose methods for clustering the vertices by efficiently merging the information provided by the multiple modalities. To this end, we propose to combine the characteristics of individual graph layers using tools from subspace analysis on a Grassmann manifold. The resulting combination can then be viewed as a low dimensional representation of the original data which preserves the most important information from diverse relationships between entities. We use this information in new clustering methods and test our algorithm on several synthetic and real world datasets where we demonstrate superior or competitive performances compared to baseline and state-of-the-art techniques. Our generic framework further extends to numerous analysis and learning problems that involve different types of information on graphs.


Large-Margin Metric Learning for Partitioning Problems

arXiv.org Machine Learning

In this paper, we consider unsupervised partitioning problems, such as clustering, image segmentation, video segmentation and other change-point detection problems. We focus on partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, which include mean-based change-point detection, K-means, spectral clustering and normalized cuts. Our main goal is to learn a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection. This is done in a supervised way by assuming the availability of several potentially partially labelled datasets that share the same metric. We cast the metric learning problem as a large-margin structured prediction problem, with proper definition of regularizers and losses, leading to a convex optimization problem which can be solved efficiently with iterative techniques. We provide experiments where we show how learning the metric may significantly improve the partitioning performance in synthetic examples, bioinformatics, video segmentation and image segmentation problems.


A Correlation Clustering Approach to Link Classification in Signed Networks -- Full Version --

arXiv.org Machine Learning

Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks.