Optimization
A Convex Formulation of Modularity Maximization for Community Detection
Chan, Yun Kwan (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology)
Complex networks pervade in diverse areas ranging from the natural world to the engineered world and from traditional application domains to new and emerging domains, including web-based social networks. Of crucial importance to the understanding of many network phenomena, dynamics and functions is the study of network structural properties. One important type of network structure is known as community structure which refers to the existence of communities that are tightly knit local groups with relatively dense connections among their members. Community detection is the problem of detecting these communities automatically. In this paper, based on the modularity measure proposed previously for community detection, we first propose a reformulation of an optimization problem for the 2-partition problem. Based on this new formulation, we can extend it naturally for tackling the general k-partition problem directly without having to tackle multiple 2-partition subproblems like what other methods do. We then propose a convex relaxation scheme to give an iterative algorithm which solves a simple quadratic program in each iteration. We empirically compare our method with some related methods and find that our method is both scalable and competitive in performance via maintaining a good tradeoff between efficiency and quality.
Eliciting Additive Reward Functions for Markov Decision Processes
Regan, Kevin (University of Toronto) | Boutilier, Craig (University of Toronto)
Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decision-theoretically sound heuristics for eliciting rewards for factored MDPs whose reward functions exhibit additive independence. Since we can often find good policies without complete reward specification, we also develop new (exact and approximate) algorithms for robust optimization ofimprecise-reward MDPs with such additive reward. Our methods are evaluated in two domains: autonomic computing and assistive technology.
Predicting Globally-Coherent Temporal Structures from Texts Via Endpoint Inference and Graph Decomposition
Denis, Pascal (Alpage, INRIA and University of Paris Diderot) | Muller, Philippe (Alpage, INRIA and IRIT and University of Toulouse)
An elegant approach to learning temporal orderings from texts is to formulate this problem as a constraint optimization problem, which can be then given an exact solution using Integer Linear Programming. This works well for cases where the number of possible relations between temporal entities is restricted to the mere precedence relation [Bramsen et al., 2006; Chambers and Jurafsky, 2008], but becomes impractical when considering all possible interval relations. This paper proposes two innovations, inspired from work on temporal reasoning, that control this combinatorial blow-up, therefore rendering an exact ILP inference viable in the general case. First, we translate our network of constraints from temporal intervals to their endpoints, to handle a drastically smaller set of constraints, while preserving the same temporal information. Second, we show that additional efficiency is gained by enforcing coherence on particular subsets of the entire temporal graphs. We evaluate these innovations through various experiments on TimeBank 1.2, and compare our ILP formulations with various baselines and oracle systems.
A Fast Dual Projected Newton Method for L1-Regularized Least Squares
Gong, Pinghua (Tsinghua University) | Zhang, Changshui (Tsinghua University)
L1-regularized least squares, with the ability of discovering sparse representations, is quite prevalent in the field of machine learning, statistics and signal processing. In this paper, we propose a novel algorithm called Dual Projected Newton Method (DPNM) to solve the L1-regularized least squares problem. In DPNM, we first derive a new dual problem as a box constrained quadratic programming. Then, a projected Newton method is utilized to solve the dual problem, achieving a quadratic convergence rate . Moreover, we propose to utilize some practical techniques, thus it greatly reduces the computational cost and makes DPNM more efficient. Experimental results on six real-world data sets indicate that DPNM is very efficient for solving the L1-regularized least squares problem, by comparing it with state of the art methods.
Distance Metric Learning under Covariate Shift
Cao, Bin (Hong Kong University of Science and Technology) | Ni, Xiaochuan (Microsoft Research Asia) | Sun, Jian-Tao (Microsoft Research Asia) | Wang, Gang (Microsoft) | Yang, Qiang (Hong Kong University of Science and Technology)
Learning distance metrics is a fundamental problem in machine learning. Previous distance-metric learning research assumes that the training and test data are drawn from the same distribution, which may be violated in practical applications. When the distributions differ, a situation referred to as covariate shift, the metric learned from training data may not work well on the test data. In this case the metric is said to be inconsistent. In this paper, we address this problem by proposing a novel metric learning framework known as consistent distance metric learning (CDML), which solves the problem under covariate shift situations. We theoretically analyze the conditions when the metrics learned under covariate shift are consistent. Based on the analysis, a convex optimization problem is proposed to deal with the CDML problem. An importance sampling method is proposed for metric learning and two importance weighting strategies are proposed and compared in this work. Experiments are carried out on synthetic and real world datasets to show the effectiveness of the proposed method.
Budgeted Social Choice: From Consensus to Personalized Decision Making
Lu, Tyler (University of Toronto) | Boutilier, Craig (University of Toronto)
We develop a general framework for social choice problems in which a limited number of alternatives can be recommended to an agent population. In our budgeted social choice model, this limit is determined by a budget, capturing problems that arise naturally in a variety of contexts, and spanning the continuum from pure consensus decision making (i.e., standard social choice) to fully personalized recommendation. Our approach applies a form of segmentation to social choice problemsโ requiring the selection of diverse options tailored to different agent typesโand generalizes certain multi-winner election schemes. We show that standard rank aggregation methods perform poorly, and that optimization in our model is NP-complete; but we develop fast greedy algorithms with some theoretical guarantees. Experiments on real-world datasets demonstrate the effectiveness of our algorithms.
Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function
Richtรกrik, Peter, Takรกฤ, Martin
In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an $\epsilon$-accurate solution with probability at least $1-\rho$ in at most $O(\tfrac{n}{\epsilon} \log \tfrac{1}{\rho})$ iterations, where $n$ is the number of blocks. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper #2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by the factor of 4 and removing $\epsilon$ from the logarithmic term. More importantly, in contrast with the aforementioned work in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale $\ell_1$-regularized least squares and support vector machine problems with a billion variables.
Proximal Methods for Hierarchical Sparse Coding
Jenatton, Rodolphe, Mairal, Julien, Obozinski, Guillaume, Bach, Francis
Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and we propose in this paper efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional ones using the L1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally organize in a prespecified arborescent structure, leading to a better performance in reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
Phase Transitions and Backbones of the Asymmetric Traveling Salesman Problem
In recent years, there has been much interest in phase transitions of combinatorial problems. Phase transitions have been successfully used to analyze combinatorial optimization problems, characterize their typical-case features and locate the hardest problem instances. In this paper, we study phase transitions of the asymmetric Traveling Salesman Problem (ATSP), an NP-hard combinatorial optimization problem that has many real-world applications. Using random instances of up to 1,500 cities in which intercity distances are uniformly distributed, we empirically show that many properties of the problem, including the optimal tour cost and backbone size, experience sharp transitions as the precision of intercity distances increases across a critical value. Our experimental results on the costs of the ATSP tours and assignment problem agree with the theoretical result that the asymptotic cost of assignment problem is pi ^2 /6 the number of cities goes to infinity. In addition, we show that the average computational cost of the well-known branch-and-bound subtour elimination algorithm for the problem also exhibits a thrashing behavior, transitioning from easy to difficult as the distance precision increases. These results answer positively an open question regarding the existence of phase transitions in the ATSP, and provide guidance on how difficult ATSP problem instances should be generated.
Complexity Results and Approximation Strategies for MAP Explanations
MAP is the problem of finding a most probable instantiation of a set of variables given evidence. MAP has always been perceived to be significantly harder than the related problems of computing the probability of a variable instantiation Pr, or the problem of computing the most probable explanation (MPE). This paper investigates the complexity of MAP in Bayesian networks. Specifically, we show that MAP is complete for NP^PP and provide further negative complexity results for algorithms based on variable elimination. We also show that MAP remains hard even when MPE and Pr become easy. For example, we show that MAP is NP-complete when the networks are restricted to polytrees, and even then can not be effectively approximated. Given the difficulty of computing MAP exactly, and the difficulty of approximating MAP while providing useful guarantees on the resulting approximation, we investigate best effort approximations. We introduce a generic MAP approximation framework. We provide two instantiations of the framework; one for networks which are amenable to exact inference Pr, and one for networks for which even exact inference is too hard. This allows MAP approximation on networks that are too complex to even exactly solve the easier problems, Pr and MPE. Experimental results indicate that using these approximation algorithms provides much better solutions than standard techniques, and provide accurate MAP estimates in many cases.