Optimization
Influence Maximization with Novelty Decay in Social Networks
Feng, Shanshan (Nanyang Technological University) | Chen, Xuefeng (University of Electronic Science and Technology of China) | Cong, Gao (Nanyang Technological University) | Zeng, Yifeng (Teesside University) | Chee, Yeow Meng (Nanyang Technological University) | Xiang, Yanping (University of Electronic Science and Technology of China)
Influence maximization problem is to find a set of seed nodes in a social network such that their influence spread is maximized under certain propagation models. A few algorithms have been proposed for solving this problem. However, they have not considered the impact of novelty decay on influence propagation, i.e., repeated exposures will have diminishing influence on users. In this paper, we consider the problem of influence maximization with novelty decay (IMND). We investigate the effect of novelty decay on influence propagation on real-life datasets and formulate the IMND problem. We further analyze the problem properties and propose an influence estimation technique. We demonstrate the performance of our algorithms on four social networks.
Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems
Defazio, Aaron J., Caetano, Tibรฉrio S., Domke, Justin
Recent advances in optimization theory have shown that smooth strongly convex finite sums can be minimized faster than by treating them as a black box "batch" problem. In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. This method is also amendable to a sampling without replacement scheme that in practice gives further speed-ups. We give empirical results showing state of the art performance.
Algorithms for CVaR Optimization in MDPs
Chow, Yinlam, Ghavamzadeh, Mohammad
In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and because of its computational efficiencies has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms that each uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
A least-squares method for sparse low rank approximation of multivariate functions
Chevreuil, Mathilde, Lebrun, Rรฉgis, Nouy, Anthony, Rai, Prashant
In this paper, we propose a low-rank approximation method based on discrete least-squares for the approximation of a multivariate function from random, noisy-free observations. Sparsity inducing regularization techniques are used within classical algorithms for low-rank approximation in order to exploit the possible sparsity of low-rank approximations. Sparse low-rank approximations are constructed with a robust updated greedy algorithm which includes an optimal selection of regularization parameters and approximation ranks using cross validation techniques. Numerical examples demonstrate the capability of approximating functions of many variables even when very few function evaluations are available, thus proving the interest of the proposed algorithm for the propagation of uncertainties through complex computational models.
DimmWitted: A Study of Main-Memory Statistical Analytics
We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine. Statistical analytics systems differ from conventional SQL-analytics in the amount and types of memory incoherence they can tolerate. Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. We study this new tradeoff space, and discover there are tradeoffs between hardware and statistical efficiency. We argue that our tradeoff study may provide valuable information for designers of analytics engines: for each system we consider, our prototype engine can run at least one popular task at least 100x faster. We conduct our study across five architectures using popular models including SVMs, logistic regression, Gibbs sampling, and neural networks.
Linear Programs for Measuring Inconsistency in Probabilistic Logics
Potyka, Nico (Fern Universitรคt Hagen)
Inconsistency measures help analyzing contradictory knowledge bases and resolving inconsistencies. In recent years several measures with desirable properties have been proposed, but often these measures correspond to combinatorial or non-convex optimization problems that are hard to solve in practice. In this paper, I study a new family of inconsistency measures for probabilistic knowledge bases. All members satisfy many desirable properties and can be computed by means of convex optimization techniques. For two members, I present linear programs whose computation is barely harder than a probabilistic satisfiability test.
Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters
Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Bayesian optimisation with Gaussian processes and unknown kernel hyper-parameters in the stochastic setting. The bound, which applies to the expected improvement acquisition function and sub-Gaussian observation noise, provides us with guidelines on how to design hyper-parameter estimation methods. A simple simulation demonstrates the importance of following these guidelines.
Model Consistency of Partly Smooth Regularizers
Vaiter, Samuel, Peyrรฉ, Gabriel, Fadili, Jalal M.
This paper studies least-square regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile allowing to promote solutions conforming to some notion of low-complexity. Indeed, they force solutions of variational problems to belong to a low-dimensional manifold (the so-called model) which is stable under small perturbations of the function. This property is crucial to make the underlying low-complexity model robust to small noise. We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost a necessary condition. We then show that this condition implies model consistency of the regularized estimator. That is, with a probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct low-dimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and low-rank regularizations.
On Soft Power Diagrams
Noname manuscript No. (will be inserted by the editor) Abstract Many applications in data analysis begin with a set of points in a Euclidean space that is partitioned into clusters. Common tasks then are to devise a classifier deciding which of the clusters a new point is associated to, finding outliers with respect to the clusters, or identifying the type of clustering used for the partition. One of the common kinds of clusterings are (balanced) least-squares assignments with respect to a given set of sites. For these, there is a'separating power diagram' for which each cluster lies in its own cell. In the present paper, we aim for efficient algorithms for outlier detection and the computation of thresholds that measure how similar a clustering is to a leastsquares assignment for fixed sites. For this purpose, we devise a new model for the computation of a'soft power diagram', which allows a soft separation of the clusters with'point counting properties'; e.g. As our results hold for a more general non-convex model of free sites, we describe it and our proofs in this more general way. Its locally optimal solutions satisfy the aforementioned point counting properties. For our target applications that use fixed sites, our algorithms are efficiently solvable to global optimality by linear programming.
Convex Optimization Learning of Faithful Euclidean Distance Representations in Nonlinear Dimensionality Reduction
Classical multidimensional scaling only works well when the noisy distances observed in a high dimensional space can be faithfully represented by Euclidean distances in a low dimensional space. Advanced models such as Maximum Variance Unfolding (MVU) and Minimum Volume Embedding (MVE) use Semi-Definite Programming (SDP) to reconstruct such faithful representations. While those SDP models are capable of producing high quality configuration numerically, they suffer two major drawbacks. One is that there exist no theoretically guaranteed bounds on the quality of the configuration. The other is that they are slow in computation when the data points are beyond moderate size. In this paper, we propose a convex optimization model of Euclidean distance matrices. We establish a non-asymptotic error bound for the random graph model with sub-Gaussian noise, and prove that our model produces a matrix estimator of high accuracy when the order of the uniform sample size is roughly the degree of freedom of a low-rank matrix up to a logarithmic factor. Our results partially explain why MVU and MVE often work well. Moreover, we develop a fast inexact accelerated proximal gradient method. Numerical experiments show that the model can produce configurations of high quality on large data points that the SDP approach would struggle to cope with.