AITopics

1906.02732

Country:

North America > United States > California > Alameda County > Livermore (0.05)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Mahdian, Saied, Blanchet, Jose, Glynn, Peter

Optimal Transport Relaxations with Application to Wasserstein GANs

arXiv.org Machine LearningJun-7-2019

Optimal transport costs, which include the Wasserstein Distance and the Earth-Mover-Distance as special cases, have become useful tools in machine learning and statistics [16, 3, 1, 18, 8, 6]. The optimal transport cost between two distributions is computed (in its primal form) as a minimization problem, in which the cost of transporting one distribution to another is minimized over all possible joint distributions, leading to linear program (see for example,[26]). Optimal transport provides great flexibility when comparing (probability) measures and histograms. The transportation cost function (which we refer to as the cost function) can be used to capture key geometric characteristics [16]. It can be also used to compare discrete vs continuous distributions directly, without introducing smoothing, in contrast to alternatives such as the Kullback-Leibler divergence (see [3, 13] for more details). Also, by judiciously choosing the cost function, a Wasserstein distance can generate either the topology corresponding to weak convergence or the total variation distance. In data-driven applications, one needs to estimate the optimal transport cost by means of sampled data.

artificial intelligence, machine learning, optimal transport cost, (14 more...)

1906.03317

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Perdomo, Juan C., Singer, Yaron

Robust Attacks against Multiple Classifiers

We address the challenge of designing optimal adversarial noise algorithms for settings where a learner has access to multiple classifiers. We demonstrate how this problem can be framed as finding strategies at equilibrium in a two-player, zero-sum game between a learner and an adversary. In doing so, we illustrate the need for randomization in adversarial attacks. In order to compute Nash equilibrium, our main technical focus is on the design of best response oracles that can then be implemented within a Multiplicative Weights Update framework to boost deterministic perturbations against a set of models into optimal mixed strategies. We demonstrate the practical effectiveness of our approach on a series of image classification tasks using both linear classifiers and deep neural networks.

artificial intelligence, classifier, machine learning, (19 more...)

1906.02816

Country: North America > United States > California (0.46)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

A General $\mathcal{O}(n^2)$ Hyper-Parameter Optimization for Gaussian Process Regression with Cross-Validation and Non-linearly Constrained ADMM

Xu, Linning, Yin, Feng, Zhang, Jiawei, Luo, Zhi-Quan, Cui, Shuguang

Hyper-parameter optimization remains as the core issue of Gaussian process (GP) for machine learning nowadays. The benchmark method using maximum likelihood (ML) estimation and gradient descent (GD) is impractical for processing big data due to its $O(n^3)$ complexity. Many sophisticated global or local approximation models, for instance, sparse GP, distributed GP, have been proposed to address such complexity issue. In this paper, we propose two novel and general-purpose GP hyper-parameter training schemes (GPCV-ADMM) by replacing ML with cross-validation (CV) as the fitting criterion and replacing GD with a non-linearly constrained alternating direction method of multipliers (ADMM) as the optimization method. The proposed schemes are of $O(n^2)$ complexity for any covariance matrix without special structure. We conduct various experiments based on both synthetic and real data sets, wherein the proposed schemes show excellent performance in terms of convergence, hyper-parameter estimation accuracy, and computational time in comparison with the traditional ML based routines given in the GPML toolbox.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1906.02387

Country:

North America > United States (0.93)
North America > Canada (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.61)
(2 more...)

Peysakhovich, Alexander, Kroer, Christian

Fair Division Without Disparate Impact

arXiv.org Artificial IntelligenceJun-6-2019

We consider the problem of dividing items between individuals in a way that is fair both in the sense of distributional fairness and in the sense of not having disparate impact across protected classes. An important existing mechanism for distributionally fair division is competitive equilibrium from equal incomes (CEEI). Unfortunately, CEEI will not, in general, respect disparate impact constraints. We consider two types of disparate impact measures: requiring that allocations be similar across protected classes and requiring that average utility levels be similar across protected classes. We modify the standard CEEI algorithm in two ways: equitable equilibrium from equal incomes, which removes disparate impact in allocations, and competitive equilibrium from equitable incomes which removes disparate impact in attained utility levels. We show analytically that removing disparate impact in outcomes breaks several of CEEI's desirable properties such as envy, regret, Pareto optimality, and incentive compatibility. By contrast, we can remove disparate impact in attained utility levels without affecting these properties. Finally, we experimentally evaluate the tradeoffs between efficiency, equity, and disparate impact in a recommender-system based market.

allocation, artificial intelligence, machine learning, (21 more...)

arXiv.org Artificial Intelligence

1906.02775

Genre: Research Report (0.50)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Complete Dictionary Learning via $\ell^4$-Norm Maximization over the Orthogonal Group

Zhai, Yuexiang, Yang, Zitong, Liao, Zhenyu, Wright, John, Ma, Yi

This paper considers the fundamental problem of learning a complete (orthogonal) dictionary from samples of sparsely generated signals. Most existing methods solve the dictionary (and sparse representations) based on heuristic algorithms, usually without theoretical guarantees for either optimality or complexity. The recent $\ell^1$-minimization based methods do provide such guarantees but the associated algorithms recover the dictionary one column at a time. In this work, we propose a new formulation that maximizes the $\ell^4$-norm over the orthogonal group, to learn the entire dictionary. We prove that under a random data model, with nearly minimum sample complexity, the global optima of the $\ell^4$ norm are very close to signed permutations of the ground truth. Inspired by this observation, we give a conceptually simple and yet effective algorithm based on `matching, stretching, and projection' (MSP). The algorithm provably converges locally at a superlinear (cubic) rate and cost per iteration is merely an SVD. In addition to strong theoretical guarantees, experiments show that the new algorithm is significantly more efficient and effective than existing methods, including KSVD and $\ell^1$-based methods. Preliminary experimental results on real images clearly demonstrate advantages of so learned dictionary over classic PCA bases.

algorithm, artificial intelligence, machine learning, (17 more...)

1906.02435

Country: North America > United States > California (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Wu, Cho-Ying, Ding, Jian-Jiun

Nonconvex Approach for Sparse and Low-Rank Constrained Models with Dual Momentum

We first propose a novel nonconvex rank surrogate on the general rank minimization problem and apply this to the corrupted image completion problem. Then, we propose that nonconvex rank surrogates can be introduced into two well-known sparse and low-rank models: Robust Principal Component Analysis (RPCA) and Low-Rank Representation (LRR). For optimization, we use alternating direction method of multipliers (ADMM) and propose a trick, which is called the dual momentum. We add the difference of the dual variable between the current and the last iteration with a weight. This trick can avoid the local minimum problem and make the algorithm converge to a solution with smaller recovery error in the nonconvex optimization problem. Also, it can boost the convergence when the variable updates too slowly. We also give a severe proof and verify that the proposed algorithms are convergent. Then, several experiments are conducted, including image completion, denoising, and spectral clustering with outlier detection. These experiments show that the proposed methods are effective in image and signal processing applications, and have the best performance compared with state-of-the-art methods.

artificial intelligence, data mining, machine learning, (19 more...)

1906.02433

Genre: Research Report (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

arXiv.org Machine LearningJun-5-2019

Machine Learning and System Identification for Estimation in Physical Systems

Carlson, Fredrik Bagge

In this thesis, we draw inspiration from both classical system identification and modern machine learning in order to solve estimation problems for real-world, physical systems. The main approach to estimation and learning adopted is optimization based. Concepts such as regularization will be utilized for encoding of prior knowledge and basis-function expansions will be used to add nonlinear modeling power while keeping data requirements practical. The thesis covers a wide range of applications, many inspired by applications within robotics, but also extending outside this already wide field. Usage of the proposed methods and algorithms are in many cases illustrated in the real-world applications that motivated the research. Topics covered include dynamics modeling and estimation, model-based reinforcement learning, spectral estimation, friction modeling and state estimation and calibration in robotic machining. In the work on modeling and identification of dynamics, we develop regularization strategies that allow us to incorporate prior domain knowledge into flexible, overparameterized models. We make use of classical control theory to gain insight into training and regularization while using flexible tools from modern deep learning. A particular focus of the work is to allow use of modern methods in scenarios where gathering data is associated with a high cost. In the robotics-inspired parts of the thesis, we develop methods that are practically motivated and ensure that they are implementable also outside the research setting. We demonstrate this by performing experiments in realistic settings and providing open-source implementations of all proposed methods and algorithms.

deep learning, identification and regularization, neural network, (25 more...)

1906.02003

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.45)
Europe > Sweden (0.28)
North America > United States > Massachusetts > Middlesex County (0.14)
(9 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Government > Regional Government (0.45)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Li, Bingcong, Ma, Meng, Giannakis, Georgios B.

On the Convergence of SARAH and Beyond

arXiv.org Machine LearningJun-5-2019

The main theme of this work is a unifying algorithm, abbreviated as L2S, that can deal with (strongly) convex and nonconvex empirical risk minimization (ERM) problems. It broadens a recently developed variance reduction method known as SARAH. L2S enjoys a linear convergence rate for strongly convex problems, which also implies the last iteration of SARAH's inner loop converges linearly. For convex problems, different from SARAH, L2S can afford step and mini-batch sizes not dependent on the data size $n$, and the complexity needed to guarantee $\mathbb{E}[\|\nabla F(\mathbf{x}) \|^2] \leq \epsilon$ is ${\cal O}(n+ n/\epsilon)$. For nonconvex problems on the other hand, the complexity is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Parallel to L2S there are a few side results. Leveraging an aggressive step size, D2S is proposed, which provides a more efficient alternative to L2S and SARAH-like algorithms. Specifically, D2S requires a reduced IFO complexity of ${\cal O}\big( (n+ \bar{\kappa}) \ln (1/\epsilon) \big)$ for strongly convex problems. Moreover, to avoid the tedious selection of the optimal step size, an automatic tuning scheme is developed, which obtains comparable empirical performance with SARAH using judiciously tuned step size.

artificial intelligence, machine learning, optimization problem, (17 more...)

1906.02351

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Celis, L. Elisa, Keswani, Vijay, Yildiz, Ozan, Vishnoi, Nisheeth K.

Fair Distributions from Biased Samples: A Maximum Entropy Optimization Framework

arXiv.org Artificial IntelligenceJun-5-2019

One reason for the emergence of bias in AI systems is biased data -- datasets that may not be true representations of the underlying distributions -- and may over or under-represent groups with respect to protected attributes such as gender or race. We consider the problem of correcting such biases and learning distributions that are "fair", with respect to measures such as proportional representation and statistical parity, from the given samples. Our approach is based on a novel formulation of the problem of learning a fair distribution as a maximum entropy optimization problem with a given expectation vector and a prior distribution. Technically, our main contributions are: (1) a new second-order method to compute the (dual of the) maximum entropy distribution over an exponentially-sized discrete domain that turns out to be faster than previous methods, and (2) methods to construct prior distributions and expectation vectors that provably guarantee that the learned distributions satisfy a wide class of fairness criteria. Our results also come with quantitative bounds on the total variation distance between the empirical distribution obtained from the samples and the learned fair distribution. Our experimental results include testing our approach on the COMPAS dataset and showing that the fair distributions not only improve disparate impact values but when used to train classifiers only incur a small loss of accuracy.

artificial intelligence, machine learning, max-entropy distribution, (17 more...)

arXiv.org Artificial Intelligence

1906.02164

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Bristol (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)