AITopics

Error-Correcting Output Codes (ECOCs) offer a principled approach for combining simple binary classifiers into multiclass classifiers. In this paper, we investigate the problem of designing optimal ECOCs to achieve both nominal and adversarial accuracy using Support Vector Machines (SVMs) and binary deep learning models. In contrast to previous literature, we present an Integer Programming (IP) formulation to design minimal codebooks with desirable error-correcting properties. Our work leverages the advances in IP solvers to generate codebooks with optimality guarantees. To achieve tractability, we exploit the underlying graph-theoretic structure of the constraint set in our IP formulation. This enables us to use edge clique covers to substantially reduce the constraint set. Our codebooks achieve a high nominal accuracy relative to standard codebooks (e.g., one-vs-all, one-vs-one, and dense/sparse codes). We also estimate the adversarial accuracy of our ECOC-based classifiers in a white-box setting. Our IP-generated codebooks provide non-trivial robustness to adversarial perturbations even without any adversarial training.
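To make the idea of combining binary classifiers through a codebook concrete, here is a minimal sketch of ECOC decoding by nearest codeword in Hamming distance. The codebook is the standard exhaustive 7-bit code for 4 classes, chosen only for illustration; it is not one of the IP-generated codebooks from the paper, and the names `CODEBOOK`, `min_row_distance`, and `decode` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical example codebook: one 7-bit codeword per class.
# Each column corresponds to one binary classifier.
CODEBOOK: Dict[str, List[int]] = {
    "class_0": [1, 1, 1, 1, 1, 1, 1],
    "class_1": [0, 0, 0, 0, 1, 1, 1],
    "class_2": [0, 0, 1, 1, 0, 0, 1],
    "class_3": [0, 1, 0, 1, 0, 1, 0],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(codebook: Dict[str, List[int]]) -> int:
    """Smallest Hamming distance between any two codewords."""
    return min(hamming(u, v) for u, v in combinations(codebook.values(), 2))


def decode(bits: Sequence[int], codebook: Dict[str, List[int]]) -> str:
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


if __name__ == "__main__":
    d = min_row_distance(CODEBOOK)
    print("minimum row distance:", d)
    print("correctable bit errors:", (d - 1) // 2)
    # class_2's codeword with its last bit flipped by a faulty classifier.
    print(decode([0, 0, 1, 1, 0, 0, 0], CODEBOOK))
```

A codebook with minimum row distance d tolerates up to floor((d - 1) / 2) wrong binary predictions, which is the error-correcting property the abstract refers to.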

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2011.00144

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Leplat, Valentin, Gillis, Nicolas, Idier, Jérôme

Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints

Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably with the state of the art: (1) $\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.
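For orientation, the following is a minimal sketch of the classical unconstrained multiplicative updates for the Frobenius case ($\beta = 2$). It is not the constrained framework introduced in the paper; the function names (`mu_step`, `nmf`), the small `eps` safeguard, and the random initialization are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def frobenius_error(V: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def mu_step(V: Matrix, W: Matrix, H: Matrix, eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """One round of multiplicative updates: first H, then W."""
    Wt = transpose(W)
    num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
    H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
         for rh, rn, rd in zip(H, num, den)]
    Ht = transpose(H)
    num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
    W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
         for rw, rn, rd in zip(W, num, den)]
    return W, H


def nmf(V: Matrix, rank: int, iters: int = 200, seed: int = 0):
    """Factor V into nonnegative W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(iters):
        W, H = mu_step(V, W, H)
        errors.append(frobenius_error(V, W, H))
    return W, H, errors


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
    W, H, errors = nmf(V, rank=1)
    print("initial error:", errors[0], "final error:", errors[-1])
```

Because each update multiplies an entry by a nonnegative ratio, nonnegativity is preserved automatically; the paper's contribution is to keep equality constraints satisfied in the same way.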

artificial intelligence, constraint, machine learning, (17 more...)

2010.16223

Country:

Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Finite-Sample Guarantees for Wasserstein Distributionally Robust Optimization: Breaking the Curse of Dimensionality

Gao, Rui

Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance. Despite its recent empirical success in operations research and machine learning, existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality, or plausible only in large-sample asymptotics. In this paper, we develop a non-asymptotic framework for analyzing the out-of-sample performance for Wasserstein robust learning and the generalization bound for its related Lipschitz and gradient regularization problems. To the best of our knowledge, this gives the first finite-sample guarantee for generic Wasserstein DRO problems without suffering from the curse of dimensionality. Our results highlight the bias-variation trade-off intrinsic in the Wasserstein DRO, which balances between the empirical mean of the loss and the variation of the loss, measured by the Lipschitz norm or the gradient norm of the loss. Our analysis is based on two novel methodological developments that are of independent interest: 1) a new concentration inequality controlling the decay rate of large deviation probabilities by the variation of the loss, and 2) a localized Rademacher complexity theory based on the variation of the loss.

artificial intelligence, machine learning, optimization problem, (15 more...)

2009.04382

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Energy (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

Wang, Zhongruo, Liu, Bingyuan, Chen, Shixiang, Ma, Shiqian, Xue, Lingzhou, Zhao, Hongyu

Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity on spectral clustering, which improves the interpretability of the model. This paper considers a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with a nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods on single-cell RNA sequencing data analysis.

algorithm, artificial intelligence, machine learning, (13 more...)

2007.09524

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas (0.04)
North America > United States > Pennsylvania (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Artificial Intelligence, Oct-30-2020

The Mathematical Foundations of Manifold Learning

Melas-Kyriazi, Luke

This is an edited version of my undergraduate thesis, submitted to the Harvard Mathematics Department in May 2020. It differs from the original thesis in one major respect, namely that this version omits the proofs of a number of theorems that are readily available in other expositions. Whereas the original version reproduced these proofs in full, this version simply contains references to these proofs in other works. This thesis is built upon an extensive body of prior work in learning theory, graph theory, differential geometry, and manifold learning. In particular, I would like to thank Professors Lorenzo Rosasco and Tomaso Poggio for their lectures on statistical learning theory, Professor Daniel Spielman for his notes on spectral graph theory, Professor Yaiza Canzani for her notes on analysis on manifolds, and Professor Mikhail Belkin for his work on manifold learning. Finally, I wish to thank those people without whom I could never have written this thesis: my family, friends, and wonderful advisor Professor Arjun Manrai. Unlike the manifolds discussed herein, their support was truly boundless. I hope you enjoy and learn something from this thesis! If you have comments, corrections, or would like to contact me for anything else, feel free to email me.

artificial intelligence, machine learning, optimization problem, (17 more...)

2011.01307

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > New Finding (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Yang, Yao, Nam, Andrew, Nasr-Azadani, Mohamad M., Tung, Teresa

Resource-Aware Pareto-Optimal Automated Machine Learning Platform

arXiv.org Artificial Intelligence, Oct-30-2020

In this study, we introduce a novel platform, Resource-Aware AutoML (RA-AutoML), which enables flexible and generalized algorithms to build machine learning models subject to multiple objectives, as well as resource and hardware constraints. RA-AutoML intelligently conducts Hyper-Parameter Search (HPS) as well as Neural Architecture Search (NAS) to build models optimizing predefined objectives. RA-AutoML is a versatile framework that allows the user to prescribe many resource/hardware constraints along with objectives demanded by the problem at hand or business requirements. At its core, RA-AutoML relies on our in-house search-engine algorithm, MOBOGA, which combines a modified constraint-aware Bayesian Optimization and Genetic Algorithm to construct Pareto optimal candidates. Our experiments on the CIFAR-10 dataset show very good accuracy compared to results obtained by state-of-the-art neural network models, while subject to resource constraints in the form of model size.
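As a small illustration of what "Pareto optimal candidates" means when accuracy and model size are traded off, the sketch below filters a list of candidates down to its non-dominated set. It does not reproduce MOBOGA; the candidate values and the names `dominates` and `pareto_front` are hypothetical, and both objectives (error rate, model size in MB) are assumed to be minimized.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # (error rate, model size in MB), both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical search results, not figures from the paper.
    candidates = [(0.10, 50), (0.08, 80), (0.12, 40), (0.09, 90), (0.10, 60)]
    print(pareto_front(candidates))
```

A hard resource constraint, such as a maximum model size, can then be applied by discarding front members that exceed the limit before picking a final model.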

constraint, evolutionary algorithm, machine learning, (16 more...)

2011.00073

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Workflow (0.69)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence, Oct-30-2020

Inherent Trade-offs in the Fair Allocation of Treatments

He, Yuzi, Burghardt, Keith, Guo, Siyi, Lerman, Kristina

Explicit and implicit bias cloud human judgement, leading to discriminatory treatment of minority groups. A fundamental goal of algorithmic fairness is to avoid the pitfalls in human judgement by learning policies that improve the overall outcomes while providing fair treatment to protected classes. In this paper, we propose a causal framework that learns optimal intervention policies from data subject to fairness constraints. We define two measures of treatment bias and infer the best treatment assignment that minimizes the bias while optimizing the overall outcome. We demonstrate that there is a dilemma of balancing fairness and overall benefit; however, allowing preferential treatment to protected classes in certain circumstances (affirmative action) can dramatically improve the overall benefit while also preserving fairness. We apply our framework to data containing student outcomes on standardized tests and show how it can be used to design real-world policies that fairly improve student test scores. Our framework provides a principled way to learn fair treatment policies in real-world settings.

artificial intelligence, machine learning, subgroup, (19 more...)

2010.16409

Country:

North America > United States > California (0.14)
North America > United States > North Carolina (0.04)
North America > United States > Louisiana (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education > Educational Setting (1.00)
Government (0.94)
Education > Assessment & Standards > Student Performance (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Tarnawski, Jakub, Phanishayee, Amar, Devanur, Nikhil R., Mahajan, Divya, Paravecino, Fanny Nina

Efficient Algorithms for Device Placement of DNN Graph Operators

arXiv.org Machine Learning, Oct-29-2020

Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific accelerators being offered as hardware accelerators in addition to CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices, which is fed a stream of samples and yields high throughput for training and inference of DNNs. However, for such settings (large models and multiple heterogeneous devices), we require automated algorithms and toolchains that can partition the ML workload across devices. In this paper, we identify and isolate the structured optimization problem at the core of device placement of DNN operators, for both inference and training, especially in modern pipelined settings. We then provide algorithms that solve this problem to optimality. We demonstrate the applicability and efficiency of our approaches using several contemporary DNN computation graphs.
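The pipelined partitioning problem can be illustrated on a deliberately simplified case: a linear chain of operators split into contiguous stages on identical devices, where throughput is limited by the slowest stage. The sketch below uses a standard dynamic program for that simplified case only; it is not the paper's algorithm, which handles general DNN graphs and heterogeneous devices, and the name `min_bottleneck` and the operator costs are hypothetical.

```python
from functools import lru_cache
from typing import Sequence


def min_bottleneck(costs: Sequence[float], num_devices: int) -> float:
    """Smallest achievable maximum stage cost when the chain `costs` is
    split into `num_devices` contiguous, non-empty stages."""
    n = len(costs)
    if not 1 <= num_devices <= n:
        raise ValueError("need between 1 and len(costs) devices")

    # prefix[i] holds the total cost of the first i operators.
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(start: int, stages: int) -> float:
        """Optimal bottleneck for operators start..n-1 using `stages` stages."""
        if stages == 1:
            return prefix[n] - prefix[start]
        # Try every end position for the first stage, leaving at least
        # one operator for each of the remaining stages.
        return min(
            max(prefix[cut] - prefix[start], best(cut, stages - 1))
            for cut in range(start + 1, n - stages + 2)
        )

    return best(0, num_devices)


if __name__ == "__main__":
    # Hypothetical per-operator execution times.
    print(min_bottleneck([4, 2, 3, 7, 1, 5], num_devices=3))
```

In a pipeline fed with a stream of samples, the steady-state time per sample is governed by the most heavily loaded stage, which is why the objective here is the maximum stage cost rather than the sum.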

artificial intelligence, machine learning, natural language, (21 more...)

2006.16423

Country:

Europe > Germany (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Machine Learning, Oct-29-2020

Tensor Completion via Tensor Networks with a Tucker Wrapper

Cai, Yunfeng, Li, Ping

In recent years, low-rank tensor completion (LRTC) has received considerable attention due to its applications in image/video inpainting, hyperspectral data recovery, etc. With different notions of tensor rank (e.g., CP, Tucker, tensor train/ring), various optimization-based numerical methods have been proposed for LRTC. However, tensor-network-based methods have not been proposed yet. In this paper, we propose to solve LRTC via tensor networks with a Tucker wrapper. Here, by "Tucker wrapper" we mean that the outermost factor matrices of the tensor network are all orthonormal. We formulate LRTC as a problem of solving a system of nonlinear equations, rather than a constrained optimization problem. A two-level alternating least squares method is then employed to update the unknown factors. The computation of the method is dominated by tensor-matrix multiplications and can be efficiently performed. Also, under proper assumptions, it is shown that with high probability, the method converges to the exact solution at a linear rate. Numerical simulations show that the proposed algorithm is comparable with state-of-the-art methods.

artificial intelligence, machine learning, tensor, (17 more...)

2010.15819

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
Asia > China > Beijing > Beijing (0.04)
(9 more...)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Machine Learning, Oct-29-2020

Off-Policy Interval Estimation with Lipschitz Value Iteration

Tang, Ziyang, Feng, Yihao, Zhang, Na, Peng, Jian, Liu, Qiang

Off-policy evaluation provides an essential tool for evaluating the effects of different policies or treatments using only observed data. When applied to high-stakes scenarios such as medical diagnosis or financial decision-making, it is crucial to provide provably correct upper and lower bounds of the expected reward, not just a classical single point estimate, to the end-users, as executing a poor policy can be very costly. In this work, we propose a provably correct method for obtaining interval bounds for off-policy evaluation in a general continuous setting. The idea is to search for the maximum and minimum values of the expected reward among all the Lipschitz Q-functions that are consistent with the observations, which amounts to solving a constrained optimization problem on a Lipschitz function space. We go on to introduce a Lipschitz value iteration method to monotonically tighten the interval, which is simple yet efficient and provably convergent. We demonstrate the practical efficiency of our method on a range of benchmarks.
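The idea of taking the maximum and minimum over all Lipschitz functions consistent with observations can be illustrated in a much simpler setting than the paper's: a scalar function observed directly at a few points, rather than a Q-function constrained through Bellman consistency. Under that simplifying assumption, the tightest pointwise bounds have a closed form. The name `lipschitz_interval` and the sample data are hypothetical.

```python
from typing import List, Tuple


def lipschitz_interval(
    x: float,
    observations: List[Tuple[float, float]],
    lipschitz_const: float,
) -> Tuple[float, float]:
    """Tightest (lower, upper) bounds at x over all functions that pass
    through every observed (x_i, y_i) and are Lipschitz with the given
    constant with respect to |.|."""
    upper = min(y + lipschitz_const * abs(x - xi) for xi, y in observations)
    lower = max(y - lipschitz_const * abs(x - xi) for xi, y in observations)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical observations, consistent with a Lipschitz constant of 1.
    data = [(0.0, 1.0), (2.0, 2.0)]
    for query in (0.0, 1.0, 3.0):
        print(query, lipschitz_interval(query, data, lipschitz_const=1.0))
```

The interval collapses to a point at observed inputs and widens with distance from the data, which mirrors why additional consistency constraints can only tighten the bounds.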

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2010.15392

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Illinois (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)