Iyer, Rishabh
Training Data Subset Selection for Regression with Controlled Generalization Error
Sivasubramanian, Durga, Iyer, Rishabh, Ramakrishnan, Ganesh, De, Abir
Data subset selection from a large number of training instances has been a successful approach toward efficient and cost-effective machine learning. However, models trained on a smaller subset may show poor generalization ability. In this paper, our goal is to design an algorithm for selecting a subset of the training data so that the model can be trained quickly, without significantly sacrificing accuracy. More specifically, we focus on data subset selection for L2-regularized regression problems and provide a novel problem formulation which seeks to minimize the training loss with respect to both the trainable parameters and the subset of training data, subject to error bounds on the validation set. We tackle this problem using several technical innovations. First, we represent this problem with simplified constraints using the dual of the original training problem and show that the objective of this new representation is a monotone and alpha-submodular function for a wide variety of modeling choices. These properties lead us to develop SELCON, an efficient majorization-minimization algorithm for data subset selection that admits an approximation guarantee even when training provides an imperfect estimate of the trained model. Finally, our experiments on several datasets show that SELCON trades off accuracy and efficiency more effectively than the current state of the art.
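To make the selection step concrete, here is a minimal numpy sketch of the underlying idea (not the SELCON algorithm itself): greedily grow a training subset for L2-regularized linear regression, scoring each candidate by the validation loss of the closed-form ridge solution fit on the current subset. The function names, the greedy scoring rule, and the synthetic data are illustrative assumptions.

    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        """Closed-form L2-regularized least squares on (X, y)."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    def val_loss(w, X_val, y_val):
        return np.mean((X_val @ w - y_val) ** 2)

    def greedy_subset(X_tr, y_tr, X_val, y_val, k, lam=1.0):
        """Greedily pick k training points that keep validation MSE low."""
        selected, remaining = [], list(range(len(X_tr)))
        for _ in range(k):
            best_j, best_loss = None, np.inf
            for j in remaining:
                idx = selected + [j]
                w = ridge_fit(X_tr[idx], y_tr[idx], lam)
                loss = val_loss(w, X_val, y_val)
                if loss < best_loss:
                    best_j, best_loss = j, loss
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5)); w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=200)
    subset = greedy_subset(X[:150], y[:150], X[150:], y[150:], k=20)
    print("selected", len(subset), "of 150 training points")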
RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning
Killamsetty, Krishnateja, Zhao, Xujiang, Chen, Feng, Iyer, Rishabh
Semi-supervised learning (SSL) algorithms have had great success in recent years in limited labeled data regimes. However, the current state-of-the-art SSL algorithms are computationally expensive and entail significant compute time and energy requirements, which can be a major limitation for smaller companies and academic groups. Our main insight is that training on a subset of unlabeled data instead of the entire unlabeled set enables current SSL algorithms to converge faster, thereby reducing computational costs significantly. In this work, we propose RETRIEVE, a coreset selection framework for efficient and robust semi-supervised learning. RETRIEVE selects the coreset by solving a mixed discrete-continuous bi-level optimization problem such that the selected coreset minimizes the labeled set loss. We use a one-step gradient approximation and show that the discrete optimization problem is approximately submodular, thereby enabling simple greedy algorithms to obtain the coreset. We empirically demonstrate on several real-world datasets that existing SSL algorithms like VAT, Mean-Teacher, and FixMatch, when used with RETRIEVE, achieve (a) faster training times and (b) better performance when the unlabeled data contains Out-of-Distribution (OOD) examples or is imbalanced. More specifically, we show that with minimal accuracy degradation, RETRIEVE achieves a speedup of around 3X in the traditional SSL setting and a speedup of 5X over state-of-the-art (SOTA) robust SSL algorithms in the case of imbalance and OOD data.
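A toy numpy sketch of the one-step gradient idea (an illustration under simplifying assumptions, not the RETRIEVE implementation): score each unlabeled point by how well a gradient step on its pseudo-label loss aligns with the labeled-set loss gradient, then keep the top-k as the coreset. The linear model, the pseudo-labeling rule, and all names are assumptions made for brevity.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    def logistic_grad(w, X, y):
        """Gradient of mean logistic loss for a linear model."""
        p = sigmoid(X @ w)
        return X.T @ (p - y) / len(y)

    def retrieve_style_scores(w, X_lab, y_lab, X_unlab):
        """Per-unlabeled-point alignment with the labeled-loss gradient."""
        g_lab = logistic_grad(w, X_lab, y_lab)
        p_u = sigmoid(X_unlab @ w)
        pseudo = (p_u > 0.5).astype(float)            # model's own labels
        per_point_grads = X_unlab * (p_u - pseudo)[:, None]
        # A step on point i moves w by -eta * g_i; to first order it also
        # reduces the labeled loss when g_i . g_lab is large.
        return per_point_grads @ g_lab

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)
    X_lab = rng.normal(size=(30, 4)); y_lab = (rng.random(30) > 0.5).astype(float)
    X_unlab = rng.normal(size=(500, 4))
    scores = retrieve_style_scores(w, X_lab, y_lab, X_unlab)
    coreset = np.argsort(-scores)[:100]   # top-k under a cardinality budget
    print("coreset size:", coreset.size)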
GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning
Killamsetty, Krishnateja, Sivasubramanian, Durga, Ramakrishnan, Ganesh, Iyer, Rishabh
Large scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Moreover, real-world data is noisy and imbalanced. As a result, several recent papers try to make the training process more efficient and robust. However, most existing work focuses on either robustness or efficiency, but not both. In this work, we introduce Glister, a GeneraLIzation based data Subset selecTion for Efficient and Robust learning framework. We formulate Glister as a mixed discrete-continuous bi-level optimization problem to select a subset of the training data which maximizes the log-likelihood on a held-out validation set. Next, we propose an iterative online algorithm, Glister-Online, which performs data selection iteratively along with the parameter updates and can be applied to any loss-based learning algorithm. We then show that for a rich class of loss functions including cross-entropy, hinge loss, squared loss, and logistic loss, the inner discrete data selection is an instance of (weakly) submodular optimization, and we analyze conditions under which Glister-Online reduces the validation loss and converges. Finally, we propose Glister-Active, an extension to batch active learning, and we empirically demonstrate the performance of Glister on a wide range of tasks including (a) data selection to reduce training time, (b) robust learning under label noise and imbalance settings, and (c) batch active learning with several deep and shallow models. We show that our framework improves upon the state of the art in both efficiency and accuracy in cases (a) and (c), and is more efficient than other state-of-the-art robust learning algorithms in case (b).
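The bi-level selection can be approximated with a one-step lookahead; the numpy sketch below (a simplification, not Glister itself) greedily picks the training points whose single gradient step most increases the validation log-likelihood of a logistic model. The model, the step size, and the greedy commit rule are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    def val_loglik(w, X_val, y_val):
        p = np.clip(sigmoid(X_val @ w), 1e-8, 1 - 1e-8)
        return np.sum(y_val * np.log(p) + (1 - y_val) * np.log(1 - p))

    def glister_style_greedy(w, X_tr, y_tr, X_val, y_val, k, eta=0.1):
        """Greedy selection via one-step lookahead on validation log-likelihood."""
        selected, remaining = [], set(range(len(X_tr)))
        w_s = w.copy()
        for _ in range(k):
            residual = sigmoid(X_tr @ w_s) - y_tr      # per-point logistic grad factor
            gains = np.full(len(X_tr), -np.inf)
            for j in remaining:
                w_try = w_s - eta * residual[j] * X_tr[j]   # one-step update on point j
                gains[j] = val_loglik(w_try, X_val, y_val)
            best = int(np.argmax(gains))
            selected.append(best)
            remaining.remove(best)
            w_s = w_s - eta * residual[best] * X_tr[best]   # commit the step
        return selected

    rng = np.random.default_rng(1)
    X = rng.normal(size=(120, 3)); y = (X[:, 0] + 0.3 * rng.normal(size=120) > 0).astype(float)
    subset = glister_style_greedy(np.zeros(3), X[:100], y[:100], X[100:], y[100:], k=15)
    print(subset)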
Improving Accuracy of Federated Learning in Non-IID Settings
Ozdayi, Mustafa Safa, Kantarcioglu, Murat, Iyer, Rishabh
Federated Learning (FL) is a decentralized machine learning protocol that allows a set of participating agents to collaboratively train a model without sharing their data. This makes FL particularly suitable for settings where data privacy is desired. However, it has been observed that the performance of FL is closely tied to the local data distributions of the agents. In particular, when local data distributions vastly differ among agents, FL performs rather poorly compared to centralized training. To address this problem, we hypothesize the reasons behind the performance degradation and develop techniques to address them. In this work, we identify four simple techniques that improve the performance of trained models without incurring any additional communication overhead over FL, at the cost of some light computation overhead on either the client or the server side. In our experimental analysis, a combination of our techniques improved the validation accuracy of a model trained via FL by more than 12% relative to our baseline, which is about 5% below the accuracy of the same model trained on centralized data.
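For context, here is a bare-bones FedAvg-style round in numpy showing the baseline protocol this work builds on; it illustrates vanilla federated averaging, not the four techniques proposed in the paper, and the clients, model, and non-IID split are synthetic assumptions.

    import numpy as np

    def local_sgd(w, X, y, lr=0.1, epochs=5):
        """A client's local update: a few epochs of least-squares SGD on its own data."""
        w = w.copy()
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        return w

    def fedavg_round(w_global, clients):
        """One communication round: clients train locally, server averages weights."""
        updates = [local_sgd(w_global, X, y) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        return np.average(updates, axis=0, weights=sizes)  # size-weighted average

    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0])
    # Non-IID split: each client sees a different slice of the input space.
    clients = []
    for shift in (-2.0, 0.0, 2.0):
        X = rng.normal(loc=shift, size=(50, 2))
        clients.append((X, X @ w_true + 0.05 * rng.normal(size=50)))

    w = np.zeros(2)
    for _ in range(20):
        w = fedavg_round(w, clients)
    print("global weights:", w)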
Data Programming using Semi-Supervision and Subset Selection
Maheshwari, Ayush, Chatterjee, Oishik, Killamsetty, KrishnaTeja, Iyer, Rishabh, Ramakrishnan, Ganesh
The paradigm of data programming (Bach et al., 2019) has shown a lot of promise in using weak supervision in the form of rules and labelling functions to learn in scenarios where labelled data is not available. Another approach that has shown a lot of promise is semi-supervised learning, where we augment small amounts of labelled data with a large unlabelled dataset. In this work, we argue that by not using any labelled data, data programming based approaches can yield sub-optimal performance, particularly when the labelling functions are noisy. The first contribution of this work is to study a framework of joint learning which combines unsupervised consensus from labelling functions with semi-supervised learning, and jointly learns a model that efficiently uses the rules/labelling functions along with semi-supervised loss functions on the feature space. Next, we also study a subset selection approach to select the set of examples which can be used as the labelled set. We evaluate our techniques on synthetic data as well as four publicly available datasets and show improvement over state-of-the-art techniques (source code: https://github.com/ayushbits/Semi-Supervised-LFs-Subset-Selection).
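A minimal numpy sketch of the joint-loss idea (our illustration, not the paper's model): combine a supervised loss on the small labelled set with a consensus loss on unlabelled points, where a simple majority vote over labelling functions stands in for a learned labelling-function aggregation. All names and the vote encoding are assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    def bce(p, y):
        p = np.clip(p, 1e-8, 1 - 1e-8)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def joint_loss(w, X_lab, y_lab, X_unlab, lf_votes, alpha=0.5):
        """Supervised loss on the labelled set plus a consensus loss on
        unlabelled points; consensus = majority vote of labelling functions
        (a stand-in for a learned aggregation model)."""
        sup = bce(sigmoid(X_lab @ w), y_lab)
        # lf_votes: (n_unlab, n_lfs) with entries in {0, 1} and -1 = abstain
        valid = lf_votes >= 0
        consensus = (np.sum(lf_votes * valid, axis=1) /
                     np.maximum(valid.sum(axis=1), 1)) > 0.5
        unsup = bce(sigmoid(X_unlab @ w), consensus.astype(float))
        return sup + alpha * unsup

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)
    X_lab = rng.normal(size=(20, 3)); y_lab = (rng.random(20) > 0.5).astype(float)
    X_unlab = rng.normal(size=(200, 3))
    lf_votes = rng.integers(-1, 2, size=(200, 4))   # 4 noisy labelling functions
    print("joint loss:", joint_loss(w, X_lab, y_lab, X_unlab, lf_votes))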
Submodular Combinatorial Information Measures with Applications in Machine Learning
Iyer, Rishabh, Khargonkar, Ninad, Bilmes, Jeff, Asnani, Himanshu
Information-theoretic quantities like entropy and mutual information have found numerous uses in machine learning. It is well known that there is a strong connection between these entropic quantities and submodularity since entropy over a set of random variables is submodular. In this paper, we study combinatorial information measures that generalize independence, (conditional) entropy, (conditional) mutual information, and total correlation defined over sets of (not necessarily random) variables. These measures strictly generalize the corresponding entropic measures since they are all parameterized via submodular functions that themselves strictly generalize entropy. Critically, we show that, unlike entropic mutual information in general, the submodular mutual information is actually submodular in one argument, holding the other fixed, for a large class of submodular functions whose third-order partial derivatives satisfy a non-negativity property. This turns out to include a number of practically useful cases such as the facility location and set-cover functions. We study specific instantiations of the submodular information measures on these, as well as the probabilistic coverage, graph-cut, and saturated coverage functions, and see that they all have mathematically intuitive and practically useful expressions. Regarding applications, we connect the maximization of submodular (conditional) mutual information to problems such as mutual-information-based, query-based, and privacy-preserving summarization -- and we connect optimizing the multi-set submodular mutual information to clustering and robust partitioning.
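As a small worked example (our own illustration, not code from the paper), the submodular mutual information can be written as I_f(A; B) = f(A) + f(B) - f(A ∪ B) and the conditional gain as f(A | B) = f(A ∪ B) - f(B); the snippet below evaluates both for the facility-location function on a toy similarity matrix.

    import numpy as np

    def facility_location(S, sim):
        """f(S) = sum_i max_{j in S} sim[i, j]; f(empty set) = 0."""
        if len(S) == 0:
            return 0.0
        return float(np.sum(np.max(sim[:, list(S)], axis=1)))

    def submodular_mi(A, B, f, **kw):
        """I_f(A; B) = f(A) + f(B) - f(A union B)."""
        return f(A, **kw) + f(B, **kw) - f(set(A) | set(B), **kw)

    def conditional_gain(A, B, f, **kw):
        """f(A | B) = f(A union B) - f(B)."""
        return f(set(A) | set(B), **kw) - f(B, **kw)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2))
    sim = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))  # RBF-style similarity

    A, B = {0, 1}, {1, 2, 3}
    print("I_f(A;B) =", submodular_mi(A, B, facility_location, sim=sim))
    print("f(A|B)   =", conditional_gain(A, B, facility_location, sim=sim))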
A Unified Framework of Robust Submodular Optimization
Iyer, Rishabh
In this paper, we study a unified framework of robust submodular optimization. We study this problem from both a minimization and a maximization perspective (previous work has only focused on variants of robust submodular maximization). We do this under a broad range of combinatorial constraints including cardinality, knapsack, and matroid constraints, as well as graph-based constraints such as cuts, paths, matchings, and trees. Furthermore, we also study robust submodular minimization and maximization under multiple submodular upper and lower bound constraints. We show that all these problems are motivated by important machine learning applications, including robust data subset selection, robust co-operative cuts, and robust co-operative matchings. In each case, we provide scalable approximation algorithms and also study hardness bounds. Finally, we empirically demonstrate the utility of our algorithms on real-world applications.
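One natural robust objective in this line of work is max over sets A (here under a cardinality constraint) of min_i f_i(A). The snippet below runs a plain greedy heuristic on that worst-case objective purely to illustrate the formulation; it is not one of the paper's algorithms and carries no approximation guarantee.

    import numpy as np

    def facility_location(S, sim):
        return 0.0 if len(S) == 0 else float(np.sum(np.max(sim[:, list(S)], axis=1)))

    def robust_greedy(ground, fns, k):
        """Greedy heuristic for max_{|A|<=k} min_i f_i(A): at each step add the
        element that most increases the worst-case (minimum) function value."""
        A = set()
        for _ in range(k):
            best, best_val = None, -np.inf
            for e in ground - A:
                val = min(f(A | {e}) for f in fns)
                if val > best_val:
                    best, best_val = e, val
            A.add(best)
        return A, best_val

    rng = np.random.default_rng(0)
    # Two "scenarios", each with its own similarity matrix (e.g. two data sources).
    sims = [np.abs(rng.normal(size=(8, 8))) for _ in range(2)]
    fns = [lambda S, s=s: facility_location(S, s) for s in sims]
    A, worst = robust_greedy(set(range(8)), fns, k=3)
    print("selected:", sorted(A), "worst-case value:", round(worst, 3))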
A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems
Iyer, Rishabh, Bilmes, Jeff
We consider large scale submodular optimization problems, where standard algorithms that treat the submodular function in the value oracle model do not scale. Given a set S ⊆ V, the gain of an element j ∉ S in the context of S is f(j | S) = f(S ∪ {j}) − f(S); a function f is submodular if it satisfies diminishing marginal returns, f(j | S) ≥ f(j | T) for all S ⊆ T and j ∉ T, and is monotone if f(j | S) ≥ 0 for all j ∉ S, S ⊆ V. While submodular functions naturally occur in a number of real world applications, they also admit simple iterative procedures such as greedy, local search, and majorization-minimization that come with theoretical guarantees, are efficient and easy to implement, and are therefore used increasingly often in large scale machine learning problems, whose dataset sizes often reach several hundreds of millions of examples. In this paper, we present a model called the precomputational complexity model, along with a unifying memoization based framework, which looks at the specific form of the given submodular function. A key ingredient in this framework is the notion of a precomputed statistic, which is maintained in the course of the algorithms. We show that we can easily integrate this idea into a large class of submodular optimization problems, including constrained and unconstrained submodular maximization, minimization, difference of submodular optimization, optimization with submodular constraints, and several other related optimization problems. Moreover, memoization can be integrated into both discrete and continuous submodular optimization algorithms.
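A small sketch of the precomputed-statistic idea for the facility-location function (our illustration, assuming the standard definition f(S) = sum_i max_{j in S} sim[i, j]): rather than re-evaluating f from scratch, the class below maintains the vector of current best similarities so that each gain query costs O(n).

    import numpy as np

    class MemoizedFacilityLocation:
        """Facility location f(S) = sum_i max_{j in S} sim[i, j], with the
        'current max per ground element' vector kept as a precomputed statistic."""

        def __init__(self, sim):
            self.sim = sim
            self.current_max = np.zeros(sim.shape[0])   # contribution of the empty set
            self.value = 0.0

        def gain(self, j):
            """f(S + j) - f(S) in O(n), using the precomputed statistic."""
            return float(np.sum(np.maximum(self.sim[:, j] - self.current_max, 0.0)))

        def add(self, j):
            """Commit element j and update the statistic."""
            self.value += self.gain(j)
            self.current_max = np.maximum(self.current_max, self.sim[:, j])

    rng = np.random.default_rng(0)
    sim = np.abs(rng.normal(size=(1000, 1000)))
    f = MemoizedFacilityLocation(sim)
    for _ in range(10):                      # naive greedy, just for illustration
        j = int(np.argmax([f.gain(j) for j in range(sim.shape[1])]))
        f.add(j)
    print("f(S) =", round(f.value, 2))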
A Unified Batch Online Learning Framework for Click Prediction
Iyer, Rishabh, Acharya, Nimit, Bompada, Tanuja, Charles, Denis, Manavoglu, Eren
We present a unified framework for Batch Online Learning (OL) for Click Prediction in Search Advertisement. Machine learning models, once deployed, show non-trivial accuracy and calibration degradation over time due to model staleness. It is therefore necessary to regularly update models, and to do so automatically. This paper presents two paradigms of Batch Online Learning, one which incrementally updates the model parameters via an early stopping mechanism, and another which does so through a proximal regularization. We argue how both these schemes naturally trade off between old and new data. We then theoretically and empirically show that these two seemingly different schemes are closely related. Through extensive experiments, we demonstrate the utility of our OL framework, how the two OL schemes relate to each other, and how they trade off between new and historical data. We then compare batch OL to full model retrains and show how online learning is more robust to data issues. We also demonstrate the long-term impact of Online Learning, the role of the initial model in OL, and the impact of delays in the update, and finally conclude with some implementation details and challenges in deploying a real-world online learning system in production. While this paper mostly focuses on the application of click prediction for search advertisement, we hope that the lessons learned here can carry over to other problem domains.
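A tiny numpy sketch of the proximal-regularization flavor of batch online learning (our simplification, not the production system): each refresh fits the new batch of data while penalizing distance from the previously deployed weights, which is what trades off new against historical data. The least-squares model and the regularization strength are illustrative assumptions.

    import numpy as np

    def proximal_update(w_old, X_new, y_new, lam=1.0):
        """Refresh a linear least-squares model on a new batch while staying close
        to the deployed weights: argmin_w ||X_new w - y_new||^2 + lam * ||w - w_old||^2."""
        d = X_new.shape[1]
        A = X_new.T @ X_new + lam * np.eye(d)
        b = X_new.T @ y_new + lam * w_old
        return np.linalg.solve(A, b)

    rng = np.random.default_rng(0)
    w_true = np.array([0.5, -1.0, 2.0])
    w = np.zeros(3)
    for day in range(5):                        # a new batch of logged data each day
        X = rng.normal(size=(200, 3))
        y = X @ w_true + 0.1 * rng.normal(size=200)
        w = proximal_update(w, X, y, lam=10.0)  # larger lam = trust history more
    print("deployed weights:", np.round(w, 3))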
Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization
Iyer, Rishabh, Halloran, John T., Wei, Kai
This paper introduces Jensen, an easily extensible and scalable toolkit for production-level machine learning and convex optimization. Jensen implements a framework of convex (or loss) functions, convex optimization algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, and Conjugate Gradient), and a family of machine learning classifiers and regressors (Logistic Regression, SVMs, Least Squares Regression, etc.). This framework makes it possible to deploy and train models with a few lines of code, and to extend and build upon it by integrating new loss functions and optimization algorithms.
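Jensen itself is a C++ toolkit, so the following is only a Python sketch of the same design pattern and not Jensen's actual API: a pluggable loss object exposing values and gradients, a generic optimizer that consumes it, and a train-in-a-few-lines usage example. All class and function names here are hypothetical.

    import numpy as np

    class LogisticLoss:
        """Pluggable loss: exposes value and gradient for any optimizer to consume."""
        def __init__(self, X, y, l2=1e-3):
            self.X, self.y, self.l2 = X, y, l2
        def value_and_grad(self, w):
            z = np.clip(self.X @ w, -30, 30)
            p = 1.0 / (1.0 + np.exp(-z))
            val = -np.mean(self.y * np.log(p + 1e-12) + (1 - self.y) * np.log(1 - p + 1e-12))
            grad = self.X.T @ (p - self.y) / len(self.y) + self.l2 * w
            return val + 0.5 * self.l2 * w @ w, grad

    def gradient_descent(loss, w0, lr=0.5, iters=200):
        """Pluggable optimizer: works with any object exposing value_and_grad."""
        w = w0.copy()
        for _ in range(iters):
            _, g = loss.value_and_grad(w)
            w -= lr * g
        return w

    # "Train a model in a few lines" usage pattern:
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4)); y = (X[:, 0] - X[:, 1] > 0).astype(float)
    w = gradient_descent(LogisticLoss(X, y), np.zeros(4))
    print("train accuracy:", np.mean(((X @ w) > 0) == y))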