
Collaborating Authors

Yuan, Zhuoning


CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

arXiv.org Artificial Intelligence

Recent years have witnessed a significant increase in the performance of Vision and Language tasks. Foundational Vision-Language Models (VLMs), such as CLIP, have been leveraged in multiple settings and demonstrated remarkable performance across several tasks. Such models excel at object-centric recognition yet learn text representations that seem invariant to word order, failing to compose known concepts in novel ways. However, no evidence exists that any VLM, including large-scale single-stream models such as GPT-4V, identifies compositions successfully. In this paper, we introduce a framework to significantly improve the ability of existing models to encode compositional language, with over 10% absolute improvement on compositionality benchmarks, while maintaining or improving the performance on standard object-recognition and retrieval benchmarks. Our code and pre-trained models are publicly available at https://github.com/netflix/clove.
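Frameworks of this kind typically fine-tune a contrastive VLM on composition-aware data and then patch the result back toward the pretrained weights so that object recognition is preserved. Below is a minimal PyTorch-style sketch of such weight patching only; the function name and the mixing coefficient alpha are illustrative placeholders, not the paper's API or tuned values.

import torch

def patch_weights(pretrained_state, finetuned_state, alpha=0.6):
    # Convex combination of two checkpoints, parameter by parameter.
    # alpha = 0 recovers the pretrained model; alpha = 1 keeps the
    # compositionality-fine-tuned model. 0.6 is a placeholder value.
    return {
        name: (1 - alpha) * pretrained_state[name] + alpha * finetuned_state[name]
        for name in pretrained_state
    }

# Usage sketch (model construction omitted):
# patched = patch_weights(pretrained.state_dict(), finetuned.state_dict())
# model.load_state_dict(patched)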


LibAUC: A Deep Learning Library for X-Risk Optimization

arXiv.org Artificial Intelligence

DL platforms such as PyTorch [36] have dramatically reduced the efforts of developers and researchers for implementing different DL methods without worrying about low-level computations (e.g., automatic differentiation, tensor operations, etc.). Based on these platforms, plenty of DL libraries have been developed for different purposes, which can be organized into different categories, including (i) supporting specific tasks [15, 35], e.g., TF-Ranking for LTR [35], VISSL for self-supervised learning (SSL) [15]; (ii) supporting specific data, e.g., DGL and DIG for graphs [31, 55]; (iii) supporting specific models [13, 58, 59], e.g., Transformers for transformer models [59]. However, it has been observed that these existing platforms and libraries encounter some unique challenges when solving some classical and emerging problems in AI, including classification for imbalanced data (CID), learning to rank (LTR), and contrastive learning of representations (CLR). In particular, prior works have observed that large mini-batch sizes are necessary to attain good performance for these problems [4, 5, 7, 37, 43, 46], which restricts the capabilities of these AI models in the real world.

The motivation of developing LibAUC is to address the convergence issues of existing libraries for solving these problems. In particular, existing libraries may not converge, or may require very large mini-batch sizes in order to attain good performance for these problems, due to the usage of the standard mini-batch technique in the empirical risk minimization (ERM) framework. Our library is for deep X-risk optimization (DXO), which has achieved great success in solving a variety of tasks for CID, LTR, and CLR. The contributions of this paper include: (1) it introduces a new mini-batch based pipeline for implementing DXO algorithms, which differs from the existing DL pipeline in the design of controlled data samplers and dynamic mini-batch losses; (2) it provides extensive benchmarking experiments for ablation studies and comparison with existing libraries. The LibAUC library features scalable performance for millions of items to be contrasted, faster and better convergence than existing libraries for optimizing X-risks, seamless PyTorch deployment, and versatile APIs for various loss optimization. Our library is available to the open source community.
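To make the "controlled data sampler" idea concrete, here is a generic PyTorch-style sketch of a sampler that fixes the positive:negative ratio within every mini-batch, which is the kind of control a mini-batch based DXO pipeline relies on. This is an illustration only; LibAUC's own sampler classes and their APIs differ.

import random
from torch.utils.data import Sampler

class ControlledRatioSampler(Sampler):
    # Yields indices so that every mini-batch contains a fixed fraction of
    # positives (oversampling them if needed), instead of the uniform
    # sampling used in standard ERM pipelines.
    def __init__(self, labels, batch_size, pos_ratio=0.5):
        self.pos = [i for i, y in enumerate(labels) if y == 1]
        self.neg = [i for i, y in enumerate(labels) if y != 1]
        self.n_pos = max(1, int(batch_size * pos_ratio))
        self.n_neg = batch_size - self.n_pos

    def __iter__(self):
        for _ in range(len(self.neg) // self.n_neg):
            batch = random.choices(self.pos, k=self.n_pos)   # with replacement
            batch += random.sample(self.neg, self.n_neg)     # without replacement
            random.shuffle(batch)
            yield from batch

    def __len__(self):
        return (len(self.neg) // self.n_neg) * (self.n_pos + self.n_neg)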


Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization

arXiv.org Artificial Intelligence

In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled and systematic manner for self-supervised learning. The common practice of using a global temperature parameter $\tau$ ignores the fact that "not all semantics are created equal": different anchor data may have different numbers of samples with similar semantics, especially when the data exhibits long tails. First, we propose a new robust contrastive loss inspired by distributionally robust optimization (DRO), which provides intuition about the effect of $\tau$ and a mechanism for automatic temperature individualization. Then, we propose an efficient stochastic algorithm for optimizing the robust contrastive loss with a provable convergence guarantee, without using large mini-batch sizes. Theoretical and experimental results show that our algorithm automatically learns a suitable $\tau$ for each sample. Specifically, samples with frequent semantics use large temperatures to keep local semantic structures, while samples with rare semantics use small temperatures to induce more separable features. Our method not only outperforms prior strong baselines (e.g., SimCLR, CLIP) on unimodal and bimodal datasets, with larger improvements on imbalanced data, but is also less sensitive to hyper-parameters. To the best of our knowledge, this is the first methodical approach to optimizing a contrastive loss with individualized temperatures.
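As a rough illustration of temperature individualization, the sketch below implements a DRO-flavored contrastive loss in which each anchor $i$ carries its own temperature $\tau_i$ scaling a log-mean-exp over its candidates, plus a radius penalty $\tau_i \rho$. This is a simplified autograd version under assumed names (rho and tau_min are hypothetical hyper-parameters); the paper's algorithm updates the temperatures with a dedicated stochastic scheme rather than plain backpropagation.

import math
import torch
import torch.nn.functional as F

def robust_contrastive_loss(anchors, candidates, tau, rho=0.3, tau_min=0.05):
    # anchors, candidates: (B, d) embeddings where candidates[i] is the
    # positive for anchors[i]; tau: (B,) per-anchor learnable temperatures.
    tau = tau.clamp_min(tau_min)                    # keep temperatures valid
    sim = F.normalize(anchors, dim=1) @ F.normalize(candidates, dim=1).T
    pos = sim.diag().unsqueeze(1)                   # similarity to own positive
    logits = (sim - pos) / tau.unsqueeze(1)         # (s_ij - s_i+) / tau_i
    # tau_i * log-mean-exp over candidates, plus the DRO penalty tau_i * rho
    loss = tau * (torch.logsumexp(logits, dim=1) - math.log(sim.size(1))) + rho * tau
    return loss.mean()

Small tau_i sharpens the distribution over negatives (more separable features for rare semantics); large tau_i flattens it (preserving local structure for frequent semantics), matching the behavior described above.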


Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning

arXiv.org Artificial Intelligence

In recent years, model-agnostic meta-learning (MAML) has become a popular research area. However, the stochastic optimization of MAML is still underdeveloped. Existing MAML algorithms rely on the "episode" idea by sampling a few tasks and data points to update the meta-model at each iteration. Nonetheless, these algorithms either fail to guarantee convergence with a constant mini-batch size or require processing a large number of tasks at every iteration, which is unsuitable for continual learning or cross-device federated learning where only a small number of tasks are available per iteration or per round. To address these issues, this paper proposes memory-based stochastic algorithms for MAML that converge with vanishing error. The proposed algorithms require sampling a constant number of tasks and data samples per iteration, making them suitable for the continual learning scenario. Moreover, we introduce a communication-efficient memory-based MAML algorithm for personalized federated learning in cross-device (with client sampling) and cross-silo (without client sampling) settings. Our theoretical analysis improves the optimization theory for MAML, and our empirical results corroborate our theoretical findings. Interested readers can access our code at https://github.com/bokun-wang/moml.
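The following sketch conveys the memory idea in a simplified, first-order form: a constant number of tasks is sampled per iteration, each task's inner variable is warm-started from a per-task memory rather than recomputed from scratch, and the memory is refreshed after adaptation. The names and the single-tensor parameterization are illustrative; the paper's algorithms include further details (e.g., error control for the meta-gradient) omitted here.

import torch

def moml_step(meta_w, tasks, memory, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    # tasks: constant-size list of (task_id, support_loss, query_loss), where
    # each loss function maps a parameter tensor to a scalar loss.
    # memory: dict task_id -> last inner solution for that task.
    meta_grad = torch.zeros_like(meta_w)
    for task_id, support_loss, query_loss in tasks:
        w = memory.get(task_id, meta_w).detach().clone().requires_grad_(True)
        for _ in range(inner_steps):                 # adapt on the support set
            g = torch.autograd.grad(support_loss(w), w)[0]
            w = (w - inner_lr * g).detach().requires_grad_(True)
        memory[task_id] = w.detach()                 # refresh the task's memory
        meta_grad += torch.autograd.grad(query_loss(w), w)[0]  # first-order meta-gradient
    return meta_w - outer_lr * meta_grad / len(tasks)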


Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity

arXiv.org Machine Learning

Deep AUC (area under the ROC curve) Maximization (DAM) has attracted much attention recently due to its great potential for imbalanced data classification. However, research on Federated Deep AUC Maximization (FDAM) is still limited. Compared with standard federated learning (FL) approaches that focus on decomposable minimization objectives, FDAM is more complicated because its minimization objective is non-decomposable over individual examples. In this paper, we propose improved FDAM algorithms for heterogeneous data by solving the popular non-convex strongly-concave min-max formulation of DAM in a distributed fashion. A striking result of this paper is that the communication complexity of the proposed algorithm is a constant, independent of both the number of machines and the accuracy level, which improves an existing result by orders of magnitude. Of independent interest, the proposed algorithm can also be applied to a class of non-convex strongly-concave min-max problems. The experiments demonstrate the effectiveness of our FDAM algorithm on benchmark datasets and on medical chest X-ray images from different organizations. Our experiments show that FDAM using data from multiple hospitals can improve the AUC score on testing data from a single hospital for detecting life-threatening diseases based on chest radiographs.
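At a high level, algorithms of this kind run several local stochastic gradient descent-ascent steps on the min-max objective at each machine and then average both the primal and the dual variables at the server, so communication happens once per round rather than once per step. The sketch below shows only this skeleton under assumed names; the paper's actual algorithms add correction terms for data heterogeneity that this illustration omits.

def fdam_round(clients, w, alpha, local_steps=32, lr_w=0.1, lr_a=0.1):
    # clients: list of (grad_w, grad_alpha) stochastic gradient oracles,
    # one pair per machine; w, alpha: current primal/dual tensors.
    local_w, local_a = [], []
    for grad_w, grad_alpha in clients:
        w_k, a_k = w.clone(), alpha.clone()
        for _ in range(local_steps):                 # communication-free local steps
            w_k = w_k - lr_w * grad_w(w_k, a_k)      # descent on the primal
            a_k = a_k + lr_a * grad_alpha(w_k, a_k)  # ascent on the dual
        local_w.append(w_k)
        local_a.append(a_k)
    # one communication round: average both primal and dual variables
    return sum(local_w) / len(local_w), sum(local_a) / len(local_a)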


Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification

arXiv.org Machine Learning

Deep AUC Maximization (DAM) is a paradigm for learning a deep neural network by maximizing the AUC score of the model on a dataset. Most previous works on AUC maximization focus on the optimization perspective by designing efficient stochastic algorithms, while studies of the generalization performance of DAM on difficult tasks are missing. In this work, we aim to make DAM more practical for interesting real-world applications (e.g., medical image classification). First, we propose a new margin-based surrogate loss function for the AUC score (named the AUC margin loss). It is more robust than the commonly used AUC square loss, while enjoying the same advantage in terms of large-scale stochastic optimization. Second, we conduct empirical studies of our DAM method on difficult medical image classification tasks, namely classification of chest X-ray images for identifying life-threatening diseases and classification of skin lesion images for identifying melanoma. Our DAM method has achieved great success on these difficult tasks, i.e., 1st place on the Stanford CheXpert competition (as of the paper submission date) and a top 1% rank (33 out of 3,314 teams) on the Kaggle 2020 Melanoma classification competition. We also conduct extensive ablation studies to demonstrate the advantages of the new AUC margin loss over the AUC square loss on benchmark datasets. To the best of our knowledge, this is the first work to make DAM succeed on large-scale medical image datasets.
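The AUC margin loss admits a compact mini-batch implementation: squared deviations of positive and negative scores from learnable centers a and b, plus a margin term handled by a maximization variable alpha. The sketch below uses batch means in place of expectations; the variable names follow common notation and the margin value is a placeholder, so treat it as an illustration rather than the paper's reference code.

import torch

def auc_margin_loss(scores, labels, a, b, alpha, margin=1.0):
    # scores: model outputs h(x); labels in {0, 1}; a, b, alpha: scalar tensors.
    # a, b are minimized jointly with the model; alpha is maximized (e.g., by
    # gradient ascent in a primal-dual optimizer).
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    return (((pos - a) ** 2).mean()                       # pull positives toward a
            + ((neg - b) ** 2).mean()                     # pull negatives toward b
            + 2 * alpha * (margin + neg.mean() - pos.mean())  # margin violation
            - alpha ** 2)                                 # keeps the max over alpha finite

Replacing the square loss's fixed offset with the margin term above is what makes the loss robust to well-separated easy examples while keeping the objective amenable to the same min-max stochastic optimization.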


Fast Objective & Duality Gap Convergence for Nonconvex-Strongly-Concave Min-Max Problems

arXiv.org Machine Learning

This paper focuses on stochastic methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their potential applications in deep learning (e.g., deep AUC maximization). However, most of the existing algorithms are slow in practice, and their analysis revolves around convergence to a nearly stationary point. We consider leveraging the Polyak-Łojasiewicz (PL) condition to design faster stochastic algorithms with stronger convergence guarantees. Although the PL condition has been utilized for designing many stochastic minimization algorithms, its application to non-convex min-max optimization remains rare. In this paper, we propose and analyze proximal epoch-based methods, and establish fast convergence in terms of both the primal objective gap and the duality gap. Our analysis is interesting in three aspects: (i) it is based on a novel Lyapunov function that consists of the primal objective gap and the duality gap of a regularized function; (ii) it only requires a weaker PL condition for establishing the primal objective convergence than that required for the duality gap convergence; (iii) it yields the optimal dependence on the accuracy level $\epsilon$, i.e., $O(1/\epsilon)$. We also make explicit the dependence on the problem parameters and explore regions of the weak convexity parameter that lead to improved dependence on condition numbers. Experiments on deep AUC maximization demonstrate the effectiveness of our methods: our method (MaxAUC) achieved an AUC of 0.922 on the private testing set of the CheXpert competition.
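For reference, the PL condition leaned on here is standard: a differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with parameter $\mu > 0$ if (stated in its textbook form, which may differ in constants from the paper's variant)

$$\|\nabla f(\mathbf{x})\|^2 \;\ge\; 2\mu\,\big(f(\mathbf{x}) - f^*\big) \qquad \text{for all } \mathbf{x}.$$

Under this condition, gradient-type methods can drive the objective gap itself to zero at fast rates even without convexity, which is the leverage the epoch-based restarting scheme extends to the min-max setting.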


Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks

arXiv.org Machine Learning

In this paper, we study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model. Although distributed learning techniques have been investigated extensively in deep learning, they are not directly applicable to stochastic AUC maximization with deep neural networks due to its striking differences from standard loss minimization problems (e.g., cross-entropy). Towards addressing this challenge, we propose and analyze a communication-efficient distributed optimization algorithm based on a non-convex concave reformulation of the AUC maximization problem, in which the communication of both the primal variable and the dual variable between each worker and the parameter server only occurs after multiple steps of gradient-based updates in each worker. Compared with the naive parallel version of an existing algorithm that computes stochastic gradients at individual machines and averages them for updating the model parameters, our algorithm requires far fewer communication rounds and still achieves a linear speedup in theory. To the best of our knowledge, this is the first work that solves the non-convex concave min-max problem for AUC maximization with deep neural networks in a communication-efficient distributed manner while maintaining the linear speedup property in theory. Our experiments on several benchmark datasets show the effectiveness of our algorithm and confirm our theory.
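A minimal sketch of this communication pattern, using torch.distributed for the periodic averaging (it assumes an initialized process group; the function name, constants, and gradient oracles are illustrative, not the paper's exact algorithm):

import torch.distributed as dist

def local_primal_dual_step(w, alpha, grad_w, grad_alpha, step, sync_every=16, lr=0.05):
    # Each worker updates (w, alpha) locally; all-reduce runs only every
    # `sync_every` steps, so communication rounds drop by that factor.
    w = w - lr * grad_w(w, alpha)               # local descent on the primal
    alpha = alpha + lr * grad_alpha(w, alpha)   # local ascent on the dual
    if step % sync_every == 0:                  # periodic communication round
        for t in (w, alpha):
            dist.all_reduce(t)                  # sum across workers
            t /= dist.get_world_size()          # then average
    return w, alpha

Note that both the primal and the dual variables are averaged at each synchronization, mirroring the abstract's description; syncing only the primal would break the min-max coupling.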


Stochastic AUC Maximization with Deep Neural Networks

arXiv.org Machine Learning

Stochastic AUC maximization has garnered increasing interest due to its better fit for imbalanced data classification. However, existing works are limited to stochastic AUC maximization with a linear predictive model, which restricts predictive power when dealing with extremely complex data. In this paper, we consider the stochastic AUC maximization problem with a deep neural network as the predictive model. Building on the saddle-point reformulation of a surrogate loss of AUC, the problem can be cast into a non-convex concave min-max problem. The main contribution of this paper is to make stochastic AUC maximization more practical for deep neural networks and big data, with theoretical insights as well. In particular, we propose to exploit the Polyak-Łojasiewicz (PL) condition, which has been proved and observed in deep learning, enabling us to develop new stochastic algorithms with an even faster convergence rate and a more practical step size scheme. An AdaGrad-style algorithm is also analyzed under the PL condition with an adaptive convergence rate. Our experimental results demonstrate the effectiveness of the proposed algorithms.
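The saddle-point reformulation referenced here (due to earlier work on stochastic AUC maximization with the square surrogate loss) turns the problem into a min-max over the network weights $\mathbf{w}$, two auxiliary scalars $a, b$, and a dual scalar $\alpha$. With $p$ the fraction of positives and $h_{\mathbf{w}}$ the network, it reads (up to notational differences from the paper):

$$\min_{\mathbf{w},a,b}\;\max_{\alpha}\;\; \mathbb{E}\Big[(1-p)\,(h_{\mathbf{w}}(\mathbf{x})-a)^2\,\mathbb{I}[y=1] + p\,(h_{\mathbf{w}}(\mathbf{x})-b)^2\,\mathbb{I}[y=-1] + 2(1+\alpha)\big(p\,h_{\mathbf{w}}(\mathbf{x})\,\mathbb{I}[y=-1] - (1-p)\,h_{\mathbf{w}}(\mathbf{x})\,\mathbb{I}[y=1]\big) - p(1-p)\,\alpha^2\Big].$$

The objective is non-convex in $\mathbf{w}$ (through the network) but concave, indeed strongly concave, in $\alpha$, which is exactly the min-max structure the proposed algorithms exploit.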


Why Does Stagewise Training Accelerate Convergence of Testing Error Over SGD?

arXiv.org Machine Learning

The stagewise training strategy is commonly used for learning neural networks: a stochastic algorithm (e.g., SGD) starts with a relatively large step size (aka learning rate) and geometrically decreases the step size after a number of iterations. It has been observed that stagewise SGD converges much faster than vanilla SGD with a polynomially decaying step size, in terms of both training error and testing error. But how to explain this phenomenon has been largely ignored by existing studies. This paper provides some theoretical evidence for this faster convergence. In particular, we consider the stagewise training strategy for minimizing empirical risk that satisfies the Polyak-Łojasiewicz (PL) condition, which has been observed/proved for neural networks and also holds for a broad family of convex functions. For convex loss functions and "nice-behaviored" non-convex loss functions that are close to a convex function (namely weakly convex functions), we establish faster convergence of stagewise training than vanilla SGD under the same condition, on both training error and testing error. Indeed, the proposed algorithm has additional favorable features that come with theoretical guarantees for the considered non-convex optimization problems, including the use of explicit algorithmic regularization at each stage, the use of the stagewise averaged solution for restarting, and returning the last stagewise averaged solution as the final solution. To differentiate it from the commonly used stagewise SGD, we refer to our algorithm as the stagewise regularized training algorithm, or Start. Of independent interest, the proved testing error bounds for a family of non-convex loss functions are dimensionality and norm independent.
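A condensed sketch of the Start scheme described above, with its three distinguishing features marked in comments. It assumes w is a PyTorch tensor and grad is a stochastic gradient oracle; the stage lengths, step sizes, and regularization strength gamma are placeholders, whereas the theory prescribes specific choices.

def start(grad, w, stages=5, lr0=0.1, steps0=100, gamma=1.0):
    # grad: stochastic gradient oracle for the empirical risk; w: initial tensor.
    lr, steps = lr0, steps0
    for _ in range(stages):
        w_ref, w_avg = w.clone(), 0.0
        for _ in range(steps):
            # feature 1: explicit algorithmic regularization toward the
            # stage reference point w_ref
            w = w - lr * (grad(w) + gamma * (w - w_ref))
            w_avg = w_avg + w / steps
        w = w_avg                       # feature 2: restart from the stagewise average
        lr, steps = lr / 2, steps * 2   # geometric step-size decrease per stage
    return w                            # feature 3: return the last stagewise average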