AITopics | Xu, Yangyang

Xu, Yangyang

DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction

Xu, Yangyang, Yang, Yibo, Zhang, Lefei

arXiv.org Artificial Intelligence, Mar-4-2023

Convolutional neural networks (CNNs) and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL). Most of the current studies on MTL solely rely on CNN or Transformer. In this work, we present a novel MTL model by combining both merits of deformable CNN and query-based Transformer for multi-task learning of dense prediction. Our method, named DeMT, is based on a simple and effective encoder-decoder architecture (i.e., deformable mixer encoder and task-aware transformer decoder). First, the deformable mixer encoder contains two types of operators: the channel-aware mixing operator leveraged to allow communication among different channels (i.e., efficient channel location mixing), and the spatial-aware deformable operator with deformable convolution applied to efficiently sample more informative spatial locations (i.e., deformed features). Second, the task-aware transformer decoder consists of the task interaction block and task query block. The former is applied to capture task interaction features via self-attention. The latter leverages the deformed features and task-interacted features to generate the corresponding task-specific feature through a query-based Transformer for corresponding task predictions. Extensive experiments on two dense image prediction datasets, NYUD-v2 and PASCAL-Context, demonstrate that our model uses fewer GFLOPs and significantly outperforms current Transformer- and CNN-based competitive models on a variety of metrics. The code is available at https://github.com/yangyangxu0/DeMT .

artificial intelligence, machine learning, transformer, (17 more...)

arXiv.org Artificial Intelligence

2301.03461

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization

Li, Zichong, Chen, Pin-Yu, Liu, Sijia, Lu, Songtao, Xu, Yangyang

arXiv.org Artificial Intelligence, Dec-19-2022

Many real-world problems not only have complicated nonconvex functional constraints but also use a large number of data points. This motivates the design of efficient stochastic methods on finite-sum or expectation constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e. smooth+nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) and a postprocessing step. Under certain regularity conditions (assumed also in existing works), to reach an $\varepsilon$-KKT point in expectation, we establish an oracle complexity result of $O(\varepsilon^{-5})$, which is better than the best-known $O(\varepsilon^{-6})$ result. Numerical experiments on the fairness constrained problem and the Neyman-Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.

artificial intelligence, machine learning, stoc-ialm, (14 more...)

arXiv.org Artificial Intelligence

2212.09513

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

When Neural Networks Fail to Generalize? A Model Sensitivity Perspective

Zhang, Jiajin, Chao, Hanqing, Dhurandhar, Amit, Chen, Pin-Yu, Tajer, Ali, Xu, Yangyang, Yan, Pingkun

arXiv.org Artificial Intelligence, Dec-1-2022

Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions. This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first try to understand when neural networks fail to generalize. We empirically ascertain a property of a model that correlates strongly with its generalization, which we coin "model sensitivity". Based on our analysis, we propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples can effectively suppress the sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses the state-of-the-art single-DG methods.

artificial intelligence, machine learning, sensitivity map, (16 more...)

arXiv.org Artificial Intelligence

2212.0085

Country: North America > United States (0.47)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Xie, Tengyang, Liu, Bo, Xu, Yangyang, Ghavamzadeh, Mohammad, Chow, Yinlam, Lyu, Daoming, Yoon, Daesub

Neural Information Processing Systems, Dec-31-2018

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Legendre-Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
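The Legendre-Fenchel step mentioned in this abstract can be illustrated with the standard conjugate identity for the square function. The sketch below is only an illustration: the reward variable $R$, the risk weight $\lambda > 0$, and the exact form of the objective $J$ are assumptions made here, not taken from the paper.

$$
\begin{aligned}
J &= \mathbb{E}[R] - \lambda\,\mathrm{Var}(R) = \mathbb{E}[R] - \lambda\,\mathbb{E}[R^2] + \lambda\,(\mathbb{E}[R])^2, \\
x^2 &= \max_{y \in \mathbb{R}} \,(2xy - y^2) \quad \text{(attained at } y = x\text{)}, \\
J &= \max_{y \in \mathbb{R}} \Big\{ \mathbb{E}[R] - \lambda\,\mathbb{E}[R^2] + \lambda\,\big(2y\,\mathbb{E}[R] - y^2\big) \Big\}.
\end{aligned}
$$

Under these assumptions the squared expectation, which cannot be estimated without bias from a single sample, is replaced by a term that is linear in $\mathbb{E}[R]$, so the policy parameters and the auxiliary variable $y$ can be updated alternately as two blocks.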

algorithm, health & medicine, optimization problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)

Industry:

Information Technology (0.74)
Health & Medicine (0.48)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Xie, Tengyang, Liu, Bo, Xu, Yangyang, Ghavamzadeh, Mohammad, Chow, Yinlam, Lyu, Daoming, Yoon, Daesub

Neural Information Processing Systems, Dec-31-2018

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Legendre-Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.

algorithm, artificial intelligence, optimization problem, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)

Industry:

Information Technology (0.74)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Markov Chain Block Coordinate Descent

Sun, Tao, Sun, Yuejiao, Xu, Yangyang, Yin, Wotao

arXiv.org Machine Learning, Nov-21-2018

The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision process, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish both sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.
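As a rough illustration of the block selection rule described in this abstract, the sketch below runs coordinate gradient descent on a small strongly convex quadratic, with the updated index following a lazy random walk on a ring. The test problem, the function names `markov_chain_bcd` and `objective`, the choice of chain, and the stepsize $1/A_{ii}$ are all assumptions chosen for illustration; the paper's own stepsize conditions may be more conservative.

```python
import random


def markov_chain_bcd(A, b, steps=20000, seed=0):
    """Minimize 0.5*x^T A x - b^T x with coordinate updates chosen by a Markov chain."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    i = 0  # current state of the chain = index of the coordinate to update
    for _ in range(steps):
        # Partial derivative of the objective with respect to coordinate i.
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        # Gradient step with stepsize 1/A[i][i] (assumed here; for a quadratic
        # this is exact minimization along coordinate i).
        x[i] -= grad_i / A[i][i]
        # Lazy random walk on a ring: stay put or move to a neighbouring index.
        # The selection is neither i.i.d. nor cyclic.
        i = (i + rng.choice((-1, 0, 1))) % n
    return x


def objective(A, b, x):
    """Evaluate 0.5*x^T A x - b^T x."""
    n = len(b)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


# Hypothetical test problem: a symmetric, strictly diagonally dominant matrix.
A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]

x = markov_chain_bcd(A, b)
# Largest entry of |A x - b|, which vanishes at the minimizer.
residual = max(abs(sum(A[i][j] * x[j] for j in range(4)) - b[i]) for i in range(4))
```

Because the walk only ever moves to an adjacent index, consecutive updates are strongly correlated, which is the situation that mixing-time arguments are meant to handle.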

artificial intelligence, machine learning, markov chain, (16 more...)

arXiv.org Machine Learning

1811.0899

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Liu, Bo, Xie, Tengyang, Xu, Yangyang, Ghavamzadeh, Mohammad, Chow, Yinlam, Lyu, Daoming, Yoon, Daesub

arXiv.org Machine Learning, Sep-9-2018

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proof. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.

algorithm, health & medicine, optimization problem, (19 more...)

arXiv.org Machine Learning

1809.02292

Country: North America (0.46)

Genre: Research Report (0.64)

Industry:

Information Technology (0.74)
Health & Medicine (0.48)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Hybrid Jacobian and Gauss-Seidel proximal block coordinate update methods for linearly constrained convex programming

Xu, Yangyang

arXiv.org Machine Learning, Jan-3-2018

Recent years have witnessed the rapid development of block coordinate update (BCU) methods, which are particularly suitable for problems involving large-sized data and/or variables. In optimization, BCU first appears as the coordinate descent method that works well for smooth problems or those with separable nonsmooth terms and/or separable constraints. When nonseparable constraints exist, BCU can be applied under primal-dual settings. In the literature, it has been shown that for weakly convex problems with nonseparable linear constraint, BCU with fully Gauss-Seidel updating rule may fail to converge and that with fully Jacobian rule can converge sublinearly. However, empirically the method with Jacobian update is usually slower than that with Gauss-Seidel rule. To maintain their advantages, we propose a hybrid Jacobian and Gauss-Seidel BCU method for solving linearly constrained multi-block structured convex programming, where the objective may have a nonseparable quadratic term and separable nonsmooth terms. At each primal block variable update, the method approximates the augmented Lagrangian function at an affine combination of the previous two iterates, and the affine mixing matrix with the desired properties can be chosen by solving a semidefinite program. We show that the hybrid method enjoys the same theoretical convergence guarantee as Jacobian BCU. In addition, we numerically demonstrate that the method can perform as well as the Gauss-Seidel method and better than a recently proposed randomized primal-dual BCU method.

artificial intelligence, convergence, optimization problem, (18 more...)

arXiv.org Machine Learning

1608.03928

Country:

Asia (0.14)
North America > United States > New York (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Accelerated Primal-Dual Proximal Block Coordinate Updating Methods for Constrained Convex Optimization

Xu, Yangyang, Zhang, Shuzhong

arXiv.org Machine Learning, Nov-20-2017

Block Coordinate Update (BCU) methods enjoy low per-update computational complexity because every time only one or a few block variables would need to be updated among possibly a large number of blocks. They are also easily parallelized and thus have been particularly popular for solving problems involving large-scale dataset and/or variables. In this paper, we propose a primal-dual BCU method for solving linearly constrained convex program in multi-block variables. The method is an accelerated version of a primal-dual algorithm proposed by the authors, which applies randomization in selecting block variables to update and establishes an $O(1/t)$ convergence rate under weak convexity assumption. We show that the rate can be accelerated to $O(1/t^2)$ if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we then show that the algorithm can be modified to achieve a linear rate of convergence. The numerical experiments show that the accelerated method performs stably with a single set of parameters while the original method needs to tune the parameters for different datasets in order to achieve a comparable level of performance.

artificial intelligence, objective, optimization problem, (17 more...)

arXiv.org Machine Learning

1702.05423

Country: Asia (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Peng, Zhimin, Xu, Yangyang, Yan, Ming, Yin, Wotao

arXiv.org Machine Learning, Nov-15-2017

Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for large unbounded delays. An explicit formula of stepsize that guarantees convergence is given depending on delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validness of our analysis and also show that the existing maximum-delay induced stepsize is too conservative, often slowing down the convergence of the algorithm.

artificial intelligence, convergence, machine learning, (13 more...)

arXiv.org Machine Learning

1612.04425

Country:

North America > United States > Michigan > Ingham County (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
