AITopics | Wang, Gang

Collaborating Authors

Wang, Gang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

Wang, Gang, Li, Bingcong, Giannakis, Georgios B.

arXiv.org Machine LearningSep-10-2019

Motivated by the widespread use of temporal-difference (TD-) and Q-learning algorithms in reinforcement learning, this paper studies a class of biased stochastic approximation (SA) procedures under a mild "ergodic-like" assumption on the underlying stochastic noise sequence. Building upon a carefully designed multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (for control of the gradient bias), we prove a general result on the convergence of the iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel looking-ahead viewpoint renders finite-time analysis of biased SA algorithms under a large family of stochastic perturbations possible. For direct comparison with existing contributions, we also demonstrate these bounds by applying them to TD- and Q-learning with linear function approximation, under the practical Markov chain observation model. The resultant finite-time error bound for both the TD- as well as the Q-learning algorithms is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without making any modifications to the parameter updates) using even nonlinear function approximators; as well as for Markov chains ii) under general mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.

artificial intelligence, null 2, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1909.04299

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Real-time Power System State Estimation and Forecasting via Deep Neural Networks

Zhang, Liang, Wang, Gang, Giannakis, Georgios B.

arXiv.org Machine LearningNov-14-2018

Contemporary smart power grids are being challenged by rapid voltage fluctuations, due to large-scale deployment of renewable generation, electric vehicles, and demand response programs. In this context, monitoring the grid's operating conditions in real time becomes increasingly critical. With the emergent large scale and nonconvexity however, past optimization based power system state estimation (PSSE) schemes are computationally expensive or yield suboptimal performance. To bypass these hurdles, this paper advocates deep neural networks (DNNs) for real-time power system monitoring. By unrolling a state-of-the-art prox-linear SE solver, a novel modelspecific DNN is developed for real-time PSSE, which entails a minimal tuning effort, and is easy to train. To further enable system awareness even ahead of the time horizon, as well as to endow the DNN-based estimator with resilience, deep recurrent neural networks (RNNs) are pursued for power system state forecasting. Deep RNNs exploit the long-term nonlinear dependencies present in the historical voltage time series to enable forecasting, and they are easy to implement. Numerical tests showcase improved performance of the proposed DNN-based estimation and forecasting approaches compared with existing alternatives. Empirically, the novel model-specific DNN-based PSSE offers nearly an order of magnitude improvement in performance over competing alternatives, including the widely adopted Gauss-Newton PSSE solver, in our tests using real load data on the IEEE 118-bus benchmark system.

deep learning, forecasting, neural network, (16 more...)

arXiv.org Machine Learning

1811.06146

Country:

North America > United States (0.68)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.84)

Industry:

Machinery > Industrial Machinery (1.00)
Energy > Power Industry (1.00)
Transportation > Ground > Road (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Wang, Gang, Giannakis, Georgios B., Chen, Jie

arXiv.org Machine LearningAug-14-2018

Neural networks with ReLU activations have achieved great empirical success in various domains. However, existing results for learning ReLU networks either pose assumptions on the underlying data distribution being e.g. Gaussian, or require the network size and/or training size to be sufficiently large. In this context, the problem of learning a two-layer ReLU network is approached in a binary classification setting, where the data are linearly separable and a hinge loss criterion is adopted. Leveraging the power of random noise, this contribution presents a novel stochastic gradient descent (SGD) algorithm, which can provably train any single-hidden-layer ReLU network to attain global optimality, despite the presence of infinitely many bad local minima and saddle points in general. This result is the first of its kind, requiring no assumptions on the data distribution, training/network size, or initialization. Convergence of the resultant iterative algorithm to a global minimum is analyzed by establishing both an upper bound and a lower bound on the number of effective (non-zero) updates to be performed. Furthermore, generalization guarantees are developed for ReLU networks trained with the novel SGD. These guarantees highlight a fundamental difference (at least in the worst case) between learning a ReLU network as well as a leaky ReLU network in terms of sample complexity. Numerical tests using synthetic data and real images validate the effectiveness of the algorithm and the practical merits of the theory.

deep learning, neural network, relu network, (18 more...)

arXiv.org Machine Learning

1808.04685

Country:

Europe (1.00)
North America > United States > California (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Chen, Jia, Wang, Gang, Giannakis, Georgios B.

arXiv.org Machine LearningMay-14-2018

Principal component analysis (PCA) is widely used for feature extraction and dimensionality reduction, with documented merits in diverse tasks involving high-dimensional data. Standard PCA copes with one dataset at a time, but it is challenged when it comes to analyzing multiple datasets jointly. In certain data science settings however, one is often interested in extracting the most discriminative information from one dataset of particular interest (a.k.a. target data) relative to the other(s) (a.k.a. background data). To this end, this paper puts forth a novel approach, termed discriminative (d) PCA, for such discriminative analytics of multiple datasets. Under certain conditions, dPCA is proved to be least-squares optimal in recovering the component vector unique to the target data relative to background data. To account for nonlinear data correlations, (linear) dPCA models for one or multiple background datasets are generalized through kernel-based learning. Interestingly, all dPCA variants admit an analytical solution obtainable with a single (generalized) eigenvalue decomposition. Finally, corroborating dimensionality reduction tests using both synthetic and real datasets are provided to validate the effectiveness of the proposed methods.

artificial intelligence, health & medicine, target data, (18 more...)

arXiv.org Machine Learning

1805.05502

Country:

North America > Canada (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.84)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)

Add feedback

Canonical Correlation Analysis of Datasets with a Common Source Graph

Chen, Jia, Wang, Gang, Shen, Yanning, Giannakis, Georgios B.

arXiv.org Machine LearningMar-27-2018

Canonical correlation analysis (CCA) is a powerful technique for discovering whether or not hidden sources are commonly present in two (or more) datasets. Its well-appreciated merits include dimensionality reduction, clustering, classification, feature selection, and data fusion. The standard CCA however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-) correlations. In this paper, this extra information provided by the common sources generating the data is encoded in a graph, and is invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, that is termed graph (g) CCA. The novel gCCA accounts for the graph-induced knowledge of common sources, while minimizing the distance between the wanted canonical variables. Tailored for diverse practical settings where the number of data is smaller than the data vector dimensions, the dual formulation of gCCA is also developed. One such setting includes kernels that are incorporated to account for nonlinear data dependencies. The resultant graph-kernel (gk) CCA is also obtained in closed form. Finally, corroborating image classification tests over several real datasets are presented to showcase the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.

health & medicine, neurology, vector, (17 more...)

arXiv.org Machine Learning

1803.10309

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

Gu, Jiuxiang (Nanyang Technological University) | Cai, Jianfei (Nanyang Technological University) | Wang, Gang (Alibaba AI Labs) | Chen, Tsuhan (Nanyang Technological University)

AAAI ConferencesFeb-8-2018

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervisions. Particularly, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder's test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that our approach can achieve the state-of-the-art performance.

decoder, deep learning, neural network, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback

Solving Most Systems of Random Quadratic Equations

Wang, Gang, Giannakis, Georgios, Saad, Yousef, Chen, Jie

Neural Information Processing SystemsDec-31-2017

This paper deals with finding an $n$-dimensional solution $\bm{x}$ to a system of quadratic equations $y_i=|\langle\bm{a}_i,\bm{x}\rangle|^2$, $1\le i \le m$, which in general is known to be NP-hard. We put forth a novel procedure, that starts with a \emph{weighted maximal correlation initialization} obtainable with a few power iterations, followed by successive refinements based on \emph{iteratively reweighted gradient-type iterations}. The novel techniques distinguish themselves from prior works by the inclusion of a fresh (re)weighting regularization. For certain random measurement models, the proposed procedure returns the true solution $\bm{x}$ with high probability in time proportional to reading the data $\{(\bm{a}_i;y_i)\}_{1\le i \le m}$, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns, that is, $m\ge cn$. Empirically, the upshots of this contribution are: i) perfect signal recovery in the high-dimensional regime given only an \emph{information-theoretic limit number} of equations; and, ii) (near-)optimal statistical accuracy in the presence of additive noise. Extensive numerical tests using both synthetic data and real images corroborate its improved signal recovery performance and computational efficiency relative to state-of-the-art approaches.

artificial intelligence, optimization problem, vector, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

DPCA: Dimensionality Reduction for Discriminative Analytics of Multiple Large-Scale Datasets

Wang, Gang, Chen, Jia, Giannakis, Georgios B.

arXiv.org Machine LearningOct-25-2017

Principal component analysis (PCA) has well-documented merits for data extraction and dimensionality reduction. PCA deals with a single dataset at a time, and it is challenged when it comes to analyzing multiple datasets. Yet in certain setups, one wishes to extract the most significant information of one dataset relative to other datasets. Specifically, the interest may be on identifying, namely extracting features that are specific to a single target dataset but not the others. This paper develops a novel approach for such so-termed discriminative data analysis, and establishes its optimality in the least-squares (LS) sense under suitable data modeling assumptions. The criterion reveals linear combinations of variables by maximizing the ratio of the variance of the target data to that of the remainders. The novel approach solves a generalized eigenvalue problem by performing SVD just once. Numerical tests using synthetic and real datasets showcase the merits of the proposed approach relative to its competing alternatives.

artificial intelligence, health & medicine, target data, (18 more...)

arXiv.org Machine Learning

1710.09429

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.91)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow

Wang, Gang, Giannakis, Georgios B., Eldar, Yonina C.

arXiv.org Machine LearningAug-20-2017

This paper presents a new algorithm, termed \emph{truncated amplitude flow} (TAF), to recover an unknown vector $\bm{x}$ from a system of quadratic equations of the form $y_i=|\langle\bm{a}_i,\bm{x}\rangle|^2$, where $\bm{a}_i$'s are given random measurement vectors. This problem is known to be \emph{NP-hard} in general. We prove that as soon as the number of equations is on the order of the number of unknowns, TAF recovers the solution exactly (up to a global unimodular constant) with high probability and complexity growing linearly with both the number of unknowns and the number of equations. Our TAF approach adopts the \emph{amplitude-based} empirical loss function, and proceeds in two stages. In the first stage, we introduce an \emph{orthogonality-promoting} initialization that can be obtained with a few power iterations. Stage two refines the initial estimate by successive updates of scalable \emph{truncated generalized gradient iterations}, which are able to handle the rather challenging nonconvex and nonsmooth amplitude-based objective function. In particular, when vectors $\bm{x}$ and $\bm{a}_i$'s are real-valued, our gradient truncation rule provably eliminates erroneously estimated signs with high probability to markedly improve upon its untruncated version. Numerical tests using synthetic data and real images demonstrate that our initialization returns more accurate and robust estimates relative to spectral initializations. Furthermore, even under the same initialization, the proposed amplitude-based refinement outperforms existing Wirtinger flow variants, corroborating the superior performance of TAF over state-of-the-art algorithms.

artificial intelligence, optimization problem, probability, (19 more...)

arXiv.org Machine Learning

1605.08285

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Add feedback

Solving Almost all Systems of Random Quadratic Equations

Wang, Gang, Giannakis, Georgios B., Saad, Yousef, Chen, Jie

arXiv.org Machine LearningMay-29-2017

This paper deals with finding an $n$-dimensional solution $x$ to a system of quadratic equations of the form $y_i=|\langle{a}_i,x\rangle|^2$ for $1\le i \le m$, which is also known as phase retrieval and is NP-hard in general. We put forth a novel procedure for minimizing the amplitude-based least-squares empirical loss, that starts with a weighted maximal correlation initialization obtainable with a few power or Lanczos iterations, followed by successive refinements based upon a sequence of iteratively reweighted (generalized) gradient iterations. The two (both the initialization and gradient flow) stages distinguish themselves from prior contributions by the inclusion of a fresh (re)weighting regularization technique. The overall algorithm is conceptually simple, numerically scalable, and easy-to-implement. For certain random measurement models, the novel procedure is shown capable of finding the true solution $x$ in time proportional to reading the data $\{(a_i;y_i)\}_{1\le i \le m}$. This holds with high probability and without extra assumption on the signal $x$ to be recovered, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns in the signal vector, namely, $m>cn$. Empirically, the upshots of this contribution are: i) (almost) $100\%$ perfect signal recovery in the high-dimensional (say e.g., $n\ge 2,000$) regime given only an information-theoretic limit number of noiseless equations, namely, $m=2n-1$ in the real-valued Gaussian case; and, ii) (nearly) optimal statistical accuracy in the presence of additive noise of bounded support. Finally, substantial numerical tests using both synthetic data and real images corroborate markedly improved signal recovery performance and computational efficiency of our novel procedure relative to state-of-the-art approaches.

artificial intelligence, equation, optimization problem, (18 more...)

arXiv.org Machine Learning

1705.10407

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > Promising Solution (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback