
Collaborating Authors

 Liao, Zhenyu


Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton

arXiv.org Machine Learning

A substantial body of work in machine learning (ML) and randomized numerical linear algebra (RandNLA) has exploited various sorts of random sketching methodologies, including random sampling and random projection, with much of the analysis using Johnson--Lindenstrauss and subspace embedding techniques. Recent studies have identified the issue of inversion bias -- the phenomenon that inverses of random sketches are not unbiased, despite the unbiasedness of the sketches themselves. This bias presents challenges for the use of random sketches in various ML pipelines, such as fast stochastic optimization, scalable statistical estimators, and distributed optimization. In the context of random projection, the inversion bias can be easily corrected for dense Gaussian projections (which are, however, too expensive for many applications). Recent work has shown how the inversion bias can be corrected for sparse sub-gaussian projections. In this paper, we show how the inversion bias can be corrected for random sampling methods, both uniform and non-uniform leverage-based, as well as for structured random projections, including those based on the Hadamard transform. Using these results, we establish problem-independent local convergence rates for sub-sampled Newton methods.
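
As a quick illustration of the phenomenon the abstract refers to, the following NumPy sketch (a toy example with invented sizes, not the paper's debiased estimator) subsamples rows of a tall Gaussian matrix and shows that averaging more independent sketched inverses does not shrink the error, i.e., the inversion bias is systematic rather than stochastic.

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 20, 200                       # tall matrix A, sketch of m rows (assumptions)

A = rng.standard_normal((n, d))
true_inv = np.linalg.inv(A.T @ A)

def avg_sketched_inverse(trials):
    # Average of (A_sk^T A_sk)^{-1} over independent uniform row-sampling sketches.
    acc = np.zeros((d, d))
    for _ in range(trials):
        rows = rng.choice(n, size=m, replace=False)
        A_sk = np.sqrt(n / m) * A[rows]       # rescaled so that E[A_sk^T A_sk] = A^T A
        acc += np.linalg.inv(A_sk.T @ A_sk)
    return acc / trials

for trials in (50, 500):
    rel = np.linalg.norm(avg_sketched_inverse(trials) - true_inv) / np.linalg.norm(true_inv)
    print(f"{trials:4d} averaged sketches, relative error: {rel:.3f}")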


The Breakdown of Gaussian Universality in Classification of High-dimensional Mixtures

arXiv.org Machine Learning

The assumption of Gaussian or Gaussian mixture data has been extensively exploited in a long series of precise performance analyses of machine learning (ML) methods, on large datasets having comparably numerous samples and features. To relax this restrictive assumption, subsequent efforts have been devoted to establishing "Gaussian equivalent principles" by studying scenarios of Gaussian universality, where the asymptotic performance of ML methods on non-Gaussian data remains unchanged when the data are replaced with Gaussian data having the same mean and covariance. Beyond the realm of Gaussian universality, there are few exact results on how the data distribution affects the learning performance. In this article, we provide a precise high-dimensional characterization of empirical risk minimization, for classification under a general mixture data setting of linear factor models that extends Gaussian mixtures. Gaussian universality is shown to break down under this setting, in the sense that the asymptotic learning performance depends on the data distribution beyond the class means and covariances. To clarify the limitations of Gaussian universality in classification of mixture data and to understand the impact of its breakdown, we specify conditions for Gaussian universality and discuss their implications for the choice of loss function.
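
The comparison the abstract describes can be mimicked numerically. The sketch below (a hedged toy setup with made-up dimensions, factor count, and noise level, using scikit-learn's LogisticRegression) trains the same classifier on a two-class linear-factor-model mixture with non-Gaussian (Rademacher) factors and on a Gaussian surrogate with matched class means and covariances; in this simple configuration the two test errors may remain close, and the paper characterizes exactly when and how they provably differ.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
p, k, n = 200, 5, 4000                         # dimension, factors, samples (assumptions)
mu = np.zeros(p)
mu[0] = 2.0                                    # class means are +/- mu
W = rng.standard_normal((p, k)) / np.sqrt(k)   # factor loading matrix

def factor_mixture(n):
    # Two-class mixture with non-Gaussian (Rademacher) latent factors.
    y = rng.choice([-1, 1], size=n)
    z = rng.choice([-1.0, 1.0], size=(n, k))
    x = y[:, None] * mu + z @ W.T + 0.5 * rng.standard_normal((n, p))
    return x, y

def gaussian_surrogate(n):
    # Gaussian data with the same class means and covariance W W^T + 0.25 I.
    y = rng.choice([-1, 1], size=n)
    x = y[:, None] * mu + rng.standard_normal((n, k)) @ W.T + 0.5 * rng.standard_normal((n, p))
    return x, y

for name, gen in [("factor-model mixture", factor_mixture), ("Gaussian surrogate  ", gaussian_surrogate)]:
    Xtr, ytr = gen(n)
    Xte, yte = gen(n)
    clf = LogisticRegression(C=1.0, max_iter=2000).fit(Xtr, ytr)
    print(f"{name}: test error = {1 - clf.score(Xte, yte):.3f}")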


"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

arXiv.org Machine Learning

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, so that they can be deployed on low-power IoT devices. In this paper, building upon recent advances in neural tangent kernel (NTK) and random matrix theory (RMT), we provide a novel compression approach to wide and fully-connected \emph{deep} neural nets. Specifically, we demonstrate that in the high-dimensional regime where the number of data points $n$ and their dimension $p$ are both large, and under a Gaussian mixture model for the data, there exists \emph{asymptotic spectral equivalence} between the NTK matrices for a large family of DNN models. This theoretical result enables "lossless" compression of a given DNN to be performed, in the sense that the compressed network yields asymptotically the same NTK as the original (dense and unquantized) network, with its weights and activations taking values \emph{only} in $\{ 0, \pm 1 \}$ up to a scaling. Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme, with code available at \url{https://github.com/Model-Compression/Lossless_Compression}.
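
A rough feel for the ternarization idea (not the paper's NTK-matching construction; the threshold and rescaling below are ad-hoc assumptions) can be obtained by quantizing the weights of one wide random layer to $\{0, \pm 1\}$ up to a scaling and comparing the resulting conjugate-kernel (Gram) matrices:

import numpy as np

rng = np.random.default_rng(2)
p, width, n = 128, 4096, 64                    # toy sizes (assumptions)
X = rng.standard_normal((n, p)) / np.sqrt(p)
relu = lambda t: np.maximum(t, 0.0)

W = rng.standard_normal((width, p))            # dense Gaussian weights
tau = 0.6745 * np.median(np.abs(W))            # ad-hoc ternarization threshold
W_tern = np.sign(W) * (np.abs(W) > tau)        # entries in {0, +1, -1}
scale = np.linalg.norm(W) / np.linalg.norm(W_tern)

F_dense = relu(X @ W.T)
F_tern = relu(X @ (scale * W_tern).T)
K_dense = F_dense @ F_dense.T / width          # conjugate-kernel (Gram) matrices
K_tern = F_tern @ F_tern.T / width

err = np.linalg.norm(K_dense - K_tern) / np.linalg.norm(K_dense)
print(f"relative CK difference after naive ternarization: {err:.3f}")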


Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

arXiv.org Artificial Intelligence

Deep equilibrium models (DEQs), a typical class of implicit neural networks, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis of the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove, in this setting, that the spectral behavior of these implicit CKs and NTKs depends on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Although derived here for Gaussian mixture data, empirical results show that the proposed theory and design principle also apply to popular real-world datasets.
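
The objects under study can be computed directly. The following sketch (toy sizes, random rather than matched weights) evaluates a DEQ-style layer $z = \tanh(Wz + Ux)$ by fixed-point iteration and compares the top eigenvalues of its conjugate kernel with those of a single explicit layer; the paper's design principle goes further and tunes the explicit layer so that the two spectra coincide asymptotically.

import numpy as np

rng = np.random.default_rng(3)
p, width, n = 100, 1000, 200                   # toy sizes (assumptions)
X = rng.standard_normal((n, p)) / np.sqrt(p)   # rows have norm ~ 1

U = rng.standard_normal((width, p))            # input map: pre-activations U x have unit variance
W = rng.standard_normal((width, width)) / (4 * np.sqrt(width))   # operator norm ~ 0.5, contraction

Z = np.zeros((width, n))
for _ in range(50):                            # contractive fixed-point iteration of the DEQ
    Z = np.tanh(W @ Z + U @ X.T)
K_deq = Z.T @ Z / width                        # implicit conjugate kernel

V = rng.standard_normal((width, p))            # one explicit layer (not matched to the DEQ)
F = np.tanh(V @ X.T)
K_explicit = F.T @ F / width

print("top-5 CK eigenvalues, DEQ     :", np.round(np.linalg.eigvalsh(K_deq)[-5:], 3))
print("top-5 CK eigenvalues, explicit:", np.round(np.linalg.eigvalsh(K_explicit)[-5:], 3))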


Robust and Communication-Efficient Federated Domain Adaptation via Random Features

arXiv.org Machine Learning

Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches typically focus on aligning the distributions between source and target domains by minimizing a distance between them (e.g., the MMD distance). Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to the FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol boasts communication complexity that is \emph{independent} of the sample size, while maintaining performance that is comparable to or even surpasses state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network conditions) of FedRF-TCA.
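
The communication argument can be illustrated with plain random Fourier features (a minimal sketch, not the FedRF-TCA protocol; sizes and the domain shift are invented): each client transmits only the $D$-dimensional mean of its feature map, so the message size is independent of the local sample size, and the server can still form an MMD-type distance between domains.

import numpy as np

rng = np.random.default_rng(4)
p, D = 50, 256                                 # data dimension, number of random features (assumptions)
Omega = rng.standard_normal((p, D))            # shared random Fourier projections (Gaussian kernel)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff_mean(X):
    # D-dimensional summary of a local dataset, whatever its sample size.
    return np.sqrt(2.0 / D) * np.cos(X @ Omega + b).mean(axis=0)

X_source = rng.standard_normal((10_000, p))          # large source client
X_target = rng.standard_normal((300, p)) + 0.5       # small, shifted target client

msg_s, msg_t = rff_mean(X_source), rff_mean(X_target)   # each message is D floats
mmd_sq = np.sum((msg_s - msg_t) ** 2)                    # MMD^2 between mean embeddings
print(f"floats communicated per client: {D}, estimated MMD^2: {mmd_sq:.4f}")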


On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

arXiv.org Machine Learning

Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high-dimensional equivalents of their conjugate kernels and neural tangent kernels. Building upon this, we establish the equivalence between implicit and explicit networks in high dimensions.
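
To make the notion of a high-dimensional equivalent concrete, the sketch below (shown for a single explicit tanh layer for brevity; the paper derives the analogous equivalents for implicit networks) compares the empirical conjugate kernel with the linear-plus-identity equivalent $a^2 X X^\top + (\nu - a^2) I_n$, whose coefficients $a$ and $\nu$ are Gaussian moments of the activation.

import numpy as np

rng = np.random.default_rng(5)
n, p, width = 300, 300, 8000                   # toy sizes (assumptions)
X = rng.standard_normal((n, p)) / np.sqrt(p)   # rows have norm ~ 1

F = np.tanh(X @ rng.standard_normal((p, width)))
K = F @ F.T / width                            # empirical conjugate kernel

z = rng.standard_normal(1_000_000)             # Monte-Carlo Gaussian moments of tanh
a = np.mean(1.0 - np.tanh(z) ** 2)             # E[sigma'(z)]
nu = np.mean(np.tanh(z) ** 2)                  # E[sigma(z)^2]
K_equiv = a ** 2 * (X @ X.T) + (nu - a ** 2) * np.eye(n)

rel = np.linalg.norm(K - K_equiv, 2) / np.linalg.norm(K, 2)
print(f"relative spectral-norm distance to the equivalent: {rel:.3f}")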


Analysis and Approximate Inference of Large and Dense Random Kronecker Graphs

arXiv.org Artificial Intelligence

Random graph models are playing an increasingly important role in science and industry, and find applications in a variety of fields ranging from social and traffic networks to recommendation systems and molecular genetics. In this paper, we perform an in-depth analysis of the random Kronecker graph model proposed in \cite{leskovec2010kronecker}, when the number of graph vertices $N$ is large. Building upon recent advances in random matrix theory, we show, in the dense regime, that the random Kronecker graph adjacency matrix follows approximately a signal-plus-noise model, with a low-rank (of order at most $\log N$) signal matrix that is linear in the graph parameters and a random noise matrix having a quarter-circle-form singular value distribution. This observation allows us to propose a ``denoise-and-solve'' meta algorithm to approximately infer the graph parameters, with reduced computational complexity and (asymptotic) performance guarantees. Numerical experiments on graph inference and graph classification, on both synthetic and real-world graphs, are provided to support the advantageous performance of the proposed approach.
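
A toy instance of this structure (the seed matrix, Kronecker power, and threshold below are assumptions, not the paper's experimental setup) can be generated in a few lines: the edge-probability matrix $P$ has only about $\log_2 N$ non-negligible singular values, while the centered adjacency $A - P$ contributes a bulk of smaller singular values.

import numpy as np
from functools import reduce

rng = np.random.default_rng(6)
Theta = np.array([[0.9, 0.6],
                  [0.6, 0.5]])                 # 2 x 2 seed matrix (assumption)
k = 10                                         # N = 2**k = 1024 vertices
P = reduce(np.kron, [Theta] * k)               # edge-probability matrix
A = (rng.uniform(size=P.shape) < P).astype(float)   # directed random Kronecker graph

s_signal = np.linalg.svd(P, compute_uv=False)
s_noise = np.linalg.svd(A - P, compute_uv=False)

eff_rank = int(np.sum(s_signal > 0.01 * s_signal[0]))
print(f"N = {P.shape[0]}, singular values of P above 1% of the top: {eff_rank}  (k + 1 = {k + 1})")
print(f"top singular value of the signal P: {s_signal[0]:.1f}, of the noise A - P: {s_noise[0]:.1f}")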


Random matrices in service of ML footprint: ternary random features with no performance loss

arXiv.org Machine Learning

In this article, we investigate the spectral behavior of random features kernel matrices of the type ${\bf K} = \mathbb{E}_{{\bf w}} \left[\sigma\left({\bf w}^{\sf T}{\bf x}_i\right)\sigma\left({\bf w}^{\sf T}{\bf x}_j\right)\right]_{i,j=1}^n$, with nonlinear function $\sigma(\cdot)$, data ${\bf x}_1, \ldots, {\bf x}_n \in \mathbb{R}^p$, and random projection vector ${\bf w} \in \mathbb{R}^p$ having i.i.d. entries. In a high-dimensional setting where the number of data points $n$ and their dimension $p$ are both large and comparable, we show, under a Gaussian mixture model for the data, that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d. (zero-mean and unit-variance) entries of ${\bf w}$, and only depends on $\sigma(\cdot)$ via its (generalized) Gaussian moments $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma'(z)]$ and $\mathbb{E}_{z\sim \mathcal N(0,1)}[\sigma''(z)]$. As a result, for any kernel matrix ${\bf K}$ of the form above, we propose a novel random features technique, called Ternary Random Feature (TRF), that (i) asymptotically yields the same limiting kernel as the original ${\bf K}$ in a spectral sense and (ii) can be computed and stored much more efficiently, by wisely tuning (in a data-dependent manner) the function $\sigma$ and the random vector ${\bf w}$, both taking values in $\{-1,0,1\}$. The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits of storage compared to classical random features such as random Fourier features, with $b$ the number of bits to store full-precision values. Moreover, our experiments on real data show that the substantial gains in computation and storage are accompanied by somewhat improved performance compared to state-of-the-art random features compression/quantization methods.
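
The universality statement, an eigenspectrum independent of the law of ${\bf w}$, can be checked empirically. The sketch below (a hedged toy check; the dead-zone threshold and sparsity level are arbitrary, and the paper's data-dependent tuning of $\sigma$ and ${\bf w}$ is not reproduced) compares the spectra of two random-features kernels built with the same ternary nonlinearity but with Gaussian versus ternary-valued projections; the agreement tightens as $n$, $p$, and the number of features grow.

import numpy as np

rng = np.random.default_rng(7)
n, p, m = 300, 300, 30_000                     # samples, dimension, random features (assumptions)
X = rng.standard_normal((n, p)) / np.sqrt(p)
sigma = lambda t: np.sign(t) * (np.abs(t) > 0.5)   # ternary nonlinearity (arbitrary threshold)

def rf_kernel(W):
    F = sigma(X @ W)                           # n x m random-feature matrix
    return F @ F.T / m

W_gauss = rng.standard_normal((p, m))
W_tern = rng.choice([-1.0, 0.0, 1.0], size=(p, m), p=[0.375, 0.25, 0.375])
W_tern /= np.sqrt(0.75)                        # zero mean, unit variance (the paper instead
                                               # absorbs this scale into sigma, keeping w ternary)

ev_g = np.linalg.eigvalsh(rf_kernel(W_gauss))
ev_t = np.linalg.eigvalsh(rf_kernel(W_tern))
print("top eigenvalues (Gaussian w):", np.round(ev_g[-3:], 3))
print("top eigenvalues (ternary  w):", np.round(ev_t[-3:], 3))
print(f"max spectral deviation / top eigenvalue: {np.max(np.abs(ev_g - ev_t)) / ev_g[-1]:.3f}")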


Hessian Eigenspectra of More Realistic Nonlinear Models

arXiv.org Machine Learning

Given an optimization problem, the Hessian matrix and its eigenspectrum can be used in many ways, ranging from designing more efficient second-order algorithms to performing model analysis and regression diagnostics. When nonlinear models and non-convex problems are considered, strong simplifying assumptions are often made to make Hessian spectral analysis more tractable. This leads to the question of how relevant the conclusions of such analyses are for more realistic nonlinear models. In this paper, we exploit deterministic equivalent techniques from random matrix theory to make a \emph{precise} characterization of the Hessian eigenspectra for a broad family of nonlinear models, including models that generalize the classical generalized linear models, without relying on strong simplifying assumptions used previously. We show that, depending on the data properties, the nonlinear response model, and the loss function, the Hessian can have \emph{qualitatively} different spectral behaviors: of bounded or unbounded support, with single- or multi-bulk, and with isolated eigenvalues on the left- or right-hand side of the bulk. By focusing on such a simple but nontrivial nonlinear model, our analysis takes a step forward to unveil the theoretical origin of many visually striking features observed in more complex machine learning models.
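
For a concrete instance of such spectra, the sketch below (a single-index tanh regression with squared loss and invented sizes, much simpler than the general family treated in the paper) assembles the exact empirical Hessian at a random point and reports its spectral edges and the fraction of negative eigenvalues.

import numpy as np

rng = np.random.default_rng(8)
n, p = 3000, 600                               # samples and dimension (assumptions)
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta_star = rng.standard_normal(p)
y = np.tanh(X @ beta_star) + 0.1 * rng.standard_normal(n)

beta = rng.standard_normal(p)                  # evaluate the Hessian away from the optimum
u = X @ beta
s, ds, dds = np.tanh(u), 1 - np.tanh(u) ** 2, -2 * np.tanh(u) * (1 - np.tanh(u) ** 2)
curv = ds ** 2 + (s - y) * dds                 # per-sample curvature of 0.5 * (tanh(u) - y)^2
H = (X.T * curv) @ X / n                       # exact empirical Hessian

eigs = np.linalg.eigvalsh(H)
print(f"Hessian spectral edges: [{eigs[0]:.3f}, {eigs[-1]:.3f}]")
print(f"fraction of negative eigenvalues: {np.mean(eigs < 0):.2%}")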


Sparse sketches with small inversion bias

arXiv.org Machine Learning

For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimization, when averaging multiple independently constructed estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices. We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, then after simple rescaling, the estimator $(\frac m{m-d}\tilde A^\top\tilde A)^{-1}$ is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt d/\epsilon)$. This implies that for $m=O(d)$, the inversion bias of this estimator is $O(1/\sqrt d)$, which is much smaller than the $\Theta(1)$ approximation error obtained as a consequence of the subspace embedding guarantee for sub-gaussian sketches. We then propose a new sketching technique, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings and data-aware leverage-based row sampling methods, to get $\epsilon$ inversion bias for sketch size $m=O(d\log d+\sqrt d/\epsilon)$ in time $O(\text{nnz}(A)\log n+md^2)$, where nnz is the number of non-zeros. The key techniques enabling our analysis include an extension of a classical inequality of Bai and Silverstein for random quadratic forms, which we call the Restricted Bai-Silverstein inequality; and anti-concentration of the Binomial distribution via the Paley-Zygmund inequality, which we use to prove a lower bound showing that leverage score sampling sketches generally do not achieve small inversion bias.
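
The debiasing effect of the $\frac m{m-d}$ rescaling for dense sketches can be verified numerically (a minimal sketch with made-up sizes; LESS embeddings and leverage-score sampling are not implemented here): averaging many independently sketched inverse estimates with the rescaling converges to $(A^\top A)^{-1}$, while the unrescaled average keeps a systematic bias.

import numpy as np

rng = np.random.default_rng(9)
n, d, m, q = 5000, 50, 200, 400                # A is n x d; sketch size m; q independent sketches

A = rng.standard_normal((n, d))
true_inv = np.linalg.inv(A.T @ A)

plain = np.zeros((d, d))
rescaled = np.zeros((d, d))
for _ in range(q):
    S = rng.standard_normal((m, n)) / np.sqrt(m)     # dense Gaussian (sub-gaussian) sketch
    G = (S @ A).T @ (S @ A)
    plain += np.linalg.inv(G) / q
    rescaled += np.linalg.inv(m / (m - d) * G) / q

err = lambda M: np.linalg.norm(M - true_inv) / np.linalg.norm(true_inv)
print(f"relative error, plain average        : {err(plain):.4f}")
print(f"relative error, m/(m-d)-rescaled avg : {err(rescaled):.4f}")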