Yang, Haizhao
Parsing the Language of Expression: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors
Huang, Sikai, Wen, Yixin Berry, Adusumilli, Tara, Choudhary, Kusum, Yang, Haizhao
Symbolic regression is essential for deriving interpretable expressions that elucidate complex phenomena by exposing the underlying mathematical and physical relationships in data. In this paper, we present an advanced symbolic regression method that integrates symbol priors from diverse scientific domains, including physics, biology, chemistry, and engineering, into the regression process. By systematically analyzing domain-specific expressions, we derive probability distributions over symbols to guide expression generation. We propose novel tree-structured recurrent neural networks (RNNs) that leverage these symbol priors, enabling domain knowledge to steer the learning process. Additionally, we introduce a hierarchical tree structure for representing expressions, in which unary and binary operators are organized to facilitate more efficient learning. To further accelerate training, we compile characteristic expression blocks from each domain and include them in the operator dictionary, providing relevant building blocks. Experimental results demonstrate that leveraging symbol priors significantly enhances the performance of symbolic regression, resulting in faster convergence and higher accuracy.
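To make the role of a symbol prior concrete, here is a minimal Python sketch that samples expression trees with operators drawn from a domain-specific probability distribution. The operator dictionary, the "physics" prior values, and the recursive sampler are illustrative assumptions, not the paper's tree-structured RNN.

```python
# Minimal sketch: biasing expression generation with a domain-specific symbol
# prior.  Operator dictionary, prior values, and sampler are assumptions made
# for illustration; they do not reproduce the paper's tree-structured RNN.
import random

OPERATORS = {
    "binary": ["+", "-", "*", "/"],
    "unary": ["sin", "exp", "log"],
    "leaf": ["x", "c"],
}

# Hypothetical prior for a "physics" corpus that favors products and sinusoids.
PHYSICS_PRIOR = {"+": 0.2, "-": 0.1, "*": 0.35, "/": 0.05,
                 "sin": 0.15, "exp": 0.1, "log": 0.05}

def sample_expression(depth=3):
    """Recursively sample an expression string, drawing operators from the prior."""
    if depth == 0:
        return random.choice(OPERATORS["leaf"])
    op = random.choices(list(PHYSICS_PRIOR), weights=list(PHYSICS_PRIOR.values()))[0]
    if op in OPERATORS["binary"]:
        return f"({sample_expression(depth - 1)} {op} {sample_expression(depth - 1)})"
    return f"{op}({sample_expression(depth - 1)})"

if __name__ == "__main__":
    random.seed(0)
    print(sample_expression())
```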
From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs
Bhatnagar, Rohan, Liang, Ling, Patel, Krish, Yang, Haizhao
Motivated by the remarkable success of artificial intelligence (AI) across diverse fields, the application of AI to solve scientific problems, often formulated as partial differential equations (PDEs), has garnered increasing attention. While most existing research concentrates on theoretical properties (such as well-posedness, regularity, and continuity) of the solutions, alongside direct AI-driven methods for solving PDEs, the challenge of uncovering symbolic relationships within these equations remains largely unexplored. In this paper, we propose leveraging large language models (LLMs) to learn such symbolic relationships. Our results demonstrate that LLMs can effectively predict the operators involved in PDE solutions by utilizing the symbolic information in the PDEs. Furthermore, we show that discovering these symbolic relationships can substantially improve both the efficiency and accuracy of the finite expression method for finding analytical approximations of PDE solutions, delivering a fully interpretable solution pipeline. This work opens new avenues for understanding the symbolic structure of scientific problems and advancing their solution processes.
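As a rough illustration of the intended workflow, the sketch below formats a PDE into a prompt and parses a list of predicted operators. The function query_llm is a hypothetical placeholder standing in for an actual language-model call, and the prompt wording is an assumption.

```python
# Hypothetical prompt-and-parse loop for asking an LLM which operators are
# likely to appear in a PDE's solution.  query_llm is a placeholder; in
# practice it would wrap a real language-model API.
def query_llm(prompt: str) -> str:
    # Canned response for illustration only.
    return "exp, sin"

def predict_solution_operators(pde: str) -> list[str]:
    prompt = (
        "List the unary operators most likely to appear in a closed-form or "
        f"finite-expression solution of the PDE below.\nPDE: {pde}\nOperators:"
    )
    return [tok.strip() for tok in query_llm(prompt).split(",") if tok.strip()]

if __name__ == "__main__":
    print(predict_solution_operators("u_t = u_xx - u"))
```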
Curse of Dimensionality in Neural Network Optimization
Na, Sanghoon, Yang, Haizhao
The curse of dimensionality in neural network optimization under the mean-field regime is studied. It is demonstrated that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ is an analog of the total number of optimization iterations. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, where the Lipschitz constant on $[-x,x]$ is bounded by $O(x^\delta)$ for any $x > 0$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2\delta)r}{d-2r}}$. To the best of our knowledge, this work is the first to analyze the impact of function smoothness on the curse of dimensionality in neural network optimization theory.
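As a concrete reading of the lower bound, with illustrative values $r = 1$ and $d = 20$ chosen here for exposition,
$$ t^{-\frac{4r}{d-2r}} \Big|_{r=1,\, d=20} = t^{-\frac{4}{18}} = t^{-\frac{2}{9}}, $$
so under this bound, halving the population risk requires increasing the iteration count $t$ by a factor of at least $2^{9/2} \approx 23$.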
PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport
Wu, Di, Liang, Ling, Yang, Haizhao
Optimal transport (OT) is a critical problem in optimization and machine learning, where accuracy and efficiency are paramount. Although entropic regularization and the Sinkhorn algorithm improve scalability, they frequently encounter numerical instability and slow convergence, especially when the regularization parameter is small. In this work, we introduce Proximal Iterations with Sparse Newton and Sinkhorn methods (PINS) to efficiently compute highly accurate solutions for large-scale OT problems. Rigorous theoretical analysis guarantees reduced computational complexity, achieved through overall sparsity, as well as global convergence. Our approach offers three key advantages: it achieves accuracy comparable to exact solutions, progressively accelerates each iteration for greater efficiency, and enhances robustness by reducing sensitivity to regularization parameters. Extensive experiments confirm these advantages, demonstrating superior performance compared to related methods.
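For context, the sketch below shows the standard entropic-regularized Sinkhorn iteration that PINS builds on. It is only a baseline illustration, not the PINS algorithm with its proximal iterations and sparse Newton refinement; the kernel computation also makes the small-regularization instability noted above easy to reproduce.

```python
# Plain Sinkhorn iteration for entropic OT (baseline only; not PINS).
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=500):
    """Return the entropic-regularized transport plan between histograms a and b."""
    K = np.exp(-C / reg)              # Gibbs kernel; underflows when reg is tiny
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # alternating scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    a = b = np.full(n, 1.0 / n)
    x = rng.random(n)
    C = (x[:, None] - x[None, :]) ** 2
    P = sinkhorn(a, b, C)
    print("transport cost:", float((P * C).sum()))
```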
Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks
Chen, Ke, Yi, Chugang, Yang, Haizhao
We study the implicit bias towards low-rank weight matrices when training neural networks (NNs) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is approximately a rank-two matrix. Empirically, we demonstrate that WD is a necessary condition for inducing this low-rank bias across both regression and classification tasks. Our work differs from previous studies in that our theoretical analysis does not rely on common assumptions regarding the training data distribution, the optimality of weight matrices, or specific training procedures. Furthermore, by leveraging the low-rank bias, we derive improved generalization error bounds and provide numerical evidence that better generalization can be achieved. Thus, our work offers both theoretical and empirical insights into the strong generalization performance of SGD when combined with WD.
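Below is a small numpy sketch of the kind of spectral diagnostic one could run to probe this bias; the two matrices are synthetic stand-ins for weight matrices obtained with and without weight decay, since the actual SGD-plus-WD training loop is not reproduced here.

```python
# Synthetic illustration of a rank diagnostic; W_wd and W_plain are stand-ins
# for weight matrices trained with and without weight decay, respectively.
import numpy as np

def effective_rank(W, tol=1e-2):
    """Count singular values above tol times the largest singular value."""
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
# Nearly rank-two matrix: two rank-one directions plus small noise.
W_wd = (np.outer(rng.standard_normal(256), rng.standard_normal(128))
        + np.outer(rng.standard_normal(256), rng.standard_normal(128))
        + 1e-3 * rng.standard_normal((256, 128)))
# Generic full-rank matrix.
W_plain = rng.standard_normal((256, 128))

print("effective rank (WD-like):", effective_rank(W_wd))
print("effective rank (generic):", effective_rank(W_plain))
```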
Solving High-Dimensional Partial Integral Differential Equations: The Finite Expression Method
Hardwick, Gareth, Liang, Senwei, Yang, Haizhao
In this paper, we introduce a new finite expression method (FEX) to solve high-dimensional partial integro-differential equations (PIDEs). This approach builds upon the original FEX and its inherent advantages with new advances: 1) a novel parameter-grouping method is proposed to reduce the number of coefficients in high-dimensional function approximation; 2) a Taylor series approximation method is implemented to significantly improve the computational efficiency and accuracy of evaluating the integral terms of PIDEs. The new FEX-based method, denoted FEX-PG to indicate the addition of the parameter grouping (PG) step to the algorithm, provides both high accuracy and interpretable numerical solutions, with the outcome being an explicit equation that facilitates an intuitive understanding of the underlying solution structures. These features are often absent in traditional methods, such as finite element methods (FEM) and finite difference methods, as well as in deep learning-based approaches. To benchmark our method against recent advances, we apply FEX-PG to benchmark PIDEs from the literature. In high-dimensional settings, FEX-PG exhibits strong and robust performance, achieving relative errors on the order of single-precision machine epsilon.
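The parameter-grouping idea can be illustrated by the toy sketch below, in which input dimensions share coefficients across a small number of groups; the grouping rule and the grouped linear form are assumptions made for illustration, not the construction used in FEX-PG.

```python
# Toy illustration of parameter grouping: 1000 input dimensions share only 4
# free coefficients.  The modulo grouping rule is an arbitrary choice here.
import numpy as np

def grouped_linear(x, group_coeffs, n_groups):
    """Evaluate sum_i a_{g(i)} x_i with coefficients shared within groups."""
    d = x.shape[-1]
    groups = np.arange(d) % n_groups       # dimension i is assigned to group i mod k
    return x @ group_coeffs[groups]        # expand k coefficients to d and contract

rng = np.random.default_rng(0)
d, k = 1000, 4
x = rng.random(d)
a = np.array([1.0, -0.5, 0.25, 2.0])       # one coefficient per group
print(grouped_linear(x, a, k))
```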
Accelerating Multi-Block Constrained Optimization Through Learning to Optimize
Liang, Ling, Austin, Cameron, Yang, Haizhao
Learning to Optimize (L2O) approaches, including algorithm unrolling, plug-and-play methods, and hyperparameter learning, have garnered significant attention and have been successfully applied to the Alternating Direction Method of Multipliers (ADMM) and its variants. However, the natural extension of L2O to multi-block ADMM-type methods remains largely unexplored. Such an extension is critical, as multi-block methods leverage the separable structure of optimization problems, offering substantial reductions in per-iteration complexity. Given that classical multi-block ADMM does not guarantee convergence, the Majorized Proximal Augmented Lagrangian Method (MPALM), which shares a similar form with multi-block ADMM and ensures convergence, is more suitable in this setting. Despite its theoretical advantages, MPALM's performance is highly sensitive to the choice of its penalty parameter. To address this limitation, we propose a novel L2O approach that adaptively selects this hyperparameter using supervised learning. We demonstrate the versatility and effectiveness of our method by applying it to the Lasso problem and the optimal transport problem. Our numerical results show that the proposed framework outperforms popular alternatives. Given its applicability to generic linearly constrained composite optimization problems, this work opens the door to a wide range of potential real-world applications.
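A minimal sketch of the supervised hyperparameter-selection idea follows: simple problem features are regressed onto good (log) penalty values assumed to have been found offline. The features, labels, and linear predictor are synthetic assumptions standing in for the learned model; the MPALM solver itself is not reproduced.

```python
# Synthetic sketch of learning a penalty parameter from problem features.
import numpy as np

rng = np.random.default_rng(0)
# Assumed training set: feature vectors and the best log-penalty found offline.
features = rng.random((200, 3))            # e.g. scaled problem size, sparsity, data norm
best_log_sigma = features @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)

# Least-squares linear predictor (stand-in for a small neural network).
X = np.hstack([features, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(X, best_log_sigma, rcond=None)

def predict_penalty(phi):
    """Map a new problem's feature vector to a penalty parameter."""
    return float(np.exp(np.append(phi, 1.0) @ w))

print(predict_penalty(np.array([0.5, 0.2, 0.7])))
```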
Vertex Exchange Method for a Class of Quadratic Programming Problems
Liang, Ling, Toh, Kim-Chuan, Yang, Haizhao
A vertex exchange method is proposed for solving the strongly convex quadratic program subject to the generalized simplex constraint. We conduct a rigorous convergence analysis for the proposed algorithm and demonstrate its essential role in solving some important classes of constrained convex optimization problems. To obtain a feasible initial point for executing the algorithm, we also present and analyze a highly efficient semismooth Newton method for computing the projection onto the generalized simplex. The excellent practical performance of the proposed algorithms is demonstrated by a set of extensive numerical experiments. Our theoretical and numerical results further motivate potential applications of the considered model and the proposed algorithms.
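For intuition, the sketch below projects a point onto the standard simplex by bisection on the dual variable, i.e., by solving the same one-dimensional equation that a semismooth Newton method targets; it is a simplified baseline and does not implement the paper's method for the generalized simplex.

```python
# Bisection on the dual variable for projection onto {x >= 0, sum(x) = 1}.
# Baseline illustration only; not the semismooth Newton method of the paper.
import numpy as np

def project_simplex(y, tol=1e-12):
    """Find tau with sum(max(y - tau, 0)) = 1 and return max(y - tau, 0)."""
    lo, hi = y.min() - 1.0, y.max()        # bracket: phi(lo) >= 1 >= phi(hi) = 0
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        phi = np.maximum(y - tau, 0.0).sum()
        if abs(phi - 1.0) < tol:
            break
        lo, hi = (tau, hi) if phi > 1.0 else (lo, tau)
    return np.maximum(y - tau, 0.0)

x = project_simplex(np.array([0.4, 1.2, -0.3, 0.9]))
print(x, x.sum())
```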
FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model
Song, Zezheng, Yuan, Jiaxin, Yang, Haizhao
In this paper, we propose a pre-trained foundation model \textbf{FMint} (\textbf{F}oundation \textbf{M}odel based on \textbf{In}i\textbf{t}ialization), designed to speed up large-scale simulations of various differential equations with high accuracy via error correction. Human-designed simulation algorithms excel at capturing the fundamental physics of engineering problems, but often need to balance the trade-off between accuracy and efficiency. While deep learning methods offer innovative solutions across numerous scientific fields, they frequently fall short in domain-specific knowledge. FMint bridges these gaps by conditioning on initial coarse solutions obtained from conventional human-designed algorithms and is trained to produce refined solutions for various differential equations. Built on the backbone of large language models, the model adapts the in-context learning scheme to learn a universal error correction method for dynamical systems from prompted sequences of coarse solutions. It is pre-trained on a corpus of 600K ordinary differential equations (ODEs), and we conduct extensive experiments on both in-distribution and out-of-distribution tasks. FMint outperforms various baselines on large-scale simulation and demonstrates its ability to generalize to unseen ODEs. Our approach achieves an accuracy improvement of 1 to 2 orders of magnitude over state-of-the-art dynamical system simulators, and delivers a 5X speedup compared to traditional numerical algorithms.
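The sketch below illustrates, under assumed choices of ODE and solvers, the kind of coarse-versus-fine trajectory pair from which an error-correction model such as FMint could be trained; it only prepares such data and does not reproduce the model.

```python
# Generate a coarse Euler trajectory and its correction toward a finer
# reference for a toy ODE; the ODE and step sizes are illustrative choices.
import numpy as np

def euler(f, y0, t0, t1, n_steps):
    """Explicit Euler trajectory of y' = f(t, y) on [t0, t1]."""
    ts = np.linspace(t0, t1, n_steps + 1)
    ys = [np.asarray(y0, dtype=float)]
    for i in range(n_steps):
        ys.append(ys[-1] + (ts[i + 1] - ts[i]) * f(ts[i], ys[-1]))
    return ts, np.array(ys)

f = lambda t, y: -y                                    # simple decay ODE y' = -y
t_coarse, y_coarse = euler(f, [1.0], 0.0, 5.0, 10)     # coarse simulation
_, y_fine = euler(f, [1.0], 0.0, 5.0, 10000)           # near-reference solution
y_ref = y_fine[::1000]                                 # restrict to the coarse time grid
correction = y_ref - y_coarse                          # target an error-correction model predicts
print("max correction magnitude:", float(np.abs(correction).max()))
```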
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Liang, Ling, Yang, Haizhao
We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL). To solve such an optimization problem, we apply and analyze the classical stochastic proximal gradient method. In particular, the method is shown to admit an $O(\epsilon^{-4})$ sample complexity for reaching an $\epsilon$-stationary point under standard conditions. Since the variance of the classical stochastic gradient estimator is typically large, which slows down convergence, we also apply an efficient stochastic variance-reduced proximal gradient method with an importance-sampling-based ProbAbilistic Gradient Estimator (PAGE). To the best of our knowledge, the application of this method represents a novel approach to the general regularized reward optimization problem. Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions. Our results on the stochastic (variance-reduced) proximal gradient method match the sample complexity of their most competitive counterparts under similar settings in the RL literature.
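The sketch below shows a proximal gradient loop with a PAGE-style estimator on a toy $\ell_1$-regularized least-squares problem; this is the plain finite-sum setting rather than the paper's non-oblivious RL setting, and all problem data, step sizes, and the refresh probability are synthetic choices.

```python
# PAGE-style proximal gradient on l1-regularized least squares (toy setting).
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, step, p = 200, 20, 0.05, 0.05, 0.1
A = rng.standard_normal((n, d))
b = A @ (rng.standard_normal(d) * (rng.random(d) < 0.3)) + 0.01 * rng.standard_normal(n)

grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])          # single-sample gradient
full_grad = lambda x: A.T @ (A @ x - b) / n             # full-batch gradient
prox = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # soft threshold

x = np.zeros(d)
g = full_grad(x)
for _ in range(3000):
    x_new = prox(x - step * g, step * lam)              # proximal gradient step
    i = rng.integers(n)
    if rng.random() < p:                                # PAGE: occasional full refresh
        g = full_grad(x_new)
    else:                                               # otherwise a cheap recursive update
        g = g + grad_i(x_new, i) - grad_i(x, i)
    x = x_new

print("objective:", 0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum())
```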