
Collaborating Authors

 Wang, Haoyue


ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

arXiv.org Artificial Intelligence

The impressive performance of Large Language Models (LLMs) across various natural language processing tasks comes at the cost of vast computational resources and storage requirements. One-shot pruning techniques offer a way to alleviate these burdens by removing redundant weights without the need for retraining. Yet, the massive scale of LLMs often forces current pruning approaches to rely on heuristics instead of optimization-based techniques, potentially resulting in suboptimal compression. In this paper, we introduce ALPS, an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned conjugate gradient-based post-processing step. Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency. ALPS substantially outperforms state-of-the-art methods in terms of the pruning objective and perplexity reduction, particularly for highly sparse models. On the OPT-30B model with 70% sparsity, ALPS achieves a 13% reduction in test perplexity on the WikiText dataset and a 19% improvement in zero-shot benchmark performance compared to existing methods.
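
The abstract summarizes the method at a high level. As a rough point of reference, a minimal sketch of the layer-wise pruning problem it targets, solved with a generic ADMM-style operator-splitting loop, could look like the code below; the objective form, the penalty parameter rho, and all names are illustrative assumptions, not ALPS's actual algorithm (which adds acceleration, convergence guarantees, and a preconditioned conjugate gradient post-processing step).

```python
# Hedged sketch: ADMM-style operator splitting for layer-wise one-shot pruning,
#   min_{What} ||W X - What X||_F^2   s.t.  What has at most k non-zeros.
# Illustrative only; variable names, rho, and the stopping rule are assumptions.
import numpy as np

def prune_layer_admm(W, X, k, rho=1.0, iters=100):
    """W: (m, d) dense weights, X: (d, n) calibration inputs, k: non-zeros kept."""
    m, d = W.shape
    H = X @ X.T                      # (d, d) Gram matrix of calibration inputs
    G = W @ H                        # fixed part of the least-squares term
    A = H + 0.5 * rho * np.eye(d)    # system solved in every V-update
    Z = W.copy()                     # sparse splitting variable
    U = np.zeros_like(W)             # scaled dual variable
    for _ in range(iters):
        # V-update: closed-form least-squares step, V (H + rho/2 I) = G + rho/2 (Z - U)
        V = np.linalg.solve(A, (G + 0.5 * rho * (Z - U)).T).T
        # Z-update: projection onto the sparsity constraint (keep the k
        # largest-magnitude entries of V + U; ties may keep a few extra)
        T = V + U
        thresh = np.partition(np.abs(T), -k, axis=None)[-k]
        Z = np.where(np.abs(T) >= thresh, T, 0.0)
        # dual update
        U = U + V - Z
    return Z

# toy usage
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((16, 64))
W_sparse = prune_layer_admm(W, X, k=40)
print(np.count_nonzero(W_sparse))
```

The splitting separates the smooth least-squares term (handled in closed form) from the combinatorial sparsity constraint (handled by projection), which is what makes each subproblem easy to vectorize and run in parallel on a GPU.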


On the Convergence of CART under Sufficient Impurity Decrease Condition

arXiv.org Machine Learning

The decision tree is a flexible machine learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we investigate the convergence rate of CART under a regression setting. First, we establish an upper bound on the prediction error of CART under a sufficient impurity decrease (SID) condition (Chi et al., 2022); our result improves upon the known result of Chi et al. (2022) under a similar assumption. Furthermore, we provide examples that demonstrate the error bound cannot be further improved by more than a constant or a logarithmic factor. Second, we introduce a set of easily verifiable sufficient conditions for the SID condition. Specifically, we demonstrate that the SID condition can be satisfied in the case of an additive model, provided that the component functions adhere to a "locally reverse Poincaré inequality". We discuss several well-known function classes in non-parametric estimation to illustrate the practical utility of this concept.
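
Since the analysis concerns the greedy recursive fitting that CART performs, it may help to recall what one greedy step looks like: the split below maximizes the impurity (within-node variance) decrease, the quantity the SID condition lower-bounds. This is a generic illustration, not code from the paper; the function names and single-split scope are assumptions.

```python
# Minimal sketch of one greedy CART regression split: pick the (feature, threshold)
# pair that maximizes the impurity decrease, where impurity is the sum of squared
# deviations within a node. CART applies this recursively to the two child nodes.
import numpy as np

def impurity(y):
    return np.var(y) * len(y)        # sum of squared deviations in the node

def best_split(X, y):
    """X: (n, p) features, y: (n,) responses. Returns (feature, threshold, decrease)."""
    best = (None, None, 0.0)
    parent = impurity(y)
    n, p = X.shape
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue             # no valid threshold between equal values
            decrease = parent - impurity(ys[:i]) - impurity(ys[i:])
            if decrease > best[2]:
                best = (j, 0.5 * (xs[i - 1] + xs[i]), decrease)
    return best

# toy usage
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(200)
print(best_split(X, y))
```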


Linear regression with partially mismatched data: local search with theoretical guarantees

arXiv.org Machine Learning

Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges. We propose and study a simple greedy local search algorithm for this optimization problem that enjoys strong theoretical guarantees and appealing computational performance. We prove that under a suitable scaling of the number of mismatched pairs compared to the number of samples and features, and certain assumptions on problem data, our local search algorithm converges to a nearly-optimal solution at a linear rate. In particular, in the noiseless case, our algorithm converges to the global optimal solution with a linear convergence rate. We also propose an approximate local search step that allows us to scale our approach to much larger instances. We conduct numerical experiments to gather further insights into our theoretical results and show promising performance gains compared to existing approaches.
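
As a concrete illustration of the problem setup, the sketch below alternates between refitting the least-squares coefficients and greedily swapping response pairs whenever a swap lowers the residual sum of squares. It is a plausible instance of greedy local search for this problem, not the paper's exact algorithm or its approximate local search step; the swap rule, stopping criterion, and all names are assumptions.

```python
# Hedged sketch: local search for linear regression with partially mismatched
# responses, i.e. jointly estimating beta and the permutation matching responses
# to rows of X. Illustrative only.
import numpy as np

def local_search(X, y, rounds=20):
    n, p = X.shape
    perm = np.arange(n)              # current guess of the mismatch permutation
    for _ in range(rounds):
        beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
        pred = X @ beta
        improved = False
        # greedy pairwise swaps: accept a swap if it reduces the residual sum of squares
        for i in range(n):
            for j in range(i + 1, n):
                cur = (y[perm[i]] - pred[i]) ** 2 + (y[perm[j]] - pred[j]) ** 2
                new = (y[perm[j]] - pred[i]) ** 2 + (y[perm[i]] - pred[j]) ** 2
                if new < cur - 1e-12:
                    perm[i], perm[j] = perm[j], perm[i]
                    improved = True
        if not improved:
            break
    return beta, perm

# toy usage: 10% of the responses are shuffled before fitting
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta_true = rng.standard_normal(5)
y = X @ beta_true + 0.01 * rng.standard_normal(100)
idx = rng.choice(100, size=10, replace=False)
y[idx] = y[rng.permutation(idx)]
beta_hat, _ = local_search(X, y)
print(np.linalg.norm(beta_hat - beta_true))
```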


Interior-point Methods Strike Back: Solving the Wasserstein Barycenter Problem

arXiv.org Machine Learning

Comparing, summarizing, and combining probability measures defined on a space is a fundamental task in statistics and machine learning. Given support points of probability measures in a metric space and a transportation cost function (e.g., the Euclidean distance), the Wasserstein distance defines a distance between two measures as the minimal transportation cost between them. This notion of distance leads to a host of important applications, including text classification [28], clustering [23, 24, 14], unsupervised learning [21], semi-supervised learning [44], statistics [36, 37, 46, 19], and others [5, 39, 45]. Given a set of measures in the same space, the 2-Wasserstein barycenter is defined as the measure minimizing the sum of squared 2-Wasserstein distances to all measures in the set. For example, if a set of images (with common structure but varying noise) is modeled as a set of probability measures, then the Wasserstein barycenter is a mixture of the images that share this common structure.
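
For reference, the objects described in this abstract can be written out explicitly: in the discrete setting the squared 2-Wasserstein distance is a linear program over transport plans, and the barycenter minimizes the average of these distances. The notation below is standard and not quoted from the paper.

```latex
% Squared 2-Wasserstein distance between discrete measures
% \mu = \sum_i a_i \delta_{x_i} and \nu = \sum_j b_j \delta_{y_j}:
W_2^2(\mu, \nu) = \min_{\Pi \ge 0} \Big\{ \textstyle\sum_{i,j} \Pi_{ij}\, \lVert x_i - y_j \rVert_2^2
  \;:\; \Pi \mathbf{1} = a, \; \Pi^{\top} \mathbf{1} = b \Big\}

% 2-Wasserstein barycenter of measures \mu_1, \dots, \mu_m:
\bar{\mu} \in \operatorname*{arg\,min}_{\nu} \; \frac{1}{m} \sum_{k=1}^{m} W_2^2(\nu, \mu_k)
```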