Collaborating Authors: Yan, Fanqi


Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective

arXiv.org Artificial Intelligence

Transformer models [54] are widely regarded as the state-of-the-art architecture for a wide range of machine learning and deep learning applications, including language modeling [16, 3, 47, 51], computer vision [17, 4, 46, 35], and reinforcement learning [5, 31, 25]. One of the central components behind the success of Transformer models is the self-attention mechanism, which enables sequence-to-sequence models to concentrate on relevant parts of the input data. In particular, for each token in an input sequence, the self-attention mechanism computes a context vector formulated as a weighted sum of the tokens, where tokens more relevant to the context are assigned larger weights than others (see Section 2.1 for a formal definition). Self-attention is therefore able to capture long-range dependencies and complex relationships within the data. However, since the weights in the context vector are normalized by the softmax function, there may be an undesirable competition among the tokens: an increase in the weight of one token forces a decrease in the weights of the others. As a consequence, the traditional softmax self-attention mechanism might focus only on a few aspects of the data and ignore other informative features [48]. Additionally, [22] discovered that the interdependence among attention scores induced by the softmax normalization partly causes the attention sink phenomenon.
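
To make the contrast concrete, here is a minimal NumPy sketch (an illustration of the general idea, not the paper's implementation): sigmoid attention replaces the row-wise softmax with an element-wise sigmoid, so each weight lies in (0, 1) independently and the tokens no longer compete through a shared normalizer.

    import numpy as np

    def softmax_attention(Q, K, V):
        # Softmax self-attention: each query's weights sum to 1, so
        # raising one token's weight necessarily lowers the others'.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ V

    def sigmoid_attention(Q, K, V):
        # Sigmoid self-attention: weights are computed element-wise,
        # removing the competition introduced by softmax normalization.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        w = 1.0 / (1.0 + np.exp(-scores))
        return w @ V

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))                      # 5 tokens, dim 8
    Wq, Wk = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
    print(softmax_attention(X @ Wq, X @ Wk, X))
    print(sigmoid_attention(X @ Wq, X @ Wk, X))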


Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts

arXiv.org Machine Learning

We conduct a convergence analysis of parameter estimation in the contaminated mixture of experts. This model is motivated by the prompt learning problem, in which one utilizes prompts, formulated as experts, to fine-tune a large-scale pre-trained model for downstream tasks. Two fundamental challenges emerge from the analysis: (i) the mixture proportion between the pre-trained model and the prompt may converge to zero, in which case the prompt vanishes during training; (ii) algebraic interactions between the parameters of the pre-trained model and those of the prompt can occur via a partial differential equation and decelerate prompt learning. In response, we introduce a distinguishability condition to control this parameter interaction. Additionally, we consider various types of expert structures to understand their effects on parameter estimation. In each scenario, we provide comprehensive convergence rates of parameter estimation along with the corresponding minimax lower bounds.
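
For concreteness, a schematic form of the contaminated model (this is our reading of the setup; the notation is illustrative rather than the paper's own): a frozen pre-trained expert $h_0$ is mixed with a prompt expert $f(\cdot \mid x; \theta)$ through a proportion $\lambda \in [0, 1]$,

$$p_{\lambda, \theta}(y \mid x) = (1 - \lambda)\, h_0(y \mid x) + \lambda\, f(y \mid x; \theta).$$

Challenge (i) then corresponds to the regime $\lambda \to 0$, where the prompt component vanishes from the likelihood, and challenge (ii) to the partial derivatives of $p_{\lambda, \theta}$ with respect to $\lambda$ and $\theta$ becoming linearly dependent, which the distinguishability condition is designed to prevent.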


Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts

arXiv.org Machine Learning

Top-K sparse softmax gating mixture of experts has been widely used to scale up massive deep-learning architectures without increasing the computational cost. Despite its popularity in real-world applications, the theoretical understanding of this gating function has remained an open problem. The main challenge comes from the structure of the top-K sparse softmax gating function, which partitions the input space into multiple regions with distinct behaviors. By focusing on a Gaussian mixture of experts, we establish theoretical results on the effects of the top-K sparse softmax gating function on both density and parameter estimation. Our results hinge upon defining novel loss functions among parameters to capture the distinct behaviors of the input regions. When the true number of experts $k_{\ast}$ is known, we demonstrate that the convergence rates of density and parameter estimation are both parametric in the sample size. However, when $k_{\ast}$ is unknown and the true model is over-specified by a Gaussian mixture of $k$ experts with $k > k_{\ast}$, our findings suggest that the number of experts selected by the top-K sparse softmax gating function must exceed the total cardinality of a certain number of Voronoi cells associated with the true parameters to guarantee the convergence of the density estimation. Moreover, while the density estimation rate remains parametric under this setting, the parameter estimation rates become substantially slower due to an intrinsic interaction between the softmax gating and expert functions.
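
The gating mechanism itself is easy to state in code. The sketch below (in the standard style of sparsely-gated mixture-of-experts layers, with illustrative notation not taken from the paper) keeps the K largest gating logits, renormalizes them with a softmax, and zeroes out all other experts; varying the input moves it across regions where different expert subsets are active.

    import numpy as np

    def topk_sparse_softmax_gate(x, W, k):
        # One logit per expert; keep the k largest and softmax over them.
        logits = W @ x
        topk = np.argsort(logits)[-k:]          # indices of the k largest
        gates = np.zeros_like(logits)
        z = np.exp(logits[topk] - logits[topk].max())
        gates[topk] = z / z.sum()               # softmax restricted to top k
        return gates                            # all other experts get 0

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)                      # input, dimension 4
    W = rng.normal(size=(8, 4))                 # 8 experts
    print(topk_sparse_softmax_gate(x, W, k=2))  # exactly 2 nonzero gates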


Hybrid Probabilistic Inference with Logical Constraints: Tractability and Message-Passing

arXiv.org Artificial Intelligence

Weighted model integration (WMI) is a very appealing framework for probabilistic inference: it can express the complex dependencies of real-world hybrid scenarios, where variables are heterogeneous in nature (both continuous and discrete), via the language of Satisfiability Modulo Theories (SMT), and it can compute probabilistic queries with arbitrarily complex logical constraints. Recent work has shown that WMI inference is reducible, under some assumptions, to a model integration (MI) problem, effectively enabling hybrid probabilistic reasoning via volume computations. In this paper, we introduce a novel formulation of MI via a message-passing scheme that efficiently computes the marginal densities and statistical moments of all the variables in linear time. As such, we are able to amortize inference for arbitrarily rich MI queries when they conform to the problem structure, here represented as the primal graph associated with the SMT formula. Furthermore, we theoretically trace the tractability boundaries of exact MI: we prove that the structural requirements on the primal graph that make our MI algorithm tractable, namely bounded diameter and treewidth, are not only sufficient but also necessary for tractable inference via MI.
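
As a toy illustration of the reduction (not the paper's message-passing algorithm, and with made-up constraints), MI over the SMT formula $(0 \le x \le 1) \wedge (0 \le y \le 1) \wedge (x + y \le 1)$ amounts to computing the volume of its solution set, and a marginal density follows by integrating out the remaining variable:

    import sympy as sp

    x, y = sp.symbols('x y')

    # Volume of the region 0 <= x <= 1, 0 <= y <= 1 - x (i.e. x + y <= 1).
    volume = sp.integrate(sp.integrate(1, (y, 0, 1 - x)), (x, 0, 1))
    print(volume)                       # 1/2

    # Marginal density of x: integrate out y only, then normalize.
    marginal_x = sp.integrate(1, (y, 0, 1 - x)) / volume
    print(sp.expand(marginal_x))        # 2 - 2*x, supported on [0, 1]

The message-passing scheme in the paper amortizes exactly this kind of computation: once messages are propagated along the primal graph, every marginal or moment query that conforms to the graph structure is answered without re-integrating from scratch.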