AITopics | Nguyen, Tan Minh

Collaborating Authors

Nguyen, Tan Minh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Revisiting Kernel Attention with Correlated Gaussian Process Representation

Bui, Long Minh, Huu, Tho Tran, Dinh, Duy, Nguyen, Tan Minh, Hoang, Trong Nghia

arXiv.org Artificial IntelligenceFeb-27-2025

Transformers have increasingly become the de facto method to model sequential data with state-of-the-art performance. Due to its widespread use, being able to estimate and calibrate its modeling uncertainty is important to understand and design robust transformer models. To achieve this, previous works have used Gaussian processes (GPs) to perform uncertainty calibration for the attention units of transformers and attained notable successes. However, such approaches have to confine the transformers to the space of symmetric attention to ensure the necessary symmetric requirement of their GP's kernel specification, which reduces the representation capacity of the model. To mitigate this restriction, we propose the Correlated Gaussian Process Transformer (CGPT), a new class of transformers whose self-attention units are modeled as cross-covariance between two correlated GPs (CGPs). This allows asymmetries in attention and can enhance the representation capacity of GP-based transformers. We also derive a sparse approximation for CGP to make it scale better. Our empirical studies show that both CGP-based and sparse CGP-based transformers achieve better performance than state-of-the-art GP-based transformers on a variety of benchmark tasks. The code for our experiments is available at https://github.com/MinhLong210/CGP-Transformers.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.20525

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback

A Clifford Algebraic Approach to E(n)-Equivariant High-order Graph Neural Networks

Tran, Hoang-Viet, Vo, Thieu N., Huu, Tho Tran, Nguyen, Tan Minh

arXiv.org Machine LearningOct-6-2024

Designing neural network architectures that can handle data symmetry is crucial. This is especially important for geometric graphs whose properties are equivariance under Euclidean transformations. Current equivariant graph neural networks (EGNNs), particularly those using message passing, have a limitation in expressive power. Recent high-order graph neural networks can overcome this limitation, yet they lack equivariance properties, representing a notable drawback in certain applications in chemistry and physical sciences. In this paper, we introduce the Clifford Group Equivariant Graph Neural Networks (CG-EGNNs), a novel EGNN that enhances high-order message passing by integrating high-order local structures in the context of Clifford algebras. As a key benefit of using Clifford algebras, CG-EGNN can learn functions that capture equivariance from positional features. By adopting the high-order message passing mechanism, CG-EGNN gains richer information from neighbors, thus improving model performance. Furthermore, we establish the universality property of the $k$-hop message passing framework, showcasing greater expressive power of CG-EGNNs with additional $k$-hop message passing mechanism. We empirically validate that CG-EGNNs outperform previous methods on various benchmarks including n-body, CMU motion capture, and MD17, highlighting their effectiveness in geometric deep learning.

artificial intelligence, machine learning, neural network, (13 more...)

arXiv.org Machine Learning

2410.04692

Country:

Europe (0.14)
Asia (0.14)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Equivariant Neural Functional Networks for Transformers

Tran, Viet-Hoang, Vo, Thieu N., The, An Nguyen, Huu, Tho Tran, Nguyen-Nhat, Minh-Khoi, Tran, Thanh, Pham, Duy-Tung, Nguyen, Tan Minh

arXiv.org Artificial IntelligenceOct-5-2024

This paper systematically explores neural functional networks (NFN) for transformer architectures. NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data and have proven valuable for tasks such as learnable optimizers, implicit data representations, and weight editing. While NFN have been extensively developed for MLP and CNN, no prior work has addressed their design for transformers, despite the importance of transformers in modern deep learning. This paper aims to address this gap by providing a systematic study of NFN for transformers. We first determine the maximal symmetric group of the weights in a multi-head attention module as well as a necessary and sufficient condition under which two sets of hyperparameters of the multi-head attention module define the same function. We then define the weight space of transformer architectures and its associated group action, which leads to the design principles for NFN in transformers. Based on these, we introduce Transformer-NFN, an NFN that is equivariant under this group action. Additionally, we release a dataset of more than 125,000 Transformers model checkpoints trained on two datasets with two different tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance.

machine learning, natural language, pseudocode, (19 more...)

arXiv.org Artificial Intelligence

2410.04209

Country:

North America > United States > Texas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Equivariant Polynomial Functional Networks

Vo, Thieu N., Tran, Viet-Hoang, Huu, Tho Tran, The, An Nguyen, Tran, Thanh, Nguyen-Nhat, Minh-Khoi, Pham, Duy-Tung, Nguyen, Tan Minh

arXiv.org Artificial IntelligenceOct-5-2024

Neural Functional Networks (NFNs) have gained increasing interest due to their wide range of applications, including extracting information from implicit representations of data, editing network weights, and evaluating policies. A key design principle of NFNs is their adherence to the permutation and scaling symmetries inherent in the connectionist structure of the input neural networks. Recent NFNs have been proposed with permutation and scaling equivariance based on either graph-based message-passing mechanisms or parameter-sharing mechanisms. However, graph-based equivariant NFNs suffer from high memory consumption and long running times. On the other hand, parameter-sharing-based NFNs built upon equivariant linear layers exhibit lower memory consumption and faster running time, yet their expressivity is limited due to the large size of the symmetric group of the input neural networks. The challenge of designing a permutation and scaling equivariant NFN that maintains low memory consumption and running time while preserving expressivity remains unresolved. In this paper, we propose a novel solution with the development of MAGEP-NFN (Monomial mAtrix Group Equivariant Polynomial NFN). Our approach follows the parameter-sharing mechanism but differs from previous works by constructing a nonlinear equivariant layer represented as a polynomial in the input weights. This polynomial formulation enables us to incorporate additional relationships between weights from different input hidden layers, enhancing the model's expressivity while keeping memory consumption and running time low, thereby addressing the aforementioned challenge. We provide empirical evidence demonstrating that MAGEP-NFN achieves competitive performance and efficiency compared to existing baselines.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.04213

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Demystifying the Token Dynamics of Deep Selective State Space Models

Vo, Thieu N, Pham, Tung D., Tong, Xin T., Nguyen, Tan Minh

arXiv.org Artificial IntelligenceOct-4-2024

Selective state space models (SSM), such as Mamba, have gained prominence for their effectiveness in modeling sequential data. Despite their outstanding empirical performance, a comprehensive theoretical understanding of deep selective SSM remains elusive, hindering their further development and adoption for applications that need high fidelity. In this paper, we investigate the dynamical properties of tokens in a pre-trained Mamba model. In particular, we derive the dynamical system governing the continuous-time limit of the Mamba model and characterize the asymptotic behavior of its solutions. In the one-dimensional case, we prove that only one of the following two scenarios happens: either all tokens converge to zero, or all tokens diverge to infinity. We provide criteria based on model parameters to determine when each scenario occurs. For the convergent scenario, we empirically verify that this scenario negatively impacts the model's performance. For the divergent scenario, we prove that different tokens will diverge to infinity at different rates, thereby contributing unequally to the updates during model training. Based on these investigations, we propose two refinements for the model: excluding the convergent scenario and reordering tokens based on their importance scores, both aimed at improving practical performance. Our experimental results validate these refinements, offering insights into enhancing Mamba's effectiveness in real-world applications.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.03292

Country: North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Designing Robust Transformers using Robust Kernel Density Estimation

Han, Xing, Ren, Tongzheng, Nguyen, Tan Minh, Nguyen, Khai, Ghosh, Joydeep, Ho, Nhat

arXiv.org Artificial IntelligenceNov-8-2023

Transformer-based architectures have recently exhibited remarkable successes across different domains beyond just powering large language models. However, existing approaches typically focus on predictive accuracy and computational cost, largely ignoring certain other practical issues such as robustness to contaminated samples. In this paper, by re-interpreting the self-attention mechanism as a non-parametric kernel density estimator, we adapt classical robust kernel density estimation methods to develop novel classes of transformers that are resistant to adversarial attacks and data contamination. We first propose methods that down-weight outliers in RKHS when computing the self-attention operations. We empirically show that these methods produce improved performance over existing state-of-the-art methods, particularly on image data under adversarial attacks. Then we leverage the median-of-means principle to obtain another efficient approach that results in noticeably enhanced performance and robustness on language modeling and time series classification tasks. Our methods can be combined with existing transformers to augment their robust properties, thus promising to impact a wide variety of applications.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2210.05794

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > Promising Solution (0.66)

Industry:

Information Technology > Security & Privacy (0.68)
Government > Military (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback