AITopics | Tran, Thanh

Collaborating Authors

Tran, Thanh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Equivariant Neural Functional Networks for Transformers

Tran, Viet-Hoang, Vo, Thieu N., The, An Nguyen, Huu, Tho Tran, Nguyen-Nhat, Minh-Khoi, Tran, Thanh, Pham, Duy-Tung, Nguyen, Tan Minh

arXiv.org Artificial IntelligenceOct-5-2024

This paper systematically explores neural functional networks (NFN) for transformer architectures. NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data and have proven valuable for tasks such as learnable optimizers, implicit data representations, and weight editing. While NFN have been extensively developed for MLP and CNN, no prior work has addressed their design for transformers, despite the importance of transformers in modern deep learning. This paper aims to address this gap by providing a systematic study of NFN for transformers. We first determine the maximal symmetric group of the weights in a multi-head attention module as well as a necessary and sufficient condition under which two sets of hyperparameters of the multi-head attention module define the same function. We then define the weight space of transformer architectures and its associated group action, which leads to the design principles for NFN in transformers. Based on these, we introduce Transformer-NFN, an NFN that is equivariant under this group action. Additionally, we release a dataset of more than 125,000 Transformers model checkpoints trained on two datasets with two different tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance.

machine learning, natural language, pseudocode, (19 more...)

arXiv.org Artificial Intelligence

2410.04209

Country:

North America > United States > Texas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Equivariant Polynomial Functional Networks

Vo, Thieu N., Tran, Viet-Hoang, Huu, Tho Tran, The, An Nguyen, Tran, Thanh, Nguyen-Nhat, Minh-Khoi, Pham, Duy-Tung, Nguyen, Tan Minh

arXiv.org Artificial IntelligenceOct-5-2024

Neural Functional Networks (NFNs) have gained increasing interest due to their wide range of applications, including extracting information from implicit representations of data, editing network weights, and evaluating policies. A key design principle of NFNs is their adherence to the permutation and scaling symmetries inherent in the connectionist structure of the input neural networks. Recent NFNs have been proposed with permutation and scaling equivariance based on either graph-based message-passing mechanisms or parameter-sharing mechanisms. However, graph-based equivariant NFNs suffer from high memory consumption and long running times. On the other hand, parameter-sharing-based NFNs built upon equivariant linear layers exhibit lower memory consumption and faster running time, yet their expressivity is limited due to the large size of the symmetric group of the input neural networks. The challenge of designing a permutation and scaling equivariant NFN that maintains low memory consumption and running time while preserving expressivity remains unresolved. In this paper, we propose a novel solution with the development of MAGEP-NFN (Monomial mAtrix Group Equivariant Polynomial NFN). Our approach follows the parameter-sharing mechanism but differs from previous works by constructing a nonlinear equivariant layer represented as a polynomial in the input weights. This polynomial formulation enables us to incorporate additional relationships between weights from different input hidden layers, enhancing the model's expressivity while keeping memory consumption and running time low, thereby addressing the aforementioned challenge. We provide empirical evidence demonstrating that MAGEP-NFN achieves competitive performance and efficiency compared to existing baselines.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.04213

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

Lee, Jooyoung, Yang, Fan, Tran, Thanh, Hu, Qian, Barut, Emre, Chang, Kai-Wei, Su, Chengwei

arXiv.org Artificial IntelligenceApr-4-2024

We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The Frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in the sense that it only requires training the lightweight LM. We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method with multi-hop extractive question answering (QA) benchmarks, HotpotQA, and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines regarding answer prediction accuracy. We also find that reinforcement learning helps the model to produce higher-quality rationales with improved QA performance.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2404.03414

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)

Add feedback

CAPTAIN at COLIEE 2023: Efficient Methods for Legal Information Retrieval and Entailment Tasks

Nguyen, Chau, Nguyen, Phuong, Tran, Thanh, Nguyen, Dat, Trieu, An, Pham, Tin, Dang, Anh, Nguyen, Le-Minh

arXiv.org Artificial IntelligenceJan-7-2024

The Competition on Legal Information Extraction/Entailment (COLIEE) is held annually to encourage advancements in the automatic processing of legal texts. Processing legal documents is challenging due to the intricate structure and meaning of legal language. In this paper, we outline our strategies for tackling Task 2, Task 3, and Task 4 in the COLIEE 2023 competition. Our approach involved utilizing appropriate state-of-the-art deep learning methods, designing methods based on domain characteristics observation, and applying meticulous engineering practices and methodologies to the competition. As a result, our performance in these tasks has been outstanding, with first places in Task 2 and Task 3, and promising results in Task 4. Our source code is available at https://github.com/Nguyen2015/CAPTAIN-COLIEE2023/tree/coliee2023.

information retrieval, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2401.03551

Country: Asia > Japan > Honshū (0.15)

Genre: Research Report > New Finding (0.46)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

JAMES: Normalizing Job Titles with Multi-Aspect Graph Embeddings and Reasoning

Yamashita, Michiharu, Shen, Jia Tracy, Tran, Thanh, Ekhtiari, Hamoon, Lee, Dongwon

arXiv.org Artificial IntelligenceOct-23-2023

In online job marketplaces, it is important to establish a well-defined job title taxonomy for various downstream tasks (e.g., job recommendation, users' career analysis, and turnover prediction). Job Title Normalization (JTN) is such a cleaning step to classify user-created non-standard job titles into normalized ones. However, solving the JTN problem is non-trivial with challenges: (1) semantic similarity of different job titles, (2) non-normalized user-created job titles, and (3) large-scale and long-tailed job titles in real-world applications. To this end, we propose a novel solution, named JAMES, that constructs three unique embeddings (i.e., graph, contextual, and syntactic) of a target job title to effectively capture its various traits. We further propose a multi-aspect co-attention mechanism to attentively combine these embeddings, and employ neural logical reasoning representations to collaboratively estimate similarities between messy job titles and normalized job titles in a reasoning space. To evaluate JAMES, we conduct comprehensive experiments against ten competing models on a large-scale real-world dataset with over 350,000 job titles. Our experimental results show that JAMES significantly outperforms the best baseline by 10.06% in Precision@10 and by 17.52% in NDCG@10, respectively.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2202.10739

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding

Wei, Kai, Tran, Thanh, Chang, Feng-Ju, Sathyendra, Kanthashree Mysore, Muniyappa, Thejaswi, Liu, Jing, Raju, Anirudh, McGowan, Ross, Susanj, Nathan, Rastrow, Ariya, Strimel, Grant P.

arXiv.org Artificial IntelligenceDec-13-2021

Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio. While dialogue history has been exploited to improve conventional text-based natural language understanding systems, current E2E SLU approaches have not yet incorporated such critical contextual signals in multi-turn and task-oriented dialogues. In this work, we propose a contextual E2E SLU model architecture that uses a multi-head attention mechanism over encoded previous utterances and dialogue acts (actions taken by the voice assistant) of a multi-turn dialogue. We detail alternative methods to integrate these contexts into the state-ofthe-art recurrent and transformer-based models. When applied to a large de-identified dataset of utterances collected by a voice assistant, our method reduces average word and semantic error rates by 10.8% and 12.6%, respectively. We also present results on a publicly available dataset and show that our method significantly improves performance over a noncontextual baseline

machine learning, natural language, utterance, (17 more...)

arXiv.org Artificial Intelligence

2112.06743

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Add feedback

HABERTOR: An Efficient and Effective Deep Hatespeech Detector

Tran, Thanh, Hu, Yifan, Hu, Changwei, Yen, Kevin, Tan, Fei, Lee, Kyumin, Park, Serim

arXiv.org Artificial IntelligenceOct-17-2020

We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from the scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inferencing, as well as less memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses a regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on the large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuning Language Models. In particular, comparing with BERT, our HABERTOR is 4~5 times faster in the training/inferencing phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it by using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for the hatespeech classification.

deep learning, habertor, neural network, (21 more...)

arXiv.org Artificial Intelligence

2010.08865

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Law (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

Quaternion-Based Self-Attentive Long Short-Term User Preference Encoding for Recommendation

Tran, Thanh, You, Di, Lee, Kyumin

arXiv.org Artificial IntelligenceAug-30-2020

Quaternion space has brought several benefits over the traditional Euclidean space: Quaternions (i) consist of a real and three imaginary components, encouraging richer representations; (ii) utilize Hamilton product which better encodes the inter-latent interactions across multiple Quaternion components; and (iii) result in a model with smaller degrees of freedom and less prone to overfitting. Unfortunately, most of the current recommender systems rely on real-valued representations in Euclidean space to model either user's long-term or short-term interests. In this paper, we fully utilize Quaternion space to model both user's long-term and short-term preferences. We first propose a QUaternion-based self-Attentive Long term user Encoding (QUALE) to study the user's long-term intents. Then, we propose a QUaternion-based self-Attentive Short term user Encoding (QUASE) to learn the user's short-term interests. To enhance our models' capability, we propose to fuse QUALE and QUASE into one model, namely QUALSE, by using a Quaternion-based gating mechanism. We further develop Quaternion-based Adversarial learning along with the Bayesian Personalized Ranking (QABPR) to improve our model's robustness. Extensive experiments on six real-world datasets show that our fused QUALSE model outperformed 11 state-of-the-art baselines, improving 8.43% at HIT@1 and 10.27% at NDCG@1 on average compared with the best baseline.

deep learning, neural network, quaternion, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3340531.3411926

2008.13335

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Signed Distance-based Deep Memory Recommender

Tran, Thanh, Liu, Xinyue, Lee, Kyumin, Kong, Xiangnan

arXiv.org Artificial IntelligenceMay-1-2019

Personalized recommendation algorithms learn a user's preference for an item by measuring a distance/similarity between them. However, some of the existing recommendation models (e.g., matrix factorization) assume a linear relationship between the user and item. This approach limits the capacity of recommender systems, since the interactions between users and items in real-world applications are much more complex than the linear relationship. To overcome this limitation, in this paper, we design and propose a deep learning framework called Signed Distance-based Deep Memory Recommender, which captures non-linear relationships between users and items explicitly and implicitly, and work well in both general recommendation task and shopping basket-based recommendation task. Through an extensive empirical study on six real-world datasets in the two recommendation tasks, our proposed approach achieved significant improvement over ten state-of-the-art recommendation models.

dataset, deep learning, neural network, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3308558.3313460

1905.00453

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Services (0.46)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Regularizing Matrix Factorization with User and Item Embeddings for Recommendation

Tran, Thanh, Lee, Kyumin, Liao, Yiming, Lee, Dongwon

arXiv.org Artificial IntelligenceAug-31-2018

Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked. In experimental validation, the RME outperforms competing state-of-the-art models in both explicit and implicit feedback datasets, significantly improving Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition, under the cold-start scenario for users with the lowest number of interactions, against the competing models, the RME outperforms NDCG@5 by 20.2% and 29.4% in MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source code are available at: https://github.com/thanhdtran/RME.git.

artificial intelligence, dataset, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3269206.3271730

1809.00979

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback