AITopics | Lam, Hoang Thanh

Collaborating Authors

Lam, Hoang Thanh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

WaveGAS: Waveform Relaxation for Scaling Graph Neural Networks

Vatter, Jana, Zayats, Mykhaylo, Galindo, Marcos Martínez, López, Vanessa, Mayer, Ruben, Jacobsen, Hans-Arno, Lam, Hoang Thanh

arXiv.org Artificial IntelligenceFeb-27-2025

With the ever-growing size of real-world graphs, numerous techniques to overcome resource limitations when training Graph Neural Networks (GNNs) have been developed. One such approach, GNNAutoScale (GAS), uses graph partitioning to enable training under constrained GPU memory. GAS also stores historical embedding vectors, which are retrieved from one-hop neighbors in other partitions, ensuring critical information is captured across partition boundaries. The historical embeddings which come from the previous training iteration are stale compared to the GAS estimated embeddings, resulting in approximation errors of the training algorithm. Furthermore, these errors accumulate over multiple layers, leading to suboptimal node embeddings. To address this shortcoming, we propose two enhancements: first, WaveGAS, inspired by waveform relaxation, performs multiple forward passes within GAS before the backward pass, refining the approximation of historical embeddings and gradients to improve accuracy; second, a gradient-tracking method that stores and utilizes more accurate historical gradients during training. Empirical results show that WaveGAS enhances GAS and achieves better accuracy, even outperforming methods that train on full graphs, thanks to its robust estimation of node embeddings.

artificial intelligence, iteration, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.19986

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Bavaria (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

TabularFM: An Open Framework For Tabular Foundational Models

Tran, Quan M., Hoang, Suong N., Nguyen, Lam M., Phan, Dzung, Lam, Hoang Thanh

arXiv.org Artificial IntelligenceJun-17-2024

Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time-series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs for various tabular data tasks. In response to this gap, we introduce a framework called TabularFM, which incorporates state-of-the-art methods for developing FMs specifically for tabular data. This includes variations of neural architectures such as GANs, VAEs, and Transformers. We have curated a million of tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.09837

Country:

Asia > Vietnam (0.14)
Europe > Germany (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Description Boosting for Zero-Shot Entity and Relation Classification

Picco, Gabriele, Fuchs, Leopold, Galindo, Marcos Martínez, Purpura, Alberto, López, Vanessa, Lam, Hoang Thanh

arXiv.org Artificial IntelligenceJun-4-2024

For entity recognition - including classification Named Entity Recognition (NER) and Relation and linking - and relation classification problems, Extraction (RE) allow for the extraction and categorization recent ZSL methods (Aly et al., 2021; Ledell Wu, of structured data from unstructured 2020; Chen and Li, 2021) rely on textual descriptions text, which in turn enables not only more accurate of entities or relations. Descriptions provide entity recognition and relationship extraction, but the required information about the semantics of entities also getting data from several unstructured sources, (or relations), which help the models to identify helping to build knowledge graphs and the semantic entity mentions in texts without observing them web. However, these methods usually rely on during training. Works such as (Ledell Wu, 2020; labeled data (usually human-annotated data) for a De Cao et al., 2021) and (Aly et al., 2021) show good performance, usually requiring domain experts how effective it is to use textual descriptions to perform for data acquisition and labeling, which may entity recognition tasks in the zero-shot context.

information retrieval, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.02245

Country:

Europe (1.00)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery

Lam, Hoang Thanh, Sbodio, Marco Luca, Galindo, Marcos Martínez, Zayats, Mykhaylo, Fernández-Díaz, Raúl, Valls, Víctor, Picco, Gabriele, Ramis, Cesar Berrospi, López, Vanessa

arXiv.org Artificial IntelligenceOct-19-2023

Recent research on predicting the binding affinity between drug molecules and proteins use representations learned, through unsupervised learning techniques, from large databases of molecule SMILES and protein sequences. While these representations have significantly enhanced the predictions, they are usually based on a limited set of modalities, and they do not exploit available knowledge about existing relations among molecules and proteins. In this study, we demonstrate that by incorporating knowledge graphs from diverse sources and modalities into the sequences or SMILES representation, we can further enrich the representation and achieve state-of-the-art results for drug-target binding affinity prediction in the established Therapeutic Data Commons (TDC) benchmarks. We release a set of multimodal knowledge graphs, integrating data from seven public data sources, and containing over 30 million triples. Our intention is to foster additional research to explore how multimodal knowledge enhanced protein/molecule embeddings can improve prediction tasks, including prediction of binding affinity. We also release some pretrained models learned from our multimodal knowledge graphs, along with source code for running standard benchmark tasks for prediction of biding affinity.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2306.12802

Country:

Europe > Switzerland (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction

Picco, Gabriele, Galindo, Marcos Martínez, Purpura, Alberto, Fuchs, Leopold, López, Vanessa, Lam, Hoang Thanh

arXiv.org Artificial IntelligenceJul-25-2023

The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models.In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.

artificial intelligence, natural language, zshot, (17 more...)

arXiv.org Artificial Intelligence

2307.13497

Country:

Europe (1.00)
Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Evaluating Robustness of Cooperative MARL: A Model-based Approach

Pham, Nhan H., Nguyen, Lam M., Chen, Jie, Lam, Hoang Thanh, Das, Subhro, Weng, Tsui-Wei

arXiv.org Artificial IntelligenceFeb-7-2022

In recent years, a proliferation of methods were developed for cooperative multi-agent reinforcement learning (c-MARL). However, the robustness of c-MARL agents against adversarial attacks has been rarely explored. In this paper, we propose to evaluate the robustness of c-MARL agents via a model-based approach. Our proposed formulation can craft stronger adversarial state perturbations of c-MARL agents(s) to lower total team rewards more than existing model-free approaches. In addition, we propose the first victim-agent selection strategy which allows us to develop even stronger adversarial attack. Numerical experiments on multi-agent MuJoCo benchmarks illustrate the advantage of our approach over other baselines. The proposed model-based attack consistently outperforms other baselines in all tested environments.

machine learning, model-based approach, reinforcement learning, (3 more...)

arXiv.org Artificial Intelligence

2202.03558

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Add feedback

Ensembling Graph Predictions for AMR Parsing

Lam, Hoang Thanh, Picco, Gabriele, Hou, Yufang, Lee, Young-Suk, Nguyen, Lam M., Phan, Dzung T., López, Vanessa, Astudillo, Ramon Fernandez

arXiv.org Artificial IntelligenceOct-18-2021

In many machine learning tasks, models are trained to predict structure data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions. In the literature, there are many ensembling techniques proposed for classification or regression problems, however, ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest graph that is the most supported by a collection of graph predictions. As the problem is NP-Hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments in AMR parsing problems. The experimental results demonstrate that the proposed approach can combine the strength of state-of-the-art AMR parsers to create new predictions that are more accurate than any individual models in five standard benchmark datasets.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2110.09131

Country:

Europe > Spain (0.28)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Neural Feature Learning From Relational Database

Lam, Hoang Thanh, Minh, Tran Ngoc, Sinn, Mathieu, Buesser, Beat, Wistuba, Martin

arXiv.org Artificial IntelligenceJun-17-2018

Feature engineering is one of the most important but most tedious tasks in data science. This work studies automation of feature learning from relational database. We first prove theoretically that finding the optimal features from relational data for predictive tasks is NP-hard. We propose an efficient rule-based approach based on heuristics and a deep neural network to automatically learn appropriate features from relational data. We benchmark our approaches in ensembles in past Kaggle competitions. Our new approach wins late medals and beats the state-of-the-art solutions with significant margins. To the best of our knowledge, this is the first time an automated data science system could win medals in Kaggle competitions with complex relational database.

deep learning, neural network, transformation, (19 more...)

arXiv.org Artificial Intelligence

1801.05372

Country: North America > United States (0.68)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Correlation Space for Time Series

Qiu, Han, Lam, Hoang Thanh, Fusco, Francesco, Sinn, Mathieu

arXiv.org Machine LearningMar-13-2018

We propose an approximation algorithm for efficient correlation search in time series data. In our method, we use Fourier transform and neural network to embed time series into a low-dimensional Euclidean space. The given space is learned such that time series correlation can be effectively approximated from Euclidean distance between corresponding embedded vectors. Therefore, search for correlated time series can be done using an index in the embedding space for efficient nearest neighbor search. Our theoretical analysis illustrates that our method's accuracy can be guaranteed under certain regularity conditions. We further conduct experiments on real-world datasets and the results show that our method indeed outperforms the baseline solution. In particular, for approximation of correlation, our method reduces the approximation loss by a half in most test cases compared to the baseline solution. For top-$k$ highest correlation search, our method improves the precision from 5\% to 20\% while the query time is similar to the baseline approach query time.

deep learning, neural network, time series, (18 more...)

arXiv.org Machine Learning

1802.03628

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

(Blue) Taxi Destination and Trip Time Prediction from Partial Trajectories

Lam, Hoang Thanh, Diaz-Aviles, Ernesto, Pascale, Alessandra, Gkoufas, Yiannis, Chen, Bei

arXiv.org Machine LearningSep-17-2015

Real-time estimation of destination and travel time for taxis is of great importance for existing electronic dispatch systems. We present an approach based on trip matching and ensemble learning, in which we leverage the patterns observed in a dataset of roughly 1.7 million taxi journeys to predict the corresponding final destination and travel time for ongoing taxi trips, as a solution for the ECML/PKDD Discovery Challenge 2015 competition. The results of our empirical evaluation show that our approach is effective and very robust, which led our team -- BlueTaxi -- to the 3rd and 7th position of the final rankings for the trip time and destination prediction tasks, respectively. Given the fact that the final rankings were computed using a very small test set (with only 320 trips) we believe that our approach is one of the most robust solutions for the challenge based on the consistency of our good results across the test sets.

artificial intelligence, ground transportation, prediction, (18 more...)

arXiv.org Machine Learning

1509.05257

Country: Europe > Portugal (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (0.70)
Transportation > Ground > Road (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback