Pham, Trang
Spherical Tree-Sliced Wasserstein Distance
Tran, Viet-Hoang, Chu, Thanh T., Nguyen, Khoi N. M., Pham, Trang, Le, Tam, Nguyen, Tan M.
Sliced Optimal Transport (OT) simplifies the OT problem in high-dimensional spaces by projecting the supports of input measures onto one-dimensional lines and then exploiting the closed-form expression of univariate OT to reduce the computational burden. Recently, the Tree-Sliced method has been introduced to replace these lines with more intricate structures, known as tree systems. This approach enhances the ability to capture the topological information of the integration domain in Sliced OT while maintaining low computational cost. Inspired by this approach, in this paper we present an adaptation of tree systems to OT problems for measures supported on a sphere. As a counterpart to the Radon transform variant on tree systems, we propose a novel spherical Radon transform with a new integration domain called spherical trees. By leveraging this transform and exploiting the structure of spherical trees, we derive closed-form expressions for OT problems on the sphere. Consequently, we obtain an efficient metric for measures on the sphere, named the Spherical Tree-Sliced Wasserstein (STSW) distance. We provide an extensive theoretical analysis establishing the topological properties of spherical trees and the well-definedness and injectivity of our Radon transform variant, which yields an orthogonally invariant distance between spherical measures. Finally, we conduct a wide range of numerical experiments, including gradient flows and self-supervised learning, to assess the performance of the proposed metric against recent benchmarks.
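For background, the closed-form univariate OT that Sliced OT exploits amounts to matching sorted projections. The minimal sketch below illustrates the plain Sliced Wasserstein distance for two empirical measures with equal numbers of uniformly weighted supports; it shows the projection-and-sort idea the abstract builds on, not the STSW metric itself, and the function name and default parameters are illustrative.

import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of SW_p between two empirical measures with
    equal numbers of uniformly weighted supports. X, Y: (n, d) arrays."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # random directions on the unit sphere
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # one-dimensional projections of both point clouds
    Xp, Yp = X @ theta.T, Y @ theta.T
    # closed-form univariate OT: match sorted projections
    diff = np.sort(Xp, axis=0) - np.sort(Yp, axis=0)
    return np.mean(np.abs(diff) ** p) ** (1.0 / p)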
Distance-Based Tree-Sliced Wasserstein Distance
Tran, Hoang V., Nguyen, Khoi N. M., Pham, Trang, Chu, Thanh T., Le, Tam, Nguyen, Tan M.
To overcome the computational challenges of Optimal Transport (OT), several variants of the Sliced Wasserstein (SW) distance have been developed in the literature. These approaches exploit the closed-form expression of univariate OT by projecting measures onto (one-dimensional) lines. However, projecting measures onto low-dimensional spaces can lead to a loss of topological information. The Tree-Sliced Wasserstein distance on Systems of Lines (TSW-SL) has emerged as a promising alternative that replaces these lines with a more advanced structure called tree systems. The tree structures enhance the metric's ability to capture topological information while preserving computational efficiency. However, at the core of TSW-SL, the splitting maps, which serve as the mechanism for pushing measures forward onto tree systems, rely solely on the positions of the measure supports while disregarding the projecting domains. Moreover, the specific splitting map used in TSW-SL yields a metric that is not invariant under Euclidean transformations, a property typically expected of OT on Euclidean space. In this work, we propose a novel class of splitting maps that generalizes the one studied in TSW-SL, enabling the use of all positional information from the input measures and resulting in a novel Distance-based Tree-Sliced Wasserstein (Db-TSW) distance. In addition, we introduce a simple tree sampling process better suited to Db-TSW, leading to an efficient, GPU-friendly implementation of tree systems, similar to the original SW. We also provide a comprehensive theoretical analysis of the proposed class of splitting maps, verifying the injectivity of the corresponding Radon transform and demonstrating that Db-TSW is a Euclidean-invariant metric. We empirically show, via a wide range of experiments, that Db-TSW significantly improves accuracy over recent SW variants while maintaining low computational cost.
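To make the role of a splitting map concrete, the sketch below shows one plausible distance-based map: a point's unit mass is divided across the lines of a tree system with weights that decay in the point-to-line distance, so the map depends on the projecting domain and not only on the support position. The softmax form and the temperature `beta` are illustrative assumptions, not necessarily the exact map used by Db-TSW.

import numpy as np

def distance_based_split(x, anchors, directions, beta=1.0):
    """Split the unit mass of a support x across k lines, where line i
    passes through anchors[i] with unit direction directions[i].
    x: (d,), anchors: (k, d), directions: (k, d) with unit rows."""
    v = x - anchors                                  # (k, d)
    t = np.einsum('kd,kd->k', v, directions)         # projection coefficients
    residual = v - t[:, None] * directions           # component orthogonal to each line
    dist = np.linalg.norm(residual, axis=1)          # point-to-line distances
    w = np.exp(-beta * dist)                         # closer lines get more mass
    return w / w.sum()                               # mass fractions, sum to 1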
Tree-Sliced Wasserstein Distance on a System of Lines
Tran, Viet-Hoang, Pham, Trang, Tran, Tho, Le, Tam, Nguyen, Tan M.
The Sliced Wasserstein (SW) distance in Optimal Transport (OT) is widely used in various applications thanks to its statistical effectiveness and computational efficiency. On the other hand, Tree Wasserstein (TW) and Tree-Sliced Wasserstein (TSW) are instances of OT for probability measures whose ground cost is a tree metric. TSW also has low computational complexity, i.e., linear in the number of edges in the tree. In particular, TSW is identical to SW when the tree is a chain. While SW is prone to losing the topological information of input measures because it relies on one-dimensional projections, TSW is more flexible and has a higher degree of freedom: choosing a tree rather than a line alleviates the curse of dimensionality in SW. However, in practical applications, popular tree-metric sampling methods are built heavily upon given supports, which limits their capacity to adapt to new supports. In this paper, we propose the Tree-Sliced Wasserstein distance on a System of Lines (TSW-SL), which establishes a connection between SW and TSW. Compared to SW and TSW, our TSW-SL benefits from the higher degree of freedom of TSW while being as well suited to dynamic settings as SW. In TSW-SL, we use a variant of the Radon transform to project measures onto a system of lines, resulting in measures on a space with a tree metric, and then leverage TW to efficiently compute distances between them. We empirically verify the advantages of TSW-SL over traditional SW through a variety of experiments on gradient flows, image style transfer, and generative models.
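The linear-in-edges complexity mentioned above comes from the well-known closed form of Tree Wasserstein: a sum, over edges, of the edge weight times the absolute difference of the two measures' masses in the subtree below that edge. A minimal sketch, assuming a rooted tree given as a parent array in topological order (every parent precedes its children, root at index 0):

import numpy as np

def tree_wasserstein(parent, edge_len, mu, nu):
    """TW(mu, nu) = sum over edges e of w_e * |mu(subtree_e) - nu(subtree_e)|.
    parent[i] is the parent of node i (parent[0] = -1), edge_len[i] is the
    length of the edge from i to its parent (edge_len[0] unused), and mu,
    nu are nonnegative node masses each summing to 1."""
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    for i in range(len(parent) - 1, 0, -1):   # leaves first, root excluded
        total += edge_len[i] * abs(diff[i])   # diff[i] is now the subtree mass difference
        diff[parent[i]] += diff[i]            # push the subtree mass up to the parent
    return total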
Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts
Nguyen, Huy, Akbarian, Pedram, Pham, Trang, Nguyen, Trang, Zhang, Shujian, Ho, Nhat
The cosine router in sparse Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router. Indeed, the cosine router demonstrates favorable performance in image and language tasks and exhibits a better ability to mitigate the representation collapse issue, which often leads to parameter redundancy and limited representation potential. Despite its empirical success, a comprehensive analysis of the cosine router in sparse MoE has been lacking. Considering the least squares estimation of the cosine-routing sparse MoE, we demonstrate that, due to the intrinsic interaction of the model parameters in the cosine router via certain partial differential equations, and regardless of the structure of the experts, the estimation rates of experts and model parameters can be as slow as $\mathcal{O}(1/\log^{\tau}(n))$, where $\tau > 0$ is a constant and $n$ is the sample size. Surprisingly, these pessimistic non-polynomial convergence rates can be circumvented by a technique widely used in practice to stabilize the cosine router: simply adding noise to the $\mathbb{L}_{2}$ norms in the cosine router, which we refer to as the \textit{perturbed cosine router}. Under strongly identifiable settings of the expert functions, we prove that the estimation rates for both the experts and the model parameters under the perturbed cosine-routing sparse MoE improve significantly to polynomial rates. Finally, we conduct extensive simulation studies in both synthetic and real-data settings to empirically validate our theoretical results.
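To illustrate the mechanism, the sketch below computes cosine routing scores with small nonnegative noise added to the $\mathbb{L}_{2}$ norms, which is one concrete reading of the perturbation described above; the noise level `sigma` and the nonnegativity choice are illustrative assumptions, not the paper's exact specification.

import numpy as np

def perturbed_cosine_scores(x, W, sigma=0.01, rng=None):
    """Routing weights from scores s_i = <x, w_i> / ((||x|| + e_x)(||w_i|| + e_i)),
    where e_x, e_i are small nonnegative perturbations of the L2 norms.
    x: (d,) token representation, W: (k, d) expert embeddings."""
    rng = rng or np.random.default_rng(0)
    eps_x = np.abs(sigma * rng.standard_normal())
    eps_w = np.abs(sigma * rng.standard_normal(W.shape[0]))
    scores = (W @ x) / ((np.linalg.norm(x) + eps_x)
                        * (np.linalg.norm(W, axis=1) + eps_w))
    probs = np.exp(scores - scores.max())   # softmax over the k experts
    return probs / probs.sum()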
Mixture of Experts Meets Prompt-Based Continual Learning
Le, Minh, Nguyen, An, Nguyen, Huy, Nguyen, Trang, Pham, Trang, Van Ngo, Linh, Ho, Nhat
Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer. While existing prompt-based continual learning methods excel in leveraging prompts for state-of-the-art performance, they often lack a theoretical explanation for the effectiveness of prompting. This paper conducts a theoretical analysis to unravel how prompts bestow such advantages in continual learning, thus offering a new perspective on prompt design. We first show that the attention block of pre-trained models like Vision Transformers inherently encodes a special mixture of experts architecture, characterized by linear experts and quadratic gating score functions. This realization drives us to provide a novel view on prefix tuning, reframing it as the addition of new task-specific experts, thereby inspiring the design of a novel gating mechanism termed Non-linear Residual Gates (NoRGa). Through the incorporation of non-linear activation and residual connection, NoRGa enhances continual learning performance while preserving parameter efficiency. The effectiveness of NoRGa is substantiated both theoretically and empirically across diverse benchmarks and pretraining paradigms.
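As a rough illustration of the gating idea, the sketch below applies a non-linear activation to a gating pre-activation and adds a residual path, which is the spirit of a Non-linear Residual Gate as described above. The tanh choice and the coefficients `tau` and `alpha` are hypothetical; the paper's exact parameterization may differ.

import numpy as np

def norga_gate(pre_activation, tau=1.0, alpha=1.0):
    """One plausible reading of NoRGa: non-linear activation of the
    gating pre-activation plus a residual connection."""
    return np.tanh(tau * pre_activation) + alpha * pre_activation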
Relational dynamic memory networks
Pham, Trang, Tran, Truyen, Venkatesh, Svetha
Working memory is an essential component of reasoning -- the capacity to answer a new question by manipulating acquired knowledge. Current memory-augmented neural networks offer a differentiable way to realize limited reasoning with the support of a working memory module. Memory modules are often implemented as a set of memory slots without explicit relational exchange of content, which does not naturally match multi-relational domains in which data is structured. We design a new model, dubbed Relational Dynamic Memory Network (RDMN), to fill this gap. The memory can have one or multiple components, each of which realizes a multi-relational graph of memory slots. The memory is dynamically updated during the reasoning process, controlled by the central controller. We evaluate the capability of RDMN on several important application domains: software vulnerability, molecular bioactivity, and chemical reaction. The results demonstrate the efficacy of the proposed model.
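To make the "relational graph of memory slots" concrete, here is one illustrative update step in which each slot aggregates neighbor content along relation-specific adjacency matrices and mixes in the controller state. The weight matrices U, V, W and the tanh update are assumptions sketching the style of update, not the exact RDMN equations.

import numpy as np

def rdmn_memory_step(M, A, c, U, V, W):
    """M: (n_slots, d) memory, A: (n_rel, n_slots, n_slots) relation
    adjacencies, c: (d,) controller state, U/W: (d, d), V: (n_rel, d, d)."""
    agg = sum(A[r] @ M @ V[r].T for r in range(len(A)))  # relational messages
    return np.tanh(M @ U.T + agg + c @ W.T)              # updated memory slots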
Graph Classification via Deep Learning with Virtual Nodes
Pham, Trang, Tran, Truyen, Dam, Hoa, Venkatesh, Svetha
Learning representation for graph classification turns a variable-size graph into a fixed-size vector (or matrix). Such a representation works nicely with algebraic manipulations. Here we introduce a simple method to augment an attributed graph with a virtual node that is bidirectionally connected to all existing nodes. The virtual node represents the latent aspects of the graph, which are not immediately available from the attributes and local connectivity structures. The expanded graph is then put through any node representation method. The representation of the virtual node is then the representation of the entire graph. In this paper, we use the recently introduced Column Network for the expanded graph, resulting in a new end-to-end graph classification model dubbed Virtual Column Network (VCN). The model is validated on two tasks: (i) predicting bio-activity of chemical compounds, and (ii) finding software vulnerability from source code. Results demonstrate that VCN is competitive against well-established rivals.
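The augmentation itself is straightforward; a minimal sketch on a dense adjacency matrix follows. The zero initialization of the virtual node's attributes is an assumption (other initializations are possible).

import numpy as np

def add_virtual_node(adj, feats):
    """Append one virtual node bidirectionally connected to every
    existing node. adj: (n, n) adjacency, feats: (n, d) node attributes."""
    n = adj.shape[0]
    new_adj = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    new_adj[:n, :n] = adj
    new_adj[n, :n] = 1    # virtual node -> all existing nodes
    new_adj[:n, n] = 1    # all existing nodes -> virtual node
    new_feats = np.vstack([feats, np.zeros((1, feats.shape[1]))])
    return new_adj, new_feats

After any node-representation method is run on the expanded graph, the representation of row n (the virtual node) serves as the representation of the entire graph.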
DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
Pham, Trang, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha
Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states, and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors in space and models patient health-state trajectories through explicit memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregularly timed events by moderating the forgetting and consolidation of memory cells. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health-state level, historical and present health states are aggregated through multiscale temporal pooling before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden -- diabetes and mental health -- the results show improved modeling and risk-prediction accuracy.
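One way to picture the time parameterization: attenuate the LSTM forget gate as a function of the elapsed time between visits, so the memory cell decays over long gaps. The logarithmic decay below is illustrative of this moderation, not necessarily the paper's exact form.

import numpy as np

def time_moderated_forget(f, delta_t):
    """Scale forget-gate activations f (values in (0, 1)) by a decay in
    the elapsed time delta_t (e.g., days); equals 1 when delta_t = 0."""
    decay = 1.0 / np.log(np.e + delta_t)
    return f * decay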
One Size Fits Many: Column Bundle for Multi-X Learning
Pham, Trang, Tran, Truyen, Venkatesh, Svetha
Much recent machine learning research has been directed toward leveraging shared statistics among labels, instances, and data views, commonly referred to as multi-label, multi-instance, and multi-view learning. The underlying premises are that correlations exist among input parts and among output targets, and that predictive performance increases when these correlations are incorporated. In this paper, we propose Column Bundle (CLB), a novel deep neural network for capturing the shared statistics in data. CLB is generic in that the same architecture can be applied to various types of shared statistics by changing only the input and output handling. CLB is capable of scaling to thousands of input parts and output labels by avoiding explicit modeling of pairwise relations. We evaluate CLB on different types of data: (a) multi-label, (b) multi-view, (c) multi-view/multi-label, and (d) multi-instance. Across all datasets, CLB demonstrates comparable and competitive performance against state-of-the-art methods designed specifically for each type.
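The scaling claim suggests that input parts interact only through a shared central column rather than pairwise, giving linear rather than quadratic cost in the number of parts. A heavily hedged sketch of such a step, with hypothetical weights Wp and Wh and mean pooling as the aggregator:

import numpy as np

def clb_step(parts, h, Wp, Wh):
    """parts: (k, d) encoded input parts, h: (d,) central column state,
    Wp/Wh: (d, d). Parts exchange information only via the column h."""
    pooled = parts.mean(axis=0)                # k parts -> one summary, O(k)
    return np.tanh(pooled @ Wp.T + h @ Wh.T)   # updated central column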
Column Networks for Collective Classification
Pham, Trang, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha (Deakin University)
Relational learning deals with data characterized by relational structures. An important task is collective classification, which jointly classifies networked objects. While it holds great promise of producing better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, with linear complexity in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification, and (c) film genre classification. In all of these applications, CLN achieves higher accuracy than state-of-the-art rivals.
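The linear complexity in properties (v) follows from each instance aggregating mean-pooled neighbor states per relation, one sparse pass per relation and layer. A simplified sketch of one such layer (the highway connections of the full model are omitted, and the weight shapes are assumptions):

import numpy as np

def cln_layer(H, A, W, V, b):
    """H: (n, d) hidden states, A: (n_rel, n, n) relation adjacencies,
    W: (d, d) self weights, V: (n_rel, d, d) relation weights, b: (d,)."""
    out = H @ W.T + b
    for r in range(len(A)):
        deg = np.maximum(A[r].sum(axis=1, keepdims=True), 1)  # avoid divide-by-zero
        out += (A[r] @ H / deg) @ V[r].T   # mean over relation-r neighbors
    return np.tanh(out)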