
Collaborating Authors: Pham, Tuan


Lightspeed Geometric Dataset Distance via Sliced Optimal Transport

arXiv.org Machine Learning

Dataset distances provide a powerful framework for comparing datasets based on their underlying structures, distributions, or content. These measures are essential in applications where understanding the relationships between datasets drives decision-making, such as assessing data quality, detecting distributional shifts, or quantifying biases. They play a critical role in machine learning workflows, enabling tasks like domain adaptation, transfer learning, continual learning, and fairness evaluation. Additionally, dataset distances are valuable in emerging areas such as synthetic data evaluation, 3D shape comparison, and federated learning, where comparing heterogeneous data distributions is fundamental. By capturing meaningful similarities and differences between datasets, these measures facilitate data-driven insights, enhance model robustness, and support novel applications across diverse fields. A common approach to comparing datasets relies on proxies, such as analyzing the learning curves of a predefined model [28, 16] or examining its optimal parameters [1, 22] on a given task. Another strategy involves making strong assumptions about the similarity or co-occurrence of labels between datasets [47]. However, these methods often lack theoretical guarantees, are heavily dependent on the choice of the probe model, and require training the model to completion (e.g., to identify optimal parameters) for each dataset under comparison. To address these limitations, model-agnostic approaches have been developed.
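As an illustration of the projection-and-sort idea behind sliced optimal transport, the following is a minimal sketch of the sliced Wasserstein distance between two feature sets. It is not the paper's implementation; the function name, sample sizes, and number of projections are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=128, p=2, seed=0):
    """Approximate the p-sliced Wasserstein distance between two
    equal-size point clouds X, Y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Sample random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both clouds onto each direction; in 1D, optimal transport
    # between equal-size empirical measures reduces to sorting.
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    return np.mean(np.abs(X_proj - Y_proj) ** p) ** (1.0 / p)

X = np.random.randn(500, 10)        # toy "dataset" features
Y = np.random.randn(500, 10) + 1.0  # shifted copy
print(sliced_wasserstein(X, Y))
```

Each one-dimensional problem is solved in closed form by sorting, which is what gives sliced variants their near-linear cost in the number of samples.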


One Diffusion to Generate Them All

arXiv.org Artificial Intelligence

We introduce OneDiffusion, a versatile, large-scale diffusion model that seamlessly supports bidirectional image synthesis and understanding across diverse tasks. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps, while also handling tasks like image deblurring, upscaling, and reverse processes such as depth estimation and segmentation. Additionally, OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs. Our model takes a straightforward yet effective approach by treating all tasks as frame sequences with varying noise scales during training, allowing any frame to act as a conditioning image at inference time. Our unified training framework removes the need for specialized architectures, supports scalable multi-task training, and adapts smoothly to any resolution, enhancing both generalization and scalability. Experimental results demonstrate competitive performance in both generation and prediction across tasks such as text-to-image, multiview generation, ID preservation, depth estimation, and camera pose estimation, despite a relatively small training dataset. Our code and checkpoint are freely available at https://github.com/lehduong/OneDiffusion
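To make the "frames with varying noise scales" idea concrete, here is a hedged toy sketch in PyTorch: each frame in a sequence gets its own diffusion timestep, and a (near-)clean frame serves as the condition. The stand-in denoiser, shapes, and schedule are assumptions, not OneDiffusion's actual architecture.

```python
import torch
import torch.nn as nn

B, F, C, H, W = 4, 2, 3, 32, 32           # batch, frames, channels, height, width
T = 1000                                  # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Conv3d(C, C, kernel_size=3, padding=1)   # toy stand-in network
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

x0 = torch.randn(B, F, C, H, W)           # clean frame sequences (toy data)
t = torch.randint(0, T, (B, F))           # independent timestep per frame
t[:, 0] = 0                               # frame 0 stays (near-)clean: the condition

noise = torch.randn_like(x0)
a = alphas_bar[t].view(B, F, 1, 1, 1)
x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise           # per-frame noising

pred = denoiser(x_t.transpose(1, 2)).transpose(1, 2)   # predict the noise
mask = (t > 0).float().view(B, F, 1, 1, 1)             # no loss on the condition frame
loss = (mask * (pred - noise) ** 2).mean()
loss.backward()
opt.step()
```

Because every frame carries its own noise scale, the same network can treat any subset of frames as clean conditioning inputs at inference time simply by fixing their timesteps to zero.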


Neural NeRF Compression

arXiv.org Artificial Intelligence

Neural Radiance Fields (NeRFs) have emerged as powerful tools for capturing detailed 3D scenes through continuous volumetric representations. Recent NeRFs utilize feature grids to improve rendering quality and speed; however, these representations introduce significant storage overhead. This paper presents a novel method for efficiently compressing a grid-based NeRF model, addressing the storage overhead concern. Our approach is based on the non-linear transform coding paradigm, employing neural compression to compress the model's feature grids. Due to the lack of training data involving many i.i.d. scenes, we design an encoder-free, end-to-end optimized approach for individual scenes, using lightweight decoders. To leverage the spatial inhomogeneity of the latent feature grids, we introduce an importance-weighted rate-distortion objective and a sparse entropy model employing a masking mechanism. Our experimental results validate that our proposed method surpasses existing works in terms of grid-based NeRF compression efficacy and reconstruction quality.
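As a rough illustration of an importance-weighted rate-distortion objective over a latent feature grid, the sketch below combines a weighted distortion term with a rate estimate from a toy factorized Gaussian entropy model. The weights, entropy model, and shapes are illustrative assumptions rather than the paper's exact formulation.

```python
import math
import torch

grid = torch.randn(32, 32, 8, requires_grad=True)  # latent feature grid being fit
target = torch.randn(32, 32, 8)                    # stand-in reconstruction target
importance = torch.rand(32, 32, 1)                 # per-cell weights, e.g. from view coverage

# Toy factorized Gaussian entropy model: rate is the negative log-likelihood
# of the latents, converted to bits; a real codec would learn the scales.
scale = torch.ones_like(grid)
nll = 0.5 * (grid / scale) ** 2 + 0.5 * math.log(2 * math.pi)  # per-cell nats
rate = nll / math.log(2.0)                         # nats -> bits

distortion = importance * (grid - target) ** 2     # importance-weighted error
lam = 1e-2                                         # rate-distortion trade-off
loss = distortion.mean() + lam * rate.mean()
loss.backward()                                    # gradients flow into the grid
```

Weighting the distortion term lets the optimizer spend bits where they matter most, while the entropy model supplies a differentiable proxy for the compressed size.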


Preserving Identity with Variational Score for General-purpose 3D Editing

arXiv.org Artificial Intelligence

We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by Delta Denoising Score (DDS), a recently proposed method for 2D image editing. We pinpoint the limitations of DDS for 2D and 3D editing, which cause detail loss and over-saturation. To address this, we propose an additional score distillation term that enforces identity preservation. This results in a more stable editing process, gradually optimizing NeRF models to match target prompts while retaining crucial input characteristics. We demonstrate the effectiveness of our approach in zero-shot image and neural field editing. Our method successfully alters visual attributes, adds both subtle and substantial structural elements, translates shapes, and achieves competitive results on standard 2D and 3D editing benchmarks. Additionally, our method imposes no constraints like masking or pre-training, making it compatible with a wide range of pre-trained diffusion models. This allows for versatile editing without needing neural field-to-mesh conversion, offering a more user-friendly experience.
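The sketch below illustrates, under stated assumptions, how an identity-preserving score term can be added to a DDS-style update: the edit direction is the difference of noise predictions under target and source prompts, while an extra term anchors the current estimate to the source image. The `eps` stand-in and the 0.5 weighting are hypothetical, not Piva's actual formulation.

```python
import torch

def eps(x_t, t, prompt):
    """Stand-in for a pretrained denoiser eps_theta(x_t, t, prompt);
    here it returns random scores purely for illustration."""
    return torch.randn_like(x_t)

x_src = torch.randn(1, 3, 64, 64)         # original image (kept fixed)
x = x_src.clone().requires_grad_(True)    # image being edited
opt = torch.optim.SGD([x], lr=0.1)

for step in range(100):
    t = torch.randint(20, 980, (1,))
    noise = torch.randn_like(x)
    x_t = x.detach() + noise              # simplified forward noising
    x_src_t = x_src + noise
    # DDS-style edit direction: difference of scores under the two prompts.
    dds = eps(x_t, t, "target prompt") - eps(x_t, t, "source prompt")
    # Identity term: penalize drift of the estimate's score from the source's.
    ident = eps(x_t, t, "source prompt") - eps(x_src_t, t, "source prompt")
    grad = dds + 0.5 * ident              # 0.5 is an illustrative weight
    opt.zero_grad()
    x.grad = grad                         # use the combined score as the gradient
    opt.step()
```

The same update applies to a NeRF by backpropagating the combined score through rendered views into the field's parameters instead of raw pixels.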


Temporal Predictive Coding For Model-Based Planning In Latent Space

arXiv.org Artificial Intelligence

High-dimensional observations are a major challenge in the application of model-based reinforcement learning (MBRL) to real-world environments. To handle high-dimensional sensory inputs, existing approaches use representation learning to map high-dimensional observations into a lower-dimensional latent space that is more amenable to dynamics estimation and planning. In this work, we present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time. Since this approach focuses on encoding temporally predictable information, we implicitly prioritize the encoding of task-relevant components over provably task-irrelevant nuisance information in the environment. By learning this representation in conjunction with a recurrent state space model, we can then perform planning in latent space. We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos containing complex information that is irrelevant to the planning task. Our experiments show that our model is superior to existing methods in the challenging complex-background setting while remaining competitive with current state-of-the-art models in the standard setting.
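One common way to instantiate a temporal predictive coding objective is an InfoNCE-style contrastive loss in which the latent at time t must identify its true successor among a batch of candidates. The minimal sketch below assumes linear stand-ins for the encoder and latent dynamics and is not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, D_obs, D_z = 64, 128, 32
encoder = nn.Linear(D_obs, D_z)           # stand-in for a conv image encoder
predictor = nn.Linear(D_z, D_z)           # one-step latent dynamics model
params = list(encoder.parameters()) + list(predictor.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

o_t = torch.randn(B, D_obs)               # observations at time t (toy)
o_next = torch.randn(B, D_obs)            # observations at time t+1

z_pred = predictor(encoder(o_t))          # predicted next latent
z_next = encoder(o_next)                  # encoded actual next latent
logits = z_pred @ z_next.T                # pairwise similarities within the batch
labels = torch.arange(B)                  # the true successor sits on the diagonal
loss = F.cross_entropy(logits, labels)    # InfoNCE: identify the true successor
loss.backward()
opt.step()
```

Because unpredictable background pixels do not help the encoder pick out the true successor, this kind of objective naturally down-weights temporally unpredictable nuisance content.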