Nguyen, Minh
Decentralized Navigation of a Cable-Towed Load using Quadrupedal Robot Team via MARL
Chen, Wen-Tse, Nguyen, Minh, Li, Zhongyu, Sue, Guo Ning, Sreenath, Koushil
This work addresses the challenge of enabling a team of quadrupedal robots to collaboratively tow a cable-connected load through cluttered and unstructured environments while avoiding obstacles. Leveraging cables allows the multi-robot system to navigate narrow spaces by maintaining slack when necessary. However, this introduces hybrid physical interactions due to alternating taut and slack states, with computational complexity that scales exponentially as the number of agents increases. To tackle these challenges, we developed a scalable and decentralized system capable of dynamically coordinating a variable number of quadrupedal robots while managing the hybrid physical interactions inherent in the load-towing task. At the core of this system is a novel multi-agent reinforcement learning (MARL)-based planner, designed for decentralized coordination. The MARL-based planner is trained using a centralized training with decentralized execution (CTDE) framework, enabling each robot to make decisions autonomously using only local (ego) observations. To accelerate learning and ensure effective collaboration across varying team sizes, we introduce a tailored training curriculum for MARL. Experimental results highlight the flexibility and scalability of the framework, demonstrating successful deployment with one to four robots in real-world scenarios and up to twelve robots in simulation. The decentralized planner maintains consistent inference times, regardless of the team size. Additionally, the proposed system demonstrates robustness to environment perturbations and adaptability to varying load weights. This work represents a step forward in achieving flexible and efficient multi-legged robotic collaboration in complex and real-world environments.
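To make the decentralized-execution idea concrete, here is a minimal sketch of how every robot could run an identical policy on its own ego observation after centralized training, which is why inference time stays constant as the team grows; the observation size, action size, and network shape are placeholder assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class EgoPolicy(nn.Module):
    """Shared policy network evaluated independently by each robot (illustrative)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, act_dim))

    def forward(self, ego_obs: torch.Tensor) -> torch.Tensor:
        # e.g. a planar velocity / heading command for the low-level controller
        return self.net(ego_obs)

policy = EgoPolicy(obs_dim=32, act_dim=3)               # placeholder dimensions
team_obs = [torch.randn(32) for _ in range(4)]          # one ego observation per robot
commands = [policy(o) for o in team_obs]                # same network, run per robot
```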
One STEP at a time: Language Agents are Stepwise Planners
Nguyen, Minh, Shareghi, Ehsan
Language agents have shown promising adaptability in dynamic environments to perform complex tasks. However, despite the versatile knowledge embedded in large language models, these agents still fall short on tasks that require planning. We introduce STEP, a novel framework designed to efficiently learn from previous experiences to enhance the planning capabilities of language agents in future steps. Concretely, STEP functions through four interconnected components. First, the Planner takes on the task, breaks it down into subtasks, and provides relevant insights. Then the Executor generates action candidates, while the Evaluator ensures the actions align with rules learned from previous experiences. Lastly, the Memory stores experiences to inform future decisions. On the ScienceWorld benchmark, our results show that STEP consistently outperforms state-of-the-art models, achieving an overall score of 67.4 and successfully completing 12 out of 18 tasks. These findings highlight STEP's potential as a framework for enhancing planning capabilities in language agents, paving the way for more sophisticated task-solving in dynamic environments.
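A minimal sketch of how the four components could interact over one episode; `llm` and `env` stand in for a chat-style model call and a text environment such as ScienceWorld, and all prompts and names are illustrative assumptions rather than the released STEP code.

```python
def step_agent(task: str, env, llm, memory: list[str], max_steps: int = 30) -> bool:
    lessons = "\n".join(memory)
    plan = llm(f"Task: {task}\nPast lessons:\n{lessons}\n"
               "Break the task into subtasks and give relevant insights.")   # Planner
    obs = env.reset()
    for _ in range(max_steps):
        candidates = llm(f"Plan:\n{plan}\nObservation:\n{obs}\n"
                         "Propose a few candidate actions.")                 # Executor
        action = llm(f"Learned rules:\n{lessons}\nCandidates:\n{candidates}\n"
                     "Select the action most consistent with the rules.")    # Evaluator
        obs, reward, done = env.step(action)
        if done:
            memory.append(f"{task}: plan worked -> {plan}")                  # Memory
            return reward > 0
    memory.append(f"{task}: ran out of steps; refine subtask ordering")      # Memory
    return False
```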
Design Space Exploration of Embedded SoC Architectures for Real-Time Optimal Control
Dong, Kris Shengjun, Nikiforov, Dima, Soedarmadji, Widyadewi, Nguyen, Minh, Fletcher, Christopher, Shao, Yakun Sophia
Empowering resource-limited robots to execute computationally intensive tasks such as locomotion and manipulation is challenging. This project provides a comprehensive design space exploration to determine optimal hardware computation architectures suitable for model-based control algorithms. We profile and optimize representative architectural designs across general-purpose scalar processors, vector processors, and specialized accelerators. Specifically, we compare CPUs, vector machines, and domain-specialized accelerators using kernel-level benchmarks and end-to-end representative robotic workloads. Our exploration provides a quantitative comparison of performance, area, and utilization, and analyzes the trade-offs among these distinct architectural designs. We demonstrate that architectural modifications, software, and system optimization can alleviate bottlenecks and enhance utilization. Finally, we propose a code generation flow to simplify the engineering work of mapping robotic workloads to specialized architectures.
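As a toy illustration of what a kernel-level benchmark might look like for model-based control workloads, the snippet below times a dense solve of a symmetric positive-definite system at a few horizon-dependent sizes; the kernel choice, problem sizes, and harness are illustrative assumptions, and the actual study profiles hardware designs rather than NumPy code.

```python
import time
import numpy as np

def bench_spd_solve(n: int, repeats: int = 50) -> float:
    """Time a dense solve of an n x n symmetric positive-definite system."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    A = A @ A.T + n * np.eye(n)            # SPD, similar in spirit to a KKT system
    b = rng.standard_normal(n)
    start = time.perf_counter()
    for _ in range(repeats):
        np.linalg.solve(A, b)
    return (time.perf_counter() - start) / repeats

for n in (30, 60, 120):                    # horizon-dependent problem sizes (illustrative)
    print(f"n={n:4d}: {bench_spd_solve(n) * 1e6:.1f} us per solve")
```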
Low-cost Robust Night-time Aerial Material Segmentation through Hyperspectral Data and Sparse Spatio-Temporal Learning
Bajaj, Chandrajit, Nguyen, Minh, Bhardwaj, Shubham
Material segmentation is a complex task, particularly when dealing with aerial data under poor lighting and atmospheric conditions. To address this, hyperspectral data from specialized cameras can be very useful in addition to RGB images. However, due to hardware constraints, data with high spectral resolution often come with lower spatial resolution. Additionally, incorporating such data into a learning-based segmentation framework is challenging due to the large number of data channels involved. To overcome these difficulties, we propose an innovative Siamese framework that uses time series-based compression to effectively and scalably integrate the additional spectral data into the segmentation task. We demonstrate our model's effectiveness through competitive benchmarks on aerial datasets in various environmental conditions.
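As a rough sketch of the spectral-compression idea, the module below treats each pixel's hyperspectral bands as a 1D sequence and compresses them into a few feature maps that could be fused with an RGB branch; the layer sizes and overall structure are illustrative assumptions, not the paper's Siamese architecture.

```python
import torch
import torch.nn as nn

class SpectralCompressor(nn.Module):
    """Compress per-pixel spectral sequences into a small number of channels (sketch)."""
    def __init__(self, out_channels: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, out_channels, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, hsi: torch.Tensor) -> torch.Tensor:
        # hsi: (batch, bands, H, W) -> per-pixel spectral sequence of length `bands`
        b, c, h, w = hsi.shape
        seq = hsi.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)
        feat = self.encoder(seq).reshape(b, h, w, -1).permute(0, 3, 1, 2)
        return feat   # (batch, out_channels, H, W), ready to fuse with RGB features
```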
Point Cloud Compression with Bits-back Coding
Hieu, Nguyen Quang, Nguyen, Minh, Hoang, Dinh Thai, Nguyen, Diep N., Dutkiewicz, Eryk
This paper introduces a novel lossless compression method for compressing geometric attributes of point cloud data with bits-back coding. Our method uses a deep learning-based probabilistic model to estimate the Shannon entropy of the point cloud information, i.e., the geometric attributes of the 3D points. Once the entropy of the point cloud dataset is estimated with a convolutional variational autoencoder (CVAE), we use the learned CVAE model to compress the geometric attributes of the point clouds with the bits-back coding technique. The novelty of our method lies in using the learned latent variable model of the CVAE for bits-back coding of the point cloud data. With bits-back coding, we can capture the correlation between data points, such as similar spatial features like shapes and scattering regions, in the lower-dimensional latent space to further reduce the compression ratio. The main insight of our method is that we can achieve a compression ratio competitive with conventional deep learning-based approaches while significantly reducing the overhead cost of storing and/or communicating the compression codec, making our approach more applicable in practical scenarios. Through comprehensive evaluations, we find that this overhead is small compared to the reduction in compression ratio achieved when compressing large point cloud datasets. Experimental results show that our proposed approach achieves a compression ratio of 1.56 bit-per-point on average, which is significantly lower than baseline approaches such as Google's Draco, which achieves 1.83 bit-per-point.
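To make the bits-back accounting concrete, the sketch below computes the net code length implied by a learned latent-variable model: the coder pays for the latent under the prior and the data under the decoder, then recovers the bits of the approximate posterior, so the net rate is the negative evidence lower bound expressed in bits. All names are hypothetical, and the log-probabilities are assumed to be summed over the whole point cloud.

```python
import torch

def bits_back_rate(log_px_given_z: torch.Tensor, log_pz: torch.Tensor,
                   log_qz_given_x: torch.Tensor, num_points: int) -> float:
    """Estimate the net bits-back code length per point (illustrative sketch)."""
    nats = -(log_px_given_z + log_pz - log_qz_given_x)   # net code length in nats
    bits = nats / torch.log(torch.tensor(2.0))           # convert nats to bits
    return float(bits / num_points)                      # bits-per-point
```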
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models
Nguyen, Minh, Dernoncourt, Franck, Yoon, Seunghyun, Deilamsalehy, Hanieh, Tan, Hao, Rossi, Ryan, Tran, Quan Hung, Bui, Trung, Nguyen, Thien Huu
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention and lacks large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a precision of 80.3\%, setting a new benchmark for SpeakerID. The data and code are publicly available here: \url{https://github.com/adobe-research/speaker-identification}
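As a rough illustration of text-based SpeakerID, the sketch below scores candidate names against the dialogue context with a pretrained encoder; the model choice (`roberta-base`), the `[SPEAKER: ...]` input format, and the untrained scoring head are assumptions for illustration only, not the released pipeline.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# Single-logit head used as a relevance score; it would need fine-tuning on SpeakerID data.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1)

def pick_speaker(context: str, utterance: str, candidates: list[str]) -> str:
    """Return the candidate name whose insertion best fits the dialogue context."""
    scores = []
    for name in candidates:
        enc = tokenizer(f"{context} [SPEAKER: {name}] {utterance}",
                        return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(model(**enc).logits.squeeze().item())
    return candidates[int(torch.tensor(scores).argmax())]
```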
Min P Sampling: Balancing Creativity and Coherence at High Temperature
Nguyen, Minh, Baker, Andrew, Kirsch, Andreas, Neo, Clement
Large Language Models (LLMs) generate long-form text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To address this issue, we propose min-$p$, a dynamic truncation sampling method that establishes a minimum base percentage threshold for tokens, which scales according to the probability of the top candidate token. Through experiments on several benchmarks, such as GPQA, GSM8K, and AlpacaEval Creative Writing, we demonstrate that min-$p$ improves the coherence and quality of generated text even at high temperatures, while also facilitating more creative and diverse outputs compared to top-$p$ and other sampling methods. As of writing, min-$p$ has been adopted by multiple open-source LLM implementations and has been independently assessed by members of the open-source LLM community, further validating its practical utility and potential.
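As a concrete illustration, here is a minimal sketch of min-$p$ truncation, assuming a 1D tensor of next-token logits; the function name and default `p_base` value are placeholders, and the exact ordering of temperature scaling and truncation in released implementations may differ.

```python
import torch

def min_p_sample(logits: torch.Tensor, p_base: float = 0.1, temperature: float = 1.0) -> int:
    """Sample one token with min-p truncation (illustrative sketch).

    Tokens whose probability falls below p_base * max_prob are discarded,
    so the truncation threshold scales with the model's confidence.
    """
    probs = torch.softmax(logits / temperature, dim=-1)
    threshold = p_base * probs.max()                       # dynamic cutoff
    keep = probs >= threshold                              # tokens above the scaled threshold
    filtered = torch.where(keep, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum()                   # renormalize over the kept tokens
    return int(torch.multinomial(filtered, num_samples=1))
```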
Knockout: A simple way to handle missing inputs
Nguyen, Minh, Karaman, Batuhan K., Kim, Heejong, Wang, Alan Q., Liu, Fengbei, Sabuncu, Mert R.
Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.
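A minimal sketch of the training-time replacement step, assuming tabular-style inputs of shape (batch, features); the function name, knockout probability, and per-feature placeholder vector are illustrative choices rather than the paper's exact recipe, which also covers structured inputs such as images and whole modalities.

```python
import torch

def knockout(x: torch.Tensor, placeholder: torch.Tensor, p: float = 0.3) -> torch.Tensor:
    """Randomly replace input features with placeholder values (illustrative sketch).

    x:           batch of inputs, shape (batch, num_features)
    placeholder: per-feature placeholder values, shape (num_features,)
    p:           probability of knocking out each feature independently
    """
    mask = torch.rand(x.shape, device=x.device) < p        # True where the feature is dropped
    return torch.where(mask, placeholder.expand_as(x), x)

# During training, apply knockout to each mini-batch so the model learns to predict
# from arbitrary subsets of inputs; at inference, feed the placeholder values for
# whichever inputs are actually missing.
```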
Planning Robot Placement for Object Grasping
Saini, Manish, Jacob, Melvin Paul, Nguyen, Minh, Hochgeschwender, Nico
When performing manipulation-based activities such as picking objects, a mobile robot needs to position its base at a location that supports successful execution. To address this problem, prominent approaches typically rely on costly grasp planners to provide grasp poses for a target object, which are then analysed to identify the best robot placements for achieving each grasp pose. In this paper, we propose instead to first find robot placements that would not result in collision with the environment and from which picking up the object is feasible, and then evaluate them to find the best placement candidate. Our approach takes into account the robot's reachability, as well as RGB-D images and occupancy grid maps of the environment, to identify suitable robot poses. The proposed algorithm is embedded in a service robotic workflow, in which a person points to select the target object for grasping. We evaluate our approach in a series of grasping experiments against an existing baseline implementation that sends the robot to a fixed navigation goal. The experimental results show how the approach allows the robot to grasp the target object from locations that are very challenging for the baseline implementation.
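The following is a minimal sketch of the placement-search idea described above, assuming a 2D occupancy grid and a circular robot footprint; the reachability band, scoring heuristic, and all names are illustrative assumptions rather than the paper's algorithm, which also uses RGB-D input.

```python
import numpy as np

def rank_base_placements(occupancy: np.ndarray, resolution: float,
                         object_xy: np.ndarray, reach_min: float, reach_max: float,
                         robot_radius: float) -> list[tuple[float, tuple[float, float]]]:
    """Score collision-free base placements around a target object (illustrative sketch).

    occupancy:  2D grid, 1 = occupied, 0 = free
    resolution: metres per cell
    object_xy:  target object position in map coordinates (metres), x = column, y = row
    """
    candidates = []
    h, w = occupancy.shape
    radius_cells = int(np.ceil(robot_radius / resolution))
    for r in range(h):
        for c in range(w):
            xy = np.array([c, r], dtype=float) * resolution
            d = np.linalg.norm(xy - object_xy)
            if not (reach_min <= d <= reach_max):
                continue                                     # object outside arm reach
            footprint = occupancy[max(0, r - radius_cells): r + radius_cells + 1,
                                  max(0, c - radius_cells): c + radius_cells + 1]
            if footprint.any():
                continue                                     # footprint collides with obstacles
            score = -abs(d - 0.5 * (reach_min + reach_max))  # prefer mid-range reach
            candidates.append((score, (float(xy[0]), float(xy[1]))))
    return sorted(candidates, reverse=True)                  # best-scoring placements first
```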
DPO: Differential reinforcement learning with application to optimal configuration search
Bajaj, Chandrajit, Nguyen, Minh
Reinforcement learning (RL) with continuous state and action spaces remains one of the most challenging problems within the field. Most current learning methods focus on integral identities such as value functions to derive an optimal strategy for the learning agent. In this paper, we instead study the dual form of the original RL formulation and propose the first differential RL framework that can handle settings with limited training samples and short-length episodes. Our approach introduces Differential Policy Optimization (DPO), a pointwise and stage-wise iteration method that optimizes policies encoded by local-movement operators. We prove a pointwise convergence estimate for DPO and provide a regret bound comparable with those in current theoretical works. Such a pointwise estimate ensures that the learned policy matches the optimal path uniformly across different steps. We then apply DPO to a class of practical RL problems that search for optimal configurations with Lagrangian rewards. DPO is easy to implement, scalable, and shows competitive results in benchmarking experiments against several popular RL methods.