AITopics | Chen, Yi-Ting

Collaborating Authors

Chen, Yi-Ting

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning

Shen, Yu-Hong, Wu, Chuan-Yu, Yang, Yi-Ru, Tai, Yen-Ling, Chen, Yi-Ting

arXiv.org Artificial IntelligenceMar-17-2025

We study Multimodal Large Language Models (MLLMs) with in-context learning for food preparation task planning. In this context, we identify two key challenges: cross-modal distraction and geometric feasibility. Cross-modal distraction occurs when the inclusion of visual input degrades the reasoning performance of a MLLM. Geometric feasibility refers to the ability of MLLMs to ensure that the selected skills are physically executable in the environment. To address these issues, we adapt Chain of Thought (CoT) with Self-Consistency to mitigate reasoning loss from cross-modal distractions and use affordance predictor as skill preconditions to guide MLLM on geometric feasibility. We construct a dataset to evaluate the ability of MLLMs on quantity estimation, reachability analysis, relative positioning and collision avoidance. We conducted a detailed evaluation to identify issues among different baselines and analyze the reasons for improvement, providing insights into each approach. Our method reaches a success rate of 76.7% on the entire dataset, showing a substantial improvement over the CoT baseline at 36.7%.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.13055

Country: Asia > Taiwan (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Shared-unique Features and Task-aware Prioritized Sampling on Multi-task Reinforcement Learning

Lin, Po-Shao, Yeh, Jia-Fong, Chen, Yi-Ting, Hsu, Winston H.

arXiv.org Artificial IntelligenceJun-2-2024

We observe that current state-of-the-art (SOTA) methods suffer from the performance imbalance issue when performing multi-task reinforcement learning (MTRL) tasks. While these methods may achieve impressive performance on average, they perform extremely poorly on a few tasks. To address this, we propose a new and effective method called STARS, which consists of two novel strategies: a shared-unique feature extractor and task-aware prioritized sampling. First, the shared-unique feature extractor learns both shared and task-specific features to enable better synergy of knowledge between different tasks. Second, the task-aware sampling strategy is combined with the prioritized experience replay for efficient learning on tasks with poor performance. The effectiveness and stability of our STARS are verified through experiments on the mainstream Meta-World benchmark. From the results, our STARS statistically outperforms current SOTA methods and alleviates the performance imbalance issue. Besides, we visualize the learned features to support our claims and enhance the interpretability of STARS.

machine learning, reinforcement learning, transition, (15 more...)

arXiv.org Artificial Intelligence

2406.00761

Country: Africa > Ethiopia (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Hung, Kuo-Han, Lo, Pang-Chi, Yeh, Jia-Fong, Hsu, Han-Yuan, Chen, Yi-Ting, Hsu, Winston H.

arXiv.org Artificial IntelligenceMay-26-2024

We study reward models for long-horizon manipulation tasks by learning from action-free videos and language instructions, which we term the visual-instruction correlation (VIC) problem. Recent advancements in cross-modality modeling have highlighted the potential of reward modeling through visual and language correlations. However, existing VIC methods face challenges in learning rewards for long-horizon tasks due to their lack of sub-stage awareness, difficulty in modeling task complexities, and inadequate object state estimation. To address these challenges, we introduce VICtoR, a novel hierarchical VIC reward model capable of providing effective reward signals for long-horizon manipulation tasks. VICtoR precisely assesses task progress at various levels through a novel stage detector and motion progress evaluator, offering insightful guidance for agents learning the task effectively. To validate the effectiveness of VICtoR, we conducted extensive experiments in both simulated and real-world environments. The results suggest that VICtoR outperformed the best existing VIC methods, achieving a 43% improvement in success rates for long-horizon tasks.

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2405.16545

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

Kung, Chi-Hsi, Lu, Shu-Wei, Tsai, Yi-Hsuan, Chen, Yi-Ting

arXiv.org Artificial IntelligenceApr-20-2024

In this paper, we study multi-label atomic activity recognition. Despite the notable progress in action recognition, it is still challenging to recognize atomic activities due to a deficiency in a holistic understanding of both multiple road users' motions and their contextual information. In this paper, we introduce Action-slot, a slot attention-based approach that learns visual action-centric representations, capturing both motion and contextual information. Our key idea is to design action slots that are capable of paying attention to regions where atomic activities occur, without the need for explicit perception guidance. To further enhance slot attention, we introduce a background slot that competes with action slots, aiding the training process in avoiding unnecessary focus on background regions devoid of activities. Yet, the imbalanced class distribution in the existing dataset hampers the assessment of rare activities. To address the limitation, we collect a synthetic dataset called TACO, which is four times larger than OATS and features a balanced distribution of atomic activities. To validate the effectiveness of our method, we conduct comprehensive experiments and ablation studies against various action recognition baselines. We also show that the performance of multi-label atomic activity recognition on real-world datasets can be improved by pretraining representations on TACO. We will release our source code and dataset. See the videos of visualization on the project page: https://hcis-lab.github.io/Action-slot/

action-slot, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2311.17948

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AED: Adaptable Error Detection for Few-shot Imitation Policy

Yeh, Jia-Fong, Hung, Kuo-Han, Lo, Pang-Chi, Chung, Chi-Ming, Wu, Tsung-Han, Su, Hung-Ting, Chen, Yi-Ting, Hsu, Winston H.

arXiv.org Artificial IntelligenceFeb-6-2024

We study how to report few-shot imitation (FSI) policies' behavior errors in novel environments, a novel task named adaptable error detection (AED). The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsistent with the intent of demonstrations. We develop a cross-domain benchmark for the challenging AED task, consisting of 329 base and 158 novel environments. This task introduces three challenges, including (1) detecting behavior errors in novel environments, (2) behavior errors occurring without revealing notable changes, and (3) lacking complete temporal information of the rollout due to the necessity of online detection. To address these challenges, we propose Pattern Observer (PrObe) to parse discernible patterns in the policy feature representations of normal or error states, whose effectiveness is verified in the proposed benchmark. Through our comprehensive evaluation, PrObe consistently surpasses strong baselines and demonstrates a robust capability to identify errors arising from a wide range of FSI policies. Moreover, we conduct comprehensive ablations and experiments (error correction, demonstration quality, etc.) to validate the practicality of our proposed task and methodology.

artificial intelligence, demonstration, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.0386

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Data Science (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

SKT-Hang: Hanging Everyday Objects via Object-Agnostic Semantic Keypoint Trajectory Generation

Kuo, Chia-Liang, Chao, Yu-Wei, Chen, Yi-Ting

arXiv.org Artificial IntelligenceDec-8-2023

We study the problem of hanging a wide range of grasped objects on diverse supporting items. Hanging objects is a ubiquitous task that is encountered in numerous aspects of our everyday lives. However, both the objects and supporting items can exhibit substantial variations in their shapes and structures, bringing two challenging issues: (1) determining the task-relevant geometric structures across different objects and supporting items, and (2) identifying a robust action sequence to accommodate the shape variations of supporting items. To this end, we propose Semantic Keypoint Trajectory (SKT), an object-agnostic representation that is highly versatile and applicable to various everyday objects. We also propose Shape-conditioned Trajectory Deformation Network (SCTDN), a model that learns to generate SKT by deforming a template trajectory based on the task-relevant geometric structure features of the supporting items. We conduct extensive experiments and demonstrate substantial improvements in our framework over existing robot hanging methods in the success rate and inference time. Finally, our simulation-trained framework shows promising hanging results in the real world. For videos and supplementary materials, please visit our project webpage: https://hcis-lab.github.io/SKT-Hang/.

artificial intelligence, machine learning, trajectory, (13 more...)

arXiv.org Artificial Intelligence

2312.04936

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

RiskBench: A Scenario-based Benchmark for Risk Identification

Kung, Chi-Hsi, Yang, Chieh-Chi, Pao, Pang-Yuan, Lu, Shu-Wei, Chen, Pin-Lun, Lu, Hsin-Cheng, Chen, Yi-Ting

arXiv.org Artificial IntelligenceDec-4-2023

Intelligent driving systems aim to achieve a zero-collision mobility experience, requiring interdisciplinary efforts to enhance safety performance. This work focuses on risk identification, the process of identifying and analyzing risks stemming from dynamic traffic participants and unexpected events. While significant advances have been made in the community, the current evaluation of different risk identification algorithms uses independent datasets, leading to difficulty in direct comparison and hindering collective progress toward safety performance enhancement. To address this limitation, we introduce \textbf{RiskBench}, a large-scale scenario-based benchmark for risk identification. We design a scenario taxonomy and augmentation pipeline to enable a systematic collection of ground truth risks under diverse scenarios. We assess the ability of ten algorithms to (1) detect and locate risks, (2) anticipate risks, and (3) facilitate decision-making. We conduct extensive experiments and summarize future research on risk identification. Our aim is to encourage collaborative endeavors in achieving a society with zero collisions. We have made our dataset and benchmark toolkit publicly on the project page: https://hcis-lab.github.io/RiskBench/

artificial intelligence, machine learning, scenario, (12 more...)

arXiv.org Artificial Intelligence

2312.01659

Country:

Asia > Taiwan (0.14)
North America > United States (0.14)
Europe (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.70)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)

Add feedback

Combined Scaling for Zero-shot Transfer Learning

Pham, Hieu, Dai, Zihang, Ghiasi, Golnaz, Kawaguchi, Kenji, Liu, Hanxiao, Yu, Adams Wei, Yu, Jiahui, Chen, Yi-Ting, Luong, Minh-Thang, Wu, Yonghui, Tan, Mingxing, Le, Quoc V.

arXiv.org Artificial IntelligenceApr-12-2023

We present a combined scaling method - named BASIC - that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses best published similar models - CLIP and ALIGN - by 9.3%. Our BASIC model also shows significant improvements in robustness benchmarks. For instance, on 5 test sets with natural distribution shifts such as ImageNet-{A,R,V2,Sketch} and ObjectNet, our model achieves 84.3% top-1 average accuracy, only a small drop from its original ImageNet accuracy. To achieve these results, we scale up the contrastive learning framework of CLIP and ALIGN in three dimensions: data size, model size, and batch size. Our dataset has 6.6B noisy image-text pairs, which is 4x larger than ALIGN, and 16x larger than CLIP. Our largest model has 3B weights, which is 3.75x larger in parameters and 8x larger in FLOPs than ALIGN and CLIP. Finally, our batch size is 65536 which is 2x more than CLIP and 4x more than ALIGN. We encountered two main challenges with the scaling rules of BASIC. First, the main challenge with implementing the combined scaling rules of BASIC is the limited memory of accelerators, such as GPUs and TPUs. To overcome the memory limit, we propose two simple methods which make use of gradient checkpointing and model parallelism. Second, while increasing the dataset size and the model size has been the defacto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well-understood. To shed light on the benefits of large contrastive batch sizes, we develop a theoretical framework which shows that larger contrastive batch sizes lead to smaller generalization gaps for image-text models such as BASIC.

accuracy, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2111.1005

Genre: Research Report > New Finding (0.67)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DROID: Driver-centric Risk Object Identification

Li, Chengxi, Chan, Stanley H., Chen, Yi-Ting

arXiv.org Artificial IntelligenceFeb-28-2023

Identification of high-risk driving situations is generally approached through collision risk estimation or accident pattern recognition. In this work, we approach the problem from the perspective of subjective risk. We operationalize subjective risk assessment by predicting driver behavior changes and identifying the cause of changes. To this end, we introduce a new task called driver-centric risk object identification (DROID), which uses egocentric video to identify object(s) influencing a driver's behavior, given only the driver's response as the supervision signal. We formulate the task as a cause-effect problem and present a novel two-stage DROID framework, taking inspiration from models of situation awareness and causal inference. A subset of data constructed from the Honda Research Institute Driving Dataset (HDD) is used to evaluate DROID. We demonstrate state-of-the-art DROID performance, even compared with strong baseline models using this dataset. Additionally, we conduct extensive ablative studies to justify our design choices. Moreover, we demonstrate the applicability of DROID for risk assessment.

artificial intelligence, computer vision, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2106.13201

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.67)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (0.91)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

CLR-GAM: Contrastive Point Cloud Learning with Guided Augmentation and Feature Mapping

Malla, Srikanth, Chen, Yi-Ting

arXiv.org Artificial IntelligenceFeb-27-2023

Point cloud data plays an essential role in robotics and self-driving applications. Yet, annotating point cloud data is time-consuming and nontrivial while they enable learning discriminative 3D representations that empower downstream tasks, such as classification and segmentation. Recently, contrastive learning-based frameworks have shown promising results for learning 3D representations in a self-supervised manner. However, existing contrastive learning methods cannot precisely encode and associate structural features and search the higher dimensional augmentation space efficiently. In this paper, we present CLR-GAM, a novel contrastive learning-based framework with Guided Augmentation (GA) for efficient dynamic exploration strategy and Guided Feature Mapping (GFM) for similar structural feature association between augmented point clouds. We empirically demonstrate that the proposed approach achieves state-of-the-art performance on both simulated and real-world 3D point cloud datasets for three different downstream tasks, i.e., 3D point cloud classification, few-shot learning, and object part segmentation.

artificial intelligence, machine learning, point cloud, (15 more...)

arXiv.org Artificial Intelligence

2302.14306

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback