AITopics | Tseng, Yu-Chee

Collaborating Authors

Tseng, Yu-Chee

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Resolving Positional Ambiguity in Dialogues by Vision-Language Models for Robot Navigation

Chen, Kuan-Lin, Wei, Tzu-Ti, Yeh, Li-Tzu, Kao, Elaine, Tseng, Yu-Chee, Chen, Jen-Jee

arXiv.org Artificial IntelligenceSep-30-2024

We consider an autonomous navigation robot that can accept human commands through natural language to provide services in an indoor environment. These natural language commands may include time, position, object, and action components. However, we observe that the positional components within such commands usually refer to objects in the environment that may contain different levels of positional ambiguity. For example, the command "Go to the chair!" may be ambiguous when there are multiple chairs of the same type in a room. In order to disambiguate these commands, we employ a large language model and a large vision-language model to conduct multiple turns of conversations with the user. We propose a two-level approach that utilizes a vision-language model to map the meanings in natural language to a unique object ID in images and then performs another mapping from the unique object ID to a 3D depth map, thereby allowing the robot to navigate from its current position to the target position. To the best of our knowledge, this is the first work linking foundation models to the positional ambiguity issue.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.12802

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network

Hsieh, Yi-Kuan, Hsieh, Jun-Wei, Tseng, Yu-Chee, Chang, Ming-Ching, Wang, Bor-Shiun

arXiv.org Artificial IntelligenceJan-2-2023

We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points were accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially for counting highly dense crowds that appear far away. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in end-to-end loss design and thus push the crowd counting state-of-the-art. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance of multiple scales and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train the SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF CC 50, NWPU and ShanghaiTech A-B datasets. The proposed SPF-Net can accurately predict the locations of people in the crowd, despite training on noisy training annotations.

artificial intelligence, crowd counting, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2211.06835

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback