AITopics | Su, Hung-Ting

Su, Hung-Ting

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses

Su, Hung-Ting, Hsu, Ya-Ching, Lin, Xudong, Shi, Xiang-Qian, Niu, Yulei, Hsu, Han-Yuan, Lee, Hung-yi, Hsu, Winston H.

arXiv.org Artificial Intelligence, Sep-22-2024

Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction capabilities, remains unexplored. This study utilizes tropes in movie synopses to assess the narrative reasoning abilities of state-of-the-art LLMs and uncovers their low performance. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method to embed trope-related text tokens into movie synopses.

Figure 1: While LLMs have revolutionized NLP reasoning, surpassing previous supervised learning (SL) methods and even reaching human-level performance on some tasks, their limitations become apparent when tested against the Trope dataset. NLU: Natural Language Understanding, CS: Commonsense. Check Section 1 and 2.2 for details.
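
The abstract reports its gain in F1 points. As a rough illustration of how an F1 score can be computed when each synopsis carries a set of trope labels, here is a minimal sketch. It assumes micro-averaging over all synopses, which the abstract does not specify, and the trope names and predictions are made up for the example.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-synopsis sets of trope labels (assumed averaging)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # tropes predicted and actually present
        fp += len(p - g)  # tropes predicted but not present
        fn += len(g - p)  # tropes present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and model predictions for two synopses.
gold = [{"Big Bad", "Heroic Sacrifice"}, {"Love Triangle"}]
pred = [{"Big Bad"}, {"Love Triangle", "Big Bad"}]
score = micro_f1(gold, pred)
```

Here two predictions are correct, one is spurious, and one gold trope is missed, so precision and recall are both 2/3.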

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2409.14324

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Media (0.68)
Leisure & Entertainment (0.46)
Law Enforcement & Public Safety (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.77)

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

Su, Hung-Ting, Chao, Chun-Tong, Hsu, Ya-Ching, Lin, Xudong, Niu, Yulei, Lee, Hung-Yi, Hsu, Winston H.

arXiv.org Artificial Intelligence, Jun-16-2024

Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reasoning: planning and integrating intermediate reasoning steps for understanding long-range videos with numerous frames. Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches. Our experiments show that current methods, including Captioner-Reasoner, Large Multimodal Model Instruction Fine-tuning, and Visual Programming, only marginally outperform a random baseline when tackling the challenges of Abstract Perception and Long-range Compositional Reasoning. To address these deficiencies, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR), which enhance Visual Programming by fostering role interaction awareness and progressively refining movie contexts and trope queries during reasoning processes, significantly improving performance by 15 F1 points. However, this performance still lags behind human levels (40 vs. 65 F1). Additionally, we introduce a new protocol to evaluate the necessity of Abstract Perception and Long-range Compositional Reasoning for task resolution. This is done by analyzing the code generated through Visual Programming using an Abstract Syntax Tree (AST), thereby confirming the increased complexity of TiM. The dataset and code are available at: https://ander1119.github.io/TiM
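
The abstract mentions analyzing code generated by Visual Programming through an Abstract Syntax Tree. The sketch below shows one way such an analysis could look using Python's standard `ast` module. The metrics (node count, call count, tree depth) and the generated program, including the helper names `find_person` and `query`, are assumptions for illustration and are not the paper's protocol.

```python
import ast


def ast_stats(source):
    """Return simple complexity statistics for a piece of generated Python code."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    calls = [n for n in nodes if isinstance(n, ast.Call)]

    def depth(node):
        # Depth of the subtree rooted at node, counting the node itself.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    return {"nodes": len(nodes), "calls": len(calls), "depth": depth(tree)}


# Hypothetical program a visual-programming model might emit.
generated = """
def execute(video):
    for frame in video:
        people = find_person(frame)
        if len(people) > 1:
            return query(frame, "are they arguing?")
    return "no"
"""
stats = ast_stats(generated)
```

Comparing such statistics across datasets is one plausible way to argue that a task demands longer or more deeply nested programs.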

large language model, long-range compositional reasoning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2406.10923

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.69)
Media > Film (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Enhancing Sustainable Urban Mobility Prediction with Telecom Data: A Spatio-Temporal Framework Approach

Lin, ChungYi, Tung, Shen-Lung, Su, Hung-Ting, Hsu, Winston H.

arXiv.org Artificial Intelligence, May-26-2024

Traditional traffic prediction, limited by the scope of sensor data, falls short in comprehensive traffic management. Mobile networks offer a promising alternative using network activity counts, but these lack crucial directionality. Thus, we present the TeltoMob dataset, featuring undirected telecom counts and corresponding directional flows, to predict directional mobility flows on roadways. To address this, we propose a two-stage spatio-temporal graph neural network (STGNN) framework. The first stage uses a pre-trained STGNN to process telecom data, while the second stage integrates directional and geographic insights for accurate prediction. Our experiments demonstrate the framework's compatibility with various STGNN models and confirm its effectiveness. We also show how to incorporate the framework into real-world transportation systems, enhancing sustainable urban mobility.

machine learning, mobility flow, temporal reasoning, (19 more...)

arXiv.org Artificial Intelligence

2405.17507

Country: Asia (0.14)

Genre: Research Report (0.50)

Industry:

Telecommunications (1.00)
Information Technology (1.00)
Transportation > Infrastructure & Services (0.92)
Transportation > Ground > Road (0.49)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.40)

Tracking-Assisted Object Detection with Event Cameras

Yen, Ting-Kang, Morawski, Igor, Dangi, Shusil, He, Kai, Lin, Chung-Yi, Yeh, Jia-Fong, Su, Hung-Ting, Hsu, Winston

arXiv.org Artificial Intelligence, Mar-27-2024

Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, because event features are asynchronous and sparse, objects with no motion relative to the camera become invisible, posing a significant challenge for the task. Prior works have studied various memory mechanisms to preserve as many features as possible at the current time, guided by temporal clues. While these implicitly learned memories retain some short-term information, they still struggle to preserve long-term features effectively. In this paper, we consider those invisible objects as pseudo-occluded objects and aim to reveal their features. First, we introduce a visibility attribute for objects and contribute an auto-labeling algorithm to append additional visibility labels to an existing event camera dataset. Second, we exploit tracking strategies for pseudo-occluded objects to maintain their permanence and retain their bounding boxes, even when features have not been available for a very long time. These strategies can be treated as an explicitly learned memory guided by the tracking objective to record the displacements of objects across frames. Lastly, we propose a spatio-temporal feature aggregation module to enrich the latent features and a consistency loss to increase the robustness of the overall pipeline. We conduct comprehensive experiments to verify our method's effectiveness, in which still objects are retained but truly occluded objects are discarded. The results demonstrate that (1) the additional visibility labels can assist in supervised training, and (2) our method outperforms state-of-the-art approaches with a significant improvement of 7.9% absolute mAP.

artificial intelligence, machine learning, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2403.1833

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

Lin, ChungYi, Tung, Shen-Lung, Su, Hung-Ting, Hsu, Winston H.

arXiv.org Artificial Intelligence, Mar-5-2024

Vehicle flow, a crucial indicator for transportation, is often limited by detector coverage. With the advent of extensive mobile network coverage, we can leverage mobile user activities, or cellular traffic, on roadways as a proxy for vehicle flow. However, as counts of cellular traffic may not directly align with vehicle flow due to data from various user types, we present a new task: predicting vehicle flow in camera-free areas using cellular traffic. To uncover correlations within multi-source data, we deployed cameras on selected roadways to establish the Tel2Veh dataset, consisting of extensive cellular traffic and sparse vehicle flows. Addressing this challenge, we propose a framework that independently extracts features and integrates them with a graph neural network (GNN)-based fusion to discern disparities, thereby enabling the prediction of unseen vehicle flows using cellular traffic. This work advances the use of telecom data in transportation and pioneers the fusion of telecom and vision-based data, offering solutions for traffic management.

artificial intelligence, machine learning, temporal reasoning, (14 more...)

arXiv.org Artificial Intelligence

2403.12991

Genre: Research Report (0.40)

Industry:

Telecommunications (1.00)
Information Technology > Networks (0.35)
Transportation > Infrastructure & Services (0.32)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

AED: Adaptable Error Detection for Few-shot Imitation Policy

Yeh, Jia-Fong, Hung, Kuo-Han, Lo, Pang-Chi, Chung, Chi-Ming, Wu, Tsung-Han, Su, Hung-Ting, Chen, Yi-Ting, Hsu, Winston H.

arXiv.org Artificial Intelligence, Feb-6-2024

We study how to report few-shot imitation (FSI) policies' behavior errors in novel environments, a novel task named adaptable error detection (AED). The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsistent with the intent of demonstrations. We develop a cross-domain benchmark for the challenging AED task, consisting of 329 base and 158 novel environments. This task introduces three challenges, including (1) detecting behavior errors in novel environments, (2) behavior errors occurring without revealing notable changes, and (3) lacking complete temporal information of the rollout due to the necessity of online detection. To address these challenges, we propose Pattern Observer (PrObe) to parse discernible patterns in the policy feature representations of normal or error states, whose effectiveness is verified in the proposed benchmark. Through our comprehensive evaluation, PrObe consistently surpasses strong baselines and demonstrates a robust capability to identify errors arising from a wide range of FSI policies. Moreover, we conduct comprehensive ablations and experiments (error correction, demonstration quality, etc.) to validate the practicality of our proposed task and methodology.

artificial intelligence, demonstration, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.0386

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Data Science (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling

Lin, ChungYi, Tung, Shen-Lung, Su, Hung-Ting, Hsu, Winston H.

arXiv.org Artificial Intelligence, Jan-6-2024

To address the limitations of traffic prediction from location-bound detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data source that leverages the extensive coverage of cellular traffic to capture mobility patterns. Our extensive analysis validates its potential for transportation. Focusing on vehicle-related GCT flow prediction, we propose a graph neural network that integrates multivariate, temporal, and spatial facets for improved accuracy. Experiments reveal our model's superiority over baselines, especially in long-term predictions. We also highlight the potential for GCT flow integration into transportation systems.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2401.03138

Genre: Research Report (0.50)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (0.93)
Transportation > Infrastructure & Services (0.92)
Transportation > Ground > Road (0.72)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change

Chyou, Chien Cheng, Su, Hung-Ting, Hsu, Winston H.

arXiv.org Artificial Intelligence, Aug-6-2023

Adversarial robustness poses a critical challenge in the deployment of deep learning models for real-world applications. Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data, which is often impractical. Existing unsupervised adversarial detection methods identify whether the target model works properly, but they suffer from poor accuracy owing to the use of the common cross-entropy training loss, which relies on unnecessary features and strengthens adversarial attacks. We propose new training losses that reduce useless features, together with a corresponding detection method that requires no prior knowledge of adversarial attacks. The detection rate (true positive rate) against all given white-box attacks is above 93.9% except for attacks without limits (DF($\infty$)), while the false positive rate is barely 2.5%. The proposed method works well on all tested attack types, and its false positive rates are even better than those of methods specialized for certain attack types.
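
For readers unfamiliar with the two metrics quoted in the abstract, the detection rate (true positive rate) and false positive rate can be computed from detector outputs as sketched below. The function name and the sample data are hypothetical, and the sketch is unrelated to the paper's detection method itself.

```python
def detection_rates(is_adversarial, flagged):
    """Return (true positive rate, false positive rate) for a binary detector."""
    pairs = list(zip(is_adversarial, flagged))
    tp = sum(1 for a, f in pairs if a and f)          # attacks that were flagged
    fn = sum(1 for a, f in pairs if a and not f)      # attacks that were missed
    fp = sum(1 for a, f in pairs if not a and f)      # clean inputs wrongly flagged
    tn = sum(1 for a, f in pairs if not a and not f)  # clean inputs passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical ground truth and detector decisions for eight inputs.
is_adversarial = [True, True, True, True, False, False, False, False]
flagged = [True, True, True, False, False, False, True, False]
tpr, fpr = detection_rates(is_adversarial, flagged)
```

In this toy data three of four attacks are caught and one of four clean inputs is flagged.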

artificial intelligence, machine learning, raw output, (16 more...)

arXiv.org Artificial Intelligence

2308.03243

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

Su, Hung-Ting, Niu, Yulei, Lin, Xudong, Hsu, Winston H., Chang, Shih-Fu

arXiv.org Artificial Intelligence, Apr-7-2023

Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-train question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, QG models only learn to ask association questions (e.g., "what is someone doing...") and result in inferior performance due to the poor transfer of association knowledge to CVidQA, which focuses on causal questions like "why is someone doing ...". Observing this, we propose to exploit causal knowledge to generate question-answer pairs and introduce a novel framework, Causal Knowledge Extraction from Language Models (CaKE-LM), which leverages causal commonsense knowledge from language models to tackle CVidQA. To extract knowledge from LMs, CaKE-LM generates causal questions containing two events with one triggering another (e.g., "score a goal" triggers "soccer player kicking ball") by prompting the LM with the action (soccer player kicking ball) to retrieve the intention (to score a goal). CaKE-LM significantly outperforms conventional methods by 4% to 6% in zero-shot CVidQA accuracy on NExT-QA and Causal-VidQA datasets. We also conduct comprehensive analyses and provide key findings for future research.
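
The action-to-intention step described in the abstract can be pictured with a small sketch. Everything here is an assumption for illustration: the prompt wording, the question template, and `fake_lm`, a canned stand-in for a real language model call. The abstract does not give the actual CaKE-LM prompts.

```python
def build_intention_prompt(action):
    """Build a prompt asking a language model for the intention behind an action."""
    return f"A {action}. The intention is to"


def make_causal_qa(action, intention):
    """Turn an (action, intention) pair into a causal question-answer pair."""
    return {"question": f"Why is the {action}?", "answer": f"to {intention}"}


def fake_lm(prompt):
    """Stand-in for a real language model; returns a canned completion."""
    canned = {"A soccer player kicking ball. The intention is to": "score a goal"}
    return canned[prompt]


action = "soccer player kicking ball"
intention = fake_lm(build_intention_prompt(action))
qa = make_causal_qa(action, intention)
```

With the example from the abstract, the resulting pair asks why the soccer player is kicking the ball and answers with the retrieved intention.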

artificial intelligence, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2304.03754

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Sports > Soccer (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

BIRD-PCC: Bi-directional Range Image-based Deep LiDAR Point Cloud Compression

Liu, Chia-Sheng, Yeh, Jia-Fong, Hsu, Hao, Su, Hung-Ting, Lee, Ming-Sui, Hsu, Winston H.

arXiv.org Artificial Intelligence, Mar-8-2023

The large amount of data collected by LiDAR sensors raises the problem of LiDAR point cloud compression (PCC). Previous works on LiDAR PCC have used range image representations and followed the predictive coding paradigm to create a basic prototype of a coding framework. However, their prediction methods give inaccurate results because they neglect invalid pixels in range images and omit future frames in the time step. Moreover, their handcrafted residual coding methods could not fully exploit spatial redundancy. To remedy this, we propose a coding framework, BIRD-PCC. Our prediction module is aware of the coordinates of invalid pixels in range images and adopts a bidirectional scheme. We also introduce a deep-learned residual coding module that can further exploit spatial redundancy within a residual frame. Experiments conducted on the SemanticKITTI and KITTI-360 datasets show that BIRD-PCC outperforms other methods in most bitrate conditions and generalizes well to unseen environments.
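
To make the ideas of bidirectional prediction and invalid-pixel awareness concrete, here is a toy sketch of predictive coding on range images. It is not the BIRD-PCC prediction module, which is learned. It swaps in a plain average of the previous and next frames, and it assumes that a range of 0.0 marks a pixel with no LiDAR return.

```python
INVALID = 0.0  # assumed marker for pixels with no LiDAR return


def predict_bidirectional(prev_frame, next_frame):
    """Predict the current range image from its two neighbours, skipping invalid pixels."""
    pred = []
    for row_p, row_n in zip(prev_frame, next_frame):
        row = []
        for p, n in zip(row_p, row_n):
            if p != INVALID and n != INVALID:
                row.append((p + n) / 2)  # both neighbours valid: average them
            elif p != INVALID:
                row.append(p)            # only the past frame is valid
            elif n != INVALID:
                row.append(n)            # only the future frame is valid
            else:
                row.append(INVALID)      # no usable reference
        pred.append(row)
    return pred


def residual(current, pred):
    """Residual to be coded: current minus prediction, zero where current is invalid."""
    return [
        [(c - q) if c != INVALID else 0.0 for c, q in zip(row_c, row_q)]
        for row_c, row_q in zip(current, pred)
    ]


# Hypothetical 2x2 range images (metres).
prev_frame = [[10.0, 0.0], [4.0, 0.0]]
next_frame = [[12.0, 8.0], [0.0, 0.0]]
current = [[11.5, 8.0], [4.0, 0.0]]
pred = predict_bidirectional(prev_frame, next_frame)
res = residual(current, pred)
```

Averaging an invalid zero into the prediction would have produced large spurious residuals, which is the kind of inaccuracy the abstract attributes to ignoring invalid pixels.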

artificial intelligence, compression, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.04027

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
