Goto

Collaborating Authors

 Prendinger, Helmut


Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery

arXiv.org Artificial Intelligence

In this article, we explore the potential of zero-shot Large Multimodal Models (LMMs) in the domain of drone perception. We focus on person detection and action recognition tasks and evaluate two prominent LMMs, namely YOLO-World and GPT-4V(ision) using a publicly available dataset captured from aerial views. Traditional deep learning approaches rely heavily on large and high-quality training datasets. However, in certain robotic settings, acquiring such datasets can be resource-intensive or impractical within a reasonable timeframe. The flexibility of prompt-based Large Multimodal Models (LMMs) and their exceptional generalization capabilities have the potential to revolutionize robotics applications in these scenarios. Our findings suggest that YOLO-World demonstrates good detection performance. GPT-4V struggles with accurately classifying action classes but delivers promising results in filtering out unwanted region proposals and in providing a general description of the scenery. This research represents an initial step in leveraging LMMs for drone perception and establishes a foundation for future investigations in this area.


Accurate Household Occupant Behavior Modeling Based on Data Mining Techniques

AAAI Conferences

An important requirement of household energy simulation models is their accuracy in estimating energy demand and its fluctuations. Occupant behavior has a major impact upon energy demand. However, Markov chains, the traditional approach to model occupant behavior, (1) has limitations in accurately capturing the coordinated behavior of occupants and (2) is prone to over-fitting. To address these issues, we propose a novel approach that relies on a combination of data mining techniques. The core idea of our model is to determine the behavior of occupants based on nearest neighbor comparison over a database of sample data. Importantly, the model takes into account features related to the coordination of occupants' activities. We use a customized distance function suited for mixed categorical and numerical data. Further, association rule learning allows us to capture the coordination between occupants. Using real data from four households in Japan we are able to show that our model outperforms the traditional Markov chain model with respect to occupant coordination and generalization of behavior patterns.


Evaluating HILDA in the CODA Project: A Case Study in Question Generation Using Automatic Discourse Analysis

AAAI Conferences

Recent studies on question generation identify the need for automatic discourse analysers. We evaluated the feasibility of integrating an available discourse analyser called HILDA for a specific question generation system called CODA; introduce an approach by extracting a discourse corpus from the CODA parallel corpus; and identified future work towards automatic discourse analysis in the domain of question generation.