Maekawa, Takuya
Reconstructing Depth Images of Moving Objects from Wi-Fi CSI Data
Cao, Guanyu, Maekawa, Takuya, Ohara, Kazuya, Kishino, Yasue
This study proposes a new deep learning method for reconstructing depth images of moving objects within a specific area using Wi-Fi channel state information (CSI). Wi-Fi-based depth imaging has novel applications in domains such as security and elder care. However, reconstructing depth images from CSI is challenging because learning the mapping function between CSI and depth images, both of which are high-dimensional data, is particularly difficult. To address this challenge, we propose a new approach called Wi-Depth. The main idea behind the design of Wi-Depth is that a depth image of a moving object can be decomposed into three core components: the shape, depth, and position of the target. In the depth-image reconstruction task, Wi-Depth therefore estimates these three pieces of information simultaneously as auxiliary tasks in our proposed VAE-based teacher-student architecture, enabling it to output images whose shape, depth, and position are mutually consistent. In addition, the design of Wi-Depth is based on our idea that this decomposition efficiently takes advantage of the fact that shape, depth, and position relate to primitive information that can be inferred from CSI, such as angle-of-arrival, time-of-flight, and Doppler frequency shift.
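The auxiliary-task idea can be illustrated with a minimal PyTorch-style sketch of a decoder that predicts the depth image alongside the three auxiliary targets. All layer sizes and names below are hypothetical and this omits the teacher-student VAE training; it is not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code): from a latent vector encoded
# from CSI, predict the depth image together with the three auxiliary targets
# (shape mask, scalar depth, 2-D position).
import torch
import torch.nn as nn

class MultiHeadDecoder(nn.Module):
    def __init__(self, latent_dim=128, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.shared = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU())
        self.depth_image_head = nn.Sequential(
            nn.Linear(256, img_size * img_size), nn.Sigmoid())  # reconstructed depth image
        self.shape_head = nn.Sequential(
            nn.Linear(256, img_size * img_size), nn.Sigmoid())  # binary silhouette of the target
        self.depth_head = nn.Linear(256, 1)                     # distance of the target
        self.position_head = nn.Linear(256, 2)                  # (x, y) position in the frame

    def forward(self, z):
        h = self.shared(z)
        return {
            "depth_image": self.depth_image_head(h).view(-1, 1, self.img_size, self.img_size),
            "shape": self.shape_head(h).view(-1, 1, self.img_size, self.img_size),
            "depth": self.depth_head(h),
            "position": self.position_head(h),
        }

# The total loss would combine the main reconstruction term with the three
# auxiliary terms, e.g. L = L_image + w1 * L_shape + w2 * L_depth + w3 * L_position.
outputs = MultiHeadDecoder()(torch.randn(8, 128))   # batch of 8 latent vectors
```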
Unsupervised Human Activity Recognition through Two-stage Prompting with ChatGPT
Xia, Qingxin, Maekawa, Takuya, Hara, Takahiro
Wearable sensor devices, which can record the objects a person uses while performing an activity, make unsupervised Human Activity Recognition (HAR) feasible. Unfortunately, previous unsupervised approaches based on object-usage sequences usually require activity descriptions manually prepared by humans. Instead, we leverage the knowledge embedded in the Large Language Model (LLM) behind ChatGPT. Because the sequence of used objects robustly characterizes an activity, it is likely that ChatGPT has already learned the association between activities and objects from existing text. However, previous prompt engineering for ChatGPT exhibits limited generalization ability when dealing with a list of words (i.e., a sequence of objects) because each word in the list is weighted similarly. In this study, we propose a two-stage prompt engineering approach that first guides ChatGPT to generate activity descriptions associated with the objects, emphasizing the objects that are important for distinguishing similar activities, and then outputs activity classes together with explanations that enrich the context helpful for HAR. To the best of our knowledge, this is the first study to use ChatGPT to recognize activities from used objects in an unsupervised manner. We evaluated our approach on three datasets and demonstrated state-of-the-art performance.
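A rough sketch of the two-stage prompting pattern with the OpenAI Python client is shown below. The prompt wording, the model name, and the candidate activity list are assumptions for illustration, not the paper's actual prompts.

```python
# Illustrative two-stage prompting sketch (prompts, model name, and activity
# list are assumptions, not the paper's exact setup).
from openai import OpenAI

client = OpenAI()

def stage1_describe(objects):
    # Stage 1: describe the likely activity suggested by the object sequence,
    # emphasizing objects that distinguish it from similar activities.
    prompt = (
        "A person used these objects in order: " + ", ".join(objects) + ". "
        "Describe the likely activity, paying attention to objects that "
        "distinguish it from similar activities."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def stage2_classify(description, candidate_activities):
    # Stage 2: output one activity class plus an explanation.
    prompt = (
        f"Description: {description}\n"
        f"Choose exactly one activity from {candidate_activities} and explain why."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

description = stage1_describe(["kettle", "mug", "coffee jar"])
print(stage2_classify(description, ["make coffee", "make tea", "wash dishes"]))
```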
A benchmark for computational analysis of animal behavior, using animal-borne tags
Hoffman, Benjamin, Cusimano, Maddie, Baglione, Vittorio, Canestrari, Daniela, Chevallier, Damien, DeSantis, Dominic L., Jeantet, Lorène, Ladds, Monique A., Maekawa, Takuya, Mata-Silva, Vicente, Moreno-González, Víctor, Trapote, Eva, Vainio, Outi, Vehkaoja, Antti, Yoda, Ken, Zacarian, Katherine, Friedlaender, Ari, Rutz, Christian
Animal-borne sensors ('bio-loggers') can record a suite of kinematic and environmental data, which can elucidate animal ecophysiology and improve conservation efforts. Machine learning techniques are useful for interpreting the large amounts of data recorded by bio-loggers, but there exists no standard for comparing the different machine learning techniques in this domain. To address this, we present the Bio-logger Ethogram Benchmark (BEBE), a collection of datasets with behavioral annotations, standardized modeling tasks, and evaluation metrics. BEBE is to date the largest, most taxonomically diverse, publicly available benchmark of this type, and includes 1654 hours of data collected from 149 individuals across nine taxa. We evaluate the performance of ten different machine learning methods on BEBE, and identify key challenges to be addressed in future work. Datasets, models, and evaluation code are made publicly available at https://github.com/earthspecies/BEBE, to enable community use of BEBE as a point of comparison in methods development.
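For a sense of the kind of modeling task such a benchmark evaluates, the sketch below trains a simple windowed-feature classifier on toy tri-axial accelerometer data and scores it with macro-averaged F1. This is only an assumed illustration of the task style; the actual BEBE datasets, tasks, and metrics are defined in the repository linked above.

```python
# Hedged sketch of a supervised baseline on windowed bio-logger data
# (toy data; not BEBE's evaluation code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def make_windows(signal, labels, win=100):
    """Slice a (T, 3) accelerometer stream into non-overlapping windows of simple stats."""
    feats, ys = [], []
    for start in range(0, len(signal) - win + 1, win):
        seg = signal[start:start + win]
        feats.append(np.concatenate([seg.mean(0), seg.std(0), seg.min(0), seg.max(0)]))
        ys.append(np.bincount(labels[start:start + win]).argmax())  # majority behaviour label
    return np.array(feats), np.array(ys)

rng = np.random.default_rng(0)
acc = rng.normal(size=(5000, 3))            # toy accelerometer stream
beh = rng.integers(0, 4, size=5000)         # four hypothetical behaviour classes

X, y = make_windows(acc, beh)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:40], y[:40])
print("macro F1:", f1_score(y[40:], clf.predict(X[40:]), average="macro"))
```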
OpenPack: A Large-scale Dataset for Recognizing Packaging Works in IoT-enabled Logistic Environments
Yoshimura, Naoya, Morales, Jaime, Maekawa, Takuya, Hara, Takahiro
Unlike datasets of human daily activities, publicly available sensor datasets for work activity recognition in industrial domains remain limited, because collecting realistic data requires close collaboration with industrial sites. This in turn limits research on and development of AI methods for industrial applications. To address these challenges and contribute to research on machine recognition of work activities in industrial domains, we introduce OpenPack, a new large-scale dataset for packaging work recognition. OpenPack contains 53.8 hours of multimodal sensor data, including keypoints, depth images, acceleration data, and readings from IoT-enabled devices (e.g., handheld barcode scanners used in the work procedures), collected from 16 subjects with different levels of packaging work experience. On the basis of this dataset, we propose a neural network model for recognizing work activities that efficiently fuses sensor data and readings from IoT-enabled devices by processing them in separate streams of a ladder-shaped architecture, and our experiments showed the effectiveness of this architecture. We believe that OpenPack will contribute to the sensor-based action/activity recognition community. The OpenPack dataset is available at https://open-pack.github.io/.
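The two-stream fusion idea can be sketched as below in PyTorch: a dense sensor stream and a sparse IoT-reading stream are processed separately and exchange features at each level, in a ladder-like pattern. The layer sizes and the exact connection scheme are assumptions, not the OpenPack authors' model.

```python
# Hedged sketch (assumed architecture) of two streams joined by cross-stream "rungs".
import torch
import torch.nn as nn

class TwoStreamLadder(nn.Module):
    def __init__(self, sensor_dim=6, iot_dim=4, hidden=64, n_classes=10):
        super().__init__()
        self.sensor_layers = nn.ModuleList(
            [nn.Linear(sensor_dim, hidden), nn.Linear(hidden, hidden)])
        self.iot_layers = nn.ModuleList(
            [nn.Linear(iot_dim, hidden), nn.Linear(hidden, hidden)])
        self.rungs = nn.ModuleList(                      # cross-stream fusion at each level
            [nn.Linear(2 * hidden, hidden) for _ in range(2)])
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, sensor, iot):
        fused = None
        for s_layer, i_layer, rung in zip(self.sensor_layers, self.iot_layers, self.rungs):
            sensor = torch.relu(s_layer(sensor))
            iot = torch.relu(i_layer(iot))
            fused = torch.relu(rung(torch.cat([sensor, iot], dim=-1)))
            sensor = sensor + fused                      # feed fused features back into the stream
        return self.classifier(fused)

model = TwoStreamLadder()
logits = model(torch.randn(8, 6), torch.randn(8, 4))    # batch of 8 windows
```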
Using Social Media Background to Improve Cold-start Recommendation Deep Models
Zhang, Yihong, Maekawa, Takuya, Hara, Takahiro
In recommender systems, a cold-start problem occurs when there is no past interaction record associated with a user or item. Typical solutions to the cold-start problem make use of contextual information, such as user demographic attributes or product descriptions. A group of works has shown that social media background can help predict temporal phenomena such as product sales and stock price movements. In this work, our goal is to investigate whether social media background can be used as extra contextual information to improve recommendation models. Building on an existing deep neural network model, we propose a method that represents the temporal social media background as embeddings and fuses them into the model as an extra component. We conduct experimental evaluations on a real-world e-commerce dataset and a Twitter dataset. The results show that fusing social media background with the existing model generally improves recommendation performance; in some cases, the recommendation accuracy measured by hit-rate@K doubles. Our findings can benefit future recommender system designs that consider complex temporal information representing social interests.
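The general fusion idea can be sketched as a neural recommender whose user and item embeddings are concatenated with an embedding of the social-media background at the interaction's timestamp. Dimensions, layer choices, and names below are illustrative assumptions, not the paper's exact model.

```python
# Hedged sketch: fuse a temporal social-media background vector (e.g., a topic
# vector of tweets around the interaction time) with user/item embeddings.
import torch
import torch.nn as nn

class SocialContextRecommender(nn.Module):
    def __init__(self, n_users, n_items, background_dim=32, emb_dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.background_proj = nn.Linear(background_dim, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, user_ids, item_ids, background):
        x = torch.cat([self.user_emb(user_ids),
                       self.item_emb(item_ids),
                       self.background_proj(background)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # predicted interaction probability

model = SocialContextRecommender(n_users=1000, n_items=500)
score = model(torch.tensor([3]), torch.tensor([42]), torch.randn(1, 32))
```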
Generating an Event Timeline About Daily Activities From a Semantic Concept Stream
Miyanishi, Taiki, Hirayama, Jun-ichiro, Maekawa, Takuya, Kawanabe, Motoaki
Recognizing activities of daily living (ADLs) in the real world is an important task for understanding everyday human life. However, even though our life events consist of chronological ADLs with the corresponding places and objects (e.g., drinking coffee in the living room after making coffee in the kitchen and walking to the living room), most existing works focus on predicting individual activity labels from sensor data. In this paper, we introduce a novel framework that produces an event timeline of ADLs in a home environment. The proposed method combines semantic concepts, such as actions, objects, and places detected by sensors, to generate stereotypical event sequences with the following three real-world properties. First, we use temporal interactions among concepts to remove objects and places unrelated to each action. Second, we use commonsense knowledge mined from a language resource to find possible combinations of concepts in the real world. Third, we use temporal variations of events to filter repetitive events, since our daily life changes over time. We use cross-place validation to evaluate our proposed method on a daily-activities dataset with manually labeled event descriptions. The empirical evaluation demonstrates that our method using real-world properties improves the performance of generating an event timeline over diverse environments.
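The concept-filtering step can be illustrated with a small sketch that keeps only the objects and places plausibly associated with each detected action and drops repetitive events. The association scores here are hypothetical stand-ins for statistics mined from a language resource; this is not the authors' implementation.

```python
# Illustrative sketch: filter a detected concept stream into a timeline of events.
COMMONSENSE_SCORE = {          # hypothetical action-concept association scores
    ("drink", "coffee"): 0.9, ("drink", "living room"): 0.6,
    ("drink", "scissors"): 0.05, ("walk", "living room"): 0.7,
}

def build_timeline(concept_stream, threshold=0.3):
    timeline = []
    for t, action, candidates in concept_stream:        # candidates = detected objects/places
        kept = [c for c in candidates
                if COMMONSENSE_SCORE.get((action, c), 0.0) >= threshold]
        event = f"{action} " + " / ".join(kept) if kept else action
        if not timeline or timeline[-1][1] != event:     # drop repetitive events
            timeline.append((t, event))
    return timeline

stream = [("08:00", "drink", ["coffee", "scissors"]),
          ("08:01", "drink", ["coffee"]),
          ("08:05", "walk", ["living room"])]
print(build_timeline(stream))
```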
Egocentric Video Search via Physical Interactions
Miyanishi, Taiki, Hirayama, Jun-ichiro, Kong, Quan, Maekawa, Takuya, Moriya, Hiroki, Suyama, Takayuki
Retrieving past egocentric videos of personal daily life is important for supporting and augmenting human memory. Most previous retrieval approaches have ignored the crucial role of interactions between humans and the physical world, which are closely related to our memory and experience of daily activities. In this paper, we propose a gesture-based egocentric video retrieval framework that retrieves past visual experience using body gestures as non-verbal queries. We use a probabilistic framework based on canonical correlation analysis that models physical interactions through a latent space and uses them for egocentric video retrieval and for re-ranking search results. By incorporating physical interactions into the retrieval models, we address the problems caused by the variability of human motion. We evaluate our proposed method on motion and egocentric video datasets of daily activities in household settings and demonstrate that our egocentric video retrieval framework robustly improves retrieval performance when retrieving past videos from personal and even other persons' video archives.
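A simplified version of the CCA-based cross-modal retrieval idea is sketched below with scikit-learn: paired motion and video features are projected into a shared latent space, and archived clips are ranked by similarity to a gesture query. The feature dimensions and data are toy stand-ins; the paper's probabilistic model and re-ranking step are not reproduced here.

```python
# Hedged sketch of CCA-based gesture-to-video retrieval (toy data, assumed setup).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
motion_feats = rng.normal(size=(200, 30))     # body-gesture features (paired training data)
video_feats = rng.normal(size=(200, 50))      # egocentric-video features (paired training data)

cca = CCA(n_components=10)
cca.fit(motion_feats, video_feats)

def retrieve(query_motion, archive_videos, top_k=5):
    """Rank archived video clips by cosine similarity to the gesture query in CCA space."""
    q = cca.transform(query_motion.reshape(1, -1))                      # project the query (motion view)
    _, v = cca.transform(np.zeros((len(archive_videos), motion_feats.shape[1])),
                         archive_videos)                                # project the archive (video view)
    sim = (q @ v.T) / (np.linalg.norm(q) * np.linalg.norm(v, axis=1) + 1e-9)
    return np.argsort(-sim.ravel())[:top_k]

archive = rng.normal(size=(100, 50))          # features of 100 archived egocentric clips
print(retrieve(rng.normal(size=30), archive))
```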