Not enough data to create a plot.
Try a different view from the menu above.
Wang, Yuhan
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios
Wang, Zeyu, Shi, Yuanchun, Wang, Yuntao, Yao, Yuchen, Yan, Kun, Wang, Yuhan, Ji, Lei, Xu, Xuhai, Yu, Chun
Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates the gaze data with the in-situ querying context. Then we implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrates its effectiveness by achieving both higher objective score and subjective score, compared to a baseline without gaze data. We further conducted interviews and provided insights for future gaze-facilitated information querying systems.
Lasso Ridge based XGBoost and Deep_LSTM Help Tennis Players Perform better
Zhai, Wankang, Wang, Yuhan
Understanding the dynamics of momentum and game fluctuation in tennis matches is cru-cial for predicting match outcomes and enhancing player performance. In this study, we present a comprehensive analysis of these factors using a dataset from the 2023 Wimbledon final. Ini-tially, we develop a sliding-window-based scoring model to assess player performance, ac-counting for the influence of serving dominance through a serve decay factor. Additionally, we introduce a novel approach, Lasso-Ridge-based XGBoost, to quantify momentum effects, lev-eraging the predictive power of XGBoost while mitigating overfitting through regularization. Through experimentation, we achieve an accuracy of 94% in predicting match outcomes, iden-tifying key factors influencing winning rates. Subsequently, we propose a Derivative of the winning rate algorithm to quantify game fluctuation, employing an LSTM_Deep model to pre-dict fluctuation scores. Our model effectively captures temporal correlations in momentum fea-tures, yielding mean squared errors ranging from 0.036 to 0.064. Furthermore, we explore me-ta-learning using MAML to transfer our model to predict outcomes in ping-pong matches, though results indicate a comparative performance decline. Our findings provide valuable in-sights into momentum dynamics and game fluctuation, offering implications for sports analytics and player training strategies.
LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting
Luo, Qinyao, He, Silu, Han, Xing, Wang, Yuhan, Li, Haifeng
Accurate traffic forecasting is a fundamental problem in intelligent transportation systems and learning long-range traffic representations with key information through spatiotemporal graph neural networks (STGNNs) is a basic assumption of current traffic flow prediction models. However, due to structural limitations, existing STGNNs can only utilize short-range traffic flow data; therefore, the models cannot adequately learn the complex trends and periodic features in traffic flow. Besides, it is challenging to extract the key temporal information from the long historical traffic series and obtain a compact representation. To solve the above problems, we propose a novel LSTTN (Long-Short Term Transformer-based Network) framework comprehensively considering the long- and short-term features in historical traffic flow. First, we employ a masked subseries Transformer to infer the content of masked subseries from a small portion of unmasked subseries and their temporal context in a pretraining manner, forcing the model to efficiently learn compressed and contextual subseries temporal representations from long historical series. Then, based on the learned representations, long-term trend is extracted by using stacked 1D dilated convolution layers, and periodic features are extracted by dynamic graph convolution layers. For the difficulties in making time-step level prediction, LSTTN adopts a short-term trend extractor to learn fine-grained short-term temporal features. Finally, LSTTN fuses the long-term trend, periodic features and short-term features to obtain the prediction results. Experiments on four real-world datasets show that in 60-minute-ahead long-term forecasting, the LSTTN model achieves a minimum improvement of 5.63\% and a maximum improvement of 16.78\% over baseline models. The source code is available at https://github.com/GeoX-Lab/LSTTN.
Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model
Zhang, Ruibin, Xue, Donglai, Wang, Yuhan, Geng, Ruixu, Gao, Fei
Millimeter wave (mmWave) radars have attracted significant attention from both academia and industry due to their capability to operate in extreme weather conditions. However, they face challenges in terms of sparsity and noise interference, which hinder their application in the field of micro aerial vehicle (MAV) autonomous navigation. To this end, this paper proposes a novel approach to dense and accurate mmWave radar point cloud construction via cross-modal learning. Specifically, we introduce diffusion models, which possess state-of-the-art performance in generative modeling, to predict LiDAR-like point clouds from paired raw radar data. We also incorporate the most recent diffusion model inference accelerating techniques to ensure that the proposed method can be implemented on MAVs with limited computing resources.We validate the proposed method through extensive benchmark comparisons and real-world experiments, demonstrating its superior performance and generalization ability. Code and pretrained models will be available at https://github.com/ZJU-FAST-Lab/Radar-Diffusion.
Development of a Reliable and Accessible Caregiving Language Model (CaLM)
Parmanto, Bambang, Aryoyudanta, Bayu, Soekinto, Wilbert, Setiawan, I Made Agus, Wang, Yuhan, Hu, Haomin, Saptono, Andi, Choi, Yong K.
Unlike professional caregivers, family caregivers often assume this role without formal preparation or training. Because of this, there is an urgent need to enhance the capacity of family caregivers to provide quality care. Large language models can potentially be used as a foundation technology for supporting caregivers as educational tools or as adjunct to care. This study aimed to develop a reliable Caregiving Language Model (CaLM) by using FMs and a caregiving knowledge base, develop an accessible CaLM using a small FM that requires fewer computing resources, and evaluate the performance of the model compared to a large FM. We developed CaLM using the Retrieval Augmented Generation (RAG) framework combined with FM fine-tuning for improving the quality of FM answers by grounding the model on a caregiving knowledge base. We used two small FMs as candidates for the FM of CaLM (LLaMA-2 and Falcon with 7B parameters) and larger FM GPT-3.5 as a benchmark. We developed the caregiving knowledge base by gathering various types of documents from the Internet. In this study, we focused on caregivers of individuals with Alzheimer's Disease Related Dementias. We evaluated the models' performance using the benchmark metrics commonly used in evaluating language models and their reliability to provide accurate references with the answers. The RAG framework improved the performance of all FMs used in this study across all measures. As expected, the large FM performed better than small FMs across all metrics. The most interesting result is that small fine-tuned FMs with RAG performed significantly better than GPT 3.5 across all metrics. The fine-tuned LLaMA-2 small FM performed better than GPT 3.5 (even with RAG) in returning references with the answers. The study shows that reliable and accessible CaLM can be developed by using small FMs with a knowledge base specific to the caregiving domain.
MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention
Wu, Ruolan, Yu, Chun, Pan, Xiaole, Liu, Yujia, Zhang, Ningning, Fu, Yue, Wang, Yuhan, Zheng, Zhi, Chen, Li, Jiang, Qiaolei, Xu, Xuhai, Shi, Yuanchun
Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conduct a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leverage large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We develop MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment physical contexts, mental states, app usage behaviors, users' goals & habits as input, and generates high-quality and flexible persuasive content with appropriate persuasion strategies. We conduct a 5-week field experiment (N=25) to compare MindShift with baseline techniques. The results show that MindShift significantly improves intervention acceptance rates by 17.8-22.5% and reduces smartphone use frequency by 12.1-14.4%. Moreover, users have a significant drop in smartphone addiction scale scores and a rise in self-efficacy. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.