Narain, Jaya
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
Xu, Maxwell A., Narain, Jaya, Darnell, Gregory, Hallgrimsson, Haraldur, Jeong, Hyewon, Forde, Darren, Fineman, Richard, Raghuram, Karthik J., Rehg, James M., Ren, Shirley
We present RelCon, a novel self-supervised Relative Contrastive learning approach that uses a learnable distance measure in combination with a softened contrastive loss for training a motion foundation model from wearable sensor data. The learned distance provides a measure of semantic similarity between a pair of accelerometer time-series segments and is used to compare an anchor to various other sampled candidate segments. The self-supervised model is trained on 1 billion segments from 87,376 participants from a large wearables dataset. The model achieves strong performance across multiple downstream tasks, encompassing both classification and regression. To our knowledge, we are the first to show the generalizability of a self-supervised learning model with motion data from wearables across distinct evaluation tasks.

Advances in self-supervised learning (SSL) combined with the availability of large-scale datasets have resulted in a proliferation of foundation models (FMs) in computer vision (Oquab et al., 2023), NLP (OpenAI et al., 2023), and speech understanding (Yang et al., 2024). These models provide powerful, general-purpose representations for a particular domain of data and support generalization to a broad set of downstream tasks without the need for fine-tuning. For example, the image representation contained in the DINOv2 (Oquab et al., 2023) model was trained in an entirely self-supervised way and achieves state-of-the-art performance on multiple dense image prediction tasks, such as depth estimation and semantic segmentation, by decoding a frozen base representation with task-specific heads. In contrast to these advances, time-series data has not yet benefited from the foundation model approach, with a few exceptions (Abbaspourazad et al., 2024; Das et al., 2023). This is particularly unfortunate for problems in mobile health (mHealth) signal analysis, which encompasses data modalities such as accelerometry, PPG, and ECG (Rehg et al., 2017), as the collection of mHealth data from participants can be time-consuming and expensive. However, recent advances in self-supervised learning for mHealth signals (Abbaspourazad et al., 2024; Yuan et al., 2024; Xu et al., 2024) have shown promising performance, raising the question of whether it is now feasible to train foundation models for mHealth signals. In this paper, we demonstrate, for the first time, the feasibility of adopting a foundation model approach for the analysis of accelerometry data across tasks. Accelerometry is an important mHealth signal modality that is used in human activity recognition (HAR) (Haresamudram et al., 2022), physical health status assessment (Xu et al., 2022), energy expenditure estimation (Stutz et al., 2024), and gait assessment (Apple, 2021), among many other tasks.
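As a rough illustration of the idea, the Python sketch below implements a relative contrastive loss: a small learnable distance module ranks candidate segments by similarity to an anchor, and each candidate is then contrasted only against candidates ranked farther away. The module architecture, temperature, and function names are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only, not the authors' exact RelCon loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableDistance(nn.Module):
    """Small MLP that scores the dissimilarity between two embedded segments (assumed design)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, anchor, candidates):
        # anchor: (dim,), candidates: (n, dim) -> learned distances: (n,)
        pair = torch.cat([anchor.expand_as(candidates), candidates], dim=-1)
        return self.net(pair).squeeze(-1)

def relative_contrastive_loss(anchor, candidates, distance_fn, temperature=0.1):
    """Treat each candidate as a positive and contrast it against every candidate
    that the learnable distance ranks as farther from the anchor."""
    d = distance_fn(anchor, candidates)                           # (n,) learned distances
    sim = F.cosine_similarity(anchor.unsqueeze(0), candidates) / temperature
    loss, n = 0.0, candidates.shape[0]
    for i in range(n):
        farther = d > d[i]                                        # candidates ranked less similar
        if farther.any():
            denom = torch.logsumexp(torch.cat([sim[i:i + 1], sim[farther]]), dim=0)
            loss = loss + (denom - sim[i])                        # softmax cross-entropy term
    return loss / n

In practice the anchor and candidate vectors would be embeddings produced by the motion encoder being trained, with candidates drawn from the same and other recordings.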
Do LLMs "know" internally when they follow instructions?
Heo, Juyeon, Heinze-Deml, Christina, Elachqar, Oussama, Ren, Shirley, Nallasamy, Udhay, Miller, Andy, Chan, Kwan Ho Ryan, Narain, Jaya
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs' internal states relate to these outcomes is required. Our analysis of LLM internal states reveals a dimension in the input embedding space linked to successful instruction-following. We demonstrate that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality. Further investigation reveals that this dimension is more closely related to the phrasing of prompts than to the inherent difficulty of the task or instructions. This discovery also suggests explanations for why LLMs sometimes fail to follow clear instructions and why prompt engineering is often effective, even when the content remains largely unchanged. This work provides insight into the internal workings of LLMs' instruction-following, paving the way for reliable LLM agents.

Given the potential of large language models (LLMs), there has been significant interest in utilizing these models to build personal AI agents. For instance, one could imagine deploying an LLM as a personal healthcare assistant, such as a fitness or nutrition planner, or for psychological counseling (Li et al., 2024b; Wang et al., 2023; Tu et al., 2024). Compared to traditional machine learning-based AI agents, LLMs offer the advantage of being easily adaptable through prompting, allowing users to provide guidelines and personal information without the need to retrain model weights. Instruction-following is critical in the development of personal AI agents with LLMs through prompts, because these models must adhere to the given constraints and guidelines to ensure safe and trustworthy interactions. For example, suppose an LLM is building a personal fitness plan for a user with knee problems.
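As a hypothetical illustration of how such a dimension could be identified and used, the short Python sketch below fits a linear probe on internal representations labeled by instruction-following success and nudges a representation along the probe's weight direction. The probe choice, layer, and steering coefficient are assumptions for illustration, not the paper's exact procedure.

# Hypothetical sketch: finding and applying a "success direction" in representation space.
import numpy as np
from sklearn.linear_model import LogisticRegression

def find_success_direction(reps, followed):
    """reps: (n_prompts, hidden_dim) internal representations; followed: (n_prompts,) 0/1 labels."""
    probe = LogisticRegression(max_iter=1000).fit(reps, followed)
    direction = probe.coef_[0]
    return direction / np.linalg.norm(direction)   # unit vector along the probe's weights

def steer(rep, direction, alpha=2.0):
    """Shift a single representation toward the 'instruction followed' side of the probe."""
    return rep + alpha * direction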
Do LLMs estimate uncertainty well in instruction-following?
Heo, Juyeon, Xiong, Miao, Heinze-Deml, Christina, Narain, Jaya
Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions. However, recent studies have shown significant limitations in LLMs' instruction-following capabilities, raising concerns about their reliability in high-stakes applications. Accurately estimating LLMs' uncertainty in adhering to instructions is critical to mitigating deployment risks. We present, to our knowledge, the first systematic evaluation of the uncertainty estimation abilities of LLMs in the context of instruction-following. Our study identifies key challenges with existing instruction-following benchmarks, where multiple factors are entangled with the uncertainty that stems from instruction-following, complicating the isolation of this uncertainty and the comparison across methods and models. To address these issues, we introduce a controlled evaluation setup with two versions of benchmark data, enabling a comprehensive comparison of uncertainty estimation methods under various conditions. Our findings show that existing uncertainty estimation methods struggle, particularly when models make subtle errors in instruction following. While internal model states provide some improvement, they remain inadequate in more complex scenarios. The insights from our controlled evaluation setups provide a crucial understanding of LLMs' limitations and potential for uncertainty estimation in instruction-following tasks, paving the way for more trustworthy AI agents.
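For concreteness, the Python sketch below shows one common uncertainty baseline of the kind such an evaluation might compare: a sequence-likelihood confidence score assessed with AUROC against binary instruction-following labels. The score and function names are illustrative assumptions, not the specific methods evaluated in the paper.

# Illustrative baseline only: sequence-likelihood confidence, scored with AUROC.
import numpy as np
from sklearn.metrics import roc_auc_score

def sequence_confidence(token_logprobs):
    """Mean token log-probability of a generated response (higher = more confident)."""
    return float(np.mean(token_logprobs))

def evaluate_uncertainty(confidences, followed):
    """AUROC: how well the confidence score separates responses that followed
    the instruction (followed: 0/1 labels) from those that did not."""
    return roc_auc_score(followed, confidences)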
Latent Phrase Matching for Dysarthric Speech
Lea, Colin, Yee, Dianna, Narain, Jaya, Huang, Zifang, Tooley, Lauren, Bigham, Jeffrey P., Findlater, Leah
Many consumer speech recognition systems are not tuned for people with speech disabilities, resulting in poor recognition and user experience, especially for severe speech differences. Recent studies have emphasized interest in personalized speech models from people with atypical speech patterns. We propose a query-by-example-based personalized phrase recognition system that is trained using small amounts of speech, is language agnostic, does not assume a traditional pronunciation lexicon, and generalizes well across speech difference severities. On an internal dataset collected from 32 people with dysarthria, this approach works regardless of severity and shows a 60% improvement in recall relative to a commercial speech recognition system. On the public EasyCall dataset of dysarthric speech, our approach improves accuracy by 30.5%. Performance degrades as the number of phrases increases, but the approach consistently outperforms ASR systems when trained with 50 unique phrases.
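As a rough Python sketch of the query-by-example idea (not the paper's exact system), the code below matches a query utterance's latent frame sequence against enrolled phrase templates with dynamic time warping and returns the nearest enrolled phrase; the acoustic encoder that produces the latent frames is assumed to exist.

# Minimal query-by-example matcher over latent frame sequences (illustrative only).
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping over frame-wise Euclidean distances between two latent sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)   # length-normalized alignment cost

def recognize(query_latents, templates):
    """templates: {phrase: [enrollment latent sequences]}; return the closest enrolled phrase."""
    return min(
        templates,
        key=lambda phrase: min(dtw_distance(query_latents, t) for t in templates[phrase]),
    )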