Collaborating Authors

Tailor, Shyam


Scaling Wearable Foundation Models

arXiv.org Artificial Intelligence

Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ per-minute data, spanning heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter signals from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation, and extrapolation, both across time and across sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.
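The scaling-law analysis the abstract refers to is typically done by fitting a saturating power law to (compute, loss) observations. The following is a minimal sketch of that procedure; the functional form is the standard one from scaling-law studies, and the data points and fitted values are made-up placeholders, not results from the LSM paper.

```python
# Hypothetical sketch: fit a saturating power law L(C) = a * C**(-b) + c
# to (compute, validation-loss) pairs, as is common in scaling-law studies.
# All numbers below are illustrative, NOT results from LSM.
import numpy as np
from scipy.optimize import curve_fit

def power_law(compute, a, b, irreducible):
    """Loss as a function of compute: a * C^(-b) plus an irreducible floor."""
    return a * np.power(compute, -b) + irreducible

# Fake (compute budget, loss) observations for four training runs.
compute = np.array([1e0, 1e1, 1e2, 1e3])
loss = np.array([0.90, 0.62, 0.45, 0.38])

params, _ = curve_fit(power_law, compute, loss, p0=[1.0, 0.3, 0.3])
a, b, irreducible = params
print(f"fit: loss ~= {a:.2f} * C^(-{b:.2f}) + {irreducible:.2f}")

# Extrapolate the fitted curve to a 10x larger compute budget.
print(f"predicted loss at C=1e4: {power_law(1e4, *params):.3f}")
```

The same fit can be repeated with data size or parameter count on the x-axis to characterize scaling along each axis independently.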


Transforming Wearable Data into Health Insights using Large Language Model Agents

arXiv.org Artificial Intelligence

Personal health data, often derived from personal devices such as wearables, are distinguished by their multi-dimensional, continuous, and longitudinal measurements that capture granular observations of physiology and behavior in-situ rather than in a clinical setting. Research studies have highlighted the significant health impacts of physical activity and sleep patterns, emphasizing the potential for wearable-derived data to reveal personalized health insights and promote positive behavior changes [1, 4, 30, 46, 47]. For example, individuals with a device-measured Physical Activity Energy Expenditure (PAEE) 5 kJ/kg/day higher had a 37% lower premature mortality risk [47], and frequent sleep disturbances are associated with increased risk of hypertension, diabetes, and cardiovascular disease [9, 30]. A large meta-analysis suggests that activity trackers improve physical activity and promote weight loss, with users taking 1800 extra steps per day [16]. Despite these broad benefits, using wearable data to derive intelligent responses and insights to personal health queries remains non-trivial. These data are usually collected without clinical supervision, and users often do not have access to the expertise that could aid in their interpretation. For example, a common question among wearable device users is "How can I get better sleep?". Though a seemingly straightforward question, arriving at an ideal response would involve performing a series of complex, independent analytical steps across multiple irregularly sampled time series, such as: checking the availability of recent data, deciding on metrics to optimize (e.g. …)
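To make the kind of analytical step described above concrete, here is a minimal sketch of what "checking the availability of recent data" and "deciding on metrics to optimize" might look like over irregularly sampled sleep sessions. The column names, thresholds, and metric choice are hypothetical assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of one agent step before answering "How can I get
# better sleep?": verify that recent data exist, then compute a candidate
# metric (sleep efficiency) over an irregularly sampled session log.
# Column names and the sparsity threshold are assumptions, not from the paper.
import pandas as pd

def recent_sleep_summary(df: pd.DataFrame, days: int = 14) -> dict:
    """df: one row per sleep session with 'start'/'end' timestamps and
    'minutes_asleep'. Returns data availability plus simple summary metrics."""
    df = df.copy()
    df["start"] = pd.to_datetime(df["start"])
    df["end"] = pd.to_datetime(df["end"])

    cutoff = df["end"].max() - pd.Timedelta(days=days)
    recent = df[df["end"] >= cutoff]
    if len(recent) < days // 2:  # too sparse to analyze reliably
        return {"available": False, "sessions": len(recent)}

    time_in_bed = (recent["end"] - recent["start"]).dt.total_seconds() / 60
    efficiency = recent["minutes_asleep"] / time_in_bed
    return {
        "available": True,
        "sessions": len(recent),
        "mean_sleep_min": float(recent["minutes_asleep"].mean()),
        "mean_efficiency": float(efficiency.mean()),
    }
```

An agent would chain several such steps (availability checks, metric computation, comparison against population norms) before composing a natural-language answer.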


Towards a Personal Health Large Language Model

arXiv.org Artificial Intelligence

In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task, we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple-choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match the performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.
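One common way to realize the "multimodal encoding" the abstract mentions is an adapter that compresses a wearable time series into a handful of soft tokens that are prepended to the LLM's text embeddings. The sketch below illustrates that general pattern; the module choices, shapes, and token count are assumptions for illustration, not PH-LLM's published architecture.

```python
# Hypothetical sketch of a multimodal adapter: encode a day of wearable
# sensor data into a fixed number of soft tokens in the LLM embedding space.
# Shapes, pooling mechanism, and hyperparameters are assumptions, not
# PH-LLM's actual design.
import torch
import torch.nn as nn

class WearableAdapter(nn.Module):
    def __init__(self, n_channels: int, d_model: int, n_tokens: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(  # per-timestep feature projection
            nn.Linear(n_channels, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        # Learned queries pool the variable-length sequence into n_tokens.
        self.queries = nn.Parameter(torch.randn(n_tokens, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, sensors: torch.Tensor) -> torch.Tensor:
        # sensors: (batch, timesteps, n_channels), e.g. minute-level features
        h = self.encoder(sensors)                         # (B, T, D)
        q = self.queries.expand(sensors.size(0), -1, -1)  # (B, n_tokens, D)
        tokens, _ = self.attn(q, h, h)                    # cross-attention pool
        return tokens  # prepend to text embeddings before the LLM

adapter = WearableAdapter(n_channels=6, d_model=256)
soft_tokens = adapter(torch.randn(2, 1440, 6))  # two days of per-minute data
print(soft_tokens.shape)  # torch.Size([2, 8, 256])
```

Compressing the sequence to a few tokens keeps the LLM's context budget small while still letting it attend to physiological signal content alongside the textual prompt.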