reminder
Microsoft Copilot claims it can set reminders. My phone never buzzed
PCWorld tested Microsoft Copilot's new reminder feature for Android and iOS phones, which allows setting reminders from a PC, similar to the old Cortana functions. The feature proved unreliable during testing, with reminders failing to trigger notifications on devices, raising concerns about Copilot's overall utility. With SimilarWeb reporting usage figures of only 1 percent for Copilot, this unreliability could further erode user trust and adoption. Microsoft has quietly added reminders to Copilot. Well, at least Copilot seems to think so.
- Information Technology > Security & Privacy (0.79)
- Leisure & Entertainment > Games > Computer Games (0.60)
VIGIL: A Reflective Runtime for Self-Healing Agents
Agentic LLM frameworks promise autonomous behavior via task decomposition, tool use, and iterative planning, but most deployed systems remain brittle. They lack runtime introspection, cannot diagnose their own failure modes, and do not improve over time without human intervention. In practice, many agent stacks degrade into decorated chains of LLM calls with no structural mechanisms for reliability. We present VIGIL (Verifiable Inspection and Guarded Iterative Learning), a reflective runtime that supervises a sibling agent and performs autonomous maintenance rather than task execution. VIGIL ingests behavioral logs, appraises each event into a structured emotional representation, maintains a persistent EmoBank with decay and contextual policies, and derives an RBT diagnosis that sorts recent behavior into strengths, opportunities, and failures. From this analysis, VIGIL generates both guarded prompt updates that preserve core identity semantics and read-only code proposals produced by a strategy engine that operates on log evidence and code hotspots. VIGIL functions as a state-gated pipeline. Illegal transitions produce explicit errors rather than allowing the LLM to improvise. In a reminder latency case study, VIGIL identified elevated lag, proposed prompt and code repairs, and, when its own diagnostic tool failed due to a schema conflict, surfaced the internal error, produced a fallback diagnosis, and emitted a repair plan. This demonstrates meta-level self-repair in a deployed agent runtime.
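The state-gated idea in the abstract above can be illustrated with a minimal sketch. The stage names, the transition table, and the `GatedPipeline` and `IllegalTransition` classes are hypothetical, chosen only to mirror the steps the abstract lists (ingest, appraise, diagnose, propose); they are not VIGIL's actual API.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names mirroring the steps named in the abstract.
    INGEST = auto()
    APPRAISE = auto()
    DIAGNOSE = auto()
    PROPOSE = auto()


# Assumed transition table: from each stage, only the listed targets are legal.
# None stands for "nothing has run yet".
ALLOWED = {
    None: {Stage.INGEST},
    Stage.INGEST: {Stage.APPRAISE},
    Stage.APPRAISE: {Stage.DIAGNOSE},
    Stage.DIAGNOSE: {Stage.PROPOSE},
    Stage.PROPOSE: set(),
}


class IllegalTransition(RuntimeError):
    """Raised when a caller tries to skip or reorder stages."""


class GatedPipeline:
    def __init__(self):
        self.stage = None  # no stage has run yet

    def advance(self, target):
        # Fail loudly on an illegal move; the current stage is left unchanged.
        if target not in ALLOWED[self.stage]:
            raise IllegalTransition(f"{self.stage} -> {target} is not allowed")
        self.stage = target
        return self.stage
```

Calling `advance(Stage.PROPOSE)` right after `advance(Stage.INGEST)` raises `IllegalTransition` and leaves the pipeline at `Stage.INGEST`, which is the "explicit error rather than improvisation" behavior the abstract describes.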
Pebble Index: Everything You Need to Know About the $75 Smart Ring
You can speak into the Pebble Index to have it remember things or set reminders, timers, and tasks. Pebble is on a roll--happily skipping along a calm lake, if you will. The resurrected smartwatch company recovered its trademarked name a few months ago, shipped all its new Pebble 2 Duo watches, and is about to start shipping the Pebble 2 Time, which alone received more than 25,000 preorders. But the company is already moving on to some new hardware: the Pebble Index 01. And unlike most other smart rings, the Pebble Index doesn't measure your heart rate or track your sleep.
- North America > United States > California (0.05)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- Information Technology (0.71)
- Health & Medicine > Therapeutic Area (0.35)
- Information Technology > Communications > Mobile (0.98)
- Information Technology > Artificial Intelligence > Natural Language (0.71)
Drift No More? Context Equilibria in Multi-Turn LLM Interactions
Dongre, Vardhan, Rossi, Ryan A., Lai, Viet Dac, Yoon, David Seunghyun, Hakkani-Tür, Dilek, Bui, Trung
Large Language Models (LLMs) excel at single-turn tasks such as instruction following and summarization, yet real-world deployments require sustained multi-turn interactions where user goals and conversational context persist and evolve. A recurring challenge in this setting is context drift: the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. In this work, we present a study of context drift in multi-turn interactions and propose a simple dynamical framework to interpret its behavior. We formalize drift as the turn-wise KL divergence between the token-level predictive distributions of the test model and a goal-consistent reference model, and propose a recurrence model that interprets its evolution as a bounded stochastic process with restoring forces and controllable interventions. We instantiate this framework in both synthetic long-horizon rewriting tasks and realistic user-agent simulations such as in $τ$-Bench, measuring drift for several open-weight LLMs that are used as user simulators. Our experiments consistently reveal stable, noise-limited equilibria rather than runaway degradation, and demonstrate that simple reminder interventions reliably reduce divergence in line with theoretical predictions. Together, these results suggest that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay, providing a foundation for studying and mitigating context drift in extended interactions.
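The drift measure and the equilibrium behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the KL direction (test model relative to reference), the averaging over token positions, and the linear recurrence with hypothetical parameters `alpha` (restoring strength), `beta` (per-turn drift pressure), and `delta` (reminder strength) are all choices made for the example, and the noise term is omitted to keep the sketch deterministic.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def turn_drift(test_dists, ref_dists):
    """Mean token-level KL for one turn (assumed direction: test vs. reference)."""
    kls = [kl_divergence(p, q) for p, q in zip(test_dists, ref_dists)]
    return sum(kls) / len(kls)


def simulate(d0, alpha, beta, turns, delta=0.0):
    """Assumed recurrence: d[t+1] = max(0, (1 - alpha) * d[t] + beta - delta).

    With delta = 0 and 0 < alpha < 1, the fixed point is beta / alpha, so
    drift settles at an equilibrium instead of growing without bound.
    """
    d = [d0]
    for _ in range(turns):
        d.append(max(0.0, (1 - alpha) * d[-1] + beta - delta))
    return d


baseline = simulate(d0=0.0, alpha=0.5, beta=0.1, turns=50)
with_reminder = simulate(d0=0.0, alpha=0.5, beta=0.1, turns=50, delta=0.05)
```

In this toy setting the baseline settles near `beta / alpha` and the reminder run settles at a lower level, which matches the qualitative claim that drift is a controllable equilibrium rather than runaway decay.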
Appendix A Reminders about integral probability metrics Let
In the context of Section 4.1, we have (at least) the following instantiations of Assumption 4.2: (i) Assume the reward is bounded by r We provide a proof for Lemma 4.1 for completeness. Now we prove Theorem 4.2. We first note that a two-sided bound follows from Lemma 4.1: | η We outline the practical MOPO algorithm in Algorithm 2. To answer question (3), we conduct a thorough ablation study on MOPO. The main goal of the ablation study is to understand how the choice of reward penalty affects performance. Require: reward penalty coefficient λ rollout horizon h, rollout batch size b .
- North America > United States (0.29)
- Asia > China (0.15)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Government (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Mobile (0.82)
Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams
Tanjim, Tauhid, George, Jonathan St., Ching, Kevin, Taylor, Angelique
The human-robot interaction (HRI) field has recognized the importance of enabling robots to interact with teams. Human teams rely on effective communication for successful collaboration in time-sensitive environments. Robots can play a role in enhancing team coordination through real-time assistance. Despite significant progress in human-robot teaming research, there remains an essential gap in how robots can effectively communicate with action teams using multimodal interaction cues in time-sensitive environments. This study addresses this knowledge gap in an experimental in-lab study to investigate how multimodal robot communication in action teams affects workload and human perception of robots. We explore team collaboration in a medical training scenario where a robotic crash cart (RCC) provides verbal and non-verbal cues to help users remember to perform iterative tasks and search for supplies. Our findings show that verbal cues for object search tasks and visual cues for task reminders reduce team workload and increase perceived ease of use and perceived usefulness more effectively than a robot with no feedback. Our work contributes to multimodal interaction research in the HRI field, highlighting the need for more human-robot teaming research to understand best practices for integrating collaborative robots in time-sensitive environments such as in hospitals, search and rescue, and manufacturing applications.
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
4 research-backed ways to beat the winter blues in the colder months
As winter approaches and daylight saving time has ended, many people are bracing themselves for shorter days, colder weather and what's often dismissed as the "winter blues." But these seasonal shifts are more than a passing inconvenience, and can disrupt people's energy, moods and daily routines. Seasonal affective disorder (SAD) is a condition that heightens depressive symptoms during the fall and winter months, while the "winter blues" refers to a milder, temporary dip in mood. Although the exact cause of SAD remains unclear, it's thought to be linked to reduced exposure to natural light during the fall and winter, which can disrupt our circadian rhythm. Lower light levels affect brain chemistry by reducing serotonin -- a neurotransmitter that regulates mood, sleep and appetite -- while keeping melatonin elevated during daylight hours, leading to sleepiness and fatigue.
- North America > Canada (0.05)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
- Asia > Middle East > Jordan (0.05)
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
He, Wei, Sun, Yueqing, Hao, Hongyan, Hao, Xueyuan, Xia, Zhikang, Gu, Qi, Han, Chengcheng, Zhao, Dengchang, Su, Hui, Zhang, Kefeng, Gao, Man, Su, Xi, Cai, Xiaodong, Cai, Xunliang, Yang, Yu, Zhao, Yunke
As LLM-based agents are increasingly deployed in real-life scenarios, existing benchmarks fail to capture their inherent complexity of handling extensive information, leveraging diverse resources, and managing dynamic user interactions. To address this gap, we introduce VitaBench, a challenging benchmark that evaluates agents on versatile interactive tasks grounded in real-world settings. Drawing from daily applications in food delivery, in-store consumption, and online travel services, VitaBench presents agents with the most complex life-serving simulation environment to date, comprising 66 tools. Through a framework that eliminates domain-specific policies, we enable flexible composition of these scenarios and tools, yielding 100 cross-scenario tasks (main results) and 300 single-scenario tasks. Each task is derived from multiple real user requests and requires agents to reason across temporal and spatial dimensions, utilize complex tool sets, proactively clarify ambiguous instructions, and track shifting user intent throughout multi-turn conversations. Moreover, we propose a rubric-based sliding window evaluator, enabling robust assessment of diverse solution pathways in complex environments and stochastic interactions. Our comprehensive evaluation reveals that even the most advanced models achieve only a 30% success rate on cross-scenario tasks, and a success rate below 50% on the others. Overall, we believe VitaBench will serve as a valuable resource for advancing the development of AI agents in practical real-world applications. The code, dataset, and leaderboard are available at https://vitabench.github.io/
- Transportation > Passenger (1.00)
- Health & Medicine (1.00)
- Transportation > Ground > Rail (0.94)