An Ontology for Unified Modeling of Tasks, Actions, Environments, and Capabilities in Personal Service Robotics
Martorana, Margherita, Urgese, Francesca, Tiddi, Ilaria, Schlobach, Stefan
Personal service robots are increasingly used in domestic settings to assist older adults and people requiring support. Effective operation involves not only physical interaction but also the ability to interpret dynamic environments, understand tasks, and choose appropriate actions based on context. This requires integrating both hardware components (e.g. sensors, actuators) and software systems capable of reasoning about tasks, environments, and robot capabilities. Frameworks such as the Robot Operating System (ROS) provide open-source tools that help connect low-level hardware with higher-level functionalities. However, real-world deployments remain tightly coupled to specific platforms. As a result, solutions are often isolated and hard-coded, limiting interoperability, reusability, and knowledge sharing. Ontologies and knowledge graphs offer a structured way to represent tasks, environments, and robot capabilities. Existing ontologies, such as the Socio-physical Model of Activities (SOMA) and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), provide models for activities, spatial relationships, and reasoning structures. However, they often focus on specific domains and do not fully capture the connection between environment, action, robot capabilities, and system-level integration. In this work, we propose the Ontology for roBOts and acTions (OntoBOT), which extends existing ontologies to provide a unified representation of tasks, actions, environments, and capabilities. Our contributions are twofold: (1) we unify these aspects into a cohesive ontology to support formal reasoning about task execution, and (2) we demonstrate its generalizability by evaluating competency questions across four embodied agents - TIAGo, HSR, UR3, and Stretch - showing how OntoBOT enables context-aware reasoning, task-oriented execution, and knowledge sharing in service robotics.
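The competency-question style of evaluation described above can be illustrated with a minimal sketch, assuming a toy capability model. The task requirements and capability names below are illustrative placeholders, not classes or properties taken from OntoBOT itself:

```python
# Toy version of a competency question OntoBOT targets:
# "which robots have the capabilities required by a given task?"
# All names below are illustrative, not from the ontology.

TASK_REQUIREMENTS = {
    "fetch_cup": {"navigation", "grasping", "object_recognition"},
    "open_door": {"navigation", "manipulation"},
}

ROBOT_CAPABILITIES = {
    "TIAGo": {"navigation", "grasping", "manipulation", "object_recognition"},
    "UR3":   {"grasping", "manipulation"},  # fixed-base arm: no navigation
}

def capable_robots(task: str) -> set[str]:
    """Return the robots whose capabilities cover the task's requirements."""
    required = TASK_REQUIREMENTS[task]
    return {r for r, caps in ROBOT_CAPABILITIES.items() if required <= caps}
```

In the real system this matching would be expressed as ontology axioms and queries rather than Python sets, but the subset check captures the shape of the reasoning.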
The science behind what we eat for breakfast
Our ideas of what qualifies as breakfast food are cultural distinctions, not scientific ones. Picture a typical breakfast: if you live in the United States, there's a good chance you were picturing some combination of eggs, bacon, cereal, and/or pancakes. Of course, we also know that a classic British breakfast consists of beans and fried bread, two savory foods most Americans don't associate with their first meal of the day. Is there a science behind why we think of some foods as breakfast and others as not?
Four science-based rules that will make your conversations flow
One of the four pillars of good conversation is levity: you needn't be a comedian, but you can have some fun. Conversation lies at the heart of our relationships – yet many of us find it surprisingly hard to talk to others. We may feel anxious at the thought of making small talk with strangers and struggle to connect with the people who are closest to us. If that sounds familiar, Alison Wood Brooks hopes to help. She is a professor at Harvard Business School, where she teaches an oversubscribed course called "TALK: How to talk gooder in business and life", and the author of a new book, Talk: The science of conversation and the art of being ourselves.
VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
Sung, Yoo Yeon, Kim, Hannah, Zhang, Dan
AI practitioners increasingly use large language model (LLM) agents in compound AI systems to solve complex reasoning tasks, but these agent executions often fail to meet human standards, leading to errors that compromise the system's overall performance. Addressing these failures through human intervention is challenging due to the agents' opaque reasoning processes, misalignment with human expectations, the complexity of agent dependencies, and the high cost of manual inspection. This paper thus introduces a human-centered evaluation framework for Verifying LLM Agent failures (VeriLA), which systematically assesses agent failures to reduce human effort and make them interpretable to humans. The framework first defines clear expectations of each agent by curating human-designed agent criteria. Then, it develops a human-aligned agent verifier module, trained with human gold standards, to assess each agent's execution output. This approach enables granular evaluation of each agent's performance by revealing failures from a human standard, offering clear guidelines for revision, and reducing human cognitive load. Our case study results show that VeriLA is both interpretable and efficient in helping practitioners interact more effectively with the system. By upholding accountability in human-agent collaboration, VeriLA paves the way for more trustworthy and human-aligned compound AI systems.
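A minimal sketch of the per-agent verification idea, assuming illustrative criteria and a simple weighted aggregation. VeriLA's actual verifier is a trained, human-aligned module; the placeholder checks and threshold below are not from the paper:

```python
# Hedged sketch: score an agent's output against human-designed criteria
# and report which criteria failed. Criteria here are toy placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str
    check: Callable[[str], float]  # returns a score in [0, 1]
    weight: float

def verify_agent_output(output: str, criteria: list[Criterion],
                        threshold: float = 0.7) -> tuple[float, list[str]]:
    """Weighted aggregate score plus the names of failed criteria."""
    total_w = sum(c.weight for c in criteria)
    score = sum(c.weight * c.check(output) for c in criteria) / total_w
    failed = [c.name for c in criteria if c.check(output) < threshold]
    return score, failed

criteria = [
    Criterion("non_empty", lambda o: 1.0 if o.strip() else 0.0, weight=1.0),
    Criterion("cites_source", lambda o: 1.0 if "[source]" in o else 0.0, weight=2.0),
]
score, failed = verify_agent_output("Answer without citation.", criteria)
```

Surfacing the failed criterion names, rather than only an aggregate score, is what makes the failure interpretable to the practitioner reviewing it.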
Presumed Cultural Identity: How Names Shape LLM Responses
Pawar, Siddhesh, Arora, Arnav, Kaffee, Lucie-Aimée, Augenstein, Isabelle
Names are deeply tied to human identity. They can serve as markers of individuality, cultural heritage, and personal history. However, using names as a core indicator of identity can lead to over-simplification of complex identities. When interacting with LLMs, user names are an important point of information for personalisation. Names can enter chatbot conversations through direct user input (requested by chatbots), as part of task contexts such as CV reviews, or as built-in memory features that store user information for personalisation. We study biases associated with names by measuring cultural presumptions in the responses generated by LLMs when presented with common suggestion-seeking queries, which might involve making assumptions about the user. Our analyses demonstrate strong assumptions about cultural identity associated with names present in LLM generations across multiple cultures. Our work has implications for designing more nuanced personalisation systems that avoid reinforcing stereotypes while maintaining meaningful customisation.
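As a rough illustration of this kind of probing, one could compare responses generated for the same query under different user names and count culture-linked markers in each. The marker lists below are illustrative, and simple keyword matching is a deliberate simplification of the paper's analysis:

```python
# Toy marker counter for culture-linked assumptions in an LLM response.
# The marker lists are illustrative examples, not the paper's lexicon.

CULTURE_MARKERS = {
    "indian": ["biryani", "diwali", "bollywood"],
    "italian": ["pasta", "espresso"],
}

def count_culture_markers(response: str) -> dict[str, int]:
    """Count occurrences of each culture's marker terms in a response."""
    text = response.lower()
    return {culture: sum(text.count(m) for m in markers)
            for culture, markers in CULTURE_MARKERS.items()}
```

Running this over responses to the same suggestion-seeking query issued under different names would reveal whether the model's recommendations shift toward the culture stereotypically associated with each name.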
Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments
Arora, Raghav, Singh, Shivam, Swaminathan, Karthik, Datta, Ahana, Banerjee, Snehasis, Bhowmick, Brojeshwar, Jatavallabhula, Krishna Murthy, Sridharan, Mohan, Krishna, Madhava
Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLMs), but they do so at the level of high-level tasks and/or require many training examples. Our framework leverages the generic knowledge of LLMs through a small number of prompts to perform high-level task anticipation, using the anticipated tasks as goals in a classical planning system to compute a sequence of finer-granularity actions that jointly achieve these goals. We ground and evaluate our framework's abilities in realistic scenarios in the VirtualHome environment and demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.
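The efficiency gain from planning for anticipated tasks jointly can be sketched with a toy planner. The rooms, tasks, and action names below are made up for illustration and are not the paper's VirtualHome domain:

```python
# Sketch of the core idea: a joint plan for anticipated goals can share
# actions (e.g. one trip to the kitchen) that separate plans duplicate.

def plan(goals: set[str]) -> list[str]:
    """Toy planner: visit each needed room once, then do its goals."""
    rooms = {"make_coffee": "kitchen", "cook_breakfast": "kitchen",
             "make_bed": "bedroom"}
    actions = []
    for room in dict.fromkeys(rooms[g] for g in sorted(goals)):  # dedupe rooms
        actions.append(f"goto_{room}")
        actions += [g for g in sorted(goals) if rooms[g] == room]
    return actions

separate = plan({"make_coffee"}) + plan({"cook_breakfast"})  # 4 actions
joint = plan({"make_coffee", "cook_breakfast"})              # 3 actions
```

The joint plan makes one trip to the kitchen instead of two, which is the kind of saving a classical planner can find once the anticipated tasks are posed as a single conjunctive goal.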
Everything You Need to Know About the WIRED & Octopus Energy Tech Summit 2024
Get ready for the return of the annual energy summit in Berlin on October 10. Returning for its second edition this October in Berlin, the WIRED & Octopus Energy Tech Summit is bringing together Europe's leading experts and visionaries in the green energy sector to explore how to accelerate the creation of a fully carbon-free energy system. Last year's summit focused on the urgent need for green technology in the wake of the energy crisis. Audiences heard from business leaders, startup founders, politicians, inventors, and even an astronaut. This year, energy leaders from across the EU will meet to carve the path to a rapid global energy transition.
CASPR: Automated Evaluation Metric for Contrastive Summarization
Ananthamurugan, Nirupan, Duong, Dat, George, Philip, Gupta, Ankita, Tata, Sandeep, Gunel, Beliz
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluations remains an open problem. Prior work has proposed a token-overlap-based metric, Distinctiveness Score, to measure contrast, but it does not account for meaning-preserving lexical variations. In this work, we propose an automated evaluation metric, CASPR, to better measure contrast between a pair of summaries. Our metric is based on a simple and lightweight method that leverages the natural language inference (NLI) task to measure contrast: it segments reviews into single-claim sentences and carefully aggregates NLI scores between them into a summary-level score. We compare CASPR with Distinctiveness Score and a simple yet powerful baseline based on BERTScore. Our results on the prior dataset CoCoTRIP demonstrate that CASPR can more reliably capture the contrastiveness of summary pairs compared to the baselines.
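A minimal sketch of the aggregation scheme, assuming a placeholder NLI scorer: `nli_contradiction_prob` stands in for a real NLI model (e.g. one fine-tuned on MNLI), and the max-then-mean aggregation is one simple choice that may differ from the paper's exact scheme:

```python
# Hedged sketch of CASPR-style aggregation: split each summary into
# single-claim sentences, score cross-summary pairs with NLI, and
# aggregate to a summary-level contrast score.

def split_claims(summary: str) -> list[str]:
    """Naive sentence splitter standing in for proper claim segmentation."""
    return [s.strip() for s in summary.split(".") if s.strip()]

def contrast_score(summary_a: str, summary_b: str,
                   nli_contradiction_prob) -> float:
    """For each claim in A, take its strongest contradiction with any
    claim in B, then average over A's claims."""
    claims_a, claims_b = split_claims(summary_a), split_claims(summary_b)
    per_claim = [max(nli_contradiction_prob(a, b) for b in claims_b)
                 for a in claims_a]
    return sum(per_claim) / len(per_claim)
```

Because the NLI model scores meaning rather than surface tokens, a paraphrase like "the rooms were loud" would still register as contradicting "the hotel is quiet", which is exactly where token-overlap metrics fall short.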
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Zelikman, Eric, Harik, Georges, Shao, Yijia, Jayasiri, Varuna, Haber, Nick, Goodman, Noah D.
When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%$\rightarrow$10.9%) and CommonsenseQA (36.3%$\rightarrow$47.2%) and observe a perplexity improvement on difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
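The step where rationale-conditioned predictions are blended back into the base predictions can be sketched numerically. The fixed mixing weight below stands in for the learned mixing head Quiet-STaR trains, and the distributions are illustrative:

```python
# Minimal numeric sketch of the mixing step: next-token distributions
# predicted with and without a sampled rationale are blended by a
# mixing weight (learned in the real method, fixed here).

def mix_predictions(p_base: dict[str, float], p_thought: dict[str, float],
                    mix_weight: float) -> dict[str, float]:
    """Blend base and post-rationale distributions over next tokens."""
    vocab = set(p_base) | set(p_thought)
    return {t: (1 - mix_weight) * p_base.get(t, 0.0)
               + mix_weight * p_thought.get(t, 0.0) for t in vocab}

p_base = {"4": 0.3, "5": 0.7}     # model unsure without thinking
p_thought = {"4": 0.9, "5": 0.1}  # rationale sharpens the prediction
mixed = mix_predictions(p_base, p_thought, mix_weight=0.5)
```

When the rationale helps predict the actual next token, reinforcement of that rationale (and a higher mixing weight) is what lets the model learn which thoughts are useful.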