cupboard
Supplementary Materials: Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context
Figure 1: Sample scenes with 3D human poses projected onto camera views for each kitchen. A sample skeleton can be seen in Figure 2. Annotations include frames (t, the frame number in actual dataset time) and act (t × 82 action annotations, where 1 indicates the presence of an action and 0 its absence). On top of that, SMPL's shape parameter determines limb length, ensuring that the body skeleton remains consistent across time. We bear all responsibility in case of violation of rights. Please note that the dataset can be used without the video data.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation
Chen, Zixuan, Yin, Junhui, Chen, Yangtao, Huo, Jing, Tian, Pinzhuo, Shi, Jieqi, Hou, Yiwen, Li, Yinchuan, Gao, Yang
Generalizing language-conditioned multi-task imitation learning (IL) models to novel long-horizon 3D manipulation tasks remains a significant challenge. To address this, we propose DeCo (Task Decomposition and Skill Composition), a model-agnostic framework compatible with various multi-task IL models, designed to enhance their zero-shot generalization to novel, compositional, long-horizon 3D manipulation tasks. DeCo first decomposes IL demonstrations into a set of modular atomic tasks based on the physical interaction between the gripper and objects, and constructs an atomic training dataset that enables models to learn a diverse set of reusable atomic skills during imitation learning. At inference time, DeCo leverages a vision-language model (VLM) to parse high-level instructions for novel long-horizon tasks, retrieve the relevant atomic skills, and dynamically schedule their execution; a spatially-aware skill-chaining module then ensures smooth, collision-free transitions between sequential skills. We evaluate DeCo in simulation using DeCoBench, a benchmark specifically designed to assess zero-shot generalization of multi-task IL models in compositional long-horizon 3D manipulation. Across three representative multi-task IL models (RVT-2, 3DDA, and ARP), DeCo achieves success rate improvements of 66.67%, 21.53%, and 57.92%, respectively, on 12 novel compositional tasks. Moreover, in real-world experiments, a DeCo-enhanced model trained on only 6 atomic tasks successfully completes 9 novel long-horizon tasks, yielding an average success rate improvement of 53.33% over the base multi-task IL model. Video demonstrations are available at: https://deco226.github.io.
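The decomposition step the abstract describes — splitting demonstrations into atomic tasks based on gripper-object interaction — can be illustrated with a minimal sketch. This is not DeCo's actual implementation (the paper likely uses richer contact signals); here a demonstration is simply segmented wherever the gripper state flips:

```python
def decompose(trajectory):
    """Split a demo into atomic segments at gripper open/close transitions.

    trajectory: list of (timestep, gripper_closed) pairs.
    Returns a list of segments, each a list of timesteps.
    """
    segments, current = [], []
    prev_state = None
    for t, closed in trajectory:
        if prev_state is not None and closed != prev_state:
            segments.append(current)  # boundary: physical interaction changed
            current = []
        current.append(t)
        prev_state = closed
    if current:
        segments.append(current)
    return segments

# a toy pick-and-place demo: approach (open), grasp (closed), release (open)
demo = [(0, False), (1, False), (2, True), (3, True), (4, False)]
print(decompose(demo))  # [[0, 1], [2, 3], [4]]
```

Each resulting segment would then serve as one atomic-task training sample, with a VLM composing the learned skills at inference time.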
- Asia > Singapore (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Plant in Cupboard, Orange on Table, Book on Shelf. Benchmarking Practical Reasoning and Situation Modelling in a Text-Simulated Situated Environment
Jordan, Jonathan, Hakimov, Sherzod, Schlangen, David
Large language models (LLMs) have risen to prominence as 'chatbots' for users to interact with via natural language. However, their ability to capture common-sense knowledge also makes them seem promising as language-based planners of situated or embodied action. We have implemented a simple text-based environment -- similar to others that have previously been used for reinforcement learning of agents -- that simulates, very abstractly, a household setting. We use this environment and the detailed error-tracking capabilities we implemented for targeted benchmarking of LLMs on the problem of practical reasoning: going from goals and observations to actions. Our findings show that environmental complexity and game restrictions hamper performance, and that concise action planning is demanding for current LLMs.
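The kind of abstract text-simulated household the abstract describes can be sketched in a few lines. The class, commands, and objects below are illustrative assumptions, not the authors' actual benchmark API; the point is the goal/observation/action loop an LLM agent would be evaluated on:

```python
class HouseholdEnv:
    """A toy text environment: objects have locations, and the goal is a
    target placement (e.g. plant in cupboard, book on shelf)."""

    def __init__(self):
        self.locations = {"plant": "table", "book": "floor"}
        self.goal = {"plant": "cupboard", "book": "shelf"}

    def observe(self):
        return ", ".join(f"{o} is on the {p}" for o, p in self.locations.items())

    def act(self, command):
        # accepts commands of the form "put <object> in <destination>"
        parts = command.split()
        if len(parts) == 4 and parts[0] == "put" and parts[1] in self.locations:
            obj, dest = parts[1], parts[3]
            self.locations[obj] = dest
            return f"you put the {obj} in the {dest}"
        return "invalid action"  # error tracking: malformed or impossible command

    def done(self):
        return self.locations == self.goal

env = HouseholdEnv()
env.act("put plant in cupboard")
env.act("put book in shelf")
print(env.done())  # True
```

An LLM agent would receive `env.observe()` plus the goal as a prompt and emit commands for `env.act`; logging every "invalid action" response is one simple form of the error tracking the abstract mentions.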
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- (2 more...)
Bee-ware of British honey! Almost half the varieties sold in UK supermarkets are bulked out with cheap sugar syrups, research reveals - but a new test can detect if the one in your cupboard is fake
The humble jar of honey might seem sweet and innocent, but experts warn that British shoppers have been getting stung when spending on this staple. Investigations have revealed that unscrupulous honey producers around the world bulk out their products with cheap sugars that are almost impossible to detect. However, scientists have now developed a test which can easily spot the difference between fake and real honey - without even opening the jar. The light-based technique can detect the unique chemical signature of real honey as well as the syrups that try to imitate it. While the test isn't readily available yet, experts told MailOnline that consumers may be able to spot the frauds in their cupboards using nothing more than their phone torch within five to 10 years.
- Europe > United Kingdom (0.05)
- Asia > Middle East > Republic of Türkiye (0.05)
- Asia > China (0.05)
- Health & Medicine (0.47)
- Retail (0.41)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.41)
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
Jung, Chani, Kim, Dongkwan, Jin, Jiho, Kim, Jiseon, Seonwoo, Yeon, Choi, Yejin, Oh, Alice, Kim, Hyunwoo
While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors -- perception inference and perception-to-belief inference -- in LLMs. We introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, respectively. Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., a lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference. Experimental results demonstrate that PercepToM significantly enhances LLMs' performance, especially in false belief scenarios.
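The perception-to-belief inference the abstract targets has a simple symbolic form, sketched below. This is not the PercepToM method itself (which prompts LLMs); it is the ground-truth computation such a method approximates: a character's belief about an object is the last location change that character actually perceived:

```python
def belief_of(character, events):
    """Compute a character's beliefs from a perception-annotated story.

    events: list of (object, new_location, set_of_perceiving_characters),
    in story order. Returns the character's believed object locations.
    """
    beliefs = {}
    for obj, loc, perceivers in events:
        if character in perceivers:
            beliefs[obj] = loc  # belief updates only on perceived moves
    return beliefs

# the classic Sally-Anne false-belief setup
story = [
    ("ball", "basket", {"Sally", "Anne"}),
    ("ball", "box", {"Anne"}),  # Sally is absent, so her belief is stale
]
print(belief_of("Sally", story))  # {'ball': 'basket'}
print(belief_of("Anne", story))   # {'ball': 'box'}
```

Inhibitory control corresponds to suppressing the true final location ("box") when answering for Sally, which is exactly where the abstract reports LLMs struggle.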
- Asia > Singapore (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- (2 more...)
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Beedu, Apoorva, Samel, Karan, Essa, Irfan
Although the task of anticipating future actions is highly uncertain, information from additional modalities helps to narrow down plausible action choices. Each modality provides different environmental context for the model to learn from. While previous multi-modal methods leverage information from modalities such as video and audio, we primarily explore how text inputs for actions and objects can also enable more accurate action anticipation. Therefore, we propose a Multi-modal Anticipative Transformer (MAT), an attention-based video transformer architecture that jointly learns from multi-modal features and text captions. We train our model in two stages: the model first learns to predict actions in the video clip by aligning with captions, and in the second stage we fine-tune the model to predict future actions. Compared to existing methods, MAT has the advantage of learning additional environmental context from two kinds of text inputs: action descriptions during the pre-training stage, and the text inputs for detected objects and actions during modality feature fusion. Through extensive experiments, we evaluate the effectiveness of the pre-training stage and show that our model outperforms previous methods on all datasets. In addition, we examine the impact of object and action information obtained via text and perform extensive ablations. We evaluate performance on three datasets: EpicKitchens-100, EpicKitchens-55, and EGTEA GAZE+; and show that text descriptions do indeed aid in more effective action anticipation.
Evaluating Large Language Model Creativity from a Literary Perspective
Shanahan, Murray, Clarke, Catherine
This paper assesses the potential for large language models (LLMs) to serve as assistive tools in the creative writing process, by means of a single, in-depth case study. In the course of the study, we develop interactive and multi-voice prompting strategies that interleave background descriptions (scene setting, plot elements), instructions that guide composition, samples of text in the target style, and critical discussion of the given samples. We qualitatively evaluate the results from a literary critical perspective, as well as from the standpoint of computational creativity (a sub-field of artificial intelligence). Our findings lend support to the view that the sophistication of the results that can be achieved with an LLM mirrors the sophistication of the prompting.
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Pennsylvania (0.04)
- (2 more...)