 food item


Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering

Jain, Riddhi, Patwardhan, Manasi, Deshpande, Parijat, Runkana, Venkataramana

arXiv.org Artificial Intelligence

Abstract: The immense cultural and culinary diversity of Indian cuisines calls attention to a major shortcoming of existing Visual Question Answering (VQA) systems, which are skewed towards foods from Western regions. A recent attempt at building a VQA dataset for Indian food is a step towards addressing this challenge. However, that approach to VQA follows a two-step process in which the answer is generated first, followed by an explanation of the expected answer. In this work, we claim that food VQA requires a multi-step reasoning process to arrive at an accurate answer, especially in the context of Indian food, which involves understanding complex culinary context and identifying relationships between various food items. With this hypothesis, we create reasoning chains on top of the QA pairs with minimal human intervention. With the augmentation of reasoning chains, we observed an average accuracy improvement of 10 percentage points over the baseline. We provide a detailed analysis of the effect of adding reasoning chains for the Indian Food VQA task. Food is one of the most important parts of culture and of the social aspects of everyday life. In a country like India, food reflects immense diversity based on the geography, religion, and traditions of different regions. A single meal can contain items which differ in preparation, presentation and flavor. This culinary and cultural richness poses a unique set of challenges for AI systems that target the understanding of content related to Indian food. A powerful framework that has emerged to connect visual and language reasoning is Visual Question Answering (VQA) [6].


Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata

Yuan, Zhengqing, Li, Yiyang, Sun, Weixiang, Zhang, Zheyuan, Shi, Kaiwen, Murugesan, Keerthiram, Ye, Yanfang

arXiv.org Artificial Intelligence

Food insecurity remains a persistent public health emergency in the United States, tightly interwoven with chronic disease, mental illness, and opioid misuse. Yet despite the existence of thousands of food banks and pantries, access remains fragmented: 1) current retrieval systems depend on static directories or generic search engines, which provide incomplete and geographically irrelevant results; 2) LLM-based chatbots offer only vague nutritional suggestions and fail to adapt to real-world constraints such as time, mobility, and transportation; and 3) existing food recommendation systems optimize for culinary diversity but overlook survival-critical needs of food-insecure populations, including immediate proximity, verified availability, and contextual barriers. These limitations risk leaving the most vulnerable individuals, those experiencing homelessness, addiction, or digital illiteracy, unable to access urgently needed resources. To address this, we introduce Food4All, the first multi-agent framework explicitly designed for real-time, context-aware free food retrieval. Food4All unifies three innovations: 1) heterogeneous data aggregation across official databases, community platforms, and social media to provide a continuously updated pool of food resources; 2) a lightweight reinforcement learning algorithm trained on curated cases to optimize for both geographic accessibility and nutritional correctness; and 3) an online feedback loop that dynamically adapts retrieval policies to evolving user needs. By bridging information acquisition, semantic analysis, and decision support, Food4All delivers nutritionally annotated guidance at the point of need. This framework establishes an urgent step toward scalable, equitable, and intelligent systems that directly support populations facing food insecurity and its compounding health risks.


GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks

Tai, Yen-Ling, Yang, Yi-Ru, Yu, Kuan-Ting, Chao, Yu-Wei, Chen, Yi-Ting

arXiv.org Artificial Intelligence

Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages a guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from the initial to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given the current observation and action rollout. The predictor is trained on a simulated dataset with food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness.
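
The guidance idea in the abstract above can be illustrated with a toy one-dimensional sketch. This is not the GRITS implementation: the sigmoid `spill_probability`, the `demo_mode` target, the step sizes, and the single scalar "action" are all assumptions made for illustration. The sketch only shows how the gradient of a differentiable risk predictor can nudge each sampling step toward lower-risk actions.

```python
import math
import random


def spill_probability(action: float, limit: float = 0.8, sharpness: float = 10.0) -> float:
    """Hypothetical stand-in for a learned spillage predictor.

    Risk rises smoothly once the scalar action exceeds `limit`.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (action - limit)))


def grad_log_no_spill(action: float, limit: float = 0.8, sharpness: float = 10.0) -> float:
    """Analytic gradient of log(1 - p_spill(action)) for the sigmoid above."""
    return -sharpness * spill_probability(action, limit, sharpness)


def sample_action(guidance_scale: float, steps: int = 50,
                  demo_mode: float = 1.0, seed: int = 0) -> float:
    """Toy reverse process: start from noise, denoise toward `demo_mode`.

    With guidance_scale > 0, every step also follows the gradient of the
    log-probability of "no spillage", as in classifier-style guidance.
    """
    rng = random.Random(seed)
    action = rng.gauss(0.0, 1.0)
    for step in range(steps, 0, -1):
        # Toy denoiser (assumption): pull toward the demonstrated action.
        action += 0.2 * (demo_mode - action)
        # Guidance: move toward actions the predictor considers safer.
        action += guidance_scale * grad_log_no_spill(action)
        # Inject shrinking noise on every step except the last.
        if step > 1:
            action += 0.05 * (step / steps) * rng.gauss(0.0, 1.0)
    return action


unguided = sample_action(guidance_scale=0.0)
guided = sample_action(guidance_scale=0.02)
print(f"unguided spill risk: {spill_probability(unguided):.2f}")
print(f"guided spill risk:   {spill_probability(guided):.2f}")
```

In this toy setting the guided sample settles at a lower predicted risk than the unguided one, at the cost of moving away from the demonstrated action; `guidance_scale` controls that trade-off.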


"Teammates, Am I Clear?": Analysing Legible Behaviours in Teams

Faria, Miguel, Melo, Francisco S., Paiva, Ana

arXiv.org Artificial Intelligence

In this paper we investigate the notion of legibility in sequential decision-making in the context of teams and teamwork. There have been works that extend the notion of legibility to sequential decision making, for deterministic and for stochastic scenarios. However, these works focus on one agent interacting with one human, forgoing the benefits of having legible decision making in teams of agents or in team configurations with humans. In this work we propose an extension of legible decision-making to multi-agent settings that improves the performance of agents working in collaboration. We showcase the performance of legible decision making in team scenarios using our proposed extension in multi-agent benchmark scenarios. We show that a team with a legible agent is able to outperform a team composed solely of agents with standard optimal behaviour.


Dietary Intake Estimation via Continuous 3D Reconstruction of Food

Lee, Wallace, Chen, YuHao

arXiv.org Artificial Intelligence

Monitoring dietary habits is crucial for preventing health risks associated with overeating and undereating, including obesity, diabetes, and cardiovascular diseases. Traditional methods for tracking food intake rely on data self-reported before or after eating, which is prone to inaccuracies. This study proposes an approach to accurately monitor ingestion behaviours by leveraging 3D food models constructed from monocular 2D video. Using COLMAP and pose estimation algorithms, we generate detailed 3D representations of food, allowing us to observe changes in food volume as it is consumed. Experiments with toy models and real food items demonstrate the approach's potential. We also propose a new methodology that addresses automated state recognition challenges, accurately detecting state changes and maintaining model fidelity. The 3D reconstruction approach shows promise in capturing comprehensive dietary behaviour insights, ultimately contributing to the development of automated and accurate dietary monitoring tools.
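
As a rough illustration of "observing changes in food volume", the sketch below computes the volume of a closed triangle mesh with the signed-tetrahedron (divergence theorem) formula and compares two reconstructions. The abstract does not say how volume is measured, so the mesh representation, the function names, and the toy tetrahedron data are assumptions, not the authors' pipeline.

```python
from typing import Sequence

Vec3 = tuple[float, float, float]
Face = tuple[int, int, int]


def mesh_volume(vertices: Sequence[Vec3], faces: Sequence[Face]) -> float:
    """Volume of a closed, consistently oriented triangle mesh.

    Sums the signed volumes of the tetrahedra formed by each face and the origin.
    """
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # Scalar triple product a . (b x c)
        total += (ax * (by * cz - bz * cy)
                  + ay * (bz * cx - bx * cz)
                  + az * (bx * cy - by * cx))
    return abs(total) / 6.0


def consumed_fraction(volume_before: float, volume_after: float) -> float:
    """Fraction of the initial volume that is no longer present."""
    return 1.0 - volume_after / volume_before


# Toy data (hypothetical): a tetrahedron standing in for a reconstructed food item,
# and the same shape scaled to half its size after some of it has been eaten.
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
before = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
after = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in before]

v_before = mesh_volume(before, faces)
v_after = mesh_volume(after, faces)
eaten = consumed_fraction(v_before, v_after)
```

Real reconstructions are rarely watertight, so a practical pipeline would likely need hole filling or a different volume estimator before a formula like this applies.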


World Food Atlas Project

Rostami, Ali, Xie, Z, Ishino, A, Yamakata, Y, Aizawa, K, Jain, Ramesh

arXiv.org Artificial Intelligence

The coronavirus pandemic is forcing people all over the world to stay "at home". In a life of hardly ever going out, we have come to realize how the food we eat affects our bodies. What can we do to know our food better and control it better? To give us a clue, we are trying to build a World Food Atlas (WFA) that collects all the knowledge about food in the world. In this paper, we present two of our trials. The first is the Food Knowledge Graph (FKG), which is a graphical representation of knowledge about food and ingredient relationships derived from recipes and food nutrition data. The second is FoodLog Athl and RecipeLog, which are applications for collecting people's detailed records of their food habits. We also discuss several problems that we try to solve to build the WFA by integrating these two ideas.


Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT

Tayal, Anuja, Salunke, Devika, Di Eugenio, Barbara, Allen-Meares, Paula, Abril, Eulalia Puig, Garcia, Olga, Dickens, Carolyn, Boyd, Andrew

arXiv.org Artificial Intelligence

Conversational assistants are becoming more and more popular, including in healthcare, partly because of the availability and capabilities of Large Language Models. There is a need for controlled, probing evaluations with real stakeholders which can highlight advantages and disadvantages of more traditional architectures and those based on generative AI. We present a within-group user study to compare two versions of a conversational assistant that allows heart failure patients to ask about salt content in food. One version of the system was developed in-house with a neurosymbolic architecture, and one is based on ChatGPT. The evaluation shows that the in-house system is more accurate, completes more tasks and is less verbose than the one based on ChatGPT; on the other hand, the one based on ChatGPT makes fewer speech errors and requires fewer clarifications to complete the task. Patients show no preference for one over the other.


Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

Karim, Aabid, Karim, Abdul, Lohana, Bhoomika, Keon, Matt, Singh, Jaswinder, Sattar, Abdul

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have significantly advanced various fields, particularly coding, mathematical reasoning, and logical problem solving. However, a critical question remains: Do these mathematical reasoning abilities persist when LLMs are presented with culturally adapted math problems? Specifically, how do LLMs perform when faced with math problems embedded in cultural contexts that have no significant representation in mainstream web-scale AI training data? To explore this, we generated six synthetic cultural datasets from GSM8K, a widely used benchmark for assessing LLMs' mathematical reasoning skills. While preserving the mathematical logic and numerical values of the original GSM8K test set, we modify cultural elements such as personal names, food items, place names, etc. These culturally adapted datasets provide a more reliable framework for evaluating LLMs' mathematical reasoning under shifting cultural contexts. Our findings reveal that LLMs struggle with math problems when cultural references change, even though the underlying mathematical structure remains constant. Smaller models exhibit greater performance drops compared to larger models. Interestingly, our results also suggest that cultural familiarity can enhance mathematical reasoning. Even models with no explicit mathematical training but exposure to relevant cultural contexts sometimes outperform larger, mathematically proficient models on culturally embedded math problems. This study highlights the impact of cultural context on the mathematical reasoning abilities of LLMs, underscoring the need for more diverse and representative training data to improve robustness in real-world applications. The benchmark datasets and script for reproducing the results are available at https://github.com/akarim23131/Lost_in_Cultural_Translation
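
The adaptation step described above (swap cultural elements, keep the numbers) can be sketched as a whole-word substitution followed by a check that every numeric value survives unchanged. The function names, the example problem, and the swap table are hypothetical; the authors' actual script is in the linked repository and may work differently.

```python
import re


def extract_numbers(text: str) -> list[str]:
    """Return every numeric token in order of appearance."""
    return re.findall(r"\d+(?:\.\d+)?", text)


def adapt_problem(text: str, swaps: dict[str, str]) -> str:
    """Replace cultural entities (whole words only) and verify numbers are untouched."""
    if not swaps:
        return text
    # Longest keys first, so multi-word entities win over their substrings.
    keys = sorted(swaps, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    adapted = pattern.sub(lambda m: swaps[m.group(1)], text)
    if extract_numbers(adapted) != extract_numbers(text):
        raise ValueError("substitution changed a numeric value")
    return adapted


# Hypothetical GSM8K-style problem and swap table (person, food item, place).
original = ("Emma makes 3 trays of bagels with 12 bagels each in Boston. "
            "How many bagels does she make?")
swaps = {"Emma": "Ayesha", "bagels": "samosas", "Boston": "Karachi"}

adapted = adapt_problem(original, swaps)
print(adapted)
```

Because the numeric tokens are compared before and after substitution, the reference answer of the original problem stays valid for the adapted one.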


Kiri-Spoon: A Kirigami Utensil for Robot-Assisted Feeding

Keely, Maya, Franco, Brandon, Grothoff, Casey, Jenamani, Rajat Kumar, Bhattacharjee, Tapomayukh, Losey, Dylan P., Nemlekar, Heramb

arXiv.org Artificial Intelligence

For millions of adults with mobility limitations, eating meals is a daily challenge. A variety of robotic systems have been developed to address this societal need. Unfortunately, end-user adoption of robot-assisted feeding is limited, in part because existing devices are unable to seamlessly grasp, manipulate, and feed diverse foods. Recent works seek to address this issue by creating new algorithms for food acquisition and bite transfer. In parallel to these algorithmic developments, however, we hypothesize that mechanical intelligence will make it fundamentally easier for robot arms to feed humans. We therefore propose Kiri-Spoon, a soft utensil specifically designed for robot-assisted feeding. Kiri-Spoon consists of a spoon-shaped kirigami structure: when actuated, the kirigami sheet deforms into a bowl of increasing curvature. Robot arms equipped with Kiri-Spoon can leverage the kirigami structure to wrap around morsels during acquisition, contain those items as the robot moves, and then compliantly release the food into the user's mouth. Overall, Kiri-Spoon combines the familiar and comfortable shape of a standard spoon with the increased capabilities of soft robotic grippers. In what follows, we first apply a stakeholder-driven design process to ensure that Kiri-Spoon meets the needs of caregivers and users with physical disabilities. We next characterize the dynamics of Kiri-Spoon, and derive a mechanics model to relate actuation force to the spoon's shape. The paper concludes with three separate experiments that evaluate (a) the mechanical advantage provided by Kiri-Spoon, (b) the ways users with disabilities perceive our system, and (c) how the mechanical intelligence of Kiri-Spoon complements state-of-the-art algorithms. Our results suggest that Kiri-Spoon advances robot-assisted feeding across diverse foods, multiple robotic platforms, and different manipulation algorithms.


Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago

Daniels, Cassandra, Khan, Koffka

arXiv.org Artificial Intelligence

This research investigates the performance of various machine learning algorithms (CNN, LSTM, VADER, and RoBERTa) for sentiment analysis of Twitter data related to imported food items in Trinidad and Tobago. The study addresses three primary research questions: the comparative accuracy and efficiency of the algorithms, the optimal configurations for each model, and the potential applications of the optimized models in a live system for monitoring public sentiment and its impact on the import bill. The dataset comprises tweets from 2018 to 2024, divided into imbalanced, balanced, and temporal subsets to assess the impact of data balancing and the COVID-19 pandemic on sentiment trends. Ten experiments were conducted to evaluate the models under various configurations. Results indicated that VADER outperformed the other models in both multi-class and binary sentiment classifications. The study highlights significant changes in sentiment trends pre- and post-COVID-19, with implications for import policies.