Goto

Collaborating Authors

 user manual


Robot Operation of Home Appliances by Reading User Manuals

arXiv.org Artificial Intelligence

Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) ground the policies to the appliance in the physical world, and (iii) execute the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representations plays an important role in robust robot operation of home appliances, especially, complex ones.


Bridging Literature and the Universe Via A Multi-Agent Large Language Model System

arXiv.org Artificial Intelligence

As cosmological simulations and their associated software become increasingly complex, physicists face the challenge of searching through vast amounts of literature and user manuals to extract simulation parameters from dense academic papers, each using different models and formats. Translating these parameters into executable scripts remains a time-consuming and error-prone process. To improve efficiency in physics research and accelerate the cosmological simulation process, we introduce SimAgents, a multi-agent system designed to automate both parameter configuration from the literature and preliminary analysis for cosmology research. SimAgents is powered by specialized LLM agents capable of physics reasoning, simulation software validation, and tool execution. These agents collaborate through structured communication, ensuring that extracted parameters are physically meaningful, internally consistent, and software-compliant. We also construct a cosmological parameter extraction evaluation dataset by collecting over 40 simulations in published papers from Arxiv and leading journals that cover diverse simulation types. Experiments on the dataset demonstrate a strong performance of SimAgents, highlighting its effectiveness and potential to accelerate scientific research for physicists. Our demonstration video is available at: https://youtu.be/w1zLpm_CaWA. The complete system and dataset are publicly available at https://github.com/xwzhang98/SimAgents.


Bridging the Gap with Retrieval-Augmented Generation: Making Prosthetic Device User Manuals Available in Marginalised Languages

arXiv.org Artificial Intelligence

Millions of people in African countries face barriers to accessing healthcare due to language and literacy gaps. This research tackles this challenge by transforming complex medical documents -- in this case, prosthetic device user manuals -- into accessible formats for underserved populations. This case study in cross-cultural translation is particularly pertinent/relevant for communities that receive donated prosthetic devices but may not receive the accompanying user documentation. Or, if available online, may only be available in formats (e.g., language and readability) that are inaccessible to local populations (e.g., English-language, high resource settings/cultural context). The approach is demonstrated using the widely spoken Pidgin dialect, but our open-source framework has been designed to enable rapid and easy extension to other languages/dialects. This work presents an AI-powered framework designed to process and translate complex medical documents, e.g., user manuals for prosthetic devices, into marginalised languages. The system enables users -- such as healthcare workers or patients -- to upload English-language medical equipment manuals, pose questions in their native language, and receive accurate, localised answers in real time. Technically, the system integrates a Retrieval-Augmented Generation (RAG) pipeline for processing and semantic understanding of the uploaded manuals. It then employs advanced Natural Language Processing (NLP) models for generative question-answering and multilingual translation. Beyond simple translation, it ensures accessibility to device instructions, treatment protocols, and safety information, empowering patients and clinicians to make informed healthcare decisions.


Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents

arXiv.org Artificial Intelligence

The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination-generating false information-and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models' outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this conjecture with post hoc analysis. We also improve robustness to unanswerable questions and retrieval failures with contextualized "I don't know" responses. These findings show that scalable, cost-efficient QA systems can be built using synthetic data and self-training with open-source models, reducing reliance on proprietary tools or costly human annotations.


CARE: A Clue-guided Assistant for CSRs to Read User Manuals

arXiv.org Artificial Intelligence

It is time-saving to build a reading assistant for customer service representations (CSRs) when reading user manuals, especially information-rich ones. Current solutions don't fit the online custom service scenarios well due to the lack of attention to user questions and possible responses. Hence, we propose to develop a time-saving and careful reading assistant for CSRs, named CARE. It can help the CSRs quickly find proper responses from the user manuals via explicit clue chains. Specifically, each of the clue chains is formed by inferring over the user manuals, starting from the question clue aligned with the user question and ending at a possible response. To overcome the shortage of supervised data, we adopt the self-supervised strategy for model learning. The offline experiment shows that CARE is efficient in automatically inferring accurate responses from the user manual. The online experiment further demonstrates the superiority of CARE to reduce CSRs' reading burden and keep high service quality, in particular with >35% decrease in time spent and keeping a >0.75 ICC score.


Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals

arXiv.org Artificial Intelligence

The machine reading comprehension (MRC) of user manuals has huge potential in customer service. However, current methods have trouble answering complex questions. Therefore, we introduce the Knowing-how & Knowing-that task that requires the model to answer factoid-style, procedure-style, and inconsistent questions about user manuals. We resolve this task by jointly representing the steps and facts in a graph TARA, which supports a unified inference of various questions. Towards a systematical benchmarking study, we design a heuristic method to automatically parse user manuals into TARAs and build an annotated dataset to test the model's ability in answering real-world questions. Empirical results demonstrate that representing user manuals as TARAs is a desired solution for the MRC of user manuals. An in-depth investigation of TARA further sheds light on the issues and broader impacts of future representations of user manuals. We hope our work can move the MRC of user manuals to a more complex and realistic stage.


What China's Algorithm Registry Reveals about AI Governance - Carnegie Endowment for International Peace

#artificialintelligence

For the past year, the Chinese government has been conducting some of the earliest experiments in building regulatory tools to govern artificial intelligence (AI). In that process, China is trying to tackle a problem that will soon face governments around the world: Can regulators gain meaningful insight into the functioning of algorithms, and ensure they perform within acceptable bounds? One particular tool deserves attention both for its impact within China, and for the lessons technologists and policymakers in other countries can draw from it: a mandatory registration system created by China's internet regulator for recommendation algorithms. Although the full details of the registry are not public, by digging into its online instruction manual, we can reveal new insights into China's emerging regulatory architecture for algorithms. The algorithm registry was created by China's 2022 regulation on recommendation algorithms (English translation), which came into effect in March of this year and was led by the Cyberspace Administration of China (CAC).


Snap's first drone, Pixy, fully revealed in FCC photos

#artificialintelligence

Update April 28th, 1:09PM ET: Snap has officially revealed the Pixy, and we went hands-on with it. You can read our full article -- with video! Our original article about the FCC documents follows. It appears Snap is working on a drone called Pixy, and the whole thing just leaked with a huge amount of details, including photos and a seemingly unfinished user manual, published by the FCC. It's small: rulers in the photos indicate the drone is roughly 130 millimeters wide and 120 millimeters tall, which translates to approximately 5.1 inches by 4.7 inches.


Adapting machine translation models to new genres

#artificialintelligence

Neural machine translation systems are often optimized to perform well for specific text genres or domains, such as newspaper articles, user manuals, or customer support chats. In industrial settings with hundreds of language pairs to serve, however, a single translation system per language pair, which performs well across different text domains, is more efficient to deploy and maintain. Additionally, service providers may not know in advance which domains customers will be interested in. At this year's Conference on Empirical Methods in Natural Language Processing (EMNLP), we are presenting a new approach to multidomain adaptation for neural translation models, or adapting an existing model to new domains while maintaining translation quality in the original domain. Our approach provides a better trade-off between performance on old and new tasks than its predecessors do.


15 tech tips you won't find in a user manual

FOX News

Fox News Flash top headlines are here. Check out what's clicking on Foxnews.com. Most gadgets don't come with a user manual that spells out every single feature. We learn them by doing, when someone spills the beans, or asking, "How'd you do that?" For example, no one thinks to dive into a new router's settings.