
Maximally-Informative Retrieval for State Space Model Generation

Becker, Evan, Bowman, Benjamin, Trager, Matthew, Liu, Tian Yu, Zancato, Luca, Xia, Wei, Soatto, Stefano

arXiv.org Artificial Intelligence

Given a query and a dataset, the optimal way of answering the query is to make use of all the information available. Modern LLMs exhibit an impressive ability to memorize training data, but data not deemed important during training is forgotten, and information outside the training set cannot be used at all. Processing an entire dataset at inference time is infeasible due to the bounded nature of model resources (e.g., context size in transformers or states in state space models), so we must resort to external memory. This constraint naturally leads to the following problem: how can we decide, based on the present query and model, what among a virtually unbounded set of known data matters for inference? To minimize model uncertainty for a particular query at test time, we introduce Retrieval In-Context Optimization (RICO), a retrieval method that uses gradients from the LLM itself to learn the optimal mixture of documents for answer generation. Unlike traditional retrieval-augmented generation (RAG), which relies on external heuristics for document retrieval, our approach leverages direct feedback from the model. Theoretically, we show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and we provide connections to the leave-one-out loss. We demonstrate empirically that by minimizing an unsupervised loss objective in the form of question perplexity, we can match the retrieval-metric performance of BM25 with \emph{no finetuning}. Furthermore, when evaluated on the quality of the final prediction, our method often outperforms fine-tuned dense retrievers such as E5.
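The core idea in the abstract — descending on a soft mixture of documents using gradients of a model loss — can be illustrated with a toy sketch. Everything below is a hypothetical stand-in: the "loss" is just the squared distance between a weighted document embedding and a query embedding, where a real system would use the LLM's question perplexity instead.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def optimize_mixture(docs, query, steps=300, lr=0.3):
    """Gradient descent on mixture logits over documents (toy surrogate loss)."""
    logits = np.zeros(len(docs))
    for _ in range(steps):
        w = softmax(logits)          # mixture weights over documents
        mix = w @ docs               # soft document mixture
        resid = mix - query
        g_w = 2.0 * docs @ resid     # dL/dw_i for L = ||mix - query||^2
        # backprop through softmax: dL/dlogit_j = w_j * (g_j - w . g)
        g_logits = w * (g_w - w @ g_w)
        logits -= lr * g_logits
    return softmax(logits)

docs = np.array([[1.0, 0.0],    # relevant document (matches the query)
                 [0.0, 1.0],
                 [-1.0, 0.0]])
query = np.array([1.0, 0.0])
weights = optimize_mixture(docs, query)
print(weights.argmax())  # the relevant document dominates the mixture
```

The learned weights can then be thresholded or top-$k$ selected, which is the connection to standard retrieval that the abstract alludes to.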


RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Wang, Yuchi, Cai, Yishuo, Ren, Shuhuai, Yang, Sihan, Yao, Linli, Liu, Yuanxin, Zhang, Yuanxing, Wan, Pengfei, Sun, Xu

arXiv.org Artificial Intelligence

Image recaptioning is widely used to generate training datasets with enhanced quality for various multimodal tasks. Existing recaptioning methods typically rely on powerful multimodal large language models (MLLMs) to enhance textual descriptions, but they often suffer from inaccuracies due to hallucinations and from incompleteness caused by missing fine-grained details. To address these limitations, we propose RICO, a novel framework that refines captions through visual reconstruction. Specifically, we leverage a text-to-image model to reconstruct a caption into a reference image, and prompt an MLLM to identify discrepancies between the original and reconstructed images to refine the caption. This process is performed iteratively, progressively promoting the generation of more faithful and comprehensive descriptions. To mitigate the additional computational cost induced by the iterative process, we introduce RICO-Flash, which learns to generate captions like RICO using DPO. Extensive experiments demonstrate that our approach significantly improves caption accuracy and completeness, outperforming most baselines by approximately 10% on both CapsBench and CompreCap. Code released at https://github.com/wangyuchi369/RICO.
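The iterative reconstruct-and-compare loop the abstract describes can be sketched in a few lines. The three callables are hypothetical stand-ins: in the paper they are a text-to-image model and an MLLM; here they are injected functions so the control flow runs anywhere.

```python
def rico_refine(image, caption, text_to_image, find_discrepancies,
                revise_caption, max_iters=3):
    """Iteratively refine a caption by comparing a reconstruction to the image."""
    for _ in range(max_iters):
        reference = text_to_image(caption)            # reconstruct the caption
        issues = find_discrepancies(image, reference)
        if not issues:                                # nothing missing or wrong
            break
        caption = revise_caption(caption, issues)     # repair the caption
    return caption

# Toy demonstration: "images" and "captions" are sets of attributes, the
# reconstruction is the identity, and a discrepancy is any attribute the
# caption fails to mention.
image = {"cat", "red hat"}
refined = rico_refine(
    image,
    caption={"cat"},
    text_to_image=lambda c: c,
    find_discrepancies=lambda img, ref: img - ref,
    revise_caption=lambda c, issues: c | issues,
)
print(refined)  # now mentions both the cat and the red hat
```

RICO-Flash would, in this framing, amortize the loop by training a model (via DPO) to emit the refined caption directly.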


Rico: extended TIAGo robot towards up-to-date social and assistive robot usage scenarios

Winiarski, Tomasz, Dudek, Wojciech, Giełdowski, Daniel

arXiv.org Artificial Intelligence

Social and assistive robotics has vastly increased in popularity in recent years. Because of the wide range of use cases, robots executing such tasks must be highly reliable and offer enough functions to cover multiple scenarios. This article describes Rico, a mobile, artificial-intelligence-driven robotic platform. Its prior use in similar scenarios, the breadth of its capabilities, and the experiments presented here qualify it as a suitable arm-less platform for social and assistive settings.


Interpreting and learning voice commands with a Large Language Model for a robot system

Stankevich, Stanislau, Dudek, Wojciech

arXiv.org Artificial Intelligence

Robots are increasingly common in both industry and daily life, such as in nursing homes where they can assist staff. A key challenge is developing intuitive interfaces for easy communication. The use of Large Language Models (LLMs) like GPT-4 has enhanced robot capabilities, allowing for real-time interaction and decision-making. This integration improves robots' adaptability and functionality. This project focuses on merging LLMs with databases to improve decision-making and enable knowledge acquisition for request interpretation problems.


Multimodal Icon Annotation For Mobile Applications

Zang, Xiaoxue, Xu, Ying, Chen, Jindong

arXiv.org Artificial Intelligence

Annotating user interfaces (UIs), which involves localization and classification of meaningful UI elements on a screen, is a critical step for many mobile applications such as screen readers and voice control of devices. Annotating object icons, such as menu, search, and arrow backward, is especially challenging due to the lack of explicit labels on screens, their similarity to pictures, and their diverse shapes. Existing studies use either view hierarchy or pixel-based methods to tackle the task. Pixel-based approaches are more popular because view hierarchy features on mobile platforms are often incomplete or inaccurate, but they leave out instructional information in the view hierarchy such as resource-ids or content descriptions. We propose a novel deep-learning-based multimodal approach that combines the benefits of both pixel and view hierarchy features and leverages state-of-the-art object detection techniques. To demonstrate the utility of this approach, we create a high-quality UI dataset by manually annotating the 29 most commonly used icons in Rico, a large-scale mobile design dataset consisting of 72k UI screenshots. The experimental results indicate the effectiveness of our multimodal approach. Our model outperforms not only a widely used object classification baseline but also pixel-based object detection models. Our study sheds light on how to combine view hierarchy with pixel features for annotating UI elements.
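The fusion the abstract describes — combining pixel features with view-hierarchy text such as resource-ids — can be sketched as a simple concatenation of two embeddings. The hashed bag-of-words and the 8x8 grayscale crop below are illustrative stand-ins, not the paper's actual features or detector.

```python
import numpy as np

def embed_vh_text(text, dim=32):
    """Hashed bag-of-words over view-hierarchy strings (resource-ids, content descriptions)."""
    vec = np.zeros(dim)
    for token in text.lower().replace("_", " ").split():
        vec[hash(token) % dim] += 1.0    # bucket each token into a fixed-size vector
    return vec

def fuse(pixel_crop, vh_text):
    """Concatenate pixel and view-hierarchy features into one classifier input."""
    pixel_vec = pixel_crop.astype(float).ravel()   # e.g. an 8x8 icon crop
    return np.concatenate([pixel_vec, embed_vh_text(vh_text)])

feature = fuse(np.zeros((8, 8)), "ic_menu content-desc: Open navigation drawer")
print(feature.shape)  # (96,) = 64 pixel dims + 32 text dims
```

A downstream classifier (or detection head) would consume this fused vector, so the text channel can disambiguate visually similar icons.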


Recursed is not Recursive: A Jarring Result

Demaine, Erik, Kopinsky, Justin, Lynch, Jayson

arXiv.org Artificial Intelligence

Recursed is a 2D puzzle platform video game featuring treasure chests that, when jumped into, instantiate a room that can later be exited (similar to function calls), optionally generating a jar that returns to that room (similar to continuations). We prove that Recursed is RE-complete and thus undecidable (not recursive) by a reduction from the Post Correspondence Problem. Our reduction is "practical": the reduction from PCP results in fully playable levels that abide by all constraints governing levels (including the 15x20 room size) designed for the main game. Our reduction is also "efficient": a Turing machine can be simulated by a Recursed level whose size is linear in the encoding size of the Turing machine and whose solution length is polynomial in the running time of the Turing machine.


The best weapon in 'Just Cause 4' is Mother Nature

Engadget

Just Cause 4 arrives at the end of a busy season of open world games. Fortunately, the series has always done things differently from the likes of Assassin's Creed, Red Dead Redemption, Far Cry and the rest. It's the game that coaxes you into causing destruction and explosions, offering a shamelessly hard-boiled physics playground for you to cut loose inside. During a lengthy play session last week with what appeared to be very close to the final game, it was clear that Just Cause 4 begs to be live-streamed, clipped and shared on Twitch, Twitter, Reddit, Discord and everywhere else. That's how the team describes both the new elemental forces (four kinds of extreme weather, tornadoes included) and Rico Rodriguez with his super-powered grappling hook.


'Just Cause 4' parachutes in December 4th

Engadget

Square Enix brought more than just Kingdom Hearts to Microsoft's pre-E3 2018 press conference -- there's also Just Cause 4. Rico is back, once again parachuting into some disastrous warzone with both guns and attitude in tow. The quick trailer showed off some new-gen powered gameplay with plenty of driving, shooting, explosions and impressive weather effects as our hero plunged directly into a tornado. The game is scheduled for release December 4th on Xbox One, PS4 and PC, so at least that's one title we won't be waiting until Q1 2019 for. While a gameplay trailer showed off Just Cause 4 at the event, you should look below for the "Welcome to Just Cause 4" video that goes in-depth with developers from Avalanche Studios (which is also behind the 80s-themed robot shooter Generation Zero) explaining what's new this time around. The game takes place on the home turf of The Black Hand, a group featured in previous games.