
How can robots acquire skills through interactions with the physical world? An interview with Jiaheng Hu

AIHub

How can robots acquire skills through interactions with the physical world? One of the key challenges in building robots for household or industrial settings is the need to master the control of high-degree-of-freedom systems such as mobile manipulators. Reinforcement learning has been a promising avenue for acquiring robot control policies; however, scaling it to complex systems has proved tricky. In their work SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu and colleagues introduce a method that renders real-world reinforcement learning feasible for complex embodiments. We caught up with Jiaheng to find out more.
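
The interview covers the method in depth; purely as a hedged illustration of the general latent-action-space idea (all dimensions, names, and the random-projection "decoder" below are invented for this sketch, not taken from SLAC):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 5-D latent action space driving a 20-DoF robot.
LATENT_DIM, JOINT_DIM = 5, 20

# Stand-in for a decoder pretrained in simulation: it maps a low-dimensional
# latent action to a full-body joint command. In practice this would be a
# learned neural network, not a fixed random projection.
W_dec = rng.normal(scale=0.1, size=(JOINT_DIM, LATENT_DIM))

def decode(z):
    """Expand a latent action z into a whole-body joint command."""
    return np.tanh(W_dec @ z)

def real_world_policy(obs):
    """Placeholder for a real-world RL policy acting in the *latent* space.

    Learning a 5-D action is far easier than learning 20 joint targets
    directly, which is the appeal of a pretrained latent action space.
    """
    return rng.normal(size=LATENT_DIM) * 0.1

obs = np.zeros(8)                 # dummy observation
joint_command = decode(real_world_policy(obs))
print(joint_command.shape)        # (20,)
```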



Supplementary Materials: Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context

Neural Information Processing Systems

Figure 1: Sample scenes with 3D human poses projected onto camera views for each kitchen; a sample skeleton is shown in Figure 2. Each frame is annotated with a frame number in actual dataset time (frames: t) and with action annotations (act: t × 82), where 1 marks the presence of an action and 0 its absence. On top of that, SMPL's shape parameter determines limb length, ensuring that the body skeleton remains consistent across time. We bear all responsibility in case of violation of rights. Please note that the dataset can be used without the video data.
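
As a hypothetical sketch of how per-frame annotations like these could be represented (the 82-label count comes from the caption above; the array shapes and everything else are assumptions, not the dataset's actual format):

```python
import numpy as np

T, NUM_ACTIONS = 1000, 82   # 82 action labels per the caption; T is arbitrary

frames = np.arange(T)                             # frame numbers in dataset time
act = np.zeros((T, NUM_ACTIONS), dtype=np.uint8)  # 1 = action present, 0 = absent
act[100:250, 7] = 1                               # e.g. action 7 active for frames 100-249

# Frames where at least one action is annotated:
active = frames[act.any(axis=1)]
print(active[:5])                                 # [100 101 102 103 104]
```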


Asynchronous Collective Tree Exploration: a Distributed Algorithm, and a new Lower Bound

Cosson, Romain, Massoulié, Laurent

arXiv.org Artificial Intelligence

We study the problem of collective tree exploration, in which a team of $k$ mobile agents must collectively visit all nodes of an unknown tree in as few moves as possible. The agents all start from the root and discover adjacent edges as they progress in the tree. Communication is distributed in the sense that agents share information by reading and writing on whiteboards located at all nodes. Movements are asynchronous, in the sense that the speeds of all agents are controlled by an adversary at all times. All previous competitive guarantees for collective tree exploration are either distributed but synchronous, or asynchronous but centralized. In contrast, we present a distributed asynchronous algorithm that explores any tree of $n$ nodes and depth $D$ in at most $2n + O(k^2 2^k D)$ moves, i.e., with a regret that is linear in $D$, and a variant algorithm with a guarantee in $O(k/\log k)(n+kD)$, i.e., with a competitive ratio in $O(k/\log k)$. We note that our regret guarantee is asymptotically optimal (i.e., $1$-competitive) from the perspective of average-case complexity. We then present a new general lower bound on the competitive ratio of asynchronous collective tree exploration, in $\Omega(\log^2 k)$. This lower bound applies to both the distributed and centralized settings, and improves upon the previous lower bound in $\Omega(\log k)$.
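
As a toy illustration of the whiteboard model only (a sequential round-robin simulation stands in for the adversary, and the data structures are invented; the paper's actual algorithm and its $2n + O(k^2 2^k D)$ analysis are considerably more involved):

```python
from collections import defaultdict

# Toy tree as adjacency lists; node 0 is the root.
tree = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}

k = 2
positions = [0] * k                    # all agents start at the root
parents = {0: None}
whiteboards = defaultdict(set)         # whiteboards[v] = children already claimed at v
visited = {0}

def step(agent):
    """One move: claim an unexplored child via the local whiteboard,
    else retreat toward the root."""
    v = positions[agent]
    unclaimed = [c for c in tree[v] if c not in whiteboards[v]]
    if unclaimed:
        c = unclaimed[0]
        whiteboards[v].add(c)          # write: tell later visitors this edge is taken
        parents.setdefault(c, v)
        positions[agent] = c
        visited.add(c)
    elif parents[v] is not None:
        positions[agent] = parents[v]  # dead end: backtrack

moves = 0
while len(visited) < len(tree):        # round-robin scheduling mimics the adversary
    step(moves % k)
    moves += 1
print(f"explored {len(visited)} nodes in {moves} moves")
```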


An Exploratory Study of ML Sketches and Visual Code Assistants

Gomes, Luís F., Hellendoorn, Vincent J., Aldrich, Jonathan, Abreu, Rui

arXiv.org Artificial Intelligence

This paper explores the integration of Visual Code Assistants into Integrated Development Environments (IDEs). In Software Engineering, whiteboard sketching is often the initial step before coding, serving as a crucial collaboration tool for developers. Previous studies have investigated patterns in SE sketches and how they are used in practice, yet methods for directly using these sketches for code generation remain limited. The emergence of visually-equipped large language models presents an opportunity to bridge this gap, which is the focus of our research. In this paper, we build a first prototype of a Visual Code Assistant to get user feedback regarding in-IDE sketch-to-code tools. We conduct an experiment with 19 data scientists, most of whom regularly sketch as part of their job. We investigate developers' mental models by analyzing patterns commonly observed in their sketches when developing an ML workflow. Analysis indicates that diagrams were the preferred organizational component (52.6%), often accompanied by lists (42.1%) and numbered points (36.8%). Our tool converts their sketches into a Python notebook by querying an LLM. We use an LLM-as-judge setup to score the quality of the generated code, finding that even brief sketching can effectively generate useful code outlines. We also find a positive correlation between sketch time and the quality of the generated code. We conclude the study by conducting extensive interviews to assess the tool's usefulness, explore potential use cases, and understand developers' needs. As noted by participants, promising applications for these assistants include education, prototyping, and collaborative settings. Our findings signal promise for the next generation of Code Assistants to integrate visual information, both to improve code generation and to better leverage developers' existing sketching practices.
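
As a hedged sketch of what such a sketch-to-code query might look like (the `vision_llm` client, the prompt, and the JSON cell format are all hypothetical; the paper's actual tool and prompts are not shown here):

```python
import base64
import json

# Hypothetical multimodal-LLM client -- a stand-in for whichever
# vision-capable model the prototype queries, not the paper's actual API.
def vision_llm(prompt: str, image_b64: str) -> str:
    raise NotImplementedError("plug in a real multimodal LLM here")

PROMPT = (
    "This whiteboard sketch describes an ML workflow (diagrams, lists, "
    "numbered points). Emit a Python notebook as a JSON list of cell "
    "sources implementing the workflow."
)

def sketch_to_notebook(sketch_path: str) -> list[str]:
    """Turn a photographed whiteboard sketch into notebook cell sources."""
    with open(sketch_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    cells = json.loads(vision_llm(PROMPT, image_b64))
    return cells   # one string of source code per notebook cell
```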


Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

Hu, Zichao, Li, Junyi Jessy, Guha, Arjun, Biswas, Joydeep

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the performance gap with proprietary LLMs? While Self-Instruct is a promising solution by generating a diverse set of training data, it cannot verify the correctness of these programs. In contrast, a robot simulator with a well-defined world can identify execution errors but limits the diversity of programs that it can verify. In this work, we introduce Robo-Instruct, which brings the best of both worlds -- it promotes the diversity of Self-Instruct while providing the correctness of simulator-based checking. Robo-Instruct introduces RoboSim to synthesize a consistent world state on the fly by inferring properties relevant to the program being checked, and simulating actions accordingly. Furthermore, the instructions and programs generated by Self-Instruct may be subtly inconsistent -- such as the program missing a step implied by the instruction. Robo-Instruct further addresses this with InstAlign, an instruction-program alignment procedure that revises the task instruction to reflect the actual results of the generated program. Given a few seed task descriptions and the robot APIs, Robo-Instruct is capable of generating a training dataset using only a small open-weight model. This dataset can then be used to fine-tune small open-weight language models, enabling them to match or even exceed the performance of several proprietary LLMs, such as GPT-3.5-Turbo and Gemini-Pro.
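
As a rough sketch of the generate-then-verify loop described above (all function bodies are stubs, and the interfaces are invented; the real RoboSim and InstAlign procedures are defined in the paper):

```python
def llm_generate(seed_tasks, robot_apis):
    """Self-Instruct step: propose a new (instruction, program) pair."""
    raise NotImplementedError("query a small open-weight LLM here")

def simulate(program):
    """RoboSim-style check: infer a consistent world state for the entities
    the program touches, execute it, and return (ok, errors)."""
    raise NotImplementedError

def align_instruction(instruction, program):
    """InstAlign-style step: revise the instruction so it matches what the
    program actually does (e.g. a step the program skips)."""
    raise NotImplementedError

def build_dataset(seed_tasks, robot_apis, n_examples):
    dataset = []
    while len(dataset) < n_examples:
        instruction, program = llm_generate(seed_tasks, robot_apis)
        ok, _errors = simulate(program)
        if not ok:
            continue                      # discard programs that fail in simulation
        instruction = align_instruction(instruction, program)
        dataset.append({"instruction": instruction, "program": program})
    return dataset                        # fine-tuning data for a small open LLM
```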


Comparative Analysis of Programming by Demonstration Methods: Kinesthetic Teaching vs Human Demonstration

Maric, Bruno, Zoric, Filip, Petric, Frano, Orsag, Matko

arXiv.org Artificial Intelligence

Programming by demonstration (PbD) is a simple and efficient way to program robots without explicit robot programming. PbD enables unskilled operators to easily demonstrate and guide different robots to execute a task. In this paper we present a comparison of demonstration methods through a comprehensive user study. Each participant had to demonstrate drawing a simple pattern both by human demonstration using a virtual marker and by kinesthetic teaching with a robot manipulator. To evaluate the differences between the demonstration methods, we conducted a user study with 24 participants, who filled out the NASA raw task load index (rTLX) and the system usability scale (SUS). We also evaluated the similarity of the executed trajectories to measure the difference between the demonstrated and ideal trajectories. We conclude the study with the finding that human demonstration using a virtual marker is on average 8 times faster, superior in terms of quality, and imposes 2 times less overall workload than kinesthetic teaching.
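
The abstract does not specify which similarity measure was used; one plausible choice, shown here purely as an assumption, is the mean pointwise deviation after resampling both trajectories to a common length:

```python
import numpy as np

def resample(traj, n):
    """Linearly resample a (T, d) trajectory to n points."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack(
        [np.interp(t_new, t_old, traj[:, d]) for d in range(traj.shape[1])],
        axis=1,
    )

def mean_deviation(demonstrated, ideal, n=200):
    """Mean Euclidean distance between time-aligned trajectory points."""
    a = resample(np.asarray(demonstrated, float), n)
    b = resample(np.asarray(ideal, float), n)
    return float(np.linalg.norm(a - b, axis=1).mean())

# Example: a noisy drawn circle vs. the ideal circle it imitates.
theta = np.linspace(0, 2 * np.pi, 150)
ideal = np.stack([np.cos(theta), np.sin(theta)], axis=1)
demo = ideal + np.random.default_rng(1).normal(scale=0.02, size=ideal.shape)
print(f"mean deviation: {mean_deviation(demo, ideal):.4f} (trajectory units)")
```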


Among the A.I. Doomsayers

The New Yorker

Katja Grace's apartment, in West Berkeley, is in an old machinist's factory, with pitched roofs and windows at odd angles. It has terra-cotta floors and no central heating, which can create the impression that you've stepped out of the California sunshine and into a duskier place, somewhere long ago or far away. Yet there are also some quietly futuristic touches. Nonperishables stacked in the pantry. A sleek white machine that does lab-quality RNA tests.


Because everything needs AI in 2023, Mattel added it to Pictionary

Engadget

It's the year 2023, so anything that can get an injection of AI will get an injection of AI. However, I doubt many people had the board game Pictionary on their artificial intelligence bingo card. Mattel just surprised us all and announced a new version of the game, Pictionary Vs. AI. It's the brand's first title to "incorporate AI technology" and marks the company's "first major leap into the category." The difference between this and traditional Pictionary is that here everyone works to stump the artificial intelligence, instead of each other.


Zoom now says it won't use any customer content for AI training

Engadget

Zoom has reversed course (again) and updated its terms of service after a backlash earlier this week. Following consumer blowback over a recently highlighted update to its terms, which appeared to grant the platform the unlimited ability to use customer data to train AI models, it now says it will not use any customer content to train AI models, whether its own or third parties'. The previous wording said it wouldn't do so "without customer consent," which raised eyebrows, since "consent" was (at best) a gray area for people joining a call (and acknowledging a pop-up) in which the meeting organizer had enabled the feature and already agreed to the terms.

Zoom's changes were listed in a preamble update to its previous blog post. "Following feedback received regarding Zoom's recently updated terms of service, particularly related to our new generative artificial intelligence features, Zoom has updated our terms of service and the below blog post to make it clear that Zoom does not use any of your audio, video, chat, screen-sharing, attachments, or other communications like customer content (such as poll results, whiteboard, and reactions) to train Zoom's or third-party artificial intelligence models," the notice reads.