Collaborating Authors

 Attarian, Maria


GeoMatch++: Morphology Conditioned Geometry Matching for Multi-Embodiment Grasping

arXiv.org Artificial Intelligence

As we aspire to solve more dexterous tasks in robotics, multi-finger grasping becomes increasingly important. However, the varying degrees of freedom (DoF) of end-effectors and the high multimodality of grasping modes, which depend on both end-effectors and objects, still pose open challenges. Previous works in grasping focus on parallel grippers [1, 2, 3], a single multi-finger gripper [4, 5, 6, 7], or a shared policy for multiple dexterous grippers [8, 9, 10, 11]. However, even methods that explore cross-embodiment mostly focus on generalization to unseen objects and still show limited zero-shot generalization to unseen grippers. In this work, we propose GeoMatch++, a multi-embodiment grasping method that improves out-of-domain generalization to unseen grippers by leveraging robot morphology. Intuitively, robot morphology is essential to grasping: end-effectors may have different numbers of fingers, but fingertips and the palm tend to be the most frequent contact regions. Thus, we hypothesize that learning good morphology embeddings can lead to a grasping policy that transfers between different robots. Our main contribution is learning geometry correlation features between objects and end-effector morphology, which improve out-of-domain grasp success by 9.64% compared to previous methods, while showing only a minimal decrease in performance compared to in-domain evaluation.
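A minimal sketch of the kind of morphology-conditioned geometry correlation described above (not the GeoMatch++ architecture): gripper keypoint embeddings cross-attend over object geometry embeddings, and the attention weights act as per-keypoint contact likelihood maps. The module name, the use of raw 3D coordinates as inputs, and nn.MultiheadAttention are assumptions.

import torch
import torch.nn as nn

class MorphologyConditionedMatcher(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.obj_proj = nn.Linear(3, d_model)  # object surface points -> geometry features
        self.kp_proj = nn.Linear(3, d_model)   # gripper keypoints (fingertips, palm) -> morphology features
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, obj_points, gripper_keypoints):
        # obj_points: (B, N, 3), gripper_keypoints: (B, K, 3)
        obj_f = self.obj_proj(obj_points)
        kp_f = self.kp_proj(gripper_keypoints)
        # Each morphology keypoint attends over the object geometry; the attention
        # weights serve as a contact likelihood map over object points per keypoint.
        _, attn = self.cross_attn(query=kp_f, key=obj_f, value=obj_f)
        return attn  # (B, K, N)

matcher = MorphologyConditionedMatcher()
contact_maps = matcher(torch.randn(2, 1024, 3), torch.randn(2, 6, 3))  # (2, 6, 1024)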


Learning to Learn Faster from Human Feedback with Language Model Predictive Control

arXiv.org Artificial Intelligence

Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning robot code-writing LLMs to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by the average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations and robot code outputs are actions), then training an LLM to complete previous interactions amounts to training a transition dynamics model -- one that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates on unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning of new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.
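A rough sketch of the receding-horizon decision loop implied by the MPC framing above (not the LMPC implementation): sample several candidate interaction rollouts from the fine-tuned dynamics model, score each by its predicted number of corrections before success, execute only the first robot action of the best rollout, and replan after the next human input. sample_rollout is a hypothetical placeholder for the fine-tuned LLM, and the scoring rule is an assumption.

import random

def sample_rollout(history, horizon):
    # Placeholder: a fine-tuned LLM would autoregressively complete the interaction here,
    # predicting alternating future human feedback and robot code outputs.
    return [("robot_code", f"step_{i}") for i in range(horizon)], random.randint(1, 5)

def lmpc_step(history, n_samples=8, horizon=4):
    """Pick the next robot code output by rolling out candidate futures and choosing
    the rollout predicted to reach success with the fewest human corrections."""
    best_rollout, best_cost = None, float("inf")
    for _ in range(n_samples):
        rollout, predicted_corrections = sample_rollout(history, horizon)
        if predicted_corrections < best_cost:
            best_rollout, best_cost = rollout, predicted_corrections
    # Execute only the first action, then replan once the next human observation arrives
    # (receding horizon), rather than committing to the whole rollout.
    return best_rollout[0]

next_action = lmpc_step(history=[("human", "move the block left a bit")])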


Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

arXiv.org Artificial Intelligence

While large-scale robotic systems typically rely on textual instructions for tasks, this work explores a different approach: can robots infer the task directly from observing humans? This shift requires the robot to decode human intent and translate it into executable actions within its physical constraints and environment. We introduce Vid2Robot, a novel end-to-end video-based learning framework for robots. Given a video demonstration of a manipulation task and current visual observations, Vid2Robot directly produces robot actions. This is achieved through a unified representation model trained on a large dataset of human videos and robot trajectories. The model leverages cross-attention mechanisms to fuse prompt video features with the robot's current state and generate appropriate actions that mimic the observed task. To further improve policy performance, we propose auxiliary contrastive losses that enhance the alignment between human and robot video representations. We evaluate Vid2Robot on real-world robots, demonstrating a 20% improvement in performance over other video-conditioned policies when using human demonstration videos. Additionally, our model exhibits emergent capabilities, such as successfully transferring observed motions from one object to another, and long-horizon composition, showcasing its potential for real-world applications. Project website: vid2robot.github.io
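A minimal sketch of an InfoNCE-style auxiliary contrastive loss of the kind mentioned above for aligning human and robot video representations; the embedding size, temperature, and symmetric formulation are assumptions rather than the paper's exact loss.

import torch
import torch.nn.functional as F

def video_alignment_loss(human_emb, robot_emb, temperature=0.07):
    """human_emb, robot_emb: (B, D) embeddings of paired human/robot videos of the same task.
    Pulls matched pairs together and pushes apart videos of different tasks in the batch."""
    human_emb = F.normalize(human_emb, dim=-1)
    robot_emb = F.normalize(robot_emb, dim=-1)
    logits = human_emb @ robot_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(human_emb.size(0), device=logits.device)
    # Symmetric cross-entropy: human-to-robot and robot-to-human matching.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = video_alignment_loss(torch.randn(16, 256), torch.randn(16, 256))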


Geometry Matching for Multi-Embodiment Grasping

arXiv.org Artificial Intelligence

Many existing learning-based grasping approaches concentrate on a single embodiment, provide limited generalization to higher DoF end-effectors and cannot capture a diverse set of grasp modes. We tackle the problem of grasping using multiple embodiments by learning rich geometric representations for both objects and end-effectors using Graph Neural Networks. Our novel method - GeoMatch - applies supervised learning on grasping data from multiple embodiments, learning end-to-end contact point likelihood maps as well as conditional autoregressive predictions of grasps keypoint-by-keypoint. We compare our method against baselines that support multiple embodiments. Our approach performs better across three end-effectors, while also producing diverse grasps. Examples, including real robot demos, can be found at geo-match.github.io.
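A minimal sketch of keypoint-by-keypoint autoregressive contact selection in the spirit of the description above (not the GeoMatch model): each gripper keypoint's contact point is scored over the object, conditioned on the contacts already chosen, and decoded greedily. The scoring MLP, feature dimensions, and greedy decoding are assumptions, and the network here is untrained and purely illustrative.

import torch
import torch.nn as nn

class KeypointScorer(nn.Module):
    """Scores every candidate object point as the contact for the next gripper keypoint,
    conditioned on the contacts already chosen."""
    def __init__(self, d_obj=64, max_prev=5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_obj + 3 * max_prev, 128), nn.ReLU(), nn.Linear(128, 1))
        self.max_prev = max_prev

    def forward(self, obj_feats, prev_contacts):
        # obj_feats: (N, d_obj); prev_contacts: list of (3,) contact coordinates
        prev = torch.zeros(self.max_prev * 3)
        for i, c in enumerate(prev_contacts[: self.max_prev]):
            prev[3 * i: 3 * i + 3] = c
        cond = prev.expand(obj_feats.size(0), -1)
        return self.mlp(torch.cat([obj_feats, cond], dim=-1)).squeeze(-1)  # (N,) logits

def predict_grasp(obj_feats, obj_points, n_keypoints=6):
    scorer = KeypointScorer()
    contacts = []
    for _ in range(n_keypoints):
        logits = scorer(obj_feats, contacts)           # contact likelihood map over object points
        contacts.append(obj_points[logits.argmax()])   # greedy decode; sampling is also possible
    return torch.stack(contacts)                       # (n_keypoints, 3)

grasp = predict_grasp(torch.randn(1024, 64), torch.randn(1024, 3))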


Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation

arXiv.org Artificial Intelligence

The use of language models for generating lyrics and poetry has received increased interest in the last few years. They pose a unique challenge relative to standard natural language problems: since their ultimate purpose is creative, notions of accuracy and reproducibility are secondary to notions of lyricism, structure, and diversity. In this creative setting, traditional quantitative measures for natural language problems, such as BLEU scores, prove inadequate: a high-scoring model may fail to produce output respecting the desired structure (e.g., song verses), be a terribly boring creative companion, or both. In this work we propose a mechanism for combining two separately trained language models into a framework that is able to produce output respecting the desired song structure, while providing a richness and diversity of vocabulary that renders it more creatively appealing.
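A toy sketch of how two separately trained models could be combined in the spirit of the description above (not the paper's mechanism): a structure model emits a verse template with slots, and a vocabulary model fills each slot. Both models are stubbed out, and the bracketed template format is an assumption.

import random

def structure_model():
    # Stub: a model trained on song structure might emit a verse template with slots.
    return ["The [NOUN] in the [NOUN] keeps [VERB]ing",
            "And I [VERB] like a [NOUN] tonight"]

def vocabulary_model(slot):
    # Stub: a separately trained model with a richer vocabulary proposes words per slot.
    words = {"NOUN": ["ember", "harbor", "static"], "VERB": ["drift", "shiver", "burn"]}
    return random.choice(words[slot])

def generate_verse():
    verse = []
    for line in structure_model():
        while "[" in line:
            start, end = line.index("["), line.index("]")
            slot = line[start + 1:end]
            line = line[:start] + vocabulary_model(slot) + line[end + 1:]
        verse.append(line)
    return "\n".join(verse)

print(generate_verse())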