AITopics | Reed, Scott

Plotting

Reed, Scott

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

Maddukuri, Abhiram, Jiang, Zhenyu, Chen, Lawrence Yunliang, Nasiriany, Soroush, Xie, Yuqi, Fang, Yu, Huang, Wenqi, Wang, Zu, Xu, Zhenjia, Chernyadev, Nikita, Reed, Scott, Goldberg, Ken, Mandlekar, Ajay, Fan, Linxi, Zhu, Yuke

arXiv.org Artificial IntelligenceApr-2-2025

Large real-world robot datasets hold great potential to train generalist robot models, but scaling real-world human data collection is time-consuming and resource-intensive. Simulation has great potential in supplementing large-scale data, especially with recent advances in generative AI and automated data generation tools that enable scalable creation of robot behavior datasets. However, training a policy solely in simulation and transferring it to the real world often demands substantial human effort to bridge the reality gap. A compelling alternative is to co-train the policy on a mixture of simulation and real-world datasets. Preliminary studies have recently shown this strategy to substantially improve the performance of a policy over one trained on a limited amount of real-world data. Nonetheless, the community lacks a systematic understanding of sim-and-real co-training and what it takes to reap the benefits of simulation data for real-robot learning. This work presents a simple yet effective recipe for utilizing simulation data to solve vision-based robotic manipulation tasks. We derive this recipe from comprehensive experiments that validate the co-training strategy on various simulation and real-world datasets. Using two domains--a robot arm and a humanoid--across diverse tasks, we demonstrate that simulation data can enhance real-world task performance by an average of 38%, even with notable differences between the simulation and real-world data. Videos and additional results can be found at https://co-training.github.io/

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.24361

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

NVIDIA, null, Bjorck, Johan, Castañeda, Fernando, Cherniadev, Nikita, Da, Xingye, Ding, Runyu, Fan, Linxi "Jim", Fang, Yu, Fox, Dieter, Hu, Fengyuan, Huang, Spencer, Jang, Joel, Jiang, Zhenyu, Kautz, Jan, Kundalia, Kaushil, Lao, Lawrence, Li, Zhiqi, Lin, Zongyu, Lin, Kevin, Liu, Guilin, Llontop, Edith, Magne, Loic, Mandlekar, Ajay, Narayan, Avnish, Nasiriany, Soroush, Reed, Scott, Tan, You Liang, Wang, Guanzhi, Wang, Zu, Wang, Jing, Wang, Qi, Xiang, Jiannan, Xie, Yuqi, Xu, Yinzhen, Xu, Zhenjia, Ye, Seonghyeon, Yu, Zhiding, Zhang, Ao, Zhang, Hao, Zhao, Yizhou, Zheng, Ruijie, Zhu, Yuke

arXiv.org Artificial IntelligenceMar-18-2025

General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. To this end, we introduce GR00T N1, an open foundation model for humanoid robots. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture. The vision-language module (System 2) interprets the environment through vision and language instructions. The subsequent diffusion transformer module (System 1) generates fluid motor actions in real time. Both modules are tightly coupled and jointly trained end-to-end. We train GR00T N1 with a heterogeneous mixture of real-robot trajectories, human videos, and synthetically generated datasets. We show that our generalist robot model GR00T N1 outperforms the state-of-the-art imitation learning baselines on standard simulation benchmarks across multiple robot embodiments. Furthermore, we deploy our model on the Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation tasks, achieving strong performance with high data efficiency.

dataset, open foundation model, robot, (14 more...)

arXiv.org Artificial Intelligence

2503.14734

Country:

Europe (0.28)
North America > United States > New York (0.14)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)

Add feedback

Genie: Generative Interactive Environments

Bruce, Jake, Dennis, Michael, Edwards, Ashley, Parker-Holder, Jack, Shi, Yuge, Hughes, Edward, Lai, Matthew, Mavalankar, Aditi, Steigerwald, Richie, Apps, Chris, Aytar, Yusuf, Bechtle, Sarah, Behbahani, Feryal, Chan, Stephanie, Heess, Nicolas, Gonzalez, Lucy, Osindero, Simon, Ozair, Sherjil, Reed, Scott, Zhang, Jingwei, Zolna, Konrad, Clune, Jeff, de Freitas, Nando, Singh, Satinder, Rocktäschel, Tim

arXiv.org Artificial IntelligenceFeb-23-2024

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.15391

Country:

North America > United States (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Bousmalis, Konstantinos, Vezzani, Giulia, Rao, Dushyant, Devin, Coline, Lee, Alex X., Bauza, Maria, Davchev, Todor, Zhou, Yuxiang, Gupta, Agrim, Raju, Akhil, Laurens, Antoine, Fantacci, Claudio, Dalibard, Valentin, Zambelli, Martina, Martins, Murilo, Pevceviciute, Rugile, Blokzijl, Michiel, Denil, Misha, Batchelor, Nathan, Lampe, Thomas, Parisotto, Emilio, Żołna, Konrad, Reed, Scott, Colmenarejo, Sergio Gómez, Scholz, Jon, Abdolmaleki, Abbas, Groth, Oliver, Regli, Jean-Baptiste, Sushkov, Oleg, Rothörl, Tom, Chen, José Enrique, Aytar, Yusuf, Barker, Dave, Ortiz, Joy, Riedmiller, Martin, Springenberg, Jost Tobias, Hadsell, Raia, Nori, Francesco, Heess, Nicolas

arXiv.org Artificial IntelligenceDec-22-2023

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

large language model, natural language, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2306.11706

Genre: Research Report > New Finding (0.67)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Shaking the foundations: delusions in sequence models for interaction and control

Ortega, Pedro A., Kunesch, Markus, Delétang, Grégoire, Genewein, Tim, Grau-Moya, Jordi, Veness, Joel, Buchli, Jonas, Degrave, Jonas, Piot, Bilal, Perolat, Julien, Everitt, Tom, Tallec, Corentin, Parisotto, Emilio, Erez, Tom, Chen, Yutian, Reed, Scott, Hutter, Marcus, de Freitas, Nando, Legg, Shane

arXiv.org Artificial IntelligenceOct-20-2021

The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive however is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions" leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2110.10819

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Semi-supervised reward learning for offline reinforcement learning

Konyushkova, Ksenia, Zolna, Konrad, Aytar, Yusuf, Novikov, Alexander, Reed, Scott, Cabi, Serkan, de Freitas, Nando

arXiv.org Artificial IntelligenceDec-12-2020

In offline reinforcement learning (RL) agents are trained using a logged dataset. It appears to be the most natural route to attack real-life applications because in domains such as healthcare and robotics interactions with the environment are either expensive or unethical. Training agents usually requires reward functions, but unfortunately, rewards are seldom available in practice and their engineering is challenging and laborious. To overcome this, we investigate reward learning under the constraint of minimizing human reward annotations. We consider two types of supervision: timestep annotations and demonstrations. We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data. In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards. We further investigate the relationship between the quality of the reward model and the final policies. We notice, for example, that the reward models do not need to be perfect to result in useful policies.

annotation, artificial intelligence, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2012.06899

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Offline Learning from Demonstrations and Unlabeled Experience

Zolna, Konrad, Novikov, Alexander, Konyushkova, Ksenia, Gulcehre, Caglar, Wang, Ziyu, Aytar, Yusuf, Denil, Misha, de Freitas, Nando, Reed, Scott

arXiv.org Machine LearningNov-27-2020

Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human teleoperation, scripted policies and other agents on the same robot. Towards data-driven offline robot learning that can use this unlabeled experience, we introduce Offline Reinforced Imitation Learning (ORIL). ORIL first learns a reward function by contrasting observations from demonstrator and unlabeled trajectories, then annotates all data with the learned reward, and finally trains an agent via offline reinforcement learning. Across a diverse set of continuous control and simulated robotic manipulation tasks, we show that ORIL consistently outperforms comparable BC agents by effectively leveraging unlabeled experience.

artificial intelligence, reinforcement learning, reward model, (18 more...)

arXiv.org Machine Learning

2011.13885

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Critic Regularized Regression

Wang, Ziyu, Novikov, Alexander, Zolna, Konrad, Springenberg, Jost Tobias, Reed, Scott, Shahriari, Bobak, Siegel, Noah, Merel, Josh, Gulcehre, Caglar, Heess, Nicolas, de Freitas, Nando

arXiv.org Artificial IntelligenceSep-5-2020

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.

artificial intelligence, rcrr binary max rcrr, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2006.15134

Country: North America (0.28)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Task-Relevant Adversarial Imitation Learning

Zolna, Konrad, Reed, Scott, Novikov, Alexander, Colmenarej, Sergio Gomez, Budden, David, Cabi, Serkan, Denil, Misha, de Freitas, Nando, Wang, Ziyu

arXiv.org Artificial IntelligenceOct-2-2019

We show that a critical problem in adversarial imitation from high-dimensional sensory data is the tendency of discriminator networks to distinguish agent and expert behaviour using task-irrelevant features beyond the control of the agent. We analyze this problem in detail and propose a solution as well as several baselines that outperform standard Generative Adversarial Imitation Learning (GAIL). Our proposed solution, Task-Relevant Adversarial Imitation Learning (TRAIL), uses a constrained optimization objective to overcome task-irrelevant features. Comprehensive experiments show that TRAIL can solve challenging manipulation tasks from pixels by imitating human operators, where other agents such as behaviour cloning (BC), standard GAIL, improved GAIL variants including our newly proposed baselines, and Deterministic Policy Gradients from Demonstrations (DPGfD) fail to find solutions, even when the other agents have access to task reward.

artificial intelligence, discriminator, neural network, (18 more...)

arXiv.org Artificial Intelligence

1910.01077

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Learning Compositional Neural Programs with Recursive Tree Search and Planning

Pierrot, Thomas, Ligner, Guillaume, Reed, Scott, Sigaud, Olivier, Perrin, Nicolas, Laterre, Alexandre, Kas, David, Beguir, Karim, de Freitas, Nando

arXiv.org Artificial IntelligenceMay-30-2019

We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. Using this specification, AlphaNPI is able to train NPI models effectively with RL for the first time, completely eliminating the need for strong supervision in the form of execution traces. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks.

alphanpi, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

1905.12941

Country: Asia > Vietnam > Hanoi > Hanoi (0.26)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback