droid


SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding

Nikolov, Nikolay, Albanese, Giuliano, Dey, Sombit, Yanev, Aleksandar, Van Gool, Luc, Zaech, Jan-Nico, Paudel, Danda Pani

arXiv.org Artificial Intelligence

Robotic Foundation Models (RFMs) hold great promise as generalist, end-to-end systems for robot control. Yet their ability to generalize across new environments, tasks, and embodiments remains limited. We argue that a major bottleneck lies in their foundations: most RFMs are built by fine-tuning internet-pretrained Vision-Language Models (VLMs). However, these VLMs are trained on 2D image-language tasks and lack the 3D spatial reasoning inherently required for embodied control in the 3D world. Bridging this gap directly with large-scale robotic data is costly and difficult to scale. Instead, we propose to enrich easy-to-collect non-robotic image data with 3D annotations and enhance a pretrained VLM with 3D understanding capabilities. Following this strategy, we train SPEAR-VLM, a 3D-aware VLM that infers object coordinates in 3D space from a single 2D image. Building on SPEAR-VLM, we introduce our main contribution, SPEAR-1: a robotic foundation model that integrates grounded 3D perception with language-instructed embodied control. Trained on ~45M frames from 24 Open X-Embodiment datasets, SPEAR-1 outperforms or matches state-of-the-art models such as π0-FAST and π0.5, while using 20× fewer robot demonstrations. This carefully engineered training strategy unlocks new VLM capabilities and, as a consequence, boosts the reliability of embodied control beyond what is achievable with only robotic data. We make our model weights and 3D-annotated datasets publicly available.
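Inferring an object's 3D coordinates from a 2D image, as SPEAR-VLM does, ultimately rests on the geometry of the pinhole camera model. As an illustrative sketch only (not the paper's annotation pipeline, and with made-up intrinsics), a pixel with known metric depth can be back-projected into camera-frame 3D coordinates like this:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Map a pixel (u, v) with metric depth to camera-frame 3D coordinates
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# A pixel at the principal point lies on the optical axis,
# so its 3D point is (0, 0, depth) in the camera frame.
p = backproject(320.0, 240.0, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

Annotations of this kind can be computed cheaply from ordinary images paired with depth estimates, which is what makes 3D supervision easy to scale compared with robot demonstrations.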


Bird or droid? Starlings nail R2-D2 beeps and boops.

Popular Science

The songbirds are even better at mimicking the 'Star Wars' robot than parrots. Songbirds like parrots and parakeets might be well known for squeaking out embarrassing one-liners and certain four-letter words, but those aren't the only sounds they can mimic. Birds have been observed copying dog barks, car alarms, and even chainsaws. But it turns out some species are better equipped to copy the droid's high-pitched beeps and boops than others.


Why human-shaped robots loom large in Musk's Tesla plans

BBC News

It has appeared in Tesla showrooms, on its factory floors and has even posed with Kim Kardashian. But Elon Musk's vision for his human-like robot Optimus is much grander than that. Since first unveiling it at a Tesla showcase in 2022, the tech billionaire has suggested his company's droid could play a huge role in the homes and lives of people all over the world. Along with self-driving robotaxis and Cybertrucks, Musk believes Tesla robots are key to establishing a foothold in the artificial intelligence (AI) landscape. And investors who signed off on his $1tn (£760bn) pay package on Thursday would appear to agree.


DROID: Dual Representation for Out-of-Scope Intent Detection

Rashwan, Wael, Zawbaa, Hossam M., Dutta, Sourav, Assem, Haytham

arXiv.org Artificial Intelligence

Detecting out-of-scope (OOS) user utterances remains a key challenge in task-oriented dialogue systems and, more broadly, in open-set intent recognition. Existing approaches often depend on strong distributional assumptions or auxiliary calibration modules. We present DROID (Dual Representation for Out-of-Scope Intent Detection), a compact end-to-end framework that combines two complementary encoders--the Universal Sentence Encoder (USE) for broad semantic generalization and a domain-adapted Transformer-based Denoising Autoencoder (TSDAE) for domain-specific contextual distinctions. Their fused representations are processed by a lightweight branched classifier with a single calibrated threshold that separates in-domain and OOS intents without post-hoc scoring. To enhance boundary learning under limited supervision, DROID incorporates both synthetic and open-domain outlier augmentation. Despite using only 1.5M trainable parameters, DROID consistently outperforms recent state-of-the-art baselines across multiple intent benchmarks, achieving macro-F1 improvements of 6-15% for known and 8-20% for OOS intents, with the largest gains in low-resource settings. These results demonstrate that dual-encoder representations with simple calibration can yield robust, scalable, and reliable OOS detection for neural dialogue systems.

Conversational AI systems are a primary interface for user assistance across sectors such as customer service, healthcare, and finance. A core requirement is intent classification--mapping utterances to predefined intents so downstream components can act appropriately [1].
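The core mechanism described above, fusing two encoder outputs and applying a single calibrated confidence threshold, can be sketched in a few lines. This is an illustrative toy with placeholder embeddings and logits, not DROID's actual encoders or classifier:

```python
import numpy as np

def fuse(use_emb, tsdae_emb):
    """Concatenate the two encoder outputs into one dual representation."""
    return np.concatenate([use_emb, tsdae_emb])

def classify(logits, threshold):
    """A single confidence threshold separates in-domain intents from
    out-of-scope (OOS) utterances, with no post-hoc scoring module."""
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    if probs.max() < threshold:
        return "OOS"
    return int(probs.argmax())

# Confident logits map to an in-domain intent index;
# near-flat logits fall below the threshold and are flagged OOS.
fused = fuse(np.ones(4), np.zeros(4))  # placeholder 4-dim embeddings
confident = classify(np.array([5.0, 0.0, 0.0]), threshold=0.5)
uncertain = classify(np.array([0.1, 0.0, 0.1]), threshold=0.5)
```

The appeal of this design is that the threshold is the only rejection machinery: once it is calibrated on validation data, in-domain and OOS decisions come from the same forward pass.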


Grandfather builds the droids he was always looking for

Popular Science

Kurt Zimmerman brought Star Wars from a galaxy far, far away to Michigan. Kurt makes his droids out of wood, but they're filled and painted to look like metal. The wood exploded into a million pieces, covering the workshop floor. As he stood there looking at the mess he had just made, Kurt Zimmerman was at a crossroads moment.


A real issue: video game developers are being accused of using AI – even when they aren't

The Guardian

In April, game developer Stamina Zero achieved what should have been a marketing slam-dunk: the launch trailer for the studio's game Little Droid was published on PlayStation's official YouTube channel. The response was a surprise for the developer. The game looks interesting, people wrote in the comments, but was "ruined" by AI art. But the game's cover art, used as the thumbnail for the YouTube video, was in fact made by a real person, according to developer Lana Ro. "We know the artist, we've seen her work, so such a negative reaction was unexpected for us, and at first we didn't know how to respond or how to feel," Ro said. It's not wrong for people to be worried about AI use in video games – in fact, it's good to be sceptical, and ensure that the media you support aligns with your values. Common arguments against generative AI relate to environmental impact, art theft and just general quality, and video game developers are grappling with how generative AI will impact their jobs.


What Matters in Learning from Large-Scale Datasets for Robot Manipulation

Saxena, Vaibhav, Bronars, Matthew, Arachchige, Nadun Ranawaka, Wang, Kuancheng, Shin, Woo Chul, Nasiriany, Soroush, Mandlekar, Ajay, Xu, Danfei

arXiv.org Artificial Intelligence

Imitation learning from large multi-task demonstration datasets has emerged as a promising path for building generally-capable robots. As a result, thousands of hours have been spent on building such large-scale datasets around the globe. Despite the continuous growth of such efforts, we still lack a systematic understanding of what data should be collected to improve the utility of a robotics dataset and facilitate downstream policy learning. In this work, we conduct a large-scale dataset composition study to answer this question. We develop a data generation framework to procedurally emulate common sources of diversity in existing datasets (such as sensor placements and object types and arrangements), and use it to generate large-scale robot datasets with controlled compositions, enabling a suite of dataset composition studies that would be prohibitively expensive in the real world. We focus on two practical settings: (1) what types of diversity should be emphasized when future researchers collect large-scale datasets for robotics, and (2) how should current practitioners retrieve relevant demonstrations from existing datasets to maximize downstream policy performance on tasks of interest. Our study yields several critical insights -- for example, we find that camera poses and spatial arrangements are crucial dimensions for both diversity in collection and alignment in retrieval. In real-world robot learning settings, we find that not only do our insights from simulation carry over, but our retrieval strategies on existing datasets such as DROID allow us to consistently outperform existing training strategies by up to 70%. More results at https://robo-mimiclabs.github.io/
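Retrieval by alignment, as described above, amounts to ranking dataset demonstrations by how similar their features are to the target task. A minimal sketch, assuming each demonstration is summarized by a hypothetical feature vector (e.g. encoding camera pose and object arrangement; the paper's actual retrieval features may differ):

```python
import numpy as np

def retrieve(target_feat, dataset_feats, k):
    """Rank demonstrations by cosine similarity of their feature vectors
    to the target task, and return the indices of the top-k matches."""
    t = target_feat / np.linalg.norm(target_feat)
    d = dataset_feats / np.linalg.norm(dataset_feats, axis=1, keepdims=True)
    sims = d @ t                     # cosine similarity per demonstration
    return np.argsort(-sims)[:k]     # indices sorted by descending similarity

# Toy 2-dim features: demo 1 is most aligned with the target, then demo 2.
target = np.array([1.0, 0.0])
demos = np.array([[0.0, 1.0],
                  [0.9, 0.1],
                  [0.5, 0.5]])
top = retrieve(target, demos, k=2)
```

The retrieved subset can then be used to train or fine-tune a policy, rather than training on the full heterogeneous dataset.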


DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Khazatsky, Alexander, Pertsch, Karl, Nair, Suraj, Balakrishna, Ashwin, Dasari, Sudeep, Karamcheti, Siddharth, Nasiriany, Soroush, Srirama, Mohan Kumar, Chen, Lawrence Yunliang, Ellis, Kirsty, Fagan, Peter David, Hejna, Joey, Itkina, Masha, Lepert, Marion, Ma, Yecheng Jason, Miller, Patrick Tree, Wu, Jimmy, Belkhale, Suneel, Dass, Shivin, Ha, Huy, Jain, Arhan, Lee, Abraham, Lee, Youngwoon, Memmel, Marius, Park, Sungjae, Radosavovic, Ilija, Wang, Kaiyuan, Zhan, Albert, Black, Kevin, Chi, Cheng, Hatch, Kyle Beltran, Lin, Shan, Lu, Jingpei, Mercat, Jean, Rehman, Abdul, Sanketi, Pannag R, Sharma, Archit, Simpson, Cody, Vuong, Quan, Walke, Homer Rich, Wulfe, Blake, Xiao, Ted, Yang, Jonathan Heewon, Yavary, Arefeh, Zhao, Tony Z., Agia, Christopher, Baijal, Rohan, Castro, Mateo Guaman, Chen, Daphne, Chen, Qiuyu, Chung, Trinity, Drake, Jaimyn, Foster, Ethan Paul, Gao, Jensen, Herrera, David Antonio, Heo, Minho, Hsu, Kyle, Hu, Jiaheng, Jackson, Donovon, Le, Charlotte, Li, Yunshuang, Lin, Kevin, Lin, Roy, Ma, Zehan, Maddukuri, Abhiram, Mirchandani, Suvir, Morton, Daniel, Nguyen, Tony, O'Neill, Abigail, Scalise, Rosario, Seale, Derick, Son, Victor, Tian, Stephen, Tran, Emi, Wang, Andrew E., Wu, Yilin, Xie, Annie, Yang, Jingyun, Yin, Patrick, Zhang, Yunchu, Bastani, Osbert, Berseth, Glen, Bohg, Jeannette, Goldberg, Ken, Gupta, Abhinav, Gupta, Abhishek, Jayaraman, Dinesh, Lim, Joseph J, Malik, Jitendra, Martín-Martín, Roberto, Ramamoorthy, Subramanian, Sadigh, Dorsa, Song, Shuran, Wu, Jiajun, Yip, Michael C., Zhu, Yuke, Kollar, Thomas, Levine, Sergey, Finn, Chelsea

arXiv.org Artificial Intelligence

The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.


Improving Model's Focus Improves Performance of Deep Learning-Based Synthetic Face Detectors

Piland, Jacob, Czajka, Adam, Sweet, Christopher

arXiv.org Artificial Intelligence

Deep learning-based models generalize better to unknown data samples after being guided "where to look" by incorporating human perception into training strategies. We observed that the salience entropy of models trained in this way is lower than that of models trained without human perceptual guidance. This raises a question: does further increasing the model's focus, by lowering the entropy of its class activation map, further improve performance? In this paper we propose and evaluate several new entropy-based loss function components controlling the model's focus, covering the full range of such control, from none to "aggressive" entropy minimization. We show, using the problem of synthetic face detection, that improving the model's focus through lowering entropy leads to models that perform better in an open-set scenario, in which the test samples are synthesized by unknown generative models. We also show that optimal performance is obtained when the model's loss function blends three aspects: regular classification, low entropy of the model's focus, and human-guided saliency.
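The entropy term at the center of this abstract is the Shannon entropy of the class activation map, treated as a probability distribution over spatial locations. A minimal sketch (the blending weight `lam` and the two-term blend are illustrative; the paper's full loss also includes a human-guided saliency term):

```python
import numpy as np

def salience_entropy(cam, eps=1e-12):
    """Shannon entropy of a class activation map, after normalizing it
    into a probability distribution over spatial locations."""
    p = cam.flatten()
    p = p / (p.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

def blended_loss(ce_loss, cam, lam):
    """Blend the regular classification loss with a low-entropy (focus)
    penalty; lam controls how aggressively entropy is minimized."""
    return ce_loss + lam * salience_entropy(cam)

# A perfectly focused map has (near) zero entropy;
# a uniform map over N locations has entropy log(N).
focused = np.array([[0.0, 0.0],
                    [0.0, 1.0]])
uniform = np.ones((2, 2))
```

Minimizing this term concentrates the activation map on fewer locations, which is exactly the "increased focus" whose effect on open-set performance the paper studies.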


DROID: Driver-centric Risk Object Identification

Li, Chengxi, Chan, Stanley H., Chen, Yi-Ting

arXiv.org Artificial Intelligence

Identification of high-risk driving situations is generally approached through collision risk estimation or accident pattern recognition. In this work, we approach the problem from the perspective of subjective risk. We operationalize subjective risk assessment by predicting driver behavior changes and identifying the cause of changes. To this end, we introduce a new task called driver-centric risk object identification (DROID), which uses egocentric video to identify object(s) influencing a driver's behavior, given only the driver's response as the supervision signal. We formulate the task as a cause-effect problem and present a novel two-stage DROID framework, taking inspiration from models of situation awareness and causal inference. A subset of data constructed from the Honda Research Institute Driving Dataset (HDD) is used to evaluate DROID. We demonstrate state-of-the-art DROID performance, even compared with strong baseline models using this dataset. Additionally, we conduct extensive ablation studies to justify our design choices. Moreover, we demonstrate the applicability of DROID for risk assessment.