Moulon, Pierre
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
Banerjee, Prithviraj, Shkodrani, Sindi, Moulon, Pierre, Hampali, Shreyas, Han, Shangchen, Zhang, Fan, Zhang, Linguang, Fountain, Jade, Miller, Edward, Basol, Selen, Newcombe, Richard, Wang, Robert, Engel, Jakob Julian, Hodan, Tomas
We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze and scene point clouds, as well as comprehensive ground-truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in kitchen, office, and living-room environments. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of lightweight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats, and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. In our experiments, we demonstrate the effectiveness of multi-view egocentric data for three popular tasks: 3D hand tracking, 6DoF object pose estimation, and 3D lifting of unknown in-hand objects. The evaluated multi-view methods, whose benchmarking is uniquely enabled by HOT3D, significantly outperform their single-view counterparts.
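To make the ground-truth annotations more concrete, the sketch below shows one generic way to overlay a 6DoF object pose onto an egocentric frame via standard pinhole projection. All names (vertices_obj, T_world_object, T_world_camera, K) are hypothetical inputs for illustration; this is not HOT3D's actual data-loading API, and the real Aria/Quest cameras use fisheye models rather than a plain pinhole.

```python
import numpy as np

def project_object_vertices(vertices_obj, T_world_object, T_world_camera, K):
    """Project an object's 3D mesh vertices into one egocentric camera view.

    vertices_obj   : (N, 3) mesh vertices in the object frame
    T_world_object : (4, 4) object-to-world rigid transform (6DoF object pose)
    T_world_camera : (4, 4) camera-to-world rigid transform (camera pose)
    K              : (3, 3) pinhole intrinsics (simplification; see note above)
    """
    # Homogeneous vertices in the object frame.
    v_h = np.hstack([vertices_obj, np.ones((len(vertices_obj), 1))])
    # Object frame -> world frame -> camera frame.
    T_camera_world = np.linalg.inv(T_world_camera)
    v_cam = (T_camera_world @ T_world_object @ v_h.T).T[:, :3]
    # Keep points in front of the camera, then apply the pinhole model.
    in_front = v_cam[:, 2] > 0
    uv = (K @ v_cam[in_front].T).T
    return uv[:, :2] / uv[:, 2:3]
```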
Aria Everyday Activities Dataset
Lv, Zhaoyang, Charron, Nicholas, Moulon, Pierre, Gamino, Alexander, Peng, Cheng, Sweeney, Chris, Miller, Edward, Tang, Huixuan, Meissner, Jeff, Dong, Jing, Somasundaram, Kiran, Pesqueira, Luis, Schwesinger, Mark, Parkhi, Omkar, Gu, Qiao, De Nardi, Renzo, Cheng, Shangyi, Saarinen, Steve, Baiyya, Vijay, Zou, Yuyang, Newcombe, Richard, Engel, Jakob Julian, Pan, Xiaqing, Ren, Carl
We present the Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each recording contains multimodal sensor data captured through the Project Aria glasses. In addition, AEA provides machine perception data, including high-frequency globally aligned 3D trajectories, scene point clouds, per-frame 3D eye gaze vectors, and time-aligned speech transcriptions. In this paper, we demonstrate a few exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open-source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We also provide open-source implementations and examples of how to use the dataset in Project Aria Tools: https://github.com/facebookresearch/projectaria_tools.
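Since the abstract points to Project Aria Tools for usage examples, here is a minimal sketch of how one might open an AEA recording and read an RGB frame with that library. The call names (create_vrs_data_provider, get_stream_id_from_label, get_image_data_by_index) are taken from the projectaria_tools Python API as best recalled and should be verified against the repository's documentation; the file path is a placeholder.

```python
from projectaria_tools.core import data_provider

# Placeholder path to one AEA sequence downloaded from projectaria.com.
vrs_path = "path/to/aea_sequence.vrs"

# Open the VRS recording and locate the RGB camera stream by label.
provider = data_provider.create_vrs_data_provider(vrs_path)
rgb_stream = provider.get_stream_id_from_label("camera-rgb")

# Read the first RGB frame; the provider returns image data plus its record.
image_data, record = provider.get_image_data_by_index(rgb_stream, 0)
frame = image_data.to_numpy_array()  # H x W x 3 numpy array
print(frame.shape, record.capture_timestamp_ns)
```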
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Majumder, Sagnik, Jiang, Hao, Moulon, Pierre, Henderson, Ethan, Calamia, Paul, Grauman, Kristen, Ithapu, Vamsi Krishna
Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way? We seek to answer this question by proposing a new problem: efficiently building the map of a previously unseen 3D environment by exploiting shared information in the egocentric audio-visual observations of participants in a natural conversation. Our hypothesis is that as multiple people ("egos") move in a scene and talk among themselves, they receive rich audio-visual cues that can help uncover the unseen areas of the scene. Given the high cost of continuously processing egocentric visual streams, we further explore how to actively coordinate the sampling of visual information, so as to minimize redundancy and reduce power use. To that end, we present an audio-visual deep reinforcement learning approach that works with our shared scene mapper to selectively turn on the camera to efficiently chart out the space. We evaluate the approach using a state-of-the-art audio-visual simulator for 3D scenes as well as real-world video. Our model outperforms previous state-of-the-art mapping methods and achieves an excellent cost-accuracy tradeoff. Project: http://vision.cs.utexas.edu/projects/chat2map.
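The camera-sampling idea can be illustrated with a deliberately simplified, non-RL stand-in: a greedy rule that captures a frame only when its predicted mapping gain outweighs a fixed power cost and a capture budget remains. This toy sketch is not the paper's reinforcement-learning policy or shared scene mapper; all names and numbers are hypothetical.

```python
import numpy as np

def greedy_camera_schedule(predicted_gains, frame_cost, budget):
    """Toy cost-aware sampler: capture a visual frame only when the predicted
    mapping gain exceeds its cost and the remaining budget allows it."""
    decisions, spent = [], 0.0
    for gain in predicted_gains:
        capture = gain > frame_cost and spent + frame_cost <= budget
        if capture:
            spent += frame_cost
        decisions.append(capture)
    return decisions

# Example: 10 conversation steps with random gain estimates and a tight budget.
rng = np.random.default_rng(0)
gains = rng.random(10)
print(greedy_camera_schedule(gains, frame_cost=0.5, budget=2.0))
```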