AITopics | Yamashita, Atsushi

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation

Yin, Wanqi, Cai, Zhongang, Wang, Ruisi, Zeng, Ailing, Wei, Chen, Sun, Qingping, Mei, Haiyi, Wang, Yanjun, Pang, Hui En, Zhang, Mingyuan, Zhang, Lei, Loy, Chen Change, Yamashita, Atsushi, Yang, Lei, Liu, Ziwei

arXiv.org Artificial Intelligence, Jan-16-2025

Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods focus on training innovative architectural designs on confined datasets. In this work, we investigate the impact of scaling up EHPS towards a family of generalist foundation models. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. Ultimately, we achieve diminishing returns at 10M training instances from diverse data sources. To exclude the influence of algorithmic design, we base our experiments on two minimalist architectures: SMPLer-X, which consists of an intermediate step for hand and face localization, and SMPLest-X, an even simpler version that reduces the network to its bare essentials and highlights significant advances in the capture of articulated hands. Moreover, our finetuning strategy turns the generalist into specialist models, allowing them to achieve further performance boosts. Notably, our foundation models consistently deliver state-of-the-art results on seven benchmarks such as AGORA, UBody, EgoBody, and our proposed SynHand dataset for comprehensive hand evaluation.

This task typically uses parametric human models (e.g., SMPL-X [1]) as a powerful representation of the human body, face, and hands. With a flurry of diverse datasets entering the scene in recent years [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], providing the community new opportunities to study various aspects such as capture environment, pose distribution, body visibility, and camera views. Yet, the state-of-the-art methods channel their attention towards advancements in architectural designs and remain tethered to a limited selection of these datasets, [...]

[...] performance across a basket of key benchmarks, in order to provide a holistic measurement of generalization capabilities. Our study underscores the importance of harnessing a multitude of datasets to capitalize on their complementary nature. Moreover, we contribute a new dataset, SynHand, to provide the community with a long-awaited benchmark for comprehensive hand pose evaluation in a whole-body setting. SynHand features diverse hand poses in close-up human shots, accurately annotated as part of the whole-body SMPL-X labels. Accordingly, we establish a systematic [...] benchmark results across various scenarios.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv: 2501.09782

Country:

Asia > China (0.67)
Asia > Japan > Honshū (0.28)

Genre:

Research Report > New Finding (0.92)
Research Report > Promising Solution (0.68)

Industry:

Education > Educational Setting (0.92)
Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models

Wang, Yongdong, Xiao, Runze, Kasahara, Jun Younes Louhi, Yajima, Ryosuke, Nagatani, Keiji, Yamashita, Atsushi, Asama, Hajime

arXiv.org Artificial Intelligence, Nov-13-2024

Large Language Models (LLMs) have demonstrated significant reasoning capabilities in robotic systems. However, their deployment in multi-robot systems remains fragmented and struggles to handle complex task dependencies and parallel execution. This study introduces the DART-LLM (Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models) system, designed to address these challenges. DART-LLM utilizes LLMs to parse natural language instructions, decomposing them into multiple subtasks with dependencies to establish complex task sequences, thereby enhancing efficient coordination and parallel execution in multi-robot systems. The system includes the QA LLM module, Breakdown Function modules, Actuation module, and a Vision-Language Model (VLM)-based object detection module, enabling task decomposition and execution from natural language instructions to robotic actions. Experimental results demonstrate that DART-LLM excels in handling long-horizon tasks and collaborative tasks with complex dependencies. Even when using smaller models like Llama 3.1 8B, the system achieves good performance, highlighting DART-LLM's robustness in terms of model size. Please refer to the project website https://wyd0817.github.io/project-dart-llm/ for videos and code.
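
To make the idea of dependency-aware decomposition concrete, the following minimal sketch groups subtasks into "waves" that could run in parallel once their dependencies are complete. It is not DART-LLM's implementation: the function name `parallel_waves`, the dictionary format, and the subtask names are all hypothetical, chosen only for illustration.

```python
def parallel_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies done.

    deps maps a task name to the set of task names it depends on
    (hypothetical format, not the one used by DART-LLM).
    """
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        # Tasks with no unmet dependencies can start now, in parallel.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("unresolvable dependency (cycle or unknown task)")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        # Mark the finished tasks as satisfied for everything still waiting.
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Hypothetical subtasks for two robots (names are illustrative only).
subtasks = {
    "excavator_dig": set(),
    "truck_move_to_site": set(),
    "excavator_load_truck": {"excavator_dig", "truck_move_to_site"},
    "truck_haul": {"excavator_load_truck"},
}
print(parallel_waves(subtasks))
```

Here the two independent subtasks land in the first wave, while loading and hauling wait for their prerequisites.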

large language model, machine learning, natural language, (15 more...)

arXiv: 2411.09022

Country: Asia > Japan (0.16)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

State-Free Inference of State-Space Models: The Transfer Function Approach

Parnichkun, Rom N., Massaroli, Stefano, Moro, Alessandro, Smith, Jimmy T. H., Hasani, Ramin, Lechner, Mathias, An, Qi, Ré, Christopher, Asama, Hajime, Ermon, Stefano, Suzuki, Taiji, Yamashita, Atsushi, Poli, Michael

arXiv.org Artificial Intelligence, Jun-1-2024

We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers (parametrized in the time domain) on the Long Range Arena benchmark, while delivering state-of-the-art downstream performance over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.
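
As a rough illustration of why a transfer function representation avoids materializing the state, the sketch below obtains a convolution kernel by evaluating a rational transfer function H(z) = B(z^-1) / A(z^-1) on the roots of unity with an FFT, dividing pointwise, and transforming back. This is a minimal sketch under stated assumptions, not the authors' RTF code: the function names and the example coefficients are hypothetical, the small radix-2 FFT is included only to keep the example dependency-free, and the recovered kernel is a time-aliased approximation that is accurate only when the system is stable and its impulse response has decayed within the chosen length.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def kernel_from_transfer_function(b, a, length):
    """Kernel of H(z) = B(z^-1)/A(z^-1) via frequency-domain division.

    The cost depends on `length`, not on the number of coefficients
    (the state size), beyond zero-padding them.
    """
    def pad(coeffs):
        return list(coeffs) + [0.0] * (length - len(coeffs))

    num = fft(pad(b))           # B evaluated on the roots of unity
    den = fft(pad(a))           # A evaluated on the roots of unity
    spectrum = [p / q for p, q in zip(num, den)]
    kernel = fft(spectrum, inverse=True)
    return [v.real / length for v in kernel]


def kernel_from_recurrence(b, a, length):
    """Reference: impulse response unrolled step by step (stateful)."""
    h = []
    for n in range(length):
        acc = b[n] if n < len(b) else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * h[n - k]
        h.append(acc / a[0])
    return h
```

For a stable example such as `b = [1.0, 0.3]`, `a = [1.0, -0.9, 0.2]` (poles at 0.5 and 0.4) and `length = 64`, the two functions agree to within floating-point error.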

artificial intelligence, machine learning, natural language, (18 more...)

arXiv: 2405.06147

Country:

North America > United States > Oregon (0.14)
North America > United States > Louisiana (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

WHAC: World-grounded Humans and Cameras

Yin, Wanqi, Cai, Zhongang, Wang, Ruisi, Wang, Fanzhou, Wei, Chen, Mei, Haiyi, Xiao, Weiye, Yang, Zhitao, Sun, Qingping, Yamashita, Atsushi, Liu, Ziwei, Yang, Lei

arXiv.org Artificial Intelligence, Mar-19-2024

Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our approach is founded on two key observations. Firstly, camera-frame SMPL-X estimation methods readily recover absolute human depth. Secondly, human motions inherently provide absolute spatial cues. By integrating these insights, we introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation (EHPS) alongside camera pose estimation, without relying on traditional optimization techniques. Additionally, we present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras, and features diverse interactive human motions as well as realistic camera trajectories. Extensive experiments on both standard and newly established benchmarks highlight the superiority and efficacy of our framework. We will make the code and dataset publicly available.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv: 2403.12959

Country:

Asia > Middle East > Israel (0.14)
Asia > Japan > Honshū (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Media (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.35)

Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar

Wang, Yusheng, Ji, Yonghoon, Wu, Chujie, Tsuchiya, Hiroshi, Asama, Hajime, Yamashita, Atsushi

arXiv.org Artificial Intelligence, Jul-31-2023

2D forward-looking sonar is a crucial sensor for underwater robotic perception. A well-known problem in this field is estimating missing information in the elevation direction during sonar imaging. There are demands to estimate 3D information per image for 3D mapping and robot navigation during fly-through missions. Recent learning-based methods have demonstrated their strengths, but there are still drawbacks. Supervised learning methods have achieved high-quality results but may require further efforts to acquire 3D ground-truth labels. The existing self-supervised method requires pretraining using synthetic images with 3D supervision. This study aims to realize stable self-supervised learning of elevation angle estimation without pretraining using synthetic images. Failures during self-supervised learning may be caused by motion degeneracy problems. We first analyze the motion field of 2D forward-looking sonar, which is related to the main supervision signal. We utilize a modern learning framework and prove that if the training dataset is built with effective motions, the network can be trained in a self-supervised manner without the knowledge of synthetic data. Both simulation and real experiments validate the proposed method.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv: 2307.16160

Country: Asia > Japan (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

2D Forward Looking Sonar Simulation with Ground Echo Modeling

Wang, Yusheng, Wu, Chujie, Ji, Yonghoon, Tsuchiya, Hiroshi, Asama, Hajime, Yamashita, Atsushi

arXiv.org Artificial Intelligence, Apr-17-2023

Imaging sonar produces clear images in underwater environments, independent of water turbidity and lighting conditions. The next-generation 2D forward-looking sonars are compact in size and able to generate high-resolution images, which facilitates underwater robotics research. Considering the difficulties and expenses of implementing experiments in underwater environments, considerable work has focused on sonar image simulation. However, sonar artifacts such as multi-path reflection, which cannot be ignored in water tank environments, have not been sufficiently discussed. In this paper, we focus on the influence of echoes from the flat ground. We propose a method to simulate the ground echo effect physically in acoustic images. We model the multi-bounce situations using the single-bounce framework for computational efficiency. We compare the real image captured in the water tank with the synthetic images to validate the proposed methods.

artificial intelligence, reflection, yamashita, (18 more...)

arXiv: 2304.08146

Country: Asia > Japan (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)

Continuous-Depth Neural Models for Dynamic Graph Prediction

Poli, Michael, Massaroli, Stefano, Rabideau, Clayton M., Park, Junyoung, Yamashita, Atsushi, Asama, Hajime, Park, Jinkyoo

arXiv.org Artificial Intelligence, Jun-22-2021

We introduce the framework of continuous-depth graph neural networks (GNNs). Neural graph differential equations (Neural GDEs) are formalized as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations. The proposed framework is shown to be compatible with static GNN models and is extended to dynamic and stochastic settings through hybrid dynamical system theory. Here, Neural GDEs improve performance by exploiting the underlying dynamics geometry, further introducing the ability to accommodate irregularly sampled data. Results prove the effectiveness of the proposed models across applications, such as traffic forecasting or prediction in genetic regulatory networks.
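
One way to picture a "continuum of GNN layers" is to treat a single graph-convolution-style update as a vector field and integrate it over a depth variable. The sketch below does this with explicit Euler steps for scalar node features. It is an illustrative toy under stated assumptions, not the paper's model: the field `tanh(w * A_hat @ H)`, the scalar weight `w`, the Euler solver, and the three-node graph are all hypothetical choices.

```python
import math


def gcn_field(adj_norm, h, w):
    """Vector field f(H) = tanh(w * A_hat @ H) for scalar node features."""
    return [
        math.tanh(w * sum(a_ij * h_j for a_ij, h_j in zip(row, h)))
        for row in adj_norm
    ]


def neural_gde_euler(adj_norm, h0, w, t_end=1.0, steps=100):
    """Integrate dH/dt = f(H) from t = 0 to t_end with explicit Euler."""
    dt = t_end / steps
    h = list(h0)
    for _ in range(steps):
        dh = gcn_field(adj_norm, h, w)
        h = [hi + dt * di for hi, di in zip(h, dh)]
    return h


# Row-normalized adjacency (with self-loops) of a 3-node path graph.
adj_norm = [
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.5, 0.5],
]
h0 = [1.0, 0.0, 0.0]
print(neural_gde_euler(adj_norm, h0, w=1.0))
```

Because the output is read off at a chosen integration time rather than after a fixed number of layers, the same sketch could be evaluated at irregular time points by changing `t_end`.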

arxiv preprint arxiv, deep learning, neural network, (13 more...)

arXiv: 2106.11581

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning Stochastic Optimal Policies via Gradient Descent

Massaroli, Stefano, Poli, Michael, Peluchetti, Stefano, Park, Jinkyoo, Yamashita, Atsushi, Asama, Hajime

arXiv.org Artificial Intelligence, Jun-7-2021

We systematically develop a learning-based treatment of stochastic optimal control (SOC), relying on direct optimization of parametric control policies. We propose a derivation of adjoint sensitivity results for stochastic differential equations through direct application of variational calculus. Then, given an objective function for a predetermined task specifying the desiderata for the controller, we optimize their parameters via iterative gradient descent methods. In doing so, we extend the range of applicability of classical SOC techniques, often requiring strict assumptions on the functional form of system and control. We verify the performance of the proposed approach on a continuous-time, finite horizon portfolio optimization with proportional transaction costs.

arxiv preprint arxiv, neural network, optimization problem, (18 more...)

doi: 10.1109/LCSYS.2021.3086672

arXiv: 2106.03780

Country: Asia > Japan (0.14)

Genre: Research Report (0.40)

Industry:

Banking & Finance (0.46)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Differentiable Multiple Shooting Layers

Massaroli, Stefano, Poli, Michael, Sonoda, Sho, Suzuki, Taiji, Park, Jinkyoo, Yamashita, Atsushi, Asama, Hajime

arXiv.org Machine Learning, Jun-7-2021

Leveraging time-parallel methods for differential equations, Multiple Shooting Layers (MSLs) seek solutions of initial value problems via parallelizable root-finding algorithms. MSLs broadly serve as drop-in replacements for neural ordinary differential equations (Neural ODEs) with improved efficiency in number of function evaluations (NFEs) and wall-clock inference time. We develop the algorithmic framework of MSLs, analyzing the different choices of solution methods from a theoretical and computational perspective. MSLs are showcased in long horizon optimal control of ODEs and PDEs and as latent models for sequence generation. Finally, we investigate the speedups obtained through application of MSL inference in neural controlled differential equations (Neural CDEs) for time series classification of medical data.
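
The multiple shooting idea can be sketched on a scalar ODE: split the time interval into segments, guess the state at each segment boundary, and iterate until the end of each segment matches the start of the next. The example below uses the simplest fixed-point update of those matching conditions, which is only one of several possible solution methods and is not claimed to be the one used in the paper. All names (`rk4_flow`, `multiple_shooting`, `sequential`) and the test problem dx/dt = -x are illustrative assumptions.

```python
def rk4_flow(f, x, t0, t1, steps=20):
    """Integrate dx/dt = f(t, x) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x


def multiple_shooting(f, x0, t_grid, iters):
    """Fixed-point iteration on the matching conditions b[i+1] = flow(b[i]).

    Within one iteration every segment is integrated independently,
    so the list comprehension below could be evaluated in parallel.
    """
    n = len(t_grid) - 1
    b = [x0] * (n + 1)  # initial guess: constant trajectory
    for _ in range(iters):
        ends = [rk4_flow(f, b[i], t_grid[i], t_grid[i + 1]) for i in range(n)]
        b = [x0] + ends
    return b


def sequential(f, x0, t_grid):
    """Reference: integrate the segments one after another."""
    xs = [x0]
    for i in range(len(t_grid) - 1):
        xs.append(rk4_flow(f, xs[-1], t_grid[i], t_grid[i + 1]))
    return xs


decay = lambda t, x: -x
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(multiple_shooting(decay, 1.0, grid, iters=4))
```

With this plain fixed-point update, the boundary values are exact after as many iterations as there are segments; root-finding schemes that use derivative information aim to reach the same solution in fewer iterations.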

deep learning, iteration, neural network, (19 more...)

arXiv: 2106.03885

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Mathematics of Computing (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Optimal Energy Shaping via Neural Approximators

Massaroli, Stefano, Poli, Michael, Califano, Federico, Park, Jinkyoo, Yamashita, Atsushi, Asama, Hajime

arXiv.org Artificial Intelligence, Jan-14-2021

We introduce optimal energy shaping as an enhancement of classical passivity-based control methods. A promising feature of passivity theory, alongside stability, has traditionally been claimed to be intuitive performance tuning along the execution of a given task. However, a systematic approach to adjust performance within a passive control framework has yet to be developed, as each method relies on few and problem-specific practical insights. Here, we cast the classic energy-shaping control design process in an optimal control framework; once a task-dependent performance metric is defined, an optimal solution is systematically obtained through an iterative procedure relying on neural networks and gradient-based optimization. The proposed method is validated on state-regulation tasks.

controller, deep learning, neural network, (16 more...)

arXiv: 2101.05537

Country: North America > United States (0.34)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
