radian
Training-Free Robot Pose Estimation using Off-the-Shelf Foundational Models
Pose estimation of a robot arm from visual inputs is a challenging task. However, with the increasing adoption of robot arms for both industrial and residential use cases, reliable joint angle estimation can offer improved safety and performance guarantees, and also be used as a verifier to further train robot policies. This paper introduces using frontier vision-language models (VLMs) as an ``off-the-shelf" tool to estimate a robot arm's joint angles from a single target image. By evaluating frontier VLMs on both synthetic and real-world image-data pairs, this paper establishes a performance baseline attained by current FLMs. In addition, this paper presents empirical results suggesting that test time scaling or parameter scaling alone does not lead to improved joint angle predictions.
BuilderBench -- A benchmark for generalist agents
Ghugare, Raj, Ji, Catherine, Wantlin, Kathryn, Schofield, Jin, Eysenbach, Benjamin
Today's AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills for exploring and learning through experience. Finding a scalable learning mechanism for developing agents that learn through interaction remains a major open problem. In this work, we introduce BuilderBench, a benchmark to accelerate research into agent pre-training that centers open-ended exploration. BuilderBench requires agents to learn how to build any structure using blocks. BuilderBench is equipped with $(1)$ a hardware accelerated simulator of a robotic agent interacting with various physical blocks, and $(2)$ a task-suite with over 42 diverse target structures that are carefully curated to test an understanding of physics, mathematics, and long-horizon planning. During training, agents have to explore and learn general principles about the environment without any external supervision. During evaluation, agents have to build the unseen target structures from the task suite. Solving these tasks requires a sort of \emph{embodied reasoning} that is not reflected in words but rather in actions, experimenting with different strategies and piecing them together. Our experiments show that many of these tasks challenge the current iteration of algorithms. Hence, we also provide a ``training wheels'' protocol, in which agents are trained and evaluated to build a single target structure from the task suite. Finally, we provide single-file implementations of six different algorithms as a reference point for researchers.
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report (0.85)
- Instructional Material > Course Syllabus & Notes (0.48)
- North America > United States > New York (0.04)
- North America > Canada > Quebec > Montreal (0.04)
SpaceTrack-TimeSeries: Time Series Dataset towards Satellite Orbit Analysis
Guo, Zhixin, Shi, Qi, Xu, Xiaofan, Shan, Sixiang, Qin, Limin, Ge, Linqiang, Zhang, Rui, Dai, Ya, Zhu, Hua, Jiang, Guowei
With the rapid advancement of aerospace technology and the large-scale deployment of low Earth orbit (LEO) satellite constellations, the challenges facing astronomical observations and deep space exploration have become increasingly pronounced. As a result, the demand for high-precision orbital data on space objects-along with comprehensive analyses of satellite positioning, constellation configurations, and deep space satellite dynamics-has grown more urgent. However, there remains a notable lack of publicly accessible, real-world datasets to support research in areas such as space object maneuver behavior prediction and collision risk assessment. This study seeks to address this gap by collecting and curating a representative dataset of maneuvering behavior from Starlink satellites. The dataset integrates Two-Line Element (TLE) catalog data with corresponding high-precision ephemeris data, thereby enabling a more realistic and multidimensional modeling of space object behavior. It provides valuable insights into practical deployment of maneuver detection methods and the evaluation of collision risks in increasingly congested orbital environments.
- Asia > China > Shanghai > Shanghai (0.05)
- Oceania > Australia > South Australia > Adelaide (0.04)
- North America > United States > South Carolina > Charleston County > Charleston (0.04)
- (2 more...)
- Aerospace & Defense (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
- Government > Space Agency (0.46)
- Government > Military (0.46)
Quantification of Tenseness in English and Japanese Tense-Lax Vowels: A Lagrangian Model with Indicator $\theta_1$ and Force of Tenseness Ftense(t)
The concept of vowel tenseness has traditionally been examined through the binary distinction of tense and lax vowels. However, no universally accepted quantitative definition of tenseness has been established in any language. Previous studies, including those by Jakobson, Fant, and Halle (1951) and Chomsky and Halle (1968), have explored the relationship between vowel tenseness and the vocal tract. Building on these foundations, Ishizaki (2019, 2022) proposed an indirect quantification of vowel tenseness using formant angles $\theta_1$ and $\theta_{F1}$ and their first and second derivatives, $d^Z_1(t)/dt = \lim \tan \theta_1(t$) and $d^2 Z_1(t)/dt^2 = d/dt \lim \tan \theta_1(t)$. This study extends this approach by investigating the potential role of a force-related parameter in determining vowel quality. Specifically, we introduce a simplified model based on the Lagrangian equation to describe the dynamic interaction of the tongue and jaw within the oral cavity during the articulation of close vowels. This model provides a theoretical framework for estimating the forces involved in vowel production across different languages, offering new insights into the physical mechanisms underlying vowel articulation. The findings suggest that this force-based perspective warrants further exploration as a key factor in phonetic and phonological studies.
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Li, Xianliang, Luo, Jun, Zheng, Zhiwei, Wang, Hanxiao, Luo, Li, Wen, Lingkun, Wu, Linlong, Xu, Sheng
Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages, and gradually amplifying low-frequency gradient components during training both enhance generalization performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (5 more...)
A minimalistic representation model for head direction system
Zhao, Minglu, Xu, Dehong, Kong, Deqian, Zhang, Wen-Hao, Wu, Ying Nian
We present a minimalistic representation model for the head direction (HD) system, aiming to learn a high-dimensional representation of head direction that captures essential properties of HD cells. Our model is a representation of rotation group $U(1)$, and we study both the fully connected version and convolutional version. We demonstrate the emergence of Gaussian-like tuning profiles and a 2D circle geometry in both versions of the model. We also demonstrate that the learned model is capable of accurate path integration.
Behavioral Cloning Models Reality Check for Autonomous Driving
Yildirim, Mustafa, Dagda, Barkin, Asodia, Vinal, Fallah, Saber
How effective are recent advancements in autonomous vehicle perception systems when applied to real-world autonomous vehicle control? While numerous vision-based autonomous vehicle systems have been trained and evaluated in simulated environments, there is a notable lack of real-world validation for these systems. This paper addresses this gap by presenting the real-world validation of state-of-the-art perception systems that utilize Behavior Cloning (BC) for lateral control, processing raw image data to predict steering commands. The dataset was collected using a scaled research vehicle and tested on various track setups. Experimental results demonstrate that these methods predict steering angles with low error margins in real-time, indicating promising potential for real-world applications.
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
- Europe > Latvia > Dagda Municipality > Dagda (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)