- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Europe > Austria (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- North America > United States > Virginia (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > North Carolina (0.04)
- (4 more...)
- North America > United States > Virginia (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again
Knowledge Distillation (KD) aims to transfer the knowledge of a well-performing neural network (the {\it teacher}) to a weaker one (the {\it student}). A peculiar phenomenon is that a more accurate model does not necessarily teach better, and adjusting the temperature cannot alleviate the capacity mismatch.
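The temperature scaling the abstract refers to is the standard Hinton-style softening of teacher and student logits before matching them; the paper's asymmetric variant is not detailed here, so the following is a minimal sketch of the conventional symmetric scheme only (all names and values are illustrative):

```python
import math

def softmax_t(logits, T):
    """Softmax with temperature T: higher T yields a softer distribution."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_kl(teacher_logits, student_logits, T):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 as in standard distillation so gradients stay comparable
    across temperatures."""
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    return (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A large, confident teacher produces near one-hot outputs at T=1;
# raising T exposes more of the non-target-class "dark knowledge".
teacher = [8.0, 2.0, 1.0]
student = [3.0, 2.0, 1.0]
print(kd_kl(teacher, student, T=1.0))
print(kd_kl(teacher, student, T=4.0))
```

Under this symmetric scheme one T is shared by teacher and student, which is exactly the knob the abstract says fails to fix the capacity mismatch.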
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia (0.04)
- (17 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- (8 more...)
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
Liu, Yicheng, Zhang, Shiduo, Dong, Zibin, Ye, Baijun, Yuan, Tianyuan, Yu, Xiaopeng, Yin, Linqi, Lu, Chenhao, Shi, Junhao, Yu, Luca Jiang-Tao, Zheng, Liangtao, Jiang, Tao, Gong, Jingjing, Qiu, Xipeng, Zhao, Hang
Figure 1: FASTer combines a learnable action tokenizer (FASTerVQ) and an autoregressive VLA model (FASTerVLA), achieving efficient compression, fast control, and strong performance across eight real and simulated embodiments.
Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a trade-off between reconstruction fidelity and inference efficiency. We introduce FASTer, a unified framework for efficient and generalizable robot learning that integrates a learnable tokenizer with an autoregressive policy built upon it. FASTerVLA builds on this tokenizer with block-wise autoregressive decoding and a lightweight action expert, achieving both faster inference and higher task performance. Extensive experiments across simulated and real-world benchmarks show that FASTerVQ delivers superior reconstruction quality, high token utilization, and strong cross-task and cross-embodiment generalization, while FASTerVLA further improves overall capability, surpassing previous state-of-the-art VLA models in both inference speed and task performance. Vision-Language-Action (VLA) models represent a paradigm shift in robotics, embodying generalist robot policies trained on increasingly large-scale robotic datasets (Chenjia Bai, 2024). These models are categorized primarily by their method of robot action prediction, with the most prominent approaches being diffusion-based (Team et al., 2024; Black et al., 2024) and autoregressive VLA models (Belkhale & Sadigh, 2024; Kim et al., 2024; Pertsch et al., 2025; Zhou et al., 2025). While diffusion-based models have demonstrated superior precision in manipulation tasks, they often exhibit a notable deficiency in leveraging critical visual and linguistic cues (Pertsch et al., 2025; Dong et al., 2025).
In contrast, recent research indicates that a carefully designed autoregressive VLA model can increasingly bridge the performance gap with its diffusion-based counterparts, while simultaneously offering enhanced instruction-following capabilities (Pertsch et al., 2025; Intelligence et al., 2025; Hancock et al., 2025), superior scene generalization (Pertsch et al., 2025), and effective transfer of common-sense knowledge (Brohan et al., 2023). Most importantly, autoregressive VLA models share the most architectural similarity with the highly successful Vision-Language Models (VLMs), suggesting significant potential for future advancements. A pivotal challenge for autoregressive VLA models is the development of an appropriate tokenization scheme to discretize continuous robot action sequences into action tokens (Wang et al., 2025c; Pertsch et al., 2025). Numerous sequence-modeling studies, including LLMs and Speech-LLMs, have demonstrated that tokenizer quality directly determines model performance (Radford et al., 2019; Zhang et al., 2023; Gong et al., 2025).
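The action tokenization the abstract centers on maps continuous robot actions to discrete tokens an autoregressive model can predict like text. The entry does not describe FASTerVQ's internals, so below is only a generic vector-quantization sketch of the idea (codebook values, helper names, and the toy trajectory are all illustrative, not the paper's design):

```python
def quantize(action, codebook):
    """Map one continuous action vector to the index of its nearest
    codebook entry (squared Euclidean distance) -- the core VQ step."""
    def dist2(a, c):
        return sum((ai - ci) ** 2 for ai, ci in zip(a, c))
    return min(range(len(codebook)), key=lambda k: dist2(action, codebook[k]))

def tokenize_trajectory(actions, codebook):
    """Discretize a trajectory of continuous actions into a token
    sequence suitable for autoregressive modeling."""
    return [quantize(a, codebook) for a in actions]

# Toy 2-D codebook and trajectory; a learned tokenizer would instead
# train the codebook jointly with an encoder/decoder to minimize
# reconstruction error while keeping the token sequence short.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
trajectory = [(0.1, 0.05), (0.9, 0.1), (0.95, 1.02)]
print(tokenize_trajectory(trajectory, codebook))  # -> [0, 1, 3]
```

The fidelity/efficiency trade-off the abstract mentions shows up here directly: a larger codebook or more tokens per chunk reconstructs actions more faithfully but lengthens the sequence the policy must decode.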
H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation
Zhu, Yijie, Shao, Rui, Liu, Ziyang, He, Jie, Liu, Jizhihui, Wang, Jiuru, Yu, Zitong
Unified video and action prediction models hold great potential for robotic manipulation, as future observations offer contextual cues for planning, while actions reveal how interactions shape the environment. However, most existing approaches treat observation and action generation in a monolithic and goal-agnostic manner, often leading to semantically misaligned predictions and incoherent behaviors. To this end, we propose H-GAR, a Hierarchical interaction framework via Goal-driven observation-Action Refinement. To anchor prediction to the task objective, H-GAR first produces a goal observation and a coarse action sketch that outline a high-level route toward the goal. To enable explicit interaction between observation and action under the guidance of the goal observation for more coherent decision-making, we devise two synergistic modules. (1) Goal-Conditioned Observation Synthesizer (GOS) synthesizes intermediate observations based on the coarse-grained actions and the predicted goal observation. (2) Interaction-Aware Action Refiner (IAAR) refines coarse actions into fine-grained, goal-consistent actions by leveraging feedback from the intermediate observations and a Historical Action Memory Bank that encodes prior actions to ensure temporal consistency. By integrating goal grounding with explicit action-observation interaction in a coarse-to-fine manner, H-GAR enables more accurate manipulation. Extensive experiments on both simulation and real-world robotic manipulation tasks demonstrate that H-GAR achieves state-of-the-art performance.
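The coarse-to-fine loop the abstract describes (sketch a route, synthesize intermediate observations, refine the actions with that feedback) can be illustrated schematically. The functions below are toy numeric stand-ins, not H-GAR's learned GOS/IAAR modules or its memory bank; every name and the simple integrator dynamics are assumptions for illustration:

```python
def coarse_sketch(obs, goal, steps_left):
    """Coarse action: an even share of the remaining route to the goal
    (toy stand-in for H-GAR's high-level action sketch)."""
    return [(g - o) / steps_left for o, g in zip(obs, goal)]

def synthesize_observation(obs, action):
    """Toy stand-in for GOS: predict the next observation by integrating
    the action into the current observation."""
    return [o + a for o, a in zip(obs, action)]

def refine_action(action, obs_pred, goal, gain=0.2):
    """Toy stand-in for IAAR: correct the coarse action using feedback
    from the synthesized intermediate observation."""
    return [a + gain * (g - p) for a, p, g in zip(action, obs_pred, goal)]

def plan(state, goal, steps=4):
    """Coarse-to-fine loop: sketch, synthesize, refine, advance."""
    obs, actions = list(state), []
    for t in range(steps):
        coarse = coarse_sketch(obs, goal, steps - t)
        obs_pred = synthesize_observation(obs, coarse)   # GOS-style step
        fine = refine_action(coarse, obs_pred, goal)     # IAAR-style step
        obs = synthesize_observation(obs, fine)
        actions.append(fine)
    return actions, obs

actions, final_obs = plan([0.0, 0.0], [1.0, 2.0])
print(final_obs)  # reaches the goal under these toy dynamics
```

The point of the structure, as in the abstract, is that actions are never emitted blindly: each coarse action is checked against a predicted observation and corrected before execution.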
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Workflow (0.94)
- Research Report (0.82)