Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model
Xu, Wenjiang, Wang, Cindy, Fang, Rui, Zhang, Mingkang, Li, Lusong, Xu, Jing, Gu, Jiayuan, Zeng, Zecui, Chen, Rui
World models have emerged as a pivotal component in robot manipulation planning, enabling agents to predict future environmental states and reason about the consequences of actions before execution. While video-generation models are increasingly adopted, they often lack rigorous physical grounding, leading to hallucinations and a failure to maintain consistency in long-horizon physical constraints. To address these limitations, we propose Embodied Tree of Thoughts (EToT), a novel Real2Sim2Real planning framework that leverages a physics-based interactive digital twin as an embodied world model. EToT formulates manipulation planning as a tree search expanded through two synergistic mechanisms: (1) Priori Branching, which generates diverse candidate execution paths based on semantic and spatial analysis; and (2) Reflective Branching, which utilizes VLMs to diagnose execution failures within the simulator and iteratively refine the planning tree with corrective actions. By grounding high-level reasoning in a physics simulator, our framework ensures that generated plans adhere to rigid-body dynamics and collision constraints. We validate EToT on a suite of short- and long-horizon manipulation tasks, where it consistently outperforms baselines by effectively predicting physical dynamics and adapting to potential failures. Website at https://embodied-tree-of-thoughts.github.io .
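The two branching mechanisms can be pictured as a breadth-first tree search in which every node is a candidate plan that is checked in a simulator before execution. The sketch below is a toy illustration under stated assumptions, not EToT's implementation. `tree_search`, the string-valued actions, `toy_simulate` (a stand-in for the physics-based digital twin) and `toy_reflect` (a stand-in for the VLM failure diagnosis) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Plan = List[str]


@dataclass
class Node:
    """One node of the planning tree: a candidate action sequence."""
    plan: Plan
    children: List["Node"] = field(default_factory=list)


def tree_search(
    priori_plans: List[Plan],
    simulate: Callable[[Plan], Optional[str]],
    reflect: Callable[[Plan, str], List[Plan]],
    max_depth: int = 3,
) -> Optional[Plan]:
    """Breadth-first search over plans.

    simulate(plan) returns None if the plan succeeds in simulation,
    otherwise a short failure description.
    reflect(plan, failure) proposes corrected plans (reflective branching).
    """
    # Priori branching: the initial candidates form the tree's first layer.
    frontier = deque((Node(p), 0) for p in priori_plans)
    while frontier:
        node, depth = frontier.popleft()
        failure = simulate(node.plan)
        if failure is None:
            return node.plan  # first plan that passes the simulator
        if depth < max_depth:
            # Reflective branching: expand the failed node with corrected plans.
            for fixed in reflect(node.plan, failure):
                child = Node(fixed)
                node.children.append(child)
                frontier.append((child, depth + 1))
    return None  # no plan survived within the depth budget


# Hypothetical stand-in for the digital twin: a ball can only be placed
# in the drawer after the drawer has been opened.
def toy_simulate(plan: Plan) -> Optional[str]:
    if "place_in_drawer" in plan:
        before = plan[: plan.index("place_in_drawer")]
        if "open_drawer" not in before:
            return "drawer closed"
    return None


# Hypothetical stand-in for the VLM diagnosis: prepend the missing action.
def toy_reflect(plan: Plan, failure: str) -> List[Plan]:
    return [["open_drawer"] + plan] if failure == "drawer closed" else []


plan = tree_search([["pick_ball", "place_in_drawer"]], toy_simulate, toy_reflect)
print(plan)  # ['open_drawer', 'pick_ball', 'place_in_drawer']
```

Breadth-first order is an arbitrary choice here; a real planner could instead rank branches, for example by a VLM-assigned score.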
What Really Counts? Examining Step and Token Level Attribution in Multilingual CoT Reasoning
Ferrao, Jeremias, Basar, Ezgi, Islam, Khondoker Ittehadul, Hassani, Mahrokh
This study investigates the attribution patterns underlying Chain-of-Thought (CoT) reasoning in multilingual LLMs. While prior work demonstrates that CoT prompting improves task performance, there are concerns about the faithfulness and interpretability of the generated reasoning chains. To assess these properties across languages, we applied two complementary attribution methods, ContextCite for step-level attribution and Inseq for token-level attribution, to the Qwen2.5 1.5B-Instruct model on the MGSM benchmark. Our experimental results highlight several key findings, including: (1) attribution scores excessively emphasize the final reasoning step, particularly in incorrect generations; (2) structured CoT prompting significantly improves accuracy primarily for high-resource Latin-script languages; and (3) controlled perturbations via negation and distractor sentences reduce both model accuracy and attribution coherence. These findings expose the limitations of CoT prompting, particularly in terms of multilingual robustness and interpretive transparency.
Automated Tennis Player and Ball Tracking with Court Keypoints Detection (Hawk Eye System)
Desu, Venkata Manikanta, Ali, Syed Fawaz
This study presents a complete pipeline for automated tennis match analysis. Our framework integrates multiple deep learning models to detect and track players and the tennis ball in real time, while also identifying court keypoints for spatial reference. Using YOLOv8 for player detection, a custom-trained YOLOv5 model for ball tracking, and a ResNet50-based architecture for court keypoint detection, our system provides detailed analytics including player movement patterns, ball speed, shot accuracy, and player reaction times. The experimental results demonstrate robust performance in varying court conditions and match scenarios. The model outputs an annotated video along with detailed performance metrics, enabling coaches, broadcasters, and players to gain actionable insights into the dynamics of the game.
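Once the court keypoints are known, they can supply a metric scale for converting the ball tracker's pixel displacements into speed. The sketch below is a simplified illustration, not the authors' method. It assumes a camera roughly perpendicular to the court so that a single meters-per-pixel factor applies (a real system would more likely fit a homography to the keypoints), and the function names and sample coordinates are hypothetical. The 10.97 m constant is the standard width of a doubles court.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

DOUBLES_COURT_WIDTH_M = 10.97  # standard doubles court width (36 ft)


def meters_per_pixel(left_kp: Point, right_kp: Point,
                     real_width_m: float = DOUBLES_COURT_WIDTH_M) -> float:
    """Scale factor from two keypoints on opposite doubles sidelines."""
    return real_width_m / math.dist(left_kp, right_kp)


def ball_speeds_kmh(centers: Sequence[Optional[Point]], fps: float,
                    m_per_px: float) -> List[Optional[float]]:
    """Frame-to-frame ball speed in km/h; None where the ball was not detected."""
    speeds: List[Optional[float]] = []
    for p0, p1 in zip(centers, centers[1:]):
        if p0 is None or p1 is None:
            speeds.append(None)  # missed detection: speed undefined
            continue
        meters = math.dist(p0, p1) * m_per_px
        speeds.append(meters * fps * 3.6)  # m/frame -> m/s -> km/h
    return speeds


scale = meters_per_pixel((100.0, 500.0), (1197.0, 500.0))  # 1097 px apart
speeds = ball_speeds_kmh([(400.0, 300.0), (450.0, 300.0), None], fps=30, m_per_px=scale)
# speeds[0] is about 54 km/h (0.5 m per frame at 30 fps); speeds[1] is None
```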
If you don't know about these video tools, you're already behind
I've said it before, and I'll say it again: AI is changing everything. This is next-level, movie-magic stuff. Let's talk about the wild part first. You don't need any editing software.
Multidimensional Consistency Improves Reasoning in Language Models
Lai, Huiyuan, Zhang, Xiao, Nissim, Malvina
While large language models (LLMs) have proved able to address some complex reasoning tasks, they are also highly sensitive to input variation, which can lead to different solution paths and final answers. Answer consistency across input variations can thus be taken as a sign of stronger confidence. Leveraging this insight, we introduce a framework, Multidimensional Reasoning Consistency, in which, focusing on math problems, models are systematically pushed to diversify their solution paths toward a final answer, thereby testing them for answer consistency across multiple input variations. We induce variations in (i) the order of shots in the prompt, (ii) problem phrasing, and (iii) the language used. Extensive experiments on a large range of open-source state-of-the-art LLMs of various sizes show that reasoning consistency differs by variation dimension, and that by aggregating consistency across dimensions, our framework consistently enhances mathematical reasoning performance on both the monolingual dataset GSM8K and the multilingual dataset MGSM, especially for smaller models.
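Aggregating answers across variation dimensions can be illustrated with simple majority voting. The sketch below assumes that the final answer is the most frequent answer pooled across all variations and that a dimension's consistency is the share of its answers agreeing with that dimension's majority. The paper's exact aggregation rule may differ, and the dimension names and answers are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def majority(answers: List[str]) -> Tuple[str, float]:
    """Most frequent answer and the fraction of answers that agree with it."""
    top, freq = Counter(answers).most_common(1)[0]
    return top, freq / len(answers)


def aggregate(answers_by_dim: Dict[str, List[str]]):
    """Per-dimension consistency plus a majority vote over all variations."""
    per_dim = {dim: majority(ans) for dim, ans in answers_by_dim.items()}
    pooled = [a for ans in answers_by_dim.values() for a in ans]
    final, agreement = majority(pooled)
    return final, agreement, per_dim


# Hypothetical answers to one GSM8K-style question under three variation types.
answers = {
    "shot_order": ["18", "18", "20"],
    "phrasing": ["18", "17", "18"],
    "language": ["18", "18", "18"],
}
final, agreement, per_dim = aggregate(answers)
# final == "18", agreement == 7/9, per_dim["shot_order"] == ("18", 2/3)
```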
Catching Spinning Table Tennis Balls in Simulation with End-to-End Curriculum Reinforcement Learning
Hu, Xiaoyi, Mao, Yue, Wang, Gang, Li, Qingdu, Zhang, Jianwei, Ji, Yunfeng
The game of table tennis is renowned for its extremely high spin rates, but most table tennis robots today struggle to handle balls with such rapid spin. To address this issue, we contribute a series of methods: 1. Curriculum Reinforcement Learning (RL): the robot learns to play table tennis progressively, from easy to difficult tasks. 2. Analysis of Spinning Table Tennis Ball Collisions: a physics-based analysis generates more realistic trajectories of spinning table tennis balls after collision. 3. Definition of Trajectory States: the trajectory states aid in setting up the reward function. 4. Selection of Valid Rally Trajectories: a valid rally trajectory selection scheme ensures that the robot's training is not influenced by abnormal trajectories. 5. Reality-to-Simulation (Real2Sim) Transfer: this scheme is used to validate the trained robot's ability to handle spinning balls in real-world scenarios. With Real2Sim, the deployment costs of robotic reinforcement learning can be further reduced. Moreover, the trajectory-state-based reward function is not limited to table tennis robots; it can be generalized to a wide range of cyclical tasks. To validate the robot's ability to handle spinning balls, we conducted Real2Sim experiments. A video of the experiment is provided in the supplementary materials.
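The easy-to-difficult progression of curriculum RL can be driven by a simple promotion rule: advance to a harder stage once the recent success rate passes a threshold. The sketch below is a generic scheduler under that assumption, not the authors' curriculum. The `Curriculum` class, the spin-rate stages, the 0.8 threshold, and the window size are all hypothetical.

```python
from collections import deque
from typing import Deque, List


class Curriculum:
    """Promote to the next stage when the windowed success rate is high enough."""

    def __init__(self, stages: List[float], threshold: float = 0.8, window: int = 100):
        self.stages = stages  # ordered from easiest to hardest
        self.threshold = threshold
        self.level = 0
        self.results: Deque[bool] = deque(maxlen=window)

    @property
    def current(self) -> float:
        return self.stages[self.level]

    def report(self, success: bool) -> None:
        """Record one episode outcome and promote if the criterion is met."""
        self.results.append(success)
        window_full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        if window_full and rate >= self.threshold and self.level < len(self.stages) - 1:
            self.level += 1
            self.results.clear()  # measure the new stage from scratch


# Hypothetical stages: maximum ball spin (revolutions per second) per stage.
cur = Curriculum(stages=[0.0, 20.0, 50.0, 100.0], threshold=0.8, window=10)
for _ in range(10):
    cur.report(True)
# cur.level is now 1, so cur.current == 20.0
```

Clearing the window on promotion keeps successes on the easier stage from counting toward the harder one.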
Saarthi: The First AI Formal Verification Engineer
Kumar, Aman, Gadde, Deepak Narayan, Radhakrishna, Keerthan Kopparam, Lettnin, Djones
Devin has recently generated significant buzz in the Artificial Intelligence (AI) community as the world's first fully autonomous AI software engineer, capable of independently developing software code [1] [2]. Devin uses the concept of agentic workflow in Generative AI (GenAI), which empowers AI agents to engage in a more dynamic, iterative, and self-reflective process. Saarthi applies a similar agentic workflow to hardware formal verification, acting as an AI formal verification engineer. With Saarthi, verification engineers can focus on more complex problems, and verification teams can strive for more ambitious goals. The domain-agnostic implementation of Saarthi makes it scalable across various domains such as RTL design, UVM-based verification, and others. Hardware design verification, especially formal verification, entails a methodical and disciplined approach to the planning, development, execution, and sign-off of functionally correct hardware designs. Formal verification uses mathematical methods to prove the correctness of hardware designs against their specifications, ensuring that all possible states and inputs are considered; this complements traditional simulation-based verification techniques, which might cover only a subset of possible scenarios due to practical constraints [3]. The formal verification process encompasses several key roles, such as organizational coordination, task allocation, code development, property proving, analyzing counterexamples (CEXs), debugging, coverage closure, and documentation preparation. These roles are crucial for managing the complexity and ensuring the thoroughness of the verification process. For instance, analyzing counterexamples involves identifying specific scenarios in which the design might fail to meet its specifications, which is critical for debugging and refining the design. This highly intricate activity demands meticulous attention to detail, given the long development cycles and the critical importance of ensuring hardware functionality and reliability [4].
The field of Natural Language Processing (NLP) has undergone a significant transformation with the advent of Large Language Models (LLMs) [5].
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Payoungkhamdee, Patomporn, Tuchinda, Pume, Baek, Jinheon, Cahyawijaya, Samuel, Udomcharoenchaikit, Can, Manakul, Potsawee, Limkonchotiwat, Peerat, Chuangsuwanich, Ekapol, Nutanong, Sarana
Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting the potential of code quality as a heuristic for improving performance at test time.
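In PoT prompting, the model writes a program and an interpreter, not the model, computes the answer. The sketch below shows only the execution half under stated assumptions: `run_pot_program` is a hypothetical helper, the program string stands in for model output, and the convention that the result is stored in a variable named `answer` is an assumption. Emptying `__builtins__` is not a real sandbox, so untrusted model output should be run in an isolated process.

```python
from typing import Any, Dict


def run_pot_program(program: str, result_var: str = "answer") -> Any:
    """Execute a generated program and return the value of result_var."""
    namespace: Dict[str, Any] = {}
    # Separate globals (with builtins removed) and locals; top-level
    # assignments in the program land in `namespace`.
    exec(program, {"__builtins__": {}}, namespace)
    return namespace.get(result_var)


# Stand-in for a program a model might generate from a (possibly non-English)
# question: "A can holds 3 tennis balls. How many balls are in 4 cans?"
generated = """
balls_per_can = 3
cans = 4
answer = balls_per_can * cans
"""
print(run_pot_program(generated))  # 12
```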
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths
Chia, Yew Ken, Chen, Guizhen, Xu, Weiwen, Tuan, Luu Anh, Poria, Soujanya, Bing, Lidong
Advanced models such as OpenAI o1 exhibit impressive problem-solving capabilities through step-by-step reasoning. However, they may still falter on more complex problems, making errors that disrupt their reasoning paths. We attribute this to the expansive solution space, where each step risks diverging into mistakes. To enhance language model reasoning, we introduce a specialized training framework called Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths. Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance. Reasoning Paths Optimization does not rely on large-scale human-annotated rationales or outputs from closed-source models, making it scalable and data-efficient. We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions. The experiments demonstrate that our framework significantly enhances the reasoning performance of large language models, with improvements of up to 3.1% on GSM8K and 4.3% on MMLU (STEM). Our data and code can be found at https://reasoning-paths.github.io.
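The idea of encouraging favorable branches while penalizing unfavorable ones can be illustrated with a pairwise logistic (Bradley-Terry style) loss over per-step log-probabilities. This is a generic stand-in, not necessarily the exact RPO objective; `branch_pair_loss`, the `beta` scale, and the input format (pairs of sequence log-probabilities for a favorable and an unfavorable branch explored from the same step) are assumptions.

```python
import math
from typing import List, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def branch_pair_loss(pairs: List[Tuple[float, float]], beta: float = 1.0) -> float:
    """Mean of -log sigmoid(beta * (logp_fav - logp_unfav)) over reasoning steps.

    Each pair holds the model's log-probability of a favorable branch and of
    an unfavorable branch explored from the same reasoning step.
    """
    losses = [softplus(-beta * (fav - unfav)) for fav, unfav in pairs]
    return sum(losses) / len(losses)


# Equal preference gives log(2); a strongly preferred favorable branch gives ~0.
print(branch_pair_loss([(-1.0, -1.0)]))  # about 0.693
print(branch_pair_loss([(0.0, -10.0)]))  # about 4.5e-05
```

Minimizing this loss raises the favorable branch's log-probability relative to the unfavorable one; in practice it would typically be combined with a likelihood term on a reference path.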
Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning
Wu, Zixuan, Zaidi, Zulfiqar, Patil, Adithya, Xiao, Qingyu, Gombolay, Matthew
In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transferring the constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework to the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
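Moving between the 3D task space and 2D image space rests on camera projection. The sketch below uses a standard pinhole model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and assumes points are already expressed in the camera frame; it illustrates the geometry only and is not the authors' reconstruction or warping pipeline.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (X, Y, Z) with Z > 0 to pixels."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return fx * x / z + cx, fy * y / z + cy


def back_project(uv: Tuple[float, float], depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift a pixel back to 3D given its depth along the optical axis."""
    u, v = uv
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


K = dict(fx=100.0, fy=100.0, cx=320.0, cy=240.0)  # hypothetical intrinsics
uv = project((1.0, 2.0, 4.0), **K)     # (345.0, 290.0)
xyz = back_project(uv, depth=4.0, **K)  # recovers (1.0, 2.0, 4.0)
```

Depth has to come from elsewhere (for instance, multi-view reconstruction), because a single pixel only fixes a ray through the camera center.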