Goto

Collaborating Authors

 apollo


Sutton's predictions v Gladiators star Apollo

BBC News

Having won only one of their past six Premier League games and drawn 2-2 at Tottenham after being 2-0 up, can second-placed Manchester City get back on track at Liverpool on Sunday? I wouldn't rule City out of anything at the moment said BBC Sport football expert Chris Sutton. But the way they folded in the second half against Tottenham was a real worry. Sutton is making predictions for all 380 Premier League games this season, against AI, BBC Sport readers and a variety of guests. His guest for week 25 is Gladiators star Apollo, real name Alex Gray, who supports Newcastle . Before becoming a Gladiator, the 6ft 6in Gray played Premiership rugby for three teams and also American Football for NFL side Atlanta Falcons.


A Minimalist Optimizer Design for LLM Pretraining

Glentis, Athanasios, Li, Jiaxiang, Han, Andi, Hong, Mingyi

arXiv.org Artificial Intelligence

Training large language models (LLMs) typically relies on adaptive optimizers such as Adam, which introduce extra operations and require significant more memory to maintain first- and second-order moments than SGD. While recent works such as GaLore, Fira and APOLLO have proposed state-compressed variants to reduce memory consumption, a fundamental question remains: What are the minimum modifications to plain SGD needed to match state-of-the-art pretraining performance? We systematically investigate this question using a bottom-up approach, and identify two simple yet highly (memory- and compute-) efficient techniques: (1) column-wise gradient normalization (normalizing the gradient along the output dimension), which boosts SGD performance without momentum; and (2) applying first-order momentum only to the output layer, where gradient variance is highest. Combining these two techniques lead to SCALE (Stochastic Column-normAlized Last-layer momEntum), a simple optimizer for memory efficient pretraining. Across multiple LLaMA models (60M-1B), SCALE matches or exceeds the performance of Adam while using only 35-45% of the total memory. It also consistently outperforms memory-efficient optimizers such as GaLore, Fira and APOLLO, making it a strong candidate for large-scale pretraining under memory constraints. For LLaMA 7B model, SCALE outperforms the state-of-the-art memory-efficient methods APOLLO and Muon, in terms of both perplexity and memory consumption.


APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Ospanov, Azim, Farnia, Farzan, Yousefzadeh, Roozbeh

arXiv.org Artificial Intelligence

Formal reasoning and automated theorem proving constitute a challenging subfield of machine learning, in which machines are tasked with proving mathematical theorems using formal languages like Lean. A formal verification system can check whether a formal proof is correct or not almost instantaneously, but generating a completely correct formal proof with large language models (LLMs) remains a formidable task. The usual approach in the literature is to prompt the LLM many times (up to several thousands) until one of the generated proofs passes the verification system. In this work, we present APOLLO (Automated PrOof repair viaLLM and Lean cOllaboration), a modular, model-agnostic agentic framework that combines the strengths of the Lean compiler with an LLM's reasoning abilities to achieve better proof-generation results at a low token and sampling budgets. Apollo directs a fully automated process in which the LLM generates proofs for theorems, a set of agents analyze the proofs, fix the syntax errors, identify the mistakes in the proofs using Lean, isolate failing sub-lemmas, utilize automated solvers, and invoke an LLM on each remaining goal with a low top-K budget. The repaired sub-proofs are recombined and reverified, iterating up to a user-controlled maximum number of attempts. On the miniF2F benchmark, we establish a new state-of-the-art accuracy of 84.9% among sub 8B-parameter models (as of August 2025) while keeping the sampling budget below one hundred. Moreover, Apollo raises the state-of-the-art accuracy for Goedel-Prover-SFT to 65.6% while cutting sample complexity from 25,600 to a few hundred. General-purpose models (o3-mini, o4-mini) jump from 3-7% to over 40% accuracy. Our results demonstrate that targeted, compiler-guided repair of LLM outputs yields dramatic gains in both efficiency and correctness, suggesting a general paradigm for scalable automated theorem proving.


Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Tang, Liou, Joshi, James, Kundu, Ashish

arXiv.org Artificial Intelligence

Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.


Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval

Khatuya, Subhendu, Naidu, Shashwat, Goyal, Pawan, Ganguly, Niloy

arXiv.org Artificial Intelligence

Despite continuous advancements in the capabilities of large language models (LLMs), numerical reasoning remains a challenging area. Techniques like chain-of-thought prompting, tree-of-thought prompting, and program-of-thought prompting guide LLMs through intermediate reasoning steps. Although in-context learning with few-shot prompting has improved performance, LLMs still lag behind state-of-the-art models on financial numerical reasoning datasets such as FinQA and ConvFinQA. In this work, we introduce FINDER, a novel two-step framework, to enhance LLMs' capabilities in financial numerical reasoning. The first step utilizes a generative retriever to extract relevant facts from unstructured data, including both text and tables. This is followed by context-aware Program of Thought prompting with dynamic selection of in-context examples. Our model FINDER achieves a new state-of-the-art performance on both the FinQA and ConvFinQA datasets, surpassing previous benchmarks with execution accuracy improvements of 5.98% and 4.05%, respectively.


Toward a Full-Stack Co-Simulation Platform for Testing of Automated Driving Systems

Bi, Dong, Zhao, Yongqi, Gu, Zhengguo, Mihalj, Tomislav, Hu, Jia, Eichberger, Arno

arXiv.org Artificial Intelligence

Virtual testing has emerged as an effective approach to accelerate the deployment of automated driving systems. Nevertheless, existing simulation toolchains encounter difficulties in integrating rapid, automated scenario generation with simulation environments supporting advanced automated driving capabilities. To address this limitation, a full-stack toolchain is presented, enabling automatic scenario generation from real-world datasets and efficient validation through a co-simulation platform based on CarMaker, ROS, and Apollo. The simulation results demonstrate the effectiveness of the proposed toolchain. A demonstration video showcasing the toolchain is available at the provided link: https://youtu.be/taJw_-CmSiY.


On-Demand Scenario Generation for Testing Automated Driving Systems

Yan, Songyang, Zhang, Xiaodong, Hao, Kunkun, Xin, Haojie, Luo, Yonggang, Yang, Jucheng, Fan, Ming, Yang, Chao, Sun, Jun, Yang, Zijiang

arXiv.org Artificial Intelligence

The safety and reliability of Automated Driving Systems (ADS) are paramount, necessitating rigorous testing methodologies to uncover potential failures before deployment. Traditional testing approaches often prioritize either natural scenario sampling or safety-critical scenario generation, resulting in overly simplistic or unrealistic hazardous tests. In practice, the demand for natural scenarios (e.g., when evaluating the ADS's reliability in real-world conditions), critical scenarios (e.g., when evaluating safety in critical situations), or somewhere in between (e.g., when testing the ADS in regions with less civilized drivers) varies depending on the testing objectives. To address this issue, we propose the On-demand Scenario Generation (OSG) Framework, which generates diverse scenarios with varying risk levels. Achieving the goal of OSG is challenging due to the complexity of quantifying the criticalness and naturalness stemming from intricate vehicle-environment interactions, as well as the need to maintain scenario diversity across various risk levels. OSG learns from real-world traffic datasets and employs a Risk Intensity Regulator to quantitatively control the risk level. It also leverages an improved heuristic search method to ensure scenario diversity. We evaluate OSG on the Carla simulators using various ADSs. We verify OSG's ability to generate scenarios with different risk levels and demonstrate its necessity by comparing accident types across risk levels. With the help of OSG, we are now able to systematically and objectively compare the performance of different ADSs based on different risk levels.


Testing the Fault-Tolerance of Multi-Sensor Fusion Perception in Autonomous Driving Systems

Tian, Haoxiang, Ding, Wenqiang, Han, Xingshuo, Wu, Guoquan, Guo, An, Chen, Junqi Zhang. Wei, Wei, Jun, Zhang, Tianwei

arXiv.org Artificial Intelligence

High-level Autonomous Driving Systems (ADSs), such as Google Waymo and Baidu Apollo, typically rely on multi-sensor fusion (MSF) based approaches to perceive their surroundings. This strategy increases perception robustness by combining the respective strengths of the camera and LiDAR and directly affects the safety-critical driving decisions of autonomous vehicles (AVs). However, in real-world autonomous driving scenarios, cameras and LiDAR are subject to various faults, which can probably significantly impact the decision-making and behaviors of ADSs. Existing MSF testing approaches only discovered corner cases that the MSF-based perception cannot accurately detected by MSF-based perception, while lacking research on how sensor faults affect the system-level behaviors of ADSs. To address this gap, we conduct the first exploration of the fault tolerance of MSF perception-based ADS for sensor faults. In this paper, we systematically and comprehensively build fault models for cameras and LiDAR in AVs and inject them into the MSF perception-based ADS to test its behaviors in test scenarios. To effectively and efficiently explore the parameter spaces of sensor fault models, we design a feedback-guided differential fuzzer to discover the safety violations of MSF perception-based ADS caused by the injected sensor faults. We evaluate FADE on the representative and practical industrial ADS, Baidu Apollo. Our evaluation results demonstrate the effectiveness and efficiency of FADE, and we conclude some useful findings from the experimental results. To validate the findings in the physical world, we use a real Baidu Apollo 6.0 EDU autonomous vehicle to conduct the physical experiments, and the results show the practical significance of our findings.


Noble Audio FoKus Apollo review: The high price of pristine audio

Engadget

That's because most audio companies sell their top-of-the-line gear around 300- 400. The FoKus Rex5 earbuds, for example, cram in five separate drivers where much of the competition uses two at the most. Noble was also among the first to employ xMEMS drivers in wireless earbuds in a bid to improve bass performance. Enter the FoKus Apollo, a 649 pair of active noise canceling (ANC) headphones with a detachable boom mic and up to 80 hours of battery life. The real star of the show is the driver setup, which Noble says is the first time this configuration appears in wireless headphones.


Open-Source Autonomous Driving Software Platforms: Comparison of Autoware and Apollo

Jung, Hee-Yang, Paek, Dong-Hee, Kong, Seung-Hyun

arXiv.org Artificial Intelligence

Full-stack autonomous driving system spans diverse technological domains-including perception, planning, and control-that each require in-depth research. Moreover, validating such technologies of the system necessitates extensive supporting infrastructure, from simulators and sensors to high-definition maps. These complexities with barrier to entry pose substantial limitations for individual developers and research groups. Recently, open-source autonomous driving software platforms have emerged to address this challenge by providing autonomous driving technologies and practical supporting infrastructure for implementing and evaluating autonomous driving functionalities. Among the prominent open-source platforms, Autoware and Apollo are frequently adopted in both academia and industry. While previous studies have assessed each platform independently, few have offered a quantitative and detailed head-to-head comparison of their capabilities. In this paper, we systematically examine the core modules of Autoware and Apollo and evaluate their middleware performance to highlight key differences. These insights serve as a practical reference for researchers and engineers, guiding them in selecting the most suitable platform for their specific development environments and advancing the field of full-stack autonomous driving system.