AITopics

2510.10402

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceOct-14-2025

Aristotle: IMO-level Automated Theorem Proving

Achim, Tudor, Best, Alex, Bietti, Alberto, Der, Kevin, Fédérico, Mathïs, Gukov, Sergei, Halpern-Leistner, Daniel, Henningsgard, Kirsten, Kudryashov, Yury, Meiburg, Alexander, Michelsen, Martin, Patterson, Riley, Rodriguez, Eric, Scharff, Laura, Shanker, Vikram, Sicca, Vladmir, Sowrirajan, Hari, Swope, Aidan, Tamas, Matyas, Tenev, Vlad, Thomm, Jonathan, Williams, Harold, Wu, Lawrence

We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. Our system demonstrates state-of-the-art performance with favorable scaling properties for automated theorem proving.

large language model, logic & formal reasoning, machine learning, (21 more...)

2510.01346

Country:

North America (0.46)
Asia > India (0.28)

Genre:

Research Report (0.52)
Personal > Honors (0.34)

Industry: Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Shree, Atul, Jupuru, Harshith

FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms

CTC-based ASR systems face computational and memory bottlenecks in resource-limited environments. Traditional CTC decoders, requiring up to 90% of processing time in systems (e.g., wav2vec2-large on L4 GPUs), face inefficiencies due to exhaustive token-level operations. This paper introduces Frame Level Token Pruning for Connectionist Temporal Classification (FLToP CTC), a novel decoding algorithm that employs frame-level token pruning guided by a relative threshold probability. By dynamically eliminating low-probability tokens per frame, FLToP CTC reduces compute and memory demands while maintaining negligible WER degradation. On LibriSpeech, FLToP CTC achieves a 10.5x runtime speedup and 2.78x memory reduction versus standard CTC decoders. Its simplicity enables seamless integration into CTC decoders across platforms (CPUs, GPUs, etc.). FLToP CTC addresses CTC bottlenecks, offering scalability for resource-limited environments and realtime applications, enhancing speech recognition accessibility and efficiency.

artificial intelligence, machine learning, pruning, (13 more...)

2510.09085

Country: Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.32)

Ghoukasian, Hrad, Lee, Bonwoo, Asoodeh, Shahab

Locally Optimal Private Sampling: Beyond the Global Minimax

We study the problem of sampling from a distribution under local differential privacy (LDP). Given a private distribution $P \in \mathcal{P}$, the goal is to generate a single sample from a distribution that remains close to $P$ in $f$-divergence while satisfying the constraints of LDP. This task captures the fundamental challenge of producing realistic-looking data under strong privacy guarantees. While prior work by Park et al. (NeurIPS'24) focuses on global minimax-optimality across a class of distributions, we take a local perspective. Specifically, we examine the minimax risk in a neighborhood around a fixed distribution $P_0$, and characterize its exact value, which depends on both $P_0$ and the privacy level. Our main result shows that the local minimax risk is determined by the global minimax risk when the distribution class $\mathcal{P}$ is restricted to a neighborhood around $P_0$. To establish this, we (1) extend previous work from pure LDP to the more general functional LDP framework, and (2) prove that the globally optimal functional LDP sampler yields the optimal local sampler when constrained to distributions near $P_0$. Building on this, we also derive a simple closed-form expression for the locally minimax-optimal samplers which does not depend on the choice of $f$-divergence. We further argue that this local framework naturally models private sampling with public data, where the public data distribution is represented by $P_0$. In this setting, we empirically compare our locally optimal sampler to existing global methods, and demonstrate that it consistently outperforms global minimax samplers.

artificial intelligence, minimax-optimal sampler, sampler, (15 more...)

2510.09485

Country: North America (0.45)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.92)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

CausalDynamics: A large-scale benchmark for structural discovery of dynamical causal models

Herdeanu, Benjamin, Nathaniel, Juan, Roesch, Carla, Buch, Jatan, Ramien, Gregor, Haux, Johannes, Gentine, Pierre

Causal discovery for dynamical systems poses a major challenge in fields where active interventions are infeasible. Most methods used to investigate these systems and their associated benchmarks are tailored to deterministic, low-dimensional and weakly nonlinear time-series data. To address these limitations, we present CausalDynamics, a large-scale benchmark and extensible data generation framework to advance the structural discovery of dynamical causal models. Our benchmark consists of true causal graphs derived from thousands of both linearly and nonlinearly coupled ordinary and stochastic differential equations as well as two idealized climate models. We perform a comprehensive evaluation of state-of-the-art causal discovery algorithms for graph reconstruction on systems with noisy, confounded, and lagged dynamics. CausalDynamics consists of a plug-and-play, build-your-own coupling workflow that enables the construction of a hierarchy of physical systems. We anticipate that our framework will facilitate the development of robust causal discovery algorithms that are broadly applicable across domains while addressing their unique challenges. We provide a user-friendly implementation and documentation on https://kausable.github.io/CausalDynamics.

artificial intelligence, machine learning, node, (17 more...)

2505.1662

Country: Europe (0.46)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
(3 more...)

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

Nasr, Milad, Carlini, Nicholas, Sitawarin, Chawin, Schulhoff, Sander V., Hayes, Jamie, Ilie, Michael, Pluto, Juliette, Song, Shuang, Chaudhari, Harsh, Shumailov, Ilia, Thakurta, Abhradeep, Xiao, Kai Yuanqing, Terzis, Andreas, Tramèr, Florian

How should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an attacker from eliciting harmful knowledge or remotely triggering malicious actions, respectively) are typically evaluated either against a static set of harmful attack strings, or against computationally weak optimization methods that were not designed with the defense in mind. We argue that this evaluation process is flawed. Instead, we should evaluate defenses against adaptive attackers who explicitly modify their attack strategy to counter a defense's design while spending considerable resources to optimize their objective. By systematically tuning and scaling general optimization techniques-gradient descent, reinforcement learning, random search, and human-guided exploration-we bypass 12 recent defenses (based on a diverse set of techniques) with attack success rate above 90% for most; importantly, the majority of defenses originally reported near-zero attack success rates. We believe that future defense work must consider stronger attacks, such as the ones we describe, in order to make reliable and convincing claims of robustness.

large language model, machine learning, natural language, (21 more...)

2510.09023

Country:

Europe (0.67)
Asia (0.67)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

MagicDock: Toward Docking-oriented De Novo Ligand Design via Gradient Inversion

Chen, Zekai, Li, Xunkai, Zhang, Sirui, Sun, Henan, Li, Jia, Li, Zhenjun, Zhou, Bing, Li, Rong-Hua, Wang, Guoren

De novo ligand design is a fundamental task that seeks to generate protein or molecule candidates that can effectively dock with protein receptors and achieve strong binding affinity entirely from scratch. It holds paramount significance for a wide spectrum of biomedical applications. However, most existing studies are constrained by the \textbf{Pseudo De Novo}, \textbf{Limited Docking Modeling}, and \textbf{Inflexible Ligand Type}. To address these issues, we propose MagicDock, a forward-looking framework grounded in the progressive pipeline and differentiable surface modeling. (1) We adopt a well-designed gradient inversion framework. To begin with, general docking knowledge of receptors and ligands is incorporated into the backbone model. Subsequently, the docking knowledge is instantiated as reverse gradient flows by binding prediction, which iteratively guide the de novo generation of ligands. (2) We emphasize differentiable surface modeling in the docking process, leveraging learnable 3D point-cloud representations to precisely capture binding details, thereby ensuring that the generated ligands preserve docking validity through direct and interpretable spatial fingerprints. (3) We introduce customized designs for different ligand types and integrate them into a unified gradient inversion framework with flexible triggers, thereby ensuring broad applicability. Moreover, we provide rigorous theoretical guarantees for each component of MagicDock. Extensive experiments across 9 scenarios demonstrate that MagicDock achieves average improvements of 27.1\% and 11.7\% over SOTA baselines specialized for protein or molecule ligand design, respectively.

artificial intelligence, machine learning, optimization problem, (19 more...)

2510.0902

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
(2 more...)

Harshbarger, Ian, Chidambaram, Calvin

Slim Scheduler: A Runtime-Aware RL and Scheduler System for Efficient CNN Inference

Most neural network scheduling research focuses on optimizing static, end-to-end models of fixed width, overlooking dynamic approaches that adapt to heterogeneous hardware and fluctuating runtime conditions. We present Slim Scheduler, a hybrid scheduling framework that integrates a Proximal Policy Optimization (PPO) reinforcement learning policy with algorithmic, greedy schedulers to coordinate distributed inference for slimmable models. Each server runs a local greedy scheduler that batches compatible requests and manages instance scaling based on VRAM and utilization constraints, while the PPO router learns global routing policies for device selection, width ratio, and batch configuration. This hierarchical design reduces search space complexity, mitigates overfitting to specific hardware, and balances efficiency and throughput. Compared to a purely randomized task distribution baseline, Slim Scheduler can achieve various accuracy and latency trade-offs such as: A 96.45% reduction in mean latency and a 97.31% reduction in energy usage dropping accuracy to the slimmest model available (70.3%). It can then accomplish an overall reduction in average latency plus energy consumption with an increase in accuracy at the cost of higher standard deviations of said latency and energy, effecting overall task throughput.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2510.09018

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry: Energy (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Alrashedy, Kamel, Srihari, Vriksha, Zaidi, Zulfiqar, Srivastava, Ridam, Tambwekar, Pradyumna, Gombolay, Matthew

Constraints-of-Thought: A Framework for Constrained Reasoning in Language-Model-Guided Search

While researchers have made significant progress in enabling large language models (LLMs) to perform multi-step planning, LLMs struggle to ensure that those plans align with high-level user intent and satisfy symbolic constraints, especially in complex, multi-step domains. Existing reasoning approaches such as Chain-of-Thought (CoT), Tree-of-Thought (ToT), and verifier-augmented methods, expand the search space but often yield infeasible actions or hallucinated steps. To overcome these limitations, we propose Constraints-of-Thought (Const-o-T), a framework that provides a structured prior that enables Monte Carlo Tree Search (MCTS) focus search on semantically meaningful paths. Each reasoning step is represented as an (intent, constraint) pair, which serves both to compress the search space and enforce validity. Unlike prior methods that merely generate reasoning traces or validate outputs post hoc, Const-o-T uses (intent, constraint)pairs to actively focus the search toward feasible and meaningful plans. We integrate Const-o-T into MCTS using a structured representation of intent-constraint pairs constraints prune infeasible branches and guide exploration toward semantically valid actions, improving planning efficiency and verifiable decision-making. We demonstrate across three domains Risk game, CAD code generation, and arithmetic reasoning that our approach outperforms baselines, yielding higher accuracy and stronger structural alignment. Our contribution is to demonstrate that Const-of-T offers a generalizable foundation for constraint-guided reasoning, enabling more efficient, constraint-aligned, and domain-adaptable planning with LLMs.

constraint, large language model, machine learning, (17 more...)

2510.08992

Genre:

Research Report > New Finding (1.00)
Workflow (0.93)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Dwivedula, Rohit, Saxena, Divyanshu, Akella, Aditya, Chaudhuri, Swarat, Kim, Daehyeok

Man-Made Heuristics Are Dead. Long Live Code Generators!

Policy design for various systems controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics for the specific instance in which the policy will be deployed. In this paper, we re-imagine policy design via a novel automated search technique fueled by recent advances in generative models, specifically Large Language Model (LLM)-driven code generation. We outline the design and implementation of PolicySmith, a framework that applies LLMs to synthesize instance-optimal heuristics. We apply PolicySmith to two long-standing systems policies - web caching and congestion control, highlighting the opportunities unraveled by this LLM-driven heuristic search. For caching, PolicySmith discovers heuristics that outperform established baselines on standard open-source traces. For congestion control, we show that PolicySmith can generate safe policies that integrate directly into the Linux kernel.

computing machinery, large language model, machine learning, (14 more...)

2510.08803

Country:

North America > United States > California > Santa Clara County (0.29)
North America > United States > Texas > Travis County > Austin (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)