AITopics | ath

Collaborating Authors

ath

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes

Kone, Cyrille, Jamieson, Kevin

arXiv.org Machine Learning, May-6-2026

We study the $(\varepsilon, \delta)$-PAC policy identification problem in finite-horizon episodic Markov Decision Processes. Existing approaches provide finite-time guarantees for approximate settings ($\varepsilon>0$), but they suffer from a high computational cost that makes them hard to implement, as well as from a suboptimal dependence on $\log(1/\delta)$. We propose a randomized and computationally efficient algorithm for best policy identification that combines posterior sampling with an online learning algorithm to guide exploration in the MDP. Our method achieves asymptotic optimality in sample complexity and in posterior contraction rate, and it runs in $O(S^2AH)$ time per episode, matching standard model-based approaches. Unlike prior algorithms such as MOCA and PEDEL, our guarantees remain meaningful in the asymptotic regime and avoid a suboptimal polynomial dependence on $\log(1/\delta)$. Our results provide both theoretical insights and practical tools for efficient policy identification in tabular MDPs.
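
As a rough illustration of the two ingredients named above (sampling a tabular model from a posterior, then planning over it with an $O(S^2AH)$ backward-induction pass), here is a minimal NumPy sketch. It assumes independent Dirichlet posteriors over transitions and known rewards; the function names and the prior pseudo-count are hypothetical, and this is not the authors' algorithm, which additionally uses an online learner to decide where to explore.

```python
import numpy as np

def sample_mdp(counts, rng, prior=1.0):
    """Draw transition kernels P[h, s, a, :] from independent Dirichlet posteriors.

    counts[h, s, a, s'] holds observed transition counts; `prior` is a
    hypothetical symmetric Dirichlet pseudo-count added to every entry.
    """
    H, S, A, _ = counts.shape
    P = np.empty_like(counts, dtype=float)
    for h in range(H):
        for s in range(S):
            for a in range(A):
                P[h, s, a] = rng.dirichlet(counts[h, s, a] + prior)
    return P

def backward_induction(P, R):
    """Optimal values and greedy policy for a finite-horizon tabular MDP.

    P has shape (H, S, A, S) and R has shape (H, S, A). Each stage is an
    (S, A, S) x (S,) product, i.e. O(S^2 A) work, so the pass is O(S^2 A H).
    """
    H, S, A, _ = P.shape
    V = np.zeros((H + 1, S))          # V[H] = 0 is the terminal value
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R[h] + P[h] @ V[h + 1]    # (S, A, S) @ (S,) -> (S, A)
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi
```

A typical loop would call `sample_mdp` once per episode, plan with `backward_induction`, execute the resulting policy, and update `counts` with the observed transitions.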

artificial intelligence, machine learning, sth

arXiv.org Machine Learning

2605.03921

Genre: Research Report > New Finding (0.48)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees

Gerogiannis, Argyrios, Huang, Yu-Han, Veeravalli, Venugopal V.

arXiv.org Machine Learning, Apr-21-2026

We study model-free reinforcement learning (RL) in non-stationary finite-horizon episodic Markov decision processes (MDPs) without prior knowledge of the non-stationarity. We focus on the piecewise-stationary (PS) setting, where both the reward and transition dynamics can change an arbitrary number of times. We propose Detection Augmented Reinforcement Learning (DARLING), a modular wrapper for PS-RL that applies to both tabular and linear MDPs, without knowledge of the changes. Under certain change-point separation and reachability conditions, DARLING improves the best available dynamic regret bounds in both settings and yields strong empirical performance. We further establish the first minimax lower bounds for PS-RL in tabular and linear MDPs, showing that DARLING is the first nearly optimal algorithm. Experiments on standard benchmarks demonstrate that DARLING consistently surpasses the state-of-the-art methods across diverse non-stationary scenarios.
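
The abstract describes DARLING only as a modular detection wrapper and does not specify its detector. As a hedged illustration of the general "detect, then restart" pattern, the sketch below pairs a generic two-sided CUSUM test on per-episode returns with a restart loop. The CUSUM choice, its parameters (`warmup`, `drift`, `threshold`), and both names are assumptions for illustration, not the paper's design.

```python
from dataclasses import dataclass, field

@dataclass
class CusumDetector:
    """Two-sided CUSUM test for a shift in the mean of a scalar stream.

    The pre-change mean is estimated from the first `warmup` observations;
    `drift` is a slack term and `threshold` the alarm level. All three are
    hypothetical tuning knobs.
    """
    warmup: int = 20
    drift: float = 0.05
    threshold: float = 2.0
    _history: list = field(default_factory=list)
    _pos: float = 0.0
    _neg: float = 0.0

    def reset(self):
        self._history.clear()
        self._pos = self._neg = 0.0

    def update(self, x):
        """Feed one observation; return True if a change is declared."""
        if len(self._history) < self.warmup:
            self._history.append(x)   # still estimating the baseline mean
            return False
        mu = sum(self._history) / len(self._history)
        self._pos = max(0.0, self._pos + x - mu - self.drift)  # upward shift
        self._neg = max(0.0, self._neg + mu - x - self.drift)  # downward shift
        return self._pos > self.threshold or self._neg > self.threshold

def run_with_restarts(returns, detector):
    """Return the episode indices at which the wrapped learner would restart."""
    restarts = []
    for t, r in enumerate(returns):
        if detector.update(r):
            restarts.append(t)
            detector.reset()   # a real wrapper would also reset the base RL agent here
    return restarts
```

For example, a return stream that jumps from 0 to 1 at episode 50 triggers a single restart a few episodes after the change, once the cumulative deviation crosses the threshold.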

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

2604.16684

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

MobILE: Model-Based Imitation Learning From Observation Alone

Neural Information Processing Systems, Feb-11-2026, 20:22:52 GMT

We provide a unified analysis for MobILE and demonstrate that MobILE enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well-studied notions of structural complexity. We also show that the imitation learning from observation (ILFO) problem is strictly harder than the standard imitation learning (IL) problem by presenting an exponential sample complexity separation between IL and ILFO.

ath, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Unpacking Reward Shaping

Neural Information Processing Systems, Feb-9-2026, 09:55:56 GMT

Much of this work is based on upper confidence bound (UCB) principles and prescribes some kind of exploration bonus to prioritize exploration of rarely visited regions.
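
The UCB principle mentioned here can be made concrete with a count-based exploration bonus. The sketch below is a generic illustration, assuming a bonus of the form $c/\sqrt{N(s,a)}$ added to a tabular value estimate; the function name and the scale `c` are hypothetical, and this is not the reward-shaping scheme analyzed in the paper.

```python
import math
from collections import defaultdict

def ucb_action(q_values, counts, state, actions, c=1.0):
    """Pick the action maximizing Q(s, a) + c / sqrt(N(s, a)).

    Unvisited pairs get an infinite bonus, so every action is tried once
    before the bonus starts shrinking with the visit count.
    """
    def score(a):
        n = counts[(state, a)]
        bonus = math.inf if n == 0 else c / math.sqrt(n)
        return q_values[(state, a)] + bonus
    return max(actions, key=score)
```

With `defaultdict(float)` for Q-values and `defaultdict(int)` for counts, a rarely visited action with a lower value estimate can still win, because its bonus dominates: that is the "prioritize rarely visited regions" behavior described above.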

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment

Jeung, Wonje, Yoon, Sangyeon, Kahng, Minsuk, No, Albert

arXiv.org Artificial Intelligence, Oct-24-2025

Large Reasoning Models (LRMs) have become powerful tools for complex problem solving, but their structured reasoning pathways can lead to unsafe outputs when exposed to harmful prompts. Existing safety alignment methods reduce harmful outputs but can degrade reasoning depth, leading to significant trade-offs in complex, multi-step tasks, and remain vulnerable to sophisticated jailbreak attacks. To address this, we introduce SAFEPATH, a lightweight alignment method that fine-tunes LRMs to emit a short, 8-token Safety Primer at the start of their reasoning, in response to harmful prompts, while leaving the rest of the reasoning process unsupervised. Empirical results across multiple benchmarks indicate that SAFEPATH effectively reduces harmful outputs while maintaining reasoning performance. Specifically, SAFEPATH reduces harmful responses by up to 90.0% and blocks 83.3% of jailbreak attempts in the DeepSeek-R1-Distill-Llama-8B model, while requiring 295.9x less compute than Direct Refusal and 314.1x less than SafeChain. We further introduce a zero-shot variant that requires no fine-tuning. In addition, we provide a comprehensive analysis of how existing methods in LLMs generalize, or fail, when applied to reasoning-centric models, revealing critical gaps and new directions for safer AI.
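
The abstract says the model is fine-tuned to emit an 8-token Safety Primer while the rest of the reasoning stays unsupervised. One common way to express that in supervised fine-tuning is label masking, sketched below in plain Python. The helper name, the token ids, and the choice to train on prompt plus primer only are assumptions; `-100` is PyTorch's default `ignore_index` for cross-entropy loss, so masked positions contribute no loss there.

```python
PRIMER_LEN = 8  # the abstract specifies an 8-token Safety Primer

def build_example(prompt_ids, primer_ids, ignore_index=-100):
    """Assemble input ids and labels for one harmful-prompt training example.

    Only the primer tokens carry labels; prompt positions are masked with
    `ignore_index`. Nothing after the primer is included, so the rest of
    the reasoning receives no supervision.
    """
    if len(primer_ids) != PRIMER_LEN:
        raise ValueError(f"expected a {PRIMER_LEN}-token primer")
    input_ids = list(prompt_ids) + list(primer_ids)
    labels = [ignore_index] * len(prompt_ids) + list(primer_ids)
    return input_ids, labels
```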

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2505.14667

Genre: Research Report > New Finding (0.67)

Industry:

Banking & Finance (0.47)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A-MHA*: Anytime Multi-Heuristic A*

Natarajan, Ramkumar, Saleem, Muhammad Suhail, Xiao, William, Aine, Sandip, Choset, Howie, Likhachev, Maxim

arXiv.org Artificial Intelligence, Sep-1-2025

Designing good heuristic functions for graph search requires adequate domain knowledge. It is often easy to design heuristics that perform well and correlate with the underlying true cost-to-go values in certain parts of the search space, but these may not be admissible throughout the domain, thereby affecting the optimality guarantees of the search. Bounded suboptimal search using several such partially good but inadmissible heuristics was developed in Multi-Heuristic A* (MHA*). Although MHA* leverages multiple inadmissible heuristics to potentially generate a faster suboptimal solution, the original version does not improve the solution over time. It is a one-shot algorithm that requires careful setting of inflation factors to obtain a desired one-time solution. In this work, we tackle this issue by extending MHA* to an anytime version that finds a feasible suboptimal solution quickly and continually improves it until time runs out. Our work is inspired by the Anytime Repairing A* (ARA*) algorithm. We prove that our precise adaptation of ARA* concepts in the MHA* framework preserves the original bounded-suboptimality and completeness guarantees and enables MHA* to perform in an anytime fashion. Furthermore, we report the performance of A-MHA* in a 3-D path planning domain and on the sliding tile puzzle, and compare it against MHA* and other anytime algorithms.
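
The anytime idea (return a solution quickly under a large inflation factor, then tighten the factor while time remains) can be sketched with weighted A* on a grid. This is a simplified, restart-based scheme written for illustration only: unlike ARA* it does not reuse search effort between iterations, and unlike MHA* or A-MHA* it uses a single admissible Manhattan heuristic rather than several inadmissible ones. The function names, the grid encoding, and the `eps` schedule are assumptions.

```python
import heapq
import time

def weighted_astar(grid, start, goal, eps):
    """Weighted A* on a 4-connected grid of 0 (free) / 1 (blocked) cells.

    Expands nodes in order of g + eps * h with the Manhattan heuristic h and
    never re-expands closed nodes; with this consistent heuristic the returned
    cost is at most eps times optimal. Returns (cost, path) or (inf, None).
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    g = {start: 0}
    parent = {start: None}
    open_heap = [(eps * h(start), start)]
    closed = set()
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue                      # stale heap entry
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        closed.add(node)
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            if nxt in closed:
                continue                  # no re-expansions
            new_g = g[node] + 1
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + eps * h(nxt), nxt))
    return float("inf"), None

def anytime_search(grid, start, goal, eps_schedule=(3.0, 2.0, 1.5, 1.0), budget_s=1.0):
    """Rerun weighted A* with shrinking eps until the schedule or time runs out.

    Yields (eps, cost, path) each time a strictly cheaper solution is found.
    """
    deadline = time.monotonic() + budget_s
    best = float("inf")
    for eps in eps_schedule:
        if time.monotonic() > deadline:
            break
        cost, path = weighted_astar(grid, start, goal, eps)
        if cost < best:
            best = cost
            yield eps, cost, path
```

A caller can keep the most recent yielded solution and stop consuming the generator whenever its deadline arrives; the final `eps = 1.0` pass returns an optimal path under this admissible heuristic.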

algorithm, artificial intelligence, open 0

arXiv.org Artificial Intelligence

doi: 10.1609/socs.v10i1.18486

2508.21637

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection

Yan, Yibo, Wang, Shen, Huo, Jiahao, Yu, Philip S., Hu, Xuming, Wen, Qingsong

arXiv.org Artificial Intelligence, Mar-23-2025

Mathematical error detection in educational settings presents a significant challenge for Multimodal Large Language Models (MLLMs), requiring a sophisticated understanding of both visual and textual mathematical content along with complex reasoning capabilities. Though effective in mathematical problem-solving, MLLMs often struggle with the nuanced task of identifying and categorizing student errors in multimodal mathematical contexts. Therefore, we introduce MathAgent, a novel Mixture-of-Math-Agent framework designed specifically to address these challenges. Our approach decomposes error detection into three phases, each handled by a specialized agent: an image-text consistency validator, a visual semantic interpreter, and an integrative error analyzer. This architecture enables more accurate processing of mathematical content by explicitly modeling relationships between multimodal problems and student solution steps. We evaluate MathAgent on real-world educational data, demonstrating approximately 5% higher accuracy in error step identification and a 3% improvement in error categorization compared to baseline models. In addition, MathAgent has been successfully deployed in an educational platform that has served over one million K-12 students, achieving nearly 90% student satisfaction while generating significant cost savings by reducing manual error detection.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2503.18132

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Zou, Chengke, Guo, Xingang, Yang, Rui, Zhang, Junyu, Hu, Bin, Zhang, Huan

arXiv.org Artificial Intelligence, Oct-29-2024

The rapid advancements in Vision-Language Models (VLMs) have shown great potential in tackling mathematical reasoning tasks that involve visual context. Unlike humans who can reliably apply solution steps to similar problems with minor modifications, we found that SOTA VLMs like GPT-4o can consistently fail in these scenarios, revealing limitations in their mathematical reasoning capabilities. In this paper, we investigate the mathematical reasoning robustness in VLMs and evaluate how well these models perform under different variants of the same question, such as changes in visual numerical values or function graphs. While several vision-based math benchmarks have been developed to assess VLMs' problem-solving capabilities, these benchmarks contain only static sets of problems and cannot easily evaluate mathematical reasoning robustness. To fill this gap, we introduce DynaMath, a dynamic visual math benchmark designed for in-depth assessment of VLMs. DynaMath includes 501 high-quality, multi-topic seed questions, each represented as a Python program. Those programs are carefully designed and annotated to enable the automatic generation of a much larger set of concrete questions, including many different types of visual and textual variations. DynaMath allows us to evaluate the generalization ability of VLMs, by assessing their performance under varying input conditions of a seed question. We evaluated 14 SOTA VLMs with 5,010 generated concrete questions. Our results show that the worst-case model accuracy, defined as the percentage of correctly answered seed questions in all 10 variants, is significantly lower than the average-case accuracy. Our analysis emphasizes the need to study the robustness of VLMs' reasoning abilities, and DynaMath provides valuable insights to guide the development of more reliable models for mathematical reasoning.
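
The two metrics contrasted above can be stated precisely in a few lines. The sketch below pairs a toy "seed question as a program" (a hypothetical, text-only example in the spirit of the abstract; DynaMath's real seed programs also render visual content) with average-case accuracy over all concrete questions and worst-case accuracy as the fraction of seeds whose variants are all answered correctly. With 501 seeds and 10 variants each, the grid of results has the 5,010 entries cited above.

```python
import random

def seed_question(rng):
    """A toy seed program: each call emits one concrete variant and its answer.

    Only the numerical values change between variants; the question template
    is a hypothetical example, not one of DynaMath's seeds.
    """
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    question = f"The line y = {a}x + {b} crosses the y-axis at what value?"
    return question, b

def accuracies(correct):
    """correct[i][j] is True if variant j of seed i was answered correctly.

    Average-case accuracy: fraction of all concrete questions answered correctly.
    Worst-case accuracy: fraction of seeds whose variants are all correct.
    """
    total = sum(len(row) for row in correct)
    average = sum(sum(row) for row in correct) / total
    worst = sum(all(row) for row in correct) / len(correct)
    return average, worst
```

A single wrong variant leaves the average-case score almost unchanged but removes the whole seed from the worst-case count, which is why the gap between the two numbers is a useful robustness signal.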

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2411.00836

Country:

North America > United States > Illinois (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Crafting the Path: Robust Query Rewriting for Information Retrieval

Baek, Ingeol, Lee, Jimin, Yang, Joonho, Lee, Hwanhee

arXiv.org Artificial Intelligence, Jul-17-2024

Query rewriting aims to generate a new query that complements the original query in order to improve information retrieval. Recent studies on query rewriting, such as query2doc (Q2D), query2expand (Q2E) and query2cot (Q2C), rely on the internal knowledge of Large Language Models (LLMs) to generate a relevant passage that adds information to the query. Nevertheless, the efficacy of these methods may decline markedly when the requisite knowledge is not encoded in the model's parameters. In this paper, we propose a novel structured query rewriting method called Crafting the Path, tailored for retrieval systems. Crafting the Path is a three-step process in which each step crafts query-related information needed to find the passages to be retrieved. Specifically, Crafting the Path begins with Query Concept Comprehension, proceeds to Query Type Identification, and finally conducts Expected Answer Extraction. Experimental results show that our method outperforms previous rewriting methods, especially in domains less familiar to LLMs. We demonstrate that our method is less dependent on the internal parameter knowledge of the model and generates queries with fewer factual inaccuracies. Furthermore, we observe that Crafting the Path has lower latency than the baselines.
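
The three-step structure can be sketched as a small pipeline around any text-generation function. The prompt wording below is hypothetical (the paper's actual prompts are not reproduced here), and appending the generated sections to the original query is an assumption about how the rewritten query is assembled; `generate` stands in for whatever LLM call is available.

```python
from typing import Callable

# Hypothetical prompt templates, one per step named in the abstract.
STEPS = (
    ("Query Concept Comprehension", "Explain the key concepts in this query: {q}"),
    ("Query Type Identification", "What type of information does this query seek? {q}"),
    ("Expected Answer Extraction", "Describe what a passage answering this query would contain: {q}"),
)

def craft_path(query: str, generate: Callable[[str], str]) -> str:
    """Rewrite `query` by appending one generated section per step.

    `generate` is any prompt -> text function (for example, a wrapper around
    an LLM). The rewritten query keeps the original as its first line.
    """
    sections = [query]
    for name, template in STEPS:
        sections.append(f"{name}: {generate(template.format(q=query))}")
    return "\n".join(sections)
```

Passing a stub such as `lambda prompt: "..."` makes the pipeline testable without a model; in practice the returned string would be sent to the retriever in place of the raw query.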

computational linguistic, information, query

arXiv.org Artificial Intelligence

2407.12529

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.14)
North America > United States > Michigan (0.04)
Europe > Germany (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area (0.94)
Media (0.68)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Adaptive Thresholding Heuristic for KPI Anomaly Detection

Isaac, Ebenezer R. H. P., Sharma, Akshat

arXiv.org Artificial Intelligence, Aug-21-2023

A plethora of outlier detectors have been explored in the time series domain; however, in a business sense, not all outliers are anomalies of interest. Existing anomaly detection solutions are confined to certain outlier detectors, limiting their applicability to broader anomaly detection use cases. Network KPIs (Key Performance Indicators) tend to exhibit stochastic behaviour, producing statistical outliers, most of which do not adversely affect business operations. Thus, a heuristic is required to capture the business definition of an anomaly for time series KPIs. This article proposes an Adaptive Thresholding Heuristic (ATH) that dynamically adjusts the detection threshold based on the local properties of the data distribution and adapts to changes in time series patterns. The heuristic derives the threshold from the expected periodicity and the observed proportion of anomalies, minimizing false positives and addressing concept drift. ATH can be used in conjunction with any underlying seasonality decomposition method and any outlier detector that yields an outlier score. This method has been tested on EON1-Cell-U, a labeled KPI anomaly dataset produced by Ericsson, to validate our hypothesis. Experimental results show that ATH is computationally efficient, making it scalable for near-real-time anomaly detection, and flexible with respect to multiple forecasters and outlier detectors.
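
To make the idea of a threshold derived from expected periodicity and an observed anomaly proportion concrete, here is a minimal NumPy sketch. It assumes the outlier scores have already been produced by some detector, sizes the trailing window as a whole number of periods, and takes the $(1 - \text{anomaly rate})$ quantile of that window as the threshold. The window rule, the quantile rule, and the function name are illustrative assumptions, not the ATH formula from the paper.

```python
import numpy as np

def adaptive_threshold(scores, period, anomaly_rate, n_periods=4):
    """Flag points whose outlier score exceeds a rolling, data-driven threshold.

    For each time t past the warm-up, the threshold is the (1 - anomaly_rate)
    quantile of the scores in the trailing window of n_periods * period
    points, so it tracks local changes in the score distribution.
    Returns (flags, thresholds); thresholds are NaN during warm-up.
    """
    scores = np.asarray(scores, dtype=float)
    window = n_periods * period
    thresholds = np.full(scores.shape, np.nan)
    for t in range(window, len(scores)):
        thresholds[t] = np.quantile(scores[t - window:t], 1.0 - anomaly_rate)
    flags = np.zeros(scores.shape, dtype=bool)
    flags[window:] = scores[window:] > thresholds[window:]  # no flags during warm-up
    return flags, thresholds
```

Because the window spans whole periods, every phase of the seasonal cycle contributes equally to the quantile; after a spike, the threshold rises temporarily while the spike is inside the window, which suppresses follow-on false positives.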

anomaly, data mining, machine learning

arXiv.org Artificial Intelligence

2308.10504

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)
