AITopics

Musawi, Rahmatullah, Lu, Sheng

Towards Contamination Resistant Benchmarks

The rapid development of large language models (LLMs) has transformed the landscape of natural language processing. Evaluating LLMs properly is crucial for understanding their potential and addressing concerns such as safety. However, LLM evaluation is confronted by various factors, among which contamination stands out as a key issue that undermines the reliability of evaluations. In this work, we introduce the concept of contamination resistance to address this challenge. We propose a benchmark based on Caesar ciphers (e.g., "ab" to "bc" when the shift is 1), which, despite its simplicity, is an excellent example of a contamination resistant benchmark. We test this benchmark on widely used LLMs under various settings, and we find that these models struggle with this benchmark when contamination is controlled. Our findings reveal issues in current LLMs and raise important questions regarding their true capabilities. Our work contributes to the development of contamination resistant benchmarks, enabling more rigorous LLM evaluation and offering insights into the true capabilities and limitations of LLMs.
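The Caesar cipher mentioned in the abstract can be illustrated with a short sketch. This is not the authors' benchmark code: the function names `caesar_shift` and `caesar_decode` are hypothetical, and the handling of non-letter characters (left unchanged) is an assumption made for illustration.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            # Assumption: digits, spaces, and punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)


def caesar_decode(text: str, shift: int) -> str:
    """Invert `caesar_shift` by applying the opposite shift."""
    return caesar_shift(text, -shift)


if __name__ == "__main__":
    # The abstract's own example: "ab" becomes "bc" when the shift is 1.
    print(caesar_shift("ab", 1))
    print(caesar_decode(caesar_shift("Hello, world!", 7), 7))
```

Because the plaintext and the shift can be drawn fresh for every test item, encoded strings of this kind are unlikely to appear verbatim in training data, which is one plausible reading of why the abstract calls the task contamination resistant.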

large language model, machine learning, natural language, (18 more...)

2505.08389

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Implet: A Post-hoc Subsequence Explainer for Time Series Models

Meng, Fanyu, Kan, Ziwen, Rezaei, Shahbaz, Kong, Zhaodan, Chen, Xin, Liu, Xin

Explainability in time series models is crucial for fostering trust, facilitating debugging, and ensuring interpretability in real-world applications. In this work, we introduce Implet, a novel post-hoc explainer that generates accurate and concise subsequence-level explanations for time series models. Our approach identifies critical temporal segments that significantly contribute to the model's predictions, providing enhanced interpretability beyond traditional feature-attribution methods. Based on it, we propose a cohort-based (group-level) explanation framework designed to further improve the conciseness and interpretability of our explanations. We evaluate Implet on several standard time-series classification benchmarks, demonstrating its effectiveness in improving interpretability.

Deep learning models have demonstrated remarkable success in various time series forecasting and classification tasks, often surpassing traditional statistical methods. Despite their effectiveness, these models are frequently considered black-boxes, making their predictions challenging to interpret. Understanding which temporal patterns or subsequences contribute significantly to a model's decisions is crucial for building trust, debugging erroneous predictions, and making informed decisions, especially in high-stakes domains such as finance, healthcare, and climate science. Existing explainability methods for time series predominantly rely on feature attribution techniques, such as gradient-based saliency maps or perturbation-based approaches. While these methods provide valuable insights into individual time points or features influencing model predictions, their high dimensionality can complicate interpretation. In comparison, subsequence-based explanations, such as shapelet-based methods, are a type of global explanation that offer more intuitive insights by identifying discriminative temporal patterns within time series data.
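To make the idea of a subsequence-based explanation concrete, here is a minimal sketch of the sliding-window distance commonly used by shapelet-style methods. It is not Implet's algorithm; the function name `min_subsequence_distance` and the choice of plain Euclidean distance without normalization are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple


def min_subsequence_distance(
    series: Sequence[float], pattern: Sequence[float]
) -> Tuple[float, int]:
    """Return the smallest Euclidean distance between `pattern` and any
    equal-length window of `series`, with the start index of that window."""
    m = len(pattern)
    if m == 0 or m > len(series):
        raise ValueError("pattern must be non-empty and no longer than series")

    best_dist, best_start = math.inf, -1
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - p) ** 2 for w, p in zip(window, pattern)))
        if dist < best_dist:  # strict comparison keeps the earliest best match
            best_dist, best_start = dist, start
    return best_dist, best_start


if __name__ == "__main__":
    series = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
    bump = [1.0, 3.0, 1.0]  # hypothetical discriminative pattern
    print(min_subsequence_distance(series, bump))
```

A small distance indicates that the pattern occurs somewhere in the series, and the returned index localizes it. This is what makes a subsequence easier to read than a per-time-point saliency map.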

artificial intelligence, data mining, machine learning, (19 more...)

2505.08748

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Miller, Justin K, Tang, Wenjia

Evaluating LLM Metrics Through Real-World Capabilities

As generative AI becomes increasingly embedded in everyday workflows, it is important to evaluate its performance in ways that reflect real-world usage rather than abstract notions of intelligence. Unlike many existing benchmarks that assess general intelligence, our approach focuses on real-world utility, evaluating how well models support users in everyday tasks. While current benchmarks emphasize code generation or factual recall, users rely on AI for a much broader range of activities, from writing assistance and summarization to citation formatting and stylistic feedback. In this paper, we analyze large-scale survey data and usage logs to identify six core capabilities that represent how people commonly use Large Language Models (LLMs): Summarization, Technical Assistance, Reviewing Work, Data Structuring, Generation, and Information Retrieval. We then assess the extent to which existing benchmarks cover these capabilities, revealing significant gaps in coverage, efficiency measurement, and interpretability. Drawing on this analysis, we apply human-centered evaluation grounded in five practical criteria (coherence, accuracy, clarity, relevance, and efficiency) to identify gaps in how well current benchmarks reflect common usage. For four of the six capabilities, we identify the benchmarks that best align with real-world tasks and use them to compare leading models. We find that Google Gemini outperforms other models (including OpenAI's GPT, xAI's Grok, Meta's LLaMA, Anthropic's Claude, DeepSeek, and Qwen from Alibaba) on these utility-focused metrics.

benchmark, large language model, machine learning, (17 more...)

2505.08253

Country:

Asia > China (0.04)
Oceania > Australia (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > Experimental Study (0.67)

Industry:

Media (1.00)
Law (1.00)
Information Technology (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)

Large Language Models for Computer-Aided Design: A Survey

Zhang, Licheng, Le, Bach, Akhtar, Naveed, Lam, Siew-Kei, Ngo, Tuan

Large Language Models (LLMs) have seen rapid advancements in recent years, with models like ChatGPT and DeepSeek showcasing their remarkable capabilities across diverse domains. While substantial research has been conducted on LLMs in various fields, a comprehensive review focusing on their integration with Computer-Aided Design (CAD) remains notably absent. CAD is the industry standard for 3D modeling and plays a vital role in the design and development of products across different industries. As the complexity of modern designs increases, the potential for LLMs to enhance and streamline CAD workflows presents an exciting frontier. This article presents the first systematic survey exploring the intersection of LLMs and CAD. We begin by outlining the industrial significance of CAD, highlighting the need for AI-driven innovation. Next, we provide a detailed overview of the foundation of LLMs. We also examine both closed-source LLMs and publicly available models. The core of this review focuses on the various applications of LLMs in CAD, providing a taxonomy of six key areas where these models are making considerable impact. Finally, we propose several promising future directions for further advancements, which offer vast opportunities for innovation and are poised to shape the future of CAD technology. GitHub: https://github.com/lichengzhanguom/LLMs-CAD-Survey-Taxonomy

large language model, machine learning, natural language, (18 more...)

2505.08137

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Asia > Singapore (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre:

Research Report > Promising Solution (0.46)
Overview > Innovation (0.34)

Industry: Information Technology > Software (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tepakbong, Nathanael, Zhou, Ding-Xuan, Zhou, Xiang

Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition

We study the classical binary classification problem for hypothesis spaces of Deep Neural Networks (DNNs) with ReLU activation under Tsybakov's low-noise condition with exponent $q>0$, and its limit-case $q\to\infty$ which we refer to as the "hard-margin condition". We show that DNNs which minimize the empirical risk with square loss surrogate and $\ell_p$ penalty can achieve finite-sample excess risk bounds of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition, provided that the regression function $\eta$ is sufficiently smooth. The proof relies on a novel decomposition of the excess risk which might be of independent interest.
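For orientation, the two conditions named in the abstract are often stated as follows. The notation here is an assumption rather than the paper's own: labels are taken in $\{0,1\}$, the regression function is $\eta(x)=\mathbb{P}(Y=1\mid X=x)$, and $C$, $\delta$ are unspecified positive constants. The paper may use a different but equivalent convention.

```latex
% Low-noise (Tsybakov) condition with exponent q > 0:
\[
  \mathbb{P}\bigl(\lvert 2\eta(X) - 1 \rvert \le t\bigr) \;\le\; C\, t^{q}
  \qquad \text{for all sufficiently small } t > 0 .
\]
% Hard-margin condition (formal limit q -> infinity):
\[
  \lvert 2\eta(X) - 1 \rvert \;\ge\; \delta
  \qquad \text{almost surely, for some } \delta > 0 .
\]
```

Under the second condition the probability on the left of the first inequality vanishes for every $t<\delta$, so the bound holds for every exponent $q$, which is why it is regarded as the limiting case.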

artificial intelligence, assumption, machine learning, (18 more...)

2505.08262

Country:

Asia > China > Hong Kong (0.04)
Europe > Italy (0.04)
Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review

Abdullah, Huang, Tao, Lee, Ickjai, Ahn, Euijoon

The diffusion model has recently emerged as a potent approach in computer vision, demonstrating remarkable performance in the field of generative artificial intelligence. Capable of producing high-quality synthetic images, diffusion models have been successfully applied across a range of applications. However, a significant challenge remains: the high computational cost of training these models and of generating images with them. This study focuses on the efficiency and inference time of diffusion-based generative models, highlighting their applications in both natural and medical imaging. We present the most recent advances in diffusion models by categorizing them into three key models: the Denoising Diffusion Probabilistic Model (DDPM), the Latent Diffusion Model (LDM), and the Wavelet Diffusion Model (WDM). These models play a crucial role in medical imaging, where producing fast, reliable, and high-quality medical images is essential for accurate analysis of abnormalities and disease diagnosis. We first investigate the general framework of DDPM, LDM, and WDM and discuss the computational complexity gap filled by these models in natural and medical imaging. We then discuss the current limitations of these models as well as the opportunities and future research directions in medical imaging.

artificial intelligence, diffusion model, machine learning, (15 more...)

2505.07866

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Oceania > Australia > Queensland > Cairns Region > Cairns (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)

Khirianova, Alexandra, Solodneva, Ekaterina, Pudovikov, Andrey, Osokin, Sergey, Samosvat, Egor, Dorn, Yuriy, Ledovsky, Alexander, Zenkova, Yana

BAT: Benchmark for Auto-bidding Task

arXiv.org Machine Learning, May-14-2025

The optimization of bidding strategies for online advertising slot auctions presents a critical challenge across numerous digital marketplaces. A significant obstacle to the development, evaluation, and refinement of real-time autobidding algorithms is the scarcity of comprehensive datasets and standardized benchmarks. To address this deficiency, we present an auction benchmark encompassing the two most prevalent auction formats. We implement a series of robust baselines on a novel dataset, addressing the most salient Real-Time Bidding (RTB) problem domains: budget pacing uniformity and Cost Per Click (CPC) constraint optimization. This benchmark provides a user-friendly and intuitive framework for researchers and practitioners to develop and refine innovative autobidding algorithms, thereby facilitating advancements in the field of programmatic advertising. The implementation and additional resources can be accessed at the following repository (https://github.com/avito-tech/bat-autobidding-benchmark, https://doi.org/10.5281/zenodo.14794182).

algorithm, artificial intelligence, machine learning, (17 more...)

doi: 10.1145/3696410.3714657

2505.08485

Country:

Asia > Russia (0.15)
North America > United States > New York > New York County > New York City (0.06)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
(16 more...)

Genre: Research Report (0.82)

Industry:

Marketing (1.00)
Information Technology > Services (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.34)

Otoum, Yazan, Asad, Arghavan, Nayak, Amiya

LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems

The increasing complexity and scale of the Internet of Things (IoT) have made security a critical concern. This paper presents a novel Large Language Model (LLM)-based framework for comprehensive threat detection and prevention in IoT environments. The system integrates lightweight LLMs fine-tuned on IoT-specific datasets (IoT-23, TON_IoT) for real-time anomaly detection and automated, context-aware mitigation strategies optimized for resource-constrained devices. A modular Docker-based deployment enables scalable and reproducible evaluation across diverse network conditions. Experimental results in simulated IoT environments demonstrate significant improvements in detection accuracy, response latency, and resource efficiency over traditional security methods. The proposed framework highlights the potential of LLM-driven, autonomous security solutions for future IoT ecosystems.

large language model, machine learning, natural language, (18 more...)

2505.0024

Country: Oceania > Australia (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges

Lee, Yunseo, Song, John Youngeun, Kim, Dongsun, Kim, Jindae, Kim, Mijung, Nam, Jaechang

Recent technical breakthroughs in large language models (LLMs) have enabled them to fluently generate source code. Software developers often leverage both general-purpose and code-specialized LLMs to revise existing code or even generate a whole function from scratch. These capabilities are also beneficial in no-code or low-code contexts, in which one can write programs without a technical background. However, due to their internal design, LLMs are prone to generating hallucinations: incorrect, nonsensical, or unjustifiable outputs whose presence is difficult to identify. This problem also occurs when generating source code. Once hallucinated code is produced, it is often challenging for users to identify and fix it, especially when such hallucinations can be identified only under specific execution paths. As a result, the hallucinated code may remain unnoticed within the codebase. This survey investigates recent studies and techniques relevant to hallucinations generated by CodeLLMs. We categorize the types of hallucinations in the code generated by CodeLLMs, review existing benchmarks and mitigation strategies, and identify open challenges. Based on these findings, this survey outlines further research directions in the detection and removal of hallucinations produced by CodeLLMs.

large language model, machine learning, natural language, (15 more...)

2504.20799

Country:

North America > United States (0.46)
Asia > South Korea (0.28)
Oceania > Australia (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Software (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)