AITopics

2505.1384

Country:

Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Energy > Power Industry (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMay-21-2025

MLZero: A Multi-Agent System for End-to-end Machine Learning Automation

Fang, Haoyang, Han, Boran, Erickson, Nick, Zhang, Xiyuan, Zhou, Su, Dagar, Anirudh, Zhang, Jiani, Turkmen, Ali Caner, Hu, Cuixiong, Rangwala, Huzefa, Wu, Ying Nian, Wang, Bernie, Karypis, George

Existing AutoML systems have advanced the automation of machine learning (ML); however, they still require substantial manual configuration and expert input, particularly when handling multimodal data. We introduce MLZero, a novel multi-agent framework powered by Large Language Models (LLMs) that enables end-to-end ML automation across diverse data modalities with minimal human intervention. A cognitive perception module is first employed, transforming raw multimodal inputs into perceptual context that effectively guides the subsequent workflow. To address key limitations of LLMs, such as hallucinated code generation and outdated API knowledge, we enhance the iterative code generation process with semantic and episodic memory. MLZero demonstrates superior performance on MLE-Bench Lite, outperforming all competitors in both success rate and solution quality, securing six gold medals. Additionally, when evaluated on our Multimodal AutoML Agent Benchmark, which includes 25 more challenging tasks spanning diverse data modalities, MLZero outperforms the competing methods by a large margin with a success rate of 0.92 (+263.6\%) and an average rank of 2.28. Our approach maintains its robust effectiveness even with a compact 8B LLM, outperforming full-size systems from existing solutions.

large language model, machine learning, natural language, (13 more...)

2505.13941

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Transportation > Air (1.00)
Information Technology (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Miletić, Filip, Schmid, Aaron, Walde, Sabine Schulte im

Probing BERT for German Compound Semantics

arXiv.org Artificial IntelligenceMay-21-2025

This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.

large language model, machine learning, natural language, (19 more...)

2505.1413

Country:

South America > Colombia > Meta Department > Villavicencio (0.05)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Machine LearningMay-21-2025

Synthetic Non-stationary Data Streams for Recognition of the Unknown

Komorniczak, Joanna

The problem of data non-stationarity is commonly addressed in data stream processing. In a dynamic environment, methods should continuously be ready to analyze time-varying data -- hence, they should enable incremental training and respond to concept drifts. An equally important variability typical for non-stationary data stream environments is the emergence of new, previously unknown classes. Often, methods focus on one of these two phenomena -- detection of concept drifts or detection of novel classes -- while both difficulties can be observed in data streams. Additionally, concerning previously unknown observations, the topic of open set of classes has become particularly important in recent years, where the goal of methods is to efficiently classify within known classes and recognize objects outside the model competence. This article presents a strategy for synthetic data stream generation in which both concept drifts and the emergence of new classes representing unknown objects occur. The presented research shows how unsupervised drift detectors address the task of detecting novelty and concept drifts and demonstrates how the generated data streams can be utilized in the open set recognition task.

concept drift, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2505.13745

Country:

Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
South America > Brazil > Maranhão (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.32)

arXiv.org Machine LearningMay-21-2025

Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference

Du, Jin, Chen, Li, Xian, Xun, Luo, An, Tian, Fangqiao, Wang, Ganghua, Doss, Charles, Shen, Xiaotong, Ding, Jie

Reliable causal inference is essential for making decisions in high-stakes areas like medicine, economics, and public policy. However, it remains unclear whether large language models (LLMs) can handle rigorous and trustworthy statistical causal inference. Current benchmarks usually involve simplified tasks. For example, these tasks might only ask LLMs to identify semantic causal relationships or draw conclusions directly from raw data. As a result, models may overlook important statistical pitfalls, such as Simpson's paradox or selection bias. This oversight limits the applicability of LLMs in the real world. To address these limitations, we propose CausalPitfalls, a comprehensive benchmark designed to rigorously evaluate the capability of LLMs in overcoming common causal inference pitfalls. Our benchmark features structured challenges across multiple difficulty levels, each paired with grading rubrics. This approach allows us to quantitatively measure both causal reasoning capabilities and the reliability of LLMs' responses. We evaluate models using two protocols: (1) direct prompting, which assesses intrinsic causal reasoning, and (2) code-assisted prompting, where models generate executable code for explicit statistical analysis. Additionally, we validate the effectiveness of this judge by comparing its scoring with assessments from human experts. Our results reveal significant limitations in current LLMs when performing statistical causal inference. The CausalPitfalls benchmark provides essential guidance and quantitative metrics to advance the development of trustworthy causal reasoning systems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2505.1377

Country:

North America > United States > Minnesota (0.07)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > Strength High (0.68)
Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Environmental Medicine (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Yousefimehr, Behnam, Ghatee, Mehdi, Seifi, Mohammad Amin, Fazli, Javad, Tavakoli, Sajed, Rafei, Zahra, Ghaffari, Shervin, Nikahd, Abolfazl, Gandomani, Mahdi Razi, Orouji, Alireza, Kashani, Ramtin Mahmoudi, Heshmati, Sarina, Mousavi, Negin Sadat

Data Balancing Strategies: A Survey of Resampling and Augmentation Methods

arXiv.org Machine LearningMay-21-2025

Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling techniques aimed at modifying class proportions. Conventional oversampling approaches like SMOTE enhance the representation of the minority class, whereas undersampling methods focus on trimming down the majority class. Advances in deep learning have facilitated the creation of more complex solutions, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), which are capable of producing high-quality synthetic examples. This paper reviews a broad spectrum of data balancing methods, classifying them into categories including synthetic oversampling, adaptive techniques, generative models, ensemble-based strategies, hybrid approaches, undersampling, and neighbor-based methods. Furthermore, it highlights current developments in resampling techniques and discusses practical implementations and case studies that validate their effectiveness. The paper concludes by offering perspectives on potential directions for future exploration in this domain.

artificial intelligence, data quality, machine learning, (19 more...)

arXiv.org Machine Learning

2505.13518

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
South America > Uruguay > Maldonado > Maldonado (0.04)
Africa > Mali (0.04)
(6 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

Son, Guijin, Hong, Jiwoo, Fan, Honglu, Nam, Heejeong, Ko, Hyunwoo, Lim, Seungwon, Song, Jinyeop, Choi, Jinha, Paulo, Gonçalo, Yu, Youngjae, Biderman, Stella

Recent advances in large language models (LLMs) have fueled the vision of automated scientific discovery, often called AI Co-Scientists. To date, prior work casts these systems as generative co-authors responsible for crafting hypotheses, synthesizing code, or drafting manuscripts. In this work, we explore a complementary application: using LLMs as verifiers to automate the \textbf{academic verification of scientific manuscripts}. To that end, we introduce SPOT, a dataset of 83 published papers paired with 91 errors significant enough to prompt errata or retraction, cross-validated with actual authors and human annotators. Evaluating state-of-the-art LLMs on SPOT, we find that none surpasses 21.1\% recall or 6.1\% precision (o3 achieves the best scores, with all others near zero). Furthermore, confidence estimates are uniformly low, and across eight independent runs, models rarely rediscover the same errors, undermining their reliability. Finally, qualitative analysis with domain experts reveals that even the strongest models make mistakes resembling student-level misconceptions derived from misunderstandings. These findings highlight the substantial gap between current LLM capabilities and the requirements for dependable AI-assisted academic verification.

large language model, machine learning, natural language, (13 more...)

2505.11855

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jayalath, Dulhan, Landau, Gilad, Jones, Oiwi Parker

Unlocking Non-Invasive Brain-to-Text

Despite major advances in surgical brain-to-text (B2T), i.e. transcribing speech from invasive brain recordings, non-invasive alternatives have yet to surpass even chance on standard metrics. This remains a barrier to building a non-invasive brain-computer interface (BCI) capable of restoring communication in paralysed individuals without surgery. Here, we present the first non-invasive B2T result that significantly exceeds these critical baselines, raising BLEU by $1.4\mathrm{-}2.6\times$ over prior work. This result is driven by three contributions: (1) we extend recent word-classification models with LLM-based rescoring, transforming single-word predictors into closed-vocabulary B2T systems; (2) we introduce a predictive in-filling approach to handle out-of-vocabulary (OOV) words, substantially expanding the effective vocabulary; and (3) we demonstrate, for the first time, how to scale non-invasive B2T models across datasets, unlocking deep learning at scale and improving accuracy by $2.1\mathrm{-}2.3\times$. Through these contributions, we offer new insights into the roles of data quality and vocabulary size. Together, our results remove a major obstacle to realising practical non-invasive B2T systems.

large language model, machine learning, natural language, (17 more...)

2505.13446

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Althunayyan, Muzun, Javed, Amir, Rana, Omer

A Survey of Learning-Based Intrusion Detection Systems for In-Vehicle Network

Connected and Autonomous Vehicles (CAVs) enhance mobility but face cybersecurity threats, particularly through the insecure Controller Area Network (CAN) bus. Cyberattacks can have devastating consequences in connected vehicles, including the loss of control over critical systems, necessitating robust security solutions. In-vehicle Intrusion Detection Systems (IDSs) offer a promising approach by detecting malicious activities in real time. This survey provides a comprehensive review of state-of-the-art research on learning-based in-vehicle IDSs, focusing on Machine Learning (ML), Deep Learning (DL), and Federated Learning (FL) approaches. Based on the reviewed studies, we critically examine existing IDS approaches, categorising them by the types of attacks they detect - known, unknown, and combined known-unknown attacks - while identifying their limitations. We also review the evaluation metrics used in research, emphasising the need to consider multiple criteria to meet the requirements of safety-critical systems. Additionally, we analyse FL-based IDSs and highlight their limitations. By doing so, this survey helps identify effective security measures, address existing limitations, and guide future research toward more resilient and adaptive protection mechanisms, ensuring the safety and reliability of CAVs.

artificial intelligence, detection, machine learning, (15 more...)

2505.11551

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom (0.04)
South America > Brazil (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Vatsal, Shubham, Dubey, Harsh, Singh, Aditi

Multilingual Prompt Engineering in Large Language Models: A Survey Across NLP Tasks

Large language models (LLMs) have demonstrated impressive performance across a wide range of Natural Language Processing (NLP) tasks. However, ensuring their effectiveness across multiple languages presents unique challenges. Multilingual prompt engineering has emerged as a key approach to enhance LLMs' capabilities in diverse linguistic settings without requiring extensive parameter re-training or fine-tuning. With growing interest in multilingual prompt engineering over the past two to three years, researchers have explored various strategies to improve LLMs' performance across languages and NLP tasks. By crafting structured natural language prompts, researchers have successfully extracted knowledge from LLMs across different languages, making these techniques an accessible pathway for a broader audience, including those without deep expertise in machine learning, to harness the capabilities of LLMs. In this paper, we survey and categorize different multilingual prompting techniques based on the NLP tasks they address across a diverse set of datasets that collectively span around 250 languages. We further highlight the LLMs employed, present a taxonomy of approaches and discuss potential state-of-the-art (SoTA) methods for specific multilingual datasets. Additionally, we derive a range of insights across language families and resource levels (high-resource vs. low-resource), including analyses such as the distribution of NLP tasks by language resource type and the frequency of prompting methods across different language families. Our survey reviews 36 research papers covering 39 prompting techniques applied to 30 multilingual NLP tasks, with the majority of these studies published in the last two years.

arxiv preprint arxiv, large language model, machine learning, (14 more...)

2505.11665

Country:

North America > United States (0.14)
Africa > Niger (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)