AITopics

Plotting

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

Song, Youngjun, Hwang, Youngsik, Lee, Jonghun, Lee, Heechang, Lim, Dong-Young

arXiv.org Machine LearningMar-30-2025

Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape. However, we show that minimizing the total loss sharpness does not guarantee sharpness across individual domains. In particular, SAM can converge to fake flat minima, where the total loss may exhibit flat minima, but sharp minima are present in individual domains. Moreover, the current perturbation update in gradient ascent steps is ineffective in directly updating the sharpness of individual domains. Motivated by these findings, we introduce a novel DG algorithm, Decreased-overhead Gradual Sharpness-Aware Minimization (DGSAM), that applies gradual domain-wise perturbation to reduce sharpness consistently across domains while maintaining computational efficiency. Our experiments demonstrate that DGSAM outperforms state-of-the-art DG methods, achieving improved robustness to domain shifts and better performance across various benchmarks, while reducing computational overhead compared to SAM.

artificial intelligence, machine learning, sharpness, (14 more...)

arXiv.org Machine Learning

2503.2343

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation

Jeon, Hyunsik, Koide, Satoshi, Wang, Yu, He, Zhankui, McAuley, Julian

arXiv.org Artificial IntelligenceMar-30-2025

Conversational recommender systems engage users in dialogues to refine their needs and provide more personalized suggestions. Although textual information suffices for many domains, visually driven categories such as fashion or home decor potentially require detailed visual information related to color, style, or design. To address this challenge, we propose LaViC (Large Vision-Language Conversational Recommendation Framework), a novel approach that integrates compact image representations into dialogue-based recommendation systems. LaViC leverages a large vision-language model in a two-stage process: (1) visual knowledge self-distillation, which condenses product images from hundreds of tokens into a small set of visual tokens in a self-distillation manner, significantly reducing computational overhead, and (2) recommendation prompt tuning, which enables the model to incorporate both dialogue context and distilled visual tokens, providing a unified mechanism for capturing textual and visual features. To support rigorous evaluation of visually-aware conversational recommendation, we construct a new dataset by aligning Reddit conversations with Amazon product listings across multiple visually oriented categories (e.g., fashion, beauty, and home). This dataset covers realistic user queries and product appearances in domains where visual details are crucial. Extensive experiments demonstrate that LaViC significantly outperforms text-only conversational recommendation methods and open-source vision-language baselines. Moreover, LaViC achieves competitive or superior accuracy compared to prominent proprietary baselines (e.g., GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), demonstrating the necessity of explicitly using visual data for capturing product attributes and showing the effectiveness of our vision-language integration. Our code and dataset are available at https://github.com/jeon185/LaViC.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.23312

Country: North America > United States (0.96)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science

Seo, Wonduk, Lee, Juhyeon, Bu, Yi

arXiv.org Artificial IntelligenceMar-30-2025

Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent refines these strategies by suggesting several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science task.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.23314

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Linguistic Loops and Geometric Invariants as a Way to Pre-Verbal Thought?

Corradetti, Daniele, Marrani, Alessio

arXiv.org Artificial IntelligenceMar-30-2025

In this work we introduce the concepts of linguistic transformation, linguistic loop and semantic deficit. By exploiting Lie group theoretical and geometric techniques, we define invariants that capture the structural properties of a whole linguistic loop. This result introduces new line of research, employing tools from Lie theory and higher-dimensional geometry within language studies. But, even more intriguingly, our study hints to a mathematical characterization of the meta-linguistic or pre-verbal thought, namely of those cognitive structures that precede the language.

artificial intelligence, natural language, transformation, (14 more...)

arXiv.org Artificial Intelligence

2503.23311

Country: Europe > Portugal (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.69)

Add feedback

Exploring Explainable Multi-player MCTS-minimax Hybrids in Board Game Using Process Mining

Qian, Yiyu, Miller, Tim, Qian, Zheng, Zhao, Liyuan

arXiv.org Artificial IntelligenceMar-30-2025

Monte-Carlo Tree Search (MCTS) is a family of sampling-based search algorithms widely used for online planning in sequential decision-making domains and at the heart of many recent advances in artificial intelligence. Understanding the behavior of MCTS agents is difficult for developers and users due to the frequently large and complex search trees that result from the simulation of many possible futures, their evaluations, and their relationships. This paper presents our ongoing investigation into potential explanations for the decision-making and behavior of MCTS. A weakness of MCTS is that it constructs a highly selective tree and, as a result, can miss crucial moves and fall into tactical traps. Full-width minimax search constitutes the solution. We integrate shallow minimax search into the rollout phase of multi-player MCTS and use process mining technique to explain agents' strategies in 3v3 checkers.

algorithm, artificial intelligence, fitness, (15 more...)

arXiv.org Artificial Intelligence

2503.23326

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Add feedback

A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation Detection

Li, Hui, Wang, Ante, li, kunquan, Wang, Zhihao, Zhang, Liang, Qiu, Delai, Liu, Qingsong, Su, Jinsong

arXiv.org Artificial IntelligenceMar-30-2025

Misinformation spans various domains, but detection methods trained on specific domains often perform poorly when applied to others. With the rapid development of Large Language Models (LLMs), researchers have begun to utilize LLMs for cross-domain misinformation detection. However, existing LLM-based methods often fail to adequately analyze news in the target domain, limiting their detection capabilities. More importantly, these methods typically rely on manually designed decision rules, which are limited by domain knowledge and expert experience, thus limiting the generalizability of decision rules to different domains. To address these issues, we propose a MultiAgent Framework for cross-domain misinformation detection with Automated Decision Rule Optimization (MARO). Under this framework, we first employs multiple expert agents to analyze target-domain news. Subsequently, we introduce a question-reflection mechanism that guides expert agents to facilitate higherquality analysis. Furthermore, we propose a decision rule optimization approach based on carefully-designed cross-domain validation tasks to iteratively enhance the effectiveness of decision rules in different domains. Experimental results and in-depth analysis on commonlyused datasets demonstrate that MARO achieves significant improvements over existing methods.

acc, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.23329

Genre: Research Report > New Finding (0.46)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

AI Agents in Engineering Design: A Multi-Agent Framework for Aesthetic and Aerodynamic Car Design

Elrefaie, Mohamed, Qian, Janet, Wu, Raina, Chen, Qian, Dai, Angela, Ahmed, Faez

arXiv.org Artificial IntelligenceMar-30-2025

We introduce the concept of "Design Agents" for engineering applications, particularly focusing on the automotive design process, while emphasizing that our approach can be readily extended to other engineering and design domains. Our framework integrates AI-driven design agents into the traditional engineering workflow, demonstrating how these specialized computational agents interact seamlessly with engineers and designers to augment creativity, enhance efficiency, and significantly accelerate the overall design cycle. By automating and streamlining tasks traditionally performed manually, such as conceptual sketching, styling enhancements, 3D shape retrieval and generative modeling, computational fluid dynamics (CFD) meshing, and aerodynamic simulations, our approach reduces certain aspects of the conventional workflow from weeks and days down to minutes. These agents leverage state-of-the-art vision-language models (VLMs), large language models (LLMs), and geometric deep learning techniques, providing rapid iteration and comprehensive design exploration capabilities. We ground our methodology in industry-standard benchmarks, encompassing a wide variety of conventional automotive designs, and utilize high-fidelity aerodynamic simulations to ensure practical and applicable outcomes. Furthermore, we present design agents that can swiftly and accurately predict simulation outcomes, empowering engineers and designers to engage in more informed design optimization and exploration. This research underscores the transformative potential of integrating advanced generative AI techniques into complex engineering tasks, paving the way for broader adoption and innovation across multiple engineering disciplines.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.23315

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry:

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

Add feedback

Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics

Zhu, Jing, Ju, Mingxuan, Liu, Yozen, Koutra, Danai, Shah, Neil, Zhao, Tong

arXiv.org Artificial IntelligenceMar-30-2025

Generative recommendation (GR) has become a powerful paradigm in recommendation systems that implicitly links modality and semantics to item representation, in contrast to previous methods that relied on non-semantic item identifiers in autoregressive models. However, previous research has predominantly treated modalities in isolation, typically assuming item content is unimodal (usually text). We argue that this is a significant limitation given the rich, multimodal nature of real-world data and the potential sensitivity of GR models to modality choices and usage. Our work aims to explore the critical problem of Multimodal Generative Recommendation (MGR), highlighting the importance of modality choices in GR nframeworks. We reveal that GR models are particularly sensitive to different modalities and examine the challenges in achieving effective GR when multiple modalities are available. By evaluating design strategies for effectively leveraging multiple modalities, we identify key challenges and introduce MGR-LF++, an enhanced late fusion framework that employs contrastive modality alignment and special tokens to denote different modalities, achieving a performance improvement of over 20% compared to single-modality alternatives.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.23333

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

Zheng, Hongwei, Li, Han, Dai, Wenrui, Zheng, Ziyang, Li, Chenglin, Zou, Junni, Xiong, Hongkai

arXiv.org Artificial IntelligenceMar-30-2025

Existing 2D-to-3D human pose estimation (HPE) methods struggle with the occlusion issue by enriching information like temporal and visual cues in the lifting stage. In this paper, we argue that these methods ignore the limitation of the sparse skeleton 2D input representation, which fundamentally restricts the 2D-to-3D lifting and worsens the occlusion issue. T o address these, we propose a novel two-stage generative densification method, named Hierarchical Pose AutoRegressive Transformer (HiP ART), to generate hierarchical 2D dense poses from the original sparse 2D pose. Specifically, we first develop a multi-scale skeleton tokenization module to quantize the highly dense 2D pose into hierarchical tokens and propose a Skeleton-aware Alignment to strengthen token connections. W e then develop a Hierarchical AutoRegressive Modeling scheme for hierarchical 2D pose generation. With generated hierarchical poses as inputs for 2D-to-3D lifting, the proposed method shows strong robustness in occluded scenarios and achieves state-of-the-art performance on the single-frame-based 3D HPE. Moreover, it outperforms numerous multi-frame methods while reducing parameter and computational complexity and can also complement them to further enhance performance and robustness.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2503.23331

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Add feedback

JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community

Xiao, Yunze, He, Tingyu, Wang, Lionel Z., Ma, Yiming, Song, Xingyu, Xu, Xiaohang, Li, Irene, Ng, Ka Chung

arXiv.org Artificial IntelligenceMar-30-2025

This paper introduces JiraiBench, the first bilingual benchmark for evaluating large language models' effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational "Jirai" (landmine) online subculture that encompasses multiple forms of self-destructive behaviors including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework incorporating both linguistic and cultural dimensions. Our dataset comprises 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotation along three behavioral categories, achieving substantial inter-annotator agreement. Experimental evaluations across four state-of-the-art models reveal significant performance variations based on instructional language, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content. This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks. Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target language training. These findings highlight the need for culturally-informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.21679

Country:

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback