AITopics | intra

Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models

Wang, Guoyan, Huang, Yanyan, Chen, Chunlin, Wang, Lifeng, Sun, Yuxiang

arXiv.org Artificial Intelligence, Nov-26-2025

Cross-platform strategy game automation remains a challenge due to diverse user interfaces and dynamic battlefield environments. Existing Vision-Language Models (VLMs) struggle with generalization across heterogeneous platforms and lack precision in interface understanding and action execution. We introduce Yanyun-3, a VLM-based agent that integrates Qwen2.5-VL for visual reasoning and UI-TARS for interface execution. We propose a novel data organization principle, combination granularity, to distinguish intra-sample fusion from inter-sample mixing of multimodal data (static images, multi-image sequences, and videos). The model is fine-tuned using QLoRA on a curated dataset across three strategy game platforms. The optimal strategy (M*V+S) achieves a 12.98x improvement in BLEU-4 score and a 63% reduction in inference time compared to full fusion. Yanyun-3 successfully executes core tasks (e.g., target selection, resource allocation) across platforms without platform-specific tuning. Our findings demonstrate that structured multimodal data organization significantly enhances VLM performance in embodied tasks. Yanyun-3 offers a generalizable framework for GUI automation, with broader implications for robotics and autonomous systems.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.12937

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.14)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Military (1.00)
Leisure & Entertainment > Games > Computer Games (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(4 more...)

M$^3$Prune: Hierarchical Communication Graph Pruning for Efficient Multi-Modal Multi-Agent Retrieval-Augmented Generation

Shao, Weizi, Zhang, Taolin, Zhou, Zijie, Chen, Chen, Wang, Chengyu, He, Xiaofeng

arXiv.org Artificial Intelligence, Nov-26-2025

Recent advancements in multi-modal retrieval-augmented generation (mRAG), which enhance multi-modal large language models (MLLMs) with external knowledge, have demonstrated that the collective intelligence of multiple agents can significantly outperform a single model through effective communication. Despite impressive performance, existing multi-agent systems inherently incur substantial token overhead and increased computational costs, posing challenges for large-scale deployment. To address these issues, we propose a novel Multi-Modal Multi-agent hierarchical communication graph PRUNING framework, termed M$^3$Prune. Our framework eliminates redundant edges across different modalities, achieving an optimal balance between task performance and token overhead. Specifically, M$^3$Prune first applies intra-modal graph sparsification to textual and visual modalities, identifying the edges most critical for solving the task. Subsequently, we construct a dynamic communication topology using these key edges for inter-modal graph sparsification. Finally, we progressively prune redundant edges to obtain a more efficient and hierarchical topology. Extensive experiments on both general and domain-specific mRAG benchmarks demonstrate that our method consistently outperforms both single-agent and robust multi-agent mRAG systems while significantly reducing token consumption.
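As a rough illustration of progressive edge pruning on a communication graph, the sketch below keeps only the highest-scoring edges in successive passes. It is not the M$^3$Prune procedure: the abstract does not say how edge importance is computed, so the scores, the agent names, and the helper `keep_top_k` are all hypothetical.

```python
def keep_top_k(edge_scores, k):
    """Return the k highest-scoring edges of a communication graph.

    edge_scores maps (sender, receiver) pairs to an importance score.
    How such scores would be learned is not covered here; they are
    hypothetical inputs for illustration only.
    """
    ranked = sorted(edge_scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Hypothetical importance scores for edges between four agents.
graph = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.2,
    ("b", "c"): 0.6,
    ("c", "d"): 0.1,
}

# Progressive pruning: shrink the edge set in stages instead of all at once.
for budget in (3, 2):
    graph = keep_top_k(graph, budget)

print(graph)
```

Fewer edges means fewer messages exchanged between agents, which is where the token savings described in the abstract would come from.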

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.19969

Country:

Europe > Austria > Vienna (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Anhui Province > Hefei (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication

Rodio, Angelo, Neglia, Giovanni, Chen, Zheng, Larsson, Erik G.

arXiv.org Artificial Intelligence, Nov-18-2025

In semi-decentralized federated learning, devices primarily rely on device-to-device communication but occasionally interact with a central server. Periodically, a sampled subset of devices uploads their local models to the server, which computes an aggregate model. The server can then either (i) share this aggregate model only with the sampled clients (sampled-to-sampled, S2S) or (ii) broadcast it to all clients (sampled-to-all, S2A). Despite their practical significance, a rigorous theoretical and empirical comparison of these two strategies remains absent. We address this gap by analyzing S2S and S2A within a unified convergence framework that accounts for key system parameters: sampling rate, server aggregation frequency, and network connectivity. Our results, both analytical and experimental, reveal distinct regimes where one strategy outperforms the other, depending primarily on the degree of data heterogeneity across devices. These insights lead to concrete design guidelines for practical semi-decentralized FL deployments.
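The difference between the two sharing strategies can be seen in a toy server step. This is a minimal sketch, assuming scalar "models" and a plain unweighted average as the aggregate; the function `server_round` is hypothetical, and the device-to-device communication that happens between server rounds is omitted.

```python
def server_round(models, sampled, mode):
    """Apply one server aggregation step and return the updated local models.

    models  : list of floats, one toy local model per device
    sampled : indices of the devices that upload in this round
    mode    : "S2S" sends the aggregate back to the sampled devices only,
              "S2A" broadcasts it to every device
    The unweighted mean is an assumption made for this sketch.
    """
    aggregate = sum(models[i] for i in sampled) / len(sampled)
    targets = sampled if mode == "S2S" else range(len(models))
    updated = list(models)
    for i in targets:
        updated[i] = aggregate
    return updated


models = [0.0, 1.0, 4.0, 10.0]
sampled = [0, 2]

s2s = server_round(models, sampled, "S2S")
s2a = server_round(models, sampled, "S2A")
print(s2s)
print(s2a)
```

Under S2S the unsampled devices keep their own models, while under S2A every device is overwritten with an aggregate built only from the sampled devices.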

artificial intelligence, inter, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.1156

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Sweden (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Lower Bound of Hash Codes' Performance

Neural Information Processing Systems, Nov-15-2025, 22:47:33 GMT

Nevertheless, a theoretical analysis of criteria for learning good hash codes remains largely unexploited.

hash code, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.68)
(2 more...)

b0dd033cbe58aa5ea27747271bfd84e3-Supplemental.pdf

Neural Information Processing Systems, Nov-15-2025, 11:13:39 GMT

artificial intelligence, intra, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A. More algorithmic details and analysis on the proposed method

Neural Information Processing Systems, Nov-14-2025, 01:51:35 GMT

We summarize the SD module in Algorithm 1, omitting some algorithmic details for ease of understanding. Here, we continue to elaborate our mechanism in Algorithm 2. The main supplement is the step of ASR is already higher than 90%. However, it doesn't work under clean-label attacks (shown in Figure 1(c,f)) since poisoned samples are mixed up with clean samples. Then, we reuse the SD module and find that clean and poisoned samples can be well separated.

artificial intelligence, machine learning, module, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Beyond Fixed Depth: Adaptive Graph Neural Networks for Node Classification Under Varying Homophily

Hevapathige, Asela, Wijesinghe, Asiri, Zehmakan, Ahad N.

arXiv.org Artificial Intelligence, Nov-11-2025

Graph Neural Networks (GNNs) have achieved significant success in addressing node classification tasks. However, the effectiveness of traditional GNNs degrades on heterophilic graphs, where connected nodes often belong to different labels or properties. While recent work has introduced mechanisms to improve GNN performance under heterophily, certain key limitations still exist. Most existing models apply a fixed aggregation depth across all nodes, overlooking the fact that nodes may require different propagation depths based on their local homophily levels and neighborhood structures. Moreover, many methods are tailored to either homophilic or heterophilic settings, lacking the flexibility to generalize across both regimes. To address these challenges, we develop a theoretical framework that links local structural and label characteristics to information propagation dynamics at the node level. Our analysis shows that optimal aggregation depth varies across nodes and is critical for preserving class-discriminative information. Guided by this insight, we propose a novel adaptive-depth GNN architecture that dynamically selects node-specific aggregation depths using theoretically grounded metrics. Our method seamlessly adapts to both homophilic and heterophilic patterns within a unified model. Extensive experiments demonstrate that our approach consistently enhances the performance of standard GNN backbones across diverse benchmarks.
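To make "local homophily level" concrete, the sketch below computes a common node-level homophily ratio: the fraction of a node's neighbors that share its label. The abstract does not specify the metrics the authors use, so this ratio and the helper `local_homophily` are assumptions for illustration rather than the paper's method.

```python
def local_homophily(edges, labels):
    """Return, per node, the fraction of neighbors with the same label.

    edges  : iterable of undirected (u, v) pairs
    labels : dict mapping each node to its class label
    Isolated nodes get None because the ratio is undefined for them.
    """
    neighbors = {node: set() for node in labels}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    scores = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            scores[node] = None
            continue
        same = sum(1 for n in nbrs if labels[n] == labels[node])
        scores[node] = same / len(nbrs)
    return scores


labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
scores = local_homophily(edges, labels)
print(scores)
```

In this toy graph node 3 sits in a fully homophilic neighborhood while node 2 is mostly surrounded by the other class, the kind of per-node variation that motivates choosing aggregation depth node by node.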

artificial intelligence, machine learning, node, (16 more...)

arXiv.org Artificial Intelligence

2511.06608

Country:

North America > United States > Texas (0.05)
North America > United States > Wisconsin (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Equitable Survival Prediction: A Fairness-Aware Survival Modeling (FASM) Approach

Liu, Mingxuan, Ning, Yilin, Wang, Haoyuan, Hong, Chuan, Engelhard, Matthew, Bitterman, Danielle S., La Cava, William G., Liu, Nan

arXiv.org Artificial Intelligence, Oct-24-2025

As machine learning models become increasingly integrated into healthcare, structural inequities and social biases embedded in clinical data can be perpetuated or even amplified by data-driven models. In survival analysis, censoring and time dynamics can further add complexity to fair model development. Additionally, algorithmic fairness approaches often overlook disparities in cross-group rankings, e.g., high-risk Black patients may be ranked below lower-risk White patients who do not experience the event of mortality. Such misranking can reinforce biological essentialism and undermine equitable care. We propose Fairness-Aware Survival Modeling (FASM), an approach designed to mitigate algorithmic bias in both intra-group and cross-group risk rankings over time. Using breast cancer prognosis as a representative case and applying FASM to SEER breast cancer data, we show that FASM substantially improves fairness while preserving discrimination performance comparable to fairness-unaware survival models. Time-stratified evaluations show that FASM maintains stable fairness over a 10-year horizon, with the greatest improvements observed during the mid-term of follow-up. Our approach enables the development of survival models that prioritize both accuracy and equity in clinical decision-making, advancing fairness as a core principle in clinical care.

artificial intelligence, fairness, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2510.20629

Country:

North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition

Tiwari, Upasana, Chakraborty, Rupayan, Kopparapu, Sunil Kumar

arXiv.org Artificial Intelligence, Oct-13-2025

The effectiveness of speech emotion recognition in real-world scenarios is often hindered by noisy environments and variability across datasets. This paper introduces a two-step approach to enhance the robustness and generalization of speech emotion recognition models through improved representation learning. First, our model employs EDRL (Emotion-Disentangled Representation Learning) to extract class-specific discriminative features while preserving shared similarities across emotion categories. Next, MEA (Multiblock Embedding Alignment) refines these representations by projecting them into a joint discriminative latent subspace that maximizes covariance with the original speech input. The learned EDRL-MEA embeddings are subsequently used to train an emotion classifier on clean samples from publicly available datasets and are evaluated on unseen noisy and cross-corpus speech samples. Improved performance under these challenging conditions demonstrates the effectiveness of the proposed method.

artificial intelligence, emotion recognition, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2510.09072

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Indonesia > Bali (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Exploring Cross-Client Memorization of Training Data in Large Language Models for Federated Learning

Udsa, Tinnakit, Udomcharoenchaikit, Can, Payoungkhamdee, Patomporn, Nutanong, Sarana, Rattanavipanon, Norrathep

arXiv.org Artificial Intelligence, Oct-13-2025

Federated learning (FL) enables collaborative training without raw data sharing, but still risks training data memorization. Existing FL memorization detection techniques focus on one sample at a time, underestimating more subtle risks of cross-sample memorization. In contrast, recent work on centralized learning (CL) has introduced fine-grained methods to assess memorization across all samples in training data, but these assume centralized access to data and cannot be applied directly to FL. We bridge this gap by proposing a framework that quantifies both intra- and inter-client memorization in FL using fine-grained cross-sample memorization measurement across all clients. Based on this framework, we conduct two studies: (1) measuring subtle memorization across clients and (2) examining key factors that influence memorization, including decoding strategies, prefix length, and FL algorithms. Our findings reveal that FL models do memorize client data, and memorize intra-client data more than inter-client data, with memorization influenced by training and inference factors.

artificial intelligence, machine learning, memorization, (16 more...)

arXiv.org Artificial Intelligence

2510.0875

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Virginia (0.04)
North America > United States > Arizona (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Consumer Health (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
