AITopics

Effective Training Data Synthesis for Improving MLLM Chart Understanding

Yang, Yuwei, Zhang, Zeyu, Hou, Yunzhong, Li, Zhuowan, Liu, Gaowen, Payani, Ali, Ting, Yuan-Sen, Zheng, Liang

Being able to effectively read scientific plots, or chart understanding, is a central part toward building effective agents for science. However, existing multimodal large language models (MLLMs), especially open-source ones, are still falling behind with a typical success rate of 30%-50% on challenging benchmarks. Previous studies on fine-tuning MLLMs with synthetic charts are often restricted by their inadequate similarity to the real charts, which could compromise model training and performance on complex real-world charts. In this study, we show that modularizing chart generation and diversifying visual details improves chart understanding capabilities. In particular, we design a five-step data synthesis pipeline, where we separate data and function creation for single plot generation, condition the generation of later subplots on earlier ones for multi-subplot figures, visually diversify the generated figures, filter out low quality data, and finally generate the question-answer (QA) pairs with GPT-4o. This approach allows us to streamline the generation of fine-tuning datasets and introduce the effective chart dataset (ECD), which contains 10k+ chart images and 300k+ QA pairs, covering 25 topics and featuring 250+ chart type combinations with high visual complexity. We show that ECD consistently improves the performance of various MLLMs on a range of real-world and synthetic test sets. Code, data and models are available at: https://github.com/yuweiyang-anu/ECD.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2508.06492

Genre: Research Report > New Finding (0.66)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Adaptive Heterogeneous Graph Neural Networks: Bridging Heterophily and Heterogeneity

Chen, Qin, Song, Guojie

Heterogeneous graphs (HGs) are common in real-world scenarios and often exhibit heterophily. However, most existing studies focus on either heterogeneity or heterophily in isolation, overlooking the prevalence of heterophilic HGs in practical applications. Such ignorance leads to their performance degradation. In this work, we first identify two main challenges in modeling heterophily HGs: (1) varying heterophily distributions across hops and meta-paths; (2) the intricate and often heterophily-driven diversity of semantic information across different meta-paths. Then, we propose the Adaptive Heterogeneous Graph Neural Network (AHGNN) to tackle these challenges. AHGNN employs a heterophily-aware convolution that accounts for heterophily distributions specific to both hops and meta-paths. It then integrates messages from diverse semantic spaces using a coarse-to-fine attention mechanism, which filters out noise and emphasizes informative signals. Experiments on seven real-world graphs and twenty baselines demonstrate the superior performance of AHGNN, particularly in high-heterophily situations.

artificial intelligence, graph, machine learning, (10 more...)

2508.06034

Country:

Asia (0.94)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Majchrzak, Martyna, Mańdziuk, Jacek

Training chord recognition models on artificially generated audio

One of the challenging problems in Music Information Retrieval is the acquisition of enough non-copyrighted audio recordings for model training and evaluation. This study compares two Transformer-based neural network models for chord sequence recognition in audio recordings and examines the effectiveness of using an artificially generated dataset for this purpose. The models are trained on various combinations of Artificial Audio Multitracks (AAM), Schubert's Winterreise Dataset, and the McGill Billboard Dataset and evaluated with three metrics: Root, MajMin and Chord Content Metric (CCM). The experiments prove that even though there are certainly differences in complexity and structure between artificially generated and human-composed music, the former can be useful in certain scenarios. Specifically, AAM can enrich a smaller training dataset of music composed by a human or can even be used as a standalone training set for a model that predicts chord sequences in pop music, if no other data is available.

artificial intelligence, machine learning, natural language, (19 more...)

2508.05878

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MM-FusionNet: Context-Aware Dynamic Fusion for Multi-modal Fake News Detection with Large Vision-Language Models

He, Junhao, Liu, Tianyu, Zhao, Jingyuan, Turner, Benjamin

The proliferation of multi-modal fake news on social media poses a significant threat to public trust and social stability. Traditional detection methods, primarily text-based, often fall short due to the deceptive interplay between misleading text and images. While Large Vision-Language Models (LVLMs) offer promising avenues for multi-modal understanding, effectively fusing diverse modal information, especially when their importance is imbalanced or contradictory, remains a critical challenge. This paper introduces MM-FusionNet, an innovative framework leveraging LVLMs for robust multi-modal fake news detection. Our core contribution is the Context-Aware Dynamic Fusion Module (CADFM), which employs bi-directional cross-modal attention and a novel dynamic modal gating network. This mechanism adaptively learns and assigns importance weights to textual and visual features based on their contextual relevance, enabling intelligent prioritization of information. Evaluated on the large-scale Multi-modal Fake News Dataset (LMFND) comprising 80,000 samples, MM-FusionNet achieves a state-of-the-art F1-score of 0.938, surpassing existing multi-modal baselines by approximately 0.5% and significantly outperforming single-modal approaches. Further analysis demonstrates the model's dynamic weighting capabilities, its robustness to modality perturbations, and performance remarkably close to human-level, underscoring its practical efficacy and interpretability for real-world fake news detection.

large language model, machine learning, natural language, (18 more...)

2508.05684

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)

From Static to Dynamic: A Streaming RAG Approach to Real-time Knowledge Base

Zhu, Yuzhou

Dynamic streams from news feeds, social media, sensor networks, and financial markets challenge static RAG frameworks. Full-scale indices incur high memory costs; periodic rebuilds introduce latency that undermines data freshness; naive sampling sacrifices semantic coverage. We present Streaming RAG, a unified pipeline that combines multi-vector cosine screening, mini-batch clustering, and a counter-based heavy-hitter filter to maintain a compact prototype set. We further prove an approximation bound \$E\[R(K\_t)] \ge R^\* - L Δ\$ linking retrieval quality to clustering variance. An incremental index upsert mechanism refreshes prototypes without interrupting queries. Experiments on eight real-time streams show statistically significant gains in Recall\@10 (up to 3 points, p < 0.01), end-to-end latency below 15 ms, and throughput above 900 documents per second under a 150 MB budget. Hyperparameter sensitivity analysis over cluster count, admission probability, relevance threshold, and counter capacity validates default settings. In open-domain question answering with GPT-3.5 Turbo, we record 3.2-point gain in Exact Match and 2.8-point gain in F1 on SQuAD; abstractive summarization yields ROUGE-L improvements. Streaming RAG establishes a new Pareto frontier for retrieval augmentation.

large language model, machine learning, woodruff, (21 more...)

2508.05662

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.67)

Industry:

Banking & Finance (0.70)
Media > News (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
(2 more...)

Beyond Single Labels: Improving Conversational Recommendation through LLM-Powered Data Augmentation

Xu, Haozhe, Wang, Xiaohua, Lv, Changze, Zheng, Xiaoqing

Conversational recommender systems (CRSs) enhance recommendation quality by engaging users in multi-turn dialogues, capturing nuanced preferences through natural language interactions. However, these systems often face the false negative issue, where items that a user might like are incorrectly labeled as negative during training, leading to suboptimal recommendations.Expanding the label set through data augmentation presents an intuitive solution but faces the challenge of balancing two key aspects: ensuring semantic relevance and preserving the collaborative information inherent in CRS datasets. To address these issues, we propose a novel data augmentation framework that first leverages an LLM-based semantic retriever to identify diverse and semantically relevant items, which are then filtered by a relevance scorer to remove noisy candidates. Building on this, we introduce a two-stage training strategy balancing semantic relevance and collaborative information. Extensive experiments on two benchmark datasets and user simulators demonstrate significant and consistent performance improvements across various recommenders, highlighting the effectiveness of our approach in advancing CRS performance.

large language model, machine learning, natural language, (17 more...)

2508.05657

Country: Asia (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

De Santis, Enrico, Rizzi, Antonello

Noosemia: toward a Cognitive and Phenomenological Account of Intentionality Attribution in Human-Generative AI Interaction

This paper introduces and formalizes Noosemìa, a novel cognitive-phenomenological pattern emerging from human interaction with generative AI systems, particularly those enabling dialogic or multimodal exchanges. We propose a multidisciplinary framework to explain how, under certain conditions, users attribute intentionality, agency, and even interiority to these systems - a process grounded not in physical resemblance, but in linguistic performance, epistemic opacity, and emergent technological complexity. By linking an LLM declination of meaning holism to our technical notion of the LLM Contextual Cognitive Field, we clarify how LLMs construct meaning relationally and how coherence and a simulacrum of agency arise at the human-AI interface. The analysis situates noosemia alongside pareidolia, animism, the intentional stance and the uncanny valley, distinguishing its unique characteristics. We also introduce a-noosemia to describe the phenomenological withdrawal of such projections. The paper concludes with reflections on the broader philosophical, epistemological and social implications of noosemic dynamics and directions for future research.

large language model, machine learning, natural language, (19 more...)

2508.02622

Country:

North America > United States (1.00)
Europe (0.67)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (1.00)
Government (1.00)
Media (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

CUB: Benchmarking Context Utilisation Techniques for Language Models

Hagström, Lovisa, Kim, Youna, Yu, Haeun, Lee, Sang-goo, Johansson, Richard, Cho, Hyunsoo, Augenstein, Isabelle

Incorporating external knowledge is crucial for knowledge-intensive tasks, such as question answering and fact checking. However, language models (LMs) may ignore relevant information that contradicts outdated parametric memory or be distracted by irrelevant contexts. While many context utilisation manipulation techniques (CMTs) have recently been proposed to alleviate these issues, few have seen systematic comparison. In this paper, we develop CUB (Context Utilisation Benchmark) - the first comprehensive benchmark designed to help practitioners within retrieval-augmented generation (RAG) diagnose CMTs under different context conditions. With this benchmark, we conduct the most extensive evaluation to date of seven state-of-the-art methods, representative of the main categories of CMTs, across three diverse datasets and tasks, applied to nine LMs. Our results reveal that most existing CMTs struggle to handle the full spectrum of context types encountered in real-world retrieval-augmented scenarios. We also find that many CMTs display inflated performance on simple synthesised datasets, compared to more realistic datasets with naturally occurring samples. Our findings expose critical gaps in current CMT evaluation practices and demonstrate the need for holistic testing and the development of CMTs that can robustly handle multiple context types.

computational linguistic, large language model, machine learning, (20 more...)

2505.16518

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

The GuardianAug-10-2025, 05:00:15 GMT

Digital resurrection: fascination and fear over the rise of the deathbot

Rod Stewart had a few surprise guests at a recent concert in Charlotte, North Carolina. His old friend Ozzy Osbourne, the lead singer of Black Sabbath who died last month, was apparently beamed in from some kind of rock heaven, where he was reunited with other departed stars including Michael Jackson, Tina Turner and Bob Marley. The AI-generated images divided Stewart's fans. Some denounced them as disrespectful and distasteful; others found the tribute beautiful. At about the same time, another AI controversy erupted when Jim Acosta, a former CNN White House correspondent, interviewed a digital recreation of Joaquin Oliver, who was killed at the age of 17 in a 2018 high school shooting in Florida.

avatar, deathbot, resurrection, (15 more...)

The Guardian

Country:

North America > United States > North Carolina > Mecklenburg County > Charlotte (0.25)
Asia > China (0.05)

Industry:

Education > Health & Safety > School Safety & Security > School Violence (0.89)
Media > Music (0.55)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.32)