AITopics

This replication study adapts ALMM, the Adaptive Linear Mapping Model originally built for next-song recommendation, to the news recommendation problem on the MIND dataset. The original ALMM computes latent representations for users, last-time items, and current items in a tensor factorization structure and learns a linear mapping from content features to latent item vectors. Our replication aims to improve recommendation performance in cold-start scenarios by adapting this model to sequential news click behavior, treating consecutively read articles as (last news, next news) tuples. Instead of the original audio features, we apply BERT and TF-IDF (Term Frequency-Inverse Document Frequency) to news titles and abstracts to extract contextualized token representations and align them with triplet-based user reading patterns. We also propose a thorough, reproducible pre-processing pipeline that combines news filtering and feature integrity validation. Our implementation of ALMM with TF-IDF shows improved recommendation accuracy and robustness relative to the Forbes and Oord baseline models in the cold-start scenario. We demonstrate that ALMM in a minimally modified state is not suitable for next news recommendation.
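The abstract describes turning consecutive clicks into (last news, next news) tuples attached to a user. A minimal sketch of that step follows; the function name `build_triplets`, the input layout (a dict from user ID to an ordered click list), and the sample IDs are assumptions for illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple


def build_triplets(histories: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Turn each user's ordered click history into (user, last_news, next_news) triplets."""
    triplets = []
    for user, clicks in histories.items():
        # Pair every click with the one that immediately follows it.
        for last_news, next_news in zip(clicks, clicks[1:]):
            triplets.append((user, last_news, next_news))
    return triplets


# Hypothetical toy histories; real MIND behavior logs need parsing first.
histories = {"U1": ["N1", "N2", "N3"], "U2": ["N4"]}
triplets = build_triplets(histories)
print(triplets)
```

A user with a single click, such as `U2` here, contributes no triplet, which is one reason a filtering step before training may be needed.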

artificial intelligence, machine learning, recommendation, (17 more...)

2508.01036

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)

Genre: Research Report (0.82)

Industry:

Media > Music (0.89)
Leisure & Entertainment (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

MAO-ARAG: Multi-Agent Orchestration for Adaptive Retrieval-Augmented Generation

Chen, Yiqun, Zhang, Erhan, Yan, Lingyong, Wang, Shuaiqiang, Huang, Jizhou, Yin, Dawei, Mao, Jiaxin

In question-answering (QA) systems, Retrieval-Augmented Generation (RAG) has become pivotal in enhancing response accuracy and reducing hallucination issues. The architecture of RAG systems varies significantly, encompassing single-round RAG, iterative RAG, and reasoning RAG, each tailored to address different types of queries. Due to the varying complexity of real-world queries, a fixed RAG pipeline often struggles to balance performance and cost efficiency across different queries. To address this challenge, we propose an adaptive RAG framework called MAO-ARAG, which leverages multi-agent orchestration. Our adaptive RAG is conceived as a multi-turn framework. Specifically, we define multiple executor agents, representing typical RAG modules such as query reformulation agents, document selection agents, and generation agents. A planner agent intelligently selects and integrates the appropriate agents from these executors into a suitable workflow tailored for each query, striving for high-quality answers while maintaining reasonable costs. During each turn, the planner agent is trained using reinforcement learning, guided by an outcome-based reward (F1 score) and a cost-based penalty, continuously improving answer quality while keeping costs within a reasonable range.
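The training signal described above pairs an F1-based outcome reward with a cost penalty. The sketch below shows one way such a signal could be computed. The abstract does not say how F1 is calculated or how the penalty is combined, so the whitespace token-level F1, the linear penalty, and the `cost_weight` parameter are all assumptions.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two answers (whitespace tokens, lowercased)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def reward(prediction: str, reference: str, cost: float, cost_weight: float = 0.1) -> float:
    """Outcome reward minus a linear cost penalty (cost_weight is a hypothetical knob)."""
    return token_f1(prediction, reference) - cost_weight * cost


print(reward("paris", "paris", cost=2.0))
```

Under this form, a workflow that reaches the same answer with fewer agent calls receives a higher reward, which matches the stated goal of balancing quality against cost.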

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2508.01005

Country:

Asia (0.46)
North America > United States > Indiana (0.14)

Genre:

Research Report (0.82)
Workflow (0.51)

Industry: Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Generative AI Adoption in Postsecondary Education, AI Hype, and ChatGPT's Launch

Pedersen, Isabel

The rapid integration of generative artificial intelligence (AI) into postsecondary education and many other sectors resulted in a global reckoning with this new technology. This paper contributes to the study of the multifaceted influence of generative AI, with a particular focus on OpenAI's ChatGPT within academic settings during the first six months after the release in three specific ways. First, it scrutinizes the rise of ChatGPT as a transformative event construed through a study of mainstream discourses exhibiting AI hype. Second, it discusses the perceived implications of generative AI for writing, teaching, and learning through the lens of critical discourse analysis and critical AI studies. Third, it encourages the necessity for best practices in the adoption of generative AI technologies in education.

artificial intelligence, machine learning, natural language, (14 more...)

doi: 10.18357/otessaj.2024.4.1.59

2508.01003

Country: North America (0.28)

Genre: Research Report (1.00)

Industry:

Media > News (1.00)
Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Niu, Yadong, Wang, Tianzi, Dinkel, Heinrich, Sun, Xingwei, Zhou, Jiahao, Li, Gang, Liu, Jizhong, Liu, Xunying, Zhang, Junbo, Luan, Jian

While large audio-language models have advanced open-ended audio understanding, they still fall short of nuanced human-level comprehension. This gap persists largely because current benchmarks, limited by data annotations and evaluation metrics, fail to reliably distinguish between generic and highly detailed model outputs. To this end, this work introduces MECAT, a Multi-Expert Constructed Benchmark for Fine-Grained Audio Understanding Tasks. Generated via a pipeline that integrates analysis from specialized expert models with Chain-of-Thought large language model reasoning, MECAT provides multi-perspective, fine-grained captions and open-set question-answering pairs. The benchmark is complemented by a novel metric: DATE (Discriminative-Enhanced Audio Text Evaluation). This metric penalizes generic terms and rewards detailed descriptions by combining single-sample semantic similarity with cross-sample discriminability. A comprehensive evaluation of state-of-the-art audio models is also presented, providing new insights into their current capabilities and limitations. The data and code are available at https://github.com/xiaomi-research/mecat

artificial intelligence, large language model, natural language, (18 more...)

2507.23511

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.46)

Can Memory-Augmented LLM Agents Aid Journalism in Interpreting and Framing News for Diverse Audiences?

Ouyang, Leyi

Modern news is often comprehensive, weaving together information from diverse domains, including technology, finance, and agriculture. This very comprehensiveness creates a challenge for interpretation, as audiences typically possess specialized knowledge related to their expertise, age, or standpoint. Consequently, a reader might fully understand the financial implications of a story but fail to grasp or even actively misunderstand its legal or technological dimensions, resulting in critical comprehension gaps. In this work, we investigate how to identify these comprehension gaps and provide solutions to improve audiences' understanding of news content, particularly in the aspects of articles outside their primary domains of knowledge. We propose MADES, an agent-based framework designed to simulate societal communication. The framework utilizes diverse agents, each configured to represent a specific occupation or age group. Each agent is equipped with a memory system. These agents are then simulated to discuss the news. This process enables us to monitor and analyze their behavior and cognitive processes. Our findings indicate that the framework can identify confusions and misunderstandings within news content through its iterative discussion process. Based on these accurate identifications, the framework then designs supplementary material. We validated these outcomes using both statistical analysis and human evaluation, and the results show that agents exhibit significantly improved news understanding after receiving this supplementary material.

large language model, machine learning, natural language, (14 more...)

2507.21055

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Law (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Scoping Review of Natural Language Processing in Addressing Medically Inaccurate Information: Errors, Misinformation, and Hallucination

Sun, Zhaoyi, Yim, Wen-Wai, Uzuner, Ozlem, Xia, Fei, Yetisgen, Meliha

Objective: This review aims to explore the potential and challenges of using Natural Language Processing (NLP) to detect, correct, and mitigate medically inaccurate information, including errors, misinformation, and hallucination. By unifying these concepts, the review emphasizes their shared methodological foundations and their distinct implications for healthcare. Our goal is to advance patient safety, improve public health communication, and support the development of more reliable and transparent NLP applications in healthcare. Methods: A scoping review was conducted following PRISMA guidelines, analyzing studies from 2020 to 2024 across five databases. Studies were selected based on their use of NLP to address medically inaccurate information and were categorized by topic, tasks, document types, datasets, models, and evaluation metrics. Results: NLP has shown potential in addressing medically inaccurate information on the following tasks: (1) error detection (2) error correction (3) misinformation detection (4) misinformation correction (5) hallucination detection (6) hallucination mitigation. However, challenges remain with data privacy, context dependency, and evaluation standards. Conclusion: This review highlights the advancements in applying NLP to tackle medically inaccurate information while underscoring the need to address persistent challenges. Future efforts should focus on developing real-world datasets, refining contextual methods, and improving hallucination management to ensure reliable and transparent healthcare applications.

information retrieval, large language model, machine learning, (25 more...)

doi: 10.1016/j.jbi.2025.104866

2505.00008

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Vaccines (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Building and Aligning Comparable Corpora

Saad, Motaz, Langlois, David, Smaili, Kamel

A comparable corpus is a set of topic-aligned documents in multiple languages that are not necessarily translations of each other. These documents are useful for multilingual natural language processing when no parallel text is available in some domains or languages. In addition, comparable documents are informative because they can tell what is being said about a topic in different languages. In this paper, we present a method to build comparable corpora from the Wikipedia encyclopedia and the EURONEWS website in English, French, and Arabic. We further experiment with a method to automatically align comparable documents using cross-lingual similarity measures. We investigate two such measures: the first is based on a bilingual dictionary, and the second is based on Latent Semantic Indexing (LSI). Experiments on several corpora show that the Cross-Lingual LSI (CL-LSI) measure outperforms the dictionary-based measure. Finally, we collect English and Arabic news documents from the British Broadcasting Corporation (BBC) and from the ALJAZEERA (JSC) news website, respectively. Then we use the CL-LSI similarity measure to automatically align comparable documents of BBC and JSC. The evaluation of the alignment shows that CL-LSI is able to align cross-lingual documents not only at the topic level but also at the event level.
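To make the dictionary-based measure concrete, the sketch below maps source-language tokens through a bilingual dictionary and aligns each source document to the target document with the highest cosine similarity. It is an illustration under assumptions only: the toy dictionary, the whitespace tokenization, and the names `translate` and `align` are hypothetical, and the paper's actual measure may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def translate(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Map source tokens to the target language, dropping out-of-dictionary words."""
    return [dictionary[t] for t in tokens if t in dictionary]


def align(source_docs: Dict[str, str], target_docs: Dict[str, str],
          dictionary: Dict[str, str]) -> Dict[str, str]:
    """Pair each source document with its most similar target document."""
    target_vecs = {tid: Counter(text.lower().split()) for tid, text in target_docs.items()}
    pairs = {}
    for sid, text in source_docs.items():
        svec = Counter(translate(text.lower().split(), dictionary))
        pairs[sid] = max(target_vecs, key=lambda tid: cosine(svec, target_vecs[tid]))
    return pairs


# Hypothetical toy data, not drawn from the corpora in the paper.
fr_en = {"élection": "election", "président": "president",
         "match": "match", "football": "football"}
french = {"fr1": "élection président", "fr2": "match football"}
english = {"en1": "football match tonight", "en2": "president wins election"}
alignment = align(french, english, fr_en)
print(alignment)
```

A measure of this kind depends on dictionary coverage, since untranslated words are simply dropped. The LSI-based measure instead compares documents in a learned latent space.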

data mining, machine learning, natural language, (21 more...)

2508.02555

Country:

Europe (1.00)
Asia > Middle East (0.93)
North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Media > News (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

What's in the News? Towards Identification of Bias by Commission, Omission, and Source Selection (COSS)

Zhukova, Anastasia, Ruas, Terry, Hamborg, Felix, Donnay, Karsten, Gipp, Bela

In a world overwhelmed with news, determining which information comes from reliable sources or how neutral the information reported in news articles is poses a challenge to news readers. In this paper, we propose a methodology for automatically identifying bias by commission, omission, and source selection (COSS) as a joint three-fold objective, as opposed to previous work that addresses these types of bias separately. In a pipeline concept, we describe the goals and tasks of its steps toward bias identification and provide an example of a visualization that leverages the extracted features and patterns of text reuse.

artificial intelligence, information, natural language, (16 more...)

doi: 10.1109/JCDL57899.2023.00050

2508.0254

Country:

Europe > Germany (0.30)
North America > United States > New Mexico (0.15)

Genre: Research Report (0.40)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.31)

AIAP: A No-Code Workflow Builder for Non-Experts with Natural Language and Multi-Agent Collaboration

An, Hyunjn, Kim, Yongwon, Seo, Wonduk, Park, Joonil, Kang, Daye, Oh, Changhoon, Kim, Dokyun, Lee, Seunghyun

While many tools are available for designing AI, non-experts still face challenges in clearly expressing their intent and managing system complexity. We introduce AIAP, a no-code platform that integrates natural language input with visual workflows. AIAP leverages a coordinated multi-agent system to decompose ambiguous user instructions into modular, actionable steps, hidden from users behind a unified interface. A user study involving 32 participants showed that AIAP's AI-generated suggestions, modular workflows, and automatic identification of data, actions, and context significantly improved participants' ability to develop services intuitively. These findings highlight that natural language-based visual programming significantly reduces barriers and enhances user experience in AI service design.

artificial intelligence, machine learning, natural language, (14 more...)

2508.0247

Country: North America > United States (0.46)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting

Luo, Miaosen, Long, Jiesen, Li, Zequn, Yang, Yunying, Jiang, Yuncheng, Mai, Sijie

Multimodal Affective Computing (MAC) aims to recognize and interpret human emotions by integrating information from diverse modalities such as text, video, and audio. Recent advancements in Multimodal Large Language Models (MLLMs) have significantly reshaped the landscape of MAC by offering a unified framework for processing and aligning cross-modal information. However, practical challenges remain, including performance variability across complex MAC tasks and insufficient understanding of how architectural designs and data characteristics impact affective analysis. To address these gaps, we conduct a systematic benchmark evaluation of state-of-the-art open-source MLLMs capable of concurrently processing audio, visual, and textual modalities across multiple established MAC datasets. Our evaluation not only compares the performance of these MLLMs but also provides actionable insights into model optimization by analyzing the influence of model architectures and dataset properties. Furthermore, we propose a novel hybrid strategy that combines generative knowledge prompting with supervised fine-tuning to enhance MLLMs' affective computing capabilities. Experimental results demonstrate that this integrated approach significantly improves performance across various MAC tasks, offering a promising avenue for future research and development in this field. Multimodal Affective Computing (MAC) aims to recognize, perceive, infer, and interpret human emotions through the integration of information from multiple modalities, including text, video, and audio [1]. Human emotional expressions are inherently complex and multimodal in nature [2], a characteristic that makes unimodal approaches particularly vulnerable to ambiguity, noise interference, and information loss [3].

large language model, machine learning, natural language, (15 more...)

2508.02429

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)