AITopics

doi: 10.1109/ECAI65401.2025.11095452

2507.11084

Country: Asia > Bangladesh (0.72)

Genre: Research Report > New Finding (0.35)

Industry:

Information Technology (0.68)
Media > News (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Aug-6-2025

AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

Rong, Yan, Wang, Jinting, Lei, Guangzhi, Yang, Shan, Liu, Li

Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent systems have shown great potential in tackling these issues. However, directly applying them to the MM2MA task presents three critical challenges: (1) inadequate fine-grained understanding of multimodal inputs (especially video), (2) the inability of single models to handle diverse audio events, and (3) the absence of self-correction mechanisms for reliable outputs. To this end, we propose AudioGenie, a novel training-free multi-agent system featuring a dual-layer architecture with a generation team and a supervisor team. For the generation team, a fine-grained task decomposition and an adaptive Mixture-of-Experts (MoE) collaborative entity are designed for detailed, comprehensive multimodal understanding and dynamic model selection, and a trial-and-error iterative refinement module is designed for self-correction. The supervisor team ensures temporal-spatial consistency and verifies outputs through feedback loops. Moreover, we build MA-Bench, the first benchmark for MM2MA tasks, comprising 198 annotated videos with multi-type audios. Experiments demonstrate that AudioGenie achieves state-of-the-art (SOTA) or comparable performance across 9 metrics in 8 tasks. A user study further validates the effectiveness of our method in terms of quality, accuracy, alignment, and aesthetics. The project website with audio samples can be found at https://audiogenie.github.io/.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

2505.22053

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report (0.82)

Industry:

Media (0.94)
Leisure & Entertainment (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Slate, Aug-5-2025, 23:02:34 GMT

The Much-Hyped New Wizard of Oz Is an Atrocity

Although it is, at least according to the Library of Congress, the most-watched movie of all time, The Wizard of Oz was a costly failure at the box office, and only became a perennial favorite thanks to the regular TV airings that began in the 1950s. But in the decades since it's become a metonym for the wonder of the big screen, a movie even people who prefer their content streaming will make the effort to see in a movie theater. Beginning on Labor Day weekend, audiences will get to experience the movie on perhaps the largest screen ever created. But it won't be The Wizard of Oz as we've come to know it for the better part of a century. The version of the movie that will fill Las Vegas' Sphere starting Aug. 28 has been retooled to fit the venue's curved shell, its images enhanced and expanded to fill four football fields' worth of 16K LED screens--the foundation of an immersive presentation that also includes flames, gusts of wind, and inflatable flying monkeys piloted by drone. It is, to quote the title of a CBS news report, "The Wizard of Oz as you've never seen it before."

mankiewicz, movie, wizard, (14 more...)

Slate

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.61)
Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.05)
North America > United States > New York (0.05)
North America > United States > Kansas (0.05)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence (0.48)
Information Technology > Communications > Social Media (0.31)

FOX News, Aug-5-2025, 15:37:09 GMT

Jim Acosta blasted on social media after 'interviewing' AI avatar of Parkland shooting victim

Former CNN anchor Jim Acosta was slammed on social media after he posted a clip of his "interview" with an artificially animated avatar of deceased teenager Joaquin Oliver to promote a gun control message on Monday. Working with the gun control group Change the Ref, founded by Oliver's parents, Acosta had a conversation on his Substack with an avatar created by Oliver's father. Oliver was killed in the Parkland high school shooting in 2018 and would have turned 25 on Monday. Social media users were shocked by Acosta's "grotesque" interview and slammed the journalist for using the deceased teen's avatar for political content.

acosta, avatar, interview, (12 more...)

FOX News

Genre: Research Report (0.79)

Industry:

Education > Health & Safety > School Safety & Security > School Violence (0.93)
Media > News (0.81)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.53)

FOX News, Aug-5-2025, 10:00:50 GMT

AI models can secretly infect each other

Artificial intelligence is getting smarter. But it may also be getting more dangerous. A new study reveals that AI models can secretly transmit subliminal traits to one another, even when the shared training data appears harmless. Researchers showed that AI systems can pass along behaviors like bias, ideology, or even dangerous suggestions.

ai model, cyberguy, training data, (12 more...)

FOX News

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
Europe > Poland > Masovia Province > Warsaw (0.05)

Industry: Media > News (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.58)

Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation

Lu, Shuo, Chen, Yanyin, Feng, Wei, Fan, Jiahao, Li, Fengheng, Zhang, Zheng, Lv, Jingjing, Shen, Junjie, Law, Ching, Liang, Jian

Layout generation plays a crucial role in enhancing both user experience and design efficiency. However, current approaches suffer from task-specific generation capabilities and perceptually misaligned evaluation metrics, leading to limited applicability and ineffective measurement. In this paper, we propose Uni-Layout, a novel framework that achieves unified generation, human-mimicking evaluation and alignment between the two. For universal generation, we incorporate various layout tasks into a single taxonomy and develop a unified generator that handles background or element contents constrained tasks via natural language prompts. To introduce human feedback for the effective evaluation of layouts, we build Layout-HF100k, the first large-scale human feedback dataset with 100,000 expertly annotated layouts. Based on Layout-HF100k, we introduce a human-mimicking evaluator that integrates visual and geometric information, employing a Chain-of-Thought mechanism to conduct qualitative assessments alongside a confidence estimation module to yield quantitative measurements. For better alignment between the generator and the evaluator, we integrate them into a cohesive system by adopting Dynamic-Margin Preference Optimization (DMPO), which dynamically adjusts margins based on preference strength to better align with human judgments. Extensive experiments show that Uni-Layout significantly outperforms both task-specific and general-purpose methods. Our code is publicly available at https://github.com/JD-GenX/Uni-Layout.

large language model, machine learning, natural language, (16 more...)

2508.02374

Country: Asia > China (0.16)

Genre: Research Report (1.00)

Industry: Media > Publishing (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in Large Language Models

Kim, Soyeon, Wang, Jindong, Xie, Xing, Whang, Steven Euijong

Facts evolve over time, making it essential for Large Language Models (LLMs) to handle time-sensitive factual knowledge accurately and reliably. While factual Time-Sensitive Question-Answering (TSQA) tasks have been widely studied, existing benchmarks often rely on manual curation or a small, fixed set of predefined templates, which restricts scalable and comprehensive TSQA evaluation. To address these challenges, we propose TDBench, a new benchmark that systematically constructs TSQA pairs by harnessing temporal databases and database techniques such as temporal SQL and functional dependencies. We also introduce a fine-grained evaluation metric called time accuracy, which assesses the validity of time references in model explanations alongside traditional answer accuracy to enable a more reliable TSQA evaluation. Extensive experiments on contemporary LLMs show how TDBench enables scalable and comprehensive TSQA evaluation while reducing the reliance on human labor, complementing existing Wikipedia/Wikidata-based TSQA evaluation approaches by enabling LLM evaluation on application-specific data and seamless multi-hop question generation. Code and data are publicly available at: https://github.com/ssoy0701/tdbench.git.
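
The idea of deriving question-answer pairs from a temporal table can be illustrated with a minimal sketch. It is not taken from the TDBench code: the `leader` table, its rows, the question template, and the `make_qa` helper are all hypothetical, and plain SQL over explicit validity intervals stands in for the temporal SQL the abstract mentions.

```python
import sqlite3

# Hypothetical temporal table: each row is a fact valid in [start_year, end_year).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leader ("
    "country TEXT, role TEXT, name TEXT, start_year INTEGER, end_year INTEGER)"
)
conn.executemany(
    "INSERT INTO leader VALUES (?, ?, ?, ?, ?)",
    [
        ("Freedonia", "president", "Alice", 2010, 2016),
        ("Freedonia", "president", "Bob", 2016, 2024),
    ],
)

def make_qa(country, role, year):
    """Return a (question, answer) pair for the fact valid in `year`, or None."""
    row = conn.execute(
        "SELECT name FROM leader WHERE country = ? AND role = ? "
        "AND start_year <= ? AND ? < end_year",
        (country, role, year, year),
    ).fetchone()
    if row is None:
        return None  # no fact is valid at that time
    question = f"Who was the {role} of {country} in {year}?"
    return question, row[0]
```

Because the answer comes from a query rather than from manual curation, changing the year or the underlying rows yields new pairs without rewriting the template.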

large language model, machine learning, natural language, (13 more...)

2508.02045

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Government > Regional Government (1.00)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Diffusion Models for Future Networks and Communications: A Comprehensive Survey

Luong, Nguyen Cong, Hai, Nguyen Duc, Van Le, Duc, Nguyen, Huy T., Vu, Thai-Hoc, Huynh-The, Thien, Zhang, Ruichen, Anh, Nguyen Duc Duy, Niyato, Dusit, Di Renzo, Marco, Kim, Dong In, Pham, Quoc-Viet

The rise of Generative AI (GenAI) in recent years has catalyzed transformative advances in wireless communications and networks. Among the members of the GenAI family, Diffusion Models (DMs) have risen to prominence as a powerful option, capable of handling complex, high-dimensional data distributions and delivering consistent, noise-robust performance. In this survey, we aim to provide a comprehensive overview of the theoretical foundations and practical applications of DMs across future communication systems. We first provide an extensive tutorial on DMs and demonstrate how they can be applied to enhance optimizers, reinforcement learning and incentive mechanisms, which are popular approaches for problems in wireless networks. Then, we review and discuss the DM-based methods proposed for emerging issues in future networks and communications, including channel modeling and estimation, signal detection and data reconstruction, integrated sensing and communication, resource management in edge computing networks, semantic communications and other notable issues. We conclude the survey by highlighting technical limitations of DMs and their applications, as well as discussing future research directions.

large language model, machine learning, reinforcement learning, (18 more...)

2508.01586

Country:

Europe (1.00)
Asia > Vietnam (0.68)
North America > United States > California > Los Angeles County > Los Angeles (0.27)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Telecommunications (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(6 more...)

ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations

Al-Sabbagh, Rania

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). R. Al-Sabbagh / Data in Brief 54 (2024) 110271
Subject: Computer Science, Social Sciences
Specific subject area: Natural Language Processing, machine translation, large-language models, translation studies, cross-linguistic analysis, lexical semantics
Data format: Translated and aligned
Type of data: Texts (bilingual tables in Microsoft Excel files)
Data collection: The ArzEn-MultiGenre dataset consists of three genres: song lyrics, novels, and subtitles. The data was gathered from various sources using different methods. A website was crawled for song lyrics using an in-house web crawler, and professional translators manually translated the lyrics into English. For novels, hard copies were collected in English and Egyptian Arabic, then scanned and converted into text files using an Optical Character Recognizer (OCR). The OCR output was then manually reviewed and aligned.

artificial intelligence, machine translation, natural language, (16 more...)

doi: 10.1016/j.dib.2024.110271

2508.01411

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

A hierarchy tree data structure for behavior-based user segment representation

Liu, Yang, Kang, Xuejiao, Iyer, Sathya, Malik, Idris, Li, Ruixuan, Wang, Juan, Lu, Xinchen, Zhao, Xiangxue, Wang, Dayong, Liu, Menghan, Liu, Isaac, Liang, Feng, Yu, Yinzhe

User attributes are essential in multiple stages of modern recommendation systems and are particularly important for mitigating the cold-start problem and improving the experience of new or infrequent users. We propose Behavior-based User Segmentation (BUS), a novel tree-based data structure that hierarchically segments the user universe with various users' categorical attributes based on the users' product-specific engagement behaviors. During the BUS tree construction, we use Normalized Discounted Cumulative Gain (NDCG) as the objective function to maximize the behavioral representativeness of marginal users relative to active users in the same segment. The constructed BUS tree undergoes further processing and aggregation across the leaf nodes and internal nodes, allowing the generation of popular social content and behavioral patterns for each node in the tree. To further mitigate bias and improve fairness, we use the social graph to derive the user's connection-based BUS segments, enabling the combination of behavioral patterns extracted from both the user's own segment and connection-based segments as the connection aware BUS-based recommendation. Our offline analysis shows that the BUS-based retrieval significantly outperforms traditional user cohort-based aggregation on ranking quality. We have successfully deployed our data structure and machine learning algorithm and tested it with various production traffic serving billions of users daily, achieving statistically significant improvements in the online product metrics, including music ranking and email notifications. To the best of our knowledge, our study represents the first list-wise learning-to-rank framework for tree-based recommendation that effectively integrates diverse user categorical attributes while preserving real-world semantic interpretability at a large industrial scale.
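
For readers unfamiliar with the metric named above, here is a minimal sketch of Normalized Discounted Cumulative Gain using the common linear-gain, log2-discount form. The abstract does not say which gain variant the authors use or how the score enters tree construction, so the function names and the example relevance values are illustrative assumptions only.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance at rank r (1-based) is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG of the given ordering divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical example: items ranked by a segment's active users,
# scored by how much a marginal user actually engaged with each one.
score = ndcg([3, 0, 2, 1])
```

A score of 1.0 means the segment's ranking already places the marginal user's most engaging items first; lower scores indicate that the segment represents that user's behavior less well.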

artificial intelligence, machine learning, recommendation, (19 more...)

2508.01115

Country: North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (0.46)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)