AITopics

SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models

Tang, Zhenwei, Jiao, Difan, Yang, Blair, Anderson, Ashton

Evaluating whether vision-language models (VLMs) reason consistently across representations is challenging because modality comparisons are typically confounded by task differences and asymmetric information. We introduce SEAM, a benchmark that pairs semantically equivalent inputs across four domains that have existing standardized textual and visual notations. By employing distinct notation systems across modalities, in contrast to OCR-based image-text pairing, SEAM provides a rigorous comparative assessment of the textual-symbolic and visual-spatial reasoning capabilities of VLMs. Across 21 contemporary models, we observe systematic modality imbalance: vision frequently lags language in overall performance, despite the problems containing semantically equivalent information, and cross-modal agreement is relatively low. Our error analysis reveals two main drivers: textual perception failures from tokenization in domain notation and visual perception failures that induce hallucinations. We also show that our results are largely robust to visual transformations. SEAM establishes a controlled, semantically equivalent setting for measuring and improving modality-agnostic reasoning.

large language model, machine learning, natural language, (19 more...)

2508.18179

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (0.93)
Leisure & Entertainment > Games > Chess (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Versatile Framework for Song Generation with Prompt-based Control

Zhang, Yu, Guo, Wenxiang, Pan, Changhao, Zhu, Zhiyuan, Li, Ruiqi, Lu, Jingyu, Huang, Rongjie, Zhang, Ruiyuan, Hong, Zhiqing, Jiang, Ziyue, Zhao, Zhou

Song generation focuses on producing controllable high-quality songs based on various prompts. However, existing methods struggle to generate vocals and accompaniments with prompt-based control and proper alignment. Additionally, they fall short in supporting various tasks. To address these challenges, we introduce VersBand, a multi-task song generation framework for synthesizing high-quality, aligned songs with prompt-based control. VersBand comprises these primary models: 1) VocalBand, a decoupled model, leverages the flow-matching method for generating singing styles, pitches, and mel-spectrograms, allowing fast, high-quality vocal generation with style control. 2) AccompBand, a flow-based transformer model, incorporates the Band-MOE, selecting suitable experts for enhanced quality, alignment, and control. This model allows for generating controllable, high-quality accompaniments aligned with vocals. 3) Two generation models, LyricBand for lyrics and MelodyBand for melodies, contribute to the comprehensive multi-task song generation system, allowing for extensive control based on multiple prompts. Experimental results show that VersBand outperforms baseline models across multiple song generation tasks using objective and subjective metrics. Demos and code are available at https://aaronz345.github.io/VersBandDemo and https://github.com/AaronZ345/VersBand.

large language model, machine learning, natural language, (19 more...)

2504.19062

Genre: Research Report > New Finding (0.48)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Speech (0.67)

The Power of Framing: How News Headlines Guide Search Behavior

Poudel, Amrit, Milkowski, Maria, Weninger, Tim

Search engines play a central role in how people gather information, but subtle cues like headline framing may influence not only what users believe but also how they search. While framing effects on judgment are well documented, their impact on subsequent search behavior is less understood. We conducted a controlled experiment where participants issued queries and selected from headlines filtered by specific linguistic frames. Headline framing significantly shaped follow-up queries: conflict and strategy frames disrupted alignment with prior selections, while episodic frames led to more concrete queries than thematic ones. We also observed modest short-term frame persistence that declined over time. These results suggest that even brief exposure to framing can meaningfully alter the direction of users' information-seeking behavior.

artificial intelligence, information retrieval, natural language, (17 more...)

2508.17131

Country: North America > United States (0.68)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area (0.94)
Media > News (0.69)
Education > Educational Setting (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)

Token Homogenization under Positional Bias

Yusupov, Viacheslav, Maksimov, Danil, Alaeva, Ameliia, Zaitceva, Tatiana, Anna, Antipina, Vasileva, Anna, Liu, Chenlin, Chheng, Rayuth, Sazanakov, Danil, Chetvergov, Andrey, Ermilova, Alina, Shvetsov, Egor

This paper investigates token homogenization, the convergence of token representations toward uniformity across transformer layers, and its relationship to positional bias in large language models. We empirically examine whether homogenization occurs and how positional bias amplifies this effect. Through layer-wise similarity analysis and controlled experiments, we demonstrate that tokens systematically lose distinctiveness during processing, particularly when biased toward extremal positions. Our findings confirm both the existence of homogenization and its dependence on positional attention mechanisms.

large language model, machine learning, natural language, (18 more...)

2508.17126

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.54)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection

Rad, Melissa Kazemi, Purpura, Alberto, Kumar, Himanshu, Chen, Emily, Sorower, Mohammad Shahed

We address the problem of data scarcity in harmful text classification for guardrailing applications and introduce GRAID (Geometric and Reflective AI-Driven Data Augmentation), a novel pipeline that leverages Large Language Models (LLMs) for dataset augmentation. GRAID consists of two stages: (i) generation of geometrically controlled examples using a constrained LLM, and (ii) augmentation through a multi-agentic reflective process that promotes stylistic diversity and uncovers edge cases. This combination enables both reliable coverage of the input space and nuanced exploration of harmful content. Using two benchmark datasets, we demonstrate that augmenting a harmful text classification dataset with GRAID leads to significant improvements in downstream guardrail model performance.

artificial intelligence, large language model, natural language, (16 more...)

2508.17057

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Decoding Alignment: A Critical Survey of LLM Development Initiatives through Value-setting and Data-centric Lens

Chalkidis, Ilias

AI Alignment, primarily in the form of Reinforcement Learning from Human Feedback (RLHF), has been a cornerstone of the post-training phase in developing Large Language Models (LLMs). It has also been a popular research topic across various disciplines beyond Computer Science, including Philosophy and Law, among others, highlighting the socio-technical challenges involved. Nonetheless, except for the computational techniques related to alignment, there has been limited focus on the broader picture: the scope of these processes, which primarily rely on the selected objectives (values), and the data collected and used to imprint such objectives into the models. This work aims to reveal how alignment is understood and applied in practice from a value-setting and data-centric perspective. For this purpose, we investigate and survey ('audit') publicly available documentation released by 6 LLM development initiatives by 5 leading organizations shaping this technology, focusing on proprietary (OpenAI's GPT, Anthropic's Claude, Google's Gemini) and open-weight (Meta's Llama, Google's Gemma, and Alibaba's Qwen) initiatives, all published in the last 3 years. The findings are documented in detail per initiative, while there is also an overall summary concerning different aspects, mainly from a value-setting and data-centric perspective. On the basis of our findings, we discuss a series of broader related concerns.

large language model, machine learning, natural language, (19 more...)

2508.16982

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

The AI Model Risk Catalog: What Developers and Researchers Miss About Real-World AI Harms

Rao, Pooja S. B., Šćepanović, Sanja, Jayagopi, Dinesh Babu, Cherubini, Mauro, Quercia, Daniele

We analyzed nearly 460,000 AI model cards from Hugging Face to examine how developers report risks. From these, we extracted around 3,000 unique risk mentions and built the AI Model Risk Catalog. We compared these with risks identified by researchers in the MIT Risk Repository and with real-world incidents from the AI Incident Database. Developers focused on technical issues like bias and safety, while researchers emphasized broader social impacts. Both groups paid little attention to fraud and manipulation, which are common harms arising from how people interact with AI. Our findings show the need for clearer, structured risk reporting that helps developers think about human-interaction and systemic risks early in the design process. The catalog and paper appendix are available at: https://social-dynamics.net/ai-risks/catalog.

category, large language model, machine learning, (15 more...)

2508.16672

Country:

North America > United States (0.93)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Impact of Artificial Intelligence on Human Thought

Gesnot, Rénald

This research paper examines, from a multidimensional perspective (cognitive, social, ethical, and philosophical), how AI is transforming human thought. It highlights a cognitive offloading effect: the externalization of mental functions to AI can reduce intellectual engagement and weaken critical thinking. On the social level, algorithmic personalization creates filter bubbles that limit the diversity of opinions and can lead to the homogenization of thought and polarization. This research also describes the mechanisms of algorithmic manipulation (exploitation of cognitive biases, automated disinformation, etc.) that amplify AI's power of influence. Finally, the question of potential artificial consciousness is discussed, along with its ethical implications. The report as a whole underscores the risks that AI poses to human intellectual autonomy and creativity, while proposing avenues (education, transparency, governance) to align AI development with the interests of humanity.

large language model, machine learning, natural language, (22 more...)

2508.16628

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Instructional Material (1.00)

Industry:

Media > News (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Social Identity in Human-Agent Interaction: A Primer

Seaborn, Katie

Social identity theory (SIT) and social categorization theory (SCT) are two facets of the social identity approach (SIA) to understanding social phenomena. SIT and SCT are models that describe and explain how people interact with one another socially, connecting the individual to the group through an understanding of underlying psychological mechanisms and intergroup behaviour. SIT, originally developed in the 1970s, and SCT, a later, more general offshoot, have been broadly applied to a range of social phenomena among people. The rise of increasingly social machines embedded in daily life has spurred efforts to understand whether and how artificial agents can and do participate in SIA activities. As agents like social robots and chatbots powered by sophisticated large language models (LLMs) advance, understanding the real and potential roles of these technologies as social entities is crucial. Here, I provide a primer on SIA and extrapolate, through case studies and imagined examples, how SIT and SCT can apply to artificial social agents. I emphasize that not all human models and sub-theories will apply. I further argue that, given the emerging competence of these machines and our tendency to be taken in by them, we experts may need to don the hat of the uncanny killjoy, for our own good.

artificial intelligence, large language model, natural language, (16 more...)

doi: 10.1145/3760500

2508.16609

Country:

North America > United States (1.00)
Asia > Japan (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Overview (1.00)

Industry:

Education > Educational Setting (0.67)
Media (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.95)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)