AITopics

This paper describes the instruction dataset used to fine-tune a set of transformer-based large language models (LLMs) developed in the PLLuM (Polish Large Language Model) project. We present a functional typology of the organic, converted, and synthetic instructions used in PLLuM and share some observations about the implications of using human-authored versus synthetic instruction datasets in the linguistic adaptation of base LLMs. Additionally, we release the first representative subset of the PLLuM instruction corpus (PLLuMIC), which we believe to be useful in guiding and planning the development of similar datasets for other LLMs.

large language model, machine learning, natural language, (20 more...)

2511.17161

Country:

Asia (1.00)
Europe > Poland (0.67)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Education (0.92)
Information Technology (0.68)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Device-Guided Music Transfer

Hung, Manh Pham, Hu, Changshuo, Dang, Ting, Ma, Dong

Device-guided music transfer adapts playback across unseen devices for users who lack them. Existing methods mainly focus on modifying the timbre, rhythm, harmony, or instrumentation to mimic genres or artists, overlooking the diverse hardware properties of the playback device (i.e., speaker). Therefore, we propose DeMT, which processes a speaker's frequency response curve as a line graph using a vision-language model to extract device embeddings. These embeddings then condition a hybrid transformer via feature-wise linear modulation. Fine-tuned on a self-collected dataset, DeMT enables effective speaker-style transfer and robust few-shot adaptation for unseen devices, supporting applications like device-style augmentation and quality enhancement.

artificial intelligence, machine learning, natural language, (17 more...)

2511.17136

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Do Vision-Language Models Understand Visual Persuasiveness?

Park, Gyuwon

Recent advances in vision-language models (VLMs) have enabled impressive multi-modal reasoning and understanding. Yet, whether these models truly grasp visual persuasion-how visual cues shape human attitudes and decisions-remains unclear. To probe this question, we construct a high-consensus dataset for binary persuasiveness judgment and introduce the taxonomy of Visual Persuasive Factors (VPFs), encompassing low-level perceptual, mid-level compositional, and high-level semantic cues. We also explore cognitive steering and knowledge injection strategies for persuasion-relevant reasoning. Empirical analysis across VLMs reveals a recall-oriented bias-models over-predict high persuasiveness-and weak discriminative power for low/mid-level features. In contrast, high-level semantic alignment between message and object presence emerges as the strongest predictor of human judgment. Among intervention strategies, simple instruction or unguided reasoning scaffolds yield marginal or negative effects, whereas concise, object-grounded rationales significantly improve precision and F1 scores. These results indicate that VLMs core limitation lies not in recognizing persuasive objects but in linking them to communicative intent.

large language model, machine learning, natural language, (18 more...)

2511.17036

Country:

North America > United States (0.46)
Europe > Austria (0.28)
Asia (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

ARQUSUMM: Argument-aware Quantitative Summarization of Online Conversations

Tang, An Quang, Zhang, Xiuzhen, Dinh, Minh Ngoc, Li, Zhuang

Online conversations have become more prevalent on public discussion platforms (e.g. Reddit). With growing controversial topics, it is desirable to summarize not only diverse arguments, but also their rationale and justification. Early studies on text summarization focus on capturing general salient information in source documents, overlooking the argumentative nature of online conversations. Recent research on conversation summarization although considers the argumentative relationship among sentences, fail to explicate deeper argument structure within sentences for summarization. In this paper, we propose a novel task of argument-aware quantitative summarization to reveal the claim-reason structure of arguments in conversations, with quantities measuring argument strength. We further propose ARQUSUMM, a novel framework to address the task. To reveal the underlying argument structure within sentences, ARQUSUMM leverages LLM few-shot learning grounded in the argumentation theory to identify propositions within sentences and their claim-reason relationships. For quantitative summarization, ARQUSUMM employs argument structure-aware clustering algorithms to aggregate arguments and quantify their support. Experiments show that ARQUSUMM outperforms existing conversation and quantitative summarization models and generate summaries representing argument structures that are more helpful to users, of high textual quality and quantification accuracy.

computational linguistic, large language model, machine learning, (16 more...)

2511.16985

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Media > News (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Generative Augmented Reality: Paradigms, Technologies, and Future Applications

Liang, Chen, Zheng, Jiawen, Zeng, Yufeng, Tan, Yi, Lyu, Hengye, Zheng, Yuhui, Li, Zisu, Weng, Yueting, Shi, Jiaxin, Zhang, Hanwang

This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conventional AR engine. GAR replaces the conventional AR engine's multi-stage modules with a unified generative backbone, where environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-time generative augmentation feasible, and outline prospective applications that leverage its unified inference model. We envision GAR as a future AR paradigm that delivers high-fidelity experiences in terms of realism, interactivity, and immersion, while eliciting new research challenges on technologies, content ecosystems, and the ethical and societal implications.

large language model, machine learning, natural language, (18 more...)

2511.16783

Country:

North America > United States (0.69)
Asia > China (0.46)

Genre:

Overview (1.00)
Research Report (0.81)

Industry:

Media (0.93)
Education > Educational Setting (0.92)
Information Technology > Services (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation

Tseng, Wei-Cheng, Zhou, Xuanru, Huo, Mingyue, Shao, Yiwen, Zhang, Hao, Yu, Dong

Audio-language pretraining holds promise for general-purpose audio understanding, yet remains underexplored compared to its vision counterpart. While vision-language models like CLIP serve as widely adopted foundations, existing audio-language models primarily excel at retrieval tasks with limited adoption as general-purpose encoders. We identify three key barriers: limited large-scale audio-text corpora, insufficient caption diversity, and lack of systematic exploration and evaluation. To this end, we introduce CaptionStew, a 10.7M caption dataset aggregating diverse open-source audio-text corpora across multiple domains and captioning styles. Using this resource, we conduct the first comprehensive evaluation comparing contrastive and captioning objectives for audio representation learning across speech, music, and environmental sound tasks. Our results demonstrate that audio-language pretraining yields competitive, transferable representations. Through systematic data-scaling experiments, we reveal complementary objective strengths: contrastive learning achieves superior data efficiency at smaller scales, while captioning demonstrates better scalability on language-involved audio understanding tasks. We also find that common supervised initialization practices provide diminishing returns at scale, challenging current approaches. These findings establish audio-language pretraining as a viable pathway toward general-purpose audio representations, guiding future research. To accelerate progress, we release data preparation recipes, training protocols, and pretrained models, paving the way toward universal audio understanding. Early advances relied on supervised learning, where models trained on labeled corpora were adapted to related downstream tasks or transferred across domains (Kong et al., 2020; Chen et al., 2022a; Snyder et al., 2018; Desplanques et al., 2020).

large language model, machine learning, natural language, (17 more...)

2511.16757

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

AI use in American newspapers is widespread, uneven, and rarely disclosed

Russell, Jenna, Karpinska, Marzena, Akinode, Destiny, Thai, Katherine, Emi, Bradley, Spero, Max, Iyyer, Mohit

AI is rapidly transforming journalism, but the extent of its use in published newspaper articles remains unclear. We address this gap by auditing a large-scale dataset of 186K articles from online editions of 1.5K American newspapers published in the summer of 2025. Using Pangram, a state-of-the-art AI detector, we discover that approximately 9% of newly-published articles are either partially or fully AI-generated. This AI use is unevenly distributed, appearing more frequently in smaller, local outlets, in specific topics such as weather and technology, and within certain ownership groups. We also analyze 45K opinion pieces from Washington Post, New York Times, and Wall Street Journal, finding that they are 6.4 times more likely to contain AI-generated content than news articles from the same publications, with many AI-flagged op-eds authored by prominent public figures. Despite this prevalence, we find that AI use is rarely disclosed: a manual audit of 100 AI-flagged articles found only five disclosures of AI use. Overall, our audit highlights the immediate need for greater transparency and updated editorial standards regarding the use of AI in journalism to maintain public trust.

ai use, large language model, machine learning, (20 more...)

2510.18774

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Communications > Social Media (0.67)

Srivastava, Vedika, Singh, Hemant Kumar, Singh, Jaisal

ISS-Geo142: A Benchmark for Geolocating Astronaut Photography from the International Space Station

This paper introduces ISS-Geo142, a curated benchmark for geolocating astronaut photography captured from the International Space Station (ISS). Although the ISS position at capture time is known precisely, the specific Earth locations depicted in these images are typically not directly georeferenced, making automated localization non-trivial. ISS-Geo142 consists of 142 images with associated metadata and manually determined geographic locations, spanning a range of spatial scales and scene types. On top of this benchmark, we implement and evaluate three geolocation pipelines: a neural network based approach (NN-Geo) using VGG16 features and cross-correlation over map-derived Areas of Interest (AOIs), a Scale-Invariant Feature Transform based pipeline (SIFT-Match) using sliding-window feature matching on stitched high-resolution AOIs, and TerraByte, an AI system built around a GPT-4 model with vision capabilities that jointly reasons over image content and ISS coordinates. On ISS-Geo142, NN-Geo achieves a match for 75.52\% of the images under our evaluation protocol, SIFT-Match attains high precision on structurally rich scenes at substantial computational cost, and TerraByte establishes the strongest overall baseline, correctly geolocating approximately 90\% of the images while also producing human-readable geographic descriptions. The methods and experiments were originally developed in 2023; this manuscript is a revised and extended version that situates the work relative to subsequent advances in cross-view geo-localization and remote-sensing vision--language models. Taken together, ISS-Geo142 and these three pipelines provide a concrete, historically grounded benchmark for future work on ISS image geolocation.

artificial intelligence, machine learning, natural language, (18 more...)

2504.21194

Country: Africa > Middle East > Egypt (0.28)

Genre: Research Report (0.64)

Industry:

Media > Photography (0.62)
Government > Space Agency (0.61)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.35)

Technology:

Information Technology > Geographic Information Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

FOX NewsNov-23-2025, 21:02:15 GMT

'Perfect storm': Doctors warn of alarming rise in adult-onset food allergies

This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by Refinitiv Lipper .

artificial intelligence, food allergy, social media, (8 more...)

FOX News

Country: North America > United States (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Consumer Health (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.76)

FOX NewsNov-23-2025, 20:38:18 GMT

Researchers say human hair could soon be key to repairing teeth damaged by cavities

Scientists at King's College London developed a toothpaste ingredient using keratin from human hair that can repair and strengthen damaged tooth enamel.

artificial intelligence, lifestyle real estate tech science, social media, (8 more...)

FOX News

Country: North America > United States (0.71)

Genre: Research Report > New Finding (0.85)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine > Consumer Health (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.75)