AITopics

Cooperative systems often remain in persistently suboptimal yet stable states. This paper explains such "rational stagnation" as an equilibrium sustained by a rational adversary whose utility follows the principle of potential loss, $u_{D} = U_{ideal} - U_{actual}$. Starting from the Prisoner's Dilemma, we show that the transformation $u_{i}' = a\,u_{i} + b\,u_{j}$ and the ratio of mutual recognition $w = b/a$ generate a fragile cooperation band $[w_{\min},\,w_{\max}]$ where both (C,C) and (D,D) are equilibria. Extending to a dynamic model with stochastic cooperative payoffs $R_{t}$ and intervention costs $(C_{c},\,C_{m})$, a Bellman-style analysis yields three strategic regimes: immediate destruction, rational stagnation, and intervention abandonment. The appendix further generalizes the utility to a reference-dependent nonlinear form and proves its stability under reference shifts, ensuring robustness of the framework. Applications to social-media algorithms and political trust illustrate how adversarial rationality can deliberately preserve fragility.

adversary, artificial intelligence, machine learning, (15 more...)

2510.22232

Genre: Research Report (1.00)

Industry:

Government (1.00)
Media (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

Ronai, Or, Kulikov, Vladimir, Michaeli, Tomer

FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing

The remarkable success of diffusion and flow-matching models has ignited a surge of works on adapting them at test time for controlled generation tasks. Examples range from image editing to restoration, compression and personalization. However, due to the iterative nature of the sampling process in those models, it is computationally impractical to use gradient-based optimization to directly control the image generated at the end of the process. As a result, existing methods typically resort to manipulating each timestep separately. Here we introduce FlowOpt - a zero-order (gradient-free) optimization framework that treats the entire flow process as a black box, enabling optimization through the whole sampling path without backpropagation through the model. Our method is both highly efficient and allows users to monitor the intermediate optimization results and perform early stopping if desired. We prove a sufficient condition on FlowOpt's step-size, under which convergence to the global optimum is guaranteed. We further show how to empirically estimate this upper bound so as to choose an appropriate step-size. We demonstrate how FlowOpt can be used for image editing, showcasing two options: (i) inversion (determining the initial noise that generates a given image), and (ii) directly steering the edited image to be similar to the source image while conforming to a target text prompt. In both cases, FlowOpt achieves state-of-the-art results while using roughly the same number of neural function evaluations (NFEs) as existing methods. Code and examples are available on the project's webpage.

artificial intelligence, machine learning, natural language, (18 more...)

2510.2201

Genre: Research Report (0.50)

Industry: Media > Photography (0.55)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications

Yazdani, Shamim, Singh, Akansha, Saxena, Nripsuta, Wang, Zichong, Palikhe, Avash, Pan, Deng, Pal, Umapada, Yang, Jie, Zhang, Wenbin

In recent years, deep learning based generative models, particularly Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models (DMs), have been instrumental in in generating diverse, high-quality content across various domains, such as image and video synthesis. This capability has led to widespread adoption of these models and has captured strong public interest. As they continue to advance at a rapid pace, the growing volume of research, expanding application areas, and unresolved technical challenges make it increasingly difficult to stay current. To address this need, this survey introduces a comprehensive taxonomy that organizes the literature and provides a cohesive framework for understanding the development of GANs, VAEs, and DMs, including their many variants and combined approaches. We highlight key innovations that have improved the quality, diversity, and controllability of generated outputs, reflecting the expanding potential of generative artificial intelligence. In addition to summarizing technical progress, we examine rising ethical concerns, including the risks of misuse and the broader societal impact of synthetic media. Finally, we outline persistent challenges and propose future research directions, offering a structured and forward looking perspective for researchers in this fast evolving field.

arxiv preprint arxiv, machine learning, natural language, (17 more...)

2510.21887

Country:

Europe (1.00)
Asia (0.92)
North America > United States > California (0.28)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.92)

Industry:

Leisure & Entertainment (1.00)
Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Loth, Jackson, Sarmento, Pedro, Sandler, Mark, Barthet, Mathieu

GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer

Music generation in the audio domain using artificial intelligence (AI) has witnessed steady progress in recent years. However for some instruments, particularly the guitar, controllable instrument synthesis remains limited in expressivity. We introduce GuitarFlow, a model designed specifically for electric guitar synthesis. The generative process is guided using tablatures, an ubiquitous and intuitive guitar-specific symbolic format. The tablature format easily represents guitar-specific playing techniques (e.g. bends, muted strings and legatos), which are more difficult to represent in other common music notation formats such as MIDI. Our model relies on an intermediary step of first rendering the tablature to audio using a simple sample-based virtual instrument, then performing style transfer using Flow Matching in order to transform the virtual instrument audio into more realistic sounding examples. This results in a model that is quick to train and to perform inference, requiring less than 6 hours of training data. We present the results of objective evaluation metrics, together with a listening test, in which we show significant improvement in the realism of the generated guitar audio from tablatures.

artificial intelligence, machine learning, natural language, (16 more...)

2510.21872

Country: Asia > Japan (0.28)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

HDR Image Reconstruction using an Unsupervised Fusion Model

Nagaswetha, Kumbha

High Dynamic Range (HDR) imaging aims to reproduce the wide range of brightness levels present in natural scenes, which the human visual system can perceive but conventional digital cameras often fail to capture due to their limited dynamic range. To address this limitation, we propose a deep learning-based multi-exposure fusion approach for HDR image generation. The method takes a set of differently exposed Low Dynamic Range (LDR) images, typically an underexposed and an overexposed image, and learns to fuse their complementary information using a convolutional neural network (CNN). The underexposed image preserves details in bright regions, while the overexposed image retains information in dark regions; the network effectively combines these to reconstruct a high-quality HDR output. The model is trained in an unsupervised manner, without relying on ground-truth HDR images, making it practical for real-world applications where such data is unavailable. We evaluate our results using the Multi-Exposure Fusion Structural Similarity Index Measure (MEF-SSIM) and demonstrate that our approach achieves superior visual quality compared to existing fusion methods. A customized loss function is further introduced to improve reconstruction fidelity and optimize model performance.

artificial intelligence, hdr image, machine learning, (17 more...)

2510.21815

Genre: Research Report > New Finding (0.34)

Industry: Media > Photography (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Chavali, Apoorva, Menezes, Reeve

Words to Waves: Emotion-Adaptive Music Recommendation System

Current recommendation systems often tend to overlook emotional context and rely on historical listening patterns or static mood tags. This paper introduces a novel music recommendation framework employing a variant of Wide and Deep Learning architecture that takes in real-time emotional states inferred directly from natural language as inputs and recommends songs that closely portray the mood. The system captures emotional contexts from user-provided textual descriptions by using transformer-based embeddings, which were finetuned to predict the emotional dimensions of valence-arousal. The deep component of the architecture utilizes these embeddings to generalize unseen emotional patterns, while the wide component effectively memorizes user-emotion and emotion-genre associations through cross-product features. Experimental results show that personalized music selections positively influence the user's emotions and lead to a significant improvement in emotional relevance.

artificial intelligence, machine learning, natural language, (17 more...)

2510.21724

Genre:

Research Report (0.70)
Overview (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ueda, Kentaro, Takayanagi, Takehiro

PREFINE: Personalized Story Generation via Simulated User Critics and User-Specific Rubric Generation

While recent advances in Large Language Models (LLMs) have improved the quality of creative text generation, significant challenges remain in producing personalized stories that reflect individual user preferences. Conventional approaches rely on explicit feedback or fine-tuning, which presents practical issues regarding user burden, data collection, computational costs, and privacy. In this work, we propose PREFINE (Persona-and-Rubric Guided Critique-and-Refine), a novel framework that extends the Critique-and-Refine paradigm to personalization. PREFINE constructs a pseudo-user agent from a user's interaction history and generates user-specific rubrics (evaluation criteria). By having this agent critique and refine outputs on the user's behalf based on these tailored rubrics, our method achieves personalized generation without requiring parameter updates or direct user feedback. We conducted a comprehensive evaluation on the PerDOC and PerMPST story datasets. We designed three baseline methods and several model variants to verify the contribution of each component of our framework. In automatic evaluations (LLM-as-a-Judge), PREFINE achieved higher win rates and statistically significant scores than the baselines, without compromising general story quality. Analysis of the model variants confirmed that both the pseudo-user agent and the user-specific rubrics are crucial for enhancing personalization performance. Beyond story generation, our approach holds potential for enabling efficient personalization in broader applications, such as dialogue systems, education, and recommendation.

large language model, machine learning, natural language, (20 more...)

2510.21721

Country:

Asia (1.00)
North America > United States (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology (0.92)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Kim, Eunsu, Park, Junyeong, Oh, Juhyun, Park, Kiwoong, Song, Seyoung, Doğruöz, A. Seza, Kim, Najoung, Oh, Alice

As large language models (LLMs) are increasingly used in human-AI interactions, their social reasoning capabilities in interpersonal contexts are critical. We introduce SCRIPTS, a 1k-dialogue dataset in English and Korean, sourced from movie scripts. The task involves evaluating models' social reasoning capability to infer the interpersonal relationships (e.g., friends, sisters, lovers) between speakers in each dialogue. Each dialogue is annotated with probabilistic relational labels (Highly Likely, Less Likely, Unlikely) by native (or equivalent) Korean and English speakers from Korea and the U.S. Evaluating nine models on our task, current proprietary LLMs achieve around 75-80% on the English dataset, whereas their performance on Korean drops to 58-69%. More strikingly, models select Unlikely relationships in 10-25% of their responses. Furthermore, we find that thinking models and chain-of-thought prompting, effective for general reasoning, provide minimal benefits for social reasoning and occasionally amplify social biases. Our findings reveal significant limitations in current LLMs' social reasoning capabilities, highlighting the need for efforts to develop socially-aware language models.

large language model, machine learning, natural language, (18 more...)

2510.19028

Country:

Asia (1.00)
Europe (0.67)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Barami, Tal, Berman, Nimrod, Naiman, Ilan, Hason, Amos H., Ezra, Rotem, Azencot, Omri

Learning disentangled representations in sequential data is a key goal in deep learning, with broad applications in vision, audio, and time series. While real-world data involves multiple interacting semantic factors over time, prior work has mostly focused on simpler two-factor static and dynamic settings, primarily because such settings make data collection easier, thereby overlooking the inherently multi-factor nature of real-world data. We introduce the first standardized benchmark for evaluating multi-factor sequential disentanglement across six diverse datasets spanning video, audio, and time series. Our benchmark includes modular tools for dataset integration, model development, and evaluation metrics tailored to multi-factor analysis. We additionally propose a post-hoc Latent Exploration Stage to automatically align latent dimensions with semantic factors, and introduce a Koopman-inspired model that achieves state-of-the-art results. Moreover, we show that Vision-Language Models can automate dataset annotation and serve as zero-shot disentanglement evaluators, removing the need for manual labels and human intervention. Together, these contributions provide a robust and scalable foundation for advancing multi-factor sequential disentanglement. Our code is available on GitHub, and the datasets and trained models are available on Hugging Face.

artificial intelligence, machine learning, natural language, (20 more...)

2510.17313

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Riou, Alain, Serrà, Joan, Mitsufuji, Yuki

Automatic Music Sample Identification with Multi-Track Contrastive Learning

ABSTRACT Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learning objective. We show that such method significantly outperforms previous state-of-the-art baselines, that is robust to various genres, and that scales well when increasing the number of noise songs in the reference database. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.

artificial intelligence, dataset, machine learning, (14 more...)

2510.11507

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)