Media
CHRIS: Clothed Human Reconstruction with Side View Consistency
Liu, Dong, Yang, Yifan, Huang, Zixiong, Gao, Yuxin, Tan, Mingkui
--Creating a realistic clothed human from a single-view RGB image is crucial for applications like mixed reality and filmmaking. Despite some progress in recent years, mainstream methods often fail to fully utilize side-view information, as the input single-view image contains front-view information only. This leads to globally unrealistic topology and local surface inconsistency in side views. T o address these, we introduce Clothed Human Reconstruction with Side View Consistency, namely CHRIS, which consists of 1) A Side-View Normal Discriminator that enhances global visual reasonability by distinguishing the generated side-view normals from the ground truth ones; 2) A Multi-to-One Gradient Computation (M2O) that ensures local surface consistency. M2O calculates the gradient of a sampling point by integrating the gradients of the nearby points, effectively acting as a smooth operation. Experimental results demonstrate that CHRIS achieves state-of-the-art performance on public benchmarks and outperforms the prior work. Creating realistic digital humans with intricate clothing details plays a pivotal role in the field of mixed reality [1] and filmmaking [2].
Evaluating Design Decisions for Dual Encoder-based Entity Disambiguation
Entity disambiguation (ED) is the task of linking mentions in text to corresponding entries in a knowledge base. Dual Encoders address this by embedding mentions and label candidates in a shared embedding space and applying a similarity metric to predict the correct label. In this work, we focus on evaluating key design decisions for Dual Encoder-based ED, such as its loss function, similarity metric, label verbalization format, and negative sampling strategy. We present the resulting model VerbalizED, a document-level Dual Encoder model that includes contextual label verbalizations and efficient hard negative sampling. Additionally, we explore an iterative prediction variant that aims to improve the disambiguation of challenging data points. Comprehensive experiments on AIDA-Yago validate the effectiveness of our approach, offering insights into impactful design choices that result in a new State-of-the-Art system on the ZELDA benchmark.
AI-generated Text Detection: A Multifaceted Approach to Binary and Multiclass Classification
Abburi, Harika, Bhattacharya, Sanmitra, Bowen, Edward, Pudota, Nirmala
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing across a wide range of styles and genres. However, such capabilities are prone to potential misuse, such as fake news generation, spam email creation, and misuse in academic assignments. As a result, accurate detection of AI-generated text and identification of the model that generated it are crucial for maintaining the responsible use of LLMs. In this work, we addressed two sub-tasks put forward by the Defactify workshop under AI-Generated Text Detection shared task at the Association for the Advancement of Artificial Intelligence (AAAI 2025): Task A involved distinguishing between human-authored or AI-generated text, while Task B focused on attributing text to its originating language model. For each task, we proposed two neural architectures: an optimized model and a simpler variant. For Task A, the optimized neural architecture achieved fifth place with $F1$ score of 0.994, and for Task B, the simpler neural architecture also ranked fifth place with $F1$ score of 0.627.
AI can be more persuasive than humans in debates, scientists find
Artificial intelligence can do just as well as humans, if not better, when it comes to persuading others in a debate, and not just because it cannot shout, a study has found. Experts say the results are concerning, not least as it has potential implications for election integrity. "If persuasive AI can be deployed at scale, you can imagine armies of bots microtargeting undecided voters, subtly nudging them with tailored political narratives that feel authentic," said Francesco Salvi, the first author of the research from the Swiss Federal Institute of Technology in Lausanne. He added that such influence was hard to trace, even harder to regulate and nearly impossible to debunk in real time. "I would be surprised if malicious actors hadn't already started to use these tools to their advantage to spread misinformation and unfair propaganda," Salvi said.
AI can do a better job of persuading people than we do
Their findings are the latest in a growing body of research demonstrating LLMs' powers of persuasion. The authors warn they show how AI tools can craft sophisticated, persuasive arguments if they have even minimal information about the humans they're interacting with. The research has been published in the journal Nature Human Behavior. "Policymakers and online platforms should seriously consider the threat of coordinated AI-based disinformation campaigns, as we have clearly reached the technological level where it is possible to create a network of LLM-based automated accounts able to strategically nudge public opinion in one direction," says Riccardo Gallotti, an interdisciplinary physicist at Fondazione Bruno Kessler in Italy, who worked on the project. "These bots could be used to disseminate disinformation, and this kind of diffused influence would be very hard to debunk in real time," he says.
User-centric Music Recommendations
Castillo, Jaime Ramirez, Flores, M. Julia, Nicholson, Ann E.
This work presents a user-centric recommendation framework, designed as a pipeline with four distinct, connected, and customizable phases. These phases are intended to improve explainability and boost user engagement. We have collected the historical Last.fm track playback records of a single user over approximately 15 years. The collected dataset includes more than 90,000 playbacks and approximately 14,000 unique tracks. From track playback records, we have created a dataset of user temporal contexts (each row is a specific moment when the user listened to certain music descriptors). As music descriptors, we have used community-contributed Last.fm tags and Spotify audio features. They represent the music that, throughout years, the user has been listening to. Next, given the most relevant Last.fm tags of a moment (e.g. the hour of the day), we predict the Spotify audio features that best fit the user preferences in that particular moment. Finally, we use the predicted audio features to find tracks similar to these features. The final aim is to recommend (and discover) tracks that the user may feel like listening to at a particular moment. For our initial study case, we have chosen to predict only a single audio feature target: danceability. The framework, however, allows to include more target variables. The ability to learn the musical habits from a single user can be quite powerful, and this framework could be extended to other users.
DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models
Bertazzini, Giulia, Baracchi, Daniele, Shullani, Dasara, Echizen, Isao, Piva, Alessandro
The remarkable ease of use of diffusion models for image generation has led to a proliferation of synthetic content online. While these models are often employed for legitimate purposes, they are also used to generate fake images that support misinformation and hate speech. Consequently, it is crucial to develop robust tools capable of detecting whether an image has been generated by such models. Many current detection methods, however, require large volumes of sample images for training. Unfortunately, due to the rapid evolution of the field, existing datasets often cover only a limited range of models and quickly become outdated. In this work, we introduce DRAGON, a comprehensive dataset comprising images from 25 diffusion models, spanning both recent advancements and older, well-established architectures. The dataset contains a broad variety of images representing diverse subjects. To enhance image realism, we propose a simple yet effective pipeline that leverages a large language model to expand input prompts, thereby generating more diverse and higher-quality outputs, as evidenced by improvements in standard quality metrics. The dataset is provided in multiple sizes (ranging from extra-small to extra-large) to accomodate different research scenarios. DRAGON is designed to support the forensic community in developing and evaluating detection and attribution techniques for synthetic content. Additionally, the dataset is accompanied by a dedicated test set, intended to serve as a benchmark for assessing the performance of newly developed methods.
Qualia Optimization
This report explores the speculative question: what if current or future AI systems have qualia, such as pain or pleasure? It does so by assuming that AI systems might someday possess qualia -- and that the quality of these subjective experiences should be considered alongside performance metrics. Concrete mathematical problem settings, inspired by reinforcement learning formulations and theories from philosophy of mind, are then proposed and initial approaches and properties are presented. These properties enable refinement of the problem setting, culminating with the proposal of methods that promote reinforcement.
The heteronomy of algorithms: Traditional knowledge and computational knowledge
If an active citizen should increasingly be a computationally enlightened one, replacing the autonomy of reason with the heteronomy of algorithms, then I argue in this article that we must begin teaching the principles of critiquing the computal through new notions of what we might call digital Bildung. Indeed, if civil society itself is mediated by computational systems and media, the public use of reason must also be complemented by skills for negotiating and using these computal forms to articulate such critique. Not only is there a need to raise the intellectual tone regarding computation and its related softwarization processes, but there is an urgent need to attend to the likely epistemic challenges from computation which, as presently constituted, tends towards justification through a philosophy of utility rather than through a philosophy of care for the territory of the intellect. We therefore need to develop an approach to this field that uses concepts and methods drawn from philosophy, politics, history, anthropology, sociology, media studies, computer science, and the humanities more generally, to try to understand these issues - particularly the way in which software and data increasingly penetrate our everyday life and the pressures and fissures that are created. We must, in other words, move to undertake a critical interdisciplinary research program to understand the way in which these systems are created, instantiated, and normatively engendered in both specific and general contexts.
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Sun, Jiashuo, Zhong, Xianrui, Zhou, Sizhe, Han, Jiawei
Retrieval-augmented generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval, making them highly effective for knowledge-intensive tasks. A crucial but often under-explored component of these systems is the reranker. Since irrelevant documents in RAG systems can mislead the generator, the reranker plays a vital role in refining retrieved documents to enhance generation quality and explainability. However, it is challenging to determine the appropriate number of documents ($k$) that the reranker should select: too few may result in missing critical information, while too many introduce noise and inefficiencies. Although recent studies have explored LLM-based rerankers, they primarily leverage internal model knowledge and overlook the rich supervisory signals that LLMs can provide, such as using response quality as feedback for optimizing reranking decisions. In this paper, we propose DynamicRAG, a novel RAG framework where the reranker dynamically adjusts both the order and number of retrieved documents based on the query. We model the reranker as an agent optimized through reinforcement learning (RL), using rewards derived from LLM output quality. Across seven knowledge-intensive datasets, DynamicRAG demonstrates superior performance, achieving state-of-the-art results among models of same parameter sizes. The model, data and code are available at https://github.com/GasolSun36/DynamicRAG.