Reddit's human content wins amid the AI flood

BBC News

For Ines Tan there's one particular site she turns to again and again for advice - and that's Reddit. Tan, who works in communications, regularly jumps on the site for skincare advice, to view reactions to shows she watches, such as The Traitors, and for help planning her upcoming wedding in May. "It's a very empathetic place," she says of Reddit. "For my wedding, I've found help emotionally, logistically and inspiration-wise." Tan believes people are consulting the online discussion platform more because they crave human interaction amid the rising flood of AI slop.


The Most Powerful Politics Influencers Barely Post About Politics

WIRED

New research shows that social media creators have enormous influence over their audiences' politics--especially those who don't normally share political content. Donald Trump's appearances on the podcasts of Joe Rogan and Theo Von, among others, were seen by many as a key part of securing his second term in office. But while Trump was speculating about alien life on Mars with Rogan, he had a team of acolytes appearing on dozens, if not hundreds, of much smaller niche podcasts hosted by right-wing content creators who typically don't talk about politics. This is how, just six days before the election, Kash Patel, the man now struggling to run the FBI, ended up appearing on the livestream, a fringe, QAnon-infused show hosted on a platform called Pilled. "The Deep State exists," Patel told the audience.


User Negotiations of Authenticity, Ownership, and Governance on AI-Generated Video Platforms: Evidence from Sora

Shen, Bohui, Bhatta, Shrikar, Ireebanije, Alex, Liu, Zexuan, Choudhry, Abhinav, Gumusel, Ece, Zhou, Kyrie Zhixuan

arXiv.org Artificial Intelligence

As AI-generated video platforms rapidly advance, ethical challenges such as copyright infringement emerge. This study examines how users make sense of AI-generated videos on OpenAI's Sora by conducting a qualitative content analysis of user comments. Through a thematic analysis, we identified four dynamics that characterize how users negotiate authenticity, authorship, and platform governance on Sora. First, users acted as critical evaluators of realism, assessing micro-details such as lighting, shadows, fluid motion, and physics to judge whether AI-generated scenes could plausibly exist. Second, users increasingly shifted from passive viewers to active creators, expressing curiosity about prompts, techniques, and creative processes. Text prompts were perceived as intellectual property, generating concerns about plagiarism and remixing norms. Third, users reported blurred boundaries between real and synthetic media, worried about misinformation, and even questioned the authenticity of other commenters, suspecting bot-generated engagement. Fourth, users contested platform governance: some perceived moderation as inconsistent or opaque, while others shared tactics for evading prompt censorship through misspellings, alternative phrasing, emojis, or other languages. Despite this, many users also enforced ethical norms by discouraging the misuse of real people's images or disrespectful content. Together, these patterns highlighted how AI-mediated platforms complicate notions of reality, creativity, and rule-making in emerging digital ecosystems. Based on the findings, we discuss governance challenges in Sora and how user negotiations inform future platform governance.


SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation

Zenith, Ayush, Zumbrun, Arnold, Raut, Neel, Lin, Jing

arXiv.org Artificial Intelligence

The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. This paper introduces the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean Average Precision (mAP) scores of YOLOv11, a leading object detection model, while previous metrics only exhibited moderate or weak correlations. Additionally, it provides actionable insights for improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data.
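The abstract's central claim is that SDQM correlates strongly with downstream mAP, which lets you rank candidate synthetic datasets without training to convergence. A minimal sketch of that evaluation idea, using made-up scores (the actual SDQM components are defined in the paper):

```python
import math

# Hypothetical illustration: checking how well a dataset-quality score
# tracks downstream detection performance (mAP) via Pearson correlation.
# Both lists below are invented toy values, not results from the paper.
quality_scores = [0.42, 0.55, 0.61, 0.70, 0.78, 0.83]  # one per candidate dataset
map_scores     = [0.31, 0.40, 0.44, 0.52, 0.58, 0.63]  # mAP after training on each

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(quality_scores, map_scores)
print(f"Pearson r = {r:.3f}")
```

A metric whose scores produce r close to 1 on held-out datasets is what would justify selecting synthetic data by score alone, which is the efficiency gain the abstract describes.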


There's Never Been a Worse Time to Be Authentic at Work

WIRED

Workers have been told to bring themselves to work, only to be disappointed time and time again, argues author Jodi-Ann Burey in her new book. Burey was only two weeks into her new role as an inclusion marketing manager for an outdoor retail company when she was accused of having a "race agenda." Burey, who is Black, was no stranger to workplace hypocrisy; as she sees it, the office is a petri dish where the knotty dynamics of society are concentrated. At the time of the accusation in February 2020, however, all she could do was laugh. "I was like, you knew who I was before you poached me. This is exactly what you wanted me to do," she says over Zoom.


A Dynamic Knowledge Update-Driven Model with Large Language Models for Fake News Detection

Jin, Di, Yang, Jun, Wang, Xiaobao, Zhang, Junwei, Li, Shuqi, He, Dongxiao

arXiv.org Artificial Intelligence

As the Internet and social media evolve rapidly, distinguishing credible news from a vast amount of complex information poses a significant challenge. Because news events are sudden and unstable, the authenticity labels of news can shift as events develop, making it crucial for fake news detection to incorporate the latest event updates. Existing methods employ retrieval-augmented generation to fill knowledge gaps, but they suffer from insufficient credibility of retrieved content and interference from noisy information. We propose a DYnamic kNowledge updAte-driven MOdel for fake news detection (DYNAMO), which leverages knowledge graphs to continuously update new knowledge and integrates large language models to perform two functions: detecting news authenticity and verifying the correctness of new knowledge. This addresses two key problems: ensuring the authenticity of new knowledge and deeply mining news semantics. Specifically, we first construct a news-domain-specific knowledge graph. Then, we use Monte Carlo Tree Search to decompose complex news claims and verify them step by step. Finally, we extract and update new knowledge from verified real news texts and reasoning paths. Experimental results demonstrate that DYNAMO achieves the best performance on two real-world datasets.
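The decompose-then-verify idea in the abstract can be sketched in miniature: split a claim into sub-claims and check each against a knowledge store, accepting only if all parts hold. This toy uses a flat dictionary; DYNAMO's actual pipeline uses a knowledge graph, MCTS-based decomposition, and an LLM verifier.

```python
# Toy stepwise claim verification against a small knowledge store.
# The facts and claim below are invented for illustration only.
knowledge = {
    "event_date": "2024-07-01",
    "location": "Geneva",
    "organizer": "UN",
}

def verify(sub_claims):
    """Check each (key, asserted_value) sub-claim against stored knowledge.

    Returns (accepted, per_sub_claim_results): the whole claim is accepted
    only when every sub-claim matches.
    """
    results = {key: knowledge.get(key) == value for key, value in sub_claims}
    return all(results.values()), results

# A composite claim whose location contradicts the stored knowledge:
ok, detail = verify([("event_date", "2024-07-01"), ("location", "Paris")])
print(ok, detail)
```

The per-sub-claim results matter as much as the verdict: they identify which part of a story failed verification, which is what enables the "verify step by step" reasoning paths the abstract mentions.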


SCAR: A Characterization Scheme for Multi-Modal Dataset

Su, Ri, Chen, Zhao, Cao, Caleb Chen, Tang, Nan, Chen, Lei

arXiv.org Artificial Intelligence

Foundation models exhibit remarkable generalization across diverse tasks, largely driven by the characteristics of their training data. Recent data-centric methods like pruning and compression aim to optimize training but offer limited theoretical insight into how data properties affect generalization, especially the data characteristics in sample scaling. Traditional perspectives further constrain progress by focusing predominantly on data quantity and training efficiency, often overlooking structural aspects of data quality. In this study, we introduce SCAR, a principled scheme for characterizing the intrinsic structural properties of datasets across four key measures: Scale, Coverage, Authenticity, and Richness. Unlike prior data-centric measures, SCAR captures stable characteristics that remain invariant under dataset scaling, providing a robust and general foundation for data understanding. Leveraging these structural properties, we introduce Foundation Data-a minimal subset that preserves the generalization behavior of the full dataset without requiring model-specific retraining. We model single-modality tasks as step functions and estimate the distribution of the foundation data size to capture step-wise generalization bias across modalities in the target multi-modal dataset. Finally, we develop a SCAR-guided data completion strategy based on this generalization bias, which enables efficient, modality-aware expansion of modality-specific characteristics in multimodal datasets. Experiments across diverse multi-modal datasets and model architectures validate the effectiveness of SCAR in predicting data utility and guiding data acquisition. Code is available at https://github.com/McAloma/SCAR.
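To make the four SCAR axes concrete, here is a toy characterization of a tiny labeled dataset. The formulas below are invented placeholders for illustration; the paper defines its own measures of Scale, Coverage, Authenticity, and Richness.

```python
# Hypothetical dataset characterization along four axes named by SCAR.
# All field names and formulas here are assumptions, not the paper's.
samples = [
    {"label": "cat",  "synthetic": False, "tokens": 12},
    {"label": "dog",  "synthetic": True,  "tokens": 30},
    {"label": "cat",  "synthetic": False, "tokens": 18},
    {"label": "bird", "synthetic": False, "tokens": 25},
]
NUM_POSSIBLE_CLASSES = 5  # assumed label space for this toy example

scale = len(samples)                                              # raw size
coverage = len({s["label"] for s in samples}) / NUM_POSSIBLE_CLASSES
authenticity = sum(not s["synthetic"] for s in samples) / len(samples)
richness = len({s["tokens"] for s in samples}) / len(samples)     # value diversity

print(scale, coverage, authenticity, richness)
```

The point of such measures, per the abstract, is that they stay stable as the dataset is scaled up, so they can guide which modality or class to expand next rather than just counting samples.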


Navigating the New Landscape: A Conceptual Model for Project-Based Assessment (PBA) in the Age of GenAI

Kadel, Rajan, Shailendra, Samar, Saxena, Urvashi Rahul

arXiv.org Artificial Intelligence

The rapid integration of Generative Artificial Intelligence (GenAI) into higher education presents both opportunities and challenges for assessment design, particularly within Project-Based Assessment (PBA) contexts. Traditional assessment methods often emphasise the final product in the PBA, which can now be significantly influenced or created by GenAI tools, raising concerns regarding product authenticity, academic integrity, and learning validation. This paper advocates for a reimagined assessment model for Project-Based Learning (PBL) or a capstone project that prioritises process-oriented evaluation, multi-modal and multifaceted assessment design, and ethical engagement with GenAI to enable higher-order thinking. The model also emphasises the use of (GenAI-assisted) personalised feedback by a supervisor as an observance of the learning process during the project lifecycle. A use case scenario is provided to illustrate the application of the model in a capstone project setting. The paper concludes with recommendations for educators and curriculum designers to ensure that assessment practices remain robust, learner-centric, and integrity-driven in the evolving landscape of GenAI.


Ethical Medical Image Synthesis

Jin, Weina, Sinha, Ashish, Abhishek, Kumar, Hamarneh, Ghassan

arXiv.org Artificial Intelligence

The task of ethical Medical Image Synthesis (MISyn) is to ensure that the MISyn techniques are researched and developed ethically throughout their entire lifecycle, which is essential to prevent the negative impacts of MISyn. To address the ever-increasing needs and requirements for ethical practice of MISyn research and development, we first conduct a theoretical analysis that identifies the key properties of ethical MISyn and intrinsic limits of MISyn. We identify that synthetic images lack inherent grounding in real medical phenomena, cannot fully represent the training medical images, and inevitably introduce new distribution shifts and biases. Ethical risks can arise from not acknowledging the intrinsic limits and weaknesses of synthetic images compared to medical images, with the extreme form manifested as misinformation of MISyn that substitutes synthetic images for medical images without acknowledgment. The resulting ethical harms include eroding trust in the medical imaging dataset environment and causing algorithmic discrimination towards stakeholders and the public. To facilitate collective efforts towards ethical MISyn within and outside the medical image analysis community, we then propose practical supports for ethical practice in MISyn based on the theoretical analysis, including ethical practice recommendations that adapt the existing technical standards, problem formulation, design, and evaluation practice of MISyn to the ethical challenges; and oversight recommendations to facilitate checks and balances from stakeholders and the public. We also present two case studies that demonstrate how to apply the ethical practice recommendations in practice, and identify gaps between existing practice and the ethical practice recommendations.


Optimizing Retrieval-Augmented Generation (RAG) for Colloquial Cantonese: A LoRA-Based Systematic Review

Calonge, David Santandreu, Smail, Linda

arXiv.org Artificial Intelligence

This review examines recent advances in Parameter-Efficient Fine-Tuning (PEFT), with a focus on Low-Rank Adaptation (LoRA), to optimize Retrieval-Augmented Generation (RAG) systems like Qwen3, DeepSeek, and Kimi. These systems face challenges in understanding and generating authentic Cantonese colloquial expressions due to limited annotated data and linguistic variability. The review evaluates the integration of LoRA within RAG frameworks, benchmarks PEFT methods for retrieval and generation accuracy, identifies domain adaptation strategies under limited data, and compares fine-tuning techniques aimed at improving semantic fidelity under data-scarce conditions. A systematic analysis of recent studies employing diverse LoRA variants, synthetic data generation, user feedback integration, and adaptive parameter allocation was conducted to assess their impact on computational efficiency, retrieval precision, linguistic authenticity, and scalability. Findings reveal that dynamic and ensemble LoRA adaptations significantly reduce trainable parameters without sacrificing retrieval accuracy and generation quality in dialectal contexts. However, limitations remain in fully preserving fine-grained linguistic nuances, especially for low-resource settings like Cantonese. The integration of real-time user feedback and domain-specific data remains underdeveloped, limiting model adaptability and personalization. While selective parameter freezing and nonlinear adaptation methods offer better trade-offs between efficiency and accuracy, their robustness at scale remains an open challenge. This review highlights the promise of PEFT-enhanced RAG systems for domain-specific language tasks and calls for future work targeting dialectal authenticity, dynamic adaptation, and scalable fine-tuning pipelines.
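The parameter savings the review attributes to LoRA come from replacing a full weight update with a low-rank one: a frozen matrix W is adapted by the product B·A of two thin matrices of rank r, so only r·(d_in + d_out) parameters train. A minimal numerical sketch (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # zero-init: adapted model starts identical to base

def forward(x, scale=1.0):
    # Effective weight is W + scale * (B @ A), applied without ever
    # materializing the full d_out x d_in update matrix.
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
trainable = A.size + B.size              # 4*(128 + 64) = 768 parameters
print("trainable:", trainable, "vs full:", W.size)
```

Because B starts at zero, the adapted forward pass initially equals the frozen one, and training touches under 10% of the parameters in this configuration, which is the efficiency/accuracy trade-off the reviewed LoRA variants tune.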