Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis

arXiv.org Artificial Intelligence

Text-to-speech (TTS) technology has achieved impressive results for widely spoken languages, yet many under-resourced languages remain challenged by limited data and linguistic complexities. In this paper, we present a novel methodology that integrates a data-optimized framework with an advanced acoustic model to build high-quality TTS systems for low-resource scenarios. We demonstrate the effectiveness of our approach using Thai as an illustrative case, where intricate phonetic rules and sparse resources are effectively addressed. Our method enables zero-shot voice cloning and improved performance across diverse client applications, ranging from finance to healthcare, education, and law. Extensive evaluations - both subjective and objective - confirm that our model meets state-of-the-art standards, offering a scalable solution for TTS production in data-limited settings, with significant implications for broader industry adoption and multilingual accessibility.


MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles

arXiv.org Artificial Intelligence

Satire and fake news can both contribute to the spread of false information, even though they have different purposes (one is for amusement, the other is to misinform). However, it is not enough to rely purely on text to detect the incongruity between the surface meaning and the actual meaning of the news articles, and, often, other sources of information (e.g., visual) provide an important clue for satire detection. This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language. We conducted experiments and showed that the use of both modalities improves performance.


FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness

arXiv.org Artificial Intelligence

Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. FairEval integrates personality traits with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive assessment of user-level bias. We evaluate models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendations. FairEval's fairness metric, PAFS, achieves scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, with disparities reaching 34.79 percent. These results highlight the importance of robustness in prompt sensitivity and support more inclusive recommendation systems.


Data over dialogue: Why artificial intelligence is unlikely to humanise medicine

arXiv.org Artificial Intelligence

Recently, a growing number of experts in artificial intelligence (AI) and medicine have begun to suggest that the use of AI systems, particularly machine learning (ML) systems, is likely to humanise the practice of medicine by substantially improving the quality of clinician-patient relationships. In this thesis, however, I argue that medical ML systems are more likely to negatively impact these relationships than to improve them. In particular, I argue that the use of medical ML systems is likely to compromise the quality of trust, care, empathy, understanding, and communication between clinicians and patients.


What Contributes to Affective Polarization in Networked Online Environments? Evidence from an Agent-Based Model

arXiv.org Artificial Intelligence

Affective polarization, or inter-party hostility, is increasingly recognized as a pervasive issue in democracies worldwide, posing a threat to social cohesion. The digital media ecosystem, now widely accessible and ever-present, has often been implicated in accelerating this phenomenon. However, the precise causal mechanisms responsible for driving affective polarization have been a subject of extensive debate. While the concept of echo chambers, characterized by individuals ensconced within like-minded groups, bereft of counter-attitudinal content, has long been the prevailing hypothesis, accumulating empirical evidence suggests a more nuanced picture. This study aims to contribute to the ongoing debate by employing an agent-based model to illustrate how affective polarization is either fostered or hindered by individual news consumption and dissemination patterns based on ideological alignment. To achieve this, we parameterize three key aspects: (1) the affective asymmetry of individuals' engagement with in-party versus out-party content, (2) the proportion of in-party members within one's social neighborhood, and (3) the degree of partisan bias among the elites within the population. Subsequently, we observe macro-level changes in affective polarization within the population under various conditions stipulated by these parameters. This approach allows us to explore the intricate dynamics of affective polarization within digital environments, shedding light on the interplay between individual behaviors, social networks, and information exposure.


Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing

arXiv.org Artificial Intelligence

Instruction-based Image Editing (IIE) models have made significant improvements due to the progress of multimodal large language models (MLLMs) and diffusion models, which can understand and reason about complex editing instructions. In addition to advancing current IIE models, accurately evaluating their output has become increasingly critical and challenging. Current IIE evaluation methods and their evaluation procedures often fall short of aligning with human judgment and often lack explainability. To address these limitations, we propose JUdgement through Routing of Expertise (JURE). Each expert in JURE is a pre-selected model assumed to be equipped with an atomic expertise that can provide useful feedback to judge output, and the router dynamically routes the evaluation task of a given instruction and its output to appropriate experts, aggregating their feedback into a final judgment. JURE is trustworthy in two aspects. First, it can effortlessly provide explanations about its judgment by examining the routed experts and their feedback. Second, experimental results demonstrate that JURE is reliable by achieving superior alignment with human judgments, setting a new standard for automated IIE evaluation. Moreover, JURE's flexible design is future-proof - modular experts can be seamlessly replaced or expanded to accommodate advancements in IIE, maintaining consistently high evaluation quality. Our evaluation data and results are available at https://github.com/Cyyyyyrus/JURE.git.


EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models

arXiv.org Artificial Intelligence

The diversity of human language, shaped by social, cultural, and regional influences, presents significant challenges for natural language processing (NLP) systems. Existing benchmarks often overlook intra-language variations, leaving speakers of non-standard dialects underserved. To address this gap, we introduce EnDive (English Diversity), a benchmark that evaluates five widely-used large language models (LLMs) across tasks in language understanding, algorithmic reasoning, mathematics, and logic. Our framework translates Standard American English datasets into five underrepresented dialects using few-shot prompting with verified examples from native speakers, and compares these translations against rule-based methods via fluency assessments, preference tests, and semantic similarity metrics. Human evaluations confirm high translation quality, with average scores of at least 6.02/7 for faithfulness, fluency, and formality. By filtering out near-identical translations, we create a challenging dataset that reveals significant performance disparities - models consistently underperform on dialectal inputs compared to Standard American English. EnDive thus advances dialect-aware NLP by uncovering model biases and promoting more equitable language technologies.


A Multimedia Analytics Model for the Foundation Model Era

arXiv.org Artificial Intelligence

The rapid advances in Foundation Models and agentic Artificial Intelligence are transforming multimedia analytics by enabling richer, more sophisticated interactions between humans and analytical systems. Existing conceptual models for visual and multimedia analytics, however, do not adequately capture the complexity introduced by these powerful AI paradigms. To bridge this gap, we propose a comprehensive multimedia analytics model specifically designed for the foundation model era. Building upon established frameworks from visual analytics, multimedia analytics, knowledge generation, analytic task definition, mixed-initiative guidance, and human-in-the-loop reinforcement learning, our model emphasizes integrated human-AI teaming based on visual analytics agents from both technical and conceptual perspectives. Central to the model is a seamless, yet explicitly separable, interaction channel between expert users and semi-autonomous analytical processes, ensuring continuous alignment between user intent and AI behavior. The model addresses practical challenges in sensitive domains such as intelligence analysis, investigative journalism, and other fields handling complex, high-stakes data. We illustrate through detailed case studies how our model facilitates deeper understanding and targeted improvement of multimedia analytics solutions. By explicitly capturing how expert users can optimally interact with and guide AI-powered multimedia analytics systems, our conceptual framework sets a clear direction for system design, comparison, and future research.


Fox News 'Antisemitism Exposed' Newsletter: Software giant fires anti-Israel worker for hate rant

FOX News

The two workers say their employment was terminated over the protests. Fox News' "Antisemitism Exposed" newsletter brings you stories on the rising anti-Jewish prejudice across the U.S. and the world. TOP STORY: Microsoft fired an employee who disrupted the company's 50th anniversary celebration event to voice their opposition to its work supplying artificial intelligence technology to Israel. As Microsoft AI CEO Mustafa Suleyman spoke at the event, Ibtihal Aboussad began shouting at him, accusing him of being "a war profiteer." She demanded that Suleyman "stop using AI for genocide."


Pulp unveil their first new album in 24 years

BBC News

"All the moving images featured in the video are the result of me feeding in a still image and then typing in a 'prompt' such as: 'The black & white figure remains still whilst the bus in the background drives off', which led to [a] sequence where the coach weirdly slides towards the cut-out of me," said Cocker. "The weekend I began work on the video was a strange time: I went out of the house and kept expecting weird transformations of the surrounding environment due to the images the computer had been generating. "The experience had marked me. I don't know whether I've recovered yet." After completing the video, the musician said he'd landed firmly on the side of "human intelligence" over AI.