
NeoQA: Evidence-based Question Answering with Generated News Events

arXiv.org Artificial Intelligence

Evaluating Retrieval-Augmented Generation (RAG) in large language models (LLMs) is challenging because benchmarks can quickly become stale. Questions that initially require retrieval may become answerable from pretraining knowledge as newer models incorporate more recent information during pretraining, making it difficult to distinguish evidence-based reasoning from recall. We introduce NeoQA (News Events for Out-of-training Question Answering), a benchmark designed to address this issue. To construct NeoQA, we generated timelines and knowledge bases of fictional news events and entities, along with news articles and Q&A pairs, so that LLMs cannot leverage pretraining knowledge: no prior evidence exists in their training data. We propose our dataset as a new platform for evaluating evidence-based question answering, as it requires LLMs to generate responses exclusively from retrieved evidence, and only when sufficient evidence is available. NeoQA enables controlled evaluation across various evidence scenarios, including cases with missing or misleading details. Our findings indicate that LLMs struggle to detect subtle mismatches between questions and evidence and resort to shortcut reasoning when key information required to answer a question is missing from the evidence, underscoring key limitations in evidence-based reasoning.


Would You Rely on an Eerie Agent? A Systematic Review of the Impact of the Uncanny Valley Effect on Trust in Human-Agent Interaction

arXiv.org Artificial Intelligence

Trust is a fundamental component of human-agent interaction. With the increasing presence of artificial agents in daily life, it is essential to understand how people perceive and trust these agents. One of the key challenges affecting this perception is the Uncanny Valley Effect (UVE), where increasingly human-like artificial beings can be perceived as eerie or repelling. Despite growing interest in trust and the UVE, existing research varies widely in terms of how these concepts are defined and operationalized. This inconsistency raises important questions about how and under what conditions the UVE influences trust in agents. A systematic understanding of their relationship is currently lacking. This review aims to examine the impact of the UVE on human trust in agents and to identify methodological patterns, limitations, and gaps in the existing empirical literature. Following PRISMA guidelines, a systematic search identified 53 empirical studies that investigated both UVE-related constructs and trust or trust-related outcomes. Studies were analyzed based on a structured set of categories, including types of agents and interactions, methodological and measurement approaches, and key findings. The results of our systematic review reveal that most studies rely on static images or hypothetical scenarios with limited real-time interaction, and the majority use subjective trust measures. This review offers a novel framework for classifying trust measurement approaches with regard to the best-practice criteria for empirically investigating the UVE. As the first systematic attempt to map the intersection of UVE and trust, this review contributes to a deeper understanding of their interplay and offers a foundation for future research. Keywords: the uncanny valley effect, trust, human-likeness, affinity response, human-agent interaction


Preliminary Explorations with GPT-4o(mni) Native Image Generation

arXiv.org Artificial Intelligence

Recently, OpenAI unlocked the native image-generation capability of GPT-4o(mni). The model demonstrates a remarkable generation capability, with excellent understanding of multimodal conditions and varied task instructions. In this paper, we aim to explore the capabilities of GPT-4o across various tasks. Inspired by a previous study, we constructed a task taxonomy along with a carefully curated set of test samples to conduct a comprehensive qualitative evaluation. Because its image-generation process benefits from GPT-4o's powerful multimodal comprehension, the model demonstrates abilities that go beyond traditional image-generation tasks. We therefore evaluate its performance across six task categories: traditional image generation tasks, discriminative tasks, knowledge-based generation, commonsense-based generation, spatially-aware image generation, and temporally-aware image generation. These tasks not only assess the quality and conditional alignment of the model's outputs but also probe deeper into GPT-4o's understanding of real-world concepts. Our results reveal that GPT-4o performs impressively in general-purpose synthesis tasks, showing strong capabilities in text-to-image generation, visual stylization, and low-level image processing. However, significant limitations remain in precise spatial reasoning, instruction-grounded generation, and consistent temporal prediction. Furthermore, in knowledge-intensive or domain-specific scenarios, such as scientific illustrations or mathematical plots, the model often exhibits hallucinations, factual errors, or structural inconsistencies. These findings suggest that while GPT-4o marks a substantial advancement in unified multimodal generation, there is still a long way to go before it can be reliably applied to professional or safety-critical domains.


Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset

arXiv.org Artificial Intelligence

This study presents a novel approach to improving the cost-to-quality ratio of image generation with diffusion models. We hypothesize that the differences between distilled (e.g., FLUX.1-schnell) and baseline (e.g., FLUX.1-dev) models are consistent and therefore learnable within a specialized domain such as portrait generation. We generate a synthetic paired dataset and train a fast image-to-image translation head. Using two sets of low- and high-quality synthetic images, our model is trained to refine the output of a distilled generator (e.g., FLUX.1-schnell) to a level comparable to that of a more computationally intensive baseline model such as FLUX.1-dev. Our results show that the pipeline, which combines a distilled version of a large generative model with our enhancement layer, delivers photorealistic portraits comparable to those of the baseline, with up to an 82% decrease in computational cost relative to FLUX.1-dev. This study demonstrates the potential for improving the efficiency of AI solutions involving large-scale image generation.


AI technology helps reunite lost dogs with their owners

FOX News

Petco Love Lost is a free platform that uses AI-powered photo matching to reunite lost pets with their families. When Michael Bown left New York City for a family reunion at the Jersey Shore, he never imagined he'd return to a nightmare. His beloved adopted dog, Millie, just a year old, slipped out of her collar during a walk in the East Village and vanished into the night. What followed was a frantic, emotional and ultimately heartwarming journey, one that highlights the power of community, technology and a little bit of luck. Michael's story began with a simple act of trust, leaving Millie in the care of a close friend.


Fox News AI Newsletter: Where US, China stand in AI race

FOX News

AI ARMS RACE: OpenAI co-founder Sam Altman joined three other artificial intelligence (AI) and technology executives for a Senate Commerce Committee hearing on winning the global AI race and strengthening domestic capabilities in computing and innovation. EMBRACING AI: Some companies have been adjusting their workforce as they simultaneously embrace artificial intelligence and automation more, according to Forbes. NEW INVESTORS: OpenAI is shaking up its corporate structure to bring in new investors and accelerate the development of artificial general intelligence (AGI).


Elton John and Dua Lipa seek protection from AI

BBC News

Not everyone agrees with the artists' approach. Julia Willemyns, co-founder of the Centre for British Progress think tank, said such proposals could hamper the UK and its bid for growth. The measures would "do nothing to stop foreign firms from using content from the British creative industries," she told the BBC. These tools, which can produce new content in response to simple text prompts, have become increasingly popular and available to consumers. But their capabilities have been accompanied by concerns and criticism over their data use and energy demand.


Paul McCartney and Dua Lipa among artists urging Starmer to rethink AI copyright plans

The Guardian

"We will lose an immense growth opportunity if we give our work away at the behest of a handful of powerful overseas tech companies and with it our future income, the UK's position as a creative powerhouse, and any hope that the technology of daily life will embody the values and laws of the United Kingdom," the letter says. Urging parliamentarians on all sides of the political spectrum and in both houses to support the change, the letter says: "We urge you to vote in support of the UK creative industries. Supporting us supports the creators of the future. Our work is not yours to give away." Spanning the worlds of music, theatre, film, literature, art and media, the more than 400 signatories include Elton John, Kazuo Ishiguro, Annie Lennox, Rachel Whiteread, Jeanette Winterson, the National Theatre and the News Media Association, which represents more than 800 news titles including the Guardian.


World's first Star Wars-style hoverbike can hit 124mph and DOESN'T need propellers to fly

Daily Mail - Science & tech

A company says it has developed a Star Wars-inspired speeder bike that can zoom to 124mph. Poland-based Volonaut says its Airbike is the first 'hoverbike' vehicle of its kind that does not use propellers to fly. Incredible videos show someone sitting on the device as it appears to glide effortlessly through the air. At one point it hovers remarkably steadily as the rider lifts a hand to wave at the camera. The firm says: 'This groundbreaking design shares a lot of similarities to "speeder bikes" featured in popular science-fiction movies.'


OpenAI's Sam Altman thanks Sen John Fetterman for 'normalizing hoodies'

FOX News

Sen. John Fetterman, D-Pa., receives praise for his less-than-formal attire from Sam Altman during a Commerce Committee hearing. Sen. John Fetterman, D-Pa., was one of the final senators to question OpenAI chief Sam Altman during Thursday's Senate Commerce Committee hearing, and the subject of both Three Mile Island and the Democrat's penchant for Carhartt outerwear came up. Fetterman said that as a senator he has been able to meet people with "much more impressive jobs and careers" and that due to Altman's technology, "humans will have a wonderful ability to adapt." He told Altman that some Americans are worried about AI on various levels, and he asked the executive to address it. In response, Altman said he appreciated Fetterman's praise.