Goto

Collaborating Authors

 Media


'My skin was peeling' - the African women tricked into making Russian drones

BBC News

'My skin was peeling' - the African women tricked into making Russian drones On her first day of work, Adau realised she had made a big mistake. We got our uniforms, not even knowing exactly what we were going to do. From the first day of work we were taken to the drones factory. We stepped in and we saw drones everywhere and people working. Then they took us to our different work stations.


Mystery of the 'golfer's curse' is SOLVED: Scientists pinpoint why golf balls 'lip out' after appearing to enter the hole

Daily Mail - Science & tech

Now he's dead, here's the full story of what happened that day... and the ghastly aftermath no one knows about Wake up and see he's the master of the dark arts: MEGYN KELLY blows the lid on the REAL Mamdani... how are they missing this? 'Screaming' Sydney Sweeney'hates' that she was caught hiding in ex-fiancé's car: Now insiders spill truth about backseat rendezvous and lingering'frustrations' Mom is an Oscar winner who has acted with Selena Gomez and Nicole Kidman, who is this nepo kid who came out last year? Bella Thorne continues swimsuit season as she works sexy bikini for Los Cabo trip with her'love' Mark Emms Experts pinpoint typical life expectancy from initial dementia diagnosis - and there's a huge variation between different subtypes Diddy's male prison protector unmasked: How disgraced mogul has repaid him... and turned to God for repentance Teachers threatened over bloody'Problem Solved' T-shirts over claims they mocked murder of Charlie Kirk Astonishing moment Miss Universe winner storms out of this year's event after pageant president reprimands Miss Mexico and tells security to remove her for not showing'respect' Mystery of the'golfer's curse' is SOLVED: Scientists pinpoint why golf balls'lip out' after appearing to enter the hole READ MORE: Golf balls are a'product of colonial exploitation', exhibition says Experts have finally solved the mystery of one of the most infuriating occurrences in golf - the dreaded lip out. The phenomenon occurs when the golf ball appears to enter the hole, only to immediately pop back out again. Scientists have finally pinpointed the physics behind the'curse', which has plagued everyone from amateur hobbyists to PGA professionals. Best of all, they've revealed the best way to avoid it - keeping your score intact.


Beyond the Link: Assessing LLMs' ability to Classify Political Content across Global Media

arXiv.org Artificial Intelligence

The use of large language models (LLMs) is becoming common in political science and digital media research. While LLMs have demonstrated ability in labelling tasks, their effectiveness to classify Political Content (PC) from URLs remains underexplored. This article evaluates whether LLMs can accurately distinguish PC from non-PC using both the text and the URLs of news articles across five countries (France, Germany, Spain, the UK, and the US) and their different languages. Using cutting-edge models, we benchmark their performance against human-coded data to assess whether URL-level analysis can approximate full-text analysis. Our findings show that URLs embed relevant information and can serve as a scalable, cost-effective alternative to discern PC. However, we also uncover systematic biases: LLMs seem to overclassify centrist news as political, leading to false positives that may distort further analyses. We conclude by outlining methodological recommendations on the use of LLMs in political science research.


Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities

arXiv.org Artificial Intelligence

As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently been released, these evaluations tend to rely on retrieval from one or more sections of the context, which allows nearly all of the context tokens to be disregarded as noise. This represents only one type of task that might be performed with long context. We introduce Oolong, a benchmark of long-context reasoning tasks that require analyzing individual chunks of text on an atomic level, and then aggregating these analyses to answer distributional questions. Oolong is separated into two task sets: Oolong-synth, a set of naturalistic synthetic tasks, where we can easily ablate components of the reasoning problem; and Oolong-real, a downstream setting which requires reasoning over real-world conversational data. Oolong requires models to reason over large quantities of examples, to perform both classification and counting in-context, and to reason over temporal and user relations. Even frontier models struggle on Oolong, with GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro all achieving less than 50% accuracy on both splits at 128K. We release the data and evaluation harness for Oolong to enable further development of models that can reason over large quantities of text.


VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

arXiv.org Artificial Intelligence

Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored. Inspired by how humans reason over sketches, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benchmark that reframes multimodal understanding as code generation: given an image, a model must produce SVG that preserves symbolic meaning for downstream reasoning. VCode covers three domains - general commonsense (MM-Vet), professional disciplines (MMMU), and visual-centric perception (CV-Bench). To assess symbolic fidelity, we propose CodeVQA, a novel evaluation protocol in which a policy model answers questions over rendered SVGs; correct answers indicate faithful symbolic preservation. Empirically, frontier VLMs struggle to generate faithful SVGs, revealing a persistent gap between language-centric and visual-centric coding. To close this gap, we introduce VCoder, an agentic framework that augments VLMs along two axes: (i) Thinking with Revision, which iteratively analyzes discrepancies and refines SVG code; and (ii) Acting with Visual Tools, where detectors and parsers supply structured cues such as objects, shapes, and text beyond the model's intrinsic capacity. Across benchmarks, frontier VLMs with strong reasoning capabilities score well overall yet remain limited in professional knowledge and 3D reasoning. VCoder delivers a 12.3-point overall gain over the top-performing Claude-4-Opus. Human studies show that both humans and VLMs perform worse on rendered SVGs, their consistency reveals the promise of symbolic visual representation. The benchmark and code are available at https://github.com/CSU-JPG/VCode.


Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval

arXiv.org Artificial Intelligence

Most text retrievers generate one query vector to retrieve relevant documents. Y et, the conditional distribution of relevant documents for the query may be multi-modal, e.g., representing different interpretations of the query. We first quantify the limitations of existing retrievers. All retrievers we evaluate struggle more as the distance between target document embeddings grows. Our model autoregressively generates multiple query vectors, and all the predicted query vectors are used to retrieve documents from the corpus. We show that on the synthetic vectorized data, the proposed method could capture multiple target distributions perfectly, showing 4x better performance than single embedding model. We also fine-tune our model on real-world multi-answer retrieval datasets and evaluate in-domain. AMER presents 4 and 21% relative gains over single-embedding baselines on two datasets we evaluate on. Furthermore, we consistently observe larger gains on the subset of dataset where the embeddings of the target documents are less similar to each other. We demonstrate the potential of using a multi-query vector retriever and open up a new direction for future work. As large language models (LLMs) have limited, out-dated parametric knowledge, augmenting knowledge at inference time by prepending retrieved documents has risen as a de facto solution (Fan et al., 2024; Gao et al., 2023). Recovering a diverse set of documents is crucial to provide comprehensive information (Xu et al., 2023), as an answer providing partial information can be technically correct yet misleading to users. In this work, we study retrieving a diverse set of documents per query. We first analyze the behaviors of existing retrievers (Izacard et al., 2022; Y ang et al., 2025b; Zhang et al., 2025; Lee et al., 2025a) on datasets (Min et al., 2020; Amouyal et al., 2023) containing questions that admit multiple valid answers.


Dexterous Robotic Piano Playing at Scale

arXiv.org Artificial Intelligence

This work has been submitted to the IEEE for possible publication. Abstract--Endowing robot hands with human-level dexterity has been a long-standing goal in robotics. Bimanual robotic piano playing represents a particularly challenging task: it is high-dimensional, contact-rich, and requires fast, precise control. Our approach is built on three core components. First, we introduce an automatic fingering strategy based on Optimal Transport (OT), allowing the agent to autonomously discover efficient piano-playing strategies from scratch without demonstrations. Second, we conduct large-scale Reinforcement Learning (RL) by training more than 2,000 agents, each specialized in distinct music pieces, and aggregate their experience into a dataset named RP1M++, consisting of over one million trajectories for robotic piano playing. Extensive experiments and ablation studies highlight the effectiveness and scalability of our approach, advancing dexterous robotic piano playing at scale. Achieving human-level dexterity remains one of the central challenges in robotics. The difficulty stems from the breadth of challenges ranging from contact-rich manipulation to dynamic athletic tasks, each posing distinct demands. Manipulation tasks, such as grasping or reorienting objects [1], require sustained application of appropriate forces at moderate speeds across objects with diverse shapes, materials, and weight distributions. Dynamic tasks, such as juggling [2] or table tennis [3], involve frequent contact changes, demand high precision, and allow little tolerance for error due to the rarity of contact opportunities. The combination of requiring both precision and speed makes reproducing human-level dexterity particularly challenging. Q. Gao is with the University of Southern California, CA 90007, United States (e-mail: quankaig@usc.edu). Q. Cheng is with Imperial College London, SW7 2AZ, London, United Kingdom (e-mail: c.qian24@imperial.ac.uk). J. Kannala is with the University of Oulu, 90570 Oulu, Finland. D. B uchler is also with the University of Alberta (Canada), the Alberta Machine Intelligence Institute (Amii), & holds a Canada CIFAR AI Chair.


AI Credibility Signals Outrank Institutions and Engagement in Shaping News Perception on Social Media

arXiv.org Artificial Intelligence

AI-generated content is rapidly becoming a salient component of online information ecosystems, yet its influence on public trust and epistemic judgments remains poorly understood. We present a large-scale mixed-design experiment (N = 1,000) investigating how AI-generated credibility scores affect user perception of political news. Our results reveal that AI feedback significantly moderates partisan bias and institutional distrust, surpassing traditional engagement signals such as likes and shares. These findings demonstrate the persuasive power of generative AI and suggest a need for design strategies that balance epistemic influence with user autonomy.


Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition

arXiv.org Artificial Intelligence

We introduce a lightweight, real-time motion recognition system that enables synergic human-machine performance through wearable IMU sensor data, MiniRocket time-series classification, and responsive multimedia control. By mapping dancer-specific movement to sound through somatic memory and association, we propose an alternative approach to human-machine collaboration, one that preserves the expressive depth of the performing body while leveraging machine learning for attentive observation and responsiveness. We demonstrate that this human-centered design reliably supports high accuracy classification (<50 ms latency), offering a replicable framework to integrate dance-literate machines into creative, educational, and live performance contexts.


Solving cold start in news recommendations: a RippleNet-based system for large scale media outlet

arXiv.org Artificial Intelligence

We present a scalable recommender system implementation based on RippleNet, tailored for the media domain with a production deployment in Onet.pl, one of Poland's largest online media platforms. Our solution addresses the cold-start problem for newly published content by integrating content-based item embeddings into the knowledge propagation mechanism of RippleNet, enabling effective scoring of previously unseen items. The system architecture leverages Amazon SageMaker for distributed training and inference, and Apache Airflow for orchestrating data pipelines and model retraining workflows. To ensure high-quality training data, we constructed a comprehensive golden dataset consisting of user and item features and a separate interaction table, all enabling flexible extensions and integration of new signals.