commentary
Timely Clinical Diagnosis through Active Test Selection
Estévez, Silas Ruhrberg, Astorga, Nicolás, van der Schaar, Mihaela
There is growing interest in using machine learning (ML) to support clinical diagnosis, but most approaches rely on static, fully observed datasets and fail to reflect the sequential, resource-aware reasoning clinicians use in practice. Diagnosis remains complex and error-prone, especially in high-pressure or resource-limited settings, underscoring the need for frameworks that help clinicians make timely and cost-effective decisions. We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design), a diagnostic framework that integrates Bayesian Experimental Design (BED) with large language models (LLMs) to better emulate real-world diagnostic reasoning. At each step, ACTMED selects the test expected to yield the greatest reduction in diagnostic uncertainty for a given patient. LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data. Clinicians can remain in the loop: reviewing test suggestions, interpreting intermediate outputs, and applying clinical judgment throughout. We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use. This represents a step toward transparent, adaptive, and clinician-aligned diagnostic systems that generalize across settings with reduced reliance on domain-specific data.
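The selection rule described in the abstract (pick the test with the greatest expected reduction in diagnostic uncertainty) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single binary diagnosis, and it replaces the LLM-based simulator with fixed, hypothetical sensitivity and specificity values for three made-up tests (`test_a`, `test_b`, `test_c`).

```python
import math


def entropy(p):
    """Shannon entropy in bits of a Bernoulli belief p = P(disease)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(prior, sensitivity, specificity):
    """Expected drop in entropy about the diagnosis after seeing the test result."""
    # Marginal probability of each test outcome under the current belief.
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    p_neg = 1 - p_pos
    # Bayes updates for a positive and a negative result.
    post_pos = sensitivity * prior / p_pos if p_pos > 0 else prior
    post_neg = (1 - sensitivity) * prior / p_neg if p_neg > 0 else prior
    expected_posterior_entropy = p_pos * entropy(post_pos) + p_neg * entropy(post_neg)
    return entropy(prior) - expected_posterior_entropy


def select_test(prior, tests):
    """Return the name of the test with the largest expected information gain."""
    return max(tests, key=lambda name: expected_information_gain(prior, *tests[name]))


# Hypothetical (sensitivity, specificity) pairs, standing in for simulated outcomes.
tests = {
    "test_a": (0.95, 0.90),
    "test_b": (0.60, 0.55),
    "test_c": (0.50, 0.50),  # uninformative: result is independent of the disease
}
best = select_test(0.3, tests)
```

In this toy setting `test_c` has zero expected gain because its result carries no information about the diagnosis, so a rule of this kind would never choose it over a more discriminative test.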
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Spain (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
MCAD: Multimodal Context-Aware Audio Description Generation For Soccer
Chaudhary, Lipisha, Mittal, Trisha, Gopalakrishnan, Subhadra, Nwogu, Ifeoma, Pytlarz, Jaclyn
Audio Descriptions (AD) are essential for making visual content accessible to individuals with visual impairments. Recent works have shown a promising step towards automating AD, but they have been limited to describing high-quality movie content using human-annotated ground truth AD in the process. In this work, we present an end-to-end pipeline, MCAD, that extends AD generation beyond movies to the domain of sports, with a focus on soccer games, without relying on ground truth AD. To address the absence of domain-specific AD datasets, we fine-tune a Video Large Language Model on publicly available movie AD datasets so that it learns the narrative structure and conventions of AD. During inference, MCAD incorporates multimodal contextual cues such as player identities, soccer events/actions, and commentary from the game. These cues, combined with input prompts to the fine-tuned Video-LLM, allow the system to produce complete AD text for each video segment. We further introduce a new evaluation metric, ARGE-AD, designed to accurately assess the quality of generated AD. ARGE-AD evaluates the generated AD for the presence of five characteristics: (i) usage of people's names, (ii) mention of actions/events, (iii) appropriate length of AD, (iv) absence of pronouns, and (v) overlap from commentary/subtitles. We present an in-depth analysis of our approach on both movie and soccer datasets. We also validate the use of this metric to quantitatively assess the quality of generated AD across domains. Additionally, we contribute audio descriptions for 100 soccer game clips annotated by two AD experts. Audio Description (AD) is the descriptive spoken narration of visual content, primarily to assist individuals with visual impairments in accessing that content [1].
- Europe > Spain > Galicia > Madrid (0.04)
- Europe > United Kingdom > England (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Juvekar, Kush, Bhattacharya, Arghya, Khadloya, Sai, Saxena, Utkarsh
Large language models (LLMs) are entering legal workflows, yet we lack a jurisdiction-specific framework to assess their baseline competence therein. We use India's public legal examinations as a transparent proxy. Our multi-year benchmark assembles objective screens from top national and state exams and evaluates open and frontier LLMs under real-world exam conditions. To probe beyond multiple-choice questions, we also include a lawyer-graded, paired-blinded study of long-form answers from the Supreme Court's Advocate-on-Record exam. This is, to our knowledge, the first exam-grounded, India-specific yardstick for LLM court-readiness released with datasets and protocols. Our work shows that while frontier systems consistently clear historical cutoffs and often match or exceed recent top-scorer bands on objective exams, none surpasses the human topper on long-form reasoning. Grader notes converge on three reliability failure modes: procedural or format compliance, authority or citation discipline, and forum-appropriate voice and structure. These findings delineate where LLMs can assist (checks, cross-statute consistency, statute and precedent lookups) and where human leadership remains essential: forum-specific drafting and filing, procedural and relief strategy, reconciling authorities and exceptions, and ethical, accountable judgment.
- Research Report > Strength High (0.48)
- Research Report > Experimental Study (0.48)
CMT-Bench: Cricket Multi-Table Generation Benchmark for Probing Robustness in Large Language Models
Upadhyay, Ritam, Ahuja, Naman, Baral, Rishabh, Garimella, Aparna, Gupta, Vivek
LLM-driven text-to-table (T2T) systems often rely on extensive prompt engineering or iterative event extraction in code-parsable formats, which boosts scores but is computationally expensive and obscures how models actually reason over temporally evolving narratives to summarise key information. We present CMT-Bench, a diagnostic benchmark built from live cricket commentary that requires dynamic table generation across two evolving schemas under a dense, rule-governed policy. CMT-Bench is designed to probe robustness via three semantics-preserving dimensions: (i) extractive-cue ablation to separate extractive shortcuts from state tracking, (ii) temporal prefixing to test long-context stability, and (iii) entity-form perturbations (anonymization, out-of-distribution substitutions, role-entangling paraphrases) to assess sensitivity to surface variation. Across diverse long-context state-of-the-art LLMs, we find large drops without extractive summaries, monotonic degradation with input length, and consistent accuracy drops under entity-form changes. Complementary distributional tests confirm significant shifts in numeric error patterns, indicating drift in reasoning rather than mere noise. Our results show that current LLMs are brittle in dynamic text-to-table generation, motivating robustness-first evaluation as a prerequisite for developing efficient and scalable approaches for this task.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment > Sports > Cricket (1.00)
- Leisure & Entertainment > Sports > Soccer (0.93)
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Zheng, Qirui, Wang, Xingbo, Cheng, Keyuan, Ali, Muhammad Asif, Lu, Yunlong, Li, Wenxin
The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding field, offering benefits such as unlimited availability and personalized narration. However, current research in this area remains fragmented, and a comprehensive survey that systematically unifies existing efforts is still missing. To bridge this gap, our survey introduces a unified framework that systematically organizes the AI-GGC landscape. We present a novel taxonomy focused on three core commentator capabilities: Live Observation, Strategic Analysis, and Historical Recall. Commentary is further categorized into three functional types: Descriptive, Analytical, and Background. Building on this structure, we provide an in-depth review of state-of-the-art methods, datasets, and evaluation metrics across various game genres. Finally, we highlight key challenges such as real-time reasoning, multimodal integration, and evaluation bottlenecks, and outline promising directions for future research and system development in AI-GGC.
- Europe > Czechia > Prague (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Research Report (1.00)
- Overview (1.00)
- Leisure & Entertainment > Sports > Soccer (0.95)
- Leisure & Entertainment > Games > Computer Games (0.68)
- Leisure & Entertainment > Sports > Basketball (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Xu, Ruyi, Xiao, Guangxuan, Chen, Yukang, He, Liuning, Peng, Kelly, Lu, Yao, Han, Song
Vision-language models (VLMs) could power real-time assistants and autonomous agents, but they face a critical challenge: understanding near-infinite video streams without escalating latency and memory usage. Processing entire videos with full attention leads to quadratic computational costs and poor performance on long videos. Meanwhile, simple sliding window methods are also flawed, as they either break coherence or suffer from high latency due to redundant recomputation. In this paper, we introduce StreamingVLM, a model designed for real-time, stable understanding of infinite visual input. Our approach is a unified framework that aligns training with streaming inference. During inference, we maintain a compact KV cache by reusing states of attention sinks, a short window of recent vision tokens, and a long window of recent text tokens. This streaming ability is instilled via a simple supervised fine-tuning (SFT) strategy that applies full attention on short, overlapped video chunks, which effectively mimics the inference-time attention pattern without training on prohibitively long contexts. For evaluation, we build Inf-Streams-Eval, a new benchmark with videos averaging over two hours that requires dense, per-second alignment between frames and text. On Inf-Streams-Eval, StreamingVLM achieves a 66.18% win rate against GPT-4o mini and maintains stable, real-time performance at up to 8 FPS on a single NVIDIA H100. Notably, our SFT strategy also enhances general VQA abilities without any VQA-specific fine-tuning, improving performance on LongVideoBench by +4.30 and OVOBench Realtime by +5.96. Code is available at https://github.com/mit-han-lab/streaming-vlm.
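The cache policy mentioned in the abstract (attention sinks, a short window of recent vision tokens, a long window of recent text tokens) can be pictured with a small bookkeeping sketch. It is an illustration only, not the StreamingVLM code: it tracks token positions rather than key/value tensors, and the class name `StreamingKVCache` and the sizes used are hypothetical.

```python
from collections import deque


class StreamingKVCache:
    """Tracks which token positions stay in a bounded cache (positions only, no tensors)."""

    def __init__(self, num_sink, vision_window, text_window):
        self.num_sink = num_sink
        self.sink = []                             # earliest tokens, never evicted
        self.vision = deque(maxlen=vision_window)  # short window of recent vision tokens
        self.text = deque(maxlen=text_window)      # longer window of recent text tokens

    def append(self, position, modality):
        if modality not in ("vision", "text"):
            raise ValueError(f"unknown modality: {modality}")
        if len(self.sink) < self.num_sink:
            self.sink.append(position)
        elif modality == "vision":
            self.vision.append(position)           # deque drops the oldest entry when full
        else:
            self.text.append(position)

    def retained(self):
        """Positions currently kept, in stream order."""
        return self.sink + sorted(list(self.vision) + list(self.text))


# Hypothetical sizes: 2 sink tokens, 3 vision tokens, 5 text tokens.
cache = StreamingKVCache(num_sink=2, vision_window=3, text_window=5)
for pos in range(100):
    cache.append(pos, "vision" if pos % 2 == 0 else "text")
kept = cache.retained()
```

However long the stream runs, the number of retained positions never exceeds `num_sink + vision_window + text_window`, which is the property that keeps memory and per-step latency bounded in a scheme of this kind.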
Generative Data Refinement: Just Ask for Better Data
Jiang, Minqi, Araújo, João G. M., Ellsworth, Will, Gooding, Sian, Grefenstette, Edward
For a fixed parameter size, the capabilities of large models are primarily determined by the quality and quantity of their training data. Consequently, training datasets now grow faster than the rate at which new data is indexed on the web, leading to projected data exhaustion over the next decade. Much more data exists as user-generated content that is not publicly indexed, but incorporating such data comes with considerable risks, such as leaking private information and other undesirable content. We introduce a framework, Generative Data Refinement (GDR), for using pretrained generative models to transform a dataset with undesirable content into a refined dataset that is more suitable for training. Our experiments show that GDR can outperform industry-grade solutions for dataset anonymization, as well as enable direct detoxification of highly unsafe datasets. Moreover, we show that by generating synthetic data that is conditioned on each example in the real dataset, GDR's refined outputs naturally match the diversity of web-scale datasets, and thereby avoid the often challenging task of generating diverse synthetic data via model prompting. The simplicity and effectiveness of GDR make it a powerful tool for scaling up the total stock of training data for frontier models.
- North America > United States > California (0.04)
- Europe > Croatia (0.04)
- Africa (0.04)
The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?
Shanahan, Murray, Das, Tara, Thurman, Robert
This paper presents a case study in the use of a large language model to generate a fictional Buddhist "sutra", and offers a detailed analysis of the resulting text from a philosophical and literary point of view. The conceptual subtlety, rich imagery, and density of allusion found in the text make it hard to casually dismiss on account of its mechanistic origin. This raises questions about how we, as a society, should come to terms with the potentially unsettling possibility of a technology that encroaches on human meaning-making. We suggest that Buddhist philosophy, by its very nature, is well placed to adapt.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Tibet Autonomous Region (0.04)
Peer Review as Structured Commentary: Immutable Identity, Public Dialogue, and Reproducible Scholarship
This paper reconceptualises peer review as structured public commentary. Traditional academic validation is hindered by anonymity, latency, and gatekeeping. We propose a transparent, identity-linked, and reproducible system of scholarly evaluation anchored in open commentary. Leveraging blockchain for immutable audit trails and AI for iterative synthesis, we design a framework that incentivises intellectual contribution, captures epistemic evolution, and enables traceable reputational dynamics. This model empowers fields from computational science to the humanities, reframing academic knowledge as a living process rather than a static credential.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Greenland (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.92)
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding
Gautam, Sushant, Midoglu, Cise, Thambawita, Vajira, Riegler, Michael A., Halvorsen, Pål, Shah, Mubarak
The integration of artificial intelligence in sports analytics has transformed soccer video understanding, enabling real-time, automated insights into complex game dynamics. Traditional approaches rely on isolated data streams, limiting their effectiveness in capturing the full context of a match. To address this, we introduce SoccerChat, a multimodal conversational AI framework that integrates visual and textual data for enhanced soccer video comprehension. Leveraging the extensive SoccerNet dataset, enriched with jersey color annotations and automatic speech recognition (ASR) transcripts, SoccerChat is fine-tuned on a structured video instruction dataset to facilitate accurate game understanding, event classification, and referee decision making. We benchmark SoccerChat on action classification and referee decision-making tasks, demonstrating its performance in general soccer event comprehension while maintaining competitive accuracy in referee decision making. Our findings highlight the importance of multimodal integration in advancing soccer analytics, paving the way for more interactive and explainable AI-driven sports analysis. https://github.com/simula/SoccerChat
- Europe > Norway (0.40)
- North America > United States (0.04)
- Overview (1.00)
- Research Report > New Finding (0.34)