Generative AI
Match Chat: Real Time Generative AI and Generative Computing for Tennis
Baughman, Aaron, Akay, Gozde, Morales, Eduardo, Agarwal, Rahul, Srivastava, Preetika
We present Match Chat, a real-time, agent-driven assistant designed to enhance the tennis fan experience by delivering instant, accurate responses to match-related queries. Match Chat integrates Generative Artificial Intelligence (GenAI) with Generative Computing (GenComp) techniques to synthesize key insights during live tennis singles matches. The system debuted at the 2025 Wimbledon Championships and the 2025 US Open, where it provided about 1 million users with seamless access to streaming and static data through natural language queries. The architecture is grounded in an Agent-Oriented Architecture (AOA) combining rule engines, predictive models, and agents to pre-process and optimize user queries before passing them to GenAI components. The Match Chat system had an answer accuracy of 92.83% with an average response time of 6.25 seconds under loads of up to 120 requests per second (RPS). Over 96.08% of all queries were guided using interactive prompt design, contributing to a user experience that prioritized clarity, responsiveness, and minimal effort. The system was designed to mask architectural complexity, offering a frictionless and intuitive interface that required no onboarding or technical familiarity. Across both Grand Slam deployments, Match Chat maintained 100% uptime and supported nearly 1 million unique users, underscoring the scalability and reliability of the platform. This work introduces key design patterns for real-time, consumer-facing AI systems that emphasize speed, precision, and usability that highlights a practical path for deploying performant agentic systems in dynamic environments.
Evalet: Evaluating Large Language Models by Fragmenting Outputs into Functions
Kim, Tae Soo, Lee, Heechan, Lee, Yoonjoo, Seering, Joseph, Kim, Juho
Practitioners increasingly rely on Large Language Models (LLMs) to evaluate generative AI outputs through "LLM-as-a-Judge" approaches. However, these methods produce holistic scores that obscure which specific elements influenced the assessments. We propose functional fragmentation, a method that dissects each output into key fragments and interprets the rhetoric functions that each fragment serves relative to evaluation criteria -- surfacing the elements of interest and revealing how they fulfill or hinder user goals. We instantiate this approach in Evalet, an interactive system that visualizes fragment-level functions across many outputs to support inspection, rating, and comparison of evaluations. A user study (N=10) found that, while practitioners struggled to validate holistic scores, our approach helped them identify 48% more evaluation misalignments. This helped them calibrate trust in LLM evaluations and rely on them to find more actionable issues in model outputs. Our work shifts LLM evaluation from quantitative scores toward qualitative, fine-grained analysis of model behavior.
Synthetic Survival Data Generation for Heart Failure Prognosis Using Deep Generative Models
Puttanawarut, Chanon, Fongsrisin, Natcha, Amornritvanich, Porntep, Looareesuwan, Panu, Ratanatharathorn, Cholatid
Background: Heart failure (HF) research is constrained by limited access to large, shareable datasets due to privacy regulations and institutional barriers. Synthetic data generation offers a promising solution to overcome these challenges while preserving patient confidentiality. Methods: We generated synthetic HF datasets from institutional data comprising 12,552 unique patients using five deep learning models: tabular variational autoencoder (TVAE), normalizing flow, ADSGAN, SurvivalGAN, and tabular denoising diffusion probabilistic models (TabDDPM). We comprehensively evaluated synthetic data utility through statistical similarity metrics, survival prediction using machine learning and privacy assessments. Results: SurvivalGAN and TabDDPM demonstrated high fidelity to the original dataset, exhibiting similar variable distributions and survival curves after applying histogram equalization. SurvivalGAN (C-indices: 0.71-0.76) and TVAE (C-indices: 0.73-0.76) achieved the strongest performance in survival prediction evaluation, closely matched real data performance (C-indices: 0.73-0.76). Privacy evaluation confirmed protection against re-identification attacks. Conclusions: Deep learning-based synthetic data generation can produce high-fidelity, privacy-preserving HF datasets suitable for research applications. This publicly available synthetic dataset addresses critical data sharing barriers and provides a valuable resource for advancing HF research and predictive modeling.
OpenAI Rolls Out Teen Safety Features Amid Growing Scrutiny
CEO Sam Altman announced an age-prediction system and new parental controls in a blog post on Tuesday. OpenAI announced new teen safety features for ChatGPT on Tuesday as part of an ongoing effort to respond to concerns about how minors engage with chatbots . The company is building an age-prediction system that identifies if a user is under 18 years old and routes them to an " age-appropriate " system that blocks graphic sexual content. If the system detects that the user is considering suicide or self-harm, it will contact the user's parents. In cases of imminent danger, if a user's parents are unreachable, the system may contact the authorities.
Around one-third of AI search tool answers make unsupported claims
AI tools including Perplexity and Open AI's GPT-4 often provide one-sided answers to contentious questions, and don't back up their arguments with reliable sources How well-supported are the claims made by AI tools? Generative AI tools, and the deep research agents and search engines powered by them, frequently make unsupported and biased claims that aren't backed up by the sources they cite. That's according to an analysis which found that about one-third of answers provided by the AI tools aren't backed up by reliable sources. For OpenAI's GPT 4.5, the figure was even higher, at 47 per cent. Alongside this, they put five deep research agents through their paces: GPT-5's Deep Research feature, Bing Chat's Think Deeper option and deep research tools offered by You.com, Google Gemini and Perplexity.
Advancing Medical Artificial Intelligence Using a Century of Cases
Buckley, Thomas A., Conci, Riccardo, Brodeur, Peter G., Gusdorf, Jason, Beltrรกn, Sourik, Behrouzi, Bita, Crowe, Byron, Dockterman, Jacob, Muhammad, Muzzammil, Ohnigian, Sarah, Sanchez, Andrew, Diao, James A., Shah, Aashna P., Restrepo, Daniel, Rosenberg, Eric S., Lea, Andrew S., Zitnik, Marinka, Podolsky, Scott H., Kanjee, Zahir, Abdulnour, Raja-Elie E., Koshy, Jacob M., Rodman, Adam, Manrai, Arjun K.
BACKGROUND: For over a century, the New England Journal of Medicine Clinicopathological Conferences (CPCs) have tested the reasoning of expert physicians and, recently, artificial intelligence (AI). However, prior AI evaluations have focused on final diagnoses without addressing the multifaceted reasoning and presentation skills required of expert discussants. METHODS: Using 7102 CPCs (1923-2025) and 1021 Image Challenges (2006-2025), we conducted extensive physician annotation and automated processing to create CPC-Bench, a physician-validated benchmark spanning 10 text-based and multimodal tasks, against which we evaluated leading large language models (LLMs). Then, we developed "Dr. CaBot," an AI discussant designed to produce written and slide-based video presentations using only the case presentation, modeling the role of the human expert in these cases. RESULTS: When challenged with 377 contemporary CPCs, o3 (OpenAI) ranked the final diagnosis first in 60% of cases and within the top ten in 84% of cases, outperforming a 20-physician baseline; next-test selection accuracy reached 98%. Event-level physician annotations quantified AI diagnostic accuracy per unit of information. Performance was lower on literature search and image tasks; o3 and Gemini 2.5 Pro (Google) achieved 67% accuracy on image challenges. In blinded comparisons of CaBot vs. human expert-generated text, physicians misclassified the source of the differential in 46 of 62 (74%) of trials, and scored CaBot more favorably across quality dimensions. To promote research, we are releasing CaBot and CPC-Bench. CONCLUSIONS: LLMs exceed physician performance on complex text-based differential diagnosis and convincingly emulate expert medical presentations, but image interpretation and literature retrieval remain weaker. CPC-Bench and CaBot may enable transparent and continued tracking of progress in medical AI.
LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications
Lin, Yujun, Zhang, Zhekai, Han, Song
Modern tensor applications, especially foundation models and generative AI applications require multiple input modalities (both vision and language), which increases the demand for flexible accelerator architecture. Existing frameworks suffer from the trade-off between design flexibility and productivity of RTL generation: either limited to very few hand-written templates or cannot automatically generate the RTL. To address this challenge, we propose the LEGO framework, which targets tensor applications and automatically generates spatial architecture design and outputs synthesizable RTL code without handwritten RTL design templates. Leveraging the affine-transformation-based architecture representation, LEGO front end finds interconnections between function units, synthesizes the memory system, and fuses different spatial dataflow designs based on data reuse analysis. LEGO back end then translates the hardware in a primitive-level graph to perform lower-level optimizations, and applies a set of linear-programming algorithms to optimally insert pipeline registers and reduce the overhead of unused logic when switching spatial dataflows. Our evaluation demonstrates that LEGO can achieve 3.2x speedup and 2.4x energy efficiency compared to previous work Gemmini, and can generate one architecture for diverse modern foundation models in generative AI applications.
Data-Driven Analysis of Text-Conditioned AI-Generated Music: A Case Study with Suno and Udio
Casini, Luca, Vila, Laura Cros, Dalmazzo, David, Kaila, Anna-Kaisa, Sturm, Bob L. T.
Online AI platforms for creating music from text prompts (AI music), such as Suno and Udio, are now being used by hundreds of thousands of users. Some AI music is appearing in advertising, and even charting, in multiple countries. How are these platforms being used? What subjects are inspiring their users? This article answers these questions for Suno and Udio using a large collection of songs generated by users of these platforms from May to October 2024. Using a combination of state-of-the-art text embedding models, dimensionality reduction and clustering methods, we analyze the prompts, tags and lyrics, and automatically annotate and display the processed data in interactive plots. Our results reveal prominent themes in lyrics, language preference, prompting strategies, as well as peculiar attempts at steering models through the use of metatags. To promote the musicological study of the developing cultural practice of AI-generated music we share our code and resources.
DTGen: Generative Diffusion-Based Few-Shot Data Augmentation for Fine-Grained Dirty Tableware Recognition
Hao, Lifei, Cheng, Yue, Huang, Baoqi, Jia, Bing, Zhao, Xuandong
Intelligent tableware cleaning is a critical application in food safety and smart homes, but existing methods are limited by coarse-grained classification and scarcity of few-shot data, making it difficult to meet industrialization requirements. We propose DTGen, a few-shot data augmentation scheme based on generative diffusion models, specifically designed for fine-grained dirty tableware recognition. DTGen achieves efficient domain specialization through LoRA, generates diverse dirty images via structured prompts, and ensures data quality through CLIP-based cross-modal filtering. Under extremely limited real few-shot conditions, DTGen can synthesize virtually unlimited high-quality samples, significantly improving classifier performance and supporting fine-grained dirty tableware recognition. We further elaborate on lightweight deployment strategies, promising to transfer DTGen's benefits to embedded dishwashers and integrate with cleaning programs to intelligently regulate energy consumption and detergent usage. Research results demonstrate that DTGen not only validates the value of generative AI in few-shot industrial vision but also provides a feasible deployment path for automated tableware cleaning and food safety monitoring.
CareerPooler: AI-Powered Metaphorical Pool Simulation Improves Experience and Outcomes in Career Exploration
Wang, Ziyi, Zeng, Ziwen, Li, Yuan, Ding, Zijian
Career exploration is uncertain, requiring decisions with limited information and unpredictable outcomes. While generative AI offers new opportunities for career guidance, most systems rely on linear chat interfaces that produce overly comprehensive and idealized suggestions, overlooking the non-linear and effortful nature of real-world trajectories. We present CareerPooler, a generative AI-powered system that employs a pool-table metaphor to simulate career development as a spatial and narrative interaction. Users strike balls representing milestones, skills, and random events, where hints, collisions, and rebounds embody decision-making under uncertainty. In a within-subjects study with 24 participants, CareerPooler significantly improved engagement, information gain, satisfaction, and career clarity compared to a chatbot baseline. Qualitative findings show that spatial-narrative interaction fosters experience-based learning, resilience through setbacks, and reduced psychological burden. Our findings contribute to the design of AI-assisted career exploration systems and more broadly suggest that visually grounded analogical interactions can make generative systems engaging and satisfying.