AITopics

2509.12592

Country:

North America > United States (0.93)
Europe > United Kingdom > England > Greater London > London > Wimbledon (0.25)

Genre:

Research Report (0.53)
Overview (0.46)

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Sep-17-2025

Evalet: Evaluating Large Language Models by Fragmenting Outputs into Functions

Kim, Tae Soo, Lee, Heechan, Lee, Yoonjoo, Seering, Joseph, Kim, Juho

Practitioners increasingly rely on Large Language Models (LLMs) to evaluate generative AI outputs through "LLM-as-a-Judge" approaches. However, these methods produce holistic scores that obscure which specific elements influenced the assessments. We propose functional fragmentation, a method that dissects each output into key fragments and interprets the rhetoric functions that each fragment serves relative to evaluation criteria -- surfacing the elements of interest and revealing how they fulfill or hinder user goals. We instantiate this approach in Evalet, an interactive system that visualizes fragment-level functions across many outputs to support inspection, rating, and comparison of evaluations. A user study (N=10) found that, while practitioners struggled to validate holistic scores, our approach helped them identify 48% more evaluation misalignments. This helped them calibrate trust in LLM evaluations and rely on them to find more actionable issues in model outputs. Our work shifts LLM evaluation from quantitative scores toward qualitative, fine-grained analysis of model behavior.

large language model, machine learning, natural language, (18 more...)

2509.11206

Country: North America > United States > California (0.28)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (0.47)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Puttanawarut, Chanon, Fongsrisin, Natcha, Amornritvanich, Porntep, Looareesuwan, Panu, Ratanatharathorn, Cholatid

Synthetic Survival Data Generation for Heart Failure Prognosis Using Deep Generative Models

arXiv.org Artificial Intelligence, Sep-17-2025

Background: Heart failure (HF) research is constrained by limited access to large, shareable datasets due to privacy regulations and institutional barriers. Synthetic data generation offers a promising solution to overcome these challenges while preserving patient confidentiality. Methods: We generated synthetic HF datasets from institutional data comprising 12,552 unique patients using five deep learning models: tabular variational autoencoder (TVAE), normalizing flow, ADSGAN, SurvivalGAN, and tabular denoising diffusion probabilistic models (TabDDPM). We comprehensively evaluated synthetic data utility through statistical similarity metrics, survival prediction using machine learning, and privacy assessments. Results: SurvivalGAN and TabDDPM demonstrated high fidelity to the original dataset, exhibiting similar variable distributions and survival curves after applying histogram equalization. SurvivalGAN (C-indices: 0.71-0.76) and TVAE (C-indices: 0.73-0.76) achieved the strongest performance in survival prediction evaluation, closely matching real data performance (C-indices: 0.73-0.76). Privacy evaluation confirmed protection against re-identification attacks. Conclusions: Deep learning-based synthetic data generation can produce high-fidelity, privacy-preserving HF datasets suitable for research applications. This publicly available synthetic dataset addresses critical data sharing barriers and provides a valuable resource for advancing HF research and predictive modeling.
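The C-indices quoted in this abstract measure how often a survival model ranks pairs of patients in the correct order. The abstract does not say which implementation the authors used, so the following is only a minimal sketch of Harrell's concordance index in plain Python; the function name `concordance_index` and the toy data are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not the paper's code).

    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event has the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.71-0.76 ranges should be read.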

artificial intelligence, deep learning, machine learning, (17 more...)

2509.04245

Country:

North America > United States (0.46)
Asia > Thailand (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

WIRED, Sep-16-2025, 20:53:52 GMT

OpenAI Rolls Out Teen Safety Features Amid Growing Scrutiny

CEO Sam Altman announced an age-prediction system and new parental controls in a blog post on Tuesday. OpenAI announced new teen safety features for ChatGPT on Tuesday as part of an ongoing effort to respond to concerns about how minors engage with chatbots. The company is building an age-prediction system that identifies if a user is under 18 years old and routes them to an "age-appropriate" system that blocks graphic sexual content. If the system detects that the user is considering suicide or self-harm, it will contact the user's parents. In cases of imminent danger, if a user's parents are unreachable, the system may contact the authorities.

altman, openai, teen safety feature, (15 more...)

WIRED

Country:

Oceania > Australia (0.05)
South America (0.05)
North America > United States > Mississippi (0.05)
(4 more...)

Industry:

Government (0.70)
Law > Statutes (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.80)

New Scientist, Sep-16-2025, 14:00:38 GMT

Around one-third of AI search tool answers make unsupported claims

AI tools including Perplexity and OpenAI's GPT-4 often provide one-sided answers to contentious questions, and don't back up their arguments with reliable sources. How well-supported are the claims made by AI tools? Generative AI tools, and the deep research agents and search engines powered by them, frequently make unsupported and biased claims that aren't backed up by the sources they cite. That's according to an analysis which found that about one-third of answers provided by the AI tools aren't backed up by reliable sources. For OpenAI's GPT-4.5, the figure was even higher, at 47 per cent. Alongside this, they put five deep research agents through their paces: GPT-5's Deep Research feature, Bing Chat's Think Deeper option and deep research tools offered by You.com, Google Gemini and Perplexity.

ai search tool answer make, search engine, unsupported claim, (10 more...)

New Scientist

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.57)

Advancing Medical Artificial Intelligence Using a Century of Cases

Buckley, Thomas A., Conci, Riccardo, Brodeur, Peter G., Gusdorf, Jason, Beltrán, Sourik, Behrouzi, Bita, Crowe, Byron, Dockterman, Jacob, Muhammad, Muzzammil, Ohnigian, Sarah, Sanchez, Andrew, Diao, James A., Shah, Aashna P., Restrepo, Daniel, Rosenberg, Eric S., Lea, Andrew S., Zitnik, Marinka, Podolsky, Scott H., Kanjee, Zahir, Abdulnour, Raja-Elie E., Koshy, Jacob M., Rodman, Adam, Manrai, Arjun K.

BACKGROUND: For over a century, the New England Journal of Medicine Clinicopathological Conferences (CPCs) have tested the reasoning of expert physicians and, recently, artificial intelligence (AI). However, prior AI evaluations have focused on final diagnoses without addressing the multifaceted reasoning and presentation skills required of expert discussants. METHODS: Using 7102 CPCs (1923-2025) and 1021 Image Challenges (2006-2025), we conducted extensive physician annotation and automated processing to create CPC-Bench, a physician-validated benchmark spanning 10 text-based and multimodal tasks, against which we evaluated leading large language models (LLMs). Then, we developed "Dr. CaBot," an AI discussant designed to produce written and slide-based video presentations using only the case presentation, modeling the role of the human expert in these cases. RESULTS: When challenged with 377 contemporary CPCs, o3 (OpenAI) ranked the final diagnosis first in 60% of cases and within the top ten in 84% of cases, outperforming a 20-physician baseline; next-test selection accuracy reached 98%. Event-level physician annotations quantified AI diagnostic accuracy per unit of information. Performance was lower on literature search and image tasks; o3 and Gemini 2.5 Pro (Google) achieved 67% accuracy on image challenges. In blinded comparisons of CaBot vs. human expert-generated text, physicians misclassified the source of the differential in 46 of 62 (74%) trials, and scored CaBot more favorably across quality dimensions. To promote research, we are releasing CaBot and CPC-Bench. CONCLUSIONS: LLMs exceed physician performance on complex text-based differential diagnosis and convincingly emulate expert medical presentations, but image interpretation and literature retrieval remain weaker. CPC-Bench and CaBot may enable transparent and continued tracking of progress in medical AI.
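The "ranked first" and "within the top ten" figures in this abstract are top-k accuracies over ranked differential diagnoses. The abstract does not describe the scoring code, so the sketch below is only an illustration: the function name `top_k_accuracy`, the exact string comparison used to decide a match, and the three toy cases are all assumptions.

```python
def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of cases whose true diagnosis appears in the first k entries.

    ranked_lists: one ranked differential (list of strings) per case
    truths:       the reference final diagnosis per case
    Matching is by exact string equality, a simplifying assumption.
    """
    if len(ranked_lists) != len(truths) or not truths:
        raise ValueError("need one non-empty ranked list per reference diagnosis")
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


# Hypothetical toy cases, not drawn from CPC-Bench.
differentials = [
    ["pneumonia", "tuberculosis", "lung cancer"],
    ["gout", "septic arthritis"],
    ["lupus", "rheumatoid arthritis", "sjogren syndrome"],
]
final_diagnoses = ["pneumonia", "septic arthritis", "vasculitis"]

top1 = top_k_accuracy(differentials, final_diagnoses, k=1)
top2 = top_k_accuracy(differentials, final_diagnoses, k=2)

# The blinded-comparison figure is a plain proportion: 46 of 62 trials.
misclassified_share = round(100 * 46 / 62)
```

Top-k accuracy can only stay the same or rise as k grows, which is why the top-ten figure (84%) exceeds the rank-one figure (60%).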

large language model, machine learning, natural language, (20 more...)

2509.12194

Country: North America > United States > Massachusetts (0.30)

Genre: Research Report (0.94)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications

Lin, Yujun, Zhang, Zhekai, Han, Song

Modern tensor applications, especially foundation models and generative AI applications, require multiple input modalities (both vision and language), which increases the demand for flexible accelerator architectures. Existing frameworks suffer from a trade-off between design flexibility and productivity of RTL generation: they are either limited to very few hand-written templates or cannot automatically generate the RTL. To address this challenge, we propose the LEGO framework, which targets tensor applications and automatically generates spatial architecture designs and outputs synthesizable RTL code without handwritten RTL design templates. Leveraging an affine-transformation-based architecture representation, the LEGO front end finds interconnections between function units, synthesizes the memory system, and fuses different spatial dataflow designs based on data reuse analysis. The LEGO back end then translates the hardware into a primitive-level graph to perform lower-level optimizations, and applies a set of linear-programming algorithms to optimally insert pipeline registers and reduce the overhead of unused logic when switching spatial dataflows. Our evaluation demonstrates that LEGO can achieve 3.2x speedup and 2.4x energy efficiency compared to the previous work, Gemmini, and can generate one architecture for diverse modern foundation models in generative AI applications.

large language model, machine learning, natural language, (21 more...)

doi: 10.1109/HPCA61900.2025.00101

2509.12053

Country: North America > United States (0.28)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

Casini, Luca, Vila, Laura Cros, Dalmazzo, David, Kaila, Anna-Kaisa, Sturm, Bob L. T.

Data-Driven Analysis of Text-Conditioned AI-Generated Music: A Case Study with Suno and Udio

Online AI platforms for creating music from text prompts (AI music), such as Suno and Udio, are now being used by hundreds of thousands of users. Some AI music is appearing in advertising, and even charting, in multiple countries. How are these platforms being used? What subjects are inspiring their users? This article answers these questions for Suno and Udio using a large collection of songs generated by users of these platforms from May to October 2024. Using a combination of state-of-the-art text embedding models, dimensionality reduction and clustering methods, we analyze the prompts, tags and lyrics, and automatically annotate and display the processed data in interactive plots. Our results reveal prominent themes in lyrics, language preference, prompting strategies, as well as peculiar attempts at steering models through the use of metatags. To promote the musicological study of the developing cultural practice of AI-generated music we share our code and resources.

large language model, lyric, machine learning, (22 more...)

2509.11824

Country: Europe (0.93)

Genre: Research Report > New Finding (0.48)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)

DTGen: Generative Diffusion-Based Few-Shot Data Augmentation for Fine-Grained Dirty Tableware Recognition

Hao, Lifei, Cheng, Yue, Huang, Baoqi, Jia, Bing, Zhao, Xuandong

Intelligent tableware cleaning is a critical application in food safety and smart homes, but existing methods are limited by coarse-grained classification and scarcity of few-shot data, making it difficult to meet industrialization requirements. We propose DTGen, a few-shot data augmentation scheme based on generative diffusion models, specifically designed for fine-grained dirty tableware recognition. DTGen achieves efficient domain specialization through LoRA, generates diverse dirty images via structured prompts, and ensures data quality through CLIP-based cross-modal filtering. Under extremely limited real few-shot conditions, DTGen can synthesize virtually unlimited high-quality samples, significantly improving classifier performance and supporting fine-grained dirty tableware recognition. We further elaborate on lightweight deployment strategies, promising to transfer DTGen's benefits to embedded dishwashers and integrate with cleaning programs to intelligently regulate energy consumption and detergent usage. Research results demonstrate that DTGen not only validates the value of generative AI in few-shot industrial vision but also provides a feasible deployment path for automated tableware cleaning and food safety monitoring.

artificial intelligence, machine learning, natural language, (16 more...)

2509.11661

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Energy (0.68)
Health & Medicine (0.55)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.55)
Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

CareerPooler: AI-Powered Metaphorical Pool Simulation Improves Experience and Outcomes in Career Exploration

Wang, Ziyi, Zeng, Ziwen, Li, Yuan, Ding, Zijian

Career exploration is uncertain, requiring decisions with limited information and unpredictable outcomes. While generative AI offers new opportunities for career guidance, most systems rely on linear chat interfaces that produce overly comprehensive and idealized suggestions, overlooking the non-linear and effortful nature of real-world trajectories. We present CareerPooler, a generative AI-powered system that employs a pool-table metaphor to simulate career development as a spatial and narrative interaction. Users strike balls representing milestones, skills, and random events, where hints, collisions, and rebounds embody decision-making under uncertainty. In a within-subjects study with 24 participants, CareerPooler significantly improved engagement, information gain, satisfaction, and career clarity compared to a chatbot baseline. Qualitative findings show that spatial-narrative interaction fosters experience-based learning, resilience through setbacks, and reduced psychological burden. Our findings contribute to the design of AI-assisted career exploration systems and more broadly suggest that visually grounded analogical interactions can make generative systems engaging and satisfying.

large language model, machine learning, natural language, (22 more...)

2509.11461

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry:

Leisure & Entertainment (1.00)
Banking & Finance (0.93)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)