
 Generative AI


Automatic generation of DRI Statements

arXiv.org Artificial Intelligence

Assessing the quality of group deliberation is essential for improving our understanding of deliberative processes. The Deliberative Reason Index (DRI) offers a sophisticated metric for evaluating group reasoning, but its implementation has been constrained by the complex and time-consuming process of statement generation. This thesis introduces an innovative, automated approach to DRI statement generation that leverages advanced natural language processing (NLP) and large language models (LLMs) to substantially reduce the human effort involved in survey preparation. Key contributions are a systematic framework for automated DRI statement generation and a methodological innovation that significantly lowers the barrier to conducting comprehensive deliberative process assessments. In addition, the findings provide a replicable template for integrating generative artificial intelligence into social science research methodologies.


AI-Driven Robotics for Optics

arXiv.org Artificial Intelligence

Optics is foundational to research in many areas of science and engineering, including nanophotonics, quantum information, materials science, biomedical imaging, and metrology. However, the design, assembly, and alignment of optical experiments remain predominantly manual, limiting throughput and reproducibility. Automating such experiments is challenging due to the strict, non-negotiable precision requirements and the diversity of optical configurations found in typical laboratories. Here, we introduce a platform that integrates generative artificial intelligence, computer vision, and robotics to automate free-space optical experiments. The platform translates user-defined goals into valid optical configurations, assembles them using a robotic arm, and performs micrometer-scale fine alignment using a robot-deployable tool. It then executes a range of automated measurements, including beam characterization, polarization mapping, and spectroscopy, with consistency surpassing that of human operators. This work demonstrates the first flexible, AI-driven automation platform for optics, offering a path towards remote operation, cloud labs, and high-throughput discovery in the optical sciences.


Output Supervision Can Obfuscate the Chain of Thought

arXiv.org Artificial Intelligence

OpenAI (2025a) showed that training against a chain of thought (CoT) monitor can cause obfuscated CoTs, which contain bad behavior the monitor cannot detect. They proposed to keep CoTs monitorable by training only against output monitors that do not have access to CoT. We show that such training can still cause obfuscated CoTs via two mechanisms. First, when a model is trained to produce a safe-looking output, that model may generalize to making its CoTs look safe. Second, since later tokens are conditioned on earlier ones, safe-looking CoTs may increase the likelihood of safe outputs, causing safe-looking CoTs to be reinforced. We introduce two mitigations to address these two issues, which achieve a Pareto improvement in terms of monitorability and task performance compared to regular training. Our work implies that preserving CoT monitorability is more difficult than previously thought; we suggest practical guidelines for AI developers to maintain monitorable CoTs.

Reasoning models, which generate a CoT before their final answer, dominate mathematics and coding benchmarks (Guo et al., 2025; OpenAI, 2025b; Qwen Team, 2025). Korbak et al. (2025) argue that AI developers should aim to maintain the ability to detect model misbehavior by analyzing CoTs ("CoT monitorability"), claiming that it offers a "unique opportunity for AI safety". Baker et al. (2025) show that training frontier models against a CoT monitor can cause their CoTs to become obfuscated: unwanted behaviors persist in a way that is undetectable by the monitor. To address this, they recommend against directly optimizing the CoT. However, we show that even training against a monitor that sees only the final output (and not the CoT) can still cause obfuscated CoT. We refer to this effect as feedback spillover, and show that it can occur for two reasons.

Figure 1: Feedback spillover: training against an output monitor obfuscates the CoT.

To mitigate parametric feedback spillover, we generate the CoT and output using two distinct models, which we refer to as the "Mind" and "Face" respectively, as proposed by Kokotajlo & Demski (2024).
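The data flow of the Mind/Face split can be pictured with a minimal sketch. Everything here is a hypothetical stand-in (`mind`, `face`, `output_monitor`, `monitor_feedback` are illustrative names, not the authors' code), and it assumes that the output monitor's score is used to update only the Face, so that score has no direct parameter path into the model that writes the CoT.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt: str
    cot: str     # written by the Mind
    output: str  # written by the Face, conditioned on the CoT


def mind(prompt: str) -> str:
    # Hypothetical stand-in for the CoT-generating model.
    return f"Step-by-step reasoning about: {prompt}"


def face(prompt: str, cot: str) -> str:
    # Hypothetical stand-in for the output-generating model; it may read the CoT.
    return f"Final answer for: {prompt}"


def output_monitor(output: str) -> float:
    # Hypothetical output monitor: it receives only the final output, never the CoT.
    return 0.0 if "forbidden" in output.lower() else 1.0


def generate(prompt: str) -> Rollout:
    cot = mind(prompt)
    return Rollout(prompt, cot, face(prompt, cot))


def monitor_feedback(rollout: Rollout) -> dict[str, float]:
    # Assumption: the monitor's score is credited to the Face's parameters only,
    # so training against it cannot directly reshape the Mind's CoT.
    return {"face": output_monitor(rollout.output)}
```

This only illustrates the parametric route; the second mitigation mentioned in the abstract, for the conditioning route, is not sketched.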


OpenAI's Fidji Simo Plans to Make ChatGPT Way More Useful--and Have You Pay For It

WIRED

As OpenAI expands in every direction, the new CEO of Applications is on a mission to make ChatGPT indispensable and lucrative. In case OpenAI's structure couldn't get any weirder--a nonprofit in charge of a for-profit that's become a public benefit corporation--it now has two CEOs. There's Sam Altman, chief executive of the whole company, who manages research and compute. And as of this summer, there's Fidji Simo, the former CEO of Instacart, who manages everything else. Simo hasn't been seen much at OpenAI's San Francisco office since she began as CEO of Applications in August. But her presence is felt at every level of the company--not least because she's heading up ChatGPT and basically every function that might make OpenAI money. Simo is dealing with a relapse of postural orthostatic tachycardia syndrome (POTS) that makes her prone to fainting if she stands for long periods of time. "Being present from 8 am to midnight every day, responding within five minutes, people feel like I'm there and that they can reach me immediately, that I jump on the phone within five minutes," she tells me. Employees confirm that this is true. OpenAI's famously Slack-driven culture can be overwhelming for new hires. Employees say she is often seen popping into channels and threads, sharing thoughts and asking questions.


Generative Artificial Intelligence Adoption Among Bangladeshi Journalists: Exploring Journalists' Awareness, Acceptance, Usage, and Organizational Stance on Generative AI

arXiv.org Artificial Intelligence

Newsrooms and journalists across the world are adopting Generative AI (GenAI). Drawing on in-depth interviews with 23 journalists, this study identifies Bangladeshi journalists' awareness, acceptance, usage patterns, and their media organizations' stance toward GenAI. This study finds that Bangladeshi journalists, like their Western colleagues, rely heavily on GenAI, despite limited institutional support and the near absence of AI policy. Despite this contrast, concerns about GenAI's implications for journalism were mostly identical in Western and non-Western settings. Moreover, this study contributes to the Unified Theory of Acceptance and Use of Technology (UTAUT) by proposing two changes regarding GenAI adoption among journalists in non-Western settings. First, this study finds that facilitating conditions do not contribute to shaping behavioral intention to adopt GenAI in non-Western contexts. Second, social influence operates horizontally, through informal peer pressure or professional motivation, in the absence of formal, hierarchical institutional pressure. Voluntariness in the context of Bangladeshi journalists is underpinned by their professional compulsion. Therefore, this study contributes to understanding how contextual factors shape technology adoption trajectories in non-Western journalism.


When to Stop Federated Learning: Zero-Shot Generation of Synthetic Validation Data with Generative AI for Early Stopping

arXiv.org Artificial Intelligence

Federated Learning (FL) enables collaborative model training across decentralized devices while preserving data privacy. However, FL methods typically run for a predefined number of global rounds, often leading to unnecessary computation when optimal performance is reached earlier. In addition, training may continue even when the model fails to achieve meaningful performance. To address this inefficiency, we introduce a zero-shot synthetic validation framework that leverages generative AI to monitor model performance and determine early stopping points. Our approach adaptively stops training near the optimal round, thereby conserving computational resources and enabling rapid hyperparameter adjustments. Numerical results on multi-label chest X-ray classification demonstrate that our method reduces training rounds by up to 74% while maintaining accuracy within 1% of the optimal.
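The stopping decision itself can be sketched as follows, assuming the server already has one validation score per global round (in the paper these come from synthetic, zero-shot generated validation data; here they are just a list of made-up numbers). The patience rule and the `patience`/`min_delta` parameters are a common early-stopping heuristic chosen for illustration, not necessarily the authors' criterion.

```python
def early_stop_round(
    val_scores: list[float], patience: int = 3, min_delta: float = 0.0
) -> tuple[int, int]:
    """Return (stop_round, best_round), both 1-indexed.

    Training stops once `patience` consecutive rounds fail to beat the best
    score so far by more than `min_delta`. If that never happens, training
    runs to the last round.
    """
    best = float("-inf")
    best_round = 0
    stale_rounds = 0
    for rnd, score in enumerate(val_scores, start=1):
        if score > best + min_delta:
            best, best_round = score, rnd
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                return rnd, best_round
    return len(val_scores), best_round


# Hypothetical per-round scores for a 10-round budget.
scores = [0.60, 0.70, 0.75, 0.76, 0.755, 0.758, 0.754, 0.75, 0.74, 0.73]
stop, best = early_stop_round(scores, patience=3)
```

Here training would stop at round 7 (keeping the round-4 model) instead of running all 10 rounds, saving 3 of 10 rounds; larger patience trades fewer premature stops for less savings.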


Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy

arXiv.org Artificial Intelligence

This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's Taxonomy. A small dataset of 600 sentences labeled with six cognitive categories (Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation) was processed using traditional machine learning (ML) models (Naive Bayes, Logistic Regression, Support Vector Machines), recurrent neural network architectures (LSTM, BiLSTM, GRU, BiGRU), transformer-based models (BERT and RoBERTa), and large language models (OpenAI, Gemini, Ollama, Anthropic). Each model was evaluated under different preprocessing and augmentation strategies, such as synonym replacement and word embeddings. Among traditional ML approaches, Support Vector Machines (SVM) with data augmentation achieved the best overall performance, reaching 94 percent accuracy, recall, and F1 scores with minimal overfitting. In contrast, the RNN models and BERT suffered from severe overfitting, while RoBERTa initially avoided it but began to show signs of overfitting as training progressed. Finally, zero-shot evaluations of large language models (LLMs) indicated that OpenAI and Gemini performed best among the tested LLMs, achieving approximately 0.72-0.73 accuracy and comparable F1 scores. These findings highlight the challenges of training complex deep models on limited data and underscore the value of careful data augmentation and simpler algorithms (such as augmented SVM) for Bloom's Taxonomy classification.
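Synonym replacement, one of the augmentation strategies mentioned, can be sketched as below. The tiny `SYNONYMS` table and the `augment` helper are hypothetical; the paper does not say which synonym source it used (a real pipeline might draw on a thesaurus such as WordNet). Each variant is meant to keep the original sentence's Bloom label.

```python
import random

# Hypothetical, hand-written synonym table for illustration only.
SYNONYMS = {
    "describe": ["explain", "outline"],
    "compare": ["contrast"],
    "list": ["enumerate", "name"],
}


def augment(sentence: str, n_variants: int = 2, p: float = 0.5, seed: int = 0) -> list[str]:
    """Create label-preserving variants by swapping words for listed synonyms.

    Each word that has an entry in SYNONYMS is replaced with probability `p`.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    variants = []
    for _ in range(n_variants):
        words = []
        for word in sentence.split():
            options = SYNONYMS.get(word.lower())
            if options and rng.random() < p:
                words.append(rng.choice(options))
            else:
                words.append(word)
        variants.append(" ".join(words))
    return variants


# Three variants of one "Comprehension"-style question, always swapping when possible.
variants = augment("Describe the water cycle", n_variants=3, p=1.0)
```

Keeping the augmentation this simple matches the abstract's point that, on a 600-sentence dataset, cheap augmentation plus an SVM can outperform deeper models.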


Sabiá: A Generative Artificial Intelligence Chatbot for Day-to-Day Support in Higher Education

arXiv.org Artificial Intelligence

Students often report difficulties in accessing day-to-day academic information, which is usually spread across numerous institutional documents and websites. This fragmentation results in a lack of clarity and causes confusion about routine university information. This project proposes the development of a chatbot using Generative Artificial Intelligence (GenAI) and Retrieval-Augmented Generation (RAG) to simplify access to such information. Several GenAI models were tested and evaluated based on quality metrics and the LLM-as-a-Judge approach. Among them, Gemini 2.0 Flash stood out for its quality and speed, and Gemma 3n for its good performance and open-source nature.
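The retrieval half of a RAG pipeline can be sketched with bag-of-words cosine similarity. This is a deliberately simple stand-in for the embedding-based retrieval a production chatbot would typically use, and the three documents are invented examples, not the project's corpus. The top-ranked passages are placed into the prompt that would be sent to the generative model (for instance Gemini 2.0 Flash or Gemma 3n).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Ground the model's answer in the retrieved passages only.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical institutional snippets.
docs = [
    "Course enrollment opens in the first week of each semester.",
    "The library is open from 8 am to 10 pm.",
    "Exam schedules are published on the department website.",
]
prompt = build_prompt("When does course enrollment open?", docs, k=1)
```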


Do AI Voices Learn Social Nuances? A Case of Politeness and Speech Rate

arXiv.org Artificial Intelligence

Voice-based artificial intelligence is increasingly expected to adhere to human social conventions, but can it learn implicit cues that are not explicitly programmed? This study investigates whether state-of-the-art text-to-speech systems have internalized the human tendency to reduce speech rate to convey politeness, a non-obvious prosodic marker. We prompted 22 synthetic voices from two leading AI platforms (AI Studio and OpenAI) to read a fixed script under both "polite and formal" and "casual and informal" conditions and measured the resulting speech duration. Across both AI platforms, the polite prompt produced slower speech than the casual prompt with very large effect sizes, an effect that was statistically significant for all of AI Studio's voices and for a large majority of OpenAI's voices. These results demonstrate that AI can implicitly learn and replicate psychological nuances of human communication, highlighting its emerging role as a social actor capable of reinforcing human social norms.
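The abstract reports very large effect sizes without naming the measure. As an illustration only, the sketch below computes Cohen's d with a pooled standard deviation for one voice's repeated readings under the two prompts; both the choice of Cohen's d and the durations are assumptions, not the study's data. A positive d means the polite readings took longer, i.e. slower speech.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(polite: list[float], casual: list[float]) -> float:
    """Cohen's d for two independent samples, using the pooled sample variance."""
    n1, n2 = len(polite), len(casual)
    if n1 < 2 or n2 < 2:
        raise ValueError("each condition needs at least two readings")
    pooled_var = ((n1 - 1) * variance(polite) + (n2 - 1) * variance(casual)) / (n1 + n2 - 2)
    return (mean(polite) - mean(casual)) / sqrt(pooled_var)


# Hypothetical reading durations in seconds for a single voice.
polite_durations = [31.0, 32.0, 33.0]
casual_durations = [27.0, 28.0, 29.0]
d = cohens_d(polite_durations, casual_durations)
```

With these made-up numbers the means differ by 4 s and the pooled SD is 1 s, so d is 4.0, far above the conventional 0.8 threshold for a "large" effect.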


Wage Sentiment Indices Derived from Survey Comments via Large Language Models

arXiv.org Artificial Intelligence

The emergence of generative Artificial Intelligence (AI) has created new opportunities for economic text analysis. This study proposes a Wage Sentiment Index (WSI) constructed with Large Language Models (LLMs) to forecast wage dynamics in Japan. The analysis is based on the Economy Watchers Survey (EWS), a monthly survey conducted by the Cabinet Office of Japan that captures real-time economic assessments from workers in industries highly sensitive to business conditions. The WSI extends the framework of the Price Sentiment Index (PSI) used in prior studies, adapting it specifically to wage-related sentiment. To ensure scalability and adaptability, a data architecture is also developed that enables integration of additional sources such as newspapers and social media. Experimental results demonstrate that WSI models based on LLMs significantly outperform both baseline approaches and pretrained models. These findings highlight the potential of LLM-driven sentiment indices to enhance the timeliness and effectiveness of economic policy design by governments and central banks.
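The abstract does not give the WSI formula. One common way to aggregate per-comment sentiment into a monthly index is a balance-style measure (share of positive minus share of negative comments). The sketch below assumes that definition and assumes an LLM has already labeled each survey comment as +1 (wages rising), 0 (neutral), or -1 (wages falling); `monthly_index` and the labeling scheme are hypothetical.

```python
from collections import defaultdict


def monthly_index(labels: list[tuple[str, int]]) -> dict[str, float]:
    """Map (month, label) pairs to a balance index in [-100, 100] per month.

    label: +1 = wages expected to rise, -1 = fall, 0 = neutral (assumed scheme).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [positive, negative, total]
    for month, label in labels:
        if label not in (-1, 0, 1):
            raise ValueError(f"unexpected label {label!r}")
        c = counts[month]
        c[0] += label == 1
        c[1] += label == -1
        c[2] += 1
    return {m: 100.0 * (pos - neg) / total for m, (pos, neg, total) in sorted(counts.items())}


# Hypothetical LLM-labeled comments.
labeled = [
    ("2024-01", 1), ("2024-01", 1), ("2024-01", 0), ("2024-01", -1),
    ("2024-02", -1), ("2024-02", 0),
]
index = monthly_index(labeled)
```

A monthly series like this could then be used as a predictor of wage statistics, which is the forecasting role the WSI plays in the study.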