Personal
The Ethical Compass of the Machine: Evaluating Large Language Models for Decision Support in Construction Project Management
Azie, Somtochukwu, Meng, Yiping
The integration of Artificial Intelligence (AI) into construction project management (CPM) is accelerating, with Large Language Models (LLMs) emerging as accessible decision-support tools. This study aims to critically evaluate the ethical viability and reliability of LLMs when applied to the ethically sensitive, high-risk decision-making contexts inherent in CPM. A mixed-methods research design was employed, involving the quantitative performance testing of two leading LLMs against twelve real-world ethical scenarios using a novel Ethical Decision Support Assessment Checklist (EDSAC), and qualitative analysis of semi-structured interviews with 12 industry experts to capture professional perceptions. The findings reveal that while LLMs demonstrate adequate performance in structured domains such as legal compliance, they exhibit significant deficiencies in handling contextual nuance, ensuring accountability, and providing transparent reasoning. Stakeholders expressed considerable reservations regarding the autonomous use of AI for ethical judgments, strongly advocating for robust human-in-the-loop oversight. To our knowledge, this is one of the first studies to empirically test the ethical reasoning of LLMs within the construction domain. It introduces the EDSAC framework as a replicable methodology and provides actionable recommendations, emphasising that LLMs are currently best positioned as decision-support aids rather than autonomous ethical agents.
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Chen, Runjin, Arditi, Andy, Sleight, Henry, Evans, Owain, Lindsey, Jack
Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space-persona vectors-underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant's personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.
DiaCBT: A Long-Periodic Dialogue Corpus Guided by Cognitive Conceptualization Diagram for CBT-based Psychological Counseling
Zhou, Yougen, Zhou, Ningning, Chen, Qin, Zhou, Jie, Zhou, Aimin, He, Liang
Psychotherapy reaches only a small fraction of individuals suffering from mental disorders due to social stigma and the limited availability of therapists. Large language models (LLMs), when equipped with professional psychotherapeutic skills, offer a promising solution to expand access to mental health services. However, the lack of psychological conversation datasets presents significant challenges in developing effective psychotherapy-guided conversational agents. In this paper, we construct a long-periodic dialogue corpus for counseling based on cognitive behavioral therapy (CBT). Our curated dataset includes multiple sessions for each counseling and incorporates cognitive conceptualization diagrams (CCDs) to guide client simulation across diverse scenarios. To evaluate the utility of our dataset, we train an in-depth counseling model and present a comprehensive evaluation framework to benchmark it against established psychological criteria for CBT-based counseling. Results demonstrate that DiaCBT effectively enhances LLMs' ability to emulate psychologists with CBT expertise, underscoring its potential for training more professional counseling agents.
Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions
Kang, Minwoo, Moon, Suhong, Lee, Seung Hyeong, Raj, Ayush, Suh, Joseph, Chan, David M., Canny, John
Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses to various surveys and polls. However, the questions in these surveys usually reflect socially understood attitudes: the patterns of attitudes of old/young, liberal/conservative, as understood by both members and non-members of those groups. It is not clear whether the LLM binding is \emph{deep}, meaning the LLM answers as a member of a particular in-group would, or \emph{shallow}, meaning the LLM responds as an out-group member believes an in-group member would. To explore this difference, we use questions that expose known in-group/out-group biases. This level of fidelity is critical for applying LLMs to various political science studies, including timely topics on polarization dynamics, inter-group conflict, and democratic backsliding. To this end, we propose a novel methodology for constructing virtual personas with synthetic user "backstories" generated as extended, multi-turn interview transcripts. This approach is justified by the theory of \emph{narrative identity} which argues that personality at the highest level is \emph{constructed} from self-narratives. Our generated backstories are longer, rich in detail, and consistent in authentically describing a singular individual, compared to previous methods. We show that virtual personas conditioned on our backstories closely replicate human response distributions (up to an 87% improvement as measured by Wasserstein Distance) and produce effect sizes that closely match those observed in the original studies of in-group/out-group biases. Altogether, our work extends the applicability of LLMs beyond estimating socially understood responses, enabling their use in a broader range of human studies.
Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025
Qiu, Jiahao, Shi, Jingzhe, Juan, Xinzhe, Zhao, Zelin, Geng, Jiayi, Liu, Shilong, Wang, Hongru, Wu, Sanfeng, Wang, Mengdi
Physics provides fundamental laws that describe and predict the natural world. AI systems aspiring toward more general, real-world intelligence must therefore demonstrate strong physics problem-solving abilities: to formulate and apply physical laws for explaining and predicting physical processes. The International Physics Olympiad (IPhO)--the world's most prestigious physics competition--offers a rigorous benchmark for this purpose. We introduce Physics Supernova, an AI agent system with superior physics problem-solving abilities that match elite IPhO gold medalists. In IPhO 2025 theory problems, Physics Supernova attains 23.5/30 points, ranking 14th of 406 contestants and surpassing the median performance of human gold medalists. We extensively analyzed Physics Supernova's capabilities and flexibility across diverse physics tasks. These results show that principled tool integration within agent systems can deliver competitive improvements in solving challenging science problems. The codes are available at https://github.com/CharlesQ9/Physics-Supernova.
Structure and Destructure: Dual Forces in the Making of Knowledge Engines
The making of knowledge engines in natural language processing has been shaped by two seemingly distinct paradigms: one grounded in structure, the other driven by massively available unstructured data. The structured paradigm leverages predefined symbolic interactions, such as knowledge graphs, as priors and designs models to capture them. In contrast, the unstructured paradigm centers on scaling transformer architectures with increasingly vast data and model sizes, as seen in modern large language models. Despite their divergence, this thesis seeks to establish conceptual connections bridging these paradigms. Two complementary forces, structure and destructure, emerge across both paradigms: structure organizes seen symbolic interactions, while destructure, through periodic embedding resets, improves model plasticity and generalization to unseen scenarios. These connections form a new recipe for developing general knowledge engines that can support transparent, controllable, and adaptable intelligent systems.
Oversight Structures for Agentic AI in Public-Sector Organizations
Schmitz, Chris, Rystrรธm, Jonathan, Batzner, Jan
This paper finds that the introduction of agentic AI systems intensifies existing challenges to traditional public sector oversight mechanisms -- which rely on siloed compliance units and episodic approvals rather than continuous, integrated supervision. We identify five governance dimensions essential for responsible agent deployment: cross-departmental implementation, comprehensive evaluation, enhanced security protocols, operational visibility, and systematic auditing. We evaluate the capacity of existing oversight structures to meet these challenges, via a mixed-methods approach consisting of a literature review and interviews with civil servants in AI-related roles. We find that agent oversight poses intensified versions of three existing governance challenges: continuous oversight, deeper integration of governance and operational capabilities, and interdepartmental coordination. We propose approaches that both adapt institutional structures and design agent oversight compatible with public sector constraints.
Onion CEO Ben Collins Hasn't Given Up on Print--or Buying Infowars
Onion CEO Ben Collins Hasn't Given Up on Print--or Buying Infowars A year after relaunching The Onion as a newspaper, Collins visits to talk about why "going into something and not ruining it is bravery." Ben Collins made a big bet. A year ago, just a few months after he'd been named CEO of The Onion, he relaunched its print edition. Once a favorite on university campuses, The Onion hadn't published a physical issue since 2013 . Common wisdom said that readership, and advertising dollars, just weren't there for newspapers. But Collins, a fan of the satirical paper since childhood, thought "that's dumb." Readers celebrated The Onion's relaunch and the ability to read all of its bitingly funny headlines on a single broadsheet. Collins wouldn't give exact numbers on how many people are currently subscribed to the print edition but did say they should be enough to keep its writers' room humming (a few weeks after we taped this episode, the Wall Street Journal reported that The Onion now boasts more than 53,000 paying subscribers). On this episode of, I spoke with Collins about his hopes for The Onion, the future of journalism, and his Balatro addiction. KATIE DRUMMOND: Do you have a recent favorite Onion headline? Can I look it up for you? "Ghislaine Maxwell Can't Help but Notice Interview Room Covered in Plastic Sheeting." The staff churns out like 15 a day that are great. I sit there, and I still don't know how they do it. When I say they throw away eight or nine of the best sentences I would ever write every day, I mean that sincerely.
Estimated Informed Anytime Search for Sampling-Based Planning via Adaptive Sampler
Zhang, Liding, Cai, Kuanqi, Zhang, Yu, Bing, Zhenshan, Wang, Chaoqun, Wu, Fan, Haddadin, Sami, Knoll, Alois
Path planning in robotics often involves solving continuously valued, high-dimensional problems. Popular informed approaches include graph-based searches, such as A*, and sampling-based methods, such as Informed RRT*, which utilize informed set and anytime strategies to expedite path optimization incrementally. Informed sampling-based planners define informed sets as subsets of the problem domain based on the current best solution cost. However, when no solution is found, these planners re-sample and explore the entire configuration space, which is time-consuming and computationally expensive. This article introduces Multi-Informed Trees (MIT*), a novel planner that constructs estimated informed sets based on prior admissible solution costs before finding the initial solution, thereby accelerating the initial convergence rate. Moreover, MIT* employs an adaptive sampler that dynamically adjusts the sampling strategy based on the exploration process. Furthermore, MIT* utilizes length-related adaptive sparse collision checks to guide lazy reverse search. These features enhance path cost efficiency and computation times while ensuring high success rates in confined scenarios. Through a series of simulations and real-world experiments, it is confirmed that MIT* outperforms existing single-query, sampling-based planners for problems in R^4 to R^16 and has been successfully applied to real-world robot manipulation tasks. A video showcasing our experimental results is available at: https://youtu.be/30RsBIdexTU
Documenting Deployment with Fabric: A Repository of Real-World AI Governance
Jorgensen, Mackenzie, Brogle, Kendall, Collins, Katherine M., Ibrahim, Lujain, Shah, Arina, Ivanovic, Petra, Broestl, Noah, Piles, Gabriel, Dongha, Paul, Abdulhussein, Hatim, Weller, Adrian, Powers, Jillian, Bhatt, Umang
Artificial intelligence (AI) is increasingly integrated into society, from financial services and traffic management to creative writing. Academic literature on the deployment of AI has mostly focused on the risks and harms that result from the use of AI. We introduce Fabric, a publicly available repository of deployed AI use cases to outline their governance mechanisms. Through semi-structured interviews with practitioners, we collect an initial set of 20 AI use cases. In addition, we co-design diagrams of the AI workflow with the practitioners. We discuss the oversight mechanisms and guardrails used in practice to safeguard AI use. The Fabric repository includes visual diagrams of AI use cases and descriptions of the deployed systems. Using the repository, we surface gaps in governance and find common patterns in human oversight of deployed AI systems. We intend for Fabric to serve as an extendable, evolving tool for researchers to study the effectiveness of AI governance.