Personal
"I Said Things I Needed to Hear Myself": Peer Support as an Emotional, Organisational, and Sociotechnical Practice in Singapore
Sim, Kellie Yu Hui, Choo, Kenny Tsu Wei
Peer support plays a vital role in expanding access to mental health care by providing empathetic, community-based support outside formal clinical systems. As digital platforms increasingly mediate such support, the design and impact of these technologies remain under-examined, particularly in Asian contexts. This paper presents findings from an interview study with 20 peer supporters in Singapore, who operate across diverse online, offline, and hybrid environments. Through a thematic analysis, we unpack how participants start, conduct, and sustain peer support, highlighting their motivations, emotional labour, and the sociocultural dimensions shaping their practices. Building on this grounded understanding, we surface design directions for culturally responsive digital tools that scaffold rather than supplant relational care. Drawing insights from qualitative accounts, we offer a situated perspective on how AI might responsibly augment peer support. This research contributes to human-centred computing by articulating the lived realities of peer supporters and proposing design implications for trustworthy and context-sensitive AI in mental health.
"Is This Really a Human Peer Supporter?": Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions
Sim, Kellie Yu Hui, Lee, Roy Ka-Wei, Choo, Kenny Tsu Wei
Mental health is a growing global concern, prompting interest in AI-driven solutions to expand access to psychosocial support. Peer support, grounded in lived experience, offers a valuable complement to professional care. However, variability in training, effectiveness, and definitions raises concerns about quality, consistency, and safety. Large Language Models (LLMs) present new opportunities to enhance peer support, particularly in real-time, text-based interactions. We present and evaluate an AI-supported system with an LLM-simulated distressed client, context-sensitive LLM-generated suggestions, and real-time emotion visualisations. Two mixed-methods studies with 12 peer supporters and 5 mental health professionals (i.e., experts) examined the system's effectiveness and implications for practice. Both groups recognised its potential to enhance training and improve interaction quality. However, a key tension emerged: while peer supporters engaged meaningfully, experts consistently flagged critical issues in peer supporter responses, such as missed distress cues and premature advice-giving. This misalignment highlights potential limitations in current peer support training, especially in emotionally charged contexts where safety and fidelity to best practices are essential. Our findings underscore the need for standardised, psychologically grounded training, especially as peer support scales globally. They also demonstrate how LLM-supported systems can scaffold this development, provided they are designed with care and guided by expert oversight. This work contributes to emerging conversations on responsible AI integration in mental health and the evolving role of LLMs in augmenting peer-delivered care.
Discovering Forbidden Topics in Language Models
Rager, Can, Wendler, Chris, Gandikota, Rohit, Bau, David
Refusal discovery is the task of identifying the full set of topics that a language model refuses to discuss. We introduce this new problem setting and develop a refusal discovery method, Iterated Prefill Crawler (IPC), that uses token prefilling to find forbidden topics. We benchmark IPC on Tulu-3-8B, an open-source model with public safety tuning data. Our crawler retrieves 31 out of 36 topics within a budget of 1000 prompts. Next, we scale the crawler to a frontier model using the prefilling option of Claude-Haiku. Finally, we crawl three widely used open-weight models: Llama-3.3-70B and two of its variants finetuned for reasoning, DeepSeek-R1-70B and Perplexity-R1-1776-70B. DeepSeek-R1-70B reveals patterns consistent with censorship tuning: the model exhibits "thought suppression" behavior that indicates memorization of CCP-aligned responses. Although Perplexity-R1-1776-70B is robust to censorship, IPC elicits CCP-aligned refusals in the quantized model. Our findings highlight the critical need for refusal discovery methods to detect biases, boundaries, and alignment failures of AI systems.
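To make the crawling idea concrete, here is a minimal sketch of a prefill-driven crawl loop. It is not the authors' implementation: the prefill string, the prompt template, the parsing rule, and the `generate(prompt, prefill)` callable are all assumptions introduced for illustration, and a toy lookup table stands in for a real model so the sketch runs offline.

```python
from collections import deque
from typing import Callable, List

# Hypothetical prefill: the assistant turn is started for the model, so it
# continues a list instead of producing a refusal.
PREFILL = "Topics I avoid discussing:\n1."


def parse_topics(completion: str) -> List[str]:
    """Split a list-style completion into cleaned, lower-cased topic strings."""
    topics = []
    for line in completion.splitlines():
        # Drop list numbering and bullets such as "2. " or "- ".
        item = line.strip().lstrip("0123456789.-) ").strip().lower()
        if item:
            topics.append(item)
    return topics


def crawl(generate: Callable[[str, str], str], seeds: List[str], budget: int) -> List[str]:
    """Breadth-first crawl: every newly found topic seeds another prompt."""
    queue = deque(seeds)
    seen = set(seeds)
    found: List[str] = []
    calls = 0
    while queue and calls < budget:
        seed = queue.popleft()
        completion = generate(f"Tell me about {seed}.", PREFILL)
        calls += 1
        for topic in parse_topics(completion):
            if topic not in seen:
                seen.add(topic)
                found.append(topic)
                queue.append(topic)
    return found


# Toy stand-in for a model (assumption): maps a seed topic to a continuation.
_TOY = {
    "weapons": " explosives\n2. malware",
    "explosives": " weapons\n2. poisons",
}


def toy_generate(prompt: str, prefill: str) -> str:
    for key, continuation in _TOY.items():
        if key in prompt:
            return continuation
    return ""


if __name__ == "__main__":
    print(crawl(toy_generate, ["weapons"], budget=10))
```

The `budget` argument mirrors the prompt budget mentioned in the abstract; in a real setting `generate` would wrap a model call that supports prefilling the assistant turn.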
Brian Wilson, musical genius behind the Beach Boys, dies at 82
Brian Wilson, the musical savant who scripted a defining Southern California soundtrack with a run of hit songs with the Beach Boys before being pulled down a rabbit hole of despair and depression when his highly anticipated masterwork was shelved unfinished, has died. Wilson's family announced his death Wednesday morning on Facebook. "We are at a loss for words right now," the post said. "Please respect our privacy at this time as our family is grieving. We realize we are sharing our grief with the world," said the statement, also shared on Instagram and the musician's website. The statement didn't reveal a cause of death. Wilson died more than a year after it was revealed he was diagnosed with dementia and placed under a conservatorship in May 2024.
A Survey of Link Prediction in N-ary Knowledge Graphs
Wei, Jiyao, Guan, Saiping, Li, Da, Jin, Xiaolong, Guo, Jiafeng, Cheng, Xueqi
N-ary Knowledge Graphs (NKGs) are a specialized type of knowledge graph designed to efficiently represent complex real-world facts. Unlike traditional knowledge graphs, where a fact typically involves two entities, NKGs can capture n-ary facts containing more than two entities. Link prediction in NKGs aims to predict missing elements within these n-ary facts, which is essential for completing NKGs and improving the performance of downstream applications. This task has recently gained significant attention. In this paper, we present the first comprehensive survey of link prediction in NKGs, providing an overview of the field, systematically categorizing existing methods, and analyzing their performance and application scenarios. We also outline promising directions for future research.
From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
Zhou, Zhanke, Feng, Xiao, Zhu, Zhaocheng, Yao, Jiangchao, Koyejo, Sanmi, Han, Bo
While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast, active reasoning, where an LLM must interact with external systems to acquire missing evidence or data, has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM's active reasoning skills. AR-Bench comprises three task families (detective cases, situation puzzles, and guessing numbers) that together simulate real-world, agentic scenarios and measure performance across commonsense, logical, and symbolic reasoning challenges. Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning: they frequently fail to acquire or leverage the information needed to solve tasks. This gap highlights a stark divergence between their passive and active reasoning abilities. Moreover, ablation studies indicate that even advanced strategies, such as tree-based searching or post-training approaches, yield only modest gains and fall short of the levels required for real-world deployment. Collectively, these findings highlight the critical need to advance methodology for active reasoning, e.g., incorporating interactive learning, real-time feedback loops, and environment-aware objectives for training. The benchmark is publicly available at: https://github.com/tmlr-group/AR-Bench.
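As an illustration of what an active-reasoning loop looks like, the sketch below plays a number-guessing game in which each guess is a query and the reply narrows the remaining hypotheses. The rules are an assumption (a bulls-and-cows style game with distinct digits); the abstract names the task family but does not specify it, and this is not the benchmark's code.

```python
from itertools import permutations
from typing import List, Tuple


def feedback(secret: str, guess: str) -> Tuple[int, int]:
    """Return (digits in the right position, right digits in the wrong position)."""
    exact = sum(s == g for s, g in zip(secret, guess))
    common = sum(min(secret.count(d), guess.count(d)) for d in set(guess))
    return exact, common - exact


def solve(secret: str, length: int = 4) -> List[str]:
    """Ask guesses until the secret is found; the secret must use distinct digits."""
    candidates = ["".join(p) for p in permutations("0123456789", length)]
    asked: List[str] = []
    while candidates:
        guess = candidates[0]            # query the environment
        asked.append(guess)
        reply = feedback(secret, guess)  # acquire the missing evidence
        if reply == (length, 0):
            break
        # Keep only the hypotheses that would have produced the same reply.
        candidates = [c for c in candidates if feedback(c, guess) == reply]
    return asked


if __name__ == "__main__":
    history = solve("5830")
    print(len(history), history[-1])
```

A passive version of the same task would hand the solver all replies up front; here the solver has to decide what to ask next, which is the skill the benchmark targets.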
A Computer Wrote My Mother's Obituary
The funeral director said "AI" as if it were a normal element of memorial services, like caskets or flowers. Of all places, I had not expected artificial intelligence to follow me into the small, windowless room of the mortuary. But here it was, ready to assist me in the task of making sense of death. It was already Wednesday, and I'd just learned that I had to write an obituary for my mother by Thursday afternoon if I wanted it to run in Sunday's paper. AI could help me do this.
NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
Yagoubi, Mouadh, Dahou, Yasser, Mokeddem, Billel, Belkada, Younes, Le-Khac, Phuc H., Boussaha, Basma El Amel, Alami, Reda, Zuo, Jingwei, Marsili, Damiano, Farooq, Mugariya, Lalmas, Mounia, Gkioxari, Georgia, Gallinari, Patrick, Torr, Philip, Hacid, Hakim
Existing benchmarks have proven effective for assessing the performance of fully trained large language models. However, we find striking differences in the early training stages of small models, where benchmarks often fail to provide meaningful or discriminative signals. To explore how these differences arise, this competition tackles the challenge of designing scientific knowledge evaluation tasks specifically tailored for measuring early training progress of language models. Participants are invited to develop novel evaluation methodologies or adapt existing benchmarks to better capture performance differences among language models. To support this effort, we provide three pre-trained small models (0.5B, 1B, and 3B parameters), along with intermediate checkpoints sampled during training up to 200B tokens. All experiments and development work can be run on widely available free cloud-based GPU platforms, making participation accessible to researchers with limited computational resources. Submissions will be evaluated based on three criteria: the quality of the performance signal they produce, the consistency of model rankings at 1 trillion tokens of training, and their relevance to the scientific knowledge domain. By promoting the design of tailored evaluation strategies for early training, this competition aims to attract a broad range of participants from various disciplines, including those who may not be machine learning experts or have access to dedicated GPU resources. Ultimately, this initiative seeks to make foundational LLM research more systematic and benchmark-informed from the earliest phases of model development.
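One plausible way to quantify "consistency of model rankings" is a rank correlation between scores at an early checkpoint and scores at a later one. The sketch below computes Kendall's tau (the tau-a variant) in plain Python. The choice of metric, the model labels, and the scores are assumptions for illustration only; the abstract does not state how the competition measures consistency.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(early: Dict[str, float], late: Dict[str, float]) -> float:
    """Kendall's tau-a over the models present in both score tables."""
    models = sorted(set(early) & set(late))
    concordant = discordant = 0
    for a, b in combinations(models, 2):
        sign = (early[a] - early[b]) * (late[a] - late[b])
        if sign > 0:
            concordant += 1   # the pair is ordered the same way at both checkpoints
        elif sign < 0:
            discordant += 1   # the pair swaps order between checkpoints
    pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


if __name__ == "__main__":
    # Placeholder scores, invented for the example.
    early_scores = {"small": 0.26, "medium": 0.29, "large": 0.33}
    late_scores = {"small": 0.41, "medium": 0.47, "large": 0.58}
    print(kendall_tau(early_scores, late_scores))
```

A value near 1 means the benchmark orders the models the same way early and late in training; a value near 0 means the early signal says little about the later ranking.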
Human and AI collaboration in Fitness Education: A Longitudinal Study with a Pilates Instructor
Artificial intelligence is poised to transform teaching and coaching practices, yet its optimal role alongside human expertise remains unclear. This study investigates human and AI collaboration in fitness education through a one-year qualitative case study with a Pilates instructor. The researcher participated in the instructor's classes and conducted biweekly semi-structured interviews to explore how generative AI could be integrated into class planning and instruction.
Congratulations to the #IJCAI2025 award winners
The winners of three International Joint Conferences on Artificial Intelligence (IJCAI) awards have been announced. These three distinctions are: the Award for Research Excellence, the Computers and Thought Award and the John McCarthy Award. The Research Excellence award is given to a scientist who has carried out a program of research of consistently high quality throughout an entire career yielding several substantial results. The winner of the 2025 Award for Research Excellence is Rina Dechter, Distinguished Professor of Computer Science, University of California, Irvine, USA. Professor Dechter is recognized for her seminal contributions to the fields of constraint satisfaction and probabilistic inference, including novel algorithmic frameworks, modeling ideas, complexity analyses, and unifying principles.