Sports and Women's Sports: Gender Bias in Text Generation with Olympic Data
Large Language Models (LLMs) have been shown to be biased in prior work, as they generate text that is in line with stereotypical views of the world or that is not representative of the viewpoints and values of historically marginalized demographic groups. In this work, we propose using data from parallel men's and women's events at the Olympic Games to investigate different forms of gender bias in language models. We define three metrics to measure bias, and find that models are consistently biased against women when the gender is ambiguous in the prompt. In this case, the model frequently retrieves only the results of the men's event with or without acknowledging them as such, revealing pervasive gender bias in LLMs in the context of athletics.
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
Ye, Zhen, Zhu, Xinfa, Chan, Chi-Min, Wang, Xinsheng, Tan, Xu, Lei, Jiahe, Peng, Yi, Liu, Haohe, Jin, Yizhu, DAI, Zheqi, Lin, Hongzhan, Chen, Jianyi, Du, Xingjian, Xue, Liumeng, Chen, Yunlin, Li, Zhifei, Xie, Lei, Kong, Qiuqiang, Guo, Yike, Xue, Wei
Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time compute. However, current state-of-the-art TTS systems leveraging LLMs are often multi-stage, requiring separate models (e.g., diffusion models after the LLM), which complicates the decision of whether to scale a particular model during training or testing. This work makes the following contributions: First, we explore the scaling of train-time and inference-time compute for speech synthesis. Second, we propose a simple framework, Llasa, for speech synthesis that employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align with standard LLMs such as Llama. Our experiments reveal that scaling train-time compute for Llasa consistently improves the naturalness of synthesized speech and enables the generation of more complex and accurate prosody patterns. Furthermore, from the perspective of scaling inference-time compute, we employ speech understanding models as verifiers during the search, finding that scaling inference-time compute shifts the sampling modes toward the preferences of specific verifiers, thereby improving emotional expressiveness, timbre consistency, and content accuracy. In addition, we have made the checkpoints and training code for our TTS models (1B, 3B, 8B) and codec model publicly available.
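The idea of using a verifier to guide inference-time search can be illustrated with best-of-N sampling, one of the simplest search strategies. The sketch below is an assumption made for illustration, not the search procedure described in the Llasa paper: `best_of_n`, `toy_generate`, and `toy_verifier` are hypothetical names, the "samples" are random numbers standing in for synthesized speech, and the verifier is a stand-in for a speech understanding model.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[random.Random], T],
    verifier: Callable[[T], float],
    n: int,
    seed: int = 0,
) -> Tuple[T, float]:
    """Draw n candidates and keep the one the verifier scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best_candidate = generate(rng)
    best_score = verifier(best_candidate)
    for _ in range(n - 1):
        candidate = generate(rng)
        score = verifier(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score


def toy_generate(rng: random.Random) -> float:
    # Hypothetical stand-in for sampling one utterance from a TTS model.
    return rng.random()


def toy_verifier(sample: float) -> float:
    # Hypothetical stand-in for a verifier (e.g. a content-accuracy score).
    return sample


if __name__ == "__main__":
    # More samples means more inference-time compute spent per output.
    for n in (1, 4, 16):
        _, score = best_of_n(toy_generate, toy_verifier, n)
        print(n, round(score, 3))
```

Because the selected output is whichever candidate the verifier prefers, increasing `n` pushes results toward that verifier's preferences, which is consistent with the shift in sampling modes the abstract reports.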
Innovative Framework for Early Estimation of Mental Disorder Scores to Enable Timely Interventions
Singh, Himanshi, Tiwari, Sadhana, Agarwal, Sonali, Chandra, Ritesh, Sonbhadra, Sanjay Kumar, Singh, Vrijendra
Individuals' general well-being is greatly impacted by mental health conditions including depression and Post-Traumatic Stress Disorder (PTSD), underscoring the importance of early detection and precise diagnosis in order to facilitate prompt clinical intervention. An advanced multimodal deep learning system for the automated classification of PTSD and depression is presented in this paper. Utilizing textual and audio data from clinical interview datasets, the method fuses features from both modalities by combining the architectures of LSTM (Long Short-Term Memory) and BiLSTM (Bidirectional Long Short-Term Memory). Text features focus on speech's semantic and grammatical components, while audio features capture vocal traits including rhythm, tone, and pitch. This combination of modalities enhances the model's capacity to identify subtle patterns connected to mental health conditions. Using test datasets, the proposed method achieves classification accuracies of 92% for depression and 93% for PTSD, outperforming traditional unimodal approaches and demonstrating its accuracy and robustness. In addition to lowering people's quality of life, such illnesses have a significant negative impact on society and the economy. If not treated or recognized, mental health issues can lead to chronic diseases, decreased functioning, and even higher death rates. Mental health issues are prevalent in under-resourced areas, and even with advancements in clinical practice, traditional methods of diagnosing these disorders, such as psychological testing and in-person interviews, are still limited by their subjective and resource-intensive nature and by their reliance on the availability of qualified healthcare professionals.
Interview with Nisarg Shah: Understanding fairness in AI and machine learning
During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, and winner of the 2024 IJCAI Computers and Thought Award, Professor Nisarg Shah. I asked him about his research, the role of theory in machine learning research, fairness and safety guarantees, regulation, conference reviews, and advice for those just starting out on their research journey. Could you start by telling us about yourself, your career, and your education? Nisarg Shah (NS): I grew up in India and went to IIT Bombay for my undergraduate. Ever since then, I knew that I wanted to go into higher education and academia. I actually did do an industrial placement after my undergrad, and I got a job offer that was very lucrative and would have been more lucrative than doing a PhD. However, that [money] is not why I wanted to do my PhD. I wanted to do my PhD because I was genuinely curious about different questions in this field, and I wanted to study more about them and have fun while doing it.
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
Chervonyi, Yuri, Trinh, Trieu H., Olšák, Miroslav, Yang, Xiaomeng, Nguyen, Hoang, Menegali, Marcelo, Jung, Junehyuk, Verma, Vikas, Le, Quoc V., Luong, Thang
We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems. To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects, and problems containing linear equations of angles, ratios, and distances. This, together with other additions, has markedly improved the coverage rate of the AlphaGeometry language on International Math Olympiads (IMO) 2000-2024 geometry problems from 66% to 88%. The search process of AlphaGeometry2 has also been greatly improved through the use of Gemini architecture for better language modeling, and a novel knowledge-sharing mechanism that combines multiple search trees. Together with further enhancements to the symbolic engine and synthetic data generation, we have significantly boosted the overall solving rate of AlphaGeometry2 to 84% for $\textit{all}$ geometry problems over the last 25 years, compared to 54% previously. AlphaGeometry2 was also part of the system that achieved silver-medal standard at IMO 2024 https://dpmd.ai/imo-silver. Last but not least, we report progress towards using AlphaGeometry2 as a part of a fully automated system that reliably solves geometry problems directly from natural language input.
Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers
Beaglehole, Daniel, Radhakrishnan, Adityanarayanan, Boix-Adserà, Enric, Belkin, Mikhail
A trained Large Language Model (LLM) contains much of human knowledge. Yet, it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always ``know what they know'' and may even be actively misleading. In this work, we give a general method for detecting semantic concepts in the internal activations of LLMs. Furthermore, we show that our methodology can be easily adapted to steer LLMs toward desirable outputs. Our innovations are the following: (1) we use a nonlinear feature learning method to identify important linear directions for predicting concepts from each layer; (2) we aggregate features across layers to build powerful concept detectors and steering mechanisms. We showcase the power of our approach by attaining state-of-the-art results for detecting hallucinations, harmfulness, toxicity, and untruthful content on seven benchmarks. We highlight the generality of our approach by steering LLMs towards new concepts that, to the best of our knowledge, have not been previously considered in the literature, including: semantic disambiguation, human languages, programming languages, hallucinated responses, science subjects, poetic/Shakespearean English, and even multiple concepts simultaneously. Moreover, our method can steer concepts with numerical attributes such as product reviews. We provide our code (including a simple API for our methods) at https://github.com/dmbeaglehole/neural_controllers .
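A minimal sketch of the two ideas named in the abstract, aggregating per-layer concept scores and steering along per-layer directions, might look as follows. Everything here is an assumption made for illustration: the directions are supplied by hand rather than learned with the nonlinear feature learning method the authors use, the aggregation is a plain weighted average, and the function names are hypothetical and unrelated to the API in the linked repository.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def layer_scores(activations: Sequence[Vector], directions: Sequence[Vector]) -> List[float]:
    """Project each layer's activation onto that layer's concept direction."""
    return [dot(h, d) for h, d in zip(activations, directions)]


def aggregate_score(
    activations: Sequence[Vector],
    directions: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-layer scores into one detector output.

    Assumption: a weighted average; uniform weights if none are given.
    """
    scores = layer_scores(activations, directions)
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))


def steer(
    activations: Sequence[Vector],
    directions: Sequence[Vector],
    alpha: float,
) -> List[List[float]]:
    """Shift every layer's activation by alpha times its concept direction."""
    return [
        [x + alpha * d_i for x, d_i in zip(h, d)]
        for h, d in zip(activations, directions)
    ]


if __name__ == "__main__":
    # Two toy "layers" with two-dimensional activations.
    acts = [[1.0, 0.0], [0.0, 2.0]]
    dirs = [[1.0, 0.0], [0.0, 1.0]]
    print(aggregate_score(acts, dirs))
    print(aggregate_score(steer(acts, dirs, 0.5), dirs))
```

In this toy setup, steering with a positive `alpha` raises the aggregated concept score, which is the sense in which the same directions can serve for both detection and control.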
AI 'godfather' predicts another revolution in the tech in next five years
One of the "godfathers" of modern artificial intelligence has predicted a further revolution in the technology by the end of the decade, and says current systems are too limited to create domestic robots and fully automated cars. Yann LeCun, the chief AI scientist at Mark Zuckerberg's Meta, said new breakthroughs are needed in order for the systems to understand and interact with the physical world. LeCun spoke as one of seven engineers who were awarded the £500,000 Queen Elizabeth prize for engineering on Tuesday for their contributions to machine learning, a cornerstone of AI. Recent breakthroughs in the sector, led by the launch of OpenAI's ChatGPT chatbot, have heightened expectations, and fears, of systems gaining human levels of intelligence. However, LeCun said there was some way to go before AIs matched humans or animals, with the current cutting-edge technology excelling at "manipulating language" but not at understanding the physical world.
Stuart J. Russell wins 2025 AAAI Award for Artificial Intelligence for the Benefit of Humanity
The AAAI Award for Artificial Intelligence for the Benefit of Humanity recognizes positive impacts of artificial intelligence to protect, enhance, and improve human life in meaningful ways with long-lived effects. The award is given annually at the conference for the Association for the Advancement of Artificial Intelligence (AAAI). This year, the AAAI Awards Committee has announced that the 2025 recipient of the award and $25,000 prize is Stuart J. Russell, "for his work on the conceptual and theoretical foundations of provably beneficial AI and his leadership in creating the field of AI safety". Stuart will give an invited talk at AAAI 2025 entitled "Can AI Benefit Humanity?" Stuart J. Russell is a Distinguished Professor of Computer Science at the University of California, Berkeley, and holds the Michael H. Smith and Lotfi A. Zadeh Chair in Engineering.
France pitches AI summit as 'wake-up call' for Europe
France hosts top tech players next week at an artificial intelligence summit meant as a "wake-up call" for Europe as it struggles with AI challenges from the United States and China. Players from across the sector and representatives from 80 nations will gather in the French capital on Feb. 10 and 11 in the sumptuous Grand Palais, built for the 1900 Universal Exhibition.
Consistent Client Simulation for Motivational Interviewing-based Counseling
Yang, Yizhe, Achananuparp, Palakorn, Huang, Heyan, Jiang, Jing, Pinto, John, Giam, Jenny, Leng, Kit Phey, Lim, Nicholas Gabriel, Ern, Cameron Tan Shi, Lim, Ee-peng
Simulating human clients in mental health counseling is crucial for training and evaluating counselors (human or simulated) in a scalable manner. Nevertheless, past research on client simulation did not focus on complex conversation tasks such as mental health counseling. In these tasks, the challenge is to ensure that the client's actions (i.e., interactions with the counselor) are consistent with its stipulated profiles and negative behavior settings. In this paper, we propose a novel framework that supports consistent client simulation for mental health counseling. Our framework tracks the mental state of a simulated client, controls its state transitions, and generates for each state behaviors consistent with the client's motivation, beliefs, preferred plan to change, and receptivity. By varying the client profile and receptivity, we demonstrate that consistent simulated clients for different counseling scenarios can be effectively created. Both our automatic and expert evaluations on the generated counseling sessions also show that our client simulation method achieves higher consistency than previous methods.