position
- Leisure & Entertainment > Games (1.00)
- Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.46)
- Energy > Oil & Gas > Midstream (0.46)
Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models
Li, Zhaoxin, Xi-Jia, Zhang, Altundas, Batuhan, Chen, Letian, Paleja, Rohan, Gombolay, Matthew
Semantic Interpretability in Reinforcement Learning (RL) enables transparency, accountability, and safer deployment by making the agent's decisions understandable and verifiable. Achieving this, however, requires a feature space composed of human-understandable concepts, which traditionally rely on human specification and fail to generalize to unseen environments. In this work, we introduce Semantically Interpretable Reinforcement Learning with Vision-Language Models Empowered Automation (SILVA), an automated framework that leverages pre-trained vision-language models (VLM) for semantic feature extraction and interpretable tree-based models for policy optimization. SILVA first queries a VLM to identify relevant semantic features for an unseen environment, then extracts these features from the environment. Finally, it trains an Interpretable Control Tree via RL, mapping the extracted features to actions in a transparent and interpretable manner. To address the computational inefficiency of extracting features directly with VLMs, we develop a feature extraction pipeline that generates a dataset for training a lightweight convolutional network, which is subsequently used during RL. By leveraging VLMs to automate tree-based RL, SILVA removes the reliance on human annotation previously required by interpretable models while also overcoming the inability of VLMs alone to generate valid robot policies, enabling semantically interpretable reinforcement learning without human-in-the-loop.
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Position: Model Collapse Does Not Mean What You Think
Schaeffer, Rylan, Kazdan, Joshua, Arulandu, Alvan Caleb, Koyejo, Sanmi
The proliferation of AI-generated content online has fueled concerns over \emph{model collapse}, a degradation in future generative models' performance when trained on synthetic data generated by earlier models. Industry leaders, premier research journals and popular science publications alike have prophesied catastrophic societal consequences stemming from model collapse. In this position piece, we contend this widespread narrative fundamentally misunderstands the scientific evidence. We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse. To assess how significantly different interpretations of model collapse threaten future generative models, we posit what we believe are realistic conditions for studying model collapse and then conduct a rigorous assessment of the literature's methodologies through this lens. While we leave room for reasonable disagreement, our analysis of research studies, weighted by how faithfully each study matches real-world conditions, leads us to conclude that certain predicted claims of model collapse rely on assumptions and conditions that poorly match real-world conditions, and in fact several prominent collapse scenarios are readily avoidable. Altogether, this position paper argues that model collapse has been warped from a nuanced multifaceted consideration into an oversimplified threat, and that the evidence suggests specific harms more likely under society's current trajectory have received disproportionately less attention.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
Factorio Learning Environment
Hopkins, Jack, Bakler, Mart, Khan, Akbir
Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), based on the game of Factorio, that tests agents in long-term planning, program synthesis, and resource optimization. FLE provides exponentially scaling challenges -- from basic automation to complex factories processing millions of resource units per second. We provide two settings: (1) lab-play consisting of eight structured tasks with fixed resources, and (2) open-play with the unbounded task of building the largest factory on an procedurally generated map. We demonstrate across both settings that models still lack strong spatial reasoning. In lab-play, we find that LLMs exhibit promising short-horizon skills, yet are unable to operate effectively in constrained environments, reflecting limitations in error analysis. In open-play, while LLMs discover automation strategies that improve growth (e.g electric-powered drilling), they fail to achieve complex automation (e.g electronic-circuit manufacturing).
- Workflow (0.67)
- Research Report (0.64)
- Materials > Metals & Mining (1.00)
- Materials > Chemicals (0.95)
- Education (0.72)
- (5 more...)
Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Bucknall, Ben, Trager, Robert F., Osborne, Michael A.
The external evaluation of AI systems is increasingly recognised as a crucial approach for understanding their potential risks. However, facilitating external evaluation in practice faces significant challenges in balancing evaluators' need for system access with AI developers' privacy and security concerns. Additionally, evaluators have reason to protect their own privacy - for example, in order to maintain the integrity of held-out test sets. We refer to the challenge of ensuring both developers' and evaluators' privacy as one of providing mutual privacy. In this position paper, we argue that (i) addressing this mutual privacy challenge is essential for effective external evaluation of AI systems, and (ii) current methods for facilitating external evaluation inadequately address this challenge, particularly when it comes to preserving evaluators' privacy. In making these arguments, we formalise the mutual privacy problem; examine the privacy and access requirements of both model owners and evaluators; and explore potential solutions to this challenge, including through the application of cryptographic and hardware-based approaches.
- North America > United States (0.46)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Position: Beyond Assistance -- Reimagining LLMs as Ethical and Adaptive Co-Creators in Mental Health Care
Badawi, Abeer, Laskar, Md Tahmid Rahman, Huang, Jimmy Xiangji, Raza, Shaina, Dolatabadi, Elham
This position paper argues for a fundamental shift in how Large Language Models (LLMs) are integrated into the mental health care domain. We advocate for their role as co-creators rather than mere assistive tools. While LLMs have the potential to enhance accessibility, personalization, and crisis intervention, their adoption remains limited due to concerns about bias, evaluation, over-reliance, dehumanization, and regulatory uncertainties. To address these challenges, we propose two structured pathways: SAFE-i (Supportive, Adaptive, Fair, and Ethical Implementation) Guidelines for ethical and responsible deployment, and HAAS-e (Human-AI Alignment and Safety Evaluation) Framework for multidimensional, human-centered assessment. SAFE-i provides a blueprint for data governance, adaptive model engineering, and real-world integration, ensuring LLMs align with clinical and ethical standards. HAAS-e introduces evaluation metrics that go beyond technical accuracy to measure trustworthiness, empathy, cultural sensitivity, and actionability. We call for the adoption of these structured approaches to establish a responsible and scalable model for LLM-driven mental health support, ensuring that AI complements-rather than replaces-human expertise.
- North America > United States (0.93)
- North America > Canada (0.14)
- Europe > Switzerland (0.14)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Research Report > New Finding (0.93)
Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents
Pink, Mathis, Wu, Qinyuan, Vo, Vy Ai, Turek, Javier, Mu, Jianing, Huth, Alexander, Toneva, Mariya
As Large Language Models (LLMs) evolve from text-completion tools into fully fledged agents operating in dynamic environments, they must address the challenge of continually learning and retaining long-term knowledge. Many biological systems solve these challenges with episodic memory, which supports single-shot learning of instance-specific contexts. Inspired by this, we present an episodic memory framework for LLM agents, centered around five key properties of episodic memory that underlie adaptive and context-sensitive behavior. With various research efforts already partially covering these properties, this position paper argues that now is the right time for an explicit, integrated focus on episodic memory to catalyze the development of long-term agents. To this end, we outline a roadmap that unites several research directions under the goal to support all five properties of episodic memory for more efficient long-term LLM agents.
- North America > United States > New York (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Position: Continual Learning Benefits from An Evolving Population over An Unified Model
Lu, Aojun, Ke, Junchao, Ding, Chunhui, Fan, Jiahao, Sun, Yanan
Deep neural networks have demonstrated remarkable success in machine learning; however, they remain fundamentally ill-suited for Continual Learning (CL). Recent research has increasingly focused on achieving CL without the need for rehearsal. Among these, parameter isolation-based methods have proven particularly effective in enhancing CL by optimizing model weights for each incremental task. Despite their success, they fall short in optimizing architectures tailored to distinct incremental tasks. To address this limitation, updating a group of models with different architectures offers a promising alternative to the traditional CL paradigm that relies on a single unified model. Building on this insight, this study introduces a novel Population-based Continual Learning (PCL) framework. PCL extends CL to the architectural level by maintaining and evolving a population of neural network architectures, which are continually refined for the current task through NAS. Importantly, the well-evolved population for the current incremental task is naturally inherited by the subsequent one, thereby facilitating forward transfer, a crucial objective in CL. Throughout the CL process, the population evolves, yielding task-specific architectures that collectively form a robust CL system. Experimental results demonstrate that PCL outperforms state-of-the-art rehearsal-free CL methods that employs a unified model, highlighting its potential as a new paradigm for CL.
Position: LLMs Can be Good Tutors in Foreign Language Education
Ye, Jingheng, Wang, Shen, Zou, Deqing, Yan, Yibo, Wang, Kun, Zheng, Hai-Tao, Xu, Zenglin, King, Irwin, Yu, Philip S., Wen, Qingsong
While recent efforts have begun integrating large language models (LLMs) into foreign language education (FLE), they often rely on traditional approaches to learning tasks without fully embracing educational methodologies, thus lacking adaptability to language learning. To address this gap, we argue that LLMs have the potential to serve as effective tutors in FLE. Specifically, LLMs can play three critical roles: (1) as data enhancers, improving the creation of learning materials or serving as student simulations; (2) as task predictors, serving as learner assessment or optimizing learning pathway; and (3) as agents, enabling personalized and inclusive education. We encourage interdisciplinary research to explore these roles, fostering innovation while addressing challenges and risks, ultimately advancing FLE through the thoughtful integration of LLMs.
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (13 more...)
- Overview (1.00)
- Instructional Material > Course Syllabus & Notes (0.34)
- Education > Educational Setting (1.00)
- Education > Curriculum > Subject-Specific Education (1.00)
- Education > Assessment & Standards > Student Performance (0.93)
- Education > Educational Technology > Educational Software > Computer Based Training (0.69)