user study
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Jordan (0.04)
- (4 more...)
- Education (1.00)
- Transportation (0.69)
- Health & Medicine > Therapeutic Area (0.67)
- North America > Canada > Newfoundland and Labrador > Newfoundland (0.05)
- North America > United States > Texas > Loving County (0.04)
- North America > United States > North Dakota > Billings County (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.48)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (3 more...)
Use-Case-Grounded Simulations for Explanation Evaluation
A growing body of research runs human subject evaluations to study whether providing users with explanations of machine learning models can help them with practical real-world use cases. However, running user studies is challenging and costly, and consequently each study typically only evaluates a limited number of different settings, e.g., studies often only evaluate a few arbitrarily selected model explanation methods. To address these challenges and aid user study design, we introduce Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to the user, to predict answers to the use case of interest. The algorithmic agent's test set accuracy provides a measure of the predictiveness of the information content for the downstream use case. We run a comprehensive evaluation on three real-world use cases (forward simulation, model debugging, and counterfactual reasoning) to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case. These results provide evidence that \simevals{} can be used to efficiently screen an important set of user study design decisions, e.g., selecting which explanations should be presented to the user, before running a potentially costly user study.
Learning to Code with Context: A Study-Based Approach
Borghoff, Uwe M., Minas, Mark, Schopp, Jannis
The rapid emergence of generative AI tools is transforming the way software is developed. Consequently, software engineering education must adapt to ensure that students not only learn traditional development methods but also understand how to meaningfully and responsibly use these new technologies. In particular, project-based courses offer an effective environment to explore and evaluate the integration of AI assistance into real-world development practices. This paper presents our approach and a user study conducted within a university programming project in which students collaboratively developed computer games. The study investigates how participants used generative AI tools throughout different phases of the software development process, identifies the types of tasks where such tools were most effective, and analyzes the challenges students encountered. Building on these insights, we further examine a repository-aware, locally deployed large language model (LLM) assistant designed to provide project-contextualized support. The system employs Retrieval-Augmented Generation (RAG) to ground responses in relevant documentation and source code, enabling qualitative analysis of model behavior, parameter sensitivity, and common failure modes. The findings deepen our understanding of context-aware AI support in educational software projects and inform future integration of AI-based assistance into software engineering curricula.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Overview (0.92)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Pach, Mateusz, Karthik, Shyamgopal, Bouniot, Quentin, Belongie, Serge, Akata, Zeynep
Sparse Autoencoders (SAEs) have recently gained attention as a means to improve the interpretability and steerability of Large Language Models (LLMs), both of which are essential for AI safety. In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity at the neuron-level in visual representations. To ensure that our evaluation aligns with human perception, we propose a benchmark derived from a large-scale user study. Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Further, we demonstrate that applying SAE interventions on CLIP's vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying language model. These findings emphasize the practicality and efficacy of SAEs as an unsupervised tool for enhancing both interpretability and control of VLMs. Code and benchmark data are available at https://github.com/ExplainableML/sae-for-vlm.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
Lost in the Pipeline: How Well Do Large Language Models Handle Data Preparation?
Spreafico, Matteo, Tassini, Ludovica, Sancricca, Camilla, Cappiello, Cinzia
Large language models have recently demonstrated their exceptional capabilities in supporting and automating various tasks. Among the tasks worth exploring for testing large language model capabilities, we considered data preparation, a critical yet often labor-intensive step in data-driven processes. This paper investigates whether large language models can effectively support users in selecting and automating data preparation tasks. To this aim, we considered both general-purpose and fine-tuned tabular large language models. We prompted these models with poor-quality datasets and measured their ability to perform tasks such as data profiling and cleaning. We also compare the support provided by large language models with that offered by traditional data preparation tools. To evaluate the capabilities of large language models, we developed a custom-designed quality model that has been validated through a user study to gain insights into practitioners' expectations.
- North America > United States (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
Is Passive Expertise-Based Personalization Enough? A Case Study in AI-Assisted Test-Taking
Siyan, Li, Zhang, Jason, Maharaj, Akash, Shi, Yuanming, Li, Yunyao
Novice and expert users have different systematic preferences in task-oriented dialogues. However, whether catering to these preferences actually improves user experience and task performance remains understudied. To investigate the effects of expertise-based personalization, we first built a version of an enterprise AI assistant with passive personalization. We then conducted a user study where participants completed timed exams, aided by the two versions of the AI assistant. Preliminary results indicate that passive personalization helps reduce task load and improve assistant perception, but reveal task-specific limitations that can be addressed through providing more user agency. These findings underscore the importance of combining active and passive personalization to optimize user experience and effectiveness in enterprise task-oriented environments.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.93)
Towards Automating Data Access Permissions in AI Agents
Wu, Yuhao, Yang, Ke, Roesner, Franziska, Kohno, Tadayoshi, Zhang, Ning, Iqbal, Umar
As AI agents attempt to autonomously act on users' behalf, they raise transparency and control issues. We argue that permission-based access control is indispensable in providing meaningful control to the users, but conventional permission models are inadequate for the automated agentic execution paradigm. We therefore propose automated permission management for AI agents. Our key idea is to conduct a user study to identify the factors influencing users' permission decisions and to encode these factors into an ML-based permission management assistant capable of predicting users' future decisions. We find that participants' permission decisions are influenced by communication context but importantly individual preferences tend to remain consistent within contexts, and align with those of other participants. Leveraging these insights, we develop a permission prediction model achieving 85.1% accuracy overall and 94.4% for high-confidence predictions. We find that even without using permission history, our model achieves an accuracy of 66.9%, and a slight increase of training samples (i.e., 1-4) can substantially increase the accuracy by 10.8%.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- North America > Bonaire, Sint Eustatius and Saba > Bonaire > Kralendijk (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)