
Depth and Autonomy: A Framework for Evaluating LLM Applications in Social Science Research

Sanaei, Ali, Rajabzadeh, Ali

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly utilized by researchers across a wide range of domains, and qualitative social science is no exception; however, this adoption faces persistent challenges, including interpretive bias, low reliability, and weak auditability. We introduce a framework that situates LLM usage along two dimensions, interpretive depth and autonomy, thereby offering a straightforward way to classify LLM applications in qualitative research and to derive practical design recommendations. We present the state of the literature with respect to these two dimensions, based on all published social science papers available on Web of Science that use LLMs as a tool and not strictly as the subject of study. Rather than granting models expansive freedom, our approach encourages researchers to decompose tasks into manageable segments, much as they would when delegating work to capable undergraduate research assistants. By maintaining low levels of autonomy and selectively increasing interpretive depth only where warranted and under supervision, one can plausibly reap the benefits of LLMs while preserving transparency and reliability.


Not My Agent, Not My Boundary? Elicitation of Personal Privacy Boundaries in AI-Delegated Information Sharing

Guo, Bingcan, Xu, Eryue, Zhang, Zhiping, Li, Tianshi

arXiv.org Artificial Intelligence

Aligning AI systems with human privacy preferences requires understanding individuals' nuanced disclosure behaviors beyond general norms. Yet eliciting such boundaries remains challenging due to the context-dependent nature of privacy decisions and the complex trade-offs involved. We present an AI-powered elicitation approach that probes individuals' privacy boundaries through a discriminative task. We conducted a between-subjects study that systematically varied communication roles and delegation conditions, resulting in 1,681 boundary specifications from 169 participants for 61 scenarios. We examined how these contextual factors and individual differences influence the boundary specification. Quantitative results show that communication roles influence individuals' acceptance of detailed and identifiable disclosure, AI delegation and individuals' need for privacy heighten sensitivity to disclosed identifiers, and AI delegation results in less consensus across individuals. Our findings highlight the importance of situating privacy preference elicitation within real-world data flows. We advocate using nuanced privacy boundaries as an alignment goal for future AI systems.




Reasoning Capabilities of Large Language Models on Dynamic Tasks

Wong, Annie, Bäck, Thomas, Plaat, Aske, van Stein, Niki, Kononova, Anna V.

arXiv.org Artificial Intelligence

Large language models excel on static benchmarks, but their ability as self-learning agents in dynamic environments remains unclear. We evaluate three prompting strategies: self-reflection, heuristic mutation, and planning across dynamic tasks with open-source models. We find that larger models generally outperform smaller ones, but that strategic prompting can close this performance gap. Second, an overly long prompt can negatively impact smaller models on basic reactive tasks, while larger models show more robust behaviour. Third, advanced prompting techniques primarily benefit smaller models on complex games, but offer less improvement for already high-performing large language models. Yet, we find that advanced reasoning methods yield highly variable outcomes: while capable of significantly improving performance when reasoning and decision-making align, they also introduce instability and can lead to substantial performance drops. Compared to human performance, our findings reveal little evidence of true emergent reasoning. Instead, large language model performance exhibits persistent limitations in areas like planning and spatial coordination, suggesting that large language models still suffer fundamental shortcomings that may not be fully overcome through self-reflective prompting alone. Reasoning is a multi-faceted task, and while methods like Chain-of-Thought improve multi-step reasoning on math word problems, our findings using dynamic benchmarks highlight important shortcomings in general reasoning capabilities, indicating a need to move beyond static benchmarks to capture the complexity of reasoning.


Scientists explain why BepiColombo's mission to Mercury is so tricky

Popular Science

It seems like it should be pretty easy to get to Mercury. The little rocky planet is so much closer to Earth than distant destinations like Jupiter, where we've successfully sent multiple spacecraft. Plus, it doesn't have a crushing atmosphere like our nearest neighbor Venus. But, in fact, it's actually really difficult to reach the innermost planet of our solar system--which makes it that much more impressive that the ESA and JAXA's BepiColombo mission has almost reached Mercury, recently completing its final flyby of the planet before entering orbit next year. Reaching Mercury is such a challenge because "the gravitational pull of the Sun is very strong near Mercury, which makes it difficult for spacecraft to slow down enough to enter orbit around the planet," explains Lina Hadid, staff scientist at CNRS in France and principal investigator of one of BepiColombo's instruments.


Distributed Networked Multi-task Learning

Hong, Lingzhou, Garcia, Alfredo

arXiv.org Artificial Intelligence

We consider a distributed multi-task learning scheme that accounts for multiple linear model estimation tasks with heterogeneous and/or correlated data streams. We assume that nodes can be partitioned into groups corresponding to different learning tasks and communicate according to a directed network topology. Each node estimates a linear model asynchronously and is subject to local (within-group) regularization and global (across-groups) regularization terms targeting noise reduction and generalization performance improvement respectively. We provide a finite-time characterization of convergence of the estimators and task relation and illustrate the scheme's general applicability in two examples: random field temperature estimation and modeling student performance from different academic districts.

Index Terms -- Multi-task Learning, Distributed Optimization, Network-based computing systems, Multi-agent systems.

In the current age of big data, many applications often face the challenge of processing large and complex datasets, which are usually not available in a single place but rather distributed across multiple locations. Approaches that require data to be aggregated in a central location may be subject to significant scalability and storage challenges. In other scenarios, data are scattered across different sites and owned by different individuals or organizations. Data privacy and security requirements make it difficult to merge such data in an easy way. In both contexts, Distributed Learning (DL) [1]-[3] can provide feasible solutions by building high-performance models shared among multiple nodes while maintaining user privacy and data confidentiality. DL aims to build a collective machine learning model based on the data from multiple computing nodes that can process and store data and are connected via networks. Nodes can utilize neighboring information to improve their own performance: rather than sharing raw data, they only exchange model information such as model parameters or gradients to avoid revealing sensitive information.

This work was supported in part by the National Science Foundation under Award ECCS-1933878 and in part by the Air Force Office of Scientific Research under Grant 15RT0767. Lingzhou Hong and Alfredo Garcia are with the Department of Industrial & Systems Engineering, Texas A&M University, College Station, TX 77843 USA (e-mail: {hlz, alfredo.garcia
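The scheme sketched in the abstract can be illustrated with a toy example. The snippet below is a minimal sketch, not the authors' method: it assumes four nodes split into two task groups, synchronous gradient steps (the paper's updates are asynchronous), a fully connected topology, and quadratic penalties for both the within-group and across-group regularizers; names such as `lam_local` and `lam_global` are illustrative, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                       # feature dimension
groups = [[0, 1], [2, 3]]   # node indices per learning task
task_of = [0, 0, 1, 1]      # task assignment for each node
true_w = [rng.normal(size=d), rng.normal(size=d)]  # one model per task

# Each node holds a private data stream (X[i], y[i]) generated from its
# task's true model plus noise; raw data is never exchanged.
X = [rng.normal(size=(50, d)) for _ in range(4)]
y = [X[i] @ true_w[task_of[i]] + 0.1 * rng.normal(size=50) for i in range(4)]

w = [np.zeros(d) for _ in range(4)]   # local estimates
lam_local, lam_global, lr = 1.0, 0.1, 0.01

for step in range(500):
    new_w = []
    for i in range(4):
        # Local least-squares gradient on node i's own data only.
        grad = X[i].T @ (X[i] @ w[i] - y[i]) / len(y[i])
        # Within-group regularization: pull toward group-mates' estimates
        # (noise reduction among nodes solving the same task).
        for j in groups[task_of[i]]:
            if j != i:
                grad += lam_local * (w[i] - w[j])
        # Across-group regularization: weak pull toward the network-wide
        # mean, a crude stand-in for the paper's task-relation term.
        mean_w = sum(w) / len(w)
        grad += lam_global * (w[i] - mean_w)
        new_w.append(w[i] - lr * grad)
    w = new_w  # synchronous update; the paper analyzes asynchronous steps

# Each node's estimate should approach its own task's true model.
err = max(np.linalg.norm(w[i] - true_w[task_of[i]]) for i in range(4))
```

Only the current estimates `w[j]` cross node boundaries in each step, which is the privacy-motivated design choice the abstract describes: model parameters are shared, raw observations are not.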


Revealed: The 5 Hollywood A-listers who will voice Meta's AI chatbot

Daily Mail - Science & tech

Have you ever dreamed of chatting to your favourite celebrity? Well, now that dream could become a reality, thanks to Meta's latest update. The tech giant has revealed the Hollywood A-listers who will lend their voices to its AI chatbot. Users will be able to choose between five famous voices, including some that are instantly recognisable. So, which one will you choose for your Meta AI chatbot?