Decision-Making Amid Information-Based Threats in Sociotechnical Systems: A Review
Allred, Aaron R., Richardson, Erin E., Bostrom, Sarah R., Crum, James, Spencer, Cara, Tossell, Chad, Niemeyer, Richard E., Hirshfield, Leanne, Hayman, Allison P. A.
Technological systems increasingly mediate human information exchange, spanning interactions among humans as well as between humans and artificial agents. The unprecedented scale and reliance on information disseminated through these systems substantially expand the scope of information-based influence that can both enable and undermine sound decision-making. Consequently, understanding and protecting decision-making today faces growing challenges, as individuals and organizations must navigate evolving opportunities and information-based threats across varied domains and information environments. While these risks are widely recognized, research remains fragmented: work evaluating information-based threat phenomena has progressed largely in isolation from foundational studies of human information processing. In this review, we synthesize insights from both domains to identify shared cognitive mechanisms that mediate vulnerability to information-based threats and shape behavioral outcomes. Finally, we outline directions for future research aimed at integrating these perspectives, emphasizing the importance of such integration for mitigating human vulnerabilities and aligning human-machine representations.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Connecticut > Fairfield County > Westport (0.04)
- (14 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.92)
- (3 more...)
Data Driven Classroom Interviews: A manual for an interview app and its associated methodology
Ocumpaugh, Jaclyn, Paquette, Luc, Baker, Ryan S., Barany, Amanda, Ginger, Jeff, Casano, Nathan, Zambrano, Andres F., Liu, Xiner, Wei, Zhanlan, Zhou, Yiqui, Liu, Qianhui, Hutt, Stephen, Andres, Alexandra M. A., Nasiar, Nidhi, Giordano, Camille, van Velsen, Martin, Mogessi, Micheal
Data Driven Classroom Interviews (DDCIs) are an interviewing technique facilitated by recent technological developments in the learning analytics community. DDCIs are short, targeted interviews that allow researchers to contextualize students' interactions with a digital learning environment (e.g., intelligent tutoring systems or educational games) while minimizing the amount of time that the researcher interrupts that learning experience, and focusing researcher time on the events they most want to observe. DDCIs are facilitated by a research tool called the Quick Red Fox (QRF)--an open-source server-client Android app that optimizes researcher time by directing interviewers to users that have just displayed an interesting behavior (previously defined by the research team). QRF integrates with existing student modeling technologies (e.g., behavior-sensing, affect-sensing, detection of self-regulated learning) to alert researchers to key moments in a learner's experience. This manual documents the technology while providing training on the processes involved in developing triggers and interview techniques; it also suggests methods of analysis.
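The trigger mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the QRF implementation: the function name, threshold logic, and detector stream are all invented, standing in for whatever behavior- or affect-detection model a research team actually wires in.

```python
# Hypothetical sketch of a QRF-style trigger: alert an interviewer when
# a student model's estimate (e.g., a confusion probability) stays above
# a threshold for several consecutive readings. All names are illustrative.

def make_trigger(threshold, min_events):
    """Fire once a detector reports min_events consecutive readings >= threshold."""
    count = 0
    def trigger(detector_value):
        nonlocal count
        count = count + 1 if detector_value >= threshold else 0
        return count >= min_events
    return trigger

confusion_trigger = make_trigger(threshold=0.8, min_events=3)
stream = [0.2, 0.9, 0.85, 0.95, 0.4]   # simulated affect-detector output
alerts = [confusion_trigger(v) for v in stream]
print(alerts)  # the alert fires on the third consecutive high reading
```

Requiring several consecutive readings, rather than a single spike, is one plausible way to keep interviewers from being dispatched on detector noise.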
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Pennsylvania (0.04)
- (13 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- (4 more...)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting (1.00)
Benchmarking is Broken -- Don't Let AI be its Own Judge
Cheng, Zerui, Wohnig, Stella, Gupta, Ruchika, Alam, Samiul, Abdullahi, Tassallah, Ribeiro, João Alves, Nielsen-Garcia, Christian, Mir, Saif, Li, Siran, Orender, Jason, Bahrainian, Seyed Ali, Kirste, Daniel, Gokaslan, Aaron, Glinka, Mikołaj, Eickhoff, Carsten, Wolff, Ruben
The meteoric rise of AI, with its rapidly expanding market capitalization, presents both transformative opportunities and critical challenges. Chief among these is the urgent need for a new, unified paradigm for trustworthy evaluation, as current benchmarks increasingly reveal critical vulnerabilities. Issues like data contamination and selective reporting by model developers fuel hype, while inadequate data quality control can lead to biased evaluations that, even if unintentionally, may favor specific approaches. As a flood of participants enters the AI space, this "Wild West" of assessment makes distinguishing genuine progress from exaggerated claims exceptionally difficult. Such ambiguity blurs scientific signals and erodes public confidence, much as unchecked claims would destabilize financial markets reliant on credible oversight from agencies like Moody's. In high-stakes human examinations (e.g., SAT, GRE), substantial effort is devoted to ensuring fairness and credibility; why settle for less in evaluating AI, especially given its profound societal impact? This position paper argues that the current laissez-faire approach is unsustainable. We contend that true, sustainable AI advancement demands a paradigm shift: a unified, live, and quality-controlled benchmarking framework robust by construction, not by mere courtesy and goodwill. To this end, we dissect the systemic flaws undermining today's AI evaluation, distill the essential requirements for a new generation of assessments, and introduce PeerBench (with its prototype implementation at https://www.peerbench.ai/), a community-governed, proctored evaluation blueprint that embodies this paradigm through sealed execution, item banking with rolling renewal, and delayed transparency. Our goal is to pave the way for evaluations that can restore integrity and deliver genuinely trustworthy measures of AI progress.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Ohio (0.04)
- North America > United States > Michigan (0.04)
- (4 more...)
- Banking & Finance (0.88)
- Information Technology > Security & Privacy (0.68)
- Social Sector (0.66)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Reward Model Perspectives: Whose Opinions Do Reward Models Reward?
Reward models (RMs) are central to the alignment of language models (LMs). An RM often serves as a proxy for human preferences to guide downstream LM behavior. However, our understanding of RM behavior is limited. Our work (i) formalizes a framework for measuring the alignment of opinions captured by RMs, (ii) investigates the extent to which RMs demonstrate sociodemographic biases, and (iii) explores the effects of prompting to steer rewards towards the preferences of a target group. We study the subjective and diverse perspectives on controversial topics, which allows us to quantify RM perspectives in terms of their opinions, attitudes, and values. We show that RMs are poorly aligned with several demographic groups and can systematically reward harmful stereotypes, and steering alone is not enough to overcome these limitations. Our findings underscore the need for more careful consideration of RM behavior in model alignment during preference learning to prevent the propagation of unwanted social biases in the language technologies that we use.
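The alignment measurement the abstract describes can be illustrated with a toy calculation. This is not the paper's framework: the statements, scores, and group agreement rates below are invented, and a simple Pearson correlation stands in for whatever alignment metric the authors formalize.

```python
# Illustrative sketch: how well do a reward model's scores for opinion
# statements track the fraction of a demographic group that agrees with
# each statement? All numbers are toy data.

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# RM scores for five opinion statements, and each group's agreement rate.
rm_scores = [0.90, 0.20, 0.70, 0.40, 0.80]
group_a   = [0.85, 0.30, 0.60, 0.35, 0.75]  # the RM tracks this group closely
group_b   = [0.20, 0.80, 0.30, 0.70, 0.25]  # the RM is anti-correlated here

print(round(pearson(rm_scores, group_a), 2))
print(round(pearson(rm_scores, group_b), 2))
```

A gap like the one between these two groups is the kind of sociodemographic misalignment the paper sets out to quantify.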
- South America > Ecuador (0.04)
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- (21 more...)
- Health & Medicine > Therapeutic Area (0.93)
- Government (0.68)
Which Cultural Lens Do Models Adopt? On Cultural Positioning Bias and Agentic Mitigation in LLMs
Wan, Yixin, Chen, Xingrun, Chang, Kai-Wei
Large language models (LLMs) have unlocked a wide range of downstream generative applications. However, we found that they also risk perpetuating subtle fairness issues tied to culture, positioning their generations from the perspectives of the mainstream US culture while demonstrating salient externality towards non-mainstream ones. In this work, we identify and systematically investigate this novel culture positioning bias, in which an LLM's default generative stance aligns with a mainstream view and treats other cultures as outsiders. We propose the CultureLens benchmark with 4000 generation prompts and 3 evaluation metrics for quantifying this bias through the lens of a culturally situated interview script generation task, in which an LLM is positioned as an onsite reporter interviewing local people across 10 diverse cultures. Empirical evaluation on 5 state-of-the-art LLMs reveals a stark pattern: while models adopt insider tones in over 88 percent of US-contexted scripts on average, they disproportionately adopt mainly outsider stances for less dominant cultures. To resolve these biases, we propose 2 inference-time mitigation methods: a baseline prompt-based Fairness Intervention Pillars (FIP) method, and a structured Mitigation via Fairness Agents (MFA) framework consisting of 2 pipelines: (1) MFA-SA (Single-Agent) introduces a self-reflection and rewriting loop based on fairness guidelines. (2) MFA-MA (Multi-Agent) structures the process into a hierarchy of specialized agents: a Planner Agent (initial script generation), a Critique Agent (evaluates initial script against fairness pillars), and a Refinement Agent (incorporates feedback to produce a polished, unbiased script). Empirical results showcase the effectiveness of agent-based methods as a promising direction for mitigating biases in generative LLMs.
- Asia > Middle East > UAE (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (14 more...)
Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming
Cohen, Myke C., Grimm, David A., Mirsky, Reuth, Yin, Xiaoyun
Animal-Human-Machine (AHM) teams are a type of hybrid intelligence system wherein interactions between a human, AI-enabled machine, and animal members can result in unique capabilities greater than the sum of their parts. This paper calls for a systematic approach to studying the design of AHM team structures to optimize performance and overcome limitations in various applied settings. We consider the challenges and opportunities in investigating the synergistic potential of AHM team members by introducing a set of dimensions of AHM team functioning to effectively utilize each member's strengths while compensating for individual weaknesses. Using three representative examples of such teams--security screening, search-and-rescue, and guide dogs--the paper illustrates how AHM teams can tackle complex tasks. We conclude with open research directions that this multidimensional approach presents for studying hybrid human-AI systems beyond AHM teams. Keywords: multi-agent systems, animal-human-machine teaming, functional allocation
- North America > United States > Massachusetts > Middlesex County > Woburn (0.24)
- North America > United States > Massachusetts > Middlesex County > Medford (0.24)
- North America > United States > Georgia > Fulton County > Atlanta (0.24)
- (9 more...)
Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task
Wu, Junjie, Yu, Mo, Liu, Lemao, Yeung, Dit-Yan, Zhou, Jie
While LLMs have exhibited strong performance on various NLP tasks, it is noteworthy that most of these tasks rely on utilizing the vast amount of knowledge encoded in LLMs' parameters, rather than solving new problems without prior knowledge. In cognitive research, the latter ability is referred to as fluid intelligence, which is considered critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. In this paper, we analyze the challenges LLMs face in demonstrating fluid intelligence through controlled experiments, using the most representative ARC task as an example. Our study revealed three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding. Our data and code can be found at https://wujunjie1998.github.io/araoc-benchmark.github.io/.
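For readers unfamiliar with ARC, the task format can be sketched as follows. This toy example is invented, not drawn from the benchmark: a solver must infer a transformation rule from input/output grid pairs and apply it to a new grid, with no domain knowledge to fall back on.

```python
# Toy ARC-style task (illustrative only): the hidden rule here is a
# horizontal mirror. A solver sees the training pair, must induce the
# rule, and then apply it to the test grid.

def mirror(grid):
    """Candidate rule: reverse each row of the grid."""
    return [row[::-1] for row in grid]

train_in  = [[1, 0, 0],
             [2, 2, 0]]
train_out = [[0, 0, 1],
             [0, 2, 2]]

assert mirror(train_in) == train_out  # the candidate rule fits the example

test_in = [[3, 0, 5]]
print(mirror(test_in))
```

Because each grid is novel, success depends on composing abstract skills on the fly rather than recalling memorized patterns, which is exactly the fluid-intelligence demand the paper analyzes.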
- Europe > Austria > Vienna (0.14)
- North America > United States > Connecticut > Fairfield County > Westport (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.68)
Zoning in American Cities: Are Reforms Making a Difference? An AI-based Analysis
Salazar-Miranda, Arianna, Talen, Emily
Cities are at the forefront of addressing global sustainability challenges, particularly those exacerbated by climate change. Traditional zoning codes, which often segregate land uses, have been linked to increased vehicular dependence, urban sprawl, and social disconnection, undermining broader social and environmental sustainability objectives. This study investigates the adoption and impact of form-based codes (FBCs), which aim to promote sustainable, compact, and mixed-use urban forms as a solution to these issues. Using Natural Language Processing (NLP) techniques, we analyzed zoning documents from over 2000 U.S. census-designated places to identify linguistic patterns indicative of FBC principles. Our findings reveal widespread adoption of FBCs across the country, with notable variations within regions. FBCs are associated with higher floor-to-area ratios, narrower and more consistent street setbacks, and smaller plots. We also find that places with FBCs have improved walkability, shorter commutes, and a higher share of multi-family housing. Our findings highlight the utility of NLP for evaluating zoning codes and underscore the potential benefits of form-based zoning reforms for enhancing urban sustainability.
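The kind of linguistic-pattern detection the study describes can be illustrated with a deliberately crude sketch. This is not the authors' pipeline: the term list and scoring rule below are hypothetical stand-ins for their NLP analysis of zoning documents.

```python
# Illustrative sketch: flag form-based-code (FBC) language in zoning
# text via simple keyword matching. The term list is invented for
# demonstration, not taken from the paper.

FBC_TERMS = [
    "form-based", "build-to line", "frontage type",
    "transect", "street wall", "mixed-use",
]

def fbc_signal(text: str) -> float:
    """Crude score: share of FBC-associated terms present in the document."""
    text = text.lower()
    hits = sum(1 for term in FBC_TERMS if term in text)
    return hits / len(FBC_TERMS)

sample = ("All buildings shall meet the build-to line, and each frontage "
          "type shall conform to the transect zone standards for mixed-use "
          "blocks.")
print(fbc_signal(sample))  # 4 of the 6 terms appear in this passage
```

A real analysis over 2000+ zoning codes would need far more than keyword counts, but the sketch shows the basic move: turning regulatory language into a measurable signal of form-based principles.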
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- (12 more...)
- Banking & Finance > Real Estate (1.00)
- Law > Real Estate Law (0.68)
- Government > Regional Government > North America Government > United States Government (0.46)
GEMS: Generative Expert Metric System through Iterative Prompt Priming
Cheng, Ti-Chung, Badea, Carmen, Bird, Christian, Zimmermann, Thomas, DeLine, Robert, Forsgren, Nicole, Ford, Denae
Across domains, metrics and measurements are fundamental to identifying challenges, informing decisions, and resolving conflicts. Despite the abundance of data available in this information age, not only can it be challenging for a single expert to work across multi-disciplinary data, but non-experts can also find it unintuitive to create effective measures or transform theories into context-specific metrics that are chosen appropriately. This technical report addresses this challenge by examining software communities within large software corporations, where different measures are used as proxies to locate counterparts within the organization to transfer tacit knowledge. We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories and perform basic reasoning, thereby transforming concepts into context-aware metrics to support software communities given software repository data. While this research zoomed in on software communities, we believe the framework's applicability extends across various fields, showcasing expert-theory-inspired metrics that aid in triaging complex challenges.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (6 more...)
Beyond Preferences in AI Alignment
Zhi-Xuan, Tan, Carroll, Micah, Franklin, Matija, Ashton, Hal
The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (14 more...)
- Research Report (0.63)
- Overview (0.45)
- Law (1.00)
- Government (1.00)
- Health & Medicine (0.92)