haic
Patterns and Mechanisms of Contrastive Activation Engineering
Hao, Yixiong, Panda, Ayush, Shabalin, Stepan, Ali, Sheikh Abdur Raheem
Abstract: Controlling the behavior of Large Language Models (LLMs) remains a significant challenge due to their inherent complexity and opacity. While techniques like fine-tuning can modify model behavior, they typically require extensive computational resources. Recent work has introduced a class of contrastive activation engineering (CAE) techniques as promising approaches for steering LLM outputs through targeted modifications to their internal representations. Applied at inference time with zero cost, CAE has the potential to introduce a new paradigm of flexible, task-specific LLM behavior tuning. We analyze the performance of CAE in in-distribution and out-of-distribution settings, evaluate its drawbacks, and begin to develop comprehensive guidelines for its effective deployment. We find that 1. CAE is only reliably effective when applied to in-distribution contexts. Contrastive activation engineering (CAE) emerged from the AI safety literature (Turner et al., 2024) in 2023 as a class of techniques capable of altering LLM generations at inference time with zero cost (Turner et al., 2024; Panickssery et al., 2024). An LLM's learned representations are not necessarily interpretable, but they affect the model's behavior in a meaningful, predictable way (Park et al., 2024).
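The abstract above describes steering through "targeted modifications to internal representations" without giving a formula. As a minimal sketch, assuming the common mean-difference variant of contrastive steering (an assumption, not necessarily the exact procedure the authors evaluate), a steering vector can be computed from activations on contrasting prompt sets and added to a hidden state at inference time. All names here (`steering_vector`, `apply_steering`, `alpha`) and the toy activation values are hypothetical; in practice the activations would be read from a chosen layer of a real model.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Mean activation on 'positive' prompts minus mean on 'negative' prompts.

    Assumption: the mean-difference form of contrastive steering.
    """
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Vector, steer: Vector, alpha: float) -> Vector:
    """Add the scaled steering vector to one hidden state (no retraining)."""
    return [h + alpha * s for h, s in zip(hidden, steer)]


# Toy activations standing in for one layer's outputs (hypothetical values).
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

steer = steering_vector(positive_acts, negative_acts)
steered = apply_steering([0.5, 0.5, 0.5], steer, alpha=2.0)
```

The scale `alpha` controls steering strength. The sketch only illustrates the arithmetic and says nothing about which layer, strength, or prompt sets work well.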
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel (0.04)
Evaluating Human-AI Collaboration: A Review and Methodological Framework
Fragiadakis, George, Diou, Christos, Kousiouris, George, Nikolaidou, Mara
The use of artificial intelligence (AI) alongside individuals in working environments, known as Human-AI Collaboration (HAIC), has become essential in a variety of domains, boosting decision-making, efficiency, and innovation. Despite HAIC's wide potential, evaluating its effectiveness remains challenging due to the complex interaction of the components involved. This paper provides a detailed analysis of existing HAIC evaluation approaches and develops a new framework for evaluating these systems more effectively. Our framework includes a structured decision tree that helps select relevant metrics based on distinct HAIC modes (AI-Centric, Human-Centric, and Symbiotic). By including both quantitative and qualitative metrics, the framework seeks to represent HAIC's dynamic and reciprocal nature, enabling the assessment of its impact and success. The framework's practicality can be examined by applying it in an array of domains, including manufacturing, healthcare, finance, and education, each of which has unique challenges and requirements. Our hope is that this study will facilitate further research on the systematic evaluation of HAIC in real-world applications.
- North America > United States (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Banking & Finance (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)