Halawi, Danny
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Karger, Ezra, Bastani, Houtan, Yueh-Han, Chen, Jacobs, Zachary, Halawi, Danny, Zhang, Fred, Tetlock, Philip E.
Forecasts of future events are essential inputs into informed decision-making. Machine learning (ML) systems have the potential to deliver forecasts at scale, but there is no framework for evaluating the accuracy of ML systems on a standardized set of forecasting questions. To address this gap, we introduce ForecastBench: a dynamic benchmark that evaluates the accuracy of ML systems on an automatically generated and regularly updated set of 1,000 forecasting questions. To avoid any possibility of data leakage, ForecastBench consists solely of questions about future events that have no known answer at the time of submission. We quantify the capabilities of current ML systems by collecting forecasts from expert (human) forecasters, the general public, and LLMs on a random subset of questions from the benchmark ($N=200$). While LLMs have achieved super-human performance on many benchmarks, they perform less well here: expert forecasters outperform the top-performing LLM (p-value $<0.01$). We display system and human scores on a public leaderboard at www.forecastbench.org.
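A minimal sketch, not taken from the paper's code, of how probabilistic forecasts on binary questions can be scored with the Brier score, a standard accuracy metric for this kind of benchmark; the example values are made up.

```python
# Illustrative sketch (not the benchmark's actual code): the Brier score,
# a standard accuracy metric for probabilistic forecasts on binary questions.
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    assert len(forecasts) == len(outcomes) > 0
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Lower is better: a perfect forecaster scores 0.0; always answering 0.5 scores 0.25.
print(brier_score([0.9, 0.2, 0.65], [1, 0, 1]))  # ~0.058 (made-up example values)
```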
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Halawi, Danny, Wei, Alexander, Wallace, Eric, Wang, Tony T., Haghtalab, Nika, Steinhardt, Jacob
Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate the challenge of defending finetuning interfaces, we introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection. Our method constructs a malicious dataset where every individual datapoint appears innocuous, but finetuning on the dataset teaches the model to respond to encoded harmful requests with encoded harmful responses. Applied to GPT-4, our method produces a finetuned model that acts on harmful instructions 99% of the time and avoids detection by defense mechanisms such as dataset inspection, safety evaluations, and input/output classifiers. Our findings question whether black-box finetuning access can be secured against sophisticated adversaries.
Dominion: A New Frontier for AI Research
Halawi, Danny, Sarmasi, Aron, Saltzen, Siena, McCoy, Joshua
Games have long played a role in AI research, both as a test-bed and as a moving goal-post, constantly driving innovation. From the heyday of chess agents, when Deep Blue beat Garry Kasparov, to more recent advances, like AlphaGo's dark-horse ascent to fame, games have both assisted AI research and provided something to aim for. As the AIs got better, the games they were applied to also got more complex. New game mechanics, such as the fog of war in StarCraft and the stochasticity of Poker, pushed researchers to adapt their methods to ever greater generality. In this paper, we argue that the deck-building strategy game Dominion [1] deserves to join the ranks of AI benchmark games, and we provide an RL-based bot in service of that benchmark. Dominion has all of the aforementioned elements, but it also incorporates a mechanic that is not present in other popular RL benchmarks: every game is played with a different set of cards. Since each Dominion card has a specific rule printed on it, and the set of 10 cards for a game is randomly drawn from among hundreds of cards, no two games of Dominion can be approached the same way. A key part of playing Dominion is thus adapting one's inductive bias about how to play to the specific cards on the table.
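A toy illustration, not taken from the paper, of the kingdom-selection mechanic just described: each game draws a different 10-card subset from a much larger pool, so an agent must adapt to the cards actually in play. The pool below is a small sample of base-set cards.

```python
# Toy illustration (not from the paper) of Dominion's defining mechanic:
# each game is played with a different 10-card kingdom drawn from a large pool.
import random

# A small sample of base-set kingdom cards; the full pool spans hundreds of cards.
CARD_POOL = ["Village", "Smithy", "Market", "Witch", "Chapel",
             "Laboratory", "Militia", "Moat", "Festival", "Gardens",
             "Cellar", "Mine", "Remodel", "Workshop", "Council Room"]

def deal_kingdom(pool=CARD_POOL, k=10, seed=None):
    """Pick the k kingdom cards that define this particular game."""
    return sorted(random.Random(seed).sample(pool, k))

print(deal_kingdom(seed=0))  # a different strategic puzzle for every draw
```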
Approaching Human-Level Forecasting with Language Models
Halawi, Danny, Zhang, Fred, Yueh-Han, Chen, Steinhardt, Jacob
Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. On a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.
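A hedged sketch of what a retrieve, generate, and aggregate forecasting loop of the kind described above might look like; the `retrieve` and `generate` callables are placeholders rather than the paper's components, and the median aggregation is an assumption, not the paper's exact method.

```python
# Hypothetical sketch of a retrieve -> generate -> aggregate forecasting loop.
# The `retrieve` and `generate` callables stand in for a news-search step and
# an LM call; they are placeholders, not components from the paper.
from statistics import median
from typing import Callable, List

def forecast(question: str,
             retrieve: Callable[[str], str],
             generate: Callable[[str], float],
             k_samples: int = 5) -> float:
    """Gather context, sample several forecasts, and aggregate them."""
    context = retrieve(question)
    prompt = (f"Question: {question}\n"
              f"Relevant news:\n{context}\n"
              "Give a probability between 0 and 1 that the event occurs.")
    samples: List[float] = [generate(prompt) for _ in range(k_samples)]
    return median(samples)  # simple aggregation; the paper's scheme may differ

# Trivial stand-ins so the sketch runs; real retrieval and LM calls go here.
print(forecast("Will X happen by 2025?",
               retrieve=lambda q: "(no articles found)",
               generate=lambda p: 0.3))
```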
Eliciting Latent Predictions from Transformers with the Tuned Lens
Belrose, Nora, Furman, Zach, Smith, Logan, Halawi, Danny, Ostrovsky, Igor, McKinney, Lev, Biderman, Stella, Steinhardt, Jacob
We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer. To do so, we train an affine probe for each block in a frozen pretrained model, making it possible to decode every hidden state into a distribution over the vocabulary. Our method, the \emph{tuned lens}, is a refinement of the earlier ``logit lens'' technique, which yielded useful insights but is often brittle. We test our method on various autoregressive language models with up to 20B parameters, showing it to be more predictive, reliable, and unbiased than the logit lens. With causal experiments, we show that the tuned lens uses features similar to those of the model itself. We also find that the trajectory of latent predictions can be used to detect malicious inputs with high accuracy. All code needed to reproduce our results can be found at https://github.com/AlignmentResearch/tuned-lens.
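A minimal PyTorch sketch of the core idea, not the released implementation (which lives at the repository linked above): one learned affine translator per block maps an intermediate hidden state so it can be decoded with the frozen model's own unembedding.

```python
# Minimal sketch of the tuned-lens idea; see the linked repository for the
# real implementation. One affine "translator" per block maps an intermediate
# hidden state so it can be decoded with the frozen model's own unembedding.
import torch
import torch.nn as nn

class TunedLensSketch(nn.Module):
    def __init__(self, num_layers: int, d_model: int):
        super().__init__()
        self.translators = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(num_layers)
        )

    def decode(self, hidden: torch.Tensor, layer: int,
               final_norm: nn.Module, unembed: nn.Module) -> torch.Tensor:
        # Translate the hidden state, then reuse the frozen model's final
        # layer norm and unembedding to obtain a distribution over tokens.
        return unembed(final_norm(self.translators[layer](hidden)))

# Training (sketch): minimize the KL divergence between each probe's output
# distribution and the model's final-layer distribution at the same position.
```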
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Halawi, Danny, Denain, Jean-Stanislas, Steinhardt, Jacob
Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: overthinking and false induction heads. The first phenomenon, overthinking, appears when we decode predictions from intermediate layers given correct vs. incorrect few-shot demonstrations. At early layers, both demonstrations induce similar model behavior, but the behavior diverges sharply at some "critical layer", after which the accuracy given incorrect demonstrations progressively decreases. The second phenomenon, false induction heads, is a possible mechanistic cause of overthinking: these are heads in late layers that attend to and copy false information from previous demonstrations, and whose ablation reduces overthinking. Beyond scientific understanding, our results suggest that studying intermediate model computations could be a promising avenue for understanding and guarding against harmful model behaviors.
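A hypothetical sketch of the kind of layer-wise readout described above, assuming a GPT-2-style HuggingFace causal LM whose final layer norm and unembedding are exposed as `transformer.ln_f` and `lm_head`; comparing the resulting accuracy curves for correct vs. incorrect demonstrations is what would surface the critical-layer divergence.

```python
# Hypothetical sketch of decoding predictions from intermediate layers with a
# logit-lens-style readout; assumes a GPT-2-style HuggingFace model exposing
# `transformer.ln_f` (final layer norm) and `lm_head` (unembedding).
import torch

@torch.no_grad()
def layerwise_accuracy(model, tokenizer, prompts, answers):
    """Fraction of prompts whose answer token is top-1 at each layer."""
    hits = None
    for prompt, answer in zip(prompts, answers):
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        answer_id = tokenizer(answer, add_special_tokens=False).input_ids[0]
        hidden_states = model(ids, output_hidden_states=True).hidden_states
        if hits is None:
            hits = [0] * len(hidden_states)
        for layer, h in enumerate(hidden_states):
            # Read out an early prediction at the last position.
            logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
            hits[layer] += int(logits.argmax(-1).item() == answer_id)
    return [h / len(prompts) for h in hits]

# Running this once with correct and once with incorrect few-shot demonstrations
# yields the per-layer accuracy curves whose divergence is described above.
```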