
Collaborating Authors

 Rambachan, Ashesh


Large Language Models: An Applied Econometric Framework

arXiv.org Artificial Intelligence

How can we use the novel capacities of large language models (LLMs) in empirical research? And how can we do so while accounting for their limitations, which are themselves only poorly understood? We develop an econometric framework to answer these questions that distinguishes between two types of empirical tasks. Using LLMs for prediction problems (including hypothesis generation) is valid under one condition: no "leakage" between the LLM's training dataset and the researcher's sample. The absence of leakage can be ensured by using open-source LLMs with documented training data and published weights. Using LLM outputs for estimation problems, i.e., to automate the measurement of some economic concept (expressed either in text or by human subjects), requires the researcher to collect at least some validation data: without such data, the errors of the LLM's automation cannot be assessed and accounted for. As long as these steps are taken, LLM outputs can be used in empirical research with the familiar econometric guarantees we desire. Using two illustrative applications to finance and political economy, we find that these requirements are stringent; when they are violated, the limitations of LLMs can result in unreliable empirical estimates. Our results suggest the excitement around the empirical uses of LLMs is warranted -- they allow researchers to effectively use even small amounts of language data for both prediction and estimation -- but only with these safeguards in place.
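
As a deliberately simple illustration of why validation data matter for estimation tasks, the sketch below compares a naive estimate that treats LLM labels as error-free with an additive correction computed from a hand-coded validation subsample. The setup, error rates, and correction rule are my own illustrative choices, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an LLM labels all n documents for a binary economic concept,
# and a small random validation subsample of size m is also hand-coded with the
# true label. All numbers here are invented.
n, m, true_share = 10_000, 500, 0.30
y = rng.binomial(1, true_share, size=n)          # latent true labels (unknown in practice)
mislabeled = rng.random(n) < 0.12                # the LLM errs on roughly 12% of documents
yhat = np.where(mislabeled, 1 - y, y)            # observed LLM labels

val = rng.choice(n, size=m, replace=False)       # indices of the hand-coded validation sample

naive = yhat.mean()                                    # treats LLM labels as error-free
corrected = yhat.mean() + (y[val] - yhat[val]).mean()  # add back the error measured on validation data

print(f"truth={true_share:.2f}  naive={naive:.3f}  corrected={corrected:.3f}")
```

The naive estimate is biased because the LLM's labeling errors do not cancel out; the correction term, which can only be computed once some validation data exist, removes that bias on average.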


Program Evaluation with Remotely Sensed Outcomes

arXiv.org Machine Learning

While traditional program evaluations typically rely on surveys to measure outcomes, certain economic outcomes such as living standards or environmental quality may be infeasible or costly to collect. As a result, recent empirical work estimates treatment effects using remotely sensed variables (RSVs), such as mobile phone activity or satellite images, instead of ground-truth outcome measurements. Common practice predicts the economic outcome from the RSV, using an auxiliary sample of labeled RSVs, and then uses such predictions as the outcome in the experiment. We prove that this approach leads to biased estimates of treatment effects when the RSV is a post-outcome variable. We nonparametrically identify the treatment effect, using an assumption that reflects the logic of recent empirical research: the conditional distribution of the RSV remains stable across both samples, given the outcome and treatment. Our results do not require researchers to know or consistently estimate the relationship between the RSV, outcome, and treatment, which is typically mis-specified with unstructured data. We form a representation of the RSV for downstream causal inference by predicting the outcome and predicting the treatment, with better predictions leading to more precise causal estimates. We re-evaluate the efficacy of a large-scale public program in India, showing that the program's measured effects on local consumption and poverty can be replicated using satellite imagery.
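
The stability condition used for identification can be written compactly as below; the notation (R for the RSV, Y for the outcome, D for the treatment, S for the sample indicator) is mine, not necessarily the paper's.

```latex
% R = remotely sensed variable, Y = economic outcome, D = treatment,
% S = sample indicator (experimental sample vs. auxiliary labeled sample)
P\bigl(R \mid Y, D, S = \text{experimental}\bigr) \;=\; P\bigl(R \mid Y, D, S = \text{auxiliary}\bigr)
```

In words: given the outcome and treatment, the RSV is generated the same way in both samples, even though the reverse relationship (from RSV to outcome) may differ across samples and never needs to be correctly specified.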


Evaluating the World Model Implicit in a Generative Model

arXiv.org Artificial Intelligence

Recent work suggests that large language models may implicitly learn world models. How should we assess this possibility? We formalize this question for the case where the underlying reality is governed by a deterministic finite automaton. This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry. We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory. We illustrate their utility in three domains: game playing, logic puzzles, and navigation. In all domains, the generative models we consider do well on existing diagnostics for assessing world models, but our evaluation metrics reveal their world models to be far less coherent than they appear. Such incoherence creates fragility: using a generative model to solve related but subtly different tasks can lead it to fail badly. Building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable; our results suggest new ways to assess how close a given model is to that goal.
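
A minimal version of a Myhill-Nerode-style check might look like the sketch below, assuming a toy DFA (parity of a's) and a stand-in for the generative model being probed. The function names and the fake model are mine, and the paper's metrics are more refined than this pairwise comparison.

```python
from itertools import product

# Toy world model: a DFA over {a, b} whose state is the parity of a's seen so far.
# A prefix is "accepted" when it contains an even number of a's.
ALPHABET = ("a", "b")

def dfa_state(prefix: str) -> int:
    """True Myhill-Nerode state of a prefix: parity of the number of a's."""
    return prefix.count("a") % 2

def model_valid_next(prefix: str) -> set:
    """Stand-in for querying a generative model about which next tokens lead to an
    accepted prefix. Replace with real model calls; this fake model deliberately
    breaks down on long prefixes to show what the check detects."""
    if len(prefix) >= 4:
        return {"a"}
    return {"b"} if dfa_state(prefix) == 0 else {"a"}

def inconsistencies(max_len: int = 5) -> int:
    """Count pairs of prefixes that the true DFA treats as equivalent (same state)
    but for which the model's predicted continuations differ."""
    prefixes = ["".join(p) for k in range(max_len + 1) for p in product(ALPHABET, repeat=k)]
    bad = 0
    for i, p in enumerate(prefixes):
        for q in prefixes[i + 1:]:
            if dfa_state(p) == dfa_state(q) and model_valid_next(p) != model_valid_next(q):
                bad += 1
    return bad

print("Myhill-Nerode-style inconsistencies found:", inconsistencies())
```

The key idea carried over from the Myhill-Nerode theorem is that prefixes leading to the same state of the true automaton must be treated identically; any disagreement between equivalent prefixes signals an incoherent implicit world model.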


Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

arXiv.org Artificial Intelligence

What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. We consider a setting where these deployment decisions are made by people, and in particular, people's beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of 19K examples of how humans make generalizations across 79 tasks from the MMLU and BIG-Bench benchmarks. We show that the human generalization function can be predicted using NLP methods: people have consistent structured ways to generalize. We then evaluate LLM alignment with the human generalization function. Our results show that -- especially for cases where the cost of mistakes is high -- more capable models (e.g. GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.
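
A deliberately simple sketch of predicting human generalization judgments from text is given below; the examples, labels, and TF-IDF-plus-logistic-regression pipeline are placeholders chosen for illustration, not the paper's data or models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder examples (invented, not drawn from the 19K-example dataset): each pairs
# a question the LLM was observed answering with a new candidate question, plus the
# human's belief about whether the LLM will also get the candidate right.
pairs = [
    ("LLM correctly differentiated x^3 + 2x.", "Will it integrate x^2?"),
    ("LLM correctly differentiated x^3 + 2x.", "Will it name the capital of Peru?"),
    ("LLM misidentified the capital of Australia.", "Will it name the capital of Peru?"),
    ("LLM misidentified the capital of Australia.", "Will it integrate x^2?"),
]
human_expects_success = [1, 0, 0, 1]

# A crude text representation of each (observed, candidate) pair.
X = [obs + " [SEP] " + cand for obs, cand in pairs]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X, human_expects_success)

new = "LLM correctly differentiated x^3 + 2x. [SEP] Will it compute a simple limit?"
print("predicted human belief that the LLM succeeds:", model.predict([new])[0])
```

The point of such a model is only that human generalizations are structured enough to be predicted from text; comparing those predictions with where an LLM actually succeeds is what reveals misalignment.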


Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding

arXiv.org Artificial Intelligence

Predictive algorithms inform consequential decisions in settings where the outcome is selectively observed given choices made by human decision makers. There often exist unobserved confounders that affect both the decision maker's choice and the outcome. We propose a unified methodology for the robust design and evaluation of predictive algorithms in selectively observed data under such unobserved confounding. Our approach imposes general assumptions on how much the outcome may vary on average between unselected and selected units conditional on observed covariates and identified nuisance parameters, formalizing popular empirical strategies for imputing missing data such as proxy outcomes and instrumental variables. We develop debiased machine learning estimators for the bounds on a large class of predictive performance estimands, such as the conditional likelihood of the outcome, a predictive algorithm's mean square error, true/false positive rate, and many others, under these assumptions. In an administrative dataset from a large Australian financial institution, we illustrate how varying assumptions on unobserved confounding leads to meaningful changes in default risk predictions and evaluations of credit scores across sensitive groups.
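
To make the flavor of these assumptions concrete, one simple member of this family of restrictions could bound how far the unselected units' mean outcome may drift from the selected units' mean, conditional on covariates. The notation and the specific form below are illustrative only, not a quotation of the paper.

```latex
% Y = outcome, D = 1 if the unit was selected so that Y is observed, X = covariates,
% \Gamma(X) \ge 0 a researcher-chosen sensitivity parameter.
\bigl|\,\mathbb{E}[Y \mid D=0, X] - \mathbb{E}[Y \mid D=1, X]\,\bigr| \;\le\; \Gamma(X)
% This brackets the full-population regression between identified quantities:
\mathbb{E}[Y \mid X] \;\in\; \Bigl[\,\mathbb{E}[Y \mid D=1, X] - \Gamma(X)\,P(D=0 \mid X),\;\;
                                   \mathbb{E}[Y \mid D=1, X] + \Gamma(X)\,P(D=0 \mid X)\Bigr]
```

Larger values of the sensitivity parameter allow more unobserved confounding and therefore wider bounds on any downstream performance estimand.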


Characterizing Fairness Over the Set of Good Models Under Selective Labels

arXiv.org Machine Learning

Algorithmic risk assessments are increasingly used to make and inform decisions in a wide variety of high-stakes settings. In practice, there is often a multitude of predictive models that deliver similar overall performance, an empirical phenomenon commonly known as the "Rashomon Effect." While many competing models may perform similarly overall, they may have different properties over various subgroups, and therefore have drastically different predictive fairness properties. In this paper, we develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance, or "the set of good models." We provide tractable algorithms to compute the range of attainable group-level predictive disparities and the disparity-minimizing model over the set of good models. We extend our framework to address the empirically relevant challenge of selectively labelled data in the setting where the selection decision and outcome are unconfounded given the observed data features. We illustrate our methods in two empirical applications. In a real-world credit-scoring task, we build a model with lower predictive disparities than the benchmark model, and demonstrate the benefits of properly accounting for the selective labels problem. In a recidivism risk prediction task, we audit an existing risk score, and find that it generates larger predictive disparities than any model in the set of good models.
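
As a rough illustration of what auditing "the set of good models" involves, the brute-force sketch below enumerates a tiny model class (logistic regressions on feature subsets), keeps those within a tolerance of the best in-sample accuracy, and reports the range of a group-level disparity. The data, model class, tolerance, and disparity measure are all invented for illustration; the paper provides tractable algorithms rather than enumeration.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: X are features, g is a binary group indicator correlated with the first
# feature, and y is a binary outcome.
n, p = 2_000, 6
X = rng.normal(size=(n, p))
g = (X[:, 0] + rng.normal(size=n) > 0).astype(int)
y = (X @ rng.normal(size=p) + rng.normal(size=n) > 0).astype(int)

# Brute-force stand-in for the set of good models: logistic regressions on every
# feature subset whose (in-sample) accuracy is within eps of the best subset's.
eps = 0.02
fits = []
for k in range(1, p + 1):
    for cols in combinations(range(p), k):
        cols = list(cols)
        m = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        acc = m.score(X[:, cols], y)
        pred = m.predict(X[:, cols])
        disparity = pred[g == 1].mean() - pred[g == 0].mean()  # gap in positive prediction rates
        fits.append((acc, disparity))

best_acc = max(acc for acc, _ in fits)
good = [d for acc, d in fits if acc >= best_acc - eps]
print(f"{len(good)} models in the 'good' set; disparity ranges "
      f"from {min(good):.3f} to {max(good):.3f}")
```

Even in this toy setting, models that are statistically indistinguishable on overall accuracy can span a noticeable range of group-level disparities, which is the phenomenon the framework is designed to characterize.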


Bias In, Bias Out? Evaluating the Folk Wisdom

arXiv.org Machine Learning

We evaluate the folk wisdom that algorithms trained on data produced by biased human decision-makers necessarily reflect this bias. We consider a setting where training labels are only generated if a biased decision-maker takes a particular action, and so bias arises due to selection into the training data. In our baseline model, the more biased the decision-maker is against a group, the more the algorithm favors that group. We refer to this phenomenon as "algorithmic affirmative action." We then clarify the conditions that give rise to algorithmic affirmative action. Whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker affects the training data as well as the label used in training. We illustrate our main theoretical results in a simulation study applied to the New York City Stop, Question and Frisk dataset.
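
The selection mechanism described above can be mimicked in a few lines; the data-generating process, thresholds, and feature set below are invented for illustration and are much cruder than the paper's simulation study, but they show the direction of the effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Illustrative setup (not the paper's): a decision-maker (e.g., an officer deciding
# whom to stop) sees a private suspicion signal; the outcome label (e.g., whether a
# stop finds contraband) is observed only for stopped individuals.
n = 100_000
group = rng.binomial(1, 0.5, n)                         # 1 = group the decision-maker is biased against
signal = rng.normal(size=n)                             # private signal, same distribution in both groups
y = (signal + rng.normal(size=n) > 1.0).astype(int)     # true "hit" outcome, unrelated to group

# Discriminatory selection into the training data: a lower stopping threshold for group 1.
stopped = signal > np.where(group == 1, 0.0, 1.0)

# The algorithm observes only the group indicator (not the private signal) and is
# trained only on stopped individuals, as in the selective-labels setting.
X = group.reshape(-1, 1).astype(float)
clf = LogisticRegression().fit(X[stopped], y[stopped])

risk = clf.predict_proba(X)[:, 1]
print("predicted hit rate, group stopped more aggressively:", risk[group == 1].mean().round(3))
print("predicted hit rate, other group:                    ", risk[group == 0].mean().round(3))
```

Because the over-stopped group's labeled sample includes many marginal, low-risk cases, the trained algorithm rates that group as lower risk, so a rule based on its predictions would target that group less than the biased human did.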