Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning
Wu, Tianyi, Ni, Jingwei, Hooi, Bryan, Zhang, Jiaheng, Ash, Elliott, Ng, See-Kiong, Sachan, Mrinmaya, Leippold, Markus
Instruction Fine-tuning (IFT) can enhance the helpfulness of Large Language Models (LLMs), but it may lower their truthfulness. This trade-off arises because IFT steers LLMs to generate responses with long-tail knowledge that is not well covered during pre-training, leading to more informative but less truthful answers when generalizing to unseen tasks. In this paper, we empirically demonstrate this helpfulness-truthfulness trade-off in IFT and propose UNIT, a novel IFT paradigm to address it. UNIT teaches LLMs to recognize their uncertainty and explicitly reflect it at the end of their responses. Experimental results show that UNIT-tuned models maintain their helpfulness while distinguishing between certain and uncertain claims, thereby reducing hallucinations.
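To make the response format concrete, here is a minimal sketch of how a UNIT-style training example might be constructed, with the model's answer followed by an explicit uncertainty reflection. The tag wording, the helper build_unit_example, and the example data are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical sketch of UNIT-style training-example construction: the
# response body is followed by an explicit uncertainty reflection, so the
# model learns to flag claims it is unsure about. Tag names and example
# data are illustrative assumptions, not the paper's exact format.

def build_unit_example(instruction: str, answer: str,
                       uncertain_claims: list[str]) -> dict:
    """Append an uncertainty reflection to the target response."""
    reflection = (
        "\n\nUncertain claims:\n" +
        "\n".join(f"- {claim}" for claim in uncertain_claims)
        if uncertain_claims else
        "\n\nUncertain claims: none."
    )
    return {"prompt": instruction, "response": answer + reflection}

example = build_unit_example(
    instruction="Who composed the opera 'Il Trovatore'?",
    answer="Il Trovatore was composed by Giuseppe Verdi and premiered in 1853.",
    uncertain_claims=["the premiere year (1853)"],
)
print(example["response"])
```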
DIRAS: Efficient LLM-Assisted Annotation of Document Relevance in Retrieval Augmented Generation
Ni, Jingwei, Schimanski, Tobias, Lin, Meihong, Sachan, Mrinmaya, Ash, Elliott, Leippold, Markus
Retrieval Augmented Generation (RAG) is widely employed to ground responses to queries in domain-specific documents. But do RAG implementations leave out important information or excessively include irrelevant information? To allay these concerns, it is necessary to annotate domain-specific benchmarks to evaluate information retrieval (IR) performance, as relevance definitions vary across queries and domains. Furthermore, such benchmarks should be annotated cost-efficiently to avoid annotation selection bias. In this paper, we propose DIRAS (Domain-specific Information Retrieval Annotation with Scalability), a manual-annotation-free scheme that fine-tunes open-source LLMs to annotate relevance labels with calibrated relevance probabilities. Extensive evaluation shows that DIRAS fine-tuned models achieve GPT-4-level performance at annotating and ranking unseen (query, document) pairs and are helpful for real-world RAG development.
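As a rough illustration of calibrated relevance scoring (not the paper's exact implementation), the sketch below turns a fine-tuned annotator's "Yes"/"No" token scores into a relevance probability and uses it to rank documents for a query; the logit values are made-up stand-ins for real model outputs.

```python
import math

# Illustrative sketch: convert an annotator LLM's next-token scores for
# the 'Yes'/'No' verbalizer tokens into a relevance probability, then
# rank documents by it. The scores below stand in for real model logits.

def relevance_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax over the two verbalizer tokens."""
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Rank documents for one query by their relevance probability.
scored = {"doc_a": relevance_probability(2.1, -0.4),
          "doc_b": relevance_probability(-1.0, 1.5)}
ranking = sorted(scored, key=scored.get, reverse=True)
print(scored, ranking)
```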
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
Lerner, Emilia Agis, Dorner, Florian E., Ash, Elliott, Goel, Naman
There is a growing body of work on learning from human feedback to align various aspects of machine learning systems with human values and preferences. We consider the setting of fairness in content moderation, in which human feedback is used to determine how two comments -- referencing different sensitive attribute groups -- should be treated in comparison to one another. With a novel dataset collected from Prolific and MTurk, we find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators. We also demonstrate that demographics mentioned in text have a strong influence on how users perceive individual fairness in moderation. Further, we find that these differences persist in downstream classifiers trained to predict human preferences. Finally, we observe that an ensemble giving equal weight to classifiers trained on annotations from different demographics performs better across demographic intersections than a single classifier that gives equal weight to each annotation.
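A minimal sketch of the equal-weight ensemble idea described above, assuming one classifier per annotator demographic; the per-group predictors here are hard-coded stand-ins for trained models.

```python
import statistics

# Sketch of an equal-weight demographic ensemble: each classifier is
# trained on annotations from one demographic group, and all groups vote
# with equal weight regardless of how many annotations each contributed.

def ensemble_predict(comment_pair, group_classifiers) -> float:
    """Average, across demographic-specific models, the probability that
    the first comment should be treated more favorably than the second."""
    return statistics.mean(clf(comment_pair) for clf in group_classifiers)

# Hypothetical per-group predictors scoring the same comment pair.
classifiers = [lambda pair: 0.8, lambda pair: 0.4, lambda pair: 0.6]
print(ensemble_predict(("comment A", "comment B"), classifiers))
```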
Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering
Schimanski, Tobias, Ni, Jingwei, Kraus, Mathias, Ash, Elliott, Leippold, Markus
Advances towards more faithful and traceable answers from Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue toward this goal is basing answers on reliable sources. However, LLMs have proven insufficient at this Evidence-Based QA task, both in citing the correct sources (source quality) and in truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified, high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data improves performance both in- and out-of-distribution. Furthermore, we show that data quality, which the proposed quality filters can drastically improve, matters more than quantity for Evidence-Based QA.
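To illustrate what an automated quality filter might look like, the sketch below keeps a synthetic example only if every answer sentence carries a citation and every citation resolves to a provided source. The bracketed citation format and the filter logic are assumptions of this sketch, not necessarily the paper's.

```python
import re

# Illustrative quality filter in the spirit of the pipeline above: accept
# a synthetic (answer, sources) example only if each sentence is
# attributed and no citation dangles. "[1]", "[2]", ... is an assumed
# citation format for this sketch.

def passes_quality_filter(answer: str, num_sources: int) -> bool:
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for sentence in sentences:
        cited = [int(c) for c in re.findall(r"\[(\d+)\]", sentence)]
        if not cited:  # unattributed claim
            return False
        if any(c < 1 or c > num_sources for c in cited):  # dangling citation
            return False
    return True

print(passes_quality_filter("Emissions fell in 2020 [1]. Policy helped [2].", 2))  # True
print(passes_quality_filter("Emissions fell in 2020. Policy helped [3].", 2))      # False
```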
AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators
Ni, Jingwei, Shi, Minjing, Stammbach, Dominik, Sachan, Mrinmaya, Ash, Elliott, Leippold, Markus
With the rise of generative AI, automated fact-checking methods to combat misinformation are becoming increasingly important. However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in definitions of the task and what a claim is, and (2) the high cost of manual annotation. To address (1), we review the definitions in related work and propose a unifying definition of factual claims that focuses on verifiability. To address (2), we introduce AFaCTA (Automatic Factual Claim deTection Annotator), a novel framework that assists in the annotation of factual claims with the help of large language models (LLMs). AFaCTA calibrates its annotation confidence with consistency along three predefined reasoning paths. Extensive evaluation and experiments in the domain of political speech reveal that AFaCTA can efficiently assist experts in annotating factual claims and training high-quality classifiers, and can work with or without expert supervision. Our analyses also yield PoliClaim, a comprehensive claim detection dataset spanning diverse political topics.
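A loose sketch of the consistency-based calibration idea: the same sentence is labeled along three reasoning paths, and annotation confidence is the share of paths agreeing with the majority label, with non-unanimous cases routed to experts. The path outputs below are placeholders for real LLM calls.

```python
from collections import Counter

# Sketch of self-consistency calibration: aggregate labels produced along
# three reasoning paths into a majority label plus an agreement-based
# confidence score. Path labels here stand in for real LLM outputs.

def aggregate(path_labels: list[str]) -> tuple[str, float]:
    """Return the majority label and its agreement share in [1/3, 1]."""
    counts = Counter(path_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(path_labels)

label, confidence = aggregate(["claim", "claim", "not_claim"])
needs_expert = confidence < 1.0   # route non-unanimous cases to experts
print(label, confidence, needs_expert)
```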
Where Do People Tell Stories Online? Story Detection Across Online Communities
Antoniak, Maria, Mire, Joel, Sap, Maarten, Ash, Elliott, Piper, Andrew
People share stories online for a myriad of purposes, whether as a means of self-disclosure, processing difficult personal experiences, providing needed information or entertainment, or persuading others to share their beliefs. A better understanding of online storytelling can illuminate the dynamics of social movements, sensemaking practices, persuasion strategies, and more. However, unlike other media such as books and visual content, where the narrative nature of the content is often overtly signaled at the document level, studying storytelling in online communities is challenging due to the mixture of storytelling and non-storytelling behavior, which can be interspersed within documents and across diverse topics and settings. We introduce a codebook and create the Storytelling in Online Communities Corpus, an expert-annotated dataset of 502 English-language posts and comments with labeled story and event spans. Using our corpus, we train and evaluate an online story detection model, which we use to investigate the role of storytelling in different social contexts. We identify distinctive features of online storytelling, the prevalence of storytelling among different communities, and the conversational patterns of storytelling.
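As a small illustration of working with span-level story annotations, the snippet below merges per-sentence story predictions into contiguous spans; the sentence labels are hard-coded stand-ins for a trained detector's output, and the span convention is an assumption of this sketch rather than the corpus's exact scheme.

```python
# Toy sketch: turn per-sentence story predictions into (start, end) story
# spans, mirroring span-level annotation. Labels below are hard-coded
# stand-ins for a trained detector's output.

def predictions_to_spans(labels: list[bool]) -> list[tuple[int, int]]:
    """Merge consecutive positive sentences into inclusive spans."""
    spans, start = [], None
    for i, is_story in enumerate(labels):
        if is_story and start is None:
            start = i
        elif not is_story and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans

print(predictions_to_spans([False, True, True, False, True]))  # [(1, 2), (4, 4)]
```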
LePaRD: A Large-Scale Dataset of Judges Citing Precedents
Mahari, Robert, Stammbach, Dominik, Ash, Elliott, Pentland, Alex 'Sandy'
We present LePaRD, the Legal Passage Retrieval Dataset: a massive collection of U.S. federal judicial citations to precedent in context. The dataset aims to facilitate work on legal passage prediction, a challenging practice-oriented legal retrieval and reasoning task that seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various retrieval approaches on LePaRD and find that classification appears to work best. However, legal passage prediction remains a difficult task with significant room for improvement. We hope that by publishing LePaRD, we will encourage others to engage with a legal NLP task that promises to help expand access to justice by reducing the burden associated with legal research. A subset of the LePaRD dataset is freely available, and the whole dataset will be released upon publication.
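To show how legal passage prediction can be framed as classification, here is a toy baseline (not the paper's models) that maps the context preceding a citation to a cited-passage identifier using TF-IDF features; the contexts and passage IDs are invented stand-ins for LePaRD rows.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy classification baseline for legal passage prediction: given the
# argument context preceding a citation, predict which precedent passage
# is cited. Contexts and labels are invented stand-ins for real rows.

contexts = [
    "summary judgment is appropriate where there is no genuine dispute",
    "to survive a motion to dismiss a complaint must state a claim",
    "no genuine issue of material fact remains for trial",
    "the complaint must contain sufficient factual matter to be plausible",
]
passage_ids = ["celotex_p1", "iqbal_p2", "celotex_p1", "iqbal_p2"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(contexts, passage_ids)
print(model.predict(["there is no genuine dispute of material fact"]))
```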
Enhancing Public Understanding of Court Opinions with Automated Summarizers
Ash, Elliott, Kesari, Aniket, Naidu, Suresh, Song, Lena, Stammbach, Dominik
Judges are important policymakers but are less accountable to the public than legislators. One way judges strengthen the legitimacy of their policy choices, given this low accountability, is by providing written justifications based on shared principles, which are then published as judicial opinions. John Rawls argued that "[The U.S. Supreme Court's] role is not merely defensive but to give due and continuing effect to public reason by serving as its institutional exemplar." Presumably, this legitimizing function is best served when the general population can understand the written justifications. In practice, however, judicial opinions tend to be extremely long and written in complicated technical language that is inaccessible to anyone but trained lawyers. We study whether automated summarizers can bridge this gap and enhance public understanding of court opinions.
WCLD: Curated Large Dataset of Criminal Cases from Wisconsin Circuit Courts
Ash, Elliott, Goel, Naman, Li, Nianyun, Marangon, Claudia, Sun, Peiyao
Machine learning based decision-support tools in criminal justice systems are the subject of intense discussion and academic research, with important open questions about their utility and fairness. Academic researchers often rely on a few small datasets that are not sufficient to empirically study various real-world aspects of these questions. In this paper, we contribute WCLD, a curated large dataset of 1.5 million criminal cases from circuit courts in the U.S. state of Wisconsin. We used reliable public data from 1970 to 2020 to curate attributes like prior criminal counts and recidivism outcomes. The dataset contains a large number of samples from five racial groups, in addition to information like sex and age (at judgment and at first offense). Other attributes include neighborhood characteristics obtained from census data, detailed offense types, charge severity, case decisions, sentence lengths, and year of filing. We also provide pseudo-identifiers for judge, county, and ZIP code. The dataset will enable researchers not only to study algorithmic fairness in the context of criminal justice more rigorously, but also to relate algorithmic challenges to broader systemic issues. We also discuss the dataset construction process in detail and provide a datasheet. The WCLD dataset is available at https://clezdata.github.io/wcld/.
The Law and NLP: Bridging Disciplinary Disconnects
Mahari, Robert, Stammbach, Dominik, Ash, Elliott, Pentland, Alex 'Sandy'
Legal practice is intrinsically rooted in the fabric of language, yet legal practitioners and scholars have been slow to adopt tools from natural language processing (NLP). At the same time, the legal system is experiencing an access to justice crisis, which could be partially alleviated with NLP. In this position paper, we argue that the slow uptake of NLP in legal practice is exacerbated by a disconnect between the needs of the legal community and the focus of NLP researchers. In a review of recent trends in the legal NLP literature, we find limited overlap between the legal NLP community and legal academia. Our interpretation is that some of the most popular legal NLP tasks fail to address the needs of legal practitioners. We discuss examples of legal NLP tasks that promise to bridge disciplinary disconnects and highlight interesting areas for legal NLP research that remain underexplored.