
Collaborating Authors

 Estornell, Andrew


ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools for various language-based tasks. Recent works have demonstrated that the efficacy of such models can be improved through iterative dialog between multiple models, frequently referred to as multi-agent debate (MAD). While debate shows promise as a means of improving model efficacy, most works in this area treat debate as an emergent behavior rather than a learned behavior. In doing so, current debate frameworks rely on collaborative behaviors having been sufficiently trained into off-the-shelf models. To address this limitation, we propose ACC-Debate, an Actor-Critic based learning framework to produce a two-agent team specialized in debate. We demonstrate that ACC-Debate outperforms SotA debate techniques on a wide array of benchmarks.

Recently, large language models (LLMs) have rapidly become a cornerstone in various applications, redefining how we process and generate language at scale (Thirunavukarasu et al., 2023; Hadi et al., 2023; Jiang et al., 2024). Their ability to handle diverse tasks, from translation (Zhu et al., 2024; Otter et al., 2020) to answering complex questions (Zhang et al., 2024; Hao et al., 2024; Havrilla et al., 2024), has attracted the attention of both industry and academia. However, despite these advancements, LLMs still exhibit notable weaknesses, particularly when it comes to answering factual questions and reasoning (Tonmoy et al., 2024; Rawte et al., 2023; Huang et al., 2023). To address these limitations, several techniques have been proposed, such as Chain-of-Thought (CoT) prompting (Wei et al., 2022), self-reflection (Ji et al., 2023; Shinn et al., 2023), and multi-agent debate (MAD) (Du et al., 2023), to name a few. These approaches aim to improve the reasoning abilities of LLMs by guiding them toward more accurate answers through structured thinking or discourse. However, the majority of these techniques do not involve training the model specifically for these tasks but instead rely on zero-shot or few-shot capabilities.
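
As a brief illustration of the debate pattern described above, here is a minimal sketch of a two-agent (actor/critic) debate loop. The debate function and the toy_actor/toy_critic stand-ins are hypothetical placeholders for LLM calls; this is not the ACC-Debate implementation or its training procedure.

    # Minimal sketch of a two-agent debate loop in the spirit of multi-agent debate (MAD).
    # The `actor` and `critic` callables stand in for LLMs; the toy versions below are
    # hypothetical stubs so the sketch runs without any model backend.

    from typing import Callable, List

    def debate(question: str,
               actor: Callable[[str], str],
               critic: Callable[[str], str],
               rounds: int = 2) -> str:
        """Run a fixed number of debate rounds and return the actor's final answer."""
        transcript: List[str] = [f"Question: {question}"]
        answer = actor("\n".join(transcript) + "\nProvide an initial answer.")
        for _ in range(rounds):
            transcript.append(f"Actor: {answer}")
            feedback = critic("\n".join(transcript) + "\nCritique the answer above.")
            transcript.append(f"Critic: {feedback}")
            answer = actor("\n".join(transcript) + "\nRevise your answer given the critique.")
        return answer

    if __name__ == "__main__":
        toy_actor = lambda prompt: "Paris"                                    # always answers "Paris"
        toy_critic = lambda prompt: "Check that the answer names a capital."  # canned critique
        print(debate("What is the capital of France?", toy_actor, toy_critic))

In ACC-Debate the two agents are additionally trained (via an actor-critic objective) to be effective debate partners, rather than relying on whatever collaborative behavior an off-the-shelf model happens to have.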


Dataset Representativeness and Downstream Task Fairness

arXiv.org Artificial Intelligence

Our society collects data on people for a wide range of applications, from building a census for policy evaluation to running meaningful clinical trials. To collect data, we typically sample individuals with the goal of accurately representing a population of interest. However, current sampling processes often collect data opportunistically from data sources, which can lead to datasets that are biased and not representative, i.e., the collected dataset does not accurately reflect the distribution of demographics of the true population. This is a concern because subgroups within the population can be under- or over-represented in a dataset, which may harm generalizability and lead to an unequal distribution of benefits and harms from downstream tasks that use such datasets (e.g., algorithmic bias in medical decision-making algorithms). In this paper, we assess the relationship between dataset representativeness and the group-fairness of classifiers trained on that dataset. We demonstrate that there is a natural tension between dataset representativeness and classifier fairness; empirically, we observe that training datasets with better representativeness can frequently result in classifiers with higher rates of unfairness. We provide some intuition as to why this occurs via a set of theoretical results in the case of univariate classifiers. We also find that over-sampling underrepresented groups can result in classifiers which exhibit greater bias toward those groups. Lastly, we observe that fairness-aware sampling strategies (i.e., those which are specifically designed to select data with high downstream fairness) will often over-sample members of majority groups. These results demonstrate that the relationship between dataset representativeness and downstream classifier fairness is complex; balancing these two quantities requires special care from both model- and dataset-designers.
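
To make the two quantities in this abstract concrete, the following sketch computes a simple representativeness measure (total-variation distance between sample and population group proportions) and a demographic-parity gap for a classifier trained on a skewed sample. The synthetic data, group labels, and metric choices are illustrative assumptions, not the paper's experimental setup.

    # Sketch: representativeness of a sample vs. group-fairness of the resulting classifier,
    # on synthetic data with two demographic groups.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic population: group 0 is 70% of the population, group 1 is 30%.
    pop_props = np.array([0.7, 0.3])
    n = 5000
    group = rng.choice([0, 1], size=n, p=pop_props)
    x = rng.normal(loc=group * 1.0, scale=1.0, size=n)          # feature shifted by group
    y = (x + rng.normal(scale=0.5, size=n) > 0.8).astype(int)   # label correlated with x

    # Opportunistic sample that under-represents group 1.
    keep = rng.random(n) < np.where(group == 1, 0.3, 1.0)
    Xs, ys, gs = x[keep].reshape(-1, 1), y[keep], group[keep]

    sample_props = np.bincount(gs, minlength=2) / len(gs)
    representativeness_gap = 0.5 * np.abs(sample_props - pop_props).sum()  # TV distance

    clf = LogisticRegression().fit(Xs, ys)
    pred = clf.predict(x.reshape(-1, 1))
    dp_gap = abs(pred[group == 0].mean() - pred[group == 1].mean())  # demographic-parity gap

    print(f"representativeness gap (TV distance): {representativeness_gap:.3f}")
    print(f"demographic-parity gap on population: {dp_gap:.3f}")

Re-running with different sampling rates for the two groups shows how improving one quantity (closing the representativeness gap) does not necessarily shrink, and can widen, the fairness gap.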


Predicting Customer Goals in Financial Institution Services: A Data-Driven LSTM Approach

arXiv.org Artificial Intelligence

In recent years, driven by rapid technological advancements, evolving customer expectations, and increased competition, customers demand more personalized and convenient services, and financial institutions are under pressure to develop a deeper understanding of their clients' needs and preferences. This has led to a growing interest in leveraging data-driven approaches to gain insights into customer behavior and predict future actions. By incorporating state-space graph embeddings into the LSTM model, we further enrich the model's understanding of the relationships and dependencies among various features within the dataset, which may lead to improved performance. This combination of LSTM models and state graph embeddings offers a more scalable and efficient solution for predicting customer goals and actions, while maintaining a high level of accuracy and robustness.
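
The sketch below illustrates the architectural idea of enriching per-step customer features with state-space graph node embeddings before an LSTM, written in PyTorch. The module name, dimensions, and goal vocabulary are assumptions for illustration, not the paper's actual model.

    # Sketch: concatenate learnable state-graph node embeddings with per-step features,
    # feed the enriched sequence to an LSTM, and predict the customer's next goal.

    import torch
    import torch.nn as nn

    class GoalPredictor(nn.Module):
        def __init__(self, n_states: int, feat_dim: int, graph_dim: int,
                     hidden_dim: int, n_goals: int):
            super().__init__()
            # One learnable embedding per node of the state-space graph.
            self.state_embed = nn.Embedding(n_states, graph_dim)
            self.lstm = nn.LSTM(feat_dim + graph_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, n_goals)

        def forward(self, feats: torch.Tensor, state_ids: torch.Tensor) -> torch.Tensor:
            # feats: (batch, seq_len, feat_dim); state_ids: (batch, seq_len) node indices.
            enriched = torch.cat([feats, self.state_embed(state_ids)], dim=-1)
            out, _ = self.lstm(enriched)
            return self.head(out[:, -1])  # predict the next goal from the last step

    # Toy forward pass.
    model = GoalPredictor(n_states=20, feat_dim=8, graph_dim=16, hidden_dim=32, n_goals=5)
    feats = torch.randn(4, 10, 8)
    state_ids = torch.randint(0, 20, (4, 10))
    print(model(feats, state_ids).shape)  # torch.Size([4, 5])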


Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting

arXiv.org Artificial Intelligence

LLMs are known to provide factually inaccurate information that appears confident, i.e., hallucination. This is currently a major obstacle to the reliability and trustworthiness of LLMs [13, 34, 21]. An essential step towards solving this problem is measuring hallucinations. However, this is challenging from a data perspective, as existing metrics presume that benchmark datasets possess gold-standard answers, i.e., "best" or "correct" answers written by humans [16]. The requirement of such answers imposes two fundamental limitations on hallucination measurement: 1) hiring human annotators to produce gold-standard answers is costly in both time and money [4, 43, 38]; 2) gold-standard answers are prone to natural human errors [7, 6, 49]. To this end, we take a step forward and propose a framework which measures LLM hallucinations without the requirement of gold-standard answers. Our framework is partially inspired by the literature on learning with noisy labels [23, 18, 19], where there are no ground-truth labels for verifying the quality of imperfect human annotations [43, 38, 20], detecting annotation errors [48, 26, 47], or training models robustly [42, 3, 17, 36, 39]. Our basic idea is simple: leverage off-the-shelf, high-quality LLMs to generate answers that serve as a proxy for gold-standard answers. The primary challenge in such an approach is how to properly weigh the expertise of each LLM for a given question x, without a priori knowledge of the true (i.e., gold-standard) answer.
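
As a rough illustration of the aggregation idea, the sketch below combines answers from several off-the-shelf LLMs into a proxy reference by weighting each model's answer with a per-question expertise score. The agreement-based weighting used here is a simplifying assumption for illustration, not the expertise-weighting scheme proposed in the paper.

    # Sketch: expertise-weighted aggregation of answers from several reference LLMs.
    # The weighting rule (peer agreement) is a stand-in for a proper expertise estimate.

    from collections import Counter
    from typing import Dict

    def expertise_weighted_answer(answers: Dict[str, str],
                                  expertise: Dict[str, float]) -> str:
        """Return the answer with the largest total expertise weight behind it."""
        scores: Counter = Counter()
        for model, ans in answers.items():
            scores[ans] += expertise.get(model, 0.0)
        return scores.most_common(1)[0][0]

    def agreement_weights(answers: Dict[str, str]) -> Dict[str, float]:
        """Toy expertise proxy: weight each model by how many peers give the same answer."""
        counts = Counter(answers.values())
        return {m: counts[a] / len(answers) for m, a in answers.items()}

    if __name__ == "__main__":
        answers = {"llm_a": "42", "llm_b": "42", "llm_c": "41"}
        weights = agreement_weights(answers)
        print(expertise_weighted_answer(answers, weights))  # "42"

The aggregated answer can then be used in place of a human-written gold standard when scoring a target LLM's outputs for hallucination.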