
Collaborating Authors: Hendler, James


Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data

arXiv.org Artificial Intelligence

Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data. We conduct a broad range of experiments covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger, respectively) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability for mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. At the same time, we emphasize important limitations that must be addressed before such models are deployable in real-world mental health settings, such as known racial and gender biases, and we highlight the ethical risks accompanying this line of research.
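To make the compared setups concrete, here is a minimal sketch of zero-shot versus few-shot prompt construction for a binary mental-health prediction task. The task wording and demonstration posts are hypothetical illustrations, not the paper's actual prompts or data:

    # Hypothetical prompt templates for a binary stress-prediction task.
    TASK = ("Decide whether the author of the post shows signs of stress. "
            "Answer 'yes' or 'no'.")

    def zero_shot_prompt(post):
        # No demonstrations: the model sees only the instruction and the input.
        return f"{TASK}\n\nPost: {post}\nAnswer:"

    def few_shot_prompt(post, examples):
        # A few labeled demonstrations precede the test input.
        shots = "\n\n".join(f"Post: {p}\nAnswer: {a}" for p, a in examples)
        return f"{TASK}\n\n{shots}\n\nPost: {post}\nAnswer:"

    demos = [("Deadlines are piling up and I can't sleep.", "yes"),
             ("Had a calm weekend hiking with friends.", "no")]
    print(few_shot_prompt("I feel overwhelmed at work lately.", demos))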


More Samples or More Prompt Inputs? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering

arXiv.org Artificial Intelligence

Most existing work on LLM prompt engineering focuses on how to select a better set of data samples inside one single prompt input (In-Context Learning, or ICL). Why not design and leverage multiple prompt inputs together to further improve LLM performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompt-engineering technique that produces the most confident prediction by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with two SOTA LLMs (FlanT5-XL and Mistral-7B) on three NLI datasets (e-SNLI, Multi-NLI, and ANLI) illustrate that ICS can consistently enhance LLMs' prediction performance and confidence. An ablation study suggests that a diversity-based ICS strategy may further improve LLM performance, shedding light on a new and promising direction for future research.
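The core idea, querying the model with several sampled ICL prompts and keeping the most agreed-upon answer, can be sketched as follows. The query_llm callable, the prompt template, and the majority-vote aggregation are assumptions for illustration; the paper's actual sampling and confidence strategies may differ:

    import random
    from collections import Counter

    def ics_predict(query_llm, labeled_pool, test_input, k=4, n_prompts=5, seed=0):
        # Sample several distinct few-shot (ICL) prompts from the labeled pool,
        # query the model once per prompt, and keep the majority answer; the
        # agreement rate serves as a simple confidence proxy.
        rng = random.Random(seed)
        answers = []
        for _ in range(n_prompts):
            shots = rng.sample(labeled_pool, k)
            prompt = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in shots)
            prompt += f"\n\nInput: {test_input}\nLabel:"
            answers.append(query_llm(prompt).strip())
        label, votes = Counter(answers).most_common(1)[0]
        return label, votes / n_prompts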


Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture

arXiv.org Artificial Intelligence

Existing low-resource learning techniques, such as Active Learning (AL), that aim to support human annotators mostly focus on the label while neglecting the natural language explanation of a data point. This work proposes a novel AL architecture to support experts' real-world need for label and explanation annotations in low-resource scenarios. Our AL architecture leverages an explanation-generation model to produce explanations guided by human explanations, a prediction model that faithfully utilizes the generated explanations for prediction, and a novel data diversity-based AL sampling strategy that benefits from the explanation annotations. Automated and human evaluations demonstrate the effectiveness of incorporating explanations into AL sampling and the improved human annotation efficiency and trustworthiness of our AL architecture. Additional ablation studies illustrate the potential of our AL architecture for transfer learning, generalizability, and integration with large language models (LLMs).

Figure 1: Our dual-model AL system architecture at every iteration: 1) the AL data selector chooses a few unlabeled examples; 2) human annotators provide an explanation and label for each data instance; 3) the annotated explanations are used to fine-tune the explanation-generation model; 4) the annotated labels and generated explanations are used to fine-tune the prediction model.
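A minimal sketch of the iteration enumerated in the figure caption, with every interface (select_diverse, annotate, and so on) as a hypothetical stand-in for the paper's components:

    def al_iteration(unlabeled, select_diverse, annotate,
                     finetune_explainer, generate_explanation,
                     finetune_predictor):
        # 1) diversity-based AL sampling picks a few unlabeled examples
        batch = select_diverse(unlabeled)
        # 2) human annotators provide a label and an explanation per example
        human = [annotate(x) for x in batch]            # -> (label, explanation)
        # 3) human explanations fine-tune the explanation-generation model
        finetune_explainer([(x, expl) for x, (_, expl) in zip(batch, human)])
        # 4) labels plus generated explanations fine-tune the prediction model
        finetune_predictor([(x, generate_explanation(x), label)
                            for x, (label, _) in zip(batch, human)])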


Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations

arXiv.org Artificial Intelligence

Human-annotated labels and explanations are critical for training explainable NLP models. However, unlike human-annotated labels, whose quality is easier to calibrate (e.g., with a majority vote), human-crafted free-form explanations can be quite subjective. Before blindly using them as ground truth to train ML models, a vital question needs to be asked: How do we evaluate a human-annotated explanation's quality? In this paper, we build on the view that the quality of a human-annotated explanation can be measured by its helpfulness (or impairment) to the ML models' performance on the task for which the annotations were collected. In comparison to the commonly used Simulatability score, we define a new metric that takes into consideration the helpfulness of an explanation for model performance during both fine-tuning and inference. With the help of a unified dataset format, we evaluate the proposed metric on five datasets (e.g., e-SNLI) and two model architectures (T5 and BART); the results show that our proposed metric can objectively evaluate the quality of human-annotated explanations, while Simulatability falls short.
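In spirit, a metric of this kind scores explanations by the performance change they induce. The sketch below sums a simple fine-tuning gain and an inference gain over a no-explanation baseline; it conveys the idea but is not the paper's exact formula:

    def helpfulness_score(accuracy, baseline):
        # accuracy(expl_at_finetune, expl_at_inference) returns task accuracy
        # under that condition; baseline is the no-explanation accuracy.
        # (Hypothetical interface for illustration.)
        finetune_gain = accuracy(True, False) - baseline
        inference_gain = accuracy(True, True) - accuracy(True, False)
        # Positive: the explanations help the model; negative: they impair it.
        return finetune_gain + inference_gain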


AnaXNet: Anatomy Aware Multi-label Finding Classification in Chest X-ray

arXiv.org Artificial Intelligence

Radiologists usually observe anatomical regions of chest X-ray images as well as the overall image before making a decision. However, most existing deep learning models only look at the entire X-ray image for classification, failing to utilize important anatomical information. In this paper, we propose a novel multi-label chest X-ray classification model that accurately classifies the image findings and also localizes them to their correct anatomical regions. Specifically, our model consists of two modules: the detection module and the anatomical dependency module. The latter utilizes graph convolutional networks, which enable our model to learn not only the label dependency but also the relationships between the anatomical regions in the chest X-ray. We further utilize a method to efficiently create an adjacency matrix for the anatomical regions using the correlation of labels across the different regions. Detailed experiments and analysis of our results show the effectiveness of our method compared to current state-of-the-art multi-label chest X-ray classification methods, while also providing accurate location information.
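The adjacency-matrix construction mentioned above can be illustrated with label co-occurrence statistics. The conditional-probability thresholding below is a common recipe for label-correlation graphs and is only an assumed stand-in for the paper's exact method:

    import numpy as np

    def region_adjacency(region_labels, tau=0.4):
        # region_labels: (n_images, n_regions) binary matrix, 1 if any finding
        # is present in that anatomical region. A[i, j] thresholds the
        # conditional probability P(region j positive | region i positive).
        counts = region_labels.T @ region_labels        # co-occurrence counts
        occurrences = np.clip(np.diag(counts), 1, None) # per-region positives
        conditional = counts / occurrences[:, None]
        return (conditional >= tau).astype(float)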


Exploiting Class Learnability in Noisy Data

arXiv.org Machine Learning

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via these means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real-world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the classes in a given data set and guide supervised training to spend time on a class in proportion to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with a classifier and training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show that our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
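One way to picture the per-class selection loop: track validation accuracy per class and allocate each round's training budget in proportion to how learnable each class currently appears. This softmax allocation is a simplified illustration, not the authors' exact algorithm:

    import numpy as np

    def allocate_budget(val_acc_per_class, budget, temperature=0.5):
        # Softmax over current per-class validation accuracy: classes the
        # model generalizes well on receive proportionally more training data,
        # while apparently unlearnable (noisy) classes are starved.
        scores = np.exp(np.asarray(val_acc_per_class) / temperature)
        probs = scores / scores.sum()
        return np.random.multinomial(budget, probs)

    # Three classes, the third looking noisy/unlearnable:
    print(allocate_budget([0.9, 0.8, 0.2], budget=1000))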


To Serve AI (It's a Cookbook)

AI Magazine

James A. Hendler was recognized with the AAAI Distinguished Service Award at AAAI-17 for his contributions to the field of artificial intelligence through sustained service to AAAI, other professional societies, and government activities promoting the importance of artificial intelligence research. This article presents his recipe for success, with advice directed at newer AI researchers (and some notes for experienced ones as well).


AI Theory and Practice: A Discussion on Hard Challenges and Opportunities Ahead

AI Magazine

So, we have a variety of people here with different interests and backgrounds that I asked to talk about not just the key challenges ahead but potential opportunities and promising pathways, trajectories to solving those problems, and their predictions about how R&D might proceed in terms of the timing of various kinds of development over time. I asked the panelists briefly to frame their comments, sharing a little bit about fundamental questions, such as, "What is the research goal?" Not everybody stays up late at night hunched over a computer or a simulation or a robotic system, pondering the foundations of intelligence and human-level AI. We have here today Lise Getoor from the University of Maryland; Devika Subramanian, who comes to us from Rice University; Carlos Guestrin from Carnegie Mellon University (CMU); James Hendler from Rensselaer Polytechnic Institute (RPI); Mike Wellman at the University of Michigan; Henry Kautz at the University of Rochester; and Joe Konstan, who comes to us from the Midwest, as our Minneapolis person here on the panel.

Joe Konstan: I was actually surprised when you [...] the liability and insurance industry; and the other one, that it was a human interface problem, that people don't necessarily want to go and type a bunch of yes/no questions into a computer to get an answer, even with a rule-based explanation; that if you'd taken that just a step further and solved the human problem, it might have worked. Related to that, I was remembering a bunch of these smart house projects, and I have to admit I think everyone hates smart spaces. [...] there's nobody there, do you warn people and give them a chance to answer? There's no good answer to this question. I can tell you if that person is in bed asleep, the answer is no, don't wake them up. [...] I think of myself at the core in human-computer interaction. So I went back and started looking at what I knew of artificial intelligence to try to see where the path forward was, and I was inspired by the past.


An Ensemble Learning and Problem Solving Architecture for Airspace Management

AAAI Conferences

In this paper we describe the application of a novel learning and problem solving architecture to the domain of airspace management, where multiple requests for the use of airspace need to be reconciled and managed automatically. The key feature of our "Generalized Integrated Learning Architecture" (GILA) is a set of integrated learning and reasoning (ILR) systems coordinated by a central meta-reasoning executive (MRE). Each ILR learns independently from the same training example and contributes to problem-solving in concert with other ILRs as directed by the MRE. Formal evaluations show that our system performs as well as or better than humans after learning from the same training data. Further, GILA outperforms any individual ILR run in isolation, thus demonstrating the power of the ensemble architecture for learning and problem solving.
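The MRE/ILR interplay can be pictured as a small coordination loop in which every ILR proposes a candidate (partial) solution and the executive arbitrates. The interfaces below are hypothetical simplifications, not GILA's actual protocol:

    def mre_solve(initial_state, ilrs, score, max_steps=50):
        # Each ILR proposes a (partial) solution for the current state; the
        # meta-reasoning executive applies the best-scoring proposal and
        # iterates until no ILR can make further progress.
        state = initial_state
        for _ in range(max_steps):
            proposals = [p for p in (ilr(state) for ilr in ilrs) if p is not None]
            if not proposals:
                break
            state = max(proposals, key=score)
        return state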