AITopics | Buntine, Wray

Collaborating Authors

Buntine, Wray

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Assessing the Alignment of FOL Closeness Metrics with Human Judgement

Thatikonda, Ramya Keerthy, Buntine, Wray, Shareghi, Ehsan

arXiv.org Artificial IntelligenceJan-15-2025

The recent successful paradigm of solving logical reasoning problems with tool-augmented large language models (LLMs) leverages translation of natural language statements into First-Order Logic~(FOL) and external theorem provers. However, the correctness of FOL statements, comprising operators and text predicates, often goes unverified due to the lack of a reliable evaluation metric for comparing generated and ground-truth FOLs. In this paper, we present a comprehensive study of sensitivity of existing metrics and their alignment with human judgement on FOL evaluation. Using ground-truth FOLs, we carefully designed various perturbations on the ground-truth to assess metric sensitivity. We sample FOL translation candidates for natural language statements and measure the ranking alignment between automatic metrics and human annotators. Our empirical findings highlight oversensitivity in the n-gram metric BLEU for text perturbations, the semantic graph metric Smatch++ for structural perturbations, and FOL metric for operator perturbation. We also observe a closer alignment between BertScore and human judgement. Additionally, we show that combining metrics enhances both alignment and sensitivity compared to using individual metrics.

large language model, logic & formal reasoning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.08613

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.56)

Add feedback

Strategies for Improving NL-to-FOL Translation with LLMs: Data Generation, Incremental Fine-Tuning, and Verification

Thatikonda, Ramya Keerthy, Han, Jiuzhou, Buntine, Wray, Shareghi, Ehsan

arXiv.org Artificial IntelligenceSep-24-2024

Logical reasoning is a fundamental task in natural language processing that presents significant challenges to Large Language Models (LLMs). The inherent characteristics of logical reasoning makes it well-suited for symbolic representations such as first-order logic (FOL). Research in symbolic logical reasoning explored FOL generation using state-of-the-art LLMs (i.e., GPT-4) to produce FOL translations of natural language (NL) statements, but errors in translation are usually not the focus. We address this by categorizing the translation errors in FOL statements generated by LLMs. To make progress towards improving the quality of FOL translations for smaller language models such as LLaMA-2 13B and Mistral 7B, we create ProofFOL, a high-quality FOL-annotated subset of ProofWriter dataset using GPT-4o. The models fine-tuned on this silver standard data achieve a significant gain in performance when compared to larger language models such as LLaMA-2 70B. In addition to improving the model using large data, we also tackle the issue of data scarcity and introduce an incremental framework encompassing of data augmentation and verification steps. In the augmentation process, a single pair of (premises, conclusion) is split into multiple new instances based on the predicates and FOLs. This data is used for fine-tuning, and the inference on this model generates FOLs with fewer errors over the model trained on the original data. Our investigation on the translation errors leads to generation of a perturbation dataset, which is used to train a verifier that corrects potential syntactic and semantic FOL translation errors. We demonstrate an efficient method for making the most of a limited existing human-annotated dataset. Our results show state-of-the-art performance for ProofWriter and ProntoQA datasets using ProofFOL on LLaMA-2 and Mistral models.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2409.16461

Country:

Asia (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations

Ho, Gia-Bao Dinh, Tan, Chang Wei, Darban, Zahra Zamanzadeh, Salehi, Mahsa, Haffari, Gholamreza, Buntine, Wray

arXiv.org Artificial IntelligenceSep-23-2024

Detecting critical moments, such as emotional outbursts or changes in decisions during conversations, is crucial for understanding shifts in human behavior and their consequences. Our work introduces a novel problem setting focusing on these moments as turning points (TPs), accompanied by a meticulously curated, high-consensus, human-annotated multi-modal dataset. We provide precise timestamps, descriptions, and visual-textual evidence high-lighting changes in emotions, behaviors, perspectives, and decisions at these turning points. We also propose a framework, TPMaven, utilizing state-of-the-art vision-language models to construct a narrative from the videos and large language models to classify and detect turning points in our multi-modal dataset. Evaluation results show that TPMaven achieves an F1-score of 0.88 in classification and 0.61 in detection, with additional explanations aligning with human expectations.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2409.14801

Country:

North America (0.46)
Asia (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Discrete Diffusion Language Model for Long Text Summarization

Dat, Do Huu, Anh, Do Duc, Luu, Anh Tuan, Buntine, Wray

arXiv.org Artificial IntelligenceJun-25-2024

While diffusion models excel at conditional generating high-quality images, prior works in discrete diffusion models were not evaluated on conditional long-text generation. In this work, we address the limitations of prior discrete diffusion models for conditional long-text generation, particularly in long sequence-to-sequence tasks such as abstractive summarization. Despite fast decoding speeds compared to autoregressive methods, previous diffusion models failed on the abstractive summarization task due to the incompatibility between the backbone architectures and the random noising process. To overcome these challenges, we introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively. Additionally, we propose CrossMamba, an adaptation of the Mamba model to the encoder-decoder paradigm, which integrates seamlessly with the random absorbing noising process. Our approaches achieve state-of-the-art performance on three benchmark summarization datasets: Gigaword, CNN/DailyMail, and Arxiv, outperforming existing discrete diffusion models on ROUGE metrics as well as possessing much faster speed in inference compared to autoregressive models.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2407.10998

Country:

North America > United States > Louisiana (0.14)
Europe > Portugal (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

Yang, Xiaohao, Zhao, He, Phung, Dinh, Buntine, Wray, Du, Lan

arXiv.org Artificial IntelligenceJun-13-2024

Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the overall model performance. In this paper, we propose WALM (Words Agreement with Language Model), a new evaluation method for topic modeling that comprehensively considers the semantic quality of document representations and topics in a joint manner, leveraging the power of large language models (LLMs). With extensive experiments involving different types of topic models, WALM is shown to align with human judgment and can serve as a complementary evaluation method to the existing ones, bringing a new perspective to topic modeling. Our software package will be available at https://github.com/Xiaohao-Yang/Topic_Model_Evaluation, which can be integrated with many widely used topic models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.09008

Country: Oceania > Australia > Victoria > Melbourne (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Navigating Conflicting Views: Harnessing Trust for Learning

Lu, Jueqing, Du, Lan, Buntine, Wray, Jung, Myong Chol, Dipnall, Joanna, Gabbe, Belinda

arXiv.org Artificial IntelligenceJun-2-2024

Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct information. To address this issue, we develop a computational trust-based discounting method to enhance the existing trustworthy framework in scenarios where conflicts between different views may arise. Its belief fusion process considers the trustworthiness of predictions made by individual views via an instance-wise probability-sensitive trust discounting mechanism. We evaluate our method on six real-world datasets, using Top-1 Accuracy, AUC-ROC for Uncertainty-Aware Prediction, Fleiss' Kappa, and a new metric called Multi-View Agreement with Ground Truth that takes into consideration the ground truth labels. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2406.00958

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Survey on the Real Power of ChatGPT

Liu, Ming, Liu, Ran, Zhu, Ye, Wang, Hua, Qu, Youyang, Li, Rongsheng, Sheng, Yongpan, Buntine, Wray

arXiv.org Artificial IntelligenceMay-9-2024

ChatGPT has changed the AI community and an active research line is the performance evaluation of ChatGPT. A key challenge for the evaluation is that ChatGPT is still closed-source and traditional benchmark datasets may have been used by ChatGPT as the training data. In this paper, (i) we survey recent studies which uncover the real performance levels of ChatGPT in seven categories of NLP tasks, (ii) review the social implications and safety issues of ChatGPT, and (iii) emphasize key challenges and opportunities for its evaluation. We hope our survey can shed some light on its blackbox manner, so that researchers are not misleaded by its surface generation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.00704

Country: Asia > China (0.46)

Genre: Research Report (0.84)

Industry:

Health & Medicine (0.46)
Information Technology (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improving Vietnamese-English Medical Machine Translation

Vo, Nhu, Nguyen, Dat Quoc, Le, Dung D., Piccardi, Massimo, Buntine, Wray

arXiv.org Artificial IntelligenceMar-28-2024

Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning "vinai-translate" for each translation direction. We publicly release our dataset to promote further research.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.19161

Country: Oceania > Australia (0.14)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

Towards Uncertainty-Aware Language Agent

Han, Jiuzhou, Buntine, Wray, Shareghi, Ehsan

arXiv.org Artificial IntelligenceFeb-7-2024

While Language Agents have achieved promising success by placing Large Language Models at the core of a more versatile design that dynamically interacts with the external world, the existing approaches neglect the notion of uncertainty during these interactions. We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification. Compared with other well-known counterparts like ReAct, our extensive experiments across 3 representative tasks (HotpotQA, StrategyQA, MMLU) and various LLM sizes demonstrate that UALA brings a significant improvement of performance, while having a substantially lower reliance on the external world (i.e., reduced number of tool calls and tokens). Our analyses provide various insights including the great potential of UALA compared with agent fine-tuning, and underscore the unreliability of verbalised confidence of LLMs as a proxy for uncertainty.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2401.14016

Country:

Europe (0.67)
North America > United States > Georgia > Dougherty County (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

OntoMedRec: Logically-Pretrained Model-Agnostic Ontology Encoders for Medication Recommendation

Tan, Weicong, Wang, Weiqing, Zhou, Xin, Buntine, Wray, Bingham, Gordon

arXiv.org Artificial IntelligenceJan-28-2024

Most existing medication recommendation models learn representations for medical concepts based on electronic health records (EHRs) and make recommendations with learnt representations. However, most medications appear in the dataset for limited times, resulting in insufficient learning of their representations. Medical ontologies are the hierarchical classification systems for medical terms where similar terms are in the same class on a certain level. In this paper, we propose OntoMedRec, the logically-pretrained and model-agnostic medical Ontology Encoders for Medication Recommendation that addresses data sparsity problem with medical ontologies. We conduct comprehensive experiments on benchmark datasets to evaluate the effectiveness of OntoMedRec, and the result shows the integration of OntoMedRec improves the performance of various models in both the entire EHR datasets and the admissions with few-shot medications. We provide the GitHub repository for the source code on https://anonymous.4open.science/r/OntoMedRec-D123

artificial intelligence, machine learning, ontology, (18 more...)

arXiv.org Artificial Intelligence

2401.15814

Country:

Oceania > Australia (0.14)
Asia > Vietnam (0.14)

Genre: Research Report (0.70)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback