Goto

Collaborating Authors

 Large Language Model


Augmenting Interpretable Models with LLMs during Training

arXiv.org Artificial Intelligence

Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g. medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1,000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g. a 6-billion parameter GPT-J model), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is made available on Github.


Tools to spot AI essays show bias against non-native English speakers

New Scientist

Working out who has produced work isn't always an easy matter Tools to detect if a body of English text has been written by humans or artificial intelligence exhibit bias against people whose primary language isn't English. The tests frequently misidentify their work as being created by an AI. Text-generating AI models such as OpenAI's ChatGPT and GPT-4 are being used by some students at schools and universities to create essays that they are passing off as their own work.


From pope's jacket to napalm recipes: how worrying is AI's rapid growth?

The Guardian

When the boss of Google admits to losing sleep over the negative potential of artificial intelligence, perhaps it is time to get worried. Sundar Pichai told the CBS programme 60 Minutes this month that AI could be "very harmful" if deployed wrongly, and was developing fast. "So does that keep me up at night? Google has launched Bard, a chatbot to rival the ChatGPT phenomenon, and its parent, Alphabet, owns the world-leading DeepMind, a UK-based AI company. He is not the only AI insider to voice concerns. Last week, Elon Musk said he had fallen out with the Google co-founder Larry Page because Page was "not taking AI safety seriously enough". Musk told Fox News that Page wanted "digital superintelligence, basically a digital god, if you will, as soon as possible". So how much of a danger is posed by unrestrained AI development? Musk is one of thousands of signatories to a letter published by the Future of Life Institute, a thinktank, that called for a six-month moratorium on the creation of "giant" AIs more powerful than GPT-4, the system that underpins ChatGPT and the chatbot integrated with Microsoft's Bing search engine. The risks cited by the letter include "loss of control of our civilization". The approach to product development shown by AI practitioners and the tech industry would not be tolerated in any other field, said Valรฉrie Pisano, another signatory to the letter. Pisano, the chief executive of Mila โ€“ the Quebec Artificial Intelligence Institute โ€“ says work was being carried out to make sure that these systems were not racist or violent, in a process known as alignment (ie, making sure they "align" with human values). But then they were released into the public realm. "The technology is put out there, and as the system interacts with humankind, its developers wait to see what happens and make adjustments based on that.


Tech Billionaires Bet on Fusion as Holy Grail for Business

WSJ.com: WSJD - Technology

Sam Altman became a tech sensation this year as the CEO of OpenAI, the artificial-intelligence startup that seems pulled from science fiction. But Mr. Altman, who has been among Silicon Valley's most prominent investors for more than a decade, has placed one of the biggest bets of his career on a company that might be even more futuristic: a nuclear-fusion startup called Helion Energy Inc.


Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness

arXiv.org Artificial Intelligence

The capability of Large Language Models (LLMs) like ChatGPT to comprehend user intent and provide reasonable responses has made them extremely popular lately. In this paper, we focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks. Specially, we present the systematically analysis by measuring ChatGPT's performance, explainability, calibration, and faithfulness, and resulting in 15 keys from either the ChatGPT or domain experts. Our findings reveal that ChatGPT's performance in Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting, as evidenced by human evaluation. In addition, our research indicates that ChatGPT provides high-quality and trustworthy explanations for its decisions. However, there is an issue of ChatGPT being overconfident in its predictions, which resulting in low calibration. Furthermore, ChatGPT demonstrates a high level of faithfulness to the original text in the majority of cases. We manually annotate and release the test sets of 7 fine-grained IE tasks contains 14 datasets to further promote the research. The datasets and code are available at https://github.com/pkuserc/ChatGPT_for_IE.


Divide and Prompt: Chain of Thought Prompting for Text-to-SQL

arXiv.org Artificial Intelligence

Chain-of-thought (CoT) prompting combined with large language models (LLMs) have achieved encouraging results on complex reasoning tasks. Text-to-SQL is a critical semantic parsing task that converts natural language questions into SQL statements, involving a complex reasoning process. However, there is little work about using CoT prompting to activate LLM's reasoning capabilities on Text-to-SQL tasks. In this work, we propose a new paradigm for prompting Text-to-SQL tasks, called Divide-and-Prompt, which first divides the task into subtasks, and then approach each subtask through CoT. We present 3 prompting-based methods to enhance the Text-to-SQL ability of LLMs. Experiments show that these prompts guide LLMs to generate Text-to-SQL with higher execution accuracy.


SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models

arXiv.org Artificial Intelligence

Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, there is yet to be a benchmark curated that suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark-the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a $\href{https://satinbenchmark.github.io}{\text{public leaderboard}}$ to guide and track the progress of VL models in this important domain.


A Lightweight Constrained Generation Alternative for Query-focused Summarization

arXiv.org Artificial Intelligence

Query-focused summarization (QFS) aims to provide a summary of a document that satisfies information need of a given query and is useful in various IR applications, such as abstractive snippet generation. Current QFS approaches typically involve injecting additional information, e.g. query-answer relevance or fine-grained token-level interaction between a query and document, into a finetuned large language model. However, these approaches often require extra parameters \& training, and generalize poorly to new dataset distributions. To mitigate this, we propose leveraging a recently developed constrained generation model Neurological Decoding (NLD) as an alternative to current QFS regimes which rely on additional sub-architectures and training. We first construct lexical constraints by identifying important tokens from the document using a lightweight gradient attribution model, then subsequently force the generated summary to satisfy these constraints by directly manipulating the final vocabulary likelihood. This lightweight approach requires no additional parameters or finetuning as it utilizes both an off-the-shelf neural retrieval model to construct the constraints and a standard generative language model to produce the QFS. We demonstrate the efficacy of this approach on two public QFS collections achieving near parity with the state-of-the-art model with substantially reduced complexity.


A Thorough Examination on Zero-shot Dense Retrieval

arXiv.org Artificial Intelligence

Recent years have witnessed the significant advance in dense retrieval (DR) based on powerful pre-trained language models (PLM). DR models have achieved excellent performance in several benchmark datasets, while they are shown to be not as competitive as traditional sparse retrieval models (e.g., BM25) in a zero-shot retrieval setting. However, in the related literature, there still lacks a detailed and comprehensive study on zero-shot retrieval. In this paper, we present the first thorough examination of the zero-shot capability of DR models. We aim to identify the key factors and analyze how they affect zero-shot retrieval performance. In particular, we discuss the effect of several key factors related to source training set, analyze the potential bias from the target dataset, and review and compare existing zero-shot DR models. Our findings provide important evidence to better understand and develop zero-shot DR models.


Segment Anything in Non-Euclidean Domains: Challenges and Opportunities

arXiv.org Artificial Intelligence

The recent work known as Segment Anything (SA) has made significant strides in pushing the boundaries of semantic segmentation into the era of foundation models. The impact of SA has sparked extremely active discussions and ushered in an encouraging new wave of developing foundation models for the diverse tasks in the Euclidean domain, such as object detection and image inpainting. Despite the promising advances led by SA, the concept has yet to be extended to the non-Euclidean graph domain. In this paper, we explore a novel Segment Non-Euclidean Anything (SNA) paradigm that strives to develop foundation models that can handle the diverse range of graph data within the non-Euclidean domain, seeking to expand the scope of SA and lay the groundwork for future research in this direction. To achieve this goal, we begin by discussing the recent achievements in foundation models associated with SA. We then shed light on the unique challenges that arise when applying the SA concept to graph analysis, which involves understanding the differences between the Euclidean and non-Euclidean domains from both the data and task perspectives. Motivated by these observations, we present several preliminary solutions to tackle the challenges of SNA and detail their corresponding limitations, along with several potential directions to pave the way for future SNA research. Experiments on five Open Graph Benchmark (OGB) datasets across various tasks, including graph property classification and regression, as well as multi-label prediction, demonstrate that the performance of the naive SNA solutions has considerable room for improvement, pointing towards a promising avenue for future exploration of Graph General Intelligence.