Large Language Model
Empower Your Model with Longer and Better Context Comprehension
Gao, Yifei, Wang, Lei, Fang, Jun, Hu, Longhua, Cheng, Jun
Recently, with the emergence of numerous Large Language Models (LLMs), the implementation of AI has entered a new era. Irrespective of these models' own capacity and structure, there is a growing demand for LLMs to possess enhanced comprehension of longer and more complex contexts with relatively smaller sizes. Models often encounter an upper limit when processing sequences of sentences that extend beyond their comprehension capacity and result in off-topic or even chaotic responses. While several recent works attempt to address this issue in various ways, they rarely focus on "why models are unable to compensate or strengthen their capabilities on their own". In this paper, we thoroughly investigate the nature of information transfer within LLMs and propose a novel technique called Attention Transition. This technique empowers models to achieve longer and better context comprehension with minimal additional training or impact on generation fluency. Our experiments are conducted on the challenging XSum dataset using LLaMa-7b model with context token length ranging from 800 to 1900. Results demonstrate that we achieve substantial improvements compared with the original generation results evaluated by GPT4.
RRAML: Reinforced Retrieval Augmented Machine Learning
Bacciu, Andrea, Cuconasu, Florin, Siciliano, Federico, Silvestri, Fabrizio, Tonellotto, Nicola, Trappolini, Giovanni
The emergence of large language models (LLMs) has revolutionized machine learning and related fields, showcasing remarkable abilities in comprehending, generating, and manipulating human language. However, their conventional usage through API-based text prompt submissions imposes certain limitations in terms of context constraints and external source availability. To address these challenges, we propose a novel framework called Reinforced Retrieval Augmented Machine Learning (RRAML). RRAML integrates the reasoning capabilities of LLMs with supporting information retrieved by a purpose-built retriever from a vast user-provided database. By leveraging recent advancements in reinforcement learning, our method effectively addresses several critical challenges. Firstly, it circumvents the need for accessing LLM gradients. Secondly, our method alleviates the burden of retraining LLMs for specific tasks, as it is often impractical or impossible due to restricted access to the model and the computational intensity involved. Additionally we seamlessly link the retriever's task with the reasoner, mitigating hallucinations and reducing irrelevant, and potentially damaging retrieved documents. We believe that the research agenda outlined in this paper has the potential to profoundly impact the field of AI, democratizing access to and utilization of LLMs for a wide range of entities.
Artificial intelligence is algorithmic mimicry: why artificial "agents" are not (and won't be) proper agents
What is the prospect of developing artificial general intelligence (AGI)? I investigate this question by systematically comparing living and algorithmic systems, with a special focus on the notion of "agency." There are three fundamental differences to consider: (1) Living systems are autopoietic, that is, self-manufacturing, and therefore able to set their own intrinsic goals, while algorithms exist in a computational environment with target functions that are both provided by an external agent. (2) Living systems are embodied in the sense that there is no separation between their symbolic and physical aspects, while algorithms run on computational architectures that maximally isolate software from hardware. (3) Living systems experience a large world, in which most problems are ill-defined (and not all definable), while algorithms exist in a small world, in which all problems are well-defined. These three differences imply that living and algorithmic systems have very different capabilities and limitations. In particular, it is extremely unlikely that true AGI (beyond mere mimicry) can be developed in the current algorithmic framework of AI research. Consequently, discussions about the proper development and deployment of algorithmic tools should be shaped around the dangers and opportunities of current narrow AI, not the extremely unlikely prospect of the emergence of true agency in artificial systems.
Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
Sharma, Prabin, Thapa, Kisan, Thapa, Dikshya, Dhakal, Prastab, Upadhaya, Mala Deep, Adhikari, Santosh, Khanal, Salik Ram
Artificial intelligence is gaining traction in more ways than ever before. The popularity of language models and AI-based businesses has soared since ChatGPT was made available to the general public via OpenAI. It is becoming increasingly common for people to use ChatGPT both professionally and personally. Considering the widespread use of ChatGPT and the reliance people place on it, this study determined how reliable ChatGPT can be for answering complex medical and clinical questions. Harvard University gross anatomy along with the United States Medical Licensing Examination (USMLE) questionnaire were used to accomplish the objective. The paper evaluated the obtained results using a 2-way ANOVA and posthoc analysis. Both showed systematic covariation between format and prompt. Furthermore, the physician adjudicators independently rated the outcome's accuracy, concordance, and insight. As a result of the analysis, ChatGPT-generated answers were found to be more context-oriented and represented a better model for deductive reasoning than regular Google search results. Furthermore, ChatGPT obtained 58.8% on logical questions and 60% on ethical questions. This means that the ChatGPT is approaching the passing range for logical questions and has crossed the threshold for ethical questions. The paper believes ChatGPT and other language learning models can be invaluable tools for e-learners; however, the study suggests that there is still room to improve their accuracy. In order to improve ChatGPT's performance in the future, further research is needed to better understand how it can answer different types of questions.
Fine-Tuned but Zero-Shot 3D Shape Sketch View Similarity and Retrieval
Berardi, Gianluca, Gryaditskaya, Yulia
Recently, encoders like ViT (vision transformer) and ResNet have been trained on vast datasets and utilized as perceptual metrics for comparing sketches and images, as well as multi-domain encoders in a zero-shot setting. However, there has been limited effort to quantify the granularity of these encoders. Our work addresses this gap by focusing on multi-modal 2D projections of individual 3D instances. This task holds crucial implications for retrieval and sketch-based modeling. We show that in a zero-shot setting, the more abstract the sketch, the higher the likelihood of incorrect image matches. Even within the same sketch domain, sketches of the same object drawn in different styles, for example by distinct individuals, might not be accurately matched. One of the key findings of our research is that meticulous fine-tuning on one class of 3D shapes can lead to improved performance on other shape classes, reaching or surpassing the accuracy of supervised methods. We compare and discuss several fine-tuning strategies. Additionally, we delve deeply into how the scale of an object in a sketch influences the similarity of features at different network layers, helping us identify which network layers provide the most accurate matching. Significantly, we discover that ViT and ResNet perform best when dealing with similar object scales. We believe that our work will have a significant impact on research in the sketch domain, providing insights and guidance on how to adopt large pretrained models as perceptual losses.
Towards Explainable and Language-Agnostic LLMs: Symbolic Reverse Engineering of Language at Scale
Large language models (LLMs) have achieved a milestone that undenia-bly changed many held beliefs in artificial intelligence (AI). However, there remains many limitations of these LLMs when it comes to true language understanding, limitations that are a byproduct of the under-lying architecture of deep neural networks. Moreover, and due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable. To address these limitations, we suggest com-bining the strength of symbolic representations with what we believe to be the key to the success of LLMs, namely a successful bottom-up re-verse engineering of language at scale. As such we argue for a bottom-up reverse engineering of language in a symbolic setting. Hints on what this project amounts to have been suggested by several authors, and we discuss in some detail here how this project could be accomplished.
VISAR: A Human-AI Argumentative Writing Assistant with Visual Programming and Rapid Draft Prototyping
Zhang, Zheng, Gao, Jie, Dhaliwal, Ranjodh Singh, Li, Toby Jia-Jun
In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting. Recent advances in large language models (LLMs) have made interactive text generation through a chat interface (e.g., ChatGPT) possible. However, this approach often neglects implicit writing context and user intent, lacks support for user control and autonomy, and provides limited assistance for sensemaking and revising writing plans. To address these challenges, we introduce VISAR, an AI-enabled writing assistant system designed to help writers brainstorm and revise hierarchical goals within their writing context, organize argument structures through synchronized text editing and visual programming, and enhance persuasiveness with argumentation spark recommendations. VISAR allows users to explore, experiment with, and validate their writing plans using automatic draft prototyping. A controlled lab study confirmed the usability and effectiveness of VISAR in facilitating the argumentative writing planning process.
ThoughtSource: A central hub for large language model reasoning data
Ott, Simon, Hebenstreit, Konstantin, Liรฉvin, Valentin, Hother, Christoffer Egeberg, Moradi, Milad, Mayrhauser, Maximilian, Praas, Robert, Winther, Ole, Samwald, Matthias
Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates seven scientific/medical, three general-domain and five math word question answering datasets.
Large Language Models Struggle to Learn Long-Tail Knowledge
Kandpal, Nikhil, Deng, Haikang, Roberts, Adam, Wallace, Eric, Raffel, Colin
The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models. However, while certain pieces of information are ubiquitous on the web, others appear extremely rarely. In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web. In particular, we show that a language model's ability to answer a fact-based question relates to how many documents associated with that question were seen during pre-training. We identify these relevant documents by entity linking pre-training datasets and counting documents that contain the same entities as a given question-answer pair. Our results demonstrate strong correlational and causal relationships between accuracy and relevant document count for numerous question answering datasets (e.g., TriviaQA), pre-training corpora (e.g., ROOTS), and model sizes (e.g., 176B parameters). Moreover, while larger models are better at learning long-tail knowledge, we estimate that today's models must be scaled by many orders of magnitude to reach competitive QA performance on questions with little support in the pre-training data. Finally, we show that retrieval-augmentation can reduce the dependence on relevant pre-training information, presenting a promising approach for capturing the long-tail.
Towards Out-Of-Distribution Generalization: A Survey
Liu, Jiashuo, Shen, Zheyan, He, Yue, Zhang, Xingxuan, Xu, Renzhe, Yu, Han, Cui, Peng
Traditional machine learning paradigms are based on the assumption that both training and test data follow the same statistical pattern, which is mathematically referred to as Independent and Identically Distributed ($i.i.d.$). However, in real-world applications, this $i.i.d.$ assumption often fails to hold due to unforeseen distributional shifts, leading to considerable degradation in model performance upon deployment. This observed discrepancy indicates the significance of investigating the Out-of-Distribution (OOD) generalization problem. OOD generalization is an emerging topic of machine learning research that focuses on complex scenarios wherein the distributions of the test data differ from those of the training data. This paper represents the first comprehensive, systematic review of OOD generalization, encompassing a spectrum of aspects from problem definition, methodological development, and evaluation procedures, to the implications and future directions of the field. Our discussion begins with a precise, formal characterization of the OOD generalization problem. Following that, we categorize existing methodologies into three segments: unsupervised representation learning, supervised model learning, and optimization, according to their positions within the overarching learning process. We provide an in-depth discussion on representative methodologies for each category, further elucidating the theoretical links between them. Subsequently, we outline the prevailing benchmark datasets employed in OOD generalization studies. To conclude, we overview the existing body of work in this domain and suggest potential avenues for future research on OOD generalization. A summary of the OOD generalization methodologies surveyed in this paper can be accessed at http://out-of-distribution-generalization.com.