Overview
Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache Consumption
Shi, Luohe, Zhang, Hongyi, Yao, Yao, Li, Zuchao, Zhao, Hai
Large Language Models (LLMs), epitomized by ChatGPT' s release in late 2022, have revolutionized various industries with their advanced language comprehension. However, their efficiency is challenged by the Transformer architecture' s struggle with handling long texts. KV-Cache has emerged as a pivotal solution to this issue, converting the time complexity of token generation from quadratic to linear, albeit with increased GPU memory overhead proportional to conversation length. With the development of the LLM community and academia, various KV-Cache compression methods have been proposed. In this review, we dissect the various properties of KV-Cache and elaborate on various methods currently used to optimize the KV-Cache space usage of LLMs. These methods span the pre-training phase, deployment phase, and inference phase, and we summarize the commonalities and differences among these methods. Additionally, we list some metrics for evaluating the long-text capabilities of large language models, from both efficiency and capability perspectives. Our review thus sheds light on the evolving landscape of LLM optimization, offering insights into future advancements in this dynamic field.
Language Models as Models of Language
This chapter critically examines the potential contributions of modern language models to theoretical linguistics. Despite their focus on engineering goals, these models' ability to acquire sophisticated linguistic knowledge from mere exposure to data warrants a careful reassessment of their relevance to linguistic theory. I review a growing body of empirical evidence suggesting that language models can learn hierarchical syntactic structure and exhibit sensitivity to various linguistic phenomena, even when trained on developmentally plausible amounts of data. While the competence/performance distinction has been invoked to dismiss the relevance of such models to linguistic theory, I argue that this assessment may be premature. By carefully controlling learning conditions and making use of causal intervention methods, experiments with language models can potentially constrain hypotheses about language acquisition and competence. I conclude that closer collaboration between theoretical linguists and computational researchers could yield valuable insights, particularly in advancing debates about linguistic nativism.
HADRON: Human-friendly Control and Artificial Intelligence for Military Drone Operations
Faulí, Ana M. Casado, Malizia, Mario, Hasselmann, Ken, Flécher, Emile Le, De Cubber, Geert, Lauwens, Ben
As drones are getting more and more entangled in our society, more untrained users require the capability to operate them. This scenario is to be achieved through the development of artificial intelligence capabilities assisting the human operator in controlling the Unmanned Aerial System (UAS) and processing the sensor data, thereby alleviating the need for extensive operator training. This paper presents the HADRON project that seeks to develop and test multiple novel technologies to enable human-friendly control of drone swarms. This project is divided into three main parts. The first part consists of the integration of different technologies for the intuitive control of drones, focusing on novice or inexperienced pilots and operators. The second part focuses on the development of a multi-drone system that will be controlled from a command and control station, in which an expert pilot can supervise the operations of the multiple drones. The third part of the project will focus on reducing the cognitive load on human operators, whether they are novice or expert pilots. For this, we will develop AI tools that will assist drone operators with semi-automated real-time data processing.
A Comprehensive Survey on EEG-Based Emotion Recognition: A Graph-Based Perspective
Liu, Chenyu, Zhou, Xinliang, Wu, Yihao, Ding, Yi, Zhai, Liming, Wang, Kun, Jia, Ziyu, Liu, Yang
Compared to other modalities, electroencephalogram (EEG) based emotion recognition can intuitively respond to emotional patterns in the human brain and, therefore, has become one of the most focused tasks in affective computing. The nature of emotions is a physiological and psychological state change in response to brain region connectivity, making emotion recognition focus more on the dependency between brain regions instead of specific brain regions. A significant trend is the application of graphs to encapsulate such dependency as dynamic functional connections between nodes across temporal and spatial dimensions. Concurrently, the neuroscientific underpinnings behind this dependency endow the application of graphs in this field with a distinctive significance. However, there is neither a comprehensive review nor a tutorial for constructing emotion-relevant graphs in EEG-based emotion recognition. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of graph-related methods in this field from a methodological perspective. We propose a unified framework for graph applications in this field and categorize these methods on this basis. Finally, based on previous studies, we also present several open challenges and future directions in this field.
Artificial Neural Network and Deep Learning: Fundamentals and Theory
"Artificial Neural Network and Deep Learning: Fundamentals and Theory" offers a comprehensive exploration of the foundational principles and advanced methodologies in neural networks and deep learning. This book begins with essential concepts in descriptive statistics and probability theory, laying a solid groundwork for understanding data and probability distributions. As the reader progresses, they are introduced to matrix calculus and gradient optimization, crucial for training and fine-tuning neural networks. The book delves into multilayer feed-forward neural networks, explaining their architecture, training processes, and the backpropagation algorithm. Key challenges in neural network optimization, such as activation function saturation, vanishing and exploding gradients, and weight initialization, are thoroughly discussed. The text covers various learning rate schedules and adaptive algorithms, providing strategies to optimize the training process. Techniques for generalization and hyperparameter tuning, including Bayesian optimization and Gaussian processes, are also presented to enhance model performance and prevent overfitting. Advanced activation functions are explored in detail, categorized into sigmoid-based, ReLU-based, ELU-based, miscellaneous, non-standard, and combined types. Each activation function is examined for its properties and applications, offering readers a deep understanding of their impact on neural network behavior. The final chapter introduces complex-valued neural networks, discussing complex numbers, functions, and visualizations, as well as complex calculus and backpropagation algorithms. This book equips readers with the knowledge and skills necessary to design, and optimize advanced neural network models, contributing to the ongoing advancements in artificial intelligence.
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks
De Stefano, Gianluca, Schönherr, Lea, Pellegrino, Giancarlo
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge. This process involves collecting, indexing, retrieving, and providing information to an LLM for generating responses. Despite its growing popularity due to its flexibility and low cost, the security implications of RAG have not been extensively studied. The data for such systems are often collected from public sources, providing an attacker a gateway for indirect prompt injections to manipulate the responses of the model. In this paper, we investigate the security of RAG systems against end-to-end indirect prompt manipulations. First, we review existing RAG framework pipelines, deriving a prototypical architecture and identifying critical parameters. We then examine prior works searching for techniques that attackers can use to perform indirect prompt manipulations. Finally, we implemented Rag 'n Roll, a framework to determine the effectiveness of attacks against end-to-end RAG applications. Our results show that existing attacks are mostly optimized to boost the ranking of malicious documents during the retrieval phase. However, a higher rank does not immediately translate into a reliable attack. Most attacks, against various configurations, settle around a 40% success rate, which could rise to 60% when considering ambiguous answers as successful attacks (those that include the expected benign one as well). Additionally, when using unoptimized documents, attackers deploying two of them (or more) for a target query can achieve similar results as those using optimized ones. Finally, exploration of the configuration space of a RAG showed limited impact in thwarting the attacks, where the most successful combination severely undermines functionality.
OWL2Vec4OA: Tailoring Knowledge Graph Embeddings for Ontology Alignment
Teymurova, Sevinj, Jiménez-Ruiz, Ernesto, Weyde, Tillman, Chen, Jiaoyan
Ontology alignment is integral to achieving semantic interoperability as the number of available ontologies covering intersecting domains is increasing. This paper proposes OWL2Vec4OA, an extension of the ontology embedding system OWL2Vec*. While OWL2Vec* has emerged as a powerful technique for ontology embedding, it currently lacks a mechanism to tailor the embedding to the ontology alignment task. OWL2Vec4OA incorporates edge confidence values from seed mappings to guide the random walk strategy. We present the theoretical foundations, implementation details, and experimental evaluation of our proposed extension, demonstrating its potential effectiveness for ontology alignment tasks.
From Graphs to Qubits: A Critical Review of Quantum Graph Neural Networks
Ceschini, Andrea, Mauro, Francesco, De Falco, Francesca, Sebastianelli, Alessandro, Verdone, Alessio, Rosato, Antonello, Saux, Bertrand Le, Panella, Massimo, Gamba, Paolo, Ullo, Silvia L.
Quantum Graph Neural Networks (QGNNs) represent a novel fusion of quantum computing and Graph Neural Networks (GNNs), aimed at overcoming the computational and scalability challenges inherent in classical GNNs that are powerful tools for analyzing data with complex relational structures but suffer from limitations such as high computational complexity and over-smoothing in large-scale applications. Quantum computing, leveraging principles like superposition and entanglement, offers a pathway to enhanced computational capabilities. This paper critically reviews the state-of-the-art in QGNNs, exploring various architectures. We discuss their applications across diverse fields such as high-energy physics, molecular chemistry, finance and earth sciences, highlighting the potential for quantum advantage. Additionally, we address the significant challenges faced by QGNNs, including noise, decoherence, and scalability issues, proposing potential strategies to mitigate these problems. This comprehensive review aims to provide a foundational understanding of QGNNs, fostering further research and development in this promising interdisciplinary field.
IN-Sight: Interactive Navigation through Sight
Schoch, Philipp, Yang, Fan, Ma, Yuntao, Leutenegger, Stefan, Hutter, Marco, Leboutet, Quentin
Current visual navigation systems often treat the environment as static, lacking the ability to adaptively interact with obstacles. This limitation leads to navigation failure when encountering unavoidable obstructions. In response, we introduce IN-Sight, a novel approach to self-supervised path planning, enabling more effective navigation strategies through interaction with obstacles. Utilizing RGB-D observations, IN-Sight calculates traversability scores and incorporates them into a semantic map, facilitating long-range path planning in complex, maze-like environments. To precisely navigate around obstacles, IN-Sight employs a local planner, trained imperatively on a differentiable costmap using representation learning techniques. The entire framework undergoes end-to-end training within the state-of-the-art photorealistic Intel SPEAR Simulator. We validate the effectiveness of IN-Sight through extensive benchmarking in a variety of simulated scenarios and ablation studies. Moreover, we demonstrate the system's real-world applicability with zero-shot sim-to-real transfer, deploying our planner on the legged robot platform ANYmal, showcasing its practical potential for interactive navigation in real environments.
Markov Senior -- Learning Markov Junior Grammars to Generate User-specified Content
Oğuz, Mehmet Kayra, Dockhorn, Alexander
Markov Junior is a probabilistic programming language used for procedural content generation across various domains. However, its reliance on manually crafted and tuned probabilistic rule sets, also called grammars, presents a significant bottleneck, diverging from approaches that allow rule learning from examples. In this paper, we propose a novel solution to this challenge by introducing a genetic programming-based optimization framework for learning hierarchical rule sets automatically. Our proposed method ``Markov Senior'' focuses on extracting positional and distance relations from single input samples to construct probabilistic rules to be used by Markov Junior. Using a Kullback-Leibler divergence-based fitness measure, we search for grammars to generate content that is coherent with the given sample. To enhance scalability, we introduce a divide-and-conquer strategy that enables the efficient generation of large-scale content. We validate our approach through experiments in generating image-based content and Super Mario levels, demonstrating its flexibility and effectiveness. In this way, ``Markov Senior'' allows for the wider application of Markov Junior for tasks in which an example may be available, but the design of a generative rule set is infeasible.