Goto

Collaborating Authors

 ead



EAD: An EEG Adapter for Automated Classification

Singh, Pushapdeep, Nigam, Jyoti, Krishna, Medicherla Vamsi, Bhavsar, Arnav, Nigam, Aditya

arXiv.org Artificial Intelligence

While electroencephalography (EEG) has been a popular modality for neural decoding, it often involves task specific acquisition of the EEG data. This poses challenges for the development of a unified pipeline to learn embeddings for various EEG signal classification, which is often involved in various decoding tasks. Traditionally, EEG classification involves the step of signal preprocessing and the use of deep learning techniques, which are highly dependent on the number of EEG channels in each sample. However, the same pipeline cannot be applied even if the EEG data is collected for the same experiment but with different acquisition devices. This necessitates the development of a framework for learning EEG embeddings, which could be highly beneficial for tasks involving multiple EEG samples for the same task but with varying numbers of EEG channels. In this work, we propose EEG Adapter (EAD), a flexible framework compatible with any signal acquisition device. More specifically, we leverage a recent EEG foundational model with significant adaptations to learn robust representations from the EEG data for the classification task. We evaluate EAD on two publicly available datasets achieving state-of-the-art accuracies 99.33% and 92.31% on EEG-ImageNet and BrainLat respectively. This illustrates the effectiveness of the proposed framework across diverse EEG datasets containing two different perception tasks: stimulus and resting-state EEG signals. We also perform zero-shot EEG classification on EEG-ImageNet task to demonstrate the generalization capability of the proposed approach.


HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Luo, Cheng, Cai, Zefan, Sun, Hanshi, Xiao, Jinqi, Yuan, Bo, Xiao, Wen, Hu, Junjie, Zhao, Jiawei, Chen, Beidi, Anandkumar, Anima

arXiv.org Artificial Intelligence

Transformer-based large language models (LLMs) demonstrate impressive performance in long context generation. Extending the context length has disproportionately shifted the memory footprint of LLMs during inference to the key-value cache (KV cache). In this paper, we propose HEADINFER, which offloads the KV cache to CPU RAM while avoiding the need to fully store the KV cache for any transformer layer on the GPU. HEADINFER employs a fine-grained, head-wise offloading strategy, maintaining only selective attention heads KV cache on the GPU while computing attention output dynamically. Through roofline analysis, we demonstrate that HEADINFER maintains computational efficiency while significantly reducing memory footprint. We evaluate HEADINFER on the Llama-3-8B model with a 1-million-token sequence, reducing the GPU memory footprint of the KV cache from 128 GB to 1 GB and the total GPU memory usage from 207 GB to 17 GB, achieving a 92% reduction compared to BF16 baseline inference. Notably, HEADINFER enables 4-million-token inference with an 8B model on a single consumer GPU with 24GB memory (e.g., NVIDIA RTX 4090) without approximation methods.


A-MEM: Agentic Memory for LLM Agents

Xu, Wujiang, Liang, Zujie, Mei, Kai, Gao, Hang, Tan, Juntao, Zhang, Yongfeng

arXiv.org Artificial Intelligence

While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code is available at https://github.com/WujiangXu/AgenticMemory.


Advancing Manuscript Metadata: Work in Progress at the Jagiellonian University

Miranda, Luiz do Valle, Kutt, Krzysztof, Nalepa, Grzegorz J.

arXiv.org Artificial Intelligence

As part of ongoing research projects, three Jagiellonian University units -- the Jagiellonian University Museum, the Jagiellonian University Archives, and the Jagiellonian Library -- are collaborating to digitize cultural heritage documents, describe them in detail, and then integrate these descriptions into a linked data cloud. Achieving this goal requires, as a first step, the development of a metadata model that, on the one hand, complies with existing standards, on the other hand, allows interoperability with other systems, and on the third, captures all the elements of description established by the curators of the collections. In this paper, we present a report on the current status of the work, in which we outline the most important requirements for the data model under development and then make a detailed comparison with the two standards that are the most relevant from the point of view of collections: Europeana Data Model used in Europeana and Encoded Archival Description used in Kalliope.


Embodied Active Defense: Leveraging Recurrent Feedback to Counter Adversarial Patches

Wu, Lingxuan, Yang, Xiao, Dong, Yinpeng, Xie, Liuwei, Su, Hang, Zhu, Jun

arXiv.org Artificial Intelligence

The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, the prevailing defenses depend on single observation or pre-established adversary information to counter adversarial patches, often failing to be confronted with unseen or adaptive adversarial attacks and easily exhibiting unsatisfying performance in dynamic 3D environments. Inspired by active human perception and recurrent feedback mechanisms, we develop Embodied Active Defense (EAD), a proactive defensive strategy that actively contextualizes environmental information to address misaligned adversarial patches in 3D real-world settings. To achieve this, EAD develops two central recurrent sub-modules, i.e., a perception module and a policy module, to implement two critical functions of active vision. These models recurrently process a series of beliefs and observations, facilitating progressive refinement of their comprehension of the target object and enabling the development of strategic actions to counter adversarial patches in 3D environments. To optimize learning efficiency, we incorporate a differentiable approximation of environmental dynamics and deploy patches that are agnostic to the adversary strategies. Extensive experiments demonstrate that EAD substantially enhances robustness against a variety of patches within just a few steps through its action policy in safety-critical tasks (e.g., face recognition and object detection), without compromising standard accuracy. Furthermore, due to the attack-agnostic characteristic, EAD facilitates excellent generalization to unseen attacks, diminishing the averaged attack success rate by 95 percent across a range of unseen adversarial attacks.


Fast and Accurate Reduced-Order Modeling of a MOOSE-based Additive Manufacturing Model with Operator Learning

Yaseen, Mahmoud, Yushu, Dewen, German, Peter, Wu, Xu

arXiv.org Machine Learning

One predominant challenge in additive manufacturing (AM) is to achieve specific material properties by manipulating manufacturing process parameters during the runtime. Such manipulation tends to increase the computational load imposed on existing simulation tools employed in AM. The goal of the present work is to construct a fast and accurate reduced-order model (ROM) for an AM model developed within the Multiphysics Object-Oriented Simulation Environment (MOOSE) framework, ultimately reducing the time/cost of AM control and optimization processes. Our adoption of the operator learning (OL) approach enabled us to learn a family of differential equations produced by altering process variables in the laser's Gaussian point heat source. More specifically, we used the Fourier neural operator (FNO) and deep operator network (DeepONet) to develop ROMs for time-dependent responses. Furthermore, we benchmarked the performance of these OL methods against a conventional deep neural network (DNN)-based ROM. Ultimately, we found that OL methods offer comparable performance and, in terms of accuracy and generalizability, even outperform DNN at predicting scalar model responses. The DNN-based ROM afforded the fastest training time. Furthermore, all the ROMs were faster than the original MOOSE model yet still provided accurate predictions. FNO had a smaller mean prediction error than DeepONet, with a larger variance for time-dependent responses. Unlike DNN, both FNO and DeepONet were able to simulate time series data without the need for dimensionality reduction techniques. The present work can help facilitate the AM optimization process by enabling faster execution of simulation tools while still preserving evaluation accuracy.


Robust Natural Language Understanding with Residual Attention Debiasing

Wang, Fei, Huang, James Y., Yan, Tianyi, Zhou, Wenxuan, Chen, Muhao

arXiv.org Artificial Intelligence

Natural language understanding (NLU) models often suffer from unintended dataset biases. Among bias mitigation methods, ensemble-based debiasing methods, especially product-of-experts (PoE), have stood out for their impressive empirical success. However, previous ensemble-based debiasing methods typically apply debiasing on top-level logits without directly addressing biased attention patterns. Attention serves as the main media of feature interaction and aggregation in PLMs and plays a crucial role in providing robust prediction. In this paper, we propose REsidual Attention Debiasing (READ), an end-to-end debiasing method that mitigates unintended biases from attention. Experiments on three NLU tasks show that READ significantly improves the performance of BERT-based models on OOD data with shortcuts removed, including +12.9% accuracy on HANS, +11.0% accuracy on FEVER-Symmetric, and +2.7% F1 on PAWS. Detailed analyses demonstrate the crucial role of unbiased attention in robust NLU models and that READ effectively mitigates biases in attention. Code is available at https://github.com/luka-group/READ.


Construction of Decision Trees and Acyclic Decision Graphs from Decision Rule Systems

Durdymyradov, Kerven, Moshkov, Mikhail

arXiv.org Artificial Intelligence

In this paper, we consider the problems of transforming systems of decision rules into decision trees. This paper builds upon our previous work [12]. In that paper, we showed that the minimum depth of a decision tree derived from the decision rule system can be much less than the number of different attributes in the rules from the system. In such cases, it is reasonable to use decision trees. In the present paper, for some types of decision rule systems and problems, we prove the existence of polynomial time algorithms for the construction of decision trees and two types of acyclic decision graphs representing decision trees. In all other cases, we prove the absence of such algorithms using the fact that the minimum number of nodes in decision trees or acyclic decision graphs can grow as a superpolynomial function depending on the size of decision rule systems. To avoid difficulties related to the number of nodes in the decision trees, we discuss also the possibility of not building the entire decision tree, but describing the computation path in this tree for the given input.


Generate rather than Retrieve: Large Language Models are Strong Context Generators

Yu, Wenhao, Iter, Dan, Wang, Shuohang, Xu, Yichong, Ju, Mingxuan, Sanyal, Soumya, Zhu, Chenguang, Zeng, Michael, Jiang, Meng

arXiv.org Artificial Intelligence

Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextutal documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a novel clustering-based prompting method that selects distinct prompts, resulting in the generated documents that cover different perspectives, leading to better recall over acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks, including open-domain QA, fact checking, and dialogue system. Notably, GenRead achieves 71.6 and 54.4 exact match scores on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Lastly, we demonstrate the model performance can be further improved by combining retrieval and generation. Our code and generated documents can be found at https://github.com/wyu97/GenRead.