Hallmark
Characterizing emergent representations in a space of candidate learning rules for deep networks

Neural Information Processing Systems

How are sensory representations learned via experience? Deep learning offers a theoretical toolkit for studying how neural codes emerge under different learning rules. Studies suggesting that representations in deep networks resemble those in biological brains have mostly relied on one specific learning rule: gradient descent, the workhorse behind modern deep learning. However, it remains unclear how robust these emergent representations in deep networks are to this specific choice of learning algorithm. Here we present a continuous two-dimensional space of candidate learning rules, parameterized by levels of top-down feedback and Hebbian learning.
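
The rule space described here can be sketched as an interpolation between a Hebbian (correlational) term and a top-down error-feedback term. The parameterization below is illustrative, not the paper's exact formulation:

```python
import numpy as np

def update(w, x, y_target, alpha, beta, lr=0.1):
    """One weight update for a linear neuron y = w @ x.

    alpha scales a Hebbian term (needs no teacher), beta scales an
    error-feedback (delta-rule) term driven by a top-down target.
    A point (alpha, beta) picks one rule from a two-dimensional space
    of candidate learning rules.
    """
    y = w @ x
    hebbian = y * x                  # correlation of pre- and post-activity
    feedback = (y_target - y) * x    # error-driven, needs top-down signal
    return w + lr * (alpha * hebbian + beta * feedback)

# Pure error feedback (alpha=0, beta=1) drives the output toward the target.
w = np.zeros(2)
x = np.array([1.0, 0.5])
for _ in range(200):
    w = update(w, x, y_target=1.0, alpha=0.0, beta=1.0)
```

Sweeping `alpha` and `beta` over a grid then traces out the continuous rule space in which emergent representations can be compared.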


ImmunoFOMO: Are Language Models missing what oncologists see?

Sinha, Aman, Popescu, Bogdan-Valentin, Coubez, Xavier, Clausel, Marianne, Constant, Mathieu

arXiv.org Artificial Intelligence

Language model (LM) capabilities have grown at a fast pace over the past decade, leading researchers in various disciplines, such as biomedical research, to increasingly explore the utility of LMs in their day-to-day applications. Domain-specific language models have already been in use for biomedical natural language processing (NLP) applications. Recently, however, interest has grown in medical language models and their understanding capabilities. In this paper, we investigate the medical conceptual grounding of various language models against expert clinicians for the identification of hallmarks of immunotherapy in breast cancer abstracts. Our results show that pre-trained language models have the potential to outperform large language models in identifying very specific (low-level) concepts.
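
The grounding evaluation described here amounts to comparing the hallmark concepts a model flags in an abstract against an expert clinician's annotations. A minimal sketch, with illustrative concept names (not the paper's annotation scheme):

```python
# Hypothetical annotations for one abstract: the expert's gold set of
# hallmark concepts versus the set a language model identified.
expert = {"sustaining proliferative signaling", "avoiding immune destruction"}
model = {"avoiding immune destruction", "inducing angiogenesis"}

# Set overlap gives per-abstract precision, recall, and F1 for the model's
# conceptual grounding against the clinician.
tp = len(expert & model)
precision = tp / len(model)
recall = tp / len(expert)
f1 = 2 * precision * recall / (precision + recall)
```

Averaging these scores over a corpus of abstracts yields the kind of model-versus-clinician comparison the paper reports.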


QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification

Sakai, Hajar, Lam, Sarah S.

arXiv.org Artificial Intelligence

The escalating volume of collected healthcare textual data presents a unique challenge for automated Multi-Label Text Classification (MLTC), primarily due to the scarcity of annotated texts for training and their nuanced nature. Traditional machine learning models often fail to fully capture the array of expressed topics. Large Language Models (LLMs), however, have demonstrated remarkable effectiveness across numerous Natural Language Processing (NLP) tasks in various domains, and they show impressive computational efficiency and suitability for unsupervised learning through prompt engineering. Consequently, these LLMs promise an effective MLTC of medical narratives. However, when dealing with various labels, different prompts can be relevant depending on the topic. To address these challenges, the proposed approach, QUAD-LLM-MLTC, leverages the strengths of four models: GPT-4o, BERT, PEGASUS, and BART. QUAD-LLM-MLTC operates in a sequential pipeline in which BERT extracts key tokens, PEGASUS augments textual data, GPT-4o classifies, and BART provides topic assignment probabilities, which results in four classifications, all in a 0-shot setting. The outputs are then combined using ensemble learning and processed through a meta-classifier to produce the final MLTC result. The approach is evaluated using three samples of annotated texts, contrasting it with traditional and single-model methods. The results show significant improvements across the majority of the topics in classification F1 score and consistency (F1 and Micro-F1 scores of 78.17% and 80.16% with standard deviations of 0.025 and 0.011, respectively). This research advances MLTC using LLMs and provides an efficient and scalable solution to rapidly categorize healthcare-related text data without further training.
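
The ensemble step can be sketched as combining per-model label sets with a voting rule. The labels and outputs below are hypothetical, and a fixed vote threshold stands in for the trained meta-classifier:

```python
from collections import Counter

# Hypothetical multi-label predictions for three documents from the four
# component models of the pipeline (each entry is one document's label set).
model_outputs = {
    "gpt4o":   [{"oncology"}, {"cardiology"}, {"oncology", "genetics"}],
    "bert":    [{"oncology"}, {"cardiology"}, {"oncology"}],
    "pegasus": [{"oncology"}, {"neurology"},  {"genetics"}],
    "bart":    [{"oncology"}, {"cardiology"}, {"oncology", "genetics"}],
}

def ensemble_vote(outputs, threshold=2):
    """Keep a label for a document if at least `threshold` models assign it.
    A trained meta-classifier would replace this fixed-threshold rule."""
    n_docs = len(next(iter(outputs.values())))
    results = []
    for i in range(n_docs):
        votes = Counter(label for preds in outputs.values() for label in preds[i])
        results.append({label for label, count in votes.items() if count >= threshold})
    return results

final = ensemble_vote(model_outputs)
```

For the toy inputs above, majority voting keeps only labels that at least two of the four models agree on, which is the intuition the meta-classifier refines with learned weights.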


Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs' Domain-Specific Insight Learning?

Pezeshkpour, Pouya, Hruschka, Estevam

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable performance on various tasks, yet their ability to extract and internalize deeper insights from domain-specific datasets remains underexplored. In this study, we investigate how continual pre-training can enhance LLMs' capacity for insight learning across three distinct forms: declarative, statistical, and probabilistic insights. Focusing on two critical domains: medicine and finance, we employ LoRA to train LLMs on two existing datasets. To evaluate each insight type, we create benchmarks to measure how well continual pre-training helps models go beyond surface-level knowledge. We also assess the impact of document modification on capturing insights. The results show that, while continual pre-training on original documents has a marginal effect, modifying documents to retain only essential information significantly enhances the insight-learning capabilities of LLMs.
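
The LoRA setup used for continual pre-training can be illustrated with a minimal sketch: the frozen pre-trained weight is augmented with a trainable low-rank update, so only a small number of parameters change. Shapes, rank, and initialization scale below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 16, 2

W = rng.normal(size=(d_out, d_in))         # frozen pre-trained weight
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                # trainable up-projection, zero init

def lora_forward(x, scale=1.0):
    """Adapted layer: the effective weight is W + scale * (B @ A).
    During continual pre-training only A and B receive gradients."""
    return (W + scale * (B @ A)) @ x

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted model starts identical to the base,
# so continual pre-training departs smoothly from the pre-trained LLM.
```

This is why LoRA suits the study's setting: the base model's surface knowledge is preserved exactly at initialization, and any insight learning shows up in the low-rank update.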


A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain

Kroll, Hermann, Sackhoff, Pascal, Thang, Bill Matthias, Ksouri, Maha, Balke, Wolf-Tilo

arXiv.org Artificial Intelligence

Digital libraries that maintain extensive textual collections may want to further enrich their content for certain downstream applications, e.g., building knowledge graphs, semantic enrichment of the documents, or implementing novel access paths. All of these applications require some text processing, either to identify relevant entities, extract semantic relationships between them, or to classify documents into some categories. However, implementing reliable, supervised workflows can become quite challenging for a digital library because suitable training data must be crafted, and reliable models must be trained. While many works focus on achieving the highest accuracy on some benchmarks, we tackle the problem from the perspective of a digital library practitioner. In other words, we also consider tradeoffs between accuracy and application costs, dive into training data.

One way to explore a digital library's content is to apply natural language processing methods, e.g., identify central entities (e.g., the person Albert Einstein), their relationships (e.g., Albert Einstein was born in Ulm), and classify documents as belonging to classes (e.g., descriptive articles). The extraction of semantic relationships between named entities is already used in several digital library projects for different purposes, e.g., constructing a biomedical knowledge graph from scientific papers like SemMedDB [18], harvesting leaderboards of how computer science methods perform on benchmarks [17], harvesting scientific information as done in SciGraph [44], enabling graph-based discovery systems in digital libraries [20], or enriching library content like newspapers as done in the Swiss-Luxembourgish impresso [10].


Hallmarks of Optimization Trajectories in Neural Networks and LLMs: The Lengths, Bends, and Dead Ends

Singh, Sidak Pal, He, Bobby, Hofmann, Thomas, Schölkopf, Bernhard

arXiv.org Machine Learning

We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which reveal the inherent nuance and interplay involved between various optimization choices, such as momentum, weight decay, and batch size. We use them to provide key hallmarks about the nature of optimization in deep neural networks: when it goes right, and when it finds itself in a dead end. Further, thanks to our trajectory perspective, we uncover an intertwined behaviour of momentum and weight decay that promotes directional exploration, as well as a directional regularization behaviour of some others. We perform experiments over large-scale vision and language settings, including large language models (LLMs) with up to 12 billion parameters, to demonstrate the value of our approach.
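
Two of the simplest trajectory descriptors in this spirit can be computed directly from parameter checkpoints: total path length, and the mean cosine similarity between successive updates (a proxy for how much the trajectory bends). The exact complexity notions in the paper differ; this is a hedged sketch:

```python
import numpy as np

def trajectory_stats(checkpoints):
    """Given a list of flattened parameter vectors saved during training,
    return (total path length, mean directional similarity of successive
    updates). A straight trajectory has mean similarity 1; heavy bending
    or a dead end drives it toward 0 or below."""
    deltas = [b - a for a, b in zip(checkpoints, checkpoints[1:])]
    length = sum(np.linalg.norm(d) for d in deltas)
    cosines = [
        float(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2)))
        for d1, d2 in zip(deltas, deltas[1:])
    ]
    return length, sum(cosines) / len(cosines)

# A straight-line trajectory: total length 3, successive updates aligned.
straight = [np.array([float(t), 0.0]) for t in range(4)]
length, mean_cos = trajectory_stats(straight)
```

Tracking such quantities across runs is how effects like the directional exploration promoted by momentum and weight decay become visible.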


Improving Cancer Hallmark Classification with BERT-based Deep Learning Approach

Zavrak, Sultan, Yilmaz, Seyhmus

arXiv.org Artificial Intelligence

This paper presents a novel approach to accurately classifying the hallmarks of cancer, which is a crucial task in cancer research. Our proposed method utilizes the Bidirectional Encoder Representations from Transformers (BERT) architecture, which has shown exceptional performance in various downstream applications. By applying transfer learning, we fine-tuned the pre-trained BERT model on a small corpus of biomedical text documents related to cancer. The outcomes of our experimental investigations demonstrate that our approach attains a noteworthy accuracy of 94.45%, surpassing almost all prior findings with a substantial increase of at least 8.04% over results reported in the literature. These findings highlight the effectiveness of our proposed model in accurately classifying and comprehending text documents for cancer research, thus contributing significantly to the field. As cancer remains one of the top ten leading causes of death globally, our approach holds great promise in advancing cancer research and improving patient outcomes.

Keywords: BERT, cancer hallmark classification, transfer learning, deep learning, natural language processing

1. Introduction

Cancer, which involves epigenetic and genetic mutations, is one of the most difficult diseases confronting people in many parts of the world today (Jiang et al., 2020). Up to now, millions of people worldwide have died of this disease (World Health Organization, 2008). The study of cancer has a long history stretching from the past to the present and has consistently drawn the attention of biomedical researchers.
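
The transfer-learning recipe used here, freezing a pre-trained encoder and training only a small classification head on its features, can be sketched without the transformer itself. A fixed random embedding table stands in for frozen BERT, and the toy documents and two classes are purely illustrative:

```python
import numpy as np

def frozen_encoder(token_ids, dim=16):
    """Stand-in for a frozen pre-trained encoder: a fixed embedding table
    mean-pooled over the document's tokens. No parameters here are trained."""
    table = np.random.default_rng(42).normal(size=(100, dim))
    return table[token_ids].mean(axis=0)

def train_head(features, labels, n_classes, lr=0.5, steps=300):
    """Softmax classification head trained by gradient descent on the
    frozen features; this is the only part that learns."""
    W = np.zeros((n_classes, features.shape[1]))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = features @ W.T
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W += lr * (onehot - p).T @ features / len(labels)
    return W

# Tiny toy corpus: two "hallmark" classes, each a cluster of token ids.
docs = [[1, 2, 3], [2, 3, 4], [50, 51, 52], [51, 52, 53]]
labels = np.array([0, 0, 1, 1])
X = np.stack([frozen_encoder(d) for d in docs])
W = train_head(X, labels, n_classes=2)
preds = (X @ W.T).argmax(axis=1)
```

In the paper's setting the same structure holds at scale: BERT supplies the representation, and fine-tuning on a small biomedical corpus adapts it to hallmark classification.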


Joint Action is a Framework for Understanding Partnerships Between Humans and Upper Limb Prostheses

Dawson, Michael R., Parker, Adam S. R., Williams, Heather E., Shehata, Ahmed W., Hebert, Jacqueline S., Chapman, Craig S., Pilarski, Patrick M.

arXiv.org Artificial Intelligence

Recent advances in upper limb prostheses have led to significant improvements in the number of movements provided by the robotic limb. However, the method for controlling multiple degrees of freedom via user-generated signals remains challenging. To address this issue, various machine learning controllers have been developed to better predict movement intent. As these controllers become more intelligent and take on more autonomy in the system, the traditional approach of representing the human-machine interface as a human controlling a tool becomes limiting. One possible approach to improve the understanding of these interfaces is to model them as collaborative, multi-agent systems through the lens of joint action. The field of joint action has been commonly applied to two human partners who are trying to work jointly together to achieve a task, such as singing or moving a table together, by effecting coordinated change in their shared environment. In this work, we compare different prosthesis controllers (proportional electromyography with sequential switching, pattern recognition, and adaptive switching) in terms of how they present the hallmarks of joint action. The results of the comparison lead to a new perspective for understanding how existing myoelectric systems relate to each other, along with recommendations for how to improve these systems by increasing the collaborative communication between each partner.


Understanding Human-Machine Collaboration (Artificial Intelligence)

#artificialintelligence

Abstract: We present a model of sense-making that greatly facilitates the collaboration between an intelligent analyst and a knowledge-based agent. It is a general model grounded in the science of evidence and the scientific method of hypothesis generation and testing, where sense-making hypotheses that explain an observation are generated, relevant evidence is then discovered, and the hypotheses are tested based on the discovered evidence. We illustrate how the model enables an analyst to directly instruct the agent to understand situations involving the possible production of weapons (e.g., chemical warfare agents) and how the agent becomes increasingly more competent in understanding other situations from that domain (e.g., possible production of centrifuge-enriched uranium or of stealth fighter aircraft).

Abstract: There is a growing desire to create computer systems that can communicate effectively to collaborate with humans on complex, open-ended activities. Assessing these systems presents significant challenges. We describe a framework for evaluating systems engaged in open-ended complex scenarios where evaluators do not have the luxury of comparing performance to a single right answer.