Goto

Collaborating Authors

 Wen, Bo


Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications

arXiv.org Artificial Intelligence

This paper presents SOLOMON, a novel Neuro-inspired Large Language Model (LLM) Reasoning Network architecture that enhances the adaptability of foundation models for domain-specific applications. Through a case study in semiconductor layout design, we demonstrate how SOLOMON enables swift adaptation of general-purpose LLMs to specialized tasks by leveraging Prompt Engineering and In-Context Learning techniques. Our experiments reveal the challenges LLMs face in spatial reasoning and applying domain knowledge to practical problems. Results show that SOLOMON instances significantly outperform their baseline LLM counterparts and achieve performance comparable to state-of-the-art reasoning model, o1-preview. We discuss future research directions for developing more adaptive AI systems that can continually learn, adapt, and evolve in response to new information and changing requirements.


FeBiM: Efficient and Compact Bayesian Inference Engine Empowered with Ferroelectric In-Memory Computing

arXiv.org Artificial Intelligence

In scenarios with limited training data or where explainability is crucial, conventional neural network-based machine learning models often face challenges. In contrast, Bayesian inference-based algorithms excel in providing interpretable predictions and reliable uncertainty estimation in these scenarios. While many state-of-the-art in-memory computing (IMC) architectures leverage emerging non-volatile memory (NVM) technologies to offer unparalleled computing capacity and energy efficiency for neural network workloads, their application in Bayesian inference is limited. This is because the core operations in Bayesian inference differ significantly from the multiplication-accumulation (MAC) operations common in neural networks, rendering them generally unsuitable for direct implementation in most existing IMC designs. In this paper, we propose FeBiM, an efficient and compact Bayesian inference engine powered by multi-bit ferroelectric field-effect transistor (FeFET)-based IMC. FeBiM effectively encodes the trained probabilities of a Bayesian inference model within a compact FeFET-based crossbar. It maps quantized logarithmic probabilities to discrete FeFET states. As a result, the accumulated outputs of the crossbar naturally represent the posterior probabilities, i.e., the Bayesian inference model's output given a set of observations. This approach enables efficient in-memory Bayesian inference without the need for additional calculation circuitry. As the first FeFET-based in-memory Bayesian inference engine, FeBiM achieves an impressive storage density of 26.32 Mb/mm$^{2}$ and a computing efficiency of 581.40 TOPS/W in a representative Bayesian classification task. These results demonstrate 10.7$\times$/43.4$\times$ improvement in compactness/efficiency compared to the state-of-the-art hardware implementation of Bayesian inference.


AlcLaM: Arabic Dialectal Language Model

arXiv.org Artificial Intelligence

These models significantly enhance Arabic Pre-trained Language Models (PLMs) utilizing selfsupervised NLP tasks over multilingual models. However, learning techniques, such as BERT (Devlin they are predominantly trained on Modern Standard et al., 2018a) and RoBERTa (Liu et al., 2019), Arabic (MSA) datasets. This focus on MSA have become pivotal in advancing the field of introduces two primary limitations: first, there is natural language processing (NLP) through transfer reduced recognition of dialectal tokens, which vary learning. These models have significantly enhanced widely across different Arabic-speaking regions; performance across a variety of NLP tasks second, there is a biased weighting towards MSA by leveraging vast amounts of textual data and extensive tokens in the models, which may not accurately computational resources. However, the necessity reflect the linguistic nuances present in everyday for large corpora and the substantial computational Arabic usage.


Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

arXiv.org Artificial Intelligence

The rapid advancements in large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI. This paper presents an overview of the current landscape of LLMs in healthcare, specifically focusing on their applications in analyzing and generating conversations for improved patient engagement. We showcase the power of LLMs in handling unstructured conversational data through four case studies: (1) analyzing mental health discussions on Reddit, (2) developing a personalized chatbot for cognitive engagement in seniors, (3) summarizing medical conversation datasets, and (4) designing an AI-powered patient engagement system. These case studies demonstrate how LLMs can effectively extract insights and summarizations from unstructured dialogues and engage patients in guided, goal-oriented conversations. Leveraging LLMs for conversational analysis and generation opens new doors for many patient-centered outcomes research opportunities. However, integrating LLMs into healthcare raises important ethical considerations regarding data privacy, bias, transparency, and regulatory compliance. We discuss best practices and guidelines for the responsible development and deployment of LLMs in healthcare settings. Realizing the full potential of LLMs in digital health will require close collaboration between the AI and healthcare professionals communities to address technical challenges and ensure these powerful tools' safety, efficacy, and equity.


Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

arXiv.org Artificial Intelligence

Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreover, inadequate training data can further increase the risk of performance degradation. To address these challenges, we propose a novel dReLU function, which is designed to improve LLM activation sparsity, along with a high-quality training data mixture ratio to facilitate effective sparsification. Additionally, we leverage sparse activation patterns within the Feed-Forward Network (FFN) experts of Mixture-of-Experts (MoE) models to further boost efficiency. By applying our neuron sparsification method to the Mistral and Mixtral models, only 2.5 billion and 4.3 billion parameters are activated per inference iteration, respectively, while achieving even more powerful model performance. Evaluation results demonstrate that this sparsity achieves a 2-5x decoding speedup. Remarkably, on mobile phones, our TurboSparse-Mixtral-47B achieves an inference speed of 11 tokens per second. Our models are available at \url{https://huggingface.co/PowerInfer}


Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

arXiv.org Artificial Intelligence

Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.


Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description

arXiv.org Artificial Intelligence

Amyotrophic lateral sclerosis is a fatal disease that not only affects movement, speech, and breath but also cognition. Recent studies have focused on the use of language analysis techniques to detect ALS and infer scales for monitoring functional progression. In this paper, we focused on another important aspect, cognitive impairment, which affects 35-50% of the ALS population. In an effort to reach the ALS population, which frequently exhibits mobility limitations, we implemented the digital version of the Edinburgh Cognitive and Behavioral ALS Screen (ECAS) test for the first time. This test which is designed to measure cognitive impairment was remotely performed by 56 participants from the EverythingALS Speech Study. As part of the study, participants (ALS and non-ALS) were asked to describe weekly one picture from a pool of many pictures with complex scenes displayed on their computer at home. We analyze the descriptions performed within +/- 60 days from the day the ECAS test was administered and extract different types of linguistic and acoustic features. We input those features into linear regression models to infer 5 ECAS sub-scores and the total score. Speech samples from the picture description are reliable enough to predict the ECAS subs-scores, achieving statistically significant Spearman correlation values between 0.32 and 0.51 for the model's performance using 10-fold cross-validation.


"It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models

arXiv.org Artificial Intelligence

Prewriting is the process of discovering and developing ideas before a first draft, which requires divergent thinking and often implies unstructured strategies such as diagramming, outlining, free-writing, etc. Although large language models (LLMs) have been demonstrated to be useful for a variety of tasks including creative writing, little is known about how users would collaborate with LLMs to support prewriting. The preferred collaborative role and initiative of LLMs during such a creativity process is also unclear. To investigate human-LLM collaboration patterns and dynamics during prewriting, we conducted a three-session qualitative study with 15 participants in two creative tasks: story writing and slogan writing. The findings indicated that during collaborative prewriting, there appears to be a three-stage iterative Human-AI Co-creativity process that includes Ideation, Illumination, and Implementation stages. This collaborative process champions the human in a dominant role, in addition to mixed and shifting levels of initiative that exist between humans and LLMs. This research also reports on collaboration breakdowns that occur during this process, user perceptions of using existing LLMs during Human-AI Co-creativity, and discusses design implications to support this co-creativity process.


Multi-Scale Adaptive Graph Neural Network for Multivariate Time Series Forecasting

arXiv.org Artificial Intelligence

Multivariate time series (MTS) forecasting plays an important role in the automation and optimization of intelligent applications. It is a challenging task, as we need to consider both complex intra-variable dependencies and inter-variable dependencies. Existing works only learn temporal patterns with the help of single inter-variable dependencies. However, there are multi-scale temporal patterns in many real-world MTS. Single inter-variable dependencies make the model prefer to learn one type of prominent and shared temporal patterns. In this paper, we propose a multi-scale adaptive graph neural network (MAGNN) to address the above issue. MAGNN exploits a multi-scale pyramid network to preserve the underlying temporal dependencies at different time scales. Since the inter-variable dependencies may be different under distinct time scales, an adaptive graph learning module is designed to infer the scale-specific inter-variable dependencies without pre-defined priors. Given the multi-scale feature representations and scale-specific inter-variable dependencies, a multi-scale temporal graph neural network is introduced to jointly model intra-variable dependencies and inter-variable dependencies. After that, we develop a scale-wise fusion module to effectively promote the collaboration across different time scales, and automatically capture the importance of contributed temporal patterns. Experiments on four real-world datasets demonstrate that MAGNN outperforms the state-of-the-art methods across various settings.


Billion-scale Pre-trained E-commerce Product Knowledge Graph Model

arXiv.org Artificial Intelligence

In recent years, knowledge graphs have been widely applied to organize data in a uniform way and enhance many tasks that require knowledge, for example, online shopping which has greatly facilitated people's life. As a backbone for online shopping platforms, we built a billion-scale e-commerce product knowledge graph for various item knowledge services such as item recommendation. However, such knowledge services usually include tedious data selection and model design for knowledge infusion, which might bring inappropriate results. Thus, to avoid this problem, we propose a Pre-trained Knowledge Graph Model (PKGM) for our billion-scale e-commerce product knowledge graph, providing item knowledge services in a uniform way for embedding-based models without accessing triple data in the knowledge graph. Notably, PKGM could also complete knowledge graphs during servicing, thereby overcoming the common incompleteness issue in knowledge graphs. We test PKGM in three knowledge-related tasks including item classification, same item identification, and recommendation. Experimental results show PKGM successfully improves the performance of each task.