King, Irwin
Low-Rank Adaptation for Foundation Models: A Comprehensive Review
Yang, Menglin, Chen, Jialin, Zhang, Yifei, Liu, Jiahong, Zhang, Jiasheng, Ma, Qiyao, Verma, Harshit, Zhang, Qianru, Zhou, Min, King, Irwin, Ying, Rex
The rapid advancement of foundation models (large-scale neural networks trained on diverse, extensive datasets) has revolutionized artificial intelligence, enabling unprecedented progress across domains such as natural language processing, computer vision, and scientific discovery. However, the substantial parameter count of these models, often reaching billions or trillions, poses significant challenges in adapting them to specific downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating these challenges, offering a parameter-efficient mechanism to fine-tune foundation models with minimal computational overhead. This survey provides the first comprehensive review of LoRA techniques beyond large language models to general foundation models, covering recent technical foundations, emerging frontiers, and applications of low-rank adaptation across multiple domains. Finally, the survey discusses key challenges and future research directions in theoretical understanding, scalability, and robustness, and serves as a valuable resource for researchers and practitioners working on efficient foundation model adaptation.
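As background for readers new to the technique, the core low-rank update that LoRA applies can be sketched in a few lines. The PyTorch snippet below is a minimal illustration, not code from the survey; the rank r and scaling alpha are typical defaults assumed for the example.

```python
# Minimal LoRA sketch (PyTorch): the frozen weight W is augmented with a
# trainable low-rank product B @ A, so only r * (d_in + d_out) parameters train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        # Frozen pretrained weight (random here only for the demo).
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B starts at zero: no change at init
        self.scale = alpha / r

    def forward(self, x):
        # y = x W^T + scale * x A^T B^T  ==  x (W + scale * B A)^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(768, 768)
y = layer(torch.randn(4, 768))  # only A and B receive gradients
```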
Context-aware Inductive Knowledge Graph Completion with Latent Type Constraints and Subgraph Reasoning
Li, Muzhi, Yang, Cehao, Xu, Chengjin, Song, Zixing, Jiang, Xuhui, Guo, Jian, Leung, Ho-fung, King, Irwin
Inductive knowledge graph completion (KGC) aims to predict missing triples involving unseen entities. Recent works focus on modeling reasoning paths between the head and tail entities as direct supporting evidence. However, these methods depend heavily on the existence and quality of reasoning paths, which limits their applicability across different scenarios. In addition, we observe that latent type constraints and neighboring facts inherent in KGs are also vital for inferring missing triples. To effectively utilize all useful information in KGs, we introduce CATS, a novel context-aware inductive KGC solution consisting of two modules. With sufficient guidance from proper prompts and supervised fine-tuning, CATS activates the strong semantic understanding and reasoning capabilities of large language models to assess the existence of query triples. First, the type-aware reasoning module evaluates whether the candidate entity matches the latent entity type required by the query relation. Then, the subgraph reasoning module selects relevant reasoning paths and neighboring facts and evaluates their correlation to the query triple. Experimental results on three widely used datasets demonstrate that CATS significantly outperforms state-of-the-art methods in 16 out of 18 transductive, inductive, and few-shot settings, with an average absolute MRR improvement of 7.2%.
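To make the two-module design concrete, the sketch below shows how CATS-style prompts might be assembled. The prompt wording and helper names are illustrative assumptions, not the authors' templates.

```python
# Hedged sketch of two-module prompting in the style of CATS; wording is hypothetical.

def type_reasoning_prompt(relation, tail):
    # Module 1: ask the LLM whether the candidate entity fits the latent
    # entity type implied by the query relation.
    return (f"Relation: {relation}\nCandidate tail entity: {tail}\n"
            "Does this entity match the entity type this relation requires? "
            "Answer yes or no with a brief justification.")

def subgraph_reasoning_prompt(head, relation, tail, paths, neighbor_facts):
    # Module 2: present selected reasoning paths and neighboring facts as context.
    context = "\n".join(paths + neighbor_facts)
    return (f"Known facts:\n{context}\n\n"
            f"Query triple: ({head}, {relation}, {tail})\n"
            "Given the facts above, is the query triple likely true?")

print(type_reasoning_prompt("capital_of", "Paris"))
```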
NILE: Internal Consistency Alignment in Large Language Models
Hu, Minda, Zhang, Qiyuan, Wang, Yufei, He, Bowei, Wang, Hongru, Zhou, Jingyan, Li, Liangyou, Wang, Yasheng, Ma, Chen, King, Irwin
As a crucial step toward enhancing LLMs' alignment with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs learned during pre-training, which can greatly affect the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, aimed at optimizing IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to instruction data, and this internal knowledge is leveraged to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method to filter training samples, ensuring their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple ability evaluation datasets, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
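A minimal sketch of the Internal Consistency Filtering idea follows. The Jaccard overlap scorer and the threshold are toy placeholders standing in for NILE's actual consistency measure.

```python
# Hedged sketch of ICF-style filtering: score each IFT sample by how consistent
# its answer is with knowledge elicited from the target LLM, keep high scorers.

def consistency_score(internal_knowledge: str, answer: str) -> float:
    # Toy proxy: Jaccard overlap of word sets (stand-in for a real scorer).
    a = set(internal_knowledge.lower().split())
    b = set(answer.lower().split())
    return len(a & b) / max(len(a | b), 1)

def filter_ift_dataset(samples, elicit_knowledge, threshold=0.3):
    kept = []
    for s in samples:  # s = {"instruction": ..., "answer": ...}
        knowledge = elicit_knowledge(s["instruction"])  # query the target LLM
        if consistency_score(knowledge, s["answer"]) >= threshold:
            kept.append(s)
    return kept

# Toy demo with a stand-in elicitation function:
samples = [{"instruction": "Capital of France?",
            "answer": "Paris is the capital of France."}]
kept = filter_ift_dataset(
    samples, elicit_knowledge=lambda q: "Paris is the capital city of France.")
print(len(kept))  # 1
```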
Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion
Li, Muzhi, Yang, Cehao, Xu, Chengjin, Jiang, Xuhui, Qi, Yiyan, Guo, Jian, Leung, Ho-fung, King, Irwin
The Knowledge Graph Completion (KGC) task aims to infer the missing entity of an incomplete triple. Existing embedding-based methods rely solely on triples in the KG, which makes them vulnerable to specious relation patterns and long-tail entities. On the other hand, text-based methods struggle with the semantic gap between KG triples and natural language. Apart from triples, entity contexts (e.g., labels, descriptions, aliases) also play a significant role in augmenting KGs. To address these limitations, we propose KGR3, a context-enriched framework for KGC composed of three modules. First, the Retrieval module gathers supporting triples from the KG, collects plausible candidate answers from a base embedding model, and retrieves context for each related entity. Then, the Reasoning module employs a large language model to generate potential answers for each query triple. Finally, the Re-ranking module combines the candidate answers from the two preceding modules and fine-tunes an LLM to select the best answer. Extensive experiments on widely used datasets demonstrate that KGR3 consistently improves various KGC methods. Specifically, the best variant of KGR3 achieves absolute Hits@1 improvements of 12.3% and 5.6% on the FB15k-237 and WN18RR datasets.
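The retrieve, reason, re-rank flow can be sketched as a small pipeline. All components below are caller-supplied placeholders with illustrative names, not the paper's API.

```python
# Hedged sketch of a KGR3-style pipeline; every callable is a stand-in.

def kgr3_answer(query, retrieve, reason, rerank):
    head, relation = query
    # 1) Retrieval: supporting KG triples plus embedding-model candidates.
    evidence, candidates = retrieve(head, relation)
    # 2) Reasoning: an LLM proposes answers from the retrieved evidence.
    proposed = reason(query, evidence)
    # 3) Re-ranking: a fine-tuned LLM picks the best from the union.
    pool = list(dict.fromkeys(candidates + proposed))  # order-preserving union
    return rerank(query, pool, evidence)

# Toy demo with stand-in components:
answer = kgr3_answer(
    ("Paris", "capital_of"),
    retrieve=lambda h, r: (["(Paris, located_in, Europe)"], ["France", "Germany"]),
    reason=lambda q, ev: ["France"],
    rerank=lambda q, pool, ev: pool[0],
)
print(answer)  # France
```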
FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data
Zhang, Yukun, Chen, Guanzhong, Xu, Zenglin, Wang, Jianyong, Zeng, Dun, Li, Junfan, Wang, Jinghua, Qi, Yuan, King, Irwin
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment. Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial amounts of high-quality data. However, the sensitive nature of healthcare data often restricts individual clinical institutions from sharing the data needed to train sufficiently generalized and unbiased ML models. Federated Learning (FL) is an emerging approach that offers a promising solution by enabling collaborative model training across multiple participants without compromising the privacy of individual data owners. However, to the best of our knowledge, there has been limited prior research applying FL to the cardiovascular disease domain. Moreover, existing FL benchmarks and datasets are typically simulated and may fall short of replicating the natural heterogeneity found in realistic datasets, which challenges current FL algorithms. To address these gaps, this paper presents FedCVD, the first real-world FL benchmark for cardiovascular disease detection. The benchmark comprises two major tasks, electrocardiogram (ECG) classification and echocardiogram (ECHO) segmentation, based on naturally scattered datasets constructed from the CVD data of seven institutions. Our extensive experiments on these datasets reveal that FL faces new challenges from real-world non-IID and long-tail data. The code and datasets of FedCVD are available at https://github.com/SMILELab-FL/FedCVD.
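For context, the weighted aggregation at the heart of the FedAvg baseline that such a benchmark would exercise looks roughly as follows. This is a generic NumPy sketch, not FedCVD's own code, and the client sizes are invented for the demo.

```python
# Hedged FedAvg sketch: average client parameters weighted by local data size.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Seven institutions, each holding a locally trained parameter vector:
rng = np.random.default_rng(0)
weights = [rng.normal(size=4) for _ in range(7)]
sizes = [120, 300, 80, 500, 60, 210, 90]  # naturally non-IID, long-tail sizes
global_model = fedavg(weights, sizes)
print(global_model)
```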
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs
Zhang, Yifei, Zhu, Hao, Liu, Aiwei, Yu, Han, Koniusz, Piotr, King, Irwin
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. However, the enormous size of LLMs poses significant challenges in terms of computational complexity and resource requirements. Low-Rank Adaptation (LoRA) has emerged as a promising solution, yet a gap remains between the practical performance of low-rank adaptation and its theoretical optimum. In this work, we propose eXtreme Gradient Boosting LoRA (XGBLoRA), a novel framework that bridges this gap by leveraging the power of ensemble learning. Inspired by gradient boosting, XGBLoRA iteratively learns and merges a sequence of LoRA adaptations to refine model predictions. It achieves better performance than standard LoRA while enjoying the computational efficiency of rank-1 adaptations. We provide theoretical analysis showing the convergence and optimality of our approach, and conduct extensive experiments on a range of natural language processing tasks. The results demonstrate that XGBLoRA consistently outperforms standard LoRA and achieves performance comparable to full fine-tuning with significantly fewer trainable parameters. This work advances parameter-efficient fine-tuning for LLMs and offers a promising solution for adapting them to downstream tasks while balancing performance and efficiency.
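The gradient-boosting view of XGBLoRA can be sketched as repeated fit-and-merge rounds of rank-1 adapters. Training details are elided, and the function names below are illustrative, not the paper's implementation.

```python
# Hedged sketch of boosting-style rank-1 adaptation: each round trains a
# rank-1 adapter b @ a against the current merged weight, then folds it in.
import torch

def xgblora_merge(W, fit_adapter, num_rounds=8, scale=1.0):
    W = W.clone()
    d_out, d_in = W.shape
    for _ in range(num_rounds):
        b = torch.zeros(d_out, 1)        # rank-1 factors; b starts at zero
        a = torch.randn(1, d_in) * 0.01
        b, a = fit_adapter(W, b, a)      # placeholder: a few task-loss gradient steps
        W = W + scale * (b @ a)          # merge, like adding one boosting stage
    return W

# Toy demo with a stand-in fitting routine:
W = xgblora_merge(torch.randn(16, 16),
                  fit_adapter=lambda W, b, a: (b + 0.1, a))
print(W.shape)  # torch.Size([16, 16])
```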
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Yu, Dianzhi, Zhang, Xinni, Chen, Yankai, Liu, Aiwei, Zhang, Yifei, Yu, Philip S., King, Irwin
Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As machine learning models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary challenge of MMCL is that it goes beyond a simple stacking of unimodal CL methods, as such straightforward approaches often yield unsatisfactory performance. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We categorize existing MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to prompt further research in this field, we summarize open MMCL datasets and benchmarks, and discuss several promising future directions for investigation and development. We have also created a GitHub repository for indexing relevant MMCL papers and open resources available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.
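As a concrete instance of the replay-based category, a minimal multimodal replay buffer with reservoir sampling might look as follows. This is a generic illustration under those assumptions, not any specific surveyed method.

```python
# Hedged sketch of replay-based MMCL: keep a small reservoir of past paired
# multimodal samples and mix them into each new task's batches.
import random

class MultimodalReplayBuffer:
    def __init__(self, capacity=1000, seed=0):
        self.capacity, self.buffer, self.seen = capacity, [], 0
        self.rng = random.Random(seed)

    def add(self, image, text, label):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((image, text, label))
        else:  # reservoir sampling keeps a uniform sample over all seen data
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = (image, text, label)

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))
```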
Hyperbolic Fine-tuning for Large Language Models
Yang, Menglin, Feng, Aosong, Xiong, Bo, Liu, Jiahong, King, Irwin, Ying, Rex
Large language models (LLMs) have demonstrated remarkable performance on various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for embedding tokens in LLMs. In this study, we first investigate the non-Euclidean characteristics of LLMs. Our findings reveal that token frequency follows a power-law distribution, with high-frequency tokens clustering near the origin and low-frequency tokens positioned farther away. Additionally, token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space. Building on these observations, we propose to efficiently fine-tune LLMs in hyperbolic space to better exploit the underlying complex structures. However, we find that such fine-tuning cannot be achieved through a naive application of exponential and logarithmic maps when the embedding and weight matrices both reside in Euclidean space. To address this technical issue, we introduce hyperbolic low-rank efficient fine-tuning (HypLoRA), a new method that performs low-rank adaptation directly on the hyperbolic manifold, avoiding the cancellation effect caused by the exponential and logarithmic maps and thus preserving hyperbolic modeling capabilities. Through extensive experiments, we demonstrate that HypLoRA significantly enhances the performance of LLMs on reasoning tasks, particularly complex ones. In particular, HypLoRA improves performance on the challenging AQuA dataset by up to 13.0%, showcasing its effectiveness in handling complex reasoning challenges.
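For background, the Lorentz-model exponential and logarithmic maps that hyperbolic fine-tuning builds on can be written compactly (curvature -1, origin at (1, 0, ..., 0)). This is a generic utility sketch, not the HypLoRA operator itself.

```python
# Hedged sketch of Lorentz-model maps at the origin of the hyperboloid.
import numpy as np

def exp_map_origin(v):
    """Map a tangent vector v (with v[0] == 0) onto the hyperboloid."""
    n = np.linalg.norm(v[1:])
    o = np.zeros_like(v)
    o[0] = 1.0
    if n < 1e-9:
        return o
    return np.cosh(n) * o + np.sinh(n) * (v / n)

def log_map_origin(x):
    """Inverse map: hyperboloid point x back to the tangent space at the origin."""
    d = np.arccosh(np.clip(x[0], 1.0, None))  # Lorentz distance to the origin
    u = x.copy()
    u[0] = 0.0
    n = np.linalg.norm(u)
    return (d / n) * u if n > 1e-9 else np.zeros_like(x)

x = exp_map_origin(np.array([0.0, 0.3, -0.2]))
print(np.allclose(log_map_origin(x), [0.0, 0.3, -0.2]))  # True
```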
Recent Advances in Speech Language Models: A Survey
Cui, Wenqian, Yu, Dianzhi, Jiao, Xiaoqi, Meng, Ziqiao, Zhang, Guangyan, Wang, Qichao, Guo, Yiwen, King, Irwin
Large Language Models (LLMs) have recently garnered significant attention, primarily for their capabilities in text-based interactions. However, natural human interaction often relies on speech, necessitating a shift towards voice-based models. A straightforward approach to achieve this involves a pipeline of "Automatic Speech Recognition (ASR) + LLM + Text-to-Speech (TTS)", where input speech is transcribed to text, processed by an LLM, and then converted back to speech. Despite being straightforward, this method suffers from inherent limitations, such as information loss during modality conversion and error accumulation across the three stages. To address these issues, Speech Language Models (SpeechLMs), end-to-end models that generate speech without converting from text, have emerged as a promising alternative. This survey paper provides the first comprehensive overview of recent methodologies for constructing SpeechLMs, detailing the key components of their architecture and the various training recipes integral to their development. Additionally, we systematically survey the various capabilities of SpeechLMs, categorize the evaluation metrics for SpeechLMs, and discuss the challenges and future research directions in this rapidly evolving field.
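The cascaded baseline that the survey contrasts with end-to-end SpeechLMs can be sketched as three chained placeholders, with comments marking where its limitations arise. All components here are stand-ins, not any real system's API.

```python
# Hedged sketch of the cascaded "ASR + LLM + TTS" pipeline.

def cascaded_voice_agent(audio, asr, llm, tts):
    text_in = asr(audio)     # stage 1: prosody and emotion are lost here
    text_out = llm(text_in)  # stage 2: the LLM only ever sees text
    return tts(text_out)     # stage 3: upstream errors accumulate into speech

# Toy demo with stand-in components:
reply_audio = cascaded_voice_agent(
    audio=b"...",
    asr=lambda a: "what's the weather?",
    llm=lambda t: "It looks sunny today.",
    tts=lambda t: f"<waveform of: {t}>",
)
print(reply_audio)
```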
HC-GLAD: Dual Hyperbolic Contrastive Learning for Unsupervised Graph-Level Anomaly Detection
Fu, Yali, Li, Jindong, Liu, Jiahong, Xing, Qianli, Wang, Qi, King, Irwin
Unsupervised graph-level anomaly detection (UGAD) has garnered increasing attention in recent years due to its significance. However, most existing methods rely only on traditional graph neural networks to explore pairwise relationships, and such pairwise edges are not sufficient to describe the multifaceted relationships involved in anomalies. There is an urgent need to exploit node group information, which plays a crucial role in UGAD. In addition, most previous works ignore global underlying properties (e.g., hierarchy and power-law structure) that are common in real-world graph datasets and are therefore indispensable for the UGAD task. In this paper, we propose HC-GLAD, a novel Dual Hyperbolic Contrastive Learning framework for Unsupervised Graph-Level Anomaly Detection. To exploit node group connections, we construct hypergraphs based on gold motifs and subsequently perform hypergraph convolution. Furthermore, to preserve the hierarchy of real-world graphs, we introduce hyperbolic geometry into this field and conduct both graph and hypergraph embedding learning in hyperbolic space with the hyperboloid model. To the best of our knowledge, this is the first work to simultaneously apply hypergraphs with node group connections and hyperbolic geometry to this field. Extensive experiments on several real-world datasets from different fields demonstrate the superiority of HC-GLAD on the UGAD task. The code is available at https://github.com/Yali-F/HC-GLAD.
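A single hypergraph-convolution step of the kind applied after building motif-based hypergraphs can be sketched in NumPy. This is an HGNN-style illustration under standard degree normalization, not the paper's exact layer.

```python
# Hedged sketch of one hypergraph-convolution step: nodes aggregate through
# the hyperedges (node groups) they belong to, then pass through a transform.
import numpy as np

def hypergraph_conv(X, H, Theta):
    """X: (n, d) node features; H: (n, m) incidence matrix; Theta: (d, d')."""
    Dv = np.maximum(H.sum(axis=1), 1.0)          # node degrees
    De = np.maximum(H.sum(axis=0), 1.0)          # hyperedge degrees
    agg = H @ ((H.T @ X) / De[:, None])          # nodes -> hyperedges -> nodes
    return np.tanh((agg / Dv[:, None]) @ Theta)  # normalize and transform

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                      # 5 nodes, 4 features
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)  # 2 node groups
out = hypergraph_conv(X, H, rng.normal(size=(4, 4)))
print(out.shape)  # (5, 4)
```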