AITopics | Chen, Tianlong

Collaborating Authors

Chen, Tianlong

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration

He, Shwai, Li, Ang, Chen, Tianlong

arXiv.org Artificial IntelligenceJun-24-2024

Vision-Language Models (VLMs) integrate information from multiple modalities and have shown remarkable success across various tasks. However, deploying large-scale VLMs in resource-constrained scenarios is challenging. Pruning followed by finetuning offers a potential solution but remains underexplored for VLMs. This study addresses two key questions: how to distribute sparsity across different modality-specific models, and how to restore the performance of pruned sparse VLMs. Our preliminary studies identified two effective pruning settings: applying the same sparsity to both vision and language models, and pruning only the language models. While LoRA finetuning aims to restore sparse models, it faces challenges due to incompatibility with sparse models, disrupting the pruned sparsity. To overcome these issues, we propose SparseLoRA, which applies sparsity directly to LoRA weights. Our experimental results demonstrate significant improvements, including an 11.3\% boost under 2:4 sparsity and a 47.6\% enhancement under unstructured 70\% sparsity. Code is released at: \url{https://github.com/Shwai-He/VLM-Compression}.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2404.02424

Country:

North America > United States > Maryland (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

Add feedback

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

Duan, Jinhao, Zhang, Renming, Diffenderfer, James, Kailkhura, Bhavya, Sun, Lichao, Stengel-Eskin, Elias, Bansal, Mohit, Chen, Tianlong, Xu, Kaidi

arXiv.org Artificial IntelligenceJun-10-2024

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment composing 10 widely recognized tasks, across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we (1) Characterize the game-theoretic reasoning of LLMs; and (2) Perform LLM-vs.-LLM competitions as reasoning evaluation. We observe that (1) LLMs have distinct behaviors regarding various gaming scenarios; for example, LLMs fail in complete and deterministic games yet they are competitive in probabilistic gaming scenarios; (2) Most open-source LLMs, e.g., CodeLlama-34b-Instruct and Llama-2-70b-chat, are less competitive than commercial LLMs, e.g., GPT-4, in complex games, yet the recently released Llama-3-70b-Instruct makes up for this shortcoming. In addition, code-pretraining greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto Efficiency in repeated games. Detailed error profiles are provided for a better understanding of LLMs' behavior. We hope our research provides standardized protocols and serves as a foundation to spur further explorations in the strategic reasoning of LLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.12348

Country:

North America > United States (0.14)
Europe > Sweden (0.14)

Genre: Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

Tan, Zhen, Zhao, Chengshuai, Moraffah, Raha, Li, Yifan, Kong, Yu, Chen, Tianlong, Liu, Huan

arXiv.org Artificial IntelligenceJun-2-2024

Due to their unprecedented ability to process and respond to various types of data, Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI). As these advanced generative models increasingly form collaborative networks for complex tasks, the integrity and security of these systems are crucial. Our paper, ``The Wolf Within'', explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content. Unlike direct harmful output generation for MLLMs, our research demonstrates how a single MLLM agent can be subtly influenced to generate prompts that, in turn, induce other MLLM agents in the society to output malicious content. Our findings reveal that, an MLLM agent, when manipulated to produce specific prompts or instructions, can effectively ``infect'' other agents within a society of MLLMs. This infection leads to the generation and circulation of harmful outputs, such as dangerous instructions or misinformation, across the society. We also show the transferability of these indirectly generated prompts, highlighting their possibility in propagating malice through inter-agent communication. This research provides a critical insight into a new dimension of threat posed by MLLMs, where a single agent can act as a catalyst for widespread malevolent influence. Our work underscores the urgent need for developing robust mechanisms to detect and mitigate such covert manipulations within MLLM societies, ensuring their safe and ethical utilization in societal applications.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.14859

Country: North America > United States (0.47)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

Add feedback

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

Zhang, Yihua, Li, Pingzhi, Hong, Junyuan, Li, Jiaxiang, Zhang, Yimeng, Zheng, Wenqing, Chen, Pin-Yu, Lee, Jason D., Yin, Wotao, Hong, Mingyi, Wang, Zhangyang, Liu, Sijia, Chen, Tianlong

arXiv.org Artificial IntelligenceMay-27-2024

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow {in size}, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (Roberta, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Codes to reproduce all our experiments are at https://github.com/ZO-Bench/ZO-LLM .

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.11592

Country:

North America > United States (0.67)
Europe > Austria > Vienna (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Graph Sparsification via Mixture of Graphs

Zhang, Guibin, Sun, Xiangguo, Yue, Yanwei, Wang, Kun, Chen, Tianlong, Pan, Shirui

arXiv.org Artificial IntelligenceMay-23-2024

Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigate these challenges is graph sparsification, which involves removing non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely on a single global sparsity setting and uniform pruning criteria, failing to provide customized sparsification schemes for each node's complex local context. In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixtureof-Experts (MoE), to dynamically select tailored pruning solutions for each node. Specifically, MoG incorporates multiple sparsifier experts, each characterized by unique sparsity levels and pruning criteria, and selects the appropriate experts for each node. Subsequently, MoG performs a mixture of the sparse graphs produced by different experts on the Grassmann manifold to derive an optimal sparse graph. One notable property of MoG is its entirely local nature, as it depends on the specific circumstances of each individual node. Extensive experiments on four large-scale OGB datasets and two superpixel datasets, equipped with five GNN backbones, demonstrate that MoG (I) identifies subgraphs at higher sparsity levels (8.67% 50.85%), with performance equal to or better than the dense graph, (II) achieves 1.47 2.62 speedup in GNN inference with negligible performance drop, and (III) boosts "top-student" GNN performance (1.02% on RevGNN+

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.1426

Country:

Asia (0.68)
North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

Li, Dawei, Yang, Shu, Tan, Zhen, Baik, Jae Young, Yun, Sukwon, Lee, Joseph, Chacko, Aaron, Hou, Bojian, Duong-Tran, Duy, Ding, Ying, Liu, Huan, Shen, Li, Chen, Tianlong

arXiv.org Artificial IntelligenceMay-12-2024

Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.04819

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

Li, Pingzhi, Liu, Junyu, Wang, Hanrui, Chen, Tianlong

arXiv.org Artificial IntelligenceApr-30-2024

Optimization techniques in deep learning are predominantly led by first-order gradient methodologies, such as SGD. However, neural network training can greatly benefit from the rapid convergence characteristics of second-order optimization. Newton's GD stands out in this category, by rescaling the gradient using the inverse Hessian. Nevertheless, one of its major bottlenecks is matrix inversion, which is notably time-consuming in $O(N^3)$ time with weak scalability. Matrix inversion can be translated into solving a series of linear equations. Given that quantum linear solver algorithms (QLSAs), leveraging the principles of quantum superposition and entanglement, can operate within a $\text{polylog}(N)$ time frame, they present a promising approach with exponential acceleration. Specifically, one of the most recent QLSAs demonstrates a complexity scaling of $O(d\cdot\kappa \log(N\cdot\kappa/\epsilon))$, depending on: {size~$N$, condition number~$\kappa$, error tolerance~$\epsilon$, quantum oracle sparsity~$d$} of the matrix. However, this also implies that their potential exponential advantage may be hindered by certain properties (i.e. $\kappa$ and $d$). We propose Q-Newton, a hybrid quantum-classical scheduler for accelerating neural network training with Newton's GD. Q-Newton utilizes a streamlined scheduling module that coordinates between quantum and classical linear solvers, by estimating & reducing $\kappa$ and constructing $d$ for the quantum solver. Our evaluation showcases the potential for Q-Newton to significantly reduce the total training time compared to commonly used optimizers like SGD. We hypothesize a future scenario where the gate time of quantum machines is reduced, possibly realized by attoseconds physics. Our evaluation establishes an ambitious and promising target for the evolution of quantum computing.

artificial intelligence, machine learning, matrix, (14 more...)

arXiv.org Artificial Intelligence

2405.00252

Country: North America > United States > Massachusetts (0.14)

Genre:

Workflow (0.93)
Research Report > Promising Solution (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Nakamura, Taishi, Mishra, Mayank, Tedeschi, Simone, Chai, Yekun, Stillerman, Jason T, Friedrich, Felix, Yadav, Prateek, Laud, Tanmay, Chien, Vu Minh, Zhuo, Terry Yue, Misra, Diganta, Bogin, Ben, Vu, Xuan-Son, Karpinska, Marzena, Dantuluri, Arnav Varma, Kusa, Wojciech, Furlanello, Tommaso, Yokota, Rio, Muennighoff, Niklas, Pai, Suhas, Adewumi, Tosin, Laippala, Veronika, Yao, Xiaozhe, Junior, Adalberto, Ariyak, Alpay, Drozd, Aleksandr, Clive, Jordan, Gupta, Kshitij, Chen, Liangyu, Sun, Qi, Tsui, Ken, Persaud, Noah, Fahmy, Nour, Chen, Tianlong, Bansal, Mohit, Monti, Nicolo, Dang, Tai, Luo, Ziyang, Bui, Tien-Tung, Navigli, Roberto, Mehta, Virendra, Blumberg, Matthew, May, Victor, Nguyen, Huu, Pyysalo, Sampo

arXiv.org Artificial IntelligenceApr-23-2024

Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, Aurora-M and its variants are released at https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407 .

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.00399

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

Jaiswal, Ajay, Hu, Bodun, Yin, Lu, Ro, Yeonju, Liu, Shiwei, Chen, Tianlong, Akella, Aditya

arXiv.org Artificial IntelligenceApr-4-2024

Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent achieving remarkable success in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges for autoregressive token-by-token generation. To mitigate computation overload incurred during generation, several early-exit and layer-dropping strategies have been proposed. Despite some promising success due to the redundancy across LLMs layers on metrics like Rough-L/BLUE, our careful knowledge-intensive evaluation unveils issues such as generation collapse, hallucination of wrong facts, and noticeable performance drop even at the trivial exit ratio of 10-15% of layers. We attribute these errors primarily to ineffective handling of the KV cache through state copying during early-exit. In this work, we observed the saturation of computationally expensive feed-forward blocks of LLM layers and proposed FFN-SkipLLM, which is a novel fine-grained skip strategy of autoregressive LLMs. More specifically, FFN-SkipLLM is an input-adaptive feed-forward skipping strategy that can skip 25-30% of FFN blocks of LLMs with marginal change in performance on knowledge-intensive generation tasks without any requirement to handle KV cache. Our extensive experiments and ablation across benchmarks like MT-Bench, Factoid-QA, and variable-length text summarization illustrate how our simple and ease-at-use method can facilitate faster autoregressive decoding.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2404.03865

Country:

Asia > India (0.51)
North America > United States > Texas (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Government > Regional Government > Asia Government > India Government (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Thought Graph: Generating Thought Process for Biological Reasoning

Hsu, Chi-Yang, Cox, Kyle, Xu, Jiawei, Tan, Zhen, Zhai, Tianhua, Hu, Mengzhou, Pratt, Dexter, Chen, Tianlong, Hu, Ziniu, Ding, Ying

arXiv.org Artificial IntelligenceMar-11-2024

We present the Thought Graph as a novel framework to support complex reasoning and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% based on cosine similarity to human annotations. Our analysis further provides insights into future directions of biological processes naming, and implications for bioinformatics and precision medicine.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3589335.3651572

2403.07144

Country:

North America > United States > Texas > Travis County > Austin (0.16)
North America > United States > California > San Diego County (0.15)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback