AITopics | Lu, Jun

Collaborating Authors

Lu, Jun


Large Language Model Compression via the Nested Activation-Aware Decomposition

Lu, Jun, Xu, Tianyi, Ding, Bill, Li, David, Kang, Yu

arXiv.org Artificial Intelligence, Mar-21-2025

In this paper, we tackle the critical challenge of compressing large language models (LLMs) to facilitate their practical deployment and broader adoption. We introduce a novel post-training compression paradigm that focuses on low-rank decomposition of LLM weights. Our analysis identifies two main challenges in this task: the variability of LLM activation distributions and the handling of unseen activations from different datasets and models. To address these challenges, we propose a nested activation-aware framework (NSVD) for LLMs, a training-free approach designed to enhance the accuracy of low-rank decompositions. It manages activation outliers by transforming the weight matrix based on both the activation distribution and the original weight matrix. This allows outliers to be absorbed into the transformed weight matrix, improving decomposition accuracy. Our comprehensive evaluation across eight datasets and six models from three distinct LLM families demonstrates the superiority of NSVD over current state-of-the-art methods, especially at medium to large compression ratios or in multilingual and multitask settings.
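
As a rough illustration of what an activation-aware low-rank decomposition can look like, the sketch below whitens a weight matrix with a Cholesky factor of the Gram matrix of some calibration activations before truncating its SVD. This is a generic single-step construction, not the paper's nested NSVD procedure, whose details the abstract does not give. The function names, the `eps` ridge term, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def plain_low_rank(W, k):
    """Best rank-k approximation of W in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]


def activation_aware_low_rank(W, X, k, eps=1e-8):
    """Rank-k approximation of W that targets the output error ||(W - W_k) X||_F.

    W: (out_dim, in_dim) weights; X: (in_dim, n_samples) calibration activations.
    eps is a small ridge (an assumption) that keeps the Cholesky factor well defined.
    """
    in_dim = W.shape[1]
    gram = X @ X.T + eps * np.eye(in_dim)
    S = np.linalg.cholesky(gram)            # gram = S @ S.T
    WS_k = plain_low_rank(W @ S, k)         # truncate in the whitened space
    # Undo the whitening: W_k = WS_k @ inv(S), computed via a linear solve.
    return np.linalg.solve(S.T, WS_k.T).T


def output_error(W, W_hat, X):
    """Frobenius norm of the change in layer outputs on the calibration data."""
    return np.linalg.norm((W - W_hat) @ X)


# Toy data (hypothetical): one input channel carries large activation outliers.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 8))
X = rng.standard_normal((8, 64))
X[0] *= 50.0
k = 3

err_plain = output_error(W, plain_low_rank(W, k), X)
err_aware = output_error(W, activation_aware_low_rank(W, X, k), X)
print(f"plain SVD error: {err_plain:.3f}, activation-aware error: {err_aware:.3f}")
```

On this toy data the activation-aware error should not exceed the plain SVD error, because truncating in the whitened space minimizes the output error over rank-k matrices (up to the small ridge).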

artificial intelligence, large language model, natural language, (17 more...)

arXiv:2503.17101

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Generalizable Machine Learning Models for Predicting Data Center Server Power, Efficiency, and Throughput

Lei, Nuoa, Shehabi, Arman, Lu, Jun, Cao, Zhi, Koomey, Jonathan, Smith, Sarah, Masanet, Eric

arXiv.org Artificial Intelligence, Mar-8-2025

In the rapidly evolving digital era, comprehending the intricate dynamics influencing server power consumption, efficiency, and performance is crucial for sustainable data center operations. However, existing models lack the ability to provide a detailed and reliable understanding of these intricate relationships. This study employs a machine learning-based approach, using the SPECpower_ssj2008 database, to facilitate user-friendly and generalizable server modeling. The resulting models demonstrate high accuracy, with errors falling within approximately 10% on the testing dataset, showcasing their practical utility and generalizability. Through meticulous analysis, predictive features related to hardware availability date, server workload level, and specifications are identified, providing insights into optimizing energy conservation, efficiency, and performance in server deployment and operation. By systematically measuring biases and uncertainties, the study underscores the need for caution when employing historical data for prospective server modeling, considering the dynamic nature of technology landscapes. Collectively, this work offers valuable insights into the sustainable deployment and operation of servers in data centers, paving the way for enhanced resource use efficiency and more environmentally conscious practices.

artificial intelligence, information management, machine learning, (18 more...)

arXiv:2503.06439

Country: North America > United States > California > Santa Barbara County > Santa Barbara (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Information Technology > Services (0.95)
Energy > Power Industry (0.94)
Government (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Practical Topics in Optimization

Lu, Jun

arXiv.org Artificial Intelligence, Feb-16-2025

In an era where data-driven decision-making and computational efficiency are paramount, optimization plays a foundational role in advancing fields such as mathematics, computer science, operations research, machine learning, and beyond. From refining machine learning models to improving resource allocation and designing efficient algorithms, optimization techniques serve as essential tools for tackling complex problems. This book aims to provide both an introductory guide and a comprehensive reference, equipping readers with the necessary knowledge to understand and apply optimization methods within their respective fields. Our primary goal is to demystify the inner workings of optimization algorithms, including black-box and stochastic optimizers, by offering both formal and intuitive explanations. Starting from fundamental mathematical principles, we derive key results to ensure that readers not only learn how these techniques work but also understand when and why to apply them effectively. By striking a careful balance between theoretical depth and practical application, this book serves a broad audience, from students and researchers to practitioners seeking robust optimization strategies.

generalized conditional gradient method, machine learning, natural language, (23 more...)

arXiv:2503.05882

Country:

North America > United States (0.45)
North America > Canada > Ontario > Toronto (0.13)

Genre:

Workflow (1.00)
Research Report > New Finding (0.92)
Summary/Review (0.87)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)


Gradient Descent, Stochastic Optimization, and Other Tales

Lu, Jun

arXiv.org Artificial Intelligence, Jan-12-2024

The goal of this paper is to debunk and dispel the magic behind black-box optimizers and stochastic optimizers. It aims to build a solid foundation on how and why the techniques work. This manuscript crystallizes this knowledge by deriving the mathematics behind the strategies from simple intuitions. This tutorial doesn't shy away from addressing both the formal and informal aspects of gradient descent and stochastic optimization methods. By doing so, it hopes to provide readers with a deeper understanding of these techniques as well as the when, the how, and the why of applying these algorithms. Gradient descent is one of the most popular algorithms for optimization and by far the most common way to optimize machine learning tasks. Its stochastic version has received attention in recent years, particularly for optimizing deep neural networks. In deep neural networks, the gradient computed from a single sample or a batch of samples is used to save computational resources and escape from saddle points. In 1951, Robbins and Monro published A Stochastic Approximation Method, one of the first modern treatments of stochastic optimization, which estimates local gradients with a new batch of samples. Stochastic optimization has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this article is to give a self-contained introduction to concepts and mathematical tools in gradient descent and stochastic optimization.
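
To make the idea of following the gradient of a small batch concrete, here is a minimal sketch of mini-batch stochastic gradient descent on a toy least-squares line fit. It is not taken from the manuscript. The function name, the step-size schedule `lr0 / (1 + decay * t)` (one common choice that satisfies the Robbins-Monro conditions), and the synthetic data are all assumptions made for illustration.

```python
import random


def sgd_linear_fit(xs, ys, steps=2000, batch_size=8, lr0=0.5, decay=0.01, seed=0):
    """Fit y ~ a * x + b by mini-batch SGD on the loss 0.5 * (a*x + b - y)^2."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    n = len(xs)
    for t in range(steps):
        lr = lr0 / (1.0 + decay * t)  # diminishing step size (assumed schedule)
        batch = [rng.randrange(n) for _ in range(batch_size)]
        grad_a = grad_b = 0.0
        for i in batch:
            residual = a * xs[i] + b - ys[i]
            grad_a += residual * xs[i]   # d/da of the per-sample loss
            grad_b += residual           # d/db of the per-sample loss
        # Step along the negative of the averaged mini-batch gradient.
        a -= lr * grad_a / batch_size
        b -= lr * grad_b / batch_size
    return a, b


# Synthetic data (hypothetical): y = 2x + 1 plus a little Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.01) for x in xs]

a_hat, b_hat = sgd_linear_fit(xs, ys)
print(f"estimated slope {a_hat:.3f}, intercept {b_hat:.3f}")
```

Each step touches only `batch_size` samples instead of all 200, which is the computational saving the abstract refers to.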

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv:2205.00832

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)


Matrix Decomposition and Applications

Lu, Jun

arXiv.org Artificial Intelligence, Dec-28-2023

In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition that favored a (block) LU decomposition, that is, the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we cannot cover all the useful and interesting results concerning matrix decomposition given the limited scope of this discussion; for example, we omit the separate analysis of Euclidean space, Hermitian space, Hilbert space, and the complex domain. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields.
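
For readers who want to see the factorization into lower and upper triangular matrices in action, the following is a minimal sketch of a scalar (non-block) LU decomposition by Gaussian elimination without row exchanges. It is an illustration only, not code from the survey; the function name and the example matrix are assumptions, and a practical routine would add pivoting.

```python
def lu_decompose(A):
    """Return (L, U) with A = L @ U, L unit lower triangular, U upper triangular.

    Doolittle-style elimination without pivoting; raises on a zero pivot.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(v) for v in row] for row in A]
    for k in range(n):
        if U[k][k] == 0.0:
            raise ValueError("zero pivot; this sketch does no row exchanges")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]       # multiplier that clears U[i][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]  # subtract a multiple of the pivot row
    return L, U


def matmul(P, Q):
    """Plain-Python matrix product, used only to check the factorization."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


A = [[2, 1, 1],
     [4, 3, 3],
     [8, 7, 9]]
L, U = lu_decompose(A)
print("L =", L)
print("U =", U)
```

Multiplying the two factors back together with `matmul(L, U)` recovers `A`, which is a quick sanity check for any factorization routine.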

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv:2201.00145

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.45)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Media > Film (0.92)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)


Bayesian Matrix Decomposition and Applications

Lu, Jun

arXiv.org Artificial Intelligence, Feb-18-2023

The sole aim of this book is to give a self-contained introduction to concepts and mathematical tools in Bayesian matrix decomposition in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we cannot cover all the useful and interesting results concerning Bayesian matrix decomposition given the limited scope of this discussion; for example, we omit the separate analysis of variational inference for conducting the optimization. We refer the reader to literature in the field of Bayesian analysis for a more detailed introduction to the related fields. This book is primarily a summary of the purpose and significance of important Bayesian matrix decomposition methods, e.g., real-valued decomposition, nonnegative matrix factorization, and Bayesian interpolative decomposition, and of the origin and complexity of these methods, which shed light on their applications. The mathematical prerequisite is a first course in statistics and linear algebra. Other than this modest background, the development is self-contained, with rigorous proofs provided throughout.

data mining, machine learning, natural language, (23 more...)

arXiv:2302.11337

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Banking & Finance > Trading (1.00)
Media > Film (0.93)
Leisure & Entertainment (0.93)
(3 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(7 more...)


Flexible and Hierarchical Prior for Bayesian Nonnegative Matrix Factorization

Lu, Jun, Ye, Xuanyu

arXiv.org Machine Learning, Jun-19-2022

In this paper, we introduce a probabilistic model for learning nonnegative matrix factorization (NMF), which is commonly used for predicting missing values and finding hidden patterns in data, and in which the matrix factors are latent variables associated with each data dimension. The nonnegativity constraint for the latent factors is handled by choosing priors with support on the nonnegative subspace. A Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on several real-world datasets of different sizes and dimensions, including MovieLens 100K and MovieLens 1M, and show that the proposed Bayesian NMF GRRN model leads to better predictions and avoids overfitting compared to existing Bayesian NMF approaches.

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv:2205.11025

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)


A survey on Bayesian inference for Gaussian mixture model

Lu, Jun

arXiv.org Machine Learning, Aug-20-2021

Clustering has become a core technology in machine learning, largely due to its applications in unsupervised learning, classification, and density estimation. A frequentist approach to clustering with mixture models is the EM algorithm, in which the parameters of the mixture model are usually estimated within a maximum likelihood framework. The Bayesian approach to finite and infinite Gaussian mixture models generates point estimates for all variables as well as the associated uncertainty, in the form of the full posterior distribution of the estimates. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in Bayesian inference for finite and infinite Gaussian mixture models in order to seamlessly introduce their applications in subsequent sections. However, we cannot cover all the useful and interesting results in this field given the limited scope of this discussion; for example, we omit the separate analysis of the generation of Dirichlet samples by the stick-breaking and Polya's urn approaches. We refer the reader to literature in the field of the Dirichlet process mixture model for a more detailed introduction to the related fields. Some excellent examples include (Frigyik et al., 2010; Murphy, 2012; Gelman et al., 2014; Hoff, 2009). This survey is primarily a summary of the purpose and significance of important background and techniques for Gaussian mixture models, e.g., the Dirichlet prior and the Chinese restaurant process, and most importantly of the origin and complexity of the methods, which shed light on their modern applications. The mathematical prerequisite is a first course in probability. Other than this modest background, the development is self-contained, with rigorous proofs provided throughout.
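
The Chinese restaurant process mentioned in the abstract can be simulated in a few lines. The sketch below is a textbook-style illustration rather than code from the survey: each new customer joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to a concentration parameter. The function name, the parameter `alpha`, and the seed are assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a random partition of n_customers from a CRP with concentration alpha > 0.

    Returns (assignments, table_sizes): the table index of each customer in
    arrival order, and the final number of customers at each table.
    """
    rng = random.Random(seed)
    assignments = []
    table_sizes = []
    for _ in range(n_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)        # open a new table
        else:
            table_sizes[table] += 1      # join an existing table
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(50, alpha=1.0)
print("tables used:", len(table_sizes))
print("table sizes:", table_sizes)
```

The number of occupied tables is not fixed in advance, which is why this process serves as a prior over cluster assignments in infinite mixture models. Larger values of `alpha` tend to produce more tables.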

artificial intelligence, bayesian inference, mixture model, (15 more...)

arXiv:2108.11753

Country: North America > United States (0.67)

Genre:

Instructional Material > Course Syllabus & Notes (0.65)
Research Report (0.63)
Overview (0.54)

Industry: Consumer Products & Services > Restaurants (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation

Liu, Xiaofeng, Li, Site, Ge, Yubin, Ye, Pengyi, You, Jane, Lu, Jun

arXiv.org Artificial Intelligence, Aug-17-2021

Unsupervised domain adaptation (UDA) has been widely adopted to alleviate the data scalability issue, but existing works usually focus on classifying independent discrete labels. However, in many tasks (e.g., medical diagnosis), the labels are discrete and successively distributed. UDA for ordinal classification requires inducing a non-trivial ordinal distribution prior on the latent space. To this end, a partially ordered set (poset) is defined to constrain the latent vector. Instead of the typical i.i.d. Gaussian latent prior, this work adapts a recursively conditional Gaussian (RCG) set for ordered constraint modeling, which admits a tractable joint distribution prior. Furthermore, we are able to control the density of content vectors that violate the poset constraints by a simple "three-sigma rule". We explicitly disentangle the cross-domain images into a shared ordinal content space, induced by the ordinal prior, and two separate source/target ordinal-unrelated spaces, and self-training is performed exclusively on the shared space for ordinal-aware domain alignment. Extensive experiments on UDA for medical diagnosis and facial age estimation demonstrate its effectiveness.

health & medicine, neural network, xiaofeng liu, (18 more...)

arXiv:2107.13467

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Revisit the Fundamental Theorem of Linear Algebra

Lu, Jun

arXiv.org Artificial Intelligence, Aug-9-2021

This survey is meant to provide an introduction to the fundamental theorem of linear algebra and the theories behind it. Our goal is to give a rigorous introduction for readers with prior exposure to linear algebra. Specifically, we provide details and proofs of some results from (Strang, 1993). We then describe the fundamental theorem of linear algebra from different views and find the properties and relationships behind the views. The fundamental theorem of linear algebra is essential in many fields, such as electrical engineering, computer science, machine learning, and deep learning. This survey is primarily a summary of the purpose and significance of the important theories behind it. The sole aim of this survey is to give a self-contained introduction to the concepts, mathematical tools, and rigorous analysis behind the fundamental theorem of linear algebra in order to seamlessly introduce its properties in the four subspaces in subsequent sections. However, we cannot cover all the useful and interesting results given the limited scope of this discussion; for example, we omit the separate analysis of (orthogonal) projection matrices. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields. Some excellent examples include (Rose, 1982; Strang, 2009; Trefethen and Bau III, 1997; Strang, 2019, 2021).
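
The four subspaces the theorem describes can be read off numerically from a full singular value decomposition. The sketch below is one standard way to do so and is not taken from the survey. The function name, the tolerance `tol`, and the example matrix are assumptions made for illustration.

```python
import numpy as np


def four_subspaces(A, tol=1e-10):
    """Orthonormal bases (as matrix columns) of the four fundamental subspaces of A."""
    U, s, Vt = np.linalg.svd(A)      # full SVD: U is m x m, Vt is n x n
    r = int(np.sum(s > tol))         # numerical rank
    return {
        "rank": r,
        "column_space": U[:, :r],      # in R^m, dimension r
        "left_null_space": U[:, r:],   # in R^m, dimension m - r
        "row_space": Vt[:r].T,         # in R^n, dimension r
        "null_space": Vt[r:].T,        # in R^n, dimension n - r
    }


# A 2 x 3 matrix of rank 1: the second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
spaces = four_subspaces(A)
for name in ("column_space", "left_null_space", "row_space", "null_space"):
    print(name, "dimension:", spaces[name].shape[1])
```

For an m x n matrix of rank r, the dimensions come out as r and m - r in R^m, and r and n - r in R^n. The row space is orthogonal to the null space, and the column space is orthogonal to the left null space.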

artificial intelligence, machine learning, survey article, (16 more...)

arXiv:2108.04432

Genre: Overview (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
