
Collaborating Authors

 Qiu, Zi-Hao


To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

arXiv.org Artificial Intelligence

The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. In particular, it adjusts the logits in the softmax function in LLMs, which is crucial for next-token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: is it viable to learn a neural network to predict a personalized temperature of any input data for enhancing LFMs? In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with a robust loss underpinned by constrained distributionally robust optimization (DRO), and a properly designed TempNet with theoretical inspiration. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperatures to promote the training of LFMs but also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models, e.g., Table 1. The code to reproduce the experimental results in this paper can be found at https://github.com/zhqiu/TempNet.
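As a rough illustration of the idea described above (not the paper's actual architecture), a temperature network maps each input's representation to a strictly positive scalar that rescales the logits before the softmax. The module and parameter names below are hypothetical; a minimal sketch in PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTempNet(nn.Module):
    """Hypothetical sketch: predict a per-example temperature from a feature vector."""
    def __init__(self, feat_dim, hidden_dim=64, tau_min=0.05, tau_max=5.0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        self.tau_min, self.tau_max = tau_min, tau_max

    def forward(self, features):
        # Squash the raw output into a bounded, strictly positive temperature range.
        raw = torch.sigmoid(self.mlp(features))            # (batch, 1), values in (0, 1)
        return self.tau_min + (self.tau_max - self.tau_min) * raw

# Usage: scale next-token logits by a personalized temperature before the softmax.
feat_dim, vocab_size = 128, 1000
features = torch.randn(4, feat_dim)        # e.g., hidden states of 4 inputs
logits = torch.randn(4, vocab_size)        # unnormalized next-token scores
tau = ToyTempNet(feat_dim)(features)       # (4, 1) personalized temperatures
probs = F.softmax(logits / tau, dim=-1)    # temperature-adjusted distribution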


LibAUC: A Deep Learning Library for X-Risk Optimization

arXiv.org Artificial Intelligence

DL platforms such as PyTorch [36] have dramatically reduced the efforts of developers and researchers for implementing different DL methods without worrying about low-level computations (e.g., automatic differentiation, tensor operations, etc.). Based on these platforms, plenty of DL libraries have been developed for different purposes, which can be organized into different categories, including (i) supporting specific tasks [15, 35], e.g., TF-Ranking for LTR [35], VISSL for self-supervised learning (SSL) [15]; (ii) supporting specific data, e.g., DGL and DIG for graphs [31, 55]; (iii) supporting specific models [13, 58, 59], e.g., Transformers for transformer models [59]. However, it has been observed that these existing platforms and libraries have encountered some unique challenges when solving some classical and emerging problems in AI, including classification for imbalanced data (CID), learning to rank (LTR), and contrastive learning of representations (CLR). In particular, prior works have observed that large mini-batch sizes are necessary to attain good performance for these problems [4, 5, 7, 37, 43, 46], which restricts the capabilities of these AI models in the real world.

The motivation of developing LibAUC is to address the convergence issues of existing libraries for solving these problems. In particular, existing libraries may not converge or may require very large mini-batch sizes in order to attain good performance for these problems, due to the usage of the standard mini-batch technique in the empirical risk minimization (ERM) framework. Our library is for deep X-risk optimization (DXO), which has achieved great success in solving a variety of tasks for CID, LTR and CLR. The contributions of this paper include: (1) it introduces a new mini-batch based pipeline for implementing DXO algorithms, which differs from the existing DL pipeline in the design of controlled data samplers and dynamic mini-batch losses; (2) it provides extensive benchmarking experiments for ablation studies and comparison with existing libraries. The LibAUC library features scalable performance for millions of items to be contrasted, faster and better convergence than existing libraries for optimizing X-risks, seamless PyTorch deployment, and versatile APIs for various loss optimization. Our library is available to the open source community at https://libauc.org/.
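The pipeline sketched above, i.e., controlled data samplers combined with dynamic (state-carrying) mini-batch losses, can be illustrated in plain PyTorch. The class and argument names below are made up for illustration and are not LibAUC's actual API; consult https://libauc.org/ for the real interfaces.

import random
import torch

class ControlledSampler:
    # Illustrative controlled sampler: every mini-batch contains a fixed number of
    # positive examples (a simplification; LibAUC's real samplers are more general).
    def __init__(self, pos_idx, neg_idx, batch_size, n_pos_per_batch):
        self.pos_idx, self.neg_idx = list(pos_idx), list(neg_idx)
        self.batch_size, self.n_pos = batch_size, n_pos_per_batch

    def sample(self):
        pos = random.sample(self.pos_idx, self.n_pos)
        neg = random.sample(self.neg_idx, self.batch_size - self.n_pos)
        return pos + neg

class DynamicMiniBatchLoss:
    # Illustrative "dynamic" loss: keeps running per-sample statistics across
    # iterations instead of treating each mini-batch independently (schematic only).
    def __init__(self, n_samples, gamma=0.9):
        self.u = torch.zeros(n_samples)   # one running estimate per training sample
        self.gamma = gamma

    def __call__(self, scores, labels, idx):
        per_sample = torch.nn.functional.softplus(-(2 * labels - 1) * scores)
        # Update running estimates only for the samples seen in this mini-batch.
        self.u[idx] = self.gamma * self.u[idx] + (1 - self.gamma) * per_sample.detach()
        # Reweight the batch loss by the running statistics (not LibAUC's actual formula).
        return (per_sample * (1 + self.u[idx])).mean()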


Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization

arXiv.org Artificial Intelligence

In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower-level problems and have important applications in machine learning. Designing a stochastic gradient and controlling its variance is more intricate due to the hierarchical sampling of blocks and data and the unique challenge of estimating the hyper-gradient. We aim to achieve three nice properties for our algorithm: (a) matching the state-of-the-art complexity of standard BO problems with a single block; (b) achieving parallel speedup by sampling $I$ blocks and sampling $B$ samples for each sampled block per iteration; (c) avoiding the computation of the inverse of a high-dimensional Hessian matrix estimator. However, it is non-trivial to achieve all of these properties, as existing works only achieve one or two of them. To address the involved challenges for achieving (a)-(c), we propose two stochastic algorithms that use advanced blockwise variance-reduction techniques for tracking the Hessian matrices (for low-dimensional problems) or the Hessian-vector products (for high-dimensional problems), and prove an iteration complexity of the order $O(m\epsilon^{-3})$ that is further reduced by parallel-speedup factors depending on the number of sampled blocks $I$ and the per-block batch size $B$.
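A schematic of the blockwise variance-reduction idea, under the assumption of moving-average (STORM-style) tracking; the paper's actual estimators and hyper-gradient formulas are more involved, and all names below are illustrative:

import torch

# Keep one running estimate per lower-level block (e.g., of a Hessian-vector product)
# and refresh only the I blocks sampled at the current iteration, each with B samples.
m, d, I, B, beta = 100, 10, 8, 16, 0.1
tracked = torch.zeros(m, d)               # per-block variance-reduced estimates

def minibatch_estimate(block_id, batch_size):
    # Placeholder for a stochastic estimate computed on `batch_size` samples of this block.
    return torch.randn(d) / batch_size ** 0.5

for it in range(1000):
    blocks = torch.randperm(m)[:I]        # hierarchical sampling: first blocks, then data
    for i in blocks.tolist():
        est = minibatch_estimate(i, B)
        # Moving-average update; blocks not sampled this round keep their old estimates.
        tracked[i] = (1 - beta) * tracked[i] + beta * est
    # `tracked` would then feed the hyper-gradient estimator for the upper-level update.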


Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization

arXiv.org Artificial Intelligence

In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled and systematic manner for self-supervised learning. The common practice of using a global temperature parameter $\tau$ ignores the fact that "not all semantics are created equal": different anchor data may have different numbers of samples with similar semantics, especially when the data exhibits long tails. First, we propose a new robust contrastive loss inspired by distributionally robust optimization (DRO), which provides an intuition about the effect of $\tau$ and a mechanism for automatic temperature individualization. Then, we propose an efficient stochastic algorithm for optimizing the robust contrastive loss with a provable convergence guarantee, without using large mini-batch sizes. Theoretical and experimental results show that our algorithm automatically learns a suitable $\tau$ for each sample. Specifically, samples with frequent semantics use large temperatures to keep local semantic structures, while samples with rare semantics use small temperatures to induce more separable features. Our method not only outperforms prior strong baselines (e.g., SimCLR, CLIP) on unimodal and bimodal datasets, with larger improvements on imbalanced data, but is also less sensitive to hyper-parameters. To the best of our knowledge, this is the first methodical approach to optimizing a contrastive loss with individualized temperatures.
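A minimal sketch of a contrastive loss with one temperature per anchor, written in the style of a DRO dual (schematic, not the paper's exact objective; the function name and the margin parameter rho are assumptions):

import torch
import torch.nn.functional as F

def per_anchor_contrastive_loss(z, tau, rho=0.1):
    # z:   (2n, d) L2-normalized features, where z[2k] and z[2k+1] form a positive pair.
    # tau: (2n,)   strictly positive per-anchor temperatures.
    sim = z @ z.t()                                   # cosine similarities
    n2 = z.shape[0]
    pos = torch.arange(n2) ^ 1                        # index of each sample's positive
    self_mask = torch.eye(n2, dtype=torch.bool)
    other_sim = sim.masked_fill(self_mask, float('-inf'))
    # DRO-style dual: tau_i * logsumexp over the other samples + tau_i * rho - positive similarity.
    lse = torch.logsumexp(other_sim / tau[:, None], dim=1)
    return (tau * lse + tau * rho - sim[torch.arange(n2), pos]).mean()

# Usage with learnable raw temperatures (softplus keeps them positive):
z = F.normalize(torch.randn(8, 32), dim=1)
raw_tau = torch.zeros(8, requires_grad=True)
loss = per_anchor_contrastive_loss(z, F.softplus(raw_tau) + 0.05)
loss.backward()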


Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence

arXiv.org Artificial Intelligence

NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimize NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing the NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing the top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Different from existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size instead of the total number of items. To improve effectiveness for deep learning, we further propose practical strategies using an initial warm-up and a stop-gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first time that stochastic algorithms have been proposed to optimize NDCG with a provable convergence guarantee. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.
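For intuition, a smoothed-rank NDCG surrogate for a single query can be written as below (in the spirit of sigmoid-based surrogates; this sketch does not reproduce the paper's compositional formulation or its convergence guarantees, and the temperature parameter temp is an assumption):

import torch

def smooth_ndcg_surrogate(scores, rels, temp=1.0):
    # scores: (n,) predicted scores; rels: (n,) nonnegative relevance labels.
    diff = (scores[None, :] - scores[:, None]) / temp
    # Smooth rank of item i: 1 + sum_{j != i} sigmoid((s_j - s_i) / temp);
    # subtract 0.5 to remove the j = i term, since sigmoid(0) = 0.5.
    smooth_rank = 1.0 + torch.sigmoid(diff).sum(dim=1) - 0.5
    gains = 2.0 ** rels - 1.0
    dcg = (gains / torch.log2(1.0 + smooth_rank)).sum()
    # Ideal DCG places items in decreasing order of relevance at hard ranks 1..n.
    ideal_gains, _ = torch.sort(gains, descending=True)
    idcg = (ideal_gains / torch.log2(torch.arange(2.0, len(rels) + 2.0))).sum()
    return 1.0 - dcg / idcg.clamp_min(1e-8)   # minimizing this maximizes the NDCG surrogate

scores = torch.randn(5, requires_grad=True)
rels = torch.tensor([3.0, 1.0, 0.0, 2.0, 0.0])
loss = smooth_ndcg_surrogate(scores, rels)
loss.backward()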