AITopics | Xie, Pengtao

Collaborating Authors

Xie, Pengtao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

Guo, Han, Hosseini, Ramtin, Zhang, Ruiyi, Somayajula, Sai Ashish, Chowdhury, Ranak Roy, Gupta, Rajesh K., Xie, Pengtao

arXiv.org Artificial IntelligenceFeb-28-2024

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency. Our code is available at: https://github.com/Alexiland/MLOMAE

artificial intelligence, dataset, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2402.18128

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.48)
Transportation > Ground > Road (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Making Scalable Meta Learning Practical

Choe, Sang Keun, Mehta, Sanket Vaibhav, Ahn, Hwijeen, Neiswanger, Willie, Xie, Pengtao, Strubell, Emma, Xing, Eric

arXiv.org Artificial IntelligenceOct-23-2023

Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2310.05674

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs

Liang, Youwei, Zhang, Ruiyi, Zhang, Li, Xie, Pengtao

arXiv.org Artificial IntelligenceMay-18-2023

A ChatGPT-like system for drug compounds could be a game-changer in pharmaceutical research, accelerating drug discovery, enhancing our understanding of structure-activity relationships, guiding lead optimization, aiding drug repurposing, reducing the failure rate, and streamlining clinical trials. In this work, we make an initial attempt towards enabling ChatGPT-like capabilities on drug molecule graphs, by developing a prototype system DrugChat. DrugChat works in a similar way as ChatGPT. Users upload a compound molecule graph and ask various questions about this compound. DrugChat will answer these questions in a multi-turn, interactive manner. The DrugChat system consists of a graph neural network (GNN), a large language model (LLM), and an adaptor. The GNN takes a compound molecule graph as input and learns a representation for this graph. The adaptor transforms the graph representation produced by the GNN into another representation that is acceptable to the LLM. The LLM takes the compound representation transformed by the adaptor and users' questions about this compound as inputs and generates answers. All these components are trained end-to-end. To train DrugChat, we collected instruction tuning datasets which contain 10,834 drug compounds and 143,517 question-answer pairs. The code and data is available at \url{https://github.com/UCSD-AI4H/drugchat}

large language model, machine learning, natural language, (5 more...)

arXiv.org Artificial Intelligence

2309.03907

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Zhou, Ce, Li, Qian, Li, Chen, Yu, Jun, Liu, Yixin, Wang, Guangjing, Zhang, Kai, Ji, Cheng, Yan, Qiben, He, Lifang, Peng, Hao, Li, Jianxin, Wu, Jia, Liu, Ziwei, Xie, Pengtao, Xiong, Caiming, Pei, Jian, Yu, Philip S., Sun, Lichao

arXiv.org Artificial IntelligenceMay-1-2023

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data which provides a reasonable parameter initialization for a wide range of downstream applications. BERT learns bidirectional encoder representations from Transformers, which are trained on large datasets as contextual language models. Similarly, the generative pretrained transformer (GPT) method employs Transformers as the feature extractor and is trained using an autoregressive paradigm on large datasets. Recently, ChatGPT shows promising success on large language models, which applies an autoregressive language model with zero shot or few shot prompting. The remarkable achievements of PFM have brought significant breakthroughs to various fields of AI. Numerous studies have proposed different methods, raising the demand for an updated survey. This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, as well as other data modalities. The review covers the basic components and existing pretraining methods used in natural language processing, computer vision, and graph learning. Additionally, it explores advanced PFMs used for different data modalities and unified PFMs that consider data quality and quantity. The review also discusses research related to the fundamentals of PFMs, such as model efficiency and compression, security, and privacy. Finally, the study provides key implications, future research directions, challenges, and open problems in the field of PFMs. Overall, this survey aims to shed light on the research of the PFMs on scalability, security, logical reasoning ability, cross-domain learning ability, and the user-friendly interactive ability for artificial general intelligence.

artificial intelligence, chatbot, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.09419

Country:

Asia (0.92)
North America > United States > California (0.27)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)
Research Report > Experimental Study (0.85)

Industry:

Leisure & Entertainment (1.00)
Education (1.00)
Information Technology > Security & Privacy (0.87)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning by Grouping: A Multilevel Optimization Framework for Improving Fairness in Classification without Losing Accuracy

Hosseini, Ramtin, Zhang, Li, Garg, Bhanu, Xie, Pengtao

arXiv.org Artificial IntelligenceApr-2-2023

The integration of machine learning models in various real-world applications is becoming more prevalent to assist humans in their daily decision-making tasks as a result of recent advancements in this field. However, it has been discovered that there is a tradeoff between the accuracy and fairness of these decision-making tasks. In some cases, these AI systems can be unfair by exhibiting bias or discrimination against certain social groups, which can have severe consequences in real life. Inspired by one of the most well-known human learning skills called grouping, we address this issue by proposing a novel machine learning framework where the ML model learns to group a diverse set of problems into distinct subgroups to solve each subgroup using its specific sub-model. Our proposed framework involves three stages of learning, which are formulated as a three-level optimization problem: (i) learning to group problems into different subgroups; (ii) learning group-specific sub-models for problem-solving; and (iii) updating group assignments of training examples by minimizing the validation loss. These three learning stages are performed end-to-end in a joint manner using gradient descent. To improve fairness and accuracy, we develop an efficient optimization algorithm to solve this three-level optimization problem. To further reduce the risk of overfitting in small datasets, we incorporate domain adaptation techniques in the second stage of training. We further apply our method to neural architecture search. Extensive experiments on various datasets demonstrate our method's effectiveness and performance improvements in both fairness and accuracy. Our proposed Learning by Grouping can reduce overfitting and achieve state-of-the-art performances with fixed human-designed network architectures and searchable network architectures on various datasets.

artificial intelligence, machine learning, subgroup, (14 more...)

arXiv.org Artificial Intelligence

2304.00486

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Betty: An Automatic Differentiation Library for Multilevel Optimization

Choe, Sang Keun, Neiswanger, Willie, Xie, Pengtao, Xing, Eric

arXiv.org Artificial IntelligenceMar-15-2023

Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. Multilevel optimization (MLO) addresses nested optimization scenarios, where upper level optimization problems are constrained by lower level optimization problems following an underlying hierarchical dependency. MLO has gained considerable attention as a unified mathematical framework for studying diverse problems including meta-learning (Finn et al., 2017; Rajeswaran et al., 2019), hyperparameter optimization (Franceschi et al., 2017), neural architecture search (Liu et al., 2019), and reinforcement learning (Konda & Tsitsiklis, 1999; Rajeswaran et al., 2020). While a majority of existing work is built upon bilevel optimization, the simplest case of MLO, there have been recent efforts that go beyond this two-level hierarchy. For example, (Raghu et al., 2021) proposed trilevel optimization that combines hyperparameter optimization with two-level pretraining and finetuning. More generally, conducting joint optimization over machine learning pipelines consisting of multiple models and hyperparameter sets can be approached as deeper instances of MLO (Garg et al., 2022; Raghu et al., 2021; Somayajula et al., 2022; Such et al., 2020). Following its increasing popularity, a multitude of optimization algorithms have been proposed to solve MLO. Among them, gradient-based (or first-order) approaches (Pearlmutter & Siskind, 2008; Lorraine et al., 2020; Raghu et al., 2021; Sato et al., 2021) have recently received the limelight from the machine learning community, due to their ability to carry out efficient high-dimensional optimization, under which all of the above listed applications fall. Nevertheless, research in gradientbased MLO has been largely impeded by two major bottlenecks.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2207.02849

Country: North America > United States (0.67)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Learning Compiler Pass Orders using Coreset and Normalized Value Prediction

Liang, Youwei, Stone, Kevin, Shameli, Ali, Cummins, Chris, Elhoushi, Mostafa, Guo, Jiadong, Steiner, Benoit, Yang, Xiaomeng, Xie, Pengtao, Leather, Hugh, Tian, Yuandong

arXiv.org Artificial IntelligenceJan-27-2023

Finding the optimal pass sequence of compilation can lead to a significant reduction in program size and/or improvement in program efficiency. Prior works on compilation pass ordering have two major drawbacks. They either require an excessive budget (in terms of compilation steps) at compile time or fail to generalize to unseen programs. In this paper, for code-size reduction tasks, we propose a novel pipeline to find program-dependent pass sequences within 45 compilation calls. It first identifies a coreset of 50 pass sequences via greedy optimization of a submodular function, and then learns a policy with Graph Neural Network (GNN) to pick the optimal sequence by predicting the normalized values of the pass sequences in the coreset. Despite its simplicity, our pipeline outperforms the default -Oz flag by an average of 4.7% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets. In comparison, previous approaches like reinforcement learning on the raw pass sequence space may take days to train due to sparse reward, and may not generalize well in held-out ones from different domains. Our results demonstrate that existing human-designed compiler flags can be improved with a simple yet effective technique that transforms the raw action space into a small one with denser rewards.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2301.05104

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Add feedback

Learning from Mistakes -- A Framework for Neural Architecture Search

Garg, Bhanu, Zhang, Li, Sridhara, Pradyumna, Hosseini, Ramtin, Xing, Eric, Xie, Pengtao

arXiv.org Artificial IntelligenceJan-13-2022

Learning from one's mistakes is an effective human learning technique where the learners focus more on the topics where mistakes were made, so as to deepen their understanding. In this paper, we investigate if this human learning strategy can be applied in machine learning. We propose a novel machine learning method called Learning From Mistakes (LFM), wherein the learner improves its ability to learn by focusing more on the mistakes during revision. We formulate LFM as a three-stage optimization problem: 1) learner learns; 2) learner re-learns focusing on the mistakes, and; 3) learner validates its learning. We develop an efficient algorithm to solve the LFM problem. We apply the LFM framework to neural architecture search on CIFAR-10, CIFAR-100, and Imagenet. Experimental results strongly demonstrate the effectiveness of our model.

machine learning, teaching medhods, teaching method, (19 more...)

arXiv.org Artificial Intelligence

2111.06353

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Self-directed Machine Learning

Zhu, Wenwu, Wang, Xin, Xie, Pengtao

arXiv.org Artificial IntelligenceJan-7-2022

Conventional machine learning (ML) relies heavily on manual design from machine learning experts to decide learning tasks, data, models, optimization algorithms, and evaluation metrics, which is labor-intensive, time-consuming, and cannot learn autonomously like humans. In education science, self-directed learning, where human learners select learning tasks and materials on their own without requiring hands-on guidance, has been shown to be more effective than passive teacher-guided learning. Inspired by the concept of self-directed human learning, we introduce the principal concept of Self-directed Machine Learning (SDML) and propose a framework for SDML. Specifically, we design SDML as a self-directed learning process guided by self-awareness, including internal awareness and external awareness. Our proposed SDML process benefits from self task selection, self data selection, self model selection, self optimization strategy selection and self evaluation metric selection through self-awareness without human guidance. Meanwhile, the learning performance of the SDML process serves as feedback to further improve self-awareness. We propose a mathematical formulation for SDML based on multi-level optimization. Furthermore, we present case studies together with potential applications of SDML, followed by discussing future research directions. We expect that SDML could enable machines to conduct human-like self-directed learning and provide a new perspective towards artificial general intelligence.

machine learning, reinforcement learning, selection, (20 more...)

arXiv.org Artificial Intelligence

2201.01289

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry:

Education > Educational Setting (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
(3 more...)

Add feedback

Interleaving Learning, with Application to Neural Architecture Search

Ban, Hao, Xie, Pengtao

arXiv.org Artificial IntelligenceMar-11-2021

Interleaving learning is a human learning technique where a learner interleaves the studies of multiple topics, which increases long-term retention and improves ability to transfer learned knowledge. Inspired by the interleaving learning technique of humans, in this paper we explore whether this learning methodology is beneficial for improving the performance of machine learning models as well. We propose a novel machine learning framework referred to as interleaving learning (IL). In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion: the encoder is trained by model 1 for a while, then passed to model 2 for further training, then model 3, and so on; after trained by all models, the encoder returns back to model 1 and is trained again, then moving to model 2, 3, etc. This process repeats for multiple rounds. Our framework is based on multi-level optimization consisting of multiple inter-connected learning stages. An efficient gradient-based algorithm is developed to solve the multi-level optimization problem. We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet. The effectiveness of our method is strongly demonstrated by the experimental results.

cifar-10, neural network, optimization problem, (20 more...)

arXiv.org Artificial Intelligence

2103.07018

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback