AITopics | Zhao, Zhe

Collaborating Authors

Zhao, Zhe

Information about AI from the News, Publications, and Conferences

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

Wu, Taiqiang, Zhao, Zhe, Wang, Jiahao, Bai, Xingyu, Wang, Lei, Wong, Ngai, Yang, Yujiu

arXiv.org Artificial Intelligence, Mar-27-2023

Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic. However, MLPs rely exclusively on the node features and fail to capture the graph structural information. Previous methods address this issue by processing graph edges into extra inputs for MLPs, but such graph structures may be unavailable in various scenarios. To this end, we propose a Prototype-Guided Knowledge Distillation (PGKD) method, which does not require graph edges (edge-free) yet learns structure-aware MLPs. Specifically, we analyze the graph structural information in GNN teachers, and distill such information from GNNs to MLPs via prototypes in an edge-free setting. Experimental results on popular graph benchmarks demonstrate the effectiveness and robustness of the proposed PGKD.
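
A minimal sketch of prototype-guided distillation, assuming precomputed teacher (GNN) and student (MLP) node embeddings plus node labels. The loss combines an intra-class term (pull each student embedding toward its class's teacher prototype) and an inter-class term (keep the teacher's pairwise prototype distances). This decomposition and all function names are illustrative assumptions, not PGKD's published objective; note that no graph edges are read, mirroring the edge-free setting.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def class_prototypes(embs: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean embedding per class (the class 'prototype')."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for e, y in zip(embs, labels):
        if y not in sums:
            sums[y] = [0.0] * len(e)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], e)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_distill_loss(student: Sequence[Vector], teacher: Sequence[Vector],
                           labels: Sequence[int]) -> float:
    """Hypothetical edge-free distillation loss built only from embeddings and labels."""
    pt = class_prototypes(teacher, labels)  # teacher (GNN) prototypes
    ps = class_prototypes(student, labels)  # student (MLP) prototypes
    # Intra-class: each student embedding should sit near its teacher prototype.
    intra = sum(sq_dist(s, pt[y]) for s, y in zip(student, labels)) / len(student)
    # Inter-class: student prototypes should preserve the teacher's class geometry.
    classes = sorted(pt)
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    inter = sum(
        (sq_dist(pt[a], pt[b]) - sq_dist(ps[a], ps[b])) ** 2 for a, b in pairs
    ) / max(len(pairs), 1)
    return intra + inter
```

The loss is zero only when every student embedding coincides with its teacher prototype, which is one simple way to transfer class structure that the GNN learned from edges into an MLP that never sees them.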

artificial intelligence, gnn teacher, machine learning

2303.13763

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

Benbaki, Riade, Chen, Wenyu, Meng, Xiang, Hazimeh, Hussein, Ponomareva, Natalia, Zhao, Zhe, Mazumder, Rahul

arXiv.org Artificial Intelligence, Feb-28-2023

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA's main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to significantly better sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over the state-of-the-art approaches.
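
The classical Optimal Brain Surgeon (OBS) step that CHITA extends can be sketched as follows, assuming the inverse Hessian of the loss, H^-1, is already available as a small dense matrix. OBS removes the weight q with the smallest saliency w_q^2 / (2 [H^-1]_qq) and compensates the remaining weights with δw = -(w_q / [H^-1]_qq) H^-1 e_q. This is a single-weight illustration of the classical rule only; CHITA's combinatorial, memory-friendly multi-weight solver is not reproduced, and the function name is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def obs_prune_one(w: List[float], h_inv: Matrix) -> Tuple[List[float], int]:
    """Remove one nonzero weight with the classical OBS rule; return (new_weights, index)."""
    n = len(w)
    candidates = [i for i in range(n) if w[i] != 0.0]  # already-pruned weights are skipped
    # Saliency: increase of the local quadratic loss when weight q is forced to zero.
    saliency = {q: w[q] ** 2 / (2.0 * h_inv[q][q]) for q in candidates}
    q = min(candidates, key=lambda i: saliency[i])
    # Optimal compensation of all weights: delta = -(w_q / [H^-1]_qq) * H^-1[:, q]
    scale = w[q] / h_inv[q][q]
    new_w = [w[i] - scale * h_inv[i][q] for i in range(n)]
    new_w[q] = 0.0  # force an exact zero (removes floating-point residue)
    return new_w, q
```

A full OBS loop would also update H^-1 after each removal before pruning the next weight; that bookkeeping, and the sparsity-constrained joint selection CHITA performs, are omitted here.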

approximation, artificial intelligence, machine learning

2302.14623

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > Promising Solution (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

QVIP: An ILP-based Formal Verification Approach for Quantized Neural Networks

Zhang, Yedi, Zhao, Zhe, Song, Fu, Zhang, Min, Chen, Taolue, Sun, Jun

arXiv.org Artificial Intelligence, Dec-9-2022

Deep learning has become a promising programming paradigm in software development, owing to its surprising performance in solving many challenging tasks. Deep neural networks (DNNs) are increasingly being deployed in practice, but are limited on resource-constrained devices owing to their demand for computational power. Quantization has emerged as a promising technique to reduce the size of DNNs while keeping accuracy comparable to that of their floating-point counterparts. The resulting quantized neural networks (QNNs) can be implemented energy-efficiently. As with their floating-point counterparts, quality assurance techniques for QNNs, such as testing and formal verification, are essential but remain less explored. In this work, we propose a novel and efficient formal verification approach for QNNs. In particular, we are the first to propose an encoding that reduces the verification problem of QNNs to the solving of integer linear constraints, which can be solved using off-the-shelf solvers. Our encoding is both sound and complete. We demonstrate the application of our approach on local robustness verification and maximum robustness radius computation. We implement our approach in a prototype tool QVIP and conduct a thorough evaluation. Experimental results on QNNs with different quantization bits confirm the effectiveness and efficiency of our approach, e.g., it is two orders of magnitude faster than the state-of-the-art methods and solves more verification tasks within the same time limit.
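
To give a flavor of how network semantics can be reduced to integer linear constraints, the sketch below checks the standard big-M encoding of an integer ReLU y = max(0, x) with known bounds lo ≤ x ≤ hi (lo < 0 < hi) and a binary indicator a: y ≥ 0, y ≥ x, y ≤ x - lo·(1 - a), y ≤ hi·a. This textbook encoding is an assumption chosen for illustration; QVIP's actual encoding of quantized layers (including rounding and clamping) is not reproduced. Instead of calling an ILP solver, the code enumerates every integer assignment to confirm that the constraints admit exactly y = max(0, x), i.e., that this piece is sound and complete.

```python
from typing import Set


def relu_bigm_solutions(x: int, lo: int, hi: int) -> Set[int]:
    """All integers y that satisfy the big-M ReLU constraints for some a in {0, 1}."""
    assert lo < 0 < hi and lo <= x <= hi
    feasible: Set[int] = set()
    for a in (0, 1):
        for y in range(lo, hi + 1):  # the true output range [0, hi] lies inside this window
            if (
                y >= 0
                and y >= x
                and y <= x - lo * (1 - a)
                and y <= hi * a
            ):
                feasible.add(y)
    return feasible


def check_encoding(lo: int, hi: int) -> bool:
    """Exhaustively compare the encoding with max(0, x) over the whole input range."""
    return all(relu_bigm_solutions(x, lo, hi) == {max(0, x)} for x in range(lo, hi + 1))
```

With a = 1 the constraints force y = x (feasible only when x ≥ 0); with a = 0 they force y = 0 (feasible only when x ≤ 0), which is why the feasible set is always the single value max(0, x).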

artificial intelligence, machine learning, proceedings

2212.11138

Country:

North America > United States (0.47)
Asia (0.28)

Genre:

Research Report > Promising Solution (0.68)
Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (0.67)
Automobiles & Trucks (0.67)
Information Technology > Security & Privacy (0.46)
Information Technology > Robotics & Automation (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Multi-Task Generalization via Regularizing Spurious Correlation

Hu, Ziniu, Zhao, Zhe, Yi, Xinyang, Yao, Tiansheng, Hong, Lichan, Sun, Yizhou, Chi, Ed H.

arXiv.org Artificial Intelligence, Nov-24-2022

Multi-Task Learning (MTL) is a powerful learning paradigm to improve generalization performance via knowledge sharing. However, existing studies find that MTL can sometimes hurt generalization, especially when two tasks are less correlated. One possible cause is spurious correlation: some knowledge is not causally related to the task labels, but the model may mistakenly rely on it and thus fail when such correlation changes. In the MTL setup, spurious correlation poses several unique challenges. First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and causal knowledge for one task could potentially be spurious for another. Second, the confounder between task labels brings a different type of spurious correlation into MTL. We theoretically prove that MTL is more prone than single-task learning to taking non-causal knowledge from other tasks, and thus generalizes worse. To solve this problem, we propose the Multi-Task Causal Representation Learning framework, which represents multi-task knowledge via disentangled neural modules and learns which module is causally related to each task via MTL-specific invariant regularization. Experiments show that it can improve the MTL model's performance by 5.5% on average across Multi-MNIST, MovieLens, Taskonomy, CityScape, and NYUv2 by alleviating the spurious correlation problem.
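
The abstract does not spell out the MTL-specific invariant regularizer, so as a stand-in the sketch below implements the well-known IRMv1 penalty from Invariant Risk Minimization (Arjovsky et al.), which penalizes predictors whose optimal output scale differs across environments. It assumes a squared loss and precomputed predictions per environment; treating data slices where spurious correlations shift as "environments" in an MTL setting is our assumption, not the paper's definition.

```python
from typing import Sequence


def irm_v1_penalty(
    preds_by_env: Sequence[Sequence[float]],
    labels_by_env: Sequence[Sequence[float]],
) -> float:
    """Sum over environments of (dR_e/dw)^2 at the dummy output scale w = 1.

    With squared loss R_e(w) = mean((w * f - y)^2), the derivative at w = 1
    equals mean(2 * (f - y) * f).
    """
    total = 0.0
    for preds, labels in zip(preds_by_env, labels_by_env):
        grad = sum(2.0 * (f - y) * f for f, y in zip(preds, labels)) / len(preds)
        total += grad ** 2
    return total
```

In training, such a term would be added to the average task loss with a tunable weight; a predictor that relies only on relations that hold in every environment drives each per-environment gradient toward zero.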

artificial intelligence, machine learning, optimization problem

2205.09797

Country:

Europe (1.00)
North America > Canada (0.68)
North America > United States > California > Los Angeles County (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.92)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Scalable Bayesian Inference for Detection and Deblending in Astronomical Images

Hansen, Derek, Mendoza, Ismael, Liu, Runjing, Pang, Ziteng, Zhao, Zhe, Avestruz, Camille, Regier, Jeffrey

arXiv.org Machine Learning, Jul-12-2022

We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS). BLISS is based on deep generative models, which embed neural networks within a Bayesian model. For posterior inference, BLISS uses a new form of variational inference known as Forward Amortized Variational Inference. The BLISS inference routine is fast, requiring a single forward pass of the encoder networks on a GPU once the encoder networks are trained. BLISS can perform fully Bayesian inference on megapixel images in seconds, and produces highly accurate catalogs. BLISS is highly extensible, and has the potential to directly answer downstream scientific questions in addition to producing probabilistic catalogs.
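
Forward Amortized Variational Inference trains the encoder on (latent, image) pairs simulated from the generative model, maximizing log q(z | x) over those pairs. The sketch below shows that idea on a hypothetical one-dimensional linear-Gaussian model (z ~ N(0, 1), x = z + N(0, sigma^2)) with a linear-Gaussian encoder q(z | x) = N(a·x + b, s2). For this model, maximizing the simulated log-likelihood reduces to least squares, and the exact posterior is N(x / (1 + sigma^2), sigma^2 / (1 + sigma^2)), so the fit can be checked directly. BLISS's real encoders are neural networks over astronomical images; none of that is modeled here.

```python
import random
from typing import Tuple


def favi_fit_linear_gaussian(sigma: float, n: int = 20000, seed: int = 0) -> Tuple[float, float, float]:
    """Fit q(z|x) = N(a*x + b, s2) by maximum likelihood on simulated (z, x) pairs."""
    rng = random.Random(seed)
    zs, xs = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # draw the latent from the prior
        x = z + rng.gauss(0.0, sigma)  # simulate an observation from the likelihood
        zs.append(z)
        xs.append(x)
    # Maximum likelihood for a Gaussian with a linear mean is ordinary least squares.
    mx, mz = sum(xs) / n, sum(zs) / n
    cov = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / n
    var = sum((x - mx) ** 2 for x in xs) / n
    a = cov / var
    b = mz - a * mx
    s2 = sum((z - (a * x + b)) ** 2 for x, z in zip(xs, zs)) / n
    return a, b, s2
```

With sigma = 1 the fitted a and s2 should both land close to the exact posterior values of 0.5. Once fitted, inference for a new observation is a single evaluation of a·x + b, which is the sense in which the inference is amortized.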

artificial intelligence, bliss, machine learning

2207.05642

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)

Transformer Memory as a Differentiable Search Index

Tay, Yi, Tran, Vinh Q., Dehghani, Mostafa, Ni, Jianmo, Bahri, Dara, Mehta, Harsh, Qin, Zhen, Hui, Kai, Zhao, Zhe, Gupta, Jai, Schuster, Tal, Cohen, William W., Metzler, Donald

arXiv.org Artificial Intelligence, Feb-16-2022

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup.
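
DSI casts retrieval as sequence-to-sequence learning: the same model learns to map document text to its docid (indexing) and queries to docids (retrieval). The sketch below only prepares such (input, target) pairs, assuming whitespace tokenization and truncation of each document to its first max_doc_tokens tokens; both choices and the function name are hypothetical, and the Transformer itself is out of scope.

```python
from typing import Dict, List, Sequence, Tuple


def build_dsi_examples(
    corpus: Dict[str, str],
    queries: Sequence[Tuple[str, str]],
    max_doc_tokens: int = 64,
) -> List[Tuple[str, str]]:
    """Return (input_text, target_docid) pairs for indexing and retrieval training."""
    examples: List[Tuple[str, str]] = []
    # Indexing examples: the model memorizes each document under its identifier.
    for docid, text in corpus.items():
        tokens = text.split()[:max_doc_tokens]
        examples.append((" ".join(tokens), docid))
    # Retrieval examples: labeled queries map to the docid of a relevant document.
    for query, docid in queries:
        if docid not in corpus:
            raise KeyError(f"query refers to unknown docid {docid!r}")
        examples.append((query, docid))
    return examples
```

At query time the trained model would decode a docid string directly (for example with beam search to obtain a ranked list), with no external index consulted.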

artificial intelligence, differentiable search index, transformer memory

2202.06991

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management > Search (0.60)
Information Technology > Artificial Intelligence > Natural Language (0.53)

Adversarial Attacks on ML Defense Models Competition

Dong, Yinpeng, Fu, Qi-An, Yang, Xiao, Xiang, Wenzhao, Pang, Tianyu, Su, Hang, Zhu, Jun, Tang, Jiayu, Chen, Yuefeng, Mao, XiaoFeng, He, Yuan, Xue, Hui, Li, Chao, Liu, Ye, Zhang, Qilong, Gao, Lianli, Yu, Yunrui, Gao, Xitong, Zhao, Zhe, Lin, Daquan, Lin, Jiadong, Song, Chuanbiao, Wang, Zihao, Wu, Zhennan, Guo, Yang, Cui, Jiequan, Xu, Xiaogang, Chen, Pengguang

arXiv.org Artificial Intelligence, Oct-15-2021

Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years. However, progress in building more robust models is often hampered by incomplete or incorrect robustness evaluation. To accelerate research on reliable evaluation of the adversarial robustness of current defense models in image classification, the TSAIL group at Tsinghua University and the Alibaba Security group organized this competition along with a CVPR 2021 workshop on adversarial machine learning (https://aisecure-workshop.github.io/amlcvpr2021/). The purpose of this competition is to motivate novel attack algorithms that evaluate adversarial robustness more effectively and reliably. The participants were encouraged to develop stronger white-box attack algorithms to find the worst-case robustness of different defenses. This competition was conducted on an adversarial robustness evaluation platform, ARES (https://github.com/thu-ml/ares), and was held on the TianChi platform (https://tianchi.aliyun.com/competition/entrance/531847/introduction) as part of the AI Security Challengers Program series. After the competition, we summarized the results and established a new adversarial robustness benchmark at https://ml.cs.tsinghua.edu.cn/ares-bench/, which allows users to upload adversarial attack algorithms and defense models for evaluation.

artificial intelligence, machine learning, neural network

2110.08042

Country:

Asia > China (0.28)
North America > United States > Wisconsin (0.14)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

The Benchmark Lottery

Dehghani, Mostafa, Tay, Yi, Gritsenko, Alexey A., Zhao, Zhe, Houlsby, Neil, Diaz, Fernando, Metzler, Donald, Vinyals, Oriol

arXiv.org Artificial Intelligence, Jul-14-2021

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the overall fragility of the ML benchmarking process. The benchmark lottery postulates that many factors, other than fundamental algorithmic superiority, may lead to a method being perceived as superior. On multiple benchmark setups that are prevalent in the ML community, we show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks, highlighting the fragility of the current paradigms and potential fallacious interpretation derived from benchmarking ML methods. Given that every benchmark makes a statement about what it perceives to be important, we argue that this might lead to biased progress in the community. We discuss the implications of the observed phenomena and provide recommendations on mitigating them using multiple machine learning domains and communities as use cases, including natural language processing, computer vision, information retrieval, recommender systems, and reinforcement learning.
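
The core observation, that the "best" method can depend on which tasks a benchmark includes, is easy to reproduce mechanically. The sketch below ranks methods by mean score over every subset of tasks of a given size; the function name is hypothetical, ties go to the method listed first, and the scores in the test and usage note are made-up numbers for illustration, not results from the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def winners_by_task_subset(
    scores: Dict[str, List[float]], subset_size: int
) -> Dict[Tuple[int, ...], str]:
    """Map each task subset to the method with the highest mean score on it."""
    methods = list(scores)
    n_tasks = len(scores[methods[0]])
    winners: Dict[Tuple[int, ...], str] = {}
    for subset in combinations(range(n_tasks), subset_size):
        def mean_on_subset(m: str) -> float:
            return sum(scores[m][t] for t in subset) / subset_size
        winners[subset] = max(methods, key=mean_on_subset)
    return winners
```

With hypothetical scores A = [90, 50, 70] and B = [60, 85, 72] on three tasks, B wins on task pairs (0, 1) and (1, 2), but A wins on (0, 2): the headline ranking is decided by which two tasks were picked.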

benchmark, deep learning, neural network

2107.07002

Country:

Europe (0.28)
North America > United States > Louisiana (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Games (0.67)
Education (0.67)
Health & Medicine > Diagnostic Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

Hazimeh, Hussein, Zhao, Zhe, Chowdhery, Aakanksha, Sathiamoorthy, Maheswaran, Chen, Yihua, Mazumder, Rahul, Hong, Lichan, Chi, Ed H.

arXiv.org Machine Learning, Jun-9-2021

The Mixture-of-Experts (MoE) architecture is showing promising results in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness can lead to convergence and statistical performance issues when training with gradient-based methods. In this paper, we develop DSelect-k: the first continuously differentiable and sparse gate for MoE, based on a novel binary encoding formulation. Our gate can be trained using first-order methods, such as stochastic gradient descent, and offers explicit control over the number of experts to select. We demonstrate the effectiveness of DSelect-k in the context of MTL, on both synthetic and real datasets with up to 128 tasks. Our experiments indicate that MoE models based on DSelect-k can achieve statistically significant improvements in predictive and expert selection performance. Notably, on a real-world large-scale recommender system, DSelect-k achieves over 22% average improvement in predictive performance compared to the Top-k gate. We provide an open-source TensorFlow implementation of our gate.
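
A minimal sketch of the binary-encoding idea behind DSelect-k, written from our reading of the abstract rather than from the authors' TensorFlow release: each of k "single-expert selectors" holds m = log2(n) real parameters; a cubic smooth-step function S maps each one to [0, 1], and expert l receives the product over bits of S(z_i) or 1 - S(z_i) depending on bit i of l. The k selectors are then mixed with softmax weights. The smooth-step form, gamma, and function names are assumptions; the number of experts is assumed to be a power of two, and training details (such as any regularizer that pushes S to exactly 0 or 1) are omitted.

```python
import math
from typing import List, Sequence


def smooth_step(t: float, gamma: float = 1.0) -> float:
    """Cubic smooth-step: 0 below -gamma/2, 1 above gamma/2, continuously differentiable."""
    if t <= -gamma / 2:
        return 0.0
    if t >= gamma / 2:
        return 1.0
    return -2.0 / gamma ** 3 * t ** 3 + 3.0 / (2.0 * gamma) * t + 0.5


def single_selector(z: Sequence[float], n_experts: int, gamma: float = 1.0) -> List[float]:
    """Soft one-hot over experts built from m = log2(n_experts) smooth 'bits'."""
    m = len(z)
    assert 2 ** m == n_experts, "this sketch assumes n_experts is a power of two"
    s = [smooth_step(zi, gamma) for zi in z]
    weights = []
    for idx in range(n_experts):
        w = 1.0
        for i in range(m):
            w *= s[i] if (idx >> i) & 1 else 1.0 - s[i]  # bit i of the expert index
        weights.append(w)
    return weights  # sums to 1: a product of independent Bernoulli probabilities


def dselect_k(Z: Sequence[Sequence[float]], alpha: Sequence[float],
              n_experts: int, gamma: float = 1.0) -> List[float]:
    """Mix k single-expert selectors with softmax(alpha) weights."""
    top = max(alpha)
    exps = [math.exp(a - top) for a in alpha]  # numerically stable softmax
    total = sum(exps)
    gate = [0.0] * n_experts
    for z, e in zip(Z, exps):
        for idx, v in enumerate(single_selector(z, n_experts, gamma)):
            gate[idx] += (e / total) * v
    return gate
```

Unlike Top-k, small changes in z or alpha move the gate weights smoothly, yet once every S saturates at 0 or 1 the gate places all its mass on at most k experts.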

deep learning, dselect-k, neural network

2106.03760

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Attack as Defense: Characterizing Adversarial Examples using Robustness

Zhao, Zhe, Chen, Guangke, Wang, Jingyi, Yang, Yiwei, Song, Fu, Sun, Jun

arXiv.org Artificial Intelligence, Mar-13-2021

As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep-learning-based software has been found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve the robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurements do not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), to detect adversarial examples by effectively evaluating an example's robustness. A2D uses the cost of attacking an input for robustness evaluation and identifies less robust examples as adversarial, since less robust examples are easier to attack. Extensive experimental results on MNIST, CIFAR10 and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defense against potential adaptive attacks and show that A2D is effective in defending against carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.
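
The idea of using attack cost as a robustness score can be illustrated on a hypothetical linear classifier sign(w·x + b): count how many fixed-size signed-gradient steps an attack needs to flip the prediction, and flag inputs that flip in fewer than a threshold number of steps. The model, attack, step size, threshold, and function names are all illustrative assumptions; A2D applies real attacks to deep networks, and a practical detector would need to calibrate its threshold, e.g., on benign data.

```python
from typing import List


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def _predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def attack_cost(x: List[float], w: List[float], b: float,
                step: float = 0.05, max_steps: int = 1000) -> int:
    """Number of signed-gradient steps needed to flip the linear classifier's label."""
    label = _predict(w, b, x)
    x = list(x)
    for k in range(1, max_steps + 1):
        # For the score w.x + b the input gradient is w; step against the current label.
        x = [xi - label * step * _sign(wi) for xi, wi in zip(x, w)]
        if _predict(w, b, x) != label:
            return k
    return max_steps  # never flipped within the budget: treat as very robust


def looks_adversarial(x: List[float], w: List[float], b: float, threshold: int = 10) -> bool:
    """Hypothetical A2D-style rule: inputs that are cheap to attack are flagged."""
    return attack_cost(x, w, b) < threshold
```

Points lying close to the decision boundary, where adversarial examples typically end up, flip after a few steps, while points deep inside a class region need many more; the threshold turns that gap into a detector.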

adversarial example, deep learning, neural network

2103.07633

Country:

Asia (0.46)
North America > United States (0.30)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)