Qi, Fei
Uniform tensor clustering by jointly exploring sample affinities of various orders
Cai, Hongmin, Qi, Fei, Li, Junyu, Hu, Yu, Zhang, Yue, Cheung, Yiu-ming, Hu, Bin
Conventional clustering methods based on pairwise affinity usually suffer from the concentration effect when processing data with high-dimensional features but small sample sizes, which leads to inaccurate encoding of sample proximity and suboptimal clustering performance. To address this issue, we propose a unified tensor clustering method (UTC) that characterizes sample proximity using affinities among multiple samples, thereby supplying rich spatial sample distributions to boost clustering. Specifically, we find that the triadic tensor affinity can be constructed via the Khatri-Rao product of two affinity matrices. Furthermore, our earlier work shows that the fourth-order tensor affinity is defined by the Kronecker product. Therefore, we use these two arithmetical products, Khatri-Rao and Kronecker, to mathematically integrate different orders of affinity into a unified tensor clustering framework. UTC thus learns a joint low-dimensional embedding that combines the various orders. Finally, a numerical scheme is designed to solve the problem. Experiments on synthetic and real-world datasets demonstrate that 1) high-order tensor affinity provides a characterization of sample proximity that supplements the popular affinity matrix; and 2) UTC enhances clustering by exploiting affinities of different orders when processing high-dimensional data.
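The two products named in the abstract above can be illustrated with a small sketch. The abstract does not give the exact construction, so the code below only shows the standard column-wise Khatri-Rao product and the Kronecker product applied to one affinity matrix with itself; the matrix `S`, its values, and the helper names `khatri_rao`, `kronecker`, and `triple_affinity` are assumptions made for illustration, not the authors' implementation.

```python
def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of two matrices given as lists of rows.

    Both inputs must have the same number of columns.
    Row (i * len(B) + k), column j of the result equals A[i][j] * B[k][j].
    """
    n_cols = len(A[0])
    if len(B[0]) != n_cols:
        raise ValueError("Khatri-Rao product needs equal column counts")
    return [
        [row_a[j] * row_b[j] for j in range(n_cols)]
        for row_a in A
        for row_b in B
    ]


def kronecker(A, B):
    """Kronecker product of two matrices given as lists of rows.

    Row (i * len(B) + k), column (j * len(B[0]) + l) equals A[i][j] * B[k][l].
    """
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


# Hypothetical pairwise affinity matrix for 3 samples (illustrative values only).
S = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
n = len(S)

third_order = khatri_rao(S, S)   # n*n rows, n columns: one entry per triple (i, k, j)
fourth_order = kronecker(S, S)   # n*n rows, n*n columns: one entry per quadruple


def triple_affinity(i, k, j):
    """Read the flattened Khatri-Rao result as a third-order tensor entry."""
    return third_order[i * n + k][j]
```

Under this reading, an entry of the Khatri-Rao result couples samples i and k through a shared third sample j, while an entry of the Kronecker result multiplies two independent pairwise affinities, which is one way to see how the two products yield third- and fourth-order affinities from the same matrix.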
DarwinML: A Graph-based Evolutionary Algorithm for Automated Machine Learning
Qi, Fei, Xia, Zhaohui, Tang, Gaoyang, Yang, Hang, Song, Yu, Qian, Guangrui, An, Xiong, Lin, Chunhuan, Shi, Guangming
As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require expertise in machine learning. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a larger search space than tree-based and stacking-based architectures. Based on this, an evolutionary algorithm is proposed to search for the best architecture, where the mutation and heredity operators are the key to architecture evolution. With Bayesian hyper-parameter optimization, the proposed approach can automate the workflow of machine learning. On the PMLB dataset, the proposed approach achieves state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models have complex structures that are difficult to obtain through manual design.
I. INTRODUCTION
Various models have been thoroughly investigated by the machine learning (ML) community. In theory, these models are general and applicable to both academia and industry. However, it can be time-consuming to build a solution for a specific ML task, even for an ML expert.
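As a rough illustration of the graph-based representation and mutation operator mentioned in the DarwinML abstract, the sketch below stores a pipeline as a directed acyclic graph and applies one mutation that adds a random edge without creating a cycle. The abstract does not specify the operators, so everything here is an assumption: the node names (`input`, `pca`, `svm`, `output`), the edge-set encoding, and the helpers `has_path` and `mutate_add_edge` are hypothetical and are not the paper's algorithm.

```python
import random


def has_path(edges, src, dst):
    """Return True if dst is reachable from src by following directed edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v for (u, v) in edges if u == node)
    return False


def mutate_add_edge(nodes, edges, rng):
    """Return a copy of the edge set with one random acyclicity-preserving edge added.

    Adding (u, v) would create a cycle exactly when v already reaches u,
    so such pairs are excluded from the candidates.
    """
    candidates = [
        (u, v)
        for u in nodes
        for v in nodes
        if u != v and (u, v) not in edges and not has_path(edges, v, u)
    ]
    if not candidates:
        return set(edges)  # nothing can be added; return an unchanged copy
    return set(edges) | {rng.choice(candidates)}


# Hypothetical starting architecture: a simple chain of ML components.
nodes = ["input", "pca", "svm", "output"]
parent = {("input", "pca"), ("pca", "svm"), ("svm", "output")}

rng = random.Random(0)  # seeded so repeated runs pick the same edge
child = mutate_add_edge(nodes, parent, rng)
```

In a full evolutionary search, a mutation like this would be combined with a heredity (crossover) operator and a fitness evaluation of each candidate graph; those parts are omitted here because the abstract gives no detail on them.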