Liu, Bin


Sequential online prediction in the presence of outliers and change points: an instant temporal structure learning approach

arXiv.org Machine Learning

In this paper, we consider sequential online prediction (SOP) for streaming data in the presence of outliers and change points. We propose an INstant TEmporal structure Learning (INTEL) algorithm to address this problem. Our INTEL algorithm is developed based on a full consideration of the duality between online prediction and anomaly detection. We first employ a mixture of weighted GP models (WGPs) to cover the expected possible temporal structures of the data. Then, on the basis of the rich modeling capacity of this WGP mixture, we develop an efficient technique to instantly learn (capture) the temporal structure of the data that follows a regime shift. This instant learning is achieved by adjusting only one hyper-parameter value of the mixture model. A weighted generalization of the product of experts (POE) model is used to fuse the predictions yielded by the multiple GP models. An outlier is declared once a real observation deviates seriously from the fused prediction. If a certain number of outliers are declared consecutively, then a change point is declared. Extensive experiments are performed on a diverse set of real datasets. The results show that the proposed algorithm is significantly better than benchmark methods for SOP in the presence of outliers and change points.
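The following is a minimal sketch of the detection logic described above, assuming each GP expert supplies a Gaussian prediction (mean, variance) and a weight; the fusion is a precision-weighted product of Gaussian experts. The thresholds K_SIGMA and N_CONSECUTIVE are illustrative choices, not the paper's exact settings.

```python
# Sketch: weighted product-of-experts fusion plus outlier/change-point rules.
import numpy as np

def fuse_poe(mus, variances, weights):
    """Weighted product of Gaussian experts: precision-weighted fusion."""
    precisions = weights / variances          # w_i / sigma_i^2
    fused_var = 1.0 / precisions.sum()
    fused_mu = fused_var * (precisions * mus).sum()
    return fused_mu, fused_var

K_SIGMA = 3.0        # deviation threshold, in fused standard deviations (assumed)
N_CONSECUTIVE = 5    # consecutive outliers that trigger a change point (assumed)

def detect(stream, predict_experts):
    """predict_experts(t) -> (mus, variances, weights) for time step t."""
    run = 0
    for t, y in enumerate(stream):
        mu, var = fuse_poe(*predict_experts(t))
        if abs(y - mu) > K_SIGMA * np.sqrt(var):
            run += 1
            print(f"t={t}: outlier (y={y:.2f}, pred={mu:.2f})")
            if run >= N_CONSECUTIVE:
                print(f"t={t}: change point declared")
                run = 0
        else:
            run = 0
```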


Synthetic Oversampling of Multi-Label Data based on Local Label Distribution

arXiv.org Machine Learning

Class imbalance is an inherent characteristic of multi-label data that affects the prediction accuracy of most multi-label learning methods. One efficient strategy to deal with this problem is to employ resampling techniques before training the classifier. Existing multi-label sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare subconcepts and class overlap, which are better analysed by looking at the local characteristics of the minority examples rather than at the imbalance of the whole dataset. We propose a new method for synthetic oversampling of multi-label data that focuses on the local label distribution to generate more diverse and better-labeled instances. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed approach on a variety of evaluation measures, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.
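As an illustration of the idea, here is a hypothetical simplification (not the authors' exact algorithm): a synthetic point is interpolated between a seed instance and a random neighbor, and its label set is taken from the labels carried by a majority of the seed's k nearest neighbors, i.e., the local label distribution.

```python
# Sketch: local-label-distribution-guided synthetic oversampling.
import numpy as np

def oversample(X, Y, seed_idx, k=5, rng=np.random.default_rng(0)):
    """X: (n, d) features; Y: (n, q) binary label matrix; returns one synthetic pair."""
    dists = np.linalg.norm(X - X[seed_idx], axis=1)
    neighbors = np.argsort(dists)[1:k + 1]            # k nearest, excluding self
    ref = rng.choice(neighbors)                       # random reference neighbor
    gap = rng.random()
    x_new = X[seed_idx] + gap * (X[ref] - X[seed_idx])  # SMOTE-style interpolation
    y_new = (Y[neighbors].mean(axis=0) > 0.5).astype(int)  # local label vote
    return x_new, y_new
```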


Deep Learning Inversion of Electrical Resistivity Data

arXiv.org Artificial Intelligence

The inverse problem of electrical resistivity surveys (ERS) is difficult because of its nonlinear and ill-posed nature. For this task, traditional linear inversion methods still face challenges such as sub-optimal approximation and initial model selection. Inspired by the remarkable nonlinear mapping ability of deep learning approaches, in this paper we propose to build the mapping from apparent resistivity data (input) to the resistivity model (output) directly with convolutional neural networks (CNNs). However, the vertically varying patterns in the apparent resistivity data may cause ambiguity when using CNNs with their weight-sharing and effective-receptive-field properties. To address this potential issue, we supply an additional tier feature map to the CNN to make it aware of the relationship between input and output. Based on the prevalent U-Net architecture, we design our network (ERSInvNet), which can be trained end-to-end and reaches real-time inference during testing. We further introduce a depth weighting function and a smoothness constraint into the loss function to improve inversion accuracy in the deep region and suppress false anomalies. Four groups of experiments are conducted to demonstrate the feasibility and efficiency of the proposed methods. According to a comprehensive qualitative analysis and quantitative comparison, ERSInvNet with the tier feature map, smoothness constraint and depth weighting function together achieves the best performance.
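A hedged sketch of a loss combining a depth weighting function with a smoothness constraint, in the spirit of the description above; the linear depth weights and the lambda coefficient are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: depth-weighted data term plus first-difference smoothness penalty.
import torch

def inversion_loss(pred, target, lam_smooth=0.1):
    """pred, target: (batch, 1, depth, width) resistivity models."""
    depth = pred.shape[2]
    # Weight deeper rows more heavily to counter weak sensitivity at depth (assumed schedule).
    w = torch.linspace(1.0, 2.0, depth, device=pred.device).view(1, 1, depth, 1)
    data_term = (w * (pred - target) ** 2).mean()
    # Smoothness: penalize first differences along depth and width.
    smooth = ((pred[:, :, 1:, :] - pred[:, :, :-1, :]) ** 2).mean() \
           + ((pred[:, :, :, 1:] - pred[:, :, :, :-1]) ** 2).mean()
    return data_term + lam_smooth * smooth
```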


Harnessing Low-Fidelity Data to Accelerate Bayesian Optimization via Posterior Regularization

arXiv.org Machine Learning

Bayesian optimization (BO) is a powerful paradigm for derivative-free global optimization of a black-box objective function (BOF) that is expensive to evaluate. However, the overhead of BO can still be prohibitive if the maximum number of allowed function evaluations is smaller than required. In this paper, we investigate how to reduce the required number of function evaluations for BO without compromising solution quality. We explore the idea of posterior regularization for harnessing low-fidelity (LF) data within the Gaussian process upper confidence bound (GP-UCB) framework. The LF data are assumed to arise from previous evaluations of an LF approximation of the BOF. An extra GP expert, called LF-GP, is trained to fit the LF data. We develop a dynamic weighted product of experts (DW-POE) fusion operator, through which the regularization is induced on the posterior of the BOF. The impact of the LF-GP expert on the resulting regularized posterior is adaptively adjusted via a Bayesian formalism. Extensive experimental results on benchmark BOF optimization tasks demonstrate the superior performance of the proposed algorithm over the state of the art.
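A minimal sketch of fusing the high-fidelity GP posterior with the LF-GP expert through a weighted product of Gaussian experts, and scoring candidates with a UCB rule. The fixed weight argument w_lf and the kappa value are assumptions; the paper adapts the LF weight via a Bayesian formalism rather than fixing it.

```python
# Sketch: weighted product-of-experts fusion of two GP predictions, plus UCB scoring.
import numpy as np

def dw_poe(mu_hf, var_hf, mu_lf, var_lf, w_lf):
    """Weighted POE of two Gaussian experts; w_lf in [0, 1] scales the LF precision."""
    prec = 1.0 / var_hf + w_lf / var_lf
    var = 1.0 / prec
    mu = var * (mu_hf / var_hf + w_lf * mu_lf / var_lf)
    return mu, var

def ucb_score(mu, var, kappa=2.0):
    """GP-UCB acquisition value on the regularized posterior (maximize)."""
    return mu + kappa * np.sqrt(var)
```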


Deep learning Inversion of Seismic Data

arXiv.org Artificial Intelligence

In this paper, we propose a new method to tackle the challenge of mapping time-series data to a spatial image in the field of seismic exploration, i.e., reconstructing the velocity model directly from seismic data with deep neural networks (DNNs). The conventional way to address this ill-posed seismic inversion problem is through iterative algorithms, which suffer from poor nonlinear mapping and strong non-uniqueness. Other attempts may either introduce human intervention errors or underuse the seismic data. The challenge for DNNs mainly lies in the weak spatial correspondence and the uncertain reflection-reception relationship between seismic data and the velocity model, as well as the time-varying nature of seismic data. To approach these challenges, we propose an end-to-end Seismic Inversion Network (SeisInvNet for short) with novel components to make the best use of all seismic data. Specifically, we start with every seismic trace and enhance it with its neighborhood information, its observation setup, and the global context of its corresponding seismic profile. From the enhanced seismic traces, spatially aligned feature maps can be learned and further concatenated to reconstruct the velocity model. In general, we let every seismic trace contribute to the reconstruction of the whole velocity model by finding spatial correspondence. The proposed SeisInvNet consistently improves over the baselines and achieves promising performance on our proposed SeisInv dataset according to various evaluation metrics, and the inversion results are more consistent with the target in terms of velocity values, subsurface structure and geological interfaces. In addition to the superior performance, the mechanism is also carefully discussed, and some potential problems are identified for further study.
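A rough sketch of the per-trace enhancement idea: each trace is concatenated with a local-neighborhood summary, an observation-setup encoding, and a global summary of the profile. The embedding networks of SeisInvNet are omitted; only the input assembly is shown, and the shapes, window size and encodings are assumptions.

```python
# Sketch: assembling enhanced per-trace inputs from one shot gather.
import numpy as np

def enhance_traces(profile, src_pos, rec_pos, half_window=2):
    """profile: (n_receivers, n_time) shot gather; returns (n_receivers, n_feat)."""
    n_rec, n_time = profile.shape
    global_ctx = profile.mean(axis=0)                 # crude global context of the profile
    feats = []
    for i in range(n_rec):
        lo, hi = max(0, i - half_window), min(n_rec, i + half_window + 1)
        neighborhood = profile[lo:hi].mean(axis=0)    # local spatial neighborhood
        setup = np.array([src_pos, rec_pos[i]])       # observation setup encoding
        feats.append(np.concatenate([profile[i], neighborhood, global_ctx, setup]))
    return np.stack(feats)
```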


Deep Metric Transfer for Label Propagation with Limited Annotated Data

arXiv.org Artificial Intelligence

We study object recognition under the constraint that each object class is only represented by very few observations. In such cases, naive supervised learning would lead to severe over-fitting in deep neural networks due to limited training data. We tackle this problem by creating much more training data through label propagation from the few labeled examples to a vast collection of unannotated images. Our main insight is that such a label propagation scheme can be highly effective when the similarity metric used for propagation is learned and transferred from other related domains with lots of data. We test our approach on semi-supervised learning, transfer learning and few-shot recognition, where we learn our similarity metric using various supervised/unsupervised pretraining methods, and transfer it to unlabeled data across different data distributions. By taking advantage of unlabeled data in this way, we achieve significant improvements on all three tasks. Notably, our approach outperforms current state-of-the-art techniques by an absolute $20\%$ for semi-supervised learning on CIFAR10, $10\%$ for transfer learning from ImageNet to CIFAR10, and $6\%$ for few-shot recognition on mini-ImageNet, when labeled examples are limited.
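A condensed sketch of metric-based label propagation: embeddings from a pretrained, transferred network define similarities, and pseudo-labels flow from the few labeled points to the unlabeled collection. The single-pass similarity-weighted k-NN vote below is a simplification of full graph-based propagation, and k and n_classes are illustrative.

```python
# Sketch: propagate labels to unlabeled points via cosine similarity in a learned metric.
import numpy as np

def propagate(emb_l, y_l, emb_u, k=10, n_classes=10):
    """emb_l: (n_l, d) labeled embeddings; y_l: (n_l,) int labels; emb_u: (n_u, d)."""
    a = emb_u / np.linalg.norm(emb_u, axis=1, keepdims=True)
    b = emb_l / np.linalg.norm(emb_l, axis=1, keepdims=True)
    sim = a @ b.T                                     # (n_u, n_l) cosine similarities
    scores = np.zeros((len(emb_u), n_classes))
    for i in range(len(emb_u)):
        nn = np.argsort(sim[i])[-k:]                  # k most similar labeled points
        for j in nn:
            scores[i, y_l[j]] += sim[i, j]            # similarity-weighted vote
    return scores.argmax(axis=1)                      # pseudo-labels for unlabeled data
```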


A Very Brief and Critical Discussion on AutoML

arXiv.org Artificial Intelligence

This contribution presents a very brief and critical discussion on automated machine learning (AutoML), which is categorized here into two classes, referred to as narrow AutoML and generalized AutoML, respectively. The conclusions drawn from this discussion can be summarized as follows: (1) most existing research on AutoML belongs to the class of narrow AutoML; (2) advances in narrow AutoML are mainly motivated by commercial needs, and any benefit obtained comes at the cost of an increased computing burden; (3) the concept of generalized AutoML is closely tied in spirit to artificial general intelligence (AGI), also called "strong AI", for which obstacles abound on the way to pivotal progress. AutoML has recently emerged as a hot research topic in the fields of machine learning (ML) and artificial intelligence (AI). A typical ML pipeline requires substantial human participation in, e.g., data pre-processing, feature engineering, algorithm selection, model selection and hyper-parameter optimization.


Deep Segment Attentive Embedding for Duration Robust Speaker Verification

arXiv.org Machine Learning

LSTM-based speaker verification usually uses a fixed-length local segment randomly truncated from an utterance to learn the utterance-level speaker embedding, while using the average embedding of all segments of a test utterance to verify the speaker, which results in a critical mismatch between testing and training. This mismatch degrades the performance of speaker verification, especially when the durations of training and testing utterances differ greatly. To alleviate this issue, we propose a deep segment attentive embedding method to learn unified speaker embeddings for utterances of variable duration. Each utterance is segmented by a sliding window, and an LSTM is used to extract the embedding of each segment. Instead of using only one local segment, we use the whole utterance to learn the utterance-level embedding by applying attentive pooling to the embeddings of all segments. Moreover, a similarity loss on segment-level embeddings is introduced to guide the segment attention to focus on the segments with more speaker discrimination, and is jointly optimized with the similarity loss on utterance-level embeddings. Systematic experiments on Tongdun and VoxCeleb show that the proposed method significantly improves robustness to duration variation and achieves relative Equal Error Rate reductions of 50% and 11.54%, respectively.
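A minimal sketch of attentive pooling over per-segment embeddings, as described above; the embedding dimension and the scoring MLP sizes are assumptions, and the LSTM segment extractor and similarity losses are omitted.

```python
# Sketch: attention-weighted pooling of segment embeddings into one utterance embedding.
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Small MLP that scores each segment's relevance (assumed architecture).
        self.score = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, 1))

    def forward(self, seg_emb):
        """seg_emb: (batch, n_segments, dim) -> (batch, dim) utterance embedding."""
        alpha = torch.softmax(self.score(seg_emb), dim=1)   # attention over segments
        return (alpha * seg_emb).sum(dim=1)
```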


Particle Filtering Methods for Stochastic Optimization with Application to Large-Scale Empirical Risk Minimization

arXiv.org Machine Learning

There is recent interest in developing statistical filtering methods for stochastic optimization (FSO) by leveraging a probabilistic perspective on incremental proximity methods (IPMs). The existing FSO methods are derived from the Kalman filter (KF) and the extended KF (EKF). Unlike classical stochastic optimization methods such as stochastic gradient descent (SGD) and typical IPMs, such KF-type algorithms possess a desirable property: they do not require pre-scheduling of the learning rate for convergence. On the other hand, they have inherent limitations inherited from the nature of KF mechanisms. It is a consensus that the class of particle filters (PFs) remarkably outperforms the KF and its variants for nonlinear and/or non-Gaussian statistical filtering tasks. Hence, it is natural to ask whether FSO methods can benefit from PF theory to get around the limitations of the KF-type IPMs. We provide an affirmative answer to this question by developing three PF-based SO (PFSO) algorithms. We also discuss the relationships among (1) PF methods designed for stochastic dynamic filtering; (2) PF methods designed for static parameter estimation; and (3) our PFSO algorithms. For performance evaluation, we apply the proposed algorithms to a least-squares fitting problem with a simulated dataset and to the empirical risk minimization (ERM) problem in binary classification with real datasets. The experimental results demonstrate that our algorithms remarkably outperform existing methods in terms of numerical stability, convergence speed, classification error rate and flexibility in handling different types of models and loss functions.
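A toy sketch of one particle-filter-style optimization step, not the paper's three PFSO algorithms: particles over the parameter vector are reweighted by a mini-batch loss turned into a pseudo-likelihood, resampled, and jittered. The temperature and jitter scale are illustrative choices.

```python
# Sketch: one reweight-resample-perturb step of particle-filter stochastic optimization.
import numpy as np

def pfso_step(particles, batch_loss, temp=1.0, jitter=0.01,
              rng=np.random.default_rng(0)):
    """particles: (n, d) parameter candidates; batch_loss(theta) -> mini-batch loss."""
    losses = np.array([batch_loss(p) for p in particles])
    w = np.exp(-(losses - losses.min()) / temp)       # pseudo-likelihood weights
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)  # multinomial resampling
    return particles[idx] + jitter * rng.standard_normal(particles.shape)
```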


A Particle Filter based Multi-Objective Optimization Algorithm: PFOPS

arXiv.org Artificial Intelligence

This letter is concerned with a recently developed paradigm of population-based optimization, termed particle filter optimization (PFO). In contrast with the commonly used meta-heuristic methods, the PFO paradigm is attractive in terms of theoretical coherence and ease of mathematical analysis and interpretation. However, current PFO algorithms only work for single-objective optimization, while many real-life problems involve multiple objectives to be optimized simultaneously. To this end, we make an effort to extend the scope of application of the PFO paradigm to multi-objective optimization (MOO). An idea called path sampling is adopted within the PFO scheme to balance the different objectives to be optimized. The resulting algorithm is thus termed PFO with Path Sampling (PFOPS). Experimental results show that the proposed algorithm works consistently well on three different types of MOO problems, characterized by a convex, a concave and a discontinuous Pareto front, respectively.
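An illustrative sketch of how objectives might be balanced inside a particle-filter optimization loop: at each iteration a scalarization weight is drawn along a random path over the simplex and particles are scored under that weighting. The Dirichlet weight path, temperature and jitter are assumptions, not the authors' exact path-sampling scheme.

```python
# Sketch: one multi-objective PFO step with randomly sampled scalarization weights.
import numpy as np

def pfops_step(particles, objectives, temp=1.0, jitter=0.05,
               rng=np.random.default_rng(0)):
    """objectives: list of callables f_j(theta) -> scalar to minimize."""
    lam = rng.dirichlet(np.ones(len(objectives)))     # one point on the weight path
    scores = np.array([sum(l * f(p) for l, f in zip(lam, objectives))
                       for p in particles])           # scalarized objective values
    w = np.exp(-(scores - scores.min()) / temp)       # pseudo-likelihood weights
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)  # resampling
    return particles[idx] + jitter * rng.standard_normal(particles.shape)
```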