If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Online retailers execute a very large number of price updates when compared to brick-and-mortar stores. Even a few mis-priced items can have a significant business impact and result in a loss of customer trust. Early detection of anomalies in an automated real-time fashion is an important part of such a pricing system. In this paper, we describe unsupervised and supervised anomaly detection approaches we developed and deployed for a large-scale online pricing system at Walmart. Our system detects anomalies both in batch and real-time streaming settings, and the items flagged are reviewed and actioned based on priority and business impact. We found that having the right architecture design was critical to facilitate model performance at scale, and business impact and speed were important factors influencing model selection, parameter choice, and prioritization in a production environment for a large-scale system. We conducted analyses on the performance of various approaches on a test set using real-world retail data and fully deployed our approach into production. We found that our approach was able to detect the most important anomalies with high precision.
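The abstract does not say which detectors were used, so the following is only a minimal sketch of one common unsupervised baseline: a robust z-score built from the median and the median absolute deviation (MAD). The function name, the threshold of 3.5, and the sample prices are all assumptions made for illustration, not details from the paper.

```python
from statistics import median


def flag_price_anomalies(prices, threshold=3.5):
    """Return indices of prices whose robust z-score exceeds the threshold."""
    med = median(prices)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, p in enumerate(prices)
            if 0.6745 * abs(p - med) / mad > threshold]


# Hypothetical price history for one item; the fifth entry looks mis-keyed.
history = [19.99, 20.49, 19.79, 20.19, 1.99, 20.29]
flagged = flag_price_anomalies(history)
```

Median-based statistics are used here because a single extreme price would inflate a mean and standard deviation and could hide itself.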
Recently, researchers have started decomposing deep neural network (DNN) models according to their semantics or functions. Recent work has shown the effectiveness of decomposed functional blocks for defending against adversarial attacks, which add small perturbations to the input image to fool DNN models. This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach to exploring DNNs' internal organization. Specifically, the per-image effective path can be aggregated into the class-level effective path, through which we observe that adversarial images activate effective paths different from those of normal images. We propose an effective path similarity-based method to detect adversarial images with an interpretable model, which achieves better accuracy and broader applicability than the state-of-the-art technique.
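The aggregation and similarity steps can be illustrated with a small sketch. It assumes that an effective path is represented as a set of active unit ids, that class-level aggregation is a set union, and that similarity is the fraction of an image's path lying on the class path. These representations, the function names, and the ids are hypothetical; the abstract does not specify them.

```python
def class_path(image_paths):
    """Aggregate per-image paths into a class-level path (union, by assumption)."""
    merged = set()
    for path in image_paths:
        merged |= path
    return merged


def path_similarity(image_path, cls_path):
    """Fraction of the image's active units that also lie on the class path."""
    if not image_path:
        return 0.0
    return len(image_path & cls_path) / len(image_path)


# Hypothetical unit ids, purely for illustration.
cat_path = class_path([{1, 2, 3}, {2, 3, 4}])
normal_score = path_similarity({2, 3, 4}, cat_path)      # path stays on the class path
suspect_score = path_similarity({2, 7, 8, 9}, cat_path)  # path mostly leaves it
```

Under these assumptions, a low similarity to the predicted class's path is the kind of signal a detector could act on.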
Industrial recommender systems usually consist of a matching stage and a ranking stage in order to handle billion-scale numbers of users and items. The matching stage retrieves candidate items relevant to user interests, while the ranking stage sorts candidate items by user interests. Thus, the most critical ability for either stage is to model and represent user interests. Most existing deep learning-based models represent one user as a single vector, which is insufficient to capture the varying nature of the user's interests. In this paper, we approach this problem from a different view: representing one user with multiple vectors that encode different aspects of the user's interests. We propose the Multi-Interest Network with Dynamic routing (MIND) for dealing with users' diverse interests in the matching stage. Specifically, we design a multi-interest extractor layer based on the capsule routing mechanism, which is applicable for clustering historical behaviors and extracting diverse interests. Furthermore, we develop a technique named label-aware attention to help learn a user representation with multiple vectors. Through extensive experiments on several public benchmarks and one large-scale industrial dataset from Tmall, we demonstrate that MIND achieves performance superior to state-of-the-art methods for recommendation. Currently, MIND is deployed to handle major online traffic on the homepage of the Mobile Tmall App.
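As a rough illustration of the label-aware attention idea, the sketch below weights several interest vectors by their softmax-normalized dot product with the target item's embedding and returns the weighted sum. The scoring rule, the `sharpness` parameter, and all names are assumptions for this example, not the exact formulation used in MIND.

```python
import math


def label_aware_attention(interests, label, sharpness=1.0):
    """Combine interest vectors, weighting each by its match to the label item."""
    # Score each interest by its (scaled) dot product with the label embedding.
    scores = [sharpness * sum(a * b for a, b in zip(v, label)) for v in interests]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum of interests is the user vector for this label.
    user = [sum(w * v[d] for w, v in zip(weights, interests))
            for d in range(len(label))]
    return user, weights


# Two hypothetical interests; the label item matches the first one.
interests = [[1.0, 0.0], [0.0, 1.0]]
user, weights = label_aware_attention(interests, label=[1.0, 0.0])
_, sharp_weights = label_aware_attention(interests, label=[1.0, 0.0], sharpness=50.0)
```

A larger `sharpness` pushes the weights toward selecting only the best-matching interest, while a value near zero averages all interests.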
Although convolutional neural networks (CNNs) have recently become popular for various image processing and computer vision tasks, reducing the storage cost of their parameters on resource-limited platforms remains a challenging problem. In previous studies, tensor decomposition (TD) has achieved promising compression performance by embedding the kernel of a convolutional layer into a low-rank subspace. However, TD is usually applied naively to the kernel or its specified variants. Unlike the conventional approaches, this paper shows that the kernel can be embedded into more general or even random low-rank subspaces. We demonstrate this by compressing the convolutional layers via randomly-shuffled tensor decomposition (RsTD) for a standard classification task using CIFAR-10. In addition, we analyze how the spatial similarity of the training data influences the low-rank structure of the kernels. The experimental results show that the CNN can be significantly compressed even if the kernels are randomly shuffled. Furthermore, the RsTD-based method yields more stable classification accuracy than the conventional TD-based methods over a large range of compression ratios.
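To make the storage argument concrete, the sketch below counts parameters for a convolutional kernel stored densely and stored as a rank-r matrix factorization. A plain matrix factorization stands in for the tensor decompositions discussed in the abstract, and the reshaping to `out_ch x (in_ch * k * k)`, the layer sizes, and the rank are assumptions chosen for illustration.

```python
def low_rank_storage(out_ch, in_ch, k, rank):
    """Compare dense kernel storage with a rank-`rank` factorization of it."""
    dense = out_ch * in_ch * k * k
    # Assumed reshaping: one row per output channel.
    rows, cols = out_ch, in_ch * k * k
    # A rank-r factorization stores a (rows x r) and an (r x cols) matrix.
    factored = rank * (rows + cols)
    return dense, factored, dense / factored


# Hypothetical layer: 64 -> 64 channels with 3x3 kernels, compressed to rank 8.
dense, factored, ratio = low_rank_storage(out_ch=64, in_ch=64, k=3, rank=8)
```

A fixed random shuffle of the kernel entries adds essentially nothing to this count if it can be regenerated from a stored seed, which is one plausible reason shuffling is compatible with compression.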
In this paper, a novel circular and structural operator tracker (CSOT) is proposed for high-performance visual tracking. It not only possesses the powerful discriminative capability of SOSVM but also inherits the superior computational efficiency of DCF. Based on the proposed circular and structural operators, a set of primal confidence score maps can be obtained by circularly correlating feature maps with their corresponding structural correlation filters. Furthermore, an implicit interpolation is applied to convert the multi-resolution feature maps to the continuous domain so that all primal confidence score maps have the same spatial resolution. Then, we exploit an efficient ensemble post-processor based on relative entropy, which coalesces the primal confidence score maps into an optimal confidence score map for more accurate localization. The target is localized at the peak of the optimal confidence score map. In addition, we introduce a collaborative optimization strategy to update the circular and structural operators by iteratively training structural correlation filters, which significantly reduces computational complexity and improves robustness. Experimental results demonstrate that our approach achieves state-of-the-art performance, with mean AUC scores of 71.5% and 69.4% on the OTB-2013 and OTB-2015 benchmarks respectively, and obtains the third-best expected average overlap (EAO) score of 29.8% on the VOT-2017 benchmark.
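The correlate-then-take-the-peak step can be sketched in one dimension. This is a direct O(n^2) circular correlation in plain Python, written only to show the idea; correlation-filter trackers normally compute it in the Fourier domain, and the signal, filter, and function names below are hypothetical.

```python
def circular_correlation(feature, filt):
    """Score every cyclic shift of `feature` against the filter."""
    n = len(feature)
    return [sum(feature[(i + shift) % n] * filt[i] for i in range(n))
            for shift in range(n)]


def localize(scores):
    """Return the shift with the highest confidence score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 1-D example: the filter's pattern appears three cells to the right.
template = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
signal = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]
scores = circular_correlation(signal, template)
shift = localize(scores)
```

The score list plays the role of a confidence map, and the index of its maximum is the estimated displacement of the target.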
Low-rank tensor decomposition is a promising approach for analyzing and understanding real-world data. Many such analyses require correct recovery of the true latent factors, but the conditions for exact recovery are not known for many existing tensor decomposition methods. In this paper, we derive such conditions for a general class of tensor decomposition methods in which each latent tensor component can be reshuffled into a low-rank matrix of arbitrary shape. The reshuffling operation generalizes the traditional unfolding operation and provides the flexibility to recover the true latent factors of complex data structures. We prove that exact recovery can be guaranteed by a convex program when a type of incoherence measure is upper bounded. Results on image steganography show that our method achieves state-of-the-art performance. The theoretical analysis in this paper is expected to be useful for deriving similar results for other types of tensor decomposition methods.
In tensor completion tasks, traditional low-rank tensor decomposition models suffer from a laborious model selection problem due to high model sensitivity. For tensor ring (TR) decomposition in particular, the number of possible models grows exponentially with the tensor order, which makes it rather challenging to find the optimal TR decomposition. In this paper, by exploiting the low-rank structure of the TR latent space, we propose a novel tensor completion method that is robust to model selection. Instead of imposing a low-rank constraint on the data space, we introduce nuclear norm regularization on the latent TR factors, so that the optimization step using singular value decomposition (SVD) can be performed at a much smaller scale. By leveraging the alternating direction method of multipliers (ADMM) scheme, the latent TR factors with optimal rank and the recovered tensor can be obtained simultaneously. Our proposed algorithm effectively alleviates the burden of TR-rank selection, and the computational cost is therefore greatly reduced. Extensive experimental results on synthetic and real-world data demonstrate the superior performance and efficiency of the proposed approach compared with state-of-the-art algorithms.
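When a nuclear norm term is handled inside an ADMM loop, the corresponding sub-step is typically singular value thresholding, the proximal operator of the nuclear norm. The sketch below shows that single step on a small matrix using NumPy. It is not the paper's full algorithm, and the function name, the matrix, and the threshold `tau` are assumptions for illustration.

```python
import numpy as np


def singular_value_threshold(matrix, tau):
    """Shrink singular values by tau; return the result and its rank."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Soft-threshold: small singular values are set exactly to zero.
    shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunk values, then recombine.
    return (u * shrunk) @ vt, int(np.count_nonzero(shrunk))


# Hypothetical unfolded latent factor with one weak component.
factor = np.diag([3.0, 1.0, 0.2])
low_rank, rank = singular_value_threshold(factor, tau=0.5)
```

Because weak singular values are zeroed out, the rank of each factor is determined by the data and the threshold rather than fixed in advance, which is the sense in which such regularization eases rank selection. Applying the SVD to small latent factors instead of the full data tensor is what keeps this step cheap.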
Compared with visible object tracking, thermal infrared (TIR) object tracking can track an arbitrary target in total darkness since it is not influenced by illumination variations. However, many unwanted attributes constrain the potential of TIR tracking, such as the absence of visual color patterns and low resolution. Recently, the structured output support vector machine (SOSVM) and the discriminative correlation filter (DCF) have each been successfully applied to visible object tracking. Motivated by these, in this paper we propose a large margin structured convolution operator (LMSCO) to achieve efficient TIR object tracking. To improve tracking performance, we employ spatial regularization and implicit interpolation to obtain continuous deep feature maps of the TIR targets, including deep appearance features and deep motion features. Finally, a collaborative optimization strategy is exploited to update the operators. Our approach not only inherits the strong discriminative capability of SOSVM but also achieves accurate and robust tracking with higher-dimensional features and denser samples. To the best of our knowledge, we are the first to combine the advantages of DCF and SOSVM for TIR object tracking. Comprehensive evaluations on two thermal infrared tracking benchmarks, i.e., VOT-TIR2015 and VOT-TIR2016, clearly demonstrate that our LMSCO tracker achieves impressive results and outperforms most state-of-the-art trackers in terms of accuracy and robustness at a sufficient frame rate.
High-dimensional data in many areas, such as computer vision and machine learning, brings computational and analytical difficulties. Feature selection, which selects a subset of the observed features, is a widely used approach for improving the performance and effectiveness of machine learning models on high-dimensional data. In this paper, we propose a novel AutoEncoder Feature Selector (AEFS) for unsupervised feature selection, which combines autoencoder regression and group lasso tasks. Compared to traditional feature selection methods, AEFS can select the most important features by exploiting both linear and nonlinear information among features, which is more flexible than the conventional self-representation method for unsupervised feature selection, which relies only on linear assumptions. Experimental results on benchmark datasets show that the proposed method is superior to the state-of-the-art method.
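The group lasso part can be illustrated with a short sketch. It assumes that the penalty is applied to the encoder's first-layer weight matrix with one row per input feature, so that the penalty is the sum of row-wise Euclidean norms and features are ranked by their row norm. The layout, names, and weights are hypothetical, and training the autoencoder is omitted.

```python
import math


def row_norms(weights):
    """Euclidean norm of each row, i.e. of each input feature's outgoing weights."""
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def group_lasso_penalty(weights):
    """Sum of row norms; minimizing it drives whole rows toward zero."""
    return sum(row_norms(weights))


def top_features(weights, k):
    """Indices of the k features with the largest row norms."""
    norms = row_norms(weights)
    return sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)[:k]


# Hypothetical learned weights: 3 input features, 2 hidden units.
W = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8]]
penalty = group_lasso_penalty(W)
selected = top_features(W, k=2)
```

Grouping weights by row means a feature is either kept or discarded as a whole, which is what turns the regularizer into a feature selector.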
In this work, we consider the task of classifying binary positive-unlabeled (PU) data. Existing PU models based on discriminative learning attempt to find an optimal re-weighting strategy for U data so that a decent decision boundary can be found. In contrast, we provide a totally new paradigm to attack the binary PU task from the perspective of generative learning, by leveraging powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GPU) learning model is devised to express the P and N data distributions. It comprises three discriminators and two generators with different roles, producing both positive and negative samples that resemble those from the real training dataset. Even with rather limited labeled P data, our GPU framework is capable of capturing the underlying P and N data distributions with infinite streams of realistic samples. In this way, an optimal classifier can be trained on the generated samples using very deep neural networks (DNNs). Moreover, a useful variant of GPU is also introduced for semi-supervised classification.