If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Person re-identification (re-ID) aims to identify the same person in multiple images captured from non-overlapping camera views. Most previous re-ID studies have attempted to solve this problem through representation learning, metric learning, or a combination of both. Representation learning relies on the latent factors or attributes of the data, and in most of these works the dimensionality of the factors/attributes has to be manually determined for each new dataset, so the approach is not robust. Metric learning optimizes a metric across the dataset to measure similarity according to distance; however, choosing the optimal method for computing these distances is data dependent, and learning an appropriate metric relies on a sufficient number of pair-wise labels. To overcome these limitations, we propose a novel algorithm for person re-ID, called semi-supervised Bayesian attribute learning. We introduce an Indian Buffet Process to define the priors of the latent attributes, so that the dimensionality of the attribute factors is automatically determined by nonparametric Bayesian learning. Meanwhile, unlike traditional distance metric learning, we propose a re-identification probability distribution that describes how likely it is that a pair of images contains the same person, relying solely on the latent attributes of both images. Moreover, unknown pair-wise labels can be estimated from known pair-wise labels, making this a robust approach for semi-supervised learning. Extensive experiments demonstrate the superior performance of our algorithm over several state-of-the-art algorithms on small-scale datasets and comparable performance on large-scale re-ID datasets.
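To illustrate why an Indian Buffet Process removes the need to fix the attribute dimensionality by hand, here is a minimal sketch of drawing a binary image-by-attribute matrix from an IBP prior. This is a generic textbook sampler, not the paper's model; the `alpha` hyperparameter and the matrix `Z` are illustrative assumptions.

```python
import numpy as np

def sample_ibp(num_images, alpha=2.0, rng=None):
    """Draw a binary image-by-attribute matrix Z from an Indian Buffet Process.

    Each row is an image; attribute columns are created on demand, so the
    number of latent attributes is data-driven rather than preset.
    """
    rng = np.random.default_rng(rng)
    Z = np.zeros((num_images, 0), dtype=int)
    for i in range(num_images):
        if Z.shape[1] > 0:
            # Keep an existing attribute k with probability m_k / (i + 1),
            # where m_k counts previous images that already have attribute k.
            probs = Z[:i].sum(axis=0) / (i + 1)
            Z[i] = rng.random(Z.shape[1]) < probs
        # Introduce Poisson(alpha / (i + 1)) brand-new attributes.
        k_new = rng.poisson(alpha / (i + 1))
        if k_new > 0:
            new_cols = np.zeros((num_images, k_new), dtype=int)
            new_cols[i] = 1
            Z = np.hstack([Z, new_cols])
    return Z

Z = sample_ibp(10, alpha=3.0, rng=0)
print(Z.shape)  # the attribute count varies from draw to draw
```

Larger `alpha` yields more latent attributes on average; in the nonparametric Bayesian setting this count is inferred from the data instead of tuned per dataset.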
Liu, Meng (Peking University) | Xu, Chang (University of Technology, Sydney) | Luo, Yong (Nanyang Technological University) | Xu, Chao (Peking University) | Wen, Yonggang (Nanyang Technological University) | Tao, Dacheng (University of Technology, Sydney)
Feature selection aims to select a small subset of high-dimensional features that leads to better learning performance, lower computational complexity, and better model readability. Traditional feature selection methods neglect the class imbalance problem, so the selected features are biased towards the majority classes. Because the F-measure is superior to accuracy for imbalanced data, we propose to use the F-measure as the performance measure for feature selection algorithms. As a pseudo-linear function, the F-measure can be optimized by minimizing a total cost. In this paper, we present a novel cost-sensitive feature selection (CSFS) method that optimizes the F-measure instead of accuracy to take the class imbalance issue into account. Features are selected according to the classifier with the optimal F-measure after solving a series of cost-sensitive feature selection sub-problems, so they fully represent the characteristics of not only the majority classes but also the minority classes. Extensive experimental results on synthetic, multi-class, and multi-label datasets validate the efficiency and significance of our feature selection method.
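A small sketch makes the motivation concrete: on imbalanced data a majority-class predictor gets high accuracy but zero F-measure, and the pseudo-linearity of the F-measure is what lets its optimization be reduced to weighted (cost-sensitive) sub-problems. The toy labels below are illustrative, not from the paper.

```python
import numpy as np

def f_measure(y_true, y_pred, beta=1.0):
    """F-measure for binary labels; balances precision and recall."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom > 0 else 0.0

# Pseudo-linearity: for any fixed target value t, "F >= t" is equivalent to
# a linear inequality in (tp, fp, fn):
#   (1 + beta^2) * (1 - t) * tp - t * (beta^2 * fn + fp) >= 0,
# i.e. a weighted error, so maximizing F reduces to cost-sensitive sub-problems.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # imbalanced: 3 vs 7
majority = np.zeros(10, dtype=int)                  # always predict majority
print(f_measure(y_true, majority))  # 0.0, despite 70% accuracy
```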
Verma, Sunny (University of Technology, Sydney) | Liu, Wei (University of Technology, Sydney) | Wang, Chen (Commonwealth Scientific and Industrial Research Organisation) | Zhu, Liming (Commonwealth Scientific and Industrial Research Organisation)
Real-world data is usually generated over multiple time periods and associated with multiple labels, and can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques can extract either the common or the unique features, but not both simultaneously. However, both types of features are important in many machine learning systems, as they inherently affect the systems' performance. In this paper, we propose a novel supervised tensor factorization technique that simultaneously extracts ordered common and unique features. Classification using features extracted by our method on the CIFAR-10 database achieves significantly better performance than other factorization methods, illustrating the effectiveness of the proposed technique.
WiFi-SLAM aims to map WiFi signals within an unknown environment while simultaneously determining the location of a mobile device. This localization method has been extensively used in indoor, space, undersea, and underground environments. For the sake of accuracy, most methods label the signal readings against ground-truth locations. However, this is impractical in large environments, where the data are hard to collect and maintain. Some methods use latent variable models to generate latent-space locations of signal strength data, the advantage being that no prior labeling of signal strength readings and their physical locations is required. However, the generated latent variables cannot cover all wireless signal locations, and WiFi-SLAM performance is significantly degraded. Here we propose the diversified generative latent variable model (DGLVM) to overcome these limitations. By building a positive-definite kernel function, a diversity-encouraging prior is introduced to render the generated latent variables non-overlapping, thus capturing more characteristics of the wireless signal measurements. The defined objective function is then solved by variational inference. Our experiments illustrate that the method performs WiFi localization more accurately than other label-free methods.
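One common way a positive-definite kernel can encode a diversity-encouraging prior is through a determinant: the determinant of a kernel matrix over the latent locations is large when the points are spread out and collapses toward zero when they overlap. The sketch below uses an RBF kernel and a fixed lengthscale purely as illustrative assumptions; it is not the paper's DGLVM prior.

```python
import numpy as np

def diversity_log_prior(X, lengthscale=1.0):
    """Log of a determinant-based, diversity-encouraging prior over latent
    locations X (n x d). Overlapping points make the positive-definite RBF
    kernel matrix near-singular, so the log-determinant drops sharply."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * lengthscale**2))
    _, logdet = np.linalg.slogdet(K + 1e-10 * np.eye(len(X)))
    return logdet

spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
clumped = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
print(diversity_log_prior(spread) > diversity_log_prior(clumped))  # True
```

Under such a prior, configurations of non-overlapping latent locations receive higher probability, which is the qualitative effect the abstract describes.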
In reinforcement learning (RL), reward shaping guides the learner toward desirable behavior by assigning a positive or negative reward to its preceding action. For reward shaping through human-generated rewards, however, an important consideration is making the process approachable for humans. Typically, the human teacher must watch the agent's actions and assign judgmental feedback based on prior knowledge, which can be a mentally taxing and unpleasant exercise, especially during lengthy teaching sessions. We present a method, Shaping from Interactive Demonstrations (SfID), which takes action labels from the human instead of judgmental rewards, simplifying the teacher's role to demonstrating which action to select in a given state. We compare SfID with a standard reward shaping approach on the Sokoban domain. The results show that SfID is competitive with standard reward shaping.
Zhou, Chuan (Institute of Information Engineering, Chinese Academy of Sciences) | Lu, Wei-Xue (Academy of Mathematics and Systems Science, Chinese Academy of Sciences) | Zhang, Peng (University of Technology, Sydney) | Wu, Jia (Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney) | Hu, Yue (Institute of Information Engineering, Chinese Academy of Sciences) | Guo, Li (Institute of Information Engineering, Chinese Academy of Sciences)
In this paper we theoretically study the minimum Differentially Resolving Set (DRS) problem, derived from the classical sensor placement optimization problem in network source locating. A DRS of a graph G = (V, E) is defined as a subset S ⊆ V such that any two elements in V can be distinguished by their different differential characteristic sets defined on S. The minimum DRS problem aims to find a DRS S in the graph G with minimum total weight Σ_{v∈S} w(v). We establish a group of Integer Linear Programming (ILP) models as solutions. Using weighted set cover theory, we propose an approximation algorithm with Θ(ln n) approximability for the minimum DRS problem on general graphs, where n is the graph size.
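The Θ(ln n) guarantee comes from weighted set cover, where the classic greedy rule (pick the set with the lowest weight per newly covered element) achieves an H_n ≈ ln n approximation. Here is that standard greedy routine as a sketch; the toy instance is illustrative, not one of the paper's graphs.

```python
def greedy_weighted_set_cover(universe, sets, weights):
    """Greedy ln(n)-approximation for weighted set cover: repeatedly pick
    the set with the lowest weight per newly covered element."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = min(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: weights[s] / len(sets[s] & uncovered),
        )
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

universe = {1, 2, 3, 4, 5}
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1, 5}}
weights = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
cover = greedy_weighted_set_cover(universe, sets, weights)
print(cover)  # ['a', 'c'] covers the whole universe
```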
Matrix completion aims to recover the missing/unobserved values of a data matrix from very limited observations. Due to its wide range of potential applications, it has received growing interest in fields ranging from machine learning and data mining to collaborative filtering and computer vision. To ensure the successful recovery of missing values, most existing matrix completion algorithms rely on the low-rank assumption: the fully observed data matrix has low rank, or equivalently its columns can be linearly represented by a small number of basis vectors. Although the low-rank assumption generally applies in practice, real-world data can possess much richer structural information. In this paper, we present a new model for matrix completion, motivated by the separability assumption for nonnegative matrices from the recent matrix factorisation literature: there exists a set of columns of the matrix such that the remaining columns can be represented by their convex combinations. Given the separability property, which holds reasonably well in many applications, our model provides more accurate matrix completion than low-rank based algorithms. Further, we derive a scalable algorithm to solve our matrix completion model, which uses a randomised method to select the basis columns under the separability assumption and a coordinate-gradient based method to automatically handle the structural constraints in optimisation. Compared to state-of-the-art algorithms, the proposed matrix completion model achieves competitive results on both synthetic and real datasets.
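To see how the separability assumption enables completion, consider filling one partially observed column given known basis columns: fit convex-combination weights on the observed entries, then use them to predict the missing ones. The sketch below uses plain least squares with a crude clip-and-renormalize projection onto the simplex; it is an illustration of the idea, not the paper's randomised/coordinate-gradient algorithm.

```python
import numpy as np

def complete_column(col, mask, basis):
    """Fill missing entries of one column under separability: the column is
    assumed to be a convex combination of the basis columns. Weights are fit
    on observed entries, then clipped and renormalized (a crude simplex
    projection used here only for illustration)."""
    w, *_ = np.linalg.lstsq(basis[mask], col[mask], rcond=None)
    w = np.clip(w, 0, None)
    w = w / w.sum() if w.sum() > 0 else np.full_like(w, 1 / len(w))
    filled = col.copy()
    filled[~mask] = basis[~mask] @ w
    return filled

# Toy example: the target column is the average of the two basis columns.
basis = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]])
col = 0.5 * basis[:, 0] + 0.5 * basis[:, 1]      # true values [2, 3, 4, 5]
mask = np.array([True, True, True, False])        # last entry unobserved
col_missing = col.copy()
col_missing[3] = 0.0
print(complete_column(col_missing, mask, basis)[3])  # recovers 5.0
```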
Cascade classifiers have proven effective in sliding-window based real-time object detection. In a cascade classifier, node learning is the key process, which includes feature selection and classifier design. Previous algorithms fail to effectively tackle the asymmetry and intersection problems inherent in cascade classification, thereby limiting the performance of object detection. In this paper, we improve current feature selection algorithms by addressing both the asymmetry and intersection problems. We formulate asymmetric feature selection as a submodular function maximization problem and propose a new algorithm, SAFS, with a formal performance guarantee to solve it. We use face detection as a case study and perform experiments on two real-world face detection datasets. The experimental results demonstrate that SAFS outperforms state-of-the-art feature selection algorithms for cascade object detection, such as FFS and LACBoost.
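Formal guarantees for submodular maximization typically come from the classic greedy routine, which achieves a (1 - 1/e) approximation for monotone submodular objectives under a cardinality constraint. The sketch below shows that standard routine with a toy coverage objective; it is a generic illustration, not the paper's SAFS algorithm.

```python
def greedy_submodular(ground_set, f, k):
    """Classic greedy for monotone submodular maximization under a
    cardinality constraint: add the element with the largest marginal gain."""
    selected = []
    for _ in range(k):
        best = max(
            (x for x in ground_set if x not in selected),
            key=lambda x: f(selected + [x]) - f(selected),
        )
        selected.append(best)
    return selected

# Toy coverage objective (coverage functions are monotone submodular):
# f(S) = number of elements covered by the chosen feature "detectors".
covers = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
f = lambda S: len(set().union(*(covers[x] for x in S))) if S else 0
print(greedy_submodular(list(covers), f, k=2))  # [3, 1]
```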
Linear submodular bandits have proven effective for solving the diversification and feature-based exploration problems in retrieval systems. Concurrently, many web-based applications, such as news article recommendation and online ad placement, can be modeled as budget-limited problems. However, the diversification problem under a budget constraint has not been considered. In this paper, we first introduce the budget constraint to linear submodular bandits as a new problem, called linear submodular bandits with a knapsack constraint. Because submodular function maximization is NP-hard, we then define an alpha-approximation unit-cost regret. To solve this problem, we propose two greedy algorithms based on a modified UCB rule and prove their regret bounds, which differ along with their computational costs. A number of experiments confirm our theoretical analyses.
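The flavor of a cost-aware UCB greedy rule can be sketched as follows: rank items by an optimistic (UCB) reward estimate per unit cost, then greedily add items while the knapsack budget allows. This ignores the submodular marginal-gain structure for brevity, and the exploration constant `alpha` and toy statistics are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

def ucb_per_cost_greedy(means, counts, costs, budget, t, alpha=2.0):
    """Budgeted UCB greedy (a simplified sketch): pick items by optimistic
    reward-per-unit-cost until the knapsack budget is exhausted."""
    # Optimistic estimate: empirical mean plus an exploration bonus.
    ucb = means + np.sqrt(alpha * np.log(t) / np.maximum(counts, 1))
    remaining = budget
    chosen = []
    for i in np.argsort(-(ucb / costs)):  # best optimistic value/cost first
        if costs[i] <= remaining:
            chosen.append(int(i))
            remaining -= costs[i]
    return chosen

means = np.array([0.5, 0.2, 0.8])
counts = np.array([10, 10, 10])
costs = np.array([1.0, 1.0, 2.0])
picked = ucb_per_cost_greedy(means, counts, costs, budget=3.0, t=100)
print(picked)  # [0, 1]: item 2 no longer fits in the remaining budget
```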
Vanishing Component Analysis (VCA) is a prominent recently proposed technique in machine learning. It narrows the gap between machine learning and computational algebra by bringing the vanishing ideal and its tools to classification problems. In this paper, we analyze VCA from the kernel point of view, another important research direction in machine learning. Under a very weak assumption, we provide a different perspective on VCA and make the kernel trick applicable to it. We demonstrate that the projection matrix derived by VCA lies in the same space as that of Kernel Principal Component Analysis (KPCA) with a polynomial kernel, and that the two groups of projections can express each other through a linear transformation. Furthermore, we prove that KPCA and VCA have identical discriminative power, provided that the ratio trace criterion is employed as the measurement. We also show that the kernel formulated by the inner products of VCA's projections can be expressed linearly in terms of the KPCA kernel. Based on this analysis, we propose a novel Discriminative Vanishing Component Analysis (DVCA) approach. Experimental results are provided for demonstration.
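For reference, the KPCA side of the comparison is standard: with a polynomial kernel, one double-centers the kernel matrix and eigendecomposes it to obtain the training-set embedding. The sketch below shows that textbook procedure (degree and component count are illustrative choices), not the VCA-specific analysis.

```python
import numpy as np

def poly_kpca_embed(X, degree=2, n_components=2):
    """Training-set embedding from kernel PCA with an inhomogeneous
    polynomial kernel -- the setting under which the analysis relates
    KPCA's projections to VCA's."""
    K = (X @ X.T + 1.0) ** degree               # polynomial kernel matrix
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]  # leading components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
Z = poly_kpca_embed(X, degree=2, n_components=2)
print(Z.shape)  # (20, 2); columns are mutually orthogonal
```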