The aim of this paper is to select RBF neural network centers under concurrent faults. Fault tolerance is well known to be a very attractive property for neural network algorithms, and center selection is an important step in training an RBF neural network. In this paper, we address these two issues simultaneously and devise two novel algorithms. Both are based on the ADMM framework and use sparse approximation. For both methods, we first define a fault-tolerant objective function. The first method then introduces the MCP function (an approximation of the l0-norm) and combines it with the ADMM framework to select the RBF centers, while the second method uses ADMM and IHT to solve the problem. The convergence of both methods is proved. Simulation results show that the proposed algorithms are superior to many existing center selection algorithms under concurrent faults.
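The two sparsity tools named in this abstract can be illustrated in isolation. The sketch below is not the paper's algorithm: it only shows the standard scalar form of the minimax concave penalty (MCP) and the hard-thresholding step that iterative hard thresholding (IHT) applies after each update. The function names and the parameters `lam`, `gamma`, and `k` are illustrative assumptions, and the ADMM loop and the fault-tolerant objective are omitted.

```python
def mcp_penalty(x, lam, gamma):
    """Standard scalar MCP with lam > 0 and gamma > 0 (names are assumptions).

    Grows like lam * |x| near zero and flattens to a constant once
    |x| exceeds gamma * lam, so large weights are not penalized further.
    """
    a = abs(x)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def hard_threshold(weights, k):
    """Keep the k largest-magnitude entries of weights and zero the rest.

    In a center selection setting, the surviving nonzero weights would mark
    the selected centers (an interpretation assumed here for illustration).
    """
    if k <= 0:
        return [0.0] * len(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

For example, `hard_threshold([0.1, -3.0, 2.0, 0.5], 2)` keeps only the entries `-3.0` and `2.0`.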
In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. Our aim is to find a correction function that can perturb the input distributions of each group to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expressions to efficiently compute the correction function, and demonstrate the benefits of our framework on a recidivism prediction problem based on the ProPublica COMPAS dataset.
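As a rough illustration of measuring disparate impact as a divergence between group-wise output distributions, the sketch below compares the positive-prediction rates of a binary classifier on two groups. The abstract does not say which divergence the authors use, so the Kullback-Leibler divergence between two Bernoulli distributions is an assumption here, as are all function names. The correction function itself is not sketched.

```python
import math


def positive_rate(predictions):
    """Fraction of positive (1) outputs in a list of 0/1 predictions."""
    return sum(predictions) / len(predictions)


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Bern(p) || Bern(q)); rates are clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def disparate_impact(preds_group_a, preds_group_b):
    """Divergence between the output distributions of two groups.

    Zero means the two groups receive positive predictions at the same rate.
    """
    return bernoulli_kl(positive_rate(preds_group_a), positive_rate(preds_group_b))
```

A value near zero indicates that the model's outputs look the same for both groups under this particular measure; other divergences would give different numbers.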
Wang, Wei (Institute of Software, Chinese Academy of Sciences) | Wang, Hao (360 Search Lab, Qihoo 360) | Ran, Zhi-Yong (Chongqing University of Posts and Telecommunications) | He, Ran (Institute of Automation, Chinese Academy of Sciences)
Cross-domain data reconstruction methods derive a shared transformation across source and target domains. These methods usually make a specific assumption about the noise, which limits their effectiveness when the target data are contaminated by different kinds of complex noise in practice. To enhance the robustness of domain adaptation under severe noise conditions, this paper proposes a novel reconstruction-based algorithm in an information-theoretic setting. Specifically, benefiting from the theoretical properties of correntropy, the proposed algorithm has two distinguishing features: it detects contaminated target samples without making any specific assumption about the noise, and it greatly suppresses the negative influence of noise on the cross-domain transformation. Moreover, a relative-entropy-based regularization of the transformation is incorporated to avoid trivial solutions, with the theoretical advantages of non-negativity and scale-invariance. For optimization, a half-quadratic technique is developed to minimize the non-convex information-theoretic objectives with explicitly guaranteed convergence. Experiments on two real-world domain adaptation tasks demonstrate the superiority of our method.
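The robustness that correntropy offers can be seen in a small numerical sketch. The code below is not the paper's objective: it only contrasts a sample estimate of correntropy (with a Gaussian kernel whose normalizing constant is omitted, an assumption made for readability) against the mean squared error when one sample is heavily corrupted. The function names and the bandwidth `sigma` are illustrative.

```python
import math


def correntropy(xs, ys, sigma=1.0):
    """Sample correntropy: mean Gaussian-kernel similarity of paired values.

    Each pair contributes at most 1, so a single corrupted sample can lower
    the estimate by no more than 1 / len(xs).
    """
    return sum(
        math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2)) for x, y in zip(xs, ys)
    ) / len(xs)


def mean_squared_error(xs, ys):
    """Mean squared error, which grows without bound as an outlier moves away."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


clean = [1.0, 2.0, 3.0, 4.0]
corrupted = [1.0, 2.0, 3.0, 104.0]  # last sample is an outlier

similarity = correntropy(clean, corrupted)    # stays close to 3/4
error = mean_squared_error(clean, corrupted)  # dominated by the outlier
```

Because the kernel saturates, the corrupted sample simply contributes almost nothing to the correntropy estimate, whereas it dominates the squared error.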
In recent years, RTB (Real Time Bidding) has become a popular online advertisement trading method. During the auction, each DSP (Demand Side Platform) is supposed to evaluate the current opportunity and respond with an ad and a corresponding bid price. It is essential for a DSP to find an optimal ad selection and bid price determination strategy that maximizes revenue or performance under budget and ROI (Return On Investment) constraints in P4P (Pay For Performance) or P4U (Pay For Usage) mode. We solve this problem by 1) formalizing the DSP problem as a constrained optimization problem, 2) proposing the augmented MMKP (Multi-choice Multi-dimensional Knapsack Problem) with a general solution, and 3) demonstrating that the DSP problem is a special case of the augmented MMKP and deriving a specialized strategy. Our strategy is verified through simulation and outperforms state-of-the-art strategies in a real application. To the best of our knowledge, our solution is the first dual-based DSP bidding framework that is derived from a strict second-price auction assumption and is generally applicable to the multiple-ads scenario with various objectives and constraints.
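For readers unfamiliar with the second-price auction assumption mentioned above, the mechanism can be sketched in a few lines: the highest bidder wins but pays the second-highest bid. This is only the generic auction rule, not the paper's bidding strategy. The function name, the bidder labels, and the `floor_price` charged when there is a single bidder are hypothetical.

```python
def second_price_auction(bids, floor_price=0.0):
    """Run a sealed-bid second-price auction.

    bids maps a bidder name to its bid price. The highest bidder wins and
    pays the second-highest bid, or floor_price if it is the only bidder
    (floor_price is an assumption of this sketch).
    """
    if not bids:
        raise ValueError("at least one bid is required")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else floor_price
    return winner, price


# Hypothetical bids from three DSPs for one ad impression.
winner, price = second_price_auction({"dsp_a": 2.5, "dsp_b": 1.0, "dsp_c": 1.8})
```

Here `dsp_a` wins and pays `1.8`, the bid of the runner-up, rather than its own bid of `2.5`.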
Wang, Wei (Institute of Software, Chinese Academy of Sciences) | Wang, Hao (Institute of Software, Chinese Academy of Sciences) | Zhang, Chen (Institute of Software, Chinese Academy of Sciences) | Gao, Yang (Institute of Software, Chinese Academy of Sciences)
As a fundamental constituent of machine learning, domain adaptation generalizes a learning model from a source domain to a different (but related) target domain. In this paper, we focus on semi-supervised domain adaptation and explicitly extend the applied range of unlabeled target samples into the combination of distribution alignment and adaptive classifier learning. Specifically, our extension formulates the following aspects in a single optimization: 1) learning a cross-domain predictive model by developing the Fredholm-integral-based kernel prediction framework; 2) reducing the distribution difference between the two domains; 3) exploring multiple kernels to induce an optimal learning space. Correspondingly, the extension is distinguished by its noise resiliency, its facilitation of knowledge transfer, and its ability to analyze diverse data characteristics. We prove the differentiability of our formulation and present an effective optimization procedure based on the reduced gradient, guaranteeing rapid convergence. Comprehensive empirical studies verify the effectiveness of the proposed method.
Link prediction is a fundamental task in such areas as social network analysis, information retrieval, and bioinformatics. Usually link prediction methods use the link structures or node attributes as the sources of information. Recently, the relational topic model (RTM) and its variants have been proposed as hybrid methods that jointly model both sources of information and achieve very promising accuracy. However, the representations (features) learned by them are still not effective enough to represent the nodes (items). To address this problem, we generalize recent advances in deep learning from solely modeling i.i.d. sequences of attributes to jointly modeling graphs and non-i.i.d. sequences of attributes. Specifically, we follow the Bayesian deep learning framework and devise a hierarchical Bayesian model, called relational deep learning (RDL), to jointly model high-dimensional node attributes and link structures with layers of latent variables. Due to the multiple nonlinear transformations in RDL, standard variational inference is not applicable. We propose to utilize the product of Gaussians (PoG) structure in RDL to relate the inferences on different variables and derive a generalized variational inference algorithm for learning the variables and predicting the links. Experiments on three real-world datasets show that RDL works surprisingly well and significantly outperforms the state of the art.
Wang, Hao | Yeung, Dit-Yan
While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and, in return, the feedback from the inference process is able to enhance the perception of text or images. This paper proposes a general framework for Bayesian deep learning and reviews its recent applications to recommender systems, topic models, and control. We also discuss the relationship and differences between Bayesian deep learning and other related topics, such as the Bayesian treatment of neural networks.
Yang, Shangdong (Nanjing University) | Gao, Yang (Nanjing University) | An, Bo (Nanyang Technological University) | Wang, Hao (Nanjing University) | Chen, Xingguo (Nanjing University of Posts and Telecommunications)
There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones that explicitly maintain MDP models and model-free ones that do not learn such models. Though model-free algorithms are known to be more efficient, they often cannot converge to optimal policies due to the perturbation of parameters. In this paper, a novel model-free algorithm is proposed, which makes use of constant shifting values (CSVs) estimated from prior knowledge. To encourage exploration during the learning process, the algorithm constantly subtracts the CSV from the rewards. A terminating condition is proposed to handle the unboundedness of Q-values caused by such subtraction. The convergence of the proposed algorithm is proved under very mild assumptions. Furthermore, linear function approximation is investigated to generalize our method to handle large-scale tasks. Extensive experiments on representative MDPs and the popular game Tetris show that the proposed algorithms significantly outperform the state-of-the-art ones.
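The idea of subtracting a constant shifting value from every reward can be sketched as a tabular, undiscounted Q-learning style update. This is a guess at the general shape of such an update, not the algorithm from the paper: the exact update rule, the way the CSV is estimated, and the terminating condition are not given in the abstract and are not reproduced here. The function name, the dictionary-based Q-table, and the step size `alpha` are all assumptions.

```python
def shifted_q_update(q, state, action, reward, next_state, actions, csv, alpha=0.1):
    """One tabular update in which the constant shifting value is subtracted
    from the reward before bootstrapping (assumed form, no discounting).

    q maps (state, action) pairs to values; missing entries count as 0.0.
    Returns the new value of q[(state, action)].
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old_value = q.get((state, action), 0.0)
    target = (reward - csv) + best_next
    q[(state, action)] = old_value + alpha * (target - old_value)
    return q[(state, action)]


# Hypothetical two-action task: one transition with reward 1.0 and CSV 0.4.
q_table = {}
new_value = shifted_q_update(
    q_table, "s0", "left", 1.0, "s1", ["left", "right"], csv=0.4, alpha=0.5
)
```

With an empty table, the shifted reward is `0.6` and the step size `0.5` moves the entry for `("s0", "left")` halfway toward it.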
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach, which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
Tag recommendation has become one of the most important ways of organizing and indexing online resources like articles, movies, and music. Since tagging information is usually very sparse, effective learning of the content representation for these resources is crucial to accurate tag recommendation. Recently, models proposed for tag recommendation, such as collaborative topic regression and its variants, have demonstrated promising accuracy. However, a limitation of these models is that, by using topic models like latent Dirichlet allocation as the key component, the learned representation may not be compact and effective enough. Moreover, since relational data exist as an auxiliary data source in many applications, it is desirable to incorporate such data into tag recommendation models. In this paper, we start with a deep learning model called stacked denoising autoencoder (SDAE) in an attempt to learn more effective content representation. We propose a probabilistic formulation for SDAE and then extend it to a relational SDAE (RSDAE) model. RSDAE jointly performs deep representation learning and relational learning in a principled way under a probabilistic framework. Experiments conducted on three real datasets show that both learning more effective representation and learning from relational data are beneficial steps to take to advance the state of the art.