AITopics

Data poisoning attacks compromise the integrity of machine-learning models by introducing malicious training samples that influence the model's behavior at test time. In this work, we investigate backdoor data poisoning attacks on deep neural networks (DNNs), in which a backdoor pattern is inserted into the training images. The resulting model misclassifies poisoned test samples while maintaining high accuracy on the clean test set. We present two approaches for detecting such poisoned samples by quantifying the uncertainty estimates associated with the trained models. In the first approach, we model the outputs of the various layers (deep features) with parametric probability distributions learnt from the clean held-out dataset. At inference, the likelihoods of the deep features w.r.t. these distributions are calculated to derive uncertainty estimates. In the second approach, we use Bayesian deep neural networks trained with mean-field variational inference to estimate the model uncertainty associated with the predictions. The uncertainty estimates from these methods are used to discriminate clean samples from poisoned ones.
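The first detection approach can be illustrated with a small sketch. It is not the authors' implementation: it assumes features from a single layer, a diagonal Gaussian as the parametric distribution, and a threshold set at the lowest log-likelihood observed on the clean held-out set. The function names and the toy feature values are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_diag_gaussian(features):
    """Fit an independent (diagonal) Gaussian to clean held-out feature vectors."""
    columns = list(zip(*features))
    means = [mean(c) for c in columns]
    # Floor the variance so a nearly constant feature cannot cause division by zero.
    variances = [max(pvariance(c), 1e-6) for c in columns]
    return means, variances

def log_likelihood(x, params):
    """Log-density of feature vector x under the fitted diagonal Gaussian."""
    means, variances = params
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def flag_suspicious(samples, params, threshold):
    """Hypothetical decision rule: flag samples whose feature log-likelihood
    falls below `threshold`, i.e. features unlikely under the clean-data model."""
    return [log_likelihood(x, params) < threshold for x in samples]

# Toy 2-D "deep features" from a clean held-out set.
clean = [[0.1, -0.2], [0.0, 0.1], [-0.1, 0.0], [0.2, 0.1], [-0.2, 0.0]]
params = fit_diag_gaussian(clean)
threshold = min(log_likelihood(x, params) for x in clean)
print(flag_suspicious([[0.05, 0.0], [3.0, 3.0]], params, threshold))  # [False, True]
```

In practice such a model would be fitted per layer (and possibly per class) and the scores combined; the abstract does not specify the distribution family or the decision rule.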

data poisoning attack, dataset, poisoning attack

1912.01206

Country:

North America > United States (0.05)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Nepal (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Mazhar, Othmane, Rojas, Cristian R., Fischione, Carlo, Hesamzadeh, Mohammad R.

Bayesian Model Selection for Change Point Detection and Clustering

We address the new problem of estimating a piecewise-constant signal with the purpose of detecting its change points and the levels of its clusters. Our approach is to model this as nonparametric penalized least-squares model selection over a family of models indexed by the collection of partitions of the design points, and we propose a computationally efficient algorithm to solve it approximately. Statistically, minimizing such a penalized criterion yields an approximation to the maximum a posteriori probability (MAP) estimator. The criterion is then analyzed and an oracle inequality is derived using a Gaussian concentration inequality. The oracle inequality is used to derive, on the one hand, conditions for consistency and, on the other hand, an adaptive upper bound on the expected square risk of the estimator, which statistically motivates our approximation. Finally, we apply our algorithm to simulated data to experimentally validate the statistical guarantees and illustrate its behavior.
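As a point of reference for the penalized least-squares criterion, the sketch below solves a simpler problem exactly: choosing change points by dynamic programming (optimal partitioning) with a constant penalty per segment. This generic technique is swapped in for illustration only. It is not the authors' approximate algorithm, it does not cluster segment levels, and the penalty value is an assumption.

```python
import math

def segment(y, penalty):
    """Exact penalised least-squares segmentation (optimal partitioning).
    Minimises the total within-segment squared error plus `penalty` per
    segment and returns the sorted change-point indices."""
    n = len(y)
    # Prefix sums give any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(a, b):
        """Squared error of y[a:b] around its own mean."""
        tot = s1[b] - s1[a]
        return (s2[b] - s2[a]) - tot * tot / (b - a)

    best = [0.0] + [math.inf] * n  # best[b]: optimal cost of y[:b]
    last = [0] * (n + 1)           # start of the final segment in that optimum
    for b in range(1, n + 1):
        for a in range(b):
            cost = best[a] + sse(a, b) + penalty
            if cost < best[b]:
                best[b], last[b] = cost, a
    # Walk back through the segment starts to recover the change points.
    change_points, b = [], n
    while b > 0:
        b = last[b]
        if b > 0:
            change_points.append(b)
    return sorted(change_points)

print(segment([0, 0, 0, 0, 5, 5, 5, 5, 1, 1, 1], penalty=1.0))  # [4, 8]
```

This exact search costs O(n^2) time, which is one reason approximate algorithms are attractive for larger signals.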

change point, mazhar author

1912.01308

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre:

Research Report (0.63)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.82)

Rank Aggregation via Heterogeneous Thurstone Preference Models

Jin, Tao, Xu, Pan, Gu, Quanquan, Farnoud, Farzad

We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which takes the accuracy levels of different users into account. By allowing different noise distributions, HTM maintains the generality of Thurstone's original framework and, as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. Under this framework, we also propose a rank aggregation algorithm based on alternating gradient descent that simultaneously estimates the underlying item scores and the accuracy levels of different users from noisy pairwise comparisons. We theoretically prove that the proposed algorithm converges linearly up to a statistical error that matches that of the state-of-the-art method for the single-user BTL model. We evaluate HTM and the proposed algorithm on both synthetic and real data, demonstrating that they outperform existing methods.
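A minimal sketch of alternating gradient updates for scores and user accuracies follows, under loudly stated assumptions. It assumes a logistic link (the Gumbel-noise case that recovers BTL) and assumes each user's accuracy gamma_u multiplies the score difference. It uses a fixed step size and hypothetical function names, and it ascends the log-likelihood, which is equivalent to descending its negative. HTM itself allows general noise distributions, and the paper's algorithm and guarantees are not reproduced here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_htm(comparisons, n_items, n_users, steps=500, lr=0.1):
    """Alternating gradient ascent on the log-likelihood of the assumed model
    P(w beats l | user u) = sigmoid(gamma[u] * (s[w] - s[l])).
    comparisons: list of (user, winner, loser) index triples."""
    s = [0.0] * n_items
    gamma = [1.0] * n_users
    for _ in range(steps):
        # (1) Update item scores with user accuracies held fixed.
        grad_s = [0.0] * n_items
        for u, w, l in comparisons:
            r = 1.0 - sigmoid(gamma[u] * (s[w] - s[l]))
            grad_s[w] += gamma[u] * r
            grad_s[l] -= gamma[u] * r
        s = [si + lr * g for si, g in zip(s, grad_s)]
        shift = sum(s) / n_items
        s = [si - shift for si in s]  # scores are identified only up to a shift
        # (2) Update user accuracies with item scores held fixed.
        grad_g = [0.0] * n_users
        for u, w, l in comparisons:
            d = s[w] - s[l]
            grad_g[u] += d * (1.0 - sigmoid(gamma[u] * d))
        # Keep accuracies positive (hypothetical floor of 1e-3).
        gamma = [max(g + lr * gg, 1e-3) for g, gg in zip(gamma, grad_g)]
    return s, gamma

# User 0 is consistent (0 > 1 > 2); user 1 answers each pair both ways.
data = [(0, 0, 1), (0, 1, 2), (0, 0, 2)] * 3 + [(1, 2, 0), (1, 0, 2), (1, 1, 0), (1, 0, 1)]
scores, acc = fit_htm(data, n_items=3, n_users=2)
print(scores, acc)
```

The consistent user ends up with a higher estimated accuracy than the contradictory one, so that user's comparisons dominate the recovered ranking. Note that scores and accuracies are only jointly identified (scaling one can be offset by the other); this sketch fixes only the shift.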

algorithm, crowdtcv 0, pairwise comparison

1912.01211

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Li, Chang, Feng, Haoyun, de Rijke, Maarten

A Contextual-Bandit Approach to Online Learning to Rank for Relevance and Diversity

Online learning to rank (LTR) focuses on learning, from user interactions, a policy that builds a list of items sorted in decreasing order of item utility. It is a core area in modern interactive systems, such as search engines, recommender systems, and conversational assistants. Previous online LTR approaches assume either that the relevance of an item in the list is independent of the other items in the list, or that the relevance of an item is a submodular function of the utility of the list. The former type of approach may produce a list of low diversity whose relevant items cover the same aspects, while the latter may produce a highly diversified list that contains some non-relevant items. In this paper, we study an online LTR problem that considers both item relevance and topical diversity. We assume cascading user behavior, where a user browses the displayed list of items from top to bottom, clicks the first attractive item, and stops browsing the rest. We propose a hybrid contextual bandit approach, called CascadeHybrid, for solving this problem. CascadeHybrid models item relevance and topical diversity using two independent functions and learns both functions simultaneously from user click feedback. We derive a gap-free bound on the n-step regret of CascadeHybrid. We evaluate CascadeHybrid experimentally on the MovieLens and Yahoo Music datasets. Our experimental results show that CascadeHybrid outperforms the baselines on both datasets.
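Under the cascade assumption described above, the probability of a click at each position follows directly from the items' attraction probabilities. The sketch below computes these quantities for given attraction probabilities. How CascadeHybrid estimates them from its relevance and diversity functions is not reproduced, and the function name and values are hypothetical.

```python
def cascade_click_probs(attraction):
    """Cascade model: the user scans top to bottom and clicks the first
    attractive item. attraction[k] is P(item at position k is attractive).
    Returns P(click at position k) for every position."""
    probs = []
    p_examine = 1.0  # probability the user reaches position k
    for a in attraction:
        probs.append(p_examine * a)
        p_examine *= 1.0 - a  # the user continues only if item k was not attractive
    return probs

probs = cascade_click_probs([0.5, 0.5, 0.2])
print(probs)       # [0.5, 0.25, 0.05]
print(sum(probs))  # probability of any click: 1 - 0.5 * 0.5 * 0.8 = 0.8
```

Because positions further down are reached only when earlier items fail to attract, the ordering of the list matters, which is what a cascading bandit learns to exploit.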

attraction probability, cascadehybrid, diversity

1912.00508

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.62)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.69)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning, Dec-2-2019

Flow Contrastive Estimation of Energy-Based Models

Gao, Ruiqi, Nijkamp, Erik, Kingma, Diederik P., Xu, Zhen, Dai, Andrew M., Wu, Ying Nian

This paper studies a training method that jointly estimates an energy-based model and a flow-based model, in which the two models are iteratively updated based on a shared adversarial value function. This joint training method has the following traits. (1) The update of the energy-based model is based on noise contrastive estimation, with the flow model serving as a strong noise distribution. (2) The update of the flow model approximately minimizes the Jensen-Shannon divergence between the flow model and the data distribution. (3) Unlike generative adversarial networks (GANs), which estimate an implicit probability distribution defined by a generator model, our method estimates two explicit probability distributions on the data. Using the proposed method, we demonstrate a significant improvement in the synthesis quality of the flow model and show the effectiveness of unsupervised feature learning by the learned energy-based model. Furthermore, the proposed training method can easily be adapted to semi-supervised learning, where we achieve results competitive with state-of-the-art semi-supervised learning methods.
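Point (1) can be made concrete with the noise contrastive estimation value function, written here in terms of log-densities. The energy-based model p is trained to increase this value and, in the joint scheme, the flow q to decrease it. This is a minimal sketch. It assumes equal numbers of data and flow samples and an energy-based model whose normalising constant is absorbed into its log-density. The model definitions and training loop are omitted, and the function names are hypothetical.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def nce_value(logp_data, logq_data, logp_noise, logq_noise):
    """Empirical NCE value with the flow q as the noise distribution.
    logp_*: energy-based model log-densities; logq_*: flow log-densities,
    evaluated on data samples (*_data) and on flow samples (*_noise)."""
    # log p / (p + q) = log_sigmoid(log p - log q) on data samples.
    term_data = sum(log_sigmoid(lp - lq) for lp, lq in zip(logp_data, logq_data)) / len(logp_data)
    # log q / (p + q) = log_sigmoid(log q - log p) on flow samples.
    term_noise = sum(log_sigmoid(lq - lp) for lp, lq in zip(logp_noise, logq_noise)) / len(logp_noise)
    return term_data + term_noise

# When p and q agree everywhere, the value is 2 * log(1/2).
print(nce_value([0.0], [0.0], [1.0], [1.0]))
```

An energy-based model that assigns relatively more density to data samples and less to flow samples raises the value above 2 log(1/2). A flow that matches the data more closely pushes the value back down.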

artificial intelligence, arxiv preprint arxiv, machine learning

1912.00589

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sheng, Tianhong, Sriperumbudur, Bharath K.

On Distance and Kernel Measures of Conditional Independence

arXiv.org Machine Learning, Dec-2-2019

Measuring conditional independence is an important task in statistical inference and is fundamental to causal discovery, feature selection, dimensionality reduction, Bayesian network learning, and other problems. In this work, we explore the connection between conditional independence measures induced by distances on a metric space and those induced by reproducing kernels associated with a reproducing kernel Hilbert space (RKHS). For certain distance and kernel pairs, we show the distance-based conditional independence measures to be equivalent to the kernel-based measures. On the other hand, we also show that some kernel conditional independence measures that are popular in machine learning and are based on the Hilbert-Schmidt norm of a certain cross-conditional covariance operator do not have a simple distance representation, except in some limiting cases. This paper therefore shows that the distance and kernel measures of conditional independence are not quite equivalent, unlike in the case of joint independence as shown by Sejdinovic et al. (2013).
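The joint-independence equivalence that the abstract contrasts with can be checked numerically. For scalar samples, the V-statistic estimate of squared distance covariance (with Euclidean distance) equals four times the biased HSIC estimate computed with the distance-induced kernel k(z, z') = (|z - z0| + |z' - z0| - |z - z'|) / 2, for any base point z0. This sketch illustrates only that joint case. It does not implement the conditional measures studied in the paper, the normalisations are the ones chosen here, and the sample values are made up.

```python
def double_center(m):
    """Return H M H: subtract row and column means, add back the grand mean."""
    n = len(m)
    row = [sum(r) / n for r in m]
    col = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[m[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def dcov2(x, y):
    """V-statistic estimate of squared distance covariance for scalar samples."""
    n = len(x)
    a = double_center([[abs(xi - xj) for xj in x] for xi in x])
    b = double_center([[abs(yi - yj) for yj in y] for yi in y])
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2

def distance_kernel_gram(v, z0):
    """Gram matrix of the distance-induced kernel with base point z0."""
    return [[0.5 * (abs(p - z0) + abs(q - z0) - abs(p - q)) for q in v] for p in v]

def hsic_biased(x, y, z0x=0.0, z0y=0.0):
    """Biased HSIC estimate tr(K H L H) / n^2 with distance-induced kernels."""
    n = len(x)
    hkh = double_center(distance_kernel_gram(x, z0x))
    hlh = double_center(distance_kernel_gram(y, z0y))
    # tr(K H L H) = tr((H K H)(H L H)) because H is idempotent.
    return sum(hkh[i][j] * hlh[j][i] for i in range(n) for j in range(n)) / n**2

x = [0.0, 1.0, 3.0, 4.5]
y = [1.0, 0.5, 2.0, 5.0]
print(dcov2(x, y), 4 * hsic_biased(x, y))  # equal up to rounding
```

The identity holds because centring removes the base-point terms of the kernel, leaving H K H = -(1/2) H D H, where D is the distance matrix.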

artificial intelligence, independence, machine learning

1912.01103

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Fitzsimons, Jack K, Schmon, Sebastian M, Roberts, Stephen J

Implicit Priors for Knowledge Sharing in Bayesian Neural Networks

arXiv.org Machine Learning, Dec-2-2019

Bayesian interpretations of neural networks have a long history, dating back to early work in the 1990s, and have recently regained attention because of desirable properties such as uncertainty estimation, model robustness and regularisation. Here we discuss the application of Bayesian models to knowledge sharing between neural networks. Knowledge sharing has different facets, such as transfer learning, model distillation and shared embeddings. All of these tasks have in common that learned "features" ought to be shared across different networks. Theoretically rooted in the concepts of Bayesian neural networks, this work has widespread application to general deep learning.

convolutional layer, gaussian process, neural network

1912.00874

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Kitson, Neville Kenneth, Constantinou, Anthony C.

Learning Bayesian networks from demographic and health survey data

arXiv.org Artificial Intelligence, Dec-2-2019

Child mortality from preventable diseases such as pneumonia and diarrhoea in low- and middle-income countries remains a serious global challenge. We combine knowledge with available Demographic and Health Survey (DHS) data from India to construct Bayesian Networks (BNs) and investigate the factors associated with childhood diarrhoea. We use freeware tools to learn the graphical structure of the DHS data with score-based, constraint-based, and hybrid structure learning algorithms. We investigate the effect of missing values, sample size, and knowledge-based constraints on each of the structure learning algorithms and assess their accuracy with multiple scoring functions. Weaknesses in the survey methodology and the available data, as well as the variability in the BNs generated, mean that it is not possible to learn a definitive causal BN from data. However, knowledge-based constraints are found to be useful in reducing the variation in the graphs produced by the different algorithms, and they produce graphs that better reflect the likely influential relationships in the data. Furthermore, valuable insights are gained into the performance and characteristics of the structure learning algorithms. Two score-based algorithms in particular, TABU and FGES, demonstrate many desirable qualities: (a) with sufficient data, they produce a graph that is similar to the reference graph; (b) they are relatively insensitive to missing values; and (c) they behave well with knowledge-based constraints. The results provide a basis for further investigation of the DHS data and for a deeper understanding of the behaviour of structure learning algorithms when applied to real-world settings.
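Score-based structure learners such as TABU search explore candidate graphs and keep the changes that improve a decomposable score, so a graph's score is a sum of per-node terms. The sketch below computes one such term, the BIC score of a discrete node given a candidate parent set, assuming complete data (no missing values). It illustrates the general approach rather than the exact scoring function used by the tools in this study. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def bic_local_score(data, child, parents, arity):
    """BIC score of one discrete node given a candidate parent set.
    data: list of dicts mapping variable name -> observed state.
    arity: dict mapping variable name -> number of states."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    # Maximised log-likelihood of the child's conditional probability table.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = math.prod(arity[p] for p in parents)  # number of parent configurations
    k = q * (arity[child] - 1)                # free parameters in the table
    return loglik - 0.5 * k * math.log(n)

# Toy data in which Y always copies X.
data = [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}] * 10
arity = {"X": 2, "Y": 2}
print(bic_local_score(data, "Y", ["X"], arity))  # about -3.0
print(bic_local_score(data, "Y", [], arity))     # about -15.4
```

A search step that adds the edge X -> Y would be accepted here because it raises the score. A knowledge-based constraint, such as forbidding a particular parent, can be enforced by never proposing that parent set to the scorer.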

algorithm, constraint, graph

1912.00715

Country:

Asia > India (0.24)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Boyce, W. Paul, Lindsay, Tony, Zgonnikov, Arkady, Rano, Ignacio, Wong-Lin, KongFatt

Optimality and limitations of audio-visual integration for cognitive systems

arXiv.org Artificial Intelligence, Dec-2-2019

Multimodal integration is an important process in perceptual decision-making. In humans, this process has often been shown to be statistically optimal, or near optimal: sensory information is combined in a fashion that minimises the average error in the perceptual representation of stimuli. However, this optimisation sometimes comes at a cost, manifesting as illusory percepts. We review audio-visual facilitations and illusions that are products of multisensory integration, and the computational models that account for these phenomena. In particular, the same optimal computational model can lead to illusory percepts, and we suggest that more studies are needed to detect and mitigate these illusions as artefacts in artificial cognitive systems. We provide cautionary considerations for designing artificial cognitive systems that avoid such artefacts. Finally, we suggest avenues of research towards solutions to potential pitfalls in system design. We conclude that a detailed understanding of multisensory integration and of the mechanisms behind audio-visual illusions can benefit the design of artificial cognitive systems.
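The statistically optimal integration referred to above is commonly modelled as maximum-likelihood fusion of independent Gaussian cues, in which each cue is weighted by its reliability (inverse variance). Below is a minimal sketch of that standard model. The names and cue values are hypothetical, and the paper's own models may differ.

```python
def combine_cues(estimates, variances):
    """Maximum-likelihood fusion of independent Gaussian cues.
    Each cue is weighted by its precision (1 / variance).
    Returns (fused estimate, fused variance)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    fused = sum(p * e for p, e in zip(precisions, estimates)) / total
    return fused, 1.0 / total

# A reliable visual cue at 0 degrees and a noisier auditory cue at 10 degrees.
location, variance = combine_cues([0.0, 10.0], [1.0, 4.0])
print(location, variance)  # 2.0 0.8
```

The fused variance (0.8) is smaller than either cue's variance, which is the sense in which the combination is optimal. The same rule also pulls the perceived sound location towards the visual cue even when the two actually come from different sources. This is one way an optimal model can produce an illusory percept such as the ventriloquist effect.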

integration, perception, stimuli

1912.00581

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > Northern Ireland > County Londonderry > Londonderry (0.04)
Asia > Japan (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

arXiv.org Machine Learning, Dec-1-2019

Factor Analysis on Citation, Using a Combined Latent and Logistic Regression Model

Suh, Namjoon, Huo, Xiaoming, Heim, Eric, Seversky, Lee

We propose a combined model for the citation network that integrates the latent factor model and the logistic regression model. We note that neither a latent factor model nor a logistic regression model alone is sufficient to capture the structure of the data. The proposed model has a latent (i.e., factor analysis) component to represent the main technological trends (a.k.a. factors), and adds a sparse component that captures the remaining ad hoc dependence. Parameter estimation is carried out by constructing a joint likelihood function of the edges with properly chosen penalty terms. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push towards a low-dimensional latent component and a sparse graphical structure. Simulation results show that the proposed method works well in practical situations. The proposed method has also been applied to a real dataset, a citation network of statisticians (Ji and Jin, 2016), and some interesting findings are reported.
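One plausible reading of the penalised objective is sketched below, under assumptions that the abstract does not state. The sketch assumes an undirected 0/1 adjacency matrix and a logistic link whose logit is the sum of a low-rank latent term L and a sparse term S. It also assumes a nuclear-norm penalty on L and an l1 penalty on S. The names and penalty weights are hypothetical. For evaluation only, L is supplied in factored form L = U U^T, so it is positive semidefinite and its nuclear norm equals its trace. The convex formulation that the abstract refers to would instead optimise over L directly.

```python
import math

def penalized_nll(adj, u, s, lam_l, lam_s):
    """Penalised negative log-likelihood for a symmetric 0/1 adjacency matrix
    under the assumed model logit P(adj[i][j] = 1) = L[i][j] + s[i][j],
    with L = u u^T (so nuclear norm of L = trace(L) = sum of squares of u)."""
    n = len(adj)
    nll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            theta = sum(a * b for a, b in zip(u[i], u[j])) + s[i][j]
            # Bernoulli-logit loss log(1 + e^theta) - adj * theta, computed stably.
            nll += max(theta, 0.0) + math.log1p(math.exp(-abs(theta))) - adj[i][j] * theta
    nuclear = sum(x * x for row in u for x in row)
    l1 = sum(abs(s[i][j]) for i in range(n) for j in range(i + 1, n))
    return nll + lam_l * nuclear + lam_s * l1

adj = [[0, 1], [1, 0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(penalized_nll(adj, [[0.0], [0.0]], zeros, lam_l=0.0, lam_s=0.0))  # log(2)
print(penalized_nll(adj, [[1.0], [1.0]], zeros, lam_l=0.0, lam_s=0.0))  # smaller: the edge is better explained
```

Raising lam_l or lam_s trades fit for a lower-rank latent component or a sparser residual graph, mirroring the role the abstract gives the penalty terms.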

algorithm, denote, matrix

1912.00524

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)