AITopics

The trace norm is widely used in multi-task learning as it can discover low-rank structures among tasks in terms of model parameters. Nowadays, with the emerging of big datasets and the popularity of deep learning techniques, tensor trace norms have been used for deep multi-task models. However, existing tensor trace norms cannot discover all the low-rank structures and they require users to manually determine the importance of their components. To solve those two issues together, in this paper, we propose a Generalized Tensor Trace Norm (GTTN). The GTTN is defined as a convex combination of matrix trace norms of all possible tensor flattenings and hence it can discover all the possible low-rank structures. In the induced objective function, we will learn combination coefficients in the GTTN to automatically determine the importance. Experiments on real-world datasets demonstrate the effectiveness of the proposed GTTN.

tensor, tensor trace norm, trace norm, (10 more...)

2002.04799

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > Scotland (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Carbone, Ginevra, Wicker, Matthew, Laurenti, Luca, Patane, Andrea, Bortolussi, Luca, Sanguinetti, Guido

Robustness of Bayesian Neural Networks to Gradient-Based Attacks

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, the problem remains open. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparametrized limit for Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lies on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in the limit BNN posteriors are robust to gradient-based adversarial attacks. Experimental results on the MNIST and Fashion MNIST datasets with BNNs trained with Hamiltonian Monte Carlo and Variational Inference support this line of argument, showing that BNNs can display both high accuracy and robustness to gradient based adversarial attacks.

arxiv preprint arxiv, gradient, robustness, (13 more...)

2002.04359

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Li, Zeqian, Mozer, Michael C., Whitehill, Jacob

Compositional Embeddings for Multi-Label One-Shot Learning

We explore the idea of compositional set embeddings that can be used to infer not just a single class per input (e.g., image, video, audio signal), but a collection of classes, in the setting of one-shot learning. Class compositionality is useful in tasks such as multi-object detection in images and multi-speaker diarization in audio. Specifically, we devise and implement two novel models consisting of (1) an embedding function f trained jointly with a "composite" function g that computes set union operations between the classes encoded in two embedding vectors; and (2) embedding f trained jointly with a "query" function h that computes whether the classes encoded in one embedding subsume the classes encoded in another embedding. In contrast to previously developed methods, these models must both determine the classes associated with the input examples and encode the relationships between different class label sets. In experiments conducted on simulated data, OmniGlot, LibriSpeech and Open Images datasets, the proposed composite embedding models outperform baselines based on traditional embedding methods.

accuracy, baseline, experiment, (16 more...)

2002.04193

Country: North America > United States > Massachusetts > Worcester County > Worcester (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Janizek, Joseph D., Sturmfels, Pascal, Lee, Su-In

Explaining Explanations: Axiomatic Feature Interactions for Deep Networks

Recent work has shown great promise in explaining neural network behavior. In particular, feature attribution methods explain which features were most important to a model's prediction on a given input. However, for many tasks, simply knowing which features were important to a model's prediction may not provide enough insight to understand model behavior. The interactions between features within the model may better help us understand not only the model, but also why certain features are more important than others. In this work we present Integrated Hessians, an extension of Integrated Gradients that explains pairwise feature interactions in neural networks. Integrated Hessians overcomes several theoretical limitations of previous methods to explain interactions, and unlike such previous methods is not limited to a specific architecture or class of neural network. We apply Integrated Hessians on a variety of neural networks trained on language data, biological data, astronomy data, and medical data and gain new insight into model behavior in each domain. Code available at https://github.com/suinleelab/path_explain

interaction, movie, neural network, (13 more...)

2002.04138

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo

Lee, Yin Tat, Shen, Ruoqi, Tian, Kevin

We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean. This removes a barrier in the prior state-of-the-art analysis for the well-studied Metropolized Hamiltonian Monte Carlo (HMC) algorithm for sampling from a strongly logconcave distribution. We correspondingly demonstrate that Metropolized HMC mixes in $\tilde{O}(\kappa d)$ iterations, improving upon the $\tilde{O}(\kappa^{1.5}\sqrt{d} + \kappa d)$ runtime of (Dwivedi et. al. '18, Chen et. al. '19) by a factor $(\kappa/d)^{1/2}$ when the condition number $\kappa$ is large. Our mixing time analysis introduces several techniques which to our knowledge have not appeared in the literature and may be of independent interest, including restrictions to a nonconvex set with good conductance behavior, and a new reduction technique for boosting a constant-accuracy total variation guarantee under weak warmness assumptions. This is the first mixing time result for logconcave distributions using only first-order function information which achieves linear dependence on $\kappa$; we also give evidence that this dependence is likely to be necessary for standard Metropolized first-order methods.

algorithm, exp, inequality, (15 more...)

2002.04121

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Multimodal active speaker detection and virtual cinematography for video conferencing

Cutler, Ross, Mehran, Ramin, Johnson, Sam, Zhang, Cha, Kirk, Adam, Whyte, Oliver, Kowdle, Adarsh

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting and zooming of a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video. We describe a new automated ASD and VC that performs within 0.3 MOS of an expert cinematographer based on subjective ratings with a 1-5 scale. This system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD using an AdaBoost machine learning system that is very efficient and runs in real-time. A VC is similarly trained using machine learning to optimize the subjective quality of the overall experience. To avoid distracting the room participants and reduce switching latency the system has no moving parts -- the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned and evaluated using extensive crowdsourcing techniques and evaluated on a dataset with N=100 meetings, each 2-5 minutes in length.

asd and vc, speaker detection, video, (10 more...)

2002.03977

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Washington > King County > Seattle (0.05)
North America > United States > Washington > King County > Redmond (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Communications > Collaboration (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Communications > Social Media > Crowdsourcing (0.35)

AnomalyDAE: Dual autoencoder for anomaly detection on attributed networks

Fan, Haoyi, Zhang, Fengbin, Li, Zuoyong

Anomaly detection on attributed networks aims at finding nodes whose patterns deviate significantly from the majority of reference nodes, which is pervasive in many applications such as network intrusion detection and social spammer detection. However, most existing methods neglect the complex cross-modality interactions between network structure and node attribute. In this paper, we propose a deep joint representation learning framework for anomaly detection through a dual autoencoder (AnomalyDAE), which captures the complex interactions between network structure and node attribute for high-quality embeddings. Specifically, AnomalyDAE consists of a structure autoencoder and an attribute autoencoder to learn both node embedding and attribute embedding jointly in latent space. Moreover, attention mechanism is employed in structure encoder to learn the importance between a node and its neighbors for an effective capturing of structure pattern, which is important to anomaly detection. Besides, by taking both the node embedding and attribute embedding as inputs of attribute decoder, the cross-modality interactions between network structure and node attribute are learned during the reconstruction of node attribute. Finally, anomalies can be detected by measuring the reconstruction errors of nodes from both the structure and attribute perspectives. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.

anomaly detection, network structure, node, (13 more...)

2002.03665

Country:

North America > United States (0.14)
Asia > China > Heilongjiang Province > Harbin (0.05)
Asia > China > Fujian Province > Fuzhou (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.51)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Liu, Zhaoqiang, Gomes, Selwyn, Tiwari, Avtansh, Scarlett, Jonathan

Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors

The goal of standard 1-bit compressive sensing is to accurately recover an unknown sparse vector from binary-valued measurements, each indicating the sign of a linear function of the vector. Motivated by recent advances in compressive sensing with generative models, where a generative modeling assumption replaces the usual sparsity assumption, we study the problem of 1-bit compressive sensing with generative models. We first consider noiseless 1-bit measurements, and provide sample complexity bounds for approximate recovery under i.i.d.~Gaussian measurements and a Lipschitz continuous generative prior, as well as a near-matching algorithm-independent lower bound. Moreover, we demonstrate that the Binary $\epsilon$-Stable Embedding property, which characterizes the robustness of the reconstruction to measurement errors and noise, also holds for 1-bit compressive sensing with Lipschitz continuous generative models with sufficiently many Gaussian measurements. In addition, we apply our results to neural network generative models, and provide a proof-of-concept numerical experiment demonstrating significant improvements over sparsity-based approaches.

generative model, probability, vector, (14 more...)

2002.01697

Country:

Asia > Singapore (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Structural Deep Clustering Network

Bo, Deyu, Wang, Xiao, Shi, Chuan, Zhu, Meiqi, Lu, Emiao, Cui, Peng

Clustering is a fundamental task in data analysis. Recently, deep clustering, which derives inspiration primarily from deep learning approaches, achieves state-of-the-art performance and has attracted considerable attention. Current deep clustering methods usually boost the clustering results by means of the powerful representation ability of deep learning, e.g., autoencoder, suggesting that learning an effective representation for clustering is a crucial requirement. The strength of deep clustering methods is to extract the useful representations from the data itself, rather than the structure of data, which receives scarce attention in representation learning. Motivated by the great success of Graph Convolutional Network (GCN) in encoding the graph structure, we propose a Structural Deep Clustering Network (SDCN) to integrate the structural information into deep clustering. Specifically, we design a delivery operator to transfer the representations learned by autoencoder to the corresponding GCN layer, and a dual self-supervised mechanism to unify these two different deep neural architectures and guide the update of the whole model. In this way, the multiple structures of data, from low-order to high-order, are naturally combined with the multiple representations learned by autoencoder. Furthermore, we theoretically analyze the delivery operator, i.e., with the delivery operator, GCN improves the autoencoder-specific representation as a high-order graph regularization constraint and autoencoder helps alleviate the over-smoothing problem in GCN. Through comprehensive experiments, we demonstrate that our propose model can consistently perform better over the state-of-the-art techniques.

autoencoder, information, representation, (15 more...)

doi: 10.1145/3366423.3380214

2002.01633

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.05)
Asia > China > Beijing > Beijing (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Maia, José E. B., Figueredo, Levi P.

Cooperative Observation of Targets moving over a Planar Graph with Prediction of Positions

arXiv.org Artificial IntelligenceFeb-12-2020

Consider a team with two types of agents: targets and observers. Observers are aerial UAVs that observe targets moving on land with their movements restricted to the paths that form a planar graph on the surface. Observers have limited range of vision and targets do not avoid observers. The objective is to maximize the integral of the number of targets observed in the observation interval. Taking advantage of the fact that the future positions of targets in the short term are predictable, we show in this article a modified hill climbing algorithm that surpasses its previous versions in this new setting of the CTO problem.

algorithm, graph, observer, (16 more...)

arXiv.org Artificial Intelligence

2002.05294

Country:

South America > Brazil > Ceará > Fortaleza (0.04)
North America > United States > Hawaii (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Puerto Rico > Cataño > Cataño (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)