AITopics

Regularization and normalization have become indispensable components in training deep neural networks, resulting in faster training and improved generalization performance. We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution. PER randomly projects activations onto one-dimensional space and computes the regularization loss in the projected space. PER is similar to the Pseudo-Huber loss in the projected space, thus taking advantage of both $L^1$ and $L^2$ regularization losses. Besides, PER can capture the interaction between hidden units by projection vector drawn from a unit sphere. By doing so, PER minimizes the upper bound of the Wasserstein distance of order one between an empirical distribution of activations and the standard normal distribution. To the best of the authors' knowledge, this is the first work to regularize activations via distribution matching in the probability distribution space. We evaluate the proposed method on the image classification task and the word-level language modeling task.

activation, international conference, normalization, (13 more...)

2002.05366

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Horváth, Samuel, Lei, Lihua, Richtárik, Peter, Jordan, Michael I.

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and the current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the "geometrization" technique introduced by Lei & Jordan 2016 and the \texttt{SARAH} algorithm of Nguyen et al., 2017, we propose the Geometrized \texttt{SARAH} algorithm for non-convex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak-\L{}ojasiewicz (PL) constant if present. In addition, it achieves the best-available convergence rate for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.

algorithm, neural information processing system, optimization, (13 more...)

2002.05359

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > California > Santa Clara County > Palo Alto (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

Geom-GCN: Geometric Graph Convolutional Networks

Pei, Hongbin, Wei, Bingzhe, Chang, Kevin Chen-Chuan, Lei, Yu, Yang, Bo

Message-passing neural networks (MPNNs) have been successfully applied to representation learning on graphs in a variety of real-world applications. However, two fundamental weaknesses of MPNNs' aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative graphs. Few studies have noticed the weaknesses from different perspectives. From the observations on classical neural network and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. The behind basic idea is the aggregation on a graph can benefit from a continuous space underlying the graph. The proposed aggregation scheme is permutation-invariant and consists of three modules, node embedding, structural neighborhood, and bi-level aggregation. We also present an implementation of the scheme in graph convolutional networks, termed Geom-GCN (Geometric Graph Convolutional Networks), to perform transductive learning on graphs. Experimental results show the proposed Geom-GCN achieved state-of-the-art performance on a wide range of open datasets of graphs. Code is available at https://github.com/graphdml-uiuc-jlu/geom-gcn.

graph, neighborhood, node, (16 more...)

2002.05287

Country:

North America > United States > Illinois (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Jilin Province (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning Stochastic Behaviour of Aggregate Data

Ma, Shaojun, Liu, Shu, Zha, Hongyuan, Zhou, Haomin

Learning nonlinear dynamics of aggregate data is a challenging problem since the full trajectory of each individual is not observable, namely, the individual observed at one time point may not be observed at next time point. One class of existing work investigate such dynamics by requiring complete longitudinal individual-level trajectories. However, in most of the practical applications, the requirement is unrealistic due to technical limitations, experimental costs and/or privacy issues. The other one class of methods learn the dynamics by regarding aggregate behaviour as a stochastic process with/without hidden variable. The performances of such methods may be restricted due to complex dynamics, high dimensions and computation costs. In this paper, we propose a new weak form based framework to study the hidden dynamics of aggregate data via Wasserstein generative adversarial network(WGAN) and Fokker Planck Equation(FPE). Our model fall into the second class of methods with simple structure and computation. We demonstrate our approach in the context of a series of synthetic and real-world datasets.

learning stochastic behaviour, time point, wasserstein distance, (12 more...)

2002.03513

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.54)
Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Hatamian, Faezeh Nejati, Ravikumar, Nishant, Vesal, Sulaiman, Kemeth, Felix P., Struck, Matthias, Maier, Andreas

The Effect of Data Augmentation on Classification of Atrial Fibrillation in Short Single-Lead ECG Signals Using Deep Neural Networks

Cardiovascular diseases are the most common cause of mortality worldwide. Detection of atrial fibrillation (AF) in the asymptomatic stage can help prevent strokes. It also improves clinical decision making through the delivery of suitable treatment such as, anticoagulant therapy, in a timely manner. The clinical significance of such early detection of AF in electrocardiogram (ECG) signals has inspired numerous studies in recent years, of which many aim to solve this task by leveraging machine learning algorithms. ECG datasets containing AF samples, however, usually suffer from severe class imbalance, which if unaccounted for, affects the performance of classification algorithms. Data augmentation is a popular solution to tackle this problem. In this study, we investigate the impact of various data augmentation algorithms, e.g., oversampling, Gaussian Mixture Models (GMMs) and Generative Adversarial Networks (GANs), on solving the class imbalance problem. These algorithms are quantitatively and qualitatively evaluated, compared and discussed in detail. The results show that deep learning-based AF signal classification methods benefit more from data augmentation using GANs and GMMs, than oversampling. Furthermore, the GAN results in circa $3\%$ better AF classification accuracy in average while performing comparably to the GMM in terms of f1-score.

algorithm, augmentation, ecg signal, (16 more...)

2002.0287

Country:

Europe > Germany (0.04)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)

Genre: Research Report > New Finding (0.55)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Over-the-Air Adversarial Attacks on Deep Learning Based Modulation Classifier over Wireless Channels

Kim, Brian, Sagduyu, Yalin E., Davaslioglu, Kemal, Erpek, Tugba, Ulukus, Sennur

We consider a wireless communication system that consists of a transmitter, a receiver, and an adversary. The transmitter transmits signals with different modulation types, while the receiver classifies its received signals to modulation types using a deep learning-based classifier. In the meantime, the adversary makes over-the-air transmissions that are received as superimposed with the transmitter's signals to fool the classifier at the receiver into making errors. While this evasion attack has received growing interest recently, the channel effects from the adversary to the receiver have been ignored so far such that the previous attack mechanisms cannot be applied under realistic channel effects. In this paper, we present how to launch a realistic evasion attack by considering channels from the adversary to the receiver. Our results show that modulation classification is vulnerable to an adversarial attack over a wireless channel that is modeled as Rayleigh fading with path loss and shadowing. We present various adversarial attacks with respect to availability of information about channel, transmitter input, and classifier architecture. First, we present two types of adversarial attacks, namely a targeted attack (with minimum power) and non-targeted attack that aims to change the classification to a target label or to any other label other than the true label, respectively. Both are white-box attacks that are transmitter input-specific and use channel information. Then we introduce an algorithm to generate adversarial attacks using limited channel information where the adversary only knows the channel distribution. Finally, we present a black-box universal adversarial perturbation (UAP) attack where the adversary has limited knowledge about both channel and transmitter input.

adversarial attack, adversary, receiver, (15 more...)

2002.024

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > Maryland > Montgomery County > Rockville (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Simulation Pipeline for Traffic Evacuation in Urban Areas and Emergency Traffic Management Policy Improvements

Chen, Yu, Shafi, S. Yusef, Chen, Yi-fan

Traffic evacuation plays a critical role in saving lives in devastating disasters such as hurricanes, wildfires, floods, earthquakes, etc. An ability to evaluate evacuation plans in advance for these rare events, including identifying traffic flow bottlenecks, improving traffic management policies, and understanding the robustness of the traffic management policy are critical for emergency management. Given the rareness of such events and the corresponding lack of real data, traffic simulation provides a flexible and versatile approach for such scenarios, and furthermore allows dynamic interaction with the simulated evacuation. In this paper, we build a traffic simulation pipeline to explore the above problems, covering many aspects of evacuation, including map creation, demand generation, vehicle behavior, bottleneck identification, traffic management policy improvement, and results analysis. We apply the pipeline to two case studies in California. The first is Paradise, which was destroyed by a large wildfire in 2018 and experienced catastrophic traffic jams during the evacuation. The second is Mill Valley, which has high risk of wildfire and potential traffic issues since the city is situated in a narrow valley.

evacuation, traffic management policy, vehicle, (15 more...)

2002.06198

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Butte County > Paradise (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Clark, Peter, Tafjord, Oyvind, Richardson, Kyle

Transformers as Soft Reasoners over Language

AI has long pursued the goal of having systems reason over *explicitly provided* knowledge, but building suitable representations has proved challenging. Here we explore whether transformers can similarly learn to reason (or emulate reasoning), but using rules expressed in language, thus bypassing a formal representation. We provide the first demonstration that this is possible, and characterize the extent of this capability. To do this, we use a collection of synthetic datasets that test increasing levels of reasoning complexity (number of rules, presence of negation, and depth of chaining). We find transformers appear to learn rule-based reasoning with high (99%) accuracy on these datasets, and in a way that generalizes to test data requiring substantially deeper chaining than in the training data (95%+ scores). We also demonstrate that the models transfer well to two hand-authored rulebases, and to rulebases paraphrased into more natural language. These findings are significant as it suggests a new role for transformers, namely as a limited "soft theorem prover" operating over explicit theories in language. This in turn suggests new possibilities for explainability, correctability, and counterfactual reasoning in question-answering. All datasets and a live demo are available at http://rule-reasoning.apps.allenai.org/

dataset, reasoning, transformer, (16 more...)

2002.05867

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Pan, Yangchen, Mei, Jincheng, Farahmand, Amir-massoud

Frequency-based Search-control in Dyna

Model-based reinforcement learning has been empirically demonstrated as a successful strategy to improve sample efficiency. In particular, Dyna is an elegant model-based architecture integrating learning and planning that provides huge flexibility of using a model. One of the most important components in Dyna is called search-control, which refers to the process of generating state or state-action pairs from which we query the model to acquire simulated experiences. Search-control is critical in improving learning efficiency. In this work, we propose a simple and novel search-control strategy by searching high frequency regions of the value function. Our main intuition is built on Shannon sampling theorem from signal processing, which indicates that a high frequency signal requires more samples to reconstruct. We empirically show that a high frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high frequency regions of the value function to query the model to acquire more samples. We develop a simple strategy to locally measure the frequency of a function by gradient and hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments to show its property and effectiveness on benchmark domains.

conference paper, international conference, value function, (14 more...)

2002.05822

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > Alberta (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Disentangling Overlapping Beliefs by Structured Matrix Factorization

Yang, Chaoqi, Li, Jinyang, Wang, Ruijie, Yao, Shuochao, Shao, Huajie, Liu, Dongxin, Liu, Shengzhong, Wang, Tianshi, Abdelzaher, Tarek F.

Much work on social media opinion polarization focuses on identifying separate or orthogonal beliefs from media traces, thereby missing points of agreement among different communities. This paper develops a new class of Non-negative Matrix Factorization (NMF) algorithms that allow identification of both agreement and disagreement points when beliefs of different communities partially overlap. Specifically, we propose a novel Belief Structured Matrix Factorization algorithm (BSMF) to identify partially overlapping beliefs in polarized public social media. BSMF is totally unsupervised and considers three types of information: (i) who posted which opinion, (ii) keyword-level message similarity, and (iii) empirically observed social dependency graphs (e.g., retweet graphs), to improve belief separation. In the space of unsupervised belief separation algorithms, the emphasis was mostly given to the problem of identifying disjoint (e.g., conflicting) beliefs. The case when individuals with different beliefs agree on some subset of points was less explored. We observe that social beliefs overlap even in polarized scenarios. Our proposed unsupervised algorithm captures both the latent belief intersections and dissimilarities. We discuss the properties of the algorithm and conduct extensive experiments on both synthetic data and real-world datasets. The results show that our model outperforms all compared baselines by a great margin.

algorithm, factorization, matrix, (16 more...)

2002.05797

Country:

Europe > Russia (0.14)
Asia > Russia (0.14)
Oceania > Australia (0.04)
(10 more...)

Genre:

Research Report (0.70)
Overview (0.46)

Industry:

Energy (1.00)
Government > Regional Government > Europe Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)