Banff
Deep State Space Models for Nonlinear System Identification
Gedon, Daniel, Wahlström, Niklas, Schön, Thomas B., Ljung, Lennart
Deep state space models (SSMs) are an actively evolving class of generative temporal models developed in the deep learning community, and they have a close connection to classic SSMs. In this work, six new deep SSMs are implemented and evaluated for the identification of established nonlinear dynamic system benchmarks. The models and their parameter learning algorithms are elaborated rigorously. Used as black-box identification models, deep SSMs can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the uncertainty of the system is modelled, so one obtains a much richer representation: a whole class of systems to describe the underlying dynamics.
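The generative structure behind an SSM, a latent state that evolves through a nonlinear transition and is observed through a noisy measurement, can be sketched as below. This is a minimal illustration under stated assumptions, not one of the six models from the paper: in a deep SSM, `f`, `g` and the noise distributions are parameterised by neural networks and learned, whereas here `f_net` and `g_net` are hypothetical hand-picked functions and the noise is additive Gaussian with fixed scales `q` and `r`.

```python
import math
import random

def simulate_ssm(u, f, g, z0=0.0, q=0.1, r=0.1, seed=0):
    """Sample outputs y_0..y_{T-1} of a nonlinear state space model
        z_{t+1} = f(z_t, u_t) + w_t,  w_t ~ N(0, q^2)   (transition)
        y_t     = g(z_t) + v_t,       v_t ~ N(0, r^2)   (measurement)
    """
    rng = random.Random(seed)
    z, ys = z0, []
    for u_t in u:
        ys.append(g(z) + rng.gauss(0.0, r))  # noisy measurement of the current state
        z = f(z, u_t) + rng.gauss(0.0, q)    # stochastic state transition
    return ys

# Hypothetical fixed weights standing in for a learned transition network.
def f_net(z, u):
    return 0.8 * math.tanh(1.2 * z) + 0.3 * math.tanh(0.5 * z + u)

def g_net(z):
    return z  # identity emission, chosen for simplicity

# Different seeds give different plausible output trajectories for the same
# input sequence, i.e. a distribution over outputs rather than one prediction.
inputs = [1.0] * 20
samples = [simulate_ssm(inputs, f_net, g_net, seed=s) for s in range(5)]
```

Drawing several trajectories for one input sequence is what "modelling the uncertainty of the system" buys: the model describes a family of plausible outputs instead of a single deterministic response.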
A Thorough Comparison Study on Adversarial Attacks and Defenses for Common Thorax Disease Classification in Chest X-rays
Rao, Chendi, Cao, Jiezhang, Zeng, Runhao, Chen, Qi, Fu, Huazhu, Xu, Yanwu, Tan, Mingkui
Recently, deep neural networks (DNNs) have made great progress in automated diagnosis from chest X-ray images. However, DNNs are vulnerable to adversarial examples, which may cause misdiagnoses when DNN-based methods are applied to disease detection. Few comprehensive studies have explored the influence of attack and defense methods on disease detection, especially for the multi-label classification problem. In this paper, we review various adversarial attack and defense methods on chest X-rays. First, the motivations and the mathematical representations of attack and defense methods are introduced in detail. Second, we evaluate the influence of several state-of-the-art attack and defense methods for common thorax disease classification in chest X-rays. We found that the attack and defense methods perform poorly with excessive iterations and large perturbations. To address this, we propose a new defense method that is robust to different degrees of perturbations. This study could provide new insights into methodological development for the community.
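To make the multi-label attack setting concrete, the sketch below applies the fast gradient sign method (FGSM), a standard attack used here purely as an illustrative example, to a toy linear classifier with one sigmoid output per disease label and a summed binary cross-entropy loss. The linear model, the feature values and the clipping range [0, 1] are assumptions for illustration; a real chest X-ray model would be a deep CNN whose input gradient comes from backpropagation.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def bce_loss(x, y, W, b):
    """Summed binary cross-entropy over labels for a linear multi-label classifier."""
    loss = 0.0
    for w_k, b_k, y_k in zip(W, b, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w_k, x)) + b_k)
        loss -= y_k * math.log(p) + (1 - y_k) * math.log(1 - p)
    return loss

def fgsm_multilabel(x, y, W, b, eps):
    """One FGSM step: x_adv = clip(x + eps * sign(dL/dx), 0, 1)."""
    # For per-label sigmoid outputs with BCE, dL/dx = sum_k (p_k - y_k) * w_k.
    grad = [0.0] * len(x)
    for w_k, b_k, y_k in zip(W, b, y):
        p_k = sigmoid(sum(wi * xi for wi, xi in zip(w_k, x)) + b_k)
        for i, wi in enumerate(w_k):
            grad[i] += (p_k - y_k) * wi
    # Move every feature by eps in the loss-increasing direction, then clip.
    return [min(1.0, max(0.0, xi + eps * ((gi > 0) - (gi < 0))))
            for xi, gi in zip(x, grad)]

# Hypothetical two-pixel "image" with two labels (label 0 present, label 1 absent).
x = [0.5, 0.5]
y = [1, 0]
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -0.5]
x_adv = fgsm_multilabel(x, y, W, b, eps=0.1)
```

Here the perturbed input raises the summed loss even though each feature moved by only 0.1; larger `eps` or iterating the step (as in iterative attacks) corresponds to the "large perturbations" and "excessive iterations" regimes discussed above.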
On the Integration of Linguistic Features into Statistical and Neural Machine Translation
New machine translation (MT) technologies are emerging rapidly, and with them bold claims of achieving human parity have seen the light of day (Laubli et al., 2018), such as: (i) the results produced approach "accuracy achieved by average bilingual human translators" (Wu et al., 2017b) or (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all aspects involved in translation. Establishing the discrepancies between the strengths of statistical approaches to MT and the way humans translate has been the starting point of our research. By looking at MT output and linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, along with other studies (Bentivogli et al., 2016), that neural MT has surpassed statistical MT in many aspects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural MT, aiming to analyse them and provide solutions to some of them. Our work focuses on three main research questions that revolve around the complex relationship between linguistics and MT in general. We identify linguistic information that automatic translation systems lack but need in order to produce more accurate translations, and we integrate additional features into the existing pipelines. We identify overgeneralization, or 'algorithmic bias', as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
Stochastic Flows and Geometric Optimization on the Orthogonal Group
Choromanski, Krzysztof, Cheikhi, David, Davis, Jared, Likhosherstov, Valerii, Nazaret, Achille, Bahamou, Achraf, Song, Xingyou, Akarte, Mrugank, Parker-Holder, Jack, Bergquist, Jacob, Gao, Yuan, Pacchiano, Aldo, Sarlos, Tamas, Weller, Adrian, Sindhwani, Vikas
We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g. matching problem, partition functions over graphs, graph-coloring). We leverage the theory of Lie groups and provide theoretical results for the designed class of algorithms. We demonstrate broad applicability of our methods by showing strong performance on the seemingly unrelated tasks of learning world models to obtain stable policies for the most difficult $\mathrm{Humanoid}$ agent from $\mathrm{OpenAI}$ $\mathrm{Gym}$ and improving convolutional neural networks.
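A basic reason stochastic optimization on the orthogonal group can be cheap is that multiplying by a Givens rotation (a rotation in one coordinate plane) changes only two columns and keeps the iterate exactly orthogonal. The sketch below is a simple randomized descent along Givens directions, offered as an illustration of that idea under stated assumptions; it is not the authors' algorithm. The objective (squared Frobenius distance to a hypothetical `target` rotation), the step size and the finite-difference slope estimate are all illustrative choices.

```python
import math
import random

def givens_right(Q, i, j, theta):
    """Return Q @ G, where G is the Givens rotation by theta in the (i, j) plane.
    Only columns i and j change, so one update costs O(d) instead of O(d^3)."""
    c, s = math.cos(theta), math.sin(theta)
    out = [row[:] for row in Q]
    for row in out:
        qi, qj = row[i], row[j]
        row[i] = c * qi + s * qj
        row[j] = -s * qi + c * qj
    return out

def random_givens_step(Q, loss, lr, rng, h=1e-5):
    """Descend along one randomly chosen Givens direction. The iterate stays
    orthogonal because it is only ever multiplied by exact rotations."""
    i, j = rng.sample(range(len(Q)), 2)
    # Central finite difference of theta -> loss(Q @ G(i, j, theta)) at theta = 0.
    slope = (loss(givens_right(Q, i, j, h)) - loss(givens_right(Q, i, j, -h))) / (2 * h)
    return givens_right(Q, i, j, -lr * slope)

def orthogonality_error(Q):
    """max |(Q^T Q - I)_{ab}|, which is zero for an orthogonal matrix."""
    d = len(Q)
    return max(
        abs(sum(Q[k][a] * Q[k][b] for k in range(d)) - (1.0 if a == b else 0.0))
        for a in range(d) for b in range(d)
    )

d = 4
identity = [[1.0 if r == col else 0.0 for col in range(d)] for r in range(d)]
# Hypothetical target rotation; the objective is the squared Frobenius distance to it.
target = givens_right(givens_right(identity, 0, 1, 0.7), 2, 3, -0.4)

def frob_loss(Q):
    return sum((Q[r][col] - target[r][col]) ** 2 for r in range(d) for col in range(d))

rng = random.Random(0)
Q = identity
for _ in range(500):
    Q = random_givens_step(Q, frob_loss, lr=0.1, rng=rng)
```

Because every update is an exact rotation, no re-orthogonalization (QR or SVD) is needed; the only drift from orthogonality is floating-point round-off.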
A Hybrid Residual Dilated LSTM and Exponential Smoothing Model for Mid-Term Electric Load Forecasting
Dudek, Grzegorz, Pełka, Paweł, Smyl, Slawek
This work presents a hybrid and hierarchical deep learning model for mid-term load forecasting. The model combines exponential smoothing (ETS), advanced Long Short-Term Memory (LSTM) and ensembling. ETS dynamically extracts the main components of each individual time series and enables the model to learn their representation. The multi-layer LSTM is equipped with dilated recurrent skip connections and a spatial shortcut path from lower layers, which allow the model to better capture long-term seasonal relationships and ensure more efficient training. A common learning procedure for the LSTM and ETS, with a penalized pinball loss, leads to simultaneous optimization of data representation and forecasting performance. In addition, ensembling at three levels provides powerful regularization. A simulation study performed on the monthly electricity demand time series for 35 European countries confirmed the high performance of the proposed model and its competitiveness with classical models such as ARIMA and ETS, as well as with state-of-the-art models based on machine learning.
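The pinball (quantile) loss mentioned above penalizes under- and over-forecasts asymmetrically, with weights set by a quantile level tau. The sketch below shows only the plain pinball loss; the penalty term of the paper's penalized variant is not reproduced here, and the function name and example numbers are illustrative assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss for quantile level tau in (0, 1).
    Under-forecasts (y > q) cost tau per unit; over-forecasts cost (1 - tau) per unit."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

# With tau = 0.5 the loss is half the mean absolute error (median forecast).
median_loss = pinball_loss([100.0, 120.0], [110.0, 115.0], tau=0.5)
# With tau = 0.9 under-forecasting is penalized nine times more than over-forecasting.
upper_loss = pinball_loss([100.0, 120.0], [110.0, 115.0], tau=0.9)
```

Training with several values of tau gives forecasts of different quantiles, which is how this loss supports interval-style load forecasts rather than only point forecasts.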
Deep Learning on Knowledge Graph for Recommender System: A Survey
Gao, Yang, Li, Yi-Fan, Lin, Yu, Gao, Hang, Khan, Latifur
Recent advances in research have demonstrated the effectiveness of knowledge graphs (KG) in providing valuable external knowledge to improve recommender systems (RS). A knowledge graph is capable of encoding high-order relations that connect two objects with one or multiple related attributes. With the help of the emerging Graph Neural Networks (GNN), it is possible to extract both object characteristics and relations from the KG, which is essential for successful recommendations. In this paper, we provide a comprehensive survey of GNN-based knowledge-aware deep recommender systems. Specifically, we discuss the state-of-the-art frameworks with a focus on their core component, i.e., the graph embedding module, and how they address practical recommendation issues such as scalability and cold start. We further summarize the commonly used benchmark datasets, evaluation metrics and open-source code. Finally, we conclude the survey and propose potential research directions in this rapidly growing field.
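The graph embedding module at the core of these frameworks refines an item's representation with information from its neighbours in the KG. The sketch below shows the simplest version of that idea, one round of mean aggregation followed by an inner-product preference score. The entity names, embeddings and the use of an unweighted mean are assumptions for illustration; many surveyed models instead use learned, relation-aware (for example attention-weighted) aggregation.

```python
def mean_aggregate(emb, neighbours, node):
    """One round of mean aggregation: the node's new embedding is the average of
    its own embedding and those of its KG neighbours (relation types ignored)."""
    group = [node] + neighbours.get(node, [])
    dim = len(emb[node])
    return [sum(emb[n][k] for n in group) / len(group) for k in range(dim)]

def score(user_vec, item_vec):
    """Preference score as the inner product of user and item embeddings."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

# Hypothetical KG: a movie linked to its director and genre entities.
emb = {
    "movie_A": [1.0, 0.0],
    "director_X": [0.0, 1.0],
    "genre_scifi": [0.5, 0.5],
}
neighbours = {"movie_A": ["director_X", "genre_scifi"]}

item_vec = mean_aggregate(emb, neighbours, "movie_A")
user_vec = [1.0, 1.0]  # hypothetical user embedding
pref = score(user_vec, item_vec)
```

Stacking several such rounds lets an item embedding absorb information from entities several hops away, which is how high-order KG relations reach the recommendation score.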
Defense Through Diverse Directions
Bender, Christopher M., Li, Yang, Shi, Yifeng, Reiter, Michael K., Oliva, Junier B.
In this work we develop a novel Bayesian neural network methodology to achieve strong adversarial robustness without the need for online adversarial training. Unlike previous efforts in this direction, we do not rely solely on the stochasticity of network weights by minimizing the divergence between the learned parameter distribution and a prior. Instead, we additionally require that the model maintain some expected uncertainty with respect to all input covariates. We demonstrate that by encouraging the network to distribute evenly across inputs, the network becomes less susceptible to localized, brittle features which imparts a natural robustness to targeted perturbations. We show empirical robustness on several benchmark datasets.
Graph Neural Networks for Decentralized Controllers
Gama, Fernando, Tolstaya, Ekaterina, Ribeiro, Alejandro
Dynamical systems composed of autonomous agents arise in many relevant problems, such as multi-agent robotics, smart grids, or smart cities. Controlling these systems is of paramount importance to guarantee successful deployment. Optimal centralized controllers are readily available but face limitations in terms of scalability and practical implementation. Optimal decentralized controllers, on the other hand, are difficult to find. In this paper, we use graph neural networks (GNNs) to learn decentralized controllers from data. GNNs are well suited to the task since they are naturally distributed architectures. Furthermore, they are equivariant and stable, leading to good scalability and transferability properties. The problem of flocking is explored to illustrate the power of GNNs in learning decentralized controllers.
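Why GNNs are "naturally distributed" can be seen from a common building block in this line of work, the graph filter z = h_0 x + h_1 S x + ... + h_K S^K x, where S is a graph shift operator (for example, the adjacency matrix of the communication graph). Each multiplication by S only mixes values between one-hop neighbours, so computing S^k x takes k rounds of local exchanges. The sketch below assumes a scalar signal per agent and hand-picked filter taps `h`; the paper's controllers use multi-feature signals and learned taps.

```python
import math

def shift(S, x):
    """One graph shift S x: each node combines values from its one-hop
    neighbours only, so this step needs just a local exchange."""
    n = len(x)
    return [sum(S[i][j] * x[j] for j in range(n) if S[i][j] != 0.0) for i in range(n)]

def graph_filter(S, x, h):
    """z = sum_k h[k] * S^k x, computed with successive local exchanges."""
    z = [0.0] * len(x)
    sx = list(x)  # S^0 x
    for hk in h:
        z = [zi + hk * si for zi, si in zip(z, sx)]
        sx = shift(S, sx)
    return z

def gnn_layer(S, x, h):
    """A single-feature GNN layer: graph filter followed by a pointwise nonlinearity."""
    return [math.tanh(v) for v in graph_filter(S, x, h)]

# Hypothetical 3-agent path graph: agent 0 - agent 1 - agent 2.
S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
x = [1.0, 0.0, 0.0]  # only agent 0 has a nonzero local observation
z = graph_filter(S, x, h=[1.0, 1.0])  # own value plus one-hop information
```

Because the same taps `h` are applied regardless of node labels, relabelling the agents simply permutes the output, which is the equivariance property that supports transfer to larger teams.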
Accelerating Deep Reinforcement Learning With the Aid of a Partial Model: Power-Efficient Predictive Video Streaming
Liu, Dong, Zhao, Jianyu, Yang, Chenyang, Hanzo, Lajos
Predictive power allocation is conceived for power-efficient video streaming over mobile networks using deep reinforcement learning. The goal is to minimize the accumulated energy consumption over a complete video streaming session for a mobile user under a quality-of-service constraint that avoids video playback interruptions. To handle the continuous state and action spaces, we resort to the deep deterministic policy gradient (DDPG) algorithm for solving the formulated problem. In contrast to previous predictive resource policies that first predict future information from historical data and then optimize the policy based on the predicted information, the proposed policy operates in an online and end-to-end manner. By judiciously designing the action and state so that they depend only on slowly varying average channel gains, the signaling overhead between the edge server and the base stations can be reduced, and the dynamics of the system can be learned effortlessly. To improve the robustness of streaming and accelerate learning, we further exploit the partially known dynamics of the system by integrating the concepts of a safety layer, post-decision state, and virtual experience into the basic DDPG algorithm. Our simulation results show that the proposed policies converge to the optimal policy derived from perfect prediction of the future large-scale channel gains and outperform the first-predict-then-optimize policy in the presence of prediction errors. By harnessing the partially known model of the system dynamics, the convergence speed can be dramatically improved.
I. INTRODUCTION
Mobile video traffic is expected to account for more than 75% of the global mobile data by 2021, and video-on-demand (VoD) services represent the main contributor [2]. This paper was presented in part at IEEE Globecom 2019 [1].
To avoid video stalling for a user experiencing hostile channel conditions, a base station (BS) can increase its transmit power for ensuring that the video segment is downloaded before being played.
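Two ingredients of the basic DDPG algorithm named in the abstract above are the critic's temporal-difference target, which bootstraps from slowly updated target copies of the actor and critic, and the soft (Polyak) update that moves those target copies towards the online networks. The sketch below shows both with parameters as plain lists and networks as placeholder callables; the reward of -2.0 (for example, a negated energy cost) and the lambda "networks" are hypothetical, and the paper's safety layer, post-decision state and virtual experience are not reproduced.

```python
def soft_update(target_params, online_params, tau):
    """Polyak averaging for DDPG target networks: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]

def critic_target(reward, next_state, done, gamma, target_actor, target_critic):
    """TD target y = r + gamma * Q'(s', mu'(s')), with no bootstrapping at episode end."""
    if done:
        return reward
    return reward + gamma * target_critic(next_state, target_actor(next_state))

# Hypothetical placeholder networks for a scalar state and action.
target_actor = lambda s: 0.5
target_critic = lambda s, a: s + a

y = critic_target(-2.0, 0.0, False, 0.9, target_actor, target_critic)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

A small `tau` makes the target networks change slowly, which stabilizes the critic's regression targets in continuous state and action spaces.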
Vulnerabilities of Connectionist AI Applications: Evaluation and Defence
Berghoff, Christian, Neu, Matthias, von Twickel, Arndt
This article deals with the IT security of connectionist artificial intelligence (AI) applications, focusing on threats to integrity, one of the three IT security goals. Such threats are especially relevant, for instance, in prominent AI computer vision applications. In order to present a holistic view of the IT security goal of integrity, many additional aspects such as interpretability, robustness and documentation are taken into account. A comprehensive list of threats and possible mitigations is presented by reviewing the state-of-the-art literature. AI-specific vulnerabilities such as adversarial attacks and poisoning attacks, as well as their AI-specific root causes, are discussed in detail. Additionally, and in contrast to former reviews, the whole AI supply chain is analysed with respect to vulnerabilities, including the planning, data acquisition, training, evaluation and operation phases. The discussion of mitigations is likewise not restricted to the level of the AI system itself but rather advocates viewing AI systems in the context of their supply chains and their embeddings in larger IT infrastructures and hardware devices. Based on this and the observation that adaptive attackers may circumvent any single published AI-specific defence to date, the article concludes that single protective measures are not sufficient; rather, multiple measures on different levels have to be combined to achieve a minimum level of IT security for AI applications.